Aaron Klotz at Mozilla

My adventures as a member of Mozilla’s GeckoView Team

Coming Around Full Circle


One thing about me that most Mozillians don’t know is that, when I first applied to work at MoCo, it was for a position on the mobile platform. When all was said and done, it was decided that I would be a better fit for an opening on Taras Glek’s platform performance team.

My first day at Mozilla was October 15, 2012 – I will be celebrating my seventh anniversary at MoCo in just a couple short weeks! Some people with similar tenures have suggested to me that we are now “old guard,” but I’m not sure that I feel that way! Anyway, I digress.

The platform performance team eventually evolved into a desktop-focused performance team by late 2013. By the end of 2015 I had decided that it was time for a change, and by March 2016 I had moved over to work for Jim Mathies, focusing on Gecko integration with Windows. I ended up spending the next twenty or so months helping the accessibility team port their Windows implementation over to multiprocess.

Once Firefox Quantum 57 hit the streets, I scoped out and provided technical leadership for the InjectEject project, whose objective was to tackle some of the root problems with DLL injection that were causing us grief in Windows-land.

I am proud to say that, over the past three years on Jim’s team, I have done the best work of my career. I’d like to thank Brad Lassey (now at Google) for his willingness to bring me over to his group, as well as Jim and David Bolter (a11y manager at the time) for their confidence in me. As somebody who had spent most of his adult life with no confidence in his work whatsoever, I found that their willingness to entrust me with those risks and responsibilities made an enormous difference in my self-esteem and my professional life.

Over the course of H1 2019, I began to feel restless again. I knew it was time for another change. What I did not expect was that the agent of that change would be James Willcox, aka Snorp. In Whistler, Snorp planted the seed in my head that I might want to come over to work with him on GeckoView, within the mobile group which David was now managing.

The timing seemed perfect, so I made the decision to move to GeckoView. I had to finish tying up some loose ends with InjectEject, so all the various stakeholders agreed that I’d move over at the end of Q3 2019.

Which brings me to this week, when I officially join the GeckoView team, working for Emily Toop. I find it somewhat amusing that I am now joining the team that evolved from the team that I had originally applied for back in 2012. I have truly come full circle in my career at Mozilla!

So, what’s next?

  • I have a couple of InjectEject bugs that are pretty much finished, but just need some polish and code reviews before landing.

  • For the next month or two at least, I am going to continue to meet weekly with Jim to assist with the transition as he ramps up new staff on the project.

  • I still plan to be the module owner for the Firefox Launcher Process and the MSCOM library; however, most day-to-day work will be done by others going forward.

  • I will continue to serve as the mozglue peer in charge of the DLL blocklist and DLL interceptor, with the same caveat.

Switching over to Android from Windows does not mean that I am leaving my Windows experience at the door; I would like to continue to be a resource on that front, so I would encourage people to continue to ask me for advice.

On the other hand, I am very much looking forward to stepping back into the mobile space. My first crack at mobile was as an intern back in 2003, when I was working with some code that had to run on PalmOS 3.0! I have not touched Android since I shipped a couple of utility apps back in 2011, so I am looking forward to learning more about what has changed. I am also looking forward to learning more about native development on Android, which is something that I never really had a chance to try.

As they used to say on Monty Python’s Flying Circus, “And now for something completely different!”

2018 Roundup: Q2, Part 1


This is the second post in my “2018 Roundup” series. For an index of all entries, please see my blog entry for Q1.

Refactoring the DLL Interceptor

As I have alluded to previously, Gecko includes a Detours-style API hooking mechanism for Windows. In Gecko, this code is referred to as the “DLL Interceptor.” We use the DLL interceptor to instrument various functions within our own processes. As a prerequisite for future DLL injection mitigations, I needed to spend a good chunk of Q2 refactoring this code. While I was in there, I took the opportunity to improve the interceptor’s memory efficiency, thus benefitting the Fission MemShrink project. [When these changes landed, we were not yet tracking the memory savings, but I will include a rough estimate later in this post.]

A Brief Overview of Detours-style API Hooking

While many distinct function hooking techniques are used in the Windows ecosystem, the Detours-style hook is one of the most effective and most popular. While I am not going to go into too many specifics here, I’d like to offer a quick overview. In this description, “target” is the function being hooked.

Here is what happens when a function is detoured:

  1. Allocate a chunk of memory to serve as a “trampoline.” We must be able to adjust the protection attributes on that memory.

  2. Disassemble enough of the target to make room for a jmp instruction. On 32-bit x86 processors, this requires 5 bytes. x86-64 is more complicated, but generally, to jmp to an absolute address, we try to make room for 13 bytes.

  3. Copy the instructions from step 2 over to the trampoline.

  4. At the beginning of the target function, write a jmp to the hook function.

  5. Append additional instructions to the trampoline that, when executed, will cause the processor to jump back to the first valid instruction after the jmp written in step 4.

  6. If the hook function wants to pass control on to the original target function, it calls the trampoline.

Note that these steps don’t occur exactly in the order specified above; I selected the above ordering in an effort to simplify my description.
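To make those steps more concrete, here is a deliberately simplified sketch of a Detours-style hook for 32-bit x86. This is not the actual DLL Interceptor code: SetDetour, WriteJmpRel32, and the aPrologueLen parameter are invented for illustration, the sketch assumes the caller already knows that the first aPrologueLen (at least 5) bytes of the target are whole instructions with no relative operands, and error handling and cleanup are omitted.

// Simplified Detours-style hook for 32-bit x86 (illustrative sketch only).
#include <windows.h>
#include <cstdint>
#include <cstring>

static void WriteJmpRel32(uint8_t* aSrc, const void* aDest) {
  // jmp rel32: opcode 0xE9 followed by (dest - (src + 5))
  aSrc[0] = 0xE9;
  int32_t disp = reinterpret_cast<intptr_t>(aDest) -
                 (reinterpret_cast<intptr_t>(aSrc) + 5);
  std::memcpy(aSrc + 1, &disp, sizeof(disp));
}

void* SetDetour(void* aTarget, void* aHook, size_t aPrologueLen) {
  // Step 1: allocate trampoline memory (left RWX for simplicity here).
  uint8_t* tramp = static_cast<uint8_t*>(
      VirtualAlloc(nullptr, 4096, MEM_RESERVE | MEM_COMMIT,
                   PAGE_EXECUTE_READWRITE));
  if (!tramp) {
    return nullptr;
  }

  // Steps 2 & 3: copy the target's first instructions into the trampoline.
  std::memcpy(tramp, aTarget, aPrologueLen);

  // Step 5: append a jmp back to the first untouched instruction.
  WriteJmpRel32(tramp + aPrologueLen,
                static_cast<uint8_t*>(aTarget) + aPrologueLen);

  // Step 4: patch the beginning of the target so that it jumps to the hook.
  DWORD oldProt;
  if (!VirtualProtect(aTarget, aPrologueLen, PAGE_EXECUTE_READWRITE,
                      &oldProt)) {
    return nullptr;
  }
  WriteJmpRel32(static_cast<uint8_t*>(aTarget), aHook);
  VirtualProtect(aTarget, aPrologueLen, oldProt, &oldProt);
  FlushInstructionCache(GetCurrentProcess(), aTarget, aPrologueLen);

  // Step 6: the hook calls the returned trampoline to reach the original.
  return tramp;
}

The real interceptor determines the prologue length by disassembling the target, and on x86-64 it must also cope with absolute jmps and with trampolines that land outside rel32 range.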

Here is my attempt at visualizing the control flow of a detoured function on x86-64:

Refactoring

Previously, the DLL interceptor relied on directly manipulating pointers in order to read and write the various instructions involved in the hook. In bug 1432653 I changed things so that the memory operations are parameterized based on two orthogonal concepts:

  • In-process vs out-of-process memory access: I wanted to be able to abstract reads and writes such that we could optionally set a hook in another process from our own.
  • Virtual memory allocation scheme: I wanted to be able to change how trampoline memory was allocated. Previously, each instance of WindowsDllInterceptor allocated its own page of memory for trampolines, but each instance also typically only sets one or two hooks. This means that most of the 4KiB page was unused. Furthermore, since Windows allocates blocks of pages on a 64KiB boundary, this wasted a lot of precious virtual address space in our 32-bit builds.

By refactoring and parameterizing these operations, we ended up with the following combinations:

  • In-process memory access, each WindowsDllInterceptor instance receives its own trampoline space;
  • In-process memory access, all WindowsDllInterceptor instances within a module share trampoline space;
  • Out-of-process memory access, each WindowsDllInterceptor instance receives its own trampoline space;
  • Out-of-process memory access, all WindowsDllInterceptor instances within a module share trampoline space (currently not implemented as this option is not particularly useful at the moment).

Instead of directly manipulating pointers, we now use instances of ReadOnlyTargetFunction, WritableTargetFunction, and Trampoline to manipulate our code/data. Those classes in turn use the memory management and virtual memory allocation policies to perform the actual reading and writing.
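To illustrate how that parameterization might fit together, here is a hypothetical, heavily condensed sketch of the policy-based design. The Sketch-prefixed names and InterceptorSketch are invented for this post (the real classes in mozglue have richer interfaces), but the point is the same: the hooking logic is written once, and the two orthogonal concepts are supplied as template parameters.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for MMPolicyInProcess: reads happen directly in the current
// process. An out-of-process stand-in would use ReadProcessMemory and
// WriteProcessMemory behind the same interface.
struct SketchMMPolicyInProcess {
  bool Read(void* aDst, const void* aSrc, size_t aLen) const {
    std::memcpy(aDst, aSrc, aLen);
    return true;
  }
};

// Stand-in for VMSharingPolicyUnique: every interceptor instance owns its
// own trampoline space. A shared stand-in would delegate to one static pool.
struct SketchVMSharingPolicyUnique {
  uint8_t* GetNextTrampoline(size_t aLen) {
    if (mNext + aLen > sizeof(mBlock)) {
      return nullptr;
    }
    uint8_t* slot = mBlock + mNext;
    mNext += aLen;
    return slot;
  }
  uint8_t mBlock[4096] = {};
  size_t mNext = 0;
};

// The hooking logic is written once against the two policy interfaces, so
// any combination of memory access and VM sharing can be chosen at compile
// time.
template <typename MMPolicy, typename VMPolicy>
class InterceptorSketch {
 public:
  bool CopyPrologueToTrampoline(const void* aTarget, size_t aLen) {
    uint8_t* tramp = mVM.GetNextTrampoline(aLen);
    return tramp && mMM.Read(tramp, aTarget, aLen);
  }

 private:
  MMPolicy mMM;
  VMPolicy mVM;
};

using InProcessUniqueInterceptor =
    InterceptorSketch<SketchMMPolicyInProcess, SketchVMSharingPolicyUnique>;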

Memory Management Policies

The interceptor now supports two policies, MMPolicyInProcess and MMPolicyOutOfProcess. Each policy must implement the following memory operations:

  • Read
  • Write
  • Change protection attributes
  • Reserve trampoline space
  • Commit trampoline space

MMPolicyInProcess is implemented using memcpy for read and write, VirtualProtect for protection attribute changes, and VirtualAlloc for reserving and committing trampoline space.

MMPolicyOutOfProcess uses ReadProcessMemory and WriteProcessMemory for read and write. As a perf optimization, we try to batch reads and writes together to reduce the system call traffic. We obviously use VirtualProtectEx to adjust protection attributes in the other process.

Out-of-process trampoline reservation and commitment, however, is a bit different and is worth a separate call-out. We allocate trampoline space using shared memory. It is mapped into the local process with read+write permissions using MapViewOfFile. The memory is mapped into the remote process as read+execute using some code that I wrote in bug 1451511 that either uses NtMapViewOfSection or MapViewOfFile2, depending on availability. Individual pages from those chunks are then committed via VirtualAlloc in the local process and VirtualAllocEx in the remote process. This scheme enables us to read and write to trampoline memory directly, without needing to do cross-process reads and writes!
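Here is a hedged sketch of that dual-mapping idea (not the actual code from bug 1451511): ReserveTrampolineChunk and TrampolineChunk are invented names, error handling is abbreviated, and MapViewOfFile2 is only available on newer versions of Windows 10, which is why the real implementation can fall back to NtMapViewOfSection. The real code also reserves the section with SEC_RESERVE and commits pages lazily, whereas this sketch commits everything up front.

#include <windows.h>

struct TrampolineChunk {
  void* localView = nullptr;   // read+write view in our process
  void* remoteView = nullptr;  // read+execute view in the target process
  HANDLE mapping = nullptr;
};

bool ReserveTrampolineChunk(HANDLE aRemoteProcess, SIZE_T aSize,
                            TrampolineChunk& aOut) {
  // Pagefile-backed section that allows both writable and executable views.
  aOut.mapping = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                    PAGE_EXECUTE_READWRITE, 0,
                                    static_cast<DWORD>(aSize), nullptr);
  if (!aOut.mapping) {
    return false;
  }

  // Local view: we write trampolines here with plain memory writes.
  aOut.localView = MapViewOfFile(aOut.mapping, FILE_MAP_WRITE, 0, 0, aSize);

  // Remote view: the target process only ever executes this memory.
  aOut.remoteView = MapViewOfFile2(aOut.mapping, aRemoteProcess, 0, nullptr,
                                   aSize, 0, PAGE_EXECUTE_READ);

  return aOut.localView && aOut.remoteView;
}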

VM Sharing Policies

The code for these policies is a lot simpler than the code for the memory management policies. We now have VMSharingPolicyUnique and VMSharingPolicyShared. Each of these policies must implement the following operations:

  • Reserve space for up to N trampolines of size K;
  • Obtain a Trampoline object for the next available K-byte trampoline slot;
  • Return an iterable collection of all extant trampolines.

VMSharingPolicyShared is actually implemented by delegating to a static instance of VMSharingPolicyUnique.

Implications of Refactoring

To determine the performance implications, I added timings to our DLL Interceptor unit test. I was very happy to see that, despite the additional layers of abstraction, the C++ compiler’s optimizer was doing its job: There was no performance impact whatsoever!

Once the refactoring was complete, I switched the default VM Sharing Policy for WindowsDllInterceptor over to VMSharingPolicyShared in bug 1451524.

Browsing today’s mozilla-central tip, I count 14 locations where we instantiate interceptors inside xul.dll. Given that not all interceptors are necessarily instantiated at once, I am now offering a worst-case back-of-the-napkin estimate of the memory savings:

  • Each interceptor would likely be consuming 4KiB (most of which is unused) of committed VM. Due to Windows’ 64KiB allocation granularity, each interceptor would be leaving a further 60KiB of address space in a free but unusable state. Assuming all 14 interceptors were actually instantiated, they would thus consume a combined 56KiB of committed VM and 840KiB of free but unusable address space.
  • By sharing trampoline VM, the interceptors would consume only 4KiB combined and waste only 60KiB of address space, thus yielding savings of 52KiB in committed memory and 780KiB in addressable memory.

Oh, and One More Thing

Another problem that I discovered during this refactoring was bug 1459335. It turns out that some of the interceptor’s callers were not distinguishing between the “I have not set this hook yet” and “I attempted to set this hook but it failed” scenarios. Across several call sites, I discovered that our code would repeatedly retry setting hooks even when they had previously failed, leaking trampoline space!

To fix this, I modified the interceptor’s interface so that we use one-time initialization APIs to set hooks; since that bug landed, it is no longer possible for clients of the DLL interceptor to retry setting a hook that previously failed.

Quantifying the memory costs of this bug is… non-trivial, but it suffices to say that fixing this bug probably resulted in the savings of at least a few hundred KiB in committed VM on affected machines.

That’s it for today’s post, folks! Thanks for reading! Coming up in Q2, Part 2: Implementing a Skeletal Launcher Process

2018 Roundup: Q1


I had a very busy 2018. So busy, in fact, that I have not been able to devote any time to actually discussing what I worked on! I had intended to write these posts during the end of December, but a hardware failure delayed that until the new year. Alas, here we are in 2019, and I am going to do a series of retrospectives on last year’s work, broken up by quarter.

Here is an index of the remaining entries in this series:

Overview

The general theme of my work in 2018 was dealing with the DLL injection problem: On Windows, third parties love to forcibly load their DLLs into other processes – web browsers in particular, thus making Firefox a primary target.

Many of these libraries tend to alter Firefox processes in ways that hurt the stability and/or performance of our code; many chemspill releases have gone out over the years to deal with these problems. While I could rant for hours over this, the fact is that DLL injection is rampant in the ecosystem of Windows desktop applications and is not going to disappear any time soon. In the meantime, we need to be able to deal with it.

Some astute readers might be ready to send me an email or post a comment about how ignorant I am about the new(-ish) process mitigation policies that are available in newer versions of Windows. While those features are definitely useful, they are not panaceas:

  • We cannot turn on the “Extension Point Disable” policy for users of assistive technologies; screen readers rely heavily on DLL injection using SetWindowsHookEx and SetWinEventHook, both of which are covered by this policy.
  • We could enable the “Microsoft Binary Signature” policy, but it can only be activated after our own DLLs have loaded; by that point it is often already too late, because other DLLs have typically injected themselves before we are able to turn the policy on. (Note that this could easily be solved if this policy were augmented to also permit loading of any DLL signed by the same organization as that of the process’s executable binary, but Microsoft seems to be unwilling to do this.) A sketch of how these two policies are enabled appears after this list.
  • The above mitigations are not universally available; they do not help us on Windows 7.
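For reference, here is a minimal sketch (not taken from the Firefox source) of how a process can opt itself in to the two policies above via SetProcessMitigationPolicy; the function names are mine. In practice these policies can also be applied to child processes at creation time via attribute lists.

#include <windows.h>

bool DisableExtensionPoints() {
  // Blocks legacy injection vectors such as SetWindowsHookEx DLL hooks and
  // AppInit DLLs, which is exactly why it breaks assistive technologies.
  PROCESS_MITIGATION_EXTENSION_POINT_DISABLE_POLICY policy = {};
  policy.DisableExtensionPoints = 1;
  return SetProcessMitigationPolicy(ProcessExtensionPointDisablePolicy,
                                    &policy, sizeof(policy)) != FALSE;
}

bool RequireMicrosoftSignedDlls() {
  // Once enabled, only Microsoft-signed DLLs may be loaded from this point
  // onward, which is why turning it on only after our own DLLs have loaded
  // is often already too late.
  PROCESS_MITIGATION_BINARY_SIGNATURE_POLICY policy = {};
  policy.MicrosoftSignedOnly = 1;
  return SetProcessMitigationPolicy(ProcessSignaturePolicy, &policy,
                                    sizeof(policy)) != FALSE;
}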

For me, Q1 2018 was all about gathering better data about injected DLLs.

Learning More About DLLs Injected into Firefox

One of our major pain points over the years of dealing with injected DLLs has been that the vendor of the DLL is not always apparent to us. In general, our crash reports and telemetry pings only include the leaf name of the various DLLs on a user’s system. This is intentional on our part: we want to preserve user privacy. On the other hand, this severely limits our ability to determine which party is responsible for a particular DLL.

One avenue for obtaining this information is to look at any digital signature that is embedded in the DLL. By examining the certificate that was used to sign the binary, we can extract the organization of the cert’s owner and include that with our crash reports and telemetry.

In bug 1430857 I wrote a bunch of code that enables us to extract that information from signed binaries using the Windows Authenticode APIs. Originally, in that bug, all of that signature extraction work happened from within the browser itself while it was running: it gathered the cert information on a background thread and included those annotations in a subsequent crash dump, should such a thing occur.
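For the curious, here is an abbreviated, hedged sketch of what extracting the signer’s organization from a signed DLL looks like with the crypt32 APIs. This is illustrative only and is not the code from bug 1430857: GetSignerOrg is an invented name, error handling and cleanup are simplified, and only the primary embedded signature is considered.

#include <windows.h>
#include <wincrypt.h>
#include <string>
#include <vector>

#pragma comment(lib, "crypt32.lib")

std::wstring GetSignerOrg(const wchar_t* aDllPath) {
  HCERTSTORE store = nullptr;
  HCRYPTMSG msg = nullptr;
  DWORD encoding = 0;

  // Pull the embedded PKCS#7 signature out of the PE file.
  if (!CryptQueryObject(CERT_QUERY_OBJECT_FILE, aDllPath,
                        CERT_QUERY_CONTENT_FLAG_PKCS7_SIGNED_EMBED,
                        CERT_QUERY_FORMAT_FLAG_BINARY, 0, &encoding, nullptr,
                        nullptr, &store, &msg, nullptr)) {
    return L"";
  }

  // Get the signer info (issuer + serial number) from the signature.
  DWORD infoLen = 0;
  CryptMsgGetParam(msg, CMSG_SIGNER_INFO_PARAM, 0, nullptr, &infoLen);
  std::vector<BYTE> buf(infoLen);
  if (!CryptMsgGetParam(msg, CMSG_SIGNER_INFO_PARAM, 0, buf.data(),
                        &infoLen)) {
    return L"";
  }
  auto signerInfo = reinterpret_cast<CMSG_SIGNER_INFO*>(buf.data());

  // Find the signer's certificate in the store that came with the signature.
  CERT_INFO certInfo = {};
  certInfo.Issuer = signerInfo->Issuer;
  certInfo.SerialNumber = signerInfo->SerialNumber;
  PCCERT_CONTEXT cert = CertFindCertificateInStore(
      store, encoding, 0, CERT_FIND_SUBJECT_CERT, &certInfo, nullptr);
  if (!cert) {
    return L"";
  }

  // Read the organization (O=) attribute out of the certificate's subject.
  wchar_t org[256] = {};
  CertGetNameStringW(cert, CERT_NAME_ATTR_TYPE, 0,
                     const_cast<char*>(szOID_ORGANIZATION_NAME), org,
                     ARRAYSIZE(org));

  CertFreeCertificateContext(cert);
  CertCloseStore(store, 0);
  CryptMsgClose(msg);
  return org;
}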

After some reflection, I realized that I was not gathering annotations in the right place. As an example, what if an injected DLL were to trigger a crash before the background thread had a chance to grab that DLL’s cert information?

I realized that the best place to gather this information was in a post-processing step after the crash dump had been generated, and in fact we already had the right mechanism for doing so: the minidump-analyzer program was already doing post-processing on Firefox crash dumps before sending them back to Mozilla. I moved the signature extraction and crash annotation code out of Gecko and into the analyzer in bug 1436845.

(As an aside, while working on the minidump-analyzer, I found some problems with how it handled command line arguments: it was assuming that main passes its argv as UTF-8, which is not true on Windows. I fixed those issues in bug 1437156.)

In bug 1434489 I also ended up adding this information to the “modules ping” that we have in telemetry; IIRC this ping is only sent weekly. When the modules ping is requested, we gather the module cert info asynchronously on a background thread.

Finally, I had to modify Socorro (the back-end for crash-stats) to be able to understand the signature annotations and be able to display them via bug 1434495. This required two commits: one to modify the Socorro stackwalker to merge the module signature information into the full crash report, and another to add a “Signed By” column to every report’s “Modules” tab to display the signature information (Note that this column is only present when at least one module in a particular crash report contains signature information).

The end result was very satisfying: Most of the injected DLLs in our Windows crash reports are signed, so it is now much easier to identify their vendors!

This project was very satisfying for me in many ways: First of all, surfacing this information was an itch that I had been wanting to scratch for quite some time. Secondly, this really was a “full stack” project, touching everything from extracting signature info from binaries using C++, all the way up to writing some back-end code in Python and a little touch of front-end stuff to surface the data in the web app.

Note that, while this project focused on Windows because of the severity of the library injection problem on that platform, it would be easy enough to reuse most of this code for macOS builds as well; the only major work for the latter case would be for extracting signature information from a dylib. This is not currently a priority for us, though.

Thanks for reading! Coming up in Q2: Refactoring the DLL Interceptor!

Legacy Firefox Extensions and “Userspace”


This week’s release of Firefox Quantum has prompted all kinds of feedback, both positive and negative. That is not surprising to anybody – any software that has a large number of users is going to be a topic for discussion, especially when the release in question is undoubtedly a watershed.

While I have previously blogged about the transition to WebExtensions, now that we have actually passed through the cutoff for legacy extensions, I have decided to add some new commentary on the subject.

One analogy that has been used in the discussion of the extension ecosystem is that of kernelspace and userspace. The crux of the analogy is that Gecko is equivalent to an operating system kernel, and thus extensions are the user-mode programs that run atop that kernel. The argument then follows that Mozilla’s deprecation and removal of legacy extension capabilities is akin to “breaking” userspace. [Some people who say this are using the same tone as Linus does whenever he eviscerates Linux developers who break userspace, which is neither productive nor welcomed by anyone, but I digress.] Unfortunately, that analogy simply does not map to the legacy extension model.

Legacy Extensions as Kernel Modules

The most significant problem with the userspace analogy is that legacy extensions effectively meld with Gecko and become part of Gecko itself. If we accept the premise that Gecko is like a monolithic OS kernel, then we must also accept that the analogical equivalent of loading arbitrary code into that kernel is the kernel module. Such components are loaded into the kernel and effectively become part of it. Their code runs with full privileges. They break whenever significant changes are made to the kernel itself.

Sound familiar?

Legacy extensions were akin to kernel modules. When there is no abstraction, there can be no such thing as userspace. This is precisely the problem that WebExtensions solves!

Building Out a Legacy API

Maybe somebody out there is thinking, “well what if you took all the APIs that legacy extensions used, turned that into a ‘userspace,’ and then just left that part alone?”

Which APIs? Where do we draw the line? Do we check the code coverage for every legacy addon in AMO and use that to determine what to include?

Remember, there was no abstraction; installed legacy addons are fused to Gecko. If we pledge not to touch anything that legacy addons might touch, then we cannot touch anything at all.

Where do we go from here? Freeze an old version of Gecko and host an entire copy of it inside web content? Compile it to WebAssembly? [Oh God, what have I done?]

If that’s not a maintenance burden, I don’t know what is!

A Kernel Analogy for WebExtensions

Another problem with the legacy-extensions-as-userspace analogy is that it leaves awkward room for web content, whose API is abstract and well-defined. I do not think that it is appropriate to consider web content to be equivalent to a sandboxed application, as sandboxed applications use the same (albeit restricted) API as normal applications. I would suggest that the presence of WebExtensions gives us a better kernel analogy:

  • Gecko is the kernel;
  • WebExtensions are privileged user applications;
  • Web content runs as unprivileged user applications.

In Conclusion

Declaring that legacy extensions are userspace does not make them so. The way that the technology actually worked defies the abstract model that the analogy attempts to impose upon it. On the other hand, we can use the failure of that analogy to explain why WebExtensions are important and construct an extension ecosystem that does fit with that analogy.

Win32 Gotchas


For the second time since I have been at Mozilla I have encountered a situation where hooks are called for notifications of a newly created window, but that window has not yet been initialized properly, causing the hooks to behave badly.

The first time was inside our window neutering code in IPC, while the second time was in our accessibility code.

Every time I have seen this, there is code that follows this pattern:

HWND hwnd = CreateWindowEx(/* ... */);
if (hwnd) {
  // Do some follow-up initialization to hwnd (Using SetProp as an example):
  SetProp(hwnd, "Foo", bar);
}

This seems innocuous enough, right?

The problem is that CreateWindowEx calls hooks. If those hooks then try to do something like GetProp(hwnd, "Foo"), that call is going to fail because the “Foo” prop has not yet been set.

The key takeaway from this is that, if you are creating a new window, you must do any follow-up initialization from within your window proc’s WM_CREATE handler. This will guarantee that your window’s initialization will have completed before any hooks are called.

You might be thinking, “But I don’t set any hooks!” While this may be true, you must not forget about hooks set by third-party code.

“But those hooks won’t know anything about my program’s internals, right?”

Perhaps, perhaps not. But when those hooks fire, they give third-party software the opportunity to run. In some cases, those hooks might even cause the thread to reenter your own code. Your window had better be completely initialized when this happens!

In the case of my latest discovery of this issue in bug 1380471, I made it possible to use a C++11 lambda to simplify this pattern.

CreateWindowEx accepts a lpParam parameter which is then passed to the WM_CREATE handler as the lpCreateParams member of a CREATESTRUCT.

By setting lpParam to a pointer to a std::function<void(HWND)>, we may then supply any callable that we wish for follow-up window initialization.

Using the previous code sample as a baseline, this allows me to revise the code to safely set a property like this:

std::function<void(HWND)> onCreate([](HWND aHwnd) -> void {
  SetProp(aHwnd, "Foo", bar);
});

HWND hwnd = CreateWindowEx(/* ... */, &onCreate);
// At this point it is already too late to further initialize hwnd!

Note that since lpParam is always passed during WM_CREATE, which always fires before CreateWindowEx returns, it is safe for onCreate to live on the stack.

I liked this solution for the a11y case because it preserved the locality of the initialization code within the function that called CreateWindowEx; the window proc for this window is implemented in another source file and the follow-up initialization depends on the context surrounding the CreateWindowEx call.

Speaking of window procs, here is how that window’s WM_CREATE handler invokes the callable:

switch (uMsg) {
  case WM_CREATE: {
    auto createStruct = reinterpret_cast<CREATESTRUCT*>(lParam);
    auto createProc = reinterpret_cast<std::function<void(HWND)>*>(
      createStruct->lpCreateParams);

    if (createProc && *createProc) {
      (*createProc)(hwnd);
    }

    return 0;
  }
  // ...

TL;DR: If you see a pattern where further initialization work is being done on an HWND after a CreateWindowEx call, move that initialization code to your window’s WM_CREATE handler instead.

Why I Prefer Using CRITICAL_SECTIONs for Mutexes in Windows Nightly Builds


In the past I have argued that our Nightly builds, both debug and release, should use CRITICAL_SECTIONs (with full debug info) for our implementation of mozilla::Mutex. I’d like to illustrate some reasons why this is so useful.

They enable more utility in WinDbg extensions

Every time you initialize a CRITICAL_SECTION, Windows inserts the CS’s debug info into a process-wide linked list. This enables their discovery by the Windows debugging engine, and makes the !cs, !critsec, and !locks commands more useful.

They enable profiling of their initialization and acquisition

When the “Create user mode stack trace database” gflag is enabled, Windows records the call stack of the thread that called InitializeCriticalSection on that CS. Windows also records the call stack of the owning thread once it has acquired the CS. This can be very useful for debugging deadlocks.

They track their contention counts

Since every CS has been placed in a process-wide linked list, we may now ask the debugger to dump statistics about every live CS in the process. In particular, we can ask the debugger to output the contention counts for each CS in the process. After running a workload against Nightly, we may then take the contention output, sort it in descending order, and determine which CRITICAL_SECTIONs are the most contended in the process.

We may then want to more closely inspect the hottest CSes to determine whether there is anything that we can do to reduce contention and all of the extra context switching that entails.

In Summary

When we use SRWLOCKs or initialize our CRITICAL_SECTIONs with the CRITICAL_SECTION_NO_DEBUG_INFO flag, we are denying ourselves access to this information. That’s fine on release builds, but on Nightly I think it is worth having around. While I realize that most Mozilla developers have not been using this information until now (otherwise I would not be writing this blog post), this rich debugger info is one of those things that you do not miss until you no longer have it.
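As an illustration of the trade-off (this is not the mozilla::Mutex implementation, and InitLocks and the variable names are made up):

#include <windows.h>

CRITICAL_SECTION gWithDebugInfo;
CRITICAL_SECTION gWithoutDebugInfo;

void InitLocks() {
  // Registered in the process-wide CS debug-info list: visible to !cs and
  // !locks, gets an initialization stack trace when +ust is enabled, and
  // tracks its contention count.
  InitializeCriticalSection(&gWithDebugInfo);

  // Opts out of all of the above (this is effectively what SRWLOCKs give us
  // too, since they carry no debug info at all).
  InitializeCriticalSectionEx(&gWithoutDebugInfo, 0,
                              CRITICAL_SECTION_NO_DEBUG_INFO);
}

As a reminder, the “Create user mode stack trace database” gflag mentioned above can be enabled for a given executable with something like gflags /i firefox.exe +ust.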

For further reading about critical section debug info, check out this archived article from MSDN Magazine.

Asynchronous Plugin Initialization: Requiem


My colleague njn is going to be removing asynchronous plugin initialization in bug 1352575. Sadly the feature never became solid enough to remain enabled on release, so we cut our losses and cancelled the project early in 2016. Now that code is just a bunch of dead weight. With the deprecation of non-Flash NPAPI plugins in Firefox 52, our developers are now working on simplifying the remaining NPAPI code as much as possible.

Obviously the removal of that code does not prevent me from discussing some of the more interesting facets of that work.

Today I am going to talk about how async plugin init worked when web content attempted to access a property on a plugin’s scriptable object, when that plugin had not yet completed its asynchronous initialization.

As described on MDN, the DOM queries a plugin for scriptability by calling NPP_GetValue with the NPPVpluginScriptableNPObject constant. With async plugin init, we did not return the true NPAPI scriptable object back to the DOM. Instead we returned a surrogate object. This meant that we did not need to synchronously wait for the plugin to initialize before returning a result back to the DOM.

If the DOM subsequently called into that surrogate object, the surrogate would be forced to synchronize with the plugin. There was a limit on how much fakery the async surrogate could do once the DOM needed a definitive answer – after all, the NPAPI itself is entirely synchronous. You may question whether the asynchronous surrogate actually bought us any responsiveness, but performance profiles and measurements that I took at the time demonstrated that it bought us enough additional concurrency to make it worthwhile. A good number of plugin instantiations were able to complete in time before the DOM had made a single invocation on the surrogate.

Once the surrogate object had synchronized with the plugin, it would then mostly act as a pass-through to the plugin’s real NPAPI scriptable object, with one notable exception: property accesses.

The reason for this is not necessarily obvious, so allow me to elaborate:

The DOM usually sets up a scriptable object as follows:


this.__proto__.__proto__.__proto__
  • Where this is the WebIDL object (ie, content’s <embed> element);
  • Whose prototype is the NPAPI scriptable object;
  • Whose prototype is the shared WebIDL prototype;
  • Whose prototype is Object.prototype.

NPAPI is reentrant (some might say insanely reentrant). It is possible (and indeed common) for a plugin to set properties on the WebIDL object from within the plugin’s NPP_New.

Suppose that the DOM tries to access a property on the plugin’s WebIDL object that is normally set by the plugin’s NPP_New. In the asynchronous case, the plugin’s initialization might still be in progress, so that property might not yet exist.

In the case where the property does not yet exist on the WebIDL object, JavaScript fails to retrieve an “own” property. It then moves on to the first prototype and attempts to resolve the property on that. As outlined above, this prototype would actually be the async surrogate. The async surrogate would then be in a situation where it must absolutely produce a definitive result, so this would trigger synchronization with the plugin. At this point the plugin would be guaranteed to have finished initializing.

Now we have a problem: JS was already examining the NPAPI scriptable object when it blocked to synchronize with the plugin. Meanwhile, the plugin went ahead and set properties (including the one that we’re interested in) on the WebIDL object. By the time that JS execution resumes, it would already be looking too far up the prototype chain to see those new properties!

The surrogate needed to be aware of this when it synchronized with the plugin during a property access. If the plugin had already completed its initialization (thus rendering synchronization unnecessary), the surrogate would simply pass the property access on to the real NPAPI scriptable object. On the other hand, if a synchronization was performed, the surrogate would first retry the WebIDL object by querying for the WebIDL object’s “own” properties, and return the own property if it now existed. If no own property existed on the WebIDL object, then the surrogate would revert to its “pass through all the things” behaviour.

If I hadn’t made the asynchronous surrogate scriptable object do that, we would have ended up with a strange situation where the DOM’s initial property access on an embed could fail non-deterministically during page load.

That’s enough chatter for today. I enjoy blogging about my crazy hacks that make the impossible, umm… possible, so maybe I’ll write some more of these in the future.

New Team, New Project


In February of this year I switched teams: After 3+ years on Mozilla’s Performance Team, and after having the word “performance” in my job description in some form or another for several years prior to that, I decided that it was time for me to move on to new challenges. Fortunately the Platform org was willing to have me set up shop under the (e10s|sandboxing|platform integration) umbrella.

I am pretty excited about this new role!

My first project is to sort out the accessibility situation under Windows e10s. This started back at Mozlando last December. A number of engineers from across the Platform org, plus me, got together to brainstorm. Not too long after we had all returned home, I ended up making a suggestion on an email thread that has evolved into the core concept that I am currently attempting. As is typical at Mozilla, no deed goes unpunished, so I have been asked to flesh out my ideas. An overview of this plan is available on the wiki.

My hope is that I’ll be able to deliver a working, “version 0.9” type of demo in time for our London all-hands at the end of Q2. Hopefully we will be able to deliver on that!

Some Additional Notes

I am using this section of the blog post to make some additional notes. I don’t feel that these ideas are strong enough to commit to a wiki yet, but I do want them to be publicly available.

One concern that our colleagues at NVAccess have identified is that the current COM interfaces are too chatty; this is a major reason why screen readers frequently inject libraries into the Firefox address space. If we serve our content a11y objects as remote COM objects, there is concern that performance would suffer. This concern is not due to latency, but rather due to the frequency of calls; one function call does not provide sufficient information to the a11y client. As a result, multiple round trips are required to fetch all of the information that is required for a particular DOM node.

My gut feeling about this is that this is probably a legitimate concern, however we cannot make good decisions without quantifying the performance. My plan going forward is to proceed with a naïve implementation of COM remoting to start, followed by work on reducing round trips as necessary.

Smart Proxies

One idea that was discussed is the idea of the content process speculatively sending information to the chrome process that might be needed in the future. For example, if we have an IAccessible, we can expect that multiple properties will be queried off that interface. A smart proxy could ship that data across the RPC channel during marshaling so that querying that additional information does not require additional round trips.

COM makes this possible using “handler marshaling.” I have dug up some information about how to do this and am posting it here for posterity:

  • House of COM, May 1999 Microsoft Systems Journal;
  • Implementing and Activating a Handler with Extra Data Supplied by Server on MSDN;
  • Wicked Code, August 2000 MSDN Magazine. This one is not available on the MSDN Magazine website, but I have an archived copy on CD-ROM.

New Mozdbgext Command: !iat


As of today I have added a new command to mozdbgext: !iat.

The syntax is pretty simple:

!iat <hexadecimal address>

This address shouldn’t be just any pointer; it should be the address of an entry in the current module’s import address table (IAT). These addresses are typically very identifiable by the _imp_ prefix in their symbol names.

The purpose of this extension is to look up the name of the DLL from which the function is being imported. Furthermore, the extension compares the expected target address of the import with the actual target address of the import. This allows us to detect API hooking via IAT patching.

An Example Session

I fired up a local copy of Nightly, attached a debugger to it, and dumped the call stack of its main thread:


 # ChildEBP RetAddr
00 0018ecd0 765aa32a ntdll!NtWaitForMultipleObjects+0xc
01 0018ee64 761ec47b KERNELBASE!WaitForMultipleObjectsEx+0x10a
02 0018eecc 1406905a USER32!MsgWaitForMultipleObjectsEx+0x17b
03 0018ef18 1408e2c8 xul!mozilla::widget::WinUtils::WaitForMessage+0x5a
04 0018ef84 13fdae56 xul!nsAppShell::ProcessNextNativeEvent+0x188
05 0018ef9c 13fe3778 xul!nsBaseAppShell::DoProcessNextNativeEvent+0x36
06 0018efbc 10329001 xul!nsBaseAppShell::OnProcessNextEvent+0x158
07 0018f0e0 1038e612 xul!nsThread::ProcessNextEvent+0x401
08 0018f0fc 1095de03 xul!NS_ProcessNextEvent+0x62
09 0018f130 108e493d xul!mozilla::ipc::MessagePump::Run+0x273
0a 0018f154 108e48b2 xul!MessageLoop::RunInternal+0x4d
0b 0018f18c 108e448d xul!MessageLoop::RunHandler+0x82
0c 0018f1ac 13fe78f0 xul!MessageLoop::Run+0x1d
0d 0018f1b8 14090f07 xul!nsBaseAppShell::Run+0x50
0e 0018f1c8 1509823f xul!nsAppShell::Run+0x17
0f 0018f1e4 1514975a xul!nsAppStartup::Run+0x6f
10 0018f5e8 15146527 xul!XREMain::XRE_mainRun+0x146a
11 0018f650 1514c04a xul!XREMain::XRE_main+0x327
12 0018f768 00215c1e xul!XRE_main+0x3a
13 0018f940 00214dbd firefox!do_main+0x5ae
14 0018f9e4 0021662e firefox!NS_internal_main+0x18d
15 0018fa18 0021a269 firefox!wmain+0x12e
16 0018fa60 76e338f4 firefox!__tmainCRTStartup+0xfe
17 0018fa74 77d656c3 KERNEL32!BaseThreadInitThunk+0x24
18 0018fabc 77d6568e ntdll!__RtlUserThreadStart+0x2f
19 0018facc 00000000 ntdll!_RtlUserThreadStart+0x1b

Let us examine the code at frame 3:


14069042 6a04            push    4
14069044 68ff1d0000      push    1DFFh
14069049 8b5508          mov     edx,dword ptr [ebp+8]
1406904c 2b55f8          sub     edx,dword ptr [ebp-8]
1406904f 52              push    edx
14069050 6a00            push    0
14069052 6a00            push    0
14069054 ff159cc57d19    call    dword ptr [xul!_imp__MsgWaitForMultipleObjectsEx (197dc59c)]
1406905a 8945f4          mov     dword ptr [ebp-0Ch],eax

Notice the function call to MsgWaitForMultipleObjectsEx occurs indirectly; the call instruction is referencing a pointer within the xul.dll binary itself. This is the IAT entry that corresponds to that function.

Now, if I load mozdbgext, I can take the address of that IAT entry and execute the following command:


0:000> !iat 0x197dc59c
Expected target: USER32.dll!MsgWaitForMultipleObjectsEx
Actual target: USER32!MsgWaitForMultipleObjectsEx+0x0

!iat has done two things for us:

  1. It did a reverse lookup to determine the module and function name for the import that corresponds to that particular IAT entry; and
  2. It followed the IAT pointer and determined the symbol at the target address.

Normally we want both the expected and actual targets to match. If they don’t, we should investigate further, as this mismatch may indicate that the IAT has been patched by a third party.

Note that the !iat command is breakpad-aware (provided that you’ve already loaded the symbols using !bploadsyms) but can fall back to the Microsoft symbol engine as necessary.

Further note that the !iat command does not yet accept the _imp_ symbolic names for the IAT entries; you need to enter the hexadecimal representation of the pointer.

Announcing Mozdbgext


A well-known problem at Mozilla is that, while most of our desktop users run Windows, most of Mozilla’s developers do not. There are a lot of problems that result from that, but one of the most frustrating to me is that sometimes those of us that actually use Windows for development find ourselves at a disadvantage when it comes to tooling or other productivity enhancers.

In many ways this problem is also a Catch-22: People don’t want to use Windows for many reasons, but tooling is a big part of the problem. OTOH, nobody is motivated to improve the tooling situation if nobody is actually going to use those tools.

A couple of weeks ago my frustrations with the situation boiled over when I learned that our Cpp unit test suite could not log symbolicated call stacks, resulting in my filing of bug 1238305 and bug 1240605. Not only could we not log those stacks, in many situations we could not view them in a debugger either.

Due to the fact that PDB files consume a large amount of disk space, we don’t keep those when building from integration or try repositories. Unfortunately they can be quite useful to have when there is a build failure. Most of our integration builds, however, do include breakpad symbols. Developers may also explicitly request symbols for their try builds.

A couple of years ago I had begun working on a WinDbg debugger extension that was tailored to Mozilla development. It had mostly bitrotted over time, but I decided to resurrect it for a new purpose: to help WinDbg* grok breakpad.

Enter mozdbgext

mozdbgext is the result. This extension adds a few commands that make Win32 debugging with breakpad a little bit easier.

The original plan was that I wanted mozdbgext to load breakpad symbols and then insert them into the debugger’s symbol table via the IDebugSymbols3::AddSyntheticSymbol API. Unfortunately the design of this API is not well equipped for bulk loading of synthetic symbols: each individual symbol insertion causes the debugger to re-sort its entire symbol table. Since xul.dll’s quantity of symbols is in the six-figure range, using this API to load that quantity of symbols is prohibitively expensive. I tweeted a Microsoft PM who works on Debugging Tools for Windows, asking if there would be any improvements there, but it sounds like this is not going to be happening any time soon.

My original plan would have been ideal from a UX perspective: the breakpad symbols would look just like any other symbols in the debugger and could be accessed and manipulated using the same set of commands. Since synthetic symbols would not work for me in this case, I went for “Plan B:” Extension commands that are separate from, but analogous to, regular WinDbg commands.

I plan to continuously improve the commands that are available. Until I have a proper README checked in, I’ll introduce the commands here.

Loading the Extension

  1. Use the .load command: .load <path_to_mozdbgext_dll>

Loading the Breakpad Symbols

  1. Extract the breakpad symbols into a directory.
  2. In the debugger, enter !bploadsyms <path_to_breakpad_symbol_directory>
  3. Note that this command will take some time to load all the relevant symbols.

Working with Breakpad Symbols

Note: You must have successfully run the !bploadsyms command first!

As a general guide, I am attempting to name each breakpad command similarly to the native WinDbg command, except that the command name is prefixed by !bp.

  • Stack trace: !bpk
  • Find nearest symbol to address: !bpln <address> where address is specified as a hexadecimal value.

Downloading mozdbgext

I have pre-built binaries (32-bit, 64-bit) available for download.

Note that there are several other commands that are “roughed-in” at this point and do not work correctly yet. Please stick to the documented commands at this time.


* When I write “WinDbg”, I am really referring to any debugger in the Debugging Tools for Windows package, including cdb.