Aaron Klotz at Mozilla

My adventures as a member of Mozilla’s Platform Integration Team

New Team, New Project


In February of this year I switched teams: After 3+ years on Mozilla’s Performance Team, and after having the word “performance” in my job description in some form or another for several years prior to that, I decided that it was time for me to move on to new challenges. Fortunately the Platform org was willing to have me set up shop under the (e10s|sandboxing|platform integration) umbrella.

I am pretty excited about this new role!

My first project is to sort out the accessibility situation under Windows e10s. This started back at Mozlando last December. A number of engineers from across the Platform org, plus me, got together to brainstorm. Not too long after we had all returned home, I ended up making a suggestion on an email thread that has evolved into the core concept that I am currently attempting. As is typical at Mozilla, no deed goes unpunished, so I have been asked to flesh out my ideas. An overview of this plan is available on the wiki.

My hope is that I’ll be able to deliver a working, “version 0.9” type of demo in time for our London all-hands at the end of Q2.

Some Additional Notes

I am using this section of the blog post to make some additional notes. I don’t feel that these ideas are strong enough to commit to a wiki yet, but I do want them to be publicly available.

One concern that our colleagues at NVAccess have identified is that the current COM interfaces are too chatty; this is a major reason why screen readers frequently inject libraries into the Firefox address space. If we serve our content a11y objects as remote COM objects, there is concern that performance would suffer. This concern is not due to latency, but rather due to frequency of calls; one function call does not provide sufficient information to the a11y client. As a result, multiple round trips are required to fetch all of the information that is required for a particular DOM node.

My gut feeling is that this is probably a legitimate concern; however, we cannot make good decisions without quantifying the performance. My plan going forward is to proceed with a naïve implementation of COM remoting to start, followed by work on reducing round trips as necessary.

Smart Proxies

One idea that was discussed is the idea of the content process speculatively sending information to the chrome process that might be needed in the future. For example, if we have an IAccessible, we can expect that multiple properties will be queried off that interface. A smart proxy could ship that data across the RPC channel during marshaling so that querying that additional information does not require additional round trips.
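To make the round-trip argument concrete, here is a minimal sketch in portable C++ rather than COM. Every name in it (RemoteAccessible, SmartProxy, Snapshot, CompareRoundTrips) is hypothetical and exists only for illustration; it models each cross-process call as a counter increment and assumes the content process can bundle all properties into a single payload.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using PropertyMap = std::map<std::string, std::string>;

// Hypothetical stand-in for an accessible living in the content process.
// Each public call models one cross-process round trip.
class RemoteAccessible {
public:
  explicit RemoteAccessible(PropertyMap aProps) : mProps(std::move(aProps)) {}

  // One round trip per property: the "chatty" pattern.
  bool Get(const std::string& aName, std::string& aOut) {
    ++mRoundTrips;
    auto it = mProps.find(aName);
    if (it == mProps.end()) {
      return false;
    }
    aOut = it->second;
    return true;
  }

  // One round trip for everything: models data shipped during marshaling.
  PropertyMap Snapshot() {
    ++mRoundTrips;
    return mProps;
  }

  int RoundTrips() const { return mRoundTrips; }

private:
  PropertyMap mProps;
  int mRoundTrips = 0;
};

// Hypothetical client-side proxy: answers from the snapshot taken at
// construction and only goes remote on a cache miss.
class SmartProxy {
public:
  explicit SmartProxy(RemoteAccessible& aRemote)
    : mRemote(aRemote), mCache(aRemote.Snapshot()) {}

  bool Get(const std::string& aName, std::string& aOut) {
    auto it = mCache.find(aName);
    if (it == mCache.end()) {
      return mRemote.Get(aName, aOut);
    }
    aOut = it->second;
    return true;
  }

private:
  RemoteAccessible& mRemote;
  PropertyMap mCache;
};

// Reads the same three properties directly and through the proxy, and
// returns {direct round trips, proxied round trips}.
std::pair<int, int> CompareRoundTrips() {
  const PropertyMap props = {
      {"name", "OK"}, {"role", "button"}, {"state", "focusable"}};
  RemoteAccessible direct(props);
  RemoteAccessible proxied(props);
  SmartProxy proxy(proxied);
  for (const char* key : {"name", "role", "state"}) {
    std::string a, b;
    const bool gotA = direct.Get(key, a);
    const bool gotB = proxy.Get(key, b);
    (void)gotA;
    (void)gotB;
    assert(gotA && gotB && a == b);
  }
  return {direct.RoundTrips(), proxied.RoundTrips()};
}
```

In this model the direct path costs one trip per property while the proxied path costs a single trip up front. The sketch deliberately ignores cache invalidation, which a real smart proxy would have to handle when the underlying DOM node changes.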

COM makes this possible using “handler marshaling.” I have dug up some information about how to do this and am posting it here for posterity:

House of COM, May 1999 Microsoft Systems Journal;
Implementing and Activating a Handler with Extra Data Supplied by Server on MSDN;
Wicked Code, August 2000 MSDN Magazine. This is not available on the MSDN Magazine website but I have an archived copy on CD-ROM.

New Mozdbgext Command: !iat


As of today I have added a new command to mozdbgext: !iat.

The syntax is pretty simple:

!iat <hexadecimal address>

This address shouldn’t be just any pointer; it should be the address of an entry in the current module’s import address table (IAT). These addresses are typically very identifiable by the _imp_ prefix in their symbol names.

The purpose of this extension is to look up the name of the DLL from which the function is being imported. Furthermore, the extension compares the expected target address of the import with the actual target address of the import. This allows us to detect API hooking via IAT patching.

An Example Session

I fired up a local copy of Nightly, attached a debugger to it, and dumped the call stack of its main thread:

 # ChildEBP RetAddr
00 0018ecd0 765aa32a ntdll!NtWaitForMultipleObjects+0xc
01 0018ee64 761ec47b KERNELBASE!WaitForMultipleObjectsEx+0x10a
02 0018eecc 1406905a USER32!MsgWaitForMultipleObjectsEx+0x17b
03 0018ef18 1408e2c8 xul!mozilla::widget::WinUtils::WaitForMessage+0x5a
04 0018ef84 13fdae56 xul!nsAppShell::ProcessNextNativeEvent+0x188
05 0018ef9c 13fe3778 xul!nsBaseAppShell::DoProcessNextNativeEvent+0x36
06 0018efbc 10329001 xul!nsBaseAppShell::OnProcessNextEvent+0x158
07 0018f0e0 1038e612 xul!nsThread::ProcessNextEvent+0x401
08 0018f0fc 1095de03 xul!NS_ProcessNextEvent+0x62
09 0018f130 108e493d xul!mozilla::ipc::MessagePump::Run+0x273
0a 0018f154 108e48b2 xul!MessageLoop::RunInternal+0x4d
0b 0018f18c 108e448d xul!MessageLoop::RunHandler+0x82
0c 0018f1ac 13fe78f0 xul!MessageLoop::Run+0x1d
0d 0018f1b8 14090f07 xul!nsBaseAppShell::Run+0x50
0e 0018f1c8 1509823f xul!nsAppShell::Run+0x17
0f 0018f1e4 1514975a xul!nsAppStartup::Run+0x6f
10 0018f5e8 15146527 xul!XREMain::XRE_mainRun+0x146a
11 0018f650 1514c04a xul!XREMain::XRE_main+0x327
12 0018f768 00215c1e xul!XRE_main+0x3a
13 0018f940 00214dbd firefox!do_main+0x5ae
14 0018f9e4 0021662e firefox!NS_internal_main+0x18d
15 0018fa18 0021a269 firefox!wmain+0x12e
16 0018fa60 76e338f4 firefox!__tmainCRTStartup+0xfe
17 0018fa74 77d656c3 KERNEL32!BaseThreadInitThunk+0x24
18 0018fabc 77d6568e ntdll!__RtlUserThreadStart+0x2f
19 0018facc 00000000 ntdll!_RtlUserThreadStart+0x1b

Let us examine the code at frame 3:

14069042 6a04            push    4
14069044 68ff1d0000      push    1DFFh
14069049 8b5508          mov     edx,dword ptr [ebp+8]
1406904c 2b55f8          sub     edx,dword ptr [ebp-8]
1406904f 52              push    edx
14069050 6a00            push    0
14069052 6a00            push    0
14069054 ff159cc57d19    call    dword ptr [xul!_imp__MsgWaitForMultipleObjectsEx (197dc59c)]
1406905a 8945f4          mov     dword ptr [ebp-0Ch],eax

Notice the function call to MsgWaitForMultipleObjectsEx occurs indirectly; the call instruction is referencing a pointer within the xul.dll binary itself. This is the IAT entry that corresponds to that function.

Now, if I load mozdbgext, I can take the address of that IAT entry and execute the following command:

0:000> !iat 0x197dc59c
Expected target: USER32.dll!MsgWaitForMultipleObjectsEx
Actual target: USER32!MsgWaitForMultipleObjectsEx+0x0

!iat has done two things for us:

  1. It did a reverse lookup to determine the module and function name for the import that corresponds to that particular IAT entry; and
  2. It followed the IAT pointer and determined the symbol at the target address.

Normally we want both the expected and actual targets to match. If they don’t, we should investigate further, as this mismatch may indicate that the IAT has been patched by a third party.
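The comparison that !iat performs can be sketched with a toy model. This is not the mozdbgext source: ImportEntry, ExportMap, IatStatus and CheckImport are hypothetical names, and the addresses used with them are made up. The sketch assumes we already know which export a slot is bound to and only shows the expected-versus-actual check.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of one IAT entry: the import it is bound to, plus the
// pointer currently stored in the slot.
struct ImportEntry {
  std::string dll;
  std::string function;
  std::uintptr_t slotValue;
};

// Hypothetical export lookup table keyed by "dll!function".
using ExportMap = std::map<std::string, std::uintptr_t>;

enum class IatStatus { Match, Mismatch, UnknownImport };

IatStatus CheckImport(const ImportEntry& aEntry, const ExportMap& aExports) {
  assert(!aEntry.dll.empty() && !aEntry.function.empty());
  // "Expected target": where the named export really lives.
  auto expected = aExports.find(aEntry.dll + "!" + aEntry.function);
  if (expected == aExports.end()) {
    return IatStatus::UnknownImport;
  }
  // "Actual target": whatever the slot points at right now.
  return expected->second == aEntry.slotValue ? IatStatus::Match
                                               : IatStatus::Mismatch;
}
```

A Mismatch result corresponds to the case worth investigating: the slot no longer points at the export it was bound to, which is what IAT patching by a third party looks like.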

Note that the !iat command is breakpad aware (provided that you’ve already loaded the symbols using !bploadsyms) but can fall back to the Microsoft symbol engine as necessary.

Further note that the !iat command does not yet accept the _imp_ symbolic names for the IAT entries; you need to enter the hexadecimal representation of the pointer.

Announcing Mozdbgext


A well-known problem at Mozilla is that, while most of our desktop users run Windows, most of Mozilla’s developers do not. There are a lot of problems that result from that, but one of the most frustrating to me is that sometimes those of us that actually use Windows for development find ourselves at a disadvantage when it comes to tooling or other productivity enhancers.

In many ways this problem is also a Catch-22: People don’t want to use Windows for many reasons, but tooling is a big part of the problem. OTOH, nobody is motivated to improve the tooling if nobody is actually going to use it.

A couple of weeks ago my frustrations with the situation boiled over when I learned that our Cpp unit test suite could not log symbolicated call stacks, resulting in my filing of bug 1238305 and bug 1240605. Not only could we not log those stacks, in many situations we could not view them in a debugger either.

Because PDB files consume a large amount of disk space, we don’t keep them when building from integration or try repositories. Unfortunately they are quite useful to have when there is a build failure. Most of our integration builds, however, do include breakpad symbols. Developers may also explicitly request symbols for their try builds.

A couple of years ago I had begun working on a WinDbg debugger extension that was tailored to Mozilla development. It had mostly bitrotted over time, but I decided to resurrect it for a new purpose: to help WinDbg* grok breakpad.

Enter mozdbgext

mozdbgext is the result. This extension adds a few commands that make Win32 debugging with breakpad a little bit easier.

The original plan was that I wanted mozdbgext to load breakpad symbols and then insert them into the debugger’s symbol table via the IDebugSymbols3::AddSyntheticSymbol API. Unfortunately the design of this API is not well equipped for bulk loading of synthetic symbols: each individual symbol insertion causes the debugger to re-sort its entire symbol table. Since xul.dll’s quantity of symbols is in the six-figure range, using this API to load that quantity of symbols is prohibitively expensive. I tweeted a Microsoft PM who works on Debugging Tools for Windows, asking if there would be any improvements there, but it sounds like this is not going to be happening any time soon.

My original plan would have been ideal from a UX perspective: the breakpad symbols would look just like any other symbols in the debugger and could be accessed and manipulated using the same set of commands. Since synthetic symbols would not work for me in this case, I went for “Plan B”: extension commands that are separate from, but analogous to, regular WinDbg commands.

I plan to continuously improve the commands that are available. Until I have a proper README checked in, I’ll introduce the commands here.

Loading the Extension

  1. Use the .load command: .load <path_to_mozdbgext_dll>

Loading the Breakpad Symbols

  1. Extract the breakpad symbols into a directory.
  2. In the debugger, enter !bploadsyms <path_to_breakpad_symbol_directory>
  3. Note that this command will take some time to load all the relevant symbols.

Working with Breakpad Symbols

Note: You must have successfully run the !bploadsyms command first!

As a general guide, I am attempting to name each breakpad command similarly to the native WinDbg command, except that the command name is prefixed by !bp.

  • Stack trace: !bpk
  • Find nearest symbol to address: !bpln <address> where address is specified as a hexadecimal value.
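As a rough illustration of what a “nearest symbol” lookup involves, here is a minimal sketch using a binary search over a sorted symbol list. It is not the mozdbgext implementation: Symbol, FindNearest and Describe are hypothetical names, and the sketch assumes symbols are identified by start address only, ignoring the function sizes that breakpad symbol files also record.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Symbol {
  std::uint64_t address;  // start address of the function
  std::string name;
};

// Returns the symbol with the greatest start address <= aAddress, or
// nullptr if aAddress precedes every symbol. aSorted must be sorted by
// ascending address.
const Symbol* FindNearest(const std::vector<Symbol>& aSorted,
                          std::uint64_t aAddress) {
  assert(std::is_sorted(aSorted.begin(), aSorted.end(),
                        [](const Symbol& aLhs, const Symbol& aRhs) {
                          return aLhs.address < aRhs.address;
                        }));
  // First symbol that starts strictly after aAddress...
  auto it = std::upper_bound(
      aSorted.begin(), aSorted.end(), aAddress,
      [](std::uint64_t aAddr, const Symbol& aSym) {
        return aAddr < aSym.address;
      });
  if (it == aSorted.begin()) {
    return nullptr;
  }
  // ...so the one just before it is the nearest symbol at or below.
  return &*(it - 1);
}

// Formats the result the way debuggers usually do: name+0xoffset.
std::string Describe(const Symbol& aSym, std::uint64_t aAddress) {
  std::ostringstream out;
  out << aSym.name << "+0x" << std::hex << (aAddress - aSym.address);
  return out.str();
}
```

Because the search is logarithmic, a one-time sort of a six-figure symbol count at load time keeps each subsequent lookup cheap.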

Downloading mozdbgext

I have pre-built binaries (32-bit, 64-bit) available for download.

Note that there are several other commands that are “roughed-in” at this point and do not work correctly yet. Please stick to the documented commands at this time.

* When I write “WinDbg”, I am really referring to any debugger in the Debugging Tools for Windows package, including cdb.

Bugs From Hell: Injected Third-party Code + Detours = a Bad Time


Happy New Year!

I’m finally getting ‘round to writing about a nasty bug that I had to spend a bunch of time with in Q4 2015. It’s one of the more challenging problems that I’ve had to break and I’ve been asked a lot of questions about it. I’m talking about bug 1218473.

How This All Began

In bug 1213567 I had landed a patch to intercept calls to CreateWindowEx. This was necessary because it was apparent in that bug that window subclassing was occurring while a window was neutered (“neutering” is terminology that is specific to Mozilla’s Win32 IPC code).

While I’ll save a discussion on the specifics of window neutering for another day, for our purposes it is sufficient for me to point out that subclassing a neutered window is bad because it creates an infinite recursion scenario with window procedures that will eventually overflow the stack.

Neutering is triggered during certain types of IPC calls as soon as a message is sent to an unneutered window on the thread making the IPC call. Unfortunately in the case of bug 1213567, the message triggering the neutering was WM_CREATE. Shortly after creating that window, the code responsible would subclass said window. Since WM_CREATE had already triggered neutering, this would result in the pathological case that triggers the stack overflow.

For a fix, what I wanted to do was to prevent messages that were sent immediately during the execution of CreateWindow (such as WM_CREATE) from triggering neutering prematurely. By intercepting calls to CreateWindowEx, I could wrap those calls with a RAII object that temporarily suppresses the neutering. Since the subclassing occurs immediately after window creation, this meant that this subclassing operation was now safe.
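The RAII pattern described here can be sketched as follows. This is not the code from bug 1213567: gSuppressDepth, SuppressNeuteringScope, NeuteringAllowed and CallWithSuppression are hypothetical names, and the sketch assumes that suppression can be modelled as a per-thread nesting counter.

```cpp
#include <cassert>

// Hypothetical per-thread counter; non-zero means neutering is suppressed.
thread_local int gSuppressDepth = 0;

// RAII guard: suppression lasts exactly as long as the object is alive,
// even if the wrapped call returns early or throws.
class SuppressNeuteringScope {
public:
  SuppressNeuteringScope() { ++gSuppressDepth; }
  ~SuppressNeuteringScope() {
    assert(gSuppressDepth > 0);
    --gSuppressDepth;
  }
  SuppressNeuteringScope(const SuppressNeuteringScope&) = delete;
  SuppressNeuteringScope& operator=(const SuppressNeuteringScope&) = delete;
};

// What the message-handling code would consult before neutering a window.
bool NeuteringAllowed() { return gSuppressDepth == 0; }

// Hypothetical stand-in for the hook: runs the original function with
// suppression active and forwards its result.
template <typename CreateFn>
auto CallWithSuppression(CreateFn aOriginal) -> decltype(aOriginal()) {
  SuppressNeuteringScope scope;
  return aOriginal();
}
```

Any message dispatched while the wrapped call is on the stack sees NeuteringAllowed() return false; once the call returns, the destructor restores the previous state. Using a counter rather than a boolean keeps nested window creation safe.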

Unfortunately, shortly after landing bug 1213567, bug 1218473 was filed.

Where to Start

It wasn’t obvious where to start debugging this. While a crash spike was clearly correlated with the landing of bug 1213567, the crashes were occurring in code that had nothing to do with IPC or Win32. For example, the first stack that I looked at was js::CreateRegExpMatchResult!

When it is just not clear where to begin, I like to start by looking at our correlation data in Socorro – you’d be surprised how often they can bring problems into sharp relief!

In this case, the correlation data didn’t disappoint: there was 100% correlation with a module called _etoured.dll. There was also correlation with the presence of both NVIDIA video drivers and Intel video drivers. Clearly this was a concern only when NVIDIA Optimus technology was enabled.

I also had a pretty strong hypothesis about what _etoured.dll was: For many years, Microsoft Research has shipped a package called Detours. Detours is a library that is used for intercepting Win32 API calls. While the changelog for Detours 3.0 points out that it has “Removed [the] requirement for including detoured.dll in processes,” in previous versions of the package, this library was required to be injected into the target process.

I concluded that _etoured.dll was most likely a renamed version of detoured.dll from Detours 2.x.

Following The Trail

Now that I knew the likely culprit, I needed to know how it was getting there. During a November trip to the Mozilla Toronto office, I spent some time debugging a test laptop that was configured with Optimus.

Knowing that the presence of Detours was somehow interfering with our own API interception, I decided to find out whether it was also trying to intercept CreateWindowExW. I launched windbg, started Firefox with it, and then told it to break as soon as user32.dll was loaded:

sxe ld:user32.dll

Then I pressed F5 to resume execution. When the debugger broke again, user32 was in memory. I wanted the debugger to break as soon as CreateWindowExW was touched:

ba w 4 user32!CreateWindowExW

Once again I resumed execution. Then the debugger broke on the memory access and gave me this call stack:

mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0

This stack is a gold mine of information. In particular, it tells us the following:

  1. The offending DLLs are being injected by AppInit_DLLs (and in fact, Raymond Chen has blogged about this exact case in the past).

  2. nvinit.dll is the name of the DLL that is injected by step 1.

  3. nvinit.dll loads nvd3d9wrap.dll which then uses Detours to patch our copy of CreateWindowExW.

I then became curious as to which other functions they were patching.

Since Detours is patching executable code, we know that at some point it is going to need to call VirtualProtect to make the target code writable. In the worst case, VirtualProtect’s caller is going to pass the address of the page where the target code resides. In the best case, the caller will pass in the address of the target function itself!

I restarted windbg, but this time I set a breakpoint on VirtualProtect:

bp kernel32!VirtualProtect

I then resumed the debugger and examined the call stack every time it broke. While not every single VirtualProtect call would correspond to a detour, it would be obvious when it was, as the NVIDIA DLLs would be on the call stack.

The first time I caught a detour, I examined the address being passed to VirtualProtect: I ended up with the best possible case: the address was pointing to the actual target function! From there I was able to distill a list of other functions being hooked by the injected NVIDIA DLLs.

Putting it all Together

By this point I knew who was hooking our code and knew how it was getting there. I also noticed that CreateWindowEx is the only function that the NVIDIA DLLs and our own code were both trying to intercept. Clearly there was some kind of bad interaction occurring between the two interception mechanisms, but what was it?

I decided to go back and examine a specific crash dump. In particular, I wanted to examine three different memory locations:

  1. The first few instructions of user32!CreateWindowExW;
  2. The first few instructions of xul!CreateWindowExWHook; and
  3. The site of the call to user32!CreateWindowExW that triggered the crash.

Of those three locations, the only one that looked off was location 2:

6b1f6611 56              push    esi
6b1f6612 ff15f033e975    call    dword ptr [combase!CClassCache::CLSvrClassEntry::GetDDEInfo+0x41 (75e933f0)]
6b1f6618 c3              ret
6b1f6619 7106            jno     xul!`anonymous namespace'::CreateWindowExWHook+0x6 (6b1f6621)
xul!`anonymous namespace'::CreateWindowExWHook:
6b1f661b cc              int     3
6b1f661c cc              int     3
6b1f661d cc              int     3
6b1f661e cc              int     3
6b1f661f cc              int     3
6b1f6620 cc              int     3
6b1f6621 ff              ???

Why the hell were the first six bytes filled with breakpoint instructions?

I decided at this point to look at some source code. Fortunately Microsoft publishes the 32-bit source code for Detours, licensed for non-profit use, under the name “Detours Express.”

I found a copy of Detours Express 2.1 and checked out the code. First I wanted to know where all of these 0xcc bytes were coming from. A quick grep turned up what I was looking for:

inline PBYTE detour_gen_brk(PBYTE pbCode, PBYTE pbLimit)
{
    while (pbCode < pbLimit) {
        *pbCode++ = 0xcc;   // brk;
    }
    return pbCode;
}

Now that I knew which function was generating the int 3 instructions, I then wanted to find its callers. Soon I found:

#ifdef DETOURS_X86
    pbSrc = detour_gen_jmp_immediate(pTrampoline->rbCode + cbTarget, pTrampoline->pbRemain);
    pbSrc = detour_gen_brk(pbSrc,
                           pTrampoline->rbCode + sizeof(pTrampoline->rbCode));
#endif // DETOURS_X86

Okay, so Detours writes the breakpoints out immediately after it has written a jmp pointing to its trampoline.

Why is our hook function being trampolined?

The reason must be that our hook was installed first! Detours has detected that and has decided that the best place to trampoline to the NVIDIA hook is at the beginning of our hook function.

But Detours is using the wrong address!

We can see that because the int 3 instructions are written out at the beginning of CreateWindowExWHook, even though there should be a jmp instruction first.

Detours is calculating the wrong address to write its jmp!

Finding a Workaround

Once I knew what the problem was, I needed to know more about the why – only then would I be able to come up with a way to work around this problem.

I decided to reconstruct the scenario where both our code and Detours are trying to hook the same function, but our hook was installed first. I would then follow along through the Detours code to determine how it calculated the wrong address to install its jmp.

The first thing to keep in mind is that Mozilla’s function interception code takes advantage of hot-patch points in Windows. If the target function begins with a mov edi, edi prolog, we use a hot-patch style hook instead of a trampoline hook. I am not going to go into detail about hot-patch hooks here – the above Raymond Chen link contains enough details to answer your questions. For the purposes of this blog post, the important point is that Mozilla’s code patches the mov edi, edi, so NVIDIA’s Detours library would need to recognize and follow the jmps that our code patched in, in order to write its own jmp at CreateWindowExWHook.
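For readers who have not seen one, the byte layout of a hot-patch style hook can be sketched on a plain byte buffer. This is an illustration under stated assumptions, not Mozilla’s interceptor: IsHotPatchable and InstallHotPatch are hypothetical names, offsets are indices into the buffer rather than real addresses, and the sketch assumes the five bytes before the function are padding (nop or int 3).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the function at aCode[aFunc] starts with `mov edi, edi` (8B FF)
// and is preceded by five bytes of padding (nop 0x90 or int 3 0xCC).
bool IsHotPatchable(const std::vector<std::uint8_t>& aCode,
                    std::size_t aFunc) {
  if (aFunc < 5 || aFunc + 2 > aCode.size()) {
    return false;
  }
  if (aCode[aFunc] != 0x8B || aCode[aFunc + 1] != 0xFF) {
    return false;
  }
  for (std::size_t i = aFunc - 5; i < aFunc; ++i) {
    if (aCode[i] != 0x90 && aCode[i] != 0xCC) {
      return false;
    }
  }
  return true;
}

// Writes a near jmp (E9 rel32) to aHook into the padding, then replaces the
// two-byte mov edi, edi with a short jmp (EB F9) that lands on that near jmp.
void InstallHotPatch(std::vector<std::uint8_t>& aCode, std::size_t aFunc,
                     std::size_t aHook) {
  assert(IsHotPatchable(aCode, aFunc));
  const std::size_t jmpAt = aFunc - 5;
  // rel32 is relative to the instruction after the near jmp, which is aFunc.
  const std::int64_t rel =
      static_cast<std::int64_t>(aHook) - static_cast<std::int64_t>(aFunc);
  const std::uint32_t bits = static_cast<std::uint32_t>(rel);
  aCode[jmpAt] = 0xE9;
  for (int i = 0; i < 4; ++i) {
    aCode[jmpAt + 1 + i] = static_cast<std::uint8_t>(bits >> (8 * i));
  }
  aCode[aFunc] = 0xEB;      // jmp short
  aCode[aFunc + 1] = 0xF9;  // -7: from aFunc + 2 back to aFunc - 5
}
```

After patching, anything that wants to find the real hook has to follow two jmps: the short one at the function entry and the near one in the padding.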

Tracing through the Detours code, I found the place where it checks for a hot-patch hook and follows the jmp if necessary. While examining a function called detour_skip_jmp, I found the bug:

        pbNew = pbCode + *(INT32 *)&pbCode[1];

This code is supposed to be telling Detours where the target address of a jmp is, so that Detours can follow it. pbNew is supposed to be the target address of the jmp. pbCode is referencing the address of the beginning of the jmp instruction itself. Unfortunately, with this type of jmp instruction, target addresses are always relative to the address of the next instruction, not the current instruction! Since the current jmp instruction is five bytes long, Detours ends up writing its jmp five bytes prior to the intended target address!

I went and checked the source code for Detours Express 3.0 to see if this had been fixed, and indeed it had:

        PBYTE pbNew = pbCode + 5 + *(INT32 *)&pbCode[1];
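The difference between the two lines is easy to check on a byte buffer. The following is a minimal sketch, not Detours code: ReadRel32, FollowJmpBuggy and FollowJmpFixed are hypothetical names, and offsets are indices into the buffer rather than real addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads the little-endian rel32 operand of the E9 jmp at aCode[aJmp].
std::int32_t ReadRel32(const std::vector<std::uint8_t>& aCode,
                       std::size_t aJmp) {
  assert(aJmp + 5 <= aCode.size() && aCode[aJmp] == 0xE9);
  std::uint32_t bits = 0;
  for (int i = 0; i < 4; ++i) {
    bits |= static_cast<std::uint32_t>(aCode[aJmp + 1 + i]) << (8 * i);
  }
  return static_cast<std::int32_t>(bits);
}

// Same arithmetic as the Detours 2.x line: relative to the jmp itself.
std::size_t FollowJmpBuggy(const std::vector<std::uint8_t>& aCode,
                           std::size_t aJmp) {
  const std::int64_t target =
      static_cast<std::int64_t>(aJmp) + ReadRel32(aCode, aJmp);
  return static_cast<std::size_t>(target);
}

// Same arithmetic as the Detours 3.0 line: relative to the next
// instruction, which starts five bytes after the jmp opcode.
std::size_t FollowJmpFixed(const std::vector<std::uint8_t>& aCode,
                           std::size_t aJmp) {
  const std::int64_t target =
      static_cast<std::int64_t>(aJmp) + 5 + ReadRel32(aCode, aJmp);
  return static_cast<std::size_t>(target);
}
```

For a jmp at offset 10 whose operand is 25, the processor lands at 10 + 5 + 25 = 40, while the 2.x arithmetic yields 35. The error is always exactly five bytes short of the real target, regardless of the jump direction.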

That doesn’t do much for me right now, however, since the NVIDIA stuff is still using Detours 2.x.

In the case of Mozilla’s code, there is legitimate executable code at that incorrect address that Detours writes to. It is corrupting the last few instructions of that function, thus explaining those mysterious crashes in seemingly unrelated code.

I confirmed this by downloading the binaries from the build that was associated with the crash dump that I was analyzing. [As an aside, I should point out that you need to grab the identical binaries for this exercise; you cannot build from the same source revision and expect this to work due to variability that is introduced into builds by things like PGO.]

The five bytes preceding CreateWindowExWHook in the crash dump diverged from those same bytes in the original binaries. I could also make out that the overwritten bytes consisted of a jmp instruction.

In Summary

Let us now review what we know at this point:

  • Detours 2.x doesn’t correctly follow jmps from hot-patch hooks;
  • If Detours tries to hook something that has already been hot-patched (including legitimate hot patches from Microsoft), it will write bytes at incorrect addresses;
  • NVIDIA Optimus injects this buggy code into everybody’s address spaces via an AppInit_DLLs entry for nvinit.dll.

How can we best distill this into a suitable workaround?

One option could be to block the NVIDIA DLLs outright. In most cases this would probably be the simplest option, but I was hesitant to do so this time. I was concerned about the unintended consequences of blocking what, for better or worse, is a user-mode component of NVIDIA video drivers.

Instead I decided to take advantage of the fact that we now know how this bug is triggered. I have modified our API interception code such that if it detects the presence of NVIDIA Optimus, it disables hot-patch style hooks.

Not only will this take care of the crash spike that started when I landed bug 1213567, I also expect it to take care of other crash signatures whose relationship to this bug was not obvious.

That concludes this episode of Bugs from Hell. Until next time…

On WebExtensions


Enough has been said over the past week about WebExtensions that I wasn’t sure if I wanted to write this post. As usual, I can’t seem to help myself. Note the usual disclaimer that this is my personal opinion. Further note that I have no involvement with WebExtensions at this time, so I write this from the point of view of an observer.

API? What API?

I shall begin with the proposition that the legacy, non-jetpack environment for addons is not an API. As ridiculous as some readers might consider this to be, please humour me for a moment.

Let us go back to the acronym, “API.” Application Programming Interface. While the usage of the term “API” seems to have expanded over the years to encompass just about any type of interface whatsoever, I’d like to explore the first letter of that acronym: Application.

An Application Programming Interface is a specific type of interface that is exposed for the purposes of building applications. It typically provides a formal abstraction layer that isolates applications from the implementation details behind the lower tier(s) in the software stack. In the case of web browsers, I suggest that there are two distinct types of applications: web content, and extensions.

There is obviously a very well defined API for web content. On the other hand, I would argue that Gecko’s legacy addon environment is not an API at all! From the point of view of an extension, there is no abstraction, limited formality, and not necessarily an intention to be used by applications.

An extension is imported into Firefox with full privileges and can access whatever it wants. Does it have access to interfaces? Yes, but are those interfaces intended for applications? Some are, but many are not. The environment that Gecko currently provides for legacy addons is analogous to an operating system running every single application in kernel mode. Is that powerful? Absolutely! Is that the best thing to do for maintainability and robustness? Absolutely not!

Somewhere a line needs to be drawn to demarcate this abstraction layer and improve Gecko developers’ ability to make improvements under the hood. Last week’s announcement was an invitation to addon developers to help shape that future. Please participate and please do so constructively!

WebExtensions are not Chrome Extensions

When I first heard rumors about WebExtensions in Whistler, my source made it very clear to me that the WebExtensions initiative is not about making Chrome extensions run in Firefox. In fact, I am quite disappointed with some of the press coverage that seems to completely miss this point.

Yes, WebExtensions will be implementing some APIs to be source compatible with Chrome. That makes it easier to port a Chrome extension, but porting will still be necessary. I like the Venn Diagram concept that the WebExtensions FAQ uses: Some Chrome APIs will not be available in WebExtensions. On the other hand, WebExtensions will be providing APIs above and beyond the Chrome API set that will maintain Firefox’s legacy of extensibility.

Please try not to think of this project as Mozilla taking functionality away. In general I think it is safe to think of this as an opportunity to move that same functionality to a mechanism that is more formal and abstract.

Interesting Win32 APIs


Yesterday I decided to diff the export tables of some core Win32 DLLs to see what’s changed between Windows 8.1 and the Windows 10 technical preview. There weren’t many changes, but the ones that were there are quite exciting IMHO. While researching these new APIs, I also stumbled across some others that were added during the Windows 8 timeframe that we should be considering as well.

Volatile Ranges

While my diff showed these APIs as new exports for Windows 10, the MSDN docs claim that these APIs are actually new for the Windows 8.1 Update. Using the OfferVirtualMemory and ReclaimVirtualMemory functions, we can now specify ranges of virtual memory that are safe to discard under memory pressure. Later on, should we request that access be restored to that memory, the kernel will either return that virtual memory to us unmodified, or advise us that the associated pages have been discarded.

A couple of years ago we had an intern on the Perf Team who was working on bringing this capability to Linux. I am pleasantly surprised that this is now offered on Windows.

madvise(MADV_WILLNEED) for Win32

For the longest time we have been hacking around the absence of a madvise-like API on Win32. On Linux we will do a madvise(MADV_WILLNEED) on memory-mapped files when we want the kernel to read ahead. On Win32, we were opening the backing file and then doing a series of sequential reads through the file to force the kernel to cache the file data. As of Windows 8, we can now call PrefetchVirtualMemory for a similar effect.
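For reference, the POSIX half of this comparison looks roughly like the sketch below. It is an illustration, not Gecko code: SumFileWithReadAhead is a hypothetical name, and the byte sum is there only so that the mapping is actually read. The sketch assumes a POSIX system that provides madvise and treats the advice as best-effort.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps aPath read-only, asks the kernel to start reading it ahead, then
// returns the sum of the file's bytes. Returns -1 on failure or if the
// file is empty.
long SumFileWithReadAhead(const char* aPath) {
  assert(aPath != nullptr);
  const int fd = open(aPath, O_RDONLY);
  if (fd < 0) {
    return -1;
  }
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return -1;
  }
  const std::size_t len = static_cast<std::size_t>(st.st_size);
  void* base = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  if (base == MAP_FAILED) {
    return -1;
  }
  // Advisory only: if the hint is rejected we simply fault pages in on
  // demand, so the return value is intentionally ignored.
  (void)madvise(base, len, MADV_WILLNEED);

  long sum = 0;
  const unsigned char* bytes = static_cast<const unsigned char*>(base);
  for (std::size_t i = 0; i < len; ++i) {
    sum += bytes[i];
  }
  munmap(base, len);
  return sum;
}
```

The point of the hint is that the kernel can issue the disk reads asynchronously before the loop touches the pages, instead of the caller forcing them with explicit sequential reads.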

Operation Recorder: An API for SuperFetch

The OperationStart and OperationEnd APIs are intended to record access patterns during a file I/O operation. SuperFetch will then create prefetch files for the operation, enabling prefetch capabilities above and beyond the use case of initial process startup.

Memory Pressure Notifications

This API is not actually new, but I couldn’t find any invocations of it in the Mozilla codebase. CreateMemoryResourceNotification allocates a kernel handle that becomes signalled when physical memory is running low. Gecko already has facilities for handling memory pressure events on other platforms, so we should probably add this to the Win32 port.

WaitMessage Considered Harmful


I could apologize for the clickbaity title, but I won’t. I have no shame.

Today I want to talk about some code that we imported from Chromium some time ago. I replaced it in Mozilla’s codebase a few months back in bug 1072752:

(message_pump_win.cc)
    // A WM_* message is available.
    // If a parent child relationship exists between windows across threads
    // then their thread inputs are implicitly attached.
    // This causes the MsgWaitForMultipleObjectsEx API to return indicating
    // that messages are ready for processing (Specifically, mouse messages
    // intended for the child window may appear if the child window has
    // capture).
    // The subsequent PeekMessages call may fail to return any messages thus
    // causing us to enter a tight loop at times.
    // The WaitMessage call below is a workaround to give the child window
    // some time to process its input messages.
    MSG msg = {0};
    DWORD queue_status = GetQueueStatus(QS_MOUSE);
    if (HIWORD(queue_status) & QS_MOUSE &&

This code is wrong. Very wrong.

Let us start with the calls to GetQueueStatus and PeekMessage. Those APIs mark any messages already in the thread’s message queue as having been seen, such that they are no longer considered “new.” Even though those function calls do not remove messages from the queue, any messages that were in the queue at this point are considered to be “old.”

The logic in this code snippet is essentially saying, “if the queue contains mouse messages that do not belong to this thread, then they must belong to an attached thread.” The code then calls WaitMessage in an effort to give the other thread(s) a chance to process their mouse messages. This is where the code goes off the rails.

The documentation for WaitMessage states the following:

Note that WaitMessage does not return if there is unread input in the message queue after the thread has called a function to check the queue. This is because functions such as PeekMessage, GetMessage, GetQueueStatus, WaitMessage, MsgWaitForMultipleObjects, and MsgWaitForMultipleObjectsEx check the queue and then change the state information for the queue so that the input is no longer considered new. A subsequent call to WaitMessage will not return until new input of the specified type arrives. The existing unread input (received prior to the last time the thread checked the queue) is ignored.

WaitMessage will only return if there is a new (as opposed to any) message in the queue for the calling thread. Any messages for the calling thread that were already in there at the time of the GetQueueStatus and PeekMessage calls are no longer new, so they are ignored.
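The “new” versus “old” distinction can be illustrated with a toy model. This is emphatically not how the Win32 message queue is implemented: ToyQueue and its methods are hypothetical, and the model assumes only the behavior quoted from the documentation above, namely that checking the queue clears the “new” state and that WaitMessage only wakes for new input.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of a thread's message queue with a per-message "new" flag.
class ToyQueue {
public:
  void Post(int aMsg) { mMessages.push_back({aMsg, true}); }

  // Models PeekMessage(PM_NOREMOVE) or GetQueueStatus: reports whether
  // anything is queued and marks everything as already seen.
  bool Check() {
    for (auto& entry : mMessages) {
      entry.isNew = false;
    }
    return !mMessages.empty();
  }

  // Models WaitMessage: it would return only if unseen input exists.
  bool WaitWouldReturn() const {
    for (const auto& entry : mMessages) {
      if (entry.isNew) {
        return true;
      }
    }
    return false;
  }

  // Models GetMessage: removes and returns the message at the head.
  int Take() {
    assert(!mMessages.empty());
    const int msg = mMessages.front().msg;
    mMessages.pop_front();
    return msg;
  }

  std::size_t Size() const { return mMessages.size(); }

private:
  struct Entry {
    int msg;
    bool isNew;
  };
  std::deque<Entry> mMessages;
};
```

In this model, posting a message, calling Check(), and then asking WaitWouldReturn() yields false even though Size() is still 1. That is the state the snippet puts the thread into: work is queued, but the wait will not end until something else arrives.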

There might very well be a message at the head of that queue that should be processed by the current thread. Instead it is ignored while we wait for other threads. Here is the crux of the problem: we’re waiting on other threads whose input queues are attached to our own! That other thread can’t process its messages because our thread has messages in front of its messages; on the other hand, our thread has blocked itself!

The only way to break this deadlock is for new messages to be added to the queue. That is a big reason why we’re seeing things like bug 1105386: Moving the mouse adds new messages to the queue, making WaitMessage unblock.
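To make the “new versus old input” rule concrete, here is a small, portable C++ model of it. This is only a sketch: `ModelQueue`, `Post`, `CheckQueue`, and `WaitMessageWouldReturn` are hypothetical names invented for illustration, and the real bookkeeping inside Windows is certainly more involved. The model captures just the documented behavior quoted above: checking the queue removes nothing but marks everything as seen, and a wait only returns for input that arrived afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a thread's input queue. This is NOT how Windows implements it;
// it only mimics the documented "new vs. old input" rule.
struct ModelQueue {
  struct Entry {
    int message;  // hypothetical message identifier
    bool is_new;  // becomes false once a queue-checking call has seen it
  };
  std::vector<Entry> entries;

  // A message arrives; it counts as new until the queue is next checked.
  void Post(int message) { entries.push_back({message, true}); }

  // Stands in for GetQueueStatus or a non-removing PeekMessage: nothing is
  // removed, but every queued message is now considered old.
  std::size_t CheckQueue() {
    for (Entry& e : entries) e.is_new = false;
    return entries.size();
  }

  // Stands in for WaitMessage: it would return only if new input is present.
  bool WaitMessageWouldReturn() const {
    for (const Entry& e : entries) {
      if (e.is_new) return true;
    }
    return false;
  }
};

// Replays the sequence described above. Returns true if the model stalls
// after the queue check and only wakes up once fresh input is posted.
bool StallsUntilNewInput() {
  ModelQueue q;
  q.Post(1);
  assert(q.WaitMessageWouldReturn());  // unseen input: a wait would return
  q.CheckQueue();                      // the message is still queued, but old
  const bool stalled = !q.WaitMessageWouldReturn();
  q.Post(2);                           // e.g. the user moves the mouse
  return stalled && q.WaitMessageWouldReturn();
}
```

In this model the first message never leaves the queue, yet the wait ignores it after `CheckQueue` runs. That is the same shape as the hang described above: pending work exists, but only newly arriving input can wake the thread.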

I’ve already eliminated this code in Mozilla’s codebase, but the challenge is going to be getting rid of this code in third-party binaries that attach their own windows to Firefox’s windows.

Attached Input Queues on Firefox for Windows

I’ve previously blogged indirectly about attached input queues, but today I want to address the issue directly. What once was a nuisance in the realm of plugin hangs has grown into a more serious problem in the land of OMTC and e10s.

As a brief recap for those who are not very familiar with this problem: imagine two windows, each on its own thread, forming a parent-child relationship with each other. When this situation arises, Windows implicitly attaches their input queues together and synchronizes them, putting each thread at the mercy of the other attached threads’ ability to pump messages. If one thread does something bad in its message pump, any other threads that are attached to it are likely to be affected as well.

One of the biggest annoyances is that, when it comes to knowing which threads are affected, we are essentially flying blind. There is no way to query Windows for information about attached input queues. This is unfortunate, as that information would let us analyze the state of Firefox threads’ input queues and mitigate the problem.

I had previously been working on a personal side project to make this possible, but in light of recent developments (and a tweet from bsmedberg), I decided to bring this investigation under the umbrella of my full-time job. I’m pleased to announce that I’ve finished the first cut of a utility that I call the Input Queue Visualizer, or iqvis.

iqvis consists of two components, one of which is a kernel-mode driver. This driver exposes input queue attachment data to user mode. The iqvis user-mode executable is the client that queries the driver and outputs the results. In the next section I’m going to discuss the inner workings of iqvis. Following that, I’ll discuss the results of running iqvis on an instance of Firefox.

Input Queue Visualizer Internals

First of all, let’s start off with this caveat: Nearly everything that this driver does involves undocumented APIs and data structures. Because of this, iqvis does some things that you should never do in production software.

One of the big consequences of using undocumented information is that iqvis requires pointers to very specific locations in kernel memory to accomplish things. These pointers will change every time that Windows is updated. To mitigate this, I kind of cheated: it turns out that debugging symbols exist for all of the locations that iqvis needs to access! I wrote the iqvis client to invoke the dbghelp engine to extract the pointers that I need from Windows symbols and send those values as the input to the DeviceIoControl call that triggers the data collection. Passing pointers from user mode to be accessed in kernel mode is a very dangerous thing to do (and again, I would never do it in production software), but it is damn convenient for iqvis!

Another issue is that these undocumented details change between Windows versions. The initial version of iqvis works on 64-bit Windows 8.1, but different code is required for other major releases, such as Windows 7. The iqvis driver theoretically will work on Windows 7 but I need to make a few bug fixes for that case.

So, getting those details out of the way, we can address the crux of the problem: we need to query input queue attachment information from win32k.sys, which is the driver that implements USER and GDI system calls on Windows NT systems.

In particular, the window manager maintains a linked list that describes thread attachment info as a triple that points to the “from” thread, the “to” thread, and a count. The count is necessary because the same two threads may be attached to each other multiple times. The iqvis driver walks this linked list in a thread-safe way to obtain the attachment data, and then copies it to the output buffer for the DeviceIoControl request.
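The shape of that walk can be sketched in portable C++. Everything here is an assumption made for illustration: the real win32k structure is undocumented, its fields point at thread objects rather than holding thread IDs, and the names `AttachNode`, `AttachRecord`, and `CopyAttachInfo` are hypothetical. The locking that makes the real walk thread-safe is also left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical node layout. The real list is undocumented and stores pointers
// to thread structures; plain thread IDs are used here to keep the sketch small.
struct AttachNode {
  const AttachNode* next;
  std::uint32_t from_tid;
  std::uint32_t to_tid;
  std::uint32_t count;  // how many times this pair of threads is attached
};

// Flat record, as it might be copied into a caller-supplied output buffer.
struct AttachRecord {
  std::uint32_t from_tid;
  std::uint32_t to_tid;
  std::uint32_t count;
};

// Walks the list and copies at most `capacity` records into `out`.
// Returns the total number of nodes in the list, so a caller can tell
// whether its buffer was too small. No locking is done in this sketch.
std::size_t CopyAttachInfo(const AttachNode* head, AttachRecord* out,
                           std::size_t capacity) {
  assert(out != nullptr || capacity == 0);
  std::size_t total = 0;
  for (const AttachNode* node = head; node != nullptr; node = node->next) {
    if (total < capacity) {
      out[total] = AttachRecord{node->from_tid, node->to_tid, node->count};
    }
    ++total;
  }
  return total;
}
```

Returning the full node count rather than the number of records written is a design choice made for this sketch. It lets the caller retry with a larger buffer, which is a common pattern for buffer-filling requests.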

Since iqvis involves a device driver, and since I have not digitally signed that device driver, one can’t just run iqvis and call it a day. This program won’t work unless the computer was either booted with kernel debugging enabled, or it was booted with driver signing temporarily disabled.

Running iqvis against Firefox

Today I ran iqvis using today’s Nightly 39 as well as the latest release of Flash. I also tried it with Flash Protected Mode both disabled and enabled. (Note that these examples used an older version of iqvis that outputs thread IDs in hexadecimal. The current version uses decimal for its output.)

Protected Mode Disabled

FromTID ToTID Count
ac8 df4 1

Looking up the thread IDs:

  • df4 is the Firefox main thread;
  • ac8 is the plugin-container main thread.

I think that the output from this case is pretty much what I was expecting to see. The protected mode case, however, is more interesting.

Protected Mode Enabled

FromTID ToTID Count
f8c dbc 1
794 f8c 3

Looking up the thread IDs:

  • dbc is the Firefox main thread;
  • f8c is the plugin-container main thread;
  • 794 is the Flash sandbox main thread.

Notice how Flash is attached to plugin-container, which is then attached to Firefox. Transitively, then, the Flash sandbox is effectively attached to Firefox, confirming previous hypotheses that I’ve discussed with colleagues in the past.

Also notice how the Flash sandbox attachment to plugin-container has a count of 3!
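The transitive relationship can be checked mechanically from output like the above. The following C++ sketch parses rows in the format shown (assumed here to be two hexadecimal thread IDs followed by a count, which is read as decimal) and follows the from-to links. `Attachment`, `ParseNumber`, `ParseRows`, and `AttachedTo` are hypothetical helpers written for this illustration; they are not part of iqvis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct Attachment {
  std::uint32_t from;
  std::uint32_t to;
  std::uint32_t count;
};

// Converts a whole token in the given base; rejects partial matches such as
// the "FromTID" header, whose leading 'F' alone would parse as a hex digit.
bool ParseNumber(const std::string& token, int base, std::uint32_t* out) {
  assert(out != nullptr);
  try {
    std::size_t used = 0;
    const unsigned long value = std::stoul(token, &used, base);
    if (used != token.size()) return false;
    *out = static_cast<std::uint32_t>(value);
    return true;
  } catch (const std::exception&) {
    return false;  // empty, non-numeric, or out-of-range token
  }
}

// Parses rows such as "f8c dbc 1". Lines that do not fit (including the
// header) are skipped. Assumption: thread IDs are hex, the count is decimal.
std::vector<Attachment> ParseRows(const std::string& text) {
  std::vector<Attachment> rows;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string from, to, count;
    Attachment a{};
    if ((fields >> from >> to >> count) && ParseNumber(from, 16, &a.from) &&
        ParseNumber(to, 16, &a.to) && ParseNumber(count, 10, &a.count)) {
      rows.push_back(a);
    }
  }
  return rows;
}

// Returns every thread reachable from `start` by following from -> to links.
std::set<std::uint32_t> AttachedTo(const std::vector<Attachment>& rows,
                                   std::uint32_t start) {
  std::set<std::uint32_t> seen;
  std::vector<std::uint32_t> pending{start};
  while (!pending.empty()) {
    const std::uint32_t tid = pending.back();
    pending.pop_back();
    for (const Attachment& a : rows) {
      if (a.from == tid && seen.insert(a.to).second) pending.push_back(a.to);
    }
  }
  return seen;
}
```

Fed the Protected Mode output, `AttachedTo(rows, 0x794)` yields both `f8c` and `dbc`: the Flash sandbox thread reaches the Firefox main thread by way of plugin-container.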

In Conclusion

In my opinion, my Input Queue Visualizer has already yielded some very interesting data. Hopefully this will help us to troubleshoot our issues in the future. Oh, and the code is up on GitHub! It’s poorly documented at the moment, but just remember to only try running it on 64-bit Windows 8.1 for the time being!

Profile Unlocking in Firefox 34 for Windows

Today’s Nightly 34 build includes the work I did for bug 286355: a profile unlocker for our Windows users. This should be very helpful to those users whose workflow is interrupted by a Firefox instance that cannot start because a previous Firefox instance has not finished shutting down.

Firefox 34 users running Windows Vista or newer will now be presented with this dialog box:

Clicking “Close Firefox” will terminate that previous instance and proceed with starting your new Firefox instance.

Unfortunately this feature is not available to Windows XP users. To support this feature on Windows XP we would need to call undocumented API functions. I prefer to avoid calling undocumented APIs when writing production software due to the potential stability and compatibility issues that can arise from doing so.

While this feature adds some convenience to an otherwise annoying issue, please be assured that the Desktop Performance Team will continue to investigate and fix the root causes of long shutdowns so that a profile unlocker hopefully becomes unnecessary.