This week’s release of Firefox Quantum has prompted all kinds of feedback, both
positive and negative. That is not surprising to anybody – any software that
has a large number of users is going to be a topic for discussion, especially
when the release in question is undoubtedly a watershed.
While I have previously
blogged about the transition to WebExtensions, now that we have actually passed
through the cutoff for legacy extensions, I have decided to add some new
commentary on the subject.
One analogy that has been used in the discussion of the extension ecosystem is
that of kernelspace and userspace. The crux of the analogy is that Gecko is
equivalent to an operating system kernel, and thus extensions are the user-mode
programs that run atop that kernel. The argument then follows that Mozilla’s
deprecation and removal of legacy extension capabilities is akin to “breaking”
userspace. [Some people who say this are using the same tone as Linus does
whenever he eviscerates Linux developers who break userspace, which is neither
productive nor welcomed by anyone, but I digress.] Unfortunately, that analogy
simply does not map to the legacy extension model.
Legacy Extensions as Kernel Modules
The most significant problem with the userspace analogy is that legacy extensions
effectively meld with Gecko and become part of Gecko itself. If we accept the
premise that Gecko is like a monolithic OS kernel, then we must also accept that
the analogical equivalent of loading arbitrary code into that kernel is the
kernel module. Such components are loaded into the kernel and effectively become
part of it. Their code runs with full privileges. They break whenever
significant changes are made to the kernel itself.
Sound familiar?
Legacy extensions were akin to kernel modules. When there is no abstraction,
there can be no such thing as userspace. This is precisely the problem that
WebExtensions solves!
Building Out a Legacy API
Maybe somebody out there is thinking, “well what if you took all the APIs that
legacy extensions used, turned that into a ‘userspace,’ and then just left that
part alone?”
Which APIs? Where do we draw the line? Do we check the code coverage for every
legacy addon in AMO and use that to determine what to include?
Remember, there was no abstraction; installed legacy addons are fused to Gecko.
If we pledge not to touch anything that legacy addons might touch, then we
cannot touch anything at all.
Where do we go from here? Freeze an old version of Gecko and host an entire copy
of it inside web content? Compile it to WebAssembly? [Oh God, what have I done?]
If that’s not a maintenance burden, I don’t know what is!
A Kernel Analogy for WebExtensions
Another problem with the legacy-extensions-as-userspace analogy is that it leaves
awkward room for web content, whose API is abstract and well-defined. I do not
think that it is appropriate to consider web content to be equivalent to a
sandboxed application, as sandboxed applications use the same (albeit restricted)
API as normal applications. I would suggest that the presence of WebExtensions
gives us a better kernel analogy:
Gecko is the kernel;
WebExtensions are privileged user applications;
Web content runs as unprivileged user applications.
In Conclusion
Declaring that legacy extensions are userspace does not make them so. The way that
the technology actually worked defies the abstract model that the analogy
attempts to impose upon it. On the other hand, we can use the failure of that
analogy to explain why WebExtensions are important and construct an extension
ecosystem that does fit with that analogy.
For the second time since I have been at Mozilla I have encountered a situation
where hooks are called for notifications of a newly created window, but that
window has not yet been initialized properly, causing the hooks to behave badly.
The first time was inside our window neutering code in IPC, while the second
time was in our accessibility code.
Every time I have seen this, there is code that follows this pattern:
HWND hwnd = CreateWindowEx(/* ... */);
if (hwnd) {
  // Do some follow-up initialization to hwnd (using SetProp as an example):
  SetProp(hwnd, "Foo", bar);
}
This seems innocuous enough, right?
The problem is that CreateWindowEx calls hooks. If those hooks then try to do
something like GetProp(hwnd, "Foo"), that call is going to fail because the
“Foo” prop has not yet been set.
The key takeaway from this is that, if you are creating a new window, you must
do any follow-up initialization from within your window proc’s WM_CREATE
handler. This will guarantee that your window’s initialization will have
completed before any hooks are called.
You might be thinking, “But I don’t set any hooks!” While this may be true, you
must not forget about hooks set by third-party code.
“But those hooks won’t know anything about my program’s internals, right?”
Perhaps, perhaps not. But when those hooks fire, they give third-party software
the opportunity to run. In some cases, those hooks might even cause the thread
to reenter your own code. Your window had better be completely initialized
when this happens!
In the case of my latest discovery of this issue in bug 1380471, I made it
possible to use a C++11 lambda to simplify this pattern.
CreateWindowEx accepts an lpParam parameter, which is then passed to the WM_CREATE handler
as the lpCreateParams member of a CREATESTRUCT.
By setting lpParam to a pointer to a std::function<void(HWND)>, we may then
supply any callable that we wish for follow-up window initialization.
Using the previous code sample as a baseline, this allows me to revise the code
to safely set a property like this:
std::function<void(HWND)> onCreate([](HWND aHwnd) -> void {
  SetProp(aHwnd, "Foo", bar);
});

HWND hwnd = CreateWindowEx(/* ... */, &onCreate);
// At this point it is already too late to further initialize hwnd!
Note that since lpParam is always passed during WM_CREATE, which always fires
before CreateWindowEx returns, it is safe for onCreate to live on the stack.
I liked this solution for the a11y case because it preserved the locality of
the initialization code within the function that called CreateWindowEx; the
window proc for this window is implemented in another source file and the
follow-up initialization depends on the context surrounding the CreateWindowEx
call.
Speaking of window procs, here is how that window’s WM_CREATE handler invokes
the callable:
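That snippet is not reproduced here, but the mechanism is simple enough to sketch portably. The following is a minimal simulation, not the actual Gecko code: FakeWindow, FakeCreateStruct, FakeCreateWindowEx, and OnFakeWmCreate are invented stand-ins for HWND, CREATESTRUCT, CreateWindowEx, and the WM_CREATE handler. It shows the handler recovering the std::function from lpCreateParams and invoking it before "window creation" returns:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Invented stand-ins for HWND and CREATESTRUCT; this simulates the
// mechanism only, it is not the actual Gecko code.
struct FakeWindow {
  std::map<std::string, int> props;  // plays the role of SetProp/GetProp
};

struct FakeCreateStruct {
  void* lpCreateParams;  // mirrors CREATESTRUCT::lpCreateParams
};

// The "WM_CREATE handler": unpack lpCreateParams back into a std::function
// and invoke it with the newly created window.
void OnFakeWmCreate(FakeWindow* wnd, FakeCreateStruct* cs) {
  auto* onCreate =
      static_cast<std::function<void(FakeWindow*)>*>(cs->lpCreateParams);
  if (onCreate) {
    (*onCreate)(wnd);
  }
}

// A fake CreateWindowEx: "WM_CREATE" fires synchronously, before this
// function returns, which is why a stack-allocated callable is safe.
FakeWindow* FakeCreateWindowEx(void* lpParam) {
  static FakeWindow wnd;
  FakeCreateStruct cs{lpParam};
  OnFakeWmCreate(&wnd, &cs);
  return &wnd;
}
```

Because the callable runs before FakeCreateWindowEx returns, any caller sees a fully initialized window, just as described above.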
TL;DR: If you see a pattern where further initialization work is being done
on an HWND after a CreateWindowEx call, move that initialization code to your
window’s WM_CREATE handler instead.
In the past I have argued that our Nightly builds, both debug and release, should
use CRITICAL_SECTIONs (with full debug info) for our implementation of
mozilla::Mutex. I’d like to illustrate some reasons why this is so useful.
They enable more utility in WinDbg extensions
Every time you initialize a CRITICAL_SECTION, Windows inserts the CS’s
debug info into a process-wide linked list. This enables their discovery by
the Windows debugging engine, and makes the !cs, !critsec, and !locks
commands more useful.
They enable profiling of their initialization and acquisition
When the “Create user mode stack trace database” gflag is enabled, Windows
records the call stack of the thread that called InitializeCriticalSection
on that CS. Windows also records the call stack of the owning thread once
it has acquired the CS. This can be very useful for debugging deadlocks.
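For reference, that gflag can be toggled per-executable with the gflags utility that ships with Debugging Tools for Windows (the firefox.exe image name here is illustrative):

```shell
gflags /i firefox.exe +ust
```

`+ust` sets the "Create user mode stack trace database" flag for that image; `-ust` clears it again.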
They track their contention counts
Since every CS has been placed in a process-wide linked list, we may now ask
the debugger to dump statistics about every live CS in the process. In
particular, we can ask the debugger to output the contention counts for each
CS in the process. After running a workload against Nightly, we may then take
the contention output, sort it in descending order, and determine which
CRITICAL_SECTIONs are the most contended in the process.
We may then want to more closely inspect the hottest CSes to determine whether
there is anything that we can do to reduce contention and all of the extra
context switching that entails.
In Summary
When we use SRWLOCKs or initialize our CRITICAL_SECTIONs with the
CRITICAL_SECTION_NO_DEBUG_INFO flag, we are denying ourselves access to this
information. That’s fine on release builds, but on Nightly I think it is worth
having around. While I realize that most Mozilla developers have not used this
until now (otherwise I would not be writing this blog post), this rich debugger
info is one of those things that you do not miss until you do not have it.
For further reading about critical section debug info, check out
this
archived article from MSDN Magazine.
My colleague bsmedberg is going to be removing asynchronous plugin
initialization in bug 1352575. Sadly the feature never became solid enough
to remain enabled on release, so we cut our losses and cancelled the project
early in 2016. Now that code is just a bunch of dead weight. With the
deprecation of non-Flash NPAPI plugins in Firefox 52, our developers are now
working on simplifying the remaining NPAPI code as much as possible.
Obviously the removal of that code does not prevent me from discussing some of
the more interesting facets of that work.
Today I am going to talk about how async plugin init worked when web content
attempted to access a property on a plugin’s scriptable object, when that
plugin had not yet completed its asynchronous initialization.
As described on MDN,
the DOM queries a plugin for scriptability by calling NPP_GetValue with the
NPPVpluginScriptableNPObject constant. With async plugin init, we did not
return the true NPAPI scriptable object back to the DOM. Instead we returned
a surrogate object. This meant that we did not need to synchronously wait for
the plugin to initialize before returning a result back to the DOM.
If the DOM subsequently called into that surrogate object, the surrogate would
be forced to synchronize with the plugin. There was a limit on how much fakery
the async surrogate could do once the DOM needed a definitive answer – after
all, the NPAPI itself is entirely synchronous. While you may question whether
the asynchronous surrogate actually bought us any responsiveness, performance
profiles and measurements that I took at the time did indeed demonstrate that
the asynchronous surrogate did buy us enough additional concurrency to make it
worthwhile. A good number of plugin instantiations were able to complete in
time before the DOM had made a single invocation on the surrogate.
Once the surrogate object had synchronized with the plugin, it would then mostly
act as a pass-through to the plugin’s real NPAPI scriptable object, with one
notable exception: property accesses.
The reason for this is not necessarily obvious, so allow me to elaborate:
The DOM usually sets up a scriptable object as follows:
this.__proto__.__proto__.__proto__
Where this is the WebIDL object (i.e., content’s <embed> element);
Whose prototype is the NPAPI scriptable object;
Whose prototype is the shared WebIDL prototype;
Whose prototype is Object.prototype.
NPAPI is reentrant (some might say insanely reentrant). It is possible (and
indeed common) for a plugin to set properties on the WebIDL object from within
the plugin’s NPP_New.
Suppose that the DOM tries to access a property on the plugin’s WebIDL object
that is normally set by the plugin’s NPP_New. In the asynchronous case, the
plugin’s initialization might still be in progress, so that property might not
yet exist.
In the case where the property does not yet exist on the WebIDL object, JavaScript
fails to retrieve an “own” property. It then moves on to the first prototype
and attempts to resolve the property on that. As outlined above, this prototype
would actually be the async surrogate. The async surrogate would then be in a
situation where it must absolutely produce a definitive result, so this would
trigger synchronization with the plugin. At this point the plugin would be
guaranteed to have finished initializing.
Now we have a problem: JS was already examining the NPAPI scriptable object when
it blocked to synchronize with the plugin. Meanwhile, the plugin went ahead and
set properties (including the one that we’re interested in) on the WebIDL object.
By the time that JS execution resumes, it would already be looking too far up the
prototype chain to see those new properties!
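This race is easy to model with a toy prototype chain. Everything below is an invented simulation (the Obj type and Lookup function are not SpiderMonkey code); it demonstrates that a lookup resumed partway up the chain can no longer see an own property added to an object the walk has already passed:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy model of a prototype chain (invented types).
struct Obj {
  std::map<std::string, int> own;  // "own" properties
  Obj* proto = nullptr;            // next object up the chain
};

// Property lookup starting at `start`, like a simplified JS [[Get]]:
// check own properties, then walk up the prototype chain.
std::optional<int> Lookup(Obj* start, const std::string& name) {
  for (Obj* o = start; o; o = o->proto) {
    auto it = o->own.find(name);
    if (it != o->own.end()) {
      return it->second;
    }
  }
  return std::nullopt;
}
```

A walk that has already advanced to the surrogate cannot see an own property that the plugin sets on the WebIDL object afterwards, even though a fresh walk from the top can.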
The surrogate needed to be aware of this when it synchronized with the plugin
during a property access. If the plugin had already completed its initialization
(thus rendering synchronization unnecessary), the surrogate would simply pass the
property access on to the real NPAPI scriptable object. On the other hand, if a
synchronization was performed, the surrogate would first retry the WebIDL object
by querying for the WebIDL object’s “own” properties, and return the own property
if it now existed. If no own property existed on the WebIDL object, then the
surrogate would revert to its “pass through all the things” behaviour.
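That decision logic can be sketched as follows. The names are invented and plain maps stand in for the WebIDL object's own properties and the real NPAPI scriptable object; this is an illustration of the described behaviour, not the shipped code:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using Props = std::map<std::string, int>;

// If a synchronization was just performed, first retry the WebIDL object's
// own properties (the plugin may have set them during NPP_New); otherwise
// pass straight through to the real NPAPI scriptable object.
std::optional<int> SurrogateGetProperty(const Props& webidlOwn,
                                        const Props& realNpapi,
                                        const std::string& name,
                                        bool didSynchronize) {
  if (didSynchronize) {
    auto it = webidlOwn.find(name);
    if (it != webidlOwn.end()) {
      return it->second;  // own property appeared during initialization
    }
  }
  auto it = realNpapi.find(name);  // "pass through all the things"
  if (it != realNpapi.end()) {
    return it->second;
  }
  return std::nullopt;
}
```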
If I hadn’t made the asynchronous surrogate scriptable object do that, we would
have ended up with a strange situation where the DOM’s initial property access
on an embed could fail non-deterministically during page load.
That’s enough chatter for today. I enjoy blogging about my crazy hacks that make
the impossible, umm… possible, so maybe I’ll write some more of these in the
future.
In February of this year I switched teams: After 3+ years on Mozilla’s
Performance Team, and after having the word “performance” in my job description
in some form or another for several years prior to that, I decided that it was
time for me to move on to new challenges. Fortunately the Platform org was
willing to have me set up shop under the (e10s|sandboxing|platform integration)
umbrella.
I am pretty excited about this new role!
My first project is to sort out the accessibility situation under Windows e10s.
This started back at Mozlando last December. A number of engineers from across
the Platform org, plus me, got together to brainstorm. Not too long after we had
all returned home, I ended up making a suggestion on an email thread that has
evolved into the core concept that I am currently attempting. As is typical at
Mozilla, no deed goes unpunished, so I have been asked to flesh out my ideas.
An overview of this plan is available on the wiki.
My hope is that I’ll be able to deliver a working, “version 0.9” type of demo
in time for our London all-hands at the end of Q2. Hopefully we will be able to
deliver on that!
Some Additional Notes
I am using this section of the blog post to make some additional notes. I don’t
feel that these ideas are strong enough to commit to a wiki yet, but I do want
them to be publicly available.
One concern that our colleagues at NVAccess have identified is that the
current COM interfaces are too chatty; this is a major reason why screen
readers frequently inject libraries into the Firefox address space. If we serve
our content a11y objects as remote COM objects, there is concern that performance
would suffer. This concern is not due to latency, but rather due to frequency
of calls; one function call does not provide sufficient information to the a11y
client. As a result, multiple round trips are required to fetch all of the
information that is required for a particular DOM node.
My gut feeling about this is that this is probably a legitimate concern, however
we cannot make good decisions without quantifying the performance. My plan going
forward is to proceed with a naïve implementation of COM remoting to start,
followed by work on reducing round trips as necessary.
Smart Proxies
One idea that was discussed is the idea of the content process speculatively
sending information to the chrome process that might be needed in the future.
For example, if we have an IAccessible, we can expect that multiple properties
will be queried off that interface. A smart proxy could ship that data across
the RPC channel during marshaling so that querying that additional information
does not require additional round trips.
COM makes this possible using “handler marshaling.” I have dug up some
information about how to do this and am posting it here for posterity:
As of today I have added a new command to mozdbgext: !iat.
The syntax is pretty simple:
!iat <hexadecimal address>
This address shouldn’t be just any pointer; it should be the address of an
entry in the current module’s import address table (IAT). These addresses
are typically very identifiable by the _imp_ prefix in their symbol names.
The purpose of this extension is to look up the name of the DLL from whom the
function is being imported. Furthermore, the extension checks the expected
target address of the import with the actual target address of the import. This
allows us to detect API hooking via IAT patching.
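At its core, that check is just a pointer comparison. Here is a toy model (all names invented, nothing Windows-specific): an import address table slot is a function pointer, and detecting IAT patching amounts to comparing it against its expected target.

```cpp
#include <cassert>

using ImportedFn = int (*)();

int RealImport() { return 1; }    // the genuine exported function
int HookedImport() { return 2; }  // a third party's replacement

// The simulated IAT entry for this import.
ImportedFn g_iatEntry = &RealImport;

// True if the entry still points at its expected target.
bool IatEntryIsClean(const ImportedFn* entry, ImportedFn expected) {
  return *entry == expected;
}
```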
An Example Session
I fired up a local copy of Nightly, attached a debugger to it, and dumped the
call stack of its main thread:
Notice the function call to MsgWaitForMultipleObjectsEx occurs indirectly;
the call instruction is referencing a pointer within the xul.dll binary
itself. This is the IAT entry that corresponds to that function.
Now, if I load mozdbgext, I can take the address of that IAT entry and execute
the following command:
0:000> !iat 0x197dc59c
Expected target: USER32.dll!MsgWaitForMultipleObjectsEx
Actual target: USER32!MsgWaitForMultipleObjectsEx+0x0
!iat has done two things for us:
It did a reverse lookup to determine the module and function name for the
import that corresponds to that particular IAT entry; and
It followed the IAT pointer and determined the symbol at the target address.
Normally we want both the expected and actual targets to match. If they don’t,
we should investigate further, as this mismatch may indicate that the IAT has
been patched by a third party.
Note that !iat command is breakpad aware (provided that you’ve already
loaded the symbols using !bploadsyms) but can fall back to the Microsoft
symbol engine as necessary.
Further note that the !iat command does not yet accept the _imp_ symbolic
names for the IAT entries; you need to enter the hexadecimal representation of
the pointer.
A well-known problem at Mozilla is that, while most of our desktop users run
Windows, most of Mozilla’s developers do not. There are a lot of problems that
result from that, but one of the most frustrating to me is that sometimes
those of us that actually use Windows for development find ourselves at a
disadvantage when it comes to tooling or other productivity enhancers.
In many ways this problem is also a Catch-22: people don’t want to use Windows
for many reasons, but tooling is a big part of the problem. OTOH, nobody is
motivated to improve the tooling situation if nobody is actually going to
use it.
A couple of weeks ago my frustrations with the situation boiled over when I
learned that our Cpp unit test suite could not log symbolicated call stacks,
resulting in my filing of bug 1238305 and bug 1240605. Not only could we
not log those stacks, in many situations we could not view them in a debugger
either.
Due to the fact that PDB files consume a large amount of disk space, we don’t
keep those when building from integration or try repositories. Unfortunately
they can be quite useful to have when there is a build failure. Most of our
integration builds, however, do include breakpad symbols. Developers may also
explicitly request symbols
for their try builds.
A couple of years ago I had begun working on a WinDbg debugger extension that
was tailored to Mozilla development. It had mostly bitrotted over time, but I
decided to resurrect it for a new purpose: to help WinDbg*
grok breakpad.
Enter mozdbgext
mozdbgext is the result. This extension
adds a few commands that makes Win32 debugging with breakpad a little bit easier.
The original plan was that I wanted mozdbgext to load breakpad symbols and then
insert them into the debugger’s symbol table via the IDebugSymbols3::AddSyntheticSymbol
API. Unfortunately the design of this API is not well equipped for bulk loading
of synthetic symbols: each individual symbol insertion causes the debugger to
re-sort its entire symbol table. Since xul.dll’s quantity of symbols is in the
six-figure range, using this API to load that quantity of symbols is
prohibitively expensive. I tweeted a Microsoft PM who works on Debugging Tools
for Windows, asking if there would be any improvements there, but it sounds like
this is not going to be happening any time soon.
My original plan would have been ideal from a UX perspective: the breakpad
symbols would look just like any other symbols in the debugger and could be
accessed and manipulated using the same set of commands. Since synthetic symbols
would not work for me in this case, I went for “Plan B:” Extension commands that
are separate from, but analogous to, regular WinDbg commands.
I plan to continuously improve the commands that are available. Until I have a
proper README checked in, I’ll introduce the commands here.
Loading the Extension
Use the .load command: .load <path_to_mozdbgext_dll>
Loading the Breakpad Symbols
Extract the breakpad symbols into a directory.
In the debugger, enter !bploadsyms <path_to_breakpad_symbol_directory>
Note that this command will take some time to load all the relevant symbols.
Working with Breakpad Symbols
Note: You must have successfully run the !bploadsyms command first!
As a general guide, I am attempting to name each breakpad command similarly to
the native WinDbg command, except that the command name is prefixed by !bp.
Stack trace: !bpk
Find nearest symbol to address: !bpln <address> where address is specified
as a hexadecimal value.
Downloading mozdbgext
I have pre-built binaries (32-bit, 64-bit) available for download.
Note that there are several other commands that are “roughed-in” at this point
and do not work correctly yet. Please stick to the documented commands at this
time.
* When I write “WinDbg”, I am really
referring to any debugger in the Debugging Tools for Windows package,
including cdb.
I’m finally getting ‘round to writing about a nasty bug that I had to spend a
bunch of time with in Q4 2015. It’s one of the more challenging problems that
I’ve had to break and I’ve been asked a lot of questions about it. I’m talking
about bug 1218473.
How This All Began
In bug 1213567 I had landed a patch to intercept calls to CreateWindowEx.
This was necessary because it was apparent in that bug that window subclassing
was occurring while a window was neutered (“neutering” is terminology that is
specific to Mozilla’s Win32 IPC code).
While I’ll save a discussion on the specifics of window neutering for another
day, for our purposes it is sufficient for me to point out that subclassing a
neutered window is bad because it creates an infinite recursion scenario with
window procedures that will eventually overflow the stack.
Neutering is triggered during certain types of IPC calls as soon as a message is
sent to an unneutered window on the thread making the IPC call. Unfortunately in
the case of bug 1213567, the message triggering the neutering was
WM_CREATE. Shortly after creating that window, the code responsible would
subclass said window. Since WM_CREATE had already triggered neutering, this
would result in the pathological case that triggers the stack overflow.
For a fix, what I wanted to do was to prevent messages that were sent immediately
during the execution of CreateWindow (such as WM_CREATE) from triggering
neutering prematurely. By intercepting calls to CreateWindowEx, I could wrap
those calls with a RAII object that temporarily suppresses the neutering. Since
the subclassing occurs immediately after window creation, this meant that
this subclassing operation was now safe.
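Such a guard can be sketched as a simple RAII counter. This is an illustrative sketch with invented names, not the actual Mozilla IPC code:

```cpp
#include <cassert>

// While a SuppressNeuteringRegion is alive on this thread, messages would be
// allowed through without triggering neutering. (Invented names; sketch only.)
thread_local int gSuppressNeuteringDepth = 0;

struct SuppressNeuteringRegion {
  SuppressNeuteringRegion() { ++gSuppressNeuteringDepth; }
  ~SuppressNeuteringRegion() { --gSuppressNeuteringDepth; }
};

bool NeuteringSuppressed() { return gSuppressNeuteringDepth > 0; }

// Intended usage, wrapping the intercepted CreateWindowEx call:
//
//   SuppressNeuteringRegion suppress;
//   HWND hwnd = CreateWindowEx(/* ... */);  // WM_CREATE no longer neuters
```

Using a counter rather than a bool keeps the guard safe if regions ever nest.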
It wasn’t obvious where to start debugging this. While a crash spike was clearly
correlated with the landing of bug 1213567, the crashes were occurring in
code that had nothing to do with IPC or Win32. For example, the first stack that
I looked at was js::CreateRegExpMatchResult!
When it is just not clear where to begin, I like to start by looking at our
correlation data in Socorro – you’d be surprised how often they can bring
problems into sharp relief!
In this case, the correlation data didn’t disappoint: there
was 100% correlation
with a module called _etoured.dll. There was also correlation with the
presence of both NVIDIA video drivers and Intel video drivers. Clearly this
was a concern only when NVIDIA Optimus technology was enabled.
I also had a pretty strong hypothesis about what _etoured.dll was: For many
years, Microsoft Research has shipped a package called
Detours. Detours is a
library that is used for intercepting Win32 API calls. While the changelog for
Detours 3.0 points out that it has “Removed [the] requirement for including
detoured.dll in processes,” in previous versions of the package, this library
was required to be injected into the target process.
I concluded that _etoured.dll was most likely a renamed version of
detoured.dll from Detours 2.x.
Following The Trail
Now that I knew the likely culprit, I needed to know how it was getting there.
During a November trip to the Mozilla Toronto office, I spent some time
debugging a test laptop that was configured with Optimus.
Knowing that the presence of Detours was somehow interfering with our own API
interception, I decided to find out whether it was also trying to intercept
CreateWindowExW. I launched windbg, started Firefox with it, and then told
it to break as soon as user32.dll was loaded:
sxe ld:user32.dll
Then I pressed F5 to resume execution. When the debugger broke again, this
time user32 was now in memory. I wanted the debugger to break as soon as
CreateWindowExW was touched:
ba w 4 user32!CreateWindowExW
Once again I resumed execution. Then the debugger broke on the memory access and
gave me this call stack:
This stack is a gold mine of information. In particular, it tells us the
following:
1. The offending DLLs are being injected by AppInit_DLLs (and in fact, Raymond
Chen has blogged about this exact case in the past).
2. nvinit.dll is the name of the DLL that is injected by step 1.
3. nvinit.dll loads nvd3d9wrap.dll, which then uses Detours to patch
our copy of CreateWindowExW.
I then became curious as to which other functions they were patching.
Since Detours is patching executable code, we know that at some point it is
going to need to call VirtualProtect to make the target code writable. In the
worst case, VirtualProtect’s caller is going to pass the address of the page
where the target code resides. In the best case, the caller will pass in the
address of the target function itself!
I restarted windbg, but this time I set a breakpoint on VirtualProtect:
bp kernel32!VirtualProtect
I then resumed the debugger and examined the call stack every time it broke.
While not every single VirtualProtect call would correspond to a detour, it
would be obvious when it was, as the NVIDIA DLLs would be on the call stack.
The first time I caught a detour, I examined the address being passed to
VirtualProtect: I ended up with the best possible case: the address was
pointing to the actual target function! From there I was able to distill a
list of other
functions being hooked by the injected NVIDIA DLLs.
Putting it all Together
By this point I knew who was hooking our code and knew how it was getting there.
I also noticed that CreateWindowEx is the only function that the NVIDIA DLLs
and our own code were both trying to intercept. Clearly there was some kind of
bad interaction occurring between the two interception mechanisms, but what was
it?
I decided to go back and examine a
specific
crash dump. In particular, I wanted to examine three different memory locations:
1. The first few instructions of user32!CreateWindowExW;
2. The first few instructions of xul!CreateWindowExWHook; and
3. The site of the call to user32!CreateWindowExW that triggered the crash.
Of those three locations, the only one that looked off was location 2:
6b1f6611 56 push esi
6b1f6612 ff15f033e975 call dword ptr [combase!CClassCache::CLSvrClassEntry::GetDDEInfo+0x41 (75e933f0)]
6b1f6618 c3 ret
6b1f6619 7106 jno xul!`anonymous namespace'::CreateWindowExWHook+0x6 (6b1f6621)
xul!`anonymous namespace'::CreateWindowExWHook:
6b1f661b cc int 3
6b1f661c cc int 3
6b1f661d cc int 3
6b1f661e cc int 3
6b1f661f cc int 3
6b1f6620 cc int 3
6b1f6621 ff ???
Why the hell were the first six bytes filled with breakpoint instructions?
I decided at this point to look at some source code. Fortunately Microsoft
publishes the 32-bit source code for Detours, licensed for non-profit use,
under the name “Detours Express.”
I found a copy of Detours Express 2.1 and checked out the code. First I wanted
to know where all of these 0xcc bytes were coming from. A quick grep turned
up what I was looking for:
Okay, so Detours writes the breakpoints out immediately after it has written a
jmp pointing to its trampoline.
Why is our hook function being trampolined?
The reason must be because our hook was installed first! Detours has
detected that and has decided that the best place to trampoline to the NVIDIA
hook is at the beginning of our hook function.
But Detours is using the wrong address!
We can see that because the int 3 instructions are written out at the
beginning of CreateWindowExWHook, even though there should be a jmp
instruction first.
Detours is calculating the wrong address to write its jmp!
Finding a Workaround
Once I knew what the problem was, I needed to know more about the why –
only then would I be able to come up with a way to work around this problem.
I decided to reconstruct the scenario where both our code and Detours are trying
to hook the same function, but our hook was installed first. I would then
follow along through the Detours code to determine how it calculated the wrong
address to install its jmp.
The first thing to keep in mind is that Mozilla’s function interception code
takes advantage of hot-patch points
in Windows. If the target function begins with a mov edi, edi prolog, we
use a hot-patch style hook instead of a trampoline hook. I am not going to go
into detail about hot-patch hooks here – the above Raymond Chen link contains
enough details to answer your questions. For the purposes of this blog post, the
important point is that Mozilla’s code patches the mov edi, edi, so NVIDIA’s
Detours library would need to recognize and follow the jmps that our code
patched in, in order to write its own jmp at CreateWindowExWHook.
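As an aside, recognizing a hot-patchable function is simply a matter of checking for the two-byte mov edi, edi encoding (8B FF) at its entry point. A minimal sketch (the helper name is invented; Mozilla's actual interception code does considerably more than this):

```cpp
#include <cassert>
#include <cstdint>

// A hot-patchable function begins with the two-byte `mov edi, edi`
// instruction, encoded as 8B FF.
bool HasHotPatchProlog(const uint8_t* fn) {
  return fn[0] == 0x8B && fn[1] == 0xFF;
}
```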
Tracing through the Detours code, I found the place where it checks for a
hot-patch hook and follows the jmp if necessary. While examining a function
called detour_skip_jmp, I found the bug:
detours.cpp, line 124:
pbNew = pbCode + *(INT32 *)&pbCode[1];
This code is supposed to be telling Detours where the target address of a jmp
is, so that Detours can follow it. pbNew is supposed to be the target address
of the jmp. pbCode is referencing the address of the beginning of the jmp
instruction itself. Unfortunately, with this type of jmp instruction, target
addresses are always relative to the address of the next instruction, not
the current instruction! Since the current jmp instruction is five bytes
long, Detours ends up writing its jmp five bytes prior to the intended
target address!
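The arithmetic is easy to demonstrate. The following sketch (function names invented, modelled on the Detours expressions quoted here) decodes an E9-style rel32 jmp both ways; the buggy computation lands exactly five bytes short:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// For an E9-style jmp, the 32-bit displacement is relative to the *next*
// instruction (pbCode + 5), not to pbCode itself.
uint8_t* JmpTargetDetours2x(uint8_t* pbCode) {
  int32_t rel;
  std::memcpy(&rel, pbCode + 1, sizeof(rel));
  return pbCode + rel;  // the buggy computation
}

uint8_t* JmpTargetCorrect(uint8_t* pbCode) {
  int32_t rel;
  std::memcpy(&rel, pbCode + 1, sizeof(rel));
  return pbCode + 5 + rel;  // relative to the next instruction
}
```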
I went and checked the source code for Detours Express 3.0 to see if this had
been fixed, and indeed it had:
detours.cpp, line 163:
PBYTE pbNew = pbCode + 5 + *(INT32 *)&pbCode[1];
That doesn’t do much for me right now, however, since the NVIDIA stuff is still
using Detours 2.x.
In the case of Mozilla’s code, there is legitimate executable code at that
incorrect address that Detours writes to. It is corrupting the last few
instructions of that function, thus explaining those mysterious crashes in
seemingly unrelated code.
I confirmed this by downloading the binaries from the build that was associated
with the crash dump that I was analyzing. [As an aside, I should point out that
you need to grab the identical binaries for this exercise; you cannot build
from the same source revision and expect this to work due to variability that
is introduced into builds by things like PGO.]
The five bytes preceding CreateWindowExWHook in the crash dump diverged
from those same bytes in the original binaries. I could also make out that
the overwritten bytes consisted of a jmp instruction.
In Summary
Let us now review what we know at this point:
Detours 2.x doesn’t correctly follow jmps from hot-patch hooks;
If Detours tries to hook something that has already been hot-patched
(including legitimate hot patches from Microsoft), it will write bytes at
incorrect addresses;
NVIDIA Optimus injects this buggy code into everybody’s address spaces via an
AppInit_DLLs entry for nvinit.dll.
How can we best distill this into a suitable workaround?
One option could be to block the NVIDIA DLLs outright. In most cases this would
probably be the simplest option, but I was hesitant to do so this time. I was
concerned about the unintended consequences of blocking what, for better or
worse, is a user-mode component of NVIDIA video drivers.
Instead I decided to take advantage of the fact that we now know how this bug is
triggered. I have modified our API interception code such that if it detects
the presence of NVIDIA Optimus, it disables hot-patch style hooks.
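A hypothetical sketch of that detection logic follows; the real change
lives in Mozilla’s interceptor code, and the single-DLL check here is
illustrative only:

```cpp
#include <windows.h>

// Illustrative sketch: if the NVIDIA Optimus component (injected via the
// AppInit_DLLs entry for nvinit.dll) is present in this process, avoid
// hot-patch style hooks entirely. With no "mov edi, edi" hook for the
// buggy Detours 2.x code to follow, the off-by-five write never happens;
// we fall back to ordinary trampoline hooks instead.
static bool OptimusPresent() {
    return GetModuleHandleW(L"nvinit.dll") != nullptr;
}

bool ShouldUseHotPatchHooks() {
    return !OptimusPresent();
}
```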
Not only will this take care of the crash spike that started when I landed
bug 1213567, but I also expect it to take care of other crash signatures
whose relationship to this bug was not obvious.
That concludes this episode of Bugs from Hell. Until next time…
There has been enough that has been said over the past week about WebExtensions
that I wasn’t sure if I wanted to write this post. As usual, I can’t seem to
help myself. Note the usual disclaimer that this is my personal opinion. Further
note that I have no involvement with WebExtensions at this time, so I write this
from the point of view of an observer.
API? What API?
I shall begin with the proposition that the legacy, non-jetpack
environment for addons is not an API. As ridiculous as some readers might
consider this to be, please humour me for a moment.
Let us go back to the acronym, “API.” Application Programming Interface.
While the usage of the term “API” seems to have expanded over the years to encompass
just about any type of interface whatsoever, I’d like to explore the first letter of that
acronym: Application.
An Application Programming Interface is a specific type of interface that is
exposed for the purposes of building applications. It typically provides a
formal abstraction layer that isolates applications from the implementation
details behind the lower tier(s) in the software stack. In the case of web
browsers, I suggest that there are two distinct types of applications:
web content, and extensions.
There is obviously a very well defined API for web content. On the other hand,
I would argue that Gecko’s legacy addon environment is not an API at all! From
the point of view of an extension, there is no abstraction, limited formality,
and not necessarily an intention to be used by applications.
An extension is imported into Firefox with full privileges and can access whatever
it wants. Does it have access to interfaces? Yes, but are those interfaces intended
for applications? Some are, but many are not. The environment that Gecko
currently provides for legacy addons is analogous to an operating system running
every single application in kernel mode. Is that powerful? Absolutely! Is that
the best thing to do for maintainability and robustness? Absolutely not!
Somewhere a line needs to be drawn to demarcate this abstraction layer and
improve Gecko developers’ ability to make improvements under the hood. Last
week’s announcement was an invitation to addon developers to help shape that
future. Please participate and please do so constructively!
WebExtensions are not Chrome Extensions
When I first heard rumors about WebExtensions in Whistler, my source made it
very clear to me that the WebExtensions initiative is not about making Chrome
extensions run in Firefox. In fact, I am quite disappointed with some of the
press coverage that seems to completely miss this point.
Yes, WebExtensions will be implementing some APIs to be source compatible
with Chrome. That makes it easier to port a Chrome extension, but porting will
still be necessary. I like the Venn Diagram concept that the WebExtensions FAQ
uses: Some Chrome APIs will not be available in WebExtensions. On the other hand,
WebExtensions will be providing APIs above and beyond the Chrome API set that
will maintain Firefox’s legacy of extensibility.
Please try not to think of this project as Mozilla taking functionality away.
In general I think it is safe to think of this as an opportunity to move that
same functionality to a mechanism that is more formal and abstract.
Yesterday I decided to diff the export tables of some core Win32 DLLs to see what’s
changed between Windows 8.1 and the Windows 10 technical preview. There weren’t
many changes, but the ones that were there are quite exciting IMHO. While
researching these new APIs, I also stumbled across some others that were
added during the Windows 8 timeframe that we should be considering as well.
Volatile Ranges
While my diff showed these APIs as new exports for Windows 10, the MSDN docs
claim that these APIs are actually new for the Windows 8.1 Update. Using the
OfferVirtualMemory
and ReclaimVirtualMemory
functions, we can now specify ranges of virtual memory that are safe to
discard under memory pressure. Later on, should we request that access be
restored to that memory, the kernel will either return that virtual memory to us
unmodified, or advise us that the associated pages have been discarded.
A couple of years ago we had an intern on the Perf Team who was working on
bringing this capability to Linux. I am pleasantly surprised that this is now
offered on Windows.
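A minimal sketch of the offer/reclaim pattern, assuming Windows 8.1 Update
or later (error handling abbreviated):

```cpp
#include <windows.h>
#include <cstdio>

int main() {
    // Commit a page-aligned cache buffer we could regenerate if needed.
    const SIZE_T size = 1 << 20;
    void* buf = VirtualAlloc(nullptr, size, MEM_RESERVE | MEM_COMMIT,
                             PAGE_READWRITE);
    if (!buf) return 1;

    // Tell the kernel these pages may be discarded under memory pressure.
    OfferVirtualMemory(buf, size, VmOfferPriorityLow);

    // Later: ask for the contents back. ERROR_SUCCESS means the pages
    // came back unmodified; otherwise they were discarded and we must
    // regenerate the data.
    DWORD rc = ReclaimVirtualMemory(buf, size);
    std::printf(rc == ERROR_SUCCESS ? "contents intact\n"
                                    : "discarded; regenerate\n");
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
```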
madvise(MADV_WILLNEED) for Win32
For the longest time we have been hacking around the absence of a madvise-like
API on Win32. On Linux we will do a madvise(MADV_WILLNEED) on memory-mapped
files when we want the kernel to read ahead. On Win32, we were opening the
backing file and then doing a series of sequential reads through the file to
force the kernel to cache the file data. As of Windows 8, we can now call
PrefetchVirtualMemory
for a similar effect.
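A sketch of how that call replaces the manual read-through, assuming a
view already created with MapViewOfFile (Windows 8 or later):

```cpp
#include <windows.h>

// Hint the memory manager to read a mapped range ahead of first touch.
// This is advisory: if the call fails, pages simply fault in on demand
// exactly as before, so the return value can safely be ignored.
void PrefetchMapping(void* base, SIZE_T length) {
    WIN32_MEMORY_RANGE_ENTRY range;
    range.VirtualAddress = base;
    range.NumberOfBytes = length;
    PrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0);
}
```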
Operation Recorder: An API for SuperFetch
The OperationStart
and OperationEnd
APIs are intended to record access patterns during a file I/O operation.
SuperFetch will then create prefetch files for the operation, enabling prefetch
capabilities above and beyond the use case of initial process startup.
Memory Pressure Notifications
This API is not actually new, but I couldn’t find any invocations of it in the
Mozilla codebase. CreateMemoryResourceNotification
allocates a kernel handle that becomes signalled when physical memory is running
low. Gecko already has facilities for handling memory pressure events on other
platforms, so we should probably add this to the Win32 port.
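A sketch of what feeding this into Gecko’s existing memory-pressure
machinery could look like; the poll-then-wait loop here is illustrative:

```cpp
#include <windows.h>
#include <cstdio>

int main() {
    // The handle becomes signalled when physical memory runs low.
    HANDLE lowMem =
        CreateMemoryResourceNotification(LowMemoryResourceNotification);
    if (!lowMem) return 1;

    // Poll the current state...
    BOOL low = FALSE;
    QueryMemoryResourceNotification(lowMem, &low);
    std::printf("low memory right now: %s\n", low ? "yes" : "no");

    // ...or block (with a timeout) until pressure occurs, then fire the
    // same memory-pressure event Gecko raises on other platforms.
    if (WaitForSingleObject(lowMem, 1000) == WAIT_OBJECT_0) {
        std::puts("memory pressure: notify observers");
    }
    CloseHandle(lowMem);
    return 0;
}
```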