Aaron Klotz’s Software Blog

My Adventures in Software Development

Asynchronous Plugin Initialization: Requiem

| Comments

My colleague bsmedberg njn is going to be removing asynchronous plugin initialization in bug 1352575. Sadly the feature never became solid enough to remain enabled on release, so we cut our losses and cancelled the project early in 2016. Now that code is just a bunch of dead weight. With the deprecation of non-Flash NPAPI plugins in Firefox 52, our developers are now working on simplifying the remaining NPAPI code as much as possible.

Obviously the removal of that code does not prevent me from discussing some of the more interesting facets of that work.

Today I am going to talk about how async plugin init worked when web content attempted to access a property on a plugin’s scriptable object, when that plugin had not yet completed its asynchronous initialization.

As described on MDN, the DOM queries a plugin for scriptability by calling NPP_GetValue with the NPPVpluginScriptableNPObject constant. With async plugin init, we did not return the true NPAPI scriptable object back to the DOM. Instead we returned a surrogate object. This meant that we did not need to synchronously wait for the plugin to initialize before returning a result back to the DOM.

If the DOM subsequently called into that surrogate object, the surrogate would be forced to synchronize with the plugin. There was a limit on how much fakery the async surrogate could do once the DOM needed a definitive answer — after all, the NPAPI itself is entirely synchronous. While you may question whether the asynchronous surrogate actually bought us any responsiveness, performance profiles and measurements that I took at the time did indeed demonstrate that the asynchronous surrogate did buy us enough additional concurrency to make it worthwhile. A good number of plugin instantiations were able to complete in time before the DOM had made a single invocation on the surrogate.

Once the surrogate object had synchronized with the plugin, it would then mostly act as a pass-through to the plugin’s real NPAPI scriptable object, with one notable exception: property accesses.

The reason for this is not necessarily obvious, so allow me to elaborate:

The DOM usually sets up a scriptable object as follows:

  • Where this is the WebIDL object (ie, content’s <embed> element);
  • Whose prototype is the NPAPI scriptable object;
  • Whose prototype is the shared WebIDL prototype;
  • Whose prototype is Object.prototype.

NPAPI is reentrant (some might say insanely reentrant). It is possible (and indeed common) for a plugin to set properties on the WebIDL object from within the plugin’s NPP_New.

Suppose that the DOM tries to access a property on the plugin’s WebIDL object that is normally set by the plugin’s NPP_New. In the asynchronous case, the plugin’s initialization might still be in progress, so that property might not yet exist.

In the case where the property does not yet exist on the WebIDL object, JavaScript fails to retrieve an “own” property. It then moves on to the first prototype and attempts to resolve the property on that. As outlined above, this prototype would actually be the async surrogate. The async surrogate would then be in a situation where it must absolutely produce a definitive result, so this would trigger synchronization with the plugin. At this point the plugin would be guaranteed to have finished initializing.

Now we have a problem: JS was already examining the NPAPI scriptable object when it blocked to synchronize with the plugin. Meanwhile, the plugin went ahead and set properties (including the one that we’re interested in) on the WebIDL object. By the time that JS execution resumes, it would already be looking too far up the prototype chain to see those new properties!

The surrogate needed to be aware of this when it synchronized with the plugin during a property access. If the plugin had already completed its initialization (thus rendering synchronization unnecessary), the surrogate would simply pass the property access on to the real NPAPI scriptable object. On the other hand, if a synchronization was performed, the surrogate would first retry the WebIDL object by querying for the WebIDL object’s “own” properties, and return the own property if it now existed. If no own property existed on the WebIDL object, then the surrogate would revert to its “pass through all the things” behaviour.

If I hadn’t made the asynchronous surrogate scriptable object do that, we would have ended up with a strange situation where the DOM’s initial property access on an embed could fail non-deterministically during page load.

That’s enough chatter for today. I enjoy blogging about my crazy hacks that make the impossible, umm… possible, so maybe I’ll write some more of these in the future.

New Team, New Project

| Comments

In February of this year I switched teams: After 3+ years on Mozilla’s Performance Team, and after having the word “performance” in my job description in some form or another for several years prior to that, I decided that it was time for me to move on to new challenges. Fortunately the Platform org was willing to have me set up shop under the (e10s|sandboxing|platform integration) umbrella.

I am pretty excited about this new role!

My first project is to sort out the accessibility situation under Windows e10s. This started back at Mozlando last December. A number of engineers from across the Platform org, plus me, got together to brainstorm. Not too long after we had all returned home, I ended up making a suggestion on an email thread that has evolved into the core concept that I am currently attempting. As is typical at Mozilla, no deed goes unpunished, so I have been asked to flesh out my ideas. An overview of this plan is available on the wiki.

My hope is that I’ll be able to deliver a working, “version 0.9” type of demo in time for our London all-hands at the end of Q2. Hopefully we will be able to deliver on that!

Some Additional Notes

I am using this section of the blog post to make some additional notes. I don’t feel that these ideas are strong enough to commit to a wiki yet, but I do want them to be publicly available.

Once concern that our colleagues at NVAccess have identified is that the current COM interfaces are too chatty; this is a major reason why screen readers frequently inject libraries into the Firefox address space. If we serve our content a11y objects as remote COM objects, there is concern that performance would suffer. This concern is not due to latency, but rather due to frequency of calls; one function call does not provide sufficient information to the a11y client. As a result, multiple round trips are required to fetch all of the information that is required for a particular DOM node.

My gut feeling about this is that this is probably a legitimate concern, however we cannot make good decisions without quantifying the performance. My plan going forward is to proceed with a naïve implementation of COM remoting to start, followed by work on reducing round trips as necessary.

Smart Proxies

One idea that was discussed is the idea of the content process speculatively sending information to the chrome process that might be needed in the future. For example, if we have an IAccessible, we can expect that multiple properties will be queried off that interface. A smart proxy could ship that data across the RPC channel during marshaling so that querying that additional information does not require additional round trips.

COM makes this possible using “handler marshaling.” I have dug up some information about how to do this and am posting it here for posterity:

House of COM, May 1999 Microsoft Systems Journal;
Implementing and Activating a Handler with Extra Data Supplied by Server on MSDN;
Wicked Code, August 2000 MSDN Magazine. This is not available on the MSDN Magazine website but I have an archived copy on CD-ROM.

New Mozdbgext Command: !iat

| Comments

As of today I have added a new command to mozdbgext: !iat.

The syntax is pretty simple:

!iat <hexadecimal address>

This address shouldn’t be just any pointer; it should be the address of an entry in the current module’s import address table (IAT). These addresses are typically very identifiable by the _imp_ prefix in their symbol names.

The purpose of this extension is to look up the name of the DLL from whom the function is being imported. Furthermore, the extension checks the expected target address of the import with the actual target address of the import. This allows us to detect API hooking via IAT patching.

An Example Session

I fired up a local copy of Nightly, attached a debugger to it, and dumped the call stack of its main thread:

 # ChildEBP RetAddr
00 0018ecd0 765aa32a ntdll!NtWaitForMultipleObjects+0xc
01 0018ee64 761ec47b KERNELBASE!WaitForMultipleObjectsEx+0x10a
02 0018eecc 1406905a USER32!MsgWaitForMultipleObjectsEx+0x17b
03 0018ef18 1408e2c8 xul!mozilla::widget::WinUtils::WaitForMessage+0x5a
04 0018ef84 13fdae56 xul!nsAppShell::ProcessNextNativeEvent+0x188
05 0018ef9c 13fe3778 xul!nsBaseAppShell::DoProcessNextNativeEvent+0x36
06 0018efbc 10329001 xul!nsBaseAppShell::OnProcessNextEvent+0x158
07 0018f0e0 1038e612 xul!nsThread::ProcessNextEvent+0x401
08 0018f0fc 1095de03 xul!NS_ProcessNextEvent+0x62
09 0018f130 108e493d xul!mozilla::ipc::MessagePump::Run+0x273
0a 0018f154 108e48b2 xul!MessageLoop::RunInternal+0x4d
0b 0018f18c 108e448d xul!MessageLoop::RunHandler+0x82
0c 0018f1ac 13fe78f0 xul!MessageLoop::Run+0x1d
0d 0018f1b8 14090f07 xul!nsBaseAppShell::Run+0x50
0e 0018f1c8 1509823f xul!nsAppShell::Run+0x17
0f 0018f1e4 1514975a xul!nsAppStartup::Run+0x6f
10 0018f5e8 15146527 xul!XREMain::XRE_mainRun+0x146a
11 0018f650 1514c04a xul!XREMain::XRE_main+0x327
12 0018f768 00215c1e xul!XRE_main+0x3a
13 0018f940 00214dbd firefox!do_main+0x5ae
14 0018f9e4 0021662e firefox!NS_internal_main+0x18d
15 0018fa18 0021a269 firefox!wmain+0x12e
16 0018fa60 76e338f4 firefox!__tmainCRTStartup+0xfe
17 0018fa74 77d656c3 KERNEL32!BaseThreadInitThunk+0x24
18 0018fabc 77d6568e ntdll!__RtlUserThreadStart+0x2f
19 0018facc 00000000 ntdll!_RtlUserThreadStart+0x1b

Let us examine the code at frame 3:

14069042 6a04            push    4
14069044 68ff1d0000      push    1DFFh
14069049 8b5508          mov     edx,dword ptr [ebp+8]
1406904c 2b55f8          sub     edx,dword ptr [ebp-8]
1406904f 52              push    edx
14069050 6a00            push    0
14069052 6a00            push    0
14069054 ff159cc57d19    call    dword ptr [xul!_imp__MsgWaitForMultipleObjectsEx (197dc59c)]
1406905a 8945f4          mov     dword ptr [ebp-0Ch],eax

Notice the function call to MsgWaitForMultipleObjectsEx occurs indirectly; the call instruction is referencing a pointer within the xul.dll binary itself. This is the IAT entry that corresponds to that function.

Now, if I load mozdbgext, I can take the address of that IAT entry and execute the following command:

0:000> !iat 0x197dc59c
Expected target: USER32.dll!MsgWaitForMultipleObjectsEx
Actual target: USER32!MsgWaitForMultipleObjectsEx+0x0

!iat has done two things for us:

  1. It did a reverse lookup to determine the module and function name for the import that corresponds to that particular IAT entry; and
  2. It followed the IAT pointer and determined the symbol at the target address.

Normally we want both the expected and actual targets to match. If they don’t, we should investigate further, as this mismatch may indicate that the IAT has been patched by a third party.

Note that !iat command is breakpad aware (provided that you’ve already loaded the symbols using !bploadsyms) but can fall back to the Microsoft symbol engine as necessary.

Further note that the !iat command does not yet accept the _imp_ symbolic names for the IAT entries, you need to enter the hexadecimal representation of the pointer.