<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Aaron Klotz at Mozilla]]></title>
  <link href="http://dblohm7.ca/atom.xml" rel="self"/>
  <link href="http://dblohm7.ca/"/>
  <updated>2013-06-13T11:17:01-06:00</updated>
  <id>http://dblohm7.ca/</id>
  <author>
    <name><![CDATA[Aaron Klotz]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Detecting Main Thread I/O with SPS]]></title>
    <link href="http://dblohm7.ca/blog/2013/06/12/detecting-main-thread-i-slash-o-with-sps/"/>
    <updated>2013-06-12T18:00:00-06:00</updated>
    <id>http://dblohm7.ca/blog/2013/06/12/detecting-main-thread-i-slash-o-with-sps</id>
    <content type="html"><![CDATA[<p>In <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=867762">bug 867762</a> I am landing I/O
interpose facilities for SPS. This feature will allow main thread I/O to be
displayed in the profiler UI.</p>

<p>There are a few components to this patch. Classes that implement the
<code>IOInterposerModule</code> interface are responsible for hooking into an arbitrary I/O
facility and calling into <code>IOInterposer</code> with those events.</p>

<p>The <code>IOInterposer</code> receives those events and then dispatches them to any registered
observers. For the purposes of SPS there is only one observer,
<code>ProfilerIOInterposeObserver</code>, however I wrote this keeping in mind that we
may eventually want to use the I/O interpose facilities for other purposes (such
as shutdown write poisioning).</p>

<p>For bug 867762 I have implemented two modules. <code>NSPRInterposer</code> hooks into NSPR
file I/O and generates events whenever a <code>PR_Read</code>, <code>PR_Write</code>, or <code>PR_Sync</code> occurs.
The second module, <code>SQLiteInterposer</code>, leverages the SQLite VFS that we use for
telemetry to tap into reads, writes, and fsyncs generated by SQLite. In the future
we expect to expand this into several additional modules. Some ideas include a module
that uses Event Tracing for Windows to read events from the Windows kernel and a
module that interposes calls to <code>read</code>, <code>write</code>, and <code>fsync</code> on Linux.</p>

<p>On the observer side of things, for now we simply insert a marker into the profiler
timeline whenever main thread I/O is reported. Here is a sample screenshot of the
markers in action (the pink stuff for readers who are unfamiliar with SPS markers):</p>

<p><a href="http://dblohm7.ca/images/iomarkers.png"><img src="http://dblohm7.ca/images/iomarkers.png"></a></p>

<p>In <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=867757">bug 867757</a> (under review)
this will become more sophisticated, as SPS will immediately sample the callstack of
the thread whose I/O has been intercepted. This annotated stack will be inserted
directly into the timeline.</p>

<p>There will need to be some UI changes for this to be presented in a reasonable way,
but with both existing patches applied the data is already useful. By firing up
Cleopatra and filtering on either <code>NSPRInterposer</code> or <code>SQLiteInterposer</code>, you can
isolate and view the main thread I/O stacks:</p>

<p><a href="http://dblohm7.ca/images/iofilteredstacks.png"><img src="http://dblohm7.ca/images/iofilteredstacks.png"></a></p>

<p>Hopefully these patches will prove to be beneficial to our efforts to eliminate
main thread I/O.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Plugin Hang UI on Aurora]]></title>
    <link href="http://dblohm7.ca/blog/2013/02/15/plugin-hang-ui-on-aurora/"/>
    <updated>2013-02-15T17:00:00-07:00</updated>
    <id>http://dblohm7.ca/blog/2013/02/15/plugin-hang-ui-on-aurora</id>
    <content type="html"><![CDATA[<h2>The UI Has Landed</h2>

<p>The <a href="http://dblohm7.ca/blog/2012/11/22/plugin-hang-user-interface-for-firefox/">Plugin Hang UI</a>
landed in mozilla-central in time for January&rsquo;s merge. This means that it is
now available on both Nightly and Aurora.</p>

<p>While it&rsquo;s great that this code is now available to a larger audience, there
are consequences to this. :&ndash;)</p>

<h2>Telemetry and Pref Adjustments</h2>

<p>The first (and more pleasant) consequence is that we are now receiving telemetry
about the UI&rsquo;s usage patterns. This allows us to make some adjustments as the
Plugin Hang UI gets closer to the release channel. As happy as I am that this
feature will be putting users in the driver&rsquo;s seat when dealing with hung
plugins, it&rsquo;s also important to not annoy users.</p>

<p>Initially our telemetry suggested that the Plugin Hang UI frequently appeared
but then cancelled automatically because the plugin resumed execution. This
indicated to us that we should increase the default value of the
<code>dom.ipc.plugins.hangUITimeoutSecs</code> preference (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=833560">bug 833560</a>).
There has also been some discussion about scaling the Hang UI threshold depending
on hardware performance and user behaviour. This threshold is tricky to balance;
while we want users to be able to terminate a hanging plugin, we want to provide
that feature with minimal annoyance.</p>

<h2>Crashes</h2>

<p>Another consequence of the feature landing is that we received reports from
Nightly and Aurora showing that the Plugin Hang UI was inducing crashes in the
browser itself. Unfortunately the only thing that the stack traces were telling
us was that the browser-side code that hosts out-of-process plugins was not
being cleaned up properly after forcibly terminating the plugin container. We
didn&rsquo;t have any steps to reproduce. What broke this mystery wide open was when
our crash stats were finally able to show some correlations. I learned that
nearly 50% of the crashes happened on single-core CPUs.</p>

<h3>The Problem</h3>

<p>Once I was able to test this out for myself, the issue revealed itself in short order.
Fortunately for me, even though the problem involved thread scheduling, I was still able
to reproduce it in a debugger. <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=828034#c8">This patch</a>
took care of the problem.</p>

<p>There&rsquo;s a bit of nuance to what was happening with this crash. When
<code>plugin-container.exe</code> is forcibly terminated, cleanup of the browser-facing
plugin code may be executed in one of two different sequences:</p>

<h3>Sequence 1 (Most common)</h3>

<ol>
<li><p>After terminating <code>plugin-container.exe</code>, the Plugin Hang UI posts a
<code>CleanupFromTimeout</code> work item to the main thread. Concurrently on the
I/O thread, Firefox&rsquo;s <code>RPCChannel</code> detects an error and posts a
<code>OnNotifyMaybeChannelError</code> work item.</p></li>
<li><p><code>OnNotifyMaybeChannelError</code> executes and sets the channel&rsquo;s state to
<code>ChannelError</code>. It also cleans up the <code>PluginModuleParent</code> actor and its
actors for subprotocols.</p></li>
<li><p><code>CleanupFromTimeout</code> runs and attempts to Close the channel. This
is effectively a no-op since the channel was already closed with an
error status by step 2.</p></li>
</ol>


<h3>Sequence 2 (Less common unless running on fewer cores)</h3>

<ol>
<li><p>Same as in Sequence 1.</p></li>
<li><p><code>CleanupFromTimeout</code> runs before <code>OnNotifyMaybeChannelError</code>. This
work item attempts to do a regular <code>Close</code> on the channel. Since the
channel&rsquo;s state is still set to <code>ChannelConnected</code>, the close operation
doesn&rsquo;t realize that it needs to do additonal cleanup. It performs
a clean shutdown of the RPC channel without properly cleaning up the
IPDL actors.</p></li>
<li><p><code>OnMaybeNotifyChannelError</code> runs, sees the channel is already closed
due to the activities in step 2, and does nothing.</p></li>
<li><p>A crash later occurs because the actors were never cleaned up properly.</p></li>
</ol>


<h3>Some Additional Analysis</h3>

<p>Sequence 2 cannot crash if the <code>plugin-container.exe</code> process is terminated
by the main thread. This is because <code>PluginModuleParent::ShouldContinueFromReplyTimeout</code>
returns <code>false</code> in this case and the channel&rsquo;s state is set to <code>ChannelTimeout</code> by
the time that the <code>CleanupFromTimeout</code> work item executes. This guarantees that
a full cleanup will be done by <code>CleanupFromTimeout</code>.</p>

<p>With the Plugin Hang UI, <code>plugin-container.exe</code> is not terminated by the main
thread, so the channel&rsquo;s state must be explicitly updated after termination.</p>

<h3>Solution</h3>

<p>The patch modifies the <code>CleanupFromTimeout</code> work item so that if the
plugin container was terminated outside the main thread, the channel is
explicitly closed with an error state. This ensures that the actors are
properly cleaned up.</p>

<h2>Hangs</h2>

<p>I filed <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=834127">bug 834127</a>
when QA discovered that sometimes the Plugin Hang UI was not being displayed.
I found out that Firefox was correctly spawning <code>plugin-hang-ui.exe</code>, however
it was not showing any UI.</p>

<p>This lead be back down the input queue rabbit hole that I briefly discussed
last time. I learned that on an intermittent basis, the Plugin Hang UI dialog
box was being hung up on Win32 <code>ShowWindow</code> and <code>SetFocus</code> calls. What I did
know was that my attempts to explicitly detach the Plugin Hang UI&rsquo;s thread
from the hung Firefox thread weren&rsquo;t working as well as I would have liked.</p>

<p>After <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=834127#c9">numerous attempts</a>
at fixing these issues, I determined that for the time being we will need to
rescind the owner-owned relationship between Firefox and the Plugin Hang UI
dialog. If we didn&rsquo;t do so, seemingly benign actions like calling <code>PeekMessage</code>
or passing a <code>WM_NCLBUTTONDOWN</code> message to <code>DefWindowProc</code> would be enough
to bring the Plugin Hang UI to a halt. Why is this, you ask? I decided to
peek into kernel mode to find out.</p>

<h3>Adventures in Kernel Mode</h3>

<p>Recall that the guts of <code>USER32</code> and <code>GDI32</code> were moved into kernel mode
via <code>win32k.sys</code> in the Windows NT 3.5 timeframe. It follows that any
efforts to examine Win32 internals necessitates peering into kernel mode.
I didn&rsquo;t need an elaborate remote debugging setup for my purposes, so I
used <a href="http://technet.microsoft.com/en-us/sysinternals/bb897415">Sysinternals LiveKd</a>
to take a snapshot of my kernel and debug it.</p>

<p>My workflow was essentially as follows:</p>

<ol>
<li>Plugin Hang UI gets stuck</li>
<li>Attach a debugger (I use WinDbg) to <code>plugin-hang-ui.exe</code></li>
<li>Run LiveKd</li>
<li>Type <code>!thread -t &lt;tid&gt;</code> into the kernel debugger, where <code>&lt;tid&gt;</code> is the
thread ID that is hung in the user-mode WinDbg</li>
</ol>


<p>Every kernel-mode call stack that I examined on a hung thread usually ended up going through this code path:</p>

<pre><samp>
...
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`0def3ec0 fffff800`0327c652 : fffffa80`079c43c0 fffffa80`079c43c0 00000000`00000000 fffffa80`00000008 : nt!KiSwapContext+0x7a
fffff880`0def4000 fffff800`0328da9f : fffffd54`0000000c fffffd54`000002a0 000002ac`00000000 00000804`fffffd54 : nt!KiCommitThreadWait+0x1d2
fffff880`0def4090 fffff800`03278a14 : 00000000`00000000 00000000`00000005 00000000`00000000 fffff800`03279600 : nt!KeWaitForSingleObject+0x19f
fffff880`0def4130 fffff800`03279691 : fffffa80`079c43c0 fffffa80`079c4410 00000000`00000000 00000000`00000000 : nt!KiSuspendThread+0x54
fffff880`0def4170 fffff800`0327c85d : fffffa80`079c43c0 00000000`00000000 fffff800`032789c0 00000000`00000000 : nt!KiDeliverApc+0x201
fffff880`0def41f0 fffff800`0328da9f : fffffa80`0a7510e0 fffff800`0327c26f fffffa80`00000000 fffff800`03402e80 : nt!KiCommitThreadWait+0x3dd
fffff880`0def4280 fffff960`0010d457 : fffff900`c2255200 00000000`0000000d 00000000`00000001 00000000`00000000 : nt!KeWaitForSingleObject+0x19f
fffff880`0def4320 fffff960`0010d4f1 : fffff880`0def0000 fffff900`c08e0e20 00000000`00000000 00000000`00000000 : win32k!xxxRealSleepThread+0x257
fffff880`0def43c0 fffff960`000c46d3 : 00000000`00000000 fffff900`c093ee90 00000000`00000200 00000000`00000046 : win32k!xxxSleepThread+0x59
fffff880`0def43f0 fffff960`0010c53e : fffff900`c093ee90 fffff900`c08e0e20 00000000`00000000 fffff900`c25724b0 : win32k!xxxInterSendMsgEx+0x112a
fffff880`0def4500 fffff960`000d8ccf : fffff900`c25724b0 fffff900`c25724b0 <mark>00000000`003c031a</mark> fffff900`c0800b90 : win32k!xxxSendMessageTimeout+0x1de
fffff880`0def45b0 fffff960`000dae0b : fffff960`00000006 fffff880`0def47c0 fffff960`00000000 fffff900`00000000 : win32k!xxxCalcValidRects+0x1a3
fffff880`0def46f0 fffff960`000dac3a : fffff900`00000001 fffff900`c08e0e20 fffff880`0def49b8 fffff900`00000000 : win32k!xxxEndDeferWindowPosEx+0x18f
fffff880`0def47b0 fffff960`000d6e09 : 00000000`00000000 00000000`00000001 fffff900`00000000 fffff800`00000000 : win32k!xxxSetWindowPos+0x156
fffff880`0def4830 fffff960`0007c66d : fffff900`00000000 fffff900`0004366c fffff900`00000000 fffff900`c209f410 : win32k!xxxActivateThisWindow+0x441
...
</samp></pre>


<p>The highlighted hexadecimal value above is a handle to a window with class
<code>MSCTFIME UI</code> (i.e. a Microsoft IME window) that was created by the main
UI thread in the hung Flash process. Despite my best efforts to try to
disconnect the Plugin Hang UI input queue from these threads, Windows is
insistent on sending a synchronous, inter-thread message to the IME window
on the hung Flash thread. This doesn&rsquo;t happen if we omit setting Firefox&rsquo;s
<code>HWND</code> as the Plugin Hang UI&rsquo;s owner window.</p>

<h3>Implications</h3>

<p>Now that the Plugin Hang UI specifies a <code>NULL</code> owner, Windows can no longer
guarantee that the Plugin Hang UI will always appear above the Firefox window
that was active at the time that the plugin hang occurred. Having said that,
our experiments have shown that Windows does not try to promote an unresponsive
window to the front of the Z-order.</p>

<p>In a multiple-window situation, Windows may move the other, inactive Firefox
windows ahead of the Plugin Hang UI in the Z-order, but the window that was
active when Firefox first hung does not move. I think that this is acceptable.
In fact, the same behavour would be observable if we had been able to specify
an owner.</p>

<p>If you&rsquo;re interested in helping to test this feature, please install Nightly
and try the repro from <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=834127#c0">bug 834127</a>.
When the Plugin Hang UI fires, try to move around the Plugin Hang UI as well
as the other windows on your desktop, including Firefox. If the Z-order changes
differently from the way that I have observed here, please try to generate
some steps to reproduce, drop me a line (aklotz on <code>#perf</code> or MoCo email)
and let me know what&rsquo;s going on. Thanks!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Plugin Hang User Interface for Firefox]]></title>
    <link href="http://dblohm7.ca/blog/2012/11/22/plugin-hang-user-interface-for-firefox/"/>
    <updated>2012-11-22T10:22:00-07:00</updated>
    <id>http://dblohm7.ca/blog/2012/11/22/plugin-hang-user-interface-for-firefox</id>
    <content type="html"><![CDATA[<p>My first significant project at Mozilla has been
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=805591">bug 805591</a>,
implementing a user interface to be displayed by Firefox when
an out-of-process plugin hangs. This is important because when NPAPI
plugins hang, they block the main thread in Firefox. When the main
thread is blocked, the Firefox user interface grinds to a halt.</p>

<p>The purpose of the Plugin Hang UI is twofold:</p>

<ul>
<li>To inform the user which plugin is responsible for the hang; and</li>
<li>Provide the user with an opportunity to terminate the plugin if he/she wishes.</li>
</ul>


<p>Currently this is feature is exclusive to Firefox Desktop on Windows.
When the Hang UI is invoked, you&rsquo;ll notice that Firefox has created
a new child process, plugin-hang-ui.exe. This child process will
display the Plugin Hang UI dialog box.</p>

<p><img src="http://dblohm7.ca/images/hangui.png"></p>

<p>My patch is currently under review, but the Plugin Hang UI is fully functional
in a <a href="http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/aklotz@mozilla.com-f5d8fdf4f29a/try-win32/firefox-19.0a1.en-US.win32.installer.exe">try build</a>,
so please check it out and play with it!</p>

<h2>Changes to Preferences</h2>

<ul>
<li><p><code>dom.ipc.plugins.hangUITimeoutSecs</code>: This is the number of seconds
that Firefox waits for a hung plugin before displaying the Plugin Hang UI.
Setting this value to zero disables the Plugin Hang UI.</p></li>
<li><p><code>dom.ipc.plugins.timeoutSecs</code>: This pref is still used with the Hang UI,
but its semantics change a little bit. Once the Plugin Hang UI has been
displayed, Firefox will wait <code>dom.ipc.plugins.timeoutSecs</code> seconds before
terminating the plugin automatically, even if the user did not elect to
stop the plugin using the Hang UI. This is nearly identical to the way that
Firefox behaves without the Hang UI &mdash; the only difference is that now this
timeout period doesn&rsquo;t commence until the Hang UI is displayed.</p></li>
<li><p><code>dom.ipc.plugins.hangUIMinDisplaySecs</code>: This is the minimum number
of seconds that Firefox should display the Plugin Hang UI. If
<code>dom.ipc.plugins.timeoutSecs</code> is set to a value lower than this
pref, the Hang UI is disabled. The idea here is that, if
<code>dom.ipc.plugins.timeoutSecs</code> is only set to 5 seconds to begin with,
then the plugin will already have been auto terminated before the
user has half a chance to read the Plugin Hang UI.</p></li>
</ul>


<h2>Implementation Details</h2>

<p>There are a couple of interesting things that I&rsquo;d like to point out
relating to this patch.</p>

<p>Because the Plugin Hang UI is a child process, I needed to ensure that
its window always appears in front of the hung Firefox window. I decided
that I would have Firefox send its top-level <code>HWND</code> over to the child
process so that the Hang UI could specify that handle as the owner of its
top-level window. If you&rsquo;ve seen <a href="http://blogs.msdn.com/b/oldnewthing">Raymond Chen&rsquo;s</a>
<a href="http://channel9.msdn.com/Blogs/scobleizer/Raymond-Chen-PDC-05-Talk-Five-Things-Every-Win32-Programmer-Needs-to-Know">Five Things Every Win32 Programmer Needs to Know</a>,
then you&rsquo;ll know what is coming: Deadlock!</p>

<p>The problem is that the Hang UI window and the Firefox window are
created by different threads. When I set the owner relationship
between the Hang UI top-level window and the Firefox top-level window,
the window manager implicitly synchronizes both threads&#8217; input queues.
In the case of the Plugin Hang UI, this is a huge problem: Windows just
synchronized the Hang UI&rsquo;s input queue with a hung thread!</p>

<p>What&rsquo;s the solution, then? Think about it this way: the window manager
might have implicitly synchronized the two threads&#8217; input queues, but
there&rsquo;s no reason why we can&rsquo;t then <em>explicitly</em> desynchronize them!</p>

<p>Here&rsquo;s the first thing I do in the Plugin Hang UI&rsquo;s <code>WM_INITDIALOG</code> handler:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'>      <span class="c1">// Disentangle our input queue from the hung Firefox process</span>
</span><span class='line'>      <span class="n">AttachThreadInput</span><span class="p">(</span><span class="n">GetCurrentThreadId</span><span class="p">(),</span>
</span><span class='line'>                        <span class="n">GetWindowThreadProcessId</span><span class="p">(</span><span class="n">mParentWindow</span><span class="p">,</span> <span class="n">nullptr</span><span class="p">),</span>
</span><span class='line'>                        <span class="n">FALSE</span><span class="p">);</span>
</span></code></pre></td></tr></table></div></figure>


<p>Without this line, the Hang UI is at risk of deadlock. This is
especially true if the user attempts to send input to the Firefox
window underneath.</p>
]]></content>
  </entry>
  
</feed>
