blob: d34ba6c96165154192c2f5590116eec10c541436 [file] [log] [blame]
<h1>Introducing the JetStream benchmark suite</h1>
<p>Today we are introducing a new WebKit JavaScript benchmark test suite,
<a href="http://www.browserbench.org/JetStream">JetStream</a>. JetStream codifies what
our de facto process has been &mdash; to combine latency and throughput benchmarks with roughly
equal weighting, and capturing both metrics of traditional JavaScript programming styles as
well as new JavaScript-based technologies that have captured our imaginations. Scores on
JetStream are a good indicator of the performance users would see in advanced web applications
like games.</p>
<p>Optimizing the performance of our JavaScript engine is a high priority for the WebKit
project. Examples of some of the improvements we introduced in the last year include
<a href="https://bugs.webkit.org/show_bug.cgi?id=112839">concurrent compilation</a>,
<a href="https://bugs.webkit.org/show_bug.cgi?id=121074">generational GC</a>, and the
<a href="https://bugs.webkit.org/show_bug.cgi?id=112840">FTL JIT</a>. Engineering such
improvements requires focus: we try to prioritize high-impact projects over building and
maintaining complex optimizations that have smaller benefits.
Thus, we motivate performance work with
benchmarks that illustrate the kinds of workloads that WebKit users will likely encounter.
This philosophy of benchmark-driven development has long been part of WebKit.</p>
<h2>The previous state of JavaScript benchmarking</h2>
<p>As we made enhancements to the WebKit JavaScript engine, we found that no single
benchmark suite was entirely representative of the scenarios that we wanted to improve. We
like <a href="https://www.webkit.org/perf/sunspider/sunspider.html">SunSpider</a> for its
coverage of commonly-used language constructs and for the fact that its running time is
representative of the running time of code on the web, but it falls short for measuring
peak throughput. We like <a href="https://developers.google.com/octane/">Octane</a>, but it
skews too far in the other direction: it's useful for determining our engine's peak
throughput but isn't sensitive enough to the performance you'd be most likely to see on typical
web workloads. It also downplays novel JavaScript technologies like asm.js; only one of
Octane's 15 benchmarks was an asm.js test, and this test ignores floating point
performance.</p>
<p>Finding good asm.js benchmarks is difficult. Even though
<a href="https://github.com/kripken/emscripten">Emscripten</a> is gaining
mindshare, its tests are long-running and until recently, lacked a web harness.
So we built our own asm.js benchmarks by using tests
from the <a href="http://llvm.org/">LLVM</a>
<a href="http://llvm.org/docs/TestingGuide.html#test-suite">test suite</a>.
These C and C++ tests are
used by LLVM developers to track performance improvements of the clang/LLVM compiler stack.
Emscripten itself uses LLVM to
generate JavaScript code. This makes the LLVM test suite particularly appropriate for testing
how well a JavaScript engine handles native code. Another benefit of our new tests is that they
are much quicker to run than the Emscripten test suite.</p>
<p>Having good JavaScript benchmarks allows us to confidently pursue ambitious improvements to
WebKit. For example, SunSpider guided our
concurrent compilation work, while the asm.js tests and Octane's throughput tests motivated
our work on the FTL JIT. But allowing our testing to be based on
a hodgepodge of these different benchmark suites has become impractical. It's
difficult
to tell contributors what they should be testing if there is no unified test suite
that can tell them if their change had the desired effect on performance. We want one test
suite that can report one score in the end, and we want this one score to be representative
of WebKit's future direction.</p>
<h2>Designing the new JetStream benchmark suite</h2>
<p>Different WebKit components require different approaches to measuring performance.
In some cases, the obvious approach works pretty well: for example, many layout and
rendering optimizations can be driven by measuring page load time on representative web pages.
But measuring the performance of a programming language implementation requires
more subtlety. We want to increase the benchmarks' sensitivity to core engine improvements, but
not so much so that we lose perspective on how those engine improvements play out in real
web sites. We want to minimize the opportunities for system noise to throw off our measurements,
but anytime a workload is inherently prone to noise, we want a benchmark to show this.
We want our benchmarks to represent a high-fidelity approximation of the workloads that
WebKit users are likely to care about.</p>
<p>JetStream combines a variety of JavaScript benchmarks, covering a variety of advanced
workloads and programming techniques, and reports a single score that balances them using a
geometric mean. Each test is run three times and scores are reported with 95% confidence
intervals. Each benchmark measures a distinct workload, and no single optimization technique
is sufficient to speed up all benchmarks. Some benchmarks demonstrate tradeoffs, and aggressive
or specialized optimization for one benchmark might make another benchmark slower. Demonstrating
trade-offs is crucial for our work. As discussed in my
<a href="https://www.webkit.org/blog/3362/introducing-the-webkit-ftl-jit/">previous post about
our new JIT compiler</a>, WebKit tries to dynamically adapt to workload using different
execution tiers. But this is never perfect. For example, while our new FTL JIT compiler
gives us fantastic speed-ups on peak throughput tests, it does cause slight regressions in
some ramp-up tests. New optimizations for advanced language runtimes often run into such
trade-offs, and our goal with JetStream is to have a benchmark that informs us about
the trade-offs that we are making.</p>
<p>JetStream includes benchmarks from the SunSpider 1.0.2 and Octane 2 JavaScript
benchmark suites. It also includes benchmarks from the LLVM compiler open source
project, compiled to JavaScript using Emscripten 1.13. It also includes a
benchmark based on the Apache Harmony open source project's HashMap, hand-translated to
JavaScript. More information about the benchmarks included in JetStream is
available on the <a href="http://www.browserbench.org/JetStream-1.0/in-depth.html">JetStream
In Depth</a> page.</p>
<p>We're excited to be introducing this new benchmark. To run it, simply visit
<a href="http://www.browserbench.org/JetStream/">browserbench.org/JetStream</a>. You can
<a href="http://bugs.webkit.org/">file bugs</a> against the benchmark using WebKit's bug
management system under the Tools/Tests component.</p>