| <h1>Introducing the JetStream benchmark suite</h1> |
| |
| <p>Today we are introducing a new WebKit JavaScript benchmark test suite, |
| <a href="http://www.browserbench.org/JetStream">JetStream</a>. JetStream codifies what |
| our de facto process has been: to combine latency and throughput benchmarks with roughly |
| equal weighting, and to capture metrics of both traditional JavaScript programming styles and |
| new JavaScript-based technologies that have captured our imaginations. Scores on |
| JetStream are a good indicator of the performance users would see in advanced web applications |
| like games.</p> |
| |
| <p>Optimizing the performance of our JavaScript engine is a high priority for the WebKit |
| project. Examples of some of the improvements we introduced in the last year include |
| <a href="https://bugs.webkit.org/show_bug.cgi?id=112839">concurrent compilation</a>, |
| <a href="https://bugs.webkit.org/show_bug.cgi?id=121074">generational GC</a>, and the |
| <a href="https://bugs.webkit.org/show_bug.cgi?id=112840">FTL JIT</a>. Engineering such |
| improvements requires focus: we try to prioritize high-impact projects over building and |
| maintaining complex optimizations that have smaller benefits. |
| Thus, we motivate performance work with |
| benchmarks that illustrate the kinds of workloads that WebKit users will likely encounter. |
| This philosophy of benchmark-driven development has long been part of WebKit.</p> |
| |
| <h2>The previous state of JavaScript benchmarking</h2> |
| |
| <p>As we made enhancements to the WebKit JavaScript engine, we found that no single |
| benchmark suite was entirely representative of the scenarios that we wanted to improve. We |
| like <a href="https://www.webkit.org/perf/sunspider/sunspider.html">SunSpider</a> for its |
| coverage of commonly-used language constructs and for the fact that its running time is |
| representative of the running time of code on the web, but it falls short for measuring |
| peak throughput. We like <a href="https://developers.google.com/octane/">Octane</a>, but it |
| skews too far in the other direction: it's useful for determining our engine's peak |
| throughput but isn't sensitive enough to the performance you'd be most likely to see on typical |
| web workloads. It also downplays novel JavaScript technologies like asm.js; only one of |
| Octane's 15 benchmarks is an asm.js test, and this test ignores floating point |
| performance.</p> |
| |
| <p>Finding good asm.js benchmarks is difficult. Even though |
| <a href="https://github.com/kripken/emscripten">Emscripten</a> is gaining |
| mindshare, its tests are long-running and, until recently, lacked a web harness. |
| So we built our own asm.js benchmarks by using tests |
| from the <a href="http://llvm.org/">LLVM</a> |
| <a href="http://llvm.org/docs/TestingGuide.html#test-suite">test suite</a>. |
| These C and C++ tests are |
| used by LLVM developers to track performance improvements of the clang/LLVM compiler stack. |
| Emscripten itself uses LLVM to |
| generate JavaScript code. This makes the LLVM test suite particularly appropriate for testing |
| how well a JavaScript engine handles native code. Another benefit of our new tests is that they |
| are much quicker to run than the Emscripten test suite.</p> |
| |
| <p>Having good JavaScript benchmarks allows us to confidently pursue ambitious improvements to |
| WebKit. For example, SunSpider guided our |
| concurrent compilation work, while the asm.js tests and Octane's throughput tests motivated |
| our work on the FTL JIT. But allowing our testing to be based on |
| a hodgepodge of these different benchmark suites has become impractical. It's difficult |
| to tell contributors what they should be testing if there is no unified test suite |
| that can tell them if their change had the desired effect on performance. We want one test |
| suite that can report one score in the end, and we want this one score to be representative |
| of WebKit's future direction.</p> |
| |
| <h2>Designing the new JetStream benchmark suite</h2> |
| |
| <p>Different WebKit components require different approaches to measuring performance. |
| In some cases, the obvious approach works pretty well: for example, many layout and |
| rendering optimizations can be driven by measuring page load time on representative web pages. |
| But measuring the performance of a programming language implementation requires |
| more subtlety. We want to increase the benchmarks' sensitivity to core engine improvements, but |
| not so much so that we lose perspective on how those engine improvements play out in real |
| web sites. We want to minimize the opportunities for system noise to throw off our measurements, |
| but anytime a workload is inherently prone to noise, we want a benchmark to show this. |
| We want our benchmarks to represent a high-fidelity approximation of the workloads that |
| WebKit users are likely to care about.</p> |
| |
| <p>JetStream combines a range of JavaScript benchmarks, covering a variety of advanced |
| workloads and programming techniques, and reports a single score that balances them using a |
| geometric mean. Each test is run three times and scores are reported with 95% confidence |
| intervals. Each benchmark measures a distinct workload, and no single optimization technique |
| is sufficient to speed up all benchmarks. Some benchmarks demonstrate trade-offs, and aggressive |
| or specialized optimization for one benchmark might make another benchmark slower. Demonstrating |
| trade-offs is crucial for our work. As discussed in my |
| <a href="https://www.webkit.org/blog/3362/introducing-the-webkit-ftl-jit/">previous post about |
| our new JIT compiler</a>, WebKit tries to dynamically adapt to workload using different |
| execution tiers. But this is never perfect. For example, while our new FTL JIT compiler |
| gives us fantastic speed-ups on peak throughput tests, it does cause slight regressions in |
| some ramp-up tests. New optimizations for advanced language runtimes often run into such |
| trade-offs, and our goal with JetStream is to have a benchmark that informs us about |
| the trade-offs that we are making.</p> |
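| |
| <p>To make the scoring description above concrete, here is a minimal sketch of how a |
| geometric mean and a 95% confidence interval over three runs could be computed. It is an |
| illustration only, not JetStream's actual harness code: the function names, the benchmark |
| names, and the sample scores are hypothetical, and the use of a Student's t interval is an |
| assumption about how such an interval might be derived.</p> |

```javascript
// Illustrative sketch only; not taken from the JetStream harness.

// Geometric mean: exponentiate the average of the logarithms.
// A 2x change in any one benchmark moves the result by the same factor,
// no matter how large that benchmark's raw score is.
function geometricMean(values) {
  if (values.length === 0) throw new Error("need at least one value");
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
// Three runs give 2 degrees of freedom (assumption: a t-based interval).
const T_95 = { 1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776 };

// Returns the sample mean and the half-width of its 95% confidence interval.
function confidenceInterval95(samples) {
  const n = samples.length;
  const t = T_95[n - 1];
  if (t === undefined) throw new Error("unsupported sample count");
  const mean = samples.reduce((s, v) => s + v, 0) / n;
  const variance = samples.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  const halfWidth = t * Math.sqrt(variance / n);
  return { mean, halfWidth };
}

// Hypothetical scores from three runs of two made-up benchmarks.
const runs = {
  "latency-test": [100, 104, 96],
  "throughput-test": [200, 210, 190],
};

const perBenchmark = Object.values(runs).map((r) => confidenceInterval95(r).mean);
const overall = geometricMean(perBenchmark);
```

| <p>With only three runs the critical value is large (about 4.3), so a noisy workload |
| produces a visibly wide interval, which fits the goal of letting a benchmark show when it |
| is inherently prone to noise.</p> |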
| |
| <p>JetStream includes benchmarks from the SunSpider 1.0.2 and Octane 2 JavaScript |
| benchmark suites. It also includes benchmarks from the LLVM compiler open source |
| project, compiled to JavaScript using Emscripten 1.13, as well as a |
| benchmark based on the Apache Harmony open source project's HashMap, hand-translated to |
| JavaScript. More information about the benchmarks included in JetStream is |
| available on the <a href="http://www.browserbench.org/JetStream-1.0/in-depth.html">JetStream |
| In Depth</a> page.</p> |
| |
| <p>We're excited to be introducing this new benchmark. To run it, simply visit |
| <a href="http://www.browserbench.org/JetStream/">browserbench.org/JetStream</a>. You can |
| <a href="http://bugs.webkit.org/">file bugs</a> against the benchmark using WebKit's bug |
| management system under the Tools/Tests component.</p> |