
* Add more test cases. Categories we'd like to cover (with reasonably
  real-world tests, preferably not microbenchmarks) include:

  (X marks the ones that are fairly well covered now).

  X math (general)
  X bitops
  X 3-d (the math bits)
  - crypto / encoding
  X string processing
  - regexps
  - date processing
  - array processing
  - control flow
  - function calls / recursion
  - object access (unclear if it is possible to make a realistic
    benchmark that isolates this)

  I'd specifically like to add all the Computer Language Shootout
  tests that Mozilla is using.

* Normalize tests. Most of the available test cases have a repeat
  count of some sort, so the time they take can be tuned. The tests
  should be tuned so that each category contributes about the same
  total time, and so that each test within a category contributes
  about the same amount, as sketched below. The open question is
  which implementation should serve as the baseline. My current
  thought is to either pick a specific browser on a specific platform
  (IE 7 or Firefox 2, perhaps), or to target the average that some
  set of same-generation release browsers get on each test. The
  latter is more work. IE 7 is probably a reasonable normalization
  target: it is the latest version of the most popular browser, so
  results on this benchmark will tell you how much you stand to gain
  or lose by using a different browser.
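
  A minimal sketch of how the tuning might work, assuming we have
  per-iteration baseline timings in milliseconds for each test on the
  chosen reference browser (the function name and the 100 ms budget
  are illustrative assumptions, not part of the current harness):

      // Hypothetical helper: baselineTimings maps each test name to
      // the time (ms) one iteration takes on the baseline browser.
      function computeRepeatCounts(baselineTimings, targetMsPerTest) {
          var repeats = {};
          for (var test in baselineTimings) {
              // Use at least one repeat; round to the nearest count.
              repeats[test] = Math.max(1,
                  Math.round(targetMsPerTest / baselineTimings[test]));
          }
          return repeats;
      }

      // Example: a test taking 4 ms per iteration would get a repeat
      // count of 25 under a 100 ms per-test budget.
      var counts = computeRepeatCounts({ "some-test": 4 }, 100);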

* Instead of multiplying the standard error by the normal-distribution
  constant 1.96, the correct way to calculate a 95% confidence
  interval for a small sample is to use Student's t-distribution
  <http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically, this
  involves multiplying the standard error by a value from a two-tailed
  t-distribution table, chosen by the number of degrees of freedom,
  instead of by 1.96. A table is available at
  <http://www.medcalc.be/manual/t-distribution.php>. See the sketch
  below.
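
  A minimal sketch of the calculation (the table holds the standard
  two-tailed critical values for p = 0.05, indexed by degrees of
  freedom; the function name is illustrative):

      // Two-tailed t-distribution critical values for a 95%
      // confidence interval, indexed by degrees of freedom (entry 0
      // is unused, since at least two samples are needed; extend the
      // table for larger sample counts).
      var tTable = [NaN, 12.71, 4.30, 3.18, 2.78, 2.57,
                    2.45, 2.36, 2.31, 2.26, 2.23];

      function confidenceInterval95(samples) {
          var n = samples.length;
          var mean = 0;
          for (var i = 0; i < n; i++)
              mean += samples[i] / n;
          var sumSquares = 0;
          for (var i = 0; i < n; i++)
              sumSquares += (samples[i] - mean) * (samples[i] - mean);
          // Standard error of the mean, from the sample variance.
          var stdErr = Math.sqrt(sumSquares / (n - 1) / n);
          // Multiply by the t value for n - 1 degrees of freedom
          // instead of the normal-distribution constant 1.96.
          return { mean: mean, plusMinus: tTable[n - 1] * stdErr };
      }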

* Add support to compare two different engines (or two builds of the
  same engine) with their runs interleaved.

* Add support to compare two existing sets of saved results.

* Allow the repeat count to be controlled from the browser-hosted
  version and the WebKitTools wrapper script.

* Add support to run only a subset of the tests (in both the
  command-line and web versions).

* Add a profile mode for the command-line version that runs the tests
  repeatedly in a single interpreter instance, for ease of profiling.

* Make the browser-hosted version prettier, both in its general
  design and perhaps by using bar graphs for the output.

* Make it possible to track changes over time and to generate a graph
  per result, showing the result and its error bar for each version.

* Hook up to automated testing / buildbot infrastructure.

* Possibly... add the ability to download iBench from its original
  server, pull out the JS test content, preprocess it, and add it as
  a category to the benchmark.

* Profit.