[JSC] Add operationToInt32SensibleSlow to optimize kraken pbkdf2 and sha256
https://bugs.webkit.org/show_bug.cgi?id=167736
Reviewed by Saam Barati.
JSTests:
* stress/to-int32-sensible.js: Added.
(shouldBe):
(toInt32):
(test):
Source/JavaScriptCore:
Add a new function operationToInt32SensibleSlow. This function is only
called after x86 cvttss2si_rr is failed. This means that the
given double number never in range of int32 truncatable numbers.
As a result, exp in operationToInt32 always becomes >= 31. So
we can change the condition from `exp < 32` to `exp == 31`.
This makes missingOne constant. And it leads significantly good
code generation.
The original operationToInt32 code.
170: 66 48 0f 7e c1 movq %xmm0,%rcx
175: 31 c0 xor %eax,%eax
177: 66 48 0f 7e c6 movq %xmm0,%rsi
17c: 48 c1 f9 34 sar $0x34,%rcx
180: 81 e1 ff 07 00 00 and $0x7ff,%ecx
186: 8d 91 01 fc ff ff lea -0x3ff(%rcx),%edx
18c: 83 fa 53 cmp $0x53,%edx
18f: 77 37 ja 1c8 <_ZN3JSC16operationToInt32Ed+0x58>
191: 83 fa 34 cmp $0x34,%edx
194: 7f 3a jg 1d0 <_ZN3JSC16operationToInt32Ed+0x60>
196: b9 34 00 00 00 mov $0x34,%ecx
19b: 66 48 0f 7e c7 movq %xmm0,%rdi
1a0: 29 d1 sub %edx,%ecx
1a2: 48 d3 ff sar %cl,%rdi
1a5: 83 fa 1f cmp $0x1f,%edx
1a8: 89 f8 mov %edi,%eax
1aa: 7f 12 jg 1be <_ZN3JSC16operationToInt32Ed+0x4e>
1ac: 89 d1 mov %edx,%ecx
1ae: b8 01 00 00 00 mov $0x1,%eax
1b3: d3 e0 shl %cl,%eax
1b5: 89 c2 mov %eax,%edx
1b7: 8d 40 ff lea -0x1(%rax),%eax
1ba: 21 f8 and %edi,%eax
1bc: 01 d0 add %edx,%eax
1be: 89 c2 mov %eax,%edx
1c0: f7 da neg %edx
1c2: 48 85 f6 test %rsi,%rsi
1c5: 0f 48 c2 cmovs %edx,%eax
1c8: f3 c3 repz retq
1ca: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1d0: 66 48 0f 7e c0 movq %xmm0,%rax
1d5: 81 e9 33 04 00 00 sub $0x433,%ecx
1db: 48 d3 e0 shl %cl,%rax
1de: eb de jmp 1be <_ZN3JSC16operationToInt32Ed+0x4e>
The operationToInt32SensibleSlow code.
1e0: 66 48 0f 7e c1 movq %xmm0,%rcx
1e5: 66 48 0f 7e c2 movq %xmm0,%rdx
1ea: 48 c1 f9 34 sar $0x34,%rcx
1ee: 81 e1 ff 07 00 00 and $0x7ff,%ecx
1f4: 8d b1 01 fc ff ff lea -0x3ff(%rcx),%esi
1fa: 83 fe 34 cmp $0x34,%esi
1fd: 7e 21 jle 220 <_ZN3JSC28operationToInt32SensibleSlowEd+0x40>
1ff: 66 48 0f 7e c0 movq %xmm0,%rax
204: 81 e9 33 04 00 00 sub $0x433,%ecx
20a: 48 d3 e0 shl %cl,%rax
20d: 89 c1 mov %eax,%ecx
20f: f7 d9 neg %ecx
211: 48 85 d2 test %rdx,%rdx
214: 0f 48 c1 cmovs %ecx,%eax
217: c3 retq
218: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
21f: 00
220: 66 48 0f 7e c0 movq %xmm0,%rax
225: b9 34 00 00 00 mov $0x34,%ecx
22a: 29 f1 sub %esi,%ecx
22c: 48 d3 f8 sar %cl,%rax
22f: 89 c1 mov %eax,%ecx
231: 81 c9 00 00 00 80 or $0x80000000,%ecx
237: 83 fe 1f cmp $0x1f,%esi
23a: 0f 44 c1 cmove %ecx,%eax
23d: 89 c1 mov %eax,%ecx
23f: f7 d9 neg %ecx
241: 48 85 d2 test %rdx,%rdx
244: 0f 48 c1 cmovs %ecx,%eax
247: c3 retq
248: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
24f: 00
This improves kraken pbkdf2 by 10.8% and sha256 by 7.5%.
baseline patched
stanford-crypto-pbkdf2 153.195+-2.745 ^ 138.204+-2.513 ^ definitely 1.1085x faster
stanford-crypto-sha256-iterative 49.047+-1.038 ^ 45.610+-1.235 ^ definitely 1.0754x faster
<arithmetic> 101.121+-1.379 ^ 91.907+-1.500 ^ definitely 1.1003x faster
* assembler/CPU.h:
(JSC::hasSensibleDoubleToInt):
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::compileValueToInt32):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::doubleToInt32):
(JSC::FTL::DFG::LowerDFGToB3::sensibleDoubleToInt32):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::hasSensibleDoubleToInt): Deleted.
* ftl/FTLOutput.h:
* runtime/MathCommon.cpp:
(JSC::operationToInt32SensibleSlow):
* runtime/MathCommon.h:
git-svn-id: http://svn.webkit.org/repository/webkit/trunk@211670 268f45cc-cd09-0410-ab3c-d52691b4dbfc
10 files changed