2008-11-13 Cameron Zwarich <zwarich@apple.com>
Reviewed by Maciej Stachowiak.
Bug 21943: Avoid needless reads of temporary values in CTI code
<https://bugs.webkit.org/show_bug.cgi?id=21943>
If an opcode needs to load a virtual register and a previous opcode left
the contents of that virtual register in a machine register, use the
value in the machine register rather than getting it from memory.
In order to perform this optimization, it is necessary to know the
jump targets in the CodeBlock. For temporaries, the only problematic
jump targets are binary logical operators and the ternary conditional
operator. However, if this optimization were to be extended to local
variable registers as well, other jump targets would need to be
included, like switch statement cases and the beginnings of catch
blocks.
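The bookkeeping described above can be sketched roughly as follows.
This is a simplified model, not the CTI code itself: the struct name,
the sentinel value, the firstTemporary field, and the std::set of jump
targets are all assumptions made for illustration. It only records
which virtual register is known to be in eax and emits no machine code.

```cpp
#include <cassert>
#include <limits>
#include <set>

// Hypothetical, simplified model of the last-result tracking.
struct LastResultCache {
    // Assumed sentinel meaning "eax holds no known virtual register".
    static constexpr int NoRegister = std::numeric_limits<int>::max();

    int firstTemporary = 0;               // assumption: indices >= this are temporaries
    std::set<unsigned> jumpTargets;       // bytecode offsets reachable by a jump
    int lastResultRegister = NoRegister;  // virtual register whose value is in eax

    bool isTemporary(int reg) const { return reg >= firstTemporary; }

    void killLastResult() { lastResultRegister = NoRegister; }

    // Call when code generation reaches the opcode at `offset`. Control
    // may arrive at a jump target from elsewhere, so eax is unknown there.
    void beginOpcode(unsigned offset)
    {
        if (jumpTargets.count(offset))
            killLastResult();
    }

    // Call when an opcode leaves the result for `reg` in eax.
    void putResult(int reg) { lastResultRegister = reg; }

    // Returns true if a read of `reg` can use eax instead of memory.
    // The cache is dropped afterwards, a conservative choice in this sketch.
    bool loadOperand(int reg)
    {
        bool reuse = reg == lastResultRegister && isTemporary(reg);
        killLastResult();
        return reuse;
    }
};

// Temporary 5 is reused at offset 1; temporary 6 is not, because
// offset 2 is a jump target.
inline void example()
{
    LastResultCache cache;
    cache.firstTemporary = 4;
    cache.jumpTargets.insert(2);

    cache.beginOpcode(0);
    cache.putResult(5);
    cache.beginOpcode(1);
    assert(cache.loadOperand(5));

    cache.putResult(6);
    cache.beginOpcode(2);
    assert(!cache.loadOperand(6));
}
```

Extending the model to local variable registers would mean adding more
offsets to jumpTargets, such as switch cases and the beginnings of
catch blocks, as noted above.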
This optimization also requires that the fast case and the slow case
of an opcode use emitPutResult() on the same register. We chose eax,
as that is the register into which we read the first operand of
opcodes. To make this the case, we needed to add some mov
instructions to the slow cases of some instructions.
This optimization is not applied whenever compileBinaryArithOp() is used
to compile an opcode, because different machine registers may be used to
store the final result. It seems possible to rewrite the code generation
in compileBinaryArithOp() to allow for this optimization.
This optimization is also not applied when generating slow cases,
because some fast cases overwrite the value of eax before jumping to the
slow case. In the future, it may be possible to apply this optimization
to slow cases as well, but it did not seem to be a speedup when testing
an early version of this patch.
This is a 1.0% speedup on SunSpider and a 6.3% speedup on the V8
benchmark suite.
* VM/CTI.cpp:
(JSC::CTI::killLastResultRegister):
(JSC::CTI::emitGetArg):
(JSC::CTI::emitGetPutArg):
(JSC::CTI::emitGetCTIParam):
(JSC::CTI::emitGetFromCallFrameHeader):
(JSC::CTI::emitPutResult):
(JSC::CTI::emitCTICall):
(JSC::CTI::CTI):
(JSC::CTI::compileOpCall):
(JSC::CTI::compileOpStrictEq):
(JSC::CTI::emitSlowScriptCheck):
(JSC::CTI::compileBinaryArithOp):
(JSC::CTI::privateCompileMainPass):
(JSC::CTI::privateCompileSlowCases):
(JSC::CTI::privateCompileGetByIdProto):
(JSC::CTI::privateCompilePatchGetArrayLength):
* VM/CTI.h:
* VM/CodeBlock.h:
(JSC::CodeBlock::isTemporaryRegisterIndex):
* bytecompiler/CodeGenerator.cpp:
(JSC::CodeGenerator::emitLabel):
git-svn-id: http://svn.webkit.org/repository/webkit/trunk@38368 268f45cc-cd09-0410-ab3c-d52691b4dbfc
diff --git a/JavaScriptCore/VM/CTI.h b/JavaScriptCore/VM/CTI.h
index 25a41a8..909262c 100644
--- a/JavaScriptCore/VM/CTI.h
+++ b/JavaScriptCore/VM/CTI.h
@@ -374,7 +374,7 @@
void compileBinaryArithOp(OpcodeID, unsigned dst, unsigned src1, unsigned src2, OperandTypes opi, unsigned i);
void compileBinaryArithOpSlowCase(Instruction*, OpcodeID, Vector<SlowCaseEntry>::iterator& iter, unsigned dst, unsigned src1, unsigned src2, OperandTypes opi, unsigned i);
- void emitGetArg(int src, X86Assembler::RegisterID dst);
+ void emitGetArg(int src, X86Assembler::RegisterID dst, unsigned i);
void emitGetPutArg(unsigned src, unsigned offset, X86Assembler::RegisterID scratch);
void emitPutArg(X86Assembler::RegisterID src, unsigned offset);
void emitPutArgConstant(unsigned value, unsigned offset);
@@ -430,6 +430,8 @@
void printOpcodeOperandTypes(unsigned src1, unsigned src2);
#endif
+ void killLastResultRegister();
+
X86Assembler m_jit;
Machine* m_machine;
JSGlobalData* m_globalData;
@@ -456,9 +458,11 @@
Vector<SlowCaseEntry> m_slowCases;
Vector<SwitchRecord> m_switches;
+ int m_lastResultBytecodeRegister;
+ unsigned m_jumpTargetsPosition;
+
// This limit comes from the limit set in PCRE
static const int MaxPatternSize = (1 << 16);
-
};
}