fourthTier: DFG should be able to run on a separate thread
https://bugs.webkit.org/show_bug.cgi?id=112839

Source/JavaScriptCore:

Reviewed by Geoffrey Garen.

This is the final bit of concurrent JITing. The idea is that there is a
single global worklist, and a single global thread, that does all
optimizing compilation. This is the DFG::Worklist. It contains a queue of
DFG::Plans, and a map from CodeBlock* (the baseline code block we're
trying to optimize) to DFG::Plan. If the DFGDriver tries to concurrently
compile something, it puts the Plan on the Worklist. The Worklist's
thread will compile that Plan eventually, and when it's done, it will
signal its completion by (1) notifying anyone waiting for the Worklist to
be done, and (2) forcing the CodeBlock::m_jitExecuteCounter to take slow
path. The next Baseline JIT cti_optimize call will then install all ready
(i.e. compiled) Plans for that VM. Note that (1) is only for the GC and
VM shutdown, which will want to ensure that there aren't any outstanding
async compilations before proceeding. They do so by simply waiting for
all of the plans for the current VM to complete. (2) is the actual way
that code typically gets installed.
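
For readers unfamiliar with the pattern, here is a heavily simplified
sketch of the scheme described above. The names (Plan::isReady,
compileOnWorklistThread) and the std:: primitives are illustrative
only; the real DFG::Worklist is built on WTF threading primitives,
tracks plans per VM, and has a richer completion protocol.

    #include <condition_variable>
    #include <deque>
    #include <map>
    #include <memory>
    #include <mutex>

    struct CodeBlock;                   // the baseline code block being optimized

    struct Plan {
        CodeBlock* key { nullptr };     // baseline CodeBlock* we are optimizing
        bool isReady { false };         // set by the compiler thread when done
        // ... bytecode, profiling data, and eventually the compiled code
    };

    class Worklist {
    public:
        // Main thread: queue a compilation instead of doing it synchronously.
        void enqueue(std::shared_ptr<Plan> plan)
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_plans[plan->key] = plan;
            m_queue.push_back(std::move(plan));
            m_planEnqueued.notify_one();
        }

        // GC / VM shutdown: block until no compilation is still in flight.
        void waitUntilAllPlansAreReady()
        {
            std::unique_lock<std::mutex> locker(m_lock);
            m_planCompiled.wait(locker, [this] {
                for (auto& entry : m_plans) {
                    if (!entry.second->isReady)
                        return false;
                }
                return true;
            });
        }

        // Body of the single global compiler thread.
        void runThread()
        {
            for (;;) {
                std::shared_ptr<Plan> plan;
                {
                    std::unique_lock<std::mutex> locker(m_lock);
                    m_planEnqueued.wait(locker, [this] { return !m_queue.empty(); });
                    plan = std::move(m_queue.front());
                    m_queue.pop_front();
                }
                compileOnWorklistThread(*plan); // the expensive part, off the main thread
                std::lock_guard<std::mutex> locker(m_lock);
                plan->isReady = true;           // main thread installs (and removes) it later
                m_planCompiled.notify_all();
            }
        }

    private:
        void compileOnWorklistThread(Plan&) { /* run the DFG pipeline on the plan */ }

        std::mutex m_lock;
        std::condition_variable m_planEnqueued;
        std::condition_variable m_planCompiled;
        std::deque<std::shared_ptr<Plan>> m_queue;
        std::map<CodeBlock*, std::shared_ptr<Plan>> m_plans;
    };

The point is that the main thread only ever enqueues and installs;
all of the expensive compilation work happens on the worklist thread.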

This is all very racy by design. For example, just as we try to force the
execute counter to take slow path, the main thread may be setting the
execute counter to some other value. The main thread must set it to
another value because (a) JIT code is constantly incrementing the counter
in a racy way, (b) the cti_optimize slow path will set it to some
large-ish negative value to ensure that cti_optimize isn't called
repeatedly, and (c) OSR exits from previously jettisoned code blocks may
still want to reset the counter values. This "race" is made benign by
ensuring that while there is an asynchronous compilation, we at worst set
the counter to optimizeAfterWarmUp and never to deferIndefinitely. Hence
if the race happens then the worst case is that we wait another ~1000
counts before installing the optimized code. Another defense is that if
any CodeBlock calls into cti_optimize, then it will check for all ready
plans for the VM - so even if a code block has to wait another ~1000
executions before it calls cti_optimize to do the installation, it may
actually end up being installed sooner because a different code block had
called cti_optimize, potentially for an unrelated reason.
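
To make the shape of that race concrete, here is a toy model. The
field names and threshold constants are invented for illustration;
the real logic lives in ExecutionCounter and CodeBlock.

    #include <atomic>
    #include <cstdint>

    struct ExecuteCounterSketch {
        // JIT code racily counts up toward zero and calls cti_optimize once the
        // counter becomes non-negative.
        std::atomic<int32_t> counter { -1000 };

        static constexpr int32_t thresholdAfterWarmUp = -1000;      // ~1000 more executions
        static constexpr int32_t thresholdDeferIndefinitely = INT32_MIN;

        // Worklist thread: a plan is ready, so make the next check take the slow path.
        void forceSlowPathConcurrently() { counter.store(0); }

        // Main thread: reset the counter. While an async compile is in flight we
        // never defer indefinitely, so losing the race costs at most another
        // ~1000 executions before the ready plan gets installed.
        void setThreshold(bool compilationPending)
        {
            counter.store(compilationPending ? thresholdAfterWarmUp
                                             : thresholdDeferIndefinitely);
        }
    };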

Special care is taken to ensure that installing plans informs the GC
about the increased memory usage, while also ensuring that we don't
recurse infinitely, since at the start of a GC we try to install
outstanding plans.
This is done by introducing a new GC deferral mechanism (the DeferGC
block-scoped thingy), which will ensure that GCs don't happen in the
scope but are allowed to happen after. This still leaves the strange
corner case that cti_optimize may install outstanding plans, then GC, and
that GC may jettison the code block that was installed. This, and the
fact that the plan that we took slow path to install could have been a
failed or invalid compile, mean that we have to take special precautions
in cti_optimize.
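
The deferral helper is a small block-scoped RAII guard. A minimal
sketch of its shape, built on the two Heap hooks added in this patch
(the exact constructor signature here is an assumption):

    // heap/DeferGC.h, roughly: collections requested inside the scope are
    // deferred until the guard goes out of scope.
    class DeferGC {
    public:
        explicit DeferGC(Heap& heap)
            : m_heap(heap)
        {
            m_heap.incrementDeferralDepth();              // no GC from here on...
        }

        ~DeferGC()
        {
            m_heap.decrementDeferralDepthAndGCIfNeeded(); // ...but one may run right here
        }

    private:
        Heap& m_heap;
    };

Call sites such as cti_optimize can then wrap plan installation in a
DeferGC scope, so that reporting the newly installed code to the GC
cannot immediately trigger a collection that would re-enter installation.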

This patch also fixes a number of small concurrency bugs that I found
when things started running. There are probably more of those bugs still
left to fix. This patch just fixes the ones I know about.

Concurrent compilation is right now only enabled on X86_64 Mac. We need
platforms that are sufficiently CAStastic so that we can do the various
memory fence and CAS tricks that make this safe. We also need a platform
that uses JSVALUE64. And we need pthread_once. So, that pretty much means
just X86_64 for now. Enabling Linux X86_64 should be a breeze, but I'll
leave that up to the Qt and GTK+ ports to do at their discretion.

This is a solid speed-up on SunSpider (8-9%) and V8Spider (16%), our two
main compile-time benchmarks. Most peculiarly, this also appears to
reduce measurement noise, rather than increasing it as you would have
expected. I don't understand that result but I like it anyway. On the
other hand, this is a slight (1%) slow-down on V8v7. I will continue to
investigate this but I think that the results are already good enough
that we should land this as-is. So far, it appears that the slow-down is
due to this breaking the don't-compile-inlineables heuristics. See
investigation in https://bugs.webkit.org/show_bug.cgi?id=116556 and the
bug https://bugs.webkit.org/show_bug.cgi?id=116557.

* JavaScriptCore.xcodeproj/project.pbxproj:
* bytecode/CodeBlock.cpp:
(JSC):
(JSC::CodeBlock::finalizeUnconditionally):
(JSC::CodeBlock::resetStubInternal):
(JSC::CodeBlock::baselineVersion):
(JSC::CodeBlock::hasOptimizedReplacement):
(JSC::CodeBlock::optimizationThresholdScalingFactor):
(JSC::CodeBlock::checkIfOptimizationThresholdReached):
(JSC::CodeBlock::optimizeNextInvocation):
(JSC::CodeBlock::dontOptimizeAnytimeSoon):
(JSC::CodeBlock::optimizeAfterWarmUp):
(JSC::CodeBlock::optimizeAfterLongWarmUp):
(JSC::CodeBlock::optimizeSoon):
(JSC::CodeBlock::forceOptimizationSlowPathConcurrently):
(JSC::CodeBlock::setOptimizationThresholdBasedOnCompilationResult):
(JSC::CodeBlock::updateAllPredictionsAndCountLiveness):
(JSC::CodeBlock::updateAllArrayPredictions):
(JSC::CodeBlock::shouldOptimizeNow):
* bytecode/CodeBlock.h:
(CodeBlock):
(JSC::CodeBlock::jitCompile):
* bytecode/CodeBlockLock.h:
(JSC):
* bytecode/ExecutionCounter.cpp:
(JSC::ExecutionCounter::forceSlowPathConcurrently):
(JSC):
(JSC::ExecutionCounter::setThreshold):
* bytecode/ExecutionCounter.h:
(ExecutionCounter):
* debugger/Debugger.cpp:
(JSC::Debugger::recompileAllJSFunctions):
* dfg/DFGByteCodeParser.cpp:
(JSC::DFG::ByteCodeParser::injectLazyOperandSpeculation):
(JSC::DFG::ByteCodeParser::getArrayMode):
(JSC::DFG::ByteCodeParser::getArrayModeAndEmitChecks):
* dfg/DFGCommon.h:
(JSC::DFG::enableConcurrentJIT):
(DFG):
* dfg/DFGDriver.cpp:
(JSC::DFG::compile):
* dfg/DFGGraph.cpp:
(JSC::DFG::Graph::Graph):
* dfg/DFGGraph.h:
(Graph):
* dfg/DFGOSREntry.cpp:
(JSC::DFG::prepareOSREntry):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::Plan):
(JSC::DFG::Plan::compileInThread):
(JSC::DFG::Plan::key):
(DFG):
* dfg/DFGPlan.h:
(DFG):
(Plan):
* dfg/DFGWorklist.cpp: Added.
(DFG):
(JSC::DFG::Worklist::Worklist):
(JSC::DFG::Worklist::~Worklist):
(JSC::DFG::Worklist::finishCreation):
(JSC::DFG::Worklist::create):
(JSC::DFG::Worklist::enqueue):
(JSC::DFG::Worklist::compilationState):
(JSC::DFG::Worklist::waitUntilAllPlansForVMAreReady):
(JSC::DFG::Worklist::removeAllReadyPlansForVM):
(JSC::DFG::Worklist::completeAllReadyPlansForVM):
(JSC::DFG::Worklist::completeAllPlansForVM):
(JSC::DFG::Worklist::queueLength):
(JSC::DFG::Worklist::dump):
(JSC::DFG::Worklist::runThread):
(JSC::DFG::Worklist::threadFunction):
(JSC::DFG::initializeGlobalWorklistOnce):
(JSC::DFG::globalWorklist):
* dfg/DFGWorklist.h: Added.
(DFG):
(Worklist):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::allocateBlock):
* heap/DeferGC.h: Added.
(JSC):
(DeferGC):
(JSC::DeferGC::DeferGC):
(JSC::DeferGC::~DeferGC):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::reportExtraMemoryCostSlowCase):
(JSC::Heap::collectAllGarbage):
(JSC::Heap::collect):
(JSC::Heap::collectIfNecessaryOrDefer):
(JSC):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
* heap/Heap.h:
(Heap):
(JSC::Heap::isCollecting):
(JSC):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::allocateSlowCase):
* jit/JIT.cpp:
(JSC::JIT::privateCompile):
* jit/JIT.h:
* jit/JITStubs.cpp:
(JSC::DEFINE_STUB_FUNCTION):
* llint/LLIntSlowPaths.cpp:
(JSC::LLInt::jitCompileAndSetHeuristics):
(JSC::LLInt::entryOSR):
(JSC::LLInt::LLINT_SLOW_PATH_DECL):
* profiler/ProfilerBytecodes.h:
* runtime/ConcurrentJITLock.h: Added.
(JSC):
* runtime/ExecutionHarness.h:
(JSC::replaceWithDeferredOptimizedCode):
* runtime/JSSegmentedVariableObject.cpp:
(JSC::JSSegmentedVariableObject::findRegisterIndex):
(JSC::JSSegmentedVariableObject::addRegisters):
* runtime/JSSegmentedVariableObject.h:
(JSSegmentedVariableObject):
* runtime/Options.h:
(JSC):
* runtime/Structure.h:
(Structure):
* runtime/StructureInlines.h:
(JSC::Structure::propertyTable):
* runtime/SymbolTable.h:
(SymbolTable):
* runtime/VM.cpp:
(JSC::VM::VM):
(JSC::VM::~VM):
(JSC::VM::prepareToDiscardCode):
(JSC):
(JSC::VM::discardAllCode):
(JSC::VM::releaseExecutableMemory):
* runtime/VM.h:
(DFG):
(VM):

Source/WTF:

Reviewed by Geoffrey Garen.

* wtf/ByteSpinLock.h:
Make it non-copyable. We previously had bugs where we used ByteSpinLock as a locker.
Clearly that's bad.
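
For illustration only (this is not the WTF class): if a spin lock is
copyable, a caller that takes it by value spins on, and "unlocks", a
throwaway copy, so the shared lock is never actually held. Deleting
the copy operations, which is essentially what WTF_MAKE_NONCOPYABLE
does, turns that mistake into a compile error.

    #include <atomic>

    class SpinLockSketch {
    public:
        SpinLockSketch() = default;
        SpinLockSketch(const SpinLockSketch&) = delete;            // no accidental copies
        SpinLockSketch& operator=(const SpinLockSketch&) = delete;

        void lock()
        {
            while (m_flag.test_and_set(std::memory_order_acquire)) { } // spin until owned
        }

        void unlock() { m_flag.clear(std::memory_order_release); }

    private:
        std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
    };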

* wtf/MetaAllocatorHandle.h:
Make it thread-safe ref-counted, since we may now be passing them between the
concurrent JIT thread and the main thread.

* wtf/Vector.h:
(WTF::Vector::takeLast):
I've wanted this method for ages, and now I've finally added it.
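
The semantics, sketched as a free function over std::vector (the real
method is a member of WTF::Vector):

    #include <utility>
    #include <vector>

    template<typename T>
    T takeLast(std::vector<T>& vector)
    {
        // Precondition: !vector.empty().
        T result = std::move(vector.back()); // grab the last element...
        vector.pop_back();                   // ...then shrink the vector
        return result;
    }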

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@153169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
diff --git a/Source/JavaScriptCore/heap/Heap.cpp b/Source/JavaScriptCore/heap/Heap.cpp
index 7494717..96ffbd6 100644
--- a/Source/JavaScriptCore/heap/Heap.cpp
+++ b/Source/JavaScriptCore/heap/Heap.cpp
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011 Apple Inc. All rights reserved.
+ *  Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011, 2013 Apple Inc. All rights reserved.
  *  Copyright (C) 2007 Eric Seidel <eric@webkit.org>
  *
  *  This library is free software; you can redistribute it and/or
@@ -26,6 +26,7 @@
 #include "CopiedSpace.h"
 #include "CopiedSpaceInlines.h"
 #include "CopyVisitorInlines.h"
+#include "DFGWorklist.h"
 #include "GCActivityCallback.h"
 #include "HeapRootVisitor.h"
 #include "HeapStatistics.h"
@@ -264,6 +265,7 @@
     , m_lastCodeDiscardTime(WTF::currentTime())
     , m_activityCallback(DefaultGCActivityCallback::create(this))
     , m_sweeper(IncrementalSweeper::create(this))
+    , m_deferralDepth(0)
 {
     m_storageSpace.init();
 }
@@ -306,8 +308,7 @@
     // collecting more frequently as long as it stays alive.
 
     didAllocate(cost);
-    if (shouldCollect())
-        collect(DoNotSweep);
+    collectIfNecessaryOrDefer();
 }
 
 void Heap::reportAbandonedObjectGraph()
@@ -693,7 +694,7 @@
 {
     if (!m_isSafeToCollect)
         return;
-
+    
     collect(DoSweep);
 }
 
@@ -703,12 +704,18 @@
 {
     SamplingRegion samplingRegion("Garbage Collection");
     
+    RELEASE_ASSERT(!m_deferralDepth);
     GCPHASE(Collect);
     ASSERT(vm()->apiLock().currentThreadIsHoldingLock());
     RELEASE_ASSERT(vm()->identifierTable == wtfThreadData().currentIdentifierTable());
     ASSERT(m_isSafeToCollect);
     JAVASCRIPTCORE_GC_BEGIN();
     RELEASE_ASSERT(m_operationInProgress == NoOperation);
+    
+    m_deferralDepth++; // Make sure that we don't GC in this call.
+    m_vm->prepareToDiscardCode();
+    m_deferralDepth--; // Decrement the deferral manually, so we don't GC when we do so, since we are already GCing!
+    
     m_operationInProgress = Collection;
 
     m_activityCallback->willCollect();
@@ -809,6 +816,18 @@
         HeapStatistics::showObjectStatistics(this);
 }
 
+bool Heap::collectIfNecessaryOrDefer()
+{
+    if (m_deferralDepth)
+        return false;
+    
+    if (!shouldCollect())
+        return false;
+    
+    collect(DoNotSweep);
+    return true;
+}
+
 void Heap::markDeadObjects()
 {
     m_objectSpace.forEachDeadCell<MarkObject>();
@@ -893,4 +912,20 @@
     m_objectSpace.forEachDeadCell<Zombify>();
 }
 
+void Heap::incrementDeferralDepth()
+{
+    RELEASE_ASSERT(m_deferralDepth < 100); // Sanity check to make sure this doesn't get ridiculous.
+    
+    m_deferralDepth++;
+}
+
+void Heap::decrementDeferralDepthAndGCIfNeeded()
+{
+    RELEASE_ASSERT(m_deferralDepth >= 1);
+    
+    m_deferralDepth--;
+    
+    collectIfNecessaryOrDefer();
+}
+
 } // namespace JSC