| <html> |
| <head> |
| <title>B3 Intermediate Representation</title> |
| <link rel="stylesheet" type="text/css" href="style.css"> |
| </head> |
| <body> |
| <div id="banner"> |
| <a href="http://www.webkit.org/" class="banner-link"> |
| <div id="logo" class="site-logo"> |
| WebKit |
| <span class="tagline">Open Source Web Browser Engine</span> |
| </div> |
| </a> |
| </div> |
| <div id="contents"> |
| <h1><a href="index.html">Bare Bones Backend</a> / B3 Intermediate Representation</h1> |
| <p>B3 IR is a C-like SSA representation of a procedure. A procedure has a root block at |
| which it starts execution when it is invoked. A procedure does not have to terminate, but |
| if it does, then it can be either due to a Return, which gracefully returns some value, or |
| by a side-exit at designated instructions. B3 gives the client a lot of flexibility to |
| implement many different kinds of side-exits.</p> |
| |
| <p>B3 is designed to represent procedures for the purpose of transforming them. Knowing |
| what transformations are legal requires knowing what a procedure does. A transformation |
| is valid if it does not change the observable behavior of a procedure. This document |
| tells you what B3 procedures do by telling you what each construct in B3 IR does.</p> |
| |
| <h2>Procedure</h2> |
| |
| <p>The parent object of all things in B3 is the Procedure. Every time you want to compile |
| something, you start by creating a Procedure. The lifecycle of a Procedure is |
| usually:</p> |
| |
| <ol> |
| <li>Create the Procedure.</li> |
| <li>Populate the Procedure with code.</li> |
| <li>Use either the <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Compilation.h">high-level |
| Compilation API</a> or the |
| <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Generate.h">low-level |
| generation API</a>.</li> |
| </ol> |
| |
| <p>The act of compiling the Procedure changes it in-place, making it unsuitable for |
| compiling again. Always create a new Procedure every time you want to compile |
| something.</p> |
| |
| <h2>Types</h2> |
| |
| <p>B3 has a trivial type system with only five types:</p> |
| |
| <dl> |
| <dt>Void</dt> |
| <dd>Used to say that an instruction does not return a value.</dd> |
| |
| <dt>Int32</dt> |
| <dd>32-bit integer. Integers don't have sign, but operations on them do.</dd> |
| |
| <dt>Int64</dt> |
| <dd>64-bit integer.</dd> |
| |
| <dt>Float</dt> |
| <dd>32-bit binary floating point number.</dd> |
| |
| <dt>Double</dt> |
| <dd>64-bit binary floating point number.</dd> |
| </dl> |
| |
| <p>B3 does not have a pointer type. Instead, the <code>B3::pointerType()</code> function will |
| return either Int32 or Int64 depending on which kind of integer can be used to represent a |
| pointer on the current platform. It's not a goal of B3 to support hardware targets that require |
| pointers and integers to be segregated. It's not a goal of B3 to support GC (garbage |
| collection) roots as a separate type, since JSC uses |
| <a href="http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf">Bartlett-style conservative |
| root scanning</a>. This doesn't preclude any mainstream garbage collection algorithms, |
| including copying, generational, or concurrent collectors, and frees up the compiler to perform |
| more optimizations.</p> |
| |
| <h2>Values</h2> |
| |
| <p>Variables, and the instructions that define them, are represented using the Value object. |
| The Value object has a return type, a kind, and zero or more children. Children are |
| references to other Values. Those values are used as input to the instruction that |
| computes this value.</p> |
| |
| <p>The value kind is a combination of an opcode and optional flags. The flags allow a single |
| opcode to have many variants. For example, Div and Mod may have the Chill flag set to indicate |
| that they should not trap on corner cases. Alternatively, Load/Store opcodes may have the |
| Traps flag set to indicate that they should trap deterministically.</p> |
| |
| <p>Values also have a unique 32-bit index that is used as the name.</p> |
| |
| <p>Example:</p> |
| |
| <pre><code>Int32 @3 = Add(@1, @2)</code></pre> |
| |
| <p>This represents a single Value instance. Its index is 3. It is an Int32. The opcode is |
| Add, and its children are @1 and @2.</p> |
| |
| <p>Values may also have additional meta-data. We use special subclasses of the B3::Value |
| class for values that need meta-data. For example, the Load value needs a 32-bit offset |
| for the load. We use the MemoryValue class for memory-accessing values, which all have |
| such an offset.</p> |
| |
| <h2>Stack Slots</h2> |
| |
| <p>B3 exposes the concept of stack-allocated data and gives the client a lot of control. |
| By default, stack slots get allocated wherever B3 chooses. It will try to pack them as |
| much as possible. After compilation is done, you can retrieve each stack slot's location |
| in the form of an offset from the frame pointer.</p> |
| |
| <p>You can force stack slots to end up at a particular offset from the frame pointer, though |
| this is very dangerous. For example, B3 assumes that it can use the slots closest to the |
| frame pointer for callee-saves, and currently when you force something to a particular |
| frame pointer offset, there is no mechanism to notice that this space is also being used |
| for callee-saves. Therefore, we recommend not using the frame pointer offset forcing |
| feature unless you know a lot about the ABI and you have no other choice.</p> |
| |
| <h2>Variables</h2> |
| |
| <p>Sometimes, SSA is inconvenient. For example, it's hard to do path specialization over SSA. |
| B3 has the concept of Variables as a fall-back. The backend knows how to handle them and |
| will coalesce and copy-propagate them. Inside the B3 optimizer, there is a classic SSA |
| builder that eliminates variables and builds SSA in their place.</p> |
| |
| <p>You can create Variables by using Procedure::addVariable(), and then you can access them |
| using the Get and Set opcodes.</p> |
| |
| <p>The fixSSA() phase will convert variables to SSA. If you use a lot of variables in your |
| input to B3, it's a good idea to run fixSSA() manually before running the compiler. The |
| default optimizer only runs fixSSA() towards the middle of optimizations. Passing non-SSA code |
| as input to the optimizer may render the early phases ineffective. Fortunately, B3 phases |
| are super easy to run. The following runs SSA fix-up on a Procedure named "proc":</p> |
| |
| <pre><code>fixSSA(proc);</code></pre> |
| |
| <h2>Control flow</h2> |
| |
| <p>B3 represents control flow using basic blocks. Each basic block may have zero or more |
| predecessors. Each basic block may have zero or more successors. The meaning of the |
| successors is determined by the basic block's last Value, which must be a terminal. A |
| value is terminal if:</p> |
| |
| <pre><code>value->effects().terminal</code></pre> |
| |
| <p>Some opcodes are definitely terminal, like Jump, Branch, Oops, Return, and Switch. But a |
| value with the Patchpoint opcode may or may not be terminal. In general it's necessary to |
| check the <code>terminal</code> bit as shown above.</p> |
| |
| <p>Each basic block contains a Vector<Value*> as the contents of the block. Control |
| flow inside the block is implicit based on the order of Values in the vector.</p> |
| |
| <h2>Memory model</h2> |
| |
| <p>B3 uses a secondary address space of <i>abstract heaps</i> to reason about aliasing and |
| concurrency. This address space is 32-bit and we point at it using the HeapRange type. If you |
| never supply HeapRanges to your memory operations (the default), they will have [0, UINT_MAX] |
| as their range, indicating that the operation may conservatively write to all 2<sup>32</sup> |
| abstract heaps. Two memory operations are said to alias if they access the same abstract heaps. |
| A simple example of using HeapRanges is with loads and stores, but you can also supply |
| HeapRanges as bounds on the effects of calls and patchpoints.</p> |
| |
| <p>Fences are modeled as phantom effects that we call <i>fence ranges</i>. For example, a load |
| fence could be represented as a patchpoint that writes to [0, UINT_MAX]. B3 models load fences, |
| store fences, store-load fences, acquire fences, and release fences using fence ranges. The |
| absence of a fence range means that there is no fence.</p> |
| |
| <p>Acquire fences are modeled as phantom writes to memory, while release fences are modeled as |
| phantom reads from memory. By itself this does not permit B3 to do all of the reorderings that |
| acquire/release allow. B3 is therefore allowed to treat loads/stores with fence ranges more |
| precisely. The phantom write effects of a fenced load cannot write to anything that is read or |
| written after the fence. The phantom read effects of a fenced store cannot read anything that |
| is written after the fence.</p> |
| |
| <p>Applying a fence range to a load or store only makes that access a half-fence: load-with-fence |
| is an acquire fence that only prevents later operations from being hoisted and store-with-fence |
| is a release fence that only prevents earlier operations from being sunk. The canonical way to |
| express a fully fenced load (which is fenced in <i>both</i> directions) is to use |
| AtomicXchgAdd(0, ptr) with a non-empty fence range. The canonical way to express a fully fenced |
| store is to use AtomicXchg(value, ptr) with a non-empty fence range.</p> |
| |
| <h2>Opcodes</h2> |
| |
| <p>This section describes opcodes in the following format:</p> |
| |
| <dl> |
| <dt>Int32 Foo(Int64, Double)</dt> |
| <dd>This describes an opcode named Foo that uses Int32 as its return type and takes two |
| children - one of type Int64 and another of type Double.</dd> |
| </dl> |
| |
| <p>We sometimes use the wildcard type T to represent polymorphic operations, like "T Add(T, |
| T)". This means that the value must take two children of the same type and returns a |
| value of that type. We use the type IntPtr to mean either Int32, or Int64, depending on |
| the platform.</p> |
| |
| <p>Some opcodes can have some flags set. A description of flags follows the description of |
| opcodes.</p> |
| |
| <h3>Opcode descriptions</h3> |
| |
| <dl> |
| <dt>Void Nop()</dt> |
| <dd>The empty value. Instead of removing Values from basic blocks, most optimizations |
| convert them to Nops. Various phases run fix-up where all Nops are removed in one pass. |
| It's common to see Nops in intermediate versions of B3 IR during optimizations. Nops |
| never lead to any code being generated and they do not impede optimizations, so they are |
| usually harmless. You can convert a Value to a Nop by doing convertToNop().</dd> |
| |
| <dt>T Identity(T)</dt> |
| <dd>Returns the passed value. May be used for any type except Void. Instead of replacing |
| all uses of a Value with a different Value, most optimizations convert them to Identity. |
| Various phases run fix-up where all uses of Identity are replaced with the Identity's |
| child (transitively, so Identity(Identity(Identity(@x))) is changed to just @x). Even |
| the instruction selector "sees through" Identity. You can remove all references to |
| Identity in any value by calling Value::performSubstitution(). You can convert a Value |
| to an Identity by doing convertToIdentity(otherValue). If the value is Void, |
| convertToIdentity() converts it to a Nop instead.</dd> |
| |
| <dt>T Opaque(T)</dt> |
| <dd>Returns the passed value. May be used for any type except Void. This opcode is provided as a hack |
| for avoiding B3 optimizations that the client finds unprofitable. B3 treats Opaque as an identity |
| only during instruction selection. All prior optimizations (including all B3 IR optimizations) do |
| not know what Opaque does, other than Opaque(x) == Opaque(x) and Opaque(Opaque(x)) == Opaque(x).</dd> |
| |
| <dt>Int32 Const32(constant)</dt> |
| <dd>32-bit integer constant. Must use the Const32Value class, which has space for the |
| int32_t constant.</dd> |
| |
| <dt>Int64 Const64(constant)</dt> |
| <dd>64-bit integer constant. Must use the Const64Value class, which has space for the |
| int64_t constant.</dd> |
| |
| <dt>Float ConstFloat(constant)</dt> |
| <dd>Float constant. Must use the ConstFloatValue class, which has space for the float constant.</dd> |
| |
| <dt>Double ConstDouble(constant)</dt> |
| <dd>Double constant. Must use the ConstDoubleValue class, which has space for the double constant.</dd> |
| |
| <dt>Void Set(value, variable)</dt> |
| <dd>Assigns the given value to the given Variable. Must use the VariableValue class.</dd> |
| |
| <dt>T Get(variable)</dt> |
| <dd>Returns the current value of the given Variable. Its return type depends on the |
| variable. Must use the VariableValue class.</dd> |
| |
| <dt>IntPtr SlotBase(stackSlot)</dt> |
| <dd>Returns a pointer to the base of the given stack slot. Must use the SlotBaseValue |
| class.</dd> |
| |
| <dt>IntPtr|Double ArgumentReg(%register)</dt> |
| <dd>Returns the value that the given register had at the prologue of the procedure. It |
| returns IntPtr for general-purpose registers and Double for FPRs. Must use the |
| ArgumentRegValue class.</dd> |
| |
| <dt>IntPtr FramePointer()</dt> |
| <dd>Returns the value of the frame pointer register. B3 procedures alway use a frame |
| pointer ABI, and the frame pointer is always guaranteed to have this value anywhere |
| inside the procedure.</dd> |
| |
| <dt>T Add(T, T)</dt> |
| <dd>Works with any type except Void. For integer types, this represents addition with |
| wrap-around semantics. For floating point types, this represents addition according to |
| the IEEE 854 spec. B3 does not have any notion of "fast math". A transformation over |
| floating point code is valid if the new code produces exactly the same value, bit for |
| bit.</dd> |
| |
| <dt>T Sub(T, T)</dt> |
| <dd>Works with any type except Void. For integer types, this represents subtraction with |
| wrap-around semantics. For floating point types, this represents subtraction according |
| to the IEEE 854 spec.</dd> |
| |
| <dt>T Mul(T, T)</dt> |
| <dd>Works with any type except Void. For integer types, this represents multiplication |
| with wrap-around semantics. For floating point types, this represents multiplication |
| according to the IEEE 854 spec.</dd> |
| |
| <dt>T Div(T, T)</dt> |
| <dd>Works with any type except Void. For integer types, this represents signed |
| division with round-to-zero. By default, its behavior is undefined for x/0 or |
| -2<sup>31</sup>/-1. For floating point types, this represents division according |
| to the IEEE 854 spec. Integer Div may have the Chill flag set.</dd> |
| |
| <dt>T Mod(T, T)</dt> |
| <dd>Works with any type except Void. For integer types, this represents signed |
| modulo. By default, its behavior is undefined for x%0 or -2<sup>31</sup>%-1. For |
| floating point types, this represents modulo according to "fmod()". Integer Mod may have the |
| Chill flag set.</dd> |
| |
| <dt>T Neg(T)</dt> |
| <dd>Works with any type except Void. For integer types, this represents twos-complement |
| negation. For floating point types, this represents negation according to the IEEE |
| spec.</dd> |
| |
| <dt>T BitAnd(T, T)</dt> |
| <dd>Bitwise and. Valid for any type except Void.</dd> |
| |
| <dt>T BitOr(T, T)</dt> |
| <dd>Bitwise or. Valid for any type except Void.</dd> |
| |
| <dt>T BitXor(T, T)</dt> |
| <dd>Bitwise xor. Valid for any type except Void.</dd> |
| |
| <dt>T Shl(T, Int32)</dt> |
| <dd>Shift left. Valid for Int32 and Int64. The shift amount is always Int32. Only the |
| low 31 bits of the shift amount are used for Int32. Only the low 63 bits of the shift |
| amount are used for Int64.</dd> |
| |
| <dt>T SShr(T, Int32)</dt> |
| <dd>Shift right with sign extension. Valid for Int32 and Int64. The shift amount is |
| always Int32. Only the low 31 bits of the shift amount are used for Int32. Only the |
| low 63 bits of the shift amount are used for Int64.</dd> |
| |
| <dt>T ZShr(T, Int32)</dt> |
| <dd>Shift right with zero extension. Valid for Int32 and Int64. The shift amount is |
| always Int32. Only the low 31 bits of the shift amount are used for Int32. Only the |
| low 63 bits of the shift amount are used for Int64.</dd> |
| |
| <dt>T RotR(T, Int32)</dt> |
| <dd>Rotate right. Valid for Int32 and Int64. The shift amount is always Int32. Only the |
| low 31 bits of the shift amount are used for Int32. Only the low 63 bits of the shift |
| amount are used for Int64.</dd> |
| |
| <dt>T RotL(T, Int32)</dt> |
| <dd>Rotate left. Valid for Int32 and Int64. The shift amount is always Int32. Only the |
| low 31 bits of the shift amount are used for Int32. Only the low 63 bits of the shift |
| amount are used for Int64.</dd> |
| |
| <dt>T Clz(T)</dt> |
| <dd>Count leading zeroes. Valid for Int32 and Int64.</dd> |
| |
| <dt>T Abs(T)</dt> |
| <dd>Absolute value. Valid for Float and Double.</dd> |
| |
| <dt>T Ceil(T)</dt> |
| <dd>Ceiling. Valid for Float and Double.</dd> |
| |
| <dt>T Floor(T)</dt> |
| <dd>Flooring. Valid for Float and Double.</dd> |
| |
| <dt>T Sqrt(T)</dt> |
| <dd>Square root. Valid for Float and Double.</dd> |
| |
| <dt>U BitwiseCast(T)</dt> |
| <dd>If T is Int32 or Int64, it returns the bitwise-corresponding Float or Double, |
| respectively. If T is Float or Double, it returns the bitwise-corresponding Int32 or |
| Int64, respectively.</dd> |
| |
| <dt>Int32 SExt8(Int32)</dt> |
| <dd>Fills the top 24 bits of the integer with the low byte's sign extension.</dd> |
| |
| <dt>Int32 SExt16(Int32)</dt> |
| <dd>Fills the top 16 bits of the integer with the low short's sign extension.</dd> |
| |
| <dt>Int64 SExt32(Int32)</dt> |
| <dd>Returns a 64-bit integer such that the low 32 bits are the given Int32 value and the |
| high 32 bits are its sign extension.</dd> |
| |
| <dt>Int64 ZExt32(Int32)</dt> |
| <dd>Returns a 64-bit integer such that the low 32 bits are the given Int32 value and the |
| high 32 bits are zero.</dd> |
| |
| <dt>U Trunc(T)</dt> |
| <dd>Returns the low 32 bits of the 64-bit value. If T is Int64 then U is Int32. |
| If T is Double then U is Float.</dd> |
| |
| <dt>Double IToD(T)</dt> |
| <dd>Converts the given integer into a double. Value for Int32 or Int64 inputs.</dd> |
| |
| <dt>Double FloatToDouble(Float)</dt> |
| <dd>Converts the given float into a double.</dd> |
| |
| <dt>Float DoubleToFloat(Double)</dt> |
| <dd>Converts the given double into a float.</dd> |
| |
| <dt>Int32 Equal(T, T)</dt> |
| <dd>Compares the two values. If they are equal, return 1; else return 0. Valid for all |
| types except Void. Integer comparisons simply compare all bits. Floating point |
| comparisons mostly compare bits, but have some corner cases: positive zero and negative |
| zero are considered equal, and they return false when either value is NaN.</dd> |
| |
| <dt>Int32 NotEqual(T, T)</dt> |
| <dd>The opposite of Equal(). NotEqual(@x, @y) yields the same result as BitXor(Equal(@x, |
| @y), 1).</dd> |
| |
| <dt>Int32 LessThan(T, T)</dt> |
| <dd>Returns 1 if the left value is less than the right one, 0 otherwise. Does a signed |
| comparison for integers. For floating point comparisons, this has the usual caveats |
| with respect to negative zero and NaN.</dd> |
| |
| <dt>Int32 GreaterThan(T, T)</dt> |
| <dd>Returns 1 if the left value is greater than the right one, 0 otherwise. Does a signed |
| comparison for integers. For floating point comparisons, this has the usual caveats |
| with respect to negative zero and NaN.</dd> |
| |
| <dt>Int32 LessEqual(T, T)</dt> |
| <dd>Returns 1 if the left value is less than or equal to the right one, 0 otherwise. Does |
| a signed comparison for integers. For floating point comparisons, this has the usual |
| caveats with respect to negative zero and NaN.</dd> |
| |
| <dt>Int32 GreaterEqual(T, T)</dt> |
| <dd>Returns 1 if the left value is greater than or equal to the right one, 0 otherwise. |
| Does a signed comparison for integers. For floating point comparisons, this has the |
| usual caveats with respect to negative zero and NaN.</dd> |
| |
| <dt>Int32 Above(T, T)</dt> |
| <dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left |
| value is unsigned-greater-than the right one, 0 otherwise.</dd> |
| |
| <dt>Int32 Below(T, T)</dt> |
| <dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left |
| value is unsigned-less-than the right one, 0 otherwise.</dd> |
| |
| <dt>Int32 AboveEqual(T, T)</dt> |
| <dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left |
| value is unsigned-greater-than-or-equal the right one, 0 otherwise.</dd> |
| |
| <dt>Int32 BelowEqual(T, T)</dt> |
| <dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left |
| value is unsigned-less-than-or-equal the right one, 0 otherwise.</dd> |
| |
| <dt>Int32 EqualOrUnordered(T, T)</dt> |
| <dd>Floating point comparison, valid for Float and Double only. Returns 1 if the left |
| value is equal to the right one or if either value is NaN. Returns 0 otherwise.</dd> |
| |
| <dt>T Select(U, T, T)</dt> |
| <dd>Returns either the second child or the third child. T can be any type except Void. U |
| can be either Int32 or Int64. If the first child is non-zero, returns the second child. |
| Otherwise returns the third child.</dd> |
| |
| <dt>Int32 Load8Z(IntPtr, offset)</dt> |
| <dd>Loads a byte from the address, which is computed by adding the compile-time 32-bit |
| signed integer offset to the child value. Zero extends the loaded byte, so the high 24 |
| bits are all zero. Must use the MemoryValue class. May have the Traps flag set. May set |
| a fence range to turn this into a load acquire.</dd> |
| |
| <dt>Int32 Load8S(IntPtr, offset)</dt> |
| <dd>Loads a byte from the address, which is computed by adding the compile-time 32-bit |
| signed integer offset to the child value. Sign extends the loaded byte. Must use the |
| MemoryValue class. May have the Traps flag set. May set |
| a fence range to turn this into a load acquire.</dd> |
| |
| <dt>Int32 Load16Z(IntPtr, offset)</dt> |
| <dd>Loads a 16-bit integer from the address, which is computed by adding the compile-time |
| 32-bit signed integer offset to the child value. Zero extends the loaded 16-bit |
| integer, so the high 16 bits are all zero. Misaligned loads are not penalized. Must |
| use the MemoryValue class. May have the Traps flag set. May set |
| a fence range to turn this into a load acquire.</dd> |
| |
| <dt>Int32 Load16S(IntPtr, offset)</dt> |
| <dd>Loads a 16-bit integer from the address, which is computed by adding the compile-time |
| 32-bit signed integer offset to the child value. Sign extends the loaded 16-bit |
| integer. Misaligned loads are not penalized. Must use the MemoryValue class. May have the |
| Traps flag set. May set a fence range to turn this into a load acquire.</dd> |
| |
| <dt>T Load(IntPtr, offset)</dt> |
| <dd>Valid for any type except Void. Loads a value of that type from the address, which is |
| computed by adding the compile-time 32-bit signed integer offset to the child value. |
| Misaligned loads are not penalized. Must use the MemoryValue class. May have the Traps |
| flag set. May set a fence range to turn this into a load acquire.</dd> |
| |
| <dt>Void Store8(Int32, IntPtr, offset)</dt> |
| <dd>Stores a the low byte of the first child into the address computed by adding the |
| compile-time 32-bit signed integer offset to the second child. Must use the MemoryValue |
| class. May have the Traps flag set. May set a fence range to turn this into a store release.</dd> |
| |
| <dt>Void Store16(Int32, IntPtr, offset)</dt> |
| <dd>Stores a the low 16 bits of the first child into the address computed by adding the |
| compile-time 32-bit signed integer offset to the second child. Misaligned stores are |
| not penalized. Must use the MemoryValue class. May have the Traps flag set. May set a |
| fence range to turn this into a store release.</dd> |
| |
| <dt>Void Store(T, IntPtr, offset)</dt> |
| <dd>Stores the value in the first child into the address computed by adding the |
| compile-time 32-bit signed integer offset to the second child. Misaligned stores are |
| not penalized. Must use the MemoryValue class. May have the Traps flag set. May set a |
| fence range to turn this into a store release.</dd> |
| |
| <dt>Int32 AtomicWeakCAS(T expectedValue, T newValue, IntPtr ptr)</dt> |
| <dd>Performs a weak <a href="https://en.wikipedia.org/wiki/Compare-and-swap">CAS |
| (compare-and-swap)</a>. Returns a boolean (0 or 1) to indicate the result (failure or |
| success, respectively). May spuriously fail, in which case it will do nothing and return |
| 0. You can supply a fence range to indicate that the CAS is fenced. Fenced CAS has both |
| acquire and release fences, and claims to both read and write the full fence range in |
| addition to whatever primary range is supplied for the CAS itself. AtomicWeakCAS only |
| has fencing behavior in the case that it succeeds. Atomic operations take a width parameter, |
| allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory locations.</dd> |
| |
| <dt>T AtomicStrongCAS(T expectedValue, T newValue, IntPtr ptr)</dt> |
| <dd>Performs a strong <a href="https://en.wikipedia.org/wiki/Compare-and-swap">CAS |
| (compare-and-swap)</a>. Returns the old value before the CAS. The instruction selector |
| is smart about Equal(AtomicStrongCAS(expected, ...), expected) patterns, so this opcode |
| is also the best way to do a strong CAS that returns a boolean - just use Equal to compare |
| the result to expected. You can supply a fence range to indicate that the CAS is fenced. |
| Fenced CAS has both acquire and release fences, and claims to both read and write the full |
| fence range in addition to whatever primary range is supplied for the CAS itself. |
| AtomicStrongCAS has the same fencing on failure and success. Atomic operations take a width |
| parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory |
| locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd> |
| |
| <dt>T AtomicXchgAdd(T, IntPtr)</dt> |
| <dd>Atomically adds a value to the memory location and returns the old value. Is allowed to |
| have a fence range, which causes it to have acquire/release fencing. Atomic operations take |
| a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory |
| locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd> |
| |
| <dt>T AtomicXchgAnd(T, IntPtr)</dt> |
| <dd>Atomically bitwise-ands a value to the memory location and returns the old value. Is allowed to |
| have a fence range, which causes it to have acquire/release fencing. Atomic operations take |
| a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory |
| locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd> |
| |
| <dt>T AtomicXchgOr(T, IntPtr)</dt> |
| <dd>Atomically bitwise-ors a value to the memory location and returns the old value. Is allowed to |
| have a fence range, which causes it to have acquire/release fencing. Atomic operations take |
| a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory |
| locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd> |
| |
| <dt>T AtomicXchgSub(T, IntPtr)</dt> |
| <dd>Atomically subtracts a value to the memory location and returns the old value. Is allowed to |
| have a fence range, which causes it to have acquire/release fencing. Atomic operations take |
| a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory |
| locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd> |
| |
| <dt>T AtomicXchgXor(T, IntPtr)</dt> |
| <dd>Atomically bitwise-xors a value to the memory location and returns the old value. Is allowed to |
| have a fence range, which causes it to have acquire/release fencing. Atomic operations take |
| a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory |
| locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd> |
| |
| <dt>T AtomicXchg(T, IntPtr)</dt> |
| <dd>Atomically stores a value to the memory location and returns the old value. Is allowed to |
| have a fence range, which causes it to have acquire/release fencing. Atomic operations take |
| a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory |
| locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd> |
| |
| <dt>T Depend(T)</dt> |
| <dd>Only valid for integer types. This is guaranteed to return zero by xoring the input with |
| itself, and the B3 compiler guarantees that it will not leverage this knowledge for any |
| optimization. On x86, this is lowered to a Fence with read=Bottom, write=Top (i.e. a load-load |
| fence) and then it's constant-folded to zero. On ARM, this is preserved through the compiler |
| pipeline all the way to the MacroAssembler, which turns this into a <code>eor</code>. Using |
| Depend is the most efficient way to create load-load dependencies, regarldess of platform. It |
| allows B3's CSE to work for surrounding loads and it even supports CSEing the dependency |
| itself - so two identical load chains with no interleaving effects can be turned into one. On |
| ARM, it also avoids emitting an expensive <code>dmb ish</code> load-load fence. Because of the |
| benefits for CSE, it's recommended to use Depend for load-load dependencies even if you're only |
| targeting x86.</dd> |
| |
| <dt>IntPtr WasmAddress(IntPtr, pinnedGPR)</dt> |
| <dd>This is used to compute the address of a wasm memory load from a pinned base GPR. |
| Must use the WasmAddressValue class.</dd> |
| |
| <dt>Void Fence()</dt> |
| <dd>Abstracts standalone data fences on x86 and ARM. Must use the FenceValue class, which has |
| two additional members that configure the precise meaning of the fence: |
| <code>HeapRange FenceValue::read</code> and <code>HeapRange FenceValue::write</code>. If the |
| <code>write</code> range is empty, this is taken to be a store-store fence, which leads to |
| no code generation on x86 and the weaker <code>dmb ishst</code> fence on ARM. If the write |
| range is non-empty, this produces <code>mfence</code> on x86 and <code>dmb ish</code> on |
| ARM. Within B3 IR, the Fence also reports the read/write in its effects. This allows you to |
| scope the fence for the purpose of B3's load elimination. For example, you may use a Fence |
| to protect a store from being sunk below a specific load. In that case, you can claim to |
| read just that store's range and write that load's range.</dd> |
| |
| <dt>T1 CCall(IntPtr, [T2, [T3, ...]])</dt> |
| <dd>Performs a C function call to the function pointed to by the first child. The types |
| that the function takes and the type that it returns are determined by the types of the |
| children and the type of the CCallValue. Only the first child is mandatory. Must use |
| the CCallValue class.</dd> |
| |
| <dt>T1 Patchpoint([T2, [T3, ...]])</dt> |
| <dd> |
| <p>A Patchpoint is a customizable value. Patchpoints take zero or more values of any |
| type and return any type. A Patchpoint's behavior is determined by the generator |
| object. The generator is a C++ lambda that gets called during code generation. It gets |
| passed an assembler instance (specifically, CCallHelpers&) and an object describing |
| where to find all of the input values and where to put the result. Here's an example:</p> |
| |
| <pre><code>PatchpointValue* patchpoint = block->appendNew<PatchpointValue>(proc, Int32, Origin()); |
| patchpoint->append(ConstrainedValue(arg1, ValueRep::SomeRegister)); |
| patchpoint->append(ConstrainedValue(arg2, ValueRep::SomeRegister)); |
| patchpoint->setGenerator( |
| [&] (CCallHelpers& jit, const StackmapGenerationParams& params) { |
| jit.add32(params[1].gpr(), params[2].gpr(), params[0].gpr()); |
| });</code></pre> |
| |
| <p>This creates a patchpoint that just adds two numbers. The patchpoint is set to return |
| Int32. The two child values, arg1 and arg2, are passed to the patchpoint with the |
| SomeRegister constraint, which just requests that they get put in appropriate |
| registers (GPR for integer values, FPR for floating point values). The generator uses |
| the params object to figure out which registers the inputs are in (params[1] and |
| params[2]) and which register to put the result in (params[0]). Many sophisticated |
| constraints are possible. You can request that a child gets placed in a specific |
| register. You can list additional registers that are |
| clobbered - either at the top of the patchpoint (i.e. early) so that the clobbered |
| registers interfere with the inputs, or at the bottom of the patchpoint (i.e. late) so |
| that the clobbered registers interfere with the output. Patchpoint constraints also |
| allow you to force values onto the call argument area of the stack. Patchpoints are |
| powerful enough to implement custom calling conventions, inline caches, and |
| side-exits.</p> |
| |
| <p>A patchpoint is allowed to "side exit", i.e. abruptly exit from the procedure. If it |
| wants to do so by returning, it can use Procedure's API for getting the callee-save |
| register layout, unwinding the callee-saves, and then returning. More likely, the |
| patchpoint will take some exit state as part of its arguments, and it will manipulate |
| the call frame in place to make it look like another execution engine's call frame. |
| This is called OSR, and JavaScriptCore does it a lot.</p> |
| |
| <p>A patchpoint can be used as a terminal. Simply set the <code>terminal</code> bit:</p> |
| |
| <pre><code>patchpoint->effects.terminal = true;</code></pre> |
| |
| <p>The generator can determine where to branch by using the StackmapGenerationParams to |
| get the set of labels for all successors. You're guaranteed that the number of |
| successors of the patchpoint's basic block will be the same as when you created it. |
| However, like any value, a patchpoint may be cloned. This means, for example, that if |
| you use this to implement a table jump then the jump table must be created inside the |
| generator, so that you get one jump table per clone.</p> |
| |
| <p>Must use the PatchpointValue class with the Patchpoint opcode.</p> |
| </dd> |
| |
| <dt>T CheckAdd(T, T, [T2, [T3, ...]])</dt> |
| <dd>Checked integer addition. Works for T = Int32 or T = Int64. First first two children |
| are mandatory. Additional children are optional. All of the Check instructions take a |
| generator and value constraints like a Patchpoint. In the case of a CheckAdd, the |
| generator runs on the path where the integer addition overflowed. B3 assumes that |
| CheckAdd will side-exit upon overflow, so the generator must do some kind of |
| termination. Usually, this is used to implement OSR exits on integer overflow and the |
| optional arguments to CheckAdd will be the OSR exit state. Must use the CheckValue |
| class.</dd> |
| |
| <dt>T CheckSub(T, T, [T2, [T3, ...]])</dt> |
| <dd>Checked integer subtraction. Works for T = Int32 or T = Int64. First first two |
| children are mandatory. Additional children are optional. All of the Check |
| instructions take a generator and value constraints like a Patchpoint. In the case of a |
| CheckSub, the generator runs on the path where the integer subtraction overflowed. B3 |
| assumes that CheckSub will side-exit upon overflow, so the generator must do some kind |
| of termination. Usually, this is used to implement OSR exits on integer overflow and |
| the optional arguments to CheckSub will be the OSR exit state. You can use CheckSub for |
| checked negation by using zero for the first child. B3 will select the native negate |
| instruction when you do this. Must use the CheckValue class.</dd> |
| |
| <dt>T CheckMul(T, T, [T2, [T3, ...]])</dt> |
| <dd>Checked integer multiplication. Works for T = Int32 or T = Int64. First first two |
| children are mandatory. Additional children are optional. All of the Check |
| instructions take a generator and value constraints like a Patchpoint. In the case of a |
| CheckMul, the generator runs on the path where the integer multiplication overflowed. |
| B3 assumes that CheckMul will side-exit upon overflow, so the generator must do some |
| kind of termination. Usually, this is used to implement OSR exits on integer overflow |
| and the optional arguments to CheckMul will be the OSR exit state. Must use the |
| CheckValue class.</dd> |
| |
| <dt>Void Check(T, [T2, [T3, ...]])</dt> |
| <dd>Exit check. Works for T = Int32 or T = Int64. This branches on the first child. If |
| the first child is zero, this just falls through. If it's non-zero, it goes to the exit |
| path generated by the passed generator. Only the first child is mandatory. B3 assumes |
| that Check will side-exit when the first child is non-zero, so the generator must do |
| some kind of termination. Usually, this is used to implement OSR exit checks and the |
| optional arguments to Check will be the OSR exit state. Check supports efficient |
| compare/branch fusion, so you can Check for fairly sophisticated predicates. For |
| example, Check(Equal(LessThan(@a, @b), 0)) where @a and @b are doubles will be generated |
| to an instruction that branches to the exit if @a >= @b or if either @a or @b are |
| NaN. Must use the CheckValue class.</dd> |
| |
| <dt>Void WasmBoundsCheck(Int32, pinnedGPR, offset)</dt> |
| <dd>Special Wasm opcode. This branches on the first child. If the first child plus the offset |
| produces a Int64 less than to the pinnedGPR this falls through. Otherwise, it branches to |
| the exit path generated by the passed generator. Unlike the Patch/Check family, the |
| generator used by WasmBoundsCheck should be set on the Procedure itself. The GRPReg passed in |
| pinnedGPR must also be marked as pinned by calling the Procedure's pinning API, or it must be |
| InvalidGPR, in which case the out-of-bounds limit is 4GiB. In the later case a maximum parameter |
| can be provided, to further constrain the out-of-bounds limit and help generate smaller |
| immediates. B3 assumes the WasmBoundsCheck will side-exit when it branches, so the generator |
| must do some kind of termination. In Wasm this is used to trap and unwind back to JS. Must use |
| the WasmBoundsCheckValue class.</dd> |
| |
| <dt>Void Upsilon(T, ^phi)</dt> |
| <dd>B3 uses SSA. SSA requires that each variable gets assigned to only once anywhere in |
| the procedure. But that doesn't work if you have a variable that is supposed to be the |
| result of merging two values along control flow. B3 uses Phi values to represent value |
| merges, just like SSA compilers usually do. But while other SSA compilers represent the |
| inputs to the Phi by listing the control flow edges from which the Phi gets its values, |
| B3 uses the Upsilon value. Each Phi behaves as if it has a memory location associated |
| with it. Executing the Phi behaves like a load from that memory location. |
| Upsilon(@value, ^phi) behaves like a store of @value into the memory location associated |
| with @phi. We say "^phi" when we mean that we are writing to the memory location |
| associated with the Phi. Must use the UpsilonValue class.</dd> |
| |
| <dt>T Phi()</dt> |
| <dd>Works for any type except Void. Represents a local memory location large enough to |
| hold T. Loads from that memory location. The only way to store to that location is |
| with Upsilon.</dd> |
| |
| <dt>Void Jump(takenBlock)</dt> |
| <dd>Jumps to takenBlock. This must appear at the end of the basic block. The basic block |
| must have exactly one successor.</dd> |
| |
| <dt>Void Branch(T, takenBlock, notTakenBlock)</dt> |
| <dd>Works for T = Int32 or T = Int64. Branches on the child. If the child is non-zero, |
| it branches to the takenBlock. Otherwise it branches to the notTakenBlock. Must appear |
| at the end of the basic block. The block must have exactly two successors.</dd> |
| |
| <dt>Void Switch(T, cases...)</dt> |
| <dd>Works for T = Int32 or T = Int64. Switches on the child. Contains a list of switch |
| cases. Each switch case has an integer constant and a target block. The switch value |
| also contains a fall-through target in case the child has a value that wasn't mentioned |
| in the cases list. Must use the SwitchValue class. Must appear at the end of the basic |
| block. The block must have one successor for each case, plus a successor for the |
| fall-through (default) case. You can manage the successors of a block containing a Switch |
| using API in SwitchValue, like SwitchValue::appendCase() and |
| SwitchValue::setFallThrough().</dd> |
| |
| <dt>Void EntrySwitch()</dt> |
| <dd> |
| <p>B3 supports multiple procedure entrypoints. The way you create multiple entrypoints is |
| by placing an EntrySwitch at the end of the root block. The root block must then have a |
| successor for each entrypoint. Additionally, you must tell the Procedure how many |
| entrypoints you want. For example:</p> |
| <pre><code>Procedure proc; |
| proc.setNumEntrypoints(3); |
| BasicBlock* root = proc.addBlock(); |
| BasicBlock* firstEntrypoint = proc.addBlock(); |
| BasicBlock* secondEntrypoint = proc.addBlock(); |
| BasicBlock* thirdEntrypoint = proc.addBlock(); |
| root->appendNew<Value>(proc, EntrySwitch, Origin()); |
| root->appendSuccessor(firstEntrypoint); |
| root->appendSuccessor(secondEntrypoint); |
| root->appendSuccessor(thirdEntrypoint);</code></pre> |
| <p>This is the canonical way to use EntrySwitch, however the semantics are flexible enough |
| to allow its use anywhere in the control flow graph. You can have an arbitrary number of |
| EntrySwitches. This flexibility ensures that EntrySwitch works even when B3 does |
| transformations that move code above the EntrySwitch, duplicate the EntrySwitch itself, |
| or do any number of other unexpected things.</p> |
| <p>EntrySwitch behaves as if each Procedure has a variable called Entry. Each physical |
| entrypoint sets Entry to the index of that entrypoint (so 0, 1, or 2, above) and jumps to |
| the root block. EntrySwitch is just a switch on the Entry variable. Transformations over |
| code that uses EntrySwitch are valid so long as they don't change the procedure's |
| behavior under these semantics.</p> |
| <p>EntrySwitch is implemented without any actual variables or switching. Instead, all code |
| that lies on some path from the root block to some EntrySwitch is cloned for each |
| entrypoint. This lowering is done as a very late phase in Air, so most of the compiler |
| does not have to know anything about entrypoints. Most of the compiler treats EntrySwitch |
| as an opaque control flow construct. EntrySwitch lowering is allowed to clone an |
| arbitrary amount of code. However, normal uses of EntrySwitch will place it at the end of |
| an empty root block and B3 will only hoist a handful of things above EntrySwitch. This |
| leads to only a small amount of cloned code in practice.</p> |
| </dd> |
| |
| <dt>Void Return(T <i>(optional)</i>)</dt> |
| <dd> |
| <p> |
| Returns the control flow to the caller and terminates the procedure. |
| Must appear at the end of the basic block. The block must have zero successors. |
| </p> |
| <p> |
| If the node has a child, its value is returned. The type of the child can be any type except Void. |
| </p> |
| </dd> |
| |
| <dt>Void Oops()</dt> |
| <dd>Indicates unreachable code. This may be implemented as either a trap or as a bare |
| fall-through, since B3 is allowed to assume that this will never be reached. Must appear |
| at the end of the basic block. The block must have zero successors. Note that we also use |
| the Oops opcode to mean "no such opcode" in some B3 transformations.</dd> |
| </dl> |
| |
| <h2>Flags</h2> |
| |
| <p>This section describes flags. These may be set in |
| <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Kind.h"><code>Kind</code></a>.</p> |
| |
| <dl> |
| <dt>Chill</dt> |
| <dd><i>Applies to: Div, Mod.</i> You can create a chill Div/Mod by saying |
| <code>chill(Div)</code>. This creates a Kind that has the Chill flag set. This can only be |
| used with interer types. An operation is said to be chill if it returns a sensible value |
| whenever its non-chill form would have had undefined behavior. Chill Div turns x/0 into 0 |
| and -2<sup>31</sup>/-1 into -2<sup>31</sup>. We recognize this in IR because it's exactly |
| the semantics of division on ARM64, and it's also exactly the semantics that JavaScript |
| wants for "(x/y)|0". Chill Mod turns x%0 into 0 and -2<sup>31</sup>%-1 into 0. This matches |
| the semantics of JavaScript "(x%y)|0".</dd> |
| |
| <dt>Traps</dt> |
| <dd><i>Applies to: Load8Z, Load8S, Load16Z, Load16S, Load, Store8, Store16, Store.</i> You can |
| create a trapping Kind from an opcode by saying <code>trapping(opcode)</code>. For example, |
| <code>trapping(Load)</code>. An operation is said to be trapping if it will cause a page |
| fault when used on an invalid pointer and this page fault will be observed. This means that |
| the compiler cannot fudge when the page fault happens. This is logically equivalent to what |
| B3 calls <code>Effects::exitsSideways</code>, but further implies that if any of the B3 |
| values used to fuse an Air instruction were trapping, then the Air instruction must have its |
| <code>Air::Kind::traps</code> flag set. The compiler won't help you identify where you |
| trapped. Even if you use the compiler's origin facility to track down the trap location, you |
| may get the origin of any B3 value that was incorporated into the fused instruction that |
| caused the trap. For example, "Add(Load<Traps>(ptr), $1)" may claim to trap at the Add |
| rather than the Load on x86, because this pattern is a perfect candidate for add-load |
| fusion. Nevertheless, you are guaranteed to get the trap and the trap will be observed at |
| the point you intended. For example, the compiler will not hoist a trapping load past any |
| effects, even those outside of its read range, because the trap is presumed to read top. The |
| compiler will not attempt to DCE a trapping load. The compiler will not attempt to sink or |
| eliminate any trapping stores, even if they are dead because of a guaranteed subsequent |
| store to the same address, because we conservatively assume that the store was done for the |
| trap effect. This feature is meant to support high throughput memory safety checks in |
| WebAssembly.</dd> |
| </dl> |
| |
| </div> |
| </body> |
| </html> |
| |