Websites/webkit.org/docs/b3/assembly-intermediate-representation.html - WebKit - Git at Google

 <html>
 <head>
   <title>Assembly Intermediate Representation</title>
   <link rel="stylesheet" type="text/css" href="style.css">
 </head>
 <body>
   <div id="banner">
     <a href="http://www.webkit.org/" class="banner-link">
       <div id="logo" class="site-logo">
         WebKit
         <span class="tagline">Open Source Web Browser Engine</span>
       </div>
     </a>
   </div>
   <div id="contents">
     <h1><a href="index.html">Bare Bones Backend</a> / Assembly Intermediate Representation</h1>
     <p>The B3 compiler comprises two intermediate representations: a higher-level
       <a href="https://en.wikipedia.org/wiki/Static_single_assignment_form">SSA</a>-based
       representation called <a href="intermediate-representation.html">B3 IR</a> and a lower-level
       representation that focuses of machine details, like registers. This lower-level form is called
       Air (Assembly Intermediate Representation).</p>

     <p>Air programs are represented using a
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirCode.h"><code>Air::Code</code></a>
       object. <code>Code</code> comprises an array of
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirBasicBlock.h">basic
         blocks</a>. Each basic block comprises an array of
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirInst.h"><code>Inst</code></a>s.
       Air has an explicit control flow graph: each basic block has predecessor and successor blocks.
       Execution always begins at the first basic block (<code>code[0]</code>). The <code>Inst</code>s
       in each block are executed in order. Each <code>Inst</code> has an opcode, some flags, an
       array of arguments
       (<a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirArg.h"><code>Arg</code></a>s),
       and an origin. The opcode and flags are wrapped in a <code>Kind</code>, to make it
       convenient to carry them around together. The origin is simply a B3 IR
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Value.h"><code>Value</code></a>.
       Some opcodes use the origin for additional meta-data. This works because Air code always
       coexists with the B3 procedure from which it was generated.</p>

     <p>This document begins by describing the philosophy of Air. The behavior of <code>Arg</code>s is
       central to Air's execution model, which is described in the section that follows. The last
       section describes the way Air opcodes are defined.</p>

     <h2>Philosophy of Air</h2>

     <p>B3 is designed to be portable to many kinds of CPUs. Currently, it supports x86-64 and ARM64,
       which are quite different from each other. In B3 IR, we expose very few instruction set
       details. It's a goal of B3 IR to ensure that B3 values behave the same way except when the
       alternative would be counterproductive (like with pointer size, the corner-case behaviors of
       division, or calling convention customization). But to effectively compile code to different
       CPUs, the compiler has to eventually make instruction set details explicit. This is where Air
       comes in. B3 locks in most CPU-specific details at the moment of conversion to Air, and the Air
       code is irreversibly tied to some specific CPU.</p>

     <p>Air is an instruction <i>superset</i>: it recognizes all of the instructions from all CPUs
       that Air may target. In its lowest-level form, Air is simply a way of describing an assembly
       instruction sequence, and this includes CPU concepts like registers and direct accesses to the
       stack. Air also has a higher-level form in which the assembly has not yet undergone register
       or stack allocation. Therefore, Air also supports abstract registers (called
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirTmp.h"><code>Tmp</code></a>s)
       and abstract
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirStackSlot.h">stack
         slots</a>. A <code>Tmp</code> object can either hold an unallocated temporary or a
       register.</p>

     <h3>Air as an Instruction Superset</h3>
     <p>Air has syntax to speak of all of the CPU instructions we know about. It is possible to speak
       of an x86-64 instruction while compiling for ARM64, for example. Clients of Air, such as the B3
       to Air lowering phase, are allowed to pick any Air opcode and ask if that opcode would be
       valid on the current CPU. They are also allowed to check if specific forms of any given opcode
       are valid. This allows clients to optimize for multiple instruction sets by cascading through
       the possible opcodes that they know of, starting with the one they think is most efficient.
       Some of those opcodes may only be available on one CPU while others are available
       everywhere. Instruction selection does not need to know which instructions work on which CPUs;
       Air will tell you if some instruction happens to not be valid right now for whatever reason.</p>

     <p>Air opcodes support overloading. For example, the Add32 opcode has both two-operand and
       three-operand <i>overloads</i>, and those overloads have multiple <i>forms</i>: the first
       operand may or may not be permitted to be an immediate and depending on the CPU and some of the
       other operands may or may not be allowed to be memory addresses. We use <i>opcode overload</i>
       to refer to all forms of an opcode that share the same number of arguments, and <i>opcode
         form</i> to mean the number of arguments and their types. A fundamental Air operation is
       <code>Inst::isValidForm()</code>, which tells the client if the instruction's current form is
       valid on the current CPU. This may return false either because the Inst is not well-formed for
       any CPU or because it is not valid for the current CPU even though it may be valid on some
       other CPU. There is also <code>Air::isValidForm()</code>, which can answer if the form you are
       intending to use will be valid even if you have not created an <code>Inst</code> yet. This
       allows clients to generate Air by experimenting with different forms before settling on the one
       that the current CPU supports.</p>

     <h3>Air as a High-Level Assembly</h3>
     <p>Air doesn't require the client to perform register or stack allocation. Anywhere that Air
       accepts a register it will also accept a <code>Tmp</code>. Anywhere that Air accepts an address
       it will also accept a stack slot reference. Air code generation comprises a register allocator
       and a stack allocator, which turn <code>Tmp</code>s into <code>Reg</code>s and stack slots into
       addresses with the frame pointer (or stack pointer) as the base and some integer offset. Air
       allows clients to speak of registers directly even while also using <code>Tmp</code>s, and the
       register allocator will ensure that it avoids clobbering registers that the client code already
       relies upon. This is possible because Air has precise modeling of how instructions use
       registers, so it's always possible to determine which registers are live at any point in the
       Air code.</p>

     <p>Air's philosophy allows B3 to use it for converting high-level, mostly-CPU-agnostic SSA
       procedures into code for the current CPU. Air is an instruction superset that allows clients to
       consider all available instructions on all possible CPUs and query which forms of those
       instructions are available on the current CPU. Air also supports for high-level concepts like
       <code>Tmp</code>s and stack slots, which allows B3 to Air lowering to focus on which
       instructions to use without worrying about register allocation or stack layout.</p>

     <h2>Args and the Air Execution Model</h2>
     <p>Air can be thought of as an
       <a href="https://en.wikipedia.org/wiki/Orthogonal_instruction_set">orthogonal instruction
         set</a>. It's possible to construct an <code>Inst</code> with any combination of opcode and
       arguments. The opcode determines what Air will do to the arguments - it may read from them or
       write to them, for example. Orthognality implies that any argument that is read may be either
       a register (or <code>Tmp</code>), an address, or an immediate; while any argument that is
       written may be either a register or an address. Air constrains orthognality where the target
       CPU would. For example, none of Air's target CPUs would support an <code>Add32</code>
       instruction that loads its sources from memory <i>and</i> stores its result into memory. Even
       x86 doesn't go that far. Either before or after creating an <code>Inst</code>, the client can
       query if a particular combination of arguments (for example, three memory addresses) would be
       valid for a given opcode (for example, <code>Add32</code>).</p>

     <p>Air arguments are represented using the <code>Arg</code> object. <code>Arg</code> can
       represent any of the following assembly operands:</p>

     <dl>
       <dt>Tmp</dt>
       <dd>A <code>Tmp</code> represents either a register or a temporary.</dd>

       <dt>Imm, BigImm, BitImm, and BitImm64</dt>
       <dd>These are various kinds of immediates. We distinguish between big and small immediates
         because some instructions only allow immediates within a certain range. We distinguish
         between immediates for bit operations and immediates for all other operations because ARM
         has different constraints on the values of immediates depending on whether they are used for
         bit math.</dd>

       <dt>Addr, Stack, CallArg, and Index</dt>
       <dd>These are all memory addresses. Addr is a base-offset address, which uses a <code>Tmp</code>
         for the base and an immediate for the offset. Stack and CallArg are abstract stack offsets.
         Index is a base-index address, which has a pair of <code>Tmp</code>s (base and index) as well
         as an offset immediate and a scaling immediate for the index.</dd>

       <dt>RelCond, ResCond, and DoubleCond</dt>
       <dd>These are condition codes for various kinds of branches.</dd>

       <dt>Special</dt>
       <dd>Air allows certain <code>Inst</code>s to point to additional meta-data. The Special
         argument type is used for such meta-data. It holds a <code>Arg::Special*</code>.</dd>

       <dt>WidthArg</dt>
       <dd>Some special Air opcodes take operands that describe the width of the operation. Possible
         values are <code>Width8</code>, <code>Width16</code>, <code>Width32</code>, and
         <code>Width64</code>.</dd>
     </dl>

     <p>The opcode of an <code>Inst</code> combined with the overload - i.e. the number of arguments
       - determines what the <code>Inst</code> will do to each argument. The behavior of arguments
       comes down to three dimensions that are determined by the opcode and overload:</p>

     <dl>
       <dt>Role</dt>
       <dd>The role of an argument is an enum that describes the timing of when the argument is
         accessed, how it's accessed (use, a.k.a. read; or def, a.k.a write), how important that
         access is for performance (either <i>warm</i> or <i>cold</i>), and how writes affect the top
         bits (either ignores them or zero-fills them). The timing of an argument role is discussed
         further below. The performance requirements of an argument are used for register allocation
         prioritization. A warm argument is counted towards the register allocation priority
         heuristic, while a cold one isn't.</dd>

       <dt>Type</dt>
       <dd>Air recognizes two types, <code>GP</code> (general purpose) and <code>FP</code> (floating
         point). Arguments also have type. It's important to remember that there is both a type for
         the argument as determined by the opcode and overload, and a type of the argument itself.
         Some arguments are untyped, which means that they may be used regardless of the type desired
         by the opcode/overload. For example, addresses are untyped. Other arguments have specific
         type, like registers and <code>Tmp</code>s. Except for <code>BigImm</code>, immediates are
         <code>GP</code>.</dd>

       <dt>Width</dt>
       <dd>The amount of bits, starting at lowest-order, that the instruction affects. Possible values
         are <code>Width8</code>, <code>Width16</code>, <code>Width32</code>, and
         <code>Width64</code>.</dd>
     </dl>

     <p>The timing of an argument role is important, and brings us to the order of execution of an
       <code>Inst</code>. Each <code>Inst</code> can be thought of as executing three
       steps:</p>

     <ol>
       <li>Perform <i>early</i> actions.</li>
       <li>Perform <i>primary</i> actions.</li>
       <li>Perform <i>late</i> actions.</li>
     </ol>

     <p>The early actions of one instruction happen immediately after the late actions of
       the instruction before it. However, many Air analyses view them as happening at the same time.
       For example, any register usage in the early action of one instruction interferes with the
       register usage in the late action of the instruction that came before it. All of Air's
       liveness and interference analyses reason about the
       <a href="https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error"><i>fence posts</i></a>
       between instructions, where the late actions of the previous instruction and the early actions
       of the next form an interference clique.</p>

     <p>Let's consider a simple example, like <code>Add32</code> with two arguments. Let's say that
       the first argument is a memory location and the second argument is a register. Air uses the
       <a href="https://en.wikipedia.org/wiki/X86_assembly_language#Syntax">AT&amp;T style</a> of
       placing the destination argument last for most instructions. This Add32 loads from memory and
       adds the value to the register. Air writes this as:</p>

     <pre><code>Add32 42(%rax), %rcx</code></pre>

     <p>This instruction will proceed in three steps:</p>

     <ol>
       <li>Load the value at offset 42 from the memory address in <code>%rax</code>. The result is
         stored in an internal, hidden CPU location for the remainder of execution. Even if the
         instruction later stores to memory and overwrites this value, <code>Add32</code> will still
         use the original value it had loaded. We say that this is an <i>early use</i>. At the same
         time, the CPU will load the value of <code>%rcx</code> and store it in a hidden CPU
         location. This is also an early use.</li>
       <li>Add the two values together. Store the result in hidden CPU location.</li>
       <li>Zero-extend the resulting value and store it into <code>%rcx</code>. This is a <i>late
           def with zero extension</i>.</li>
     </ol>

     <p>Therefore, the two-argument overload of <code>Add32</code> ascribes the following to its
       arguments:</p>

     <ul>
       <li>The roles are <i>early warm use</i> for the first argument (<code>42(%rax)</code>) and
         <i>early warm use with late warm def with zero extension</i> for the second argument
         (<code>%rcx</code>). Early warm use is written as <code>Use</code> for short. Early warm
         use with late warm def with zero extension is written as <code>UseZDef</code> for
         short.</li>
       <li>The types of both arguments are <code>GP</code>. This matches <code>42(%rax)</code> because
         addresses match any type. This matches <code>%rcx</code> because it's a general-purpose
         register.</li>
       <li>The widths of both arguments are <code>Width32</code>. Combined with <code>UseZDef</code>,
         this means that the instruction will read the low 32 bits of <code>%rcx</code> in the early
         actions and it will store to <i>all</i> bits of <code>%rcx</code> in the late actions, but it
         will ensure that all but the low 32 bits are zero.</li>
     </ul>

     <p>Air can tell you what role, type, and width is ascribed to each argument by using the
       <code>Inst::forEachArg(func)</code> operation. It takes a callback of type
       <code>void(Arg&, Arg::Role, Arg::Type, Arg::Width)</code>. For our <code>Add32</code> example,
       this callback would get called twice:</p>

     <ol>
       <li><code>func(42(%rax), Use, GP, Width32)</code></li>
       <li><code>func(%rcx, UseZDef, GP, Width32)</code></li>
     </ol>

     <p>It's important to remember that Air's summaries of what instructions do to arguments are
       not meant to be exhaustive. For example, if an instruction claims to use an address, this
       tells you that the instruction will perform a load but it tells you nothing about how that
       load will be performed. This means that unless you know exactly what it means to use/def an
       argument, you cannot perform this transformation:</p>

     <ul>
       <li>Before:
         <pre><code>Foo32 (%rax)</code></pre></li>
       <li>After:
         <pre><code>Move32 (%rax), %tmp
 Foo32 %tmp</code></pre></li>
     </ul>

     <p>Even if you know that Foo32 only uses its argument, you cannot do this because Move32 may
       not load from the address using exactly the same kind of load that Foo32 would have done.
       Memory accesses have many dimensions of options: alignment semantics (if you're lucky then
       misaligned accesses run fast but sometimes they ignore low bits, trap if unaligned, or run
       super slow when unaligned, and this behavior may depend on the opcode), temporality and
       memory ordering, determinism of trapping, etc. Just seeing that an instruction uses an
       address does not tell you what kind of load will happen, and currently Air does not have the
       ability to answer such questions. Fortunately, Air does not have much need to move memory
       accesses out of instructions. Uses and defs of temporaries, registers, immediates, and spill
       slots don't have these caveats and so those arguments can be moved from one instruction to
       another without worries.</p>

     <p>Air's introspection of <code>Inst</code>s tends to be quite fast thanks to the use of template
       specialization and C++ lambdas. The <code>forEachArg()</code> template method uses an efficient
       arrangement of switch statements to determine the opcode and overload. If <code>func</code> is
       a C++ lambda, we expect <code>forEachArg()</code> to be specialized for that lambda. Therefore,
       this idiom avoids virtual dispatch or memory allocation.</p>

     <p>Air supports exotic roles, such as late uses and early defs. There is even the
       <code>Scratch</code> role, which means early def and late use. Speaking of a <code>Tmp</code>
       in the <code>Scratch</code> role means that the <code>Tmp</code> will be assigned a register
       that is guaranteed to not interfere with any of the other registers that the instruction
       speaks of. Late uses and early defs are crucial for patchpoints, which may require
       that one of the incoming values be given a register that does not interfere with whatever
       register is used for the result. This can be expressed either as giving the inputs a late use
       role or by giving the outputs an early def role. The full list of possible roles is:</p>

     <dl>
       <dt>Use</dt>
       <dd>Early warm use.</dd>

       <dt>ColdUse</dt>
       <dd>Early cold use.</dd>

       <dt>LateUse</dt>
       <dd>Late warm use.</dd>

       <dt>LateColdUse</dt>
       <dd>Late cold use.</dd>

       <dt>Def</dt>
       <dd>Late def. <i>Note that all defs are warm.</i></dd>

       <dt>ZDef</dt>
       <dd>Late def with zero-extension.</dd>

       <dt>UseDef</dt>
       <dd>Early warm use and late def.</dd>

       <dt>UseZDef</dt>
       <dd>Early warm use and late def with zero extension.</dd>

       <dt>EarlyDef</dt>
       <dd>Early def.</dd>

       <dt>Scratch</dt>
       <dd>Early def and late warm use.</dd>

       <dt>UseAddr</dt>
       <dd>Early warm use of the address's components.</dd>
     </dl>

     <p><code>UseAddr</code> is interesting for the <code>Lea</code> (load effective address)
       instruction, which evaluates the address and places the result into a temporary or register.
       The argument must be an address, but <code>UseAddr</code> means that we don't actually read
       from the address. Note that using an address in any of the other roles always implies that the
       components of the address are used early and warm (i.e. <code>Use</code>).</p>

     <p>Air arguments are central to Air's execution model. The early and late actions of an
       instruction have to do with arguments, and what happens to each argument during the early and
       late actions is determined by the opcode and the number of arguments (i.e. the overload).
       Clients of Air may create an <code>Inst</code> with any combination of opcode and arguments
       and then query, using <code>isValidForm()</code> if the opcode, overload, and specific
       arguments are valid for the current CPU.</p>

     <h2>Defining Air</h2>
     <p>Air has many opcodes and the opcodes have many different overloads and forms. Air makes it
       easy to reason about all of them with helpers like <code>isValidForm()</code> and
       <code>forEachArg()</code>. It also provides a <code>Inst::generate()</code> function that will
       generate code for the instruction, provided that it does not use any non-register
       <code>Tmp</code>s or any abstract stack slots. If we wrote the
       code for validating, iterating, and generating each form by hand, we would have a bad time.
       For this reason, Air comes with an
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/opcode_generator.rb">opcode code generator</a>
       that uses a
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes">opcode definition file</a>
       as input. The opcode definition file use a simple and concise syntax that lets us define many
       opcodes at once and constrain them to the CPU kinds that support them. Additionally, Air
       supports <i>custom</i> opcodes, where the code generator emits calls to
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirCustom.h">hand-written
         C++ code</a>. This section describes the opcode definition language.</p>

     <p>It's easiest to understand the opcode definitions with an example. Let's use the two-argument
       overload of <code>Add32</code>.</p>

     <pre><code>Add32 U:G:32, UZD:G:32
     Tmp, Tmp
     x86: Imm, Addr
     x86: Imm, Index
     Imm, Tmp
     x86: Addr, Tmp
     x86: Tmp, Addr
     x86: Tmp, Index</code></pre>

     <p>The first line defines the overload. It has two arguments. The first argument serves the
       <code>Use</code> role, shorted as <code>U</code>. It is general-purpose, shortened as
       <code>G</code>. It has a 32-bit width. Hence the string <code>U:G:32</code>. Similarly,
       <code>UZD:G:32</code> means <code>UseZDef</code>, <code>GP</code>, <code>Width32</code>.</p>

     <p>The next lines list the available forms of the overload. A form is a list of possible kinds of
       arguments. These use the same terminology for <code>Arg</code> kinds from the previous section,
       with the caveat that <code>Addr</code> implies that <code>Addr</code>, <code>Stack</code>, or
       <code>CallArg</code> would be accepted.</p>

     <p>Prefixing any line with <code>x86:</code> means that this form is only available on x86
       CPUs, such as x86 or x86-64.</p>

     <p>Air opcodes are designed to work with JavaScriptCore's existing MacroAssembler. By default, an
       opcode is automatically given a code generator that calls
       <code>MacroAssembler::<i>opcodeName</i></code>, where <i>opcodeName</i> is derived by
       lower-casing the first letter of the Air opcode name. <code>Add32</code> becomes
       <code>MacroAssembler::add32</code>, for example.</p>

     <p>See the header of
       <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes">AirOpcode.opcodes</a>
       for a complete list of shorthand used by Air's opcode definition language.</p>

     <h2>Summary</h2>
     <p>Air is designed around JavaScriptCore's existing MacroAssembler. Air has Inst objects,
       which each describe some method call to the MacroAssembler: an Inst's opcode indicates
       which method name to use and its args indicate the arguments to pass to that method. We
       use code generation to create a massive switch statement that turns the reflective Insts
       into actual calls to MacroAssembler. Consequently, we can "add" new instructions to Air
       usually by just editing the <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes">AirOpcode.opcodes</a>
       file.</p>
   </div>
 </body>
 </html>