NaplesPU LLVM Documentation

From NaplesPU Documentation
Revision as of 09:32, 31 March 2019 by Francesco (ISA Support)

The main task of the backend is to generate nu+ assembly code from the LLVM IR produced by the Clang frontend. It also handles the object representation of the classes needed to create the assembler and the disassembler. The nu+ backend is contained in the NuPlus folder under the "compiler/lib/Target" directory. This folder contains several files, each implementing a specific class of the LLVM framework.

An LLVM backend is built from two types of source files: C++ and TableGen. Refer to the TableGen section for a detailed explanation of the latter.

Required reading

Before working on LLVM, you should be familiar with the following concepts:

  1. Basic Blocks
  2. SSA (Static Single Assignment) form
  3. AST (Abstract Syntax tree)
  4. DAG (Directed Acyclic Graph)

In addition to general aspects about compilers, it is recommended to review the following topics:

  1. LLVM architecture
  2. LLVM Intermediate Representation

For further information, see the textbooks Getting Started with LLVM Core Libraries and LLVM Cookbook.

See also this article to get an overview of the main CodeGenerator phases.

TableGen

TableGen is a record-oriented language used to describe target-specific information. It was developed by the LLVM team to simplify back-end development and to avoid code redundancy. For example, if some feature of the target register file changes, you do not need to modify every file where the register appears: you only need to modify the .td file that contains its definition. In practice, TableGen is used to define instruction formats, instructions, registers, pattern-matching DAGs, instruction selection matching order, calling conventions, and target platform properties.

For more information, see the TableGen Documentation.

Backend Description

This section shows how the backend support for nu+ is implemented within LLVM.

Target Definition

The target-specific information is described in TableGen files. The custom target is defined by creating a new NuPlus.td file, which describes the target itself. This file implements the target-independent interfaces provided by Target.td, using the class inheritance mechanism.

The code below shows the Target class, declared in Target.td, which must be instantiated in NuPlus.td.

class Target {
  InstrInfo InstructionSet;
  list<AsmParser> AssemblyParsers = [DefaultAsmParser];
  list<AsmParserVariant> AssemblyParserVariants = [DefaultAsmParserVariant];
  list<AsmWriter> AssemblyWriters = [DefaultAsmWriter];
}

NuPlus.td should also include the other target-related .td files. The target processor is defined as follows:

def : Processor<"nuplus", NoItineraries, []>;

where Processor is a class defined in Target.td:

class Processor<string n, ProcessorItineraries pi, list<SubtargetFeature> f>

where:

  • n is the chipset name, used in the command line option -mcpu to determine the appropriate chip.
  • pi is the processor itinerary, as described in the theoretical description of the LLVM instruction scheduling phase. NoItineraries means that no itinerary is defined.
  • f is a list of target features.

Registers Definition

The target-specific registers are defined in NuPlusRegisterInfo.td. LLVM provides several ways to define a register, all declared in Target.td. The first one is used to define a simple scalar register and follows the declaration below:

class Register<string n, list<string> altNames = []>

where n is the register name, while altNames is a list of register alternative names.

The second method to define a register is to inherit from the class declared below:

class RegisterWithSubRegs<string n, list<Register> subregs>

This second way is used to define a register composed of a list of sub-registers. A third way consists of defining a super-register, that is, a pseudo-register resulting from the combination of other sub-registers. It is useful when there is no architectural support for larger registers:

class RegisterTuples<list<SubRegIndex> Indices, list<dag> Regs>

For example, the following code declares the class from which the target registers are instantiated:

class MyTargetGPRReg<string n> : Register<n>;

At this point, registers can be instantiated as follows:

foreach i = 0-57 in {
  def S#i : MyTargetGPRReg<"s"#i>, DwarfRegNum<[i]>;
}
...
def SP_REG : MyTargetGPRReg<"sp">, DwarfRegNum<[61]>; // stack pointer
...
foreach i = 0-63 in {
  def V#i : MyTargetGPRReg<"v"#i>, DwarfRegNum<[!add(i, 64)]>;
}

The instantiation reflects the custom target hardware architecture, with a set of scalar registers and a set of vectorial ones. Each register also inherits from DwarfRegNum, which assigns it an incremental number. This number is used for the internal identification of registers, consistently with the DWARF debugging standard.
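To make the numbering concrete, the following C++ sketch maps the assembly names used above to the DWARF numbers they receive: S0-S57 map to 0-57, SP_REG to 61, and V0-V63 to 64-127 through !add(i, 64). It is only an illustration and is not part of the backend. The function name dwarfNumber is hypothetical, and the registers hidden behind the "..." elisions are deliberately not modelled.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: returns the DWARF number assigned by the TableGen
// definitions above, or std::nullopt for names this sketch does not model.
std::optional<unsigned> dwarfNumber(const std::string &reg) {
  if (reg == "sp")
    return 61u; // SP_REG : ... DwarfRegNum<[61]>
  if (reg.size() < 2 || (reg[0] != 's' && reg[0] != 'v'))
    return std::nullopt;
  if (reg.size() > 2 && reg[1] == '0')
    return std::nullopt; // reject leading zeros such as "s07"
  unsigned idx = 0;
  for (std::size_t i = 1; i < reg.size(); ++i) {
    if (reg[i] < '0' || reg[i] > '9')
      return std::nullopt;
    idx = idx * 10 + static_cast<unsigned>(reg[i] - '0');
    if (idx > 63)
      return std::nullopt; // outside both foreach ranges
  }
  if (reg[0] == 's') {
    if (idx > 57)
      return std::nullopt; // scalar range is 0-57
    return idx;            // DwarfRegNum<[i]>
  }
  return idx + 64;         // DwarfRegNum<[!add(i, 64)]>
}
```

Keeping the vector numbers offset by 64 means scalar and vector registers never share a DWARF number, so a debugger can tell them apart from the number alone.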

Once the registers are defined, they must be grouped into register classes, which determine how they are used during register allocation.

LLVM provides the RegisterClass class, defined as below:

class RegisterClass<string namespace, list<ValueType> regTypes, int alignment, dag regList>

in which:

  • namespace is the namespace associated with the class;
  • regTypes is a list of ValueType values indicating the types of variables that can be allocated into its registers;
  • alignment is the alignment associated with the registers in the RegisterClass;
  • regList is the list of registers that belong to the class. Since the parameter type is dag, TableGen provides set operators (such as add and sequence) to build the list of registers.

For example, to match our target features it is necessary to define two classes:

  • GPR32, the abstraction of 32-bit wide scalar registers;
  • VR512W, the abstraction of 512-bit wide vectorial registers.

def GPR32 : RegisterClass<"MyTarget", [i32, f32, i64, f64], 32, (add (sequence "S%u", 0, 57),
  TR_REG, MR_REG, FP_REG, SP_REG, RA_REG, PC_REG)>;
def VR512W : RegisterClass<"MyTarget", [v16i32, v16f32, v16i8, v16i16], 512, (sequence "V%u", 0, 63)>;
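The (sequence "S%u", 0, 57) operator in the regList expands to the register names S0 through S57, which are produced by formatting each index with a printf-style pattern. The C++ sketch below reproduces that expansion. The function name sequence is illustrative only, and it handles ascending ranges only.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Sketch of what TableGen's (sequence "S%u", first, last) operator yields:
// one register name per index in [first, last], formatted with the pattern.
std::vector<std::string> sequence(const char *fmt, unsigned first, unsigned last) {
  std::vector<std::string> names;
  for (unsigned i = first; i <= last; ++i) {
    char buf[32];
    std::snprintf(buf, sizeof buf, fmt, i); // e.g. "S%u" with 7 -> "S7"
    names.emplace_back(buf);
  }
  return names;
}
```

With this expansion, GPR32 contains the 58 registers S0-S57 plus the six special registers listed after the sequence, 64 in total. VR512W contains the 64 registers V0-V63.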

Calling Convention

This section shows how to define the calling convention, that is, how parameters are passed to called functions and how the return value is sent back to the caller.

The calling convention is defined in the NuPlusCallingConv.td by using the classes defined in the TargetCallingConv.td file.

For our purposes, the calling convention must be defined in terms of the registers used to pass arguments to the callee. LLVM provides the CallingConv class, defined below:

class CallingConv<list<CCAction> actions>

This class requires a list of CCAction. TargetCallingConv.td contains a set of classes derived from CCAction that are used to define the calling behaviour.

The nu+ calling convention passes arguments in the first eight registers and the remaining ones on the stack. Scalar 32-bit arguments are passed to the callee in registers Si, with i = 0..7. Vectorial arguments follow the same scheme with registers V0-V7. Because the register rules are wrapped in CCIfNotVarArg, arguments of variadic functions are always passed on the stack.

The calling convention must also handle arguments whose type is not natively supported by the target. The solution adopted is type promotion, which converts the unsupported type to a supported one. It can be implemented using only the CCAction classes provided by LLVM. With this approach, i1, i8 and i16 are promoted to i32, while v16i8 and v16i16 are promoted to v16i32.

def CC_NuPlus : CallingConv<[
  CCIfType<[i1, i8, i16], CCPromoteToType<i32>>,
  CCIfType<[v16i8, v16i16], CCPromoteToType<v16i32>>,
  CCIfNotVarArg<CCIfType<[i32, f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7]>>>,
  CCIfNotVarArg<CCIfType<[v16i32, v16f32], CCAssignToReg<[V0, V1, V2, V3, V4, V5, V6, V7]>>>,
  CCIfType<[i32, f32], CCAssignToStack<4, 4>>,
  CCIfType<[v16i32, v16f32], CCAssignToStack<64, 64>>
]>;
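LLVM generates the actual C++ code for CC_NuPlus from the .td description. To show what the rules above decide, the sketch below walks through the arguments in order, applying them to each one. It assumes a simplified VT enumeration and an ArgLoc struct, both hypothetical. It models promotion, the separate S0-S7 and V0-V7 register pools, the CCIfNotVarArg guard, and CCAssignToStack, which rounds the stack offset up to the alignment before allocating.

```cpp
#include <string>
#include <vector>

// Simplified value types appearing in CC_NuPlus (hypothetical enumeration).
enum class VT { i1, i8, i16, i32, f32, v16i8, v16i16, v16i32, v16f32 };

// Where an argument ends up: a register name or a byte offset on the stack.
struct ArgLoc {
  bool inReg;
  std::string reg;      // valid when inReg is true
  unsigned stackOffset; // valid when inReg is false
};

// CCPromoteToType: i1/i8/i16 -> i32, v16i8/v16i16 -> v16i32.
VT promote(VT t) {
  switch (t) {
  case VT::i1: case VT::i8: case VT::i16: return VT::i32;
  case VT::v16i8: case VT::v16i16: return VT::v16i32;
  default: return t;
  }
}

bool isVector(VT t) { return t == VT::v16i32 || t == VT::v16f32; }

// Hypothetical model of CC_NuPlus: the rules are tried in order per argument.
std::vector<ArgLoc> assignArgs(const std::vector<VT> &args, bool isVarArg) {
  std::vector<ArgLoc> locs;
  unsigned nextS = 0, nextV = 0; // next free register in S0-S7 / V0-V7
  unsigned offset = 0;           // current stack offset in bytes
  for (VT t : args) {
    t = promote(t);
    const bool vec = isVector(t);
    unsigned &next = vec ? nextV : nextS;
    // CCIfNotVarArg<CCIfType<..., CCAssignToReg<...>>>
    if (!isVarArg && next < 8) {
      locs.push_back({true, std::string(vec ? "V" : "S") + std::to_string(next), 0});
      ++next;
      continue;
    }
    // CCAssignToStack<4, 4> for scalars, CCAssignToStack<64, 64> for vectors.
    const unsigned size = vec ? 64 : 4;
    offset = (offset + size - 1) / size * size; // align (Align == Size here)
    locs.push_back({false, "", offset});
    offset += size;
  }
  return locs;
}
```

For example, the argument list (i32, v16i8, i8) is assigned to S0, V0 and S1. The two pools are independent, so vector arguments do not consume scalar registers.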

For return values, the nu+ calling convention uses the first six scalar registers, S0-S5.

A return type may also not correspond to any native target type. In that case, it is promoted as well:

def RetCC_NuPlus32 : CallingConv<[
  CCIfType<[i1, i8, i16], CCPromoteToType<i32>>,
  CCIfType<[i32, f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5]>>
]>;

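LLVM also provides a mechanism to define the callee-saved registers, that is, the registers whose values the callee must preserve. If the callee modifies them, it must save them before use and restore them before returning. The mechanism consists of defining an instance of the class CalleeSavedRegs: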

class CalleeSavedRegs<dag saves>

where saves is the list of registers to be saved.

For example, in nu+, the callee saved registers are defined as follows:

def NuPlusCSR : CalleeSavedRegs<(add MR_REG, FP_REG, RA_REG)>;
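The following sketch illustrates what the callee-saved list implies in practice. It assumes, as LLVM's prologue/epilogue insertion roughly does by default, that a function only spills the callee-saved registers it actually writes. The set of written registers is an input supplied by the caller of the sketch, and the names kCalleeSaved and regsToSave are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Callee-saved registers from NuPlusCSR, in declaration order.
const std::vector<std::string> kCalleeSaved = {"MR_REG", "FP_REG", "RA_REG"};

// Sketch: given the registers a function body writes, return the
// callee-saved ones that its prologue must save (and its epilogue restore).
std::vector<std::string> regsToSave(const std::set<std::string> &written) {
  std::vector<std::string> out;
  for (const auto &r : kCalleeSaved)
    if (written.count(r) != 0)
      out.push_back(r);
  return out;
}
```

A leaf function that touches only S0 saves nothing. A function that makes a call overwrites RA_REG and therefore has to save it.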

ISA Support

To define LLVM support for nu+, the next step is to handle target-specific code generation by implementing the nu+ ISA (Instruction Set Architecture) in LLVM. To describe the ISA, TableGen provides the Instruction class, located in Target.td.

class Instruction {
  string Namespace = "";
  dag OutOperandList;       
  dag InOperandList;     
  string AsmString = ""; 
  list<dag> Pattern;
  list<Register> Uses = []; 
  list<Register> Defs = [];
  list<Predicate> Predicates = [];
  int Size = 0;
  bit isReturn     = 0;     
  bit isBranch     = 0;    
  bit isPseudo     = 0;    
  bit isCodeGenOnly = 0;
  bit isAsmParserOnly = 0;
  ...
}

The Instruction class provides many member variables to describe an instruction. The following subset is used for our purposes:

  • OutOperandList is the set of output operands of the instruction, of dag type.
  • InOperandList is the set of input operands of the instruction, of dag type.
  • AsmString is the assembly (.s) format used to print the instruction.
  • isReturn marks the instruction as a return instruction.
  • isBranch marks the instruction as a branch, that is, an instruction that breaks the sequential control flow.
  • isPseudo marks the instruction as a pseudo-instruction, that is, an instruction with no corresponding machine instruction.
  • isCodeGenOnly, isAsmParserOnly and usesCustomInserter also describe pseudo-instructions, each with a different behaviour. The first marks a pseudo-instruction used for code generation modelling purposes. The second marks a pseudo-instruction used by the assembly parser. The third indicates that the pseudo-instruction must be manually lowered to machine instructions.
  • Uses and Defs are the lists of registers that are read and written by the instruction, respectively.
  • Predicates are logical expressions that must be checked in the Instruction Selection phase.
  • Pattern is a list of dag values, each describing a DAG pattern that the instruction selector uses to turn a set of SelectionDAG nodes into the corresponding target-specific instruction.

To handle the complexity of the target ISA, a hierarchical scheme of classes has been adopted, with each class representing a specific instruction format.

First, a generic format for target-specific instructions is defined:

class InstNuPlus<dag outs, dag ins, string asmstr, list<dag> pattern>
          : Instruction {
  field bits<32> Inst;
  let Namespace = "MyTarget";
  let Size = 4;
  dag OutOperandList = outs;
  dag InOperandList = ins;
  let AsmString   = asmstr;
  let Pattern = pattern;
}


All instruction classes inherit from the one above, each specifying what the instruction bits represent. Since the target supports several instruction formats, a different class is defined for each of them, and each class is further refined into sub-classes. For example, consider the R format, which is used for logical, arithmetic and memory operations. Within this format, different sub-types can be distinguished by structuring a hierarchy.

We therefore define a set of classes implementing this hierarchy, from the root, which represents the most generic R-type instruction, down to the leaves. Since the instruction format requires a set of FMT bits, a class to handle them is also needed:

class Fmt<bit val> {
  bit Value = val;
}

The root class is defined as the most general class of the hierarchy. For this reason, the second source operand is not defined. The derived class, implementing the two-operand R-type instruction, will properly set the related field.

class FR<dag outs, dag ins, string asmstr, list<dag> pattern, bits<6> opcode, Fmt fmt2, Fmt fmt1, Fmt fmt0>
   : InstNuPlus<outs, ins, asmstr, pattern> {
  bits <6> dst;
  bits <6> src0;
  let Inst{31-30} = 0;
  let Inst{29-24} = opcode;
  let Inst{23-18} = dst;
  let Inst{17-12} = src0;
  let Inst{5} = 0; //unused
  let Inst{4} = 0; //No 64-bit support
  let Inst{3} = fmt2.Value;
  let Inst{2} = fmt1.Value;
  let Inst{1} = fmt0.Value;
  let Inst{0} = 1;
  let Uses=[MR_REG];
}

class FR_TwoOp<dag outs, dag ins, string asmstr, list<dag> pattern, bits<6> opcode, Fmt fmt2, Fmt fmt1, Fmt fmt0>
   : FR<outs, ins, asmstr, pattern, opcode, fmt2, fmt1, fmt0> {
  bits <6> src1;
  let Inst{11-6} = src1;
  
}


class FR_OneOp<dag outs, dag ins, string asmstr, list<dag> pattern, bits<6> opcode, Fmt fmt2, Fmt fmt1, Fmt fmt0>
   : FR_TwoOp<outs, ins, asmstr, pattern, opcode, fmt2, fmt1, fmt0> {
   let Inst{11-6} = 0;
}

The code above implements the class hierarchy just described. FR_TwoOp places the second source operand in bits 11-6, while FR_OneOp forces those bits to zero. All the other instruction formats are implemented in the same way.
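The R-format bit layout can be checked with a small C++ encoder that mirrors the let Inst{...} assignments of FR, FR_TwoOp and FR_OneOp. This is a sketch, not the backend's generated MC code emitter, and the function names are hypothetical. The fmt arguments are the raw Fmt bit values, because this excerpt does not state which value Fmt_S or a vector format uses.

```cpp
#include <cstdint>

// Hypothetical encoder for FR_TwoOp, following the field layout above.
std::uint32_t encodeFR(unsigned opcode, unsigned dst, unsigned src0,
                       unsigned src1, bool fmt2, bool fmt1, bool fmt0) {
  std::uint32_t inst = 0;           // Inst{31-30} = 0
  inst |= (opcode & 0x3Fu) << 24;   // Inst{29-24} = opcode
  inst |= (dst & 0x3Fu) << 18;      // Inst{23-18} = dst
  inst |= (src0 & 0x3Fu) << 12;     // Inst{17-12} = src0
  inst |= (src1 & 0x3Fu) << 6;      // Inst{11-6}  = src1
  // Inst{5} = 0 (unused), Inst{4} = 0 (no 64-bit support)
  inst |= static_cast<std::uint32_t>(fmt2) << 3; // Inst{3}
  inst |= static_cast<std::uint32_t>(fmt1) << 2; // Inst{2}
  inst |= static_cast<std::uint32_t>(fmt0) << 1; // Inst{1}
  inst |= 1u;                                    // Inst{0} = 1
  return inst;
}

// FR_OneOp: same layout, with Inst{11-6} forced to zero.
std::uint32_t encodeFR_OneOp(unsigned opcode, unsigned dst, unsigned src0,
                             bool fmt2, bool fmt1, bool fmt0) {
  return encodeFR(opcode, dst, src0, 0, fmt2, fmt1, fmt0);
}
```

For instance, opcode 1, dst 2, src0 3, src1 4 and all FMT bits cleared encode to 0x01083101. The corresponding one-operand form encodes to 0x01083001.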

However, plain classes are not a flexible way to express the commonalities among more than two related definitions. To address this, LLVM provides the multiclass construct. For example, if an arithmetic instruction can operate either on two registers or on a register and an immediate value, the class construct forces an implementation like the one below.

class rrinst<int opc, string asmstr>
  : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
         (ops GPR:$dst, GPR:$src1, GPR:$src2)>;
class riinst<int opc, string asmstr>
  : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
         (ops GPR:$dst, GPR:$src1, Imm:$src2)>;
def ADD_rr : rrinst<0b111, "add">;
def ADD_ri : riinst<0b111, "add">;
...

Using plain classes in this context quickly becomes impractical, since every variant has to be written out by hand. The multiclass construct provides a much cleaner approach:

multiclass ri_inst<int opc, string asmstr> {
  def _rr : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
                 (ops GPR:$dst, GPR:$src1, GPR:$src2)>;
  def _ri : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
                 (ops GPR:$dst, GPR:$src1, Imm:$src2)>;
}
defm ADD : ri_inst<0b111, "add">;
...
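To see what a single defm line produces, the C++ sketch below models the expansion of defm ADD : ri_inst<0b111, "add">. TableGen prepends the defm name to each inner def name, which yields the records ADD_rr and ADD_ri. Each record shares the opcode and the concatenated assembly string, and the two differ only in the operand list. The InstRecord struct and the ri_inst function are hypothetical stand-ins for the TableGen records.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the records created by the multiclass.
struct InstRecord {
  std::string name;
  int opc;
  std::string asmString;
  std::string operands;
};

// Sketch of `defm <defmName> : ri_inst<opc, asmstr>`: one record per inner def.
std::vector<InstRecord> ri_inst(const std::string &defmName, int opc,
                                const std::string &asmstr) {
  const std::string asmFull = asmstr + " $dst, $src1, $src2"; // !strconcat
  return {
      {defmName + "_rr", opc, asmFull, "GPR:$dst, GPR:$src1, GPR:$src2"},
      {defmName + "_ri", opc, asmFull, "GPR:$dst, GPR:$src1, Imm:$src2"},
  };
}
```

Adding another variant, for example a memory operand form, then only requires one more def inside the multiclass. Every defm line picks it up automatically.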

In the nu+ backend, the multiclass construct is used to handle the different register types of arithmetic operations. Rather than defining a separate class for each combination of scalar and vectorial source operands, a single multiclass groups all the variants:

multiclass FArithInt_TwoOp<string operator, SDNode OpNode, bits<6> opcode> {
  def SSS_32 : FR_TwoOp<
    (outs GPR32:$dst),
    (ins GPR32:$src0, GPR32:$src1),
    operator # "_i32 $dst, $src0, $src1",
    [(set i32:$dst, (OpNode i32:$src0, i32:$src1))],
    opcode,
    Fmt_S,
    Fmt_S,
    Fmt_S>;