NaplesPUISelLowering.cpp

From NaplesPU Documentation
Revision as of 15:43, 21 June 2019 by Francesco (page moved from NuPlusISelLowering.cpp to NaplesPUISelLowering.cpp)

NuPlusTargetLowering class

The NuPlusTargetLowering class implements the TargetLowering interface and is contained in the NuPlusISelLowering.h/.cpp files. The class describes how LLVM code is lowered to machine code. This description has two main components:

  1. Which ValueTypes are natively supported by the target.
  2. Which operations are supported for supported ValueTypes.

The NuPlusTargetLowering class can be thought of as composed of four sections:

  1. The class constructor.
  2. The custom SDNode lowering.
  3. The custom MachineInstruction insertion.
  4. The implementation of other TargetLowering virtual methods.

The Class Constructor

The NuPlusTargetLowering class constructor describes the MachineValueTypes supported by nu+ together with the corresponding RegisterClass, the operations that are not supported for the supported MachineValueTypes together with the action to perform for each of them, and other information useful for the instruction-lowering phase. All of this is expressed through protected methods provided by the TargetLowering base class.

addRegisterClass

The addRegisterClass method tells the Code Generator which MachineValueTypes are supported and which RegisterClass to use for each of them. The MachineValueTypes are defined in the MVT class in "compiler/include/llvm/CodeGen/MachineValueType.h", while the RegisterClasses are those defined in NuPlusRegisterInfo.td. The following table summarizes the supported MachineValueTypes and the corresponding RegisterClass.

RegisterClass     MVT
GPR32RegClass     i32, f32
GPR64RegClass     i64, f64
VR512WRegClass    v16i32, v16f32
VR512LRegClass    v8i64, v8f64

The corresponding code is:


  // Scalar data types
  addRegisterClass(MVT::i32, &NuPlus::GPR32RegClass);
  addRegisterClass(MVT::f32, &NuPlus::GPR32RegClass);

  addRegisterClass(MVT::i64, &NuPlus::GPR64RegClass);
  addRegisterClass(MVT::f64, &NuPlus::GPR64RegClass);

  // Vector data types (512-bit wide)
  addRegisterClass(MVT::v16i32, &NuPlus::VR512WRegClass);
  addRegisterClass(MVT::v16f32, &NuPlus::VR512WRegClass);

  addRegisterClass(MVT::v8i64, &NuPlus::VR512LRegClass);
  addRegisterClass(MVT::v8f64, &NuPlus::VR512LRegClass);
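The pairing in the table follows from bit widths: the scalar classes hold 32- or 64-bit values, while both vector classes hold 512 bits (16 × 32 = 8 × 64 = 512) and differ only in element width. The standalone sketch below restates that rule; it is not backend code, and `SimpleType` and `registerClassFor` are hypothetical names that return the class names as plain strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for an MVT: lane count and element width.
// Integer and floating-point types of equal width are not distinguished,
// because they share a register class (e.g. i32 and f32).
struct SimpleType {
  unsigned Lanes;    // 1 for scalar types
  unsigned EltBits;  // 32 or 64
};

// Returns the name of the register class from the table above, or an empty
// string when the type has no native register class.
std::string registerClassFor(SimpleType T) {
  const unsigned TotalBits = T.Lanes * T.EltBits;
  if (T.Lanes == 1 && TotalBits == 32) return "GPR32RegClass";
  if (T.Lanes == 1 && TotalBits == 64) return "GPR64RegClass";
  if (TotalBits == 512 && T.EltBits == 32) return "VR512WRegClass";  // 16 x 32
  if (TotalBits == 512 && T.EltBits == 64) return "VR512LRegClass";  // 8 x 64
  return "";
}
```

For example, `registerClassFor({16, 32})` yields "VR512WRegClass", while a 128-bit type such as `{4, 32}` has no class and would have to be legalized.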

setOperationAction

The setOperationAction method tells the Code Generator which operations are not supported for a given MachineValueType and which action to perform for them. This is needed because, by default, an operation is considered Legal.

This information is used by the LLVM Code Generator during instruction selection, when the SelectionDAG is legalized. The method therefore takes the operation code (defined in the NodeType enum in "/compiler/include/llvm/CodeGen/ISDOpcodes.h"), the associated MachineValueType and the action to perform in order to legalize the operation. At this stage each operation is described by a node in the SelectionDAG.

The action must be one of those in the LegalizeAction enum contained in the TargetLoweringBase class (/compiler/include/llvm/Target/TargetLowering.h). Hence the possible actions are:

  • Legal, the target natively supports this operation.
  • Promote, this operation should be executed in a larger type.
  • Expand, try to expand this to other ops, otherwise use a libcall.
  • LibCall, don't try to expand this to other ops, always use a libcall.
  • Custom, use the LowerOperation hook to implement custom lowering.

The actions used most in the nu+ backend are Expand and Custom. As mentioned above, the Expand action tries to expand the node into equivalent operations; if this cannot be done, a library call is generated. The name of the called function is defined in InitLibcallNames in "/compiler/lib/CodeGen/TargetLoweringBase.cpp". With the Custom action, the lowering must be done manually by implementing a dedicated function.
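To make the "Legal unless told otherwise" behaviour concrete, here is a minimal standalone model of the lookup, assuming a simple map keyed by operation and type. It is not LLVM's actual data structure, and `OperationActions` is a hypothetical class; only the action names mirror the LegalizeAction values listed above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Mirrors the LegalizeAction values listed above.
enum class LegalizeAction { Legal, Promote, Expand, LibCall, Custom };

// Hypothetical toy table: (opcode name, type name) -> action.
// Opcodes and types are plain strings here, not LLVM's ISD/MVT enums.
class OperationActions {
 public:
  void setOperationAction(const std::string &Op, const std::string &VT,
                          LegalizeAction A) {
    Actions[{Op, VT}] = A;
  }

  // Any (operation, type) pair never registered is treated as Legal.
  LegalizeAction getOperationAction(const std::string &Op,
                                    const std::string &VT) const {
    auto It = Actions.find({Op, VT});
    return It == Actions.end() ? LegalizeAction::Legal : It->second;
  }

 private:
  std::map<std::pair<std::string, std::string>, LegalizeAction> Actions;
};
```

Registering FNEG on f32 as Custom therefore changes only that entry: FADD on f32 still reports Legal.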

Custom Lowering

When an operation must be custom lowered (i.e. setOperationAction was called with the Custom action), LLVM calls the LowerOperation function, passing it the node to lower and the DAG. The lowering is organized as follows:

  • The LowerOperation function is implemented as a switch on the node opcode; for each opcode it returns the SDValue produced by a dedicated function.
  • The opcode-specific functions are named LowerOPCODE_NAME (e.g. LowerFNEG) and must generate the legal nodes that replace the input node (an object of the SDValue class).

Selection DAG Node representation

LLVM mainly uses three classes to represent a SelectionDAG: SDNode, SDValue and SDUse. They are defined in "/compiler/include/llvm/CodeGen/SelectionDAGNodes.h".

The SDNode class represents a node in the SelectionDAG and stores information about the operation type and its operands (a list of SDUse objects). An SDValue represents an output value of an SDNode. However, an SDNode may return multiple values as the result of a computation, so an SDValue stores both the SDNode and which of its return values is used. The SDUse class represents a use of an SDNode and holds an SDValue (which records the SDNode being used and the result number), a pointer to the SDNode using the value, and Next and Prev pointers that link together all the uses of an SDNode.

From a visual point of view, a SelectionDAG is composed of nodes and edges. A node has an operand list, an operation and an ID (all stored in an SDNode object), and one or more output values (each represented by an SDValue object). An edge goes from an input operand to a return value (information stored in an SDUse object) and represents data flow between nodes. There are also "chain" edges that represent control flow and provide an ordering between nodes with side effects, such as loads, stores, calls and returns.
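The node/value distinction can be illustrated with a much-simplified model. `ToyNode` and `ToyValue` are hypothetical stand-ins for SDNode and SDValue; the operand list plays the role of the SDUse list, without the back-pointers and use chains that LLVM keeps. The example uses a load, which in a SelectionDAG produces both the loaded value and an output chain (the real load node has more operands than shown here).

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyNode;

// Like SDValue: which node, and which of its results.
struct ToyValue {
  ToyNode *Node;
  unsigned ResNo;
};

// Like SDNode: an opcode, the values it uses, and the types of its results.
// In LLVM each operand would be wrapped in an SDUse that also points back
// to the user node and links all uses of the same node together.
struct ToyNode {
  std::string Opcode;
  std::vector<ToyValue> Operands;
  std::vector<std::string> ResultTypes;  // "ch" marks a chain result
};

// Type of the particular result a ToyValue refers to.
std::string typeOf(const ToyValue &V) { return V.Node->ResultTypes[V.ResNo]; }
```

Two ToyValues can point at the same load node and still denote different things: result 0 is the data, result 1 is the chain that orders it against other side effects.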

During compilation it is possible to inspect the DAGs LLVM generates by passing the flag "-mllvm -debug-only=isel" (textual form) or by using the "-view*" options (visual form). Note that LLVM must be built in "Debug" mode for these options to be available.

Lowering

The purpose of a custom lowering function is to replace the input node with a legal node or DAG that performs the same operation. The difference between a "legal" DAG and an "illegal" one is that the legal DAG uses only operations and types supported by the machine.

For example, let's analyze the "LowerFNEG" function. It is used to lower the FNEG node using legal nodes. The FNEG operation negates a floating-point value (see "/compiler/include/llvm/CodeGen/ISDOpcodes.h" for the definition). Since the nu+ architecture has no instruction that performs this operation directly, the lowering must be done manually.

The first step is to declare in the NuPlusTargetLowering class constructor that the FNEG operation must be custom lowered for all the supported floating-point types.

  // Note that this is just example code. In the nu+ backend
  // we use for loops to make the file more readable.
  setOperationAction(ISD::FNEG, MVT::f32, Custom);
  setOperationAction(ISD::FNEG, MVT::f64, Custom);
  setOperationAction(ISD::FNEG, MVT::v16f32, Custom);
  setOperationAction(ISD::FNEG, MVT::v8f64, Custom);

Then we update the switch in the "LowerOperation" function so that it calls the "LowerFNEG" function, previously declared in the NuPlusTargetLowering interface (located in "/compiler/lib/Target/NuPlus/NuPlusISelLowering.h").

 SDValue NuPlusTargetLowering::LowerOperation(SDValue Op,
                                            SelectionDAG &DAG) const {
  switch (Op.getOpcode()) {
  ...
  case ISD::FNEG:
    return LowerFNEG(Op, DAG);
  ...
  }
 }

 // In NuPlusISelLowering.h
 class NuPlusTargetLowering : public TargetLowering {
  public:
   ...
   SDValue LowerFNEG(SDValue Op, SelectionDAG &DAG) const;
   ...
 };


Then we define the "LowerFNEG" function.

SDValue NuPlusTargetLowering::LowerFNEG(SDValue Op, SelectionDAG &DAG) const {
  SDLoc DL(Op);
  MVT ResultVT = Op.getValueType().getSimpleVT();
  MVT IntermediateVT;

  if (ResultVT.isVector()) {
    IntermediateVT = 
            (ResultVT.getVectorElementType() == MVT::f32) ? MVT::v16i32 : MVT::v8i64;
  } else {
    IntermediateVT = 
            (ResultVT.getScalarType() == MVT::f32) ? MVT::i32 : MVT::i64;
  }

  SDValue rhs = (IntermediateVT.getScalarType() == MVT::i32)   ? 
                DAG.getConstant(0x80000000, DL, MVT::i32) :
                DAG.getConstant(0x8000000000000000, DL, MVT::i64);
                
  SDValue iconv;
  if (ResultVT.isVector())
    rhs = DAG.getNode(NuPlusISD::SPLAT, DL, IntermediateVT, rhs);

  iconv = DAG.getNode(ISD::BITCAST, DL, IntermediateVT, Op.getOperand(0));
  SDValue flipped = DAG.getNode(ISD::XOR, DL, IntermediateVT, iconv, rhs);
  return DAG.getNode(ISD::BITCAST, DL, ResultVT, flipped);
}

The two inputs are "SDValue Op" and "SelectionDAG &DAG". Op is the node that must be lowered, in this case an FNEG operation; it is an SDValue object so that information about the output value is available as well. DAG is used to create the new nodes that replace the operation contained in Op.


The function must return the last of the legal nodes it generates. The figures below show the lowering of the FNEG node.

[Figure: lowering of FNEG for a scalar floating-point value.]
[Figure: lowering of FNEG for a vector floating-point value. Unlike the scalar case, a SPLAT node is added before the XOR.]

According to the IEEE 754 standard, the MSB is the sign bit, so negating a floating-point value amounts to inverting its MSB. This can be done by XORing the value with 0x80000000 (for 32-bit floating-point) or 0x8000000000000000 (for 64-bit floating-point). The XOR operation, however, is defined only on integer operands, so the floating-point operand must be bitcast to an integer before the XOR. The bitcast operation tells LLVM to reinterpret a value as a different type of the same width (e.g. i32 <-> f32). The inverse bitcast must also be applied to the XOR result, which must be interpreted as a floating-point value again.
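The sign-bit trick can be checked outside LLVM. The following plain C++ sketch mirrors the BITCAST, XOR, BITCAST sequence on a single value, using std::memcpy as the portable way to reinterpret bits; `fnegViaXor` is a hypothetical helper, not part of the backend.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar analogue of the DAG built by LowerFNEG for f32:
// BITCAST f32 -> i32, XOR with the sign mask, BITCAST i32 -> f32.
float fnegViaXor(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);         // bitcast f32 -> i32
  Bits ^= 0x80000000u;                         // flip the IEEE 754 sign bit
  float Result;
  std::memcpy(&Result, &Bits, sizeof Result);  // bitcast i32 -> f32
  return Result;
}

// Same sequence for f64 with the 64-bit sign mask.
double fnegViaXor(double X) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  Bits ^= 0x8000000000000000ull;
  double Result;
  std::memcpy(&Result, &Bits, sizeof Result);
  return Result;
}
```

One detail this makes visible: `fnegViaXor(0.0f)` returns -0.0f (`std::signbit` from <cmath> reports true), whereas computing `0.0f - x` would give +0.0f. This is why FNEG is lowered as a sign flip rather than as a subtraction.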

When FNEG is applied to a vector type, another node is needed to correctly materialize the constant 0x80000000 (or 0x8000000000000000) in every lane. This is done through the SPLAT operation, which takes a scalar and copies it into all the vector elements. In hardware it corresponds to a move operation.
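A lane-wise model of the vector case, under the assumption that a v16f32 value can be represented as a 16-element std::array: the sign mask is splatted into every lane and then XORed lane by lane. `splat` and `fnegVector` are hypothetical helpers that only imitate the NuPlusISD::SPLAT and XOR nodes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V16I32 = std::array<std::uint32_t, 16>;
using V16F32 = std::array<float, 16>;

// Hypothetical model of NuPlusISD::SPLAT: copy one scalar into every lane.
V16I32 splat(std::uint32_t Scalar) {
  V16I32 V;
  V.fill(Scalar);
  return V;
}

// Vector analogue of LowerFNEG: BITCAST v16f32 -> v16i32, XOR with a
// splatted sign mask, BITCAST back to v16f32.
V16F32 fnegVector(const V16F32 &X) {
  static_assert(sizeof(V16I32) == sizeof(V16F32), "bitcast needs equal widths");
  V16I32 Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  const V16I32 Mask = splat(0x80000000u);
  for (std::size_t I = 0; I < Bits.size(); ++I)
    Bits[I] ^= Mask[I];  // lane-wise XOR
  V16F32 Result;
  std::memcpy(&Result, &Bits, sizeof Result);
  return Result;
}
```

Without the splat, the XOR would have a 512-bit operand on one side and a 32-bit scalar on the other, which is not a well-typed vector operation.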

The function first checks the result type to determine the appropriate integer type for the constant and the XOR operation (stored in the "IntermediateVT" variable). Then it uses the "getConstant" method to generate the Constant node. The method takes the constant value, an SDLoc object containing the IR location info of Op, and the value type.

If the input operand type is a vector, a SPLAT node is generated for the constant. Then the XOR node is created, preceded and followed by the BITCAST operations.

The "getNode" method is used to generate nodes. It takes the node opcode, the SDLoc object, the output value type and the input operands. The function returns only the SDValue of the last BITCAST operation, since the previous nodes are reachable from it through its operands.

The SPLAT node does not belong to the "ISD" namespace but to the "NuPlusISD" namespace, because LLVM does not have this node type: it is defined only in the nu+ back-end. It is resolved later, during the Select phase, using one of the patterns defined in NuPlusInstrInfo.td.

The other lowering functions work in a similar way: they lower the input node using the SelectionDAG class methods, according to the operation to implement and the information in the input node. However, there are cases in which nodes marked for custom lowering are simply expanded. This is done, for example, in the "LowerBUILD_VECTOR" function: the BUILD_VECTOR node is lowered to a SPLAT operation only if all the elements of the vector being built are the same scalar value; otherwise, the default expansion of the BUILD_VECTOR node is sufficient.

The way to tell the Code Generator that the node must be expanded is to return an empty SDValue.


SDValue NuPlusTargetLowering::LowerBUILD_VECTOR(SDValue Op,
                                               SelectionDAG &DAG) const {
 // MVT VT = Op.getValueType().getSimpleVT();
  SDLoc DL(Op);

  if (isSplatVector(Op.getNode())) {
    // This is a constant node that is duplicated to all lanes.
    // Convert it to a SPLAT node.
    return CheckSplat(Op, Op.getOperand(0), DAG);
  }

  return SDValue(); // Expand
}
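The decision LowerBUILD_VECTOR makes can be modelled without LLVM. In this sketch a BUILD_VECTOR is just its list of scalar operands, and std::optional stands in for SDValue: an engaged optional means "emit a SPLAT of this scalar", while std::nullopt plays the role of the empty SDValue that requests the default expansion. `splatValueOf` is a hypothetical helper, not the backend's isSplatVector or CheckSplat.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Returns the common scalar if every operand of the (modelled) BUILD_VECTOR
// is the same value; otherwise returns std::nullopt, the analogue of
// returning SDValue() so that LLVM expands the node.
template <typename T>
std::optional<T> splatValueOf(const std::vector<T> &Operands) {
  if (Operands.empty())
    return std::nullopt;
  for (const T &Op : Operands)
    if (!(Op == Operands.front()))
      return std::nullopt;
  return Operands.front();
}
```

Returning "nothing" instead of building the expansion by hand keeps the custom hook small: only the splat case is special-cased, and every other vector reuses LLVM's generic BUILD_VECTOR expansion.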

Custom MachineInstruction Insertion

Another way to lower unsupported operations is to manually emit Machine Instructions. The emission of Machine Instructions takes place just before the Register Allocation. It is important to remember that the Machine Instructions are still in SSA form, so care must be taken when creating virtual registers.

To tell LLVM which operations need a custom inserter, a (pseudo-)instruction with the usesCustomInserter flag set to 1 must be defined in NuPlusInstrInfo.td.

When a MachineInstruction must be custom inserted, LLVM calls the EmitInstrWithCustomInserter function, passing it the Machine Instruction to lower and the Basic Block the instruction belongs to:

  • The EmitInstrWithCustomInserter function is implemented as a switch on the Machine Instruction opcode; for each opcode it returns the modified basic block.
  • The opcode-specific functions are named EmitOPERATION_NAME and must generate the instructions that correspond to the input Machine Instruction.

Instruction Insertion

The purpose of a custom instruction inserter function is to substitute the input Machine Instruction with instructions that perform the same operation.

For example, let's consider the "EmitLoadI32" function.

In nu+, the immediate field of type I instructions is 16 bits wide, so to load a 32-bit immediate value the MOVEIH-MOVEIL instruction pair must be emitted.
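The arithmetic behind the MOVEIH-MOVEIL pair can be sketched in plain C++ with the same shifts and masks that InsertLoad32Immediate uses. The sketch assumes that MOVEIH writes the upper 16 bits of the destination and MOVEIL the lower 16 bits without altering the upper half; `splitImm32` and `recombine` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// The two 16-bit fields carried by the instruction pair.
struct ImmHalves {
  std::uint16_t High;  // immediate of MOVEIH
  std::uint16_t Low;   // immediate of MOVEIL
};

// Same masking as InsertLoad32Immediate: bits 31..16 and bits 15..0.
ImmHalves splitImm32(std::int64_t Immediate) {
  return {static_cast<std::uint16_t>((Immediate >> 16) & 0xFFFF),
          static_cast<std::uint16_t>(Immediate & 0xFFFF)};
}

// Assumed combined effect of MOVEIH followed by MOVEIL on the register.
std::uint32_t recombine(ImmHalves H) {
  return (static_cast<std::uint32_t>(H.High) << 16) | H.Low;
}
```

For instance, 0x12345678 splits into 0x1234 (MOVEIH) and 0x5678 (MOVEIL), and -1 splits into 0xFFFF twice; recombining either pair gives back the original 32-bit pattern.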

The EmitLoadI32 function is called by the EmitInstrWithCustomInserter hook, which in turn is called when the Code Generator encounters a machine instruction with the usesCustomInserter flag set to 1.

Therefore, in NuPlusInstrInfo.td, the pseudo-instruction LoadI32 is defined with usesCustomInserter = 1. The corresponding node is generated during the Select phase, when the pattern "(set i32:$dst, (i32 imm:$val))" is detected.

let usesCustomInserter = 1 in {
  def LoadI32 : Pseudo<
    (outs GPR32:$dst),
    (ins GPR32:$val),
    [(set i32:$dst, (i32 imm:$val))]>;

  ...

}

When the instruction corresponding to LoadI32 must be emitted, "EmitLoadI32" is called (through the EmitInstrWithCustomInserter hook). The emit functions take at least two inputs (MachineInstr *MI and MachineBasicBlock *BB) and must return the modified MachineBasicBlock.

To emit instructions, the BuildMI interface (defined in "/compiler/include/llvm/CodeGen/MachineInstrBuilder.h") is used. This interface provides a simple way to generate instructions, with methods such as addReg and addImm.

In EmitLoadI32, the immediate value is first taken from the LoadI32 operation, then the InsertLoad32Immediate function is called. This function is responsible for generating the MOVEIH-MOVEIL instruction pair. Finally, LoadI32 is erased and the resulting Basic Block is returned.

MachineBasicBlock *
NuPlusTargetLowering::EmitLoadI32(MachineInstr *MI,
                                  MachineBasicBlock *BB) const {

  DebugLoc DL = MI->getDebugLoc();
  const TargetInstrInfo *TII = Subtarget.getInstrInfo();
  
  // Get the integer immediate value.
  int64_t ImmOp = MI->getOperand(1).getImm();
  // Call the function that generates the MIs.
  InsertLoad32Immediate(MI, BB, TII, &DL, MI->getOperand(0).getReg(), ImmOp);

  MI->eraseFromParent();

  return BB;
}


The InsertLoad32Immediate function uses the BuildMI interface to generate the instructions. There are several versions of BuildMI; the one used here takes the MachineBasicBlock in which to insert the instructions, the MachineInstr before which the new instruction is inserted, a DebugLoc, and the description (an MCInstrDesc object) of the instruction to generate. The instruction description is obtained through the TargetInstrInfo get method, passing it the instruction opcode (e.g. NuPlus::MOVEIHSI).

Note: in a Machine Instruction the first operand is the output.

void InsertLoad32Immediate (MachineInstr *MI, MachineBasicBlock *BB,
                            const TargetInstrInfo *TII, DebugLoc *DL,
                            unsigned DestReg, int64_t Immediate) {

  BuildMI(*BB, *MI, *DL, TII->get(NuPlus::MOVEIHSI))
              .addReg(DestReg, RegState::Define)
              .addImm(((Immediate >> 16) & 0xFFFF));
  BuildMI(*BB, *MI, *DL, TII->get(NuPlus::MOVEILSI))
              .addReg(DestReg)
              .addImm((Immediate & 0xFFFF));

}

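Since the instructions are still in SSA form, each virtual register may be defined only once. The code above works around this by marking DestReg as a definition (RegState::Define) only in MOVEIH; MOVEIL adds DestReg as a plain register operand, without the Define flag, so the destination register keeps a single definition.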
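The single-definition property can be checked on a toy model of the emitted sequence. `ToyInstr`, `ToyOperand`, `load32` and `countDefs` are hypothetical stand-ins that only mimic the shape of the BuildMI addReg/addImm chain; they are not LLVM types, and the Def flag here plays the role of RegState::Define.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, much-simplified stand-in for a MachineOperand.
struct ToyOperand {
  bool IsReg;
  unsigned Reg;      // meaningful if IsReg
  bool IsDef;        // meaningful if IsReg
  std::int64_t Imm;  // meaningful if !IsReg
};

// Hypothetical stand-in for a MachineInstr with chainable helpers
// in the style of MachineInstrBuilder.
struct ToyInstr {
  std::string Opcode;
  std::vector<ToyOperand> Ops;

  ToyInstr &addReg(unsigned R, bool Def = false) {
    Ops.push_back({true, R, Def, 0});
    return *this;
  }
  ToyInstr &addImm(std::int64_t V) {
    Ops.push_back({false, 0, false, V});
    return *this;
  }
};

// Same operand layout as InsertLoad32Immediate: only MOVEIH defines DestReg.
std::vector<ToyInstr> load32(unsigned DestReg, std::int64_t Imm) {
  std::vector<ToyInstr> Seq;
  Seq.push_back(ToyInstr{"MOVEIHSI", {}});
  Seq.back().addReg(DestReg, /*Def=*/true).addImm((Imm >> 16) & 0xFFFF);
  Seq.push_back(ToyInstr{"MOVEILSI", {}});
  Seq.back().addReg(DestReg).addImm(Imm & 0xFFFF);
  return Seq;
}

// Counts the operands that define Reg; SSA form requires exactly one.
unsigned countDefs(const std::vector<ToyInstr> &Seq, unsigned Reg) {
  unsigned N = 0;
  for (const ToyInstr &MI : Seq)
    for (const ToyOperand &MO : MI.Ops)
      if (MO.IsReg && MO.IsDef && MO.Reg == Reg)
        ++N;
  return N;
}
```

If the second instruction also passed the Def flag, `countDefs` would report two definitions of the same virtual register, which is exactly what the workaround avoids.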


Other Methods

The NuPlusTargetLowering class also overrides other TargetLowering virtual methods to tell the Code Generator which capabilities the nu+ architecture supports or lacks.