Extending NaplesPU for 64-bit support

From NaplesPU Documentation

The NaplesPU toolchain can be extended to support 64-bit operations. A git branch with full 64-bit support is provided: to build the toolchain with this extension, check out the llvm-7-64b branch.

The changes affect both the frontend and the backend.

NaplesPU Frontend Modifications

The NaplesPU frontend abstracts target information through the TargetInfo class, extending it in the NaplesPUTargetInfo implementation.

Since 64-bit operations require support for 64-bit integer and double-precision floating-point formats, the following changes and additions are required in the target definition:

 class LLVM_LIBRARY_VISIBILITY NaplesPUTargetInfo : public TargetInfo {
   ...
 public:
   NaplesPUTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
     : TargetInfo(Triple) {
     ...
     resetDataLayout("e-m:e-p:32:32-i64:64:64-i32:32:32-f32:32:32-f64:64:64");
     LongDoubleWidth = 64;
     LongDoubleAlign = 64;
     DoubleWidth = 64;
     DoubleAlign = 64;
     LongWidth = 64;
     LongAlign = 64;
     LongLongWidth = 64;
     LongLongAlign = 64;
   }
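
One way to sanity-check these settings is to compile a few compile-time assertions with the rebuilt clang. The sketch below is illustrative only and is not part of the NaplesPU sources; the helper name bitWidth is hypothetical. It assumes 8-bit bytes, and the values for long and long double are left unchecked because a host compiler may report something other than the 64 bits configured above.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: width of a type in bits, as seen by the compiler in use.
template <typename T>
constexpr std::size_t bitWidth() {
  return sizeof(T) * CHAR_BIT;
}

// These mirror LongLongWidth = 64 and DoubleWidth = 64 from the constructor.
static_assert(bitWidth<long long>() == 64, "long long is expected to be 64 bits");
static_assert(bitWidth<double>() == 64, "double is expected to be 64 bits");

// With LongWidth = 64 and LongDoubleWidth = 64, a NaplesPU build should report
// 64 for both of these; a host compiler may report a different value.
constexpr std::size_t longBits = bitWidth<long>();
constexpr std::size_t longDoubleBits = bitWidth<long double>();
```

If the translation unit compiles with the NaplesPU clang, the two asserted widths match the target definition; longBits and longDoubleBits can be inspected in the same way.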

NaplesPU Backend Modifications

This section describes the backend modifications required for 64-bit support.

Registers Definition

64-bit register support is based on LLVM sub-registers. Since NaplesPU registers are 32 bits wide, a 64-bit value is split across a pair of consecutive registers, S[i] and S[i+1] with i even:

  • The lower 32 bits are placed in the S[i] register (the sub_even sub-register);
  • The upper 32 bits are placed in the S[i+1] register (the sub_odd sub-register).
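
The split itself is plain bit arithmetic. The following is a minimal sketch, not taken from the NaplesPU sources: the RegPair type and the split64 and join64 names are hypothetical, and the sketch deliberately models only the two 32-bit halves, not which physical register holds each one.

```cpp
#include <cstdint>

// Two 32-bit halves of a 64-bit value, as they would be held by a register pair.
struct RegPair {
  std::uint32_t upper;  // bits 63..32
  std::uint32_t lower;  // bits 31..0
};

// Split a 64-bit value into its upper and lower 32-bit halves.
constexpr RegPair split64(std::uint64_t value) {
  return RegPair{static_cast<std::uint32_t>(value >> 32),
                 static_cast<std::uint32_t>(value & 0xFFFFFFFFu)};
}

// Rebuild the 64-bit value from the two halves.
constexpr std::uint64_t join64(RegPair p) {
  return (static_cast<std::uint64_t>(p.upper) << 32) | p.lower;
}

// Splitting and joining must round-trip.
static_assert(join64(split64(0x0123456789ABCDEFULL)) == 0x0123456789ABCDEFULL,
              "split64/join64 should round-trip");
```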

The following class is declared in NaplesPURegisterInfo.td.

 class NaplesPU64GPRReg<bits<16> Enc, string n, list<Register> subregs>
   : NaplesPURegWithSubRegs<Enc, n, subregs> {
   let SubRegIndices = [sub_even, sub_odd];
   let CoveredBySubRegs = 1;
 }

The registers are instantiated as follows:

 foreach i = 0-28 in {
   def S#!shl(i, 1)#_S#!add(!shl(i, 1), 1) : NaplesPU64GPRReg<!shl(i, 1), "s"#!shl(i, 1)#_64,
                [!cast<NaplesPUGPRReg>("S"#!shl(i, 1)),
                !cast<NaplesPUGPRReg>("S"#!add(!shl(i, 1), 1))]>;
 }
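
To make the result of the loop concrete, the sketch below enumerates the records it would produce. This is an illustrative C++ model written for this page, not generated TableGen output; the PairReg type and the enumeratePairRegs function are hypothetical names.

```cpp
#include <string>
#include <vector>

// One entry per 64-bit register produced by the foreach loop.
struct PairReg {
  unsigned encoding;    // !shl(i, 1), i.e. the index of the even sub-register
  std::string defName;  // record name, e.g. "S0_S1"
  std::string asmName;  // assembly name, e.g. "s0_64"
};

std::vector<PairReg> enumeratePairRegs() {
  std::vector<PairReg> regs;
  for (unsigned i = 0; i <= 28; ++i) {  // foreach i = 0-28 (inclusive)
    const unsigned even = i << 1;       // !shl(i, 1)
    const unsigned odd = even + 1;      // !add(!shl(i, 1), 1)
    regs.push_back({even,
                    "S" + std::to_string(even) + "_S" + std::to_string(odd),
                    "s" + std::to_string(even) + "_64"});
  }
  return regs;
}
```

Under this model the loop yields 29 registers, from S0_S1 up to S56_S57, each encoded with the index of its even sub-register.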

With 64-bit support in place, a 512-bit vector register can also be treated as eight lanes, each 64 bits wide:

def VR512L : RegisterClass<"NaplesPU", [v8i64, v8f64, v8i8, v8i16, v8i32], 512, (sequence "V%u", 0, 63)>;

Calling Convention

The NaplesPU calling conventions are modified to support the 64-bit registers. The 32-bit calling convention is extended with the following lines:

CCIfType<[v8i8, v8i16, v8i32], CCPromoteToType<v8i64>>,

CCIfNotVarArg<CCIfType<[i64, f64], CCAssignToReg<[S0_S1, S2_S3, S4_S5, S6_S7, S8_S9, S10_S11, S12_S13, S14_S15]>>>,

CCIfNotVarArg<CCIfType<[v16i32, v16f32, v8i64, v8f64], CCAssignToReg<[V0, V1, V2, V3, V4, V5, V6, V7]>>>,

The lines above describe how parameters are passed in registers. Arguments that are not assigned to a register are passed on the stack, as described below:

   CCIfType<[i64, f64], CCAssignToStack<8, 8>>,
   CCIfType<[v16i32, v16f32, v8i64, v8f64], CCAssignToStack<64, 64>>
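
The combined effect of the register and stack rules for scalar 64-bit arguments can be modelled in a few lines. This is a simplified sketch under stated assumptions: it covers only i64/f64 arguments of a non-variadic function, it ignores 32-bit and vector arguments that may compete for the same registers or stack space, and the ArgLocation type and assignI64Args function are hypothetical names rather than LLVM APIs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Where one i64/f64 argument ends up in this simplified model.
struct ArgLocation {
  bool inRegister;
  std::string reg;          // meaningful when inRegister is true
  std::size_t stackOffset;  // meaningful when inRegister is false
};

// Assign `count` scalar 64-bit arguments of a non-variadic function.
std::vector<ArgLocation> assignI64Args(std::size_t count) {
  // Register list taken from the CCAssignToReg rule for [i64, f64].
  static const char *const kPairRegs[] = {"S0_S1", "S2_S3",   "S4_S5",   "S6_S7",
                                          "S8_S9", "S10_S11", "S12_S13", "S14_S15"};
  const std::size_t numRegs = sizeof(kPairRegs) / sizeof(kPairRegs[0]);
  const std::size_t slotSize = 8;  // CCAssignToStack<8, 8>: 8 bytes, 8-byte aligned

  std::vector<ArgLocation> locations;
  std::size_t nextOffset = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (i < numRegs) {
      locations.push_back({true, kPairRegs[i], 0});
    } else {
      // Registers exhausted: fall through to the stack rule.
      locations.push_back({false, "", nextOffset});
      nextOffset += slotSize;
    }
  }
  return locations;
}
```

In this model the first eight 64-bit arguments land in S0_S1 through S14_S15, and any further ones occupy consecutive 8-byte stack slots.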

The return value calling convention is extended as follows:

 CCIfType<[i64, f64], CCAssignToReg<[S0_S1, S2_S3, S4_S5, S6_S7, S8_S9, S10_S11]>>,
 CCIfType<[v8i8, v8i16, v8i32], CCPromoteToType<v8i64>>,
 CCIfType<[v16i32, v16f32, v8i64, v8f64], CCAssignToReg<[V0, V1, V2, V3, V4, V5]>>

ISA Support

Building on the 32-bit ISA Support, 64-bit instructions are supported by extending the NaplesPU instruction hierarchy with a bit that marks an instruction as 64-bit.

As an example, in the following code the FR_TwoOp_Unmasked_64 class is derived from FR_TwoOp_Unmasked, setting the sixth parameter to 1. As described in the 32-bit ISA Support, this parameter indicates whether the instruction has 64-bit behaviour.

class FR_TwoOp_Unmasked_64<dag outs, dag ins, string asmstr, list<dag> pattern, bits<6> opcode, Fmt fmt2, Fmt fmt1, Fmt fmt0>
: FR_TwoOp_Unmasked<outs, ins, asmstr, pattern, opcode, 1, fmt2, fmt1, fmt0> {}

Instruction Lowering

Since new instructions are defined, some of them need custom lowering, so NaplesPUISelLowering is extended in several places. For instance, since LoadI64 is a pseudo-instruction, it must be expanded as follows:

case NaplesPU::LoadI64:
  return EmitLoadI64(&MI, BB);
...
MachineBasicBlock *
 NaplesPUTargetLowering::EmitLoadI64(MachineInstr *MI,
                                   MachineBasicBlock *BB) const {
 
   DebugLoc DL = MI->getDebugLoc();
   const TargetInstrInfo *TII = Subtarget.getInstrInfo();
   MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
 
   // Destination 64-bit register and the 64-bit immediate to load
   unsigned DstReg = MI->getOperand(0).getReg();
   int64_t ImmOp = MI->getOperand(1).getImm();
 
   // Upper 32 bits of the immediate, written to the sub_odd sub-register
   unsigned Immediate = ((ImmOp >> 32) & 0xffffffff);
 
   // MOVEIHSI receives the upper 16 bits, MOVEILSI the lower 16 bits
   BuildMI(*BB, MI, DL, TII->get(NaplesPU::MOVEIHSI))
               .addReg(DstReg, RegState::Define, NaplesPU::sub_odd)
               .addImm(((Immediate >> 16) & 0xFFFF));
   BuildMI(*BB, MI, DL, TII->get(NaplesPU::MOVEILSI))
               .addReg(DstReg, 0, NaplesPU::sub_odd)
               .addImm((Immediate & 0xFFFF));
 
   // Lower 32 bits of the immediate, written to the sub_even sub-register
   Immediate = (ImmOp & 0xffffffff);

   BuildMI(*BB, MI, DL, TII->get(NaplesPU::MOVEIHSI))
               .addReg(DstReg, 0, NaplesPU::sub_even)
               .addImm(((Immediate >> 16) & 0xFFFF));
   BuildMI(*BB, MI, DL, TII->get(NaplesPU::MOVEILSI))
               .addReg(DstReg, 0, NaplesPU::sub_even)
               .addImm((Immediate & 0xFFFF));
 
   MI->eraseFromParent();
 
   return BB;
 }

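As shown, a 64-bit immediate is loaded by taking a NaplesPU64GPR register and writing the two 32-bit halves of the value to its sub-registers: the upper half to sub_odd and the lower half to sub_even. Each half is in turn written 16 bits at a time, with MOVEIHSI receiving the upper 16 bits and MOVEILSI the lower 16 bits.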
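
The arithmetic used by EmitLoadI64 can be checked in isolation. The sketch below reproduces only the shifts and masks from the function above, outside of LLVM; the ImmChunks type and the splitImmediate and joinImmediate functions are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// The four 16-bit immediates produced from one 64-bit value.
struct ImmChunks {
  std::uint16_t upperWordHigh;  // bits 63..48, passed to the first MOVEIHSI
  std::uint16_t upperWordLow;   // bits 47..32, passed to the first MOVEILSI
  std::uint16_t lowerWordHigh;  // bits 31..16, passed to the second MOVEIHSI
  std::uint16_t lowerWordLow;   // bits 15..0,  passed to the second MOVEILSI
};

// Same shifts and masks as EmitLoadI64: take each 32-bit word, then its halves.
ImmChunks splitImmediate(std::int64_t imm) {
  const std::uint32_t upperWord =
      static_cast<std::uint32_t>((imm >> 32) & 0xffffffff);
  const std::uint32_t lowerWord = static_cast<std::uint32_t>(imm & 0xffffffff);

  ImmChunks c;
  c.upperWordHigh = static_cast<std::uint16_t>((upperWord >> 16) & 0xFFFF);
  c.upperWordLow = static_cast<std::uint16_t>(upperWord & 0xFFFF);
  c.lowerWordHigh = static_cast<std::uint16_t>((lowerWord >> 16) & 0xFFFF);
  c.lowerWordLow = static_cast<std::uint16_t>(lowerWord & 0xFFFF);
  return c;
}

// Reassemble the original bit pattern from the four chunks.
std::uint64_t joinImmediate(const ImmChunks &c) {
  const std::uint64_t upperWord =
      (static_cast<std::uint64_t>(c.upperWordHigh) << 16) | c.upperWordLow;
  const std::uint64_t lowerWord =
      (static_cast<std::uint64_t>(c.lowerWordHigh) << 16) | c.lowerWordLow;
  return (upperWord << 32) | lowerWord;
}
```

For example, the value 0x0123456789ABCDEF splits into the chunks 0x0123, 0x4567, 0x89AB and 0xCDEF, and joining them gives back the same bit pattern.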

Disassembler Support

Since new register classes are introduced, matching decode methods are implemented in NaplesPUDisassembler: DecodeGPR64RegisterClasses and DecodeVR512LRegisterClasses are added so that operands of the new classes can be disassembled.
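
The kind of check such a decode method performs can be sketched without LLVM. The following is a hypothetical model, not the actual NaplesPUDisassembler code: it assumes the register field carries the encoding assigned in NaplesPURegisterInfo.td (the index of the even sub-register, 0 to 56), and DecodeResult and decodeGPR64 are names invented for this illustration.

```cpp
#include <string>

// Hypothetical stand-in for the decoder's success/failure status.
enum class DecodeResult { Fail, Success };

// Map a register field value to the name of a 64-bit register pair.
DecodeResult decodeGPR64(unsigned regNo, std::string &name) {
  // Assumption: encodings are !shl(i, 1) for i in 0..28, so a valid value
  // is even and at most 56.
  if (regNo % 2 != 0 || regNo > 56)
    return DecodeResult::Fail;
  name = "S" + std::to_string(regNo) + "_S" + std::to_string(regNo + 1);
  return DecodeResult::Success;
}
```

Under these assumptions an encoding of 4 decodes to S4_S5, while odd or out-of-range values are rejected.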