ISA

From NaplesPU Documentation
Revision as of 15:50, 25 October 2017 by Catello (talk | contribs) (Data Types)

Register File

The nu+ register file is composed of a scalar register file and a vector register file, each containing 64 registers.

The scalar register file has 64 registers. The first 58 are general purpose registers, while the remaining 6 are special purpose registers. Each scalar register stores up to 32 bits of data. The nu+ architecture also supports 64-bit data, which is stored in a pair of contiguous registers.
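
As a minimal sketch of the register-pair convention described above, the Python model below splits a 64-bit value across two adjacent 32-bit registers and reassembles it. The text does not say which register of the pair holds which half; placing the low half in the lower-numbered register is an assumption made only for illustration, and `regs`, `write64`, and `read64` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

def write64(regs, index, value):
    """Store a 64-bit value in regs[index] and regs[index + 1].

    Assumption: the low half goes in the lower-numbered register.
    """
    regs[index] = value & MASK32
    regs[index + 1] = (value >> 32) & MASK32

def read64(regs, index):
    """Rebuild the 64-bit value from two contiguous 32-bit registers."""
    return (regs[index + 1] << 32) | regs[index]

regs = [0] * 64  # model of a 64-entry scalar register file
write64(regs, 10, 0x0123456789ABCDEF)
print(hex(regs[10]), hex(regs[11]))  # the two 32-bit halves
print(hex(read64(regs, 10)))         # the reassembled 64-bit value
```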


ScalarRegFile.png


The vector register file has 64 general purpose registers. Each vector register stores up to 512 bits of data, organized as either 16 x 32-bit or 8 x 64-bit elements.
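
The two views of a 512-bit vector register can be sketched as follows. This is an illustrative model only: the little-endian byte order and the helper names `as_lanes_32` and `as_lanes_64` are assumptions, not something this page specifies.

```python
import struct

VECTOR_BYTES = 64  # 512 bits

def as_lanes_32(raw):
    """View 64 raw bytes as 16 unsigned 32-bit lanes (little-endian assumed)."""
    return list(struct.unpack("<16I", raw))

def as_lanes_64(raw):
    """View the same 64 raw bytes as 8 unsigned 64-bit lanes."""
    return list(struct.unpack("<8Q", raw))

raw = bytes(range(VECTOR_BYTES))
lanes32 = as_lanes_32(raw)
lanes64 = as_lanes_64(raw)
print(len(lanes32), len(lanes64))  # 16 lanes and 8 lanes over the same bits
```

Under the assumed byte order, each 64-bit lane covers the same bits as two consecutive 32-bit lanes.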

VectorRegFile.png

Data Types

The following table summarizes the data types available in nu+. The Type column gives the C/C++ type name, the LLVM Type column gives the type name used in LLVM, and the Register column shows the kind of register in which a value of that type is stored.

Type                LLVM Type  Register               Notes
bool                i1         scalar (32 bits)       Expanded to 32 bits
char                i8         scalar (32 bits)       Expanded to 32 bits
short               i16        scalar (32 bits)       Expanded to 32 bits
int                 i32        scalar (32 bits)
float               f32        scalar (32 bits)
long long int       i64        scalar (64 bits)
double              f64        scalar (64 bits)
vec16i8, vec16u8    v16i8      vector (16 x 32 bits)  Expanded to a 32-bit vector
vec16i16, vec16u16  v16i16     vector (16 x 32 bits)  Expanded to a 32-bit vector
vec16i32, vec16u32  v16i32     vector (16 x 32 bits)
vec16f32            v16f32     vector (16 x 32 bits)
vec8i8, vec8u8      v8i8       vector (8 x 64 bits)   Expanded to a 64-bit vector
vec8i16, vec8u16    v8i16      vector (8 x 64 bits)   Expanded to a 64-bit vector
vec8i32, vec8u32    v8i32      vector (8 x 64 bits)   Expanded to a 64-bit vector
vec8f32             v8f32      vector (16 x 32 bits)  Treated as a 16-element vector
vec8i64, vec8u64    v8i64      vector (8 x 64 bits)
vec8f64             v8f64      vector (8 x 64 bits)
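
The "expanded" notes in the table can be illustrated with a short sketch. It assumes that expansion means sign extension for the signed (`i`) types and zero extension for the unsigned (`u`) types, which matches the load variants listed later on this page but is not stated here; `expand_lanes` is a hypothetical helper.

```python
def expand_lanes(values, src_bits, signed, lane_bits=32):
    """Widen src_bits-wide elements to lane_bits-wide lanes.

    Assumption: signed elements are sign-extended, unsigned ones zero-extended.
    """
    lane_mask = (1 << lane_bits) - 1
    out = []
    for v in values:
        v &= (1 << src_bits) - 1          # keep only the source bits
        if signed and v >> (src_bits - 1):
            v -= 1 << src_bits            # reinterpret as a negative number
        out.append(v & lane_mask)         # two's complement in the wide lane
    return out

signed_lanes = expand_lanes([0x7F, 0x80, 0xFF], 8, signed=True)
unsigned_lanes = expand_lanes([0x7F, 0x80, 0xFF], 8, signed=False)
print([hex(x) for x in signed_lanes])
print([hex(x) for x in unsigned_lanes])
```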

Instructions Format

The nu+ instructions have a fixed length of 32 bits. They are grouped in seven types:

  • The R type includes the logical and arithmetic operations and memory operations.
  • The I type includes the logical and arithmetic operations between a register operand and an immediate operand.
  • The MOVEI type includes the load operations of an immediate operand in a register.
  • The C type is used for control operations and for synchronization instructions.
  • The JR type includes jump instructions.
  • The M type includes the instructions used to access memory.
  • The M-poly type is used for memory instructions that use a polyhedral access pattern.

ISA

R type instructions

  • RR (Register to Register) has a destination register and two source registers.
  • RI (Register Immediate) has a destination register, one source register, and an immediate encoded in the instruction word.

Mnemonic  Opcode  Meaning                                  Operation
or        1       or                                       Rd = Ra | Rb
and       2       and                                      Rd = Ra & Rb
xor       3       xor                                      Rd = Ra ^ Rb
add       4       addition                                 Rd = Ra + Rb
sub       5       subtraction                              Rd = Ra - Rb
mull      6       multiplication                           Rd = Ra * Rb
mulh      7       high multiply                            Rd = Ra * Rb
mulhu     8       high multiply unsigned                   Rd = Ra * Rb
ashr      9       arithmetic shift right                   Rd = Ra >> Rb (arithmetic)
shr       10      shift right                              Rd = Ra >> Rb (logical)
shl       11      shift left                               Rd = Ra << Rb
clz       12      count leading zeros
ctz       13      count trailing zeros
shuffle   24      vector shuffle                           Rd[i] = Ra[Rb[i]]
getlane   25      get lane from vector                     Rd = Ra[Rb]
move      32      move register                            Rd = Ra
add_f     33      floating point add                       Rd = Ra + Rb
sub_f     34      floating point sub                       Rd = Ra - Rb
mul_f     35      floating point multiplication            Rd = Ra * Rb
div_f     36      floating point division                  Rd = Ra / Rb
sext8     43      sign extend 8 bits
sext16    44      sign extend 16 bits
sext32    45      sign extend 32 bits
f32tof64  46      cast float to double
f64tof32  47      cast double to float
i32tof32  48      cast integer to float
f32toi32  49      cast float to integer
cmpeq     14      compare equal                            Rd = Ra == Rb
cmpne     15      compare not equal                        Rd = Ra != Rb
cmpgt     16      compare greater than                     Rd = Ra > Rb
cmpge     17      compare greater or equal                 Rd = Ra >= Rb
cmplt     18      compare less than                        Rd = Ra < Rb
cmple     19      compare less or equal                    Rd = Ra <= Rb
cmpgt_u   20      unsigned compare greater than            Rd = Ra > Rb
cmpge_u   21      unsigned compare greater or equal        Rd = Ra >= Rb
cmplt_u   22      unsigned compare less than               Rd = Ra < Rb
cmple_u   23      unsigned compare less or equal           Rd = Ra <= Rb
cmpeq_fp  37      floating point compare equal             Rd = Ra == Rb
cmpne_fp  38      floating point compare not equal         Rd = Ra != Rb
cmpgt_fp  39      floating point compare greater than      Rd = Ra > Rb
cmpge_fp  40      floating point compare greater or equal  Rd = Ra >= Rb
cmplt_fp  41      floating point compare less than         Rd = Ra < Rb
cmple_fp  42      floating point compare less or equal     Rd = Ra <= Rb
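
Several rows in the table share the same Operation text even though they behave differently (for example `mull`, `mulh` and `mulhu`, or `cmpgt` and `cmpgt_u`). The sketch below is a reference model of how these instructions are conventionally defined on 32-bit operands. It is not taken from the nu+ implementation: the reading of "high multiply" as the upper 32 bits of the 64-bit product, the result of `clz` and `ctz` on a zero input, and the use of Python booleans for compare results are all assumptions.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mull(a, b):
    return (a * b) & MASK32                          # low 32 bits of the product

def mulh(a, b):
    # Assumed: upper 32 bits of the signed 64-bit product.
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32

def mulhu(a, b):
    # Assumed: upper 32 bits of the unsigned 64-bit product.
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

def ashr(a, n):
    return (to_signed(a) >> n) & MASK32              # sign bit is replicated

def shr(a, n):
    return (a & MASK32) >> n                         # zeros are shifted in

def clz(a):
    return 32 - (a & MASK32).bit_length()            # yields 32 for a zero input

def ctz(a):
    a &= MASK32
    return 32 if a == 0 else (a & -a).bit_length() - 1

def shuffle(ra, rb):
    return [ra[i] for i in rb]                       # Rd[i] = Ra[Rb[i]]

def getlane(ra, index):
    return ra[index]                                 # Rd = Ra[Rb]

def cmpgt(a, b):
    return to_signed(a) > to_signed(b)               # signed comparison

def cmpgt_u(a, b):
    return (a & MASK32) > (b & MASK32)               # unsigned comparison

print(hex(ashr(0x80000000, 4)), hex(shr(0x80000000, 4)))
print(cmpgt(0xFFFFFFFF, 1), cmpgt_u(0xFFFFFFFF, 1))
```

The same bit pattern `0xFFFFFFFF` is -1 for the signed variants and 4294967295 for the unsigned ones, which is why the paired instructions can disagree.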

I type instructions

Mnemonic  Opcode  Meaning                 Operation
ori       1       or                      Rd = Ra | Imm
andi      2       and                     Rd = Ra & Imm
xori      3       xor                     Rd = Ra ^ Imm
addi      4       addition                Rd = Ra + Imm
subi      5       subtraction             Rd = Ra - Imm
mulli     6       multiplication          Rd = Ra * Imm
mulhi     7       high multiply           Rd = Ra * Imm
mulhui    8       high multiply unsigned  Rd = Ra * Imm
ashri     9       arithmetic shift right  Rd = Ra >> Imm (arithmetic)
shri      10      shift right             Rd = Ra >> Imm (logical)
shli      11      shift left              Rd = Ra << Imm
getlane   25      get lane from vector    Rd = Ra[Imm]

MOVEI type instructions

MVI (Move Immediate) has a destination register and a 16-bit immediate encoded in the instruction.


Mnemonic  Opcode  Meaning                                                 Operation
moveil    0       move the 16 least significant bits                      Rd = Ra & 0xFFFF
moveih    1       move the 16 most significant bits                       Rd = (Ra >> 16) & 0xFFFF
movei     2       move the 16 least significant bits with zero extension  Rd = (Rd ^ Rd) | (Ra & 0xFFFF)

C type instructions

Mnemonic        Opcode  Meaning                                    Operation
barrier_core    0       barrier across all the nu+ cores
barrier_thread  1       barrier across all the threads of a core
flush           2       flush a cache line to the system memory

JR type instructions

J type instructions

M type instructions

MEM (Memory Instruction) has a destination/source field: for a load, the first register is the destination register; for a store, the first register contains the value to store. In both cases the base address and an immediate follow. The effective memory address is the sum of the base address and the immediate.

Mnemonic      Opcode  Meaning                                                 Operation
loadXD_s8     0       load 1 byte with sign extension                         Rd = [Rbase + Offset]
loadXD_s16    1       load 2 bytes with sign extension                        Rd = [Rbase + Offset]
load32D       2       load 1 word                                             Rd = [Rbase + Offset]
loadXD_u8     4       load 1 byte with zero extension                         Rd = [Rbase + Offset]
loadXD_u16    5       load 2 bytes with zero extension                        Rd = [Rbase + Offset]
load64D_s32   2       load 1 word sign-extended to 1 double-word              Rd = [Rbase + Offset]
load64D_u32   6       load 1 word zero-extended to 1 double-word              Rd = [Rbase + Offset]
load64D       3       load 1 double-word                                      Rd = [Rbase + Offset]
loadD_vYi8    7       load a vector of Y bytes with sign extension            Rd = [Rbase + Offset]
loadD_vYi16   8       load a vector of Y 2-byte elements with sign extension  Rd = [Rbase + Offset]
loadD_vYi32   9       load a vector of Y words with sign extension            Rd = [Rbase + Offset]
loadD_v8i64   10      load a vector of 8 double-words                         Rd = [Rbase + Offset]
loadD_vYu8    11      load a vector of Y bytes with zero extension            Rd = [Rbase + Offset]
loadD_vYu16   12      load a vector of Y 2-byte elements with zero extension  Rd = [Rbase + Offset]
loadD_vYu32   13      load a vector of Y words with zero extension            Rd = [Rbase + Offset]
loadD_g_32    16      load 16 words from different memory addresses           Rd[i] = [Rbase[i]]
storeXD_8     32      store 1 byte                                            [Rbase + Offset] = Rs
storeXD_16    33      store 2 bytes                                           [Rbase + Offset] = Rs
store32D      34      store 1 word                                            [Rbase + Offset] = Rs
store64D_32   34      store 1 word                                            [Rbase + Offset] = Rs
store64D      35      store 1 double-word                                     [Rbase + Offset] = Rs
storeD_vYi8   32      store Y bytes                                           [Rbase + Offset] = Rs
storeD_vYi16  33      store Y 2-byte elements                                 [Rbase + Offset] = Rs
storeD_vYi32  34      store Y words                                           [Rbase + Offset] = Rs
storeD_v8i64  35      store 8 double-words                                    [Rbase + Offset] = Rs
storeD_s_32   42      store 16 words to different memory addresses            [Rbase[i]] = Rs[i]
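
The addressing rule and the extension variants can be sketched with a small memory model. This is an illustration under stated assumptions rather than a description of the hardware: memory is taken to be byte-addressable and little-endian, addresses wrap at 32 bits, and the helpers `load`, `store`, `gather32`, and `scatter32` are hypothetical names standing in for the scalar loads and stores and for `loadD_g_32` and `storeD_s_32`.

```python
MASK32 = 0xFFFFFFFF

def effective_address(rbase, offset):
    """Effective address = base register + immediate offset (32-bit wrap assumed)."""
    return (rbase + offset) & MASK32

def load(mem, rbase, offset, size, signed):
    """Load `size` bytes and extend them to 32 bits (sign or zero extension)."""
    addr = effective_address(rbase, offset)
    value = int.from_bytes(mem[addr:addr + size], "little", signed=signed)
    return value & MASK32

def store(mem, rbase, offset, size, rs):
    """Store the low `size` bytes of rs at the effective address."""
    addr = effective_address(rbase, offset)
    low_bits = rs & ((1 << (8 * size)) - 1)
    mem[addr:addr + size] = low_bits.to_bytes(size, "little")

def gather32(mem, rbase_vec):
    """Rd[i] = [Rbase[i]]: one word per lane, each from its own address."""
    return [load(mem, addr, 0, 4, signed=False) for addr in rbase_vec]

def scatter32(mem, rbase_vec, rs_vec):
    """[Rbase[i]] = Rs[i]: one word per lane, each to its own address."""
    for addr, value in zip(rbase_vec, rs_vec):
        store(mem, addr, 0, 4, value)

mem = bytearray(64)
store(mem, 16, 4, 1, 0x80)                    # one byte at address 20
signed_byte = load(mem, 16, 4, 1, signed=True)
unsigned_byte = load(mem, 16, 4, 1, signed=False)
scatter32(mem, [0, 8, 32], [1, 2, 3])
gathered = gather32(mem, [32, 0, 8])
print(hex(signed_byte), hex(unsigned_byte), gathered)
```

The same stored byte reads back differently through the sign-extending and zero-extending loads, which is the only distinction between the `_s8` and `_u8` variants in the table.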

M-poly type instructions

NOP instruction