Include
NPU Defines
The main NPU include file, npu_defines.sv, stores the global defines at the core level. Parameters such as the number of hardware lanes, the number of registers per register file, and the memory address width are defined in this file:
`define HW_LANE 16
`define ADDRESS_SIZE 32
`define REGISTER_NUMBER 64
along with information on special purpose registers:
`define PC_REG ( `REGISTER_NUMBER - 1 )
`define RA_REG ( `REGISTER_NUMBER - 2 )
`define SP_REG ( `REGISTER_NUMBER - 3 )
`define FP_REG ( `REGISTER_NUMBER - 4 )
`define MASK_REG ( `REGISTER_NUMBER - 5 )
and opcode definitions and the data structures that describe decoded instructions:
NOT = `OP_CODE_WIDTH'b000000,
OR = `OP_CODE_WIDTH'b000001,
AND = `OP_CODE_WIDTH'b000010,
XOR = `OP_CODE_WIDTH'b000011,
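The special-purpose register macros above are plain offsets from the top of the register file. The following minimal sketch re-declares them locally (copied from the listing, so it does not depend on npu_defines.sv) and checks the indices they resolve to with the default of 64 registers. The module name and the REG_ADDR_W parameter are hypothetical and are not part of the NPU sources.

```systemverilog
// Local copies of the macros shown above; in the real design they come
// from npu_defines.sv.
`define REGISTER_NUMBER 64
`define PC_REG   ( `REGISTER_NUMBER - 1 )
`define RA_REG   ( `REGISTER_NUMBER - 2 )
`define SP_REG   ( `REGISTER_NUMBER - 3 )
`define FP_REG   ( `REGISTER_NUMBER - 4 )
`define MASK_REG ( `REGISTER_NUMBER - 5 )

// Hypothetical testbench-only module, used here for illustration.
module special_reg_index_sketch;
  // Bits needed to address one register in a 64-entry register file.
  localparam int REG_ADDR_W = $clog2(`REGISTER_NUMBER);

  initial begin
    // The special-purpose registers occupy the five highest indices.
    assert (`PC_REG   == 63);
    assert (`RA_REG   == 62);
    assert (`SP_REG   == 61);
    assert (`FP_REG   == 60);
    assert (`MASK_REG == 59);
    assert (REG_ADDR_W == 6);
    $display("PC=%0d RA=%0d SP=%0d FP=%0d MASK=%0d",
             `PC_REG, `RA_REG, `SP_REG, `FP_REG, `MASK_REG);
  end
endmodule
```

Because every index is derived from REGISTER_NUMBER, changing that single define moves all five special-purpose registers together.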
User Defines
The user defines include file, npu_user_defines.sv, exposes all the user-configurable parameters of the architecture. Most of these parameters must be a power of two.
Core-side user defines (every numeric parameter below must be a power of two):
- THREAD_NUMB: number of hardware threads instantiated, default 8.
- USER_ICACHE_SET: number of instruction cache sets, default 32.
- USER_ICACHE_WAY: number of instruction cache ways, default 4.
- USER_DCACHE_SET: number of data cache sets, default 32.
- USER_DCACHE_WAY: number of data cache ways, default 4.
- USER_L2CACHE_SET: number of L2 data cache sets, default 128.
- USER_L2CACHE_WAY: number of L2 data cache ways, default 8.
- NPU_SPM: when defined, allocates a scratchpad memory in each NPU core.
- NPU_FPU: when defined, allocates an FPU in each NPU core.
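The power-of-two constraint on the numeric core-side defines can be checked at the start of simulation. The sketch below is one possible way to do it, under the assumption that the defines hold the default values listed above. The module, the is_pow2 helper and the error messages are hypothetical, and the `define lines are local stand-ins for npu_user_defines.sv.

```systemverilog
// Stand-ins for the defaults listed above; the real values live in
// npu_user_defines.sv.
`define THREAD_NUMB      8
`define USER_ICACHE_SET  32
`define USER_ICACHE_WAY  4
`define USER_DCACHE_SET  32
`define USER_DCACHE_WAY  4
`define USER_L2CACHE_SET 128
`define USER_L2CACHE_WAY 8

// Hypothetical checker module, not part of the NPU sources.
module user_defines_check_sketch;
  // True when v is a non-zero power of two.
  function automatic bit is_pow2(int unsigned v);
    return (v != 0) && ((v & (v - 1)) == 0);
  endfunction

  initial begin
    assert (is_pow2(`THREAD_NUMB))      else $fatal(1, "THREAD_NUMB must be a power of two");
    assert (is_pow2(`USER_ICACHE_SET))  else $fatal(1, "USER_ICACHE_SET must be a power of two");
    assert (is_pow2(`USER_ICACHE_WAY))  else $fatal(1, "USER_ICACHE_WAY must be a power of two");
    assert (is_pow2(`USER_DCACHE_SET))  else $fatal(1, "USER_DCACHE_SET must be a power of two");
    assert (is_pow2(`USER_DCACHE_WAY))  else $fatal(1, "USER_DCACHE_WAY must be a power of two");
    assert (is_pow2(`USER_L2CACHE_SET)) else $fatal(1, "USER_L2CACHE_SET must be a power of two");
    assert (is_pow2(`USER_L2CACHE_WAY)) else $fatal(1, "USER_L2CACHE_WAY must be a power of two");
    $display("core-side user defines are powers of two");
  end
endmodule
```

A power-of-two set count means the set index is an exact bit field of the address: for example, $clog2(32) gives a 5-bit index for the default L1 caches.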
System-side user defines:
- DIRECTORY_BARRIER: when defined, the manycore system supports a distributed directory mechanism spread over all tiles. Otherwise, it allocates a single centralized directory. The single-core version always has a centralized synchronization master.
- CENTRAL_SYNCH_ID: Centralized directory ID, used only when DIRECTORY_BARRIER is undefined.
- NoC_X_WIDTH: Network-on-Chip mesh x dimension width, default value 2, must be a power of 2.
- NoC_Y_WIDTH: Network-on-Chip mesh y dimension width, default value 2, must be a power of 2.
- TILE_MEMORY_ID: Memory Tile ID.
- TILE_H2C_ID: Host interface Tile ID.
- TILE_NPU: number of tiles with an NPU core.
- IO_MAP_BASE_ADDR: base address of the non-coherent memory space dedicated for IO devices, default value 0xFF00_0000.
- IO_MAP_SIZE: size of the non-coherent memory space dedicated for IO devices, default value 0x00FF_FF00.
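With the default values, the IO window starts at 0xFF00_0000 and, if IO_MAP_SIZE is read as a byte count, ends just below 0xFFFF_FF00. The sketch below illustrates that reading. The half-open range [base, base + size), the is_io_address helper and the module name are assumptions made for illustration, not taken from the NPU sources.

```systemverilog
// Default values from the list above, written as 32-bit literals.
`define IO_MAP_BASE_ADDR 32'hFF00_0000
`define IO_MAP_SIZE      32'h00FF_FF00

// Hypothetical module, not part of the NPU sources.
module io_map_sketch;
  // Assumption: the IO window is the half-open range [base, base + size).
  function automatic bit is_io_address(logic [31:0] addr);
    return (addr >= `IO_MAP_BASE_ADDR) &&
           ((addr - `IO_MAP_BASE_ADDR) < `IO_MAP_SIZE);
  endfunction

  initial begin
    assert (!is_io_address(32'h0000_1000));  // below the window
    assert ( is_io_address(32'hFF00_0000));  // first byte of the window
    assert ( is_io_address(32'hFFFF_FEFF));  // last byte of the window
    assert (!is_io_address(32'hFFFF_FF00));  // first byte past the window
    $display("IO window check passed");
  end
endmodule
```

Subtracting the base before comparing against the size avoids computing base + size, which would overflow 32 bits for larger windows.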
Furthermore, a set of DISPLAY variables is defined, all commented out by default. When a DISPLAY variable is active, it generates a file under a folder named after the selected kernel. Each DISPLAY variable logs a specific kind of transaction, namely:
- DISPLAY_MEMORY: logs on file the memory state at the end of the kernel execution.
- DISPLAY_MEMORY_TRANS: logs on file all requests to the main memory.
- DISPLAY_INT: logs every integer operation in the integer pipeline, and their results.
- DISPLAY_CORE: enables log from the core (file display_core.txt).
- DISPLAY_ISSUE: logs all scheduled instructions, and tracks the scheduled PC and the issued Thread, when DISPLAY_CORE is defined.
- DISPLAY_INT: logs all results from the integer module, when DISPLAY_CORE is defined.
- DISPLAY_WB: logs all results from the writeback module, when DISPLAY_CORE is defined.
- DISPLAY_LDST: enables logging into the load/store unit (file display_ldst.txt).
- DISPLAY_CACHE_CONTROLLER: logs memory transactions between Load/Store unit and the main memory.
- DISPLAY_SYNCH_CORE: logs synchronization requests within the core.
- DISPLAY_BARRIER_CORE: logs synchronization releases from the Synchronization master.
- DISPLAY_COHERENCE: logs all coherence transactions among CCs, DCs and MC.
- DISPLAY_THREAD_STATUS: displays all active threads status and trap reason.
These variables selectively enable the logging of a specific feature. For each define, a log file is typically created in npu/simulationlog/<name_of_the_kernel>/display_<name>. The DISPLAY_SIMULATION_LOG variable must always be defined in the simulation flow; it also displays architectural information on the Tcl shell.
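The dependency between DISPLAY_CORE and its sub-options can be pictured as nested `ifdef guards. The sketch below only illustrates that pattern; the module and the messages it prints are hypothetical, and the actual logging code in the NPU sources may be organized differently.

```systemverilog
// In npu_user_defines.sv these lines are commented out by default;
// uncommenting one enables the corresponding log.
`define DISPLAY_SIMULATION_LOG   // always required in the simulation flow
`define DISPLAY_CORE             // enables the core log
`define DISPLAY_ISSUE            // only effective together with DISPLAY_CORE
// `define DISPLAY_LDST          // left disabled in this sketch

// Hypothetical module showing how such guards are commonly written.
module display_guard_sketch;
  initial begin
`ifdef DISPLAY_CORE
    $display("core logging enabled");
  `ifdef DISPLAY_ISSUE
    // Reached only when both DISPLAY_CORE and DISPLAY_ISSUE are defined.
    $display("issue logging enabled");
  `endif
`endif
`ifdef DISPLAY_LDST
    $display("load/store logging enabled");
`endif
  end
endmodule
```

With the defines as written, the sketch reports core and issue logging and stays silent about the load/store unit.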
Scratchpad Memory Defines
The scratchpad memory defines include file, npu_spm_defines.sv, exposes all the user-configurable parameters of the core-level scratchpad memory (SPM), along with other component-specific typedefs. The SPM has the following configurable parameters:
- SM_PROCESSING_ELEMENTS: number of concurrent input requests, default value 16 (equal to the number of hardware lanes).
- SM_ENTRIES: number of entries per bank (similar to cache sets).
- SM_MEMORY_BANKS: number of banks, default value 16.
- SM_BYTE_PER_ENTRY: number of bytes per entry, default value 4.
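The total scratchpad capacity follows directly from these parameters: entries per bank, times banks, times bytes per entry. The sketch below works through the arithmetic. Since no default is given above for SM_ENTRIES, the value 1024 is purely hypothetical, as are the module and parameter names.

```systemverilog
// Defaults from the list above, plus one assumed value.
`define SM_PROCESSING_ELEMENTS 16
`define SM_MEMORY_BANKS        16
`define SM_BYTE_PER_ENTRY      4
`define SM_ENTRIES             1024  // hypothetical: no default is stated

// Hypothetical module, not part of the NPU sources.
module spm_geometry_sketch;
  // Total capacity in bytes: entries per bank x banks x bytes per entry.
  localparam int SPM_BYTES   = `SM_ENTRIES * `SM_MEMORY_BANKS * `SM_BYTE_PER_ENTRY;
  // Field widths needed to select a bank, an entry and a byte in an entry.
  localparam int BANK_IDX_W  = $clog2(`SM_MEMORY_BANKS);
  localparam int ENTRY_IDX_W = $clog2(`SM_ENTRIES);
  localparam int BYTE_OFF_W  = $clog2(`SM_BYTE_PER_ENTRY);

  initial begin
    assert (SPM_BYTES   == 65536);  // 64 KiB with the values above
    assert (BANK_IDX_W  == 4);
    assert (ENTRY_IDX_W == 10);
    assert (BYTE_OFF_W  == 2);
    $display("SPM: %0d bytes, %0d banks, %0d entries per bank",
             SPM_BYTES, `SM_MEMORY_BANKS, `SM_ENTRIES);
  end
endmodule
```

With 16 banks and 16 processing elements, each lane can in principle be served by a different bank in the same cycle, provided the requested addresses do not collide on a bank.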
Network Defines
Network-related defines are spread over two include files, namely npu_message_service_defines.sv and npu_network_define.sv. The first defines shared typedefs used for the host interfacing mechanism over the Network-on-Chip.
The latter defines the length of flits and exposes two configuration parameters to the final user:
- VC_PER_PORT: number of Virtual Channels per router port (5 ports), default value 4, must be a power of 2.
- QUEUE_LEN_PER_VC: length of the FIFO for each Virtual Channel, default value 16, must be a power of 2.
The remainder of the file defines structures that ease flit management:
typedef struct packed {
    flit_type_t flit_type;
    vc_id_t vc_id;
    port_t next_hop_port;
    tile_address_t destination;
    tile_destination_t core_destination;
} flit_header_t;

typedef logic [`PAYLOAD_W-1:0] flit_body_t;

typedef struct packed {
    flit_header_t header;
    flit_body_t payload;
} flit_t;
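Because flit_t is a packed struct, the whole flit is a single bit vector with the header in the most significant bits and the payload in the least significant ones. The sketch below illustrates this layout. The widths chosen for PAYLOAD_W and for every field type are hypothetical placeholders, since the real definitions live in the network include files and are not shown here.

```systemverilog
// Hypothetical widths, chosen only to make the sketch self-contained.
`define PAYLOAD_W 64

typedef logic [1:0] flit_type_t;         // hypothetical
typedef logic [1:0] vc_id_t;             // 2 bits cover 4 virtual channels
typedef logic [2:0] port_t;              // 3 bits cover 5 router ports
typedef logic [1:0] tile_address_t;      // hypothetical
typedef logic [1:0] tile_destination_t;  // hypothetical

// Structures as shown above.
typedef struct packed {
  flit_type_t        flit_type;
  vc_id_t            vc_id;
  port_t             next_hop_port;
  tile_address_t     destination;
  tile_destination_t core_destination;
} flit_header_t;

typedef logic [`PAYLOAD_W-1:0] flit_body_t;

typedef struct packed {
  flit_header_t header;
  flit_body_t   payload;
} flit_t;

// Hypothetical module, not part of the NPU sources.
module flit_layout_sketch;
  flit_t flit;

  initial begin
    flit               = '0;
    flit.header.vc_id  = 2'd3;
    flit.payload       = 64'hDEAD_BEEF_0123_4567;

    // Header width is the sum of its fields: 2 + 2 + 3 + 2 + 2 = 11 bits.
    assert ($bits(flit_header_t) == 11);
    assert ($bits(flit_t) == 11 + `PAYLOAD_W);
    // The last struct member occupies the least significant bits.
    assert (flit[`PAYLOAD_W-1:0] == 64'hDEAD_BEEF_0123_4567);
    $display("flit is %0d bits wide", $bits(flit_t));
  end
endmodule
```

Keeping the header as the first member means a router can inspect the routing fields at a fixed position at the top of the vector, independently of the payload width.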
Synchronization Defines
The Synchronization Defines include file (npu_sychronization_defines.sv) mainly defines a data structure that describes the synchronization message at the highest level of the network stack:
typedef struct packed {
    barrier_t id_barrier;
    cnt_barrier_t cnt_setup;
    tile_id_t tile_id_source;
} sync_account_message_t;

typedef struct packed {
    barrier_t id_barrier;
    logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
} sync_release_message_t;
Synchronization traffic is encapsulated in a sync_message_t and then in a service_message_t:
typedef struct packed {
    sync_message_type_t sync_type;
    union packed {
        sync_account_message_t account_mess;
        sync_release_message_t release_mess;
    } sync_mess;
} sync_message_t;

typedef struct packed {
    cnt_barrier_t cnt;
    tile_mask_t mask_slave;
} barrier_data_t;
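The padding field in sync_release_message_t is there because all members of a packed union must have the same width in SystemVerilog; the padding makes the release message exactly as wide as the account message. The sketch below illustrates this with hypothetical widths for barrier_t, cnt_barrier_t and tile_id_t and a hypothetical encoding for sync_message_type_t. Only the struct and union shapes are taken from the listings above.

```systemverilog
// Hypothetical base types, chosen only to make the sketch self-contained.
typedef logic [5:0] barrier_t;
typedef logic [3:0] cnt_barrier_t;
typedef logic [1:0] tile_id_t;
typedef enum logic { ACCOUNT = 1'b0, RELEASE = 1'b1 } sync_message_type_t;

// Structures as shown above.
typedef struct packed {
  barrier_t     id_barrier;
  cnt_barrier_t cnt_setup;
  tile_id_t     tile_id_source;
} sync_account_message_t;

typedef struct packed {
  barrier_t id_barrier;
  logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
} sync_release_message_t;

typedef struct packed {
  sync_message_type_t sync_type;
  union packed {
    sync_account_message_t account_mess;
    sync_release_message_t release_mess;
  } sync_mess;
} sync_message_t;

// Hypothetical module, not part of the NPU sources.
module sync_message_sketch;
  sync_message_t msg;

  initial begin
    // Both union members have the same width thanks to the padding.
    assert ($bits(sync_account_message_t) == $bits(sync_release_message_t));

    msg.sync_type                          = ACCOUNT;
    msg.sync_mess.account_mess.id_barrier     = 6'd5;
    msg.sync_mess.account_mess.cnt_setup      = 4'd8;
    msg.sync_mess.account_mess.tile_id_source = 2'd1;

    // id_barrier is the first member of both structs, so it sits in the
    // same bits whichever view of the union is used.
    assert (msg.sync_mess.release_mess.id_barrier == 6'd5);
    $display("sync message is %0d bits wide", $bits(sync_message_t));
  end
endmodule
```

Placing id_barrier first in both message types lets a receiver read the barrier ID before it has decoded sync_type.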
Coherence Defines
The Coherence Defines include file (npu_coherence_defines.sv) defines all types and structures used by coherence actors (mainly the Cache Controller and the Directory Controller), such as the TSHR entry type:
typedef struct packed {
    logic valid;
    logic [`DIRECTORY_STATE_WIDTH - 1 : 0] state;
    l2_cache_address_t address;
    tile_mask_t sharers_list;
    tile_address_t owner;
} tshr_entry_t;
or the requests a Cache Controller (CC) can accept:
typedef enum coherence_request_t {
    load = 0,
    store = 1,
    replacement = 2,
    Fwd_GetS = 3,
    Fwd_GetM = 4,
    Inv = 5,
    Put_Ack = 6,
    Data_from_Dir_ack_eqz = 7,
    Data_from_Dir_ack_gtz = 8,
    Data_from_Owner = 9,
    Inv_Ack = 10,
    Last_Inv_Ack = 11,
    recall = 12,
    flush = 13,
    load_uncoherent = 14,
    store_uncoherent = 15,
    replacement_uncoherent = 16,
    flush_uncoherent = 17,
    Fwd_Flush = 18,
    dinv = 19,
    dinv_uncoherent = 20
} coherence_requests_enum_t;
and the kinds of message a coherence actor can send:
typedef enum message_request_t {
    GETS = 0,
    GETM = 1,
    PUTS = 2,
    PUTM = 3,
    DIR_FLUSH = 13
} message_requests_enum_t;
The types and functions defined here are used extensively by the protocol ROMs, on both the Cache Controller and the Directory Controller side.