Include

From NaplesPU Documentation
Revision as of 14:06, 21 June 2019 by Mirko

NPU Defines

The main NPU include file, npu_defines.sv, stores global defines at the core level. Parameters such as the number of hardware lanes, the number of registers per register file, and the memory address width are defined in this file:

`define HW_LANE                 16
`define ADDRESS_SIZE            32
`define REGISTER_NUMBER         64
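As a minimal sketch of how these globals are typically consumed, the package below sizes a vector register and a memory address from HW_LANE and ADDRESS_SIZE. The 32-bit lane width and the names npu_defs_sketch_pkg, scalar_t, vector_t and address_t are assumptions for illustration, not definitions taken from npu_defines.sv.

```systemverilog
// Defaults copied from the text; the guards let the real npu_defines.sv take precedence.
`ifndef HW_LANE
`define HW_LANE 16
`endif
`ifndef ADDRESS_SIZE
`define ADDRESS_SIZE 32
`endif

package npu_defs_sketch_pkg;
  // Assumption (hypothetical): each hardware lane holds one 32-bit scalar.
  localparam int LANE_WIDTH = 32;

  typedef logic [LANE_WIDTH-1:0]    scalar_t;  // one lane
  typedef scalar_t [`HW_LANE-1:0]   vector_t;  // one vector register: HW_LANE lanes
  typedef logic [`ADDRESS_SIZE-1:0] address_t; // one memory address

  // Vector register width in bits: 16 lanes x 32 bits = 512 with the defaults.
  localparam int VECTOR_WIDTH = $bits(vector_t);
endpackage
```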

along with the indices of the special-purpose registers:

`define PC_REG                  ( `REGISTER_NUMBER - 1 )
`define RA_REG                  ( `REGISTER_NUMBER - 2 )
`define SP_REG                  ( `REGISTER_NUMBER - 3 )
`define FP_REG                  ( `REGISTER_NUMBER - 4 )
`define MASK_REG                ( `REGISTER_NUMBER - 5 )
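Because the special-purpose registers sit at the top of the register file, their indices follow directly from REGISTER_NUMBER: with the default of 64 registers, PC_REG is 63, RA_REG 62, SP_REG 61, FP_REG 60 and MASK_REG 59. The sketch below assumes that layout; the package name and the is_special_reg helper are hypothetical.

```systemverilog
// Defaults copied from the text; the guards let the real npu_defines.sv take precedence.
`ifndef REGISTER_NUMBER
`define REGISTER_NUMBER 64
`endif
`ifndef PC_REG
`define PC_REG   ( `REGISTER_NUMBER - 1 )
`define RA_REG   ( `REGISTER_NUMBER - 2 )
`define SP_REG   ( `REGISTER_NUMBER - 3 )
`define FP_REG   ( `REGISTER_NUMBER - 4 )
`define MASK_REG ( `REGISTER_NUMBER - 5 )
`endif

package npu_regs_sketch_pkg;
  // Width of a register index: 6 bits for 64 registers.
  localparam int REG_IDX_WIDTH = $clog2(`REGISTER_NUMBER);
  typedef logic [REG_IDX_WIDTH-1:0] reg_idx_t;

  // Hypothetical helper: true for the five special-purpose registers,
  // which occupy indices MASK_REG up to PC_REG at the top of the file.
  function automatic bit is_special_reg(reg_idx_t idx);
    return idx >= reg_idx_t'(`MASK_REG);
  endfunction
endpackage
```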

as well as opcode definitions and data structures related to instruction decoding:

NOT      = `OP_CODE_WIDTH'b000000,
OR       = `OP_CODE_WIDTH'b000001,
AND      = `OP_CODE_WIDTH'b000010,
XOR      = `OP_CODE_WIDTH'b000011,
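The opcode values above can be collected into an enum and decoded with a case statement. This is a hedged sketch: it assumes OP_CODE_WIDTH is 6 (matching the six-digit literals) and 32-bit operands, and the names logic_op_t and exec_logic are hypothetical rather than part of the NaplesPU sources.

```systemverilog
// Assumption: OP_CODE_WIDTH is 6, matching the six-digit literals in the text.
`ifndef OP_CODE_WIDTH
`define OP_CODE_WIDTH 6
`endif

package npu_opcode_sketch_pkg;
  // Hypothetical enum built from the four opcodes listed in the text.
  typedef enum logic [`OP_CODE_WIDTH-1:0] {
    NOT = `OP_CODE_WIDTH'b000000,
    OR  = `OP_CODE_WIDTH'b000001,
    AND = `OP_CODE_WIDTH'b000010,
    XOR = `OP_CODE_WIDTH'b000011
  } logic_op_t;

  // Hypothetical decode of the logic opcodes on 32-bit operands (NOT ignores b).
  function automatic logic [31:0] exec_logic(logic_op_t op, logic [31:0] a, logic [31:0] b);
    case (op)
      NOT:     return ~a;
      OR:      return a | b;
      AND:     return a & b;
      XOR:     return a ^ b;
      default: return '0;
    endcase
  endfunction
endpackage
```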

User Defines

The User Defines include file (npu_user_defines.sv) exposes all the configurable parameters of the architecture to the final user. Most of these parameters must be powers of two.

Core-side user defines (all numeric parameters below must be powers of two):

  • THREAD_NUMB: number of hardware threads instantiated, default 8.
  • USER_ICACHE_SET: number of instruction cache sets, default 32.
  • USER_ICACHE_WAY: number of instruction cache ways, default 4.
  • USER_DCACHE_SET: number of data cache sets, default 32.
  • USER_DCACHE_WAY: number of data cache ways, default 4.
  • USER_L2CACHE_SET: number of L2 data cache sets, default 128.
  • USER_L2CACHE_WAY: number of L2 data cache ways, default 8.
  • NPU_SPM: when defined, allocates a scratchpad memory in each NPU core.
  • NPU_FPU: when defined, allocates an FPU in each NPU core.
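The power-of-two constraint matters because these values become index fields sized with $clog2, and only a power of two guarantees that every value of such a field maps to a real thread, set or way. A minimal sketch of a check and of the derived widths, assuming the default values listed above; the package, is_pow2 and the *_W parameter names are hypothetical.

```systemverilog
// Defaults from the text; the guards let the real npu_user_defines.sv take precedence.
`ifndef THREAD_NUMB
`define THREAD_NUMB 8
`endif
`ifndef USER_DCACHE_SET
`define USER_DCACHE_SET 32
`endif
`ifndef USER_DCACHE_WAY
`define USER_DCACHE_WAY 4
`endif

package npu_user_check_sketch_pkg;
  // Hypothetical helper: a non-zero value is a power of two
  // iff clearing its lowest set bit (n & (n - 1)) leaves zero.
  function automatic bit is_pow2(int unsigned n);
    return (n != 0) && ((n & (n - 1)) == 0);
  endfunction

  // Index widths derived from the defaults: 3 thread-ID bits, 5 set bits, 2 way bits.
  localparam int THREAD_ID_W = $clog2(`THREAD_NUMB);
  localparam int DSET_IDX_W  = $clog2(`USER_DCACHE_SET);
  localparam int DWAY_IDX_W  = $clog2(`USER_DCACHE_WAY);
endpackage
```

A testbench or an elaboration-time check could call is_pow2 on each user define to reject an invalid configuration before simulation starts.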

System-side user defines:

  • DIRECTORY_BARRIER: when defined, the manycore system supports a distributed directory mechanism spread over all tiles; otherwise, a single centralized directory is allocated. The single-core version always has a centralized synchronization master.
  • CENTRAL_SYNCH_ID: ID of the centralized directory, used only when DIRECTORY_BARRIER is undefined.
  • NoC_X_WIDTH: Network-on-Chip mesh x dimension width, default value 2, must be a power of 2.
  • NoC_Y_WIDTH: Network-on-Chip mesh y dimension width, default value 2, must be a power of 2.
  • TILE_MEMORY_ID: ID of the memory tile.
  • TILE_H2C_ID: ID of the host interface tile.
  • TILE_NPU: number of tiles containing an NPU core.
  • IO_MAP_BASE_ADDR: base address of the non-coherent memory space dedicated to IO devices, default value 0xFF00_0000.
  • IO_MAP_SIZE: size of the non-coherent memory space dedicated to IO devices, default value 0x00FF_FF00.
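With the default values, the IO window spans from 0xFF00_0000 up to, but not including, 0xFF00_0000 + 0x00FF_FF00 = 0xFFFF_FF00. A sketch of the corresponding address check, assuming the 32-bit ADDRESS_SIZE default; is_io_address is a hypothetical helper, not a NaplesPU function.

```systemverilog
// Defaults from the text; the guards let the real npu_user_defines.sv take precedence.
`ifndef IO_MAP_BASE_ADDR
`define IO_MAP_BASE_ADDR 32'hFF00_0000
`endif
`ifndef IO_MAP_SIZE
`define IO_MAP_SIZE 32'h00FF_FF00
`endif

package npu_iomap_sketch_pkg;
  // Hypothetical helper: true when addr lies in [IO_MAP_BASE_ADDR, IO_MAP_BASE_ADDR + IO_MAP_SIZE).
  function automatic bit is_io_address(logic [31:0] addr);
    logic [32:0] base, limit;  // 33 bits so that base + size cannot wrap around
    base  = `IO_MAP_BASE_ADDR;
    limit = base + `IO_MAP_SIZE;
    return ({1'b0, addr} >= base) && ({1'b0, addr} < limit);
  endfunction
endpackage
```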

Furthermore, a set of DISPLAY variables is defined, all commented out by default. When a DISPLAY variable is active, it generates a log file in a folder named after the selected kernel. Each DISPLAY variable logs a specific kind of transaction, namely:

  • DISPLAY_MEMORY: logs the memory state to file at the end of the kernel execution.
  • DISPLAY_MEMORY_TRANS: logs all requests to the main memory to file.
  • DISPLAY_INT: logs every operation executed in the integer pipeline, along with its result.
  • DISPLAY_CORE: enables logging from the core (file display_core.txt).
  • DISPLAY_ISSUE: logs all scheduled instructions, tracking the scheduled PC and the issuing thread, when DISPLAY_CORE is defined.
  • DISPLAY_INT: logs all results from the integer module, when DISPLAY_CORE is defined.
  • DISPLAY_WB: logs all results from the writeback module, when DISPLAY_CORE is defined.
  • DISPLAY_LDST: enables logging in the load/store unit (file display_ldst.txt).
  • DISPLAY_CACHE_CONTROLLER: logs memory transactions between the load/store unit and the main memory.
  • DISPLAY_SYNCH_CORE: logs synchronization requests within the core.
  • DISPLAY_BARRIER_CORE: logs synchronization releases from the synchronization master.
  • DISPLAY_COHERENCE: logs all coherence transactions among Cache Controllers (CCs), Directory Controllers (DCs) and the Memory Controller (MC).
  • DISPLAY_THREAD_STATUS: displays the status of all active threads and their trap reason.

Each of these variables selectively enables logging of a specific feature. For each define, a log file is typically created in npu/simulationlog/<name_of_the_kernel>/display_<name>. The DISPLAY_SIMULATION_LOG variable must always be defined in the simulation flow; it also displays architectural information on the Tcl shell.
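The usual pattern behind these switches is to compile logging code only when its define is set, so a disabled log costs nothing in simulation. A minimal sketch, assuming a hypothetical module, file name and message format; the real flow writes its logs under npu/simulationlog/<name_of_the_kernel>/.

```systemverilog
// Hypothetical module: the logging logic exists only when DISPLAY_INT is defined.
module display_sketch (
  input logic        clk,
  input logic        valid,
  input logic [31:0] pc,
  input logic [31:0] result
);
`ifdef DISPLAY_INT
  integer log_fd;

  // Hypothetical file name, opened once at time zero.
  initial log_fd = $fopen("display_int.txt", "w");

  // One line per valid result: simulation time, PC and result in hex.
  always_ff @(posedge clk)
    if (valid)
      $fdisplay(log_fd, "[%0t] PC=%h result=%h", $time, pc, result);

  // Flush and close the log when the simulation ends.
  final $fclose(log_fd);
`endif
endmodule
```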

Scratchpad Memory Defines

The Scratchpad Memory Defines include file (npu_spm_defines.sv) exposes to the final user all the configurable parameters of the core-level scratchpad memory, along with other component-specific typedefs. The SPM has the following configurable parameters:

  • SM_PROCESSING_ELEMENTS: number of concurrent input requests, default value 16 (equal to the number of hardware lanes).
  • SM_ENTRIES: number of entries per bank (similar to cache sets).
  • SM_MEMORY_BANKS: number of banks, default value 16.
  • SM_BYTE_PER_ENTRY: number of bytes per entry, default value 4.
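The SPM parameters determine its capacity and the width of each address field. The sketch below assumes SM_ENTRIES = 1024 (the text gives no default) and a word-interleaved mapping in which the low bits select the byte within an entry and the next bits select the bank; both the mapping and the names are hypothetical, not taken from npu_spm_defines.sv.

```systemverilog
package npu_spm_sketch_pkg;
  localparam int SM_MEMORY_BANKS   = 16;   // default from the text
  localparam int SM_BYTE_PER_ENTRY = 4;    // default from the text
  localparam int SM_ENTRIES        = 1024; // assumption: the text gives no default

  // Capacity: 1024 entries x 16 banks x 4 bytes = 65536 bytes (64 KiB).
  localparam int SM_SIZE_BYTES = SM_ENTRIES * SM_MEMORY_BANKS * SM_BYTE_PER_ENTRY;

  localparam int OFFSET_W = $clog2(SM_BYTE_PER_ENTRY); // 2 bits
  localparam int BANK_W   = $clog2(SM_MEMORY_BANKS);   // 4 bits
  localparam int ENTRY_W  = $clog2(SM_ENTRIES);        // 10 bits

  // Hypothetical word-interleaved split: LSBs select the byte, the next bits the bank.
  typedef struct packed {
    logic [ENTRY_W-1:0]  entry;
    logic [BANK_W-1:0]   bank;
    logic [OFFSET_W-1:0] offset;
  } spm_address_t;

  // Hypothetical helper: bank addressed by a byte address under the mapping above.
  function automatic logic [BANK_W-1:0] bank_of(logic [$bits(spm_address_t)-1:0] addr);
    spm_address_t a;
    a = addr;
    return a.bank;
  endfunction
endpackage
```

Under this mapping, 16 lanes reading consecutive 4-byte words (addresses 0, 4, ..., 60) hit banks 0 to 15, one bank each, so a unit-stride vector access causes no bank conflicts.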

Network Defines

The Network Defines are spread over two include files, namely npu_message_service_defines.sv and npu_network_define.sv. The former defines shared typedefs used by the host interfacing mechanism over the Network-on-Chip. The latter defines the length of flits and exposes two configuration parameters to the final user:

  • VC_PER_PORT: number of Virtual Channels per router port (each router has 5 ports), default value 4, must be a power of 2.
  • QUEUE_LEN_PER_VC: length of the FIFO for each Virtual Channel, default value 16, must be a power of 2.

The remainder of the file defines structures that ease flit management:

typedef struct packed {
    flit_type_t flit_type;
    vc_id_t vc_id;
    port_t next_hop_port;
    tile_address_t destination;
    tile_destination_t core_destination;
} flit_header_t;
typedef logic [`PAYLOAD_W-1:0] flit_body_t;
typedef struct packed {
    flit_header_t header;
    flit_body_t payload;
} flit_t;
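To see how the header fields add up, the sketch below fills in hypothetical widths for the field types, using VC_PER_PORT = 4 and a 2x2 mesh from the defaults above and assuming a 64-bit PAYLOAD_W. Under these assumptions the header is 11 bits and a whole flit is 75 bits. The field types, the HT (head-and-tail) flit type and make_single_flit are illustrative stand-ins, not the NaplesPU definitions.

```systemverilog
// Assumption for this sketch: 64-bit payload.
`ifndef PAYLOAD_W
`define PAYLOAD_W 64
`endif

package npu_flit_sketch_pkg;
  localparam int VC_PER_PORT = 4;  // default from the text
  localparam int NOC_X_WIDTH = 2;  // default from the text
  localparam int NOC_Y_WIDTH = 2;  // default from the text

  // Hypothetical stand-ins for the field types of the network include file.
  typedef enum logic [1:0] {HEADER, BODY, TAIL, HT} flit_type_t;     // HT: single-flit packet
  typedef logic [$clog2(VC_PER_PORT)-1:0] vc_id_t;                  // 2 bits for 4 VCs
  typedef enum logic [2:0] {LOCAL, EAST, NORTH, WEST, SOUTH} port_t; // 5 router ports
  typedef struct packed {
    logic [$clog2(NOC_Y_WIDTH)-1:0] y;
    logic [$clog2(NOC_X_WIDTH)-1:0] x;
  } tile_address_t;
  typedef logic [1:0] tile_destination_t;

  // Flit layout as given in the text.
  typedef struct packed {
    flit_type_t        flit_type;
    vc_id_t            vc_id;
    port_t             next_hop_port;
    tile_address_t     destination;
    tile_destination_t core_destination;
  } flit_header_t;
  typedef logic [`PAYLOAD_W-1:0] flit_body_t;
  typedef struct packed {
    flit_header_t header;
    flit_body_t   payload;
  } flit_t;

  // Hypothetical constructor for a packet that fits in a single flit.
  function automatic flit_t make_single_flit(vc_id_t vc, port_t hop, tile_address_t dst, flit_body_t data);
    flit_t f;
    f.header.flit_type        = HT;
    f.header.vc_id            = vc;
    f.header.next_hop_port    = hop;
    f.header.destination      = dst;
    f.header.core_destination = '0;
    f.payload                 = data;
    return f;
  endfunction
endpackage
```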

Synchronization Defines

The Synchronization Defines include file (npu_sychronization_defines.sv) mainly defines the data structures that describe synchronization messages at the highest level of the network stack:

typedef struct packed {
    barrier_t id_barrier;
    cnt_barrier_t cnt_setup;
    tile_id_t tile_id_source;
} sync_account_message_t;
typedef struct packed {
    barrier_t id_barrier;
    logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
} sync_release_message_t;

Synchronization traffic is encapsulated in a sync_message_t, which is in turn wrapped in a service_message_t:

typedef struct packed {
    sync_message_type_t sync_type;
    union packed {
         sync_account_message_t account_mess;
         sync_release_message_t release_mess;
    } sync_mess;
} sync_message_t;
typedef struct packed{
    cnt_barrier_t cnt;
    tile_mask_t mask_slave;
} barrier_data_t;
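The padding field in sync_release_message_t is what makes the union legal: every member of a packed union must have the same width, so the release message is padded to match the account message. The sketch below checks this with hypothetical field widths; the widths, the ACCOUNT/RELEASE enum values and make_release are assumptions.

```systemverilog
package npu_sync_sketch_pkg;
  // Hypothetical widths; the real ones come from the NaplesPU include files.
  typedef logic [3:0] barrier_t;
  typedef logic [5:0] cnt_barrier_t;
  typedef logic [1:0] tile_id_t;
  typedef enum logic {ACCOUNT, RELEASE} sync_message_type_t;  // hypothetical encoding

  typedef struct packed {
    barrier_t     id_barrier;
    cnt_barrier_t cnt_setup;
    tile_id_t     tile_id_source;
  } sync_account_message_t;                       // 4 + 6 + 2 = 12 bits

  // Padding makes this struct exactly as wide as sync_account_message_t.
  typedef struct packed {
    barrier_t id_barrier;
    logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
  } sync_release_message_t;                       // 4 + 8 = 12 bits

  typedef struct packed {
    sync_message_type_t sync_type;
    union packed {
      sync_account_message_t account_mess;
      sync_release_message_t release_mess;
    } sync_mess;
  } sync_message_t;                               // 1 + 12 = 13 bits

  // Hypothetical constructor for a release message of barrier id.
  function automatic sync_message_t make_release(barrier_t id);
    sync_message_t m;
    m.sync_type                         = RELEASE;
    m.sync_mess.release_mess.id_barrier = id;
    m.sync_mess.release_mess.padding    = '0;
    return m;
  endfunction
endpackage
```

Because id_barrier is the first (most significant) field of both members, it occupies the same bits in either view of the union, so a receiver can read the barrier ID before looking at sync_type.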

Coherence Defines

The Coherence Defines include file (npu_coherence_defines.sv) defines all the types and structures used by the coherence actors (mainly the Cache Controller and the Directory Controller), such as the TSHR entry type:

typedef struct packed {
    logic                                  valid;
    logic [`DIRECTORY_STATE_WIDTH - 1 : 0] state;
    l2_cache_address_t                     address;
    tile_mask_t                            sharers_list;
    tile_address_t                         owner;
} tshr_entry_t;
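A typical use of the TSHR is to check whether a transaction is already pending on a given address. A hedged sketch of such a lookup, with hypothetical sizes (8 entries, 26-bit L2 addresses, 4 tiles, 2-bit directory state) and a hypothetical tshr_lookup function:

```systemverilog
// Assumption for this sketch: 2-bit directory state.
`ifndef DIRECTORY_STATE_WIDTH
`define DIRECTORY_STATE_WIDTH 2
`endif

package npu_tshr_sketch_pkg;
  // Hypothetical sizes; the real ones come from the NaplesPU include files.
  localparam int TSHR_SIZE = 8;
  typedef logic [25:0] l2_cache_address_t;
  typedef logic [3:0]  tile_mask_t;
  typedef logic [1:0]  tile_address_t;

  // Entry layout as given in the text.
  typedef struct packed {
    logic                                  valid;
    logic [`DIRECTORY_STATE_WIDTH - 1 : 0] state;
    l2_cache_address_t                     address;
    tile_mask_t                            sharers_list;
    tile_address_t                         owner;
  } tshr_entry_t;

  // Hypothetical lookup: index of the valid entry tracking addr, or -1 if none is pending.
  function automatic int tshr_lookup(tshr_entry_t tshr [TSHR_SIZE], l2_cache_address_t addr);
    for (int i = 0; i < TSHR_SIZE; i++)
      if (tshr[i].valid && tshr[i].address == addr)
        return i;
    return -1;
  endfunction
endpackage
```

In synthesized hardware the loop unrolls into TSHR_SIZE parallel address comparators followed by a priority encoder.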

or the requests a Cache Controller (CC) can accept:

typedef enum coherence_request_t {
    load                   = 0,
    store                  = 1,
    replacement            = 2,
    Fwd_GetS               = 3,
    Fwd_GetM               = 4,
    Inv                    = 5,
    Put_Ack                = 6,
    Data_from_Dir_ack_eqz  = 7,
    Data_from_Dir_ack_gtz  = 8,
    Data_from_Owner        = 9,
    Inv_Ack                = 10,
    Last_Inv_Ack           = 11,
    recall                 = 12,
    flush                  = 13,
    load_uncoherent        = 14,
    store_uncoherent       = 15,
    replacement_uncoherent = 16,
    flush_uncoherent       = 17,
    Fwd_Flush              = 18,
    dinv                   = 19,
    dinv_uncoherent        = 20
} coherence_requests_enum_t;
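The enum base type must be wide enough for the largest code, 20, so it needs at least $clog2(21) = 5 bits. The sketch below assumes exactly that width, and adds a hypothetical helper that singles out the requests whose names end in _uncoherent, which judging by their names target non-coherent memory; neither the helper nor the 5-bit width is confirmed by the source.

```systemverilog
package npu_cc_req_sketch_pkg;
  // Assumption: 5 bits, the minimum that holds codes 0..20.
  typedef logic [4:0] coherence_request_t;

  // Same values as in the text, written compactly.
  typedef enum coherence_request_t {
    load = 0, store = 1, replacement = 2, Fwd_GetS = 3, Fwd_GetM = 4, Inv = 5,
    Put_Ack = 6, Data_from_Dir_ack_eqz = 7, Data_from_Dir_ack_gtz = 8,
    Data_from_Owner = 9, Inv_Ack = 10, Last_Inv_Ack = 11, recall = 12, flush = 13,
    load_uncoherent = 14, store_uncoherent = 15, replacement_uncoherent = 16,
    flush_uncoherent = 17, Fwd_Flush = 18, dinv = 19, dinv_uncoherent = 20
  } coherence_requests_enum_t;

  // Hypothetical helper: true for the requests whose names end in _uncoherent.
  function automatic bit is_uncoherent(coherence_requests_enum_t req);
    case (req)
      load_uncoherent, store_uncoherent, replacement_uncoherent,
      flush_uncoherent, dinv_uncoherent: return 1'b1;
      default:                           return 1'b0;
    endcase
  endfunction
endpackage
```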

and, of course, the kinds of message a coherence actor can send:

typedef enum message_request_t {
    GETS      = 0,
    GETM      = 1,
    PUTS      = 2,
    PUTM      = 3,
    DIR_FLUSH = 13
} message_requests_enum_t;
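As an illustration of how these messages relate to Cache Controller requests, the sketch below encodes the mapping of a textbook MSI protocol: a load miss sends GETS, a store miss or upgrade sends GETM, and evicting a Shared or Modified line sends PUTS or PUTM respectively. This is an assumption about the protocol made for illustration, not the contents of the NaplesPU protocol ROMs; the 4-bit base type, the state enum and the function names are hypothetical.

```systemverilog
package npu_msg_sketch_pkg;
  // Assumption: 4-bit base type; only the values shown in the text are listed.
  typedef logic [3:0] message_request_t;
  typedef enum message_request_t {
    GETS      = 0,
    GETM      = 1,
    PUTS      = 2,
    PUTM      = 3,
    DIR_FLUSH = 13
  } message_requests_enum_t;

  // Hypothetical stable states of a textbook MSI cache line.
  typedef enum logic [1:0] {STATE_I, STATE_S, STATE_M} cc_state_t;

  // A load needs a message only from Invalid; a store needs one unless the line is Modified.
  function automatic bit needs_request(bit is_store, cc_state_t state);
    return is_store ? (state != STATE_M) : (state == STATE_I);
  endfunction

  // Message sent on a load miss (GETS) or on a store miss or upgrade (GETM).
  function automatic message_requests_enum_t miss_request(bit is_store);
    return is_store ? GETM : GETS;
  endfunction

  // Message sent when evicting a valid line: PUTM for a Modified copy, PUTS for a Shared one.
  function automatic message_requests_enum_t eviction_request(cc_state_t state);
    return (state == STATE_M) ? PUTM : PUTS;
  endfunction
endpackage
```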

The types and functions defined here are used extensively by the protocol ROMs, on both the Cache Controller and the Directory Controller side.