Difference between revisions of "Heterogeneous Tile"

From NaplesPU Documentation

Revision as of 18:58, 13 May 2019

The nu+ project provides a heterogeneous tile, integrated into the network-on-chip (NoC), that is meant to be extended by the user. The tile serves as a first example of how to integrate a custom module into the NoC through a dedicated tile.

Memory Interface

The Memory Interface provides a transparent way to interact with the coherence system. It implements a simple valid/available handshake per thread: different threads may issue different memory transactions, and these are handled concurrently by the coherence system.

When a thread has a memory request, it first checks the availability bit associated with its ID. If the bit is high, the thread issues a memory transaction by setting the valid bit and driving all the required information on the Memory Interface.
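
The handshake described above can be sketched as a small behavioral model. This is Python rather than RTL, and the names MemoryInterfaceModel, available, and try_issue are hypothetical. The sketch assumes one availability bit per thread, where high means the thread may issue; it does not model timing or the coherence system.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryInterfaceModel:
    """Behavioral sketch of the per-thread valid/available handshake (not RTL)."""
    num_threads: int
    # Assumption: one availability bit per thread, True (high) = may issue.
    available: list = field(default_factory=list)
    issued: list = field(default_factory=list)

    def __post_init__(self):
        if not self.available:
            self.available = [True] * self.num_threads

    def try_issue(self, thread_id, op, address, data=0):
        """Issue a request only if the thread's availability bit is high."""
        if not self.available[thread_id]:
            return False  # the thread holds its request and retries later
        # Setting the valid bit is modelled as recording the request fields.
        self.issued.append({"thread_id": thread_id, "op": op,
                            "address": address, "data": data})
        return True


mem_if = MemoryInterfaceModel(num_threads=4)
mem_if.available[2] = False                          # thread 2 is blocked
ok_t0 = mem_if.try_issue(0, op=0x0, address=0x100)   # accepted
ok_t2 = mem_if.try_issue(2, op=0x0, address=0x104)   # refused
```

Each thread is checked independently, so a blocked thread does not prevent the others from issuing.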

Supported memory operations are reported below along with their opcodes:

LOAD_8      = 'h0  - 'b000000    
LOAD_16     = 'h1  - 'b000001
LOAD_32     = 'h2  - 'b000010
LOAD_V_8    = 'h7  - 'b000111
LOAD_V_16   = 'h8  - 'b001000
LOAD_V_32   = 'h9  - 'b001001
STORE_8     = 'h20 - 'b100000
STORE_16    = 'h21 - 'b100001
STORE_32    = 'h22 - 'b100010
STORE_V_8   = 'h24 - 'b100100
STORE_V_16  = 'h25 - 'b100101
STORE_V_32  = 'h26 - 'b100110
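
For software that drives or checks the interface (for example a testbench model), the opcode table can be mirrored as an enumeration. The sketch below is Python, not part of the nu+ sources; MemOp, is_store, and is_vector are hypothetical names. The bit-5 test in is_store is only an observation about the values listed above, not a documented encoding rule.

```python
from enum import IntEnum


class MemOp(IntEnum):
    """Opcodes copied from the table above."""
    LOAD_8 = 0x00
    LOAD_16 = 0x01
    LOAD_32 = 0x02
    LOAD_V_8 = 0x07
    LOAD_V_16 = 0x08
    LOAD_V_32 = 0x09
    STORE_8 = 0x20
    STORE_16 = 0x21
    STORE_32 = 0x22
    STORE_V_8 = 0x24
    STORE_V_16 = 0x25
    STORE_V_32 = 0x26


def is_store(op):
    # Observation: every STORE opcode in the table has bit 5 set.
    return bool(int(op) & 0x20)


def is_vector(op):
    # Vector operations are identified by the "_V_" infix in their name.
    return "_V_" in MemOp(op).name


def as_binary(op):
    # Six-bit binary string, matching the 'b... column of the table.
    return format(int(op), "06b")
```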

A custom core to be integrated into the nu+ system must implement the following interface in order to communicate with the memory system:

/* Memory Interface */
// To Heterogeneous LSU
output logic                                 req_out_valid,     // Valid signal for issued memory requests
output logic [31 : 0]                        req_out_id,        // ID of the issued request, mainly used for debugging
output logic [THREAD_IDX_W - 1 : 0]          req_out_thread_id, // Thread ID of issued request. Requests running on different threads are dispatched to the CC concurrently
output logic [7 : 0]                         req_out_op,        // Operation performed
output logic [ADDRESS_WIDTH - 1 : 0]         req_out_address,   // Issued request address
output logic [DATA_WIDTH - 1 : 0]            req_out_data,      // Data output
// From Heterogeneous LSU
input  logic                                 resp_in_valid,      // Valid signal for the incoming responses
input  logic [31 : 0]                        resp_in_id,         // ID of the incoming response, mainly used for debugging
input  logic [THREAD_IDX_W - 1 : 0]          resp_in_thread_id,  // Thread ID of the incoming response
input  logic [7 : 0]                         resp_in_op,         // Operation code
input  logic [DATA_WIDTH - 1 : 0]            resp_in_cache_line, // Incoming data
input  logic [BYTES_PERLINE - 1 : 0]         resp_in_store_mask, // Bitmask of the position of the requesting bytes in the incoming data bus
input  logic [ADDRESS_WIDTH - 1 : 0]         resp_in_address,    // Incoming response address
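
As an illustration of how resp_in_store_mask might be used, the sketch below picks the flagged bytes out of a response line. It is a Python model under stated assumptions: bit i of the mask is assumed to correspond to byte i of resp_in_cache_line, and the byte ordering is also an assumption, since neither is specified here. masked_bytes is a hypothetical helper, and the 8-byte line is a toy size rather than the real DATA_WIDTH.

```python
def masked_bytes(cache_line, store_mask):
    """Return {byte_index: value} for every byte whose mask bit is set.

    Assumption: bit i of store_mask refers to byte i of cache_line.
    """
    return {i: b for i, b in enumerate(cache_line) if (store_mask >> i) & 1}


# Toy 8-byte line holding the values 0x10..0x17.
line = bytes(range(0x10, 0x18))
selected = masked_bytes(line, 0b00001100)   # mask flags bytes 2 and 3
```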

Synchronization Interface

The Synchronization Interface connects the user logic with the core-side synchronization module allocated within the tile (namely the barrier_core unit). This interface allows user logic to synchronize at thread granularity. The synchronization mechanism supports both inter-tile and intra-tile barrier synchronization. When a thread hits a synchronization point, it issues a request to the distributed synchronization master through the Synchronization Interface. The thread is then stalled (stalling is up to the user logic) until its release signal goes high again.
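
The barrier behavior described above can be pictured with a simple counting model. This Python sketch is not the protocol of the distributed synchronization master, which is not detailed here. BarrierMasterModel and its request method are hypothetical, and the model only assumes that a barrier releases all waiting threads once the expected number of requests has arrived.

```python
class BarrierMasterModel:
    """Counting sketch of a barrier: release everyone on the last arrival."""

    def __init__(self):
        self.waiting = {}  # barrier_id -> list of stalled requesters

    def request(self, barrier_id, requester, total):
        """Register a requester; return the released requesters, if any."""
        stalled = self.waiting.setdefault(barrier_id, [])
        stalled.append(requester)
        if len(stalled) < total:
            return []          # requester stays stalled
        released = list(stalled)
        del self.waiting[barrier_id]
        return released        # release signal goes high for all of them


master = BarrierMasterModel()
# Two tiles with two threads each synchronize on barrier ID 42.
requesters = [(tile, thread) for tile in range(2) for thread in range(2)]
results = [master.request(42, r, total=len(requesters)) for r in requesters]
```

In this model the first three requests return an empty list (the threads stay stalled), and the fourth returns all four requesters at once.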

Heterogeneous Dummy provided

The FSM of the provided dummy core first synchronizes with the other heterogeneous tiles (ht) in the NoC. Each dummy core in an ht tile requires synchronization for LOCAL_BARRIER_NUMB threads (default = 4). The SEND_BARRIER state sends LOCAL_BARRIER_NUMB requests with barrier ID 42 through the Synchronization Interface. It sets the total number of threads synchronizing on barrier ID 42 to TOTAL_BARRIER_NUMB (= LOCAL_BARRIER_NUMB x `TILE_HT, where `TILE_HT is the number of heterogeneous tiles in the system). When the last barrier request is issued, SEND_BARRIER jumps to WAIT_SYNCH, which waits for the ACK from the synchronization master.

At this point all threads in each ht tile are synchronized, and the FSM starts all pending memory transactions. The START_MEM_READ_TRANS state performs LOCAL_WRITE_REQS read operations (default = 128), each one a LOAD_8 operation (opcode = 0). In the default configuration, 128 LOAD_8 operations on consecutive addresses are spread among all threads and issued to the LSU through the Memory Interface.

When the read operations are over, the FSM starts the write operations in a similar way. The START_MEM_WRITE_TRANS state performs LOCAL_WRITE_REQS (default = 128) write operations on consecutive addresses through the Memory Interface. This time the operation is a STORE_8, and all ht tiles issue the same store operation on the same addresses, competing for ownership in a transparent way. Coherence is handled entirely by the LSU and CC; on the core side, the lsu_het_almost_full bitmap states the availability of the LSU for each thread (for both writes and reads). In both states, a thread first checks the availability stored at the position equal to its ID (lsu_het_almost_full[thread_id]), then performs a memory transaction.
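
The sequence of states walked through above can be summarized as a trace generator. This Python sketch is a behavioral approximation, not the dummy core's RTL. The function names, the trace format, the base address of 0, and the round-robin assignment of requests to threads are assumptions (the text only says requests are spread among all threads). Availability checks and the ACK wait are not modelled.

```python
LOCAL_BARRIER_NUMB = 4     # default from the text
LOCAL_WRITE_REQS = 128     # default from the text
BARRIER_ID = 42
LOAD_8, STORE_8 = 0x00, 0x20


def total_barrier_numb(tile_ht):
    # TOTAL_BARRIER_NUMB = LOCAL_BARRIER_NUMB x number of heterogeneous tiles
    return LOCAL_BARRIER_NUMB * tile_ht


def dummy_core_trace(tile_ht, num_threads=LOCAL_BARRIER_NUMB, base_address=0):
    """List the actions of one dummy core, in FSM order."""
    trace = []
    # SEND_BARRIER: one barrier request per local thread.
    for _ in range(LOCAL_BARRIER_NUMB):
        trace.append(("SEND_BARRIER", BARRIER_ID, total_barrier_numb(tile_ht)))
    # WAIT_SYNCH: wait for the ACK from the synchronization master.
    trace.append(("WAIT_SYNCH",))
    # Reads first, then writes, both on consecutive addresses.
    for state, op in (("START_MEM_READ_TRANS", LOAD_8),
                      ("START_MEM_WRITE_TRANS", STORE_8)):
        for i in range(LOCAL_WRITE_REQS):
            thread_id = i % num_threads   # assumption: round-robin spreading
            trace.append((state, thread_id, op, base_address + i))
    return trace


trace = dummy_core_trace(tile_ht=2)
```

With two heterogeneous tiles, each barrier request carries a total of 8 threads, and each core issues 128 loads followed by 128 stores.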