Heterogeneous Tile

From NaplesPU Documentation
Revision as of 18:55, 13 May 2019 by Mirko (talk | contribs) (Created page with "The nu+ project provides a heterogeneous tile integrated into the NoC, meant to be extended by the user. Such a tile provides a first example of how to integrate a custom modu...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

The nu+ project provides a heterogeneous tile integrated into the NoC, meant to be extended by the user. Such a tile provides a first example of how to integrate a custom module in the network-on-chip with a dedicated tile.

Memory Interface

The Memory Interface provides a transparent way to interact with the coherence system. The memory interface implements a simple valid/available handshake per thread, different thread might issue different memory transaction and those are concurrently handled by the coherence system. When a thread has a memory request, it first checks the avaliability bit related to its ID, if this is high the thread issues a nenory transaction setting the valid bit and loading all the needed information on the Memory Interface. Supported memory operations are reported below along with their opcodes:

LOAD_8      = 'h0  - 'b000000    
LOAD_16     = 'h1  - 'b000001
LOAD_32     = 'h2  - 'b000010
LOAD_V_8    = 'h7  - 'b000111
LOAD_V_16   = 'h8  - 'b001000
LOAD_V_32   = 'h9  - 'b001001
STORE_8     = 'h20 - 'b100000
STORE_16    = 'h21 - 'b100001
STORE_32    = 'h22 - 'b100010
STORE_V_8   = 'h24 - 'b100100
STORE_V_16  = 'h25 - 'b100101
STORE_V_32  = 'h26 - 'b100110
                               

Synchronization Interface

The Synchronization Interface connects the user logic with the synchronization module core-side allocated within the tile (namely the barrier_core unit). Such an interface allows user logic to synchronize on a thread grain. The synchronization mechanism supports inter- and intra- tile barrier synchronization. When a thread hits a synchronization point, it issues a request to the distributed synchronization master through the Synchronization Interface. Then, the thread is stalled (up to the user logic) till its release signal is high again.

Heterogeneous Dummy provided

This FSM first synchronizes with other ht in the NoC. Each dummy core in a ht tile requires a synchronization for LOCAL_BARRIER_NUMB threads (default = 4). The SEND_BARRIER state sends LOCAL_BARRIER_NUMB requests with barrier ID 42 through the Synchronization interface. It sets the total number of threads synchronizing on the barrier ID 42 equal to TOTAL_BARRIER_NUMB (= LOCAL_BARRIER_NUMB x `TILE_HT, number of heterogeneous tile in the system). When the last barrier is issued, SEND_BARRIER jumps to WAIT_SYNCH waiting for the ACK from the synchronization master. At this point all threads in each ht tile are synchronized, and the FSM starts all pending memory transactions. The START_MEM_READ_TRANS performs LOCAL_WRITE_REQS read operations (default = 128), performing a LOAD_8 operation (op code = 0) each time. In the default configuration, 128 LOAD_8 operations on consecutive addresses are spread among all threads and issued to the LSU through the Memory interface. When read operations are over, the FSM starts write operations in a similar way. The START_MEM_WRITE_TRANS performs LOCAL_WRITE_REQS (default = 128) write operations on consecutive addresses through the Memory interface. This time the operation performed is a STORE_8, and all ht tile are issuing the same store operation on same addresses compiting for the ownership in a transparent way. The coherence is totally handled by the LSU and CC, on the core side lsu_het_almost_full bitmap states the availability of the LSU for each thread (both writing and reading). In both states, a thread first checks the availability stored in a position equal to its ID (lsu_het_almost_full[thread_id]), then performs a memory transaction.