Load/Store unit

From NaplesPU Documentation
Revision as of 19:52, 22 September 2017

This is the unit inside the core that executes the load and store operations. It contains an L1 data cache in order to reduce the memory access latency. It is divided into three stages, detailed below. It interfaces with the Operand Fetch stage and the Writeback stage. Furthermore, it sends a signal to the instruction buffer unit in order to stop a thread when a miss occurs. Note that the signals to the Writeback stage also reach the cache controller (through the core interface module).

The Load/Store Unit does not store coherence-protocol-specific information (such as stable states); instead, it stores privileges for all cached addresses. Each cache line has two privileges: can read and can write. These privileges determine cache misses/hits and are updated only by the Cache Controller.

Stage 1

This is the first stage of the Load/Store Unit pipeline. It has one queue per thread, in which it stores that thread's instructions coming from the Operand Fetch stage; it then provides one instruction per thread, in parallel, to the second stage.

If the second stage is able to execute the instruction provided by the i-th thread, it combinationally asserts the i-th bit of the ldst2_dequeue_instruction mask in order to notify stage 1 that the instruction has been consumed. In this way, a busy second stage stalls the instructions waiting in this stage.
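The handshake above can be sketched as a behavioral model (not the RTL): stage 1 keeps one FIFO per thread and exposes the head of each, and stage 2 raises one bit per thread in a dequeue mask for the instructions it consumes. Only the signal name ldst2_dequeue_instruction comes from the text; the rest of the structure is illustrative.

```python
# Behavioral sketch of the stage-1/stage-2 dequeue handshake.
from collections import deque

THREADS = 4

class Stage1:
    def __init__(self):
        self.queues = [deque() for _ in range(THREADS)]

    def enqueue(self, tid, instr):
        self.queues[tid].append(instr)

    def heads(self):
        # One candidate instruction per thread (None if that queue is empty).
        return [q[0] if q else None for q in self.queues]

    def apply_dequeue_mask(self, ldst2_dequeue_instruction):
        # Pop only the instructions stage 2 actually consumed; the others
        # stay queued, which is how a busy stage 2 stalls stage 1.
        for tid in range(THREADS):
            if (ldst2_dequeue_instruction >> tid) & 1 and self.queues[tid]:
                self.queues[tid].popleft()

s1 = Stage1()
s1.enqueue(0, "load r1")
s1.enqueue(1, "store r2")
s1.apply_dequeue_mask(0b01)   # stage 2 consumed only thread 0's instruction
```

After the mask is applied, thread 0's queue is empty while thread 1's instruction is still pending, exactly the stall behavior described above.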

Before enqueueing the request, the data are aligned, compressed into a proper vector, and replicated if necessary. Alignment is performed for byte, half-word, and word operations. Vector alignment implies compressing the data so that they are written consecutively. For example, a vec16i8 - which has one significant byte every 4 bytes - is compressed into 16 consecutive bytes.
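The vec16i8 case can be illustrated with a small sketch: each 8-bit element arrives in its own 32-bit lane (one significant byte every 4 bytes) and is packed into 16 consecutive bytes before being enqueued. Little-endian lanes are an assumption here, not stated in the text.

```python
# Illustrative compression of a vec16i8: 16 elements, each occupying a
# 32-bit lane whose low byte is the significant one, packed into
# 16 consecutive bytes.
def compress_vec16i8(lanes):
    """lanes: 16 values, each a 32-bit lane; only the low byte is kept."""
    assert len(lanes) == 16
    return bytes(lane & 0xFF for lane in lanes)

# 16 lanes whose upper 3 bytes are garbage; only the low bytes survive.
packed = compress_vec16i8([0xDEADBE00 | i for i in range(16)])
# packed is 16 consecutive bytes: 0x00, 0x01, ..., 0x0F
```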

NOTE FOR MIRKO: In my opinion this does not work if you operate on it again.

The flush operation forces the data to be enqueued, even if the instruction_valid signal is not asserted.

NOTE FOR MIRKO: In my opinion, instruction_valid is actually required here.

The stage contains a recycle buffer: if a cache miss occurs in the third stage, the instruction is put in this buffer. The output of this buffer competes with the normally issued load/store instructions for re-execution. Recycled instructions have higher priority than the other operations.
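The recycle-buffer selection can be sketched behaviorally (names are illustrative, not the RTL): instructions that missed in the third stage are pushed back into the buffer, and its output wins over the freshly issued instruction.

```python
# Behavioral sketch of the stage-1 recycle buffer and its priority
# over normally issued instructions.
from collections import deque

class RecycleMux:
    def __init__(self):
        self.recycle_buffer = deque()

    def on_cache_miss(self, instr):
        # Stage 3 pushes the missed instruction back for re-execution.
        self.recycle_buffer.append(instr)

    def select(self, issued_instr):
        # Recycled instructions have higher priority than issued ones.
        if self.recycle_buffer:
            return self.recycle_buffer.popleft()
        return issued_instr

mux = RecycleMux()
mux.on_cache_miss("load r3 (replay)")
mux.select("store r4")   # the recycled load wins the competition
mux.select("store r4")   # buffer empty: the issued store goes through
```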

Note that this stage consumes a significant amount of memory, because the queues store the entire instructions along with their fetched operands.

Stage 2

The main purpose of this stage is to select a request to serve and to fetch the tag and privileges from the tag cache.

It receives from the previous stage the load/store requests plus the recycle request for each thread; the two kinds of request are dequeued using two different signals, since they are treated differently, although they assert the same signal. Recycling has priority over the regular requests.

The cc_update_ldst_valid signal is important: it establishes when the cache controller wants to update the cache, and it has the highest priority.

Based on these signals, an arbiter chooses a request and performs the tag-and-privileges read or update. In parallel, another read can occur because of a snooping request from the cache controller.

Stage 3

This stage primarily receives the cached tag and privileges from the previous stage in order to perform the hit/miss detection and, if necessary, the data fetch. It also receives two other signals, concerning flushing and evicting/updating.

The output of this stage determines whether one of these events occurred: a cache miss, an eviction, or a flush. It also produces an important signal that puts the thread to sleep when a miss occurs.
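The hit/miss detection implied above combines the tag comparison with the privileges stored per cache line (see the can-read/can-write privileges described earlier): a load hits only if the line can be read, a store only if it can be written. Field and function names here are illustrative, not taken from the RTL.

```python
# Behavioral sketch of the stage-3 hit/miss check using the per-line
# privileges maintained by the Cache Controller.
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    can_read: bool
    can_write: bool

def is_hit(line, req_tag, is_store):
    # Miss if the line is absent or the tag does not match.
    if line is None or line.tag != req_tag:
        return False
    # Hit only with the privilege the operation needs.
    return line.can_write if is_store else line.can_read

line = CacheLine(tag=0x2A, can_read=True, can_write=False)
is_hit(line, 0x2A, is_store=False)  # load hit
is_hit(line, 0x2A, is_store=True)   # store miss: write privilege missing
```

On the store miss above, the real unit would put the requesting thread to sleep and recycle the instruction while the Cache Controller upgrades the line's privileges.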