Network

From NaplesPU Documentation
The many-core system relies on an interconnection network to exchange coherence, synchronization and boot messages. An interconnection network is a programmable system that moves data between two or more terminals; in a many-core system, it has the vital goal of allowing the various devices to communicate efficiently. A network-on-chip (NoC) is an interconnection network connecting microarchitectural components. Our NoC of choice is a 2D mesh: a bus segmented in two dimensions, with added complexity to route data across dimensions.

A packet from a source has to be: (1) injected into/ejected from the network, and (2) routed to its destination over specific wires. The first operation is performed by the network interface, the second by the router.

The network infrastructure is mainly designed for coherence and synchronization messages. In particular, the NaplesPU coherence system provides a private L1 cache per core and a shared, directory-based L2 cache, completely distributed among all tiles. Each address has an associated home tile, in charge of handling the state of the cache line stored there. Similarly, the home directory handles coherence requests from cores. On the same infrastructure, service messages flow both for host-to-device communication and for handling IO-mapped peripherals.

In order to have a scalable system, each tile has its own router and network interface. Each tile is equipped with multiple devices (called network users from now on) requiring network access. The network infrastructure must thus provide both inter-tile and intra-tile communication capabilities. The interface offered to its users must be as generic as possible, and independent of the specific network topology and implementation details.
  
== General architecture ==

[[File:npu_manycore.png|800px|NaplesPU manycore architecture]]

Tiles are organized as a 2D mesh grid.

Every tile has a [[Network router|router]], which is the component responsible for inter-tile communication, and a [[Network interface|network interface]], which offers a transparent interface to network users. The network interface acts as a bridge: it must adapt to the requirements of multiple network users, converting requests from the user's format to the network format, and back. Once a request is converted into network format, the router takes charge of its handling.

The basic communication unit supported by the router is the flit. A request is thus broken down into flits by the network interface and sent to the local router (injection), and requests are rebuilt from the flits received from the local router (ejection). The router has no information about application messages: it just sees them as a stream of flits. As a sequence of flits can be arbitrarily long, the router offers maximum flexibility, allowing requests of unspecified length. The ultimate goal of the router is to ensure that flits are correctly injected, ejected and forwarded (routed) through the mesh.

== Routing protocol ==

The NaplesPU system works under the assumption that no flit can be lost. This means that routers must buffer packets, and eventually stall in case of full output buffers, to avoid packet drops. In this process, as routers wait for each other, a circular dependency can arise. Since routers cannot drop packets, a deadlock may occur, and we must prevent it from happening.

As we route packets through the mesh, a flit can enter a router and leave it from any cardinal direction. Routing flits along a straight line cannot form a circular dependency, so only turns must be analyzed. The simplest solution is to ban some of the possible turns, in a way that disallows circular dependencies.

The routing protocol adopted in NaplesPU is '''XY Dimension-Order Routing''' (DOR). It forces a packet to be routed first along the X axis, and then along the Y axis. It is one of the simplest routing protocols, as it takes its decisions independently of the current network status (a so-called oblivious protocol) and requires little logic to implement, while still offering deadlock avoidance and shortest-path routing.

Besides that, the NaplesPU network system implements a '''routing look-ahead''' strategy. Although this is an implementation-phase optimization, it is useful to report it here. Every router performs the route calculation, and hence the output port calculation, as if it were the next router along the path, and sends the result along with the flit. This means that every router, when receiving a flit, already knows the output port of that flit. This allows reducing the router's pipeline length, as the routing calculation can be done in parallel with other useful work (such as output port allocation). Further details are discussed on the [[Network router|router]] page.

== Virtual channels and flow control ==

Virtual channels are an extensively adopted technique in network design. They allow building multiple virtual networks on top of a single physical one.

The main problem they address is head-of-line blocking. It happens when a flit that cannot be routed (for instance because the next router's input buffer is full) reaches the head of the input queue, preventing the successive flits, which potentially belong to independent traffic flows, from being served. If those blocked flits belong to different traffic flows, it makes sense to buffer them in different queues.

Virtual channels are called virtual for this reason: there is only a single link between two routers, but the result is like having multiple physical channels dedicated to each traffic flow. It is the router's responsibility to ensure that virtual channels are properly time-multiplexed on the same link.

Virtual channels can also be used to prevent deadlocks, in case the routing protocol allows them. To achieve this, virtual channel allocation must happen in a fixed order; alternatively, an "escape virtual channel" must be provided, whose flits get routed with a deadlock-free protocol. As long as virtual channel allocation is fair, a flit will eventually be served.

In NaplesPU the number of virtual channels is given by the constant parameter VC_PER_PORT, currently set to 4, as many as the types of network messages: ''Requests'', ''Responses'', ''Forwards'', ''Service Messages''. As a given message type is associated with a specific virtual channel, the network interface must know on which virtual channel it has to inject the flits. The router, as stated before, has no knowledge of application messages; it only has to guarantee that messages do not get routed on the wrong virtual channels. For this reason, '''a flit is kept on the same virtual channel in every router along its path'''. Besides that, given the absence of flit reorder logic in the input buffers, '''flits of multiple packets cannot interleave on the same virtual channel'''. That is, once a packet has been partially sent, no other packet can be granted access to that specific virtual channel until the request is completely fulfilled.

In NaplesPU every virtual channel has a dedicated input buffer, so flow control is also implemented on a per-virtual-channel basis. This means that every router sends its neighbours information regarding the status of its buffers, for each virtual channel: these are called ''back-pressure signals''. In particular, back-pressure is implemented using a single signal, called '''ON/OFF''', which when asserted prevents other routers from sending packets to that specific virtual channel.

== Data structures ==

The network infrastructure uses a handful of simple data structures to represent information.

The main data structure is the flit, which is the exact amount of information transferred between routers.

 typedef struct packed {
     flit_type_t flit_type;
     vc_id_t vc_id;
     port_t next_hop_port;
     tile_address_t destination;
     tile_destination_t core_destination;
     logic [`PAYLOAD_W-1:0] payload;
 } flit_t;

First of all, we must know which position a flit occupies in the original packet. In particular, we are only interested in a few scenarios:

* a head flit (the first flit of a packet): it represents the start of a request; the router must decide when to grant access to the output virtual channel;
* a body flit (neither the first nor the last): the request has already been granted access to the output virtual channel, body flits must leave from that same virtual channel, and the grant must be held;
* a tail flit (the last flit of a packet): after serving this flit as a body flit, the grant can be released;
* a head-tail flit (a packet composed of a single flit): the flit gets routed as a head flit and the grant is then released.

The flit type is thus defined as follows.

 typedef enum logic [`FLIT_TYPE_W-1 : 0] {
     HEADER,
     BODY,
     TAIL,
     HT
 } flit_type_t;

The virtual channel id is a numeric identifier. As stated before, the router has no knowledge of each virtual channel's semantics, although the semantics are reported as a comment on each line.

 typedef enum logic [`VC_ID_W-1 : 0] {
     VC0, // Request
     VC1, // Response Inject
     VC2, // Fwd
     VC3  // Service VC
 } vc_id_t;

Due to the look-ahead routing technique, the next hop port field is the output port through which the flit must leave the current router. The output port at the current router is in fact calculated by the previous router on the path, as explained in [[#Routing protocol|the routing protocol paragraph]]. Each router has 5 I/O ports: one for each cardinal direction, plus the local injection/ejection port.

 typedef enum logic [`PORT_NUM_W-1 : 0 ] {
     LOCAL = 0,
     EAST  = 1,
     NORTH = 2,
     WEST  = 3,
     SOUTH = 4
 } port_t;

As tiles are organized in a mesh, they are identified by a two-dimensional coordinate.

 typedef struct packed {
     logic [`TOT_Y_NODE_W-1:0] y;
     logic [`TOT_X_NODE_W-1:0] x;
 } tile_address_t;

To support intra-tile addressing, an additional field is provided. At the moment, the devices that need to be addressed are the cache controller and the directory controller.

 typedef enum logic [`DEST_TILE_W - 1 : 0] {
     TO_DC = `DEST_TILE_W'b00001,
     TO_CC = `DEST_TILE_W'b00010
 } tile_destination_t;

== Implementation ==

Implementation details of the network components are discussed on the dedicated pages:

*[[Network router]]
*[[Network interface]]

The following sections summarize the internals of the two components.

== Network Interface ==

The network interface is the "glue" that connects all the components inside a tile that need to communicate with other tiles through the NoC. It has several interfaces with the elements inside the tile and one interface with the router. Basically, it converts a packet from the tile into flits injected into the network, and vice versa. In order to avoid deadlocks, four different virtual networks are used: request, forwarded request, response and service.

The tile-side interface communicates with the directory controller, the cache controller and the service units (boot manager, barrier core unit, synchronization manager). The units use the virtual networks as follows:

[[File:NI_VN.jpg|400px|NI virtual networks]]

The unit is divided into two parts:

* TO router, in which the vn_core2net units buffer packets and convert them into flits;
* FROM router, in which the vn_net2core units buffer flits and convert them into packets.

These two units support multicast by sending a packet in unicast k times, once per destination.

The vn_net2core units should be four, as should the vn_core2net units, but the response network is linked to both the DC and the CC at the same time. The solution is to add another vn_net2core and vn_core2net unit with the same output as the other one. Since the output of the NI provides two different output ports, an output arbiter is not needed; however, the two vn_core2net response units first have to compete between themselves, and then among all the virtual networks.

[[File:NI.png|800px|Network Interface]]

Note that packet_body_size is linked to flit_numb, but we prefer to calculate them separately (FLIT_NUM = ceil(PACKET_BODY/FLIT_PAYLOAD)).

=== vn_net2core ===

This module stores incoming flits from the network and rebuilds the original packets. It also handles back-pressure information (credit ON/OFF).

A flit is formed by a header and a body; the header has two fields: |TYPE|VCID|. VCID is set by the virtual channel ID on which the flit is sent; the virtual channel depends on the type of message. The TYPE field can be HEAD, BODY, TAIL or HT, and is used by the control units to handle the different flits.

When the control unit sees a TAIL or HT header, the packet is complete and is stored in the packet FIFO output directly connected to the cache controller.

E.g., if this flit sequence occurs:

          1st flit in => {FLIT_TYPE_HEAD, FLIT_BODY_SIZE'h20}
          2nd flit in => {FLIT_TYPE_BODY, FLIT_BODY_SIZE'h40}
          3rd flit in => {FLIT_TYPE_BODY, FLIT_BODY_SIZE'h60}
          4th flit in => {FLIT_TYPE_TAIL, FLIT_BODY_SIZE'h10}

the rebuilt packet passed to the cache controller is:

          Packet out => {FLIT_BODY_SIZE'h10, FLIT_BODY_SIZE'h60, FLIT_BODY_SIZE'h40, FLIT_BODY_SIZE'h20}

A FIFO stores the reconstructed packet. When the CC reads it, it asserts the packet_consumed bit.

The FIFO threshold is reduced by 2 because of the controller: if a sequence of consecutive 1-flit packets arrives, the ON/OFF back-pressure almost_full signal rises, as usual, on the clock edge after the threshold crossing, so the threshold must be reduced by 2 to avoid packet loss. If the packets arriving near the threshold are bigger than one flit, the enqueue is stopped with one free buffer slot left.

==== Control unit ====

Flits from the network are not stored in any FIFO: the router_valid signal is directly connected to the packet-rebuild control unit. In the control unit, all incoming flits are assembled into a packet. It checks the flit header: if it is of TAIL or HT type, the control unit stores the composed packet in the output FIFO towards the cache controller.

[[File:N2C_CU.png|800px|N2C_CU]]

=== vn_core2net ===

This module stores the original packet and converts it into flits for the network. The conversion starts by fetching the packet from an internal queue. When the requestor has to send a packet, it asserts the packed_valid bit, directly connected to the FIFO enqueue_en port. This information is used by the control unit to translate the packet into flits for each destination.

==== Control unit ====

The control unit splits the packet from the cache controller into N flits for the next router. It checks the packet_has_data field: if the packet does not contain data, the CU generates just one flit (of HT type), otherwise it generates N flits. It supports multicasting through multiple unicast messages.

A priority encoder selects, from a mask, which destination has to be served. All the fields of the header flit are filled straight away, except the flit type.

  assign packet_dest_pending = packet_destinations_valid & ~dest_served;

 rr_arbiter # (
     .NUM_REQUESTERS ( DEST_NUMB )
 )
 rr_arbiter (
     .clk        ( clk                  ),
     .reset      ( reset                ),
     .request    ( packet_dest_pending  ),
     .update_lru ( 1'b0                 ),
     .grant_oh   ( destination_grant_oh )
 );

The unit performs the multicast through k unicasts: when a destination is served (a packet is completed), the corresponding bit in the destination mask is deasserted.

  dest_served <= dest_served | destination_grant_oh;

[[File:C2N_CU.png|800px|C2N_CU]]

The unit has to know whether multicast is enabled. In that case, the signal packet_destinations_valid is a bitmap of the destinations to reach and real_dest is TILE_COUNT bits wide; otherwise the signal real_dest contains the (x,y) coordinates of the destination.

 generate
     if ( DEST_OH == "TRUE" ) begin
         assign
             real_dest.x = destination_grant_id[`TOT_X_NODE_W - 1 : 0],
             real_dest.y = destination_grant_id[`TOT_Y_NODE_W + `TOT_X_NODE_W - 1 -: `TOT_Y_NODE_W];
     end else
         assign real_dest = packet_destinations[destination_grant_id];
 endgenerate

Note: if DEST_OH is false, the core_destination signal contains the ID of the component inside the tile that will receive the packet; otherwise it is meaningless.

 assign cu_flit_out_header.core_destination = tile_destination_t'( destination_grant_oh[`DEST_TILE_W -1 : 0] );

== Router ==

The router moves data between two or more terminals, so its interface is standard: input and output flits, input and output write enables, and back-pressure signals. It is a virtual-channel flow-control, X-Y look-ahead router for a 2D-mesh topology.

The first design choice is to use input buffering only, which takes one pipeline stage. Another widely used technique is look-ahead routing, which permits the route calculation for the next node. It is also possible to merge virtual channel and switch allocation into a single stage.

Recapping, there are four logical stages, two of which (routing and allocation) work in parallel, for a total of three stages. To further reduce the pipeline depth, the crossbar and link traversal stage is not buffered, reducing the stages to two and, de facto, merging the last stage into the first one.

[[File:router.jpg|800px|router]]
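As a behavioral sketch of the X-Y routing and look-ahead scheme described above (a Python model, not the RTL; the coordinate convention with NORTH decreasing y is an assumption, and the helper names are hypothetical):

```python
# Behavioral model of XY Dimension-Order Routing with look-ahead.
# A router at `current` also computes the output port of the NEXT
# router along the path, so the receiver already knows where the
# flit must leave and can start allocation immediately.

LOCAL, EAST, NORTH, WEST, SOUTH = range(5)  # mirrors port_t

def xy_route(current, destination):
    """Classic XY DOR: exhaust the X offset first, then the Y offset."""
    cx, cy = current
    dx, dy = destination
    if dx != cx:
        return EAST if dx > cx else WEST
    if dy != cy:
        return SOUTH if dy > cy else NORTH
    return LOCAL  # flit has arrived: eject on the local port

def step(position, port):
    """Coordinates of the neighbour reached through `port`."""
    x, y = position
    return {EAST: (x + 1, y), WEST: (x - 1, y),
            NORTH: (x, y - 1), SOUTH: (x, y + 1),
            LOCAL: (x, y)}[port]

def lookahead_route(current, destination):
    """Output port that the NEXT router on the path will use."""
    next_hop = step(current, xy_route(current, destination))
    return xy_route(next_hop, destination)
```

Because `xy_route` never moves along Y before the X offset is exhausted, the model takes only the four turns allowed by DOR.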
 
 
 
=== First stage ===
 
 
 
There are five different ports (the four cardinal directions plus the local port), each with V different queues, where V is the number of virtual channels.
 
 
 
[[File:first_stage.jpg|800px|First stage Router]]
 
 
 
There are two queues: one housing all flits (FQ) and another housing only head flits (HQ). The queue lengths are equal, to account for the worst case of packets made of a single flit. Every time a valid flit enters this unit, the HQ enqueues it only if the flit type is `head' or `head-tail'. The FQ has the task of housing all the flits, while the HQ keeps track of all the packets that have entered. The HQ dequeue signal is asserted when both the allocator grant is asserted and a tail flit leaves, so the number of elements in the HQ equals the number of packets currently in this virtual channel.
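The two-queue bookkeeping described above can be sketched as a behavioral Python model (illustrative only, not the RTL; class and method names are hypothetical):

```python
from collections import deque

HEAD, BODY, TAIL, HT = "HEAD", "BODY", "TAIL", "HT"

class InputVC:
    """Input buffer of one virtual channel: FQ holds every flit,
    HQ only records head flits, so len(HQ) == buffered packets."""
    def __init__(self):
        self.fq = deque()  # all flits, in arrival order
        self.hq = deque()  # one entry (the next-hop port) per packet

    def enqueue(self, flit_type, next_hop_port):
        self.fq.append((flit_type, next_hop_port))
        if flit_type in (HEAD, HT):        # only heads enter the HQ
            self.hq.append(next_hop_port)

    def dequeue(self, sa_grant):
        """Pop one flit when the allocator grants; release the HQ
        entry only when the packet's TAIL/HT flit leaves."""
        if not sa_grant or not self.fq:
            return None
        flit = self.fq.popleft()
        if flit[0] in (TAIL, HT):
            self.hq.popleft()
        return flit

    def buffered_packets(self):
        return len(self.hq)
```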
 
 
 
 header_fifo (
     .enqueue_en ( wr_en_in & flit_in.vc_id == i & ( flit_in.flit_type == HEADER | flit_in.flit_type == HT ) ),
     .value_i    ( flit_in.next_hop_port                                                                     ),
     .dequeue_en ( ( ip_flit_in_mux[i].flit_type == TAIL | ip_flit_in_mux[i].flit_type == HT ) & sa_grant[i] ),
     ...
 );
 
 
 
This organization works only if one condition is respected: the flits of each packet must be stored consecutively and in order in the FQ. To obtain this, deterministic routing has to be used and every network interface has to send all the flits of a packet without interleaving them with flits of other packets.
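The packet-to-flit decomposition that this ordering guarantee protects can be sketched in Python (a behavioral model under the FLIT_NUM = ceil(PACKET_BODY/FLIT_PAYLOAD) rule stated earlier; the 64-bit payload width is an illustrative assumption):

```python
from math import ceil

HEAD, BODY, TAIL, HT = "HEAD", "BODY", "TAIL", "HT"

FLIT_PAYLOAD = 64  # payload bits per flit (illustrative value)

def packetize(packet_body_bits):
    """Split a packet body into (type, index) flits.
    A packet fitting in one flit travels as a single HT flit."""
    n = max(1, ceil(packet_body_bits / FLIT_PAYLOAD))  # FLIT_NUM
    if n == 1:
        return [(HT, 0)]
    types = [HEAD] + [BODY] * (n - 2) + [TAIL]
    return [(t, i) for i, t in enumerate(types)]

def reassemble(flits):
    """Rebuild a packet: complete once a TAIL or HT flit is seen.
    Works only because flits arrive consecutively and in order."""
    packet = []
    for ftype, chunk in flits:
        packet.append(chunk)
        if ftype in (TAIL, HT):
            return packet
    raise ValueError("truncated packet: no TAIL/HT flit received")
```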
 
 
 
=== Second stage ===
 
 
 
The second stage has two units working in parallel: the look-ahead routing unit and the allocation unit, linked through intermediate logic.

The allocation unit issues one grant per port. This grant signal is fed back both to the first stage and to a second-stage multiplexer, as its selection signal. The mux receives as input all the virtual channel outputs for that port and elects a single output flit, based on the selection signal. This output flit goes into the look-ahead routing unit to calculate the next-hop destination port.
 
 
 
[[File:second_stage.jpg|800px|Second stage Router]]
 
 
 
==== Allocation  ====
 
The allocation unit grants a flit the right to move towards a specific port on a specific virtual channel, handling the contention on virtual channels and crossbar ports. Each allocator is a two-stage input-first separable allocator, which requires fewer components than other allocator designs.
 
 
 
The overall unit receives as many allocation requests as there are ports. Each request asks for a destination port grant for each of its own virtual channels, so the total number of request lines is P x V x P. The allocation outputs are two per port: (1) the winning destination port, which drives the crossbar selection; (2) the winning virtual channel, which is fed back to move the proper flit to the crossbar input.
 
 
 
[[File:allocation.jpg|800px|Allocation]]
 
 
 
The allocation unit has to respect the following rules:

* packets can move only in their respective virtual channel;

* a virtual channel can request only one port at a time;

* the physical link can be interleaved with flits belonging to different flows;

* when a packet acquires a virtual channel on an output port, no other packet on a different input port can acquire that virtual channel on that output port.
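To make the P x V x P request structure concrete, here is a small behavioral model (Python, illustrative only) of an input-first separable allocator: a first arbitration stage picks one requesting virtual channel per input port, and a second stage resolves the resulting contention on each output port. Static priorities stand in for the round-robin pointers of the RTL.

```python
def input_first_separable_alloc(requests, vc_prio=0, port_prio=0):
    """requests[p][v] = requested output port (or None) for input
    port p, virtual channel v.
    Returns {input_port: (granted_vc, output_port)}."""
    P = len(requests)
    # Stage 1 (input arbitration): one requesting VC per input port.
    stage1 = {}
    for p in range(P):
        V = len(requests[p])
        for i in range(V):
            v = (vc_prio + i) % V
            if requests[p][v] is not None:
                stage1[p] = (v, requests[p][v])
                break
    # Stage 2 (output arbitration): one winner per contended output.
    grants, taken = {}, set()
    for i in range(P):
        p = (port_prio + i) % P
        if p in stage1 and stage1[p][1] not in taken:
            grants[p] = stage1[p]
            taken.add(stage1[p][1])
    return grants
```

Note how the second stage enforces the last rule above: once an input port wins an output, no other input port can acquire it in the same cycle.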
 
 
 
===== Allocator core =====
 
The virtual channel and switch allocation logic is the same for both, so it is encased in a unit called the allocator core. It is simply a parametrizable number of parallel arbiters, in which inputs and outputs are properly scrambled and the outputs are OR-ed to obtain a port-granularity grant.
 
 
 
[[File:allocatore_core.png|400px|allocatore_core]]
 
 
 
The difference from the other stages is that each arbiter is a round-robin arbiter with a grant-hold circuit. This permits uninterrupted use of the obtained resource, which is required to respect one of the rules of the VC allocation.
 
 
 
 rr_arbiter u_rr_arbiter (
     .request    ( request           ),
     .update_lru ( '{ default : '0 } ),
     .grant_oh   ( grant_arb         ),
     .*
 );

 assign grant_oh = anyhold ? hold : grant_arb;
 assign hold     = last & hold_in;
 assign anyhold  = |hold;

 always_ff @( posedge clk, posedge reset )
     if ( reset )
         last <= '0;
     else
         last <= grant_oh;
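The grant-hold behavior above can be modeled in Python (a behavioral sketch; the round-robin pointer update is simplified with respect to the RTL):

```python
class GrantHoldRRArbiter:
    """Round-robin arbiter whose grant is frozen while the previous
    winner keeps asserting hold_in (grant-hold circuit)."""
    def __init__(self, n):
        self.n = n
        self.last = None   # index granted on the previous cycle
        self.ptr = 0       # round-robin priority pointer

    def arbitrate(self, request, hold_in):
        # Hold: the previous winner keeps the grant while it asserts
        # hold_in (e.g. until its tail flit has left).
        if self.last is not None and hold_in[self.last]:
            grant = self.last
        else:
            grant = None
            for i in range(self.n):
                j = (self.ptr + i) % self.n
                if request[j]:
                    grant = j
                    self.ptr = (j + 1) % self.n  # advance pointer
                    break
        self.last = grant
        return grant
```

This is exactly the property the allocation rules demand: a newly arrived request cannot steal the link from a packet that has already been granted it.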
 
 
 
===== Virtual channel allocation =====
 
The first step of the virtual channel allocator is removed, because the hypothesis is that each virtual channel can request only one port at a time. Under this condition a first-stage arbitration is useless, so only the second stage is implemented, through an instantiation of the allocator core.
 
 
 
The use of grant-hold arbiters in the second stage prevents a packet from losing its grant when other requests arrive after the grant has been issued. The ON/OFF input signal is used to avoid sending a flit to a full virtual channel in the next node.
 
 
 
===== Switch allocation =====
 
The switch allocator receives as input the output signals from the VC allocation and all the port requests. For each port, a signal is asserted for each winning virtual channel. These winners now compete for switch allocation, and two arbitration stages are necessary. The first stage has as many round-robin arbiters as there are input ports.

Each round-robin arbiter chooses one VC per port and uses this result to select the request port associated with the winning VC. The winning request port goes to the input of the second-stage arbiter, together with the winning requests of the other ports. The second-stage arbiter is an instantiation of the allocator core, and chooses which input port can access the physical link. This signal is important for two reasons: (1) it is sent back to the round-robin unit and AND-ed with the winning VC for each port; (2) it is registered, AND-ed with the winning destination port, and used as the crossbar selection for each port.
 
 
 
==== Flit handler ====
 
A mux uses the granted_vc signal to steer one of the input flits to the output register. This flit then goes to the crossbar input port.
 
 
 
 always_comb begin
     flit_in_granted_mod[i] = flit_in_granted[i];
     if ( flit_in_granted[i].flit_type == HEADER || flit_in_granted[i].flit_type == HT )
         flit_in_granted_mod[i].next_hop_port = port_t'( lk_next_port[i] );
 end
 
 
 
==== Next hop routing calculation ====
 
The look-ahead routing unit calculates the destination port of the next node instead of the current one, because the current destination port is already available in the header flit. The algorithm is a version of X-Y deterministic routing. It is deadlock-free because it removes four of the eight possible turns: once a packet turns towards a Y direction, it cannot turn any more.
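The deadlock-freedom claim can be checked exhaustively on a small mesh: with the X-Y rule, a path never turns from the Y dimension back into the X dimension. The sketch below is a self-contained Python check (the helper names and the 4x4 mesh size are illustrative assumptions):

```python
def xy_path(src, dst):
    """Hop-by-hop XY route: move along X until aligned, then along Y."""
    path, (x, y) = [], src
    while (x, y) != dst:
        if x != dst[0]:
            x += 1 if dst[0] > x else -1
            path.append("X")
        else:
            y += 1 if dst[1] > y else -1
            path.append("Y")
    return path

def has_forbidden_turn(path):
    """A Y->X transition would re-enter the X dimension: banned by DOR."""
    return any(a == "Y" and b == "X" for a, b in zip(path, path[1:]))

# Exhaustively check every source/destination pair of a 4x4 mesh.
mesh = [(x, y) for x in range(4) for y in range(4)]
assert all(not has_forbidden_turn(xy_path(s, d)) for s in mesh for d in mesh)
```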
 

Latest revision as of 14:09, 25 June 2019
