Difference between revisions of "Network interface"

From NaplesPU Documentation

Revision as of 17:01, 30 December 2018

TODO: revalidate figure

The network interface implementation is discussed on this page.

Its role is to abstract the network communication details from the tile components. For this reason, it communicates with the tile components on one side and with the router on the other. It must know the packet format in order to break packets into flits for injection. It must also know on which virtual channel a packet should be injected or ejected.

The virtual channels used by tile components are reported below.

Virtual channel usage

It should be clear that because of the coherence protocol implemented, both the directory controller and the cache controller will need access to the Response virtual channel. This needs to be handled correctly by the network interface.

Another important feature that must be implemented is multicast addressing, since the directory controller may send a packet to multiple recipients. Because the routing protocol has no explicit multicast support, a multicast is handled as a sequence of unicast messages.

General architecture

The network interface has a fairly regular internal structure. This is because ejection and injection can be implemented as two distinct functions, allowing a modular design. Moreover, the ejection and injection logic is almost the same for every virtual channel.

For this reason, two main modules are provided, which handle the network communication for a single virtual channel:

  • virtual_network_core_to_net, which handles flit injection;
  • virtual_network_net_to_core, which handles ejection.

Both of them are parameterized to adapt easily to the different virtual channel needs. In particular the virtual channel number and the packet length must be specified.

As the router ejects flits toward the tile, only the network interface module for that specific virtual channel reassembles them into a complete request and buffers it until the corresponding component is ready to consume it. An example for virtual channel 0 is provided below.

// --- Request Virtual Network VC0 --- //
virtual_network_net_to_core # (
	.VCID             ( VC0                                   ),
	.PACKET_BODY_SIZE ( $bits ( coherence_request_message_t ) ),
	.FLIT_NUMB        ( `RESP_FLIT_NUMB                       ),
	...
	...
)
request_virtual_network_net_to_core (
	...
	//Cache Controller interface
	.vn_ntc_packet_out    ( ni_request           ),
	.vn_ntc_packet_valid  ( ni_request_valid     ),
	.core_packet_consumed ( dc_request_consumed  ),
	//Router interface
	.vn_ntc_credit        ( ni_credit[VC0]       ),
	.router_flit_valid    ( router_flit_in_valid ),
	.router_flit_in       ( router_flit_in       )
);

Note that the packet body size parameter is linked with the flit number parameter, but the module handles them separately.
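
As a rough illustration of how the two parameters relate, the flit count can be derived as the packet body size divided by the flit payload width, rounded up. The sketch below uses hypothetical values for FLIT_BODY_SIZE and PACKET_BODY_SIZE; neither value, nor the module name, is taken from the NaplesPU sources.

```systemverilog
// Sketch only: both sizes below are assumed values chosen for illustration.
module flit_numb_sketch;
	localparam int FLIT_BODY_SIZE   = 64;   // assumed flit payload width, in bits
	localparam int PACKET_BODY_SIZE = 256;  // assumed packet body width, in bits

	// Ceiling division: a partially filled last flit still counts as one flit.
	localparam int FLIT_NUMB = ( PACKET_BODY_SIZE + FLIT_BODY_SIZE - 1 ) / FLIT_BODY_SIZE;

	initial $display ( "FLIT_NUMB = %0d", FLIT_NUMB );
endmodule
```

With these assumed values the packet needs four flits. Since the module takes the two parameters separately, keeping them consistent is left to whoever instantiates it.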

On the other side, the injection logic buffers outgoing packets, splits them into flits, and competes with the other virtual channels for access to the single router local port. A round-robin arbiter with grant-and-hold circuitry grants access to the local injection port. The granted virtual channel index drives the selector of a multiplexer, which sends the right flit to the router.

assign vno_requests =
	{vn_packet_pending[ VC3 ] & ~router_credit[VC3],
		vn_packet_pending[ VC2 ] & ~router_credit[VC2],
		vn_packet_pending[ VC1 ] & ~router_credit[VC1],
		vn_packet_pending[ VC0 ] & ~router_credit[VC0]};

rr_arbiter # (
	.NUM_REQUESTERS ( `VC_PER_PORT )
)
ni_request_rr_arbiter (
	.clk        ( clk          ),
	.reset      ( reset        ),
	.request    ( vno_requests ),
	.update_lru ( 1'b1         ),
	.grant_oh   ( vno_granted  )
) ;

oh_to_idx # (
	.NUM_SIGNALS ( `VC_PER_PORT ),
	.DIRECTION   ( "LSB0"       )
)
ni_request_grant_oh_to_idx (
	.one_hot ( vno_granted    ),
	.index   ( vco_granted_id )
);

assign ni_flit_out    = vn_flit_out[vco_granted_id],
	ni_flit_out_valid = vn_flit_valid[vco_granted_id];

Special care must be given to the response virtual channel. Two injection and two ejection modules are instantiated for this virtual channel, interfacing with the cache controller and the directory controller respectively.

The ejection module takes a specific parameter that tells it whether it interfaces with the directory controller or the cache controller. When this parameter is set, the module checks the core_destination field of the flit (see flit structure) to decide whether to ignore it.

The injection modules are not aware of who is using them. For this reason, both compete for access to the virtual channel (and the winner then competes with the other virtual channels for the router input port). A round-robin arbiter with grant-and-hold circuitry is used. This ensures that once one of the controllers has gained access to the virtual channel, it retains it until the full request has been sent.

// each bit is high respectively if the Cache Controller or the Directory Controller wants to inject a packet
assign pending_tmp = {response_in[CC_ID].vn_packet_pending, response_in[DC_ID].vn_packet_pending};

// the arbiter chooses among the two of them
grant_hold_rr_arbiter #(
	.NUM_REQUESTERS( 2 )
)
response_vn_rr_arbiter (
	.clk      ( clk                  ),
	.reset    ( reset                ),
	.request  ( pending_tmp          ),
	.hold_in  ( pending_tmp          ),
	.grant_oh ( response_vn_grant_oh )
);

// the arbitration result is used to select the winning flit, and the signals are updated accordingly
assign vn_packet_pending[ VC1 ] = |response_vn_grant_oh ;
assign vn_flit_out[VC1]         = response_vn_grant_oh[0]? response_in[DC_ID].vn_flit_out   : response_in[CC_ID].vn_flit_out;
assign vn_flit_valid[VC1]       = response_vn_grant_oh[0]? response_in[DC_ID].vn_flit_valid : response_in[CC_ID].vn_flit_valid;
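
The grant-and-hold behaviour can be sketched for the two-requester case as follows. The port names mirror the instance above, but the body is an assumption made for illustration and is not the NaplesPU grant_hold_rr_arbiter implementation; the module name and the internal signals (last_grant, priority_bit, rr_grant) are hypothetical.

```systemverilog
// Sketch only: a two-requester round-robin arbiter with grant-and-hold.
module grant_hold_rr_arbiter_sketch (
	input  logic       clk,
	input  logic       reset,
	input  logic [1:0] request,
	input  logic [1:0] hold_in,
	output logic [1:0] grant_oh
);
	logic [1:0] last_grant;   // grant issued in the previous cycle
	logic       priority_bit; // index of the requester that wins a tie
	logic [1:0] rr_grant;     // plain round-robin choice, ignoring hold

	// Plain round-robin between the two requesters.
	always_comb begin
		case ( request )
			2'b01:   rr_grant = 2'b01;
			2'b10:   rr_grant = 2'b10;
			2'b11:   rr_grant = priority_bit ? 2'b10 : 2'b01;
			default: rr_grant = 2'b00;
		endcase
	end

	// Hold: keep the previous grant while its owner still asserts hold_in.
	assign grant_oh = ( |( last_grant & hold_in ) ) ? last_grant : rr_grant;

	always_ff @( posedge clk ) begin
		if ( reset ) begin
			last_grant   <= 2'b00;
			priority_bit <= 1'b0;
		end else begin
			last_grant <= grant_oh;
			// After a grant, the other requester gets priority on the next tie.
			if ( grant_oh[0] )
				priority_bit <= 1'b1;
			else if ( grant_oh[1] )
				priority_bit <= 1'b0;
		end
	end
endmodule
```

When hold_in is driven by the same pending signals as request, as in the instance above, the winner keeps the grant for as long as its packet is pending, so flits of two different packets are never interleaved on the virtual channel.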

Network Interface

Network to core module

This module is composed mainly of two parts: a control unit which handles the incoming flits, rebuilding them as a packet; and a queue of rebuilt packets.

Control unit

The control unit consists of simple logic that drives the registers holding the intermediate results of the rebuilding process.

It keeps track of the count of flits received until now, and uses this count to know where the new incoming flits should be placed in the rebuilt packet.

logic       [$clog2( FLIT_NUMB ) - 1 : 0] count;

flit_body_t [FLIT_NUMB - 1 : 0]           rebuilt_packet;

...

if (router_flit_valid) begin
	rebuilt_packet[count] <= router_flit_in.payload;
	
	if (router_flit_in.flit_type == TAIL || router_flit_in.flit_type == HT) begin
		count <= '{default: '0};
		cu_packet_rebuilt_compl <= 1'b1;
	end else
		count <= count + 1;
...

Note that the control unit also detects whether the flits are meant for the cache controller or for the directory controller. This is necessary because the same response virtual channel carries packets for both of them, as explained above.

	if (router_flit_in.flit_type == HEADER || router_flit_in.flit_type == HT) begin
		cu_is_for_cc <= router_flit_in.core_destination == TO_CC;
		cu_is_for_dc <= router_flit_in.core_destination == TO_DC;
	end 

Rebuilt packets queue

Rebuilt packets are stored in a FIFO, so we can enqueue multiple requests. When the receiver component is ready to handle the request, it will assert the core_packet_consumed signal, de facto freeing one buffer slot.

The back-pressure signal is raised when only two free buffer slots remain. This covers the worst case, a sequence of incoming 1-flit packets, some of which are still in the pipeline stages and must not be lost. There are two pipeline stages in between: the router crossbar and the control unit of this module.

sync_fifo #(
	.WIDTH                 ( PACKET_BODY_SIZE     ),
	.SIZE                  ( PACKET_FIFO_SIZE     ),
	.ALMOST_FULL_THRESHOLD ( PACKET_FIFO_SIZE - 2 ) 
)
rebuilt_packet_fifo (
	...
	.almost_full ( packet_alm_fifo_full      ),
	.enqueue_en  ( enqueue_en                ),
	.value_i     ( cu_rebuilt_packet         ),
	.empty       ( rebuilt_packet_fifo_empty ),
	.almost_empty(                           ),
	.dequeue_en  ( core_packet_consumed      ),
	.value_o     ( vn_ntc_packet_out         )
);

assign vn_ntc_credit       = packet_alm_fifo_full;
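
The effect of the threshold can be sketched with a simple occupancy counter. This is a minimal model under stated assumptions, not the NaplesPU sync_fifo: the module name, the default depth of 8, and the choice to assert almost_full when the occupancy reaches the threshold are all hypothetical.

```systemverilog
// Sketch only: models the almost-full flag, not the FIFO storage itself.
module almost_full_sketch #(
	parameter int SIZE                  = 8,        // assumed FIFO depth
	parameter int ALMOST_FULL_THRESHOLD = SIZE - 2  // leave two slots for in-flight flits
) (
	input  logic clk,
	input  logic reset,
	input  logic enqueue_en,
	input  logic dequeue_en,
	output logic almost_full
);
	// Number of packets currently stored, from 0 to SIZE.
	logic [$clog2 ( SIZE + 1 ) - 1 : 0] occupancy;

	always_ff @( posedge clk ) begin
		if ( reset )
			occupancy <= '0;
		else
			// Overflow and underflow checks are omitted for brevity.
			occupancy <= occupancy + enqueue_en - dequeue_en;
	end

	// Asserted while at most two slots are free (assumed comparison).
	assign almost_full = occupancy >= ALMOST_FULL_THRESHOLD;
endmodule
```

With a depth of 8, the flag rises at 6 stored packets, so the two packets that may still be in the router crossbar and in the control unit can be accepted without loss.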

The enqueue signal is generated from the virtual channel ID of the incoming flit. For the reason explained above, the only exceptions are the cache and directory controllers. In that case, the module also checks whether the flit is meant for the cache or the directory controller, and enqueues the packet according to the TYPE parameter.

generate
	if ( TYPE == "CC" )
		assign enqueue_en = cu_packet_rebuilt_compl & cu_is_for_cc;
	else if ( TYPE == "DC" )
		assign enqueue_en = cu_packet_rebuilt_compl & cu_is_for_dc;
	else
		assign enqueue_en = cu_packet_rebuilt_compl;
endgenerate

Example

If the incoming packet arrives as this sequence of flits:

1st Flit = {FLIT_TYPE_HEAD, FLIT_BODY_SIZE'h20}
2nd Flit = {FLIT_TYPE_BODY, FLIT_BODY_SIZE'h40}
3rd Flit = {FLIT_TYPE_BODY, FLIT_BODY_SIZE'h60}
4th Flit = {FLIT_TYPE_TAIL, FLIT_BODY_SIZE'h10};

The rebuilt packet passed to the Cache Controller is:

Packet = {FLIT_BODY_SIZE'h10, FLIT_BODY_SIZE'h60, FLIT_BODY_SIZE'h40, FLIT_BODY_SIZE'h20}
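
The ordering in this example follows from indexing the rebuilt packet with the flit count: the first flit lands in the least significant slot. A minimal simulation sketch, assuming an 8-bit flit payload to keep the values short (the module name, the PAYLOADS array and the width are hypothetical):

```systemverilog
// Sketch only: reproduces the flit placement of the example above.
module rebuild_sketch;
	localparam int FLIT_NUMB      = 4;
	localparam int FLIT_BODY_SIZE = 8; // assumed width, chosen for readability

	// Payloads in arrival order: head, body, body, tail.
	localparam logic [FLIT_BODY_SIZE - 1 : 0] PAYLOADS [FLIT_NUMB] = '{8'h20, 8'h40, 8'h60, 8'h10};

	// Slot 0 is the least significant chunk of the packed array.
	logic [FLIT_NUMB - 1 : 0][FLIT_BODY_SIZE - 1 : 0] rebuilt_packet;

	initial begin
		for ( int count = 0; count < FLIT_NUMB; count++ )
			rebuilt_packet[count] = PAYLOADS[count];

		$display ( "%h", rebuilt_packet ); // prints 10604020
	end
endmodule
```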

Core to network module

This module should split a packet into flits, and send them to the router local port. It should also support multicast, implemented as multiple unicast messages. It is composed of two parts: a packet queue, and a control unit which handles the outgoing flits.

Request queue

Incoming requests are enqueued in a FIFO until the control unit handles them. It also provides stop signals to the device that generates the requests, in case the FIFO fills up.

The structure actually enqueued is composed of the packet body along with all the recipients of the message.

typedef struct packed {
	logic [PACKET_BODY_SIZE - 1 : 0] packet_body;
	logic packet_has_data;
	tile_address_t [DEST_NUMB - 1 : 0 ] packet_destinations;
	logic [DEST_NUMB - 1 : 0 ] packet_destinations_valid;
} packet_information_t;

A request is enqueued when the requester asserts the packet_valid signal, and the head of the FIFO is dequeued when the control unit notifies completion.

sync_fifo # (
	.WIDTH                 ( $bits ( packet_information_t ) ),
	.SIZE                  ( PACKET_FIFO_SIZE               ),
	.ALMOST_FULL_THRESHOLD ( PACKET_ALMOST_FULL_THRESHOLD   )
)
packet_in_fifo (
	...
	.almost_full  ( vn_packet_fifo_full    ),
	.enqueue_en   ( packet_valid           ),
	.value_i      ( packet_information_in  ),
	.empty        ( packet_fifo_empty      ),
	.almost_empty (                        ),
	.dequeue_en   ( cu_packet_dequeue      ),
	.value_o      ( packet_information_out )
) ;

Request signals are generated, which allow this module to compete for router port access.

assign packet_pending                               = ~packet_fifo_empty;

Control unit

The control unit's responsibilities are:

  • to split a packet into flits;
  • to determine the type (head, body, tail, head-tail) of each flit;
  • to pre-route the outgoing flits, as routers implement routing look-ahead (see Network);
  • to properly handle multicast messages, if required;
  • to account for router back-pressure signals.

module control_unit_packet_to_flit # (
	parameter DEST_OH          = "TRUE",
	...
	parameter PACKET_BODY_SIZE = 256,
	parameter DEST_NUMB        = 4 )
(
	...
	input  logic                                                packet_valid,
	input  logic                                                packet_has_data,
	input  tile_address_t [DEST_NUMB - 1 : 0]                   packet_destinations,
	input  logic          [DEST_NUMB - 1 : 0]                   packet_destinations_valid,

	input  logic                                                flit_credit,

	output logic          [$clog2 ( PACKET_BODY_SIZE ) - 1 : 0] cu_packet_chunck_sel,
	output logic                                                cu_flit_valid,
	output flit_header_t                                        cu_flit_out_header,
	output logic                                                cu_packet_dequeue

);

A request is considered fulfilled when a unicast message has been sent to all its recipients. Recipients are served in a round-robin fashion.

rr_arbiter # (
	.NUM_REQUESTERS ( DEST_NUMB )
)
rr_arbiter (
	.clk        ( clk                  ) ,
	.reset      ( reset                ) ,
	.request    ( packet_dest_pending  ) ,
	.update_lru ( 1'b0                 ) ,
	.grant_oh   ( destination_grant_oh )
);

Recipients that have already been served are recorded in a bit mask, which is updated with the grant signal after each transmission.

dest_served <= dest_served | destination_grant_oh;

Using this bit mask we can know if there is any remaining recipient.

assign packet_dest_pending                 = packet_destinations_valid & ~dest_served;
assign packet_has_dest_pending             = |packet_dest_pending;
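
Put together, the bookkeeping for one multicast packet might look like the sketch below. The unicast_sent and packet_done inputs are hypothetical names introduced here to mark when the mask is updated and cleared; the actual control unit may derive these events differently.

```systemverilog
// Sketch only: tracks which recipients of a multicast packet are still pending.
module multicast_mask_sketch #(
	parameter int DEST_NUMB = 4
) (
	input  logic                     clk,
	input  logic                     reset,
	input  logic                     unicast_sent, // hypothetical: one unicast packet fully sent
	input  logic                     packet_done,  // hypothetical: packet dequeued, clear the mask
	input  logic [DEST_NUMB - 1 : 0] packet_destinations_valid,
	input  logic [DEST_NUMB - 1 : 0] destination_grant_oh,
	output logic [DEST_NUMB - 1 : 0] packet_dest_pending,
	output logic                     packet_has_dest_pending
);
	logic [DEST_NUMB - 1 : 0] dest_served;

	always_ff @( posedge clk ) begin
		if ( reset || packet_done )
			dest_served <= '0;                                  // start fresh for the next packet
		else if ( unicast_sent )
			dest_served <= dest_served | destination_grant_oh;  // mark the granted recipient
	end

	// A recipient is pending if it is valid and has not been served yet.
	assign packet_dest_pending     = packet_destinations_valid & ~dest_served;
	assign packet_has_dest_pending = |packet_dest_pending;
endmodule
```

Once packet_has_dest_pending goes low, every valid recipient has received its unicast copy and the packet can be dequeued.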

Routing is done for each recipient, based on the destination address.

The DEST_OH parameter determines how multicast addresses are generated. If DEST_OH is true, the module treats the input signal packet_destination as a one-hot encoded bit mask in which every tile has a corresponding position. Otherwise, multicast addresses are passed to the module through the packet_destinations input signal, and packet_destinations_valid indicates which positions of the array contain a valid address.

generate
	if ( DEST_OH == "TRUE" ) begin
		assign
			real_dest.x  = destination_grant_id[`TOT_X_NODE_W - 1 : 0 ],
			real_dest.y  = destination_grant_id[`TOT_Y_NODE_W + `TOT_X_NODE_W - 1 : `TOT_X_NODE_W];
	end else begin
		assign real_dest = packet_destinations[destination_grant_id];
	end
endgenerate
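
In the DEST_OH case, the granted bit position is simply split into its x and y fields, which implies that a tile's position in the mask is its y coordinate concatenated with its x coordinate. A small sketch of that slicing, assuming a 4x4 mesh (the widths X_W and Y_W stand in for `TOT_X_NODE_W and `TOT_Y_NODE_W and are hypothetical values):

```systemverilog
// Sketch only: splits a granted mask position into mesh coordinates.
module dest_id_sketch;
	localparam int X_W = 2; // assumed width of the x coordinate (4 columns)
	localparam int Y_W = 2; // assumed width of the y coordinate (4 rows)

	logic [X_W + Y_W - 1 : 0] destination_grant_id;
	logic [X_W - 1 : 0]       x;
	logic [Y_W - 1 : 0]       y;

	assign x = destination_grant_id[X_W - 1 : 0];        // low bits: column
	assign y = destination_grant_id[Y_W + X_W - 1 : X_W]; // high bits: row

	initial begin
		destination_grant_id = 4'd6; // position 6 of the mask was granted
		#1 $display ( "x = %0d, y = %0d", x, y ); // prints x = 2, y = 1
	end
endmodule
```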

A finite state machine handles the remaining actions: it keeps count of how many flits have been sent so far and selects which chunk of the current packet becomes the payload of the next flit.
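
How the flit type can follow from the flit count is sketched below. The enumeration encoding, the package name and the function name are assumptions; only the four type names (head, body, tail, head-tail) come from this page.

```systemverilog
// Sketch only: classifies a flit by its position within the packet.
package flit_type_sketch_pkg;
	// Assumed encoding; the actual NaplesPU flit type definition may differ.
	typedef enum logic [1:0] { HEADER, BODY, TAIL, HT } flit_type_t;

	// idx is the 0-based index of the flit, flit_numb the flits per packet.
	function automatic flit_type_t flit_type_of ( int idx, int flit_numb );
		if ( flit_numb == 1 )
			return HT;      // single-flit packet: head and tail at once
		else if ( idx == 0 )
			return HEADER;  // first flit of a multi-flit packet
		else if ( idx == flit_numb - 1 )
			return TAIL;    // last flit closes the packet
		else
			return BODY;    // everything in between
	endfunction
endpackage
```

For a four-flit packet this yields HEADER, BODY, BODY, TAIL for indices 0 to 3, which matches the flit sequence used in the rebuilding example of the network to core module.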