This is the 6th and final part of my Hello AFU tutorial. In the last post, I started building out a state machine for the AFU and read from the data structure that the WED points to. In this post, I’ll finish off the state machine: pulling down the data in our stripes, XORing them together, and writing the result back to userland.
Reading the Stripes
Since the largest memory size I can request via the PSL is 128 bytes, I’ll make requests for that amount. I need a scratch pad for this data, so I’ll add two 1024-bit internal registers for these chunks of data. I’ll also need a way to know when I’ve received both chunks, so I’ll set up a small register for that as well.
logic [0:1023] stripe1_data;
logic [0:1023] stripe2_data;
logic stripe_received;
In the REQUEST_STRIPES state I’ll request data from stripe1 in one cycle, then stripe2 in the next, using the command’s tag to know where I am in that process. I’ll also set stripe_received to 0 to indicate I’ve not yet received either.
REQUEST_STRIPES: begin
  command_out.valid <= 1;
  command_out.size <= 128;
  command_out.command <= READ_CL_NA;
  if (command_out.tag == REQUEST_READ) begin
    // First cycle: request stripe1
    command_out.tag <= STRIPE1_READ;
    command_out.address <= request.stripe1;
  end else begin
    // Second cycle: request stripe2, then wait for the data
    command_out.tag <= STRIPE2_READ;
    command_out.address <= request.stripe2;
    current_state <= WAITING_FOR_STRIPES;
    stripe_received <= 0;
  end
end
With the requests for stripe data sent, I need to wait for the data to come back. This could happen in any order, so I need to be ready for either.
WAITING_FOR_STRIPES: begin
  command_out.valid <= 0;
  if (buffer_in.write_valid) begin
    case (buffer_in.write_tag)
      STRIPE1_READ: begin
        // Address 0 carries the first half, address 1 the second half
        if (buffer_in.write_address == 0) begin
          stripe1_data[0:511] <= buffer_in.write_data;
        end else begin
          stripe1_data[512:1023] <= buffer_in.write_data;
        end
      end
      STRIPE2_READ: begin
        if (buffer_in.write_address == 0) begin
          stripe2_data[0:511] <= buffer_in.write_data;
        end else begin
          stripe2_data[512:1023] <= buffer_in.write_data;
        end
      end
    endcase
  end
end
In the same state, I’ll look for the tags to come in over the response interface. On the first response I set the stripe_received register; on the second, the state progresses to WRITE_PARITY.
if (response.valid) begin
  if (response.tag == STRIPE1_READ || response.tag == STRIPE2_READ) begin
    if (stripe_received) begin
      current_state <= WRITE_PARITY;
    end else begin
      stripe_received <= 1;
    end
  end
end
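To make the any-order handling easier to reason about, here is a small Python model of this state. It is only a software sketch, not part of the AFU: the StripeCollector class, its method names and the numeric tag values are assumptions made up for illustration.

```python
STRIPE1_READ = 1  # hypothetical tag values, chosen only for this model
STRIPE2_READ = 2


class StripeCollector:
    """Software model of the WAITING_FOR_STRIPES state (illustrative only)."""

    def __init__(self):
        # Two 128-byte scratch pads, one per stripe
        self.data = {STRIPE1_READ: bytearray(128), STRIPE2_READ: bytearray(128)}
        self.stripe_received = False
        self.state = "WAITING_FOR_STRIPES"

    def buffer_write(self, tag, address, chunk):
        """Model one 64-byte beat arriving on the buffer interface."""
        if len(chunk) != 64 or address not in (0, 1):
            raise ValueError("expected a 64-byte chunk at address 0 or 1")
        start = 64 * address
        self.data[tag][start:start + 64] = chunk

    def response(self, tag):
        """Model a response; the second stripe response advances the state."""
        if tag in (STRIPE1_READ, STRIPE2_READ):
            if self.stripe_received:
                self.state = "WRITE_PARITY"
            else:
                self.stripe_received = True


collector = StripeCollector()
# Deliver stripe2 before stripe1, and its halves out of order
collector.buffer_write(STRIPE2_READ, 1, b"\x22" * 64)
collector.buffer_write(STRIPE2_READ, 0, b"\x02" * 64)
collector.response(STRIPE2_READ)
collector.buffer_write(STRIPE1_READ, 0, b"\x01" * 64)
collector.buffer_write(STRIPE1_READ, 1, b"\x11" * 64)
collector.response(STRIPE1_READ)
print(collector.state)  # WRITE_PARITY
```

Because the tag picks the scratch pad and the address picks the half, the arrival order has no effect on where the data lands.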
Where is this Parity?
I decided to compute the parity of the stripes via assign, by creating one new internal variable. parity_data can be referenced for the XOR’d value of stripe1_data and stripe2_data.
logic [0:1023] parity_data;
assign parity_data = stripe1_data ^ stripe2_data;
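For anyone who wants to check the result from the software side, the same XOR can be sketched in Python. This is a stand-alone illustration, and the xor_parity helper and the sample data are made up for the example. It also shows why XOR parity is useful: XORing the parity with one stripe gives back the other.

```python
def xor_parity(stripe1: bytes, stripe2: bytes) -> bytes:
    """Byte-wise XOR of two equally sized stripes."""
    if len(stripe1) != len(stripe2):
        raise ValueError("stripes must be the same length")
    return bytes(a ^ b for a, b in zip(stripe1, stripe2))


# Two arbitrary 128-byte stripes (sample data for this sketch only)
stripe1 = bytes(range(128))
stripe2 = bytes((i * 7) % 256 for i in range(128))

parity = xor_parity(stripe1, stripe2)

# XOR is its own inverse, so a lost stripe can be rebuilt from the other two
recovered = xor_parity(parity, stripe2)
print(recovered == stripe1)  # True
```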
Since I set the buffer latency to 1, the data being put on the buffer for writing to memory needs to be shifted back a cycle.
logic [0:511] write_buffer;

shift_register #(512) write_shift (
  .clock(clock),
  .in(write_buffer),
  .out(buffer_out.read_data));
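The effect of that delay can be pictured with a tiny Python model. It assumes the shift_register module is a single-stage delay, so its output always shows whatever was on its input one clock earlier. The delay_one_cycle function is a hypothetical helper for illustration, not a model of the real module’s internals.

```python
def delay_one_cycle(inputs, reset_value=None):
    """Return what a one-stage shift register shows on its output each cycle."""
    outputs = []
    stored = reset_value
    for value in inputs:
        outputs.append(stored)  # output visible during this cycle
        stored = value          # input captured at the end of the cycle
    return outputs


# Each value shows up on the output one cycle after it was presented
print(delay_one_cycle(["low half", "high half", "idle"]))
# [None, 'low half', 'high half']
```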
Now I need to write the parity data to the memory at request.parity. This is pretty similar to reading memory. I’ll send a WRITE_NA command to write the cacheline and align my data with buffer_out.read_data, returning the first half for address 0 and the second half for address 1.
WRITE_PARITY: begin
  if (command_out.tag != PARITY_WRITE) begin
    command_out.command <= WRITE_NA;
    command_out.address <= request.parity;
    command_out.tag <= PARITY_WRITE;
    command_out.valid <= 1;
  end else begin
    command_out.valid <= 0;
    // Read half depending on address
    if (buffer_in.read_address == 0) begin
      write_buffer <= parity_data[0:511];
    end else begin
      write_buffer <= parity_data[512:1023];
    end
    // Handle response
    if (response.valid && response.tag == PARITY_WRITE) begin
      current_state <= DONE;
    end
  end
end
After the parity is written, the job is complete. The state progresses to DONE when the write comes back on the response interface.
Writing the done flag is a little trickier, since it is not on a 128-byte or 64-byte alignment. The PSL can handle writing to any address, but the data must be aligned within the 128-byte read bus. If the data you’re writing is 64 bytes or less, you can let the same data sit on the buffer interface for both addresses.
In this case, the done field is 32 bytes past the WED, and I’m doing a 1-byte write. I’ll align my data starting at the 256th bit, writing 8 bits. I’ll write a 1 in the first byte to set the little-endian unsigned 64-bit number to a non-zero value.
DONE: begin
  if (command_out.tag != DONE_WRITE) begin
    command_out.tag <= DONE_WRITE;
    command_out.size <= 1;
    command_out.address <= wed + 32;
    command_out.valid <= 1;
    // done sits 32 bytes into the line, so its first byte is bits 256 to 263
    write_buffer[256:263] <= 1;
  end else begin
    command_out.valid <= 0;
  end
end
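The alignment arithmetic is easy to get wrong, so here it is worked through in Python. This is a sketch under the assumptions described above: a 128-byte line delivered as two 64-byte halves, with byte N of a half sitting at bits 8N to 8N+7. The buffer_position helper is hypothetical, and the WED address is the one from my simulation run.

```python
CACHELINE_BYTES = 128  # size of one PSL transfer
HALF_BYTES = 64        # bytes presented per buffer address (512 bits)


def buffer_position(address: int):
    """Return (buffer address, first bit) for a byte at the given address."""
    offset = address % CACHELINE_BYTES        # position inside the 128-byte line
    half = offset // HALF_BYTES               # buffer address 0 or 1
    first_bit = (offset % HALF_BYTES) * 8     # bit index inside that half
    return half, first_bit


wed = 0x7fa500  # WED address taken from the simulation output
half, first_bit = buffer_position(wed + 32)
print(half, first_bit)  # 0 256
```

A 1-byte write at wed + 32 therefore uses 8 bits starting at bit 256 of the half at buffer address 0.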
With that, the parity is written and the userspace application can see when it completes. Here’s the output from the userspace application:
INFO:Connecting to host 'localhost' port 16384
[example structure
  example: 0x7fa500
  example->size: 128
  example->stripe1: 0x7fa600
  example->stripe2: 0x7fa780
  example->parity: 0x7fa880
  &(example->done): 0x7fa520
Attached to AFU
Waiting for completion by AFU
  done: 0
  done: 0
  done: 1
PARITY:
  That is some proper parity! This is exactly what I'm expecting to see. I'd also like to see this running on some real gear soon
Releasing AFU
That completes the basic function of this AFU, so I’ll commit my changes here.
Now I’ll extend the design to support buffers larger than 128 bytes. This just requires an offset variable that keeps track of the current position relative to the total size of the buffer to generate parity for.
I’ll start by adding a new variable for the offset that matches the data type of size.
longint unsigned offset;
Then I’ll set it to 0 in the
offset <= 0;
In the REQUEST_STRIPES state I’ll add the offset to the stripe pointers.
command_out.address <= request.stripe1 + offset;
In the WRITE_PARITY state I’ll add the offset to the parity pointer, and check to see if the operation is complete.
command_out.address <= request.parity + offset;
if (offset + 128 < request.size) begin
  offset <= offset + 128;
  current_state <= REQUEST_STRIPES;
end else begin
  current_state <= DONE;
end
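To double-check the loop condition, the same offset walk can be modelled in Python. This is a software sketch only: chunk_offsets is a hypothetical helper, and it assumes the buffer size is a multiple of 128 bytes, since the AFU always transfers whole 128-byte chunks.

```python
CHUNK = 128  # bytes handled per pass through the state machine


def chunk_offsets(size: int):
    """Return the offsets visited for a buffer of `size` bytes."""
    offsets = []
    offset = 0
    while True:
        offsets.append(offset)          # REQUEST_STRIPES .. WRITE_PARITY at this offset
        if offset + CHUNK < size:
            offset += CHUNK             # more data left: loop back for the next chunk
        else:
            break                       # last chunk written: move on to DONE
    return offsets


print(chunk_offsets(128))  # [0]
print(chunk_offsets(512))  # [0, 128, 256, 384]
```

The comparison uses offset + 128 rather than offset, so the final chunk is processed exactly once before the state machine moves to DONE.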
With that I’d say this AFU is good enough for this tutorial. I’ll commit my changes and welcome pull requests if you find improvements to this tutorial. Hope this helps you hack on CAPI!