Peeking into the Alpha-Data KU3

At this point in my digital design adventure, I wanted to get my feet wet learning how to debug on live hardware. My goal was simple: I wanted to be able to interactively write to and read from a single register in an implemented design running on the FPGA. This post documents the process that can be followed to replicate my end results with Vivado and an Alpha-Data KU3 card.

Starting fresh

I like to start learning new things with a fairly bare-bones approach. So my first step here is to create a new project in Vivado. I named my project ku3-peeking, chose RTL Project for my project type and picked the FPGA part that’s on the KU3, xcku060-ffva1156-2-e.

New Project



Adding some source

Next I’ll add a couple SystemVerilog files. One for my top level source and one for the module that I’ll be peeking and poking at through the debugger.

My test module is pretty simple, just a positive edge triggered register.

`timescale 1ns / 1ps

module testmodule (
  input clk,
  input data_in,
  output logic data_out
);

  always_ff @ (posedge clk) begin
    data_out <= data_in;
  end

endmodule

My top module initially will just instantiate my testmodule and a few wires to eventually hook everything up.

`timescale 1ns / 1ps

module top ();

  wire clk;
  wire data_in;
  wire data_out;

  testmodule my_test (
    .clk(clk),
    .data_in(data_in),
    .data_out(data_out)
  );

endmodule

By default Vivado picks the top of the design hierarchy automatically, which here is my top module, so this is good for now.

Adding a debug core

With the basic design in place, you can pull up the IP Catalog and select from a few different debug cores. For my goal, the VIO core is nice as it’ll let me read and drive signals.

VIO in catalog

Double clicking on the VIO core will bring up a prompt of options that can be set for the debug core.

VIO Customization Wizard

All of these defaults are fine for my case of a 1 bit register, but you can see you can easily add additional input and output probes. After hitting OK here you’ll get a prompt for starting an out-of-context synthesis run, which will synthesize the VIO core while you continue to work in Vivado.

Within the IP Sources tab, you can drill down into the hierarchy and find templates to instantiate this new core in your design files.

VIO Instance Template

So I’ll modify my top file, to add a new block that wires the VIO core to the clock and my testmodule instance.

  vio_0 my_test_vio (
    .clk(clk),
    .probe_in0(data_out),
    .probe_out0(data_in)
  );

Clocking in

For me, pulling in the clock for a legit hardware design was the most difficult part to figure out. I spelunked documentation and did much Googling, but eventually I reached out to some more experienced folks that helped lead me in the right direction.

The first step in this process is to check the user’s manual for the device at hand. In this case I’m looking to use the Fabric Clocks described in section 3.2.2 of Alpha Data’s ADM-PCIE-KU3 User Manual. The relevant bits here say there are 2 available fabric clocks, a 200MHz and a 250MHz clock. It tells me that these pins use the LVDS I/O standard and which pins are available for each clock. It also notes that I must set a constraint that sets DIFF_TERM_ADV to TERM_100, as these clocks are required to be terminated within the FPGA. I don’t fully comprehend what that means quite yet but the doc tells me to do it so I oblige.

To use the LVDS structures built into the FPGA, I need to instantiate a module that can take the LVDS input signals and provide a simple clock output. The module for this is called a Differential Input Buffer; it can be found in Vivado’s Language Templates window.

Differential Input Buffer Template

I’ll copy the template into my top module file and clean it up. I’ll also add clk_p and clk_n as inputs to the top module so I can route them into the IBUFDS instance.

module top (
  input clk_p,
  input clk_n
);

  wire clk;
  wire data_in;
  wire data_out;

  IBUFDS #(
    .DQS_BIAS("FALSE")
  ) IBUFDS_inst (
    .O(clk),
    .I(clk_p),
    .IB(clk_n)
   );

Since I’m using this clock to drive multiple blocks, it’s best practice to use a General Clock Buffer (BUFG) so that Vivado will choose an appropriate buffer for this FPGA and minimize clock skew. This is found in the Templates as well, under Verilog->Device Primitive Instantiation->Kintex UltraScale->CLOCK->BUFFER->General Clock Buffer (BUFG).

I’ll add this to my top module as well, routing the output of my IBUFDS buffer to the BUFG. The output of BUFG will be the clock signal used by my module and by the VIO core.

module top (
  input clk_p,
  input clk_n
);

  wire clk_int;
  wire clk;
  wire data_in;
  wire data_out;

  IBUFDS #(
    .DQS_BIAS("FALSE")
  ) IBUFDS_inst (
    .O(clk_int),
    .I(clk_p),
    .IB(clk_n)
   );

  BUFG BUFG_inst (
    .O(clk),
    .I(clk_int)
   );

At this point I can run Synthesis and inspect the design. Here’s the schematic view that shows the big picture.

schematic

Selecting the clock pins

From the schematic view I can click on the clk_p pin and it’ll select that pin from the I/O Ports view below the schematic in the IDE. From there, I can set the Site, I/O Std and DIFF_TERM_ADV as described in the KU3 User Manual. In this case I’m choosing to use the 250MHz Fabric Clock.

IO Port Planning

After making these changes I used CTRL+S to save my settings, and Vivado prompted me to save a .xdc constraints file which I named myconstraints.xdc.

The generated file has the contents below. Some of them come from the selections I made in the IDE and some are added by Vivado for the VIO core I instantiated.

set_property PACKAGE_PIN AA24 [get_ports clk_p]
set_property IOSTANDARD LVDS [get_ports clk_p]
set_property DIFF_TERM_ADV TERM_100 [get_ports clk_p]
set_property C_CLK_INPUT_FREQ_HZ 300000000 [get_debug_cores dbg_hub]
set_property C_ENABLE_CLK_DIVIDER false [get_debug_cores dbg_hub]
set_property C_USER_SCAN_CHAIN 1 [get_debug_cores dbg_hub]
connect_debug_port dbg_hub/clk [get_nets clk]

With that in place I can re-run synthesis, run implementation and generate a bitstream for my KU3.

Deploying and testing

With the bitstream generated, I can open the Hardware Manager. After connecting to my device I can use the Program Device option, which auto-populates with my bitfile and debug probes file. Then I hit Program and wait for the magic to happen.

Program Device

Once programmed, if all works well Vivado will automatically open a VIO dashboard window. In that window you can hit the green + to add the input and output probes.

Adding Probes

With that open I can start poking at the data_in signal and watch the updates reflect in the data_out signal.

Peeking and poking signals

Conclusion

With all this set up, I have an easy means to do some interactive control with designs that are running in a live FPGA. I hope other Vivado noobies can follow this guide to help in their digital design adventures. If you follow this guide and run into any issues, reach out to me and I’ll try to help you out.

I’d like to thank JT Kellington, Kevin Irick and Mark Paluszkiewicz for offering their help and experience. I ran into many issues trying to hack my way through this and their assistance was extremely helpful in getting this up and running. Thank you!

Hello AFU on Alpha-Data KU3

Picking up on the Hello AFU project, I’ve recently gone through the motions of building it for an actual CAPI device and testing it out. This post documents the process I followed to build and deploy this on real hardware.

Requirements

To complete this process you’ll need a few things:
* A POWER8-based machine; I’m using a Barreleye server
* An Alpha-Data KU3 card
* The latest HDK archive from Alpha Data’s support site; at this time that file is named ADMPCIEKU3_CAPI_HDK_REL18MAR16.zip
* A licensed version of Xilinx’s Vivado



Preparing files for the build

First off, we need to extract the HDK:

unzip ADMPCIEKU3_CAPI_HDK_REL18MAR16.zip

By default, the HDK includes some AFU source files in adku060_capi_1_1_release/Sources/afu/. We’ll jump in there and delete them, then copy over the SystemVerilog files from the hello-afu repository:

cd adku060_capi_1_1_release/Sources/afu/
rm *
cp ~/projects/hello-afu/*.sv .

Next, open the project file adku060_capi_1_1_release/Sources/prj/psl_fpga.prj in a text editor to change a few lines. Remove all of the lines that start with verilog work, then add lines to reference the source files we copied into the afu directory. Some bash-fu for that:

cd ../prj
sed -i '/^verilog work/d' psl_fpga.prj
for f in ../afu/*.sv; do echo "verilog work \"afu/$(basename "$f")\"" >> psl_fpga.prj; done

That should have us set up to build our AFU in lieu of the one that comes with the HDK!
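If you’d rather not edit the project file in place, the same change can be wrapped in a small function that keeps a backup first. This is only a sketch: rewrite_prj is a name I made up, it isn’t part of the HDK, and it assumes the afu/ prefix on each entry is what psl_fpga.prj expects, as in the commands above.

```shell
#!/bin/sh
# Sketch: make a .prj file list only the .sv files found in an AFU directory.
# rewrite_prj is a hypothetical helper, not something shipped with the HDK.
rewrite_prj() {
    prj_file=$1   # e.g. psl_fpga.prj
    afu_dir=$2    # e.g. ../afu

    # Keep a backup so the original project file can be restored.
    cp "$prj_file" "$prj_file.orig" || return 1

    # Drop the existing "verilog work" entries, keeping every other line.
    grep -v '^verilog work' "$prj_file.orig" > "$prj_file"

    # Append one entry per SystemVerilog file, using only the file name.
    for f in "$afu_dir"/*.sv; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf 'verilog work "afu/%s"\n' "$(basename "$f")" >> "$prj_file"
    done
}
```

From the prj directory this would be called as rewrite_prj psl_fpga.prj ../afu, leaving the untouched original in psl_fpga.prj.orig.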

Build and flash the binfile

With our files in the right spot and our project file modified, we just need to run a couple of the Tcl scripts in the HDK through Vivado.

vivado -mode batch -source psl_fpga.tcl -notrace
vivado -mode batch -source write_bitstream.tcl -notrace

The first command does the heavy lifting of synthesis, place and route, etc. The second generates the actual binfile and bitfile that we can use to flash the device. The first command takes a significant amount of time on my i7-equipped laptop, about 40 minutes; the second completed in about 9 seconds. Maybe someday we’ll have a CAPI-based accelerator for synthesis and place & route! Now that the build is complete, I have my binfile at capi-adku060/psl_fpga_flash.bin.
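Since the first step takes so long, it can be handy to chain the two commands so the second only runs if the first succeeds. Here is a minimal sketch, assuming vivado returns a non-zero exit status when a batch script fails; build_afu and the VIVADO variable are names I’m introducing for illustration.

```shell
#!/bin/sh
# The command to run; override it (e.g. VIVADO=echo) for a dry run.
VIVADO=${VIVADO:-vivado}

# Hypothetical wrapper around the two HDK build steps.
build_afu() {
    # Synthesis, place and route; stop here if this step fails.
    "$VIVADO" -mode batch -source psl_fpga.tcl -notrace || return 1
    # Generate the bitfile and binfile.
    "$VIVADO" -mode batch -source write_bitstream.tcl -notrace || return 1
}
```

Calling build_afu from the directory that holds the two Tcl scripts runs both steps in order.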

To flash this to a card that already has the PSL working, you can use the capi-flash-script utility. If your card is factory-fresh or in a bad state, you can use a JTAG programmer and Vivado’s Hardware Manager to flash directly from your laptop, or remotely via xvcserver.

Using the AFU

After I flashed my AFU, I ensured libcxl was set up on my server. Since I’m running Ubuntu 16.04, I simply installed it via apt.

apt-get install -y libcxl-dev

Next I rebooted the machine so that everything is nice and fresh. As part of the PCIe reset, the bitfile from the KU3’s flash chip is loaded onto the FPGA. I can verify the card is in a good state because I have my cxl device at /dev/cxl/afu0.0d.
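That check is easy to script if you want to run it after every reboot. A small sketch follows; check_cxl_device is a made-up helper name, and the default path is simply the device node I see on my machine, so yours may differ.

```shell
#!/bin/sh
# Return 0 if the given cxl device node exists, non-zero otherwise.
# check_cxl_device is a hypothetical helper; the default path is the
# device node mentioned above and may differ on other systems.
check_cxl_device() {
    dev=${1:-/dev/cxl/afu0.0d}
    if [ -e "$dev" ]; then
        echo "found $dev"
    else
        echo "missing $dev" >&2
        return 1
    fi
}
```

Running check_cxl_device with no argument tests the default path; pass another path to test a different AFU.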

I run my test_afu binary from the hello-afu project and boom! The same result as I get from simulation, woo-hoo!

How I scripted testing for SystemVerilog

In Python and other languages, I’m used to having test suites and a handy runner to pull it all together. While digital logic designers are usually pretty good about writing test benches to go along with the RTL code, I didn’t find a lot of resources out on the interwebs that described a process for automating tests against SystemVerilog code. In my desire to have something similar to continuous integration testing for RTL I explored what options I’d have for scripting the process.

Design Under Test

The motivator for this exercise is that I wanted to extend my shift register implementation to support a configurable depth. Initially it would shift data after one clock cycle, but it can be useful to have other delays as well. I also wanted to use this as an example for the SystemVerilog package manager I’ve been slapping together, so I wanted some way to tell early if I do something dumb that’ll impact other projects that use this shift register.

So I extended my module to include this new depth parameter:

module shift_register
    #(parameter width = 1,
      parameter depth = 1) (
  input logic clock,
  input logic [0:width-1] in,
  output logic [0:width-1] out);

  logic [0:depth-1][0:width-1] data;

  always_ff @ (posedge clock) begin
    {out, data} <= {data, in};
  end
endmodule

To test this I wrote this basic test bench:

module test_shift_register_default();

  logic clock;
  logic data_in;
  logic data_out;

  shift_register dut(clock, data_in, data_out);

  initial begin
    clock <= 0;
    data_in <= 0;
    #3 data_in <= 1;
    #4 assert (data_out == 0);
    data_in <= 0;
    #4 assert(data_out == 1);
    #5 $finish();
  end

  // clock toggles every 2ns (4ns period)
  always #2 clock <= ~clock;

endmodule

This test runs some basic assertions to validate the vanilla version of this shift module. I wrote tests for a few parameter permutations as well.



Vivado Batch Mode

Since the cards I have at work to tinker with use Xilinx based FPGAs, I decided to see how I could go through this process using Vivado. I read partially through an official Vivado simulation tutorial; in particular, chapter 3 on running simulations in “batch mode”. After some tinkering with the process I found this was a lot easier than I initially expected.

The first thing I needed to do was use xvlog to parse and build my SystemVerilog code.

$ xvlog --sv shift_register.sv tests/*.sv
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "/home/kwilke/projects/shift-register/shift_register.sv" into library work
INFO: [VRFC 10-311] analyzing module shift_register
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "/home/kwilke/projects/shift-register/tests/test_16bit_delayed.sv" into library work
INFO: [VRFC 10-311] analyzing module test_shift_register_16bit_delayed
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "/home/kwilke/projects/shift-register/tests/test_8bit.sv" into library work
INFO: [VRFC 10-311] analyzing module test_shift_register_8bit
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "/home/kwilke/projects/shift-register/tests/test_default.sv" into library work
INFO: [VRFC 10-311] analyzing module test_shift_register_default
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "/home/kwilke/projects/shift-register/tests/test_delayed.sv" into library work
INFO: [VRFC 10-311] analyzing module test_shift_register_delayed

This creates an xsim.dir directory that has a file built for each module in my project, and a few log files.

Next, I use xelab to create a simulation snapshot. I’m not sure what a simulation snapshot is, but I know the xsim binary enjoys working with them! Generally I’m just giving it an argument for the module I want to use as my top level entity.

$ xelab test_shift_register_default
Vivado Simulator 2015.4
Copyright 1986-1999, 2001-2015 Xilinx, Inc. All Rights Reserved.
Running: /opt/Xilinx/Vivado/2015.4/bin/unwrapped/lnx64.o/xelab test_shift_register_default 
Multi-threading is on. Using 2 slave threads.
Starting static elaboration
Completed static elaboration
Starting simulation data flow analysis
Completed simulation data flow analysis
Time Resolution for simulation is 1ns
Compiling module work.shift_register
Compiling module work.test_shift_register_default
Built simulation snapshot work.test_shift_register_default

****** Webtalk v2015.4 (64-bit)
  **** SW Build 1412921 on Wed Nov 18 09:44:32 MST 2015
  **** IP Build 1412160 on Tue Nov 17 13:47:24 MST 2015
    ** Copyright 1986-2015 Xilinx, Inc. All Rights Reserved.

source /home/kwilke/projects/shift-register/xsim.dir/work.test_shift_register_default/webtalk/xsim_webtalk.tcl -notrace
INFO: [Common 17-206] Exiting Webtalk at Thu Mar 17 08:45:07 2016...

This adds some more data to xsim.dir and brings it to a state in which I can run the tests. At this point I can use xsim with the -R flag to run the simulation until it ends, then quit.

$ xsim -R test_shift_register_default

****** xsim v2015.4 (64-bit)
  **** SW Build 1412921 on Wed Nov 18 09:44:32 MST 2015
  **** IP Build 1412160 on Tue Nov 17 13:47:24 MST 2015
    ** Copyright 1986-2015 Xilinx, Inc. All Rights Reserved.

source xsim.dir/work.test_shift_register_default/xsim_script.tcl
# xsim {work.test_shift_register_default} -maxdeltaid 10000 -autoloadwcfg -runall
Vivado Simulator 2015.4
Time resolution is 1 ns
run -all
$finish called at time : 16 ns : File "/home/kwilke/projects/shift-register/tests/test_default.sv" Line 24
exit
INFO: [Common 17-206] Exiting xsim at Thu Mar 17 08:48:42 2016...
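The three steps above can be strung together for every test module in one go. The sketch below is one way to do it, not the exact flow I ended up with: list_modules and run_all_tests are names invented for this example, and it assumes each test file declares its module at the start of a line, as mine do.

```shell
#!/bin/sh
# Print the module names declared in the given SystemVerilog files.
# Assumes "module <name>" starts at the beginning of a line.
list_modules() {
    sed -n 's/^module[[:space:]][[:space:]]*\([A-Za-z_][A-Za-z0-9_$]*\).*/\1/p' "$@"
}

# Parse all sources once, then elaborate and simulate each test module.
# Hypothetical helper; the file layout matches the commands shown above.
run_all_tests() {
    xvlog --sv shift_register.sv tests/*.sv || return 1
    for mod in $(list_modules tests/*.sv); do
        xelab "$mod" || return 1
        xsim -R "$mod" || return 1
    done
}
```

Calling run_all_tests from the project root would run every test bench under tests/ in turn, stopping at the first tool that exits with an error.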

Automating the process

Initially, I whipped up a Makefile so I could run make test to test my project. Since the xsim program doesn’t exit with an error status code for assertion failures, I was piping its output to a small shell script. This process was a little wonky.
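The script itself isn’t reproduced here, but a filter of that kind could look roughly like the sketch below. It passes the simulator output through and turns any failure line into a non-zero exit status. The pattern is an assumption on my part about how failures show up in the output, so adjust FAIL_PATTERN to whatever your simulator actually prints; check_sim_output is a hypothetical name.

```shell
#!/bin/sh
# Lines matching this pattern count as failures. This is an assumption
# about the simulator's message format, not a documented guarantee.
FAIL_PATTERN='^(Error|ERROR|Fatal|FATAL):'

# Read simulator output on stdin, echo it through unchanged, and return
# non-zero if any line matches FAIL_PATTERN.
check_sim_output() {
    log=$(mktemp) || return 2
    tee "$log"
    if grep -Eq "$FAIL_PATTERN" "$log"; then
        rm -f "$log"
        echo "Result: FAIL"
        return 1
    fi
    rm -f "$log"
    echo "Result: PASS"
}
```

It would sit at the end of a pipeline, for example xsim -R test_shift_register_default | check_sim_output, so that make sees the function’s exit status rather than xsim’s.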

At this point I decided to add it to Packilog so I could control the process a little more easily with Python. I found this to be a pleasant and familiar type of flow.

$ packilog -t
Using vivado testing driver
Building tests.
Running test module "test_shift_register_default"
Simulating test module 'test_shift_register_default'.

****** xsim v2015.4 (64-bit)
  **** SW Build 1412921 on Wed Nov 18 09:44:32 MST 2015
  **** IP Build 1412160 on Tue Nov 17 13:47:24 MST 2015
    ** Copyright 1986-2015 Xilinx, Inc. All Rights Reserved.

source xsim.dir/work.test_shift_register_default/xsim_script.tcl
# xsim {work.test_shift_register_default} -maxdeltaid 10000 -autoloadwcfg -runall
Vivado Simulator 2015.4
Time resolution is 1 ns
run -all
$finish called at time : 16 ns : File "/home/kwilke/projects/shift-register/tests/test_default.sv" Line 24
exit
INFO: [Common 17-206] Exiting xsim at Thu Mar 17 08:53:11 2016...

Result: PASS
Running test module "test_shift_register_delayed"
Simulating test module 'test_shift_register_delayed'.

****** xsim v2015.4 (64-bit)
  **** SW Build 1412921 on Wed Nov 18 09:44:32 MST 2015
  **** IP Build 1412160 on Tue Nov 17 13:47:24 MST 2015
    ** Copyright 1986-2015 Xilinx, Inc. All Rights Reserved.

source xsim.dir/work.test_shift_register_delayed/xsim_script.tcl
# xsim {work.test_shift_register_delayed} -maxdeltaid 10000 -autoloadwcfg -runall
Vivado Simulator 2015.4
Time resolution is 1 ns
run -all
$finish called at time : 32 ns : File "/home/kwilke/projects/shift-register/tests/test_delayed.sv" Line 28
exit
INFO: [Common 17-206] Exiting xsim at Thu Mar 17 08:53:13 2016...

Result: PASS
Running test module "test_shift_register_8bit"
Simulating test module 'test_shift_register_8bit'.

****** xsim v2015.4 (64-bit)
  **** SW Build 1412921 on Wed Nov 18 09:44:32 MST 2015
  **** IP Build 1412160 on Tue Nov 17 13:47:24 MST 2015
    ** Copyright 1986-2015 Xilinx, Inc. All Rights Reserved.

source xsim.dir/work.test_shift_register_8bit/xsim_script.tcl
# xsim {work.test_shift_register_8bit} -maxdeltaid 10000 -autoloadwcfg -runall
Vivado Simulator 2015.4
Time resolution is 1 ns
run -all
$finish called at time : 16 ns : File "/home/kwilke/projects/shift-register/tests/test_8bit.sv" Line 24
exit
INFO: [Common 17-206] Exiting xsim at Thu Mar 17 08:53:25 2016...

Result: PASS
Running test module "test_shift_register_16bit_delayed"
Simulating test module 'test_shift_register_16bit_delayed'.

****** xsim v2015.4 (64-bit)
  **** SW Build 1412921 on Wed Nov 18 09:44:32 MST 2015
  **** IP Build 1412160 on Tue Nov 17 13:47:24 MST 2015
    ** Copyright 1986-2015 Xilinx, Inc. All Rights Reserved.

source xsim.dir/work.test_shift_register_16bit_delayed/xsim_script.tcl
# xsim {work.test_shift_register_16bit_delayed} -maxdeltaid 10000 -autoloadwcfg -runall
Vivado Simulator 2015.4
Time resolution is 1 ns
run -all
$finish called at time : 21 ns : File "/home/kwilke/projects/shift-register/tests/test_16bit_delayed.sv" Line 24
exit
INFO: [Common 17-206] Exiting xsim at Thu Mar 17 08:53:28 2016...

Result: PASS
Package Test Result: PASS

Hopefully the test driver pattern in Packilog will allow for easy cross-tool testing, as I would like to get support for a variety of common tools. I’m pretty happy with this flow so far because it allows me to write my code in my editor of choice and use a simple command line program to pull it all together.