CAPI (Coherent Accelerator Processor Interface) is an exciting technology that should allow developers to more easily design applications that utilize an FPGA accelerator. This article documents my initial spelunking into this technology.
A little context
This is my first foray into tinkering with FPGAs and digital logic design in general. For those unfamiliar with this technology, an FPGA (Field-Programmable Gate Array) is a type of integrated circuit that essentially allows software-defined hardware. For the designer it’s almost like a pile of digital logic gates and some mechanisms that allow you to define how they are connected together. With this technology, and the appropriate skill sets, the FPGA can be programmed to act like nearly any piece of digital hardware.
An accelerator works somewhat like a co-processor: it implements computationally expensive algorithms in hardware. The idea is that instead of processing something on the general-purpose CPU in your computer, you delegate processing to a piece of hardware designed specifically for the task at hand, somewhat similar to using a GPU for 3D rendering. In my first couple of projects my goal is to make something more functional than practical.
CAPI is a technology that should allow me to focus on the interesting parts of designing an accelerated application. Instead of worrying about how I’m going to communicate between code running in a Linux userspace application and a custom piece of hardware, I get to focus my efforts on the application and the hardware itself!
To run it on real gear you’ll need a POWER8-based server. For me, the plan is to tinker with this on the Barreleye server that I work on for Rackspace. To make this more accessible to other developers I will focus mostly on my design process and simulation on my x86_64 workstation.
My simulation environment
If you want to set this up for yourself I recommend you grab the flavor of Ubuntu that you like the most and install the Quartus Prime software. I’m using the 30-day Evaluation of Quartus Prime Standard Edition, but I believe the free Lite Edition would suffice for this tinkering as well. Elect to install ModelSim Starter Edition as part of the Quartus installation process.
Moar CAPI talk and terminology
An important part of the CAPI system on the FPGA side of things is the POWER Service Layer (PSL), which helps create the bridge between your custom hardware and userspace application. The accelerator itself is referred to as an Accelerator Function Unit (AFU) in the context of CAPI; this is the part I am most interested in designing.
On the userspace side of things, libcxl is the library you include in your application to communicate with the PSL and the AFU(s) behind it.
The Power Service Layer Simulation Engine (pslse) can be used to help design and test this technology without the need for the physical gear. In the next few bits I’ll outline the process I have taken to set this up on my machine and run a sample project.
Building and setting up PSLSE
First, clone the pslse repo from GitHub:
git clone https://github.com/ibm-capi/pslse
Build the AFU driver
The AFU driver is used by ModelSim to transmit signals between a simulated design and a running instance of PSLSE. To build it you’ll need to find the vpi_user.h header included in your ModelSim installation. For me this is located in /home/$USER/altera/15.1/modelsim_ase/include/. You’ll also need to compile for 32-bit, as ModelSim is a 32-bit application.
cd pslse/afu_driver/src/
export VPI_USER_H_DIR="/home/$USER/altera/15.1/modelsim_ase/include/"
BIT32=y make
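If your ModelSim lives somewhere other than my path, a small helper can hunt down the header for you. This is only a sketch: find_vpi_dir is a name I made up for illustration, not something shipped with pslse or ModelSim, and the install root in the usage comment is just the one from my machine.

```shell
# find_vpi_dir ROOT: print the directory under ROOT that contains vpi_user.h.
# Hypothetical helper for illustration; not part of pslse or ModelSim.
find_vpi_dir() {
    root=$1
    # Take the first match; a ModelSim install normally ships a single copy.
    header=$(find "$root" -type f -name vpi_user.h 2>/dev/null | head -n 1)
    if [ -z "$header" ]; then
        echo "vpi_user.h not found under $root" >&2
        return 1
    fi
    dirname "$header"
}

# Example usage (adjust the install root for your machine):
# export VPI_USER_H_DIR="$(find_vpi_dir "$HOME/altera/15.1/modelsim_ase")"
```

The function only prints a directory, so you can eyeball the result before exporting it into the build environment.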
If you get an error about not finding a cdefs.h header, you’ll just need to install the 32-bit C library development headers. You can run file veriuser.sl to verify it generated an ELF 32-bit LSB shared object.
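If the file utility isn’t handy, the same check can be made by reading the ELF header directly. The sketch below relies on the ELF format itself: the first four bytes are the magic number and the fifth byte (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones. elf_class is a hypothetical helper name of my own.

```shell
# elf_class FILE: print 32 or 64 depending on the ELF class of FILE.
# Hypothetical helper for illustration; uses only od and tr.
elf_class() {
    # Bytes 0-3 must be the ELF magic: 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Byte 4 is EI_CLASS: 1 means 32-bit, 2 means 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown; return 1 ;;
    esac
}

# Example usage, run from afu_driver/src after the build:
# [ "$(elf_class veriuser.sl)" = 32 ] || echo "veriuser.sl is not 32-bit" >&2
```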
Build pslse itself
PSLSE has a straightforward build process; just make sure to build this for 32-bit use as well.
cd ../../pslse/
BIT32=y make
Build libcxl from pslse repo
There is a variant of libcxl inside of the PSLSE repo that is modified for use in a simulated environment. This can be compiled for a 64-bit architecture, as it communicates with PSLSE over a socket.
cd ../libcxl/
make
Memcopy example project
IBM has a downloadable Memcopy Demo you can find here to test your setup. The next couple steps will outline the process I’ve taken to run this sample project.
Create new Quartus project
In Quartus, open the New Project Wizard.... If you get the introduction screen, click next to get to the Directory, Name, Top-Level Entity page. Set the working project directory to a new directory to store the project files; I named my directory memcopy-example. I also named my project memcopy-example. After naming the project, the top-level design entity field will mimic the project name, but for this project we’ll use top as our top-level entity to match the top.v file provided in the pslse repo. After filling in those fields you can hit Finish to exit the wizard.
Copy files into project directory
From the pslse repo, copy afu_driver/verilog/top.v into your new project directory.
From the MemcopyDemoKit.tar.gz archive, copy all the files in capi-memcpy/memcpy/ into your project directory.
Synthesize and start simulation
Press Ctrl+K or go to Start Analysis & Synthesis to build the project. When complete, the bottom message area should say something like Quartus Prime Analysis & Synthesis was successful. 0 errors, 57 warnings and you’ll see a green check next to Analysis & Elaboration in the tasks window.
Next, go to Run Simulation Tool->RTL Simulation to open ModelSim. On my box this initially gave me an error about some license file stuff, even though I was using the free version of ModelSim. I followed this helpful post to fix the missing dependencies.
Point ModelSim to the pslse veriuser.sl
When ModelSim starts it will create simulation/modelsim/modelsim.ini in your Quartus project directory. Open this file and search for Veriuser. Add a line to this file that sets Veriuser to the path of the veriuser.sl you compiled within the afu_driver/src directory of the pslse repo. Example below:
Veriuser = /tmp/pslse/afu_driver/src/veriuser.sl
Back in ModelSim, go to Compile Options and hit OK so that ModelSim will reload the configuration. Unfortunately, this file is overwritten when you open ModelSim later, so you’ll need to do this each time you open ModelSim unless you modify the template at ~/altera/15.1/modelsim_ase/modelsim.ini, which would affect all of your Quartus projects.
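Since the setting has to be reapplied every time, scripting the edit can save some clicking. Here is a minimal sketch, assuming the Veriuser setting belongs directly under the [vsim] section header of modelsim.ini (check your own file before trusting this). set_veriuser is a hypothetical helper of mine, and the library path in the usage comment is only the example path from above.

```shell
# set_veriuser INI LIB: make INI contain exactly one active "Veriuser = LIB" line.
# Hypothetical helper; assumes the setting lives right under the [vsim] header.
set_veriuser() {
    ini=$1
    lib=$2
    tmp=$(mktemp)
    awk -v lib="$lib" '
        # Drop any active (uncommented) Veriuser line so repeated runs do not pile up.
        /^[[:space:]]*Veriuser[[:space:]]*=/ { next }
        { print }
        # Add the setting immediately after the [vsim] section header.
        /^\[vsim\]/ { print "Veriuser = " lib }
    ' "$ini" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file permissions are kept.
    cat "$tmp" > "$ini"
    rm -f "$tmp"
}

# Example usage, run from the Quartus project directory:
# set_veriuser simulation/modelsim/modelsim.ini /tmp/pslse/afu_driver/src/veriuser.sl
```

Commented-out lines such as "; Veriuser = veriuser.sl" are left alone, so the file still reads the same apart from the one added setting.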
Power up the AFU
In ModelSim, go to Start Simulation. In the window that comes up, expand the work node, select the top module, and confirm. Once started, the Transcript box should end with something like Errors: 0, Warnings: 0. If you do get an error, it might be because the veriuser.sl was compiled for a 64-bit architecture but is being loaded by a 32-bit application.
Now that the simulation is prepared, we can run it via Continue. After a short while ModelSim should print a message in the transcript that reads something like # AFU Server is waiting for connection on localhost:32768. ModelSim might appear to freeze at this point, though it actually seems to be blocking while it waits for a connection.
Open a terminal and go into the pslse/ directory within the pslse repo. Check shim_host.dat to make sure the port matches what the AFU server is waiting on. Then kick off the pslse server with ./pslse. Once ModelSim connects, the window should become responsive again. Within ModelSim, use All to keep the simulation going.
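The port check can be scripted too. The sketch below assumes shim_host.dat holds lines of the form afu0.0,localhost:32768 (AFU name, then host and port); that layout is my assumption based on the names that show up in the PSLSE log, so compare it against your own file first. shim_port is a hypothetical helper name.

```shell
# shim_port FILE AFU: print the port configured for AFU in FILE.
# Hypothetical helper; assumes lines shaped like "afu0.0,localhost:32768".
shim_port() {
    # Split on both ',' and ':' so the fields are: AFU name, host, port.
    awk -F'[,:]' -v afu="$2" '
        $1 == afu { print $3; found = 1 }
        END { exit !found }
    ' "$1"
}

# Example usage, run from the pslse/ directory. 32768 is the port ModelSim
# reported in "AFU Server is waiting for connection on localhost:32768".
# [ "$(shim_port shim_host.dat afu0.0)" = 32768 ] || echo "port mismatch" >&2
```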
In the terminal you have pslse running you should get some output like this:
INFO:PSLSE version 1.002 compiled @ Jan 11 2016 11:00:04
INFO:PSLSE parm values:
Seed = 13
Timeout = 10 seconds
Response = 16%
Paged = 3%
Reorder = 86%
Buffer = 82%
INFO:Attempting to connect AFU: afu0.0 @ localhost:32768
PSL_SOCKET: Using PSL protocol level : 0.9908.0
INFO:Clocking afu0.0
INFO:Started PSLSE server, listening on localhost:16384
INFO:Stopping clocks to afu0.0
At this point, ModelSim and PSLSE are both ready for a userspace application to put them to good use.
Run the userspace application that utilizes the AFU
Go into the capi-memcpy folder extracted from the MemcopyDemoKit archive. Edit the Makefile included in this folder to set the PSLSE_DIR variable to the libcxl directory within the pslse repo. Run make to build the application.
Since the libcxl library isn’t installed to the system’s library path, you will need to set LD_LIBRARY_PATH to the same path you pointed PSLSE_DIR to before the program will be able to run.
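In shell terms that looks roughly like the following. The default location of the pslse checkout is an assumption about where you cloned it, and I am reusing the PSLSE_DIR name from the Makefile purely for convenience; the demo binary itself is whatever make produced in the capi-memcpy folder.

```shell
# Default to a pslse checkout in the home directory (assumed location);
# an already-exported PSLSE_DIR takes precedence.
: "${PSLSE_DIR:=$HOME/pslse/libcxl}"

# Prepend the simulation libcxl directory, keeping any existing entries.
# The ${VAR:+...} form avoids a dangling ':' when the variable was empty.
export LD_LIBRARY_PATH="$PSLSE_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
# Now launch the demo binary built by make from this same shell.
```

Setting the variable only in the shell that launches the demo keeps the simulation build of libcxl from shadowing a real one elsewhere on the system.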
That should do it! My output is something like this (duplicate lines removed for brevity):
INFO:Connecting to host 'localhost' port 16384
Using seed 1452713916
Starting copy of 8192 bytes from 0x00000000015bc580 to 0x00000000015be600
Timeout after 1 seconds waiting for AFU to start
Command events: 0x000000002284d401:0x00000000015bc480:0x8000008000000000 0x0000000022: Tag:0x09,1 Command:0x0a00,1 Addr:0x00000000015bc480,1 abt:0 cch:0x0 size:128 0xffffffffff: Tag:0xff,1 Command:0x1fff,1 Addr:0xffffffffffffffff,1 abt:7 cch:0xffff size:4095
0xffffffffff: Tag:0xff,1 Command:0x1fff,1 Addr:0xffffffffffffffff,1 abt:7 cch:0xffff size:4095 0xffffffffff: Response events: 0x000000002c: Tag:0x09,1 Code:0x08 credits:1 0xffffffffff: Tag:0xff,1 Code:0xff credits:7
0xffffffffff: Tag:0xff,1 Code:0xff credits:7 0xffffffffff: Control events: 0x0000000002:0x0000000002: Done, Error:0x0000000000000000 0x0000000004:
I’ve not fully wrapped my head around how this works nor verified if it’s actually copying blocks of memory as it alludes to, though it does appear to be a working setup. As I gain some more knowledge with VHDL/Verilog and the CAPI technology I hope to produce an easily understood breakdown of what is going on here and how a developer could dabble with their own designs. Stay tuned.