FPGA meets 6502

So I’m working on designing my own homebrew 6502-based microcomputer. I have a lot of ideas for features I’d like, but before I make too many crazy decisions I’d like to solidify my understanding of the processor I’m building the whole system around. To this end, I am grabbing my Arty-A7, a W65C02S mounted on a breadboard and a bag of jumper cables to wire them together.

Planning Phase

I chose to use my Arty-A7 FPGA board because it has a ton of I/O pins, it can easily meet my performance requirements and because I just plain love tinkering with FPGAs.

This will take a smidge of organization to get started. I began by taking an inventory of the various pin functions on the W65C02S package.

Pin Diagram for W65C02S from Western Design Center Product Documentation

Generally speaking, there are 16 pins for the address bus, 8 pins for the data bus, 13 control pins, 2 power pins and 1 no-connect. This means I’ll need to wire 37 IO pins + 2 power pins to my FPGA board. With that in mind I’ve decided to use a mix of the PMOD interfaces on my Arty board as well as the Arduino-shield pins next to them. Each PMOD has 8 IO pins, so I’ll use 1 PMOD for the data bus and 2 for the address bus, with the control pins on the adjacent shield header. I’ll pull power from one of the PMODs as well.

Rough Pin Planning

With my rough plan in place, I took some notes on how my 6502 pins will map to specific pins on the board. I planned which PMOD pin each bit of the address and data buses will go to, and assigned each control pin to a numbered shield pin. Now for the fun part!

Wired Up

Now that the rough planning is complete, I carefully used jumper wires to connect everything as planned. I wired one pin at a time, roughly starting from the top of the 6502 and working down. It’s a bit of a mess, but it does the job.

The resulting mess of wires

Now that the plumbing is in place, the real fun begins. I start by opening up Vivado and getting these pins mapped into a design. I’ll create a new top.sv file to kick off the design.

`timescale 1ns / 1ps

module top(
  // Arty stuff
  input logic clk_100mhz,
  input logic reset,

  // 6502 stuff
  input logic [15:0] address,
  inout logic [7:0] data,
  input logic vpb,
  input logic phi1o,
  input logic phi2o,
  input logic rwb,
  input logic mlb,
  input logic sync,
  output logic resb,
  output logic sob,
  output logic phi2,
  output logic be,
  output logic irqb,
  output logic nmib,
  output logic rdy
);

endmodule

The next step is to tediously map the various pins I planned out to the ports I’ve defined in my top module. The PMOD pins are easily found in the reference wiki for the board, but I didn’t see the IO shield pins on the primary documentation page, so I referred to the provided schematic files for that information. With all that pin mapping information, I added the pin properties to a constraints file.

A portion of the XDC Constraint File

At this point, everything looks to be ready for tinkering. As an initial experiment to validate that it is working, I’ll use Vivado’s VIO (Virtual Input/Output) IP block to manually probe and interact with these pins. Though, just before I generate that, I’ll add a few more lines to my top.sv to support bi-directional use of the data bus. For the 6502, when the rwb pin is high, the CPU is reading from the data bus, so I’ll use that as the trigger to put my write_data on that data bus during those times.

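logic [7:0] write_data;

// Drive the data bus only while the 6502 is reading (rwb high);
// otherwise release it so the CPU can drive it.
assign data = rwb ? write_data : 'z;

// Control inputs I am not using yet are held high.
assign sob  = 1;
assign be   = 1;
assign irqb = 1;
assign nmib = 1;
assign rdy  = 1;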

Once I am running on real hardware, I’ll be stepping through the reset logic for the 6502. So for this project irqb, be, nmib, rdy and sob can all stay at 1 for now, as I won’t be using them.

Now seems like the time to start building the VIO block. I’ll add ports for the various inputs and outputs, including the register I just added for bi-directional communication on the data bus.

I’ll let Vivado synthesize the core, and I’ll add it to the top module.

vio_0 debug_core (
  .clk(clk_100mhz),        // input wire clk
  .probe_in0(address),     // input wire [15 : 0] probe_in0
  .probe_in1(data),        // input wire [7 : 0] probe_in1
  .probe_in2(vpb),         // input wire [0 : 0] probe_in2
  .probe_in3(phi1o),       // input wire [0 : 0] probe_in3
  .probe_in4(phi2o),       // input wire [0 : 0] probe_in4
  .probe_in5(rwb),         // input wire [0 : 0] probe_in5
  .probe_in6(mlb),         // input wire [0 : 0] probe_in6
  .probe_in7(sync),        // input wire [0 : 0] probe_in7
  .probe_out0(write_data), // output wire [7 : 0] probe_out0
  .probe_out1(resb),       // output wire [0 : 0] probe_out1
  .probe_out2(phi2)        // output wire [0 : 0] probe_out2
);

Now it is time for the moment of truth!

Playing with a Running CPU

With all that in place, it’s time to build everything and start playing around. After running synthesis, implementation and generating a bitfile, I can flash my new design and debug core to the Arty board and pull up the VIO interface that shows me the current state of things.

VIO showing probe states

At this point a bit more knowledge of how the 6502 operates is necessary. The phi2 signal is the input clock for the CPU. The core of this 6502 is fully static, so this clock can be stopped at any time and the state within the CPU is preserved. This allows me to interact and step the clock manually through VIO, even though my manual input will be extremely slow for interacting with hardware.

Since the clock looks good, I’m going to perform a reset of the 6502. Near the end of the reset sequence I expect the CPU to put the reset vector addresses on the address bus. The reset vector is a pointer to where the CPU should begin execution after reset. For the timing requirements on getting a reset to work properly, the documentation from WDC states specifically what to do:

3.11 Reset (RESB)
The Reset (RESB) input is used to initialize the microprocessor and start program execution. The RESB signal must be held low for at least two clock cycles after VDD reaches operating voltage. 

When a positive edge is detected, there will be a reset sequence lasting seven clock cycles. The program counter is loaded with the reset vector from locations FFFC (low byte) and FFFD (high byte). This is the start location for program control. RESB should be held high after reset for normal operation.

Western Design Center, Inc., W65C02S Datasheet

So I’ll hold reset low for at least 2 cycles, then set it back to high. I’ll continue to manually cycle the clock until I see the CPU begin to fetch the reset vector. As that vector is read, I will manually supply the address 0xDEAD for execution to begin at: 0xAD when the CPU reads 0xFFFC (low byte) and 0xDE when it reads 0xFFFD (high byte). Then, to keep it running, I’ll give it the opcode 0xEA (NOP) so it continues to read empty instructions over the bus. It’s worth noting that the 6502 does most things in alignment with the falling edge of the phi2 clock.
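The manual responses described above could also be produced by a small piece of logic instead of typed into VIO. The following is only a sketch under assumptions: the module name nop_rom and the port name read_data are made up for illustration and are not part of the design in this post.

```systemverilog
// Hypothetical helper (not from the original design): answers every
// 6502 read with a fixed byte so nothing has to be typed in by hand.
module nop_rom (
  input  logic [15:0] address,   // address bus driven by the 6502
  output logic [7:0]  read_data  // byte to place on the data bus
);
  always_comb begin
    case (address)
      16'hFFFC: read_data = 8'hAD; // reset vector, low byte
      16'hFFFD: read_data = 8'hDE; // reset vector, high byte
      default:  read_data = 8'hEA; // NOP everywhere else
    endcase
  end
endmodule
```

If something like this were instantiated in top.sv with read_data connected to write_data, the VIO core would presumably need to stop driving write_data through probe_out0, since a signal should only have one driver.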

The 6502 has been reset and is running the instructions it’s reading from the FPGA! In the future I’ll extend this design to better simulate an imaginary system. That’ll be my stopping place for today. If you have any questions or feedback, please leave a comment!

Beginning Logic Design – Part 14

Hello and welcome to Part 14 of my Beginning Logic Design series! In the last episode, I added my ALU operations. For this round, I want to implement some operations for manipulating a stack and some handling for calling subroutines. Let’s jump to it!

My Stack System

The stack pointer of my CPU will keep track of the “top” of the stack. Most CPUs have a stack that grows “down”, but my CPU already has a lot of inefficiencies and I’m feeling rebellious, so my stack will grow up! I currently set the stack pointer to 0 on reset, so at the start of a program it should be ready to go.

I’ll use the first few available opcodes from my EXTRA operation family for my stack related functions.

F0: push A
F1: push B
F2: push C
F3: pop A
F4: pop B
F5: pop C

As before, I’ll start by roughly mocking out this organization in my PERFORM state:

EXTRA: begin
  case (instruction[3:0])
    // Push A
    0: begin
      
    end
    // Push B
    1: begin
      
    end
    // Push C
    2: begin
      
    end
    // Pop A
    3: begin
      
    end
    // Pop B
    4: begin
      
    end
    // Pop C
    5: begin
      
    end
  endcase
end

Now I’ll start on the PUSH A operation. I’ll need to write A to the memory address my stack pointer is currently set to, then increment the stack pointer. Since this involves some bus interactions, it’ll take two cycles.

On the first I’ll put the A register value in the write_data register, set the address_bus to my stack pointer and enable write.

For the second cycle, I’ll clear my write signal, increment my stack and return to FETCH to continue my program, easy as that!

0: begin
  case (cycle)
    0: begin
      write_data <= a;
      address_bus <= stack;
      write <= 1;
    end
    1: begin
      write <= 0;
      stack++;
      state <= FETCH;
      program_counter++;
    end
  endcase
end

And by the magic of copy-pasta, I extend this to my other two registers.

// Push B
1: begin
  case (cycle)
    0: begin
      write_data <= b;
      address_bus <= stack;
      write <= 1;
    end
    1: begin
      write <= 0;
      stack++;
      state <= FETCH;
      program_counter++;
    end
  endcase
end
// Push C
2: begin
  case (cycle)
    0: begin
      write_data <= c;
      address_bus <= stack;
      write <= 1;
    end
    1: begin
      write <= 0;
      stack++;
      state <= FETCH;
      program_counter++;
    end
  endcase
end

Now for the inverse operation POP. This means performing a read with the decremented stack pointer and storing that into the desired register, which will also be two cycles. On the first I’ll predecrement stack as I set the address_bus to it. On the second I’ll clear my read, store the returned value and go back into FETCH.

// Pop A
3: begin
  case (cycle)
    0: begin
      address_bus <= --stack;
      read <= 1;
    end
    1: begin
      read <= 0;
      a <= data_bus;
      state <= FETCH;
      program_counter++;
    end
  endcase
end

I honestly didn’t think implementing push and pop would be quite so easy; everything was working well on the first attempt. As before, I’ll copy my way through to implement this for B and C.

// Pop B
4: begin
  case (cycle)
    0: begin
      address_bus <= --stack;
      read <= 1;
    end
    1: begin
      read <= 0;
      b <= data_bus;
      state <= FETCH;
      program_counter++;
    end
  endcase
end
// Pop C
5: begin
  case (cycle)
    0: begin
      address_bus <= --stack;
      read <= 1;
    end
    1: begin
      read <= 0;
      c <= data_bus;
      state <= FETCH;
      program_counter++;
    end
  endcase
end
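To make the upward-growing stack behaviour concrete, here is a small simulation-only sketch, separate from the CPU itself. The names stack_demo, memory, push and pop are illustrative assumptions, as are the 256-byte memory and the 16-bit pointer width. It models only the ordering (write then increment for push, decrement then read for pop), not the two-cycle bus timing.

```systemverilog
// Simulation-only sketch; all names here are illustrative, not from cpu.sv.
module stack_demo;
  logic [7:0]  memory [0:255]; // small stand-in for system RAM
  logic [15:0] stack = 0;      // stack pointer, starting at 0 as after reset

  // Push: write at the current pointer, then move the pointer up.
  task automatic push(input logic [7:0] value);
    memory[stack] = value;
    stack = stack + 1;
  endtask

  // Pop: move the pointer down first, then read what it points at.
  task automatic pop(output logic [7:0] value);
    stack = stack - 1;
    value = memory[stack];
  endtask

  logic [7:0] result;

  initial begin
    push(8'hDE);
    push(8'h20);
    pop(result);
    assert (result == 8'h20) else $error("expected 0x20 first");
    pop(result);
    assert (result == 8'hDE) else $error("expected 0xDE second");
    assert (stack == 0) else $error("stack pointer should be back at 0");
    $display("stack demo finished");
    $finish;
  end
endmodule
```

Values come back in the reverse of the order they were pushed, and the pointer always rests on the next free slot rather than on the last value written.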



Subroutines

The next two instructions I want to implement are an operation that jumps into a subroutine and a paired operator that returns from that subroutine. I’ll try to keep these operations pretty simple. I’ll first stub out my opcodes.

// Jump subroutine
6: begin
  case (cycle)
    
  endcase
end
// Return from subroutine
7: begin
  case (cycle)
    
  endcase
end

For my JSR operation (jump to subroutine), I’ll first push the address of the next instruction onto my stack, then jump to the subroutine address given in the instruction. This will take 4 total bus interactions spread over 5 cycles, which my current 2-bit cycle variable cannot count, so I’ll widen cycle to 3 bits (values 0 through 7) and start implementing.

Pretty quickly into drafting my implementation of this, and right after gloating about how easy push/pop was to implement, I noticed this one was going to be a bit trickier! The first thing I need to do is calculate the address of the next instruction and push the most significant byte to the stack.

0: begin
  write <= 1;
  address_bus <= stack;
  program_counter += 3;
  write_data <= program_counter[15:8];
end

On the next cycle, I complete the return address write by setting the next stack byte to the least significant byte.

1: begin
  address_bus <= stack + 1;
  write_data <= program_counter[7:0];
end

With the pointer written to the stack, I’ll begin reading the next pointer to jump to and increment my stack by the length of the pointer (2 bytes). Since my program counter is now ahead of the pointer to jump to, I need to look back 2 bytes for the most significant byte of the subroutine’s address.

2: begin
  write <= 0;
  read <= 1;
  address_bus <= program_counter - 2;
  stack += 2;
end

I’ll store the returned most significant byte for the subroutine in my x register and request the next byte.

3: begin
  x <= data_bus;
  address_bus <= program_counter - 1;
end

Then finally I’ll be done with the bus and can jump into the subroutine.

4: begin
  read <= 0;
  program_counter <= {x, data_bus};
  state <= FETCH;
end

Phew! I had a few issues with implementing this at first, primarily from not managing my pointers properly. With time, patience and debugging in the simulator it did eventually work out.

The ReTurn from Subroutine (RTS) thankfully is a bit easier, and will only take three cycles. First I’ll begin the read for the least significant byte of where to jump back to.

0: begin
  read <= 1;
  address_bus <= --stack;
end

On the second cycle, I’ll store that byte in x and read the most significant byte of the return pointer.

1: begin
  address_bus <= --stack;
  x <= data_bus;
end

On the last cycle we can stop the read and jump to the return pointer!

2: begin
  read <= 0;
  program_counter <= {data_bus, x};
  state <= FETCH;
end

That’ll do it! I’ll use this program to test it, annotated with addresses and comments for clarity:

8000: c0 de     ; Set A = 0xDE
8002: f0        ; Push A to stack
8003: f6 80 07  ; Jump into subroutine at 0x8007
8006: e0        ; Halt machine
8007: c1 20     ; Set B = 0x20
8009: c2 17     ; Set C = 0x17
800b: f7        ; Return

In simulation it works like a charm!
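The stack contents during that test program can be traced by hand. The sketch below is a simulation-only model under assumptions (the module name jsr_rts_demo and the 256-byte memory array are invented for illustration). It collapses the multi-cycle bus sequences into plain assignments and only checks the byte order of the return address.

```systemverilog
// Simulation-only sketch; models the pointer arithmetic, not the bus cycles.
module jsr_rts_demo;
  logic [7:0]  memory [0:255]; // stand-in for the RAM holding the stack
  logic [15:0] stack;
  logic [15:0] program_counter;
  logic [7:0]  x;

  initial begin
    // State after "push A" at 0x8002: one byte (0xDE) on the stack.
    memory[0]       = 8'hDE;
    stack           = 1;
    program_counter = 16'h8003; // address of the JSR instruction

    // JSR: the return address is the instruction after the 3-byte JSR.
    program_counter += 3;
    memory[stack]     = program_counter[15:8]; // most significant byte first
    memory[stack + 1] = program_counter[7:0];  // then least significant byte
    stack += 2;
    program_counter = 16'h8007;                // operand bytes 80 07

    assert (memory[1] == 8'h80 && memory[2] == 8'h06)
      else $error("return address bytes are wrong");
    assert (stack == 3) else $error("stack should point past the return address");

    // RTS: pop the low byte into x, then the high byte, and jump back.
    stack -= 1;
    x = memory[stack];
    stack -= 1;
    program_counter = {memory[stack], x};

    assert (program_counter == 16'h8006) else $error("should return to the halt");
    assert (stack == 1) else $error("only the pushed A should remain");
    $display("jsr/rts demo finished");
    $finish;
  end
endmodule
```

Because the stack grows up, the high byte lands at the lower address, so RTS sees the low byte first when it pops.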

With that working I am done with the initial set of goals I had for this CPU, and this series along with that! I hope some folks have found this series interesting and/or useful. If you have any improvements to suggest or would like me to cover the implementation of any of this in further detail please leave a note in the comments. Keep tinkering!!

Beginning Logic Design – Part 13

Hello and welcome to Part 13 of my Beginning Logic Design series! In the last post I implemented my branch instructions. For this round, I want to implement my ALU operations.

ALU Instructions and Arguments

For my ALU, I want to follow a slightly different pattern for my arguments. In the instructions implemented so far the lower 4 bits of the instruction represented a certain operation within the instruction family. For the ALU operations I’d like to use these 4 bits to instead represent the operands of the instruction.

With the 4 bits available, I’ll use 2 bits to encode each operand with the following representations:

00 - A register
01 - B register
10 - C register
11 - Unused

So the overall format (in binary) of these ALU instructions will be iiiiaabb. Where i represents the instruction, a the first encoded operand and b for the second.

For all of the ALU instructions, I will use the second operand to indicate where the result will be stored. The instructions ADD, SUBTRACT, BIT_AND, BIT_OR and BIT_XOR all use two operands, so the second operand is both an input to the operation and where the result is stored. For the remaining operations INCREMENT, DECREMENT, BIT_NOT, SHIFT_LEFT, SHIFT_RIGHT, ROTATE_LEFT and ROTATE_RIGHT, the first operand is used in the operation and the second is where the result is to be stored.
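As a quick illustration of the iiiiaabb layout, the sketch below splits one instruction byte into its three fields. The names alu_decode_demo, operand_t, REG_A, REG_B and REG_C are assumptions for this example, and the operation value 0001 is just a placeholder, since the numeric opcode assignments are not listed here.

```systemverilog
// Simulation-only sketch; names and the sample opcode value are illustrative.
module alu_decode_demo;
  // 2-bit operand encoding as described above (11 is unused).
  typedef enum logic [1:0] {
    REG_A = 2'b00,
    REG_B = 2'b01,
    REG_C = 2'b10
  } operand_t;

  logic [7:0] instruction;
  logic [3:0] operation;      // iiii
  logic [1:0] first_operand;  // aa
  logic [1:0] second_operand; // bb, also the destination

  assign operation      = instruction[7:4];
  assign first_operand  = instruction[3:2];
  assign second_operand = instruction[1:0];

  initial begin
    instruction = 8'b0001_0110; // placeholder operation 0001, operands B then C
    #1;
    assert (operation == 4'b0001)    else $error("operation field is wrong");
    assert (first_operand == REG_B)  else $error("first operand should be B");
    assert (second_operand == REG_C) else $error("second operand should be C");
    $display("decode demo finished");
    $finish;
  end
endmodule
```

For a two-operand instruction, this byte would combine B and C and store the result in C; for a one-operand instruction, it would operate on B and store the result in C.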



Wiring up the ALU

Before I can build instructions for the ALU, I’ll need to actually include it in the processor!

First, near the top of my cpu.sv file I’ll include my ALU package.

import ALU::*;

Next, inside my cpu module, just under the other internal declarations, I’ll add signals to interface with my ALU and the ALU instance itself.

// ALU signals and module
logic alu_clock;
opcode alu_operation;
logic [7:0] alu_a;
logic [7:0] alu_b;
logic alu_carry_in;
logic [7:0] alu_y;
logic alu_zero;
logic alu_sign;
logic alu_carry_out;
logic alu_overflow;
assign alu_clock = !clock;
alu cpu_alu (
  alu_clock,
  alu_operation,
  alu_a,
  alu_b,
  alu_carry_in,
  alu_y,
  alu_zero,
  alu_sign,
  alu_carry_out,
  alu_overflow
);

I’ve set my alu_clock to follow an inverted clock similar to how the system bus operates.

Next, within my FETCH CPU state, I’ll add another $cast() call to set my alu_operation to the upper four bits of my current instruction, just like I have for my op_type, since I mapped the same values for CPU and ALU operations. There are some possible edge cases where the CPU operation will map to a number that has no meaning to the ALU, so we’ll add a sanity check to make sure it’s within the supported range.

if (data_bus[7:4] < 15)
 $cast(alu_operation, data_bus[7:4]);

That’ll get the basics in place for the ALU.
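As an aside, $cast can also be called as a function, in which case it returns 1 when the value is a legal member of the enum and 0 when it is not, leaving the destination untouched on failure. The sketch below shows that behaviour in simulation. The enum demo_opcode and its three members are invented stand-ins, not the real ALU package, and whether a synthesis tool accepts this form is something to check rather than assume.

```systemverilog
// Simulation-only sketch; demo_opcode is a stand-in for the real opcode enum.
module cast_demo;
  typedef enum logic [3:0] {
    OP_ADD       = 4'h0,
    OP_SUBTRACT  = 4'h1,
    OP_INCREMENT = 4'h2
  } demo_opcode;

  demo_opcode operation = OP_ADD;
  logic [3:0] raw;

  initial begin
    // A value that exists in the enum: the cast succeeds and updates operation.
    raw = 4'h1;
    if ($cast(operation, raw))
      $display("0x%0h decoded as %s", raw, operation.name());

    // A value with no enum member: the cast fails and operation is unchanged.
    raw = 4'hF;
    if (!$cast(operation, raw))
      $display("0x%0h is not a valid opcode, keeping %s", raw, operation.name());

    assert (operation == OP_SUBTRACT) else $error("failed cast changed the value");
    $finish;
  end
endmodule
```

Checking the return value this way is an alternative to a numeric range comparison when the legal values are not contiguous.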

Implementing ALU operations

The first instruction I want to get set up is the ADD instruction.

In the first cycle of ADD, I’ll set my ALU variables to match the registers specified in the instruction as well as passing in our current carry flag:

CPU_ADD: begin
  case(cycle)
    0: begin
      case(instruction[3:2])
        0: alu_a <= a;
        1: alu_a <= b;
        2: alu_a <= c;
      endcase
      case(instruction[1:0])
        0: alu_b <= a;
        1: alu_b <= b;
        2: alu_b <= c;
      endcase
      alu_carry_in <= carry;
    end
  endcase
end

On the next cycle our ALU will have presented its results so we can, in a similar fashion, store the result and set the modified flags.

1: begin
  case(instruction[1:0])
    0: a <= alu_y;
    1: b <= alu_y;
    2: c <= alu_y;
  endcase
  carry <= alu_carry_out;
  zero <= alu_zero;
  sign <= alu_sign;
  overflow <= alu_overflow;
  program_counter++;
  state <= FETCH;
end

Getting the ADD to work was just that easy, but better yet this pattern also works for SUBTRACT! We can just let both operations follow this same case statement.

CPU_ADD, CPU_SUBTRACT: begin
  case(cycle)
    0: begin
      case(instruction[3:2])
        0: alu_a <= a;
        1: alu_a <= b;
        2: alu_a <= c;
      endcase
      case(instruction[1:0])
        0: alu_b <= a;
        1: alu_b <= b;
        2: alu_b <= c;
      endcase
      alu_carry_in <= carry;
    end
    1: begin
      case(instruction[1:0])
        0: a <= alu_y;
        1: b <= alu_y;
        2: c <= alu_y;
      endcase
      carry <= alu_carry_out;
      zero <= alu_zero;
      sign <= alu_sign;
      overflow <= alu_overflow;
      program_counter++;
      state <= FETCH;
    end
  endcase
end

It almost supports SHIFT_RIGHT, ROTATE_LEFT and ROTATE_RIGHT too, as these operations should also set most of these same flags. The issue is that ADD and SUBTRACT also affect the overflow flag, which these three should leave alone, so I’ll use my powers of copy-pasta to create a separate case that doesn’t set overflow but is otherwise identical.

CPU_SHIFT_RIGHT, CPU_ROTATE_LEFT, CPU_ROTATE_RIGHT: begin
  case(cycle)
    0: begin
      case(instruction[3:2])
        0: alu_a <= a;
        1: alu_a <= b;
        2: alu_a <= c;
      endcase
      case(instruction[1:0])
        0: alu_b <= a;
        1: alu_b <= b;
        2: alu_b <= c;
      endcase
      alu_carry_in <= carry;
    end
    1: begin
      case(instruction[1:0])
        0: a <= alu_y;
        1: b <= alu_y;
        2: c <= alu_y;
      endcase
      carry <= alu_carry_out;
      zero <= alu_zero;
      sign <= alu_sign;
      program_counter++;
      state <= FETCH;
    end
  endcase
end

That’s 5 of the 12 operations already. The last 7 can also be bundled into a single case statement. The only difference for them is that they don’t care what the carry flag is set to, and they only affect the sign and zero flags.

CPU_INCREMENT, CPU_DECREMENT, CPU_AND, CPU_OR, CPU_XOR, CPU_NOR, CPU_SHIFT_LEFT: begin
  case(cycle)
    0: begin
      case(instruction[3:2])
        0: alu_a <= a;
        1: alu_a <= b;
        2: alu_a <= c;
      endcase
      case(instruction[1:0])
        0: alu_b <= a;
        1: alu_b <= b;
        2: alu_b <= c;
      endcase
    end
    1: begin
      case(instruction[1:0])
        0: a <= alu_y;
        1: b <= alu_y;
        2: c <= alu_y;
      endcase
      zero <= alu_zero;
      sign <= alu_sign;
      program_counter++;
      state <= FETCH;
    end
  endcase
end
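The operand-selection case statements repeated in each block above could be factored into a helper function. This is only a sketch of one possible refactor, not how the CPU in this post is written: the names operand_select_demo and read_operand are made up, and unlike the original case statements (which leave the target unchanged for encoding 11), this helper returns 0 for the unused encoding.

```systemverilog
// Simulation-only sketch of a possible refactor; names are illustrative.
module operand_select_demo;
  // Stand-ins for the CPU's general-purpose registers.
  logic [7:0] a = 8'h11;
  logic [7:0] b = 8'h22;
  logic [7:0] c = 8'h33;

  // Map a 2-bit operand field to the register value it selects.
  function automatic logic [7:0] read_operand(input logic [1:0] select);
    case (select)
      2'b00:   return a;
      2'b01:   return b;
      2'b10:   return c;
      default: return 8'h00; // encoding 11 is unused
    endcase
  endfunction

  initial begin
    assert (read_operand(2'b00) == 8'h11) else $error("00 should select A");
    assert (read_operand(2'b01) == 8'h22) else $error("01 should select B");
    assert (read_operand(2'b10) == 8'h33) else $error("10 should select C");
    assert (read_operand(2'b11) == 8'h00) else $error("11 should read as 0");
    $display("operand select demo finished");
    $finish;
  end
endmodule
```

Inside the CPU, cycle 0 of each ALU case could then shrink to two lines such as alu_a <= read_operand(instruction[3:2]); and alu_b <= read_operand(instruction[1:0]);.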

Huzzah! We can now utilize our ALU operations via our CPU program code. In the next post I will add some operations to my CPU to include stack functionality and operations that can be used to call subroutines. As always, I welcome your feedback and questions in the comments. Keep tinkering!