OZ-3 Final Post
I’m going to be making final posts for projects that will act as homepages and summaries of the projects. They’ll be posted on my homepage, as well as at the top of the project’s blog. So here goes, this is the final post for the OZ-3:
A couple of summers ago, I designed a 32-bit processor at the gate level in a digital logic simulator called Logisim. It’s a reduced instruction set computer, implementing 32 instructions that are decoded and executed in a 5-stage pipeline. The stages are: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory and I/O (MEMIO), and Writeback (WB).
The ALU supports most common operations, such as shifts, rotates, addition, subtraction, and logic functions, and operates only with two’s-complement numbers. Technically, it accepts any 32-bit value; it just interprets it as two’s-complement. To save on cycles, each stage of the processor can forward the result of an instruction back to the instruction decoder, so that the current instruction doesn’t have to wait for the previous instruction’s result to reach the writeback stage before it can be used. There are 32 32-bit registers organized as a register file, with register 0 wired to zero. The register file sits inside the Instruction Decode stage of the pipeline so that the stage can easily read and write registers, and so that it can handle forwarding by substituting a forwarded value for the value currently held in a register.
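To make the forwarding and register-zero behavior a bit more concrete, here is a rough Python model. It is only a sketch, not the Logisim circuit or the VHDL: the names RegisterFile, read, write, and as_signed are made up for illustration, and the "newest result first" ordering of forwarded values is an assumption.

```python
class RegisterFile:
    """32 registers of 32 bits each; register 0 always reads as zero."""

    def __init__(self):
        self.regs = [0] * 32

    def write(self, index, value):
        # Writes to r0 are discarded, which models "wired to zero".
        if index != 0:
            self.regs[index] = value & 0xFFFFFFFF

    def read(self, index, forwarded=()):
        """Read a register, preferring results still in the pipeline.

        `forwarded` is a sequence of (dest_register, value) pairs from the
        later stages, ordered newest first (an assumption of this sketch).
        """
        if index == 0:
            return 0
        for dest, value in forwarded:
            if dest == index:
                return value & 0xFFFFFFFF
        return self.regs[index]


def as_signed(value):
    """Interpret a 32-bit pattern as a two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


rf = RegisterFile()
rf.write(5, 10)   # value already written back
rf.write(0, 99)   # ignored: r0 stays zero
in_flight = [(5, 42), (5, 17)]  # hypothetical results still in EX and MEMIO
print(rf.read(5, in_flight))    # 42, the newest forwarded value wins
print(as_signed(0xFFFFFFFF))    # -1
```

The point of the model is that the decoder never stalls waiting for writeback: if a newer value for the register is anywhere in the pipeline, it is used instead of the stored one.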
I designed the instruction set architecture and created the instructions, producing my own opcodes and addressing modes. My main focus was reducing the amount of gate logic necessary to decode instructions, so the instructions and their opcodes are organized in a funny way to reach that goal. Also, all of the instructions are fetched in one cycle, so there are no instructions that would involve grabbing an immediate value or other data in the address following the instruction. I leaned toward simplicity most of the time because I was designing the processor at the gate level, so some functionality is limited, but I think I still retained a fair amount of flexibility in the design. For example, I lost some flexibility because I chose to separate the data and instruction storage, but that kept fetching instructions and making memory transactions simpler.
Input and output capabilities are 16 input pins, 16 output pins, a 32-bit input port, and a 32-bit output port. Extra components, like a multiplexer and a few registers, have to be used to expand the port I/O.
A block diagram of the inside of the processor shows how it’s organized:
One of the biggest challenges was fitting the OZ-3 into a system that included the peripherals on the development board I used: Digilent’s Nexys-2, which has a Xilinx Spartan-3E FPGA with 500,000 equivalent gates. In the end, the system used the on-board RAM and Flash memory to store data and instructions, respectively. The buttons, switches, and PS/2 port are open for input, while all of the PMOD connectors and the 7-segment display are used for output. The PMOD connectors are used mainly for the 2×16 character LCD screen, but a fair number of other devices can be attached.
Using the memory resources (the RAM and Flash) was probably the most difficult roadblock to overcome when I was writing the VHDL. The main issue was that the RAM and Flash share address and data buses, so only one can be used at a time. However, an instruction needs to be fetched every cycle, and memory transactions have to be able to happen in that same cycle. Modifying the processor to accommodate this could get messy, so instead I created a memory controller that alternates control of the buses between the RAM and the Flash. The memory controller runs twice as fast as the processor so that it can flip between the two memories within a single processor cycle. This halved how fast the processor could run, but that was just a limitation of the system I was working in.
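The alternation can be pictured with a tiny Python model. This is a sketch under assumptions, not the actual VHDL controller: the function names are hypothetical, and which memory gets the first half of each processor cycle (Flash here) is a guess made for illustration.

```python
def bus_owner(controller_tick):
    """Return which memory drives the shared buses on a controller tick.

    Assumption: Flash (instruction fetch) gets even ticks, RAM gets odd ticks.
    """
    return "FLASH" if controller_tick % 2 == 0 else "RAM"


def bus_schedule(processor_cycles):
    """List (processor_cycle, owner) for every controller tick."""
    schedule = []
    for cycle in range(processor_cycles):
        for phase in range(2):  # two controller ticks per processor cycle
            tick = cycle * 2 + phase
            schedule.append((cycle, bus_owner(tick)))
    return schedule


for cycle, owner in bus_schedule(2):
    print(cycle, owner)
# 0 FLASH
# 0 RAM
# 1 FLASH
# 1 RAM
```

Each processor cycle gets one turn on the bus for the Flash and one for the RAM, which is why the controller has to be clocked at twice the processor's rate.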
The second issue was that the RAM and Flash are only 16 bits wide, while the OZ-3 has 32-bit instructions. I modified the fetch stage to run twice as fast as the rest of the processor so that it could get both halves of an instruction in time. Since the Flash was already being run as fast as it could go, that actually meant halving the speed of the rest of the processor. I also had to change the way memory transactions are handled by adding instructions that write the upper or lower half of a 32-bit value into memory, and instructions that read an address in memory into the upper or lower half of a register.
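Here is a minimal Python sketch of the half-word handling described above. The function names are hypothetical (they are not OZ-3 mnemonics), and storing the upper half at the lower address is an assumption; the post doesn't say which order the real system uses.

```python
def fetch_instruction(flash, instruction_index):
    """Build one 32-bit instruction from two consecutive 16-bit Flash words."""
    upper = flash[2 * instruction_index] & 0xFFFF      # assumed: upper half first
    lower = flash[2 * instruction_index + 1] & 0xFFFF
    return (upper << 16) | lower


def store_upper(ram, address, value):
    """Write the upper 16 bits of a 32-bit value to a 16-bit memory word."""
    ram[address] = (value >> 16) & 0xFFFF


def store_lower(ram, address, value):
    """Write the lower 16 bits of a 32-bit value to a 16-bit memory word."""
    ram[address] = value & 0xFFFF


def load_upper(ram, address, register):
    """Replace the upper half of a register with a memory word."""
    return ((ram[address] & 0xFFFF) << 16) | (register & 0x0000FFFF)


def load_lower(ram, address, register):
    """Replace the lower half of a register with a memory word."""
    return (register & 0xFFFF0000) | (ram[address] & 0xFFFF)


flash = [0x1234, 0xABCD]
print(hex(fetch_instruction(flash, 0)))  # 0x1234abcd

ram = {}
store_upper(ram, 0, 0xDEADBEEF)
store_lower(ram, 1, 0xDEADBEEF)
register = load_lower(ram, 1, load_upper(ram, 0, 0))
print(hex(register))  # 0xdeadbeef
```

Moving a full 32-bit value to or from memory therefore takes two instructions, one per half.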
Once complete, the system looked like this:
In terms of the software I developed for it, the most complex program I wrote was one that keeps a 1024-character buffer in memory of text that the user inputs through a keyboard. The LCD screen then displays two 16-character lines of the buffer. Using the number pad, the user can scroll up and down through the buffer, and left and right within a line. It’s fairly simple, but it took quite a while to write in assembly.
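The scrolling logic boils down to picking which two 16-character lines of the buffer are visible. The Python sketch below shows one way that could work; it isn't a translation of the actual assembly program, and the names (visible_lines, scroll) and the clamping behavior at the ends of the buffer are assumptions.

```python
BUFFER_SIZE = 1024
LINE_WIDTH = 16
LINE_COUNT = BUFFER_SIZE // LINE_WIDTH  # 64 lines of 16 characters


def visible_lines(buffer, top_line):
    """Return the two 16-character lines the LCD would show."""
    start = top_line * LINE_WIDTH
    first = buffer[start:start + LINE_WIDTH]
    second = buffer[start + LINE_WIDTH:start + 2 * LINE_WIDTH]
    return first, second


def scroll(top_line, delta):
    """Move the window up or down, clamped so two lines always fit."""
    return max(0, min(LINE_COUNT - 2, top_line + delta))


# Fill the buffer so that line 0 is all 'A', line 1 all 'B', and so on.
buffer = "".join(chr(ord("A") + i % 26) * LINE_WIDTH for i in range(LINE_COUNT))

top = scroll(0, 1)
print(visible_lines(buffer, top))  # ('BBBBBBBBBBBBBBBB', 'CCCCCCCCCCCCCCCC')
```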
To develop software such as this, I wrote an assembler in C++. This was also my first experience with parsing something as complex as code that has labels, white space, and comments. Labels are just a way to mark a place in the program so that it can be referred to later, for example to jump back to a certain instruction. The assembler ignores all white space, so I was able to indent the code to make it more readable. Here’s a sample piece of code:
lbl MAIN_LOOP
    #Reset the display update only flag
    addi r10, r0, 0
    cpi r7, 1
    brne CURSOR_LINE_2
    noop
    noop
    noop
    #Set the cursor address if it's on line 1
    opin0 9
    opin1 8
    addi r8, r2, 127
    oprt r8
    opin0 8
    jp SKIP_LINE_2
    noop
    noop
    noop
#Set the cursor address if it's on line 2
lbl CURSOR_LINE_2
    opin0 9
    opin1 8
    addi r8, r2, 191
    oprt r8
    opin0 8
So, this is a good example of indenting, labels, and comments. There aren’t any in-line comments because they don’t usually appeal to me, but they would be valid if I were to use them. To create a loop, all that would need to be added to, say, the indented code under the “CURSOR_LINE_2” label is “jp CURSOR_LINE_2” and three no-op instructions. The no-ops are needed because the condition that determines whether a branch is taken isn’t evaluated until the branch instruction reaches the execute stage, which is three stages in. Without the no-ops, the instructions immediately following the branch would already be in the pipeline by then. If the branch is taken, those instructions should be skipped, but since they’re already being executed, they could cause problems.
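For a feel of what handling labels, white space, and comments involves, here is a small sketch in Python (the real assembler is in C++, and this isn't its code). It assumes that labels resolve to absolute instruction indices and that “brne” and “jp” are the only branch and jump mnemonics to check, since those are the only ones in the sample; the function names are made up.

```python
BRANCHES = {"brne", "jp"}  # assumption: only the mnemonics seen in the sample


def resolve_labels(source):
    """Strip comments and white space, then replace label names with indices."""
    labels = {}
    instructions = []
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] == "lbl" and len(parts) == 2:
            # A label marks the index of the next real instruction.
            labels[parts[1].strip()] = len(instructions)
        else:
            instructions.append(line)

    resolved = []
    for line in instructions:
        parts = line.split(None, 1)
        operand = parts[1].strip() if len(parts) == 2 else ""
        if operand in labels:
            resolved.append(f"{parts[0]} {labels[operand]}")
        else:
            resolved.append(line)
    return resolved, labels


def missing_noops(instructions):
    """Return indices of branches or jumps not followed by three no-ops."""
    problems = []
    for i, line in enumerate(instructions):
        if line.split()[0] in BRANCHES:
            following = [l.split()[0] for l in instructions[i + 1:i + 4]]
            if following != ["noop"] * 3:
                problems.append(i)
    return problems


source = """
lbl MAIN_LOOP
    addi r10, r0, 0   # in-line comments are stripped too
    cpi r7, 1
    brne DONE
    noop
    noop
    noop
lbl DONE
    jp MAIN_LOOP
    noop
    noop
"""

program, labels = resolve_labels(source)
print(labels)                  # {'MAIN_LOOP': 0, 'DONE': 6}
print(program[2], program[6])  # brne 6 jp 0
print(missing_noops(program))  # [6], the jump has only two no-ops after it
```

The second function is a check the real assembler may not have; it’s included because forgetting one of the three no-ops is exactly the kind of mistake the pipeline won’t forgive.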
Well, that’s all for now, unless I think of something else I’d like to add to this post. This has been an awesome project that introduced me to logic and CPU design, hardware design, FPGAs, and VHDL. It taught me to read datasheets very carefully and thoroughly to save time and stress when attempting to interface with memory. It was really cool for me to design my own CPU, place it in a system to make a simple computer, write code for it in an assembly language that I created, then make it all happen in real life.
Ben Oztalay


The diagrams are very nice and even easy to understand. -thumbs up-