Senior Project’s All Done
As a quick overview of my senior project, it was to design a stack-based processor (the OZ-4) and compare its performance and architecture to that of a register-based processor (the OZ-3) that I designed about a year and a half ago. I decided to do this project because I was rather unfamiliar with stack-based architectures, and I wanted to explore them and see just how well they perform. To make these processors a reality, I implemented both of them, as well as a small system around them, to test their performance. Since I was also exploring stack-based processors, I programmed the OZ-4 to generate Mandelbrot fractals, as it did in this picture:
Since I last posted here in March, I’ve finished my senior project and presented it at my high school on one of the senior project nights, as you can see in the video above. At the end of March, I had almost finished the VHDL code for the stack-based OZ-4. About a week later, I finished the OZ-4 without running into too much trouble, and then I started programming it to generate Mandelbrot fractals.
This was rather difficult. I had to get used to moving things around the stack and keeping them in the right places while also performing the calculations and setting up the stack for the next iteration. For the actual calculation of each iteration on a certain pixel, I changed the order in which some things are calculated so that, when the iteration is all said and done, the stack is ready for the next one. At the end of it all, I had a 76-instruction loop that was the core of the Mandelbrot calculations. It ran the actual calculation, checked the various exit conditions, and moved the stack around appropriately. As I was making this program, I discovered that a fair number of cycles are wasted when programming a stack-based processor, and those cycles are spent manipulating the stack. Those operations don’t do much to advance the program, since they aren’t doing any calculations.
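As a rough illustration of the calculation that loop performs, here is a minimal fixed-point Mandelbrot iteration in Python. It is not the OZ-4 program: the fixed-point format (`FRAC_BITS`), the iteration cap (`MAX_ITER`), and all function names are assumptions chosen for the sketch.

```python
FRAC_BITS = 12          # assumed fixed-point format: 12 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0
MAX_ITER = 64           # assumed iteration cap (not given in the post)
ESCAPE = 4 * ONE        # a point has escaped once |z|^2 > 4


def to_fixed(x: float) -> int:
    """Convert a float to the assumed fixed-point format."""
    return int(round(x * ONE))


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale the product."""
    return (a * b) >> FRAC_BITS


def mandelbrot_iterations(c_re: int, c_im: int) -> int:
    """Count iterations of z = z^2 + c before |z|^2 exceeds 4."""
    z_re, z_im = 0, 0
    for i in range(MAX_ITER):
        re_sq = fx_mul(z_re, z_re)
        im_sq = fx_mul(z_im, z_im)
        if re_sq + im_sq > ESCAPE:      # exit condition: point escaped
            return i
        # Update the imaginary part first, while z_re still holds the old value.
        z_im = 2 * fx_mul(z_re, z_im) + c_im
        z_re = re_sq - im_sq + c_re
    return MAX_ITER                     # exit condition: iteration cap reached


if __name__ == "__main__":
    print(mandelbrot_iterations(to_fixed(-1.0), to_fixed(0.0)))
    print(mandelbrot_iterations(to_fixed(2.0), to_fixed(2.0)))
```

On a register machine, `re_sq`, `im_sq`, `z_re`, and `z_im` can each sit in a named register. On a stack machine the same values have to be duplicated and swapped into position before each operation, which is where the extra stack-manipulation cycles come from.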
I finished the Mandelbrot generator after letting the project sit around for about a month while I finished up the school year. It generates the image you see in the video in about twenty seconds, which is a bit slow compared to other Mandelbrot generators. Running at 25 MHz, it was calculating about 15,000 pixels per second, and I think that I could have clocked the processor faster and brought that time down. In the end, I wasn’t really out to make the speediest Mandelbrot generator; I was just learning how to program stack-based processors.
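The figures above can be turned into a quick back-of-the-envelope estimate. The sketch below uses only the approximate numbers quoted in the post (25 MHz, about 15,000 pixels per second, about twenty seconds). The variable names are made up for illustration, and the results are only as precise as those rounded inputs.

```python
CLOCK_HZ = 25_000_000        # 25 MHz clock, from the post
PIXELS_PER_SECOND = 15_000   # approximate rate, from the post
RENDER_SECONDS = 20          # approximate render time, from the post

# Average clock cycles spent on each pixel at this rate.
cycles_per_pixel = CLOCK_HZ / PIXELS_PER_SECOND

# Approximate number of pixels in the finished image.
total_pixels = PIXELS_PER_SECOND * RENDER_SECONDS

print(f"about {cycles_per_pixel:.0f} cycles per pixel")
print(f"about {total_pixels:,} pixels in the image")
```

If the cycle count per pixel stayed the same, the render time would scale inversely with clock frequency, which is the reasoning behind clocking the processor faster. That assumes memory and I/O could keep up, which the post did not test.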
Now, since my project is officially a comparison of stack- and register-based processors, I compared the OZ-3 and OZ-4. The OZ-3 is a register-based, pipelined processor that I designed a couple years ago, then implemented over the year after I finished designing it. To make a fair comparison, the two processors had very similar I/O capabilities, so either one could be plugged into the same system. They ran at the same frequency, 25 MHz, had access to the same amount of memory, and had the same ALU for performing the complex fixed-point multiplication that they would need to generate Mandelbrot fractals. Here’s the system that they were in:
Unfortunately, I didn’t have time to get the OZ-3 to generate Mandelbrot fractals or use the mouse, but I compared the two processors in other ways. I broke down the results into the table below. The four programs they ran were simply counting to 100, finding all of the prime numbers from 0 to 100, calculating the first 50 numbers in the Fibonacci sequence, and calculating a math expression.
| Program | OZ-3 | OZ-4 |
| Count to 100 | 102 cycles | 202 cycles |
| Prime Numbers | 3.0 ms | 3.8 ms |
| Fibonacci | 153 cycles | 151 cycles |
| Math Expression | 19 cycles | 19 cycles |
(((5 + 3) * 6) + 24) - ((33 - 6) * 2) / (7 + (2 * 4)) = ?
For reference, that’s the math expression I had the processors calculate. As you can see, the OZ-3, the register-based processor, performed better than the OZ-4 overall. I was surprised by this. At the start of this project, I had just assumed that stack-based processors are faster than register-based processors. This was not the case here. My assumption came from the fact that stacks seem to be used a lot in modern computing, and that HP calculators, with their stacks, can usually do math faster than other calculators. That wasn’t anything solid to base my hypothesis on, and it was proven false here.
I found a few reasons for this result. First, the stack-based OZ-4 wasted quite a bit of time just moving the stack around so that it could work on the right numbers, and this is one of the main faults of stack-based processors. Register-based architectures allow access to all of the data in the registers at any time, and are thus more flexible, whereas the stack denies access to all but a few pieces of information at any one time. Next, the instruction set of the OZ-4 was somewhat limiting. It wasn’t very flexible, and basic functions like branches and jumps took two to three cycles each, as opposed to one or two. I did this to keep in line with the zero-operand instruction sets of many other stack-based processors, but that certainly didn’t create an advantage.
To look at a specific example, the reason the OZ-3 ran twice as fast in counting to 100 is that the OZ-3 could run this instruction in one cycle: “addi r1, r1, 1”, which adds 1 to register 1. The OZ-4, though, had to use one instruction to push 1 onto the stack, then another to add it to the counter. Another one to look at is evaluating the math expression. I thought the OZ-4 would cream the OZ-3 in this test, but they were exactly equal. The stack-based architecture can easily and intuitively evaluate that expression from the inside out, keeping sub-results on the stack until they need to be combined into the full result. But, because of the flexibility of register-based architectures, I was actually able to program the OZ-3 to use its registers as a sort of pseudo-stack, and it could keep the same performance as the OZ-4.
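To make both examples concrete, here is a toy zero-operand stack machine in Python. It is only a sketch: the operation names, the use of integer division, and the program listings are assumptions, not the OZ-4's real instruction set or the programs that were actually run. The expression is read with explicit multiplication, so (5 + 3)6 means (5 + 3) * 6.

```python
def run_stack_program(program):
    """Run a list of (op, arg) tuples; return (final stack, instructions executed)."""
    stack = []
    executed = 0
    for op, arg in program:
        executed += 1
        if op == "PUSH":
            stack.append(arg)
            continue
        # Every other operation pops two operands and pushes one result.
        right = stack.pop()
        left = stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
        elif op == "DIV":
            stack.append(left // right)   # integer division is an assumption
        else:
            raise ValueError(f"unknown operation: {op}")
    return stack, executed


def ops(*names):
    """Helper: build zero-operand instructions from operation names."""
    return [(name, None) for name in names]


# The expression in postfix order: sub-results stay on the stack until combined.
expression_program = (
    [("PUSH", 5), ("PUSH", 3)] + ops("ADD")
    + [("PUSH", 6)] + ops("MUL")
    + [("PUSH", 24)] + ops("ADD")
    + [("PUSH", 33), ("PUSH", 6)] + ops("SUB")
    + [("PUSH", 2)] + ops("MUL")
    + [("PUSH", 7), ("PUSH", 2), ("PUSH", 4)] + ops("MUL", "ADD", "DIV", "SUB")
)

# Counting to 100 with no add-immediate: each increment is a PUSH plus an ADD.
counting_program = [("PUSH", 0)] + ([("PUSH", 1)] + ops("ADD")) * 100

if __name__ == "__main__":
    print(run_stack_program(expression_program))
    print(run_stack_program(counting_program))
```

In this toy model the expression takes 19 instructions (ten pushes and nine arithmetic operations), and counting costs two instructions per increment. Both are in line with the table, though the real programs may have been structured differently.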
However, just looking at their performance doesn’t provide a full comparison. The OZ-4 may be slower in software, but its hardware is significantly smaller and simpler than the OZ-3, and this would, in theory, let it run at a higher frequency than the OZ-3. I didn’t test this specifically, but it is one advantage the OZ-4 has. This also made it easier to design the OZ-4. The second big advantage the OZ-4 has over the OZ-3 is program size. The OZ-3’s instructions are a full 32 bits long because they need an opcode and three fields to specify the source and destination registers. In other addressing modes, the space is taken up by immediate values. The OZ-4’s instructions, though, are only twelve bits long, and could even be six bits long 99% of the time. The first six bits are the opcode, and the last six bits are for specifying which immediate value is to be used in the event of a PUSH instruction. The instructions are shorter because the operands are implicitly on the stack, and don’t need to be encoded into the instruction.
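The twelve-bit layout described above can be sketched in a few lines of Python. The field widths come from the post; the opcode numbers, the choice to put the opcode in the upper six bits, and the function names are assumptions made for illustration.

```python
OPCODE_BITS = 6            # from the post
IMMEDIATE_BITS = 6         # from the post: selects which immediate value to push
OZ4_INSTRUCTION_BITS = OPCODE_BITS + IMMEDIATE_BITS   # 12
OZ3_INSTRUCTION_BITS = 32  # from the post

# Hypothetical opcode numbers; the real OZ-4 encoding is not given in the post.
OPCODES = {"PUSH": 0b000001, "ADD": 0b000010}


def encode(opcode: int, immediate_index: int = 0) -> int:
    """Pack an opcode and an immediate selector into one 12-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 6 bits")
    if not 0 <= immediate_index < (1 << IMMEDIATE_BITS):
        raise ValueError("immediate index does not fit in 6 bits")
    return (opcode << IMMEDIATE_BITS) | immediate_index


def decode(word: int) -> tuple[int, int]:
    """Split a 12-bit word back into (opcode, immediate index)."""
    return word >> IMMEDIATE_BITS, word & ((1 << IMMEDIATE_BITS) - 1)


def program_bits(instruction_count: int, instruction_width: int) -> int:
    """Total storage needed for a program of fixed-width instructions."""
    return instruction_count * instruction_width


if __name__ == "__main__":
    word = encode(OPCODES["PUSH"], 5)
    print(f"{word:012b}", decode(word))
    print(program_bits(19, OZ4_INSTRUCTION_BITS), "bits vs",
          program_bits(19, OZ3_INSTRUCTION_BITS), "bits")
```

For the same number of instructions, the twelve-bit format needs 12/32 of the storage. The two processors may need different instruction counts for the same task, so the real size ratio depends on the program.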
Finally, here are the block diagrams for both processors:
The OZ-4’s:
The OZ-3’s:

Ben Oztalay


