A4 - Saving & Loading

In the previous tutorials, all of the processing we did was stored within the CPU's registers. However, in order to communicate with the other devices connected to it, the CPU needs to be able to save information to, and load information from, other memory addresses.


The VO-EM console has many devices inside of it. Each device occupies its own range of memory addresses. When we assemble programs using the assembler, and load them via the "load dlx" button on the debugger, our programs exist in the space between addresses 0x0 and 0x7FFF. (If you hadn't noticed by now, the '0x' means the following number is in hexadecimal.) Physically, you can imagine that they are on a cartridge, as seen in many old consoles.

The cartridge memory is ROM, meaning that while you can read data from it, you cannot write data to it. Quite often, a need arises to temporarily store data somewhere other than the CPU's registers. Maybe we've used up all 32 registers, or maybe we're just jumping to another part of the program which wants to use the registers that our data is in. For this, we use RAM.

The VO-EM console has two banks of RAM, but the smaller, first bank is the one we'll be using for this tutorial. It exists in the address space between 0x8000 and 0xFFFF. RAM is very useful, because we can read and write data to it freely. However, whenever the device is reset, RAM will be wiped, meaning it's not useful for storing things like save data or high score tables.

Loading data

There are a few different ways to load data into a register. The simplest way is to just use an addi, or "add immediate". When using addi, the value that we want to load is encoded right into the instruction. So, for example, if we write

           addi   r1,r0,5

then, the number 5 will be put straight into the instruction:

20 01 00 05   
          ^--There it is!

Even if we use .equ to define 5 elsewhere, as mentioned last time:

a     .equ    5
      addi    r1,r0,a 

the resultant machine code won't be any different, because the assembler just replaces 'a' with '5'.

In the branching tutorial, we used .word to define our arguments, and loaded them with lw. You've probably guessed by now, but when we use

a     .word   5
      lw      r1,a

what's inserted into the instruction is not the number 5, but the value of a from the symbol table (that is, the address where the word is stored). So, in our multiplication program, where 'a' was '0' in the symbol table, the resultant instruction was

8C 01 00 00
          ^--There it is!
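
If you'd like to check those bytes by hand, here is a small Python sketch that packs the four I-format fields into a word. It isn't part of the assembler. The field widths (6, 5, 5 and 16 bits) and the opcode numbers are assumptions, inferred from the two byte strings shown above.

```python
def encode_i_format(opcode: int, i: int, j: int, k: int) -> bytes:
    """Pack an I-format instruction into 4 big-endian bytes.

    Assumed field widths: opcode 6 bits, i 5 bits, j 5 bits, K 16 bits.
    """
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    if not (0 <= i < 32 and 0 <= j < 32):
        raise ValueError("register numbers must be 0..31")
    word = (opcode << 26) | (i << 21) | (j << 16) | (k & 0xFFFF)
    return word.to_bytes(4, "big")


ADDI = 0x08  # assumption: inferred from the leading byte 20
LW = 0x23    # assumption: inferred from the leading byte 8C

print(encode_i_format(ADDI, 0, 1, 5).hex(" "))  # addi r1,r0,5
print(encode_i_format(LW, 0, 1, 0).hex(" "))    # lw r1,a  (a at address 0)
```

Both lines should match the machine code shown above, with the immediate or the address sitting in the last two bytes.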

You may be wondering what the point of having two methods for loading things is, or you may well have already figured it out. There are actually two reasons. The first is a technical consideration.

When we want to load very large numbers, "addi" is not a very good solution. The reason for this is the way it's encoded as an instruction. Remember that it's an I-format instruction:

|_____||_____||____|  |__________________|
   |      |      |              |
Opcode    i      j          Kuns/Ksgn

Which means that there are only 16 bits to store the number we want to add to our register. Since our registers hold 32 bits, we can only fill up half the register using an add command. Trying to load any number larger than 0xFFFF will cause the assembler to throw an error, as it won't be able to fit our instruction into 4 bytes.

So instead, we put the value into our actual program with .word, and then load it using load word (lw):

bignum    .equ  16#10000
          addi  r1,r0,16#10000    ;causes an error
          addi  r1,r0,bignum      ;also causes an error
bigword   .word 16#10000  
          lw    r1,bigword        ;works just fine
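
To see why 16#10000 is the first value that cannot fit, it helps to count binary digits. The sketch below is plain Python, not assembler output, and the helper name is made up for illustration. It only counts raw bits; whether a given instruction treats its 16 bits as signed or unsigned is a separate question.

```python
IMMEDIATE_BITS = 16  # width of the K field in an I-format instruction


def bits_needed(value: int) -> int:
    """Number of binary digits needed to write a non-negative value."""
    return max(value.bit_length(), 1)


for value in (5, 0xFFFF, 0x10000):
    verdict = "fits" if bits_needed(value) <= IMMEDIATE_BITS else "too wide"
    print(f"{value:#x}: {bits_needed(value)} bits, {verdict}")
```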

The second reason to use a load instead of an addi is when we want to read a value from a different device. ROM, as you remember, is unchanging. It can't be written to. However, there are other devices in the computer whose values may change.

For example, there exists at address 0xFFFFFF04 the timer device, which counts up once every CPU cycle. Loading this address with lw at different times will yield a different number. At 0xFFFFFF00 sits the gamepad device. Its value changes based on which buttons the user is pressing.


Saving works backwards to loading. Where loading reads the value of a memory address into a specified CPU register, saving writes your CPU register to the device at a specified memory address.

There are many reasons to save data - often you will just be saving information to RAM for safekeeping, but saving is also the main way that the CPU gives instructions to other devices.

Saving to certain addresses within the GPU, for example, can be used to put images on the screen, change the colour of objects, and the like.

Limitations & Workarounds

Since RAM starts at 0x8000, that is the address that we will want to save to.

However, if you write the following

RAM      .equ    16#8000    ;<--- we use 16# to tell the assembler it's hex
         addi    r1,r0,16#50
         sw      RAM,r1

and attempt to assemble it, you'll see that pesky "Expression value overflows" error once again. To see why that happens, have a look at the way "sw" is encoded; it's an I-format instruction, meaning it looks like

|_____||_____||____|  |__________________|
   |      |      |              |
Opcode    i      j          Kuns/Ksgn

and what's more, it uses the signed value of "K". This means that the highest positive number it can accept in the target field is 0x7FFF.

But hold on - there's no such thing as a negative memory address. So, then, why is the value signed? The answer is in the "i" part of the instruction. When we try to save or load a value, the target we give is actually added to the contents of a register. If we don't tell the assembler what register to use, it assumes we want r0 (i.e., we want to add the number to zero).

We can choose the register to add to by putting it in brackets next to the target value.

So when we write

        sw     RAM,r1

The assembler assumes we want

        sw     RAM(r0),r1

Which translates to

Save the contents of r1 into the memory at the address RAM + the contents of r0.
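
The address arithmetic described here can be modelled in a few lines of Python. This is a rough sketch and not the emulator's actual code: the function names are invented, and it assumes the usual two's-complement reading of the signed 16-bit K field.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(k: int) -> int:
    """Interpret the low 16 bits of k as a signed two's-complement number."""
    k &= 0xFFFF
    return k - 0x10000 if k & 0x8000 else k


def effective_address(k: int, base: int) -> int:
    """Address used by a load or save: signed K plus the base register."""
    return (base + sign_extend16(k)) & MASK32


print(hex(effective_address(0, 0x8000)))       # K = 0, base register holds 0x8000
print(hex(effective_address(0x7FFF, 0)))       # largest positive K, r0 as base
print(hex(effective_address(0xFFFC, 0x8004)))  # K = -4 steps back from the base
```

The last line shows one reason for K to be signed: a negative K reaches addresses just below whatever the base register holds.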

It may have just clicked how we can load and save values into RAM - we need to store the RAM offset in a register.

So, first we specify our RAM offset:

RAM_OFFSET    .word     16#8000

Then, we load it up

              lw        r30,RAM_OFFSET

I put it in r30 so it's nice and out of the way. Next, we load up the value that we want to save in RAM:

              addi      r1,r0,4

And then, we save it!

              sw        (r30),r1 

Just to make sure it worked, let's load the number we just stored back into register 2:

              lw        r2,(r30)

It's exactly the same, just backwards!


Let's just try this out to make sure it's all working - here's the full program:

RAM_OFFSET   .word    16#8000
             lw       r30,RAM_OFFSET
             addi     r1,r0,4
             sw       (r30),r1
             lw       r2,(r30)

I saved mine as "saveload.dls" and assembled it with

java -jar dasm.jar saveload.dls -a -l

I used -l so I could get a good look at what it assembled to.

Try loading the dlx file up in the debugger and running it. You'll need to switch to page 3 on the debugger (the memory status page) by pressing "3", and then press "e" to jump to the first bank of RAM. You should see that address 0x8000 now contains 00 00 00 04, meaning we successfully saved our register contents into RAM. Switching back to the CPU status page (press "1") will show us that register 2 does indeed contain the value from RAM.
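
If you don't have the debugger to hand, the same four steps can be traced with a toy Python model. This is only a sketch. It assumes words are stored most significant byte first, which matches the 00 00 00 04 seen on the memory page, and the sw and lw helpers are stand-ins for the real instructions.

```python
RAM_BASE = 0x8000
ram = bytearray(0x8000)  # first RAM bank, covering 0x8000..0xFFFF
regs = [0] * 32          # r0..r31


def sw(address: int, value: int) -> None:
    """Store a 32-bit word into RAM, most significant byte first."""
    offset = address - RAM_BASE
    ram[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "big")


def lw(address: int) -> int:
    """Read a 32-bit word back out of RAM."""
    offset = address - RAM_BASE
    return int.from_bytes(ram[offset:offset + 4], "big")


regs[30] = 0x8000       # lw   r30,RAM_OFFSET
regs[1] = regs[0] + 4   # addi r1,r0,4
sw(regs[30], regs[1])   # sw   (r30),r1
regs[2] = lw(regs[30])  # lw   r2,(r30)

print(ram[0:4].hex(" "))  # the four bytes at address 0x8000
print(regs[2])            # contents of register 2
```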

Keeping track of multiple values

So, if we can use a register as the offset by putting it in brackets, what's the point of being able to use a given value? Well, one use for it is to conveniently keep track of many values. For example, we could say

RAM_OFFSET  .word   16#8000
a           .equ   0
b           .equ   4
c           .equ   8  
            lw     r30,RAM_OFFSET

and then use our new a, b, and c offsets to store into different parts of RAM. They increase by 4 because we're saving 4-byte words.

Let's try using them:

            addi   r1,r0,3
            sw     a(r30),r1
            addi   r1,r0,8
            sw     b(r30),r1
            addi   r1,r0,3000
            sw     c(r30),r1

We'll just put a 'halt' on there and assemble away:

RAM_OFFSET  .word  16#8000
a           .equ   0
b           .equ   4
c           .equ   8  
            lw     r30,RAM_OFFSET
            addi   r1,r0,3
            sw     a(r30),r1
            addi   r1,r0,8
            sw     b(r30),r1
            addi   r1,r0,3000
            sw     c(r30),r1
            halt

Using the debugger's memory status page once again should show us that all three of our values have been stored sequentially in RAM. Of course, loading them works the same way, for example:

            lw     r3,c(r30)

will load "3000" into register 3, since that is the value we stored at offset c.
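
As a cross-check on what the memory page should show, here is a hedged Python sketch of the same layout using the standard struct module. It assumes big-endian words, as seen earlier on the memory page, and models only the first 12 bytes of RAM.

```python
import struct

A, B, C = 0, 4, 8  # byte offsets, matching the .equ values above

ram = bytearray(12)                   # the first three words of RAM
struct.pack_into(">I", ram, A, 3)     # sw a(r30),r1 with r1 = 3
struct.pack_into(">I", ram, B, 8)     # sw b(r30),r1 with r1 = 8
struct.pack_into(">I", ram, C, 3000)  # sw c(r30),r1 with r1 = 3000

print(ram.hex(" ", 4))                # one group of hex digits per word

(value_at_c,) = struct.unpack_from(">I", ram, C)  # lw r3,c(r30)
print(value_at_c)
```

Note that 3000 appears in memory as 00 00 0B B8, because the memory page displays hexadecimal.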


This may seem like a lot of effort to accomplish a simple task, but using offsets is the very basis for the CPU's communication with the rest of the device. We will be using them for everything from managing device memory to displaying data on the screen, so it's important to have a working understanding of them.

In the next tutorial, we're going to learn about calling subroutines.


It's possible to load the RAM offset into r30 with just one arithmetic command. Can you figure out how? Hint: we're only one binary digit short when using a signed operation.

Bonus: Different kinds of saves & loads

While we have only been using "sw" and "lw", there are actually a few different kinds of save and load commands. The "w" in sw and lw stands for "word", meaning that it will load 4 bytes to completely fill the register. We can also use "h" for halfword, meaning it will load 2 bytes, filling the lower half of the register, and "b" for "byte", meaning it will just load one byte into the lowest byte of the register.

When using lw, the address we load from must be a multiple of 4, or the device will throw an exception and crash.

lh must be aligned to a multiple of 2, and lb can be any byte address.
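
The alignment rule boils down to a remainder check. Here is a minimal Python sketch of it; the table and function name are made up for illustration and say nothing about how the console actually raises the exception.

```python
ACCESS_SIZE = {"lw": 4, "lh": 2, "lb": 1}  # bytes moved by each load


def is_aligned(instruction: str, address: int) -> bool:
    """True when the address is a multiple of the access size."""
    return address % ACCESS_SIZE[instruction] == 0


for address in (0x8000, 0x8001, 0x8002):
    allowed = [name for name in ACCESS_SIZE if is_aligned(name, address)]
    print(f"{address:#06x}: {', '.join(allowed)}")
```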

Trap for new players

When using lh and lb, the signedness of the value you load is preserved. What this means is that the highest bit on the value you load will be copied into the unfilled parts of the register.

So, if you say

a    .byteu  10#128   ;(which is 1000 0000 in binary)(byteu means 'unsigned byte')
     lb      r1,a 

Then the value of r1 will be, in binary

1111 1111  1111 1111  1111 1111  1000 0000

This may seem unintuitive since you went to the trouble of declaring "a" as an unsigned byte, but the CPU doesn't know or care about typecasting - remember, it's just kicking numbers around.

So, if you want to load things without preserving the signedness, you need to use "lbu" and "lhu", or "load byte unsigned" and "load halfword unsigned". These commands will just fill the unused parts of the register with zeroes regardless of the value of the loaded information.
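
The difference between the two kinds of load is easy to model in Python. This sketch assumes standard two's-complement sign extension; load_byte and as_binary are hypothetical helpers, not anything the console provides.

```python
def load_byte(value: int, signed: bool) -> int:
    """Widen one byte to 32 bits, as lb (signed) or lbu (unsigned) would."""
    value &= 0xFF
    if signed and value & 0x80:
        value |= 0xFFFFFF00  # copy the top bit into the upper 24 bits
    return value


def as_binary(word: int) -> str:
    """Format a 32-bit word as four bytes of two nibbles each."""
    bits = f"{word & 0xFFFFFFFF:032b}"
    nibbles = [bits[i:i + 4] for i in range(0, 32, 4)]
    return "  ".join(" ".join(nibbles[i:i + 2]) for i in range(0, 8, 2))


print(as_binary(load_byte(128, signed=True)))   # what lb leaves in the register
print(as_binary(load_byte(128, signed=False)))  # what lbu leaves in the register
```

The first line comes out with the upper 24 bits set, and the second with them cleared.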


Finally, there is a command known as "lhi", or "load halfword immediate". This does not actually function like a load operation, and I prefer to refer to it as "load high".

It takes a 16-bit value and loads it into the top half of the given register, leaving the bottom half as zeroes. So, for example, if you say

       lhi   r1,1 

r1 will then contain

0000 0000  0000 0001  0000 0000  0000 0000

You can't use a register offset with this command (that is, you can't say lhi r1,1(r30)), so it really just functions as a strange addui instruction.
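
In arithmetic terms, lhi is just a shift left by 16 bits. Here is a one-function Python sketch, assuming the low half of the register is cleared as in the example above:

```python
def lhi(immediate: int) -> int:
    """Place a 16-bit value in the top half of a 32-bit word, low half zero."""
    return (immediate & 0xFFFF) << 16


print(f"{lhi(1):#010x}")       # lhi r1,1
print(f"{lhi(0xFFFF):#010x}")  # largest 16-bit immediate
```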