Graphics Processing Unit

From VO-EM Wiki

When it comes to putting images on the screen, the GPU does the bulk of the work.

Overview

The VO-EM CPU is very slow: it only runs at 72 kHz. As a result, it is not viable to draw data to the screen directly, pixel by pixel. Even without any other processing, it would take 3.3 seconds just to draw one frame!

Instead, we can write data to the GPU, and then tell the GPU how to organise this data on the screen. Thanks to that, it's possible to draw 30 frames per second, and still have time to do game logic processing as well.

Timing

Main article: Machine Cycle

The GPU works in 4 main phases. These occur in order, once every frame (30 times per second):

DMA phase 
The GPU will check for, and perform, pending DMA transfers 45 times per frame. The CPU performs up to 16 instructions between each poll.
Horizontal Blank 
The display buffer is written to in horizontal stripes. The GPU fires a horizontal blank Interrupt Request each time a stripe is complete. The default stripe height is 160px (one stripe per frame); however, this can be configured at run-time to fire more often (at most, it will draw 160 stripes, each 1px high, and fire 160 interrupts per frame). Up to 8 CPU instructions are performed per 1px of stripe height, so if the fidelity is set to 2px, there will be 16 instructions per stripe, and at the default there are 1280. It is possible to change the fidelity between horizontal blanks in the same frame.
Vertical Blank 
The image on the display is replaced by the image that we have been drawing for the last 1280 cycles, and an interrupt is fired.
Rest 
The GPU does nothing for 416 CPU cycles.
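The figures above can be cross-checked against the 72 kHz clock. The following is only a sketch: it assumes that one CPU instruction takes one cycle (the text uses the two terms interchangeably) and that the Vertical Blank phase itself costs no CPU cycles. All names are illustrative.

```python
# Per-frame CPU cycle budget, using only the figures quoted in this section.
DMA_POLLS = 45          # DMA polls per frame
CYCLES_PER_POLL = 16    # CPU instructions between polls (upper bound)
SCREEN_ROWS = 160       # 1px stripes per frame at the finest fidelity
CYCLES_PER_ROW = 8      # CPU instructions per 1px stripe
REST_CYCLES = 416       # idle period at the end of the frame
FRAMES_PER_SECOND = 30


def frame_budget():
    """Return the CPU cycles available in each phase of one frame."""
    return {
        "dma": DMA_POLLS * CYCLES_PER_POLL,
        "draw": SCREEN_ROWS * CYCLES_PER_ROW,
        "rest": REST_CYCLES,
    }


budget = frame_budget()
cycles_per_frame = sum(budget.values())
cycles_per_second = cycles_per_frame * FRAMES_PER_SECOND
```

Under these assumptions a frame is 720 + 1280 + 416 = 2416 cycles, and 30 frames come to 72480 cycles per second, which matches the quoted clock speed of roughly 72 kHz.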

Image Data

The first 0x6000 bytes (24 KB) of GPU memory are set aside for image memory, with the first 0x3000 bytes reserved for sprites and the second 0x3000 bytes reserved for tiles. This allows a total of 128 sprites and 128 tiles to be stored in the GPU's memory at any one time.

Each image is 16x16 pixels, and can consist of up to 8 colours. The 8-colour palette is not specified in image memory; instead, an image is stored as a grid of palette indices, to be colourised when it is drawn.

The image format is as follows: Each *horizontal row* of pixels is represented by 6 bytes of memory.

//One row of indexed pixels (each alphanumeric pair = one bit)
a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7 d0 d1 d2 d3 d4 d5 d6 d7
e0 e1 e2 e3 e4 e5 e6 e7 f0 f1 f2 f3 f4 f5 f6 f7

The left-most bit is the left-most pixel in the row, and the first byte in the row is the *least significant byte*. So, using the above example, the left-most pixel would be a0c0e0.

So in the example:
0xxx xxxx xxxx xxxx
1xxx xxxx xxxx xxxx
0xxx xxxx xxxx xxxx
The left-most pixel is 010 (=2), meaning the left-most pixel will be colourised by the third (0, 1, 2) colour in the palette assigned to it.

With this format, each image takes up 96 bytes of space (16 rows of 6 bytes). Images are indexed in 96-byte blocks, so if another module refers to the image at index 2, this would be the image data beginning at byte 192.
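As a rough illustration of the format, the sketch below decodes one 6-byte row into 16 palette indices and computes an image's byte offset. It is not a definitive decoder: it assumes the six bytes are stored in the order a, b, c, d, e, f, that the left-most bit of each pair is the most significant bit, and that the first pair (a, b) supplies the least significant bit of each index. The function names are made up for this example.

```python
IMAGE_SIZE = 16                                # pixels per side
BYTES_PER_ROW = 6                              # three 16-bit bitplanes
BYTES_PER_IMAGE = IMAGE_SIZE * BYTES_PER_ROW   # 96 bytes


def image_offset(index):
    """Byte offset of an image within its block of image memory."""
    return index * BYTES_PER_IMAGE


def decode_row(row_bytes):
    """Decode one 6-byte row into a list of 16 palette indices (0-7)."""
    if len(row_bytes) != BYTES_PER_ROW:
        raise ValueError("a row is exactly 6 bytes")
    # Assumption: each plane is two consecutive bytes, left-most bit first.
    planes = [int.from_bytes(row_bytes[i:i + 2], "big") for i in (0, 2, 4)]
    pixels = []
    for x in range(IMAGE_SIZE):
        shift = IMAGE_SIZE - 1 - x             # left-most pixel = top bit
        value = 0
        for bit, plane in enumerate(planes):
            # Assumption: the first plane holds the least significant bit.
            value |= ((plane >> shift) & 1) << bit
        pixels.append(value)
    return pixels


# The example from the text: only the middle plane has its left-most bit set.
example_row = bytes([0x00, 0x00, 0x80, 0x00, 0x00, 0x00])
example_pixels = decode_row(example_row)
```

Because the example value 010 reads the same in either direction, it does not settle which plane is most significant; swap the plane order if real hardware disagrees with the assumption made here.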

Palette

Palette data is stored from 0x6000 to 0x61FF, taking up a total of 512 bytes of space, and allowing for a total of 32 palettes. The first 16 palettes are reserved for sprite use, and the second 16 (from 0x6100) may only be used for tiles.

Each palette contains, in order, 8 colours, with each colour being represented with two bytes. The colour format is as follows:

XRRR RRGG GGGB BBBB

Where X is (currently) unused. Best practice is to leave it as 0 for now.
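A minimal sketch of packing a colour and locating it in palette memory is shown below. The helper names are hypothetical. The 0-31 channel range follows from the five bits each channel has in the layout above. How the two bytes are ordered in memory is not stated here, so the sketch returns the 16-bit value only.

```python
PALETTE_BASE = 0x6000
COLOURS_PER_PALETTE = 8
BYTES_PER_COLOUR = 2


def pack_colour(red, green, blue):
    """Pack three 5-bit channels into the XRRR RRGG GGGB BBBB layout."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 31:
            raise ValueError("each channel must be in the range 0-31")
    # The unused X bit (bit 15) is left as 0, as recommended.
    return (red << 10) | (green << 5) | blue


def colour_address(palette, colour):
    """GPU address of one colour entry (palette 0-31, colour 0-7)."""
    if not 0 <= palette <= 31 or not 0 <= colour < COLOURS_PER_PALETTE:
        raise ValueError("palette or colour index out of range")
    entry = palette * COLOURS_PER_PALETTE + colour
    return PALETTE_BASE + entry * BYTES_PER_COLOUR
```

For example, `pack_colour(31, 0, 0)` gives 0x7C00 (pure red), and `colour_address(16, 0)` gives 0x6100, the start of the tile palettes.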

Instance

Instances are the "sprites" of the device in the traditional sense. They can be drawn anywhere on the screen, and will be placed in front of the background tile layer, and behind the foreground tile and window layers.

A total of 64 instances can be managed, and they can all be drawn at the same time. They are controlled by the values beginning at 0x6200 in GPU memory. Each instance takes up 4 bytes of space.

Instances are drawn in order from lowest to highest in memory. So, if both the instance at 0x6200 and the instance at 0x6204 are active and overlapping, the instance at 0x6204 will be drawn second and hence be on top.

Instances are organized as follows:

 Options    Image       X         Y
TDVH PPPP 0III IIII XXXX XXXX YYYY YYYY
Byte 0    Byte 1    Byte 2    Byte 3

Options explained:

[T]ransparency 
Pixels indexed as 0 will not be drawn, instead of being colourized by the first colour in the palette.
[D]ouble 
This sprite will be drawn as 16x32 (double height). The bottom half of the sprite will be the next sprite in image memory. CAUTION: If this is set to 1, the least significant bit of the Image byte is ignored (i.e., if you try to draw a double-height instance starting at index 3, the low bit is dropped and you get a double-height instance starting from image data index 2 instead).
[V]ertical flip 
The image will be flipped vertically.
[H]orizontal flip 
The image will be flipped horizontally.
[PPPP]alette 
A value between 0 and 15 which selects which palette to colourize this instance's image data with.
[III IIII]mage 
A value between 0 and 127 which determines which image in image data to draw this instance as.
XXXX XXXX/YYYY YYYY 
The sprite's position on the screen (0-255). This is offset to the top left by 16 pixels, meaning that setting this value to 10 will draw the sprite partly off the screen (in effect drawing to -6). Setting either X or Y to 0 will disable drawing for this instance; this is recommended for instances that are not needed, as it saves GPU time.
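The instance layout can be sketched as a small encoder. This is an illustration rather than a reference implementation: it assumes the left-most letter in the layout is the most significant bit of its byte and that byte 0 sits at the lowest address. The function names are invented for the example.

```python
INSTANCE_BASE = 0x6200
INSTANCE_SIZE = 4
SCREEN_OFFSET = 16      # positions are offset to the top left by 16 pixels


def instance_address(slot):
    """GPU address of instance slot 0-63."""
    if not 0 <= slot <= 63:
        raise ValueError("slot must be in the range 0-63")
    return INSTANCE_BASE + slot * INSTANCE_SIZE


def encode_instance(image, palette, screen_x, screen_y,
                    transparent=False, double=False,
                    vflip=False, hflip=False):
    """Build the 4 bytes for one instance from on-screen coordinates."""
    x = screen_x + SCREEN_OFFSET
    y = screen_y + SCREEN_OFFSET
    if not 0 <= image <= 127 or not 0 <= palette <= 15:
        raise ValueError("image must be 0-127 and palette 0-15")
    if not 0 <= x <= 255 or not 0 <= y <= 255:
        raise ValueError("position does not fit in one byte")
    # Note: a stored X or Y of 0 disables drawing for the instance.
    options = ((int(transparent) << 7) | (int(double) << 6)
               | (int(vflip) << 5) | (int(hflip) << 4) | palette)
    return bytes([options, image, x, y])
```

For instance, `encode_instance(3, 5, -6, 0, transparent=True)` stores an X value of 10, which is the partly off-screen case described above.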

Tile Map

The memory space from 0x6400 to 0x69FF is used to organise tile-based images on the screen. Most commonly, it will be used for backgrounds and UI elements.

There are three layers of tiles available, with each being 16*16 tiles in size. Each tile is described by a two-byte halfword, meaning one layer of tiles is 512 bytes in total.

Tile map layers are not identical, although they are similar.

Tile0 
Starting at 0x6400, this layer is scrollable, but does not support transparency.
Tile1 
Starting at 0x6600, this layer is the same as Tile0, but supports transparency. It draws in front of instances.
Window 
Starting at 0x6800, this layer draws in front of everything. It does not support scrolling.

One tile in memory is organized as follows:


 Options    Image
VHT0 PPPP SIII IIII
Byte 0    Byte 1

[V]ertical flip 
The tile image will be flipped vertically.
[H]orizontal flip 
The tile image will be flipped horizontally.
[T]ransparency 
Pixels indexed as 0 will not be drawn, instead of being colourized by the first colour in the palette.
[PPPP]alette 
A value between 0 and 15 which selects which palette to colourize this tile's image data with.
[S]witch 
This must be set to draw tile 0. It is considered best practice to set this to 1 whenever you wish the tile to be drawn.
[III IIII]mage 
A value between 0 and 127 which determines which image in image data to draw this tile as. The tile will not be drawn if the entire lower byte is set to zero, so set S to 1 if you wish to draw the image at index zero. This is due to a hardware limitation.

Note that if the current layer does not support one of these options (such as transparency), said option simply does nothing.
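The tile entry can be encoded along the same lines. In the sketch below the S bit is always set, following the best practice noted above. The row-major ordering of tiles within a layer is an assumption (the text does not say how the 16x16 grid is laid out in memory), as is the convention that the left-most letter is the most significant bit. All names are hypothetical.

```python
TILE_LAYER_BASES = {"tile0": 0x6400, "tile1": 0x6600, "window": 0x6800}
MAP_WIDTH = 16          # tiles per row
MAP_HEIGHT = 16         # rows per layer
BYTES_PER_TILE = 2


def encode_tile(image, palette, vflip=False, hflip=False, transparent=False):
    """Build the two-byte entry for one tile, with the S bit always set."""
    if not 0 <= image <= 127 or not 0 <= palette <= 15:
        raise ValueError("image must be 0-127 and palette 0-15")
    options = ((int(vflip) << 7) | (int(hflip) << 6)
               | (int(transparent) << 5) | palette)
    return bytes([options, 0x80 | image])


def tile_address(layer, column, row):
    """GPU address of a tile entry, assuming row-major order."""
    if not 0 <= column < MAP_WIDTH or not 0 <= row < MAP_HEIGHT:
        raise ValueError("column and row must be in the range 0-15")
    index = row * MAP_WIDTH + column
    return TILE_LAYER_BASES[layer] + index * BYTES_PER_TILE
```

Setting S unconditionally means that image 0 encodes as a lower byte of 0x80 rather than 0x00, so it is still drawn.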

Layers

The tile maps described above are controlled by a set of registers starting at 0x6A00.

Each layer control register is 4 bytes long. There are 4 registers in total: one for each of the three tile layers and one for the instance layer. They all have identical options; however, some options are not supported on certain layers. The layout is as follows:


 Options    Trans       X         Y
000B BBBS TTTT TTTT XXXX XXXX YYYY YYYY
Byte 0    Byte 1    Byte 2    Byte 3

[B]lend mode 
Select layer blend mode (Future feature; currently unsupported)
[S]witch 
Enables or disables the layer.
[TTTT TTTT]ransparency 
Set opacity of layer (Future feature; currently unsupported)
XXXX XXXX YYYY YYYY 
X/Y offset for the layer. Note that these are signed integers, with a total range from -127 to 127. Caution: graphical artifacting may occur if you move a layer far enough that its bounding box is visible on the screen.

Direct Memory Access

It would take an exceedingly long time for the CPU to copy enough data to fill all of the image, palette and tile map memory in one go. As such, the GPU supports block copying of data to itself from anywhere in the device. This is termed Direct Memory Access (or DMA), as it bypasses the usual method of memory access and is much faster as a result.

During the DMA phase (see timing), the GPU checks 45 times to see if a DMA transfer has been requested. If it finds that one has indeed been requested, it performs the request, then fires a DMA Complete interrupt request.

DMA transfers are managed with four control registers, located starting at 0x7300. Each is four bytes long. They are as follows:

0x7300 DMASRC 
The address to begin transferring from. This is an absolute address in device memory.
0x7304 DMADST 
The address to transfer to. This is an address within the GPU, meaning that you do not include the GPU's offset in memory; i.e., to transfer something into the beginning of tile palette memory, set address 0x6100, not address 0x16100.
0x7308 DMALEN 
The length of the transfer in bytes. The DMA request will transfer this many bytes, starting from DMASRC, to the area specified in DMADST. There is no error checking, so misaligned writes could cause major data corruption to the GPU's memory. (It would, for example, be unwise to copy tile map data into your instance control memory.)
0x730C DMA_GO 
If the least significant bit of this address is not zero when polled, the GPU will perform the DMA request described in the previous three registers, then set this address to zero, before firing a DMA Complete interrupt request. It is fine to simply write one byte to 0x730F to set this register.
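The request/poll handshake described above can be modelled on a plain byte array, as in the emulator-style sketch below. Two details are inferred rather than stated: the GPU base offset of 0x10000 (from the 0x6100 versus 0x16100 example) and big-endian register storage (from the note that a single byte written to 0x730F sets the least significant bit). The function names are hypothetical, and the interrupt itself is left to the caller.

```python
GPU_BASE = 0x10000      # assumption, inferred from the 0x16100 example
DMASRC, DMADST, DMALEN, DMA_GO = 0x7300, 0x7304, 0x7308, 0x730C
REG_SIZE = 4


def write_reg(mem, reg, value):
    """Store a 4-byte register value (big-endian is an assumption)."""
    start = GPU_BASE + reg
    mem[start:start + REG_SIZE] = value.to_bytes(REG_SIZE, "big")


def read_reg(mem, reg):
    """Load a 4-byte register value."""
    start = GPU_BASE + reg
    return int.from_bytes(mem[start:start + REG_SIZE], "big")


def request_dma(mem, src, dst, length):
    """CPU side: describe the transfer, then set DMA_GO with one byte."""
    write_reg(mem, DMASRC, src)      # absolute device address
    write_reg(mem, DMADST, dst)      # GPU-relative address
    write_reg(mem, DMALEN, length)
    mem[GPU_BASE + DMA_GO + 3] = 1   # single-byte write to 0x730F


def poll_dma(mem):
    """GPU side: run a pending transfer; return True if one was performed."""
    if read_reg(mem, DMA_GO) & 1 == 0:
        return False
    src = read_reg(mem, DMASRC)
    dst = GPU_BASE + read_reg(mem, DMADST)
    length = read_reg(mem, DMALEN)
    mem[dst:dst + length] = mem[src:src + length]   # no bounds checking
    write_reg(mem, DMA_GO, 0)
    return True   # the caller would now fire the DMA Complete interrupt
```

A caller would invoke `request_dma` once and let the GPU model call `poll_dma` up to 45 times per frame; only the first poll after a request performs a copy.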

Control Registers

This section of GPU memory modifies the behaviour of the GPU itself. It begins at 0x7300.

0x7300 - 0x730F DMA controls 
Registers that regulate DMA transfers.
0x7314 DMA_IRQ 
Used to set the IRQ priority level for DMA transfer complete interrupts.
0x7318 VBLANK_IRQ 
Sets IRQ priority level for Vertical Blank interrupts.
0x731C HBLANK_IRQ 
Sets IRQ priority level for Horizontal Blank interrupts.
0x7320 MODE_IRQ 
Sets IRQ priority level for GPU Mode Change interrupts. (Future feature; currently unsupported)
0x7324 HBLANK_FIDELITY
Values range from 0-7. Selects the stripe height to draw in each horizontal blank. Values are as follows:
0 - 1px
1 - 2px
2 - 4px
3 - 8px
4 - 16px
5 - 32px
6 - 80px
7 - 160px
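The table can be turned into a small lookup that also reports how many horizontal blank interrupts fire per frame and how many CPU instructions fit in each stripe. This is a sketch with made-up names. It uses the 160px screen height and the 8 instructions per 1px row given in the Timing section, and it assumes the fidelity stays constant for the whole frame.

```python
STRIPE_HEIGHTS = (1, 2, 4, 8, 16, 32, 80, 160)   # indexed by HBLANK_FIDELITY
SCREEN_ROWS = 160
INSTRUCTIONS_PER_ROW = 8


def stripe_info(fidelity):
    """Return (stripe height in px, interrupts per frame, instructions per stripe)."""
    if not 0 <= fidelity <= 7:
        raise ValueError("HBLANK_FIDELITY must be in the range 0-7")
    height = STRIPE_HEIGHTS[fidelity]
    return height, SCREEN_ROWS // height, INSTRUCTIONS_PER_ROW * height
```

Every height in the table divides 160 exactly, so each setting yields a whole number of stripes per frame, from 160 interrupts at fidelity 0 down to a single interrupt at fidelity 7.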