IOWA STATE UNIVERSITY
DESIGN DOCUMENT
MAY1001 12/06/2009
LIST OF FIGURES
LIST OF TABLES
DEFINITIONS
EXECUTIVE SUMMARY
END PRODUCT AND DELIVERABLES
APPLICATION OF ENGINEERING PRINCIPLES
SYSTEM ANALYSIS
  FUNCTIONAL REQUIREMENTS
  NON-FUNCTIONAL REQUIREMENTS
  LIMITATIONS
  ASSUMPTIONS
  PLATFORM
FUNCTIONAL DECOMPOSITION
  NES OVERVIEW
  FPGA INTEGRATION
  CENTRAL PROCESSING UNIT
  PICTURE PROCESSING UNIT
DETAILED DESIGN
  NES OVERVIEW
  FPGA INTEGRATION
  CENTRAL PROCESSING UNIT
  PICTURE PROCESSING UNIT
  OTHER MODULES
    CLOCKS AND TIMING
    SYSTEM CONTROL
    CONTROLLER POLLING
    AUDIO
TESTING AND EVALUATION
REFERENCES
LIST OF FIGURES
Figure 1: NES System Overview
Figure 2: FPGA Function
Figure 3: CPU Function
Figure 4: High-Level Overview of the NES System
Figure 5: NES Schematic with CPU and PPU
Figure 6: FPGA Flow Schematic
Figure 7: NES Controller Pin Layout
Figure 8: TFT Video Signal Model
Figure 9: Detailed CPU Schematic
Figure 10: PPU Memory Map
Figure 11: PPU Schematic
LIST OF TABLES
Table 1: Definitions
Table 2: Status Register
Table 3: PPU Register Descriptions
DEFINITIONS
NES: Nintendo Entertainment System
FPGA: Field-programmable gate array
PPU: Picture Processing Unit
I/O: Input/Output
VGA: Video Graphics Array
VHDL: Very High Speed Integrated Circuit Hardware Description Language
CPU: Central Processing Unit
ALU: Arithmetic Logical Unit
Sprite: Objects of the screen images that can be moved on the screen
SPR-RAM: Sprite RAM
VRAM: Video RAM
DMA: Direct Memory Access
Table 1: Project Definitions
EXECUTIVE SUMMARY
Iowa State University’s electrical and computer engineering department prides itself on being a leader in technical advancements in its fields. One of these fields is reconfigurable hardware, specifically field-programmable gate arrays (FPGAs): manufactured integrated circuits that are programmed in the field. FPGAs allow for reconfigurable computing and reconfigurable systems, which are highly valuable today because they can carry out multiple functions at a very low cost.
To help Iowa State University further itself as a leader and advance student learning at the same time, electrical and computer engineering faculty proposed an FPGA-based project. The emulation of the Nintendo Entertainment System (NES) on an FPGA board would showcase Iowa State University’s expertise in reconfigurable hardware.
A project team consisting of four students from Iowa State University has been created to plan, design, implement, and test the FPGA emulation of the original NES. The team has experience in processor design and computer architecture through computer engineering coursework taken at Iowa State.
In the following document, the design team will outline the comprehensive design of the emulation project. The document will focus on the system as a whole, as well as the evaluation of the decomposition of the system into specific components. The main components of the project system are the central processing unit (CPU) and picture processing unit (PPU). Along with those major components, many other small components must be implemented to create a working emulation. These include, but are not limited to, data buses, hardware integration, input/output and memory mapping, and user interfaces, both input and output.
END PRODUCT AND DELIVERABLES
The final design will consist of an FPGA board emulating the internal components of the Nintendo Entertainment System. This includes the central processing unit, the picture processing unit, and the input/output functions of the original system. The input/output functions will include reading input from the original NES controllers, reading from the original NES game cartridge, and outputting the audio and video to a monitor. No other software or hardware deliverables will be produced, aside from project documentation.
APPLICATION OF ENGINEERING PRINCIPLES
Our emulation of the Nintendo Entertainment System makes use of many engineering principles that we have encountered in our coursework. The largest application is our knowledge of computer architecture and hardware design. Since our emulator is hardware based and we are creating two processors, an emulation of the 6502 processor and a picture processing unit, our VHDL skills are being heavily used. Given that we are working with processors, we are also using our assembly programming skills. Lastly, we are making use of our digital design skills and our embedded systems skills, since our platform is an FPGA.
SYSTEM ANALYSIS
FUNCTIONAL REQUIREMENTS
NES Emulated System
FR 1.1 – System must accurately emulate the original NES and its features.
FR 1.2 – System must run a game from an NES ROM file stored on a CompactFlash memory card.
FR 1.3 – System must accept input from an original NES controller connected to the FPGA.
FR 1.4 – System must be able to output information to a display or screen at the original NES frame rate.
FR 1.5 – System must be able to transfer the NES assembly code from the memory card to the FPGA inputs where the VHDL code can have access to the input.
FR 1.6 – System must be able to handle all assembly code instructions and memory management that the original NES handled.
FPGA Required
FR 2.1 – System must be implementable on the Xilinx ML507 FPGA board used by the Computer Engineering department.
FR 2.2 – System must be able to properly synthesize all VHDL code into the configuration needed to program the logic resources on the FPGA.
FR 2.3 – FPGA must be able to run the given VHDL design at the required operating frequencies.
FR 2.4 – FPGA must be able to allocate resources to create a buffer for the VGA output.
FR 2.5 – FPGA must be able to accept input from an original NES controller through an I/O port or pin on the board.
FR 2.6 – FPGA must be able to store an NES ROM file and access it when the emulation starts.
FR 2.7 – FPGA must be able to perform operations to push the VGA buffer information (pixel data) to a display connected to it.
FR 2.8 – FPGA must provide sufficient logic resources to recreate the components of the original NES design.
FR 2.9 – FPGA must be able to alter its clock speed range to meet the multiple clock speeds required by the components contained in the NES.
CPU
FR 3.1 – CPU must be implemented through VHDL components and be able to be tested through the Xilinx tools and ModelSim.
FR 3.2 – CPU must implement its own clock frequencies based off of a given clock frequency from the FPGA board.
FR 3.3 – CPU clock frequency must match the original NES frequency closely enough that any difference in the speed of play in NES games is negligible.
FR 3.4 – CPU must properly decode ROM instructions and handle the different numbers of clock cycles that different instructions take.
FR 3.4 – CPU must be able to handle the various clock cycle counts for different types of instructions without forcing an unnecessary idle.
FR 3.5 – CPU registers must be able to handle Read/Write capabilities at appropriate speeds.
FR 3.6 – CPU must contain logic based off of input and current instruction to control function of the CPU and PPU.
FR 3.7 – CPU must contain an Arithmetic Logic Unit to perform computations needed for proper functionality.
PPU
FR 4.1 – PPU must be able to display images to a connected display.
FR 4.2 – PPU must contain space on the FPGA for Video RAM (VRAM) and be able to retrieve data from it within at most 2 PPU clock cycles.
FR 4.3 – PPU must contain space on the FPGA for Sprite RAM (SPR-RAM) and be able to retrieve data from it within at most 2 PPU clock cycles.
FR 4.4 – PPU must be able to hold a color palette and color generator that matches the NES original ability.
FR 4.5 – PPU must be able to retrieve information directly from the ROM file and decode the image attributes, color, and horizontal and vertical location to display.
FR 4.6 – PPU registers must be able to handle Read/Write capabilities at appropriate speeds.
FR 4.7 – PPU must be able to send information of each created pixel to a buffer storing the image on the FPGA before outputting to the display.
FR 4.8 – PPU must be implemented through VHDL components and be able to be tested through the Xilinx tools and ModelSim.
FR 4.9 – PPU must implement its own clock frequencies based off of a given clock frequency from the FPGA board.
I/O
FR 5.1 – Input functionality of the system must come from an external source, via an NES controller or other NES device.
FR 5.2 – Output functionality of system must be sent to an external source, the board must not output information on its own hardware.
FR 5.3 – Input/output functionality must continuously poll the input device and pull information to the CPU when a sequence has been hit.
FR 5.4 – When original NES input devices are used all functions of the device must be accounted for and properly working before use.
NON-FUNCTIONAL REQUIREMENTS
NFR 1.1 – All components of the CPU, PPU, and I/O must be written in VHDL to properly work with testing of the entire system.
NFR 1.2 – All changes to any VHDL or other code must be placed into the project’s SVN repository so that they can be documented and backed up.
NFR 1.3 – All code must be placed on the SVN so that the code can go through a code review at any time and place.
NFR 1.4 – All code must be well commented and any related information regarding more documentation on the code’s use must be placed in the SVN.
LIMITATIONS
LM 1.1 – The FPGA boards have a limited amount of logic resources so the entire system must fit on it.
LM 1.2 – The maximum required clock speed required by our implementation cannot exceed the FPGA’s maximum clock.
LM 1.3 – Our system must be compatible with NES assembly and implement compatible components.
ASSUMPTIONS
AS 1.1 – The two-semester time period is adequate to complete the project.
AS 1.2 – The Xilinx FPGA boards and development tools provided by the ECpE department contain enough logic resources to emulate the full NES.
AS 1.3 – The Xilinx FPGA boards and development tools provided by the ECpE department have a clock generator that can be accurately adjusted to the speed required by the NES components.
AS 1.4 – The NES documentation online is enough to determine the internal components of the NES.
AS 1.5 – The internal components are documented well enough to be able to emulate their functionality.
AS 1.6 – NES-compatible compilers exist so we can create our own programs.
PLATFORM
Our Nintendo Entertainment System emulator will be implemented on the Xilinx ML507, a general-purpose FPGA board. Its FPGA contains a 160 x 38 array of configurable logic blocks; 11,200 Virtex-5 slices, each containing four look-up tables and four flip-flops; and 128 DSP48E slices, each containing a 25 x 18 multiplier, an adder, and an accumulator. The board also contains 256 MB of DDR2 RAM, VGA/DVI output, PS/2 ports for a keyboard and mouse, and audio output. The board’s clock runs at a maximum of 550 MHz. This board should be more than adequate for the task at hand, considering that the Nintendo Entertainment System’s original processor ran at 1.79 MHz and contained only 2 KB of onboard RAM.
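Requirements FR 3.2 and FR 4.9 call for the CPU and PPU to derive their own clock rates from a faster board clock. One common way to do this in VHDL is a clock-enable divider, sketched below. The entity name, port names, and the default DIVIDE_BY value are illustrative assumptions, not project values. For example, dividing a 21.48 MHz source by 12 would give roughly the 1.79 MHz CPU rate mentioned above, but the source frequency actually available on the board is not specified here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical clock-enable generator: pulses 'enable' for one board-clock
-- cycle out of every DIVIDE_BY cycles.
entity clock_enable_divider is
    generic (
        DIVIDE_BY : positive := 12  -- illustrative ratio, not a project value
    );
    port (
        clk    : in  std_logic;  -- fast board clock
        reset  : in  std_logic;  -- synchronous, active high
        enable : out std_logic   -- one-cycle pulse at clk / DIVIDE_BY
    );
end entity clock_enable_divider;

architecture rtl of clock_enable_divider is
    signal count : natural range 0 to DIVIDE_BY - 1 := 0;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                count  <= 0;
                enable <= '0';
            elsif count = DIVIDE_BY - 1 then
                -- Terminal count reached: wrap around and emit the pulse.
                count  <= 0;
                enable <= '1';
            else
                count  <= count + 1;
                enable <= '0';
            end if;
        end if;
    end process;
end architecture rtl;
```

Using an enable pulse rather than a second, divided clock keeps every register on the single board clock, which generally simplifies timing analysis on an FPGA.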
FUNCTIONAL DECOMPOSITION
NES OVERVIEW
Overall, the NES system consists of a few major component blocks. The two largest (and more significant) blocks are the CPU and PPU. These handle the major processing and function of the system on the Xilinx board. The other, smaller components are the different RAM blocks, the input/output control, the game cartridge, and the data and control buses. The components are described below in detail, as well as illustrated in Figure 1.
CPU: This component handles all of the instructions from the gaming module. In other words, each game instruction must pass through and be decoded and processed by the CPU in order to pass that information to other components of the system. Each instruction tells the NES system how to change the game that is currently being played. The CPU consists mainly of instruction decode and control logic, an arithmetic logic unit (ALU), branch logic, registers, and specific outputs to other components, like the PPU. The CPU accesses a memory shared with other components to do its calculations, and also uses user input to determine how to proceed.
PPU: The PPU controls all of the visual aspects of the NES. The PPU receives its instructions of what to put on the screen from the CPU. From there, it processes that information and accesses its own memory to retrieve the right images and colors to display. The PPU has its own RAM that stores all of the sprite data and other video processing data.
Game Cartridge: The game cartridge is a key element in the function of the system. The instructions to run a game come from this game cartridge dynamically, telling each component how to operate. It is our hope that we will be able to integrate an actual NES game cartridge, but at the very least, we aim to use a ROM file to feed instructions into the CPU.
RAM: There are several different memory modules in the NES system. As previously discussed, the PPU has its own RAM to access and store visual data in. There is also a more general memory block, in which the CPU can access and store the data it needs for its calculations. The memory blocks are 2KB in size.
Data and Control Bus: The data and control buses provide a very important link between each component. The data bus is able to be read and written by each component, but the control bus can only be written to by the CPU as it computes its instructions. The data bus is the method in which the CPU and PPU share data. The CPU will write data to the data bus, of which certain data will indicate that the PPU will start processing.
I/O: The input and output control is the last major component of the NES system. This block processes the input from the user as well as other components, and the output processes the data that is being transmitted outside the system to the display monitor.
The I/O module uses the data and control buses to access the information from the CPU and PPU that affect the output of the system.
Figure 1: NES System Overview (diagram labels: PRAM, SRAM; bus widths of 16 bits and 8 bits)
FPGA INTEGRATION
The field-programmable gate array (FPGA) is the core hardware component that provides the ability to emulate the Nintendo Entertainment System (NES) through reprogrammable hardware. It has three main components that the emulation of the NES will use: Logic Resources, Memory Resources, and Physical Connections.
Logic Resources (FPGA): The Xilinx ML507 board contains logic resources that can be programmed to fit the individual program loaded to the board. Once the VHDL code of the CPU, PPU, and other modules is loaded onto the FPGA board, the logic resources are configured to meet the demand of the program. This is the main limitation of the FPGA board for the NES emulation because of the large number of components created to support playing an NES game.
Memory Resources (FPGA): The FPGA board contains its own memory for use by the programs loaded into the FPGA. It contains 256 MB of DDR2 memory, 1 MB of SRAM, and 32 MB of flash memory, along with multiple ports for additional memory space. Since the NES was very limited in its memory space, the FPGA board provides much more memory space than is needed for the emulation.
Physical Connections (FPGA): The FPGA board offers multiple physical connections that will be used by the NES emulation. The physical connections have to be able to accept input from the user’s controller, transfer the game’s instructions and memory to the board, and send the output image to a display.
Input (FPGA): The FPGA board offers multiple USB ports, a CompactFlash card slot, and serial connections that can be used to capture the input from the user, hold the NES ROM file, and connect the original NES controllers to the board.
Output (FPGA): The FPGA board has DVI/VGA video output and multiple audio outputs to handle the transmission of the image to the display. It also has several LEDs for debugging purposes.
Figure 2: FPGA Function (blocks shown: Logic Resources, Memory Resources, CPU, PPU, and Physical Connections with Input and Output)
CENTRAL PROCESSING UNIT
Figure 3: CPU Function
Instruction Fetch: In this portion of the CPU’s execution cycle the next instruction is fetched from memory.
Instruction Decode: In this stage the instruction is decoded into an opcode; based on the opcode, the operands for the instruction (if there are any) can then be evaluated. The 6502 processor has eight addressing modes, so each instruction could have up to eight different ways of being executed, because the addressing mode defines how the operands are evaluated.
Control Logic Generation: Here the decoded instruction is turned into a series of signals that control various parts of the processor. An example of this is a write enable signal which tells the arithmetic logic unit to write to a register or not.
Instruction Execution: Here the signals generated by the control logic generator are used to execute an instruction. Arithmetic is performed at this stage and information can be stored to registers.
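The four stages above can be pictured as a small state machine that steps from fetch through execute and then repeats. The sketch below is a simplification for illustration only: it assumes one clock cycle per stage, whereas real instructions take varying numbers of cycles, and the entity, port, and state names are hypothetical rather than taken from the project code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sequencer that walks through the four CPU stages in order.
entity cpu_stage_sequencer is
    port (
        clk        : in  std_logic;
        reset      : in  std_logic;                    -- synchronous, active high
        stage_code : out std_logic_vector(1 downto 0)  -- current stage, for observation
    );
end entity cpu_stage_sequencer;

architecture rtl of cpu_stage_sequencer is
    type stage_t is (FETCH, DECODE, CONTROL, EXECUTE);
    signal stage : stage_t := FETCH;
begin
    -- Advance to the next stage on every rising clock edge.
    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                stage <= FETCH;
            else
                case stage is
                    when FETCH   => stage <= DECODE;
                    when DECODE  => stage <= CONTROL;
                    when CONTROL => stage <= EXECUTE;
                    when EXECUTE => stage <= FETCH;
                end case;
            end if;
        end if;
    end process;

    -- Encode the stage so it can be watched in simulation.
    with stage select stage_code <=
        "00" when FETCH,
        "01" when DECODE,
        "10" when CONTROL,
        "11" when EXECUTE;
end architecture rtl;
```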
PICTURE PROCESSING UNIT
The Picture Processing Unit (PPU) is the part of the NES that is responsible for generating the video signal required to display the screen image to the user. It uses information stored in special PPU memories to determine the background color as well as the location of the sprites on the screen. The CPU does not have direct access to the PPU memory; instead, it loads values into the PPU’s memory via specially designed PPU access registers.
Functional States
Memory Loading (Vblank): In this state the CPU has access to the PPU memory and can make any necessary changes to values corresponding to sprite color, sprite position, or background color. During this state nothing is rendered and the screen is blank.
Image Rendering: Once this state is signaled, the PPU starts to process and render the screen image. It uses the values stored in the Sprite RAM (SPR-RAM) and Video RAM (VRAM) to generate the picture. The image is generated pixel by pixel, starting in the upper left corner and moving across the screen one pixel at a time. At the end of a line, rendering moves back to the left side and continues on the next line. During this state the CPU cannot change any of the PPU’s memory, as doing so could corrupt the picture.
Components
Sprite-RAM: 256-byte memory used to store the position and color information of each sprite
Video RAM: 4 KB memory used to store screen information, background details, and additional sprite-related data
Access Bus: The PPU has two access buses. One is used only for accessing dedicated PPU memory. The other is a common shared bus that allows the CPU to pass data to the PPU
I/O Registers: 8-bit registers used by the CPU to interact with the various components of the PPU
Rendering Logic: The part of the PPU that actually creates and renders the screen image
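The left-to-right, top-to-bottom order described for the image rendering state can be sketched as a pair of counters. The sketch below assumes the 256 x 240 picture resolution stated elsewhere in this document. The entity and port names are hypothetical, and the blanking period between frames is not modeled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical raster position counter for a 256 x 240 image.
entity raster_counter is
    port (
        clk        : in  std_logic;
        reset      : in  std_logic;             -- synchronous, active high
        pixel_en   : in  std_logic;             -- advance one pixel when '1'
        x          : out unsigned(7 downto 0);  -- column, 0 to 255
        y          : out unsigned(7 downto 0);  -- row, 0 to 239
        frame_done : out std_logic              -- '1' while on the last pixel of the frame
    );
end entity raster_counter;

architecture rtl of raster_counter is
    constant LAST_X : natural := 255;
    constant LAST_Y : natural := 239;
    signal x_reg : unsigned(7 downto 0) := (others => '0');
    signal y_reg : unsigned(7 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                x_reg <= (others => '0');
                y_reg <= (others => '0');
            elsif pixel_en = '1' then
                if x_reg = LAST_X then
                    -- End of line: return to the left edge and move down one row.
                    x_reg <= (others => '0');
                    if y_reg = LAST_Y then
                        y_reg <= (others => '0');  -- end of frame: back to the top
                    else
                        y_reg <= y_reg + 1;
                    end if;
                else
                    x_reg <= x_reg + 1;
                end if;
            end if;
        end if;
    end process;

    x <= x_reg;
    y <= y_reg;
    frame_done <= '1' when (x_reg = LAST_X and y_reg = LAST_Y) else '0';
end architecture rtl;
```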
DETAILED DESIGN
NES OVERVIEW
A more detailed analysis of the NES system as a whole will help us understand how all of the individual components are integrated and function as a whole. This section will outline, in more detail, the operation of the PPU, CPU, and game file (ROM) together, as well as describe the inputs and outputs of the system from a high-level view. The diagram in Figure 4 is a very high-level image of the NES system. Its purpose is to create a visual connection for the reader to the system and how our team visualizes the project end result.
Figure 4: High-Level Overview of the NES System
The NES system is composed mainly of the CPU and PPU, as mentioned several times before. These two elements are the key factors in the functionality of the system overall. The next diagram takes a much more detailed look at the ML507 Xilinx board (from Figure 4 above), focusing on the CPU and PPU.
As seen in Figure 5, the CPU and PPU share data between themselves via a System Control component in order to execute instructions and computations.
The CPU takes interrupts and clocks (as well as the instruction) as input and outputs data and address values that the PPU reads in order to recognize when to start displaying on the monitor. The PPU reads the data and outputs the visual image (video signal) for the entire system. The CPU is also responsible for outputting the audio signal from the system. Outputs from both the CPU and PPU compose the data necessary to play a game on the NES. The game (ROM file) is accessed via CompactFlash, which feeds instructions and other information into the CPU.
Figure 5: NES Schematic with CPU and PPU (blocks shown: System Control, CPU, PPU, Controller Polling, Audio, Clock Generation, Memory, ROM Interface, VGA)
Once the overall VHDL replication of the NES has been created, the next step is to load that code onto the FPGA board. This is done by first synthesizing the code, then downloading it to the board, using the Xilinx development suite.
Design Tradeoffs
Game Cartridge: In our system, we hope to be able to implement game play using a cartridge that works with the original NES system. However, the base requirement is that we be able to play a game using the ROM files that are stored on the cartridge.
Using a physical cartridge would require additional hardware to connect the cartridge and logic to integrate it with the rest of the components. To implement the game cartridge would require more pins than available on the FPGA board.
Controller/User Input: Preferably, we will use the NES controllers to provide user input to the system during game play. An alternative is implementing user input using a keyboard; however, this requires using the PS/2 port and adding additional modules for implementation.
Video Output: The system must output video signals in order to be able to display the visual effects of the game. Our goal is to convert this output to VGA or component output in order to use the system with a monitor.
Audio Output: The design of the NES CPU requires an additional module to be created to handle the audio output for the NES game. The functionality provided by audio output would not be worth the time needed to design the output module, since audio is not used in the games other than for aesthetics. The excess information produced by the CPU in order to synthesize audio will be sent to a non-functional audio module.
FPGA INTEGRATION
The FPGA has three critical functional components that need to be addressed. The first is the ability to load the NES VHDL module onto the FPGA and appropriately set the logic and memory resources on the board. The second is the handling of input signals from the NES controllers and of ROM file information sent to the CPU inputs. The last is the ability to handle VGA frame creation, individual frame buffering, and output of the frame to a display or monitor.
Figure 6: FPGA Flow Schematic (blocks shown: NES Controller Input 1, NES Controller Input 2, NES ROM File, Computer, Xilinx Transfer, UI Port 1, UI Port 2, Game Port 3, Input Connections, Logic Resources, Memory Resources, Video VGA Buffer, Output Connections, FPGA Board, Display/Monitor)
Input Connections: The NES system has several forms of input that need to be addressed.
User Input (2): The NES offered two user inputs. Each controller connects through a 7-pin input and provides a four-directional pad (D-pad), two buttons (A, B), and start and select buttons.
Design Tradeoff: Keyboards are readily available for use as an input device; however, it would be an unwise use of time to set up the PS/2 interface for the FPGA board, relay information from the keyboard, and map commands to the individual buttons needed for NES operation. Instead, the NES controller can be used, with its signals sent through an adapter that connects to I/O pins. The I/O pin information can then be read as input by the NES emulation.
Overview: The controller uses a 7-pin interface to provide user input to the NES polling module. The controllers could also be interchanged with other NES input devices, such as the Zapper or Running Pad, that use the same 7-pin setup. The 7-pin connection would use a custom adapter, readily available for purchase, to attach to the I/O pins on the FPGA board so that they can be read by the NES emulation. This requires following the NES controller protocol, in which the controller sends 1 bit of data per clock cycle for 8 clock cycles, each bit corresponding to whether a given button is being pressed. The controllers are powered through the FPGA board.
Parts needed: 2 NES controllers, 2 NES controller sockets, wired pins to FPGA board
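The 8-bit serial read described in the overview above could be captured with a shift register along the following lines. This is a minimal sketch under stated assumptions: the entity and port names are hypothetical, the pulse that starts a read and the per-bit timing are supplied from outside, and the signal polarity and the order in which buttons arrive are not specified here and would need to be confirmed against the controller documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical receiver that collects 8 serial bits from a controller.
entity controller_shift_in is
    port (
        clk       : in  std_logic;
        reset     : in  std_logic;  -- synchronous, active high
        start     : in  std_logic;  -- pulse to begin a new 8-bit read
        bit_en    : in  std_logic;  -- '1' when a controller bit is valid on serial_in
        serial_in : in  std_logic;  -- serial data pin from the controller
        buttons   : out std_logic_vector(7 downto 0);  -- first bit received ends up in bit 7
        valid     : out std_logic   -- '1' once all 8 bits have been captured
    );
end entity controller_shift_in;

architecture rtl of controller_shift_in is
    signal shift_reg : std_logic_vector(7 downto 0) := (others => '0');
    signal bits_left : natural range 0 to 8 := 0;
    signal done      : std_logic := '0';
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                bits_left <= 0;
                done      <= '0';
            elsif start = '1' then
                -- Begin a new read: expect 8 bits and clear the valid flag.
                bits_left <= 8;
                done      <= '0';
            elsif bit_en = '1' and bits_left > 0 then
                -- Shift left and place the newest bit in position 0.
                shift_reg <= shift_reg(6 downto 0) & serial_in;
                bits_left <= bits_left - 1;
                if bits_left = 1 then
                    done <= '1';  -- this was the eighth bit
                end if;
            end if;
        end if;
    end process;

    buttons <= shift_reg;
    valid   <= done;
end architecture rtl;
```

Two instances of such a block, one per controller socket, would cover the two user inputs.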
Figure 7: NES Controller Pin Layout
NES Cartridge/ROM File: The NES had one input for NES video games, a 72-pin connection to the NES hardware.
Design Tradeoff: For an emulation of the NES system, it is not necessary to implement an adapter for the game cartridge. Instead, the project will emulate the system by placing a ROM file of the video game on the CompactFlash card and obtaining the input information from that file.
Overview: The ROM file will be stored on the CompactFlash card, and the system will pull information from this memory location. The ROM file acts as a temporary NES video game cartridge so the CPU and PPU can pull instructions and memory as they would in the original design. On initialization of the NES emulation, the ROM file will be copied into an FPGA memory core that can be accessed by the NES components.
Power/Reset: The NES had two control buttons, one for powering on the system and one for resetting the system when changing or restarting a game.
Design Tradeoff: The power button is implemented by starting the NES emulation when the FPGA board’s power switch is turned on. The reset functionality will be tied to a button located on the FPGA board. This avoids setting up another port connection and transferring additional information to the NES emulation.
Memory Resources: The memory resources are being used by three parts: CPU, PPU, and VGA buffer.
CPU: 2KB.
PPU: 256 bytes for sprite data storage, 32 bytes for palette storage, and 4 KB for tile sets.
Video VGA Buffer: 2KB external RAM for graphics information storage.
Output Connection: The output connection of the NES was composite video, using an analog signal to send information to the display.
Design Tradeoff: Since the original NES video output was composite and the FPGA board outputs only DVI/VGA, a different approach must be taken. To follow how the NES color formatting and palettes created the pixel color information, the data must be converted into a VGA format and then output.
Overview: The Xilinx ML507 board has a Thin Film Transistor (TFT) controller that gives the ability to control VGA output. The core for using the TFT is included with the Xilinx development suite. The hardware control receives pixel data from a separate memory device and outputs it while controlling the timing for each of the lines of the displayed frame. The output process can be turned on and off by enabling or disabling the “enable bit” of the TFT’s control register. The output can use either DVI or VGA by setting the C_TFT_INTERFACE value to ‘0’ for DVI or ‘1’ for VGA.
Figure 8: TFT Video Signal Model
The TFT controller uses the PLB bus to access the pixel information stored in PLB memory. The video memory expected by the TFT controller is RGB pixel information, with each pixel contained in a 32-bit word. The memory should be a 2 MB region consisting of 1024 data words (1 word = 32 bits) for each of 512 lines to complete the frame. Of this 1024 x 512 memory space, only the first 640 x 480 is used to display the image.
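The memory layout above implies a simple address calculation: 1024 words per line times 512 lines times 4 bytes per word gives 2,097,152 bytes, which is the 2 MB region. A minimal sketch of the offset computation follows, assuming lines are stored one after another starting at offset zero. The package and function names are hypothetical, and the base address of the region is left to the caller.

```vhdl
-- Hypothetical helper for locating a pixel word inside the TFT video memory.
package tft_addr_pkg is
    constant WORDS_PER_LINE : natural := 1024;  -- memory words reserved per line
    constant BYTES_PER_WORD : natural := 4;     -- one 32-bit word per pixel

    -- Byte offset of a pixel from the start of the video memory region.
    function pixel_byte_offset(line_idx : natural; column_idx : natural) return natural;
end package tft_addr_pkg;

package body tft_addr_pkg is
    function pixel_byte_offset(line_idx : natural; column_idx : natural) return natural is
    begin
        -- Skip whole lines first, then step across to the requested column.
        return (line_idx * WORDS_PER_LINE + column_idx) * BYTES_PER_WORD;
    end function pixel_byte_offset;
end package body tft_addr_pkg;
```

Under this assumption, the first pixel of the second line (line_idx = 1, column_idx = 0) sits at byte offset 4096, because each line occupies 1024 words even though only 640 of them are displayed.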
Display: The output of the NES had a picture resolution of 256 x 240 pixels.
Design Tradeoff: The TFT used by the FPGA board to output VGA has a resolution of 640 x 480 pixels. As a result, the original image spans only 40% of the width and 50% of the height of what the FPGA board outputs. This gives two options:
Option 1: The original design is followed and the output is displayed at 256 x 240 pixels, either in the upper left corner of the display or centered in the middle of the display.
Option 2: Each of the 256 x 240 pixels can be duplicated, doubling both dimensions to 512 x 480 pixels without affecting the operation of the NES system. The image will be displayed in the center of the screen with any unused space set to black.
Overview:
Option 1: No alteration to the VHDL code is needed except for sending additional pixels, set to black, for the unused space on the screen. This requires minimal changes in the overall design of the PPU, changing only the image resolution displayed.
Option 2: When the NES emulation starts, the memory holding the pixel information should be initialized to black. Then, the first created pixel is placed in the memory location of the 65th pixel so that the image is centered on the screen. When an individual pixel’s data is created by the PPU’s rendering process, it must be stored into memory, and an exact copy must be stored as the following pixel to be displayed. Once a line of pixels has been rendered (640 pixels wide), the line must be duplicated and stored as the line below. This process is repeated until the last pixel of the original image (256 x 240 pixels) has been completed.
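The placement rule in Option 2 can be written as two small mapping functions. The doubled image is 512 pixels wide, so centering it on a 640-pixel line leaves (640 - 512) / 2 = 64 black pixels on each side, which is why the first pixel lands in the 65th position (index 64). The sketch below is illustrative only, and the package, constant, and function names are hypothetical.

```vhdl
-- Hypothetical coordinate mapping for the 2x pixel-doubling option.
package pixel_double_pkg is
    constant SCALE    : natural := 2;   -- each NES pixel becomes a 2 x 2 block
    constant X_OFFSET : natural := 64;  -- (640 - 512) / 2 black pixels on the left

    -- Leftmost output column of the block for NES column nes_x (0 to 255).
    function out_column(nes_x : natural) return natural;
    -- Topmost output line of the block for NES row nes_y (0 to 239).
    function out_line(nes_y : natural) return natural;
end package pixel_double_pkg;

package body pixel_double_pkg is
    function out_column(nes_x : natural) return natural is
    begin
        return X_OFFSET + SCALE * nes_x;
    end function out_column;

    function out_line(nes_y : natural) return natural is
    begin
        return SCALE * nes_y;
    end function out_line;
end package body pixel_double_pkg;
```

Each NES pixel is then written to four locations: columns out_column(nes_x) and out_column(nes_x) + 1 on lines out_line(nes_y) and out_line(nes_y) + 1. The last NES pixel (255, 239) therefore fills columns 574 and 575 on lines 478 and 479.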
CENTRAL PROCESSING UNIT
Figure 9: Detailed CPU Schematic
CPU Components
ALU Hold Register: This register holds the output of the Arithmetic Logic Unit’s operation.
Accumulator: This register holds intermediate data for the CPU.
Address Bus High Register: This register holds the upper section of the address bus.
Address Bus Low Register: This register holds the lower section of the address bus.
Arithmetic Logic: This piece of logic performs arithmetic operations on non-decimal 8-bit values. It can perform the following operations: addition, AND, OR, XOR, and shift right.
Bus Interface: This interface allows for the transfer of data into and out of the CPU.
Clock Generator: This component generates the clock for the processor.
Control Logic: This logic outputs all the signals necessary to execute an instruction based on the instruction’s opcode and operands.
Data Bus Input: This bus brings data into the processor.
Data Output Register: This register holds information that is leaving the processor.
Input Register A: This register holds one of the operands going into the arithmetic logic unit.
Input Register B: This register holds one of the operands going into the arithmetic logic unit.
Instruction Decode: This logic takes an instruction and decodes it to determine how the control logic should proceed with it.
Interrupt & Reset Control: If an interrupt is set or the processor is reset this piece of logic determines how the CPU will react to it.
Program Counter High Increment Logic: This piece of logic determines how to increment the upper portion of the program counter.
Program Counter High Register: This register holds the upper portion of the program counter.
Program Counter Low Increment Logic: This logic determines how to increment the lower portion of the program counter.
Program Counter Low Register: This register holds the lower portion of the program counter.
Stack Pointer: This register holds the address at which the top of the stack is located.
Status Register: This register holds status information that the CPU uses to make decisions. Note that bits 2 and 4 are unused in this design of the CPU. The layout of the status register is as follows:
Bit   Purpose
7     Carry Flag
6     Zero Flag
5     Interrupt Mask
4     ---
3     Break Flag
2     ---
1     Overflow Flag
0     Negative Flag
Table 2: Status Register
X: This register holds intermediate data for the CPU.
Y: This register holds intermediate data for the CPU.
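The five operations listed for the Arithmetic Logic component can be sketched as a single combinational process. This is a minimal sketch and not the team's implementation: the entity name, the port names, and the 3-bit operation encoding are all assumptions made for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical ALU sketch; the op encoding below is assumed, not specified.
entity alu_sketch is
    Port ( a, b     : in  STD_LOGIC_VECTOR (7 downto 0);
           carry_in : in  STD_LOGIC;
           op       : in  STD_LOGIC_VECTOR (2 downto 0);
           result   : out STD_LOGIC_VECTOR (7 downto 0);
           carry    : out STD_LOGIC;
           zero     : out STD_LOGIC;
           negative : out STD_LOGIC);
end alu_sketch;

architecture behavioral of alu_sketch is
begin
    process (a, b, carry_in, op)
        variable sum : unsigned(8 downto 0);
        variable r   : STD_LOGIC_VECTOR (7 downto 0);
        variable c   : STD_LOGIC;
    begin
        c := '0';
        case op is
            when "000" =>  -- binary addition with carry in
                sum := resize(unsigned(a), 9) + resize(unsigned(b), 9);
                if carry_in = '1' then
                    sum := sum + 1;
                end if;
                r := std_logic_vector(sum(7 downto 0));
                c := sum(8);
            when "001" => r := a and b;
            when "010" => r := a or b;
            when "011" => r := a xor b;
            when "100" =>  -- shift right; bit 0 moves into the carry
                r := '0' & a(7 downto 1);
                c := a(0);
            when others => r := a;  -- unused encodings pass operand A through
        end case;

        result   <= r;
        carry    <= c;
        negative <= r(7);
        if r = "00000000" then
            zero <= '1';
        else
            zero <= '0';
        end if;
    end process;
end behavioral;
```

The carry, zero, and negative outputs correspond to three of the status register flags; how they are latched into the status register is left to the control logic.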
Because our platform is more than adequate for the logic we are designing, our CPU design involves few tradeoffs. We have chosen to modularize the design so that we can test components individually rather than having to test the system as a whole once the entire processor is finished.
PICTURE PROCESSING UNIT
Top level PPU VHDL entity
The top-level PPU entity within the VHDL code follows the same input and output scheme as the original hardware-based PPU, with one exception: the bidirectional data signal is split into a separate input and output. The team feels that using the already defined signals will make the PPU simpler to implement and integrate with the other components.
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;

    entity PPU is
        Port ( data_in     : in  STD_LOGIC_VECTOR (7 downto 0);   -- value written to the PPU
               data_out    : out STD_LOGIC_VECTOR (7 downto 0);   -- value read from the PPU
               rwDirection : in  STD_LOGIC;
               regSelect   : in  STD_LOGIC_VECTOR (15 downto 0);
               cs          : in  STD_LOGIC;
               clock_in    : in  STD_LOGIC;
               Vblank      : in  STD_LOGIC;
               ALE         : in  STD_LOGIC;
               readVMem    : in  STD_LOGIC;
               writeVMem   : in  STD_LOGIC;
               V_out       : out STD_LOGIC_VECTOR (10 downto 0)); -- video output
    end PPU;
PPU Signal Descriptions
data_in, data_out: These signals contain the values written to or read from the PPU memory. In the original PPU the data signal was bidirectional; we decided to use a separate input and output for ease of implementation.
rwDirection: Determines whether the PPU is reading from or writing to the data signals.
regSelect: Determines which of the PPU’s registers will be accessed via the data signals.
cs: Tells the PPU it can write to the data bus.
clock_in: The PPU clock signal.
Vblank: Used to tell the PPU when to start rendering the image.
ALE: Used when the PPU is accessing the VRAM.
readVMem: Enables reading from the VRAM.
writeVMem: Enables writing to the VRAM.
V_out: The video output signal.
Figure 10: PPU Memory Map
Figure 10 shows how the PPU’s VRAM is allocated. Because the VRAM’s address space is 16 bits wide but there is less physical memory, some logical addresses mirror (directly reference) other locations within the physical memory.
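Mirroring of this kind can be sketched as a function that folds a 16-bit logical address onto the address that is actually backed by memory. The ranges below are taken from the standard NES PPU memory map rather than from this document, so treat them as an assumption to check against Figure 10. The package and function names are hypothetical, and cartridge-dependent name table mirroring is not modeled.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical helper package; ranges follow the standard NES PPU memory map.
package vram_mirror is
    function fold_address(addr : STD_LOGIC_VECTOR (15 downto 0))
        return STD_LOGIC_VECTOR;
end package vram_mirror;

package body vram_mirror is
    function fold_address(addr : STD_LOGIC_VECTOR (15 downto 0))
        return STD_LOGIC_VECTOR is
        variable a : STD_LOGIC_VECTOR (15 downto 0);
    begin
        -- $4000-$FFFF mirror $0000-$3FFF: drop the top two address bits.
        a := "00" & addr(13 downto 0);
        if a(13 downto 8) = "111111" then
            -- $3F20-$3FFF mirror the 32 palette entries at $3F00-$3F1F.
            a := "00111111000" & a(4 downto 0);
        elsif a(13 downto 12) = "11" then
            -- $3000-$3EFF mirror the name table area at $2000-$2EFF.
            a(12) := '0';
        end if;
        return a;
    end function fold_address;
end package body vram_mirror;
```

For example, under these assumptions the logical address $3F25 folds to $3F05, and $7123 folds to $2123.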
Register descriptions
The PPU has 9 I/O registers associated with it. They are accessed via memory-mapped I/O at CPU memory locations $2000-$2007 and $4014.
Register Address   Name                  Function
2000               Control Register 1    PPU related control settings
2001               Control Register 2    PPU related control settings
2002               PPU Status Register   Tells the CPU of the PPU’s status
2003               SPR-RAM Address       Specifies the address of the SPR-RAM to write
2004               SPR-RAM Data          Specifies the value to write in the SPR-RAM
2005               Scrolling Offset      Used in screen scrolling to offset the background
2006               VRAM Address          Specifies the address of the VRAM to write
2007               VRAM Data             Specifies the value to write in the VRAM
4014               Sprite DMA            Used for Direct Memory Access to the SPR-RAM
Table 3: PPU Register Descriptions
Notes:
2006: Requires two writes to set a single address because it is an 8-bit register and VRAM addresses are 16 bits wide.
4014: Used to transfer large amounts of data to the SPR-RAM more efficiently.
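The two-write behavior of register 2006 can be sketched as a small latch that alternates between the high and low byte of the address. The entity and port names are hypothetical, and the byte order (high byte first) is assumed from the original hardware rather than stated in this document.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical sketch of the VRAM address register (2006).
entity vram_addr_latch is
    Port ( clock_in  : in  STD_LOGIC;
           reset     : in  STD_LOGIC;
           write_en  : in  STD_LOGIC;  -- high for one clock per write to 2006
           data_in   : in  STD_LOGIC_VECTOR (7 downto 0);
           vram_addr : out STD_LOGIC_VECTOR (15 downto 0));
end vram_addr_latch;

architecture behavioral of vram_addr_latch is
    signal addr        : STD_LOGIC_VECTOR (15 downto 0) := (others => '0');
    signal second_byte : STD_LOGIC := '0';  -- '0': next write is the high byte
begin
    process (clock_in)
    begin
        if rising_edge(clock_in) then
            if reset = '1' then
                addr        <= (others => '0');
                second_byte <= '0';
            elsif write_en = '1' then
                if second_byte = '0' then
                    addr(15 downto 8) <= data_in;  -- first write: high byte
                else
                    addr(7 downto 0) <= data_in;   -- second write: low byte
                end if;
                second_byte <= not second_byte;
            end if;
        end if;
    end process;

    vram_addr <= addr;
end behavioral;
```

On the original hardware, reading the status register (2002) also resets the high/low toggle; that detail is left out of this sketch.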
Pattern Table
This area of memory stores the 8x8 tiles that are available for use when generating the background.
Name Tables
Name tables contain the list of patterns that should be displayed when generating the image. Each one is essentially a matrix of references to the tiles within the pattern memory and holds 32x30 tile references. Having multiple name tables allows for easier transitions between backgrounds during game play.
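Because each name table is a 32x30 matrix of references to 8x8 tiles, the entry that covers a given screen pixel can be found with simple integer arithmetic. The package and function names below are hypothetical, and the sketch assumes the references are stored row by row.

```vhdl
-- Hypothetical helper package: locates the name table entry for a pixel.
package name_table_lookup is
    constant TILES_PER_ROW : natural := 32;
    constant TILE_SIZE     : natural := 8;

    -- Offset (0 to 959) of the tile reference covering pixel (x, y),
    -- assuming row-major storage inside one name table.
    function tile_offset(pixel_x : natural range 0 to 255;
                         pixel_y : natural range 0 to 239) return natural;
end package name_table_lookup;

package body name_table_lookup is
    function tile_offset(pixel_x : natural range 0 to 255;
                         pixel_y : natural range 0 to 239) return natural is
    begin
        -- Integer division selects the tile row and the tile column.
        return (pixel_y / TILE_SIZE) * TILES_PER_ROW + pixel_x / TILE_SIZE;
    end function tile_offset;
end package body name_table_lookup;
```

The last pixel of the image, (255, 239), falls in tile row 29 and column 31, giving offset 959, the final entry of the 960 references in a table.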
Palettes
This portion of memory stores the values needed to generate a specific color. When generating the image, the tiles specified in the name tables reference this table to determine the color of the background or sprite. This allows the same tile to appear in different places on the screen with different colors while keeping only one representation within the pattern table.
Design Tradeoffs
Approach 1: The PPU could be implemented primarily within a single large VHDL file. This approach would become cumbersome, especially when trying to determine what a signal does and which VHDL process it belongs to.
Approach 2: In this approach the PPU is divided into separate functional modules such as VRAM access, SPR-RAM access, image rendering, video signal generation, and control logic. Each module is responsible for its own function. Signals are then connected between all of the modules so that data can be transferred directly between them.
Approach 3: This approach is similar to the second, but instead of requiring signals between all the different modules, signals are created only between each module and the control logic. The control logic then forwards the data from one module to another as necessary. This should reduce complexity during implementation and reduce the chance of driving a signal from multiple processes, which is not allowed in VHDL.
Video Output: The original NES used an RF signal for the video output. Based on the I/O ports on the FPGA board, we need to use VGA to output the video signal. One option is to have the PPU still generate the RF signal and add another component to convert that signal to VGA, which adds complexity. The other option is to modify the PPU to generate the VGA signal directly. We feel that the second option is better and will be easier to implement, as there are numerous resources available on how to produce a VGA signal.
In our implementation we plan to take approach 3. Figure 11 gives an overview of the different modules and how they will interconnect. The RAM-related registers will be placed in their corresponding modules and the other four will be in the control module.
Figure 11: PPU Schematic
CLOCKS AND TIMING
We will create a timing module that will utilize the FPGA’s clock signal to generate clocks for all of the NES modules.
CPU: The CPU uses a 21 MHz clock that is split into two signals: the original clock signal and its inverse.
PPU: The PPU will utilize the CPU’s clock as well as a clock three times as fast as the CPU’s clock.
Controller: The controller polling module will use a 5 MHz clock.
VGA: The VGA’s clock speed will depend on the PPU’s ability to render a video frame.
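Deriving slower clocks from a faster source clock can be sketched with a counter-based divider that also provides the inverted phase. This is only a sketch: the entity name, the port names, and the default HALF_PERIOD value are assumptions, since the ratio between the FPGA clock and each NES clock is not given here.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical divider: the output frequency is clk_in / (2 * HALF_PERIOD).
entity clock_divider is
    Generic ( HALF_PERIOD : positive := 3 );  -- assumed value, for illustration
    Port ( clk_in    : in  STD_LOGIC;
           clk_out   : out STD_LOGIC;   -- divided clock
           clk_out_n : out STD_LOGIC);  -- inverted copy of the divided clock
end clock_divider;

architecture behavioral of clock_divider is
    signal count : natural range 0 to HALF_PERIOD - 1 := 0;
    signal q     : STD_LOGIC := '0';
begin
    process (clk_in)
    begin
        if rising_edge(clk_in) then
            if count = HALF_PERIOD - 1 then
                count <= 0;
                q     <= not q;  -- toggle once per half period
            else
                count <= count + 1;
            end if;
        end if;
    end process;

    clk_out   <= q;
    clk_out_n <= not q;
end behavioral;
```

On an actual board, the FPGA's dedicated clock management resources or clock enables may be a better fit than logic-generated clocks; the counter form is used here because it is easy to simulate.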
SYSTEM CONTROL
This module handles the initial setup of the NES emulation when the FPGA board is powered on, as well as reset functions when the reset button on the board is pressed. It determines which of the CPU, PPU, RAM, ROM file, or controller polling modules can write to the data bus at a given time. It also handles other functionality for integrating the modules, such as direct memory access (DMA).
CONTROLLER POLLING
This module will interface with the I/O pins on the FPGA and provide the CPU with a register holding the status of each of the buttons on the controller. The CPU can then read the controller polling register to update its own values and make the necessary changes to the displayed images. The polling of these values is continuous throughout the NES emulation.
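Continuous polling of one controller can be sketched as a three-state machine that latches the buttons and then shifts in eight bits. The serial protocol used here (latch pulse, eight clock pulses, active-low data, A button first) is the standard NES controller behavior and is an assumption not taken from this document. The entity and signal names are hypothetical, and real pulse widths are not modeled.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical polling sketch for one standard controller.
entity controller_poll is
    Port ( clk        : in  STD_LOGIC;         -- polling clock
           ctrl_data  : in  STD_LOGIC;         -- serial data, low = pressed
           ctrl_latch : out STD_LOGIC := '0';
           ctrl_clk   : out STD_LOGIC := '0';
           buttons    : out STD_LOGIC_VECTOR (7 downto 0) := (others => '0'));
end controller_poll;

architecture behavioral of controller_poll is
    type state_t is (S_LATCH, S_SAMPLE, S_PULSE);
    signal state : state_t := S_LATCH;
    signal index : natural range 0 to 7 := 0;
    signal shift : STD_LOGIC_VECTOR (7 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            case state is
                when S_LATCH =>   -- controller captures all eight buttons
                    ctrl_latch <= '1';
                    ctrl_clk   <= '0';
                    index      <= 0;
                    state      <= S_SAMPLE;
                when S_SAMPLE =>  -- read the bit currently on the data line
                    ctrl_latch   <= '0';
                    ctrl_clk     <= '0';
                    shift(index) <= not ctrl_data;  -- store '1' for pressed
                    state        <= S_PULSE;
                when S_PULSE =>   -- clock the next bit onto the data line
                    ctrl_clk <= '1';
                    if index = 7 then
                        buttons <= shift;  -- publish the completed byte
                        state   <= S_LATCH;
                    else
                        index <= index + 1;
                        state <= S_SAMPLE;
                    end if;
            end case;
        end if;
    end process;
end behavioral;
```

Under the assumed bit order, buttons(0) through buttons(7) hold A, B, Select, Start, Up, Down, Left, and Right. The buttons output plays the role of the register that the CPU reads.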
AUDIO
This placeholder module will take in all the signals used by the CPU that would be needed to implement NES audio functionality, but it will not produce any audio. It could be made functional in the future if time permits.
TESTING AND EVALUATION
Individual Components
VHDL-based test benches will be created to test the functionality of the individual components. Within the CPU, sub-modules such as the ALU and Instruction Decode will be tested to verify that they generate the required results. The same will be done for the PPU’s memory access and rendering modules. Once the sub-modules are tested, they can be put together and tested again at the component level using the same VHDL-based test bench method. Because 6502 assemblers exist, testing the CPU will be easier: we can write a program in assembly and have the assembler generate the necessary binary values. To test the PPU as a whole, we will use existing software-based NES simulators that show the values within the various PPU memories, and compare those values to the values within our PPU’s memory.
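A self-checking test bench of the kind described above might look like the following sketch. The unit under test is a hypothetical 8-bit adder that stands in for a real sub-module; both the adder8 entity and the test values are made up for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical unit under test: an 8-bit adder standing in for a sub-module.
entity adder8 is
    Port ( a, b  : in  STD_LOGIC_VECTOR (7 downto 0);
           sum   : out STD_LOGIC_VECTOR (7 downto 0);
           carry : out STD_LOGIC);
end adder8;

architecture behavioral of adder8 is
    signal wide : unsigned(8 downto 0);
begin
    wide  <= resize(unsigned(a), 9) + resize(unsigned(b), 9);
    sum   <= std_logic_vector(wide(7 downto 0));
    carry <= wide(8);
end behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Test bench: applies stimulus and checks the results with assertions.
entity adder8_tb is
end adder8_tb;

architecture test of adder8_tb is
    signal a, b  : STD_LOGIC_VECTOR (7 downto 0) := (others => '0');
    signal sum   : STD_LOGIC_VECTOR (7 downto 0);
    signal carry : STD_LOGIC;
begin
    uut : entity work.adder8
        port map (a => a, b => b, sum => sum, carry => carry);

    stimulus : process
    begin
        a <= x"01"; b <= x"02";
        wait for 10 ns;
        assert sum = x"03" and carry = '0'
            report "1 + 2 failed" severity error;

        a <= x"FF"; b <= x"01";
        wait for 10 ns;
        assert sum = x"00" and carry = '1'
            report "255 + 1 failed" severity error;

        report "adder8 test bench finished" severity note;
        wait;  -- suspend the process so the simulation can end
    end process;
end test;
```

The same pattern (instantiate the unit, drive inputs, wait, assert on outputs) extends to the component-level test benches, with expected values taken from an assembler listing or a software simulator.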
Components integrated together
Once the individual components are tested, we will connect them together and create a large test bench that calls for the CPU to access PPU registers. This will test the interconnectivity of these two modules. In addition, the test bench will need to simulate controller inputs and have memory to contain the program data (game).
On board
The final step in testing will be to make sure that our code synthesizes and loads onto the board. We will synthesize our code at various points during development to make sure we don’t run into any coding issues. We’ll try to wait until most of the ModelSim-based test bench testing is done and then begin testing the controllers and other integration that requires the board.