The 4th Annual Seminar of National Science Fellowship 2004

[ME11] UKM8032 microcontroller design and FPGA implementation by integrating DW8051 IP Core for SoC design

Victer Chong, Prof. Dr. Masuri Othman
VLSI Design Centre, Blok Inovasi 2, Fakulti Kejuruteraan, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia.

Introduction

The microcontroller is an essential component in many general-purpose and application-specific systems, such as airbag control in the automotive industry. Microcontrollers in current use tend to be larger, costlier and slower than desired, and none are locally designed. Hence a high-performance, Intel-compatible 8-bit 8032 microcontroller, the UKM8032, has been designed as a step towards a System-on-Chip (SoC) design. The SoC integrates Synopsys's DW8051 Intellectual Property (IP) core with other peripherals that were designed in IEEE 1364-1995 Verilog Hardware Description Language (HDL). This local design aims for smaller size, lower cost and higher operating speed. Because no fabrication library was available, the Application Specific Integrated Circuit (ASIC) design was functionally verified at Register Transfer Level (RTL) only, and serves as the reference for the final Field Programmable Gate Array (FPGA) hardware implementation. An internal Read Only Memory (ROM) unit was incorporated into the FPGA design for testing, which enabled the design to reach an operating speed of up to 100 MHz on a Virtex2 FPGA prototype board. Industrial design and verification methodology was followed to test the design successfully in FPGA hardware before proceeding to ASIC implementation for fabrication in the future.

Materials and Methods

The DW8051 IP core was verified and integrated into UKM8032_asic and UKM8032_fpga in two stages (Figure 1): an RTL functional stage, followed by a logic functional stage after synthesis.
Because no technology library was available for ASIC implementation, the encrypted DW8051 IP core was integrated with the other peripherals at RTL for UKM8032_asic, whereas the synthesized core netlist (targeted to the Virtex2 FPGA library) was integrated with peripherals designed for FPGA implementation in UKM8032_fpga.

[Figure 1: flowchart. The DW8051 IP core is specified and configured, then functionally verified at RTL. The DW8051_core RTL file and the UKM8032_asic peripherals form the top level of UKM8032_asic, whose RTL functional verification yields the Simulation Result File (SRF). The same RTL file is synthesized into the DW8051_core netlist file, verified at logic level, and combined with the UKM8032_fpga peripherals into the top level of UKM8032_fpga, which is synthesized, verified at logic level, and then implemented in the FPGA with functional verification at each level.]
FIGURE 1 The usage of the DW8051 core in UKM8032_asic and UKM8032_fpga design

The overall workflow of designing and implementing the UKM8032 microcontroller in FPGA is divided into three parts (Figure 2):
1) DW8051 IP core's RTL and logic verification.
2) UKM8032_asic's RTL verification.
3) UKM8032_fpga's logic verification and FPGA implementation.

[Figure 2: flowchart in three columns. Left, DW8051 core: specification and configuration; encrypted RTL description (DW8051_core.v) verified with the testbench DW8051_core_tb.v against the Golden Simulation Files (GSF); logic synthesis with the Virtex2 synthesis library (virtex2-4.db); gate-level netlist DW8051_core_fpga.v verified with the UNISIMS simulation library. Centre, UKM8032_asic: top level UKM8032_asic.v with DW8051_core.v and the additional sub-components UKM8032_p0_asic.v, UKM8032_p1_asic.v, UKM8032_p2_asic.v, UKM8032_p3_asic.v, UKM8032_ext_sfr.v and UKM8032_ram_256_beh.v; RTL functional verification with UKM8032_tb.v, producing the Simulation Reference Files (SRF). Right, UKM8032_fpga: top level UKM8032_fpga.v with DW8051_core_fpga.v and the sub-components UKM8032_p0_fpga.v, UKM8032_p1_fpga.v, UKM8032_p2_fpga.v, UKM8032_p3_fpga.v, UKM8032_ext_sfr.v and UKM8032_ram_256_fpga.v; logic synthesis with the constraints syn_DC_UKM8032fpga.tcl and virtex2-4.db; gate-level netlist (UKM8032_iopad.v, UKM8032_iopad.edif) verified with UNISIMS; FPGA implementation with the constraints UKM8032_constrain.ucf through translate (UKM8032_iopad.ngo, ukm8032.ngd, ukm8032_translate.v), map (ukm8032_map.ncd, ukm8032.pcf, ukm8032_map.v and SDF) and place and route (ukm8032.ncd, ukm8032_timesim.v and SDF), each verified with SIMPRIMS by comparing simulation files; bit file generation (ukm8032.bit); loading of the Virtex2 chip. The same verification method is used at each level.]
FIGURE 2 Overall workflow of designing and implementing UKM8032 microcontroller in FPGA

DW8051_core's RTL and logic verification

The first part, on the left of Figure 2, was RTL and logic functional verification of the DW8051 IP core in Synopsys's CoreConsultant environment, which calls the Verilog Compiled Simulator (VCS) simulation tool and the Design Compiler (DC) synthesis tool. The encrypted core (DW8051_core.v) was functionally verified at RTL with VCS before being integrated into the top-level design of UKM8032_asic in the second part. Verification used the provided simulation testbench (DW8051_core_tb.v) and compared the simulation output with the core's Golden Simulation Files (GSF). The core was also synthesized with DC and functionally verified at logic level (which requires the UNISIMS FPGA simulation library) before it could be integrated into the top-level UKM8032_fpga design in the final part. Once synthesis of the core gave a satisfactory result, with positive slack, acceptable total area and the 25 MHz operating frequency specification met, the synthesized core netlist (DW8051_core_fpga.v) was ready to be instantiated in UKM8032_fpga.

UKM8032_asic's RTL verification

The second part, at the centre of Figure 2, was RTL functional verification of UKM8032_asic with VCS, producing Simulation Result Files (SRF) at microcontroller level as the reference for UKM8032_fpga in the final part. Verification used a simulation testbench written by the author (UKM8032_tb.v). The testbench (Figure 3) reads in program files (with the '.mem' extension) that have been translated from hex code, runs opcode and other related tests on the microcontroller, and finally writes out trace and strobe files (.ram, .pct, .wrt, .stbs0, .stbp). The simulation is verified by comparing the output of the testbench with the provided core's GSF.

[Figure 3: block diagram of the testbench UKM8032_tb. Assembly programs (.hex) are converted by cmd2task_s.pl -c into .mem files, together with the list of programs to test (UKM8032_tb.cmd, UKM8032_tb.task) and the task list UKM8032_tb_main. The device under test (DUT) UKM8032_asic (file list UKM8032list_asic: UKM8032_asic.v, DW8051_core.v, UKM8032_p0_asic.v to UKM8032_p3_asic.v, UKM8032_ext_sfr.v, UKM8032_ram_256_beh.v) is connected to a serial port device model (DW8051_ext_serial), an interrupt port, a test pattern generator (DW8051_test_pattern_gen) and, over mem_bus, 64 kB external ROM and 64 kB external RAM. Output files (.ram, .pct, .wrt, .stbs0, .stbp) in './sim_res_UKM8032' are compared with the GSF of the DW8051 core. On a mismatch, the simulation waveform (UKM8032sim.dump, converted with vcs -RPP to UKM8032sim.dump.vpd) is analysed and the simulation is repeated. On a match, the output files become the SRF for UKM8032_fpga.]
FIGURE 3 RTL functional verification method for UKM8032_asic

UKM8032_fpga's logic verification and FPGA implementation

The final part, on the right of Figure 2, was verification of UKM8032_fpga at multiple levels with VCS, after synthesis by DC and during FPGA implementation of the design in Xilinx's Integrated Software Environment (ISE). The design was synthesized with Synopsys's DC, targeting the same Virtex2 FPGA synthesis library with customized timing constraints, and achieved good slack, area optimization and an operating frequency of 25 MHz. The generated gate-level netlist (UKM8032_mapped_hier.edif) was functionally verified at logic level. Verification used the author's simulation testbench (UKM8032_tb.v) and compared the simulation output with the SRF from the UKM8032_asic design (Figure 4). The UNISIMS and SIMPRIMS simulation libraries were needed to verify UKM8032_fpga at logic level and at FPGA implementation level, respectively.

[Figure 4: block diagram of the same testbench with UKM8032_fpga as the DUT (file list UKM8032list_fpga). Before FPGA implementation, the logic-level netlist UKM8032_iopad.v is simulated with UNISIMS. During FPGA implementation, one of ukm8032_translate.v (post-translate), ukm8032_map.v with ukm8032_map.sdf (post-map), or ukm8032_timesim.v with ukm8032.timesim.sdf (post-PAR) is simulated with SIMPRIMS. Output files (.ram, .pct, .wrt, .stbs0, .stbp) in './sim_res_UKM8032' are compared with the SRF. On a mismatch the waveform is analysed and the simulation repeated. On a match the flow continues to the next process.]
FIGURE 4 Multiple level's functional verification method for UKM8032_fpga.

The generated netlist (UKM8032_mapped_hier.edif) had to be converted into a netlist that ISE could read by inserting buffers and pads on all of the design's ports in DC: input buffers (IBUF) for input ports, output buffers (OBUF) for output ports, a global clock buffer (BUFGP) for the clock port and bidirectional buffers (IOBUF) for I/O ports. For FPGA implementation, the converted netlist (UKM8032_iopad.edif) and the customized constraints (25 MHz clock frequency and I/O placement) were fed into ISE, which then translated, mapped, and placed and routed (PAR) the design. Each step generated a simulation model for the corresponding simulation level. The post-map and post-PAR simulations needed a Standard Delay Format (SDF) file, which supplied real timing information. Once the post-translate, post-map and post-PAR simulations had all passed, the design's bit file (UKM8032.bit) was generated and downloaded to the Virtex2 FPGA chip using Xilinx's iMPACT after some configuration setup.

The FPGA implementation was completed by a testing process that incorporated a 2 kB internal ROM unit into the UKM8032_fpga design under test (DUT), giving the UKM8050_fpga testing module (Figure 5). Simple LED-oriented programs were written and loaded into the memory as customizable constraints in ISE. This method loads programs without the burn-and-erase cycle used with EPROM. It also eases troubleshooting because no external hard wiring is involved; hence a higher operating clock frequency of up to 100 MHz could be achieved. In this way the design had all 32 I/O ports available, which allowed wider testing. Successful verification of the written programs was an important step in proving that the designed modules, and the UKM8032 microcontroller as a whole, were working properly. A maximum of 16 kB of internal memory could be implemented this way, limited by the number of on-board memory blocks available (16).

[Figure 5: block diagram of the test microcontroller 'UKM8050_fpga'. Easily developed test assembly programs are converted to hex code and then to .ucf format, and enter FPGA implementation through the constraints file UKM8050constrain.ucf. The core 'DW8051_core_fpga_2kBrom', configured to provide an interface to a 2 kB ROM, connects over irom_bus to the 2 kB ROM 'ROM_2048_fpga' (UKM8050_rom_2048_fpga.v), alongside the additional sub-components of the 'UKM8032_fpga' microcontroller, which is the device under test (DUT).]
FIGURE 5 Testing method of UKM8032_fpga microcontroller using internal ROM unit within testing module UKM8050_fpga

Results

The clock frequency of the DW8051 core and the UKM8032 microcontroller was targeted at 25 MHz to avoid over-constraining the design. Figure 6 shows the successful simulation waveform generated by the testbench (Figure 4) for the opcode test 'op_25_35', which mainly tests opcode 25 (ADD A,direct) and opcode 35 (ADDC A,direct), for UKM8032_fpga at post-PAR level using the SIMPRIMS FPGA simulation library.

FIGURE 6 'op_25_35' opcode test simulation wave output of UKM8032_fpga microcontroller

FIGURE 8 Logic schematic of synthesized top-level UKM8032_fpga microcontroller

Constraints such as an operating frequency of 25 MHz, 2 ns of input and output delay, the WCCOM operating condition and the xc2v2504_avg wire load model were used to synthesize the top-level UKM8032_fpga design successfully in DC, with a maximum slack of 4.39 ns (10.975% of the 40 ns clock period). The design used 171 cells, giving a total cell area of 3104 units. The synthesis results for UKM8032_fpga are illustrated in Figures 7 and 8. Implementation in Xilinx's ISE was successful (Figure 9), using 1440 slices (46%) and 2521 4-input LUTs (41%). 37 external IOBs (14%) were used. One of the 32 block RAMs was used for internal RAM and one of the 16 GCLKs for the clock port.
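The slack percentages quoted in the Results are the slack divided by the target clock period, and each period in nanoseconds corresponds to a frequency of 1000/period in MHz. The following is a minimal Python sketch of that arithmetic, using only figures reported in this paper. The function names are illustrative assumptions and are not part of the DC or ISE flow.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def freq_mhz(period_ns_value: float) -> float:
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period_ns_value


def slack_percent(slack_ns: float, target_mhz: float) -> float:
    """Slack expressed as a percentage of the target clock period."""
    return 100.0 * slack_ns / period_ns(target_mhz)


if __name__ == "__main__":
    # (design label, slack in ns, target frequency in MHz) as reported in the text
    synthesis_runs = [
        ("UKM8032_fpga @ 25 MHz", 4.39, 25.0),
        ("UKM8050_fpga @ 25 MHz", 1.05, 25.0),
        ("UKM8050_fpga @ 50 MHz", -2.72, 50.0),
    ]
    for label, slack, target in synthesis_runs:
        print(f"{label}: slack = {slack_percent(slack, target):.3f}% of period")

    # Post-PAR clock periods (ns) as reported in the text
    for period in (28.122, 27.453, 19.674):
        print(f"{period} ns -> {freq_mhz(period):.2f} MHz")
```

A negative percentage, as in the 50 MHz run, means the synthesis timing estimate missed the target period by that fraction.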
Altogether, the microcontroller design required 86257 equivalent gates. After PAR, the clock frequency could reach 35.56 MHz (28.122 ns) instead of the targeted 25 MHz.

FIGURE 7 External interface (40 pins) of the UKM8032_fpga microcontroller

FIGURE 9 Successful translation, mapping and PAR implementation of the design in FPGA

The final output of the FPGA implementation in Xilinx's ISE was the design's bit programming file, 'ukm8032.bit' (Figure 10). The bit file was downloaded onto the Xilinx Virtex2 FPGA board (device: xc2v500, package: fg456, speed: -4) after configuration in Xilinx's iMPACT.

FIGURE 10 Successful generation of design's bit programming file (ukm8032.bit)

The testing module, the UKM8050_fpga microcontroller with a 2 kB internal ROM, was synthesized with the same constraints (25 MHz), giving a maximum slack of 1.05 ns (2.625%). It used 180 cells, giving a total cell area of 3119.5 units (Table 1). FPGA implementation of UKM8050_fpga was successful, using 1412 slices (45%), 2465 4-input LUTs (40%) and 37 external IOBs (14%). Two RAMB blocks were used, for internal RAM and ROM respectively, with one GCLK for the clock port. Altogether, the testing module required 151457 equivalent gates. The operating frequency was expected to reach 34.04 MHz (29.380 ns), but in practice the module ran at 70 MHz on the board (LED-oriented test programs) instead of the targeted 25 MHz.

TABLE 1 Comparison of synthesis and FPGA implementation results for main design UKM8032_fpga and testing module UKM8050_fpga

Process / Result                    Main design UKM8032_fpga    Testing module UKM8050_fpga
                                    (0 kB iROM)                 (2 kB iROM)
Synthesis
  Target clock period (ns)          40 (or 25 MHz)              40 (or 25 MHz)
  Slack time (ns)                   4.39                        1.05
  Total cells (library units)       171                         180
  Library area ('libArea')          3104                        3119.5
FPGA implementation
 (i) Map
  Number of slices                  1440 / 3072                 1412 / 3072
  Number of 4-input LUTs            2521 / 6144                 2465 / 6144
  Number of external IOBs           37 / 264                    37 / 264
  Number of RAMB memory blocks      1 / 32                      2 / 32
  Number of GCLKs                   1 / 16                      1 / 16
  Equivalent gate count             86257                       151457
 (ii) PAR
  Operating clock period (ns)       28.122 (or 35.56 MHz)       27.453 (or 36.43 MHz)
 (iii) Testing
  Practical operating frequency     70 MHz                      70 MHz

Discussion

The UKM8050_fpga testing module was also synthesized with a clock frequency of 50 MHz (using the core at 50 MHz), which gave a maximum slack of -2.72 ns (-13.6%). It used 180 cells, giving a total cell area of 3177 units. FPGA implementation of this design was also successful, using 1514 slices (49%), 2686 4-input LUTs (43%), 37 external IOBs (14%), two RAMB blocks and one GCLK. Altogether, this raised the total to 152783 equivalent gates for the testing module at the 50 MHz target. The module was expected to operate at up to 50.83 MHz (19.674 ns), but in practice ran at 100 MHz on the board (LED-oriented test programs). The maximum of 16 kB of internal memory was also implemented and tested successfully, as the UKM8054_fpga testing module. At this point both the core and the microcontroller were fully utilized in terms of speed and memory size, disregarding area constraints. If more than 16 kB of memory is required, the UKM8032_fpga microcontroller can be interfaced to external RAM and ROM in order to run larger programs.

Acknowledgements

The author wishes to thank the Ministry of Science, Technology and Innovation (MOSTI), Malaysia, for the National Science Fellowship awarded to Victer Chong. The author is also grateful to Prof. Dr.
Masuri Othman for his expertise and experience in the interpretation of the overall progress and results.