Brandon Harley Dwiel
(919) 213-1236
Doctor of Philosophy in Computer Engineering
North Carolina State University
Master of Science in Computer Engineering
North Carolina State University
Bachelor of Science in Computer Engineering
Southern Illinois University, Carbondale
Minor: Computer Science, Mathematics and Management
Intern, Intel Corporation
Summer 2012
Group: Server Development Group; Manager: Sailesh Kottapalli
Was part of an architecture team responsible for planning and designing communication components on a future Intel
server product. The components were related to the power management unit, the random number generator and other
uncore components. Tasks included communicating with other teams in order to assess all requirements and restrictions;
creating high-level designs and specifications using Visio and Word; presenting this work to other teams for feedback
and approval.
Research Assistant, North Carolina State University
2010 – Present
Advisor: Dr. Eric Rotenberg
• Exploiting 3-D IC for Energy Efficient Heterogeneous Processors
Lead architecture researcher charged with designing a heterogeneous processor that, when implemented using 3-D IC
technology, delivers an improved performance/power spectrum over the 2-D version of the same design. Contributions
include: design and verification of cores using Verilog; researching 3D techniques using C++ and Verilog
simulations (e.g., 3D fast thread migration, 3D resource sharing among cores on different tiers and exploiting on-chip
DRAM); and the development of a power modeling tool for different processor designs using Cacti and PrimeTime results
from synthesizing a wide range of core designs. The project includes two fabrications: one 2D and one 3D.
• FPGA Modeling of Diverse Superscalar Processors
Developed a tool to automatically construct, synthesize and simulate diverse superscalar processors on an FPGA.
Performance, resources and complexity are managed using various FPGA-specific techniques (e.g., clock decoupling).
The project included integrating 3rd-party IP such as DRAM, UART and SD Card controllers.
• FabScalar
Contributed to FabScalar, a tool for generating RTL of out-of-order superscalar processors with different superscalar
widths, depths and structure sizes. Major contributions include bug fixes, porting to MIPS ISA, updating the code to
SystemVerilog and various frontend and backend design improvements.
• C++ Processor Simulator Development
Updated an in-house C++ multi-core processor simulator to support the research group’s needs. Improvements include:
porting the simulator from Alpha to the MIPS ISA, adding support for heterogeneous core types and thread migration,
and developing new in-order and out-of-order core models.
• Web Application Development
Conceptualized and developed a web application used to generate and view graphs comparing user-requested statistics of
simulation runs. The application server stores and retrieves simulation results from a database and sends the
user-requested data to the client, which generates and displays an interactive graph. Required the use and knowledge of
Python, SQLite, HTML, JavaScript/jQuery/AJAX and various popular web application frameworks.
• Comprehensively and Dynamically Reconfigurable Superscalar Processor
Contributed to the initial concepts and designs of a dynamically reconfigurable superscalar processor. Each configuration
mimics the frequency, performance and power of static designs while also dynamically adapting to changing workload
Undergraduate Research Assistant, Southern Illinois University, Carbondale
2008 – 2009
Advisor: Dr. Wei Zhang
• Study on the Energy Efficiency of Just-in-time Compiler Optimizations
Researched the impact of dynamic compiler optimizations on the performance and energy consumption of mobile
devices. Energy dissipation was modeled using Wattch models. This research was presented at the Argonne National
Laboratory’s Symposium for Undergraduates in Science, Engineering and Mathematics (2008).
E. Rotenberg, Brandon H. Dwiel, E. Forbes, Z. Zhang, R. Widialaksono, R. Basu Roy Chowdhury, N. Tshibangu, S.
Lipa, W. R. Davis, and P. D. Franzon. “Rationale for a 3D Heterogeneous Multi-core Processor”, Proceedings of the 31st
IEEE International Conference on Computer Design (ICCD-31), October 2013.
S. Priyadarshi, N. K. Choudhary, B. Dwiel, A. Upreti, E. Rotenberg, R. Davis, and P. Franzon. “Hetero² 3D Integration:
A Scheme for Optimizing Efficiency/Cost of Chip Multiprocessors”, Proceedings of the 14th IEEE International
Symposium on Quality Electronic Design (ISQED), March 2013.
N.K. Choudhary, Brandon H. Dwiel, and Eric Rotenberg. “A Physical Design Study of FabScalar-generated Superscalar
Cores”, Proceedings of the IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), 2012.
N. K. Choudhary, S. Wadhavkar, J. Gandhi, T.A. Shah, H. Mayukh, Brandon H. Dwiel, S. Navada, H.H. Najaf-abadi,
and Eric Rotenberg. “FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical
Superscalar Template”, IEEE Micro, Special Issue: Micro's Top Picks from 2011 Computer Architecture Conferences
(MICRO TOP PICKS), Vol. 32, No. 3, May/June 2012.
Brandon H. Dwiel, N. K. Choudhary, and Eric Rotenberg. “FPGA Modeling of Diverse Superscalar Processors”, in
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2012.
N. K. Choudhary, S. Wadhavkar, J. Gandhi, T.A. Shah, H. Mayukh, Brandon H. Dwiel, S. Navada, H.H. Najaf-abadi,
and Eric Rotenberg. “FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical
Superscalar Template”, in Proceedings of the 38th International Symposium on Computer Architecture (ISCA), 2011.
Tools: Cacti, McPAT, Marss86, GEM5, SimpleScalar, Xilinx ISE, Cadence (NC-Verilog, Virtuoso, Encounter), Synopsys
(Design Compiler, PrimeTime), SPICE, Mentor (ModelSim, Questa), Universal Verification Methodology/Open
Verification Methodology
Programming Languages: SystemVerilog, VHDL, C, C++, SystemC, Java, OpenMP, MPI, CUDA, OpenCL, Perl, Python,
HTML, JavaScript, jQuery, AJAX, Linux shell and Linux system programming
Implementation of a novel, copy-free RMT recovery scheme (C++ and Verilog)
GPGPU implementation of an image rotation function (CUDA and OpenCL)
Schematic to final layout of an 8x8 CAM using 45 nm technology (optimized for ED²)
Implementation of an out-of-order superscalar pipeline simulator based on Tomasulo’s algorithm (C++)
Implementation and physical design of a variable block size motion estimator (VBSME) for low-power H.264 video
compression (Verilog)
Frontend and backend compiler implementation from scratch for the ICE9 academic ISA (Flex, Bison, C)
Frontend, code generation and optimizations implemented for the LLVM compiler (Flex, Bison, C++)
Implementation of a multi-core cache simulator with the MESI cache coherence protocol (C)
Parallelization of various programs using OpenMP, MPI, CUDA and the Cell processor (C)
Computer Architecture: Advanced Microarchitecture, System Architecture for Data Centers, Computer Design and
Technology, Architecture of Parallel Computers, Multi-Core/Many-Core GPU Architecture and Programming, Compiler
Code Generation and Optimization
VLSI: Electronic System Level and Physical Design, Digital Electronics, Digital ASIC Design, VLSI Systems Design
Programming: Programming Parallel Systems, Embedded Systems with Linux and Android, Compiler Construction