NIKET KUMAR CHOUDHARY
Phone: 845-337-6468
E-mail: nkchoudh@ece.ncsu.edu
Web: www4.ncsu.edu/~nkchoudh
EDUCATION
Ph.D. in Computer Engineering
North Carolina State University, Raleigh, NC, GPA: 3.84/4.0
(Aug 09–May 12, Expected)
M.S. in Computer Engineering
North Carolina State University, Raleigh, NC, GPA: 3.88/4.0
(Aug 07–Aug 09)
Bachelor in Information & Communication Technology
Dhirubhai Ambani Institute of Information and Communication Technology, India
(Aug 01-May 05)
RESEARCH INTERESTS
Computer architecture, Processor microarchitecture, Low-power & low-effort design methodology, 3-D IC
architecture design, Dynamic binary translation & optimization, Emerging technologies and their
interaction with architecture.
WORK EXPERIENCE
Research Assistant, North Carolina State University, NC, USA
(Aug 07– Present)
Advisor: Eric Rotenberg
• FabScalar: Developed a novel approach to generate synthesizable RTL of an arbitrary out-of-order
superscalar processor based on a canonical pipeline template. The generated processors’ RTL differs in
three major dimensions: superscalar width, pipeline depth, and sizes of structures for extracting
instruction-level parallelism (ILP).
Note: The FabScalar infrastructure has been released to the research community and is currently
used at multiple universities.
• Design of Heterogeneous Multi-Core: Exploring heterogeneous multi-core architectures comprised
of many FabScalar-generated superscalar cores, each customized to different application
characteristics. This new paradigm enables higher computational efficiency with lower design &
verification cost.
• Energy Efficient 3-D CPU: Exploring 3-D IC based heterogeneous multi-core architectures.
Focusing on fast thread migration using through-silicon-vias and integration of diverse cores designed
on different process technology nodes.
• Design Space Exploration techniques: Application of classic search and machine-learning
techniques for fast design space exploration to find an optimal processor design.
• Performance and Power modeling: Developed a detailed timing and power model of a processor in
C++ and validated it against RTL implementations. The model is used to study performance and
power bottlenecks.
Research Intern, Microsoft Corporation, Redmond, USA
(May 11- Aug 11)
Group: XCG CAPS & Computer Architecture, Manager: Doug Burger
• Worked on the micro-architecture for the E2 Dynamic Multicore Processor. E2 is an advanced Explicit
Datagraph Execution (EDGE) instruction set architecture.
• Developed components relating to composing cores together to achieve better single-thread
performance.
Software Engineering Intern, Intel, Santa Clara, USA
(May 10- Aug 10)
Group: Binary Translation (BiTS), Manager: Daniel M. Lavery
• Binary translation (BT) system analysis including evaluating the impact of the dynamic BT system on
processor microarchitecture.
Design Engineer, ARM Private Ltd, Bangalore, India
(Aug 05– Jul 07)
Group: Processor Division; Manager: Rahoul Varma
• Implementation of Dynamic Voltage and Frequency Scaling solution for ARM1176JZF-S processor.
• Benchmarking of multiple ARM processors for power, performance, and area targeted to different
CMOS technologies (130, 90, & 65nm) using Cadence and Synopsys digital products.
Intern, Cadence Design Systems, Bangalore, India
(Jan 05– May 05)
Group: RTL Compiler, Manager: Taher Abbasi
• Developed solutions for efficient synthesis of high performance floating point datapath based on IEEE
754 standard.
Intern, Reliance Infocomm, Mumbai, India
(May 03– Jul 03)
• Undergraduate internship.
TECHNICAL SKILLS
Simulators: GEMS, SimpleScalar
Programming Language: C, C++, SystemC, CUDA, OpenMP, Verilog HDL, SPICE, Perl
Professional Tools: Cadence (NC-Verilog, SOC Encounter, Virtuoso), Synopsys (Design
Compiler, PrimeTime), MATLAB
Operating Systems: Linux, Unix, Windows
AWARDS AND ACHIEVEMENTS
• IEEE Micro’s Top Picks (2012), FabScalar work selected as one of the 12 computer architecture
papers of 2011
Top Picks recognizes the most significant research papers in computer architecture based on novelty
and long-term impact every year.
• Received 1st place in graduate student category of ACM Student Research Competition (2010)
• Awarded student travel grant to attend ISCA-2011, PACT-2010, PACT-2009, and ISCA-2009
• Member National Scholars Honor Society
• Chairman IEEE student branch, DAIICT (2004)
PUBLICATIONS
• Niket K. Choudhary et al. “FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores
within a Canonical Superscalar Template”, IEEE Micro, Special Issue: Micro's Top Picks from 2011
Computer Architecture Conferences (MICRO TOP PICKS), Vol. 32, No. 3, May/June 2012.
• B.H. Dwiel, Niket K. Choudhary, and Eric Rotenberg. “FPGA Modeling of Diverse Superscalar
Processors”, Proceedings of the IEEE International Symposium on Performance Analysis of Systems
and Software (ISPASS), 2012.
• George Patsilaras, Niket K. Choudhary, and James Tuck. "Efficiently Exploiting Memory Level
Parallelism on Asymmetric Multicore Processors in the Dark Silicon Era", Proceedings of the ACM
Transactions on Architecture and Code Optimization (TACO) special issue on High-Performance and
Embedded Architectures and Compilers (HiPEAC), 2012.
• Niket K. Choudhary et al. “FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores
within a Canonical Superscalar Template”, Proceedings of the IEEE/ACM International Symposium
on Computer Architecture (ISCA), 2011.
• Niket K. Choudhary, Sandeep Navada, R. Ginjupali, and G. Khanna. “An Exploration of OpenCL on
Multiple Hardware Platforms for a Numerical Relativity Application”, Proceedings of the IASTED
International Conference on Parallel and Distributed Computing and Systems (PDCS), 2011.
• S. Navada, Niket K. Choudhary, and Eric Rotenberg. "Criticality-driven Superscalar Design Space
Exploration", Proceedings of the IEEE/ACM International Conference on Parallel Architectures and
Compilation Techniques (PACT), 2010.
• G. Patsilaras, Niket K. Choudhary, and James Tuck. "Design Trade-offs for Memory Level
Parallelism on an Asymmetric Multicore System", Workshop on Parallel Execution of Sequential
Programs on Multi-core Architectures (PESPMA-3), in conjunction with ISCA-37, 2010.
• H. H. Najaf-abadi, Niket K. Choudhary, and Eric Rotenberg. "Core-Selectability in Chip
Multiprocessors", Proceedings of the IEEE/ACM International Conference on Parallel Architectures
and Compilation Techniques (PACT), 2009.
• Niket K. Choudhary et al. "FabScalar", Workshop on Architecture Research Prototyping (WARP-4),
in conjunction with ISCA-36, 2009.
• Niket K. Choudhary et al. "ARM's IEM Implementation with Cadence Digital IC Products",
Proceedings of the Cadence Designer Network Live Conference (CDNLive), October 2006.
PRESENTATIONS
• FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar
Template, ISCA-38, San Jose, 2011.
• FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar
Template, PACT-19, Vienna, 2010.
• ARM's IEM Implementation with Cadence Digital IC Products, CDNLive, Bangalore, 2006.
PROFESSIONAL SERVICE
• External reviewer for International Conference on High-Performance Computer Architecture 2011.
• Reviewer for ARM architecture material of Computer Organization and Architecture textbook by
William Stallings (latest edition).
COURSE PROJECTS
• Implementation of multicluster microarchitecture simulator in C++, and integrated power and
performance analysis of different design choices.
• Parallelization of gravitational wave modeling, Lax-Wendroff algorithm, using OpenCL for ATI graphics
processor.
• Parallelization of image rotation function using CUDA for NVIDIA graphics processor.
• Implementation of process scheduling algorithms, reader/writer locks, and virtual memory support on
XINU (Intel x86 based operating system).
• Implementation of Owner and Sharer predictors in Simultaneous Multiprocessing environment on
MOESI coherence protocol using GEMS package in Simics.
• Implementation of RegionScout Mechanism on MOSI coherence protocol using GEMS package in
Simics.
• Implementation of a superscalar out-of-order pipeline simulator based on Tomasulo’s algorithm in C.
• Implementation of L1 and L2 data cache simulator in C with Markov chain prefetcher for different
write policies for characterizing memory behavior.
• Implementation of branch predictor simulator in C for characterizing branch behavior.
• Verilog RTL implementation of regular expression matching hardware.
• VLSI implementation of 48-bit 4-port SRAM on 45nm process node (Schematic Design to Final
Layout), optimized for energy-delay product.
• Lexing, parsing, code generation, optimization and scheduling for MiniC (a reduced form of C).
• Implementation of a simple reliable transport protocol on top of UDP on a UNIX machine.
RELEVANT COURSES
Advanced Microarchitecture
Advanced Parallel Computer Architecture
Multi-core/Many-core Architecture and Programming
Digital ASIC Design
Parallel Computer Architecture
Code Generation and Optimization
Operating System
VLSI Systems Design
REFERENCES
Dr. Eric Rotenberg ericro@ece.ncsu.edu Phone: 919-513-2822
Dr. Greg Byrd gbyrd@ece.ncsu.edu Phone: 919-513-2508
Other references available on request.
PERSONAL
Citizen of India holding F1 visa.