ameyk1(at)umbc.edu http://www.csee.umbc.edu/~ameyk1/ https://www.linkedin.com/in/ameyk
(443)-310-7323
My research goal is to enhance future computing systems through efficient hardware implementations and algorithmic optimizations for hardware performance. I am particularly interested in machine learning acceleration to build hardware systems that are resource efficient, cognitive and trustworthy. My current research focuses on data reduction and encryption using compressive sensing, hardware Trojan detection using machine learning techniques, and many-core architecture enhancements for big data processing and machine learning. My work currently targets a wide variety of applications, including image processing and biomedical and radar signal processing. I am interested in applying my research experience to embedded, cyber-physical and IoT applications.
University of Maryland, Baltimore County (Baltimore, MD, USA) August 2012-Present
Ph.D. Candidate in Computer Engineering
Thesis: Secured Embedded Many-Core Accelerator for Big Data Processing
Advisor: Professor Tinoosh Mohsenin
Vellore Institute of Technology (Vellore, TN, India) June 2008-May 2010
M.Tech in VLSI Design
Thesis: Dynamic Energy Efficient Memory Controller for a H.264/AVC Application
Advisor: Professor V. Arunachalam
University of Mumbai (Mumbai, MH, India) July 2004-May 2007
B.E. (Bachelor of Engineering) in Electronics and Telecommunication
Languages Known: Verilog, VHDL, Perl, Python, UNIX, MATLAB, MPI, CUDA and C++
Tools familiar with: Cadence RTL Compiler, Encounter, NCSim and Virtuoso, Synopsys Design Compiler, Mentor Graphics ModelSim, Altera Quartus, Xilinx ISE tools, NVIDIA Visual Profiler
Research Assistant, Advisor: Professor Tinoosh Mohsenin
Energy Efficient High Performance Computing (EEHPC) Lab
August 2013-Present
University of Maryland, Baltimore County, Computer Science and Electrical Engineering
Algorithm Enhancements and Architectures for Efficient Recovery of Compressed Signals
Designed two modifications to the Orthogonal Matching Pursuit (OMP) algorithm for Compressive Sensing (CS) reconstruction that significantly reduce hardware complexity and execution time. Studied the impact of modifying two different stages of the algorithm on reconstruction matrix properties and characteristics, such as the column-reduction threshold and the degree of stochastic randomness. The theories were verified using ASIC implementations in 65nm CMOS technology for image reconstruction applications.
Quality of hardware reconstruction was verified using signal-to-reconstruction-error rate and signal-to-noise ratio.
Low Overhead CS-based Framework for Big Data Acceleration
Designed a CS-based encryption technique to secure big data transfers between memory and the host processor. The CS-based encryption was prototyped on two FPGAs for secured data communications.
Built a CS-based data transfer reduction framework that reduces data transfers by up to 60% to accelerate big data processing on hardware, demonstrated using an ARM CPU and the PENC many-core platform on the Hadoop MapReduce platform. A face detection algorithm was implemented on the Hadoop platform to demonstrate the quality of reconstruction.
VLSI Design of PENC Many-Core Architecture
Played a key role in the design and implementation of a 192-core many-core chip called Power Efficient Nano Cluster (PENC). The PENC many-core demonstrated improved efficiency for biomedical and machine learning applications. The key features of PENC include a Globally Asynchronous Locally Synchronous (GALS) architecture, shared memory per cluster, and per-processor dynamic voltage and frequency scaling capabilities.
Intern May 2014-August 2014
USC Information Sciences Institute (USC-ISI), Arlington, Virginia
Feedback based Real-Time Trojan Detection in Integrated Circuits
Designed and implemented a framework for run-time Trojan detection using machine learning algorithms. Built a training data set (more than 10,000 records) based on hardware behavior analysis for three different types of Trojans. Designed and implemented Support Vector Machine and Modified Balance Winnow machine learning algorithms on FPGA. Demonstrated the accuracy of the Trojan detection framework on a many-core architecture, where Trojans were inserted using simple three-gate logic. The framework achieves 94% detection accuracy with 2% area and 3% power overhead.
Teaching Assistant August 2012-May 2014
University of Maryland, Baltimore County, Computer Science and Electrical Engineering
CMPE 415: Programmable Logic Devices
Directed all discussion classes and graded homework, projects and the final exam. Designed two homework assignments concentrating mainly on state machine implementation using Verilog HDL.
CMPE 311: C Programming and Embedded Systems
Directed all discussion classes, two different sessions per week. Graded homework, projects and final exam for an undergraduate laboratory concentrated on embedded system design using ATMega169.
CMPE 640: Custom VLSI Design
Directed all discussion classes and graded homework and projects. Developed tutorials for Cadence Virtuoso and the analog design environment.
VLSI Design and Verification Engineer
Silicon Interfaces Pvt. Ltd., Mumbai, India
January 2012-June 2012
Development of Wireless LAN 802.11n
Key member of a team of IP core developers that designed and developed MAC management and control functionality. Designed a Quality-of-Service (QoS) traffic scheduling module and verified it with direct random test cases written in Verilog HDL. The design can handle packetized Direct Sequence Spread Spectrum (DSSS) and Orthogonal Frequency Division Multiplexing (OFDM) data transmission. The IP was taped out in 65nm technology.
VLSI Engineer June 2010-September 2011
Three D Microsystem Pvt. Ltd., Mumbai, India
InfiniBand 12x LPE Protocol
Key member of the verification team, responsible for implementing the "e"-based verification IP using monitors, transactors and a scoreboard. Involved in planning the test strategy and developing test cases to verify the functionality of the DUT. Implemented a verification environment that includes directed, direct random and random test cases to verify the DUT.
Amey Kulkarni, Youngok Pino, Matthew French, and Tinoosh Mohsenin, “Real-Time Anomaly Detection Framework for Many-Core Router through Machine Learning Techniques”, ACM Journal on Emerging Technologies in Computing Systems, Vol. 13, No. 1, Article 10, Publication date: March 2016.
Amey Kulkarni, Colin Shea, Tahmid Abtahi, and Tinoosh Mohsenin, “Low Overhead CS-based Heterogeneous Framework for Big Data Acceleration”, ACM Transactions on Embedded Computing Systems (submitted, under review)
Amey Kulkarni and Tinoosh Mohsenin, “Low Overhead Architectures for OMP Compressive Sensing Reconstruction Algorithm”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (to be submitted)
Amey Kulkarni, Youngok Pino, and Tinoosh Mohsenin, “Adaptive Real-time Trojan Detection Framework through Machine Learning”, 2016 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), Washington, DC, 2016.
Amey Kulkarni, Ali Jafari, Chris Sagedy and Tinoosh Mohsenin, “Sketching-based High-Performance Biomedical Big Data Processing Accelerator”, in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 24-27 May 2016. (Invited Paper)
Amey Kulkarni, Youngok Pino, and Tinoosh Mohsenin, “Supervised Post-Deployment Trojan Detection in Integrated Circuits”, in 17th International Symposium on Quality Electronic Design (ISQED), 2016.
Amey Kulkarni, Ali Jafari, Colin Shea, and Tinoosh Mohsenin, “CS-based Secured Big Data Processing on FPGA”, 24th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'16), Washington DC, 2016.
Amey Kulkarni, Tahmid Abtahi, Emily Smith, and Tinoosh Mohsenin, “Low Energy Sketching Engines on Many-Core Platform for Big Data Acceleration”, in Proceedings of the 26th edition of the Great Lakes Symposium on VLSI (GLSVLSI '16). ACM, New York, NY, USA.
Adam Page, Amey Kulkarni, and Tinoosh Mohsenin, “Utilizing deep neural nets for an embedded ECG-based biometric authentication system”, in 2015 IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 1-4, 22-24 Oct. 2015.
Amey Kulkarni and Tinoosh Mohsenin, “Accelerating compressive sensing reconstruction OMP algorithm with CPU, GPU, FPGA and domain specific many-core”, in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 970-973, 24-27 May 2015.
Mohammad Khavari Tavana, Amey Kulkarni, Abbas Rahimi, Tinoosh Mohsenin, and Houman Homayoun, “Energy-efficient mapping of biomedical applications on domain-specific accelerator under process variation”, in Proceedings of the 2014 International Symposium on Low Power Electronics and Design (ISLPED '14). ACM, New York, NY, USA, 275-278.
Tawana Khawari, Amey Kulkarni, Abbas Rahimi, Tinoosh Mohsenin and Houman Homayoun, “Energy-Efficient Mapping of Real-time Tasks in Many-Core Accelerator Under Process Variation”, ACM/IEEE 51st Design Automation Conference, DAC 2014 (Work-In-Progress)
Amey Kulkarni, Houman Homayoun, and Tinoosh Mohsenin, “A parallel and reconfigurable architecture for efficient OMP compressive sensing reconstruction”, in Proceedings of the 24th edition of the Great Lakes Symposium on VLSI (GLSVLSI '14). ACM, New York, NY, USA, 299-304.
Amey Kulkarni and Tinoosh Mohsenin, “Parallel And Reconfigurable Architectures For OMP Compressive Sensing Reconstruction Algorithm”, International SPIE Conference on Defense, Security, and Sensing, May 2014.
Amey Kulkarni and Tinoosh Mohsenin, “High Performance Architectures for OMP Compressive Sensing
Reconstruction Algorithm”, 39th Annual GOMACTech Conference, April 2014.
Amey Kulkarni and V. Arunachalam, “FPGA Implementation of Dynamic Energy Efficient Memory Controller for a H.264/AVC Application”, International Journal of Computer Applications (IJCA), April 2011 Edition.
Amey Kulkarni and V. Arunachalam, “FPGA Implementation & Comparison of Current Trends in Memory Scheduler for Multimedia Application”, International Conference and Workshop on Emerging Trends and Technology (ICWET), February 2011.
Amey Kulkarni and V. Arunachalam, “FPGA Implementation Of Dynamic Memory Access Scheduler”, International Conference on Communication, Computers and Devices (ICCCD) 2010, December 2010.
HGST San Jose research center, San Jose, CA
“Secured Embedded Many-Core Accelerator for Big Data Processing”, March 2016.
NVIDIA, Santa Clara, CA
“Secured Embedded Many-Core Accelerator for Big Data Processing”, March 2016.
USC Information Sciences Institute, Arlington, VA
“Real-time Security for Many-Core through Machine Learning Approach”, May 2014.
Graduate Student Mentor 2013-Present
Guided final-year undergraduate and graduate students on various projects. Responsibilities included development of a simulator for the PENC many-core processor, as well as FPGA prototyping and ASIC implementation of supervised machine learning algorithms for biomedical applications.
Guest Lecturer
CMPE 650: Digital Systems January 2014
Directed students on efficient coding using Verilog HDL for digital hardware design, prepared notes and delivered a lecture on best practices for state machine design using Verilog HDL.
CMPE 415: Programmable Logic Devices January 2014
Delivered a lecture guiding students on correct usage of the full_case and parallel_case directives, with a practical demonstration on a Xilinx FPGA.
CMPE 650: Digital Systems April 2015
Delivered a lecture and demonstrated a UART interface in Verilog HDL in class. Guided students on effective use of Verilog HDL for the course project.
CMPE 415: Programmable Logic Devices April 2015
Delivered a lecture on synthesis techniques for asynchronous FIFO designs, with a practical demonstration on a Xilinx FPGA using RTL synthesis.
CMPE 650: Digital Systems March 2016
Prepared notes and delivered a lecture on supervised machine learning algorithms and their hardware implementation strategies for the course project.
CMPE 641: Advanced VLSI Design March 2016
Directed students in placing and routing the design using the Cadence Encounter tool, prepared notes and delivered a lecture on static timing analysis of the design.
Reviewer
ACM Transactions on Reconfigurable Technology and Systems
IEEE International Symposium on Quality Electronic Design (ISQED 2013)
IEEE International Symposium on Circuits and Systems (ISCAS 2014)
IEEE International Conference on Computer Design (ICCD 2015)
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2015)
ACM/IEEE Design Automation Conference (DAC 2016)
International Solid State Circuits Conference (ISSCC 2016, ISSCC 2015)
University of Maryland, Baltimore County Research Assistantship (2013 – Present)
Travel Grant award, IEEE Hardware Oriented Security and Trust (HOST) Conference Committee, May 2016
Travel Grant award, ACM Great Lakes Symposium on VLSI (GLSVLSI) Conference Committee, May 2016
Travel Grant award, Chair of Computer Science and Electrical Engineering Department, University of Maryland Baltimore County, March 2016
Graduate Student Travel Award, University of Maryland Baltimore County, May 2014
Best paper award nominee, International Conference and Workshop on Emerging Trends and Technology (ICWET), Mumbai, India, May 2011
Computer Architecture
ASIC Designs
Signal Processing for Big Data
Advanced Machine Learning
Probability and Random Processes
VLSI Design
Digital Image Processing
Memory System design
Fault-tolerant designs
Low-Power IC designs
Digital Signal Processing Hardware Implementation
Professor Tinoosh Mohsenin
Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County, Baltimore, MD http://www.csee.umbc.edu/~tinoosh/ tinoosh(at)umbc.edu
Dr. Youngok Pino
Computer Engineer
NAVSEA, Washington, D.C. youngok.ko(at)gmail.com
Professor Farinaz Koushanfar
Professor and Henry Booker Faculty Scholar of Electrical and Computer Engineering
University of California, San Diego, CA www.ece.rice.edu/~fk1/ farinaz(at)ucsd.edu
Professor Houman Homayoun
Department of Electrical and Computer Engineering
George Mason University https://ece.gmu.edu/~hhomayou/ hhomayou(at)gmu.edu