PENGCHENG LI
CSB 631 • University of Rochester • Rochester, NY 14627
Phone: 585.732.8199 • E-mail: landy0220@gmail.com

Research Interests
Compilers, programming systems, parallelization techniques (automatic, runtime, speculative, transactional memory), locality analysis and memory management, cloud-based storage and computing

Education
Ph.D., Computer Science, University of Rochester | September, 2012 to Present
M.S., Institute of Computing Technology (ICT), Chinese Academy of Sciences | September, 2007 to July, 2010
• Recommended for admission, exempt from entrance examinations, to the master's program of ICT's Advanced Compiler Technology Lab
B.S., University of Science & Technology of China | September, 2003 to July, 2007
• Score: 90/100; rank: 6/150
• Freshman Scholarship (2003), Guang Hua Scholarship (2004), Huan Mao Tong Scholarship (2005), and Golden Scholarship (2006) for top 1% students

Work Experience
Research intern, NEC Laboratories America | May, 2014 to August, 2014
Research intern, Futurewei, Inc. (Huawei U.S. Research Lab) | June, 2013 to August, 2013
Senior engineer, Baidu, Inc. | July, 2007 to July, 2010

Research & Project Experience
Statistical Memory Allocator at UR
Graduate student research assistant | C++, Java, Python, Ruby | November, 2012 to Present
• Propose Liveness, a novel memory behavior analysis metric for object-oriented programs that describes the number of live objects in all time windows.
• Propose a real-time statistical analysis model and a behavior-oriented adaptive memory allocator that maximize performance and minimize memory consumption.

Data Race Detection in GPU Programs at UR
Graduate student research assistant | C, CUDA | November, 2013 to Present
• Implement a novel LLVM-based data structure expansion compiler framework for GPU programs that protects and isolates the memory accessed by different warps.
• Propose a novel two-run approach and a lightweight runtime library that exploit GPU hardware characteristics to detect data races in a sound and convergent way.
• Explore speculative parallelization to reduce performance overhead.

Runtime Code Layout Optimization at UR
Graduate student research assistant | C, C++ | May, 2013 to Present
• Use LLVM to select and encapsulate the code blocks to be scheduled.
• Schedule code blocks to the best addresses at runtime, based on our hierarchical affinity model and conflict-aware model.
• Target long-running real-world applications.

A Lightweight Speculative Parallelization Mechanism at NEC Lab
Research intern | C, C++ | May, 2014 to August, 2014
• Target sequential code, mainly array-based loops.
• Design a lightweight runtime dependence validation mechanism for array accesses.
• Performance: up to 4x speedup on the six programs tested.

Code Layout Optimization for Real-world Applications at Futurewei, Inc.
Research intern | C, C++ | June, 2013 to August, 2013
• Propose a novel hierarchical affinity locality model for code blocks.
• Study two code layout optimizations in LLVM: global function reordering and inter-procedural basic-block reordering. The latter had rarely been studied before.
• Performance: up to 26% improvement on SPEC CPU2006.

Baidu Flow Clean System at Baidu, Inc.
Senior engineer | C, C++ | June, 2010 to July, 2012
• Develop a custom multi-combination anti-attack traffic-cleaning system for all Baidu services.
• Design many anti-attack strategies, including numerous layer-3 and layer-7 strategies.
• Performance: forwarding rate of up to 6 million packets per second (PPS) on Linux kernel 2.6.32.

Baidu Gateway NAT System at Baidu, Inc.
Senior engineer | C, C++ | June, 2010 to July, 2012
• Study LVS (Linux Virtual Server) and develop Baidu's NAT system on a many-core platform.
• Performance: forwarding throughput of up to 10 Gbps with 64-byte packets.
• Develop many optimizations, including lock-free and storage optimizations.

Programming Model for Many-core Clusters at ICT, CAS
Graduate student research assistant | C, C++, UPC | February, 2008 to June, 2010
• Port the UPC runtime to a many-core architecture developed by ICT.
• Develop compile-time and runtime systems for hierarchical data distribution language extensions.
• Design compiler optimizations, including dynamic scratchpad memory (SPM) management and horizontal communication optimization.

A Static Program Slicing Tool for the Dawning 5000 Cluster at ICT, CAS
Graduate student research assistant | C, C++ | September, 2007 to January, 2008
• Study and develop a communication slicing tool that reduces the time needed to collect communication patterns. Performance improved by up to 90% on the NPB-MPI benchmarks.

A SIMD Optimization Compiler System at ICT, CAS
Graduate student research assistant | C, C++ | February, 2007 to June, 2007
• Design a data-contraction optimization that improves the performance of applu in SPEC2000 by 84%.
• Design a structure-reorganization optimization that improves the performance of mcf in SPEC2000 by 30%.
• Fix bugs in a source-to-source translation module for FORTRAN in the Open64 compiler.

Publications
Pengcheng Li, Chen Ding, and Hao Luo. “Modeling Heap Data Growth Using Average Liveness.” In 2014 International Symposium on Memory Management (ISMM '14), Edinburgh, Scotland, UK, June, 2014.
Pengcheng Li, Hao Luo, Chen Ding, Ziang Hu, and Handong Ye. “Code Layout Optimization for Defensiveness and Politeness in Shared Cache.” In 43rd International Conference on Parallel Processing (ICPP '14), Minneapolis, Minnesota, USA, September, 2014.
Chen Ding and Pengcheng Li. “Cache-Conscious Memory Management.” In 2014 Workshop on Memory System Performance and Correctness (MSPC '14), Edinburgh, Scotland, UK, June, 2014.
Hao Luo, Pengcheng Li, and Chen Ding. “Optimal Thread-to-Core Mapping for Pipeline Programs.” In 2014 Workshop on Memory System Performance and Correctness (MSPC '14), Edinburgh, Scotland, UK, June, 2014.
Pengcheng Li, Chen Ding, Xiaoyu Hu, and Tolga Soyata. “LDetector: A Low Overhead Race Detector for GPU Programs.” In 5th Workshop on Determinism and Correctness in Parallel Programming (WoDet '14), Salt Lake City, Utah, USA, March, 2014.
Pengcheng Li and Chen Ding. “All-window Liveness.” In 2013 Workshop on Memory System Performance and Correctness (MSPC '13), Seattle, Washington, USA, July, 2013.

IT Skills
• Proficient: C, C++, CUDA; familiar: Java, MPI, OpenMP, FORTRAN.
• Strong experience in software development on Linux and in scripting languages such as Ruby and shell.
• Proficient in LLVM and its analysis tools, such as inter-procedural analysis and symbolic analysis.
• Proficient in the Open64 compiler and its internal optimizations.
• Proficient in runtime technologies, parallel/concurrent programming, and multithreaded programming.
• Proficient in the LVS load-balancing system and mature, large-scale production load balancers.
• Solid knowledge of NVIDIA architectures, algorithm design and analysis, computer architecture, compiler theory, data structures, design patterns, operating systems, and computer networks.