UP-STAT 2016
Fifth Annual Joint Conference of the Upstate Chapters of the American Statistical Association
April 22-23, 2016, Canisius College, Buffalo, NY
Data Science, Statistical Practice, and Education

Organizing Committee
  Dr. Michael McDermott, University of Rochester (Program Chair)
  Dr. Ernest Fokoué, Rochester Institute of Technology (Data Competition Chair)
  Dr. Leonid Khinkis, Canisius College
  Dr. Yusuf Bilgic, SUNY Geneseo
  Mr. Christopher Claeys, KJT, Inc. (Webmaster)
  Mr. Padraic Neville, SAS Institute
  Dr. John Handley, PARC, Inc.
  Dr. Tanzy Love, University of Rochester

Local Organizing Committee
  Dr. Leonid Khinkis, Canisius College
  Dr. Mel Crotzer
  Dr. Adina Oprisan
  Dr. Jeff Miecznikowski
  Dr. Bruce Sun
  Dr. William Brady
  Dr. James Huard
  Dr. Debra Burhans
  All other Canisius College and ASA Buffalo members

Conference Venue
  Canisius College
  2001 Main St
  Buffalo, NY 14208

  Most events will take place in the Science Hall building. Wi-Fi is available on campus.
  On Friday from 9:00 to 11:00, Ernest Fokoué's tutorial will be held in the Horan O'Donnell Science building.
  Friday's cocktail hour, poster session, and dinner will take place in the Student Center.
  Conference participants will be able to park in the Science Hall parking ramp.

Conference Sponsors
  PARC, A Xerox Company – Data Competition Sponsor
  Canisius College
  Center for Quality and Applied Statistics – Rochester Institute of Technology (RIT)
  SAS
  State University of New York – College at Geneseo
  Revolution Analytics - Microsoft
  BlueCross BlueShield of Western New York
  M&T Bank
  iCitizen
  Institute For Autism Research - Canisius College
  American Statistical Association (ASA) – Rochester Chapter, Buffalo Chapter, Syracuse Chapter

Conference Website
  http://www.up-stat.org/


UP-STAT 2016 PROGRAM OUTLINE

Friday, April 22

8:00-9:00    Registration, Science Hall Commons

9:00-11:00   Tutorials
             A Gentle Introduction to Statistical Learning Theory for Data Science, Rm HO 109
               Ernest Fokoué, Rochester Institute of Technology
             Human Factors in Graph Design, Rm SH 1004
               Esa M. Rantanen, Rochester Institute of Technology

1:00-3:00    Tutorial
             Kaggle Predictive Analytics with Random Forests and Boosted Trees, Rm SH 1013 A
               Padraic Neville, SAS Institute

1:00-2:00    Tutorial
             Analyzing Gravitational Wave Data from the LIGO Open Science Center, Rm SH 1013 B
               John Whelan, Rochester Institute of Technology

2:00-3:00    Tutorial
             Practical Natural Language Processing, Rm SH 1013 B
               Emily Prud’hommeaux, Rochester Institute of Technology

3:15-4:15    Fr. Haus Memorial Mathematics Lecture, Rm SH 1013 AB
             Modeling the Effect of Age in Human Performance
               Richard De Veaux, C. Carlisle and Margaret Tippit Professor of Statistics,
               Department of Mathematics and Statistics, Williams College

4:30-5:55    Panel Discussion, Rm SH 1013 AB
             The Multiple Facets of Data Science
               Ernest Fokoué, Rochester Institute of Technology (Moderator)
               Richard De Veaux, Williams College
               Reneta Barneva, SUNY Fredonia
               John Coles, CUBRC
               Beth Sardina, HealthNow
               H. David Sheets, Canisius College
               Brian Sullivan, M&T Bank

6:00-7:00    Poster Session, Grupp, 2nd floor of Student Center

7:00-9:00    Conference Dinner, Grupp, 2nd floor of Student Center

Saturday, April 23

8:00-9:00    Breakfast/registration/orientation, Science Hall Commons

9:00-9:20    Welcome, Science Hall Commons

9:30-10:15   Parallel sessions, Science Hall Classrooms
             Session 1A: Epidemiology / Experimental Design, Rm SH 1013 A
             Session 1B: Advances in Clustering Methodology, Rm SH 1004
             Session 1C: Undergraduate/Graduate Statistics Education, Rm SH 1028
             Session 1D: Biostatistical Methods, Rm SH 1036
             Session 1E: Machine Learning, Rm SH 1017
             Session 1F: Geological Applications / Image Generation, Rm SH 1053

10:00-12:00  Tutorial
             Tidy Data Analysis in R with dplyr, ggplot2, and broom, Rm SH 1013 B
               David Robinson, Stack Overflow

10:25-11:10  Parallel sessions
             Session 2A: Multiple Outcomes in Biostatistics, Rm SH 1004
             Session 2B: High-Dimensional Data Analysis with Dimension Reduction, Rm SH 1028
             Session 2C: Infectious Disease Modeling, Rm SH 1053
             Session 2D: Analytics in Sports, Rm SH 1013 A
             Session 2E: Statistics in the Evaluation of Education, Rm SH 1017
             Session 2F: Data Science / Computing, Rm SH 1036

11:20-12:05  Parallel sessions
             Session 3A: Undergraduate Statistics Education, Rm SH 1053
             Session 3B: Health Policy Statistics, Rm SH 1013 A
             Session 3C: Statistical Applications in Physics, Rm SH 1004
             Session 3D: Biostatistical Modeling, Rm SH 1036
             Session 3E: Statistics in Music, Rm SH 1028
             Session 3F: Statistical Applications in Education, Rm SH 1017

12:15-1:15   Lunch / Poster Session

1:25-2:35    Session 4: Provost’s welcome and Keynote Lecture, Science Hall Commons

2:40-3:20    Data competition session (three 10-minute presentations)

3:30-4:15    Parallel sessions
             Session 5A: Innovative Methods for Missing Data, Rm SH 1004
             Session 5B: Use of Computing in Statistics Education, Rm SH 1028
             Session 5C: Bioinformatics, Rm SH 1053
             Session 5D: Methods to Enrich Statistics Education, Rm SH 1013 A
             Session 5E: Extreme Values / Sparse Signal Recovery, Rm SH 1013 B
             Session 5F: Machine Learning Applications, Rm SH 1036

4:25-5:00    Awards and wrap-up, Science Hall Commons


UP-STAT 2016 PROGRAM FOR SATURDAY, APRIL 23

SESSION 1A, Room 1013A
Epidemiology / Experimental Design
Session Chair: Donald Harrington

9:30-9:50    The statistics of suicides
               Jennifer Bready
               Division of Math and Information Technology, Mount Saint Mary College

9:55-10:15   Order-of-addition experiments
               Joseph Voelkel
               School of Mathematical Sciences, Rochester Institute of Technology

SESSION 1B, Room 1004
Advances in Clustering Methodology Motivated by Mixed Data Type Applications
Session Organizer/Chair: Rebecca Nugent, Department of Statistics, Carnegie Mellon University

9:30-9:45    Hitting the wall: mixture models of long distance running strategies §
               Joseph Pane
               Department of Statistics, Carnegie Mellon University

9:45-10:00   Prediction via clusters of CPT codes for improving surgical outcomes §
               Elizabeth Lorenzi
               Department of Statistical Science, Duke University

10:00-10:15  Clustering text data for record linkage
               Samuel Ventura
               Department of Statistics, Carnegie Mellon University

SESSION 1C, Room 1028
Undergraduate/Graduate Statistics Education
Session Chair: Ernest Fokoué

9:30-9:50    Psychological statistics and the Stockholm syndrome
               Susan Mason, Sarah Battaglia, and Sarah Ribble
               Department of Psychology, Niagara University

9:55-10:15   Interdisciplinary professional science master’s program on data analytics
             between SUNY Buffalo State and SUNY Fredonia
               Joaquin Carbonara and Valentin Brimkov
               Department of Mathematics, SUNY Buffalo State
               Reneta Barneva
               Department of Applied Professional Studies, SUNY Fredonia

SESSION 1D, Room 1036
Biostatistical Methods
Session Chair: Matt McCall

9:30-9:50    Detecting changes in matched case count data three ways: analysis of the impact
             of long-acting injectable antipsychotics on medical services in a Texas Medicaid
             population
               John C. Handley
               PARC, Inc.
               Douglas Brink, Janelle Sheen, Anson Williams, and Lawrence Dent
               Xerox Corporation
               Robert Berringer
               AllyAlign Health, Inc.

9:55-10:15   Bayesian approaches to missing data with known bounds: dental amalgams and the
             Seychelles Child Development Study §
               Chang Liu and Sally Thurston
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

SESSION 1E, Room 1017
Machine Learning
Session Chair: Michael McDermott

9:30-9:50    An introduction to ensemble methods for machine learning §
               Kenneth Tyler Wilcox
               School of Mathematical Sciences, Rochester Institute of Technology

9:55-10:15   Generalization of training error bounds to test error bounds of the boosting algorithm §
               Paige Houston and Ernest Fokoué
               School of Mathematical Sciences, Rochester Institute of Technology

SESSION 1F, Room 1053
Geological Applications / Image Generation
Session Chair: Tanzy Love

9:30-9:50    A new approach to quantifying stratigraphic resolution: measurements of accuracy
             in geological sequences
               H. David Sheets
               Department of Physics, Canisius College

9:55-10:15   Image generation in the era of deep architectures
               Ifeoma Nwogu
               Department of Computer Science and Engineering, SUNY at Buffalo

SESSION 2A, Room 1004
Strategies for Analyzing Multiple Outcomes in Biostatistics
Session Organizer/Chair: Amy LaLonde, Department of Biostatistics and Computational Biology, University of Rochester Medical Center

10:25-10:45  Clustering multiple outcomes via the Dirichlet process prior §
               Amy LaLonde and Tanzy Love
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

10:50-11:10  Global tests for multiple outcomes in randomized trials §
               Donald Hebert
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

SESSION 2B, Room 1028
New Progress in High-Dimensional Data Analysis with Dimension Reduction
Session Organizer/Chair: Wei Qian, School of Mathematical Sciences, Rochester Institute of Technology

10:25-10:40  Consistency and convergence rate for the nearest subspace classifier
               Yi Wang
               Department of Mathematics, Syracuse University

10:40-10:55  A new approach to sparse sufficient dimension reduction with applications to
             high-dimensional data analysis
               Wei Qian
               School of Mathematical Sciences, Rochester Institute of Technology

10:55-11:10  Tensor sliced inverse regression with application to neuroimaging data analysis
               Shanshan Ding
               Department of Applied Economics and Statistics, University of Delaware

SESSION 2C, Room 1053
Statistical Methods and Computational Tools for Modeling the Spread of Infectious Diseases
Session Organizer/Chair: Samuel Ventura, Carnegie Mellon University

10:25-10:45  Statistical and computational tools for generating synthetic ecosystems §
               Lee Richardson
               Department of Statistics, Carnegie Mellon University

10:50-11:10  An overview of statistical modeling of infectious diseases §
               Shannon Gallagher
               Department of Statistics, Carnegie Mellon University

SESSION 2D, Room 1013A
Analytics in Sports
Session Chair: Tanzy Love

10:25-10:45  Statistics and data analytics in sports management
               Reneta Barneva
               Department of Applied Professional Studies, SUNY Fredonia
               Valentin Brimkov
               Department of Mathematics, SUNY Buffalo State
               Patrick Hung and Kamen Kanev
               Faculty of Business and Information Technology, University of Ontario Institute of Technology

10:50-11:10  Predictive analytics tools for identifying the keys to victory in professional tennis §
               Shruti Jauhari and Aniket Morankar
               Department of Computer Science, Rochester Institute of Technology
               Ernest Fokoué
               School of Mathematical Sciences, Rochester Institute of Technology

SESSION 2E, Room 1017
Statistics in the Evaluation of Education
Session Chair: Ernest Fokoué

10:25-10:45  An analysis of covariance challenge to the Educational Testing Service
               Richard Escobales
               Department of Mathematics and Statistics, Canisius College
               Ronald Rothenberg
               Department of Mathematics, CUNY, Queens College

10:50-11:10  Model, model, my dear model! Tell me who the most effective is
               Yusuf K. Bilgic
               Department of Mathematics, SUNY Geneseo

SESSION 2F, Room 1036
Data Science / Computing
Session Chair: Padraic Neville

10:25-10:45  An example of four data science curveballs §
               Tamal Biswas and Kenneth Regan
               Department of Computer Science and Engineering, SUNY at Buffalo

10:50-11:10  Initializing R objects using data frames: tips, tricks, and some helpful code to
             get you started
               Donald Harrington
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

SESSION 3A, Room 1053
Undergraduate Statistics Education
Session Organizer/Chair: Rebecca Nugent, Department of Statistics, Carnegie Mellon University

11:20-11:45  Undergraduate statistics: How do students get there? What happens when they
             leave? (and everything in between . . .)
               Rebecca Nugent and Paige Houser
               Department of Statistics, Carnegie Mellon University

11:45-12:05  Panel discussion

SESSION 3B, Room 1013A
Health Policy Statistics
Session Organizer/Chair: Xueya Cai, Department of Biostatistics and Computational Biology, University of Rochester Medical Center

11:20-11:35  Application of two stage residual inclusion (2SRI) model in testing the impact of
             length of stay on short-term readmission rate
               Xueya Cai
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

11:35-11:50  Health Services and Outcomes Research Methodology (HSORM): an international
             peer-reviewed journal for statistics in health policy
               Yue Li
               Department of Public Health Sciences, University of Rochester Medical Center

11:50-12:05  Jackknife empirical likelihood inference for inequality and poverty measures
               Dongliang Wang
               Department of Public Health and Preventive Medicine, Upstate Medical University

SESSION 3C, Room 1004
Statistical Applications in Physics
Session Chair: Tanzy Love

11:20-11:40  Investigation of statistical and non-statistical fluctuations observed in
             relativistic heavy-ion collisions at AGS and CERN energies
               Gurmukh Singh
               Department of Computer and Information Sciences, SUNY Fredonia
               Provash Mali and Amitabha Mukhopadhyay
               Department of Physics, North Bengal University, India

11:45-12:05  Analysis of big data at the Jefferson Lab
               Michael Wood
               Department of Physics, Canisius College

SESSION 3D, Room 1036
Biostatistical Modeling
Session Chair: John C. Handley

11:20-11:40  Techniques for estimating time-varying parameters in a cardiovascular disease
             dynamics model §
               Jacob Goldberg
               Department of Mathematics, SUNY Geneseo

11:45-12:05  Prediction of event times with a parametric modeling approach in clinical trials §
               Chongshu Chen
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

SESSION 3E, Room 1028
Statistics in Music
Session Chair: Michael McDermott

11:20-11:40  Can statistics make me a millionaire music producer? §
               Jessica Young
               School of Mathematical Sciences, Rochester Institute of Technology

11:45-12:05  Automatic singer identification of popular music via singing voice separation §
               Shiteng Yang
               School of Mathematical Sciences, Rochester Institute of Technology

SESSION 3F, Room 1017
Statistical Applications in Education
Session Chair: Ernest Fokoué

11:20-11:40  Monte Carlo-based enrollment projections
               Matthew Hertz
               Department of Computer Science, Canisius College

11:45-12:05  Program for International Student Assessment analysis in R §
               Kierra Shay
               Department of Mathematics, SUNY Geneseo

SESSION 4, Science Commons
Keynote Lecture
Session Chair: Ernest Fokoué, Rochester Institute of Technology

1:25-2:35    From the classroom to the boardroom – how do we get there?
               Richard De Veaux
               Department of Mathematics and Statistics, Williams College

SESSION 5A, Room 1004
Innovative Techniques to Address Missing Data in Biostatistics
Session Organizer/Chair: Valeriia Sherina, Department of Biostatistics and Computational Biology, University of Rochester Medical Center

3:30-3:45    Challenges in estimating model parameters in qPCR §
               Valeriia Sherina, Matthew McCall, and Tanzy Love
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

3:45-4:00    Semiparametric inference concerning the geometric mean with detection limit §
               Bokai Wang, Changyong Feng, Hongyue Wang, and Xin Tu
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

4:00-4:15    Two data analysis methods concerning missing data §
               Lin Ge
               Department of Biostatistics and Computational Biology, University of Rochester Medical Center

SESSION 5B, Room 1028
Use of Computing in Statistics Education
Session Chair: Ernest Fokoué

3:30-3:50    Teaching programming skills to finance students: how to design and teach a great course
               Yuxing Yan
               Department of Economics and Finance, Canisius College

3:55-4:15    The biomarker challenge R package: a two stage array/validation in-class exercise §
               Luther Vucic, Dietrich Kuhlmann, and Daniel Gaile
               Department of Biostatistics, SUNY at Buffalo
               Jeremiah Grabowski
               School of Public Health and Health Professions, SUNY at Buffalo

SESSION 5C, Room 1053
Bioinformatics
Session Chair: Matt McCall

3:30-3:50    A short survey of computational structural proteomics research using the Protein
             Data Bank (PDB) as the main data set
               Vicente M. Reyes
               Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology

3:55-4:15    A novel and quick method to power pilot studies for the comparison of assay
             platforms under controlled specificity §
               Zhuolin He, Ziqiang Chen, and Daniel Gaile
               Department of Biostatistics, SUNY at Buffalo

SESSION 5D, Room 1013A
Methods to Enrich Statistics Education
Session Chair: Tanzy Love

3:30-3:50    Enriching statistics classes in K-12 schools by adding practical ideas from the
             history of mathematics and statistics §
               Mucahit Polat and Celal Aydar
               Graduate School of Education, SUNY at Buffalo

3:55-4:15    Multiple linear regression with sports data §
               Matthew D’Amico, Jake Ryder, and Yusuf Bilgic
               Department of Mathematics, SUNY Geneseo

SESSION 5E, Room 1013B
Extreme Values / Sparse Signal Recovery
Session Chair: John C. Handley

3:30-3:50    EXTREME statistics: data analysis for disasters §
               Nicholas LaVigne
               Department of Mathematics, SUNY Geneseo

3:55-4:15    Minimax optimal sparse signal recovery with Poisson statistics
               Mohammad Rohban
               Broad Institute of Harvard and MIT

SESSION 5F, Room 1036
Machine Learning Applications
Session Organizer/Chair: Gabriela Olinto, School of Mathematical Sciences, Rochester Institute of Technology, and Soleo Communications

3:30-3:50    Evolutionary weighting scheme for random subspace learning §
               André Lobato Ramos
               School of Mathematical Sciences, Rochester Institute of Technology

3:55-4:15    Natural language processing for automatic keyword extraction and topic summarization §
               Gabriela Olinto
               School of Mathematical Sciences, Rochester Institute of Technology, and Soleo Communications
               William Consagra
               Soleo Communications

§ Indicates presentations that are eligible for the student presentation awards