Version 2
Universidade Nacional da Irlanda em Galway -- Ciência sem Fronteiras
PhD Project Template

Use one form per project. Please complete and submit to international@nuigalway.ie as soon as possible, and by 27/11/2012. In your email, begin the subject line with [SWB] (be sure to use square brackets) to ensure that your email is filed correctly. Emails will be automatically filed.

PI name & contact details: Dr Michael Madden, michael.madden@nuigalway.ie, +353 91 493797

School: Information Technology, College of Engineering & Informatics

Has project been agreed with head (or nominee) of proposed registration school? Yes

Research Centre / group affiliation: Machine Learning & Data Mining Research Group

Research group / centre website: http://datamining.it.nuigalway.ie/

PI website / link to CV: http://datamining.it.nuigalway.ie/

Brief summary of PI research / research group / centre activity (2 or 3 lines max):
We focus on advances in techniques for machine learning and data mining, motivated by solving important and interesting practical applications. We work on classification algorithms, kernel methods, reinforcement learning, probabilistic inference, and data stream mining.

Title & brief description of PhD project (suitable for publication on web):
Comparison, Clustering and Search for Chemical Spectroscopy Data

The Machine Learning and Data Mining Group has conducted successful research over many years on applying machine learning methods to the field of chemometrics (statistical analysis of chemical data). This work has led to publications, patents and a spin-out company. Our work to date has focused on mixture analysis and classification tasks.
This project will extend our previous work and focus on other tasks of importance to spectroscopy:
• Comparison: given two spectra, evaluate on a numerical scale how similar they are to each other (from 0 if completely identical to 1 if completely different, i.e. having no materials in common).
• Clustering: given a comparison function, the difference between all pairs of spectra in a database can be computed, and the results used to identify clusters, i.e. groups of entries that are similar to each other and different from the rest.
• Search: given a comparison function and the spectrum of an unknown substance, a ranked list of the most similar substances in the database can be produced; furthermore, if clustering has been performed, the unknown substance can be positioned relative to the clusters.

Although standard methods for these tasks exist, we believe that we can improve on them, as we have done for classification tasks, by applying machine learning methods and by developing new methods whose underlying assumptions are well aligned with the characteristics of spectroscopy data.

Unique selling points of PhD project in NUI Galway:
The National University of Ireland Galway, founded in 1845, has a distinguished record in scholarship and research. The University enjoys a close relationship with Galway city, which is shaped by artistic communities, active student life, innovative industry and leading-edge research. The University's Structured PhD programme enables researchers to take advanced technical courses as well as to develop their research skills in the early stages of their PhD work.

The Machine Learning & Data Mining Group has been working on machine learning methods for analysis of chemistry data for many years. As well as producing publications, this work has led to patents and a spin-out company (Analyze IQ Limited).
The group has links to Chemistry researchers in NUI Galway and elsewhere, and strong international industry linkages. The group also collaborates with leading research groups at other universities internationally, such as the University of California Berkeley, the University of California Irvine and the University of Helsinki, Finland.

Name & contact details for project queries, if different from PI named above: As above.

Please indicate the disciplines whose graduates should apply: Computer Science, Software Engineering, Mathematics, or similar disciplines.

Ciência sem Fronteiras / Science Without Borders Priority Area:
Please indicate the specific programme priority area under which the proposed PhD project fits; choose only one (tick box):
• Engineering and other technological areas
• Pure and Natural Sciences (e.g. mathematics, physics, chemistry) / Physical Sciences (Mathematics, Physics, Chemistry, Biology and Geosciences)
• Health and Biomedical Sciences / Clinical, Pre-clinical and Health Sciences
• Information and Communication Technologies (ICTs), Computing
• Aerospace
• Pharmaceuticals
• Sustainable Agricultural Production
• Oil, Gas and Coal
• Renewable Energy
• Minerals, Minerals Technology
• Biotechnology
• Nanotechnology and New Materials
• Technologies for Prevention and Mitigation of Natural Disasters
• Bioprospecting and Biodiversity
• Marine Sciences
• Creative Industry
• New technologies in constructive engineering

Please indicate which of the following applies to this project (referring to Science Without Borders arrangements):
Suitable only as a Full PhD (Y/N): _____
Available to candidates seeking a Sandwich PhD arrangement (Y/N): _____
Suitable for either/Don't know: