The Machine Learning Salon Starter Kit
Jacqueline Isabelle Forien
1st Edition - Summer 2015

ABOUT
The Machine Learning Salon Starter Kit
Founder of The Machine Learning Salon
MOOC, Opencourseware in English
COURSERA: Machine Learning Stanford Course
COURSERA: Practical Machine Learning
COURSERA: Neural Networks for Machine Learning
COURSERA: Data Science Specialization
COURSERA: Reasoning, Data Analysis and Writing Specialization
COURSERA: Data Mining Specialization
COURSERA: Cloud Computing Specialization
COURSERA: Miscellaneous
STANFORD University: Stanford Engineering Everywhere
STANFORD University: 2015 Stanford HPC Conference Video Gallery
STANFORD University: Awni Hannun of Baidu Research
STANFORD University: Steve Cousins of Savioke
STANFORD University: Ron Fagin of IBM Research
STANFORD University: CS224d: Deep Learning for Natural Language Processing by Richard Socher, 2015
EdX: Artificial Intelligence (BerkeleyX)
EdX: Big Data and Social Physics (Ethics)
EdX: Introduction to Computational Thinking and Data Science
MIT OpenCourseWare (OCW)
VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos
Foundations of Machine Learning by Mehryar Mohri - 10 years of Homeworks with Solutions and Lecture Slides
Carnegie Mellon University (CMU) Video resources
CMU: Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani
CMU: Machine Learning, Spring 2011, by Tom Mitchell
CMU: 10-601 Machine Learning Spring 2015 - Lecture 18 by Maria-Florina Balcan
CMU: 10-601 Machine Learning Spring 2015, Homeworks & Solutions & Code (Matlab)
CMU: 10-601 Machine Learning Spring 2015 - Recitation 10 by Kirstin Early
CMU: Abulhair Saparov's YouTube Channel
CMU: Machine Learning Course by Roni Rosenfeld, Spring 2015
CMU: Language and Statistics by Roni Rosenfeld, Spring 2015
Metacademy Concept list and roadmap list
HARVARD University: Advanced Machine Learning, Fall 2013
HARVARD University: Data Science Course, Fall 2013
OXFORD University: Nando de Freitas Video Lectures
OXFORD University: Deep learning - Introduction by Nando de Freitas, 2015
OXFORD University: Deep learning - Linear Models by Nando de Freitas, 2015
OXFORD University: Yee Whye Teh Home Page, Department of Statistics, University College
CAMBRIDGE University: Machine Learning Slides, Spring 2014
CALTECH University: Learning from Data
UNIVERSITY COLLEGE LONDON (UCL): Discovery
UCL: Supervised Learning by Mark Herbster
Yann LeCun's Publications
Ecole Normale Superieure: Francis Bach, Courses and Exercises with solutions (English-French)
Technion, Israel Institute of Technology, Machine Learning Videos
E0 370: Statistical Learning Theory by Prof. Shivani Agarwal, Indian Institute of Science
NPTEL, National Programme on Technology Enhanced Learning, India
Pattern Recognition Class, Universität Heidelberg, 2012 (Videos in English)
Videolectures.net
MLSS Machine Learning Summer Schools Videos
GoogleTechTalks
Udacity Opencourseware
Udacity's Videos
Mathematicalmonk Machine Learning
Judea Pearl Symposium
SIGDATA, Indian Institute of Technology Kanpur
Hakka Labs
Open Yale Course
COLUMBIA University: Machine Learning resources
COLUMBIA University: Applied Data Science by Ian Langmore and Daniel Krasner
Deep Learning
BigDataWeek Videos
Neural Information Processing Systems Foundation (NIPS) Video resources
NIPS 2014 Workshop Videos
NIPS 2014 Workshop - (Bengio) OPT2014 Optimization for Machine Learning
Hong Kong Open Source Conference 2013 (English & Chinese)
ICLR 2014 Videos
ICLR 2013 Videos
Machine Learning Conference Videos
Internet Archive
UC Berkeley
AMP Camps, Big Data Bootcamp, UC Berkeley
AI on the Web, AIMA (Artificial Intelligence: A Modern Approach) by Stuart Russell and Peter Norvig
Resources and Tools of Noah's ARK Research Group
ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014
The Royal Society
Statistical and causal approaches to machine learning by Professor Bernhard Schölkopf
Deep Learning RNNaissance with Dr. Juergen Schmidhuber
Introduction to Deep Learning with Python by Alec Radford
A Statistical Learning/Pattern Recognition Glossary by Thomas Minka
The Kalman Filter Website by Greg Welch and Gary Bishop
Lisbon Machine Learning School (LXMLS)
LXMLS Slides, 2014
INTRODUCTORY APPLIED MACHINE LEARNING by Victor Lavrenko and Nigel Goddard, University of Edinburgh, 2011
Data Mining and Machine Learning Course Material by Bamshad Mobasher, DePaul University, Fall 2014
Intelligent Information Retrieval by Bamshad Mobasher, DePaul University, Winter 2015
Student Dave YouTube Channel
Current Courses of Justin E. Esarey, RICE University
From Bytes to Bites: How Data Science Might Help Feed the World by David Lobell, Stanford University
Conference on Empirical Methods in Natural Language Processing (and forerunners) (EMNLP)
Columbia University's Laboratory for Intelligent Imaging and Neural Computing (LIINC)
Enabling Brain-Computer Interfaces for Labeling Our Environment by Paul Sajda
The Unreasonable Effectiveness of Deep Learning by Yann LeCun, Sept 2014
Machine Learning by Prof. Shai Ben-David, University of Waterloo, Lectures 1-3, Jan 2015
Computer Vision by Richard E. Turner, Slides, Exercises & Solutions, University of Cambridge
Probability and Statistics by Carl Edward Rasmussen, Slides, University of Cambridge
Machine Learning by Carl Edward Rasmussen, Slides, University of Cambridge
Seth Grimes's videos
Introduction to Reinforcement Learning by Shane Conway, Nov 2014
Machine Learning and Data Mining by Prof. Dr. Volker Tresp, 2014, LMU
Applied Machine Learning by Joelle Pineau, Fall 2014, McGill University
Analyzing data from the city of Montreal
Artificial Intelligence by Joelle Pineau, Winter 2014-2015, McGill University
Talking Machines: The History of Machine Learning from the Inside Out
The Simons Institute for the Theory of Computing
DIKU - Datalogisk Institut, Københavns Universitet YouTube Channel
Hashing in machine learning by John Langford, Microsoft Research
Dimensionality reductions by Alexander Andoni, Microsoft Research
RE.WORK Deep Learning Summit Videos, San Francisco 2015
Machine Learning Tutorial, UNSW Australia
Oxford's Podcast
Natural Language Processing by Mohamed Alaa El-Dien Aly, 2014, KAUST
QUT - Queensland University of Technology, Brisbane, Australia
Data & Society
Open Book for people with autism
NUMDAM, search and download of digitized mathematics journal archives
Project Euclid, mathematics and statistics online
Statistical Modeling: The Two Cultures by Leo Breiman, 2001
mini-DML
MISCELLANEOUS
The Automatic Statistician project
A selection of YouTube's featured channels
Introduction To Modern Brain-Computer Interface Design by Swartz Center for Computational Neuroscience
Distributed Computing Courses (lectures, exercises with solutions) by ETH Zurich, Group of Prof. Roger Wattenhofer
The wonderful and terrifying implications of computers that can learn | Jeremy Howard | TEDxBrussels
Partially Derivative, a podcast about data, data science, and awesomeness!
Class Central
Beginning to Advanced University CS Courses
WIRED UK YouTube Channel
Davos 2015 - A Brave New World - How will advances in artificial intelligence, smart sensors and social technology change our lives?
World Economic Forum
The Global Gender Gap Report
The LINCS project
Australian Academy of Science
Artificial intelligence: Machines on the rise
Bill Gates Q&A on Reddit
Second Prize went to Yarin Gal for his extrapolated art image, Cambridge University Engineering Photo Competition
Draw from a Deep Gaussian Process by David Duvenaud, Cambridge University Engineering Photo Competition
MOOC, Opencourseware in Spanish
MOOC, Opencourseware in German
MOOC, Opencourseware in Italian
MOOC, Opencourseware in French
France Université Numérique (FUN)
FUN: MinesTelecom: 04006 Fondamentaux pour le Big Data
Université Laval (French Canadian)
Théorie algorithm. des graphes
Hugo Larochelle, Apprentissage automatique, French Canadian
Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French)
College de France, Mathematics and Digital Science, French
Le Laboratoire de Recherche en Informatique (LRI)
MOOC, Opencourseware in Russian
Russian Machine Learning Resources
The Yandex School of Data Analysis
Alexander D'yakonov Resources
MOOC, Opencourseware in Japanese
MOOC, Opencourseware in Chinese
Yeeyan Coursera Chinese Classroom
Hong Kong Open Source Conference 2013
Guokr.com
MOOC, Opencourseware in Portuguese
Aprendizado de Maquina by Bianca Zadrozni, Instituto de Computação, UFF, 2010
Algoritmo de Aprendizado de Máquina by Aurora Trinidad Ramirez Pozo, Universidade Federal do Paraná, UFPR
Digital Library, Universidade de São Paulo
MOOC, Opencourseware in Hebrew
Open University of Israel
Homeworks, Assignments & Solutions
CS229 Stanford Machine Learning List of projects (free access to abstracts), 2013 and previous years
CS229 Stanford Machine Learning by Andrew Ng, Autumn 2014
CS 445/545 Machine Learning by Melanie Mitchell, Winter Quarter 2014
Introduction to Machine Learning, Machine Learning Lab, University of Freiburg, Germany
Unsupervised Feature Learning and Deep Learning by Andrew Ng, 2011 ?
Machine Learning by Andrew Ng, 2011
Pattern Recognition and Machine Learning, Solutions to Exercises, by Markus Svensen and Christopher Bishop, 2009
Machine Learning Course by Aude Billard, Exercises & Solutions, EPFL, Switzerland
T-61.3025 Principles of Pattern Recognition Weekly Exercises with Solutions (in English), Aalto University, Finland, 2015
T-61.3050 Machine Learning: Basic Principles Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
CSE-E5430 Scalable Cloud Computing Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
Weekly Exercises with Solutions (in English) from Aalto University, Finland
SurfStat Australia: an online text in introductory Statistics
Learning from Data by Amos Storkey, Tutorial & Worksheets (with solutions), University of Edinburgh, Fall 2014
Web Search and Mining by Christopher Manning and Prabhakar Raghavan, Winter 2005
Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Spring 2014
Introduction to Time Series by Peter Bartlett, Berkeley, Homework & solutions, Fall 2010
Introduction to Machine Learning by Stuart Russell, CS 194-10, Fall 2011, Assignments & Solutions
Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Fall 2009
Advanced Topics in Machine Learning by Arthur Gretton, 2015, University College London (exercises with solutions)
Reinforcement Learning by David Silver, 2015, University College London (exercises with solutions)
Emmanuel Candes Lectures, Homeworks & Solutions, Stanford University (great resources, not to be missed!)
Advanced Topics in Convex Optimization by Emmanuel Candes, Handouts, Homeworks & Solutions, Winter 2015, Stanford University
MSM 4M13 Multicriteria Decision Making by Sándor Zoltán Németh, School of Mathematics, University of Birmingham
10-601 Machine Learning Spring 2015, Homeworks & Solutions & Code (Matlab)
Introduction to Machine Learning by Alex Smola, CMU, Homeworks & Solutions
Applications
MIT Media Lab
TEDx San Francisco, Connected Reality
Emotion&Pain Project
IBM Research
EPFL, Ecole Polytechnique Fédérale de Lausanne
Visualizing MBTA Data: An interactive exploration of Boston's subway system
Commercial Applications
Google Glass
Google self-driving car
SenseFly
How Microsoft's Machine Learning Is Breaking the Global Language Barrier
RESEARCH PAPERS, in English
Cambridge University Publications page
arXiv.org by Cornell University Library
Google Scholar
Google Research
Yahoo Research
Microsoft Research
Journal from MIT Press
DROPS, Dagstuhl Research Online Publication Server
OPEN SOURCE SOFTWARE, in English
Weka 3: Data Mining Software in Java
A deep-learning library for Java
List of Java ML Software by Machine Learning Mastery
List of Java ML Software by MLOSS
MathFinder: Math API Discovery and Migration, Software Engineering and Analysis Lab (SEAL), IISc Bangalore
Google Java Style
JSAT: java-statistical-analysis-tool by Edward Raff
Theano Library for Deep Learning, Python
Theano and LSTM for Sentiment Analysis by Frederic Bastien, Universite de Montreal
Introduction to Deep Learning with Python
COURSERA: An Introduction to Interactive Programming in Python (Part 1)
COURSERA: An Introduction to Interactive Programming in Python (Part 2)
COURSERA: Programming for Everybody (Python)
Udacity - Programming foundations with Python
Scikit-learn, Machine Learning in Python
Pydata
PyData NYC 2014 Videos
PyData, The Complete Works by Rohit Sivaprasad
Anaconda
IPython Interactive Computing
Scipy
Numpy
matplotlib
pandas
SymPy
Orange
Pythonic Perambulations: How to be a Bayesian in Python
emcee
PyMC
Pylearn2
PyCon US 2014
PyCon India 2012
PyCon India 2013
Montreal Python
SciPy 2014
PyLadies London Meetup resources
Python Tools for Machine Learning by CB Insights
Python Tutorials by Jessica MacKellar
INTRODUCTION TO PYTHON FOR DATA MINING
Notebook Gallery: Links to the best IPython and Jupyter Notebooks by ?
Google Python Style Guide
Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
PyBrain Library
Classifying MNIST dataset with Pybrain
OCTAVE
PMTK Toolbox by Matt Dunham, Kevin Murphy
Octave Tutorial by Paul Nissenson
JULIA
Julia by example by Samuel Colvin
The R PROJECT for Statistical Computing
Coursera: R Programming
R Graph Gallery
Code School - R Course
Coursera R programming
Open Intro R Labs
R Tutorial
DataCamp R Course
R Bloggers
R-Project Package: caret: Classification and Regression Training
A Short Introduction to the caret Package by Max Kuhn
R packages by Hadley Wickham
Google's R Style Guide
STAN Software
List of Machine Learning Open Source Software
Google Prediction API
Reddit
SHOGUN toolbox
Comparison between ML toolboxes
Infer.NET, Microsoft Research
F# Software Foundation
BigML
BRML Toolbox in Matlab/Julia by David Barber, University College London
SCILAB
OverFeat and Torch7, CILVR Lab @ NYU
FAIR open-sources deep-learning modules for Torch
IPython kernel for Torch with visualization and plotting
Deep Learning Lecture 9: Neural networks and modular design in Torch by Nando de Freitas, Oxford University
Deep Learning Lecture 8: Modular back-propagation, logistic regression and Torch
Machine Learning with Torch7: Defining your own Neural Net Module
Lua Tutorial in 15 Minutes by Tyler Neylon.............................................152 Google: Punctuation, symbols & operators in search.................................152 WolframAlpha............................................................................................152 Computation and the Future of Mathematics by Stephen Wolfram, Oxford's Podcast........................................................................................................153 Mloss.org....................................................................................................153 Sourceforge.................................................................................................153 AForge.NET Framework............................................................................153 cuda-convnet..............................................................................................153 word2vec.....................................................................................................154 Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer, and John Langford, August 2014..........................................154 Maxim Milakov Software...........................................................................154 Alfonso Nieto-Castanon Software..............................................................154 Lib Skylark..................................................................................................155 Mutual Information Text Explorer............................................................155 Data Science Resources by Jonathan Bower on GitHub...........................155 Joseph Misiti Blog.......................................................................................156 Michael Waskom GitHub repositories.......................................................156 Visualizing distributions of data.................................................................156 Exploring Seaborn and Pandas 
based plot types in HoloViews by Philipp John Frederic Rudiger.........................................................................................157 "Machine Learning: An Algorithmic Perspective" Code by Stephen Marsland.....157 Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!)..............................................................................................157 Open Source Hong Kong..........................................................................158 LAMDA Group, Nanjing University............................................................158 GATE, General Architecture for Text Engineering...................................158 CLARIN, Common Language Resources and Technology Infrastructure.....159 FLaReNet, Fostering Language Resources Network..................................159 My Data Science Resources by Viktor Shaumann.....................................159 MISCELLANEOUS..................................................................................160 Overleaf (formerly WriteLaTeX).........................................................................160 Interview of Dr John Lees-Miller by Imperial College London ACM Student Chapter.......................................................................................................160 LISA Lab GitHub repository, Université de Montréal .............................160 MILA, Institut des algorithmes d'apprentissage de Montréal, Montreal Institute for Learning Algorithms.............................................................................161 Vowpal Wabbit GitHub repository by John Langford...............................161 Google-styleguide: Style guides for Google-originated open-source projects.....161 BIG DATA/CLOUD COMPUTING, in English.............162 Apache Spark Machine Learning Library.................................................162 AMP Camp, Big Data Boot Camp...............................................................162 Spark 
Summit 2013 Videos .......................................................................162 Spark Summit 2014 Videos .......................................................................162 Spark Summit 2015 Videos & Slides..........................................................163 Spark Summit Training & Videos..............................................................163 Databricks Videos.......................................................................................163 SF Scala & SF Bay Area Machine Learning, Joseph Bradley: Decision Trees on Spark...........................................................................................................163 Apache Mahout ML library.......................................................................163 Apache Mahout on JavaWorld....................................................................164 MapReduce programming with Apache Hadoop, 2008............................164 Hadoop Users Group UK..........................................................................164 Deeplearning4j...........................................................................................164 Udacity opencourseware "Intro to Hadoop and MapReduce" ................165 Apache Storm............................................................................166 Scaling Apache Storm by Taylor Goetz.....................................................166 Michael Vogiatzis Blog .............................................................................166 PredictionIO..............................................................................................166 PredictionIO tutorial - Thomas Stone - PAPIs.io '14.................................166 Container Cluster Manager.......................................................................167 Domino Data Labs.....................................................................................167 Data Science 
Central..................................................................................168 Amazon Web Services Videos....................................................................168 Google Cloud Computing Videos..............................................................168 VLAB: Deep Learning: Intelligence from Big Data, Stanford Graduate School of Business..................................................................................................168 Machine Learning and Big Data in Cyber Security by Eyal Kolman, Technion Lecture .......................................................................................................168 Chaire Machine Learning Big Data, Telecom Paris Tech (Videos in French).....168 An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014...................................................................................169 Big Data Requires Big Visions For Big Change | Martin Hilbert | TEDxUCL.....170 Ethical Quandary in the Age of Big Data | Justin Grace | TEDxUCL...170 Big Data & Dangerous Ideas | Daniel Hulme | TEDxUCL....................171 List of good free Programming and Data Resources, BITBOOTCAMP.171 BIG Data, Medical Imaging and Machine Intelligence by Professor H.R. Tizhoosh at the University of Waterloo.............................................172 Session 6: Science in the cloud: big data and new technology...................172 MapReduce for C: Run Native Code in Hadoop by Google Open Source Software......................................................................................................172 Machine Learning & Big Data at Spotify with Andy Sloane, Big Data Madison Meetup.......................................................................................................173 Hands-on tutorial on Neo4j with Max De Marzi, Big Data Madison Meetup.....173 TED Talk: What do we do with all this big data? 
by Susan Etlinger.........173 Big Data's Big Deal by Viktor Mayer-Schonberger, Oxford's Podcast.......173 BID Data Project - Big Data Analytics with Small Footprint.....................174 SF Big Analytics and SF Machine learning meetup: Machine Learning at the Limit by Prof. John Canny.........................................................................174 COMPETITIONS, in English...........................................176 Angry Birds AI Competition......................................................................176 ChaLearn...................................................................................................176 ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015).....177 Kaggle........................................................................................................177 Kaggle Competition Past Solutions............................................................177 Kaggle Connectomics Winning Solution Research Article........................177 Solution to the Galaxy Zoo Challenge by Sander Dieleman.....................177 Winning two Kaggle in-class competitions on spam......................................178 Matlab Benchmark for Packing Santa’s Sleigh translated into Python.........178 Machine learning best practices we've learned from hundreds of competitions by Ben Hamner (Kaggle)................................................................................178 TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms).................................................................................................178 CrowdANALYTIX..................................................................................178 Challenges for governmental applications..................................................178 InnoCentive Challenge Center..................................................................178 
TunedIT.....................................................................................................179 Ants, AI Challenge, sponsored by Google, 2011........................................179 International Collegiate Programming Contest...........................................179 DREAM Challenges.......................................................................................179 Texata.........................................................................................................180 IoT World Forum Young Women's Innovation Grand Challenge.............180 COMPETITIONS, in French............................................182 COMPETITIONS, in Russian...........................................182 Russian AI Cup - Competition in Programming Artificial Intelligence.........182 OPEN DATASET, in English.............................................183 Friday Lunchtime Lectures at the Open Data Institute, Videos, slides and podcasts (not to be missed!)........................................................................183 Open Data Institute: Certify your open data..............................................183 The Text REtrieval Conference (TREC) Datasets.....................................183 HDX Humanitarian Data Exchange.........................................................184 World Data Bank........................................................................................185 US Dataset.................................................................................................185 US City Open Data Census.......................................................................186 Machine Learning repository.....................................................................186 ImageNet..............................................................................................186 Stanford Large Network Dataset Collection..............................................187 Deep Learning 
datasets..............................................................................187 Open Government Data (OGD) Platform India.......................................188 Yahoo Datasets...........................................................................................188 Windows Azure Marketplace.....................................................................188 Amazon Public Data Sets...........................................................................188 Wikipedia: Database Download.................................................................189 Project Gutenberg (free books available in different formats, useful for NLP).....189 Freebase......................................................................................................189 Datamob Data............................................................................................189 Reddit Datasets...........................................................................................189 100+ Interesting Data Sets for Statistics....................................................189 Data portal of the City of Chicago............................................................190 Data portal of the City of Seattle..............................................................190 Data portal of the City of LA....................................................................190 California Department of Water Resources..............................................190 Data portal of the City of Dallas...............................................................191 Data portal of the City of Austin...............................................................191 How to produce and use datasets: lessons learned, mlwave.......................191 MITx and HarvardX release MOOC datasets and visualization tools.....192 Finding the perfect house using open data, Justin Palmer’s Blog...............192 
Synapse.......................................................................................................192 NYC Taxi Trips Data from 2013...............................................................192 Sebastian Raschka’s Dataset Collections....................................................192 Awesome Public Datasets by Xiaming Chen, Shanghai, China................192 UK Dataset.................................................................................................193 LONDON DATASTORE - 601 datasets found (28-08-2015)..................193 Transport for London Open Data, UK....................................................193 Gaussian Processes List of Datasets...........................................................193 The New York Times Linked Open Data .................................................194 Google Public Data Explorer.....................................................................194 The Million Song Dataset..........................................................................195 CrowdFlower Open Data Library...............................................................195 OPEN DATASET, in French..............................................196 Montreal, Portail Donnees Ouvertes (Open Data Portal, French & English), Canada..............196 Insee, France...............................................................................................196 RATP Open Data, Paris public transport, France......................................196 L’Open-Data français cartographié (French open data mapped)...........................196 OPEN DATASET, China...................................................197 LAMDA Group.............................................................................................197 DATA VISUALIZATION..................................................198 Visualization Lab Gallery, Computer Science Division, University of California, 
Berkeley......................................................................................................198 Visualization Lab Software, Computer Science Division, University of California, Berkeley....................................................................................200 Visualization Lab Course Wiki, Computer Science Division, University of California, Berkeley....................................................................................200 Mike Bostock..............................................................................................200 Eyeo Festival...............................................................................................200 MIT Data Collider.....................................................................................200 D3.js Data-Driven Documents..................................................................200 Shan He, Research Fellow at MIT Senseable City Lab.............................201 Gource software version control visualization ...........................................201 Logstalgia, website access log visualization................................................201 Andrew Caudwell's Blog............................................................................201 MLDemos, EPFL, Switzerland.................................................................202 The University of Florida Sparse Matrix Collection.................................202 Visualization & Graphics lab, Dept. 
of CSA and SERC, Indian Institute of Science, Bangalore.....................................................................................203 Allison McCann.........................................................................................203 Scott Murray..............................................................................................203 Gephi: The Open Graph Viz Platform......................................................203 Data Analysis and Visualization Using R by David Robinson...................204 Visualising Data Blog (Huge list of resources, great blog!).........................204 The 8 hats of Data Visualisation Design by Andy Kirk.............................205 Andy Kirk, Visualisation consultant at the Big Data Week, 2013..............205 Image Gallery by the Arts and Humanities Research Council, UK..........205 Setosa.io by Victor Powell & Lewis Lehe...................................................205 BOOKS, in English............................................................206 2015...........206 Bayesian Reasoning and Machine Learning, David Barber, 2012 (online version 04-2015)......................................................................................................206 Deep Learning (Artificial Intelligence), an MIT Press book in preparation, by Yoshua Bengio, Ian Goodfellow and Aaron Courville, Jul-2015................206 Neural Networks and Deep Learning by Michael Nielsen, 2015 .............207 2014...........208 An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014...................................................................................208 Deep Learning Tutorial by LISA Lab, University of Montreal, 2014.......209 Statistical Inference for Everyone, by Professor Bryan Blais, 2014............210 Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 
2014............................................................................................................210 Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014............................................................................................................211 Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft...........................................................................................................212 Slides for High Performance Python tutorial at EuroSciPy, 2014 by Ian Ozsvald.....213 Probabilistic Programming and Bayesian Methods for Hackers by Cameron Davidson-Pilon, 2014.................................................................................213 Past, Present, and Future of Statistical Science by COPSS, 2014.............213 Essentials of Metaheuristics by Sean Luke, 2014........................................213 2013...........214 Interactive Data Visualization for the Web By Scott Murray, 2013...........214 Statistical Model Building, Machine Learning, and the Ah-Ha Moment by Grace Wahba, 2013....................................................................................214 An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2013 (first printing).....214 2012...........215 Reinforcement Learning by Richard S. Sutton and Andrew G. 
Barto, 2012, Second edition in progress (PDF)...............................................................215 R Graphics Cookbook Code Resources (Graphs with ggplot2) by Winston Chang, 2012...............................................................................................215 Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves, 2012...............................................................................................215 A Course in Machine Learning by Hal Daumé III, 2012................................216 Machine Learning in Action, Peter Harrington, 2012...............................216 A Programmer's Guide to Data Mining, by Ron Zacharski, 2012............216 2010...........217 Artificial Intelligence, Foundations of Computational Agents by David Poole and Alan Mackworth, 2010........................................................................217 Introduction to Machine Learning by Ethem Alpaydın, MIT Press, Second Edition, 2010, 579 pages............................................................................217 2009...........218 The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009............................................................................................................218 Learning Deep Architectures for AI by Yoshua Bengio, 2009....................219 Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2009.............................................219 2008...........220 Kernel Methods in Machine Learning by Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, 2008.........................................................220 Introduction to Machine Learning, Alex Smola, S.V.N. Vishwanathan, 2008...........220 2006...........221 Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006.....221 Gaussian Processes for Machine Learning, C. Rasmussen and C. 
Williams, 2006 ....................................................................................................................222 2005...........222 Bayesian Machine Learning by Chakraborty, Sounak, 2005.....................222 Machine Learning by Tom Mitchell, 2005................................................222 2003...........223 Information Theory, Inference, and Learning Algorithms, David MacKay, 2003.....223 MISCELLANEOUS...........224 Free Book List.............................................................................................224 Free resource books (sign-in required)...........................................................224 Wikipedia: Machine Learning, the Complete Guide.................................224 ISSUU........................................................................................................224 Neural Networks, A Systematic Introduction by Raul Rojas.....................225 BOOKS, in Spanish............................................................226 BOOKS, in Portuguese.......................................................226 BOOKS, in German...........................................................226 BOOKS, in Italian..............................................................226 BOOKS, in French.............................................................226 BOOKS, in Russian............................................................227 Pattern Recognition by А.Б.Мерков (A.B. Merkov), 2014................................................227 Algorithmic models of learning classification: rationale, comparison, selection, 2014............................................................................................................227 BOOKS, in Japanese..........................................................227 BOOKS, in Chinese...........................................................228 Blog recommending useful books...............................................................228 Textbook for 
Statistics................................................................................228 Introduction to Pattern Recognition............................................................228 Translated version of Machine Learning by Tom Mitchell.......................228 Presentation, Infographics and Documents in English.......229 Meetup's Presentations...............................................................................229 Slideshare.com............................................................................................229 Slides.com...................................................................................................229 Powershow.com..........................................................................................229 Speaker Deck..............................................................................................229 Introduction to Artificial Intelligence, 2014, University of Waterloo........229 Aprendizado de Maquina, Conceitos e Definicoes (Machine Learning, Concepts and Definitions) by Jose Augusto Baranauskas.....229 Aprendizado de Maquina (Machine Learning) by Bianca Zadrozni, Instituto de Computação, UFF, 2010............................................................................................................230 NYC ML Meetup, 2014.............................................................................230 Statistics with Doodles by Thomas Levine.................................................230 Conferences.........................................................................231 ICML, Lille, France 2015...........................................................................231 ICML, Beijing, China 2014.......................................................................231 ICML, Atlanta, US 2013...........................................................................231 ICML, Edinburgh, UK 2012.....................................................................231 ICML, Bellevue, US 2011..........................................................................231 Full archive of ICML.................................................................................231 Machine Learning Conference Videos......................................................232 Annual Machine Learning Symposium.....................................................232 MLSS Machine Learning Summer Schools..............................................232 Data Gotham 2012, 2013...........................................................................232 Meetup ......................................................................................................232 Data Science Weekly List of Meetups.....................................................232 London Machine Learning Meetup...........................................................232 BLOGS, in English.............................................................233 Igor Carron Blog........................................................................................233 Data Science 
Weekly..................................................................................233 Yann LeCun, Google+...............................................................................233 KDD Community, Knowledge discovery and Data Mining......................233 Kaggle Blog................................................................................................233 Digg............................................................................................................234 Feedly..........................................................................................................234 Mlwave.......................................................................................................234 FastML.......................................................................................................234 Beating the Benchmark..............................................................................234 Trevor Stephens Blog.................................................................................235 Mozilla Hacks.............................................................................................235 Banach's Algorithmic Corner, University of Warsaw................................235 DataCamp Blog..........................................................................................235 Natural Language Processing Blog, Hal Daume........................................235 Maxim Milakov Blog..................................................................................235 Alfonso Nieto-Castanon Blog.....................................................................235 Persontyle Blog...........................................................................................236 Analytics Vidhya.........................................................................................236 Bugra Akyildiz Blog....................................................................................237 Rasbt 
Blog..................................................................................................237 Gilles Louppe Blog.....................................................................................237 AI Topics....................................................................................................237 AI International..........................................................................................237 Joseph Misiti Blog.......................................................................................237 MIRI, Machine Intelligence Research Institute.........................................238 Kevin Davenport Data Blog.......................................................................238 Alexandre Passant Blog..............................................................................238 Daniel Nouri Blog......................................................................................239 Yvonne Rogers Blog...................................................................................239 Igor Subbotin Blog (Both in English & Russian)........................................239 Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!)..............................................................................................239 Popular Science Website.............................................................................240 How Microsoft's Machine Learning is Breaking the Global Language Barrier .........................................................240 Max Woolf Blog.........................................................................................240 Rasmus Bååth Research Blog.....................................................................240 Flowing Data Blog......................................................................................241 The Shape of Data Blog............................................................................241 Data School 
Blog........................................................................................242 Julia Evans Blog..........................................................................................242 Stephan Hügel's Blog.................................................................................243 BACKCHANNEL "Tech Stories Hub" by Steven Levy............................244 DataScience Vegas.....................................................................................245 The Twitter Developer Blog.......................................................................245 Tyler Neylon Blog.......................................................................................245 Victor Powell Blog......................................................................................245 CrowdFlower Blog........................................................................................245 Edward Raff Blog......................................................................................245 Dirk Gorissen Blog and Projects.................................................................246 Joseph Jacobs Homepage & Blog...............................................................246 MISCELLANEOUS..................................................................................246 Allen Institute for Artificial Intelligence (AI2)............................................246 Artificial General Intelligence (AGI) Society..............................................247 AUAI, Association for Uncertainty in Artificial Intelligence.....................247 BLOGS, in Spanish.............................................................248 BLOGS, in Portuguese........................................................248 BLOGS, in Italian...............................................................248 BLOGS, in German............................................................248 BLOGS, in 
French..............................................................249 L'ATELIER's News ...................................................................................249 BLOGS, in Russian.............................................................250 Igor Subbotin's Blog (Both in English & Russian) (Huge list of resources)250 BLOGS, in Japanese...........................................................251 BLOGS, in Chinese............................................................251 JOURNALS, in English......................................................252 Journal of Machine Learning Research, MIT Press..................................252 Machine Learning Journal (last article could be downloaded for free)......252 Machine Learning (Theory).......................................................................252 List of Journals on Microsoft Academic Research website........................252 Wired magazine..........................................................................................252 Data Science Central..................................................................................252 JOURNALS, in Spanish.....................................................253 JOURNALS, in Portuguese................................................253 JOURNALS, in Italian.......................................................253 JOURNALS, in German....................................................253 JOURNALS, in French.......................................................253 JOURNALS, in Russian.....................................................253 JOURNALS, in Japanese....................................................254 JOURNALS, in Chinese.....................................................254 FORUM, Q&A, in English.................................................255 Data Tau.....................................................................................................255 Hacker 
News..............................................................................................255 Kaggle Forums...........................................................................................255 Reddit /r/MachineLearning.....................................................................255 Reddit /r/generative..................................................................................256 Cross Validated Stack Exchange................................................................256 Open Data Stack Exchange.......................................................................256 Data Science Beta Stack Exchange............................................................256 Quora.........................................................................................................256 Machine Learning Impact Forum..............................................................257 FORUM, Q&A, in Spanish................................................258 FORUM, Q&A, in Portuguese...........................................258 FORUM, Q&A, in Italian...................................................258 FORUM, Q&A, in German...............................................258 FORUM, Q&A, in French..................................................258 FORUM, Q&A, in Russian.................................................259 Reddit in Russian .......................................................................................259 Habrahabr.ru Forum (in Russian, translated by Google Chrome).............259 FORUM, Q&A, in Japanese...............................................260 FORUM, Q&A, in Chinese................................................260 Zhihu.com..................................................................................................260 Guokr.com..................................................................................................260 Governmental REPORTS, in English................................262 Big Data
report, White House, US............................................................262 FUN, in English...................................................................263 Founder of PhD Comics............................................................................263 MACHINE LEARNING RESEARCH GROUPS, in USA264 Computer Science and Artificial Intelligence Lab, MIT...........................264 Artificial Intelligence Laboratory, Stanford University..............................264 Machine Learning Department, Carnegie Mellon University..................265 Noah's ARK Research Group, Carnegie Mellon University.....................265 Intelligent Interactive Systems Group, Harvard University.......................265 Statistical Machine Learning, University of California, Berkeley..............266 UC Berkeley AMPLab, AMP: ALGORITHMS MACHINES PEOPLE267 Berkeley Institute for Data Science............................................................267 Department of Computer Science - ARTIFICIAL INTELLIGENCE & MACHINE LEARNING, Princeton University........................................268 Research Laboratories and Groups, University of California, Los Angeles (UCLA).......................................................................................................268 Cornell University......................................................................................269 Machine Learning Research, University of Illinois at Urbana-Champaign269 Department of Computing + Mathematical Sciences, California Institute of Technology, Caltech...................................................................................269 Machine Learning, University of Washington...........................................270 "Big Data" Research and Education, University of Washington...............270 Social Robotics Lab - Yale University........................................................270 ML@GT, Georgia Institute of
Technology...............................................271 Machine Learning Research Group, University of Texas at Austin..........271 Penn Research in Machine Learning, University of Pennsylvania............271 Machine Learning @ Columbia University...............................................271 New York City University...........................................................................271 University of Chicago................................................................................272 The Johns Hopkins Center for Language and Speech Processing (CLSP) Archive Videos............................................................................................272 MISCELLANEOUS..................................................................................272 IARPA Organization..................................................................................272 MACHINE LEARNING RESEARCH GROUPS, in Canada 273 Machine Learning Lab, University of Toronto.........................................273 The Fields Institute for Research in Mathematical Sciences, University of Toronto.......................................................................................................273 Artificial Intelligence Research Group, University of Waterloo.................273 Artificial Intelligence Research Groups, University of British Columbia .274 MILA, Machine Learning Lab, University of Montreal...........................275 Intelligence artificielle, University of Sherbrooke......................................276 Centre de recherche sur les environnements intelligents, University of Sherbrooke.................................................................................................276 Machine Learning Research Group, University of Laval..........................277 MACHINE LEARNING RESEARCH GROUPS, in Brazil278 MACHINE LEARNING RESEARCH GROUPS, in United Kingdom.............................................................................279 The Centre
for Computational Statistics and Machine Learning (CSML), University College London........................................................................279 CASA (Centre for Advanced Spatial Analysis) Working Papers, University College London..........................................................................................279 The Machine Learning Research Group in the Department of Engineering Science, Oxford University.........................................................................280 Machine Learning Group, Imperial College..............................................281 The Data Science Institute, Imperial College............................................282 The University of Edinburgh, Institute for Adaptive and Neural Computation... 282 Cambridge University................................................................................282 Centre for Intelligent Sensing, Queen Mary University of London..........282 ICRI, The Intel Collaborative Research Institute......................................283 MACHINE LEARNING RESEARCH GROUPS, in France 284 Magnet, MAchine learninG in information NETworks, INRIA...............284 Sierra Team - Ecole Normale Superieure, CNRS, INRIA.......................284 ENS Ecole Normale Superieure................................................................285 WILLOW Publications and PhD Thesis....................................................286 Laboratoire Hubert Curien UMR CNRS 5516, Machine Learning........286 MACHINE LEARNING RESEARCH GROUPS, in Germany .............................................................................................288 Max Planck Institute for Intelligent Systems, Tübingen site......................288 BRML Research Lab, Institute of Informatics at the Technische Universität München....................................................................................................288 HCI, Heidelberg Collaboratory for Image Processing, Universität Heidelberg....
289 MACHINE LEARNING RESEARCH GROUPS, in Switzerland .........................................................................290 EPFL Ecole Polytechnique Federale de Lausanne, Switzerland................290 IDSIA: the Swiss AI Lab............................................................................290 MACHINE LEARNING RESEARCH GROUPS, in Netherlands.........................................................................292 Machine Learning Research Groups in The Netherlands.........................292 MACHINE LEARNING RESEARCH GROUPS, in POLAND............................................................................293 University of Warsaw, Dept. of Mathematics, Informatics and Mechanics293 MACHINE LEARNING RESEARCH GROUPS, in India294 RESEARCH LABS, Department of Computer Science and Automation, IISc, Bangalore....................................................................................................294 MLSIG: Machine Learning Special Interest Group, Indian Institute of Science. 294 MACHINE LEARNING RESEARCH GROUPS, in China 295 Peking University........................................................................................295 University of Science and Technology of China, USTC..........................296 Nanjing University.....................................................................................296 MACHINE LEARNING RESEARCH GROUPS, in Russia. 
298 Moscow State University............................................................................298 MACHINE LEARNING RESEARCH GROUPS, in Australia 299 NICTA Machine Learning Research Group.............................................299 ACADEMICS, USA...........................................................300 Andrew Ng, Stanford University................................................................300 Emmanuel Candes, Stanford University....................................................300 Tom Mitchell, Carnegie Mellon University (CMU)...................................300 Robert Kass, CMU....................................................................................301 Alexander J. Smola, CMU.........................................................................301 Maria-Florina Balcan, CMU.....................................................................302 Abulhair Saparov, CMU............................................................................302 John Canny, Berkeley University,................................................................302 Robert Schapire, Princeton University.......................................................303 Mona Singh, Princeton University.............................................................303 Olga Troyanskaya, Princeton University...................................................303 Judea Pearl, Cognitive System Laboratory, UCLA....................................304 Justin Esarey Lectures, Assistant Professor of Political Science, Rice University... 
304 Hal Daume III, University of Maryland...................................................304 Melanie Mitchell, Portland State University..............................................305 ACADEMICS, in France....................................................306 Francis Bach, Ecole Normale Supérieure..................................................306 Gaël Varoquaux, INRIA............................................................................306 ACADEMICS, in United Kingdom...................................308 John Shawe-Taylor, University College London........................................308 Mark Herbster, University College London...............................................308 David Barber, University College London.................................................309 Gabriel Brostow, University College London.............................................309 Jun Wang, University College London.......................................................309 David Jones Lab, University College London............................................310 Simon Prince, University College London.................................................310 Massimiliano Pontil, University College London.......................................311 Richard E Turner, Cambridge University..................................................311 Andrew McHutchon Homepage, Cambridge University..........................311 Phil Blunsom, Oxford University...............................................................312 Nando de Freitas, Oxford University.........................................................312 Karl Hermann, Oxford University............................................................312 Edward Grefenstette, Oxford University...................................................313 ACADEMICS, in Netherlands...........................................314 Thomas Geijtenbeek Publications & Videos, Delft University of Technology 314 ACADEMICS, in
Canada..................................................315 Yoshua Bengio, University of Montreal.....................................................315 KyungHyun Cho, University of Montreal.................................................315 Geoffrey Hinton, University of Toronto....................................................315 Alex Graves, University of Toronto...........................................................316 Hugo Larochelle, Universite de Sherbrooke..............................................316 Giuseppe Carenini, University of British Columbia..................................317 Cristina Conati, University of British Columbia.......................................317 Kevin Leyton-Brown, University of British Columbia..............................317 Holger Hoos, University of British Columbia...........................................317 Jim Little, University of British Columbia.................................................317 David Lowe, University of British Columbia.............................................317 Karon MacLean, University of British Columbia.....................................317 Alan Mackworth, University of British Columbia.....................................317 Dinesh K. Pai, University of British Columbia..........................................317 David Poole, University of British Columbia.............................................317 Prof. 
Shai Ben-David, University of Waterloo..........................................318 ACADEMICS, in Germany...............................................319 Machine Learning Lab, University of Freiburg.........................................319 ACADEMICS, in China.....................................................320 En-Hong Chen, USTC..............................................................................320 Linli Xu, USTC..........................................................................................320 Yuan Yao, School of Mathematical Sciences, Peking University..............320 ACADEMICS, in Australia................................................321 Prof. Peter Corke, Queensland University of Technology.........................321 ACADEMICS, in United Arab Emirates...........................322 Dmitry Efimov, American University of Sharjah, UAE............................322 ACADEMICS, in Poland....................................................323 Marcin Murca, University of Warsaw, Poland...........................................323 ACADEMICS, in Switzerland............................................324 Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be missed!)......
324 Free access to ML MSc & PhD Dissertations.....................325 Machine Learning Department, Carnegie Mellon University..................325 Machine Learning Department, Columbia University..............................325 Nonlinear Modelling and Control using Gaussian Processes, PhD Thesis by Andrew McHutchon, Cambridge University.............................................325 PhD Dissertations, University of Edinburgh, UK......................................326 MSc Dissertations, University of Oxford, UK...........................................326 Machine Learning Group, Department of Engineering, University of Cambridge, UK..........................................................................................326 New York University Computer Science PhD Theses...............................326 Digital Collection of The Australian National University (PhD Thesis)...326 TEL (thèses-EN-ligne) (more than 45,000 theses, though some are in French!)326 ABOUT The Machine Learning Salon Starter Kit The Machine Learning Salon Starter Kit is a selection of useful websites compiled by Jacqueline Isabelle Forien. The Starter Kit is free of charge, and no registration is required to download it. There is no advertising. The links are gathered from blogs and forums such as DataTau.com, groups on LinkedIn, posts on Twitter, publications on Google Scholar, Machine Learning research group websites, etc. All descriptions come from the websites themselves. If you want a link removed, please tell me why and I will take care of it as soon as possible. If you want to add a better description of your website, please send me the new version and I will make the change. Contact at contact@machinelearningsalon.org Founder of The Machine Learning Salon My name is Jacqueline Isabelle Forien and I am from Tours, France, a small city located in the middle of the Loire Valley. I am married and have four children.
After an Engineer's degree in Computer Science at the UTC engineering school and a few years of work experience in that field, I decided to become a Mathematics teacher. I am still teaching, but in the meantime I became passionate about Artificial Intelligence and, more specifically, Machine Learning. In 2013, I decided to start studying again at 53 years old and soon graduated from University College London with an M.Sc. in Machine Learning. Soon after, I decided to create the Machine Learning Salon in my spare time so that I could stay up to date with the changes that happen regularly in this field. I would like to express special gratitude to my director of Machine Learning studies at UCL, Professor Mark Herbster, my tutor, Professor David Barber, and my Master's project supervisor, Professor Nadia Berthouze, as well as all my peers during this amazing year. In addition, I would like to express many thanks to Igor Carron, who suggested the smart association of « Machine Learning » and « Salon » and gave me the opportunity to organise in London a wonderful event, the Europe Wide Machine Learning Meetup between Paris, Berlin, Zurich and London, with Andrew Ng as guest speaker. I hope that this Starter Kit will help many people learn and get more involved in this fascinating field that is Machine Learning! Jacqueline Please feel free to contact me if you want to add a contribution, remove a link, etc. Any suggestion or feedback is welcome! Contact at contact@machinelearningsalon.org MOOC, Opencourseware in English COURSERA: Machine Learning Stanford Course About the Course This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning).
(iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you'll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas. https://www.coursera.org/course/ml COURSERA: Practical Machine Learning Part of the Data Science Specialization » About the Course One of the most common tasks performed by data scientists and data analysts is prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation. https://www.coursera.org/course/predmachlearn COURSERA: Neural Networks for Machine Learning Neural Networks use learning algorithms that are inspired by our understanding of how the brain learns, but they are evaluated by how well they work for practical applications such as speech recognition, object recognition, image retrieval and the ability to recommend products that a user will like. As computers become more powerful, Neural Networks are gradually taking over from simpler Machine Learning methods. They are already at the heart of a new generation of speech recognition devices and they are beginning to outperform earlier systems for recognizing objects in images.
The course will explain the new learning procedures that are responsible for these advances, including effective new procedures for learning multiple layers of non-linear features, and give you the skills and understanding required to apply these procedures in many other domains. https://www.coursera.org/course/neuralnets COURSERA: Data Science Specialization https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage The Data Scientist’s Toolbox Part of the Data Science Specialization » Course Syllabus Upon completion of this course you will be able to identify and classify data science problems. You will also have created your GitHub account, created your first repository, and pushed your first markdown file to your account. https://www.coursera.org/course/datascitoolbox Getting and Cleaning Data Part of the Data Science Specialization » About the Course Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speeds up downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data. https://www.coursera.org/course/getdata Exploratory Data Analysis Part of the Data Science Specialization » About the Course This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.
We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data. https://www.coursera.org/course/exdata Statistical Inference Part of the Data Science Specialization » About the Course Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data-oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data. https://www.coursera.org/course/statinference Regression Models Part of the Data Science Specialization » About the Course Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well. Analysis of residuals and variability will be investigated. The course will cover modern thinking on model selection and novel uses of regression models including scatterplot smoothing.
https://www.coursera.org/course/regmods Developing Data Products Part of the Data Science Specialization » About the Course A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data informed model, algorithm or inference. This course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course will focus on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience. https://www.coursera.org/course/devdataprod COURSERA: Reasoning, Data Analysis and Writing Specialization Data Analysis and Statistical Inference Part of the Reasoning, Data Analysis and Writing Specialization » About the Course The goals of this course are as follows: • Recognize the importance of data collection, identify limitations in data collection methods, and determine how they affect the scope of inference. • Use statistical software (R) to summarize data numerically and visually, and to perform data analysis. • Have a conceptual understanding of the unified nature of statistical inference. • Apply estimation and testing methods (confidence intervals and hypothesis tests) to analyze single variables and the relationship between two variables in order to understand natural phenomena and make data-based decisions. • Model and investigate relationships between two or more variables within a regression framework. • Interpret results correctly, effectively, and in context without relying on statistical jargon. • Critique data-based claims and evaluate data-based decisions. Complete a research project that employs simple statistical inference and modeling techniques. 
https://www.coursera.org/course/statistics Process Mining: Data science in Action About the Course Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically). This technology has become available only recently, but it can be applied to any type of operational processes (organizations and systems). Example applications include: analyzing treatment processes in hospitals, improving customer service processes in a multinational, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. All of these applications have in common that dynamic behavior needs to be related to process models. Hence, we refer to this as "data science in action". The course explains the key analysis techniques in process mining. Participants will learn various process discovery algorithms. These can be used to automatically learn process models from raw event data. Various other process analysis techniques that use event data will be presented. Moreover, the course will provide easy-to-use software, real-life data sets, and practical skills to directly apply the theory in a variety of application domains. 
https://www.coursera.org/course/procmin COURSERA: Data Mining Specialization https://www.coursera.org/specialization/datamining/20?utm_medium=courseDescripTop Pattern Discovery in Data Mining Part of the Data Mining Specialization » About the Course Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern discovery in data mining. We will also introduce methods for pattern-based classification and some interesting applications of pattern discovery. This course provides you the opportunity to learn skills and content to practice and engage in scalable pattern discovery methods on massive transactional data, discuss pattern evaluation measures, and study methods for mining diverse kinds of patterns, sequential patterns, and sub-graph patterns. https://www.coursera.org/course/patterndiscovery Text Retrieval and Search Engines Part of the Data Mining Specialization » About the Course Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text. This course will cover search engine technologies, which play an important role in any data mining applications involving text data for two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that are relevant, and a search engine is an essential tool for quickly discovering a small subset of relevant text data in a large text collection.
Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern. You will learn the basic concepts, principles, and the major techniques in text retrieval, which is the underlying science of search engines. https://www.coursera.org/course/textretrieval Text Mining and Analytics Part of the Data Mining Specialization » About the Course This course will cover the major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be generally applied to arbitrary text data in any natural language with no or minimum human effort. Detailed analysis of text data requires understanding of natural language text, which is known to be a difficult task for computers. However, a number of statistical approaches have been shown to work well for the "shallow" but robust analysis of text data for pattern finding and knowledge discovery. You will learn the basic concepts, principles, and major algorithms in text mining and their potential applications. https://www.coursera.org/course/textanalytics Cluster Analysis in Data Mining Part of the Data Mining Specialization » About the Course Discover the basic concepts of cluster analysis, and then study a set of typical clustering methodologies, algorithms, and applications. This includes partitioning methods such as k-means, hierarchical methods such as BIRCH, density-based methods such as DBSCAN/OPTICS, probabilistic models, and the EM algorithm. Learn clustering and methods for clustering high dimensional data, streaming data, graph data, and networked data. Explore concepts and methods for constraint-based clustering and semi-supervised clustering. Finally, see examples of cluster analysis in applications. 
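The partitioning approach named in the cluster analysis description (k-means) can be sketched in a few lines of plain Python. The toy 2-D data and parameters below are illustrative only, not from the course.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Nearest centroid by squared Euclidean distance.
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated toy clusters; k-means recovers their means.
pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(centroids))
```

Density-based methods such as DBSCAN, also covered in the course, avoid fixing k in advance by growing clusters from dense neighborhoods instead.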
https://www.coursera.org/course/clusteranalysis Data Visualization Part of the Data Mining Specialization » About the Course Learn to present data to an observer in a way that yields insight and understanding. The first week focuses on the infrastructure for data visualization. It introduces elementary graphics programming, focusing primarily on two-dimensional vector graphics and the programming platforms for graphics. This infrastructure will also include lessons on the human side of visualization, studying human perception and cognition to gain a better understanding of the target of the data visualization. The second week will utilize the knowledge of graphics programming and human perception in the design and construction of visualizations, starting with simple charts and graphs and incorporating animation and user interactivity. The third week expands the data visualization vocabulary with more sophisticated methods, including hierarchical layouts and networks. The final week focuses on visualization of database and data mining processes, with methods specifically focused on visualization of unstructured information, such as text, and systems for visual analytics that provide decision support. https://www.coursera.org/course/datavisualization COURSERA: Cloud Computing Specialization https://www.coursera.org/specialization/cloudcomputing/19?utm_medium=listingPage Cloud Computing Concepts Part of the Cloud Computing Specialization » About the Course Cloud computing systems today, whether open-source or used inside companies, are built using a common set of core techniques, algorithms, and design philosophies all centered around distributed systems. Learn about such fundamental distributed computing "concepts" for cloud computing. Some of these concepts include: • Clouds, MapReduce, key-value stores • Classical precursors • Widely-used algorithms • Classical algorithms • Scalability • Trending areas • And more! 
Understand how these techniques work inside today’s most widely-used cloud computing systems. Get your hands dirty using these concepts with provided homework exercises. In the optional programming track, implement some of these concepts in template assignments provided in C programming language. You will also watch interviews with leading managers and researchers, from both industry and academia. https://www.coursera.org/course/cloudcomputing Cloud Computing Concepts: Part 2 Part of the Cloud Computing Specialization » https://www.coursera.org/course/cloudcomputing2 Cloud Computing Applications Part of the Cloud Computing Specialization » About the Course Learn about "cloudonomics," the underlying economic reasons that we are creating the cloud. Learn the basic concepts underlying cloud services and be able to use services like AWS or OpenStack Dashboard to construct cloud services or applications. Demonstrate your ability to create web services, massively parallel data-intensive computations using Map/Reduce, NoSQL databases, and real-time processing of data streams. Use machine learning tools to solve simple problems. This course serves as an introduction to building applications for cloud computing based on emerging OpenStack and other platforms. The course includes concepts of: • Baremetal provisioning • Neutron networking • Identity service • Image service • Orchestration • Infrastructure as a service • Software as a service • Platform as a service • MapReduce • Big data • Analytics • Privacy and legal issues The course will also include example problems and solutions in cloud computing, including hands-on laboratory experiments (Load Balancing and Web Services, MapReduce, Hive, Storm, and Mahout). Case studies will be drawn from Yahoo, Google, Twitter, Facebook, data mining, analytics, and machine learning.
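As a toy illustration of the MapReduce model listed above, here is the classic word count with the map, shuffle, and reduce phases written as plain Python functions. These functions are stand-ins for what a real cluster framework distributes across machines; the names and data are mine, not the course's.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Mapper: emit a (word, 1) pair for every word in a document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine the grouped values; here, sum the counts per word.
    return {word: sum(vals) for word, vals in groups.items()}

docs = ["big data big compute", "big data"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
print(counts)  # {'big': 3, 'data': 2, 'compute': 1}
```

The key design point is that mappers and reducers are stateless per key, which is what lets a framework run thousands of them in parallel over partitioned data.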
https://www.coursera.org/course/cloudapplications Cloud Networking Part of the Cloud Computing Specialization » About the Course In the cloud networking course, we will see what the network needs to do to enable cloud computing. We will explore current practice by talking to leading industry experts, as well as looking into interesting new research that might shape the cloud network’s future. This course will allow us to explore in-depth the challenges of cloud networking: how do we build a network infrastructure that provides the agility to deploy virtual networks on a shared infrastructure, that enables both efficient transfer of big data and low latency communication, and that enables applications to be federated across countries and continents? Examining how these objectives are met will set the stage for the rest of the course. This course places an emphasis on both operations and design rationale, i.e., how things work and why they were designed this way. We're excited to start the course with you and take a look inside what has become the critical communications infrastructure for many applications today. https://www.coursera.org/course/cloudnetworking COURSERA: Miscellaneous Core Concepts in Data Analysis (Higher School of Economics) Learn both theory and application for basic methods that have been invented either for developing new concepts (principal components or clusters), or for finding interesting correlations (regression and classification). This is preceded by a thorough analysis of 1D and 2D data. This is an unconventional course in modern Data Analysis, Machine Learning and Data Mining. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them.
According to this view, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. The term summarization embraces here both simple summaries like totals and means and more complex summaries: the principal components of a set of features and cluster structures in a set of entities. Similarly, correlation covers both bivariate and multivariate relations between input and target features including Bayes classifiers. https://www.coursera.org/course/datan Natural Language Processing Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models. https://www.coursera.org/course/nlangp Probability About the Course The renowned mathematical physicist Pierre-Simon, marquis de Laplace wrote in his opus on probability in 1812 that “the most important questions of life are, for the most part, really only problems in probability”. His words ring particularly true today in this the century of “big data”. This introductory course takes us through the development of a modern, axiomatic theory of probability. 
But, unusually for a technical subject, the material is presented in its lush and glorious historical context, the mathematical theory buttressed and made vivid by rich and beautiful applications drawn from the world around us. The student will see surprises in election-day counting of ballots, a historical wager the sun will rise tomorrow, the folly of gambling, the sad news about lethal genes, the curiously persistent illusion of the hot hand in sports, the unreasonable efficacy of polls and its implications to medical testing, and a host of other beguiling settings. A curious individual taking this as a stand-alone course will emerge with a nuanced understanding of the chance processes that surround us and an appreciation of the colourful history and traditions of the subject. And for the student who wishes to study the subject further, this course provides a sound mathematical foundation for courses at the advanced undergraduate or graduate levels. https://www.coursera.org/course/probability Probabilistic Graphical Models Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty what will happen in the future, and even in the present and the past, many important aspects of the world are not observed with certainty. Probability theory gives us the basic foundation to model our beliefs about the different possible states of the world, and to update these beliefs as new evidence is obtained. These beliefs can be combined with individual preferences to help guide our actions, and even in selecting which observations to make. While probability theory has existed since the 17th century, our ability to use it effectively on large problems involving many inter-related variables is fairly recent, and is due largely to the development of a framework known as Probabilistic Graphical Models (PGMs).
This framework, which spans methods such as Bayesian networks and Markov random fields, uses ideas from discrete data structures in computer science to efficiently encode and manipulate probability distributions over high-dimensional spaces, often involving hundreds or even many thousands of variables. These methods have been used in an enormous range of application domains, which include: web search, medical and fault diagnosis, image understanding, reconstruction of biological networks, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, robot navigation, and many more. The PGM framework provides an essential tool for anyone who wants to learn how to reason coherently from limited and noisy observations. https://www.coursera.org/course/pgm Machine Learning Techniques by Hsuan-Tien Lin, National Taiwan University About the Course Welcome! The instructor has decided to teach the course in Mandarin on Coursera, while the slides of the course will be in English to ease the technical illustrations. We hope that this choice can help introduce Machine Learning to more students in the Mandarin-speaking world. The English-written slides will not require advanced English ability to understand, though. If you can understand the following descriptions of this course, you can probably follow the slides. https://www.coursera.org/course/ntumltwo High Performance Scientific Computing About the Course Computation and simulation are increasingly important in all aspects of science and engineering. At the same time writing efficient computer programs to take full advantage of current computers is becoming increasingly difficult. Even laptops now have 4 or more processors, but using them all to solve a single problem faster often requires rethinking the algorithm to introduce parallelism, and then programming in a language that can express this parallelism.
Writing efficient programs also requires some knowledge of machine arithmetic, computer architecture, and memory hierarchies. Although parallel computing will be covered, this is not a class on the most advanced techniques for using supercomputers, which these days have tens of thousands of processors and cost millions of dollars. Instead, the goal is to teach tools that you can use immediately on your own laptop, desktop, or a small cluster. Cloud computing will also be discussed, and students who don't have a multiprocessor computer of their own will still be able to do projects using Amazon Web Services at very low cost. Along the way there will also be discussion of software engineering tools such as debuggers, unit testing, Makefiles, and the use of version control systems. After all, your time is more valuable than computer time, and a program that runs fast is totally useless if it produces the wrong results. High performance programming is also an important aspect of high performance scientific computing, and so another main theme of the course is the use of basic tools and techniques to improve your efficiency as a computational scientist. https://www.coursera.org/course/scicomp Statistical Analysis of fMRI Data About the Course In this course we will explore the intersection of statistics and functional magnetic resonance imaging, or fMRI, which is a non-invasive technique for studying brain activity. We will discuss the analysis of fMRI data, from its acquisition to its use in locating brain activity, making inference about brain connectivity and predictions about psychological or disease states. A standard fMRI study gives rise to massive amounts of noisy data with a complicated spatiotemporal correlation structure. Statistics plays a crucial role in understanding the nature of the data and obtaining relevant results that can be used and interpreted by neuroscientists. 
https://www.coursera.org/course/fmri STANFORD University: Stanford Engineering Everywhere SEE programming includes one of Stanford’s most popular engineering sequences: the three-course Introduction to Computer Science taken by the majority of Stanford undergraduates, and seven more advanced courses in artificial intelligence and electrical engineering.
Introduction to Computer Science
• Programming Methodology CS106A
• Programming Abstractions CS106B
• Programming Paradigms CS107
Artificial Intelligence
• Introduction to Robotics CS223A
• Natural Language Processing CS224N
• Machine Learning CS229
Linear Systems and Optimization
• The Fourier Transform and its Applications EE261
• Introduction to Linear Dynamical Systems EE263
• Convex Optimization I EE364A
• Convex Optimization II EE364B
Additional School of Engineering Courses
• Programming Massively Parallel Processors CS193G
• iPhone Application Programming CS193P
Seminars and Webinars
http://see.stanford.edu/see/courses.aspx STANFORD University: 2015 Stanford HPC Conference Video Gallery HPC Advisory Council Stanford Workshop 2015 The HPC Advisory Council, together with Stanford University, will hold the HPC Advisory Council Stanford Conference 2015 on February 2-3, 2015, at Stanford, California. The conference will focus on High-Performance Computing (HPC) usage models and benefits, the future of supercomputing, latest technology developments, best practices and advanced HPC topics. In addition, there will be a strong focus on socially responsible computing, with advancements in solutions for the small to medium enterprise to have better use of power, cooling, hardware, and software. The conference is open to the public and will bring together system managers, researchers, developers, computational scientists and industry affiliates.
http://insidehpc.com/2015-stanford-hpc-conference-video-gallery/ STANFORD University: Awni Hannun of Baidu Research Published on 5 Feb 2015 "Deep Speech: Scaling up end-to-end speech recognition" - Awni Hannun of Baidu Research Colloquium on Computer Systems Seminar Series (EE380) presents the current research in design, implementation, analysis, and use of computer systems. Topics range from integrated circuits to operating systems and programming languages. It is free and open to the public, with new lectures each week. https://www.youtube.com/watch?v=P9GLDezYVX4&spfreload=10 STANFORD University: Steve Cousins of Savioke Published on 29 Jan 2015 "Service Robots Are Here" - Steve Cousins of Savioke, also part of the EE380 Colloquium on Computer Systems Seminar Series described above. https://www.youtube.com/watch?v=dn74oHbhRuk&spfreload=10 STANFORD University: Ron Fagin of IBM Research Published on 5 Feb 2015 "Applying Theory to Practice (and Practice to Theory)" - Ron Fagin This seminar features leading industrial and academic experts on big data analytics, information management, data mining, machine learning, and large-scale data processing. https://www.youtube.com/watch?v=zEcJhDgyTow&spfreload=10 STANFORD University: CS224d: Deep Learning for Natural Language Processing by Richard Socher, 2015 Course Description Natural language processing (NLP) is one of the most important technologies of the information age. Understanding complex language utterances is also a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertisement, emails, customer service, language translation, radiology reports, etc.
There are a large variety of underlying tasks and machine learning models powering NLP applications. Recently, deep learning approaches have obtained very high performance across many different NLP tasks. These models can often be trained with a single end-to-end model and do not require traditional, task-specific feature engineering. In this spring quarter course students will learn to implement, train, debug, visualize and invent their own neural network models. The course provides a deep excursion into cutting-edge research in deep learning applied to NLP. The final project will involve training a complex recurrent neural network and applying it to a large scale NLP problem. On the model side we will cover word vector representations, window-based neural networks, recurrent neural networks, long short-term memory (LSTM) models, recursive neural networks, convolutional neural networks as well as some very novel models involving a memory component. Through lectures and programming assignments students will learn the necessary engineering tricks for making neural networks work on practical problems. http://cs224d.stanford.edu/syllabus.html EdX: Artificial Intelligence (BerkeleyX) CS188.1x is a new online adaptation of the first half of UC Berkeley's CS188: Introduction to Artificial Intelligence. The on-campus version of this upper division computer science course draws about 600 Berkeley students each year. Artificial intelligence is already all around you, from web search to video games. AI methods plan your driving directions, filter your spam, and focus your cameras on faces. AI lets you guide your phone with your voice and read foreign newspapers in English. Beyond today's applications, AI is at the core of many new technologies that will shape our future. From self-driving cars to household robots, advancements in AI help transform science fiction into real systems. CS188.1x focuses on Behavior from Computation.
It will introduce the basic ideas and techniques underlying the design of intelligent computer systems. A specific emphasis will be on the statistical and decision theoretic modeling paradigm. By the end of this course, you will have built autonomous agents that efficiently make decisions in stochastic and in adversarial settings. CS188.2x (to follow CS188.1x, precise date to be determined) will cover Reasoning and Learning. With this additional machinery your agents will be able to draw inferences in uncertain environments and optimize actions for arbitrary reward structures. Your machine learning algorithms will classify handwritten digits and photographs. The techniques you learn in CS188x apply to a wide variety of artificial intelligence problems and will serve as the foundation for further study in any application area you choose to pursue. https://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-cs188-1xartificial-579#.U4CqKl6RPwI EdX: Big Data and Social Physics (Ethics) Social physics is a big data science that models how networks of people behave and uses these network models to create actionable intelligence. It is a quantitative science that can accurately predict patterns of human behavior and guide how to influence those patterns to (for instance) increase decision making accuracy or productivity within an organization. Included in this course is a survey of methods for increasing communication quality within an organization, approaches to providing greater protection for personal privacy, and general strategies for increasing resistance to cyber attack. https://www.edx.org/course/mitx/mitx-mas-s69x-big-data-socialphysics-1737#.U4Cox5RdWG4 EdX: Introduction to Computational Thinking and Data Science 6.00.2x is aimed at students with some prior programming experience in Python and a rudimentary knowledge of computational complexity. We have chosen to focus on breadth rather than depth. 
The goal is to provide students with a brief introduction to many topics, so that they will have an idea of what’s possible when the time comes later in their career to think about how to use computation to accomplish some goal. That said, it is not a “computation appreciation” course. Students will spend a considerable amount of time writing programs to implement the concepts covered in the course. Topics covered include plotting, stochastic programs, probability and statistics, random walks, Monte Carlo simulations, modeling data, optimization problems, and clustering. https://www.edx.org/course/mitx/mitx-6-00-2x-introduction-computational-2836 MIT OpenCourseWare (OCW) OCW makes the materials used in the teaching of MIT's subjects available on the Web. http://ocw.mit.edu/index.htm https://www.youtube.com/user/MIT VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos Added 22-Nov-2014 Discovery of Disruptive Innovations & Actionable Ideas. VLAB is the San Francisco Bay Area chapter of the MIT Enterprise Forum, a non-profit organization dedicated to promoting the growth and success of high-tech entrepreneurial ventures by connecting ideas, technology and people. We provide a forum for San Francisco and Silicon Valley's leading entrepreneurs, industry experts, venture capitalists, private investors and technologists to exchange insights about how to effectively grow high-tech ventures amidst dynamic market risks and challenges. In a world where markets change at breakneck speed, knowledge is a critical source of competitive advantage. Our forums provide an excellent opportunity to network and learn about pivotal business issues, emerging industries and the latest technologies.
http://www.youtube.com/user/vlabvideos/search?query=machine+learning Foundations of Machine Learning by Mehryar Mohri - 10 years of Homeworks with Solutions and Lecture Slides Course Description This course introduces the fundamental concepts and methods of machine learning, including the description and analysis of several modern algorithms, their theoretical basis, and the illustration of their applications. Many of the algorithms described have been successfully used in text and speech processing, bioinformatics, and other areas in real-world products and services. The main topics covered are:
• Probability tools, concentration inequalities
• PAC model
• Rademacher complexity, growth function, VC-dimension
• Perceptron, Winnow
• Support vector machines (SVMs)
• Kernel methods
• Decision trees
• Boosting
• Density estimation, maximum entropy models
• Logistic regression
• Regression problems and algorithms
• Ranking problems and algorithms
• Halving algorithm, weighted majority algorithm, mistake bounds
• Learning automata and transducers
• Reinforcement learning, Markov decision processes (MDPs)
http://www.cs.nyu.edu/~mohri/ml14/ Carnegie Mellon University (CMU) Video resources "The videos below are intended to serve as resources for our current students, and not as online learning materials for students outside of our program." - The Machine Learning Department http://www.ml.cmu.edu/teaching/video-resources.html CMU: Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani Overview and objectives Nearly every problem in machine learning and statistics can be formulated in terms of the optimization of some function, possibly under some set of constraints. As we obviously cannot solve every problem in machine learning or statistics, this means that we cannot generically solve every optimization problem (at least not efficiently).
Fortunately, many problems of interest in statistics and machine learning can be posed as optimization tasks that have special properties such as convexity, smoothness, separability, sparsity, etc., permitting standardized, efficient solution techniques. This course is designed to give a graduate-level student a thorough grounding in these properties and their role in optimization, and a broad comprehension of algorithms tailored to exploit such properties. The main focus will be on convex optimization problems, though we will also discuss nonconvex problems at the end. We will visit and revisit important applications in statistics and machine learning. Upon completing the course, students should be able to approach an optimization problem (often derived from a statistics or machine learning context) and: (1) identify key properties such as convexity, smoothness, sparsity, etc., and/or possibly reformulate the problem so that it possesses such desirable properties; (2) select an algorithm for this optimization problem, with an understanding of the advantages and disadvantages of applying one method over another, given the problem and properties at hand; (3) implement this algorithm or use existing software to efficiently compute the solution. http://www.stat.cmu.edu/~ryantibs/convexopt/#videos CMU: Machine Learning, Spring 2011, by Tom Mitchell Machine Learning is concerned with computer programs that automatically improve their performance through experience (e.g., programs that learn to recognize human faces, recommend music and movies, and drive autonomous robots). This course covers the theory and practical algorithms for machine learning from a variety of perspectives. We cover topics such as Bayesian networks, decision tree learning, Support Vector Machines, statistical learning methods, unsupervised learning and reinforcement learning.
The course covers theoretical concepts such as inductive bias, the PAC learning framework, Bayesian learning methods, margin-based learning, and Occam's Razor. Short programming assignments include hands-on experiments with various learning algorithms, and a larger course project gives students a chance to dig into an area of their choice. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in machine learning. http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml Homework with solutions http://www.cs.cmu.edu/~tom/10701_sp11/hws.shtml CMU: 10-601 Machine Learning Spring 2015 - Lecture 18 by Maria-Florina Balcan Topics: support vector machines (SVM), semi-supervised learning, other learning paradigms https://www.youtube.com/watch?v=JoJhXsdTWxM&spfreload=10 CMU: 10-601 Machine Learning Spring 2015, Homeworks & Solutions & Code (Matlab) http://www.cs.cmu.edu/%7Eninamf/courses/601sp15/homeworks.shtml CMU: 10-601 Machine Learning Spring 2015 - Recitation 10 by Kirstin Early Topics: support vector machines (SVM), multi-class classification, constrained optimization using Lagrange multipliers https://www.youtube.com/watch?v=S4Cjl_GwGZg&spfreload=10 CMU: Abulhair Saparov’s Youtube Channel https://www.youtube.com/channel/UC3IXpkDpzturFkkvGJ-HeMg?spfreload=10 CMU: Machine Learning Course by Roni Rosenfeld, Spring 2015 Topics covered in 10-601A include concept learning, version spaces, information theory, decision trees, neural networks, estimation & the bias-variance tradeoff, hypothesis testing in machine learning, Bayesian learning, the Minimum Description Length principle, the Gibbs classifier, Naïve Bayes classifier, Bayes Nets & Graphical Models, the EM algorithm, Hidden Markov Models, K-Nearest-Neighbors and nonparametric learning, Maximum Margin classifiers (SVM) and kernel based methods, bagging, boosting and Deep Learning.
This section of 10-601 focuses on the mathematical, statistical and computational foundations of the field. It emphasizes the role of assumptions in machine learning. As we introduce different ML techniques, we work out together what assumptions are implicit in them. We use the Socratic Method whenever possible, and student participation is expected. We focus on conceptual depth, at the possible expense of breadth. http://www.cs.cmu.edu/~roni/10601/ CMU: Language and Statistics by Roni Rosenfeld, Spring 2015 Internet search, speech recognition, machine translation, question answering, information retrieval, biological sequence analysis -- are all at the forefront of this century’s information revolution. In addition to their use of machine learning, these technologies rely heavily on classic statistical estimation techniques. Yet most CS and engineering undergraduate programs do not prepare students in this area beyond an introductory probability & statistics course. This course is designed to address this gap. The goal of "Language and Statistics" is to ground the data-driven techniques used in language technologies in sound statistical methodology. We start by formulating various language technology problems in both an information theoretic framework (the source-channel paradigm) and a Bayesian framework (the Bayes classifier). We then discuss the statistical properties of words, sentences, documents and whole languages, and the various computational formalisms used to represent language. These discussions naturally lead to specific concepts in statistical estimation. 
Topics include: Zipf's distribution and type-token curves; point estimators, Maximum Likelihood estimation, bias and variance, sparseness, smoothing and clustering; interpolation, shrinkage, and backoff; entropy, cross entropy and mutual information; decision tree models applied to language; latent variable models and the EM algorithm; hidden Markov models; exponential models and the maximum entropy principle; semantic modeling and dimensionality reduction; probabilistic context-free grammars and syntactic language models. http://www.cs.cmu.edu/~roni/11761/ Metacademy Concept list and roadmap list Metacademy is a community-driven, open-source platform for experts to collaboratively construct a web of knowledge. Right now, Metacademy focuses on machine learning and probabilistic AI, because that's what the current contributors are experts in. But eventually, Metacademy will cover a much wider breadth of knowledge, e.g. mathematics, engineering, music, medicine, computer science… http://www.metacademy.org/list http://www.metacademy.org/roadmaps/ HARVARD University: Advanced Machine Learning, Fall 2013 This course is about learning to extract statistical structure from data, for making decisions and predictions, as well as for visualization. The course will cover many of the most important mathematical and computational tools for probabilistic modeling, as well as examine specific models from the literature and examine how they can be used for particular types of data. There will be a heavy emphasis on implementation. You may use Matlab, Python or R. Each of the five assignments will involve some amount of coding, and the final project will almost certainly require the running of computer experiments. https://www.seas.harvard.edu/courses/cs281/ HARVARD University: Data Science Course, Fall 2013 Learning from data in order to gain useful predictions and insights. 
This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. We will be using Python for all programming assignments and projects.
http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml

OXFORD University: Nando de Freitas Video Lectures
I am a machine learning professor at UBC. I am making my lectures available to the world in the hope that this will give more folks out there the opportunity to learn some of the wonderful things I have been fortunate to learn myself. Enjoy.
http://www.youtube.com/user/ProfNandoDF

OXFORD University: Deep learning - Introduction by Nando de Freitas, 2015
Published on 29 Jan 2015. Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from Brendan Shillingford.
https://www.youtube.com/watch?v=PlhFWT7vAEw&spfreload=10

OXFORD University: Deep learning - Linear Models by Nando de Freitas, 2015
Published on 29 Jan 2015 (bad audio). Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from Brendan Shillingford.
https://www.youtube.com/watch?v=DHspIG64CVM&spfreload=10

OXFORD University: Yee Whye Teh Home Page, Department of Statistics, University College
Research interests: I am interested in machine learning, Bayesian statistics and computational statistics. My current focus is on developing Bayesian nonparametric methodologies, with applications to large and complex problems in unsupervised learning, computational linguistics, and genetics.
Teaching: Statistical Machine Learning and Data Mining (MS1b HT2014) - slides and problem sheets with solutions (not to be missed!)
http://www.stats.ox.ac.uk/~teh/smldm.html

About Bayesian Nonparametrics (MLSS 2013)
https://www.youtube.com/embed/dNeW5zoNJ7g?vq=hd1080&autoplay=1
https://www.youtube.com/embed/7sy MCbqtco?vq=hd1080&autoplay=1
https://www.youtube.com/embed/kqEWDdTB_3Q?vq=hd1080&autoplay=1
https://www.youtube.com/watch?v=FO0fgVS9OmE&spfreload=10
Slides: http://mlss.tuebingen.mpg.de/2013/slides teh.pdf

CAMBRIDGE University: Machine Learning Slides, Spring 2014
Lecture syllabus: this year, the exposition of the material will be centered around three specific machine learning areas: 1) supervised non-parametric probabilistic inference using Gaussian processes, 2) the TrueSkill ranking system, and 3) the latent Dirichlet allocation model for unsupervised learning in text.
http://mlg.eng.cam.ac.uk/teaching/4f13/1314/

CALTECH University: Learning from Data
Free, introductory machine learning online course (MOOC) taught by Caltech Professor Yaser Abu-Mostafa. Lectures recorded from a live broadcast, including Q&A. Prerequisites: basic probability, matrices, and calculus. 8 homework sets and a final exam; discussion forum for participants; topic-by-topic video library for easy review.
http://work.caltech.edu/telecourse.html
http://work.caltech.edu/library/

UNIVERSITY COLLEGE LONDON (UCL): Discovery
UCL Discovery showcases UCL's research publications, giving access to journal articles, book chapters, conference proceedings, digital web resources, theses and much more, from all UCL disciplines. Where copyright permissions allow, a full copy of each research publication is directly available from UCL Discovery. You can search or browse UCL Discovery, see the most-downloaded publications, and keep up to date with the latest UCL research by RSS or even on Twitter. UCL Discovery supports UCL's Publications Policy.
http://discovery.ucl.ac.uk
http://www.youtube.com/watch?v=Euaoblv nL8

UCL: Supervised Learning by Mark Herbster
The course covers supervised approaches to machine learning.
It starts with probabilistic pattern recognition, followed by an in-depth introduction to various supervised learning algorithms such as least squares, the Lasso, the perceptron algorithm, support vector machines and boosting.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/GI01/

Yann LeCun's Publications
My main research interests are machine learning, computer vision, mobile robotics, and computational neuroscience. I am also interested in data compression, digital libraries, the physics of computation, and all the applications of machine learning (vision, speech, language, document understanding, data mining, bioinformatics).
http://yann.lecun.com/exdb/publis/index.html#fulllist

Ecole Normale Superieure: Francis Bach, Courses and Exercises with solutions (English-French)
Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Spring 2012: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2011: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2011: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2010: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2010: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2009: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2008: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2008: Probabilistic modelling and graphical models - Enseignement Specialise - Ecole des Mines de Paris
Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2007: Probabilistic modelling and graphical models - Enseignement Specialise - Ecole des Mines de Paris
Fall 2006: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2005: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
http://www.di.ens.fr/~fbach/
http://videolectures.net/francis_r_bach/

Technion, Israel Institute of Technology, Machine Learning Videos
Added 22-Nov-2014. Technion - Israel Institute of Technology is Israel's biggest scientific-technological university and one of the largest centers of applied research in the world. Here the future is being shaped - by over 13,000 of Israel's most dynamic students active in 18 faculties. Technion is Israel's flagship of world-class education, bringing Israel its first Nobel Prizes in science. Since the cornerstone-laying ceremony in 1912, Technion's over 70,000 alumni have built the state of Israel and created and led the majority of Israel's successful companies, impacting millions of scientists, students, entrepreneurs and citizens worldwide.
http://www.youtube.com/user/Technion/search?query=machine learning

E0 370: Statistical Learning Theory by Prof. Shivani Agarwal, Indian Institute of Science
Course description: This is an advanced course on learning theory suitable for PhD students working in learning theory or related areas (e.g. information theory, game theory, computational complexity theory) or second-year Masters students doing a machine learning related project that involves learning-theoretic concepts. The course consists broadly of three parts and covers roughly the following topics:
Generalization error bounds: uniform convergence; growth function, VC dimension, Sauer's lemma; covering numbers, pseudo-dimension, fat-shattering dimension; margin analysis; Rademacher averages; algorithmic stability
Statistical consistency and learnability: consistency of ERM and SRM methods; learnability/PAC learning; consistency of nearest neighbor methods; consistency of surrogate risk minimization methods (binary and multiclass)
Online learning and multi-armed bandits: online classification/regression; online learning from experts, online allocation; online convex optimization; online-to-batch conversions; multi-armed bandits (stochastic and adversarial)
http://drona.csa.iisc.ernet.in/~shivani/Teaching/E0370/Aug-2013/index.html#lectures

NPTEL, National Programme on Technology Enhanced Learning, India
NPTEL provides e-learning through online web and video courses in engineering, science and humanities streams. The mission of NPTEL is to enhance the quality of engineering education in the country by providing free online courseware.
http://nptel.ac.in
Probability Theory and Applications: http://nptel.ac.in/courses/111104079/
Pattern Recognition: http://nptel.ac.in/courses/106106046/1

Pattern Recognition Class, Universität Heidelberg, 2012 (Videos in English)
Syllabus:
1. Introduction: 1.1 Applications of Pattern Recognition; 1.2 k-Nearest Neighbors Classification; 1.3 Probability Theory; 1.4 Statistical Decision Theory
2. Correlation Measures, Gaussian Models: 2.1 Pearson Correlation; 2.2 Alternative Correlation Measures; 2.3 Gaussian Graphical Models; 2.4 Discriminant Analysis
3. Dimensionality Reduction: 3.1 Regularized LDA/QDA; 3.2 Principal Component Analysis (PCA); 3.3 Bilinear Decompositions
4. Neural Networks: 4.1 History of Neural Networks; 4.2 Perceptrons; 4.3 Multilayer Perceptrons; 4.4 The Projection Trick; 4.5 Radial Basis Function Networks
5. Support Vector Machines: 5.1 Loss Functions; 5.2 Linear Soft-Margin SVM; 5.3 Nonlinear SVM
6. Kernels, Random Forest: 6.1 Kernels; 6.2 One-Class SVM; 6.3 Random Forest; 6.4 Random Forest Feature Importance
7. Regression: 7.1 Least-Squares Regression; 7.2 Optimum Experimental Design; 7.3 Case Study: Functional MRI; 7.4 Case Study: Computer Tomography; 7.5 Regularized Regression
8. Gaussian Processes: 8.1 Gaussian Process Regression; 8.2 GP Regression: Interpretation; 8.3 Gaussian Stochastic Processes; 8.4 Covariance Function
9. Unsupervised Learning: 9.1 Kernel Density Estimation; 9.2 Cluster Analysis; 9.3 Expectation Maximization; 9.4 Gaussian Mixture Models
10. Directed Graphical Models: 10.1 Bayesian Networks; 10.2 Variable Elimination; 10.3 Message Passing; 10.4 State Space Models
11. Optimization: 11.1 The Lagrangian Method; 11.2 Constraint Qualifications; 11.3 Linear Programming; 11.4 The Simplex Algorithm
12. Structured Learning: 12.1 structSVM; 12.2 Cutting Planes
https://www.youtube.com/playlist?list=PLuRaSnb3n4kRDZVU6wxPzGdx1CN12fn0w&spfreload=10

Videolectures.net
VideoLectures.NET is an award-winning free and open access educational video lectures repository. The lectures are given by distinguished scholars and scientists at the most important and prominent events - conferences, summer schools, workshops and science promotional events - from many fields of science.
The portal is aimed at promoting science, exchanging ideas and fostering knowledge sharing by providing high-quality didactic content not only to the scientific community but also to the general public. All lectures, accompanying documents, information and links are systematically selected and classified through the editorial process, which also takes users' comments into account.
http://videolectures.net/Top/Computer Science/Machine Learning/
http://videolectures.net/Top/Computer Science/Machine Learning/#o=top

MLSS Machine Learning Summer Schools Videos
MLSS videos from 2004 to 2012: http://videolectures.net/site/search/?q=MLSS
MLSS videos 2012: http://www.youtube.com/user/compcinemaucsc/feed
MLSS videos 2012: http://www.youtube.com/channel/UCHhbDEKA7BP58mq1wfTBQNQ

Max Planck Institute for Intelligent Systems Tubingen, MLSS Videos 2013
Our goal is to understand the principles of perception, action and learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems. The Institute studies these principles in biological, computational, hybrid, and material systems ranging from nano to macro scales. We take a highly interdisciplinary approach that combines mathematics, computation, material science, and biology. The MPI for Intelligent Systems has campuses in Stuttgart and Tübingen. Our Stuttgart campus has world-leading expertise in small-scale intelligent systems that leverage novel material science and biology. The Tübingen campus focuses on how intelligent systems process information to perceive, act and learn.
http://www.youtube.com/channel/UCty-pPOWlWUk4gXNm5pydcg
http://mlss.tuebingen.mpg.de/2013/speakers.html

MLSS Videos 2014
https://www.youtube.com/playlist?list=PLZSO 6bSqHQCIYxE3ycGLXHMjK3XV7Iz&spfreload=10

All slides of MLSS 2015, Austin, Texas
http://www.cs.utexas.edu/mlss/schedule

GoogleTechTalks
Machine learning: https://www.youtube.com/user/GoogleTechTalks/search?query=machine learning
Deep learning: https://www.youtube.com/user/GoogleTechTalks/search?query=deep learning

Udacity Opencourseware
Supervised Learning (select "View Courseware" for free access)
Why take this course? In this course, you will gain an understanding of a variety of topics and methods in supervised learning. Like function approximation in general, supervised learning prompts you to make generalizations based on fundamental assumptions about the world.
Michael: So why wouldn't you call it "function induction?"
Charles: Because someone said "supervised learning" first.
Topics covered in this course include: decision trees, neural networks, instance-based learning, ensemble learning, computational learning theory, Bayesian learning, and many other fascinating machine learning concepts.
https://www.udacity.com/course/ud675

Unsupervised Learning (select "View Courseware" for free access)
Why take this course? You will learn about and practice a variety of unsupervised learning approaches, including randomized optimization, clustering, feature selection and transformation, and information theory. You will learn important machine learning methods, techniques and best practices, and will gain experience implementing them through a hands-on final project in which you will design a movie recommendation system (just like Netflix!).
https://www.udacity.com/course/ud741

Reinforcement Learning (select "View Courseware" for free access)
Why take this course?
You will learn about reinforcement learning, the field of machine learning concerned with the actions that software agents ought to take in a particular environment in order to maximize rewards.
Michael: Reinforcement learning is a very popular field.
Charles: Perhaps because you're in it, Michael.
Michael: I don't think that's it.
In this course, you will gain an understanding of topics and methods in reinforcement learning, including Markov decision processes and game theory. You will gain experience implementing reinforcement learning techniques in a final project, in which we'll bring back the 80's and design a Pacman agent capable of eating all the food without getting eaten by monsters.
https://www.udacity.com/course/ud820

Model Building and Validation: Advanced Techniques for Analyzing Data
Course summary: This course will teach you how to start from scratch in answering questions about the real world using data; machine learning happens to be a small part of this process. Model building involves setting up ways of collecting data, understanding and paying attention to what is important in the data to answer the questions you are asking, and finding a statistical, mathematical or simulation model to gain understanding and make predictions. All of these things are equally important, and model building is a crucial skill to acquire in every field of science. The process stays true to the scientific method, making what you learn through your models useful both for gaining an understanding of whatever you are investigating and for making predictions that hold up under testing. We will take you on a journey through building various models. This process involves asking questions, gathering and manipulating data, building models, and ultimately testing and evaluating them.
https://www.udacity.com/course/ud919

Udacity's Videos
Udacity, a pioneer in online education, is building "University by Silicon Valley", a new type of online university that:
- teaches the actual programming skills that industry employers need today;
- delivers credentials endorsed by employers, because they built them;
- provides education at a fraction of the cost and time of traditional schools.
With industry giants - Google, AT&T, Facebook, Salesforce, Cloudera, etc. - we offer Nanodegree credentials, designed so professionals become web developers, data analysts, or mobile developers. Supported by our communities of coaches and students, our students learn programming and data science through a series of online courses and hands-on projects that help them practice and build a convincing portfolio.
https://www.youtube.com/user/Udacity/videos?spfreload=10

Mathematicalmonk Machine Learning
Videos about math, at the graduate or upper-level undergraduate level.
https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA

Judea Pearl Symposium
Judea Pearl (born 1936) is an Israeli-born American computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks (see the article on belief propagation). He is also credited with developing a theory of causal and counterfactual inference based on structural models (see the article on causality). He is the 2011 winner of the ACM Turing Award, the highest distinction in computer science, "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning".
(source: Wikipedia)
http://www.youtube.com/playlist?list=PLMliWGoMCBYilM6tw6S 4BpL t29jbWsp
http://www.youtube.com/user/UCLA/playlists

SIGDATA, Indian Institute of Technology Kanpur
http://www.cse.iitk.ac.in/users/sigdata/
http://www.cse.iitk.ac.in/users/sesres/

Hakka Labs
Hakka Labs is passionate about helping professional software engineers level up in their careers. Our content, events and community have grown by leaps and bounds since our humble origin, when we launched as a Tumblr blog in 2011. We believe that "software is eating the world", and our passion is in building valuable resources and community for startup-oriented software engineers - the folks who will power innovation, disrupt industries, and ultimately shape our future. Hakka originally launched in SF Bay & NYC and rapidly built relationships with the top companies, CTOs and tech influencers in these key areas. We have deep connections to the software engineering worlds on both coasts and often invite groups of CTOs and engineers to our office in Soho, or meet with them at engineering events that we either run or participate in. We're also currently up and running in Berlin and Moscow, and plan to continue to rapidly expand worldwide. Not too shabby for a scrappy startup with a small marketing budget!
http://www.hakkalabs.co
https://www.youtube.com/user/g33ktalktv/videos

Open Yale Courses: Game Theory
Each course includes a full set of class lectures produced in high-quality video, accompanied by other course materials such as syllabi, suggested readings, exams, and problem sets. The lectures are available as downloadable videos, and an audio-only version is also offered. In addition, searchable transcripts of each lecture are provided.
http://oyc.yale.edu/courses

COLUMBIA University: Machine Learning resources
Course-related notes:
• Regression by linear combination of basis functions [ps] [pdf]
• The perceptron [ps] [pdf]
• Document classification with the multinomial model [ps] [pdf]
• Sampling from a Gaussian [ps] [pdf]
• Slides on exponential family distributions [ps] [pdf]
http://www.cs.columbia.edu/~jebara/4771/tutorials.html

COLUMBIA University: Applied Data Science by Ian Langmore and Daniel Krasner
The purpose of this course is to take people with strong mathematical/statistical knowledge and teach them software development fundamentals. This course will cover:
• Design of small software packages
• Working in a Unix environment
• Designing software in teams
• Fundamental statistical algorithms such as linear and logistic regression
• Overfitting and how to avoid it
• Working with text data (e.g. regular expressions)
• Time series
• And more...
http://columbia-applied-data-science.github.io/appdatasci.pdf
http://columbia-applied-data-science.github.io

Deep Learning
Deep learning is a new area of machine learning research, introduced with the objective of moving machine learning closer to one of its original goals: artificial intelligence. This website is intended to host a variety of resources and pointers to information about deep learning. In these pages you will find:
• a reading list,
• links to software,
• datasets,
• a list of deep learning research groups and labs,
• a list of announcements for deep learning related jobs (job listings),
• as well as tutorials and cool demos.
For the latest additions, including papers and software announcements, be sure to visit the Blog section and subscribe to the website's RSS feed. Contact us if you have any comments or suggestions!
http://www.deeplearning.net/tutorial/
http://deeplearning.net

BigDataWeek Videos
Big Data Week is one of the most unique global platforms of interconnected community events focusing on the social, political, technological and commercial impacts of big data. It brings together a global community of data scientists, data technologists, data visualisers and data businesses spanning six major commercial, financial, social and technological sectors.
http://www.youtube.com/user/BigDataWeek/videos

Neural Information Processing Systems Foundation (NIPS) Video resources
The Neural Information Processing Systems (NIPS) Foundation is a non-profit corporation whose purpose is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects. Neural information processing is a field which benefits from a combined view of the biological, physical, mathematical, and computational sciences. The primary focus of the NIPS Foundation is the presentation of a continuing series of professional meetings known as the Neural Information Processing Systems Conference, held over the years at various locations in the United States, Canada and Spain.
http://www.youtube.com/user/NeuralInformationPro/feed

NIPS 2014 Workshop Videos
https://www.youtube.com/user/NeuralInformationPro/videos?spfreload=10

NIPS 2014 Workshop - (Bengio) OPT2014 Optimization for Machine Learning
Optimization lies at the heart of many machine learning algorithms and enjoys great interest in our community. Indeed, this intimate relation of optimization with ML is the key motivation for the OPT series of workshops.
We aim to foster discussion, discovery, and dissemination of the state of the art in optimization relevant to ML. This year, as the seventh in the series, the workshop's special topic is the challenge of non-convex optimization, with contributions spanning both the difficulties (hardness results) and the opportunities (modeling flexibility) of non-convex optimization. Irrespective of the special topic, the workshop will again warmly welcome contributed talks and posters on all topics in optimization for machine learning. The confirmed invited speakers for this year are:
* Amir Beck (Technion, Israel)
* Jean Bernard Lasserre (CNRS, France)
* Yoshua Bengio (University of Montreal, Canada)
https://www.youtube.com/watch?v=jl-s4gFWhlI&spfreload=10

Hong Kong Open Source Conference 2013 (English & Chinese)
Wang Leung Wong, Vice-Chairperson of the Hong Kong Linux User Group. This channel will post the videos of my life and open-source events in Hong Kong.
Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f
http://www.youtube.com/playlist?list=PL2FSfitY-hTLOL6tT 12YUK4c67e-E0xh

ICLR 2014 Videos
It is well understood that the performance of machine learning methods is heavily dependent on the choice of data representation (or features) on which they are applied. The rapidly developing field of representation learning is concerned with questions surrounding how we can best learn meaningful and useful representations of data. We take a broad view of the field, and include in it topics such as deep learning and feature learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues regarding non-convex optimization.
Despite the importance of representation learning to machine learning and to application areas such as vision, speech, audio and NLP, there is currently no common venue for researchers who share an interest in this topic. The goal of ICLR is to help fill this void. ICLR 2014 will be a 3-day event from April 14th to April 16th 2014, in Banff, Canada. The conference will follow the recently introduced open reviewing and open publishing publication process, which is explained in further detail here: Publication Model.
https://www.youtube.com/playlist?list=PLhiWXaTdsWB-3O19E0PSR0r9OseIylUM8

ICLR 2013 Videos
ICLR 2013 will be a 3-day event from May 2nd to May 4th 2013, co-located with AISTATS 2013 in Scottsdale, Arizona. The conference will adopt a novel publication process, which is explained in further detail here: Publication Model.
https://sites.google.com/site/representationlearning2013/program-details/program

Machine Learning Conference Videos
Events matching your search:
• ICML 2011
• Sixth Annual Machine Learning Symposium
• 1st Lisbon Machine Learning School
• Copulas in Machine Learning Workshop 2011
• NIPS 2011 Workshop on Integrating Language and Vision
• Machine Learning in Computational Biology (MLCB) 2011
• Learning Semantics Workshop
• Sparse Representation and Low-rank Approximation
• The 4th International Workshop on Music and Machine Learning: Learning from Musical Structure
• Big Learning: Algorithms, Systems, and Tools for Learning at Scale
• ICML 2012 Oral Talks (International Conference on Machine Learning)
• Big Data Meets Computer Vision: First International Workshop on Large Scale Visual Recognition and Retrieval
• 2nd Workshop on Semantic Perception, Mapping and Exploration (SPME)
• ICML 2012 Workshop on Representation Learning
• Inferning 2012: ICML Workshop on interaction between Inference and Learning
• Object, functional and structured data: towards next generation kernel-based methods - ICML 2012 Workshop
• Tutorial on Statistical Learning Theory in Reinforcement Learning and Approximate Dynamic Programming
• Tutorial on Causal inference - conditional independences and beyond
• ICML 2012 Tutorial on Prediction, Belief, and Markets
• PAC-Bayesian Analysis in Supervised, Unsupervised, and Reinforcement Learning
• Performance Evaluation for Learning Algorithms: Techniques, Application and Issues
• 2nd Lisbon Machine Learning School (2012)
• OpenCV using Python
• Big Learning: Algorithms, Systems, and Tools
• NIPS 2012 Workshop on Log-Linear Models
• Machine Learning in Computational Biology (MLCB) 2012
• NYU Course on Big Data, Large Scale Machine Learning
• Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS) 2013
• International Conference on Learning Representations (ICLR) 2013
• ICML 2013 Plenary Webcast
• NYU Course on Deep Learning (Spring 2014)
• NYU Course on Machine Learning and Computational Statistics 2014
http://techtalks.tv/search/results/?q=machine learning

Internet Archive
Hello Patron, Every day 3 million people use our collections. We have archived over ten petabytes (that's 10,000,000,000,000,000 bytes!) of information, including everything ever written in Balinese. This year we also launched our groundbreaking TV News Search and Borrow service, which former FCC Chairman Newton Minow said "offers citizens exceptional opportunities" to easily do their own fact checking and "to hold powerful public institutions accountable." Your support helps us build amazing services and keep them free for people around the globe.
https://archive.org/search.php?query=machine%20learning

UC Berkeley
http://www.youtube.com/user/UCBerkeley/search?query=machine learning

AMP Camps, Big Data Bootcamp, UC Berkeley
AMP Camps are Big Data training events organized by the UC Berkeley AMPLab about big data analytics, machine learning, and popular open-source software projects produced by the AMPLab.
All AMP Camp curriculum, and whenever possible videos of instructional talks presented at AMP Camps, are published here and accessible for free.
http://ampcamp.berkeley.edu
AMP Camp 5 was held at UC Berkeley and live-streamed online on November 20 and 21, 2014. Videos and exercises from the event are available on the AMP Camp 5 page.
http://ampcamp.berkeley.edu/5/

AI on the Web, AIMA (Artificial Intelligence: A Modern Approach) by Stuart Russell and Peter Norvig
This page links to 820 pages around the web with information on Artificial Intelligence. Links in bold followed by a star (*) are especially useful and interesting sites. Links with a sign at the end have "tooltip" information that will pop up if you put your mouse over the link for a second or two. If you have new links to add, mail them to peter@norvig.com.
http://aima.cs.berkeley.edu/ai.html
http://aima.cs.berkeley.edu/ai.html#learning

Resources and Tools of Noah's ARK Research Group
The following were developed by ARK researchers (*developed in whole or in part before joining ARK):
NLP tools: universal part-of-speech tagset, a set of twelve coarse POS tags that generalizes across several languages
Semantics: SEMAFOR, an open-source statistical frame-semantic parser; AMALGr, an open-source statistical analyzer for multiword expressions in context
Syntax: TurboParser, an open-source, trainable statistical dependency parser; MSTParserStacked, an open-source, trainable statistical dependency parser based on stacking; DAGEEM, code for unsupervised dependency grammar induction
Information extraction: Arabic named entity recognizer
Libraries/languages: AD3, an approximate MAP decoder; *Dyna, a declarative programming language for dynamic programming algorithms
Machine translation tools, including: *cdec, a framework for statistical translation and other structured prediction problems; *Egypt, a statistical machine translation toolkit that includes Giza; gappy pattern models, code for modeling monolingual and bilingual textual patterns with gaps; Rampion, a training algorithm for statistical machine translation models
Social media tools, including: Twitter NLP resources
Datasets: *STRAND (parallel text collections from the web); CURD (the Carnegie Mellon University Recipe Database); 10-K Corpus (company annual reports and stock return volatility data); political blog corpus; movie$ corpus; movie summary corpus; question-answer data; Congressional bills corpus; Arabic named entity and supersense corpora; NFL tweets corpus; multiword expressions corpus
Project websites: Flexible Learning for NLP; Low-Density MT; Compuframes; Big Multilinguality; Corporate Social Network
http://www.ark.cs.cmu.edu/#resources

ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014
About the ESAC Faculty: The ESAC Faculty was created in 2006 in order to foster an effective scientific environment at ESAC, and to present a united face for the scientific work done at the centre. The Faculty includes all active (i.e. publishing papers) research scientists at ESAC: ESA staff, Research Fellows, Science Contractors, and LAEFF members. For an insight into the founding principles, see the Overview of the ESAC Faculty presentation given at the first assembly. The ESAC Faculty's main purpose is to stimulate and promote science activities at ESAC. For this it maintains an active and attractive visitor programme for short-to-medium-term collaborative stays at ESAC, covering established researchers as well as young post-docs, PhD and graduate students. The Faculty also supports visiting seminar speakers, conferences, workshops and travel not possible via normal mission budgets. ESAC Faculty members pursue their own research (as per the scientific interests of individual members), but are also involved in numerous internal and external collaborations (see the overview of Faculty science at ESAC). Faculty members are also strongly involved in the ESAC Trainee programme.
http://www.cosmos.esa.int/web/esac-science-faculty/esac-statistics-workshop-2014

The Royal Society
The Royal Society is a self-governing Fellowship of many of the world's most distinguished scientists drawn from all areas of science, engineering, and medicine. The Society's fundamental purpose, reflected in its founding Charters of the 1660s, is to recognise, promote, and support excellence in science and to encourage the development and use of science for the benefit of humanity. The Society has played a part in some of the most fundamental, significant, and life-changing discoveries in scientific history, and Royal Society scientists continue to make outstanding contributions to science in many research areas.
The Royal Society is the national Academy of science in the UK, and at its core is its Fellowship and Foreign Membership, supported by a dedicated staff in London and elsewhere. The Fellowship comprises the most eminent scientists of the UK, Ireland and the Commonwealth. A major activity of the Society is identifying and supporting the work of outstanding scientists. The Society supports researchers through its early and senior career schemes, innovation and industry schemes, and other schemes. The Society facilitates interaction and communication among scientists via its discussion meetings, and disseminates scientific advances through its journals. The Society also engages beyond the research community, through independent policy work, the promotion of high-quality science education, and communication with the public.
https://www.youtube.com/user/RoyalSociety/videos?spfreload=10

Statistical and causal approaches to machine learning by Professor Bernhard Schölkopf
https://www.youtube.com/watch?v=ek9jwRA2Jio&spfreload=10

Deep Learning RNNaissance with Dr. Juergen Schmidhuber
A great session of the NYC-ML Meetup hosted by ShutterStock in the glorious Empire State Building.
Details: Deep Learning RNNaissance. Machine learning and pattern recognition are currently being revolutionised by "Deep Learning" (DL).
https://www.youtube.com/watch?v=6bOMf9zr7N8&spfreload=10

Introduction to Deep Learning with Python by Alec Radford
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning with Python and the Theano library. The emphasis of the talk is on high performance computing, natural language processing using recurrent neural nets, and large scale learning with GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
SlideShare presentation is available here: http://slidesha.re/1zs9M11

A Statistical Learning/Pattern Recognition Glossary by Thomas Minka
Welcome to my glossary. It is inspired by Brian Ripley's glossary in "Pattern Recognition for Neural Networks" (and the need to save time explaining things).
http://alumni.media.mit.edu/~tpminka/statlearn/glossary/

The Kalman Filter Website by Greg Welch and Gary Bishop
The Kalman Filter: some tutorials, references, and research related to the Kalman filter. This site is maintained by Greg Welch in Nursing / Computer Science / Simulation & Training at the University of Central Florida, and Gary Bishop in the Department of Computer Science at the University of North Carolina at Chapel Hill. Welch also holds an adjunct position at UNC-Chapel Hill. Please send additions or comments.
http://www.cs.unc.edu/~welch/kalman/index.html

Lisbon Machine Learning School (LXMLS)
LXMLS Lab guide (Great Tutorial!)
Day 0: In this class we will introduce several fundamental concepts needed further ahead. We start with an introduction to Python, the programming language we will use in the lab sessions, and to Matplotlib and Numpy, two modules for plotting and scientific computing in Python, respectively. Afterwards, we present several notions of probability theory and linear algebra. Finally, we focus on numerical optimization.
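As a taste of the Day 0 numerical-optimization topic above, here is a minimal gradient-descent sketch in plain Python. The quadratic objective, step size, and function names are illustrative choices for this kit, not taken from the lab guide's code:

```python
# Minimize f(x) = (x - 3)^2 by gradient descent: repeat x <- x - lr * f'(x).
def grad_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # step against the gradient direction
    return x

# f'(x) = 2 * (x - 3); the unique minimum of f is at x = 3.
x_min = grad_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 4))  # converges to 3.0
```

With this step size each iteration shrinks the distance to the minimum by a constant factor (here 0.8), which is the kind of convergence behaviour the Day 0 session analyses.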
The goal of this class is to give you the basic knowledge to understand the following lectures. We will not go into too much detail on any of the topics.
Day 1: This day will serve as an introduction to machine learning. We recall some fundamental concepts about decision theory and classification. We also present some widely used models and algorithms and try to provide the main motivation behind them. There are several textbooks that provide a thorough description of some of the concepts introduced here: for example, Mitchell (1997), Duda et al. (2001), Schölkopf and Smola (2002), Joachims (2002), Bishop (2006), Manning et al. (2008), to name just a few. The concepts that we introduce in this chapter will be revisited in later chapters, where the same algorithms and models will be adapted to structured inputs and outputs. For now, we concern ourselves only with multi-class classification (with just a few classes).
Day 2: In this class, we relax the assumption that the data points are independently and identically distributed (i.i.d.) by moving to a scenario of structured prediction, where the inputs are assumed to have temporal or spatial dependencies. We start by considering sequential models, which correspond to a chain structure: for instance, the words in a sentence. In this lecture, we will use part-of-speech tagging as our example task. We start by defining the notation for this lecture in Section 2.1. Afterwards, in Section 2.2, we focus on the well-known Hidden Markov Models, and in Section 2.3 we describe how to estimate their parameters from labeled data. In Section 2.4 we explain the inference algorithms (Viterbi and Forward-Backward) for sequence models. These inference algorithms will be fundamental for the rest of this lecture, as well as for the next lecture on discriminative training of sequence models. In Section 2.6 we describe the task of part-of-speech tagging, and how Hidden Markov Models are suitable for this task.
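The Viterbi inference for sequence models described above can be sketched in a few lines of Numpy. The toy transition and emission matrices below are illustrative, not taken from the guide's code:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for observation indices `obs`.
    pi: initial state probs (S,), A: transitions (S, S), B: emissions (S, V)."""
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))           # best log-score of a path ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers to the best previous state
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[from, to]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrace the best path from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Two hidden states, three observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], pi, A, B))  # -> [0, 0, 1]
```

Working in log space, as here, avoids the numerical underflow that plagues long sequences when probabilities are multiplied directly; Forward-Backward follows the same recursion shape with sums in place of maxima.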
Finally, in Section 2.7 we address unsupervised learning of Hidden Markov Models through the Expectation Maximization algorithm.
Day 3: In this class, we will continue to focus on sequence classification, but instead of following a generative approach (like in the previous chapter) we move towards discriminative approaches. Recall that the difference between these approaches is that generative approaches attempt to model the probability distribution of the data, P(X, Y), whereas discriminative ones only model the conditional probability of the sequence given the observed data, P(Y|X).
Day 4: In this lab we will implement some exercises related to parsing.
Day 5: In this lab (and tomorrow), we will work with Amazon.com's Web Services (AWS), a cloud-based solution, to run some simple analyses. Then, in the next lab, we will build on these tools to construct a larger learning system. We will only look at small problems, such that you can run them both locally and on AWS quickly. This way, you can learn how to use them within the limited time of these lab sessions. Unfortunately, this also means that you will not be dealing with truly large-scale problems where AWS is faster than local computations. You should consider these last two days as a proof-of-concept giving you the knowledge necessary to run things on AWS, which you can apply to your own large-scale problems after this summer school.
Day 6: In the previous lesson, you learned the fundamentals of MapReduce and applied it to a simple classification problem (language detection, using the Naïve Bayes classifier). Today, we're going to use MapReduce again to solve a trickier problem: using EM to perform unsupervised POS induction. Use the same login information you used yesterday to access your Amazon machine.
http://lxmls.it.pt/2014/guide.pdf

LXMLS Slides, 2014
During the morning there will be lectures focusing on the main areas of ML and their application to NLP.
These areas include but are not restricted to: Classification, Structured Prediction (sequences, trees, graphs), Parsing, Information Retrieval, and their applications to practical language processing on the Web. For each topic introduced in the morning there will be a practical session in the afternoon, where students will have the opportunity to test the concepts in practice. The practical sessions will consist of implementation exercises (using Python, Numpy, and Matplotlib) of the methods learned during the morning, testing them on real examples. A preliminary version of the lab guide is available here.
http://lxmls.it.pt/2014/?page_id=5

INTRODUCTORY APPLIED MACHINE LEARNING by Victor Lavrenko and Nigel Goddard, University of Edinburgh, 2011
The goal of this course is to introduce students to basic algorithms for learning from examples, focusing on classification and clustering problems. This is a level 9 course intended for MSc students and 3rd-year undergraduates.
http://www.inf.ed.ac.uk/teaching/courses/iaml/

Data Mining and Machine Learning Course Material by Bamshad Mobasher, DePaul University, Fall 2014
COURSE DESCRIPTION
The course will focus on the implementations of various data mining and machine learning techniques and their applications in various domains. The primary tools used in the class are the Python programming language and several associated libraries. Additional open-source machine learning and data mining tools may also be used as part of the class material and assignments. Students will develop hands-on experience developing supervised and unsupervised machine learning algorithms and will learn how to employ these techniques in the context of popular applications such as automatic classification, recommender systems, searching and ranking, text mining, group and community discovery, and social media analytics.
http://facweb.cs.depaul.edu/mobasher/classes/CSC478/lecture.html

Intelligent Information Retrieval by Bamshad Mobasher, DePaul University, Winter 2015
COURSE DESCRIPTION
This course will examine the design, implementation, and evaluation of information retrieval systems, such as Web search engines, as well as new and emerging technologies to build the next generation of intelligent and personalized search tools and Web information systems. We will focus on the underlying retrieval models, algorithms, and system implementations, such as vector-space and probabilistic retrieval models, as well as the PageRank algorithm used by Google. We will also study more advanced topics in intelligent information retrieval and filtering, particularly on the World Wide Web, including techniques for document categorization, automatic concept discovery, recommender systems, discovery and analysis of online communities and social networks, and personalized search. Throughout the course, current literature from the viewpoints of both research and practical retrieval technologies, both on and off the World Wide Web, will be examined.
http://facweb.cs.depaul.edu/mobasher/classes/csc575/lecture.html

Student Dave Youtube Channel
https://www.youtube.com/user/TheScienceguy3000/videos?spfreload=10

Current Courses of Justin E.
Esarey, Rice University
Current Courses
POLS 395: Introduction to Statistics [syllabus]
POLS 500: Social Scientific Thinking I (PhD) [syllabus]
POLS 505: Advanced MLE: Analyzing Categorical and Longitudinal Data [syllabus]
POLS 506: Bayesian Statistics (PhD) [syllabus]
Lecture 0: Introduction to R [webcast lecture] [R script]
Lecture 1: Basic Concepts of Bayesian Inference [webcast lecture] [R script] [notebook]
Lecture 2: Simple Bayesian Models
Lecture 3: Basic Monte Carlo Procedures and Sampling Algorithms
Lecture 4: The Metropolis-Hastings Algorithm and the Gibbs Sampler
Lecture 5: Practical MCMC for Estimating Models
Lecture 6: Bayesian Hierarchical Models and GLMs
Lecture 7: Fitting Hierarchical Models with BUGS
Lecture 8: Item Response Theory and the Scaling of Latent Dimensions
Lecture 9: Model Checking, Validation, and Comparison
Lecture 10: Missing Data Imputation
Lecture 11: Multilevel Regression and Poststratification
Lecture 12: Bayesian Spatial Autoregressive Models
POLS 507: Nonparametric Models and Machine Learning (PhD) [syllabus]
Lecture 1: Introduction to Nonparametric Statistics [webcast lecture] [R script] [notebook]
Lecture 2: Nonparametric Uncertainty Estimation and Bootstrapping [webcast lecture] [R script] [notebook]
Lecture 3: Ensemble Models and Bayesian Model Averaging [webcast lecture] [R script] [notebook]
Lecture 4: "Causal Inference" and Matching [webcast lecture] [R script] [notebook]
Lecture 5: Instrumental Variable Models [webcast lecture] [R script] [notebook]
Lecture 6: Bayesian Networks and Causality [webcast lecture] [R script] [notebook]
Lecture 7: Assessing Fit in Discrete Choice Models [webcast lecture] [R script] [notebook]
Lecture 8: Identifying and Measuring Latent Variables [webcast lecture] [R script] [notebook]
Lecture 9: Neural Networks [webcast lecture] [R script] [notebook]
Lecture 10: Classification and Regression Trees [webcast lecture] [R script] [notebook]
http://jee3.web.rice.edu/teaching.htm

From Bytes to
Bites: How Data Science Might Help Feed the World by David Lobell, Stanford University
This seminar features leading industrial and academic experts on big data analytics, information management, data mining, machine learning, and large-scale data processing.
http://i.stanford.edu/infoseminar/lobell.html

Conference on Empirical Methods in Natural Language Processing (and forerunners) (EMNLP) (Free access to all publications)
The ACL Anthology currently hosts 33921 papers on the study of computational linguistics and natural language processing. Subscribe to the mailing list to receive announcements and updates to the Anthology.
http://aclanthology.info/venues/emnlp

emnlp acl's Youtube Channel
https://www.youtube.com/channel/UCZC4e4nrTjVqkW3Gcl16WoA/videos?spfreload=10

Columbia University's Laboratory for Intelligent Imaging and Neural Computing (LIINC)
Columbia University's Laboratory for Intelligent Imaging and Neural Computing (LIINC) was founded in September 2000 by Paul Sajda. The mission of LIINC is to use principles of reverse "neuro"-engineering to characterize the cortical networks underlying perceptual and cognitive processes, such as rapid decision making, in the human brain. Our laboratory pursues both basic and applied neuroscience research projects, with emphasis in the following: ...
http://liinc.bme.columbia.edu/mainTemplate.htm?liinc_projects.htm

Enabling Brain-Computer Interfaces for Labeling Our Environment by Paul Sajda
NYC Machine Learning Meetup, 1/15/15. Paul Sajda from Columbia University presenting "Neural Correlates of the 'Aha' Moment: Enabling Brain-Computer Interfaces for Labeling Our Environment".
https://www.youtube.com/watch?v=weNqauwatBs

The Unreasonable Effectiveness of Deep Learning by Yann LeCun, Sept 2014
http://videolectures.net/sahd2014_lecun_deep_learning/

Machine Learning by Prof.
Shai Ben-David, University of Waterloo, Lecture 1-3, Jan 2015
https://www.youtube.com/watch?v=iN8des41d94&spfreload=10
https://www.youtube.com/watch?v=rOcjShZbCFo&spfreload=10
https://www.youtube.com/watch?v=MYbt63PPP8o&spfreload=10
https://www.youtube.com/watch?v=jEIIkhESDac&spfreload=10

Computer Vision by Richard E. Turner, Slides, Exercises & Solutions, University of Cambridge
http://cbl.eng.cam.ac.uk/Public/Turner/Teaching

Probability and Statistics by Carl Edward Rasmussen, Slides, University of Cambridge
http://mlg.eng.cam.ac.uk/teaching/1BP7/1415/

Machine Learning by Carl Edward Rasmussen, Slides, University of Cambridge
http://mlg.eng.cam.ac.uk/teaching/4f13/1314/

Seth Grimes's videos, Sentiment Analysis Symposium
http://vimeo.com/sethgrimes/videos

Introduction to Reinforcement Learning by Shane Conway, Nov 2014
Machine learning is often divided into three categories: supervised, unsupervised, and reinforcement learning. Reinforcement learning concerns problems with sequences of decisions (where each decision affects subsequent opportunities), in which the effects can be uncertain, and with potentially long-term goals. It has achieved immense success in various fields, especially AI/Robotics and Operations Research, by providing a framework for learning from interactions with an environment and feedback in the form of rewards and penalties. Shane Conway, researcher at Kepos Capital, gives a general overview of reinforcement learning, covering how to solve cases where there is uncertainty both in actions and states, as well as where the state space is very large.
https://www.hakkalabs.co/articles/introduction-reinforcement-learning#!

Machine Learning and Data Mining by Prof. Dr. Volker Tresp, 2014, LMU
The lecture is given in English.
http://www.dbs.ifi.lmu.de/cms/Maschinelles_Lernen_und_Data_Mining

Applied Machine Learning by Joelle Pineau, Fall 2014, McGill University
http://cs.mcgill.ca/~jpineau/comp598/schedule.html

Analyzing data from the city of Montreal
We gave the following instructions to our students. Here's what they came up with. There is a significant effort towards moving much of the data from the city of Montreal into an Open Data format. This data can be accessed here:
http://donnees.ville.montreal.qc.ca/
http://donnees.ville.montreal.qc.ca/english-version-of-the-portail-des-donnees-ouvertes-de-laville-de-montreal/
The goal of this project is to use this data to identify an interesting prediction question that can be tackled using machine learning methods, and to solve the problem using appropriate machine learning algorithms and methodology. You are not restricted to using only this data (though you should use some of it). You can incorporate data from other sources, or collect additional data (e.g. a new test set) if appropriate. The choice of prediction task and dataset to use is open. Try to pick a prediction question that is relevant and important to the citizens or administrators of the city. Remember to design a prediction task that is well suited to your choice of dataset, and vice versa: pick the right data for tackling your prediction question.
http://rl.cs.mcgill.ca/comp598/fall2014/

Artificial Intelligence by Joelle Pineau, Winter 2014-2015, McGill University
http://www.cs.mcgill.ca/~jpineau/comp424/schedule.html

Talking Machines: The History of Machine Learning from the Inside Out
In episode five of Talking Machines, we hear the first part of our conversation with Geoffrey Hinton (Google and University of Toronto), Yoshua Bengio (University of Montreal) and Yann LeCun (Facebook and NYU). Ryan introduces us to the ideas in tensor factorization methods for learning latent variable models (which is both a tongue twister and one of the new tools in ML).
To find out more on the topic, the paper "Tensor decompositions for learning latent variable models" is a good place to start. You can also take a look at the work of Daniel Hsu, Animashree Anandkumar and Sham M. Kakade. Plus we take a listener question about just where statistics stops and machine learning begins.
http://www.thetalkingmachines.com/blog/2015/2/26/the-history-of-machine-learning-from-the-inside-out

The Simons Institute for the Theory of Computing
About: The Simons Institute for the Theory of Computing is an exciting new venue for collaborative research in theoretical computer science. Established on July 1, 2012 with a grant of $60 million from the Simons Foundation, the Institute is housed in Calvin Lab, a dedicated building on the UC Berkeley campus. Its goal is to bring together the world's leading researchers in theoretical computer science and related fields, as well as the next generation of outstanding young scholars, to explore deep unsolved problems about the nature and limits of computation.
Open Lectures: The Simons Institute Open Lectures are aimed at a broad scientific audience. Upcoming lectures can be viewed on the list of Other Events. To view the video of a past lecture, please follow the link in the list below.
http://simons.berkeley.edu/events/openlectures

DIKU - Datalogisk Institut, Københavns Universitet (Department of Computer Science, University of Copenhagen) Youtube Channel
https://www.youtube.com/channel/UCo1j8XjbD3B0UVjP0OTU3ZA?spfreload=10

Hashing in machine learning by John Langford, Microsoft Research
Video of the lecture from the 2014 Summer School on Hashing: Theory and Applications, July 14-17, 2014, University of Copenhagen, Denmark.
https://www.youtube.com/watch?v=BItoTJDupgM&spfreload=10

Dimensionality reductions by Alexander Andoni, Microsoft Research
Video of the lecture from the 2014 Summer School on Hashing: Theory and Applications, July 14-17, 2014, University of Copenhagen, Denmark.
https://www.youtube.com/watch?v=uLVMv9HFqIk&spfreload=10

RE.WORK Deep Learning Summit Videos, San Francisco 2015
https://www.youtube.com/playlist?list=PLnDbcXCpYZ8lCKExMs8k4PtIbani9ESX3

Machine Learning Tutorial, UNSW Australia
http://www.cse.unsw.edu.au/~cs9417ml/

Reinforcement Learning Tutorial by Tim Eden, Anthony Knittel and Raphael van Uffelen
http://www.cse.unsw.edu.au/~cs9417ml/RL1/index.html

Oxford Podcasts
About: This free site features public lectures, teaching material, interviews with leading academics, information about applying to the University, and much more. All the material is arranged within a series of related talks or lectures and may be in audio, video or document format. A full list of all series is available. Content is added to the site regularly. All content is free for you to download and watch or listen to. This site contains over 6,500 items arranged into 416 series. Over 4,780 academic contributors have released material.
http://podcasts.ox.ac.uk

Natural Language Processing by Mohamed Alaa El-Dien Aly, 2014, KAUST
Information: This course covers basic Natural Language Processing concepts. Topics include: language modeling, spelling correction, sentiment analysis, parsing, text classification, information retrieval, etc. We will closely follow Coursera's two NLP classes: that by Jurafsky and Manning, as well as that by Collins.
http://www.mohamedaly.info/teaching/cmp462-spring-2014-natural-language-processing

QUT - Queensland University of Technology, Brisbane, Australia
https://moocs.qut.edu.au/users/sign_in
https://www.qut.edu.au/

QUT: Introduction to Robotics by Professor Peter Corke (you need to sign in)
Course Summary: This course is an introduction to the exciting world of robotics and the mathematics and algorithms that underpin it. You will develop an understanding of the representation of pose and motion, kinematics, dynamics and control.
You will also be introduced to the variety of robots and the diversity of tasks to which this knowledge and these skills can be applied, the role of robots in society, and associated ethical issues. If you have access to a LEGO Mindstorms robotics development kit you will be able to build a simple robot arm and write the control software for it. This course, combined with the Robotic Vision MOOC, is based on a 13-week undergraduate course, Introduction to Robotics, at the Queensland University of Technology.

QUT: Robotic Vision by Professor Peter Corke (you need to sign in)
Course Summary: Robotic Vision introduces you to the field of computer vision and the mathematics and algorithms that underpin it. You'll learn how to interpret images to determine the colour, size, shape and position of objects in the scene. We'll work with you to build an intelligent vision system that can recognise objects of different colours and shapes.

Data & Society
Data & Society is an NYC-based think/do tank focused on social, cultural, and ethical issues arising from data-centric technological development. Data & Society is an independent nonprofit 501(c)(3) research institute. Its creation is supported by a generous gift from Microsoft.
http://www.datasociety.net

Open Book for people with autism
Open Book is a new interactive tool that will assist people with autism in transforming written information into a format that is easier for them to read and understand. This program has been developed by the FIRST project. The program is primarily aimed at people with autism who have IQ levels of 70 and above. Open Book is now available online in English, Spanish and Bulgarian. This project is partially funded by the European Commission under the Seventh Framework Programme for Research and Technological Development (FP7-2007-2013).
http://www.first-asd.eu/openbook-video

NUMDAM, search and download of digitized mathematics journal archives
http://www.numdam.org/?lang=en

Project Euclid, mathematics and statistics online
http://projecteuclid.org
Statistical Modeling: The Two Cultures by Leo Breiman, 2001
http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

mini-DML
The goal of the project: to collate in one place basic bibliographical data for any kind of mathematical digital article and make them accessible to users through simple search or metadata retrieval.
The collections: a proof-of-concept implementation is presented, based on a variety of sources of mathematical texts. The main emphasis is on long-run journals whose early production is widely unknown to MathSciNet / Jahrbuch Zentralblatt-MATH, with special interest towards current production and preprints on the other end.
NUMDAM journals (currently: 18) and seminars (currently: 21)
CEDRAM journals (current issues) (4)
One Gallica journal: Journal de mathématiques pures & appliquées (a.k.a. Liouville) up to 1880
Gallica Complete works (Abel, Cauchy, Dirichlet, Fourier, Jacobi, Klein, Lagrange, Laguerre, Laplace, Möbius, Riemann)
Project Euclid journals (Duke, Adv. in Appl. Probab., etc.)
Math part of ArXiv
Journals from ICM (Bibliotheca Wirtualna Matematyki)
http://minidml.mathdoc.fr

MISCELLANEOUS

The Automatic Statistician project
About the Automatic Statistician project: Making sense of data is one of the great challenges of the information age we live in. While it is becoming easier to collect and store all kinds of data, from personal medical data to scientific data, public data, and commercial data, there are relatively few people trained in the statistical and machine learning methods required to test hypotheses, make predictions, and otherwise create interpretable knowledge from this data.
The Automatic Statistician project aims to build an artificial intelligence for data science, helping people make sense of their data. The current version of the Automatic Statistician is a system which explores an open-ended space of possible statistical models to discover a good explanation of the data, and then produces a detailed report with figures and natural-language text. While at Cambridge, James Lloyd, David Duvenaud and Zoubin Ghahramani, in collaboration with Roger Grosse and Joshua Tenenbaum at MIT, developed an early version of this system which not only automatically produces a 10-15 page report describing patterns discovered in data, but returns a statistical model with state-of-the-art extrapolation performance evaluated over real time series data sets from various domains. The system is based on reasoning over an open-ended language of nonparametric models using Bayesian inference.
Kevin P. Murphy, Senior Research Scientist at Google, says: "In recent years, machine learning has made tremendous progress in developing models that can accurately predict future data. However, there are still several obstacles in the way of its more widespread use in the data sciences. The first problem is that current Machine Learning (ML) methods still require considerable human expertise in devising appropriate features and models. The second problem is that the output of current methods, while accurate, is often hard to understand, which makes it hard to trust. The 'automatic statistician' project from Cambridge aims to address both problems, by using Bayesian model selection strategies to automatically choose good models / features, and to interpret the resulting fit in easy-to-understand ways, in terms of human readable, automatically generated reports. This is a very promising direction for ML research, which is likely to find many applications at Google and beyond."
The project has only just begun but we're excited for its future.
Check out our example analyses to get a feel for what our work is about.
http://www.automaticstatistician.com/about.php

A selection of YouTube's featured channels
These channels were generated automatically by YouTube's video discovery system. Channels auto-generated by YouTube are created by algorithms to collect trending and popular videos by topic. Auto-generated channels act like user channels in that you can subscribe to them and stay updated on new videos.
Machine Learning - Topic
https://www.youtube.com/channel/UCZsleorsr6rdZGfj1uwII2g?spfreload=10
Cluster analysis - Topic
https://www.youtube.com/channel/UCnw0vt f06vjKe4v0WzMXWQ?spfreload=10
Regression analysis - Topic
https://www.youtube.com/channel/UCOJOKlW_JtDxuIgOBRtO2sg?spfreload=10
Principal component analysis - Topic
https://www.youtube.com/channel/UC8pfZWXk12lvZMCde04 4Og?spfreload=10
Support vector machine - Topic
https://www.youtube.com/channel/UCzBnlLuYsm-f16t9RUClCaQ?spfreload=10
Artificial neural network - Topic
https://www.youtube.com/channel/UC9Mv-haow40iOxAduUUMrwA?spfreload=10
Bayes' theorem - Topic
https://www.youtube.com/channel/UCh8Hk7vMxhTSOAY2nUtkKNg?spfreload=10
Genetic algorithm - Topic
https://www.youtube.com/channel/UC3ykWul05jzoN3nbdJv9Ghg?spfreload=10
Data Mining - Topic
https://www.youtube.com/channel/UCO gxCgk006uEqVTNhrV-sw?spfreload=10
Statistical classification - Topic
https://www.youtube.com/channel/UCV7kd6QmwVf6J01y9f6-duA?spfreload=10
Computer vision - Topic
https://www.youtube.com/channel/UCWA02whr4rPRQwrat0NlR2Q?spfreload=10

Introduction To Modern Brain-Computer Interface Design by Swartz Center for Computational Neuroscience
This is an online course on Brain-Computer Interface (BCI) design with a focus on modern methods. The lectures were first given by Christian Kothe (SCCN/UCSD) in 2012 at the University of Osnabrueck within the Cognitive Science curriculum and have now been recorded in the form of an open online course.
The course includes basics of EEG, BCI, signal processing, machine learning, and also contains tutorials on using BCILAB and the lab streaming layer software.
http://sccn.ucsd.edu/wiki/Introduction_To_Modern_Brain-Computer_Interface_Design

Distributed Computing Courses (lectures, exercises with solutions) by ETH Zurich, Group of Prof. Roger Wattenhofer
Mission: We are interested in both theory and practice of computer science and information technology. In our group we cultivate a large breadth of areas, reflecting our different backgrounds in computer science, mathematics, and electrical engineering. This gives us a unique blend of basic and applied research, proving mathematical theorems on the one hand, and building practical systems on the other. We currently study the following topics: distributed computing (computability, locality, complexity), distributed systems (Bitcoin), wireline networks (software-defined networks), wireless networks (media access theory and practice), social networks (influence), algorithms (online algorithms, game theory), learning theory (recommendation theory and practice).
We regularly publish in different communities: distributed computing (e.g. PODC, SPAA, DISC), networking (e.g. SIGCOMM, MobiCom, SenSys), theory (e.g. STOC, FOCS, SODA, ICALP), and from time to time in areas such as machine learning or human-computer interaction. Members of our group have won several best paper awards at top conferences such as PODC, SPAA, DISC, MobiCom, or P2P. Roger Wattenhofer won the Prize for Innovations in Distributed Computing in 2012, for "extensive contributions to the study of distributed approximation". Some projects turned into startup companies, e.g. Wuala, StreamForge, BitSplitters. Several projects have been covered by popular media and blogs, e.g. Gizmodo, Lifehacker, New York Times, NZZ, PC World Magazine, Red Herring, or Technology Review.
Some of the software developed by our students is very popular: the music application Jukefox and the peer-to-peer client BitThief together have more than 1 million downloads. A branch of the United States FBI has requested to use a version of BitThief as a tool to uncover illegal activities. About half of the former PhD students are in academic positions; some others founded startup companies.
http://dcg.ethz.ch/courses.html

The wonderful and terrifying implications of computers that can learn | Jeremy Howard | TEDxBrussels
Published on 6 Dec 2014. This talk was given at a local TEDx event, produced independently of the TED Conferences. The extraordinary, wonderful, and terrifying implications of computers that can learn.
https://www.youtube.com/watch?v=xx310zM3tLs&spfreload=10

Partially Derivative, a podcast about data, data science, and awesomeness!
Partially Derivative is a show about data, data science, drinking, and awesomeness! We cover our top 10 data-related articles and blog posts from the past week, all in 30 minutes, or sometimes longer, depending on how much we've been drinking. The show is hosted by Jonathon Morgan, a startup CTO, and Dr. Chris Albon, a computational political scientist.
http://www.partiallyderivative.com

Class Central, MOOC Tracker: never miss a course
https://www.class-central.com

Beginning to Advanced University CS Courses: Awesome Courses
Introduction: There is a lot of hidden treasure lying within university pages scattered across the internet. This list is an attempt to bring to light those awesome courses which make their high-quality material, i.e. assignments, lectures, notes, readings and examinations, available online for free.
https://github.com/prakhar1989/awesome-courses

WIRED UK YouTube Channel
https://www.youtube.com/user/WiredVideoUK/videos?spfreload=10

AI at WIRED2014: The next big frontier is the mind and brain - Full WIRED2014 talk
"When we were kids, we felt like the space age was imminent," says Google machine learning expert Blaise Aguera y Arcas. "But in a funny way, the big frontier for our generation is the mind, the brain -- these inward spaces." The engineer, who was the architect of Bing Maps, was joined on stage at WIRED2014 by DeepMind Technologies founder Demis Hassabis and Ben Medlock, CTO of SwiftKey. WIRED2014 was the fourth annual event to bring the values of WIRED to life. Building on experience from groundbreaking previous events, WIRED2014 gathered pioneering speakers from around the world to stimulate debate, spread ideas and showcase the future in a multidisciplinary way.
https://www.youtube.com/watch?v=CUhflgWvvoo

Davos 2015 - A Brave New World
How will advances in artificial intelligence, smart sensors and social technology change our lives?
• Rodney Brooks, Founder, Chairman and Chief Technical Officer, Rethink Robotics, USA; Technology Pioneer
• Anthony Goldbloom, Founder and Chief Executive Officer, Kaggle, USA; Technology Pioneer
• Hiroaki Nakanishi, Chairman and Chief Executive Officer, Hitachi, Japan
• Kenneth Roth, Executive Director, Human Rights Watch, USA
• Stuart Russell, Professor, University of California, Berkeley, USA; Global Agenda Council on Artificial Intelligence & Robotics
Moderated by
• Hiroko Kuniya, Anchor and Presenter, Today's Close-Up, NHK (Japan Broadcasting Corporation), Japan; Global Agenda Council on Japan
https://www.youtube.com/watch?v=wGLJXO08IYo&spfreload=10

World Economic Forum
http://www.weforum.org
The World Economic Forum is an international institution committed to improving the state of the world through public-private cooperation in the spirit of global citizenship.
It engages with business, political, academic and other leaders of society to shape global, regional and industry agendas. Incorporated as a not-for-profit foundation in 1971 and headquartered in Geneva, Switzerland, the Forum is independent, impartial and not tied to any interests. It cooperates closely with all leading international organizations. Best known for its Annual Meeting in Davos, Switzerland, the World Economic Forum also publishes benchmark global reports on Competitiveness, Gender, and Risk.
https://www.youtube.com/user/WorldEconomicForum/search?query=machine+learning

The Global Gender Gap Report
The Global Gender Gap Report, published by the World Economic Forum, provides a framework for capturing the magnitude and scope of gender-based disparities around the world.
https://www.youtube.com/channel/UCw-kH-Od73XDAt7qtH9uBYA?spfreload=10

Technology Pioneer 2014⎪Anthony Goldbloom⎪Kaggle
https://www.youtube.com/watch?v=OShGuf7QeJY&spfreload=10

IdeasLab 2014 - Emma Brunskill - Closing the Skills Gap with Machine Learning
https://www.youtube.com/watch?v=oZVSp1YS4jQ

IdeasLab 2014 - Ian Goldin - The Future of Machine Intelligence
https://www.youtube.com/watch?v=0fWYnv2gUWI&spfreload=10

IdeasLab 2014 - Michael Altendorf - The Truth of Machine Learning
https://www.youtube.com/watch?v=JJBb-78gofY&spfreload=10

The LINCS project
LINCS aims to create a network-based understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents, and by using computational tools to integrate this diverse information into a comprehensive view of normal and disease states that can be applied for the development of new biomarkers and therapeutics.
By generating and making public data that indicate how cells respond to various genetic and environmental stressors, the LINCS project will help us gain a more detailed understanding of cell pathways and aid efforts to develop therapies that might restore perturbed pathways and networks to their normal states. This website is a source of information for the research community and general public about the LINCS project. It contains information about the experiments conducted, as well as links to participating LINCS centers' websites, data releases from LINCS centers, and tools that can be used for analyzing the data.
http://www.lincsproject.org

Australian Academy of Science
Official YouTube channel of the Australian Academy of Science, an independent organisation representing Australia's leading scientists. It recognises excellence, advises government and promotes science education and public awareness of science.
https://www.youtube.com/user/ScienceAcademyAu/videos?spfreload=10

Artificial intelligence: Machines on the rise
About the talk: Speaking, natural-sounding machines which can interact with humans using normal conversational patterns are still in the realm of science fiction - or are they? Associate Professor James Curran is developing artificial intelligence which will revolutionise the way we interact with technology, using spoken language, the same way we interact with each other. Using computational linguistics, an area of artificial intelligence, he's building computer systems that can understand and communicate with us in our own natural languages. These systems will be able to navigate, manipulate and summarise knowledge, unlocking vast stores of language-based human knowledge on the web and beyond.
https://www.youtube.com/watch?v=HwdmesBcbaw&spfreload=10

Bill Gates Q&A on Reddit
Hi Reddit, I'm Bill Gates and I'm back for my third AMA. Ask me anything.
https://www.reddit.com/r/IAmA/comments/2tzjp7/hi_reddit_im_bill_gates_and_im_back_for_my_third/

The Guardian: Artificial intelligence will become strong enough to be a concern, says Bill Gates
Former Microsoft boss joins Elon Musk and Stephen Hawking in suggesting that the march of AI could be an existential threat to humans.
http://www.theguardian.com/technology/2015/jan/29/artificial-intelligence-strong-concern-bill-gates

Second prize went to Yarin Gal for his extrapolated art image, Cambridge University Engineering Photo Competition
The PhD student extended Van Gogh's Starry Night using algorithms to see what might have happened if the artist had carried on painting.
http://www.telegraph.co.uk/technology/11228471/In-Pictures-Cambridge-University-Engineering-Photo-Competition-Winners.html?frame=3105270

Draw from a Deep Gaussian Process by David Duvenaud, Cambridge University Engineering Photo Competition
http://www.telegraph.co.uk/technology/11228471/In-Pictures-Cambridge-University-Engineering-Photo-Competition-Winners.html?frame=3105356

MOOC, Opencourseware in Spanish
I hope to find resources soon. Any suggestion is welcome! Thanks in advance! Jacqueline

MOOC, Opencourseware in German
I hope to find resources soon. Any suggestion is welcome! Thanks in advance! Jacqueline

MOOC, Opencourseware in Italian
I hope to find resources soon. Any suggestion is welcome! Thanks in advance! Jacqueline

MOOC, Opencourseware in French
France Universite Numerique (FUN)
https://www.france-universite-numerique-mooc.fr
Unlike its Anglo-Saxon counterparts, FUN does not give access to its course archives, so some of the course links below may quickly become obsolete. Sorry about this problem. Jacqueline

FUN: MinesTelecom: 04006 Fondamentaux pour le Big Data
This MOOC is aimed at an audience with a grounding in mathematics and algorithmics (validated L2 level) who need to refresh that knowledge in order to follow programmes in data science and big data.
It can be taken in preparation for the Mastère Spécialisé "Big data : Gestion et analyse des données massives", the Certificat d'Etudes Spécialisées "Data Scientist", and the short course "Data Science : Introduction au Machine Learning".
https://www.france-universite-numerique-mooc.fr/courses/MinesTelecom/04006/Trimestre_1_2015/about

University of Laval (French Canadian): open access to the course material
Apprentissage automatique (Machine Learning)
Machine learning from data and supervised learning. Empirical risk minimization and structural risk minimization. Methods for estimating the true risk from data, and confidence intervals. Linear and non-linear classifiers. The dual form of the perceptron algorithm. Mercer kernels. Large-margin classifiers. Hard-margin and soft-margin SVMs. Probably approximately correct (PAC) learning and the Vapnik-Chervonenkis theory of classifier prediction error. Sample-compression learning and its applications to SCMs and perceptrons.
https://cours.ift.ulaval.ca/2009a/ift7002_81602/

Théorie algorithmique des graphes (Algorithmic Graph Theory)
This course covers topics such as connectivity in a graph (maximum flow, min-max duality, perfect matching, etc.), planarity of a graph (Euler's formula, Kuratowski's theorem, the dual graph), graph coloring (integer and fractional colorings of vertices or edges, Kneser graphs), transversal problems in a graph (Eulerian walks, Hamiltonian cycles, De Bruijn graphs, etc.), and random walks on a graph (Markov chains, existence of the limit distribution, mixing time, etc.). Many graph problems have elegant solutions; others, of course, are NP-complete, so part of the course covers complexity theory (NP and NP-complete problems, Cook's theorem, reduction algorithms).
https://cours.ift.ulaval.ca/2012a/ift7012_89927/

Hugo Larochelle, Apprentissage automatique (Machine Learning), French Canadian
I am interested in machine learning algorithms, i.e. algorithms capable of extracting concepts or patterns from data. My work concentrates on developing connectionist and probabilistic approaches to various problems in artificial intelligence, such as computer vision and natural language processing. My research interests include:
Problems: supervised, semi-supervised and unsupervised learning, structured output prediction, ranking, density estimation;
Models: deep neural networks ("deep learning"), autoencoders, Boltzmann machines, Markov random fields;
Applications: object recognition and tracking, document classification and ranking.
https://www.youtube.com/user/hugolarochelle?spfreload=10
http://www.dmi.usherb.ca/~larocheh/index_fr.html
The Machine Learning Salon rarely gives its opinion, but Hugo Larochelle's videos are truly excellent! Congratulations and many thanks to Hugo Larochelle!
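The Université Laval syllabus above mentions the perceptron algorithm and its dual form. As a quick, hedged illustration (the toy data and function names below are invented for this kit, not taken from the course), here is a minimal pure-Python sketch of the classic mistake-driven perceptron update:

```python
# Minimal perceptron sketch. The toy data and function names are invented
# for illustration; they do not come from the Laval course material.

def perceptron_train(points, labels, epochs=20):
    """Classic mistake-driven perceptron. points: (x1, x2) pairs; labels: +1/-1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            # Update w and b only when the current point is misclassified.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
    return w, b

def perceptron_predict(w, b, point):
    x1, x2 = point
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

# A hypothetical linearly separable data set: +1 upper-right, -1 lower-left.
pts = [(2.0, 2.0), (3.0, 1.5), (-1.0, -2.0), (-2.0, -1.0)]
ys = [1, 1, -1, -1]
w, b = perceptron_train(pts, ys)
```

The dual form taught in such courses keeps a count of mistakes per training point instead of an explicit weight vector, which is what makes kernelization with Mercer kernels possible.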
Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French)
Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Spring 2012: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2011: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2011: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2010: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2010: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2009: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2008: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2008: Probabilistic modelling and graphical models - Enseignement Specialise - Ecole des Mines de Paris
Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2007: Probabilistic modelling and graphical models - Enseignement Specialise - Ecole des Mines de Paris
Fall 2006: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2005: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
http://www.di.ens.fr/~fbach/

College de France, Mathematics and Digital Science, French
One of the Collège de France's missions is to promote French research and thought abroad, and to participate in intellectual debates on major world issues. The institution therefore participates in international exchange through its teaching and the dissemination of knowledge, as well as through the research programmes involving its Chairs and laboratories. The fact that one fifth of the professors currently come from abroad confirms the Collège de France's widening research and education policy. This policy of international openness translates into:
• Collège de France professors' teaching missions abroad
• Lectures and lecture series by visiting professors
• Junior Visiting Researchers scheme
• Lecture series and symposia abroad
• Internet broadcasts
http://www.college-de-france.fr/site/alain-connes/index.htm

Le Laboratoire de Recherche en Informatique (LRI)
The Laboratoire de Recherche en Informatique (LRI) is a joint research unit (UMR8623) of the Université Paris-Sud and the CNRS. The laboratory's research themes cover a broad spectrum of mostly software-oriented computer science and include both fundamental and applied aspects: algorithmics, combinatorics, graphs, discrete and continuous optimization, programming, software engineering, verification and proofs, parallelism, high-performance computing, grids, architecture and compilation, networks, databases, knowledge representation and processing, machine learning, data mining, bioinformatics, human-computer interaction, etc.
This diversity is one of the laboratory's strengths, because it fosters research at the frontiers between fields, where the potential for innovation is greatest.
https://www.lri.fr

MOOC, Opencourseware in Russian
Russian Machine Learning Resources (translated from Russian)
A professional information and analytics resource dedicated to machine learning, pattern recognition and data mining. As of 16-07-2014 the resource contained 831 articles in Russian. Topic areas include: classification; pattern recognition; regression analysis; image analysis and understanding; prediction; text processing and analysis; applied statistics; applied systems analysis; signal and data processing; and all other areas.
http://www.machinelearning.ru/wiki/index.php?title=Заглавная_страница

The Yandex School of Data Analysis
The School of Data Analysis is a free Master's-level program in Computer Science and Data Analysis, offered by Yandex since 2007 to graduates in engineering, mathematics, computer science or related fields. The aim of the School is to train specialists in data analysis and information retrieval for further employment at Yandex or any other IT company. … The School's courses are taught by Russian and international experts at Yandex's Moscow office in the evenings, several times a week. The average study load is 15-20 hours per week, including 9-12 hours of lectures and seminars. The School also runs distance-learning courses and provides lectures over the internet. All courses at the Yandex School of Data Analysis are currently taught only in Russian.
http://shad.yandex.ru/lectures/

Alexander D'yakonov Resources
http://alexanderdyakonov.narod.ru/index.htm
What They Don't Teach in Data Analysis and Machine Learning (2013) - Чему не учат в анализе данных и машинном обучении
http://alexanderdyakonov.narod.ru/lpot4emu.pdf
Introduction to Data Mining (2012) - Введение в анализ данных
http://alexanderdyakonov.narod.ru/intro2datamining.pdf
Tricks in Data Mining (2011) - Шаманство в анализе данных
http://alexanderdyakonov.narod.ru/lpotdyakonov.pdf

MOOC, Opencourseware in Japanese
I hope to find resources soon. Any suggestion is welcome! Thanks in advance! Jacqueline

MOOC, Opencourseware in Chinese
Yeeyan Coursera Chinese Classroom (translated from Chinese)
Welcome to the Yeeyan × Coursera Chinese Classroom. In this classroom you will always have companions: you can join collaborative translations, exchange ideas, sign up to become a class representative, check in to keep each other motivated, and more. Finally, feel free to show off your certificates, whether a joint Yeeyan × Coursera translator's certificate or a Coursera course certificate: you are a winner in your own life!
http://coursera.yeeyan.org

Hong Kong Open Source Conference 2013
Wang Leung Wong, Vice-Chairperson of the Hong Kong Linux User Group
This channel will post videos of my life and of open source events in Hong Kong.
Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f

Guokr.com
Machine Learning: http://mooc.guokr.com/search/?wd=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
Data Mining: http://mooc.guokr.com/search/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98
Artificial Intelligence: http://mooc.guokr.com/search/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD

MOOC, Opencourseware in Portuguese
Aprendizado de Maquina (Machine Learning) by Bianca Zadrozni, Instituto de Computação, UFF, 2010
http://www2.ic.uff.br/~bianca/aa/
Algoritmo de Aprendizado de Máquina (Machine Learning Algorithms) by Aurora Trinidad Ramirez Pozo, Universidade Federal do Paraná, UFPR
http://www.inf.ufpr.br/aurora/tutoriais/aprendizadomaq/
http://www.inf.ufpr.br/aurora/tutoriais/arvoresdecisao/
http://www.inf.ufpr.br/aurora/tutoriais/Ceapostila.pdf
http://www.inf.ufpr.br/aurora/
Digital Library, Universidade de São Paulo
http://www.teses.usp.br (search for "machine learning")

MOOC, Opencourseware in Hebrew
Open University of Israel (translated from Hebrew)
The Open University is unique in Israel's academic landscape. It resembles the other universities in its pursuit of excellence and its commitment to high scholarly and scientific quality, but it differs from them in its organizational structure, its teaching methods, the design of its study programmes, and its requirements of candidates applying to its courses. The Open University is open, as its name says: it opens its gates, with no preconditions and no prerequisites, both to those who wish to study individual courses or clusters of courses and to those who want to pursue a full programme of study towards a bachelor's degree.
http://www.youtube.com/user/openofek/search?query=machine+learning

Homeworks, Assignments & Solutions
CS229 Stanford Machine Learning
List of projects (free access to abstracts), 2013 and previous years
http://cs229.stanford.edu/projects2013.html
http://cs229.stanford.edu

CS229 Stanford Machine Learning by Andrew Ng, Autumn 2014
Some Exercises & Solutions collected from CS229's link:
http://cs229.stanford.edu/materials/ps1.pdf
http://cs229.stanford.edu/materials/ps1sol.pdf
http://cs229.stanford.edu/materials/ps2.pdf
http://cs229.stanford.edu/materials/ps2sol.pdf
http://cs229.stanford.edu/materials/ps3.pdf
http://cs229.stanford.edu/materials/ps3sol.pdf
http://cs229.stanford.edu/materials/midterm-2010-solutions.pdf
http://cs229.stanford.edu/materials/midterm_aut2014.pdf

CS 445/545 Machine Learning by Melanie Mitchell, Winter Quarter 2014
Some Exercises & Solutions
http://web.cecs.pdx.edu/~mm/MachineLearningWinter2014/
Top Writing Errors by Melanie Mitchell
http://web.cecs.pdx.edu/~mm/TopWritingErrors.pdf

Introduction to Machine Learning, Machine Learning Lab, University of Freiburg, Germany
http://ml.informatik.uni-freiburg.de/teaching/ss14/ml
http://ml.informatik.uni-freiburg.de/_media/teaching/ss14/ml/sheet01.pdf
http://ml.informatik.uni-freiburg.de/_media/teaching/ss14/sheet01_solution.pdf

Unsupervised Feature Learning and Deep Learning by Andrew Ng, 2011
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=ufldl&doc=exercises/ex1/ex1.html

Machine Learning by Andrew Ng, 2011
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex2/ex2.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex3/ex3.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex4/ex4.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex5/ex5.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex6/ex6.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex7/ex7.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex8/ex8.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex9/ex9.html

Pattern Recognition and Machine Learning, Solutions to Exercises, by Markus Svensen and Christopher Bishop, 2009
http://research.microsoft.com/en-us/um/people/cmbishop/prml/pdf/prml-websol-2009-09-08.pdf

Machine Learning Course by Aude Billard, Exercises & Solutions, EPFL, Switzerland
Overview and objective: The aim of machine learning is to extract knowledge from data. The algorithm may be informed by incorporating prior knowledge of the task at hand. The amount of information varies from fully supervised to unsupervised or semi-supervised learning. This course will present some of the core advanced methods in the field for structure discovery, classification and non-linear regression. This is an advanced class in Machine Learning; hence, students are expected to have some background in the field. The class will be accompanied by practical sessions on computers, using the MLDemos software (http://mldemos.epfl.ch), which encompasses more than 30 state-of-the-art algorithms.
http://lasa.epfl.ch/teaching/lectures/ML_Phd/

T-61.3025 Principles of Pattern Recognition
Weekly Exercises with Solutions (in English), Aalto University, Finland, 2015
https://noppa.aalto.fi/noppa/kurssi/t-61.3025/viikkoharjoitukset

T-61.3050 Machine Learning: Basic Principles
Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
https://noppa.aalto.fi/noppa/kurssi/t-61.3050/viikkoharjoitukset
http://www.aalto.fi/en/

CSE-E5430 Scalable Cloud Computing
Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
https://noppa.aalto.fi/noppa/kurssi/cse-e5430/viikkoharjoitukset

Weekly Exercises with Solutions (in English) from Aalto University, Finland - TO EXPLORE, not to be missed!
https://noppa.aalto.fi/noppa/kurssit/sci/t3060

SurfStat Australia: an online text in introductory Statistics
http://surfstat.anu.edu.au/surfstat-home/surfstat-main.html
Exercises & Solutions: http://surfstat.anu.edu.au/surfstat-home/exercises.html

Learning from Data by Amos Storkey, Tutorials & Worksheets (with solutions), University of Edinburgh, Fall 2014
This is a course on basic data analysis, statistical model building and machine learning. The course aims to provide a set of tools that I hope you will find very useful, coupled with a principled approach to formulating solutions to problems in machine learning.
http://www.inf.ed.ac.uk/teaching/courses/lfd/lfdtutorials.html

Web Search and Mining by Christopher Manning and Prabhakar Raghavan, Winter 2005
Slides, Exercises & Solutions
http://web.stanford.edu/class/cs276b/
http://web.stanford.edu/class/cs276b/syllabus.html

Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Spring 2014
This course will provide an introduction to the theoretical analysis of prediction methods, focusing on statistical and computational aspects.
It will cover approaches such as kernel methods and boosting algorithms, and probabilistic and game-theoretic formulations of prediction problems, and it will focus on tools for the theoretical analysis of the performance of learning algorithms and the inherent difficulty of learning problems.
http://www.stat.berkeley.edu/~bartlett/courses/2014spring-cs281bstat241b/

Introduction to Time Series by Peter Bartlett, Berkeley, Homework & solutions, Fall 2010
An introduction to time series analysis in the time domain and frequency domain. Topics will include: stationarity, autocorrelation functions, autoregressive moving average models, partial autocorrelation functions, forecasting, seasonal ARIMA models, power spectra, the discrete Fourier transform, parametric spectral estimation, and nonparametric spectral estimation.
http://www.stat.berkeley.edu/~bartlett/courses/153-fall2010/index.html

Introduction to Machine Learning by Stuart Russell, CS 194-10, Fall 2011, Assignments & Solutions
The course will be a mixture of theory, algorithms, and hands-on projects with real data. The goal is to enable students to understand and use machine learning methods across a wide range of settings.
http://www.eecs.berkeley.edu/~russell/classes/cs194/f11/

Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Fall 2009
This course will provide an introduction to probabilistic and computational methods for the statistical modeling of complex, multivariate data. It will concentrate on graphical models, a flexible and powerful approach to capturing statistical dependencies in complex, multivariate data. In particular, the course will focus on the key theoretical and methodological issues of representation, estimation, and inference.
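Bartlett's time-series course above lists autoregressive moving average models and autocorrelation functions among its topics. As a small taste (toy parameters chosen for this kit, not course code), the simplest case is an AR(1) process, which can be simulated with the standard library alone; for a long series, the lag-1 sample autocorrelation should come out close to the AR coefficient:

```python
# Sketch: simulate an AR(1) process x_t = phi * x_{t-1} + eps_t and estimate
# its lag-1 autocorrelation, which should be close to phi for a long series.
# Illustrative only; a real analysis would use a statistics package such as R.
import random

def simulate_ar1(phi, n, seed=0):
    rng = random.Random(seed)          # fixed seed for reproducibility
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series

def lag1_autocorr(series):
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return cov / var

xs = simulate_ar1(phi=0.8, n=5000)
r1 = lag1_autocorr(xs)                 # expected to be roughly 0.8
```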
http://www.cs.berkeley.edu/~bartlett/courses/2009fall-cs281a/

Advanced Topics in Machine Learning by Arthur Gretton, 2015, University College London (exercises with solutions)
http://www.gatsby.ucl.ac.uk/~gretton/coursefiles/rkhscourse.html

Reinforcement Learning by David Silver, 2015, University College London (exercises with solutions)
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html

Emmanuel Candes Lectures, Homeworks & Solutions, Stanford University (great resources, not to be missed!)
http://statweb.stanford.edu/~candes/teaching.html

Advanced Topics in Convex Optimization by Emmanuel Candes, Handouts, Homeworks & Solutions, Winter 2015, Stanford University
Description: The main goal of this course is to expose students to modern and fundamental developments in convex optimization, a subject which has experienced tremendous growth in the last 20 years or so. This course builds on EE 364 and explores two distinct areas. The first concerns cone programming, and especially semidefinite programming, whose rich geometric theory and expressive power make it suitable for a wide spectrum of important optimization problems arising in engineering and applied science. The second concerns novel and efficient first-order methods, e.g. Nesterov's method, for smooth and nonsmooth convex optimization, which are suitable for large-scale problems. This is an advanced topics course, which will hopefully bring students near the frontier of current research.
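The convex optimization course above highlights first-order methods such as Nesterov's method. As a rough, stdlib-only illustration (the toy diagonal quadratic, step size 1/L, and iteration count are all assumptions made for this example, not course material), here is a sketch of Nesterov's accelerated gradient method:

```python
# Sketch of Nesterov's accelerated gradient method on a toy diagonal
# quadratic f(x) = 0.5 * sum(a_i * x_i^2), whose minimizer is the origin.
# Step size 1/L with L = max(a), the Lipschitz constant of the gradient.

def grad(x, a):
    # Gradient of the diagonal quadratic at x.
    return [ai * xi for ai, xi in zip(a, x)]

def nesterov(a, x0, iters=200):
    L = max(a)
    x, y, t = list(x0), list(x0), 1.0
    for _ in range(iters):
        g = grad(y, a)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]      # gradient step at y
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0   # momentum schedule
        # Look-ahead point combining the last two iterates.
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

a = [1.0, 10.0, 100.0]                  # an ill-conditioned toy problem
x = nesterov(a, [1.0, 1.0, 1.0])
f_val = 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))    # should be near 0
```

The momentum schedule gives the accelerated O(1/k²) convergence rate on smooth convex problems, versus O(1/k) for plain gradient descent.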
http://statweb.stanford.edu/~candes/math301/index.html

MSM 4M13 Multicriteria Decision Making by Sándor Zoltán Németh, School of Mathematics, University of Birmingham
Slides, Handouts, Problems: http://web.mat.bham.ac.uk/S.Z.Nemeth/teaching.htm
Theorem Proofs, Exercises & Solutions: http://web.mat.bham.ac.uk/S.Z.Nemeth/4m13

10-601 Machine Learning Spring 2015, Homeworks & Solutions & Code (Matlab)
http://www.cs.cmu.edu/%7Eninamf/courses/601sp15/homeworks.shtml

Introduction to Machine Learning by Alex Smola, CMU, Homeworks & Solutions
I work on machine learning and statistical data analysis. This includes application areas ranging from document analysis, bioinformatics and computer vision to the analysis of internet data. In my work I have supervised numerous PhD students and researchers, and I have written over 150 papers, written one book and edited 5 books. My specialties are kernel methods, such as Support Vector Machines and Gaussian Processes, and unsupervised information extraction. This includes highly scalable models which work on many TB of data and hundreds of millions of users. Specialties: Kernel Methods, User Profiling, Computational Advertising, Document Analysis, Bioinformatics, Statistical Modelling, Optimization
http://alex.smola.org/teaching/10-701-15/submission.html
https://www.youtube.com/user/smolix?spfreload=10

Applications

MIT Media Lab
The real-time city is now real! The increasing deployment of sensors and hand-held electronics in recent years is allowing a new approach to the study of the built environment. The way we describe and understand cities is being radically transformed, alongside the tools we use to design them and their impact on the cities' physical structure. Studying these changes from a critical point of view and anticipating them is the goal of the SENSEable City Laboratory, a new research initiative at the Massachusetts Institute of Technology.
http://senseable.mit.edu

TEDx San Francisco, Connected Reality
Connected Reality is an evening that explored how the exponential technologies of the Internet of Things will give us deep insights that augment our understanding of the world and each other, and will propel our ability to build intelligent tools that augment our lives. We'll briefly see the future through the eyes of presenters from industries as varied as medicine and manufacturing, who will illustrate how they use sensor data to perceive and understand the world differently and adjust their realities based on their new connectivity to their environment.
http://tedxsf.org/videos/#tedxsf-connected-reality

Emotion&Pain Project
One of the main challenges facing healthcare providers in the UK today (and in Europe) is the rising number of people with chronic health problems. Almost 1 in 7 UK citizens experiences chronic pain, some due to chronic diseases such as osteoarthritis, but much of it mechanical low back pain (LBP) with no treatable pathology. 40% of these people experience severe pain and are very restricted by it. The capacity of our current health care system is insufficient to treat all these patients face-to-face. Pain experience is affected by physical, psychological, and social factors, and hence it poses a problem to the medical profession. This has prompted the development of a multidisciplinary approach to the treatment of chronic LBP, primarily involving psychology and physiotherapy alongside specialist clinicians (see British Pain Society guidelines). These programmes enable patients to become more self-managing through improving their physical and psychological functioning. While short-term results are good, maintenance of these gains, and building on them, remains a problem, with psychological factors being one of the primary limiting causes.
Rehabilitation-assistive technologies have shown some success in helping recovery in a number of conditions but have yet to have an impact in pain management, mostly because of the complexity of dealing with the emotional and motivational aspects of self-directed activity increase. By providing the means to automatically recognise, interpret, and act upon human affective states, recent developments in sensing technology and the field of affective computing offer new avenues for addressing these limitations and alleviating the difficulties patients face in building on treatment gains. Thus we propose the design and development of an intelligent system that will enable ubiquitous monitoring and assessment of patients' pain-related mood and movements inside (and in the longer term, outside) the clinical environment. Specifically, we aim to (a) develop a set of methods for automatically recognising audiovisual cues related to pain, behavioural patterns typical of low back pain, and affective states influencing pain, and (b) integrate these methods into a system that will provide appropriate feedback and prompts to the patient based on his/her behaviour measured during self-directed physical therapy sessions. In doing so, we seek to develop a new generation of multimodal patient-centred personal health technology.
http://www.emo-pain.ac.uk

IBM Research Machine Learning Applications
Five innovations that will change our lives within five years
http://www.research.ibm.com/cognitive-computing/machine-learning-applications/index.shtml#fbid=Dp4uN7k8b2O

EPFL Ecole Polytechnique Fédérale de Lausanne
EPFL is one of the two Federal Institutes of Technology in Switzerland. Located along the shore of Lake Geneva, the university has more than 9,000 students in seven academic schools, including Life Sciences, Architecture, and Computer Science.
http://www.youtube.com/channel/UClMJeVIVyGp-3 kWtspkS0Q Visualizing MBTA Data: An interactive exploration of Boston's subway system Boston’s Massachusetts Bay Transit Authority (MBTA) operates the 4th busiest subway system in the U.S. after New York, Washington, and Chicago. … We attempt to present this information to help people in Boston better understand the trains, how people use the trains, and how the people and trains interact with each other. http://mbtaviz.github.io Commercial Applications (listed without any transfer of money) Google glass http://www.youtube.com/watch?v=D7TB8b2t3QE Google self-driving car http://www.youtube.com/watch?v=cdgQpa1pUUE SenseFly http://www.youtube.com/watch?v=NuZUSe87miY HOW MICROSOFT'S MACHINE LEARNING IS BREAKING THE GLOBAL LANGUAGE BARRIER Earlier this week, roughly 50,000 Skype users woke up to a new way of communicating over the Web-based phone- and video-calling platform, a feature that could’ve been pulled straight out of Star Trek. The new function, called Skype Translator, translates voice calls between different languages in real time, turning English to Spanish and Spanish back into English on the fly. Skype plans to incrementally add support for more than 40 languages, promising nothing short of a universal translator for desktops and mobile devices. The product of more than a decade of dedicated research and development by Microsoft Research (Microsoft acquired Skype in 2011), Skype Translator does what several other Silicon Valley icons, not to mention the U.S. Department of Defense, have not yet been able to do. To do so, Microsoft Research (MSR) had to solve some major machine learning problems while pushing technologies like deep neural networks into new territory. 
http://www.popsci.com/how-microsofts-machine-learning-breaking-language-barrier RESEARCH PAPERS, in English Cambridge University Publications page http://mlg.eng.cam.ac.uk/pub/ arXiv.org by Cornell University Library Open access to 999,848 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics http://arxiv.org Google Scholar Stand on the shoulders of giants. Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research. http://scholar.google.com/intl/en/scholar/about.html http://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=machine learning&before_author=m83-28PAAAJ&astart=0 Google Research Google publishes hundreds of research papers each year. Publishing is important to us; it enables us to collaborate and share ideas with, as well as learn from, the broader scientific community. Submissions are often made stronger by the fact that ideas have been tested through real product implementation by the time of publication. http://research.google.com/pubs/papers.html Yahoo Research The machine learning group is a team of experts in computer science, statistics, mathematical optimization, and automatic control. They focus on making computers learn abstractions, patterns, conditional probability distributions, and policies from web-scale data with the goal to improve the online experience for Yahoo! users, partner publishers, and advertisers. Machine learning has such a broad influence on the internet that it can be quite difficult to recognize. 
Machine learning’s benefits are often hidden: they are the spam emails you don’t see, the uninteresting news articles you don’t see, and the irrelevant search results you don’t see, just to name a few. Machine learning is one of the best technologies we have for solving some of the biggest problems on the Web. http://labs.yahoo.com/areas/?areas=machine-learning Microsoft Research The Machine Learning Groups of Microsoft Research include a set of researchers and developers who push the state of the art in machine learning. We span the space from proving theorems about the math underlying ML, to creating new ML systems and algorithms, to helping our partner product groups apply ML to large and complex data sets. http://research.microsoft.com/en-us/groups/mldept/ Journal from MIT Press The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. http://jmlr.org DROPS, Dagstuhl Research Online Publication Server Access to Research Papers http://drops.dagstuhl.de/opus/ OPEN SOURCE SOFTWARE, in English Weka 3: Data Mining Software in Java Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. 
http://www.cs.waikato.ac.nz/~ml/weka/index.html A deep-learning library for Java Distributed Deep Learning Platform for Java https://github.com/deeplearning4j/deeplearning4j List of Java ML Software by Machine Learning Mastery http://machinelearningmastery.com/java-machine-learning/ List of Java ML Software by MLOSS http://mloss.org/software/language/java/ MathFinder: Math API Discovery and Migration, Software Engineering and Analysis Lab (SEAL), IISc Bangalore MathFinder is an Eclipse plugin supported by a unit test mining backend for discovering and migrating math APIs. It is intended to make (re)implementing math algorithms in Java easier. Given a math expression (see the syntax below), it returns pseudo-code involving calls to suitable Java APIs. At present, it supports programming tasks that require use of matrix and linear algebra APIs. The underlying technique is however general and can be extended to support other math domains. http://www.iisc-seal.net/mathfinder Google Java Style http://google-styleguide.googlecode.com/svn/trunk/javaguide.html JSAT: java-statistical-analysis-tool by Edward Raff JSAT is a library for quickly getting started with Machine Learning problems. It is developed in my free time, and made available for use under the GPL 3. Part of the library is for self education, as such - all code is self contained. JSAT has no external dependencies, and is pure Java. I also aim to make the library suitably fast for small to medium size problems. As such, much of the code supports parallel execution. https://github.com/EdwardRaff/JSAT/tree/master Theano Library for Deep Learning, Python Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features: • tight integration with NumPy: use numpy.ndarray in Theano-compiled functions. 
• transparent use of a GPU: perform data-intensive calculations up to 140x faster than with a CPU (float32 only). • efficient symbolic differentiation: Theano does your derivatives for functions with one or many inputs. • speed and stability optimizations: get the right answer for log(1+x) even when x is really tiny. • dynamic C code generation: evaluate expressions faster. • extensive unit-testing and self-verification: detect and diagnose many types of mistake. Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). http://deeplearning.net/software/theano/ http://nbviewer.ipython.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb Theano and LSTM for Sentiment Analysis by Frederic Bastien, Universite de Montreal https://github.com/StartupML/Bastien-Theano-Workshop Introduction to Deep Learning with Python Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning with Python and the Theano library. The emphasis of the talk is on high performance computing, natural language processing using recurrent neural nets, and large scale learning with GPUs. https://www.youtube.com/watch?v=S75EdAcXHKk COURSERA: An Introduction to Interactive Programming in Python (Part 1) Part of the Fundamentals of Computing Specialization » About the Course This two-part course (part 2 is available here) is designed to help students with very little or no computing background learn the basics of building simple interactive applications. Our language of choice, Python, is an easy-to-learn, high-level computer language that is used in many of the computational courses offered on Coursera. To make learning Python easy, we have developed a new browser-based programming environment that makes developing interactive applications in Python simple. 
These applications will involve windows whose contents are graphical and respond to buttons, the keyboard and the mouse. The primary method for learning the course material will be to work through multiple "mini-projects" in Python. To make this class enjoyable, these projects will include building fun games such as Pong, Blackjack, and Asteroids. When you’ve finished our course, we can’t promise that you will be a professional programmer, but we think that you will learn a lot about programming in Python and have fun while you’re doing it. https://www.coursera.org/course/interactivepython1 COURSERA: An Introduction to Interactive Programming in Python (Part 2) Part of the Fundamentals of Computing Specialization » https://www.coursera.org/course/interactivepython2 COURSERA: Programming for Everybody (Python) About the Course This course is specifically designed to be a first programming course using the popular Python programming language. The pace of the course is designed to lead to mastery of each of the topics in the class. We will use simple data analysis as the programming exercises through the course. Understanding how to process data is valuable for everyone regardless of your career. This course might kindle an interest in more advanced programming courses or courses in web design and development, or it might simply provide skills for when you are faced with a bunch of data that you need to analyze. You can do the programming assignments for the class using a web browser or using your personal computer. All required software for the course is free. https://www.coursera.org/course/pythonlearn Udacity - Programming foundations with Python You’ll pick up some great tools for your programming toolkit in this course! You will: • Start coding in the programming language Python; • Reuse and share code with Object Oriented Programming; • Create and share amazing, life-hacking projects! 
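The object-oriented code reuse the bullets above mention can be sketched in a few lines of plain Python (a generic illustration for this kit, not material taken from the course):

```python
# Generic sketch of code reuse through Object Oriented Programming
# (illustrative only; names here are invented for the example).
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"

class Dog(Animal):
    """Reuses Animal's __init__ through inheritance and overrides speak."""
    def speak(self):
        return f"{self.name} says woof"

print(Animal("Cat").speak())  # Cat makes a sound
print(Dog("Rex").speak())     # Rex says woof
```

The subclass inherits the constructor unchanged and replaces only the behaviour it needs to, which is the "reuse and share code" idea in miniature.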
https://www.udacity.com/course/programming-foundations-with-python--ud036 Scikit-learn, Machine Learning in Python Simple and efficient tools for data mining and data analysis Accessible to everybody, and reusable in various contexts Built on NumPy, SciPy, and matplotlib Open source, commercially usable - BSD license http://scikit-learn.org/stable/index.html Pydata PyData is a gathering of users and developers of data analysis tools in Python. The goals are to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply our language and tools to ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization. https://www.youtube.com/user/PyDataTV/videos PyData NYC 2014 Videos https://www.youtube.com/user/PyDataTV/videos?spfreload=10 We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person. A major goal of the conference is to provide a venue for users across all the various domains of data analysis to share their experiences and their techniques, as well as highlight the triumphs and potential pitfalls of using Python for certain kinds of problems. http://pydata.org/nyc2014/about/about/ PyData, The Complete Works by Rohit Sivaprasad The unofficial index of all PyData talks. This was initially going to be a pickled pandas DataFrame object, but then I decided against it. So here it is - in beautiful Github flavored markdown. There are placeholders for links to the video. 
Currently, the hyperlinks point to the pydata.org talk pages. Please do feel free to make it better by contributing to the repo. https://github.com/DataTau/datascience-anthology-pydata Anaconda Completely free enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing We want to ensure that Python, NumPy, SciPy, Pandas, IPython, Matplotlib, Numba, Blaze, Bokeh, and other great Python data analysis tools can be used everywhere. We want to make it easier for Python evangelists and teachers to promote the use of Python. We want to give back to the Python community that we love being a part of. https://store.continuum.io/cshop/anaconda/ IPython Interactive Computing IPython provides a rich architecture for interactive computing with: Powerful interactive shells (terminal and Qt-based). A browser-based notebook with support for code, rich text, mathematical expressions, inline plots and other rich media. Support for interactive data visualization and use of GUI toolkits. Flexible, embeddable interpreters to load into your own projects. Easy to use, high performance tools for parallel computing. http://ipython.org/ Scipy SciPy refers to several related but distinct entities: • The SciPy Stack, a collection of open source software for scientific computing in Python, and particularly a specified set of core packages. • The community of people who use and develop this stack. • Several conferences dedicated to scientific computing in Python - SciPy, EuroSciPy and SciPy.in. • The SciPy library, one component of the SciPy stack, providing many numerical routines. http://www.scipy.org/ Numpy NumPy is the fundamental package for scientific computing with Python. 
It contains among other things: • a powerful N-dimensional array object • sophisticated (broadcasting) functions • tools for integrating C/C++ and Fortran code • useful linear algebra, Fourier transform, and random number capabilities Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. http://www.numpy.org/ matplotlib matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (à la MATLAB® or Mathematica®), web application servers, and six graphical user interface toolkits. http://matplotlib.org/ pandas Python Data Analysis Library pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. http://pandas.pydata.org/ SymPy SymPy is a Python library for symbolic mathematics. http://www.sympy.org/en/index.html Orange Open source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Components for machine learning. Add-ons for bioinformatics and text mining. Packed with features for data analytics. http://orange.biolab.si/ Pythonic Perambulations: How to be a Bayesian in Python Below I'll explore three mature Python packages for performing Bayesian analysis via MCMC: emcee: the MCMC Hammer pymc: Bayesian Statistical Modeling in Python pystan: The Python Interface to Stan http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-inpython/ emcee emcee is an extensible, pure-Python implementation of Goodman & Weare's Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler. 
It's designed for Bayesian parameter estimation and it's really sweet! http://dan.iel.fm/emcee/current/ PyMC PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems. Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics. http://pymc-devs.github.io/pymc/ Pylearn2 Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza, Razvan Pascanu, James Bergstra, Frédéric Bastien, and Yoshua Bengio. "Pylearn2: a machine learning research library". arXiv preprint arXiv:1308.4214 (BibTeX) https://github.com/lisa-lab/pylearn2 PyCon US 2014 PyCon is the largest annual gathering for the community using and developing the open-source Python programming language. It is produced and underwritten by the Python Software Foundation, the 501(c)(3) nonprofit organization dedicated to advancing and promoting Python. Through PyCon, the PSF advances its mission of growing the international community of Python programmers. Because PyCon is backed by the non-profit PSF, we keep registration costs much lower than comparable technology conferences so that PyCon remains accessible to the widest group possible. The PSF also pays for the ongoing development of the software that runs PyCon and makes it available under a liberal open source license. 140 videos http://pyvideo.org/category/50/pycon-us-2014 https://www.youtube.com/user/PyCon2014/videos PyCon India 2012 https://www.youtube.com/playlist?list=PL6GW05BfqWIdWaV aP6kHJKFY0ybOOfoA PyCon India 2013 https://www.youtube.com/playlist?list=PL6GW05BfqWIdsaaV35jcHWPWTI-DAw6Yn Montreal Python Montréal-Python's mission is to promote the growth of a lively and dynamic community of users of the Python programming language and to promote the use of the language. 
Montréal-Python also aims to disseminate the local Python knowledge to build a stronger developer community. Montréal-Python promotes Free and Open Source Software, favors its adoption within the community, and collaborates with community players to achieve this goal. https://www.youtube.com/user/MontrealPython/videos http://montrealpython.org/en/ SciPy 2014 SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development. http://pyvideo.org/category/51/scipy-2014 PyLadies London Meetup resources PyLadies is an international mentorship group with a focus on helping more women and genderqueers become active participants and leaders in the Python open-source community. Our mission is to promote, educate and advance a diverse Python community through outreach, education, conferences, events, and social gatherings. PyLadies also aims to provide a friendly support network for women and genderqueers, and a bridge to the larger Python world. https://github.com/pyladieslondon/resources Python Tools for Machine Learning by CB Insights https://www.cbinsights.com/blog/python-tools-machine-learning/ Python Tutorials by Jessica McKellar I am a startup founder, software engineer, and open source developer living in San Francisco, California. I enjoy the Internet, networking, low-level systems engineering, relational databases, tinkering on electronics projects, and contributing to and helping other people contribute to open source software. "Be the change you wish to see in the world" may be clichéd, but what can I say, I believe in it. I am committed to applying my skills, in individual and collective efforts, to improve the world. 
Right now, this means I spend a lot of time volunteering, engaging technologists about education, and empowering effective people and initiatives in my capacity as a Director for the Python Software Foundation. http://web.mit.edu/jesstess/ INTRODUCTION TO PYTHON FOR DATA MINING http://nbviewer.ipython.org/github/Syrios12/learningwithdata/blob/master/Python For Data Mining.ipynb Python Scientific Lecture Notes Tutorial material on the scientific Python ecosystem, a quick introduction to central tools and techniques. The different chapters each correspond to a 1- to 2-hour course with increasing level of expertise, from beginner to expert. http://scipy-lectures.github.io/index.html# Notebook Gallery: Links to the best IPython and Jupyter Notebooks by ? What is this website? This website is a collection of links to IPython/Jupyter notebooks. Contrary to other galleries (such as the one on nbviewer and the wakari gallery), this collection is continuously updated with notebooks submitted by users. It also uses the twitter API to fetch new notebooks daily. Please note that this website neither contains nor hosts any notebooks; it only offers links to relevant notebooks. Why did you make this website? Have you seen the amazing stuff people are making with IPython/Jupyter notebooks? It will blow your mind! So I needed a place where I could find more of these amazing notebooks. For now it's a simple website that displays the latest and most viewed notebooks; however, in the future I would like it to have searching and categorization features. Can I say something? Sure! I'd love to hear some feedback. If it's an issue with the website feel free to open an issue here. You can also email me at f@bianp.net. http://nb.bianp.net/sort/views/ Google Python Style Guide http://google-styleguide.googlecode.com/svn/trunk/pyguide.html Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper The NLTK book is currently being updated for Python 3 and NLTK 3. 
This is work in progress; chapters that still need to be updated are indicated. The first edition of the book, published by O'Reilly, is available at http://nltk.org/book_1ed/. A second edition of the book is anticipated in early 2016. 0. Preface 1. Language Processing and Python 2. Accessing Text Corpora and Lexical Resources 3. Processing Raw Text 4. Writing Structured Programs 5. Categorizing and Tagging Words (minor fixes still required) 6. Learning to Classify Text 7. Extracting Information from Text 8. Analyzing Sentence Structure 9. Building Feature Based Grammars 10. Analyzing the Meaning of Sentences (minor fixes still required) 11. Managing Linguistic Data (minor fixes still required) 12. Afterword: Facing the Language Challenge Bibliography Term Index This book is made available under the terms of the Creative Commons Attribution Noncommercial No-Derivative-Works 3.0 US License. Please post any questions about the materials to the nltk-users mailing list. Please report any errors on the issue tracker. http://www.nltk.org/book/ PyBrain Library Welcome to PyBrain PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms. PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive "Backronym". http://pybrain.org/ Classifying MNIST dataset with Pybrain http://analyticsbot.ml/2015/02/classifying-mnist-dataset-pybrain/ OCTAVE GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. 
Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable. http://www.gnu.org/software/octave/ PMTK Toolbox by Matt Dunham, Kevin Murphy PMTK is a collection of Matlab/Octave functions, written by Matt Dunham, Kevin Murphy and various other people. The toolkit is primarily designed to accompany Kevin Murphy's textbook Machine learning: a probabilistic perspective, but can also be used independently of this book. The goal is to provide a unified conceptual and software framework encompassing machine learning, graphical models, and Bayesian statistics (hence the logo). (Some methods from frequentist statistics, such as cross validation, are also supported.) Since December 2011, the toolbox is in maintenance mode, meaning that bugs will be fixed, but no new features will be added (at least not by Kevin or Matt). PMTK supports a large variety of probabilistic models, including linear and logistic regression models (optionally with kernels), SVMs and Gaussian processes, directed and undirected graphical models, and various kinds of latent variable models (mixtures, PCA, HMMs, etc.). Several kinds of priors are supported, including Gaussian (L2 regularization), Laplace (L1 regularization), Dirichlet, etc. Many algorithms are supported, for both Bayesian inference (including dynamic programming, variational Bayes and MCMC) and MAP/ML estimation (including EM, conjugate and projected gradient methods, etc.) https://github.com/probml/pmtk3 Octave Tutorial by Paul Nissenson I was born, raised, and educated in Orange County, California. I figure that everyone around the world wants to come here, so why should I leave? I received my B.S. in Physics from the University of California, Irvine (UCI) in 2003. 
Not knowing what to do next, I decided to further my education at UCI by attending graduate school in Mechanical & Aerospace Engineering. My research focused on computer modeling of systems that are related to the atmosphere. I was fortunate to work under my very supportive advisor, Dr. Donald Dabdub, and work with a lot of good collaborators. During graduate school, I was a teaching assistant many times and found my true calling. Research had its ups and downs, but teaching was always fun for me. After I received my Ph.D. in 2009, I decided to follow my heart and pursue a faculty position at a primarily undergraduate university. After being a post-doctoral researcher at UCI for a couple years, I was hired as an Assistant Professor in the Mechanical Engineering Department at Cal Poly Pomona in Fall 2011. https://www.youtube.com/channel/UCr-6gDvh0atAFM4VuYq7PHw/videos?spfreload=10 http://www.cpp.edu/~pmnissenson/ JULIA Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, largely written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing. In addition, the Julia developer community is contributing a number of external packages through Julia’s built-in package manager at a rapid pace. IJulia, a collaboration between the IPython and Julia communities, provides a powerful browser-based graphical notebook interface to Julia. Julia programs are organized around multiple dispatch: by defining functions and overloading them for different combinations of argument types (which can also be user-defined), programs select the most specific method for the arguments at hand. 
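The dispatch-on-type idea described above can be loosely imitated in Python (the most common language in this kit) with the standard library's functools.singledispatch; it selects an implementation by the type of the first argument only, whereas Julia dispatches on all argument types, so treat this as a rough analogy rather than an equivalent:

```python
from functools import singledispatch

# Rough analogy to Julia's dispatch: pick an implementation based on
# the (first) argument's type. Function names here are invented.
@singledispatch
def describe(x):
    return "something else"

@describe.register
def _(x: int):
    return "an integer"

@describe.register
def _(x: list):
    return f"a list of {len(x)} items"

print(describe(3))        # an integer
print(describe([1, 2]))   # a list of 2 items
print(describe("hello"))  # something else
```

In Julia, the same effect is achieved simply by writing several methods of one function with different type signatures, over every argument rather than just the first.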
For a more in-depth discussion of the rationale and advantages of Julia over other systems, see the following highlights or read the introduction in the online manual. http://julialang.org/ Julia by example by Samuel Colvin http://samuelcolvin.github.io/JuliaByExample/ https://github.com/samuelcolvin The R PROJECT for Statistical Computing R is a language and environment for statistical computing and graphics… R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. https://www.r-project.org/ Coursera: R Programming Part of the Data Science Specialization » About the Course In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing which includes programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples. https://www.coursera.org/course/rprog R Graph Gallery The blog is a collection of script examples with example data and output plots. 
R produces excellent-quality graphs for data analysis, science and business presentations, publications and other purposes. Self-help codes and examples are provided. Enjoy nice graphs! http://rgraphgallery.blogspot.co.uk/2013/04/ploting-heatmap-in-map-using-maps.html Code School - R Course Learn the R programming language for data analysis and visualization. This software programming language is great for statistical computing and graphics. https://www.codeschool.com/courses/try-r Open Intro R Labs OpenIntro Labs promote the understanding and application of statistics through applied data analysis. The statistical software R is widely used, stable, and free. RStudio is a user-friendly interface for R. 
https://www.openintro.org/stat/labs.php R Tutorial • Hierarchical Linear Model • Bayesian Classification with Gaussian Process • Bayesian Inference Using OpenBUGS • Significance Test for Kendall's Tau-b • Support Vector Machine with GPU, Part II • Hierarchical Cluster Analysis http://www.r-tutor.com/ DataCamp R Course • Introduction to R • Data Analysis and Statistical Inference • Introduction to Computational Finance and Financial Econometrics • How to work with Quandl in R https://www.datacamp.com/courses R Bloggers R-Bloggers.com is a central hub (i.e., a blog aggregator) of content collected from bloggers who write about R (in English). The site will help R bloggers and users to connect and follow the “R blogosphere” (you can view a 7-minute talk, from useR2011, for more information about the R blogosphere). http://www.r-bloggers.com/ R-Project Package: caret: Classification and Regression Training Misc functions for training and plotting classification and regression models https://cran.r-project.org/web/packages/caret/index.html A Short Introduction to the caret Package by Max Kuhn The caret package (short for classification and regression training) contains functions to streamline the model training process for complex regression and classification problems. The package utilizes a number of R packages but tries not to load them all at package start-up. The package “suggests” field includes 26 packages. caret loads packages as needed and assumes that they are installed. https://cran.r-project.org/web/packages/caret/vignettes/caret.pdf R packages by Hadley Wickham Style guide Good coding style is like using correct punctuation. You can manage without it, but it sure makes things easier to read. As with styles of punctuation, there are many possible variations. The following guide describes the style that I use (in this book and elsewhere). It is based on Google’s R style guide, with a few tweaks. 
You don't have to use my style, but you really should use a consistent style. Good style is important because while your code only has one author, it'll usually have multiple readers. This is especially true when you're writing code with others. In that case, it's a good idea to agree on a common style up-front. Since no style is strictly better than another, working with others may mean that you'll need to sacrifice some preferred aspects of your style. http://r-pkgs.had.co.nz/style.html Google's R Style Guide http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml STAN Software Stan is a probabilistic programming language implementing full Bayesian statistical inference with
• MCMC sampling (NUTS, HMC)
• penalized maximum likelihood estimation via optimization (BFGS)
Stan is coded in C++. Stan is freedom-respecting, open-source software (new BSD core, GPLv3 interfaces) and runs on all major platforms (Linux, Mac, Windows). Interfaces Download and getting started instructions, organized by interface:
• RStan v2.5.0 (R)
• PyStan v2.5.0 (Python)
• CmdStan v2.5.0 (shell, command-line terminal)
• MatlabStan (MATLAB)
• Stan.jl (Julia)
http://mc-stan.org/ List of Machine Learning Open Source Software To support the open source software movement, JMLR MLOSS publishes contributions related to implementations of non-trivial machine learning algorithms, toolboxes or even languages for scientific computing. http://jmlr.org/mloss/ Google Prediction API Google's cloud-based machine learning tools can help analyze your data to add the following features to your applications: customer sentiment analysis, message routing decisions, document and email classification, recommendation systems, churn analysis, spam detection, upsell opportunity analysis, diagnostics, suspicious activity identification, and much more. Free Quota: Usage is free for the first six months, up to the following limits per Google Developers Console project.
This free quota applies even when billing is enabled, until the six-month expiration time. Usage limits:
• Predictions: 100 predictions/day
• Hosted model predictions: Hosted models have a usage limit of 100 predictions/day/user across all models.
• Training: 5MB trained/day
• Streaming updates: 100 streaming updates/day
• Lifetime cap: 20,000 predictions
Expiration: Free quota expires six months after activating Google Prediction for your project in the Google Developers Console. https://cloud.google.com/prediction/docs Reddit Reddit (/ˈrɛdɪt/), stylized as reddit, is an entertainment, social networking service and news website where registered community members can submit content, such as text posts or direct links. Registered users can then vote submissions "up" or "down" to organize the posts and determine their position on the site's pages. Content entries are organized by areas of interest called "subreddits". (source: Wikipedia) http://www.reddit.com/r/MachineLearning/ SHOGUN toolbox A large-scale machine learning toolbox. SHOGUN is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. http://www.shogun-toolbox.org/page/home/ Comparison between ML toolboxes https://docs.google.com/spreadsheets/d/1bclw5Nq2jwuOuqsBbwe9fjARkxcr50gWyklCL3r1P-4/edit#gid=0 Infer.NET, Microsoft Research Infer.NET is a framework for running Bayesian inference in graphical models. It can also be used for probabilistic programming as shown in this video. You can use Infer.NET to solve many different kinds of machine learning problems, from standard problems like classification or clustering through to customised solutions to domain-specific problems. Infer.NET has been used in a wide variety of domains including information retrieval, bioinformatics, epidemiology, vision, and many others.
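Infer.NET automates Bayesian inference over whole graphical models. As a toy illustration of the kind of posterior update it performs (a hand-rolled Beta-Binomial conjugate example in plain Python, not Infer.NET's actual .NET API):

```python
# Toy Bayesian inference: Beta-Binomial conjugate update for a coin's bias.
# Illustrative sketch only; Infer.NET handles far richer graphical models.

def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binary outcomes."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Start from a uniform Beta(1, 1) prior, then observe 7 heads and 3 tails.
alpha, beta = beta_binomial_update(1.0, 1.0, 7, 3)
print(beta_mean(alpha, beta))  # posterior mean = 8/12 ≈ 0.667
```

In a framework like Infer.NET you declare the model and let the engine derive such updates; the point here is only the prior-to-posterior mechanics.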
A new feature in Infer.NET 2.5 is Fun, a library that turns the simple succinct syntax of F# into a probabilistic modeling language for Bayesian machine learning. You can run your models with F# to compute synthetic data, and you can compile your models with the Infer.NET compiler for efficient inference. See the Infer.NET Fun website for additional information. http://research.microsoft.com/en-us/um/cambridge/projects/infernet/default.aspx F# Software Foundation F# is ideally suited to machine learning because of its efficient execution, succinct style, data access capabilities and scalability. F# has been successfully used by some of the most advanced machine learning teams in the world, including several groups at Microsoft Research. Try F# has some introductory machine learning algorithms. Further resources related to different aspects of machine learning are below. See also the Math and Statistics and Data Science sections for related material. http://fsharp.org/guides/machine-learning/index.html BigML Now free: unlimited tasks (up to 16MB/task) https://bigml.com/ BRML Toolbox in Matlab/Julia – David Barber Toolbox, University College London http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.Software SCILAB Scilab is free and open source software for numerical computation providing a powerful computing environment for engineering and scientific applications. Scilab includes hundreds of mathematical functions. It has a high-level programming language allowing access to advanced data structures and 2-D and 3-D graphical functions. http://www.scilab.org/en/scilab/about OverFeat and Torch7, CILVR Lab @ NYU OverFeat is an image recognizer and feature extractor built around a convolutional network. The OverFeat convolutional net was trained on the ImageNet 1K dataset. It participated in the ImageNet Large Scale Visual Recognition Challenge 2013 under the name "OverFeat NYU".
This release provides C/C++ code to run the network and output class probabilities or feature vectors. It also includes a webcam-based demo. Torch7 is an interactive development environment for machine learning and computer vision. It is an extension of the Lua language with a multidimensional numerical array library. Lua is a very simple, compact and efficient interpreter/compiler with a straightforward syntax. It is widely used as a scripting language in the computer game industry. Torch extends Lua with an extensive numerical library and various facilities for machine learning and computer vision. Torch has computational back-ends for multicore/multi-CPU machines (using Intel/AVX and OpenMP), NVidia GPUs (using CUDA), and ARM CPUs (using the Neon instruction set). Many research projects at the CILVR Lab are built with Torch. http://cilvr.nyu.edu/doku.php?id=code:start FAIR open sources deep-learning modules for Torch https://research.facebook.com/blog/879898285375829/fair-open-sources-deep-learning-modules-for-torch/ IPython kernel for Torch with visualization and plotting https://github.com/facebook/iTorch Deep Learning Lecture 9: Neural networks and modular design in Torch by Nando de Freitas, Oxford University https://www.youtube.com/watch?v=NUKp0c4xb8w&spfreload=10 https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/ Deep Learning Lecture 8: Modular back-propagation, logistic regression and Torch Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from Brendan Shillingford.
https://www.youtube.com/watch?v=-YRB0eFxeQA&spfreload=10 Machine Learning with Torch7: Defining your own Neural Net Module https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/practicals/practical4.pdf Lua Tutorial in 15 Minutes by Tyler Neylon http://tylerneylon.com/a/learn-lua/ http://www.lua.org/ Google: Punctuation, symbols & operators in search Use search operators to narrow down results Search operators are words that can be added to searches to help narrow down the results. Don't worry about memorizing every operator - you can also use the Advanced Search page to create these searches. Note: When you search using operators, don't add any spaces between the operator and your search terms. A search for site:nytimes.com will work, but site: nytimes.com will not. https://support.google.com/websearch/answer/2466433?hl=en A discovery thanks to datatau.com's user skadamat. Thanks! To search for a sequence of keywords on a website, just type "site:nameOfTheSite keywords" in Google search. Example: how to search "deep learning" on The Machine Learning Salon: https://www.google.co.uk/?gws_rd=ssl#q=site:machinelearningsalon.org+deep+learning&start=0 WolframAlpha Making the world's knowledge computable Wolfram Alpha introduces a fundamentally new way to get knowledge and answers: not by searching the web, but by doing dynamic computations based on a vast collection of built-in data, algorithms, and methods. http://www.wolframalpha.com/ http://www.wolframalpha.com/examples/?src=input Computation and the Future of Mathematics by Stephen Wolfram, Oxford's Podcast Duration: 0:51:50 Added: 15 Jan 2014 Stephen Wolfram, creator of Mathematica and Wolfram Alpha, gives a talk about the future of mathematics and computation. http://podcasts.ox.ac.uk/computation-and-future-mathematics Mloss.org Our goal is to support a community creating a comprehensive open source machine learning environment.
Ultimately, open source machine learning software should be able to compete with existing commercial closed source solutions. To this end, it is not enough to bring existing and freshly developed toolboxes and algorithmic implementations to people's attention. More importantly, the MLOSS platform will facilitate collaborations with the goal of creating a set of tools that work with one another. Far from requiring integration into a single package, we believe that this kind of interoperability can also be achieved in a collaborative manner, which is especially suited to open source software development practices. https://mloss.org/software/view/501/ Sourceforge Find, Create, and Publish Open Source Software for free http://sourceforge.net/directory/os:mac/freshness:recently-updated/?q=machine%20learning AForge.NET Framework AForge.NET is a C# framework designed for developers and researchers in the fields of Computer Vision and Artificial Intelligence - image processing, neural networks, genetic algorithms, machine learning, robotics, etc. http://www.aforgenet.com/ cuda-convnet High-performance C++/CUDA implementation of convolutional neural networks This is a fast C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks. It can model arbitrary layer connectivity and network depth. Any directed acyclic graph of layers will do. Training is done using the back-propagation algorithm. Fermi-generation GPU (GTX 4xx, GTX 5xx, or Tesla equivalent) required. https://code.google.com/p/cuda-convnet/ word2vec Tool for computing continuous distributed representations of words. This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
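The skip-gram architecture mentioned above trains word vectors to predict the words surrounding each center word. A minimal Python sketch of the data-preparation step, generating (center, context) pairs from a toy corpus (this is not the word2vec C tool itself, just the idea):

```python
# Generate skip-gram (center, context) training pairs from a token list.
# Illustrative sketch; the real word2vec tool is an optimized C program.

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

corpus = "the quick brown fox".split()
print(skipgram_pairs(corpus, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

word2vec then learns vectors so that a center word's embedding scores its true context words highly; the pair generation above is the shared first step of both the skip-gram and CBOW variants.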
https://code.google.com/p/word2vec/ Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer, and John Langford, August 2014 The goal of this workshop is to inform people about open source machine learning systems being developed, aid the coordination of such projects, and discuss future plans. http://hunch.net/~nyoml/ Maxim Milakov Software I am a researcher in machine learning and high-performance computing. I designed and implemented nnForge - a library for training convolutional and fully connected neural networks, with CPU and GPU (CUDA) backends. You will find my thoughts on convolutional neural networks and the results of applying convolutional ANNs to various classification tasks in the Blog. http://www.milakov.org/ Alfonso Nieto-Castanon Software http://www.alfnie.com/software Lib Skylark Sketching-based Matrix Computations for Machine Learning is a library for matrix computations suitable for general statistical data analysis and optimization applications. Many tasks in machine learning and statistics ultimately end up being problems involving matrices: whether you're finding the key players in the bitcoin market, or inferring where tweets came from, or figuring out what's in sewage, you'll want to have a toolkit for least-squares and robust regression, eigenvector analysis, non-negative matrix factorization, and other matrix computations. Sketching is a way to compress matrices that preserves key matrix properties; it can be used to speed up many matrix computations. Sketching takes a given matrix A and produces a sketch matrix B that has fewer rows and/or columns than A. For a good sketch B, if we solve a problem with input B, the solution will also be pretty good for input A. For some problems, sketches can also be used to get faster ways to find high-precision solutions to the original problem. In other cases, sketches can be used to summarize the data by identifying the most important rows or columns.
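As a toy illustration of the sketching idea described above, here is a Gaussian random projection in plain Python (a hypothetical hand-rolled sketch, not libSkylark's API): it compresses a long vector to far fewer dimensions while roughly preserving its Euclidean norm, which is the property that lets sketched problems stand in for the original.

```python
import math
import random

# Toy sketch: project an n-dimensional vector down to k dimensions with
# Gaussian random weights. Illustrative only; libSkylark provides
# industrial-strength sketching transforms for matrices.

def gaussian_sketch(x, k, seed=0):
    """Return a k-dimensional sketch of vector x (k << len(x))."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [sum(rng.gauss(0.0, 1.0) * xi for xi in x) / math.sqrt(k)
            for _ in range(k)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

x = [1.0] * 1000             # 1000-dimensional vector, norm ≈ 31.62
sx = gaussian_sketch(x, k=200)
print(norm(x), norm(sx))     # the sketched norm is close to the original
```

Solving, say, a least-squares problem on the sketched data then gives an approximate answer to the full problem at a fraction of the cost, which is exactly the trade-off the paragraph above describes.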
A simple example of sketching is just sampling the rows (and/or columns) of the matrix, where each row (and/or column) is equally likely to be sampled. This uniform sampling is quick and easy, but doesn't always yield good sketches; however, there are sophisticated sampling methods that do yield good sketches. http://xdata-skylark.github.io/ Mutual Information Text Explorer The Mutual Information Text Explorer is a tool that allows interactive exploration of text data and document covariates. See the paper or slides for information. Currently, an experimental system is available. http://brenocon.com/mte/ Data Science Resources by Jonathan Bower on GitHub This repo is intended to provide open source resources to facilitate learning or to point practicing/aspiring data scientists in the right direction. It also exists so that I can keep track of resources that are/were helpful to me and hopefully for you. I aim to cover the full spectrum of data science and to hopefully include topics of data science that aren't actively covered or easy to find in the open-source world. For instance, I haven't focused on in-depth machine learning theory since that is well covered. If you are looking for ML theory I would look to some of the online courses, books or bootcamps. There is a lot of theory information available online, some is linked lower on this page, and other info is available in many purchasable books. Keep in mind that this is a constant work in progress. If you have anything to add, any feedback, or would like to be a contributor - please reach out. If there are any mistakes or typos, be patient with me, but please let me know. Lastly, I would add that a large portion of data science is exploratory data analysis and properly cleaning your data to implement the tools and theory necessary to solve the problem at hand.
For each problem there are many different ways and tools to execute a successful solution - if one method isn't working, re-evaluate, re-work the problem, try another approach and/or reach out to the community for support. Good luck, and I hope this repo is helpful! https://github.com/jonathan-bower/DataScienceResources Joseph Misiti Blog A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php. Other awesome lists can be found in the awesome-awesomeness list. https://github.com/josephmisiti/awesome-machine-learning Michael Waskom GitHub repositories I'm a Ph.D. student in the Department of Psychology at Stanford University, where I work with Anthony Wagner. I use behavioral, computational, and neuroimaging methods to study cognitive control and decision making in humans. Previously, I spent time in John Gabrieli's lab at MIT investigating whether cognition can be improved through training. I did my undergrad at Amherst College, where I studied philosophy and neuroscience. Complementing this research, I have developed a set of software libraries for statistical analysis and visualization. These libraries aim to make computationally-based research more reproducible and improve the visual presentation of statistical and neuroimaging results. https://github.com/mwaskom Visualizing distributions of data This notebook demonstrates different approaches to graphically representing distributions of data, specifically focusing on the tools provided by the seaborn package. https://github.com/mwaskom/seaborn Exploring Seaborn and Pandas based plot types in HoloViews by Philipp John Frederic Rudiger In this notebook we'll look at interfacing between the composability and ability to generate complex visualizations that HoloViews provides and the great looking plots incorporated in the seaborn library.
Along the way we'll explore how to wrap different types of data in a number of Seaborn View types, including:
- Distribution Views
- Bivariate Views
- TimeSeries Views
Additionally, we explore how a Pandas dataframe can be wrapped in a general-purpose View type, which can either be used to convert the data into standard View types or be visualized directly using a wide array of plotting options, including regression plots, correlation plots, box plots, autocorrelation plots, scatter matrices, histograms, and regular scatter or line plots. http://philippjfr.com/blog/seabornviews/ "Machine Learning: An Algorithmic Perspective" Code by Stephen Marsland Remark: I couldn't open Stephen Marsland's home page. http://www.amazon.com/Machine-Learning-Algorithmic-Perspective-Recognition/dp/1466583282 http://www.briolat.org/assets/R/classif/Machine%20learning%20an%20algorithmic%20perspective(2009).pdf Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!) https://github.com/rasbt http://sebastianraschka.com/ Open Source Hong Kong Open Source Hong Kong (OSHK) is a developer/contributor/user community about open source software and technology. http://opensource.hk/ Lamda Group, Nanjing University Open Source Software http://lamda.nju.edu.cn/Default.aspx?Page=Data&NS=&AspxAutoDetectCookieSupport=1#code GATE, General Architecture for Text Engineering GATE is...
• open source software capable of solving almost any text processing problem
• a mature and extensive community of developers, users, educators, students and scientists
• a defined and repeatable process for creating robust and maintainable text processing workflows
• in active use for all sorts of language processing tasks and applications, including: voice of the customer; cancer research; drug research; decision support; recruitment; web mining; information extraction; semantic annotation
• the result of a €multi-million R&D programme running since 1995, funded by commercial users, the EC, BBSRC, EPSRC, AHRC, JISC, etc.
• used by corporations, SMEs, research labs and Universities worldwide
• the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, the ISO 9001 of Text Mining
• a world-class team of language processing developers
If you need to solve a problem with text analysis or human language processing, you're in the right place. https://gate.ac.uk/ CLARIN, Common Language Resources and Technology Infrastructure CLARIN is the Common Language Resources and Technology Infrastructure, which aims to provide easy and sustainable access for scholars in the humanities and social sciences to digital language data (in written, spoken, video or multimodal form), and advanced tools to discover, explore, exploit, annotate, analyse or combine them, wherever they are located. CLARIN is building a networked federation of European data repositories, service centres and centres of expertise, with single sign-on access for all members of the academic community in all participating countries. Tools and data from different centres will be interoperable, so that data collections can be combined and tools from different sources can be chained to perform complex operations to support researchers in their work.
At this moment the CLARIN infrastructure is still under construction, but a number of participating centres are already offering access services to data, tools and expertise. On the services page we show the services accessible at this moment and we explain how and by whom the various services can be accessed. http://www.clarin.eu/ FLaReNet, Fostering Language Resources Network A major condition for the take-off of the field of Language Resources and Language Technologies is the creation of a shared policy for the next years. FLaReNet aims at developing a common vision of the area and fostering a European strategy for consolidating the sector, thus enhancing competitiveness at EU level and worldwide. By creating a consensus among major players in the field, the mission of FLaReNet is to identify priorities as well as short-, medium-, and long-term strategic objectives, and provide consensual recommendations in the form of a plan of action for the EC, national organisations and industry. Through the exploitation of new collaborative modalities as well as workshops and meetings, FLaReNet will sustain international cooperation and (re)create a wide Language community. http://www.flarenet.eu/ My Data Science Resources by Viktor Shaumann With a seemingly infinite amount of Data Science resources available online, it is very easy to get lost. I compiled a collection of practical resources I found to be the most useful on my path of learning Data Science. This list is continuously updated with new material. https://github.com/vshaumann/My-Data-Science-Resources MISCELLANEOUS Overleaf (ex WriteLaTeX) About Overleaf Overleaf is a collaborative writing and publishing system that makes the whole process of producing academic papers much quicker for both authors and publishers. Overleaf is a free service that lets you create, edit and share your scientific ideas easily online using LaTeX, a comprehensive and powerful tool for scientific writing.
Overleaf has grown rapidly since its launch in 2011, and today there are over 150,000 users from over 180 countries worldwide who've created over 2 million projects using the service. Writelatex Limited, the company behind Overleaf, was founded by John Hammersley and John Lees-Miller, two mathematicians who worked together on the pioneering Ultra PRT Project and who were inspired by their own experiences in academia to create a better solution for collaborative scientific writing. Overleaf is supported by Digital Science. Digital Science is a technology company serving the needs of scientific research. Their mission is to provide software that makes research simpler, so there's more time for discovery. Whether at the bench or in a research setting, their range of products helps to simplify workflows and change the way science is done. Digital Science believes passionately that tomorrow's research will be different and better than today's. Their portfolio brands include Altmetric, Labguru, Figshare, ReadCube, ÜberResearch, BioRAFT and Symplectic. Digital Science is a business division of Macmillan Science and Education. https://www.overleaf.com/2070900jhqnyz#/5252162/ Interview of Dr John Lees-Miller by Imperial College London ACM Student Chapter https://www.youtube.com/watch?v=kYkN0Yv56bI&spfreload=10 LISA Lab GitHub repository, Université de Montréal https://github.com/lisa-lab MILA, Institut des algorithmes d'apprentissage de Montréal, Montreal Institute for Learning Algorithms http://www.mila.umontreal.ca/ Vowpal Wabbit GitHub repository by John Langford Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
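The "hashing" in Vowpal Wabbit's technique list refers to the hashing trick: feature names are hashed straight into a fixed-size weight vector, so no feature dictionary is ever built. A minimal Python sketch of the idea (using hashlib's md5 for a deterministic hash; VW itself uses murmurhash, and all function names below are illustrative):

```python
import hashlib

# The hashing trick: map arbitrary feature names into a fixed number of
# buckets so the learner never needs a feature dictionary.
# Illustrative sketch, not Vowpal Wabbit's implementation.

NUM_BUCKETS = 2 ** 18  # 262,144 weights, like vw's default of 18 bits

def feature_index(name):
    """Deterministically hash a feature name to a bucket index."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def featurize(raw_features):
    """Turn {'word:cat': 1.0, ...} into a sparse {index: value} vector."""
    vec = {}
    for name, value in raw_features.items():
        idx = feature_index(name)
        vec[idx] = vec.get(idx, 0.0) + value  # hash collisions simply add
    return vec

print(featurize({"word:machine": 1.0, "word:learning": 1.0}))
```

The cost is occasional collisions between unrelated features, but with enough buckets the effect on accuracy is small, and memory stays constant no matter how many distinct features the stream contains.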
https://github.com/JohnLangford/vowpal_wabbit http://hunch.net/~vw/ Google-styleguide: Style guides for Google-originated open-source projects Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style. "Style" covers a lot of ground, from "use camelCase for variable names" to "never use global variables" to "never use exceptions." This project holds the style guidelines we use for Google code. If you are modifying a project that originated at Google, you may be pointed to this page to see the style guides that apply to that project. https://github.com/google/styleguide BIG DATA/CLOUD COMPUTING, in English Apache Spark Machine Learning Library MLlib is a Spark implementation of some common machine learning (ML) functionality, as well as associated tests and data generators. MLlib currently supports four common types of machine learning problem settings, namely binary classification, regression, clustering and collaborative filtering, as well as an underlying gradient descent optimization primitive. http://spark.apache.org/docs/0.9.1/mllib-guide.html Ampcamp, Big Data Boot Camp AMP Camp 5 was held at UC Berkeley and live-streamed online on November 20 and 21, 2014. Videos and exercises from the event are available on the AMPCamp 5 page. AMP Camps are Big Data training events organized by the UC Berkeley AMPLab about big data analytics, machine learning, and popular open-source software projects produced by the AMPLab. All AMP Camp curricula, and whenever possible videos of instructional talks presented at AMP Camps, are published here and accessible for free. About the AMPLab The UC Berkeley AMPLab works at the intersection of machine learning, cloud computing, and crowdsourcing; integrating Algorithms, Machines, and People (AMP) to make sense of Big Data.
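The MLlib guide above mentions an underlying gradient descent optimization primitive. Its core loop can be sketched on a one-parameter least-squares problem in plain Python (Spark distributes the gradient sum across a cluster; this toy version, with made-up data, does not):

```python
# Plain gradient descent fitting y = w * x by least squares.
# Sketches the primitive MLlib builds on; MLlib computes the same
# gradient sum in parallel across cluster partitions.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # toy (x, y) pairs, y ≈ 2x

def gradient(w):
    # d/dw of 0.5 * sum((w*x - y)^2) = sum((w*x - y) * x)
    return sum((w * x - y) * x for x, y in data)

w, lr = 0.0, 0.01  # initial weight and learning rate
for _ in range(500):
    w -= lr * gradient(w)
print(w)  # converges to ≈ 2.04, the least-squares slope
```

Binary classification, regression, and collaborative filtering in MLlib all reduce to minimizing some loss with this same update rule, just with different gradient formulas and many more parameters.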
http://ampcamp.berkeley.edu/5/exercises/ Spark Summit 2013 Videos https://spark-summit.org/2013/#videos Spark Summit 2014 Videos https://spark-summit.org/2014/#videos Spark Summit 2015 Videos & Slides https://spark-summit.org/2015/ Spark Summit Training & Videos https://www.youtube.com/user/TheApacheSpark/playlists Databricks Videos Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache Spark. We’ve been working for the past six years on cutting-edge systems to extract value from Big Data. We believe that Big Data is a huge opportunity that is still largely untapped, and we’re working to revolutionize what you can do with it. Open Source Commitment Apache Spark is 100% open source, and at Databricks we are fully committed to maintaining this model. We believe that no computing platform will win in the Big Data space unless it is fully open source. Spark has one of the largest open source communities in Big Data, with over 200 contributors from 50 organizations. Databricks works closely with the community to maintain this momentum. https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q- UUbA/videos SF Scala & SF Bay Area Machine Learning, Joseph Bradley: Decision Trees on Spark Joseph talks about Machine Learning with Spark, focusing on the decision tree and (upcoming) random forest implementations in MLlib. Spark has been established as a natural platform for iterative ML algorithms, and trees provide a great example. This talk aims both to give insight into the underlying implementation and to highlight best practices for using MLlib. http://functional.tv/post/98342564544/sfscala-sfbaml-joseph-bradley-decision-trees-on-spark Slides https://speakerdeck.com/jkbradley/mllib-decision-trees-at-sf-scala-baml-meetup Apache Mahout ML library The Apache Mahout™ project's goal is to build a scalable machine learning library. 
Currently Mahout supports mainly three use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. https://mahout.apache.org/ Apache Mahout on Javaworld Enjoy machine learning with Mahout on Hadoop, 2014 Mahout brings the power of scalable processing to Hadoop's huge data sets http://www.javaworld.com/article/2241046/big-data/enjoy-machine-learning-with-mahout-on-hadoop.html Know this right now about Hadoop, 2014 From core elements like HDFS and YARN to ancillary tools like Zookeeper, Flume, and Sqoop, here's your cheat sheet and cartography of the ever-expanding Hadoop ecosystem. http://www.javaworld.com/article/2158789/data-storage/know-this-right-now-about-hadoop.html MapReduce programming with Apache Hadoop, 2008 Process massive data sets in parallel on large clusters http://www.javaworld.com/article/2077907/open-source-tools/mapreduce-programming-with-apache-hadoop.html Hadoop Users Group UK Recordings from meetups of the UK Hadoop Users Group. These will be a combination of tech talks, panel sessions and other events that we run. https://www.youtube.com/channel/UCjo2p6jTA0joX8HoUeHFcDg?spfreload=10 Deeplearning4j Deeplearning4j is the first commercial-grade deep learning library written in Java. It is meant to be used in business environments, rather than as a research tool for extensive data exploration. Deeplearning4j is most helpful in solving distinct problems, like identifying faces, voices, spam or e-commerce fraud. Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration. By following its conventions, you get an infinitely scalable deep-learning architecture.
The framework has a domain-specific language (DSL) for neural networks, to turn their multiple knobs. Deeplearning4j includes a distributed deep-learning framework and a normal deep-learning framework; i.e. it runs on a single thread as well. Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce. The distributed framework is made for data input and neural net training at scale, and its output should be highly accurate predictive models. By following the links at the bottom of each page, you will learn to set up, and train with sample data, several types of deep-learning networks. These include single- and multithread networks, Restricted Boltzmann machines, deep-belief networks and Stacked Denoising Autoencoders. For a quick introduction to neural nets, please see our overview. http://deeplearning4j.org/ Udacity opencourseware "Intro to Hadoop and MapReduce" Course Summary The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data. Why Take This Course?
• How Hadoop fits into the world (recognize the problems it solves)
• Understand the concepts of HDFS and MapReduce (find out how it solves the problems)
• Write MapReduce programs (see how we solve the problems)
• Practice solving problems on your own
https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617 Apache Storm Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
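Storm's canonical example is a streaming word count: a spout emits sentences and bolts split and count them as they arrive. The shape of that dataflow can be mimicked with plain Python generators (this is not Storm's actual API, just the spout-to-bolt pipeline idea on a tiny hard-coded stream):

```python
from collections import Counter

# Mimic Storm's classic streaming word count with generators:
# spout -> split bolt -> count bolt. Illustrative only, not Storm's API.

def sentence_spout():
    """Spout: emits a stream of sentences (unbounded in a real topology)."""
    for s in ["storm processes streams", "hadoop processes batches"]:
        yield s

def split_bolt(sentences):
    """Bolt: tokenizes each sentence as it arrives."""
    for sentence in sentences:
        for word in sentence.split():
            yield word

def count_bolt(words):
    """Bolt: maintains running counts as the stream flows through."""
    counts = Counter()
    for word in words:
        counts[word] += 1
    return counts

print(count_bolt(split_bolt(sentence_spout())))
# 'processes' is counted twice, every other word once
```

In real Storm each stage runs as parallel tasks on a cluster and the stream never ends, but the topology-of-small-components structure is the same.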
http://storm.apache.org/ http://storm.apache.org/documentation/Tutorial.html Scaling Apache Storm by Taylor Goetz http://www.slideshare.net/ptgoetz https://github.com/ptgoetz Michael Vogiatzis Blog How to spot first stories on Twitter using Storm As a first blog post, I decided to describe a way to detect first stories (a.k.a. new events) on Twitter as they happen. This work is part of the Thesis I wrote last year for my MSc in Computer Science at the University of Edinburgh. You can find the document here. http://micvog.com/2013/09/08/storm-first-story-detection/ Prediction IO BUILD SMARTER SOFTWARE with Machine Learning PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery. https://prediction.io/ https://hacks.mozilla.org/2014/04/introducing-predictionio/ https://www.youtube.com/channel/UCN0jVSCIEh7eeuWXIuo316g PredictionIO tutorial - Thomas Stone - PAPIs.io '14 PredictionIO is an open source machine learning server for software developers to create predictive features. Traditionally, this included personalization, recommendation and content discovery in domains such as e-commerce and media. The latest version of PredictionIO will open our platform for many more use cases such as churn analysis, trend detection and more, allowing developers to use the power of machine learning for any web and mobile app. We will also discuss the new software design pattern DASE for building machine learning engines on top of PredictionIO's scalable infrastructure. It's time to see what an open source community can build re-imagining software with machine learning. https://www.youtube.com/watch?v=zeGnILRIdUk&spfreload=10 Container Cluster Manager Kubernetes builds on top of Docker to construct a clustered container scheduling service. The goals of the project are to enable users to ask a Kubernetes cluster to run a set of containers.
The system will automatically pick a worker node to run those containers on. As container-based applications and systems get larger, some tools are provided to facilitate sanity. This includes ways for containers to find and communicate with each other and ways to work with and manage sets of containers that do similar work. When looking at the architecture of the system, we'll break it down into services that run on the worker node and services that play a "master" role. https://github.com/GoogleCloudPlatform/kubernetes Domino Data Labs Domino is a platform for modern data scientists using Python, R, Matlab, and more. Use our cloud-hosted infrastructure to securely run your code on powerful hardware with a single command without any changes to your code. If you have your own infrastructure, our Enterprise offering provides powerful, easy-to-use cluster management functionality behind your firewall. Special offer for The Machine Learning Salon's readers: Machine Learning Salon readers can get $50 worth of compute credits when they sign up for Domino. Domino lets you run your analyses on powerful cloud hardware in one step without any setup or changes to your code. Sign up here, or email support@dominoup.zendesk.com and tell them you are a Machine Learning Salon reader. https://www.dominodatalab.com/ Data Science Central Data Science Central is the industry's online resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends and industry job opportunities. 
http://www.datasciencecentral.com/ Amazon Web Services Videos https://www.youtube.com/user/AmazonWebServices/playlists Google Cloud Computing Videos https://cloud.google.com/docs/videos VLAB: Deep Learning: Intelligence from Big Data, Stanford Graduate School of Business https://www.youtube.com/watch?v=czLI3oLDe8M&spfreload=10 Machine Learning and Big Data in Cyber Security Eyal Kolman Technion Lecture https://www.youtube.com/watch?v=G2BydTwrrJk&spfreload=10 Chaire Machine Learning Big Data, Telecom Paris Tech (Videos in French) On 26 November 2014, Télécom ParisTech held the first meetings of its Machine Learning for Big Data research chair, together with its partners Fondation Télécom, Criteo, PSA Peugeot Citroën and Safran. http://www.dailymotion.com/video/x2cti71_chaire-ml-big-data-premieres-rencontres_school https://www.youtube.com/user/TelecomParisTech1/search?query=big+data An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014 The past few years have seen a major change in computing systems, as growing data volumes and stalling processor speeds require more and more applications to scale out to distributed systems. Today, a myriad of data sources, from the Internet to business operations to scientific instruments, produce large and valuable data streams. However, the processing capabilities of single machines have not kept up with the size of data, making it harder and harder to put to use. As a result, a growing number of organizations (not just web companies, but traditional enterprises and research labs) need to scale out their most important computations to clusters of hundreds of machines. At the same time, the speed and sophistication required of data processing have grown. In addition to simple queries, complex algorithms like machine learning and graph analysis are becoming common in many domains. 
And in addition to batch processing, streaming analysis of new real-time data sources is required to let organizations take timely action. Future computing platforms will need to not only scale out traditional workloads, but support these new applications as well. This dissertation proposes an architecture for cluster computing systems that can tackle emerging data processing workloads while coping with larger and larger scales. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping the scalability and fault tolerance of previous systems. And whereas most deployed systems only support simple one-pass computations (e.g., aggregation or SQL queries), ours also extends to the multi-pass algorithms required for more complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing, or SQL and complex analytics. We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to efficiently capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using both synthetic benchmarks and real user applications. Spark matches or exceeds the performance of specialized systems in many application domains, while offering stronger fault tolerance guarantees and allowing these workloads to be combined. We explore the generality of RDDs from both a theoretical modeling perspective and a practical perspective to see why this extension can capture a wide range of previously disparate workloads. 
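The abstract above hinges on one primitive: the RDD, a partitioned collection that supports coarse-grained transformations (such as map and filter) and actions (such as reduce), with per-partition work followed by a combine step. The toy class below is not Spark's API, just a conceptual sketch in plain Python of how that shape captures a computation:

```python
# Conceptual sketch of RDD-style computation (pure Python, not Spark's API).
# An "RDD" here is just a list of partitions; transformations build new RDDs,
# and the reduce action first reduces each partition, then merges partials,
# mirroring how a cluster would combine per-node results.
from functools import reduce as fold

class ToyRDD:
    def __init__(self, partitions):
        self.partitions = partitions               # list of lists of records

    def map(self, f):
        return ToyRDD([[f(x) for x in p] for p in self.partitions])

    def filter(self, pred):
        return ToyRDD([[x for x in p if pred(x)] for p in self.partitions])

    def reduce(self, op):
        partials = [fold(op, p) for p in self.partitions if p]
        return fold(op, partials)                  # combine partial results

rdd = ToyRDD([[1, 2, 3], [4, 5, 6]])               # two "partitions"
total = (rdd.map(lambda x: x * x)
            .filter(lambda x: x % 2 == 0)
            .reduce(lambda a, b: a + b))
print(total)  # 4 + 16 + 36 = 56
```

What the sketch omits is exactly what the dissertation contributes: lineage tracking, so a lost partition can be recomputed from its transformations rather than replicated.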
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf Big Data Requires Big Visions For Big Change | Martin Hilbert | TEDxUCL At the University of California, Davis, Martin thinks about the fundamental theories of how digitization affects society. During his 15 years at the United Nations Secretariat, Martin assisted governments to take advantage of the digital revolution. When the ‘big data’ age arrived, his research was the first to quantify the historical growth of how much technologically mediated information there actually is in the world. He is convinced that ‘big data’ is a huge opportunity for making the world a better place. After joining the faculty of the University of California, Davis, he had more time to think more deeply about the theoretical underpinning and fundamental limitations of the ‘big data’ revolution. When TEDxUCL asked him if there is a limit to the power of data, he answered with the fundamental limitation to all empirical science. The fundamental limit of ‘big data’ has to do with social change and how we envision the future. Luckily, the digital age also provides solutions for fine-tuning our future visions. Martin holds doctorates in Economics and Social Sciences, and in Communication, and has provided hands-on technical assistance to Presidents, government experts, legislators, diplomats, NGOs, and companies in over 20 countries. https://www.youtube.com/watch?v=UXef6yfJZAI&spfreload=10 http://www.martinhilbert.net/ Ethical Quandary in the Age of Big Data | Justin Grace | TEDxUCL This talk was given at a local TEDx event, produced independently of the TED Conferences. Data is now everywhere. The ‘internet era’ has now passed and we are entering the era of data. Data use and misuse can lead to both powerful positive change or disaster. Here I discuss the questions we should ask about data and present three case studies where organisations have generated controversy from their data practices. 
I finish by touching on what we can do to take ownership of our data. Justin is a freelance data scientist who has worked in academia, technology, healthcare and most recently digital media with the Guardian. He is passionate about all things data and understanding how its use and misuse shapes the world we live in and how this affects our relationships with organisations and each other. https://www.youtube.com/watch?v=mVZ78kdduyY&spfreload=10 Big Data & Dangerous Ideas | Daniel Hulme | TEDxUCL This talk was given at a local TEDx event, produced independently of the TED Conferences. This is an illuminating and animated talk about how data and artificial intelligence affect our everyday lives. It provides a framework for anyone to understand the data-driven decision-making process, and raises critical moral, ethical and legal questions that society needs to address to ensure that our rights are kept safe and that we safeguard our very own existence. Daniel is the Founder and CEO of Satalia (NPComplete Ltd), a spin-out of UCL that provides unique algorithmic technology and professional services to solve industries' data-driven decision problems. He is passionate about emerging technology and regularly speaks at events, with interests in Algorithms, Optimisation, Analytics, Big Data and the Future Internet. Daniel has been awarded a Master's in Computer Science with Machine Learning and a Doctorate in Computational Complexity from UCL. He is the Director of the UCL Business Analytics MSc, and holds Senior Researcher and Lecturing positions in Computer Science and Management Science at UCL and Pearson College. 
He is a Visiting Fellow of the Big Innovation Centre, and has advisory and executive positions across worldwide companies in the areas of Education, Analytics, Big Data, Data-driven Decision Making and Open Innovation. He holds an international Kauffman Global Entrepreneur Scholarship and actively promotes entrepreneurship and technology innovation across the globe. https://www.youtube.com/watch?v=tLQoncvCKxs&spfreload=10 http://www0.cs.ucl.ac.uk/staff/D.Hulme/ List of good free Programming and Data Resources, BITBOOTCAMP We are a group of data enthusiasts with years of experience working at leading financial companies on Wall Street. In Jan 2014, we started Bit Bootcamp: an intensive and immersive big data boot camp to spread the knowledge and to address the shortage of good talent in the industry. The motivation for the bootcamp comes from our own difficulties faced while we were trying to hire new talent. No matter how much money we threw at the problem, we could not find people with the right skills. Then we figured we might as well train them ourselves. http://www.bitbootcamp.com/#!resources/ctx5 BIG Data, Medical Imaging and Machine Intelligence by Professor H. R. Tizhoosh at the University of Waterloo This is a talk by Professor H. R. Tizhoosh at the University of Waterloo, Ontario, Canada (January 21, 2015). https://www.youtube.com/watch?v=Pkk6Lad2N5g Session 6: Science in the cloud: big data and new technology The way science is undertaken has changed dramatically in the past 10-15 years, and it is set to change even more in the coming decade. New technologies, such as online databases, virtual machines, cloud computing and machine learning are becoming commonplace. This session will explore such innovations and their role in maximising the scientific value from astronomy data, in particular from the next generation of telescopes and simulations. 
Chair Associate Professor Darren Croton Swinburne University of Technology https://www.youtube.com/watch?v=xHoMI1nC8_4&spfreload=10 MapReduce for C: Run Native Code in Hadoop by Google Open Source Software We are pleased to announce the release of MapReduce for C (MR4C), an open source framework that allows you to run native code in Hadoop. MR4C was originally developed at Skybox Imaging to facilitate large scale satellite image processing and geospatial data science. We found the job tracking and cluster management capabilities of Hadoop well-suited for scalable data handling, but also wanted to leverage the powerful ecosystem of proven image processing libraries developed in C and C++. While many software companies that deal with large datasets have built proprietary systems to execute native code in MapReduce frameworks, MR4C represents a flexible solution in this space for use and development by the open source community. http://google-opensource.blogspot.sg/2015/02/mapreduce-for-c-run-native-code-in.html Machine Learning & Big Data at Spotify with Andy Sloane, Big Data Madison Meetup Back for a return engagement, Spotify engineer Andy Sloane will cover how they use machine learning at music recommendation service Spotify. He will also discuss some large database "tricks" Spotify uses to do real-time recommendations and fingerprint matching. Please join us! https://www.youtube.com/watch?v=MX_ARH-KoDg&spfreload=10 http://www.meetup.com/BigDataMadison/events/216561502/ Slides http://www.slideshare.net/AndySloane/machine-learning-spotify-madison-big-data-meetup Hands on tutorial on Neo4J with Max De Marzi, Big Data Madison Meetup Back for a return engagement, developer Max De Marzi is coming from Chicago to give a tutorial on Neo4J, the popular graph database application. https://www.youtube.com/watch?v=l4EmLFaMxkA TED Talk: What do we do with all this big data? by Susan Etlinger Does a set of data make you feel more comfortable? More successful? 
Then your interpretation of it is likely wrong. In a surprisingly moving talk, Susan Etlinger explains why, as we receive more and more data, we need to deepen our critical thinking skills. Because it's hard to move beyond counting things to really understanding them. https://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data Big Data's Big Deal by Viktor Mayer-Schonberger, Oxford's Podcast Duration: 0:44:14 Added: 20 Nov 2014 Big Data promises to change all sectors of our economy, and deeply affect our society. But beyond the current hype, what are Big Data's salient qualities, and do they warrant the high hopes? These are some of the questions that this talk addresses. http://podcasts.ox.ac.uk/big-datas-big-deal-0 BID Data Project - Big Data Analytics with Small Footprint Welcome to the BID Data Project! Here you will find resources for the fastest Big Data tools on the Web. See our Benchmarks on github. BIDMach running on a single GPU-equipped host holds the records for many common machine learning problems, on single nodes or clusters. Try It! BIDMach is an interactive environment designed to make it extremely easy to build and use machine learning models. BIDMach runs on Linux, Windows 7&8, and Mac OS X, and we have a pre-loaded Amazon EC2 instance. See the instructions in the Download Section. Develop with it. BIDMach includes core classes that take care of managing data sources, optimization and distributing data over CPUs or GPUs. It’s very easy to write your own models by generalizing from the models already included in the Toolkit. Explore. Our Publications Section includes published reports on the project, and the topics of forthcoming papers. Contribute. BIDMach includes many popular machine learning algorithms. But there is much more work to do. In progress we have Random Forests, extremely fast Gibbs samplers for Bayesian graphical models, distributed Deep Learning networks, and graph algorithms. 
Ask us for an unpublished report on these topics. Please use Github’s issues page for bug reports or suggestions: Lightning Overview The BID Data Suite is a collection of hardware, software and design patterns that enable fast, large-scale data mining at very low cost. http://bid2.berkeley.edu/bid-data-project/ SF Big Analytics and SF Machine learning meetup: Machine Learning at the Limit by Prof. John Canny Machine Learning at the Limit How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two to three orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/MLlib, Powergraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and empirically holds the record for distributed Pagerank. Beyond rooflining, we believe there are great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal parameter estimation. We show that it has high parallelism, and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. 
We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms. https://www.youtube.com/watch?v=smMy1lIG9WQ&spfreload=10 COMPETITIONS, in English Angry Birds AI Competition Here you will find all the information about upcoming and previous Angry Birds AI Competitions. The task of this competition is to develop a computer program that can successfully play Angry Birds. The long term goal is to build an intelligent Angry Birds playing agent that can play new levels better than the best human players. http://www.aibirds.org/ ChaLearn Mission: Machine Learning is the science of building hardware or software that can achieve tasks by learning from examples. The examples often come as {input, output} pairs. Given new inputs a trained machine can make predictions of the unknown output. Examples of machine learning tasks include: • automatic reading of handwriting • assisted medical diagnosis • automatic text classification (classification of web pages; spam filtering) • financial predictions We organize challenges to stimulate research in this field. The web sites of past challenges remain open for post-challenge submission as ever-going benchmarks. ChaLearn is a tax-exempt organization under section 501(c)(3) of the US IRS code. DLN: 17053090370022. http://www.chalearn.org/ ChaLearn Automatic Machine Learning Challenge (AutoML) https://www.codalab.org/competitions/2321 ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. 
We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures. http://image-net.org/challenges/LSVRC/2015/ Kaggle Kaggle is the world's largest community of data scientists. They compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions. https://www.kaggle.com/competitions Kaggle Competition Past Solutions We learn more from code, and from great code. Not necessarily the first-ranked solution, because we also learn what separates a stellar solution from a merely good one. I will post solutions I come upon so we can all learn to become better! I collected the following source code and interesting discussions from past Kaggle competitions for learning purposes. Not all competitions are listed because I am collecting them manually, and some are missing because no one shared a solution. I will add more as time goes by. Thank you. http://www.chioka.in/kaggle-competition-solutions/ Kaggle Connectomics Winning Solution Research Article Simple connectome inference from partial correlation statistics in calcium imaging http://arxiv.org/abs/1406.7865 Solution to the Galaxy Zoo Challenge by Sander Dieleman http://benanne.github.io/2014/04/05/galaxy-zoo.html https://github.com/benanne/kaggle-galaxies Winning 2 Kaggle in class competitions on spam http://mlwave.com/winning-2-kaggle-in-class-competitions-on-spam/ Matlab Benchmark for Packing Santa’s Sleigh translated into Python http://beatingthebenchmark.blogspot.co.uk/search?updated-min=2013-01-01T00:00:00-08:00&updated-max=2014-01-01T00:00:00-08:00&max-results=4 Machine learning best practices we've learned from hundreds of competitions - Ben Hamner (Kaggle) Ben Hamner is Chief Scientist at Kaggle, leading its data science and development teams. 
He is the principal architect of many of Kaggle's most advanced machine learning projects, including current work in Eagle Ford and GE's flight arrival prediction and optimization modeling. https://www.youtube.com/watch?v=9Zag7uhjdYo TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms) http://tedxsf.org/videos/#tedxsf-connected-reality CrowdANALYTIX https://crowdanalytix.com/community Challenges for governmental applications https://www.challenge.gov/list/ InnoCentive Challenge Center https://www.innocentive.com/ar/challenge/browse TunedIT http://tunedit.org/ Ants, AI Challenge, sponsored by Google, 2011 The AI Challenge is all about creating artificial intelligence, whether you are a beginning programmer or an expert. Using one of the easy-to-use starter kits, you will create a computer program (in any language) that controls a colony of ants which fight against other colonies for domination. http://ants.aichallenge.org/ International Collegiate Programming Contest The ACM International Collegiate Programming Contest (ICPC) is the premier global programming competition conducted by and for the world’s universities. The competition operates under the auspices of ACM, is sponsored by IBM, and is headquartered at Baylor University. For nearly four decades, the ICPC has grown to be a game-changing global competitive educational program that has raised aspirations and performance of generations of the world’s problem solvers in the computing sciences and engineering. http://icpc.baylor.edu/welcome.icpc Dream challenges The Dialogue on Reverse Engineering Assessment and Methods (DREAM) project is an initiative to advance the field of systems biology through the organization of Challenges to foster the development of predictive models that allow scientists to better understand human disease. Challenges engage broad and diverse communities of scientists to competitively solve a specific problem in a given time period. 
The concept fosters collaboration between scientists through shared data and approaches. DREAM has developed the “Challenge” concept by launching 27 successful Challenges over the past seven years. Sage Bionetworks and DREAM merged in early 2013 in order to develop Challenges that engage broader participation of the research community in open-science projects hosted on Synapse, and that provide meaningful impact to both discovery and clinical research. By presenting the research community with well-formulated questions that usually involve complex data, we effectively enable the sharing and improvement of predictive models, accelerating many-fold the transformation of this data into useful scientific knowledge. Our ultimate goal is to foster collaborations of like-minded researchers that together will find the solutions to vexing problems that matter most to citizens and patients. https://www.synapse.org/#!Wiki:syn1929437/ENTITY Texata TEXATA The World’s Big Data Analytics Showdown. For Business. TEXATA 2015 is the annual Big Data Analytics World Championships for Business and Enterprise. Thousands of the best and brightest professionals and students from over 100 countries working across Computer Science, Maths, Technology, Engineering and Analytical disciplines compete to develop and apply their skills to real-world business case studies and challenges. The competition involves two online qualification rounds (4 hours each) and Live World Finals in Austin, Texas. TEXATA 2015 is a World Championship Event independently organized and administered by the Professional Services Champions League (PSCL). http://www.texata.com/ IoT World Forum Young Women's Innovation Grand Challenge NEWSFLASH! It is with great pleasure that we announce the twenty (20) semi-finalists of the 2015 IoT World Forum Young Women’s Innovation Grand Challenge. Our semi-finalists are listed here. We wish all our semi-finalists good luck as they prepare for the contest finals. 
Check back on July 30th to see who made the finals! The IoT World Forum Young Women’s Innovation Grand Challenge is a global innovation challenge open to young women between the ages of 13-18. The aim of the challenge is to recognize, promote, and reward young innovators as they come up with new uses for Internet of Things technologies. What is a problem you see today or expect to emerge in the next 5 years? How can connecting more devices and everyday objects to the internet or other networks help to solve this problem? If you’re a student who likes to take a creative approach to projects, this challenge is for you! Use your skills to help envision new solutions that can be enabled with Internet of Things technologies both now and in the future. The Challenge: Your goal is to come up with new ideas on how technologies from the Internet of Things can improve education, healthcare, manufacturing, energy, retail, transportation, and smart cities, or find new solutions that can cut across many industries. http://iotchallenge-cisco.younoodle.com/ COMPETITIONS, in French Coming soon … COMPETITIONS, in Russian Russian AI Cup - Competition Programming Artificial Intelligence Russian AI Cup is an open competition for programming artificial intelligence. Try your hand at programming a strategy game! It's easy, clear and fun! The third Russian AI Cup championship is called CodeHockey. You have to program the artificial intelligence of the players on a team. Your strategy will compete against others in the sandbox and in the championship. You can use any of these programming languages: C++, Java, C#, Python, Pascal, or Ruby. The Sandbox is now open. Good luck! Both novice programmers (students and schoolchildren) and professionals are invited to participate; no special knowledge is required, and fairly basic programming skills are enough. http://russianaicup.ru/ OPEN DATASET, in English Friday Lunch time Lectures at the Open Data Institute, Videos, slides and podcasts (not to be missed!) 
The ODI Friday lunchtime lecture series is now available to listen to and download as a podcast on iTunes or via the RSS feed. Friday lunchtime lectures are for everyone and free to attend. You bring your lunch, we provide tea and coffee, an interesting talk, and enough time to get back to your desk. They run from 1pm-1.45pm, with informal networking until 2pm, weekly during UK school term-times. Each lecture runs for around 20 minutes, leaving time for questions afterwards. The lectures do not require any specialist knowledge, but are focused around communicating the meaning and impact of open data in all areas of life. http://theodi.org/lunchtime-lectures Open data Institute: Certify your open data What does a certificate look like? It's a badge that links to a description of your open data. The description explores things like how often it's updated, what format it's in, who and where it came from. https://certificates.theodi.org/ The Text REtrieval Conference (TREC) Datasets The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the TIPSTER Text program. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. 
In particular, the TREC workshop series has the following goals: • to encourage research in information retrieval based on large test collections; • to increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas; • to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and • to increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems. TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a test set of documents and questions. Participants run their own retrieval systems on the data, and return to NIST a list of the retrieved top-ranked documents. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences. This evaluation effort has grown in both the number of participating systems and the number of tasks each year. Ninety-three groups representing 22 countries participated in TREC 2003. The TREC test collections and evaluation software are available to the retrieval research community at large, so organizations can evaluate their own retrieval systems at any time. TREC has successfully met its dual goals of improving the state-of-the-art in information retrieval and of facilitating technology transfer. Retrieval system effectiveness approximately doubled in the first six years of TREC. TREC has also sponsored the first large-scale evaluations of the retrieval of non-English (Spanish and Chinese) documents, retrieval of recordings of speech, and retrieval across multiple languages. 
TREC has also introduced evaluations for open-domain question answering and content-based retrieval of digital video. The TREC test collections are large enough so that they realistically model operational settings. Most of today's commercial search engines include technology first developed in TREC. http://trec.nist.gov/data.html HDX Humanitarian Data Exchange What is HDX? The goal of the Humanitarian Data Exchange (HDX) is to make humanitarian data easy to find and use for analysis. We are working on three elements that will eventually combine into an integrated data platform. Repository The HDX repository, where data providers can upload their raw data spreadsheets for others to find and use. Analytics HDX analytics, a database of high-value data that can be compared across countries and crises, with tools for analysis and visualisation. Standards Standards to help share humanitarian data through the use of a consensus Humanitarian Exchange Language. https://data.hdx.rwlabs.org/dataset World Data Bank Explore. Create. Share: Development Data DataBank is an analysis and visualisation tool that contains collections of time series data on a variety of topics. You can create your own queries; generate tables, charts, and maps; and easily save, embed, and share them. The World Bank Group has set two goals for the world to achieve by 2030: • End extreme poverty by decreasing the percentage of people living on less than $1.25 a day to no more than 3% • Promote shared prosperity by fostering the income growth of the bottom 40% for every country The World Bank is a vital source of financial and technical assistance to developing countries around the world. We are not a bank in the ordinary sense but a unique partnership to reduce poverty and support development. The World Bank Group comprises five institutions managed by their member countries. Established in 1944, the World Bank Group is headquartered in Washington, D.C. 
We have more than 10,000 employees in more than 120 offices worldwide. http://databank.worldbank.org/data/home.aspx US Dataset The home of the U.S. Government’s open data Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more. http://www.data.gov/ US City Open Data Census http://us-city.census.okfn.org/ Machine Learning repository The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. https://archive.ics.uci.edu/ml/datasets.html IMAGENET ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures. Who uses ImageNet? We envision ImageNet as a useful resource to researchers in the academic world, as well as educators around the world. Does ImageNet own the images? Can I download the images? 
No, ImageNet does not own the copyright of the images. ImageNet only provides thumbnails and URLs of images, in a way similar to what image search engines do. In other words, ImageNet compiles an accurate list of web images for each synset of WordNet. For researchers and educators who wish to use the images for non-commercial research and/or educational purposes, we can provide access through our site under certain conditions and terms.
http://www.image-net.org/
Stanford Large Network Dataset Collection
Social networks: online social networks, edges represent interactions between people
Networks with ground-truth communities: ground-truth network communities in social and information networks
Communication networks: email communication networks with edges representing communication
Citation networks: nodes represent papers, edges represent citations
Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
Web graphs: nodes represent webpages and edges are hyperlinks
Amazon networks: nodes represent products and edges link commonly co-purchased products
Internet networks: nodes represent computers and edges communication
Road networks: nodes represent intersections and edges roads connecting the intersections
Autonomous systems: graphs of the internet
Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
Location-based online social networks: social networks with geographic check-ins
Wikipedia networks and metadata: talk, editing and voting data from Wikipedia
Twitter and Memetracker: Memetracker phrases, links and 467 million Tweets
Online communities: data from online communities such as Reddit and Flickr
Online reviews: data from online review systems such as BeerAdvocate and Amazon
Information cascades: ...
http://snap.stanford.edu/data/
Deep Learning datasets
Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. This website is intended to host a variety of resources and pointers to information about Deep Learning. In these pages you will find:
• a reading list,
• links to software,
• datasets,
• a list of deep learning research groups and labs,
• a list of announcements for deep learning related jobs (job listings),
• as well as tutorials and cool demos.
http://deeplearning.net/datasets/
Open Government Data (OGD) Platform India
https://data.gov.in/
Yahoo Datasets
We have various types of data available to share. They are categorized into Ratings, Language, Graph, Advertising and Market Data, Computing Systems and an appendix of other relevant data and resources available via the Yahoo! Developer Network.
http://webscope.sandbox.yahoo.com/catalog.php
Windows Azure Marketplace
One-Stop Shop for Premium Data and Applications. Hundreds of Apps, Thousands of Subscriptions, Trillions of Data Points.
https://datamarket.azure.com/browse/data?price=free
Amazon Public Data Sets
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. Learn more about Public Data Sets on AWS and visit the Public Data Sets forum.
http://aws.amazon.com/datasets/
Wikipedia: Database Download
Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance).
All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
https://en.wikipedia.org/wiki/Wikipedia:Database_download
Gutenberg project (free books available in different formats, useful for NLP)
Project Gutenberg offers 45,541 free ebooks to download. (source: 5 June 2014)
https://www.gutenberg.org/
Freebase
Freebase data is free to use under an open license. You can:
• Query Freebase using our Search, Topic, or MQL APIs
• Download our weekly data dumps
http://www.freebase.com/
Datamob Data
http://datamob.org/datasets
Reddit Datasets
http://www.reddit.com/r/datasets/
100+ Interesting Data Sets for Statistics
Summary: Looking for interesting data sets? Here's a list of more than 100 of the best stuff, from dolphin relationships to political campaign donations to death row prisoners.
http://rs.io/100-interesting-data-sets-for-statistics/
Data portal of the City of Chicago
https://data.cityofchicago.org/
A gold mine where you can find data sets such as the names, salaries, and positions of everyone working for the City of Chicago!
https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w
Data portal of the City of Seattle
https://data.seattle.gov/browse
Data portal of the City of LA
https://data.lacity.org/
California Department of Water Resources
DWR has many programs and data tools to collect and disseminate information on water resources. All Water Data Topics…
http://www.water.ca.gov/nav/index.cfm?id=106
CALIFORNIA DATA EXCHANGE CENTER (CDEC)
With the cooperation of over 140 other agencies, the CDEC provides real-time, forecast, and historical hydrologic data.
This data includes water discharge in rivers, water storage in reservoirs, precipitation accumulation, and water content in snow pack, primarily focused on flood management. However, the data is also helpful for determining general water availability and natural supply trends. More about CDEC
http://cdec.water.ca.gov/
CALIFORNIA IRRIGATION MANAGEMENT INFORMATION SYSTEM (CIMIS)
CIMIS is a network of over 120 automated weather stations in California. CIMIS was developed in 1982 by DWR and the University of California, Davis to assist California's irrigators in managing their water resources efficiently. More about CIMIS
http://wwwcimis.water.ca.gov/
WATER DATA LIBRARY
The library provides geographic-based data on water conditions. More about the Water Data Library
http://www.water.ca.gov/waterdatalibrary/
INTERAGENCY ECOLOGICAL PROGRAM
The Interagency Ecological Program (IEP) provides ecological information and scientific leadership for use in management of the San Francisco Estuary. More about IEP
http://www.water.ca.gov/iep/
INTEGRATED WATER RESOURCES INFORMATION SYSTEM (IWRIS)
IWRIS is a one-stop shop for state-wide water resources information. It integrates multidisciplinary data to support Integrated Regional Water Management. More about IWRIS
http://www.water.ca.gov/iwris/
http://www.water.ca.gov/data_home.cfm
Data portal of the City of Dallas
https://www.dallasopendata.com/browse
Data portal of the City of Austin
https://data.austintexas.gov/
How to produce and use datasets: lessons learned, mlwave
http://mlwave.com/how-to-produce-and-use-datasets-lessons-learned/
MITx and HarvardX release MOOC datasets and visualization tools
http://newsoffice.mit.edu/2014/mitx-and-harvardx-release-mooc-datasets-and-vizualizationtools
Finding the perfect house using open data, Justin Palmer's Blog
http://dealloc.me/2014/05/24/opendata-house-hunting/
Synapse
A private or public workspace that allows you to aggregate, describe, and share your research.
A tool to improve reproducibility of data intensive science, recording progress as you work with tools such as R and Python. A set of living research projects enabling contribution to large-scale collaborative solutions to scientific problems.
https://www.synapse.org/
NYC Taxi Trips
Data from 2013. These data were made publicly available thanks to Chris Whong, who did the heavy lifting. He is also providing links to a bittorrent where the data can be downloaded much faster. Read more about it here.
http://www.andresmh.com/nyctaxitrips/
Sebastian Raschka's Dataset Collections
https://github.com/rasbt/pattern_classification/blob/master/resources/dataset_collections.md
Awesome Public Datasets by Xiaming Chen, Shanghai, China
This list of public data sources is collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free; however, some are not.
https://github.com/caesar0301/awesome-public-datasets
I am now a Ph.D. candidate with Prof. Yaohui Jin at Shanghai Jiao Tong University. I received my B.S. (2010) in Optical Information and Science Technology at Xidian University, Xi'an, China. My research interests come from the measurement and analysis of network traffic, especially renewed models and characteristics of network traffic, using data mining techniques and high-performance processing platforms like Network Processors and distributed processing systems like Hadoop/MapReduce or Spark. If you are interested in my articles, research, or projects, you can reach me via email or other channels such as GitHub. Enjoy!
:-)
http://xiaming.me/pages/about.html
UK Dataset
Opening up government
http://data.gov.uk/
LONDON DATASTORE - 601 datasets found (28-08-2015)
Welcome to the new look DataStore. Over the last few months we have been busy updating London Datastore to deliver a host of practical new features - improved (geography based) searches, dataset previews and APIs, all of which will make for a much sleeker experience. The technical improvements are there to support our broader aim of kick-starting collaboration so that the value of data in our city reaches its full potential. Have a look around, read the introductory blog and let us know what you think.
http://data.london.gov.uk/dataset
Transport For London Open Data, UK
https://tfl.gov.uk/info-for/open-data-users/our-open-data
Gaussian Processes List of Datasets
Welcome to the web site for theory and applications of Gaussian Processes. Gaussian Processes are a powerful non-parametric machine learning technique for constructing comprehensive probabilistic models of real world problems. They can be applied to geostatistics, supervised, unsupervised and reinforcement learning, principal component analysis, system identification and control, rendering music performance, optimization and many other tasks.
People: Geology & Modelling Research Group at Rio Tinto Centre for Mine Automation, ACFR, University of Sydney
http://gaussianprocess.com/datasets.php
The New York Times Linked Open Data
For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, we began to publish this vocabulary as linked open data.
The Data: As of 13 January 2010, The New York Times has published approximately 10,000 subject headings as linked open data under a CC BY license. We provide both RDF documents and human-friendly HTML versions. The table below gives a breakdown of the various tag types and mapping strategies on data.nytimes.com.
Type            Manually Mapped Tags    Automatically Mapped Tags    Total
People          4,978                   0                            4,978
Organizations   1,489                   1,592                        3,081
Locations       1,910                   0                            1,910
Descriptors     498                     0                            498
Total                                                                10,467
http://data.nytimes.com/
Google Public Data Explorer
The Google Public Data Explorer makes large, public-interest datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your own comparisons, and share your findings. Students, journalists, policy makers and everyone else can play with the tool to create visualizations of public data, link to them, or embed them in their own webpages. Embedded charts and links can update automatically so you're always sharing the latest available data. The Public Data Explorer launched in March, 2010. See this blog post, which originally announced the product, for more background and historical perspective.
https://www.google.com/publicdata/directory?hl=en_US&dl=en_US
The Million Song Dataset
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are:
- To encourage research on algorithms that scale to commercial sizes
- To provide a reference dataset for evaluating research
- As a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
- To help new researchers get started in the MIR field
http://labrosa.ee.columbia.edu/millionsong/
CrowdFlower Open Data Library
CrowdFlower encourages developers and researchers to use its open data to explore what crowdsourcing can achieve. This webpage is a repository of data sets collected or enhanced by CrowdFlower's workforce and made available for everyone to use.
http://www.crowdflower.com/data-for-everyone
OPEN DATASETS, in French
Montreal, Portail Données Ouvertes (French & English), Canada
http://donnees.ville.montreal.qc.ca/
Insee, France
http://www.insee.fr/fr/publications-et-services/depliant_webinsee.pdf
RATP Open Data (the Paris metro operator), France
http://data.ratp.fr/explore/
L'Open-Data français cartographié (French open data, mapped)
Here are three maps of the French open-data ecosphere. On a black background, the three posters (downloadable in "A0" format) give a general overview of the current French open-data landscape. The three maps are based on data provided by Data-Publica, in particular two studies recently carried out by Guillaume Lebourgeois, Pierrick Boitel and Perrine Letellier (the latter two attended my course at UTC last semester). The aim of these maps is to begin a fairly complete "radiography" of the field, repeatable over time (perhaps every six months) and directly linked to the data held by Data Publica. In short, a kind of observatory of French open data that I am launching through the work of the Atelier de Cartographie.
http://ateliercartographie.wordpress.com/2012/09/23/lopen-data-francais-cartographie/
OPEN DATASETS, China
Lamda Group Data
• Image Data for Multi-Instance Multi-Label Learning
• MDDM Data for multi-label dimensionality reduction
• Text Data for Multi-Instance Learning
• MILWEB Data for Multi-Instance Learning Based Web Index Recommendation
• SGBDota Data for the PCES (Positive Concept Expansion with Single snapshot) problem
• Single Face Dataset Data for Face Recognition with One Training Image per Person
• Text Data for Multi-Instance Multi-Label Learning
http://lamda.nju.edu.cn/Data.ashx
DATA VISUALIZATION
Visualization Lab Gallery, Computer Science Division, University of California, Berkeley
CS 294-10 Fall '14 Visualization. Instructors: Maneesh Agrawala and Jessica Hullman. Course Wiki
CS 160 Spring '14 User Interface Design. Instructors: Maneesh Agrawala and Bjoern Hartmann. TAs: Brittany Cheng, Steve Rubin, and Eric Xiao. Course Wiki
CS 294-10 Fall '13 Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 160 Spring '12 User Interface Design. Instructor: Maneesh Agrawala. TAs: Nicholas Kong, Anuj Tewari. Course Wiki
CS 294-69 Fall '11 Image Manipulation and Computational Photography. Instructor: Maneesh Agrawala. TA: Floraine Berthouzoz. Course Wiki
CS 294-10 Spring '11 Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 184 Fall '10 Computer Graphics. Instructor: Maneesh Agrawala. TAs: Robert Carroll, Fu-Chung Huang. Course Wiki
CS 160 Spring '10 User Interface. Instructors: Bjoern Hartmann, Maneesh Agrawala. TAs: Kenrick Kin, Anuj Tewari. Course Wiki
CS 294-10 Spring '10 Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 160 Spring '09 User Interfaces. Instructors: Maneesh Agrawala, Jeffrey Nichols. TAs: Nicholas Kong. Course Wiki
CS 294-10 Fall '08 Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 160 Spring '08 User Interfaces. Instructor: Maneesh Agrawala. TAs: Wesley Willett and Seth Horrigan. Course Wiki
CS 294-10 Fall '07 Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 160 Fall '06 User Interfaces. Instructor: Maneesh Agrawala. TAs: David Sun and Jerry Yu. Course Wiki
CS 294-10 Spring '06 Visualization. Organizers: Maneesh Agrawala, Jeffrey Heer. Course Wiki
http://vis.berkeley.edu/courses/cs294-10-fa14/wiki/index.php/Visualization_Gallery
Visualization Lab Software, Computer Science Division, University of California, Berkeley
http://vis.berkeley.edu/software/
Visualization Lab Course Wiki, Computer Science Division, University of California,
Berkeley
http://vis.berkeley.edu/courses/
Mike Bostock
Visualizing algorithms
http://bost.ocks.org/mike/
Eyeo Festival
Eyeo assembles an incredible set of creative coders, data designers, artists, and attendees. Expect enthralling talks, unique workshops and interactions with open source instigators and super fascinating practitioners. Join us for an extraordinary festival.
http://eyeofestival.com/
MIT Data Collider
A new language for data visualisation
http://datacollider.io/
D3 JS Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3's emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
http://d3js.org/
Shan He, Research Fellow at MIT Senseable City Lab
Shan He is a research fellow at MIT Senseable City Lab. She is an architect and a computational design specialist. She is currently a student at the MIT Department of Architecture pursuing her SMArchS in Design and Computation. At Senseable, her focus is on data visualization, interactive design and web applications. Prior to coming to MIT she worked as a product designer for Blu Homes, where she worked on developing an online 3-D customization tool with intellectual property. During her time at MIT she has worked as a research assistant for the Clean Energy City Lab at the Advanced Urbanism Center and also for the Mobile Experience Lab at the CMS. Shan holds a B.Arch from Tsinghua University in China and an M.Arch from the University of Michigan, Ann Arbor.
http://cargocollective.com/shanhe/About-Shan-He
Gource, software version control visualization
Software projects are displayed by Gource as an animated tree with the root directory of the project at its centre. Directories appear as branches with files as leaves.
Developers can be seen working on the tree at the times they contributed to the project.
https://www.youtube.com/watch?v=NjUuAuBcoqs
https://code.google.com/p/gource/
Logstalgia, website access log visualization
Logstalgia (aka ApachePong) is a website access log visualization tool.
https://code.google.com/p/logstalgia/
Andrew Caudwell's Blog
Andrew Caudwell is a software developer and sometimes computer graphics programmer/artist located in Wellington, New Zealand. He is probably best known through his work as the author of several popular data visualizations: Logstalgia (aka Apache Pong), a visualization of website traffic as a pong-like game, and Gource, a force-directed layout software version control visualization. This blog is a collection of his work, experiments, thoughts and ideas on procedurally generated computer graphics and animation.
http://www.thealphablenders.com/
MLDemos, EPFL, Switzerland
MLDemos is an open-source visualization tool for machine learning algorithms, created to help study and understand how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. MLDemos is open-source and free for personal and academic use.
http://mldemos.epfl.ch/
The University of Florida Sparse Matrix Collection
We describe the University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications. The Collection is widely used by the numerical linear algebra community for the development and performance evaluation of sparse matrix algorithms. It allows for robust and repeatable experiments: robust because performance results with artificially-generated matrices can be misleading, and repeatable because matrices are curated and made publicly available in many formats.
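Among those formats is the plain-text MatrixMarket exchange format. As a sketch of what that format looks like, here is a minimal Python parser for its coordinate layout; the sample matrix below is invented, and in practice one would use an established reader such as scipy.io.mmread.

```python
# Minimal parser for the MatrixMarket coordinate format: a header line,
# optional '%' comments, a "rows cols nnz" size line, then one
# "row col value" triplet per nonzero (1-based indices).

MTX = """%%MatrixMarket matrix coordinate real general
% 3x3 sparse matrix with 4 nonzeros (invented sample)
3 3 4
1 1 2.0
2 2 -1.5
3 1 4.0
3 3 0.5
"""

def parse_mm(text):
    """Return ((rows, cols), triplets) from MatrixMarket coordinate text."""
    lines = [l for l in text.splitlines() if l and not l.startswith("%")]
    rows, cols, nnz = map(int, lines[0].split())
    triplets = []
    for entry in lines[1:]:
        i, j, v = entry.split()
        triplets.append((int(i) - 1, int(j) - 1, float(v)))  # 1-based -> 0-based
    assert len(triplets) == nnz, "entry count must match the declared nnz"
    return (rows, cols), triplets

shape, triplets = parse_mm(MTX)
```

The triplet (COO) representation produced here is exactly what sparse libraries build their compressed formats from.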
Its matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs). We provide software for accessing and managing the Collection, from MATLAB, Mathematica, Fortran, and C, as well as an online search capability. Graph visualization of the matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task.
http://www.cise.ufl.edu/research/sparse/matrices/
Visualization & Graphics Lab, Dept. of CSA and SERC, Indian Institute of Science, Bangalore
This is the video channel of the Visualization & Graphics Lab (http://vgl.serc.iisc.ernet.in), which is part of the Dept. of CSA and SERC, Indian Institute of Science, Bangalore. It contains videos created by the members of the lab as part of their research.
https://www.youtube.com/user/vgliisc/videos?spfreload=10
Allison McCann
Allison McCann is a visual journalist and data reporter for FiveThirtyEight.
http://allisontmccann.com/
Scott Murray
I write software that generates images and interactive experiences. I'm interested in data visualization, generative art, and designed experiences that encourage people to slow down and reflect. I am an Assistant Professor of Design at USF, a contributor to Processing, and the author of Interactive Data Visualization for the Web. I studied at MassArt's Dynamic Media Institute (M.F.A. 2010) and Vassar College (A.B. 2001).
Website: The energetic particles on the home page were created with Processing and Processing.js.
Site content is managed in a database-free environment with Kirby. Changes are pushed with git to magical boxes at Pagoda Box, where the files are hosted. Site analytics magic performed by Piwik. The site was made mobile-friendly through a combination of CSS3 media queries and JavaScript.
http://alignedleft.com/
Gephi: The Open Graph Viz Platform
Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs. Runs on Windows, Linux and Mac OS X. Gephi is open-source and free. What else? ;-)
Gephi is open-source software for network visualization and analysis. It helps data analysts to intuitively reveal patterns and trends, highlight outliers and tell stories with their data. It uses a 3D render engine to display large graphs in real time and to speed up exploration. Gephi combines built-in functionality and a flexible architecture to explore, analyze, spatialize, filter, cluster, manipulate, and export all types of networks. Gephi is based on a visualize-and-manipulate paradigm which allows any user to discover networks and data properties. Moreover, it is designed to follow the chain of a case study, from data file to nice printable maps. Gephi is free/libre software distributed under the GPL 3 ("GNU General Public License").
Tags: network, network science, infovis, visualization, visual analytics, exploratory data analysis, graph, graph viz, graph theory, complex network, software, open source, science
https://gephi.github.io/features/
http://gephi.github.io/
Data Analysis and Visualization Using R by David Robinson
This is a course that combines video, HTML and interactive elements to teach the statistical programming language R.
http://varianceexplained.org/RData/
Visualising Data Blog (huge list of resources, great blog!)
About Andy Kirk
Andy Kirk is a UK-based freelance data visualisation specialist: a design consultant, training provider, author, editor of visualisingdata.com, speaker and researcher. Between January 2014 and March 2015 Andy worked as a co-investigator on a research project called 'Seeing Data', funded by the Arts & Humanities Research Council and hosted by the University of Leeds. This study explored the issue of visualisation literacy amongst the general public.
http://www.visualisingdata.com/index.php/blog/
http://www.visualisingdata.com/index.php/resources/
The 8 Hats of Data Visualisation Design by Andy Kirk
The nature of data visualization as a truly multi-disciplinary subject introduces many challenges. You might be a creative, but how are your analytical skills? Good at closing out a design, but how about the initial research and data sourcing? In this talk Andy Kirk will discuss the many different 'hats' a visualization designer needs to wear in order to effectively deliver against these demands. It will also contextualize these duties in the sense of a data visualization project timeline. Whether a single person will fulfill these roles, or a team collaboration will be set up to cover all bases, this presentation will help you understand the requirements of any visualization problem context.
https://vimeo.com/44886980
Andy Kirk, visualisation consultant, at Big Data Week 2013
https://www.youtube.com/watch?v=13weAkpSdWk&spfreload=10
Image Gallery by the Arts and Humanities Research Council, UK
Images are generated and used in the arts and humanities in a wide variety of ways and for a range of purposes: as computer-generated (CGI) or computer-enhanced images, virtual reality representations and visualisations, digitised images from museums, libraries and archives, design and architectural blueprints, photographs, cartoons, newspapers, maps and much else.
The AHRC Image Gallery is designed to showcase the range of digital images generated either as by-products or as outputs of research projects in the arts and humanities, as a means of highlighting the richness and diversity of images created and used within the arts and humanities and of showcasing the talents of those who create them, including doctoral students and early career researchers.
http://www.ahrc.ac.uk/News-and-Events/Image-Gallery/Pages/Image-Gallery.aspx
Setosa.io by Victor Powell & Lewis Lehe
interactive = intuitive
substance > information
http://setosa.io/#/
BOOKS, in English
2015
Bayesian Reasoning and Machine Learning, David Barber, 2012 (online version 04-2015)
Machine learning methods extract value from vast data sets quickly and with modest resources. They are established tools in a wide range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly. People who know the methods have their choice of rewarding jobs. This hands-on text opens these opportunities to computer science students with modest mathematical backgrounds. It is designed for final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models. Students learn more than a menu of techniques; they develop analytical and problem-solving skills that equip them for the real world. Numerous examples and exercises, both computer based and theoretical, are included in every chapter. Resources for students and instructors, including a MATLAB toolbox, are available online.
http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=Brml.Online
Deep Learning (Artificial Intelligence), an MIT Press book in preparation, by Yoshua Bengio, Ian Goodfellow and Aaron Courville, Jul-2015
Please help us make this a great book!
This draft is still full of typos and can be improved in many ways. Your suggestions are more than welcome. Do not hesitate to contact any of the authors directly by e-mail or Google messages: Yoshua, Ian, Aaron.
Table of Contents:
Deep Learning for AI
Linear Algebra
Probability and Information Theory
Numerical Computation
Machine Learning Basics
Feedforward Deep Networks
Structured Probabilistic Models: A Deep Learning Perspective
Unsupervised and Transfer Learning
Convolutional Networks
Sequence Modeling: Recurrent and Recursive Nets
The Manifold Perspective on Auto-Encoders
Confronting the Partition Function
References
http://www.iro.umontreal.ca/~bengioy/dlbook/
Neural Networks and Deep Learning by Michael Nielsen, 2015
Neural Networks and Deep Learning is a free online book. The book will teach you about:
• Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
• Deep learning, a powerful set of techniques for learning in neural networks
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts behind neural networks and deep learning. The book is currently an incomplete beta draft. More chapters will be added over the coming months. For now, you can:
• Read Chapter 1, which explains how neural networks can learn to recognize handwriting
• Read Chapter 2, which explains backpropagation, the most important algorithm used to learn in neural networks
http://neuralnetworksanddeeplearning.com/index.html
2014
An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data volumes and stalling processor speeds require more and more applications to scale out to distributed systems.
Today, a myriad of data sources, from the Internet to business operations to scientific instruments, produce large and valuable data streams. However, the processing capabilities of single machines have not kept up with the size of data, making it harder and harder to put to use. As a result, a growing number of organizations, not just web companies but traditional enterprises and research labs, need to scale out their most important computations to clusters of hundreds of machines. At the same time, the speed and sophistication required of data processing have grown. In addition to simple queries, complex algorithms like machine learning and graph analysis are becoming common in many domains. And in addition to batch processing, streaming analysis of new real-time data sources is required to let organizations take timely action. Future computing platforms will need to not only scale out traditional workloads, but support these new applications as well. This dissertation proposes an architecture for cluster computing systems that can tackle emerging data processing workloads while coping with larger and larger scales. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping the scalability and fault tolerance of previous systems. And whereas most deployed systems only support simple one-pass computations (e.g., aggregation or SQL queries), ours also extends to the multi-pass algorithms required for more complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing, or SQL and complex analytics.
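The batch model that these cluster systems generalize is the familiar MapReduce pattern: a map over input records emitting key-value pairs, followed by a grouped reduce. A toy word count in plain Python (an illustration of the programming model only, not the actual Spark or Hadoop API):

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit one (word, 1) pair per word in each input record.
    for line in records:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle: group pairs by key; Reduce: sum the counts in each group.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(map_phase(["to be or not to be"]))
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

An iterative algorithm such as gradient descent must re-run a pipeline like this once per pass over the data, which is exactly the multi-pass inefficiency the dissertation's in-memory data sharing addresses.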
We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to efficiently capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using both synthetic benchmarks and real user applications. Spark matches or exceeds the performance of specialized systems in many application domains, while offering stronger fault tolerance guarantees and allowing these workloads to be combined. We explore the generality of RDDs from both a theoretical modeling perspective and a practical perspective to see why this extension can capture a wide range of previously disparate workloads. http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf Deep Learning Tutorial by LISA Lab, University of Montreal, 2014 The tutorials presented here will introduce you to some of the most important deep learning algorithms and will also show you how to run them using Theano. Theano is a Python library that makes writing deep learning models easy, and gives the option of training them on a GPU. The algorithm tutorials have some prerequisites. You should know some Python, and be familiar with numpy. Since this tutorial is about using Theano, you should read over the Theano basic tutorial first. Once you’ve done that, read through our Getting Started chapter; it introduces the notation and [downloadable] datasets used in the algorithm tutorials, and the way we do optimization by stochastic gradient descent. The purely supervised learning algorithms are meant to be read in order: 1. Logistic Regression - using Theano for something simple 2. Multilayer perceptron - introduction to layers 3.
Deep Convolutional Network - a simplified version of LeNet5 The unsupervised and semi-supervised learning algorithms can be read in any order (the autoencoders can be read independently of the RBM/DBN thread): • Auto Encoders, Denoising Autoencoders - description of autoencoders • Stacked Denoising Auto-Encoders - easy steps into unsupervised pre-training for deep nets • Restricted Boltzmann Machines - single layer generative RBM model • Deep Belief Networks - unsupervised generative pre-training of stacked RBMs followed by supervised fine-tuning Building towards including the mcRBM model, we have a new tutorial on sampling from energy models: • HMC Sampling - hybrid (aka Hamiltonian) Monte-Carlo sampling with scan() Building towards including the Contractive auto-encoders tutorial, we have the code for now: • Contractive auto-encoders code - There is some basic doc in the code. Energy-based recurrent neural network (RNN-RBM): • Modeling and generating sequences of polyphonic music http://deeplearning.net/tutorial/deeplearning.pdf Statistical Inference for Everyone, by Professor Brian Blais, 2014 This is a new approach to an introductory statistical inference textbook, motivated by probability theory as logic. It is targeted to the typical Statistics 101 college student, and covers the topics typically covered in the first semester of such a course. It is freely available under the Creative Commons License, and includes a software library in Python for making some of the calculations and visualizations easier. I am a professor of Science and Technology at Bryant University and a research professor at the Institute for Brain and Neural Systems, Brown University.
My interests include: theoretical neuroscience (learning and memory in neural systems, vision, spike-timing dependent plasticity); Bayesian inference (frequentist versus Bayesian statistics, Bayesian approaches to learning and memory); digital-to-analog computer control (autonomous experiments, neural networks and robotics); and global resources (dynamics of global resources and economics; population growth, Malthusian traps, and energy). http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014 The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining). The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites. To support deeper explorations, most of the chapters are supplemented with further reading references. The Mining of Massive Datasets book has been published by Cambridge University Press. You can get a 20% discount here. By agreement with the publisher, you can download the book for free from this page. Cambridge University Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3. We welcome your feedback on the manuscript. The 2nd edition of the book (v2.1) The following is the second edition of the book. There are three new chapters, on mining large graphs, dimensionality reduction, and machine learning. There is also a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice.
Together with each chapter there is also a set of lecture slides that we use for teaching the Stanford CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the material covered in the corresponding chapters. Download the latest version of the book as a single big PDF file (511 pages, 3 MB). Note to the users of provided slides: We would be delighted if you found our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org/. Comments and corrections are most welcome. Please let us know if you are using these materials in your course and we will list and link to your course. http://infolab.stanford.edu/~ullman/mmds/book.pdf Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014 The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining.
Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining. http://dmml.asu.edu/smm/book/ Slides http://dmml.asu.edu/smm/slides/ Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft The book provides a cohesive presentation of concepts of, and methods for, causal inference. Much of this material is currently scattered across journals in several disciplines or confined to technical articles. We expect that the book will be of interest to anyone interested in causal inference, e.g., epidemiologists, statisticians, psychologists, economists, sociologists, other social scientists… The book is geared towards graduate students and practitioners. We have divided the book into three parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from complex longitudinal data. We will make drafts of selected book sections available on this website. The idea is that interested readers can submit suggestions or criticisms before the book is published. If you wish to share any comments, please email me or visit us on Facebook (user causalinference). Warning: These documents are drafts. We are constantly revising and correcting errors without documenting the changes. Please make sure you use the most updated version posted here.
http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ Slides for High Performance Python tutorial at EuroSciPy, 2014 by Ian Ozsvald This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me. https://github.com/ianozsvald/euroscipy2014_highperformancepython http://ianozsvald.com/2014/08/30/slides-for-high-performance-python-tutorial-at-euroscipy2014-book-signing/ Probabilistic Programming and Bayesian Methods for Hackers by Cameron Davidson-Pilon, 2014 Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course, as an introductory book, we can only leave it at that: an introductory book. The mathematically trained may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining. https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers Past, Present, and Future of Statistical Science by COPSS, 2014 http://nisla05.niss.org/copss/past-present-future-copss.pdf Essentials of Metaheuristics by Sean Luke, 2014 This is an open set of lecture notes on metaheuristics algorithms, intended for undergraduate students, practitioners, programmers, and other non-experts. It was developed as a series of lecture notes for an undergraduate course I taught at GMU.
The chapters are designed to be printable separately if necessary. As these are lecture notes, the topics are short and light on examples and theory; the book is best used to complement other texts. With time, I might remedy this. http://cs.gmu.edu/~sean/book/metaheuristics/ 2013 Interactive Data Visualization for the Web by Scott Murray, 2013 Read online for free on the publisher's website This online version of Interactive Data Visualization for the Web includes 44 examples that will show you how to best represent your interactive data. For instance, you'll learn how to create a simple force layout with 10 nodes and 12 edges. This step-by-step guide is ideal whether you're a designer or visual artist with no programming experience, a reporter exploring the new frontier of data journalism, or anyone who wants to visualize and share data. Create and publish your own interactive data visualization projects on the Web even if you have little or no experience with data visualization or web development. It's easy and fun with this practical, hands-on introduction. Author Scott Murray teaches you the fundamental concepts and methods of D3, a JavaScript library that lets you express data visually in a web browser. Along the way, you'll expand your web programming skills, using tools such as HTML and JavaScript. http://chimera.labs.oreilly.com/books/1230000000345 Statistical Model Building, Machine Learning, and the Ah-Ha Moment by Grace Wahba, 2013 https://archive.org/details/arxiv-1303.5153 An Introduction to Statistical Learning with applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, 2013 (first printing) http://web.stanford.edu/~hastie/local.ftp/Springer/ISLR print1.pdf 2012 Reinforcement Learning by Richard S. Sutton and Andrew G.
Barto, 2012, Second edition in progress (PDF) This introductory textbook on reinforcement learning is targeted toward engineers and scientists in artificial intelligence, operations research, neural networks, and control systems, and we hope it will also be of interest to psychologists and neuroscientists. ... A second edition is incomplete and in progress, but also perfectly usable. Feedback is welcome; send your comments to rich@richsutton.com. http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html R Graphics Cookbook Code Resources (Graphs with ggplot2) by Winston Chang, 2012 My book about data visualization in R is available! The book covers many of the same topics as the Graphs and Data Manipulation sections of this website, but it goes into more depth and covers a broader range of techniques. You can preview it at Google Books. http://www.cookbook-r.com/Graphs/ Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves, 2012 Structure of the Book The chapters are roughly grouped into three parts: background material is presented in Chapters 2-4, Chapters 5 and 6 are primarily experimental, and new methods are introduced in Chapters 7-9. Chapter 2 briefly reviews supervised learning in general, and pattern classification in particular. It also provides a formal definition of sequence labelling, and discusses three classes of sequence labelling task that arise under different relationships between the input and label sequences. Chapter 3 provides background material for feedforward and recurrent neural networks, with emphasis on their application to labelling and classification tasks. It also introduces the sequential Jacobian as a tool for analysing the use of context by RNNs. Chapter 4 describes the LSTM architecture and introduces bidirectional LSTM (BLSTM). Chapter 5 contains an experimental comparison of BLSTM to other neural network architectures applied to framewise phoneme classification.
Chapter 6 investigates the use of LSTM in hidden Markov model-neural network hybrids. Chapter 7 introduces connectionist temporal classification, Chapter 8 covers multidimensional networks, and hierarchical subsampling networks are described in Chapter 9. http://www.cs.toronto.edu/~graves/preprint.pdf A Course in Machine Learning by Hal Daumé, 2012 Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need to make sense of data is a potential consumer of machine learning. CIML is a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some. http://ciml.info/ Machine Learning in Action, Peter Harrington, 2012 Chapters 1 and 7 are available for free on the publisher's website http://www.manning.com/pharrington/MLiAchapter1sample.pdf http://www.manning.com/pharrington/MLiAchapter7sample.pdf A Programmer's Guide to Data Mining, by Ron Zacharski, 2012 About This Book Before you is a tool for learning basic data mining techniques. Most data mining textbooks focus on providing a theoretical foundation for data mining, and as a result, may seem notoriously difficult to understand. Don't get me wrong, the information in those books is extremely important. However, if you are a programmer interested in learning a bit about data mining you might be interested in a beginner's hands-on guide as a first step. That's what this book provides. This guide follows a learn-by-doing approach.
Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques. This book is available for download for free under a Creative Commons license (see link in footer). You are free to share the book, and remix it. Someday I may offer a paper copy, but the online version will always be free. http://guidetodatamining.com/ 2010 Artificial Intelligence, Foundations of Computational Agents by David Poole and Alan Mackworth, 2010 Artificial Intelligence: Foundations of Computational Agents is a book about the science of artificial intelligence (AI). The view we take is that AI is the study of the design of intelligent computational agents. The book is structured as a textbook but it is designed to be accessible to a wide audience. We wrote this book because we are excited about the emergence of AI as an integrated science. As with any science worth its salt, AI has a coherent, formal theory and a rambunctious experimental wing. Here we balance theory and experiment and show how to link them intimately together. We develop the science of AI together with its engineering applications. We believe the adage, "There is nothing so practical as a good theory." The spirit of our approach is captured by the dictum, "Everything should be made as simple as possible, but not simpler." We must build the science on solid foundations; we present the foundations, but only sketch, and give some examples of, the complexity required to build useful intelligent systems. Although the resulting systems will be complex, the foundations and the building blocks should be simple.
http://artint.info/html/ArtInt.html Introduction to Machine Learning by Ethem Alpaydın, MIT Press, Second Edition, 2010, 579 pages Contents: 1 Introduction; 2 Supervised Learning; 3 Bayesian Decision Theory; 4 Parametric Methods; 5 Multivariate Methods; 6 Dimensionality Reduction; 7 Clustering; 8 Nonparametric Methods; 9 Decision Trees; 10 Linear Discrimination; 11 Multilayer Perceptrons; 12 Local Models; 13 Kernel Machines; 14 Bayesian Estimation; 15 Hidden Markov Models; 16 Graphical Models; 17 Combining Multiple Learners; 18 Reinforcement Learning; 19 Design and Analysis of Machine Learning Experiments; Appendix A: Probability. http://www.cmpe.boun.edu.tr/~ethem/i2ml2e/index.html 2009 The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009 During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. http://statweb.stanford.edu/~tibs/ElemStatLearn/ Learning Deep Architectures for AI by Yoshua Bengio, 2009 Abstract Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae reusing many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks. http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf An Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2009 This book is the result of a series of courses we have taught at Stanford University and at the University of Stuttgart, in a range of durations including a single quarter, one semester and two quarters.
These courses were aimed at early-stage graduate students in computer science, but we have also had enrollment from upper-class computer science undergraduates, as well as students from law, medical informatics, statistics, linguistics and various engineering disciplines. The key design principle for this book, therefore, was to cover what we believe to be important in a one-term graduate course on information retrieval. An additional principle is to build each chapter around material that we believe can be covered in a single lecture of 75 to 90 minutes. The first eight chapters of the book are devoted to the basics of information retrieval, and in particular the heart of search engines; we consider this material to be core to any course on information retrieval. … Chapters 9-21 build on the foundation of the first eight chapters to cover a variety of more advanced topics. http://www-nlp.stanford.edu/IR-book/ http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf 2008 Kernel Methods in Machine Learning by Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, 2008 We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data. https://archive.org/details/arxiv-math0701907 Introduction to Machine Learning, Alex Smola, S.V.N.
Vishwanathan, 2008 Over the past two decades Machine Learning has become one of the mainstays of information technology and with that, a rather central, albeit usually hidden, part of our life. With the ever increasing amounts of data becoming available there is good reason to believe that smart data analysis will become even more pervasive as a necessary ingredient for technological progress. The purpose of this chapter is to provide the reader with an overview of the vast range of applications which have at their heart a machine learning problem and to bring some degree of order to the zoo of problems. After that, we will discuss some basic tools from statistics and probability theory, since they form the language in which many machine learning problems must be phrased to become amenable to solving. Finally, we will outline a set of fairly basic yet effective algorithms to solve an important problem, namely that of classification. More sophisticated tools, a discussion of more general problems and a detailed analysis will follow in later parts of the book. http://alex.smola.org/drafts/thebook.pdf 2006 Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years. In particular, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic models. Also, the practical applicability of Bayesian methods has been greatly enhanced through the development of a range of approximate inference algorithms such as variational Bayes and expectation propagation. Similarly, new models based on kernels have had significant impact on both algorithms and applications.
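As a minimal numeric illustration of the Bayesian methods the blurb highlights, Bayes' rule can be evaluated in a few lines of plain Python (a generic textbook example with made-up numbers, not code from the book): suppose a disease has 1% prevalence and a test has 90% sensitivity and 95% specificity.

```python
# Bayes' rule: P(disease | positive) =
#   P(positive | disease) * P(disease) / P(positive)
p_disease = 0.01            # prior: prevalence of the disease
p_pos_given_disease = 0.90  # sensitivity of the test
p_pos_given_healthy = 0.05  # false positive rate (1 - specificity)

# Total probability of a positive test, summing over both causes.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # 0.154
```

Even a strongly positive test leaves the posterior low when the prior is small; graphical models, introduced next, organize exactly this kind of reasoning across many interacting variables.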
Chapter 8 Graphical Models Probabilities play a central role in modern pattern recognition. We have seen in Chapter 1 that probability theory can be expressed in terms of two simple equations corresponding to the sum rule and the product rule. All of the probabilistic inference and learning manipulations discussed in this book, no matter how complex, amount to repeated application of these two equations. We could therefore proceed to formulate and solve complicated probabilistic models purely by algebraic manipulation. However, we shall find it highly advantageous to augment the analysis using diagrammatic representations of probability distributions, called probabilistic graphical models. These offer several useful properties: 1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models. 2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph. 3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly. http://research.microsoft.com/en-us/um/people/cmbishop/PRML/pdf/Bishop-PRMLsample.pdf http://research.microsoft.com/en-us/um/people/cmbishop/prml/ Gaussian Processes for Machine Learning, C. Rasmussen and C. Williams, 2006 Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning.
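As a taste of what the book covers, GP regression fits in a few lines of numpy: with kernel matrix K on the training inputs and cross-kernel K_* to the test inputs, the posterior mean is K_*^T (K + sigma^2 I)^{-1} y. This is a standard textbook computation; the RBF kernel and hyperparameter values below are arbitrary illustrative choices, not taken from the book.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential (RBF) covariance between two sets of 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Noisy observations of a sine curve.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 5, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(0, 5, 50)

noise_var = 0.1 ** 2
K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
K_star = rbf_kernel(x_train, x_test)

# Posterior mean and variance of the latent function at the test points.
mean = K_star.T @ np.linalg.solve(K, y_train)
var = (rbf_kernel(x_test, x_test)
       - K_star.T @ np.linalg.solve(K, K_star)).diagonal()
print(mean.shape)  # (50,)
```

In practice the book derives a numerically stable Cholesky-based version of this computation; the naive solve above is only for exposition.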
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes. http://www.gaussianprocess.org/gpml/chapters/ 2005 Bayesian Machine Learning by Sounak Chakraborty, 2005 PhD Thesis https://archive.org/details/bayesianmachinel00chak Machine Learning by Tom Mitchell, 2005 Policy on use: You are welcome to download these chapters for your personal use, or for use in classes you teach. In return, I ask only two things: • Please do not re-post these documents on the internet. If you wish to make them available to your students, point them directly to this site. • If you find errors please send me email at Tom.Mitchell@cmu.edu I hope you find these useful! Tom Mitchell http://www.cs.cmu.edu/~tom/NewChapters.html http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html 2003 Information Theory, Inference, and Learning Algorithms, David MacKay, 2003 This book is aimed at senior undergraduates and graduate students in Engineering, Science, Mathematics, and Computing.
It expects familiarity with calculus, probability theory, and linear algebra as taught in a first- or second-year undergraduate course on mathematics for scientists and engineers. Conventional courses on information theory cover not only the beautiful theoretical ideas of Shannon, but also practical solutions to communication problems. This book goes further, bringing in Bayesian data modelling, Monte Carlo methods, variational methods, clustering algorithms, and neural networks. Why unify information theory and machine learning? Because they are two sides of the same coin. In the 1960s, a single field, cybernetics, was populated by information theorists, computer scientists, and neuroscientists, all studying common problems. Information theory and machine learning still belong together. Brains are the ultimate compression and communication systems. And the state-of-the-art algorithms for both data compression and error-correcting codes use the same tools as machine learning. http://www.inference.phy.cam.ac.uk/itprnn/book.html https://archive.org/details/MackayInformationTheoryFreeEbookReleasedByAuthor MISCELLANEOUS Free Book List E-Books for free online viewing and/or download http://www.e-booksdirectory.com/listing.php?category=284 Free resource book (need to sign in) There are too many machine learning resources on the internet, so much so that it can feel overwhelming. I have read the books and taken the courses and can give you good advice on where to start. Resources you can use to learn faster I have hand-picked the best machine learning… …books …websites …videos …university courses …software …competition sites These resources have been listed in a handy PDF that you can download now http://machinelearningmastery.com/machine-learning-resources/ Wikipedia: Machine Learning, the Complete Guide This is a Wikipedia book, a collection of Wikipedia articles that can be easily saved, rendered electronically, and ordered as a printed book.
For information and help on Wikipedia books in general, see Help:Books (general tips) and WikiProject Wikipedia-Books (questions and assistance). https://en.wikipedia.org/wiki/Book:Machine_Learning_%E2%80%93_The_Complete_Guide ISSUU Rediscover reading With over 19 million publications, Issuu is the fastest growing digital publishing platform in the world. Millions of avid readers come here every day to read the free publications created by enthusiastic publishers from all over the globe with topics in fashion, lifestyle, art, sports and global affairs to mention a few. And that's not all. We've also got a prominent range of independent publishers utilizing the Issuu network to reach new fans every day. Created by a bunch of geeks with an undying love for the publishing industry, Issuu has grown to become one of the biggest publishing networks in the industry. It's an archive, library and newsstand all gathered in one reading experience. http://issuu.com/search?q=%22machine learning%22 Neural Networks, A Systematic Introduction by Raul Rojas We are now beginning to see good textbooks for introducing the subject to various student groups. This book by Raúl Rojas is aimed at advanced undergraduates in computer science and mathematics. This is a revised version of his German text, which has been quite successful. It is also a valuable self-instruction source for professionals interested in the relation of neural network ideas to theoretical computer science and articulating disciplines. The book is divided into eighteen chapters, each designed to be taught in about one week. The first eight chapters follow a progression and the later ones can be covered in a variety of orders. The emphasis throughout is on explicating the computational nature of the structures and processes and relating them to other computational formalisms. Proofs are rigorous, but not overly formal, and there is extensive use of geometric intuition and diagrams.
Specific applications are discussed, with the emphasis on computational rather than engineering issues. There is a modest number of exercises at the end of most chapters. http://www.inf.fu-berlin.de/inst/ag-ki/rojas home/documents/1996/NeuralNetworks/neuron.pdf BOOKS, in Spanish Coming soon … BOOKS, in Portuguese Coming soon … BOOKS, in German Coming soon … BOOKS, in Italian Coming soon … BOOKS, in French Coming soon … BOOKS, in Russian Pattern Recognition by А.Б.Мерков, 2014 http://www.recognition.mccme.ru/pub/RecognitionLab.html/slbook.pdf Algorithmic models of learning classification: rationale, comparison, selection, 2014 http://www.machinelearning.ru/wiki/images/c/c3/Donskoy14algorithmic.pdf More coming soon … BOOKS, in Japanese Coming soon … BOOKS, in Chinese Blog recommending useful books A blog written in Chinese which introduces and recommends many useful ML books (the books are mostly written in English). http://blog.csdn.net/pongba/article/details/2915005 Textbook for Statistics http://baike.baidu.com/subview/1724467/13114186.htm Introduction to Pattern recognition http://baike.baidu.com/view/3911812.htm Translated version of Machine Learning by Tom Mitchell http://book.douban.com/subject/1102235/ Presentations, Infographics and Documents in English Meetup's Presentations https://skillsmatter.com/explore?content=skillscasts&location=&q=machine learning Slideshare.com http://www.slideshare.net/search/slideshow?searchfrom=header&q=machine learning Slides.com http://slides.com/explore?search=machine%20learning Powershow.com http://www.powershow.com/search/presentations/machine-learning Speaker Deck https://speakerdeck.com/search?q=machine learning Introduction to Artificial Intelligence, 2014, University of Waterloo https://cs.uwaterloo.ca/~ppoupart/teaching/cs486-spring15/ Aprendizado de Maquina, Conceitos e definicoes (Machine Learning: Concepts and Definitions) by Jose Augusto Baranauskas http://dcm.ffclrp.usp.br/~augusto/teaching/ami/AM-I-Conceitos-Definicoes.pdf Aprendizado de Maquina (Machine Learning) by Bianca
Zadrozny, Instituto de Computação, UFF, 2010 http://www2.ic.uff.br/~bianca/aa/ NYC ML Meetup, 2014 Natural Language Processing in Investigative Journalism by Jonathan Stray http://www.scribd.com/doc/230605794/Natural-Language-Processing-in-InvestigativeJournalism Statistics with Doodles by Thomas Levine https://thomaslevine.com/!/statistics-with-doodles-2014-03/ Conferences ICML, Lille, France 2015 http://icml.cc/2015/ ICML, Beijing, China 2014 http://icml.cc/2014/ ICML, Atlanta, US 2013 http://icml.cc/2013/ http://techtalks.tv/icml/2013/ ICML, Edinburgh, UK 2012 http://icml.cc/2012/ http://techtalks.tv/icml/2012/orals/ http://techtalks.tv/icml 2012 representation learning/ http://techtalks.tv/icml/2012/inferning2012/ http://techtalks.tv/icml/2012/object2012/ http://techtalks.tv/icml/2012/icml colt 2012 tutorials/icml-2012-tutorial-on-prediction-beliefand-market/ ICML, Bellevue, US 2011 http://www.icml-2011.org/ http://techtalks.tv/icml-2011/ Full archive of ICML http://machinelearning.org/icml.html Machine Learning Conference Videos http://techtalks.tv/search/results/?q=machine learning Annual Machine Learning Symposium 6th http://techtalks.tv/sixth-annual-machine-learning-symposium/ 8th http://www.nyas.org/Events/Detail.aspx?cid=2cc3521e-408a-460e-b159-e774734bcbea Archive http://www.nyas.org/whatwedo/fos/machine.aspx MLSS Machine Learning Summer Schools http://www.mlss.cc/ Data Gotham 2012, 2013 https://www.youtube.com/user/DataGotham Meetup 1,380 Machine Learning Meetups in the World http://machine-learning.meetup.com/ Data Science Weekly – List of Meetups List of Data Science Meetups: NYC, San Francisco, Washington DC, Boston, Chicago, Seattle, Denver, Austin, Atlanta, Toronto, Vancouver, London, Berlin, Paris, Amsterdam, Tel Aviv, Dubai, Delhi, Bangalore, Singapore, Sydney http://www.datascienceweekly.org/data-science-resources/data-science-meetups London Machine Learning Meetup http://www.meetup.com/London-Machine-Learning-Meetup/ BLOGS, in English Igor
Carron Blog Nuit Blanche is a blog that focuses on Compressive Sensing, Advanced Matrix Factorization Techniques, Machine Learning as well as many other engaging ideas and techniques needed to handle and make sense of very high dimensional data, also known as Big Data. http://nuit-blanche.blogspot.co.uk/ Data Science Weekly The Data Science Weekly Blog contains interviews to better understand how people are using Data and Data Science to change the world. http://www.datascienceweekly.org/blog Yann LeCun, Google+ My main research interests are Machine Learning, Computer Vision, Mobile Robotics, and Computational Neuroscience. I am also interested in Data Compression, Digital Libraries, the Physics of Computation, and all the applications of machine learning (Vision, Speech, Language, Document understanding, Data Mining, Bioinformatics). https://plus.google.com/YannLeCunPhD/posts KDD Community, Knowledge discovery and Data Mining KDD bringing together the data mining, data science and analytics community http://www.sigkdd.org/blog Kaggle Blog http://blog.kaggle.com/ Digg Digg is a news aggregator with an editorially driven front page, aiming to select stories specifically for the Internet audience such as science, trending political issues, and viral Internet issues. (source: Wikipedia) http://digg.com/search?q=machine learning Feedly Found a site you like? Use the feedly button to add it to your feedly reading list http://feedly.com/i/explore/%23Machine%20Learning Mlwave Learning Machine Learning ML Wave is a platform that talks about machine learning and data science. It was founded in 2014 by the Dutch Kaggle user Triskelion. http://mlwave.com/ FastML Machine Learning made easy FastML probably grew out of a frustration with papers you need a PhD in math to understand and with either no code or half-baked Matlab implementation of homework-assignment quality.
We understand that some cutting-edge researchers might have no interest in providing the goodies for free, or just no interest in such down-to-earth matters. But we don't have the time or the desire to become experts in every machine learning topic. Fortunately, there is quite a lot of good software with acceptable documentation. http://fastml.com/ Beating the Benchmark http://beatingthebenchmark.blogspot.co.uk/ Trevor Stephens Blog http://trevorstephens.com/ Mozilla Hacks Mozilla Hacks is one of the key resources for people developing for the Open Web, talking about news and in-depth descriptions of technologies and features. https://hacks.mozilla.org/?s=machine learning Banach's Algorithmic Corner, University of Warsaw This blog is maintained by members of the Algorithmic group at the University of Warsaw: http://corner.mimuw.edu.pl/ DataCamp Blog http://blog.datacamp.com/ Natural Language Processing Blog, Hal Daume http://nlpers.blogspot.co.uk/ Maxim Milakov Blog I am a researcher in machine learning and high-performance computing. I designed and implemented nnForge - a library for training convolutional and fully connected neural networks, with CPU and GPU (CUDA) backends. You will find my thoughts on convolutional neural networks and the results of applying convolutional ANNs for various classification tasks in the Blog. http://www.milakov.org/ Alfonso Nieto-Castanon Blog I work in the field of computational neuroscience, and my background is in neuroscience (Ph.D. Cognitive and Neural Systems, Boston University) and engineering (B.S./M.S. Telecommunication Engineering, Universidad de Valladolid). My areas of specialization are modeling and statistics, fMRI analysis methods, and signal processing. http://www.alfnie.com/home Persontyle Blog Every object on earth is generating data, including our homes, our cars and yes even our bodies. Data is the by-product of our new digital existence.
Data has the potential to revolutionize the way business, government, science, research, and healthcare are carried out. Data presents unprecedented opportunities to those who have the skills and expertise to use it to unveil patterns, insights and signals, and to predict trends in ways that were never possible before. In a massively connected, data-driven world, it is imperative that the workforce of today and tomorrow is able to understand what data is available and use scientific methods to analyze and interpret it. We're here to help you learn and apply the art and science of turning data into meaningful insights and intelligent predictions. http://www.persontyle.com/blog/ Analytics Vidhya Learn everything about Analytics Welcome to Analytics Vidhya! For those of you who are wondering what "Analytics Vidhya" is: "Analytics" can be defined as the science of extracting insights from raw data. The spectrum of analytics starts from capturing data and evolves into using insights / trends from this data to make informed decisions. "Vidhya", on the other hand, is a Sanskrit noun meaning "Knowledge" or "Clarity on a subject": knowledge gained through reading literature or through self-practice and experimentation. Through this blog, I want to create a passionate community which dedicates itself to the study of Analytics. I share my learning and tips on Analytics through this blog. http://www.analyticsvidhya.com/blog/ Bugra Akyildiz Blog Great Blog (Notes) both theoretical and practical I work as a Machine Learning/NLP Engineer at CB Insights where I apply machine learning algorithms to NLP problems. I received a B.S. from Bilkent University and an M.Sc. from New York University, focusing on signal processing and machine learning.
http://bugra.github.io/ Rasbt Blog A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks Links to useful resources https://github.com/rasbt/pattern_classification#links-to-useful-resources Gilles Louppe Blog Understanding Random Forests, PhD Thesis https://github.com/glouppe/phd-thesis AI Topics AITopics is a mediated information portal provided by AAAI (The Association for the Advancement of Artificial Intelligence), with the goal of communicating the science and applications of AI to interested people around the world. http://aitopics.org/ AI International This international AI site is designed to help you locate AI research efforts in your country or region. Pages on this site will link to local AI societies, universities, labs, and other research efforts. http://www.aiinternational.org/index.html Joseph Misiti Blog machine-learning applied mathematics django hadoop. Co-Founder of @socialq. https://github.com/josephmisiti https://medium.com/@josephmisiti MIRI, Machine Intelligence Research Institute The mathematics of safe machine intelligence MIRI's mission is to ensure that the creation of smarter-than-human intelligence has a positive impact. We aim to make intelligent machines behave as we intend even in the absence of immediate human supervision. Much of our current research deals with reflection, an AI's ability to reason about its own behavior in a principled rather than ad-hoc way. We focus our research on AI approaches that can be made transparent (e.g. principled decision algorithms, not genetic algorithms), so that humans can understand why the AIs behave as they do. https://intelligence.org/blog/ Kevin Davenport Data Blog I'm a tech enthusiast interested in automation, machine learning, and conveying complex statistical models through visualization. http://kldavenport.com/ Alexandre Passant Blog I'm a hacker, researcher, and entrepreneur.
I'm passionate about the Web and I love when smart algorithms and architectures power beautiful and useful products. I'm co-founder of MDG Web (http://mdg.io), a music-tech start-up based in Dogpatch Labs Dublin and focusing on the music discovery field. We're building seevl (http://seevl.fm), a free, unlimited and targeted music discovery platform available as a standalone app and a Deezer app. We also work with industry stakeholders to let them promote their content on streaming platforms through their own branded apps. I was previously a Research Fellow and Unit Leader at DERI (http://deri.ie), the world's largest Web 3.0 R&D lab, leading high-impact projects with partners such as Google, Cisco, and more, on the Social / Semantic / Sensor Web, with a focus on Knowledge Representation and Management, Personalisation, Privacy, Distributed Systems, and Recommender Systems. Overall, I'm trying to make the Web a better place, and I'm having fun doing it. http://apassant.net/ Daniel Nouri Blog Using convolutional neural nets to detect facial keypoints tutorial, Daniel Nouri's Blog This is a hands-on tutorial on deep learning. Step by step, we'll go about building a solution for the Facial Keypoint Detection Kaggle challenge. The tutorial introduces Lasagne, a new library for building neural networks with Python and Theano. We'll use Lasagne to implement a couple of network architectures, talk about data augmentation, dropout, the importance of momentum, and pre-training. Some of these methods will help us improve our results quite a bit. I'll assume that you already know a fair bit about neural nets. That's because we won't talk about much of the background of how neural nets work; there are a few good books and videos for that, like the Neural Networks and Deep Learning online book. Alec Radford's talk Deep Learning with Python's Theano library is a great quick introduction. Make sure you also check out Andrej Karpathy's mind-blowing ConvNetJS Browser Demos.
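The classical momentum update that the tutorial highlights can be sketched in a few lines of plain Python. This is a generic illustration with made-up hyperparameter values, not code from the tutorial itself:

```python
# Generic sketch of SGD with classical momentum, one of the techniques
# the tutorial discusses. All values and names here are illustrative.

def momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One update: velocity accumulates a decaying sum of past gradients,
    which damps oscillation and speeds progress along consistent directions."""
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    weights = [w + v for w, v in zip(weights, velocity)]
    return weights, velocity

# Toy run: two parameters updated three times with a constant gradient.
w, v = [1.0, -2.0], [0.0, 0.0]
for _ in range(3):
    w, v = momentum_step(w, [0.5, -0.5], v)
```

In practice, libraries such as Lasagne ship this update (and its Nesterov variant) as ready-made optimizers, so you would typically configure it rather than hand-roll it.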
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facialkeypoints-tutorial/ Yvonne Rogers Blog Yvonne Rogers is a Professor of Interaction Design, the director of UCLIC and a deputy head of the Computer Science department at UCL. Her research interests are in the areas of ubiquitous computing, interaction design and human-computer interaction. A central theme is how to design interactive technologies that can enhance life by augmenting and extending everyday, learning and work activities. This involves informing, building and evaluating novel user experiences through creating and assembling a diversity of pervasive technologies. http://www.interactiveingredients.com/ Igor Subbotin Blog (Both in English & Russian) 153 followers, 56,448 views (02-Jan-2015) http://igorsubbotin.blogspot.ru/ Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!) https://github.com/rasbt http://sebastianraschka.com/ Popular Science Website http://www.popsci.com/find/machine%20learning HOW MICROSOFT'S MACHINE LEARNING IS BREAKING THE GLOBAL LANGUAGE BARRIER Earlier this week, roughly 50,000 Skype users woke up to a new way of communicating over the Web-based phone- and video-calling platform, a feature that could've been pulled straight out of Star Trek. The new function, called Skype Translator, translates voice calls between different languages in real time, turning English to Spanish and Spanish back into English on the fly. Skype plans to incrementally add support for more than 40 languages, promising nothing short of a universal translator for desktops and mobile devices. The product of more than a decade of dedicated research and development by Microsoft Research (Microsoft acquired Skype in 2011), Skype Translator does what several other Silicon Valley icons, not to mention the U.S. Department of Defense, have not yet been able to do.
To do so, Microsoft Research (MSR) had to solve some major machine learning problems while pushing technologies like deep neural networks into new territory. http://www.popsci.com/how-microsofts-machine-learning-breaking-language-barrier Max Woolf Blog Max Woolf is a Software QA Engineer who has been living and working in the San Francisco Bay Area for over 2 years. He graduated from Carnegie Mellon University in 2012 with a degree in Business Administration, concentrating in Computing and Information Technology. In his spare time, Max uses Python to gather data from public APIs and ggplot2 to make pretty charts from that data. Max also comments on technology blogs rather frequently. http://minimaxir.com/ Rasmus Bååth Research Blog I'm a PhD student at Lund University Cognitive Science in Sweden. My main research interest is music cognition and especially rhythm perception and production. I'm also interested in statistics and statistical computing using R. My blog is syndicated on R-bloggers and StatsBlogs, two great sites if you are interested in R and statistics. Everything published on my blog is licensed under a Creative Commons Attribution 4.0 International License. I also run a drinks blog over at groggbloggen.se; it's in Swedish but focuses on minimalist drinks with only two ingredients (which are called grogs in Sweden), so you should be able to figure it out! :) I believe that if you haven't tried using Bayesian statistics you're really missing out on something. Why not do some Bayesian statistics right now in the browser and try my Bayesian "t-test" demo featuring MCMC in javascript! http://www.sumsar.net/ Flowing Data Blog About The greatest value of a picture is when it forces us to notice what we never expected to see. John W. Tukey. Exploratory Data Analysis. 1977. FlowingData explores how statisticians, designers, data scientists, and others use analysis, visualization, and exploration to understand data and ourselves. As for me, I'm Dr.
Nathan Yau, but you can call me Nathan. My dissertation was on personal data collection and how we can use visualization in the everyday context. That expands to more general types of data and visualization and design for a growing audience. I've also written a couple of books on how to visualize data, and the series is growing. http://flowingdata.com/ Genetic algorithm walkers http://flowingdata.com/2015/01/16/genetic-algorithm-walkers/ The Shape of Data Blog About Whether your goal is to write data intensive software, use existing software to analyze large, high dimensional data sets, or to better understand and interact with the experts who do these things, you will need a strong understanding of the structure of data and how one can try to understand it. On this blog, I plan to explore and explain the basic ideas that underlie modern data analysis from a very intuitive and minimally technical perspective: by thinking of data sets as geometric objects. When I began learning about machine learning and data mining, I found that the intuition I had formed while studying geometry was extremely valuable in understanding the basic concepts and algorithms. But in many of the resources I've seen, this relatively simple geometry is hidden behind enough equations and algorithms to intimidate all but the most technically inclined readers. My goal in writing this blog is to put the geometry first, and show that anyone can gain an intuitive understanding of modern data analysis. About the Author: Jesse Johnson is a former math professor, with a research background in low-dimensional geometry/topology, who is now a software engineer at Google in Cambridge, MA. https://shapeofdata.wordpress.com/ Data School Blog My name is Kevin Markham, and I'm the co-founder of a small tech-for-good company. I've been living in the Washington, DC area since 2005. I'm a computer engineer, an avid cook and theatre-goer, and an occasional triathlete.
I teach an 11-week data science course for General Assembly, mentor data science students for SlideRule, and am a Community Teaching Assistant for the Johns Hopkins University Data Science Specialization. In my spare time, I create educational videos and compete in Kaggle competitions. I created this blog because I love writing about data science topics, especially for people new to the field. I've found that most data science resources are inaccessible to novices, and so I strive to make my resources as accessible as possible to data scientists at all levels of knowledge and experience. http://www.dataschool.io/ Julia Evans Blog About me (See the website to access the links) Hi! I'm Julia. I live in Montreal and work on Stripe's machine learning team. You can find me elsewhere on the internet:... This blog is mostly about having fun with systems programming, with lots of forays into other areas. There's a list of my favorite posts, as well as some projects I've worked on. I spent the fall of 2013 at Hacker School, which houses the best programming community I've seen anywhere. I wrote down what I did every day while there, if you want to know what it's like. In the last year or two I've discovered that I like organizing community events and giving talks about programming. A few things I've worked on: Montreal All-Girl Hack Night (with my awesome friend Monica); PyLadies Montreal; and !!Con, a 2-day conference about what excites us about programming, where all the talks are lightning talks (with several amazing people). http://jvns.ca/ http://nbviewer.ipython.org/github/jvns/talks/blob/master/pydatanyc2013/PyData%20NYC%202013%20tutorial.ipynb Stephan Hügel's Blog About My name's Stephan Hügel, and I'm a doctoral researcher at UCL CASA. My main research interest is in computational municipal infrastructure. In short: I'm studying ways for cities to make more of their infrastructure available to people who live in them, in both human and machine-readable ways.
Currently, this infrastructure takes the form of data stores and sensor platforms. If in doubt, put an API on it. http://sensitivecities.com/ Visualising London Bike Hire Journey Lengths with Python and OSRM by Stephan Hügel London's Cycle Hire scheme has been a roaring success and continues to grow, with new stations being added all the time. This tutorial will produce a visualisation of journey times from the central point (well, approximately) of the bike station network to all other stations. This is made possible by the provision of an open-access instance of OSRM by the lovely people at Mapzen. I won't spend too much time on what OSRM is or how it works; suffice it to say that it's an open-source routing engine that uses OpenStreetMap, and that the Mapzen instance provides walking, cycling, and public transit routing data via HTTP. Hurrah! The code for this tutorial is available here as an IPython Notebook http://sensitivecities.com/bikeshare.html#.Vbe-Tiqqqkp So You'd Like To Make a Map Using Python by Stephan Hügel Making thematic maps has traditionally been the preserve of a 'proper' GIS, such as ArcGIS or QGIS. While these tools make it easy to work with shapefiles, and expose a range of common everyday GIS operations, they aren't particularly well-suited to exploratory data analysis. In short, if you need to obtain, reshape, and otherwise wrangle data before you use it to make a map, it's easier to use a data analysis tool (such as Pandas), and couple it to a plotting library.
This tutorial will be demonstrating the use of: • Pandas • Matplotlib • The matplotlib Basemap toolkit, for plotting 2D data on maps • Fiona, a Python interface to OGR • Shapely, for analyzing and manipulating planar geometric objects • Descartes, which turns said geometric objects into matplotlib "patches" • PySAL, a spatial analysis library The approach I'm using here uses an interactive REPL (IPython Notebook) for data exploration and analysis, and the Descartes package to render individual polygons (in this case, wards in London) as matplotlib patches, before adding them to a matplotlib axes instance. I should stress that many of the plotting operations could be more quickly accomplished, but my aim here is to demonstrate how to precisely control certain operations, in order to achieve e.g. the precise line width, colour, alpha value or label position you want. http://sensitivecities.com/so-youd-like-to-make-a-map-using-python-EN.html#.Vbe-cyqqqkp BACKCHANNEL "Tech Stories Hub" by Steven Levy I'm moving to Medium Creating a new hub for tech stories that matter For more than 30 years, I've been telling the true and truly jaw-dropping stories of the people who are changing the world with tech, and I've been extremely lucky in finding homes for my work. I first began writing about the subject for Rolling Stone, a magazine I idolized ever since my high school years. My twelve years at Newsweek provided an amazing front row seat to the dawn of the Internet era. And Wired, where I've been full time for the last six years after freelancing for the magazine since its birth, is the gold standard of reporting on the parts of the world where the future is already distributed. Now, after hanging out at great startups since forever, I'm finally joining one, hoping to create a chunk of that future myself.
https://medium.com/backchannel DataScience Vegas DataScience.Vegas is a blog that acts as a home for all the content (slides, code, videos) for several data science meetups in Las Vegas including Data Science LV, R, DataVis, and Python-Data Science. The blog is run by Data Science Las Vegas (DSLV), a non-profit professional group that brings together people interested in data science in the Las Vegas community. We are a community of data scientists, data miners, statisticians, data analysts, data engineers, data visualizers, data journalists, academics, researchers, and in general people directly involved in data projects. See more at http://datascience.vegas/about/ http://datascience.vegas/blog/ The Twitter Developer Blog Your source for new features, best practices and real-world use of the Twitter Platform. https://blog.twitter.com/developer Tyler Neylon Blog Coder in C, js, Python, Obj-C; math for fun. Currently building a game called Apanga. http://tylerneylon.com/ Victor Powell Blog Freelance data visualization and visual explanation http://blog.vctr.me/ CrowdFlower Blog http://www.crowdflower.com/blog Edward Raff Blog I work as a Computer Scientist and specialize in the area of Machine Learning. In my spare time I maintain a large open source project for Machine Learning in Java. http://jsatml.blogspot.co.uk/ Dirk Gorissen Blog and Projects Academic who crossed over to the dark side. Research engineer, dabbling in everything from autonomous systems and data science, to machine learning and computational engineering. Organiser of @bigoldn and @deeplearningldn. Tech4Good enthusiast. http://dirkgorissen.com/blog/ http://dirkgorissen.com/projects/ "How data, Python, and you can help 22.3 million people in Tanzania (almost half the population) get better access to clean water."
- based on Dirk's recent travels to Tanzania working on a real-world data science problem http://www.slideshare.net/dgorissen/data-for-good-38663284 http://www.meetup.com/PyData-London-Meetup/events/201507442/ Joseph Jacobs Homepage & Blog Hey there! My name is Joseph Jacobs (or Joe for short). I was born and raised in Kajang, Malaysia. I am currently pursuing a PhD in Computer Science at University College London. I have a BSc in Computer Science from the University of Bristol and an MSc in Machine Learning from University College London. I am a Mac user, a Manchester United fan and a general all-round geek. https://joejacobs.me/ http://joejacobs.org/ MISCELLANEOUS Allen Institute for Artificial Intelligence (AI2) MISSION The core mission of The Allen Institute for Artificial Intelligence (AI2) is to contribute to humanity through high-impact AI research and engineering. We will do this by constructing AI systems with reasoning, learning and reading capabilities. Please see the New York Times Profile of AI2. http://allenai.org/index.html https://www.youtube.com/channel/UCEqgmyWChwvt6MFGGlmUQCQ?spfreload=10 Artificial General Intelligence (AGI) Society This channel contains videos from the Artificial General Intelligence Society. The AGI Society organizes a yearly conference and occasional summer school. Artificial General Intelligence (AGI) is an emerging field aiming at the building of “thinking machines”; that is, general-purpose systems with intelligence comparable to that of the human mind (and perhaps ultimately well beyond human general intelligence). While this was the original goal of Artificial Intelligence (AI), the mainstream of AI research has turned toward domain-dependent and problem-specific solutions; therefore it has become necessary to use a new name to indicate research that still pursues the “Grand AI Dream”. Similar labels for this kind of research include “Strong AI”, “Human-level AI”, etc. 
https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA?spfreload=10 http://www.agi-society.org/ AUAI, Association for Uncertainty in Artificial Intelligence About AUAI The Association for Uncertainty in Artificial Intelligence is a non-profit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty. The next UAI conference is UAI-2015 in Amsterdam, The Netherlands, on July 12-16, 2015. Join our Facebook group or add yourself to the UAI Mailing list to keep updated on announcements and relevant AI news. Principles and applications developed within the UAI community have been at the forefront of research in Artificial Intelligence. The UAI community and annual meeting have been primary sources of advances in graphical models for representing and reasoning with uncertainty. http://www.auai.org/ BLOGS, in Spanish Coming soon … BLOGS, in Portuguese Coming soon … BLOGS, in Italian Coming soon … BLOGS, in German Coming soon … BLOGS, in French L'ATELIER's News L'Atelier has been BNP Paribas's technology watch unit for more than 30 years. L'Atelier is established in three major innovation territories (USA, China, Europe) to spot, advise and support companies. The watch unit rests on four activities: the Media arm, which shares its monitoring across its various outlets (website, radio, social media); Events, which foster exchange around innovative topics; Digital Strategy Consulting, which puts the innovations it detects into the context of companies and their businesses; and finally L'Atelier Lab, which brings innovative entrepreneurs and large companies together to help them jointly design new digital products and services.
http://www.atelier.net/search/apachesolr_search/machine%20learning More coming soon … BLOGS, in Russian Igor Subbotin's Blog (Both in English & Russian) (Huge list of resources) 153 subscribers, 56,448 page views (02-Jan-2015) http://igorsubbotin.blogspot.ru/ More coming soon … BLOGS, in Japanese Coming soon … BLOGS, in Chinese Coming soon … JOURNALS, in English Journal of Machine Learning Research, MIT Press http://jmlr.org/ Machine Learning Journal (the latest articles can be downloaded for free) http://link.springer.com/journal/10994 Machine Learning (Theory) This is an experiment in the application of a blog to academic research in machine learning and learning theory by John Langford. Exactly where this experiment takes us and how the blog will turn out to be useful (or not) is one of those prediction problems we so dearly love in machine learning. http://hunch.net/ List of Journals on Microsoft Academic Research website http://academic.research.microsoft.com/RankList?entitytype=4&topDomainID=2&subDomainID=6&last=0&start=1&end=100 Wired magazine http://www.wired.com/tag/machine-learning/ Data Science Central Data Science Central is the industry's online resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends and industry job opportunities. http://www.datasciencecentral.com JOURNALS, in Spanish Coming soon … JOURNALS, in Portuguese Coming soon … JOURNALS, in Italian Coming soon … JOURNALS, in German Coming soon … JOURNALS, in French Coming soon … JOURNALS, in Russian Coming soon … JOURNALS, in Japanese Coming soon … JOURNALS, in Chinese Coming soon … FORUM, Q&A, in English Data Tau Hacker News for Data Scientists Great website with a lot of really good and leading-edge information!
Respects users' privacy by not asking for any personal information or email! Remark: machinelearningsalon.org uses the standard forum templates provided by its website hosting system, but machinelearningsalon.org looks forward to doing the same as DataTau.com! http://www.datatau.com/ Hacker News Great website like datatau.com but less dedicated to Machine Learning! Respects users' privacy by not asking for any personal information or email! https://news.ycombinator.com/ Kaggle Forums 44,032 posts in 8,087 topics in 439 forums. (source: 4th June 2014) https://www.kaggle.com/forums Reddit /r/MachineLearning News, Research Papers, Videos, Lectures, Software and Discussions on: • Machine Learning • Data Mining • Information Retrieval • Predictive Statistics • Learning Theory • Search Engines • Pattern Recognition • Analytics http://www.reddit.com/r/MachineLearning/ Beginners: Please have a look at our FAQ and Link-Collection http://www.reddit.com/r/MachineLearning/wiki/index Reddit /r/generative Art that has been generated, composed, or constructed in an algorithmic manner through the use of systems defined by computer software algorithms, or similar mathematical or mechanical or randomized autonomous processes. http://www.reddit.com/r/generative Cross Validated Stack Exchange Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization. It's 100% free, no registration required. http://stats.stackexchange.com/ Open Data Stack Exchange Open Data Stack Exchange is a question and answer site for developers and researchers interested in open data. It's 100% free, no registration required. http://opendata.stackexchange.com/ Data Science Stack Exchange (Beta) Data Science Stack Exchange is a question and answer site for Data science professionals, Machine Learning specialists, and those interested in learning more about the field.
It's 100% free, no registration required. http://datascience.stackexchange.com/ Quora Quora is your best source for knowledge. Why do I need to sign in? Quora is a knowledge-sharing community that depends on everyone being able to pitch in when they know something. http://www.quora.com/Machine-Learning Machine Learning Impact Forum Welcome! Please contribute your ideas for what challenges we might aspire to solve, changes in our community that can improve machine learning impact, and examples of machine learning projects that have had tangible impact. http://www.wkiri.com/mlimpact/ FORUM, Q&A, in Spanish More coming soon … FORUM, Q&A, in Portuguese More coming soon … FORUM, Q&A, in Italian More coming soon … FORUM, Q&A, in German More coming soon … FORUM, Q&A, in French More coming soon … FORUM, Q&A, in Russian Reddit in Russian http://www.reddit.com/r/MachineLearning_Ru Habrahabr.ru Forum (in Russian; can be read with Google Chrome's translation) http://habrahabr.ru/ Some examples: Playing with genetic algorithms http://habrahabr.ru/post/246951/ Contents: What is a genetic algorithm; Why it works; Formalizing the problem with a random string; An example of the algorithm; Experiments with the classics; Code and data; Findings. PythonDigest - 2014 results in figures and links The digest was created to aggregate news and information about the Python programming language and its modules. Over its existence the digest has collected approximately 5235 items and has translated and published 1776 news items.
http://habrahabr.ru/post/247067/ More coming soon … FORUM, Q&A, in Japanese More coming soon … FORUM, Q&A, in Chinese Zhihu.com Machine Learning http://www.zhihu.com/search?q=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0&type=question Data Mining http://www.zhihu.com/search?q=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&type=question Artificial Intelligence http://www.zhihu.com/search?q=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&type=question Guokr.com Machine Learning http://www.guokr.com/search/all/?wd=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0 Data Mining http://www.guokr.com/search/all/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&sort=&term=True Artificial Intelligence http://www.guokr.com/search/all/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&sort=&term=True More coming soon … Governmental REPORTS, in English Big Data report, Whitehouse, US https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf FUN, in English Founder of PhD Comics Jorge Cham is the creator of "PHD Comics", the popular comic strip about life (or the lack thereof) in Academia. He is also the co-founder of PHDtv, a video science and discovery outreach collaborative, and a founding board member of Endeavor College Prep, a non-profit school for kids in East L.A. He earned his Ph.D. in Robotics from Stanford University and was an Instructor and Research Associate at Caltech from 2003-2005. He is originally from Panama. http://jorgecham.com/ MACHINE LEARNING RESEARCH GROUPS, in USA Computer Science and Artificial Intelligence Lab, MIT The Computer Science and Artificial Intelligence Laboratory, known as CSAIL, is the largest research laboratory at MIT and one of the world's most important centers of information technology research. CSAIL and its members have played a key role in the computer revolution.
The Lab’s researchers have been key movers in developments like time-sharing, massively parallel computers, public key encryption, the mass commercialization of robots, and much of the technology underlying the ARPANet, Internet and the World Wide Web. CSAIL members (former and current) have launched more than 100 companies, including 3Com, Lotus Development Corporation, RSA Data Security, Akamai, iRobot, Meraki, ITA Software, and Vertica. The Lab is home to the World Wide Web Consortium (W3C), directed by Tim Berners-Lee, inventor of the Web and a CSAIL member. CSAIL research is focused on developing the architectures and infrastructures of tomorrow’s information technology, and on creating innovations that will yield long-term improvements in how people live and work. Lab members conduct research in almost all aspects of computer science, including artificial intelligence, the theory of computation, systems, machine learning, computer graphics, as well as exploring revolutionary new computational methods for advancing healthcare, manufacturing, energy and human productivity. http://www.csail.mit.edu/ Artificial Intelligence Laboratory, Stanford University Welcome to the Stanford AI Lab Founded in 1962, The Stanford Artificial Intelligence Laboratory (SAIL) has been a center of excellence for Artificial Intelligence research, teaching, theory, and practice for over fifty years. Reading group We have several weekly reading groups where we present and discuss papers on various topics in machine learning, natural language processing, computer vision, etc. Autonomous Highway Driving A deep learning model outputs the location of lane markings and surrounding cars given only a single camera image. http://ai.stanford.edu/ http://ai.stanford.edu/courses/ Machine Learning Department, Carnegie Mellon University The Machine Learning Department is an academic department within Carnegie Mellon University's School of Computer Science. 
We focus on research and education in all areas of statistical machine learning. Watch an interview with Tom Mitchell, Department Head: http://www.ml.cmu.edu/ Noah's ARK Research Group, Carnegie Mellon University Noah's ARK is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language, all viewed through a computational lens. http://www.ark.cs.cmu.edu/ Intelligent Interactive Systems Group, Harvard University Intelligent Interactive Systems are fundamentally hard to design because they require intelligent technology that is well suited to people's abilities, limitations, and preferences; they also require entirely novel interactions that can give the user a predictable and reliable experience despite the fact that the underlying technology is inherently proactive, unpredictable, and occasionally wrong. Thus, the design of successful intelligent interactive systems requires intimate knowledge of, and the ability to innovate in, two very disparate areas: human-computer interaction and artificial intelligence or machine learning. Our projects span the full range from formal user studies to statistical machine learning. We have worked on developing new intelligent technologies to enable novel interactions (e.g., the SUPPLE system) and on understanding the principles underlying how people interact with intelligent systems (e.g., the project on exploring the design space of adaptive user interfaces). Our Brain-Computer Interface project aims at developing a new set of interactions for efficiently controlling complex applications, and we are also interested in building and studying complete applications.
One particular area of interest is ability-based user interfaces -- an approach for adapting interactions to the individual abilities of people with impairments or of able-bodied people in unusual situations. http://iis.seas.harvard.edu/ http://iis.seas.harvard.edu/resources/ Statistical Machine Learning, University of California, Berkeley Research Statement Statistical machine learning merges statistics with the computational sciences---computer science, systems science and optimization. Much of the agenda in statistical machine learning is driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamical and heterogeneous, and where mathematical and algorithmic creativity are required to bring statistical methodology to bear. Fields such as bioinformatics, artificial intelligence, signal processing, communications, networking, information management, finance, game theory and control theory are all being heavily influenced by developments in statistical machine learning. The field of statistical machine learning also poses some of the most challenging theoretical problems in modern statistics, chief among them being the general problem of understanding the link between inference and computation. Research in statistical machine learning at Berkeley builds on Berkeley's world-class strengths in probability, mathematical statistics, computer science and systems science. Moreover, by its interdisciplinary nature, statistical machine learning helps to forge new links among these fields. An education in statistical machine learning at Berkeley thus involves an immersion in the traditions of statistical science broadly defined, a thoroughgoing involvement in exciting applied problems, and an opportunity to help shape the future of statistics.
http://www.stat.berkeley.edu/~statlearning/ UC Berkeley AMPLab, AMP: ALGORITHMS MACHINES PEOPLE People will play a key role in data-intensive applications not simply as passive consumers of results, but as active providers and gatherers of data, and to solve ML-hard problems that algorithms on their own cannot solve. With crowdsourcing, people can be viewed as highly valuable but unreliable and unpredictable resources, in terms of both latency and answer quality. They must be incentivized appropriately to provide quality answers despite varying expertise, diligence and even malicious behavior. The AMPLab is addressing these issues in all phases of the analytics lifecycle. https://amplab.cs.berkeley.edu/ Videos https://www.youtube.com/user/BerkeleyAMPLab/videos?spfreload=10 Berkeley Institute for Data Science The Berkeley Institute for Data Science (BIDS) was founded in fall 2013 to build on existing campus strengths with a multidisciplinary emphasis that aims to facilitate and enhance the development and application of cutting-edge data science techniques in the biological, physical, social and engineering sciences. The Institute aims to build on the many recent innovations in data science techniques so that they can be applied in effective ways to domain science challenges. BIDS brings together researchers across disciplines and enhances career paths for data scientists through a number of newly created Data Science Fellows positions, graduate student fellowships, boot-camps, special classes, and conferences of interest to the academic community and general public. The Institute's initial support is provided by a 5-year $12.5 million grant from the Moore and Sloan Foundations together with significant support provided by UC Berkeley. The "Moore-Sloan Data Science Environment" also supports similar programs with shared goals and objectives at the University of Washington and New York University.
http://bids.berkeley.edu/ Data Science Lecture Series: Maximizing Human Potential Using Machine Learning-Driven Applications https://www.youtube.com/channel/UCBBd3JxQl455JkWBeulc-9w?spfreload=10 Department of Computer Science - ARTIFICIAL INTELLIGENCE & MACHINE LEARNING, Princeton University Machine learning and computational perception research at Princeton is focused on the theoretical foundations of machine learning, the experimental study of machine learning algorithms, and the interdisciplinary application of machine learning to other domains, such as biology and information retrieval. Some of the techniques that we are studying include boosting, probabilistic graphical models, support-vector machines, and nonparametric Bayesian techniques. We are especially interested in learning from large and complex data sets. Example applications include habitat modeling of species distributions, topic models of large collections of scientific articles, classification of brain images, protein function classification, and extensions of the Wordnet semantic network. 
http://www.cs.princeton.edu/research/areas/mlearn Research Laboratories and Groups, University of California, Los Angeles (UCLA)
• Automated Reasoning Group (Adnan Darwiche)
• Biocybernetics Laboratory (Joe DiStefano)
• Center for Vision, Cognition, Learning and Art (Song-Chun Zhu)
• Cognitive Systems Laboratory (Judea Pearl)
• Concurrent Systems Laboratory (Yuval Tamir)
• Digital Arithmetic and Reconfigurable Architecture Laboratory (Milos Ercegovac)
• ER: Embedded and Reconfigurable System Design (Majid Sarrafzadeh)
• Information and Data Management Group (multiple faculty)
• Internet Research Laboratory (Lixia Zhang)
• Laboratory for Embedded Collaborative Systems (LECS) (archived CENS documents)
• Laboratory for Advanced Systems Research (LASR) (Peter Reiher)
• MAGIX: Computer Graphics & Vision Laboratory (Demetri Terzopoulos)
• Multimedia Information System Technology Group & Laboratory (Alfonso Cardenas)
• Network Research Laboratory (Mario Gerla)
• Software Systems Group (multiple faculty)
• Vision Laboratory (Stefano Soatto)
• VLSI Architecture, Synthesis & Technology (VAST) Laboratory (Jason Cong)
• Web Information Systems Laboratory (Carlo Zaniolo)
• WiNG (Wireless Networking Group) (Songwu Lu)
http://www.cs.ucla.edu/research-labs/ Cornell University https://confluence.cornell.edu/display/ml/Home https://confluence.cornell.edu/display/ML/Courses Machine Learning Research, University of Illinois at Urbana-Champaign The Department of Computer Science at the University of Illinois at Urbana-Champaign has several faculty members working in the areas of machine learning, learning theory, explanation-based learning, learning in natural language processing and data mining. In addition, many faculty members inside and outside the department whose primary research interests are in other areas have specific research projects involving machine learning in some way.
http://ml.cs.illinois.edu/ Department of Computing + Mathematical Sciences, California Institute of Technology, Caltech The Computing + Mathematical Sciences department pursues numerous research interests covering a wide array of application areas. We take full advantage of Caltech's unique interdisciplinary character by drawing on research expertise not only from our own department, but from throughout the Institute. Research efforts within the department evolve at a fast pace, and currently cover six discernible focus areas:
• Discrete Differential Modeling
• DNA Computing and Molecular Programming
• Perceptual and Machine Learning for Autonomous Systems
• Rigorous Systems Research
• Scientific Computing and Applied Analysis
• Theory of Computation
http://www.cms.caltech.edu/research/ Machine Learning, University of Washington UW is one of the world's top centers of research in machine learning. We are active in most major areas of ML and in a variety of applications like natural language processing, vision, computational biology, the Web, and social networks. Check out the links on the left to find out who's who and what's happening in ML at UW. And be sure to see our CSE-wide efforts in Big Data. https://www.cs.washington.edu/research/ml/ "Big Data" Research and Education, University of Washington UW CSE is driving the "Big Data" revolution. Our traditional strength in data management (Magda Balazinska, Bill Howe, Dan Suciu), machine learning (Pedro Domingos), and open information extraction (Oren Etzioni, Dan Weld) has recently been augmented by key hires in machine learning (Emily Fox, Carlos Guestrin, Ben Taskar) and data visualization (Jeff Heer). Our efforts are coordinated with those of outstanding researchers in the University of Washington's top-ten programs in Statistics, Biostatistics, and Applied Mathematics, among others.
Through the University of Washington eScience Institute (directed by Ed Lazowska) we are integrally involved in ensuring that researchers across the campus have access to cutting-edge approaches to data-driven discovery. http://www.cs.washington.edu/research/bigdata Social Robotics Lab - Yale University The members of our lab perform research over a diverse collection of topics. Though these projects approach social and developmental research from varied perspectives, they all share common themes. Robots provide an embodied, empirical testbed that allows for repeated validation. Robots also enable the use of social interactions as part of the modeled experimental environment, staying grounded in real-world perceptions, and appropriately integrating perceptual, motor, and cognitive skills. http://scazlab.yale.edu/publications/all-publications ML@GT, Georgia Institute of Technology http://ml.cc.gatech.edu/ Machine Learning Research Group, University of Texas at Austin Machine learning is the study of adaptive computational systems that improve their performance with experience. The Machine Learning Research Group at UT Austin is led by Professor Raymond Mooney, and our research has explored a wide variety of issues in machine learning for over two decades. Our current research focuses primarily on natural language learning, statistical relational learning, transfer learning, and active learning. https://www.cs.utexas.edu/~ml/ Penn Research in Machine Learning, University of Pennsylvania Current projects:
• Structured Prediction
• Bandit and Limited-Feedback Problems
• Computation and Statistics
• Online Learning, Sequential Prediction, Regret Minimization
• Statistical Learning Theory
http://priml.upenn.edu/Main/Research Machine Learning @ Columbia University The Columbia Machine Learning Lab pursues research in machine learning with applications in vision, graphs and spatio-temporal data. Funding provided by NSF.
http://www.cs.columbia.edu/learning/ New York University (NYU) CILVR Lab and Center for Data Science The CILVR Lab (Computational Intelligence, Learning, Vision, and Robotics) brings together three faculty members, research scientists, postdocs, and students working on AI, machine learning, and a wide variety of applications, notably computer perception, robotics, and health care. http://cilvr.nyu.edu/doku.php http://cds.nyu.edu/ University of Chicago http://ml.cs.uchicago.edu/ The Johns Hopkins Center for Language and Speech Processing (CLSP) Archive Videos The Johns Hopkins Center for Language and Speech Processing (CLSP) is an interdisciplinary research and educational center focused on the science and technology of language and speech. Within its field, CLSP is recognized as one of the largest and most influential academic research centers in the world. The center conducts research across a broad spectrum of fundamental and applied topics including acoustic processing, automatic speech recognition, big data, cognitive modeling, computational linguistics, information extraction, machine learning, machine translation, and text analysis. http://clsp.jhu.edu/seminars/archive/video/ MISCELLANEOUS IARPA Organization The Intelligence Advanced Research Projects Activity (IARPA) invests in high-risk/high-payoff research programs that have the potential to provide our nation with an overwhelming intelligence advantage over future adversaries. http://www.iarpa.gov/ MACHINE LEARNING RESEARCH GROUPS, in Canada Machine Learning Lab, University of Toronto Machine Learning @ UofT: The Department of Computer Science at the University of Toronto has several faculty members working in the areas of machine learning, neural networks, statistical pattern recognition, probabilistic planning, and adaptive systems.
In addition, many faculty members inside and outside the department whose primary research interests are in other areas have specific research projects involving machine learning in some way. http://learning.cs.toronto.edu/ The Fields Institute for Research in Mathematical Science, University of Toronto The Fields Institute is a center for mathematical research activity - a place where mathematicians from Canada and abroad, from business, industry and financial institutions, can come together to carry out research and formulate problems of mutual interest. Our mission is to provide a supportive and stimulating environment for mathematics innovation and education. The Fields Institute promotes mathematical activity in Canada and helps to expand the application of mathematics in modern society. http://www.fields.utoronto.ca/ Artificial Intelligence Research Group, University of Waterloo The Artificial Intelligence Group conducts research in many areas of artificial intelligence. The group has active interests in: models of intelligent interaction, multi-agent systems, natural language understanding, constraint programming, computational vision, robotics, machine learning, and reasoning under uncertainty. http://ai.uwaterloo.ca/ Course material http://ai.uwaterloo.ca/coursegr.html Artificial Intelligence Research Groups, University of British Columbia Research Groups Computer Vision and Robotics: This is one of the most influential vision and robotics groups in the world. It is this group that created RoboCup and the celebrated SIFT features. The students in this group have won most of the AAAI Semantic Robot Challenges. The group has four active faculty: David Lowe, Jim Little, Alan Mackworth and Bob Woodham. Empirical Algorithmics: Led by Holger Hoos and Kevin Leyton-Brown, this research group studies the empirical behaviour of algorithms and develops automated methods for improving algorithmic performance.
Work by the empirical algorithmics group at UBC/CS has led to substantial improvements in the state of the art in solving a wide range of prominent problems, including SAT, AI Planning and Mixed Integer Programming, and has won numerous awards. Game Theory and Decision Theory: With Kevin Leyton-Brown in the lead, this group has made significant contributions to algorithmic game theory, multiagent systems and mechanism design. David Poole also contributes to this group with his work on decision processes and planning. The research problems attacked by this group are therefore of great importance to e-commerce, auctions and advertising. Intelligent User Interfaces: With Cristina Conati and Giuseppe Carenini, this group's goal is to investigate principles and techniques for preference modeling and elicitation, interactive decision making, user-adaptive information visualization and visual interfaces for text analysis. Knowledge Representation and Reasoning: David Poole leads this group with his foundational work on probabilistic first-order logic and semantic science. This work on logical and probabilistic reasoning has been of profound and broad impact in the field of artificial intelligence (AI). Holger Hoos is also an important member of this group with his work on satisfiability (SAT) and planning, which has won numerous awards and competitions. Machine Learning: With the guidance of Nando de Freitas and Kevin Murphy, this group's vision is to advance the frontier of knowledge in Bayesian inference, Monte Carlo algorithms, probabilistic graphical models, neural computation, personalization, mining web-scale datasets, prediction and optimal decision making. Natural Language Processing: Under the leadership of Giuseppe Carenini and Raymond Ng (Data Management and Mining Lab), this group's vision is to further our understanding of abstractive summarization, mining conversations and evaluative text, and natural language generation.
https://www.cs.ubc.ca/cs-research/lci/research-groups/machine-learning MILA, Machine Learning Lab, University of Montreal MILA is the Institut des algorithmes d'apprentissage de Montréal, or the Montreal Institute for Learning Algorithms. Mission
• federate researchers in the domain of Deep Learning
• provide a platform for collaboration and co-supervision
• share human resources as well as infrastructure and computer networks
• provide unique access to a pool of companies which can benefit from the opportunities offered by machine learning algorithms
Scientific Mission
• supervised learning and pattern recognition
• unsupervised and semi-supervised learning
• representation learning and deep learning representations
• computer vision applications
• applications in natural language processing
• applications in signal modelling, such as sound and music
• applications on large-scale data (big data)
See Research for more info on our main research interests. Expertise Researchers from MILA have developed an expertise in deep neural networks (both discriminative and generative) and their applications to vision, speech and language. MILA is world-renowned for many breakthroughs in developing novel deep learning algorithms and applying them to various domains. They include, but are not limited to, neural language modelling, neural machine translation, object recognition, structured output generative modelling and neural speech recognition. http://www.mila.umontreal.ca/ Artificial Intelligence, University of Sherbrooke Three teams work in this research area; other projects are conducted by researchers working individually.
The research team in the field of intelligent tutoring systems, ASTUS (Apprentissage par Système Tutoriel de l'Université de Sherbrooke), works on the following themes: knowledge representation, user modelling, human-machine interaction, educational psychology and cognitive science. The research team in the field of data mining, Prospectus (Prospection de données à l'Université de Sherbrooke), works on the following themes: data mining, knowledge mining and modelling, pattern recognition, segmentation and classification, non-symbolic artificial intelligence methods, neural networks and Bayesian networks, and detection of latent structures and behaviours. The research team in the field of planning in artificial intelligence, PLANIART, works on the following themes: path planning, behaviour planning, and plan recognition in video games and mobile robotics. Planning makes it possible to decide what to do (goal decomposition), how to do it (resource allocation) and when to do it (scheduling). http://www.usherbrooke.ca/informatique/recherche/domaines-de-recherche/intelligenceartificielle/ Centre de recherche sur les environnements intelligents, University of Sherbrooke The Centre de Recherche sur les Environnements Intelligents (CREI) comprises 13 regular members, 11 associate members and more than sixty graduate students. CREI federates 7 laboratories whose research interests cover digital imaging, artificial intelligence, modelling and validation, and ambient intelligence. CREI researchers have been collaborating for years, developing applications related to intelligent environments.
http://www.usherbrooke.ca/crei/ Machine Learning Research Group, Université Laval http://graal.ift.ulaval.ca/ More to come … MACHINE LEARNING RESEARCH GROUPS, in Brazil USP - UNIVERSIDADE DE SÃO PAULO, Instituto de Ciências Matemáticas e de Computação http://www.icmc.usp.br/Portal/ More coming soon … MACHINE LEARNING RESEARCH GROUPS, in United Kingdom The Centre for Computational Statistics and Machine Learning (CSML), University College London The Centre for Computational Statistics and Machine Learning (CSML) spans three departments at University College London: Computer Science, Statistical Science, and the Gatsby Computational Neuroscience Unit. The Centre will pioneer an emerging field that brings together statistics, the recent extensive advances in theoretically well-founded machine learning, and links with a broad range of application areas drawn from across the college, including neuroscience, astrophysics, biological sciences, complexity science, etc. There is a deliberate intention to maintain and cultivate a plurality of approaches within the centre, including Bayesian, frequentist, on-line, statistical, etc. http://www.csml.ucl.ac.uk/ CASA (Centre for Advanced Spatial Analysis) Working Papers, University College London http://www.bartlett.ucl.ac.uk/casa/latest/publications/working-papers Example #198 A global inter-country economic model based on linked input-output models We present a new, flexible and extensible alternative to multi-regional input-output (MRIO) for modelling the global economy. The limited coefficient set of MRIO (technical coefficients only) is extended to include two new sets of coefficients, import ratios and import propensities. These new coefficient sets assist in the interaction of the new model with other social science models such as those of trade, migration, international security and development aid.
The model uses input-output models as descriptions of the internal workings of countries' economies, and couples these more loosely than in MRIO using trade data for commodities and services from the UN. The model is constructed using a minimal number of assumptions, seeks to be as parsimonious as possible in terms of the number of coefficients, and is based to a great extent on empirical observation. Two new metrics are introduced, measuring sectors' economic significance and economic self-reliance per country. The Chinese vehicles sector is shown to be the world's most significant, and self-reliance is shown to be strongly correlated with population. The new model is shown to be equivalent to an MRIO under an additional assumption, allowing existing analysis techniques to be applied.
http://www.bartlett.ucl.ac.uk/casa/publications/working-paper-198
The Machine Learning Research Group in the Department of Engineering Science, Oxford University
The Machine Learning Research Group is a sub-group within Information Engineering (Robotics Research Group) in the Department of Engineering Science of the University of Oxford. We are interested in probabilistic reasoning applied to problems in science, engineering and computing. We use the tools of statistical, and in particular Bayesian, inference to deal rationally with uncertainty and information in a number of domains including astronomy, biology, finance, image & signal processing and multi-agent systems, as well as researching the theory of Bayesian modelling and inference.
http://www.robots.ox.ac.uk/~parg/doku.php?id=home
Machine Learning research in the Department of Computer Science
Machine Learning research in the Department of Computer Science evolves along the following directions:
- Deep learning
- Large scale machine learning and big data
- Random forests and ensemble methods
- Probabilistic graphical models
- Bayesian optimisation
- Reinforcement learning
- Monte Carlo methods and randomised algorithms
Applications to control, games, language understanding, computer vision, speech, time series, and all types of structured and unstructured data. The group is part of a wider Machine Learning initiative at Oxford, which includes researchers in statistics (Yee Whye Teh, Arnaud Doucet, Chris Holmes) and information engineering (Michael Osborne, Steve Roberts, Frank Wood).
http://www.cs.ox.ac.uk/activities/machlearn/
Machine Learning Group, Imperial College
Transforming Big Data into Knowledge
The Machine Learning Group is a cross-faculty network of Imperial College’s Department of Computing. We embrace research at the interface of machine learning, artificial intelligence and its Big Data applications.
Research
With the ever-increasing use of the Internet, digital devices and science, tremendous amounts of data encapsulating valuable knowledge have become available. We reflect this impact in the many vibrant facets of this field, from automated reasoning to probabilistic inference, from creative and affective computing to human-computer interaction, from machine vision to neurotechnology, from bioinformatics to medical & economic applications. Broadly, members of the group belong to at least one of the two pillars of Machine Learning:
• Data-level machine learning to support feature extraction from data (“Big Data”)
• Knowledge-level machine learning and knowledge representation to extract readable and insightful relational knowledge which supports human-understandable machine inference
At the data-level, ongoing research focuses on applying a wide variety of feature-based machine learning techniques in key application areas. Notable recent successes in these areas include the application of machine learning to medical imaging of the brain and heart (Rueckert), human emotions and social signals (Pantic, Zafeiriou), robotic vision (Davison), autonomous systems (Deisenroth), medical applications (Gillies), computational neuroscience and Brain-Machine Interfaces (Faisal).
At the knowledge-level, our key expertise lies in Relational and First-Order Logic Learning. Past research has had major impact on scientific discovery in biological prediction tasks (Muggleton), and on security and semi-automated software engineering (Russo). The group also works in the closely related areas of smart analysis of biological or economic network topologies (Przulj), robust systems optimisation (Parpas) and scalable data analytics (Pietzuch).
http://wp.doc.ic.ac.uk/mlg/
The Data Science Institute, Imperial College
The Data Science Institute at Imperial College is being established to conduct research on the foundations of data science by developing advanced theory, technology and systems that will contribute to the state-of-the-art in data science and big data, and support data-driven research at Imperial and beyond. The Institute will empower Imperial and its partners to collaborate in the pursuit of world-class data-driven innovation.
http://www.imperial.ac.uk/data-science/
The University of Edinburgh, Institute for Adaptive and Neural Computation
http://www.anc.ed.ac.uk/machine-learning/
Cambridge University
About Us
We are part of the Computational and Biological Learning Laboratory located in the Department of Engineering at the University of Cambridge. The research in our group is very broad, and we are interested in all aspects of machine learning. Particular strengths of the group are in Bayesian approaches to modelling and inference in statistical applications. The type of work we do ranges from studying fundamental concepts in applied Bayesian statistics all the way to getting our algorithms to perform competitively against the state-of-the-art in big-data applications.
We also work in a broad range of application domains, including neuroscience, bioinformatics, finance, social networks, and physics, just to name a few, and we actively seek to collaborate with other groups within the Department of Engineering, throughout the university as a whole, and with other groups within the UK and around the world. If you are interested in finding out more about our research, please visit our Publications page, or visit the individual research pages of our group members.
http://mlg.eng.cam.ac.uk/
Centre for Intelligent Sensing, Queen Mary University of London
I am delighted to introduce you to the Centre for Intelligent Sensing (CIS). CIS is a focal point for research in Intelligent Sensing at Queen Mary University of London. The Centre focuses on breakthrough innovations in computational intelligence that will have a major impact in transforming the way humans and machines utilise a variety of sensor inputs for interpretation and decision making. The Centre gathers 33 academics with expertise in all aspects of intelligent sensing, from the design and building of the physical sensors to the mathematical and computational challenges of extracting key information from real-time streams of high-dimensional data acquired by networks of sensors. The legal, ethical and social implications of these processes are also addressed. CIS researchers have an outstanding international reputation in camera and sensor networks, image and signal processing, computer vision, data mining, pattern recognition, machine learning, bio-inspired computing, human-computer interaction, affective computing and social signal processing. The Centre also provides post-graduate research and teaching in Intelligent Sensing, and is responsible for the MSc programme in Computer Vision. I do hope that you will enjoy reading this brochure and learning more about who we are and how the research we do helps to address important societal challenges.
I also invite you to keep up to date with our activities by following us on Twitter @intelsensing and to enjoy our research videos at http://cis.eecs.qmul.ac.uk.
Professor Andrea Cavallaro, Director
http://cis.eecs.qmul.ac.uk/
Videos: https://www.youtube.com/user/intelsensing/feed?spfreload=10
ICRI, The Intel Collaborative Research Institute
The Intel Collaborative Research Institute is concerned with how to enhance the social, economic and environmental well-being of cities by advancing compute, communication and social constructs to deliver innovations in system architecture, algorithms and societal participation.
http://www.cities.io/
MACHINE LEARNING RESEARCH GROUPS, in France
Magnet, MAchine learninG in information NETworks, INRIA
The Magnet project aims to design new machine-learning-based methods geared towards mining information networks. Information networks are large collections of interconnected data and documents, like citation networks and blog networks among others. For this, we will define new structured prediction methods for (networks of) texts based on machine learning algorithms in graphs. Such algorithms include node classification, link prediction, clustering and probabilistic modeling of graphs. Envisioned applications include browsing, monitoring and recommender systems, and more broadly information extraction in information networks. Application domains cover social networks for cultural data and e-commerce, and biomedical informatics.
https://team.inria.fr/magnet/
Sierra Team - Ecole Normale Supérieure, CNRS, INRIA
SIERRA is based in the Laboratoire d'Informatique de l'École Normale Supérieure (CNRS/ENS/INRIA UMR 8548) and is a joint research team between INRIA Rocquencourt, École Normale Supérieure de Paris and Centre National de la Recherche Scientifique.
We follow four main research directions:
- Supervised learning: This part of our research focuses on methods where, given a set of examples of input/output pairs, the goal is to predict the output for a new input, with research on kernel methods, calibration methods, structured prediction, and multi-task learning.
- Unsupervised learning: We focus here on methods where no output is given and the goal is to find structure of certain known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization, statistical tests, dimension reduction, and semi-supervised learning.
- Parsimony: The concept of parsimony is central to many areas of science. In the context of statistical machine learning, this takes the form of variable or feature selection. The team focuses primarily on structured sparsity, with theoretical and algorithmic contributions.
- Optimization: Optimization in all its forms is central to machine learning, as many of its theoretical frameworks are based at least in part on empirical risk minimization. The team focuses primarily on convex and bandit optimization.
http://www.di.ens.fr/sierra/
ENS Ecole Normale Superieure
The Computer Science Department of ENS (DI ENS) is both a teaching department and a research laboratory affiliated with CNRS and INRIA (UMR 8548). On the teaching side, the DI ENS trains students through its Pre-doctoral program and the Masters program (MPRI). On the research side, the research is structured into research groups. The DI ENS is a member of the Fondation Sciences Mathématiques de Paris. The Computer Services (SPI) and the Mathematics and Computer Science Library are common to the DI ENS and the Department of Mathematics and Applications (DMA).
Teams of the Computer Science Department at École normale supérieure:
- Antique: Static analysis by abstract interpretation (head: Xavier Rival)
- Cascade: Cryptography (head: David Pointcheval)
- Data: Signal Processing and Classification (head: Stéphane Mallat)
- Dyogene: Dynamics of Geometric Networks (head: Marc Lelarge)
- Parkas: Parallelism of Synchronous Kahn Networks (head: Marc Pouzet)
- Sierra: Machine Learning (head: Francis Bach)
- Talgo: Theory, Algorithms, topoLogy, Graphs, and Optimization (head: Claire Mathieu)
- Willow: Artificial Vision (head: Jean Ponce)
http://www.di.ens.fr/
WILLOW Publications and PhD Theses
Our research is concerned with representational issues in visual object recognition and scene understanding. Our objective is to develop geometric, physical, and statistical models for all components of the image interpretation process, including illumination, materials, objects, scenes, and human activities. These models will be used to tackle fundamental scientific challenges such as three-dimensional (3D) object and scene modeling, analysis, and retrieval; human activity capture and classification; and category-level object and scene recognition. They will also support applications with high scientific, societal, and/or economic impact, such as quantitative image analysis in domains such as archaeology and cultural heritage conservation; film post-production and special effects; and video annotation, interpretation, and retrieval. Moreover, machine learning now represents a significant part of computer vision research, and one of the aims of the project is to foster the joint development of contributions to machine learning and computer vision, together with algorithmic and theoretical work on generic statistical machine learning.
http://www.di.ens.fr/willow/publications/YearOnly/publications.html
Laboratoire Hubert Curien UMR CNRS 5516, Machine Learning
Group leader: Marc Sebban
Machine learning is the sub-field of artificial intelligence and computer science that studies how machines can learn. A machine learns when it modifies its own behavior as the result of its past experience and performance. Because of this need to analyze past experience, machine learning techniques are closely related to data mining techniques. The Machine Learning team is divided into two collaborating sub-projects, one specialising in statistical learning theory and the other in data mining and information retrieval.
In the statistical learning theory sub-project, the focus is on:
- Metric Learning
- Transfer Learning and Domain Adaptation
- Machine Learning for Computer Vision Applications
- Machine Learning for Natural Language Processing
In the data mining and information retrieval sub-project, the focus is on:
- Developing methods to efficiently mine structured data: documents, graphs, social networks, etc.
- Modeling heterogeneous structured documents for information retrieval
- Data Mining for Image and Video Analysis
http://laboratoirehubertcurien.fr/spip.php?rubrique28
MACHINE LEARNING RESEARCH GROUPS, in Germany
Max Planck Institute for Intelligent Systems, Tübingen site
Intelligent systems can optimise their structure and properties in order to function successfully within a complex, partially changing environment. Three sub-areas can be differentiated here: perception, learning and action. The scientists at the Max Planck Institute for Intelligent Systems are carrying out basic research and development of intelligent systems in all three sub-areas. Research expertise in the areas of computer science, material science and biology is brought together in one Institute, at two different sites.
Machine learning, image recognition, robotics and biological systems will be investigated in Tübingen, while so-called learning material systems, micro- and nanorobotics, as well as self-organisation will be explored in Stuttgart. Although the focus is on basic research, the Institute has a high potential for practical applications in, among other areas, robotics, medical technology, and innovative technologies based on new materials.
http://www.mpg.de/1342929/intelligenteSystemeTuebingen
BRML Research Lab, Institute of Informatics at the Technische Universität München
Patrick van der Smagt's BRML is a collaborative research lab of fortiss, an Institute at TUM; the Chair for Robotics and Embedded Systems, Institute of Informatics at the Technische Universität München; and the DLR Institute of Robotics and Mechatronics. The heart of our research is formed by machine learning. Within that, we focus on biomechanics and body-machine interfaces. We apply our methods to advanced rehabilitation and assistive robotics.
http://brml.org/
HCI, Heidelberg Collaboratory for Image Processing, Universität Heidelberg
The HCI is an "Industry on Campus" project established in the context of the German excellence initiative jointly by the University of Heidelberg and the following companies:... The HCI was established in January 2008 and moved to its new premises in March 2008. The HCI consists of four chairs and one associate group:
- Computer Vision (Ommer lab)
- Digital Image Processing (Jähne lab)
- Image and Pattern Analysis (Schnörr lab)
- Image Processing and Modelling (Garbe lab)
- Multidimensional Image Processing (Hamprecht lab)
The strategic concept of the HCI is built on the simple fact that basic problems in image processing are largely application-independent.
The approximately 80 scientists working in the HCI conduct basic research with the aim of providing cutting-edge solutions to basic image analysis problems for applications in industry, environmental and life sciences. The HCI is part of the institutional strategy of the University of Heidelberg within the Excellence Initiative.
http://hci.iwr.uni-heidelberg.de/
MACHINE LEARNING RESEARCH GROUPS, in Switzerland
EPFL Ecole Polytechnique Federale de Lausanne, Switzerland
Artificial Intelligence & Machine Learning
The modern world is full of artificial, abstract environments that challenge our natural intelligence. The goal of our research is to develop Artificial Intelligence that gives people the capability to master these challenges, ranging from formal methods for automated reasoning to interaction techniques that stimulate truthful elicitation of preferences and opinions. Another aspect is characterizing human intelligence and cognitive science, with applications in human-computer interaction and computer animation. Machine Learning aims to automate the statistical analysis of large complex datasets by adaptive computing. As a core strategy for meeting the growing demands of science and applications, it provides a data-driven basis for automated decision making and probabilistic reasoning. Machine learning applications at EPFL range from natural language and image processing to scientific imaging as well as computational neuroscience.
http://ic.epfl.ch/intelligence-artificielle-et-apprentissage-automatique
IDSIA: the Swiss AI Lab
The Swiss AI Lab IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) is a non-profit research institute for artificial intelligence, affiliated with both the Faculty of Informatics of the Università della Svizzera Italiana and the Department of Innovative Technologies of SUPSI, the University of Applied Sciences of Southern Switzerland.
We focus on machine learning (deep neural networks, reinforcement learning), operations research, data mining, and robotics.
IDSIA researchers win nine international competitions
Our neural networks research team has won nine international competitions in machine learning and pattern recognition. Follow the link to learn more about the methods that allowed us to achieve these results.
http://www.idsia.ch/
MACHINE LEARNING RESEARCH GROUPS, in Netherlands
Machine Learning Research Groups in The Netherlands
A large number of researchers and research groups are active in the broad area of machine learning, ranging from Bayesian inference to robotics and neural networks. A brief overview is collected here; the researchers can be contacted for more information.
http://www.mlplatform.nl/researchgroups/
MACHINE LEARNING RESEARCH GROUPS, in Poland
University of Warsaw, Dept. of Mathematics, Informatics and Mechanics, Algorithms group
Our research
The research of our group focuses on several branches of modern algorithmics and the underlying fields of discrete mathematics. The latter include combinatorics on words and on ordered sets, graph theory, formal languages, computational geometry, information theory, and the foundations of cryptography. The research on algorithms covers parallel and distributed algorithms, large scale algorithms, approximation and randomized algorithms, fixed-parameter and exponential-time algorithms, dynamic algorithms, radio algorithms, multi-party computations, and cryptographic protocols.
http://zaa.mimuw.edu.pl/
more to come …
MACHINE LEARNING RESEARCH GROUPS, in India
RESEARCH LABS, Department of Computer Science and Automation, IISc, Bangalore
The department houses a number of research labs, each dedicated to a focused area of research. The lab members comprise faculty, students (both ME and research students), and dedicated project staff.
The labs are usually equipped with specialized software and computing facilities, and carry out work on various projects in their area.
http://www.csa.iisc.ernet.in/research/research-reslabs.php
MLSIG: Machine Learning Special Interest Group, Indian Institute of Science
The Machine Learning Special Interest Group (MLSIG) is a group of faculty and students at the Indian Institute of Science in Bangalore who share interests in machine learning and related fields. The group enjoys the presence of several outstanding faculty engaged in cutting-edge research on a variety of aspects of machine learning and related fields, ranging from theoretical foundations to new algorithms as well as several exciting applications; highly motivated PhD and Master's research students who complement and expand the energy of the faculty; and close proximity and partnerships with a variety of industry research laboratories, both within Bangalore and outside the city.
http://drona.csa.iisc.ernet.in/~mlcenter/
More to come …
MACHINE LEARNING RESEARCH GROUPS, in China
Peking University School of Electronics Engineering and Computer Science
We have built strong cooperation with many famous academic organizations, e.g., University of California at Berkeley, University of California at Los Angeles, Stanford University, University of Illinois at Urbana-Champaign, Oxford University, University of Edinburgh, Paris High Division, University of Tokyo, Waseda University. These collaborations cover most of our research directions: from electronic communication and optical communication to quantum communication; from computer hardware and software to networks; from micro-electromechanical systems to nano techniques; from machine perception to machine intelligence.
Center for Information Science
Main Research Areas
• Machine Vision: Image processing, image and video compression, pattern recognition and machine learning, biometrics, 3-D visual information processing.
• Machine Audition: Computational auditory models, speech signal processing, spoken language processing, natural language processing, intelligent human-machine interaction.
• Intelligent Information Systems: Computational intelligence, multimedia resource organization and management, data mining and content-oriented massive information integration, analysis, processing and service.
• Physiology and Psychology for Machine Perception: Electro-physiology, psychophysics and neurophysiology of vision and audition, theories and methods of hearing rehabilitation.
http://www.cis.pku.edu.cn/
http://eecs.pku.edu.cn/eecs_english/CnterInfoScience.shtml
Institute of Computational Linguistics
Main Research Areas
• Comprehensive Language Knowledge Databases, including a large scale word-level information database for the Chinese language.
• Corpus-based NLP, including large scale corpus processing and statistical models and theories.
• Domain Knowledge Construction, including computational terminology and term database construction.
• Multilingual Semantic Lexicons, focusing on the study of a Chinese concept dictionary.
• Computer-aided Translation, focusing on translation methods for technical documents.
• Information Retrieval, Extraction and Summarization, including various levels of document processing such as document retrieval, topic extraction, summarization, and question answering.
http://eecs.pku.edu.cn/index.aspx?menuid=5&type=articleinfo&lanmuid=84&infoid=232&language=cn
http://eecs.pku.edu.cn/eecs_english/InstComputationalLinguistics.shtml
PKU Real course online
http://www.grids.cn/
University of Science and Technology of China, USTC
https://en.wikipedia.org/wiki/University_of_Science_and_Technology_of_China
Nanjing University Lamda Group
LAMDA is affiliated with the National Key Laboratory for Novel Software Technology and the Department of Computer Science & Technology, Nanjing University, China.
It is located at the Computer Science and Technology Building on the Xianlin campus of Nanjing University, mainly in Rm 910. The Founding Director of LAMDA is Prof. Zhi-Hua Zhou. "LAMDA" means "Learning And Mining from DatA". The main research interests of LAMDA include machine learning, data mining, pattern recognition, information retrieval, evolutionary computation, neural computation, and some other related areas. Currently our research mainly involves: ensemble learning, semi-supervised and active learning, multi-instance and multi-label learning, cost-sensitive and class-imbalance learning, metric learning, dimensionality reduction and feature selection, structure learning and clustering, theoretical foundations of evolutionary computation, improving comprehensibility, content-based image retrieval, web search and mining, face recognition, computer-aided medical diagnosis, bioinformatics, etc.
http://lamda.nju.edu.cn/MainPage.ashx
More to come …
MACHINE LEARNING RESEARCH GROUPS, in Russia
Moscow State University
http://www.msu.ru/
More to come …
MACHINE LEARNING RESEARCH GROUPS, in Australia
NICTA Machine Learning Research Group
We want to change the world. Machine learning is a powerful technology that can help solve almost any problem. We think about it differently to much of the machine learning research community. We focus on important and challenging problems such as:
• Navigating the world’s patent literature
• Finding sites for geothermal energy production
• Predicting the output of rooftop solar photovoltaic systems
• Building actionable data analytics for the enterprise
• Managing the traffic in large cities
• Predicting failures of widespread infrastructure
We develop new technologies to solve these problems and make them freely available or commercially deploy them. We regularly host visitors and regularly have job openings and opportunities for PhD students. If you also want to change the world, come and join us.
http://www.nicta.com.au/
ACADEMICS, USA
Andrew Ng, Stanford University
Andrew Ng is a Co-founder of Coursera and the Director of the Stanford AI Lab. In 2011 he led the development of Stanford University’s main MOOC (Massive Open Online Courses) platform and also taught an online Machine Learning class that was offered to over 100,000 students, leading to the founding of Coursera. Ng’s goal is to give everyone in the world access to a high quality education, for free. Today, Coursera partners with some of the top universities in the world to offer high quality free online courses. It is the largest MOOC platform in the world. Outside online education, Ng’s work at Stanford is on machine learning with an emphasis on deep learning. He also founded and led a project at Google to develop massive-scale deep learning algorithms. It resulted in the famous cat detector popularly known as the “Google cat”, in which a massive neural network with 1 billion parameters learned from unlabeled YouTube videos.
http://cs.stanford.edu/people/ang/?page_id=414
Emmanuel Candes, Stanford University
Research Areas: Compressive sensing, mathematical signal processing, computational harmonic analysis, statistics, scientific computing. Applications to the imaging sciences and inverse problems. Other topics of recent interest include theoretical computer science, mathematical optimization, and information theory.
http://statweb.stanford.edu/~candes/
Tom Mitchell, Carnegie Mellon University (CMU)
Dr. Mitchell works on new learning algorithms, such as methods for learning from labeled and unlabeled data. Much of his research is driven by applications of machine learning such as understanding natural language text, and analyzing fMRI brain image data to model human cognition.
http://www.cs.cmu.edu/~tom/
Robert Kass, CMU
Dr.
Kass has long-standing interests in the Bayesian approach to statistical inference, and has contributed to the development of Bayesian methods and their computational implementation. Over the past 10 years he has focused on statistical problems in neuroscience, especially in the analysis of signals coming from single neurons and from multiple neurons recorded simultaneously.
http://www.stat.cmu.edu/~kass/
Alexander J. Smola, CMU
Researcher, Google; Professor, Carnegie Mellon University
Interests
My primary research interest covers the following four areas:
• Scalability of algorithms. This means pushing algorithms to internet scale, distributing them on many (faulty) machines, showing convergence, and modifying models to fit these requirements. For instance, randomized techniques are quite promising in this context. In other words, I'm interested in big data.
• Kernel methods are quite an effective means of making linear methods nonlinear and nonparametric. My research interests include support vector machines, Gaussian processes, and conditional random fields. Kernels are also very useful for the representation of distributions, that is, two-sample tests, independence tests and many applications to unsupervised learning.
• Statistical modeling, primarily with Bayesian nonparametrics, is a great way of addressing many modeling problems. Quite often, the techniques overlap with kernel methods and scalability in rather delightful ways.
• Applications, primarily in terms of user modeling, document analysis, temporal models, and modeling data at scale, are a great source of inspiration. That is, how can we find principled techniques to solve the problem, what are the underlying concepts, how can we solve things automatically?
http://alex.smola.org/
https://www.youtube.com/channel/UCYoS2VT03weLA7uzvL2Vybw?spfreload=10
Maria-Florina Balcan, CMU
Research
My main research interests are in machine learning and theoretical computer science.
I am a member of the machine learning group, the computer science theory group, and the ACO program. Current research focus includes:
- Developing foundations and principled, practical algorithms for important modern learning paradigms. These include interactive learning, distributed learning, multi-task learning, and lifelong learning. My research formalizes and explicitly addresses all constraints and important challenges of these new settings, including statistical efficiency, computational efficiency, noise tolerance, limited supervision or interaction, privacy, low communication, and incentives.
- Analyzing the overall behavior of complex systems in which multiple agents with limited information are adapting their behavior based on past experience, in both social and engineered systems contexts.
- Computational aspects in game theory and economics.
- Analysis of algorithms beyond the worst case and, more generally, identifying interesting and realistic models of computation that provide a better alternative to traditional worst-case models in a broad range of optimization problems.
http://www.cs.cmu.edu/~ninamf/
Abulhair Saparov, CMU
http://www.cs.cmu.edu/directory/abulhair-saparov
John Canny, University of California, Berkeley
John F. Canny (born 1953) is an Australian computer scientist, and Paul and Stacy Jacobs Distinguished Professor of Engineering in the Computer Science Department of the University of California, Berkeley. He has made significant contributions in various areas of computer science and mathematics including artificial intelligence, robotics, computer graphics, human-computer interaction, computer security, computational algebra, and computational geometry.
http://www.cs.berkeley.edu/~jfc/papers/grouped.html
http://www.eecs.berkeley.edu/Faculty/Homepages/canny.html
Robert Schapire, Princeton University
Robert Elias Schapire is the David M. Siegel '83 Professor in the computer science department at Princeton University.
His primary specialty is theoretical and applied machine learning. His work led to the development of the boosting meta-algorithm used in machine learning. Together with Yoav Freund, he invented the AdaBoost algorithm in 1996. He received the Gödel Prize in 2003 for his work on AdaBoost with Yoav Freund. In 2014, Schapire was elected to the National Academy of Engineering for his contributions to machine learning through the invention and development of boosting algorithms. (Source: Wikipedia)
http://www.cs.princeton.edu/~schapire/
http://mitpress.mit.edu/sites/default/files/titles/content/9780262017183_sch_0001.pdf

Mona Singh, Princeton University
My group develops algorithms for a diverse set of problems in computational molecular biology. We are particularly interested in predicting specificity in protein interactions and uncovering how molecular interactions and functions vary across context, organisms and individuals. We leverage high-throughput biological datasets in order to develop data-driven algorithms for predicting protein interactions and specificity; for analyzing biological networks in order to uncover cellular organization, functioning, and pathways; for uncovering protein functions via sequences and structures; and for analyzing proteomics and sequencing data. An appreciation of protein structure guides much of our research.
http://www.cs.princeton.edu/~mona/

Olga Troyanskaya, Princeton University
The goal of my research is to bring the capabilities of computer science and statistics to the study of gene function and regulation in biological networks through integrated analysis of biological data from diverse data sources, both existing and yet to come (e.g. from diverse gene expression data sets and proteomic studies). I am designing systematic and accurate computational and statistical algorithms for biological signal detection in high-throughput data sets.
More specifically, I am interested in developing methods for better gene expression data processing, and algorithms for integrated analysis of biological data from multiple genomic data sets and different types of data sources (e.g. genomic sequences, gene expression, and proteomics data).
http://reducio.princeton.edu/cm/node/13

Judea Pearl, Cognitive Systems Laboratory, UCLA
Judea Pearl (born 1936) is an Israeli-born American computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks (see the article on belief propagation). He is also credited with developing a theory of causal and counterfactual inference based on structural models (see the article on causality). He is the 2011 winner of the ACM Turing Award, the highest distinction in computer science, "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning". (Source: Wikipedia)
http://bayes.cs.ucla.edu/csl_papers.html

Justin Esarey Lectures, Assistant Professor of Political Science, Rice University
Dr. Justin Esarey is an Assistant Professor of Political Science at Rice University who specializes in political methodology. His areas of expertise include detecting and presenting context-specific relationships, model specification and sensitivity, the analysis of binary data, laboratory social experimentation, and promoting thoughtful inference (and thinking about inference) by using technology to make methodological resources available to the scholarly public. His recent substantive projects study the relationship between corruption and female participation in government, the effect of "naming and shaming" on human rights abuse, and the behavioral implications of political ideology.
https://www.youtube.com/user/jeesarey/videos?spfreload=10

Justin Esarey Publications & Software, Assistant Professor of Political Science, Rice University
http://jee3.web.rice.edu/research.htm

Hal Daumé III, University of Maryland
I am Hal Daumé III, an Associate Professor in Computer Science (also UMIACS and Linguistics) at the University of Maryland; I was previously in the School of Computing at the University of Utah (CV). Although I'd like to be known for my research in language (computational linguistics and natural language processing) and machine learning (structured prediction, domain adaptation and Bayesian methods), I am probably best known for my NLPers blog. I associate myself most with conferences like ACL, ICML, EMNLP and NIPS. At UMD, I'm affiliated with the Computational Linguistics lab, the machine learning reading group, the language science program and the AI group, and interact closely with LINQS and computer vision.
http://www.umiacs.umd.edu/~hal/

Melanie Mitchell, Portland State University
Research My research interests: artificial intelligence, machine learning, and complex systems. Evolutionary computation and artificial life. Understanding how natural systems perform computation, and how to use ideas from natural systems to develop new kinds of computational systems. Cognitive science, particularly computer modeling of perception and analogy-making, emergent computation and representation, and philosophical foundations of cognitive science.
Biographical Sketch Melanie Mitchell is Professor of Computer Science at Portland State University, and External Professor and Member of the Science Board at the Santa Fe Institute. She attended Brown University, where she majored in mathematics and did research in astronomy, and the University of Michigan, where she received a Ph.D. in computer science. Her dissertation, in collaboration with her advisor Douglas Hofstadter, was the development of Copycat, a computer program that makes analogies.
She has held faculty or professional positions at the University of Michigan, the Santa Fe Institute, Los Alamos National Laboratory, the OGI School of Science and Engineering, and Portland State University. She is the author or editor of five books and over 70 scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her most recent book, Complexity: A Guided Tour (Oxford, 2009), won the 2010 Phi Beta Kappa Science Book Award. It was also named by Amazon.com as one of the ten best science books of 2009, and was longlisted for the Royal Society's 2010 book prize. Melanie directs the Santa Fe Institute's Complexity Explorer project, which offers online courses and other educational resources related to the field of complex systems.
http://web.cecs.pdx.edu/~mm/

ACADEMICS, in France

Francis Bach, École Normale Supérieure
I am a researcher at INRIA, leading since 2011 the SIERRA project-team, which is part of the Computer Science Laboratory at École Normale Supérieure. I completed my Ph.D. in Computer Science at U.C. Berkeley, working with Professor Michael Jordan, and spent two years in the Mathematical Morphology group at École des Mines de Paris. I then joined the WILLOW project-team at INRIA/École Normale Supérieure from 2007 to 2010. I am interested in statistical machine learning, and especially in graphical models, sparse methods, kernel-based learning, convex optimization, vision and signal processing.
http://www.di.ens.fr/~fbach/

Gaël Varoquaux, INRIA
Machine learning and brain imaging researcher
♣ Research faculty (CR1), Parietal team, INRIA
♣ Associate researcher, Unicog team, INSERM
ACADEMIC RESEARCH
Machine learning to link cognition with brain activity: I am interested in data mining of functional brain images (fMRI) to learn models of brain function.
♣ Machine learning for encoding / decoding models
♣ Spatial penalties for learning and denoising
♣ Resting-state methods
♣ Functional parcellations of the brain
♣ Functional connectivity
More... My publications page and my Google Scholar page. Research at Parietal
OPEN-SOURCE SOFTWARE
Core contributor to scientific computing in Python:
• scikit-learn: machine learning in Python
• joblib: lightweight pipelining of scientific code
• Mayavi: 3D plotting and scientific visualization
• nilearn: machine learning for neuroimaging
More... I am editor of the scipy lecture notes. See my view on scientific computing.
http://gael-varoquaux.info/

ACADEMICS, in United Kingdom

John Shawe-Taylor, University College London
John S. Shawe-Taylor is a professor at University College London (UK), where he is Director of the Centre for Computational Statistics and Machine Learning (CSML). His main research area is Statistical Learning Theory, but his contributions range from Neural Networks to Machine Learning to Graph Theory. John Shawe-Taylor obtained a PhD in Mathematics at Royal Holloway, University of London in 1986. He subsequently completed an MSc in the Foundations of Advanced Information Technology at Imperial College. He was promoted to Professor of Computing Science in 1996. He has published over 150 research papers. He moved to the University of Southampton in 2003 to lead the ISIS research group. He was appointed Director of the Centre for Computational Statistics and Machine Learning at University College London in July 2006. He has coordinated a number of Europe-wide projects investigating the theory and practice of Machine Learning, including the NeuroCOLT projects. He is currently the scientific coordinator of a Framework VI Network of Excellence in Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) involving 57 partners.
http://www0.cs.ucl.ac.uk/staff/J.Shawe-Taylor/

Mark Herbster, University College London
My research currently focuses on the problem of predicting a labeling of a graph. This problem is foundational for transductive and semi-supervised learning. Initial bounds and experimental results are given in "Online learning over graphs". The paper "Prediction on a graph with a perceptron" significantly improves on previous results in terms of the tightness and interpretability of the bounds. In the recent work "A fast method to predict the labeling of a tree" we've developed methods to speed up graph prediction methods. I am also broadly interested in online learning; see my publications page for more details.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/pubs/

David Barber, University College London
David Barber received a BA in Mathematics from Cambridge University and subsequently a PhD in Theoretical Physics (Statistical Mechanics) from Edinburgh University. He is currently Reader in Information Processing in the Department of Computer Science, UCL, where he develops novel information processing schemes, mainly based on the application of probabilistic reasoning. Prior to joining UCL he was a lecturer at Aston and Edinburgh Universities.
http://web4.cs.ucl.ac.uk/staff/d.barber/publications/david_barber_online.html

Gabriel Brostow, University College London
My name is Gabriel Brostow, and I am an associate professor (Senior Lecturer) in Computer Science here at UCL. My group explores research problems relating to Computer Vision and Computer Graphics. The students and colleagues here have diverse interests, but my focus is on "Smart Capture" for analysis and synthesis applications. To me, smart capture of visual data (usually video) means having or finding satisfying answers to these questions about a system, whether interactive or fully automated: I) Does the system know the intended purpose of the data being captured? II) Can the system assess its own accuracy?
III) Does the system compare new inputs to old ones? I love this field because it allows us to apply our expertise to a variety of tough problems, including film and photo special effects (computational photography), action analysis (of people, animals, and cells), and authoring systems (for architecture, animation, presentations) that make the most of user effort. "Motion reveals everything" used to be my main research mantra, but that idea has now taken hold sufficiently (obviously NOT just through my efforts!) that it no longer needs championing.
http://www0.cs.ucl.ac.uk/staff/g.brostow/#Research

Jun Wang, University College London
My research focus is on the areas of information retrieval, large-scale data mining, multimedia content analysis, and statistical pattern recognition; current research covers both theoretical and practical aspects: portfolio theory and statistical modeling of information retrieval; data mining and collaborative filtering (recommendation); web economy and online advertising; user-centric information seeking; social ("the wisdom of crowds") approaches for content understanding, organisation, and retrieval; peer-to-peer information retrieval and filtering; and multimedia content analysis, indexing and retrieval.
https://scholar.google.com/citations?user=wIE1tY4AAAAJ&hl=en

David Jones Lab, University College London
My main research interests are in protein structure prediction and analysis, simulations of protein folding, Hidden Markov Model methods, transmembrane protein analysis, machine learning applications in bioinformatics, de novo protein design methodology, and genome analysis including the application of intelligent software agents. New areas of research include the use of high-throughput computing and Grid technology for bioinformatics applications, analysis and prediction of protein disorder, expression array data analysis, and the analysis and prediction of protein function and protein-protein interactions.
http://bioinf.cs.ucl.ac.uk/publications/

Simon Prince, University College London
My initial work addressed human stereo vision. My doctoral thesis concerned the solution of the binocular stereo correspondence problem in the human visual system. I also worked on the physiology of stereo vision in my subsequent post-doctoral research. I became interested in computer vision and made the switch in 2000. My first Computer Science research was on time-series methods for the solution of the inverse problem in Optical Tomography, with Simon Arridge at UCL. In Singapore, I worked for several years on augmented reality. This involved developing algorithms for camera pose estimation, and a three-dimensional video-conferencing system using real-time image-based rendering. More recently, I have worked on face detection in a novel foveated sensor system. I am interested in face recognition in general and have presented work on how to recognize faces in the presence of large pose and lighting changes. I am interested in most areas of computer vision and computer graphics, and still maintain active links with the neuroscience and medical imaging communities.
http://web4.cs.ucl.ac.uk/research/vis/pvl/
http://www.computervisionmodels.com/

Massimiliano Pontil, University College London
I am mainly interested in machine learning theory and pattern recognition. I also have some interest in function representation and approximation, numerical optimization, and statistics. I have worked on different machine learning approaches, particularly on regularization methods, such as support vector machines and other kernel-based methods, multi-task and transfer learning, online learning, and learning over graphs. I have also worked on machine learning applications arising in computer vision, natural language processing, bioinformatics, and user modeling.
http://www0.cs.ucl.ac.uk/staff/M.Pontil/pubs.html

Richard E. Turner, Cambridge University
Richard Turner holds a Lectureship (equivalent to US Assistant Professor) in Computer Vision and Machine Learning in the Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, UK. Before taking up this position, he held an EPSRC Postdoctoral Research Fellowship, which he spent at both the University of Cambridge and the Laboratory for Computational Vision, NYU, USA. He has a PhD in Computational Neuroscience and Machine Learning from the Gatsby Computational Neuroscience Unit, UCL, UK, and an M.Sci. degree in Natural Sciences (specialism Physics) from the University of Cambridge, UK.
https://scholar.google.com/citations?user=DgLEyZgAAAAJ&hl=en

Andrew McHutchon Homepage, Cambridge University
Before starting my PhD I took the MEng course at Cambridge and specialised in Information Engineering in my third and fourth years. In particular I studied control, bioinformatics, and some information theory and statistics. As part of the MEng year I undertook a research project with Carl Rasmussen on applying Machine Learning techniques to control; this has now continued on into my PhD research. Other avenues of research I have so far looked at include fast approximations to Gaussian Processes for uncertain inputs, and training GPs with input noise. I am a member of Churchill College.
http://mlg.eng.cam.ac.uk/?portfolio=andrew-mchutchon

Phil Blunsom, Oxford University
My research interests lie at the intersection of machine learning and computational linguistics. I apply machine learning techniques, such as graphical models, to a range of problems relating to the understanding, learning and manipulation of language.
Recently I have focused on structural induction problems such as grammar induction and learning statistical machine translation models.
https://scholar.google.co.uk/citations?user=eJwbbXEAAAAJ&hl=en

Nando de Freitas, Oxford University
I want to understand intelligence and how minds work. My research is multi-disciplinary and focuses primarily on the following areas:
Machine learning, big data, and computational statistics
Artificial intelligence, probabilistic reasoning, and decision making
Computational neuroscience, neural networks, and cognitive science
Randomized algorithms and Monte Carlo simulation
Vision, robotics, and speech perception
http://scholar.google.co.uk/citations?user=nzEluBwAAAAJ&hl=en

Karl Hermann, Oxford University
My research is at the intersection of Natural Language Processing and Machine Learning, with particular emphasis on semantics. Current topics of interest include:
Compositional Semantics
Learning from Multilingual Data
Semantic Frame Identification
Machine Translation
Hypergraph Grammars
http://www.cs.ox.ac.uk/people/publications/personal/KarlMoritz.Hermann.html

Edward Grefenstette, Oxford University
I am a Franco-American computer scientist, working as a research assistant on EPSRC Project EP/I03808X/1, entitled "A Unified Model of Compositional and Distributional Semantics: Theory and Applications". I am also lecturing at Hertford College to students taking Oxford's new computer science and philosophy course. From October 2013, I will also be a Fulford Junior Research Fellow at Somerville College.
http://www.cs.ox.ac.uk/people/publications/date/Edward.Grefenstette.html

ACADEMICS, in Netherlands

Thomas Geijtenbeek Publications & Videos, Delft University of Technology
I am a postdoctoral researcher at Delft University of Technology. My main research interests are simulation, control, animation and artificial intelligence. In addition, I work part-time as Manager of Software Development at Motek Medical.
http://goatstream.com/research/

ACADEMICS, in Canada

Yoshua Bengio, University of Montreal
My long-term goal is to understand intelligence; understanding the underlying principles would deliver artificial intelligence, and I believe that learning algorithms are essential in this quest. Machine learning algorithms attempt to endow machines with the ability to capture operational knowledge through examples, e.g., allowing a machine to classify or predict correctly in new cases. Machine learning research has been extremely successful in the past two decades and is now applied in many areas of science and technology, with well-known examples including web search engines, natural language translation, speech recognition, machine vision, and data mining. Yet machines still seem to fall short of even mammal-level intelligence in many respects. One of the remaining frontiers of machine learning is the difficulty of learning the kind of complicated and highly varying functions that are necessary to perform machine vision or natural language processing tasks at a level comparable to humans (even a 2-year-old). See my lab's long-term vision web page for a broader introduction. An introductory discussion of recent and ongoing research is below. See the lab's publications site for a downloadable and complete bibliographic list of my papers.
http://www.iro.umontreal.ca/~bengioy/yoshua_en/research.html
http://www.iro.umontreal.ca/~bengioy/yoshua_en/

Deep Learning Slides by Yoshua Bengio, MLSS 2015, Austin, Texas
http://www.iro.umontreal.ca/~bengioy/talks/mlss-austin.pdf

KyungHyun Cho, University of Montreal
http://www.kyunghyuncho.me/home

Deep Learning Tutorial at KAIST Slides
https://drive.google.com/file/d/0B16RwCMQqrtdb05qdDFnSXprM0E/edit?pli=1

Geoffrey Hinton, University of Toronto
I design learning algorithms for neural networks.
My aim is to discover a learning procedure that is efficient at finding complex structure in large, high-dimensional datasets, and to show that this is how the brain learns to see. I was one of the researchers who introduced the back-propagation algorithm that has been widely used for practical applications. My other contributions to neural network research include Boltzmann machines, distributed representations, time-delay neural nets, mixtures of experts, variational learning, contrastive divergence learning, dropout, and deep belief nets. My students have changed the way in which speech recognition and object recognition are done. I now work part-time at Google and part-time at the University of Toronto.
http://www.cs.toronto.edu/~hinton/papers.html
http://www.cs.toronto.edu/~hinton/

Alex Graves, University of Toronto
Research Interests
Recurrent neural networks (especially LSTM)
Supervised sequence labelling (especially speech and handwriting recognition)
Unsupervised sequence learning
http://www.cs.toronto.edu/%7Egraves/

Hugo Larochelle, Université de Sherbrooke
I am interested in machine learning algorithms, that is, algorithms capable of extracting concepts or patterns from data. My work focuses on developing connectionist and probabilistic approaches to various problems in artificial intelligence, such as computer vision and natural language processing. My research topics include:
Problems: supervised, semi-supervised and unsupervised learning, structured output prediction, ranking, density estimation;
Models: deep neural networks (deep learning), autoencoders, Boltzmann machines, Markov random fields;
Applications: object recognition and tracking, document classification and ranking.
http://www.dmi.usherb.ca/~larocheh/index_fr.html
http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html

Giuseppe Carenini, University of British Columbia
http://www.cs.ubc.ca/%7Ecarenini/storage/new-papers-frame.html
Cristina Conati, University of British Columbia
http://www.cs.ubc.ca/~conati/publications.php
Kevin Leyton-Brown, University of British Columbia
http://www.cs.ubc.ca/~kevinlb/publications.html
Holger Hoos, University of British Columbia
http://www.cs.ubc.ca/~hoos/publications.html
Jim Little, University of British Columbia
http://www.cs.ubc.ca/~little/links/papers.html
David Lowe, University of British Columbia
http://www.cs.ubc.ca/~lowe/pubs.html
Karon MacLean, University of British Columbia
http://www.cs.ubc.ca/labs/spin/publications/index.html
Alan Mackworth, University of British Columbia
http://www.cs.ubc.ca/~mack/Publications/sort_date.html
Dinesh K. Pai, University of British Columbia
http://www.cs.ubc.ca/~pai/
David Poole, University of British Columbia
http://www.cs.ubc.ca/~poole/publications.html

Prof. Shai Ben-David, University of Waterloo
Research Interests My research interests span a wide spectrum of topics in the foundations of computer science and its applications, with a particular emphasis on statistical and computational machine learning. The common thread throughout my research is aiming to provide mathematical formulation and understanding of real-world problems. In particular, I have been looking at popular machine learning and data mining paradigms that seem to lack clear theoretical justification.
https://cs.uwaterloo.ca/~shai/
http://videolectures.net/shai_ben_david/

ACADEMICS, in Germany

Machine Learning Lab, University of Freiburg
Future computer programs will contain a growing part of 'intelligent' software modules that are not conventionally programmed, but that are learned either from data provided by the user or from data that the program autonomously collects during its use.
In this spirit, the Machine Learning Lab conducts research on machine learning techniques and the integration of learning modules into larger software systems, aiming at their effective application to complex real-world problems. Application areas are robotics, control, forecasting and disposition systems, scheduling, and related fields. Research Areas: Efficient Reinforcement Learning Algorithms, Intelligent Robot Control Architectures, Learning in Multiagent Systems, (Un-)Supervised Learning, Deep Learning, Autonomous Robots, Industrial Applications, Clinical Applications
http://ml.informatik.uni-freiburg.de

ACADEMICS, in China

En-Hong Chen, USTC
My current research interests are data mining and machine learning, especially social network analysis and recommender systems. I have published more than 100 papers in journals and conference proceedings, including international journals such as IEEE Transactions and ACM Transactions, and important data mining conferences such as KDD, ICDM, and NIPS. My research is supported by the National Natural Science Foundation of China, the National High Technology Research and Development Program (863) of China, etc. I won the Best Application Paper Award at KDD 2008 and the Best Research Paper Award at ICDM 2011.
http://staff.ustc.edu.cn/~cheneh/#pub

Linli Xu, USTC
My research area is Machine Learning. More specifically, my work combines aspects of the following:
• Unsupervised learning and semi-supervised learning, clustering
• Large-margin approaches, support vector machines
• Optimization, convex programming
http://staff.ustc.edu.cn/~linlixu/papers.html

Yuan Yao, School of Mathematical Sciences, Peking University
My most recent interests focus on mathematics for data sciences, in particular topological and geometric methods for high-dimensional data analysis and statistical machine learning, with applications in computational biology and information technology.
Publications and code to reproduce results:
http://www.math.pku.edu.cn/teachers/yaoy/research.html

ACADEMICS, in Australia

Prof. Peter Corke, Queensland University of Technology
Software for robotics, vision and other things. This includes the Robotics and Machine Vision Toolboxes for Matlab. More recently this has become a book, and then two MOOCs. Everything is freeware, so enjoy!
About I live in Brisbane with my wife, two daughters and a cat. By day I'm a professor at Queensland University of Technology. My interests include robotics, computer vision, embedded systems, control and networking. I've worked on robotic systems for mining, aerial and underwater applications. By night I maintain two open-source toolboxes, one for robotics and one for vision, and have just finished writing a book on robotics, vision & control, which will be published September 2011.
http://www.petercorke.com/Home.html

ACADEMICS, in United Arab Emirates

Dmitry Efimov, American University of Sharjah, UAE
Dmitry is an expert in promising areas of modern complex and functional analysis, and the author of original results. He began with the systematic study of some classes of analytic functions in the half-plane that are analogous to the well-known Privalov classes and maximal Privalov classes in the disc. His main results are the following: 1) a new factorization formula and accurate estimates of growth for functions in these classes; 2) the introduction of natural invariant metrics under which the classes form Fréchet algebras; 3) a complete description of the linear isometries, as well as of the bounded and completely bounded subsets, in these classes.
https://www2.aus.edu/facultybios/profile.php?faculty=defimov
https://www.kaggle.com/users/29346/dmitry-efimov

ACADEMICS, in Poland

Marcin Mucha, University of Warsaw, POLAND
I am an assistant professor at the Institute of Informatics, University of Warsaw, and a member of the Algorithms Group (see our blog!).
I work on graph algorithms, approximation algorithms and on-line algorithms. You can find most of my papers at DBLP or here. You can find my PhD thesis here; it contains a rather detailed exposition of the algebraic approach to matching problems in graphs.
http://duch.mimuw.edu.pl/~mucha/wordpress/?page_id=58

ACADEMICS, in Switzerland

Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be missed!)
Prof. Jürgen Schmidhuber's Artificial Intelligence team has won nine international competitions in machine learning and pattern recognition (more than any other AI research group) and seven independent best paper/best video awards, achieved the world's first superhuman visual classification results, has pioneered Deep Learning methods for Artificial Neural Networks since 1991 (see "Deep Learning since 1991 - Winning Contests in Pattern Recognition and Sequence Learning Through Fast & Deep / Recurrent Neural Networks"), and established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He generalized algorithmic information theory, and the many-worlds theory of physics, to obtain a minimal theory of all constructively computable universes: an elegant algorithmic theory of everything. Google & Apple and many other leading companies are now using the machine learning techniques developed in his group at the Swiss AI Lab IDSIA & USI & SUPSI (ex-TUM CogBotLab). Since age 15 or so his main scientific ambition has been to build an optimal scientist through self-improving AI, then retire. Progress is accelerating; are 40,000 years of human-dominated history about to converge within the next few decades?
http://people.idsia.ch/~juergen/

Free access to ML MSc & PhD Dissertations

Machine Learning Department, Carnegie Mellon University
https://www.ml.cmu.edu/research/phd-dissertations.html

Machine Learning Department, Columbia University (search for "PhD" on the page)
http://www.cs.columbia.edu/learning/papers.html

Nonlinear Modelling and Control using Gaussian Processes, PhD Thesis by Andrew McHutchon, Cambridge University
Abstract ... In this thesis we start by discussing how GPs can be applied to data sets which have noise affecting their inputs. We present the 'Noisy Input GP', which uses a simple local linearisation to refer the input noise into heteroscedastic output noise, and compare it to other methods both theoretically and empirically. We show that this technique leads to an effective model for nonlinear functions with input and output noise. We then consider the broad topic of GP state space models for application to dynamical systems. We discuss a very wide variety of approaches for using GPs in state space models, including introducing a new method based on moment-matching, which consistently gave the best performance. We analyse the methods in some detail, including providing a systematic comparison between approximate-analytic and particle methods. To our knowledge such a comparison has not been provided before in this area. Finally, we investigate an automatic control learning framework, which uses Gaussian Processes to model a system for which we wish to design a controller. Controller design for complex systems is a difficult task, and thus a framework which allows an automatic design directly from data promises to be extremely useful. We demonstrate that the previously published framework cannot cope with the presence of observation noise, but that the introduction of a state space model dramatically improves its performance.
This contribution, along with some other suggested improvements, opens the door for this framework to be used in real-world applications.
http://mlg.eng.cam.ac.uk/pub/pdf/Mch14.pdf

PhD Dissertations, University of Edinburgh, UK
https://www.era.lib.ed.ac.uk

MSc Dissertations, University of Oxford, UK (a list of some recent theses that received high marks)
https://www.cs.ox.ac.uk/admissions/grad/

Machine Learning Group, Department of Engineering, University of Cambridge, UK (search for "PhD" on the page)
http://mlg.eng.cam.ac.uk/pub/

New York University Computer Science PhD Theses
http://www.cs.nyu.edu/web/Research/theses.html

Digital Collection of The Australian National University (PhD Theses)
https://digitalcollections.anu.edu.au/handle/1885/3/simple-search?query=machine learning&rpp=10&sort_by=0&order=DESC&etal=0&submit_search=Update

TEL (thèses-EN-ligne) (more than 45,000 theses, though some are in French!)
The purpose of TEL (thèses-EN-ligne) is to facilitate the self-archiving of thesis manuscripts, which are important documents for direct scientific communication between scientists. TEL is actually a particular "environment" of HAL. It therefore has the same objective: to make scientific documents available to scientists all around the world, rapidly and freely, but with a restriction to PhD theses and habilitations (HDR, in countries where habilitations exist). CCSD does not make any scientific evaluation of the theses that are submitted, since this is the responsibility of the university professors on the examination board.
https://tel.archives-ouvertes.fr/browse/domain