
The Machine
Learning Salon
Starter Kit
Jacqueline Isabelle Forien
1st Edition - Summer 2015
ABOUT................................................................................37
The Machine Learning Salon Starter Kit....................................................37
Founder of The Machine Learning Salon...................................................38
MOOC, Opencourseware in English...................................39
COURSERA: Machine Learning Stanford Course....................................39
COURSERA: Practical Machine Learning.................................................39
COURSERA: Neural Networks for Machine Learning..............................40
COURSERA: Data Science Specialization.................................................40
COURSERA: Reasoning, Data Analysis and Writing Specialization.........42
COURSERA: Data Mining Specialization.................................................43
COURSERA: Cloud Computing Specialization.........................................46
COURSERA: Miscellaneous.......................................................................48
STANFORD University: Stanford Engineering Everywhere......................52
STANFORD University: 2015 Stanford HPC Conference Video Gallery..........53
STANFORD University: Awni Hannun of Baidu Research.......................53
STANFORD University: Steve Cousins of Savioke....................................53
STANFORD University: Ron Fagin of IBM Research...............................54
STANFORD University: CS224d: Deep Learning for Natural Language
Processing by Richard Socher, 2015............................................................54
EdX: Artificial Intelligence (BerkeleyX).......................................................55
EdX: Big Data and Social Physics (Ethics)...................................................55
EdX: Introduction to Computational Thinking and Data Science.............56
MIT OpenCourseWare (OCW)...................................................................56
VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos..........56
Foundations of Machine Learning by Mehryar Mohri - 10 years of
Homeworks with Solutions and Lecture Slides............................................57
Carnegie Mellon University (CMU) Video resources..................................57
CMU: Convex Optimization, Fall 2013, by Barnabas Poczos and Ryan
Tibshirani.....................................................................................................58
CMU: Machine Learning, Spring 2011, by Tom Mitchell..........................58
CMU: 10-601 Machine Learning Spring 2015 - Lecture 18 by Maria-Florina
Balcan...........................................................................................................59
CMU: 10-601 Machine Learning Spring 2015, Homeworks & Solutions &
Code (Matlab)...............................................................................................59
CMU: 10-601 Machine Learning Spring 2015 - Recitation 10 by Kirstin Early....59
CMU: Abulhair Saparov’s Youtube Channel..............................................59
CMU: Machine Learning Course by Roni Rosenfeld, Spring 2015............59
CMU: Language and Statistics by Roni Rosenfeld, Spring 2015................60
Metacademy Concept list and roadmap list.................................................60
HARVARD University: Advanced Machine Learning, Fall 2013...............61
HARVARD University: Data Science Course, Fall 2013.............................61
OXFORD University: Nando de Freitas Video Lectures............................61
OXFORD University: Deep learning - Introduction by Nando de Freitas, 2015.....62
OXFORD University: Deep learning - Linear Models by Nando de Freitas,
2015..............................................................................................................62
OXFORD University: Yee Whye Teh Home Page, Department of Statistics,
University College........................................................................................62
CAMBRIDGE University: Machine Learning Slides, Spring 2014............63
CALTECH University: Learning from Data...............................................63
UNIVERSITY COLLEGE LONDON (UCL): Discovery.........................63
UCL: Supervised Learning by Mark Herbster............................................64
Yann LeCun’s Publications...........................................................................64
École Normale Supérieure: Francis Bach, Courses and Exercises with solutions
(English-French) ...........................................................................................64
Technion, Israel Institute of Technology, Machine Learning Videos..........65
E0 370: Statistical Learning Theory by Prof. Shivani Agarwal, Indian Institute
of Science.....................................................................................................66
NPTEL, National Programme on Technology Enhanced Learning, India..........67
Pattern Recognition Class, Universität Heidelberg, 2012 (Videos in English)......67
Videolectures.net..........................................................................................70
MLSS Machine Learning Summer Schools Videos....................................70
GoogleTechTalks..........................................................................................71
Udacity Opencourseware.............................................................................71
Udacity's Videos ..........................................................................................73
Mathematicalmonk Machine Learning.......................................................73
Judea Pearl Symposium................................................................................73
SIGDATA, Indian Institute of Technology Kanpur....................................74
Hakka Labs ..................................................................................................74
Open Yale Course........................................................................................74
COLUMBIA University: Machine Learning resources...............................74
COLUMBIA University: Applied Data Science by Ian Langmore and Daniel
Krasner.........................................................................................................75
Deep Learning..............................................................................................75
BigDataWeek Videos....................................................................................76
Neural Information Processing Systems Foundation (NIPS) Video resources.......76
NIPS 2014 Workshop Videos.......................................................................76
NIPS 2014 Workshop - (Bengio) OPT2014 Optimization for Machine Learning
......................................................................................................................77
Hong Kong Open Source Conference 2013 (English&Chinese) ................77
ICLR 2014 Videos.......................................................................................77
ICLR 2013 Videos.......................................................................................78
Machine Learning Conference Videos........................................................78
Internet Archive...........................................................................................79
University of Berkeley..................................................................................80
AMP Camps, Big Data Bootcamp, UC Berkeley ........................................80
AI on the Web, AIMA (Artificial Intelligence: A Modern Approach) by Stuart
Russell and Peter Norvig..............................................................................80
Resources and Tools of Noah's ARK Research Group...............................80
ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014.........81
The Royal Society .......................................................................................82
Statistical and causal approaches to machine learning by Professor Bernhard
Schölkopf......................................................................................................83
Deep Learning RNNaissance with Dr. Juergen Schmidhuber.....................83
Introduction to Deep Learning with Python by Alec Radford....................83
A Statistical Learning/Pattern Recognition Glossary by Thomas Minka...83
The Kalman Filter Website by Greg Welch and Gary Bishop.....................83
Lisbon Machine Learning School (LXMLS)...............................................84
LXMLS Slides, 2014....................................................................................85
INTRODUCTORY APPLIED MACHINE LEARNING by Victor Lavrenko
and Nigel Goddard, University of Edinburgh, 2011...................................86
Data Mining and Machine Learning Course Material by Bamshad Mobasher, DePaul University, Fall 2014........................................................................86
Intelligent Information Retrieval by Bamshad Mobasher, DePaul University,
Winter 2015..................................................................................................86
Student Dave Youtube Channel...................................................................87
Current Courses of Justin E. Esarey, RICE University................................87
From Bytes to Bites: How Data Science Might Help Feed the World by David
Lobell, Stanford University..........................................................................88
Conference on Empirical Methods in Natural Language Processing (and
forerunners) (EMNLP)..................................................................................88
Columbia University's Laboratory for Intelligent Imaging and Neural
Computing (LIINC).....................................................................................89
Enabling Brain-Computer Interfaces for Labeling Our Environment by Paul
Sajda.............................................................................................89
The Unreasonable Effectiveness Of Deep Learning by Yann LeCun, Sept 2014....89
Machine Learning by Prof. Shai Ben-David, University of Waterloo, Lecture
1-3, Jan 2015................................................................................................89
Computer Vision by Richard E. Turner, Slides, Exercises & Solutions,
University of Cambridge.............................................................................90
Probability and Statistics by Carl Edward Rasmussen, Slides, University of
Cambridge....................................................................................................90
Machine Learning by Carl Edward Rasmussen, Slides, University of
Cambridge....................................................................................................90
Seth Grimes's videos.....................................................................................90
Introduction to Reinforcement Learning by Shane Conway, Nov 2014......90
Machine Learning and Data Mining by Prof. Dr. Volker Tresp, 2014, LMU.......91
Applied Machine Learning by Joelle Pineau, Fall 2014, McGill University........91
Analyzing data from the city of Montreal....................................................91
Artificial Intelligence by Joelle Pineau, Winter 2014-2015, McGill University.....91
Talking Machines: The History of Machine Learning from the Inside Out.........92
The Simons Institute for the Theory of Computing..................................92
DIKU - Datalogisk Institut, Københavns Universitet Youtube Channel....92
Hashing in machine learning by John Langford, Microsoft Research.........93
Dimensionality reductions by Alexander Andoni, Microsoft Research.......93
RE.WORK Deep Learning Summit Videos, San Francisco 2015..............93
Machine Learning Tutorial, UNSW Australia.............................................93
Oxford's Podcast...........................................................................................93
Natural Language Processing by Mohamed Alaa El-Dien Aly, 2014, KAUST......94
QUT - Queensland University of Technology, Brisbane, Australia............94
Data & Society..............................................................................................94
Open Book for people with autism...............................................................95
NUMDAM, Search and Download of Digitized Mathematics Journal
Archives.......................................................................................95
Project Euclid, mathematics and statistics online.........................................95
Statistical Modeling: The Two Cultures by Leo Breiman, 2001..................95
mini-DML....................................................................................................95
MISCELLANEOUS....................................................................................96
The Automatic Statistician project...............................................................96
A selection of Youtube's featured channels..................................................97
Introduction To Modern Brain-Computer Interface Design by Swartz Center
for Computational Neuroscience.................................................................98
Distributed Computing Courses (lectures, exercises with solutions) by ETH
Zurich, Group of Prof. Roger Wattenhofer.................................................98
The wonderful and terrifying implications of computers that can learn | Jeremy
Howard | TEDxBrussels..............................................................................99
Partially Derivative, A podcast about data, data science, and awesomeness!........99
Class Central................................................................................................99
Beginning to Advanced University CS Courses.........................................100
WIRED UK Youtube Channel..................................................................100
Davos 2015 - A Brave New World - How will advances in artificial intelligence,
smart sensors and social technology change our lives?...............................100
World Economic Forum.............................................................................101
The Global Gender Gap Report................................................................101
The LINCS project....................................................................................102
Australian Academy of Science.................................................................102
Artificial intelligence: Machines on the rise................................................102
Bill Gates Q&A on Reddit..........................................................................103
Second Price went to Yarin Gal for his extrapolated art image, Cambridge
University Engineering Photo Competition...............................................103
Draw from a Deep Gaussian Process by David Duvenaud, Cambridge
University Engineering Photo Competition...............................................103
MOOC, Opencourseware in Spanish................................104
MOOC, Opencourseware in German...............................104
MOOC, Opencourseware in Italian..................................104
MOOC, Opencourseware in French..................................105
France Université Numérique (FUN).........................................................105
FUN: MinesTelecom: 04006 Fondamentaux pour le Big Data (Big Data Fundamentals).................105
University of Laval (French Canadian)......................................................105
Théorie algorithm. des graphes (Algorithmic Graph Theory)........................106
Hugo Larochelle, Apprentissage automatique (Machine Learning), French Canadian.............106
Francis Bach, École Normale Supérieure - Courses and Exercises with solutions
(English-French) .........................................................................................107
Collège de France, Mathematics and Digital Science, French...................108
Le Laboratoire de Recherche en Informatique (LRI)................................108
MOOC, Opencourseware in Russian.................................110
Russian Machine Learning Resources.......................................................110
The Yandex School of Data Analysis.........................................................110
Alexander D’yakonov Resources................................................................111
MOOC, Opencourseware in Japanese..............................112
MOOC, Opencourseware in Chinese................................113
Yeeyan Coursera Chinese Classroom........................................................113
Hong Kong Open Source Conference 2013 .............................................113
Guokr.com..................................................................................................113
MOOC, Opencourseware in Portuguese...........................115
Aprendizado de Maquina (Machine Learning) by Bianca Zadrozni, Instituto de Computação, UFF,
2010............................................................................................................115
Algoritmo de Aprendizado de Máquina (Machine Learning Algorithm) by Aurora Trinidad Ramirez Pozo,
Universidade Federal do Paraná, UFPR....................................................115
Digital Library, Universidade de São Paulo.................................................115
MOOC, Opencourseware in Hebrew................................116
Open University of Israel...........................................................................116
Homeworks, Assignments & Solutions................................117
CS229 Stanford Machine Learning List of projects (free access to abstracts),
2013 and previous years.............................................................................117
CS229 Stanford Machine Learning by Andrew Ng, Autumn 2014 .........117
CS 445/545 Machine Learning by Melanie Mitchell, Winter Quarter 2014.........117
Introduction to Machine Learning, Machine Learning Lab, University of
Freiburg, Germany.....................................................................................118
Unsupervised Feature Learning and Deep Learning by Andrew Ng, 2011 ?.......118
Machine Learning by Andrew Ng, 2011....................................................118
Pattern Recognition and Machine Learning, Solutions to Exercises, by Markus
Svensen and Christopher Bishop, 2009......................................................119
Machine Learning Course by Aude Billard, Exercises & Solutions, EPFL,
Switzerland.................................................................................................119
T-61.3025 Principles of Pattern Recognition Weekly Exercises with Solutions (in
English), Aalto University, Finland, 2015..................................................119
T-61.3050 Machine Learning: Basic Principles Weekly Exercises with Solutions
(in English), Aalto University, Finland, Fall 2014.......................................119
CSE-E5430 Scalable Cloud Computing Weekly Exercises with Solutions (in
English), Aalto University, Finland, Fall 2014............................................119
Weekly Exercises with Solutions (in English) from Aalto University, Finland.......120
SurfStat Australia: an online text in introductory Statistics.......................120
Learning from Data by Amos Storkey, Tutorial & Worksheets (with solutions),
University of Edinburgh, Fall 2014............................................................120
Web Search and Mining by Christopher Manning and Prabhakar Raghavan,
Winter 2005................................................................................................120
Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions,
Spring 2014................................................................................................120
Introduction to Time Series by Peter Bartlett, Berkeley, Homework & solutions,
Fall 2010.....................................................................................................121
Introduction to Machine Learning by Stuart Russell, CS 194-10, Fall 2011,
Assignments & Solutions............................................................................121
Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Fall 2009.....................................................................................................121
Advanced Topics in Machine Learning by Arthur Gretton, 2015, University
College London (exercises with solutions)..................................................121
Reinforcement Learning by David Silver, 2015, University College London
(exercises with solutions).............................................................................121
Emmanuel Candes Lectures, Homeworks & Solutions, Stanford University
(great resources, not to be missed!).............................................................122
Advanced Topics in Convex Optimization by Emmanuel Candes, Handouts,
Homeworks & Solutions, Winter 2015, Stanford University.....................122
MSM 4M13 Multicriteria Decision Making by SÁNDOR ZOLTÁN
NÉMETH, School of Mathematics, University of Birmingham.............122
10-601 Machine Learning Spring 2015, Homeworks & Solutions & Code
(Matlab)......................................................................................................122
Introduction to Machine Learning by Alex Smola, CMU, Homeworks &
Solutions.....................................................................................................123
Applications.........................................................................124
MIT Media Lab.........................................................................................124
TEDx San Francisco, Connected Reality..................................................124
Emotion&Pain Project................................................................................124
IBM Research.............................................................................................125
EPFL Ecole Polytechnique Fédérale de Lausanne ....................................125
Visualizing MBTA Data: An interactive exploration of Boston's subway system....126
Commercial Applications ...................................................127
Google glass................................................................................................127
Google self-driving car...............................................................................127
SenseFly......................................................................................................127
HOW MICROSOFT'S MACHINE LEARNING IS BREAKING THE
GLOBAL LANGUAGE BARRIER..........................................................127
RESEARCH PAPERS, in English......................................128
Cambridge University Publications page...................................................128
arXiv.org by Cornell University Library ...................................................128
Google Scholar...........................................................................................128
Google Research.........................................................................................128
Yahoo Research..........................................................................................129
Microsoft Research.....................................................................................129
Journal from MIT Press.............................................................................129
DROPS, Dagstuhl Research Online Publication Server............................129
OPEN SOURCE SOFTWARE, in English.......................130
Weka 3: Data Mining Software in Java......................................................130
A deep-learning library for Java.................................................................130
List of Java ML Software by Machine Learning Mastery.........................130
List of Java ML Software by MLOSS........................................................130
MathFinder: Math API Discovery and Migration, Software Engineering and
Analysis Lab (SEAL), IISc Bangalore.........................................................130
Google Java Style........................................................................................131
JSAT: java-statistical-analysis-tool by Edward Raff....................................131
Theano Library for Deep Learning, Python..............................................131
Theano and LSTM for Sentiment Analysis by Frederic Bastien, Universite de
Montreal.....................................................................................................132
Introduction to Deep Learning with Python..............................................132
COURSERA: An Introduction to Interactive Programming in Python (Part 1).....132
COURSERA: An Introduction to Interactive Programming in Python (Part 2).....133
COURSERA: Programming for Everybody (Python)...............................133
Udacity - Programming foundations with Python.....................................133
Scikit-learn, Machine Learning in Python.................................................133
Pydata ........................................................................................................134
PyData NYC 2014 Videos..........................................................................134
PyData, The Complete Works by Rohit Sivaprasad..................................134
Anaconda...................................................................................................135
IPython Interactive Computing..................................................135
Scipy...........................................................................................................135
Numpy........................................................................................................136
matplotlib...................................................................................................136
pandas.........................................................................................................136
SymPy.........................................................................................................136
Orange........................................................................................................137
Pythonic Perambulations: How to be a Bayesian in Python......................137
emcee..........................................................................................................137
PyMC.........................................................................................................137
Pylearn2......................................................................................................137
PyCon US 2014..........................................................................................138
PyCon India 2012......................................................................................138
PyCon India 2013......................................................................................138
Montreal Python........................................................................................138
SciPy 2014..................................................................................................139
PyLadies London Meetup resources..........................................................139
Python Tools for Machine Learning by CB Insights..................................139
Python Tutorials by Jessica MacKellar.......................................................139
INTRODUCTION TO PYTHON FOR DATA MINING.....................140
Notebook Gallery: Links to the best IPython and Jupyter Notebooks by ?.........140
Google Python Style Guide........................................................................140
Natural Language Processing with Python by Steven Bird, Ewan Klein, and
Edward Loper............................................................................................141
PyBrain Library..........................................................................................141
Classifying MNIST dataset with Pybrain...................................................142
OCTAVE....................................................................................................142
PMTK Toolbox by Matt Dunham, Kevin Murphy...................................142
Octave Tutorial by Paul Nissenson.............................................................143
JULIA.........................................................................................................143
Julia by example by Samuel Colvin............................................................144
The R PROJECT for Statistical Computing.............................................144
Coursera: R Programming.........................................................................144
R Graph Gallery........................................................................................145
Code School - R Course.............................................................................145
Coursera R programming..........................................................................145
Open Intro R Labs.....................................................................................145
R Tutorial...................................................................................................145
DataCamp R Course.................................................................................146
R Bloggers..................................................................................................146
R-Project Package: caret: Classification and Regression Training.........146
A Short Introduction to the caret Package by Max Kuhn.........................146
R packages by Hadley Wickham................................................................147
Google's R Style Guide..............................................................................147
STAN Software..........................................................................................147
List of Machine Learning Open Source Software.....................................148
Google Prediction API...............................................................................148
Reddit ........................................................................................................149
SHOGUN toolbox..................................................................149
Comparison between ML toolboxes.............................................149
Infer.NET, Microsoft Research...................................................................149
F# Software Foundation.............................................................................150
BigML........................................................................................................150
BRML Toolbox in Matlab/Julia by David Barber, University College
London.......................................................................................150
SCILAB......................................................................................................150
OverFeat and Torch7, CILVR Lab @ NYU.............................................150
FAIR open sources deep-learning modules for Torch................................151
IPython kernel for Torch with visualization and plotting...........................151
Deep Learning Lecture 9: Neural networks and modular design in Torch by
Nando de Freitas, Oxford University.........................................................151
Deep Learning Lecture 8: Modular back-propagation, logistic regression and
Torch..........................................................................................................151
Machine Learning with Torch7: Defining your own Neural Net Module.......152
Lua Tutorial in 15 Minutes by Tyler Neylon.............................................152
Google: Punctuation, symbols & operators in search.................................152
WolframAlpha............................................................................................152
Computation and the Future of Mathematics by Stephen Wolfram, Oxford's Podcast.......153
Mloss.org....................................................................................................153
Sourceforge.................................................................................................153
AForge.NET Framework............................................................................153
cuda-convnet..............................................................................................153
word2vec.....................................................................................................154
Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer, and John Langford, August 2014.......154
Maxim Milakov Software...........................................................................154
Alfonso Nieto-Castanon Software..............................................................154
Lib Skylark..................................................................................................155
Mutual Information Text Explorer............................................................155
Data Science Resources by Jonathan Bower on GitHub...........................155
Joseph Misiti Blog.......................................................................................156
Michael Waskom GitHub repositories.......................................................156
Visualizing distributions of data.................................................................156
Exploring Seaborn and Pandas based plot types in HoloViews by Philipp John Frederic Rudiger.......157
"Machine Learning: An Algorithmic Perspective" Code by Stephen Marsland.......157
Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!).......157
Open Source Hong Kong..........................................................................158
LAMDA Group, Nanjing University............................................................158
GATE, General Architecture for Text Engineering...................................158
CLARIN, Common Language Resources and Technology Infrastructure.......159
FLaReNet, Fostering Language Resources Network..................................159
My Data Science Resources by Viktor Shaumann.....................................159
MISCELLANEOUS..................................................................................160
Overleaf (ex WriteLaTeX).........................................................................160
Interview of Dr John Lees-Miller by Imperial College London ACM Student Chapter.......160
LISA Lab GitHub repository, Université de Montréal .............................160
MILA, Institut des algorithmes d'apprentissage de Montréal, Montreal Institute for Learning Algorithms.......161
Vowpal Wabbit GitHub repository by John Langford...............................161
Google-styleguide: Style guides for Google-originated open-source projects.......161
BIG DATA/CLOUD COMPUTING, in English.............162
Apache Spark Machine Learning Library.................................................162
Ampcamp, Big Data Boot Camp...............................................................162
Spark Summit 2013 Videos .......................................................................162
Spark Summit 2014 Videos .......................................................................162
Spark Summit 2015 Videos & Slides..........................................................163
Spark Summit Training & Videos..............................................................163
Databricks Videos.......................................................................................163
SF Scala & SF Bay Area Machine Learning, Joseph Bradley: Decision Trees on Spark.......163
Apache Mahout ML library.......................................................................163
Apache Mahout on Javaworld....................................................................164
MapReduce programming with Apache Hadoop, 2008............................164
Hadoop Users Group UK..........................................................................164
Deeplearning4j...........................................................................................164
Udacity opencourseware "Intro to Hadoop and MapReduce" ................165
Apache Storm............................................................................166
Scaling Apache Storm by Taylor Goetz.....................................................166
Michael Vogiatzis Blog.............................................................166
Prediction IO..............................................................................................166
PredictionIO tutorial - Thomas Stone - PAPIs.io '14.................................166
Container Cluster Manager.......................................................................167
Domino Data Labs.....................................................................................167
Data Science Central..................................................................................168
Amazon Web Services Videos....................................................................168
Google Cloud Computing Videos..............................................................168
VLAB: Deep Learning: Intelligence from Big Data, Stanford Graduate School of Business.......168
Machine Learning and Big Data in Cyber Security by Eyal Kolman, Technion Lecture.......168
Chaire Machine Learning Big Data, Telecom ParisTech (Videos in French).......168
An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014.......169
Big Data Requires Big Visions For Big Change | Martin Hilbert | TEDxUCL.......170
Ethical Quandary in the Age of Big Data | Justin Grace | TEDxUCL...170
Big Data & Dangerous Ideas | Daniel Hulme | TEDxUCL....................171
List of good free Programming and Data Resources, BITBOOTCAMP.171
Big Data, Medical Imaging and Machine Intelligence by Professor H. R. Tizhoosh at the University of Waterloo.......172
Session 6: Science in the cloud: big data and new technology...................172
MapReduce for C: Run Native Code in Hadoop by Google Open Source Software.......172
Machine Learning & Big Data at Spotify with Andy Sloane, Big Data Madison Meetup.......173
Hands on tutorial on Neo4j with Max De Marzi, Big Data Madison Meetup.......173
TED Talk: What do we do with all this big data? by Susan Etlinger.........173
Big Data's Big Deal by Viktor Mayer-Schonberger, Oxford's Podcast.......173
BID Data Project - Big Data Analytics with Small Footprint.....................174
SF Big Analytics and SF Machine Learning meetup: Machine Learning at the Limit by Prof. John Canny.......174
COMPETITIONS, in English...........................................176
Angry Birds AI Competition......................................................................176
ChaLearn...................................................................................................176
ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015).......177
Kaggle........................................................................................................177
Kaggle Competition Past Solutions............................................................177
Kaggle Connectomics Winning Solution Research Article........................177
Solution to the Galaxy Zoo Challenge by Sander Dieleman.....................177
Winning 2 Kaggle in class competitions on spam......................................178
Matlab Benchmark for Packing Santa’s Sleigh translated in Python.........178
Machine learning best practices we've learned from hundreds of competitions by Ben Hamner (Kaggle).......178
TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms).......178
CrowdANALYTIX..................................................................178
Challenges for governmental applications..................................................178
InnoCentive Challenge Center..................................................................178
TunedIT.....................................................................................................179
Ants, AI Challenge, sponsored by Google, 2011........................................179
International Collegiate Programming Contest...........................................179
DREAM Challenges.......................................................................179
Texata.........................................................................................................180
IoT World Forum Young Women's Innovation Grand Challenge.............180
COMPETITIONS, in French............................................182
COMPETITIONS, in Russian...........................................182
Russian AI Cup - Competition Programming Artificial Intelligence.........182
OPEN DATASET, in English.............................................183
Friday Lunchtime Lectures at the Open Data Institute, Videos, slides and podcasts (not to be missed!).......183
Open data Institute: Certify your open data..............................................183
The Text REtrieval Conference (TREC) Datasets.....................................183
HDX Humanitarian Data Exchange.........................................................184
World Data Bank........................................................................................185
US Dataset.................................................................................................185
US City Open Data Census.......................................................................186
Machine Learning repository.....................................................................186
IMAGENET..............................................................................................186
Stanford Large Network Dataset Collection..............................................187
Deep Learning datasets..............................................................................187
Open Government Data (OGD) Platform India.......................................188
Yahoo Datasets...........................................................................................188
Windows Azure Marketplace.....................................................................188
Amazon Public Data Sets...........................................................................188
Wikipedia: Database Download.................................................................189
Gutenberg project (Free books available in different formats, useful for NLP).......189
Freebase......................................................................................................189
Datamob Data............................................................................................189
Reddit Datasets...........................................................................................189
100+ Interesting Data Sets for Statistics....................................................189
Data portal of the City of Chicago............................................................190
Data portal of the City of Seattle..............................................................190
Data portal of the City of LA....................................................................190
California Department of Water Resources..............................................190
Data portal of the City of Dallas...............................................................191
Data portal of the City of Austin...............................................................191
How to produce and use datasets: lessons learned, mlwave.......................191
MITx and HarvardX release MOOC datasets and visualization tools.....192
Finding the perfect house using open data, Justin Palmer’s Blog...............192
Synapse.......................................................................................................192
NYC Taxi Trips Data from 2013...............................................192
Sebastian Raschka’s Dataset Collections....................................................192
Awesome Public Datasets by Xiaming Chen, Shanghai, China................192
UK Dataset.................................................................................................193
LONDON DATASTORE - 601 datasets found (28-08-2015)..................193
Transport For London Open Data, UK....................................................193
Gaussian Processes List of Datasets...........................................................193
The New York Times Linked Open Data .................................................194
Google Public Data Explorer.....................................................................194
The Million Song Dataset..........................................................................195
CrowdFlower Open Data Library...............................................195
OPEN DATASET, in French..............................................196
Montreal, Portail Donnees Ouvertes (French&English), Canada..............196
Insee, France...............................................................................................196
RATP Open Data, French Tube in Paris, France......................................196
L’Open-Data français cartographié...........................................................196
OPEN DATASET, China...................................................197
LAMDA Group.............................................................................197
DATA VISUALIZATION..................................................198
Visualization Lab Gallery, Computer Science Division, University of California, Berkeley.......198
Visualization Lab Software, Computer Science Division, University of California, Berkeley.......200
Visualization Lab Course Wiki, Computer Science Division, University of California, Berkeley.......200
Mike Bostock..............................................................................................200
Eyeo Festival...............................................................................................200
MIT Data Collider.....................................................................................200
D3 JS Data-Driven Documents..................................................................200
Shan He, Research Fellow at MIT Senseable City Lab.............................201
Gource, software version control visualization...........................................201
Logstalgia, website access log visualization................................................201
Andrew Caudwell's Blog............................................................................201
MLDemos, EPFL, Switzerland.................................................202
The University of Florida Sparse Matrix Collection.................................202
Visualization & Graphics lab, Dept. of CSA and SERC, Indian Institute of Science, Bangalore.......203
Allison McCann.........................................................................................203
Scott Murray..............................................................................................203
Gephi: The Open Graph Viz Platform......................................................203
Data Analysis and Visualization Using R by David Robinson...................204
Visualising Data Blog (Huge list of resources, great blog!).........................204
The 8 hats of Data Visualisation Design by Andy Kirk.............................205
Andy Kirk, Visualisation consultant at the Big Data Week, 2013..............205
Image Gallery by the Arts and Humanities Research Council, UK..........205
Setosa.io by Victor Powell & Lewis Lehe...................................................205
BOOKS, in English............................................................206
2015.....................................................................................206
Bayesian Reasoning and Machine Learning, David Barber, 2012 (online version 04-2015).......206
Deep Learning (Artificial Intelligence), An MIT Press book in preparation, by Yoshua Bengio, Ian Goodfellow and Aaron Courville, Jul-2015.......206
Neural Networks and Deep Learning by Michael Nielsen, 2015 .............207
2014.....................................................................................208
An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014.......208
Deep Learning Tutorial by LISA Lab, University of Montreal, 2014.......209
Statistical Inference for Everyone, by Professor Brian Blais, 2014............210
Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014.......210
Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014.......211
Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft.......212
Slides for High Performance Python tutorial at EuroSciPy, 2014 by Ian Ozsvald.......213
Probabilistic Programming and Bayesian Methods for Hackers by Cameron Davidson-Pilon, 2014.......213
Past, Present, and Future of Statistical Science by COPSS, 2014.............213
Essentials of Metaheuristics by Sean Luke, 2014........................................213
2013.....................................................................................214
Interactive Data Visualization for the Web By Scott Murray, 2013...........214
Statistical Model Building, Machine Learning, and the Ah-Ha Moment by Grace Wahba, 2013.......214
An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2013 (first printing).......214
2012.....................................................................................215
Reinforcement Learning by Richard S. Sutton and Andrew G. Barto, 2012, Second edition in progress (PDF).......215
R Graphics Cookbook Code Resources (Graphs with ggplot2) by Winston Chang, 2012.......215
Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves, 2012.......215
A Course in Machine Learning by Hal Daumé III, 2012................................216
Machine Learning in Action, Peter Harrington, 2012...............................216
A Programmer's Guide to Data Mining, by Ron Zacharski, 2012............216
2010.....................................................................................217
Artificial Intelligence: Foundations of Computational Agents by David Poole and Alan Mackworth, 2010.......217
Introduction to Machine Learning by Ethem Alpaydın, MIT Press, Second Edition, 2010, 579 pages.......217
2009.....................................................................................218
The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009.......218
Learning Deep Architectures for AI by Yoshua Bengio, 2009....................219
An Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2009.......219
2008.....................................................................................220
Kernel Methods in Machine Learning by Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, 2008.......220
Introduction to Machine Learning, Alex Smola, S.V.N. Vishwanathan, 2008.......220
2006.....................................................................................221
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006.......221
Gaussian Processes for Machine Learning, C. Rasmussen and C. Williams, 2006.......222
2005.....................................................................................222
Bayesian Machine Learning by Sounak Chakraborty, 2005.....................222
Machine Learning by Tom Mitchell, 2005................................................222
2003.....................................................................................223
Information Theory, Inference, and Learning Algorithms, David MacKay, 2003.......223
MISCELLANEOUS.....................................................................224
Free Book List.............................................................................................224
Free resource book (need to sign in)...........................................................224
Wikipedia: Machine Learning, the Complete Guide.................................224
ISSUU........................................................................................................224
Neural Networks, A Systematic Introduction by Raul Rojas.....................225
BOOKS, in Spanish............................................................226
BOOKS, in Portuguese.......................................................226
BOOKS, in German...........................................................226
BOOKS, in Italian..............................................................226
BOOKS, in French.............................................................226
BOOKS, in Russian............................................................227
Pattern Recognition by A. B. Merkov, 2014................................................227
Algorithmic models of learning classification: rationale, comparison, selection, 2014.......227
BOOKS, in Japanese..........................................................227
BOOKS, in Chinese...........................................................228
Blog recommending useful books...............................................................228
Textbook for Statistics................................................................................228
Introduction to Pattern recognition............................................................228
Translated version of Machine Learning by Tom Mitchell.......................228
Presentation, Infographics and Documents in English.......229
Meetup's Presentations...............................................................................229
Slideshare.com............................................................................................229
Slides.com...................................................................................................229
Powershow.com..........................................................................................229
Speaker Deck..............................................................................................229
Introduction to Artificial Intelligence, 2014, University of Waterloo........229
Aprendizado de Maquina, Conceitos e definicoes by Jose Augusto Baranauskas.......229
Aprendizado de Maquina by Bianca Zadrozni, Instituto de Computação, UFF, 2010.......230
NYC ML Meetup, 2014.............................................................................230
Statistics with Doodles by Thomas Levine.................................................230
Conferences.........................................................................231
ICML, Lille, France 2015...........................................................................231
ICML, Beijing, China 2014.......................................................................231
ICML, Atlanta, US 2013...........................................................................231
ICML, Edinburgh, UK 2012.....................................................................231
ICML, Bellevue, US 2011..........................................................................231
Full archive of ICML.................................................................................231
Machine Learning Conference Videos......................................................232
Annual Machine Learning Symposium.....................................................232
MLSS Machine Learning Summer Schools..............................................232
Data Gotham 2012, 2013...........................................................................232
Meetup ......................................................................................................232
Data Science Weekly List of Meetups.....................................................232
London Machine Learning Meetup...........................................................232
BLOGS, in English.............................................................233
Igor Carron Blog........................................................................................233
Data Science Weekly..................................................................................233
Yann LeCun, Google+...............................................................................233
KDD Community, Knowledge discovery and Data Mining......................233
Kaggle Blog................................................................................................233
Digg............................................................................................................234
Feedly..........................................................................................................234
Mlwave.......................................................................................................234
FastML.......................................................................................................234
Beating the Benchmark..............................................................................234
Trevor Stephens Blog.................................................................................235
Mozilla Hacks.............................................................................................235
Banach's Algorithmic Corner, University of Warsaw................................235
DataCamp Blog..........................................................................................235
Natural Language Processing Blog, Hal Daume........................................235
Maxim Milakov Blog..................................................................................235
Alfonso Nieto-Castanon Blog.....................................................................235
Persontyle Blog...........................................................................................236
Analytics Vidhya.........................................................................................236
Bugra Akyildiz Blog....................................................................................237
Rasbt Blog..................................................................................................237
Gilles Louppe Blog.....................................................................................237
AI Topics....................................................................................................237
AI International..........................................................................................237
Joseph Misiti Blog.......................................................................................237
MIRI, Machine Intelligence Research Institute.........................................238
Kevin Davenport Data Blog.......................................................................238
Alexandre Passant Blog..............................................................................238
Daniel Nouri Blog......................................................................................239
Yvonne Rogers Blog...................................................................................239
Igor Subbotin Blog (Both in English & Russian)........................................239
Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!).......239
Popular Science Website.............................................................................240
How Microsoft's Machine Learning is Breaking the Global Language Barrier.......240
Max Woolf Blog.........................................................................................240
Rasmus Bååth Research Blog.....................................................................240
Flowing Data Blog......................................................................................241
The Shape of Data Blog............................................................................241
Data School Blog........................................................................................242
Julia Evans Blog..........................................................................................242
Stephan Hügel's Blog.................................................................................243
BACKCHANNEL "Tech Stories Hub" by Steven Levy............................244
DataScience Vegas.....................................................................................245
The Twitter Developer Blog.......................................................................245
Tyler Neylon Blog.......................................................................................245
Victor Powell Blog......................................................................................245
CrowdFlower Blog.......................................................................245
Edward Raff Blog......................................................................................245
Dirk Gorissen Blog and Projects.................................................................246
Joseph Jacobs Homepage & Blog...............................................................246
MISCELLANEOUS..................................................................................246
Allen Institute for Artificial Intelligence (AI2)............................................246
Artificial General Intelligence (AGI) Society..............................................247
AUAI, Association for Uncertainty in Artificial Intelligence.....................247
BLOGS, in Spanish.............................................................248
BLOGS, in Portuguese........................................................248
BLOGS, in Italian...............................................................248
BLOGS, in German............................................................248
BLOGS, in French..............................................................249
L'ATELIER's News ...................................................................................249
BLOGS, in Russian.............................................................250
Igor Subbotin's Blog (Both in English & Russian) (Huge list of resources)250
BLOGS, in Japanese...........................................................251
BLOGS, in Chinese............................................................251
JOURNALS, in English......................................................252
Journal of Machine Learning Research, MIT Press..................................252
Machine Learning Journal (latest articles can be downloaded for free)....252
Machine Learning (Theory).......................................................................252
List of Journals on Microsoft Academic Research website........................252
Wired magazine..........................................................................................252
Data Science Central..................................................................................252
JOURNALS, in Spanish.....................................................253
JOURNALS, in Portuguese................................................253
JOURNALS, in Italian.......................................................253
JOURNALS, in German....................................................253
JOURNALS, in French.......................................................253
JOURNALS, in Russian.....................................................253
JOURNALS, in Japanese....................................................254
JOURNALS, in Chinese.....................................................254
FORUM, Q&A, in English.................................................255
Data Tau.....................................................................................................255
Hacker News..............................................................................................255
Kaggle Forums...........................................................................................255
Reddit /r/MachineLearning.....................................................................255
Reddit /r/generative..................................................................................256
Cross Validated Stack Exchange...................................................................256
Open Data Stack Exchange..........................................................................256
Data Science Beta Stack Exchange............................................................256
Quora.........................................................................................................256
Machine Learning Impact Forum..............................................................257
FORUM, Q&A, in Spanish................................................258
FORUM, Q&A, in Portuguese...........................................258
FORUM, Q&A, in Italian...................................................258
FORUM, Q&A, in German...............................................258
FORUM, Q&A, in French..................................................258
FORUM, Q&A, in Russian.................................................259
Reddit in Russian .......................................................................................259
Habrahabr.ru Forum (in Russian translated by Google Chrome)..............259
FORUM, Q&A, in Japanese...............................................260
FORUM, Q&A, in Chinese................................................260
Zhihu.com..................................................................................................260
Guokr.com..................................................................................................260
Governmental REPORTS, in English................................262
Big Data report, Whitehouse, US..............................................................262
FUN, in English...................................................................263
Founder of PhD Comics............................................................................263
MACHINE LEARNING RESEARCH GROUPS, in USA264
Computer Science and Artificial Intelligence Lab, MIT...........................264
Artificial Intelligence Laboratory, Stanford University..............................264
Machine Learning Department, Carnegie Mellon University..................265
Noah's ARK Research Group, Carnegie Mellon University.....................265
Intelligent Interactive Systems Group, Harvard University.......................265
Statistical Machine Learning, University of California, Berkeley..............266
UC Berkeley AMPLab, AMP: ALGORITHMS MACHINES PEOPLE267
Berkeley Institute for Data Science............................................................267
Department of Computer Science - ARTIFICIAL INTELLIGENCE &
MACHINE LEARNING, Princeton University........................................268
Research Laboratories and Groups, University of California, Los Angeles
(UCLA).......................................................................................................268
Cornell University.........................................................................................269
Machine Learning Research, University of Illinois at Urbana Champaign269
Department of Computing + Mathematical Science, California Institute of
Technology, Caltech...................................................................................269
Machine Learning, University of Washington...........................................270
"Big Data" Research and Education, University of Washington...............270
Social Robotics Lab - Yale University........................................................270
ML@GT, Georgia Institute of Technology...............................................271
Machine Learning Research Group, University of Texas at Austin............271
Penn Research in Machine Learning, University of Pennsylvania............271
Machine Learning @ Columbia University...............................................271
New York City University...........................................................................271
University of Chicago................................................................................272
The Johns Hopkins Center for Language and Speech Processing (CLSP)
Archive Videos............................................................................................272
MISCELLANEOUS....................................................................................272
IARPA Organization..................................................................................272
MACHINE LEARNING RESEARCH GROUPS, in Canada
273
Machine Learning Lab, University of Toronto.........................................273
The Fields Institute for Research in Mathematical Science, University of
Toronto.......................................................................................................273
Artificial Intelligence Research Group, University of Waterloo.................273
Artificial Intelligence Research Groups, University of British Columbia .274
MILA, Machine Learning Lab, University of Montreal...........................275
Intelligence artificielle, University of Sherbrooke......................................276
Centre de recherche sur les environnements intelligents, University of
Sherbrooke.................................................................................................276
Machine Learning Research Group, University of Laval..........................277
MACHINE LEARNING RESEARCH GROUPS, in Brazil278
MACHINE LEARNING RESEARCH GROUPS, in United
Kingdom.............................................................................279
The Centre for Computational Statistics and Machine Learning (CSML),
University College London........................................................................279
CASA (Centre for Advanced Spatial Studies) Working Papers, University
College London..........................................................................................279
The Machine Learning Research Group in the Department of Engineering
Science, Oxford University.........................................................................280
Machine Learning Group, Imperial College..............................................281
The Data Science Institute, Imperial College............................................282
The University of Edinburgh, Institute for Adaptive and Neural Computation...
282
Cambridge University................................................................................282
Centre for Intelligent Sensing, Queen Mary University of London..........282
ICRI, The Intel Collaborative Research Institute......................................283
MACHINE LEARNING RESEARCH GROUPS, in France
284
Magnet, MAchine learninG in information NETworks, INRIA...............284
Sierra Team - Ecole Normale Superieure , CNRS, INRIA.......................284
ENS Ecole Normale Superieure................................................................285
WILLOW Publications and PhD Thesis....................................................286
Laboratoire Hubert Curien UMR CNRS 5516, Machine Learning........286
MACHINE LEARNING RESEARCH GROUPS, in Germany
.............................................................................................288
Max Planck Institute for Intelligent Systems, Tübingen site......................288
BRML Research Lab, Institute of Informatics at the Technische Universität
München....................................................................................................288
HCI, Heidelberg Collaboratory for Image Processing, Universität Heidelberg....
289
MACHINE LEARNING RESEARCH GROUPS, in
Switzerland .........................................................................290
EPFL Ecole Polytechnique Federale de Lausanne, Switzerland................290
IDSIA: the Swiss AI Lab............................................................................290
MACHINE LEARNING RESEARCH GROUPS, in
Netherlands.........................................................................292
Machine Learning Research Groups in The Netherlands.........................292
MACHINE LEARNING RESEARCH GROUPS, in
POLAND............................................................................293
University of Warsaw, Dept. of Mathematics, Informatics and Mechanics293
MACHINE LEARNING RESEARCH GROUPS, in India294
RESEARCH LABS, Department of Computer Science and Automation, IISc,
Bangalore....................................................................................................294
MLSIG: Machine Learning Special Interest Group, Indian Institute of Science.
294
MACHINE LEARNING RESEARCH GROUPS, in China 295
Peking University........................................................................................295
University of Science and Technology of China, USTC..........................296
Nanjing University.....................................................................................296
MACHINE LEARNING RESEARCH GROUPS, in Russia.
298
Moscow State University............................................................................298
MACHINE LEARNING RESEARCH GROUPS, in Australia
299
NICTA Machine Learning Research Group.............................................299
ACADEMICS, USA...........................................................300
Andrew Ng, Stanford University................................................................300
Emmanuel Candes, Stanford University....................................................300
Tom Mitchell, Carnegie Mellon University (CMU)...................................300
Robert Kass, CMU....................................................................................301
Alexander J. Smola, CMU.........................................................................301
Maria-Florina Balcan, CMU.....................................................................302
Abulhair Saparov, CMU............................................................................302
John Canny, Berkeley University,................................................................302
Robert Schapire, Princeton University.......................................................303
Mona Singh, Princeton University.............................................................303
Olga Troyanskaya, Princeton University...................................................303
Judea Pearl, Cognitive System Laboratory, UCLA....................................304
Justin Esarey Lectures, Assistant Professor of Political Science, Rice University...
304
Hal Daume III, University of Maryland...................................................304
Melanie Mitchell, Portland State University..............................................305
ACADEMICS, France........................................................306
Francis Bach, Ecole Normale Supérieure..................................................306
Gaël Varoquaux, INRIA............................................................................306
ACADEMICS, in United Kingdom...................................308
John Shawe-Taylor, University College London..........................................308
Mark Herbster, University College London...............................................308
David Barber, University College London.................................................309
Gabriel Brostow, University College London.............................................309
Jun Wang, University College London.......................................................309
David Jones Lab, University College London............................................310
Simon Prince, University College London.................................................310
Massimiliano Pontil, University College London.......................................311
Richard E Turner, Cambridge University..................................................311
Andrew McHutchon Homepage, Cambridge University..........................311
Phil Blunsom, Oxford University...............................................................312
Nando de Freitas, Oxford University.........................................................312
Karl Hermann, Oxford University............................................................312
Edward Grefenstette, Oxford University...................................................313
ACADEMICS, in Netherlands...........................................314
Thomas Geijtenbeek Publications & Videos, Delft University of Technology
314
ACADEMICS, in Canada..................................................315
Yoshua Bengio, University of Montreal.....................................................315
KyungHyun Cho, University of Montreal.................................................315
Geoffrey Hinton, University of Toronto....................................................315
Alex Graves, University of Toronto...........................................................316
Hugo Larochelle, Universite de Sherbrooke..............................................316
Giuseppe Carenini, University of British Columbia..................................317
Cristina Conati, University of British Columbia.......................................317
Kevin Leyton-Brown, University of British Columbia..............................317
Holger Hoos, University of British Columbia...........................................317
Jim Little, University of British Columbia.................................................317
David Lowe, University of British Columbia.............................................317
Karon MacLean, University of British Columbia.....................................317
Alan Mackworth, University of British Columbia.....................................317
Dinesh K. Pai, University of British Columbia..........................................317
David Poole, University of British Columbia.............................................317
Prof. Shai Ben-David, University of Waterloo..........................................318
ACADEMICS, in Germany...............................................319
Machine Learning Lab, University of Freiburg.........................................319
ACADEMICS, in China.....................................................320
En-Hong Chen, USTC.................................................................................320
Linli Xu, USTC............................................................................................320
Yuan Yao, School of Mathematical Sciences, University of Beijing..........320
ACADEMICS, in Australia................................................321
Prof. Peter Corke, Queensland University of Technology.........................321
ACADEMICS, in United Arab Emirates...........................322
Dmitry Efimov, American University of Sharjah, UAE............................322
ACADEMICS, in Poland....................................................323
Marcin Murca, University of Warsaw, POLAND.....................................323
ACADEMICS, in Switzerland............................................324
Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be missed!)......
324
Free access to ML MSc & PhD Dissertations.....................325
Machine Learning Department, Carnegie Mellon University..................325
Machine Learning Department, Columbia University..............................325
Nonlinear Modelling and Control using Gaussian Processes, PhD Thesis by
Andrew McHutchon, Cambridge University.............................................325
PhD Dissertations, University of Edinburgh, UK........................................326
MSc Dissertations, University of Oxford, UK...........................................326
Machine Learning Group, Department of Engineering, University of
Cambridge, UK..........................................................................................326
New York University Computer Science PhD Theses...............................326
Digital Collection of The Australian National University (PhD Thesis)...326
TEL (thèses-EN-ligne) (more than 45,000 theses, though some in French!)326
ABOUT
The Machine Learning Salon Starter Kit
The Machine Learning Salon Starter Kit is a selection of useful websites compiled by Jacqueline
Isabelle Forien. The Starter Kit is free of charge and no registration is required to download it.
There is no advertising.
The websites are gathered from blogs and forums such as DataTau.com, groups on LinkedIn,
posts on Twitter, publications on Google Scholar, Machine Learning research group
websites, etc. All descriptions come from the websites themselves.
If you want a link removed, please tell me why and I will take care of it as soon as possible.
If you want to add a better description of your website, please send me the new version and I will
make the change.
Contact at contact@machinelearningsalon.org
Founder of The Machine Learning Salon
My name is Jacqueline Isabelle Forien and I am from Tours, France, a small city located in the
middle of the Loire Valley. I am married and have four children.
After an Engineer's degree in Computer Science at the UTC engineering school and a few years
of work experience in that field, I decided to become a Mathematics teacher. I am still teaching,
but in the meantime I became passionate about Artificial Intelligence and, more specifically,
Machine Learning. In 2013, at 53 years old, I decided to start studying again and soon graduated
from University College London with an M.Sc. in Machine Learning. Soon after, I decided to
create the Machine Learning Salon in my spare time so that I could stay up to date with the
changes that happen regularly in the field.
I would like to express special gratitude to my director of Machine Learning studies at UCL,
Professor Mark Herbster, my tutor, Professor David Barber, the supervisor of my Master's project,
Professor Nadia Berthouze, as well as all my peers during this amazing year.
In addition, I would like to express many thanks to Igor Carron who suggested the smart
association of « Machine Learning » and « Salon », and gave me the opportunity to organise in
London a wonderful event that was the Europe Wide Machine Learning Meetup between Paris,
Berlin, Zurich and London with Andrew Ng as a Guest speaker.
I hope that this Starter Kit will help many people learn and get more involved in this fascinating
field that is Machine Learning!
Jacqueline
Please, feel free to contact me if you want to add a contribution, remove a link, etc.
Any suggestion or feedback is welcome!
Contact at contact@machinelearningsalon.org
MOOC, Opencourseware in English
COURSERA: Machine Learning Stanford Course
About the Course
This course provides a broad introduction to machine learning, data mining, and statistical
pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric
algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning
(clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in
machine learning (bias/variance theory; innovation process in machine learning and AI). The
course will also draw from numerous case studies and applications, so that you'll also learn how to
apply learning algorithms to building smart robots (perception, control), text understanding (web
search, anti-spam), computer vision, medical informatics, audio, database mining, and other
areas.
https://www.coursera.org/course/ml
COURSERA: Practical Machine Learning
Part of the Data Science Specialization »
About the Course
Among the most common tasks performed by data scientists and data analysts are prediction and
machine learning. This course will cover the basic components of building and applying
prediction functions with an emphasis on practical applications. The course will provide basic
grounding in concepts such as training and test sets, overfitting, and error rates. The course will
also introduce a range of model based and algorithmic machine learning methods including
regression, classification trees, Naive Bayes, and random forests. The course will cover the
complete process of building prediction functions including data collection, feature creation,
algorithms, and evaluation.
https://www.coursera.org/course/predmachlearn
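The course itself teaches these ideas in R; purely as an illustration of the training/test-set and error-rate concepts mentioned above, here is a minimal sketch in plain Python (the toy data and the trivial baseline classifier are invented for the example):

```python
import random

def train_test_split(data, test_frac=0.3, seed=42):
    """Shuffle labelled examples and hold out a fraction as a test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def majority_classifier(train):
    """A trivial baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

# Toy data: (feature, label) pairs
data = [(x, x > 5) for x in range(20)]
train, test = train_test_split(data)
prediction = majority_classifier(train)

# Error rate is measured on the held-out test set, never on the training set
error_rate = sum(1 for _, y in test if y != prediction) / len(test)
print(error_rate)
```

Any real model (regression, classification trees, random forests, ...) slots in where the baseline classifier is; the split-fit-evaluate loop stays the same.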
COURSERA: Neural Networks for Machine Learning
Neural Networks use learning algorithms that are inspired by our understanding of how the
brain learns, but they are evaluated by how well they work for practical applications such as
speech recognition, object recognition, image retrieval and the ability to recommend products
that a user will like. As computers become more powerful, Neural Networks are gradually taking
over from simpler Machine Learning methods. They are already at the heart of a new generation
of speech recognition devices and they are beginning to outperform earlier systems for
recognizing objects in images. The course will explain the new learning procedures that are
responsible for these advances, including effective new procedures for learning multiple layers of
non-linear features, and give you the skills and understanding required to apply these procedures
in many other domains.
https://www.coursera.org/course/neuralnets
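To make the idea of "multiple layers of non-linear features" concrete, here is a toy forward pass through a tiny two-layer network in plain Python. The weights are arbitrary numbers chosen for the example; real networks learn them from data, which is what the course's learning procedures are about:

```python
def relu(v):
    """Element-wise non-linearity: negative activations are clipped to zero."""
    return [max(0.0, x) for x in v]

def dense(weights, bias, inp):
    """One fully connected layer: out[j] = sum_i weights[j][i]*inp[i] + bias[j]."""
    return [sum(w * x for w, x in zip(row, inp)) + b
            for row, b in zip(weights, bias)]

# Two layers of features: input (2 units) -> hidden (3 units) -> output (1 unit)
w1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
w2 = [[1.0, -1.0, 0.5]]
b2 = [0.2]

x = [1.0, 2.0]
hidden = relu(dense(w1, b1, x))   # first layer of non-linear features
output = dense(w2, b2, hidden)    # second layer combines them
print(output)
```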
COURSERA: Data Science Specialization
https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage
The Data Scientist’s Toolbox
Part of the Data Science Specialization »
Course Syllabus
Upon completion of this course you will be able to identify and classify data science problems.
You will also have created your Github account, created your first repository, and pushed your
first markdown file to your account.
https://www.coursera.org/course/datascitoolbox
Getting and Cleaning Data
Part of the Data Science Specialization »
About the Course
Before you can work with data you have to get some. This course will cover the basic ways that
data can be obtained. The course will cover obtaining data from the web, from APIs, from
databases and from colleagues in various formats. It will also cover the basics of data cleaning
and how to make data “tidy”. Tidy data dramatically speeds up downstream data analysis tasks. The
course will also cover the components of a complete data set including raw data, processing
instructions, codebooks, and processed data. The course will cover the basics needed for
collecting, cleaning, and sharing data.
https://www.coursera.org/course/getdata
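The course teaches this in R; as a language-neutral illustration of what making data "tidy" means (one observation per row, one variable per column), here is a sketch in plain Python on invented data:

```python
# "Wide" table: one row per city, one column per year
wide = [
    {"city": "Tours", "2013": 135, "2014": 136},
    {"city": "Lyon",  "2013": 496, "2014": 500},
]

# Tidy ("long") form: each (city, year) observation becomes its own row,
# which is the shape downstream analysis and plotting tools expect
tidy = [
    {"city": row["city"], "year": int(year), "population": value}
    for row in wide
    for year, value in row.items()
    if year != "city"
]
print(tidy)
```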
Exploratory Data Analysis
Part of the Data Science Specialization »
About the Course
This course covers the essential exploratory techniques for summarizing data. These techniques
are typically applied before formal modeling commences and can help inform the development
of more complex statistical models. Exploratory techniques are also important for eliminating or
sharpening potential hypotheses about the world that can be addressed by the data. We will cover
in detail the plotting systems in R as well as some of the basic principles of constructing data
graphics. We will also cover some of the common multivariate statistical techniques used to
visualize high-dimensional data.
https://www.coursera.org/course/exdata
Statistical Inference
Part of the Data Science Specialization »
About the Course
Statistical inference is the process of drawing conclusions about populations or scientific truths
from data. There are many modes of performing inference including statistical modeling, data
oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there
are broad theories (frequentists, Bayesian, likelihood, design based, …) and numerous
complexities (missing data, observed and unobserved confounding, biases) for performing
inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and
nuance. This course presents the fundamentals of inference in a practical approach for getting
things done. After taking this course, students will understand the broad directions of statistical
inference and use this information for making informed choices in analyzing data.
https://www.coursera.org/course/statinference
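As a small taste of the material, here is one classic inference tool sketched in plain Python with invented data: an approximate 95% confidence interval for a mean (using the normal quantile 1.96; a t quantile would be more exact for a sample this small, which is exactly the kind of nuance the course addresses):

```python
import math
import statistics

sample = [5.1, 4.9, 6.2, 5.6, 5.0, 5.8, 4.7, 5.3]
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

z = 1.96  # approximate 95% quantile of the standard normal distribution
ci = (mean - z * se, mean + z * se)
print(ci)
```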
Regression Models
Part of the Data Science Specialization »
About the Course
Linear models, as their name implies, relate an outcome to a set of predictors of interest using
linear assumptions. Regression models, a subset of linear models, are the most important
statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least
squares and inference using regression models. Special cases of the regression model, ANOVA
and ANCOVA will be covered as well. Analysis of residuals and variability will be investigated.
The course will cover modern thinking on model selection and novel uses of regression models
including scatterplot smoothing.
https://www.coursera.org/course/regmods
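To illustrate the least-squares idea at the heart of the course, here is the closed-form fit of a simple line y ≈ a + b·x in plain Python, on invented data (real analyses would of course use R's lm or similar):

```python
# Ordinary least squares for the line y = a + b*x, via the closed-form solution:
#   b = cov(x, y) / var(x),   a = mean(y) - b * mean(x)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
print(a, b)
```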
Developing Data Products
Part of the Data Science Specialization »
About the Course
A data product is the production output from a statistical analysis. Data products automate
complex analysis tasks or use technology to expand the utility of a data informed model,
algorithm or inference. This course covers the basics of creating data products using Shiny, R
packages, and interactive graphics. The course will focus on the statistical fundamentals of
creating a data product that can be used to tell a story about data to a mass audience.
https://www.coursera.org/course/devdataprod
COURSERA: Reasoning, Data Analysis and Writing
Specialization
Data Analysis and Statistical Inference
Part of the Reasoning, Data Analysis and Writing Specialization »
About the Course
The goals of this course are as follows:
• Recognize the importance of data collection, identify limitations in data collection methods, and determine how they affect the scope of inference.
• Use statistical software (R) to summarize data numerically and visually, and to perform data analysis.
• Have a conceptual understanding of the unified nature of statistical inference.
• Apply estimation and testing methods (confidence intervals and hypothesis tests) to analyze single variables and the relationship between two variables in order to understand natural phenomena and make data-based decisions.
• Model and investigate relationships between two or more variables within a regression framework.
• Interpret results correctly, effectively, and in context without relying on statistical jargon.
• Critique data-based claims and evaluate data-based decisions.
• Complete a research project that employs simple statistical inference and modeling techniques.
https://www.coursera.org/course/statistics
Process Mining: Data Science in Action
About the Course
Data science is the profession of the future, because organizations that are unable to use (big)
data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis.
The data scientist also needs to relate data to process analysis. Process mining bridges the gap
between traditional model-based process analysis (e.g., simulation and other business process
management techniques) and data-centric analysis techniques such as machine learning and data
mining. Process mining seeks the confrontation between event data (i.e., observed behavior) and
process models (hand-made or discovered automatically). This technology has become available
only recently, but it can be applied to any type of operational processes (organizations and
systems). Example applications include: analyzing treatment processes in hospitals, improving
customer service processes in a multinational, understanding the browsing behavior of customers
using a booking site, analyzing failures of a baggage handling system, and improving the user
interface of an X-ray machine. All of these applications have in common that dynamic behavior
needs to be related to process models. Hence, we refer to this as "data science in action".
The course explains the key analysis techniques in process mining. Participants will learn various
process discovery algorithms. These can be used to automatically learn process models from raw
event data. Various other process analysis techniques that use event data will be presented.
Moreover, the course will provide easy-to-use software, real-life data sets, and practical skills to
directly apply the theory in a variety of application domains.
https://www.coursera.org/course/procmin
COURSERA: Data Mining Specialization
https://www.coursera.org/specialization/datamining/20?utm_medium=courseDescripTop
Pattern Discovery in Data Mining
Part of the Data Mining Specialization »
About the Course
Learn the general concepts of data mining along with basic methodologies and applications.
Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts,
methods, and applications of pattern discovery in data mining. We will also introduce methods
for pattern-based classification and some interesting applications of pattern discovery. This
course gives you the opportunity to practice scalable pattern discovery methods on
massive transactional data, discuss pattern evaluation measures, and study methods for
mining diverse kinds of patterns, including sequential patterns and sub-graph patterns.
https://www.coursera.org/course/patterndiscovery
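Not part of the course materials, but the frequent-pattern mining the blurb describes can be illustrated with a minimal Apriori sketch in Python. The data, the minimum-support threshold, and the function name are all illustrative assumptions:

```python
def apriori(transactions, min_support):
    """Minimal Apriori: find all itemsets appearing in at least
    min_support transactions, growing candidates level by level."""
    transactions = [frozenset(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {}
    current = {s for s in items if support(s) >= min_support}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: build size-k candidates from frequent size-(k-1) sets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Toy transactional data
txns = [{"milk", "bread"}, {"milk", "eggs"}, {"milk", "bread", "eggs"}, {"bread"}]
freq = apriori(txns, min_support=2)
```

The key Apriori insight is that every subset of a frequent itemset must itself be frequent, which is why candidates of size k are built only from surviving sets of size k-1.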
Text Retrieval and Search Engines
Part of the Data Mining Specialization »
About the Course
Recent years have seen a dramatic growth of natural language text data, including web pages,
news articles, scientific literature, emails, enterprise documents, and social media such as blog
articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually
generated directly by humans rather than a computer system or sensors, and are thus especially
valuable for discovering knowledge about people’s opinions and preferences, in addition to many
other kinds of knowledge that we encode in text.
This course will cover search engine technologies, which play an important role in any data
mining applications involving text data for two reasons. First, while the raw data may be large for
any particular problem, it is often a relatively small subset of the data that are relevant, and a
search engine is an essential tool for quickly discovering a small subset of relevant text data
in a large text collection. Second, search engines are needed to help analysts interpret any
patterns discovered in the data by allowing them to examine the relevant original text data to
make sense of any discovered pattern. You will learn the basic concepts, principles, and the
major techniques in text retrieval, which is the underlying science of search engines.
https://www.coursera.org/course/textretrieval
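As a quick illustration of the vector-space scoring idea behind such search engines (not course material; the scoring formula is a simplified assumption), here is a toy TF-IDF ranker in Python:

```python
import math
from collections import Counter

def tfidf_rank(docs, query):
    """Rank documents for a query by summed TF-IDF weights:
    term frequency in the document times log(N / document frequency)."""
    N = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(
            tf[t] * math.log(N / df[t])
            for t in query.lower().split() if t in tf
        )
        scores.append((score, i))
    return [i for score, i in sorted(scores, reverse=True)]

ranked = tfidf_rank(["the cat sat", "the dog sat", "cat cat cat"], "cat")
```

Real retrieval systems add length normalization and smarter weighting (e.g. BM25), but the intuition is the same: frequent-in-document, rare-in-collection terms score highest.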
Text Mining and Analytics
Part of the Data Mining Specialization »
About the Course
This course will cover the major techniques for mining and analyzing text data to discover
interesting patterns, extract useful knowledge, and support decision making, with an emphasis on
statistical approaches that can be generally applied to arbitrary text data in any natural language
with no or minimal human effort.
Detailed analysis of text data requires understanding of natural language text, which is
known to be a difficult task for computers. However, a number of statistical approaches have
been shown to work well for the "shallow" but robust analysis of text data for pattern finding
and knowledge discovery. You will learn the basic concepts, principles, and major
algorithms in text mining and their potential applications.
https://www.coursera.org/course/textanalytics
Cluster Analysis in Data Mining
Part of the Data Mining Specialization »
About the Course
Discover the basic concepts of cluster analysis, and then study a set of typical clustering
methodologies, algorithms, and applications. This includes partitioning methods such as k-means,
hierarchical methods such as BIRCH, density-based methods such as DBSCAN/OPTICS,
probabilistic models, and the EM algorithm. Learn methods for clustering high-dimensional
data, streaming data, graph data, and networked data. Explore concepts and
methods for constraint-based clustering and semi-supervised clustering. Finally, see examples of
cluster analysis in applications.
https://www.coursera.org/course/clusteranalysis
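To make the partitioning idea concrete, here is a minimal k-means (Lloyd's algorithm) sketch in pure Python; this is an illustrative toy, not course material, and the data and parameters are made up:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means on 2-D points: assign each point to its
    nearest centroid, then recompute each centroid as the mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # keep the old centroid if its cluster emptied
                centroids[j] = (sum(x for x, _ in cl) / len(cl),
                                sum(y for _, y in cl) / len(cl))
    return centroids

# Two well-separated blobs
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = kmeans(points, k=2)
```

Density-based methods such as DBSCAN, also covered in the course, drop the fixed-k assumption and instead grow clusters from dense neighborhoods.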
Data Visualization
Part of the Data Mining Specialization »
About the Course
Learn to present data to an observer in a way that yields insight and understanding. The first
week focuses on the infrastructure for data visualization. It introduces elementary graphics
programming, focusing primarily on two-dimensional vector graphics and the programming
platforms for graphics. This infrastructure will also include lessons on the human side of
visualization, studying human perception and cognition to gain a better understanding of the
target of the data visualization.
The second week will utilize the knowledge of graphics programming and human perception in
the design and construction of visualizations, starting with simple charts and graphs and
incorporating animation and user interactivity. The third week expands the data visualization
vocabulary with more sophisticated methods, including hierarchical layouts and networks.
The final week focuses on visualization of database and data mining processes, with methods
specifically focused on visualization of unstructured information, such as text, and systems
for visual analytics that provide decision support.
https://www.coursera.org/course/datavisualization
COURSERA: Cloud Computing Specialization
https://www.coursera.org/specialization/cloudcomputing/19?utm_medium=listingPage
Cloud Computing Concepts
Part of the Cloud Computing Specialization »
About the Course
Cloud computing systems today, whether open-source or used inside companies, are built using a
common set of core techniques, algorithms, and design philosophies all centered
around distributed systems. Learn about such fundamental distributed computing "concepts" for
cloud computing.
Some of these concepts include:
• Clouds, MapReduce, key-value stores
• Classical precursors
• Widely-used algorithms
• Classical algorithms
• Scalability
• Trending areas
• And more!
Understand how these techniques work inside today’s most widely-used cloud computing
systems. Get your hands dirty using these concepts with provided homework exercises. In
the optional programming track, implement some of these concepts in template assignments
provided in the C programming language.
You will also watch interviews with leading managers and researchers, from both industry and
academia.
https://www.coursera.org/course/cloudcomputing
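The MapReduce model mentioned in the concepts list can be sketched in a few lines of plain Python. This is a single-process toy showing the map, shuffle, and reduce phases; a real framework distributes each phase across machines:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key (here, sum the counts)."""
    return key, sum(values)

docs = ["the cloud", "the data the cloud"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call and each reduce call is independent, the framework can run them in parallel and rerun failed tasks, which is what makes the model scale.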
Cloud Computing Concepts: Part 2
Part of the Cloud Computing Specialization »
https://www.coursera.org/course/cloudcomputing2
Cloud Computing Applications
Part of the Cloud Computing Specialization »
About the Course
Learn about "cloudonomics," the underlying economic reasons that we are creating the cloud.
Learn the basic concepts underlying cloud services and be able to use services like AWS or
OpenStack Dashboard to construct cloud services or applications. Demonstrate your ability
to create web services, massively parallel data-intensive computations using MapReduce,
NoSQL databases, and real-time processing of data streams. Use machine learning
tools to solve simple problems.
This course serves as an introduction to building applications for cloud computing based on
emerging OpenStack and other platforms. The course includes concepts of:
• Baremetal provisioning
• Neutron networking
• Identity service
• Image service
• Orchestration
• Infrastructure as a service
• Software as a service
• Platform as a service
• MapReduce
• Big data
• Analytics
• Privacy and legal issues
The course will also include example cloud computing problems and solutions, including
hands-on laboratory experiments (Load Balancing and Web Services, MapReduce, Hive,
Storm, and Mahout). Case studies will be drawn from Yahoo, Google, Twitter, Facebook, data
mining, analytics, and machine learning.
https://www.coursera.org/course/cloudapplications
Cloud Networking
Part of the Cloud Computing Specialization »
About the Course
In the cloud networking course, we will see what the network needs to do to enable cloud
computing. We will explore current practice by talking to leading industry experts, as well
as looking into interesting new research that might shape the cloud network’s future.
This course will allow us to explore in depth the challenges for cloud networking: how do
we build a network infrastructure that provides the agility to deploy virtual networks on a shared
infrastructure, that enables both efficient transfer of big data and low-latency communication,
and that enables applications to be federated across countries and continents? Examining how
these objectives are met will set the stage for the rest of the course.
This course places an emphasis on both operations and design rationale, i.e., how things
work and why they were designed this way. We're excited to start the course with you and take a
look inside what has become the critical communications infrastructure for many
applications today.
https://www.coursera.org/course/cloudnetworking
COURSERA: Miscellaneous
Core Concepts in Data Analysis (Higher School of Economics)
Learn both theory and application for basic methods that have been invented either for
developing new concepts (principal components or clusters), or for finding interesting
correlations (regression and classification). This is preceded by a thorough analysis of 1D and
2D data.
This is an unconventional course in modern Data Analysis, Machine Learning and Data Mining.
Its contents are heavily influenced by the idea that data analysis should help in enhancing and
augmenting knowledge of the domain as represented by the concepts and statements of relation
between them. According to this view, two main pathways for data analysis are summarization,
for developing and augmenting concepts, and correlation, for enhancing and establishing
relations. The term summarization embraces here both simple summaries like totals and means
and more complex summaries: the principal components of a set of features and cluster
structures in a set of entities. Similarly, correlation covers both bivariate and multivariate
relations between input and target features including Bayes classifiers.
https://www.coursera.org/course/datan
Natural Language Processing
Natural language processing (NLP) deals with the application of computational models to text or
speech data. Application areas within NLP include automatic (machine) translation between
languages; dialogue systems, which allow a human to interact with a machine using natural
language; and information extraction, where the goal is to transform unstructured text into
structured (database) representations that can be searched and browsed in flexible ways. NLP
technologies are having a dramatic impact on the way people interact with computers, on the
way people interact with each other through the use of language, and on the way people access
the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP
involves fundamental questions of how to structure formal models (for example statistical models)
of natural language phenomena, and of how to design algorithms that implement these models.
https://www.coursera.org/course/nlangp
Probability
About the Course
The renowned mathematical physicist Pierre-Simon, marquis de Laplace wrote in his opus on
probability in 1812 that “the most important questions of life are, for the most part, really only
problems in probability”. His words ring particularly true today in this the century of “big data”.
This introductory course takes us through the development of a modern, axiomatic theory of
probability. But, unusually for a technical subject, the material is presented in its lush and
glorious historical context, the mathematical theory buttressed and made vivid by rich and
beautiful applications drawn from the world around us. The student will see surprises in
election-day counting of ballots, a historical wager that the sun will rise tomorrow, the folly of
gambling, the sad news about lethal genes, the curiously persistent illusion of the hot hand in
sports, the unreasonable efficacy of polls and its implications for medical testing, and a host of
other beguiling settings. A curious individual taking this as a stand-alone course will emerge
with a nuanced understanding of the chance processes that surround us and an appreciation of
the colourful history and traditions of the subject. And for the student who wishes to study the
subject further, this course provides a sound mathematical foundation for courses at the
advanced undergraduate or graduate levels.
https://www.coursera.org/course/probability
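The "wager that the sun will rise tomorrow" alluded to above is Laplace's rule of succession, which is simple enough to state and compute directly. A small sketch (the numbers are illustrative, not from the course):

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule of succession: after observing `successes` in
    `trials` independent trials, estimate the probability of success
    on the next trial as (successes + 1) / (trials + 2)."""
    return Fraction(successes + 1, trials + 2)

# If the sun has risen n days in a row, the rule estimates the
# probability it rises tomorrow as (n + 1) / (n + 2).
p = rule_of_succession(9, 10)  # 9 successes in 10 trials -> 10/12 = 5/6
```

Note the prior built into the formula: with no data at all (0 successes in 0 trials), the estimate is 1/2 rather than undefined, which is exactly what distinguishes it from the naive frequency estimate.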
Probabilistic Graphical Models
Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty
what will happen in the future, and even in the present and the past, many important aspects of
the world are not observed with certainty. Probability theory gives us the basic foundation to
model our beliefs about the different possible states of the world, and to update these beliefs as
new evidence is obtained. These beliefs can be combined with individual preferences to help
guide our actions, and even in selecting which observations to make. While probability theory has
existed since the 17th century, our ability to use it effectively on large problems involving many
inter-related variables is fairly recent, and is due largely to the development of a framework
known as Probabilistic Graphical Models (PGMs). This framework, which spans methods such as
Bayesian networks and Markov random fields, uses ideas from discrete data structures in
computer science to efficiently encode and manipulate probability distributions over
high-dimensional spaces, often involving hundreds or even many thousands of variables. These
methods have been used in an enormous range of application domains, which include: web
search, medical and fault diagnosis, image understanding, reconstruction of biological networks,
speech recognition, natural language processing, decoding of messages sent over a noisy
communication channel, robot navigation, and many more. The PGM framework provides an
essential tool for anyone who wants to learn how to reason coherently from limited and noisy
observations.
https://www.coursera.org/course/pgm
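The core trick PGMs exploit, factoring a joint distribution into small local conditionals, can be shown on a tiny example. Below is the classic rain/sprinkler/wet-grass Bayesian network with made-up numbers (an illustration only, not taken from the course):

```python
def joint(r, s, w):
    """P(R=r, S=s, W=w) for a toy rain/sprinkler/wet-grass network,
    factored as P(R) * P(S|R) * P(W|S,R). All probabilities here are
    made up for illustration."""
    P_r = {True: 0.2, False: 0.8}
    P_s_given_r = {True: 0.01, False: 0.4}   # sprinkler rarely on in rain
    P_w_given = {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.9, (False, False): 0.0}
    p_s = P_s_given_r[r] if s else 1 - P_s_given_r[r]
    p_w = P_w_given[(s, r)] if w else 1 - P_w_given[(s, r)]
    return P_r[r] * p_s * p_w

# Marginal P(W=True) by summing the joint over R and S
p_wet = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
```

Instead of storing 2^3 joint entries, the network stores three small conditional tables; for networks with thousands of variables this factorization is what makes inference tractable at all.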
Machine Learning Techniques by Hsuan-Tien Lin, National Taiwan University
About the Course
Welcome! The instructor has decided to teach the course in Mandarin on Coursera, while the
slides of the course will be in English to ease the technical illustrations. We hope that this choice
can help introduce Machine Learning to more students in the Mandarin-speaking world. The
English-written slides will not require advanced English ability to understand, though. If you can
understand the following descriptions of this course, you can probably follow the slides.
https://www.coursera.org/course/ntumltwo
High Performance Scientific Computing
About the Course
Computation and simulation are increasingly important in all aspects of science and engineering.
At the same time writing efficient computer programs to take full advantage of current
computers is becoming increasingly difficult. Even laptops now have 4 or more processors, but
using them all to solve a single problem faster often requires rethinking the algorithm to
introduce parallelism, and then programming in a language that can express this parallelism. Writing efficient programs also requires some knowledge of machine arithmetic, computer
architecture, and memory hierarchies.
Although parallel computing will be covered, this is not a class on the most advanced techniques
for using supercomputers, which these days have tens of thousands of processors and cost
millions of dollars. Instead, the goal is to teach tools that you can use immediately on your own
laptop, desktop, or a small cluster. Cloud computing will also be discussed, and students who
don't have a multiprocessor computer of their own will still be able to do projects using Amazon
Web Services at very low cost.
Along the way there will also be discussion of software engineering tools such as debuggers, unit
testing, Makefiles, and the use of version control systems. After all, your time is more valuable
than computer time, and a program that runs fast is totally useless if it produces the wrong
results.
High performance programming is also an important aspect of high performance scientific
computing, and so another main theme of the course is the use of basic tools and techniques to
improve your efficiency as a computational scientist.
https://www.coursera.org/course/scicomp
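The "rethink the algorithm to introduce parallelism" step described above often amounts to splitting a reduction into independent chunks. A minimal sketch (not course material; in CPython, threads are limited by the GIL for CPU-bound work, so a process pool or NumPy would be used for real speedup, but the decomposition pattern is identical):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Split a sum across workers: each worker sums one chunk,
    then the partial results are combined."""
    chunk = (len(data) + workers - 1) // workers  # ceiling division
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is summed independently, then partials are combined
        return sum(pool.map(sum, chunks))
```

The same map-then-combine structure underlies OpenMP reductions and MPI_Reduce on the clusters the course discusses.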
Statistical Analysis of fMRI Data
About the Course
In this course we will explore the intersection of statistics and functional magnetic resonance
imaging, or fMRI, which is a non-invasive technique for studying brain activity. We will
discuss the analysis of fMRI data, from its acquisition to its use in locating brain activity, making
inference about brain connectivity and predictions about psychological or disease states. A
standard fMRI study gives rise to massive amounts of noisy data with a complicated spatiotemporal correlation structure. Statistics plays a crucial role in understanding the nature of the
data and obtaining relevant results that can be used and interpreted by neuroscientists.
https://www.coursera.org/course/fmri
STANFORD University: Stanford Engineering Everywhere
SEE programming includes one of Stanford’s most popular engineering sequences: the
three-course Introduction to Computer Science taken by the majority of Stanford
undergraduates, and seven more advanced courses in artificial intelligence and electrical
engineering.
Introduction to Computer Science
Programming Methodology
CS106A
Programming Abstractions
CS106B
Programming Paradigms
CS107
Artificial Intelligence
Introduction to Robotics
CS223A
Natural Language Processing
CS224N
Machine Learning
CS229
Linear Systems and Optimization
The Fourier Transform and its Applications
EE261
Introduction to Linear Dynamical Systems
EE263
Convex Optimization I
EE364A
Convex Optimization II
EE364B
Additional School of Engineering Courses
Programming Massively Parallel Processors
CS193G
iPhone Application Programming
CS193P
Seminars and Webinars
http://see.stanford.edu/see/courses.aspx
STANFORD University: 2015 Stanford HPC Conference Video
Gallery
HPC Advisory Council Stanford Workshop 2015
The HPC Advisory Council, together with Stanford University, will hold the HPC Advisory
Council Stanford Conference 2015 on February 2-3, 2015, at Stanford, California. The
conference will focus on High-Performance Computing (HPC) usage models and benefits, the
future of supercomputing, latest technology developments, best practices and advanced HPC
topics. In addition, there will be a strong focus on socially responsible computing, with
advancements in solutions for the small to medium enterprise to have better use of power,
cooling, hardware, and software. The conference is open to the public and will bring together
system managers, researchers, developers, computational scientists and industry affiliates.
http://insidehpc.com/2015-stanford-hpc-conference-video-gallery/
STANFORD University: Awni Hannun of Baidu Research
Published on 5 Feb 2015
"Deep Speech: Scaling up end-to-end speech recognition" - Awni Hannun of Baidu Research
Colloquium on Computer Systems Seminar Series (EE380) presents the current research in
design, implementation, analysis, and use of computer systems. Topics range from integrated
circuits to operating systems and programming languages. It is free and open to the public, with
new lectures each week.
https://www.youtube.com/watch?v=P9GLDezYVX4&spfreload=10
STANFORD University: Steve Cousins of Savioke
Published on 29 Jan 2015
"Service Robots Are Here" - Steve Cousins of Savioke
https://www.youtube.com/watch?v=dn74oHbhRuk&spfreload=10
STANFORD University: Ron Fagin of IBM Research
Published on 5 Feb 2015
"Applying Theory to Practice (and Practice to Theory)" - Ron Fagin
This seminar features leading Industrial and academic experts on big data analytics, information
management, data mining, machine learning, and large-scale data processing.
https://www.youtube.com/watch?v=zEcJhDgyTow&spfreload=10
STANFORD University: CS224d: Deep Learning for Natural
Language Processing by Richard Socher, 2015
Course Description
Natural language processing (NLP) is one of the most important technologies of the information
age. Understanding complex language utterances is also a crucial part of artificial intelligence.
Applications of NLP are everywhere because people communicate almost everything in language:
web search, advertisement, emails, customer service, language translation, radiology reports, etc.
There are a large variety of underlying tasks and machine learning models powering NLP
applications. Recently, deep learning approaches have obtained very high performance across
many different NLP tasks. These models can often be trained with a single end-to-end model and
do not require traditional, task-specific feature engineering. In this spring quarter course students
will learn to implement, train, debug, visualize and invent their own neural network models. The
course provides a deep excursion into cutting-edge research in deep learning applied to NLP. The
final project will involve training a complex recurrent neural network and applying it to a large
scale NLP problem. On the model side we will cover word vector representations, window-based
neural networks, recurrent neural networks, long short-term memory (LSTM) models, recursive neural
networks, convolutional neural networks as well as some very novel models involving a memory
component. Through lectures and programming assignments students will learn the necessary
engineering tricks for making neural networks work on practical problems.
http://cs224d.stanford.edu/syllabus.html
EdX: Artificial Intelligence (BerkeleyX)
CS188.1x is a new online adaptation of the first half of UC Berkeley's CS188: Introduction to
Artificial Intelligence. The on-campus version of this upper division computer science course
draws about 600 Berkeley students each year.
Artificial intelligence is already all around you, from web search to video games. AI methods plan
your driving directions, filter your spam, and focus your cameras on faces. AI lets you guide your
phone with your voice and read foreign newspapers in English. Beyond today's applications, AI is
at the core of many new technologies that will shape our future. From self-driving cars to
household robots, advancements in AI help transform science fiction into real systems.
CS188.1x focuses on Behavior from Computation. It will introduce the basic ideas and
techniques underlying the design of intelligent computer systems. A specific emphasis will be on
the statistical and decision theoretic modeling paradigm. By the end of this course, you will have
built autonomous agents that efficiently make decisions in stochastic and in adversarial settings.
CS188.2x (to follow CS188.1x, precise date to be determined) will cover Reasoning and
Learning. With this additional machinery your agents will be able to draw inferences in uncertain
environments and optimize actions for arbitrary reward structures. Your machine learning
algorithms will classify handwritten digits and photographs. The techniques you learn in CS188x
apply to a wide variety of artificial intelligence problems and will serve as the foundation for
further study in any application area you choose to pursue.
https://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-cs188-1xartificial-579#.U4CqKl6RPwI
EdX: Big Data and Social Physics (Ethics)
Social physics is a big data science that models how networks of people behave and uses these
network models to create actionable intelligence. It is a quantitative science that can accurately
predict patterns of human behavior and guide how to influence those patterns to (for instance)
increase decision making accuracy or productivity within an organization. Included in this course
is a survey of methods for increasing communication quality within an organization, approaches
to providing greater protection for personal privacy, and general strategies for increasing
resistance to cyber attack.
https://www.edx.org/course/mitx/mitx-mas-s69x-big-data-socialphysics-1737#.U4Cox5RdWG4
EdX: Introduction to Computational Thinking and Data
Science
6.00.2x is aimed at students with some prior programming experience in Python and a
rudimentary knowledge of computational complexity. We have chosen to focus on breadth rather
than depth. The goal is to provide students with a brief introduction to many topics, so that they
will have an idea of what’s possible when the time comes later in their career to think about how
to use computation to accomplish some goal. That said, it is not a “computation appreciation”
course. Students will spend a considerable amount of time writing programs to implement the
concepts covered in the course. Topics covered include plotting, stochastic programs, probability
and statistics, random walks, Monte Carlo simulations, modeling data, optimization problems,
and clustering.
https://www.edx.org/course/mitx/mitx-6-00-2x-introduction-computational-2836
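Among the topics listed, Monte Carlo simulation is easy to demonstrate in the course's own language, Python. A sketch of the standard pi-estimation example (illustrative only, not from the course materials):

```python
import random

def monte_carlo_pi(n, seed=0):
    """Estimate pi by sampling n points uniformly in the unit square
    and counting the fraction that land inside the quarter circle of
    radius 1; that fraction converges to pi/4."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4 * inside / n
```

The estimate's error shrinks like 1/sqrt(n), which is the kind of statistical reasoning about stochastic programs the course covers.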
MIT OpenCourseWare (OCW)
OCW makes the materials used in the teaching of MIT's subjects available on the Web.
http://ocw.mit.edu/index.htm
https://www.youtube.com/user/MIT
VLAB MIT Enterprise Forum Bay Area, Machine Learning
Videos
Added on 22-Nov-2014
Discovery of Disruptive Innovations & Actionable Ideas.
VLAB is the San Francisco Bay Area chapter of the MIT Enterprise Forum, a non-profit
organization dedicated to promoting the growth and success of high-tech entrepreneurial
ventures by connecting ideas, technology and people. We provide a forum for San Francisco and
Silicon Valley's leading entrepreneurs, industry experts, venture capitalists, private investors and
technologists to exchange insights about how to effectively grow high-tech ventures amidst
dynamic market risks and challenges. In a world where markets change at breakneck speed,
knowledge is a critical source of competitive advantage. Our forums provide an excellent
opportunity to network and learn about pivotal business issues, emerging industries and the latest
technologies.
http://www.youtube.com/user/vlabvideos/search?query=machine+learning
Foundations of Machine Learning by Mehryar Mohri - 10 years
of Homeworks with Solutions and Lecture Slides
Course Description
This course introduces the fundamental concepts and methods of machine learning, including
the description and analysis of several modern algorithms, their theoretical basis, and the
illustration of their applications. Many of the algorithms described have been successfully used in
text and speech processing, bioinformatics, and other areas in real-world products and services.
The main topics covered are:
Probability tools, concentration inequalities
PAC model
Rademacher complexity, growth function, VC-dimension
Perceptron, Winnow
Support vector machines (SVMs)
Kernel methods
Decision trees
Boosting
Density estimation, maximum entropy models
Logistic regression
Regression problems and algorithms
Ranking problems and algorithms
Halving algorithm, weighted majority algorithm, mistake bounds
Learning automata and transducers
Reinforcement learning, Markov decision processes (MDPs)
http://www.cs.nyu.edu/~mohri/ml14/
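The perceptron, one of the first topics on Mohri's list, is compact enough to sketch in full. A toy Python version (illustrative data and parameters, not taken from the course):

```python
def perceptron(samples, epochs=10):
    """Rosenblatt's perceptron on 2-D points with labels in {-1, +1}:
    whenever a point is misclassified, add it (scaled by its label)
    into the weight vector and bias."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += y * x1
                w[1] += y * x2
                b += y
    return w, b

# Linearly separable toy data: positives lie above the line x1 + x2 = 3
data = [((0, 0), -1), ((1, 1), -1), ((2, 3), 1), ((3, 2), 1)]
w, b = perceptron(data)
```

The mistake bounds and Winnow variants on the syllabus analyze exactly how many such updates this loop can make on separable data.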
Carnegie Mellon University (CMU) Video resources
"The videos below are intended to serve as resources for our current students, and not as online
learning materials for students outside of our program." - The Machine Learning Department
http://www.ml.cmu.edu/teaching/video-resources.html
CMU: Convex Optimisation, Fall 2013, by Barnabas Poczos and
Ryan Tibshirani
Overview and objectives
Nearly every problem in machine learning and statistics can be formulated in terms of the
optimization of some function, possibly under some set of constraints. Since we cannot
generically solve every optimization problem (at least not efficiently), we obviously cannot
solve every problem in machine learning or statistics. Fortunately, many problems of interest in
statistics and machine learning can be posed as optimization tasks that have special properties,
such as convexity, smoothness, separability, or sparsity, permitting standardized, efficient
solution techniques.
This course is designed to give a graduate-level student a thorough grounding in these properties
and their role in optimization, and a broad comprehension of algorithms tailored to exploit such
properties. The main focus will be on convex optimization problems, though we will also discuss
nonconvex problems at the end. We will visit and revisit important applications in statistics and
machine learning. Upon completing the course, students should be able to approach an
optimization problem (often derived from a statistics or machine learning context) and:
(1) identify key properties such as convexity, smoothness, sparsity, etc., and/or possibly
reformulate the problem so that it possesses such desirable properties;
(2) select an algorithm for this optimization problem, with an understanding of the advantages
and disadvantages of applying one method over another, given the problem and properties at
hand;
(3) implement this algorithm or use existing software to efficiently compute the solution.
http://www.stat.cmu.edu/~ryantibs/convexopt/#videos
CMU: Machine Learning, Spring 2011, by Tom Mitchell
Machine Learning is concerned with computer programs that automatically improve their
performance through experience (e.g., programs that learn to recognize human faces,
recommend music and movies, and drive autonomous robots). This course covers the theory and
practical algorithms for machine learning from a variety of perspectives. We cover topics such as
Bayesian networks, decision tree learning, Support Vector Machines, statistical learning methods,
unsupervised learning and reinforcement learning. The course covers theoretical concepts such as
inductive bias, the PAC learning framework, Bayesian learning methods, margin-based learning,
and Occam's Razor. Short programming assignments include hands-on experiments with various
learning algorithms, and a larger course project gives students a chance to dig into an area of
their choice. This course is designed to give a graduate-level student a thorough grounding in the
methodologies, technologies, mathematics and algorithms currently needed by people who do
research in machine learning.
http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml
Homework with solutions
http://www.cs.cmu.edu/~tom/10701_sp11/hws.shtml
CMU: 10-601 Machine Learning Spring 2015 - Lecture 18 by
Maria-Florina Balcan
Topics: support vector machines (SVM), semi-supervised learning, other learning paradigms
https://www.youtube.com/watch?v=JoJhXsdTWxM&spfreload=10
CMU: 10-601 Machine Learning Spring 2015, Homeworks &
Solutions & Code (Matlab)
http://www.cs.cmu.edu/%7Eninamf/courses/601sp15/homeworks.shtml
CMU: 10-601 Machine Learning Spring 2015 - Recitation 10
by Kirstin Early
Topics: support vector machines (SVM), multi-class classification, constrained optimization using
Lagrange multipliers
https://www.youtube.com/watch?v=S4Cjl_GwGZg&spfreload=10
CMU: Abulhair Saparov’s Youtube Channel
https://www.youtube.com/channel/UC3IXpkDpzturFkkvGJ-HeMg?spfreload=10
CMU: Machine Learning Course by Roni Rosenfeld, Spring
2015
Topics covered in 10-601A include concept learning, version spaces, information theory, decision
trees, neural networks, estimation & the bias-variance tradeoff, hypothesis testing in machine
learning, Bayesian learning, the Minimum Description Length principle, the Gibbs classifier,
Naïve Bayes classifier, Bayes Nets & Graphical Models, the EM algorithm, Hidden Markov
Models, K-Nearest-Neighbors and nonparametric learning, Maximum Margin classifiers (SVM)
and kernel based methods, bagging, boosting and Deep Learning.
This section of 10-601 focuses on the mathematical, statistical and computational foundations of
the field. It emphasizes the role of assumptions in machine learning. As we introduce different
ML techniques, we work out together what assumptions are implicit in them. We use the Socratic
Method whenever possible, and student participation is expected. We focus on conceptual depth,
at the possible expense of breadth.
http://www.cs.cmu.edu/~roni/10601/
CMU: Language and Statistics by Roni Rosenfeld, Spring 2015
Internet search, speech recognition, machine translation, question answering, information
retrieval, biological sequence analysis -- are all at the forefront of this century’s information
revolution. In addition to their use of machine learning, these technologies rely heavily on classic
statistical estimation techniques. Yet most CS and engineering undergraduate programs do not
prepare students in this area beyond an introductory probability & statistics course. This course is
designed to address this gap.
The goal of "Language and Statistics" is to ground the data-driven techniques used in language
technologies in sound statistical methodology. We start by formulating various language
technology problems in both an information theoretic framework (the source-channel paradigm)
and a Bayesian framework (the Bayes classifier). We then discuss the statistical properties of
words, sentences, documents and whole languages, and the various computational formalisms
used to represent language. These discussions naturally lead to specific concepts in statistical
estimation.
Topics include: Zipf's distribution and type-token curves; point estimators, Maximum Likelihood
estimation, bias and variance, sparseness, smoothing and clustering; interpolation, shrinkage, and
backoff; entropy, cross entropy and mutual information; decision tree models applied to language;
latent variable models and the EM algorithm; hidden Markov models; exponential models and
the maximum entropy principle; semantic modeling and dimensionality reduction; probabilistic
context-free grammars and syntactic language models.
http://www.cs.cmu.edu/~roni/11761/
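Two of the listed topics, Zipf-style rank/frequency counts and type-token curves, can be sketched in a few lines. The snippet below uses a made-up toy text of my own, not course material:

```python
from collections import Counter

# Toy corpus for illustration (not from the course).
text = ("the cat sat on the mat and the dog sat on the log "
        "the cat and the dog ran").split()

# Rank/frequency table: sort word types by frequency. Zipf's law says
# that in large corpora, frequency falls off roughly as 1/rank.
freqs = Counter(text)
ranked = freqs.most_common()
print(ranked[:3])  # the most frequent types

# Type-token curve: number of distinct word types seen after each token.
types_seen, curve = set(), []
for tok in text:
    types_seen.add(tok)
    curve.append(len(types_seen))
print(curve[-1], "types in", len(text), "tokens")
```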
Metacademy Concept list and roadmap list
Metacademy is a community-driven, open-source platform for experts to collaboratively
construct a web of knowledge. Right now, Metacademy focuses on machine learning and
probabilistic AI, because that's what the current contributors are experts in. But eventually,
Metacademy will cover a much wider breadth of knowledge, e.g. mathematics, engineering,
music, medicine, computer science…
http://www.metacademy.org/list
http://www.metacademy.org/roadmaps/
HARVARD University: Advanced Machine Learning, Fall 2013
This course is about learning to extract statistical structure from data, for making decisions and
predictions, as well as for visualization. The course will cover many of the most important mathematical and computational tools for probabilistic modeling, as well as examine specific models
from the literature and examine how they can be used for particular types of data. There will be
a heavy emphasis on implementation. You may use Matlab, Python or R. Each of the five assignments will involve some amount of coding, and the final project will almost certainly require the
running of computer experiments.
https://www.seas.harvard.edu/courses/cs281/
HARVARD University: Data Science Course, Fall 2013
Learning from data in order to gain useful predictions and insights. This course introduces
methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a
suitable data set; data management to be able to access big data quickly and reliably; exploratory
data analysis to generate hypotheses and intuition; prediction based on statistical methods such as
regression and classification; and communication of results through visualization, stories, and
interpretable summaries.
We will be using Python for all programming assignments and projects.
http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml
OXFORD University: Nando de Freitas Video Lectures
I am a machine learning professor at UBC. I am making my lectures available to the world with
the hope that this will give more folks out there the opportunity to learn some of the wonderful
things I have been fortunate to learn myself. Enjoy.
http://www.youtube.com/user/ProfNandoDF
OXFORD University: Deep learning - Introduction by Nando
de Freitas, 2015
Published on 29 Jan 2015
Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from
Brendan Shillingford.
https://www.youtube.com/watch?v=PlhFWT7vAEw&spfreload=10
OXFORD University: Deep learning - Linear Models by Nando
de Freitas, 2015
Published on 29 Jan 2015 (bad audio)
Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from
Brendan Shillingford.
https://www.youtube.com/watch?v=DHspIG64CVM&spfreload=10
OXFORD University: Yee Whye Teh Home Page, Department
of Statistics, University College
Research Interests
I am interested in machine learning, Bayesian statistics and computational statistics. My current
focus is on developing Bayesian nonparametric methodologies, with applications to large and
complex problems in unsupervised learning, computational linguistics, and genetics.
Teaching : Statistical Machine Learning and Data Mining (MS1b HT2014)
Slides and Problem Sheets with Solutions (not to be missed!)
http://www.stats.ox.ac.uk/~teh/smldm.html
About Bayesian Nonparametrics (MLSS 2013)
https://www.youtube.com/embed/dNeW5zoNJ7g?vq=hd1080&autoplay=1
https://www.youtube.com/embed/7sy_MCbqtco?vq=hd1080&autoplay=1
https://www.youtube.com/embed/kqEWDdTB_3Q?vq=hd1080&autoplay=1
https://www.youtube.com/watch?v=FO0fgVS9OmE&spfreload=10
Slides
http://mlss.tuebingen.mpg.de/2013/slides_teh.pdf
CAMBRIDGE University: Machine Learning Slides, Spring
2014
LECTURE SYLLABUS
This year, the exposition of the material will be centered around three specific machine learning
areas: 1) supervised non-parametric probabilistic inference using Gaussian processes, 2) the
TrueSkill ranking system and 3) the latent Dirichlet Allocation model for unsupervised learning
in text.
http://mlg.eng.cam.ac.uk/teaching/4f13/1314/
CALTECH University: Learning from Data
Free, introductory Machine Learning online course (MOOC)
Taught by Caltech Professor Yaser Abu-Mostafa [article]
Lectures recorded from a live broadcast, including Q&A
Prerequisites: Basic probability, matrices, and calculus
8 homework sets and a final exam
Discussion forum for participants
Topic-by-topic video library for easy review
http://work.caltech.edu/telecourse.html
http://work.caltech.edu/library/
UNIVERSITY COLLEGE LONDON (UCL): Discovery
UCL Discovery showcases UCL's research publications, giving access to journal articles, book
chapters, conference proceedings, digital web resources, theses and much more, from all UCL
disciplines. Where copyright permissions allow, a full copy of each research publication is directly
available from UCL Discovery.
You can search or browse UCL Discovery, see the most-downloaded publications, and keep up to
date with the latest UCL research by RSS or even on Twitter.
UCL Discovery supports UCL's Publications Policy.
http://discovery.ucl.ac.uk
http://www.youtube.com/watch?v=Euaoblv_nL8
UCL: Supervised Learning by Mark Herbster
The course covers supervised approaches to machine learning. It starts with probabilistic pattern
recognition, followed by an in-depth introduction to various supervised learning algorithms such
as Least Squares, Lasso, the Perceptron Algorithm, Support Vector Machines and Boosting.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/GI01/
Yann LeCun’s Publications
My main research interests are Machine Learning, Computer Vision, Mobile Robotics, and
Computational Neuroscience. I am also interested in Data Compression, Digital Libraries, the
Physics of Computation, and all the applications of machine learning (Vision, Speech, Language,
Document understanding, Data Mining, Bioinformatics).
http://yann.lecun.com/exdb/publis/index.html#fulllist
Ecole Normale Superieure: Francis Bach, Courses and Exercises
with solutions (English-French)
Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Spring 2012: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2011: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2011: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2010: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2010: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2009: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2008: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2008: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2007: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
Fall 2006: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2005: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
http://www.di.ens.fr/~fbach/
http://videolectures.net/francis_r_bach/
Technion, Israel Institute of Technology, Machine Learning
Videos
Added on 22-Nov-2014
Technion - Israel Institute of Technology is Israel's biggest scientific-technological university and
one of the largest centers of applied research in the world. Here the future is being shaped - by
over 13,000 of Israel's most dynamic students active in 18 faculties. Technion is Israel's flagship
of world-class education, bringing Israel its first Nobel Prizes in science. From the cornerstone
laying ceremony in 1912, Technion's over 70,000 alumni have built the state of Israel and created
and lead the majority of Israel's successful companies, impacting millions of scientists, students,
entrepreneurs and citizens worldwide.
http://www.youtube.com/user/Technion/search?query=machine+learning
E0 370: Statistical Learning Theory by Prof. Shivani Agarwal,
Indian Institute of Science
Course Description
This is an advanced course on learning theory suitable for PhD students working in learning
theory or related areas (e.g. information theory, game theory, computational complexity theory,
etc.) or 2nd-year Masters students doing a machine learning related project that involves
learning-theoretic concepts. The course will consist broadly of three parts and will cover roughly
the following topics:
Generalization error bounds
Uniform convergence
Growth function, VC-dimension, Sauer's Lemma
Covering numbers, pseudo-dimension, fat-shattering dimension
Margin analysis
Rademacher averages
Algorithmic stability
Statistical consistency and learnability
Consistency of ERM and SRM methods
Learnability/PAC learning
Consistency of nearest neighbor methods
Consistency of surrogate risk minimization methods (binary and multiclass)
Online learning and multi-armed bandits
Online classification/regression
Online learning from experts, online allocation
Online convex optimization
Online-to-batch conversions
Multi-armed bandits (stochastic and adversarial)
http://drona.csa.iisc.ernet.in/~shivani/Teaching/E0370/Aug-2013/index.html#lectures
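One of the listed topics, Sauer's Lemma, can be stated compactly (this is the standard form of the result, not taken from the course notes): for a hypothesis class of VC-dimension d, the growth function satisfies

```latex
\Pi_{\mathcal{H}}(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{em}{d}\right)^{d} \qquad (m \ge d),
```

which is the combinatorial fact underlying the uniform-convergence generalization bounds in the first part of the course.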
NPTEL, National Programme on Technology Enhanced
Learning, India
NPTEL provides E-learning through online Web and Video courses in Engineering, Science and
humanities streams. The mission of NPTEL is to enhance the quality of Engineering education
in the country by providing free online courseware.
http://nptel.ac.in
Probability Theory and Applications
http://nptel.ac.in/courses/111104079/
Pattern Recognition
http://nptel.ac.in/courses/106106046/1
Pattern Recognition Class, Universität Heidelberg, 2012 (Videos
in English)
Syllabus:
1. Introduction
1.1 Applications of Pattern Recognition
1.2 k-Nearest Neighbors Classification
1.3 Probability Theory
1.4 Statistical Decision Theory
2. Correlation Measures, Gaussian Models
2.1 Pearson Correlation
2.2 Alternative Correlation Measures
2.3 Gaussian Graphical Models
2.4 Discriminant Analysis
3. Dimensionality Reduction
3.1 Regularized LDA/QDA
3.2 Principal Component Analysis (PCA)
3.3 Bilinear Decompositions
4. Neural Networks
4.1 History of Neural Networks
4.2 Perceptrons
4.3 Multilayer Perceptrons
4.4 The Projection Trick
4.5 Radial Basis Function Networks
5. Support Vector Machines
5.1 Loss Functions
5.2 Linear Soft-Margin SVM
5.3 Nonlinear SVM
6. Kernels, Random Forest
6.1 Kernels
6.2 One-Class SVM
6.3 Random Forest
6.4 Random Forest Feature Importance
7. Regression
7.1 Least-Squares Regression
7.2 Optimum Experimental Design
7.3 Case Study: Functional MRI
7.4 Case Study: Computer Tomography
7.5 Regularized Regression
8. Gaussian Processes
8.1 Gaussian Process Regression
8.2 GP Regression: Interpretation
8.3 Gaussian Stochastic Processes
8.4 Covariance Function
9. Unsupervised Learning
9.1 Kernel Density Estimation
9.2 Cluster Analysis
9.3 Expectation Maximization
9.4 Gaussian Mixture Models
10. Directed Graphical Models
10.1 Bayesian Networks
10.2 Variable Elimination
10.3 Message Passing
10.4 State Space Models
11. Optimization
11.1 The Lagrangian Method
11.2 Constraint Qualifications
11.3 Linear Programming
11.4 The Simplex Algorithm
12. Structured Learning
12.1 structSVM
12.2 Cutting Planes
https://www.youtube.com/playlist?list=PLuRaSnb3n4kRDZVU6wxPzGdx1CN12fn0w&spfreload=10
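Section 1.2 of the syllabus covers k-nearest-neighbors classification. A minimal sketch, with made-up data of my own rather than anything from the lectures, looks like this:

```python
import math
from collections import Counter

# Minimal k-nearest-neighbors classifier; the training points below
# are invented for illustration.
def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs; query: an (x, y) point."""
    # Sort training points by Euclidean distance to the query.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # -> a
print(knn_predict(train, (5.5, 5.5)))  # -> b
```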
Videolectures.net
VideoLectures.NET is an award-winning free and open access educational video lectures
repository. The lectures are given by distinguished scholars and scientists at the most important
and prominent events like conferences, summer schools, workshops and science promotional
events from many fields of Science. The portal is aimed at promoting science, exchanging ideas
and fostering knowledge sharing by providing high quality didactic contents not only to the
scientific community but also to the general public. All lectures, accompanying documents,
information and links are systematically selected and classified through the editorial process
taking into account also users' comments.
http://videolectures.net/Top/Computer_Science/Machine_Learning/
http://videolectures.net/Top/Computer_Science/Machine_Learning/#o=top
MLSS Machine Learning Summer Schools Videos
MLSS Videos from 2004 to 2012
http://videolectures.net/site/search/?q=MLSS
MLSS Videos 2012
http://www.youtube.com/user/compcinemaucsc/feed
MLSS Videos 2012
http://www.youtube.com/channel/UCHhbDEKA7BP58mq1wfTBQNQ
Max Planck Institute for Intelligent Systems Tubingen, MLSS Videos 2013
Our goal is to understand the principles of Perception, Action and Learning in autonomous
systems that successfully interact with complex environments and to use this understanding to
design future systems. The Institute studies these principles in biological, computational, hybrid,
and material systems ranging from nano to macro scales. We take a highly interdisciplinary
approach that combines mathematics, computation, material science, and biology.
The MPI for Intelligent Systems has campuses in Stuttgart and Tübingen. Our Stuttgart campus
has world-leading expertise in small-scale intelligent systems that leverage novel material science
and biology. The Tübingen campus focuses on how intelligent systems process information to
perceive, act and learn.
http://www.youtube.com/channel/UCty-pPOWlWUk4gXNm5pydcg
http://mlss.tuebingen.mpg.de/2013/speakers.html
MLSS Videos 2014
https://www.youtube.com/playlist?list=PLZSO_6bSqHQCIYxE3ycGLXHMjK3XV7Iz&spfreload=10
All slides of MLSS 2015, Austin, Texas
http://www.cs.utexas.edu/mlss/schedule
GoogleTechTalks
Machine Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=machine+learning
Deep Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=deep+learning
Udacity Opencourseware
Supervised Learning (select "View Courseware" for free access)
Why Take This Course?
In this course, you will gain an understanding of a variety of topics and methods in Supervised
Learning. Like function approximation in general, Supervised Learning prompts you to make
generalizations based on fundamental assumptions about the world.
Michael: So why wouldn't you call it "function induction?"
Charles: Because someone said "supervised learning" first.
Topics covered in this course include: Decision trees, neural networks, instance-based learning,
ensemble learning, computational learning theory, Bayesian learning, and many other fascinating
machine learning concepts.
https://www.udacity.com/course/ud675
Unsupervised Learning (select "View Courseware" for free access)
Why Take This Course?
You will learn about and practice a variety of Unsupervised Learning approaches, including:
randomized optimization, clustering, feature selection and transformation, and information
theory.
You will learn important Machine Learning methods, techniques and best practices, and will
gain experience implementing them in this course through a hands-on final project in which you
will be designing a movie recommendation system (just like Netflix!).
https://www.udacity.com/course/ud741
Reinforcement Learning (select "View Courseware" for free access)
Why Take This Course?
You will learn about Reinforcement Learning, the field of Machine Learning concerned with the
actions that software agents ought to take in a particular environment in order to maximize
rewards.
Michael: Reinforcement Learning is a very popular field.
Charles: Perhaps because you're in it, Michael.
Michael: I don't think that's it.
In this course, you will gain an understanding of topics and methods in Reinforcement Learning,
including Markov Decision Processes and Game Theory. You will gain experience implementing
Reinforcement Learning techniques in a final project.
In the final project, we’ll bring back the 80's and design a Pacman agent capable of eating all the
food without getting eaten by monsters.
https://www.udacity.com/course/ud820
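The Markov Decision Processes mentioned above can be illustrated with a tiny value-iteration sketch. The two-state MDP below is a toy of my own, not an example from the course:

```python
# Tiny value iteration on a two-state deterministic MDP (toy numbers).
# In state 1, "stay" pays 2 each step; in state 0, "move" pays 1 and
# leads to state 1. Discount factor gamma = 0.9.
rewards = {(0, "stay"): 0.0, (0, "move"): 1.0,
           (1, "stay"): 2.0, (1, "move"): 0.0}
next_state = {(0, "stay"): 0, (0, "move"): 1,
              (1, "stay"): 1, (1, "move"): 0}
gamma = 0.9

V = {0: 0.0, 1: 0.0}
for _ in range(1000):
    # Bellman optimality update: best one-step reward plus discounted value.
    V = {s: max(rewards[s, a] + gamma * V[next_state[s, a]]
                for a in ("stay", "move"))
         for s in V}
print(V)  # state 1 converges to 2/(1-0.9) = 20; state 0 to 1 + 0.9*20 = 19
```

The update is a contraction, so the values converge geometrically regardless of the starting guess.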
Model Building and Validation
Advanced Techniques for Analyzing Data
Course Summary
This course will teach you how to start from scratch in answering questions about the real world
using data. Machine learning happens to be a small part of this process. The model building
process involves setting up ways of collecting data, understanding and paying attention to what is
important in the data to answer the questions you are asking, finding a statistical, mathematical
or a simulation model to gain understanding and make predictions.
All of these things are equally important, and model building is a crucial skill to acquire in every
field of science. The process stays true to the scientific method, making what you learn through
your models useful for understanding whatever you are investigating, as well as for making
predictions that hold up when tested.
We will take you on a journey through building various models. This process involves asking
questions, gathering and manipulating data, building models, and ultimately testing and
evaluating them.
https://www.udacity.com/course/ud919
Udacity's Videos
Udacity, a pioneer in online education, is building "University by Silicon Valley", a new type of
online university that:
- teaches the actual programming skills that industry employers need today;
- delivers credentials endorsed by employers, because they built them;
- provides education at a fraction of the cost and time of traditional schools.
With industry giants - Google, AT&T, Facebook, Salesforce, Cloudera, etc. - we offer
Nanodegree credentials, designed so professionals become Web Developers, Data Analysts, or
Mobile Developers. Supported by our communities of coaches and students, our students learn
programming and data science through a series of online courses and hands-on projects that help
them practice and build a convincing portfolio.
https://www.youtube.com/user/Udacity/videos?spfreload=10
Mathematicalmonk Machine Learning
Videos about math, at the graduate level or upper-level undergraduate.
https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA
Judea Pearl Symposium
Judea Pearl (born 1936) is an Israeli-born American computer scientist and philosopher, best
known for championing the probabilistic approach to artificial intelligence and the development
of Bayesian networks (see the article on belief propagation). He is also credited for developing a
theory of causal and counterfactual inference based on structural models (see article on
causality). He is the 2011 winner of the ACM Turing Award, the highest distinction in computer
science, "for fundamental contributions to artificial intelligence through the development of a
calculus for probabilistic and causal reasoning". (source Wikipedia)
http://www.youtube.com/playlist?list=PLMliWGoMCBYilM6tw6S_4BpL_t29jbWsp
http://www.youtube.com/user/UCLA/playlists
SIGDATA, Indian Institute of Technology Kanpur
http://www.cse.iitk.ac.in/users/sigdata/
http://www.cse.iitk.ac.in/users/sesres/
Hakka Labs
Hakka Labs is passionate about helping professional software engineers level up in their careers.
Our content, events & community have grown by leaps and bounds since our humble origin
when we launched as a Tumblr blog in 2011.
We believe that "software is eating the world" and our passion is in building valuable resources
and community for startup-oriented software engineers - the folks that will power innovation and
disrupt industries, and ultimately shape our future.
Hakka originally launched in SF Bay & NYC and rapidly built relationships with the top
companies, CTOs and tech influencers in these key areas. We have deep connections to the
software engineering worlds on both coasts and often invite groups of CTOs and engineers to
our office in Soho, or meet with them at engineering events that we either run or participate in.
We're also currently up & running in Berlin & Moscow, and plan to continue to rapidly expand
worldwide. Not too shabby for a scrappy startup with a small marketing budget!
http://www.hakkalabs.co
https://www.youtube.com/user/g33ktalktv/videos
Open Yale Course
Game Theory
Each course includes a full set of class lectures produced in high-quality video accompanied by
such other course materials as syllabi, suggested readings, exams, and problem sets. The lectures
are available as downloadable videos, and an audio-only version is also offered. In addition,
searchable transcripts of each lecture are provided.
http://oyc.yale.edu/courses
COLUMBIA University: Machine Learning resources
Course related notes
Regression by linear combination of basis functions [ps] [pdf]
The perceptron [ps] [pdf]
Document classification with the multinomial model [ps] [pdf]
Sampling from a Gaussian [ps] [pdf]
Slides on exponential family distributions [ps] [pdf]
http://www.cs.columbia.edu/~jebara/4771/tutorials.html
COLUMBIA University: Applied Data Science by Ian
Langmore and Daniel Krasner
The purpose of this course is to take people with strong mathematical/statistical knowledge and
teach them software development fundamentals. This course will cover
• Design of small software packages
• Working in a Unix environment
• Designing software in teams
• Fundamental statistical algorithms such as linear and logistic regression
• Overfitting and how to avoid it
• Working with text data (e.g. regular expressions)
• Time series
• And more. . .
http://columbia-applied-data-science.github.io/appdatasci.pdf
http://columbia-applied-data-science.github.io
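As an illustration of one of the listed "fundamental statistical algorithms", here is a bare-bones logistic regression fit by gradient descent. The 1-D data are toy values of my own, not from the course materials:

```python
import math

# Toy linearly separable 1-D data: negatives below zero, positives above.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    # Gradient of the average log-loss with respect to w and b.
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(w * x + b)))  # predicted probability
        gw += (p - y) * x / len(xs)
        gb += (p - y) / len(xs)
    w, b = w - lr * gw, b - lr * gb

def predict(x):
    return 1 / (1 + math.exp(-(w * x + b))) > 0.5

print([predict(x) for x in xs])  # matches ys on this separable toy set
```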
Deep Learning
Deep Learning is a new area of Machine Learning research, which has been introduced with the
objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.
This website is intended to host a variety of resources and pointers to information about Deep
Learning. In these pages you will find
• a reading list,
• links to software,
• datasets,
• a list of deep learning research groups and labs,
• a list of announcements for deep learning related jobs (job listings),
• as well as tutorials and cool demos.
For the latest additions, including papers and software announcements, be sure to visit the
Blog section and subscribe to our RSS feed of the website. Contact us if you have any
comments or suggestions!
http://www.deeplearning.net/tutorial/
http://deeplearning.net
BigDataWeek Videos
Big Data Week is one of the most unique global platforms of interconnected community events
focusing on the social, political, technological and commercial impacts of Big Data. It brings
together a global community of data scientists, data technologies, data visualisers and data
businesses spanning six major commercial, financial, social and technological sectors.
http://www.youtube.com/user/BigDataWeek/videos
Neural Information Processing Systems Foundation (NIPS)
Video resources
The Foundation: The Neural Information Processing Systems (NIPS) Foundation is a non-profit
corporation whose purpose is to foster the exchange of research on neural information
processing systems in their biological, technological, mathematical, and theoretical aspects.
Neural information processing is a field which benefits from a combined view of biological,
physical, mathematical, and computational sciences.
The primary focus of the NIPS Foundation is the presentation of a continuing series of
professional meetings known as the Neural Information Processing Systems Conference, held
over the years at various locations in the United States, Canada and Spain.
http://www.youtube.com/user/NeuralInformationPro/feed
NIPS 2014 Workshop Videos
https://www.youtube.com/user/NeuralInformationPro/videos?spfreload=10
NIPS 2014 Workshop - (Bengio) OPT2014 Optimization for
Machine Learning
Optimization lies at the heart of many machine learning algorithms and enjoys great interest in
our community. Indeed, this intimate relation of optimization with ML is the key motivation for
the OPT series of workshops. We aim to foster discussion, discovery, and dissemination of the
state-of-the-art in optimization relevant to ML. This year, as the seventh in its series, the
workshop's special topic will be the challenges in non-convex optimization, with contributions
spanning both the challenges (hardness results) and the opportunities (modeling flexibility) of
non-convex optimization. Irrespective of the special topic, the workshop will again warmly
welcome contributed talks and posters on all topics in optimization for machine learning. The
confirmed invited speakers for this year are: * Amir Beck (Technion, Israel) * Jean Bernard
Lasserre (CNRS, France) * Yoshua Bengio (University of Montreal, Canada)
https://www.youtube.com/watch?v=jl-s4gFWhlI&spfreload=10
Hong Kong Open Source Conference 2013 (English&Chinese)
Wang Leung Wong
The Vice-Chairperson of the Hong Kong Linux User Group
This channel will post the videos of my life and opensource events in Hong Kong.
Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f
http://www.youtube.com/playlist?list=PL2FSfitY-hTLOL6tT_12YUK4c67e-E0xh
ICLR 2014 Videos
It is well understood that the performance of machine learning methods is heavily dependent on
the choice of data representation (or features) on which they are applied. The rapidly developing
field of representation learning is concerned with questions surrounding how we can best learn
meaningful and useful representations of data. We take a broad view of the field, and include in
it topics such as deep learning and feature learning, metric learning, kernel learning,
compositional models, non-linear structured prediction, and issues regarding non-convex
optimization.
Despite the importance of representation learning to machine learning and to application areas
such as vision, speech, audio and NLP, there is currently no common venue for researchers who
share a common interest in this topic. The goal of ICLR is to help fill this void.
ICLR 2014 will be a 3-day event from April 14th to April 16th 2014, in Banff, Canada. The
conference will follow the recently introduced open reviewing and open publishing publication
process, which is explained in further detail here: Publication Model.
https://www.youtube.com/playlist?list=PLhiWXaTdsWB-3O19E0PSR0r9OseIylUM8
ICLR 2013 Videos
ICLR 2013 will be a 3-day event from May 2nd to May 4th 2013, co-located
with AISTATS2013 in Scottsdale, Arizona. The conference will adopt a novel publication
process, which is explained in further detail here: Publication Model.
https://sites.google.com/site/representationlearning2013/program-details/program
Machine Learning Conference Videos
Events matching your search:
• ICML 2011
• Sixth Annual Machine Learning Symposium
• 1st Lisbon Machine Learning School
• Copulas in Machine Learning Workshop 2011
• NIPS 2011 Workshop on Integrating Language and Vision
• Machine Learning in Computational Biology (MLCB) 2011
• Learning Semantics Workshop
• Sparse Representation and Low-rank Approximation
• The 4th International Workshop on Music and Machine Learning: Learning from
Musical Structure
• Big Learning: Algorithms, Systems, and Tools for Learning at Scale
• ICML 2012 Oral Talks (International Conference on Machine Learning)
• Big Data Meets Computer Vision: First International Workshop on Large Scale
Visual Recognition and Retrieval
• 2nd Workshop on Semantic Perception, Mapping and Exploration (SPME)
• ICML 2012 Workshop on Representation Learning
• Inferning 2012: ICML Workshop on interaction between Inference and Learning
• Object, functional and structured data: towards next generation kernel-based
methods - ICML 2012 Workshop
• Tutorial on Statistical Learning Theory in Reinforcement Learning and
Approximate Dynamic Programming
• Tutorial on Causal inference - conditional independences and beyond
• ICML 2012 Tutorial on Prediction, Belief, and Markets
• PAC-Bayesian Analysis in Supervised, Unsupervised, and Reinforcement
Learning
• Performance Evaluation for Learning Algorithms: Techniques, Application and Issues
• 2nd Lisbon Machine Learning School (2012)
• OpenCV using Python
• Big Learning : Algorithms, Systems, and Tools
• NIPS 2012 Workshop on Log-Linear Models
• Machine Learning in Computational Biology (MLCB) 2012
• NYU Course on Big Data, Large Scale Machine Learning
• Sixteenth International Conference on Artificial Intelligence and Statistics
(AISTATS) 2013
• International Conference on Learning Representations (ICLR) 2013
• ICML 2013 Plenary Webcast
• NYU Course on Deep Learning (Spring 2014)
• NYU Course on Machine Learning and Computational Statistics 2014
http://techtalks.tv/search/results/?q=machine+learning
Internet Archive
The Internet Archive serves 3 million people a day and has archived over ten petabytes of
information. Its holdings include the TV News Search and Borrow service, which former FCC
Chairman Newton Minow said "offers citizens exceptional opportunities" to easily do their own
fact checking and "to hold powerful public institutions accountable."
https://archive.org/search.php?query=machine%20learning
University of California, Berkeley
http://www.youtube.com/user/UCBerkeley/search?query=machine learning
AMP Camps, Big Data Bootcamp, UC Berkeley
AMP Camps are Big Data training events organized by the UC Berkeley AMPLab about big
data analytics, machine learning, and popular open-source software projects produced by the
AMPLab. All AMP Camp curriculum, and whenever possible videos of instructional talks
presented at AMP Camps, are published here and accessible for free.
http://ampcamp.berkeley.edu
AMP Camp 5 was held at UC Berkeley and live-streamed online on November 20 and 21, 2014.
Videos and exercises from the event are available on the AMPCamp 5 page.
http://ampcamp.berkeley.edu/5/
AI on the Web, AIMA (Artificial Intelligence: A Modern
Approach) by Stuart Russell and Peter Norvig
This page links to 820 pages around the web with information on Artificial Intelligence. Links in
bold followed by a star (*) are especially useful and interesting sites. Links with a sign at the end
have "tooltip" information that will pop up if you put your mouse over the link for a second or
two. If you have new links to add, mail them to peter@norvig.com.
http://aima.cs.berkeley.edu/ai.html
http://aima.cs.berkeley.edu/ai.html#learning
Resources and Tools of Noah's ARK Research Group
The following were developed by ARK researchers (*developed in whole or in part before joining
ARK):
NLP tools:
universal part-of-speech tagset, a set of twelve coarse POS tags that generalizes across several
languages
Semantics: SEMAFOR, an open-source statistical frame-semantic parser; AMALGr, an open-source statistical analyzer for multiword expressions in context
Syntax: TurboParser, an open-source, trainable statistical dependency parser;
MSTParserStacked, an open-source, trainable statistical dependency parser based on stacking;
DAGEEM code for unsupervised dependency grammar induction
Information extraction: Arabic named entity recognizer
Libraries/languages: AD3, an approximate MAP decoder; *Dyna, a declarative programming
language for dynamic programming algorithms
Machine translation tools, including: *cdec, a framework for statistical translation and other
structure prediction problems; *Egypt, a statistical machine translation toolkit that includes Giza;
gappy pattern models, code for modeling monolingual and bilingual textual patterns with gaps;
Rampion, a training algorithm for statistical machine translation models
Social media tools, including: Twitter NLP resources
Datasets: *STRAND (parallel text collections from the web); CURD (the Carnegie Mellon
University Recipe Database); 10-K Corpus (company annual reports and stock return volatility
data); political blog corpus; movie$ corpus; movie summary corpus; question-answer data;
Congressional bills corpus; Arabic named entity and supersense corpora; NFL tweets corpus;
multiword expressions corpus
Project websites: Flexible Learning for NLP; Low-Density MT; Compuframes, Big
Multilinguality, Corporate Social Network
http://www.ark.cs.cmu.edu/#resources
ESAC DATA ANALYSIS AND STATISTICS WORKSHOP
2014
ABOUT THE ESAC FACULTY
The ESAC Faculty was created in 2006 in order to foster an effective scientific environment at
ESAC, and to present a united face for the scientific work done at the centre. The faculty
includes all active (i.e. publishing papers) research scientists at ESAC: ESA staff, Research
Fellows, Science Contractors, and LAEFF members. For an insight into the founding principles,
see the Overview of the ESAC Faculty presentation given at the first assembly.
The ESAC Faculty's main purpose is to stimulate and promote science activities at ESAC. For
this it maintains an active and attractive visitor programme for short-to-medium term
collaborative stays at ESAC, covering established researchers as well as young post-docs, PhD
and graduate students. The Faculty also supports visiting seminar speakers, conferences,
workshops and travel not possible via normal mission budgets.
ESAC Faculty members pursue their own research (as per the scientific interests of individual
members), but are also involved in numerous internal and external collaborations (overview of
Faculty Science at ESAC). Faculty members are also strongly involved in the ESAC Trainee
programme.
http://www.cosmos.esa.int/web/esac-science-faculty/esac-statistics-workshop-2014
The Royal Society
The Royal Society is a self-governing Fellowship of many of the world’s most distinguished
scientists drawn from all areas of science, engineering, and medicine.
The Society’s fundamental purpose, reflected in its founding Charters of the 1660s, is to
recognise, promote, and support excellence in science and to encourage the development and use
of science for the benefit of humanity.
The Society has played a part in some of the most fundamental, significant, and life-changing
discoveries in scientific history and Royal Society scientists continue to make outstanding
contributions to science in many research areas.
The Royal Society is the national Academy of science in the UK, and its core is its Fellowship
and Foreign Membership, supported by a dedicated staff in London and elsewhere. The
Fellowship comprises the most eminent scientists of the UK, Ireland and the Commonwealth.
A major activity of the Society is identifying and supporting the work of outstanding scientists.
The Society supports researchers through its early and senior career schemes, innovation and
industry schemes, and other schemes.
The Society facilitates interaction and communication among scientists via its discussion
meetings, and disseminates scientific advances through its journals. The Society also engages
beyond the research community, through independent policy work, the promotion of high quality
science education, and communication with the public.
https://www.youtube.com/user/RoyalSociety/videos?spfreload=10
Statistical and causal approaches to machine learning by
Professor Bernhard Schölkopf
https://www.youtube.com/watch?v=ek9jwRA2Jio&spfreload=10
Deep Learning RNNaissance with Dr. Juergen Schmidhuber
A great session of the NYC-ML Meetup, hosted by Shutterstock in the glorious Empire State
building. Details:
Deep Learning RNNaissance
Machine learning and pattern recognition are currently being revolutionised by "Deep Learning"
(DL)
https://www.youtube.com/watch?v=6bOMf9zr7N8&spfreload=10
Introduction to Deep Learning with Python by Alec Radford
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning with
Python and the Theano library. The emphasis of the talk is on high performance computing,
natural language processing using recurrent neural nets, and large scale learning with GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
SlideShare presentation is available here:
http://slidesha.re/1zs9M11
A Statistical Learning/Pattern Recognition Glossary by Thomas
Minka
Welcome to my glossary. It is inspired by Brian Ripley's glossary in "Pattern Recognition for
Neural Networks" (and the need to save time explaining things).
http://alumni.media.mit.edu/~tpminka/statlearn/glossary/
The Kalman Filter Website by Greg Welch and Gary Bishop
The Kalman Filter
Some tutorials, references, and research related to the Kalman filter.
This site is maintained by Greg Welch in Nursing / Computer Science / Simulation & Training
at the University of Central Florida, and Gary Bishop in the Department of Computer Science
at the University of North Carolina at Chapel Hill. Welch also holds an adjunct position at
UNC-Chapel Hill. Please send additions or comments.
http://www.cs.unc.edu/~welch/kalman/index.html
Lisbon Machine Learning School (LXMLS)
LXMLS Lab guide (Great Tutorial!)
Day 0
In this class we will introduce several fundamental concepts needed further ahead. We start with
an introduction to Python, the programming language we will use in the lab sessions, and to
Matplotlib and Numpy, two modules for plotting and scientific computing in Python, respectively.
Afterwards, we present several notions on probability theory and linear algebra. Finally, we focus
on numerical optimization.
The goal of this class is to give you the basic knowledge needed to understand the following
lectures. We will not go into too much detail on any of the topics.
Day 1
This day will serve as an introduction to machine learning. We recall some fundamental concepts
about decision theory and classification. We also present some widely used models and
algorithms and try to provide the main motivation behind them. There are several textbooks that
provide a thorough description of some of the concepts introduced here: for example, Mitchell
(1997), Duda et al. (2001), Schölkopf and Smola (2002), Joachims (2002), Bishop (2006),
Manning et al. (2008), to name just a few. The concepts that we introduce in this chapter will be
revisited in later chapters, where the same algorithms and models will be adapted to structured
inputs and outputs. For now, we concern ourselves only with multi-class classification (with just
a few classes).
Day 2
In this class, we relax the assumption that the data points are independently and identically
distributed (i.i.d.) by moving to a scenario of structured prediction, where the inputs are assumed
to have temporal or spatial dependencies. We start by considering sequential models, which
correspond to a chain structure: for instance, the words in a sentence. In this lecture, we will use
part-of-speech tagging as our example task.
We start by defining the notation for this lecture in Section 2.1. Afterwards, in section 2.2, we
focus on the well-known Hidden Markov Models and in Section 2.3 we describe how to estimate
their parameters from labeled data. In Section 2.4 we explain the inference algorithms (Viterbi and
Forward-Backward) for sequence models. These inference algorithms will be fundamental for the
rest of this lecture, as well as for the next lecture on discriminative training of sequence models.
In Section 2.6 we describe the task of Part-of-Speech tagging, and how the Hidden Markov
Models are suitable for this task. Finally, in Section 2.7 we address unsupervised learning of
Hidden Markov Models through the Expectation Maximization algorithm.
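The Viterbi inference described for Day 2 can be sketched in a few lines of Python. The toy POS-tagging probabilities below are invented for illustration; this is not the lab guide's actual model or code:

```python
# Minimal Viterbi decoding for a toy HMM (hypothetical probabilities,
# not the LXMLS lab's actual model).
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # delta[t][s] = probability of the best path ending in state s at time t
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    backpointer = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        backpointer.append({})
        for s in states:
            # Best previous state leading into s
            prev = max(states, key=lambda p: delta[t - 1][p] * trans_p[p][s])
            delta[t][s] = delta[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            backpointer[t][s] = prev
    # Backtrack from the best final state
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    return list(reversed(path))

# Toy POS-tagging example: two hidden tags, three word types.
states = ("Noun", "Verb")
start_p = {"Noun": 0.6, "Verb": 0.4}
trans_p = {"Noun": {"Noun": 0.3, "Verb": 0.7},
           "Verb": {"Noun": 0.8, "Verb": 0.2}}
emit_p = {"Noun": {"dog": 0.6, "walk": 0.1, "the": 0.3},
          "Verb": {"dog": 0.1, "walk": 0.8, "the": 0.1}}
print(viterbi(("dog", "walk"), states, start_p, trans_p, emit_p))
```

The Forward-Backward algorithm replaces the `max` over previous states with a sum; the guide covers both in Section 2.4.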
Day 3
In this class, we will continue to focus on sequence classification, but instead of following a
generative approach (like in the previous chapter) we move towards discriminative approaches.
Recall that the difference between these approaches is that generative approaches attempt to
model the probability distribution of the data, P(X, Y), whereas discriminative ones only model
the conditional probability of the sequence given the observed data, P(Y | X).
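The Day 3 distinction can be made concrete with a tiny joint table (the numbers are invented for illustration): a generative model stores P(X, Y), and the conditional P(Y | X) used for prediction follows by normalization.

```python
# Toy illustration of the generative/discriminative distinction:
# a generative model stores the joint P(X, Y); the conditional P(Y | X)
# follows by normalizing over the labels. Numbers are made up.
joint = {  # P(X, Y) for two observations and two labels
    ("x1", "A"): 0.30, ("x1", "B"): 0.10,
    ("x2", "A"): 0.20, ("x2", "B"): 0.40,
}

def conditional(x, y):
    """P(Y = y | X = x) = P(x, y) / sum over y' of P(x, y')."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return joint[(x, y)] / p_x

print(conditional("x1", "A"))
```

A discriminative model would parameterize `conditional` directly and never estimate the joint table at all.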
Day 4
In this lab we will implement some exercises related to parsing.
Day 5
In this lab (and tomorrow), we will work with Amazon Web Services (AWS), a cloud-based
solution, to run some simple analyses. Then, in the next lab, we will build on these tools to
construct a larger learning system.
We will only look at small problems, such that you can run them both locally and on AWS
quickly. This way, you can learn how to use them within the limited time of these lab sessions.
Unfortunately, this also means that you will not be dealing with truly large-scale problems where
AWS is faster than local computations. You should consider these last two days as a proof-of-concept giving you the knowledge necessary to run things on AWS, which you can apply to your
own large-scale problems after this summer school.
Day 6
In the previous lesson, you learned the fundamentals of MapReduce and applied it to a simple
classification problem (language detection, using the Naïve Bayes classifier).
Today, we’re going to use MapReduce again to solve a trickier problem: using EM to perform
unsupervised POS induction.
Use the same login information you used yesterday to access your Amazon machine.
http://lxmls.it.pt/2014/guide.pdf
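The Day 6 exercise — MapReduce word counting feeding a Naïve Bayes language detector — can be imitated locally in miniature. This sketch uses an in-memory map and reduce phase and a made-up four-sentence corpus; it is a stand-in for the lab's actual AWS jobs, not the guide's code:

```python
import math
from collections import Counter, defaultdict

# A local, in-memory stand-in for the lab's MapReduce jobs: the map phase
# emits ((language, word), 1) pairs, the reduce phase sums them, and a
# Naïve Bayes classifier with add-one smoothing uses the counts.
corpus = [
    ("en", "the cat sat on the mat"),
    ("en", "the dog ate the food"),
    ("pt", "o gato sentou no tapete"),
    ("pt", "o cao comeu a comida"),
]

def map_phase(records):
    for lang, sentence in records:
        for word in sentence.split():
            yield (lang, word), 1

def reduce_phase(pairs):
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

counts = reduce_phase(map_phase(corpus))
vocab = {w for (_, w) in counts}
totals = defaultdict(int)
for (lang, _), c in counts.items():
    totals[lang] += c

def classify(sentence):
    """Pick the language maximizing the smoothed log-likelihood."""
    best = None
    for lang in totals:
        score = sum(
            math.log((counts[(lang, w)] + 1) / (totals[lang] + len(vocab)))
            for w in sentence.split()
        )
        if best is None or score > best[1]:
            best = (lang, score)
    return best[0]

print(classify("the cat ate"))  # "en"
```

On a real cluster, `map_phase` and `reduce_phase` would run as distributed jobs over sharded input; the classifier logic is unchanged.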
LXMLS Slides, 2014
During the morning there will be lectures focusing on the main areas of ML and their
application to NLP. These areas include but are not restricted to: Classification, Structured
Prediction (sequences, trees, graphs), Parsing, Information Retrieval, and their applications to
practical language processing on the Web.
For each topic introduced in the morning there will be a practical session in the afternoon, where
students will have the opportunity to test the concepts in practice. The practical sessions will
consist of implementation exercises (using Python, Numpy, and Matplotlib) of the methods
learned during the morning, testing them on real examples. A preliminary version of the lab
guide is available here.
http://lxmls.it.pt/2014/?page_id=5
INTRODUCTORY APPLIED MACHINE LEARNING by
Victor Lavrenko and Nigel Goddard, University of Edinburgh,
2011
The goal of this course is to introduce students to basic algorithms for learning from examples,
focusing on classification and clustering problems. This is a level 9 course intended for MSc
students and 3rd year undergraduates.
http://www.inf.ed.ac.uk/teaching/courses/iaml/
Data Mining and Machine Learning Course Material by
Bamshad Mobasher, DePaul University, Fall 2014
COURSE DESCRIPTION
The course will focus on the implementations of various data mining and machine learning
techniques and their applications in various domains. The primary tools used in the class are the
Python programming language and several associated libraries. Additional open source machine
learning and data mining tools may also be used as part of the class material and assignments.
Students will develop hands-on experience developing supervised and unsupervised machine
learning algorithms and will learn how to employ these techniques in the context of popular
applications such as automatic classification, recommender systems, searching and ranking, text
mining, group and community discovery, and social media analytics.
http://facweb.cs.depaul.edu/mobasher/classes/CSC478/lecture.html
Intelligent Information Retrieval by Bamshad Mobasher, DePaul University, Winter 2015
COURSE DESCRIPTION
This course will examine the design, implementation, and evaluation of information retrieval
systems, such as Web search engines, as well as new and emerging technologies to build the next
generation of intelligent and personalized search tools and Web information systems. We will
focus on the underlying retrieval models, algorithms, and system implementations, such as vector-space and probabilistic retrieval models, as well as the PageRank algorithm used by Google. We
will also study more advanced topics in intelligent information retrieval and filtering, particularly
on the World Wide Web, including techniques for document categorization, automatic concept
discovery, recommender systems, discovery and analysis of online communities and social
networks, and personalized search. Throughout the course, current literature from the viewpoints
of both research and practical retrieval technologies both on and off the World Wide Web will be
examined.
http://facweb.cs.depaul.edu/mobasher/classes/csc575/lecture.html
Student Dave Youtube Channel
https://www.youtube.com/user/TheScienceguy3000/videos?spfreload=10
Current Courses of Justin E. Esarey, RICE University
Current Courses
POLS 395: Introduction to Statistics [syllabus]
POLS 500: Social Scientific Thinking I (PhD) [syllabus]
POLS 505: Advanced MLE: Analyzing Categorical and Longitudinal Data [syllabus]
POLS 506: Bayesian Statistics (PhD) [syllabus]
Lecture 0: Introduction to R [webcast lecture] [R script]
Lecture 1: Basic Concepts of Bayesian Inference [webcast lecture][R script][notebook]
Lecture 2: Simple Bayesian Models
Lecture 3: Basic Monte Carlo Procedures and Sampling Algorithms
Lecture 4: The Metropolis-Hastings Algorithm and the Gibbs Sampler
Lecture 5: Practical MCMC for Estimating Models
Lecture 6: Bayesian Hierarchical Models and GLMs
Lecture 7: Fitting Hierarchical Models with BUGS
Lecture 8: Item Response Theory and the Scaling of Latent Dimensions
Lecture 9: Model Checking, Validation, and Comparison
Lecture 10: Missing Data Imputation
Lecture 11: Multilevel Regression and Poststratification
Lecture 12: Bayesian Spatial Autoregressive Models
POLS 507: Nonparametric Models and Machine Learning (PhD) [syllabus]
Lecture 1: Introduction to Nonparametric Statistics [webcast lecture] [R script] [notebook]
Lecture 2: Nonparametric Uncertainty Estimation and Bootstrapping [webcast lecture] [R
script] [notebook]
Lecture 3: Ensemble Models and Bayesian Model Averaging [webcast lecture] [R script]
[notebook]
Lecture 4: "Causal Inference" and Matching [webcast lecture] [R script] [notebook]
Lecture 5: Instrumental Variable Models [webcast lecture] [R script] [notebook]
Lecture 6: Bayesian Networks and Causality [webcast lecture] [R script] [notebook]
Lecture 7: Assessing Fit in Discrete Choice Models [webcast lecture] [R script] [notebook]
Lecture 8: Identifying and Measuring Latent Variables [webcast lecture] [R script] [notebook]
Lecture 9: Neural Networks [webcast lecture] [R script] [notebook]
Lecture 10: Classification and Regression Trees [webcast lecture] [R script] [notebook]
http://jee3.web.rice.edu/teaching.htm
From Bytes to Bites: How Data Science Might Help Feed the
World by David Lobell, Stanford University
This seminar features leading industrial and academic experts on big data analytics, information
management, data mining, machine learning, and large-scale data processing.
http://i.stanford.edu/infoseminar/lobell.html
Conference on Empirical Methods in Natural Language
Processing (and forerunners) (EMNLP)
(Free access to all publications)
The ACL Anthology currently hosts 33921 papers on the study of computational linguistics and
natural language processing. Subscribe to the mailing list to receive announcements and updates
to the Anthology.
http://aclanthology.info/venues/emnlp
emnlp acl's Youtube Channel
https://www.youtube.com/channel/UCZC4e4nrTjVqkW3Gcl16WoA/videos?spfreload=10
Columbia University's Laboratory for Intelligent Imaging and
Neural Computing (LIINC)
Columbia University's Laboratory for Intelligent Imaging and Neural Computing (LIINC) was
founded in September 2000 by Paul Sajda. The mission of LIINC is to use principles of
reverse "neuro"-engineering to characterize the cortical networks underlying perceptual and
cognitive processes, such as rapid decision making, in the human brain. Our laboratory pursues
both basic and applied neuroscience research projects, with emphasis in the following: ...
http://liinc.bme.columbia.edu/mainTemplate.htm?liinc_projects.htm
Enabling Brain-Computer Interfaces for Labeling Our
Environment by Paul Sajda
NYC Machine Learning Meetup 1/15/15
Paul Sajda from Columbia University presenting "Neural Correlates of the "Aha" Moment:
Enabling Brain-Computer Interfaces for Labeling Our Environment"
https://www.youtube.com/watch?v=weNqauwatBs
The Unreasonable Effectiveness of Deep Learning by Yann
LeCun, Sept 2014
http://videolectures.net/sahd2014_lecun_deep_learning/
Machine Learning by Prof. Shai Ben-David, University of
Waterloo, Lecture 1-3, Jan 2015
https://www.youtube.com/watch?v=iN8des41d94&spfreload=10
https://www.youtube.com/watch?v=rOcjShZbCFo&spfreload=10
https://www.youtube.com/watch?v=MYbt63PPP8o&spfreload=10
https://www.youtube.com/watch?v=jEIIkhESDac&spfreload=10
Computer Vision by Richard E. Turner, Slides, Exercises &
Solutions, University of Cambridge
http://cbl.eng.cam.ac.uk/Public/Turner/Teaching
Probability and Statistics by Carl Edward Rasmussen, Slides,
University of Cambridge
http://mlg.eng.cam.ac.uk/teaching/1BP7/1415/
Machine Learning by Carl Edward Rasmussen, Slides,
University of Cambridge
http://mlg.eng.cam.ac.uk/teaching/4f13/1314/
Seth Grimes's videos
Sentiment Analysis Symposium
http://vimeo.com/sethgrimes/videos
Introduction to Reinforcement Learning by Shane Conway, Nov
2014
Machine learning is often divided into three categories: supervised, unsupervised, and
reinforcement learning. Reinforcement learning concerns problems with sequences of decisions
(where each decision affects subsequent opportunities), in which the effects can be uncertain, and
with potentially long-term goals. It has achieved immense success in various different fields,
especially AI/Robotics and Operations Research, by providing a framework for learning from
interactions with an environment and feedback in the form of rewards and penalties.
Shane Conway, researcher at Kepos Capital, gives a general overview of reinforcement learning,
covering how to solve cases where there is uncertainty both in actions and states, as well as where
the state space is very large.
https://www.hakkalabs.co/articles/introduction-reinforcement-learning#!
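The reward-driven sequential learning Conway describes can be sketched with tabular Q-learning on a tiny chain world. This toy environment and its parameters are invented for illustration, not taken from the talk:

```python
import random

# Tabular Q-learning on a toy 5-state chain: stepping right from state 4
# reaches the goal (reward 1); all other steps give reward 0.
random.seed(0)
N_STATES = 5
ACTIONS = (-1, 1)                        # step left / step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.3    # step size, discount, exploration

for _ in range(500):
    s = 0
    while s < N_STATES:                  # s == N_STATES is the terminal goal
        if random.random() < epsilon:    # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:                            # greedy, with random tie-breaking
            a = max(ACTIONS, key=lambda act: (Q[(s, act)], random.random()))
        s_next = max(0, s + a)
        reward = 1.0 if s_next == N_STATES else 0.0
        future = 0.0 if s_next == N_STATES else max(Q[(s_next, b)] for b in ACTIONS)
        # Q-learning update: bootstrap from the best next action
        Q[(s, a)] += alpha * (reward + gamma * future - Q[(s, a)])
        s = s_next

# The learned greedy policy should step right (+1) in every state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
print(policy)
```

Each decision affects subsequent opportunities (moving left delays the goal), and the agent learns purely from the delayed reward signal — the two features of reinforcement learning highlighted in the talk description.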
Machine Learning and Data Mining by Prof. Dr. Volker Tresp,
2014, LMU
The lecture is given in English.
http://www.dbs.ifi.lmu.de/cms/Maschinelles_Lernen_und_Data_Mining
Applied Machine Learning by Joelle Pineau, Fall 2014, McGill
University
http://cs.mcgill.ca/~jpineau/comp598/schedule.html
Analyzing data from the city of Montreal
We gave the following instructions to our students. Here's what they came up with.
There is a significant effort towards moving much of the data from the city of Montreal into an
Open Data format. This data can be accessed here:
http://donnees.ville.montreal.qc.ca/
http://donnees.ville.montreal.qc.ca/english-version-of-the-portail-des-donnees-ouvertes-de-la-ville-de-montreal/
The goal of this project is to use this data to identify an interesting prediction question that can
be tackled using machine learning methods, and solve the problem using appropriate machine
learning algorithms and methodology. You are not restricted to using only this data (though you
should use some of it). You can incorporate data from other sources, or collect additional data
(e.g. new test set) if appropriate. The choice of prediction task and dataset to use is open. Try to
pick a prediction question that is relevant and important to the citizens or administrators of the
city. Remember to design a prediction task that is well suited to your choice of dataset; and vice
versa, pick the right data for tackling your prediction question.
http://rl.cs.mcgill.ca/comp598/fall2014/
Artificial Intelligence by Joelle Pineau, Winter 2014-2015, McGill University
http://www.cs.mcgill.ca/~jpineau/comp424/schedule.html
Talking Machines: The History of Machine Learning from the
Inside Out
In episode five of Talking Machines, we hear the first part of our conversation with Geoffrey
Hinton (Google and University of Toronto), Yoshua Bengio (University of Montreal) and Yann
LeCun (Facebook and NYU). Ryan introduces us to the ideas in tensor factorization methods for
learning latent variable models (which is both a tongue twister and one of the new tools in
ML). To find out more on the topic, the paper Tensor decompositions for learning latent variable
models is a good place to start. You can also take a look at the work of Daniel Hsu, Animashree
Anandkumar and Sham M. Kakade. Plus, we take a listener question about just where statistics
stops and machine learning begins.
http://www.thetalkingmachines.com/blog/2015/2/26/the-history-of-machine-learning-from-the-inside-out
The Simons Institute for the Theory of Computing
About
The Simons Institute for the Theory of Computing is an exciting new venue for collaborative
research in theoretical computer science. Established on July 1, 2012 with a grant of $60 million
from the Simons Foundation, the Institute is housed in Calvin Lab, a dedicated building on the
UC Berkeley campus. Its goal is to bring together the world's leading researchers in theoretical
computer science and related fields, as well as the next generation of outstanding young scholars,
to explore deep unsolved problems about the nature and limits of computation.
Open Lectures
The Simons Institute Open Lectures are aimed at a broad scientific audience. Upcoming lectures
can be viewed on the list of Other Events. To view the video of a past lecture, please follow the
link in the list below.
http://simons.berkeley.edu/events/openlectures
DIKU - Datalogisk Institut, Københavns Universitet Youtube
Channel
https://www.youtube.com/channel/UCo1j8XjbD3B0UVjP0OTU3ZA?spfreload=10
Hashing in machine learning by John Langford, Microsoft
Research
Video of the lecture from the 2014 Summer School on Hashing: Theory and Applications, July
14-17, 2014, University of Copenhagen, Denmark.
https://www.youtube.com/watch?v=BItoTJDupgM&spfreload=10
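The hashing trick Langford discusses maps feature names to a fixed number of indices with a hash function, so the learner never needs to build a feature dictionary. A minimal sketch (illustrative only — this is not Vowpal Wabbit's actual implementation):

```python
import zlib

# Minimal feature hashing ("hashing trick"): each feature name is hashed
# into one of n_bins slots, with one hash bit choosing the sign so that
# collisions cancel in expectation. crc32 is used because Python's
# built-in hash() is randomized per process.
def hash_features(tokens, n_bins=16):
    vec = [0.0] * n_bins
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))
        index = h % n_bins
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0
        vec[index] += sign
    return vec

v = hash_features("the quick brown fox".split())
print(v)  # a fixed-length vector, regardless of vocabulary size
```

The memory footprint is fixed in advance by `n_bins`, which is what makes the trick attractive for large-scale learning: new features never grow the model.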
Dimensionality reductions by Alexander Andoni, Microsoft
Research
Video of the lecture from the 2014 Summer School on Hashing: Theory and Applications, July
14-17, 2014, University of Copenhagen, Denmark.
https://www.youtube.com/watch?v=uLVMv9HFqIk&spfreload=10
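The simplest dimensionality reduction of the kind Andoni covers is a random Gaussian projection in the Johnson-Lindenstrauss style: project to k dimensions with a random matrix scaled by 1/sqrt(k), and pairwise distances are approximately preserved. A small illustrative sketch, not the lecture's code:

```python
import math
import random

# Johnson-Lindenstrauss-style random projection: map d-dimensional points
# to k dimensions with a random Gaussian matrix scaled by 1/sqrt(k).
random.seed(1)
d, k = 1000, 200
R = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(d)] for _ in range(k)]

def project(x):
    return [sum(r_i * x_i for r_i, x_i in zip(row, x)) for row in R]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

x = [random.gauss(0, 1) for _ in range(d)]
y = [random.gauss(0, 1) for _ in range(d)]
# The two distances should agree to within roughly sqrt(1/(2k)) ~ 5%.
print(dist(x, y), dist(project(x), project(y)))
```

Notably, the projection matrix is data-independent: the same R preserves distances for any fixed point set with high probability, which is what makes these maps useful for hashing and nearest-neighbor search.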
RE.WORK Deep Learning Summit Videos, San Francisco 2015
https://www.youtube.com/playlist?list=PLnDbcXCpYZ8lCKExMs8k4PtIbani9ESX3
Machine Learning Tutorial, UNSW Australia
http://www.cse.unsw.edu.au/~cs9417ml/
Reinforcement Learning Tutorial by Tim Eden, Anthony Knittel and Raphael van Uffelen
http://www.cse.unsw.edu.au/~cs9417ml/RL1/index.html
Oxford's Podcast
About
This free site features public lectures, teaching material, interviews with leading academics,
information about applying to the University, and much more.
All the material is arranged within a series of related talks or lectures and may be in audio, video
or document format. A full list of all series is available.
Content is being added regularly to the site. All content is free for you to download and watch or
listen to. This site contains over 6,500 items arranged into 416 series. Over 4,780 academic
contributors have released material.
http://podcasts.ox.ac.uk
Natural Language Processing by Mohamed Alaa El-Dien Aly,
2014, KAUST
Information
This course covers basic Natural Language Processing concepts. Topics include: language
modeling, spelling correction, sentiment analysis, parsing, text classification, information
retrieval, ... etc. We will closely follow Coursera's two NLP classes: that by Jurafsky and Manning,
as well as that by Collins.
http://www.mohamedaly.info/teaching/cmp462-spring-2014-natural-language-processing
QUT - Queensland University of Technology, Brisbane,
Australia
https://moocs.qut.edu.au/users/sign_in
https://www.qut.edu.au/
QUT: Introduction to Robotics by Professor Peter Corke (you need to sign in)
Course Summary
This course is an introduction to the exciting world of robotics and the mathematics and
algorithms that underpin it. You will develop an understanding of the representation of pose and
motion, kinematics, dynamics and control. You will also be introduced to the variety of robots
and the diversity of tasks to which this knowledge and skills can be applied, the role of robots in
society, and associated ethical issues. If you have access to a LEGO Mindstorms robotics
development kit you will be able to build a simple robot arm and write the control software for it.
This course, combined with the Robotic Vision MOOC, is based on a 13-week undergraduate
course, Introduction to Robotics, at the Queensland University of Technology.
QUT: Robotic Vision by Professor Peter Corke (you need to sign in)
Course Summary
Robotic Vision introduces you to the field of computer vision and the mathematics and
algorithms that underpin it. You'll learn how to interpret images to determine the color, size,
shape and position of objects in the scene. We'll work with you to build an intelligent vision
system that can recognise objects of different colours and shapes.
Data & Society
Data & Society is an NYC-based think/do tank focused on social, cultural, and ethical issues
arising from data-centric technological development.
Data & Society is an independent nonprofit 501(c)3 research institute. Its creation is supported by
a generous gift from Microsoft.
http://www.datasociety.net
Open Book for people with autism
Open Book is a new interactive tool that will assist people with autism to transform written
information into a format that is easier for them to read and understand. This program has been
developed by the FIRST project.
The program is primarily aimed at people with autism who have IQ levels of 70 and above.
Open Book is now available online in English, Spanish and Bulgarian.
This project is partially funded by the European Commission under the Seventh Framework
Programme for Research and Technological Development (FP7-2007-2013).
http://www.first-asd.eu/openbook-video
NUMDAM: search and download of digitized mathematics journal archives
http://www.numdam.org/?lang=en
Project Euclid, mathematics and statistics online
http://projecteuclid.org
Statistical Modeling: The Two Cultures by Leo Breiman, 2001
http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
mini-DML
< The goal of the project >
To collate in one place basic bibliographical data for any kind of mathematical digital article and
make them accessible to the users through simple search or metadata retrieval.
< The collections >
A proof-of-concept implementation is presented, based on a variety of sources of mathematical
texts. The main emphasis is on long-run journals whose early production is widely unknown to
MathSciNet / Jahrbuch Zentralblatt-MATH, with special interest towards current production
and preprints on the other end.
NUMDAM journals (currently 18) and seminars (currently 21);
CEDRAM journals (current issues) (4);
One Gallica journal : Journal de mathématiques pures & appliquées (a.k.a. Liouville) up to 1880;
Gallica Complete works (Abel, Cauchy, Dirichlet, Fourier, Jacobi, Klein, Lagrange, Laguerre,
Laplace, Möbius, Riemann);
Project Euclid journals (Duke, Adv. in Appl. Probab., etc.);
Math part of ArXiv.
Journals from ICM (Bibliotheca Wirtualna Matematyki)
http://minidml.mathdoc.fr
MISCELLANEOUS
The Automatic Statistician project
About the Automatic Statistician project
Making sense of data is one of the great challenges of the information age we live in. While it is
becoming easier to collect and store all kinds of data, from personal medical data, to scientific
data, to public data, and commercial data, there are relatively few people trained in the statistical
and machine learning methods required to test hypotheses, make predictions, and otherwise
create interpretable knowledge from this data. The Automatic Statistician project aims to build
an artificial intelligence for data science, helping people make sense of their data.
The current version of the Automatic Statistician is a system which explores an open-ended
space of possible statistical models to discover a good explanation of the data, and then produces
a detailed report with figures and natural-language text. While at Cambridge, James Lloyd,
David Duvenaud and Zoubin Ghahramani, in collaboration with Roger Grosse and Joshua
Tenenbaum at MIT, developed an early version of this system which not only automatically
produces a 10-15 page report describing patterns discovered in data, but returns a statistical
model with state-of-the-art extrapolation performance evaluated over real time series data sets
from various domains. The system is based on reasoning over an open-ended language of
nonparametric models using Bayesian inference.
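The model-search idea can be caricatured in a few lines: fit candidate models of increasing complexity on the same data and score them with a criterion that trades goodness of fit against complexity. Here BIC over two hand-coded models stands in for the project's full Bayesian inference over an open-ended model space; the data and models are invented for illustration:

```python
import math

# A crude caricature of automatic model selection: score a constant model
# and a linear-trend model on the same data with BIC, which penalizes
# extra parameters. The real Automatic Statistician searches a far richer
# space of nonparametric models with full Bayesian inference.
data = [(x, 2.0 * x + 1.0 + 0.1 * math.sin(7 * x))
        for x in [i / 10 for i in range(20)]]  # nearly linear synthetic data

def bic(residuals, n_params):
    n = len(residuals)
    rss = sum(r * r for r in residuals) or 1e-12
    return n * math.log(rss / n) + n_params * math.log(n)

def fit_constant(pts):
    mean = sum(y for _, y in pts) / len(pts)
    return [y - mean for _, y in pts], 1

def fit_linear(pts):
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - slope * mx
    return [y - (slope * x + intercept) for x, y in pts], 2

scores = {name: bic(*fit(data)) for name, fit in
          [("constant", fit_constant), ("linear", fit_linear)]}
best = min(scores, key=scores.get)
print(best, scores)
```

The system described above goes much further — composing kernels, comparing marginal likelihoods, and writing the winning explanation up in natural language — but the select-by-penalized-fit loop is the common skeleton.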
Kevin P. Murphy, Senior Research Scientist at Google says: "In recent years, machine learning
has made tremendous progress in developing models that can accurately predict future data.
However, there are still several obstacles in the way of its more widespread use in the data
sciences. The first problem is that current Machine Learning (ML) methods still require
considerable human expertise in devising appropriate features and models. The second problem
is that the output of current methods, while accurate, is often hard to understand, which makes it
hard to trust. The "automatic statistician" project from Cambridge aims to address both
problems, by using Bayesian model selection strategies to automatically choose good models /
features, and to interpret the resulting fit in easy-to-understand ways, in terms of human
readable, automatically generated reports. This is a very promising direction for ML research,
which is likely to find many applications at Google and beyond."
The project has only just begun but we're excited for its future. Check out our example analyses
to get a feel for what our work is about.
http://www.automaticstatistician.com/about.php
A selection of Youtube's featured channels
This channel was generated automatically by YouTube's video discovery system.
Channels auto generated by YouTube are channels created by algorithms to collect trending and
popular videos by topic. Auto generated channels act like user channels in that you can subscribe
to them and stay updated on new videos.
Machine Learning - Topic
https://www.youtube.com/channel/UCZsleorsr6rdZGfj1uwII2g?spfreload=10
Cluster analysis - Topic
https://www.youtube.com/channel/UCnw0vt_f06vjKe4v0WzMXWQ?spfreload=10
Regression analysis - Topic
https://www.youtube.com/channel/UCOJOKlW_JtDxuIgOBRtO2sg?spfreload=10
Principal component analysis - Topic
https://www.youtube.com/channel/UC8pfZWXk12lvZMCde04_4Og?spfreload=10
Support vector machine - Topic
https://www.youtube.com/channel/UCzBnlLuYsm-f16t9RUClCaQ?spfreload=10
Artificial neural network - Topic
https://www.youtube.com/channel/UC9Mv-haow40iOxAduUUMrwA?spfreload=10
Bayes' theorem - Topic
https://www.youtube.com/channel/UCh8Hk7vMxhTSOAY2nUtkKNg?spfreload=10
Genetic algorithm - Topic
https://www.youtube.com/channel/UC3ykWul05jzoN3nbdJv9Ghg?spfreload=10
Data Mining - Topic
https://www.youtube.com/channel/UCO_gxCgk006uEqVTNhrV-sw?spfreload=10
Statistical classification - Topic
https://www.youtube.com/channel/UCV7kd6QmwVf6J01y9f6-duA?spfreload=10
Computer vision - Topic
https://www.youtube.com/channel/UCWA02whr4rPRQwrat0NlR2Q?spfreload=10
Introduction To Modern Brain-Computer Interface Design by
Swartz Center for Computational Neuroscience
This is an online course on Brain-Computer Interface (BCI) design with a focus on modern
methods. The lectures were first given by Christian Kothe (SCCN/UCSD) in 2012 at University
of Osnabrueck within the Cognitive Science curriculum and have now been recorded in the
form of an open online course.
The course includes basics of EEG, BCI, signal processing, machine learning, and also contains
tutorials on using BCILAB and the lab streaming layer software.
http://sccn.ucsd.edu/wiki/Introduction_To_Modern_Brain-Computer_Interface_Design
Distributed Computing Courses (lectures, exercises with
solutions) by ETH Zurich, Group of Prof. Roger Wattenhofer
Mission
We are interested in both theory and practice of computer science and information technology.
In our group we cultivate a large breadth of areas, reflecting our different backgrounds in
computer science, mathematics, and electrical engineering. This gives us a unique blend of basic
and applied research, proving mathematical theorems on the one hand, and building practical
systems on the other.
We currently study the following topics: Distributed computing (computability, locality,
complexity), distributed systems (Bitcoin), wireline networks (software defined networks), wireless
networks (media access theory and practice), social networks (influence), algorithms (online
algorithms, game theory), learning theory (recommendation theory and practice). We regularly
publish in different communities: distributed computing (e.g. PODC, SPAA, DISC), networking
(e.g. SIGCOMM, MobiCom, SenSys), theory (e.g. STOC, FOCS, SODA, ICALP), and from
time to time at random in areas such as machine learning or human computer interaction.
Members of our group have won several best paper awards at top conferences such as PODC,
SPAA, DISC, MobiCom, or P2P. Roger Wattenhofer has won the Prize for Innovations in
Distributed Computing in 2012, for “extensive contributions to the study of distributed
approximation”. Some projects turned into startup companies, e.g. Wuala, StreamForge,
BitSplitters. Several projects have been covered by popular media and blogs, e.g. Gizmodo,
Lifehacker, New York Times, NZZ, PC World Magazine, Red Herring, or Technology Review.
Some of the software developed by our students is very popular: The music application Jukefox
and the peer-to-peer client BitThief have together more than 1 million downloads. A branch of
the United States FBI has requested to use a version of BitThief as a tool to uncover illegal
activities. About half of the former PhD students are in academic positions, some others founded
startup companies.
http://dcg.ethz.ch/courses.html
The wonderful and terrifying implications of computers that can
learn | Jeremy Howard | TEDxBrussels
Published on 6 Dec 2014
This talk was given at a local TEDx event, produced independently of the TED Conferences.
The extraordinary, wonderful, and terrifying implications of computers that can learn
https://www.youtube.com/watch?v=xx310zM3tLs&spfreload=10
Partially Derivative, a podcast about data, data science, and
awesomeness!
Partially Derivative is a show about data, data science, drinking, and awesomeness! We cover our
top 10 data-related articles and blog posts from the past week, all in 30 minutes or sometimes
longer, depending on how much we've been drinking. The show is hosted by Jonathon Morgan, a
startup CTO, and Dr. Chris Albon, a computational political scientist.
http://www.partiallyderivative.com
Class Central
MOOC Tracker
Never miss a course
https://www.class-central.com
Beginning to Advanced University CS Courses
Awesome Courses
Introduction
There is a lot of hidden treasure lying within university pages scattered across the internet. This
list is an attempt to bring to light those awesome courses which make their high-quality material
(assignments, lectures, notes, readings & examinations) available online for free.
https://github.com/prakhar1989/awesome-courses
WIRED UK Youtube Channel
https://www.youtube.com/user/WiredVideoUK/videos?spfreload=10
AI at WIRED2014: The next big frontier is the mind and brain - Full WIRED2014 talk
"When we were kids, we felt like the space age was imminent," says Google machine learning
expert Blaise Aguera y Arcas. "But in a funny way, the big frontier for our generation is the mind,
the brain -- these inward spaces" - Full WIRED 2014 talk
The engineer, who was the architect of Bing Maps, was joined on stage at WIRED2014 by
DeepMind Technologies founder Demis Hassabis and Ben Medlock, CTO of Swiftkey.
WIRED2014 was the fourth annual event to bring the values of WIRED to life. Building on
experience from groundbreaking previous events, WIRED2014 gathered pioneering speakers
from around the world to stimulate debate, spread ideas and showcase the future in a
multidisciplinary way.
https://www.youtube.com/watch?v=CUhflgWvvoo
Davos 2015 - A Brave New World - How will advances in
artificial intelligence, smart sensors and social technology change
our lives?
• Rodney Brooks, Founder, Chairman and Chief Technical Officer, Rethink Robotics, USA;
Technology Pioneer
• Anthony Goldbloom, Founder and Chief Executive Officer, Kaggle, USA; Technology Pioneer
• Hiroaki Nakanishi, Chairman and Chief Executive Officer, Hitachi, Japan
• Kenneth Roth, Executive Director, Human Rights Watch, USA
• Stuart Russell, Professor, University of California, Berkeley, USA; Global Agenda Council on
Artificial Intelligence & Robotics
Moderated by
• Hiroko Kuniya, Anchor and Presenter, Today's Close-Up, NHK (Japan Broadcasting
Corporation), Japan; Global Agenda Council on Japan
https://www.youtube.com/watch?v=wGLJXO08IYo&spfreload=10
http://www.weforum.org
World Economic Forum
The World Economic Forum is an international institution committed to improving the state of
the world through public-private cooperation in the spirit of global citizenship. It engages with
business, political, academic and other leaders of society to shape global, regional and industry
agendas.
Incorporated as a not-for-profit foundation in 1971 and headquartered in Geneva, Switzerland,
the Forum is independent, impartial and not tied to any interests. It cooperates closely with all
leading international organizations.
Best known for its Annual Meeting in Davos, Switzerland, the World Economic Forum also
publishes benchmark global reports on Competitiveness, Gender, and Risk.
https://www.youtube.com/user/WorldEconomicForum/search?query=machine+learning
The Global Gender Gap Report
The Global Gender Gap Report, published by the World Economic Forum, provides a
framework for capturing the magnitude and scope of gender-based disparities around the world.
https://www.youtube.com/channel/UCw-kH-Od73XDAt7qtH9uBYA?spfreload=10
Technology Pioneer 2014⎪Anthony Goldbloom⎪Kaggle
https://www.youtube.com/watch?v=OShGuf7QeJY&spfreload=10
IdeasLab 2014 - Emma Brunskill - Closing the Skills Gap with Machine Learning
https://www.youtube.com/watch?v=oZVSp1YS4jQ
IdeasLab 2014 - Ian Goldin - The Future of Machine Intelligence
https://www.youtube.com/watch?v=0fWYnv2gUWI&spfreload=10
IdeasLab 2014 - Michael Altendorf - The Truth of Machine Learning
https://www.youtube.com/watch?v=JJBb-78gofY&spfreload=10
The LINCS project
LINCS aims to create a network-based understanding of biology by cataloging changes in gene
expression and other cellular processes that occur when cells are exposed to a variety of
perturbing agents, and by using computational tools to integrate this diverse information into a
comprehensive view of normal and disease states that can be applied for the development of new
biomarkers and therapeutics. By generating and making public data that indicates how cells
respond to various genetic and environmental stressors, the LINCS project will help us gain a
more detailed understanding of cell pathways and aid efforts to develop therapies that might
restore perturbed pathways and networks to their normal states.
This website is a source of information for the research community and general public about the
LINCS project. It contains information about the experiments conducted, as well as links to
participating LINCS centers’ websites, data releases from LINCS centers, and tools that can be
used for analyzing the data.
http://www.lincsproject.org
Australian Academy of Science
Official YouTube channel of the Australian Academy of Science, an independent organisation
representing Australia's leading scientists. It recognises excellence, advises government and
promotes science education and public awareness of science.
https://www.youtube.com/user/ScienceAcademyAu/videos?spfreload=10
Artificial intelligence: Machines on the rise
About the talk
Speaking, natural-sounding machines which can interact with humans using normal
conversational patterns are still in the realm of science fiction. Or are they?
Associate Professor James Curran is developing artificial intelligence which will revolutionise the
way we interact with technology - using spoken language, the same way we interact with each
other. Using computational linguistics, an area of artificial intelligence, he’s building computer
systems that can understand and communicate with us in our own natural languages. These
systems will be able to navigate, manipulate and summarise knowledge, unlocking vast stores of
language-based human knowledge on the web and beyond.
https://www.youtube.com/watch?v=HwdmesBcbaw&spfreload=10
Bill Gates Q&A on Reddit
Hi Reddit, I’m Bill Gates and I’m back for my third AMA. Ask me anything.
https://www.reddit.com/r/IAmA/comments/2tzjp7/hi_reddit_im_bill_gates_and_im_back_for_my_third/
The Guardian: Artificial intelligence will become strong enough to be a concern, says Bill Gates
Former Microsoft boss joins Elon Musk and Stephen Hawking in suggesting that the march of AI
could be an existential threat to humans
http://www.theguardian.com/technology/2015/jan/29/artificial-intelligence-strong-concern-bill-gates
Second Prize went to Yarin Gal for his extrapolated art image,
Cambridge University Engineering Photo Competition
The PhD student extended Van Gogh's Starry Night using algorithms to see what might have
happened if the artist had carried on painting.
http://www.telegraph.co.uk/technology/11228471/In-Pictures-Cambridge-University-Engineering-Photo-Competition-Winners.html?frame=3105270
Draw from a Deep Gaussian Process by David Duvenaud, Cambridge University Engineering Photo Competition
http://www.telegraph.co.uk/technology/11228471/In-Pictures-Cambridge-University-Engineering-Photo-Competition-Winners.html?frame=3105356
MOOC, Opencourseware in
Spanish
I hope to find resources soon. Any suggestion is welcome! Thanks in advance!
Jacqueline
MOOC, Opencourseware in
German
I hope to find resources soon. Any suggestion is welcome! Thanks in advance!
Jacqueline
MOOC, Opencourseware in
Italian
I hope to find resources soon. Any suggestion is welcome! Thanks in advance!
Jacqueline
MOOC, Opencourseware in
French
France Universite Numerique (FUN)
https://www.france-universite-numerique-mooc.fr
Unlike its Anglo-Saxon counterparts, FUN prohibits access to its course archives. Some of the
course links below may therefore quickly become obsolete. Sorry for this problem. Jacqueline
FUN: MinesTelecom: 04006 Fondamentaux pour le Big Data
This MOOC is aimed at an audience with a background in mathematics and algorithmics
(completed L2, i.e. second-year undergraduate, level) who need a refresher of that knowledge in
order to follow courses in data science and big data. It can be taken in preparation for the
Mastère Spécialisé « Big data : Gestion et analyse des données massives », the Certificat d'Etudes
Spécialisées « Data Scientist », and the short course « Data Science : Introduction au Machine Learning ».
https://www.france-universite-numerique-mooc.fr/courses/MinesTelecom/04006/
Trimestre_1_2015/about
University of Laval (French Canadian)
Open access to the course material
Apprentissage automatique (Machine Learning)
Machine learning from data and supervised learning. Empirical risk minimization and structural
risk minimization. Methods for estimating the true risk from data, and confidence intervals.
Linear and non-linear classifiers. Dual form of the perceptron algorithm. Mercer kernels.
Large-margin classifiers.
Hard-margin and soft-margin SVMs. Probably approximately correct (PAC) learning and the
Vapnik-Chervonenkis theory of classifier prediction error. Sample-compression learning and its
applications to SCMs and perceptrons.
https://cours.ift.ulaval.ca/2009a/ift7002_81602/
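As a small illustration of one topic on this syllabus, the dual form of the perceptron with a Mercer kernel, here is a minimal sketch (my own toy example, not course material):

```python
import numpy as np

def kernel_perceptron(X, y, kernel, epochs=10):
    """Dual-form perceptron: the weight vector is never formed explicitly.
    We keep one coefficient alpha_i per training example and predict with
    sign(sum_i alpha_i * y_i * k(x_i, x)), so any Mercer kernel k may be used."""
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[kernel(a, b) for b in X] for a in X])  # Gram matrix
    for _ in range(epochs):
        for i in range(n):
            if y[i] * ((alpha * y) @ K[:, i]) <= 0:  # mistake (or tie) on x_i
                alpha[i] += 1.0                      # mistake-driven update
    def predict(x):
        return np.sign(sum(a * yi * kernel(xi, x)
                           for a, yi, xi in zip(alpha, y, X) if a > 0))
    return predict

# XOR-like data: not linearly separable, but separable with an RBF kernel.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
rbf = lambda a, b: np.exp(-np.linalg.norm(a - b) ** 2)
f = kernel_perceptron(X, y, rbf)
print([f(x) for x in X])  # → [-1.0, 1.0, 1.0, -1.0]
```

Swapping `rbf` for a plain dot product recovers the ordinary perceptron, which cannot separate this data.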
Théorie algorithmique des graphes (Algorithmic Graph Theory)
This course covers topics such as connectivity in a graph (maximum-flow problems, min-max
duality, perfect matching, etc.), graph planarity (Euler's formula, Kuratowski's theorem, the dual
graph), graph colouring (integer and fractional colourings of vertices or edges, Kneser graphs),
transversal problems (Eulerian walks, Hamiltonian cycles, De Bruijn graphs, etc.) and random
walks on graphs (Markov chains, existence of the limiting distribution, mixing time, etc.).
Several graph problems have elegant solutions; others, of course, are NP-complete, so part of
the course covers complexity theory (NP and NP-complete problems, Cook's theorem, reduction
algorithms).
https://cours.ift.ulaval.ca/2012a/ift7012_89927/
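As a small illustration of the last topic, random walks on graphs and their limiting distribution, here is a toy sketch (my own example, not course material):

```python
import numpy as np

# Random walk on an undirected graph: from each vertex, step to a uniformly
# chosen neighbour. For a connected, non-bipartite graph the distribution of
# the walk converges to pi(v) = deg(v) / (2 * |E|), regardless of the start.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # a triangle plus a pendant vertex

n = len(adj)
P = np.zeros((n, n))                   # transition matrix of the Markov chain
for u, nbrs in adj.items():
    for v in nbrs:
        P[u, v] = 1.0 / len(nbrs)

dist = np.array([1.0, 0.0, 0.0, 0.0])  # start deterministically at vertex 0
for _ in range(1000):                  # iterate far past the mixing time
    dist = dist @ P

# Degrees are (2, 2, 3, 1) and |E| = 4, so pi = (2, 2, 3, 1) / 8.
print(dist.round(3))  # components approach [0.25, 0.25, 0.375, 0.125]
```

The triangle guarantees an odd cycle, so the chain is aperiodic and the limit exists; on a bipartite graph the iteration above would oscillate instead of converging.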
Hugo Larochelle, Apprentissage automatique, French Canadian
I am interested in machine learning algorithms, that is, algorithms capable of extracting
concepts or patterns from data. My work focuses on developing connectionist and probabilistic
approaches to various artificial intelligence problems, such as computer vision and natural
language processing.
My research interests include:
Problems: supervised, semi-supervised and unsupervised learning, structured output prediction,
ranking, density estimation;
Models: deep neural networks (deep learning), autoencoders, Boltzmann machines, Markov
random fields;
Applications: object recognition and tracking, document classification and ranking.
https://www.youtube.com/user/hugolarochelle?spfreload=10
http://www.dmi.usherb.ca/~larocheh/index_fr.html
The Machine Learning Salon rarely gives its opinion, but Hugo Larochelle's videos are truly
excellent! Congratulations and many thanks to Hugo Larochelle!
Francis Bach, Ecole Normale Superieure - Courses and
Exercises with solutions (English-French)
Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite
Paris-Sud (Orsay)
Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite
Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure
(Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure
(Paris)
Spring 2012: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite
Paris-Sud (Orsay)
Fall 2011: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2011: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite
Paris-Sud (Orsay)
Fall 2010: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2010: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite
Paris-Sud (Orsay)
Fall 2009: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2008: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
May 2008: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des
Mines de Paris
Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
May 2007: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des
Mines de Paris
Fall 2006: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2005: An introduction to graphical models - Master M2 "Mathematiques, Vision,
Apprentissage" - Ecole Normale Superieure de Cachan
http://www.di.ens.fr/~fbach/
College de France, Mathematics and Digital Science, French
One of the Collège de France's missions is to promote French research and thought abroad, and
to participate in intellectual debates on major world issues. The institution therefore participates
in international exchange through its teaching and the dissemination of knowledge, as well as
through the research programmes involving its Chairs and laboratories. The fact that one fifth of
the professors are currently from abroad confirms the Collège de France's widening research
and education policy.
This policy of international openness translates into:
• Collège de France professors' teaching missions abroad
• Lectures and lecture series by visiting professors
• Junior Visiting Researchers scheme
• Lecture series and symposia abroad
• Internet broadcasts
http://www.college-de-france.fr/site/alain-connes/index.htm
Le Laboratoire de Recherche en Informatique (LRI)
The Laboratoire de Recherche en Informatique (LRI) is a joint research unit (UMR8623) of
Université Paris-Sud and the CNRS.
The laboratory's research themes cover a broad, mainly software-oriented spectrum of computer
science and include both fundamental and applied aspects: algorithmics, combinatorics, graphs,
discrete and continuous optimization, programming, software engineering, verification and
proofs, parallelism, high-performance computing, grids, architecture and compilation, networks,
databases, knowledge representation and processing, machine learning, data mining,
bioinformatics, human-computer interaction, etc. This diversity is one of the laboratory's
strengths, as it fosters research at the frontiers between fields, where the potential for innovation
is greatest.
https://www.lri.fr
MOOC, Opencourseware in
Russian
Russian Machine Learning Resources
Google Translation from Russian:
A professional information and analysis resource dedicated to machine learning, pattern
recognition and data mining.
The resource contained 831 articles in Russian as of 16-07-2014.
Classification
Pattern recognition
Regression analysis
Analysis and understanding of images
Prediction
Processing and analysis of texts
Applied Statistics
Applied Systems Analysis Data
Signal Processing
All Destinations
http://www.machinelearning.ru/wiki/index.php?title=Заглавная_страница
The Yandex School of Data Analysis
The School of Data Analysis is a free Master's-level program in Computer Science and Data
Analysis, which Yandex has offered since 2007 to graduates in engineering, mathematics,
computer science or related fields. The aim of the School is to train specialists in data analysis
and information retrieval for further employment at Yandex or any other IT company.
…
The School’s courses are taught by Russian and international experts at Yandex’s Moscow office
in the evenings, several times a week. The average study load is 15-20 hours per week, including
9-12 hours of lectures and seminars. The School also runs distance-learning courses and provides
lectures over the internet. All courses at the Yandex School of Data Analysis are currently taught
only in Russian.
http://shad.yandex.ru/lectures/
Alexander D’yakonov Resources
http://alexanderdyakonov.narod.ru/index.htm
What They Don't Teach in Data Analysis and Machine Learning (2013)
Чему не учат в анализе данных и машинном обучении
http://alexanderdyakonov.narod.ru/lpot4emu.pdf
Introduction to Data Mining (2012)
Введение в анализ данных
http://alexanderdyakonov.narod.ru/intro2datamining.pdf
Tricks in Data Mining (2011)
Шаманство в анализе данных
http://alexanderdyakonov.narod.ru/lpotdyakonov.pdf
MOOC, Opencourseware in
Japanese
I hope to find resources soon. Any suggestion is welcome! Thanks in advance!
Jacqueline
MOOC, Opencourseware in
Chinese
Yeeyan Coursera Chinese Classroom
Google Translation from Chinese (Simplified Han) to English
Welcome to the Yeeyan × Coursera Chinese classroom, where you will always have classmates
studying alongside you. Here you can join collaborative translations, exchange ideas, sign up to
become a class representative, and check in regularly to keep one another motivated. Finally,
you are welcome to show off your certificates, whether a joint Yeeyan × Coursera translator's
certificate or a Coursera course certificate: you are the winner of your own life!
http://coursera.yeeyan.org
Hong Kong Open Source Conference 2013
Wang Leung Wong
The Vice-Chairperson of the Hong Kong Linux User Group
This channel will post the videos of my life and opensource events in Hong Kong.
Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f
Guokr.com
Machine Learning
http://mooc.guokr.com/search/?wd=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
Data Mining
http://mooc.guokr.com/search/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98
Artificial Intelligence
http://mooc.guokr.com/search/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD
MOOC, Opencourseware in
Portuguese
Aprendizado de Maquina by Bianca Zadrozni, Instituto de
Computação, UFF, 2010
http://www2.ic.uff.br/~bianca/aa/
Algoritmo de Aprendizado de Máquina by Aurora Trinidad
Ramirez Pozo, Universidade Federal do Paraná, UFPR
http://www.inf.ufpr.br/aurora/tutoriais/aprendizadomaq/
http://www.inf.ufpr.br/aurora/tutoriais/arvoresdecisao/
http://www.inf.ufpr.br/aurora/tutoriais/Ceapostila.pdf
http://www.inf.ufpr.br/aurora/
Digital Library, Universidad de Sao Paulo
http://www.teses.usp.br/index.php?option=com_jumi&fileid=20&Itemid=96&lang=en&cx=011662445380875560067%3Acack5lsxley&cof=FORID%3A11&hl=en&q=machine+learning&siteurl=www.teses.usp.br%2Findex.php%3Foption%3Dcom_jumi%26fileid%3D20%26Itemid%3D96%26lang%3Den&ref=www.teses.usp.br%2F&ss=5799j3321895j16
MOOC, Opencourseware in
Hebrew
Open University of Israel
The Open University is unique in Israel's academic landscape.
It resembles the other universities in its pursuit of excellence and its commitment to high
scholarly and scientific quality, but it differs from them in its organizational structure, its
teaching methods, its curriculum, and its requirements of the candidates who apply to enrol in
its courses.
The Open University lives up to its name: it opens its gates, without preconditions or
prerequisites, both to those who wish to take individual courses or clusters of courses and to
those who wish to study a full curriculum towards a bachelor's degree.
http://www.youtube.com/user/openofek/search?query=machine+learning
Homeworks, Assignments &
Solutions
CS229 Stanford Machine Learning List of projects (free access
to abstracts), 2013 and previous years
http://cs229.stanford.edu/projects2013.html
http://cs229.stanford.edu
CS229 Stanford Machine Learning by Andrew Ng, Autumn
2014
Some Exercises & Solutions collected from CS229's link
http://cs229.stanford.edu/materials/ps1.pdf
http://cs229.stanford.edu/materials/ps1sol.pdf
http://cs229.stanford.edu/materials/ps2.pdf
http://cs229.stanford.edu/materials/ps2sol.pdf
http://cs229.stanford.edu/materials/ps3.pdf
http://cs229.stanford.edu/materials/ps3sol.pdf
http://cs229.stanford.edu/materials/midterm-2010-solutions.pdf
http://cs229.stanford.edu/materials/midterm_aut2014.pdf
CS 445/545 Machine Learning by Melanie Mitchell, Winter
Quarter 2014
Some Exercises & Solutions
http://web.cecs.pdx.edu/~mm/MachineLearningWinter2014/
Top Writing Errors by Melanie Mitchell
http://web.cecs.pdx.edu/~mm/TopWritingErrors.pdf
Introduction to Machine Learning, Machine Learning Lab,
University of Freiburg, Germany
http://ml.informatik.uni-freiburg.de/teaching/ss14/ml
http://ml.informatik.uni-freiburg.de/_media/teaching/ss14/ml/sheet01.pdf
http://ml.informatik.uni-freiburg.de/_media/teaching/ss14/sheet01_solution.pdf
Unsupervised Feature Learning and Deep Learning by Andrew
Ng, 2011?
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=ufldl&doc=exercises/ex1/ex1.html
Machine Learning by Andrew Ng, 2011
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex2/ex2.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex3/ex3.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex4/ex4.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex5/ex5.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex6/ex6.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex7/ex7.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex8/ex8.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?
course=MachineLearning&doc=exercises/ex9/ex9.html
Pattern Recognition and Machine Learning, Solutions to
Exercises, by Markus Svensen and Christopher Bishop, 2009
http://research.microsoft.com/en-us/um/people/cmbishop/prml/pdf/prml-websol-2009-09-08.pdf
Machine Learning Course by Aude Billard, Exercises &
Solutions, EPFL, Switzerland
Overview and objective
The aim of machine learning is to extract knowledge from data. The algorithm may be informed
by incorporating prior knowledge of the task at hand. The amount of information varies from
fully supervised to unsupervised or semi-supervised learning. This course will present some of the
core advanced methods in the field for structure discovery, classification and non-linear
regression. This is an advanced class in Machine Learning; hence, students are expected to have
some background in the field. The class will be accompanied by practical sessions on the
computer, using the mldemos software (http://mldemos.epfl.ch), which encompasses more than
30 state-of-the-art algorithms.
http://lasa.epfl.ch/teaching/lectures/ML_Phd/
T-61.3025 Principles of Pattern Recognition Weekly Exercises
with Solutions (in English), Aalto University, Finland, 2015
https://noppa.aalto.fi/noppa/kurssi/t-61.3025/viikkoharjoitukset
T-61.3050 Machine Learning: Basic Principles Weekly Exercises
with Solutions (in English), Aalto University, Finland, Fall 2014
https://noppa.aalto.fi/noppa/kurssi/t-61.3050/viikkoharjoitukset
http://www.aalto.fi/en/
CSE-E5430 Scalable Cloud Computing Weekly Exercises with
Solutions (in English), Aalto University, Finland, Fall 2014
https://noppa.aalto.fi/noppa/kurssi/cse-e5430/viikkoharjoitukset
Weekly Exercises with Solutions (in English) from Aalto
University, Finland
TO EXPLORE, not to be missed!
https://noppa.aalto.fi/noppa/kurssit/sci/t3060
SurfStat Australia: an online text in introductory Statistics
http://surfstat.anu.edu.au/surfstat-home/surfstat-main.html
Exercises & Solutions
http://surfstat.anu.edu.au/surfstat-home/exercises.html
Learning from Data by Amos Storkey, Tutorial & Worksheets
(with solutions), University of Edinburgh, Fall 2014
This is a course for basic data analysis, statistical model building and machine learning. The
course aims to provide a set of tools that I hope you will find very useful, coupled with a
principled approach to formulating solutions to problems in machine learning.
http://www.inf.ed.ac.uk/teaching/courses/lfd/lfdtutorials.html
Web Search and Mining by Christopher Manning and
Prabhakar Raghavan, Winter 2005
Slides, Exercises & Solutions
http://web.stanford.edu/class/cs276b/
http://web.stanford.edu/class/cs276b/syllabus.html
Statistical Learning Theory by Peter Bartlett, Berkeley,
Homework & solutions, Spring 2014
This course will provide an introduction to the theoretical analysis of prediction methods,
focusing on statistical and computational aspects. It will cover approaches such as kernel methods
and boosting algorithms, and probabilistic and game theoretic formulations of prediction
problems, and it will focus on tools for the theoretical analysis of the performance of learning
algorithms and the inherent difficulty of learning problems.
http://www.stat.berkeley.edu/~bartlett/courses/2014spring-cs281bstat241b/
Introduction to Time Series by Peter Bartlett, Berkeley,
Homework & solutions, Fall 2010
An introduction to time series analysis in the time domain and frequency domain. Topics will
include: Stationarity, autocorrelation functions, autoregressive moving average models, partial
autocorrelation functions, forecasting, seasonal ARIMA models, power spectra, discrete Fourier
transform, parametric spectral estimation, nonparametric spectral estimation.
http://www.stat.berkeley.edu/~bartlett/courses/153-fall2010/index.html
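The autocorrelation function covered in this course is easy to sketch in plain Python; the function name and toy series below are illustrative, not taken from the course materials:

```python
def acf(x, max_lag):
    """Sample autocorrelation of sequence x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    out = []
    for k in range(max_lag + 1):
        # Covariance between the series and its k-step-shifted copy
        cov = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
        out.append(cov / var)
    return out

# A short, roughly periodic toy series
series = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
r = acf(series, 3)  # r[0] is always 1.0 by construction
```

By Cauchy-Schwarz every value lies in [-1, 1], which is a quick sanity check on any ACF implementation.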
Introduction to Machine Learning by Stuart Russell, CS 194-10,
Fall 2011, Assignments & Solutions
The course will be a mixture of theory, algorithms, and hands-on projects with real data. The
goal is to enable students to understand and use machine learning methods across a wide range
of settings.
http://www.eecs.berkeley.edu/~russell/classes/cs194/f11/
Statistical Learning Theory by Peter Bartlett, Berkeley,
Homework & solutions, Fall 2009
This course will provide an introduction to probabilistic and computational methods for the
statistical modeling of complex, multivariate data. It will concentrate on graphical models, a
flexible and powerful approach to capturing statistical dependencies in complex, multivariate
data. In particular, the course will focus on the key theoretical and methodological issues of
representation, estimation, and inference.
http://www.cs.berkeley.edu/~bartlett/courses/2009fall-cs281a/
Advanced Topics in Machine Learning by Arthur Gretton,
2015, University College London (exercises with solutions)
http://www.gatsby.ucl.ac.uk/~gretton/coursefiles/rkhscourse.html
Reinforcement Learning by David Silver, 2015, University
College London (exercises with solutions)
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html
Emmanuel Candes Lectures, Homeworks & Solutions, Stanford
University (great resources, not to be missed!)
http://statweb.stanford.edu/~candes/teaching.html
Advanced Topics in Convex Optimization by Emmanuel
Candes, Handouts, Homeworks & Solutions, Winter 2015,
Stanford University
Description: The main goal of this course is to expose students to modern and fundamental
developments in convex optimization, a subject which has experienced tremendous growth in the
last 20 years or so. This course builds on EE 364 and explores two distinct areas. The first
concerns cone programming and especially semidefinite programming whose rich geometric
theory and expressive power makes it suitable for a wide spectrum of important optimization
problems arising in engineering and applied science. The second concerns novel and efficient
first-order methods, e.g. Nesterov's method, for smooth and nonsmooth convex optimization
which are suitable for large-scale problems.
This is an advanced topics course, which will hopefully bring students near the frontier of current
research.
http://statweb.stanford.edu/~candes/math301/index.html
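The first-order methods in the course description, such as Nesterov's method, can be sketched in a few lines; this is a minimal 1-D illustration under assumed hyperparameters, not course code:

```python
def nesterov(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Nesterov accelerated gradient descent, 1-D sketch."""
    x, v = x0, 0.0
    for _ in range(steps):
        lookahead = x + momentum * v        # evaluate the gradient at the lookahead point
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimum is at x = 3
x_min = nesterov(lambda x: 2.0 * (x - 3.0), 0.0)
```

The lookahead evaluation is what distinguishes Nesterov's method from plain momentum: the gradient is taken where the momentum step is about to land, not at the current point.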
MSM 4M13 Multicriteria Decision Making by Sándor Zoltán
Németh, School of Mathematics, University of Birmingham
Slides, Handouts, Problems
http://web.mat.bham.ac.uk/S.Z.Nemeth/teaching.htm
Theorem Proofs, Exercises & Solutions
http://web.mat.bham.ac.uk/S.Z.Nemeth/4m13
10-601 Machine Learning Spring 2015, Homeworks & Solutions
& Code (Matlab)
http://www.cs.cmu.edu/%7Eninamf/courses/601sp15/homeworks.shtml
Introduction to Machine Learning by Alex Smola, CMU,
Homeworks & Solutions
I work on machine learning and statistical data analysis. This includes application areas ranging
from document analysis, bioinformatics, computer vision to the analysis of internet data. In my
work I have supervised numerous PhD students and researchers and I have written over 150
papers, written one book and edited 5 books. My specialties are kernel methods, such as Support
Vector Machines and Gaussian Processes, and unsupervised information extraction. This
includes highly scalable models which work on many TB of data and hundreds of millions of
users.
Specialties: Kernel Methods, User Profiling, Computational Advertising, Document Analysis,
Bioinformatics, Statistical Modelling, Optimization
http://alex.smola.org/teaching/10-701-15/submission.html
https://www.youtube.com/user/smolix?spfreload=10
Applications
MIT Media Lab
The real-time city is now real! The increasing deployment of sensors and hand-held electronics
in recent years is allowing a new approach to the study of the built environment. The way we
describe and understand cities is being radically transformed - alongside the tools we use to
design them and impact on their physical structure.
Studying these changes from a critical point of view and anticipating them is the goal of the
SENSEable City Laboratory, a new research initiative at the Massachusetts Institute of
Technology.
http://senseable.mit.edu
TEDx San Francisco, Connected Reality
Connected Reality is an evening that explored how the exponential technologies of the Internet
of Things will give us deep insights that augment our understanding of the world and each other
and will propel our ability to build intelligent tools that augment our lives. We'll briefly see the
future through the eyes of presenters from varied industries of medicine to manufacturing who
will illustrate how they use sensor data to perceive and understand the world differently and
adjust their realities based on their new connectivity to their environment.
http://tedxsf.org/videos/#tedxsf-connected-reality
Emotion&Pain Project
One of the main challenges facing healthcare providers in the UK today (and in Europe) is the
rising number of people with chronic health problems. Almost 1 in 7 UK citizens experiences
chronic pain, some due to chronic diseases such as osteoarthritis, but much of it mechanical low
back pain (LBP) with no treatable pathology. 40% of these people experience severe pain and are
very restricted by it.
The capacity of our current health care system is insufficient to treat all these patients face-to-face. Pain experience is affected by physical, psychological, and social factors and hence it poses a
problem to the medical profession. This has prompted the development of a multidisciplinary
approach to the treatment of chronic LBP, primarily involving psychology and physiotherapy
alongside specialist clinicians (see British Pain Society guidelines). These programmes enable
patients to become more self-managing through improving their physical and psychological
functioning. While short term results are good, maintenance of these gains, and building on
them, remains a problem, with psychological factors being one of the primary limiting causes.
Rehabilitation-assistive technologies have shown some success in helping recovery in a number of
conditions but have yet to have an impact in pain management, mostly because of the complexity
of dealing with emotional and motivational aspects of self-directed activity increase. By providing
the means to automatically recognise, interpret, and act upon human affective states, recent
developments in sensing technology and the field of affective computing offer new avenues for
addressing these limitations and alleviating the difficulties patients face in building on treatment
gains.
Thus we propose the design and development of an intelligent system that will enable ubiquitous
monitoring and assessment of patients’ pain-related mood and movements inside (and in the
longer term, outside) the clinical environment. Specifically, we aim to
(a) develop a set of methods for automatically recognising audiovisual cues related to pain,
behavioural patterns typical of low back pain, and affective states influencing pain, and
(b) integrate these methods into a system that will provide appropriate feedback and prompts to
the patient based on his/her behaviour measured during self-directed physical therapy sessions.
In doing so, we seek to develop a new generation of multimodal patient-centred personal health
technology.
http://www.emo-pain.ac.uk
IBM Research
Machine learning applications
Five innovations that will change our lives within five years
http://www.research.ibm.com/cognitive-computing/machine-learning-applications/index.shtml#fbid=Dp4uN7k8b2O
EPFL École Polytechnique Fédérale de Lausanne
EPFL is one of two Federal Institutes of Technology in Switzerland. Located along the shore of
Lake Geneva, the university has more than 9,000 students in seven academic schools including
Life Science, Architecture, and Computer Sciences.
http://www.youtube.com/channel/UClMJeVIVyGp-3 kWtspkS0Q
Visualizing MBTA Data: An interactive exploration of Boston's
subway system
Boston’s Massachusetts Bay Transit Authority (MBTA) operates the 4th busiest subway system in
the U.S. after New York, Washington, and Chicago. … We attempt to present this information to
help people in Boston better understand the trains, how people use the trains, and how the
people and trains interact with each other.
http://mbtaviz.github.io
Commercial Applications
Listed without any transfer of money
Google glass
http://www.youtube.com/watch?v=D7TB8b2t3QE
Google self-driving car
http://www.youtube.com/watch?v=cdgQpa1pUUE
SenseFly
http://www.youtube.com/watch?v=NuZUSe87miY
HOW MICROSOFT'S MACHINE LEARNING IS
BREAKING THE GLOBAL LANGUAGE BARRIER
Earlier this week, roughly 50,000 Skype users woke up to a new way of communicating over the
Web-based phone- and video-calling platform, a feature that could’ve been pulled straight out of
Star Trek. The new function, called Skype Translator, translates voice calls between different
languages in real time, turning English to Spanish and Spanish back into English on the fly. Skype
plans to incrementally add support for more than 40 languages, promising nothing short of a
universal translator for desktops and mobile devices.
The product of more than a decade of dedicated research and development by Microsoft
Research (Microsoft acquired Skype in 2011), Skype Translator does what several other Silicon
Valley icons, not to mention the U.S. Department of Defense, have not yet been able to do. To
do so, Microsoft Research (MSR) had to solve some major machine learning problems while
pushing technologies like deep neural networks into new territory.
http://www.popsci.com/how-microsofts-machine-learning-breaking-language-barrier
RESEARCH PAPERS, in
English
Cambridge University Publications page
http://mlg.eng.cam.ac.uk/pub/
arXiv.org by Cornell University Library
Open access to 999,848 e-prints in Physics, Mathematics, Computer Science, Quantitative
Biology, Quantitative Finance and Statistics
http://arxiv.org
Google Scholar
Stand on the shoulders of giants.
Google Scholar provides a simple way to broadly search for scholarly literature. From one place,
you can search across many disciplines and sources: articles, theses, books, abstracts and court
opinions, from academic publishers, professional societies, online repositories, universities and
other web sites. Google Scholar helps you find relevant work across the world of scholarly
research.
http://scholar.google.com/intl/en/scholar/about.html
http://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=machine
learning&before author=m83- 28PAAAJ&astart=0
Google Research
Google publishes hundreds of research papers each year. Publishing is important to us; it enables
us to collaborate and share ideas with, as well as learn from, the broader scientific community.
Submissions are often made stronger by the fact that ideas have been tested through real product
implementation by the time of publication.
http://research.google.com/pubs/papers.html
Yahoo Research
The machine learning group is a team of experts in computer science, statistics, mathematical
optimization, and automatic control. They focus on making computers learn abstractions,
patterns, conditional probability distributions, and policies from web scale data with the goal to
improve the online experience for Yahoo! users, partner publishers, and advertisers.
Machine learning has such a broad influence on the internet that it can be quite difficult to
recognize. Machine learning's benefits are often hidden: they are the spam emails you don't see,
the uninteresting news articles you don't see, and the irrelevant search results you don't see, just to
name a few. Machine learning is one of the best technologies we have for solving some of the
biggest problems on the Web.
http://labs.yahoo.com/areas/?areas=machine-learning
Microsoft Research
The Machine Learning Groups of Microsoft Research include a set of researchers and
developers who push the state of the art in machine learning. We span the space from proving
theorems about the math underlying ML, to creating new ML systems and algorithms, to helping
our partner product groups apply ML to large and complex data sets.
http://research.microsoft.com/en-us/groups/mldept/
Journal from MIT Press
The Journal of Machine Learning Research (JMLR) provides an international forum for the
electronic and paper publication of high-quality scholarly articles in all areas of machine
learning. All published papers are freely available online.
http://jmlr.org
DROPS, Dagstuhl Research Online Publication Server
Access to Research Papers
http://drops.dagstuhl.de/opus/
OPEN SOURCE
SOFTWARE, in English
Weka 3: Data Mining Software in Java
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can
either be applied directly to a dataset or called from your own Java code. Weka contains tools for
data pre-processing, classification, regression, clustering, association rules, and visualization. It is
also well-suited for developing new machine learning schemes.
http://www.cs.waikato.ac.nz/~ml/weka/index.html
A deep-learning library for Java
Distributed Deep Learning Platform for Java
https://github.com/deeplearning4j/deeplearning4j
List of Java ML Software by Machine Learning Mastery
http://machinelearningmastery.com/java-machine-learning/
List of Java ML Software by MLOSS
http://mloss.org/software/language/java/
MathFinder: Math API Discovery and Migration, Software
Engineering and Analysis Lab (SEAL), IISc Bangalore
MathFinder is an Eclipse plugin supported by a unit test mining backend for discovering and
migrating math APIs. It is intended to make (re)implementing math algorithms in Java easier.
Given a math expression (see the syntax below), it returns pseudo-code involving calls to
suitable Java APIs. At present, it supports programming tasks that require use of matrix and
linear algebra APIs. The underlying technique is however general and can be extended to
support other math domains.
http://www.iisc-seal.net/mathfinder
Google Java Style
http://google-styleguide.googlecode.com/svn/trunk/javaguide.html
JSAT: java-statistical-analysis-tool by Edward Raff
JSAT is a library for quickly getting started with Machine Learning problems. It is developed in
my free time, and made available for use under the GPL 3. Part of the library is for self-education;
as such, all code is self-contained. JSAT has no external dependencies, and is pure
Java. I also aim to make the library suitably fast for small to medium size problems. As such,
much of the code supports parallel execution.
https://github.com/EdwardRaff/JSAT/tree/master
Theano Library for Deep Learning, Python
Theano is a Python library that allows you to define, optimize, and evaluate mathematical
expressions involving multi-dimensional arrays efficiently. Theano features:
• tight integration with NumPy: use numpy.ndarray in Theano-compiled functions.
• transparent use of a GPU: perform data-intensive calculations up to 140x faster than with a CPU (float32 only).
• efficient symbolic differentiation: Theano does your derivatives for functions with one or many inputs.
• speed and stability optimizations: get the right answer for log(1+x) even when x is really tiny.
• dynamic C code generation: evaluate expressions faster.
• extensive unit-testing and self-verification: detect and diagnose many types of mistake.
Theano has been powering large-scale computationally intensive scientific investigations since
2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University
of Montreal).
http://deeplearning.net/software/theano/
http://nbviewer.ipython.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb
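The automatic differentiation that Theano performs symbolically can be illustrated with a tiny pure-Python forward-mode sketch using dual numbers; this is a conceptual toy of a related technique, not Theano's actual implementation:

```python
class Dual:
    """Forward-mode autodiff value: carries f(x) and f'(x) together."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)
    __radd__ = __add__

    def __mul__(self, other):
        # Product rule: (fg)' = f'g + fg'
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f at x while propagating the derivative seed 1.0."""
    return f(Dual(x, 1.0)).grad

# d/dx (x*x + 3x) = 2x + 3, so at x = 2 the derivative is 7
d = derivative(lambda x: x * x + 3 * x, 2.0)  # → 7.0
```

Theano instead builds a symbolic expression graph and differentiates it before compiling, which is what enables its speed and stability optimizations.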
Theano and LSTM for Sentiment Analysis by Frederic Bastien,
Universite de Montreal
https://github.com/StartupML/Bastien-Theano-Workshop
Introduction to Deep Learning with Python
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning with
Python and the Theano library. The emphasis of the talk is on high performance computing,
natural language processing using recurrent neural nets, and large scale learning with GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
COURSERA: An Introduction to Interactive Programming in
Python (Part 1)
Part of the Fundamentals of Computing Specialization »
About the Course
This two-part course (part 2 is available here) is designed to help students with very little or no
computing background learn the basics of building simple interactive applications. Our language
of choice, Python, is an easy-to-learn, high-level computer language that is used in many of the
computational courses offered on Coursera. To make learning Python easy, we have developed a
new browser-based programming environment that makes developing interactive applications in
Python simple. These applications will involve windows whose contents are graphical and
respond to buttons, the keyboard and the mouse.
The primary method for learning the course material will be to work through multiple "mini-projects" in Python. To make this class enjoyable, these projects will include building fun games
such as Pong, Blackjack, and Asteroids. When you’ve finished our course, we can’t promise that
you will be a professional programmer, but we think that you will learn a lot about programming
in Python and have fun while you’re doing it.
https://www.coursera.org/course/interactivepython1
COURSERA: An Introduction to Interactive Programming in
Python (Part 2)
Part of the Fundamentals of Computing Specialization »
https://www.coursera.org/course/interactivepython2
COURSERA: Programming for Everybody (Python)
About the Course
This course is specifically designed to be a first programming course using the popular Python
programming language. The pace of the course is designed to lead to mastery of each of the
topics in the class. We will use simple data analysis as the programming exercises through the
course. Understanding how to process data is valuable for everyone regardless of your career.
This course might kindle an interest in more advanced programming courses or courses in web
design and development or just provide skills when you are faced with a bunch of data that you
need to analyze. You can do the programming assignments for the class using a web browser or
using your personal computer. All required software for the course is free.
https://www.coursera.org/course/pythonlearn
Udacity - Programming foundations with Python
You’ll pick up some great tools for your programming toolkit in this course! You will:
• Start coding in the programming language Python;
• Reuse and share code with Object Oriented Programming;
• Create and share amazing, life-hacking projects!
https://www.udacity.com/course/programming-foundations-with-python--ud036
Scikit-learn, Machine Learning in Python
Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable - BSD license
http://scikit-learn.org/stable/index.html
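scikit-learn's estimators share a common fit/predict interface; the toy classifier below mimics that convention in plain Python so the pattern is visible without the library (the class and data are illustrative, not part of scikit-learn):

```python
class NearestCentroid:
    """Toy classifier following the fit/predict style scikit-learn uses."""

    def fit(self, X, y):
        # Group rows by label, then average each group into a centroid
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self  # fit returns self, so calls can be chained

    def predict(self, X):
        def dist2(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids_, key=lambda c: dist2(row, self.centroids_[c]))
                for row in X]

clf = NearestCentroid().fit([[0, 0], [1, 1], [9, 9], [10, 10]], [0, 0, 1, 1])
pred = clf.predict([[0.5, 0.5], [9.5, 9.5]])  # → [0, 1]
```

In scikit-learn itself, every classifier, regressor, and clusterer follows this same fit-then-predict contract, which is what makes its tools reusable in various contexts.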
Pydata
PyData is a gathering of users and developers of data analysis tools in Python. The goals are to
provide Python enthusiasts a place to share ideas and learn from each other about how best to
apply our language and tools to ever-evolving challenges in the vast realm of data management,
processing, analytics, and visualization.
https://www.youtube.com/user/PyDataTV/videos
PyData NYC 2014 Videos
https://www.youtube.com/user/PyDataTV/videos?spfreload=10
PyData is a gathering of users and developers of data analysis tools in Python. The goals are to
provide Python enthusiasts a place to share ideas and learn from each other about how best to
apply our language and tools to ever-evolving challenges in the vast realm of data management,
processing, analytics, and visualization.
We aim to be an accessible, community-driven conference, with tutorials for novices, advanced
topical workshops for practitioners, and opportunities for package developers and users to meet
in person.
A major goal of the conference is to provide a venue for users across all the various domains of
data analysis to share their experiences and their techniques, as well as highlight the triumphs
and potential pitfalls of using Python for certain kinds of problems.
http://pydata.org/nyc2014/about/about/
PyData, The Complete Works by Rohit Sivaprasad
The unofficial index of all PyData talks. This was initially going to be a pickled pandas
DataFrame object, but then I decided against it. So here it is, in beautiful GitHub-flavored
markdown.
There are placeholders for links to the video. Currently, the hyperlinks point to the pydata.org
talk pages. Please do feel free to make it better by contributing to the repo.
https://github.com/DataTau/datascience-anthology-pydata
Anaconda
Completely free enterprise-ready Python distribution for large-scale data processing, predictive
analytics, and scientific computing
We want to ensure that Python, NumPy, SciPy, Pandas, IPython, Matplotlib, Numba, Blaze,
Bokeh, and other great Python data analysis tools can be used everywhere.
We want to make it easier for Python evangelists and teachers to promote the use of Python.
We want to give back to the Python community that we love being a part of.
https://store.continuum.io/cshop/anaconda/
IPython Interactive Computing
IPython provides a rich architecture for interactive computing with:
Powerful interactive shells (terminal and Qt-based).
A browser-based notebook with support for code, rich text, mathematical expressions, inline plots
and other rich media.
Support for interactive data visualization and use of GUI toolkits.
Flexible, embeddable interpreters to load into your own projects.
Easy to use, high performance tools for parallel computing.
http://ipython.org/
Scipy
SciPy refers to several related but distinct entities:
• The SciPy Stack, a collection of open source software for scientific computing in Python, and particularly a specified set of core packages.
• The community of people who use and develop this stack.
• Several conferences dedicated to scientific computing in Python: SciPy, EuroSciPy and SciPy.in.
• The SciPy library, one component of the SciPy stack, providing many numerical routines.
http://www.scipy.org/
Numpy
NumPy is the fundamental package for scientific computing with Python. It contains among
other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly
and speedily integrate with a wide variety of databases.
http://www.numpy.org/
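The "sophisticated (broadcasting) functions" mentioned above let arrays of different shapes combine without explicit loops; a minimal sketch:

```python
import numpy as np

# Broadcasting: a (3, 1) column array combines with a length-4 row array
col = np.arange(3).reshape(3, 1)   # [[0], [1], [2]]
row = np.arange(4)                 # [0, 1, 2, 3]

# The (3, 1) and (4,) shapes broadcast to (3, 4); element [i, j] is 10*i + j
table = col * 10 + row
```

NumPy stretches each array along its size-1 dimensions virtually, without copying data, which is one reason it works as an efficient container for multi-dimensional computation.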
matplotlib
matplotlib is a python 2D plotting library which produces publication quality figures in a variety
of hardcopy formats and interactive environments across platforms. matplotlib can be used in
python scripts, the python and ipython shell (à la MATLAB® or Mathematica®), web
application servers, and six graphical user interface toolkits.
http://matplotlib.org/
pandas
Python Data Analysis Library
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data
structures and data analysis tools for the Python programming language.
http://pandas.pydata.org/
SymPy
SymPy is a Python library for symbolic mathematics.
http://www.sympy.org/en/index.html
Orange
Open source data visualization and analysis for novice and experts. Data mining through visual
programming or Python scripting. Components for machine learning. Add-ons for bioinformatics
and text mining. Packed with features for data analytics.
http://orange.biolab.si/
Pythonic Perambulations: How to be a Bayesian in Python
Below I'll explore three mature Python packages for performing Bayesian analysis via MCMC:
emcee: the MCMC Hammer
pymc: Bayesian Statistical Modeling in Python
pystan: The Python Interface to Stan
http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/
emcee
emcee is an extensible, pure-Python implementation of Goodman & Weare's Affine Invariant
Markov chain Monte Carlo (MCMC) Ensemble sampler. It's designed for Bayesian parameter
estimation and it's really sweet!
http://dan.iel.fm/emcee/current/
PyMC
PyMC is a python module that implements Bayesian statistical models and fitting algorithms,
including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large
suite of problems. Along with core sampling functionality, PyMC includes methods for
summarizing output, plotting, goodness-of-fit and convergence diagnostics.
http://pymc-devs.github.io/pymc/
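The Markov chain Monte Carlo machinery behind emcee and PyMC can be illustrated with a bare-bones random-walk Metropolis sampler in plain Python; this is a conceptual sketch, not either library's algorithm (emcee, for instance, uses the affine-invariant ensemble sampler instead):

```python
import math
import random

def metropolis(log_prob, start, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a 1-D target density."""
    rng = random.Random(seed)
    x, lp = start, log_prob(start)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)      # propose a nearby point
        lp_prop = log_prob(prop)
        # Accept with probability min(1, p(prop)/p(x)), done in log space
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Target: standard normal, log p(x) = -x^2/2 (up to a constant)
draws = metropolis(lambda x: -0.5 * x * x, 0.0, 20000)
mean = sum(draws) / len(draws)   # should be near 0
```

The sampler only ever needs the log-density up to an additive constant, which is exactly why MCMC is so useful for Bayesian parameter estimation, where the normalizing constant is typically unknown.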
Pylearn2
Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza,
Razvan Pascanu, James Bergstra, Frédéric Bastien, and Yoshua Bengio. "Pylearn2: a machine
learning research library". arXiv preprint arXiv:1308.4214 (BibTeX)
https://github.com/lisa-lab/pylearn2
PyCon US 2014
PyCon is the largest annual gathering for the community using and developing the open-source
Python programming language. It is produced and underwritten by the Python Software
Foundation, the 501(c)(3) nonprofit organization dedicated to advancing and promoting Python.
Through PyCon, the PSF advances its mission of growing the international community of
Python programmers.
Because PyCon is backed by the non-profit PSF, we keep registration costs much lower than
comparable technology conferences so that PyCon remains accessible to the widest group
possible. The PSF also pays for the ongoing development of the software that runs PyCon and
makes it available under a liberal open source license.
140 videos
http://pyvideo.org/category/50/pycon-us-2014
https://www.youtube.com/user/PyCon2014/videos
PyCon India 2012
https://www.youtube.com/playlist?list=PL6GW05BfqWIdWaV aP6kHJKFY0ybOOfoA
PyCon India 2013
https://www.youtube.com/playlist?list=PL6GW05BfqWIdsaaV35jcHWPWTI-DAw6Yn
Montreal Python
Montréal-Python's mission is to promote the growth of a lively and dynamic community of users
of the Python programming language and to promote the use of the latter. Montréal-Python also
aims to disseminate the local Python knowledge to build a stronger developer community.
Montréal-Python promotes Free and Open Source Software, favors its adoption within the
community, and collaborates with community players to achieve this goal.
https://www.youtube.com/user/MontrealPython/videos
http://montrealpython.org/en/
SciPy 2014
SciPy is a community dedicated to the advancement of scientific computing through open source
Python software for mathematics, science, and engineering. The annual SciPy Conference allows
participants from all types of organizations to showcase their latest projects, learn from skilled
users and developers, and collaborate on code development.
http://pyvideo.org/category/51/scipy-2014
PyLadies London Meetup resources
PyLadies is an international mentorship group with a focus on helping more women and
genderqueers become active participants and leaders in the Python open-source community. Our
mission is to promote, educate and advance a diverse Python community through outreach,
education, conferences, events, and social gatherings. PyLadies also aims to provide a friendly
support network for women and genderqueers, and a bridge to the larger Python world.
https://github.com/pyladieslondon/resources
Python Tools for Machine Learning by CB Insights
https://www.cbinsights.com/blog/python-tools-machine-learning/
Python Tutorials by Jessica MacKellar
I am a startup founder, software engineer, and open source developer living in San Francisco,
California.
I enjoy the Internet, networking, low-level systems engineering, relational databases, tinkering on
electronics projects, and contributing to and helping other people contribute to open source
software.
"Be the change you wish to see in the world" may be clichéd, but what can I say, I believe in it. I
am committed to applying my skills, in individual and collective efforts, to improve the world.
Right now, this means I spend a lot of time volunteering, engaging technologists about education,
and empowering effective people and initiatives in my capacity as a Director for the Python
Software Foundation.
http://web.mit.edu/jesstess/
INTRODUCTION TO PYTHON FOR DATA MINING
http://nbviewer.ipython.org/github/Syrios12/learningwithdata/blob/master/
Python For Data Mining.ipynb
Python Scientific Lecture Notes
Tutorial material on the scientific Python ecosystem, a quick introduction to central tools and
techniques. The different chapters each correspond to a 1 to 2 hours course with increasing level
of expertise, from beginner to expert.
http://scipy-lectures.github.io/index.html#
Notebook Gallery: Links to the best IPython and Jupyter
Notebooks by ?
What is this website?
This website is a collection of links to IPython/Jupyter notebooks. Contrary to other galleries
(such as the one on nbviewer and the wakari gallery), this collection is continuously updated with
notebooks submitted by users. It also uses the twitter API to fetch new notebooks daily. Please
note that this website does not contain nor host any notebooks, only offers links to relevant
notebooks.
Why did you make this website?
Have you seen the amazing stuff people are making with IPython/Jupyter notebooks? It will
blow your mind! So I needed a place where I could find more of these amazing notebooks. For
now it's a simple website that displays the latest and most viewed Notebooks; however, in the
future I would like it to have searching and categorization features.
Can I say something?
Sure! I'd love to hear some feedback. If it's an issue with the website, feel free to open an issue
here. You can also email me at f@bianp.net.
http://nb.bianp.net/sort/views/
Google Python Style Guide
http://google-styleguide.googlecode.com/svn/trunk/pyguide.html
Natural Language Processing with Python by Steven Bird, Ewan
Klein, and Edward Loper
The NLTK book is currently being updated for Python 3 and NLTK 3. This is work in progress;
chapters that still need to be updated are indicated. The first edition of the book, published by
O'Reilly, is available at http://nltk.org/book_1ed/. A second edition of the book is anticipated in
early 2016.
0. Preface
1. Language Processing and Python
2. Accessing Text Corpora and Lexical Resources
3. Processing Raw Text
4. Writing Structured Programs
5. Categorizing and Tagging Words (minor fixes still required)
6. Learning to Classify Text
7. Extracting Information from Text
8. Analyzing Sentence Structure
9. Building Feature Based Grammars
10. Analyzing the Meaning of Sentences (minor fixes still required)
11. Managing Linguistic Data (minor fixes still required)
12. Afterword: Facing the Language Challenge
Bibliography
Term Index
This book is made available under the terms of the Creative Commons Attribution
Noncommercial No-Derivative-Works 3.0 US License.
Please post any questions about the materials to the nltk-users mailing list. Please report any
errors on the issue tracker.
http://www.nltk.org/book/
PyBrain Library
Welcome to PyBrain
PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined
environments to test and compare your algorithms.
PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural
Network Library. In fact, we came up with the name first and later reverse-engineered this quite
descriptive "Backronym".
http://pybrain.org/
Classifying MNIST dataset with Pybrain
http://analyticsbot.ml/2015/02/classifying-mnist-dataset-pybrain/
OCTAVE
GNU Octave is a high-level interpreted language, primarily intended for numerical
computations. It provides capabilities for the numerical solution of linear and nonlinear
problems, and for performing other numerical experiments. It also provides extensive graphics
capabilities for data visualization and manipulation. Octave is normally used through its
interactive command line interface, but it can also be used to write non-interactive programs.
The Octave language is quite similar to Matlab so that most programs are easily portable.
http://www.gnu.org/software/octave/
PMTK Toolbox by Matt Dunham, Kevin Murphy
PMTK is a collection of Matlab/Octave functions, written by Matt Dunham, Kevin Murphy
and various other people. The toolkit is primarily designed to accompany Kevin Murphy's
textbook Machine learning: a probabilistic perspective, but can also be used independently of
this book. The goal is to provide a unified conceptual and software framework encompassing
machine learning, graphical models, and Bayesian statistics (hence the logo). (Some methods
from frequentist statistics, such as cross validation, are also supported.) Since December 2011, the
toolbox is in maintenance mode, meaning that bugs will be fixed, but no new features will be
added (at least not by Kevin or Matt).
PMTK supports a large variety of probabilistic models, including linear and logistic regression
models (optionally with kernels), SVMs and gaussian processes, directed and undirected
graphical models, various kinds of latent variable models (mixtures, PCA, HMMs, etc) , etc.
Several kinds of prior are supported, including Gaussian (L2 regularization), Laplace (L1
regularization), Dirichlet, etc. Many algorithms are supported, for both Bayesian inference
(including dynamic programming, variational Bayes and MCMC) and MAP/ML estimation
(including EM, conjugate and projected gradient methods, etc.)
https://github.com/probml/pmtk3
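To make the "Gaussian prior (L2 regularization)" point above concrete, here is a minimal sketch, in plain Python rather than PMTK's Matlab/Octave: MAP estimation for a one-parameter linear model, where the Gaussian prior appears as a ridge penalty `lam`. The function name and data are hypothetical, chosen only for illustration.

```python
# MAP estimation for y = w*x with a Gaussian prior on w (L2 / ridge).
# Closed form: w = sum(x*y) / (sum(x^2) + lam); lam = 0 recovers ML.

def ridge_slope(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]          # roughly y = 2x

w_ml = ridge_slope(xs, ys, lam=0.0)   # maximum likelihood (no prior)
w_map = ridge_slope(xs, ys, lam=5.0)  # MAP: the prior shrinks w toward 0
print(w_ml, w_map)
```

The same shrinkage effect is what PMTK's regularized regression routines compute, for much richer models.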
Octave Tutorial by Paul Nissenson
I was born, raised, and educated in Orange County, California. I figure that everyone around the
world wants to come here, so why should I leave?
I received my B.S. in Physics from the University of California, Irvine (UCI) in 2003. Not
knowing what to do next, I decided to further my education at UCI by attending graduate school
in Mechanical & Aerospace Engineering. My research focused on computer modeling of systems
that are related to the atmosphere. I was fortunate to work under my very supportive advisor, Dr.
Donald Dabdub, and work with a lot of good collaborators.
During graduate school, I was a teaching assistant many times and found my true calling.
Research had its ups and downs, but teaching was always fun for me. After I received my Ph.D. in
2009, I decided to follow my heart and pursue a faculty position at a primarily undergraduate
university. After being a post-doctoral researcher at UCI for a couple years, I was hired as an
Assistant Professor in the Mechanical Engineering Department at Cal Poly Pomona in Fall 2011.
https://www.youtube.com/channel/UCr-6gDvh0atAFM4VuYq7PHw/videos?spfreload=10
http://www.cpp.edu/~pmnissenson/
JULIA
Julia is a high-level, high-performance dynamic programming language for technical computing,
with syntax that is familiar to users of other technical computing environments. It provides a
sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive
mathematical function library. The library, largely written in Julia itself, also integrates mature,
best-of-breed C and Fortran libraries for linear algebra, random number generation, signal
processing, and string processing. In addition, the Julia developer community is contributing a
number of external packages through Julia’s built-in package manager at a rapid pace. IJulia, a
collaboration between the IPython and Julia communities, provides a powerful browser-based
graphical notebook interface to Julia.
Julia programs are organized around multiple dispatch: by defining functions and overloading
them for different combinations of argument types, which can also be user-defined. For a more
in-depth discussion of the rationale and advantages of Julia over other systems, see the following
highlights or read the introduction in the online manual.
http://julialang.org/
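The multiple-dispatch idea mentioned above can be sketched in a few lines of Python: a registry keyed by the tuple of argument types selects the implementation. This is only a toy illustration with made-up helper names; Julia provides this natively, with type specialization and far better performance.

```python
# Toy multiple dispatch: pick an implementation by the types of ALL arguments.
_registry = {}

def register(*types):
    def deco(fn):
        _registry[types] = fn
        return fn
    return deco

def combine(a, b):
    # Look up the implementation matching both argument types.
    return _registry[(type(a), type(b))](a, b)

@register(int, int)
def _(a, b):
    return a + b

@register(str, str)
def _(a, b):
    return a + " " + b

print(combine(2, 3))
print(combine("multiple", "dispatch"))
```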
Julia by example by Samuel Colvin
http://samuelcolvin.github.io/JuliaByExample/
https://github.com/samuelcolvin
The R PROJECT for Statistical Computing
R is a language and environment for statistical computing and graphics…
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests,
time-series analysis, classification, clustering, ...) and graphical techniques, and is highly
extensible. The S language is often the vehicle of choice for research in statistical methodology,
and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be
produced, including mathematical symbols and formulae where needed. Great care has been
taken over the defaults for the minor design choices in graphics, but the user retains full control.
https://www.r-project.org/
Coursera: R Programming
Part of the Data Science Specialization »
About the Course
In this course you will learn how to program in R and how to use R for effective data
analysis. You will learn how to install and configure software necessary for a statistical
programming environment and describe generic programming language concepts as they are
implemented in a high-level statistical language. The course covers practical issues in statistical
computing which includes programming in R, reading data into R, accessing R packages, writing
R functions, debugging, profiling R code, and organizing and commenting R code. Topics in
statistical data analysis will provide working examples.
https://www.coursera.org/course/rprog
R Graph Gallery
The blog is a collection of script examples with example data and output plots. R produces
excellent quality graphs for data analysis, science and business presentations, publications and
other purposes. Self-help code and examples are provided. Enjoy nice graphs!
http://rgraphgallery.blogspot.co.uk/2013/04/ploting-heatmap-in-map-using-maps.html
Code School - R Course
Learn the R programming language for data analysis and visualization. This software
programming language is great for statistical computing and graphics.
https://www.codeschool.com/courses/try-r
Open Intro R Labs
OpenIntro Labs promote the understanding and application of statistics through applied data
analysis. The statistical software R is a widely used and stable software that is free. RStudio is a
user-friendly interface for R.
https://www.openintro.org/stat/labs.php
R Tutorial
• Hierarchical Linear Model
• Bayesian Classification with Gaussian Process
• Bayesian Inference Using OpenBUGS
• Significance Test for Kendall's Tau-b
• Support Vector Machine with GPU, Part II
• Hierarchical Cluster Analysis
http://www.r-tutor.com/
DataCamp R Course
• Introduction to R
• Data Analysis and Statistical Inference
• Introduction to Computational Finance and Financial Econometrics
• How to work with Quandl in R
https://www.datacamp.com/courses
R Bloggers
R-Bloggers.com is a central hub (i.e. a blog aggregator) of content collected from bloggers who
write about R (in English). The site will help R bloggers and users to connect and follow the "R
blogosphere" (you can view a 7 minute talk, from useR2011, for more information about the R-blogosphere).
http://www.r-bloggers.com/
R-Project Package: caret: Classification and Regression
Training
Misc functions for training and plotting classification and regression models
https://cran.r-project.org/web/packages/caret/index.html
A Short Introduction to the caret Package by Max Kuhn
The caret package (short for classification and regression training) contains functions to
streamline the model training process for complex regression and classification problems. The
package utilizes a number of R packages but tries not to load them all at package start-up. The
package “suggests” field includes 26 packages. caret loads packages as needed and assumes that
they are installed.
https://cran.r-project.org/web/packages/caret/vignettes/caret.pdf
R packages by Hadley Wickham
Style guide
Good coding style is like using correct punctuation. You can manage without it, but it sure makes
things easier to read. As with styles of punctuation, there are many possible variations. The
following guide describes the style that I use (in this book and elsewhere). It is based on Google’s
R style guide, with a few tweaks. You don’t have to use my style, but you really should use a
consistent style.
Good style is important because while your code only has one author, it’ll usually have multiple
readers. This is especially true when you’re writing code with others. In that case, it’s a good idea
to agree on a common style up-front. Since no style is strictly better than another, working with
others may mean that you’ll need to sacrifice some preferred aspects of your style.
http://r-pkgs.had.co.nz/style.html
Google's R Style Guide
http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml
STAN Software
Stan is a probabilistic programming language implementing
• full Bayesian statistical inference with MCMC sampling (NUTS, HMC)
• penalized maximum likelihood estimation with optimization (BFGS)
Stan is coded in C++, is freedom-respecting, open-source software (new BSD core, GPLv3
interfaces), and runs on all major platforms (Linux, Mac, Windows).
Interfaces
Download and getting started instructions, organized by interface:
• RStan v2.5.0 (R)
• PyStan v2.5.0 (Python)
• CmdStan v2.5.0 (shell, command-line terminal)
• MatlabStan (MATLAB)
• Stan.jl (Julia)
http://mc-stan.org/
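To give a feel for what "MCMC sampling" means, here is a minimal random-walk Metropolis sampler in plain Python, targeting a standard normal. This is only a conceptual sketch of the simplest MCMC algorithm; Stan's HMC/NUTS samplers are far more sophisticated, and the function and parameter names here are hypothetical.

```python
# Random-walk Metropolis: propose a nearby point, accept with probability
# min(1, p(proposal)/p(current)). Works from the log-density alone.
import math
import random

def metropolis(log_prob, x0, n_samples, step=0.5, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        delta = log_prob(proposal) - log_prob(x)
        if delta >= 0 or rng.random() < math.exp(delta):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, log p(x) = -x^2/2 up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # close to 0
```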
List of Machine Learning Open Source Software
To support the open source software movement, JMLR MLOSS publishes contributions related
to implementations of non-trivial machine learning algorithms, toolboxes or even languages for
scientific computing.
http://jmlr.org/mloss/
Google Prediction API
Google's cloud-based machine learning tools can help analyze your data to add the following
features to your applications: Customer sentiment analysis, Message routing decisions, Document
and email classification, Recommendation systems, Churn analysis, Spam detection, Upsell
opportunity analysis, Diagnostics, Suspicious activity identification, and much more …Free
Quota:
Usage is free for the first six months, up to the following limits per Google Developers Console
project. This free quota applies even when billing is enabled, until the six-month expiration time.
Usage limits:
Predictions: 100 predictions/day
Hosted model predictions: Hosted models have a usage limit of 100 predictions/day/user
across all models.
Training: 5MB trained/day
Streaming updates: 100 streaming updates/day
Lifetime cap: 20,000 predictions.
Expiration: Free quota expires six months after activating Google Prediction for your project in
the Google Developers Console.
https://cloud.google.com/prediction/docs
Reddit
Reddit (/ˈrɛdɪt/), stylized as reddit, is an entertainment, social networking service and
news website where registered community members can submit content, such as text posts or
direct links. Only registered users can then vote submissions "up" or "down" to organize the posts
and determine their position on the site's pages. Content entries are organized by areas of
interest called "subreddits". (source Wikipedia)
http://www.reddit.com/r/MachineLearning/
SHOGUN toolbox
A large-scale machine learning toolbox. SHOGUN is designed for unified large-scale learning for
a broad range of feature types and learning settings, like classification, regression, or explorative
data analysis.
http://www.shogun-toolbox.org/page/home/
Comparison of ML toolboxes
https://docs.google.com/spreadsheets/d/1bclw5Nq2jwuOuqsBbwe9fjARkxcr50gWyklCL3r1P-4/edit#gid=0
Infer.NET, Microsoft Research
Infer.NET is a framework for running Bayesian inference in graphical models. It can also be
used for probabilistic programming as shown in this video.
You can use Infer.NET to solve many different kinds of machine learning problems, from
standard problems like classification or clustering through to customised solutions to domain-specific problems. Infer.NET has been used in a wide variety of domains including information
retrieval, bioinformatics, epidemiology, vision, and many others.
A new feature in Infer.NET 2.5 is Fun, a library that turns the simple succinct syntax of F# into a
probabilistic modeling language for Bayesian machine learning. You can run your models with
F# to compute synthetic data, and you can compile your models with the Infer.NET compiler for
efficient inference. See the Infer.NET Fun website for additional information.
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/default.aspx
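The kind of computation frameworks like Infer.NET automate can be shown with the smallest possible example: Bayesian inference by direct enumeration, here for a coin's unknown bias on a discrete grid. This is a hand-rolled sketch, not Infer.NET's API; real frameworks handle much richer graphical models with approximate inference.

```python
# Posterior over a coin's bias p after observing heads/tails, computed by
# enumerating a grid of candidate values (uniform prior, binomial likelihood).

def posterior(grid, heads, tails):
    weights = [p ** heads * (1 - p) ** tails for p in grid]
    total = sum(weights)
    return [w / total for w in weights]  # normalized posterior on the grid

grid = [i / 100 for i in range(1, 100)]
post = posterior(grid, heads=7, tails=3)
mean = sum(p * w for p, w in zip(grid, post))
print(round(mean, 3))  # posterior mean, analytically (7+1)/(10+2) = 2/3
```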
F# Software Foundation
F# is ideally suited to machine learning because of its efficient execution, succinct style, data
access capabilities and scalability. F# has been successfully used by some of the most advanced
machine learning teams in the world, including several groups at Microsoft Research.
Try F# has some introductory machine learning algorithms. Further resources related to different
aspects of machine learning are below.
See also the Math and Statistics and Data Science sections for related material.
http://fsharp.org/guides/machine-learning/index.html
BigML
Now Free
Unlimited tasks (up to 16MB/Task)
https://bigml.com/
BRML Toolbox in Matlab/Julia – David Barber Toolbox,
University College London
http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.Software
SCILAB
Scilab is free and open source software for numerical computation providing a powerful
computing environment for engineering and scientific applications. Scilab includes hundreds of
mathematical functions. It has a high-level programming language allowing access to advanced
data structures, and 2-D and 3-D graphical functions.
http://www.scilab.org/en/scilab/about
OverFeat and Torch7, CILVR Lab @ NYU
OverFeat is an image recognizer and feature extractor built around a convolutional network.
The OverFeat convolutional net was trained on the ImageNet 1K dataset. It participated in the
ImageNet Large Scale Recognition Challenge 2013 under the name "OverFeat NYU".
This release provides C/C++ code to run the network and output class probabilities or feature
vectors. It also includes a webcam-based demo.
Torch7 is an interactive development environment for machine learning and computer vision. It
is an extension of the Lua language with a multidimensional numerical array library.
Lua is a very simple, compact and efficient interpreter/compiler with a straightforward syntax. It
is used widely as a scripting language in the computer game industry. Torch extends Lua with an
extensive numerical library and various facilities for machine learning and computer vision.
Torch has computational back-ends for multicore/multi-CPU machines (using Intel/AVX and
OpenMP), NVidia GPUs (using CUDA), and ARM CPUs (using the Neon instruction set).
Many research projects at the CILVR Lab are built with Torch.
http://cilvr.nyu.edu/doku.php?id=code:start
FAIR open sources deep-learning modules for Torch
https://research.facebook.com/blog/879898285375829/fair-open-sources-deep-learning-modules-for-torch/
IPython kernel for Torch with visualization and plotting
https://github.com/facebook/iTorch
Deep Learning Lecture 9: Neural networks and modular design
in Torch by Nando de Freitas, Oxford University
https://www.youtube.com/watch?v=NUKp0c4xb8w&spfreload=10
https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
Deep Learning Lecture 8: Modular back-propagation, logistic
regression and Torch
Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from
Brendan Shillingford.
https://www.youtube.com/watch?v=-YRB0eFxeQA&spfreload=10
Machine Learning with Torch7: Defining your own Neural Net
Module
https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/practicals/practical4.pdf
Lua Tutorial in 15 Minutes by Tyler Neylon
http://tylerneylon.com/a/learn-lua/
http://www.lua.org/
Google: Punctuation, symbols & operators in search
Use search operators to narrow down results
Search operators are words that can be added to searches to help narrow down the results. Don’t
worry about memorizing every operator - you can also use the Advanced Search page to create
these searches.
Note: When you search using operators, don't add any spaces between the operator and your
search terms. A search for site:nytimes.com will work, but site: nytimes.com will not.
https://support.google.com/websearch/answer/2466433?hl=en
A discovery thanks to datatau.com's user called skadamat! Thanks!
To search on a website a sequence of keywords, just type "site:nameOfTheSite keywords" on
Google search
Example: How to search "deep learning" on "The Machine Learning Salon"
https://www.google.co.uk/?gws_rd=ssl#q=site:machinelearningsalon.org+deep+learning&start=0
WolframAlpha
Making the world’s knowledge computable
Wolfram Alpha introduces a fundamentally new way to get knowledge and answers
not by searching the web, but by doing dynamic computations based on a vast collection of built-in data, algorithms, and methods.
http://www.wolframalpha.com/
http://www.wolframalpha.com/examples/?src=input
Computation and the Future of Mathematics by Stephen
Wolfram, Oxford's Podcast
Duration: 0:51:50
Added: 15 Jan 2014
Stephen Wolfram, creator of Mathematica and Wolfram Alpha, gives a talk about the future of
mathematics and computation.
http://podcasts.ox.ac.uk/computation-and-future-mathematics
Mloss.org
Our goal is to support a community creating a comprehensive open source machine learning
environment. Ultimately, open source machine learning software should be able to compete with
existing commercial closed source solutions. To this end, it is not enough to bring existing and
freshly developed toolboxes and algorithmic implementations to people's attention. More
importantly the MLOSS platform will facilitate collaborations with the goal of creating a set of
tools that work with one another. Far from requiring integration into a single package, we believe
that this kind of interoperability can also be achieved in a collaborative manner, which is
especially suited to open source software development practices.
https://mloss.org/software/view/501/
Sourceforge
Find, Create, and Publish Open Source Software for free
http://sourceforge.net/directory/os:mac/freshness:recently-updated/?q=machine%20learning
AForge.NET Framework
AForge.NET is a C# framework designed for developers and researchers in the fields of
Computer Vision and Artificial Intelligence - image processing, neural networks, genetic
algorithms, machine learning, robotics, etc.
http://www.aforgenet.com/
cuda-convnet
High-performance C++/CUDA implementation of convolutional neural networks.
This is a fast C++/CUDA implementation of convolutional (or more generally, feed-forward)
neural networks. It can model arbitrary layer connectivity and network depth. Any directed
acyclic graph of layers will do. Training is done using the back-propagation algorithm.
Fermi-generation GPU (GTX 4xx, GTX 5xx, or Tesla equivalent) required.
https://code.google.com/p/cuda-convnet/
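The "training is done using the back-propagation algorithm" step above can be illustrated at toy scale: gradient descent on a single linear unit with squared loss, in plain Python. The function name and data are made up for illustration; cuda-convnet does the same gradient computation for deep convolutional nets on the GPU.

```python
# Train y = w*x by gradient descent on squared loss (pred - y)^2.
# The "backward" step is just the chain rule: d(loss)/dw = 2*(pred - y)*x.

def train(pairs, lr=0.05, epochs=200):
    w = 0.0
    for _ in range(epochs):
        for x, y in pairs:
            pred = w * x
            grad = 2 * (pred - y) * x   # gradient of the loss w.r.t. w
            w -= lr * grad              # step against the gradient
    return w

pairs = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # consistent with y = 3x
w = train(pairs)
print(round(w, 3))  # learns w close to 3
```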
word2vec
Tool for computing continuous distributed representations of words.
This tool provides an efficient implementation of the continuous bag-of-words and skip-gram
architectures for computing vector representations of words. These representations can be
subsequently used in many natural language processing applications and for further research.
https://code.google.com/p/word2vec/
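To show what the skip-gram architecture trains on, here is a sketch of how (center, context) training pairs are generated from a token stream. The helper name and window size are illustrative only; word2vec then learns vectors that make these pairs mutually predictable.

```python
# Generate skip-gram (center, context) pairs: each word is paired with
# every other word inside a symmetric window around it.

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the quick brown fox".split()
pairs = skipgram_pairs(tokens, window=1)
print(pairs)
```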
Open Machine Learning Workshop organized by Alekh
Agarwal, Alina Beygelzimer, and John Langford, August 2014
The goal of this workshop is to inform people about open source machine learning systems being
developed, aid the coordination of such projects, and discuss future plans.
http://hunch.net/~nyoml/
Maxim Milakov Software
I am a researcher in machine learning and high-performance computing.
I designed and implemented nnForge - a library for training convolutional and fully connected
neural networks, with CPU and GPU (CUDA) backends.
You will find my thoughts on convolutional neural networks and the results of applying
convolutional ANNs for various classification tasks in the Blog.
http://www.milakov.org/
Alfonso Nieto-Castanon Software
http://www.alfnie.com/software
Lib Skylark
Sketching-based Matrix Computations for Machine Learning is a library for matrix
computations suitable for general statistical data analysis and optimization applications.
Many tasks in machine learning and statistics ultimately end up being problems involving
matrices: whether you're finding the key players in the bitcoin market, or inferring where tweets
came from, or figuring out what's in sewage, you'll want to have a toolkit for least-squares and
robust regression, eigenvector analysis, non-negative matrix factorization, and other matrix
computations.
Sketching is a way to compress matrices that preserves key matrix properties; it can be used to
speed up many matrix computations. Sketching takes a given matrix A and produces a sketch
matrix B that has fewer rows and/or columns than A. For a good sketch B, if we solve a problem
with input B, the solution will also be pretty good for input A. For some problems, sketches can
also be used to get faster ways to find high-precision solutions to the original problem. In other
cases, sketches can be used to summarize the data by identifying the most important rows or
columns.
A simple example of sketching is just sampling the rows (and/or columns) of the matrix, where
each row (and/or column) is equally likely to be sampled. This uniform sampling is quick and
easy, but doesn't always yield good sketches; however, there are sophisticated sampling methods
that do yield good sketches.
http://xdata-skylark.github.io/
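The uniform row-sampling sketch described in the last paragraph can be written down directly in plain Python: keep a random subset of rows, rescaled so the sketch is unbiased. A toy sketch under stated assumptions, not libSkylark's API, whose real sketches are more sophisticated.

```python
# Uniform row-sampling sketch: B keeps k of A's n rows, scaled by
# sqrt(n/k) so that B^T B approximates A^T A in expectation.
import math
import random

def uniform_row_sketch(A, k, seed=0):
    rng = random.Random(seed)
    n = len(A)
    rows = rng.sample(range(n), k)
    scale = math.sqrt(n / k)
    return [[scale * v for v in A[i]] for i in rows]

A = [[float(i + j) for j in range(3)] for i in range(10)]  # 10x3 matrix
B = uniform_row_sketch(A, k=4)
print(len(B), len(B[0]))  # 4 rows, same 3 columns
```

As the text notes, uniform sampling is quick but not always a good sketch; better schemes sample rows with data-dependent probabilities.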
Mutual Information Text Explorer
The Mutual information Text Explorer is a tool that allows interactive exploration of text
data and document covariates. See the paper or slides for information. Currently, an
experimental system is available.
http://brenocon.com/mte/
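The quantity behind the tool can be computed in a few lines: mutual information between, say, a word's presence and a document class, from a joint count table. This is a hand-rolled sketch of the standard formula, not the tool's own code.

```python
# Mutual information (in bits) from a joint count table:
# MI = sum_xy p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
import math

def mutual_information(joint):
    total = sum(sum(row) for row in joint)
    px = [sum(row) / total for row in joint]
    py = [sum(joint[x][y] for x in range(len(joint))) / total
          for y in range(len(joint[0]))]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, c in enumerate(row):
            if c:
                pxy = c / total
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

# Rows: word present/absent; columns: class A/B. Strong association -> MI > 0.
print(round(mutual_information([[40, 10], [10, 40]]), 3))
```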
Data Science Resources by Jonathan Bower on GitHub
This repo is intended to provide open source resources to facilitate learning or to point
practicing/aspiring data scientists in the right direction. It also exists so that I can keep track of
resources that are/were helpful to me and hopefully for you.
I aim to cover the full spectrum of data science and to hopefully include topics of data science
that aren't either actively covered or easy to find in the open-source world. For instance, I haven't
focused on in-depth machine learning theory since that is well covered. If you are looking for ML
theory I would look to some of the online courses, books or bootcamps. There is a lot of theory
information available online, some is linked lower on this page here, here and other info is
available with many purchasable books.
Keep in mind that this is a constant work in progress. If you have anything to add, any feedback,
or would like to be a contributor - please reach out. If there are any mistakes or typos, be patient
with me, but please let me know.
Lastly, I would add that a large portion of data science is exploratory data analysis and properly
cleaning your data to implement the tools and theory necessary to solve the problem at hand. For
each problem there are many different ways and tools to execute a successful solution - if one
method isn't working re-evaluate, re-work the problem, try another approach and/or reach out to
the community for support. Good luck and I hope this repo helpful!
https://github.com/jonathan-bower/DataScienceResources
Joseph Misiti Blog
A curated list of awesome machine learning frameworks, libraries and software (by language).
Inspired by awesome-php. Other awesome lists can be found in the awesome-awesomeness list.
https://github.com/josephmisiti/awesome-machine-learning
Michael Waskom GitHub repositories
I'm a Ph.D. student in the Department of Psychology at Stanford University, where I work with
Anthony Wagner. I use behavioral, computational, and neuroimaging methods to study cognitive
control and decision making in humans.
Previously, I spent time in John Gabrieli's lab at MIT investigating whether cognition can be
improved through training. I did my undergrad at Amherst College, where I studied philosophy
and neuroscience.
Complementing this research, I have developed a set of software libraries for statistical analysis
and visualization. These libraries aim to make computationally-based research more
reproducible and improve the visual presentation of statistical and neuroimaging results.
https://github.com/mwaskom
Visualizing distributions of data
This notebook demonstrates different approaches to graphically representing distributions of
data, specifically focusing on the tools provided by the seaborn package.
https://github.com/mwaskom/seaborn
Exploring Seaborn and Pandas based plot types in HoloViews
by Philipp John Frederic Rudiger
In this notebook we'll look at interfacing between the composability and ability to generate
complex visualizations that HoloViews provides and the great looking plots incorporated in the
seaborn library. Along the way we'll explore how to wrap different types of data in a number of
Seaborn View types, including:
- Distribution Views
- Bivariate Views
- TimeSeries Views
Additionally we explore how a Pandas dataframe can be wrapped in a general purpose View type,
which can either be used to convert the data into standard View types or be visualized directly
using a wide array of plotting options, including:
- Regression plots, correlation plots, box plots, autocorrelation plots, scatter matrices, histograms
or regular scatter or line plots.
http://philippjfr.com/blog/seabornviews/
"Machine Learning: An Algorithmic Perspective" Code by
Stephen Marsland
Remark: I couldn't open Stephen Marsland's home page.
http://www.amazon.com/Machine-Learning-Algorithmic-Perspective-Recognition/dp/
1466583282
http://www.briolat.org/assets/R/classif/Machine%20learning%20an%20algorithmic
%20perspective(2009).pdf
Sebastian Raschka GitHub Repository & Blog (Great Resources,
everything you need is there!)
https://github.com/rasbt
http://sebastianraschka.com/
Open Source Hong Kong
Open Source Hong Kong (OSHK) is a developer/contributor/user community about open
source software and technology.
http://opensource.hk/
Lamda Group, Nanjing University
Open Source Software
http://lamda.nju.edu.cn/Default.aspx?
Page=Data&NS=&AspxAutoDetectCookieSupport=1#code
GATE, General Architecture for Text Engineering
GATE is...
• open source software capable of solving almost any text processing problem
• a mature and extensive community of developers, users, educators, students and scientists
• a defined and repeatable process for creating robust and maintainable text processing workflows
• in active use for all sorts of language processing tasks and applications, including: voice of the customer; cancer research; drug research; decision support; recruitment; web mining; information extraction; semantic annotation
• the result of a €multi-million R&D programme running since 1995, funded by commercial users, the EC, BBSRC, EPSRC, AHRC, JISC, etc.
• used by corporations, SMEs, research labs and Universities worldwide
• the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, the ISO 9001 of Text Mining
• a world-class team of language processing developers
If you need to solve a problem with text analysis or human language processing you're in the right place.
https://gate.ac.uk/
CLARIN, Common Language Resources and Technology
Infrastructure
CLARIN is the Common Language Resources and Technology Infrastructure, which aims to
provide easy and sustainable access for scholars in the humanities and social sciences to digital
language data (in written, spoken, video or multimodal form), and advanced tools to discover,
explore, exploit, annotate, analyse or combine them, wherever they are located. CLARIN is
building a networked federation of European data repositories, service centres and centres of
expertise, with single sign-on access for all members of the academic community in all
participating countries. Tools and data from different centres will be interoperable, so that data
collections can be combined and tools from different sources can be chained to perform complex
operations to support researchers in their work.
At this moment the CLARIN infrastructure is still under construction, but a number of
participating centres are already offering access services to data, tools and expertise. On the
services page we show the services accessible at this moment and we explain how and by whom
the various services can be accessed.
http://www.clarin.eu/
FLaReNet, Fostering Language Resources Network
A major condition for the take-off of the field of Language Resources and Language
Technologies is the creation of a shared policy for the next years.
FLaReNet aims at developing a common vision of the area and fostering a European strategy for
consolidating the sector, thus enhancing competitiveness at EU level and worldwide.
By creating a consensus among major players in the field, the mission of FLaReNet is to identify
priorities as well as short, medium, and long-term strategic objectives and provide consensual
recommendations in the form of a plan of action for EC, national organisations and industry.
Through the exploitation of new collaborative modalities as well as workshops and meetings,
FLaReNet will sustain international cooperation and (re)create a wide Language community.
http://www.flarenet.eu/
My Data Science Resources by Viktor Shaumann
With a seemingly infinite amount of Data Science resources available online, it is very easy to get
lost. I compiled a collection of practical resources I found to be the most useful on my path of
learning Data Science. This list is continuously updated with new material.
https://github.com/vshaumann/My-Data-Science-Resources
MISCELLANEOUS
Overleaf (ex WriteLaTeX)
About Overleaf
Overleaf is a collaborative writing and publishing system that makes the whole process of
producing academic papers much quicker for both authors and publishers.
Overleaf is a free service that lets you create, edit and share your scientific ideas easily online
using LaTeX, a comprehensive and powerful tool for scientific writing.
Overleaf has grown rapidly since its launch in 2011, and today there are over 150,000 users from
over 180 countries worldwide who've created over 2 million projects using the service.
Writelatex Limited, the company behind Overleaf, was founded by John Hammersley and John
Lees-Miller, two mathematicians who worked together on the pioneering Ultra PRT Project and
who were inspired by their own experiences in academia to create a better solution for
collaborative scientific writing.
Overleaf is supported by Digital Science. Digital Science is a technology company serving the
needs of scientific research. Their mission is to provide software that makes research simpler, so
there’s more time for discovery.
Whether at the bench or in a research setting, their range of products helps to simplify workflows
and change the way science is done. Digital Science believes passionately that tomorrow's
research will be different and better than today's.
Their portfolio brands include Altmetric, Labguru, Figshare, ReadCube, ÜberResearch,
BioRAFT and Symplectic. Digital Science is a business division of Macmillan Science and
Education.
https://www.overleaf.com/2070900jhqnyz#/5252162/
Interview of Dr John Lees-Miller by Imperial College London
ACM Student Chapter
https://www.youtube.com/watch?v=kYkN0Yv56bI&spfreload=10
LISA Lab GitHub repository, Université de Montréal
https://github.com/lisa-lab
MILA, Institut des algorithmes d'apprentissage de Montréal,
Montreal Institute for Learning Algorithms
http://www.mila.umontreal.ca/
Vowpal Wabbit GitHub repository by John Langford
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with
techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive
learning.
https://github.com/JohnLangford/vowpal_wabbit
http://hunch.net/~vw/
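One of the techniques listed above, hashing, is worth a closer look: VW maps feature names into a fixed-size weight vector via a hash function, so no feature dictionary has to be kept in memory. The sketch below illustrates the idea in plain Python; it is not VW's actual implementation, and the bucket count and helper names are illustrative assumptions.

```python
# Toy illustration of the feature-hashing trick used by systems like
# Vowpal Wabbit: map arbitrary feature names to indices in a fixed-size
# weight vector, so the feature space needs no dictionary in memory.
import hashlib

NUM_BUCKETS = 2 ** 18  # a fixed table size, chosen here for illustration

def hash_feature(name: str) -> int:
    """Deterministically map a feature name to a bucket index."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def to_sparse_vector(features: dict) -> dict:
    """Convert {feature_name: value} into {bucket_index: value}.
    Hash collisions simply add together, which in practice costs
    surprisingly little accuracy for a large enough table."""
    vec = {}
    for name, value in features.items():
        idx = hash_feature(name)
        vec[idx] = vec.get(idx, 0.0) + value
    return vec

example = {"word=machine": 1.0, "word=learning": 1.0, "bias": 1.0}
sparse = to_sparse_vector(example)
```

The same hashed index is produced every time a feature name is seen, so training and prediction agree on where each feature's weight lives without any coordination.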
Google-styleguide: Style guides for Google-originated opensource projects
Every major open-source project has its own style guide: a set of conventions (sometimes
arbitrary) about how to write code for that project. It is much easier to understand a large
codebase when all the code in it is in a consistent style.
“Style” covers a lot of ground, from “use camelCase for variable names” to “never use global
variables” to “never use exceptions.” This project holds the style guidelines we use for Google
code. If you are modifying a project that originated at Google, you may be pointed to this page to
see the style guides that apply to that project.
https://github.com/google/styleguide
BIG DATA/CLOUD
COMPUTING, in English
Apache Spark Machine Learning Library
MLlib is a Spark implementation of some common machine learning (ML) functionality, as well
as associated tests and data generators. MLlib currently supports four common types of machine
learning problem settings, namely binary classification, regression, clustering and collaborative
filtering, as well as an underlying gradient descent optimization primitive.
http://spark.apache.org/docs/0.9.1/mllib-guide.html
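The "gradient descent optimization primitive" mentioned above is the workhorse under MLlib's linear methods. A minimal single-machine sketch of the idea in plain Python follows; MLlib itself is Scala/JVM code operating on distributed datasets, so this is purely illustrative, and the function name and learning rate are assumptions.

```python
# A minimal sketch of batch gradient descent, the optimization primitive
# underlying MLlib's linear methods. Plain Python, single machine,
# one-dimensional model y = w * x, for illustration only.

def gradient_descent(points, lr=0.1, iterations=100):
    """Fit y = w * x by least squares with batch gradient descent."""
    w = 0.0
    n = len(points)
    for _ in range(iterations):
        # gradient of (1/n) * sum (w*x - y)^2 with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in points) / n
        w -= lr * grad
    return w

# The data satisfies y = 3x exactly, so the fitted weight approaches 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = gradient_descent(data)
```

In the distributed setting the per-point gradient terms are computed on workers and summed on the driver each iteration; the update rule itself is unchanged.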
Ampcamp, Big Data Boot Camp
AMP Camp 5 was held at UC Berkeley and live-streamed online on November 20 and 21, 2014.
Videos and exercises from the event are available on the AMPCamp 5 page.
AMP Camps are Big Data training events organized by the UC Berkeley AMPLab about big data
analytics, machine learning, and popular open-source software projects produced by the
AMPLab. All AMP Camp curricula, and whenever possible videos of instructional talks
presented at AMP Camps, are published here and accessible for free.
About the AMPLab
The UC Berkeley AMPLab works at the intersection of machine learning, cloud computing, and
crowdsourcing; integrating Algorithms, Machines, and People (AMP) to make sense of Big Data.
http://ampcamp.berkeley.edu/5/exercises/
Spark Summit 2013 Videos
https://spark-summit.org/2013/#videos
Spark Summit 2014 Videos
https://spark-summit.org/2014/#videos
Spark Summit 2015 Videos & Slides
https://spark-summit.org/2015/
Spark Summit Training & Videos
https://www.youtube.com/user/TheApacheSpark/playlists
Databricks Videos
Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache Spark.
We’ve been working for the past six years on cutting-edge systems to extract value from Big Data.
We believe that Big Data is a huge opportunity that is still largely untapped, and we’re working to
revolutionize what you can do with it.
Open Source Commitment
Apache Spark is 100% open source, and at Databricks we are fully committed to maintaining this
model. We believe that no computing platform will win in the Big Data space unless it is fully
open source.
Spark has one of the largest open source communities in Big Data, with over 200 contributors
from 50 organizations. Databricks works closely with the community to maintain this
momentum.
https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA/videos
SF Scala & SF Bay Area Machine Learning, Joseph Bradley:
Decision Trees on Spark
Joseph talks about Machine Learning with Spark, focusing on the decision tree and (upcoming)
random forest implementations in MLlib. Spark has been established as a natural platform for
iterative ML algorithms, and trees provide a great example. This talk aims both to give insight
into the underlying implementation and to highlight best practices for using MLlib.
http://functional.tv/post/98342564544/sfscala-sfbaml-joseph-bradley-decision-trees-on-spark
Slides
https://speakerdeck.com/jkbradley/mllib-decision-trees-at-sf-scala-baml-meetup
Apache Mahout ML library
The Apache Mahout™ project's goal is to build a scalable machine learning library.
Currently Mahout supports mainly three use cases: Recommendation mining takes users'
behavior and from that tries to find items users might like. Clustering takes e.g. text documents
and groups them into groups of topically related documents. Classification learns from existing
categorized documents what documents of a specific category look like and is able to assign
unlabelled documents to the (hopefully) correct category.
https://mahout.apache.org/
Apache Mahout on Javaworld
Enjoy machine learning with Mahout on Hadoop, 2014
Mahout brings the power of scalable processing to Hadoop's huge data sets
http://www.javaworld.com/article/2241046/big-data/enjoy-machine-learning-with-mahout-on-hadoop.html
Know this right now about Hadoop, 2014
From core elements like HDFS and YARN to ancillary tools like Zookeeper, Flume, and Sqoop,
here's your cheat sheet and cartography of the ever expanding Hadoop ecosystem.
http://www.javaworld.com/article/2158789/data-storage/know-this-right-now-about-hadoop.html
MapReduce programming with Apache Hadoop, 2008
Process massive data sets in parallel on large clusters
http://www.javaworld.com/article/2077907/open-source-tools/mapreduce-programming-with-apache-hadoop.html
Hadoop Users Group UK
Recordings from meetups of the UK Hadoop Users Group. These will be a combination of tech
talks, panel sessions and other events that we run.
https://www.youtube.com/channel/UCjo2p6jTA0joX8HoUeHFcDg?spfreload=10
Deeplearning4j
Deeplearning4j is the first commercial-grade deep learning library written in Java. It is meant to
be used in business environments, rather than as a research tool for extensive data exploration.
Deeplearning4j is most helpful in solving distinct problems, like identifying faces, voices, spam or
e-commerce fraud.
Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration. By
following its conventions, you get an infinitely scalable deep-learning architecture. The
framework has a domain-specific language (DSL) for neural networks, to turn their multiple
knobs.
Deeplearning4j includes a distributed deep-learning framework and a normal deep-learning framework; i.e. it runs on a single thread as well. Training takes place in the cluster,
which means it can process massive amounts of data. Nets are trained in parallel via iterative
reduce.
The distributed framework is made for data input and neural net training at scale, and its output
should be highly accurate predictive models.
By following the links at the bottom of each page, you will learn to set up, and train with sample
data, several types of deep-learning networks. These include single- and multithread networks,
Restricted Boltzmann machines, deep-belief networks and Stacked Denoising Autoencoders.
For a quick introduction to neural nets, please see our overview.
http://deeplearning4j.org/
Udacity opencourseware "Intro to Hadoop and MapReduce"
Course Summary
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed
computing. Learn the fundamental principles behind it, and how you can use its power to make
sense of your Big Data.
Why Take This Course?
• How Hadoop fits into the world (recognize the problems it solves)
• Understand the concepts of HDFS and MapReduce (find out how it solves the problems)
• Write MapReduce programs (see how we solve the problems)
• Practice solving problems on your own
https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617
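The MapReduce programming model the course teaches can be sketched in a few lines: a mapper emits (key, value) pairs, a shuffle phase groups all values by key, and a reducer folds each group to a result. The single-process word-count sketch below, in plain Python, imitates the shape of a Hadoop Streaming job without any distribution; the function names are illustrative.

```python
# The classic MapReduce example: word count. A single-process sketch of
# the programming model (map -> shuffle/sort -> reduce), not a
# distributed implementation.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit (word, 1) for every word in an input line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum the counts for one word. Hadoop guarantees all
    values for a key arrive at the same reducer, grouped together."""
    return (word, sum(counts))

def run_job(lines):
    # Map phase
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle/sort phase: bring all values for a key together
    pairs.sort(key=itemgetter(0))
    # Reduce phase
    return dict(
        reducer(word, (c for _, c in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

counts = run_job(["big data big clusters", "big data"])
```

In real Hadoop the map and reduce phases run on different machines and the shuffle moves data over the network, but the two user-supplied functions keep exactly this shape.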
Apache Storm
Apache Storm is a free and open source distributed realtime computation system. Storm makes it
easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop
did for batch processing. Storm is simple, can be used with any programming language, and is a
lot of fun to use!
http://storm.apache.org/
http://storm.apache.org/documentation/Tutorial.html
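The contrast with batch processing drawn above is that Storm handles each tuple of an unbounded stream as it arrives, updating state incrementally instead of waiting for a complete dataset. The miniature sketch below imitates a Storm topology (a spout feeding a stateful counting bolt) in plain Python; the class and method names are illustrative assumptions, not Storm's API.

```python
# A miniature imitation of a Storm topology: a "spout" emits an unbounded
# stream of tuples and a stateful "bolt" processes them one at a time,
# updating a running count per tuple rather than in batches.
# Plain Python sketch, not Storm's actual API.
from collections import Counter

def spout(sentences):
    """Emit a stream of word tuples, one at a time."""
    for sentence in sentences:
        for word in sentence.split():
            yield word

class CountBolt:
    """Stateful bolt: updates counts incrementally per incoming tuple."""
    def __init__(self):
        self.counts = Counter()

    def execute(self, word):
        self.counts[word] += 1
        # emit the updated count downstream after every tuple
        return (word, self.counts[word])

bolt = CountBolt()
emitted = [bolt.execute(w) for w in spout(["storm storm", "hadoop"])]
```

Because the bolt emits after every tuple, downstream consumers always see up-to-date counts, which is the essence of "doing for realtime processing what Hadoop did for batch processing."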
Scaling Apache Storm by Taylor Goetz
http://www.slideshare.net/ptgoetz
https://github.com/ptgoetz
Michael Vogiatzis Blog
How to spot first stories on Twitter using Storm
As a first blog post, I decided to describe a way to detect first stories (a.k.a. new events) on Twitter
as they happen. This work is part of the Thesis I wrote last year for my MSc in Computer
Science at the University of Edinburgh. You can find the document here.
http://micvog.com/2013/09/08/storm-first-story-detection/
Prediction IO
BUILD SMARTER SOFTWARE with Machine Learning
PredictionIO is an open source machine learning server for software developers to create
predictive features, such as personalization, recommendation and content discovery.
https://prediction.io/
https://hacks.mozilla.org/2014/04/introducing-predictionio/
https://www.youtube.com/channel/UCN0jVSCIEh7eeuWXIuo316g
PredictionIO tutorial - Thomas Stone - PAPIs.io '14
PredictionIO is an open source machine learning server for software developers to create
predictive features. Traditionally, this included personalization, recommendation and content
discovery in domains such as e-commerce and media. The latest version of PredictionIO opens
the platform to many more use cases, such as churn analysis and trend detection, allowing
developers to use the power of machine learning in any web or mobile app. We will also discuss
DASE, a new software design pattern for building machine learning engines on top of
PredictionIO's scalable infrastructure. It's time to see what an open source community can
build by re-imagining software with machine learning.
https://www.youtube.com/watch?v=zeGnILRIdUk&spfreload=10
Container Cluster Manager
Kubernetes builds on top of Docker to construct a clustered container scheduling service. The
goals of the project are to enable users to ask a Kubernetes cluster to run a set of containers. The
system will automatically pick a worker node to run those containers on.
As container based applications and systems get larger, some tools are provided to facilitate sanity.
This includes ways for containers to find and communicate with each other and ways to work
with and manage sets of containers that do similar work.
When looking at the architecture of the system, we'll break it down to services that run on the
worker node and services that play a "master" role.
https://github.com/GoogleCloudPlatform/kubernetes
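The scheduling decision described above ("the system will automatically pick a worker node to run those containers on") can be sketched in miniature. The toy scheduler below simply places each container on the node with the most free capacity; real Kubernetes scheduling weighs many predicates and priorities, so this is an assumed, simplified model for illustration only.

```python
# A toy version of the cluster-scheduling decision: given containers to
# run and worker nodes with free capacity, automatically pick a node for
# each container. Real Kubernetes uses many predicates and priorities;
# this sketch just greedily picks the node with the most free capacity.

def schedule(containers, nodes):
    """containers: {name: cpu_needed}; nodes: {name: cpu_free}.
    Returns {container_name: node_name}, or raises if nothing fits."""
    placement = {}
    free = dict(nodes)
    # place the largest containers first so they are hardest to strand
    for name, cpu in sorted(containers.items(), key=lambda kv: -kv[1]):
        node = max(free, key=free.get)  # node with most remaining room
        if free[node] < cpu:
            raise RuntimeError(f"no node can fit container {name}")
        placement[name] = node
        free[node] -= cpu
    return placement

plan = schedule({"web": 2, "db": 3}, {"node-a": 4, "node-b": 4})
```

Spreading by free capacity is one of the simplest "sanity-facilitating" policies: it balances load and leaves headroom on every node for the next request.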
Domino Data Labs
Domino is a platform for modern data scientists using Python, R, Matlab, and more.
Use our cloud-hosted infrastructure to securely run your code on powerful hardware with a single
command without any changes to your code.
If you have your own infrastructure, our Enterprise offering provides powerful, easy-to-use
cluster management functionality behind your firewall.
Special offer for The Machine Learning Salon's readers:
Machine Learning Salon readers can get $50 worth of compute credits when they sign up for
Domino. Domino lets you run your analyses on powerful cloud hardware in one step, without
any setup or changes to your code. Sign up here, or email support@dominoup.zendesk.com and
tell them you are a Machine Learning Salon reader.
https://www.dominodatalab.com/
Data Science Central
Data Science Central is the industry's online resource for big data practitioners. From Analytics
to Data Integration to Visualization, Data Science Central provides a community experience that
includes a robust editorial platform, social interaction, forum-based technical support, the latest
in technology, tools and trends and industry job opportunities.
http://www.datasciencecentral.com/
Amazon Web Services Videos
https://www.youtube.com/user/AmazonWebServices/playlists
Google Cloud Computing Videos
https://cloud.google.com/docs/videos
VLAB: Deep Learning: Intelligence from Big Data, Stanford
Graduate School of Business
https://www.youtube.com/watch?v=czLI3oLDe8M&spfreload=10
Machine Learning and Big Data in Cyber Security by Eyal Kolman,
Technion Lecture
https://www.youtube.com/watch?v=G2BydTwrrJk&spfreload=10
Chaire Machine Learning Big Data, Telecom Paris Tech (Videos
in French)
Télécom ParisTech a organisé les premières rencontres de la Chaire de recherche Machine
Learning for Big data, le 26 novembre 2014, avec ses partenaires Fondation télécom, Criteo, PSA
Peugeot Citroën, Safran.
http://www.dailymotion.com/video/x2cti71_chaire-ml-big-data-premieres-rencontres_school
https://www.youtube.com/user/TelecomParisTech1/search?query=big+data
An Architecture for Fast and General Data Processing on Large
Clusters by Matei Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data volumes and
stalling processor speeds require more and more applications to scale out to distributed systems.
Today, a myriad of data sources, from the Internet to business operations to scientific instruments,
produce large and valuable data streams. However, the processing capabilities of single machines
have not kept up with the size of data, making it harder and harder to put to use. As a result, a
growing number of organizations (not just web companies, but traditional enterprises and
research labs) need to scale out their most important computations to clusters of hundreds of
machines.
At the same time, the speed and sophistication required of data processing have grown. In
addition to simple queries, complex algorithms like machine learning and graph analysis are
becoming common in many domains. And in addition to batch processing, streaming analysis of
new real-time data sources is required to let organizations take timely action. Future computing
platforms will need to not only scale out traditional workloads, but support these new applications
as well.
This dissertation proposes an architecture for cluster computing systems that can tackle emerging
data processing workloads while coping with larger and larger scales. Whereas early cluster
computing systems, like MapReduce, handled batch processing, our architecture also enables
streaming and interactive queries, while keeping the scalability and fault tolerance of previous
systems. And whereas most deployed systems only support simple one-pass computations (e.g.,
aggregation or SQL queries), ours also extends to the multi-pass algorithms required for more
complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike the specialized
systems proposed for some of these workloads, our architecture allows these computations to be
combined, enabling rich new applications that intermix, for example, streaming and batch
processing, or SQL and complex analytics.
We achieve these results through a simple extension to MapReduce that adds primitives for data
sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to efficiently
capture a wide range of workloads. We implement RDDs in the open source Spark system,
which we evaluate using both synthetic benchmarks and real user applications. Spark matches or
exceeds the performance of specialized systems in many application domains, while offering
stronger fault tolerance guarantees and allowing these workloads to be combined. We explore the
generality of RDDs from both a theoretical modeling perspective and a practical perspective to
see why this extension can capture a wide range of previously disparate workloads.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf
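The central idea of the dissertation, Resilient Distributed Datasets, is that fault tolerance can come from recording lineage (the chain of transformations that produced a dataset) rather than from replicating the data itself: a lost result is simply recomputed from its parents. The miniature plain-Python sketch below illustrates that one idea; real RDDs are partitioned, lazy, and distributed, and the class and method names here are illustrative.

```python
# The key idea behind RDDs, in miniature: record lineage (the chain of
# transformations) instead of replicating intermediate data, so a lost
# result can be recomputed from its source. Illustrative sketch only;
# real RDDs are partitioned, lazy, and distributed.

class ToyRDD:
    def __init__(self, source, transform=None):
        self.source = source        # parent ToyRDD, or a raw data list
        self.transform = transform  # function applied to the parent
        self._cache = None          # materialized data, may be lost

    def map(self, fn):
        return ToyRDD(self, lambda data: [fn(x) for x in data])

    def filter(self, pred):
        return ToyRDD(self, lambda data: [x for x in data if pred(x)])

    def collect(self):
        if self._cache is None:         # lost, or never computed
            if self.transform is None:  # base dataset: just read it
                self._cache = list(self.source)
            else:                       # recompute from lineage
                self._cache = self.transform(self.source.collect())
        return self._cache

base = ToyRDD([1, 2, 3, 4])
squares = base.map(lambda x: x * x).filter(lambda x: x > 4)
first = squares.collect()
squares._cache = None          # simulate losing the materialized result
recovered = squares.collect()  # rebuilt from lineage, no replication
```

Because each dataset knows how it was derived, fault recovery costs only recomputation of the lost pieces, which is what lets Spark keep MapReduce-style fault tolerance while supporting multi-pass and interactive workloads.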
Big Data Requires Big Visions For Big Change | Martin Hilbert
| TEDxUCL
At the University of California, Davis, Martin thinks about the fundamental theories of how
digitization affects society.
During his 15 years at the United Nations Secretariat, Martin assisted governments to take
advantage of the digital revolution. When the ‘big data’ age arrived, his research was the first to
quantify the historical growth of how much technologically mediated information there actually
is in the world. He is convinced that ‘big data’ is a huge opportunity for making the world a
better place. After joining the faculty of the University of California, Davis, he had more time to
think more deeply about the theoretical underpinning and fundamental limitations of the ‘big
data’ revolution. When TEDxUCL asked him if there is a limit to the power of data, he
answered with the fundamental limitation to all empirical science. The fundamental limit of ‘big
data’ has to do with social change and how we envision the future. Luckily, the digital age also
provides solutions for fine-tuning our future visions. Martin holds doctorates in Economics and
Social Sciences, and in Communication, and has provided hands-on technical assistance to
Presidents, government experts, legislators, diplomats, NGOs, and companies in over 20
countries.
https://www.youtube.com/watch?v=UXef6yfJZAI&spfreload=10
http://www.martinhilbert.net/
Ethical Quandary in the Age of Big Data | Justin Grace |
TEDxUCL
This talk was given at a local TEDx event, produced independently of the TED Conferences.
Data is now everywhere. The ‘internet era’ has now passed and we are entering the era of data.
Data use and misuse can lead to both powerful positive change or disaster. Here I discuss the
questions we should ask about data and present three case studies where organisations have
generated controversy from their data practices. I finish by touching on what we can do to take
ownership of our data.
Justin is a freelance data scientist who has worked in academia, technology, healthcare and most
recently digital media with the Guardian. He is passionate about all things data and
understanding how its use and misuse shapes the world we live in and how this affects our
relationships with organisations and each other.
https://www.youtube.com/watch?v=mVZ78kdduyY&spfreload=10
Big Data & Dangerous Ideas | Daniel Hulme | TEDxUCL
This talk was given at a local TEDx event, produced independently of the TED Conferences.
This is an illuminating and animated talk about how Data and Artificial Intelligence affect our
everyday lives. It provides a framework for anyone to understand the data-driven decision-making
process, and raises critical moral, ethical and legal questions that society needs to address to
ensure that our rights are kept safe and that we safeguard our very own existence.
Daniel is the Founder and CEO of Satalia (NPComplete Ltd), a spin-out of UCL that provides
unique algorithmic technology and professional services to solve industries' data-driven decision
problems. He is passionate about emerging technology and regularly speaks at events, with
interests in Algorithms, Optimisation, Analytics, Big Data and the Future Internet. Daniel has
been awarded a Master's in Computer Science with Machine Learning and a Doctorate in
Computational Complexity from UCL. He is the Director of the UCL Business Analytics MSc, and
has Senior Researcher and Lecturing positions in Computer Science and Management Science
at UCL and Pearson College. He is a Visiting Fellow of the Big Innovation Centre, and has
advisory and executive positions across world-wide companies in the area of Education,
Analytics, Big Data, Data-driven Decision Making and Open-Innovation. He holds an
international Kauffman Global Entrepreneur Scholarship and actively promotes
entrepreneurship and technology innovation across the globe.
https://www.youtube.com/watch?v=tLQoncvCKxs&spfreload=10
http://www0.cs.ucl.ac.uk/staff/D.Hulme/
List of good free Programming and Data Resources,
BITBOOTCAMP
We are a group of data enthusiasts with years of experience working at leading financial
companies on Wall Street. In Jan 2014, we started Bit Bootcamp: an intensive and immersive big
data boot camp to spread the knowledge and to address the shortage of good talent in the
industry.
The motivation for the bootcamp comes from our own difficulties faced while we were trying to
hire new talent. No matter how much money we threw at the problem, we could not find people
with the right skills. Then we figured we might as well train them ourselves.
http://www.bitbootcamp.com/#!resources/ctx5
BIG Data, Medical Imaging and Machine Intelligence by
Professor H.R.Tizhoosh at the University of Waterloo
This is a talk by Professor H.R.Tizhoosh at the University of Waterloo, Ontario, Canada
(January 21, 2015).
https://www.youtube.com/watch?v=Pkk6Lad2N5g&spfreload=
Session 6: Science in the cloud: big data and new technology
The way science is undertaken has changed dramatically in the past 10-15 years, and it is set to
change even more in the coming decade. New technologies, such as online databases, virtual
machines, cloud computing and machine learning are becoming commonplace. This session will
explore such innovations and their role in maximising the scientific value from astronomy data, in
particular from the next generation of telescopes and simulations.
Chair
Associate Professor Darren Croton
Swinburne University of Technology
https://www.youtube.com/watch?v=xHoMI1nC8_4&spfreload=10
MapReduce for C: Run Native Code in Hadoop by Google
Open Source Software
We are pleased to announce the release of MapReduce for C (MR4C), an open source
framework that allows you to run native code in Hadoop.
MR4C was originally developed at Skybox Imaging to facilitate large scale satellite image
processing and geospatial data science. We found the job tracking and cluster management
capabilities of Hadoop well-suited for scalable data handling, but also wanted to leverage the
powerful ecosystem of proven image processing libraries developed in C and C++. While many
software companies that deal with large datasets have built proprietary systems to execute native
code in MapReduce frameworks, MR4C represents a flexible solution in this space for use and
development by the open source community.
http://google-opensource.blogspot.sg/2015/02/mapreduce-for-c-run-native-code-in.html
Machine Learning & Big Data at Spotify with Andy Sloane, Big
Data Madison Meetup
Back for a return engagement, Spotify engineer Andy Sloane will cover how they use machine
learning at music recommendation service Spotify. He will also discuss some large database
"tricks" Spotify uses to do real-time recommendations and fingerprint matching. Please join us!
https://www.youtube.com/watch?v=MX_ARH-KoDg&spfreload=10
http://www.meetup.com/BigDataMadison/events/216561502/
Slides
http://www.slideshare.net/AndySloane/machine-learning-spotify-madison-big-data-meetup
Hands on tutorial on Neo4J with Max De Marzi, Big Data
Madison Meetup
Back for a return engagement, developer Max De Marzi is coming from Chicago to give a
tutorial on Neo4J, the popular graph database application.
https://www.youtube.com/watch?v=l4EmLFaMxkA
TED Talk: What do we do with all this big data? by Susan
Etlinger
Does a set of data make you feel more comfortable? More successful? Then your interpretation
of it is likely wrong. In a surprisingly moving talk, Susan Etlinger explains why, as we receive
more and more data, we need to deepen our critical thinking skills. Because it's hard to move
beyond counting things to really understanding them.
https://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data
Big Data's Big Deal by Viktor Mayer-Schonberger, Oxford's
Podcast
Duration: 0:44:14
Added: 20 Nov 2014
Big Data promises to change all sectors of our economy, and deeply affect our society. But
beyond the current hype, what are Big Data's salient qualities, and do they warrant the high
hopes? These are some of the questions that this talk addresses.
http://podcasts.ox.ac.uk/big-datas-big-deal-0
BID Data Project - Big Data Analytics with Small Footprint
Welcome to the BID Data Project! Here you will find resources for the fastest Big Data tools on
the Web. See our Benchmarks on github. BIDMach running on a single GPU-equipped host
holds the records for many common machine learning problems, on single nodes or clusters.
Try It! BIDMach is an interactive environment designed to make it extremely easy to build and
use machine learning models. BIDMach runs on Linux, Windows 7&8, and Mac OS X, and we
have a pre-loaded Amazon EC2 instance. See the instructions in the Download Section.
Develop with it. BIDMach includes core classes that take care of managing data sources,
optimization and distributing data over CPUs or GPUs. It’s very easy to write your own models
by generalizing from the models already included in the Toolkit.
Explore. Our Publications Section includes published reports on the project, and the topics of
forthcoming papers.
Contribute. BIDMach includes many popular machine learning algorithms. But there is much
more work to do. In progress we have Random Forests, extremely fast Gibbs samplers for
Bayesian graphical models, distributed Deep Learning networks, and graph algorithms. Ask us
for an unpublished report on these topics. Please use Github’s issues page for bug reports or
suggestions:
Lightning Overview
The BID Data Suite is a collection of hardware, software and design patterns that enable fast,
large-scale data mining at very low cost.
http://bid2.berkeley.edu/bid-data-project/
SF Big Analytics and SF Machine learning meetup: Machine
Learning at the Limit by Prof. John Canny
Machine Learning at the Limit
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is
driven toward the limits imposed by CPU, memory, network etc. This can lead to dramatic
improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs
to achieve two to three orders of magnitude improvements over other toolkits on single
machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/
MLLib, Powergraph) running on hundreds of nodes, and BIDMach with a GPU outperforms
these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms)
which do require cluster computing, we have developed a rooflined network primitive called
"Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and
empirically holds the record for distributed Pagerank. Beyond rooflining, we believe there are
great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very
general tool for inference, but is typically much slower than alternatives. SAME (State
Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal
parameter estimation. We show that it has high parallelism, and a fast GPU implementation.
Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running
time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are
extending this approach to general graphical models, an area where there is currently a void of
(practically) fast tools. It seems at least plausible that a general-purpose solution based on these
techniques can closely approach the performance of custom algorithms.
https://www.youtube.com/watch?v=smMy1lIG9WQ&spfreload=10
COMPETITIONS, in English
Angry Birds AI Competition
Here you will find all the information about upcoming and previous Angry Birds AI
Competitions. The task of this competition is to develop a computer program that can
successfully play Angry Birds. The long term goal is to build an intelligent Angry Birds
playing agent that can play new levels better than the best human players.
http://www.aibirds.org/
ChaLearn
Mission:
Machine Learning is the science of building hardware or software that can achieve tasks by
learning from examples. The examples often come as {input, output} pairs. Given new inputs a
trained machine can make predictions of the unknown output.
Examples of machine learning tasks include:
• automatic reading of handwriting
• assisted medical diagnosis
• automatic text classification (classification of web pages; spam filtering)
• financial predictions
We organize challenges to stimulate research in this field. The web sites of past challenges
remain open for post-challenge submission as ever-going benchmarks.
ChaLearn is a tax-exempt organization under section 501(c)(3) of the US IRS code. DLN:
17053090370022.
http://www.chalearn.org/
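The {input, output} framing in ChaLearn's mission statement can be made concrete in a few lines: store example pairs, then predict the output of the stored example whose input is closest to a new input. The 1-nearest-neighbour sketch below is deliberately minimal and purely illustrative; the function names and toy data are assumptions.

```python
# The {input, output} framing of machine learning in miniature: a
# 1-nearest-neighbour "machine" learns from example pairs, then predicts
# the unknown output for a new input. Deliberately minimal illustration.

def train(pairs):
    """Store the {input, output} examples; that is the whole 'model'."""
    return list(pairs)

def predict(model, x):
    """Return the output of the stored example whose input is closest."""
    nearest = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Toy examples: inputs below 5 are labelled "low", the rest "high".
model = train([(1, "low"), (2, "low"), (8, "high"), (9, "high")])
label = predict(model, 7)
```

Challenge entries differ only in how much cleverer than "copy the nearest example" their trained machine is; the evaluation contract (inputs in, predicted outputs out) stays the same.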
ChaLearn Automatic Machine Learning Challenge (AutoML)
https://www.codalab.org/competitions/2321
ImageNet Large Scale Visual Recognition Challenge 2015
(ILSVRC2015)
ImageNet is an image database organized according to the WordNet hierarchy (currently only
the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of
images. Currently we have an average of over five hundred images per node. We hope ImageNet
will become a useful resource for researchers, educators, students and all of you who share our
passion for pictures.
http://image-net.org/challenges/LSVRC/2015/
Kaggle
Kaggle is the world's largest community of data scientists. They compete with each other to solve
complex data science problems, and the top competitors are invited to work on the most
interesting and sensitive business problems from some of the world’s biggest companies through
Masters competitions.
https://www.kaggle.com/competitions
Kaggle Competition Past Solutions
We learn more from code, and from great code. Not necessarily always the first-ranking solution,
because we also learn what distinguishes a stellar solution from a merely good one. I will post
solutions I came upon so we can all learn to become better!
I collected the following source code and interesting discussions from past Kaggle
competitions for learning purposes. Not all competitions are listed because I am collecting them
manually, and some are not listed because no one shared a solution. I will add more as
time goes by. Thank you.
http://www.chioka.in/kaggle-competition-solutions/
Kaggle Connectomics Winning Solution Research Article
Simple connectome inference from partial correlation statistics in calcium imaging
http://arxiv.org/abs/1406.7865
Solution to the Galaxy Zoo Challenge by Sander Dieleman
http://benanne.github.io/2014/04/05/galaxy-zoo.html
https://github.com/benanne/kaggle-galaxies
Winning 2 Kaggle in class competitions on spam
http://mlwave.com/winning-2-kaggle-in-class-competitions-on-spam/
Matlab Benchmark for Packing Santa’s Sleigh translated into
Python
http://beatingthebenchmark.blogspot.co.uk/search?updated-min=2013-01-01T00:00:00-08:00&updated-max=2014-01-01T00:00:00-08:00&max-results=4
Machine learning best practices we've learned from hundreds of
competitions - Ben Hamner (Kaggle)
Ben Hamner is Chief Scientist at Kaggle, leading its data science and development teams. He is
the principal architect of many of Kaggle's most advanced machine learning projects including
current work in Eagle Ford and GE's flight arrival prediction and optimization modeling.
https://www.youtube.com/watch?v=9Zag7uhjdYo
TEDx San Francisco, Jeremy Howard talk (Connecting Devices
with Algorithms)
http://tedxsf.org/videos/#tedxsf-connected-reality
CrowdANALYTICS
https://crowdanalytix.com/community
Challenges for governmental applications
https://www.challenge.gov/list/
InnoCentive Challenge Center
https://www.innocentive.com/ar/challenge/browse
TunedIT
http://tunedit.org/
Ants, AI Challenge, sponsored by Google, 2011
The AI Challenge is all about creating artificial intelligence, whether you are a beginning
programmer or an expert. Using one of the easy-to-use starter kits, you will create a computer
program (in any language) that controls a colony of ants which fight against other colonies for
domination.
http://ants.aichallenge.org/
International Collegiate Programming Contest
The ACM International Collegiate Programming Contest (ICPC) is the premiere global
programming competition conducted by and for the world’s universities. The competition
operates under the auspices of ACM, is sponsored by IBM, and is headquartered at Baylor
University. For nearly four decades, the ICPC has grown to be a game-changing global
competitive educational program that has raised aspirations and performance of generations of
the world’s problem solvers in the computing sciences and engineering.
http://icpc.baylor.edu/welcome.icpc
Dream challenges
The Dialogue on Reverse Engineering Assessment and Methods (DREAM) project is an initiative
to advance the field of systems biology through the organization of Challenges to foster the
development of predictive models that allow scientists to better understand human disease.
Challenges engage broad and diverse communities of scientists to competitively solve a specific
problem in a given time period. The concept fosters collaboration between scientists through
shared data and approaches.
DREAM has developed the “Challenge” concept by launching 27 successful challenges over the past seven years. Sage Bionetworks and DREAM merged in early 2013 in order to develop Challenges that engage a broader participation of the research community in open science projects hosted on Synapse, and that provide a meaningful impact to both discovery and clinical research.
By presenting the research community with well-formulated questions that usually involve
complex data, we effectively enable the sharing and improvement of predictive models,
accelerating many-fold the transformation of this data into useful scientific knowledge. Our
ultimate goal is to foster collaborations of like-minded researchers that together will find the
solution for vexing problems that matter most to citizens and patients.
https://www.synapse.org/#!Wiki:syn1929437/ENTITY
Texata
TEXATA
The World’s Big Data Analytics Showdown. For Business.
TEXATA 2015 is the annual Big Data Analytics World Championships for Business and
Enterprise. Thousands of the best and brightest professionals and students from over 100
countries working across Computer Science, Maths, Technology, Engineering and Analytical
disciplines compete to develop and apply their skills to real-world business case studies and
challenges. The competition involves two online qualification rounds (4 hours each) and Live
World Finals in Austin, Texas. TEXATA 2015 is a World Championship Event independently
organized and administered by the Professional Services Champions League (PSCL).
http://www.texata.com/
IoT World Forum Young Women's Innovation Grand Challenge
NEWSFLASH!
It is with great pleasure that we announce the twenty (20) semi-finalists of the 2015 IoT World
Forum Young Women’s Innovation Grand Challenge.
Our semi-finalists are listed here. We wish all our semi-finalists good luck as they prepare for the
contest finals. Check back on July 30th to see who made the finals!
The IoT World Forum Young Women’s Innovation Grand Challenge is a global innovation
challenge open to young women between the ages of 13-18. The aim of the challenge is to
recognize, promote, and reward young innovators as they come up with new uses for Internet of
Things technologies.
What is a problem you see today or expect to emerge in the next 5 years? How can connecting
more devices and everyday objects to the internet or other networks help to solve this problem? If
you’re a student who likes to take a creative approach to projects this challenge is for you! Use
your skills to help envision new solutions that can be enabled with Internet of Things
technologies both now and in the future.
The Challenge: Your goal is to come up with new ideas on how technologies from the Internet of
Things can improve education, healthcare, manufacturing, energy, retail, transportation, smart
cities or find new solutions that can cut through many industries.
http://iotchallenge-cisco.younoodle.com/
COMPETITIONS, in French
Coming soon …
COMPETITIONS, in Russian
Russian AI Cup - Artificial Intelligence Programming
Competition
Russian AI Cup is an open competition for programming artificial intelligence. Try your hand at programming a strategy game! It's easy, clear and fun!
The third Russian AI Cup championship is called CodeHockey. You have to program the artificial intelligence of the players on a team. Your strategy will compete against others in the sandbox and in the championship. You can use any of these programming languages: C++, Java, C#, Python, Pascal, or Ruby. The sandbox is now open. Good luck!
Novice programmers - students and schoolchildren - as well as professionals are invited to participate. No special knowledge is required, just basic programming skills.
http://russianaicup.ru/
OPEN DATASET, in English
Friday Lunch time Lectures at the Open Data Institute, Videos,
slides and podcasts (not to be missed!)
The ODI Friday lunchtime lecture series is now available to listen to and download as a podcast
on iTunes or via the RSS feed.
Friday lunchtime lectures are for everyone and free to attend. You bring your lunch, we provide
tea and coffee, an interesting talk, and enough time to get back to your desk.
They run from 1pm-1.45pm, with informal networking until 2pm, weekly during UK school
term-times. Each lecture runs for around 20 minutes, leaving time for questions afterwards. The
lectures do not require any specialist knowledge, but are focused around communicating the
meaning and impact of open data in all areas of life.
http://theodi.org/lunchtime-lectures
Open data Institute: Certify your open data
What does a certificate look like?
It's a badge that links to a description of your open data. The description covers things like how often it's updated, what format it's in, and who and where it came from.
https://certificates.theodi.org/
The Text REtrieval Conference (TREC) Datasets
The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards
and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the
TIPSTER Text program. Its purpose was to support research within the information retrieval
community by providing the infrastructure necessary for large-scale evaluation of text retrieval
methodologies. In particular, the TREC workshop series has the following goals:
• to encourage research in information retrieval based on large test collections;
• to increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;
• to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and
• to increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems.
TREC is overseen by a program committee consisting of representatives from government,
industry, and academia. For each TREC, NIST provides a test set of documents and questions.
Participants run their own retrieval systems on the data, and return to NIST a list of the retrieved
top-ranked documents. NIST pools the individual results, judges the retrieved documents for
correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for
participants to share their experiences.
This evaluation effort has grown in both the number of participating systems and the number of
tasks each year. Ninety-three groups representing 22 countries participated in TREC 2003. The
TREC test collections and evaluation software are available to the retrieval research community
at large, so organizations can evaluate their own retrieval systems at any time. TREC has
successfully met its dual goals of improving the state-of-the-art in information retrieval and of
facilitating technology transfer. Retrieval system effectiveness approximately doubled in the first
six years of TREC.
TREC has also sponsored the first large-scale evaluations of the retrieval of non-English
(Spanish and Chinese) documents, retrieval of recordings of speech, and retrieval across multiple
languages. TREC has also introduced evaluations for open-domain question answering and
content-based retrieval of digital video. The TREC test collections are large enough so that they
realistically model operational settings. Most of today's commercial search engines include
technology first developed in TREC.
http://trec.nist.gov/data.html
HDX Humanitarian Data Exchange
What is HDX?
The goal of the Humanitarian Data Exchange (HDX) is to make humanitarian data easy to find
and use for analysis. We are working on three elements that will eventually combine into an
integrated data platform.
Repository
The HDX repository, where data providers can upload their raw data spreadsheets for others to
find and use.
Analytics
HDX analytics, a database of high-value data that can be compared across countries and crises,
with tools for analysis and visualisation.
Standards
Standards to help share humanitarian data through the use of a consensus Humanitarian
Exchange Language.
https://data.hdx.rwlabs.org/dataset
World Data Bank
Explore. Create. Share: Development Data
DataBank is an analysis and visualisation tool that contains collections of time series data on a
variety of topics. You can create your own queries; generate tables, charts, and maps; and easily
save, embed, and share them.
The World Bank Group has set two goals for the world to achieve by 2030:
• End extreme poverty by decreasing the percentage of people living on less than $1.25 a day to no more than 3%
• Promote shared prosperity by fostering the income growth of the bottom 40% for every country
The World Bank is a vital source of financial and technical assistance to developing countries
around the world. We are not a bank in the ordinary sense but a unique partnership to reduce
poverty and support development. The World Bank Group comprises five institutions managed
by their member countries.
Established in 1944, the World Bank Group is headquartered in Washington, D.C. We have more
than 10,000 employees in more than 120 offices worldwide.
http://databank.worldbank.org/data/home.aspx
US Dataset
The home of the U.S. Government’s open data
Here you will find data, tools, and resources to conduct research, develop web and mobile
applications, design data visualizations, and more.
http://www.data.gov/
US City Open Data Census
http://us-city.census.okfn.org/
Machine Learning repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data
generators that are used by the machine learning community for the empirical analysis of
machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha
and fellow graduate students at UC Irvine. Since that time, it has been widely used by students,
educators, and researchers all over the world as a primary source of machine learning data sets.
As an indication of the impact of the archive, it has been cited over 1000 times, making it one of
the top 100 most cited "papers" in all of computer science. The current version of the web site
was designed in 2007 by Arthur Asuncion and David Newman, and this project is in
collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from
the National Science Foundation is gratefully acknowledged.
https://archive.ics.uci.edu/ml/datasets.html
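Many files in the UCI archive are plain comma-separated values with no header row, with the column meanings documented in a companion `.names` file. A minimal sketch of parsing that format with the standard library (the sample rows below are in the style of the classic Iris data set):

```python
import csv
import io

# A few headerless CSV rows in the style of the UCI Iris data file;
# in practice you would open the downloaded .data file instead.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# Split each row into numeric features and a class label.
rows = [
    ([float(v) for v in r[:-1]], r[-1])
    for r in csv.reader(sample)
]
print(len(rows), rows[0][1])   # 3 Iris-setosa
```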
IMAGENET
ImageNet is an image database organized according to the WordNet hierarchy (currently only
the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of
images. Currently we have an average of over five hundred images per node. We hope ImageNet
will become a useful resource for researchers, educators, students and all of you who share our
passion for pictures.
Who uses ImageNet?
We envision ImageNet as a useful resource to researchers in the academic world, as well as
educators around the world.
Does ImageNet own the images? Can I download the images?
No, ImageNet does not own the copyright of the images. ImageNet only provides thumbnails
and URLs of images, in a way similar to what image search engines do. In other words,
ImageNet compiles an accurate list of web images for each synset of WordNet. For researchers
and educators who wish to use the images for non-commercial research and/or educational
purposes, we can provide access through our site under certain conditions and terms; see the site for details.
http://www.image-net.org/
Stanford Large Network Dataset Collection
Social networks : online social networks, edges represent interactions between people
Networks with ground-truth communities : ground-truth network communities in social and
information networks
Communication networks : email communication networks with edges representing
communication
Citation networks : nodes represent papers, edges represent citations
Collaboration networks : nodes represent scientists, edges represent collaborations (co-authoring
a paper)
Web graphs : nodes represent webpages and edges are hyperlinks
Amazon networks : nodes represent products and edges link commonly co-purchased products
Internet networks : nodes represent computers and edges communication
Road networks : nodes represent intersections and edges roads connecting the intersections
Autonomous systems : graphs of the internet
Signed networks : networks with positive and negative edges (friend/foe, trust/distrust)
Location-based online social networks : Social networks with geographic check-ins
Wikipedia networks and metadata : Talk, editing and voting data from Wikipedia
Twitter and Memetracker : Memetracker phrases, links and 467 million Tweets
Online communities : Data from online communities such as Reddit and Flickr
Online reviews : Data from online review systems such as BeerAdvocate and Amazon
Information cascades : ...
http://snap.stanford.edu/data/
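The SNAP collection distributes most graphs as plain-text edge lists: comment lines starting with `#`, then one whitespace-separated source/target pair per line. A minimal sketch of reading that format into degree counts (the sample edges here are made up for illustration):

```python
import io
from collections import defaultdict

# Stand-in for an open SNAP edge-list file; real files look the same.
sample = io.StringIO(
    "# Directed graph: example.txt\n"
    "# FromNodeId\tToNodeId\n"
    "0\t1\n"
    "0\t2\n"
    "1\t2\n"
    "2\t0\n"
)

# Count outgoing edges per node, skipping comment lines.
out_degree = defaultdict(int)
for line in sample:
    if line.startswith("#"):
        continue
    src, dst = line.split()
    out_degree[src] += 1
print(dict(out_degree))   # {'0': 2, '1': 1, '2': 1}
```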
Deep Learning datasets
Deep Learning is a new area of Machine Learning research, which has been introduced with the
objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.
This website is intended to host a variety of resources and pointers to information about Deep
Learning. In these pages you will find
• a reading list,
• links to software,
• datasets,
• a list of deep learning research groups and labs,
• a list of announcements for deep learning related jobs (job listings),
• as well as tutorials and cool demos.
http://deeplearning.net/datasets/
Open Government Data (OGD) Platform India
https://data.gov.in/
Yahoo Datasets
We have various types of data available to share. They are categorized into Ratings, Language,
Graph, Advertising and Market Data, Computing Systems and an appendix of other relevant
data and resources available via the Yahoo! Developer Network.
http://webscope.sandbox.yahoo.com/catalog.php
Windows Azure Marketplace
One-Stop Shop for Premium Data and Applications
Hundreds of Apps, Thousands of Subscriptions, Trillions of Data Points
https://datamarket.azure.com/browse/data?price=free
Amazon Public Data Sets
Public Data Sets on AWS provides a centralized repository of public data sets that can be
seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at
no charge for the community, and like all AWS services, users pay only for the compute and
storage they use for their own applications. Learn more about Public Data Sets on AWS and visit
the Public Data Sets forum.
http://aws.amazon.com/datasets/
Wikipedia: Database Download
Wikipedia offers free copies of all available content to interested users. These databases can be
used for mirroring, personal use, informal backups, offline use or database queries (such as for
Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons
Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License
(GFDL). Images and other files are available under different terms, as detailed on their
description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
https://en.wikipedia.org/wiki/Wikipedia:Database_download
Gutenberg project (Free books available in different format,
useful for NLP)
Project Gutenberg offers 45,541 free ebooks to download (as of 5 June 2014).
http://www.gutenberg.org/
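Since the books come as plain text, a common NLP starting point is simple tokenization and word counting. A minimal sketch using the standard library (the string below is a tiny stand-in for a downloaded Gutenberg file; real files also carry a licence header and footer that should be stripped before analysis):

```python
import re
from collections import Counter

# Stand-in for the contents of a Gutenberg plain-text ebook.
text = "It was the best of times, it was the worst of times."

# Lowercase and extract word tokens, then count them.
tokens = re.findall(r"[a-z']+", text.lower())
freq = Counter(tokens)
print(freq["times"], freq["it"])   # 2 2
```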
Freebase
Use Freebase data
Freebase data is free to use under an open license. You can:
Query Freebase using our Search, Topic, or MQL APIs
Download our weekly data dumps
http://www.freebase.com/
Datamob Data
http://datamob.org/datasets
Reddit Datasets
http://www.reddit.com/r/datasets/
100+ Interesting Data Sets for Statistics
Summary: Looking for interesting data sets? Here's a list of more than 100 of the best stuff, from
dolphin relationships to political campaign donations to death row prisoners.
http://rs.io/100-interesting-data-sets-for-statistics/
Data portal of the City of Chicago
https://data.cityofchicago.org/
A gold mine where we can find data sets such as the names, salaries, and positions of everyone working for the City of Chicago!
https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w
Data portal of the City of Seattle
https://data.seattle.gov/browse
Data portal of the City of LA
https://data.lacity.org/
California Department of Water Resources
DWR has many programs and data tools to collect and disseminate information on water
resources.
All Water Data Topics…
http://www.water.ca.gov/nav/index.cfm?id=106
CALIFORNIA DATA EXCHANGE CENTER (CDEC)
With the cooperation of over 140 other agencies, the CDEC provides real-time, forecast, and
historical hydrologic data. This data includes water discharge in rivers, water storage in
reservoirs, precipitation accumulation, and water content in snow pack, primarily focused on flood management. However, the data is also helpful for determining general water availability
and natural supply trends.
More about CDEC
http://cdec.water.ca.gov/
CALIFORNIA IRRIGATION MANAGEMENT INFORMATION SYSTEM (CIMIS)
CIMIS is a network of over 120 automated weather stations in California. CIMIS was developed
in 1982 by DWR and the University of California, Davis to assist California's irrigators to
manage their water resources efficiently.
More about CIMIS
http://wwwcimis.water.ca.gov/
WATER DATA LIBRARY
The library provides geographic-based data on water conditions.
More about the Water Data Library
http://www.water.ca.gov/waterdatalibrary/
INTERAGENCY ECOLOGICAL PROGRAM
The Interagency Ecological Program (IEP) provides ecological information and scientific
leadership for use in management of the San Francisco Estuary.
More about IEP
http://www.water.ca.gov/iep/
INTEGRATED WATER RESOURCES INFORMATION SYSTEM (IWRIS)
IWRIS is a one-stop shop for state-wide water resources information. It integrates multidisciplinary data to support Integrated Regional Water Management.
More about IWRIS
http://www.water.ca.gov/iwris/
http://www.water.ca.gov/data_home.cfm
Data portal of the City of Dallas
https://www.dallasopendata.com/browse
Data portal of the City of Austin
https://data.austintexas.gov/
How to produce and use datasets: lessons learned, mlwave
http://mlwave.com/how-to-produce-and-use-datasets-lessons-learned/
MITx and HarvardX release MOOC datasets and visualization
tools
http://newsoffice.mit.edu/2014/mitx-and-harvardx-release-mooc-datasets-and-vizualization-tools
Finding the perfect house using open data, Justin Palmer’s Blog
http://dealloc.me/2014/05/24/opendata-house-hunting/
Synapse
A private or public workspace that allows you to aggregate, describe, and
share your research.
A tool to improve reproducibility of data intensive science, recording progress
as you work with tools such as R and Python.
A set of living research projects enabling contribution to large-scale collaborative
solutions to scientific problems.
https://www.synapse.org/
NYC Taxi Trips Data from 2013
These data were made publicly available thanks to Chris Whong who did the heavy lifting. He is
also providing links to a bittorrent where the data can be downloaded much faster. Read more
about it here.
http://www.andresmh.com/nyctaxitrips/
Sebastian Raschka’s Dataset Collections
https://github.com/rasbt/pattern_classification/blob/master/resources/dataset_collections.md
Awesome Public Datasets by Xiaming Chen, Shanghai, China
This list of public data sources is collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free; however, some are not.
https://github.com/caesar0301/awesome-public-datasets
I am now a Ph.D. candidate with Prof. Yaohui Jin at Shanghai Jiao Tong Univ. I received my B.S. (2010) in Optical Information and Science Technology at Xidian University, Xi'an, China. My research interests come from the measurement and analysis of network traffic, especially renewed models and characteristics of network traffic, using data mining techniques and high-performance processing platforms like Network Processors and distributed processing systems like Hadoop/MapReduce or Spark.
If you are interested in my articles, research, or projects, you can reach me via email or other channels such as GitHub.
Enjoy! :-)
http://xiaming.me/pages/about.html
UK Dataset
Opening up government
http://data.gov.uk/
LONDON DATASTORE - 601 datasets found (28-08-2015)
Welcome to the new look DataStore
Over the last few months we have been busy updating London Datastore to deliver a host of
practical new features - improved (geography based) searches, dataset previews and APIs all of
which will make for a much sleeker experience. The technical improvements are there to support
our broader aim of kick-starting collaboration so that the value of data in our city reaches its full
potential.
Have a look around, read the introductory blog and let us know what you think.
http://data.london.gov.uk/dataset
Transport For London Open Data, UK
https://tfl.gov.uk/info-for/open-data-users/our-open-data
Gaussian Processes List of Datasets
Welcome to the web site for theory and applications of Gaussian Processes
Gaussian Processes are a powerful non-parametric machine learning technique for constructing comprehensive probabilistic models of real-world problems. They can be applied to geostatistics; supervised, unsupervised and reinforcement learning; principal component analysis; system identification and control; rendering music performance; optimization; and many other tasks.
People
Geology & Modelling Research Group at Rio Tinto Centre for Mine Automation, ACFR,
University of Sydney
http://gaussianprocess.com/datasets.php
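As a minimal sketch of the core idea behind GP regression (squared-exponential kernel, posterior mean only, with made-up toy data; real applications would also compute the posterior variance and fit the hyperparameters):

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two 1-D point arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

# Noisy observations of sin(x) on [0, 5].
rng = np.random.default_rng(1)
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
X_test = np.array([1.5, 3.0])

# GP posterior mean: K_*x (K_xx + sigma^2 I)^-1 y.
K = rbf(X, X) + 0.1**2 * np.eye(len(X))
post_mean = rbf(X_test, X) @ np.linalg.solve(K, y)
```

The posterior mean at the test points should track sin(x) closely, since the noise level is small and the kernel's length scale matches the function's smoothness.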
The New York Times Linked Open Data
For the last 150 years, The New York Times has maintained one of the most authoritative news
vocabularies ever developed. In 2009, we began to publish this vocabulary as linked open data.
The Data
As of 13 January 2010, The New York Times has published approximately 10,000 subject headings as linked open data under a CC BY license. We provide both RDF documents and human-friendly HTML versions. The table below gives a breakdown of the various tag types and
mapping strategies on data.nytimes.com.
Type            Manually Mapped Tags    Automatically Mapped Tags    Total
People          4,978                   0                            4,978
Organizations   1,489                   1,592                        3,081
Locations       1,910                   0                            1,910
Descriptors     498                     0                            498
Total                                                                10,467
http://data.nytimes.com/
Google Public Data Explorer
The Google Public Data Explorer makes large, public-interest datasets easy to explore, visualize
and communicate. As the charts and maps animate over time, the changes in the world become
easier to understand. You don't have to be a data expert to navigate between different views,
make your own comparisons, and share your findings.
Students, journalists, policy makers and everyone else can play with the tool to create
visualizations of public data, link to them, or embed them in their own webpages. Embedded
charts and links can update automatically so you’re always sharing the latest available data.
The Public Data Explorer launched in March, 2010. See this blog post, which originally
announced the product, for more background and historical perspective.
https://www.google.com/publicdata/directory
The Million Song Dataset
The Million Song Dataset is a freely-available collection of audio features and metadata for a
million contemporary popular music tracks.
Its purposes are:
- To encourage research on algorithms that scale to commercial sizes
- To provide a reference dataset for evaluating research
- As a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
- To help new researchers get started in the MIR field
http://labrosa.ee.columbia.edu/millionsong/
CrowdFlower Open Data Library
CrowdFlower encourages developers and researchers to use its open data to explore new ways of
what crowdsourcing can achieve. This webpage is a repository of data sets collected or enhanced
by CrowdFlower's workforce and made available for everyone to use.
http://www.crowdflower.com/data-for-everyone
OPEN DATASET, in French
Montreal, Portail Donnees Ouvertes (French&English), Canada
http://donnees.ville.montreal.qc.ca/
Insee, France
http://www.insee.fr/fr/publications-et-services/depliant webinsee.pdf
RATP Open Data, French Tube in Paris, France
http://data.ratp.fr/explore/
The French open data landscape, mapped
Here are three maps of the French open data ecosystem. On a black background, the three posters (downloadable in A0 format) give a general overview of the current French open data scene. The three maps are based on data provided by Data-Publica, notably two studies recently carried out by Guillaume Lebourgeois, Pierrick Boitel and Perrine Letellier (the latter two were students in my course at UTC last semester). The goal of these maps is to begin a fairly complete "radiography" of the field, renewable over time (perhaps every six months) and directly tied to the data held by Data-Publica. In short, a sort of observatory of French open data that I am launching through the work of the Atelier de Cartographie.
http://ateliercartographie.wordpress.com/2012/09/23/lopen-data-francais-cartographie/
OPEN DATASET, in China
Lamda Group
Data
• Image Data For Multi-Instance Multi-Label Learning
• MDDM Data for multi-label dimensionality reduction.
• Text Data for Multi-Instance Learning
• MILWEB Data for Multi-Instance Learning Based Web Index Recommendation.
• SGBDota Data for the PCES (Positive Concept Expansion with Single snapshot) problem.
• Single Face Dataset Data for Face Recognition with One Training Image per Person.
• Text Data For Multi-Instance Multi-Label Learning
http://lamda.nju.edu.cn/Data.ashx
DATA VISUALIZATION
Visualization Lab Gallery, Computer Science Division,
University of California, Berkeley
CS 294-10 Fall '14 Visualization
Instructors: Maneesh Agrawala and Jessica Hullman
Course Wiki
CS 160 Spring '14 User Interface Design
Instructor: Maneesh Agrawala and Bjoern Hartmann
TAs: Brittany Cheng, Steve Rubin, and Eric Xiao
Course Wiki
CS 294-10 Fall '13 Visualization
Instructor: Maneesh Agrawala
Course Wiki
CS 160 Spring '12 User Interface Design
Instructor: Maneesh Agrawala
TAs: Nicholas Kong, Anuj Tewari
Course Wiki
CS 294-69 Fall '11 Image Manipulation and Computational Photography
Instructor: Maneesh Agrawala
TA: Floraine Berthouzoz
Course Wiki
CS 294-10 Spring '11 Visualization
Instructor: Maneesh Agrawala
Course Wiki
CS 184 Fall '10 Computer Graphics
Instructor: Maneesh Agrawala
TAs: Robert Carroll, Fu-Chung Huang
Course Wiki
CS 160 Spring '10 User Interface
Instructors: Bjoern Hartmann, Maneesh Agrawala
TAs: Kenrick Kin, Anuj Tewari
Course Wiki
CS 294-10 Spring '10 Visualization
Instructor: Maneesh Agrawala
Course Wiki
CS 160 Spring '09 User Interfaces
Instructors: Maneesh Agrawala, Jeffrey Nichols
TAs: Nicholas Kong
Course Wiki
CS 294-10 Fall '08 Visualization
Instructor: Maneesh Agrawala
Course Wiki
CS 160 Spring '08 User Interfaces
Instructor: Maneesh Agrawala
TAs: Wesley Willett and Seth Horrigan
Course Wiki
CS 294-10 Fall '07 Visualization
Instructor: Maneesh Agrawala
Course Wiki
CS 160 Fall '06 User Interfaces
Instructor: Maneesh Agrawala
TAs: David Sun and Jerry Yu
Course Wiki
CS 294-10 Spring '06 Visualization
Organizers: Maneesh Agrawala, Jeffrey Heer
Course Wiki
http://vis.berkeley.edu/courses/cs294-10-fa14/wiki/index.php/Visualization Gallery
Visualization Lab Software, Computer Science Division,
University of California, Berkeley
http://vis.berkeley.edu/software/
Visualization Lab Course Wiki, Computer Science Division,
University of California, Berkeley
http://vis.berkeley.edu/courses/
Mike Bostock
Visualizing algorithms
http://bost.ocks.org/mike/
Eyeo Festival
Eyeo assembles an incredible set of creative coders, data designers, artists, and attendees. Expect enthralling talks, unique workshops and interactions with open source instigators and super-fascinating practitioners. Join us for an extraordinary festival.
http://eyeofestival.com/
MIT Data Collider
A new language for data visualisation
http://datacollider.io/
D3 JS Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data
to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full
capabilities of modern browsers without tying yourself to a proprietary framework, combining
powerful visualization components and a data-driven approach to DOM manipulation.
http://d3js.org/
Shan He, Research Fellow at MIT Senseable City Lab
Shan He is a research fellow at MIT Senseable City Lab. She is an architect and a computational design specialist. She is currently a student in the MIT Department of Architecture pursuing her SMArchS in Design and Computation. At Senseable, her focus is on data visualization, interactive design and web applications.
Prior to coming to MIT she worked as a product designer for Blu Homes where she worked on
developing an online 3-D customization tool with intellectual property. During her time at MIT
she has worked as a research assistant for the Clean Energy City Lab at the Advanced Urbanism
Center and also for the Mobile Experience Lab at the CMS.
Shan holds a B.Arch from Tsinghua University in China and a M.Arch from University of
Michigan, Ann Arbor.
http://cargocollective.com/shanhe/About-Shan-He
Gource software version control visualization
Software projects are displayed by Gource as an animated tree with the root directory of the
project at its centre. Directories appear as branches with files as leaves. Developers can be seen
working on the tree at the times they contributed to the project.
https://www.youtube.com/watch?v=NjUuAuBcoqs
https://code.google.com/p/gource/
Logstalgia, website access log visualization
Logstalgia (aka ApachePong) is a website access log visualization tool.
https://code.google.com/p/logstalgia/
Andrew Caudwell's Blog
Andrew Caudwell is a software developer and sometimes computer graphics programmer/artist
located in Wellington, New Zealand.
He is probably best known through his work as the author of several popular data visualizations:
Logstalgia (aka Apache Pong)
a visualization of website traffic as a pong-like game
Gource a force-directed layout software version control visualization
This blog is a collection of his work, experiments, thoughts and ideas on procedurally generated
computer graphics and animation.
http://www.thealphablenders.com/
MLDemos, EPFL, Switzerland
MLDemos is an open-source visualization tool for machine learning algorithms, created to help
study and understand how several algorithms function and how their parameters affect and
modify the results in problems of classification, regression, clustering, dimensionality reduction,
dynamical systems and reward maximization.
MLDemos is open-source and free for personal and academic use.
http://mldemos.epfl.ch/
The University of Florida Sparse Matrix Collection
We describe the University of Florida Sparse Matrix Collection, a large and actively growing set
of sparse matrices that arise in real applications. The Collection is widely used by the numerical
linear algebra community for the development and performance evaluation of sparse matrix
algorithms. It allows for robust and repeatable experiments: robust because performance results
with artificially-generated matrices can be misleading, and repeatable because matrices are
curated and made publicly available in many formats. Its matrices cover a wide spectrum of
domains, including those arising from problems with underlying 2D or 3D geometry (such as structural
engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor
devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics,
and other discretizations) and those that typically do not have such geometry (optimization,
circuit simulation, economic and financial modeling, theoretical and quantum chemistry,
chemical process simulation, mathematics and statistics, power networks, and other networks and
graphs). We provide software for accessing and managing the Collection, from MATLAB,
Mathematica, Fortran, and C, as well as an online search capability. Graph visualization of the
matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task.
http://www.cise.ufl.edu/research/sparse/matrices/
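Matrices from collections like this one are typically stored in compressed formats rather than as dense arrays. As a hedged, dependency-free sketch (the example matrix is our own, not from the Collection), here is the CSR (Compressed Sparse Row) layout and a matrix-vector multiply over it:

```python
# A minimal pure-Python sketch of CSR storage; real code would use a
# library such as scipy.sparse, but the data layout is the same idea.

def csr_matvec(data, indices, indptr, x):
    """Multiply a CSR-stored sparse matrix by a dense vector x."""
    y = []
    for row in range(len(indptr) - 1):
        s = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            s += data[k] * x[indices[k]]
        y.append(s)
    return y

# The 3x3 matrix [[4, 0, 9], [0, 7, 0], [0, 0, 5]] in CSR form:
data = [4.0, 9.0, 7.0, 5.0]   # nonzero values, row by row
indices = [0, 2, 1, 2]        # column index of each value
indptr = [0, 2, 3, 4]         # row i occupies data[indptr[i]:indptr[i+1]]

print(csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0]))  # → [13.0, 7.0, 5.0]
```

Only the nonzeros are stored, which is what makes experiments on very large matrices from the Collection feasible at all.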
Visualization & Graphics lab, Dept. of CSA and SERC, Indian
Institute of Science, Bangalore
This is the video channel of the Visualization & Graphics lab (http://vgl.serc.iisc.ernet.in) which
is part of the Dept. of CSA and SERC, Indian Institute of Science, Bangalore.
It contains videos created by the members of the lab as part of their research.
https://www.youtube.com/user/vgliisc/videos?spfreload=10
Allison McCann
Allison McCann is a visual journalist and data reporter for FiveThirtyEight.
http://allisontmccann.com/
Scott Murray
I write software that generates images and interactive experiences.
I’m interested in data visualization, generative art, and designed experiences that encourage
people to slow down and reflect.
I am an Assistant Professor of Design at USF, a contributor to Processing, and the author of
Interactive Data Visualization for the Web.
I studied at MassArt’s Dynamic Media Institute (M.F.A. 2010) and Vassar College (A.B. 2001).
Website
The energetic particles on the home page were created with Processing and Processing.js.
Site content is managed in a database-free environment with Kirby. Changes are pushed with git
to magical boxes at Pagoda Box, where the files are hosted. Site analytics magic performed by
Piwik.
The site was made mobile-friendly through a combination of CSS3 media queries and
JavaScript.
http://alignedleft.com/
Gephi: The Open Graph Viz Platform
Gephi is an interactive visualization and exploration platform for all kinds of networks and
complex systems, dynamic and hierarchical graphs.
Runs on Windows, Linux and Mac OS X. Gephi is open-source and free.
What else? ;-)
Gephi is open-source software for network visualization and analysis. It helps data analysts to
intuitively reveal patterns and trends, highlight outliers and tell stories with their data. It uses a
3D render engine to display large graphs in real-time and to speed up the exploration. Gephi
combines built-in functionalities and flexible architecture to explore, analyze, spatialize, filter,
cluster, manipulate, export all types of networks.
Gephi is based on a visualize-and-manipulate paradigm that allows any user to discover
networks and data properties. Moreover, it is designed to follow the chain of a case study, from
data file to nice printable maps.
Gephi is a free/libre software distributed under the GPL 3 ("GNU General Public License").
Tags: network, network science, infovis, visualization, visual analytics, exploratory data analysis,
graph, graph viz, graph theory, complex network, software, open source, science
https://gephi.github.io/features/
http://gephi.github.io/
Data Analysis and Visualization Using R by David Robinson
This is a course that combines video, HTML and interactive elements to teach the statistical
programming language R.
http://varianceexplained.org/RData/
Visualising Data Blog (Huge list of resources, great blog!)
About Andy Kirk
Andy Kirk is a UK-based freelance data visualisation specialist: A design consultant, training
provider, author, editor of visualisingdata.com, speaker and researcher...
Between January 2014 and March 2015, Andy worked as a co-investigator on a research
project called ‘Seeing Data’, funded by the Arts & Humanities Research Council and hosted by
the University of Leeds. The study explored the issue of visualisation literacy amongst the
general public.
http://www.visualisingdata.com/index.php/blog/
http://www.visualisingdata.com/index.php/resources/
The 8 hats of Data Visualisation Design by Andy Kirk
The nature of data visualization as a truly multi-disciplinary subject introduces many challenges.
You might be a creative but how are your analytical skills? Good at closing out a design but how
about the initial research and data sourcing? In this talk Andy Kirk will discuss the many different
‘hats’ a visualization designer needs to wear in order to effectively deliver against these demands.
It will also contextualize these duties in the sense of a data visualization project timeline.
Whether a single person will fulfill these roles, or a team collaboration will be set up to cover all
bases, this presentation will help you understand the requirements of any visualization problem
context.
https://vimeo.com/44886980
Andy Kirk, Visualisation consultant at the Big Data Week, 2013
https://www.youtube.com/watch?v=13weAkpSdWk&spfreload=10
Image Gallery by the Arts and Humanities Research Council,
UK
Images are generated and used in the arts and humanities in a wide variety of ways and for a
range of purposes as computer-generated (CGI) or computer enhanced images, virtual reality
representations and visualisations, digitised images from museums, libraries and archives, design
and architectural blueprints, photographs, cartoons, newspapers, maps and much else.
The AHRC Image Gallery is designed to showcase the range of digital images generated either
as by-products or as outputs of research projects in the arts and humanities as a means of
highlighting the richness and diversity of images created and used within the arts and humanities
and to showcase the talents of those who create them, including those of doctoral students and
early career researchers.
http://www.ahrc.ac.uk/News-and-Events/Image-Gallery/Pages/Image-Gallery.aspx
Setosa.io by Victor Powell & Lewis Lehe
interactive = intuitive
substance > information
http://setosa.io/#/
BOOKS, in English
2015
Bayesian Reasoning and Machine Learning, David Barber, 2012
(online version 04-2015)
Machine learning methods extract value from vast data sets quickly and with modest resources.
They are established tools in a wide range of industrial applications, including search engines,
DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly.
People who know the methods have their choice of rewarding jobs. This hands-on text opens
these opportunities to computer science students with modest mathematical backgrounds. It is
designed for final-year undergraduates and master's students with limited background in linear
algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning
to advanced techniques within the framework of graphical models. Students learn more than a
menu of techniques, they develop analytical and problem-solving skills that equip them for the
real world. Numerous examples and exercises, both computer based and theoretical, are included
in every chapter. Resources for students and instructors, including a MATLAB toolbox, are
available online.
http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=Brml.Online
Deep Learning (Artificial Intelligence), An MIT Press book in
preparation, by Yoshua Bengio, Ian Goodfellow and Aaron
Courville, Jul-2015
Please help us make this a great book! This draft is still full of typos and can be improved in
many ways. Your suggestions are more than welcome. Do not hesitate to contact any of the
authors directly by e-mail or Google messages: Yoshua, Ian, Aaron.
Table of Contents
Deep Learning for AI
Linear Algebra
Probability and Information Theory
Numerical Computation
Machine Learning Basics
Feedforward Deep Networks
Structured Probabilistic Models: A Deep Learning Perspective
Unsupervised and Transfer Learning
Convolutional Networks
Sequence Modeling: Recurrent and Recursive Nets
The Manifold Perspective on Auto-Encoders
Confronting the Partition Function
References
http://www.iro.umontreal.ca/~bengioy/dlbook/
Neural Networks and Deep Learning by Michael Nielsen, 2015
Neural Networks and Deep Learning is a free online book. The book will teach you about:
Neural networks, a beautiful biologically-inspired programming paradigm which enables a
computer to learn from observational data
Deep learning, a powerful set of techniques for learning in neural networks
Neural networks and deep learning currently provide the best solutions to many problems in
image recognition, speech recognition, and natural language processing. This book will teach you
the core concepts behind neural networks and deep learning.
The book is currently an incomplete beta draft. More chapters will be added over the coming
months. For now, you can:
Read Chapter 1, which explains how neural networks can learn to recognize handwriting
Read Chapter 2, which explains backpropagation, the most important algorithm used to learn
in neural networks.
http://neuralnetworksanddeeplearning.com/index.html
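Chapter 2's backpropagation can be sketched, and sanity-checked, in a few lines of NumPy. This is an illustrative one-hidden-layer example with invented sizes and data (our own sketch, not the book's code), verifying one backprop gradient against a finite difference:

```python
import numpy as np

# Illustrative network: 2 inputs, 3 hidden sigmoid units, 1 sigmoid output.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
x, y = np.array([0.5, -1.0]), np.array([1.0])

sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss(W1):
    a1 = sig(W1 @ x + b1)
    a2 = sig(W2 @ a1 + b2)
    return 0.5 * np.sum((a2 - y) ** 2)

# Forward pass, then backward pass (the backprop equations):
a1 = sig(W1 @ x + b1)
a2 = sig(W2 @ a1 + b2)
delta2 = (a2 - y) * a2 * (1 - a2)          # output-layer error
delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # error propagated to hidden layer
grad_W1 = np.outer(delta1, x)              # dLoss/dW1

# Finite-difference check of one entry of grad_W1:
eps = 1e-6
Wp = W1.copy(); Wp[0, 0] += eps
numeric = (loss(Wp) - loss(W1)) / eps
print(abs(numeric - grad_W1[0, 0]) < 1e-5)  # → True
```

The finite-difference check is a standard way to convince yourself a backprop implementation is correct before trusting it on real data.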
2014
An Architecture for Fast and General Data Processing on Large
Clusters by Matei Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data volumes and
stalling processor speeds require more and more applications to scale out to distributed systems.
Today, a myriad data sources, from the Internet to business operations to scientific instruments,
produce large and valuable data streams. However, the processing capabilities of single machines
have not kept up with the size of data, making it harder and harder to put to use. As a result, a
growing number of organizations, not just web companies but traditional enterprises and
research labs, need to scale out their most important computations to clusters of hundreds of
machines.
At the same time, the speed and sophistication required of data processing have grown. In
addition to simple queries, complex algorithms like machine learning and graph analysis are
becoming common in many domains. And in addition to batch processing, streaming analysis of
new real-time data sources is required to let organizations take timely action. Future computing
platforms will need to not only scale out traditional workloads, but support these new applications
as well.
This dissertation proposes an architecture for cluster computing systems that can tackle emerging
data processing workloads while coping with larger and larger scales. Whereas early cluster
computing systems, like MapReduce, handled batch processing, our architecture also enables
streaming and interactive queries, while keeping the scalability and fault tolerance of previous
systems. And whereas most deployed systems only support simple one-pass computations (e.g.,
aggregation or SQL queries), ours also extends to the multi-pass algorithms required for more
complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike the specialized
systems proposed for some of these workloads, our architecture allows these computations to be
combined, enabling rich new applications that intermix, for example, streaming and batch
processing, or SQL and complex analytics.
We achieve these results through a simple extension to MapReduce that adds primitives for data
sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to efficiently
capture a wide range of workloads. We implement RDDs in the open source Spark system,
which we evaluate using both synthetic benchmarks and real user applications. Spark matches or
exceeds the performance of specialized systems in many application domains, while offering
stronger fault tolerance guarantees and allowing these workloads to be combined. We explore the
generality of RDDs from both a theoretical modeling perspective and a practical perspective to
see why this extension can capture a wide range of previously disparate workloads.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf
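The RDD idea the dissertation describes can be caricatured in a few lines. Below is a hedged toy sketch in plain Python (not Spark's API; every name here is illustrative) showing the two essentials: lazy transformations, and an explicit cache that lets multi-pass computations share data:

```python
# Toy, in-memory analogue of an RDD; real Spark adds partitioning,
# fault tolerance via lineage, and distributed execution.

class ToyRDD:
    def __init__(self, compute):
        self._compute = compute   # thunk producing the data lazily
        self._cached = None

    @staticmethod
    def from_list(xs):
        return ToyRDD(lambda: list(xs))

    def map(self, f):
        return ToyRDD(lambda: [f(x) for x in self._materialize()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self):
        self._cached = self._compute()  # materialize once, reuse across passes
        return self

    def _materialize(self):
        return self._cached if self._cached is not None else self._compute()

    def collect(self):
        return self._materialize()

nums = ToyRDD.from_list(range(10)).filter(lambda x: x % 2 == 0).cache()
# Two "passes" over the same cached dataset, as an iterative algorithm would do:
print(nums.map(lambda x: x * x).collect())   # [0, 4, 16, 36, 64]
print(sum(nums.collect()))                   # 20
```

Caching is exactly the data-sharing primitive the abstract argues MapReduce lacked: without it, each pass would recompute the filter from scratch.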
Deep Learning Tutorial by LISA Lab, University of Montreal,
2014
The tutorials presented here will introduce you to some of the most important deep learning
algorithms and will also show you how to run them using Theano. Theano is a python library
that makes writing deep learning models easy, and gives the option of training them on a GPU.
The algorithm tutorials have some prerequisites. You should know some python, and be familiar
with numpy. Since this tutorial is about using Theano, you should read over the Theano basic
tutorial first. Once you’ve done that, read through our Getting Started chapter; it introduces the
notation, the [downloadable] datasets used in the algorithm tutorials, and the way we do
optimization by stochastic gradient descent.
The purely supervised learning algorithms are meant to be read in order:
1. Logistic Regression - using Theano for something simple
2. Multilayer perceptron - introduction to layers
3. Deep Convolutional Network - a simplified version of LeNet5
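The first supervised tutorial can be approximated without Theano. The following is a hedged NumPy sketch of the same binary logistic-regression model, trained here by full-batch gradient descent on invented toy data (the data and hyperparameters are illustrative, not the tutorial's):

```python
import numpy as np

# Toy, linearly separable binary problem: label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)         # gradient of mean cross-entropy
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean((p > 0.5) == y.astype(bool))
print(f"training accuracy: {acc:.2f}")
```

On this separable toy problem the training accuracy ends up close to 1.0; the Theano version of the tutorial computes the same gradients symbolically and can run them on a GPU.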
The unsupervised and semi-supervised learning algorithms can be read in any order (the autoencoders can be read independently of the RBM/DBN thread):
• Auto Encoders, Denoising Autoencoders - description of autoencoders
• Stacked Denoising Auto-Encoders - easy steps into unsupervised pre-training for deep nets
• Restricted Boltzmann Machines - single layer generative RBM model
• Deep Belief Networks - unsupervised generative pre-training of stacked RBMs followed by
supervised fine-tuning
Building towards including the mcRBM model, we have a new tutorial on sampling from energy
models:
• HMC Sampling - hybrid (aka Hamiltonian) Monte-Carlo sampling with scan()
Building towards including the Contractive auto-encoders tutorial, we have the code for now:
• Contractive auto-encoders code - There is some basic doc in the code.
Energy-based recurrent neural network (RNN-RBM):
• Modeling and generating sequences of polyphonic music
http://deeplearning.net/tutorial/deeplearning.pdf
Statistical Inference for Everyone, by Professor Bryan Blais, 2014
This is a new approach to an introductory statistical inference textbook, motivated by probability
theory as logic. It is targeted to the typical Statistics 101 college student, and covers the topics
typically covered in the first semester of such a course. It is freely available under the Creative
Commons License, and includes a software library in Python for making some of the calculations
and visualizations easier.
I am a professor of Science and Technology, Bryant University and a research professor at the
Institute for Brain and Neural Systems, Brown University. My interests include
Theoretical Neuroscience
learning and memory in neural systems
vision
spike-timing dependent plasticity
Bayesian Inference
frequentist versus Bayesian statistics
Bayesian approaches to learning and memory
Digital to Analog Computer Control
autonomous experiments
neural networks and robotics
Global Resources
Dynamics of global resources and economics
Population growth, Malthusian traps, and energy
http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html
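In the book's probability-as-logic spirit, a single application of Bayes' rule can be computed directly. The numbers below are illustrative, not taken from the book:

```python
# Updating a hypothesis H with evidence E via Bayes' rule.

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|~H) P(~H)]."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1.0 - prior))

# A condition affecting 1% of people; a test with 95% sensitivity and a
# 5% false-positive rate. Probability of the condition given a positive test:
print(round(posterior(0.01, 0.95, 0.05), 4))  # → 0.161
```

The counterintuitively low posterior, despite the accurate test, is exactly the kind of result the "reasoning as probability" framing is meant to make routine.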
Mining of Massive Datasets by Jure Leskovec, Anand
Rajaraman, Jeff Ullman, 2014
The book
The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and
CS345A: Data Mining).
The book, like the course, is designed at the undergraduate computer science level with no
formal prerequisites. To support deeper explorations, most of the chapters are supplemented
with further reading references.
The Mining of Massive Datasets book has been published by Cambridge University Press. You
can get 20% discount here.
By agreement with the publisher, you can download the book for free from this page. Cambridge
University Press does, however, retain copyright on the work, and we expect that you will obtain
their permission and acknowledge our authorship if you republish parts or all of it. We are sorry
to have to mention this point, but we have evidence that other items we have published on the
Web have been appropriated and republished under other names. It is easy to detect such misuse,
by the way, as you will learn in Chapter 3.
We welcome your feedback on the manuscript.
The 2nd edition of the book (v2.1)
The following is the second edition of the book. There are three new chapters, on mining large
graphs, dimensionality reduction, and machine learning. There is also a revised Chapter 2 that
treats map-reduce programming in a manner closer to how it is used in practice.
Together with each chapter there is also a set of lecture slides that we use for teaching the Stanford
CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the
material covered in the corresponding chapters.
Download the latest version of the book as a single big PDF file (511 pages, 3 MB).
Note to the users of provided slides: We would be delighted if you found our material
useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
your own needs. PowerPoint originals are available. If you make use of a significant portion of
these slides in your own lecture, please include this message, or a link to our web site: http://
www.mmds.org/.
Comments and corrections are most welcome. Please let us know if you are using these materials
in your course and we will list and link to your course.
http://infolab.stanford.edu/~ullman/mmds/book.pdf
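The Chapter 3 technique the authors allude to, detecting near-duplicate documents, rests on comparing sets of shingles. A minimal hedged sketch (the value of k and the sample strings are our own illustration):

```python
# k-shingling plus Jaccard similarity: the core of near-duplicate detection.
# MMDS Chapter 3 then scales this up with minhashing and LSH.

def shingles(text, k=4):
    """The set of all k-character substrings of the text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

s1 = shingles("the quick brown fox jumps over the lazy dog")
s2 = shingles("the quick brown fox leaps over the lazy dog")
s3 = shingles("completely unrelated text about matrices")

# Near-duplicates share most shingles; unrelated text shares almost none.
print(jaccard(s1, s2) > jaccard(s1, s3))  # → True
```

Exact pairwise Jaccard is quadratic in the number of documents, which is why the book's minhash and locality-sensitive hashing machinery exists.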
Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi,
Huan Liu, 2014
The growth of social media over the last decade has revolutionized the way individuals interact
and industries conduct business. Individuals produce data at an unprecedented rate by
interacting, sharing, and consuming content through social media. Understanding and processing
this new type of data to glean actionable patterns presents challenges and opportunities for
interdisciplinary research, novel algorithms, and tool development. Social Media Mining
integrates social media, social network analysis, and data mining to provide a convenient and
coherent platform for students, practitioners, researchers, and project managers to understand
the basics and potentials of social media mining. It introduces the unique problems arising from
social media data and presents fundamental concepts, emerging issues, and effective algorithms
for network analysis and data mining. Suitable for use in advanced undergraduate and beginning
graduate courses as well as professional short courses, the text contains exercises of different
degrees of difficulty that improve understanding and help apply concepts, principles, and
methods in various scenarios of social media mining.
http://dmml.asu.edu/smm/book/
Slides
http://dmml.asu.edu/smm/slides/
Causal Inference by Miguel A. Hernán and James M. Robins,
May 14, 2014, Draft
The book provides a cohesive presentation of concepts of, and methods for, causal inference.
Much of this material is currently scattered across journals in several disciplines or confined to
technical articles. We expect that the book will be of interest to anyone interested in causal
inference, e.g., epidemiologists, statisticians, psychologists, economists, sociologists, other social
scientists… The book is geared towards graduate students and practitioners.
We have divided the book into three parts of increasing difficulty: causal inference without models,
causal inference with models, and causal inference from complex longitudinal data. We will make
drafts of selected book sections available on this website. The idea is that interested readers can
submit suggestions or criticisms before the book is published. If you wish to share any comments,
please email me or visit us on Facebook (user causalinference).
Warning: These documents are drafts. We are constantly revising and correcting errors without
documenting the changes. Please make sure you use the most updated version posted here.
http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
Slides for High Performance Python tutorial at EuroSciPy, 2014
by Ian Ozsvald
This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant,
founder of the Annotate.io social media mining API, author of O'Reilly's High Performance
Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the
A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo
and FivePoundApps and also a Londoner. Here's a little more about me.
https://github.com/ianozsvald/euroscipy2014_highperformancepython
http://ianozsvald.com/2014/08/30/slides-for-high-performance-python-tutorial-at-euroscipy2014-book-signing/
Probabilistic Programming and Bayesian Methods for Hackers
by Cameron Davidson-Pilon, 2014
Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a
computational/understanding-first, and mathematics-second, point of view. Of course, as an
introductory book, we can only leave it at that: an introductory book. The mathematically
trained may cure the curiosity this text generates with other texts designed with
mathematical analysis in mind. For the enthusiast with less mathematical background, or one
who is not interested in the mathematics but simply the practice of Bayesian methods, this text
should be sufficient and entertaining.
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
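The book itself builds on PyMC; as a hedged, dependency-free illustration of its computation-first approach, a posterior can also be grid-approximated (the coin-flip data below is our own example, not the book's):

```python
# Grid-approximating the posterior of a coin's bias after 9 heads in
# 12 flips, under a uniform prior: compute, normalize, summarize.

heads, flips = 9, 12
grid = [i / 200 for i in range(201)]                 # candidate bias values
likelihood = [p**heads * (1 - p)**(flips - heads) for p in grid]
total = sum(likelihood)                              # uniform prior cancels out
posterior = [l / total for l in likelihood]

mean = sum(p * w for p, w in zip(grid, posterior))
print(f"posterior mean bias ≈ {mean:.3f}")           # close to (9+1)/(12+2) ≈ 0.714
```

The grid answer matches the analytic Beta-posterior mean; the book's point is that once you can compute posteriors numerically, you are not limited to models with closed forms.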
Past, Present, and Future of Statistical Science by COPSS, 2014
http://nisla05.niss.org/copss/past-present-future-copss.pdf
Essentials of Metaheuristics by Sean Luke, 2014
This is an open set of lecture notes on metaheuristics algorithms, intended for undergraduate
students, practitioners, programmers, and other non-experts. It was developed as a series of
lecture notes for an undergraduate course I taught at GMU. The chapters are designed to be
printable separately if necessary. As these are lecture notes, the topics are short and light on
examples and theory. The book is best when complementing other texts. With time, I might remedy this.
http://cs.gmu.edu/~sean/book/metaheuristics/
2013
Interactive Data Visualization for the Web By Scott Murray,
2013
Read online for free on the publisher website
This online version of Interactive Data Visualization for the Web includes 44 examples that will
show you how to best represent your interactive data. For instance, you'll learn how to create this
simple force layout with 10 nodes and 12 edges. Click and drag the nodes below to see the
diagram react.
This step-by-step guide is ideal whether you’re a designer or visual artist with no programming
experience, a reporter exploring the new frontier of data journalism, or anyone who wants to
visualize and share data. Create and publish your own interactive data visualization projects on
the Web even if you have little or no experience with data visualization or web development.
It’s easy and fun with this practical, hands-on introduction. Author Scott Murray teaches you the
fundamental concepts and methods of D3, a JavaScript library that lets you express data visually
in a web browser. Along the way, you’ll expand your web programming skills, using tools such as
HTML and JavaScript.
http://chimera.labs.oreilly.com/books/1230000000345
Statistical Model Building, Machine Learning, and the Ah-Ha
Moment by Grace Wahba, 2013
https://archive.org/details/arxiv-1303.5153
An Introduction to Statistical Learning with applications in R.
by Gareth James, Daniela Witten, Trevor Hastie and Robert
Tibshirani, 2013 (first printing)
http://web.stanford.edu/~hastie/local.ftp/Springer/ISLR print1.pdf
2012
Reinforcement Learning by Richard S. Sutton and Andrew G.
Barto, 2012, Second edition in progress (PDF)
This introductory textbook on reinforcement learning is targeted toward engineers and scientists
in artificial intelligence, operations research, neural networks, and control systems, and we hope it
will also be of interest to psychologists and neuroscientists. ...
A second edition is incomplete and in progress, but also perfectly usable. Feedback is welcome;
send your comments to rich@richsutton.com.
http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
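One of the book's earliest techniques, epsilon-greedy action selection on a multi-armed bandit with incremental value estimates, fits in a short hedged sketch (the arm means and parameters below are illustrative, not from the book):

```python
import random

# Epsilon-greedy on a 3-armed bandit: explore with probability eps,
# otherwise exploit the arm with the highest estimated value.
random.seed(0)
true_means = [0.2, 0.5, 0.8]          # hidden expected reward of each arm
Q = [0.0] * 3                         # value estimates
N = [0] * 3                           # pull counts
eps = 0.1

for _ in range(2000):
    if random.random() < eps:
        a = random.randrange(3)                # explore
    else:
        a = max(range(3), key=lambda i: Q[i])  # exploit current best
    r = true_means[a] + random.gauss(0, 0.1)   # noisy reward
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]                  # incremental mean update

# Arm 2 has the highest true mean, so it is almost always the one found.
print("best arm found:", Q.index(max(Q)))
```

The incremental update `Q += (r - Q) / N` is the book's running-average form of the value estimate, avoiding any need to store past rewards.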
R Graphics Cookbook Code Resources (Graphs with ggplot2) by
Winston Chang, 2012
My book about data visualization in R is available! The book covers many of the same topics as
the Graphs and Data Manipulation sections of this website, but it goes into more depth and
covers a broader range of techniques. You can preview it at Google Books.
http://www.cookbook-r.com/Graphs/
Supervised Sequence Labelling with Recurrent Neural Networks
by Alex Graves, 2012
Structure of the Book
The chapters are roughly grouped into three parts: background material is presented in Chapters
2-4, Chapters 5 and 6 are primarily experimental, and new methods are introduced in Chapters
7-9.
Chapter 2 briefly reviews supervised learning in general, and pattern classification in particular.
It also provides a formal definition of sequence labelling, and discusses three classes of sequence
labelling task that arise under different relationships between the input and label sequences.
Chapter 3 provides background material for feedforward and recurrent neural networks, with
emphasis on their application to labelling and classification tasks. It also introduces the sequential
Jacobian as a tool for analysing the use of context by RNNs.
Chapter 4 describes the LSTM architecture and introduces bidirectional LSTM (BLSTM).
Chapter 5 contains an experimental comparison of BLSTM to other neural network
architectures applied to framewise phoneme classification. Chapter 6 investigates the use of
LSTM in hidden Markov model-neural network hybrids. Chapter 7 introduces connectionist
temporal classification, Chapter 8 covers multidimensional networks, and hierarchical
subsampling networks are described in Chapter 9.
http://www.cs.toronto.edu/~graves/preprint.pdf
A course in Machine Learning by Hal Daume, 2012
Machine learning is the study of algorithms that learn from data and experience. It is applied in
a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any
area in which you need to make sense of data is a potential consumer of machine learning.
CIML is a set of introductory materials that covers most major aspects of modern machine
learning (supervised learning, unsupervised learning, large margin methods, probabilistic
modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A
subset can be used for an undergraduate course; a graduate course could probably cover the
entire material and then some.
http://ciml.info/
Machine Learning in Action, Peter Harrington, 2012
Chapter 1 and 7 are available for free on the publisher website
http://www.manning.com/pharrington/MLiAchapter1sample.pdf
http://www.manning.com/pharrington/MLiAchapter7sample.pdf
A Programmer's Guide to Data Mining, by Ron Zacharski, 2012
About This Book
Before you is a tool for learning basic data mining techniques. Most data mining textbooks focus
on providing a theoretical foundation for data mining, and as a result, may seem notoriously
difficult to understand. Don’t get me wrong, the information in those books is extremely
important. However, if you are a programmer interested in learning a bit about data mining you
might be interested in a beginner’s hands-on guide as a first step. That’s what this book provides.
This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage
you to work through the exercises and experiment with the Python code I provide. I hope you will
be actively involved in trying out and programming data mining techniques. The textbook is laid
out as a series of small steps that build on each other until, by the time you complete the book,
you have laid the foundation for understanding data mining techniques. This book is available for
download for free under a Creative Commons license (see link in footer). You are free to share the
book, and remix it. Someday I may offer a paper copy, but the online version will always be free.
http://guidetodatamining.com/
2010
Artificial Intelligence, Foundations of Computational Agents by
David Poole and Alan Mackworth, 2010
Artificial Intelligence: Foundations of Computational Agents is a book about the science of
artificial intelligence (AI). The view we take is that AI is the study of the design of intelligent
computational agents. The book is structured as a textbook but it is designed to be accessible to a
wide audience.
We wrote this book because we are excited about the emergence of AI as an integrated science.
As with any science worth its salt, AI has a coherent, formal theory and a rambunctious
experimental wing. Here we balance theory and experiment and show how to link them
intimately together. We develop the science of AI together with its engineering applications. We
believe the adage, "There is nothing so practical as a good theory." The spirit of our approach is
captured by the dictum, "Everything should be made as simple as possible, but not simpler." We
must build the science on solid foundations; we present the foundations, but only sketch, and give
some examples of, the complexity required to build useful intelligent systems. Although the
resulting systems will be complex, the foundations and the building blocks should be simple.
http://artint.info/html/ArtInt.html
Introduction to Machine Learning by Ethem Alpaydın, MIT
Press, Second Edition, 2010, 579 pages
1 Introduction
2 Supervised Learning
3 Bayesian Decision Theory
4 Parametric Methods
5 Multivariate Methods
6 Dimensionality Reduction
7 Clustering
8 Nonparametric Methods
9 Decision Trees
10 Linear Discrimination
11 Multilayer Perceptrons
12 Local Models
13 Kernel Machines
14 Bayesian Estimation
15 Hidden Markov Models
16 Graphical Models
17 Combining Multiple Learners
18 Reinforcement Learning
19 Design and Analysis of Machine Learning Experiments
A Probability
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e/index.html
2009
The Elements of Statistical Learning, T. Hastie, R. Tibshirani,
and J. Friedman, 2009
During the past decade there has been an explosion in computation and information technology. With
it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and
marketing. The challenge of understanding these data has led to the development of new tools in
the field of statistics, and spawned new areas such as data mining, machine learning, and
bioinformatics. Many of these tools have common underpinnings but are often expressed with
different terminology. This book describes the important ideas in these areas in a common
conceptual framework. While the approach is statistical, the emphasis is on concepts rather than
mathematics. Many examples are given, with a liberal use of color graphics. It should be a
valuable resource for statisticians and anyone interested in data mining in science or industry.
The book's coverage is broad, from supervised learning (prediction) to unsupervised learning.
The many topics include neural networks, support vector machines, classification trees and
boosting (the first comprehensive treatment of this topic in any book).
This major new edition features many topics not covered in the original, including graphical
models, random forests, ensemble methods, least angle regression & path algorithms for the lasso,
non-negative matrix factorization and spectral clustering. There is also a chapter on methods for
"wide" data (p bigger than n), including multiple testing and false discovery rates.
http://statweb.stanford.edu/~tibs/ElemStatLearn/
Learning Deep Architectures for AI by Yoshua Bengio, 2009
Abstract
Theoretical results suggest that in order to learn the kind of complicated functions that can
represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need
deep architectures. Deep architectures are composed of multiple levels of non-linear operations,
such as in neural nets with many hidden layers or in complicated propositional formulae reusing
many sub-formulae. Searching the parameter space of deep architectures is a difficult task,
but learning algorithms such as those for Deep Belief Networks have recently been proposed to
tackle this problem with notable success, beating the state-of-the-art in certain areas. This
monograph discusses the motivations and principles regarding learning algorithms for deep
architectures, in particular those exploiting as building blocks unsupervised learning of single-layer
models such as Restricted Boltzmann Machines, used to construct deeper models such as
Deep Belief Networks.
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
An Introduction to Information Retrieval by Christopher D.
Manning Prabhakar Raghavan Hinrich Schütze, 2009
This book is the result of a series of courses we have taught at Stanford University and at the
University of Stuttgart, in a range of durations including a single quarter, one semester and two
quarters. These courses were aimed at early-stage graduate students in computer science, but we
have also had enrollment from upper-class computer science undergraduates, as well as students
from law, medical informatics, statistics, linguistics and various engineering disciplines. The key
design principle for this book, therefore, was to cover what we believe to be important in a
one-term graduate course on information retrieval. An additional principle is to build each chapter
around material that we believe can be covered in a single lecture of 75 to 90 minutes.
The first eight chapters of the book are devoted to the basics of information retrieval, and in
particular the heart of search engines; we consider this material to be core to any course on
information retrieval.
…
Chapters 9 through 21 build on the foundation of the first eight chapters to cover a variety of more
advanced topics.
http://www-nlp.stanford.edu/IR-book/
http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf
2008
Kernel Methods in Machine Learning by Thomas Hofmann,
Bernhard Schölkopf and Alexander J. Smola, 2008
We review machine learning methods employing positive definite kernels. These methods
formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of
functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of
functions has the benefit of facilitating the construction and analysis of learning algorithms while
at the same time allowing large classes of functions. The latter include nonlinear functions as well
as functions defined on nonvectorial data. We cover a wide range of methods, ranging from
binary classifiers to sophisticated methods for estimation with structured data.
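As a flavor of the RKHS machinery the survey covers, kernel ridge regression can be written entirely in terms of a positive definite kernel; a minimal sketch (the toy data, hyperparameters, and function names are ours, not from the paper), using a Gaussian RBF kernel on synthetic 1-D data:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Positive definite Gaussian kernel k(x, y) = exp(-gamma * (x - y)^2)."""
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

# Synthetic 1-D regression problem: noisy samples of sin(x)
rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

# Kernel ridge regression: alpha = (K + lam*I)^{-1} y
lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Prediction is a kernel expansion f(x) = sum_i alpha_i k(x_i, x),
# i.e. a function living in the RKHS induced by the kernel.
X_test = np.array([np.pi / 2])
f_test = rbf_kernel(X_test, X) @ alpha   # close to sin(pi/2) = 1
```

Because both fitting and prediction only ever touch the data through kernel evaluations, the same code works unchanged for nonvectorial data once a suitable positive definite kernel is supplied.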
https://archive.org/details/arxiv-math0701907
Introduction to Machine Learning, Alex Smola, S.V.N.
Vishwanathan, 2008
Over the past two decades Machine Learning has become one of the mainstays of information
technology and with that, a rather central, albeit usually hidden, part of our life. With the ever
increasing amounts of data becoming available there is good reason to believe that smart data
analysis will become even more pervasive as a necessary ingredient for technological progress.
The purpose of this chapter is to provide the reader with an overview of the vast range of
applications which have at their heart a machine learning problem and to bring some degree of
order to the zoo of problems. After that, we will discuss some basic tools from statistics and
probability theory, since they form the language in which many machine learning problems must
be phrased to become amenable to solving. Finally, we will outline a set of fairly basic yet
effective algorithms to solve an important problem, namely that of classification. More
sophisticated tools, a discussion of more general problems and a detailed analysis will follow in
later parts of the book.
http://alex.smola.org/drafts/thebook.pdf
2006
Pattern Recognition and Machine Learning, Christopher M.
Bishop, 2006
Pattern recognition has its origins in engineering, whereas machine learning grew out of
computer science. However, these activities can be viewed as two facets of the same field, and
together they have undergone substantial development over the past ten years. In particular,
Bayesian methods have grown from a specialist niche to become mainstream, while graphical
models have emerged as a general framework for describing and applying probabilistic models.
Also, the practical applicability of Bayesian methods has been greatly enhanced through the
development of a range of approximate inference algorithms such as variational Bayes and
expectation propagation. Similarly, new models based on kernels have had significant impact on
both algorithms and applications.
Chapter 8
Graphical Models
Probabilities play a central role in modern pattern recognition. We have seen in Chapter 1 that
probability theory can be expressed in terms of two simple equations corresponding to the sum
rule and the product rule. All of the probabilistic inference and learning manipulations
discussed in this book, no matter how complex, amount to repeated application of these two
equations. We could therefore proceed to formulate and solve complicated probabilistic models
purely by algebraic manipulation. However, we shall find it highly advantageous to augment the
analysis using diagrammatic representations of probability distributions, called probabilistic
graphical models. These offer several useful properties:
1. They provide a simple way to visualize the structure of a probabilistic model and can be used
to design and motivate new models.
2. Insights into the properties of the model, including conditional independence properties, can
be obtained by inspection of the graph.
3. Complex computations, required to perform inference and learning in sophisticated models,
can be expressed in terms of graphical manipulations, in which underlying mathematical
expressions are carried along implicitly.
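The two rules Bishop refers to are easy to verify numerically on a toy discrete joint distribution; a minimal sketch (the distribution is made up purely for illustration):

```python
import numpy as np

# A small discrete joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.10, 0.30],
                 [0.25, 0.35]])

# Sum rule: p(x) = sum_y p(x, y)  -- marginalize out y.
p_x = p_xy.sum(axis=1)               # [0.40, 0.60]

# Product rule: p(x, y) = p(y | x) * p(x).
p_y_given_x = p_xy / p_x[:, None]    # conditional distribution over y for each x
reconstructed = p_y_given_x * p_x[:, None]

print(np.allclose(reconstructed, p_xy))  # True: the product rule recovers the joint
```

Every inference step in a graphical model, however complex, reduces to repeated applications of these two array operations.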
http://research.microsoft.com/en-us/um/people/cmbishop/PRML/pdf/Bishop-PRMLsample.pdf
http://research.microsoft.com/en-us/um/people/cmbishop/prml/
Gaussian processes for Machine Learning, C. Rasmussen and C.
Williams, 2006
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in
kernel machines. GPs have received increased attention in the machine-learning community over
the past decade, and this book provides a long-needed systematic and unified treatment of
theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive
and self-contained, targeted at researchers and students in machine learning and applied
statistics. The book deals with the supervised-learning problem for both regression and
classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions
are presented and their properties discussed. Model selection is discussed both from a Bayesian
and a classical perspective. Many connections to other well-known techniques from machine
learning and statistics are discussed, including support-vector machines, neural networks, splines,
regularization networks, relevance vector machines and others. Theoretical issues including
learning curves and the PAC-Bayesian framework are treated, and several approximation
methods for learning with large datasets are discussed. The book contains illustrative examples
and exercises, and code and datasets are available on the Web. Appendixes provide mathematical
background and a discussion of Gaussian Markov processes.
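For a taste of the book's central computation, the GP regression posterior at a test point needs only a covariance function and a linear solve; a minimal sketch following the standard equations of Rasmussen & Williams, Ch. 2 (the data, length-scale, and noise level below are our own choices):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential covariance k(a, b) = exp(-(a - b)^2 / (2 * ell^2))."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

# Noisy observations of sin(x)
rng = np.random.default_rng(1)
X = np.linspace(0, 2 * np.pi, 20)
y = np.sin(X) + 0.05 * rng.standard_normal(20)
noise = 0.05 ** 2

# GP posterior at test inputs Xs:
#   mean = K_*^T (K + sigma^2 I)^{-1} y
#   var  = K_** - K_*^T (K + sigma^2 I)^{-1} K_*
K = rbf(X, X) + noise * np.eye(len(X))
Xs = np.array([np.pi])
Ks = rbf(X, Xs)
mean = Ks.T @ np.linalg.solve(K, y)                 # close to sin(pi) = 0
var = rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)   # small near the data
```

Model selection in the GP setting then amounts to choosing `ell` and `noise`, e.g. by maximizing the marginal likelihood, which the book derives in detail.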
http://www.gaussianprocess.org/gpml/chapters/
2005
Bayesian Machine Learning by Chakraborty, Sounak, 2005
PhD Thesis
https://archive.org/details/bayesianmachinel00chak
Machine Learning by Tom Mitchell, 2005
Policy on use: You are welcome to download these chapters for your personal use, or for use in
classes you teach. In return, I ask only two things:
• Please do not re-post these documents on the internet. If you wish to make them available
to your students, point them directly to this site.
• If you find errors please send me email at Tom.Mitchell@cmu.edu
I hope you find these useful!
Tom Mitchell
http://www.cs.cmu.edu/~tom/NewChapters.html
http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html
2003
Information Theory, Inference, and Learning Algorithms, David
MacKay, 2003
This book is aimed at senior undergraduates and graduate students in Engineering, Science,
Mathematics, and Computing. It expects familiarity with calculus, probability theory, and linear
algebra as taught in a first- or second-year undergraduate course on mathematics for scientists
and engineers.
Conventional courses on information theory cover not only the beautiful theoretical ideas of
Shannon, but also practical solutions to communication problems. This book goes further,
bringing in Bayesian data modelling, Monte Carlo methods, variational methods, clustering
algorithms, and neural networks.
Why unify information theory and machine learning? Because they are two sides of the same
coin. In the 1960s, a single field, cybernetics, was populated by information theorists, computer
scientists, and neuroscientists, all studying common problems. Information theory and machine
learning still belong together. Brains are the ultimate compression and communication systems.
And the state-of-the-art algorithms for both data compression and error-correcting codes use the
same tools as machine learning.
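As a taste of the information-theory side, Shannon entropy gives the average number of bits an ideal compressor needs per symbol; a minimal sketch (the example distributions are ours, not MacKay's):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits per symbol."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A fair coin carries a full bit per flip; a biased coin carries less,
# which is exactly why its flips are more compressible.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # about 0.469
```

The same quantity reappears on the learning side of the book, e.g. as the expected log loss of a probabilistic model.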
http://www.inference.phy.cam.ac.uk/itprnn/book.html
https://archive.org/details/MackayInformationTheoryFreeEbookReleasedByAuthor
MISCELLANEOUS
Free Book List
E-Books for free online viewing and/or download
http://www.e-booksdirectory.com/listing.php?category=284
Free resource book (need to sign in)
There are too many machine learning resources on the internet, so much so that it can feel
overwhelming.
I have read the books and taken the courses and can give you good advice on where to start.
Resources you can use to learn faster
I have hand-picked the best machine learning:
• books
• websites
• videos
• university courses
• software
• competition sites
These resources have been listed in a handy PDF that you can download now
http://machinelearningmastery.com/machine-learning-resources/
Wikipedia: Machine Learning, the Complete Guide
This is a Wikipedia book, a collection of Wikipedia articles that can be easily saved, rendered
electronically, and ordered as a printed book. For information and help on Wikipedia books in
general, see Help:Books (general tips) and WikiProject Wikipedia-Books (questions and
assistance).
https://en.wikipedia.org/wiki/Book:Machine_Learning_%E2%80%93_The_Complete_Guide
ISSUU
Rediscover reading
With over 19 million publications, Issuu is the fastest growing digital publishing platform in the
world. Millions of avid readers come here every day to read the free publications created by
enthusiastic publishers from all over the globe with topics in fashion, lifestyle, art, sports and
global affairs to mention a few. And that's not all. We've also got a prominent range of
independent publishers utilizing the Issuu network to reach new fans every day.
Created by a bunch of geeks with an undying love for the publishing industry, Issuu has grown to
become one of the biggest publishing networks in the industry. It's an archive, library and
newsstand all gathered in one reading experience.
http://issuu.com/search?q=%22machine%20learning%22
Neural Networks, A Systematic Introduction by Raul Rojas
We are now beginning to see good textbooks for introducing the subject to various
student groups. This book by Raúl Rojas is aimed at advanced undergraduates in
computer science and mathematics. This is a revised version of his German text which has
been quite successful. It is also a valuable self-instruction source for professionals interested in the
relation of neural network ideas to theoretical computer science and articulating disciplines.
The book is divided into eighteen chapters, each designed to be taught in about one week. The
first eight chapters follow a progression and the later ones can be covered in a variety of orders.
The emphasis throughout is on explicating the computational nature of the structures and
processes and relating them to other computational formalisms. Proofs are rigorous, but not
overly formal, and there is extensive use of geometric intuition and diagrams. Specific
applications are discussed, with the emphasis on computational rather than engineering issues.
There is a modest number of exercises at the end of most chapters.
http://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/documents/1996/NeuralNetworks/neuron.pdf
BOOKS, in Spanish
Coming soon …
BOOKS, in Portuguese
Coming soon …
BOOKS, in German
Coming soon …
BOOKS, in Italian
Coming soon …
BOOKS, in French
Coming soon …
BOOKS, in Russian
Pattern Recognition by А.Б.Мерков, 2014
http://www.recognition.mccme.ru/pub/RecognitionLab.html/slbook.pdf
Algorithmic models of learning classification: rationale,
comparison, selection, 2014
http://www.machinelearning.ru/wiki/images/c/c3/Donskoy14algorithmic.pdf
More coming soon …
BOOKS, in Japanese
Coming soon …
BOOKS, in Chinese
Blog recommending useful books
A blog written in Chinese which introduces and recommends many useful ML books (the books
are mostly written in English).
http://blog.csdn.net/pongba/article/details/2915005
Textbook for Statistics
http://baike.baidu.com/subview/1724467/13114186.htm
Introduction to Pattern recognition
http://baike.baidu.com/view/3911812.htm
Translated version of Machine Learning by Tom Mitchell
http://book.douban.com/subject/1102235/
Presentation, Infographics and
Documents in English
Meetup's Presentations
https://skillsmatter.com/explore?content=skillscasts&location=&q=machine%20learning
Slideshare.com
http://www.slideshare.net/search/slideshow?searchfrom=header&q=machine%20learning
Slides.com
http://slides.com/explore?search=machine%20learning
Powershow.com
http://www.powershow.com/search/presentations/machine-learning
Speaker Deck
https://speakerdeck.com/search?q=machine%20learning
Introduction to Artificial Intelligence, 2014, University of
Waterloo
https://cs.uwaterloo.ca/~ppoupart/teaching/cs486-spring15/
Aprendizado de Máquina: Conceitos e Definições (Machine Learning:
Concepts and Definitions) by Jose Augusto Baranauskas
http://dcm.ffclrp.usp.br/~augusto/teaching/ami/AM-I-Conceitos-Definicoes.pdf
Aprendizado de Máquina (Machine Learning) by Bianca Zadrozni, Instituto de
Computação, UFF, 2010
http://www2.ic.uff.br/~bianca/aa/
NYC ML Meetup, 2014
Natural Language Processing in Investigative Journalism by Jonathan Stray
http://www.scribd.com/doc/230605794/Natural-Language-Processing-in-Investigative-Journalism
Statistics with Doodles by Thomas Levine
https://thomaslevine.com/!/statistics-with-doodles-2014-03/
Conferences
ICML, Lille, France 2015
http://icml.cc/2015/
ICML, Beijing, China 2014
http://icml.cc/2014/
ICML, Atlanta, US 2013
http://icml.cc/2013/
http://techtalks.tv/icml/2013/
ICML, Edinburgh, UK 2012
http://icml.cc/2012/
http://techtalks.tv/icml/2012/orals/
http://techtalks.tv/icml 2012 representation learning/
http://techtalks.tv/icml/2012/inferning2012/
http://techtalks.tv/icml/2012/object2012/
http://techtalks.tv/icml/2012/icml_colt_2012_tutorials/icml-2012-tutorial-on-prediction-belief-and-market/
ICML, Bellevue, US 2011
http://www.icml-2011.org/
http://techtalks.tv/icml-2011/
Full archive of ICML
http://machinelearning.org/icml.html
Machine Learning Conference Videos
http://techtalks.tv/search/results/?q=machine%20learning
Annual Machine Learning Symposium
6th
http://techtalks.tv/sixth-annual-machine-learning-symposium/
8th
http://www.nyas.org/Events/Detail.aspx?cid=2cc3521e-408a-460e-b159-e774734bcbea
Archive
http://www.nyas.org/whatwedo/fos/machine.aspx
MLSS Machine Learning Summer Schools
http://www.mlss.cc/
Data Gotham 2012, 2013
https://www.youtube.com/user/DataGotham
Meetup
1,380 Machine Learning Meetups around the World
http://machine-learning.meetup.com/
Data Science Weekly – List of Meetups
List of Data Science Meetups: NYC, San Francisco, Washington DC, Boston, Chicago, Seattle,
Denver, Austin, Atlanta, Toronto, Vancouver, London, Berlin, Paris, Amsterdam, Tel Aviv,
Dubai, Delhi, Bangalore, Singapore, Sydney
http://www.datascienceweekly.org/data-science-resources/data-science-meetups
London Machine Learning Meetup
http://www.meetup.com/London-Machine-Learning-Meetup/
BLOGS, in English
Igor Carron Blog
Nuit Blanche is a blog that focuses on Compressive Sensing, Advanced Matrix Factorization
Techniques, Machine Learning as well as many other engaging ideas and techniques needed to
handle and make sense of very high dimensional data also known as Big Data.
http://nuit-blanche.blogspot.co.uk/
Data Science Weekly
The Data Science Weekly Blog contains interviews to better understand how people are using
Data and Data Science to change the world.
http://www.datascienceweekly.org/blog
Yann LeCun, Google+
My main research interests are Machine Learning, Computer Vision, Mobile Robotics, and
Computational Neuroscience. I am also interested in Data Compression, Digital Libraries, the
Physics of Computation, and all the applications of machine learning (Vision, Speech, Language,
Document understanding, Data Mining, Bioinformatics).
https://plus.google.com/+YannLeCunPhD/posts
KDD Community, Knowledge discovery and Data Mining
KDD brings together the data mining, data science and analytics community.
http://www.sigkdd.org/blog
Kaggle Blog
http://blog.kaggle.com/
Digg
Digg is a news aggregator with an editorially driven front page, aiming to select stories
specifically for the Internet audience such as science, trending political issues, and viral Internet
issues. (Source: Wikipedia)
http://digg.com/search?q=machine%20learning
Feedly
Found a site you like? Use the feedly button to add it to your feedly reading list
http://feedly.com/i/explore/%23Machine%20Learning
Mlwave
Learning Machine Learning
ML Wave is a platform that talks about machine learning and data science. It was founded in
2014 by the Dutch Kaggle user Triskelion.
http://mlwave.com/
FastML
Machine Learning made easy
FastML probably grew out of a frustration with papers you need a PhD in math to understand
and with either no code or half-baked Matlab implementations of homework-assignment quality.
We understand that some cutting-edge researchers might have no interest in providing the
goodies for free, or just no interest in such down-to-earth matters. But we have neither the time nor the
desire to become experts in every machine learning topic. Fortunately, there is quite a lot of good
software with acceptable documentation.
http://fastml.com/
Beating the Benchmark
http://beatingthebenchmark.blogspot.co.uk/
Trevor Stephens Blog
http://trevorstephens.com/
Mozilla Hacks
Mozilla Hacks is one of the key resources for people developing for the Open Web, talking about
news and in-depth descriptions of technologies and features.
https://hacks.mozilla.org/?s=machine%20learning
Banach's Algorithmic Corner, University of Warsaw
This blog is maintained by members of the algorithms group at the University of Warsaw:
http://corner.mimuw.edu.pl/
DataCamp Blog
http://blog.datacamp.com/
Natural Language Processing Blog, Hal Daumé III
http://nlpers.blogspot.co.uk/
Maxim Milakov Blog
I am a researcher in machine learning and high-performance computing.
I designed and implemented nnForge - a library for training convolutional and fully connected
neural networks, with CPU and GPU (CUDA) backends.
You will find my thoughts on convolutional neural networks and the results of applying
convolutional ANNs for various classification tasks in the Blog.
http://www.milakov.org/
Alfonso Nieto-Castanon Blog
I work in the field of computational neuroscience, and my background is in neuroscience (Ph.D.
Cognitive and Neural Systems, Boston University) and engineering (B.S./M.S.
Telecommunication Engineering, Universidad de Valladolid). My areas of specialization are
modeling and statistics, fMRI analysis methods, and signal processing.
http://www.alfnie.com/home
Persontyle Blog
Every object on earth is generating data, including our homes, our cars and yes even our bodies.
Data is the by-product of our new digital existence.
Data has the potential to revolutionize the way business, government, science, research, and
healthcare are carried out. Data presents unprecedented opportunities to those who have the
skills and expertise to use it to unveil patterns, insights and signals, and to predict trends in ways
that were never possible before.
In a massively connected, data-driven world, it is imperative that the workforce of today and
tomorrow is able to understand what data is available and use scientific methods to analyze and
interpret it.
We’re here to help you learn and apply the art and science of turning data into meaningful
insights and intelligent predictions.
http://www.persontyle.com/blog/
Analytics Vidhya
Learn everything about Analytics
Welcome to Analytics Vidhya!
For those of you who are wondering what "Analytics Vidhya" is: "Analytics" can be defined as
the science of extracting insights from raw data. The spectrum of analytics starts from capturing
data and evolves into using insights / trends from this data to make informed decisions. “Vidhya”
on the other hand is a Sanskrit noun meaning “Knowledge” or “Clarity on a subject”.
Knowledge, which has been gained through reading literature or through self practice /
experimentation.
Through this blog, I want to create a passionate community, which dedicates itself in study of
Analytics. I share my learning and tips on Analytics through this blog.
http://www.analyticsvidhya.com/blog/
Bugra Akyildiz Blog
Great Blog (Notes) both theoretical and practical
I work as a Machine Learning/NLP Engineer at CB Insights, where I apply machine learning
algorithms to NLP problems. I received my B.S. from Bilkent University and my M.Sc. from New
York University, focusing on signal processing and machine learning.
http://bugra.github.io/
Rasbt Blog
A collection of tutorials and examples for solving and understanding machine learning and
pattern classification tasks
Links to useful resources
https://github.com/rasbt/pattern_classification#links-to-useful-resources
Gilles Louppe Blog
Understanding Random Forest, PhD Thesis
https://github.com/glouppe/phd-thesis
AI Topics
AITopics is a mediated information portal provided by AAAI (The Association for the
Advancement of Artificial Intelligence), with the goal of communicating the science and
applications of AI to interested people around the world.
http://aitopics.org/
AI International
This international AI site is designed to help you locate AI research efforts in your country or
region. Pages on this site will link to local AI societies, universities, labs, and other research
efforts.
http://www.aiinternational.org/index.html
Joseph Misiti Blog
machine-learning
applied mathematics
django
hadoop. Co-Founder of @socialq.
https://github.com/josephmisiti
https://medium.com/@josephmisiti
MIRI, Machine Intelligence Research Institute
The mathematics of safe machine intelligence
MIRI’s mission is to ensure that the creation of smarter-than-human intelligence has a positive
impact. We aim to make intelligent machines behave as we intend even in the absence of
immediate human supervision. Much of our current research deals with reflection, an AI’s ability
to reason about its own behavior in a principled rather than ad-hoc way. We focus our research
on AI approaches that can be made transparent (e.g. principled decision algorithms, not genetic
algorithms), so that humans can understand why the AIs behave as they do.
https://intelligence.org/blog/
Kevin Davenport Data Blog
I'm a tech enthusiast interested in automation, machine learning, and conveying complex
statistical models through visualization.
http://kldavenport.com/
Alexandre Passant Blog
I'm a hacker, researcher, and entrepreneur. I'm passionate about the Web and I love when smart
algorithms and architectures power beautiful and useful products.
I'm co-founder of MDG Web (http://mdg.io), a music-tech start-up based in Dogpatch Labs
Dublin and focusing on the music discovery field. We're building seevl (http://seevl.fm), a free,
unlimited and targeted music discovery platform available as a standalone app and a Deezer app.
We also work with industry stakeholders to let them promote their content on streaming platforms
through their own branded apps.
I was previously a Research Fellow and Unit Leader at DERI (http://deri.ie), the world's largest
Web 3.0 R&D lab, leading high-impact projects with partners such as Google, Cisco, and more,
on the Social / Semantic / Sensor Web, with a focus on Knowledge Representation and
Management, Personalisation, Privacy, Distributed Systems, and Recommender Systems.
Overall, I’m trying to make the Web a better place, and I’m having fun doing it.
http://apassant.net/
Daniel Nouri Blog
Using convolutional neural nets to detect facial keypoints tutorial, Daniel Nouri's Blog
This is a hands-on tutorial on deep learning. Step by step, we'll go about building a solution for
the Facial Keypoint Detection Kaggle challenge. The tutorial introduces Lasagne, a new library
for building neural networks with Python and Theano. We'll use Lasagne to implement a couple
of network architectures, talk about data augmentation, dropout, the importance of momentum,
and pre-training. Some of these methods will help us improve our results quite a bit.
I'll assume that you already know a fair bit about neural nets. That's because we won't talk about
much of the background of how neural nets work; there are a few good books and videos for
that, like the Neural Networks and Deep Learning online book. Alec Radford's talk Deep
Learning with Python's Theano library is a great quick introduction. Make sure you also check
out Andrej Karpathy's mind-blowing ConvNetJS Browser Demos.
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Yvonne Rogers Blog
Yvonne Rogers is a Professor of Interaction Design, the director of UCLIC and a deputy head
of the Computer Science department at UCL. Her research interests are in the areas of
ubiquitous computing, interaction design and human-computer interaction. A central theme is
how to design interactive technologies that can enhance life by augmenting and extending
everyday, learning and work activities. This involves informing, building and evaluating novel
user experiences through creating and assembling a diversity of pervasive technologies.
http://www.interactiveingredients.com/
Igor Subbotin Blog (Both in English & Russian)
153 followers, 56,448 views (02-Jan-2015)
http://igorsubbotin.blogspot.ru/
Sebastian Raschka GitHub Repository & Blog (Great Resources,
everything you need is there!)
https://github.com/rasbt
http://sebastianraschka.com/
Popular Science Website
http://www.popsci.com/find/machine%20learning
HOW MICROSOFT'S MACHINE LEARNING IS
BREAKING THE GLOBAL LANGUAGE BARRIER
Earlier this week, roughly 50,000 Skype users woke up to a new way of communicating over the
Web-based phone- and video-calling platform, a feature that could’ve been pulled straight out of
Star Trek. The new function, called Skype Translator, translates voice calls between different
languages in real time, turning English to Spanish and Spanish back into English on the fly. Skype
plans to incrementally add support for more than 40 languages, promising nothing short of a
universal translator for desktops and mobile devices.
The product of more than a decade of dedicated research and development by Microsoft
Research (Microsoft acquired Skype in 2011), Skype Translator does what several other Silicon
Valley icons, not to mention the U.S. Department of Defense, have not yet been able to do. To
do so, Microsoft Research (MSR) had to solve some major machine learning problems while
pushing technologies like deep neural networks into new territory.
http://www.popsci.com/how-microsofts-machine-learning-breaking-language-barrier
Max Woolf Blog
Max Woolf is a Software QA Engineer who has been living and working in the San Francisco Bay
Area for over 2 years. He graduated from Carnegie Mellon University in 2012 with a degree in Business
Administration, concentrating in Computing and Information Technology.
In his spare time, Max uses Python to gather data from public APIs and ggplot2 to make pretty
charts from that data. Max also comments on technology blogs rather frequently.
http://minimaxir.com/
Rasmus Bååth Research Blog
I’m a phd student at Lund University Cognitive Science in Sweden. My main research interest is
music cognition and especially rhythm perception and production. I’m also interested in statistics
and statistical computing using R. My blog is syndicated on R-bloggers and StatsBlogs, two great
sites if you are interested in R and statistics. Everything published on my blog is licensed under a
Creative Commons Attribution 4.0 International License.
I also run a drinks blog over at groggbloggen.se; it's in Swedish but focuses on minimalist drinks
with only two ingredients (which are called grogs in Sweden), so you should be able to figure it
out! :)
I believe that if you haven’t tried using Bayesian statistics you’re really missing out on something.
Why not do some Bayesian statistics right now in the browser and try my Bayesian “t-test” demo
featuring MCMC in javascript!
http://www.sumsar.net/
Flowing Data Blog
About
The greatest value of a picture is when it forces us to notice what we never expected to see.
John W. Tukey. Exploratory Data Analysis. 1977.
FlowingData explores how statisticians, designers, data scientists, and others use analysis,
visualization, and exploration to understand data and ourselves.
As for me, I'm Dr. Nathan Yau, PhD, but you can call me Nathan. My dissertation was on personal data collection and how we can use visualization in the everyday context. That expands to more general types of data, visualization, and design for a growing audience.
I've also written a couple of books on how to visualize data, and the series is growing.
http://flowingdata.com/
Genetic algorithm walkers
http://flowingdata.com/2015/01/16/genetic-algorithm-walkers/
The Shape of Data Blog
About
Whether your goal is to write data intensive software, use existing software to analyze large, high
dimensional data sets, or to better understand and interact with the experts who do these things,
you will need a strong understanding of the structure of data and how one can try to understand
it. On this blog, I plan to explore and explain the basic ideas that underlie modern data analysis
from a very intuitive and minimally technical perspective: by thinking of data sets as geometric
objects.
When I began learning about machine learning and data mining, I found that the intuition I had
formed while studying geometry was extremely valuable in understanding the basic concepts and
algorithms. But in many of the resources I’ve seen, this relatively simple geometry is hidden
behind enough equations and algorithms to intimidate all but the most technically inclined
readers. My goal in writing this blog is to put the geometry first, and show that anyone can gain
an intuitive understanding of modern data analysis.
About the Author: Jesse Johnson is a former math professor, with a research background in low-dimensional geometry/topology, who is now a software engineer at Google in Cambridge, MA.
https://shapeofdata.wordpress.com/
Data School Blog
My name is Kevin Markham, and I'm the co-founder of a small tech-for-good company. I've
been living in the Washington, DC area since 2005. I'm a computer engineer, an avid cook and
theatre-goer, and an occasional triathlete. I teach an 11-week data science course for General
Assembly, mentor data science students for SlideRule, and am a Community Teaching Assistant
for the Johns Hopkins University Data Science Specialization. In my spare time, I create
educational videos and compete in Kaggle competitions.
I created this blog because I love writing about data science topics, especially for people new to
the field. I've found that most data science resources are inaccessible to novices, and so I strive to
make my resources as accessible as possible to data scientists at all levels of knowledge and
experience.
http://www.dataschool.io/
Julia Evans Blog
About me (See the website to access the links)
Hi! I’m Julia.
I live in Montreal and work on Stripe’s machine learning team. You can find me elsewhere on the
internet:...
This blog is mostly about having fun with systems programming, with lots of forays into other
areas. There’s a list of my favorite posts, as well as some projects I’ve worked on.
I spent the fall of 2013 at Hacker School, which houses the best programming community I’ve
seen anywhere. I wrote down what I did every day while there, if you want to know what it’s like.
In the last year or two I’ve discovered that I like organizing community events and giving talks
about programming. A few things I’ve worked on:
Montreal All-Girl Hack Night with my awesome friend Monica
PyLadies Montreal.
!!Con, a 2-day conference about what excites us about programming, where all the talks are
lightning talks (with several amazing people)
http://jvns.ca/
http://nbviewer.ipython.org/github/jvns/talks/blob/master/pydatanyc2013/PyData%20NYC%202013%20tutorial.ipynb
Stephan Hügel's Blog
About
My name’s Stephan Hügel, and I’m a doctoral researcher at UCL CASA.
My main research interest is in computational municipal infrastructure. In short: I’m studying
ways for cities to make more of their infrastructure available to people who live in them, in both
human and machine-readable ways. Currently, this infrastructure takes the form of data stores
and sensor platforms. If in doubt, put an API on it.
http://sensitivecities.com/
Visualising London Bike Hire Journey Lengths with Python and OSRM by Stephan
Hügel
London’s Cycle Hire scheme has been a roaring success and continues to grow, with new stations
being added all the time. This tutorial will produce a visualisation of journey times from the
central point (well, approximately) of the bike station network to all other stations.
This is made possible by the provision of an open-access instance of OSRM by the lovely people
at Mapzen. I won’t spend too much time on what OSRM is or how it works; suffice to say that it’s
an open-source routing engine that uses OpenStreetmap, and that the Mapzen instance provides
walking, cycling, and public transit routing data via HTTP. Hurrah!
The code for this tutorial is available here as an IPython Notebook
http://sensitivecities.com/bikeshare.html#.Vbe-Tiqqqkp
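For a sense of what the routing calls look like: OSRM's HTTP API takes start and end coordinates in the URL path and returns JSON with per-route durations. The sketch below is my own illustration, not the tutorial's code; the endpoint, coordinates, and canned response are placeholders (the Mapzen instance the tutorial used is not guaranteed to still exist), so the parsing is shown offline without any network call.

```python
import json
from urllib.parse import urlencode

# Placeholder OSRM-style endpoint; swap in whatever instance you actually have access to.
BASE = "https://router.project-osrm.org/route/v1/cycling"

def route_url(start, end):
    """Build an OSRM route request: coordinates go in the path as lon,lat pairs."""
    coords = "{},{};{},{}".format(start[1], start[0], end[1], end[0])
    return "{}/{}?{}".format(BASE, coords, urlencode({"overview": "false"}))

def journey_seconds(response_text):
    """Pull the duration (in seconds) of the first route out of an OSRM JSON reply."""
    body = json.loads(response_text)
    return body["routes"][0]["duration"]

# A canned reply in the shape OSRM returns, so the parsing runs without a network call:
sample = '{"code": "Ok", "routes": [{"duration": 942.3, "distance": 4120.7}]}'
url = route_url((51.5073, -0.1276), (51.5155, -0.0922))  # central London, roughly
seconds = journey_seconds(sample)
```

In the tutorial this request is issued once per bike station, giving the journey time from the central station to every other station in the network.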
So You’d Like To Make a Map Using Python by Stephan Hügel
Making thematic maps has traditionally been the preserve of a ‘proper’ GIS, such as ArcGIS or
QGIS. While these tools make it easy to work with shapefiles, and expose a range of common
everyday GIS operations, they aren’t particularly well-suited to exploratory data analysis. In
short, if you need to obtain, reshape, and otherwise wrangle data before you use it to make a
map, it’s easier to use a data analysis tool (such as Pandas), and couple it to a plotting library. This
tutorial will be demonstrating the use of:
• Pandas
• Matplotlib
• The matplotlib Basemap toolkit, for plotting 2D data on maps
• Fiona, a Python interface to OGR
• Shapely, for analyzing and manipulating planar geometric objects
• Descartes, which turns said geometric objects into matplotlib “patches”
• PySAL, a spatial analysis library
The approach I’m using here uses an interactive REPL (IPython Notebook) for data exploration
and analysis, and the Descartes package to render individual polygons (in this case, wards in
London) as matplotlib patches, before adding them to a matplotlib axes instance. I should stress
that many of the plotting operations could be more quickly accomplished, but my aim here is to
demonstrate how to precisely control certain operations, in order to achieve e.g. the precise line
width, colour, alpha value or label position you want.
http://sensitivecities.com/so-youd-like-to-make-a-map-using-python-EN.html#.Vbe-cyqqqkp
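The patch-rendering step at the heart of the tutorial can be sketched without any GIS dependencies at all. This simplified stand-in (toy hard-coded "wards" instead of shapefile geometry read via Fiona) shows the same mechanics: build matplotlib Polygon patches, map one value per polygon onto a colour ramp, and add them all to an axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render to a file, no display needed
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

# Toy "wards" as coordinate rings; in the tutorial these come from a shapefile via Fiona.
wards = {
    "Ward A": [(0.0, 0.0), (2.0, 0.0), (2.0, 1.5), (0.0, 1.5)],
    "Ward B": [(2.0, 0.0), (3.5, 0.0), (3.5, 1.5), (2.0, 1.5)],
}
density = {"Ward A": 0.3, "Ward B": 0.8}  # the value the colour ramp will encode

fig, ax = plt.subplots()
patches = [Polygon(ring, closed=True) for ring in wards.values()]
collection = PatchCollection(patches, cmap="Blues", edgecolor="black", linewidth=0.5)
collection.set_array(np.array([density[name] for name in wards]))  # drives the colormap
ax.add_collection(collection)
ax.autoscale_view()      # fit the axes to the patches we just added
ax.set_aspect("equal")   # don't distort the geometry
fig.savefig("wards.png", dpi=100)
```

Descartes plays the Polygon-construction role in the tutorial proper, turning Shapely geometries into patches; the collection-and-colormap plumbing is the same.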
BACKCHANNEL "Tech Stories Hub" by Steven Levy
I’m moving to Medium
Creating a new hub for tech stories that matter
For more than 30 years, I’ve been telling the true and truly jaw-dropping stories of the people
who are changing the world with tech, and I’ve been extremely lucky in finding homes for my
work. I first began writing about the subject for Rolling Stone, a magazine I idolized ever since
my high school years. My twelve years at Newsweek provided an amazing front row seat to the
dawn of the Internet era. And Wired, where I’ve been full time for the last six years after freelancing for the magazine since its birth, is the gold standard of reporting on the parts of the
world where the future is already distributed. Now, after hanging out at great startups since
forever, I’m finally joining one, hoping to create a chunk of that future myself.
https://medium.com/backchannel
DataScience Vegas
DataScience.Vegas is a blog that acts as a home for all the content (slides, code, videos) for several
data science meetups in Las Vegas including Data Science LV, R, DataVis, and Python-Data
Science. The blog is run by Data Science Las Vegas (DSLV), a non-profit professional group that
brings together people interested in data science in the Las Vegas community. We are a
community of data scientists, data miners, statisticians, data analysts, data engineers, data
visualizers, data journalists, academics, researchers, and in general people directly involved in
data projects.
http://datascience.vegas/blog/
The Twitter Developer Blog
Your source for new features, best practices and real-world use of the Twitter Platform.
https://blog.twitter.com/developer
Tyler Neylon Blog
Coder in C, js, Python, Obj-C; math for fun. Currently building a game called Apanga.
http://tylerneylon.com/
Victor Powell Blog
Freelance data visualization and visual explanation.
http://blog.vctr.me/
CrowdFlower Blog
http://www.crowdflower.com/blog
Edward Raff Blog
I work as a Computer Scientist and specialize in the area of Machine Learning. In my spare time
I maintain a large open source project for Machine Learning in Java.
http://jsatml.blogspot.co.uk/
Dirk Gorissen Blog and Projects
Academic who crossed over to the dark side. Research engineer, dabbling in everything from
autonomous systems and data science, to machine learning and computational engineering.
Organiser of @bigoldn and @deeplearningldn. Tech4Good enthusiast.
http://dirkgorissen.com/blog/
http://dirkgorissen.com/projects/
"How data, Python, and you can help 22.3 million people in Tanzania (almost half the
population) get better access to clean water." - based on Dirk's recent travels to Tanzania working
on a real-world data science problem
http://www.slideshare.net/dgorissen/data-for-good-38663284
http://www.meetup.com/PyData-London-Meetup/events/201507442/
Joseph Jacobs Homepage & Blog
Hey there!
My name is Joseph Jacobs (or Joe for short). I was born and raised in Kajang, Malaysia. I am
currently pursuing a PhD in Computer Science at University College London. I have a BSc in
Computer Science from the University of Bristol and an MSc in Machine Learning from
University College London. I am a Mac user, a Manchester United fan and a general all-round
geek.
https://joejacobs.me/
http://joejacobs.org/
MISCELLANEOUS
Allen Institute for Artificial Intelligence (AI2)
MISSION
The core mission of The Allen Institute for Artificial Intelligence (AI2) is to contribute to
humanity through high-impact AI research and engineering. We will do this by constructing AI
systems with reasoning, learning and reading capabilities. Please see the New York Times Profile
of AI2.
http://allenai.org/index.html
https://www.youtube.com/channel/UCEqgmyWChwvt6MFGGlmUQCQ?spfreload=10
Artificial General Intelligence (AGI) Society
This channel contains videos from the Artificial General Intelligence Society. The AGI Society
organizes a yearly conference and occasional summer school.
Artificial General Intelligence (AGI) is an emerging field aiming at the building of “thinking
machines”; that is, general-purpose systems with intelligence comparable to that of the human
mind (and perhaps ultimately well beyond human general intelligence). While this was the
original goal of Artificial Intelligence (AI), the mainstream of AI research has turned toward
domain-dependent and problem-specific solutions; therefore it has become necessary to use a
new name to indicate research that still pursues the “Grand AI Dream”. Similar labels for this
kind of research include “Strong AI”, “Human-level AI”, etc.
https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA?spfreload=10
http://www.agi-society.org/
AUAI, Association for Uncertainty in Artificial Intelligence
About AUAI
The Association for Uncertainty in Artificial Intelligence is a non-profit organization focused on
organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more
generally, on promoting research in pursuit of advances in knowledge representation, learning
and reasoning under uncertainty. The next UAI conference is the 30th conference, UAI-2015 in
Amsterdam, The Netherlands, on July 12-16, 2015. Join our Facebook group or add yourself to
the UAI Mailing list to keep updated on announcements and relevant AI news.
Principles and applications developed within the UAI community have been at the forefront of
research in Artificial Intelligence. The UAI community and annual meeting have been primary
sources of advances in graphical models for representing and reasoning with uncertainty.
http://www.auai.org/
BLOGS, in Spanish
Coming soon …
BLOGS, in Portuguese
Coming soon …
BLOGS, in Italian
Coming soon …
BLOGS, in German
Coming soon …
BLOGS, in French
L'ATELIER's News
L'Atelier has been BNP Paribas's innovation-watch unit for more than 30 years.
BNP Paribas L'Atelier is established in three major innovation territories (USA, China, Europe) to spot, advise, and support companies.
The unit is built on four activities: the Media arm, which publishes its monitoring across several channels (website, radio, social media); Events, which foster exchange around innovative topics; Digital Strategy Consulting, which places the innovations it detects in the context of companies and industries; and L'Atelier Lab, which brings innovative entrepreneurs and large companies together to help them design new digital products and services.
http://www.atelier.net/search/apachesolr_search/machine%20learning
More coming soon …
BLOGS, in Russian
Igor Subbotin's Blog (Both in English & Russian) (Huge list of
resources)
153 subscribers, 56,448 views (02-Jan-2015)
http://igorsubbotin.blogspot.ru/
More coming soon …
BLOGS, in Japanese
Coming soon …
BLOGS, in Chinese
Coming soon …
JOURNALS, in English
Journal of Machine Learning Research, MIT Press
http://jmlr.org/
Machine Learning Journal (the latest articles can be downloaded for free)
http://link.springer.com/journal/10994
Machine Learning (Theory)
This is an experiment in the application of a blog to academic research in machine learning and
learning theory by John Langford. Exactly where this experiment takes us and how the blog will
turn out to be useful (or not) is one of those prediction problems we so dearly love in machine
learning.
http://hunch.net/
List of Journals on Microsoft Academic Research website
http://academic.research.microsoft.com/RankList?entitytype=4&topDomainID=2&subDomainID=6&last=0&start=1&end=100
Wired magazine
http://www.wired.com/tag/machine-learning/
Data Science Central
Data Science Central is the industry's online resource for big data practitioners. From Analytics
to Data Integration to Visualization, Data Science Central provides a community experience that
includes a robust editorial platform, social interaction, forum-based technical support, the latest
in technology, tools and trends and industry job opportunities.
http://www.datasciencecentral.com
JOURNALS, in Spanish
Coming soon …
JOURNALS, in Portuguese
Coming soon …
JOURNALS, in Italian
Coming soon …
JOURNALS, in German
Coming soon …
JOURNALS, in French
Coming soon …
JOURNALS, in Russian
Coming soon …
JOURNALS, in Japanese
Coming soon …
JOURNALS, in Chinese
Coming soon …
FORUM, Q&A, in English
Data Tau
Hacker News for Data Scientists
A great site with a lot of really good, leading-edge information! It respects users’ privacy by not asking for any personal information or email address!
Remark: machinelearningsalon.org uses the standard forum templates provided by its website hosting system, but looks forward to doing the same as DataTau.com!
http://www.datatau.com/
Hacker News
A great site like datatau.com, though less dedicated to machine learning! It respects users’ privacy by not asking for any personal information or email address!
https://news.ycombinator.com/
Kaggle Forums
44,032 posts in 8,087 topics in 439 forums (source: 4 June 2014)
https://www.kaggle.com/forums
Reddit /r/MachineLearning
News, research papers, videos, lectures, software and discussions on:
• Machine Learning
• Data Mining
• Information Retrieval
• Predictive Statistics
• Learning Theory
• Search Engines
• Pattern Recognition
• Analytics
http://www.reddit.com/r/MachineLearning/
Beginners: Please have a look at our FAQ and Link-Collection
http://www.reddit.com/r/MachineLearning/wiki/index
Reddit /r/generative
Art that has been generated, composed, or constructed in an algorithmic manner through the use
of systems defined by computer software algorithms, or similar mathematical or mechanical or
randomized autonomous processes.
http://www.reddit.com/r/generative
Cross validated Stack Exchange
Cross Validated is a question and answer site for people interested in statistics, machine learning,
data analysis, data mining, and data visualization. It's 100% free, no registration required.
http://stats.stackexchange.com/
Open data Stack Exchange
Open Data Stack Exchange is a question and answer site for developers and researchers
interested in open data. It's 100% free, no registration required.
http://opendata.stackexchange.com/
Data Science Beta Stack Exchange
Data Science Stack Exchange is a question and answer site for Data science professionals,
Machine Learning specialists, and those interested in learning more about the field. It's 100%
free, no registration required.
http://datascience.stackexchange.com/
Quora
Quora is your best source for knowledge.
Quora is a knowledge-sharing community that depends on everyone being able to pitch in when
they know something.
http://www.quora.com/Machine-Learning
Machine Learning Impact Forum
Welcome! Please contribute your ideas for what challenges we might aspire to solve, changes in
our community that can improve machine learning impact, and examples of machine learning
projects that have had tangible impact.
http://www.wkiri.com/mlimpact/
FORUM, Q&A, in Spanish
More coming soon …
FORUM, Q&A, in
Portuguese
More coming soon …
FORUM, Q&A, in Italian
More coming soon …
FORUM, Q&A, in German
More coming soon …
FORUM, Q&A, in French
More coming soon …
FORUM, Q&A, in Russian
Reddit in Russian
http://www.reddit.com/r/MachineLearning_Ru
Habrahabr.ru Forum (in Russian; Google Chrome can translate it)
http://habrahabr.ru/
Some examples:
Playing with genetic algorithms
http://habrahabr.ru/post/246951/
What is a genetic algorithm
Why it works
We formalize the problem with a random string
An example of the algorithm
Experiments with the classics
Code and data
Findings
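The "random string" formalization in the post's outline is the classic hello-world of genetic algorithms. A minimal Python version (my own sketch, not the post's code; the target string, rates, and population size are arbitrary choices) evolves random strings toward a target via selection, crossover, mutation, and elitism:

```python
import random

TARGET = "HELLO WORLD"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
rng = random.Random(0)

def fitness(s):
    """Count of positions where s matches the target string."""
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rate=0.1):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in s)

def crossover(a, b):
    """Single-point crossover of two parent strings."""
    cut = rng.randrange(len(TARGET))
    return a[:cut] + b[cut:]

def evolve(pop_size=100, generations=1000):
    pop = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=fitness, reverse=True)
        if pop[0] == TARGET:
            return pop[0], gen
        parents = pop[: pop_size // 5]   # truncation selection: keep the top 20%
        pop = parents[:1] + [            # elitism: carry the best string over unchanged
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - 1)
        ]
    pop.sort(key=fitness, reverse=True)
    return pop[0], generations

best, generation = evolve()
```

The same loop (encode candidates, score them, select, recombine, mutate) carries over to any problem whose solutions can be serialized into a string or vector, which is the point the Habrahabr post builds toward.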
PythonDigest - 2014, the results of our work in figures and references
The digest was created as an aggregator of news and information about the Python programming language, organized by topic and by module. Over its existence the digest has collected approximately 5,235 items, of which 1,776 were translated and published as news posts.
http://habrahabr.ru/post/247067/
More coming soon …
FORUM, Q&A, in Japanese
More coming soon …
FORUM, Q&A, in Chinese
Zhihu.com
Machine Learning
http://www.zhihu.com/search?q=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0&type=question
Data Mining
http://www.zhihu.com/search?q=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&type=question
Artificial Intelligence
http://www.zhihu.com/search?q=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&type=question
Guokr.com
Machine Learning
http://www.guokr.com/search/all/?wd=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
Data Mining
http://www.guokr.com/search/all/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&sort=&term=True
Artificial Intelligence
http://www.guokr.com/search/all/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&sort=&term=True
More coming soon …
Governmental REPORTS,
in English
Big Data report, Whitehouse, US
https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf
FUN, in English
Founder of PhD Comics
Jorge Cham is the creator of "PHD Comics", the popular comic strip about life (or the lack thereof) in
Academia. He is also the co-founder of PHDtv, a video science and discovery outreach
collaborative, and a founding board member of Endeavor College Prep, a non-profit school for
kids in East L.A. He earned his Ph.D. in Robotics from Stanford University and was an
Instructor and Research Associate at Caltech from 2003-2005. He is originally from Panama.
http://jorgecham.com/
MACHINE LEARNING
RESEARCH GROUPS, in
USA
Computer Science and Artificial Intelligence Lab, MIT
The Computer Science and Artificial Intelligence Laboratory known as CSAIL is the largest
research laboratory at MIT and one of the world’s most important centers of information
technology research.
CSAIL and its members have played a key role in the computer revolution. The Lab’s
researchers have been key movers in developments like time-sharing, massively parallel
computers, public key encryption, the mass commercialization of robots, and much of the
technology underlying the ARPANet, Internet and the World Wide Web.
CSAIL members (former and current) have launched more than 100 companies, including
3Com, Lotus Development Corporation, RSA Data Security, Akamai, iRobot, Meraki, ITA
Software, and Vertica. The Lab is home to the World Wide Web Consortium (W3C), directed by
Tim Berners-Lee, inventor of the Web and a CSAIL member.
CSAIL research is focused on developing the architectures and infrastructures of tomorrow’s
information technology, and on creating innovations that will yield long-term improvements in
how people live and work. Lab members conduct research in almost all aspects of computer
science, including artificial intelligence, the theory of computation, systems, machine learning,
computer graphics, as well as exploring revolutionary new computational methods for advancing
healthcare, manufacturing, energy and human productivity.
http://www.csail.mit.edu/
Artificial Intelligence Laboratory, Stanford University
Welcome to the Stanford AI Lab
Founded in 1962, The Stanford Artificial Intelligence Laboratory (SAIL) has been a center of
excellence for Artificial Intelligence research, teaching, theory, and practice for over fifty years.
Reading group
We have several weekly reading groups where we present and discuss papers on various topics in
machine learning, natural language processing, computer vision, etc.
Autonomous Highway Driving
A deep learning model outputs the location of lane markings and surrounding cars given only a
single camera image.
http://ai.stanford.edu/
http://ai.stanford.edu/courses/
Machine Learning Department, Carnegie Mellon University
The Machine Learning Department is an academic department within Carnegie Mellon
University's School of Computer Science. We focus on research and education in all areas of
statistical machine learning. Watch an interview with Tom Mitchell, Department Head:
http://www.ml.cmu.edu/
Noah's ARK Research Group, Carnegie Mellon University
Noah's ARK is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language, all viewed through a computational lens.
http://www.ark.cs.cmu.edu/
Intelligent Interactive Systems Group, Harvard University
Intelligent Interactive Systems are fundamentally hard to design because they require intelligent
technology that is well suited for people's abilities, limitations, and preferences; they also require
entirely novel interactions that can give the user a predictable and reliable experience despite the
fact that the underlying technology is inherently proactive, unpredictable, and occasionally
wrong. Thus, design of successful intelligent interactive systems requires intimate knowledge of
and ability to innovate in two very disparate areas: human-computer interaction and
artificial intelligence or machine learning.
Our projects span the full range from formal user studies to statistical machine learning. We have
worked on developing new intelligent technologies to enable novel interactions (e.g., SUPPLE
system) and on understanding the principles underlying how people interact with intelligent
systems (e.g., the project on exploring the design space of adaptive user interfaces). Our Brain-Computer Interface project aims at developing a new set of interactions for efficiently controlling complex applications, and we are also interested in building and studying complete applications. One particular area of interest is ability-based user interfaces, an approach for adapting interactions to the individual abilities of people with impairments or of able-bodied people in unusual situations.
http://iis.seas.harvard.edu/
http://iis.seas.harvard.edu/resources/
Statistical Machine Learning, University of California, Berkeley
Research Statement
Statistical machine learning merges statistics with the computational sciences---computer science,
systems science and optimization. Much of the agenda in statistical machine learning is driven by
applied problems in science and technology, where data streams are increasingly large-scale,
dynamical and heterogeneous, and where mathematical and algorithmic creativity are required
to bring statistical methodology to bear. Fields such as bioinformatics, artificial intelligence, signal
processing, communications, networking, information management, finance, game theory and
control theory are all being heavily influenced by developments in statistical machine learning.
The field of statistical machine learning also poses some of the most challenging theoretical
problems in modern statistics, chief among them being the general problem of understanding
the link between inference and computation.
Research in statistical machine learning at Berkeley builds on Berkeley's world-class strengths in
probability, mathematical statistics, computer science and systems science. Moreover, by its
interdisciplinary nature, statistical machine learning helps to forge new links among these fields.
An education in statistical machine learning at Berkeley thus involves an immersion in the
traditions of statistical science broadly defined, a thoroughgoing involvement in exciting applied
problems, and an opportunity to help shape the future of statistics.
http://www.stat.berkeley.edu/~statlearning/
UC Berkeley AMPLab (AMP: Algorithms, Machines, People)
People will play a key role in data-intensive applications not simply as passive consumers of
results, but as active providers and gatherers of data, and to solve ML-hard problems that
algorithms on their own cannot solve. With crowdsourcing, people can be viewed as highly
valuable but unreliable and unpredictable resources, in terms of both latency and answer quality.
They must be incentivized appropriately to provide quality answers despite varying expertise,
diligence and even malicious behavior. The AMPLab is addressing these issues in all phases of
the analytics lifecycle.
https://amplab.cs.berkeley.edu/
Videos
https://www.youtube.com/user/BerkeleyAMPLab/videos?spfreload=10
Berkeley Institute for Data Science
The Berkeley Institute for Data Science (BIDS) was founded in fall 2013 to build on existing
campus strengths with a multidisciplinary emphasis that aims to facilitate and enhance the
development and application of cutting-edge data science techniques in the biological, physical,
social and engineering sciences. The Institute aims to build on the many recent innovations in
data science techniques so that they can be applied in effective ways to domain science
challenges.
BIDS brings together researchers across disciplines and enhances career paths for data scientists
through a number of newly created Data Science Fellows positions, graduate student fellowships,
boot-camps, special classes, and conferences of interest to the academic community and general
public.
The Institute’s initial support is provided by a 5-year $12.5 million grant from the Moore and Sloan Foundations together with significant support provided by UC Berkeley. The “Moore-Sloan Data Science Environment” also supports similar programs with shared goals and objectives at the University of Washington and New York University.
http://bids.berkeley.edu/
Data Science Lecture Series: Maximizing Human Potential Using Machine Learning-Driven
Applications
https://www.youtube.com/channel/UCBBd3JxQl455JkWBeulc-9w?spfreload=10
Department of Computer Science - ARTIFICIAL
INTELLIGENCE & MACHINE LEARNING, Princeton
University
Machine learning and computational perception research at Princeton is focused on the
theoretical foundations of machine learning, the experimental study of machine learning
algorithms, and the interdisciplinary application of machine learning to other domains, such as
biology and information retrieval. Some of the techniques that we are studying include boosting,
probabilistic graphical models, support-vector machines, and nonparametric Bayesian
techniques. We are especially interested in learning from large and complex data sets. Example
applications include habitat modeling of species distributions, topic models of large collections of
scientific articles, classification of brain images, protein function classification, and extensions of
the Wordnet semantic network.
http://www.cs.princeton.edu/research/areas/mlearn
Research Laboratories and Groups, University of California,
Los Angeles (UCLA)
Automated Reasoning Group (Adnan Darwiche)
Biocybernetics Laboratory (Joe DiStefano)
Center for Vision, Cognition, Learning and Art (Song-Chun Zhu)
Cognitive Systems Laboratory (Judea Pearl)
Concurrent Systems Laboratory (Yuval Tamir)
Digital Arithmetic and Reconfigurable Architecture Laboratory (Milos Ercegovac)
ER: Embedded and Reconfigurable System Design (Majid Sarrafzadeh)
Information and Data Management Group (multiple faculty)
Internet Research Laboratory (Lixia Zhang)
Laboratory for Embedded Collaborative Systems (LECS) (archived CENS documents)
Laboratory for Advanced Systems Research (LASR) (Peter Reiher)
MAGIX: Computer Graphics & Vision Laboratory (Demetri Terzopoulos)
Multimedia Information System Technology Group & Laboratory (Alfonso Cardenas)
Network Research Laboratory (Mario Gerla)
Software Systems Group (multiple faculty)
Vision Laboratory (Stefano Soatto)
VLSI Architecture, Synthesis & Technology (VAST) Laboratory (Jason Cong)
Web Information Systems Laboratory (Carlo Zaniolo)
WiNG (Wireless Networking Group) (Songwu Lu)
http://www.cs.ucla.edu/research-labs/
Cornell University
https://confluence.cornell.edu/display/ml/Home
https://confluence.cornell.edu/display/ML/Courses
Machine Learning Research, University of Illinois at Urbana-Champaign
The Department of Computer Science at the University of Illinois at Urbana-Champaign has
several faculty members working in the area of machine learning, learning theory, explanation
based learning, learning in natural language processing and data mining. In addition, many
faculty members inside and outside the department whose primary research interests are in other
areas have specific research projects involving machine learning in some way.
http://ml.cs.illinois.edu/
Department of Computing + Mathematical Sciences, California Institute of Technology (Caltech)
The Computing + Mathematical Sciences department pursues numerous research interests covering a wide array of application areas. We take full advantage of Caltech's unique interdisciplinary character by drawing on research expertise not only from our own department, but from throughout the Institute. Research efforts within the department evolve at a fast pace, and currently cover six focus areas:
• Discrete Differential Modeling
• DNA Computing and Molecular Programming
• Perceptual and Machine Learning for Autonomous Systems
• Rigorous Systems Research
• Scientific Computing and Applied Analysis
• Theory of Computation
http://www.cms.caltech.edu/research/
Machine Learning, University of Washington
UW is one of the world's top centers of research in machine learning. We are active in most
major areas of ML and in a variety of applications like natural language processing, vision,
computational biology, the Web, and social networks. Check out the links on the left to find out
who's who and what's happening in ML at UW.
And be sure to see our CSE-wide efforts in Big Data.
https://www.cs.washington.edu/research/ml/
"Big Data" Research and Education, University of Washington
UW CSE is driving the "Big Data" revolution. Our traditional strength in data
management (Magda Balazinska, Bill Howe, Dan Suciu), machine learning (Pedro Domingos),
and open information extraction (Oren Etzioni, Dan Weld) has recently been augmented by key
hires in machine learning (Emily Fox, Carlos Guestrin, Ben Taskar) and data visualization (Jeff
Heer).
Our efforts are coordinated with those of outstanding researchers in the University of
Washington's top-ten programs in Statistics, Biostatistics, and Applied Mathematics, among
others. Through the University of Washington eScience Institute (directed by Ed Lazowska) we
are integrally involved in ensuring that researchers across the campus have access to cutting-edge
approaches to data-driven discovery.
http://www.cs.washington.edu/research/bigdata
Social Robotics Lab - Yale University
The members of our lab perform research over a diverse collection of topics. Though these
projects approach social and developmental research from varied perspectives, they all share
common themes. Robots provide an embodied, empirical testbed that allows for repeated
validation. Robots also enable the use of social interactions as part of the modeled experimental
environment, staying grounded in real-world perceptions, and appropriately integrating
perceptual, motor, and cognitive skills.
http://scazlab.yale.edu/publications/all-publications
ML@GT, Georgia Institute of Technology
http://ml.cc.gatech.edu/
Machine Learning Research Group, University of Texas at Austin
Machine learning is the study of adaptive computational systems that improve their
performance with experience.
The Machine Learning Research Group at UT Austin is led by Professor Raymond Mooney, and
our research has explored a wide variety of issues in machine learning for over two decades. Our
current research focuses primarily on natural language learning, statistical relational learning,
transfer learning, and active learning.
https://www.cs.utexas.edu/~ml/
Penn Research in Machine Learning, University of Pennsylvania
Current projects:
•
Structured Prediction
•
Bandit and Limited-Feedback Problems
•
Computation and Statistics
•
Online Learning, Sequential Prediction, Regret Minimization
•
Statistical Learning Theory
http://priml.upenn.edu/Main/Research
Machine Learning @ Columbia University
The Columbia Machine Learning Lab pursues research in machine learning with applications in
vision, graphs and spatio-temporal data. Funding provided by NSF.
http://www.cs.columbia.edu/learning/
New York University
CILVR Lab and Center for Data Science
The CILVR Lab (Computational Intelligence, Learning, Vision, and Robotics) brings together
three faculty members, research scientists, postdocs, and students working on AI, machine learning,
and a wide variety of applications, notably computer perception, robotics, and health care.
http://cilvr.nyu.edu/doku.php
http://cds.nyu.edu/
University of Chicago
http://ml.cs.uchicago.edu/
The Johns Hopkins Center for Language and Speech Processing
(CLSP) Archive Videos
The Johns Hopkins Center for Language and Speech Processing (CLSP) is an interdisciplinary
research and educational center focused on the science and technology of language and speech.
Within its field, CLSP is recognized as one of the largest and most influential academic research
centers in the world. The center conducts research across a broad spectrum of fundamental and
applied topics including acoustic processing, automatic speech recognition, big data, cognitive
modeling, computational linguistics, information extraction, machine learning, machine
translation, and text analysis.
http://clsp.jhu.edu/seminars/archive/video/
MISCELLANEOUS
IARPA Organization
The Intelligence Advanced Research Projects Activity (IARPA) invests in high-risk/high-payoff
research programs that have the potential to provide our nation with an overwhelming
intelligence advantage over future adversaries.
http://www.iarpa.gov/
MACHINE LEARNING
RESEARCH GROUPS, in
Canada
Machine Learning Lab, University of Toronto
Machine Learning @ UofT:
The Department of Computer Science at the University of Toronto has several faculty members
working in the area of machine learning, neural networks, statistical pattern recognition,
probabilistic planning, and adaptive systems. In addition, many faculty members inside and
outside the department whose primary research interests are in other areas have specific research
projects involving machine learning in some way.
http://learning.cs.toronto.edu/
The Fields Institute for Research in Mathematical Science,
University of Toronto
The Fields Institute is a center for mathematical research activity - a place where mathematicians
from Canada and abroad, from business, industry and financial institutions, can come together to
carry out research and formulate problems of mutual interest. Our mission is to provide a
supportive and stimulating environment for mathematics innovation and education. The Fields
Institute promotes mathematical activity in Canada and helps to expand the application of
mathematics in modern society.
http://www.fields.utoronto.ca/
Artificial Intelligence Research Group, University of Waterloo
The Artificial Intelligence Group conducts research in many areas of artificial intelligence. The
group has active interests in: models of intelligent interaction, multi-agent systems, natural
language understanding, constraint programming, computational vision, robotics, machine
learning, and reasoning under uncertainty.
http://ai.uwaterloo.ca/
Course material
http://ai.uwaterloo.ca/coursegr.html
Artificial Intelligence Research Groups, University of British
Columbia
Research Groups
Computer Vision and Robotics: This is one of the most influential vision and robotics
groups in the world. It is this group that created RoboCup and the celebrated SIFT features. The
students in this group have won most of the AAAI Semantic Robot Challenges. The group has
four active faculty: David Lowe, Jim Little, Alan Mackworth and Bob Woodham.
Empirical Algorithmics: Led by Holger Hoos and Kevin Leyton-Brown, this research group
studies the empirical behaviour of algorithms and develops automated methods for improving
algorithmic performance. Work by the empirical algorithmics group at UBC/CS has led to
substantial improvements in the state of the art in solving a wide range of prominent problems,
including SAT, AI Planning and Mixed Integer Programming, and has won numerous awards.
Game Theory and Decision Theory: With Kevin Leyton-Brown in the lead, this group has made
significant contributions to algorithmic game theory, multiagent systems and mechanism design.
David Poole also contributes to this group with his work on decision processes and planning. The
research problems attacked by this group are therefore of great importance to e-commerce,
auctions and advertising.
Intelligent User Interfaces: With Cristina Conati and Giuseppe Carenini, this group's goal is to
investigate principles and techniques for preference modeling and elicitation, interactive decision
making, user-adaptive information visualization and visual interfaces for text analysis.
Knowledge Representation and Reasoning: David Poole leads this group with his foundational
work on probabilistic first-order logic and semantic science. This work on logical and probabilistic
reasoning has had a profound and broad impact in the field of artificial intelligence (AI).
Holger Hoos is also an important member of this group with his work on satisfiability (SAT) and
planning, which has won numerous awards and competitions.
Machine Learning: With the guidance of Nando de Freitas and Kevin Murphy, this group's
vision is to advance the frontier of knowledge in Bayesian inference, Monte Carlo algorithms,
probabilistic graphical models, neural computation, personalization, mining web-scale datasets,
prediction and optimal decision making.
Natural Language Processing: Under the leadership of Giuseppe Carenini and Raymond
Ng (Data Management and Mining Lab), this group's vision is to further our understanding of
abstractive summarization, mining conversations and evaluative text, and natural language
generation.
https://www.cs.ubc.ca/cs-research/lci/research-groups/machine-learning
MILA, Machine Learning Lab, University of Montreal
The MILA is the Institut des algorithmes d'apprentissage de Montréal, or the Montreal Institute
for Learning Algorithms.
Mission
•
federate researchers in the domain of Deep Learning,
•
provide a platform for collaboration and co-supervision
•
share human resources as well as infrastructures and computer networks
•
provide unique access to a pool of companies which can benefit from the opportunities offered
by machine learning algorithms
Scientific Mission
•
supervised learning and pattern recognition
•
unsupervised and semi-supervised learning
•
representation learning and deep learning
•
computer vision applications
•
applications in natural language processing
•
applications in signal modelling, such as sound and music
•
applications on large scale data (big data)
See Research for more info on our main research interests.
Expertise
Researchers from MILA have developed an expertise in deep neural networks (both discriminant
and generative) and their applications to vision, speech and language. MILA is world-renowned
for many breakthroughs in developing novel deep learning algorithms and applying them to
various domains. They include, but are not limited to, neural language modelling, neural
machine translation, object recognition, structured output generative modelling and neural
speech recognition.
http://www.mila.umontreal.ca/
Artificial Intelligence, University of Sherbrooke
Three teams work in this research area; other projects are carried out by researchers working
individually.
The research team in the area of intelligent tutoring systems, ASTUS (Apprentissage
par Système Tutoriel de l'Université de Sherbrooke), works on the following themes:
knowledge representation, user modelling, human-machine interaction,
educational psychology and cognitive science.
The research team in the area of data mining, Prospectus (Prospection de
données à l'Université de Sherbrooke), works on the following themes: data mining,
knowledge mining and modelling, pattern recognition, segmentation and classification,
non-symbolic artificial intelligence methods, neural networks and Bayesian networks, and
detection of latent structures and behaviours.
The research team in the area of planning in artificial intelligence, PLANIART,
works on the following themes: path planning, behaviour planning and plan recognition
in video games and mobile robotics. Planning makes it possible to decide what to do
(goal decomposition), how to do it (resource allocation) and when to do it (scheduling).
http://www.usherbrooke.ca/informatique/recherche/domaines-de-recherche/intelligenceartificielle/
Centre de recherche sur les environnements intelligents,
University of Sherbrooke
The Centre de Recherche sur les Environnements Intelligents (CREI) comprises 13 regular
members, 11 associate members and more than sixty graduate students. The CREI federates 7
laboratories whose research interests cover digital imaging, artificial intelligence,
modelling and validation, and ambient intelligence. CREI researchers have collaborated
for years, developing applications related to intelligent environments.
http://www.usherbrooke.ca/crei/
Machine Learning Research Group, Université Laval
http://graal.ift.ulaval.ca/
More to come …
MACHINE LEARNING
RESEARCH GROUPS, in
Brazil
USP - UNIVERSIDADE DE SÃO PAULO,
Instituto de Ciências Matemáticas e de Computação
http://www.icmc.usp.br/Portal/
More coming soon …
MACHINE LEARNING
RESEARCH GROUPS, in
United Kingdom
The Centre for Computational Statistics and Machine Learning
(CSML), University College London
The Centre for Computational Statistics and Machine Learning (CSML) spans three
departments at University College London, Computer Science, Statistical Science, and the
Gatsby Computational Neuroscience Unit.
The Centre will pioneer an emerging field that brings together statistics, the recent extensive
advances in theoretically well-founded machine learning, and links with a broad range of
application areas drawn from across the college, including neuroscience, astrophysics, biological
sciences, complexity science, etc. There is a deliberate intention to maintain and cultivate a
plurality of approaches within the centre including Bayesian, frequentist, on-line, statistical, etc.
http://www.csml.ucl.ac.uk/
CASA (Centre for Advanced Spatial Studies) Working Papers,
University College London
http://www.bartlett.ucl.ac.uk/casa/latest/publications/working-papers
Example #198
A global inter-country economic model based on linked input-output models
We present a new, flexible and extensible alternative to multi-regional input-output (MRIO) for
modelling the global economy. The limited coefficient set of MRIO (technical coefficients only) is
extended to include two new sets of coefficients, import ratios and import propensities. These
new coefficient sets assist in the interaction of the new model with other social science models
such as those of trade, migration, international security and development aid. The model uses
input-output models as descriptions of the internal workings of countries' economies, and
couples these more loosely than in MRIO using trade data for commodities and services from the
UN. The model is constructed using a minimal number of assumptions, seeks to be as
parsimonious as possible in terms of the number of coefficients, and is based to a great extent on
empirical observation. Two new metrics are introduced, measuring sectors' economic significance
and economic self-reliance per country. The Chinese vehicles sector is shown to be the world's
most significant, and self-reliance is shown to be strongly correlated with population. The new
model is shown to be equivalent to an MRIO under an additional assumption, allowing existing
analysis techniques to be applied.
http://www.bartlett.ucl.ac.uk/casa/publications/working-paper-198
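The core calculation underlying the input-output models described in this abstract is the standard Leontief relation x = (I - A)^-1 d, where A holds the technical coefficients and d is final demand. As a minimal sketch only (the coefficient values below are illustrative, not taken from the working paper, and the paper's extensions via import ratios and import propensities are not modelled here):

```python
import numpy as np

# Illustrative technical coefficients for a toy two-sector economy:
# A[i, j] = units of input from sector i needed per unit of output of sector j.
A = np.array([
    [0.2, 0.3],
    [0.1, 0.4],
])

# Final demand per sector (illustrative values).
d = np.array([100.0, 200.0])

# Total output x satisfies x = A @ x + d, i.e. x = (I - A)^-1 d.
# Solving the linear system is preferable to forming the inverse explicitly.
x = np.linalg.solve(np.eye(2) - A, d)
print(x)  # total output required per sector
```

The accounting identity can be checked directly: the computed output, fed back through the coefficient matrix plus final demand, reproduces itself.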
The Machine Learning Research Group in the Department of
Engineering Science, Oxford University
The Machine Learning Research Group is a sub-group within Information Engineering
(Robotics Research Group) in the Department of Engineering Science of the University of
Oxford.
We are interested in probabilistic reasoning applied to problems in science, engineering and
computing. We use the tools of statistical, and in particular Bayesian, inference to deal rationally
with uncertainty and information in a number of domains including astronomy, biology, finance,
image & signal processing and multi-agent systems, as well as researching the theory of Bayesian
modelling and inference.
http://www.robots.ox.ac.uk/~parg/doku.php?id=home
Machine Learning research in the Department of Computer Science
Machine Learning research in the Department of Computer Science evolves along the following
directions:
Deep learning
Large scale machine learning and big data
Random forests and ensemble methods
Probabilistic graphical models
Bayesian optimisation
Reinforcement learning
Monte Carlo methods and randomised algorithms.
Applications to control, games, language understanding, computer vision, speech, time series,
and all types of structured and unstructured data.
The group is part of a wider Machine Learning initiative at Oxford, which includes researchers in
statistics (Yee Whye Teh, Arnaud Doucet, Chris Holmes) and information engineering (Michael
Osborne, Steve Roberts, Frank Wood).
http://www.cs.ox.ac.uk/activities/machlearn/
Machine Learning Group, Imperial College
Transforming Big Data into Knowledge
The Machine Learning Group is a cross-faculty network of Imperial College’s Department of
Computing. We embrace research at the interface of machine learning, artificial intelligence and
its Big Data applications.
Research
With the ever-increasing use of the Internet, digital devices and science, tremendous amounts of
data encapsulating valuable knowledge have become available. We reflect this impact in the many
vibrant facets of this field from automated reasoning to probabilistic inference, from creative and
affective computing to human-computer interaction, from machine vision to neurotechnology,
from bioinformatics to medical & economic applications. Broadly members of the group belong
to at least one of the two pillars of Machine Learning:
•
Data-level machine learning to support feature extraction from data (“Big Data”)
•
Knowledge-level machine learning and knowledge representation to extract readable and
insightful relational knowledge which supports human-understandable machine inference
At the data-level, ongoing research focuses on applying a wide variety of feature-based machine
learning techniques in key application areas. Notable recent successes in these areas include the
application of machine learning to medical imaging of the brain and heart (Rueckert), human
emotions and social signals (Pantic, Zafeiriou), robotic vision (Davison), autonomous systems
(Deisenroth), medical applications (Gillies), computational neuroscience and Brain-Machine Interfaces (Faisal).
At the knowledge-level, our key expertise lies in Relational and First-Order Logic Learning. Past
research had major impact in scientific discovery in biological prediction tasks (Muggleton),
security and semi-automated software engineering (Russo). We also work in the closely related
areas of smart analysis of biological and economic network topologies (Przulj), robust systems
optimisation (Parpas) and scalable data analytics (Pietzuch).
http://wp.doc.ic.ac.uk/mlg/
The Data Science Institute, Imperial College
The Data Science Institute at Imperial College is being established to conduct research on the
foundations of data science by developing advanced theory, technology and systems that will
contribute to the state-of-the-art in data science and big data, and support data-driven research at
Imperial and beyond. The Institute will empower Imperial and its partners to collaborate in the
pursuit of world class data-driven innovation.
http://www.imperial.ac.uk/data-science/
The University of Edinburgh, Institute for Adaptive and Neural
Computation
http://www.anc.ed.ac.uk/machine-learning/
Cambridge University
About Us
We are a part of the Computational and Biological Learning Laboratory located in the
Department of Engineering at the University of Cambridge. The research in our group is very
broad, and we are interested in all aspects of machine learning. Particular strengths of the group
are in Bayesian approaches to modelling and inference in statistical applications. The type of
work we do can range from studying fundamental concepts in applied Bayesian statistics, all the
way to getting our algorithms to perform competitively against the state-of-the-art in big-data
applications. We also work in a broad range of application domains, including neuroscience,
bioinformatics, finance, social networks, and physics, just to name a few, and we actively seek to
collaborate with other groups within the Department of Engineering, throughout the university
as a whole, and with other groups within the UK and around the world. If you are interested in
finding out more about our research, please visit our Publications page, or visit the individual
research pages of our group members.
http://mlg.eng.cam.ac.uk/
Centre for Intelligent Sensing, Queen Mary University of
London
I am delighted to introduce you to the Centre for Intelligent Sensing (CIS).
CIS is a focal point for research in Intelligent Sensing at Queen Mary University of London. The
Centre focuses on breakthrough innovations in computational intelligence that will have a major
impact in transforming the way humans and machines utilise a variety of sensor inputs for
interpretation and decision making.
The Centre gathers 33 academics with expertise in all aspects of intelligent sensing from the
design and building of the physical sensors to the mathematical and computational challenges of
extracting key information from real-time streams of high-dimensional data acquired by
networks of sensors. The legal, ethical and social implications of these processes are also
addressed.
CIS researchers have an outstanding international reputation in camera and sensor networks,
image and signal processing, computer vision, data mining, pattern recognition, machine
learning, bio-inspired computing, human-computer interaction, affective computing and social
signal processing.
The Centre also provides post-graduate research and teaching in Intelligent Sensing, and is
responsible for the MSc programme in Computer Vision.
I do hope that you will enjoy reading this brochure and learning more about who we are and
how the research we do helps to address important societal challenges. I also invite you to keep
up to date with our activities by following us on Twitter @intelsensing and to enjoy our research
videos at http://cis.eecs.qmul.ac.uk.
Professor Andrea Cavallaro Director
http://cis.eecs.qmul.ac.uk/
Videos
https://www.youtube.com/user/intelsensing/feed?spfreload=10
ICRI, The Intel Collaborative Research Institute
The Intel Collaborative Research Institute is concerned with how to enhance the social,
economic and environmental well being of cities by advancing compute, communication and
social constructs to deliver innovations in system architecture, algorithms and societal
participation.
http://www.cities.io/
MACHINE LEARNING
RESEARCH GROUPS, in
France
Magnet, MAchine learninG in information NETworks, INRIA
The Magnet project aims to design new machine learning based methods geared towards mining
information networks. Information networks are large collections of interconnected data and
documents like citation networks and blog networks among others. For this, we will define new
structured prediction methods for (networks of) texts based on machine learning algorithms in
graphs. Such algorithms include node classification, link prediction, clustering and probabilistic
modeling of graphs. Envisioned applications include browsing, monitoring and recommender
systems, and more broadly information extraction in information networks. Application domains
cover social networks for cultural data and e-commerce, and biomedical informatics.
https://team.inria.fr/magnet/
Sierra Team - Ecole Normale Supérieure, CNRS, INRIA
SIERRA is based in the Laboratoire d'Informatique de l'École Normale Supérieure (CNRS/
ENS/INRIA UMR 8548) and is a joint research team between INRIA Rocquencourt, École
Normale Supérieure de Paris and Centre National de la Recherche Scientifique.
We follow four main research directions:
Supervised learning:
This part of our research focuses on methods where, given a set of examples of input/output
pairs, the goal is to predict the output for a new input, with research on kernel methods,
calibration methods, structured prediction, and multi-task learning.
Unsupervised learning:
We focus here on methods where no output is given and the goal is to find structure of certain
known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization,
statistical tests, dimension reduction, and semi-supervised learning.
Parsimony:
The concept of parsimony is central to many areas of science. In the context of statistical
machine learning, this takes the form of variable or feature selection. The team focuses primarily
on structured sparsity, with theoretical and algorithmic contributions.
Optimization:
Optimization in all its forms is central to machine learning, as many of its theoretical frameworks
are based at least in part on empirical risk minimization. The team focuses primarily on convex
and bandit optimization.
http://www.di.ens.fr/sierra/
ENS Ecole Normale Superieure
The Computer Science Department of ENS (DI ENS) is both a teaching department and a
research laboratory affiliated with CNRS and INRIA (UMR 8548).
On the teaching side, the DI ENS trains students through its Pre-doctoral program and the
Masters program (MPRI).
On the research side, the research is structured into research groups. The DI ENS is member of
the Fondation Sciences Mathématiques de Paris.
The Computer Services (SPI) and the Mathematics and Computer Science Library are common
to the DI ENS and the Department of Mathematics and Applications (DMA).
Teams of the Computer Science Department at École normale supérieure
Antique
Static analysis by abstract interpretation (head: Xavier Rival)
Cascade
Cryptography (head: David Pointcheval)
Data
Signal Processing and Classification (head: Stéphane Mallat)
Dyogene
Dynamics of Geometric Networks (head: Marc Lelarge)
Parkas
Parallelism of Synchronous Kahn Networks (head: Marc Pouzet)
Sierra
Machine Learning (head: Francis Bach)
Talgo
Theory, Algorithms, topoLogy, Graphs, and Optimization (head: Claire Mathieu)
Willow
Artificial Vision (head: Jean Ponce)
http://www.di.ens.fr/
WILLOW Publications and PhD Thesis
Our research is concerned with representational issues in visual object recognition and scene
understanding. Our objective is to develop geometric, physical, and statistical models for all
components of the image interpretation process, including illumination, materials, objects,
scenes, and human activities. These models will be used to tackle fundamental scientific
challenges such as three-dimensional (3D) object and scene modeling, analysis, and retrieval;
human activity capture and classification; and category-level object and scene recognition. They
will also support applications with high scientific, societal, and/or economic impact, such as
quantitative image analysis in archaeology and cultural heritage
conservation; film post-production and special effects; and video annotation, interpretation, and
retrieval. Moreover, machine learning now represents a significant part of computer vision
research, and one of the aims of the project is to foster the joint development of contributions to
machine learning and computer vision, together with algorithmic and theoretical work on
generic statistical machine learning.
http://www.di.ens.fr/willow/publications/YearOnly/publications.html
Laboratoire Hubert Curien UMR CNRS 5516, Machine
Learning
Group leader: Marc Sebban
Machine learning is the sub-field of artificial intelligence and computer science that studies how
machines can learn. A machine learns when it modifies its own behavior as the result of its past
experience and performance. Because of this need to analyze past experience, machine
learning techniques are closely related to data mining. The Machine Learning team is divided
into two collaborating sub-projects, one more specialised in statistical learning theory and one
more specialised in data mining and information retrieval.
In the first sub-project, statistical learning theory, the focus is on:
- Metric Learning,
- Transfer Learning and Domain Adaptation
- Machine Learning for Computer Vision Applications
- Machine Learning for Natural Language Processing
In the data mining and information retrieval sub-project, the focus is on:
- Developing methods to efficiently mine structured data: documents, graphs, social networks, etc.,
- Modeling heterogeneous structured documents for information retrieval,
- Data Mining for Image and Video Analysis
http://laboratoirehubertcurien.fr/spip.php?rubrique28
MACHINE LEARNING
RESEARCH GROUPS, in
Germany
Max Planck Institute for Intelligent Systems, Tübingen site
Intelligent systems can optimise their structure and properties in order to successfully function
within a complex, partially changing environment. Three sub-areas can be differentiated here:
perception, learning and action. The scientists at the Max Planck Institute for Intelligent
Systems are carrying out basic research and development of intelligent systems in all three
sub-areas. Research expertise in the areas of computer science, material science and biology is
brought together in one Institute, at two different sites. Machine learning, image recognition,
robotics and biological systems will be investigated in Tübingen, while so-called learning material
systems, micro- and nanorobotics, as well as self-organisation, will be explored in Stuttgart.
Although the focus is on basic research, the Institute has a high potential for practical
applications in, among other areas, robotics, medical technology, and innovative technologies
based on new materials.
http://www.mpg.de/1342929/intelligenteSystemeTuebingen
BRML Research Lab, Institute of Informatics at the Technische
Universität München
Patrick van der Smagt's BRML is a collaborative research lab of fortiss, an institute at TUM; the
Chair for Robotics and Embedded Systems, Institute of Informatics at the Technische
Universität München; and the DLR Institute of Robotics and Mechatronics. The heart of our
research is formed by machine learning. Within that, we focus on biomechanics and
body-machine interfaces. We apply our methods to advanced rehabilitation and assistive robotics.
http://brml.org/
HCI, Heidelberg Collaboratory for Image Processing,
Universität Heidelberg
The HCI is an "Industry on Campus" project established in the context of the German
excellence initiative jointly by the University of Heidelberg and the following companies:...
The HCI has been established in January, 2008 and moved to its new premises in March, 2008.
The HCI consists of four chairs and one associated group:
- Computer Vision (Ommer lab)
- Digital Image Processing (Jähne lab)
- Image and Pattern Analysis (Schnörr lab)
- Image Processing and Modelling (Garbe lab)
- Multidimensional Image Processing (Hamprecht lab)
The strategic concept of the HCI is built on the simple fact that basic problems in image
processing are largely application-independent. The approximately 80 scientists working in the
HCI conduct basic research with the aim of providing cutting-edge solutions to basic image
analysis problems for applications in industry, environmental and life sciences. The HCI is part of
the institutional strategy of the University of Heidelberg within the Excellence Initiative.
http://hci.iwr.uni-heidelberg.de/
MACHINE LEARNING
RESEARCH GROUPS, in
Switzerland
EPFL Ecole Polytechnique Federale de Lausanne, Switzerland
Artificial Intelligence & Machine Learning
The modern world is full of artificial, abstract environments that challenge our natural
intelligence. The goal of our research is to develop Artificial Intelligence that gives people the
capability to master these challenges, ranging from formal methods for automated reasoning to
interaction techniques that stimulate truthful elicitation of preferences and opinions. Another
aspect is characterizing human intelligence and cognitive science, with applications in human-computer interaction and computer animation.
Machine Learning aims to automate the statistical analysis of large complex datasets by adaptive
computing. A core strategy to meet growing demands of science and applications, it provides a
data-driven basis for automated decision making and probabilistic reasoning. Machine learning
applications at EPFL range from natural language and image processing to scientific imaging as
well as computational neuroscience.
http://ic.epfl.ch/intelligence-artificielle-et-apprentissage-automatique
IDSIA: the Swiss AI Lab
The Swiss AI Lab IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) is a non-profit
research institute for artificial intelligence, affiliated with both the Faculty of Informatics
of the Università della Svizzera Italiana and the Department of Innovative Technologies of
SUPSI, the University of Applied Sciences of Southern Switzerland. We focus on machine
learning (deep neural networks, reinforcement learning), operations research, data mining, and
robotics.
IDSIA researchers win nine international competitions
Our neural networks research team has won nine international competitions in machine learning
and pattern recognition. Follow the link to learn more about the methods that allowed us to
achieve these results.
http://www.idsia.ch/
MACHINE LEARNING
RESEARCH GROUPS, in
Netherlands
Machine Learning Research Groups in The Netherlands
A large number of researchers and research groups are active in the broad area of machine
learning, ranging from Bayesian inference to robotics and neural networks. A brief overview is
collected here; the researchers can be contacted for more information.
http://www.mlplatform.nl/researchgroups/
MACHINE LEARNING
RESEARCH GROUPS, in
POLAND
University of Warsaw, Dept. of Mathematics, Informatics and
Mechanics
Algorithms group
Our research
The research of our group focuses on several branches of modern algorithmics and the
underlying fields of discrete mathematics. The latter include combinatorics on words and on
ordered sets, graph theory, formal languages, computational geometry, information theory,
foundations of cryptography. The research on algorithms covers parallel and distributed
algorithms, large scale algorithms, approximation and randomized algorithms, fixed-parameter
and exponential-time algorithms, dynamic algorithms, radio algorithms, multi-party
computations, and cryptographic protocols.
http://zaa.mimuw.edu.pl/
more to come …
MACHINE LEARNING
RESEARCH GROUPS, in
India
RESEARCH LABS, Department of Computer Science and
Automation, IISc, Bangalore
The department houses a number of research labs, each dedicated to a focused area of research.
The lab members comprise faculty, students (both ME and research students), and dedicated
project staff. The labs are usually equipped with specialized software and computing facilities,
and carry out work on various projects in their area.
http://www.csa.iisc.ernet.in/research/research-reslabs.php
MLSIG: Machine Learning Special Interest Group, Indian
Institute of Science
The Machine Learning Special Interest Group (MLSIG) is a group of faculty and students at the
Indian Institute of Science in Bangalore, who share interests in machine learning and related
fields. The group enjoys the presence of several outstanding faculty engaged in cutting-edge
research on a variety of aspects of machine learning and related fields, ranging from theoretical
foundations to new algorithms as well as several exciting applications; highly motivated PhD and
Masters' research students who complement and expand the energy of the faculty; and close
proximity and partnerships with a variety of industry research laboratories, both within
Bangalore and outside the city.
http://drona.csa.iisc.ernet.in/~mlcenter/
More to come …
MACHINE LEARNING
RESEARCH GROUPS, in
China
Peking University
School of Electronics Engineering and Computer Science
We have built strong cooperation with many famous academic organizations, e.g., University of
California at Berkeley, University of California at Los Angeles, Stanford University, University of
Illinois at Urbana-Champaign, Oxford University, University of Edinburgh, Paris High Division,
University of Tokyo, Waseda University.
These collaborations cover most of our research directions: from electronic communication, optical
communication, to quantum communication; from computer hardware, software, to network;
from micro-electromechanical system to nano techniques; from machine perception to machine
intelligence.
Center for Information Science
Main Research Areas
• Machine Vision: image processing, image and video compression, pattern recognition and machine learning, biometrics, 3-D visual information processing.
• Machine Audition: computational auditory models, speech signal processing, spoken language processing, natural language processing, intelligent human-machine interaction.
• Intelligent Information Systems: computational intelligence, multimedia resource organization and management, data mining and content-oriented massive information integration, analysis, processing and service.
• Physiology and Psychology for Machine Perception: electro-physiology, psychophysics and neurophysiology of vision and audition, theories and methods of hearing rehabilitation.
http://www.cis.pku.edu.cn/
http://eecs.pku.edu.cn/eecs_english/CnterInfoScience.shtml
Institute of Computational Linguistics
Main Research Areas
• Comprehensive Language Knowledge Databases, including a large-scale word-level information database for the Chinese language.
• Corpus-based NLP, including large-scale corpus processing and statistical models and theories.
• Domain Knowledge Construction, including computational terminology and term database construction.
• Multilingual Semantic Lexicons, focusing on the study of a Chinese concept dictionary.
• Computer-aided Translation, focusing on translation methods for technical documents.
• Information Retrieval, Extraction and Summarization, including various levels of document processing such as document retrieval, topic extraction, summarization, and question answering.
http://eecs.pku.edu.cn/index.aspx?menuid=5&type=articleinfo&lanmuid=84&infoid=232&language=cn
http://eecs.pku.edu.cn/eecs_english/InstComputationalLinguistics.shtml
PKU Real course online
http://www.grids.cn/
University of Science and Technology of China, USTC
https://en.wikipedia.org/wiki/University_of_Science_and_Technology_of_China
Nanjing University
Lamda Group
LAMDA is affiliated with the National Key Laboratory for Novel Software Technology and the
Department of Computer Science & Technology, Nanjing University, China. It is located in the
Computer Science and Technology Building on the Xianlin campus of Nanjing University,
mainly in Room 910. The Founding Director of LAMDA is Prof. Zhi-Hua Zhou.
"LAMDA" means "Learning And Mining from DatA". The main research interests of LAMDA
include machine learning, data mining, pattern recognition, information retrieval, evolutionary
computation, neural computation, and some other related areas. Currently our research mainly
involves: ensemble learning, semi-supervised and active learning, multi-instance and multi-label
learning, cost-sensitive and class-imbalance learning, metric learning, dimensionality reduction
and feature selection, structure learning and clustering, theoretical foundations of evolutionary
computation, improving comprehensibility, content-based image retrieval, web search and
mining, face recognition, computer-aided medical diagnosis, bioinformatics, etc.
http://lamda.nju.edu.cn/MainPage.ashx
More to come …
MACHINE LEARNING
RESEARCH GROUPS, in
Russia
Moscow State University
http://www.msu.ru/
More to come …
MACHINE LEARNING
RESEARCH GROUPS, in
Australia
NICTA Machine Learning Research Group
We want to change the world.
Machine learning is a powerful technology that can help solve almost any problem. We think
about it differently to much of the machine learning research community.
We focus on important and challenging problems such as
• Navigating the world’s patent literature
• Finding sites for geothermal energy production
• Predicting the output of rooftop solar photovoltaic systems
• Building actionable data analytics for the enterprise
• Managing the traffic in large cities
• Predicting failures of widespread infrastructure
We develop new technologies to solve these problems and make them freely available or
commercially deploy them.
We regularly host visitors and have job openings and opportunities for PhD students. If
you also want to change the world, come and join us.
http://www.nicta.com.au/
ACADEMICS, USA
Andrew Ng, Stanford University
Andrew Ng is a Co-founder of Coursera and the Director of the Stanford AI Lab. In 2011 he led
the development of Stanford University’s main MOOC (Massive Open Online Courses)
platform and also taught an online Machine Learning class that was offered to over 100,000
students, leading to the founding of Coursera.
Ng’s goal is to give everyone in the world access to a high quality education, for free. Today,
Coursera partners with some of the top universities in the world to offer high quality free online
courses. It is the largest MOOC platform in the world.
Outside online education, Ng’s work at Stanford is on machine learning with an emphasis on
deep learning. He also founded and led a project at Google to develop massive-scale deep
learning algorithms. It resulted in the famous cat detector popularly known as the “Google cat”
in which a massive neural network with 1 billion parameters learned from unlabeled YouTube
videos.
http://cs.stanford.edu/people/ang/?page_id=414
Emmanuel Candes, Stanford University
Research Areas
Compressive sensing, mathematical signal processing, computational harmonic analysis, statistics,
scientific computing. Applications to the imaging sciences and inverse problems. Other topics of
recent interest include theoretical computer science, mathematical optimization, and information
theory.
http://statweb.stanford.edu/~candes/
Tom Mitchell, Carnegie Mellon University (CMU)
Dr. Mitchell works on new learning algorithms, such as methods for learning from labeled and
unlabeled data. Much of his research is driven by applications of machine learning such as
understanding natural language text, and analyzing fMRI brain image data to model human
cognition.
http://www.cs.cmu.edu/~tom/
Robert Kass, CMU
Dr. Kass has long-standing interests in the Bayesian approach to statistical inference, and has
contributed to the development of Bayesian methods and their computational implementation.
Over the past 10 years he has focused on statistical problems in neuroscience, especially in the
analysis of signals coming from single neurons and from multiple neurons recorded
simultaneously.
http://www.stat.cmu.edu/~kass/
Alexander J. Smola, CMU
Researcher, Google
Professor, Carnegie Mellon University
Interests
My primary research interest covers the following four areas:
• Scalability of algorithms. This means pushing algorithms to internet scale, distributing them on many (faulty) machines, showing convergence, and modifying models to fit these requirements. For instance, randomized techniques are quite promising in this context. In other words, I'm interested in big data.
• Kernel methods are quite an effective means of making linear methods nonlinear and nonparametric. My research interests include support vector machines, Gaussian processes, and conditional random fields. Kernels are also very useful for the representation of distributions, that is, two-sample tests, independence tests and many applications to unsupervised learning.
• Statistical modeling, primarily with Bayesian nonparametrics, is a great way of addressing many modeling problems. Quite often, the techniques overlap with kernel methods and scalability in rather delightful ways.
• Applications, primarily in terms of user modeling, document analysis, temporal models, and modeling data at scale, are a great source of inspiration. That is, how can we find principled techniques to solve the problem, what are the underlying concepts, and how can we solve things automatically.
http://alex.smola.org/
https://www.youtube.com/channel/UCYoS2VT03weLA7uzvL2Vybw?spfreload=10
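The kernel two-sample test mentioned above can be illustrated with a short, self-contained sketch. Below is a minimal NumPy implementation of the (biased) Maximum Mean Discrepancy statistic with an RBF kernel — not code from the author's own work; the bandwidth and the toy Gaussian samples are illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of the squared Maximum Mean Discrepancy between
    # the distributions that produced samples X and Y
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
print(same, diff)
```

The statistic stays near zero for two samples drawn from the same distribution and grows when the distributions differ, which is what makes it usable as a two-sample test.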
Maria-Florina Balcan, CMU
Research
My main research interests are in machine learning and theoretical computer science. I am a
member of the machine learning group, computer science theory group, and the ACO program.
Current research focus includes:
- Developing foundations and principled, practical algorithms for important modern learning
paradigms. These include interactive learning, distributed learning, multi-task learning, and lifelong learning. My research formalizes and explicitly addresses all constraints and important
challenges of these new settings, including statistical efficiency, computational efficiency, noise
tolerance, limited supervision or interaction, privacy, low communication, and incentives.
- Analyzing the overall behavior of complex systems in which multiple agents with limited
information are adapting their behavior based on past experience, both in social and engineered
systems contexts.
- Computational aspects in game theory and economics.
- Analysis of algorithms beyond the worst case, and more generally identifying interesting and
realistic models of computation that provide a better alternative to traditional worst-case models
in a broad range of optimization problems.
http://www.cs.cmu.edu/~ninamf/
Abulhair Saparov, CMU
http://www.cs.cmu.edu/directory/abulhair-saparov
John Canny, University of California, Berkeley
John F. Canny (born 1953) is an Australian computer scientist, and Paul and Stacy Jacobs
Distinguished Professor of Engineering in the Computer Science Department of the University
of California, Berkeley. He has made significant contributions in various areas of computer
science and mathematics including artificial intelligence, robotics, computer graphics, human-computer interaction, computer security, computational algebra, and computational geometry.
http://www.cs.berkeley.edu/~jfc/papers/grouped.html
http://www.eecs.berkeley.edu/Faculty/Homepages/canny.html
Robert Schapire, Princeton University
Robert Elias Schapire is the David M. Siegel '83 Professor in the computer science
department at Princeton University. His primary specialty is theoretical and applied machine
learning.
His work led to the development of the boosting meta-algorithm used in machine learning.
Together with Yoav Freund, he invented the AdaBoost algorithm in 1996, for which the two
shared the 2003 Gödel Prize.
In 2014, Schapire was elected to the National Academy of Engineering for his contributions to
machine learning through the invention and development of boosting algorithms. (Source:
Wikipedia)
http://www.cs.princeton.edu/~schapire/
http://mitpress.mit.edu/sites/default/files/titles/content/9780262017183_sch_0001.pdf
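Boosting combines many weak classifiers into a strong one by reweighting the training examples. The following from-scratch sketch of AdaBoost over decision stumps follows the standard Freund–Schapire update; it is not code from Schapire's own materials, and the 1-D interval dataset and round count are illustrative choices.

```python
import numpy as np

def stump(X, feat, thresh, sign):
    # A decision stump: predicts +sign where X[:, feat] > thresh, -sign elsewhere
    return sign * np.where(X[:, feat] > thresh, 1.0, -1.0)

def adaboost(X, y, rounds=300):
    # AdaBoost: keep a weight per example, greedily pick the stump with the
    # lowest weighted error, upweight the examples it misclassifies, repeat.
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        best = None
        for feat in range(X.shape[1]):
            for thresh in np.unique(X[:, feat]):
                for sign in (1.0, -1.0):
                    pred = stump(X, feat, thresh, sign)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, sign, pred)
        err, feat, thresh, sign, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
        w *= np.exp(-alpha * y * pred)          # misclassified points grow
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)

# Toy 1-D problem no single stump can solve: label +1 inside an interval
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.where((X[:, 0] > 0.3) & (X[:, 0] < 0.7), 1.0, -1.0)
model = adaboost(X, y)
accuracy = (predict(model, X) == y).mean()
```

No individual stump can label an interval correctly, but the weighted vote of a few stumps can, which is the essence of the boosting meta-algorithm.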
Mona Singh, Princeton University
My group develops algorithms for a diverse set of problems in computational molecular biology.
We are particularly interested in predicting specificity in protein interactions and uncovering how
molecular interactions and functions vary across context, organisms and individuals. We leverage
high-throughput biological datasets in order to develop data-driven algorithms for predicting
protein interactions and specificity; for analyzing biological networks in order to uncover cellular
organization, functioning, and pathways; for uncovering protein functions via sequences and
structures; and for analyzing proteomics and sequencing data. An appreciation of protein
structure guides much of our research.
http://www.cs.princeton.edu/~mona/
Olga Troyanskaya, Princeton University
The goal of my research is to bring the capabilities of computer science and statistics to the study
of gene function and regulation in the biological networks through integrated analysis of
biological data from diverse data sources--both existing and yet to come (e.g. from diverse gene
expression data sets and proteomic studies). I am designing systematic and accurate
computational and statistical algorithms for biological signal detection in high-throughput data
sets. More specifically, I am interested in developing methods for better gene expression data
processing and algorithms for integrated analysis of biological data from multiple genomic data
sets and different types of data sources (e.g. genomic sequences, gene expression, and proteomics
data).
http://reducio.princeton.edu/cm/node/13
Judea Pearl, Cognitive System Laboratory, UCLA
Judea Pearl (born 1936) is an Israeli-born American computer scientist and philosopher, best
known for championing the probabilistic approach to artificial intelligence and the development
of Bayesian networks (see the article on belief propagation). He is also credited for developing a
theory of causal and counterfactual inference based on structural models (see article on
causality). He is the 2011 winner of the ACM Turing Award, the highest distinction in computer
science, "for fundamental contributions to artificial intelligence through the development of a
calculus for probabilistic and causal reasoning". (source Wikipedia)
http://bayes.cs.ucla.edu/csl_papers.html
Justin Esarey Lectures, Assistant Professor of Political Science,
Rice University
Dr. Justin Esarey is an Assistant Professor of Political Science at Rice University who specializes
in political methodology. His areas of expertise include detecting and presenting context-specific
relationships, model specification and sensitivity, the analysis of binary data, laboratory social
experimentation, and promoting thoughtful inference (and thinking about inference) by using
technology to make methodological resources available to the scholarly public. His recent
substantive projects study the relationship between corruption and female participation in
government, the effect of "naming and shaming" on human rights abuse, and the behavioral
implications of political ideology.
https://www.youtube.com/user/jeesarey/videos?spfreload=10
Justin Esarey Publications & Software, Assistant Professor of Political Science, Rice University
http://jee3.web.rice.edu/research.htm
Hal Daume III, University of Maryland
I am Hal Daumé III, an Associate Professor in Computer Science (also UMIACS and
Linguistics) at the University of Maryland; I was previously in the School of Computing at the
University of Utah (CV). Although I'd like to be known for my research in language
(computational linguistics and natural language processing) and machine learning (structured
prediction, domain adaptation and Bayesian methods), I am probably best known for my NLPers
blog. I associate myself most with conferences like ACL, ICML, EMNLP and NIPS. At UMD,
I'm affiliated with the Computational Linguistics lab, the machine learning reading group, the
language science program and the AI group, and interact closely with LINQS and computer
vision.
http://www.umiacs.umd.edu/~hal/
Melanie Mitchell, Portland State University
Research
My research interests: Artificial intelligence, machine learning, and complex systems.
Evolutionary computation and artificial life. Understanding how natural systems perform
computation, and how to use ideas from natural systems to develop new kinds of computational
systems. Cognitive science, particularly computer modeling of perception and analogy-making,
emergent computation and representation, and philosophical foundations of cognitive science.
Biographical Sketch
Melanie Mitchell is Professor of Computer Science at Portland State University, and External
Professor and Member of the Science Board at the Santa Fe Institute. She attended Brown
University, where she majored in mathematics and did research in astronomy, and the University
of Michigan, where she received a Ph.D. in computer science. Her dissertation, in collaboration
with her advisor Douglas Hofstadter, was the development of Copycat, a computer program that
makes analogies. She has held faculty or professional positions at the University of Michigan, the
Santa Fe Institute, Los Alamos National Laboratory, the OGI School of Science and
Engineering, and Portland State University. She is the author or editor of five books and over 70
scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her
most recent book, Complexity: A Guided Tour (Oxford, 2009), won the 2010 Phi Beta Kappa
Science Book Award. It was also named by Amazon.com as one of the ten best science books of
2009, and was longlisted for the Royal Society's 2010 book prize. Melanie directs the Santa Fe
Institute's Complexity Explorer project, which offers online courses and other educational
resources related to the field of complex systems.
http://web.cecs.pdx.edu/~mm/
ACADEMICS, France
Francis Bach, Ecole Normale Supérieure
I am a researcher at INRIA, leading since 2011 the SIERRA project-team, which is part of the
Computer Science Laboratory at Ecole Normale Superieure. I completed my Ph.D. in Computer
Science at U.C. Berkeley, working with Professor Michael Jordan, and spent two years in the
Mathematical Morphology group at Ecole des Mines de Paris. I then joined the WILLOW
project-team at INRIA/Ecole Normale Superieure from 2007 to 2010. I am interested in
statistical machine learning, and especially in graphical models, sparse methods, kernel-based
learning, convex optimization, vision and signal processing.
http://www.di.ens.fr/~fbach/
Gaël Varoquaux, INRIA
Machine learning and brain imaging researcher
• Research faculty (CR1), Parietal team, INRIA
• Associate researcher, Unicog team, INSERM
ACADEMIC RESEARCH
Machine learning to link cognition with brain activity: I am interested in data mining of
functional brain images (fMRI) to learn models of brain function.
• Machine learning for encoding / decoding models
• Spatial penalties for learning and denoising
• Resting-state methods
• Functional parcellations of the brain
• Functional connectivity
More...
My publications page and my Google scholar page.
Research at Parietal
OPEN-SOURCE SOFTWARE
Core contributor to scientific computing in Python:
• scikit-learn: Machine learning in Python
• joblib: lightweight pipelining of scientific code
• Mayavi: 3D plotting and scientific visualization
• nilearn: Machine learning for NeuroImaging
More...
I am editor of the scipy lecture notes.
See my view on scientific computing.
http://gael-varoquaux.info/
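Of the libraries listed above, joblib is the least self-explanatory; its core idea is transparent disk-based memoization of expensive function calls. The following stdlib-only sketch imitates that pattern — it is not joblib's actual implementation or API, and `disk_cached` and `slow_square` are hypothetical names.

```python
import hashlib
import os
import pickle
import tempfile

def disk_cached(func, cache_dir=None):
    # Memoize func's results on disk, keyed by a hash of its arguments --
    # a stdlib-only sketch of the pattern that joblib's Memory provides.
    cache_dir = cache_dir or tempfile.mkdtemp()
    def wrapper(*args):
        key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
        path = os.path.join(cache_dir, key + ".pkl")
        if os.path.exists(path):          # cache hit: load the stored result
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args)              # cache miss: compute, then store
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

calls = []

def slow_square(x):
    calls.append(x)      # record how often the real computation runs
    return x * x

fast_square = disk_cached(slow_square)
first, second = fast_square(4), fast_square(4)
```

The second call is served from disk, so the underlying function runs only once — useful when the "function" is an hours-long preprocessing step over neuroimaging data.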
ACADEMICS, in United
Kingdom
John Shawe-Taylor, University College London
John S Shawe-Taylor is a professor at University College London (UK) where he is Director of
the Centre for Computational Statistics and Machine Learning (CSML). His main
research area is Statistical Learning Theory, but his contributions range from Neural Networks,
to Machine Learning, to Graph Theory.
John Shawe-Taylor obtained a PhD in Mathematics at Royal Holloway, University of London in
1986. He subsequently completed an MSc in the Foundations of Advanced Information
Technology at Imperial College. He was promoted to Professor of Computing Science in 1996.
He has published over 150 research papers. He moved to the University of Southampton in 2003
to lead the ISIS research group. He was appointed Director of the Centre for Computational
Statistics and Machine Learning at University College London in July 2006.
He has coordinated a number of European wide projects investigating the theory and practice of
Machine Learning, including the NeuroCOLT projects. He is currently the scientific coordinator
of a Framework VI Network of Excellence in Pattern Analysis, Statistical Modelling and
Computational Learning (PASCAL) involving 57 partners.
http://www0.cs.ucl.ac.uk/staff/J.Shawe-Taylor/
Mark Herbster, University College London
My research currently focuses on the problem of predicting a labeling of a graph. This problem
is foundational for transductive and semi-supervised learning. Initial bounds and experimental
results are given in Online learning over graphs. The paper Prediction on a graph with a
perceptron significantly improves on previous results in terms of the tightness and interpretability
of the bounds. In the recent work A fast method to predict the labeling of a tree we've developed
methods to speed up graph prediction methods. I am also broadly interested in online learning,
see my publications page for more details.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/pubs/
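The graph-labeling problem described above can be made concrete with a tiny example. This is a generic harmonic label-propagation sketch on a toy path graph — not Herbster's perceptron-based method: labeled nodes are clamped, and every unlabeled node repeatedly averages its neighbours' values.

```python
import numpy as np

# Adjacency matrix of a 6-node path graph: 0-1-2-3-4-5
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    A[u, v] = A[v, u] = 1.0

labeled = {0: 1.0, 5: -1.0}   # two known labels, one at each end of the path
f = np.zeros(6)
for node, label in labeled.items():
    f[node] = label

# Harmonic solution by iteration: each unlabeled node repeatedly takes the
# mean value of its neighbours, while labeled nodes stay clamped.
for _ in range(200):
    for v in range(6):
        if v not in labeled:
            neighbours = np.nonzero(A[v])[0]
            f[v] = f[neighbours].mean()

prediction = np.sign(f)   # each node inherits the label of the nearer endpoint
```

After convergence the values interpolate linearly along the path, so the predicted labeling splits the graph between the two labeled endpoints — the transductive setting these bounds analyze.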
David Barber, University College London
David Barber received a BA in Mathematics from Cambridge University and subsequently a
PhD in Theoretical Physics (Statistical Mechanics) from Edinburgh University. He is currently
Reader in Information Processing in the department of Computer Science UCL where he
develops novel information processing schemes, mainly based on the application of probabilistic
reasoning. Prior to joining UCL he was a lecturer at Aston and Edinburgh Universities.
http://web4.cs.ucl.ac.uk/staff/d.barber/publications/david_barber_online.html
Gabriel Brostow, University College London
My name is Gabriel Brostow, and I am an associate professor (Senior Lecturer) in Computer
Science here at UCL. My group explores research problems relating to Computer Vision and
Computer Graphics. The students and colleagues here have diverse interests, but my focus is on
"Smart Capture" for analysis and synthesis applications. To me, smart capture of visual data
(usually video) means having or finding satisfying answers to these questions about a system,
whether interactive or fully automated:
I) Does the system know the intended purpose of the data being captured?
II) Can the system assess its own accuracy?
III) Does the system compare new inputs to old ones?
I love this field because it allows us to apply our expertise to a variety of tough problems,
including film and photo special effects (computational photography), action analysis (of people,
animals, and cells), and authoring systems (for architecture, animation, presentations) that make
the most of user effort. "Motion reveals everything" used to be my main research mantra, but
that has now taken hold sufficiently (obviously NOT just through my efforts!) that it no longer
needs championing.
http://www0.cs.ucl.ac.uk/staff/g.brostow/#Research
Jun Wang, University College London
My research focus is on the areas of information retrieval, large scale data mining, multimedia
content analysis, and statistical pattern recognition; current research covers both theoretical and
practical aspects:
portfolio theory and statistical modeling of information retrieval,
data mining and collaborative filtering (recommendation),
web economy and online advertising,
user-centric information seeking,
social, “the wisdom of crowds”, approaches for content understanding, organisation, and
retrieval,
peer-to-peer information retrieval and filtering, and
multimedia content analysis, indexing and retrieval.
https://scholar.google.com/citations?user=wIE1tY4AAAAJ&hl=en
David Jones Lab, University College London
My main research interests are in protein structure prediction and analysis, simulations of protein
folding, Hidden Markov Model methods, transmembrane protein analysis, machine learning
applications in bioinformatics, de novo protein design methodology, and genome analysis including
the application of intelligent software agents. New areas of research include the use of high
throughput computing and Grid technology for bioinformatics applications, analysis and
prediction of protein disorder, expression array data analysis and the analysis and prediction of
protein function and protein-protein interactions.
http://bioinf.cs.ucl.ac.uk/publications/
Simon Prince, University College London
My initial work addressed human stereo vision. My doctoral thesis concerned the solution of the
binocular stereo correspondence problem in the human visual system. I also worked on the
physiology of stereo vision in my subsequent post-doctoral research.
I became interested in computer vision and made the switch in 2000. My first Computer Science
research was on time-series methods for the solution of the inverse problem in Optical
Tomography with Simon Arridge at UCL. In Singapore, I worked for several years on
augmented reality. This involved developing algorithms for camera pose estimation, and a three-dimensional video-conferencing system using real-time image-based rendering.
More recently, I have worked on face detection in a novel foveated sensor system. I am interested
in face recognition in general and have presented work on how to recognize faces in the presence
of large pose and lighting changes.
I am interested in most areas of computer vision and computer graphics, and still maintain active
links with the neuroscience and medical imaging communities.
http://web4.cs.ucl.ac.uk/research/vis/pvl/
http://www.computervisionmodels.com/
Massimiliano Pontil, University College London
I am mainly interested in machine learning theory and pattern recognition. I have also some
interest in function representation and approximation, numerical optimization and statistics. I
have worked on different machine learning approaches, particularly on regularization methods,
such as support vector machines and other kernel-based methods, multi-task and transfer
learning, online learning and learning over graphs. I have also worked on machine learning
applications arising in computer vision, natural language processing, bioinformatics and user
modeling.
http://www0.cs.ucl.ac.uk/staff/M.Pontil/pubs.html
Richard E Turner, Cambridge University
Richard Turner holds a Lectureship (equivalent to US Assistant Professor) in Computer Vision
and Machine Learning in the Computational and Biological Learning Lab, Department of
Engineering, University of Cambridge, UK. Before taking up this position, he held an EPSRC
Postdoctoral research fellowship which he spent at both the University of Cambridge and the
Laboratory for Computational Vision, NYU, USA. He has a PhD degree in Computational
Neuroscience and Machine Learning from the Gatsby Computational Neuroscience Unit, UCL,
UK and a M.Sci. degree in Natural Sciences (specialism Physics) from the University of
Cambridge, UK.
https://scholar.google.com/citations?user=DgLEyZgAAAAJ&hl=en
Andrew McHutchon Homepage, Cambridge University
Before starting my PhD I took the MEng course at Cambridge and specialised in Information
Engineering in my third and fourth year. In particular I studied control, bioinformatics, and some
information theory and statistics. As part of the MEng year I undertook a research project with
Carl Rasmussen, on applying Machine Learning techniques to control; this has now continued
on into my PhD research. Other avenues of research I have so far looked at include fast
approximations to Gaussian Processes for uncertain inputs and training GPs with input noise. I
am a member of Churchill College.
http://mlg.eng.cam.ac.uk/?portfolio=andrew-mchutchon
Phil Blunsom, Oxford University
My research interests lie at the intersection of machine learning and computational linguistics. I
apply machine learning techniques, such as graphical models, to a range of problems relating to
the understanding, learning and manipulation of language. Recently I have focused on structural
induction problems such as grammar induction and learning statistical machine translation
models.
https://scholar.google.co.uk/citations?user=eJwbbXEAAAAJ&hl=en
Nando de Freitas, Oxford University
I want to understand intelligence and how minds work. My research is multi-disciplinary and
focuses primarily on the following areas:
Machine learning, big data, and computational statistics
Artificial intelligence, probabilistic reasoning, and decision making
Computational neuroscience, neural networks, and cognitive science
Randomized algorithms, and Monte Carlo simulation
Vision, robotics, and speech perception
http://scholar.google.co.uk/citations?user=nzEluBwAAAAJ&hl=en
Karl Hermann, Oxford University
My research is at the intersection of Natural Language Processing and Machine Learning, with
particular emphasis on semantics. Current topics of interest include:
Compositional Semantics
Learning from Multilingual Data
Semantic Frame Identification
Machine Translation
Hypergraph Grammars
http://www.cs.ox.ac.uk/people/publications/personal/KarlMoritz.Hermann.html
Edward Grefenstette, Oxford University
I am a Franco-American computer scientist, working as a research assistant on EPSRC Project
EP/I03808X/1 entitled A Unified Model of Compositional and Distributional Semantics: Theory and
Applications. I am also lecturing at Hertford College to students taking Oxford's new computer
science and philosophy course. From October 2013, I will also be a Fulford Junior Research
Fellow at Somerville College.
http://www.cs.ox.ac.uk/people/publications/date/Edward.Grefenstette.html
ACADEMICS, in Netherlands
Thomas Geijtenbeek Publications & Videos, Delft University of
Technology
I am a postdoctoral researcher at Delft University of Technology. My main research interests are
simulation, control, animation and artificial intelligence. In addition, I work part-time as
Manager Software Development at Motek Medical.
http://goatstream.com/research/
ACADEMICS, in Canada
Yoshua Bengio, University of Montreal
My long-term goal is to understand intelligence; understanding the underlying principles
would deliver artificial intelligence, and I believe that learning algorithms are essential in this
quest.
Machine learning algorithms attempt to endow machines with the ability to capture operational
knowledge through examples, e.g., allowing a machine to classify or predict correctly in new
cases. Machine learning research has been extremely successful in the past two decades and is
now applied in many areas of science and technology, some well known examples including web
search engines, natural language translation, speech recognition, machine vision, and data mining. Yet machines still seem to fall short of even mammal-level intelligence in many respects.
One of the remaining frontiers of machine learning is the difficulty of learning the kind of
complicated and highly-varying functions that are necessary to perform machine vision or
natural language processing tasks at a level comparable to humans (even a 2-year old).
See my lab's long-term vision web page for a broader introduction.
An introductory discussion of recent and ongoing research is below. See the lab's publications site
for a downloadable and complete bibliographic list of my papers.
http://www.iro.umontreal.ca/~bengioy/yoshua_en/research.html
http://www.iro.umontreal.ca/~bengioy/yoshua_en/
Deep Learning Slides by Yoshua Bengio, MLSS 2015, Austin, Texas
http://www.iro.umontreal.ca/~bengioy/talks/mlss-austin.pdf
KyungHyun Cho, University of Montreal
http://www.kyunghyuncho.me/home
Deep Learning Tutorial at KAIST Slides
https://drive.google.com/file/d/0B16RwCMQqrtdb05qdDFnSXprM0E/edit?pli=1
Geoffrey Hinton, University of Toronto
I design learning algorithms for neural networks. My aim is to discover a learning procedure that
is efficient at finding complex structure in large, high-dimensional datasets and to show that this is
how the brain learns to see. I was one of the researchers who introduced the back-propagation
algorithm that has been widely used for practical applications. My other contributions to neural
network research include Boltzmann machines, distributed representations, time-delay neural
nets, mixtures of experts, variational learning, contrastive divergence learning, dropout, and deep
belief nets. My students have changed the way in which speech recognition and object
recognition are done.
I now work part-time at Google and part-time at the University of Toronto.
http://www.cs.toronto.edu/~hinton/papers.html
http://www.cs.toronto.edu/~hinton/
Alex Graves, University of Toronto
Research Interests
Recurrent neural networks (especially LSTM)
Supervised sequence labelling (especially speech and handwriting recognition)
Unsupervised sequence learning
http://www.cs.toronto.edu/%7Egraves/
Hugo Larochelle, Universite de Sherbrooke
I am interested in machine learning algorithms, that is, algorithms capable of extracting concepts or patterns from data. My work focuses on developing connectionist and probabilistic approaches to various problems in artificial intelligence, such as computer vision and natural language processing.
My research topics include:
Problems: supervised, semi-supervised and unsupervised learning, structured prediction, ranking, density estimation;
Models: deep neural networks ("deep learning"), autoencoders, Boltzmann machines, Markov random fields;
Applications: object recognition and tracking, document classification and ranking.
http://www.dmi.usherb.ca/~larocheh/index_fr.html
http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html
Giuseppe Carenini, University of British Columbia
http://www.cs.ubc.ca/%7Ecarenini/storage/new-papers-frame.html
Cristina Conati, University of British Columbia
http://www.cs.ubc.ca/~conati/publications.php
Kevin Leyton-Brown, University of British Columbia
http://www.cs.ubc.ca/~kevinlb/publications.html
Holger Hoos, University of British Columbia
http://www.cs.ubc.ca/~hoos/publications.html
Jim Little, University of British Columbia
http://www.cs.ubc.ca/~little/links/papers.html
David Lowe, University of British Columbia
http://www.cs.ubc.ca/~lowe/pubs.html
Karon MacLean, University of British Columbia
http://www.cs.ubc.ca/labs/spin/publications/index.html
Alan Mackworth, University of British Columbia
http://www.cs.ubc.ca/~mack/Publications/sort_date.html
Dinesh K. Pai, University of British Columbia
http://www.cs.ubc.ca/~pai/
David Poole, University of British Columbia
http://www.cs.ubc.ca/~poole/publications.html
Prof. Shai Ben-David, University of Waterloo
Research Interests
My research interests span a wide spectrum of topics in the foundations of computer science and
its applications, with a particular emphasis on statistical and computational machine learning.
The common thread throughout my research is the aim of providing mathematical formulation and
understanding of real-world problems. In particular, I have been looking at popular machine
learning and data mining paradigms that seem to lack clear theoretical justification.
https://cs.uwaterloo.ca/~shai/
http://videolectures.net/shai_ben_david/
ACADEMICS, in Germany
Machine Learning Lab, University of Freiburg
Future computer programs will contain a growing part of 'intelligent' software modules that are
not conventionally programmed, but that are learned either from data provided by the user or
from data that the program autonomously collects during its use.
In this spirit, the Machine Learning Lab deals with research on Machine Learning techniques
and the integration of learning modules into larger software systems, aiming at their effective
application in complex real-world problems. Application areas are robotics, control, forecasting
and disposition systems, scheduling and related fields.
Research Areas: Efficient Reinforcement Learning Algorithms, Intelligent Robot Control
Architectures, Learning in Multiagent Systems, (Un-)Supervised Learning, Deep Learning,
Autonomous Robots, Industrial Applications, Clinical Applications
http://ml.informatik.uni-freiburg.de
ACADEMICS, in China
En-Hong Chen, USTC
My current research interests are data mining and machine learning, especially social network
analysis and recommender systems. I have published more than 100 papers in journals
and conferences, including international journals such as IEEE Trans. and ACM Trans., and
important data mining conferences such as KDD, ICDM, and NIPS. My research is supported by
the National Natural Science Foundation of China, the National High Technology Research and
Development Program (863) of China, etc. I won the Best Application Paper Award at KDD 2008
and the Best Research Paper Award at ICDM 2011.
http://staff.ustc.edu.cn/~cheneh/#pub
Linli Xu, USTC
My research area is Machine Learning. More specifically, my work combines aspects from the
following:
• Unsupervised learning and semi-supervised learning, clustering
• Large margin approaches, support vector machines
• Optimization, convex programming
http://staff.ustc.edu.cn/~linlixu/papers.html
Yuan Yao, School of Mathematical Sciences, Peking University
My most recent interests are focusing on mathematics for data sciences, in particular topological
and geometric methods for high dimensional data analysis and statistical machine learning, with
applications in computational biology and information technology.
Publications and code to reproduce results
http://www.math.pku.edu.cn/teachers/yaoy/research.html
ACADEMICS, in Australia
Prof. Peter Corke, Queensland University of Technology
Software for robotics, vision and other things. This includes the robotics and machine vision
toolboxes for Matlab. More recently this has become a book, and then two MOOCs.
Everything is freeware, so enjoy!
About
I live in Brisbane with my wife, two daughters and a cat.
By day I’m a professor at Queensland University of Technology. My interests include robotics,
computer vision, embedded systems, control and networking. I’ve worked on robotic systems for
mining, aerial and underwater applications.
By night I maintain two open-source toolboxes, one for robotics and one for vision, and have just
finished writing a book on robotics, vision & control which will be published September 2011.
http://www.petercorke.com/Home.html
ACADEMICS, in United Arab Emirates
Dmitry Efimov, American University of Sharjah, UAE
Dmitry is an expert in promising areas of modern complex and functional analysis, and the author of
original results. He began with a systematic study of some classes of analytic functions in the
half-plane that are analogous to the well-known Privalov classes and maximal Privalov classes in
the disc. His main results are the following:
1) A new factorization formula and accurate growth estimates for functions in these classes;
2) The introduction of natural invariant metrics under which the classes form Fréchet algebras;
3) A complete description of the linear isometries, as well as the bounded and completely
bounded subsets, in the classes.
https://www2.aus.edu/facultybios/profile.php?faculty=defimov
https://www.kaggle.com/users/29346/dmitry-efimov
ACADEMICS, in Poland
Marcin Mucha, University of Warsaw, POLAND
I am an assistant professor at the Institute of Informatics, University of Warsaw, member of the
Algorithms Group (see our blog!).
I work on graph algorithms, approximation algorithms and on-line algorithms. You can find
most of my papers at DBLP or here.
You can find my PhD Thesis here; it contains a rather detailed exposition of the algebraic
approach to matching problems in graphs.
http://duch.mimuw.edu.pl/~mucha/wordpress/?page_id=58
ACADEMICS, in Switzerland
Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be
missed!)
Prof. Jürgen Schmidhuber's Artificial Intelligence team has won nine international competitions
in machine learning and pattern recognition (more than any other AI research group) and seven
independent best paper/best video awards, achieved the world's first superhuman visual
classification results, pioneered Deep Learning methods for Artificial Neural Networks since
1991 (winning contests in pattern recognition and sequence learning through fast and
deep/recurrent neural networks), and established the field of mathematically rigorous universal
AI and optimal universal problem solvers. His formal theory of creativity, curiosity and fun
explains art, science, music, and humor. He generalized algorithmic information theory, and the
many-worlds theory of physics, to obtain a minimal theory of all constructively computable
universes - an elegant algorithmic theory of everything. Google, Apple and many other leading
companies are now using the machine learning techniques developed in his group at the Swiss
AI Lab IDSIA & USI & SUPSI (ex-TUM CogBotLab). Since age 15 or so his main scientific
ambition has been to build an optimal scientist through self-improving AI, then retire. Progress is
accelerating - are 40,000 years of human-dominated history about to converge within the next
few decades?
http://people.idsia.ch/~juergen/
Free access to ML MSc & PhD
Dissertations
Machine Learning Department, Carnegie Mellon University
https://www.ml.cmu.edu/research/phd-dissertations.html
Machine Learning Department, Columbia University
(Search for PhD on the page)
http://www.cs.columbia.edu/learning/papers.html
Nonlinear Modelling and Control using Gaussian Processes,
PhD Thesis by Andrew McHutchon, Cambridge University
Abstract
... In this thesis we start by discussing how GPs can be applied to data sets which have noise
affecting their inputs. We present the 'Noisy Input GP', which uses a simple local-linearisation to
refer the input noise into heteroscedastic output noise, and compare it to other methods both
theoretically and empirically. We show that this technique leads to an effective model for nonlinear
functions with input and output noise. We then consider the broad topic of GP state space
models for application to dynamical systems. We discuss a very wide variety of approaches for
using GPs in state space models, including introducing a new method based on
moment-matching, which consistently gave the best performance. We analyse the methods in some detail,
including providing a systematic comparison between approximate-analytic and particle
methods. To our knowledge such a comparison has not been provided before in this area. Finally,
we investigate an automatic control learning framework, which uses Gaussian Processes to model
a system for which we wish to design a controller. Controller design for complex systems is a
difficult task, and thus a framework which allows an automatic design directly from data promises
to be extremely useful. We demonstrate that the previously published framework cannot cope
with the presence of observation noise, but that the introduction of a state space model
dramatically improves its performance. This contribution, along with some other suggested
improvements, opens the door for this framework to be used in real-world applications.
http://mlg.eng.cam.ac.uk/pub/pdf/Mch14.pdf
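The core idea of the 'Noisy Input GP' in the abstract above can be illustrated with a minimal sketch (not the thesis code; the function name and values are illustrative): under a local linearisation, y = f(x + ex) + ey ≈ f(x) + f'(x)·ex + ey, so input noise of variance sx2 appears at the output as extra, input-dependent (heteroscedastic) noise of variance f'(x)²·sx2 on top of the usual output-noise variance sy2.

```python
def effective_output_variance(slope, sy2, sx2):
    """Heteroscedastic output-noise variance after local linearisation.

    slope: local derivative f'(x) of the latent function at the input
    sy2:   variance of the output noise ey
    sx2:   variance of the input noise ex
    """
    return sy2 + slope ** 2 * sx2

# Example: for the same input noise, a steep region of f inflates the
# effective output noise far more than a nearly flat region does.
sx2, sy2 = 0.1, 0.01
flat = effective_output_variance(slope=0.1, sy2=sy2, sx2=sx2)
steep = effective_output_variance(slope=3.0, sy2=sy2, sx2=sx2)
print(flat, steep)
```

This is why the model in the thesis is described as referring input noise into heteroscedastic output noise: the extra variance depends on the local slope of the function, which varies across the input space.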
PhD Dissertations, University of Edinburgh, UK
https://www.era.lib.ed.ac.uk
MSc Dissertations, University of Oxford, UK
https://www.cs.ox.ac.uk/admissions/grad/
A list of some recent theses that received high marks
Machine Learning Group, Department of Engineering,
University of Cambridge, UK
(Search for PhD on the page)
http://mlg.eng.cam.ac.uk/pub/
New York University Computer Science PhD Theses
http://www.cs.nyu.edu/web/Research/theses.html
Digital Collection of The Australian National University (PhD
Thesis)
https://digitalcollections.anu.edu.au/handle/1885/3/simple-search?query=machine+learning&rpp=10&sort_by=0&order=DESC&etal=0&submit_search=Update
TEL (thèses-EN-ligne) (more than 45,000 theses, though some
in French!)
The purpose of TEL (thèses-EN-ligne) is to facilitate the self-archiving of thesis manuscripts,
which are important documents for direct scientific communication between scientists. TEL is
actually a particular "environment" of HAL. It therefore has the same objective: to make scientific
documents available to scientists all around the world, rapidly and freely, but with a restriction to
PhD theses and habilitations (HDR, in countries where habilitations exist). CCSD does not make
any scientific evaluation of the theses that are submitted, since this is the responsibility of the
university professors on the examination board.
https://tel.archives-ouvertes.fr/browse/domain