The Machine Learning Salon

What is the Machine Learning Salon’s Kit?
  What is the Machine Learning Salon’s Kit?
  Future Development
  About the Founder of The Machine Learning Salon’s Website & Kit
  Contact

MOOC or Opencourseware – English (checked 2-Jan-2015)
  Coursera
  Machine Learning Stanford Course
  Practical Machine Learning
  Machine Learning Washington Course
  Core Concepts in Data Analysis (Higher School of Economics)
  Neural Networks for Machine Learning
  Natural Language Processing
  Probabilistic Graphical Models
  Stanford Engineering Everywhere
  EdX
  Learning from data (Caltech)
  Artificial Intelligence (BerkeleyX)
  Big Data and Social Physics (Ethics)
  Introduction to Computational Thinking and Data Science
  MIT OpenCourseWare (OCW)
  VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos
  Foundations of Machine Learning by Mehryar Mohri - 10 years of Homeworks with Solutions and Lecture Slides, not to be missed!
  IPAM, Institute for Pure and Applied Mathematics, Videos, UCLA
  Carnegie Mellon University
  Carnegie Mellon University (CMU) Video resources
  Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani, CMU
  Machine Learning, Spring 2011, by Tom Mitchell, CMU
  Metacademy Concept list and roadmap list
  Harvard University
  Advanced Machine Learning, Fall 2013 (Free access to most of videos)
  Data Science Course, Fall 2013
  Oxford University, Nando de Freitas video lectures
  Yee Whye Teh Home Page, Department of Statistics, University College, University of Oxford
  Cambridge University Machine Learning Slides, Spring 2014
  Caltech University, Learning from Data
  University College London Discovery
  University College London, Supervised Learning
  Yann LeCun’s Publications
  Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French)
  Technion, Israel Institute of Technology, Machine Learning Videos

machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! Machinelearningsalon.org kit is regularly updated!

  E0 370: Statistical Learning Theory by Prof. Shivani Agarwal, Indian Institute of Science
  NPTEL, National Programme on Technology Enhanced Learning, India
  Probability Theory and Applications
  Pattern Recognition
  Pattern Recognition Class, Universität Heidelberg, 2012 (Videos in English)
  Videolectures.net
  MLSS Machine Learning Summer Schools Videos
  MLSS Videos from 2004 to 2012
  MLSS Videos 2012
  MLSS Videos 2012
  Max Planck Institute for Intelligent Systems Tübingen, MLSS Videos 2013
  MLSS Videos 2014
  All slides of MLSS 2015, Austin, Texas
  GoogleTechTalks
  Machine Learning
  Deep Learning
  Udacity Opencourseware
  Supervised Learning (select "View Courseware" for free access)
  Unsupervised Learning (select "View Courseware" for free access)
  Reinforcement Learning (select "View Courseware" for free access)
  Mathematicalmonk Machine Learning
  Judea Pearl Symposium
  Machine Learning Reading Group, Indian Institute of Science
  SIGDATA, Indian Institute of Technology Kanpur
  Hakka Labs
  Open Yale Course
  Columbia University
  Machine Learning resources
  Applied Data Science by Ian Langmore and Daniel Krasner
  Deep Learning
  BigDataWeek Videos
  Neural Information Processing Systems Foundation (NIPS) Video resources
  Hong Kong Open Source Conference 2013 (English & Chinese)
  ICLR 2014 Videos
  ICLR 2013 Videos
  Machine Learning Conference Videos
  Internet Archive
  University of Berkeley
  AMP Camps, Big Data Bootcamp, UC Berkeley
  Resources and Tools of Noah's ARK Research Group
  ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014
  The Royal Society
  Statistical and causal approaches to machine learning by Professor Bernhard Schölkopf
  Deep Learning
  Deep Learning RNNaissance with Dr. Juergen Schmidhuber
  Introduction to Deep Learning with Python by Alec Radford
  A Statistical Learning/Pattern Recognition Glossary by Thomas Minka
  The Kalman Filter Website by Greg Welch and Gary Bishop
  Lisbon Machine Learning School (LXMLS)
  LXMLS Lab guide (Great Tutorial!)
  LXMLS Slides, 2014
  INTRODUCTORY APPLIED MACHINE LEARNING by Victor Lavrenko and Nigel Goddard, University of Edinburgh, 2011
  Data Mining and Machine Learning Course Material by Bamshad Mobasher, DePaul University, Fall 2014
  Intelligent Information Retrieval by Bamshad Mobasher, DePaul University, Winter 2015
  Student Dave Youtube Channel
  Current Courses of Justin E. Esarey, RICE University
  From Bytes to Bites: How Data Science Might Help Feed the World by David Lobell, Stanford University
  Information and Data Analytics Seminar by Jure Leskovec, Stanford, Centre for Professional Development
  Conference on Empirical Methods in Natural Language Processing (and forerunners) (EMNLP)
  emnlp acl's Youtube Channel
  Columbia University's Laboratory for Intelligent Imaging and Neural Computing (LIINC)
  Enabling Brain-Computer Interfaces for Labeling Our Environment by Paul Sajda
  The Unreasonable Effectiveness Of Deep Learning by Yann LeCun, Sept 2014
  Machine Learning by Prof. Shai Ben-David, University of Waterloo, Lecture 1-3, Jan 2015
  Miscellaneous
  Introduction To Modern Brain-Computer Interface Design by Swartz Center for Computational Neuroscience
  Distributed Computing Courses (lectures, exercises with solutions) by ETH Zurich, Group of Prof. Roger Wattenhofer
  The wonderful and terrifying implications of computers that can learn | Jeremy Howard | TEDxBrussels
  Partially derivative, A podcast about data, data science, and awesomeness!
  Class Central
  Beginning to Advanced University CS Courses

MOOC or Opencourseware – Spanish (checked 2-Jan-2015)
MOOC or Opencourseware – German (checked 2-Jan-2015)
MOOC or Opencourseware – Italian (checked 2-Jan-2015)
MOOC or Opencourseware – French (checked 2-Jan-2015)
  University of Laval (French Canadian)
  Apprentissage automatique
  Théorie algorithm. des graphes
  Hugo Larochelle, Apprentissage automatique, French Canadian
  Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French)
  College de France, Mathematics and Digital Science, French

MOOC or Opencourseware – Russian (checked 2-Jan-2015)
  Russian Machine Learning Resources
  Yandex School The Yandex School of Data Analysis
  Alexander D’yakonov Resources
  Unknown in Data Mining and Machine Learning (2013)
  Introduction to Data Mining (2012)
  Tricks in Data Mining (2011)
  Manual "Logic Games, Data Mining, Weka, RapidMiner, MATLAB" (2010)
  Machine Learning lectures by Konstantin Vorontsov

MOOC or Opencourseware – Japanese (checked 2-Jan-2015)
MOOC or Opencourseware – Chinese (checked 2-Jan-2015)
  Yeeyan Coursera Chinese Classroom
  Hong Kong Open Source Conference 2013
  Guokr.com
  Machine Learning
  Data Mining
  Artificial Intelligence

MOOC or Opencourseware – Portuguese (checked 2-Jan-2015)
  Aprendizado de Maquina by Bianca Zadrozny, Instituto de Computação, UFF, 2010
  Algoritmo de Aprendizado de Máquina by Aurora Trinidad Ramirez Pozo, Universidade Federal do Paraná, UFPR
  Digital Library, Universidade de São Paulo

MOOC or Opencourseware – Hebrew & English (checked 2-Jan-2015)
  Open University of Israel

Exercises & Solutions
  CS229 Stanford Machine Learning (Huge!) List of projects (free access to abstracts), 2013 and previous years
  CS229 Stanford Machine Learning by Andrew Ng, Autumn 2014
  CS 445/545 Machine Learning by Melanie Mitchell, Winter Quarter 2014
  Top Writing Errors by Melanie Mitchell
  Introduction to Machine Learning, Machine Learning Lab, University of Freiburg, Germany
  Unsupervised Feature Learning and Deep Learning by Andrew Ng, 2011?
  Machine Learning by Andrew Ng, 2011
  Pattern Recognition and Machine Learning, Solutions to Exercises, by Markus Svensén and Christopher Bishop, 2009
  Machine Learning Course by Aude Billard, Exercises & Solutions, EPFL, Switzerland
  T-61.3025 Principles of Pattern Recognition Weekly Exercises with Solutions (in English), Aalto University, Finland, 2015
  T-61.3050 Machine Learning: Basic Principles Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
  CSE-E5430 Scalable Cloud Computing Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
  Weekly Exercises with Solutions (in English) from Aalto University, Finland
  SurfStat Australia: an online text in introductory Statistics
  Learning from Data by Amos Storkey, Tutorial & Worksheets (with solutions), University of Edinburgh, Fall 2014
  Web Search and Mining by Christopher Manning and Prabhakar Raghavan, Winter 2005
  Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Spring 2014
  Introduction to Time Series by Peter Bartlett, Berkeley, Homework & solutions, Fall 2010
  Introduction to Machine Learning by Stuart Russell, CS 194-10, Fall 2011, Assignments & Solutions
  Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Fall 2009

Applications
  MIT Media Lab
  TEDx San Francisco, Connected Reality
  Emotion & Pain Project
  NHK Documentary “Robot Revolution” Developing Robots for Dangerous Fukushima Decommission Process
  IBM Research
  Visualizing MBTA Data: An interactive exploration of Boston's subway system

Commercial Applications (listed without any transfer of money)
  Google glass
  Google self-driving car
  SenseFly
  HOW MICROSOFT'S MACHINE LEARNING IS BREAKING THE GLOBAL LANGUAGE BARRIER

Free access to Research papers – English
  Cambridge University Publications page
  arXiv.org by Cornell University Library
  Google Scholar
  Google Research
  Yahoo Research
  Microsoft Research
  Journal from MIT Press
  INRIA
  DROPS, Dagstuhl Research Online Publication Server

Open Source Software – English
  JAVA
  Weka 3: Data Mining Software in Java
  A deep-learning library for Java
  List of Java ML Software by Machine Learning Mastery
  List of Java ML Software by MLOSS
  MathFinder: Math API Discovery and Migration, Software Engineering and Analysis Lab (SEAL), IISc Bangalore
  PYTHON
  Theano Library for Deep Learning
  Introduction to Deep Learning with Python
  Udacity - Programming foundations with Python
  Scikit-learn, Machine Learning in Python
  Pydata
  PyData NYC 2014 Videos
  PyData, The Complete Works by Rohit Sivaprasad
  Anaconda
  Ipython Interactive Computing
  Scipy
  Numpy
5 matplotlib ................................................................................................................................................................... 68 pandas .......................................................................................................................................................................... 68 SymPy ........................................................................................................................................................................... 68 Orange .......................................................................................................................................................................... 68 Pythonic Perambulations: How to be a Bayesian in Python ................................................................ 69 emcee ............................................................................................................................................................................ 69 PyMC ............................................................................................................................................................................. 69 Pylearn2 ...................................................................................................................................................................... 69 Giant list of python learning resources .......................................................................................................... 69 PyCon US 2014 .......................................................................................................................................................... 69 PyCon India 2012 .................................................................................................................................................... 
70 PyCon India 2013 .................................................................................................................................................... 70 Montreal Python ...................................................................................................................................................... 70 SciPy 2014 .................................................................................................................................................................. 70 PyLadies London Meetup resources ................................................................................................................ 70 Python Tools for Machine Learning by CB Insights .................................................................................. 70 Python Tutorials by Jessica MacKellar ........................................................................................................... 70 INTRODUCTION TO PYTHON FOR DATA MINING .................................................................................... 71 Python Scientific Lecture Notes ........................................................................................................................ 71 OCTAVE ....................................................................................................................................................................... 71 PMTK Toolbox by Matt Dunham, Kevin Murphy ....................................................................................... 71 JULIA ............................................................................................................................................................................. 72 Julia by example ....................................................................................................................................................... 
72 The R PROJECT for Statistical Computing .................................................................................................... 72 R ...................................................................................................................................................................................... 72 R Graph Gallery ........................................................................................................................................................ 72 Code School -‐ R Course .......................................................................................................................................... 73 Coursera R programming .................................................................................................................................... 73 Open Intro R Labs .................................................................................................................................................... 73 R Tutorial .................................................................................................................................................................... 73 DataCamp R Course ................................................................................................................................................ 73 R Bloggers ................................................................................................................................................................... 74 STAN Software ......................................................................................................................................................... 74 List of Machine Learning Open Source Software ...................................................................................... 74 Google Prediction API ........................................................................................................................................... 
74 Reddit ........................................................................................................................................................................... 75 SCHOGUN toolbox ................................................................................................................................................... 75 Infer.NET, Microsoft Research .......................................................................................................................... 75 F# Software Foundation ...................................................................................................................................... 75 BigML ........................................................................................................................................................................... 76 BRML Toolbox in Matlab – David Barber Toolbox, University College London .......................... 76 Dmitry Efimov Software ...................................................................................................................................... 76 SCILAB ......................................................................................................................................................................... 76 OverFeat and Torch7, CILVR Lab @ NYU ..................................................................................................... 76 FAIR open sources deep-‐learning modules for Torch .............................................................................. 76 IPython kernel for Torch with visualization and plotting ..................................................................... 77 Mloss.org .................................................................................................................................................................... 
77 Sourceforge ............................................................................................................................................................... 77 AForge.NET Framework ...................................................................................................................................... 77 machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! Machinelearningsalon.org kit is regularly updated! 6 cuda-‐convnet ............................................................................................................................................................ 77 word2vec .................................................................................................................................................................... 77 Freecode ..................................................................................................................................................................... 78 Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer, and John Langford, August 2014 ............................................................................................................................... 78 Maxim Milakov Software ..................................................................................................................................... 78 Alfonso Nieto-‐Castanon Software .................................................................................................................... 78 Lib Skylark ................................................................................................................................................................. 78 Mutual Information Text Explorer .................................................................................................................. 
79 Data Science Resources by Jonathan Bower on GitHub ......................................................................... 79 Joseph Misiti's Blog ................................................................................................................................................ 80 Michael Waskom GitHub repositories ........................................................................................................... 80 Visualizing distributions of data ...................................................................................................................... 80 Exploring Seaborn and Pandas based plot types in HoloViews by Philipp John Frederic Rudiger ........................................................................................................................................................................ 80 "Machine Learning: An Algorithmic Perspective" Code by Stephen Marsland ............................ 81 Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!) .......................................................................................................................................................................... 84 Matlab and R code for original figures in book "Analysis of Neural Data" by Robert E. Kass, Uri T. Eden, and Emery N. Brown .................................................................................................................... 84 Open Source Hong Kong ...................................................................................................................................... 84 Lamda Group, Nanjing University ................................................................................................................... 84 Miscellaneous ........................................................................................................................................................... 
85 Overleaf (ex WriteLaTeX) .................................................................................................................................... 85 Interview of Dr John Lees-‐Miller by Imperial College London ACM Student Chapter ............... 85 LISA Lab GitHub repository, Université de Montréal .............................................................................. 85 Big Data/Cloud Computing – English .......................................................................... 85 Apache SPARK .......................................................................................................................................................... 85 Apache Spark Machine Learning Library ..................................................................................................... 85 2013 Spark Summit exercises ............................................................................................................................ 86 2014 Spark Summit Training ............................................................................................................................. 86 Apache Spark Summit Videos ............................................................................................................................ 87 Databricks Videos .................................................................................................................................................... 87 SF Scala & SF Bay Area Machine Learning, Joseph Bradley: Decision Trees on Spark ............. 87 Apache MAHOUT ..................................................................................................................................................... 87 Apache Mahout ML library ................................................................................................................................. 
87 Apache Mahout on Javaworld ............................................................................................................................ 88 Hadoop Users Group UK ....................................................................................................................................... 88 Deeplearning4j ......................................................................................................................................................... 88 Udacity opencourseware "Intro to Hadoop and MapReduce" ............................................................ 89 Storm Apache ........................................................................................................................................................... 89 Scaling Apache Storm by Taylor Goetz .......................................................................................................... 89 Michael Viogiatzis Blog ......................................................................................................................................... 89 Elasticsearch ............................................................................................................................................................. 89 Prediction IO ............................................................................................................................................................. 90 Container Cluster Manager ................................................................................................................................. 90 Domino Data Labs .................................................................................................................................................. 90 Data Science Central .............................................................................................................................................. 
90 Amazon Web Services Videos ............................................................................................................................ 91 machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… 7 Don’t keep an old version! Machinelearningsalon.org kit is regularly updated! Google Cloud Computing Videos ...................................................................................................................... 91 VLAB: Deep Learning: Intelligence from Big Data, Stanford Graduate School of Business .... 91 Machine Learning and Big Data in Cyber Security Eyal Kolman Technion Lecture .................. 91 Chaire Machine Learning Big Data, Telecom Paris Tech (Videos in French) ................................ 91 An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014 .............................................................................................................................................................................. 91 Big Data Requires Big Visions For Big Change | Martin Hilbert | TEDxUCL .................................. 92 Ethical Quandary in the Age of Big Data | Justin Grace | TEDxUCL ................................................... 92 Big Data & Dangerous Ideas | Daniel Hulme | TEDxUCL ....................................................................... 93 List of good free Programming and Data Resources, BITBOOTCAMP ............................................. 93 Predictive Modeling Competitions – English ............................................................... 95 Angry Birds AI Competition ............................................................................................................................... 95 LinkedIn Economic Graph Challenge, Deadline: 15-‐12-‐2014, $25,000 research award ......... 
95 ChaLearn ..................................................................................................................................................................... 96 ChaLearn Automatic Machine Learning Challenge (AutoML) ............................................................ 96 IMAGENET Large Scale Visual Recognition Challenge 2014 (closed) ............................................. 96 Kaggle ........................................................................................................................................................................... 97 Kaggle Competition Past Solutions ................................................................................................................. 97 Kaggle Connectomics Winning Solution Research Article .................................................................... 97 Solution to the Galaxy Zoo Challenge ............................................................................................................. 97 Winning 2 Kaggle in class competitions on spam ..................................................................................... 97 Matlab Benchmark for Packing Santa’s Sleigh translated in Python .............................................. 97 Machine learning best practices we've learned from hundreds of competitions -‐ Ben Hamner (Kaggle) ..................................................................................................................................................... 98 TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms) .................... 98 CrowdANALYTICS .................................................................................................................................................. 98 Challenges for governmental applications .................................................................................................. 
98 InnoCentive Challenge Center ........................................................................................................................... 98 TunedIT ....................................................................................................................................................................... 98 Ants, AI Challenge, sponsored by Google, 2011 ......................................................................................... 98 International Collegial Programming Contest ........................................................................................... 98 Dream challenges .................................................................................................................................................... 99 Texata ........................................................................................................................................................................... 99 Cisco Internet of Things Innovation Grand Challenge ............................................................................ 99 Predictive Modeling Competitions -‐ Spanish ............................................................ 100 Predictive Modeling Competitions -‐ German ............................................................ 100 Predictive Modeling Competitions -‐ Italian .............................................................. 100 Predictive Modeling Competitions – French ............................................................. 100 RATP OpenDataLab results .............................................................................................................................. 100 Predictive Modeling Competitions -‐ Russian ............................................................ 100 Competition Avito.ru-‐2014: Recognition of contact information in images ............................... 
100 Russian AI Cup -‐ Competition Programming Artificial Intelligence, 2013 .................................. 101 Predictive Modeling Competitions -‐ Portuguese ....................................................... 101 Open Dataset – English ............................................................................................ 102 machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! Machinelearningsalon.org kit is regularly updated! 8 The Text REtrieval Conference (TREC) Datasets .................................................................................... 102 HDX Humanitarian Data Exchange ............................................................................................................... 103 World Data Bank ................................................................................................................................................... 103 US Dataset ................................................................................................................................................................ 103 US City Open Data Census ................................................................................................................................. 103 Machine Learning repository .......................................................................................................................... 104 IMAGENET ............................................................................................................................................................... 104 Stanford Large Network Dataset Collection .............................................................................................. 104 Deep Learning datasets ...................................................................................................................................... 
105 Open Government Data (OGD) Platform India ......................................................................................... 105 Yahoo Datasets ....................................................................................................................................................... 105 Windows Azure Marketplace .......................................................................................................................... 106 Amazon Public Data Sets ................................................................................................................................... 106 Wikipedia: Database Download ..................................................................................................................... 106 Gutenberg project (Free books available in different format, useful for NLP) .......................... 106 Freebase .................................................................................................................................................................... 106 Datamob Data ......................................................................................................................................................... 106 Reddit Datasets ...................................................................................................................................................... 106 100+ Interesting Data Sets for Statistics .................................................................................................... 107 Data portal of the City of Chicago .................................................................................................................. 107 Data portal of the City of Seattle ..................................................................................................................... 
107 Data portal of the City of LA ............................................................................................................................. 107 California Department of Water Resources .............................................................................................. 107 Data portal of the City of Dallas ...................................................................................................................... 108 Data portal of the City of Austin ..................................................................................................................... 108 How to produce and use datasets: lessons learned, mlwave ............................................................. 108 MITx and HarvardX release MOOC datasets and visualization tools ............................................. 108 Finding the perfect house using open data, Justin Palmer’s Blog .................................................... 108 Synapse ..................................................................................................................................................................... 108 NYC Taxi Trips Date from 2013 ...................................................................................................................... 108 Sebastian Raschka’s Dataset Collections .................................................................................................... 109 Awesome Public Datasets by Xiaming Chen, Shanghai, China .......................................................... 109 UK Dataset ................................................................................................................................................................ 109 LONDON DATASTORE -‐ 591 datasets ......................................................................................................... 
109 Transport For London Open Data, UK ......................................................................................................... 109 Gaussian Processes List of Datasets ............................................................................................................. 109 The New York Times Linked Open Data (Beta) ....................................................................................... 110 Google Public Data Explorer ............................................................................................................................ 110 Open Dataset -‐ French ............................................................................................. 111 Montreal, Portail Donnees Ouvertes (French&English), Canada ..................................................... 111 Insee, France ........................................................................................................................................................... 111 RATP Open Data, French Tube in Paris, France ....................................................................................... 111 L’Open-‐Data français cartographié ............................................................................................................... 111 Open Dataset -‐ China ............................................................................................... 111 Lamda Group .......................................................................................................................................................... 111 Data Visualisation .................................................................................................... 112 Visualization Lab Gallery, Computer Science Division, University of California, Berkeley .. 112 machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! Machinelearningsalon.org kit is regularly updated! 
9 Visualization Lab Software, Computer Science Division, University of California, Berkeley ....................................................................................................................................................................................... 113 Visualization Lab Course Wiki, Computer Science Division, University of California, Berkeley ....................................................................................................................................................................................... 113 Mike Bostock ........................................................................................................................................................... 113 Eyeo Festival ........................................................................................................................................................... 113 MIT Data Collider .................................................................................................................................................. 114 D3 JS Data-‐Driven Documents ......................................................................................................................... 114 Shan He, Research Fellow at MIT Senseable City Lab ........................................................................... 114 Gource software version control visualization ........................................................................................ 114 Logstalgia, website access log visualization .............................................................................................. 114 Andrew Caudwell's Blog .................................................................................................................................... 114 MLDemos , EPFL, Switzerland ......................................................................................................................... 
115 The University of Florida Sparse Matrix Collection ............................................................................... 115 Visualization & Graphics lab, Dept. of CSA and SERC, Indian Institute of Science, Bangalore ....................................................................................................................................................................................... 115 Allison McCann ...................................................................................................................................................... 115 Scott Murray ............................................................................................................................................................ 116 The Best New York City Maps of 2014 ........................................................................................................ 116 Gephi: The Open Graph Viz Platform ........................................................................................................... 116 Data Analysis and Visualization Using R by David Robinson ............................................................ 117 Books – English ........................................................................................................ 117 An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014 ............................................................................................................................................................................ 117 Deep Learning (Artificial Intelligence) , An MIT Press book in preparation, by Yoshua Bengio, Ian Goodfellow and Aaron Courville, 20-‐Oct-‐2014 ............................................................... 118 Deep Learning Tutorial by LISA Lab, University of Montreal, 2014 ............................................... 
Statistical Inference for Everyone, by Professor Bryan Blais, 2014
Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014
Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014
Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft
Slides for High Performance Python tutorial at EuroSciPy2014 by Ian Ozsvald
Neural Networks and Deep Learning, 2014
Probabilistic Programming and Bayesian Methods for Hackers by Cameron Davidson-Pilon, 2014
Bayesian Reasoning and Machine Learning, David Barber, 2012 (online version 02-2014)
Past, Present, and Future of Statistical Science by COPSS, 2014
Interactive Data Visualization for the Web by Scott Murray, 2013
Essentials of Metaheuristics by Sean Luke, 2013
Statistical Model Building, Machine Learning, and the Ah-Ha Moment by Grace Wahba, 2013
An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2013 (first printing)
Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves, 2012
A Course in Machine Learning by Hal Daume, 2012
Machine Learning in Action, Peter Harrington, 2012
A Programmer's Guide to Data Mining, by Ron Zacharski, 2012
Artificial Intelligence, Foundations of Computational Agents by David Poole and Alan Mackworth, 2010
The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009
Learning Deep Architectures for AI by Yoshua Bengio, 2009
An Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2009
Kernel Methods in Machine Learning by Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, 2008
Introduction to Machine Learning, Alex Smola, S.V.N. Vishwanathan, 2008
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006
Gaussian Processes for Machine Learning, C. Rasmussen and C. Williams, 2006
Bayesian Machine Learning by Chakraborty, Sounak, 2005
Machine Learning by Tom Mitchell, 2005
Information Theory, Inference, and Learning Algorithms, David MacKay, 2003
Free Book List
Free resource book (need to sign in)
Free ML ebooks on it-ebooks (this website is controversial; please read Stack Overflow before accessing it yourself)
Wikipedia: Machine Learning, the Complete Guide
ISSUU
Neural Networks, A Systematic Introduction by Raul Rojas
Books - Spanish
Books - German
Books - Italian
Books - French
Books – Russian
Pattern Recognition by А.Б.Мерков, 2011
Algorithmic models of learning classification: rationale, comparison, selection, 2014
Books - Japanese
Books - Chinese
Blog recommending useful books
Textbook for Statistics
Introduction to Pattern recognition
Translated version of Machine Learning by Tom Mitchell
Books - Portuguese
Presentation, Infographics and Documents - English
Meetup's Presentations
Slides
Slideshare.com
Slides.com
Powershow.com
Speaker Deck
Slides from Lectures
Slides from Meetups
Slides from Conferences
Conferences
International Conference on Machine Learning (ICML)
ICML, Beijing, China 2014
ICML, Atlanta, US 2013
ICML, Edinburgh, UK 2012
ICML, Bellevue, US 2011
ICML, Haifa, Israel 2010
Full archive of ICML
Machine Learning Conference Videos
Annual Machine Learning Symposium
6th
8th
Archive
MLSS Machine Learning Summer Schools
Data Gotham 2012, 2013
Meetup - English
631 Machine Learning Meetups in the World
Data Science Weekly – List of Meetups
Other Meetups missing in Data Science Weekly
London Machine Learning Meetup
London Deep Learning Meetup
Blog – English
Data Science Weekly
Yann LeCun, Google+
Igor Carron Blog
KDD Community, Knowledge discovery and Data Mining
Kaggle Blog
Digg
Feedly
Mlwave
FastML
Beating the Benchmark
YOU CANalytics
Trevor Stephens Blog
Mozilla Hacks
Banach's Algorithmic Corner, University of Warsaw
DataCamp Blog
Natural Language Processing Blog, Hal Daume
Maxim Milakov Blog
Alfonso Nieto-Castanon Blog
Persontyle Blog
Analytics Vidhya
Bugra Akyildiz's Blog
Data origami
Rasbt’s Blog
Gilles Louppe's Blog
AI Topics
AI International
Joseph Misiti's Blog
MIRI, Machine Intelligence Research Institute
Kevin Davenport Data Blog
Alexandre Passant's Blog
Daniel Nouri’s Blog
Yvonne Rogers Blog
Igor Subbotin's Blog (Both in English & Russian) (Huge list of resources)
Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!)
Popular Science Website
HOW MICROSOFT'S MACHINE LEARNING IS BREAKING THE GLOBAL LANGUAGE BARRIER
Max Woolf's Blog
Rasmus Bååth's Research Blog
Flowing Data's Blog
Genetic algorithm walkers
Miscellaneous
Allen Institute for Artificial Intelligence (AI2)
Artificial General Intelligence (AGI) Society
AUAI, Association for Uncertainty in Artificial Intelligence
Blog - Spanish
Blog - Italian
Blog - German
Blog - French
L'ATELIER's News
Blog - Russian
Igor Subbotin's Blog (Both in English & Russian) (Huge list of resources)
Blog - Japanese
Blog - Chinese
Blog - Portuguese
Journals - English
Journal of Machine Learning Research, MIT Press
Machine Learning Journal (last article could be downloaded for free)
Machine Learning (Theory)
List of Journals on Microsoft Academic Research website
Wired magazine
Data Science Central
Journals – Spanish
Journals – German
Journals – Italian
Journals – French
Journals – Russian
Journals – Japanese
Journals – Chinese
Journals - Portuguese
Forum, Q&A - English
Data Tau
Hacker News
Metaoptimize
Kaggle Forums
Reddit in English
Cross validated Stack Exchange
Open data Stack Exchange
Data Science Beta Stack Exchange
Quora
Machine Learning Impact Forum
Forum, Q&A - Spanish
Forum, Q&A - German
Forum, Q&A - Italian
Forum, Q&A - French
Forum, Q&A - Russian
Reddit in Russian
Habrahabr.ru Forum (in Russian translated by Google Chrome)
Playing with genetic algorithms
PythonDigest - 2014, the results of our work in figures and references
Forum, Q&A – Portuguese
Forum, Q&A – Chinese
Zhihu.com
Machine Learning
Data Mining
Artificial Intelligence
Guokr.com
Machine Learning
Data Mining
Artificial Intelligence
Governmental Reports - English
Big Data report, Whitehouse, US
Fun - English
Founder of PhD Comics
MACHINE LEARNING RESEARCH GROUPS: DRAFT, A LOT MORE TO COME SOON
MACHINE LEARNING RESEARCH GROUPS in AMERICA, USA
MIT
Stanford University
Carnegie Mellon University
Harvard University
University of California, Berkeley
Princeton University
University of California, Los Angeles (UCLA)
Cornell University
University of Illinois at Urbana-Champaign
California Institute of Technology, Caltech
University of Washington
Social Robotics Lab - Yale University
Georgia Institute of Technology
University of Texas at Austin
University of Pennsylvania
Columbia University
New York City University
University of Chicago
The Johns Hopkins Center for Language and Speech Processing (CLSP) Archive Videos
Miscellaneous
IARPA Organization
MACHINE LEARNING RESEARCH GROUPS in AMERICA, CANADA
University of Toronto
University of Waterloo
University of British Columbia
University of Montreal
University of Sherbrooke
164 University of Laval ............................................................................................................................................... 165 MACHINE LEARNING RESEARCH GROUPS in AMERICA, BRAZIL ................................................... 166 USP -‐ UNIVERSIDADE DE SÃO PAULO, Instituto de Ciências Matemáticas e de Computação ...................................................................................................................................................................................... 166 MACHINE LEARNING RESEARCH GROUPS in EUROPE, UK ............................................................... 167 University College London ................................................................................................................................ 167 Oxford University .................................................................................................................................................. 168 Imperial College .................................................................................................................................................... 168 The University of Edinburgh, Institute for Adaptive and Neural Computation ........................ 169 Cambridge University ......................................................................................................................................... 169 Queen Mary University of London ................................................................................................................. 170 ICRI, The Intel Collaborative Research Institute .................................................................................... 170 MACHINE LEARNING RESEARCH GROUPS in EUROPE, FRANCE .................................................... 171 Magnet, MAchine learninG in information NETworks, INRIA, France .......................................... 
171 Sierra Team -‐ Ecole Normale Superieure , CNRS, INRIA ..................................................................... 171 ENS Ecole Normale Superieure ...................................................................................................................... 171 Laboratoire Hubert Curien UMR CNRS 5516, Machine Learning ................................................... 172 MACHINE LEARNING RESEARCH GROUPS in EUROPE, GERMANY ............................................... 173 Max Planck Institute for Intelligent Systems, Tübingen site ............................................................. 173 BRML Research Lab, Institute of Informatics at the Technische Universität München ....... 173 HCI, Heidelberg Collaboratory for Image Processing, Universität Heidelberg ......................... 173 machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! Machinelearningsalon.org kit is regularly updated! 15 MACHINE LEARNING RESEARCH GROUPS in EUROPE, SWITZERLAND ..................................... 174 EPFL Ecole Polytechnique Federale de Lausanne, Switzerland ....................................................... 174 IDSIA: the Swiss AI Lab ...................................................................................................................................... 174 MACHINE LEARNING RESEARCH GROUPS in EUROPE, NETHERLANDS .................................... 175 Machine Learning Research Groups in The Netherlands .................................................................... 175 MACHINE LEARNING RESEARCH GROUPS in EUROPE, POLAND ................................................... 175 University of Warsaw, Dept. of Mathematics, Informatics and Mechanics ................................. 175 MACHINE LEARNING RESEARCH GROUPS in ASIA, INDIA ................................................................ 175 RESEARCH LABS, Department of Computer Science and Automation, IISc, Bangalore ....... 
175 MLSIG: Machine Learning Special Interest Group, Indian Institute of Science ......................... 175 Indian Institute of Technology of Kanpur ................................................................................................. 176 MACHINE LEARNING RESEARCH GROUPS in ASIA, CHINA ............................................................... 176 Peking University .................................................................................................................................................. 176 Beijing University of Technology ................................................................................................................... 177 University of Science and Technology of China, USTC .......................................................................... 177 Nanjing University ............................................................................................................................................... 177 MACHINE LEARNING RESEARCH GROUPS in ASIA, RUSSIA ............................................................. 178 Moscow State University ................................................................................................................................... 178 MACHINE LEARNING RESEARCH GROUPS in AFRICA ......................................................................... 178 MACHINE LEARNING RESEARCH GROUPS in OCEANIA ..................................................................... 178 NICTA Machine Learning Research Group, Australia .......................................................................... 178 Academics (with free access to their publications): DRAFT, A LOT MORE TO COME SOON ....................................................................................................................... 180 Academics (with free access to their publications), US ............................................. 
180 Stanford University .............................................................................................................................................. 180 Andrew Ng ............................................................................................................................................................... 180 Carnegie Mellon University .............................................................................................................................. 180 Tom Mitchell ........................................................................................................................................................... 180 Robert Kass ............................................................................................................................................................. 180 Alexander J. Smola ............................................................................................................................................... 180 Princeton University, US .................................................................................................................................... 181 Robert Schapire ..................................................................................................................................................... 181 Mona Singh ............................................................................................................................................................. 181 Olga Troyanskaya ................................................................................................................................................ 182 UCLA, US ................................................................................................................................................................... 
182 Judea Pearl, Cognitive System Laboratory ................................................................................................ 182 Rice University, US ............................................................................................................................................... 183 Justin Esarey Lectures, Assistant Professor of Political Science ....................................................... 183 University of Maryland, US ............................................................................................................................... 183 Hal Daume III ......................................................................................................................................................... 183 Portland State University .................................................................................................................................. 183 Melanie Mitchell .................................................................................................................................................... 183 Academics (with free access to their publications), FRANCE ..................................... 184 Ecole Normale Superieure, FRANCE ............................................................................................................ 184 Francis Bach ........................................................................................................................................................... 184 INRIA .......................................................................................................................................................................... 184 Gaël Varoquaux ..................................................................................................................................................... 184 machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! 
Machinelearningsalon.org kit is regularly updated! 16 Academics (with free access to their publications), UK ............................................. 185 University College London, UK ....................................................................................................................... 185 John Shaw-‐Taylor ................................................................................................................................................. 185 Mark Herbster ........................................................................................................................................................ 185 David Barber .......................................................................................................................................................... 186 Gabriel Brostow .................................................................................................................................................... 186 Jun Wang .................................................................................................................................................................. 186 David Jones Lab ..................................................................................................................................................... 187 Simon Prince ........................................................................................................................................................... 187 Massimiliano Pontil ............................................................................................................................................. 187 Cambridge University, UK ................................................................................................................................. 
188 Richard E Turner .................................................................................................................................................. 188 Oxford University, UK ......................................................................................................................................... 188 Phil Blunsom ........................................................................................................................................................... 188 Nando de Freitas ................................................................................................................................................... 188 Karl Hermann ........................................................................................................................................................ 189 Edward Grefenstette ........................................................................................................................................... 189 Delft University of Technology, NETHERLANDS ..................................................................................... 189 Thomas Geijtenbeek Publications & Videos .............................................................................................. 189 Academics (with free access to their publications), CANADA .................................... 189 University of Montreal, CANADA ................................................................................................................... 189 Yoshua Bengio ....................................................................................................................................................... 189 KyungHyun Cho ..................................................................................................................................................... 
190 University of Toronto, CANADA ..................................................................................................................... 190 Geoffrey Hinton ..................................................................................................................................................... 190 Alex Graves .............................................................................................................................................................. 190 Universite de Sherbrooke, CANADA ............................................................................................................. 191 Hugo Larochelle .................................................................................................................................................... 191 University of British Columbia, CANADA ................................................................................................... 191 Giuseppe Carenini ................................................................................................................................................. 191 Cristina Conati ....................................................................................................................................................... 191 Kevin Leyton-‐Brown ............................................................................................................................................ 191 Holger Hoos ............................................................................................................................................................ 191 Jim Little ................................................................................................................................................................... 192 David Lowe .............................................................................................................................................................. 
192 Karon MacLean ..................................................................................................................................................... 192 Alan Mackworth .................................................................................................................................................... 192 Dinesh K. Pai ........................................................................................................................................................... 192 David Poole ............................................................................................................................................................. 192 University of Waterloo ....................................................................................................................................... 192 Prof. Shai Ben-‐David .......................................................................................................................................... 192 Academics (with free access to their publications), GERMANY ................................. 192 University of Freiburg ......................................................................................................................................... 192 Machine Learning Lab ....................................................................................................................................... 192 Academics (with free access to their publications), CHINA ....................................... 193 USPC, CHINA ........................................................................................................................................................... 193 En-‐Hong Chen ........................................................................................................................................................ 193 machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! 
Machinelearningsalon.org kit is regularly updated! 17 Linli Xu ...................................................................................................................................................................... 193 University of Beijing, CHINA ............................................................................................................................ 193 Yuan Yao, School of Mathematical Sciences ............................................................................................. 193 Academics (with free access to their publications), RUSSIA ...................................... 194 Moscow State University, RUSSIA ................................................................................................................. 194 Dmitry Efimov ........................................................................................................................................................ 194 Academics (with free access to their publications), POLAND .................................... 194 University of Warsaw, POLAND ..................................................................................................................... 194 Marcin Murca ......................................................................................................................................................... 194 Academics (with free access to their publications), SWITZERLAND ........................... 195 Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be missed!) ........................ 195 Free access to a list of Machine Learning MSc/PhD Dissertations ............................. 195 Machine Learning Department, Carnegie Mellon University ............................................................ 195 Machine Learning Department, Columbia University .......................................................................... 
PhD Dissertations, University of Edinburgh, UK
MSc Dissertations, University of Oxford, UK
Machine Learning Group, Department of Engineering, University of Cambridge, UK
Barton: MIT Libraries' Catalog
New York University Computer Science PhD Theses
Digital Collection of The Australian National University (PhD Thesis)

What is the Machine Learning Salon’s Kit?

What is the Machine Learning Salon’s Kit?
The Machine Learning Salon’s Kit is currently a collection of useful websites gathered from blogs and forums such as Reddit.com and DataTau.com, groups on LinkedIn, posts on Twitter, publications on Google Scholar and Machine Learning research group websites. It contains more than 150 pages of descriptions and links to useful websites.
The Machine Learning Salon's Kit is not a commercial product (I am not making any money from it). All its content is free and no registration is required to download it. I am also committed to keeping this website simple and free from any advertising.
If you want to remove a link, please tell me why and I will take care of it as soon as possible. If you want to add a better description of your website, please send me the new version and I will make the change.

Future Development
A Free Win-Win opportunity for companies and developers
When the Machine Learning Salon moves from beta test to live, it will give CTOs the opportunity to post and publish the Machine Learning challenges that they are facing, so that the Machine Learning Salon community can learn more about the challenges that today's companies are facing. Companies that do so will also be able to post recruitment offers for free.

About the Founder of The Machine Learning Salon’s Website & Kit
My name is Jacqueline Isabelle Forien and I am from Tours, France, a small city located in the middle of the Loire Valley. I am married and have four children. After an Engineer's degree in Computer Science at the UTC Engineering school and a few years of work experience in that field, I decided to become a Mathematics teacher. I am still teaching, but in the meantime I became passionate about Artificial Intelligence and, more specifically, Machine Learning.
In 2013, I decided to start studying again at the age of 53 and soon graduated from University College London with an M.Sc. in Machine Learning. Soon after, I decided to create the Machine Learning Salon in my spare time so that I could stay up to date with the changes that happen regularly in that field.
I would like to express special gratitude to my director of Machine Learning studies at UCL, Professor Mark Herbster, my tutor, Professor David Barber, my supervisor of Master's project, Professor Nadia Berthouze, as well as all my peers during this amazing year.
In addition, I would like to express many thanks to Igor Carron, who initiated the smart association of 'Machine Learning' and 'Salon', and gave me the opportunity to organise in London a wonderful event: the Europe Wide Machine Learning Meetup between Paris, Berlin, Zurich and London, with Andrew Ng as a guest speaker.
I hope that this website will help many people learn and get more involved in this fascinating field that is Machine Learning!
Jacqueline

Contact
Please contact me if you want to add a contribution, remove a link, etc. Any suggestion is welcome!
Contact at contact@machinelearningsalon.org

MOOC or Opencourseware – English (checked 2-Jan-2015)

Coursera

Machine Learning Stanford Course
This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you'll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
https://www.coursera.org/course/ml

Practical Machine Learning
One of the most common tasks performed by data scientists and data analysts is prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications.
The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.
https://www.coursera.org/course/predmachlearn

Machine Learning Washington Course
Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective when manual programming is not. Machine learning (also known as data mining, pattern recognition and predictive analytics) is used widely in business, industry, science and government, and there is a great shortage of experts in it. If you pick up a machine learning textbook you may find it forbiddingly mathematical, but in this class you will learn that the key ideas and algorithms are in fact quite intuitive. And powerful!
Most of the class will be devoted to supervised learning (in other words, learning in which a teacher provides the learner with the correct answers at training time). This is the most mature and widely used type of machine learning. We will cover the main supervised learning techniques, including decision trees, rules, instances, Bayesian techniques, neural networks, model ensembles, and support vector machines. We will also touch on learning theory with an emphasis on its practical uses. Finally, we will cover the two main classes of unsupervised learning methods: clustering and dimensionality reduction.
Throughout the class there will be an emphasis not just on individual algorithms but on ideas that cut across them and tips for making them work.
https://www.coursera.org/course/machlearning

Core Concepts in Data Analysis (Higher School of Economics)
Learn both theory and application for basic methods that have been invented either for developing new concepts (principal components or clusters) or for finding interesting correlations (regression and classification). This is preceded by a thorough analysis of 1D and 2D data.
This is an unconventional course in modern Data Analysis, Machine Learning and Data Mining. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them. According to this view, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. The term summarization embraces here both simple summaries like totals and means and more complex summaries: the principal components of a set of features and cluster structures in a set of entities. Similarly, correlation covers both bivariate and multivariate relations between input and target features including Bayes classifiers.
https://www.coursera.org/course/datan

Neural Networks for Machine Learning
Neural Networks use learning algorithms that are inspired by our understanding of how the brain learns, but they are evaluated by how well they work for practical applications such as speech recognition, object recognition, image retrieval and the ability to recommend products that a user will like. As computers become more powerful, Neural Networks are gradually taking over from simpler Machine Learning methods. They are already at the heart of a new generation of speech recognition devices and they are beginning to outperform earlier systems for recognizing objects in images.
The course will explain the new learning procedures that are responsible for these advances, including effective new procedures for learning multiple layers of non-linear features, and give you the skills and understanding required to apply these procedures in many other domains.
https://www.coursera.org/course/neuralnets

Natural Language Processing
Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
https://www.coursera.org/course/nlangp

Probabilistic Graphical Models
Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty what will happen in the future, and even in the present and the past, many important aspects of the world are not observed with certainty. Probability theory gives us the basic foundation to model our beliefs about the different possible states of the world, and to update these beliefs as new evidence is obtained.
These beliefs can be combined with individual preferences to help guide our actions, and even to select which observations to make. While probability theory has existed since the 17th century, our ability to use it effectively on large problems involving many inter-related variables is fairly recent, and is due largely to the development of a framework known as Probabilistic Graphical Models (PGMs). This framework, which spans methods such as Bayesian networks and Markov random fields, uses ideas from discrete data structures in computer science to efficiently encode and manipulate probability distributions over high-dimensional spaces, often involving hundreds or even many thousands of variables. These methods have been used in an enormous range of application domains, which include: web search, medical and fault diagnosis, image understanding, reconstruction of biological networks, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, robot navigation, and many more. The PGM framework provides an essential tool for anyone who wants to learn how to reason coherently from limited and noisy observations.
https://www.coursera.org/course/pgm

Stanford Engineering Everywhere
SEE programming includes one of Stanford’s most popular engineering sequences: the three-course Introduction to Computer Science taken by the majority of Stanford undergraduates, and seven more advanced courses in artificial intelligence and electrical engineering.
Introduction to Computer Science
  Programming Methodology CS106A
  Programming Abstractions CS106B
  Programming Paradigms CS107
Artificial Intelligence
  Introduction to Robotics CS223A
  Natural Language Processing CS224N
  Machine Learning CS229
Linear Systems and Optimization
  The Fourier Transform and its Applications EE261
  Introduction to Linear Dynamical Systems EE263
  Convex Optimization I EE364A
  Convex Optimization II EE364B
Additional School of Engineering Courses
  Programming Massively Parallel Processors CS193G
  iPhone Application Programming CS193P
  Seminars and Webinars
http://see.stanford.edu/see/courses.aspx

EdX

Learning from data (Caltech)
This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications. ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applications. It enables computational systems to automatically learn how to perform a desired task based on information extracted from the data. ML has become one of the hottest fields of study today, taken up by undergraduate and graduate students from 15 different majors at Caltech. This course balances theory and practice, and covers the mathematical as well as the heuristic aspects.
https://www.edx.org/course/caltechx/caltechx-cs1156x-learning-data-1120#.U5NNJxaRPwI
https://www.edx.org/course/caltechx/caltechx-cs1156x-learning-data-1120#.U4oB75SSyG4

Artificial Intelligence (BerkeleyX)
CS188.1x is a new online adaptation of the first half of UC Berkeley's CS188: Introduction to Artificial Intelligence. The on-campus version of this upper division computer science course draws about 600 Berkeley students each year.
Artificial intelligence is already all around you, from web search to video games. AI methods plan your driving directions, filter your spam, and focus your cameras on faces. AI lets you guide your phone with your voice and read foreign newspapers in English. Beyond today's applications, AI is at the core of many new technologies that will shape our future.
From self-driving cars to household robots, advancements in AI help transform science fiction into real systems.
CS188.1x focuses on Behavior from Computation. It will introduce the basic ideas and techniques underlying the design of intelligent computer systems. A specific emphasis will be on the statistical and decision-theoretic modeling paradigm. By the end of this course, you will have built autonomous agents that efficiently make decisions in stochastic and in adversarial settings.
CS188.2x (to follow CS188.1x, precise date to be determined) will cover Reasoning and Learning. With this additional machinery your agents will be able to draw inferences in uncertain environments and optimize actions for arbitrary reward structures. Your machine learning algorithms will classify handwritten digits and photographs. The techniques you learn in CS188x apply to a wide variety of artificial intelligence problems and will serve as the foundation for further study in any application area you choose to pursue.
https://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-cs188-1x-artificial-579#.U4CqKl6RPwI

Big Data and Social Physics (Ethics)
Social physics is a big data science that models how networks of people behave and uses these network models to create actionable intelligence. It is a quantitative science that can accurately predict patterns of human behavior and guide how to influence those patterns to (for instance) increase decision making accuracy or productivity within an organization. Included in this course is a survey of methods for increasing communication quality within an organization, approaches to providing greater protection for personal privacy, and general strategies for increasing resistance to cyber attack.
https://www.edx.org/course/mitx/mitx-mas-s69x-big-data-social-physics-1737#.U4Cox5RdWG4

Introduction to Computational Thinking and Data Science
6.00.2x is aimed at students with some prior programming experience in Python and a rudimentary knowledge of computational complexity. We have chosen to focus on breadth rather than depth. The goal is to provide students with a brief introduction to many topics, so that they will have an idea of what’s possible when the time comes later in their career to think about how to use computation to accomplish some goal. That said, it is not a “computation appreciation” course. Students will spend a considerable amount of time writing programs to implement the concepts covered in the course. Topics covered include plotting, stochastic programs, probability and statistics, random walks, Monte Carlo simulations, modeling data, optimization problems, and clustering.
https://www.edx.org/course/mitx/mitx-6-00-2x-introduction-computational-2836

MIT OpenCourseWare (OCW)
OCW makes the materials used in the teaching of MIT's subjects available on the Web.
http://ocw.mit.edu/index.htm
https://www.youtube.com/user/MIT

VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos
Added the 22-Nov-2014
Discovery of Disruptive Innovations & Actionable Ideas.
VLAB is the San Francisco Bay Area chapter of the MIT Enterprise Forum, a non-profit organization dedicated to promoting the growth and success of high-tech entrepreneurial ventures by connecting ideas, technology and people. We provide a forum for San Francisco and Silicon Valley's leading entrepreneurs, industry experts, venture capitalists, private investors and technologists to exchange insights about how to effectively grow high-tech ventures amidst dynamic market risks and challenges.
In a world where markets change at breakneck speed, knowledge is a critical source of competitive advantage. Our forums provide an excellent opportunity to network and learn about pivotal business issues, emerging industries and the latest technologies.
http://www.youtube.com/user/vlabvideos/search?query=machine+learning

Foundations of Machine Learning by Mehryar Mohri - 10 years of Homeworks with Solutions and Lecture Slides, not to be missed!
Added the 11-Nov-2014
Course Description
This course introduces the fundamental concepts and methods of machine learning, including the description and analysis of several modern algorithms, their theoretical basis, and the illustration of their applications. Many of the algorithms described have been successfully used in text and speech processing, bioinformatics, and other areas in real-world products and services.
The main topics covered are:
• Probability tools, concentration inequalities
• PAC model
• Rademacher complexity, growth function, VC-dimension
• Perceptron, Winnow
• Support vector machines (SVMs)
• Kernel methods
• Decision trees
• Boosting
• Density estimation, maximum entropy models
• Logistic regression
• Regression problems and algorithms
• Ranking problems and algorithms
• Halving algorithm, weighted majority algorithm, mistake bounds
• Learning automata and transducers
• Reinforcement learning, Markov decision processes (MDPs)
http://www.cs.nyu.edu/~mohri/ml14/

IPAM, Institute for Pure and Applied Mathematics, Videos, UCLA
IPAM records many of its lectures and makes them available to the public so that a wider audience may benefit from the scientific programs we offer. Since July 2012, IPAM has begun to record most of its lectures. You can access the lectures for a particular program or workshop (such as Materials Defects Tutorials) by following the program link listed below to the relevant workshop schedule. Each speaker is listed along with available slide shows and videos.
For public lectures, the link will take you directly to the video. The programs and public lectures are listed in reverse chronological order. Older videos play on Real Player only; recent videos will play on Flash supported browsers and software.
https://www.ipam.ucla.edu/videos.aspx

Carnegie Mellon University

Carnegie Mellon University (CMU) Video resources
"The videos below are intended to serve as resources for our current students, and not as online learning materials for students outside of our program." - The Machine Learning Department
http://www.ml.cmu.edu/teaching/video-resources.html

Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani, CMU
Overview and objectives
Nearly every problem in machine learning and statistics can be formulated in terms of the optimization of some function, possibly under some set of constraints. As we obviously cannot solve every problem in machine learning or statistics, this means that we cannot generically solve every optimization problem (at least not efficiently). Fortunately, many problems of interest in statistics and machine learning can be posed as optimization tasks that have special properties (such as convexity, smoothness, separability, sparsity, etc.) permitting standardized, efficient solution techniques. This course is designed to give a graduate-level student a thorough grounding in these properties and their role in optimization, and a broad comprehension of algorithms tailored to exploit such properties. The main focus will be on convex optimization problems, though we will also discuss nonconvex problems at the end. We will visit and revisit important applications in statistics and machine learning.
Upon completing the course, students should be able to approach an optimization problem (often derived from a statistics or machine learning context) and:
(1) identify key properties such as convexity, smoothness, sparsity, etc., and/or possibly reformulate the problem so that it possesses such desirable properties;
(2) select an algorithm for this optimization problem, with an understanding of the advantages and disadvantages of applying one method over another, given the problem and properties at hand;
(3) implement this algorithm or use existing software to efficiently compute the solution.
http://www.stat.cmu.edu/~ryantibs/convexopt/#videos

Machine Learning, Spring 2011, by Tom Mitchell, CMU
Machine Learning is concerned with computer programs that automatically improve their performance through experience (e.g., programs that learn to recognize human faces, recommend music and movies, and drive autonomous robots). This course covers the theory and practical algorithms for machine learning from a variety of perspectives. We cover topics such as Bayesian networks, decision tree learning, Support Vector Machines, statistical learning methods, unsupervised learning and reinforcement learning. The course covers theoretical concepts such as inductive bias, the PAC learning framework, Bayesian learning methods, margin-based learning, and Occam's Razor. Short programming assignments include hands-on experiments with various learning algorithms, and a larger course project gives students a chance to dig into an area of their choice. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in machine learning.
http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml
Homework with solutions
http://www.cs.cmu.edu/~tom/10701_sp11/hws.shtml

Metacademy Concept list and roadmap list
Metacademy is a community-driven, open-source platform for experts to collaboratively construct a web of knowledge. Right now, Metacademy focuses on machine learning and probabilistic AI, because that's what the current contributors are experts in. But eventually, Metacademy will cover a much wider breadth of knowledge, e.g. mathematics, engineering, music, medicine, computer science…
http://www.metacademy.org/list
http://www.metacademy.org/roadmaps/

Harvard University

Advanced Machine Learning, Fall 2013 (Free access to most of the videos)
This course is about learning to extract statistical structure from data, for making decisions and predictions, as well as for visualization. The course will cover many of the most important mathematical and computational tools for probabilistic modeling, as well as examine specific models from the literature and examine how they can be used for particular types of data. There will be a heavy emphasis on implementation. You may use Matlab, Python or R. Each of the five assignments will involve some amount of coding, and the final project will almost certainly require the running of computer experiments.
https://www.seas.harvard.edu/courses/cs281/

Data Science Course, Fall 2013
Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. We will be using Python for all programming assignments and projects.
http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml

Oxford University, Nando de Freitas video lectures
I am a machine learning professor at UBC. I am making my lectures available to the world with the hope that this will give more folks out there the opportunity to learn some of the wonderful things I have been fortunate to learn myself. Enjoy.
http://www.youtube.com/user/ProfNandoDF

Yee Whye Teh Home Page, Department of Statistics, University College, University of Oxford
Research Interests
I am interested in machine learning, Bayesian statistics and computational statistics. My current focus is on developing Bayesian nonparametric methodologies, with applications to large and complex problems in unsupervised learning, computational linguistics, and genetics.
Teaching: Statistical Machine Learning and Data Mining (MS1b HT2014)
Slides and Problem Sheets with Solutions (not to be missed!)
http://www.stats.ox.ac.uk/~teh/smldm.html
About Bayesian Nonparametrics (MLSS 2013)
https://www.youtube.com/embed/dNeW5zoNJ7g?vq=hd1080&autoplay=1
https://www.youtube.com/embed/7sy_MCbqtco?vq=hd1080&autoplay=1
https://www.youtube.com/embed/kqEWDdTB_3Q?vq=hd1080&autoplay=1
https://www.youtube.com/watch?v=FO0fgVS9OmE&spfreload=10
Slides
http://mlss.tuebingen.mpg.de/2013/slides_teh.pdf

Cambridge University Machine Learning Slides, Spring 2014
LECTURE SYLLABUS
This year, the exposition of the material will be centered around three specific machine learning areas: 1) supervised non-parametric probabilistic inference using Gaussian processes, 2) the TrueSkill ranking system and 3) the latent Dirichlet Allocation model for unsupervised learning in text.
http://mlg.eng.cam.ac.uk/teaching/4f13/1314/

Caltech University, Learning from Data
• Free, introductory Machine Learning online course (MOOC)
• Taught by Caltech Professor Yaser Abu-Mostafa [article]
• Lectures recorded from a live broadcast, including Q&A
• Prerequisites: Basic probability, matrices, and calculus
• 8 homework sets and a final exam
• Discussion forum for participants
• Topic-by-topic video library for easy review
http://work.caltech.edu/telecourse.html
http://work.caltech.edu/library/

University College London Discovery
UCL Discovery showcases UCL's research publications, giving access to journal articles, book chapters, conference proceedings, digital web resources, theses and much more, from all UCL disciplines. Where copyright permissions allow, a full copy of each research publication is directly available from UCL Discovery. You can search or browse UCL Discovery, see the most-downloaded publications, and keep up to date with the latest UCL research by RSS or even on Twitter. UCL Discovery supports UCL's Publications Policy.
http://discovery.ucl.ac.uk/cgi/search/simple?q=machine+learning&_order=bytitle&basic_srchtype=ALL&_satisfyall=ALL&_action_search=Search
http://discovery.ucl.ac.uk
http://www.youtube.com/watch?v=Euaoblv_nL8

University College London, Supervised Learning
The course covers supervised approaches to machine learning. It starts by probabilistic pattern recognition followed by an in-depth introduction to various supervised learning algorithms such as Least Squares, Lasso, Perceptron Algorithm, Support Vector Machines and Boosting.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/GI01/

Yann LeCun’s Publications
My main research interests are Machine Learning, Computer Vision, Mobile Robotics, and Computational Neuroscience.
I am also interested in Data Compression, Digital Libraries, the Physics of Computation, and all the applications of machine learning (Vision, Speech, Language, Document understanding, Data Mining, Bioinformatics).
http://yann.lecun.com/exdb/publis/index.html#fulllist

Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French)
• Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
• Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
• Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
• Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
• Spring 2012: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
• Fall 2011: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• Spring 2011: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
• Fall 2010: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• Spring 2010: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
• Fall 2009: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• Fall 2008: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• May 2008: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
• Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• May 2007: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
• Fall 2006: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
• Fall 2005: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
http://www.di.ens.fr/~fbach/
http://videolectures.net/francis_r_bach/

Technion, Israel Institute of Technology, Machine Learning Videos
Added the 22-Nov-2014
Technion - Israel Institute of Technology is Israel's biggest scientific-technological university and one of the largest centers of applied research in the world. Here the future is being shaped - by over 13,000 of Israel's most dynamic students active in 18 faculties. Technion is Israel's flagship of world-class education, bringing Israel its first Nobel Prizes in science.
From the cornerstone laying ceremony in 1912, Technion's over 70,000 alumni have built the state of Israel and created and lead the majority of Israel's successful companies, impacting millions of scientists, students, entrepreneurs and citizens worldwide.
http://www.youtube.com/user/Technion/search?query=machine+learning

E0 370: Statistical Learning Theory by Prof. Shivani Agarwal, Indian Institute of Science
Course Description
This is an advanced course on learning theory suitable for PhD students working in learning theory or related areas (e.g. information theory, game theory, computational complexity theory etc) or 2nd-year Masters students doing a machine learning related project that involves learning-theoretic concepts. The course will consist broadly of three parts and will cover roughly the following topics:
Generalization error bounds
• Uniform convergence
• Growth function, VC-dimension, Sauer's Lemma
• Covering numbers, pseudo-dimension, fat-shattering dimension
• Margin analysis
• Rademacher averages
• Algorithmic stability
Statistical consistency and learnability
• Consistency of ERM and SRM methods
• Learnability/PAC learning
• Consistency of nearest neighbor methods
• Consistency of surrogate risk minimization methods (binary and multiclass)
Online learning and multi-armed bandits
• Online classification/regression
• Online learning from experts, online allocation
• Online convex optimization
• Online-to-batch conversions
• Multi-armed bandits (stochastic and adversarial)
http://drona.csa.iisc.ernet.in/~shivani/Teaching/E0370/Aug-2013/index.html#lectures

NPTEL, National Programme on Technology Enhanced Learning, India
NPTEL provides E-learning through online Web and Video courses in Engineering, Science and humanities streams.
The mission of NPTEL is to enhance the quality of Engineering education in the country by providing free online courseware.
http://nptel.ac.in
Probability Theory and Applications
http://nptel.ac.in/courses/111104079/
Pattern Recognition
http://nptel.ac.in/courses/106106046/1

Pattern Recognition Class, Universität Heidelberg, 2012 (Videos in English)
Syllabus:
1. Introduction
1.1 Applications of Pattern Recognition
1.2 k-Nearest Neighbors Classification
1.3 Probability Theory
1.4 Statistical Decision Theory
2. Correlation Measures, Gaussian Models
2.1 Pearson Correlation
2.2 Alternative Correlation Measures
2.3 Gaussian Graphical Models
2.4 Discriminant Analysis
3. Dimensionality Reduction
3.1 Regularized LDA/QDA
3.2 Principal Component Analysis (PCA)
3.3 Bilinear Decompositions
4. Neural Networks
4.1 History of Neural Networks
4.2 Perceptrons
4.3 Multilayer Perceptrons
4.4 The Projection Trick
4.5 Radial Basis Function Networks
5. Support Vector Machines
5.1 Loss Functions
5.2 Linear Soft-Margin SVM
5.3 Nonlinear SVM
6. Kernels, Random Forest
6.1 Kernels
6.2 One-Class SVM
6.3 Random Forest
6.4 Random Forest Feature Importance
7. Regression
7.1 Least-Squares Regression
7.2 Optimum Experimental Design
7.3 Case Study: Functional MRI
7.4 Case Study: Computer Tomography
7.5 Regularized Regression
8. Gaussian Processes
8.1 Gaussian Process Regression
8.2 GP Regression: Interpretation
8.3 Gaussian Stochastic Processes
8.4 Covariance Function
9. Unsupervised Learning
9.1 Kernel Density Estimation
9.2 Cluster Analysis
9.3 Expectation Maximization
9.4 Gaussian Mixture Models
10. Directed Graphical Models
10.1 Bayesian Networks
10.2 Variable Elimination
10.3 Message Passing
10.4 State Space Models
11. Optimization
11.1 The Lagrangian Method
11.2 Constraint Qualifications
11.3 Linear Programming
11.4 The Simplex Algorithm
12. Structured Learning
12.1 structSVM
12.2 Cutting Planes
https://www.youtube.com/playlist?list=PLuRaSnb3n4kRDZVU6wxPzGdx1CN12fn0w&spfreload=10

Videolectures.net
VideoLectures.NET is an award-winning free and open access educational video lectures repository. The lectures are given by distinguished scholars and scientists at the most important and prominent events like conferences, summer schools, workshops and science promotional events from many fields of Science. The portal is aimed at promoting science, exchanging ideas and fostering knowledge sharing by providing high quality didactic contents not only to the scientific community but also to the general public. All lectures, accompanying documents, information and links are systematically selected and classified through the editorial process taking into account also users' comments.
http://videolectures.net/Top/Computer_Science/Machine_Learning/
http://videolectures.net/Top/Computer_Science/Machine_Learning/#o=top

MLSS Machine Learning Summer Schools Videos
MLSS Videos from 2004 to 2012
http://videolectures.net/site/search/?q=MLSS
MLSS Videos 2012
http://www.youtube.com/user/compcinemaucsc/feed
MLSS Videos 2012
http://www.youtube.com/channel/UCHhbDEKA7BP58mq1wfTBQNQ

Max Planck Institute for Intelligent Systems Tubingen, MLSS Videos 2013
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems. The Institute studies these principles in biological, computational, hybrid, and material systems ranging from nano to macro scales. We take a highly interdisciplinary approach that combines mathematics, computation, material science, and biology. The MPI for Intelligent Systems has campuses in Stuttgart and Tübingen. Our Stuttgart campus has world-leading expertise in small-scale intelligent systems that leverage novel material science and biology. The Tübingen campus focuses on how intelligent systems process information to perceive, act and learn.
http://www.youtube.com/channel/UCty-pPOWlWUk4gXNm5pydcg
http://mlss.tuebingen.mpg.de/2013/speakers.html

MLSS Videos 2014
https://www.youtube.com/playlist?list=PLZSO_6-bSqHQCIYxE3ycGLXHMjK3XV7Iz&spfreload=10
All slides of MLSS 2015, Austin, Texas
http://www.cs.utexas.edu/mlss/schedule

GoogleTechTalks
Machine Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=machine+learning
Deep Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=deep+learning

Udacity Opencourseware

Supervised Learning (select "View Courseware" for free access)
Why Take This Course?
In this course, you will gain an understanding of a variety of topics and methods in Supervised Learning. Like function approximation in general, Supervised Learning prompts you to make generalizations based on fundamental assumptions about the world.
Michael: So why wouldn't you call it "function induction?"
Charles: Because someone said "supervised learning" first.
Topics covered in this course include: Decision trees, neural networks, instance-based learning, ensemble learning, computational learning theory, Bayesian learning, and many other fascinating machine learning concepts.
https://www.udacity.com/course/ud675

Unsupervised Learning (select "View Courseware" for free access)
Why Take This Course?
You will learn about and practice a variety of Unsupervised Learning approaches, including: randomized optimization, clustering, feature selection and transformation, and information theory.
You will learn important Machine Learning methods, techniques and best practices, and will gain experience implementing them in this course through a hands-on final project in which you will be designing a movie recommendation system (just like Netflix!).
https://www.udacity.com/course/ud741

Reinforcement Learning (select "View Courseware" for free access)
Why Take This Course?
You will learn about Reinforcement Learning, the field of Machine Learning concerned with the actions that software agents ought to take in a particular environment in order to maximize rewards.
Michael: Reinforcement Learning is a very popular field.
Charles: Perhaps because you're in it, Michael.
Michael: I don't think that's it.
In this course, you will gain an understanding of topics and methods in Reinforcement Learning, including Markov Decision Processes and Game Theory. You will gain experience implementing Reinforcement Learning techniques in a final project. In the final project, we’ll bring back the 80's and design a Pacman agent capable of eating all the food without getting eaten by monsters.
https://www.udacity.com/course/ud820

Mathematicalmonk Machine Learning
Videos about math, at the graduate level or upper-level undergraduate.
https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA

Judea Pearl Symposium
Judea Pearl (born 1936) is an Israeli-born American computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks (see the article on belief propagation). He is also credited for developing a theory of causal and counterfactual inference based on structural models (see article on causality).
He is the 2011 winner of the ACM Turing Award, the highest distinction in computer science, "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning". (source Wikipedia)
http://www.youtube.com/playlist?list=PLMliWGoMCBYilM6tw6S_4BpL_t29jbWsp
http://www.youtube.com/user/UCLA/playlists

Machine Learning Reading Group, Indian Institute of Science
Focus Areas: Machine Learning & Convex Optimization
http://clweb.csa.iisc.ernet.in/achintya/mlrg/

SIGDATA, Indian Institute of Technology Kanpur
http://www.cse.iitk.ac.in/users/sigdata/
http://www.cse.iitk.ac.in/users/sesres/

Hakka Labs
Hakka Labs is passionate about helping professional software engineers level up in their careers. Our content, events & community have grown by leaps and bounds since our humble origin when we launched as a Tumblr blog in 2011. We believe that "software is eating the world" and our passion is in building valuable resources and community for startup-oriented software engineers - the folks that will power innovation and disrupt industries, and ultimately shape our future. Hakka originally launched in SF Bay & NYC and rapidly built relationships with the top companies, CTOs and tech influencers in these key areas. We have deep connections to the software engineering worlds on both coasts and often invite groups of CTOs and engineers to our office in Soho, or meet with them at engineering events that we either run or participate in. We're also currently up & running in Berlin & Moscow, and plan to continue to rapidly expand worldwide. Not too shabby for a scrappy startup with a small marketing budget!
http://www.hakkalabs.co
https://www.youtube.com/user/g33ktalktv/videos

Open Yale Course Game Theory
Each course includes a full set of class lectures produced in high-quality video accompanied by such other course materials as syllabi, suggested readings, exams, and problem sets. The lectures are available as downloadable videos, and an audio-only version is also offered. In addition, searchable transcripts of each lecture are provided.
http://oyc.yale.edu/courses

Columbia University Machine Learning resources
Course related notes
• Regression by linear combination of basis functions [ps] [pdf]
• The perceptron [ps] [pdf]
• Document classification with the multinomial model [ps] [pdf]
• Sampling from a Gaussian [ps] [pdf]
• Slides on exponential family distributions [ps] [pdf]
http://www.cs.columbia.edu/~jebara/4771/tutorials.html

Applied Data Science by Ian Langmore and Daniel Krasner
The purpose of this course is to take people with strong mathematical/statistical knowledge and teach them software development fundamentals. This course will cover
• Design of small software packages
• Working in a Unix environment
• Designing software in teams
• Fundamental statistical algorithms such as linear and logistic regression
• Overfitting and how to avoid it
• Working with text data (e.g. regular expressions)
• Time series
• And more. . .
http://columbia-applied-data-science.github.io/appdatasci.pdf
http://columbia-applied-data-science.github.io

Deep Learning
Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. This website is intended to host a variety of resources and pointers to information about Deep Learning.
In these pages you will find
• a reading list,
• links to software,
• datasets,
• a list of deep learning research groups and labs,
• a list of announcements for deep learning related jobs (job listings),
• as well as tutorials and cool demos.
For the latest additions, including papers and software announcement, be sure to visit the Blog section and subscribe to our RSS feed of the website. Contact us if you have any comments or suggestions!
http://www.deeplearning.net/tutorial/
http://deeplearning.net

BigDataWeek Videos
Big Data Week is one of the most unique global platforms of interconnected community events focusing on the social, political, technological and commercial impacts of Big Data. It brings together a global community of data scientists, data technologies, data visualisers and data businesses spanning six major commercial, financial, social and technological sectors.
http://www.youtube.com/user/BigDataWeek/videos

Neural Information Processing Systems Foundation (NIPS) Video resources
The Foundation: The Neural Information Processing Systems (NIPS) Foundation is a non-profit corporation whose purpose is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects. Neural information processing is a field which benefits from a combined view of biological, physical, mathematical, and computational sciences.
The primary focus of the NIPS Foundation is the presentation of a continuing series of professional meetings known as the Neural Information Processing Systems Conference, held over the years at various locations in the United States, Canada and Spain.
http://www.youtube.com/user/NeuralInformationPro/feed

Hong Kong Open Source Conference 2013 (English & Chinese)
Wang Leung Wong, the Vice-Chairperson of the Hong Kong Linux User Group
This channel will post the videos of my life and opensource events in Hong Kong.
Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f
http://www.youtube.com/playlist?list=PL2FSfitY-hTLOL6tT_12YUK4c67e-E0xh

ICLR 2014 Videos
It is well understood that the performance of machine learning methods is heavily dependent on the choice of data representation (or features) on which they are applied. The rapidly developing field of representation learning is concerned with questions surrounding how we can best learn meaningful and useful representations of data. We take a broad view of the field, and include in it topics such as deep learning and feature learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues regarding non-convex optimization.
Despite the importance of representation learning to machine learning and to application areas such as vision, speech, audio and NLP, there is currently no common venue for researchers who share a common interest in this topic. The goal of ICLR is to help fill this void.
ICLR 2014 will be a 3-day event from April 14th to April 16th 2014, in Banff, Canada. The conference will follow the recently introduced open reviewing and open publishing publication process, which is explained in further detail here: Publication Model.
https://www.youtube.com/playlist?list=PLhiWXaTdsWB-3O19E0PSR0r9OseIylUM8

ICLR 2013 Videos
ICLR 2013 will be a 3-day event from May 2nd to May 4th 2013, co-located with AISTATS2013 in Scottsdale, Arizona. The conference will adopt a novel publication process, which is explained in further detail here: Publication Model.
https://sites.google.com/site/representationlearning2013/program-details/program

Machine Learning Conference Videos
Events matching your search:
• ICML 2011
• Sixth Annual Machine Learning Symposium
• 1st Lisbon Machine Learning School
• Copulas in Machine Learning Workshop 2011
• NIPS 2011 Workshop on Integrating Language and Vision
• Machine Learning in Computational Biology (MLCB) 2011
• Learning Semantics Workshop
• Sparse Representation and Low-rank Approximation
• Big Learning: Algorithms, Systems, and Tools for Learning at Scale
• ICML 2012 Oral Talks (International Conference on Machine Learning)
• Big Data Meets Computer Vision: First International Workshop on Large Scale Visual Recognition and Retrieval
• 2nd Workshop on Semantic Perception, Mapping and Exploration (SPME)
• Object, functional and structured data: towards next generation kernel-based methods - ICML 2012 Workshop
• Tutorial on Statistical Learning Theory in Reinforcement Learning and Approximate Dynamic Programming
• Tutorial on Causal inference - conditional independences and beyond
• ICML 2012 Tutorial on Prediction, Belief, and Markets
• Performance Evaluation for Learning Algorithms: Techniques, Application and Issues
• 2nd Lisbon Machine Learning School (2012)
• OpenCV using Python
• Big Learning: Algorithms, Systems, and Tools
• NIPS 2012 Workshop on Log-Linear Models
• Machine Learning in Computational Biology (MLCB) 2012
• The 4th International Workshop on Music and Machine Learning: Learning from Musical Structure
• ICML 2012 Workshop on Representation Learning
• Inferning 2012: ICML Workshop on interaction between Inference and Learning
• PAC-Bayesian Analysis in Supervised, Unsupervised, and Reinforcement Learning
• NYU Course on Big Data, Large Scale Machine Learning
• 2013 International Conference on Learning Representations (ICLR)
• ICML 2013 Plenary Webcast
• Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS) 2013
• NYU Course on Deep Learning (Spring 2014)
• NYU Course on Machine Learning and Computational Statistics 2014
http://techtalks.tv/search/results/?q=machine+learning

Internet Archive
Hello Patron,
Every day 3 million people use our collections. We have archived over ten petabytes (that's 10,000,000,000,000,000 bytes!) of information, including everything ever written in Balinese. This year we also launched our groundbreaking TV News Search and Borrow service, which former FCC Chairman Newton Minow said "offers citizens exceptional opportunities" to easily do their own fact checking and "to hold powerful public institutions accountable." Your support helps us build amazing services and keep them free for people around the globe.
https://archive.org/search.php?query=machine%20learning

University of Berkeley
http://www.youtube.com/user/UCBerkeley/search?query=machine+learning

AMP Camps, Big Data Bootcamp, UC Berkeley
AMP Camps are Big Data training events organized by the UC Berkeley AMPLab about big data analytics, machine learning, and popular open-source software projects produced by the AMPLab. All AMP Camp curriculum, and whenever possible videos of instructional talks presented at AMP Camps, are published here and accessible for free.
http://ampcamp.berkeley.edu
AMP Camp 5 was held at UC Berkeley and live-streamed online on November 20 and 21, 2014. Videos and exercises from the event are available on the AMPCamp 5 page.
http://ampcamp.berkeley.edu/5/
Resources and Tools of Noah's ARK Research Group
The following were developed by ARK researchers (*developed in whole or in part before joining ARK):
NLP tools: universal part-of-speech tagset, set of twelve coarse POS tags that generalizes across several languages
Semantics: SEMAFOR, an open-source statistical frame-semantic parser; AMALGr, an open-source statistical analyzer for multiword expressions in context
Syntax: TurboParser, an open-source, trainable statistical dependency parser; MSTParserStacked, an open-source, trainable statistical dependency parser based on stacking; DAGEEM code for unsupervised dependency grammar induction
Information extraction: Arabic named entity recognizer
Libraries/languages: AD3, an approximate MAP decoder; *Dyna, a declarative programming language for dynamic programming algorithms
Machine translation tools, including: *cdec, a framework for statistical translation and other structure prediction problems; *Egypt, a statistical machine translation toolkit that includes Giza; gappy pattern models, code for modeling monolingual and bilingual textual patterns with gaps; Rampion, a training algorithm for statistical machine translation models
Social media tools, including: Twitter NLP resources
Datasets: *STRAND (parallel text collections from the web); CURD (the Carnegie Mellon University Recipe Database); 10-K Corpus (company annual reports and stock return volatility data); political blog corpus; movie$ corpus; movie summary corpus; question-answer data; Congressional bills corpus; Arabic named entity and supersense corpora; NFL tweets corpus; multiword expressions corpus
Project websites: Flexible Learning for NLP; Low-Density MT; Compuframes, Big Multilinguality, Corporate Social Network
http://www.ark.cs.cmu.edu/#resources

ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014
ABOUT THE ESAC FACULTY
The ESAC Faculty was created in 2006 in order to foster an effective scientific environment at ESAC, and to present a
united face to the scientific work done at the centre. The faculty includes all active (i.e. publishing papers) research scientists at ESAC: ESA staff, Research Fellows, Science Contractors, and LAEFF members. For an insight into the founding principles, see the Overview of the ESAC Faculty presentation given at the first assembly.
The ESAC Faculty's main purpose is to stimulate and promote science activities at ESAC. For this it maintains an active and attractive visitor programme for short-to-medium term collaborative stays at ESAC, covering established researchers as well as young post-docs, PhD and graduate students. The Faculty also supports visiting seminar speakers, conferences, workshops and travel not possible via normal mission budgets.
ESAC Faculty members pursue their own research (as per the scientific interests of individual members), but are also involved in numerous internal and external collaborations (overview of Faculty Science at ESAC). Faculty members are also strongly involved in the ESAC Trainee programme.
http://www.cosmos.esa.int/web/esac-science-faculty/esac-statistics-workshop-2014

The Royal Society
The Royal Society is a self-governing Fellowship of many of the world’s most distinguished scientists drawn from all areas of science, engineering, and medicine.
The Society’s fundamental purpose, reflected in its founding Charters of the 1660s, is to recognise, promote, and support excellence in science and to encourage the development and use of science for the benefit of humanity.
The Society has played a part in some of the most fundamental, significant, and life-changing discoveries in scientific history and Royal Society scientists continue to make outstanding contributions to science in many research areas.
The Royal Society is the national Academy of science in the UK, and its core is its Fellowship and Foreign Membership, supported by a dedicated staff in London and elsewhere. The Fellowship comprises the most eminent scientists of the UK, Ireland and the Commonwealth.
A major activity of the Society is identifying and supporting the work of outstanding scientists. The Society supports researchers through its early and senior career schemes, innovation and industry schemes, and other schemes.
The Society facilitates interaction and communication among scientists via its discussion meetings, and disseminates scientific advances through its journals. The Society also engages beyond the research community, through independent policy work, the promotion of high quality science education, and communication with the public.
https://www.youtube.com/user/RoyalSociety/videos?spfreload=10

Statistical and causal approaches to machine learning by Professor Bernhard Schölkopf
https://www.youtube.com/watch?v=ek9jwRA2Jio&spfreload=10

Deep Learning
Deep Learning RNNaissance with Dr. Juergen Schmidhuber
A great session of NYC-ML Meetup hosted by ShutterStock in the glorious Empire State building.
Details: Deep Learning RNNaissance
Machine learning and pattern recognition are currently being revolutionised by "Deep Learning" (DL)
https://www.youtube.com/watch?v=6bOMf9zr7N8&spfreload=10

Introduction to Deep Learning with Python by Alec Radford
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning with Python and the Theano library. The emphasis of the talk is on high performance computing, natural language processing using recurrent neural nets, and large scale learning with GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
SlideShare presentation is available here: http://slidesha.re/1zs9M11

A Statistical Learning/Pattern Recognition Glossary by Thomas Minka
Welcome to my glossary. It is inspired by Brian Ripley's glossary in "Pattern Recognition for Neural Networks" (and the need to save time explaining things).
http://alumni.media.mit.edu/~tpminka/statlearn/glossary/

The Kalman Filter Website by Greg Welch and Gary Bishop
The Kalman Filter
Some tutorials, references, and research related to the Kalman filter.
This site is maintained by Greg Welch in Nursing / Computer Science / Simulation & Training at the University of Central Florida, and Gary Bishop in the Department of Computer Science at the University of North Carolina at Chapel Hill. Welch also holds an adjunct position at UNC-Chapel Hill. Please send additions or comments.
http://www.cs.unc.edu/~welch/kalman/index.html

Lisbon Machine Learning School (LXMLS)
LXMLS Lab guide (Great Tutorial!)
Day 0
In this class we will introduce several fundamental concepts needed further ahead. We start with an introduction to Python, the programming language we will use in the lab sessions, and to Matplotlib and Numpy, two modules for plotting and scientific computing in Python, respectively. Afterwards, we present several notions on probability theory and linear algebra. Finally, we focus on numerical optimization. The goal of this class is to give you the basic knowledge for you to understand the following lectures. We will not enter in too much detail in any of the topics.
Day 1
This day will serve as an introduction to machine learning. We recall some fundamental concepts about decision theory and classification. We also present some widely used models and algorithms and try to provide the main motivation behind them. There are several textbooks that provide a thorough description of some of the concepts introduced here: for example, Mitchell (1997), Duda et al.
(2001), Schölkopf and Smola (2002), Joachims (2002), Bishop (2006), Manning et al. (2008), to name just a few. The concepts that we introduce in this chapter will be revisited in later chapters, where the same algorithms and models will be adapted to structured inputs and outputs. For now, we are concerned only with multi-class classification (with just a few classes).
Day 2
In this class, we relax the assumption that the data points are independently and identically distributed (i.i.d.) by moving to a scenario of structured prediction, where the inputs are assumed to have temporal or spatial dependencies. We start by considering sequential models, which correspond to a chain structure: for instance, the words in a sentence. In this lecture, we will use part-of-speech tagging as our example task.
We start by defining the notation for this lecture in Section 2.1. Afterwards, in Section 2.2, we focus on the well known Hidden Markov Models and in Section 2.3 we describe how to estimate their parameters from labeled data. In Section 2.4 we explain the inference algorithms (Viterbi and Forward-Backward) for sequence models. These inference algorithms will be fundamental for the rest of this lecture, as well as for the next lecture on discriminative training of sequence models. In Section 2.6 we describe the task of Part-of-Speech tagging, and how the Hidden Markov Models are suitable for this task. Finally, in Section 2.7 we address unsupervised learning of Hidden Markov Models through the Expectation Maximization algorithm.
Day 3
In this class, we will continue to focus on sequence classification, but instead of following a generative approach (like in the previous chapter) we move towards discriminative approaches.
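As a rough illustration of the generative versus discriminative distinction named in the Day 3 summary (this sketch is not taken from the LXMLS guide), the snippet below stores a toy joint distribution P(X, Y) and derives the conditional P(Y | X) from it. A generative model estimates the whole joint table; a discriminative model would parameterize only the conditional. The table values and the `conditional` helper are made up for illustration.

```python
# Toy joint distribution P(X, Y) over one binary feature X and a binary label Y.
# All probabilities are invented for illustration; they sum to 1.
joint = {
    ("x0", "y0"): 0.30,
    ("x0", "y1"): 0.10,
    ("x1", "y0"): 0.20,
    ("x1", "y1"): 0.40,
}


def conditional(joint, x):
    """Derive P(Y | X = x) from the joint: P(x, y) / sum over y' of P(x, y')."""
    marginal = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / marginal for (xi, y), p in joint.items() if xi == x}


# A generative model gives access to the joint, so the conditional follows.
posterior = conditional(joint, "x1")
print(posterior)
```

The reverse direction does not work: knowing only P(Y | X) says nothing about how X itself is distributed, which is what a discriminative model gives up in exchange for modelling less.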
Recall that the difference between these approaches is that generative approaches attempt to model the probability distribution of the data, P(X, Y), whereas discriminative ones only model the conditional probability of the sequence, given the observed data, P(Y|X).
Day 4
In this lab we will implement some exercises related to parsing.
Day 5
In this lab (and tomorrow), we will work with Amazon.com’s Web Services (AWS), a cloud based solution to run some simple analyses. Then, in the next lab, we will build on these tools to construct a larger learning system.
We will only look at small problems, such that you can run them both locally and on AWS quickly. This way, you can learn how to use them within the limited time of these lab sessions. Unfortunately, this also means that you will not be dealing with truly large-scale problems where AWS is faster than local computations. You should consider these last two days as a proof-of-concept giving you the knowledge necessary to run things on AWS, which you can apply to your own large-scale problems after this summer school.
Day 6
In the previous lesson, you learned the fundamentals of MapReduce and applied it to a simple classification problem (language detection, using the Naïve Bayes classifier). Today, we’re going to use MapReduce again to solve a trickier problem: using EM to perform unsupervised POS induction.
Use the same login information you used yesterday to access your Amazon machine.
http://lxmls.it.pt/2014/guide.pdf

LXMLS Slides, 2014
During the morning there will be lectures focusing on the main areas of ML and their application to NLP. These areas include but are not restricted to: Classification, Structured Prediction (sequences, trees, graphs), Parsing, Information Retrieval, and their applications to practical language processing on the Web.
For each topic introduced in the morning there will be a practical session in the afternoon, where students will have the opportunity to test the concepts in practice. The practical sessions will consist of implementation exercises (using Python, Numpy, and Matplotlib) of the methods learned during the morning, testing them on real examples. A preliminary version of the lab guide is available here.
http://lxmls.it.pt/2014/?page_id=5

INTRODUCTORY APPLIED MACHINE LEARNING by Victor Lavrenko and Nigel Goddard, University of Edinburgh, 2011
The goal of this course is to introduce students to basic algorithms for learning from examples, focusing on classification and clustering problems. This is a level 9 course intended for MSc students and 3rd year undergraduates.
http://www.inf.ed.ac.uk/teaching/courses/iaml/

Data Mining and Machine Learning Course Material by Bamshad Mobasher, DePaul University, Fall 2014
COURSE DESCRIPTION
The course will focus on the implementations of various data mining and machine learning techniques and their applications in various domains. The primary tools used in the class are the Python programming language and several associated libraries. Additional open source machine learning and data mining tools may also be used as part of the class material and assignments. Students will gain hands-on experience developing supervised and unsupervised machine learning algorithms and will learn how to employ these techniques in the context of popular applications such as automatic classification, recommender systems, searching and ranking, text mining, group and community discovery, and social media analytics.
http://facweb.cs.depaul.edu/mobasher/classes/CSC478/lecture.html

Intelligent Information Retrieval by Bamshad Mobasher, DePaul University, Winter 2015
COURSE DESCRIPTION
This course will examine the design, implementation, and evaluation of information retrieval systems, such as Web search engines, as well as new and emerging technologies to build the next generation of intelligent and personalized search tools and Web information systems. We will focus on the underlying retrieval models, algorithms, and system implementations, such as vector-space and probabilistic retrieval models, as well as the PageRank algorithm used by Google. We will also study more advanced topics in intelligent information retrieval and filtering, particularly on the World Wide Web, including techniques for document categorization, automatic concept discovery, recommender systems, discovery and analysis of online communities and social networks, and personalized search. Throughout the course, current literature from the viewpoints of both research and practical retrieval technologies both on and off the World Wide Web will be examined.
http://facweb.cs.depaul.edu/mobasher/classes/csc575/lecture.html

Student Dave Youtube Channel
https://www.youtube.com/user/TheScienceguy3000/videos?spfreload=10

Current Courses of Justin E.
Esarey, RICE University
Current Courses
POLS 395: Introduction to Statistics [syllabus]
POLS 500: Social Scientific Thinking I (PhD) [syllabus]
POLS 505: Advanced MLE: Analyzing Categorical and Longitudinal Data [syllabus]
POLS 506: Bayesian Statistics (PhD) [syllabus]
Lecture 0: Introduction to R [webcast lecture] [R script]
Lecture 1: Basic Concepts of Bayesian Inference [webcast lecture] [R script] [notebook]
Lecture 2: Simple Bayesian Models
Lecture 3: Basic Monte Carlo Procedures and Sampling Algorithms
Lecture 4: The Metropolis-Hastings Algorithm and the Gibbs Sampler
Lecture 5: Practical MCMC for Estimating Models
Lecture 6: Bayesian Hierarchical Models and GLMs
Lecture 7: Fitting Hierarchical Models with BUGS
Lecture 8: Item Response Theory and the Scaling of Latent Dimensions
Lecture 9: Model Checking, Validation, and Comparison
Lecture 10: Missing Data Imputation
Lecture 11: Multilevel Regression and Poststratification
Lecture 12: Bayesian Spatial Autoregressive Models
POLS 507: Nonparametric Models and Machine Learning (PhD) [syllabus]
Lecture 1: Introduction to Nonparametric Statistics [webcast lecture] [R script] [notebook]
Lecture 2: Nonparametric Uncertainty Estimation and Bootstrapping [webcast lecture] [R script] [notebook]
Lecture 3: Ensemble Models and Bayesian Model Averaging [webcast lecture] [R script] [notebook]
Lecture 4: "Causal Inference" and Matching [webcast lecture] [R script] [notebook]
Lecture 5: Instrumental Variable Models [webcast lecture] [R script] [notebook]
Lecture 6: Bayesian Networks and Causality [webcast lecture] [R script] [notebook]
Lecture 7: Assessing Fit in Discrete Choice Models [webcast lecture] [R script] [notebook]
Lecture 8: Identifying and Measuring Latent Variables [webcast lecture] [R script] [notebook]
Lecture 9: Neural Networks [webcast lecture] [R script] [notebook]
Lecture 10: Classification and Regression Trees [webcast lecture] [R script] [notebook]
http://jee3.web.rice.edu/teaching.htm

From Bytes to Bites: How Data Science Might Help Feed the World by David Lobell, Stanford University
This seminar features leading industrial and academic experts on big data analytics, information management, data mining, machine learning, and large-scale data processing.
https://www.youtube.com/watch?v=PZcRjEgZIwk&spfreload=10

Information and Data Analytics Seminar by Jure Leskovec, Stanford, Centre for Professional Development
Course Description
This seminar features leading industrial and academic experts on big data analytics, information management, data mining, machine learning, and large-scale data processing.
https://mvideos.stanford.edu/Graduate#/SeminarDetail/Winter/2015/CS/545/5617
https://mvideos.stanford.edu/Graduate#/SeminarDetail/Winter/2015/CS/545/5746

Conference on Empirical Methods in Natural Language Processing (and forerunners) (EMNLP) (Free access to all publications)
The ACL Anthology currently hosts 33921 papers on the study of computational linguistics and natural language processing. Subscribe to the mailing list to receive announcements and updates to the Anthology.
http://aclanthology.info/venues/emnlp

emnlp acl's Youtube Channel
https://www.youtube.com/channel/UCZC4e4nrTjVqkW3Gcl16WoA/videos?spfreload=10
Columbia University's Laboratory for Intelligent Imaging and Neural Computing (LIINC)
Columbia University's Laboratory for Intelligent Imaging and Neural Computing (LIINC) was founded in September 2000 by Paul Sajda. The mission of LIINC is to use principles of reverse "neuro"-engineering to characterize the cortical networks underlying perceptual and cognitive processes, such as rapid decision making, in the human brain. Our laboratory pursues both basic and applied neuroscience research projects, with emphasis in the following: ...
http://liinc.bme.columbia.edu/mainTemplate.htm?liinc_projects.htm

Enabling Brain-Computer Interfaces for Labeling Our Environment by Paul Sajda
NYC Machine Learning Meetup 1/15/15
Paul Sajda from Columbia University presenting "Neural Correlates of the "Aha" Moment: Enabling Brain-Computer Interfaces for Labeling Our Environment"
https://www.youtube.com/watch?v=weNqauwatBs

The Unreasonable Effectiveness Of Deep Learning by Yann LeCun, Sept 2014
http://videolectures.net/sahd2014_lecun_deep_learning/

Machine Learning by Prof. Shai Ben-David, University of Waterloo, Lecture 1-3, Jan 2015
https://www.youtube.com/watch?v=rOcjShZbCFo&spfreload=10
https://www.youtube.com/watch?v=MYbt63PPP8o&spfreload=10
https://www.youtube.com/watch?v=jEIIkhESDac&spfreload=10

Miscellaneous

Introduction To Modern Brain-Computer Interface Design by Swartz Center for Computational Neuroscience
This is an online course on Brain-Computer Interface (BCI) design with a focus on modern methods. The lectures were first given by Christian Kothe (SCCN/UCSD) in 2012 at University of Osnabrueck within the Cognitive Science curriculum and have now been recorded in the form of an open online course. The course includes basics of EEG, BCI, signal processing, machine learning, and also contains tutorials on using BCILAB and the lab streaming layer software.
http://sccn.ucsd.edu/wiki/Introduction_To_Modern_Brain-Computer_Interface_Design

Distributed Computing Courses (lectures, exercises with solutions) by ETH Zurich, Group of Prof. Roger Wattenhofer
Mission
We are interested in both theory and practice of computer science and information technology. In our group we cultivate a large breadth of areas, reflecting our different backgrounds in computer science, mathematics, and electrical engineering. This gives us a unique blend of basic and applied research, proving mathematical theorems on the one hand, and building practical systems on the other.
We currently study the following topics: Distributed computing (computability, locality, complexity), distributed systems (Bitcoin), wireline networks (software defined networks), wireless networks (media access theory and practice), social networks (influence), algorithms (online algorithms, game theory), learning theory (recommendation theory and practice).
We regularly publish in different communities: distributed computing (e.g. PODC, SPAA, DISC), networking (e.g. SIGCOMM, MobiCom, SenSys), theory (e.g. STOC, FOCS, SODA, ICALP), and from time to time at random in areas such as machine learning or human computer interaction. Members of our group have won several best paper awards at top conferences such as PODC, SPAA, DISC, MobiCom, or P2P. Roger Wattenhofer has won the Prize for Innovations in Distributed Computing in 2012, for “extensive contributions to the study of distributed approximation”.
Some projects turned into startup companies, e.g. Wuala, StreamForge, BitSplitters. Several projects have been covered by popular media and blogs, e.g. Gizmodo, Lifehacker, New York Times, NZZ, PC World Magazine, Red Herring, or Technology Review.
Some of the software developed by our students is very popular: The music application Jukefox and the peer-to-peer client BitThief have together more than 1 million downloads. A branch of the United States FBI has requested to use a version of BitThief as a tool to uncover illegal activities. About half of the former PhD students are in academic positions, some others founded startup companies.
http://dcg.ethz.ch/courses.html

The wonderful and terrifying implications of computers that can learn | Jeremy Howard | TEDxBrussels
Published on 6 Dec 2014
This talk was given at a local TEDx event, produced independently of the TED Conferences. The extraordinary, wonderful, and terrifying implications of computers that can learn
https://www.youtube.com/watch?v=xx310zM3tLs&spfreload=10

Partially derivative, A podcast about data, data science, and awesomeness!
Partially Derivative is a show about data, data science, drinking, and awesomeness! We cover our top 10 data-related articles and blog posts from the past week, all in 30 minutes, or sometimes longer, depending on how much we’ve been drinking. The show is hosted by Jonathon Morgan, a startup CTO, and Dr. Chris Albon, a computational political scientist.
http://www.partiallyderivative.com

Class Central
MOOC Tracker – Never miss a course
https://www.class-central.com

Beginning to Advanced University CS Courses
Awesome Courses
Introduction
There is a lot of hidden treasure lying within university pages scattered across the internet. This list is an attempt to bring to light those awesome courses which make their high-quality material i.e. assignments, lectures, notes, readings & examinations available online for free.
https://github.com/prakhar1989/awesome-courses

MOOC or Opencourseware – Spanish (checked 2-Jan-2015)
Coming soon …

MOOC or Opencourseware – German (checked 2-Jan-2015)
Coming soon …

MOOC or Opencourseware – Italian (checked 2-Jan-2015)
Coming soon …

MOOC or Opencourseware – French (checked 2-Jan-2015)

University of Laval (French Canadian)
Open access to the course material

Apprentissage automatique (Machine Learning)
Machine learning from data and supervised learning. Empirical risk minimization and structural risk minimization. Methods for estimating the true risk from data, and confidence intervals. Linear and nonlinear classifiers. Dual form of the perceptron algorithm. Mercer kernels. Large-margin classifiers. Hard-margin and soft-margin SVMs. Probably approximately correct (PAC) learning and the Vapnik-Chervonenkis theory of the prediction error of classifiers. Learning by sample compression and applications to SCMs and perceptrons.
https://cours.ift.ulaval.ca/2009a/ift7002_81602/

Théorie algorithm. des graphes (Algorithmic Graph Theory)
This course covers topics such as connectivity in a graph (maximum flow, min-max duality, perfect matching, etc.), planarity of a graph (Euler's formula, Kuratowski's theorem, dual graph), graph coloring (integer and fractional colorings of vertices or edges, Kneser graphs), traversal problems on a graph (Eulerian trails, Hamiltonian cycles, de Bruijn graphs, etc.) and random walks on a graph (Markov chains, existence of the limiting distribution, mixing time, etc.).
Many graph problems have elegant solutions; others, of course, are NP-complete. Part of this course therefore covers complexity theory (NP and NP-complete problems, Cook's theorem, reduction algorithms).
https://cours.ift.ulaval.ca/2012a/ift7012_89927/

Hugo Larochelle, Apprentissage automatique (Machine Learning), French Canadian
I am interested in machine learning algorithms, that is, algorithms capable of extracting concepts or patterns from data. My work focuses on developing connectionist and probabilistic approaches to various artificial intelligence problems, such as computer vision and natural language processing. My research themes include:
• Problems: supervised, semi-supervised and unsupervised learning, structured output prediction, ranking, density estimation;
• Models: deep neural networks ("deep learning"), autoencoders, Boltzmann machines, Markov random fields;
• Applications: object recognition and tracking, document classification and ranking.
https://www.youtube.com/channel/UCiDouKcxRmAdc5OeZdiRwAg
http://www.dmi.usherb.ca/~larocheh/index_fr.html

Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French)
Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Spring 2012: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2011: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2011: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2010: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2010: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2009: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2008: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2008: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2007: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
Fall 2006: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Fall 2005: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
http://www.di.ens.fr/~fbach/

College de France, Mathematics and Digital Science, French
One of the Collège de France's missions is to promote French research and thought abroad, and to participate in intellectual debates on major world issues.
The institution therefore participates in international exchange through its teaching and the dissemination of knowledge, as well as through the research programmes involving its Chairs and laboratories. The fact that one fifth of the professors are currently from abroad confirms the Collège de France's widening research and education policy. This policy of international openness translates into:
• Collège de France professors' teaching missions abroad
• Lectures and lecture series by visiting professors
• Junior Visiting Researchers scheme
• Lecture series and symposia abroad
• Internet broadcasts
http://www.college-de-france.fr/site/audio-video/_audiovideos.jsp?index=0&prompt=&fulltextdefault=mots-cles...&fulltext=&fields=TYPE2_ACTIVITY&fieldsdefault=0_0&TYPE2=0&ACTIVITY=mathematiques

More to come …

MOOC or Opencourseware – Russian (checked 2-Jan-2015)

Russian Machine Learning Resources
Google Translation from Russian: A professional information and analytical resource dedicated to machine learning, pattern recognition and data mining. The resource currently contains 831 articles in Russian. (Source 16-07-2014)
• Classification
• Pattern recognition
• Regression analysis
• Prediction
• Applied statistics
• Analysis and understanding of images
• Processing and analysis of texts
• Applied data analysis systems
• Signal processing
• All areas
http://www.machinelearning.ru/wiki/index.php?title=Заглавная_страница

Yandex School
The Yandex School of Data Analysis
The School of Data Analysis is a free Master’s-level program in Computer Science and Data Analysis, which has been offered by Yandex since 2007 to graduates in engineering, mathematics, computer science or related fields. The aim of the School is to train specialists in data analysis and information retrieval for further employment at Yandex or any other IT company.
… The School’s courses are taught by Russian and international experts at Yandex’s Moscow office in the evenings, several times a week. The average study load is 15-20 hours per week, including 9-12 hours of lectures and seminars. The School also runs distance-learning courses and provides lectures over the internet. All courses at the Yandex School of Data Analysis are currently taught only in Russian.
http://shad.yandex.ru/lectures/

Alexander D’yakonov Resources
http://alexanderdyakonov.narod.ru/index.htm

Unknown in Data Mining and Machine Learning (2013)
Чему не учат в анализе данных и машинном обучении
http://alexanderdyakonov.narod.ru/lpot4emu.pdf

Introduction to Data Mining (2012)
Введение в анализ данных
http://alexanderdyakonov.narod.ru/intro2datamining.pdf

Tricks in Data Mining (2011)
Шаманство в анализе данных
http://alexanderdyakonov.narod.ru/lpotdyakonov.pdf

Manual "Logic Games, Data Mining, Weka, RapidMiner, MATLAB" (2010)
Анализ данных, обучение по прецедентам, логические игры, системы WEKA, RapidMiner и MatLab
http://www.machinelearning.ru/wiki/images/7/7e/Dj2010up.pdf

Machine Learning lectures by Konstantin Vorontsov
http://shad.yandex.ru/lectures/machine_learning.xml

More to come …

MOOC or Opencourseware – Japanese (checked 2-Jan-2015)
Coming soon …

MOOC or Opencourseware – Chinese (checked 2-Jan-2015)

Yeeyan Coursera Chinese Classroom
Google Translation from Chinese (Simplified Han) to English
Welcome to the Yeeyan × Coursera Chinese classroom. In this classroom, where there are always classmates to keep you company, you can: join collaborative translation; exchange ideas; sign up to become a class representative; check in so that others can keep you on track; ......
Finally, you are welcome to show off your certificate, whether the joint Yeeyan × Coursera Translator's Certificate or a Coursera course certificate: you have overcome yourself and are a winner in life!
http://coursera.yeeyan.org/

Hong Kong Open Source Conference 2013
Wang Leung Wong, the Vice-Chairperson of the Hong Kong Linux User Group
This channel will post the videos of my life and open source events in Hong Kong.
Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f

Guokr.com
Machine Learning
http://mooc.guokr.com/search/?wd=+%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
Data Mining
http://mooc.guokr.com/search/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98
Artificial Intelligence
http://mooc.guokr.com/search/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD

More coming soon …

MOOC or Opencourseware - Portuguese (checked 2-Jan-2015)

Aprendizado de Maquina (Machine Learning) by Bianca Zadrozny, Instituto de Computação, UFF, 2010
http://www2.ic.uff.br/~bianca/aa/

Algoritmo de Aprendizado de Máquina (Machine Learning Algorithms) by Aurora Trinidad Ramirez Pozo, Universidade Federal do Paraná, UFPR
http://www.inf.ufpr.br/aurora/tutoriais/aprendizadomaq/
http://www.inf.ufpr.br/aurora/tutoriais/arvoresdecisao/
http://www.inf.ufpr.br/aurora/tutoriais/Ceapostila.pdf
http://www.inf.ufpr.br/aurora/

Digital Library, Universidade de São Paulo
Link to copy into the browser:
http://www.teses.usp.br/index.php?option=com_jumi&fileid=20&Itemid=96&lang=en&cx=011662445380875560067%3Acack5lsxley&cof=FORID%3A11&hl=en&q=machine+learning&siteurl=www.teses.usp.br%2Findex.php%3Foption%3Dcom_jumi%26fileid%3D20%26Itemid%3D96%26lang%3Den&ref=www.teses.usp.br%2F&ss=5799j3321895j16

Coming soon …

MOOC or Opencourseware – Hebrew&English (checked 2-Jan-2015)

Open University of Israel
The Open University is unique in the Israeli academic landscape. It resembles the other universities in its pursuit of excellence and its commitment to high scholarly and scientific quality, but it differs from them in its organizational structure, its teaching methods, its array of study programs, and its requirements of candidates applying to enroll in its courses.
The Open University is, as its name says, open. It opens its doors, without preconditions and without prerequisites, both to those who wish to study individual courses or clusters of courses and to those who are interested in a full program of study toward a bachelor's degree.
http://www.youtube.com/user/openofek/search?query=machine+learning

More coming soon …

Exercises & Solutions

CS229 Stanford Machine Learning
(Huge!) List of projects (free access to abstracts), 2013 and previous years
http://cs229.stanford.edu/projects2013.html
http://cs229.stanford.edu

CS229 Stanford Machine Learning by Andrew Ng, Autumn 2014
(Some Exercises & Solutions collected from CS229's link)
http://cs229.stanford.edu/materials/ps1.pdf
http://cs229.stanford.edu/materials/ps1sol.pdf
http://cs229.stanford.edu/materials/ps2.pdf
http://cs229.stanford.edu/materials/ps2sol.pdf
http://cs229.stanford.edu/materials/ps3.pdf
http://cs229.stanford.edu/materials/ps3sol.pdf
http://cs229.stanford.edu/materials/midterm-2010-solutions.pdf
http://cs229.stanford.edu/materials/midterm_aut2014.pdf

CS 445/545 Machine Learning by Melanie Mitchell, Winter Quarter 2014
(Some Exercises & Solutions)
http://web.cecs.pdx.edu/~mm/MachineLearningWinter2014/

Top Writing Errors by Melanie Mitchell
http://web.cecs.pdx.edu/~mm/TopWritingErrors.pdf

Introduction to Machine Learning, Machine Learning Lab, University of Freiburg, Germany
http://ml.informatik.uni-freiburg.de/teaching/ss14/ml
http://ml.informatik.uni-freiburg.de/_media/teaching/ss14/ml/sheet01.pdf
http://ml.informatik.uni-freiburg.de/_media/teaching/ss14/sheet01_solution.pdf

Unsupervised Feature Learning and Deep Learning by Andrew Ng, 2011?
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=ufldl&doc=exercises/ex1/ex1.html

Machine Learning by Andrew Ng, 2011
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex2/ex2.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex3/ex3.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex4/ex4.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex5/ex5.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex6/ex6.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex7/ex7.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex8/ex8.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex9/ex9.html
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Pattern Recognition and Machine Learning, Solutions to Exercises, by Markus Svensén and Christopher Bishop, 2009
http://research.microsoft.com/en-us/um/people/cmbishop/prml/pdf/prml-web-sol-2009-09-08.pdf

Machine Learning Course by Aude Billard, Exercises & Solutions, EPFL, Switzerland
Overview and objective
The aim of machine learning is to extract knowledge from data. The algorithm may be informed by incorporating prior knowledge of the task at hand. The amount of information varies from fully supervised to unsupervised or semi-supervised learning.
This course will present some of the core advanced methods in the field for structure discovery, classification and non-linear regression. This is an advanced class in Machine Learning; hence, students are expected to have some background in the field. The class will be accompanied by practical computer sessions using the mldemos software (http://mldemos.epfl.ch), which encompasses more than 30 state-of-the-art algorithms.
http://lasa.epfl.ch/teaching/lectures/ML_Phd/

T-61.3025 Principles of Pattern Recognition
Weekly Exercises with Solutions (in English), Aalto University, Finland, 2015
https://noppa.aalto.fi/noppa/kurssi/t-61.3025/viikkoharjoitukset

T-61.3050 Machine Learning: Basic Principles
Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
https://noppa.aalto.fi/noppa/kurssi/t-61.3050/viikkoharjoitukset
http://www.aalto.fi/en/

CSE-E5430 Scalable Cloud Computing
Weekly Exercises with Solutions (in English), Aalto University, Finland, Fall 2014
https://noppa.aalto.fi/noppa/kurssi/cse-e5430/viikkoharjoitukset

Weekly Exercises with Solutions (in English) from Aalto University, Finland
TO EXPLORE, not to be missed!
https://noppa.aalto.fi/noppa/kurssit/sci/t3060

SurfStat Australia: an online text in introductory Statistics
http://surfstat.anu.edu.au/surfstat-home/surfstat-main.html
(Exercises & Solutions)
http://surfstat.anu.edu.au/surfstat-home/exercises.html

Learning from Data by Amos Storkey, Tutorial & Worksheets (with solutions), University of Edinburgh, Fall 2014
This is a course for basic data analysis, statistical model building and machine learning. The course aims to provide a set of tools that I hope you will find very useful, coupled with a principled approach to formulating solutions to problems in machine learning.
http://www.inf.ed.ac.uk/teaching/courses/lfd/lfdtutorials.html

Web Search and Mining by Christopher Manning and Prabhakar Raghavan, Winter 2005
(Slides, Exercises & Solutions)
http://web.stanford.edu/class/cs276b/
http://web.stanford.edu/class/cs276b/syllabus.html

Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Spring 2014
This course will provide an introduction to the theoretical analysis of prediction methods, focusing on statistical and computational aspects. It will cover approaches such as kernel methods and boosting algorithms, and probabilistic and game theoretic formulations of prediction problems, and it will focus on tools for the theoretical analysis of the performance of learning algorithms and the inherent difficulty of learning problems.
http://www.stat.berkeley.edu/~bartlett/courses/2014spring-cs281bstat241b/

Introduction to Time Series by Peter Bartlett, Berkeley, Homework & solutions, Fall 2010
An introduction to time series analysis in the time domain and frequency domain. Topics will include: stationarity, autocorrelation functions, autoregressive moving average models, partial autocorrelation functions, forecasting, seasonal ARIMA models, power spectra, discrete Fourier transform, parametric spectral estimation, nonparametric spectral estimation.
http://www.stat.berkeley.edu/~bartlett/courses/153-fall2010/index.html

Introduction to Machine Learning by Stuart Russell, CS 194-10, Fall 2011, Assignments & Solutions
The course will be a mixture of theory, algorithms, and hands-on projects with real data. The goal is to enable students to understand and use machine learning methods across a wide range of settings.
http://www.eecs.berkeley.edu/~russell/classes/cs194/f11/

Statistical Learning Theory by Peter Bartlett, Berkeley, Homework & solutions, Fall 2009
This course will provide an introduction to probabilistic and computational methods for the statistical modeling of complex, multivariate data. It will concentrate on graphical models, a flexible and powerful approach to capturing statistical dependencies in complex, multivariate data. In particular, the course will focus on the key theoretical and methodological issues of representation, estimation, and inference.
http://www.cs.berkeley.edu/~bartlett/courses/2009fall-cs281a/

Applications

MIT Media Lab
The real-time city is now real! The increasing deployment of sensors and hand-held electronics in recent years is allowing a new approach to the study of the built environment. The way we describe and understand cities is being radically transformed, alongside the tools we use to design them and impact on their physical structure. Studying these changes from a critical point of view and anticipating them is the goal of the SENSEable City Laboratory, a new research initiative at the Massachusetts Institute of Technology.
http://senseable.mit.edu

TEDx San Francisco, Connected Reality
Connected Reality is an evening that explored how the exponential technologies of the Internet of Things will give us deep insights that augment our understanding of the world and each other and will propel our ability to build intelligent tools that augment our lives. We'll briefly see the future through the eyes of presenters from industries ranging from medicine to manufacturing, who will illustrate how they use sensor data to perceive and understand the world differently and adjust their realities based on their new connectivity to their environment.
http://tedxsf.org/videos/#tedxsf-connected-reality

Emotion&Pain Project
One of the main challenges facing healthcare providers in the UK today (and in Europe) is the rising number of people with chronic health problems. Almost 1 in 7 UK citizens experience chronic pain, some due to chronic diseases such as osteoarthritis, but much of it mechanical low back pain (LBP) with no treatable pathology. 40% of these people experience severe pain and are very restricted by it. The capacity of our current health care system is insufficient to treat all these patients face-to-face.
Pain experience is affected by physical, psychological, and social factors and hence it poses a problem to the medical profession. This has prompted the development of a multidisciplinary approach to the treatment of chronic LBP, primarily involving psychology and physiotherapy alongside specialist clinicians (see British Pain Society guidelines). These programmes enable patients to become more self-managing through improving their physical and psychological functioning. While short term results are good, maintenance of these gains, and building on them, remains a problem, with psychological factors being one of the primary limiting causes.
Rehabilitation-assistive technologies have shown some success in helping recovery in a number of conditions but have yet to have an impact in pain management, mostly because of the complexity of dealing with emotional and motivational aspects of self-directed activity increase. By providing the means to automatically recognise, interpret, and act upon human affective states, recent developments in sensing technology and the field of affective computing offer new avenues for addressing these limitations and alleviating the difficulties patients face in building on treatment gains.
Thus we propose the design and development of an intelligent system that will enable ubiquitous monitoring and assessment of patients’ pain-related mood and movements inside (and in the longer term, outside) the clinical environment. Specifically, we aim to (a) develop a set of methods for automatically recognising audiovisual cues related to pain, behavioural patterns typical of low back pain, and affective states influencing pain, and (b) integrate these methods into a system that will provide appropriate feedback and prompts to the patient based on his/her behaviour measured during self-directed physical therapy sessions. In doing so, we seek to develop a new generation of multimodal patient-centred personal health technology.
http://www.emo-pain.ac.uk

NHK Documentary “Robot Revolution”
Developing Robots for Dangerous Fukushima Decommission Process
http://www.youtube.com/watch?v=mDD1TGv_2fo

IBM Research
Machine learning applications
Five innovations that will change our lives within five years
http://www.research.ibm.com/cognitive-computing/machine-learning-applications/index.shtml#fbid=Dp4uN7k8b2O

EPFL, Ecole Polytechnique Federale de Lausanne
EPFL is one of two Federal Institutes of Technology in Switzerland. Located along the shore of Lake Geneva, the university has more than 9,000 students in seven academic schools including Life Science, Architecture, and Computer Sciences.
http://www.youtube.com/channel/UClMJeVIVyGp-3_kWtspkS0Q

Visualizing MBTA Data: An interactive exploration of Boston's subway system
Boston’s Massachusetts Bay Transit Authority (MBTA) operates the 4th busiest subway system in the U.S. after New York, Washington, and Chicago.
… We attempt to present this information to help people in Boston better understand the trains, how people use the trains, and how the people and trains interact with each other.
http://mbtaviz.github.io

Commercial Applications (listed without any transfer of money)

Google Glass
http://www.youtube.com/watch?v=D7TB8b2t3QE

Google self-driving car
http://www.youtube.com/watch?v=cdgQpa1pUUE

SenseFly
http://www.youtube.com/watch?v=NuZUSe87miY

How Microsoft's Machine Learning Is Breaking the Global Language Barrier
Earlier this week, roughly 50,000 Skype users woke up to a new way of communicating over the Web-based phone- and video-calling platform, a feature that could’ve been pulled straight out of Star Trek. The new function, called Skype Translator, translates voice calls between different languages in real time, turning English to Spanish and Spanish back into English on the fly. Skype plans to incrementally add support for more than 40 languages, promising nothing short of a universal translator for desktops and mobile devices.
The product of more than a decade of dedicated research and development by Microsoft Research (Microsoft acquired Skype in 2011), Skype Translator does what several other Silicon Valley icons (not to mention the U.S. Department of Defense) have not yet been able to do. To do so, Microsoft Research (MSR) had to solve some major machine learning problems while pushing technologies like deep neural networks into new territory.
http://www.popsci.com/how-microsofts-machine-learning-breaking-language-barrier

Free access to Research papers - English

Cambridge University
Publications page
http://mlg.eng.cam.ac.uk/pub/

arXiv.org by Cornell University Library
Open access to 999,848 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics
http://arxiv.org

Google Scholar
Stand on the shoulders of giants. Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research.
http://scholar.google.com/intl/en/scholar/about.html
http://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=machine+learning&before_author=m83-_28PAAAJ&astart=0

Google Research
Google publishes hundreds of research papers each year. Publishing is important to us; it enables us to collaborate and share ideas with, as well as learn from, the broader scientific community. Submissions are often made stronger by the fact that ideas have been tested through real product implementation by the time of publication.
http://research.google.com/pubs/papers.html

Yahoo Research
The machine learning group is a team of experts in computer science, statistics, mathematical optimization, and automatic control. They focus on making computers learn abstractions, patterns, conditional probability distributions, and policies from web scale data with the goal to improve the online experience for Yahoo! users, partner publishers, and advertisers. Machine learning has such a broad influence on the internet that it can be quite difficult to recognize.
Machine learning’s benefits are often hidden – they are the spam emails you don’t see, the uninteresting news articles you don’t see, and the irrelevant search results you don’t see, just to name a few. Machine learning is one of the best technologies we have for solving some of the biggest problems on the Web.
http://labs.yahoo.com/areas/?areas=machine-learning

Microsoft Research
The Machine Learning Groups of Microsoft Research include a set of researchers and developers who push the state of the art in machine learning. We span the space from proving theorems about the math underlying ML, to creating new ML systems and algorithms, to helping our partner product groups apply ML to large and complex data sets.
http://research.microsoft.com/en-us/groups/mldept/

Journal from MIT Press
The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online.
http://jmlr.org

INRIA
Access to Research Papers
http://haltools.inrialpes.fr/Public/afficheRequetePubli.php?labos_exp=sierra&CB_auteur=oui&CB_titre=oui&CB_article=oui&langue=Anglais&tri_exp=annee_publi&tri_exp3=date_publi&ordre_aff=TA&Fen=Aff&css=../css/VisuCondense.css

DROPS, Dagstuhl Research Online Publication Server
Access to Research Papers
http://drops.dagstuhl.de/opus/suche/index.php

Open Source Software – English

JAVA

Weka 3: Data Mining Software in Java
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code.
Weka contains tools for data pre-‐processing, classification, regression, clustering, association rules, and visualization. It is also well-‐suited for developing new machine learning schemes. http://www.cs.waikato.ac.nz/~ml/weka/index.html A deep-‐learning library for Java Distributed Deep Learning Platform for Java https://github.com/agibsonccc/java-‐deeplearning List of Java ML Software by Machine Learning Mastery http://machinelearningmastery.com/java-‐machine-‐learning/ List of Java ML Software by MLOSS http://mloss.org/software/language/java/ MathFinder: Math API Discovery and Migration, Software Engineering and Analysis Lab (SEAL), IISc Bangalore MathFinder is an Eclipse plugin supported by a unit test mining backend for discovering and migrating math APIs. It is intended to make (re)implementing math algorithms in Java easier. Given a math expressions (see the syntax below), it returns a pseudo-‐code involving calls to suitable Java APIs. At present, it supports programming tasks that require use of matrix and linear algebra APIs. The underlying technique is however general and can be extended to support other math domains. http://www.iisc-‐seal.net/mathfinder PYTHON Theano Library for Deep Learning Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-‐dimensional arrays efficiently. Theano features: • tight integration with NumPy – Use numpy.ndarray in Theano-‐compiled functions. • transparent use of a GPU – Perform data-‐intensive calculations up to 140x faster than with CPU.(float32 only) machinelearningsalon kit – 24th Jan 2015 – Test of the links going on… Don’t keep an old version! Machinelearningsalon.org kit is regularly updated! 65 efficient symbolic differentiation – Theano does your derivatives for function with one or many inputs. • speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny. 
• dynamic C code generation – Evaluate expressions faster.
• extensive unit-testing and self-verification – Detect and diagnose many types of mistake.
Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal).
http://deeplearning.net/software/theano/
http://nbviewer.ipython.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb

Introduction to Deep Learning with Python
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning with Python and the Theano library. The emphasis of the talk is on high performance computing, natural language processing using recurrent neural nets, and large scale learning with GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk

Udacity - Programming foundations with Python
You’ll pick up some great tools for your programming toolkit in this course! You will:
• Start coding in the programming language Python;
• Reuse and share code with Object Oriented Programming;
• Create and share amazing, life-hacking projects!
https://www.udacity.com/course/ud036

Scikit-learn, Machine Learning in Python
• Simple and efficient tools for data mining and data analysis
• Accessible to everybody, and reusable in various contexts
• Built on NumPy, SciPy, and matplotlib
• Open source, commercially usable - BSD license
http://scikit-learn.org/stable/index.html

PyData
PyData is a gathering of users and developers of data analysis tools in Python. The goals are to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply our language and tools to ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization.
https://www.youtube.com/user/PyDataTV/videos

PyData NYC 2014
Videos published 4 days ago
https://www.youtube.com/user/PyDataTV/videos?spfreload=10
PyData is a gathering of users and developers of data analysis tools in Python. The goals are to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply our language and tools to ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization.
We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.
A major goal of the conference is to provide a venue for users across all the various domains of data analysis to share their experiences and their techniques, as well as highlight the triumphs and potential pitfalls of using Python for certain kinds of problems.
http://pydata.org/nyc2014/about/about/

PyData, The Complete Works by Rohit Sivaprasad
Added in the kit 11-Nov-2014
The unofficial index of all PyData talks. This was initially going to be a pickled pandas DataFrame object, but then I decided against it. So here it is - in beautiful GitHub-flavored markdown. There are placeholders for links to the video. Currently, the hyperlinks point to the pydata.org talk pages. Please do feel free to make it better by contributing to the repo.
https://github.com/DataTau/datascience-anthology-pydata

Anaconda
Completely free enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing
We want to ensure that Python, NumPy, SciPy, Pandas, IPython, Matplotlib, Numba, Blaze, Bokeh, and other great Python data analysis tools can be used everywhere.
We want to make it easier for Python evangelists and teachers to promote the use of Python. We want to give back to the Python community that we love being a part of.
https://store.continuum.io/cshop/anaconda/

IPython Interactive Computing
IPython provides a rich architecture for interactive computing with:
• Powerful interactive shells (terminal and Qt-based).
• A browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.
• Support for interactive data visualization and use of GUI toolkits.
• Flexible, embeddable interpreters to load into your own projects.
• Easy to use, high performance tools for parallel computing.
http://ipython.org

SciPy
SciPy refers to several related but distinct entities:
• The SciPy Stack, a collection of open source software for scientific computing in Python, and particularly a specified set of core packages.
• The community of people who use and develop this stack.
• Several conferences dedicated to scientific computing in Python - SciPy, EuroSciPy and SciPy.in.
• The SciPy library, one component of the SciPy stack, providing many numerical routines.
http://www.scipy.org

NumPy
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
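
The capabilities listed above can be sketched in a few lines. This is only an illustration, assuming a reasonably recent NumPy release (1.17 or later, for np.random.default_rng); the arrays and numbers are made up for the example and are not taken from the NumPy site.

```python
import numpy as np

# N-dimensional array object: a 3x4 array of floats 0..11.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Broadcasting: the length-4 vector of column means is applied to every row.
col_means = a.mean(axis=0)
centered = a - col_means

# Linear algebra: solve the 2x2 system M x = b.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(M, b)

# Fourier transform: a constant signal only has a zero-frequency component.
spectrum = np.fft.fft(np.ones(8))

# Random numbers from a seeded generator, so repeated runs give the same draws.
rng = np.random.default_rng(0)
samples = rng.normal(size=5)

print(centered)
print(x)
```

Subtracting col_means from a works without an explicit loop because NumPy stretches the shape (4,) vector across the three rows of the (3, 4) array.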
http://www.numpy.org

matplotlib
matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (à la MATLAB® or Mathematica®), web application servers, and six graphical user interface toolkits.
http://matplotlib.org

pandas
Python Data Analysis Library
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
http://pandas.pydata.org

SymPy
SymPy is a Python library for symbolic mathematics.
http://sympy.org/en/index.html

Orange
Open source data visualization and analysis for novice and experts. Data mining through visual programming or Python scripting. Components for machine learning. Add-ons for bioinformatics and text mining. Packed with features for data analytics.
http://orange.biolab.si

Pythonic Perambulations: How to be a Bayesian in Python
Below I'll explore three mature Python packages for performing Bayesian analysis via MCMC:
• emcee: the MCMC Hammer
• pymc: Bayesian Statistical Modeling in Python
• pystan: The Python Interface to Stan
http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/

emcee
emcee is an extensible, pure-Python implementation of Goodman & Weare's Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler. It's designed for Bayesian parameter estimation and it's really sweet!
http://dan.iel.fm/emcee/current/

PyMC
PyMC is a python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems.
Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics.
http://pymc-devs.github.io/pymc/

Pylearn2
Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza, Razvan Pascanu, James Bergstra, Frédéric Bastien, and Yoshua Bengio. "Pylearn2: a machine learning research library". arXiv preprint arXiv:1308.4214
https://github.com/lisa-lab/pylearn2

Giant list of python learning resources
Keep following this post, we'll keep updating this huge list & collection.
http://python2web.com/giant-list-of-python-learning-resources/

PyCon US 2014
PyCon is the largest annual gathering for the community using and developing the open-source Python programming language. It is produced and underwritten by the Python Software Foundation, the 501(c)(3) nonprofit organization dedicated to advancing and promoting Python. Through PyCon, the PSF advances its mission of growing the international community of Python programmers.
Because PyCon is backed by the non-profit PSF, we keep registration costs much lower than comparable technology conferences so that PyCon remains accessible to the widest group possible. The PSF also pays for the ongoing development of the software that runs PyCon and makes it available under a liberal open source license.
140 videos
http://pyvideo.org/category/50/pycon-us-2014
https://www.youtube.com/user/PyCon2014/about?spfreload=10

PyCon India 2012
https://www.youtube.com/playlist?list=PL6GW05BfqWIdWaV_aP6kHJKFY0ybOOfoA

PyCon India 2013
https://www.youtube.com/playlist?list=PL6GW05BfqWIdsaaV35jcHWPWTI-DAw6Yn

Montreal Python
Montréal-Python's mission is to promote the growth of a lively and dynamic community of users of the Python programming language and to promote the use of the latter. Montréal-Python also aims to disseminate the local Python knowledge to build a stronger developer community. Montréal-Python promotes Free and Open Source Software, favors its adoption within the community, and collaborates with community players to achieve this goal.
http://www.youtube.com/user/MontrealPython/videos
http://montrealpython.org/en/

SciPy 2014
SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.
http://pyvideo.org/category/51/scipy-2014

PyLadies London Meetup resources
PyLadies is an international mentorship group with a focus on helping more women and genderqueers become active participants and leaders in the Python open-source community. Our mission is to promote, educate and advance a diverse Python community through outreach, education, conferences, events, and social gatherings. PyLadies also aims to provide a friendly support network for women and genderqueers, and a bridge to the larger Python world.
https://github.com/pyladieslondon/resources

Python Tools for Machine Learning by CB Insights
http://www.cbinsights.com/blog/python-tools-machine-learning

Python Tutorials by Jessica McKellar
I am a startup founder, software engineer, and open source developer living in San Francisco, California.
I enjoy the Internet, networking, low-level systems engineering, relational databases, tinkering on electronics projects, and contributing to and helping other people contribute to open source software.
"Be the change you wish to see in the world" may be clichéd, but what can I say, I believe in it. I am committed to applying my skills, in individual and collective efforts, to improve the world. Right now, this means I spend a lot of time volunteering, engaging technologists about education, and empowering effective people and initiatives in my capacity as a Director for the Python Software Foundation.
http://web.mit.edu/jesstess/

Introduction to Python for Data Mining
http://nbviewer.ipython.org/github/Syrios12/learningwithdata/blob/master/Python_For_Data_Mining.ipynb

Python Scientific Lecture Notes
Tutorial material on the scientific Python ecosystem, a quick introduction to central tools and techniques. The different chapters each correspond to a 1 to 2 hour course with increasing level of expertise, from beginner to expert.
http://scipy-lectures.github.io/index.html#

OCTAVE

GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable.
http://www.gnu.org/software/octave/

PMTK Toolbox by Matt Dunham, Kevin Murphy
PMTK is a collection of Matlab/Octave functions, written by Matt Dunham, Kevin Murphy and various other people.
The toolkit is primarily designed to accompany Kevin Murphy's textbook Machine learning: a probabilistic perspective, but can also be used independently of this book. The goal is to provide a unified conceptual and software framework encompassing machine learning, graphical models, and Bayesian statistics (hence the logo). (Some methods from frequentist statistics, such as cross validation, are also supported.) Since December 2011, the toolbox is in maintenance mode, meaning that bugs will be fixed, but no new features will be added (at least not by Kevin or Matt).
PMTK supports a large variety of probabilistic models, including linear and logistic regression models (optionally with kernels), SVMs and Gaussian processes, directed and undirected graphical models, various kinds of latent variable models (mixtures, PCA, HMMs, etc.), etc. Several kinds of prior are supported, including Gaussian (L2 regularization), Laplace (L1 regularization), Dirichlet, etc. Many algorithms are supported, for both Bayesian inference (including dynamic programming, variational Bayes and MCMC) and MAP/ML estimation (including EM, conjugate and projected gradient methods, etc.)
https://github.com/probml/pmtk3

JULIA

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, largely written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing.
In addition, the Julia developer community is contributing a number of external packages through Julia’s built-in package manager at a rapid pace. IJulia, a collaboration between the IPython and Julia communities, provides a powerful browser-based graphical notebook interface to Julia.
Julia programs are organized around multiple dispatch: functions are defined and overloaded for different combinations of argument types, which can also be user-defined. For a more in-depth discussion of the rationale and advantages of Julia over other systems, see the following highlights or read the introduction in the online manual.
http://julialang.org

Julia by example
http://www.scolvin.com/juliabyexample/

The R PROJECT for Statistical Computing

R
R is a language and environment for statistical computing and graphics…
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
http://www.r-project.org

R Graph Gallery
The blog is a collection of script examples with example data and output plots. R produces excellent quality graphs for data analysis, science and business presentation, publications and other purposes. Self-help codes and examples are provided. Enjoy nice graphs!
http://rgraphgallery.blogspot.co.uk/2013/04/ploting-heatmap-in-map-using-maps.html

Code School - R Course
Learn the R programming language for data analysis and visualization. This software programming language is great for statistical computing and graphics.
https://www.codeschool.com/courses/try-r

Coursera R programming
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing which includes programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
https://www.coursera.org/course/rprog

Open Intro R Labs
OpenIntro Labs promote the understanding and application of statistics through applied data analysis. The statistical software R is a widely used and stable software that is free. RStudio is a user-friendly interface for R.
http://www.openintro.org/stat/labs.php

R Tutorial
• Hierarchical Linear Model
• Bayesian Classification with Gaussian Process
• Bayesian Inference Using OpenBUGS
• Significance Test for Kendall's Tau-b
• Support Vector Machine with GPU, Part II
• Hierarchical Cluster Analysis
http://www.r-tutor.com

DataCamp R Course
• Introduction to R
• Data Analysis and Statistical Inference
• Introduction to Computational Finance and Financial Econometrics
• How to work with Quandl in R
https://www.datacamp.com/courses
R Bloggers
R-Bloggers.com is a central hub (i.e., a blog aggregator) of content collected from bloggers who write about R (in English). The site will help R bloggers and users to connect and follow the “R blogosphere” (you can view a 7 minute talk, from useR2011, for more information about the R-blogosphere).
http://www.r-bloggers.com

STAN Software
Stan is a probabilistic programming language implementing full Bayesian statistical inference with
• MCMC sampling (NUTS, HMC)
and penalized maximum likelihood estimation with
• Optimization (BFGS)
Stan is coded in C++ and runs on all major platforms (Linux, Mac, Windows).
Stan is freedom-respecting, open-source software (new BSD core, GPLv3 interfaces).
Interfaces
Download and getting started instructions, organized by interface:
• RStan v2.5.0 (R)
• PyStan v2.5.0 (Python)
• CmdStan v2.5.0 (shell, command-line terminal)
• MatlabStan (MATLAB)
• Stan.jl (Julia)
http://mc-stan.org

List of Machine Learning Open Source Software
To support the open source software movement, JMLR MLOSS publishes contributions related to implementations of non-trivial machine learning algorithms, toolboxes or even languages for scientific computing.
http://jmlr.org/mloss/

Google Prediction API
Google's cloud-based machine learning tools can help analyze your data to add the following features to your applications: Customer sentiment analysis, Message routing decisions, Document and email classification, Recommendation systems, Churn analysis, Spam detection, Upsell opportunity analysis, Diagnostics, Suspicious activity identification, and much more …
Free Quota: Usage is free for the first six months, up to the following limits per Google Developers Console project. This free quota applies even when billing is enabled, until the six-month expiration time.
Usage limits:
• Predictions: 100 predictions/day
• Hosted model predictions: Hosted models have a usage limit of 100 predictions/day/user across all models.
• Training: 5MB trained/day
• Streaming updates: 100 streaming updates/day
• Lifetime cap: 20,000 predictions.
• Expiration: Free quota expires six months after activating Google Prediction for your project in the Google Developers Console.
https://developers.google.com/prediction/

Reddit
Reddit /ˈrɛdɪt/, stylized as reddit, is an entertainment, social networking service and news website where registered community members can submit content, such as text posts or direct links. Only registered users can then vote submissions "up" or "down" to organize the posts and determine their position on the site's pages. Content entries are organized by areas of interest called "subreddits". (source Wikipedia)
http://www.reddit.com/r/MachineLearning/

SHOGUN toolbox
A large scale machine learning toolbox. SHOGUN is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis.
http://www.shogun-toolbox.org/page/home/

Comparison between ML toolboxes
https://docs.google.com/spreadsheet/ccc?key=0Aunb9cCVAP6NdDVBMzY1TjdPcmx4ei1EeUZNNGtKUHc&hl=en#gid=0

Infer.NET, Microsoft Research
Infer.NET is a framework for running Bayesian inference in graphical models. It can also be used for probabilistic programming as shown in this video.
You can use Infer.NET to solve many different kinds of machine learning problems, from standard problems like classification or clustering through to customised solutions to domain-specific problems. Infer.NET has been used in a wide variety of domains including information retrieval, bioinformatics, epidemiology, vision, and many others.
A new feature in Infer.NET 2.5 is Fun, a library that turns the simple succinct syntax of F# into a probabilistic modeling language for Bayesian machine learning.
You can run your models with F# to compute synthetic data, and you can compile your models with the Infer.NET compiler for efficient inference. See the Infer.NET Fun website for additional information.
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/default.aspx

F# Software Foundation
F# is ideally suited to machine learning because of its efficient execution, succinct style, data access capabilities and scalability. F# has been successfully used by some of the most advanced machine learning teams in the world, including several groups at Microsoft Research.
Try F# has some introductory machine learning algorithms. Further resources related to different aspects of machine learning are below. See also the Math and Statistics and Data Science sections for related material.
http://fsharp.org/machine-learning/

BigML
Now Free Unlimited tasks (up to 16MB/Task)
https://bigml.com/

BRML Toolbox in Matlab – David Barber Toolbox, University College London
http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.Software

Dmitry Efimov Software
http://mech.math.msu.su/~efimov/indexe.php

SCILAB
Scilab is free and open source software for numerical computation providing a powerful computing environment for engineering and scientific applications. Scilab includes hundreds of mathematical functions. It has a high level programming language allowing access to advanced data structures, 2-D and 3-D graphical functions.
http://www.scilab.org/en/scilab/about

OverFeat and Torch7, CILVR Lab @ NYU
OverFeat is an image recognizer and feature extractor built around a convolutional network. The OverFeat convolutional net was trained on the ImageNet 1K dataset. It participated in the ImageNet Large Scale Recognition Challenge 2013 under the name “OverFeat NYU”.
This release provides C/C++ code to run the network and output class probabilities or feature vectors. It also includes a webcam-based demo.
Torch7 is an interactive development environment for machine learning and computer vision. It is an extension of the Lua language with a multidimensional numerical array library. Lua is a very simple, compact and efficient interpreter/compiler with a straightforward syntax. It is used widely as a scripting language in the computer game industry. Torch extends Lua with an extensive numerical library and various facilities for machine learning and computer vision. Torch has computational back-ends for multicore/multi-CPU machines (using Intel/AVX and OpenMP), NVidia GPUs (using CUDA), and ARM CPUs (using the Neon instruction set). Many research projects at the CILVR Lab are built with Torch.
http://cilvr.nyu.edu/doku.php?id=code:start

FAIR open sources deep-learning modules for Torch
https://research.facebook.com/blog/879898285375829/fair-open-sources-deep-learning-modules-for-torch/

IPython kernel for Torch with visualization and plotting
https://github.com/facebook/iTorch

Mloss.org
Our goal is to support a community creating a comprehensive open source machine learning environment. Ultimately, open source machine learning software should be able to compete with existing commercial closed source solutions. To this end, it is not enough to bring existing and freshly developed toolboxes and algorithmic implementations to people's attention. More importantly the MLOSS platform will facilitate collaborations with the goal of creating a set of tools that work with one another.
Far from requiring integration into a single package, we believe that this kind of interoperability can also be achieved in a collaborative manner, which is especially suited to open source software development practices.
https://mloss.org/software/view/501/

Sourceforge
Find, Create, and Publish Open Source Software for free
http://sourceforge.net/directory/os:mac/freshness:recently-updated/?q=machine%20learning

AForge.NET Framework
AForge.NET is a C# framework designed for developers and researchers in the fields of Computer Vision and Artificial Intelligence - image processing, neural networks, genetic algorithms, machine learning, robotics, etc.
http://www.aforgenet.com

cuda-convnet
High-performance C++/CUDA implementation of convolutional neural networks
This is a fast C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks. It can model arbitrary layer connectivity and network depth. Any directed acyclic graph of layers will do. Training is done using the back-propagation algorithm.
Fermi-generation GPU (GTX 4xx, GTX 5xx, or Tesla equivalent) required.
https://code.google.com/p/cuda-convnet/

word2vec
Tool for computing continuous distributed representations of words.
This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
https://code.google.com/p/word2vec/

Freecode
Freecode maintains the Web's largest index of Linux, Unix and cross-platform software, and mobile applications.
Thousands of applications, which are preferably released under an open source license, are meticulously cataloged in the Freecode database, and links to new applications are added daily. Each entry provides a description of the software, links to download it and to obtain more information, and a history of the project's releases, so readers can keep up-to-date on the latest developments.
Freecode is the first stop for Linux users hunting for the software they need for work or play. It is continuously updated with the latest developments from the "release early, release often" community. In addition to providing news on new releases, Freecode offers a variety of original content on technical, political, and social aspects of software and programming, written by both Freecode readers and Free Software luminaries. The comment board attached to each page serves as a home for spirited discussion, bug reports, and technical support. An essential resource for serious developers, Freecode makes it possible to keep up on who's doing what, and what everyone else thinks of it.
http://freecode.com/search?q=machine+learning&submit=Search

Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer, and John Langford, August 2014
The goal of this workshop is to inform people about open source machine learning systems being developed, aid the coordination of such projects, and discuss future plans.
http://hunch.net/~nyoml/

Maxim Milakov Software
I am a researcher in machine learning and high-performance computing. I designed and implemented nnForge - a library for training convolutional and fully connected neural networks, with CPU and GPU (CUDA) backends. You will find my thoughts on convolutional neural networks and the results of applying convolutional ANNs for various classification tasks in the Blog.
http://www.milakov.org

Alfonso Nieto-Castanon Software
http://www.alfnie.com/software

Lib Skylark
The Sketching based Matrix computations for Machine Learning is a library for matrix computations suitable for general statistical data analysis and optimization applications.
Many tasks in machine learning and statistics ultimately end up being problems involving matrices: whether you're finding the key players in the bitcoin market, or inferring where tweets came from, or figuring out what's in sewage, you'll want to have a toolkit for least-squares and robust regression, eigenvector analysis, non-negative matrix factorization, and other matrix computations.
Sketching is a way to compress matrices that preserves key matrix properties; it can be used to speed up many matrix computations. Sketching takes a given matrix A and produces a sketch matrix B that has fewer rows and/or columns than A. For a good sketch B, if we solve a problem with input B, the solution will also be pretty good for input A. For some problems, sketches can also be used to get faster ways to find high-precision solutions to the original problem. In other cases, sketches can be used to summarize the data by identifying the most important rows or columns.
A simple example of sketching is just sampling the rows (and/or columns) of the matrix, where each row (and/or column) is equally likely to be sampled. This uniform sampling is quick and easy, but doesn't always yield good sketches; however, there are sophisticated sampling methods that do yield good sketches.
http://xdata-skylark.github.io

Mutual Information Text Explorer
The Mutual Information Text Explorer is a tool that allows interactive exploration of text data and document covariates. See the paper or slides for information. Currently, an experimental system is available.
http://brenocon.com/MiTextExplorer/

Data Science Resources by Jonathan Bower on GitHub
Added in the kit 27-Oct-2014
This repo is intended to provide open source resources to facilitate learning or to point practicing/aspiring data scientists in the right direction. It also exists so that I can keep track of resources that are/were helpful to me and hopefully for you. I aim to cover the full spectrum of data science and to hopefully include topics of data science that aren't either actively covered or easy to find in the open-source world. For instance, I haven't focused on in-depth machine learning theory since that is well covered. If you are looking for ML theory I would look to some of the online courses, books or bootcamps. There is a lot of theory information available online; some of it is linked lower on this page, and more is available in many purchasable books.
Keep in mind that this is a constant work in progress. If you have anything to add, any feedback, or would like to be a contributor - please reach out. If there are any mistakes or typos, be patient with me, but please let me know.
Lastly, I would add that a large portion of data science is exploratory data analysis and properly cleaning your data to implement the tools and theory necessary to solve the problem at hand. For each problem there are many different ways and tools to execute a successful solution - if one method isn't working, re-evaluate, re-work the problem, try another approach and/or reach out to the community for support.
Good luck and I hope this repo is helpful!
https://github.com/jonathan-bower/DataScienceResources

Joseph Misiti's Blog
A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.
Other awesome lists can be found in the awesome-awesomeness list.
https://github.com/josephmisiti/awesome-machine-learning

Michael Waskom GitHub repositories
I'm a Ph.D. student in the Department of Psychology at Stanford University, where I work with Anthony Wagner. I use behavioral, computational, and neuroimaging methods to study cognitive control and decision making in humans. Previously, I spent time in John Gabrieli's lab at MIT investigating whether cognition can be improved through training. I did my undergrad at Amherst College, where I studied philosophy and neuroscience.
Complementing this research, I have developed a set of software libraries for statistical analysis and visualization. These libraries aim to make computationally-based research more reproducible and improve the visual presentation of statistical and neuroimaging results.
https://github.com/mwaskom

Visualizing distributions of data
This notebook demonstrates different approaches to graphically representing distributions of data, specifically focusing on the tools provided by the seaborn package.
http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/plotting_distributions.ipynb

Exploring Seaborn and Pandas based plot types in HoloViews by Philipp John Frederic Rudiger
In this notebook we'll look at interfacing between the composability and ability to generate complex visualizations that HoloViews provides and the great looking plots incorporated in the seaborn library.
Along the way we'll explore how to wrap different types of data in a number of Seaborn View types, including:
- Distribution Views
- Bivariate Views
- TimeSeries Views
Additionally we explore how a Pandas dframe can be wrapped in a general purpose View type, which can either be used to convert the data into standard View types or be visualized directly using a wide array of plotting options, including:
- Regression plots, correlation plots, box plots, autocorrelation plots, scatter matrices, histograms or regular scatter or line plots.
http://philippjfr.com/blog/seabornviews/

"Machine Learning: An Algorithmic Perspective" Code by Stephen Marsland
This webpage contains the code and other supporting material for the textbook "Machine Learning: An Algorithmic Perspective" by Stephen Marsland, published by CRC Press, part of the Taylor and Francis group. The first edition was published in 2009, and a revised and updated second edition is due out towards the end of 2014.
The book is aimed at computer science and engineering undergraduates studying machine learning and artificial intelligence.
The table of contents for the second edition can be found here.
There are lots of Python/NumPy code examples in the book, and the code is available here. Datasets (either the actual data, or links to the appropriate resources) are given at the bottom of the page.
Note that the chapter headings and order below refer to the second edition. However, the titles of the chapters should enable users of the first edition to find the relevant sections. In addition, a zip file of the code for the 1st edition is available here.
All of the code is freely available to use (with appropriate attribution), but comes with no warranty of any kind.
Option 1: Zip file of all code, arranged into chapters
Option 2: Choose what you want from here:
Chapter 2 (Preliminaries):
- Plots a 1D Gaussian (Fig 2.14)
- Plot some 2D Gaussians
Chapter 3 (Neurons, Neural Networks, and Linear Discriminants):
- The Perceptron
- The Linear Regressor
- Another Perceptron (for use with logic.py)
- Demonstration of Perceptron with logic functions
- Demonstration of Linear Regressor with logic functions
- Demonstration of Perceptron with Pima Indian dataset
- Demonstration of Linear Regressor with auto-mpg dataset
- Demonstration of Perceptron with the MNIST dataset
Chapter 4 (The Multi-Layer Perceptron):
- The Multi-Layer Perceptron
- Demonstration of the MLP on logic functions
- Demonstration of the MLP for classification on the Iris dataset
- Demonstration of the MLP for regression on data from a sine wave
- Demonstration of the MLP for time series on the Palmerston North Ozone dataset
- Demonstration of MLP with the MNIST dataset
Chapter 5 (Radial Basis Functions and Splines):
- The Radial Basis Function
- Linear Least Squares Fitting
- Demonstration of the RBF on the Iris dataset
Chapter 6 (Dimensionality Reduction):
- Linear Discriminant Analysis
- Principal Components Analysis
- Factor Analysis
- Kernel Principal Components Analysis
- Locally Linear Embedding
- Isomap
- Demonstration of PCA
- Demonstration of kernel PCA on a set of circular data
- Demonstration of the algorithms on the Iris dataset
Chapter 7 (Probabilistic Learning):
- The Gaussian Mixture Model
- The k-Nearest Neighbour Algorithm
- The k-Nearest Neighbour Smoother
- The kd-Tree Algorithm
Chapter 8 (Support Vector Machines):
- The Support Vector Machine (needs cvxopt)
- Demonstration of the SVM for classification on the Iris dataset
- Demonstration of the SVM for the variant of XOR in Figs 8.7 and 8.8
Chapter 9 (Optimisation and Search):
- Steepest Descent
- Newton's method
- Levenberg-Marquardt
- Conjugate Gradients
- The version of the MLP algorithm trained using conjugate gradients
- Demonstration of the MLP algorithm trained using conjugate gradients on the Iris dataset
- Demonstration of Levenberg-Marquardt on a least-squares fitting problem
- Demonstration of four solution methods for the Travelling Salesman Problem
Chapter 10 (Evolutionary Learning):
- The Genetic Algorithm
- A Runner for the GA
- Population-Based Incremental Learning
- A knapsack problem fitness function
- The four peaks fitness function
- The onemax fitness function
- Exhaustive search algorithm to solve the knapsack problem
- A greedy algorithm to solve the knapsack problem
Chapter 11 (Reinforcement Learning):
- The SARSA algorithm
- The TD(0) algorithm
- Demonstration of the SARSA algorithm on the Cliff problem
- Demonstration of the TD(0) algorithm on the Cliff problem
Chapter 12 (Learning with Trees):
- The decision tree
- Demonstration of the decision tree on the Party dataset
- The Party dataset
Chapter 13 (Decision by Committee: Ensemble Learning):
- The Boosting algorithm
- The Bagging algorithm
- A decision tree with weights
- The random forest algorithm
- Demonstration of bagging, stumping and random forests on the Party dataset
- Demonstration of bagging on the Car Safety dataset
- Demonstration of bagging and random forests on the mushroom dataset
Chapter 14 (Unsupervised Learning):
- The k-Means Algorithm
- The k-Means Neural Network
- The Self-Organising Map
- A Simple 2D Demonstration of the SOM
- Demonstration of k-Means and the SOM on the Iris dataset
- Demonstration of SOM and Perceptron together on the Iris dataset
- More demonstrations of the SOM
Chapter 15 (Markov Chain Monte Carlo Methods):
- The Linear Congruential Pseudo-Random Number Generator
- The Box-Muller method of constructing Gaussian-distributed pseudo-random numbers
- The Rejection Sampling Algorithm
- The Importance Sampling Algorithm
- The Sampling-Importance-Resampling Algorithm
- The Metropolis-Hastings Algorithm
- The Gibbs Sampler
Chapter 16 (Graphical Models):
- The Gibbs Sampler for the Exam Panic dataset
- The Hidden Markov Model
- A simple 1D Kalman Filter
- A complete Kalman Filter
- The Extended Kalman Filter
- The Basic Particle Filter
- A Tracking Particle Filter
- The Markov Random Field for Image Denoising
- A demonstration of finding paths in graphs
- An image for denoising
Chapter 17 (Symmetric Weights and Deep Belief Networks):
- A Hopfield network
- The Restricted Boltzmann Machine
- The Deep Belief Network Algorithm
Chapter 18 (Gaussian Processes):
- The Gaussian Process for Regression Algorithm
- The Gaussian Process for Classification Algorithm
- Demo of the Gaussian Process for Classification
- Plots of the Weibull and Gaussian distributions (Fig 18.1)
- Plots of GP samples (Fig 18.2)
- Very simple data for the Gaussian Process Regression demo
http://stephenmonika.net

Sebastian Raschka GitHub Repository & Blog (Great Resources, everything you need is there!)
https://github.com/rasbt
http://sebastianraschka.com

Matlab and R code for original figures in book "Analysis of Neural Data" by Robert E. Kass, Uri T. Eden, and Emery N. Brown
1 - Introduction
2 - Exploring Data
3 - Probability and Random Variables
4 - Random Vectors
5 - Important Probability Distributions
6 - Sequences of Random Variables
7 - Estimation and Uncertainty
8 - Estimation in Theory and in Practice
9 - Propagation of Uncertainty and the Bootstrap
10 - Models, Hypotheses, and Statistical Significance
11 - General Methods for Testing Hypotheses
12 - Linear Regression
13 - Analysis of Variance
14 - Generalized Linear and Nonlinear Regression
15 - Nonparametric Regression
16 - Bayesian Methods
17 - Multivariate Analysis
18 - Time Series
19 - Point Processes
http://www.stat.cmu.edu/~kass/KEB/index.html#1

Open Source Hong Kong
Open Source Hong Kong (OSHK) is an open source organization in Hong Kong that aims to advocate open source and technology development.
http://opensource.hk/en/event

Lamda Group, Nanjing University Open Source Software
http://lamda.nju.edu.cn/Data.ashx#code

Miscellaneous

Overleaf (ex WriteLaTeX)
About Overleaf
Overleaf is a collaborative writing and publishing system that makes the whole process of producing academic papers much quicker for both authors and publishers.
Overleaf is a free service that lets you create, edit and share your scientific ideas easily online using LaTeX, a comprehensive and powerful tool for scientific writing.
Overleaf has grown rapidly since its launch in 2011, and today there are over 150,000 users from over 180 countries worldwide who've created over 2 million projects using the service.
Writelatex Limited, the company behind Overleaf, was founded by John Hammersley and John Lees-Miller, two mathematicians who worked together on the pioneering Ultra PRT Project and who were inspired by their own experiences in academia to create a better solution for collaborative scientific writing.
Overleaf is supported by Digital Science. Digital Science is a technology company serving the needs of scientific research. Their mission is to provide software that makes research simpler, so there’s more time for discovery. Whether at the bench or in a research setting, their range of products helps to simplify workflows and change the way science is done. Digital Science believes passionately that tomorrow's research will be different, and better, than today's. Their portfolio brands include Altmetric, Labguru, Figshare, ReadCube, ÜberResearch, BioRAFT and Symplectic. Digital Science is a business division of Macmillan Science and Education.
https://www.overleaf.com/2070900jhqnyz#/5252162/

Interview of Dr John Lees-Miller by Imperial College London ACM Student Chapter
https://www.youtube.com/watch?v=kYkN0Yv56bI&spfreload=10

LISA Lab GitHub repository, Université de Montréal
https://github.com/lisa-lab
http://www.iro.umontreal.ca/~lisa/

Big Data/Cloud Computing – English

Apache SPARK

Apache Spark Machine Learning Library
MLlib is a Spark implementation of some common machine learning (ML) functionality, as well as associated tests and data generators. MLlib currently supports
four common types of machine learning problem settings, namely, binary classification, regression, clustering and collaborative filtering, as well as an underlying gradient descent optimization primitive.
http://spark.apache.org/docs/0.9.1/mllib-guide.html

2013 Spark Summit exercises
Welcome to the Spark Summit hands-on exercises. These exercises are adapted from similar exercises that were prepared for and run at AMP Camp Big Data Bootcamps. They were written by volunteer graduate students and postdocs in the UC Berkeley AMPLab. Many of those same graduate students are also volunteers here on the Spark Summit Training day team.
The exercises we cover today will have you working directly with the Spark specific components of the AMPLab’s open-source software stack, called the Berkeley Data Analytics Stack (BDAS).
http://spark-summit.org/2013/exercises/index.html

2014 Spark Summit Training Course
Prerequisites:
• Laptop with WiFi capabilities
• Java 6 or 7

TRACK A: Introduction to Apache Spark Workshop
INTRO EXERCISES
The Introduction to Apache Spark workshop is for users to learn the core Spark APIs. This session features hands-on technical exercises to get developers up to speed in using Spark for data exploration, analysis, and building big data applications. The integrated lecture and lab format covers the following topics:
• Overview of Big Data and Spark
• Installing Spark Locally
• Using Spark’s Core APIs in Scala, Java, & Python
• Building Spark Applications
• Deploying on a Big Data Cluster
• Building Applications for Multiple Platforms

TRACK B: Advanced Apache Spark Workshop
ADVANCED EXERCISES
The Advanced Apache Spark Workshop will cover advanced topics on architecture, tuning, and each of Spark’s high-level libraries (including the latest features). Attendees will have the opportunity after the lunch break to work through labs on each of the libraries.
Some familiarity with Spark or MapReduce is expected, as this workshop will not cover basic Spark programming. Topics covered include:
• Advanced Spark Internals and Tuning – Reynold Xin – SLIDES
• Spark SQL – Michael Armbrust – SLIDES
• Spark Streaming – Tathagata Das – SLIDES
• MLlib – Ameet Talwalkar – SLIDES
• GraphX – Ankur Dave – SLIDES

Apache Spark Summit Videos
Videos related to the Apache Spark cluster computing engine.
https://www.youtube.com/user/TheApacheSpark/playlists

Databricks Videos
Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache Spark. We’ve been working for the past six years on cutting-edge systems to extract value from Big Data. We believe that Big Data is a huge opportunity that is still largely untapped, and we’re working to revolutionize what you can do with it.
Open Source Commitment
Apache Spark is 100% open source, and at Databricks we are fully committed to maintaining this model. We believe that no computing platform will win in the Big Data space unless it is fully open source. Spark has one of the largest open source communities in Big Data, with over 200 contributors from 50+ organizations. Databricks works closely with the community to maintain this momentum.
https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA/videos

SF Scala & SF Bay Area Machine Learning, Joseph Bradley: Decision Trees on Spark
Joseph talks about Machine Learning with Spark, focusing on the decision tree and (upcoming) random forest implementations in MLlib. Spark has been established as a natural platform for iterative ML algorithms, and trees provide a great example. This talk aims both to give insight into the underlying implementation and to highlight best practices for using MLlib.
http://functional.tv/post/98342564544/sfscala-sfbaml-joseph-bradley-decision-trees-on-spark
Slides
https://speakerdeck.com/jkbradley/mllib-decision-trees-at-sf-scala-baml-meetup

Apache MAHOUT

Apache Mahout ML library
The Apache Mahout™ project's goal is to build a scalable machine learning library.
Currently Mahout supports mainly three use cases:
Recommendation mining takes users' behavior and from that tries to find items users might like.
Clustering takes e.g. text documents and groups them into groups of topically related documents.
Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
https://mahout.apache.org

Apache Mahout on Javaworld

Enjoy machine learning with Mahout on Hadoop, 2014
Mahout brings the power of scalable processing to Hadoop's huge data sets
http://www.javaworld.com/article/2241046/big-data/enjoy-machine-learning-with-mahout-on-hadoop.html

Know this right now about Hadoop, 2014
From core elements like HDFS and YARN to ancillary tools like Zookeeper, Flume, and Sqoop, here's your cheat sheet and cartography of the ever expanding Hadoop ecosystem.
http://www.javaworld.com/article/2158789/data-storage/know-this-right-now-about-hadoop.html

MapReduce programming with Apache Hadoop, 2008
Process massive data sets in parallel on large clusters
http://www.javaworld.com/article/2077907/open-source-tools/mapreduce-programming-with-apache-hadoop.html

Hadoop Users Group UK
Recordings from meetups of the UK Hadoop Users Group. These will be a combination of tech talks, panel sessions and other events that we run.
https://www.youtube.com/channel/UCjo2p6jTA0joX8HoUeHFcDg?spfreload=10

Deeplearning4j
Deeplearning4j is the first commercial-grade deep learning library written in Java. It is meant to be used in business environments, rather than as a research tool for extensive data exploration. Deeplearning4j is most helpful in solving distinct problems, like identifying faces, voices, spam or e-commerce fraud.
Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration. By following its conventions, you get an infinitely scalable deep-learning architecture. The framework has a domain-specific language (DSL) for neural networks, to turn their multiple knobs.
Deeplearning4j includes a distributed deep-learning framework and a normal deep-learning framework; i.e. it runs on a single thread as well. Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce.
The distributed framework is made for data input and neural net training at scale, and its output should be highly accurate predictive models.
By following the links at the bottom of each page, you will learn to set up, and train with sample data, several types of deep-learning networks. These include single- and multithread networks, Restricted Boltzmann machines, deep-belief networks and Stacked Denoising Autoencoders.
For a quick introduction to neural nets, please see our overview.
http://deeplearning4j.org/

Udacity opencourseware "Intro to Hadoop and MapReduce"
Course Summary
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.
Why Take This Course?
• How Hadoop fits into the world (recognize the problems it solves)
• Understand the concepts of HDFS and MapReduce (find out how it solves the problems)
• Write MapReduce programs (see how we solve the problems)
• Practice solving problems on your own
https://www.udacity.com/course/ud617

Storm Apache
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
http://storm.incubator.apache.org
http://storm.incubator.apache.org/documentation/Tutorial.html

Scaling Apache Storm by Taylor Goetz
http://www.slideshare.net/ptgoetz
https://github.com/ptgoetz

Michael Vogiatzis Blog
How to spot first stories on Twitter using Storm
As a first blog post, I decided to describe a way to detect first stories (a.k.a. new events) on Twitter as they happen. This work is part of the Thesis I wrote last year for my MSc in Computer Science at the University of Edinburgh. You can find the document here.
http://micvog.com/2013/09/08/storm-first-story-detection/

Elasticsearch
Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Architected from the ground up for use in distributed environments where reliability and scalability are must haves, Elasticsearch gives you the ability to move easily beyond simple full-text search. Through its robust set of APIs and query DSLs, plus clients for the most popular programming languages, Elasticsearch delivers on the near limitless promises of search technology.
http://www.elasticsearch.org
Prediction IO
BUILD SMARTER SOFTWARE with Machine Learning
PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery.
http://prediction.io
https://hacks.mozilla.org/2014/04/introducing-predictionio/
http://www.youtube.com/channel/UCN0jVSCIEh7eeuWXIuo316g

Container Cluster Manager
Kubernetes builds on top of Docker to construct a clustered container scheduling service. The goals of the project are to enable users to ask a Kubernetes cluster to run a set of containers. The system will automatically pick a worker node to run those containers on.
As container based applications and systems get larger, some tools are provided to facilitate sanity. This includes ways for containers to find and communicate with each other and ways to work with and manage sets of containers that do similar work.
When looking at the architecture of the system, we'll break it down to services that run on the worker node and services that play a "master" role.
https://github.com/GoogleCloudPlatform/kubernetes?utm_source

Domino Data Labs
Domino is a platform for modern data scientists using Python, R, Matlab, and more. Use our cloud-hosted infrastructure to securely run your code on powerful hardware with a single command, without any changes to your code. If you have your own infrastructure, our Enterprise offering provides powerful, easy-to-use cluster management functionality behind your firewall.
Special offer for The Machine Learning Salon's readers: Machine Learning Salon readers can get $50 worth of compute credits when they sign up for Domino. Domino lets you run your analyses on powerful cloud hardware in one step, without any setup or changes to your code. Sign up here, or email support@dominoup.zendesk.com and tell them you are a Machine Learning Salon reader.
http://www.dominoup.com

Data Science Central
Data Science Central is the industry's online resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends and industry job opportunities.
http://www.datasciencecentral.com

Amazon Web Services Videos
https://www.youtube.com/user/AmazonWebServices/playlists

Google Cloud Computing Videos
https://developers.google.com/cloud/videos

VLAB: Deep Learning: Intelligence from Big Data, Stanford Graduate School of Business
Added the 22-Nov-2014
http://www.youtube.com/watch?v=czLI3oLDe8M&spfreload=10

Machine Learning and Big Data in Cyber Security Eyal Kolman Technion Lecture
Added the 22-Nov-2014
http://www.youtube.com/watch?v=G2BydTwrrJk&spfreload=10

Chaire Machine Learning Big Data, Telecom Paris Tech (Videos in French)
Télécom ParisTech held the first meeting of the Machine Learning for Big Data research chair on 26 November 2014, with its partners Fondation Télécom, Criteo, PSA Peugeot Citroën, and Safran.
http://www.dailymotion.com/video/x2cti71_chaire-ml-big-data-premieres-rencontres_school
https://www.youtube.com/user/TelecomParisTech1/search?query=big+data

An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data volumes and stalling processor speeds require more and more applications to scale out to distributed systems. Today, a myriad data sources, from the Internet to business operations to scientific instruments, produce large and valuable data streams.
However, the processing capabilities of single machines have not kept up with the size of data, making it harder and harder to put to use. As a result, a growing number of organizations (not just web companies, but traditional enterprises and research labs) need to scale out their most important computations to clusters of hundreds of machines.
At the same time, the speed and sophistication required of data processing have grown. In addition to simple queries, complex algorithms like machine learning and graph analysis are becoming common in many domains. And in addition to batch processing, streaming analysis of new real-time data sources is required to let organizations take timely action. Future computing platforms will need to not only scale out traditional workloads, but support these new applications as well.
This dissertation proposes an architecture for cluster computing systems that can tackle emerging data processing workloads while coping with larger and larger scales. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping the scalability and fault tolerance of previous systems. And whereas most deployed systems only support simple one-pass computations (e.g., aggregation or SQL queries), ours also extends to the multi-pass algorithms required for more complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing, or SQL and complex analytics.
We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to efficiently capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using both synthetic benchmarks and real user applications. Spark matches or exceeds the performance of specialized systems in many application domains, while offering stronger fault tolerance guarantees and allowing these workloads to be combined. We explore the generality of RDDs from both a theoretical modeling perspective and a practical perspective to see why this extension can capture a wide range of previously disparate workloads.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf

Big Data Requires Big Visions For Big Change | Martin Hilbert | TEDxUCL
At the University of California, Davis, Martin thinks about the fundamental theories of how digitization affects society. During his 15 years at the United Nations Secretariat, Martin assisted governments to take advantage of the digital revolution. When the ‘big data’ age arrived, his research was the first to quantify the historical growth of how much technologically mediated information there actually is in the world. He is convinced that ‘big data’ is a huge opportunity for making the world a better place. After joining the faculty of the University of California, Davis, he had more time to think more deeply about the theoretical underpinning and fundamental limitations of the ‘big data’ revolution. When TEDxUCL asked him if there is a limit to the power of data, he answered with the fundamental limitation to all empirical science. The fundamental limit of ‘big data’ has to do with social change and how we envision the future. Luckily, the digital age also provides solutions for fine-tuning our future visions.
Martin holds doctorates in Economics and Social Sciences, and in Communication, and has provided hands-on technical assistance to Presidents, government experts, legislators, diplomats, NGOs, and companies in over 20 countries.
More: http://www.martinhilbert.net
https://www.youtube.com/watch?v=UXef6yfJZAI&spfreload=10

Ethical Quandary in the Age of Big Data | Justin Grace | TEDxUCL
Published on 13 Jan 2015
This talk was given at a local TEDx event, produced independently of the TED Conferences. Data is now everywhere. The ‘internet era’ has now passed and we are entering the era of data. Data use and misuse can lead to either powerful positive change or disaster. Here I discuss the questions we should ask about data and present three case studies where organisations have generated controversy from their data practices. I finish by touching on what we can do to take ownership of our data.
Justin is a freelance data scientist who has worked in academia, technology, healthcare and most recently digital media with the Guardian. He is passionate about all things data and understanding how its use and misuse shapes the world we live in and how this affects our relationships with organisations and each other.
https://www.youtube.com/watch?v=mVZ78kdduyY&spfreload=10

Big Data & Dangerous Ideas | Daniel Hulme | TEDxUCL
Published on 15 Jan 2015
This talk was given at a local TEDx event, produced independently of the TED Conferences.
This is an illuminating and animated talk about how Data and Artificial Intelligence affect our everyday lives. It provides a framework for anyone to understand the data-driven decision-making process, and raises critical moral, ethical and legal questions that society needs to address to ensure that our rights are kept safe and that we safeguard our very own existence.
Daniel is the Founder and CEO of Satalia (NPComplete Ltd), a spin-out of UCL that provides a unique algorithmic technology and professional services to solve industry's data-driven decision problems. He is passionate about emerging technology and regularly speaks at events with interests in Algorithms, Optimisation, Analytics, Big Data and the Future Internet. Daniel has been awarded a Masters in Computer Science with Machine Learning and a Doctorate in Computational Complexity from UCL. He is the Director of the UCL Business Analytics MSc, and has Senior Researcher and Lecturing positions in Computer Science and Management Science at UCL and Pearson College. He is a Visiting Fellow of the Big Innovation Centre, and has advisory and executive positions across world-wide companies in the area of Education, Analytics, Big Data, Data-driven Decision Making and Open-Innovation. He holds an international Kauffman Global Entrepreneur Scholarship and actively promotes entrepreneurship and technology innovation across the globe.
https://www.youtube.com/watch?v=tLQoncvCKxs&spfreload=10
http://www0.cs.ucl.ac.uk/staff/D.Hulme/

List of good free Programming and Data Resources, BITBOOTCAMP
We are a group of data enthusiasts with years of experience working at leading financial companies on Wall Street. In Jan 2014, we started Bit Bootcamp: an intensive and immersive big data boot camp to spread the knowledge and to address the shortage of good talent in the industry.
The motivation for the bootcamp comes from our own difficulties faced while we were trying to hire new talent. No matter how much money we threw at the problem, we could not find people with the right skills. Then we figured we might as well train them ourselves.
http://bitboot.camp/resources.html

Predictive Modeling Competitions – English

Angry Birds AI Competition
Here you will find all the information about upcoming and previous Angry Birds AI Competitions. The task of this competition is to develop a computer program that can successfully play Angry Birds. The long term goal is to build an intelligent Angry Birds playing agent that can play new levels better than the best human players.
http://www.aibirds.org

LinkedIn Economic Graph Challenge, Deadline: 15-12-2014, $25,000 research award
Added in the kit 25-oct-2014
Eligibility
The LinkedIn Economic Graph Challenge is open to all U.S. residents (including citizens, permanent residents and visa holders) ages 18 and up. Team entries are allowed, but entries on behalf of a company are not allowed. Teams may have up to five individuals. Only one proposal (either team or individual) per person. Sorry, current LinkedIn employees, contractors, affiliates or interns are not eligible to enter.
Selection
Proposals are due by midnight Pacific Time on December 15, 2014. Each entry must be an idea or process designed to create positive economic opportunity and impact for members of the global workforce, so dream big. Proposals will be evaluated on a combination of three factors, all equally weighted:
Novelty: Takes into account the thoughtfulness and originality of the entry, including its unique approach to taking advantage of data from the Economic Graph.
Impact: Considers the potential benefits to the region, country and the world, as well as the extensibility of the proposal.
Feasibility: This criterion will weigh the practicality of the submission, measuring the likelihood it can be researched and implemented within a reasonable time period and the types of data from LinkedIn that will be necessary for the proposed research.
A diverse panel of judges will evaluate and select winning proposals.
Research Award Recipients
LinkedIn will select up to three proposals as winners of the LinkedIn Economic Graph Challenge. Selected winners will be notified in early 2015. Each winning submission will receive:
• A one-time $25,000 (USD) research award.
• Round-trip travel and accommodations to LinkedIn headquarters in Mountain View, CA to participate in the LinkedIn Economic Challenge Research Reception (early 2015) and Final Presentation (Fall 2015).
• The potential to receive research resources to execute the proposal, including a LinkedIn employee collaborator, access to select data from LinkedIn, and equipment for use during the six-month research period.
Research award recipients will have six months to conduct their research, and will return to Mountain View, CA, for a final presentation in Fall 2015. Research award recipients must sign agreements covering intellectual property and non-disclosure of information, and may not publish results without written consent from LinkedIn Corporation.
http://economicgraphchallenge.linkedin.com/details/

ChaLearn
Added in the kit before 24-Oct-2014
Mission: Machine Learning is the science of building hardware or software that can achieve tasks by learning from examples. The examples often come as {input, output} pairs. Given new inputs a trained machine can make predictions of the unknown output.
Examples of machine learning tasks include:
• automatic reading of handwriting
• assisted medical diagnosis
• automatic text classification (classification of web pages; spam filtering)
• financial predictions
We organize challenges to stimulate research in this field. The web sites of past challenges remain open for post-challenge submission as ever-going benchmarks.
ChaLearn is a tax-exempt organization under section 501(c)(3) of the US IRS code. DLN: 17053090370022.
http://www.chalearn.org

ChaLearn Automatic Machine Learning Challenge (AutoML)
https://www.codalab.org/competitions/2321

IMAGENET Large Scale Visual Recognition Challenge 2014 (closed)
Added in the kit 30-Oct-2014
Introduction
This challenge evaluates algorithms for object detection and image classification at large scale. This year there will be two competitions:
• A PASCAL-style detection challenge on fully labeled data for 200 categories of objects, and
• An image classification plus object localization challenge with 1000 categories.
NEW: This year all participants are encouraged to submit object localization results; in past challenges, submissions to classification and classification with localization tasks were accepted separately.
One high level motivation is to allow researchers to compare progress in detection across a wider variety of objects -- taking advantage of the quite expensive labeling effort. Another motivation is to measure the progress of computer vision for large scale image indexing for retrieval and annotation.
History: ILSVRC 2013, ILSVRC 2012, ILSVRC 2011, ILSVRC 2010
http://image-net.org/challenges/LSVRC/2014/

Kaggle
Added in the kit before 24-Oct-2014
Kaggle is the world's largest community of data scientists.
They compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions.
http://www.kaggle.com/competitions

Kaggle Competition Past Solutions
Added in the kit before 24-Oct-2014
We learn more from code, and from great code. Not necessarily always the 1st ranking solution, because we also learn what separates a stellar solution from a merely good one. I will post solutions I came upon so we can all learn to become better!
I collected the following source code and interesting discussions from the Kaggle-held competitions for learning purposes. Not all competitions are listed because I am only manually collecting them; some competitions are also not listed because no one shared a solution. I will add more as time goes by. Thank you.
http://www.chioka.in/kaggle-competition-solutions/

Kaggle Connectomics Winning Solution Research Article
Added in the kit before 24-oct-2014
Simple connectome inference from partial correlation statistics in calcium imaging
http://arxiv.org/abs/1406.7865

Solution to the Galaxy Zoo Challenge
Added in the kit before 24-oct-2014
http://benanne.github.io/2014/04/05/galaxy-zoo.html
https://github.com/benanne/kaggle-galaxies

Winning 2 Kaggle in class competitions on spam
Added in the kit before 24-oct-2014
http://mlwave.com/winning-2-kaggle-in-class-competitions-on-spam/

Matlab Benchmark for Packing Santa’s Sleigh translated in Python
Added in the kit before 24-oct-2014
http://beatingthebenchmark.blogspot.co.uk/search?updated-min=2013-01-01T00:00:00-08:00&updated-max=2014-01-01T00:00:00-08:00&max-results=4

Machine learning best practices we've learned from hundreds of competitions - Ben Hamner (Kaggle)
Ben Hamner is Chief Scientist at Kaggle, leading its data science and development teams. He is the principal architect of many of Kaggle's most advanced machine learning projects including current work in Eagle Ford and GE's flight arrival prediction and optimization modeling.
https://www.youtube.com/watch?v=9Zag7uhjdYo

TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms)
Added in the kit before 24-oct-2014
http://tedxsf.org/videos/#tedxsf-connected-reality

CrowdANALYTIX
Added in the kit before 24-oct-2014
https://crowdanalytix.com/jq/solver.html

Challenges for governmental applications
Added in the kit before 24-oct-2014
https://challenge.gov

InnoCentive Challenge Center
Added in the kit before 24-oct-2014
https://www.innocentive.com/ar/challenge/browse

TunedIT
Added in the kit before 24-oct-2014
http://tunedit.org

Ants, AI Challenge, sponsored by Google, 2011
Added in the kit before 24-oct-2014
The AI Challenge is all about creating artificial intelligence, whether you are a beginning programmer or an expert. Using one of the easy-to-use starter kits, you will create a computer program (in any language) that controls a colony of ants which fight against other colonies for domination.
http://ants.aichallenge.org

International Collegiate Programming Contest
Added in the kit before 24-oct-2014
The ACM International Collegiate Programming Contest (ICPC) is the premier global programming competition conducted by and for the world’s universities. The competition operates under the auspices of ACM, is sponsored by IBM, and is headquartered at Baylor University.
For nearly four decades, the ICPC has grown to be a game-changing global competitive educational program that has raised aspirations and performance of generations of the world’s problem solvers in the computing sciences and engineering.
http://icpc.baylor.edu/welcome.icpc

Dream challenges
Added in the kit before 24-oct-2014
The Dialogue on Reverse Engineering Assessment and Methods (DREAM) project is an initiative to advance the field of systems biology through the organization of Challenges to foster the development of predictive models that allow scientists to better understand human disease.
Challenges engage broad and diverse communities of scientists to competitively solve a specific problem in a given time period. The concept fosters collaboration between scientists through shared data and approaches. DREAM has developed the “Challenge” concept by launching 27 successful challenges over the past seven years.
Sage Bionetworks and DREAM merged in early 2013 in order to develop Challenges that engage broader participation of the research community in open science projects hosted on Synapse, and that provide a meaningful impact to both discovery and clinical research.
By presenting the research community with well-formulated questions that usually involve complex data, we effectively enable the sharing and improvement of predictive models, accelerating many-fold the transformation of this data into useful scientific knowledge. Our ultimate goal is to foster collaborations of like-minded researchers that together will find the solution for vexing problems that matter most to citizens and patients.
https://www.synapse.org/#!Wiki:syn1929437/ENTITY

Texata
Added in the kit before 24-oct-2014
Welcome to the Official 2014 TEXATA Big Data Analytics World Championships.
This global event is a fun, innovative and challenging competition for students and professionals to develop and test their Big Data Analytics skills against their friends, colleagues and top data experts from around the world. TEXATA 2014 is a World Championship Event independently organized and administered by the Professional Services Champions League (PSCL).
http://www.texata.com

Cisco Internet of Things Innovation Grand Challenge
Added in the kit before 24-oct-2014
The focus of the Internet of Things (IoT) Innovation Grand Challenge is to spearhead an industry-wide initiative to accelerate the adoption of breakthrough technologies and products that will contribute to the growth and evolution of the Internet of Things.
This global open competition aims to recognize, promote and reward innovators, entrepreneurs and early-stage startup businesses that can help us transform businesses and industries by re-inventing business processes, operational efficiencies and customer service innovations.
We are seeking submissions from early stage businesses and teams that have technology-based prototypes and proofs of concept (PoC) in development.
https://iotchallenge.cisco.spigit.com/Page/Home

Predictive Modeling Competitions – Spanish
Coming soon …

Predictive Modeling Competitions – German
Coming soon …

Predictive Modeling Competitions – Italian
Coming soon …

Predictive Modeling Competitions – French

RATP OpenDataLab results
http://data.ratp.fr/fr/actualites.html
Coming soon …

Predictive Modeling Competitions – Russian

Competition Avito.ru-2014: Recognition of contact information in images
A contest on recognizing contact information in Avito.ru pictures. It addresses applied problems from the field of image analysis and is held with the informational support of the 10th International Conference "Intellectualization of Information Processing 2014" (IOI 2014), Crete, Greece, 4-11 October 2014.
The competition is organized by the company Avito.ru and its partner Forecsys.
Registered MachineLearning.ru users can put questions to the organizers on the competition's discussion page, or send them by e-mail to competition.avito.2014@forecsys.ru with "Question" in the subject line.
Information about the organizer of the contest, its rules, the number of prizes, and the date, place and procedure for receiving them can be found here.
Preliminary rating of participants.
Key dates of the competition
• October 1, 2014: start of the contest
• until 23:59 November 4: registration of participants
• until 23:59 November 13: training and collection of the participants' algorithms
• November 14: release of control sample C and of the answers for sample B
• until 23:59 November 18: collection of the algorithms' results on control sample C
• November 19 - December 10: determination of the winners and check of the reproducibility of results; publication of the winners' presentations on the contest page
http://www.machinelearning.ru/wiki/index.php?title=%D0%9A%D0%BE%D0%BD%D0%BA%D1%83%D1%80%D1%81_Avito.ru-2014:_%D1%80%D0%B0%D1%81%D0%BF%D0%BE%D0%B7%D0%BD%D0%B0%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D0%BD%D0%BE%D0%B9_%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0%B8_%D0%BD%D0%B0_%D0%B8%D0%B7%D0%BE%D0%B1%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F%D1%85

Russian AI Cup - Competition Programming Artificial Intelligence, 2013
An open artificial intelligence programming competition. Try your hand at programming a game strategy! It's simple, clear and fun!
The second Russian AI Cup championship is called CodeTroopers. You have to program the artificial intelligence for a squad of soldiers. The strategies battle each other in the Sandbox and in the championship. You can use any of these programming languages: C++, Java, C#, Python or Pascal. The Sandbox is already open. Good luck!
Novice programmers (students) and professionals alike are invited to take part. No special knowledge is required; basic programming skills are enough.
http://russianaicup.ru/

Predictive Modeling Competitions – Portuguese
Coming soon …
Open Dataset – English

The Text REtrieval Conference (TREC) Datasets
Added in the kit 04-Nov-2014
The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the TIPSTER Text program. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. In particular, the TREC workshop series has the following goals:
• to encourage research in information retrieval based on large test collections;
• to increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;
• to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and
• to increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems.
TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a test set of documents and questions. Participants run their own retrieval systems on the data, and return to NIST a list of the retrieved top-ranked documents. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences.
This evaluation effort has grown in both the number of participating systems and the number of tasks each year. Ninety-three groups representing 22 countries participated in TREC 2003.
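
The pool, judge and evaluate cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the system names, document identifiers, pool depth and relevance judgments are all hypothetical, and precision at k stands in for the fuller set of measures a real evaluation would report.

```python
def build_pool(runs, depth):
    """Return the union of the top-`depth` documents from every run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def precision_at_k(ranking, relevant, k):
    """Fraction of the first k retrieved documents judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical ranked lists returned by two participating systems.
runs = {
    "system_a": ["d1", "d2", "d3", "d4"],
    "system_b": ["d3", "d5", "d1", "d6"],
}

# Only the pooled documents are shown to the assessors.
pool = build_pool(runs, depth=2)

# Hypothetical assessor judgments over the pooled documents.
relevant = {"d1", "d3", "d5"}

scores = {name: precision_at_k(ranking, relevant, k=2)
          for name, ranking in runs.items()}
print(sorted(pool))
print(scores)
```

Pooling keeps the judging workload bounded: assessors only look at documents that at least one system ranked highly, instead of the whole collection.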
The TREC test collections and evaluation software are available to the retrieval research community at large, so organizations can evaluate their own retrieval systems at any time. TREC has successfully met its dual goals of improving the state-of-the-art in information retrieval and of facilitating technology transfer. Retrieval system effectiveness approximately doubled in the first six years of TREC.
TREC has also sponsored the first large-scale evaluations of the retrieval of non-English (Spanish and Chinese) documents, retrieval of recordings of speech, and retrieval across multiple languages. TREC has also introduced evaluations for open-domain question answering and content-based retrieval of digital video. The TREC test collections are large enough so that they realistically model operational settings. Most of today's commercial search engines include technology first developed in TREC.
http://trec.nist.gov/data.html

HDX Humanitarian Data Exchange
Added in the kit 28-oct-2014
What is HDX?
The goal of the Humanitarian Data Exchange (HDX) is to make humanitarian data easy to find and use for analysis. We are working on three elements that will eventually combine into an integrated data platform.
Repository: The HDX repository, where data providers can upload their raw data spreadsheets for others to find and use.
Analytics: HDX analytics, a database of high-value data that can be compared across countries and crises, with tools for analysis and visualisation.
Standards: Standards to help share humanitarian data through the use of a consensus Humanitarian Exchange Language.
https://data.hdx.rwlabs.org/dataset

World Data Bank
Added in the kit before 24-oct-2014
Explore. Create.
Share: Development Data
DataBank is an analysis and visualisation tool that contains collections of time series data on a variety of topics. You can create your own queries; generate tables, charts, and maps; and easily save, embed, and share them.
The World Bank Group has set two goals for the world to achieve by 2030:
• End extreme poverty by decreasing the percentage of people living on less than $1.25 a day to no more than 3%
• Promote shared prosperity by fostering the income growth of the bottom 40% for every country
The World Bank is a vital source of financial and technical assistance to developing countries around the world. We are not a bank in the ordinary sense but a unique partnership to reduce poverty and support development. The World Bank Group comprises five institutions managed by their member countries.
Established in 1944, the World Bank Group is headquartered in Washington, D.C. We have more than 10,000 employees in more than 120 offices worldwide.
http://databank.worldbank.org/data/home.aspx

US Dataset
Added in the kit before 24-oct-2014
The home of the U.S. Government’s open data
Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more.
http://www.data.gov/

US City Open Data Census
Added in the kit before 24-oct-2014
http://us-city.census.okfn.org

Machine Learning repository
Added in the kit before 24-oct-2014
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine.
Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged.
https://archive.ics.uci.edu/ml/datasets.html

IMAGENET
Added in the kit 30-Oct-2014
ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.
Who uses ImageNet?
We envision ImageNet as a useful resource to researchers in the academic world, as well as educators around the world.
Does ImageNet own the images? Can I download the images?
No, ImageNet does not own the copyright of the images. ImageNet only provides thumbnails and URLs of images, in a way similar to what image search engines do. In other words, ImageNet compiles an accurate list of web images for each synset of WordNet. For researchers and educators who wish to use the images for non-commercial research and/or educational purposes, we can provide access through our site under certain conditions and terms.
For details click here
http://www.image-net.org

Stanford Large Network Dataset Collection
Added in the kit before 24-oct-2014
Social networks: online social networks, edges represent interactions between people
Networks with ground-truth communities: ground-truth network communities in social and information networks
Communication networks: email communication networks with edges representing communication
Citation networks: nodes represent papers, edges represent citations
Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
Web graphs: nodes represent webpages and edges are hyperlinks
Amazon networks: nodes represent products and edges link commonly co-purchased products
Internet networks: nodes represent computers and edges communication
Road networks: nodes represent intersections and edges roads connecting the intersections
Autonomous systems: graphs of the internet
Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
Location-based online social networks: social networks with geographic check-ins
Wikipedia networks and metadata: talk, editing and voting data from Wikipedia
Twitter and Memetracker: Memetracker phrases, links and 467 million Tweets
Online communities: data from online communities such as Reddit and Flickr
Online reviews: data from online review systems such as BeerAdvocate and Amazon
Information cascades: ...
SNAP networks are also available from the UF Sparse Matrix collection. Visualizations of SNAP networks by Tim Davis.
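
As a rough illustration of the node and edge structure these collections share, the sketch below reads a plain-text edge list into an adjacency map. It assumes a file layout of one whitespace-separated "source target" pair per line with "#" comment lines; the sample lines are invented for the example and are not taken from any SNAP file, and the function name is hypothetical.

```python
def load_edge_list(lines):
    """Build a directed adjacency map from 'source target' lines.

    Assumed layout: one edge per line, fields separated by whitespace,
    lines starting with '#' treated as comments.
    """
    adjacency = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()[:2]
        adjacency.setdefault(source, set()).add(target)
        # Register the target so nodes without outgoing edges still appear.
        adjacency.setdefault(target, set())
    return adjacency


# Invented sample in the assumed layout.
sample = [
    "# hypothetical sample graph",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t2",
    "1\t2",
]

graph = load_edge_list(sample)
node_count = len(graph)
edge_count = sum(len(targets) for targets in graph.values())
print(node_count, edge_count)
```

For a real file, passing an open file object (for example `load_edge_list(open(path))`, with `path` pointing at a downloaded edge list) works the same way, since the function only iterates over lines.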
http://snap.stanford.edu/data/

Deep Learning datasets
Added in the kit before 24-oct-2014
Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.
This website is intended to host a variety of resources and pointers to information about Deep Learning. In these pages you will find
• a reading list,
• links to software,
• datasets,
• a list of deep learning research groups and labs,
• a list of announcements for deep learning related jobs (job listings),
• as well as tutorials and cool demos.
http://deeplearning.net/datasets/

Open Government Data (OGD) Platform India
Added in the kit before 24-oct-2014
http://data.gov.in

Yahoo Datasets
Added in the kit before 24-oct-2014
We have various types of data available to share. They are categorized into Ratings, Language, Graph, Advertising and Market Data, Computing Systems and an appendix of other relevant data and resources available via the Yahoo! Developer Network.
http://webscope.sandbox.yahoo.com/catalog.php

Windows Azure Marketplace
Added in the kit before 24-oct-2014
One-Stop Shop for Premium Data and Applications
Hundreds of Apps, Thousands of Subscriptions, Trillions of Data Points
https://datamarket.azure.com/browse/data?price=free

Amazon Public Data Sets
Added in the kit before 24-oct-2014
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. Learn more about Public Data Sets on AWS and visit the Public Data Sets forum.
http://aws.amazon.com/datasets/

Wikipedia: Database Download
Added in the kit before 24-oct-2014
Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
http://en.wikipedia.org/wiki/Wikipedia:Database_download

Gutenberg project (Free books available in different format, useful for NLP)
Added in the kit before 24-oct-2014
Project Gutenberg offers 45,541 free ebooks to download. (source the 5th June 2014)
http://www.gutenberg.org/ebooks/search/?sort_order=downloads

Freebase
Added in the kit before 24-oct-2014
Use Freebase data
Freebase data is free to use under an open license. You can:
• Query Freebase using our Search, Topic, or MQL APIs
• Download our weekly data dumps
http://www.freebase.com

Datamob Data
Added in the kit before 24-oct-2014
http://datamob.org/datasets

Reddit Datasets
Added in the kit before 24-oct-2014
http://www.reddit.com/r/datasets/

100+ Interesting Data Sets for Statistics
Added in the kit before 24-oct-2014
Summary: Looking for interesting data sets? Here's a list of more than 100 of the best stuff, from dolphin relationships to political campaign donations to death row prisoners.
http://rs.io/2014/05/29/list-of-data-sets.html

Data portal of the City of Chicago
Added in the kit before 24-oct-2014
https://data.cityofchicago.org/browse?limitTo=datasets&utf8=✓
Remark: you need to copy the following link in your browser, temporary problem
Gold mine where we can find data sets such as the names, salaries and positions of all persons working for the City of Chicago!
https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w

Data portal of the City of Seattle
Added in the kit before 24-oct-2014
https://data.seattle.gov/browse

Data portal of the City of LA
Added in the kit before 24-oct-2014
https://data.lacity.org/browse?limitTo=datasets&utf8=✓
Remark: you need to copy the following link in your browser, temporary problem

California Department of Water Resources
Added in the kit 27-oct-2014
DWR has many programs and data tools to collect and disseminate information on water resources.
All Water Data Topics…
http://www.water.ca.gov/nav/nav.cfm?loc=t&id=106
CALIFORNIA DATA EXCHANGE CENTER (CDEC)
With the cooperation of over 140 other agencies, the CDEC provides real-time, forecast, and historical hydrologic data. This data includes water discharge in rivers, water storage in reservoirs, precipitation accumulation, and water content in snow pack, primarily focused on flood management. However, the data is also helpful for determining general water availability and natural supply trends. More about CDEC
http://cdec.water.ca.gov/
CALIFORNIA IRRIGATION MANAGEMENT INFORMATION SYSTEM (CIMIS)
CIMIS is a network of over 120 automated weather stations in California. CIMIS was developed in 1982 by DWR and the University of California, Davis to assist California's irrigators to manage their water resources efficiently. More about CIMIS
http://wwwcimis.water.ca.gov/cimis/welcome.jsp
WATER DATA LIBRARY
The library provides geographic-based data on water conditions.
More about the Water Data Library
http://www.water.ca.gov/waterdatalibrary/

INTERAGENCY ECOLOGICAL PROGRAM
The Interagency Ecological Program (IEP) provides ecological information and scientific leadership for use in management of the San Francisco Estuary.
More about IEP
http://www.water.ca.gov/iep/

INTEGRATED WATER RESOURCES INFORMATION SYSTEM (IWRIS)
IWRIS is a one stop shop for state-wide water resources information. It integrates multi-disciplinary data to support Integrated Regional Water Management.
More about IWRIS
http://www.water.ca.gov/iwris/
http://www.water.ca.gov/data_home.cfm

Data portal of the City of Dallas
Added in the kit before 24-oct-2014
https://www.dallasopendata.com/browse

Data portal of the City of Austin
Added in the kit before 24-oct-2014
https://data.austintexas.gov

How to produce and use datasets: lessons learned, mlwave
Added in the kit before 24-oct-2014
http://mlwave.com/how-to-produce-and-use-datasets-lessons-learned/

MITx and HarvardX release MOOC datasets and visualization tools
Added in the kit before 24-oct-2014
http://newsoffice.mit.edu/2014/mitx-and-harvardx-release-mooc-datasets-and-vizualization-tools

Finding the perfect house using open data, Justin Palmer’s Blog
Added in the kit before 24-oct-2014
http://dealloc.me/2014/05/24/opendata-house-hunting/

Synapse
Added in the kit before 24-oct-2014
A private or public workspace that allows you to aggregate, describe, and share your research. A tool to improve reproducibility of data intensive science, recording progress as you work with tools such as R and Python. A set of living research projects enabling contribution to large-scale collaborative solutions to scientific problems.
https://www.synapse.org

NYC Taxi Trips Data from 2013
Added in the kit before 24-oct-2014
These data were made publicly available thanks to Chris Whong, who did the heavy lifting. He is also providing links to a bittorrent where the data can be downloaded much faster. Read more about it here.
http://www.andresmh.com/nyctaxitrips/

Sebastian Raschka’s Dataset Collections
Added in the kit before 24-oct-2014
https://github.com/rasbt/pattern_classification/blob/master/resources/dataset_collections.md

Awesome Public Datasets by Xiaming Chen, Shanghai, China
This list of public data sources is collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free; however, some are not.
https://github.com/caesar0301/awesome-public-datasets
I am now a Ph.D. candidate with Prof. Yaohui Jin at Shanghai Jiao Tong University. I received my B.S. (2010) in Optical Information and Science Technology at Xidian University, Xi'an, China. My research interests come from the measurement and analysis of network traffic, especially the renewed models and characteristics of network traffic, with data mining techniques and high performance processing platforms like Network Processors and distributed processing systems like Hadoop/MapReduce or Spark. If you are interested in my articles, research, or projects, you can reach me via email or other partially instant messages like github. Enjoy!
:-)
http://hsiamin.com/pages/about.html

UK Dataset
Added in the kit before 24-oct-2014
Opening up government
http://data.gov.uk/

LONDON DATASTORE - 591 datasets
Added in the kit 24-oct-2014
Welcome to the new look DataStore
Over the last few months we have been busy updating London Datastore to deliver a host of practical new features - improved (geography based) searches, dataset previews and APIs – all of which will make for a much sleeker experience. The technical improvements are there to support our broader aim of kick-starting collaboration so that the value of data in our city reaches its full potential. Have a look around, read the introductory blog and let us know what you think.
http://data.london.gov.uk

Transport For London Open Data, UK
Added in the kit before 24-oct-2014
http://www.tfl.gov.uk/info-for/open-data-users/our-open-data

Gaussian Processes List of Datasets
Added in the kit 04-Nov-2014
Welcome to the web site for theory and applications of Gaussian Processes
Gaussian processes are a powerful non-parametric machine learning technique for constructing comprehensive probabilistic models of real world problems. They can be applied to geostatistics, supervised, unsupervised, reinforcement learning, principal component analysis, system identification and control, rendering music performance, optimization and many other tasks.
People: Geology & Modelling Research Group at Rio Tinto Centre for Mine Automation, ACFR, University of Sydney
http://gaussianprocess.com/datasets.php

The New York Times Linked Open Data (Beta)
Added in the kit 02-Nov-2014
For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, we began to publish this vocabulary as linked open data.
The Data
As of 13 January 2010, The New York Times has published approximately 10,000 subject headings as linked open data under a CC BY license. We provide both RDF documents and human-friendly HTML versions. The table below gives a breakdown of the various tag types and mapping strategies on data.nytimes.com.

Type             Manually Mapped Tags    Automatically Mapped Tags    Total
People           4,978                   0                            4,978
Organizations    1,489                   1,592                        3,081
Locations        1,910                   0                            1,910
Descriptors      498                     0                            498
Total                                                                 10,467

http://data.nytimes.com

Google Public Data Explorer
Added the 4-Nov-2014
The Google Public Data Explorer makes large, public-interest datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your own comparisons, and share your findings. Students, journalists, policy makers and everyone else can play with the tool to create visualizations of public data, link to them, or embed them in their own webpages. Embedded charts and links can update automatically so you’re always sharing the latest available data. The Public Data Explorer launched in March, 2010. See this blog post, which originally announced the product, for more background and historical perspective.
https://www.google.com/publicdata/directory?hl=en_US&dl=en_US#!st=DATASET
Open Dataset - French

Montreal, Portail Donnees Ouvertes (French&English), Canada
Added in the kit before 24-oct-2014
http://donnees.ville.montreal.qc.ca

Insee, France
Added in the kit before 24-oct-2014
http://www.insee.fr/fr/publications-et-services/depliant_webinsee.pdf

RATP Open Data, French Tube in Paris, France
Added in the kit before 24-oct-2014
http://data.ratp.fr/fr/les-donnees.html

L’Open-Data français cartographié
Added in the kit 28-Oct-2014
Here are three maps of the French open data ecosphere. Set on a black background, the three posters (downloadable in A0 format) give a general overview of French open data today. The three maps are based on data provided by Data-Publica, in particular two studies carried out recently by Guillaume Lebourgeois, Pierrick Boitel and Perrine Letellier (the latter two attended my course at UTC last semester). The aim of these maps is to begin a fairly complete "X-ray" of the field, one that can be repeated over time (perhaps every six months) and is tied directly to the data held by Data-Publica. In short, a kind of observatory of French open data, which I am launching through the work of the Atelier de Cartographie.
http://ateliercartographie.wordpress.com/2012/09/23/lopen-data-francais-cartographie/

Open Dataset - China

Lamda Group Data
• Image Data For Multi-Instance Multi-Label Learning
• MDDM Data for multi-label dimensionality reduction.
• Text Data for Multi-Instance Learning
• MILWEB Data for Multi-Instance Learning Based Web Index Recommendation.
• SGBDota Data for the PCES (Positive Concept Expansion with Single snapshot) problem.
• Single Face Dataset Data for Face Recognition with One Training Image per Person.
• Text Data For Multi-Instance Multi-Label Learning
http://lamda.nju.edu.cn/Data.ashx

Data Visualisation

Visualization Lab Gallery, Computer Science Division, University of California, Berkeley
Added the 15-Nov-2014

CS 294-10 Fall '14 Visualization
Instructors: Maneesh Agrawala and Jessica Hullman
Course Wiki

CS 160 Spring '14 User Interface Design
Instructors: Maneesh Agrawala and Bjoern Hartmann
TAs: Brittany Cheng, Steve Rubin, and Eric Xiao
Course Wiki

CS 294-10 Fall '13 Visualization
Instructor: Maneesh Agrawala
Course Wiki

CS 160 Spring '12 User Interface Design
Instructor: Maneesh Agrawala
TAs: Nicholas Kong, Anuj Tewari
Course Wiki

CS 294-69 Fall '11 Image Manipulation and Computational Photography
Instructor: Maneesh Agrawala
TA: Floraine Berthouzoz
Course Wiki

CS 294-10 Spring '11 Visualization
Instructor: Maneesh Agrawala
Course Wiki

CS 184 Fall '10 Computer Graphics
Instructor: Maneesh Agrawala
TAs: Robert Carroll, Fu-Chung Huang
Course Wiki

CS 160 Spring '10 User Interface
Instructors: Bjoern Hartmann, Maneesh Agrawala
TAs: Kenrick Kin, Anuj Tewari
Course Wiki

CS 294-10 Spring '10 Visualization
Instructor: Maneesh Agrawala
Course Wiki

CS 160 Spring '09 User Interfaces
Instructors: Maneesh Agrawala, Jeffrey Nichols
TAs: Nicholas Kong
Course Wiki

CS 294-10 Fall '08 Visualization
Instructor: Maneesh Agrawala
Course Wiki

CS 160 Spring '08 User Interfaces
Instructor: Maneesh Agrawala
TAs: Wesley Willett and Seth Horrigan
Course Wiki

CS 294-10 Fall '07 Visualization
Instructor: Maneesh Agrawala
Course Wiki

CS 160 Fall '06 User Interfaces
Instructor: Maneesh Agrawala
TAs: David Sun and Jerry Yu
Course Wiki

CS 294-10 Spring '06 Visualization
Organizers: Maneesh Agrawala, Jeffrey Heer
Course Wiki

http://vis.berkeley.edu/courses/cs294-10-fa14/wiki/index.php/Visualization_Gallery

Visualization Lab Software, Computer Science Division, University of California, Berkeley
Added the 15-Nov-2014
http://vis.berkeley.edu/software

Visualization Lab Course Wiki, Computer Science Division, University of California, Berkeley
Added the 15-Nov-2014
http://vis.berkeley.edu/courses/

Mike Bostock
Visualizing algorithms
http://bost.ocks.org/mike/

Eyeo Festival
Eyeo assembles an incredible set of creative coders, data designers and artists, and attendees -- expect enthralling talks, unique workshops and interactions with open source instigators and super fascinating practitioners. Join us for an extraordinary festival.
http://eyeofestival.com

MIT Data Collider
A new language for data visualisation
http://datacollider.io

D3 JS Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
http://d3js.org

Shan He, Research Fellow at MIT Senseable City Lab
Shan He is a research fellow at MIT Senseable City Lab.
She is an architect and a computational design specialist. She is currently a student at MIT Department of Architecture pursuing her SMArchS in Design and Computation. At Senseable, her focus is on data visualization, interactive design and web application. Prior to coming to MIT she worked as a product designer for Blu Homes where she worked on developing an online 3-D customization tool with intellectual property. During her time at MIT she has worked as a research assistant for the Clean Energy City Lab at the Advanced Urbanism Center and also for the Mobile Experience Lab at the CMS. Shan holds a B.Arch from Tsinghua University in China and a M.Arch from University of Michigan, Ann Arbor.
http://cargocollective.com/shanhe/About-Shan-He

Gource, software version control visualization
Software projects are displayed by Gource as an animated tree with the root directory of the project at its centre. Directories appear as branches with files as leaves. Developers can be seen working on the tree at the times they contributed to the project.
https://www.youtube.com/watch?v=NjUuAuBcoqs#t=73
https://code.google.com/p/gource/

Logstalgia, website access log visualization
Logstalgia (aka ApachePong) is a website access log visualization tool.
https://code.google.com/p/logstalgia/

Andrew Caudwell's Blog
Andrew Caudwell is a software developer and sometimes computer graphics programmer/artist located in Wellington, New Zealand. He is probably best known through his work as the author of several popular data visualizations:
• Logstalgia (aka Apache Pong) – a visualization of website traffic as a pong-like game
• Gource – a force-directed layout software version control visualization
This blog is a collection of his work, experiments, thoughts and ideas on procedurally generated computer graphics and animation.
http://www.thealphablenders.com

MLDemos, EPFL, Switzerland
MLDemos is an open-source visualization tool for machine learning algorithms, created to help study and understand how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. MLDemos is open-source and free for personal and academic use.
http://mldemos.epfl.ch

The University of Florida Sparse Matrix Collection
We describe the University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications. The Collection is widely used by the numerical linear algebra community for the development and performance evaluation of sparse matrix algorithms. It allows for robust and repeatable experiments: robust because performance results with artificially-generated matrices can be misleading, and repeatable because matrices are curated and made publicly available in many formats. Its matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs). We provide software for accessing and managing the Collection, from MATLAB, Mathematica, Fortran, and C, as well as an online search capability. Graph visualization of the matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task.
http://www.cise.ufl.edu/research/sparse/matrices/

Visualization & Graphics lab, Dept. of CSA and SERC, Indian Institute of Science, Bangalore
This is the video channel of the Visualization & Graphics lab (http://vgl.serc.iisc.ernet.in) which is part of the Dept. of CSA and SERC, Indian Institute of Science, Bangalore. It contains videos created by the members of the lab as part of their research.
https://www.youtube.com/user/vgliisc/videos?spfreload=10

Allison McCann
Allison McCann is a visual journalist and data reporter for FiveThirtyEight.
http://allisontmccann.com

Scott Murray
I write software that generates images and interactive experiences. I’m interested in data visualization, generative art, and designed experiences that encourage people to slow down and reflect. I am an Assistant Professor of Design at USF, a contributor to Processing, and the author of Interactive Data Visualization for the Web. I studied at MassArt’s Dynamic Media Institute (M.F.A. 2010) and Vassar College (A.B. 2001).
Website
The energetic particles on the home page were created with Processing and Processing.js. Site content is managed in a database-free environment with Kirby. Changes are pushed with git to magical boxes at Pagoda Box, where the files are hosted. Site analytics magic performed by Piwik. The site was made mobile-friendly through a combination of CSS3 media queries and JavaScript.
http://alignedleft.com

The Best New York City Maps of 2014
As the growth of the digital universe continues to accelerate, we now have more information at our fingertips than ever before. One product of this new information age is the rapidly-growing art form of digital cartography. Done well, digital maps are as beautiful and striking as any classic art form, and at the same time convey deep insights inexpressible with language.
Due to its population and cultural density, New York City is the subject of some of the best map work being done today. Here are our favorites from 2014.
https://revaluate.com/blog/best-nyc-maps-of-2014/

Gephi: The Open Graph Viz Platform
Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs. Runs on Windows, Linux and Mac OS X. Gephi is open-source and free. What else? ;-)
Gephi is open-source software for network visualization and analysis. It helps data analysts to intuitively reveal patterns and trends, highlight outliers and tell stories with their data. It uses a 3D render engine to display large graphs in real-time and to speed up the exploration. Gephi combines built-in functionalities and flexible architecture to explore, analyze, spatialize, filter, cluster, manipulate, export all types of networks.
Gephi is based on a visualize-and-manipulate paradigm which allows any user to discover networks and data properties. Moreover, it is designed to follow the chain of a case study, from data file to nice printable maps.
Gephi is a free/libre software distributed under the GPL 3 ("GNU General Public License").
Tags: network, network science, infovis, visualization, visual analytics, exploratory data analysis, graph, graph viz, graph theory, complex network, software, open source, science
https://gephi.github.io/features/
http://gephi.github.io

Data Analysis and Visualization Using R by David Robinson
This is a course that combines video, HTML and interactive elements to teach the statistical programming language R.
http://varianceexplained.org/RData/

Books – English

An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data volumes and stalling processor speeds require more and more applications to scale out to distributed systems. Today, a myriad data sources, from the Internet to business operations to scientific instruments, produce large and valuable data streams. However, the processing capabilities of single machines have not kept up with the size of data, making it harder and harder to put to use. As a result, a growing number of organizations (not just web companies, but traditional enterprises and research labs) need to scale out their most important computations to clusters of hundreds of machines.
At the same time, the speed and sophistication required of data processing have grown. In addition to simple queries, complex algorithms like machine learning and graph analysis are becoming common in many domains. And in addition to batch processing, streaming analysis of new real-time data sources is required to let organizations take timely action. Future computing platforms will need to not only scale out traditional workloads, but support these new applications as well.
This dissertation proposes an architecture for cluster computing systems that can tackle emerging data processing workloads while coping with larger and larger scales. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping the scalability and fault tolerance of previous systems. And whereas most deployed systems only support simple one-pass computations (e.g., aggregation or SQL queries), ours also extends to the multi-pass algorithms required for more complex analytics (e.g., iterative algorithms for machine learning).
Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing, or SQL and complex analytics.
We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to efficiently capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using both synthetic benchmarks and real user applications. Spark matches or exceeds the performance of specialized systems in many application domains, while offering stronger fault tolerance guarantees and allowing these workloads to be combined. We explore the generality of RDDs from both a theoretical modeling perspective and a practical perspective to see why this extension can capture a wide range of previously disparate workloads.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf

Deep Learning (Artificial Intelligence), An MIT Press book in preparation, by Yoshua Bengio, Ian Goodfellow and Aaron Courville, 20-Oct-2014
Please help us make this a great book! This draft is still full of typos and can be improved in many ways. Your suggestions are more than welcome. Do not hesitate to contact any of the authors directly by e-mail or Google+ messages: Yoshua, Ian, Aaron.
Table of Contents
• Deep Learning for AI
• Linear Algebra
• Probability and Information Theory
• Numerical Computation
• Machine Learning Basics
• Feedforward Deep Networks
• Structured Probabilistic Models: A Deep Learning Perspective
• Unsupervised and Transfer Learning
• Convolutional Networks
• Sequence Modeling: Recurrent and Recursive Nets
• The Manifold Perspective on Auto-Encoders
• Confronting the Partition Function
• References
http://www.iro.umontreal.ca/~bengioy/dlbook/

Deep Learning Tutorial by LISA Lab, University of Montreal, 2014
Added in the kit 04-Nov-2014
The tutorials presented here will introduce you to some of the most important deep learning algorithms and will also show you how to run them using Theano. Theano is a python library that makes writing deep learning models easy, and gives the option of training them on a GPU.
The algorithm tutorials have some prerequisites. You should know some python, and be familiar with numpy. Since this tutorial is about using Theano, you should read over the Theano basic tutorial first. Once you’ve done that, read through our Getting Started chapter – it introduces the notation, and [downloadable] datasets used in the algorithm tutorials, and the way we do optimization by stochastic gradient descent.
The purely supervised learning algorithms are meant to be read in order:
1. Logistic Regression - using Theano for something simple
2. Multilayer perceptron - introduction to layers
3.
Deep Convolutional Network - a simplified version of LeNet5
The unsupervised and semi-supervised learning algorithms can be read in any order (the auto-encoders can be read independently of the RBM/DBN thread):
• Auto Encoders, Denoising Autoencoders - description of autoencoders
• Stacked Denoising Auto-Encoders - easy steps into unsupervised pre-training for deep nets
• Restricted Boltzmann Machines - single layer generative RBM model
• Deep Belief Networks - unsupervised generative pre-training of stacked RBMs followed by supervised fine-tuning
Building towards including the mcRBM model, we have a new tutorial on sampling from energy models:
• HMC Sampling - hybrid (aka Hamiltonian) Monte-Carlo sampling with scan()
Building towards including the Contractive auto-encoders tutorial, we have the code for now:
• Contractive auto-encoders code - There is some basic doc in the code.
Energy-based recurrent neural network (RNN-RBM):
• Modeling and generating sequences of polyphonic music
http://deeplearning.net/tutorial/deeplearning.pdf

Statistical Inference for Everyone, by Professor Bryan Blais, 2014
This is a new approach to an introductory statistical inference textbook, motivated by probability theory as logic. It is targeted to the typical Statistics 101 college student, and covers the topics typically covered in the first semester of such a course. It is freely available under the Creative Commons License, and includes a software library in Python for making some of the calculations and visualizations easier.
I am a professor of Science and Technology, Bryant University and a research professor at the Institute for Brain and Neural Systems, Brown University.
My interests include
• Theoretical Neuroscience: learning and memory in neural systems; vision; spike-timing dependent plasticity
• Bayesian Inference: frequentist versus Bayesian statistics; Bayesian approaches to learning and memory
• Digital to Analog Computer Control: autonomous experiments; neural networks and robotics
• Global Resources: dynamics of global resources and economics; population growth, Malthusian traps, and energy
http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html

Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014
The book
The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining). The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites. To support deeper explorations, most of the chapters are supplemented with further reading references.
The Mining of Massive Datasets book has been published by Cambridge University Press. You can get 20% discount here. By agreement with the publisher, you can download the book for free from this page. Cambridge University Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3.
We welcome your feedback on the manuscript.
The 2nd edition of the book (v2.1)
The following is the second edition of the book. There are three new chapters, on mining large graphs, dimensionality reduction, and machine learning.
There is also a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice.
Together with each chapter there is also a set of lecture slides that we use for teaching the Stanford CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the material covered in the corresponding chapters.
Download the latest version of the book as a single big PDF file (511 pages, 3 MB).
Note to the users of provided slides: We would be delighted if you found our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org/.
Comments and corrections are most welcome. Please let us know if you are using these materials in your course and we will list and link to your course.
http://infolab.stanford.edu/~ullman/mmds/book.pdf

Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014
Added in the kit 29-oct-2014
The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development.
Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.
http://dmml.asu.edu/smm/book/
Slides
http://dmml.asu.edu/smm/slides/

Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft
Added in the kit 29-oct-2014
The book provides a cohesive presentation of concepts of, and methods for, causal inference. Much of this material is currently scattered across journals in several disciplines or confined to technical articles. We expect that the book will be of interest to anyone interested in causal inference, e.g., epidemiologists, statisticians, psychologists, economists, sociologists, other social scientists… The book is geared towards graduate students and practitioners.
We have divided the book into 3 parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from complex longitudinal data. We will make drafts of selected book sections available on this website. The idea is that interested readers can submit suggestions or criticisms before the book is published. If you wish to share any comments, please email me or visit us on Facebook (user causalinference).
Warning: These documents are drafts. We are constantly revising and correcting errors without documenting the changes.
Please make sure you use the most updated version posted here.
http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/

Slides for High Performance Python tutorial at EuroSciPy2014 by Ian Ozsvald
Added in the kit 29-oct-2014
This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.
https://github.com/ianozsvald/euroscipy2014_highperformancepython
http://ianozsvald.com/2014/08/30/slides-for-high-performance-python-tutorial-at-euroscipy2014-book-signing/

Neural Networks and Deep Learning, 2014
Added in the kit before 24-oct-2014
Neural Networks and Deep Learning is a free online book. The book will teach you about:
• Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
• Deep learning, a powerful set of techniques for learning in neural networks
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts behind neural networks and deep learning. The book is currently an incomplete beta draft. More chapters will be added over the coming months. For now, you can:
• Read Chapter 1, which explains how neural networks can learn to recognize handwriting
• Read Chapter 2, which explains backpropagation, the most important algorithm used to learn in neural networks.
http://neuralnetworksanddeeplearning.com/index.html

Probabilistic Programming and Bayesian Methods for Hackers by Cameron Davidson-Pilon, 2014
Added in the kit before 24-oct-2014
Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course as an introductory book, we can only leave it at that: an introductory book. For the mathematically trained, they may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Bayesian Reasoning and Machine Learning, David Barber, 2012 (online version 02-2014)
Added in the kit before 24-oct-2014
Machine learning methods extract value from vast data sets quickly and with modest resources. They are established tools in a wide range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly. People who know the methods have their choice of rewarding jobs. This hands-on text opens these opportunities to computer science students with modest mathematical backgrounds. It is designed for final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models.
Students learn more than a menu of techniques, they develop analytical and problem-solving skills that equip them for the real world. Numerous examples and exercises, both computer based and theoretical, are included in every chapter. Resources for students and instructors, including a MATLAB toolbox, are available online.
http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=Brml.Online

Past, Present, and Future of Statistical Science by COPSS, 2014
Added in the kit before 24-oct-2014
http://nisla05.niss.org/copss/past-present-future-copss.pdf

Interactive Data Visualization for the Web by Scott Murray, 2013
Read online for free on the publisher website
This online version of Interactive Data Visualization for the Web includes 44 examples that will show you how to best represent your interactive data. For instance, you'll learn how to create a simple force layout with 10 nodes and 12 edges. This step-by-step guide is ideal whether you’re a designer or visual artist with no programming experience, a reporter exploring the new frontier of data journalism, or anyone who wants to visualize and share data. Create and publish your own interactive data visualization projects on the Web, even if you have little or no experience with data visualization or web development. It’s easy and fun with this practical, hands-on introduction. Author Scott Murray teaches you the fundamental concepts and methods of D3, a JavaScript library that lets you express data visually in a web browser. Along the way, you’ll expand your web programming skills, using tools such as HTML and JavaScript.
http://chimera.labs.oreilly.com/books/1230000000345

Essentials of Metaheuristics by Sean Luke, 2013
Added in the kit before 24-oct-2014
Fill the form and download for free
This is an open set of lecture notes on metaheuristics algorithms, intended for undergraduate students, practitioners, programmers, and other non-experts.
It was developed as a series of lecture notes for an undergraduate course I taught at GMU. The chapters are designed to be printable separately if necessary. As it's lecture notes, the topics are short and light on examples and theory. It's best when complementing other texts. With time, I might remedy this.
http://cs.gmu.edu/~sean/book/metaheuristics/

Statistical Model Building, Machine Learning, and the Ah-Ha Moment by Grace Wahba, 2013
Added in the kit before 24-oct-2014
https://archive.org/details/arxiv-1303.5153

An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2013 (first printing)
http://web.stanford.edu/~hastie/local.ftp/Springer/ISLR_print1.pdf

Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves, 2012
Structure of the Book
The chapters are roughly grouped into three parts: background material is presented in Chapters 2–4, Chapters 5 and 6 are primarily experimental, and new methods are introduced in Chapters 7–9. Chapter 2 briefly reviews supervised learning in general, and pattern classification in particular. It also provides a formal definition of sequence labelling, and discusses three classes of sequence labelling task that arise under different relationships between the input and label sequences. Chapter 3 provides background material for feedforward and recurrent neural networks, with emphasis on their application to labelling and classification tasks. It also introduces the sequential Jacobian as a tool for analysing the use of context by RNNs. Chapter 4 describes the LSTM architecture and introduces bidirectional LSTM (BLSTM). Chapter 5 contains an experimental comparison of BLSTM to other neural network architectures applied to framewise phoneme classification.
Chapter 6 investigates the use of LSTM in hidden Markov model-neural network hybrids. Chapter 7 introduces connectionist temporal classification, Chapter 8 covers multidimensional networks, and hierarchical subsampling networks are described in Chapter 9.
http://www.cs.toronto.edu/%7Egraves/preprint.pdf

A Course in Machine Learning by Hal Daume, 2012
Added in the kit before 24-oct-2014
Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need to make sense of data is a potential consumer of machine learning. CIML is a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.
http://ciml.info

Machine Learning in Action, Peter Harrington, 2012
Added in the kit before 24-oct-2014
Chapters 1 and 7 are available for free on the publisher website
http://www.manning.com/pharrington/MLiAchapter1sample.pdf
http://www.manning.com/pharrington/MLiAchapter7sample.pdf

A Programmer's Guide to Data Mining, by Ron Zacharski, 2012
About This Book
Before you is a tool for learning basic data mining techniques. Most data mining textbooks focus on providing a theoretical foundation for data mining, and as a result, may seem notoriously difficult to understand. Don’t get me wrong, the information in those books is extremely important.
However, if you are a programmer interested in learning a bit about data mining you might be interested in a beginner’s hands-on guide as a first step. That’s what this book provides. This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques. This book is available for download for free under a Creative Commons license (see link in footer). You are free to share the book, and remix it. Someday I may offer a paper copy, but the online version will always be free.
http://guidetodatamining.com

Artificial Intelligence, Foundations of Computational Agents by David Poole and Alan Mackworth, 2010
Added in the kit before 24-oct-2014
Artificial Intelligence: Foundations of Computational Agents is a book about the science of artificial intelligence (AI). The view we take is that AI is the study of the design of intelligent computational agents. The book is structured as a textbook but it is designed to be accessible to a wide audience. We wrote this book because we are excited about the emergence of AI as an integrated science. As with any science worth its salt, AI has a coherent, formal theory and a rambunctious experimental wing. Here we balance theory and experiment and show how to link them intimately together. We develop the science of AI together with its engineering applications. We believe the adage, "There is nothing so practical as a good theory." The spirit of our approach is captured by the dictum, "Everything should be made as simple as possible, but not simpler."
We must build the science on solid foundations; we present the foundations, but only sketch, and give some examples of, the complexity required to build useful intelligent systems. Although the resulting systems will be complex, the foundations and the building blocks should be simple.
http://artint.info/html/ArtInt.html

The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009
Added in the kit before 24-oct-2014
During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization and spectral clustering.
There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
http://statweb.stanford.edu/~tibs/ElemStatLearn/

Learning Deep Architectures for AI by Yoshua Bengio, 2009
Added in the kit before 24-oct-2014
Abstract
Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf

An Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2009
Added in the kit before 24-oct-2014
This book is the result of a series of courses we have taught at Stanford University and at the University of Stuttgart, in a range of durations including a single quarter, one semester and two quarters. These courses were aimed at early-stage graduate students in computer science, but we have also had enrollment from upper-class computer science undergraduates, as well as students from law, medical informatics, statistics, linguistics and various engineering disciplines.
The key design principle for this book, therefore, was to cover what we believe to be important in a one-term graduate course on information retrieval. An additional principle is to build each chapter around material that we believe can be covered in a single lecture of 75 to 90 minutes. The first eight chapters of the book are devoted to the basics of information retrieval, and in particular the heart of search engines; we consider this material to be core to any course on information retrieval. … Chapters 9–21 build on the foundation of the first eight chapters to cover a variety of more advanced topics.
http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf
http://www-nlp.stanford.edu/IR-book/

Kernel Methods in Machine Learning by Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, 2008
Added in the kit before 24-oct-2014
We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data.
https://archive.org/details/arxiv-math0701907

Introduction to Machine Learning, Alex Smola, S.V.N. Vishwanathan, 2008
Added in the kit before 24-oct-2014
Over the past two decades Machine Learning has become one of the mainstays of information technology and with that, a rather central, albeit usually hidden, part of our life.
With the ever increasing amounts of data becoming available there is good reason to believe that smart data analysis will become even more pervasive as a necessary ingredient for technological progress. The purpose of this chapter is to provide the reader with an overview over the vast range of applications which have at their heart a machine learning problem and to bring some degree of order to the zoo of problems. After that, we will discuss some basic tools from statistics and probability theory, since they form the language in which many machine learning problems must be phrased to become amenable to solving. Finally, we will outline a set of fairly basic yet effective algorithms to solve an important problem, namely that of classification. More sophisticated tools, a discussion of more general problems and a detailed analysis will follow in later parts of the book.
http://alex.smola.org/drafts/thebook.pdf

Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006
Added in the kit before 24-oct-2014
Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years. In particular, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic models. Also, the practical applicability of Bayesian methods has been greatly enhanced through the development of a range of approximate inference algorithms such as variational Bayes and expectation propagation. Similarly, new models based on kernels have had significant impact on both algorithms and applications.
Chapter 8 – Graphical Models
Probabilities play a central role in modern pattern recognition. We have seen in Chapter 1 that probability theory can be expressed in terms of two simple equations corresponding to the sum rule and the product rule. All of the probabilistic inference and learning manipulations discussed in this book, no matter how complex, amount to repeated application of these two equations. We could therefore proceed to formulate and solve complicated probabilistic models purely by algebraic manipulation. However, we shall find it highly advantageous to augment the analysis using diagrammatic representations of probability distributions, called probabilistic graphical models. These offer several useful properties:
1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models.
2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph.
3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly.
http://research.microsoft.com/en-us/um/people/cmbishop/PRML/pdf/Bishop-PRML-sample.pdf
http://research.microsoft.com/en-us/um/people/cmbishop/prml/

Gaussian Processes for Machine Learning, C. Rasmussen and C. Williams, 2006
Added in the kit before 24-oct-2014
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning.
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
http://www.gaussianprocess.org/gpml/chapters/

Bayesian Machine Learning by Chakraborty, Sounak, 2005
Added in the kit before 24-oct-2014
PhD Thesis
https://archive.org/details/bayesianmachinel00chak

Machine Learning by Tom Mitchell, 2005
Added in the kit before 24-oct-2014
Policy on use: You are welcome to download these chapters for your personal use, or for use in classes you teach. In return, I ask only two things:
• Please do not re-post these documents on the internet. If you wish to make them available to your students, point them directly to this site.
• If you find errors please send me email at Tom.Mitchell@cmu.edu
I hope you find these useful!
Tom Mitchell
http://www.cs.cmu.edu/%7Etom/NewChapters.html
http://www.cs.cmu.edu/%7Etom/mlbook-chapter-slides.html

Information Theory, Inference, and Learning Algorithms, David MacKay, 2003
Added in the kit before 24-oct-2014
This book is aimed at senior undergraduates and graduate students in Engineering, Science, Mathematics, and Computing. It expects familiarity with calculus, probability theory, and linear algebra as taught in a first- or second-year undergraduate course on mathematics for scientists and engineers. Conventional courses on information theory cover not only the beautiful theoretical ideas of Shannon, but also practical solutions to communication problems. This book goes further, bringing in Bayesian data modelling, Monte Carlo methods, variational methods, clustering algorithms, and neural networks. Why unify information theory and machine learning? Because they are two sides of the same coin. In the 1960s, a single field, cybernetics, was populated by information theorists, computer scientists, and neuroscientists, all studying common problems. Information theory and machine learning still belong together. Brains are the ultimate compression and communication systems. And the state-of-the-art algorithms for both data compression and error-correcting codes use the same tools as machine learning.
http://www.inference.phy.cam.ac.uk/itprnn/book.html
https://archive.org/details/MackayInformationTheoryFreeEbookReleasedByAuthor

Free Book List
Added in the kit before 24-oct-2014
E-Books for free online viewing and/or download
http://www.e-booksdirectory.com/listing.php?category=284

Free resource book (need to sign in)
Added in the kit before 24-oct-2014
There are too many machine learning resources on the internet, so much so that it can feel overwhelming.
I have read the books and taken the courses and can give you good advice on where to start. Resources you can use to learn faster: I have hand-picked the best machine learning books, websites, videos, university courses, software, and competition sites. These resources have been listed in a handy PDF that you can download now.
http://machinelearningmastery.com/machine-learning-resources/

Free ML ebooks on it-ebooks (this website is controversial; please read the Stack Overflow discussion before accessing it yourself)
Added in the kit before 24-oct-2014
http://meta.stackoverflow.com/questions/255032/should-we-add-it-ebooks-info-to-the-stack-overflow-url-blacklist

Wikipedia: Machine Learning, the Complete Guide
Added in the kit before 24-oct-2014
This is a Wikipedia book, a collection of Wikipedia articles that can be easily saved, rendered electronically, and ordered as a printed book. For information and help on Wikipedia books in general, see Help:Books (general tips) and WikiProject Wikipedia-Books (questions and assistance).
https://en.wikipedia.org/wiki/Book:Machine_Learning_-_The_Complete_Guide

ISSUU
Rediscover reading
With over 19 million publications, Issuu is the fastest growing digital publishing platform in the world. Millions of avid readers come here every day to read the free publications created by enthusiastic publishers from all over the globe with topics in fashion, lifestyle, art, sports and global affairs to mention a few. And that's not all. We've also got a prominent range of independent publishers utilizing the Issuu network to reach new fans every day. Created by a bunch of geeks with an undying love for the publishing industry, Issuu has grown to become one of the biggest publishing networks in the industry.
It's an archive, library and newsstand all gathered in one reading experience.
http://issuu.com/search?q=%22machine+learning%22

Neural Networks, A Systematic Introduction by Raul Rojas
We are now beginning to see good textbooks for introducing the subject to various student groups. This book by Raúl Rojas is aimed at advanced undergraduates in computer science and mathematics. This is a revised version of his German text which has been quite successful. It is also a valuable self-instruction source for professionals interested in the relation of neural network ideas to theoretical computer science and articulating disciplines. The book is divided into eighteen chapters, each designed to be taught in about one week. The first eight chapters follow a progression and the later ones can be covered in a variety of orders. The emphasis throughout is on explicating the computational nature of the structures and processes and relating them to other computational formalisms. Proofs are rigorous, but not overly formal, and there is extensive use of geometric intuition and diagrams. Specific applications are discussed, with the emphasis on computational rather than engineering issues. There is a modest number of exercises at the end of most chapters.
http://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/documents/1996/NeuralNetworks/neuron.pdf

Books - Spanish
Coming soon …

Books - German
Coming soon …

Books - Italian
Coming soon …

Books - French
Coming soon …
Books – Russian
Pattern Recognition by А.Б.Мерков, 2011
http://www.recognition.mccme.ru/pub/RecognitionLab.html/slbook.pdf
Algorithmic models of learning classification: rationale, comparison, selection, 2014
http://www.machinelearning.ru/wiki/images/c/c3/Donskoy14algorithmic.pdf
More coming soon …

Books - Japanese
Coming soon …

Books - Chinese
Blog recommending useful books
A blog written in Chinese which introduces and recommends many useful ML books (the books are mostly written in English).
http://blog.csdn.net/pongba/article/details/2915005
Textbook for Statistics
http://baike.baidu.com/subview/1724467/13114186.htm
Introduction to Pattern recognition
http://baike.baidu.com/view/3911812.htm
Translated version of Machine Learning by Tom Mitchell:
http://book.douban.com/subject/1102235/

Books - Portuguese
Coming soon …

Presentation, Infographics and Documents - English

Meetup's Presentations
https://skillsmatter.com/explore?content=skillscasts&location=&q=machine+learning

Slides
Slideshare.com
http://www.slideshare.net/search/slideshow?searchfrom=header&q=machine+learning
Slides.com
http://slides.com/explore?search=machine%20learning
Powershow.com
http://www.powershow.com/search/presentations/machine-learning
Speaker Deck
https://speakerdeck.com/search?q=machine+learning

Slides from Lectures
Introduction to Artificial Intelligence, 2014, University of Waterloo
https://www.student.cs.uwaterloo.ca/~cs486/syllabus.html
Aprendizado de Maquina, Conceitos e definicoes by Jose Augusto Baranauskas
http://dcm.ffclrp.usp.br/~augusto/teaching/ami/AM-I-Conceitos-Definicoes.pdf
Aprendizado de Maquina by Bianca Zadrozni, Instituto de Computação, UFF, 2010
http://www2.ic.uff.br/~bianca/aa/
More coming soon …

Slides from Meetups
NYC ML Meetup, 2014
Natural Language Processing in Investigative Journalism by Jonathan
Stray
http://www.scribd.com/doc/230605794/Natural-Language-Processing-in-Investigative-Journalism
https://github.com/overview/overview-server/wiki/Visualization-Plugin-API
Statistics with Doodles by Thomas Levine
http://thomaslevine.com/!/statistics-with-doodles-2014-03/
More coming soon …

Slides from Conferences
More coming soon …

Conferences

International Conference on Machine Learning (ICML)
ICML, Beijing, China 2014
http://icml.cc/2014/
ICML, Atlanta, US 2013
http://icml.cc/2013/
http://techtalks.tv/icml/2013/
ICML, Edinburgh, UK 2012
http://icml.cc/2012/
http://techtalks.tv/icml/2012/orals/
http://techtalks.tv/icml_2012_representation_learning/
http://techtalks.tv/icml/2012/inferning2012/
http://techtalks.tv/icml/2012/object2012/
http://techtalks.tv/icml/2012/icml_colt_2012_tutorials/icml-2012-tutorial-on-prediction-belief-and-market/
ICML, Bellevue, US 2011
http://www.icml-2011.org
http://techtalks.tv/icml-2011/
ICML, Haifa, Israel 2010
http://www.icml2010.org
Full archive of ICML
http://machinelearning.org/icml.html

Machine Learning Conference Videos
http://techtalks.tv/search/results/?q=machine+learning

Annual Machine Learning Symposium
6th
http://techtalks.tv/sixth-annual-machine-learning-symposium/
8th
http://www.nyas.org/Events/Detail.aspx?cid=2cc3521e-408a-460e-b159-e774734bcbea
Archive
http://www.nyas.org/whatwedo/fos/machine.aspx
MLSS Machine Learning Summer Schools
http://www.mlss.cc
http://www.mlss2014.com/index.html

Data Gotham 2012, 2013
http://www.youtube.com/user/DataGotham

Meetup - English

631 Machine Learning Meetups in the World
http://machine-learning.meetup.com/

Data Science Weekly – List of Meetups
List of Data Science Meetups: NYC, San Francisco, Washington DC, Boston, Chicago, Seattle, Denver, Austin, Atlanta, Toronto, Vancouver, London, Berlin, Paris, Amsterdam, Tel Aviv, Dubai, Delhi, Bangalore, Singapore, Sydney
http://www.datascienceweekly.org/data-science-resources/data-science-meetups

Other Meetups missing in Data Science Weekly
London Machine Learning Meetup
http://www.meetup.com/London-Machine-Learning-Meetup/
London Deep Learning Meetup
http://www.meetup.com/Deep-Learning-London/

Blog – English

Data Science Weekly
The Data Science Weekly Blog contains interviews to better understand how people are using Data and Data Science to change the world.
http://www.datascienceweekly.org/blog

Yann LeCun, Google+
My main research interests are Machine Learning, Computer Vision, Mobile Robotics, and Computational Neuroscience. I am also interested in Data Compression, Digital Libraries, the Physics of Computation, and all the applications of machine learning (Vision, Speech, Language, Document understanding, Data Mining, Bioinformatics).
https://plus.google.com/+YannLeCunPhD/posts

Igor Carron Blog
Nuit Blanche is a blog that focuses on Compressive Sensing, Advanced Matrix Factorization Techniques, Machine Learning as well as many other engaging ideas and techniques needed to handle and make sense of very high dimensional data also known as Big Data.
http://nuit-blanche.blogspot.co.uk

KDD Community, Knowledge discovery and Data Mining
KDD brings together the data mining, data science and analytics community.
http://www.sigkdd.org/blog

Kaggle Blog
http://blog.kaggle.com

Digg
Digg is a news aggregator with an editorially driven front page, aiming to select stories specifically for the Internet audience such as science, trending political issues, and viral Internet issues. (source: Wikipedia)
http://digg.com/search?q=machine+learning

Feedly
Found a site you like? Use the +feedly button to add it to your feedly reading list.
http://feedly.com/index.html#explore%2F%23Machine%20Learning

Mlwave
Learning Machine Learning
ML Wave is a platform that talks about machine learning and data science. It was founded in 2014 by the Dutch Kaggle user Triskelion.
http://mlwave.com

FastML
Machine Learning made easy
FastML probably grew out of a frustration with papers you need a PhD in math to understand and with either no code or half-baked Matlab implementation of homework-assignment quality. We understand that some cutting-edge researchers might have no interest in providing the goodies for free, or just no interest in such down-to-earth matters. But we don’t have time nor desire to become experts in every machine learning topic. Fortunately, there is quite a lot of good software with acceptable documentation.
http://fastml.com

Beating the Benchmark
http://beatingthebenchmark.blogspot.co.uk

YOU CANalytics
Welcome to UCAnalytics.com. The idea behind this website is to explore the applications of advanced Analytics and data mining in business. Analytics is an effort to explore interesting but hidden patterns in data for business growth.
This idea has inspired me to name the site
• UCAnalytics: YOU CANalytics
• UCAnalytics: YOU SEE Analytics
• UCAnalytics: University for Analytics
This is sort of like finding patterns in a cluster of clouds – a fun exercise. However, we will explore some serious business applications and usage of Analytics over here. A few topics including
1. Analytical Scorecard Development
2. Customer Segmentation to gain deeper knowledge of customer behaviour
3. Data mining and Big Data Analytics
4. Business Applications of Bayesian Statistics – Nate Silver has made Bayesian cool!
5. Challenges & Pitfalls in Business Forecasting – Time Series Modelling
6. Business Growth through right Design-of-Experiments
7. Business Growth & Risk Estimation through Analytical simulations
I look forward to sharing my ideas and hearing back from you. Roopam Upadhyay
http://ucanalytics.com/blogs

Trevor Stephens Blog
http://trevorstephens.com

Mozilla Hacks
Mozilla Hacks is one of the key resources for people developing for the Open Web, talking about news and in-depth descriptions of technologies and features.
https://hacks.mozilla.org/?s=machine+learning

Banach's Algorithmic Corner, University of Warsaw
This blog is maintained by members of the Algorithmic group at the University of Warsaw:
http://corner.mimuw.edu.pl

DataCamp Blog
http://blog.datacamp.com

Natural Language Processing Blog, Hal Daume
http://nlpers.blogspot.co.uk

Maxim Milakov Blog
I am a researcher in machine learning and high-performance computing. I designed and implemented nnForge - a library for training convolutional and fully connected neural networks, with CPU and GPU (CUDA) backends. You will find my thoughts on convolutional neural networks and the results of applying convolutional ANNs for various classification tasks in the Blog.
http://www.milakov.org

Alfonso Nieto-Castanon Blog
I work in the field of computational neuroscience, and my background is in neuroscience (Ph.D. Cognitive and Neural Systems, Boston University) and engineering (B.S./M.S. Telecommunication Engineering, Universidad de Valladolid). My areas of specialization are modeling and statistics, fMRI analysis methods, and signal processing.
http://www.alfnie.com/home

Persontyle Blog
Every object on earth is generating data, including our homes, our cars and yes even our bodies. Data is the by-product of our new digital existence. Data has the potential to revolutionize the way business, government, science, research, and healthcare are carried out. Data presents unprecedented opportunities to those who have the skills and expertise to use it to unveil patterns, insights and signals, and to predict trends in ways that were never possible before. In a massively connected, data-driven world, it is imperative that the workforce of today and tomorrow is able to understand what data is available and use scientific methods to analyze and interpret it. We’re here to help you learn and apply the art and science of turning data into meaningful insights and intelligent predictions.
http://www.persontyle.com/blog/

Analytics Vidhya
Learn everything about Analytics
Welcome to Analytics Vidhya! For those of you who are wondering what “Analytics Vidhya” is: “Analytics” can be defined as the science of extracting insights from raw data. The spectrum of analytics starts from capturing data and evolves into using insights / trends from this data to make informed decisions. “Vidhya” on the other hand is a Sanskrit noun meaning “Knowledge” or “Clarity on a subject”. Knowledge, which has been gained through reading literature or through self practice / experimentation.
Through this blog, I want to create a passionate community that dedicates itself to the study of Analytics. I share my learning and tips on Analytics through this blog.
http://www.analyticsvidhya.com/blog/

Bugra Akyildiz's Blog
Great Blog (Notes) both theoretical and practical
I work as a Machine Learning/NLP Engineer at CB Insights where I apply machine learning algorithms to NLP problems. I received a B.S. from Bilkent University and an M.Sc. from New York University, focusing on signal processing and machine learning.
http://bugra.github.io

Data origami
8 great data blogs to follow
https://www.dataorigami.net/blogs/great-data-blogs

Rasbt’s Blog
A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks
Links to useful resources
https://github.com/rasbt/pattern_classification#links-to-useful-resources

Gilles Louppe's Blog
Understanding Random Forest, PhD Thesis
https://github.com/glouppe/phd-thesis/blob/master/thesis.pdf

AI Topics
AITopics is a mediated information portal provided by AAAI (The Association for the Advancement of Artificial Intelligence), with the goal of communicating the science and applications of AI to interested people around the world.
Contents
• Good Starting Places
• General Readings
• Organizations
• Educational Resources
• Hardware and Software
• Competitions
• News
• Videos
• Podcasts
• Classic Articles & Books
http://aitopics.org/topic/machine-learning

AI International
This international AI site is designed to help you locate AI research efforts in your country or region. Pages on this site will link to local AI societies, universities, labs, and other research efforts.
http://www.aiinternational.org/index.html

Joseph Misiti's Blog
machine-learning + applied mathematics + django + hadoop. Co-Founder of @socialq.
https://github.com/josephmisiti
https://medium.com/@josephmisiti

MIRI, Machine Intelligence Research Institute
The mathematics of safe machine intelligence
MIRI’s mission is to ensure that the creation of smarter-than-human intelligence has a positive impact. We aim to make intelligent machines behave as we intend even in the absence of immediate human supervision. Much of our current research deals with reflection, an AI’s ability to reason about its own behavior in a principled rather than ad-hoc way. We focus our research on AI approaches that can be made transparent (e.g. principled decision algorithms, not genetic algorithms), so that humans can understand why the AIs behave as they do.
http://intelligence.org/blog/

Kevin Davenport Data Blog
Added in the kit 04-Nov-2014
I'm a tech enthusiast interested in automation, machine learning, and conveying complex statistical models through visualization.
Recent Posts
Regularized Logistic Regression Intuition, October 27, 2014
Dynamic Time-Series Modeling, May 22, 2014
A Real World Introduction to Information Entropy, April 21, 2014
The Cost Function of K-Means, February 14, 2014
Mahalanobis Distance and Outliers, December 3, 2013
Quick Look: Facebook’s Kaggle Competition, October 21, 2013
Significance Magazine Contribution, August 28, 2013
Absolute Deviation Around the Median, August 8, 2013
My Trip to Spain: The R User Conference 2013, July 23, 2013
Gradient Boosting: Analysis of LendingClub’s Data, July 4, 2013
Shiny Server on CentOS, June 29, 2013
Data imputation I, June 12, 2013
ggplot2 graphics in a loop, April 30, 2013
Predicting Dichotomous Outcomes I, April 14, 2013
Data visualization with R and ggplot2, March 28, 2013
Samsung Phone Data Analysis Project, March 19, 2013
Layman’s Random Forests, March 19, 2013
Commercial Machine Learning Algorithms?
March 4, 2013
Simple Count Probability, February 24, 2013
Common & special cause variation: Part 1, February 13, 2013
Unknown Variance Two-Tailed Test of Population Mean, February 11, 2013
Tidy Data, January 31, 2013
http://kldavenport.com

Alexandre Passant's Blog
Added in the kit 30-Oct-2014
I'm a hacker, researcher, and entrepreneur. I'm passionate about the Web and I love when smart algorithms and architectures power beautiful and useful products. I'm co-founder of MDG Web (http://mdg.io), a music-tech start-up based in Dogpatch Labs Dublin and focusing on the music discovery field. We're building seevl (http://seevl.fm), a free, unlimited and targeted music discovery platform available as a standalone app and a Deezer app. We also work with industry stakeholders to let them promote their content on streaming platforms through their own branded apps. I was previously a Research Fellow and Unit Leader at DERI (http://deri.ie), the world's largest Web 3.0 R&D lab, leading high-impact projects with partners such as Google, Cisco, and more, on the Social / Semantic / Sensor Web, with a focus on Knowledge Representation and Management, Personalisation, Privacy, Distributed Systems, and Recommender Systems. Overall, I’m trying to make the Web a better place, and I’m having fun doing it.
http://apassant.net

Daniel Nouri’s Blog
Using convolutional neural nets to detect facial keypoints tutorial, Daniel Nouri's Blog
This is a hands-on tutorial on deep learning. Step by step, we'll go about building a solution for the Facial Keypoint Detection Kaggle challenge. The tutorial introduces Lasagne, a new library for building neural networks with Python and Theano. We'll use Lasagne to implement a couple of network architectures, talk about data augmentation, dropout, the importance of momentum, and pre-training. Some of these methods will help us improve our results quite a bit. I'll assume that you already know a fair bit about neural nets.
That's because we won't talk about much of the background of how neural nets work; there are a few good books and videos for that, like the Neural Networks and Deep Learning online book. Alec Radford's talk Deep Learning with Python's Theano library is a great quick introduction. Make sure you also check out Andrej Karpathy's mind-blowing ConvNetJS Browser Demos.
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/

Yvonne Rogers Blog
Yvonne Rogers is a Professor of Interaction Design, the director of UCLIC and a deputy head of the Computer Science department at UCL. Her research interests are in the areas of ubiquitous computing, interaction design and human-computer interaction. A central theme is how to design interactive technologies that can enhance life by augmenting and extending everyday, learning and work activities. This involves informing, building and evaluating novel user experiences through creating and assembling a diversity of pervasive technologies.
http://www.interactiveingredients.com

Igor Subbotin's Blog (Both in English & Russian)
(Huge list of resources)
153 followers | 56,448 views (02-Jan-2015)
Data science digest #30 (22-28 December 2014)
http://igorsubbotin.blogspot.ru
http://igorsubbotin.blogspot.ru/2014/12/data-science-digest-30-eng.html

Sebastian Raschka GitHub Repository & Blog
(Great Resources, everything you need is there!)
https://github.com/rasbt
http://sebastianraschka.com

Popular Science Website
http://www.popsci.com/find/machine%20learning
HOW MICROSOFT'S MACHINE LEARNING IS BREAKING THE GLOBAL LANGUAGE BARRIER
Earlier this week, roughly 50,000 Skype users woke up to a new way of communicating over the Web-based phone- and video-calling platform, a feature that could’ve been pulled straight out of Star Trek. The new function, called Skype Translator, translates voice calls between different languages in real time, turning English to Spanish and Spanish back into English on the fly. Skype plans to incrementally add support for more than 40 languages, promising nothing short of a universal translator for desktops and mobile devices. The product of more than a decade of dedicated research and development by Microsoft Research (Microsoft acquired Skype in 2011), Skype Translator does what several other Silicon Valley icons (not to mention the U.S. Department of Defense) have not yet been able to do. To do so, Microsoft Research (MSR) had to solve some major machine learning problems while pushing technologies like deep neural networks into new territory.
http://www.popsci.com/how-microsofts-machine-learning-breaking-language-barrier

Max Woolf's Blog
Max Woolf is a Software QA Engineer living and working in the San Francisco Bay Area for over 2 years. He graduated from Carnegie Mellon University in 2012 with a degree in Business Administration, concentrating in Computing and Information Technology. In his spare time, Max uses Python to gather data from public APIs and ggplot2 to make pretty charts from that data. Max also comments on technology blogs rather frequently.
http://minimaxir.com

Rasmus Bååth's Research Blog
I’m a PhD student at Lund University Cognitive Science in Sweden.
My main research interest is music cognition and especially rhythm perception and production. I’m also interested in statistics and statistical computing using R. My blog is syndicated on R-bloggers and StatsBlogs, two great sites if you are interested in R and statistics. Everything published on my blog is licensed under a Creative Commons Attribution 4.0 International License. I also run a drinks blog over at groggbloggen.se. It’s in Swedish but focuses on minimalist drinks with only two ingredients (which are called grogs in Sweden), so you should be able to figure it out! :) I believe that if you haven’t tried using Bayesian statistics you’re really missing out on something. Why not do some Bayesian statistics right now in the browser and try my Bayesian “t-test” demo featuring MCMC in javascript!
http://www.sumsar.net

Flowing Data's Blog
About
The greatest value of a picture is when it forces us to notice what we never expected to see. (John W. Tukey, Exploratory Data Analysis, 1977)
FlowingData explores how statisticians, designers, data scientists, and others use analysis, visualization, and exploration to understand data and ourselves. As for me, I'm Dr. Nathan Yau, PhD, but you can call me Nathan. My dissertation was on personal data collection and how we can use visualization in the everyday context. That expands to more general types of data and visualization and design for a growing audience. I've also written a couple of books on how to visualize data, and the series is growing.
http://flowingdata.com
Genetic algorithm walkers
http://flowingdata.com/2015/01/16/genetic-algorithm-walkers/
Miscellaneous

Allen Institute for Artificial Intelligence (AI2)
MISSION
The core mission of The Allen Institute for Artificial Intelligence (AI2) is to contribute to humanity through high-impact AI research and engineering. We will do this by constructing AI systems with reasoning, learning and reading capabilities. Please see the New York Times Profile of AI2.
http://allenai.org/index.html
https://www.youtube.com/channel/UCEqgmyWChwvt6MFGGlmUQCQ?spfreload=10
http://www.nytimes.com/2014/12/16/science/paul-allen-adds-oomph-to-ai-pursuit.html?_r=0

Artificial General Intelligence (AGI) Society
This channel contains videos from the Artificial General Intelligence Society. The AGI Society organizes a yearly conference and occasional summer school. Artificial General Intelligence (AGI) is an emerging field aiming at the building of “thinking machines”; that is, general-purpose systems with intelligence comparable to that of the human mind (and perhaps ultimately well beyond human general intelligence). While this was the original goal of Artificial Intelligence (AI), the mainstream of AI research has turned toward domain-dependent and problem-specific solutions; therefore it has become necessary to use a new name to indicate research that still pursues the “Grand AI Dream”. Similar labels for this kind of research include “Strong AI”, “Human-level AI”, etc.
https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA?spfreload=10
http://www.agi-society.org

AUAI, Association for Uncertainty in Artificial Intelligence
About AUAI
The Association for Uncertainty in Artificial Intelligence is a non-profit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty. The next UAI conference is the 30th conference, UAI-2015 in Amsterdam, The Netherlands, on July 12-16, 2015.
Join our Facebook group or add yourself to the UAI Mailing list to keep updated on announcements and relevant AI news. Principles and applications developed within the UAI community have been at the forefront of research in Artificial Intelligence. The UAI community and annual meeting have been primary sources of advances in graphical models for representing and reasoning with uncertainty.
http://www.auai.org

Blog - Spanish

Coming soon …

Blog - Italian

Coming soon …

Blog - German

Coming soon …

Blog - French

L'ATELIER's News
L'Atelier has been the technology-watch unit of BNP Paribas for more than 30 years. BNP Paribas L'Atelier is established in three major innovation regions (USA, China, Europe) to identify, advise and support companies. The unit relies on four activities: the Media arm, which shares its monitoring across its various channels (website, radio, social media); Events, which foster discussion of innovative topics; Digital Strategy Consulting, which places the innovations it detects in the context of companies and their lines of business; and finally L'Atelier Lab, which brings innovative entrepreneurs and large companies together to help them jointly design new digital products and services.
http://www.atelier.net/search/apachesolr_search/machine%20learning

Blog - Russian

Igor Subbotin's Blog (Both in English & Russian)
(Huge list of resources)
153 followers | 56,448 views (02-Jan-2015)
Data science digest #30 (22-28 December 2014)
http://igorsubbotin.blogspot.ru
http://igorsubbotin.blogspot.ru/2014/12/data-science-digest-30-eng.html
More coming soon …
Blog - Japanese

Coming soon …

Blog - Chinese

Coming soon …

Blog - Portuguese

Coming soon …

Journals - English

Journal of Machine Learning Research, MIT Press
http://jmlr.org

Machine Learning Journal (last article could be downloaded for free)
http://link.springer.com/journal/10994

Machine Learning (Theory)
This is an experiment in the application of a blog to academic research in machine learning and learning theory by John Langford. Exactly where this experiment takes us and how the blog will turn out to be useful (or not) is one of those prediction problems we so dearly love in machine learning.
http://hunch.net

List of Journals on Microsoft Academic Research website
http://academic.research.microsoft.com/RankList?entitytype=4&topDomainID=2&subDomainID=6&last=0&start=1&end=100

Wired magazine
http://www.wired.com/tag/machine-learning/

Data Science Central
Data Science Central is the industry's online resource for big data practitioners. From Analytics to Data Integration to Visualization, Data Science Central provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends and industry job opportunities.
http://www.datasciencecentral.com

Journals – Spanish

Coming soon …

Journals – German

Coming soon …

Journals – Italian

Coming soon …

Journals – French

Coming soon …

Journals – Russian

Coming soon …

Journals – Japanese

Coming soon …

Journals – Chinese

Coming soon …

Journals - Portuguese

Coming soon …

Forum, Q&A - English

Data Tau
Hacker News for Data Scientists
Great website with a lot of really good and leading edge information!
Respects the user’s privacy by not asking for any personal information or email!
Remark: machinelearningsalon.org uses the standard forum templates provided by its website hosting system, but it hopes to follow DataTau.com's example!
http://www.datatau.com

Hacker News
Great website like datatau.com but less dedicated to Machine Learning! Respects the user’s privacy by not asking for any personal information or email!
https://news.ycombinator.com

Metaoptimize
Where scientists ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!
http://metaoptimize.com/qa/

Kaggle Forums
44,032 posts in 8,087 topics in 439 forums. (source 4th June 2014)
https://www.kaggle.com/forums

Reddit in English
News, Research Papers, Videos, Lectures, Software and Discussions on:
• Machine Learning
• Data Mining
• Information Retrieval
• Predictive Statistics
• Learning Theory
• Search Engines
• Pattern Recognition
• Analytics
http://www.reddit.com/r/MachineLearning/
Beginners: Please have a look at our FAQ and Link-Collection
http://www.reddit.com/r/MachineLearning/wiki/index

Cross validated Stack Exchange
Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization. It's 100% free, no registration required.
http://stats.stackexchange.com

Open data Stack Exchange
Open Data Stack Exchange is a question and answer site for developers and researchers interested in open data. It's 100% free, no registration required.
http://opendata.stackexchange.com

Data Science Beta Stack Exchange
Data Science Stack Exchange is a question and answer site for Data science professionals, Machine Learning specialists, and those interested in learning more about the field. It's 100% free, no registration required.
http://datascience.stackexchange.com

Quora
Quora is your best source for knowledge. Why do I need to sign in? Quora is a knowledge-sharing community that depends on everyone being able to pitch in when they know something.
http://www.quora.com/Machine-Learning

Machine Learning Impact Forum
Welcome! Please contribute your ideas for what challenges we might aspire to solve, changes in our community that can improve machine learning impact, and examples of machine learning projects that have had tangible impact.
http://mlimpact.com

Forum, Q&A - Spanish

Coming soon …

Forum, Q&A - German

Coming soon …

Forum, Q&A - Italian

Coming soon …

Forum, Q&A - French

Coming soon …

Forum, Q&A - Russian

Reddit in Russian
http://www.reddit.com/r/MachineLearning_Ru
http://www.reddit.com/r/MachineLearning_Ru/comments/249f7x/meta_коллекция_полезных_ресурсов_и_ссылок_faq/

Habrahabr.ru Forum (in Russian, translated by Google Chrome)
http://habrahabr.ru
Some examples:
Playing with genetic algorithms
What is a genetic algorithm; Why it works; We formalize the problem with a random string; An example of the algorithm; Experiments with the classics; Code and data; Findings
http://habrahabr.ru/post/246951/
PythonDigest - 2014, the results of our work in figures and references
The digest was created as an aggregator of news and information about the Python programming language and its branches and modules. Over its lifetime the digest has collected approximately 5235 items and has translated and published 1776 news items.
http://habrahabr.ru/post/247067/
More coming soon …

Forum, Q&A – Portuguese

Forum, Q&A – Chinese

Zhihu.com
Machine Learning
http://www.zhihu.com/search?q=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0&type=question
Data Mining
http://www.zhihu.com/search?q=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&type=question
Artificial Intelligence
http://www.zhihu.com/search?q=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&type=question

Guokr.com
Machine Learning
http://www.guokr.com/search/all/?wd=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
Data Mining
http://www.guokr.com/search/all/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&sort=&term=True
Artificial Intelligence
http://www.guokr.com/search/all/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&sort=&term=True
More coming soon …

Governmental Reports - English

Big Data report, Whitehouse, US
http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf

Fun - English

Founder of PhD Comics
Jorge is the creator of "PHD Comics", the popular comic strip about life (or the lack thereof) in Academia. He is also the co-founder of PHDtv, a video science and discovery outreach collaborative, and a founding board member of Endeavor College Prep, a non-profit school for kids in East L.A. He earned his Ph.D. in Robotics from Stanford University and was an Instructor and Research Associate at Caltech from 2003-2005. He is originally from Panama.
http://jorgecham.com

Companies using Machine Learning and Artificial Intelligence techniques will answer 3 questions, and their answers will be published for free on this website:
1- Why is Machine Learning important to your Business?
2- What Machine Learning algorithms and technologies are you using?
3- What Machine Learning development could you forecast in the near future?

MACHINE LEARNING RESEARCH GROUPS: DRAFT, A LOT MORE TO COME SOON

MACHINE LEARNING RESEARCH GROUPS in AMERICA, USA

MIT Computer Science and Artificial Intelligence Lab
The Computer Science and Artificial Intelligence Laboratory – known as CSAIL – is the largest research laboratory at MIT and one of the world’s most important centers of information technology research. CSAIL and its members have played a key role in the computer revolution. The Lab’s researchers have been key movers in developments like time-sharing, massively parallel computers, public key encryption, the mass commercialization of robots, and much of the technology underlying the ARPANet, Internet and the World Wide Web. CSAIL members (former and current) have launched more than 100 companies, including 3Com, Lotus Development Corporation, RSA Data Security, Akamai, iRobot, Meraki, ITA Software, and Vertica. The Lab is home to the World Wide Web Consortium (W3C), directed by Tim Berners-Lee, inventor of the Web and a CSAIL member. CSAIL research is focused on developing the architectures and infrastructures of tomorrow’s information technology, and on creating innovations that will yield long-term improvements in how people live and work. Lab members conduct research in almost all aspects of computer science, including artificial intelligence, the theory of computation, systems, machine learning, computer graphics, as well as exploring revolutionary new computational methods for advancing healthcare, manufacturing, energy and human productivity.
http://www.csail.mit.edu

Stanford University Artificial Intelligence Laboratory

Welcome to the Stanford AI Lab
Founded in 1962, The Stanford Artificial Intelligence Laboratory (SAIL) has been a center of excellence for Artificial Intelligence research, teaching, theory, and practice for over fifty years.

Reading group
We have several weekly reading groups where we present and discuss papers on various topics in machine learning, natural language processing, computer vision, etc.

Autonomous Highway Driving
A deep learning model outputs the location of lane markings and surrounding cars given only a single camera image.

http://ai.stanford.edu
http://ai.stanford.edu/courses/

Carnegie Mellon University Machine Learning Department

The Machine Learning Department is an academic department within Carnegie Mellon University's School of Computer Science. We focus on research and education in all areas of statistical machine learning.

Watch an interview with Tom Mitchell, Department Head:
http://videolectures.net/mlas06_mitchell_itm/

http://www.ml.cmu.edu

Noah's ARK Research Group, Carnegie Mellon University

Noah's ARK is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language, all viewed through a computational lens.
http://www.ark.cs.cmu.edu

Harvard University Intelligent Interactive Systems Group

Intelligent Interactive Systems are fundamentally hard to design because they require intelligent technology that is well suited for people's abilities, limitations, and preferences; they also require entirely novel interactions that can give the user a predictable and reliable experience despite the fact that the underlying technology is inherently proactive, unpredictable, and occasionally wrong. Thus, design of successful intelligent interactive systems requires intimate knowledge of and ability to innovate in two very disparate areas: human-computer interaction and artificial intelligence or machine learning.

Our projects span the full range from formal user studies to statistical machine learning. We have worked on developing new intelligent technologies to enable novel interactions (e.g., SUPPLE system) and on understanding the principles underlying how people interact with intelligent systems (e.g., the project on exploring the design space of adaptive user interfaces). Our Brain-Computer Interface project aims at developing a new set of interactions for efficiently controlling complex applications, and we are also interested in building and studying complete applications. One particular area of interest is ability-based user interfaces, an approach for adapting interactions to the individual abilities of people with impairments or of able-bodied people in unusual situations.

http://iis.seas.harvard.edu
http://iis.seas.harvard.edu/resources/

University of California, Berkeley Statistical Machine Learning

Research Statement
Statistical machine learning merges statistics with the computational sciences: computer science, systems science and optimization.
Much of the agenda in statistical machine learning is driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamical and heterogeneous, and where mathematical and algorithmic creativity are required to bring statistical methodology to bear. Fields such as bioinformatics, artificial intelligence, signal processing, communications, networking, information management, finance, game theory and control theory are all being heavily influenced by developments in statistical machine learning.

The field of statistical machine learning also poses some of the most challenging theoretical problems in modern statistics, chief among them being the general problem of understanding the link between inference and computation.

Research in statistical machine learning at Berkeley builds on Berkeley's world-class strengths in probability, mathematical statistics, computer science and systems science. Moreover, by its interdisciplinary nature, statistical machine learning helps to forge new links among these fields. An education in statistical machine learning at Berkeley thus involves an immersion in the traditions of statistical science broadly defined, a thoroughgoing involvement in exciting applied problems, and an opportunity to help shape the future of statistics.

http://www.stat.berkeley.edu/~statlearning/

UC Berkeley AMPLab, AMP: ALGORITHMS MACHINES PEOPLE

People will play a key role in data-intensive applications – not simply as passive consumers of results, but as active providers and gatherers of data, and as solvers of ML-hard problems that algorithms on their own cannot solve. With crowdsourcing, people can be viewed as highly valuable but unreliable and unpredictable resources, in terms of both latency and answer quality. They must be incentivized appropriately to provide quality answers despite varying expertise, diligence and even malicious behavior.
The AMPLab is addressing these issues in all phases of the analytics lifecycle.

https://amplab.cs.berkeley.edu

Videos
https://www.youtube.com/user/BerkeleyAMPLab/videos?spfreload=10

Berkeley Institute for Data Science

The Berkeley Institute for Data Science (BIDS) was founded in fall 2013 to build on existing campus strengths with a multidisciplinary emphasis that aims to facilitate and enhance the development and application of cutting-edge data science techniques in the biological, physical, social and engineering sciences. The Institute aims to build on the many recent innovations in data science techniques so that they can be applied in effective ways to domain science challenges. BIDS brings together researchers across disciplines and enhances career paths for data scientists through a number of newly created Data Science Fellows positions, graduate student fellowships, boot-camps, special classes, and conferences of interest to the academic community and general public.

The Institute’s initial support is provided by a 5-year $12.5 million grant from the Moore and Sloan Foundations together with significant support provided by UC Berkeley. The “Moore-Sloan Data Science Environment” also supports similar programs with shared goals and objectives at the University of Washington and New York University.
http://vcresearch.berkeley.edu/DATASCIENCE/BIDS

Data Science Lecture Series: Maximizing Human Potential Using Machine Learning-Driven Applications
https://www.youtube.com/channel/UCBBd3JxQl455JkWBeulc-9w?spfreload=10

Princeton University Department of Computer Science - ARTIFICIAL INTELLIGENCE & MACHINE LEARNING

Machine learning and computational perception research at Princeton is focused on the theoretical foundations of machine learning, the experimental study of machine learning algorithms, and the interdisciplinary application of machine learning to other domains, such as biology and information retrieval. Some of the techniques that we are studying include boosting, probabilistic graphical models, support-vector machines, and nonparametric Bayesian techniques. We are especially interested in learning from large and complex data sets. Example applications include habitat modeling of species distributions, topic models of large collections of scientific articles, classification of brain images, protein function classification, and extensions of the WordNet semantic network.

http://www.cs.princeton.edu/research/areas/mlearn

University of California, Los Angeles (UCLA)

Research Laboratories and Groups
Automated Reasoning Group (Adnan Darwiche)
Biocybernetics Laboratory (Joe DiStefano)
Center for Vision, Cognition, Learning and Art (Song-Chun Zhu)
Cognitive Systems Laboratory (Judea Pearl)
Concurrent Systems Laboratory (Yuval Tamir)
Digital Arithmetic and Reconfigurable Architecture Laboratory (Milos Ercegovac)
ER: Embedded and Reconfigurable System Design (Majid Sarrafzadeh)
Information and Data Management Group (multiple faculty)
Internet Research Laboratory (Lixia Zhang)
Laboratory for Embedded Collaborative Systems (LECS) (archived CENS documents)
Laboratory for Advanced Systems Research (LASR) (Peter Reiher)
MAGIX: Computer Graphics & Vision Laboratory (Demetri Terzopoulos)
Multimedia Information System Technology Group & Laboratory (Alfonso Cardenas)
Network Research Laboratory (Mario Gerla)
Software Systems Group (multiple faculty)
Vision Laboratory (Stefano Soatto)
VLSI Architecture, Synthesis & Technology (VAST) Laboratory (Jason Cong)
Web Information Systems Laboratory (Carlo Zaniolo)
WiNG (Wireless Networking Group) (Songwu Lu)

http://www.cs.ucla.edu/research/research-labs

Cornell University

https://confluence.cornell.edu/display/ml/Home
https://confluence.cornell.edu/display/ML/Courses

University of Illinois at Urbana-Champaign

Machine Learning Research
The Department of Computer Science at the University of Illinois at Urbana-Champaign has several faculty members working in the area of machine learning, learning theory, explanation based learning, learning in natural language processing and data mining. In addition, many faculty members inside and outside the department whose primary research interests are in other areas have specific research projects involving machine learning in some way.

http://ml.cs.illinois.edu

California Institute of Technology, Caltech

Department of Computing + Mathematical Sciences
The Computing + Mathematical Sciences department pursues numerous research interests covering a wide array of application areas. We take full advantage of Caltech's unique interdisciplinary character by drawing on research expertise not only from our own department, but from throughout the Institute.
Research efforts within the department evolve at a fast pace, and currently cover six discernible focus areas:
• Discrete Differential Modeling
• DNA Computing and Molecular Programming
• Perceptual and Machine Learning for Autonomous Systems
• Rigorous Systems Research
• Scientific Computing and Applied Analysis
• Theory of Computation

http://www.cms.caltech.edu/research/foci

University of Washington

Machine Learning
UW is one of the world's top centers of research in machine learning. We are active in most major areas of ML and in a variety of applications like natural language processing, vision, computational biology, the Web, and social networks. Check out the links on the left to find out who's who and what's happening in ML at UW. And be sure to see our CSE-wide efforts in Big Data.

https://www.cs.washington.edu/research/ml/

"Big Data" Research and Education
UW CSE is driving the "Big Data" revolution. Our traditional strength in data management (Magda Balazinska, Bill Howe, Dan Suciu), machine learning (Pedro Domingos), and open information extraction (Oren Etzioni, Dan Weld) has recently been augmented by key hires in machine learning (Emily Fox, Carlos Guestrin, Ben Taskar) and data visualization (Jeff Heer). Our efforts are coordinated with those of outstanding researchers in the University of Washington's top-ten programs in Statistics, Biostatistics, and Applied Mathematics, among others. Through the University of Washington eScience Institute (directed by Ed Lazowska) we are integrally involved in ensuring that researchers across the campus have access to cutting-edge approaches to data-driven discovery.

http://www.cs.washington.edu/research/bigdata

Social Robotics Lab - Yale University

The members of our lab perform research over a diverse collection of topics.
Though these projects approach social and developmental research from varied perspectives, they all share common themes. Robots provide an embodied, empirical testbed that allows for repeated validation. Robots also enable the use of social interactions as part of the modeled experimental environment, staying grounded in real-world perceptions, and appropriately integrating perceptual, motor, and cognitive skills.

http://scazlab.yale.edu/publications/all-publications

Georgia Institute of Technology

ML@GT
http://ml.cc.gatech.edu

University of Texas at Austin

Machine Learning Research Group
Machine learning is the study of adaptive computational systems that improve their performance with experience. The Machine Learning Research Group at UT Austin is led by Professor Raymond Mooney, and our research has explored a wide variety of issues in machine learning for over two decades. Our current research focuses primarily on natural language learning, statistical relational learning, transfer learning, and active learning.

https://www.cs.utexas.edu/~ml/

University of Pennsylvania

Penn Research in Machine Learning
Current projects:
• Structured Prediction
• Bandit and Limited-Feedback Problems
• Computation and Statistics
• Online Learning, Sequential Prediction, Regret Minimization
• Statistical Learning Theory

http://priml.upenn.edu/Main/Research

Columbia University

Machine Learning @ Columbia
The Columbia Machine Learning Lab pursues research in machine learning with applications in vision, graphs and spatio-temporal data. Funding provided by NSF.
http://www.cs.columbia.edu/learning/

New York University

CILVR Lab and Center for Data Science
The CILVR Lab (Computational Intelligence, Learning, Vision, and Robotics) brings together three faculty members, research scientists, postdocs, and students working on AI, machine learning, and a wide variety of applications, notably computer perception, robotics, and health care.

http://cilvr.nyu.edu/doku.php
http://cds.nyu.edu

University of Chicago

http://ml.cs.uchicago.edu

The Johns Hopkins Center for Language and Speech Processing (CLSP) Archive Videos

The Johns Hopkins Center for Language and Speech Processing (CLSP) is an interdisciplinary research and educational center focused on the science and technology of language and speech. Within its field, CLSP is recognized as one of the largest and most influential academic research centers in the world. The center conducts research across a broad spectrum of fundamental and applied topics including acoustic processing, automatic speech recognition, big data, cognitive modeling, computational linguistics, information extraction, machine learning, machine translation, and text analysis.

http://clsp.jhu.edu/seminars/archive/video/

Miscellaneous

IARPA Organization
The Intelligence Advanced Research Projects Activity (IARPA) invests in high-risk/high-payoff research programs that have the potential to provide our nation with an overwhelming intelligence advantage over future adversaries.

http://www.iarpa.gov

MACHINE LEARNING RESEARCH GROUPS in AMERICA, CANADA

University of Toronto Machine Learning Lab

Machine Learning @ UofT: The Department of Computer Science at the University of Toronto has several faculty members working in the area of machine learning, neural networks, statistical pattern recognition, probabilistic planning, and adaptive systems.
In addition, many faculty members inside and outside the department whose primary research interests are in other areas have specific research projects involving machine learning in some way.

http://learning.cs.toronto.edu/index.shtml
http://learning.cs.toronto.edu/index.shtml?section=research

The Fields Institute for Research in Mathematical Sciences, Canada

The Fields Institute is a center for mathematical research activity - a place where mathematicians from Canada and abroad, from business, industry and financial institutions, can come together to carry out research and formulate problems of mutual interest. Our mission is to provide a supportive and stimulating environment for mathematics innovation and education. The Fields Institute promotes mathematical activity in Canada and helps to expand the application of mathematics in modern society.

http://www.fields.utoronto.ca

University of Waterloo

Artificial Intelligence Research Group
The Artificial Intelligence Group conducts research in many areas of artificial intelligence. The group has active interests in: models of intelligent interaction, multi-agent systems, natural language understanding, constraint programming, computational vision, robotics, machine learning, and reasoning under uncertainty.

http://ai.uwaterloo.ca

Course material
http://ai.uwaterloo.ca/coursegr.html

University of British Columbia Artificial Intelligence Research Groups

Research Groups

Computer Vision and Robotics: This is one of the most influential vision and robotics groups in the world. It is this group that created RoboCup and the celebrated SIFT features. The students in this group have won most of the AAAI Semantic Robot Challenges. The group has four active faculty: David Lowe, Jim Little, Alan Mackworth and Bob Woodham.
Empirical Algorithmics: Led by Holger Hoos and Kevin Leyton-Brown, this research group studies the empirical behaviour of algorithms and develops automated methods for improving algorithmic performance. Work by the empirical algorithmics group at UBC/CS has led to substantial improvements in the state of the art in solving a wide range of prominent problems, including SAT, AI Planning and Mixed Integer Programming, and won numerous awards.

Game Theory and Decision Theory: With Kevin Leyton-Brown in the lead, this group has made significant contributions to algorithmic game theory, multiagent systems and mechanism design. David Poole also contributes to this group with his work on decision processes and planning. The research problems attacked by this group are therefore of great importance to e-commerce, auctions and advertising.

Intelligent User Interfaces: With Cristina Conati and Giuseppe Carenini this group's goal is to investigate principles and techniques for preference modeling and elicitation, interactive decision making, user-adaptive information visualization and visual interfaces for text analysis.

Knowledge Representation and Reasoning: David Poole leads this group with his foundational work on probabilistic first order logic and semantic science. This work on logical and probabilistic reasoning has been of profound and broad impact in the field of artificial intelligence (AI). Holger Hoos is also an important member of this group with his work on satisfiability (SAT) and planning, which has won numerous awards and competitions.
Machine Learning: With the guidance of Nando de Freitas and Kevin Murphy, this group's vision is to advance the frontier of knowledge in Bayesian inference, Monte Carlo algorithms, probabilistic graphical models, neural computation, personalization, mining web-scale datasets, prediction and optimal decision making.

Natural Language Processing: Under the leadership of Giuseppe Carenini and Raymond Ng (Data Management and Mining Lab) this group's vision is to further our understanding of abstractive summarization, mining conversations and evaluative text, and natural language generation.

https://www.cs.ubc.ca/cs-research/lci/research-groups/machine-learning

University of Montreal Machine Learning Lab

The LISA (machine learning lab) aims towards improving our understanding of the principles that give rise to powerful learning and to intelligence, which will be important to make significant progress on learning algorithms and artificial intelligence (AI). Acquiring the kind of complex knowledge necessary for AI requires some form of learning, with the ability to discover hidden relationships and statistical structure that may be highly complex, with many interacting factors of variation explaining the observed high-dimensional data that sensors can provide. In our view, this is the main challenge for machine learning and AI.

Like the brain, deep learning algorithms are based on several levels of representation and processing, creating several levels of abstraction. Compared to learning algorithms based on shallower architectures, deep learners have the potential to efficiently represent highly complex functions and distributions. We explore various learning algorithms for deep learning, based in particular on unsupervised pre-training (e.g., various kinds of Boltzmann machines and auto-encoders). Unsupervised pre-training makes it possible to exploit very large quantities of mostly unlabeled examples (such as documents, images, and videos from the web).
The learned representations capture the salient factors of variation (and invariances) implicitly present in the data, and can be exploited in the context of several supervised learning tasks (multi-task learning, self-taught learning, semi-supervised learning).

http://lisa.iro.umontreal.ca/index_en.html

University of Sherbrooke

Artificial Intelligence
Three teams work in this research area; other projects are carried out by researchers working individually.

ASTUS (Apprentissage par Système Tutoriel de l'Université de Sherbrooke), the research team on intelligent tutoring systems, works on the following themes: knowledge representation, user modeling, human-machine interaction, educational psychology and cognitive science.

Prospectus (Prospection de données à l'Université de Sherbrooke), the research team on data mining, works on the following themes: data mining, knowledge discovery and modeling, pattern recognition, segmentation and classification, non-symbolic artificial intelligence methods, neural networks and Bayesian networks, and detection of latent structures and behaviors.

PLANIART, the research team on planning in artificial intelligence, works on the following themes: path planning, behavior planning and plan recognition in video games and mobile robotics. Planning makes it possible to decide what to do (goal decomposition), how to do it (resource allocation) and when to do it (scheduling).
http://www.usherbrooke.ca/informatique/recherche/domaines-de-recherche/intelligence-artificielle/

Centre de recherche sur les environnements intelligents
The Centre de Recherche sur les Environnements Intelligents (CREI, the research centre on intelligent environments) has 13 regular members, 11 associate members and more than sixty graduate students. CREI brings together 7 laboratories whose research interests cover digital imaging, artificial intelligence, modeling and validation, and ambient intelligence. CREI researchers have been collaborating for years, developing applications related to intelligent environments.

http://www.usherbrooke.ca/crei/

University of Laval Machine Learning Research Group

Selected Papers

2014

Luc Bégin, Pascal Germain, François Laviolette and Jean-Francis Roy. PAC-Bayesian Theory for Transductive Learning. International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.

2013

Sébastien Giguère, François Laviolette, Mario Marchand, Denise Tremblay, Sylvain Moineau, Éric Biron and Jacques Corbeil. Improved design and screening of high bioactivity peptides for drug discovery. Under Review.

Sébastien Giguère, Alexandre Drouin, Alexandre Lacoste, Mario Marchand, Jacques Corbeil, François Laviolette. MHC-NP: Predicting Peptides Naturally Processed by the MHC. Journal of Immunological Methods, 2013, vol. 400, p. 30-36.

Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant. A PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers. In ICML 2013.

Sébastien Giguère, François Laviolette, Mario Marchand, Khadidja Sylla. Risk Bounds and Learning Algorithms for the Regression Approach to Structured Output Prediction.
In ICML 2013.

Maxime Latulippe, Alexandre Drouin, Philippe Giguere, and François Laviolette. Accelerated Robust Point Cloud Registration in Natural Environments through Positive and Unlabeled Learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2013), 2013.

Sébastien Giguère, Mario Marchand, François Laviolette, Jacques Corbeil, and Alexandre Drouin. Learning a Peptide-Protein Binding Affinity Predictor with Kernel Ridge Regression. BMC Bioinformatics, 2013, vol. 14, no 1, p. 82.

http://graal.ift.ulaval.ca

More to come …

MACHINE LEARNING RESEARCH GROUPS in AMERICA, BRAZIL

USP - UNIVERSIDADE DE SÃO PAULO, Instituto de Ciências Matemáticas e de Computação
http://www.icmc.usp.br/Portal/

UFRJ - Federal University of Rio de Janeiro
UFMG - Federal University of Minas Gerais
UFRGS - Federal University of Rio Grande do Sul
Unicamp - University of Campinas
Unesp - São Paulo State University
UFSC - Federal University of Santa Catarina
UnB - University of Brasília
UFPR - Federal University of Paraná
UFPE - Federal University of Pernambuco
UNIFESP - Federal University of São Paulo
UFSCAR - Federal University of São Carlos
UERJ - Rio de Janeiro State University
UFSM - Federal University of Santa Maria
PUC-RIO - Pontifical Catholic University of Rio de Janeiro
UFC - Federal University of Ceará
UFBA - Federal University of Bahia
UFF - Fluminense Federal University
PUCRS - Pontifical Catholic University of Rio Grande do Sul
UFV - Federal University of Viçosa

More coming soon …

MACHINE LEARNING RESEARCH GROUPS in EUROPE, UK

University College London

The Centre for Computational Statistics and Machine Learning (CSML)
The Centre for Computational Statistics and Machine Learning (CSML) spans three departments at University College London: Computer Science, Statistical Science, and the Gatsby Computational Neuroscience Unit. The Centre will pioneer an emerging field that brings together statistics, the recent extensive advances in theoretically well-founded machine learning, and links with a broad range of application areas drawn from across the college, including neuroscience, astrophysics, biological sciences, complexity science, etc. There is a deliberate intention to maintain and cultivate a plurality of approaches within the centre including Bayesian, frequentist, on-line, statistical, etc.

http://www.csml.ucl.ac.uk

CASA (Centre for Advanced Spatial Analysis) Working Papers
http://www.bartlett.ucl.ac.uk/casa/latest/publications/working-papers

Example #198
A global inter-country economic model based on linked input-output models
We present a new, flexible and extensible alternative to multi-regional input-output (MRIO) for modelling the global economy. The limited coefficient set of MRIO (technical coefficients only) is extended to include two new sets of coefficients, import ratios and import propensities. These new coefficient sets assist in the interaction of the new model with other social science models such as those of trade, migration, international security and development aid. The model uses input-output models as descriptions of the internal workings of countries' economies, and couples these more loosely than in MRIO using trade data for commodities and services from the UN.
The model is constructed using a minimal number of assumptions, seeks to be as parsimonious as possible in terms of the number of coefficients, and is based to a great extent on empirical observation. Two new metrics are introduced, measuring sectors' economic significance and economic self-reliance per country. The Chinese vehicles sector is shown to be the world's most significant, and self-reliance is shown to be strongly correlated with population. The new model is shown to be equivalent to an MRIO under an additional assumption, allowing existing analysis techniques to be applied.

http://www.bartlett.ucl.ac.uk/casa/publications/working-paper-198

Oxford University

The Machine Learning Research Group in the Department of Engineering Science
The Machine Learning Research Group is a sub-group within Information Engineering (Robotics Research Group) in the Department of Engineering Science of the University of Oxford. We are interested in probabilistic reasoning applied to problems in science, engineering and computing. We use the tools of statistical, and in particular Bayesian, inference to deal rationally with uncertainty and information in a number of domains including astronomy, biology, finance, image & signal processing and multi-agent systems, as well as researching the theory of Bayesian modelling and inference.

http://www.robots.ox.ac.uk/~parg/doku.php?id=home

Machine Learning research in the Department of Computer Science
Machine Learning research in the Department of Computer Science evolves along the following directions:
Deep learning
Large scale machine learning and big data
Random forests and ensemble methods
Probabilistic graphical models
Bayesian optimisation
Reinforcement learning
Monte Carlo methods and randomised algorithms.
Applications to control, games, language understanding, computer vision, speech, time series, and all types of structured and unstructured data.

The group is part of a wider Machine Learning initiative at Oxford, which includes researchers in statistics (Yee Whye Teh, Arnaud Doucet, Chris Holmes) and information engineering (Michael Osborne, Steve Roberts, Frank Wood).

http://www.cs.ox.ac.uk/activities/machlearn/

Imperial College

Machine Learning Group
Transforming Big Data into Knowledge
The Machine Learning Group is a cross-faculty network of Imperial College’s Department of Computing. We embrace research at the interface of machine learning, artificial intelligence and its Big Data applications.

Research
With the ever-increasing use of the Internet, digital devices and science, a tremendous amount of data encapsulating valuable knowledge has become available. We reflect this impact in the many vibrant facets of this field from automated reasoning to probabilistic inference, from creative and affective computing to human-computer interaction, from machine vision to neurotechnology, from bioinformatics to medical & economic applications.

Broadly, members of the group belong to at least one of the two pillars of Machine Learning:
• Data-level machine learning to support feature extraction from data (“Big Data”)
• Knowledge-level machine learning and knowledge representation to extract readable and insightful relational knowledge which supports human-understandable machine inference

At the data-level, ongoing research focuses on applying a wide variety of feature-based machine learning techniques in key application areas.
Notable recent successes in these areas include the application of machine learning to medical imaging of the brain and heart (Rueckert), human emotions and social signals (Pantic, Zafeiriou), robotic vision (Davison), autonomous systems (Deisenroth), medical applications (Gillies), computational neuroscience and Brain-Machine Interfaces (Faisal). At the knowledge-level, our key expertise lies in Relational and First-Order Logic Learning. Past research had major impact in scientific discovery in biological prediction tasks (Muggleton), security and semi-automated software engineering (Russo). Closely related areas include smart analysis of biological or economic network topologies (Przulj), robust systems optimisation (Parpas) and scalable data analytics (Pietzuch).
http://wp.doc.ic.ac.uk/mlg/

The Data Science Institute
The Data Science Institute at Imperial College is being established to conduct research on the foundations of data science by developing advanced theory, technology and systems that will contribute to the state-of-the-art in data science and big data, and support data-driven research at Imperial and beyond. The Institute will empower Imperial and its partners to collaborate in the pursuit of world class data-driven innovation.
http://www3.imperial.ac.uk/data-science

The University of Edinburgh, Institute for Adaptive and Neural Computation
http://www.anc.ed.ac.uk/machine-learning/

Cambridge University
We are a part of the Computational and Biological Learning Laboratory located in the Department of Engineering at the University of Cambridge. The research in our group is very broad, and we are interested in all aspects of machine learning. Particular strengths of the group are in Bayesian approaches to modelling and inference in statistical applications.
The type of work we do can range from studying fundamental concepts in applied Bayesian statistics, all the way to getting our algorithms to perform competitively against the state-of-the-art in big-data applications. We also work in a broad range of application domains, including neuroscience, bioinformatics, finance, social networks, and physics, just to name a few, and we actively seek to collaborate with other groups within the Department of Engineering, throughout the university as a whole, and with other groups within the UK and around the world. If you are interested in finding out more about our research, please visit our Publications page, or visit the individual research pages of our group members.
http://mlg.eng.cam.ac.uk

Queen Mary University of London

Centre for Intelligent Sensing
I am delighted to introduce you to the Centre for Intelligent Sensing (CIS). CIS is a focal point for research in Intelligent Sensing at Queen Mary University of London. The Centre focuses on breakthrough innovations in computational intelligence that will have a major impact in transforming the way humans and machines utilise a variety of sensor inputs for interpretation and decision making. The Centre gathers 33 academics with expertise in all aspects of intelligent sensing, from the design and building of the physical sensors to the mathematical and computational challenges of extracting key information from real-time streams of high-dimensional data acquired by networks of sensors. The legal, ethical and social implications of these processes are also addressed.
CIS researchers have an outstanding international reputation in camera and sensor networks, image and signal processing, computer vision, data mining, pattern recognition, machine learning, bio-inspired computing, human-computer interaction, affective computing and social signal processing. The Centre also provides post-graduate research and teaching in Intelligent Sensing, and is responsible for the MSc programme in Computer Vision. I do hope that you will enjoy reading this brochure and learning more about who we are and how the research we do helps to address important societal challenges. I also invite you to keep up to date with our activities by following us on Twitter @intelsensing and to enjoy our research videos at http://cis.eecs.qmul.ac.uk.
Professor Andrea Cavallaro, Director
http://cis.eecs.qmul.ac.uk
Videos
https://www.youtube.com/user/intelsensing/feed?spfreload=10

ICRI, The Intel Collaborative Research Institute
The Intel Collaborative Research Institute is concerned with how to enhance the social, economic and environmental well-being of cities by advancing compute, communication and social constructs to deliver innovations in system architecture, algorithms and societal participation.
http://www.cities.io

MACHINE LEARNING RESEARCH GROUPS in EUROPE, FRANCE

Magnet, MAchine learninG in information NETworks, INRIA, France
The Magnet project aims to design new machine learning based methods geared towards mining information networks. Information networks are large collections of interconnected data and documents like citation networks and blog networks among others. For this, we will define new structured prediction methods for (networks of) texts based on machine learning algorithms in graphs.
Such algorithms include node classification, link prediction, clustering and probabilistic modeling of graphs. Envisioned applications include browsing, monitoring and recommender systems, and more broadly information extraction in information networks. Application domains cover social networks for cultural data and e-commerce, and biomedical informatics.
https://team.inria.fr/magnet/

Sierra Team - Ecole Normale Superieure, CNRS, INRIA
SIERRA is based in the Laboratoire d'Informatique de l'École Normale Supérieure (CNRS/ENS/INRIA UMR 8548) and is a joint research team between INRIA Rocquencourt, École Normale Supérieure de Paris and Centre National de la Recherche Scientifique. We follow four main research directions:
Supervised learning: This part of our research focuses on methods where, given a set of examples of input/output pairs, the goal is to predict the output for a new input, with research on kernel methods, calibration methods, structured prediction, and multi-task learning.
Unsupervised learning: We focus here on methods where no output is given and the goal is to find structure of certain known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization, statistical tests, dimension reduction, and semi-supervised learning.
Parsimony: The concept of parsimony is central to many areas of science. In the context of statistical machine learning, this takes the form of variable or feature selection. The team focuses primarily on structured sparsity, with theoretical and algorithmic contributions.
Optimization: Optimization in all its forms is central to machine learning, as many of its theoretical frameworks are based at least in part on empirical risk minimization. The team focuses primarily on convex and bandit optimization.
http://www.di.ens.fr/sierra/

ENS Ecole Normale Superieure
The Computer Science Department of ENS (DI ENS) is both a teaching department and a research laboratory affiliated with CNRS and INRIA (UMR 8548). On the teaching side, the DI ENS trains students through its Pre-doctoral program and the Masters program (MPRI). On the research side, the research is structured into research groups. The DI ENS is a member of the Fondation Sciences Mathématiques de Paris. The Computer Services (SPI) and the Mathematics and Computer Science Library are common to the DI ENS and the Department of Mathematics and Applications (DMA).
Teams of the Computer Science Department at École normale supérieure:
- Antique: Static analysis by abstract interpretation (head: Xavier Rival)
- Cascade: Cryptography (head: David Pointcheval)
- Data: Signal Processing and Classification (head: Stéphane Mallat)
- Dyogene: Dynamics of Geometric Networks (head: Marc Lelarge)
- Parkas: Parallelism of Synchronous Kahn Networks (head: Marc Pouzet)
- Sierra: Machine Learning (head: Francis Bach)
- Talgo: Theory, Algorithms, topoLogy, Graphs, and Optimization (head: Claire Mathieu)
- Willow: Artificial Vision (head: Jean Ponce)
http://www.di.ens.fr

WILLOW Publications and PhD Thesis
Our research is concerned with representational issues in visual object recognition and scene understanding. Our objective is to develop geometric, physical, and statistical models for all components of the image interpretation process, including illumination, materials, objects, scenes, and human activities. These models will be used to tackle fundamental scientific challenges such as three-dimensional (3D) object and scene modeling, analysis, and retrieval; human activity capture and classification; and category-level object and scene recognition.
They will also support applications with high scientific, societal, and/or economic impact in domains such as quantitative image analysis for archaeology and cultural heritage conservation; film post-production and special effects; and video annotation, interpretation, and retrieval. Moreover, machine learning now represents a significant part of computer vision research, and one of the aims of the project is to foster the joint development of contributions to machine learning and computer vision, together with algorithmic and theoretical work on generic statistical machine learning.
http://www.di.ens.fr/willow/publications/YearOnly/publications.html

Laboratoire Hubert Curien UMR CNRS 5516, Machine Learning
Group leader: Marc Sebban
Machine learning is the sub-field of artificial intelligence and computer science that studies how machines can learn. A machine learns when it modifies its own behavior as the result of its past experience and performance. Because of this need to analyze past experience, machine learning techniques are closely related to data mining techniques. The Machine Learning team is divided into two collaborating sub-projects, one more specialised in statistical learning theory and one more specialised in data mining and information retrieval.
In the first sub-project, statistical learning theory, the precise focus is on:
- Metric Learning,
- Transfer Learning and Domain Adaptation
- Machine Learning for Computer Vision Applications
- Machine Learning for Natural Language Processing
In the data mining and information retrieval sub-project, the focus is on:
- Developing methods to efficiently mine structured data: documents, graphs, social networks, etc.,
- Modeling heterogeneous structured documents for information retrieval,
- Data Mining for Image and Video Analysis
http://laboratoirehubertcurien.fr/spip.php?rubrique28

MACHINE LEARNING RESEARCH GROUPS in EUROPE, GERMANY

Max Planck Institute for Intelligent Systems, Tübingen site
Intelligent systems can optimise their structure and properties in order to successfully function within a complex, partially changing environment. Three sub-areas (perception, learning and action) can be differentiated here. The scientists at the Max Planck Institute for Intelligent Systems are carrying out basic research and development of intelligent systems in all three sub-areas. Research expertise in the areas of computer science, material science and biology is brought together in one Institute, at two different sites. Machine learning, image recognition, robotics and biological systems will be investigated in Tübingen, while so-called learning material systems, micro- and nanorobotics, as well as self-organisation will be explored in Stuttgart. Although the focus is on basic research, the Institute has a high potential for practical applications in, among other areas, robotics, medical technology, and innovative technologies based on new materials.
http://www.mpg.de/1342929/intelligenteSystemeTuebingen

BRML Research Lab, Institute of Informatics at the Technische Universität München
Patrick van der Smagt's BRML is a collaborative research lab of fortiss (an Institute at TUM); the Chair for Robotics and Embedded Systems, Institute of Informatics at the Technische Universität München; and the DLR Institute of Robotics and Mechatronics.
The heart of our inforfacious research is formed by machine learning. Within that, we focus on biomechanics and body-machine interfaces. We apply our methods to advanced rehabilitation and assistive robotics.
http://brml.org

HCI, Heidelberg Collaboratory for Image Processing, Universität Heidelberg
The HCI is an "Industry on Campus" project established in the context of the German excellence initiative jointly by the University of Heidelberg and the following companies:...
The HCI was established in January 2008 and moved to its new premises in March 2008.
The HCI consists of four chairs and one associate group:
- Computer Vision (Ommer lab)
- Digital Image Processing (Jähne lab)
- Image and Pattern Analysis (Schnörr lab)
- Image Processing and Modelling (Garbe lab)
- Multidimensional Image Processing (Hamprecht lab)
The strategic concept of the HCI is built on the simple fact that basic problems in image processing are largely application-independent. The approximately 80 scientists working in the HCI conduct basic research with the aim of providing cutting-edge solutions to basic image analysis problems for applications in industry, environmental and life sciences. The HCI is part of the institutional strategy of the University of Heidelberg within the Excellence Initiative.
http://hci.iwr.uni-heidelberg.de

MACHINE LEARNING RESEARCH GROUPS in EUROPE, SWITZERLAND

EPFL Ecole Polytechnique Federale de Lausanne, Switzerland
Artificial Intelligence & Machine Learning
The modern world is full of artificial, abstract environments that challenge our natural intelligence.
The goal of our research is to develop Artificial Intelligence that gives people the capability to master these challenges, ranging from formal methods for automated reasoning to interaction techniques that stimulate truthful elicitation of preferences and opinions. Another aspect is characterizing human intelligence and cognitive science, with applications in human-computer interaction and computer animation. Machine Learning aims to automate the statistical analysis of large complex datasets by adaptive computing. A core strategy to meet growing demands of science and applications, it provides a data-driven basis for automated decision making and probabilistic reasoning. Machine learning applications at EPFL range from natural language and image processing to scientific imaging as well as computational neuroscience.
http://ic.epfl.ch/intelligence-artificielle-et-apprentissage-automatique

IDSIA: the Swiss AI Lab
The Swiss AI Lab IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) is a non-profit oriented research institute for artificial intelligence, affiliated with both the Faculty of Informatics of the Università della Svizzera Italiana and the Department of Innovative Technologies of SUPSI, the University of Applied Sciences of Southern Switzerland. We focus on machine learning (deep neural networks, reinforcement learning), operations research, data mining, and robotics.
IDSIA researchers win nine international competitions
Our neural networks research team has won nine international competitions in machine learning and pattern recognition. Follow the link to learn more about the methods that allowed us to achieve these results.
http://www.idsia.ch
MACHINE LEARNING RESEARCH GROUPS in EUROPE, NETHERLANDS

Machine Learning Research Groups in The Netherlands
A large number of researchers and research groups are active in the broad area of machine learning, ranging from Bayesian inference to robotics and neural networks. A brief overview is collected here; the researchers can be contacted for more information.
http://www.mlplatform.nl/researchgroups/

MACHINE LEARNING RESEARCH GROUPS in EUROPE, POLAND

University of Warsaw, Dept. of Mathematics, Informatics and Mechanics
Algorithms group
Our research
The research of our group focuses on several branches of modern algorithmics and the underlying fields of discrete mathematics. The latter include combinatorics on words and on ordered sets, graph theory, formal languages, computational geometry, information theory, and foundations of cryptography. The research on algorithms covers parallel and distributed algorithms, large scale algorithms, approximation and randomized algorithms, fixed-parameter and exponential-time algorithms, dynamic algorithms, radio algorithms, multi-party computations, and cryptographic protocols.
http://zaa.mimuw.edu.pl
More to come …

MACHINE LEARNING RESEARCH GROUPS in ASIA, INDIA

RESEARCH LABS, Department of Computer Science and Automation, IISc, Bangalore
The department houses a number of research labs, each dedicated to a focused area of research. The lab members comprise faculty, students (both ME and research students), and dedicated project staff. The labs are usually equipped with specialized software and computing facilities, and carry out work on various projects in their area.
http://www.csa.iisc.ernet.in/research/research-reslabs.php

MLSIG: Machine Learning Special Interest Group, Indian Institute of Science
The Machine Learning Special Interest Group (MLSIG) is a group of faculty and students at the Indian Institute of Science in Bangalore, who share interests in machine learning and related fields. The group enjoys the presence of several
outstanding faculty engaged in cutting-edge research on a variety of aspects of machine learning and related fields, ranging from theoretical foundations to new algorithms as well as several exciting applications; highly motivated PhD and Masters' research students who complement and expand the energy of the faculty; and close proximity and partnerships with a variety of industry research laboratories, both within Bangalore and outside the city.
http://drona.csa.iisc.ernet.in/~mlcenter/

Indian Institute of Technology of Kanpur
https://www.google.com/search?q=machine%20learning&domains=iitk.ac.in&sitesearch=www.iitk.ac.in&gws_rd=ssl
More to come …

MACHINE LEARNING RESEARCH GROUPS in ASIA, CHINA

Peking University
School of Electronics Engineering and Computer Science
We have built strong cooperation with many famous academic organizations, e.g., University of California at Berkeley, University of California at Los Angeles, Stanford University, University of Illinois at Urbana-Champaign, Oxford University, University of Edinburgh, Paris High Division, University of Tokyo, Waseda University. These collaborations cover most of our research directions: from electronic communication, optical communication, to quantum communication; from computer hardware, software, to network; from micro-electromechanical systems to nano techniques; from machine perception to machine intelligence.

Center for Information Science
Main Research Areas
• Machine Vision: Image processing, image and video compression, pattern recognition and machine learning, biometrics, 3-D visual information processing.
• Machine Audition: Computational auditory models, speech signal processing, spoken language processing, natural language processing, intelligent human-machine interaction.
• Intelligent Information Systems: Computational intelligence, multimedia resource organization and management, data mining and content-oriented massive information integration, analysis, processing and service.
• Physiology and Psychology for Machine Perception: Electro-physiology, psychophysics and neurophysiology of vision and audition, theories and methods of hearing rehabilitation.
http://www.cis.pku.edu.cn/
http://eecs.pku.edu.cn/eecs_english/CnterInfoScience.shtml

Institute of Computational Linguistics
Main Research Areas
• Comprehensive Language Knowledge Databases, including large scale word-level information database for the Chinese language.
• Corpus based NLP, including large scale corpus processing and statistical models and theories.
• Domain Knowledge Construction, including computational terminology and term database construction.
• Multilingual Semantic Lexicons, focusing on the study of a Chinese concept dictionary.
• Computer-aided Translation, focusing on translation methods for technical documents.
• Information Retrieval, Extraction and Summarization, including various levels of document processing such as document retrieval, topic extraction, summarization, and question answering.
http://eecs.pku.edu.cn/index.aspx?menuid=5&type=articleinfo&lanmuid=84&infoid=232&language=cn
http://eecs.pku.edu.cn/eecs_english/InstComputationalLinguistics.shtml
PKU Real course online
http://www.grids.cn/

Beijing University of Technology
Beijing Key Lab of Multimedia and Intelligent Software Technology
Artificial Intelligence and Knowledge Engineering
The research fields in this direction include fundamental research of Knowledge Science and Knowledge Engineering, research and application of Data Mining and Machine Learning, and Knowledge-Based Computer Aided Animation Generation.
In those fields, the laboratory has carried out 8 programs from the National Natural Science Foundation (including 1 subprogram of a major research program of the National Natural Science Foundation), 1 program from Key Programs in the National Science & Technology Pillar Program, 5 programs from 863 High-Tech Programs, and 3 programs from the Beijing Natural Science Foundation, and has twice won the second prize of the Advanced Science & Technology Award of Beijing.
http://bjut.edu.cn/bjut_en/detail.jsp?articleID=4171

University of Science and Technology of China, USTC
http://en.wikipedia.org/wiki/University_of_Science_and_Technology_of_China

Nanjing University
Lamda Group
LAMDA is affiliated with the National Key Laboratory for Novel Software Technology and the Department of Computer Science & Technology, Nanjing University, China. It is located in the Computer Science and Technology Building on the Xianlin campus of Nanjing University, mainly in Rm910. The Founding Director of LAMDA is Prof. Zhi-Hua Zhou. "LAMDA" means "Learning And Mining from DatA". The main research interests of LAMDA include machine learning, data mining, pattern recognition, information retrieval, evolutionary computation, neural computation, and some other related areas. Currently our research mainly involves: ensemble learning, semi-supervised and active learning, multi-instance and multi-label learning, cost-sensitive and class-imbalance learning, metric learning, dimensionality reduction and feature selection, structure learning and clustering, theoretical foundations of evolutionary computation, improving comprehensibility, content-based image retrieval, web search and mining, face recognition, computer-aided medical diagnosis, bioinformatics, etc.
http://lamda.nju.edu.cn/MainPage.ashx
More to come …

MACHINE LEARNING RESEARCH GROUPS in ASIA, RUSSIA

Moscow State University
http://www.msu.ru/
More to come …

MACHINE LEARNING RESEARCH GROUPS in AFRICA
More to come …

MACHINE LEARNING RESEARCH GROUPS in OCEANIA

NICTA Machine Learning Research Group, Australia
We want to change the world. Machine learning is a powerful technology that can help solve almost any problem. We think about it differently to much of the machine learning research community. We focus on important and challenging problems such as
• Navigating the world’s patent literature
• Finding sites for geothermal energy production
• Predicting the output of rooftop solar photovoltaic systems
• Building actionable data analytics for the enterprise
• Managing the traffic in large cities
• Predicting failures of widespread infrastructure
We develop new technologies to solve these problems and make them freely available or commercially deploy them. We regularly host visitors and regularly have job openings and opportunities for PhD students. If you also want to change the world, come and join us.
http://www.nicta.com.au/research/machine_learning
http://nicta.com.au/research/machine_learning/research_topics
More to come …

Academics (with free access to their publications): DRAFT, A LOT MORE TO COME SOON

Academics (with free access to their publications), US

Stanford University

Andrew Ng
Added in the kit before 24-Oct-2014
Andrew Ng is a Co-founder of Coursera and the Director of the Stanford AI Lab.
In 2011 he led the development of Stanford University’s main MOOC (Massive Open Online Courses) platform and also taught an online Machine Learning class that was offered to over 100,000 students, leading to the founding of Coursera. Ng’s goal is to give everyone in the world access to a high quality education, for free. Today, Coursera partners with some of the top universities in the world to offer high quality free online courses. It is the largest MOOC platform in the world. Outside online education, Ng’s work at Stanford is on machine learning with an emphasis on deep learning. He also founded and led a project at Google to develop massive-scale deep learning algorithms. It resulted in the famous cat detector popularly known as the “Google cat”, in which a massive neural network with 1 billion parameters learned from unlabeled YouTube videos.
http://cs.stanford.edu/people/ang/?page_id=414

Carnegie Mellon University

Tom Mitchell
Dr. Mitchell works on new learning algorithms, such as methods for learning from labeled and unlabeled data. Much of his research is driven by applications of machine learning such as understanding natural language text, and analyzing fMRI brain image data to model human cognition.
http://www.cs.cmu.edu/~tom/

Robert Kass
Dr. Kass has long-standing interests in the Bayesian approach to statistical inference, and has contributed to the development of Bayesian methods and their computational implementation. Over the past 10 years he has focused on statistical problems in neuroscience, especially in the analysis of signals coming from single neurons and from multiple neurons recorded simultaneously.
http://www.stat.cmu.edu/~kass/

Alexander J. Smola
Researcher, Google
Professor, Carnegie Mellon University
Interests
My primary research interest covers the following four areas:
• Scalability of algorithms. This means pushing algorithms to internet scale, distributing them on many (faulty) machines, showing convergence, and modifying models to fit these requirements. For instance, randomized techniques are quite promising in this context. In other words, I'm interested in big data.
• Kernel methods are quite an effective means of making linear methods nonlinear and nonparametric. My research interests include support vector machines, Gaussian processes, and conditional random fields. Kernels are also very useful for the representation of distributions, that is, two-sample tests, independence tests and many applications to unsupervised learning.
• Statistical modeling, primarily with Bayesian nonparametrics, is a great way of addressing many modeling problems. Quite often, the techniques overlap with kernel methods and scalability in rather delightful ways.
• Applications, primarily in terms of user modeling, document analysis, temporal models, and modeling data at scale, are a great source of inspiration. That is, how can we find principled techniques to solve the problem, what are the underlying concepts, how can we solve things automatically.
http://alex.smola.org
http://videolectures.net/site/search/?q=smola
2,028 subscribers (screenshot 17-Jan-2015), 122,631 views, joined 30 Dec 2006
https://www.youtube.com/channel/UCYoS2VT03weLA7uzvL2Vybw?spfreload=10

Princeton University, US

Robert Schapire
Added in the kit before 24-Oct-2014
Robert Elias Schapire is the David M. Siegel '83 Professor in the computer science department at Princeton University. His primary specialty is theoretical and applied machine learning. His work led to the development of the boosting meta-algorithm used in machine learning. Together with Yoav Freund, he invented the AdaBoost algorithm in 1996.
He received the Gödel prize in 2003 for his work on AdaBoost with Yoav Freund. In 2014, Schapire was elected to the National Academy of Engineering for his contributions to machine learning through the invention and development of boosting algorithms. (Source Wikipedia)
http://www.cs.princeton.edu/~schapire/
http://mitpress.mit.edu/sites/default/files/titles/content/9780262017183_sch_0001.pdf

Mona Singh
Added in the kit before 24-Oct-2014
My group develops algorithms for a diverse set of problems in computational molecular biology. We are particularly interested in predicting specificity in protein interactions and uncovering how molecular interactions and functions vary across context, organisms and individuals. We leverage high-throughput biological datasets in order to develop data-driven algorithms for predicting protein interactions and specificity; for analyzing biological networks in order to uncover cellular organization, functioning, and pathways; for uncovering protein functions via sequences and structures; and for analyzing proteomics and sequencing data. An appreciation of protein structure guides much of our research.
http://www.cs.princeton.edu/~mona/

Olga Troyanskaya
Added in the kit before 24-Oct-2014
The goal of my research is to bring the capabilities of computer science and statistics to the study of gene function and regulation in the biological networks through integrated analysis of biological data from diverse data sources, both existing and yet to come (e.g. from diverse gene expression data sets and proteomic studies). I am designing systematic and accurate computational and statistical algorithms for biological signal detection in high-throughput data sets.
More specifically, I am interested in developing methods for better gene expression data processing and algorithms for integrated analysis of biological data from multiple genomic data sets and different types of data sources (e.g. genomic sequences, gene expression, and proteomics data).
http://reducio.princeton.edu/cm/node/13

UCLA, US

Judea Pearl, Cognitive System Laboratory
Added in the kit before 24-Oct-2014
Judea Pearl (born 1936) is an Israeli-born American computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks (see the article on belief propagation). He is also credited for developing a theory of causal and counterfactual inference based on structural models (see article on causality). He is the 2011 winner of the ACM Turing Award, the highest distinction in computer science, "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning". (Source Wikipedia)
http://bayes.cs.ucla.edu/csl_papers.html

Rice University, US

Justin Esarey Lectures, Assistant Professor of Political Science
Dr. Justin Esarey is an Assistant Professor of Political Science at Rice University who specializes in political methodology. His areas of expertise include detecting and presenting context-specific relationships, model specification and sensitivity, the analysis of binary data, laboratory social experimentation, and promoting thoughtful inference (and thinking about inference) by using technology to make methodological resources available to the scholarly public.
His recent substantive projects study the relationship between corruption and female participation in government, the effect of "naming and shaming" on human rights abuse, and the behavioral implications of political ideology.
https://www.youtube.com/user/jeesarey/videos?spfreload=10

Justin Esarey Publications & Software, Assistant Professor of Political Science, Rice University
http://jee3.web.rice.edu/research.htm

University of Maryland, US

Hal Daumé III
Added in the kit before 24-Oct-2014
I am Hal Daumé III, an Associate Professor in Computer Science (also UMIACS and Linguistics) at the University of Maryland; I was previously in the School of Computing at the University of Utah (CV). Although I'd like to be known for my research in language (computational linguistics and natural language processing) and machine learning (structured prediction, domain adaptation and Bayesian methods), I am probably best known for my NLPers blog. I associate myself most with conferences like ACL, ICML, EMNLP and NIPS. At UMD, I'm affiliated with the Computational Linguistics lab, the machine learning reading group, the language science program and the AI group, and interact closely with LINQS and computer vision.
http://hal3.name

Portland State University

Melanie Mitchell
Research
My research interests: Artificial intelligence, machine learning, and complex systems. Evolutionary computation and artificial life. Understanding how natural systems perform computation, and how to use ideas from natural systems to develop new kinds of computational systems. Cognitive science, particularly computer modeling of perception and analogy-making, emergent computation and representation, and philosophical foundations of cognitive science.
Biographical Sketch
Melanie Mitchell is Professor of Computer Science at Portland State University, and External Professor and Member of the Science Board at the Santa Fe Institute. She attended Brown University, where she majored in mathematics and did research in astronomy, and the University of Michigan, where she received a Ph.D. in computer science. Her dissertation, in collaboration with her advisor Douglas Hofstadter, was the development of Copycat, a computer program that makes analogies. She has held faculty or professional positions at the University of Michigan, the Santa Fe Institute, Los Alamos National Laboratory, the OGI School of Science and Engineering, and Portland State University. She is the author or editor of five books and over 70 scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her most recent book, Complexity: A Guided Tour (Oxford, 2009), won the 2010 Phi Beta Kappa Science Book Award. It was also named by Amazon.com as one of the ten best science books of 2009, and was longlisted for the Royal Society's 2010 book prize. Melanie directs the Santa Fe Institute's Complexity Explorer project, which offers online courses and other educational resources related to the field of complex systems.
http://web.cecs.pdx.edu/~mm/

Academics (with free access to their publications), FRANCE

Ecole Normale Superieure, FRANCE

Francis Bach
Added in the kit before 24-Oct-2014
http://www.di.ens.fr/~fbach/
http://videolectures.net/francis_r_bach/

INRIA

Gaël Varoquaux
Machine learning and brain imaging researcher
• Research faculty (CR1), Parietal team, INRIA
• Associate researcher, Unicog team, INSERM

ACADEMIC RESEARCH
Machine learning to link cognition with brain activity: I am interested in data mining of functional brain images (fMRI) to learn models of brain function.
• Machine learning for encoding / decoding models
• Spatial penalties for learning and denoising
• Resting-state methods
• Functional parcellations of the brain
• Functional connectivity
My publications page and my Google Scholar page. Research at Parietal.

OPEN-SOURCE SOFTWARE
Core contributor to scientific computing in Python:
• scikit-learn: Machine learning in Python
• joblib: lightweight pipelining of scientific code
• Mayavi: 3D plotting and scientific visualization
• nilearn: Machine learning for NeuroImaging
I am editor of the scipy lecture notes. See my view on scientific computing.
http://gael-varoquaux.info
http://videolectures.net/gael_varoquaux/

Academics (with free access to their publications), UK

University College London, UK

John Shawe-Taylor
Added in the kit before 24-Oct-2014
John S Shawe-Taylor is a professor at University College London (UK) where he is Director of the Centre for Computational Statistics and Machine Learning (CSML). His main research area is Statistical Learning Theory, but his contributions range from Neural Networks, to Machine Learning, to Graph Theory. John Shawe-Taylor obtained a PhD in Mathematics at Royal Holloway, University of London in 1986. He subsequently completed an MSc in the Foundations of Advanced Information Technology at Imperial College. He was promoted to Professor of Computing Science in 1996. He has published over 150 research papers. He moved to the University of Southampton in 2003 to lead the ISIS research group. He has been appointed the Director of the Centre for Computational Statistics and Machine Learning at University College, London from July 2006. He has coordinated a number of European wide projects investigating the theory and practice of Machine Learning, including the NeuroCOLT projects.
He is currently the scientific coordinator of a Framework VI Network of Excellence in Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) involving 57 partners.
http://www0.cs.ucl.ac.uk/staff/J.Shawe-Taylor/
http://videolectures.net/site/search/?q=John+Shaw-Taylor

Mark Herbster
Added in the kit before 24-Oct-2014
My research currently focuses on the problem of predicting a labeling of a graph. This problem is foundational for transductive and semi-supervised learning. Initial bounds and experimental results are given in Online learning over graphs. The paper Prediction on a graph with a perceptron significantly improves on previous results in terms of the tightness and interpretability of the bounds. In the recent work A fast method to predict the labeling of a tree we've developed methods to speed up graph prediction methods. I am also broadly interested in online learning, see my publications page for more details.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/pubs/
http://videolectures.net/mark_herbster/

David Barber
Added in the kit before 24-Oct-2014
David Barber received a BA in Mathematics from Cambridge University and subsequently a PhD in Theoretical Physics (Statistical Mechanics) from Edinburgh University. He is currently Reader in Information Processing in the department of Computer Science UCL where he develops novel information processing schemes, mainly based on the application of probabilistic reasoning. Prior to joining UCL he was a lecturer at Aston and Edinburgh Universities.
http://web4.cs.ucl.ac.uk/staff/d.barber/publications/david_barber_online.html
http://videolectures.net/site/search/?q=david+barber

Gabriel Brostow
Added in the kit before 24-Oct-2014
My name is Gabriel Brostow, and I am an associate professor (Senior Lecturer) in Computer Science here at UCL.
My group explores research problems relating to Computer Vision and Computer Graphics. The students and colleagues here have diverse interests, but my focus is on "Smart Capture" for analysis and synthesis applications. To me, smart capture of visual data (usually video) means having or finding satisfying answers to these questions about a system, whether interactive or fully automated:
I) Does the system know the intended purpose of the data being captured?
II) Can the system assess its own accuracy?
III) Does the system compare new inputs to old ones?
I love this field because it allows us to apply our expertise to a variety of tough problems, including film and photo special effects (computational photography), action analysis (of people, animals, and cells), and authoring systems (for architecture, animation, presentations) that make the most of user effort. "Motion reveals everything" used to be my main research mantra, but that has now taken hold sufficiently (obviously NOT just through my efforts!) that it no longer needs championing.
http://www0.cs.ucl.ac.uk/staff/g.brostow/#Research

Jun Wang
Added in the kit before 24-Oct-2014
My research focus is on the areas of information retrieval, large scale data mining, multimedia content analysis, and statistical pattern recognition; current research covers both theoretical and practical aspects: portfolio theory and statistical modeling of information retrieval, data mining and collaborative filtering (recommendation), web economy and online advertising, user-centric information seeking, social, "the wisdom of crowds", approaches for content understanding, organisation, and retrieval, peer-to-peer information retrieval and filtering, and multimedia content analysis, indexing and retrieval.
http://scholar.google.com/citations?user=wIE1tY4AAAAJ&hl=en

David Jones Lab
Added in the kit before 24-Oct-2014
My main research interests are in protein structure prediction and analysis, simulations of protein folding, Hidden Markov Model methods, transmembrane protein analysis, machine learning applications in bioinformatics, de novo protein design methodology, and genome analysis including the application of intelligent software agents. New areas of research include the use of high throughput computing and Grid technology for bioinformatics applications, analysis and prediction of protein disorder, expression array data analysis and the analysis and prediction of protein function and protein-protein interactions.
http://bioinf.cs.ucl.ac.uk/publications/

Simon Prince
Added in the kit before 24-Oct-2014
My initial work addressed human stereo vision. My doctoral thesis concerned the solution of the binocular stereo correspondence problem in the human visual system. I also worked on the physiology of stereo vision in my subsequent post-doctoral research. I became interested in computer vision and made the switch in 2000. My first Computer Science research was on time-series methods for the solution of the inverse problem in Optical Tomography with Simon Arridge at UCL. In Singapore, I worked for several years on augmented reality. This involved developing algorithms for camera pose estimation, and a three-dimensional video-conferencing system using real-time image based rendering. More recently, I have worked on face detection in a novel foveated sensor system. I am interested in face recognition in general and have presented work on how to recognize faces in the presence of large pose and lighting changes. I am interested in most areas of computer vision and computer graphics, and still maintain active links with the neuroscience and medical imaging communities.
http://web4.cs.ucl.ac.uk/research/vis/pvl/
http://www.computervisionmodels.com

Massimiliano Pontil
Added in the kit before 24-Oct-2014
I am mainly interested in machine learning theory and pattern recognition. I also have some interest in function representation and approximation, numerical optimization and statistics. I have worked on different machine learning approaches, particularly on regularization methods, such as support vector machines and other kernel-based methods, multi-task and transfer learning, online learning and learning over graphs. I have also worked on machine learning applications arising in computer vision, natural language processing, bioinformatics and user modeling.
http://www0.cs.ucl.ac.uk/staff/M.Pontil/pubs.html

Cambridge University, UK

Richard E Turner
Added in the kit before 24-Oct-2014
Richard Turner holds a Lectureship (equivalent to US Assistant Professor) in Computer Vision and Machine Learning in the Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, UK. Before taking up this position, he held an EPSRC Postdoctoral research fellowship which he spent at both the University of Cambridge and the Laboratory for Computational Vision, NYU, USA. He has a PhD degree in Computational Neuroscience and Machine Learning from the Gatsby Computational Neuroscience Unit, UCL, UK and a M.Sci. degree in Natural Sciences (specialism Physics) from the University of Cambridge, UK.
http://scholar.google.com/citations?user=DgLEyZgAAAAJ&hl=en

Oxford University, UK

Phil Blunsom
Added in the kit before 24-Oct-2014
My research interests lie at the intersection of machine learning and computational linguistics.
I apply machine learning techniques, such as graphical models, to a range of problems relating to the understanding, learning and manipulation of language. Recently I have focused on structural induction problems such as grammar induction and learning statistical machine translation models.
http://scholar.google.co.uk/citations?user=eJwbbXEAAAAJ&hl=en

Nando de Freitas
Added in the kit before 24-Oct-2014
I want to understand intelligence and how minds work. My research is multi-disciplinary and focuses primarily on the following areas:
• Machine learning, big data, and computational statistics
• Artificial intelligence, probabilistic reasoning, and decision making
• Computational neuroscience, neural networks, and cognitive science
• Randomized algorithms, and Monte Carlo simulation
• Vision, robotics, and speech perception
http://scholar.google.co.uk/citations?user=nzEluBwAAAAJ&hl=en

Karl Hermann
Added in the kit before 24-Oct-2014
My research is at the intersection of Natural Language Processing and Machine Learning, with particular emphasis on semantics. Current topics of interest include:
• Compositional Semantics
• Learning from Multilingual Data
• Semantic Frame Identification
• Machine Translation
• Hypergraph Grammars
http://www.cs.ox.ac.uk/people/publications/personal/KarlMoritz.Hermann.html

Edward Grefenstette
Added in the kit before 24-Oct-2014
I am a Franco-American computer scientist, working as a research assistant on EPSRC Project EP/I03808X/1 entitled A Unified Model of Compositional and Distributional Semantics: Theory and Applications. I am also lecturing at Hertford College to students taking Oxford's new computer science and philosophy course. From October 2013, I will also be a Fulford Junior Research Fellow at Somerville College.
http://www.cs.ox.ac.uk/people/publications/date/Edward.Grefenstette.html

Delft University of Technology, NETHERLANDS

Thomas Geijtenbeek Publications & Videos
Added in the kit 08-Nov-2014
I am a postdoctoral researcher at Delft University of Technology. My main research interests are simulation, control, animation and artificial intelligence. In addition, I work part-time as Manager Software Development at Motek Medical.
http://goatstream.com/research/

Academics (with free access to their publications), CANADA

University of Montreal, CANADA

Yoshua Bengio
Added in the kit before 24-Oct-2014
My long-term goal is to understand intelligence; understanding the underlying principles would deliver artificial intelligence, and I believe that learning algorithms are essential in this quest. Machine learning algorithms attempt to endow machines with the ability to capture operational knowledge through examples, e.g., allowing a machine to classify or predict correctly in new cases. Machine learning research has been extremely successful in the past two decades and is now applied in many areas of science and technology, some well known examples including web search engines, natural language translation, speech recognition, machine vision, and data-mining. Yet, machines still seem to fall short of even mammal-level intelligence in many respects. One of the remaining frontiers of machine learning is the difficulty of learning the kind of complicated and highly-varying functions that are necessary to perform machine vision or natural language processing tasks at a level comparable to humans (even a 2-year old). See my lab's long-term vision web page for a broader introduction. An introductory discussion of recent and ongoing research is below.
See the lab's publications site for a downloadable and complete bibliographic list of my papers.
http://www.iro.umontreal.ca/~bengioy/yoshua_en/research.html
http://www.iro.umontreal.ca/~bengioy/yoshua_en/

Deep Learning Slides by Yoshua Bengio, MLSS 2015, Austin, Texas
http://www.iro.umontreal.ca/~bengioy/talks/mlss-austin.pdf

KyungHyun Cho
http://www.kyunghyuncho.me/home
Deep Learning Tutorial at KAIST Slides
https://drive.google.com/file/d/0B16RwCMQqrtdb05qdDFnSXprM0E/edit?pli=1

University of Toronto, CANADA

Geoffrey Hinton
Added in the kit 11-Nov-2014
I design learning algorithms for neural networks. My aim is to discover a learning procedure that is efficient at finding complex structure in large, high-dimensional datasets and to show that this is how the brain learns to see. I was one of the researchers who introduced the back-propagation algorithm that has been widely used for practical applications. My other contributions to neural network research include Boltzmann machines, distributed representations, time-delay neural nets, mixtures of experts, variational learning, contrastive divergence learning, dropout, and deep belief nets. My students have changed the way in which speech recognition and object recognition are done. I now work part-time at Google and part-time at the University of Toronto.
http://www.cs.toronto.edu/~hinton/papers.html
http://www.cs.toronto.edu/~hinton/

Alex Graves
Research Interests
• Recurrent neural networks (especially LSTM)
• Supervised sequence labelling (especially speech and handwriting recognition)
• Unsupervised sequence learning
http://www.cs.toronto.edu/%7Egraves/

Universite de Sherbrooke, CANADA

Hugo Larochelle
Added in the kit before 24-Oct-2014
I am interested in machine learning algorithms, that is, algorithms capable of extracting concepts or patterns from data. My work focuses on developing connectionist and probabilistic approaches to various artificial intelligence problems, such as computer vision and natural language processing. The research themes I am interested in include:
• Problems: supervised, semi-supervised and unsupervised learning, structured output prediction, ranking, density estimation
• Models: deep neural networks ("deep learning"), autoencoders, Boltzmann machines, Markov random fields
• Applications: object recognition and tracking, document classification and ranking
http://www.dmi.usherb.ca/~larocheh/index_fr.html
http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html

University of British Columbia, CANADA
Added in the kit before 24-Oct-2014
Great access to all publications of the faculty members

Giuseppe Carenini
http://www.cs.ubc.ca/%7Ecarenini/storage/new-papers-frame.html

Cristina Conati
http://www.cs.ubc.ca/~conati/publications.php

Kevin Leyton-Brown
http://www.cs.ubc.ca/~kevinlb/publications.html

Holger Hoos
http://www.cs.ubc.ca/~hoos/publications.html

Jim Little
http://www.cs.ubc.ca/~little/links/papers.html

David Lowe
http://www.cs.ubc.ca/~lowe/pubs.html

Karon MacLean
http://www.cs.ubc.ca/labs/spin/publications/index.html

Alan Mackworth
http://www.cs.ubc.ca/~mack/Publications/sort_date.html

Dinesh K. Pai
http://www.cs.ubc.ca/~pai/

David Poole
http://www.cs.ubc.ca/~poole/publications.html

University of Waterloo

Prof. Shai Ben-David
Research Interests
My research interests span a wide spectrum of topics in the foundations of computer science and its applications, with a particular emphasis on statistical and computational machine learning. The common thread throughout my research is aiming to provide mathematical formulation and understanding of real world problems. In particular, I have been looking at popular machine learning and data mining paradigms that seem to lack clear theoretical justification.
https://cs.uwaterloo.ca/~shai/
http://videolectures.net/shai_ben_david/

Academics (with free access to their publications), GERMANY

University of Freiburg

Machine Learning Lab
Future computer programs will contain a growing part of 'intelligent' software modules that are not conventionally programmed, but that are learned either from data provided by the user or from data that the program autonomously collects during its use. In this spirit, the Machine Learning Lab deals with research on Machine Learning techniques and the integration of learning modules into larger software systems, aiming at their effective application in complex real-world problems. Application areas are robotics, control, forecasting and disposition systems, scheduling and related fields.
Research Areas: Efficient Reinforcement Learning Algorithms, Intelligent Robot Control Architectures, Learning in Multiagent Systems, (Un-)Supervised Learning, Deep Learning, Autonomous Robots, Industrial Applications, Clinical Applications
http://ml.informatik.uni-freiburg.de

Academics (with free access to their publications), CHINA

USTC, CHINA

En-Hong Chen
Added in the kit before 24-Oct-2014
My current research interests are data mining and machine learning, especially social network analysis and recommender systems. I have published more than 100 papers in many journals and conferences, including international journals such as IEEE Trans, ACM Trans, and important data mining conferences, such as KDD, ICDM, NIPS. My research is supported by the National Natural Science Foundation of China, National High Technology Research and Development Program 863 of China, etc. I won the Best Application Paper Award at KDD 2008 and the Best Research Paper Award at ICDM 2011.
http://staff.ustc.edu.cn/~cheneh/#pub

Linli Xu
My research area is Machine Learning. More specifically, my work combines aspects from the following:
• Unsupervised learning and semi-supervised learning, clustering
• Large margin approaches, support vector machines
• Optimization, convex programming
http://staff.ustc.edu.cn/~linlixu/papers.html

Peking University, CHINA

Yuan Yao, School of Mathematical Sciences
Added in the kit before 24-Oct-2014
My most recent interests are focusing on mathematics for data sciences, in particular topological and geometric methods for high dimensional data analysis and statistical machine learning, with applications in computational biology and information technology.
Publications and code to reproduce results
http://www.math.pku.edu.cn/teachers/yaoy/research.html

Academics (with free access to their publications), RUSSIA

Moscow State University, RUSSIA

Dmitry Efimov
Added in the kit before 24-Oct-2014
Dmitry is an expert in promising areas of modern complex and functional analysis; the author of original results. He begins with the systematic study of some classes of analytic functions in the half-plane that are analogous to the well-known Privalov classes and maximal Privalov classes in the disc. His main results are the following:
1) A new factorization formula and accurate estimates of growth for functions in these classes;
2) The introduction of natural invariant metrics under which the classes form Fréchet algebras;
3) A complete description of the linear isometries as well as the bounded and completely bounded subsets in the classes.
http://mech.math.msu.su/~efimov/indexe.php
https://www.kaggle.com/users/29346/dmitry-efimov

Academics (with free access to their publications), POLAND

University of Warsaw, POLAND

Marcin Mucha
Added in the kit before 24-Oct-2014
I am an assistant professor at the Institute of Informatics, University of Warsaw, member of the Algorithms Group (see our blog!). I work on graph algorithms, approximation algorithms and on-line algorithms. You can find most of my papers at DBLP or here. You can find my PhD Thesis here; it contains a rather detailed exposition of the algebraic approach to matching problems in graphs.
http://duch.mimuw.edu.pl/~mucha/wordpress/?page_id=58

Academics (with free access to their publications), SWITZERLAND

Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be missed!)
Prof. Jürgen Schmidhuber's Artificial Intelligence team has won nine international competitions in machine learning and pattern recognition (more than any other AI research group) and seven independent best paper/best video awards, achieved the world's first superhuman visual classification results, has pioneered Deep Learning methods for Artificial Neural Networks since 1991, and established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He generalized algorithmic information theory, and the many-worlds theory of physics, to obtain a minimal theory of all constructively computable universes: an elegant algorithmic theory of everything. Google & Apple and many other leading companies are now using the machine learning techniques developed in his group at the Swiss AI Lab IDSIA & USI & SUPSI (ex-TUM CogBotLab). Since age 15 or so his main scientific ambition has been to build an optimal scientist through self-improving AI, then retire. Progress is accelerating: are 40,000 years of human-dominated history about to converge within the next few decades?
http://people.idsia.ch/~juergen/

Free access to a list of Machine Learning MSc/PhD Dissertations

Machine Learning Department, Carnegie Mellon University
Added in the kit before 18-Nov-2014
https://www.ml.cmu.edu/research/phd-dissertations.html

Machine Learning Department, Columbia University (Search for PhD on the page)
Added in the kit before 18-Nov-2014
http://www.cs.columbia.edu/learning/papers.html

PhD Dissertations, University of Edinburgh, UK
Added in the kit before 18-Nov-2014
https://www.era.lib.ed.ac.uk/handle/1842/3389/browse?type=dateissued&sort_by=2&order=DESC&rpp=20&etal=0&submit_browse=Update

MSc Dissertations, University of Oxford, UK
Added in the kit before 18-Nov-2014
https://www.cs.ox.ac.uk/admissions/grad/A_list_of_some_recent_theses_that_received_high_marks

Machine Learning Group, Department of Engineering, University of Cambridge, UK (Search for PhD on the page)
Added in the kit before 18-Nov-2014
http://mlg.eng.cam.ac.uk/pub/

Barton: MIT Libraries' Catalog (MIT theses only; results for the keywords "machine learning", sorted by year)
http://library.mit.edu/F/DH7286CBBN6UP4I27QMKJV58GL9BR62RT24P8V3KYRFY11JS3X-03749?func=find-b&find_code=WRD&request=machine+learning

New York University Computer Science PhD Theses
http://www.cs.nyu.edu/web/Research/theses.html

Digital Collection of The Australian National University (PhD Thesis)
https://digitalcollections.anu.edu.au/handle/1885/3/simple-search?query=machine+learning&rpp=10&sort_by=0&order=DESC&etal=0&submit_search=Update
