Personalisation and Recommendations using Drupal
• Keywords:
  – Personalisation
  – Recommendations
  – Scalable machine learning
  – Predictions
  – Similarity
  – Data Mining
  – Big Data
  – Trend Spotting
  – Clustering
Drupal Developer Days Barcelona – Kendra Initiative 2012.06.16

• Kendra Initiative mission
  – Foster an Open Distributed Marketplace for Digital Media
• EU funded
  – P2P-Next
    • http://www.p2p-next.org
  – SARACEN (Socially Aware, collaboRative, scAlable Coding mEdia distributioN)
    • http://www.saracen-p2p.eu

Deliverables
• Kendra Signpost
  – Metadata interoperability, mapping and transformation
• Smart Filters
  – Portable preferences and filters
• Kendra Social, Kendra Hub
  – Social networking management tools
• Standards work
  – OpenSocial extension
  – Social API: see the Abstracting Social Networking functionality in Drupal sprint
• Kendra Match
  – Searching and recommendation

Components
• Drupal Recommender API module
• Recommender helper modules
• async_command module
• Apache Mahout or cloud service
• Hadoop cluster (optional)

Industry Examples
• Amazon
• Netflix
• Spotify, Pandora
• Facebook, LinkedIn
• OkCupid
• iTunes: Genius; App Store: not so much

Machine learning
• Collaborative Filtering
  – AKA recommender engines
• Clustering
• Classification

Collaborative Filtering
• Input: preference data
• Output: predictions
• Preference = <uid1, (nid1 or uid2), w1>
  – w1 = signed integer representing the weight of the uid1-nid1 or uid1-uid2 correlation (affinity)
• Prediction = <uid1, (nid1 or uid2), w2>
  – w2 = float representing the strength of the uid1-nid1 or uid1-uid2 correlation

Enter Mahout
• Apache Mahout is a scalable machine learning library that supports large data sets.
• Became a top-level Apache project in spring 2010
• Grew out of the Apache Lucene project (the basis for Apache Solr)
• Merged with the Taste project

Use Cases
• Recommendation mining
• Clustering
• Classification
• Frequent itemset mining

Out-of-box algorithms
• Recommendation
  – User-based recommender
  – Item-based recommender
  – Slope-One recommender
  – Distributed Item-Based Collaborative Filtering
  – Collaborative Filtering using parallel matrix factorisation
• Clustering
  – Canopy Clustering
  – K-Means Clustering
  – Fuzzy K-Means
  – Mean Shift Clustering
  – Dirichlet Process Clustering
  – Latent Dirichlet Allocation
  – Spectral Clustering
  – Minhash Clustering
• Model combination
  – Naive Bayes algorithm

Hadoop
• Provides distributed processing across a cluster of machines
• Not trivial to set up
• Not yet implemented in Recommender API (issue #1206840)

Recommender API
• Drupal 7 (alpha) & 6 (beta)
• Can run either on the same server as the Apache web server or on a remote server
• Java helper program (was PHP)
• Uses JDBC and the Java Persistence API (JPA)
• Drupal helper modules

Recommender API helper modules
• Browsing History Recommender
• OG Similar groups module
• Ubercart Products Recommender
• Fivestar Recommender
• Points Voting Recommender
• Flag Recommender

Asynchronous operation
• async_command module
  – Talks to Mahout
  – Typically run via cron
• Results are stored directly in the Drupal database
  – Recommender tables
  – Via JDBC

Hosting Solutions
• Self-hosted: all-in-one (web server, database server, recommender server); has its pros and cons
• Recommender API Cloud Service: looking for beta testers
• Amazon Elastic MapReduce (EMR)

Installing Mahout
• Prerequisites:
  – Dedicated VM if possible
  – Linux, Mac OS X Leopard 10.5.6 or later, or Windows (Cygwin)
  – Java JDK 1.6
  – Maven 2.0.11 or higher (maven.apache.org)

Installing Mahout
• Building
  – Follow the instructions at https://cwiki.apache.org/MAHOUT/buildingmahout.html
• Use Maven to build the examples

Installing Mahout
• Testing: GroupLens
  – On a single 2GHz server:
    • 100K ratings (1000 users, 1700 items): 9 minutes
    • 1M ratings (6000 users, 4000 items): 12 hours
    • 10M ratings (72,000 users, 10,000 items): not feasible (fuggedaboutit)
  – Using 6 concurrent 2GHz processing units:
    • 100K ratings (1000 users, 1700 items): 2 minutes
    • 1M ratings (6000 users, 4000 items): 2 hours
    • 10M ratings (72,000 users, 10,000 items): 11 days 20 hours
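
The GroupLens test boils down to the preference-in, prediction-out cycle from the Collaborative Filtering slide. The sketch below illustrates that cycle with a tiny item-based recommender in plain Python. It is not Mahout's code: cosine similarity and a similarity-weighted average are assumptions chosen for clarity, the toy data is invented (not GroupLens), and the names PREFERENCES, item_vectors, cosine and predict are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy preference tuples <uid, nid, weight>, shaped like the
# Preference = <uid1, nid1, w1> tuples on the Collaborative Filtering slide.
PREFERENCES = [
    (1, 101, 5), (1, 102, 3),
    (2, 101, 4), (2, 102, 2), (2, 103, 5),
    (3, 102, 4), (3, 103, 1),
]


def item_vectors(prefs):
    """Map each item (nid) to a sparse vector {uid: weight}."""
    items = defaultdict(dict)
    for uid, nid, w in prefs:
        items[nid][uid] = w
    return items


def cosine(a, b):
    """Cosine similarity of two sparse {uid: weight} vectors (assumed metric)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm


def predict(prefs, uid):
    """Return prediction tuples <uid, nid, w2> for items uid has not rated.

    w2 is a similarity-weighted average of uid's own weights on similar
    items; only positively similar items contribute.
    """
    items = item_vectors(prefs)
    rated = {nid: w for u, nid, w in prefs if u == uid}
    predictions = []
    for nid, vec in items.items():
        if nid in rated:
            continue
        num = den = 0.0
        for other, w in rated.items():
            sim = cosine(vec, items[other])
            if sim > 0:
                num += sim * w
                den += sim
        if den > 0:
            predictions.append((uid, nid, num / den))
    # Strongest predictions first, as a recommendation block would show them.
    return sorted(predictions, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    print(predict(PREFERENCES, 1))
```

For user 1 this yields a single prediction for item 103 of roughly 4.09, because item 103 is more similar to item 101 (rated 5) than to item 102 (rated 3). An item-based approach is one common choice when users vastly outnumber items and item churn is low, as on the Scalability slide, since item-item similarities change slowly and can be recomputed asynchronously (for example from cron).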

Installing Recommender API
• See http://drupal.org/node/1207634
• Configuration
  – sites/all/modules/async_command/config.properties should match settings.php
• Download and enable async_command
• Check /admin/config/search/recommender/admin

Usage
• Making recommendations
  – User-user
  – User-item
  – Item-item
• Predictions/similarity scores feed back into Drupal
• Blocks
• Views

Case study: Data Mining and Recommendations in SARACEN
• SARACEN: http://www.saracen-p2p.eu/
• Feedback loop to measure the subjective quality of the recommendations
  – Limited set of data, small user base
  – The API provides an initial set of recommended videos
  – The user can then watch a recommended video
  – The user’s actions are incorporated into their implicit profile, which feeds back to the Recommender API
  – The Recommender API generates new predictions based on the complete set of implicit profile metadata

SARACEN: Prototype

Recommender data sources
• Explicit data
  – SARACEN account data, including location and language
  – Linked accounts and profiles
    • e.g. Facebook user profile, “likes”, connections, metadata
• Implicit data
  – Activity history recorded during the user’s sessions
  – Searches
  – Shared content
  – Viewed content
  – Albums (media containers)
  – Content ratings

Scalability
• You don’t need Hadoop if:
  – The number of users is orders of magnitude larger than the number of items
  – Users browse anonymously most of the time
  – Few users log in and need personalised recommendations
  – The item churn rate is relatively low

Worth Considering
• Decreased transparency
• Decreased serendipity
• Sleep deprivation

Resources: Recommender API
• http://drupal.org/project/recommender
• http://recommenderapi.com/cloud
• https://cwiki.apache.org/confluence/display/MAHOUT

Resources: Mahout
• http://mahout.apache.org/
• Mahout in Action
  – http://www.manning.com/owen/
  – ISBN 9781935182689
• The Optimality of Naive Bayes, Harry Zhang
• http://aws.amazon.com/elasticmapreduce/

Acknowledgements
• Socially Aware, collaboRative, scAlable Coding mEdia distributioN (SARACEN)
  – http://www.saracen-p2p.eu
  – Funded within the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement 248474

Questions?
• Kendra Initiative
  – @kendra
  – http://www.kendra.org.uk
  – https://github.com/kendrainitiative
• Klokie Grossfeld
  – @klokie
  – klokie@kendra.org.uk
  – http://www.linkedin.com/in/klokie
• Daniel Harris
  – @dahacouk
  – daniel@kendra.org.uk
  – http://www.linkedin.com/in/dahacouk

Thanks
http://barcelona2012.drupaldays.org/abstracting-socialnetworking-functionality-drupal