Social networking - Drupal Developer Days Barcelona

Personalisation and Recommendations using Drupal
• Keywords:
– Personalisation
– Recommendations
– Scalable machine learning
– Predictions
– Similarity
– Data Mining
– Big Data
– Trend Spotting
– Clustering
Drupal Developer Days Barcelona – Kendra Initiative
2012.06.16
• Kendra Initiative mission
– Foster an Open Distributed Marketplace for Digital
Media
• EU funded
– P2P-Next
• http://www.p2p-next.org
– SARACEN (Socially Aware, collaboRative, scAlable
Coding mEdia distributioN)
• http://www.saracen-p2p.eu
Deliverables
• Kendra Signpost
– Metadata interoperability, mapping and transformation
• Smart Filters
– Portable preferences and filters
• Kendra Social, Kendra Hub
– Social networking management tools
• Standards work
– OpenSocial extension
– Social API – see Abstracting Social Networking functionality in
Drupal sprint
• Kendra Match
– Searching and recommendation
Components
• Drupal Recommender API module
• Recommender helper modules
• async_command module
• Apache Mahout or cloud service
• Hadoop cluster (optional)
Industry Examples
• Amazon
• Netflix
• Spotify, Pandora
• Facebook, LinkedIn
• OKCupid
• iTunes: Genius; app store - not so much
Machine learning
• Collaborative Filtering
– AKA recommender engines
• Clustering
• Classification
Collaborative Filtering
• Input: preference data
• Output: predictions
• Preference = <uid1, (nid1 or uid2), w1>
– w1 = signed integer representing the weight of the uid1-nid1 or uid1-uid2 correlation (affinity)
• Prediction = <uid1, (nid1 or uid2), w2>
– w2 = float representing the strength of the uid1-nid1 or uid1-uid2 correlation
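The step from preference triples to a prediction triple can be sketched in a few lines of Python. This is a minimal illustration and not Mahout's implementation: the sample data, the function names and the choice of cosine similarity over shared items are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt

# Preference triples <uid, nid, weight>; the values are made-up sample data.
preferences = [
    (1, 101, 5), (1, 102, 3),
    (2, 101, 4), (2, 102, 2), (2, 103, 5),
    (3, 101, 1), (3, 103, 2),
]

def build_profiles(prefs):
    """Group preference triples into {uid: {nid: weight}}."""
    profiles = defaultdict(dict)
    for uid, nid, weight in prefs:
        profiles[uid][nid] = weight
    return profiles

def cosine(a, b):
    """Cosine similarity over the items that two users have both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[n] * b[n] for n in shared)
    norm_a = sqrt(sum(a[n] ** 2 for n in shared))
    norm_b = sqrt(sum(b[n] ** 2 for n in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(profiles, uid, nid):
    """Similarity-weighted average of other users' weights for nid."""
    num = den = 0.0
    for other, items in profiles.items():
        if other == uid or nid not in items:
            continue
        sim = cosine(profiles[uid], items)
        num += sim * items[nid]
        den += abs(sim)
    return num / den if den else None

profiles = build_profiles(preferences)
# Prediction triple <uid1, nid1, w2>, where w2 is a float.
prediction = (1, 103, predict(profiles, 1, 103))
print(prediction)
```

User 1 has no preference for node 103, so the predicted weight is drawn from users 2 and 3, each weighted by how closely their other preferences match user 1's.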
Enter Mahout
• Apache Mahout is a scalable machine learning
library that supports large data sets.
• Launched Spring 2010
• Grew from the Apache Lucene project (basis
for Apache Solr)
• Merged with Taste project
Use Cases
• Recommendation mining
• Clustering
• Classification
• Frequent itemset mining
Out-of-box algorithms
• Recommendation
– User-based recommender
– Item-based recommender
– Slope-One recommender
– Distributed Item-Based Collaborative Filtering
– Collaborative Filtering using parallel matrix factorisation
• Clustering
– Canopy Clustering
– K-Means Clustering
– Fuzzy K-Means
– Mean Shift Clustering
– Dirichlet Process Clustering
– Latent Dirichlet Allocation
– Spectral Clustering
– Minhash Clustering
• Model combination
– Naive Bayes algorithm
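As an illustration of one of the listed recommenders, here is a minimal sketch of the weighted Slope One scheme in plain Python. It does not use Mahout's classes; the ratings, user names and function names are invented for the example.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """Average rating difference for every ordered item pair (i, j)."""
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for items in ratings.values():
        for i, ri in items.items():
            for j, rj in items.items():
                if i != j:
                    diff_sum[(i, j)] += ri - rj
                    count[(i, j)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dev, count

def slope_one_predict(ratings, dev, count, user, target):
    """Weighted Slope One prediction of `target` for `user`."""
    num = den = 0.0
    for j, rj in ratings[user].items():
        pair = (target, j)
        if pair in dev:
            # Each known rating votes, weighted by how many users rated both items.
            num += (dev[pair] + rj) * count[pair]
            den += count[pair]
    return num / den if den else None

# Made-up sample ratings: {user: {item: rating}}.
ratings = {
    "alice": {"a": 5.0, "b": 3.0, "c": 2.0},
    "bob": {"a": 3.0, "b": 4.0},
    "carol": {"b": 2.0, "c": 5.0},
}
dev, count = slope_one_deviations(ratings)
bob_c = slope_one_predict(ratings, dev, count, "bob", "c")
print(bob_c)
```

The appeal of Slope One is that the deviation table can be precomputed once, after which each prediction is a cheap lookup and weighted average.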
Hadoop
• Provides distributed processing across a cluster of machines
• Not trivial to set up
• Not yet implemented in Recommender API
(issue #1206840)
Recommender API
• Drupal 7 (alpha) & 6 (beta)
• Can run either on same server as Apache web
server or on a remote server
• Java helper program (was PHP)
• Uses JDBC and Java Persistence API (JPA)
• Drupal helper modules
Recommender API helper modules
• Browsing History Recommender
• OG Similar groups module
• Ubercart Products Recommender
• Fivestar Recommender
• Points Voting Recommender
• Flag Recommender
Asynchronous operation
• async_command module
– Talks to Mahout
– Typically run via cron
• Results are stored directly in Drupal db
– Recommender tables
– Via JDBC
Hosting Solutions
• Self-hosted: all-in-one (web server, database server, recommender server) - has its pros and cons
• Recommender API Cloud Service - looking for
beta testers
• Amazon Elastic MapReduce (EMR)
Installing Mahout
• Prerequisites:
– Dedicated VM if possible
– Linux, Mac OS X Leopard 10.5.6 or later, Windows (Cygwin)
– Java JDK 1.6
– Maven 2.0.11 or higher (maven.apache.org)
Installing Mahout
• Building
– Follow instructions
– https://cwiki.apache.org/MAHOUT/buildingmahout.html
• Use maven to build examples
Installing Mahout
• Testing: GroupLens
– On a single 2GHz server:
• 100K ratings (1000 users, 1700 items): 9 minutes
• 1M ratings (6000 users, 4000 items): 12 hours
• 10M ratings (72,000 users, 10,000 items): fuggedaboutit
– Using 6 concurrent 2GHz processing units:
• 100K ratings (1000 users, 1700 items): 2 minutes
• 1M ratings (6000 users, 4000 items): 2 hours
• 10M ratings (72,000 users, 10,000 items): 11 days 20 hours
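To put the timings above in perspective, the arithmetic can be worked through in a short Python sketch. The run times are copied from the figures above; the dictionary and function names are illustrative only.

```python
# Run times in minutes, copied from the figures above.
single_server = {"100K": 9, "1M": 12 * 60}
six_units = {"100K": 2, "1M": 2 * 60, "10M": (11 * 24 + 20) * 60}

def speedup(dataset):
    """How many times faster six units are than a single server."""
    return single_server[dataset] / six_units[dataset]

def growth(timings, smaller, larger):
    """How many times longer the larger data set takes than the smaller one."""
    return timings[larger] / timings[smaller]

print(speedup("100K"), speedup("1M"))
print(growth(single_server, "100K", "1M"))
print(growth(six_units, "1M", "10M"))
```

Six processing units give a speedup of 4.5 on the 100K set and 6 on the 1M set. In both configurations a tenfold increase in ratings costs far more than ten times the run time (80 times on the single server from 100K to 1M, 142 times on six units from 1M to 10M), which is the practical argument for distributing the work.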
Installing Recommender API
• See http://drupal.org/node/1207634
• Configuration
– sites/all/modules/async_command/config.properties should match settings.php
• Download and enable async_command
• Check /admin/config/search/recommender/admin
Usage
• Making recommendations
– User-user
– User-item
– Item-item
• Predictions/similarity feeds back into Drupal
• Blocks
• Views
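As a rough illustration of the item-item case, the sketch below ranks the items most similar to a given node, which is the kind of list a "similar items" block might display. The data and function names are invented, and cosine similarity is just one possible measure; this is not the Recommender API's own code.

```python
from math import sqrt

# {nid: {uid: weight}}; sample data invented for illustration.
item_vectors = {
    101: {1: 5, 2: 4, 3: 1},
    102: {1: 3, 2: 2},
    103: {2: 5, 3: 2},
}

def cosine(a, b):
    """Cosine similarity between two sparse item vectors keyed by uid."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_items(nid, top_n=2):
    """Return the top_n (nid, score) pairs most similar to the given item."""
    scores = [
        (other, cosine(item_vectors[nid], vec))
        for other, vec in item_vectors.items()
        if other != nid
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]

top = similar_items(101)
print(top)
```

Swapping the roles of users and items (vectors keyed by nid per user) gives the user-user case with the same code.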
Case study: Data Mining and
Recommendations in SARACEN
• SARACEN: http://www.saracen-p2p.eu/
• Feedback loop to measure subjective quality of the recommendations
– Limited set of data, small user base
– API provides an initial set of recommended videos
– User can then watch a recommended video
– User’s actions are incorporated into their implicit profile, which feeds back to the Recommender API
– Recommender API generates new predictions based on the complete set of implicit profile metadata
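The feedback loop above can be sketched as a toy Python model. Everything here is an assumption made for illustration: the action weights, the tag-based scoring, the catalogue and the class and function names are hypothetical and are not taken from the SARACEN prototype.

```python
from collections import defaultdict

# Hypothetical weights per action type; not taken from SARACEN.
ACTION_WEIGHTS = {"view": 1, "share": 3, "rate_up": 5}

class ImplicitProfile:
    """Accumulates a weight per metadata tag from the user's actions."""

    def __init__(self):
        self.weights = defaultdict(int)

    def record(self, action, tags):
        for tag in tags:
            self.weights[tag] += ACTION_WEIGHTS[action]

def recommend(profile, catalogue, watched, top_n=2):
    """Rank unwatched videos by how well their tags match the profile."""
    candidates = [
        (vid, sum(profile.weights.get(tag, 0) for tag in tags))
        for vid, tags in catalogue.items()
        if vid not in watched
    ]
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [vid for vid, _ in candidates[:top_n]]

# Made-up catalogue: {video id: metadata tags}.
catalogue = {
    "v1": ["music", "live"],
    "v2": ["sport"],
    "v3": ["music"],
    "v4": ["news"],
}
profile = ImplicitProfile()
watched = set()

first = recommend(profile, catalogue, watched)   # initial set, empty profile
profile.record("view", catalogue["v1"])          # user watches a recommended video
watched.add("v1")
second = recommend(profile, catalogue, watched)  # new predictions from updated profile
print(first, second)
```

After the user watches v1, the other music video rises to the top of the next set of recommendations, which is the loop the case study set out to measure.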
SARACEN: Prototype
Recommender data sources
• Explicit data
– SARACEN account data, including location and language
– Linked accounts and profiles
• e.g. Facebook user profile, “likes”, connections, metadata
• Implicit data
– Activity history recorded during the user’s sessions
– Searches
– Shared content
– Viewed content
– Albums (media containers)
– Content ratings
Scalability
• Don’t need Hadoop if
– Number of users is orders of magnitude larger
than the number of items
– Users browse anonymously most of the time
– Few users log in and need personalised
recommendations
– Item churn rate is relatively low
Worth Considering
• Decreased Transparency
• Decreased Serendipity
• Sleep deprivation
Resources: Recommender API
• http://drupal.org/project/recommender
• http://recommenderapi.com/cloud
• https://cwiki.apache.org/confluence/display/MAHOUT
Resources: Mahout
• http://mahout.apache.org/
• Mahout in Action
– http://www.manning.com/owen/
– ISBN 9781935182689.
• The Optimality of Naive Bayes, Harry Zhang.
• http://aws.amazon.com/elasticmapreduce/
Acknowledgements
• Socially Aware, collaboRative, scAlable Coding
mEdia distributioN (SARACEN)
– http://www.saracen-p2p.eu
– Funded within the European Union’s Seventh
Framework Programme (FP7/2007-2013) under
grant agreement 248474
Questions?
• Kendra Initiative
– @kendra
– http://www.kendra.org.uk
– https://github.com/kendrainitiative
• Klokie Grossfeld
– @klokie
– klokie@kendra.org.uk
– http://www.linkedin.com/in/klokie
• Daniel Harris
– @dahacouk
– daniel@kendra.org.uk
– http://www.linkedin.com/in/dahacouk
Thanks
http://barcelona2012.drupaldays.org/abstracting-socialnetworking-functionality-drupal