Personalised News Recommendation based on a User’s
Browsing History
Alexei De Bono
Supervisor: Dr. Joel Azzopardi
Co-supervisor: Dr. Colin Layfield
People often find it hard to describe what they are interested in; however, they
are very good at recognizing what interests them once they see it.
An effective recommendation system relies on an accurate user model. Document
clustering and user profiling are both well-established concepts in the field of
information retrieval. Clustering
based on document-term relation matrices suffers from noise due to different
words with similar meanings. The method described in this thesis uses Latent
Semantic Analysis (LSA) to cluster the user’s documents, build the user model
and, in turn, find the most closely related news articles that the user may
deem “interesting”.
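
The effect of LSA on synonym noise can be illustrated with a minimal sketch. It is not the thesis implementation (which runs client-side in the browser); it assumes NumPy, a toy term-document matrix of raw counts, and the hypothetical helper names `lsa_document_vectors` and `cosine`. Two documents that share only one word ("engine") but use the synonyms "car" and "automobile" become nearly identical once projected into a reduced latent space.

```python
import numpy as np

def lsa_document_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD.

    term_doc: matrix with one row per term and one column per document.
    Returns one row per document (assumed layout for this sketch).
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; scale document coordinates by them.
    return vt[:k, :].T * s[:k]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Toy data (illustrative only): rows are terms, columns are documents.
term_doc = np.array([
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # football
    [0, 0, 1],  # goal
], dtype=float)

docs = lsa_document_vectors(term_doc, k=2)

raw_similarity = cosine(term_doc[:, 0], term_doc[:, 1])  # on raw term counts
lsa_similarity = cosine(docs[0], docs[1])                # in the latent space
print(raw_similarity, lsa_similarity)
```

With `k=2` the two motoring documents collapse onto the same latent direction while the football document stays orthogonal to them. Choosing `k` too large would retain the dimension that separates "car" from "automobile", which is one way to see why the number of dimensions matters.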
It is important to note that all of this is done implicitly, with virtually no
input required from the user in order to build the profile and recommend
articles. The Google Chrome Extension framework provides a good starting point
for such a system by means of its well-structured browsing history data. The
main constraint on the system is that the user profile can only be created or
updated while the extension is open: background processing of the term-document
matrix and the SVD computations is not available in this context. Furthermore,
only client-side web technology acting solely within the browser is used for all
components.
The results obtained from this research show that a user’s interests can be
extracted individually by clustering his or her browsing history. The
introduction of LSA improves the results of the clustering algorithms when an
appropriate number of dimensions is specified. Multiple clustering algorithms
were evaluated, and the results show that the better-performing algorithms are
the original k-means when given the ideal number of clusters (which the system
determines using Dunn’s index), and MajorClust and Split K-Means when both are
coupled with the “recursive merge” and “dissolve” steps explained in this
thesis. It is also shown that the better clustering algorithms yielded the
better recommendation performance. This indicates that, by clustering one’s
browsing history, satisfactory recommendations can be produced in a fully
unsupervised, private and implicit manner.
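
As an illustration of how Dunn’s index can rank candidate clusterings, the sketch below computes the standard form of the index: the smallest distance between points in different clusters divided by the largest cluster diameter. It is a minimal example, not the thesis code; the function name `dunn_index`, the Euclidean distance and the toy points are assumptions made here for illustration.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn's index for a partition given as a list of clusters of points.

    Higher values indicate compact, well-separated clusters.
    Assumes at least two non-empty clusters and Euclidean distance.
    """
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between any two points inside the same cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    return min_separation / max_diameter if max_diameter else math.inf

# Toy points (illustrative only): two tight groups far apart.
good_partition = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
# The same points grouped badly: each cluster spans both groups.
bad_partition = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]

print(dunn_index(good_partition), dunn_index(bad_partition))
```

One way to use such a score for selecting the number of clusters is to run k-means for several values of k and keep the partition with the highest index. Whether the thesis uses this exact variant of the index or another distance measure is not stated in the abstract.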