Uploaded by 조용운 / Student / Department of Civil and Environmental Engineering

RS with Transformer

Title: BERTERS: Multimodal Representation Learning for Expert
Recommendation System with Transformer
Link: https://www.academia.edu/download/80829631/2007.07229v1.pdf
Year: 2020
Title: Deep multifaceted transformers for multi-objective ranking in
large-scale e-commerce recommender systems
Link: https://dl.acm.org/doi/abs/10.1145/3340531.3412697
Year: 2020
Abstract: Recommender Systems have been playing essential roles in e-commerce portals. Existing recommendation algorithms usually learn the
ranking scores of items by optimizing a single task (e.g. Click-through
rate prediction) based on users' historical click sequences, but they
generally pay little attention to simultaneously modeling users' multiple
types of behaviors or jointly optimizing multiple objectives (e.g. both
Click-through rate and Conversion rate), which are both vital for e-commerce sites. In this paper, we argue that it is crucial to formulate
users' different interests based on multiple types of behaviors and
perform multi-task learning for significant improvement in multiple
objectives simultaneously. We propose Deep Multifaceted Transformers
(DMT), a novel framework that can model users' multiple types of behavior
sequences simultaneously with multiple Transformers. It utilizes Multi-gate Mixture-of-Experts to optimize multiple objectives. Besides, it
exploits unbiased learning to reduce the selection bias in the training
data. Experiments on a JD real production dataset demonstrate the
effectiveness of DMT, which significantly outperforms state-of-the-art
methods. DMT has been successfully deployed to serve the main traffic in
the commercial Recommender System in JD.com. To facilitate future
research, we release the codes and datasets at
https://github.com/guyulongcs/CIKM2020_DMT.
Title: SSE-PT: Sequential recommendation via personalized transformer
Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412258
Year: 2020
Abstract: Temporal information is crucial for recommendation problems
because user preferences are naturally dynamic in the real world. Recent
advances in deep learning, especially the discovery of various attention
mechanisms and newer architectures in addition to widely used RNN and CNN
in natural language processing, have allowed for better use of the
temporal ordering of items that each user has engaged with. In particular,
the SASRec model, inspired by the popular Transformer model in natural
language processing, has achieved state-of-the-art results. However,
SASRec, just like the original Transformer model, is inherently an un-personalized model and does not include personalized user embeddings. To
overcome this limitation, we propose a Personalized Transformer (SSE-PT)
model, outperforming SASRec by almost 5% in terms of NDCG@10 on 5 real-world datasets. Furthermore, after examining some random users’
engagement history, we find our model not only more interpretable but
also able to focus on recent engagement patterns for each user. Moreover,
our SSE-PT model with a slight modification, which we call SSE-PT++, can
handle extremely long sequences and outperform SASRec in ranking results
with comparable training speed, striking a balance between performance
and speed requirements. Our novel application of the Stochastic Shared
Embeddings (SSE) regularization is essential to the success of
personalization. Code and data are open-sourced at
https://github.com/wuliwei9278/SSE-PT.
Title: BERT4Rec: Sequential recommendation with bidirectional encoder
representations from transformer
Link: https://dl.acm.org/doi/abs/10.1145/3357384.3357895
Year: 2019
Abstract: Modeling users' dynamic preferences from their historical
behaviors is challenging and crucial for recommendation systems. Previous
methods employ sequential neural networks to encode users' historical
interactions from left to right into hidden representations for making
recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to the limitations
including: a)
unidirectional architectures restrict the power of hidden representation
in users' behavior sequences; b) they often assume a rigidly ordered
sequence which is not always practical. To address these
limitations, we propose a sequential recommendation model called
BERT4Rec, which employs the deep bidirectional self-attention to model
user behavior sequences. To avoid the information leakage and efficiently
train the bidirectional model, we adopt the Cloze objective to sequential
recommendation, predicting the random masked items in the sequence by
jointly conditioning on their left and right context. In this way, we
learn a bidirectional representation model to make recommendations by
allowing each item in user historical behaviors to fuse information from
both left and right sides. Extensive experiments on four benchmark
datasets show that our model outperforms various state-of-the-art
sequential models consistently.
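The Cloze objective described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the reserved mask id `MASK`, the function name `cloze_mask`, and the default masking probability are all assumptions made for illustration. The sketch only prepares training inputs and labels; the bidirectional model that would consume them is out of scope.

```python
import random

MASK = 0  # hypothetical id reserved for the mask token; real item ids are assumed to start at 1


def cloze_mask(items, mask_prob=0.2, seed=None):
    """Randomly mask items in a behavior sequence.

    Returns (inputs, labels). labels[i] holds the original item where
    position i was masked and None elsewhere, so a training loss would
    only be computed on the masked positions.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for item in items:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hide the item from the model
            labels.append(item)   # the model must recover it from both sides
        else:
            inputs.append(item)
            labels.append(None)
    return inputs, labels
```

Because a masked position can be predicted from items on both its left and its right, the targets are hidden from the input rather than exposed to it, which is how masking avoids the information leakage the abstract mentions.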
Title: Knowledge-enhanced hierarchical graph transformer network for
multi-behavior recommendation
Link: https://ojs.aaai.org/index.php/AAAI/article/view/16576
Year: 2021
Abstract: Accurate user and item embedding learning is crucial for modern
recommender systems. However, most existing recommendation techniques
have thus far focused on modeling users' preferences over a singular type
of user-item interactions. Many practical recommendation scenarios
involve multi-typed user interactive behaviors (e.g., page view, add-to-favorite and purchase), which presents unique challenges that cannot be
handled by current recommendation solutions. In particular: i) complex
inter-dependencies across different types of user behaviors; ii) the
incorporation of knowledge-aware item relations into the multi-behavior
recommendation framework; iii) dynamic characteristics of multi-typed
user-item interactions. To tackle these challenges, this work proposes a
Knowledge-Enhanced Hierarchical Graph Transformer Network (KHGT), to
investigate multi-typed interactive patterns between users and items in
recommender systems. Specifically, KHGT is built upon a graph-structured
neural architecture to i) capture type-specific behavior semantics; ii)
explicitly discriminate which types of user-item interactions are more
important in assisting the forecasting task on the target behavior.
Additionally, we further integrate the multi-modal graph attention layer
with temporal encoding strategy, to empower the learned embeddings to be
reflective of both dedicated multiplex user-item and item-item
collaborative relations, as well as the underlying interaction dynamics.
Extensive experiments conducted on three real-world datasets show that
KHGT consistently outperforms many state-of-the-art recommendation
methods across various evaluation settings. Our implementation is
available in https://github.com/akaxlh/KHGT.
Title: Behavior sequence transformer for e-commerce recommendation in
alibaba
Link: https://dl.acm.org/doi/abs/10.1145/3326937.3341261
Year: 2019
Abstract: Deep learning based methods have been widely used in industrial
recommendation systems (RSs). Previous works adopt an Embedding&MLP
paradigm: raw features are embedded into low-dimensional vectors, which
are then fed into an MLP for final recommendations. However, most of these
works just concatenate different features, ignoring the sequential nature
of users' behaviors. In this paper, we propose to use the powerful
Transformer model to capture the sequential signals underlying users'
behavior sequences for recommendation in Alibaba. Experimental results
demonstrate the superiority of the proposed model, which is then deployed
online at Taobao and obtains significant improvements in online Click-Through-Rate (CTR) compared to two baselines.
Title: Transformers4Rec: Bridging the gap between NLP and
sequential/session-based recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3460231.3474255
Year: 2021
Abstract: Much of the recent progress in sequential and session-based
recommendation has been driven by improvements in model architecture and
pretraining techniques originating in the field of Natural Language
Processing. Transformer architectures in particular have facilitated
building higher-capacity models and provided data augmentation and
training techniques which demonstrably improve the effectiveness of
sequential recommendation. But with a thousandfold more research going
on in NLP, the application of transformers for recommendation
understandably lags behind. To remedy this we introduce Transformers4Rec,
an open-source library built upon HuggingFace’s Transformers library with
a similar goal of opening up the advances of NLP based Transformers to
the recommender system community and making these advancements
immediately accessible for the tasks of sequential and session-based
recommendation. Like its core dependency, Transformers4Rec is designed
to be extensible by researchers, simple for practitioners, and fast and
robust in industrial deployments. In order to demonstrate the usefulness
of the library and the applicability of Transformer architectures in
next-click prediction for user sessions, where sequence lengths are much
shorter than those commonly found in NLP, we have leveraged
Transformers4Rec to win two recent session-based recommendation
competitions. In addition, we present in this paper the first
comprehensive empirical analysis comparing many Transformer architectures
and training approaches for the task of session-based recommendation. We
demonstrate that the best Transformer architectures have superior
performance across two e-commerce datasets while performing similarly to
the baselines on two news datasets. We further evaluate in isolation the
effectiveness of the different training techniques used in causal
language modeling, masked language modeling, permutation language
modeling and replacement token detection for a single Transformer
architecture, XLNet. We establish that training XLNet with replacement
token detection performs well across all datasets. Finally, we explore
techniques to include side information such as item and user context
features in order to establish best practices and show that the inclusion
of side information uniformly improves recommendation performance.
The Transformers4Rec library is available at https://github.com/NVIDIA-Merlin/Transformers4Rec/
Title: Multiplex behavioral relation learning for recommendation via
memory augmented transformer network
Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401445
Year: 2020
Abstract: Capturing users' precise preferences is of great importance in
various recommender systems (e.g., e-commerce platforms and online
advertising sites), which is the basis of how to present personalized
interesting product lists to individual users. Although significant
progress has been made in considering relations between users and items,
most existing recommendation techniques focus solely on a singular type
of user-item interactions. However, user-item interactive behavior is
often exhibited with multi-type (e.g., page view, add-to-favorite and
purchase) and inter-dependent in nature. The overlook of multiplex
behavior relations can hardly recognize the multi-modal contextual
signals across different types of interactions, which limit the
feasibility of current recommendation methods. To tackle the above
challenge, this work proposes a Memory-Augmented Transformer Networks
(MATN), to enable the recommendation with multiplex behavioral relational
information, and joint modeling of type-specific behavioral context and
type-wise behavior inter-dependencies, in a fully automatic manner. In
our MATN framework, we first develop a transformer-based multi-behavior
relation encoder, to make the learned interaction representations be
reflective of the cross-type behavior relations. Furthermore, a memory
attention network is proposed to supercharge MATN capturing the
contextual signals of different types of behavior into the category-specific latent embedding space. Finally, a cross-behavior aggregation
component is introduced to promote the comprehensive collaboration across
type-aware interaction behavior representations, and discriminate their
inherent contributions in assisting recommendations. Extensive
experiments on two benchmark datasets and a real-world e-commerce user
behavior data demonstrate significant improvements obtained by MATN over
baselines. Codes are available at: https://github.com/akaxlh/MATN.
Title: Recommender systems in the era of large language models (llms)
Link: https://arxiv.org/abs/2307.02046
Year: 2023
Abstract: With the prosperity of e-commerce and web applications,
Recommender Systems (RecSys) have become an important component of our
daily life, providing personalized suggestions that cater to user
preferences. While Deep Neural Networks (DNNs) have made significant
advancements in enhancing recommender systems by modeling user-item
interactions and incorporating textual side information, DNN-based
methods still face limitations, such as difficulties in understanding
users' interests and capturing textual side information, inabilities in
generalizing to various recommendation scenarios and reasoning on their
predictions, etc. Meanwhile, the emergence of Large Language Models
(LLMs), such as ChatGPT and GPT4, has revolutionized the fields of Natural
Language Processing (NLP) and Artificial Intelligence (AI), due to their
remarkable abilities in fundamental responsibilities of language
understanding and generation, as well as impressive generalization and
reasoning capabilities. As a result, recent studies have attempted to
harness the power of LLMs to enhance recommender systems. Given the rapid
evolution of this research direction in recommender systems, there is a
pressing need for a systematic overview that summarizes existing LLM-empowered recommender systems, to provide researchers in relevant fields
with an in-depth understanding. Therefore, in this paper, we conduct a
comprehensive review of LLM-empowered recommender systems from various
aspects including Pre-training, Fine-tuning, and Prompting. More
specifically, we first introduce representative methods to harness the
power of LLMs (as a feature encoder) for learning representations of
users and items. Then, we review recent techniques of LLMs for enhancing
recommender systems from three paradigms, namely pre-training, fine-tuning, and prompting. Finally, we comprehensively discuss future
directions in this emerging field.
Title: Design of distribution transformer health management
system using IoT sensors
Link: https://www.researchgate.net/profile/Rajesh-SharmaRajendran/publication/354650561_Design_of_Distribution_Transformer_Health_Management_System_using_IoT_Sensors/links/63a9f7f103aad5368e41d731/Design-of-Distribution-Transformer-Health-Management-System-using-IoTSensors.pdf
Year: 2021
Abstract: Transformers are one of the primary devices required for an AC
(Alternating Current) distribution system; they work on the principle
of mutual induction without any rotating parts. Two types of
transformers are utilized in distribution systems, namely the step-up
transformer and the step-down transformer. Step-up transformers need
to be placed at regular distances to reduce the line losses
occurring over electrical transmission systems. Similarly, step-down
transformers are placed near the destinations to regulate the
electric power for commercial usage. Regular check-ups are
a must for a distribution transformer to increase its operational
lifetime. The proposed work is designed to regularize such health check-ups
using IoT sensors to build a centralized remote monitoring system.
Title: Continuous-time sequential recommendation with temporal graph
collaborative transformer
Link: https://dl.acm.org/doi/abs/10.1145/3459637.3482242
Year: 2021
Abstract: In order to model the evolution of user preference, we should
learn user/item embeddings based on time-ordered item purchasing
sequences, which is defined as Sequential Recommendation (SR) problem.
Existing methods leverage sequential patterns to model item transitions.
However, most of them ignore crucial temporal collaborative signals,
which are latent in evolving user-item interactions and coexist with
sequential patterns. Therefore, we propose to unify sequential patterns
and temporal collaborative signals to improve the quality of
recommendation, which is rather challenging. Firstly, it is hard to
simultaneously encode sequential patterns and collaborative signals.
Secondly, it is non-trivial to express the temporal effects of
collaborative signals. Hence, we design a new framework Temporal Graph
Sequential Recommender (TGSRec) upon our defined continuous-time
bipartite graph. We propose a novel Temporal Collaborative Transformer
(TCT) layer in TGSRec, which advances the self-attention mechanism by
adopting a novel collaborative attention. TCT layer can simultaneously
capture collaborative signals from both users and items, as well as
considering temporal dynamics inside sequential patterns. We propagate
the information learned from TCT layer over the temporal graph to unify
sequential patterns and temporal collaborative signals. Empirical results
on five datasets show that TGSRec significantly outperforms other
baselines, on average up to 22.5% and 22.1% absolute improvements in
Recall@10 and MRR, respectively.
Title: Cd-HRNN: Content-Driven HRNN to Improve Session-Based
Recommendation System
Link: https://ieeexplore.ieee.org/abstract/document/10191438/
Year: 2023
Abstract: The increasing popularity of digital entertainment systems has
made personalization a key factor for success in the industry.
Recommendation systems, particularly for videos and movies, are crucial
in this regard. However, many existing systems are implicit feedback
recommendation systems that use indirect signals to infer user
preferences, such as user actions (e.g. clicks, views, purchases) or
interactions with items (e.g. listening to a song, watching a movie). The
challenge lies in the limited information and uncertainty present in user
behavior, making it difficult to predict their interests and preferences.
In previous research, Recurrent Neural Networks (RNNs) have been shown to be
efficient in predicting the next item in a session, based on past item
click sequences, but their effectiveness is limited when only relying on
click sequences as input data. In this paper, we extend the Hierarchical
RNN architecture (HRNN) for generating recommendations by combining
session clicks and item content information, such as item ids and item
description respectively. The Bidirectional Encoder Representations from
Transformers (BERT) architecture is applied for generating feature
vectors from text descriptions of the items. Our model has been
extensively tested on the benchmark dataset MovieLens 1M and has demonstrated
superiority over state-of-the-art (SOTA) session-based recommendation
systems (SBRS) models. Experimental results establish the efficacy of
using content information along with item ids for recommendation.
Title: M6-rec: Generative pretrained language models are open-ended
recommender systems
Link: https://arxiv.org/abs/2205.08084
Year: 2022
Abstract: Industrial recommender systems have been growing increasingly
complex, may involve diverse domains such as e-commerce products
and user-generated contents, and can comprise a myriad of tasks
such as retrieval, ranking, explanation generation, and even AI-assisted
content production. The mainstream approach so far is to develop
individual algorithms for each domain and each task. In this paper, we
explore the possibility of developing a unified foundation model to
support open-ended domains and tasks in an industrial recommender
system, which may reduce the demand on downstream settings' data and can
minimize the carbon footprint by avoiding training a separate model from
scratch for every task. Deriving a unified foundation is challenging due
to (i) the potentially unlimited set of downstream domains and tasks, and
(ii) the real-world systems' emphasis on computational efficiency. We
thus build our foundation upon M6, an existing large-scale industrial
pretrained language model similar to GPT-3 and T5, and leverage M6's
pretrained ability for sample-efficient downstream adaptation, by
representing user behavior data as plain texts and converting the tasks
to either language understanding or generation. To deal with a tight
hardware budget, we propose an improved version of prompt tuning that
outperforms fine-tuning with negligible 1% task-specific parameters, and
employ techniques such as late interaction, early exiting, parameter
sharing, and pruning to further reduce the inference time and the model
size. We demonstrate the foundation model's versatility on a wide range
of tasks such as retrieval, ranking, zero-shot recommendation,
explanation generation, personalized content creation, and conversational
recommendation, and manage to deploy it on both cloud servers and mobile
devices.
Title: Personalized transformer for explainable recommendation
Link: https://arxiv.org/abs/2105.11601
Year: 2021
Abstract: Personalization of natural language generation plays a vital
role in a large spectrum of tasks, such as explainable recommendation,
review summarization and dialog systems. In these tasks, user and item
IDs are important identifiers for personalization. Transformer, which is
demonstrated with strong language modeling capability, however, is not
personalized and fails to make use of the user and item IDs since the ID
tokens are not even in the same semantic space as the words. To address
this problem, we present a PErsonalized Transformer for Explainable
Recommendation (PETER), on which we design a simple and effective learning
objective that utilizes the IDs to predict the words in the target
explanation, so as to endow the IDs with linguistic meanings and to
achieve personalized Transformer. Besides generating explanations, PETER
can also make recommendations, which makes it a unified model for the
whole recommendation-explanation pipeline. Extensive experiments show
that our small unpretrained model outperforms fine-tuned BERT on the
generation task, in terms of both effectiveness and efficiency, which
highlights the importance and the nice utility of our design.
Title: Towards knowledge-based recommender dialog system
Link: https://arxiv.org/abs/1908.05391
Year: 2019
Abstract: In this paper, we propose a novel end-to-end framework called
KBRD, which stands for Knowledge-Based Recommender Dialog System. It
integrates the recommender system and the dialog generation system. The
dialog system can enhance the performance of the recommendation system
by introducing knowledge-grounded information about users' preferences,
and the recommender system can improve that of the dialog generation
system by providing recommendation-aware vocabulary bias. Experimental
results demonstrate that our proposed model has significant advantages
over the baselines in both the evaluation of dialog generation and
recommendation. A series of analyses show that the two systems can bring
mutual benefits to each other, and the introduced knowledge contributes
to both their performances.
Title: Hybrid transformer with multi-level fusion for multimodal
knowledge graph completion
Link: https://dl.acm.org/doi/abs/10.1145/3477495.3531992
Year: 2022
Abstract: Multimodal Knowledge Graphs (MKGs), which organize visual-text
factual knowledge, have recently been successfully applied to tasks such
as information retrieval, question answering, and recommendation system.
Since most MKGs are far from complete, extensive knowledge graph
completion studies have been proposed focusing on the multimodal entity,
relation extraction and link prediction. However, different tasks and
modalities require changes to the model architecture, and not all
images/objects are relevant to text input, which hinders the
applicability to diverse real-world scenarios. In this paper, we propose
a hybrid transformer with multi-level fusion to address those issues.
Specifically, we leverage a hybrid transformer architecture with unified
input-output for diverse multimodal knowledge graph completion tasks.
Moreover, we propose multi-level fusion, which integrates visual and text
representation via coarse-grained prefix-guided interaction and fine-grained
correlation-aware fusion modules. We conduct extensive
experiments to validate that our MKGformer can obtain SOTA performance
on four datasets of multimodal link prediction, multimodal RE, and
multimodal NER. Code: https://github.com/zjunlp/MKGformer.
Title: Fast multi-resolution transformer fine-tuning for extreme multi-label text classification
Link: https://proceedings.neurips.cc/paper_files/paper/2021/hash/3bbca1d243b01b47c2bf42b29a8b265c-Abstract.html
Year: 2021
Abstract: Extreme multi-label text classification (XMC) seeks to find
relevant labels from an extremely large label collection for a given text
input. Many real-world applications can be formulated as XMC problems,
such as recommendation systems, document tagging and semantic search.
Recently, transformer based XMC methods, such as X-Transformer and
LightXML, have shown significant improvement over other XMC methods.
Despite leveraging pre-trained transformer models for text representation,
the fine-tuning procedure of transformer models on large label space
still has lengthy computational time even with powerful GPUs. In this
paper, we propose a novel recursive approach, XR-Transformer to
accelerate the procedure through recursively fine-tuning transformer
models on a series of multi-resolution objectives related to the original
XMC objective function. Empirical results show that XR-Transformer takes
significantly less training time compared to other transformer-based XMC
models while yielding better state-of-the-art results. In particular, on
the public Amazon-3M dataset with 3 million labels, XR-Transformer is not
only 20x faster than X-Transformer but also improves the Precision@1 from
51% to 54%.
Title: Augmenting sequential recommendation with pseudo-prior items via
reversely pre-training transformer
Link: https://dl.acm.org/doi/abs/10.1145/3404835.3463036
Year: 2021
Abstract: Sequential Recommendation characterizes the evolving patterns
by modeling item sequences chronologically. The essential target of it
is to capture the item transition correlations. The recent developments
of transformer inspire the community to design effective sequence
encoders, e.g., SASRec and BERT4Rec. However, we observe that these
transformer-based models suffer from the cold-start issue, i.e.,
performing poorly for short sequences. Therefore, we propose to augment
short sequences while still preserving original sequential correlations.
We introduce a new framework for Augmenting Sequential Recommendation
with Pseudo-prior items (ASReP). We firstly pre-train a transformer with
sequences in a reverse direction to predict prior items. Then, we use
this transformer to generate fabricated historical items at the beginning
of short sequences. Finally, we fine-tune the transformer using these
augmented sequences from the time order to predict the next item.
Experiments on two real-world datasets verify the effectiveness of ASReP.
The code is available on https://github.com/DyGRec/ASReP.
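The augmentation step described in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ASReP code: the function name `augment_short_sequence`, the parameters `min_len` and `num_pseudo`, and the `predict_prior` callable (a stand-in for the reversely pre-trained transformer) are all hypothetical.

```python
from typing import Callable, List, Sequence


def augment_short_sequence(
    seq: Sequence[int],
    predict_prior: Callable[[List[int]], int],
    min_len: int = 5,
    num_pseudo: int = 2,
) -> List[int]:
    """Prepend pseudo-prior items to a short, chronologically ordered sequence.

    predict_prior is a hypothetical stand-in for a model pre-trained on
    reversed sequences: given the items seen so far in reverse time order,
    it returns the item it believes came before them.
    """
    if len(seq) >= min_len:
        return list(seq)  # long enough, leave untouched

    reversed_seq = list(reversed(seq))  # newest item first, oldest item last
    for _ in range(num_pseudo):
        # "Next" item in reversed order is the prior item in real time.
        reversed_seq.append(predict_prior(reversed_seq))

    # Restore chronological order: pseudo-prior items now sit at the front.
    return list(reversed(reversed_seq))
```

With a toy predictor such as `lambda s: s[-1] - 1`, the short sequence `[10, 11, 12]` becomes `[8, 9, 10, 11, 12]`; the augmented sequence would then be used for fine-tuning in normal time order.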
Title: CIRS: Bursting filter bubbles by counterfactual interactive
recommender system
Link: https://dl.acm.org/doi/abs/10.1145/3594871
Year: 2023
Abstract: While personalization increases the utility of recommender
systems, it also brings the issue of filter bubbles. For example, if the system
keeps exposing and recommending the items that the user is interested in,
it may also make the user feel bored and less satisfied. Existing work
studies filter bubbles in static recommendation, where the effect of
overexposure is hard to capture. In contrast, we believe it is more
meaningful to study the issue in interactive recommendation and optimize
long-term user satisfaction. Nevertheless, it is unrealistic to train the
model online due to the high cost. As such, we have to leverage offline
training data and disentangle the causal effect on user satisfaction. To
achieve this goal, we propose a counterfactual interactive recommender
system (CIRS) that augments offline reinforcement learning (offline RL)
with causal inference. The basic idea is to first learn a causal user
model on historical data to capture the overexposure effect of items on
user satisfaction. It then uses the learned causal user model to help the
planning of the RL policy. To conduct evaluation offline, we innovatively
create an authentic RL environment (KuaiEnv) based on a real-world fully
observed user rating dataset. The experiments show the effectiveness of
CIRS in bursting filter bubbles and achieving long-term success in
interactive recommendation. The implementation of CIRS is available via
https://github.com/chongminggao/CIRS-codes.
Title: Self-supervised learning for recommender systems: A survey
Link: https://ieeexplore.ieee.org/abstract/document/10144391/
Year: 2023
Abstract: In recent years, neural architecture-based recommender systems
have achieved tremendous success, but they still fall short of expectation
when dealing with highly sparse data. Self-supervised learning (SSL), as
an emerging technique for learning from unlabeled data, has attracted
considerable attention as a potential solution to this issue. This survey
paper presents a systematic and timely review of research efforts on
self-supervised recommendation (SSR). Specifically, we propose an
exclusive definition of SSR, on top of which we develop a comprehensive
taxonomy to divide existing SSR methods into four categories: contrastive,
generative, predictive, and hybrid. For each category, we elucidate its
concept and formulation, the involved methods, as well as its pros and
cons. Furthermore, to facilitate empirical comparison, we release an
open-source library SELFRec ( https://github.com/Coder-Yu/SELFRec ),
which incorporates a wide range of SSR models and benchmark datasets.
Through rigorous experiments using this library, we derive and report
some significant findings regarding the selection of self-supervised
signals for enhancing recommendation. Finally, we shed light on the
limitations in the current research and outline the future research
directions.
Title: Improving Arabic text categorization using transformer training
diversification
Link: https://aclanthology.org/2020.wanlp-1.21/
Year: 2020
Abstract: Automatic categorization of short texts, such as news headlines
and social media posts, has many applications ranging from content
analysis to recommendation systems. In this paper, we use such text
categorization i.e., labeling the social media posts to categories like
‘sports’, ‘politics’, ‘human-rights’ among others, to showcase the
efficacy of models across different sources and varieties of Arabic. In
doing so, we show that diversifying the training data, whether by using
diverse training data for the specific task (an increase of 21% macro F1)
or using diverse data to pre-train a BERT model (26% macro F1), leads to
overall improvements in classification effectiveness. In our work, we
also introduce two new Arabic text categorization datasets, where the
first is composed of social media posts from a popular Arabic news channel
that cover Twitter, Facebook, and YouTube, and the second is composed of
tweets from popular Arabic accounts. The posts in the former are nearly
exclusively authored in modern standard Arabic (MSA), while the tweets
in the latter contain both MSA and dialectal Arabic.
Title: Structure Guided Multi-modal Pre-trained Transformer for Knowledge
Graph Reasoning
Link: https://arxiv.org/abs/2307.03591
Year: 2023
Abstract: Multimodal knowledge graphs (MKGs), which intuitively organize
information in various modalities, can benefit multiple practical
downstream tasks, such as recommendation systems, and visual question
answering. However, most MKGs are still far from complete, which motivates
the flourishing of MKG reasoning models. Recently, with the development
of general artificial architectures, the pretrained transformer models
have drawn increasing attention, especially for multimodal scenarios.
However, the research of multimodal pretrained transformer (MPT) for
knowledge graph reasoning (KGR) is still at an early stage. As the biggest
difference between MKG and other multimodal data, the rich structural
information underlying the MKG still cannot be fully leveraged in existing
MPT models. Most of them only utilize the graph structure as a retrieval
map for matching images and texts connected with the same entity. This
manner hinders their reasoning performances. To this end, we propose the
graph Structure Guided Multimodal Pretrained Transformer for knowledge
graph reasoning, termed SGMPT. Specifically, the graph structure encoder
is adopted for structural feature encoding. Then, a structure-guided
fusion module with two different strategies, i.e., weighted summation and
alignment constraint, is first designed to inject the structural
information into both the textual and visual features. To the best of our
knowledge, SGMPT is the first MPT model for multimodal KGR, which mines
the structural information underlying the knowledge graph. Extensive
experiments on FB15k-237-IMG and WN18-IMG, demonstrate that our SGMPT
outperforms existing state-of-the-art models, and prove the effectiveness
of the designed strategies.
Title: A dynamic graph representation learning based on
temporal graph transformer
Link: https://www.sciencedirect.com/science/article/pii/S1110016822005336
Year: 2023
Abstract: The graph neural network has received significant attention in
recent years because of its unique role in mining graph-structure data
and its ubiquitous application in various fields, such as social
networking and recommendation systems. Although most work focuses on
learning low-dimensional node representation in static graphs, the
dynamic nature of real-world networks makes temporal graphs more
practical and significant. In this paper, we propose a dynamic graph
representation learning method based on a temporal graph transformer
(TGT), which can efficiently preserve high-order information and
temporally evolve structural properties by incorporating an update module,
an aggregation module, and a propagation module in a single model. The
experimental results on three real-world networks demonstrate that the
TGT outperforms state-of-the-art baselines for dynamic link prediction
and edge classification tasks in terms of both accuracy and efficiency.
Title: Personalized re-ranking for recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3298689.3347000
Year: 2019
Abstract: Ranking is a core task in recommender systems, which aims at
providing an ordered list of items to users. Typically, a ranking function
is learned from the labeled dataset to optimize the global performance,
which produces a ranking score for each individual item. However, it may
be sub-optimal because the scoring function applies to each item
individually and does not explicitly consider the mutual influence
between items, as well as the differences of users' preferences or intents.
Therefore, we propose a personalized re-ranking model for recommender
systems. The proposed re-ranking model can be easily deployed as a
follow-up module after any ranking algorithm, by directly using the
existing ranking feature vectors. It directly optimizes the whole recommendation
list by employing a transformer structure to efficiently encode the
information of all items in the list. Specifically, the Transformer
applies a self-attention mechanism that directly models the global
relationships between any pair of items in the whole list. We confirm
that the performance can be further improved by introducing pre-trained
embedding to learn personalized encoding functions for different users.
Experimental results on both offline benchmarks and real-world online
e-commerce systems demonstrate the significant improvements of the proposed
re-ranking model.
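The list-wise self-attention idea in the abstract above can be sketched as follows. This is a minimal single-head illustration in plain Python, not the authors' model: the function names, the identity query/key/value projections, and the toy feature vectors are all assumptions made for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(items):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the raw item vectors
    (identity projections); a real model would learn these projections.
    """
    d = len(items[0])
    scale = math.sqrt(d)
    weights = []
    for q in items:
        # Score this item against every item in the list, itself included.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in items]
        weights.append(softmax(scores))
    # Each output vector is an attention-weighted mix of all item vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, items)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights

# Hypothetical feature vectors for three items in an initial ranked list.
ranked_list = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
encoded, attn = self_attention(ranked_list)
```

Each row of `attn` shows how strongly one item attends to every other item in the list, which is the pairwise interaction that a per-item scoring function cannot express.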
Title: An ensemble-based hotel recommender system using sentiment
analysis and aspect categorization of hotel reviews
Link: https://www.sciencedirect.com/science/article/pii/S1568494620308735
Year: 2021
Abstract: Finding a suitable hotel based on user’s need and affordability
is a complex decision-making process. Nowadays, the availability of an
ample amount of online reviews made by the customers helps us in this
regard. This very fact gives us a promising research direction in the
field of tourism called hotel recommendation system which also helps in
improving the information processing of consumers. Real-world reviews may
showcase different sentiments of the customers towards a hotel and each
review can be categorized based on different aspects such as cleanliness,
value, service, etc. Keeping these facts in mind, in the present work,
we have proposed a hotel recommendation system using Sentiment Analysis
of the hotel reviews, and aspect-based review categorization which works
on the queries given by a user. Furthermore, we have provided a new rich
and diverse dataset of online hotel reviews crawled from Tripadvisor.com.
We have followed a systematic approach which first uses an ensemble of a
binary classification called Bidirectional Encoder Representations from
Transformers (BERT) model with three phases for positive–negative,
neutral–negative, neutral–positive sentiments merged using a weight
assigning protocol. We have then fed these pre-trained word embeddings
generated by the BERT models along with other different textual features
such as word vectors generated by Word2vec, TF–IDF of frequent words,
subjectivity score, etc. to a Random Forest classifier. After that, we
have also grouped the reviews into different categories using an approach
that involves fuzzy logic and cosine similarity. Finally, we have created
a recommender system by the aforementioned frameworks. Our model has
achieved a Macro F1-score of 84% and test accuracy of 92.36% in the
classification of sentiment polarities. Also, the results of the
categorized reviews have formed compact clusters. The results are quite
promising and much better compared to state-of-the-art models.
Title: Pre-training graph transformer with multimodal side information
for recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3474085.3475709
Year: 2021
Abstract: Side information of items, e.g., images and text description,
has shown to be effective in contributing to accurate recommendations.
Inspired by the recent success of pre-training models on natural language
and images, we propose a pre-training strategy to learn item
representations by considering both item side information and their
relationships. We relate items by common user activities, e.g.,
co-purchase, and construct a homogeneous item graph. This graph provides a
unified view of item relations and their associated side information in
multimodality. We develop a novel sampling algorithm named MCNSampling
to select contextual neighbors for each item. The proposed Pre-trained
Multimodal Graph Transformer (PMGT) learns item representations with two
objectives: 1) graph structure reconstruction, and 2) masked node feature
reconstruction. Experimental results on real datasets demonstrate that
the proposed PMGT model effectively exploits the multimodality side
information to achieve better accuracies in downstream tasks including
item recommendation and click-through ratio prediction. In addition, we
also report a case study of testing PMGT in an online setting with 600
thousand users.
Title: Knowledge aware emotion recognition in textual conversations via
multi-task incremental transformer
Link: https://aclanthology.org/2020.coling-main.392/
Year: 2020
Abstract: Emotion recognition in textual conversations (ERTC) plays an
important role in a wide range of applications, such as opinion mining,
recommender systems, and so on. ERTC, however, is a challenging task. For
one thing, speakers often rely on the context and commonsense knowledge
to express emotions; for another, most utterances contain neutral emotion
in conversations, as a result, the confusion between a few non-neutral
utterances and much more neutral ones restrains the emotion recognition
performance. In this paper, we propose a novel Knowledge Aware Incremental
Transformer with Multi-task Learning (KAITML) to address these challenges.
Firstly, we devise a dual-level graph attention mechanism to leverage
commonsense knowledge, which augments the semantic information of the
utterance. Then we apply the Incremental Transformer to encode
multi-turn contextual utterances. Moreover, we are the first to introduce
multi-task learning to alleviate the aforementioned confusion and thus
further improve the emotion recognition performance. Extensive experimental
results show that our KAITML model outperforms the state-of-the-art
models across five benchmark datasets.
Title: A survey on knowledge graph-based recommender systems
Link: https://ieeexplore.ieee.org/abstract/document/9390863/
Year: 2021
Abstract: To solve the cognitive overload problem and information
explosion, recommender systems have been used to model the user interest.
Although recommender systems have been developed for decades, there still
exist many problems such as cold start and data sparsity. Thus, the
knowledge graph is introduced into the recommendation domain to alleviate
these problems. We collect papers related to the knowledge graph-based
recommender systems in recent years to summarize their fundamental
knowledge and main ideas, including the usage of the knowledge graph in
the recommender systems and user interest models. Finally, we propose
several future directions aiming to make some progress.
Title: Improving conversational recommender systems via knowledge graph
based semantic fusion
Link: https://dl.acm.org/doi/abs/10.1145/3394486.3403143
Year: 2020
Abstract: Conversational recommender systems (CRS) aim to recommend
high-quality items to users through interactive conversations. Although
several efforts have been made for CRS, two major issues still remain to
be solved. First, the conversation data itself lacks sufficient
contextual information for accurately understanding users' preference.
Second, there is a semantic gap between natural language expression and
item-level user preference. To address these issues, we incorporate both
word-oriented and entity-oriented knowledge graphs (KG) to enhance the
data representations in CRSs, and adopt Mutual Information Maximization
to align the word-level and entity-level semantic spaces. Based on the
aligned semantic representations, we further develop a KG-enhanced
recommender component for making accurate recommendations, and a
KG-enhanced dialog component that can generate informative keywords or
entities in the response text. Extensive experiments have demonstrated
the effectiveness of our approach in yielding better performance on both
recommendation and conversation tasks.
Title: Leveraging historical interaction data for improving
conversational recommender system
Link: https://dl.acm.org/doi/abs/10.1145/3340531.3412098
Year: 2020
Abstract: Recently, conversational recommender system (CRS) has become
an emerging and practical research topic. Most of the existing CRS methods
focus on learning effective preference representations for users from
conversation data alone. In contrast, we take a new perspective to leverage
historical interaction data for improving CRS. For this purpose, we
propose a novel pre-training approach to integrating both item-based
preference sequence (from historical interaction data) and
attribute-based preference sequence (from conversation data) via pre-training
methods. We carefully design two pre-training tasks to enhance
information fusion between item- and attribute-based preference. To
improve the learning performance, we further develop an effective
negative sample generator which can produce high-quality negative samples.
Experiment results on two real-world datasets have demonstrated the
effectiveness of our approach for improving CRS.
Title: Specter: Document-level representation learning using
citation-informed transformers
Link: https://arxiv.org/abs/2004.07180
Year: 2020
Abstract: Representation learning is a critical ingredient for natural
language processing systems. Recent Transformer language models like BERT
learn powerful textual representations, but these models are targeted
towards token- and sentence-level training objectives and do not leverage
information on inter-document relatedness, which limits their
document-level representation power. For applications on scientific documents,
such as classification and recommendation, the embeddings power strong
performance on end tasks. We propose SPECTER, a new method to generate
document-level embedding of scientific documents based on pretraining a
Transformer language model on a powerful signal of document-level
relatedness: the citation graph. Unlike existing pretrained language
models, SPECTER can be easily applied to downstream applications without
task-specific fine-tuning. Additionally, to encourage further research
on document-level models, we introduce SciDocs, a new evaluation
benchmark consisting of seven document-level tasks ranging from citation
prediction, to document classification and recommendation. We show that
SPECTER outperforms a variety of competitive baselines on the benchmark.
Title: Adversarial oracular seq2seq learning for sequential
recommendation
Link: https://www.ijcai.org/Proceedings/2020/0264.pdf
Year: 2021
Abstract:
Title: Uprec: User-aware pre-training for recommender systems
Link: https://arxiv.org/abs/2102.10989
Year: 2021
Abstract: Existing sequential recommendation methods rely on large amounts
of training data and usually suffer from the data sparsity problem. To
tackle this, the pre-training mechanism has been widely adopted, which
attempts to leverage large-scale data to perform self-supervised learning
and transfer the pre-trained parameters to downstream tasks. However,
previous pre-trained models for recommendation focus on leveraging
universal sequence patterns from user behaviour sequences and item
information, while neglecting to capture personalized interests with the
heterogeneous user information, which has been shown effective in
contributing to personalized recommendation. In this paper, we propose a
method to enhance pre-trained models with heterogeneous user information,
called User-aware Pre-training for Recommendation (UPRec). Specifically,
UPRec leverages the user attributes and structured social graphs to
construct self-supervised objectives in the pre-training stage and
proposes two user-aware pre-training tasks. Comprehensive experimental
results on several real-world large-scale recommendation datasets
demonstrate that UPRec can effectively integrate user information into
pre-trained models and thus provide more appropriate recommendations for
users.
Title: Towards topic-guided conversational recommender system
Link: https://arxiv.org/abs/2010.04125
Year: 2020
Abstract: Conversational recommender systems (CRS) aim to recommend
high-quality items to users through interactive conversations. To develop an
effective CRS, the support of high-quality datasets is essential.
Existing CRS datasets mainly focus on immediate requests from users,
while lacking proactive guidance to the recommendation scenario. In this
paper, we contribute a new CRS dataset named TG-ReDial (Recommendation
through Topic-Guided Dialog). Our dataset has two major features. First, it
incorporates topic threads to enforce natural semantic transitions
towards the recommendation scenario. Second, it is created in a
semi-automatic way, hence human annotation is more reasonable and controllable.
Based on TG-ReDial, we present the task of topic-guided conversational
recommendation, and propose an effective approach to this task. Extensive
experiments have demonstrated the effectiveness of our approach on three
sub-tasks, namely topic prediction, item recommendation and response
generation. TG-ReDial is available at this https URL.
Title: What does bert know about books, movies and music? probing bert
for conversational recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412249
Year: 2020
Abstract: Heavily pre-trained transformer models such as BERT have
recently shown to be remarkably powerful at language modelling, achieving
impressive results on numerous downstream tasks. It has also been shown
that they implicitly store factual knowledge in their parameters after
pre-training. Understanding what the pre-training procedure of LMs
actually learns is a crucial step for using and improving them for
Conversational Recommender Systems (CRS). We first study how much
off-the-shelf pre-trained BERT “knows” about recommendation items such as
books, movies and music. In order to analyze the knowledge stored in
BERT’s parameters, we use different probes (i.e., tasks to examine a
trained model regarding certain properties) that require different types
of knowledge to solve, namely content-based and collaborative-based.
Content-based knowledge is knowledge that requires the model to match the
titles of items with their content information, such as textual
descriptions and genres. In contrast, collaborative-based knowledge
requires the model to match items with similar ones, according to
community interactions such as ratings. We resort to BERT’s Masked
Language Modelling (MLM) head to probe its knowledge about the genre of
items, with cloze style prompts. In addition, we employ BERT’s Next
Sentence Prediction (NSP) head and representations’ similarity (SIM) to
compare relevant and non-relevant search and recommendation
query-document inputs to explore whether BERT can, without any fine-tuning,
rank relevant items first. Finally, we study how BERT performs in a
conversational recommendation downstream task. To this end, we fine-tune
BERT to act as a retrieval-based CRS. Overall, our experiments show that:
(i) BERT has knowledge stored in its parameters about the content of
books, movies and music; (ii) it has more content-based knowledge than
collaborative-based knowledge; and (iii) fails on conversational
recommendation when faced with adversarial data.
Title: Graph neural network for tag ranking in tag-enhanced video
recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3340531.3416021
Year: 2020
Abstract: In tag-enhanced video recommendation systems, videos are
attached with some tags that highlight the contents of videos from
different aspects. Tag ranking in such recommendation systems provides
personalized tag lists for videos from their tag candidates. A better tag
ranking model could attract users to click more tags, enter their
corresponding tag channels, and watch more tag-specific videos, which
improves both tag click rate and video watching time. However, most
conventional tag ranking models merely concentrate on tag-video relevance
or tag-related behaviors, ignoring the rich information in video-related
behaviors. We should consider user preferences on both tags and videos.
In this paper, we propose a novel Graph neural network based tag ranking
(GraphTR) framework on a huge heterogeneous network with video, tag, user
and media. We design a novel graph neural network that combines
multi-field transformer, GraphSAGE and neural FM layers in node aggregation.
We also propose a neighbor-similarity based loss to encode various user
preferences into heterogeneous node representations. In experiments, we
conduct both offline and online evaluations on a real-world video
recommendation system in WeChat Top Stories. The significant improvements
in both video and tag related metrics confirm the effectiveness and
robustness in real-world tag-enhanced video recommendation. Currently,
GraphTR has been deployed on WeChat Top Stories for more than six months.
The source codes are in https://github.com/lqfarmer/GraphTR.
Title: Molecular graph enhanced transformer for retrosynthesis prediction
Link:
https://www.sciencedirect.com/science/article/pii/S0925231221009413
Year: 2021
Abstract: With massive possible synthetic routes in chemistry,
retrosynthesis prediction is still a challenge for researchers. Recently,
retrosynthesis prediction is formulated as a Machine Translation (MT)
task. Namely, since each molecule can be represented as a Simplified
Molecular-Input Line-Entry System (SMILES) string, the process of
retrosynthesis is analogized to a process of language translation from
the product to reactants. However, the MT models applied to SMILES
data usually ignore the information of natural atomic connections and the
topology of molecules. To make more chemically plausible constraints on
the atom representation learning for better performance, in this paper,
we propose a Graph Enhanced Transformer (GET) framework, which adopts
both the sequential and graphical information of molecules. Four
different GET designs are proposed, which fuse the SMILES representations
with atom embeddings learned from our improved Graph Neural Network (GNN).
Empirical results show that our model significantly outperforms the
vanilla Transformer model in test accuracy.
Title: News recommender system: a review of recent progress,
challenges, and opportunities
Link: https://link.springer.com/article/10.1007/s10462-021-10043-x
Year: 2022
Abstract: Nowadays, more and more news readers read news online where
they have access to millions of news articles from multiple sources. In
order to help users find the right and relevant content, news recommender
systems (NRS) are developed to relieve the information overload problem
and suggest news items that might be of interest for the news readers.
In this paper, we highlight the major challenges faced by the NRS and
identify the possible solutions from the state-of-the-art. Our discussion
is divided into two parts. In the first part, we present an overview of
the recommendation solutions, datasets, evaluation criteria beyond
accuracy and recommendation platforms being used in the NRS. We also talk
about two popular classes of models that have been successfully used in
recent years. In the second part, we focus on the deep neural networks
as solutions to build the NRS. Different from previous surveys, we study
the effects of news recommendations on user behaviors and try to suggest
possible remedies to mitigate those effects. By providing the
state-of-the-art knowledge, this survey can help researchers and professional
practitioners have a better understanding of the recent developments in
news recommendation algorithms. In addition, this survey sheds light on
the potential new directions.
Title: Random Offset Block Embedding (ROBE) for compressed embedding
tables in deep learning recommendation systems
Link: https://proceedings.mlsys.org/paper_files/paper/2022/hash/1eb34d662b67a14e3511d0dfd78669be-Abstract.html
Year: 2022
Abstract: Deep learning for recommendation data is one of the most pervasive and challenging AI
workload in recent times. State-of-the-art recommendation models are one of the largest models
matching the likes of GPT-3 and Switch Transformer. Challenges in deep learning recommendation
models (DLRM) stem from learning dense embeddings for each of the categorical tokens. These
embedding tables in industrial scale models can be as large as hundreds of terabytes. Such large models
lead to a plethora of engineering challenges, not to mention prohibitive communication overheads, and
slower training and inference times. Of these, slower inference time directly impacts user experience.
Model compression for DLRM is gaining traction and the community has recently shown impressive
compression results. In this paper, we present Random Offset Block Embedding Array (ROBE) as a low
memory alternative to embedding tables which provide orders of magnitude reduction in memory
usage while maintaining accuracy and boosting execution speed. ROBE is a simple fundamental
approach in improving both cache performance and the variance of randomized hashing, which could
be of independent interest in itself. We demonstrate that we can successfully train DLRM models with
same accuracy while using 1000× less memory. A 1000× compressed model directly results in faster
inference without any engineering effort. In particular, we show that we can train DLRM model using
ROBE array of size 100MB on a single GPU to achieve AUC of 0.8025 or higher as required by official
MLPerf CriteoTB benchmark DLRM model of 100GB while achieving about 3.1× (209%) improvement
in inference throughput.
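The core idea of replacing per-ID embedding tables with one small shared array can be sketched as below. The abstract does not spell out the lookup rule, so everything here is an assumption for illustration: the names `block_offset` and `robe_lookup`, the use of SHA-256 as a stand-in hash, the circular wrap-around, and the toy sizes. In a real model the shared array would hold trainable parameters.

```python
import hashlib

def block_offset(item_id, block_index, array_size):
    """Map (item, block) to a pseudo-random start offset in the shared array.

    Assumption: SHA-256 stands in for whatever hash family the paper uses.
    """
    key = f"{item_id}:{block_index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % array_size

def robe_lookup(shared, item_id, dim, block_size):
    """Assemble a dim-sized embedding from hashed blocks of one shared array."""
    assert dim % block_size == 0, "dim must be a multiple of block_size"
    size = len(shared)
    vec = []
    for b in range(dim // block_size):
        start = block_offset(item_id, b, size)
        # Read block_size consecutive slots, wrapping around the array end.
        vec.extend(shared[(start + i) % size] for i in range(block_size))
    return vec

# Toy shared parameter array; far smaller than one row per categorical value.
shared = [0.01 * i for i in range(1000)]
emb = robe_lookup(shared, item_id=42, dim=8, block_size=4)
```

Memory no longer grows with the number of categorical values: any ID, however large, resolves to slots of the same fixed-size array, at the cost of different IDs sharing parameters. Reading consecutive slots per block keeps memory accesses contiguous, which is the kind of cache behaviour the abstract refers to.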
Title: Guided transformer: Leveraging multiple external sources for
representation learning in conversational search
Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401061
Year: 2020
Abstract: Asking clarifying questions in response to ambiguous or faceted
queries has been recognized as a useful technique for various information
retrieval systems, especially conversational search systems with limited
bandwidth interfaces. Analyzing and generating clarifying questions have
been studied recently but the accurate utilization of user responses to
clarifying questions has been relatively less explored. In this paper,
we enrich the representations learned by Transformer networks using a
novel attention mechanism from external information sources that weights
each term in the conversation. We evaluate this Guided Transformer model
in a conversational search scenario that includes clarifying questions.
In our experiments, we use two separate external sources, including the
top retrieved documents and a set of different possible clarifying
questions for the query. We implement the proposed representation
learning model for two downstream tasks in conversational search;
document retrieval and next clarifying question selection. Our
experiments use a public dataset for search clarification and demonstrate
significant improvements compared to competitive baselines.
Title: Self-supervised reinforcement learning for recommender systems
Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401147
Year: 2020
Abstract: In session-based or sequential recommendation, it is important
to consider a number of factors like long-term user engagement, multiple
types of user-item interactions such as clicks, purchases etc. The current
state-of-the-art supervised approaches fail to model them appropriately.
Casting sequential recommendation task as a reinforcement learning (RL)
problem is a promising direction. A major component of RL approaches is
to train the agent through interactions with the environment. However,
it is often problematic to train a recommender in an on-line fashion due
to the requirement to expose users to irrelevant recommendations. As a
result, learning the policy from logged implicit feedback is of vital
importance, which is challenging due to the pure off-policy setting and
lack of negative rewards (feedback). In this paper, we propose
self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output
layers: one for self-supervised learning and the other for RL. The RL
part acts as a regularizer to drive the supervised layer focusing on
specific rewards (e.g., recommending items which may lead to purchases
rather than clicks) while the self-supervised layer with cross-entropy
loss provides strong gradient signals for parameter updates. Based on
such an approach, we propose two frameworks namely Self-Supervised
Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the
proposed frameworks with four state-of-the-art recommendation models.
Experimental results on two real-world datasets demonstrate the
effectiveness of our approach.
Title: MEANTIME: Mixture of attention mechanisms with multi-temporal
embeddings for sequential recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412216
Year: 2020
Abstract: Recently, self-attention based models have achieved
state-of-the-art performance in sequential recommendation task. Following the
custom from language processing, most of these models rely on a simple
positional embedding to exploit the sequential nature of the user’s
history. However, there are some limitations regarding the current
approaches. First, sequential recommendation is different from language
processing in that timestamp information is available. Previous models
have not made good use of it to extract additional contextual information.
Second, using a simple embedding scheme can lead to information bottleneck
since the same embedding has to represent all possible contextual biases.
Third, since previous models use the same positional embedding in each
attention head, they can wastefully learn overlapping patterns. To
address these limitations, we propose MEANTIME (MixturE of AtteNTIon
mechanisms with Multi-temporal Embeddings) which employs multiple types
of temporal embeddings designed to capture various patterns from the
user’s behavior sequence, and an attention structure that fully leverages
such diversity. Experiments on real-world data show that our proposed
method outperforms current state-of-the-art sequential recommendation
methods, and we provide an extensive ablation study to analyze how the
model gains from the diverse positional information.
Title: Deep learning for recommender systems: A Netflix case study
Link:
https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/18140
Year: 2021
Abstract: Deep learning has profoundly impacted many areas of machine
learning. However, it took a while for its impact to be felt in the field
of recommender systems. In this article, we outline some of the challenges
encountered and lessons learned in using deep learning for recommender
systems at Netflix. We first provide an overview of the various
recommendation tasks on the Netflix service. We found that different
model architectures excel at different tasks. Even though many
deep-learning models can be understood as extensions of existing (simple)
recommendation algorithms, we initially did not observe significant
improvements in performance over well-tuned non-deep-learning approaches.
Only when we added numerous features of heterogeneous types to the input
data, deep-learning models did start to shine in our setting. We also
observed that deep-learning methods can exacerbate the problem of
offline–online metric (mis-)alignment. After addressing these challenges,
deep learning has ultimately resulted in large improvements to our
recommendations as measured by both offline and online metrics. On the
practical side, integrating deep-learning toolboxes in our system has
made it faster and easier to implement and experiment with both
deep-learning and non-deep-learning approaches for various recommendation
tasks. We conclude this article by summarizing our take-aways that may
generalize to other applications beyond Netflix.
Title: Transformer network for remaining useful life prediction of
lithium-ion batteries
Link: https://ieeexplore.ieee.org/abstract/document/9714323/
Year: 2022
Abstract: Accurately predicting the Remaining Useful Life (RUL) of a
Li-ion battery plays an important role in managing the health and estimating
the state of a battery. With the rapid development of electric vehicles,
there is an increasing need to develop and improve the techniques for
predicting RUL. To predict RUL, we designed a Transformer-based neural
network. First, battery capacity data is always full of noise, especially
during battery charge/discharge regeneration. To alleviate this problem,
we applied a Denoising Auto-Encoder (DAE) to process raw data. Then, to
capture temporal information and learn useful features, a reconstructed
sequence was fed into a Transformer network. Finally, to bridge denoising
and prediction tasks, we combined these two tasks into a unified framework.
Results of extensive experiments conducted on two data sets and a
comparison with some existing methods show that our proposed method
performs better in predicting RUL. Our projects are all open source and
are available at https://github.com/XiuzeZhou/RUL .
Title: A hierarchical recommendation system for E-commerce using online
user reviews
Link: https://www.sciencedirect.com/science/article/pii/S1567422322000151
Year: 2022
Abstract: Recommendation systems are considered as one of the important
components of e-commerce platforms due to their direct impact on
profitability. In this study, we propose a hierarchical recommendation
system to increase the performance of the e-commerce recommendation
system. Our DeepIDRS approach has a two-level hierarchical structure: (1)
The first level uses bidirectional encoder representations to represent
textual information of an item (title, description, and a subset of item
reviews), efficiently and accurately; (2) The second level is an
attention-based sequential recommendation model that uses item embeddings
derived from the first level of the hierarchical structure. Furthermore,
we compare our approach DeepIDRS with various approaches from different
perspectives. Our results in the real-world dataset show that DeepIDRS
provides at least 10% better HR@10 and NDCG@10 performance than other
review-based models. With this study, for e-commerce, we clearly show
that a hierarchical, explainable recommendation system that accurately
represents the item title, description, and a subset of item reviews,
improves performance.
Title: [PDF][PDF] Bert, elmo, use and infersent sentence encoders: The
panacea for research-paper recommendation?
Link: https://ceur-ws.org/Vol-2431/paper2.pdf
Year: 2019
Abstract:
Title: EdgeRec: recommender system on edge in Mobile Taobao
Link: https://dl.acm.org/doi/abs/10.1145/3340531.3412700
Year: 2020
Abstract: Recommender system (RS) has become a crucial module in most
web-scale applications. Recently, most RSs are in the waterfall form
based on the cloud-to-edge framework, where recommended results are
transmitted to edge (e.g., user mobile) by computing in advance in the
cloud server. Despite effectiveness, network bandwidth and latency
between cloud server and edge may cause the delay for system feedback and
user perception. Hence, real-time computing on edge could help capture
user preferences more precisely and thus make more satisfactory
recommendations. Our work, to our best knowledge, is the first attempt
to design and implement the novel Recommender System on Edge (EdgeRec),
which achieves Real-time User Perception and Real-time System Feedback.
Moreover, we propose Heterogeneous User Behavior Sequence Modeling and
Context-aware Reranking with Behavior Attention Networks to capture
user's diverse interests and adjust recommendation results accordingly.
Experimental results on both the offline evaluation and online
performance in Taobao home-page feeds demonstrate the effectiveness of
EdgeRec.
Title: Towards question-based recommender systems
Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401180
Year: 2020
Abstract: Conversational and question-based recommender systems have
gained increasing attention in recent years, with users enabled to
converse with the system and better control recommendations. Nevertheless,
research in the field is still limited, compared to traditional
recommender systems. In this work, we propose a novel Question-based
recommendation method, Qrec, to assist users to find items interactively,
by answering automatically constructed and algorithmically chosen
questions. Previous conversational recommender systems ask users to
express their preferences over items or item facets. Our model, instead,
asks users to express their preferences over descriptive item features.
The model is first trained offline by a novel matrix factorization
algorithm, and then iteratively updates the user and item latent factors
online by a closed-form solution based on the user answers. Meanwhile,
our model infers the underlying user belief and preferences over items
to learn an optimal question-asking strategy by using Generalized Binary
Search, so as to ask a sequence of questions to the user. Our experimental
results demonstrate that our proposed matrix factorization model
outperforms the traditional Probabilistic Matrix Factorization model.
Further, our proposed Qrec model can greatly improve the performance of
state-of-the-art baselines, and it is also effective in the case of
cold-start user and item recommendations.
Title: Enhancing Recommender Systems with Large Language Model Reasoning
Graphs
Link: https://arxiv.org/abs/2308.10835
Year: 2023
Abstract: Recommendation systems aim to provide users with relevant
suggestions, but often lack interpretability and fail to capture
higher-level semantic relationships between user behaviors and profiles. In this
paper, we propose a novel approach that leverages large language models
(LLMs) to construct personalized reasoning graphs. These graphs link a
user's profile and behavioral sequences through causal and logical
inferences, representing the user's interests in an interpretable way.
Our approach, LLM reasoning graphs (LLMRG), has four components: chained
graph reasoning, divergent extension, self-verification and scoring, and
knowledge base self-improvement. The resulting reasoning graph is encoded
using graph neural networks, which serves as additional input to improve
conventional recommender systems, without requiring extra user or item
information. Our approach demonstrates how LLMs can enable more logical
and interpretable recommender systems through personalized reasoning
graphs. LLMRG allows recommendations to benefit from both engineered
recommendation systems and LLM-derived reasoning graphs. We demonstrate
the effectiveness of LLMRG on benchmarks and real-world scenarios in
enhancing base recommendation models.
Title: Deep learning based recommender system using cross convolutional
filters
Link: https://www.sciencedirect.com/science/article/pii/S0020025522000561
Year: 2022
Abstract: With the recent development of online transactions, recommender
systems have increasingly attracted attention in various domains. The
recommender system supports the users’ decision making by recommending
items that are more likely to be preferred. Many studies in the field of
deep learning-based recommender systems have attempted to capture the
complex interactions between users’ and items’ features for accurate
recommendation. In this paper, we propose a recommender system based on
the convolutional neural network using the outer product matrix of
features and cross convolutional filters. The proposed method can deal
with the various types of features and capture the meaningful
higher-order interactions between users and items, giving greater weight to
important features. Moreover, it can alleviate the overfitting problem
since the proposed method includes the global average or max pooling
instead of the fully connected layers in the structure. Experiments showed
that the proposed method performs better than the existing methods, by
capturing important interactions and alleviating the overfitting issue.
Title: Bert4sessrec: Content-based video relevance prediction with
bidirectional encoder representations from transformer
Link: https://dl.acm.org/doi/abs/10.1145/3343031.3356051
Year: 2019
Abstract: This paper describes our solution for the Content-Based Video
Relevance Prediction (CBVRP) challenge, where the task is to predict user
click-through behavior on new TV series or new movies according to the
user's historical behavior. We consider the task as a session-based
recommendation problem and we focus on the modeling of the session. Thus,
we use the Bidirectional Encoder Representations from Transformer (BERT)
methodology and propose a BERT for session-based recommendation
(BERT4SessRec) method. Our method has two stages: in the pre-training
stage, we use all sessions as training data and train the bidirectional
session encoder with the masking trick; in the fine-tuning stage, we use
the provided click-through data and train the click-through prediction
network. Our method achieves session representations with the help of
BERT, which effectively captures the bidirectional correlation in each
session. In addition, the pre-training stage makes full use of all
sessions, overcoming the positive-negative imbalance problem of the
click-through data. We report the results of using different kinds of
features on the test set of the challenge, which verify the effectiveness
of our method.
Title: Bert4nilm: A bidirectional transformer model for non-intrusive
load monitoring
Link: https://dl.acm.org/doi/abs/10.1145/3427771.3429390
Year: 2020
Abstract: Non-intrusive load monitoring (NILM) based energy
disaggregation is the decomposition of a system's energy into the
consumption of its individual appliances. Previous work on deep learning
NILM algorithms has shown great potential in the field of energy
management and smart grids. In this paper, we propose BERT4NILM, an
architecture based on bidirectional encoder representations from
transformers (BERT) and an improved objective function designed
specifically for NILM learning. We adapt the bidirectional transformer
architecture to the field of energy disaggregation and follow the pattern
of sequence-to-sequence learning. With the improved loss function and
masked training, BERT4NILM outperforms state-of-the-art models across
various metrics on the two publicly available datasets UK-DALE and REDD.
Title: [HTML][HTML] Knowledge transfer via pre-training for
recommendation: A review and prospect
Link: https://www.frontiersin.org/articles/10.3389/fdata.2021.602071/full
Year: 2021
Abstract: Recommender systems aim to provide item recommendations for
users and are usually faced with data sparsity problems (e.g., cold start)
in real-world scenarios. Recently pre-trained models have shown their
effectiveness in knowledge transfer between domains and tasks, which can
potentially alleviate the data sparsity problem in recommender systems.
In this survey, we first provide a review of recommender systems with
pre-training. In addition, we show the benefits of pre-training to
recommender systems through experiments. Finally, we discuss several
promising directions for future research of recommender systems with
pre-training. The source code of our experiments will be available to
facilitate future research.
Title: Developing a personalized recommendation system in a smart product
service system based on unsupervised learning model
Link: https://www.sciencedirect.com/science/article/pii/S0166361521000282
Year: 2021
Abstract: Contemporary consumers have begun shifting their focus from
product functionality toward the value that can be derived from products.
In response to this trend, companies have begun using product service
systems (PSS), business models that provide customers not only with
tangible products but also with intangible services. Moreover, with the
increasing use of smart devices, services providers can offer customized
services to customers based on user-generated data with smart product
service systems (Smart PSS).
Despite extensive research on Smart PSS framework, few of these frameworks
treated customer as an active data producer, which means producing data
for the Smart PSS actively. Additionally, most of them proposed a general
solution instead of a personalized one. To bridge the research gap, this
study proposed a method that includes: (1) unsupervised natural language
processing (NLP) methods to analyze user-provided data. (2) a
recommendation system integrating deep learning to offer customers with
personalized solutions. Thus, the role of customers is not only a service
receiver but also an active data producer and forms a value co-creation
process with service providers. A case study of tourist recommendation
validates the benefits of the proposed method. The main contribution of this
research is to develop a personalized smart PSS method which could achieve
a win-win situation for all players in this method.
Title: A review of modern fashion recommender systems
Link: https://arxiv.org/abs/2202.02757
Year: 2022
Abstract: The textile and apparel industries have grown tremendously over
the last few years. Customers no longer have to visit many stores, stand
in long queues, or try on garments in dressing rooms as millions of
products are now available in online catalogs. However, given the plethora
of options available, an effective recommendation system is necessary to
properly sort, order, and communicate relevant product material or
information to users. Effective fashion RS can have a noticeable impact
on billions of customers' shopping experiences and increase sales and
revenues on the provider side. The goal of this survey is to provide a
review of recommender systems that operate in the specific vertical domain
of garment and fashion products. We have identified the most pressing
challenges in fashion RS research and created a taxonomy that categorizes
the literature according to the objective they are trying to accomplish
(e.g., item or outfit recommendation, size recommendation, explainability,
among others) and type of side-information (users, items, context). We
have also identified the most important evaluation goals and perspectives
(outfit generation, outfit recommendation, pairing recommendation, and
fill-in-the-blank outfit compatibility prediction) and the most commonly
used datasets and evaluation metrics.
Title: An empirical study on the usage of transformer models for code
completion
Link: https://ieeexplore.ieee.org/abstract/document/9616462/
Year: 2021
Abstract: Code completion aims at speeding up code writing by predicting
the next code token(s) the developer is likely to write. Works in this
field focused on improving the accuracy of the generated predictions,
with substantial leaps forward made possible by deep learning (DL) models.
However, code completion techniques are mostly evaluated in the scenario
of predicting the next token to type, with few exceptions pushing the
boundaries to the prediction of an entire code statement. Thus, little
is known about the performance of state-of-the-art code completion
approaches in more challenging scenarios in which, for example, an entire
code block must be generated. We present a large-scale study exploring
the capabilities of state-of-the-art Transformer-based models in
supporting code completion at different granularity levels, including
single tokens, one or multiple entire statements, up to entire code blocks
(e.g., the iterated block of a for loop). We experimented with several
variants of two recently proposed Transformer-based models, namely
RoBERTa and the Text-To-Text Transfer Transformer (T5), for the task of
code completion. The achieved results show that Transformer-based models,
and in particular the T5, represent a viable solution for code completion,
with perfect predictions ranging from ∼ 29%, obtained when asking the
model to guess entire blocks, up to ∼ 69%, reached in the simpler scenario
of few tokens masked from the same code statement.
Title: Graph transformer networks
Link: https://proceedings.neurips.cc/paper/2019/hash/9d63484abb477c97640154d40595a3bb-Abstract.html
Year: 2019
Abstract: Graph neural networks (GNNs) have been widely used in
representation learning on graphs and achieved state-of-the-art
performance in tasks such as node classification and link prediction.
However, most existing GNNs are designed to learn node representations
on the fixed and homogeneous graphs. The limitations especially become
problematic when learning representations on a misspecified graph or a
heterogeneous graph that consists of various types of nodes and edges.
In this paper, we propose Graph Transformer Networks (GTNs) that are
capable of generating new graph structures, which involve identifying
useful connections between unconnected nodes on the original graph, while
learning effective node representation on the new graphs in an
end-to-end fashion. Graph Transformer layer, a core layer of GTNs, learns a soft
selection of edge types and composite relations for generating useful
multi-hop connections, so-called meta-paths. Our experiments show that GTNs
learn new graph structures, based on data and tasks without domain
knowledge, and yield powerful node representation via convolution on the
new graphs. Without domain-specific graph preprocessing, GTNs achieved
the best performance in all three benchmark node classification tasks
against the state-of-the-art methods that require pre-defined meta-paths
from domain knowledge.
Title: Deep learning recommendation model for personalization and
recommendation systems
Link: https://arxiv.org/abs/1906.00091
Year: 2019
Abstract: With the advent of deep learning, neural network-based
recommendation models have emerged as an important tool for tackling
personalization and recommendation tasks. These networks differ
significantly from other deep learning networks due to their need to
handle categorical features and are not well studied or understood. In
this paper, we develop a state-of-the-art deep learning recommendation
model (DLRM) and provide its implementation in both PyTorch and Caffe2
frameworks. In addition, we design a specialized parallelization scheme
utilizing model parallelism on the embedding tables to mitigate memory
constraints while exploiting data parallelism to scale-out compute from
the fully-connected layers. We compare DLRM against existing
recommendation models and characterize its performance on the Big Basin
AI platform, demonstrating its usefulness as a benchmark for future
algorithmic experimentation and system co-design.
Title: Tutorial on conversational recommendation systems
Link: https://dl.acm.org/doi/abs/10.1145/3383313.3411548
Year: 2020
Abstract: Recent years have witnessed the emerging of conversational
systems, including both physical devices and mobile-based applications.
Both the research community and industry believe that conversational
systems will have a major impact on human-computer interaction, and
specifically, the RecSys community has begun to explore Conversational
Recommendation Systems. Conversational recommendation aims at finding or
recommending the most relevant information (e.g., web pages, answers,
movies, products) for users based on textual- or spoken-dialogs, through
which users can communicate with the system more efficiently using natural
language conversations. Due to users’ constant need to look for
information to support both work and daily life, conversational
recommendation system will be one of the key techniques towards an
intelligent web. The tutorial focuses on the foundations and algorithms
for conversational recommendation, as well as their applications in
real-world systems such as search engine, e-commerce and social networks.
The tutorial aims at introducing and communicating conversational
recommendation methods to the community, as well as gathering researchers
and practitioners interested in this research direction for discussions,
idea communications, and research promotions.
Title: Why are deep learning models not consistently winning recommender
systems competitions yet? A position paper
Link: https://dl.acm.org/doi/abs/10.1145/3415959.3416001
Year: 2020
Abstract: For the past few years most published research on recommendation
algorithms has been based on deep learning (DL) methods. Following common
research practices in our field, these works usually demonstrate that a
new DL method is outperforming other models not based on deep learning
in offline experiments. This almost consistent success of DL based models
is however not observed in recommendation-related machine learning
competitions like the challenges that are held with the yearly ACM RecSys
conference. Instead the winning solutions mostly consist of substantial
feature engineering efforts and the use of gradient boosting or ensemble
techniques. In this paper we investigate possible reasons for this
surprising phenomenon. We consider multiple possible factors such as the
characteristics and complexity of the problem settings, datasets, and DL
methods; the background of the competition participants; or the
particularities of the evaluation approach.
Title: Optimizing deep learning recommender systems training on cpu
cluster architectures
Link: https://ieeexplore.ieee.org/abstract/document/9355237/
Year: 2020
Abstract: During the last two years, the goal of many researchers has
been to squeeze the last bit of performance out of HPC system for AI
tasks. Often this discussion is held in the context of how fast ResNet50
can be trained. Unfortunately, ResNet50 is no longer a representative
workload in 2020. Thus, we focus on Recommender Systems which account for
most of the AI cycles in cloud computing centers. More specifically, we
focus on Facebook's DLRM benchmark. By enabling it to run on latest CPU
hardware and software tailored for HPC, we are able to achieve up to two
orders of magnitude improvement in performance on a single socket compared
to the reference CPU implementation, and high scaling efficiency up to
64 sockets, while fitting ultra-large datasets which cannot be held in
single node's memory. Therefore, this paper discusses and analyzes novel
optimization and parallelization techniques for the various operators in
DLRM. Several optimizations (e.g. tensor-contraction accelerated MLPs,
framework MPI progression, BFLOAT16 training with up to 1.8× speed-up)
are general and transferable to many other deep learning topologies.
Title: A survey on adversarial recommender systems: from attack/defense
strategies to generative adversarial networks
Link: https://dl.acm.org/doi/abs/10.1145/3439729
Year: 2021
Abstract: Latent-factor models (LFM) based on collaborative filtering
(CF), such as matrix factorization (MF) and deep CF methods, are widely
used in modern recommender systems (RS) due to their excellent performance
and recommendation accuracy. However, success has been accompanied with
a major new arising challenge: Many applications of machine learning (ML)
are adversarial in nature [146]. In recent years, it has been shown that
these methods are vulnerable to adversarial examples, i.e., subtle but
non-random perturbations designed to force recommendation models to
produce erroneous outputs. The goal of this survey is two-fold: (i) to
present recent advances on adversarial machine learning (AML) for the
security of RS (i.e., attacking and defense recommendation models) and
(ii) to show another successful application of AML in generative
adversarial networks (GANs) for generative applications, thanks to their
ability for learning (high-dimensional) data distributions. In this
survey, we provide an exhaustive literature review of 76 articles
published in major RS and ML journals and conferences. This review serves
as a reference for the RS community working on the security of RS or on
generative models using GANs to improve their quality.
Title: GPU accelerated feature engineering and training for recommender
systems
Link: https://dl.acm.org/doi/abs/10.1145/3415959.3415996
Year: 2020
Abstract: In this paper we present our 1st place solution of the RecSys
Challenge 2020 which focused on the prediction of user behavior,
specifically the interaction with content, on this year’s dataset from
competition host Twitter. Our approach achieved the highest score in
seven of the eight metrics used to calculate the final leaderboard
position. The 200 million tweet dataset required significant computation
to do feature engineering and prepare the dataset for modelling, and our
winning solution leveraged several key tools in order to accelerate our
training pipeline. Within the paper we describe our exploratory data
analysis (EDA) and training, the final features and models used, and the
acceleration of the pipeline. Our final implementation runs entirely on
GPU including feature engineering, preprocessing, and training the models.
From our initial single threaded efforts in Pandas which took over ten
hours we were able to accelerate feature engineering, preprocessing and
training to two minutes and eighteen seconds, an end to end speedup of
over 280x, using a combination of RAPIDS cuDF, Dask, UCX and XGBoost on
a single machine with four NVIDIA V100 GPUs. Even when compared to heavily
optimized code written later using Dask and Pandas on a 20 core CPU, our
solution was still 25x faster. The acceleration of our pipeline was
critical in our ability to quickly perform EDA which led to the discovery
of a range of effective features used in the final solution, which is
provided as open source [16].
Title: Attentive capsule network for click-through rate and conversion
rate prediction in online advertising
Link: https://www.sciencedirect.com/science/article/pii/S0950705120306511
Year: 2021
Abstract: Estimating Click-through Rate (CTR) and Conversion Rate (CVR)
are two essential user response prediction tasks in computing advertising
and recommendation systems. The mainstream methods map sparse,
high-dimensional categorical features (e.g., user id, item id) into
low-dimensional representations with neural networks. Although they have
achieved significant advancement in recent years, how to capture user’s
diverse interests effectively from past behaviors is still challenging.
Recently some works try using attention-based methods to learn the
representation from user behavior history adaptively. However, it is
insufficient to capture the diversity of user’s interests. As a step
forward to improve this goal, we propose a method named Attentive Capsule
Network (ACN). It uses Transformers for feature interaction and leverages
capsule networks to capture multiple interests from user behavior history.
To precisely obtain sequence representation related to the current
advertisement, we further design a modified dynamic routing algorithm
integrating with an attention mechanism. Experimental results on
real-world datasets demonstrate the effectiveness of our proposed ACN with
significant improvement over state-of-the-art approaches. Moreover, it
also offers good explainability when extracting diverse interest points
of users from behavior history.
Title: [PDF][PDF] Disentangling the Performance Puzzle of
Multimodal-aware Recommender Systems
Link: https://sisinflab.poliba.it/publications/2023/MCPD23/_END__EvalRS_2023___Disentangling_the_Performance_Puzzle_of_Multimodal_aware_Recommender_Systems.pdf
Year: 2023
Abstract:
Title: A novel time-aware food recommender-system based on deep learning
and graph clustering
Link: https://ieeexplore.ieee.org/abstract/document/9775081/
Year: 2022
Abstract: Food recommender-systems are considered an effective tool to
help users adjust their eating habits and achieve a healthier diet. This
paper aims to develop a new hybrid food recommender-system to overcome
the shortcomings of previous systems, such as ignoring food ingredients,
time factor, cold start users, cold start food items and community aspects.
The proposed method involves two phases: food content-based
recommendation and user-based recommendation. Graph clustering is used
in the first phase, and a deep-learning based approach is used in the
second phase to cluster both users and food items. Besides, a
holistic-like approach is employed to account for time and user-community related
issues in a way that improves the quality of the recommendation provided
to the user. We compared our model with a set of state-of-the-art
recommender-systems using five distinct performance metrics: Precision,
Recall, F1, AUC and NDCG. Experiments using dataset extracted from
“Allrecipes.com” demonstrated that the developed food recommender-system
performed best.
Title: Recommendation as language processing (rlp): A unified pretrain,
personalized prompt & predict paradigm (p5)
Link: https://dl.acm.org/doi/abs/10.1145/3523227.3546767
Year: 2022
Abstract: For a long time, different recommendation tasks require
designing task-specific architectures and training objectives. As a
result, it is hard to transfer the knowledge and representations from one
task to another, thus restricting the generalization ability of existing
recommendation approaches. To deal with such issues, considering that
language can describe almost anything and language grounding is a powerful
medium to represent various problems or tasks, we present a flexible and
unified text-to-text paradigm called “Pretrain, Personalized Prompt, and
Predict Paradigm” (P5) for recommendation, which unifies various
recommendation tasks in a shared framework. In P5, all data such as
user-item interactions, user descriptions, item metadata, and user reviews are
converted to a common format — natural language sequences. The rich
information from natural language assists P5 to capture deeper semantics
for personalization and recommendation. Specifically, P5 learns different
tasks with the same language modeling objective during pretraining. Thus,
it serves as the foundation model for various downstream recommendation
tasks, allows easy integration with other modalities, and enables
instruction-based recommendation. P5 advances recommender systems from
shallow model to deep model to big model, and will revolutionize the
technical form of recommender systems towards universal recommendation
engine. With adaptive personalized prompt for different users, P5 is able
to make predictions in a zero-shot or few-shot manner and largely reduces
the necessity for extensive fine-tuning. On several benchmarks, we
conduct experiments to show the effectiveness of P5. To help advance
future research on Recommendation as Language Processing (RLP),
Personalized Foundation Models (PFM), and Universal Recommendation Engine
(URE), we release the source code, dataset, prompts, and pretrained P5
model at https://github.com/jeykigung/P5.
Title: Equivariant contrastive learning for sequential recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3604915.3608786
Year: 2023
Abstract: Contrastive learning (CL) benefits the training of sequential
recommendation models with informative self-supervision signals. Existing
solutions apply general sequential data augmentation strategies to
generate positive pairs and encourage their representations to be
invariant. However, due to the inherent properties of user behavior
sequences, some augmentation strategies, such as item substitution, can
lead to changes in user intent. Learning indiscriminately invariant
representations for all augmentation strategies might be sub-optimal.
Therefore, we propose Equivariant Contrastive Learning for Sequential
Recommendation (ECL-SR), which endows SR models with great discriminative
power, making the learned user behavior representations sensitive to
invasive augmentations (e.g., item substitution) and insensitive to mild
augmentations (e.g., feature-level dropout masking). In detail, we use
the conditional discriminator to capture differences in behavior due to
item substitution, which encourages the user behavior encoder to be
equivariant to invasive augmentations. Comprehensive experiments on four
benchmark datasets show that the proposed ECL-SR framework achieves
competitive performance compared to state-of-the-art SR models. The
source code is available at https://github.com/Tokkiu/ECL.
Title: A survey of graph neural networks for recommender systems:
Challenges, methods, and directions
Link: https://dl.acm.org/doi/abs/10.1145/3568022
Year: 2023
Abstract: Recommender system is one of the most important information
services on today’s Internet. Recently, graph neural networks have become
the new state-of-the-art approach to recommender systems. In this survey,
we conduct a comprehensive review of the literature on graph neural
network-based recommender systems. We first introduce the background and
the history of the development of both recommender systems and graph
neural networks. For recommender systems, in general, there are four
aspects for categorizing existing works: stage, scenario, objective, and
application. For graph neural networks, the existing methods consist of
two categories: spectral models and spatial ones. We then discuss the
motivation of applying graph neural networks into recommender systems,
mainly consisting of the high-order connectivity, the structural property
of data and the enhanced supervision signal. We then systematically
analyze the challenges in graph construction, embedding
propagation/aggregation, model optimization, and computation efficiency.
Afterward and primarily, we provide a comprehensive overview of a
multitude of existing works of graph neural network-based recommender
systems, following the taxonomy above. Finally, we raise discussions on
the open problems and promising future directions in this area. We
summarize the representative papers along with their code repositories
in https://github.com/tsinghua-fib-lab/GNN-Recommender-Systems.
Title: Contrastive learning for debiased candidate generation in
large-scale recommender systems
Link: https://dl.acm.org/doi/abs/10.1145/3447548.3467102
Year: 2021
Abstract: Deep candidate generation (DCG) that narrows down the
collection of relevant items from billions to hundreds via representation
learning has become prevalent in industrial recommender systems. Standard
approaches approximate maximum likelihood estimation (MLE) through
sampling for better scalability and address the problem of DCG in a way
similar to language modeling. However, live recommender systems face
severe exposure bias and have a vocabulary several orders of magnitude
larger than that of natural language, implying that MLE will preserve and
even exacerbate the exposure bias in the long run in order to faithfully
fit the observed samples. In this paper, we theoretically prove that a
popular choice of contrastive loss is equivalent to reducing the exposure
bias via inverse propensity weighting, which provides a new perspective
for understanding the effectiveness of contrastive learning. Based on the
theoretical discovery, we design CLRec, a contrastive learning method to
improve DCG in terms of fairness, effectiveness and efficiency in
recommender systems with extremely large candidate size. We further
improve upon CLRec and propose Multi-CLRec, for accurate multi-intention
aware bias reduction. Our methods have been successfully deployed in
Taobao, where at least four-month online A/B tests and offline analyses
demonstrate its substantial improvements, including a dramatic reduction
in the Matthew effect.
Title: Graph neural networks in recommender systems: a survey
Link: https://dl.acm.org/doi/abs/10.1145/3535101
Year: 2022
Abstract: With the explosive growth of online information, recommender
systems play a key role to alleviate such information overload. Due to
the important application value of recommender systems, there have always
been emerging works in this field. In recommender systems, the main
challenge is to learn the effective user/item representations from their
interactions and side information (if any). Recently, graph neural
network (GNN) techniques have been widely utilized in recommender systems
since most of the information in recommender systems essentially has
graph structure and GNN has superiority in graph representation learning.
This article aims to provide a comprehensive review of recent research
efforts on GNN-based recommender systems. Specifically, we provide a
taxonomy of GNN-based recommendation models according to the types of
information used and recommendation tasks. Moreover, we systematically
analyze the challenges of applying GNN on different types of data and
discuss how existing works in this field address these challenges.
Furthermore, we state new perspectives pertaining to the development of
this field. We collect the representative papers along with their
open-source implementations in https://github.com/wusw14/GNN-in-RS.
Title: [HTML][HTML] A survey of recommendation systems: recommendation
models, techniques, and application fields
Link: https://www.mdpi.com/2079-9292/11/1/141
Year: 2022
Abstract: This paper reviews the research trends that link the advanced
technical aspects of recommendation systems that are used in various
service areas and the business aspects of these services. First, for a
reliable analysis of recommendation models for recommendation systems,
data mining technology, and related research by application service, more
than 135 top-ranking articles and top-tier conferences published in
Google Scholar between 2010 and 2021 were collected and reviewed. Based
on this, studies on recommendation system models and the technology used
in recommendation systems were systematized, and research trends by year
were analyzed. In addition, the application service fields where
recommendation systems were used were classified, and research on the
recommendation system model and recommendation technique used in each
field was analyzed. Furthermore, vast amounts of application
service-related data used by recommendation systems were collected from 2010 to
2021 without taking the journal ranking into consideration and reviewed
along with various recommendation system studies, as well as applied
service field industry data. As a result of this study, it was found that
the flow and quantitative growth of various detailed studies of
recommendation systems interact with the business growth of the actual
applied service field. While providing a comprehensive summary of
recommendation systems, this study provides insight to many researchers
interested in recommendation systems through the analysis of its various
technologies and trends in the service field to which recommendation
systems are applied.
Title: Deep learning techniques for recommender systems based on
collaborative filtering
Link: https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.12647
Year: 2020
Abstract: In the Big Data Era, recommender systems perform a fundamental
role in data management and information filtering. In this context,
Collaborative Filtering (CF) persists as one of the most prominent
strategies to effectively deal with large datasets and is capable of
offering users interesting content in a recommendation fashion.
Nevertheless, it is well-known CF recommenders suffer from data sparsity,
mainly in cold-start scenarios, substantially reducing the quality of
recommendations. In the vast literature about the aforementioned topic,
there are numerous solutions, in which the state-of-the-art contributions
are, in some sense, conditioned or associated with traditional CF methods
such as Matrix Factorization (MF), that is, they rely on linear
optimization procedures to model users and items into low-dimensional
embeddings. To overcome the aforementioned challenges, there has been an
increasing number of studies exploring deep learning techniques in the
CF context for latent factor modelling. In this research, authors conduct
a systematic review focusing on state-of-the-art literature on deep
learning techniques applied in collaborative filtering recommendation,
and also featuring primary studies related to mitigating the cold start
problem. Additionally, authors considered the diverse non-linear
modelling strategies to deal with rating data and side information, the
combination of deep learning techniques with traditional CF-based linear
methods, and an overview of the most used public datasets and evaluation
metrics concerning CF scenarios.
Title: Neural networks and deep learning
Link:
https://www.emerald.com/insight/content/doi/10.1108/978-1-83909694-520211010/full/html
Year: 2021
Abstract: Neural networks, which provide the basis for deep learning, are
a class of machine learning methods that are being applied to a diverse
array of fields in business, health, technology, and research. In this
chapter, we survey some of the key features of deep neural networks and
aspects of their design and architecture. We give an overview of some of
the different kinds of networks and their applications and highlight how
these architectures are used for business applications such as
recommender systems. We also provide a summary of some of the
considerations needed for using neural network models and future
directions in the field.
Title: Multi-aspect aware session-based recommendation for intelligent
transportation services
Link: https://ieeexplore.ieee.org/abstract/document/9093954/
Year: 2020
Abstract: In the intelligent transportation system, the session data
usually represents the users' demand. However, the traditional approaches
only focus on the sequence information or the last item clicked by the
user, which cannot fully represent user preferences. To address this
issue, this paper proposes a Multi-aspect Aware Session-based
Recommendation (MASR) model for intelligent transportation services,
which comprehensively considers the user's personalized behavior from
multiple aspects. In addition, it developed a concise and efficient
transformer-style self-attention to analyze the sequence information of
the current session, for accurately grasping the user's intention.
Finally, the experimental results show that MASR is available to improve
user satisfaction with more accurate and rapid recommendations, and
reduce the number of user operations to decrease the safety risk during
the transportation service.
Title: Contrastive self-supervised sequential recommendation with robust
augmentation
Link: https://arxiv.org/abs/2108.06479
Year: 2021
Abstract: Sequential Recommendation describes a set of techniques to model
dynamic user behavior in order to predict future interactions in
sequential user data. At their core, such approaches model transition
probabilities between items in a sequence, whether through Markov chains,
recurrent networks, or more recently, Transformers. However both old and
new issues remain, including data-sparsity and noisy data; such issues
can impair the performance, especially in complex, parameter-hungry
models. In this paper, we investigate the application of contrastive
Self-Supervised Learning (SSL) to the sequential recommendation, as a way
to alleviate some of these issues. Contrastive SSL constructs
augmentations from unlabelled instances, where agreements among positive
pairs are maximized. It is challenging to devise a contrastive SSL
framework for a sequential recommendation, due to its discrete nature,
correlations among items, and skewness of length distributions. To this
end, we propose a novel framework, Contrastive Self-supervised Learning
for sequential Recommendation (CoSeRec). We introduce two informative
augmentation operators leveraging item correlations to create
high-quality views for contrastive learning. Experimental results on three
real-world datasets demonstrate the effectiveness of the proposed method
on improving model performance and the robustness against sparse and
noisy data. Our implementation is available online at this https URL.
Title: [HTML][HTML] Advances and challenges in conversational recommender
systems: A survey
Link:
https://www.sciencedirect.com/science/article/pii/S2666651021000164
Year: 2021
Abstract: Recommender systems exploit interaction history to estimate
user preference, having been heavily used in a wide range of industry
applications. However, static recommendation models are difficult to
answer two important questions well due to inherent shortcomings: (a)
What exactly does a user like? (b) Why does a user like an item? The
shortcomings are due to the way that static models learn user preference,
i.e., without explicit instructions and active feedback from users. The
recent rise of conversational recommender systems (CRSs) changes this
situation fundamentally. In a CRS, users and the system can dynamically
communicate through natural language interactions, which provide
unprecedented opportunities to explicitly obtain the exact preference of
users. Considerable efforts, spread across disparate settings and
applications, have been put into developing CRSs. Existing models,
technologies, and evaluation methods for CRSs are far from mature. In
this paper, we provide a systematic review of the techniques used in
current CRSs. We summarize the key challenges of developing CRSs in five
directions: (1) Question-based user preference elicitation. (2) Multi-turn
conversational recommendation strategies. (3) Dialogue understanding
and generation. (4) Exploitation-exploration trade-offs. (5) Evaluation
and user simulation. These research directions involve multiple research
fields like information retrieval (IR), natural language processing (NLP),
and human-computer interaction (HCI). Based on these research directions,
we discuss some future challenges and opportunities. We provide a road
map for researchers from multiple communities to get started in this area.
We hope this survey can help to identify and address challenges in CRSs
and inspire future research.
Title: [PDF][PDF] Deep feedback network for recommendation
Link: https://www.ijcai.org/Proceedings/2020/0349.pdf
Year: 2021
Title: Adversarial feature translation for multi-domain recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3447548.3467176
Year: 2021
Abstract: Real-world super platforms such as Google and WeChat usually
have different recommendation scenarios to provide heterogeneous items
for users' diverse demands. Multi-domain recommendation (MDR) is proposed
to improve all recommendation domains simultaneously, where the key point
is to capture informative domain-specific features from all domains. To
address this problem, we propose a novel Adversarial feature translation
(AFT) model for MDR, which learns the feature translations between
different domains under a generative adversarial network framework.
Precisely, in the multi-domain generator, we propose a domain-specific
masked encoder to highlight inter-domain feature interactions, and then
aggregate these features via a transformer and a domain-specific
attention. In the multi-domain discriminator, we explicitly model the
relationships between item, domain and users' general/domain-specific
representations with a two-step feature translation inspired by the
knowledge representation learning. In experiments, we evaluate AFT on a
public and an industrial MDR datasets and achieve significant
improvements. We also conduct an online evaluation on a real-world MDR
system. We further give detailed ablation tests and model analyses to
verify the effectiveness of different components. Currently, we have
deployed AFT on WeChat Top Stories. The source code is in
https://github.com/xiaobocser/AFT.
Title: Time interval aware self-attention for sequential recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3336191.3371786
Year: 2020
Abstract: Sequential recommender systems seek to exploit the order of
users' interactions, in order to predict their next action based on the
context of what they have done recently. Traditionally, Markov
Chains (MCs), and more recently Recurrent Neural Networks (RNNs) and Self
Attention (SA) have proliferated due to their ability to capture the
dynamics of sequential patterns. However a simplifying assumption made
by most of these models is to regard interaction histories as ordered
sequences, without regard for the time intervals between each interaction
(i.e., they model the time-order but not the actual timestamp). In this
paper, we seek to explicitly model the timestamps of interactions within
a sequential modeling framework to explore the influence of different
time intervals on next item prediction. We propose TiSASRec (Time Interval
aware Self-attention based sequential recommendation), which models both
the absolute positions of items as well as the time intervals between
them in a sequence. Extensive empirical studies show the features of
TiSASRec under different settings and compare the performance of
self-attention with different positional encodings. Furthermore, experimental
results show that our method outperforms various state-of-the-art
sequential models on both sparse and dense datasets and different
evaluation metrics.
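As a rough illustration of what "time intervals between them" could look like as model input, the sketch below builds a pairwise interval matrix from raw timestamps. Scaling by the smallest non-zero gap and clipping at `max_interval` are assumptions made for this example, as is the function name; the abstract does not say how TiSASRec preprocesses intervals.

```python
def interval_matrix(timestamps, max_interval):
    """Pairwise |t_i - t_j|, scaled and clipped (illustrative assumptions).

    timestamps   : list of integer interaction times for one user
    max_interval : hypothetical clipping threshold for scaled intervals
    """
    # Collect every non-zero gap so the smallest one can serve as the unit.
    gaps = [
        abs(a - b)
        for a in timestamps
        for b in timestamps
        if a != b
    ]
    unit = min(gaps) if gaps else 1

    # Entry (i, j) is the scaled gap between interactions i and j,
    # capped so that very long pauses share a single bucket.
    return [
        [min(abs(ti - tj) // unit, max_interval) for tj in timestamps]
        for ti in timestamps
    ]


matrix = interval_matrix([0, 10, 30, 100], max_interval=5)
for row in matrix:
    print(row)
```

Each entry could then index a learned interval embedding, in the same way that a position index selects a positional embedding.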
Title: A deep learning approach for robust detection of bots in twitter
using transformers
Link: https://ieeexplore.ieee.org/abstract/document/9385071/
Year: 2021
Abstract: During the last decades, the volume of multimedia content posted
in social networks has grown exponentially and such information is
immediately propagated and consumed by a significant number of users. In
this scenario, the disruption of fake news providers and bot accounts for
spreading propaganda information as well as sensitive content throughout
the network has fostered applied research to automatically measure the
reliability of social networks accounts via Artificial Intelligence (AI).
In this paper, we present a multilingual approach for addressing the bot
identification task in Twitter via Deep learning (DL) approaches to
support end-users when checking the credibility of a certain Twitter
account. To do so, several experiments were conducted using
state-of-the-art Multilingual Language Models to generate an encoding of the
text-based features of the user account that are later on concatenated with
the rest of the metadata to build a potential input vector on top of a
Dense Network denoted as Bot-DenseNet. Consequently, this paper assesses
the language constraint from previous studies where the encoding of the
user account only considered either the metadata information or the
metadata information together with some basic semantic text features.
Moreover, the Bot-DenseNet produces a low-dimensional representation of
the user account which can be used for any application within the
Information Retrieval (IR) framework.
Title: Multimedia recommender systems: Algorithms and challenges
Link: https://link.springer.com/chapter/10.1007/978-1-0716-2197-4_25
Year: 2021
Abstract: This chapter studies state-of-the-art research related to
multimedia recommender systems (MMRS), focusing on methods that integrate
multimedia content as side information to various recommendation models.
The multimedia features are then used by an MMRS to recommend either (1)
media items from which the features were derived, or (2) non-media items
utilizing the features obtained from a proxy multimedia representation
of the item (e.g., images of clothes). We first outline the key
considerations and challenges that must be taken into account while
developing an MMRS. We then discuss the most popular multimedia content
processing approaches to produce item representations that may be
utilized as side information in an MMRS. Finally, we discuss recent
state-of-the-art MMRS algorithms, which we classify and present according to
classical hybrid models (e.g., VBPR), neural approaches, and graph-based
approaches. Throughout this work, we mentioned several use-cases of MMRSs
in the recommender systems research across several domains or products
types such as food, fashion, music, videos, and so forth. We hope this
chapter provides fresh insights into the nexus of multimedia and
recommender systems, which could be exploited to broaden the frontier in
the field.
Title: A memory transformer network for incremental learning
Link: https://arxiv.org/abs/2210.04485
Year: 2022
Abstract: We study class-incremental learning, a training setup in which
new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application
of classification models to class-incremental learning results in the
"catastrophic forgetting" of previously seen classes. One of the most
successful existing methods has been the use of a memory of exemplars,
which overcomes the issue of catastrophic forgetting by saving a subset
of past data into a memory bank and utilizing it to prevent forgetting
when training future tasks. In our paper, we propose to enhance the
utilization of this memory bank: we not only use it as a source of
additional training data like existing works but also integrate it in the
prediction process explicitly. Our method, the Memory Transformer Network
(MTN), learns how to combine and aggregate the information from the
nearest neighbors in the memory with a transformer to make more accurate
predictions. We conduct extensive experiments and ablations to evaluate
our approach. We show that MTN achieves state-of-the-art performance on
the challenging ImageNet-1k and Google-Landmarks-1k incremental learning
benchmarks.
Title: A survey of recommender systems with multi-objective optimization
Link:
https://www.sciencedirect.com/science/article/pii/S0925231221017185
Year: 2022
Abstract: Recommender systems have been widely applied to several domains
and applications to assist decision making by recommending items tailored
to user preferences. One of the popular recommendation algorithms is the
model-based approach which optimizes a specific objective to improve the
recommendation performance. These traditional recommendation models
usually deal with a single objective, such as minimizing the prediction
errors or maximizing the ranking quality of the recommendations. In recent
years, there is an emerging demand for multi-objective recommender
systems in which multiple objectives are considered and the
recommendations can be optimized by the multi-objective optimization. For
example, a recommendation model may be built by optimizing multiple
metrics, such as accuracy, novelty and diversity of the recommendations.
The multi-objective optimization methodologies have been well developed
and applied to the area of recommender systems. In this article, we
provide a comprehensive literature review of the multi-objective
recommender systems. Particularly, we identify the circumstances in which
a multi-objective recommender system could be useful, summarize the
methodologies and evaluation approaches in these systems, point out
existing challenges or weaknesses, finally provide the guidelines and
suggestions for the development of multi-objective recommender systems.
Title: Untargeted attack against federated recommendation systems via
poisonous item embeddings and the defense
Link: https://ojs.aaai.org/index.php/AAAI/article/view/25611
Year: 2023
Abstract: Federated recommendation (FedRec) can train personalized
recommenders without collecting user data, but the decentralized nature
makes it susceptible to poisoning attacks. Most previous studies focus
on the targeted attack to promote certain items, while the untargeted
attack that aims to degrade the overall performance of the FedRec system
remains less explored. In fact, untargeted attacks can disrupt the user
experience and bring severe financial loss to the service provider. However,
existing untargeted attack methods are either inapplicable or ineffective
against FedRec systems. In this paper, we delve into the untargeted attack
and its defense for FedRec systems. (i) We propose ClusterAttack, a novel
untargeted attack method. It uploads poisonous gradients that converge
the item embeddings into several dense clusters, which make the
recommender generate similar scores for these items in the same cluster
and perturb the ranking order. (ii) We propose a uniformity-based defense
mechanism (UNION) to protect FedRec systems from such attacks. We design
a contrastive learning task that regularizes the item embeddings toward
a uniform distribution. Then the server filters out these malicious
gradients by estimating the uniformity of updated item embeddings.
Experiments on two public datasets show that ClusterAttack can
effectively degrade the performance of FedRec systems while circumventing
many defense methods, and UNION can improve the resistance of the system
against various untargeted attacks, including our ClusterAttack.
Title: Filter-enhanced MLP is all you need for sequential recommendation
Link: https://dl.acm.org/doi/abs/10.1145/3485447.3512111
Year: 2022
Abstract: Recently, deep neural networks such as RNN, CNN and Transformer
have been applied in the task of sequential recommendation, which aims
to capture the dynamic preference characteristics from logged user
behavior data for accurate recommendation. However, in online platforms,
logged user behavior data is inevitable to contain noise, and deep
recommendation models are easy to overfit on these logged data. To tackle
this problem, we borrow the idea of filtering algorithms from signal
processing that attenuates the noise in the frequency domain. In our
empirical experiments, we find that filtering algorithms can
substantially improve representative sequential recommendation models,
and integrating simple filtering algorithms (e.g., Band-Stop Filter) with
an all-MLP architecture can even outperform competitive Transformer-based
models. Motivated by it, we propose FMLP-Rec, an all-MLP model with
learnable filters for sequential recommendation task. The all-MLP
architecture endows our model with lower time complexity, and the
learnable filters can adaptively attenuate the noise information in the
frequency domain. Extensive experiments conducted on eight real-world
datasets demonstrate the superiority of our proposed method over
competitive RNN, CNN, GNN and Transformer-based methods. Our code and
data are publicly available at the link:
https://github.com/RUCAIBox/FMLP-Rec.
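To make the idea of attenuating noise in the frequency domain concrete, here is a minimal sketch that transforms a short scalar sequence, multiplies each frequency by a weight, and transforms back. It is not the FMLP-Rec implementation: the weights are fixed by hand rather than learned, the sequence is one-dimensional rather than a sequence of item embeddings, and a naive DFT stands in for an FFT. All function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def filter_sequence(x, weights):
    """Scale each frequency component of x by the matching weight.

    For a real-valued result the weights should be symmetric,
    i.e. weights[k] == weights[n - k] for k >= 1.
    """
    spectrum = dft(x)
    filtered = [w * c for w, c in zip(weights, spectrum)]
    # Any remaining imaginary part is numerical noise for symmetric weights.
    return [value.real for value in idft(filtered)]


# A signal with mean 2 plus an alternating +1/-1 component.
noisy = [3.0, 1.0, 3.0, 1.0]
# Keep only the zero-frequency (mean) component.
low_pass = [1.0, 0.0, 0.0, 0.0]
smoothed = filter_sequence(noisy, low_pass)
print([round(v, 6) for v in smoothed])
```

Replacing the hand-set `low_pass` weights with trainable parameters is the step that would turn this fixed filter into a learnable one of the kind the abstract describes.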
Title: Noninvasive self-attention for side information fusion in
sequential recommendation
Link: https://ojs.aaai.org/index.php/AAAI/article/view/16549
Year: 2021
Abstract: Sequential recommender systems aim to model users’ evolving
interests from their historical behaviors, and hence make customized
time-relevant recommendations. Compared with traditional models, deep
learning approaches such as CNN and RNN have achieved remarkable
advancements in recommendation tasks. Recently, the BERT framework also
emerges as a promising method, benefited from its self-attention
mechanism in processing sequential data. However, one limitation of the
original BERT framework is that it only considers one input source of the
natural language tokens. It is still an open question to leverage various
types of information under the BERT framework. Nonetheless, it is
intuitively appealing to utilize other side information, such as item
category or tag, for more comprehensive depictions and better
recommendations. In our pilot experiments, we found naive approaches,
which directly fuse types of side information into the item embeddings,
usually bring very little or even negative effects. Therefore, in this
paper, we propose the NOn-inVasive self-Attention mechanism (NOVA) to
leverage side information effectively under the BERT framework. NOVA
makes use of side information to generate better attention distribution,
rather than directly altering the item embeddings, which may cause
information overwhelming. We validate the NOVA-BERT model on both public
and commercial datasets, and our method can stably outperform the
state-of-the-art models with negligible computational overheads.
Title: An approach to integrating sentiment analysis into recommender
systems
Link: https://www.mdpi.com/1424-8220/21/16/5666
Year: 2021
Abstract: Recommender systems have been applied in a wide range of domains
such as e-commerce, media, banking, and utilities. This kind of system
provides personalized suggestions based on large amounts of data to
increase user satisfaction. These suggestions help client select products,
while organizations can increase the consumption of a product. In the
case of social data, sentiment analysis can help gain better understanding
of a user’s attitudes, opinions and emotions, which is beneficial to
integrate in recommender systems for achieving higher recommendation
reliability. On the one hand, this information can be used to complement
explicit ratings given to products by users. On the other hand, sentiment
analysis of items that can be derived from online news services, blogs,
social media or even from the recommender systems themselves is seen as
capable of providing better recommendations to users. In this study, we
present and evaluate a recommendation approach that integrates sentiment
analysis into collaborative filtering methods. The recommender system
proposal is based on an adaptive architecture, which includes improved
techniques for feature extraction and deep learning models based on
sentiment analysis. The results of the empirical study performed with two
popular datasets show that sentiment-based deep learning models and
collaborative filtering methods can significantly improve the recommender
system’s performance.
Title: Stochastic shared embeddings: Data-driven regularization of
embedding layers
Link:
https://proceedings.neurips.cc/paper/2019/hash/37693cfc748049e45d87b8c7d8b9aacd-Abstract.html
Year: 2019
Abstract: In deep neural nets, lower level embedding layers account for
a large portion of the total number of parameters. Tikhonov regularization,
graph-based regularization, and hard parameter sharing are approaches
that introduce explicit biases into training in a hope to reduce
statistical complexity. Alternatively, we propose stochastically shared
embeddings (SSE), a data-driven approach to regularizing embedding layers,
which stochastically transitions between embeddings during stochastic
gradient descent (SGD). Because SSE integrates seamlessly with existing
SGD algorithms, it can be used with only minor modifications when training
large scale neural networks. We develop two versions of SSE: SSE-Graph
using knowledge graphs of embeddings; SSE-SE using no prior information.
We provide theoretical guarantees for our method and show its empirical
effectiveness on 6 distinct tasks, from simple neural networks with one
hidden layer in recommender systems, to the transformer and BERT in
natural languages. We find that when used along with widely-used
regularization methods such as weight decay and dropout, our proposed SSE
can further reduce overfitting, which often leads to more favorable
generalization results.
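The phrase "stochastically transitions between embeddings" can be pictured with a small sketch: before each lookup, an embedding index is occasionally swapped for another one. The uniform replacement rule, the probability `p`, and the function name below are assumptions chosen for illustration of the no-prior-information (SSE-SE) setting; the abstract does not give the exact transition rule.

```python
import random


def stochastic_swap(indices, num_embeddings, p, rng=None):
    """Replace each index with a uniformly random one with probability p.

    indices        : embedding indices for one training batch
    num_embeddings : size of the embedding table
    p              : hypothetical swap probability (0 disables swapping)
    rng            : optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    return [
        rng.randrange(num_embeddings) if rng.random() < p else index
        for index in indices
    ]


batch = [3, 1, 4, 1, 5]
# With p = 0 the batch is returned unchanged.
print(stochastic_swap(batch, num_embeddings=10, p=0.0))
# With a small p, a few lookups are redirected during training only.
print(stochastic_swap(batch, num_embeddings=10, p=0.2, rng=random.Random(0)))
```

Because only the indices fed to the embedding lookup change, a step like this can sit in front of an existing SGD training loop, which is consistent with the abstract's remark that only minor modifications are needed.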
Title: CRSLab: An open-source toolkit for building conversational
recommender system
Link: https://arxiv.org/abs/2101.00939
Year: 2021
Abstract: In recent years, conversational recommender system (CRS) has
received much attention in the research community. However, existing
studies on CRS vary in scenarios, goals and techniques, lacking unified,
standardized implementation or comparison. To tackle this challenge, we
propose an open-source CRS toolkit CRSLab, which provides a unified and
extensible framework with highly-decoupled modules to develop CRSs. Based
on this framework, we collect 6 commonly-used human-annotated CRS
datasets and implement 18 models that include recent techniques such as
graph neural network and pre-training models. Besides, our toolkit
provides a series of automatic evaluation protocols and a human-machine
interaction interface to test and compare different CRS methods. The
project and documents are released at this https URL.
Title: U-BERT: Pre-training user representations for improved
recommendation
Link: https://ojs.aaai.org/index.php/AAAI/article/view/16557
Year: 2021
Abstract: Learning user representation is a critical task for
recommendation systems as it can encode user preference for personalized
services. User representation is generally learned from behavior data,
such as clicking interactions and review comments. However, for less
popular domains, the behavior data is insufficient to learn precise user
representations. To deal with this problem, a natural thought is to
leverage content-rich domains to complement user representations.
Inspired by the recent success of BERT in NLP, we propose a novel
pre-training and fine-tuning based approach U-BERT. Different from typical
BERT applications, U-BERT is customized for recommendation and utilizes
different frameworks in pre-training and fine-tuning. In pre-training,
U-BERT focuses on content-rich domains and introduces a user encoder and
a review encoder to model users' behaviors. Two pre-training strategies
are proposed to learn the general user representations. In fine-tuning,
U-BERT focuses on the target content-insufficient domains. In addition
to the user and review encoders inherited from the pre-training stage,
U-BERT further introduces an item encoder to model item representations.
Besides, a review co-matching layer is proposed to capture more semantic
interactions between the reviews of the user and item. Finally, U-BERT
combines user representations, item representations and review
interaction information to improve recommendation performance.
Experiments on six benchmark datasets from different domains demonstrate
the state-of-the-art performance of U-BERT.
Title: Sequential recommendation via stochastic self-attention
Link: https://dl.acm.org/doi/abs/10.1145/3485447.3512077
Year: 2022
Abstract: Sequential recommendation models the dynamics of a user’s
previous behaviors in order to forecast the next item, and has drawn a
lot of attention. Transformer-based approaches, which embed items as
vectors and use dot-product self-attention to measure the relationship
between items, demonstrate superior capabilities among existing
sequential methods. However, users’ real-world sequential behaviors are
uncertain rather than deterministic, posing a significant challenge to
present techniques. We further suggest that dot-product-based approaches
cannot fully capture collaborative transitivity, which can be derived in
item-item transitions inside sequences and is beneficial for cold start
items. We further argue that BPR loss has no constraint on positive and
sampled negative items, which misleads the optimization.
We propose a novel STOchastic Self-Attention (STOSA) to overcome these
issues. STOSA, in particular, embeds each item as a stochastic Gaussian
distribution, the covariance of which encodes the uncertainty. We devise
a novel Wasserstein Self-Attention module to characterize item-item
position-wise relationships in sequences, which effectively incorporates
uncertainty into model training. Wasserstein attentions also enlighten
the collaborative transitivity learning as it satisfies triangle
inequality. Moreover, we introduce a novel regularization term to the
ranking loss, which assures the dissimilarity between positive and the
negative items. Extensive experiments on five real-world benchmark
datasets demonstrate the superiority of the proposed model over
state-of-the-art baselines, especially on cold start items. The code is
available in https://github.com/zfan20/STOSA.
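To make the triangle-inequality remark above concrete, here is a minimal sketch of the closed-form 2-Wasserstein distance between two Gaussians with diagonal covariance. Representing each item by a mean vector and per-dimension standard deviations is an assumption for illustration, as is the function name `w2_distance`; the abstract does not specify the exact parameterization used by STOSA.

```python
import math

def w2_distance(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between two diagonal-covariance Gaussians,
    given mean vectors and per-dimension standard deviations."""
    mean_term = sum((m1 - m2) ** 2 for m1, m2 in zip(mu_a, mu_b))
    std_term = sum((s1 - s2) ** 2 for s1, s2 in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)

# Three toy "items", each a (mean, std) pair; the values are made up.
item_a = ([0.0, 0.0], [1.0, 1.0])
item_b = ([3.0, 4.0], [1.0, 1.0])
item_c = ([3.0, 0.0], [1.0, 1.0])

d_ab = w2_distance(*item_a, *item_b)
d_ac = w2_distance(*item_a, *item_c)
d_cb = w2_distance(*item_c, *item_b)

# Being a metric, the distance obeys the triangle inequality, which a
# dot-product score does not guarantee.
print(d_ab, d_ac + d_cb)
assert d_ab <= d_ac + d_cb
```

An attention score could then be derived from the negative distance, so that closer distributions attend to each other more strongly.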
Title: A troubling analysis of reproducibility and progress in
recommender systems research
Link: https://dl.acm.org/doi/abs/10.1145/3434185
Year: 2021
Abstract: The design of algorithms that generate personalized ranked item
lists is a central topic of research in the field of recommender systems.
In the past few years, in particular, approaches based on deep learning
(neural) techniques have become dominant in the literature. For all of
them, substantial progress over the state-of-the-art is claimed. However,
indications exist of certain problems in today’s research practice, e.g.,
with respect to the choice and optimization of the baselines used for
comparison, raising questions about the published claims. To obtain a
better understanding of the actual progress, we have compared recent
results in the area of neural recommendation approaches based on
collaborative filtering against a consistent set of existing simple
baselines. The worrying outcome of the analysis of these recent works—
all were published at prestigious scientific conferences between 2015 and
2018—is that 11 of the 12 reproducible neural approaches can be
outperformed by conceptually simple methods, e.g., based on the
nearest-neighbor heuristic or linear models. None of the computationally complex
neural methods was actually consistently better than already existing
learning-based techniques, e.g., using matrix factorization or linear
models. In our analysis, we discuss common issues in today’s research
practice, which, despite the many papers that are published on the topic,
have apparently led the field to a certain level of stagnation.
Title: Fast-adapting and privacy-preserving federated recommender system
Link: https://link.springer.com/article/10.1007/s00778-021-00700-6
Year: 2021
Abstract: In the mobile Internet era, recommender systems have become an
irreplaceable tool to help users discover useful items, thus alleviating
the information overload problem. Recent research on deep neural network
(DNN)-based recommender systems have made significant progress in
improving prediction accuracy, largely attributed to the widely
accessible large-scale user data. Such data is commonly collected from
users’ personal devices and then centrally stored in the cloud server to
facilitate model training. However, with the rising public concerns on
user privacy leakage in online platforms, online users are becoming
increasingly anxious over abuses of user privacy. Therefore, it is urgent
and beneficial to develop a recommender system that can achieve both high
prediction accuracy and strong privacy protection. To this end, we propose
a DNN-based recommendation model called PrivRec running on the
decentralized federated learning (FL) environment, which ensures that a
user’s data is fully retained on her/his personal device while
contributing to training an accurate model. On the other hand, to better
embrace the data heterogeneity (e.g., users’ data vary in scale and
quality significantly) in FL, we innovatively introduce a first-order
meta-learning method that enables fast on-device personalization with
only a few data points. Furthermore, to defend against potential malicious
participants that pose serious security threat to other users, we further
develop a user-level differentially private model, namely DP-PrivRec, so
attackers are unable to identify any arbitrary user from the trained
model. To compensate for the loss by adding noise during model updates,
we introduce a two-stage training approach. Finally, we conduct extensive
experiments on two large-scale datasets in a simulated FL environment,
and the results validate the superiority of both PrivRec and DP-PrivRec.
Title: A generic network compression framework for sequential recommender
systems
Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401125
Year: 2020
Abstract: Sequential recommender systems (SRS) have become the key
technology in capturing user's dynamic interests and generating
high-quality recommendations. Current state-of-the-art sequential recommender
models are typically based on a sandwich-structured deep neural network,
where one or more middle (hidden) layers are placed between the input
embedding layer and output softmax layer. In general, these models require
a large number of parameters to obtain optimal performance. Despite the
effectiveness, at some point, further increasing model size may be harder
for model deployment in resource-constraint devices. To resolve the
issues, we propose a compressed sequential recommendation framework,
termed as CpRec, where two generic model shrinking techniques are employed.
Specifically, we first propose a block-wise adaptive decomposition to
approximate the input and softmax matrices by exploiting the fact that
items in SRS obey a long-tailed distribution. To reduce the parameters
of the middle layers, we introduce three layer-wise parameter sharing
schemes. We instantiate CpRec using deep convolutional neural network
with dilated kernels given consideration to both recommendation accuracy
and efficiency. By the extensive ablation studies, we demonstrate that
the proposed CpRec can achieve up to 4~8 times compression rates in
real-world SRS datasets. Meanwhile, CpRec is faster during training & inference,
and in most cases outperforms its uncompressed counterpart.
Title: Recommendation system for technology convergence opportunities
based on self-supervised representation learning
Link: https://link.springer.com/article/10.1007/s11192-020-03731-y
Year: 2021
Abstract: We show how a deep neural network can be designed to learn
meaningful representations from high-dimensional and heterogeneous
categorical features in patent data using self-supervised learning. Based
on each firm’s technology portfolio and each patent’s co-classification
information, we propose a novel recommendation system for firms seeking
new convergence opportunities through representations of convergence
items and firms. The results of this work are expected to recommend
convergence opportunities in multiple technology fields by considering
the target firm’s potential preference. First, we create a technology
portfolio consisting of a set of patents owned by each firm. Then, we
train a neural network to extract latent representations of firms and
technology convergence items. Despite a lack of indicators related to a
firm’s latent preference for a convergence item, a self-supervised neural
network can capture the similarity with semantic information of firm’s
latent preference that is implicitly present in patent’s
co-classification information in each firm’s technology portfolio. We then
calculate the similarity between the vector of a target firm and
convergence items for recommendation. The top N similar convergence items
that have the highest scores are recommended as the new convergence items
for the target firm. We apply our framework to the dataset of patents
granted by the United States Patent and Trademark Office between 2011 and
2015. The results indicate that the recent development in theories and
empirical studies of deep representation learning can shed new light on
extracting valuable information from the structured part of patent data.
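The final recommendation step described above (score each convergence item against the target firm's vector, then keep the top N) can be sketched as follows. Cosine similarity is an assumption here, since the abstract only says "similarity"; the names `cosine` and `top_n_items` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_n_items(firm_vec, item_vecs, n):
    """Return the names of the n items most similar to the firm vector."""
    ranked = sorted(
        item_vecs.items(),
        key=lambda kv: cosine(firm_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:n]]

# Toy latent vectors standing in for learned representations.
items = {
    "item_x": [1.0, 0.0],
    "item_y": [0.0, 1.0],
    "item_z": [1.0, 1.0],
}
print(top_n_items([2.0, 0.1], items, n=2))  # ['item_x', 'item_z']
```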
Title: AICF: Attention-based item collaborative filtering
Link: https://www.sciencedirect.com/science/article/pii/S1474034620300598
Year: 2020
Abstract: Item-to-item collaborative filtering (short for ICF) has been
widely used in ecommerce websites due to its interpretability and
simplicity in real-time personalized recommendation. The focus of ICF is
to calculate the similarity between items. With the rapid development of
machine learning in recent years, it takes similarity model instead of
cosine similarity and Pearson coefficient to calculate the similarity
between items in recommendation. However, the existing similarity models
cannot sufficiently express the preferences of users for different
items. In this work, we propose a novel attention-based item collaborative
filtering model (AICF) which adopts three different attention mechanisms
to estimate the weights of historical items that users have interacted
with. Compared with the state-of-the-art recommendation models, the AICF
model with simple attention mechanism Self-Attention can better estimate
the weight of historical items on non-sparse data sets. Because deep
models can model complex connections between items, our model with the
more complex Transformer achieves superior recommendation performance on
sparse data. Extensive experiments on ML-1M and Pinterest-20 show that
the proposed model greatly outperforms other novel models in
recommendation accuracy and provides users with personalized
recommendation list more in line with their interests.
Title: Direct feedback alignment scales to modern deep learning tasks and
architectures
Link: https://proceedings.neurips.cc/paper/2020/hash/69d1fc78dbda242c43ad6590368912d4-Abstract.html
Year: 2020
Abstract: Despite being the workhorse of deep learning, the
backpropagation algorithm is no panacea. It enforces sequential layer
updates, thus preventing efficient parallelization of the training
process. Furthermore, its biological plausibility is being challenged.
Alternative schemes have been devised; yet, under the constraint of
synaptic asymmetry, none have scaled to modern deep learning tasks and
architectures. Here, we challenge this perspective, and study the
applicability of Direct Feedback Alignment (DFA) to neural view synthesis,
recommender systems, geometric learning, and natural language processing.
In contrast with previous studies limited to computer vision tasks, our
findings show that it successfully trains a large range of state-of-the-art
deep learning architectures, with performance close to fine-tuned
backpropagation. When a larger gap between DFA and backpropagation exists,
like in Transformers, we attribute this to a need to rethink common
practices for large and complex architectures. At variance with common
beliefs, our work supports that challenging tasks can be tackled in the
absence of weight transport.
Title: Contrastive learning for recommender system
Link: https://arxiv.org/abs/2101.01317
Year: 2021
Abstract: Recommender systems, which analyze users' preference patterns
to suggest potential targets, are indispensable in today's society.
Collaborative Filtering (CF) is the most popular recommendation model.
Specifically, Graph Neural Network (GNN) has become a new state-of-the-art
for CF. In the GNN-based recommender system, message dropout is
usually used to alleviate the selection bias in the user-item bipartite
graph. However, message dropout might deteriorate the recommender
system's performance due to the randomness of dropping out the outgoing
messages based on the user-item bipartite graph. To solve this problem,
we propose a graph contrastive learning module for a general recommender
system that learns the embeddings in a self-supervised manner and reduces
the randomness of message dropout. Besides, many recommender systems
optimize models with pairwise ranking objectives, such as the Bayesian
Pairwise Ranking (BPR) based on a negative sampling strategy. However,
BPR has the following problems: suboptimal sampling and sample bias. We
introduce a new debiased contrastive loss to solve these problems, which
provides sufficient negative samples and applies a bias correction
probability to alleviate the sample bias. We integrate the proposed
framework, including graph contrastive module and debiased contrastive
module with several Matrix Factorization (MF) and GNN-based recommendation
models. Experimental results on three public benchmarks demonstrate the
effectiveness of our framework.
Title: KRED: Knowledge-aware document representation for news
recommendations
Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412237
Year: 2020
Abstract: News articles usually contain knowledge entities such as
celebrities or organizations. Important entities in articles carry key
messages and help to understand the content in a more direct way. An
industrial news recommender system contains various key applications,
such as personalized recommendation, item-to-item recommendation, news
category classification, news popularity prediction and local news
detection. We find that incorporating knowledge entities for better
document understanding benefits these applications consistently. However,
existing document understanding models either represent news articles
without considering knowledge entities (e.g., BERT) or rely on a specific
type of text encoding model (e.g., DKN) so that the generalization ability
and efficiency is compromised. In this paper, we propose KRED, which is
a fast and effective model to enhance arbitrary document representation
with a knowledge graph. KRED first enriches entities’ embeddings by
attentively aggregating information from their neighborhood in the
knowledge graph. Then a context embedding layer is applied to annotate
the dynamic context of different entities such as frequency, category and
position. Finally, an information distillation layer aggregates the
entity embeddings under the guidance of the original document
representation and transforms the document vector into a new one. We
advocate to optimize the model with a multi-task framework, so that
different news recommendation applications can be united and useful
information can be shared across different tasks. Experiments on a
real-world Microsoft News dataset demonstrate that KRED greatly benefits a
variety of news recommendation applications.