Transformer Models for Recommender Systems

Title: BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer
Link: https://www.academia.edu/download/80829631/2007.07229v1.pdf
Year: 2020

Title: Deep multifaceted transformers for multi-objective ranking in large-scale e-commerce recommender systems
Link: https://dl.acm.org/doi/abs/10.1145/3340531.3412697
Year: 2020
Abstract: Recommender Systems have been playing essential roles in e-commerce portals. Existing recommendation algorithms usually learn the ranking scores of items by optimizing a single task (e.g. Click-through rate prediction) based on users' historical click sequences, but they generally pay little attention to simultaneously modeling users' multiple types of behaviors or jointly optimizing multiple objectives (e.g. both Click-through rate and Conversion rate), which are both vital for e-commerce sites. In this paper, we argue that it is crucial to formulate users' different interests based on multiple types of behaviors and perform multi-task learning for significant improvement in multiple objectives simultaneously. We propose Deep Multifaceted Transformers (DMT), a novel framework that can model users' multiple types of behavior sequences simultaneously with multiple Transformers. It utilizes Multi-gate Mixture-of-Experts to optimize multiple objectives. Besides, it exploits unbiased learning to reduce the selection bias in the training data. Experiments on JD real production dataset demonstrate the effectiveness of DMT, which significantly outperforms state-of-the-art methods. DMT has been successfully deployed to serve the main traffic in the commercial Recommender System in JD.com. To facilitate future research, we release the codes and datasets at https://github.com/guyulongcs/CIKM2020_DMT.
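
The Multi-gate Mixture-of-Experts idea named in the DMT abstract above can be sketched as follows. This is a minimal pure-Python illustration, not the DMT implementation: the function names (`mmoe_forward`, `random_matrix`), the layer sizes, the ReLU experts, and the random weights are all assumptions chosen for the example. Each task (for instance CTR and CVR) gets its own softmax gate that mixes the outputs of experts shared across tasks.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product: weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mmoe_forward(x, expert_weights, gate_weights):
    """Return one mixed representation per task.

    expert_weights: list of E matrices, each (hidden x d), shared by all tasks.
    gate_weights:   list of T matrices, each (E x d), one gate per task.
    """
    # Every expert sees the same input; ReLU is an assumed activation.
    expert_outputs = [[max(0.0, h) for h in linear(w, x)] for w in expert_weights]
    hidden = len(expert_outputs[0])
    task_reprs = []
    for g in gate_weights:
        gate = softmax(linear(g, x))  # one mixing weight per expert
        mixed = [
            sum(gate[e] * expert_outputs[e][j] for e in range(len(expert_outputs)))
            for j in range(hidden)
        ]
        task_reprs.append(mixed)
    return task_reprs


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained weights for the demonstration only."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, hidden, n_experts, n_tasks = 4, 3, 2, 2
experts = [random_matrix(hidden, d, rng) for _ in range(n_experts)]
gates = [random_matrix(n_experts, d, rng) for _ in range(n_tasks)]
x = [0.5, -1.0, 0.25, 2.0]  # stand-in for an encoded user/behavior vector

ctr_repr, cvr_repr = mmoe_forward(x, experts, gates)
```

In a full model, each task representation would feed a task-specific tower that outputs the CTR or CVR prediction; those towers are omitted here to keep the gating step visible.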

Title: SSE-PT: Sequential recommendation via personalized transformer
Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412258
Year: 2020
Abstract: Temporal information is crucial for recommendation problems because user preferences are naturally dynamic in the real world. Recent advances in deep learning, especially the discovery of various attention mechanisms and newer architectures in addition to widely used RNN and CNN in natural language processing, have allowed for better use of the temporal ordering of items that each user has engaged with. In particular, the SASRec model, inspired by the popular Transformer model in natural language processing, has achieved state-of-the-art results. However, SASRec, just like the original Transformer model, is inherently an unpersonalized model and does not include personalized user embeddings. To overcome this limitation, we propose a Personalized Transformer (SSE-PT) model, outperforming SASRec by almost 5% in terms of NDCG@10 on 5 real-world datasets. Furthermore, after examining some random users’ engagement history, we find our model not only more interpretable but also able to focus on recent engagement patterns for each user. Moreover, our SSE-PT model with a slight modification, which we call SSE-PT++, can handle extremely long sequences and outperform SASRec in ranking results with comparable training speed, striking a balance between performance and speed requirements. Our novel application of the Stochastic Shared Embeddings (SSE) regularization is essential to the success of personalization. Code and data are open-sourced at https://github.com/wuliwei9278/SSE-PT.

Title: BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer
Link: https://dl.acm.org/doi/abs/10.1145/3357384.3357895
Year: 2019
Abstract: Modeling users' dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems.
Previous methods employ sequential neural networks to encode users' historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to limitations including: a) unidirectional architectures restrict the power of hidden representation in users' behavior sequences; b) they often assume a rigidly ordered sequence which is not always practical. To address these limitations, we propose a sequential recommendation model called BERT4Rec, which employs deep bidirectional self-attention to model user behavior sequences. To avoid information leakage and efficiently train the bidirectional model, we adopt the Cloze objective for sequential recommendation, predicting the random masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model to make recommendations by allowing each item in user historical behaviors to fuse information from both left and right sides. Extensive experiments on four benchmark datasets show that our model outperforms various state-of-the-art sequential models consistently.

Title: Knowledge-enhanced hierarchical graph transformer network for multi-behavior recommendation
Link: https://ojs.aaai.org/index.php/AAAI/article/view/16576
Year: 2021
Abstract: Accurate user and item embedding learning is crucial for modern recommender systems. However, most existing recommendation techniques have thus far focused on modeling users' preferences over a singular type of user-item interactions. Many practical recommendation scenarios involve multi-typed user interactive behaviors (e.g., page view, add-to-favorite and purchase), which presents unique challenges that cannot be handled by current recommendation solutions.
In particular: i) complex inter-dependencies across different types of user behaviors; ii) the incorporation of knowledge-aware item relations into the multi-behavior recommendation framework; iii) dynamic characteristics of multi-typed user-item interactions. To tackle these challenges, this work proposes a Knowledge-Enhanced Hierarchical Graph Transformer Network (KHGT), to investigate multi-typed interactive patterns between users and items in recommender systems. Specifically, KHGT is built upon a graph-structured neural architecture to i) capture type-specific behavior semantics; ii) explicitly discriminate which types of user-item interactions are more important in assisting the forecasting task on the target behavior. Additionally, we further integrate the multi-modal graph attention layer with a temporal encoding strategy, to empower the learned embeddings to be reflective of both dedicated multiplex user-item and item-item collaborative relations, as well as the underlying interaction dynamics. Extensive experiments conducted on three real-world datasets show that KHGT consistently outperforms many state-of-the-art recommendation methods across various evaluation settings. Our implementation is available at https://github.com/akaxlh/KHGT.

Title: Behavior sequence transformer for e-commerce recommendation in alibaba
Link: https://dl.acm.org/doi/abs/10.1145/3326937.3341261
Year: 2019
Abstract: Deep learning based methods have been widely used in industrial recommendation systems (RSs). Previous works adopt an Embedding&MLP paradigm: raw features are embedded into low-dimensional vectors, which are then fed into an MLP for final recommendations. However, most of these works just concatenate different features, ignoring the sequential nature of users' behaviors. In this paper, we propose to use the powerful Transformer model to capture the sequential signals underlying users' behavior sequences for recommendation in Alibaba.
Experimental results demonstrate the superiority of the proposed model, which is then deployed online at Taobao and obtains significant improvements in online Click-Through Rate (CTR) compared to two baselines.

Title: Transformers4rec: Bridging the sequential/session-based recommendation gap between nlp and
Link: https://dl.acm.org/doi/abs/10.1145/3460231.3474255
Year: 2021
Abstract: Much of the recent progress in sequential and session-based recommendation has been driven by improvements in model architecture and pretraining techniques originating in the field of Natural Language Processing. Transformer architectures in particular have facilitated building higher-capacity models and provided data augmentation and training techniques which demonstrably improve the effectiveness of sequential recommendation. But with a thousandfold more research going on in NLP, the application of transformers for recommendation understandably lags behind. To remedy this we introduce Transformers4Rec, an open-source library built upon HuggingFace’s Transformers library with a similar goal of opening up the advances of NLP based Transformers to the recommender system community and making these advancements immediately accessible for the tasks of sequential and session-based recommendation. Like its core dependency, Transformers4Rec is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. In order to demonstrate the usefulness of the library and the applicability of Transformer architectures in next-click prediction for user sessions, where sequence lengths are much shorter than those commonly found in NLP, we have leveraged Transformers4Rec to win two recent session-based recommendation competitions. In addition, we present in this paper the first comprehensive empirical analysis comparing many Transformer architectures and training approaches for the task of session-based recommendation.
We demonstrate that the best Transformer architectures have superior performance across two e-commerce datasets while performing similarly to the baselines on two news datasets. We further evaluate in isolation the effectiveness of the different training techniques used in causal language modeling, masked language modeling, permutation language modeling and replacement token detection for a single Transformer architecture, XLNet. We establish that training XLNet with replacement token detection performs well across all datasets. Finally, we explore techniques to include side information such as item and user context features in order to establish best practices and show that the inclusion of side information uniformly improves recommendation performance. Transformers4Rec library is available at https://github.com/NVIDIAMerlin/Transformers4Rec/

Title: Multiplex behavioral relation learning for recommendation via memory augmented transformer network
Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401445
Year: 2020
Abstract: Capturing users' precise preferences is of great importance in various recommender systems (e.g., e-commerce platforms and online advertising sites), which is the basis of how to present personalized interesting product lists to individual users. Although significant progress has been made in considering relations between users and items, most existing recommendation techniques solely focus on a singular type of user-item interactions. However, user-item interactive behavior is often multi-typed (e.g., page view, add-to-favorite and purchase) and inter-dependent in nature. Overlooking multiplex behavior relations makes it hard to recognize the multi-modal contextual signals across different types of interactions, which limits the feasibility of current recommendation methods.
To tackle the above challenge, this work proposes a Memory-Augmented Transformer Network (MATN), to enable recommendation with multiplex behavioral relational information, and joint modeling of type-specific behavioral context and type-wise behavior inter-dependencies, in a fully automatic manner. In our MATN framework, we first develop a transformer-based multi-behavior relation encoder, to make the learned interaction representations reflective of the cross-type behavior relations. Furthermore, a memory attention network is proposed to supercharge MATN in capturing the contextual signals of different types of behavior into the category-specific latent embedding space. Finally, a cross-behavior aggregation component is introduced to promote the comprehensive collaboration across type-aware interaction behavior representations, and discriminate their inherent contributions in assisting recommendations. Extensive experiments on two benchmark datasets and a real-world e-commerce user behavior dataset demonstrate significant improvements obtained by MATN over baselines. Codes are available at: https://github.com/akaxlh/MATN.

Title: Recommender systems in the era of large language models (llms)
Link: https://arxiv.org/abs/2307.02046
Year: 2023
Abstract: With the prosperity of e-commerce and web applications, Recommender Systems (RecSys) have become an important component of our daily life, providing personalized suggestions that cater to user preferences. While Deep Neural Networks (DNNs) have made significant advancements in enhancing recommender systems by modeling user-item interactions and incorporating textual side information, DNN-based methods still face limitations, such as difficulties in understanding users' interests and capturing textual side information, inabilities in generalizing to various recommendation scenarios and reasoning on their predictions, etc.
Meanwhile, the emergence of Large Language Models (LLMs), such as ChatGPT and GPT4, has revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI), due to their remarkable abilities in fundamental responsibilities of language understanding and generation, as well as impressive generalization and reasoning capabilities. As a result, recent studies have attempted to harness the power of LLMs to enhance recommender systems. Given the rapid evolution of this research direction in recommender systems, there is a pressing need for a systematic overview that summarizes existing LLM-empowered recommender systems, to provide researchers in relevant fields with an in-depth understanding. Therefore, in this paper, we conduct a comprehensive review of LLM-empowered recommender systems from various aspects including Pre-training, Fine-tuning, and Prompting. More specifically, we first introduce representative methods to harness the power of LLMs (as a feature encoder) for learning representations of users and items. Then, we review recent techniques of LLMs for enhancing recommender systems from three paradigms, namely pre-training, fine-tuning, and prompting. Finally, we comprehensively discuss future directions in this emerging field.

Title: Design of distribution transformer health management system using IoT sensors
Link: https://www.researchgate.net/profile/Rajesh-SharmaRajendran/publication/354650561_Design_of_Distribution_Transformer_Health_Management_System_using_IoT_Sensors/links/63a9f7f103aad5368e41d731/Design-of-Distribution-Transformer-Health-Management-System-using-IoTSensors.pdf
Year: 2021
Abstract: Transformers are one of the primary devices required for an AC (Alternating Current) distribution system, and work on the principle of mutual induction without any rotating parts. Two types of transformers are utilized in distribution systems, namely step up transformers and step down transformers.
The step up transformers need to be placed at regular distances to reduce the line losses occurring over the electrical transmission systems. Similarly, the step down transformers are placed near the destinations to regulate the electric power for commercial usage. Regular check-ups are a must for a distribution transformer to increase its operational lifetime. The proposed work is designed to regularize such health check-ups using IoT sensors for making a centralized remote monitoring system.

Title: Continuous-time sequential recommendation with temporal graph collaborative transformer
Link: https://dl.acm.org/doi/abs/10.1145/3459637.3482242
Year: 2021
Abstract: In order to model the evolution of user preference, we should learn user/item embeddings based on time-ordered item purchasing sequences, which is defined as the Sequential Recommendation (SR) problem. Existing methods leverage sequential patterns to model item transitions. However, most of them ignore crucial temporal collaborative signals, which are latent in evolving user-item interactions and coexist with sequential patterns. Therefore, we propose to unify sequential patterns and temporal collaborative signals to improve the quality of recommendation, which is rather challenging. Firstly, it is hard to simultaneously encode sequential patterns and collaborative signals. Secondly, it is non-trivial to express the temporal effects of collaborative signals. Hence, we design a new framework Temporal Graph Sequential Recommender (TGSRec) upon our defined continuous-time bipartite graph. We propose a novel Temporal Collaborative Transformer (TCT) layer in TGSRec, which advances the self-attention mechanism by adopting a novel collaborative attention. The TCT layer can simultaneously capture collaborative signals from both users and items, as well as consider temporal dynamics inside sequential patterns.
We propagate the information learned from the TCT layer over the temporal graph to unify sequential patterns and temporal collaborative signals. Empirical results on five datasets show that TGSRec significantly outperforms other baselines, on average up to 22.5% and 22.1% absolute improvements in Recall@10 and MRR, respectively.

Title: Cd-HRNN: Content-Driven Recommendation System HRNN to Improve Session-Based
Link: https://ieeexplore.ieee.org/abstract/document/10191438/
Year: 2023
Abstract: The increasing popularity of digital entertainment systems has made personalization a key factor for success in the industry. Recommendation systems, particularly for videos and movies, are crucial in this regard. However, many existing systems are implicit feedback recommendation systems that use indirect signals to infer user preferences, such as user actions (e.g. clicks, views, purchases) or interactions with items (e.g. listening to a song, watching a movie). The challenge lies in the limited information and uncertainty present in user behavior, making it difficult to predict their interests and preferences. In previous research, Recurrent Neural Networks (RNNs) have been shown to be efficient in predicting the next item in a session, based on past item click sequences, but their effectiveness is limited when only relying on click sequences as input data. In this paper, we extend the Hierarchical RNN architecture (HRNN) for generating recommendations by combining session clicks and item content information, such as item ids and item description respectively. The Bidirectional Encoder Representations from Transformers (BERT) architecture is applied for generating feature vectors from text descriptions of the items. Our model has been extensively tested on the benchmark dataset MovieLens 1M and has demonstrated superiority over state-of-the-art (SOTA) session-based recommendation systems (SBRS) models.
Experimental results establish the efficacy of using content information along with item ids for recommendation.

Title: M6-rec: Generative pretrained language models are open-ended recommender systems
Link: https://arxiv.org/abs/2205.08084
Year: 2022
Abstract: Industrial recommender systems have been growing increasingly complex, may involve diverse domains such as e-commerce products and user-generated contents, and can comprise a myriad of tasks such as retrieval, ranking, explanation generation, and even AI-assisted content production. The mainstream approach so far is to develop individual algorithms for each domain and each task. In this paper, we explore the possibility of developing a unified foundation model to support open-ended domains and tasks in an industrial recommender system, which may reduce the demand on downstream settings' data and can minimize the carbon footprint by avoiding training a separate model from scratch for every task. Deriving a unified foundation is challenging due to (i) the potentially unlimited set of downstream domains and tasks, and (ii) the real-world systems' emphasis on computational efficiency. We thus build our foundation upon M6, an existing large-scale industrial pretrained language model similar to GPT-3 and T5, and leverage M6's pretrained ability for sample-efficient downstream adaptation, by representing user behavior data as plain texts and converting the tasks to either language understanding or generation. To deal with a tight hardware budget, we propose an improved version of prompt tuning that outperforms fine-tuning with negligible 1% task-specific parameters, and employ techniques such as late interaction, early exiting, parameter sharing, and pruning to further reduce the inference time and the model size.
We demonstrate the foundation model's versatility on a wide range of tasks such as retrieval, ranking, zero-shot recommendation, explanation generation, personalized content creation, and conversational recommendation, and manage to deploy it on both cloud servers and mobile devices.

Title: Personalized transformer for explainable recommendation
Link: https://arxiv.org/abs/2105.11601
Year: 2021
Abstract: Personalization of natural language generation plays a vital role in a large spectrum of tasks, such as explainable recommendation, review summarization and dialog systems. In these tasks, user and item IDs are important identifiers for personalization. Transformer, which is demonstrated with strong language modeling capability, however, is not personalized and fails to make use of the user and item IDs since the ID tokens are not even in the same semantic space as the words. To address this problem, we present a PErsonalized Transformer for Explainable Recommendation (PETER), on which we design a simple and effective learning objective that utilizes the IDs to predict the words in the target explanation, so as to endow the IDs with linguistic meanings and to achieve personalized Transformer. Besides generating explanations, PETER can also make recommendations, which makes it a unified model for the whole recommendation-explanation pipeline. Extensive experiments show that our small unpretrained model outperforms fine-tuned BERT on the generation task, in terms of both effectiveness and efficiency, which highlights the importance and the nice utility of our design.

Title: Towards knowledge-based recommender dialog system
Link: https://arxiv.org/abs/1908.05391
Year: 2019
Abstract: In this paper, we propose a novel end-to-end framework called KBRD, which stands for Knowledge-Based Recommender Dialog System. It integrates the recommender system and the dialog generation system.
The dialog system can enhance the performance of the recommendation system by introducing knowledge-grounded information about users' preferences, and the recommender system can improve that of the dialog generation system by providing recommendation-aware vocabulary bias. Experimental results demonstrate that our proposed model has significant advantages over the baselines in both the evaluation of dialog generation and recommendation. A series of analyses show that the two systems can bring mutual benefits to each other, and the introduced knowledge contributes to both their performances.

Title: Hybrid transformer knowledge graph completion with multi-level fusion for multimodal
Link: https://dl.acm.org/doi/abs/10.1145/3477495.3531992
Year: 2022
Abstract: Multimodal Knowledge Graphs (MKGs), which organize visual-text factual knowledge, have recently been successfully applied to tasks such as information retrieval, question answering, and recommendation systems. Since most MKGs are far from complete, extensive knowledge graph completion studies have been proposed focusing on the multimodal entity, relation extraction and link prediction. However, different tasks and modalities require changes to the model architecture, and not all images/objects are relevant to text input, which hinders the applicability to diverse real-world scenarios. In this paper, we propose a hybrid transformer with multi-level fusion to address those issues. Specifically, we leverage a hybrid transformer architecture with unified input-output for diverse multimodal knowledge graph completion tasks. Moreover, we propose multi-level fusion, which integrates visual and text representation via coarse-grained prefix-guided interaction and fine-grained correlation-aware fusion modules. We conduct extensive experiments to validate that our MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER. Code: https://github.com/zjunlp/MKGformer.

Title: Fast multi-resolution transformer fine-tuning for extreme multilabel text classification
Link: https://proceedings.neurips.cc/paper_files/paper/2021/hash/3bbca1d243b01b47c2bf42b29a8b265c-Abstract.html
Year: 2021
Abstract: Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging and semantic search. Recently, transformer based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC methods. Despite leveraging pre-trained transformer models for text representation, the fine-tuning procedure of transformer models on large label space still has lengthy computational time even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer, to accelerate the procedure through recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XR-Transformer takes significantly less training time compared to other transformer-based XMC models while yielding better state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves the Precision@1 from 51% to 54%.

Title: Augmenting sequential recommendation with pseudo-prior items via reversely pre-training transformer
Link: https://dl.acm.org/doi/abs/10.1145/3404835.3463036
Year: 2021
Abstract: Sequential Recommendation characterizes the evolving patterns by modeling item sequences chronologically. The essential target of it is to capture the item transition correlations. The recent developments of transformer inspire the community to design effective sequence encoders, e.g., SASRec and BERT4Rec.
However, we observe that these transformer-based models suffer from the cold-start issue, i.e., performing poorly for short sequences. Therefore, we propose to augment short sequences while still preserving original sequential correlations. We introduce a new framework for Augmenting Sequential Recommendation with Pseudo-prior items (ASReP). We first pre-train a transformer with sequences in a reverse direction to predict prior items. Then, we use this transformer to generate fabricated historical items at the beginning of short sequences. Finally, we fine-tune the transformer using these augmented sequences in time order to predict the next item. Experiments on two real-world datasets verify the effectiveness of ASReP. The code is available at https://github.com/DyGRec/ASReP.

Title: CIRS: Bursting recommender system filter bubbles by counterfactual interactive
Link: https://dl.acm.org/doi/abs/10.1145/3594871
Year: 2023
Abstract: While personalization increases the utility of recommender systems, it also brings the issue of filter bubbles. For example, if the system keeps exposing and recommending the items that the user is interested in, it may also make the user feel bored and less satisfied. Existing work studies filter bubbles in static recommendation, where the effect of overexposure is hard to capture. In contrast, we believe it is more meaningful to study the issue in interactive recommendation and optimize long-term user satisfaction. Nevertheless, it is unrealistic to train the model online due to the high cost. As such, we have to leverage offline training data and disentangle the causal effect on user satisfaction. To achieve this goal, we propose a counterfactual interactive recommender system (CIRS) that augments offline reinforcement learning (offline RL) with causal inference. The basic idea is to first learn a causal user model on historical data to capture the overexposure effect of items on user satisfaction.
It then uses the learned causal user model to help the planning of the RL policy. To conduct evaluation offline, we innovatively create an authentic RL environment (KuaiEnv) based on a real-world fully observed user rating dataset. The experiments show the effectiveness of CIRS in bursting filter bubbles and achieving long-term success in interactive recommendation. The implementation of CIRS is available via https://github.com/chongminggao/CIRS-codes.

Title: Self-supervised learning for recommender systems: A survey
Link: https://ieeexplore.ieee.org/abstract/document/10144391/
Year: 2023
Abstract: In recent years, neural architecture-based recommender systems have achieved tremendous success, but they still fall short of expectation when dealing with highly sparse data. Self-supervised learning (SSL), as an emerging technique for learning from unlabeled data, has attracted considerable attention as a potential solution to this issue. This survey paper presents a systematic and timely review of research efforts on self-supervised recommendation (SSR). Specifically, we propose an exclusive definition of SSR, on top of which we develop a comprehensive taxonomy to divide existing SSR methods into four categories: contrastive, generative, predictive, and hybrid. For each category, we elucidate its concept and formulation, the involved methods, as well as its pros and cons. Furthermore, to facilitate empirical comparison, we release an open-source library SELFRec (https://github.com/Coder-Yu/SELFRec), which incorporates a wide range of SSR models and benchmark datasets. Through rigorous experiments using this library, we derive and report some significant findings regarding the selection of self-supervised signals for enhancing recommendation. Finally, we shed light on the limitations in the current research and outline the future research directions.

Title: Improving Arabic text categorization using transformer training diversification
Link: https://aclanthology.org/2020.wanlp-1.21/
Year: 2020
Abstract: Automatic categorization of short texts, such as news headlines and social media posts, has many applications ranging from content analysis to recommendation systems. In this paper, we use such text categorization i.e., labeling the social media posts to categories like ‘sports’, ‘politics’, ‘human-rights’ among others, to showcase the efficacy of models across different sources and varieties of Arabic. In doing so, we show that diversifying the training data, whether by using diverse training data for the specific task (an increase of 21% macro F1) or using diverse data to pre-train a BERT model (26% macro F1), leads to overall improvements in classification effectiveness. In our work, we also introduce two new Arabic text categorization datasets, where the first is composed of social media posts from a popular Arabic news channel that cover Twitter, Facebook, and YouTube, and the second is composed of tweets from popular Arabic accounts. The posts in the former are nearly exclusively authored in modern standard Arabic (MSA), while the tweets in the latter contain both MSA and dialectal Arabic.

Title: Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning
Link: https://arxiv.org/abs/2307.03591
Year: 2023
Abstract: Multimodal knowledge graphs (MKGs), which intuitively organize information in various modalities, can benefit multiple practical downstream tasks, such as recommendation systems, and visual question answering. However, most MKGs are still far from complete, which motivates the flourishing of MKG reasoning models. Recently, with the development of general artificial architectures, the pretrained transformer models have drawn increasing attention, especially for multimodal scenarios.
However, the research of multimodal pretrained transformer (MPT) for knowledge graph reasoning (KGR) is still at an early stage. As the biggest difference between MKG and other multimodal data, the rich structural information underlying the MKG still cannot be fully leveraged in existing MPT models. Most of them only utilize the graph structure as a retrieval map for matching images and texts connected with the same entity. This manner hinders their reasoning performances. To this end, we propose the graph Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT. Specifically, the graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two different strategies, i.e., weighted summation and alignment constraint, is first designed to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR, which mines the structural information underlying the knowledge graph. Extensive experiments on FB15k-237-IMG and WN18-IMG, demonstrate that our SGMPT outperforms existing state-of-the-art models, and prove the effectiveness of the designed strategies. Title: [HTML][HTML] A dynamic graph representation learning based on temporal graph transformer Link: https://www.sciencedirect.com/science/article/pii/S1110016822005336 Year: 2023 Abstract: The graph neural network has received significant attention in recent years because of its unique role in mining graph-structure data and its ubiquitous application in various fields, such as social networking and recommendation systems. Although most work focuses on learning low-dimensional node representation in static graphs, the dynamic nature of real-world networks makes temporal graphs more practical and significant. 
In this paper, we propose a dynamic graph representation learning method based on a temporal graph transformer (TGT), which can efficiently preserve high-order information and temporally evolve structural properties by incorporating an update module, an aggregation module, and a propagation module in a single model. The experimental results on three real-world networks demonstrate that the TGT outperforms state-of-the-art baselines for dynamic link prediction and edge classification tasks in terms of both accuracy and efficiency. Title: Personalized re-ranking for recommendation Link: https://dl.acm.org/doi/abs/10.1145/3298689.3347000 Year: 2019 Abstract: Ranking is a core task in recommender systems, which aims at providing an ordered list of items to users. Typically, a ranking function is learned from the labeled dataset to optimize the global performance, which produces a ranking score for each individual item. However, it may be sub-optimal because the scoring function applies to each item individually and does not explicitly consider the mutual influence between items, as well as the differences of users' preferences or intents. Therefore, we propose a personalized re-ranking model for recommender systems. The proposed re-ranking model can be easily deployed as a followup modular after any ranking algorithm, by directly using the existing ranking feature vectors. It directly optimizes the whole recommendation list by employing a transformer structure to efficiently encode the information of all items in the list. Specifically, the Transformer applies a self-attention mechanism that directly models the global relationships between any pair of items in the whole list. We confirm that the performance can be further improved by introducing pre-trained embedding to learn personalized encoding functions for different users. 
Experimental results on both offline benchmarks and real-world online ecommerce systems demonstrate the significant improvements of the proposed re-ranking model. Title: An ensemble-based hotel recommender system using sentiment analysis and aspect categorization of hotel reviews Link: https://www.sciencedirect.com/science/article/pii/S1568494620308735 Year: 2021 Abstract: Finding a suitable hotel based on user’s need and affordability is a complex decision-making process. Nowadays, the availability of an ample amount of online reviews made by the customers helps us in this regard. This very fact gives us a promising research direction in the field of tourism called hotel recommendation system which also helps in improving the information processing of consumers. Real-world reviews may showcase different sentiments of the customers towards a hotel and each review can be categorized based on different aspects such as cleanliness, value, service, etc. Keeping these facts in mind, in the present work, we have proposed a hotel recommendation system using Sentiment Analysis of the hotel reviews, and aspect-based review categorization which works on the queries given by a user. Furthermore, we have provided a new rich and diverse dataset of online hotel reviews crawled from Tripadvisor.com. We have followed a systematic approach which first uses an ensemble of a binary classification called Bidirectional Encoder Representations from Transformers (BERT) model with three phases for positive–negative, neutral–negative, neutral–positive sentiments merged using a weight assigning protocol. We have then fed these pre-trained word embeddings generated by the BERT models along with other different textual features such as word vectors generated by Word2vec, TF–IDF of frequent words, subjectivity score, etc. to a Random Forest classifier. After that, we have also grouped the reviews into different categories using an approach that involves fuzzy logic and cosine similarity.
Finally, we have created a recommender system by the aforementioned frameworks. Our model has achieved a Macro F1-score of 84% and test accuracy of 92.36% in the classification of sentiment polarities. Also, the results of the categorized reviews have formed compact clusters. The results are quite promising and much better compared to state-of-the-art models. Title: Pre-training graph transformer with multimodal side information for recommendation Link: https://dl.acm.org/doi/abs/10.1145/3474085.3475709 Year: 2021 Abstract: Side information of items, e.g., images and text description, has shown to be effective in contributing to accurate recommendations. Inspired by the recent success of pre-training models on natural language and images, we propose a pre-training strategy to learn item representations by considering both item side information and their relationships. We relate items by common user activities, e.g., copurchase, and construct a homogeneous item graph. This graph provides a unified view of item relations and their associated side information in multimodality. We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item. The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction. Experimental results on real datasets demonstrate that the proposed PMGT model effectively exploits the multimodality side information to achieve better accuracies in downstream tasks including item recommendation and click-through ratio prediction. In addition, we also report a case study of testing PMGT in an online setting with 600 thousand users. 
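The second pre-training objective mentioned in the PMGT abstract above, masked node feature reconstruction, can be pictured with the small sketch below. It only illustrates the general idea (hide the features of some nodes, then score a reconstruction on the hidden positions); it is not the PMGT implementation. The helper names, the zero-vector mask, the mask ratio, and the mean-squared-error scoring are assumptions, and the "prediction" passed in stands in for the output of whatever model is being pre-trained.

```python
import random


def mask_features(features, mask_ratio, rng):
    """Replace the feature vectors of a random subset of nodes with zeros.

    Returns the corrupted copy and the sorted indices of the masked nodes.
    The original list is left untouched. (Hypothetical helper.)
    """
    n = len(features)
    k = max(1, int(n * mask_ratio))
    masked_idx = sorted(rng.sample(range(n), k))
    corrupted = [list(f) for f in features]
    for i in masked_idx:
        corrupted[i] = [0.0] * len(features[i])
    return corrupted, masked_idx


def reconstruction_loss(predicted, original, masked_idx):
    """Mean squared error computed only over the masked nodes."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, o in zip(predicted[i], original[i]):
            total += (p - o) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    item_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    corrupted, idx = mask_features(item_features, 0.5, random.Random(0))
    # A perfect reconstruction scores 0; returning the zeroed input does not.
    print(reconstruction_loss(item_features, item_features, idx))
    print(reconstruction_loss(corrupted, item_features, idx) > 0)
```

Restricting the loss to masked positions means the model is rewarded only for inferring hidden features from context, not for copying visible ones.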
Title: Knowledge aware emotion recognition in textual conversations via multi-task incremental transformer Link: https://aclanthology.org/2020.coling-main.392/ Year: 2020 Abstract: Emotion recognition in textual conversations (ERTC) plays an important role in a wide range of applications, such as opinion mining, recommender systems, and so on. ERTC, however, is a challenging task. For one thing, speakers often rely on the context and commonsense knowledge to express emotions; for another, most utterances contain neutral emotion in conversations, as a result, the confusion between a few non-neutral utterances and much more neutral ones restrains the emotion recognition performance. In this paper, we propose a novel Knowledge Aware Incremental Transformer with Multi-task Learning (KAITML) to address these challenges. Firstly, we devise a dual-level graph attention mechanism to leverage commonsense knowledge, which augments the semantic information of the utterance. Then we apply the Incremental Transformer to encode multi-turn contextual utterances. Moreover, we are the first to introduce multi-task learning to alleviate the aforementioned confusion and thus further improve the emotion recognition performance. Extensive experimental results show that our KAITML model outperforms the state-of-the-art models across five benchmark datasets. Title: A survey on knowledge graph-based recommender systems Link: https://ieeexplore.ieee.org/abstract/document/9390863/ Year: 2021 Abstract: To solve the cognitive overload problem and information explosion, recommender systems have been used to model user interest. Although recommender systems have been developed for decades, there still exist many problems such as cold start and data sparsity. Thus, the knowledge graph is introduced into the recommendation domain to alleviate these problems.
We collect papers related to the knowledge graph-based recommender systems in recent years to summarize their fundamental knowledge and main ideas, including the usage of the knowledge graph in the recommender systems and user interest models. Finally, we propose several future directions aiming to make some progress. Title: Improving conversational recommender systems via knowledge graph based semantic fusion Link: https://dl.acm.org/doi/abs/10.1145/3394486.3403143 Year: 2020 Abstract: Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. Although several efforts have been made for CRS, two major issues still remain to be solved. First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preference. Second, there is a semantic gap between natural language expression and item-level user preference. To address these issues, we incorporate both word-oriented and entity-oriented knowledge graphs (KG) to enhance the data representations in CRSs, and adopt Mutual Information Maximization to align the word-level and entity-level semantic spaces. Based on the aligned semantic representations, we further develop a KG-enhanced recommender component for making accurate recommendations, and a KG-enhanced dialog component that can generate informative keywords or entities in the response text. Extensive experiments have demonstrated the effectiveness of our approach in yielding better performance on both recommendation and conversation tasks. Title: Leveraging historical interaction data for improving conversational recommender system Link: https://dl.acm.org/doi/abs/10.1145/3340531.3412098 Year: 2020 Abstract: Recently, conversational recommender system (CRS) has become an emerging and practical research topic. Most of the existing CRS methods focus on learning effective preference representations for users from conversation data alone.
In contrast, we take a new perspective to leverage historical interaction data for improving CRS. For this purpose, we propose a novel pre-training approach to integrating both item-based preference sequence (from historical interaction data) and attribute-based preference sequence (from conversation data) via pre-training methods. We carefully design two pre-training tasks to enhance information fusion between item- and attribute-based preference. To improve the learning performance, we further develop an effective negative sample generator which can produce high-quality negative samples. Experiment results on two real-world datasets have demonstrated the effectiveness of our approach for improving CRS. Title: Specter: Document-level representation learning using citation-informed transformers Link: https://arxiv.org/abs/2004.07180 Year: 2020 Abstract: Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, the embeddings power strong performance on end tasks. We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation.
We show that SPECTER outperforms a variety of competitive baselines on the benchmark. Title: [PDF][PDF] Adversarial oracular seq2seq learning for sequential recommendation Link: https://www.ijcai.org/Proceedings/2020/0264.pdf Year: 2021 Abstract: Title: Uprec: User-aware pre-training for recommender systems Link: https://arxiv.org/abs/2102.10989 Year: 2021 Abstract: Existing sequential recommendation methods rely on large amounts of training data and usually suffer from the data sparsity problem. To tackle this, the pre-training mechanism has been widely adopted, which attempts to leverage large-scale data to perform self-supervised learning and transfer the pre-trained parameters to downstream tasks. However, previous pre-trained models for recommendation focus on leveraging universal sequence patterns from user behaviour sequences and item information, while ignoring personalized interests captured by heterogeneous user information, which has been shown effective in contributing to personalized recommendation. In this paper, we propose a method to enhance pre-trained models with heterogeneous user information, called User-aware Pre-training for Recommendation (UPRec). Specifically, UPRec leverages the user attributes and structured social graphs to construct self-supervised objectives in the pre-training stage and proposes two user-aware pre-training tasks. Comprehensive experimental results on several real-world large-scale recommendation datasets demonstrate that UPRec can effectively integrate user information into pre-trained models and thus provide more appropriate recommendations for users. Title: Towards topic-guided conversational recommender system Link: https://arxiv.org/abs/2010.04125 Year: 2020 Abstract: Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. To develop an effective CRS, the support of high-quality datasets is essential.
Existing CRS datasets mainly focus on immediate requests from users, while lacking proactive guidance to the recommendation scenario. In this paper, we contribute a new CRS dataset named TG-ReDial (Recommendation through Topic-Guided Dialog). Our dataset has two major features. First, it incorporates topic threads to enforce natural semantic transitions towards the recommendation scenario. Second, it is created in a semi-automatic way, hence human annotation is more reasonable and controllable. Based on TG-ReDial, we present the task of topic-guided conversational recommendation, and propose an effective approach to this task. Extensive experiments have demonstrated the effectiveness of our approach on three sub-tasks, namely topic prediction, item recommendation and response generation. TG-ReDial is available at this https URL. Title: What does bert know about books, movies and music? probing bert for conversational recommendation Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412249 Year: 2020 Abstract: Heavily pre-trained transformer models such as BERT have recently shown to be remarkably powerful at language modelling, achieving impressive results on numerous downstream tasks. It has also been shown that they implicitly store factual knowledge in their parameters after pre-training. Understanding what the pre-training procedure of LMs actually learns is a crucial step for using and improving them for Conversational Recommender Systems (CRS). We first study how much off-the-shelf pre-trained BERT “knows” about recommendation items such as books, movies and music. In order to analyze the knowledge stored in BERT’s parameters, we use different probes (i.e., tasks to examine a trained model regarding certain properties) that require different types of knowledge to solve, namely content-based and collaborative-based.
Content-based knowledge is knowledge that requires the model to match the titles of items with their content information, such as textual descriptions and genres. In contrast, collaborative-based knowledge requires the model to match items with similar ones, according to community interactions such as ratings. We resort to BERT’s Masked Language Modelling (MLM) head to probe its knowledge about the genre of items, with cloze style prompts. In addition, we employ BERT’s Next Sentence Prediction (NSP) head and representations’ similarity (SIM) to compare relevant and non-relevant search and recommendation query-document inputs to explore whether BERT can, without any fine-tuning, rank relevant items first. Finally, we study how BERT performs in a conversational recommendation downstream task. To this end, we fine-tune BERT to act as a retrieval-based CRS. Overall, our experiments show that: (i) BERT has knowledge stored in its parameters about the content of books, movies and music; (ii) it has more content-based knowledge than collaborative-based knowledge; and (iii) fails on conversational recommendation when faced with adversarial data. Title: Graph neural network for tag ranking in tag-enhanced video recommendation Link: https://dl.acm.org/doi/abs/10.1145/3340531.3416021 Year: 2020 Abstract: In tag-enhanced video recommendation systems, videos are attached with some tags that highlight the contents of videos from different aspects. Tag ranking in such recommendation systems provides personalized tag lists for videos from their tag candidates. A better tag ranking model could attract users to click more tags, enter their corresponding tag channels, and watch more tag-specific videos, which improves both tag click rate and video watching time. However, most conventional tag ranking models merely concentrate on tag-video relevance or tag-related behaviors, ignoring the rich information in video-related behaviors.
We should consider user preferences on both tags and videos. In this paper, we propose a novel Graph neural network based tag ranking (GraphTR) framework on a huge heterogeneous network with video, tag, user and media. We design a novel graph neural network that combines multifield transformer, GraphSAGE and neural FM layers in node aggregation. We also propose a neighbor-similarity based loss to encode various user preferences into heterogeneous node representations. In experiments, we conduct both offline and online evaluations on a real-world video recommendation system in WeChat Top Stories. The significant improvements in both video and tag related metrics confirm the effectiveness and robustness in real-world tag-enhanced video recommendation. Currently, GraphTR has been deployed on WeChat Top Stories for more than six months. The source codes are in https://github.com/lqfarmer/GraphTR. Title: Molecular graph enhanced transformer for retrosynthesis prediction Link: https://www.sciencedirect.com/science/article/pii/S0925231221009413 Year: 2021 Abstract: With massive possible synthetic routes in chemistry, retrosynthesis prediction is still a challenge for researchers. Recently, retrosynthesis prediction is formulated as a Machine Translation (MT) task. Namely, since each molecule can be represented as a Simplified Molecular-Input Line-Entry System (SMILES) string, the process of retrosynthesis is analogized to a process of language translation from the product to reactants. However, the MT models that are applied to SMILES data usually ignore the information of natural atomic connections and the topology of molecules. To make more chemically plausible constraints on the atom representation learning for better performance, in this paper, we propose a Graph Enhanced Transformer (GET) framework, which adopts both the sequential and graphical information of molecules.
Four different GET designs are proposed, which fuse the SMILES representations with atom embeddings learned from our improved Graph Neural Network (GNN). Empirical results show that our model significantly outperforms the vanilla Transformer model in test accuracy. Title: [HTML][HTML] News recommender system: a review of recent progress, challenges, and opportunities Link: https://link.springer.com/article/10.1007/s10462-021-10043-x Year: 2022 Abstract: Nowadays, more and more news readers read news online where they have access to millions of news articles from multiple sources. In order to help users find the right and relevant content, news recommender systems (NRS) are developed to relieve the information overload problem and suggest news items that might be of interest for the news readers. In this paper, we highlight the major challenges faced by the NRS and identify the possible solutions from the state-of-the-art. Our discussion is divided into two parts. In the first part, we present an overview of the recommendation solutions, datasets, evaluation criteria beyond accuracy and recommendation platforms being used in the NRS. We also talk about two popular classes of models that have been successfully used in recent years. In the second part, we focus on the deep neural networks as solutions to build the NRS. Different from previous surveys, we study the effects of news recommendations on user behaviors and try to suggest possible remedies to mitigate those effects. By providing the state-of-the-art knowledge, this survey can help researchers and professional practitioners have a better understanding of the recent developments in news recommendation algorithms. In addition, this survey sheds light on the potential new directions.
Title: Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems Link: https://proceedings.mlsys.org/paper_files/paper/2022/hash/1eb34d662b67a14e3511d0dfd78669be-Abstract.html Year: 2022 Abstract: Deep learning for recommendation data is one of the most pervasive and challenging AI workload in recent times. State-of-the-art recommendation models are one of the largest models matching the likes of GPT-3 and Switch Transformer. Challenges in deep learning recommendation models (DLRM) stem from learning dense embeddings for each of the categorical tokens. These embedding tables in industrial scale models can be as large as hundreds of terabytes. Such large models lead to a plethora of engineering challenges, not to mention prohibitive communication overheads, and slower training and inference times. Of these, slower inference time directly impacts user experience. Model compression for DLRM is gaining traction and the community has recently shown impressive compression results. In this paper, we present Random Offset Block Embedding Array (ROBE) as a low memory alternative to embedding tables which provide orders of magnitude reduction in memory usage while maintaining accuracy and boosting execution speed. ROBE is a simple fundamental approach in improving both cache performance and the variance of randomized hashing, which could be of independent interest in itself. We demonstrate that we can successfully train DLRM models with same accuracy while using 1000× less memory. A 1000× compressed model directly results in faster inference without any engineering effort. In particular, we show that we can train DLRM model using ROBE array of size 100MB on a single GPU to achieve AUC of 0.8025 or higher as required by official MLPerf CriteoTB benchmark DLRM model of 100GB while achieving about 3.1× (209%) improvement in inference throughput.
Title: Guided transformer: Leveraging multiple external sources for representation learning in conversational search Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401061 Year: 2020 Abstract: Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation. We evaluate this Guided Transformer model in a conversational search scenario that includes clarifying questions. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search; document retrieval and next clarifying question selection. Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines. Title: Self-supervised reinforcement learning for recommender systems Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401147 Year: 2020 Abstract: In session-based or sequential recommendation, it is important to consider a number of factors like long-term user engagement, multiple types of user-item interactions such as clicks, purchases etc. The current state-of-the-art supervised approaches fail to model them appropriately. Casting sequential recommendation task as a reinforcement learning (RL) problem is a promising direction.
A major component of RL approaches is to train the agent through interactions with the environment. However, it is often problematic to train a recommender in an on-line fashion due to the requirement to expose users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer to drive the supervised layer focusing on specific rewards (e.g., recommending items which may lead to purchases rather than clicks) while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on such an approach, we propose two frameworks namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach. Title: MEANTIME: Mixture of attention mechanisms with multi-temporal embeddings for sequential recommendation Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412216 Year: 2020 Abstract: Recently, self-attention based models have achieved state-of-the-art performance in sequential recommendation task. Following the custom from language processing, most of these models rely on a simple positional embedding to exploit the sequential nature of the user’s history. However, there are some limitations regarding the current approaches. First, sequential recommendation is different from language processing in that timestamp information is available. Previous models have not made good use of it to extract additional contextual information.
Second, using a simple embedding scheme can lead to information bottleneck since the same embedding has to represent all possible contextual biases. Third, since previous models use the same positional embedding in each attention head, they can wastefully learn overlapping patterns. To address these limitations, we propose MEANTIME (MixturE of AtteNTIon mechanisms with Multi-temporal Embeddings) which employs multiple types of temporal embeddings designed to capture various patterns from the user’s behavior sequence, and an attention structure that fully leverages such diversity. Experiments on real-world data show that our proposed method outperforms current state-of-the-art sequential recommendation methods, and we provide an extensive ablation study to analyze how the model gains from the diverse positional information. Title: Deep learning for recommender systems: A Netflix case study Link: https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/18140 Year: 2021 Abstract: Deep learning has profoundly impacted many areas of machine learning. However, it took a while for its impact to be felt in the field of recommender systems. In this article, we outline some of the challenges encountered and lessons learned in using deep learning for recommender systems at Netflix. We first provide an overview of the various recommendation tasks on the Netflix service. We found that different model architectures excel at different tasks. Even though many deep-learning models can be understood as extensions of existing (simple) recommendation algorithms, we initially did not observe significant improvements in performance over well-tuned non-deep-learning approaches. Only when we added numerous features of heterogeneous types to the input data, deep-learning models did start to shine in our setting. We also observed that deep-learning methods can exacerbate the problem of offline–online metric (mis-)alignment.
After addressing these challenges, deep learning has ultimately resulted in large improvements to our recommendations as measured by both offline and online metrics. On the practical side, integrating deep-learning toolboxes in our system has made it faster and easier to implement and experiment with both deep-learning and non-deep-learning approaches for various recommendation tasks. We conclude this article by summarizing our take-aways that may generalize to other applications beyond Netflix. Title: Transformer network for remaining useful life prediction of lithium-ion batteries Link: https://ieeexplore.ieee.org/abstract/document/9714323/ Year: 2022 Abstract: Accurately predicting the Remaining Useful Life (RUL) of a Li-ion battery plays an important role in managing the health and estimating the state of a battery. With the rapid development of electric vehicles, there is an increasing need to develop and improve the techniques for predicting RUL. To predict RUL, we designed a Transformer-based neural network. First, battery capacity data is always full of noise, especially during battery charge/discharge regeneration. To alleviate this problem, we applied a Denoising Auto-Encoder (DAE) to process raw data. Then, to capture temporal information and learn useful features, a reconstructed sequence was fed into a Transformer network. Finally, to bridge denoising and prediction tasks, we combined these two tasks into a unified framework. Results of extensive experiments conducted on two data sets and a comparison with some existing methods show that our proposed method performs better in predicting RUL. Our projects are all open source and are available at https://github.com/XiuzeZhou/RUL.
Title: A hierarchical recommendation system for E-commerce using online user reviews Link: https://www.sciencedirect.com/science/article/pii/S1567422322000151 Year: 2022 Abstract: Recommendation systems are considered as one of the important components of e-commerce platforms due to their direct impact on profitability. In this study, we propose a hierarchical recommendation system to increase the performance of the e-commerce recommendation system. Our DeepIDRS approach has a two-level hierarchical structure: (1) The first level uses bidirectional encoder representations to represent textual information of an item (title, description, and a subset of item reviews), efficiently and accurately; (2) The second level is an attention-based sequential recommendation model that uses item embeddings derived from the first level of the hierarchical structure. Furthermore, we compare our approach DeepIDRS with various approaches from different perspectives. Our results in the real-world dataset show that DeepIDRS provides at least 10% better HR@10 and NDCG@10 performance than other review-based models. With this study, for e-commerce, we clearly show that a hierarchical, explainable recommendation system that accurately represents the item title, description, and a subset of item reviews, improves performance. Title: Bert, elmo, use and infersent sentence encoders: The panacea for research-paper recommendation? Link: https://ceur-ws.org/Vol-2431/paper2.pdf Year: 2019 Title: EdgeRec: recommender system on edge in Mobile Taobao Link: https://dl.acm.org/doi/abs/10.1145/3340531.3412700 Year: 2020 Abstract: Recommender system (RS) has become a crucial module in most web-scale applications. Recently, most RSs are in the waterfall form based on the cloud-to-edge framework, where recommended results are transmitted to edge (e.g., user mobile) by computing in advance in the cloud server. 
Despite effectiveness, network bandwidth and latency between cloud server and edge may cause the delay for system feedback and user perception. Hence, real-time computing on edge could help capture user preferences more precisely and thus make more satisfactory recommendations. Our work, to our best knowledge, is the first attempt to design and implement the novel Recommender System on Edge (EdgeRec), which achieves Real-time User Perception and Real-time System Feedback. Moreover, we propose Heterogeneous User Behavior Sequence Modeling and Context-aware Reranking with Behavior Attention Networks to capture user's diverse interests and adjust recommendation results accordingly. Experimental results on both the offline evaluation and online performance in Taobao home-page feeds demonstrate the effectiveness of EdgeRec. Title: Towards question-based recommender systems Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401180 Year: 2020 Abstract: Conversational and question-based recommender systems have gained increasing attention in recent years, with users enabled to converse with the system and better control recommendations. Nevertheless, research in the field is still limited, compared to traditional recommender systems. In this work, we propose a novel Question-based recommendation method, Qrec, to assist users to find items interactively, by answering automatically constructed and algorithmically chosen questions. Previous conversational recommender systems ask users to express their preferences over items or item facets. Our model, instead, asks users to express their preferences over descriptive item features. The model is first trained offline by a novel matrix factorization algorithm, and then iteratively updates the user and item latent factors online by a closed-form solution based on the user answers. 
Meanwhile, our model infers the underlying user belief and preferences over items to learn an optimal question-asking strategy by using Generalized Binary Search, so as to ask a sequence of questions to the user. Our experimental results demonstrate that our proposed matrix factorization model outperforms the traditional Probabilistic Matrix Factorization model. Further, our proposed Qrec model can greatly improve the performance of state-of-the-art baselines, and it is also effective in the case of cold-start user and item recommendations. Title: Enhancing Recommender Systems with Large Language Model Reasoning Graphs Link: https://arxiv.org/abs/2308.10835 Year: 2023 Abstract: Recommendation systems aim to provide users with relevant suggestions, but often lack interpretability and fail to capture higher-level semantic relationships between user behaviors and profiles. In this paper, we propose a novel approach that leverages large language models (LLMs) to construct personalized reasoning graphs. These graphs link a user's profile and behavioral sequences through causal and logical inferences, representing the user's interests in an interpretable way. Our approach, LLM reasoning graphs (LLMRG), has four components: chained graph reasoning, divergent extension, self-verification and scoring, and knowledge base self-improvement. The resulting reasoning graph is encoded using graph neural networks, which serves as additional input to improve conventional recommender systems, without requiring extra user or item information. Our approach demonstrates how LLMs can enable more logical and interpretable recommender systems through personalized reasoning graphs. LLMRG allows recommendations to benefit from both engineered recommendation systems and LLM-derived reasoning graphs. We demonstrate the effectiveness of LLMRG on benchmarks and real-world scenarios in enhancing base recommendation models. 
Title: Deep learning based recommender system using cross convolutional filters Link: https://www.sciencedirect.com/science/article/pii/S0020025522000561 Year: 2022 Abstract: With the recent development of online transactions, recommender systems have increasingly attracted attention in various domains. The recommender system supports the users’ decision making by recommending items that are more likely to be preferred. Many studies in the field of deep learning-based recommender systems have attempted to capture the complex interactions between users’ and items’ features for accurate recommendation. In this paper, we propose a recommender system based on the convolutional neural network using the outer product matrix of features and cross convolutional filters. The proposed method can deal with the various types of features and capture the meaningful higher-order interactions between users and items, giving greater weight to important features. Moreover, it can alleviate the overfitting problem since the proposed method includes the global average or max pooling instead of the fully connected layers in the structure. Experiments showed that the proposed method performs better than the existing methods, by capturing important interactions and alleviating the overfitting issue. Title: Bert4sessrec: Content-based video relevance prediction with bidirectional encoder representations from transformer Link: https://dl.acm.org/doi/abs/10.1145/3343031.3356051 Year: 2019 Abstract: This paper describes our solution for the Content-Based Video Relevance Prediction (CBVRP) challenge, where the task is to predict user click-through behavior on new TV series or new movies according to the user's historical behavior. We consider the task as a session-based recommendation problem and we focus on the modeling of the session. 
Thus, we use the Bidirectional Encoder Representations from Transformer (BERT) methodology and propose a BERT for session-based recommendation (BERT4SessRec) method. Our method has two stages: in the pre-training stage, we use all sessions as training data and train the bidirectional session encoder with the masking trick; in the fine-tuning stage, we use the provided click-through data and train the click-through prediction network. Our method achieves session representations with the help of BERT, which effectively captures the bidirectional correlation in each session. In addition, the pre-training stage makes full use of all sessions, overcoming the positive-negative imbalance problem of the click-through data. We report the results of using different kinds of features on the test set of the challenge, which verify the effectiveness of our method. Title: Bert4nilm: A bidirectional transformer model for non-intrusive load monitoring Link: https://dl.acm.org/doi/abs/10.1145/3427771.3429390 Year: 2020 Abstract: Non-intrusive load monitoring (NILM) based energy disaggregation is the decomposition of a system's energy into the consumption of its individual appliances. Previous work on deep learning NILM algorithms has shown great potential in the field of energy management and smart grids. In this paper, we propose BERT4NILM, an architecture based on bidirectional encoder representations from transformers (BERT) and an improved objective function designed specifically for NILM learning. We adapt the bidirectional transformer architecture to the field of energy disaggregation and follow the pattern of sequence-to-sequence learning. With the improved loss function and masked training, BERT4NILM outperforms state-of-the-art models across various metrics on the two publicly available datasets UK-DALE and REDD. 
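The "masking trick" and "masked training" mentioned in the BERT4SessRec and BERT4NILM abstracts above can be illustrated with a minimal Python sketch. It is not taken from either paper: the reserved `MASK` id, the `mask_session` helper, and the 15% default rate are assumptions made for illustration, and the sketch only shows how training inputs and labels could be prepared, not the encoder itself.

```python
import random

MASK = 0  # assumption: item id 0 is reserved for the [MASK] token


def mask_session(items, mask_prob=0.15, rng=None):
    """Hide random items of a session for Cloze-style training.

    Returns (masked_items, labels). labels[i] is the original item at
    every masked position and None elsewhere, so a bidirectional
    encoder would be trained to recover only the hidden items.
    """
    if rng is None:
        rng = random.Random()
    masked, labels = [], []
    for item in items:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the item from the encoder
            labels.append(item)   # ...and ask the model to predict it
        else:
            masked.append(item)
            labels.append(None)   # position contributes no loss
    return masked, labels


# Example: one session of (hypothetical) item ids, fixed seed for repeatability.
session = [17, 4, 92, 8, 33]
masked, labels = mask_session(session, mask_prob=0.4, rng=random.Random(0))
print(masked, labels)
```

Published masked-language-model recipes usually add further details (for example, sometimes substituting a random item instead of the mask token); those are deliberately left out of this sketch.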
Title: Knowledge transfer via pre-training for recommendation: A review and prospect Link: https://www.frontiersin.org/articles/10.3389/fdata.2021.602071/full Year: 2021 Abstract: Recommender systems aim to provide item recommendations for users and are usually faced with data sparsity problems (e.g., cold start) in real-world scenarios. Recently pre-trained models have shown their effectiveness in knowledge transfer between domains and tasks, which can potentially alleviate the data sparsity problem in recommender systems. In this survey, we first provide a review of recommender systems with pre-training. In addition, we show the benefits of pre-training to recommender systems through experiments. Finally, we discuss several promising directions for future research of recommender systems with pre-training. The source code of our experiments will be available to facilitate future research. Title: Developing a personalized recommendation system in a smart product service system based on unsupervised learning model Link: https://www.sciencedirect.com/science/article/pii/S0166361521000282 Year: 2021 Abstract: Contemporary consumers have begun shifting their focus from product functionality toward the value that can be derived from products. In response to this trend, companies have begun using product service systems (PSS), business models that provide customers not only with tangible products but also with intangible services. Moreover, with the increasing use of smart devices, services providers can offer customized services to customers based on user-generated data with smart product service systems (Smart PSS). Despite extensive research on Smart PSS framework, few of these frameworks treated customer as an active data producer, which means producing data for the Smart PSS actively. Additionally, most of them proposed a general solution instead of a personalized one. 
To bridge the research gap, this study proposed a method that includes: (1) unsupervised natural language processing (NLP) methods to analyze user-provided data; (2) a recommendation system integrating deep learning to offer customers personalized solutions. Thus, the role of customers is not only a service receiver but also an active data producer and forms a value co-creation process with service providers. A case study of tourist recommendation validates the benefits of the proposed method. The main contribution of this research is to develop a personalized smart PSS method which could achieve a win-win situation for all players in this method. Title: A review of modern fashion recommender systems Link: https://arxiv.org/abs/2202.02757 Year: 2022 Abstract: The textile and apparel industries have grown tremendously over the last few years. Customers no longer have to visit many stores, stand in long queues, or try on garments in dressing rooms as millions of products are now available in online catalogs. However, given the plethora of options available, an effective recommendation system is necessary to properly sort, order, and communicate relevant product material or information to users. Effective fashion RS can have a noticeable impact on billions of customers' shopping experiences and increase sales and revenues on the provider side. The goal of this survey is to provide a review of recommender systems that operate in the specific vertical domain of garment and fashion products. We have identified the most pressing challenges in fashion RS research and created a taxonomy that categorizes the literature according to the objective they are trying to accomplish (e.g., item or outfit recommendation, size recommendation, explainability, among others) and type of side-information (users, items, context). 
We have also identified the most important evaluation goals and perspectives (outfit generation, outfit recommendation, pairing recommendation, and fill-in-the-blank outfit compatibility prediction) and the most commonly used datasets and evaluation metrics. Title: An empirical study on the usage of transformer models for code completion Link: https://ieeexplore.ieee.org/abstract/document/9616462/ Year: 2021 Abstract: Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenario of predicting the next token to type, with few exceptions pushing the boundaries to the prediction of an entire code statement. Thus, little is known about the performance of state-of-the-art code completion approaches in more challenging scenarios in which, for example, an entire code block must be generated. We present a large-scale study exploring the capabilities of state-of-the-art Transformer-based models in supporting code completion at different granularity levels, including single tokens, one or multiple entire statements, up to entire code blocks (e.g., the iterated block of a for loop). We experimented with several variants of two recently proposed Transformer-based models, namely RoBERTa and the Text-To-Text Transfer Transformer (T5), for the task of code completion. The achieved results show that Transformer-based models, and in particular the T5, represent a viable solution for code completion, with perfect predictions ranging from ∼ 29%, obtained when asking the model to guess entire blocks, up to ∼ 69%, reached in the simpler scenario of few tokens masked from the same code statement. 
Title: Graph transformer networks Link: https://proceedings.neurips.cc/paper/2019/hash/9d63484abb477c97640154d40595a3bb-Abstract.html Year: 2019 Abstract: Graph neural networks (GNNs) have been widely used in representation learning on graphs and achieved state-of-the-art performance in tasks such as node classification and link prediction. However, most existing GNNs are designed to learn node representations on the fixed and homogeneous graphs. The limitations especially become problematic when learning representations on a misspecified graph or a heterogeneous graph that consists of various types of nodes and edges. In this paper, we propose Graph Transformer Networks (GTNs) that are capable of generating new graph structures, which involve identifying useful connections between unconnected nodes on the original graph, while learning effective node representation on the new graphs in an end-to-end fashion. Graph Transformer layer, a core layer of GTNs, learns a soft selection of edge types and composite relations for generating useful multi-hop connections, so-called meta-paths. Our experiments show that GTNs learn new graph structures, based on data and tasks without domain knowledge, and yield powerful node representation via convolution on the new graphs. Without domain-specific graph preprocessing, GTNs achieved the best performance in all three benchmark node classification tasks against the state-of-the-art methods that require pre-defined meta-paths from domain knowledge. Title: Deep learning recommendation model for personalization and recommendation systems Link: https://arxiv.org/abs/1906.00091 Year: 2019 Abstract: With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. 
In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design. Title: Tutorial on conversational recommendation systems Link: https://dl.acm.org/doi/abs/10.1145/3383313.3411548 Year: 2020 Abstract: Recent years have witnessed the emerging of conversational systems, including both physical devices and mobile-based applications. Both the research community and industry believe that conversational systems will have a major impact on human-computer interaction, and specifically, the RecSys community has begun to explore Conversational Recommendation Systems. Conversational recommendation aims at finding or recommending the most relevant information (e.g., web pages, answers, movies, products) for users based on textual- or spoken-dialogs, through which users can communicate with the system more efficiently using natural language conversations. Due to users’ constant need to look for information to support both work and daily life, conversational recommendation system will be one of the key techniques towards an intelligent web. The tutorial focuses on the foundations and algorithms for conversational recommendation, as well as their applications in real-world systems such as search engine, e-commerce and social networks. 
The tutorial aims at introducing and communicating conversational recommendation methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussions, idea communications, and research promotions. Title: Why are deep learning models not consistently winning recommender systems competitions yet? A position paper Link: https://dl.acm.org/doi/abs/10.1145/3415959.3416001 Year: 2020 Abstract: For the past few years most published research on recommendation algorithms has been based on deep learning (DL) methods. Following common research practices in our field, these works usually demonstrate that a new DL method is outperforming other models not based on deep learning in offline experiments. This almost consistent success of DL based models is however not observed in recommendation-related machine learning competitions like the challenges that are held with the yearly ACM RecSys conference. Instead the winning solutions mostly consist of substantial feature engineering efforts and the use of gradient boosting or ensemble techniques. In this paper we investigate possible reasons for this surprising phenomenon. We consider multiple possible factors such as the characteristics and complexity of the problem settings, datasets, and DL methods; the background of the competition participants; or the particularities of the evaluation approach. Title: Optimizing deep learning recommender systems training on cpu cluster architectures Link: https://ieeexplore.ieee.org/abstract/document/9355237/ Year: 2020 Abstract: During the last two years, the goal of many researchers has been to squeeze the last bit of performance out of HPC system for AI tasks. Often this discussion is held in the context of how fast ResNet50 can be trained. Unfortunately, ResNet50 is no longer a representative workload in 2020. Thus, we focus on Recommender Systems which account for most of the AI cycles in cloud computing centers. 
More specifically, we focus on Facebook's DLRM benchmark. By enabling it to run on latest CPU hardware and software tailored for HPC, we are able to achieve up to two orders of magnitude improvement in performance on a single socket compared to the reference CPU implementation, and high scaling efficiency up to 64 sockets, while fitting ultra-large datasets which cannot be held in single node's memory. Therefore, this paper discusses and analyzes novel optimization and parallelization techniques for the various operators in DLRM. Several optimizations (e.g. tensor-contraction accelerated MLPs, framework MPI progression, BFLOAT16 training with up to 1.8× speed-up) are general and transferable to many other deep learning topologies. Title: A survey on adversarial recommender systems: from attack/defense strategies to generative adversarial networks Link: https://dl.acm.org/doi/abs/10.1145/3439729 Year: 2021 Abstract: Latent-factor models (LFM) based on collaborative filtering (CF), such as matrix factorization (MF) and deep CF methods, are widely used in modern recommender systems (RS) due to their excellent performance and recommendation accuracy. However, success has been accompanied with a major new arising challenge: Many applications of machine learning (ML) are adversarial in nature [146]. In recent years, it has been shown that these methods are vulnerable to adversarial examples, i.e., subtle but non-random perturbations designed to force recommendation models to produce erroneous outputs. The goal of this survey is two-fold: (i) to present recent advances on adversarial machine learning (AML) for the security of RS (i.e., attacking and defense recommendation models) and (ii) to show another successful application of AML in generative adversarial networks (GANs) for generative applications, thanks to their ability for learning (high-dimensional) data distributions. 
In this survey, we provide an exhaustive literature review of 76 articles published in major RS and ML journals and conferences. This review serves as a reference for the RS community working on the security of RS or on generative models using GANs to improve their quality. Title: GPU accelerated feature engineering and training for recommender systems Link: https://dl.acm.org/doi/abs/10.1145/3415959.3415996 Year: 2020 Abstract: In this paper we present our 1st place solution of the RecSys Challenge 2020 which focused on the prediction of user behavior, specifically the interaction with content, on this year’s dataset from competition host Twitter. Our approach achieved the highest score in seven of the eight metrics used to calculate the final leaderboard position. The 200 million tweet dataset required significant computation to do feature engineering and prepare the dataset for modelling, and our winning solution leveraged several key tools in order to accelerate our training pipeline. Within the paper we describe our exploratory data analysis (EDA) and training, the final features and models used, and the acceleration of the pipeline. Our final implementation runs entirely on GPU including feature engineering, preprocessing, and training the models. From our initial single threaded efforts in Pandas which took over ten hours we were able to accelerate feature engineering, preprocessing and training to two minutes and eighteen seconds, an end to end speedup of over 280x, using a combination of RAPIDS cuDF, Dask, UCX and XGBoost on a single machine with four NVIDIA V100 GPUs. Even when compared to heavily optimized code written later using Dask and Pandas on a 20 core CPU, our solution was still 25x faster. The acceleration of our pipeline was critical in our ability to quickly perform EDA which led to the discovery of a range of effective features used in the final solution, which is provided as open source [16]. 
Title: Attentive capsule network for click-through rate and conversion rate prediction in online advertising Link: https://www.sciencedirect.com/science/article/pii/S0950705120306511 Year: 2021 Abstract: Estimating Click-through Rate (CTR) and Conversion Rate (CVR) are two essential user response prediction tasks in computing advertising and recommendation systems. The mainstream methods map sparse, high-dimensional categorical features (e.g., user id, item id) into low-dimensional representations with neural networks. Although they have achieved significant advancement in recent years, how to capture user’s diverse interests effectively from past behaviors is still challenging. Recently some works try using attention-based methods to learn the representation from user behavior history adaptively. However, it is insufficient to capture the diversity of user’s interests. As a step forward to improve this goal, we propose a method named Attentive Capsule Network (ACN). It uses Transformers for feature interaction and leverages capsule networks to capture multiple interests from user behavior history. To precisely obtain sequence representation related to the current advertisement, we further design a modified dynamic routing algorithm integrating with an attention mechanism. Experimental results on real-world datasets demonstrate the effectiveness of our proposed ACN with significant improvement over state-of-the-art approaches. Moreover, it also offers good explainability when extracting diverse interest points of users from behavior history. 
Title: Disentangling the Performance Puzzle of Multimodal-aware Recommender Systems Link: https://sisinflab.poliba.it/publications/2023/MCPD23/_END__EvalRS_2023___Disentangling_the_Performance_Puzzle_of_Multimodal_aware_Recommender_Systems.pdf Year: 2023 Title: A novel time-aware food recommender-system based on deep learning and graph clustering Link: https://ieeexplore.ieee.org/abstract/document/9775081/ Year: 2022 Abstract: Food recommender-systems are considered an effective tool to help users adjust their eating habits and achieve a healthier diet. This paper aims to develop a new hybrid food recommender-system to overcome the shortcomings of previous systems, such as ignoring food ingredients, time factor, cold start users, cold start food items and community aspects. The proposed method involves two phases: food content-based recommendation and user-based recommendation. Graph clustering is used in the first phase, and a deep-learning based approach is used in the second phase to cluster both users and food items. Besides, a holistic-like approach is employed to account for time and user-community related issues in a way that improves the quality of the recommendation provided to the user. We compared our model with a set of state-of-the-art recommender-systems using five distinct performance metrics: Precision, Recall, F1, AUC and NDCG. Experiments using dataset extracted from “Allrecipes.com” demonstrated that the developed food recommender-system performed best. Title: Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5) Link: https://dl.acm.org/doi/abs/10.1145/3523227.3546767 Year: 2022 Abstract: For a long time, different recommendation tasks require designing task-specific architectures and training objectives. 
As a result, it is hard to transfer the knowledge and representations from one task to another, thus restricting the generalization ability of existing recommendation approaches. To deal with such issues, considering that language can describe almost anything and language grounding is a powerful medium to represent various problems or tasks, we present a flexible and unified text-to-text paradigm called “Pretrain, Personalized Prompt, and Predict Paradigm” (P5) for recommendation, which unifies various recommendation tasks in a shared framework. In P5, all data such as user-item interactions, user descriptions, item metadata, and user reviews are converted to a common format — natural language sequences. The rich information from natural language assists P5 to capture deeper semantics for personalization and recommendation. Specifically, P5 learns different tasks with the same language modeling objective during pretraining. Thus, it serves as the foundation model for various downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation. P5 advances recommender systems from shallow model to deep model to big model, and will revolutionize the technical form of recommender systems towards universal recommendation engine. With adaptive personalized prompt for different users, P5 is able to make predictions in a zero-shot or few-shot manner and largely reduces the necessity for extensive fine-tuning. On several benchmarks, we conduct experiments to show the effectiveness of P5. To help advance future research on Recommendation as Language Processing (RLP), Personalized Foundation Models (PFM), and Universal Recommendation Engine (URE), we release the source code, dataset, prompts, and pretrained P5 model at https://github.com/jeykigung/P5. 
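The P5 abstract above describes converting user-item interactions into natural language sequences for a text-to-text model. A minimal Python sketch of that idea follows. The template wording, the `user_`/`item_` naming, and the `interaction_to_text` helper are hypothetical and are not taken from the P5 prompts released by the authors.

```python
def interaction_to_text(user_id, history, candidate, clicked):
    """Render one user-item interaction as an (input text, target text) pair.

    Assumption: the template below is invented for illustration; a real
    system would use its own collection of prompt templates.
    """
    history_text = ", ".join(f"item_{item}" for item in history)
    source = (
        f"user_{user_id} has interacted with {history_text}. "
        f"Will the user interact with item_{candidate} next?"
    )
    target = "yes" if clicked else "no"
    return source, target


# Example: a positive interaction for a hypothetical user 7.
source, target = interaction_to_text(7, [3, 5], 9, clicked=True)
print(source)
print(target)
```

Because both the input and the target are plain strings, different tasks (rating prediction, next-item prediction, explanation) could in principle share one sequence-to-sequence training objective, which is the unification the abstract refers to.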
Title: Equivariant contrastive learning for sequential recommendation Link: https://dl.acm.org/doi/abs/10.1145/3604915.3608786 Year: 2023 Abstract: Contrastive learning (CL) benefits the training of sequential recommendation models with informative self-supervision signals. Existing solutions apply general sequential data augmentation strategies to generate positive pairs and encourage their representations to be invariant. However, due to the inherent properties of user behavior sequences, some augmentation strategies, such as item substitution, can lead to changes in user intent. Learning indiscriminately invariant representations for all augmentation strategies might be sub-optimal. Therefore, we propose Equivariant Contrastive Learning for Sequential Recommendation (ECL-SR), which endows SR models with great discriminative power, making the learned user behavior representations sensitive to invasive augmentations (e.g., item substitution) and insensitive to mild augmentations (e.g., feature-level dropout masking). In detail, we use the conditional discriminator to capture differences in behavior due to item substitution, which encourages the user behavior encoder to be equivariant to invasive augmentations. Comprehensive experiments on four benchmark datasets show that the proposed ECL-SR framework achieves competitive performance compared to state-of-the-art SR models. The source code is available at https://github.com/Tokkiu/ECL. Title: A survey of graph neural networks for recommender systems: Challenges, methods, and directions Link: https://dl.acm.org/doi/abs/10.1145/3568022 Year: 2023 Abstract: Recommender system is one of the most important information services on today’s Internet. Recently, graph neural networks have become the new state-of-the-art approach to recommender systems. In this survey, we conduct a comprehensive review of the literature on graph neural network-based recommender systems. 
We first introduce the background and the history of the development of both recommender systems and graph neural networks. For recommender systems, in general, there are four aspects for categorizing existing works: stage, scenario, objective, and application. For graph neural networks, the existing methods consist of two categories: spectral models and spatial ones. We then discuss the motivation of applying graph neural networks into recommender systems, mainly consisting of the high-order connectivity, the structural property of data and the enhanced supervision signal. We then systematically analyze the challenges in graph construction, embedding propagation/aggregation, model optimization, and computation efficiency. Afterward and primarily, we provide a comprehensive overview of a multitude of existing works of graph neural network-based recommender systems, following the taxonomy above. Finally, we raise discussions on the open problems and promising future directions in this area. We summarize the representative papers along with their code repositories in https://github.com/tsinghua-fib-lab/GNN-Recommender-Systems. Title: Contrastive learning for debiased candidate generation in large-scale recommender systems Link: https://dl.acm.org/doi/abs/10.1145/3447548.3467102 Year: 2021 Abstract: Deep candidate generation (DCG) that narrows down the collection of relevant items from billions to hundreds via representation learning has become prevalent in industrial recommender systems. Standard approaches approximate maximum likelihood estimation (MLE) through sampling for better scalability and address the problem of DCG in a way similar to language modeling. However, live recommender systems face severe exposure bias and have a vocabulary several orders of magnitude larger than that of natural language, implying that MLE will preserve and even exacerbate the exposure bias in the long run in order to faithfully fit the observed samples. 
In this paper, we theoretically prove that a popular choice of contrastive loss is equivalent to reducing the exposure bias via inverse propensity weighting, which provides a new perspective for understanding the effectiveness of contrastive learning. Based on the theoretical discovery, we design CLRec, a contrastive learning method to improve DCG in terms of fairness, effectiveness and efficiency in recommender systems with extremely large candidate size. We further improve upon CLRec and propose Multi-CLRec, for accurate multi-intention aware bias reduction. Our methods have been successfully deployed in Taobao, where at least four-month online A/B tests and offline analyses demonstrate its substantial improvements, including a dramatic reduction in the Matthew effect. Title: Graph neural networks in recommender systems: a survey Link: https://dl.acm.org/doi/abs/10.1145/3535101 Year: 2022 Abstract: With the explosive growth of online information, recommender systems play a key role to alleviate such information overload. Due to the important application value of recommender systems, there have always been emerging works in this field. In recommender systems, the main challenge is to learn the effective user/item representations from their interactions and side information (if any). Recently, graph neural network (GNN) techniques have been widely utilized in recommender systems since most of the information in recommender systems essentially has graph structure and GNN has superiority in graph representation learning. This article aims to provide a comprehensive review of recent research efforts on GNN-based recommender systems. Specifically, we provide a taxonomy of GNN-based recommendation models according to the types of information used and recommendation tasks. Moreover, we systematically analyze the challenges of applying GNN on different types of data and discuss how existing works in this field address these challenges. 
Furthermore, we state new perspectives pertaining to the development of this field. We collect the representative papers along with their open-source implementations in https://github.com/wusw14/GNN-in-RS. Title: [HTML][HTML] A survey of recommendation systems: recommendation models, techniques, and application fields Link: https://www.mdpi.com/2079-9292/11/1/141 Year: 2022 Abstract: This paper reviews the research trends that link the advanced technical aspects of recommendation systems that are used in various service areas and the business aspects of these services. First, for a reliable analysis of recommendation models for recommendation systems, data mining technology, and related research by application service, more than 135 top-ranking articles and top-tier conferences published in Google Scholar between 2010 and 2021 were collected and reviewed. Based on this, studies on recommendation system models and the technology used in recommendation systems were systematized, and research trends by year were analyzed. In addition, the application service fields where recommendation systems were used were classified, and research on the recommendation system model and recommendation technique used in each field was analyzed. Furthermore, vast amounts of application service-related data used by recommendation systems were collected from 2010 to 2021 without taking the journal ranking into consideration and reviewed along with various recommendation system studies, as well as applied service field industry data. As a result of this study, it was found that the flow and quantitative growth of various detailed studies of recommendation systems interact with the business growth of the actual applied service field. 
While providing a comprehensive summary of recommendation systems, this study provides insight to many researchers interested in recommendation systems through the analysis of its various technologies and trends in the service field to which recommendation systems are applied. Title: Deep learning techniques for recommender systems based on collaborative filtering Link: https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.12647 Year: 2020 Abstract: In the Big Data Era, recommender systems perform a fundamental role in data management and information filtering. In this context, Collaborative Filtering (CF) persists as one of the most prominent strategies to effectively deal with large datasets and is capable of offering users interesting content in a recommendation fashion. Nevertheless, it is well-known CF recommenders suffer from data sparsity, mainly in cold-start scenarios, substantially reducing the quality of recommendations. In the vast literature about the aforementioned topic, there are numerous solutions, in which the state-of-the-art contributions are, in some sense, conditioned or associated with traditional CF methods such as Matrix Factorization (MF), that is, they rely on linear optimization procedures to model users and items into low-dimensional embeddings. To overcome the aforementioned challenges, there has been an increasing number of studies exploring deep learning techniques in the CF context for latent factor modelling. In this research, authors conduct a systematic review focusing on state-of-the-art literature on deep learning techniques applied in collaborative filtering recommendation, and also featuring primary studies related to mitigating the cold start problem. 
Additionally, authors considered the diverse non-linear modelling strategies to deal with rating data and side information, the combination of deep learning techniques with traditional CF-based linear methods, and an overview of the most used public datasets and evaluation metrics concerning CF scenarios. Title: Neural networks and deep learning Link: https://www.emerald.com/insight/content/doi/10.1108/978-1-83909694-520211010/full/html Year: 2021 Abstract: Neural networks, which provide the basis for deep learning, are a class of machine learning methods that are being applied to a diverse array of fields in business, health, technology, and research. In this chapter, we survey some of the key features of deep neural networks and aspects of their design and architecture. We give an overview of some of the different kinds of networks and their applications and highlight how these architectures are used for business applications such as recommender systems. We also provide a summary of some of the considerations needed for using neural network models and future directions in the field. Title: Multi-aspect aware session-based recommendation for intelligent transportation services Link: https://ieeexplore.ieee.org/abstract/document/9093954/ Year: 2020 Abstract: In the intelligent transportation system, the session data usually represents the users' demand. However, the traditional approaches only focus on the sequence information or the last item clicked by the user, which cannot fully represent user preferences. To address this issue, this paper proposes a Multi-aspect Aware Session-based Recommendation (MASR) model for intelligent transportation services, which comprehensively considers the user's personalized behavior from multiple aspects. In addition, it develops a concise and efficient transformer-style self-attention to analyze the sequence information of the current session, for accurately grasping the user's intention. 
Finally, the experimental results show that MASR is available to improve user satisfaction with more accurate and rapid recommendations, and reduce the number of user operations to decrease the safety risk during the transportation service. Title: Contrastive self-supervised sequential recommendation with robust augmentation Link: https://arxiv.org/abs/2108.06479 Year: 2021 Abstract: Sequential Recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data. At their core, such approaches model transition probabilities between items in a sequence, whether through Markov chains, recurrent networks, or more recently, Transformers. However both old and new issues remain, including data-sparsity and noisy data; such issues can impair the performance, especially in complex, parameter-hungry models. In this paper, we investigate the application of contrastive Self-Supervised Learning (SSL) to the sequential recommendation, as a way to alleviate some of these issues. Contrastive SSL constructs augmentations from unlabelled instances, where agreements among positive pairs are maximized. It is challenging to devise a contrastive SSL framework for a sequential recommendation, due to its discrete nature, correlations among items, and skewness of length distributions. To this end, we propose a novel framework, Contrastive Self-supervised Learning for sequential Recommendation (CoSeRec). We introduce two informative augmentation operators leveraging item correlations to create high-quality views for contrastive learning. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed method on improving model performance and the robustness against sparse and noisy data. 
Our implementation is available online at this https URL. Title: [HTML][HTML] Advances and challenges in conversational recommender systems: A survey Link: https://www.sciencedirect.com/science/article/pii/S2666651021000164 Year: 2021 Abstract: Recommender systems exploit interaction history to estimate user preference, having been heavily used in a wide range of industry applications. However, static recommendation models are difficult to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way that static models learn user preference, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. In a CRS, users and the system can dynamically communicate through natural language interactions, which provide unprecedented opportunities to explicitly obtain the exact preference of users. Considerable efforts, spread across disparate settings and applications, have been put into developing CRSs. Existing models, technologies, and evaluation methods for CRSs are far from mature. In this paper, we provide a systematic review of the techniques used in current CRSs. We summarize the key challenges of developing CRSs in five directions: (1) Question-based user preference elicitation. (2) Multi-turn conversational recommendation strategies. (3) Dialogue understanding and generation. (4) Exploitation-exploration trade-offs. (5) Evaluation and user simulation. These research directions involve multiple research fields like information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI). Based on these research directions, we discuss some future challenges and opportunities. We provide a road map for researchers from multiple communities to get started in this area. 
We hope this survey can help to identify and address challenges in CRSs and inspire future research. Title: [PDF][PDF] Deep feedback network for recommendation Link: https://www.ijcai.org/Proceedings/2020/0349.pdf Year: 2021 Title: Adversarial feature translation for multi-domain recommendation Link: https://dl.acm.org/doi/abs/10.1145/3447548.3467176 Year: 2021 Abstract: Real-world super platforms such as Google and WeChat usually have different recommendation scenarios to provide heterogeneous items for users' diverse demands. Multi-domain recommendation (MDR) is proposed to improve all recommendation domains simultaneously, where the key point is to capture informative domain-specific features from all domains. To address this problem, we propose a novel Adversarial feature translation (AFT) model for MDR, which learns the feature translations between different domains under a generative adversarial network framework. Precisely, in the multi-domain generator, we propose a domain-specific masked encoder to highlight inter-domain feature interactions, and then aggregate these features via a transformer and a domain-specific attention. In the multi-domain discriminator, we explicitly model the relationships between item, domain and users' general/domain-specific representations with a two-step feature translation inspired by the knowledge representation learning. In experiments, we evaluate AFT on a public and an industrial MDR datasets and achieve significant improvements. We also conduct an online evaluation on a real-world MDR system. We further give detailed ablation tests and model analyses to verify the effectiveness of different components. Currently, we have deployed AFT on WeChat Top Stories. The source code is in https://github.com/xiaobocser/AFT. 
Title: Time interval aware self-attention for sequential recommendation Link: https://dl.acm.org/doi/abs/10.1145/3336191.3371786 Year: 2020 Abstract: Sequential recommender systems seek to exploit the order of users' interactions, in order to predict their next action based on the context of what they have done recently. Traditionally, Markov Chains (MCs), and more recently Recurrent Neural Networks (RNNs) and Self Attention (SA) have proliferated due to their ability to capture the dynamics of sequential patterns. However a simplifying assumption made by most of these models is to regard interaction histories as ordered sequences, without regard for the time intervals between each interaction (i.e., they model the time-order but not the actual timestamp). In this paper, we seek to explicitly model the timestamps of interactions within a sequential modeling framework to explore the influence of different time intervals on next item prediction. We propose TiSASRec (Time Interval aware Self-attention based sequential recommendation), which models both the absolute positions of items as well as the time intervals between them in a sequence. Extensive empirical studies show the features of TiSASRec under different settings and compare the performance of self-attention with different positional encodings. Furthermore, experimental results show that our method outperforms various state-of-the-art sequential models on both sparse and dense datasets and different evaluation metrics. Title: A deep learning approach for robust detection of bots in twitter using transformers Link: https://ieeexplore.ieee.org/abstract/document/9385071/ Year: 2021 Abstract: During the last decades, the volume of multimedia content posted in social networks has grown exponentially and such information is immediately propagated and consumed by a significant number of users. 
In this scenario, the disruption of fake news providers and bot accounts for spreading propaganda information as well as sensitive content throughout the network has fostered applied research to automatically measure the reliability of social networks accounts via Artificial Intelligence (AI). In this paper, we present a multilingual approach for addressing the bot identification task in Twitter via Deep learning (DL) approaches to support end-users when checking the credibility of a certain Twitter account. To do so, several experiments were conducted using state-of-the-art Multilingual Language Models to generate an encoding of the text-based features of the user account that are later on concatenated with the rest of the metadata to build a potential input vector on top of a Dense Network denoted as Bot-DenseNet. Consequently, this paper assesses the language constraint from previous studies where the encoding of the user account only considered either the metadata information or the metadata information together with some basic semantic text features. Moreover, the Bot-DenseNet produces a low-dimensional representation of the user account which can be used for any application within the Information Retrieval (IR) framework. Title: Multimedia recommender systems: Algorithms and challenges Link: https://link.springer.com/chapter/10.1007/978-1-0716-2197-4_25 Year: 2021 Abstract: This chapter studies state-of-the-art research related to multimedia recommender systems (MMRS), focusing on methods that integrate multimedia content as side information to various recommendation models. The multimedia features are then used by an MMRS to recommend either (1) media items from which the features were derived, or (2) non-media items utilizing the features obtained from a proxy multimedia representation of the item (e.g., images of clothes). We first outline the key considerations and challenges that must be taken into account while developing an MMRS. 
We then discuss the most popular multimedia content processing approaches to produce item representations that may be utilized as side information in an MMRS. Finally, we discuss recent state-of-the-art MMRS algorithms, which we classify and present according to classical hybrid models (e.g., VBPR), neural approaches, and graph-based approaches. Throughout this work, we mentioned several use-cases of MMRSs in the recommender systems research across several domains or products types such as food, fashion, music, videos, and so forth. We hope this chapter provides fresh insights into the nexus of multimedia and recommender systems, which could be exploited to broaden the frontier in the field. Title: A memory transformer network for incremental learning Link: https://arxiv.org/abs/2210.04485 Year: 2022 Abstract: We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from. Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes. One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks. In our paper, we propose to enhance the utilization of this memory bank: we not only use it as a source of additional training data like existing works but also integrate it in the prediction process explicitly. Our method, the Memory Transformer Network (MTN), learns how to combine and aggregate the information from the nearest neighbors in the memory with a transformer to make more accurate predictions. We conduct extensive experiments and ablations to evaluate our approach. 
We show that MTN achieves state-of-the-art performance on the challenging ImageNet-1k and Google-Landmarks-1k incremental learning benchmarks. Title: A survey of recommender systems with multi-objective optimization Link: https://www.sciencedirect.com/science/article/pii/S0925231221017185 Year: 2022 Abstract: Recommender systems have been widely applied to several domains and applications to assist decision making by recommending items tailored to user preferences. One of the popular recommendation algorithms is the model-based approach which optimizes a specific objective to improve the recommendation performance. These traditional recommendation models usually deal with a single objective, such as minimizing the prediction errors or maximizing the ranking quality of the recommendations. In recent years, there is an emerging demand for multi-objective recommender systems in which multiple objectives are considered and the recommendations can be optimized by the multi-objective optimization. For example, a recommendation model may be built by optimizing multiple metrics, such as accuracy, novelty and diversity of the recommendations. The multi-objective optimization methodologies have been well developed and applied to the area of recommender systems. In this article, we provide a comprehensive literature review of the multi-objective recommender systems. Particularly, we identify the circumstances in which a multi-objective recommender system could be useful, summarize the methodologies and evaluation approaches in these systems, point out existing challenges or weaknesses, finally provide the guidelines and suggestions for the development of multi-objective recommender systems. 
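As a small illustration of the multi-objective setting described in the survey abstract above, the sketch below selects the Pareto-optimal models when each model is scored on accuracy, novelty and diversity. The model names and scores are hypothetical, and higher is assumed to be better on every objective; the survey itself covers many other methodologies.

```python
# Minimal sketch: Pareto-front selection over (accuracy, novelty, diversity).
# All names and numbers below are hypothetical, for illustration only.

def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    }

candidates = {
    "model_a": (0.30, 0.10, 0.20),
    "model_b": (0.28, 0.25, 0.22),
    "model_c": (0.25, 0.08, 0.15),  # worse than model_a on every objective
    "model_d": (0.31, 0.05, 0.30),
}

front = pareto_front(candidates)
print(sorted(front))
```

Here model_c is dropped because model_a beats it on all three objectives, while the remaining models each trade one objective against another.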
Title: Untargeted attack against federated recommendation systems via poisonous item embeddings and the defense Link: https://ojs.aaai.org/index.php/AAAI/article/view/25611 Year: 2023 Abstract: Federated recommendation (FedRec) can train personalized recommenders without collecting user data, but the decentralized nature makes it susceptible to poisoning attacks. Most previous studies focus on the targeted attack to promote certain items, while the untargeted attack that aims to degrade the overall performance of the FedRec system remains less explored. In fact, untargeted attacks can disrupt the user experience and bring severe financial loss to the service provider. However, existing untargeted attack methods are either inapplicable or ineffective against FedRec systems. In this paper, we delve into the untargeted attack and its defense for FedRec systems. (i) We propose ClusterAttack, a novel untargeted attack method. It uploads poisonous gradients that converge the item embeddings into several dense clusters, which make the recommender generate similar scores for these items in the same cluster and perturb the ranking order. (ii) We propose a uniformity-based defense mechanism (UNION) to protect FedRec systems from such attacks. We design a contrastive learning task that regularizes the item embeddings toward a uniform distribution. Then the server filters out these malicious gradients by estimating the uniformity of updated item embeddings. Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack. 
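The abstract above does not say how UNION estimates uniformity, so the following is only a sketch of one common estimator from the contrastive-learning literature: the log of the mean Gaussian potential between L2-normalized embeddings, with temperature t = 2 assumed. The toy embeddings are hypothetical. Lower (more negative) values indicate a more uniform spread, while embeddings collapsed into a dense cluster score close to zero.

```python
import math
from itertools import combinations

# Minimal sketch of a uniformity score for item embeddings.
# Assumption: this Gaussian-potential estimator is NOT taken from the UNION
# paper; it is a generic measure used here purely for illustration.

def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def uniformity(embeddings, t=2.0):
    """log of the mean exp(-t * squared distance) over all pairs of unit vectors."""
    unit = [l2_normalize(v) for v in embeddings]
    terms = [
        math.exp(-t * sum((x - y) ** 2 for x, y in zip(a, b)))
        for a, b in combinations(unit, 2)
    ]
    return math.log(sum(terms) / len(terms))

# Hypothetical toy embeddings: one well-spread set, one dense cluster.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
clustered = [[1.0, 0.0], [0.99, 0.05], [1.0, 0.02], [0.98, -0.03]]

print(uniformity(spread), uniformity(clustered))
```

A server-side filter in this spirit could flag an update whose resulting embeddings score noticeably closer to zero than before, though the threshold and procedure would have to come from the paper itself.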
Title: Filter-enhanced MLP is all you need for sequential recommendation Link: https://dl.acm.org/doi/abs/10.1145/3485447.3512111 Year: 2022 Abstract: Recently, deep neural networks such as RNN, CNN and Transformer have been applied in the task of sequential recommendation, which aims to capture the dynamic preference characteristics from logged user behavior data for accurate recommendation. However, in online platforms, logged user behavior data is inevitable to contain noise, and deep recommendation models are easy to overfit on these logged data. To tackle this problem, we borrow the idea of filtering algorithms from signal processing that attenuates the noise in the frequency domain. In our empirical experiments, we find that filtering algorithms can substantially improve representative sequential recommendation models, and integrating simple filtering algorithms (e.g., Band-Stop Filter) with an all-MLP architecture can even outperform competitive Transformer-based models. Motivated by it, we propose FMLP-Rec, an all-MLP model with learnable filters for sequential recommendation task. The all-MLP architecture endows our model with lower time complexity, and the learnable filters can adaptively attenuate the noise information in the frequency domain. Extensive experiments conducted on eight real-world datasets demonstrate the superiority of our proposed method over competitive RNN, CNN, GNN and Transformer-based methods. Our code and data are publicly available at the link: https://github.com/RUCAIBox/FMLP-Rec . Title: Noninvasive self-attention for side information fusion in sequential recommendation Link: https://ojs.aaai.org/index.php/AAAI/article/view/16549 Year: 2021 Abstract: Sequential recommender systems aim to model users’ evolving interests from their historical behaviors, and hence make customized time-relevant recommendations. 
Compared with traditional models, deep learning approaches such as CNN and RNN have achieved remarkable advancements in recommendation tasks. Recently, the BERT framework also emerges as a promising method, benefited from its self-attention mechanism in processing sequential data. However, one limitation of the original BERT framework is that it only considers one input source of the natural language tokens. It is still an open question to leverage various types of information under the BERT framework. Nonetheless, it is intuitively appealing to utilize other side information, such as item category or tag, for more comprehensive depictions and better recommendations. In our pilot experiments, we found naive approaches, which directly fuse types of side information into the item embeddings, usually bring very little or even negative effects. Therefore, in this paper, we propose the NOn-inVasive self-Attention mechanism (NOVA) to leverage side information effectively under the BERT framework. NOVA makes use of side information to generate better attention distribution, rather than directly altering the item embeddings, which may cause information overwhelming. We validate the NOVA-BERT model on both public and commercial datasets, and our method can stably outperform the state-of-the-art models with negligible computational overheads. Title: An approach to integrating sentiment analysis into recommender systems Link: https://www.mdpi.com/1424-8220/21/16/5666 Year: 2021 Abstract: Recommender systems have been applied in a wide range of domains such as e-commerce, media, banking, and utilities. This kind of system provides personalized suggestions based on large amounts of data to increase user satisfaction. These suggestions help client select products, while organizations can increase the consumption of a product. 
In the case of social data, sentiment analysis can help gain better understanding of a user’s attitudes, opinions and emotions, which is beneficial to integrate in recommender systems for achieving higher recommendation reliability. On the one hand, this information can be used to complement explicit ratings given to products by users. On the other hand, sentiment analysis of items that can be derived from online news services, blogs, social media or even from the recommender systems themselves is seen as capable of providing better recommendations to users. In this study, we present and evaluate a recommendation approach that integrates sentiment analysis into collaborative filtering methods. The recommender system proposal is based on an adaptive architecture, which includes improved techniques for feature extraction and deep learning models based on sentiment analysis. The results of the empirical study performed with two popular datasets show that sentiment-based deep learning models and collaborative filtering methods can significantly improve the recommender system’s performance. Title: Stochastic shared embeddings: Data-driven regularization of embedding layers Link: https://proceedings.neurips.cc/paper/2019/hash/37693cfc748049e45d87b8c7d8b9aacd-Abstract.html Year: 2019 Abstract: In deep neural nets, lower level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in a hope to reduce statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large scale neural networks. 
We develop two versions of SSE: SSE-Graph using knowledge graphs of embeddings; SSE-SE using no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems, to the transformer and BERT in natural languages. We find that when used along with widely-used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results. Title: CRSLab: An open-source toolkit for building conversational recommender system Link: https://arxiv.org/abs/2101.00939 Year: 2021 Abstract: In recent years, conversational recommender system (CRS) has received much attention in the research community. However, existing studies on CRS vary in scenarios, goals and techniques, lacking unified, standardized implementation or comparison. To tackle this challenge, we propose an open-source CRS toolkit CRSLab, which provides a unified and extensible framework with highly-decoupled modules to develop CRSs. Based on this framework, we collect 6 commonly-used human-annotated CRS datasets and implement 18 models that include recent techniques such as graph neural network and pre-training models. Besides, our toolkit provides a series of automatic evaluation protocols and a human-machine interaction interface to test and compare different CRS methods. The project and documents are released at this https URL. Title: U-BERT: Pre-training user representations for improved recommendation Link: https://ojs.aaai.org/index.php/AAAI/article/view/16557 Year: 2021 Abstract: Learning user representation is a critical task for recommendation systems as it can encode user preference for personalized services. User representation is generally learned from behavior data, such as clicking interactions and review comments. 
However, for less popular domains, the behavior data is insufficient to learn precise user representations. To deal with this problem, a natural thought is to leverage content-rich domains to complement user representations. Inspired by the recent success of BERT in NLP, we propose a novel pre-training and fine-tuning based approach U-BERT. Different from typical BERT applications, U-BERT is customized for recommendation and utilizes different frameworks in pre-training and fine-tuning. In pre-training, U-BERT focuses on content-rich domains and introduces a user encoder and a review encoder to model users' behaviors. Two pre-training strategies are proposed to learn the general user representations. In fine-tuning, U-BERT focuses on the target content-insufficient domains. In addition to the user and review encoders inherited from the pre-training stage, U-BERT further introduces an item encoder to model item representations. Besides, a review co-matching layer is proposed to capture more semantic interactions between the reviews of the user and item. Finally, U-BERT combines user representations, item representations and review interaction information to improve recommendation performance. Experiments on six benchmark datasets from different domains demonstrate the state-of-the-art performance of U-BERT. Title: Sequential recommendation via stochastic self-attention Link: https://dl.acm.org/doi/abs/10.1145/3485447.3512077 Year: 2022 Abstract: Sequential recommendation models the dynamics of a user’s previous behaviors in order to forecast the next item, and has drawn a lot of attention. Transformer-based approaches, which embed items as vectors and use dot-product self-attention to measure the relationship between items, demonstrate superior capabilities among existing sequential methods. However, users’ real-world sequential behaviors are uncertain rather than deterministic, posing a significant challenge to present techniques. 
We further suggest that dot-product-based approaches cannot fully capture collaborative transitivity, which can be derived in item-item transitions inside sequences and is beneficial for cold start items. We further argue that BPR loss has no constraint on positive and sampled negative items, which misleads the optimization. We propose a novel STOchastic Self-Attention (STOSA) to overcome these issues. STOSA, in particular, embeds each item as a stochastic Gaussian distribution, the covariance of which encodes the uncertainty. We devise a novel Wasserstein Self-Attention module to characterize item-item position-wise relationships in sequences, which effectively incorporates uncertainty into model training. Wasserstein attentions also enlighten the collaborative transitivity learning as it satisfies triangle inequality. Moreover, we introduce a novel regularization term to the ranking loss, which assures the dissimilarity between positive and the negative items. Extensive experiments on five real-world benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art baselines, especially on cold start items. The code is available in https://github.com/zfan20/STOSA. Title: A troubling analysis of reproducibility and progress in recommender systems research Link: https://dl.acm.org/doi/abs/10.1145/3434185 Year: 2021 Abstract: The design of algorithms that generate personalized ranked item lists is a central topic of research in the field of recommender systems. In the past few years, in particular, approaches based on deep learning (neural) techniques have become dominant in the literature. For all of them, substantial progress over the state-of-the-art is claimed. However, indications exist of certain problems in today’s research practice, e.g., with respect to the choice and optimization of the baselines used for comparison, raising questions about the published claims. 
To obtain a better understanding of the actual progress, we have compared recent results in the area of neural recommendation approaches based on collaborative filtering against a consistent set of existing simple baselines. The worrying outcome of the analysis of these recent works (all were published at prestigious scientific conferences between 2015 and 2018) is that 11 of the 12 reproducible neural approaches can be outperformed by conceptually simple methods, e.g., based on the nearest-neighbor heuristic or linear models. None of the computationally complex neural methods was actually consistently better than already existing learning-based techniques, e.g., using matrix factorization or linear models. In our analysis, we discuss common issues in today’s research practice, which, despite the many papers that are published on the topic, have apparently led the field to a certain level of stagnation.

Title: Fast-adapting and privacy-preserving federated recommender system Link: https://link.springer.com/article/10.1007/s00778-021-00700-6 Year: 2021 Abstract: In the mobile Internet era, recommender systems have become an irreplaceable tool to help users discover useful items, thus alleviating the information overload problem. Recent research on deep neural network (DNN)-based recommender systems has made significant progress in improving prediction accuracy, largely attributed to the widely accessible large-scale user data. Such data is commonly collected from users’ personal devices and then centrally stored in the cloud server to facilitate model training. However, with the rising public concerns on user privacy leakage in online platforms, online users are becoming increasingly anxious over abuses of user privacy. Therefore, it is urgent and beneficial to develop a recommender system that can achieve both high prediction accuracy and strong privacy protection.
To this end, we propose a DNN-based recommendation model called PrivRec running on the decentralized federated learning (FL) environment, which ensures that a user’s data is fully retained on her/his personal device while contributing to training an accurate model. On the other hand, to better embrace the data heterogeneity (e.g., users’ data vary in scale and quality significantly) in FL, we innovatively introduce a first-order meta-learning method that enables fast on-device personalization with only a few data points. Furthermore, to defend against potential malicious participants that pose a serious security threat to other users, we further develop a user-level differentially private model, namely DP-PrivRec, so attackers are unable to identify any arbitrary user from the trained model. To compensate for the loss by adding noise during model updates, we introduce a two-stage training approach. Finally, we conduct extensive experiments on two large-scale datasets in a simulated FL environment, and the results validate the superiority of both PrivRec and DP-PrivRec.

Title: A generic network compression framework for sequential recommender systems Link: https://dl.acm.org/doi/abs/10.1145/3397271.3401125 Year: 2020 Abstract: Sequential recommender systems (SRS) have become the key technology in capturing user's dynamic interests and generating high-quality recommendations. Current state-of-the-art sequential recommender models are typically based on a sandwich-structured deep neural network, where one or more middle (hidden) layers are placed between the input embedding layer and output softmax layer. In general, these models require a large number of parameters to obtain optimal performance. Despite the effectiveness, at some point, further increasing model size may be harder for model deployment in resource-constrained devices.
To resolve the issues, we propose a compressed sequential recommendation framework, termed as CpRec, where two generic model shrinking techniques are employed. Specifically, we first propose a block-wise adaptive decomposition to approximate the input and softmax matrices by exploiting the fact that items in SRS obey a long-tailed distribution. To reduce the parameters of the middle layers, we introduce three layer-wise parameter sharing schemes. We instantiate CpRec using a deep convolutional neural network with dilated kernels, giving consideration to both recommendation accuracy and efficiency. Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4~8 times compression rates in real-world SRS datasets. Meanwhile, CpRec is faster during training & inference, and in most cases outperforms its uncompressed counterpart.

Title: Recommendation system for technology convergence opportunities based on self-supervised representation learning Link: https://link.springer.com/article/10.1007/s11192-020-03731-y Year: 2021 Abstract: We show how a deep neural network can be designed to learn meaningful representations from high-dimensional and heterogeneous categorical features in patent data using self-supervised learning. Based on each firm’s technology portfolio and each patent’s co-classification information, we propose a novel recommendation system for firms seeking new convergence opportunities through representations of convergence items and firms. The results of this work are expected to recommend convergence opportunities in multiple technology fields by considering the target firm’s potential preference. First, we create a technology portfolio consisting of a set of patents owned by each firm. Then, we train a neural network to extract latent representations of firms and technology convergence items.
Despite a lack of indicators related to a firm’s latent preference for a convergence item, a self-supervised neural network can capture the similarity with semantic information of the firm’s latent preference that is implicitly present in the patent co-classification information in each firm’s technology portfolio. We then calculate the similarity between the vector of a target firm and convergence items for recommendation. The top N similar convergence items that have the highest scores are recommended as the new convergence items for the target firm. We apply our framework to the dataset of patents granted by the United States Patent and Trademark Office between 2011 and 2015. The results indicate that the recent development in theories and empirical studies of deep representation learning can shed new light on extracting valuable information from the structured part of patent data.

Title: AICF: Attention-based item collaborative filtering Link: https://www.sciencedirect.com/science/article/pii/S1474034620300598 Year: 2020 Abstract: Item-to-item collaborative filtering (ICF for short) has been widely used in e-commerce websites due to its interpretability and simplicity in real-time personalized recommendation. The focus of ICF is to calculate the similarity between items. With the rapid development of machine learning in recent years, learned similarity models are used instead of cosine similarity and the Pearson coefficient to calculate the similarity between items in recommendation. However, the existing similarity models cannot sufficiently express the preferences of users for different items. In this work, we propose a novel attention-based item collaborative filtering model (AICF) which adopts three different attention mechanisms to estimate the weights of historical items that users have interacted with.
Compared with the state-of-the-art recommendation models, the AICF model with the simple Self-Attention mechanism can better estimate the weight of historical items on non-sparse data sets. Because deep models can capture complex connections between items, our model with the more complex Transformer achieves superior recommendation performance on sparse data. Extensive experiments on ML-1M and Pinterest-20 show that the proposed model greatly outperforms other novel models in recommendation accuracy and provides users with a personalized recommendation list more in line with their interests.

Title: Direct feedback alignment scales to modern deep learning tasks and architectures Link: https://proceedings.neurips.cc/paper/2020/hash/69d1fc78dbda242c43ad6590368912d4-Abstract.html Year: 2020 Abstract: Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective, and study the applicability of Direct Feedback Alignment (DFA) to neural view synthesis, recommender systems, geometric learning, and natural language processing. In contrast with previous studies limited to computer vision tasks, our findings show that it successfully trains a large range of state-of-the-art deep learning architectures, with performance close to fine-tuned backpropagation. When a larger gap between DFA and backpropagation exists, like in Transformers, we attribute this to a need to rethink common practices for large and complex architectures. At variance with common beliefs, our work supports that challenging tasks can be tackled in the absence of weight transport.
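The Direct Feedback Alignment abstract above contrasts DFA with backpropagation's reliance on weight transport. As a rough illustration only (not the paper's code), the sketch below computes the hidden-layer error signal both ways for a single tanh hidden layer: backpropagation reuses the transpose of the forward output weights, whereas DFA projects the output error through a fixed feedback matrix that is never trained. The names `W_out`, `B`, `backprop_hidden_delta` and `dfa_hidden_delta`, the tanh activation and the toy numbers are all assumptions made for this example.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of matrix M."""
    return [list(col) for col in zip(*M)]


def tanh_prime(a):
    """Derivative of tanh evaluated at pre-activation a."""
    return 1.0 - math.tanh(a) ** 2


def backprop_hidden_delta(W_out, error, pre_act):
    # Backpropagation: the output error travels back through the
    # transpose of the forward weights ("weight transport").
    back = matvec(transpose(W_out), error)
    return [b * tanh_prime(a) for b, a in zip(back, pre_act)]


def dfa_hidden_delta(B, error, pre_act):
    # DFA: the output error is projected through a fixed feedback
    # matrix B that is independent of the forward weights.
    back = matvec(B, error)
    return [b * tanh_prime(a) for b, a in zip(back, pre_act)]


# Toy values (hypothetical): 2 hidden units, 1 output unit.
W_out = [[1.0, 2.0]]      # forward weights, shape (outputs, hidden)
B = [[-1.0], [3.0]]       # fixed feedback matrix, shape (hidden, outputs)
error = [0.5]             # output error signal
pre_act = [0.0, 0.0]      # hidden pre-activations (tanh' = 1 here)

bp_delta = backprop_hidden_delta(W_out, error, pre_act)
dfa_delta = dfa_hidden_delta(B, error, pre_act)
print(bp_delta, dfa_delta)
```

In a real setting `B` would be drawn at random once and kept fixed, and the same output error would be sent to every hidden layer in parallel; the sketch only shows where the two rules differ.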
Title: Contrastive learning for recommender system Link: https://arxiv.org/abs/2101.01317 Year: 2021 Abstract: Recommender systems, which analyze users' preference patterns to suggest potential targets, are indispensable in today's society. Collaborative Filtering (CF) is the most popular recommendation model. Specifically, Graph Neural Network (GNN) has become a new state-of-the-art for CF. In the GNN-based recommender system, message dropout is usually used to alleviate the selection bias in the user-item bipartite graph. However, message dropout might deteriorate the recommender system's performance due to the randomness of dropping out the outgoing messages based on the user-item bipartite graph. To solve this problem, we propose a graph contrastive learning module for a general recommender system that learns the embeddings in a self-supervised manner and reduces the randomness of message dropout. Besides, many recommender systems optimize models with pairwise ranking objectives, such as the Bayesian Pairwise Ranking (BPR) based on a negative sampling strategy. However, BPR has the following problems: suboptimal sampling and sample bias. We introduce a new debiased contrastive loss to solve these problems, which provides sufficient negative samples and applies a bias correction probability to alleviate the sample bias. We integrate the proposed framework, including the graph contrastive module and the debiased contrastive module, with several Matrix Factorization (MF) and GNN-based recommendation models. Experimental results on three public benchmarks demonstrate the effectiveness of our framework.

Title: KRED: Knowledge-aware document representation for news recommendations Link: https://dl.acm.org/doi/abs/10.1145/3383313.3412237 Year: 2020 Abstract: News articles usually contain knowledge entities such as celebrities or organizations. Important entities in articles carry key messages and help to understand the content in a more direct way.
An industrial news recommender system contains various key applications, such as personalized recommendation, item-to-item recommendation, news category classification, news popularity prediction and local news detection. We find that incorporating knowledge entities for better document understanding benefits these applications consistently. However, existing document understanding models either represent news articles without considering knowledge entities (e.g., BERT) or rely on a specific type of text encoding model (e.g., DKN) so that the generalization ability and efficiency are compromised. In this paper, we propose KRED, which is a fast and effective model to enhance arbitrary document representation with a knowledge graph. KRED first enriches entities’ embeddings by attentively aggregating information from their neighborhood in the knowledge graph. Then a context embedding layer is applied to annotate the dynamic context of different entities such as frequency, category and position. Finally, an information distillation layer aggregates the entity embeddings under the guidance of the original document representation and transforms the document vector into a new one. We advocate optimizing the model with a multi-task framework, so that different news recommendation applications can be united and useful information can be shared across different tasks. Experiments on a real-world Microsoft News dataset demonstrate that KRED greatly benefits a variety of news recommendation applications.
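The first KRED step described above, enriching an entity embedding by attentively aggregating its knowledge-graph neighbors, can be pictured with a small sketch. The abstract does not specify the scoring function or how the aggregate is combined with the entity itself, so the dot-product scores, the additive combination and the names `softmax`, `attention_weights` and `enrich_entity` below are assumptions for illustration, not KRED's actual layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_weights(entity, neighbors):
    # Assumed scoring: dot product between the entity and each neighbor.
    return softmax([dot(entity, n) for n in neighbors])


def enrich_entity(entity, neighbors):
    """Add an attention-weighted average of neighbor embeddings to the entity."""
    if not neighbors:
        # An entity with no neighbors keeps its original embedding.
        return list(entity)
    weights = attention_weights(entity, neighbors)
    aggregated = [
        sum(w * n[i] for w, n in zip(weights, neighbors))
        for i in range(len(entity))
    ]
    # Assumed combination: simple addition of entity and aggregate.
    return [e + a for e, a in zip(entity, aggregated)]


# Toy example (hypothetical 2-dimensional embeddings).
entity = [1.0, 0.0]
neighbors = [[1.0, 0.0], [1.0, 0.0]]
enriched = enrich_entity(entity, neighbors)
print(enriched)
```

With neighbors that differ, the neighbor more similar to the entity under the assumed dot-product score receives the larger weight.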
