Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference

Mitigating Power User Attacks on a User-Based Collaborative Recommender System

David C. Wilson and Carlos E. Seminario
University of North Carolina at Charlotte, Charlotte, North Carolina, USA
davils@uncc.edu, cseminar@uncc.edu

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Collaborative Filtering (CF) Recommender Systems (RSs) ease the burden of information overload faced by online users who browse, search, or shop for products and services. Influential users, known as "power users," are able to exert substantial influence over recommendations made to other users, and RS operators encourage the existence of power user communities to help fellow users make informed purchase decisions. However, the influence power users wield can be used for both positive (addressing the "new item" problem) and negative (attack) purposes. Attacks on RSs tend to bias recommendations by introducing fake reviews or ratings, and they remain a key problem area for system operators. In prior work, we have shown that attackers emulating power users are effective against user-based, item-based, and SVD-based CF RSs. Previous research has shown that, in general, attacks on RSs can be mitigated by detecting the attackers and either removing them from the dataset or ignoring them during the prediction calculations. In live RS environments, however, these approaches impact legitimate users detected as attackers (false positives) and can lead to reduced coverage for those legitimate power users as well as reduced accuracy for other users that depend on their ratings. Our research investigates alternative mitigation approaches to address these issues for power user attacks. We focus on techniques that remove or reduce the influence of power users, and we determine their impact on RS accuracy and robustness using established metrics. We introduce a new metric to assess the trade-off between accuracy and robustness when our mitigation approaches are applied. Our results show that, for user-based systems, reducing power user influence is more effective than removing power users from the dataset.

1 Introduction

Recommender Systems help users decide what products and services to buy from online providers. Influential "power" users, in the RS context, are those that are able to influence the largest group of RS users, i.e., power users have the ability to positively or negatively impact RS predictions for many other users. For white-hat purposes (e.g., addressing the "new item" problem), online systems encourage the formation of power user communities; for example, Amazon Vine invites trusted reviewers to post opinions about new and pre-release items (http://www.amazon.com/gp/vine/help). In previous work (Wilson and Seminario 2014; Seminario and Wilson 2014), we have shown that attackers emulating power users are effective against user-based, item-based, and SVD-based recommenders.

In the literature, mitigating RS attacks usually consists of detecting the attackers and either removing them from the dataset or ignoring them during the prediction calculations (Chirita, Nejdl, and Zamfir 2005; Mehta and Nejdl 2009). While removing attack user profiles from recommendation calculations is a straightforward way to eliminate the attacker's influence in a laboratory environment, in live RS environments this approach can have unwanted side effects (Mehta and Nejdl 2009). For instance, when a legitimate power user is mistakenly identified as an attacker (false positive) and removed, two issues arise: (1) the removed legitimate power user no longer receives recommendations, and (2) the users that rely on that legitimate power user's neighborhood influence may be impacted. These approaches also assume that all attackers will be detected, i.e., no provision is made for attackers that go undetected (false negatives).

This study investigates the potential for more effective mitigation approaches against Power User Attacks (PUAs), as compared to 100% removal of identified power users. PUA mitigation seeks to balance the trade-off between accuracy impacts (too many power user profiles are removed) and robustness impacts (too few power user profiles are removed). Our hypothesis is that reducing the influence of power users is a more effective and less impactful mitigation strategy than removing the profiles of identified power users. The following research questions guide this work:

RQ1: What happens to RS accuracy/robustness when power user profiles are removed from recommendation calculations to mitigate power user attack impacts?
RQ2: What happens to RS accuracy/robustness when power user influence is reduced during similarity calculations?
RQ3: What are the trade-offs between accuracy and robustness when power user attacks are mitigated?

2 Related Work

Power users are of particular interest to RS operators and their client companies when launching a new item, because a positive endorsement (high rating) can result in item recommendations being made to many other users. This "market-based" use of RSs has previously been promoted as a solution to the "cold-start" or "new item" problem (Anand and Griffiths 2011). A viral marketing perspective that exploits the network value of customers was studied in (Domingos and Richardson 2001).

Attacks on RSs by providing false ratings are known as "shilling attacks" (Lam and Riedl 2004) or "profile injection attacks" (Mobasher et al. 2007b; O'Mahony, Hurley, and Silvestre 2005). Research on attacks against recommender systems began in 2002 (O'Mahony, Hurley, and Silvestre 2002); a recent summary (Burke, O'Mahony, and Hurley 2011) describes RS attack models, attack detection, and algorithm robustness. In (Wilson and Seminario 2013), we defined a novel Power User Attack (PUA) model as a set of power user profiles with biased ratings that influence the results presented to other users. The PUA relies critically on the method of power user identification/selection, so we also developed and evaluated a novel use of degree centrality concepts from social network analysis for identifying influential RS power users for attack purposes (Wilson and Seminario 2013). In (Wilson and Seminario 2014; Seminario and Wilson 2014), we showed that attackers emulating power users are effective against user-based, item-based, and SVD-based recommenders.

Previous research has shown that, in general, attacks on RSs can be mitigated by detecting the attackers (Mobasher et al. 2007a; Williams, Mobasher, and Burke 2007; Sandvig, Mobasher, and Burke 2008; Burke, O'Mahony, and Hurley 2011) and either removing them from the dataset or ignoring them during the prediction calculations (Chirita, Nejdl, and Zamfir 2005; Mehta and Nejdl 2009). However, removing attack user profiles from recommendation calculations can also have unwanted side effects (Mehta and Nejdl 2009) that reduce accuracy and coverage. A gap in the research is that attack mitigation strategies preserving adequate accuracy, coverage, and robustness have largely been ignored; this remains an open question in RS attack research that we continue to explore in this study.

3 Power User Attack Background

In order to study RS attacks based explicitly on measures of influence, we previously defined a Power User Attack model as a set of power user profiles with biased ratings that influence the results presented to other users (Wilson and Seminario 2013). The PUA consists of one or more user profiles containing item ratings (called attack user profiles) that "push" (promote) or "nuke" (disparage) a specific item. The PUA relies critically on the method of power user identification/selection, so we implemented a set of heuristic approaches for comparative purposes, as follows:

In-Degree Centrality (InDegree): Our approach based on in-degree centrality, where power users are those that participate in the largest number of neighborhoods (Wasserman and Faust 1994; Lathia, Hailes, and Capra 2008). In our implementation, for each user i we compute similarity with every other user j, applying significance weighting nc_ij/50, where nc_ij is the number of co-rated items and the value of 50 items was determined empirically by (Herlocker et al. 1999) to optimize RS accuracy. Next, we discard all but the top-k neighbors for each user i. Finally, we count the number of similarity scores retained for each user j (the number of neighborhoods user j appears in) and select the top-k users j.

Aggregated Similarity (AggSim): The Most Central heuristic from (Goyal and Lakshmanan 2012) selects the top-k users with the highest aggregate similarity scores as the set of power users. This method requires at least 5 co-rated items between user i and user j and does not use significance weighting (based on personal communication with the authors).

Number of Ratings (NumRatings): This method is based on (Herlocker et al. 2004), where "power user" refers to users with the highest number of ratings; it is also called the Most Active heuristic in (Goyal and Lakshmanan 2012). We select the top-k users based on the total number of ratings in their user profiles.

Other parameters such as recommender algorithms, datasets, and metrics are specified in Section 5.

4 Mitigation Strategies

Removing 100% of the power user attackers as a mitigation strategy could result in (1) reduced coverage for the removed users, including legitimate users (false positives); (2) reduced accuracy for users whose similarity neighborhoods no longer benefit from the influence of the removed users, including legitimate users (false positives); and (3) no provision for attackers that go undetected (false negatives), since it assumes that all (true) power user attackers will be detected. To address these issues, the following mitigation strategies were initially evaluated in this study:

• Remove attackers incrementally from 0% to 100%.
• Reduce the similarity weighting factor of all attackers incrementally from 1.0 to 0.0.
• Combine removal and influence reduction.

Our analysis of these initial mitigation strategies determined the following. When removing power user attackers incrementally from the dataset, removal sequence matters. From the attacker's standpoint, it is better for removal to proceed from least influential to most influential, while from the system operator's standpoint, removing from most influential to least influential is better. We also analyzed the impacts of removing power user attackers randomly. Since this is a mitigation study, we chose a removal sequence that favors system operators: our analysis indicated that removing power users from most influential to least influential improves robustness at a faster rate than the other two methods.

When mitigating the PUA, the type of target item matters. We used "New" target items (those with one rating) and "New and Established" target items (those with one or more ratings). From previous research (Seminario and Wilson 2014), we knew that New target items are more vulnerable to attack than New and Established targets. We analyzed the impacts of the PUA and found that, for New targets, robustness metrics were relatively high until the weighting was set to zero (ignore power user influence).
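The three power user selection heuristics described in § 3 can be sketched in a few lines of code. The following is a minimal illustration only, assuming ratings are held in a dict mapping user to {item: rating}; the function names, parameter defaults, and data layout are ours, not the authors' implementation.

```python
from collections import Counter, defaultdict

def pearson(u, v, ratings):
    """Pearson correlation over two users' co-rated items; also returns
    the number of co-rated items, which the heuristics below need."""
    common = set(ratings[u]) & set(ratings[v])
    n = len(common)
    if n < 2:
        return 0.0, n
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / n, sum(rv) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    du = sum((a - mu) ** 2 for a in ru) ** 0.5
    dv = sum((b - mv) ** 2 for b in rv) ** 0.5
    if du == 0 or dv == 0:
        return 0.0, n
    return num / (du * dv), n

def in_degree_power_users(ratings, k=10, top_neighbors=50):
    """InDegree: users appearing in the most top-k neighbor lists,
    with similarity damped by significance weighting min(nc, 50)/50."""
    counts = Counter()
    for u in ratings:
        sims = []
        for v in ratings:
            if v == u:
                continue
            s, nc = pearson(u, v, ratings)
            sims.append((s * min(nc, 50) / 50.0, v))
        for _, v in sorted(sims, reverse=True)[:top_neighbors]:
            counts[v] += 1          # v is in u's neighborhood
    return [v for v, _ in counts.most_common(k)]

def num_ratings_power_users(ratings, k=10):
    """NumRatings (Most Active): users with the most ratings."""
    return sorted(ratings, key=lambda u: len(ratings[u]), reverse=True)[:k]

def agg_sim_power_users(ratings, k=10, min_corated=5):
    """AggSim (Most Central): highest summed similarity, requiring at
    least 5 co-rated items and no significance weighting."""
    totals = defaultdict(float)
    for u in ratings:
        for v in ratings:
            if v == u:
                continue
            s, nc = pearson(u, v, ratings)
            if nc >= min_corated:
                totals[v] += s
    return sorted(totals, key=totals.get, reverse=True)[:k]
```

Note that all three heuristics rank the same user population but need not agree on the top-k; that divergence is what the ID/NR/AS comparisons in the experiments probe.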
For New and Established targets, we found that robustness measures were significantly lower for weightings between 1.0 and 0.1, indicating that the PUA was not as effective with these target items.

Combining power user attacker removal and influence reduction produced outcomes similar to the removal approach whenever the similarity weighting was greater than zero; robustness reached its minimum only when the similarity weighting was zero. So this combined approach did not provide additional information regarding power user attack mitigation.

In our initial approach, all injected attackers were also considered to be power users (even when they were not), so removing and/or reducing the influence of power users assumed a perfect power user attacker detection method. This was not a very realistic assumption, so we decided to use our power user selection methods (§ 3) to allow for a mix of real and synthetic power user attackers. Consequently, the final mitigation strategies (MS) for this study are:

MS1: Remove selected power users incrementally from 0% to 100%, starting from most influential to least influential.
MS2: Reduce the similarity weighting factor of all selected power users incrementally from 1.0 to 0.0.
MS3: Reduce the number of power users that influence predictions. The percentage of power users selected is reduced incrementally from 100% to 0%, and the similarity weighting is set to one if selected, zero otherwise.

To implement these mitigation strategies, the following methodology was used:

1. Generate power user lists from the selected datasets using the power user selection techniques from Section 3: InDegree, NumRatings, and AggSim. This produces a list of real power users (RPUs).
2. Generate synthetic power user (SPU) attack profiles based on power user statistical characteristics (Wilson and Seminario 2014) and insert them into the dataset. Select power users from the updated dataset using the selection techniques described in Section 3. The top-k list of power users is expected to be a combination of RPUs and SPUs.
3. Select target items from a given dataset: New (items with one rating), and New and Established (randomly selected items with a range of popularity and likeability values).
4. Create incremental datasets with most-to-least-influential power users removed from the top of the top-k list.
5. Execute attacks for each mitigation strategy on the selected target items and calculate metrics averaged over all target items. Only SPUs are used for attack purposes, leaving RPUs to provide their influence without being part of the attack. Note that some SPU attack profiles will remain in the dataset after the top-k power users are removed during the experiments described in Section 5.
6. Compare accuracy, coverage, and robustness metrics across variations of each mitigation strategy to determine the impacts of removing power users and of reducing their influence.

5 Experimental Design

To address our research questions and hypothesis, we conducted three main experiments corresponding to the three final mitigation strategies (MS1-MS3) described in § 4:

• E1: Power User Removal
• E2: Power User Influence Reduction: All power users
• E3: Power User Influence Reduction: Selected power users

Evaluation Metrics: Evaluations were performed before and after the attacks. We use Mean Absolute Error (MAE) for accuracy, along with prediction coverage (Herlocker et al. 2004; Shani and Gunawardana 2011), computed on a holdout-partitioned 70/30 train/test dataset. To compare MAE before and after attacks, we use deltaMAE = MAE_after - MAE_before. For robustness (Mobasher et al. 2007b; Burke, O'Mahony, and Hurley 2011), we use Hit Ratio (HR), Prediction Shift (PS), and Rank (R), together with their averages over all target items (Average HR, Average PS, Average R). A high Hit Ratio and a low Rank after the attack indicate that the attack was successful (from the attacker's standpoint), assuming that the target item had a lower Hit Ratio and higher Rank before the attack. The top-N list of recommendations for Hit Ratio calculations uses N = 40. When the PUA being evaluated uses New target items (items with one rating), the Prediction Shift is expected to be close to the maximum rating defined by the RS. For New and Established target items, the Prediction Shift may also be high, because some of the SPUs may fall below the threshold at which power users are removed or have their influence reduced; the SPUs that remain after removal or influence reduction are still used in the attack and may contribute to the high Prediction Shift.

Finally, to assist in assessing the effectiveness of the mitigation strategies and the trade-offs between accuracy and robustness, we developed the new Accuracy/Robustness/Mitigation (ARM) measure:

ARM = (2 * (MAE_after * HR_avg) / (MAE_after + HR_avg)) * (1 - rho),

where HR_avg is the Average Hit Ratio after the attack and rho is the percentage of power users or influence being evaluated. ARM varies between 0 and 1, and a higher ARM indicates a more effective mitigation for a given experiment. The major motivation behind the ARM metric is to find a measure that determines the level of removal or influence reduction that best mitigates the PUA. The ARM metric combines MAE and Hit Ratio in such a way that it balances the increase in MAE against the reduction in Hit Ratio as power user influence is removed or reduced.

Datasets and Algorithms: We used the MovieLens ML100K (nominally 100,000 ratings, 1,682 movies, 943 users) and ML1M (nominally 1,000,209 ratings, 3,883 movies, 6,040 users) datasets (http://www.grouplens.org). The CF user-based weighted algorithm (UBW) (Desrosiers and Karypis 2011) uses Pearson similarity with a threshold of 0.0 (positive correlation), a neighborhood size of 50, and significance
weighting of n/50, where n is the number of co-rated items (Herlocker et al. 1999). We used the UBW implementation from Apache Mahout (http://mahout.apache.org) and added functionality to implement the MS2 and MS3 strategies (§ 4).

Power User Selection: The InDegree (ID), NumRatings (NR), and AggSim (AS) methods described in § 3 were used.

Target Item Selection: Fifty target items with no more than one rating, regardless of rating value, were selected randomly as "New" (New) target items. We also used 50 "New and Established" (New/Est) target items, selected randomly, with the following average number of ratings, average rating, and average rating entropy, respectively: ML100K (73.780, 3.133, 1.769); ML1M (280.399, 3.296, 1.883).

Attack Parameter Selection: The attack intent is Push, i.e., the target item rating is set to the maximum (= 5). The attack size, i.e., the number of SPUs in each attack, varied by dataset: up to 50 for ML100K and up to 300 for ML1M. Attack sizes, also expressed as (#attackers/#users * 100)%, were selected based on previous research (Mobasher et al. 2007b; Burke, O'Mahony, and Hurley 2011), where a 5-10% attack size was shown to be effective; we use a 5% attack size for each dataset. Power user attack profiles were generated as described in (Wilson and Seminario 2014), and target item ratings were injected at run time.

Test Variations: We used three experiments to evaluate the final set of mitigation strategies, with one prediction algorithm, two datasets, three power user selection methods, two target item types, and eight attack sizes.

6 Results and Discussion

(E1) Power User Removal: This experiment consisted of removing power users from the dataset (incrementally from 0% to 100%) prior to the recommendation calculations (similarity and prediction). We conducted a series of PUAs against the user-based CF algorithm. Each PUA in this experiment uses a dataset with a specified number of SPU attackers (§ 5). The SPU profiles are injected with New or New/Est target items at runtime; the 50 targets are evaluated one at a time for each PUA, and the HR/Rank/PS metrics are then averaged across all 50 target items. We use the most-to-least-influential SPU removal approach since it better mitigates attack effectiveness from a system operator's perspective, i.e., HR drops off faster when the most influential power users are removed first.

HR before the attack is 0% for New target items and 2% for New/Est target items for ML100K across all power user removal levels; these values serve as the HR baseline and indicate that, without attackers, the target items appear in few, if any, top-N recommendation lists. Figure 1 shows the results for ML100K as the percentage of power users removed increases from 0% to 100% (0-50 power users).

Figure 1: E1 – Hit Ratio and MAE as 0% to 100% of Power Users (Real & Synthetic) are removed using ML100K

The drop in HR is modest as power users are removed because some SPU attackers remain in the dataset, i.e., they were below the specified power user selection threshold and contributed to keeping the HR high. New/Est target items (not shown) are more difficult to attack than New targets and have a lower HR, starting at 50% for ID and NR and 4% for AS. Removing 100% of the power users still leaves SPU attackers in the dataset, hence HR remains high; the PUA with AS is not effective at any level of removal. As power users are removed, MAE varies between 0.80 and 0.83 for New and New/Est target items across all power user selection methods and all removal levels. Average PS for ML100K and New target items was 4.9 for all power user selection methods across all removal levels; for New/Est target items, average PS was 4.5 for InDegree and NumRatings across all removal levels, and 3.0 for AggSim. Average Rank for ML100K and New target items varied between 3 and 4 for all power user selection methods across all removal levels; for New/Est target items, it varied between 13 and 15 across all selection methods and removal levels, indicating a less effective attack. We observed similar results with ML1M, except that for ID, HR increased slightly as power users were removed, most likely due to the influence characteristics of the SPUs. Average Rank for New target items varied between 7 and 9, and for New/Est target items between 16 and 21, for all power user selection methods across all removal levels. The ARM measure indicated that 100% removal is the best mitigation for all attacks in E1.

(E2) Influence Reduction, all Power Users: This experiment consisted of varying the similarity weighting (incrementally from 0.0 to 1.0) applied to all power users (selected RPUs and SPUs) who are nearest neighbors during the prediction calculation. Each PUA in this experiment uses a dataset with the same number of power users, i.e., there is no removal of power users in this experiment. HR before the attack is 0% for New target items and 1% for New/Est target items for ML1M across all power user influence reduction levels.

Figure 2: E2 – Hit Ratio and MAE as Power Users' (Real & Synthetic) Influence is reduced from 1.0 to 0.0 using ML1M

New target item results in Figure 2 for ML1M indicate that as the similarity weighting is reduced from 1.0 to 0.1, HR remains flat for ID (81%), NR (74%), and AS (4%). When the similarity weighting is set to zero, HR drops significantly for ID (to 16%) and NR (to 9%), while remaining flat for AS (4%). The flat HR for ID and NR over weightings from 1.0 to 0.1 can be attributed to the fact that the SPUs are, in most cases, the only users that have rated the New target items; the SPUs therefore dominate the influence within the neighborhoods, keeping HR high and Rank low. The influence of power users (a mix of RPUs and some SPUs) can be observed in the significantly higher MAE (lower accuracy) when the similarity weighting is set to zero, i.e., without the power user influence, accuracy becomes much worse. The non-zero HR at zero similarity weighting indicates that not all attackers (SPUs) have been removed from the prediction calculations. Average PS for ML1M and New target items was >4, and average Rank ranged between 9 and 14, for all power user selection methods across all reduction levels.

For New/Est targets (not shown), HR begins at a much lower level (40% for ID and NR, 4% for AS) and remains flat until the similarity weighting drops to 0.0, mainly because SPU influence is not very dominant within the neighborhoods, as many other users have rated established items. As the similarity weighting is reduced from 1.0 to 0.1, MAE increases for ID and NR and remains flat for AS; at 0.0 similarity weighting, MAE also rises significantly. Average PS was >3.8 for New/Est target items, and average Rank ranged between 14 and 22, over all similarity weightings and selection methods. We observed similar results with ML100K, except that HR starts slightly higher (90%) and average Rank ranges lower (4-6 for New targets, 15-17 for New/Est targets) for ID and NR. For all attacks in E2, the ARM measure indicated that a similarity weighting of 0.1 is the best mitigation for ID and NR, avoiding the spike in MAE_after albeit with a high HR, and 0.0 for AS.

(E3) Influence Reduction, selected Power Users: There is no power user removal in this experiment; instead, only a percentage (varied from 0% to 100%) of power users is involved in the prediction calculation. The power users are selected randomly and receive a similarity weighting of 1.0 if selected and 0.0 otherwise during the prediction calculation. HR before the attack (not shown) is 0% for New target items and 1% for New/Est target items for ML1M across all power user influence reduction levels.

Figure 3: E3 – Hit Ratio and MAE as 100% to 0% of Power Users' (Real and Synthetic) Influence is applied using ML1M

For ML1M with New target items, shown in Figure 3, as the percentage of power users is reduced from 100% to 10%, HR decreases from 80% to 20% for both ID and NR; for AS, HR remains flat at 4%. When the percentage of power users reaches zero, HR goes to 15% and 10% for ID and NR, respectively, with no change for AS. As the percentage of power users is reduced from 100% to 10%, MAE increases for ID and NR and remains flat (0.79) for AS. Average PS for ML1M and New target items was >4 for all power user selection methods and power user percentages, and average Rank ranged between 9 and 17 over all power user percentages and selection methods. For ML1M with New/Est targets (not shown), we observed a similar set of results, except that HR begins at a lower level (40% for ID and NR, 3% for AS), average PS is >3.8, and average Rank ranges between 13 and 25 for all power user percentages and selection methods. We observed very similar results with ML100K, except that HR starts slightly higher (90%) and average Rank ranges lower (3-6 for New targets, 13-18 for New/Est targets) for ID and NR.

Figure 4: E3 – ARM Measure as 100% to 0% of Power Users' (Real and Synthetic) Influence is applied using ML1M

For New target items, the ARM measure (see Figure 4) indicated that a power user reduction of 20-40% is the best mitigation for ID and NR (avoiding the larger values of MAE_after), and 0% for AS. ARM results were the same for New/Est target items (not shown).

Based on our results and using the ARM metric, the mitigation strategy that best balances accuracy and robustness for the ID and NR PUAs is MS3; the AS PUA was not an effective attack in this study and did not require mitigation. For MS1, the ARM metric indicates 100% removal, which still leaves a very high Hit Ratio. MS2 is marginally better than MS1, with the ARM metric indicating a similarity weighting of 0.1. Our hypothesis is accepted for MS3, partially accepted for MS2, and rejected for MS1.

7 Conclusion

This paper evaluated power user attack mitigation approaches to address issues encountered when legitimate influential users (false positives) are removed along with attackers. We have shown that reducing similarity weighting during the prediction calculation is an improvement over removal. We showed that there is a trade-off between accuracy (MAE) and robustness (Hit Ratio) when implementing mitigation strategies, and we developed a metric to assist in evaluating this trade-off. Consistent with our previous work using user-based recommenders, we also showed that reducing the influence of power users contributes to a reduction in recommender system accuracy, indicated by an increase in MAE; this illustrates how power users can impact recommendations. Our future work in this area will examine other recommender system algorithms and datasets.

References

Anand, S. S., and Griffiths, N. 2011. A market-based approach to address the new item problem. In Proceedings of the ACM Recommender Systems Conference.
Burke, R.; O'Mahony, M. P.; and Hurley, N. J. 2011. Robust collaborative recommendation. In Ricci, F., et al., eds., Recommender Systems Handbook. Springer.
Chirita, P.-A.; Nejdl, W.; and Zamfir, C. 2005. Preventing shilling attacks in online recommender systems. In WIDM '05: Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management. ACM.
Desrosiers, C., and Karypis, G. 2011. A comprehensive survey of neighborhood-based recommendation methods. In Ricci, F.; Rokach, L.; Shapira, B.; and Kantor, P. B., eds., Recommender Systems Handbook. Springer.
Domingos, P., and Richardson, M. 2001. Mining the network value of customers. In Proceedings of KDD '01. ACM.
Goyal, A., and Lakshmanan, L. V. S. 2012. RecMax: Exploiting recommender systems for fun and profit. In Proceedings of KDD '12. ACM.
Herlocker, J. L.; Konstan, J. A.; Borchers, A.; and Riedl, J. 1999. An algorithmic framework for performing collaborative filtering. In Proceedings of the ACM SIGIR Conference.
Herlocker, J. L.; Konstan, J. A.; Terveen, L. G.; and Riedl, J. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems.
Lam, S. K., and Riedl, J. 2004. Shilling recommender systems for fun and profit. In Proceedings of the 13th International Conference on World Wide Web. ACM.
Lathia, N.; Hailes, S.; and Capra, L. 2008. kNN CF: A temporal social network. In Proceedings of the 2nd ACM Conference on Recommender Systems (RecSys '08).
Mehta, B., and Nejdl, W. 2009. Unsupervised strategies for shilling detection and robust collaborative filtering. User Modeling and User-Adapted Interaction 19(1-2).
Mobasher, B.; Burke, R.; Bhaumik, R.; and Sandvig, J. 2007a. Attacks and remedies in collaborative recommendation. IEEE Intelligent Systems 22(3):56-63.
Mobasher, B.; Burke, R.; Bhaumik, R.; and Williams, C. 2007b. Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness. ACM Transactions on Internet Technology.
O'Mahony, M. P.; Hurley, N.; and Silvestre, G. C. M. 2002. Promoting recommendations: An attack on collaborative filtering. In Proceedings of DEXA '02.
O'Mahony, M. P.; Hurley, N.; and Silvestre, G. C. M. 2005. Recommender systems: Attack types and strategies. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-05).
Sandvig, J.; Mobasher, B.; and Burke, R. 2008. A survey of collaborative recommendation and the robustness of model-based algorithms. IEEE Data Engineering Bulletin.
Seminario, C. E., and Wilson, D. C. 2014. Attacking item-based recommender systems with power items. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). ACM.
Shani, G., and Gunawardana, A. 2011. Evaluating recommendation systems. In Ricci, F., et al., eds., Recommender Systems Handbook. Springer.
Wasserman, S., and Faust, K. 1994. Social Network Analysis: Methods and Applications. New York, NY: Cambridge University Press.
Williams, C.; Mobasher, B.; and Burke, R. D. 2007. Defending recommender systems: Detection of profile injection attacks. Service Oriented Computing and Applications.
Wilson, D. C., and Seminario, C. E. 2013. When power users attack: Assessing impacts in collaborative recommender systems. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys '13). ACM.
Wilson, D. C., and Seminario, C. E. 2014. Evil twins: Modeling power users in attacks on recommender systems. In Proceedings of the 22nd Conference on User Modelling, Adaptation and Personalization.
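Appendix: as a concrete illustration, the ARM measure defined in Section 5 can be computed as follows. This is a minimal sketch that follows the formula as printed; the function name, variable names, and all numeric inputs below are our own illustrative assumptions, not values from the experiments.

```python
def arm(mae_after, hr_avg, rho):
    """Accuracy/Robustness/Mitigation (ARM) measure from Section 5:
    a harmonic-mean-style combination of post-attack MAE and average
    Hit Ratio, scaled by (1 - rho), where rho is the fraction of power
    users (or influence) being evaluated at this mitigation level."""
    if mae_after + hr_avg == 0.0:
        return 0.0
    return 2.0 * (mae_after * hr_avg) / (mae_after + hr_avg) * (1.0 - rho)

# Illustrative comparison across mitigation levels (inputs are made up):
for rho, mae, hr in [(0.0, 0.80, 0.81), (0.4, 0.81, 0.45), (1.0, 0.83, 0.16)]:
    print(f"rho={rho:.1f}  ARM={arm(mae, hr, rho):.3f}")
```

In a sweep like the one sketched above, the mitigation level with the highest ARM would be selected, mirroring how the paper uses ARM to pick the best removal or influence-reduction setting per experiment.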