Assessing Trustworthiness in Collaborative Environments
Jeffrey Segall, Michael Jay Mayhew, Michael Atighetchi, Rachel Greenstadt
Jeffrey Segall: Drexel University, js572@drexel.edu
Michael Jay Mayhew: US Air Force Research Laboratory, Michael.Mayhew@rl.af.mil
METHODOLOGY
FEATURE EXTRACTION
• Collect pertinent classification data from large Wikipedia XML data dump and store only
relevant information in a SQL database
• Relational database allows for ease of searching and extraction into training and test sets
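As a sketch of this extraction step, assuming Python with only the standard library: the `<page>`/`<revision>` nesting follows the Wikipedia XML export format, but the miniature sample and the single-table layout here are illustrative, not the authors' actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical miniature of a Wikipedia XML export; real dumps use the same
# <page>/<revision> nesting but are far larger and namespaced.
SAMPLE_XML = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <id>1</id>
      <timestamp>2012-01-01T00:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>Hello world</text>
    </revision>
  </page>
</mediawiki>"""

def load_edits(xml_text, conn):
    """Keep only the fields needed for classification in a relational table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS edits
                    (page TEXT, rev_id INTEGER, ts TEXT, user TEXT, length INTEGER)""")
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title")
        for rev in page.iter("revision"):
            conn.execute("INSERT INTO edits VALUES (?, ?, ?, ?, ?)",
                         (title,
                          int(rev.findtext("id")),
                          rev.findtext("timestamp"),
                          rev.findtext("contributor/username"),
                          len(rev.findtext("text") or "")))
    conn.commit()

conn = sqlite3.connect(":memory:")
load_edits(SAMPLE_XML, conn)
print(conn.execute("SELECT page, user, length FROM edits").fetchall())
```

Once in SQL, training and test sets become simple `SELECT` queries rather than repeated passes over the multi-gigabyte dump.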
GROUND TRUTH
• To test the classification model’s accuracy, it is necessary to determine the true revert status of
each edit
• Edits with the same text hash and content length are considered equivalent
– In chronological order, edits in between two equivalent edits can be considered reverted
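The ground-truth rule above can be sketched as follows; MD5 is used purely as a stand-in for whatever content hash the actual pipeline computes, and the helper name is hypothetical.

```python
import hashlib

def mark_reverted(edits):
    """edits: chronological list of revision texts for one page.
    Returns a parallel list of booleans: True if the edit was reverted.
    Two edits are equivalent when their content hash and length match;
    every edit strictly between two equivalent edits counts as reverted."""
    keys = [(hashlib.md5(t.encode()).hexdigest(), len(t)) for t in edits]
    reverted = [False] * len(edits)
    seen = {}  # key -> index of the most recent edit with that key
    for i, k in enumerate(keys):
        if k in seen:
            for j in range(seen[k] + 1, i):
                reverted[j] = True
        seen[k] = i
    return reverted

history = ["base text", "base text + vandalism", "base text"]
print(mark_reverted(history))  # → [False, True, False]
```

Comparing the length as well as the hash guards against the (unlikely) case of a hash collision between different revisions.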
DATASET CREATION
• Sampling of the full edit histories of 10,000 randomly selected pages out of a total of 237,000
– Results in approximately 250,000 data instances
CLASSIFICATION
• Training and testing with a support vector machine using SMO
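A minimal sketch of this step using scikit-learn's `SVC`, whose libsvm solver is an SMO variant comparable to the classifier named above; the feature matrix here is a synthetic stand-in, not the poster's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the edit feature matrix (the real features include
# edit_count, revert_percentage, delta_time, delta_length, and so on).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy label: 1 = reverted

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)  # SMO-style quadratic-program solver
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

The same train/test split discipline applies to the real data: the held-out set must contain edits the model never saw during training.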
ABSTRACT

Collaborative environments, particularly those built around information creation and exchange, increasingly demand notions of trust and accountability. In the absence of an explicit authority, the quality of information is often unknown. Using Wikipedia edit sequences as a use case, we detail experiments in determining community-based user and document trust. Our results show success in answering the first of many research questions: given a user's edit history, does a given edit to a document positively contribute to its content? We describe how the ability to answer this question provides a preliminary framework for a better model of collaborative trust, and we discuss the further research needed to broaden its utility and scope.
GOALS

Assess trustworthiness in collaborative document-editing environments:
• Should a page's status be revised?
• Should revision R to content C be allowed?
COLLABORATIVE TRUST
Consensus-based Model
• Communities following their own model for assigning trust
• Wikipedia’s award culture has evolved as a method of
recognizing and motivating significant contributions
Authoritative Model
• Roots of trust are defined by a hierarchically-structured
graph of authority
• Trustworthy users create trustworthy content.
• Trustworthy content is created by trustworthy users.
• Untrustworthy users create untrustworthy content.
• Untrustworthy content is created by untrustworthy users.
Vandalism Detection
• Vandalism is only a subset of reverted edits
– Information may be incorrect or unusable without the
malicious intent implied by vandalism
– Such incorrect information may not be detected by
current systems if it follows patterns of constructive
behavior
NEXT STEPS

Category Information
• Each page on Wikipedia can pertain to up to two categories
• Category hierarchies can determine user knowledge areas
– Reverted edits on a certain topic may not relate to
those on another topic
– Create a notion of field expertise, or lack thereof
Future Experiments
• Will a page gain or lose award status?
• Is a user likely to be banned?
• Should a page be locked for editing?
• Full English Wikipedia as a platform
• More pages, data, edits, controversy
Rachel Greenstadt: Drexel University, greenie@drexel.edu
Michael Atighetchi: Raytheon BBN Technologies, matighet@bbn.com

• Documents — Wikipedia pages
• Users — Wikipedia editors

User Features
admin: Is the editing user a Wikipedia administrator?
bureaucrat: Is the editing user a Wikipedia bureaucrat?
bot: Is the editing user on the list of approved bots?
edit_count: The number of edits made by the editing user
revert_count: The number of reverted edits made by the editing user
revert_percentage: The percentage of edits made that were reverted
Page Features

is_good: Was the edit made to a page with the “Good Page” award?
is_vgood: Was the edit made to a page with the “Very Good Page” award?
Edit Features
delta_time: The time (in seconds) since the previous update was made to the page
content_length: The length of the page content
delta_length: The change in page content from the last update
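The features above might be assembled into one vector per edit roughly as follows; the record values and the helper function are hypothetical, with field names mirroring the tables above.

```python
# Hypothetical per-edit record; field names mirror the feature tables above.
edit = {
    "admin": False, "bureaucrat": False, "bot": False,
    "edit_count": 250, "revert_count": 10,
    "is_good": True, "is_vgood": False,
    "delta_time": 3600, "content_length": 12000, "delta_length": -45,
}

def feature_vector(e):
    """Flatten one edit record into a numeric vector for the classifier."""
    # revert_percentage is derived from the two count features.
    revert_percentage = 100.0 * e["revert_count"] / max(e["edit_count"], 1)
    return [int(e["admin"]), int(e["bureaucrat"]), int(e["bot"]),
            e["edit_count"], e["revert_count"], revert_percentage,
            int(e["is_good"]), int(e["is_vgood"]),
            e["delta_time"], e["content_length"], e["delta_length"]]

print(feature_vector(edit))  # → [0, 0, 0, 250, 10, 4.0, 1, 0, 3600, 12000, -45]
```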
RESULTS
TIMING
• Approximately 2.5 hours for training on approximately 250,000 instances
• Approximately 1.27 microseconds per classification test instance
ACCURACY
• 97% correct classification rate
• 0.8% false positive rate

Correctly Classified Instances: 96.9983%
Incorrectly Classified Instances: 3.0017%
Mean Absolute Error: 0.03
Root Mean Squared Error: 0.1733
Class       TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
Reverted    0.715    0.008    0.891      0.715   0.794      0.854
Non-Revert  0.992    0.285    0.975      0.992   0.984      0.854
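As a sanity check, the reported per-class F-measures are consistent (to within rounding of the three-decimal table values) with the precision and recall columns via F = 2PR / (P + R):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values from the results table, which reports three rounded decimals.
assert abs(f_measure(0.891, 0.715) - 0.794) < 2e-3  # Reverted class
assert abs(f_measure(0.975, 0.992) - 0.984) < 2e-3  # Non-Revert class
print("F-measures consistent with reported precision/recall")
```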
ACKNOWLEDGEMENTS

This work was sponsored by the US Air Force Research Laboratory (AFRL). The authors would like to thank John Benner of Booz Allen Hamilton, Jonathan Webb of BBN, and Joseph Muoio and Pavan Kantharaju of Drexel University for their research contributions.
Distribution A. Approved for public release; distribution
unlimited (88ABW-2012-4350, 07 Aug. 2012).