Virtual Knowledge Studio (VKS)
Information Studies

What is Webometrics?
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK

1. Introduction
□ Webometrics is concerned with gathering data on and measuring aspects of the Web
  □ web sites
  □ web pages
  □ hyperlinks
  □ web search engine results
  □ YouTube video commenter networks
  □ MySpace Friend networks
□ …for very varied social science purposes

New problems: Web-based phenomena
□ Webometrics can be applied to understanding web-based phenomena
  □ Why do web sites interlink?
  □ Which web sites interlink?
  □ What interlinking patterns exist?
  □ What topics are frequently blogged about?

Old problems: Offline phenomena reflected online
□ Some offline phenomena have measurable online reflections
  □ International communication
  □ Inter-university collaboration
  □ University-business collaboration
  □ The impact or spread of ideas
  □ Public opinion

2. Examples
Blog searching - blogpulse.com

Example: Identifying and tracking public science concerns in blogs
□ Over 100,000 blogs and other sources tracked daily via RSS feeds
□ Objective: to identify and track public concerns about science
□ E.g., “Schiavo” identified and tracked as a potential public science concern

Example: The online impact of research groups (NetReAct)

Example: Links between EU universities
Figure: network diagram of links between countries (Austria, Switzerland, Belgium, Germany, France, Spain, NL, UK, Norway, Italy, Poland, Finland, Sweden), with a group labelled "Geopolitical connected".
Normalised linking, smallest countries removed

International biofuels research network

Example: MySpace age profiles

Percentage of profiles containing swearing

Group               Moderate   Strong   Very strong   Sample size
US males 16-19      10%        47%      2%            1,530
US females 16-19    11%        38%      2%            1,287
UK males 16-19      33%        33%      8%            171
UK females 16-19    18%        38%      3%            130

(typical sample size 20-148 for non-web swearing research)

Emphatic adverb/adjective OR adverbial booster OR premodifying intensifying negative adjective (36% of swearing)
□ and we r guna go to town again n make a ryt fuckin nyt of it again lol
□ see look i'm fucking commenting u back
□ lol and stop fucking tickleing me!!
□ Thanks for the party last night it was fucking good and you are great hosts.
□ That 50's rock and roll weekender was fucking mint!
□ Fuckin my space, my arse
□ 1/2 d ppl cudnt even speak fuckin english!
□ yeah so me and sarah broke up and everythings fucking shit

YouTube – Video poster ages

YouTube friend network

Online impact - Keywords in web pages mentioning IWRM

Data Gathering/Processing Tools
□ Blogpulse.com – blog network diagrams
□ LexiURL Searcher – links, web text, YouTube, Flickr, Technorati
□ Issue Crawler, Google TouchGraph links

Discussion points for online data
□ Validity – is the underlying meaning of the text/video/picture readily apparent to the researcher?
  □ Possibly not to any great degree for teenagers’ MySpace comments or very personal YouTube videos
□ Reliability – are search engines accurate/good at returning the correct results?
  □ Google blog search shows unreliability – very variable over time
  □ Researchers can triangulate across different but similar search engines, or over time, to test reliability
□ Coverage – to what extent are all the phenomena of interest covered by the source (e.g., search engine) used?
□ Sample bias – are certain types of people over-represented? (e.g., the more literate, the more vocal, the more politically active, youth, educated, creative types…)

Summary
□ The web contains a wide variety of interesting web and “web 2.0” content posted by many different people in many different formats
□ Webometric methods can give insights into this data

Books
□ Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
□ Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
□ http://lexiurl.wlv.ac.uk
□ http://webometrics.wlv.ac.uk
□ http://www.issuecrawler.net

Important considerations
□ Data accuracy
□ Data cleaning
□ Context to help interpret results
□ Report results carefully

Example: Analysis of the accuracy of search engine results
Live Search results analysis
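
One possible way to check the accuracy of search engine results is to repeat the same query over time (or across similar search engines) and measure how much the reported hit counts vary. The Python sketch below is only an illustration under stated assumptions, not the method used in the slides: the engine names, the hit counts, the 0.25 threshold, and the helper names `coefficient_of_variation` and `flag_unstable` are all hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Return the population standard deviation divided by the mean."""
    average = mean(counts)
    if average == 0:
        raise ValueError("mean hit count is zero; variation is undefined")
    return pstdev(counts) / average


def flag_unstable(hit_counts, threshold=0.25):
    """Return the sorted names of sources whose counts vary more than threshold.

    The threshold is an arbitrary illustrative cut-off, not a published standard.
    """
    return sorted(
        name
        for name, counts in hit_counts.items()
        if coefficient_of_variation(counts) > threshold
    )


# Hypothetical hit counts for one query repeated on four occasions.
hit_counts = {
    "engine_a": [1000, 1050, 980, 1020],
    "engine_b": [1000, 400, 1900, 700],
}

for name in flag_unstable(hit_counts):
    print(f"{name}: hit counts vary widely; treat results with caution")
```

A source flagged in this way is a candidate for triangulation against other sources before its counts are reported.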