EASTERN MICHIGAN UNIVERSITY
DIVISION OF ACADEMIC AFFAIRS
REQUEST FOR NEW COURSE

DEPARTMENT/SCHOOL: COMPUTER SCIENCE    COLLEGE: ARTS AND SCIENCES
CONTACT PERSON: WILLIAM MCMILLAN
CONTACT PHONE: 7-1063    CONTACT EMAIL: WMCMILLAN@EMICH.EDU
REQUESTED START DATE: TERM: FALL    YEAR: 2012

A. Rationale/Justification for the Course

In a September 15, 2008 piece in Scientific American, Nigel Shadbolt and Tim Berners-Lee wrote, “…the Web is more than the sum of its pages. Vast emergent properties have arisen that are transforming society. … A new branch of science—Web science—aims to address [these phenomena].”[1]

The World Wide Web, the tools used to interact with it, its many user communities, its vast and complex data stores, and its relationships with all systems that characterize modern civilization collectively define a natural system that is increasingly the subject of empirical scientific study. Just as biological life forms have evolved into complex systems deserving of scientific study, this Internet-based amalgamation has evolved to be an important subsystem of the natural environment. No single collection of people has designed it explicitly or can derive its future behavior through pure logic. It is neither a purely technological system nor a purely human system. It is a system composed of many interacting heterogeneous subsystems, similar in many ways to geologic or biologic systems.

This course samples from the many empirical research techniques that have been employed in computer science to give a general education student experience in carrying out laboratory studies in the field and an introductory theoretical view of the Web as a natural system. The theoretical foundations for the course, employed across topics, can be grouped into these major categories:

• Principles of Computational Thinking[2]: algorithms, distributed computing, semantic modeling, heuristic search, pattern matching.
• Complex and chaotic systems: emergent properties of complex, dynamic systems that are sensitive to initial conditions, “fractal” in nature, and have the appearance of being non-deterministic.
• Patterns of Web use by individuals and groups: theories of human-computer interaction such as the GOMS model, ways in which computing technology can enable social interaction, Web ecosystems, privacy and ethics principles.
• Techniques and theories supporting the computing infrastructure: computer networking, database structures, data representation, methods of managing contention for common resources, techniques to ensure data security.

[1] http://www.scientificamerican.com/article.cfm?id=web-science
[2] Computational thinking is a phrase coined by Jeannette Wing of Carnegie Mellon University to refer to the generalized principles and theories of computer science that can be employed across multiple domains.

B. Course Information

1. Subject Code and Course Number: COSC 104
2. Course Title: Web Science
3. Credit Hours: 3
4. Repeatable for Credit? Yes ___ No _X_ If “Yes”, how many total credits may be earned? ___
5. Catalog Description (Limit to approximately 50 words.): Empirical study of the global, emergent systems facilitated by the World Wide Web. Computing infrastructure that enables entities to interact and share information via dynamic, virtual systems.
Theoretical foundations such as chaotic and complex systems, computational thinking, and virtual interaction spaces. Empirical methods. Forming hypotheses from general principles and testing them through data collection and analysis.

6. Method of Delivery (Check all that apply.)
   a. Standard (lecture/lab) _X_ On Campus _X_ Off Campus ___
   b. Fully Online ___
   c. Hybrid/Web Enhanced ___
7. Grading Mode: Normal (A-E) _X_ Credit/No Credit ___
8. Prerequisites: Courses that MUST be completed before a student can take this course. (List by Subject Code, Number and Title.)
   None.
9. Concurrent Prerequisites: Courses listed in #8 that MAY also be taken at the same time as a student is taking this course. (List by Subject Code, Number and Title.)
   None.
10. Corequisites: Courses that MUST be taken at the same time as a student is taking this course. (List by Subject Code, Number and Title.)
   None.
11. Equivalent Courses. A student may not earn credit for both a course and its equivalent. A course will count as a repeat if an equivalent course has already been taken. (List by Subject Code, Number and Title.)
   None.
12. Course Restrictions:
   a. Restriction by College. Is admission to a specific College required?
      College of Business: Yes ___ No _X_
      College of Education: Yes ___ No _X_
   b. Restriction by Major/Program. Will only students in certain majors/programs be allowed to take this course? Yes ___ No _X_
      If “Yes”, list the majors/programs:
   c. Restriction by Class Level. Check all those who will be allowed to take the course:
      Undergraduate: All undergraduates _X_; Freshperson ___; Sophomore ___; Junior ___; Senior ___; Second Bachelor _X_; UG Degree Pending ___; Post-Bac. Tchr. Cert. ___; Low GPA Admit ___
      Graduate: All graduate students ___; Certificate ___; Masters ___; Specialist ___; Doctoral ___
      Note: If this is a 400-level course to be offered for graduate credit, attach Approval Form for 400-level Course for Graduate Credit. Only “Approved for Graduate Credit” undergraduate courses may be included on graduate programs of study.
      Note: Only 500-level graduate courses can be taken by undergraduate students. Undergraduate students may not register for 600-level courses.
   d. Restriction by Permission. Will Departmental Permission be required? Yes ___ No _X_
      (Note: Department permission requires the department to enter authorization for every student registering.)
13. Will the course be offered as part of the General Education Program? Yes _X_ No ___
   If “Yes”, attach Request for Inclusion of a Course in the General Education Program: Education for Participation in the Global Community form. Note: All new courses proposed for inclusion in this program will be reviewed by the General Education Advisory Committee. If this course is NOT approved for inclusion in the General Education program, will it still be offered? Yes ___ No _X_

C. Relationship to Existing Courses

Within the Department:

14. Will this course be a requirement or restricted elective in any existing program(s)? Yes ___ No _X_
   If “Yes”, list the programs and attach a copy of the programs that clearly shows the place the new course will have in the curriculum.
   Program: ________ Required ___ Restricted Elective ___
   Program: ________ Required ___ Restricted Elective ___
15. Will this course replace an existing course? Yes ___ No _X_
16. (Complete only if the answer to #15 is “Yes.”)
   a. Subject Code, Number and Title of course to be replaced:
   b. Will the course to be replaced be deleted? Yes ___ No ___
17. (Complete only if the answer to #16b is “Yes.”) If the replaced course is to be deleted, it is not necessary to submit a Request for Graduate and Undergraduate Course Deletion.
   a. When is the last time it will be offered? Term ___ Year ___
   b. Is the course to be deleted required by programs in other departments? Contact the Course and Program Development Office if necessary. Yes ___ No ___
   c. If “Yes”, do the affected departments support this change? Yes ___ No ___
      If “Yes”, attach letters of support. If “No”, attach letters from the affected departments explaining the lack of support, if available.

Outside the Department: The following information must be provided. Contact the Course and Program Development office for assistance if necessary.

18. Are there similar courses offered in other University Departments? Yes ___ No _X_
   If “Yes”, list courses by Subject Code, Number and Title.
19. If similar courses exist, do the departments in which they are offered support the proposed course? Yes ___ No ___
   If “Yes”, attach letters of support from the affected departments. If “No”, attach letters from the affected departments explaining the lack of support, if available.

D. Course Requirements

20. Attach a detailed Sample Course Syllabus including:
   a. Course goals, objectives and/or student learning outcomes
   b. Outline of the content to be covered
   c. Student assignments including presentations, research papers, exams, etc.
   d. Method of evaluation
   e. Grading scale (if a graduate course, include graduate grading scale)
   f. Special requirements
   g. Bibliography, supplemental reading list
   h. Other pertinent information

NOTE: COURSES BEING PROPOSED FOR INCLUSION IN THE EDUCATION FOR PARTICIPATION IN THE GLOBAL COMMUNITY PROGRAM MUST USE THE SYLLABUS TEMPLATE PROVIDED BY THE GENERAL EDUCATION ADVISORY COMMITTEE. THE TEMPLATE IS ATTACHED TO THE REQUEST FOR INCLUSION OF A COURSE IN THE GENERAL EDUCATION PROGRAM: EDUCATION FOR PARTICIPATION IN THE GLOBAL COMMUNITY FORM.

E. Cost Analysis (Complete only if the course will require additional University resources. Fill in Estimated Resources for the sponsoring department(s). Attach separate estimates for other affected departments.)

Estimated Resources:    Year One    Year Two    Year Three
Faculty / Staff         $_______    $_______    $_______
SS&M                    $_______    $_______    $_______
Equipment               $_______    $_______    $_______
Total                   $   0       $   0       $   0

F. Action of the Department/School and College

1. Department/School
   Vote of faculty: For: 15  Against: 0  Abstentions: 0
   (Enter the number of votes cast in each category.)
   Department Head/School Director Signature: <signed> W. W. McMillan    Date: 12 Mar 2012

2. College/Graduate School
   A. College: College Dean Signature ______________ Date ______
   B. Graduate School (if Graduate Course): Graduate Dean Signature ______________ Date ______

G. Approval
   Associate Vice-President for Academic Programming Signature ______________ Date ______

COSC 104: Web Science
MASTER SYLLABUS

Rationale (for Gen Ed Inclusion):

In a September 15, 2008 piece in Scientific American, Nigel Shadbolt and Tim Berners-Lee wrote, “…the Web is more than the sum of its pages. Vast emergent properties have arisen that are transforming society.
… A new branch of science—Web science—aims to address [these phenomena].”[1]

The World Wide Web, the tools used to interact with it, its many user communities, its vast and complex data stores, and its relationships with all systems that characterize modern civilization collectively define a natural system that is increasingly the subject of empirical scientific study. Just as biological life forms have evolved into complex systems deserving of scientific study, this Internet-based amalgamation has evolved to be an important subsystem of the natural environment. No single collection of people has designed it explicitly or can derive its future behavior through pure logic. It is neither a purely technological system nor a purely human system. It is a system composed of many interacting heterogeneous subsystems, similar in many ways to geologic or biologic systems.

This course samples from the many empirical research techniques that have been employed in computer science to give a general education student experience in carrying out laboratory studies in the field and an introductory theoretical view of the Web as a natural system. The theoretical foundations for the course, employed across topics, can be grouped into these major categories:

• Principles of Computational Thinking[2]: algorithms, distributed computing, semantic modeling, heuristic search, pattern matching.
• Complex and chaotic systems: emergent properties of complex, dynamic systems that are sensitive to initial conditions, “fractal” in nature, and have the appearance of being non-deterministic.
• Patterns of Web use by individuals and groups: theories of human-computer interaction such as the GOMS model, ways in which computing technology can enable social interaction, Web ecosystems, privacy and ethics principles.
• Techniques and theories supporting the computing infrastructure: computer networking, database structures, data representation, methods of managing contention for common resources, techniques to ensure data security.

[1] http://www.scientificamerican.com/article.cfm?id=web-science
[2] Computational thinking is a phrase coined by Jeannette Wing of Carnegie Mellon University to refer to the generalized principles and theories of computer science that can be employed across multiple domains.

Course Description: Empirical study of the global, emergent systems facilitated by the World Wide Web. Computing infrastructure that enables entities to interact and share information via dynamic, virtual systems. Theoretical foundations such as chaotic and complex systems, computational thinking, and virtual interaction spaces. Empirical methods. Forming hypotheses from general principles and testing them through data collection and analysis.

Course Credits: 3.0

Course Prerequisites: None.

Course Goals: Students will be able to:
1. explain and apply the theoretical foundations of the natural science of the Web, comprising computational principles, complex and chaotic systems, human subsystems, and principles of the infrastructure of distributed computing systems;
2. design and implement empirical studies in order to test hypotheses derived from theories of the Web;
3. proficiently use computing environments that enable the interaction and sharing of information in a disciplined, effective, and secure manner; and
4. provide and explain examples of dynamic virtual Web systems and their use in interacting and sharing information.
Method of Evaluation:
   Laboratory Reports    40%
   Homework Exercises    15%
   Quizzes and Exams     45%

Grading Scale:
   A:  93-100    B-: 80-82    D+: 67-69
   A-: 90-92     C+: 77-79    D:  63-66
   B+: 87-89     C:  73-76    D-: 60-62
   B:  83-86     C-: 70-72    E:  below 60

Suggested Textbooks:
1. Coursepack and laboratory manual (currently under development).
2. Mining the Social Web by Matthew Russell (O'Reilly Media, 2011. ISBN-13: 9781449388348; ISBN-10: 1449388345).
3. Introductory Statistics for Engineering Experimentation by Peter R. Nelson, Karen A. F. Copeland, and Marie Coffin (Elsevier, 2003. ISBN-13: 9780125154239).

Course Topics and Schedule[3]:

1. The Web as a Natural System (1 week)
   a. Importance to modern civilization of virtual, Web-based systems
   b. Comparison of the Web's complexity and evolutionary development to, e.g., biological, psychological, and geologic systems
   c. Overview of underlying theories
   d. Introduction to empirical methods
2. The Web as a Data Environment (1.5 weeks)
   a. Web pages and documents as an evolving set of emergent databases
   b. Indexing methods and effects on searching
   c. Comparing existing search technologies
      i. Precision versus coverage (illustrated in the sketch following this outline)
      ii. Signal detection: Receiver Operating Characteristics
3. Connectivity and Routing between Web-Connected Elements (2 weeks)
   a. Ways in which computing, data, and human elements are connected on the Web
   b. How information is transmitted
   c. Routing and reassembly of information
   d. Identification of subsystems on the Web
   e. Determining packets' paths with tools such as Traceroute
      i. Identifying bottlenecks
      ii. Effect on traffic of temporal cycles, e.g., time-of-day
      iii. Measuring logical path length as number of hops
   f. Getting information about Web nodes using, e.g., Ping
      i. End-to-end delay
      ii. Nature of units of information transmitted, e.g., packet size
   g. Resolving Web addresses via DNS lookup
4. Web-Based Communities (1.5 weeks)
   a. Social spaces on the Web
   b. Kinds of Web interactions
   c. Collective community actions
   d. Individuals' behavior in relation to Web communities (identity, reputation, presence, and engagement)
5. Data on the Web (2 weeks)
   a. A closer look at kinds of data
   b. Methods of representation and storage
   c. Quality of data vs. efficiency
   d. Computing infrastructure to support data repositories and interconnectivity
   e. Virtual subsystems on the Web as emergent databases
   f. The Cloud as a popular way to conceptualize virtual subsystems
6. Privacy and Security (1.5 weeks)
   a. Sources and nature of threats and vulnerabilities
   b. Communities' collective defenses as immune responses
   c. Techniques for ensuring security
   d. Accessibility vs. privacy
7. Individuals' Usage Patterns (1 week)
   a. The user as an element in Web dialogues
   b. Categories of patterns of use, including:
      i. Means-ends analysis
      ii. Foraging (short-horizon means-ends)
      iii. Hedonistic
   c. Individuals' assessment of, and reactions to, risks on the Web
8. Artificial Intelligence and the Web (1.5 weeks)
   a. AI as heuristic reasoning and state space search
   b. Web-based systems that employ AI
   c. Modeling of users by subsystems on the Web
      i. Recommender systems
      ii. Profiling to predict likely actions
   d. Finding meaningful patterns via data mining
9. Distributed Computing (1.5 weeks)
   a. Web-based subsystems as computational engines
   b. Non-centralized control of computation compared to hierarchical control
   c. Developing virtual computing structures via the Cloud
   d. Infrastructure
      i. Technological
      ii. Human
   e. Maintaining integrity of data and computational results
10. The Future of the Web (1 week)
   a. Society's perceptions of the Web, its effects, and its directions (employing popular literature and commentary)
   b. Students' predictions and evaluations of emerging Web-based entities and forces
   c. Ethics in the age of the Web

[3] This topic outline focuses on concepts and theories covered in lecture and discussion. Related laboratory exercises are specified separately.
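To make the precision and coverage measures in topic 2c concrete, the following is a minimal sketch in Python. The page names and relevance judgments are hypothetical; in class, the scored sets would come from the laboratory manual.

    # Minimal sketch: precision and coverage (recall) of a search result set,
    # scored against a human-judged set of relevant pages.
    # The page names and judgments below are made up for illustration.

    relevant = {"page1", "page3", "page4", "page7"}   # human-scored relevant pages
    retrieved = {"page1", "page2", "page3", "page9"}  # what the search engine returned

    true_positives = retrieved & relevant  # retrieved pages that really are relevant

    precision = len(true_positives) / len(retrieved)  # fraction of hits that were real
    coverage = len(true_positives) / len(relevant)    # fraction of real hits found

    print(f"precision = {precision:.2f}")  # 0.50: half the returned pages were relevant
    print(f"coverage  = {coverage:.2f}")   # 0.50: half the relevant pages were returned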
Academic Dishonesty Policy:

Academic dishonesty, including all forms of cheating, falsification, and/or plagiarism, will not be tolerated in this course. Each student is expected to submit individually prepared work. Any instance of academic dishonesty during any exam will result in an automatic failing grade for the course. You are free to discuss problems or questions on laboratory exercises or projects with your classmates; however, the submitted work should reflect the individual student's efforts. Any student who submits work that is determined not to be his/her own will be given a grade of ZERO for the first offense. In addition, ALL other students involved will also be given a grade of ZERO. Any subsequent instance of academic dishonesty will result in a failing grade for the course. In addition, you may be referred to the Office of Student Judicial Services for discipline that can result in either a suspension or permanent dismissal. The Student Conduct Code contains detailed definitions of what constitutes academic dishonesty. If you are not sure whether something you are doing would be considered academic dishonesty, consult with the course instructor. You may access the Code and other helpful resources online at http://www.emich.edu/sjs.

Students with Disabilities:

If you require special arrangements due to a disability, please see the instructor as early in the term as possible.

Bibliography

Search Engines
1. Alfano, Marco and Biagio Lenzitti. “A web search methodology for different user typologies.” CompSysTech '09: Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, June 2009.
2. Feng, Juan and Xiaoquan (Michael) Zhang. “Dynamic price competition on the internet: advertising auctions.” EC '07: Proceedings of the 8th ACM Conference on Electronic Commerce, June 2007.
3. Ghose, Anindya and Sha Yang. “Analyzing search engine advertising: firm behavior and cross-selling in electronic markets.” WWW '08: Proceedings of the 17th International Conference on World Wide Web, April 2008.
4. Jansen, Bernard J. “The comparative effectiveness of sponsored and nonsponsored links for Web e-commerce queries.” ACM Transactions on the Web (TWEB), Volume 1, Issue 1, May 2007.
5. Juan, Yun-Fang and Chi-Chao Chang. “An analysis of search engine switching behavior using click streams.” WWW '05: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, May 2005.
6. Lahaie, Sébastien and David M. Pennock. “Revenue analysis of a family of ranking rules for keyword auctions.” EC '07: Proceedings of the 8th ACM Conference on Electronic Commerce, June 2007.
7. McCown, Frank and Michael L. Nelson. “Agreeing to disagree: search engines and their public interfaces.” JCDL '07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, June 2007.
8. Poblete, Barbara and Ricardo Baeza-Yates. “Query-sets: using implicit feedback and query patterns to organize web documents.” WWW '08: Proceedings of the 17th International Conference on World Wide Web, New York, NY, 2008.
9. Poremsky, Diane. Google and Other Search Engines: Visual QuickStart Guide. Peachpit Press, May 2004.
10. Sun, Jian-Tao, Xuanhui Wang, Dou Shen, Hua-Jun Zeng, and Zheng Chen. “CWS: a comparative web search system.” WWW '06: Proceedings of the 15th International Conference on World Wide Web, May 2006.
11. Vrochidis, Stefanos, Ioannis Kompatsiaris, and Ioannis Patras. “Optimizing visual search with implicit user feedback in interactive video retrieval.” CIVR '10: Proceedings of the ACM International Conference on Image and Video Retrieval, July 2010.

Connectivity and Routing
12. Habib, Md. Ahsan and Marc Abrams. “Analysis of Bottlenecks in International Internet Links.” Virginia Polytechnic Institute & State University, Blacksburg, VA, USA, 1998.
13. Bickerstaff, Cindy, Ken True, Charles Smothers, Tod Oace, Jeff Sedayao, and Clinton Wong. “Don't just talk about the weather - manage it! A system for measuring, monitoring, and managing internet performance and connectivity.” NETA '99: Proceedings of the 1st Conference on Network Administration, Volume 1. USENIX Association, Berkeley, CA, USA, 1999.
14. Chen, Thomas M. “Increasing the observability of Internet behavior.” Communications of the ACM, Volume 44, Issue 1, January 2001. ACM, New York, NY, USA.
15. Fan, Xun and John Heidemann. “Selecting representative IP addresses for internet topology studies.” IMC '10: Proceedings of the 10th Annual Conference on Internet Measurement. ACM, New York, NY, USA, 2010. ISBN: 978-1-4503-0483-2.
16. Kar, Dulal C. “Internet path characterization using common internet tools.” Journal of Computing Sciences in Colleges, Volume 18, Issue 4, April 2003. Consortium for Computing Sciences in Colleges, USA.
17. Logg, Connie, Les Cottrell and Jiri Navratil. “Experiences in traceroute and available bandwidth change analysis.” NetT '04: Proceedings of the ACM SIGCOMM Workshop on Network Troubleshooting: Research, Theory and Operations Practice Meet Malfunctioning Reality. ACM, New York, NY, USA, 2004. ISBN: 1-58113-942-X.
18. Oliveira, Ricardo V., Beichuan Zhang and Lixia Zhang. “Observing the evolution of Internet AS topology.” SIGCOMM '07: Proceedings of the 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications; ACM SIGCOMM Computer Communication Review, Volume 37, Issue 4, October 2007. ACM, New York, NY, USA. ISBN: 978-1-59593-713-1.
19. Rasti, Amir H., Nazanin Magharei, Reza Rejaie, and Walter Willinger. “Eyeball ASes: from geography to connectivity.” IMC '10: Proceedings of the 10th Annual Conference on Internet Measurement. ACM, New York, NY, USA, 2010.
20. Viger, Fabien, Brice Augustin, Xavier Cuvellier, Clémence Magnien, Matthieu Latapy, Timur Friedman, and Renata Teixeira. “Detection, understanding, and prevention of traceroute measurement artifacts.” Computer Networks: The International Journal of Computer and Telecommunications Networking, Volume 52, Issue 5, April 2008.

User Communities
21. Calvi, Licia. “Personal networks as a case for online communities: two case studies.” International Journal of Web Based Communities, Volume 5, Issue 1, November 2008.
22. Paul, Sheila A., Marianne Jensen, Chui Yin Wong and Chee Weng Khong. “Socializing in mobile gaming.” DIMEA '08: Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts, September 2008.
23. Robu, Valentin, Harry Halpin and Hana Shepherd. “Emergence of consensus and shared vocabularies in collaborative tagging systems.” ACM Transactions on the Web (TWEB), Volume 3, Issue 4, September 2009.
24. Sosa, Manuel E. “Where Do Creative Interactions Come From? The Role of Tie Content and Social Networks.” Organization Science, Volume 22, Issue 1, January 2011.
25. Torkjazi, Mojtaba, Reza Rejaie and Walter Willinger. “Hot today, gone tomorrow: on the migration of MySpace users.” WOSN '09: Proceedings of the 2nd ACM Workshop on Online Social Networks, August 2009.
26. Valafar, Masoud, Reza Rejaie and Walter Willinger. “Beyond friendship graphs: a study of user interactions in Flickr.” WOSN '09: Proceedings of the 2nd ACM Workshop on Online Social Networks, August 2009.

Data
27. Chierichetti, Flavio, Silvio Lattanzi and Alessandro Panconesi. “Gossiping (via mobile?) in social networks.” DIALM-POMC '08: Proceedings of the Fifth International Workshop on Foundations of Mobile Computing, August 2008.
28. Huang, Lailei and Zhengyou Xia. “User Character and Communication Pattern Detecting on Social Network Site.” ICEC '09: Proceedings of the 2009 International Conference on Engineering Computation, May 2009.
29. Leung, Cane Wing-ki, Ee-Peng Lim, David Lo and Jianshu Weng. “Mining interesting link formation rules in social networks.” CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, October 2010.

Privacy
30. Asuncion, Arthur U. and Michael T. Goodrich. “Turning Privacy Leaks into Floods: Surreptitious Discovery of Social Network Friendships and Other Sensitive Binary Attribute Vectors.” WPES '10: Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society, October 2010.
31. Freni, Dario, Carmen Ruiz Vicente, Sergio Mascetti, Claudio Bettini and Christian S. Jensen. “Preserving Location and Absence Privacy in Geo-Social Networks.” CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, October 2010.
32. Zhou, Bin and Jian Pei. “Preserving Privacy in Social Networks Against Neighborhood Attacks.” ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, April 2008.

Individual User Behavior Patterns
33. Bhavnani, Suresh K. “Domain-Specific Search Strategies for the Effective Retrieval of Healthcare and Shopping Information.” CHI 2002, pp. 610-611, 2002.
34. Kalbach, James. “Designing for Information Foragers: A Behavioral Model for Information Seeking on the World Wide Web.” Internet Technical Group, 3.3, December 2000.
35. White, Ryen W. and Steven M. Drucker. “Investigating Behavioral Variability in Web Search.” International World Wide Web Conference, pp. 21-30, 2007.
36. Widen-Wulff, Gunilla, Stefan Ek, Mariam Ginman, Reija Perttila, Pia Sodergard and Anna-Karin Totterman. “Information Behavior Meets Social Capital: A Conceptual Model.” Journal of Information Science, Volume 34, Issue 3, June 2008.

Artificial Intelligence
37. Cohen, Paul R. Empirical Methods for Artificial Intelligence. MIT Press, 1995. ISBN-10: 0262032252.
38. Denker, Manfred, Wojbor A. Woyczynski and Bernard Ycart. Introductory Statistics and Random Phenomena: Uncertainty, Complexity and Chaotic Behavior in Engineering and Science. Statistics for Industry and Technology, November 1, 1998. ISBN: 0817640312.
39. Dressler, Fuchs, Truchat, Yao, Lu, and Marquardt. “Profile-Matching Techniques for On-Demand Software Management in Sensor Networks.” EURASIP Journal on Wireless Communications and Networking, Volume 2007.
40. Li, Yung-Ming and Han-Wen Hsiao. “Recommender Service for Social Network based Applications.” ICEC '09: Proceedings of the 11th International Conference on Electronic Commerce, August 2009.
41. Lu, Eichstaedt, and Ford. “Efficient Profile Matching for Large Scale Webcasting.” Proceedings of the Seventh International World Wide Web Conference, April 1998.
42. Ma, Hao, Dengyong Zhou, Chao Liu, Michael R. Lyu and Irwin King. “Recommender systems with social regularization.” WSDM '11: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, February 2011.

Distributed Computing
43. Cooper, Treuille, et al. “The Challenge of Designing Scientific Discovery Games.” Foundations of Digital Games Conference, 2010.
44. Goldsmith and Owen. The Search for Life in the Universe. University Science Books, 3rd ed., 2001.
45. Lang and Armitage. “An ns2 Model for the Xbox System Link Game Halo.” ATNAC 2003, Melbourne, Australia.

Web Science, Social Networking
46. Abraham, Ajith, Aboul-Ella Hassanien and Václav Snášel. Computational Social Network Analysis: Trends, Tools and Research Advances. Computer Communications and Networks, Springer-Verlag London, December 21, 2009. ISBN: 9781848822283.
47. Ang, Chee Siang and Panayiotis Zaphiris. “Simulating Social Networks of Online Communities: Simulation as a Method for Sociability Design.” INTERACT '09: Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part II, August 2009.
48. Asuncion, Arthur U. and Michael T. Goodrich. “Turning privacy leaks into floods: surreptitious discovery of social network friendships and other sensitive binary attribute vectors.” WPES '10: Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society, October 2010.
49. Bakshy, Eytan, Brian Karrer and Lada A. Adamic. “Social influence and the diffusion of user-created content.” EC '09: Proceedings of the 10th ACM Conference on Electronic Commerce, July 2009.
50. Berners-Lee, Tim, Wendy Hall and James A. Hendler. A Framework for Web Science (Foundations and Trends in Web Science). Now Publishers, 2006. ISBN: 1-933019-33-6.
51. Burke, Moira, Cameron Marlow and Thomas Lento. “Social network activity and social well-being.” CHI '10: Proceedings of the 28th International Conference on Human Factors in Computing Systems, April 2010.
52. Cheung, Christy M. K. and Matthew K. O. Lee. “A theoretical model of intentional social action in online social networks.” Decision Support Systems, Volume 49, Issue 1, April 2010.
53. Daradoumis, Thanasis, Alejandra Martínez-Monés and Fatos Xhafa. “A layered framework for evaluating on-line collaborative learning interactions.” International Journal of Human-Computer Studies, Volume 64, Issue 7, July 2006.
54. De Choudhury, Munmun, Winter A. Mason, Jake M. Hofman and Duncan J. Watts. “Inferring relevant social networks from interpersonal communication.” WWW '10: Proceedings of the 19th International Conference on World Wide Web, April 2010.
55. Easley, David and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, July 19, 2010. ISBN: 0521195330.
56. Eubank, Stephen, V. S. Anil Kumar, Madhav V. Marathe, Aravind Srinivasan and Nan Wang. “Structural and algorithmic aspects of massive social networks.” SODA '04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 2004.
57. Freni, Dario, Carmen Ruiz Vicente, Sergio Mascetti, Claudio Bettini and Christian S. Jensen. “Preserving location and absence privacy in geo-social networks.” CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, October 2010.
58. Fushimi, Takayasu, Takashi Kawazoe, Kazumi Saito, Masahiro Kimura and Hiroshi Motoda. “What Does an Information Diffusion Model Tell about Social Network Structure?” Knowledge Acquisition: Approaches, Algorithms and Applications, May 2009.
59. Groh, Georg and Christian Ehmig. “Recommendations in taste related domains: collaborative filtering vs. social filtering.” GROUP '07: Proceedings of the 2007 International ACM Conference on Supporting Group Work, November 2007.
60. Huang, Jian, Ziming Zhuang, Jia Li and C. Lee Giles. “Collaboration over time: characterizing and modeling network evolution.” WSDM '08: Proceedings of the International Conference on Web Search and Web Data Mining, February 2008.
61. Kimmerle, Joachim, Johannes Moskaliuk and Ulrike Cress. “Learning and knowledge building with social software.” CSCL '09: Proceedings of the 9th International Conference on Computer Supported Collaborative Learning, Volume 1, June 2009.
62. Kleinberg, Jon. “Social networks, incentives, and search.” SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 2006.
63. Kuan, Huei-Huang and Gee-Woo Bock. “Trust transference in brick and click retailers: An investigation of the before-online-visit phase.” Information and Management, Volume 44, Issue 2, March 2007.
64. Kuhlman, Chris J., V. S. Anil Kumar, Madhav V. Marathe, S. S. Ravi and Daniel J. Rosenkrantz. “Finding critical nodes for inhibiting diffusion of complex contagions in social networks.” ECML PKDD '10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part II, September 2010.
65. Lin, Chieh-Peng. “Assessing the mediating role of online social capital between social support and instant messaging usage.” Electronic Commerce Research and Applications, Volume 10, Issue 1, January 2011.
66. Lin, Kuan-Yu and Hsi-Peng Lu. “Why people use social networking sites: An empirical study integrating network externalities and motivation theory.” Computers in Human Behavior, Volume 27, Issue 3, May 2011.
67. Magnani, Matteo, Danilo Montesi and Luca Rossi. “Information Propagation Analysis in a Social Network Site.” ASONAM '10: Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, August 2010.
68. Malinka, Kamil and Jiri Schafer. “Development of Social Networks in Email Communication.” ICIMP '09: Proceedings of the 2009 Fourth International Conference on Internet Monitoring and Protection, May 2009.
69. Maserrat, Hossein and Jian Pei. “Neighbor query friendly compression of social networks.” KDD '10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2010.
70. Mayer, Adalbert. “Online social networks in economics.” Decision Support Systems, Volume 47, Issue 3, June 2009.
71. Mislove, Alan, Hema Swetha Koppula, Krishna P. Gummadi, Peter Druschel and Bobby Bhattacharjee. “Growth of the flickr social network.” WOSP '08: Proceedings of the First Workshop on Online Social Networks, August 2008.
72. Moturu, Sai T., Jian Yang and Huan Liu. “Quantifying Utility and Trustworthiness for Advice Shared on Online Social Media.” CSE '09: Proceedings of the 2009 International Conference on Computational Science and Engineering, Volume 04, August 2009.
73. Wang, Yu, Gao Cong, Guojie Song and Kunqing Xie. “Community-based greedy algorithm for mining top-K influential nodes in mobile social networks.” KDD '10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2010.
74. Widén-Wulff, Gunilla, Stefan Ek, Mariam Ginman, Reija Perttilä, Pia Södergård and Anna-Karin Tötterman. “Information behaviour meets social capital: a conceptual model.” Journal of Information Science, Volume 34, Issue 3, June 2008.
75. White, Su, et al. “Negotiating the Web Science Curriculum through Shared Educational Artefacts.” Proceedings of the 3rd International Conference on Web Science, June 14-17, 2011.
76. Xiaodong, Zhao, Guo Weiwei and Mark Greeven. “An Empirical Study on the Relationship Between Entrepreneur's Social Network and Entrepreneurial Performance: The Case of the Chinese IT Industry.” IFITA '10: Proceedings of the 2010 International Forum on Information Technology and Applications, Volume 03, July 2010.
77. Ye, Qi, Bin Wu, Yuan Gao and Bai Wang. “Empirical Analysis and Multiple Level Views in Massive Social Networks.” WI-IAT '10: Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 01, August 2010.
78. Yeung, Ching-man Au and Tomoharu Iwata. “Capturing implicit user influence in online social sharing.” HT '10: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, June 2010.

Sample Laboratory Experiment #1: Search Engines

Hypothesis
Differences in search algorithms cause large differences in search results.

Learning Goals
We will test the validity of this hypothesis in this experiment through:
• using different search engines for different types of searches;
• ranking the “suitability” of three major search engines; and
• exploring how search engines generate different results for current events and past events.

Resources
For this experiment, you will need access to the Internet, a web browser, and a spreadsheet application.

Procedure
In this experiment you will be ranking the “suitability” of three major search engines for different types of searches. Two types of queries are to be explored: Current Events (things in the news or recent within the past 48 hours) and Past Events (things from the past, more than one week old). You are to present three different queries (search phrases) to the search engines, and then systematically investigate the suitability of the top links presented by the search engines. You should produce a one-page written report based on the analysis of the collected data, linking the “suitability” of the search engines to searches on current events and past events, or the lack thereof.

Here are the steps for the experiment. First, conduct the experiment using current events; you will then repeat the process for past events.
1. Identify three search phrases on current events.
2. Identify three search engines; call them X, Y, and Z.
3. Search these three phrases on the three search engines and write down the top two links presented by search engines X, Y, and Z for each phrase. For example, if your search phrase is “Japanese Tsunami”, you will select the top two links in the results coming from each of the X, Y, and Z search engines.
4. Next, calculate how related each webpage is to the search phrase by performing a word count. You may do this by counting the number of occurrences of the search phrase as a whole within the webpage (a scripted sketch of this count follows the grading section below). There should be six of these counts for each search engine: 2 links and 3 search phrases. Total the six numbers to get a value for each search engine.
5. Using alexa.com or a suitable utility, obtain the number of visits to each of the web pages over a fixed time period, such as one week. Again, there are six of these figures. Total the six numbers to get a value for each search engine.
6. Using a suitable utility, obtain the number of links (reference count) to each of the web pages. Again, there are six of these figures. Total the six numbers to get a value for each search engine.
7. Using a suitable utility such as blogsearch.google.com, obtain the blog chatter related to the web pages. There are six of these figures. Total the six numbers to get a value for each search engine.
8. Using a spreadsheet, enter the three search engines in one row or column and the corresponding numbers for word count, visits, reference count, and blog chatter. Produce a bar chart with these figures for each of the four categories.
Repeat the above steps with past events.

Analysis/Discussion
You should produce a one-page written report based on the analysis of the collected data, linking the “suitability” of the search engines to searches on current events and past events, or the lack thereof. You are required to submit a Word document with the information for the above steps, the spreadsheet tabular data, and the bar charts.

Grading
• Steps 1-7: 10% each (70% total)
• Summary Report: 30%
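The following is a minimal sketch, in Python and using only the standard library, of the whole-phrase word count in step 4. The URL and phrase are placeholders, the tag stripping is deliberately crude, and pages that block scripted clients may require a different approach; it is one possible way to do the count, not a required one.

    # Sketch for step 4: count whole-phrase occurrences in a fetched page.
    # The URL and phrase below are hypothetical placeholders.
    import re
    import urllib.request

    url = "http://example.com/some-result-page"  # one of the links a search engine returned
    phrase = "japanese tsunami"                  # the search phrase, matched case-insensitively

    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")

    text = re.sub(r"<[^>]+>", " ", html)  # crude removal of HTML tags
    count = len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))

    print(f"'{phrase}' occurs {count} times in {url}")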
Sample Laboratory Experiment #2: Computer Connectivity and Routing

Hypothesis
End-to-end delay associated with the delivery of a packet on the Internet is a function of variable conditions including the physical distance, the size of the packet, the number of hops, available bandwidth, and the amount of traffic (time of day) between the two end nodes.

Learning Goals
We will test the validity of this hypothesis in this experiment through:
• learning the definitions of common computer networking terms and giving examples;
• using network utilities to learn about how a machine is connected to the rest of the Internet; and
• understanding the effects of physical distance, bandwidth, hop distance, etc., on the end-to-end delay associated with the delivery of a packet on the Internet.

Resources
For this experiment, you will need access to the Internet and the following network utilities:
• Ping
• DNS Lookup
• Traceroute
In addition, you will also need a web browser.

Procedure
1. Terminology. Provide a short description of each of the following terms.
   a. Host name. Give an example.
   b. IP address. Give an example.
   c. Network interface
   d. Routing table
   e. Packet
   f. DNS
   g. TLD. Give an example.
2. Using Network Utilities. Every machine that is connected to the Internet is assigned at least one IP address. Gather the network information for your machine and how it is connected to the Internet using the specified network utility. Answer the following questions.
   a. Network Information.
      i. What is the hardware address of your machine's en0 network interface?
      ii. What is the IP address assigned to your machine?
   b. Lookup. Look up the corresponding Internet (IP) address for each of the following host names. Find the same data for 5 additional host names of your choice.
      • my.emich.edu
      • www.citibank.com
      • Google.com
      • www.cnnic.net.cn
   c. Ping. Ping the host names in (2b), including the 5 host names you added, by sending 10 pings each (a scripted version of this measurement follows the grading section below). What is the percentage of packet loss? What are the minimum, average, maximum, and standard deviation of the round-trip times (RTT) in msec?
   d. Traceroute. How many hops does it take to get to the addresses defined in (2b) from your machine?
   e. Time-of-day. Repeat parts (c) and (d) above during a different time of day.
   f. Geolocation. Using a web search engine, find the physical location of the network addresses defined in (2b). Next, using a mapping web site (e.g., Google Maps, MapQuest, etc.), look up the physical distance between your current location and the given network addresses.

Analysis/Discussion
Based on the data you collected, discuss the end-to-end delay associated with the delivery of a packet on the Internet as a function of the physical distance, the size of the packet, the number of hops, available bandwidth, and the amount of traffic (time of day) between the two end nodes.

Grading
• Terminology: 20%
• Procedure: 40%
• Analysis/Discussion: 40%
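For part (c), the round-trip statistics can also be gathered by scripting the same Ping utility. The sketch below assumes a Unix-style ping that accepts -c for the packet count and prints "time=… ms" lines; Windows ping uses different flags and output, so treat this as one possible approach rather than the required method.

    # Sketch for part (c): ping a host 10 times and summarize round-trip times.
    # Assumes Unix-style `ping` output ("time=12.3 ms"); the host is a placeholder.
    import re
    import statistics
    import subprocess

    host = "my.emich.edu"
    result = subprocess.run(["ping", "-c", "10", host],
                            capture_output=True, text=True)

    rtts = [float(ms) for ms in re.findall(r"time=([\d.]+)", result.stdout)]
    loss_pct = 100.0 * (10 - len(rtts)) / 10  # pings with no reply count as lost

    print(f"{host}: {loss_pct:.0f}% packet loss")
    if len(rtts) >= 2:
        print(f"min={min(rtts):.1f}  avg={statistics.mean(rtts):.1f}  "
              f"max={max(rtts):.1f}  stdev={statistics.stdev(rtts):.1f} (msec)")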
Sample Laboratory Experiment #3: Social User Behavior

Learning Goals
Here you will study the kinds of social, web-based interactions classified by Malone and Crumlish in their work on the design of social interfaces.

Context
Context A is within the “Collaboration” sphere of the “Activities” space, specifically collaborative editing endeavors such as those on a wiki. Context B is within the “Community Management” sphere of the “Communities” space, specifically norms and group moderation. Somewhat similar tasks are carried out across Contexts A and B.

Hypothesis
Our hypothesis is that despite these surface similarities, there are reliable and discernible differences in the following measures: the number of times a person posts an exclusionary message, pushing someone else out of the group (e.g., “Get lost, troll.”); the percentage of posts that contribute directly to the task at hand; and the percentage of posts that are affirming or reinforcing of others' efforts.

Resources
Web-connected computer, and accounts and identities on (A) a wiki (Wikipedia is fine) and (B) a discussion web site, preferably on a particular topic such as a scientific, political, health, sports, or business issue.

Procedures
Read users' posts collected over a period of eight hours in each of Contexts A and B. The time periods compared have to be equal. You can record posts via copying and pasting, printing to PDF, or some other method. Categorize each post as (1) task-oriented (in a discussion, this would be providing objective information or links), (2) exclusionary, (3) affirming, or (4) other. The last category comprises many possible kinds of postings, and it is important to count them in order to compute percentages.

Analysis and Presentation
Compute the relevant percentages and count the numbers of exclusionary posts. Present these using a bar chart or similar device in a way that shows any apparent differences between Contexts A and B. Perform t tests on the pairs of proportions and counts (a sketch of one such test follows the notes for faculty below).

Discussion Questions
1. Were there statistically significant differences between the two contexts on any measures?
2. If there were, what do you think are the reasons?
3. What other comparisons could be made between these contexts of use (refer to Malone and Crumlish's work)?
4. How do the processes you observed in these posts shape the nature of user communities over time?

Grading
• Understanding of interaction types: 20%
• Completeness of notes and procedures: 20%
• Clarity of graphical presentation: 20%
• Detail and insight in discussion answers: 20%
• Overall quality and style: 20%

Notes for Faculty
There are many other kinds of social interaction that could be studied. It would also be possible to devise a good lab in which the nature of interactions over time is investigated, looking at the effects of particular kinds of postings. Besides relatively slowly-changing discussions such as those used above, one could feature Twitter feeds or other more stream-oriented technologies.
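The exercise asks for t tests on the pairs of proportions; for proportions, the standard large-sample version of this comparison is the two-proportion z-test, sketched below in Python using statsmodels. The post counts are hypothetical stand-ins for the data you collect.

    # Sketch: comparing the proportion of exclusionary posts in Contexts A and B.
    # The counts below are hypothetical; substitute your own tallies.
    from statsmodels.stats.proportion import proportions_ztest

    exclusionary = [12, 5]    # exclusionary posts observed in A and B
    total_posts = [140, 160]  # total posts read in A and B

    z_stat, p_value = proportions_ztest(count=exclusionary, nobs=total_posts)
    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Difference in proportions is statistically significant at the 0.05 level.")
    else:
        print("No statistically significant difference detected.")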
Sample Laboratory Experiment #4: Data of the Web System

The Web is a system originally designed to share information. The shared information is distributed, of varying quality, in varying formats, and from varying sources. Understanding the nature of the data is fundamental to developing an understanding of the Web as a system for scientific study. Unlike chemistry, where the building blocks were painfully discovered, the fundamental building blocks of the Web are engineered. Like chemistry, however, understanding the building blocks of the Web system facilitates discovery of the principles of the emergent behavior built on those building blocks.

Learning Goals
• Understand the nature of the data in Web 1.0: HTML
• Understand the interaction between data elements: links
• Understand the use of the data through experimentation on web crawlers
• Understand the concepts of precision and coverage (i.e., the ROC matrix)

Resources
For students:
• Computer with Internet connection
For the laboratory:
• Two to four web crawlers that vary in their algorithms for crawling and for indexing. These crawlers should be limitable to specified servers. The crawlers may be initialized with a set of desired keywords.
• Four related data sets: given a set of predefined HTML pages, each page has four versions:
   - links present, without <META> tags to aid the web crawlers;
   - links present, with <META> tags identifying keywords for the crawlers;
   - links removed, without <META> tags; and
   - links removed, with <META> tags.
  The data sets must be scored a priori by a human in order for the student to compare results against known correct results.

Procedure
Run the web crawlers separately over the complete set of data. Collect the following measurements:
1. For each page, was it indexed correctly according to keyword?
2. For each web crawler, how long did it take to index the complete set of data?

Analysis and Presentation
Quality of indexing: Using the data collected above, analyze the results for precision and coverage in comparison to the previously human-scored index. For a crawler, what percentage of its misses should have been hits (a low rate indicates good coverage)? What percentage of its hits were actually misses (a high rate indicates poor precision)?
Time of indexing: Using the data collected above, analyze the time to completion for each web crawler against each data set.

Discussion Questions
1. What is the nature of the data that will lead to good coverage and precision by a web crawler?
2. How can one tune precision versus coverage?
3. What is the nature of the data that will lead to faster web crawler performance?
4. What happens to the quality of the results when the data set is modified during a crawl?

Grading
• HTML and links: 10%
• Web crawler: 10%
• Performance measures (speed, precision, coverage): 40%
• Graphical presentation of data: 20%
• Discussion questions: 20%

Sample Laboratory Experiment #5: Emergent Collectives (Recommender Systems)

From the simplest units on the Web, i.e., any object an HTML anchor can reference, combined with a set of user actions (clicks) on those units, a startling result emerges. The users form into groups, intersecting subsets, based on shared clicking (preference) behavior. The objects also form into groups of similar items based on the user behavior. This emergent structure, based on user preferences, has developed into the area of recommender systems. The behavior of a recommender system can be observed by the casual observer through Netflix or Amazon recommendations. This laboratory will investigate the nature of the formation of user groups and object groups.

Learning Goals
• Understand the formation of groups based on user behavior.
• Understand the measurements of groups (cohesiveness, durability).
• Understand the effects changing input has on emergent groups.
• Understand recommender systems.

Resources
• Computer with data analysis software such as MATLAB, R, or Sage.
• Toy recommender system with items, such as movies, and customers. The user has the ability to enter rankings of items for customers.

Procedures
1. Given a set of n (n <= 10) items and 2 * m users (where m is the number of students in the class), each student will rate every item on a 5-star scale (as for Netflix), once as themselves and again as a person well known to the student who is 20 or more years older.
2. Analyze the data to discover disjoint user groups. For each user group, make 1-3 recommendations.
3. Now add another n items and rank them.
4. Analyze the new data and the conglomerate data to identify new and modified groups. Again, for each user group, make 1-3 recommendations.
5. Analyze the recommendations made based on groups from the perspective of the individual users. How highly does each user actually rank a recommendation?

Analysis and Presentation
The cohesiveness of a group is measured according to the distance between each element and the median position of the elements in a k-dimensional space, where k is the number of characteristics being considered by the experiment (a sketch of this computation follows the grading section below). Calculate the cohesiveness of the groups. Identify groups that have a high or increasing level of cohesiveness as new preferences arrive. Identify groups with low or decreasing cohesiveness. From the individual user's perspective, how attached is the user to a particular group? Does the user's attachment reflect group cohesiveness?

Discussion Questions
1. How do these emergent groups compare to groups in other scientific systems, e.g., tribal units, species, or collections of memes?
2. What is the nature of the objects that would lead to stable groups? Fashion-driven objects, societal norms (e.g., authoritarian versus democratic politics), ...

Grading
• Data collection: 40%
• Graphical presentation of data: 40%
• Discussion questions: 20%
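A minimal sketch of the cohesiveness computation described in the analysis section, using Python and NumPy. The rating matrix is hypothetical (4 users rating k = 5 items on a 1-5 star scale); smaller mean distance from the median position indicates a more cohesive group.

    # Sketch: group cohesiveness as the mean Euclidean distance of each member's
    # rating vector from the group's median position in k-dimensional space.
    # The ratings below are hypothetical.
    import numpy as np

    group_ratings = np.array([
        [5, 4, 1, 2, 5],
        [4, 4, 2, 1, 5],
        [5, 3, 1, 2, 4],
        [4, 5, 2, 2, 5],
    ])

    median_position = np.median(group_ratings, axis=0)            # the group's "center"
    distances = np.linalg.norm(group_ratings - median_position, axis=1)

    cohesiveness = distances.mean()  # smaller mean distance = more cohesive group
    print(f"median position: {median_position}")
    print(f"per-user distances: {np.round(distances, 2)}")
    print(f"mean distance (lower is more cohesive): {cohesiveness:.2f}")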
Sample Laboratory Exercise #6: Distributed Computing

Learning Goals
In this lab, you will explore the concept of distributed computing (DC) and investigate an ongoing DC project. You will be able to understand and document the characteristics of a distributed system (structural research) and analyze and report on an ongoing DC project (naturalistic observation).

Resources
A computer with an Internet connection.

Procedures
1. Using the textbook and/or an online resource, find the following information about a DC system:
   a. How are the computers (nodes) connected to each other?
   b. Do these computers communicate with each other? What does the dedicated server do?
   c. How do these computers help in solving a problem?
   d. Is the combined computational power better than a single computer? A supercomputer? How so?
   e. What is BOINC?
2. Select one ongoing DC project, such as SETI@Home or Folding@Home, and find the following information:
   a. Number and type of active CPUs contributing to the project
   b. Server status for the past year
   c. Top participants and their credits
   d. BOINC projects with the most participants

Analysis and Presentation
Draw a distributed computing environment and label its parts. Include the following: nodes, dedicated server, network, message passing, ports, and BOINC software locations. Using charts, visually display the information you collected in step 2 of the procedure (a plotting sketch appears at the end of this exercise). One should be able to answer some of the discussion questions, given below, using these displays.

Discussion Questions
1. What is the main purpose of distributed computing?
2. How is it different from supercomputing, parallel computing, and cloud computing?
3. Provide answers to the following questions, using the charts:
   a. What type of CPU is contributing most to the project?
   b. How many servers were rejecting connections in the month of March?
   c. What is the maximum credit received by an out-of-country participant?
   d. Name any 5 BOINC projects with more than a million users.

Grading
• DC drawing: 15%
• Charts: 30%
• Answers to discussion questions: 40%
• Overall quality and style: 15%
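One way to produce the charts for step 2 is with Python and matplotlib; the sketch below draws a bar chart of active CPUs by type. The CPU categories and counts are hypothetical stand-ins for the figures you collect from the project's statistics pages.

    # Sketch for the "Charts" deliverable: a bar chart of active CPU types for a
    # DC project. The categories and counts below are hypothetical placeholders.
    import matplotlib.pyplot as plt

    cpu_types = ["x86", "x86-64", "PowerPC", "ARM", "Other"]
    active_cpus = [52000, 310000, 9000, 15000, 4000]  # replace with collected data

    plt.bar(cpu_types, active_cpus)
    plt.title("Active CPUs by type (hypothetical data)")
    plt.xlabel("CPU type")
    plt.ylabel("Number of active CPUs")
    plt.tight_layout()
    plt.savefig("cpu_types.png")  # or plt.show() in an interactive session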