SENATE COMMITTEE ON CURRICULAR AFFAIRS
COURSE SUBMISSION AND CONSULTATION FORM

Principal Faculty Member Proposing Course: Aaron Mauro
College: BEHREND COLLEGE
Department or Instructional Area: HUMANITIES AND SOCIAL SCIENCES
College/Academic Unit With Curriculum Responsibility: BEHREND COLLEGE
Type of Proposal: Add
Type of Review: Full (See Guide to Curricular Procedure for definitions of a full or expedited review.)
Course Designation: (DIGIT 210) Large Scale Text Analysis

Special categories for Undergraduate (001-499) courses
Current listings for existing courses are in bold type. Proposed changes are indicated by the checkboxes.

Proposed Bulletin Listing
Abbreviation: DIGIT
Number: 210
Title: Large Scale Text Analysis
Abbreviated Title: TextAnalysis
Credits: Min: 3 Max: 3
Repeatable: No
Course teaches students programmatic and algorithmic techniques and tools for accessing and analyzing unstructured text.
Prerequisites: DIGIT 100
Concurrent Courses:
Cross Listings:
Does this Course have a Travel Component: No

Description

Course Outline

A brief outline or overview of the course content

The scale of the Internet has fundamentally changed the scope of humanities research. The vast swaths of digital text available represent a tremendous opportunity for humanities researchers, but the majority of the text on the web remains unstructured or structured in a way that is not meaningful for academic research. DIGIT 210 teaches students programmatic and algorithmic techniques and tools for accessing and analyzing unstructured text. This class is a skills intensive and collaborative learning opportunity for students to build technical ability in a team setting. Students will learn how to manage, manipulate, and query literary or historical texts with a range of tools and techniques. The methods taught will evolve in parallel with the best practices of the digital humanities and technical resources available at each campus in which it is taught.
Invariably, however, students will be introduced to a high-level programming language (i.e., interpreted rather than compiled) with an emphasis on text analysis. For example, the R Studio platform, the Python Text Analysis workflow developed by DARIAH-DE (Germany’s Digital Research Infrastructure for the Arts and Humanities), or Apache’s HiveQL (Hive Query Language) alongside the Hadoop Distributed File System (HDFS) are just three methods well suited to humanities research. As tools and techniques for analyzing large quantities of audio, video, and still images improve and become accessible to non-specialists, the range of unstructured content can evolve alongside this technological development. This course will emphasize high-level programming languages such as Python, Ruby, or Perl. However, Natural Language Processing software like the well-known Machine Learning for Language Toolkit (MALLET), Natural Language ToolKit (NLTK), or Weka also represents an opportunity to apply sophisticated Natural Language Processing and Machine Learning tools to humanistic inquiry. Students will work to collaborate and troubleshoot technical problems in groups and learn to access web-based forums and communities to solve real-world development problems. Twitter and the class blog will be key forms of participation and will allow students to generate content and flag concerns within the class. Through a theoretical framing of computation in a humanities context, students will be asked to propose solutions to humanities-based problems. Cultural, historical, and literary issues related to race, class, and/or gender will guide our readings of, in Franco Moretti’s words, “the great unread” digital content on the web. The content of the course will vary with the specific expertise of the instructor and the archival, literary, or historical resources available to individual campuses.
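As a minimal illustration of the kind of programmatic text analysis described above, the following Python sketch counts the most frequent words in a short passage. It uses only the Python standard library. The sample sentence, the helper names `tokenize` and `top_words`, and the tokenization rule are all hypothetical choices made for illustration and are not drawn from the course materials.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters as tokens.

    Internal apostrophes are kept, so "don't" stays one token.
    This rule is an illustrative assumption, not a course standard.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_words(text, n=3):
    """Return the n most frequent tokens paired with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample passage; a real project would read a corpus from disk.
sample = "The whale, the white whale, was the object of the hunt."
print(top_words(sample))
```

In a project of the scale the course envisions, the same counting step would likely be applied to each file in a corpus, and a toolkit such as NLTK could replace the simple regular-expression tokenizer.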
Students will extend their critical reflections on culture and technology with a hands-on and project-oriented engagement with the issues in class. While students will all have a shared theoretical and methodological knowledge that will be established during this class lecture and lab time, students will each work to develop a refined understanding of large scale text analysis. This course will challenge students to experiment with new techniques and put cultural theory into practice by generating new datasets.

A listing of the major topics to be covered with an approximate length of time allotted for their discussion

Programming for Humanists Course Overview
-The history of distant reading and the digital humanities
-Theory and practice of algorithmic criticism
-Evolution of programming languages and technical specifications
-Access development forums and communities to support open source technologies

Workflow Testing and Planning
-Humanities database research (Gutenberg, HathiTrust, Internet Archive, JSTOR DFR, etc.) proposal
-Group technical skills evaluation and feasibility study
-Planning visualizations and analysis

DH Practice and Tool Use
-A selected workflow from the following list (or others, as they become available): Python’s NLTK, MALLET, Weka, R Studio, HiveQL and HDFS for large scale unstructured text analysis
-Understanding Data types and data visualization

Critical Assessment of Prototypes
-Reflection on Successes and Failures
-Peer assessment and forum support analysis
-Contribute workflow to open source development communities
-Holistic class assessment to identify synergies

Example Course Timeline:

Week One - History of Computing Text
-Navigating the Command Line
-UNIX
-Introduction to regex

Week Two - UNIX Programs
-grep
-wc
-tail/head
-awk

Week Three - Introduction to Python I
-What is a program?
-Debugging in Python
-Syntax
-Python Interpreter

Week Four - Python II: Programming Methodology and Asking Questions of a Large Corpus
-Variables, Expressions, and Statements
-String Operations
-Conditionals and Recursion
-Modulus operators, Boolean expressions, and Logical Operators

Week Five - Python III: Data Structure and Selection
-Structuring Humanistic Information and Anticipating Outcomes
-Lists and Dictionaries

Week Six - Python IV: Introduction to the Natural Language ToolKit
-Language Processing and Python
-Accessing Text Corpora and Lexical Resources

Week Seven - Python V: Processing Raw Text
-Preprocessing

Week Eight - Python VI: Writing Structured Programs with NLTK

Week Nine - Python VII: Categorizing and Tagging Words

Week Ten - Python VIII: Learning to Classify Text
-Extracting Information from Text

Week Eleven - Python IX: Analyzing Sentence Structure
-Building Feature Based Grammars

Week Twelve - Python X: Analyzing the Meaning of Sentences

Week Thirteen - Python XI: Managing Linguistic Data

Week Fourteen - Python XII: Visualization Tools

Week Fifteen - Putting it all Together: Presenting Large Scale Text Analysis Projects

Long Course Description: A succinct stand-alone course description (up to 400 words) to be made available to students through the on-line Bulletin and Schedule of Courses.

The humanities are undergoing a computational turn. The traditional theories and methods that underwrote the study of literature, history, and philosophy in the 20th century are now being supplemented with an emphasis on new computational methods and practices. Because of the proliferation of text on the Internet, this course will teach computational methods developed by digital humanists and computer scientists for analyzing large collections of text. Students will be introduced to Natural Language Processing and Machine Learning techniques for understanding large collections of text and produce argumentative data visualizations. This class will be conducted in a collaborative lab environment.
Intended for students who have completed DIGIT 100, this course will build upon previous tool-based digital humanities practice by allowing students to propose projects and complete them during class time. Because of the methodological orientation of the course, the readings will be derived from online community forums for other developers and programmers. Additionally, some readings will be generated by the class as the most current online resources are identified. This class assumes that students will possess a basic understanding of markup languages (XML and HTML), UNIX commands, and Regular Expressions (RegEx) gained in DIGIT 100. Beyond this prerequisite, the only other technical requirement is curiosity and willingness to work diligently to solve technical problems.

The name(s) of the faculty member(s) responsible for the development of the course
Aaron Mauro

Justification Statement

Instructional, Educational, and Course Objectives

Course Objectives: This course is the practical and skills-based extension of the theoretical and cultural training in DIGIT 100. This course seeks to guide students to develop programming projects that can be used to answer and complement humanities-based questions and critiques. Instructional goals and educational objectives include:

1) Goal: To teach high-level interpreted programming languages in a humanities context
Educational objectives: Students should be able to...
-have a firm basis in programming methods and best practices.
-use NLP and Machine Learning to gain access to unstructured text on the web.
-use appropriate technical terms to describe humanities and computing problems.
-find and evaluate development community forums and resources.
-understand the context of technological development of DH.

2) Goal: To encourage students to envision and plan experiments with large cultural corpora
Educational objectives: Students should be able to...
-analyze political and cultural problems with computational techniques.
-persuasively blend a variety of media and tools in academic arguments.
-situate specific examples of technology within historical contexts.
-describe how making or doing things represents critical thought.

3) Goal: To prepare students to contribute meaningfully to the digital humanities discourse
Learning objectives: Students should be able to...
-plan, outline, draft, revise, and edit a critical analysis of large scale textual analysis projects.
-write a purposeful essay on their use of technology.
-discuss technology and culture with sophisticated, appropriate, and persuasive language.

Evaluation Methods

1) Ongoing Blog Posts: Each week, students will write a short post to the class blog about a reading or project that they find interesting or useful. The posts may be as long as the students like, but a substantial contribution will likely be 100 to 200 words in length. Students may also consider including links or other content to share with the class.

2) Response Blog Post: Students are required to respond to a classmate’s blog post once during the semester. Students must respond with academic professionalism, critical insight, encouragement, and support. This is a forum for students to commiserate, congratulate, and postulate. Students may answer questions that have been asked or ask questions that need asking. The instructor will moderate or interject if needed. Students may comment on other comments.

3) Attendance and Participation: This class is designed to give students the opportunity to build a prototype as an expression of creative or critical thought. The course will place an emphasis on using digital tools for cultural critique, but will also place an emphasis on collaboration. Students are expected to attend class, to be on time, and to be ready to engage with class material.

4) Prototype Proposal: Students will propose a digital project by accounting for its purpose, method, feasibility, and proposed outcomes.
They will select a programming language, find or create relevant data for analysis, and propose the best method to answer a research question.

5) Humanities-based Prototype: Large projects can be completed independently or in groups. Each group will consult with the instructor in person on an ongoing basis to discuss the breadth and direction of the project. The size and scope of the project will be proportional to the number of members in the group. Students working in groups must also submit a short email detailing their experiences in the group. These comments are private and are meant to offer a space to reflect on the collaborative experience. The final assignment will be submitted in the form of a functioning code package complete with a "README" file describing the outcomes of the project. All code will be validated according to current standards and in-line commenting will be assessed for clarity and accuracy.

Relationship/Linkage of Course to Other Courses

DIGIT 210 is the introductory programming skills and data curation based extension of DIGIT 100 and DIGIT 110. Students will extend the practical and methodological basis of digital humanities research by taking on large scale cultural analysis. Students will learn to parse and query the unstructured text that makes up great swaths of the cultural content on the web. DIGIT 210 will teach students how to handle cultural information through the same open source tools and software used by developers and programmers. DIGIT 210 offers students algorithmic methods of accessing and querying unstructured text through programming languages, libraries, and software packages.

Relationship of Course to Major, Option, Minor, or General Education

DIGIT 210 may have broad appeal across the campus. It is particularly suited to cultural critics and historians interested in contemporary digital culture. However, this course would be an asset for anyone dealing with large unstructured datasets.
For example, these skills would be well suited to anyone with plans to enter law or business school.

A description of any special facilities
Multi-User Computer Lab

Frequency of Offering and Enrollment
Annually

Effective Date: Fall 2015

Consultation Summary/Response:
This final note describes the ways we have addressed the non-concur votes in the consultation process.
1) As Lynette Kvasny mentions in her comment, she felt unqualified to assess this class. She requested to be removed from the consultation process, but the CSCS system does not allow for this to occur once the review process has begun.
2) Scott Bennett expressed several concerns in his comments. I addressed each of his concerns in my response, and I updated the proposals to include a course content completion timeline. The timeline can be found in section B.2 above.

Formal Consultation

(1)
Name: Lynette Kvasny
Position: Formal Consultant
Title: ASSOC PROF OF IST
Department: INFO SCIENCES & TECH
Campus: UNIVERSITY PARK CAMPUS
Concur: No, This Proposal Needs Significant Changes
Comments: I do not have the expertise to evaluate this course, and would like to abstain from reviewing. Please remove me as a reviewer.
Reviewed On: 9/9/2014 12:49:00 PM
Response: On 9/16/2014 4:03:22 PM Aaron Mauro Responded:
Dear Lynette,
No problem. Thank you for your time thus far. We are working to have you taken out of the system for these courses.
All best,
Aaron

(2)
Name: Rod Troester
Position: Formal Consultant
Title: associate professor
Department: DIV HUMAN & SOC SCI
Campus: PENN STATE ERIE, THE BEHREND COLLEGE
Concur: Yes
Comments:
Reviewed On: 9/11/2014 10:49:00 AM

(3)
Name: Scott Bennett
Position: Formal Consultant
Title: Distinguished Professor OF POL SCIENCE
Department: POLITICAL SCIENCE
Campus: UNIVERSITY PARK CAMPUS
Concur: No, This Proposal Needs Significant Changes
Comments:
-- There is no time allotment for the different parts of the course.
I cannot tell if the content and volume of material is appropriate for a semester course.
-- I do not understand why this (primarily) technical course is part of the Humanities/Social Sciences department. Shouldn't this be a computer science course? Large scale text analysis as a topic is something being done in computer science and social science departments, at a much more technical/programming level.
-- It seems like at one point there might have been an earlier title (buried in the proposal) of "Programming for Humanists." That might be a more accurate title. Or "text analysis for humanists." If it is just text analysis, that seems not humanities specific.
-- The outline looks like it is primarily technical/technique. However, if the course is teaching the technical side, then why is it for humanists? The description talks about using the methods to study race, gender, etc., but there is nothing in the outline to suggest (say) a final project tied to a humanities discipline, or a comparative analysis of text, say.
-- More detail of the outline should be provided, with time, and a clearer link between the technical content and why it is a humanities course in the outline or assignments would help clarify the course.
Reviewed On: 9/17/2014 11:27:00 AM
Response: On 9/21/2014 1:12:30 PM Aaron Mauro Responded:
Dear Scott,
Thank you for your comment. Let me take the opportunity to offer some additional material for you to consider. Naturally, my hope is that you’ll revise your non-concur of our major program. I'll list my response to your concerns below. If you have any questions, please feel free to email me directly (mauro@psu.edu) or call my office line (814-898-6394).

1) How can we be sure there is enough time?: This course has been modeled on several long running programs in the digital humanities community. I have several years of experience working with the Digital Humanities Summer Institute.
It is the longest running DH training center in the world and has had a transformative influence on the field. You can find the website and course descriptions here: http://dhsi.org/courses.php. I suggest you look through the "Text Encoding Fundamentals and their Application" and the "Fundamentals of Programming/Coding for Human(s|ists)." The form and timing of these courses (and courses like them taught all over the world) have been well calibrated.

2) Why are humanists teaching methodologically focused courses?: The digital humanities has always been methodologically focused. Methodological overlap with other disciplines (e.g., Computer Science) does not preclude the use of computation in the humanities. Computational methods must be taught in a humanities context because the problems solved with these tools are very different from those in Computer Science, Physics, or Mathematics. Simply put, the way we approach problems and find solutions requires humanistic expertise, and the most competent programmer/encoder cannot answer questions in the humanities without robust training in humanistic critique. While it is true that the humanities has long been invested in the practice of close reading and critical writing, the humanities makes no claim upon them as methodological practices within the university and acknowledges that other disciplines have specific use cases for reading and writing. The artificial divisions between technical fields and fields with soft skills (like cultural critique) are precisely the discursive divisions we hope to break down. As an example of how this is functioning today, I will list several examples of successful classes below.

3) Shouldn't the title include "the humanities" somewhere?: If you should require examples of other courses taught by leaders in the field, I would recommend you look to Stephen Ramsay's "Digital Humanities: Development and Design" at UNL: http://jetson.unl.edu/syllabi/2014/fall/dh/index.html.
Ramsay's course is an excellent model for the kind of work we will be doing in DIGIT 210. You may also wish to consult Laura Mandell's excellent TEI/XSLT course at TAMU. You can find the link here: http://idhmc.tamu.edu/chat/programming4HUManists/XSLTClassSchedule.html. You'll see that her course, which is one of the first of its kind, is indeed called Programming for Humanists. As has been common in the field, many have given nods to Dr. Mandell for her pioneering work. I included the reference out of respect and a sense of honoring our discursive legacy. While I appreciate that the title of the course can be cause for concern, its place within the School of Humanities and Social Science will distinguish it from overlapping courses in CS. In all honesty, adding "for humanists" to the title seemed redundant, but it may prove necessary to respect a more conservative definition of the faculties.

4) Why is the course lacking detailed description of content?: Like all courses in the humanities, the actual selection of texts is largely dependent on the instructor. Realistically, any moderately sized corpus would suffice for students to encode in TEI or analyze with computational methods. An instructor may have an interest in the journalistic output after the 9/11 attacks, the letters of Ralph Waldo Emerson, or Shakespeare's later romances. In any case, the methodological basis of their research would remain fairly similar. The questions asked would vary depending on the particular instructor's expertise. While TEI has a more rigid schema, the ontologies (that is, the plan by which researchers mark up the text) could vary a great deal. An emphasis on gender may be appropriate for a study of Shakespeare, whereas an emphasis on tagging vocabulary relating to racial profiling and political jargon may be important with regard to the journalistic output after 9/11.
As I hope you can see, an overly prescriptive course description may limit the natural breadth and interdisciplinarity of these courses.

Finally, Scott, I want to welcome your comments on my responses. The digital humanities is a field that contains multitudes. There are multitudes of methods, and multitudes of research topics. I suspect it is so widely misunderstood because it is simply so variable. At its heart, however, is a simple and unyielding desire to leverage computational tools to answer humanistic questions. As I mentioned at the opening of this response, please feel free to email me directly (mauro@psu.edu) or call my office line (814-898-6394).
Kind regards,
Aaron

(4)
Name: Mary Beth Rosson
Position: Formal Consultant
Title: PROFESSOR AND ASSOC DEAN INFO SCI & TEC
Department: INFO SCIENCES & TECH
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments:
Reviewed On: 9/22/2014 11:17:00 AM

(5)
Name: Christopher Long
Position: Formal Consultant
Title: ASSOC DEAN FOR GR and UG Education
Department: LIBERAL ARTS ADMINISTRATION
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM

(6)
Name: Graeme Sullivan
Position: Formal Consultant
Title: DIRECTOR
Department: SCHOOL OF VISUAL ART
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM

(7)
Name: Maura Shea
Position: Formal Consultant
Title: Assoc. Dept Head, F-V & MS
Department: FILM/VIDEO
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM

(8)
Name: Mariel O Harden
Position: Formal Consultant
Title:
Department:
Campus:
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM

(9)
Name: Meng Su
Position: Formal Consultant
Title: ASSOC PROF CMPSC/SFTW EN
Department: THE SCHOOL OF ENGINEERING
Campus: BEHREND
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM

(10)
Name: Matthew Jackson
Position: Formal Consultant
Title: ASSOC PROF DEP HD TELECOM
Department: COMMUNICATIONS
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM

(11)
Name: Robert Speel
Position: Formal Consultant
Title: ASSOC PROF POL SCI
Department: DIV HUMAN & SOC SCI
Campus: PENN STATE ERIE, THE BEHREND COLLEGE
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM

(12)
Name: Rob Speel
Position: Per Request of College Administrator
Title: ASSOC PROF POL SCI
Department: DIV HUMAN & SOC SCI
Campus: PENN STATE ERIE, THE BEHREND COLLEGE
Concur: Yes
Comments: The School of Humanities and Social Sciences Academic Program and Policy Committee recommended some revisions to an earlier version of this proposal, and the recommended revisions have been made. The Committee unanimously approves this proposal.
Reviewed On: 11/12/2014 12:32:00 AM

Required Signatories

Name: Steven Hicks
Position: Head of Department
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed

Name: Rodney Troester
Position: College Representative
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed

Name: Dawn Blasko
Position: Dean of the College
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed

Name: [Name Not Specified]
Position: Faculty Senate
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed

Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed

Bluebook Number:
Approval Date:
ProposalID: 19800