https://cms.psu.edu/profile/PSUCPS/cpsPrintable.asp?cpid=19800

SENATE COMMITTEE ON CURRICULAR AFFAIRS
COURSE SUBMISSION AND CONSULTATION FORM
Principal Faculty Member Proposing Course: Aaron Mauro
College: BEHREND COLLEGE
Department or Instructional Area: HUMANITIES AND SOCIAL SCIENCES
College/Academic Unit With Curriculum Responsibility: BEHREND COLLEGE
Type of Proposal: Add
Type of Review: Full
(See Guide to Curricular Procedure for definitions of a full or expedited review.)
Course Designation: (DIGIT 210) Large Scale Text Analysis
Special categories for Undergraduate (001-499) courses
Current listings for existing courses are in bold type. Proposed changes are indicated by the checkboxes.
Proposed Bulletin Listing
Abbreviation
: DIGIT
Number
: 210
Title
: Large Scale Text Analysis
Abbreviated Title
: TextAnalysis
Credits
: Min: 3 Max: 3
Repeatable
: No
Description
: Course teaches students programmatic and algorithmic techniques and tools for accessing
and analyzing unstructured text.
Prerequisites
: DIGIT 100
Concurrent Courses
:
Cross Listings
:
Does this Course have a Travel Component: No
Course Outline
A brief outline or overview of the course content
The scale of the Internet has fundamentally changed the scope of humanities research. The vast swaths of digital
text available represent a tremendous opportunity for humanities researchers, but the majority of the text on the
web remains unstructured or structured in a way that is not meaningful for academic research. DIGIT 210 teaches
students programmatic and algorithmic techniques and tools for accessing and analyzing unstructured text. This
class is a skills-intensive and collaborative learning opportunity for students to build technical ability in a team
setting. Students will learn how to manage, manipulate, and query literary or historical texts with a range of tools
and techniques. The methods taught will evolve in parallel with the best practices of the digital humanities and
technical resources available at each campus where it is taught. Invariably, however, students will be introduced
to a high-level, interpreted (rather than compiled) programming language with an emphasis on text analysis.
For example, RStudio, the Python Text Analysis workflow developed by DARIAH-DE (Germany’s
Digital Research Infrastructure for the Arts and Humanities), or Apache’s HiveQL (Hive Query Language)
alongside the Hadoop Distributed File System (HDFS) are just three methods well suited to humanities research.
As tools and techniques for analyzing large quantities of audio, video, and still images improve and become
accessible to non-specialists, the range of unstructured content can evolve alongside this technological development.
This course will emphasize high-level programming languages such as Python, Ruby, or Perl. However, Natural
Language Processing software like the well-known Machine Learning for Language Toolkit (MALLET), Natural
Language Toolkit (NLTK), or Weka also represents an opportunity to apply sophisticated Natural Language
Processing and Machine Learning tools to humanistic inquiry. Students will collaborate to troubleshoot
technical problems in groups and learn to use web-based forums and communities to solve real-world
development problems. Twitter and the class blog will be key forms of participation and will allow students to
generate content and flag concerns within the class. Through a theoretical framing of computation in a humanities
context, students will be asked to propose solutions to humanities-based problems. Cultural, historical, and literary
issues related to race, class, and/or gender will guide our readings of, in Franco Moretti’s words, “the great unread”
digital content on the web. The content of the course will vary with the specific expertise of the instructor and the
archival, literary, or historical resources available to individual campuses. Students will extend their critical
reflections on culture and technology with a hands-on and project-oriented engagement with the issues in class.
While all students will share the theoretical and methodological knowledge established during
class lecture and lab time, each student will work to develop a refined understanding of large scale text analysis.
This course will challenge students to experiment with new techniques and put cultural theory into practice by
generating new datasets.
A listing of the major topics to be covered with an approximate length of time allotted for their discussion
Programming for Humanists
Course Overview
The history of distant reading and the digital humanities
Theory and practice of algorithmic criticism
Evolution of programming languages and technical specifications
Access development forums and communities to support open source technologies
Workflow Testing and Planning
Humanities database research proposal (Project Gutenberg, HathiTrust, Internet Archive, JSTOR DFR, etc.)
Group technical skills evaluation and feasibility study
Planning visualizations and analysis
DH Practice and Tool Use
A selected workflow from the following list (or others, as they become available): Python’s NLTK, MALLET,
Weka, RStudio, HiveQL and HDFS for large scale unstructured text analysis
Understanding data types and data visualization
Critical Assessment of Prototypes
Reflection on Successes and Failures
Peer assessment and forum support analysis
Contribute workflow to open source development communities
Holistic class assessment to identify synergies
Example Course Timeline:
Week One—History of Computing Text
Navigating the UNIX Command Line
Introduction to regex
Week Two—UNIX Programs
grep
wc
tail/head
awk
Week Three—Introduction to Python I
What is a program?
Debugging in Python
Syntax
Python Interpreter
Week Four—Python II: Programming Methodology and Asking Questions of a Large Corpus
Variables, Expressions, and Statements
String Operations
Conditionals and Recursion
Modulus operators, Boolean expressions, and Logical Operators
Week Five—Python III: Data Structure and Selection
Structuring Humanistic Information and Anticipating Outcomes
Lists and Dictionaries
Week Six—Python IV: Introduction to the Natural Language ToolKit
Language Processing and Python
Accessing Text Corpora and Lexical Resources
Week Seven—Python V: Processing Raw Text
Preprocessing
Week Eight—Python VI: Writing Structured Programs with NLTK
Week Nine—Python VII: Categorizing and Tagging Words
Week Ten—Python VIII: Learning to Classify Text
Extracting Information from Text
Week Eleven—Python IX: Analyzing Sentence Structure
Building Feature Based Grammars
Week Twelve—Python X: Analyzing the Meaning of Sentences
Week Thirteen—Python XI: Managing Linguistic Data
Week Fourteen—Python XII: Visualization Tools
Week Fifteen—Putting it all Together: Presenting Large Scale Text Analysis Projects
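
As an illustration of the kind of exercise the early Python weeks (string operations, lists, and dictionaries) might lead to, the following is a minimal sketch that counts word frequencies in a passage. It assumes only Python's standard library; the function name `word_frequencies`, the simple tokenizing pattern, and the sample sentence are hypothetical and are not taken from the proposal.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=2):
    """Return the top_n most common lowercase word tokens in text."""
    # Deliberately simple tokenizer (an assumption for illustration):
    # a token is any run of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each token; most_common sorts by descending count.
    return Counter(tokens).most_common(top_n)


# Hypothetical sample text standing in for a much larger corpus.
sample = "The whale, the sea, and the ship: the whale returns."
print(word_frequencies(sample))
```

A real project would likely replace the regular expression with a dedicated tokenizer such as the one in NLTK and read its input from a corpus file rather than a string literal.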
Long Course Description:
A succinct stand-alone course description (up to 400 words) to be made available to students through the on-line Bulletin
and Schedule of Courses.
The humanities are undergoing a computational turn. The traditional theories and methods that underwrote the
study of literature, history, and philosophy in the 20th century are now being supplemented with an emphasis on
new computational methods and practices. Because of the proliferation of text on the Internet, this course will teach
computational methods developed by digital humanists and computer scientists for analyzing large collections of
text. Students will be introduced to Natural Language Processing and Machine Learning techniques for
understanding large collections of text and will produce argumentative data visualizations. This class will be conducted
in a collaborative lab environment. Intended for students who have completed DIGIT 100, this course will build
upon previous tool-based digital humanities practice by allowing students to propose projects and complete them
during class time. Because of the methodological orientation of the course, the readings will be derived from online
community forums for other developers and programmers. Additionally, some readings will be generated by the
class as the most current online resources are identified. This class assumes that students possess a basic
understanding of markup languages (XML and HTML), UNIX commands, and Regular Expressions (RegEx)
gained in DIGIT 100. Beyond this prerequisite, the only other technical requirement is curiosity and willingness to
work diligently to solve technical problems.
The name(s) of the faculty member(s) responsible for the development of the course
Aaron Mauro
Justification Statement
Instructional, Educational, and Course Objectives
Course Objectives:
This course is the practical, skills-based extension of the theoretical and cultural training in DIGIT 100. It
guides students in developing programming projects that can be used to answer and complement
humanities-based questions and critiques.
Instructional goals and educational objectives include:
1) Goal: To teach high-level interpreted programming languages in a humanities context
Educational objectives: Students should be able to...
-demonstrate a firm basis in programming methods and best practices.
-use NLP and Machine Learning to gain access to unstructured text on the web.
-use appropriate technical terms to describe humanities and computing problems.
-find and evaluate development community forums and resources.
-understand the context of technological development of DH.
2) Goal: To encourage students to envision and plan experiments with large cultural corpora
Educational objectives: Students should be able to...
-analyze political and cultural problems with computational techniques.
-persuasively blend a variety of media and tools in academic arguments.
-situate specific examples of technology within historical contexts.
-describe how making or doing things represents critical thought.
3) Goal: To prepare students to contribute meaningfully to the digital humanities discourse
Educational objectives: Students should be able to...
-plan, outline, draft, revise, and edit a critical analysis of large scale textual analysis projects.
-write a purposeful essay on their use of technology.
-discuss technology and culture with sophisticated, appropriate, and persuasive language.
Evaluation Methods
1) Ongoing Blog Posts:
Each week, students will write a short post to the class blog about a reading or project that they find interesting
or useful. The posts may be as long as the students like, but a substantial contribution will likely be 100 to 200 words
in length. Students may also consider including links or other content to share with the class.
2) Response Blog Post:
Students are required to respond to a classmate’s blog post once during the semester. Students must respond with
academic professionalism, critical insight, encouragement, and support. This is a forum for students to
commiserate, congratulate, and postulate. Students may answer questions that have been asked or ask
questions that need asking. The instructor will moderate or interject if needed. Students may comment on other
comments.
3) Attendance and Participation:
This class is designed to give students the opportunity to build a prototype as an expression of creative or critical
thought. The course will place an emphasis on using digital tools for cultural critique, but will also place an
emphasis on collaboration. Students are expected to attend class, to be on time, and to be ready to engage with class
material.
4) Prototype Proposal:
Students will propose a digital project by accounting for its purpose, method, feasibility, and proposed outcomes.
They will select a programming language, find or create relevant data for analysis, and propose the best method to
answer a research question.
5) Humanities-Based Prototype:
Large projects can be completed independently or in groups. Each group will consult with the instructor in person
on an ongoing basis to discuss the breadth and direction of the project. The size and scope of the project will be
proportional to the number of members in the group. Students working in groups must also submit a short email
detailing their experiences in the group. These comments are private and are meant to offer a space to reflect on
the collaborative experience. The final assignment will be submitted in the form of a functioning code package
complete with a "README" file describing the outcomes of the project. All code will be validated according to
current standards, and in-line commenting will be assessed for clarity and accuracy.
Relationship/Linkage of Course to Other Courses
DIGIT 210 extends DIGIT 100 and DIGIT 110 with introductory programming skills and data curation.
Students will extend the practical and methodological basis of digital humanities research by taking on large
scale cultural analysis. Students will learn to parse and query the unstructured text that makes up great swaths of
the cultural content on the web. DIGIT 210 will teach students how to handle cultural information through the
same open source tools and software used by developers and programmers. DIGIT 210 offers students algorithmic
methods of accessing and querying unstructured text through programming languages, libraries, and software
packages.
Relationship of Course to Major, Option, Minor, or General Education
DIGIT 210 may have broad appeal across the campus. It is particularly suited to cultural critics and historians
interested in contemporary digital culture. However, this course would be an asset for anyone dealing with large
unstructured datasets. For example, these skills would be well suited to anyone planning to enter law or
business school.
A description of any special facilities
Multi-User Computer Lab
Frequency of Offering and Enrollment
Annually
Effective Date: Fall 2015
Consultation Summary/Response:
This final note describes the ways we have addressed the non-concur votes in the consultation process.
1) As Lynette Kvasny mentions in her comment, she felt unqualified to assess this class. She requested
to be removed from the consultation process, but the CSCS system does not allow for this to occur once the review
process has begun.
2) Scott Bennett expressed several concerns in his comments. I addressed each of his concerns in my response, and I
updated the proposal to include a course content completion timeline. The timeline can be found in section B.2 above.
Formal Consultation
(1)
Name: Lynette Kvasny
Position: Formal Consultant
Title: ASSOC PROF OF IST
Department: INFO SCIENCES & TECH
Campus: UNIVERSITY PARK CAMPUS
Concur: No, This Proposal Needs Significant Changes
Comments: I do not have the expertise to evaluate this course, and would like to abstain from reviewing.
Please remove me as a reviewer.
Reviewed On: 9/9/2014 12:49:00 PM
Response: On 9/16/2014 4:03:22 PM Aaron Mauro Responded: Dear Lynette, No problem. Thank you for your time
thus far. We are working to have you taken out of the system for these courses. All best, Aaron
(2)
Name: Rod Troester
Position: Formal Consultant
Title: associate professor
Department: DIV HUMAN & SOC SCI
Campus: PENN STATE ERIE, THE BEHREND COLLEGE
Concur: Yes
Comments:
Reviewed On: 9/11/2014 10:49:00 AM
(3)
Name: Scott Bennett
Position: Formal Consultant
Title: Distinguished Professor OF POL SCIENCE
Department: POLITICAL SCIENCE
Campus: UNIVERSITY PARK CAMPUS
Concur: No, This Proposal Needs Significant Changes
Comments: -- There is no time allotment for the different parts of the course. I cannot tell if the content and
volume of material is appropriate for a semester course.
-- I do not understand why this (primarily) technical course is part of the Humanities/Social Sciences
department. Shouldn't this be a computer science course? Large scale text analysis as a topic is something
being done in computer science and social science departments, at a much more technical/programming level.
-- It seems like at one point there might have been an earlier title (buried in the proposal) of "Programming for
Humanists." That might be a more accurate title. Or "text analysis for humanists." If it is just text analysis, that
seems not humanities specific.
-- The outline looks like it is primarily technical/technique. However, if the course is teaching the technical
side, then why is it for humanists? The description talks about using the methods to study race, gender, etc.,
but there is nothing in the outline to suggest (say) a final project tied to a humanities discipline, or a
comparative analysis of text, say.
-- More detail of the outline should be provided, with time, and a clearer link between the technical content
and why it is a humanities course in the outline or assignments would help clarify the course.
Reviewed On: 9/17/2014 11:27:00 AM
Response: On 9/21/2014 1:12:30 PM Aaron Mauro Responded:
Dear Scott, Thank you for your comment. Let me take the opportunity to offer some additional material for you to
consider. Naturally, my hope is that you’ll revise your non-concur of our major program. I'll list my responses to
your concerns below. If you have any questions, please feel free to email me directly (mauro@psu.edu) or call my
office line (814-898-6394).
1) How can we be sure there is enough time?: This course has been modeled on several long-running programs in the
digital humanities community. I have several years of experience working with the Digital Humanities Summer
Institute. It is the longest running DH training center in the world and has had a transformative influence on
the field. You can find the website and course descriptions here: http://dhsi.org/courses.php. I suggest you
look through the "Text Encoding Fundamentals and their Application" and the "Fundamentals of
Programming/Coding for Human(s|ists)." The form and timing of these courses (and courses like them taught
all over the world) have been well calibrated.
2) Why are humanists teaching methodologically focused courses?: The digital humanities has always been
methodologically focused. The mere fact that there is methodological overlap with other disciplines (e.g.,
Computer Science) does not preclude the use of computation in the humanities. Computational methods must be
taught in a humanities context because the problems solved with these tools are very different from those in
Computer Science, Physics, or Mathematics. Simply put, the way we approach problems and find solutions requires
humanistic expertise, and the most competent programmer/encoder cannot answer questions in the humanities
without robust training in humanistic critique. While it is true that the humanities has long been invested in
the practice of close reading and critical writing, the humanities makes no claim upon them as methodological
practices within the university and acknowledges that other disciplines have specific use cases for reading and
writing. The artificial divisions between technical fields and fields with soft skills (like cultural critique)
are precisely the discursive divisions we hope to break down. As an example of how this is functioning today, I
will list several examples of successful classes below.
3) Shouldn't the title include "the humanities" somewhere?: If you should require examples of other courses
taught by leaders in the field, I would recommend you look to Stephen Ramsay's "Digital Humanities:
Development and Design" at UNL: http://jetson.unl.edu/syllabi/2014/fall/dh/index.html. Ramsay's course is an
excellent model for the kind of work we will be doing in DIGIT 210. You may also wish to consult Laura
Mandell's excellent TEI/XSLT course at TAMU. You can find the link here:
http://idhmc.tamu.edu/chat/programming4HUManists/XSLTClassSchedule.html. You'll see that her course,
which is one of the first of its kind, is indeed called Programming for Humanists. As has been common in the
field, many have given nods to Dr. Mandell for her pioneering work. I included the reference out of respect
and a sense of honoring our discursive legacy. While I appreciate that the title of the course can be cause for
concern, its place within the School of Humanities and Social Science will distinguish it from overlapping
courses in CS. In all honesty, adding "for humanists" to the title seemed redundant, but it may prove necessary
to respect a more conservative definition of the faculties.
4) Why is the course lacking detailed description of content?: Like all courses in the humanities, the actual
selection of texts is largely dependent on the instructor. Realistically, any moderately sized corpus would
suffice for students to encode in TEI or analyze with computational methods. An instructor may have an interest
in the journalistic output after the 9/11 attacks, the letters of Ralph Waldo Emerson, or Shakespeare's later
romances. In any case, the methodological basis of their research would remain fairly similar. The questions
asked would vary depending on the particular instructor's expertise. While TEI has a more rigid schema, the
ontologies (that is, the plans by which researchers mark up the text) could vary a great deal. An emphasis on
gender may be appropriate for a study of Shakespeare, whereas an emphasis on tagging vocabulary relating to
racial profiling and political jargon may be important with regard to the journalistic output after 9/11. As I
hope you can see, an overly prescriptive course description may limit the natural breadth and interdisciplinarity
of these courses.
Finally, Scott, I want to welcome your comments on my responses. The digital humanities is a field that contains
multitudes. There are multitudes of methods, and multitudes of research topics. I suspect it is so widely
misunderstood because it is simply so variable. At its heart, however, is a simple and unyielding desire to
leverage computational tools to answer humanistic questions. As I mentioned at the opening of this response,
please feel free to email me directly (mauro@psu.edu) or call my office line (814-898-6394).
Kind regards, Aaron
(4)
Name: Mary Beth Rosson
Position: Formal Consultant
Title: PROFESSOR AND ASSOC DEAN INFO SCI & TEC
Department: INFO SCIENCES & TECH
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments:
Reviewed On: 9/22/2014 11:17:00 AM
(5)
Name: Christopher Long
Position: Formal Consultant
Title: ASSOC DEAN FOR GR and UG Education
Department: LIBERAL ARTS ADMINISTRATION
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM
(6)
Name: Graeme Sullivan
Position: Formal Consultant
Title: DIRECTOR
Department: SCHOOL OF VISUAL ART
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM
(7)
Name: Maura Shea
Position: Formal Consultant
Title: Assoc. Dept Head, F-V & MS
Department: FILM/VIDEO
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM
(8)
Name: Mariel O Harden
Position: Formal Consultant
Title:
Department:
Campus:
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM
(9)
Name: Meng Su
Position: Formal Consultant
Title: ASSOC PROF CMPSC/SFTW EN
Department: THE SCHOOL OF ENGINEERING
Campus: BEHREND
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM
(10)
Name: Matthew Jackson
Position: Formal Consultant
Title: ASSOC PROF DEP HD TELECOM
Department: COMMUNICATIONS
Campus: UNIVERSITY PARK CAMPUS
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM
(11)
Name: Robert Speel
Position: Formal Consultant
Title: ASSOC PROF POL SCI
Department: DIV HUMAN & SOC SCI
Campus: PENN STATE ERIE, THE BEHREND COLLEGE
Concur: Yes
Comments: (Approved By Default - Exceeded Two Week Time Limit)
Reviewed On: 9/24/2014 2:50:00 AM
(12)
Name: Rob Speel
Position: Per Request of College Administrator
Title: ASSOC PROF POL SCI
Department: DIV HUMAN & SOC SCI
Campus: PENN STATE ERIE, THE BEHREND COLLEGE
Concur: Yes
Comments: The School of Humanities and Social Sciences Academic Program and Policy Committee
recommended some revisions to an earlier version of this proposal, and the recommended revisions have been
made. The Committee unanimously approves this proposal.
Reviewed On: 11/12/2014 12:32:00 AM
Required Signatories
Name: Steven Hicks
Position: Head of Department
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed
Name: Rodney Troester
Position: College Representative
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed
Name: Dawn Blasko
Position: Dean of the College
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed
Name: [Name Not Specified]
Position: Faculty Senate
Title: (Not Available)
Department: (Not Available)
Campus: (Not Available)
Concur: Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed
Concur:Not Yet Reviewed
Comments: Not Yet Reviewed
Reviewed On: Not Yet Reviewed
Bluebook Number:
Approval Date:
ProposalID: 19800