DH - Professor C. Lee Giles

IST 511 Information Management: Information and Technology
Digital Humanities and Research Methods
Dr. C. Lee Giles
David Reese Professor, College of Information Sciences and Technology
Professor of Computer Science and Engineering
Professor of Supply Chain and Information Systems
The Pennsylvania State University, University Park, PA, USA
giles@ist.psu.edu
http://clgiles.ist.psu.edu
Special thanks to V. Ryabov,
Today
What are the digital humanities
What are research methods
– Qualitative
– Quantitative
– Computational
Last time:
• Digital libraries
• Scientometrics and bibliometrics
Tomorrow
Your research presentations
Digital Humanities
Other names for the digital humanities
• Computational humanities
• Computational archaeology
• Computational history
• Cultural informatics
• etc.
Digital Humanities
Humanities
What are the humanities?
Wikipedia
Stanford
National Endowment for the Humanities (NEH)
History of the Digital Humanities
Not that old: the 1940s, the start of digitization
“Digitus Dei est hic!” (“The finger of God is here!”)
http://www.corpusthomisticum.org/it/index.age
Hockey’s Consolidation
1970s to mid-1980s
New Developments
Mid-1980s to early 1990s
http://www.tei-c.org/index.xml
http://www.perseus.tufts.edu/hopper/
Web/Humanities 1.0:
From the few to the many
Web/Humanities 2.0:
From the many to the many
http://vos.ucsb.edu/
http://nines.org/
Film, Media, and Communication Studies
Film
Media
Communication
Cultural
Feminist
STS
http://www.manovich.net/
Explosion of new groups, communities, and subjects
The Disciplines
Literary Studies
Archeology
Art History
Classical Studies
History
Is this all?
The Arts
Archeology
http://www.cast.uark.edu/other/nps/nadb/
http://www.u.arizona.edu/~mlittler/
http://www.cast.uark.edu/
Art History and The Arts
http://www.vraweb.org/
http://users.ecs.soton.ac.uk/km/projs/vasari/
http://www.getty.edu
Classical Studies
• Obsolescence and Preservation
http://scriptorium.lib.duke.edu/papyrus/
http://nolli.uoregon.edu/rioni.html
Problematics
http://www.romereborn.virginia.edu/
History
http://valley.vcdh.virginia.edu/
http://ashp.cuny.edu
Accessibility
Teaching and Learning
New Learning Environments
New Subjects
New Pedagogies
Digital Disconnect
Literary Studies
What happens to literature and the “literary” in the age of digital technology?
http://www.emilydickinson.org/
http://www.rossettiarchive.org/
http://ted.streamguys.net/ted_rives_mockingbirds_2006.mp3
Thematic of Textuality vs. Visuality
Jerome McGann: writings on digital technology and literary studies written (and published) between 1993 and 2001. Episodes in the history of McGann's engagement with the intellectual opportunities offered by the interaction between computer power, digital technology, and literary …
Richard Mayer: For hundreds of years verbal messages have been the primary means of explaining ideas to learners. Although verbal learning offers a powerful tool for humans, this book explores ways of going beyond the purely verbal …
Teaching and Learning
Digital Disconnect
Electracy
Oral
Print
Electronic
Examples
http://www.vectorsjournal.org/issues/index.php?issue=5
http://vectors.usc.edu/issues/05_issue/bluevelvet/
http://www.vectorsjournal.org/index.php?page=7&projectId=86
Wayne State Digital
http://www.lib.wayne.edu/resources/digital_library/index.php
Social Networks
From Remediation to Convergence and Intermediation
HASTAC
http://www.hastac.org/
Digital Antiquity - Mission
An organization devoted to enhancing preservation of, and access to, digital records of archaeological investigations
– to permit scholars to more effectively create and communicate knowledge of the long-term human past;
– to enhance the management of archaeological resources; and
– to provide for the long-term preservation of irreplaceable records of archaeological investigations.
We’re Losing the Archaeological Record
Explosion of Digital Information
– >50,000 field projects/year, 1000s of databases
– Primary archaeological data is now “born digital”
Absence of Trusted Repositories
– Few institutions capable of long-term data curation
– Media on which data resides is treated as an artifact
– Standard work flows do not move digital data into trusted repositories
Fragility of Digital Data
– Media degradation & software obsolescence
– Loss of data semantics (metadata)
We need a trusted digital repository for archaeological documents and data
Digital Antiquity’s Repository:
tDAR - the Digital Archaeological Record
An online, trusted digital repository for archaeological data and documents that provides
– financial and social sustainability,
– long-term preservation of data & metadata,
– online discovery of, and access to, data and documents produced by archaeological projects,
– a web ingest interface to acquire metadata and accept user uploads of data
Scope
– targets digital products of ongoing research & legacy data
– focus on archival data (not continuously updated databases such as site files)
– Work of scholars in the US and the Americas more broadly
Digital Antiquity Builds on the ADS Model
The Archaeology Data Service (ADS) in the UK has a 10-year track record of success
– ADS is heavily staffed (ca. 10 FTE) and provides a high level of curation and a high-quality archive
– ADS provides a refined presentation layer for its projects
– ADS processes a relatively small number of projects (ca. 200) each year at a high unit cost
Digital Antiquity Diverges from ADS in Order to Scale to the US Situation
50,000 federally mandated cultural resource field projects are conducted each year in the US.
– tDAR aspires to capture the digital data and documents from a substantial fraction of them
Implies a different business model
Demands much heavier reliance on users to provide the metadata that make their data meaningful
Requires a user-friendly ingest interface for metadata acquisition and data upload
Prototype Ingest Interface
Preservation and Access Requirements
To maintain the utility of data, we must preserve the data (bits) on sustainable media, in a sustainable format, along with their semantics
– Existing coding keys and manuals are inadequate
Cannot require universal coding schemes
– We must employ ontologies to allow naive users to locate
relevant resources.
We must plan for integration of data that employ different
systematics.
– We must collect detailed database metadata (e.g., at the table,
column, and value level)
Need persistent URIs, DOIs
Metadata & Database Semantics
Standardization of original data on deposit is unacceptable
– We must capture, not transform, original semantics
– Digital coding sheets at dataset registration time
Our representation is not highly abstract but structured
by archaeological practice
On registration, the dataset creator
– associates database codes with dataset labels through a coding sheet
– and maps coding sheet labels to default (and possibly alternate) ontologies created by material-class experts
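The registration step above can be pictured as a two-stage lookup: raw code to the creator's label, then label to an expert ontology node. The sketch below is a minimal illustration, not tDAR's actual implementation; the `CodingSheet` class, the example codes, and the ontology node names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodingSheet:
    """Hypothetical coding sheet for one column of a deposited dataset."""
    column: str
    # Raw database codes -> the dataset creator's own labels (original semantics kept).
    code_to_label: dict[str, str]
    # Dataset labels -> nodes in a default ontology built by material-class experts.
    label_to_ontology: dict[str, str] = field(default_factory=dict)

    def label_for(self, code: str) -> str:
        # Codes without an entry are returned unchanged rather than "standardized".
        return self.code_to_label.get(code, code)

    def ontology_node_for(self, code: str) -> str | None:
        # None means the label has not been mapped to the ontology yet.
        return self.label_to_ontology.get(self.label_for(code))


sheet = CodingSheet(
    column="material",
    code_to_label={"1": "Chipped stone", "2": "Ground stone", "3": "Ceramic sherd"},
    label_to_ontology={
        "Chipped stone": "lithic/flaked",
        "Ground stone": "lithic/ground",
        "Ceramic sherd": "ceramic",
    },
)

for row in [{"material": "1"}, {"material": "3"}, {"material": "9"}]:
    code = row[sheet.column]
    print(code, sheet.label_for(code), sheet.ontology_node_for(code))
```

Because the original codes are never rewritten, the same dataset could later be mapped to an alternate ontology simply by supplying a different `label_to_ontology` table.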
Modeling Global Societal Evolution Over a Half-Century: Petascale Humanities Computing
Institute for Computing in the Humanities, Arts, and Social Science at the University of Illinois and Center Affiliate of the National Center for Supercomputing Applications
Research Directions
Forecast global stability
Model social group interactions
Gain a better understanding of the underpinnings of global
unrest and how society functions
Quantify the flow of information across the world and how human societies produce and consume real-time information
Gain new understanding of the evolution of the civil war
discourse
The Digital Humanities
A very large field that encompasses tremendous variation in applications
Focus here on the textually driven humanities, such as history, journalism, etc.
Quantitative Qualitative Computation
Digital humanities requires “Quantitative Qualitative Computation”: finding ways of converting the “latent” aspects of language into computable numeric indicators
Historically, computational work has focused on facts and discarded the rest as “uncomputable”
More recently, dimensions such as “tone” have become the basis of booming industries (brand mining)
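Turning a latent dimension such as tone into a numeric indicator can be sketched with a simple word-list approach. This is only an illustration: the tiny `POSITIVE` and `NEGATIVE` lexicons below are made up for the example, whereas real tone analysis relies on much larger validated dictionaries or trained models.

```python
import re

# Hypothetical toy lexicons; real work uses large, validated dictionaries.
POSITIVE = {"good", "great", "hope", "peace", "success"}
NEGATIVE = {"bad", "war", "crisis", "fear", "failure"}


def tone(text: str) -> float:
    """Return (positive words - negative words) / total words, a score in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


print(tone("Hope for peace after the war"))
```

Normalizing by the total word count keeps scores comparable between short and long texts, at the cost of diluting tone in long documents.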
Quantitative Qualitative Computation
VERY computationally expensive
It is easy to take the Google Ngram dataset and plot the frequency of “democrat” vs. “republican” over time to see which gets more book coverage each year
Gauging which one gets the most POSITIVE coverage, however, and WHERE that coverage comes from, requires a LOT of computation
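The “easy” half of this example mostly amounts to normalizing yearly counts by corpus size. The sketch assumes per-year match counts and yearly total word counts have already been loaded; the rows and totals are invented placeholders, not real Google Ngram values, and `relative_frequency` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (word, year, match_count) rows, loosely modeled on 1-gram count files.
rows = [
    ("democrat", 1950, 120), ("republican", 1950, 150),
    ("democrat", 1960, 200), ("republican", 1960, 180),
]
# Hypothetical total number of words in the corpus for each year.
totals = {1950: 1_000_000, 1960: 1_250_000}


def relative_frequency(rows, totals):
    """Return {word: {year: count / total words that year}}."""
    freq = defaultdict(dict)
    for word, year, count in rows:
        freq[word][year] = count / totals[year]
    return freq


freq = relative_frequency(rows, totals)
for year in sorted(totals):
    d, r = freq["democrat"][year], freq["republican"][year]
    print(year, "democrat" if d > r else "republican", "gets more coverage")
```

The harder question (how positive, and from where) cannot be answered from counts like these at all; it requires analyzing the full text behind every count.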
Building a Global Map
The map at the start of this presentation visualizes a
geographic cross-section through a much larger dataset: a
petascale network
What does a digital humanities pipeline look like?
Petascale Networks
Start with a petascale network
10 billion actors connected by over 100 trillion relationships, just from a single dataset covering only 30 years
Assuming a simple tuple structure (ID, WEIGHT, ID) with 8-byte fields: 3 × 8 bytes = 24 bytes per row, and 24 bytes × 100 trillion rows = 2.4 PB
All of this needs to be memory-resident for random access across the ENTIRE dataset
This is just a small pilot dataset
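The storage estimate above can be checked directly. This sketch assumes 8-byte fields, decimal petabytes (10^15 bytes), and no index or pointer overhead, so it is a lower bound rather than a real memory budget.

```python
BYTES_PER_FIELD = 8   # assumption: 64-bit ID or weight
FIELDS_PER_ROW = 3    # (ID, WEIGHT, ID)
ROWS = 100 * 10**12   # 100 trillion relationships

bytes_per_row = BYTES_PER_FIELD * FIELDS_PER_ROW
total_bytes = bytes_per_row * ROWS

# Decimal petabytes: 1 PB = 10**15 bytes.
print(bytes_per_row, "bytes per row,", total_bytes / 10**15, "PB total")
```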
Data is XD
From “Big Data” to “Really Big Data”
Is XD really “Big Data”?
Total disk of all current production XD systems combined: 12.1 PB (Gordon is 1/3 of the entire XD)
If we add all XD tape silos, we get 34.1 PB
The entire national allocated research infrastructure is just 12 PB of disk and 22 PB of tape!
Microsoft’s Bing search engine uses 150 PB of spinning disk
The biggest scientific projects will generate only 10-20 TB/day of data; Twitter alone produces 28 GB of new data a day, and Bing processes 2 PB/day
“Really Big Data”
Traditional sciences are “small data” compared with the
information world of news and social media
200 MILLION new tweets a day
1 BILLION new Facebook items a day: the average person adds 3 items to Facebook every single day
“Really Big becomes REALLY Big”
Social media in particular is vastly outpacing traditional
information sources
Entire New York Times 1945-2005 = 18M articles = 2.9
billion words
5 BILLION words added to Twitter each DAY (almost twice
the total volume of the Times in the last 60 years)
And Even Bigger
HathiTrust includes Google Books and contains 4% of all books ever printed = 9.4 million digitized works = 3.3 billion pages = 2 trillion words
An estimated 49.5 trillion words have been printed in books over the last 600 years
Twitter alone will reach that size in just 27 years with zero additional growth. At its current rate of tripling post volume each year, it will take just three years
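The two time estimates can be reproduced from the figures quoted above (5 billion words/day, 49.5 trillion words). The zero-growth case is a simple division. For the tripling case, this sketch assumes continuous exponential growth, so the cumulative volume after t years is annual · (3^t − 1) / ln 3; a year-by-year stepwise model would give a somewhat different answer.

```python
import math

WORDS_EVER_PRINTED = 49.5e12   # estimate quoted above
TWITTER_WORDS_PER_DAY = 5e9    # figure quoted above
annual = TWITTER_WORDS_PER_DAY * 365

# Zero growth: constant annual volume.
years_flat = WORDS_EVER_PRINTED / annual

# Assumption: volume triples every year, modeled as rate(t) = annual * 3**t,
# so cumulative volume = annual * (3**t - 1) / ln 3; solve for t.
years_tripling = math.log(1 + WORDS_EVER_PRINTED * math.log(3) / annual) / math.log(3)

print(f"{years_flat:.1f} years at a flat rate, {years_tripling:.1f} years with tripling")
```

This gives about 27 years at a flat rate and a little over three years under continuous tripling, consistent with the slide.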
Storing and Searching Big Text
The BIG World of Text
With scientific datasets, you have data and then you have an
index (HDF + PyTables)
With text the data IS the index
Text is vast and operates at the microlevel of the word
(equivalent to every query searching every pixel of a vast
image archive)
Unstructured
Tons of associated metadata
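One way to see why “the data IS the index” for text is a minimal inverted index, which maps every word to the documents (and word positions) where it occurs. This is a toy, in-memory sketch; `build_index`, `search`, and the sample documents are hypothetical, and systems at the scale described above distribute such indexes across many machines.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each word to {doc_id: [word positions]}."""
    index: dict[str, dict[str, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search(index, *words: str) -> set[str]:
    """Return ids of documents that contain every given word."""
    postings = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*postings) if postings else set()


docs = {
    "nyt-1945": "War ends in Europe",
    "nyt-1963": "Civil rights march on Washington",
    "tweet-1": "Remembering the end of the war",
}
index = build_index(docs)
print(search(index, "war"))
print(index["war"])
```

Every word of every document becomes an index entry, which is why text indexes grow roughly as large as the collection itself.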
What is research?
Research work and comparable development work refer
to systematic activity to increase the level of
knowledge and the use of the knowledge to find new
applications.
The essential criterion is whether the activity generates
fundamental new knowledge.
Research can be basic, applied, or developmental.
Basic research
This refers to activities to gain new knowledge that are not primarily aimed at practical applications.
Basic research includes, for example, analysis of qualities, structures, and dependencies, with the objective of forming and testing new hypotheses, theories, and scientific regularities.
Basic research can also be directed, in which case its results can be expected to lead to significant applications, though sometimes only in the long run.
Applied research
This refers to activities to gain new knowledge that are primarily aimed at developing a specific practical application.
The purpose of applied research is to deal with questions of everyday life.
Applied research includes, for example, seeking applications for the findings of basic research, or creating new methods and means to solve a specific problem.
Developmental research
Uses the knowledge gained from research and/or practical
experience to create new materials, products,
manufacturing processes, methods and systems or to
improve existing ones significantly.
It includes, for example, so-called action research, which produces information directly in the situation in which it is applied, and any research and development activities taking place during R&D projects in industry.
Research process
• Selecting a topic & research questions
• Selecting research methods
• Writing a research report/paper/thesis
• Analyzing data
• Reading for research
• Collecting data
When and why to write things down?
Write to remember
Experienced researchers never wait for the end of the project to start writing. They make lists of sources, write summaries, keep lab notes, make outlines, etc.
Write to understand
When you arrange or rearrange the results of your research
in new ways, you can discover new connections, contrasts,
complications, and implications.
Write to gain perspective
The basic reason for writing is to get your thoughts out of your head and onto paper, where you can see them more clearly.
What is not scientific research?
Investigations, i.e., data gathering, editing, and analysis for planning or decision processes. Investigations are usually part of the planning process itself.
Gathering of general information. For example: continuous observations made primarily for reasons other than research, such as hydrological and weather observations, the production of statistics, opinion polls, archaeological excavations required by law, collecting and arranging documents, market research, and inventorying and charting natural resources.
Production of computer applications, unless they are a part
of a research project.
Criteria of good research work
Fertility. The results of the research pose new questions, reveal
new problems and directions for further research.
Relevance. Good research is significant and influential.
Objectivity. Although researchers can freely define the
problems and the hypotheses, the implementation of the
method, as well as the results and reporting, must be
objective.
Verification. It must be possible to examine every research
discovery, test result, measurement and interpreted result
from the point of view of its validity and relevance.
Practicality. Every good study shows opportunities for
practical applications.
Ethical questions of research
Research is a profoundly social activity. Reporting research
connects us not just to those who will use it, but also to
those whose research we used.
Do not plagiarize or claim credit for the results of others.
Do not misreport sources or invent results.
Do not submit data whose accuracy you have reason to question, unless you raise those questions yourself.
Do not hide objections you cannot respond to.
Do not caricature or distort opposing views.
Do not destroy or hide sources and data important for those
who follow.
Plagiarism
Plagiarism is the worst thing that can happen to a researcher.
You plagiarize when, intentionally or not, you use someone
else’s words or ideas but fail to credit that person, leading
your readers to think that those words are yours.
Standards for plagiarism can differ between fields.
Every time you use the exact words of the source:
– type quotation marks before and after them
– record the words exactly as they are in the source
– cite the source
Finding a research topic
Your interests
General topic
Focused topic
Research questions
Twelve issues to keep in mind
How much choice you have
Your motivation
The time you have available
Regulations and expectations
The cost of research
Your subject or field of study
The resources you have available
Previous examples of research projects
The size of the topic
Your need for support
Access issues
Methods for researching
From general to focused topic
The topic is usually too broad if you could state it in four
or five words.
Examples:
– “Evaluation of user interfaces” – a broad topic
– “The use of cognitive models for the efficient development and evaluation of user interfaces” – a focused topic
Don’t narrow your topic so much that you can’t find
enough data on it.
From topic to research questions
A typical mistake of beginner researchers: they rush from
a topic to immediate data collection.
Readers of research reports don’t want just information –
they want an answer to a question worth asking.
Serious researchers never report data for their own sake, but only to support their answers to the research questions they formulated.
Identifying research questions
Research questions stemming from the topic.
Ask predictable questions about the topic, like
who, what, when, where, how and why.
Examples:
– “How are cognitive models applied to the development of user interfaces?”
– “Does the use of these models make interface design more efficient? In which situations?”
Identifying research questions
For a small-scale research project, 2-3 main research questions are usually enough.
When research questions are right, they should suggest
not just the field of study, but also the methods for
carrying out the research and the kind of analysis
required.
Research questions should be motivated! You have to
explain why they are important.
Group research
Enables you to share responsibility.
Lets you specialize in those aspects of the work to
which you are best suited.
Provides you with useful experience of team working.
Allows you to take on larger-scale topics than you
could otherwise manage.
Provides you with a ready-made support network.
May be essential for certain kinds of research.
Individual research
Gives you sole ownership of the research.
Means that you are wholly responsible for the progress and
success of the research.
May result in a more focused project.
The quality of research work is determined by you alone.
Means that you have to carry out all elements of the
research process.
Dimensions of research
• How research is used: basic, applied
• Purpose of the study: exploratory, descriptive, explanatory, predictive
• The way time enters in: cross-sectional, longitudinal (time series, panel, cohort), case study
• Technique for collecting data:
– For quantitative & computational data: experiments, surveys, content analysis, existing statistics, harvesting
– For qualitative data: field research, historical-comparative research
Purpose of study:
exploratory goals of research
The goal is “to explore”.
Become familiar with the basic facts, setting, and concerns.
Create a general mental picture of conditions.
Formulate and focus questions for future research.
Generate new ideas, proposals, or hypotheses.
Determine the feasibility of conducting research.
Develop techniques for measuring and locating future data.
Purpose of study:
descriptive goals of research
The goal is “to describe”.
Provide a detailed, highly accurate picture.
Locate new data that contradict past data.
Create a set of categories or classify types.
Clarify a sequence of steps or stages.
Document a causal process or mechanism.
Report on the background or context of a situation.
Purpose of study:
explanatory goals of research
The goal is “to explain”.
Test a theory’s predictions or principle.
Elaborate and enrich a theory’s explanation.
Extend a theory to new issues or topics.
Support or refute an explanation or prediction.
Link issues or topics with a general principle.
Determine which of several explanations is best.
Types of longitudinal research
Time-series research. The same type of information is
collected on a group of people or other units across
multiple time periods.
Panel study. Exactly the same people, group,
organizations, or other units are observed across time
periods.
Cohort analysis. A category of people who share a similar life experience in a specified time period is studied. It is “explicitly macroanalytic”, meaning it examines the category as a whole for important features.
Time dimension in research:
case studies
Examines in depth many features of a few cases over a
duration of time.
Cases may be individuals, groups, organizations, events,
or geographic units.
The data are usually more detailed, varied, and extensive.
Most involve qualitative data about a few cases.
Qualitative and case study research are not identical!
General strategies for doing research
Quantitative vs. qualitative
Quantitative research is empirical research where the
data are in the form of numbers or a database.
Computational research uses data management
techniques
Qualitative research is empirical research where the
data are not in the form of numbers.
Deskwork vs. fieldwork (staying in the office, library, or
laboratory vs. going out to research)
The similarities between qualitative
and quantitative research
Quantitative research is used for testing theory, but also for exploring an area and generating hypotheses and theory.
Qualitative research can be used for testing hypotheses and
theories, even though it is mostly used for theory generation.
Qualitative data often include quantification (e.g. statements
such as more than, less than, most, etc.) and can be quite
large.
Quantitative approaches (e.g. large-scale surveys) can also
collect qualitative (non-numeric) data.
The underlying philosophical positions are not necessarily as
distinct as the stereotypes suggest.
Questions leading to quantitative
research
Quantitative research is used when it is possible to specify
variables which can be measured or tested or indicated as
numbers by using some other method.
Examples of questions:
– How much of something occurs in phenomenon X?
– How often does something occur in phenomenon X?
– Is the co-occurrence of X and Y statistically significant?
– Can we classify Y?
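For the significance question, one common choice is Pearson's chi-square test on a 2×2 contingency table recording whether X and Y occur in each observation. A minimal sketch, assuming invented counts, no continuity correction, nonzero row and column totals, and the standard critical value of 3.841 for one degree of freedom at α = 0.05:

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]], where
    a = X and Y, b = X only, c = Y only, d = neither.
    Assumes all row and column totals are nonzero."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if X and Y were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


CRITICAL_0_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

# Invented counts from 200 hypothetical observations.
stat = chi_square_2x2(a=40, b=20, c=30, d=110)
print(f"chi2 = {stat:.2f}, significant at 0.05: {stat > CRITICAL_0_05_DF1}")
```

For small expected counts (below about 5 in any cell), an exact test such as Fisher's is usually preferred over this approximation.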
Questions leading to qualitative
research
The objective of qualitative research is usually to build a new
construct from observed points or from existing issues.
This new construct should be clearer than the previous one, or it should emphasise some points so that they can be understood better.
Examples of questions:
– What is the phenomenon like?
– What qualities does the phenomenon have?
Deductive Theory
Theory → Hypotheses → Data collection → Findings → Hypotheses confirmed or rejected → Revision of theory
Induction
[General research question] → Observation → Theory formulation
Quantitative and Qualitative Methods
Quantitative:
Deductive (and inductive)
Tests hypotheses
Positivism
Objectivism
Employs measurement
Macro
Detached researcher
Qualitative:
Inductive
Produces theories
Phenomenology
Constructionism
Usually does not employ measurement
Micro
Involved researcher
Old ideas? Some research is now both!
Examples?
Main Steps in Quantitative Research:
1. Theory
2. Hypothesis
3. Research design
4. Devise measures of concepts
5. Select research site(s)
6. Select research subjects/respondents
7. Administer research instruments/collect data
8. Process data
9. Analyse data
10. Write up findings and conclusions
Main Steps in Qualitative Research:
1. General research question
2. Select relevant site(s) and subjects
3. Collection of relevant data
4. Interpretation of data
5. Conceptual and theoretical work
6. Tighter specification of the research question
7. Collection of further data
8. Conceptual and theoretical work
9. Write up findings
Examples of Quantitative Research Methods:
Experiments
Social surveys
– Cross-sectional
– Comparative (cross-national)
– Longitudinal
Content Analysis
Secondary Statistical Analysis
Official Statistics
– Demography
– Epidemiology
Field stimulations
– Structured Interviews and Observation.
Examples of Qualitative Research:
In-depth Interviews
Focus Groups
Ethnography/Field Research
Historical-Comparative Research
Discourse Analysis
Narrative Analysis
Media Analysis
Combine both
Quantitative and qualitative research are often cast as opposing fields.
But sometimes the boundary blurs: qualitative researchers may employ quantification in their work or take a positivist approach, and some quantitative research may employ phenomenology.
Both can also be combined in a project
– Qualitative research can facilitate quantitative research: (1) it can provide hypotheses; (2) it can fill in the gaps and help interpret relationships
– Quantitative research can facilitate qualitative research by locating interviewees and helping to generalise findings
– Together they can give you micro- and macro-level views, so you can examine the relationships between the two levels. They can complement each other.
What we covered
• Research methods
• Digital humanities
Questions
• Role in information science?
• Examples of
qualitative/quantitative/computational
methods?