Introduction to Multimedia and MSEC 20-791

Mike Christel
Alex Hauptmann
ARCHIVE:
http://www.cs.cmu.edu/~christel/MM2002/syllabus.htm
Contact Information
Mike Christel
christel@cs.cmu.edu
http://www.cs.cmu.edu/~christel
(412) 268-7799
Wean Hall 5212
Alex Hauptmann
alex@cs.cmu.edu
http://www.cs.cmu.edu/~alex
(412) 268-1448
Wean Hall 5124
Office Hours by Appointment
© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann
Carnegie Mellon
Teaching Assistant
Rong Yan
yanrong@cs.cmu.edu
http://www.cs.cmu.edu/~yanrong
(412) 268-9515
Newell Simon Hall 4533
Office Hours by Appointment
Carnegie Mellon Campus Map
Wean Hall
Newell Simon Hall
Course Outline
Oct. 24
Introduction to Multimedia
Oct. 29
Images as Multimedia Interface
Components; Intro to Macromedia Flash 5
Oct. 31
Digital Audio; Speech Recognition
Nov. 5
Image Processing and Computer Vision
Nov. 7
Speech Synthesis and Speech
Dialogue Applications
Nov. 12
Digital Video
Nov. 14
Multimedia via Cell Phones and PDAs
Nov. 19
Web Specifications, MM Synchronization
Nov. 21
Digital Music and Music Processing
Nov. 26
MM Projects: Project LISTEN, Informedia
Dec. 3
Multimedia Information Retrieval,
TREC Interactive Video Track
Dec. 5
Multimedia and Entertainment: Carnegie
Mellon’s Entertainment Technology Center
Dec. 10
MM Content Analysis: Digital Human
Memory; Informedia Interface Evaluation
Dec. 12
(MM Experiences from the Field planned…)
Grading
• No midterm, no final
• Textbook plus recommended links/readings
• Grading based on homeworks (90%), class presence
and participation (10%)
• Homeworks MUST be published to your web site;
by next class, email me (christel@cs.cmu.edu) the
base URL that will carry an “MSEC 20-791” link
• Homework time deadlines are strictly enforced: loss
of 10% per day late for each assignment
• Flash homework is worth twice other homeworks
• 10% for class time meant to encourage you to show
up mentally and physically for class
Definition of Multimedia
Multi (Latin multus - numerous)
Media, medium (Latin medius, medium: middle, center,
intermediary; Latin mediat: intermediary, means)
Multiple types of information captured, stored,
manipulated, transmitted, and presented.
Specifically: Images, Video, Audio (+Speech) and Text
Related terms: hypermedia, hypertext
Problem: “hypertext”, “hypermedia”, “multimedia” so
overused/generalized they now convey little meaning
A Few Items in a Multimedia Timeline
Pre-Digital Age: suggestions?
see “Multimedia: From Wagner to Virtual Reality”,
http://www.artmuseum.net/w2vr/timeline/timeline.html
1906 – Color photography made practicable
http://www.niepce.com/pagus/pagus-inv.html
1945 – Vannevar Bush, memex “As We May Think”
http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
1960s – Ted Nelson, Xanadu, “a universal instantaneous hypertext
publishing network”
1967 – Nicholas Negroponte formed MIT Architecture Machine Group
(later in 1985 MIT Media Lab opens)
1987 – RCA’s David Sarnoff Labs announces Digital Video Interactive (DVI)
1988 – Apple “Knowledge Navigator” vision
Multimedia Timeline, Continued
1989 – Tim Berners-Lee proposed the World Wide Web to CERN
1991 – Moving Picture Experts Group (MPEG)
1993 – NCSA Mosaic
1994 – Netscape; creation of World Wide Web Consortium (W3C)
1995 – Java for platform-independent application development
1996 – PNG (Portable Network Graphics)
1997 – HTML 4.0
1998 – XML 1.0
1999 – XSLT 1.0 and XPath 1.0
2001 – MPEG-7, JPEG 2000, SVG
2002 – intellectual property and JPEG 2000 (www.jpeg.org/newsrel1.html)
Help with alphabet soup: http://www.w3c.org, other on-line multimedia course
glossaries, e.g., http://www.cs.cornell.edu/courses/cs631/1999sp/
Top Ten Misconceptions about
Multimedia Computing
Ramesh Jain, founding chairman of Virage and CTO of
Praja, www.praja.com, presented the following “top ten”
MISCONCEPTIONS list as part of his keynote speech at
the ACM Multimedia Conference, Ottawa, Canada,
October 2, 2001:
10. Video = Multimedia.
9. Multimedia = multi X separate medium.
8. All information is ONLY in the images or video.
7. Editing of media is almost always off-line.
6. Query by example is best access method.
Top Ten Misconceptions about
Multimedia Computing, Continued
5. All users have PhDs in multimedia computing.
4. Users have no memory or context.
3. Computers are for computing.
2. Medium is the message.
1. We work for computers.
Ramesh Jain concluded his keynote talk with the
observation:
Information Builds Experience, Experience is Life.
[Diagram: Multimedia at the center, surrounded by related areas: Audio; Networking; Natural Language Processing; Psychology; Storage Systems; Video; Information Retrieval; Images; Data Compression; HCI; CPU Power]
Multimedia Physics
• Sound is a waveform
• Imagery is a waveform
• light is electromagnetic radiation with different intensity in
spatial coordinates
• color corresponds to wavelength (red is the longest
wavelength visible to people)
• Introductory treatment of “light behaves as both particle
and wave” at http://www.howstuffworks.com/light1.htm
• “Distributed Multimedia” by Palmer Agnew and Anne
Kellerman, published by Atomic Dog Publishing,
http://www.atomicdogpublishing.com
A Quick Introduction to Light Waves
• Derived from:
http://www.pbs.org/deepspace/classroom/activity2.html
• Waves characterized by wavelength and frequency
• Light is a type of electromagnetic radiation in a range for which our
eyes are sensitive
• Sound is not electromagnetic radiation, but sound is a wave as
well. Higher pitches are caused by higher frequencies of vibrating
molecules that reach your eardrum. Lower pitches are likewise
caused by lower frequencies.
Wavelength/Frequency Spectrum
[Figure: the electromagnetic spectrum in order of increasing frequency: long radio waves; TV, FM; microwaves; infrared; visible light; ultraviolet; X-rays; gamma rays. The visible band is marked with wavelengths 700 nm, 600 nm, 500 nm, 400 nm and frequencies 4.5x10^14 Hz, 5x10^14 Hz, 6x10^14 Hz, 7x10^14 Hz.]
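The wavelength and frequency labels on the spectrum are tied together by the relation frequency = speed of light / wavelength. The following is a minimal sketch of that conversion, not part of the original slides; the function names are hypothetical and the speed of light is taken as its vacuum value.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT = 299_792_458.0


def frequency_hz(wavelength_nm: float) -> float:
    """Return the frequency in Hz of light with the given wavelength in nm."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9)


def wavelength_nm(freq_hz: float) -> float:
    """Return the wavelength in nm of light with the given frequency in Hz."""
    return SPEED_OF_LIGHT / freq_hz * 1e9


if __name__ == "__main__":
    # Wavelengths marked on the visible band of the spectrum slide.
    for nm in (700, 600, 500, 400):
        print(f"{nm} nm -> {frequency_hz(nm):.2e} Hz")
```

Running it shows that 600 nm and 500 nm land very close to the 5x10^14 Hz and 6x10^14 Hz marks, so the two scales on the figure describe the same band.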
Migration from Analog to Digital Representation
• Analog signals to sensors
• E.g. vinyl records
• Fidelity is faithfulness to the original
• Digital representation (1960s)
• Sampling
• Quantizing
• Coding
• Limiting factors in move to digital:
• Storage limits
• CPU speeds
• I/O speeds
• Network bandwidth
Loss of Fidelity Due to Sampling
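One way to see the loss is aliasing: when a signal is sampled too slowly, a high frequency becomes indistinguishable from a lower one. The sketch below is an illustration under assumed values (a 7 Hz tone sampled at 10 Hz, which is below twice the tone's frequency); the function name and the numbers are not from the slides.

```python
import math


def sample(freq_hz: float, rate_hz: float, n: int) -> list[float]:
    """Return n samples of a unit-amplitude sine at freq_hz, taken at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]


# A 7 Hz tone sampled at only 10 samples per second.
undersampled = sample(7.0, 10.0, 20)

# A 3 Hz tone with inverted sign, sampled at the same rate.
alias = [-s for s in sample(3.0, 10.0, 20)]

# The two sample lists agree to within floating-point rounding, so after
# sampling there is no way to tell which tone was the original.
largest_gap = max(abs(a - b) for a, b in zip(undersampled, alias))
```

Sampling the same 7 Hz tone at a rate above 14 Hz removes the ambiguity, which is why the sampling rate must be chosen with the highest frequency of interest in mind.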
Loss of Fidelity Due to Quantizing
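Quantizing rounds each sample to one of a fixed number of levels, so the error per sample can be as large as half the spacing between levels. The following sketch assumes a uniform quantizer over the range -1.0 to 1.0; the function names and bit depths are illustrative choices, not taken from the slides.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Round x in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)          # spacing between adjacent levels
    index = round((x + 1.0) / step)    # nearest level index, 0 .. levels - 1
    return index * step - 1.0


def max_error(bits: int, n: int = 1000) -> float:
    """Return the largest quantization error over one cycle of a sine wave."""
    signal = [math.sin(2 * math.pi * k / n) for k in range(n)]
    return max(abs(s - quantize(s, bits)) for s in signal)


if __name__ == "__main__":
    for bits in (3, 8, 16):
        print(f"{bits:2d} bits: max error {max_error(bits):.6f}")
```

Each extra bit roughly halves the step size, so the worst-case error shrinks quickly as bit depth grows, at the cost of more storage per sample.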
Overview of Compression Strategies
• Lossless Compression
• Huffman Encoding
• Adaptive Huffman Encoding
• Lempel-Ziv-Welch (LZW)
• used in GIF
• JPEG-LS
• Lossy Compression
• JPEG
• H.261, MPEG-1, MPEG-2
• Lossless and Lossy Together
• JPEG 2000
Huffman Encoding Procedure
1. Initialization: Put all items in a list L, sorted by freq.
2. Repeat until L has only one node left:
(a) From L pick two nodes having the lowest frequency,
create a parent node of them.
(b) Assign the sum of the children's frequencies to the
parent node and insert it into L (kept in sorted order).
(c) Assign code 0, 1 to the two branches of the tree, and
delete the children from L.
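The procedure above can be sketched in a few lines. This is one possible implementation, not the course's reference code: it uses a priority queue (Python's heapq) in place of the sorted list L, and the function name huffman_codes is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Return a dict mapping each symbol in text to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}

    # Heap entries are (frequency, tiebreak, tree). A tree is either a symbol
    # (leaf) or a (left, right) tuple. The tiebreak counter keeps comparisons
    # from ever reaching the tree itself.
    tiebreak = count()
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)

    # Step 2: repeatedly merge the two lowest-frequency nodes under a parent.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    # Step 2(c): label the branches 0 and 1 while walking down from the root.
    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    message = "ALOHA HAWAII"
    table = huffman_codes(message)
    print(table)
    print(sum(len(table[ch]) for ch in message), "bits")
```

Ties between equal frequencies can be broken in more than one way, so the individual codes may differ from a hand-built tree even though the total encoded length is the same.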
Huffman Coding Example
• Input: “ALOHA HAWAII”
• Frequency: 4A, 2H, 2I, 1L, 1O, 1 space, 1W
• 96 bits (8 bits * 12 characters) to 32 bits:
[Figure: Huffman tree for the input, with branches labeled 0 and 1 and leaves A, I, H, L, [space], W, O]
A=0, I=100, H=101, L=1100, space=1101, etc.
RECOMMENDED: Java applet example at
http://www.cs.sfu.ca/CC/365/li/squeeze/index.html
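The 96-to-32-bit claim can be checked by encoding the input with the code table and decoding it again. In the sketch below the codes for A, I, H, L, and space come from the slide; the codes for W and O are assumptions read off the tree sketch, since the slide's table ends with "etc." The function names are hypothetical.

```python
# Code table from the slide. The W and O entries are assumed from the tree,
# because the slide lists only the first five codes.
CODES = {
    "A": "0",
    "I": "100",
    "H": "101",
    "L": "1100",
    " ": "1101",
    "W": "1110",  # assumption
    "O": "1111",  # assumption
}


def encode(text: str, codes: dict[str, str]) -> str:
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Decode a bit string produced by encode(), reading left to right."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # No code is a prefix of another, so the first match is the symbol.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


message = "ALOHA HAWAII"
bits = encode(message, CODES)
fixed_width_bits = 8 * len(message)   # 8 bits per character, uncompressed
```

Because the most frequent symbol (A) gets the shortest code, the four A's cost only 4 bits in total, while the four rare symbols cost 4 bits each.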
Why Digital?
• Universal storage, transmission format
• CD, Internet
• Precision (range of values, number of bits, floating
point)
• Lossless transmission/storage
BUT:
• Sampling rate distorts information
• Size requirements may be huge compared to analog,
e.g., 4.2 million pixels for single 35 mm photograph!
Result: lots of work on perception-based lossy digital
compression strategies
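To put the pixel count in storage terms, here is a small back-of-the-envelope calculation. It assumes 24 bits per pixel (8 bits each for red, green, and blue) and no compression; the slide gives only the pixel count, so the bit depth and the function name are assumptions.

```python
def raw_image_bytes(pixels: int, bits_per_pixel: int = 24) -> int:
    """Return the uncompressed size in bytes of an image with this many pixels."""
    return pixels * bits_per_pixel // 8


# 4.2 million pixels, the figure quoted for a single 35 mm photograph.
photo_bytes = raw_image_bytes(4_200_000)
print(f"{photo_bytes / 1_000_000:.1f} million bytes uncompressed")
```

Under these assumptions one scanned photograph takes about 12.6 million bytes, which helps explain the interest in lossy compression.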
Why Perception Matters
http://www.libertarian.on.ca/images/Florida%20Recount.jpg
Audio
• Sounds
• Hear 15 Hz to 20 kHz
• Speech is 50 Hz to 10 kHz
• Speech Recognition
• It is hard to wreck a nice beach / It is hard to recognize
speech
• Ice cream / I scream
• Synthesis
• Speech
• Music
• General MIDI for 128 instruments, 47 percussion sounds
• Notes, timing
Speech Recognition Issues
• Continuous vs. discrete
• Vocabulary size
• Channel (microphone)
• Environment (location of microphone and speaker)
• Speaker dependent/speaker independent
• Context (language model)
• Interactivity (dialog model)
Speech Recognition Knowledge Sources
• Acoustic Modeling: describes the sounds that make up speech
• Lexicon: describes which sequences of speech sounds make up valid words
• Language Model: describes the likelihood of various sequences of words being spoken
Speech Variations
• Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
• Voice Quality: breathy, creaky, whispery, tense, lax, modal
• Context: sport, professional, interview, free conversation, man-machine dialogue
• Speaking Rate: normal, slow, fast, very fast
• Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load
Video
• Video is made up of frames
• Frame rate = number of frames per second (the delay between successive frames is its reciprocal)
• Minimal change between frames
• Sequencing creates the illusion of movement
• 16 frames per second (fps) is “smooth”
• Standards: NTSC 29.97 fps, PAL 25 fps, HDTV 60 fps
• Interlacing
• Display scan rate is different
• Monitor refresh rate, e.g., 60-70 Hz = one refresh roughly every 1/60 to 1/70 second
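Frame rate and the time between frames are reciprocals of each other. The short sketch below converts the rates mentioned on this slide into per-frame intervals; the function name is hypothetical.

```python
def frame_interval_ms(fps: float) -> float:
    """Return the time between successive frames, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Rates taken from the slide: the "smooth" threshold, NTSC, and HDTV.
    for label, fps in (("smooth", 16), ("NTSC", 29.97), ("HDTV", 60)):
        print(f"{label:6s} {fps:6.2f} fps -> {frame_interval_ms(fps):6.2f} ms per frame")
```

At 16 fps each frame stays on screen for 62.5 ms, while NTSC replaces the frame about every 33.4 ms.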
Captured vs. Synthetic
• Animation vs. Video
• Vector Graphics vs. Bitmap/Raster Pictures
• Synthesizer vs. Recording
• Storage? Manipulation? Processor Requirements?
• Fidelity to real world
• Hybrids are possible
Why is Multimedia Important?
• Our society
• captures its experience,
• records its accomplishments,
• portrays its past,
• informs its masses
…in pictures, audio and video
• For many, CNN has become the “publication of record”
• Multimedia learning leverages “multiple intelligences”
• Multimedia Digital Libraries are an essential component of
• formal, informal, and professional learning
• distance education, telemedicine
Technology Push vs. Market Pull
• Home Entertainment
• Catalog Ordering
• Multimedia Training, Education
• Videoconferencing
• Professional Video Services
• Videomail
• Speech Recognition
Hype vs. Reality
What is feasible, under what circumstances?
What is possible?
What is impossible?
What is unlikely?
A Multimedia Vision for the Home Market
FX Palo Alto Laboratory
John J. Doherty, Lynn Wilcox, and Andreas
Girgensohn
“A Night at the Opera”
Video to appear as part of the
ACM Multimedia Conference, 2002
(7:11)
Upcoming Homework
Register: send email to christel@cs.cmu.edu with URL where
your homeworks will be located (we will use that URL plus
your sending email address for future correspondence) –
before Oct. 28
Homework 1: Multimedia lookup via the web – Oct. 28
Homework 2: Scanning and image search – Oct. 30/Nov. 4
Homework 3: Animation via Macromedia Flash – Nov. 24
Homeworks 4,5,6,7 for later in the term
Homework 8: Multimedia web site – Dec. 12
See syllabus for details