Introduction to Multimedia and MSEC
20-791
Mike Christel, Alex Hauptmann
ARCHIVE: http://www.cs.cmu.edu/~christel/MM2002/syllabus.htm

Contact Information
Mike Christel, christel@cs.cmu.edu, http://www.cs.cmu.edu/~christel, (412) 268-7799, Wean Hall 5212
Alex Hauptmann, alex@cs.cmu.edu, http://www.cs.cmu.edu/~alex, (412) 268-1448, Wean Hall 5124
Office Hours by Appointment

© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon

Teaching Assistant
Rong Yan, yanrong@cs.cmu.edu, http://www.cs.cmu.edu/~yanrong, (412) 268-9515, Newell Simon Hall 4533
Office Hours by Appointment

Carnegie Mellon Campus Map
[Figure: campus map with Wean Hall and Newell Simon Hall labeled]

Course Outline
Oct. 24: Introduction to Multimedia
Oct. 29: Images as Multimedia Interface Components; Intro to Macromedia Flash 5
Oct. 31: Digital Audio; Speech Recognition
Nov. 5: Image Processing and Computer Vision
Nov. 7: Speech Synthesis and Speech Dialogue Applications
Nov. 12: Digital Video
Nov. 14: Multimedia via Cell Phones and PDAs
Nov. 19: Web Specifications, MM Synchronization
Nov. 21: Digital Music and Music Processing
Nov. 26: MM Projects: Project LISTEN, Informedia
Dec. 3: Multimedia Information Retrieval, TREC Interactive Video Track
Dec. 5: Multimedia and Entertainment: Carnegie Mellon’s Entertainment Technology Center
Dec. 10: MM Content Analysis: Digital Human Memory; Informedia Interface Evaluation
Dec. 12: (MM Experiences from the Field planned…)

Grading
• No midterm, no final
• Textbook plus recommended links/readings
• Grading based on homeworks (90%), class presence and participation (10%)
• Homeworks MUST be published to your web site; by next class, email me (christel@cs.cmu.edu) the base URL on which you will place a “MSEC 20-791” link
• Homework deadlines are strictly enforced: loss of 10% per day late for each assignment
• Flash homework is worth twice as much as each other homework
• The 10% for class time is meant to encourage you to show up mentally and physically for class

Definition of Multimedia
Multi (Latin multus: numerous)
Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)
Multiple types of information captured, stored, manipulated, transmitted, and presented. Specifically: Images, Video, Audio (+Speech) and Text
Related terms: hypermedia, hypertext
Problem: “hypertext”, “hypermedia”, and “multimedia” are so overused and generalized that they now convey little meaning

A Few Items in a Multimedia Timeline
Pre-Digital Age: suggestions? See “Multimedia: From Wagner to Virtual Reality”, http://www.artmuseum.net/w2vr/timeline/timeline.html
1906 – Color photography made practicable, http://www.niepce.com/pagus/pagus-inv.html
1945 – Vannevar Bush, memex, “As We May Think”, http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
1960s – Ted Nelson, Xanadu, “a universal instantaneous hypertext publishing network”
1967 – Nicholas Negroponte forms the MIT Architecture Machine Group (the MIT Media Lab opens later, in 1985)
1987 – RCA’s David Sarnoff Labs announces Digital Video Interactive
1988 – Apple “Knowledge Navigator” vision

Multimedia Timeline, Continued
1989 – Tim Berners-Lee proposes the World Wide Web to CERN
1991 – Moving Picture Experts Group (MPEG)
1993 – NCSA Mosaic
1994 – Netscape; creation of World Wide Web Consortium (W3C)
1995 – Java for platform-independent application development
1996 – PNG (Portable Network Graphics)
1997 – HTML 4.0
1998 – XML 1.0
1999 – XSLT 1.0 and XPath 1.0
2001 – MPEG-7, JPEG 2000, SVG
2002 – intellectual property and JPEG 2000 (www.jpeg.org/newsrel1.html)
Help with alphabet soup: http://www.w3c.org, other on-line multimedia course glossaries, e.g., http://www.cs.cornell.edu/courses/cs631/1999sp/

Top Ten Misconceptions about Multimedia Computing
Ramesh Jain, founding chairman of Virage and CTO of Praja, www.praja.com, presented the following “top ten” MISCONCEPTIONS list as part of his keynote speech at the ACM Multimedia Conference, Ottawa, Canada, October 2, 2001:
10. Video = Multimedia.
9. Multimedia = multi X separate medium.
8. All information is ONLY in the images or video.
7. Editing of media is almost always off-line.
6. Query by example is best access method.

Top Ten Misconceptions about Multimedia Computing, Continued
5. All users have PhDs in multimedia computing.
4. Users have no memory or context.
3. Computers are for computing.
2. Medium is the message.
1. We work for computers.
Ramesh Jain concluded his keynote talk with the observation: Information Builds Experience, Experience is Life.

Multimedia
[Diagram: areas related to multimedia: Audio, Networking, Natural Language Processing, Psychology, Storage Systems, Video, Information Retrieval, Images, Data Compression, HCI, CPU Power]

Multimedia Physics
• Sound is a waveform
• Imagery is a waveform
  • light is electromagnetic radiation with different intensity in spatial coordinates
  • color corresponds to wavelength (red is the longest wavelength visible to people)
• Introductory treatment of “light behaves as both particle and wave” at http://www.howstuffworks.com/light1.htm
• “Distributed Multimedia” by Palmer Agnew and Anne Kellerman, published by Atomic Dog Publishing, http://www.atomicdogpublishing.com

A Quick Introduction to Light Waves
• Derived from: http://www.pbs.org/deepspace/classroom/activity2.html
• Waves are characterized by wavelength and frequency
• Light is a type of electromagnetic radiation in a range to which our eyes are sensitive
• Sound is not electromagnetic radiation, but sound is a wave as well. Higher pitches are caused by higher frequencies of vibrating molecules that reach your eardrum. Lower pitches are likewise caused by lower frequencies.

Wavelength/Frequency Spectrum
[Figure: electromagnetic spectrum with bands labeled long radio waves, TV, FM, microwaves, infrared, ultraviolet, X-rays, gamma rays. Visible-range wavelength ticks: 700 nm, 600 nm, 500 nm, 400 nm. Frequency ticks: 4.5×10^14 Hz, 5×10^14 Hz, 6×10^14 Hz, 7×10^14 Hz]

Migration from Analog to Digital Representation
• Analog signals to sensors
  • E.g. vinyl records
  • Fidelity is faithfulness to the original
• Digital representation (1960s)
  • Sampling
  • Quantizing
  • Coding
• Limiting factors in move to digital:
  • Storage limits
  • CPU speeds
  • I/O speeds
  • Network bandwidth
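
The sampling, quantizing, and coding steps named on the Migration slide can be sketched in a few lines of Python. This is only an illustration under assumed values that do not come from the slides: a 440 Hz test tone, an 8000 Hz sampling rate, a 10 ms duration, and 4 bits per sample. The function names are hypothetical as well.

```python
import math

def sample(signal, duration_s, rate_hz):
    """Sampling: evaluate a continuous-time signal at evenly spaced instants."""
    n = round(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

def quantize(samples, bits, lo=-1.0, hi=1.0):
    """Quantizing: map each sample to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    codes = []
    for s in samples:
        s = min(max(s, lo), hi)            # clip to the representable range
        codes.append(round((s - lo) / step))
    return codes, step

def code(codes, bits):
    """Coding: write each quantization level as a fixed-width binary word."""
    return "".join(format(c, "0{}b".format(bits)) for c in codes)

def reconstruct(codes, step, lo=-1.0):
    """Map level indices back to amplitudes (what a player would output)."""
    return [lo + c * step for c in codes]

def tone(t):
    # Assumed test signal: a 440 Hz sine wave with amplitude 1.
    return math.sin(2 * math.pi * 440 * t)

BITS = 4                                   # assumed bit depth, chosen for illustration
samples = sample(tone, duration_s=0.01, rate_hz=8000)
codes, step = quantize(samples, BITS)
bitstream = code(codes, BITS)
approx = reconstruct(codes, step)
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print(len(samples), "samples,", len(bitstream), "bits")
print("largest quantization error:", max_error, "(step size", step, ")")
```

Raising the sampling rate captures more of the signal's shape over time, and raising the bit depth shrinks the step size and therefore the quantization error; both increase the number of bits to store, which is where the limiting factors listed above come in.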

Loss of Fidelity Due to Sampling
[Figure: loss of fidelity due to sampling]

Loss of Fidelity Due to Quantizing
[Figure: loss of fidelity due to quantizing]

Overview of Compression Strategies
• Lossless Compression
  • Huffman Encoding
  • Adaptive Huffman Encoding
  • Lempel-Ziv-Welch (LZW)
    • used in GIF
  • JPEG-LS
• Lossy Compression
  • JPEG
  • H.261, MPEG-1, MPEG-2
• Lossless and Lossy Together
  • JPEG 2000

Huffman Encoding Procedure
1. Initialization: Put all items in a list L, sorted by frequency.
2. Repeat until L has only one node left:
(a) From L pick the two nodes having the lowest frequency, and create a parent node for them.
(b) Assign the sum of the children's frequencies to the parent node and insert it into L (kept in sorted order).
(c) Assign codes 0 and 1 to the two branches of the tree, and delete the children from L.

Huffman Coding Example
• Input: “ALOHA HAWAII”
• Frequency: 4A, 2H, 2I, 1L, 1O, 1 space, 1W
• 96 bits (8 bits * 12 characters) to 32 bits:
[Figure: Huffman tree with branches labeled 0 and 1 and leaves A, I, H, L, space, W, O]
A=0, I=100, H=101, L=1100, space=1101, etc.
RECOMMENDED: Java applet example at http://www.cs.sfu.ca/CC/365/li/squeeze/index.html

Why Digital?
• Universal storage, transmission format
  • CD, Internet
• Precision (range of values, number of bits, floating point)
• Lossless transmission/storage
BUT:
• Sampling distorts information (the sampling rate limits what can be captured)
• Size requirements may be huge compared to analog, e.g., 4.2 million pixels for a single 35 mm photograph!
This results in lots of work on perception-based lossy digital compression strategies.
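
The Huffman Encoding Procedure above can be tried on the “ALOHA HAWAII” example with a short Python sketch. It is an illustration rather than the course's reference implementation: it uses a priority queue (heapq) in place of the sorted list L, and the function names are hypothetical. Because ties between equal frequencies can be broken in more than one way, the individual codes it produces may differ from the ones on the slide, but the total encoded length is the same 32 bits.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a prefix code from symbol frequencies (Huffman's procedure)."""
    if not text:
        return {}
    freq = Counter(text)
    tie = count()                      # tie-breaker so the heap never compares nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)                # step 1: items ordered by frequency
    if len(heap) == 1:                 # degenerate input with a single distinct symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:               # step 2: repeat until one node is left
        f1, _, left = heapq.heappop(heap)    # (a) two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        # (b) parent carries the summed frequency; (c) children leave the queue
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):    # internal node: branch 0 and branch 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                          # leaf: a single character
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:         # prefix property: first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)

text = "ALOHA HAWAII"
codes = huffman_codes(text)
bits = encode(text, codes)
print(codes)
print(len(text) * 8, "bits uncompressed ->", len(bits), "bits encoded")
```

Decoding works without separators between codes because no code is a prefix of another, which is why frequent symbols such as A can safely receive the shortest codes.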

Why Perception Matters
http://www.libertarian.on.ca/images/Florida%20Recount.jpg

Audio
• Sounds
  • Hear 15 Hz to 20 kHz
  • Speech is 50 Hz to 10 kHz
• Speech Recognition
  • It is hard to wreck a nice beach / It is hard to recognize speech
  • Ice cream / I scream
• Synthesis
  • Speech
  • Music
    • MIDI for 128 instruments, 47 percussion sounds
    • Notes, timing

Speech Recognition Issues
• Continuous vs. discrete
• Vocabulary size
• Channel (microphone)
• Environment (location of microphone and speaker)
• Speaker dependent/speaker independent
• Context (language model)
• Interactivity (dialog model)

Speech Recognition Knowledge Sources
• Acoustic Modeling: describes the sounds that make up speech
• Lexicon: describes which sequences of speech sounds make up valid words
• Language Model: describes the likelihood of various sequences of words being spoken

Speech Variations
• Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
• Voice Quality: breathy, creaky, whispery, tense, lax, modal
• Context: sport, professional, interview, free conversation, man-machine dialogue
• Speaking Rate: normal, slow, fast, very fast
• Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load

Video
• Video is made up of frames
• Frame rate = number of frames shown per second (the delay between successive frames is 1 / frame rate)
• Minimal change between frames
• Sequencing creates the illusion of movement
  • 16 frames per second (fps) is “smooth”
  • Standards: NTSC 29.97 fps, PAL 25 fps, HDTV 60 fps
• Interlacing
• Display scan rate is different
  • Monitor refresh rate, e.g., 60-70 Hz (60-70 refreshes per second)

Captured vs. Synthetic
• Animation vs. Video
• Vector Graphics vs. Bitmap/Raster Pictures
• Synthesizer vs. Recording
• Storage? Manipulation? Processor Requirements?
• Fidelity to real world
• Hybrids are possible

Why is Multimedia Important?
• Our society
  • captures its experience,
  • records its accomplishments,
  • portrays its past,
  • informs its masses
  …in pictures, audio and video
• For many, CNN has become the “publication of record”
• Multimedia learning leverages “multiple intelligences”
• Multimedia Digital Libraries are an essential component of
  • formal, informal, and professional learning
  • distance education, telemedicine

Technology Push vs. Market Pull
• Home Entertainment
• Catalog Ordering
• Multimedia Training, Education
• Videoconferencing
• Professional Video Services
• Videomail
• Speech Recognition

Hype vs. Reality
What is feasible, under what circumstances?
What is possible?
What is impossible?
What is unlikely?

A Multimedia Vision for the Home Market
FX Palo Alto Laboratory
John J. Doherty, Lynn Wilcox, and Andreas Girgensohn
“A Night at the Opera”
Video to appear as part of the ACM Multimedia Conference, 2002 (7:11)

Upcoming Homework
Register: send email to christel@cs.cmu.edu with the URL where your homeworks will be located (we will use that URL plus your sending email address for future correspondence) – before Oct. 28
Homework 1: Multimedia lookup via the web – Oct. 28
Homework 2: Scanning and image search – Oct. 30/Nov. 4
Homework 3: Animation via Macromedia Flash – Nov. 24
Homeworks 4, 5, 6, 7 for later in the term
Homework 8: Multimedia web site – Dec. 12
See syllabus for details