7008fa14_04_Multimedia

LSU/SLIS
Multimedia
Session 4
LIS 7008
Information Technologies
Agenda
• HW2
• Quiz
• Images
• Video
• Audio
• Streaming
• SMILe
HW2
• Good work
– Your webpage is well rendered in a browser.
• Should use paragraphs
– Findings based on each approach are addressed.
– A correct (or partially correct) conclusion is drawn by
pulling together all the findings. An attempt to draw a
conclusion of who owns the website is obviously
present.
– Some good examples (although not perfect):
• Distributed on Moodle
Quiz
• Open book (text, slides, Internet)
• Posted on Moodle
• Scope: covers Session 1-3
– Readings, slides, homework
• Purposes
– Measure your learning progress
– Preview what the midterm exam will be like
• Time: 20 minutes from opening to closing
• Due:
– See instructions on Moodle
– Read the instructions BEFORE opening the quiz!!!
– You can download the quiz any time, but do not open it
until you are ready to take it.
The Gullibility of Human Senses
• Three simple tricks for producing
– Images
– Video
– Audio
• But how do you move the bits around fast enough?
– Remove redundancy – apply compression!
– Throw away stuff that doesn’t matter (because the eye cannot see it)
• Synchronizing different media to create multimedia
A lighthouse picture
Specify color
• Additive colors and subtractive colors.
– Primary colors: RGB: produce secondary colors
– http://en.wikipedia.org/wiki/Primary_color
• Red+Green+Blue = White
– Subtractive colors: MCY: absorb colors.
– http://en.wikipedia.org/wiki/Subtractive_color
• Magenta+Cyan+Yellow = Black
• Guess the size of that picture on the previous slide?
– Typical projector/monitor: 1024x768 = 786,432 pixels.
• Horizontal dimension: 1024 pixels, vertical dimension: 768 pixels
Basic Image Coding
• An image = Collection of picture elements (pixels)
– Each pixel has a “color”
• Black/white image: each pixel is one of 2 colors: use 1 bit (either 1 or 0) to
code a color
• Grayscale image: each pixel is one of 256 shades of gray: use 8 bits to code a color
• Color image: each pixel has 3 color components (RGB); each component is coded
with 8 bits (or 1 byte), so each pixel has 24 bits (or 3 bytes, 1 byte = 8
bits)
– So, 3 bytes per pixel for color images
• Screen
– Typical projector resolution: 1024x768 pixels
– A 1024x768 image requires 2.4 MB (=1024x768x3)
• So a picture is worth 400,000 words (1 word = 6 bytes)!
• Compression: do not use 3 bytes/pixel
– remove some unimportant colors that human eyes cannot see
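The arithmetic on this slide can be checked with a short Python sketch. The function name `raw_image_bytes` is made up for illustration; the numbers come from the slide.

```python
def raw_image_bytes(width, height, bytes_per_pixel=3):
    """Bytes needed to store every pixel with no compression."""
    return width * height * bytes_per_pixel

size = raw_image_bytes(1024, 768)          # 24-bit color
print(size)                                # 2359296 bytes
print(round(size / 1_000_000, 2), "MB")    # 2.36 MB, which the slide rounds to 2.4 MB
print(size // 6, "six-byte words")         # 393216, roughly 400,000 words

# The same screen at 1 bit per pixel (black/white) is 24 times smaller.
print(raw_image_bytes(1024, 768, 1) // 8, "bytes")   # 98304 bytes
```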
Look closely
Nothing new (color the pixels)
Georges Seurat, A Sunday Afternoon on the Island of La Grande Jatte
Visual Perception
• Closely spaced dots appear solid
– But irregularities in diagonal lines can stand out
• Any color can be produced from just three
– Red, Blue and Green: “additive” primary colors
• High frame rates produce apparent motion
– A smooth motion picture requires about 24
frames/second
• Visual acuity varies markedly across features
– Discontinuities of features easily seen, absolute difference
is less crucial
Monitor Characteristics
• Technology
– Cathode-Ray Tube (CRT): RGB 3 guns
– Flat panel (LCD: liquid crystal display)
• Size (15, 17, 19, 21 inch)
– Measured diagonally
– For CRT, key figure is “viewable area”
• Resolution
– 640x480 (VGA video card), 800x600 (old laptop),
1024x768 (popular now or near future), 1280x1024…
• Layout (three dot pixel, by lines)
• Dot pitch (0.26mm, 0.28mm)
– Distance between pixels on a monitor screen
• Color Refresh rate (60, 72, 80 Hz)
– Light flicker rate: 60 Hz
Some Questions
• How many images can a 1 GB flash card store?
– But mine holds about 500 images. How?
• How long will it take to send an image at 128KB/s?
– But my image-intensive Web page loads faster than that. How?
You should be able to answer these questions by the end of this
session; otherwise, come to Moodle to discuss.
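As a starting point for these questions, here is a rough sketch of the uncompressed case. It assumes binary units (1 GB = 1024³ bytes, 1 KB = 1024 bytes) and 3 bytes per pixel; the 3000x2000 camera image is a hypothetical example, not a figure from the slides.

```python
GB = 1024 ** 3   # binary units, assumed here for simplicity
KB = 1024

def raw_image_bytes(width, height, bytes_per_pixel=3):
    return width * height * bytes_per_pixel

def images_per_card(card_bytes, image_bytes):
    return card_bytes // image_bytes

def transfer_seconds(image_bytes, bytes_per_second):
    return image_bytes / bytes_per_second

screen = raw_image_bytes(1024, 768)     # screen-sized image from the earlier slide
camera = raw_image_bytes(3000, 2000)    # hypothetical 6-megapixel photo

print(images_per_card(GB, screen))          # 455
print(images_per_card(GB, camera))          # 59
print(transfer_seconds(screen, 128 * KB))   # 18.0
```

If a real card holds far more images, or a real page loads far faster, than these uncompressed numbers suggest, the difference is what compression buys.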
Compression
• Goal: Send the same information using fewer bits
• Technology originally developed for fax transmission
– Send high quality documents in short calls
• Two types of compression:
– Lossless compression: can reconstruct the original image
exactly after decompression
• File size reduced, but no information is lost
– Lossy compression: cannot reconstruct the original image exactly after
decompression, but the result looks nearly the same as the original
• Two compression strategies:
– Reduce redundancy
– Throw away stuff that doesn’t matter (because human eyes/ears
cannot see/hear)
• Whether to compress? Depends on:
– Computer speed
– Network transmission speed
Palette Selection
• Opportunity:
– No picture uses all 16 million colors
– Human eye does not see small differences between colors
• Approach:
– Select a palette of 256 colors
– Indicate which palette entry to use for each pixel
– Look up each color in the palette
“The rain in Spain falls mainly in the plain”
→ [*=ain,^=in]
“The r* ^ Sp* falls m*ly ^ the pl*”
…
…
1 pixel = 3 colors
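A minimal sketch of the palette idea in Python: store each distinct color once, then store one small index per pixel instead of 3 bytes. The helper names and the tiny pixel list are made up for illustration. This toy version is lossless because the image has only three colors; real palette selection must also map the many colors of a photo to the nearest of 256 entries, which this sketch does not attempt.

```python
def build_palette(pixels):
    """Return (palette, indices): each distinct color is stored once."""
    palette = []
    position = {}                  # color -> index in palette
    indices = []
    for color in pixels:
        if color not in position:
            position[color] = len(palette)
            palette.append(color)
        indices.append(position[color])
    return palette, indices

def restore(palette, indices):
    """Look up each pixel's color in the palette."""
    return [palette[i] for i in indices]

BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)
pixels = [BLUE, BLUE, WHITE, BLUE, RED, WHITE, BLUE, BLUE]

palette, indices = build_palette(pixels)
print(indices)                                     # [0, 0, 1, 0, 2, 1, 0, 0]

raw_bytes = 3 * len(pixels)                        # 3 bytes per pixel
indexed_bytes = 3 * len(palette) + len(indices)    # palette plus 1 byte per pixel
print(raw_bytes, indexed_bytes)                    # 24 17
```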
Compression Using Run-Length Encoding (RLE)
• Pixels are organized into lines
• Opportunity:
– Large regions of a single color are common
– Most pixels are the same as the one before
• As you can see from the lighthouse image
• Approach:
– Record # of consecutive pixels for each color
• An example of lossless encoding
Sheep go baaaaaaaaaa and cows go moooooooooo
→ Sheep go ba<10> and cows go mo<10>
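The sheep example can be sketched in a few lines of Python. This is an illustration of the idea on a text string, not the byte layout any particular image format uses; the function names are made up.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into (char, count)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

line = "b" + "a" * 10
pairs = rle_encode(line)
print(pairs)                        # [('b', 1), ('a', 10)]
print(rle_decode(pairs) == line)    # True: nothing was lost
```

Decoding gives back exactly the original string, which is what makes this a lossless method.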
Graphic Interchange Format (GIF)
• Do palette selection first, then do lossless compression
• Opportunity:
– Common colors are sent more often
• Approach: Huffman Encoding (illustrates variable-length codes; GIF files themselves use LZW compression)
– Use fewer bits to represent common colors
– Example, per 100 pixels: variable-length codes vs. a regular 2 bits/color

Color   % of image   Code   #Bits (variable)   #Bits (2 bits/color)
Blue    75%          1      75x1 = 75          75x2 = 150
White   20%          01     20x2 = 40          20x2 = 40
Red     5%           001    5x3 = 15           5x2 = 10
Total                       130 bits           200 bits
What is 10100101? Can you interpret the colors? If
you have no idea, come to Moodle to discuss this.
PNG (Portable Network Graphics): replacement for GIF (PNG has no
patent restrictions; GIF is owned by CompuServe.)
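The variable-length codes on this slide (Blue = 1, White = 01, Red = 001) can be tried out with a small Python sketch. The function names are made up, and the bit string used here is a different one from the exercise, so the question above is still yours to answer.

```python
CODES = {"Blue": "1", "White": "01", "Red": "001"}

def encode(colors):
    return "".join(CODES[c] for c in colors)

def decode(bits):
    """Read bits left to right; emit a color as soon as a full code is seen."""
    lookup = {code: color for color, code in CODES.items()}
    colors, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            colors.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits: " + current)
    return colors

bits = encode(["Red", "Blue", "Blue", "White"])
print(bits)            # 0011101
print(decode(bits))    # ['Red', 'Blue', 'Blue', 'White']

# Total bits for 100 pixels with the slide's color mix.
counts = {"Blue": 75, "White": 20, "Red": 5}
print(sum(n * len(CODES[c]) for c, n in counts.items()))   # 130
```

Decoding works without separators because no code is the beginning of another code.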
Joint Photographic Experts Group (JPEG)
• Opportunity:
– Eye sees sharp lines better than subtle shading
– Eye more sensitive to small changes in brightness than in color
• Approach:
– Retain detail only for the parts that matter most to the human eye
– Approximate changes in image with mathematical curves:
accomplished with Discrete Cosine Transform
• Allows user-selectable fidelity (users can select the compression level)
• Efficiently captures smooth transitions and shading
• Not as good at capturing sharp edges
• Results:
– Typical compression ratio is 20:1
Variable Compression Rate in JPEG
37 KB (20% rate)
4 KB (95% rate)
Vector Graphics
Line drawing using math functions. Re-scalable without loss of resolution
Raster vs. Vector Graphics
• Raster images (“bitmap graphics”)
– Actually describe the contents of the image
– Good for natural scenes
• Vector images
– Mathematically describe how to draw the image
– Rescalable without loss of resolution
Discussion Point: Selecting an Image Format
• Should I use GIF, JPEG, or vector graphics for …
• Color photos?
• Scanned black & white text? (Transcript, itinerary)
• Line drawings?
These are important practical questions to archivists and
digital librarians. Please come to Moodle to discuss
this.
Hands-On Exercise: Convert Between Formats
• Download and save two images
– http://www.csc.lsu.edu/~wuyj/Teaching/7008/fa14/Images/image1.jpg
– http://www.csc.lsu.edu/~wuyj/Teaching/7008/fa14/Images/image2.gif
• Use Microsoft Paint (on Windows: All Programs →
Accessories → Paint) to convert each to the other
format, and compare the quality and the file size
– Observe the difference
– Why the difference?
Basic Video Coding
• Display a sequence of images
– Fast enough for smooth motion and no flicker
– Motion picture film: smooth show at 24 pictures/second
• NTSC Video
– National Television System Committee (analog TV system)
– 60 “interlaced” half-frames/second, 512x486 pixel images
• HDTV
– 30 “progressive” full-frames/second, 1280x720 pixel images
Video Data Rates
• “NTSC” Quality Computer Display
– 640 x 480 pixel image
– 3 bytes per pixel (red, green, blue)
– 30 Frames per second
• Bandwidth requirement
– 26.4 MB/second
– That exceeds the bandwidth of most disk drives!
• Storage
– CD-ROM would hold 25 seconds worth of NTSC video
• What is the capacity of CD-ROM? 650-900MB
– 30 minutes would require 46.3 GB
– About 100GB/hr: too big!
– Compression! Multimedia is big! Compress harder!
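The figures on this slide follow from simple multiplication. The sketch below reproduces them, assuming binary units (1 MB = 1024² bytes, 1 GB = 1024³ bytes) and a 650 MB CD-ROM; the function name is made up.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate: every pixel of every frame is sent."""
    return width * height * bytes_per_pixel * fps

rate = video_bytes_per_second(640, 480, 3, 30)
print(rate)                             # 27648000 bytes/second
print(round(rate / MB, 1))              # 26.4 MB/second
print(round(650 * MB / rate, 1))        # 24.7 seconds fit on a 650 MB CD-ROM
print(round(rate * 30 * 60 / GB, 1))    # 46.3 GB for 30 minutes
print(round(rate * 3600 / GB, 1))       # 92.7 GB per hour, i.e. about 100 GB/hr
```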
Video Compression
• Opportunity:
– One frame looks very much like the next
• Approach:
– Record only the pixels that change (trace the difference)
• Standards:
– MPEG-1: for Web video (download then play)
– MPEG-2: for HDTV and DVD (commercial quality)
– MPEG-4: for Web video (streaming)
– Next?
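To make "record only the pixels that change" concrete, here is a toy Python sketch using one-dimensional frames of 8 pixels. The function names and pixel values are made up. Real MPEG encoders work on blocks of pixels and estimate motion rather than listing individual pixels, so treat this only as an illustration of the opportunity.

```python
def frame_delta(previous, current):
    """List (index, new_value) for every pixel that changed."""
    return [(i, new)
            for i, (old, new) in enumerate(zip(previous, current))
            if old != new]

def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [0, 0, 0, 9, 9, 0, 0, 0]
frame2 = [0, 0, 0, 0, 9, 9, 0, 0]    # the bright object moved one pixel right

delta = frame_delta(frame1, frame2)
print(delta)                                 # [(3, 0), (5, 9)]
print(apply_delta(frame1, delta) == frame2)  # True
```

Only 2 of the 8 pixels had to be recorded for the second frame.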
MPEG Encoding
I1 B1 B2 B3 P1 B4 B5 B6 P2 B7 B8 B9 I2
Frame types:
I  Intra: encode complete image, similar to JPEG
P  Forward Predicted: motion relative to previous I’s and P’s
B  Bidirectionally Predicted: motion relative to previous & future I’s & P’s
MPEG1 Frame Reconstruction
I1 → I1+P1 → I1+P1+P2 → ••• → I2 → •••
• I frames provide complete image
• P frames (P1, P2, …) provide series of updates to most recent I frame
• What if an I frame is dropped? Bad!
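The reconstruction chain I1, I1+P1, I1+P1+P2 can be mimicked with a toy Python sketch. The stream format here (a list of `("I", full_frame)` and `("P", updates)` entries over 4-pixel frames) is invented for illustration and is not the MPEG bitstream format.

```python
def reconstruct(stream):
    """Rebuild frames from ('I', full_frame) and ('P', updates) entries."""
    frames, current = [], None
    for kind, payload in stream:
        if kind == "I":
            current = list(payload)            # complete image
        elif current is None:
            frames.append(None)                # update with nothing to apply it to
            continue
        else:
            current = list(current)
            for index, value in payload:       # apply updates to the last frame
                current[index] = value
        frames.append(current)
    return frames

stream = [
    ("I", [0, 0, 9, 0]),        # I1
    ("P", [(2, 0), (3, 9)]),    # P1
    ("P", [(0, 5)]),            # P2
]
print(reconstruct(stream))
# [[0, 0, 9, 0], [0, 0, 0, 9], [5, 0, 0, 9]]

print(reconstruct(stream[1:]))  # I1 was dropped
# [None, None]
```

Without the I frame, none of the P updates that depend on it can be turned into a picture.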
Frame Reconstruction
I1 → I1+P1 → I1+P1+P2 → ••• → I2 → •••
• B frames (B1 B2 B3, B4 B5 B6, B7 B8 B9) interpolate between frames
represented by I’s & P’s
Basic Audio Encoding (Digitizing)
• Sample at twice the highest frequency to be captured (about 22 kHz for music)
– 8 or 16 bits per sample, sample rate: X samples/second
• Speech (0-4 kHz) requires 8 kB/s
– Standard telephone channel (1-byte samples)
• Music (0-22 kHz) requires 172 kB/s (uncompressed)
– Standard for CD-quality audio (2-byte samples)
• Pitch range:
– http://www.youtube.com/watch?v=zESbrwRvMyM
– Caution! Extremely high pitch: the following can hurt your
ears! Stop playing when you feel uncomfortable!
http://www.youtube.com/watch?v=BX7Ar3Z-oTo
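Digitizing means measuring the sound wave at regular intervals and rounding each measurement to a whole number. The Python sketch below does this for a pure 1 kHz tone at the telephone rate of 8,000 samples/second with 8-bit samples. The function names and the choice of test tone are assumptions made for illustration.

```python
import math

def digitize(frequency_hz, sample_rate_hz, duration_s, bits=8):
    """Sample a sine tone and quantize each sample to an unsigned integer."""
    levels = 2 ** bits
    count = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(count):
        amplitude = math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)  # -1..1
        level = int((amplitude + 1) / 2 * levels)    # map -1..1 onto 0..levels
        samples.append(min(levels - 1, level))       # clamp the top value
    return samples

def bytes_per_second(sample_rate_hz, bits):
    return sample_rate_hz * bits // 8

samples = digitize(frequency_hz=1000, sample_rate_hz=8000, duration_s=0.001)
print(len(samples))                 # 8 samples cover one millisecond
print(bytes_per_second(8000, 8))    # 8000 bytes/second, the telephone figure
```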
Music Compression
• Opportunity:
– The human ear cannot hear all frequencies at once
• Approach:
– Don’t represent “masked” frequencies
• Standard: MPEG-1 Layer 3 (.mp3)
Loudness: http://www.tlc-direct.co.uk/Technical/Sounds/Decibles.htm
(Figure axes: loudness vs. frequency)
Temporal Masking
If we hear a loud sound, then it stops, it takes a while until
we can hear a soft tone at about the same frequency.
“Psychoacoustic compression”
– Eliminate sounds below threshold of hearing
– Eliminate sounds that are frequency masked
– Eliminate sounds that are temporally masked
– Eliminate stereo information for low frequencies
Compact Disk (CD) Recording
• Parameters
– 44,100 samples per second
• Sufficient for frequency response of 22KHz
– Each sample takes 16 bits
• About 96 dB (decibel) dynamic range (roughly 6 dB per bit)
– Two independent channels: stereo sound
• Dolby surround-sound uses tricks to pack 5 sound channels +
subwoofer effects
• Bit Rate
– 44.1K samples/sec x 2 channels x 2 bytes/sample = 172 KB/sec
• Typical Capacity
– 74 Minutes maximum playing time
– 747 MB total
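The bit rate and capacity on this slide can be reproduced in Python, assuming binary units (1 KB = 1024 bytes, 1 MB = 1024² bytes):

```python
KB = 1024
MB = 1024 ** 2

SAMPLE_RATE = 44_100      # samples per second
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16 bits

bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
print(bytes_per_second)                  # 176400
print(round(bytes_per_second / KB))      # 172 KB/sec

total = bytes_per_second * 74 * 60       # 74 minutes of audio
print(round(total / MB))                 # 747 MB
```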
Speech Compression
• Opportunity:
– Human voices vary in predictable ways
• Approach:
– Predict what’s next, then send only any corrections/changes
• Standards:
– RealAudio can encode speech at 6.5 kb/sec
• Demo at http://www.data-compression.com/speech.html
– Scroll down to near the bottom: “VII. Demonstration”
– Listen to the original and LPC10U (2400bps) to hear how
speech quality changes at different compression rates.
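The "predict, then send only corrections" approach can be illustrated with the simplest possible predictor: guess that each sample equals the previous one and transmit the difference. This is a sketch with made-up sample values and function names. Real speech coders such as LPC use much more elaborate predictors and are lossy, while this toy version is lossless.

```python
def encode_residuals(samples):
    """Predict each sample as the previous one; keep only the correction."""
    residuals, prediction = [], 0
    for s in samples:
        residuals.append(s - prediction)
        prediction = s
    return residuals

def decode_residuals(residuals):
    """Rebuild the samples by adding each correction to the running prediction."""
    samples, prediction = [], 0
    for r in residuals:
        prediction += r
        samples.append(prediction)
    return samples

samples = [100, 102, 105, 107, 108, 108, 106]
residuals = encode_residuals(samples)
print(residuals)                                # [100, 2, 3, 2, 1, 0, -2]
print(decode_residuals(residuals) == samples)   # True
```

After the first value, the corrections are small numbers, and small numbers need fewer bits than full samples.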
Narrated PowerPoint
• Create your slides using PowerPoint
• Slide Show → Record Narration
– Set microphone level
• Record the narration
– Slide transitions are automatically captured
• Narration plays automatically when displayed
– Synchronized between slide flipping and narration
Adding Video to PowerPoint
• Insert → Movies and Sounds
– Movies from file (a .mpg file)
• Decide whether you want “autostart”
– If not, it starts when you click on it
The “Last Mile”: bandwidth to your desk
• Traditional modems
– “56” kb/sec modems really move data at ~3 kB/sec
– Theoretical maximum: 56 kb/s
• Digital Subscriber Lines (DSL)
– 384 kb/sec downloads (~38 kB/sec)
– 128 kb/sec uploads (~12 kB/sec)
• Cable modems
– 10 Mb/sec downloads (~1 MB/sec)
– 256 kb/sec uploads (~25kB/sec)
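The kb/sec (kilobits) to kB/sec (kilobytes) conversions for DSL and cable on this slide are consistent with a rule of thumb of about 10 bits on the wire per byte of data. That factor is an assumption inferred from the numbers, not something the slide states, and it does not explain the lower modem figure. A small Python sketch under that assumption:

```python
BITS_PER_BYTE_ON_WIRE = 10   # assumption: 8 data bits plus about 2 bits of overhead

def kilobytes_per_second(kilobits_per_second):
    return kilobits_per_second / BITS_PER_BYTE_ON_WIRE

def seconds_to_download(size_kilobytes, kilobits_per_second):
    return size_kilobytes / kilobytes_per_second(kilobits_per_second)

links = [("DSL download", 384), ("DSL upload", 128),
         ("Cable download", 10_000), ("Cable upload", 256)]
for name, kbps in links:
    print(name, kilobytes_per_second(kbps), "kB/sec")

# Time to fetch the 37 KB JPEG from the earlier slide over a DSL download link.
print(round(seconds_to_download(37, 384), 2), "seconds")
```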
Multimedia on a Web Server
(Diagram: Web Browser, Web Server, Media Player)
• Object stored in a file
• File transferred as an HTTP object:
– Received entirely at the client
– Passed to media player for play
– Drawback: nothing plays until the whole file has been downloaded, and downloading is slow
Streaming
(Diagram: Web Browser, Web Server, Media Player, Streaming Server. Labels: “buffering”; “Can be downloaded and installed”)
• Browser gets a portion of media file over HTTP
– Launches media player to interpret that media file
• Media player contacts streaming server
Streaming Audio and Video
• Begin to play after only a portion received
• Buffer provides time to recover lost packets
• Interrupts replay when “rebuffering”
• Data not saved to hard drive
(Diagram: Media Server, Internet, Buffer)
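The buffering behavior can be imitated with a toy simulation in Python. Everything here is an assumption made for illustration: time moves in one-second ticks, the player starts (or resumes) once 2 seconds of media are buffered, and it plays 1 second of media per tick. Real players use more sophisticated rules.

```python
def simulate(arrivals, start_threshold=2):
    """arrivals[t] = seconds of media received during tick t (one tick = 1 s)."""
    buffered, playing, log = 0, False, []
    for received in arrivals:
        buffered += received
        if not playing and buffered >= start_threshold:
            playing = True              # enough buffered: start or resume
        if playing and buffered >= 1:
            buffered -= 1               # consume one second of media
            log.append("play")
        else:
            playing = False             # buffer ran dry: pause and refill
            log.append("buffer")
    return log

# The network delivers steadily, stalls for three seconds, then recovers.
print(simulate([1, 1, 1, 0, 0, 0, 1, 1, 1]))
# ['buffer', 'play', 'play', 'play', 'buffer', 'buffer', 'buffer', 'play', 'play']
```

The first second of the stall is hidden by the buffer; after that the player has to stop and rebuffer.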
Lost Packets (IP Phone)
• Network loss
– Packets completely lost (e.g., due to collisions)
• Delay loss
– Packets arrive too late for playout
• Due to: queueing; sender and receiver processing delays
• IP Phone: Typical maximum tolerable delay: 400 ms
• Loss tolerance
– 1% to 10% packet loss may be tolerable
• Some encoding schemes are more tolerant than others
Multiple Client Rates
1.5 Mbps encoding
28.8 Kbps encoding
Q: how to handle different client receiving rate capabilities?
– 28.8 Kbps dialup
– 128 Kbps to 3 Mbps (residential DSL service)
– 100Mbps Ethernet
A: server stores, transmits multiple copies of video, encoded
at different rates, for different users
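One simple way a server could choose among stored copies is to pick the highest-rate encoding that still fits the client's connection. The Python sketch below assumes only the two encodings shown on the slide; `pick_encoding` is a hypothetical helper, not part of any real streaming server.

```python
ENCODINGS_KBPS = [28.8, 1500]    # the two encodings shown on the slide

def pick_encoding(client_kbps, encodings=ENCODINGS_KBPS):
    """Highest-rate encoding the client can receive, or None if none fits."""
    usable = [rate for rate in encodings if rate <= client_kbps]
    return max(usable) if usable else None

print(pick_encoding(28.8))       # 28.8  (dialup)
print(pick_encoding(128))        # 28.8  (slow DSL cannot carry 1.5 Mbps)
print(pick_encoding(3000))       # 1500  (fast DSL)
print(pick_encoding(100_000))    # 1500  (100 Mbps Ethernet)
```

Storing more intermediate encodings would serve the 128 Kbps client better, at the cost of more disk space on the server.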
Synchronizing Multiple Media
• Scripting Languages for synchronizing multiple
media:
– Synchronized Multimedia Integration Language (SMIL)
• Custom applications for this:
– Macromedia Flash
• Content representation standards for this:
– MPEG 4
SMIL
• Synchronized Multimedia Integration
Language
• Integration of multimedia with text, audio,
video
• Supported in RealPlayer
Slide from http://www.umiacs.umd.edu/~jimmylin/LBSC690-2007-Spring/content.html (Session 5)
SMILe
• Follows W3C standard
– Player-specific extensions are common
– Real Player implements SMIL (or SMILe)
• It is XML, with a structure similar to HTML
<smil>
<head> … </head>
<body> … </body>
</smil>
Elements in SMIL
• Window controls (in <head>)
– Controlling layout: <region>, <root-layout>
• Timeline controls (in <body>)
– Sequence control: <seq>, <excl>, <par>
– Timing control: begin, end, dur (attributes on the media and timeline elements)
• Content types (in <body>)
– <audio>, <video>, <img>, <ref>
SMIL Examples
• Implemented in RealOne Player
• You need to install RealOne Player (or Real Player) to run the following
examples
• Demo:
http://www.csc.lsu.edu/~wuyj/Teaching/7008/fa14/SMIL-demo/index.htm
There are 3 sets of executable and text files; at least run and read the last set:
– First, run the smildemo.smil (executable)
– Then, view smildemo.smil (xml) file
• Question: can you make sense of smildemo.smil?
– You are welcome to play with the first 2 sets.
SMIL Example
<smil>
<head>
<meta name="title" content="Online Teaching Services promo" />
<meta name="author" content="Jay Moonah, CAT" />
<layout type="text/smil-basic-layout">
<root-layout width="280" height="316" background-color="white"/>
<region id="AnimChannel1" title="AnimChannel1"
left="0" top="0" height="265" width="280" fit="hidden"/>
</layout>
</head>
<body>
<par title="Online Teaching Services promo" author="Jay Moonah, CAT" >
<audio src="final.rm" id="Soundtrack" title="Soundtrack"/>
<animation src="otscompfin.swf" id="Animation"
region="AnimChannel1" title="Animation" fill="freeze"/>
<text src="cc.rt" id="caption" region="cc" title="cc" fill="freeze"/>
</par>
</body>
</smil>
Slide from http://www.umiacs.umd.edu/~jimmylin/LBSC690-2007-Spring/content.html (Session 5)
Synchronizing
audio, animation,
and text.
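Because SMIL is XML, any XML parser can read it. The Python sketch below parses a trimmed copy of the example (most title and meta attributes removed for brevity) and lists what plays in parallel inside `<par>`. It also checks each `region` reference against the regions declared in `<layout>`; this check is an illustration only and says nothing about how a particular player handles an undeclared region.

```python
import xml.etree.ElementTree as ET

SMIL = """
<smil>
  <head>
    <layout>
      <root-layout width="280" height="316"/>
      <region id="AnimChannel1" left="0" top="0" height="265" width="280"/>
    </layout>
  </head>
  <body>
    <par>
      <audio src="final.rm"/>
      <animation src="otscompfin.swf" region="AnimChannel1"/>
      <text src="cc.rt" region="cc"/>
    </par>
  </body>
</smil>
"""

root = ET.fromstring(SMIL)
declared = {region.get("id") for region in root.iter("region")}

media = [(item.tag, item.get("src"), item.get("region"))
         for item in root.find("body/par")]
for tag, src, region in media:
    print(tag, src, region)

# Regions referenced by media items but never declared in <layout>.
undeclared = [region for _, _, region in media
              if region is not None and region not in declared]
print(undeclared)    # ['cc']
```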
From Media to Multimedia…
• Tricking the human senses:
– Blending pixels into a seamless image
– Rapidly cycling through images to create motion (video)
– Sampling analog waveforms to create digital recordings
• Lots of information required to encode images,
movies, and sounds
– Result: you get a bulky digital file, not handy for online
distribution and access.
– The key is compression!
• Synchronization of different media sources leads to
multimedia applications
Discussion Point: When is Lossless Compression Important?
• For images?
• For text?
• For sound?
• For video?
Please come to Moodle to discuss.