LSU/SLIS
Multimedia
Session 4
LIS 7008 Information Technologies

Agenda
• HW2
• Quiz
• Images
• Video
• Audio
• Streaming
• SMILe

HW2
• Good work
  – Your webpage is well rendered in a browser.
• Should use paragraphs
  – Findings based on each approach are addressed.
  – A correct (or partially correct) conclusion is drawn by pulling together all the findings. An attempt to draw a conclusion about who owns the website is clearly present.
  – Some good examples (although not perfect):
    • Distributed on Moodle

Quiz
• Open book (text, slides, Internet)
• Posted on Moodle
• Scope: covers Sessions 1-3
  – Readings, slides, homework
• Purposes
  – Measure your learning progress
  – Preview what the midterm exam will be like
• Time: 20 minutes from opening to closing
• Due:
  – See instructions on Moodle
  – Read the instructions BEFORE opening the quiz!!!
  – You can download the quiz any time, but do not open it until you are ready to take it.

The Gullibility of Human Senses
• Three simple tricks for producing
  – Images
  – Video
  – Audio
• But how do you move the bits around fast enough?
  – Remove redundancy: apply compression!
  – Throw away stuff that doesn't matter (because eyes cannot see it)
• Synchronizing different media to create multimedia

A lighthouse picture

Specify color
• Additive colors and subtractive colors
  – Additive primary colors: RGB (red, green, blue); mixing two of them produces a secondary color
    • http://en.wikipedia.org/wiki/Primary_color
    • Red+Green+Blue = White
  – Subtractive primary colors: CMY (cyan, magenta, yellow); pigments absorb colors
    • http://en.wikipedia.org/wiki/Subtractive_color
    • Magenta+Cyan+Yellow = Black
• Guess the size of that picture on the previous slide?
  – Typical projector/monitor: 1024x768 = 786,432 pixels.
    • Horizontal dimension: 1024 pixels; vertical dimension: 768 pixels

Basic Image Coding
• An image = collection of picture elements (pixels)
  – Each pixel has a "color"
    • Black/white image: 1 bit per pixel (either 1 or 0)
    • Grayscale image: 8 bits per pixel (256 shades of gray)
    • Color image: each pixel has 3 color components (RGB), each coded with 8 bits (1 byte), so each pixel takes 24 bits (3 bytes; 1 byte = 8 bits)
  – So, 3 bytes per pixel for color images
• Screen
  – Typical projector resolution: 1024x768 pixels
  – A 1024x768 image requires about 2.4 MB (1024 x 768 x 3 = 2,359,296 bytes)
    • So a picture is worth about 400,000 words (1 word = 6 bytes)!
• Compression: avoid spending 3 bytes on every pixel
  – e.g., remove color detail that human eyes cannot see

Look closely
Nothing new (color the pixels)
Georges Seurat, A Sunday Afternoon on the Island of La Grande Jatte

Visual Perception
• Closely spaced dots appear solid
  – But irregularities in diagonal lines can stand out
• Almost any color can be produced from just three
  – Red, green and blue: the "additive" primary colors
• High frame rates produce apparent motion
  – Smooth motion picture film requires about 24 frames/second
• Visual acuity varies markedly across features
  – Discontinuities between features are easily seen; absolute differences matter less

Monitor Characteristics
• Technology
  – Cathode-ray tube (CRT): RGB, 3 electron guns
  – Flat panel (LCD, liquid crystal display)
• Size (15, 17, 19, 21 inch)
  – Measured diagonally
  – For CRT, the key figure is "viewable area"
• Resolution
  – 640x480 (VGA video card), 800x600 (old laptop), 1024x768 (popular now or in the near future), 1280x1024…
• Layout (three-dot pixel, by lines)
• Dot pitch (0.26mm, 0.28mm)
  – Distance between pixels on a monitor screen
• Refresh rate (60, 72, 80 Hz)
  – Light flicker rate: 60 Hz

Some Questions
• How many images can a 1 GB flash card store?
  – But mine holds about 500 images. How?
• How long will it take to send an image at 128KB/s?
  – But my image-intensive Web page loads faster than that. How?
You should be able to answer these questions by the end of this session; otherwise, come to Moodle to discuss.

Compression
• Goal: send the same information using fewer bits
• Technology originally developed for fax transmission
  – Send high-quality documents in short calls
• Two types of compression:
  – Lossless: the original image can be reconstructed exactly after decompression
    • File size is reduced, but no information is lost
  – Lossy: the original image cannot be reconstructed exactly after decompression, but the decompressed image looks (nearly) the same as the original
• Two compression strategies:
  – Reduce redundancy
  – Throw away stuff that doesn't matter (because human eyes/ears cannot see/hear it)
• Whether to compress? Depends on:
  – Computer speed
  – Network transmission speed

Palette Selection
• Opportunity:
  – No picture uses all 16 million colors
  – The human eye does not see small differences between colors
• Approach:
  – Select a palette of 256 colors
  – Indicate which palette entry to use for each pixel (1 byte per pixel instead of 3)
  – Look up each color in the palette
• Analogy (substitute short codes for common patterns):
  "The rain in Spain falls mainly in the plain" → [*=ain, ^=in] "The r* ^ Sp* falls m*ly ^ the pl*"
[Figure: 1 pixel = 3 colors]

Compression Using Run-Length Encoding (RLE)
• Pixels are organized into lines
• Opportunity:
  – Large regions of a single color are common
  – Most pixels are the same as the one before
    • As you can see from the lighthouse image
• Approach:
  – Record the number of consecutive pixels of each color
• An example of lossless encoding:
  "Sheep go baaaaaaaaaa and cows go moooooooooo" → "Sheep go ba<10> and cows go mo<10>"

Graphics Interchange Format (GIF)
• Do palette selection first, then do lossless compression
  – (GIF's lossless step is actually LZW compression; Huffman encoding, shown below, is another common lossless technique)
• Opportunity:
  – Common colors are sent more often
• Approach: Huffman encoding
  – Use fewer bits to represent common colors
  – Encoding (per 100 pixels):

    Color   % of image   Code   #Bits (variable code)   #Bits (fixed 2 bits/color)
    Blue    75%          1      75 x 1 = 75             75 x 2 = 150
    White   20%          01     20 x 2 = 40             20 x 2 = 40
    Red     5%           001    5 x 3 = 15              5 x 2 = 10
    Total                       130 bits                200 bits

What is 10100101? Can you interpret the colors? If you have no idea, come to Moodle to discuss this.

PNG (Portable Network Graphics): a replacement for GIF. PNG has no patent restrictions; GIF was created by CompuServe, and the LZW compression it uses was patent-encumbered (held by Unisys) until the patents expired in 2003-2004.

Joint Photographic Experts Group (JPEG)
• Opportunity:
  – The eye sees sharp lines better than subtle shading
  – The eye is more sensitive to small changes in brightness than in color
• Approach:
  – Retain detail only for the parts most important to the human eye
  – Approximate changes in the image with mathematical curves, using the Discrete Cosine Transform (DCT)
    • Allows user-selectable fidelity (users can choose the compression rate)
    • Efficiently captures smooth transitions and shading
    • Not as good at capturing sharp edges
• Results:
  – Typical compression rate is 20:1

Variable Compression Rate in JPEG
[Figure: 37 KB (20% rate) vs. 4 KB (95% rate)]

Vector Graphics
Line drawing using math functions. Rescalable without loss of resolution.

Raster vs. Vector Graphics
• Raster images ("bitmap graphics")
  – Actually describe the contents of the image
  – Good for natural scenes
• Vector images
  – Mathematically describe how to draw the image
  – Rescalable without loss of resolution

Discussion Point: Selecting an Image Format
• Should I use GIF, JPEG, or vector graphics for …
  – Color photos?
  – Scanned black & white text? (Transcript, itinerary)
  – Line drawings?
These are important practical questions for archivists and digital librarians. Please come to Moodle to discuss this.

Hands-On Exercise: Convert Between Formats
• Download and save two images
  – http://www.csc.lsu.edu/~wuyj/Teaching/7008/fa14/Images/image1.jpg
  – http://www.csc.lsu.edu/~wuyj/Teaching/7008/fa14/Images/image2.gif
• Use Microsoft Paint (on Windows: All Programs > Accessories > Paint) to convert each image to the other format, and compare the quality and the file size
  – Observe the difference
  – Why the difference?
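The image arithmetic, run-length encoding, and variable-length code from the slides above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any imaging library: plain lists stand in for a row of pixels, the helper names (raw_image_bytes, rle_encode, rle_decode, decode_prefix_code) are hypothetical, and 1 KB is taken as 1024 bytes for the transfer-time estimate. The code table is the one from the GIF slide (Blue = 1, White = 01, Red = 001); the decoded bit string is a different one from the slide's exercise, so that question is still yours to answer.

```python
# Sketch only: plain Python lists stand in for image rows; the function names
# below are hypothetical, not from any imaging library.

def raw_image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size: R, G and B take 1 byte each per pixel."""
    return width * height * bytes_per_pixel

def rle_encode(pixels):
    """Run-length encoding: turn each run of equal pixels into (value, count)."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # same as the previous pixel: extend the run
        else:
            runs.append([p, 1])       # a new color starts a new run
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the exact original pixels."""
    return [value for value, count in runs for _ in range(count)]

# Code table from the GIF slide. No code word is a prefix of another,
# so bits can be decoded left to right without separators.
CODE = {"1": "Blue", "01": "White", "001": "Red"}

def decode_prefix_code(bits, code=CODE):
    colors, current = [], ""
    for bit in bits:
        current += bit
        if current in code:           # a complete code word has been read
            colors.append(code[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return colors

size = raw_image_bytes(1024, 768)
print(size)                           # 2359296 bytes, about 2.4 MB
print(size / (128 * 1024))            # 18.0 seconds at 128 KB/s (1 KB = 1024 bytes)

row = ["white"] * 10 + ["black"] * 3 + ["white"] * 5
runs = rle_encode(row)
print(runs)                           # [('white', 10), ('black', 3), ('white', 5)]
assert rle_decode(runs) == row        # lossless: the row comes back exactly

print(decode_prefix_code("1101001"))  # ['Blue', 'Blue', 'White', 'Red']
```

Eighteen seconds per raw image is why compressed formats matter for the "Some Questions" slide: an image-heavy Web page loads faster because its images are not sent at 3 bytes per pixel.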
Basic Video Coding
• Display a sequence of images
  – Fast enough for smooth motion and no flicker
  – Motion picture film: smooth motion at 24 pictures/second
• NTSC video
  – National Television System Committee (analog TV system)
  – 60 "interlaced" half-frames/second, 512x486 pixel images
• HDTV
  – 30 "progressive" full-frames/second, 1280x720 pixel images

Video Data Rates
• "NTSC"-quality computer display
  – 640 x 480 pixel image
  – 3 bytes per pixel (red, green, blue)
  – 30 frames per second
• Bandwidth requirement
  – 26.4 MB/second (640 x 480 x 3 x 30 = 27,648,000 bytes/second)
  – That exceeds the bandwidth of most disk drives!
• Storage
  – A CD-ROM would hold about 25 seconds' worth of NTSC video
    • What is the capacity of a CD-ROM? 650-900 MB
  – 30 minutes would require 46.3 GB
  – About 100 GB/hr: too big!
  – Compression!

Multimedia is big! Compress harder!

Video Compression
• Opportunity:
  – One frame looks very much like the next
• Approach:
  – Record only the pixels that change (track the differences)
• Standards:
  – MPEG-1: for Web video (download, then play)
  – MPEG-2: for HDTV and DVD (commercial quality)
  – MPEG-4: for Web video (streaming)
  – Next?

MPEG Encoding
Frame sequence: ••• I1 B1 B2 B3 P1 B4 B5 B6 P2 B7 B8 B9 I2 •••
Frame types:
• I (Intra-coded): encodes the complete image, similar to JPEG
• P (Forward predicted): motion relative to the previous I and P frames
• B (Bidirectionally predicted): motion relative to the previous and future I and P frames

MPEG-1 Frame Reconstruction
[Figure: I1 → I1+P1 → I1+P1+P2 ••• I2 •••, with P1 and P2 as updates]
• I frames provide a complete image
• P frames provide a series of updates to the most recent I frame
• What if an I frame is dropped? Bad! The frames that depend on it cannot be reconstructed until the next I frame arrives.
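The raw data rate and the "record only the pixels that change" idea above can be sketched as follows. This is a deliberately simplified I/P scheme under stated assumptions, not MPEG itself: a frame is a flat list of pixel values, each P frame stores only the pixels that differ from the previous frame (real MPEG works on motion-compensated blocks with DCT coding, which this leaves out), and the helper names are hypothetical. Taking 1 MB = 1024 x 1024 bytes reproduces the slide's 26.4 MB/s.

```python
# Sketch: raw video bandwidth, plus a toy I/P frame scheme.
# Assumptions: a frame is a flat list of pixel values; a "P" frame is stored as
# {pixel_index: new_value} relative to the previous frame. Real MPEG uses
# motion-compensated blocks and DCT coding, which this deliberately omits.

def raw_video_bytes_per_second(width, height, bytes_per_pixel, fps):
    return width * height * bytes_per_pixel * fps

def encode(frames):
    """First frame is stored whole (I frame); later frames store only changes."""
    encoded = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        changes = {i: v for i, (p, v) in enumerate(zip(prev, cur)) if p != v}
        encoded.append(("P", changes))
    return encoded

def decode(encoded):
    """Rebuild each frame by applying its changes to the previous frame."""
    frames, current = [], None
    for kind, data in encoded:
        if kind == "I":
            current = list(data)
        else:
            if current is None:       # the I frame this P frame needs is missing
                raise ValueError("P frame with no preceding I frame")
            current = list(current)
            for i, v in data.items():
                current[i] = v
        frames.append(current)
    return frames

rate = raw_video_bytes_per_second(640, 480, 3, 30)
print(rate, round(rate / 1024**2, 1))   # 27648000 26.4

frames = [[0, 0, 0, 0], [0, 5, 0, 0], [0, 5, 7, 0]]
enc = encode(frames)
print(enc)    # [('I', [0, 0, 0, 0]), ('P', {1: 5}), ('P', {2: 7})]
assert decode(enc) == frames
```

Calling decode(enc[1:]) raises ValueError, which is the "dropped I frame" problem in miniature: the P frames carry only changes, so without their starting picture there is nothing to apply them to.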
Frame Reconstruction
[Figure: I1 → I1+P1 → I1+P1+P2 ••• I2 •••, with interpolated frames B1-B9 in between]
• B frames interpolate between the frames represented by I's and P's

Basic Audio Encoding (Digitizing)
• Sample at (at least) twice the highest frequency (e.g., 44.1 kHz for sounds up to 22 kHz)
  – 8 or 16 bits per sample; sample rate: X samples/second
[Figure: sampler]
• Speech (0-4 kHz) requires 8 kB/s
  – Standard telephone channel (8,000 1-byte samples/second)
• Music (0-22 kHz) requires 172 kB/s (uncompressed)
  – Standard for CD-quality audio (2-byte samples, stereo)
• Pitch range:
  – http://www.youtube.com/watch?v=zESbrwRvMyM
  – Caution! Extremely high pitch: the following can hurt your ears! Stop playing if you feel uncomfortable! http://www.youtube.com/watch?v=BX7Ar3Z-oTo

Music Compression
• Opportunity:
  – The human ear cannot hear all frequencies at once
• Approach:
  – Don't represent "masked" frequencies
• Standard: MPEG-1 Layer 3 (.mp3)
Loudness: http://www.tlc-direct.co.uk/Technical/Sounds/Decibles.htm
[Figure: loudness vs. frequency]

Temporal Masking
If we hear a loud sound and then it stops, it takes a while until we can hear a soft tone at about the same frequency.
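The audio data rates on the "Basic Audio Encoding" slide follow from one multiplication: samples per second x bytes per sample x channels, with the sample rate set to twice the highest frequency. A minimal sketch, with hypothetical helper names; 1 kB is taken as 1024 bytes for the CD figure, which is what reproduces the slide's 172 kB/s.

```python
# Sketch: uncompressed (PCM) audio data rates from the slides.
# Helper names are hypothetical; 1 kB is taken as 1024 bytes for the CD figure.

def nyquist_rate(highest_hz):
    """Minimum sampling rate: twice the highest frequency to be captured."""
    return 2 * highest_hz

def pcm_bytes_per_second(sample_rate, bytes_per_sample, channels=1):
    """Samples per second x bytes per sample x number of channels."""
    return sample_rate * bytes_per_sample * channels

speech = pcm_bytes_per_second(nyquist_rate(4000), 1)   # telephone: 8,000 1-byte samples/s
cd = pcm_bytes_per_second(44100, 2, channels=2)        # CD: 16-bit samples, stereo

print(speech)                   # 8000 bytes/s, i.e. 8 kB/s
print(cd, round(cd / 1024, 1))  # 176400 bytes/s, 172.3 kB/s
```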
"Psychoacoustic compression"
• Eliminate sounds below the threshold of hearing
• Eliminate sounds that are frequency masked
• Eliminate sounds that are temporally masked
• Eliminate stereo information for low frequencies

Compact Disc (CD) Recording
• Parameters
  – 44,100 samples per second
    • Sufficient for a frequency response of 22 kHz
  – Each sample takes 16 bits
    • About 96 dB (decibel) dynamic range
  – Two independent channels: stereo sound
    • Dolby surround sound uses tricks to pack 5 sound channels + subwoofer effects
• Bit rate
  – 44.1K samples/sec x 2 channels x 2 bytes/sample = 172 KB/sec
• Typical capacity
  – 74 minutes maximum playing time
  – 747 MB total

Speech Compression
• Opportunity:
  – Human voices vary in predictable ways
• Approach:
  – Predict what's next, then send only the corrections/changes
• Standards:
  – RealAudio can code speech in 6.5 kb/sec
• Demo at http://www.data-compression.com/speech.html
  – Scroll down to near the bottom: "VII. Demonstration"
  – Listen to the original and to LPC10U (2400 bps) to hear how the compression rate affects speech quality.
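Two sketches for the CD and speech slides above. The first checks the CD arithmetic: an n-bit sample spans 2^n levels, which is 20·log10(2^n) ≈ 6 dB per bit, so 16 bits give about 96 dB (8 bits would give only about 48 dB), and 74 minutes of 16-bit stereo at 44.1 kHz come to about 747 MB taking 1 MB = 1024 x 1024 bytes. The second illustrates "predict what's next, then send only the corrections" with the simplest possible predictor, "the next sample equals the previous one". This is plain lossless delta coding chosen for illustration; real speech coders such as LPC-10 predict from several past samples with a model of the vocal tract and are lossy. The sample values are made up.

```python
import math

# Part 1: CD arithmetic from the "Compact Disc (CD) Recording" slide.
def dynamic_range_db(bits):
    """Ratio of the largest to the smallest step of an n-bit sample, in dB."""
    return 20 * math.log10(2 ** bits)

cd_bytes = 74 * 60 * 44100 * 2 * 2            # 74 min, 2 channels, 2 bytes/sample
print(round(dynamic_range_db(16), 1))         # 96.3
print(round(cd_bytes / 1024**2))              # 747

# Part 2: predictive coding with a first-order "previous sample" predictor.
def predictive_encode(samples):
    residuals, prediction = [], 0
    for s in samples:
        residuals.append(s - prediction)      # send only the correction
        prediction = s                        # predict the next sample from this one
    return residuals

def predictive_decode(residuals):
    samples, prediction = [], 0
    for r in residuals:
        s = prediction + r                    # prediction + correction = sample
        samples.append(s)
        prediction = s
    return samples

wave = [100, 104, 107, 109, 110, 109, 107]    # made-up, slowly varying samples
res = predictive_encode(wave)
print(res)                                    # [100, 4, 3, 2, 1, -1, -2]
assert predictive_decode(res) == wave
```

After the first sample, the corrections are small numbers, which need fewer bits than the samples themselves; that is where the savings come from.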
Narrated PowerPoint
• Create your slides using PowerPoint
• Slide Show > Record Narration
  – Set the microphone level
• Record the narration
  – Slide transitions are automatically captured
• Narration plays automatically when the slides are displayed
  – Slide flipping and narration stay synchronized

Adding Video to PowerPoint
• Insert > Movies and Sounds
  – Movie from file (a .mpg file)
• Decide whether you want "autostart"
  – If not, it starts when you click on it

The "Last Mile": bandwidth to your desk
• Traditional modems
  – "56" kb/sec modems really move data at ~3 kB/sec
  – Theoretical maximum: 56 kb/sec
• Digital Subscriber Lines (DSL)
  – 384 kb/sec downloads (~38 kB/sec)
  – 128 kb/sec uploads (~12 kB/sec)
• Cable modems
  – 10 Mb/sec downloads (~1 MB/sec)
  – 256 kb/sec uploads (~25 kB/sec)

Multimedia on a Web Server
[Figure: Web browser, Web server, media player]
• Object stored in a file
• File transferred as an HTTP object:
  – Received entirely at the client
  – Passed to the media player for playback
  – This is inefficient: playback cannot start until the whole file has downloaded, and downloading is slow

Streaming
[Figure: Web browser, Web server, media player (can be downloaded and installed), streaming server, buffering]
• Browser gets a small metafile describing the media over HTTP
  – Launches the media player, passing it that metafile
• Media player contacts the streaming server, which streams the media to it

Streaming Audio and Video
• Begin to play after only a portion has been received
• Buffer provides time to recover lost packets
• Playback is interrupted when "rebuffering"
• Data is not saved to the hard drive
[Figure: media server → Internet → buffer → player]

Lost Packets (IP Phone)
• Network loss
  – Packets completely lost (e.g., dropped by congested routers, or due to collisions)
• Delay loss
  – Packets arrive too late for playout
    • Due to: queueing; sender and receiver processing delays
  – IP phone: typical maximum tolerable delay: 400 ms
• Loss tolerance
  – 1% to 10% packet loss may be tolerable
    • Some encoding schemes are more tolerant than others

Multiple Client Rates
[Figure: the same video at 1.5 Mbps encoding and at 28.8 Kbps encoding]
Q: How to handle different client receiving rate capabilities?
  – 28.8 Kbps dialup
  – 128 Kbps to 3 Mbps (residential DSL service)
  – 100 Mbps Ethernet
A: The server stores and transmits multiple copies of the video, encoded at different rates, for different users

Synchronizing Multiple Media
• Scripting languages for synchronizing multiple media:
  – Synchronized Multimedia Integration Language (SMIL)
• Custom applications for this:
  – Macromedia Flash
• Content representation standards for this:
  – MPEG-4

SMIL
• Synchronized Multimedia Integration Language
• Integration of multimedia with text, audio, video
• Supported in RealPlayer
Slide from http://www.umiacs.umd.edu/~jimmylin/LBSC690-2007-Spring/content.html (Session 5)

SMILe
• Follows a W3C standard
  – Player-specific extensions are common
  – RealPlayer implements SMIL (or SMILe)
• It is XML, with a structure similar to HTML:
    <smil>
      <head> … </head>
      <body> … </body>
    </smil>

Elements in SMIL
• Window controls (in <head>)
  – Controlling layout: <region>, <root-layout>
• Timeline controls (in <body>)
  – Sequence control: <seq>, <excl>, <par>
  – Timing control (attributes, not elements): begin, end, dur
• Content types (in <body>)
  – <audio>, <video>, <img>, <ref>

SMIL Examples
• Implemented in RealOne Player
• You need to install RealOne Player (or RealPlayer) to run the following examples
• Demo: http://www.csc.lsu.edu/~wuyj/Teaching/7008/fa14/SMIL-demo/index.htm
There are 3 sets of executable and text files; at least run and read the last set:
  – First, run smildemo.smil (executable)
  – Then, view the smildemo.smil (XML) file
    • Question: can you make sense of smildemo.smil?
  – You are welcome to play with the first 2 sets.
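How <seq> and <par> lay media out on a timeline can be illustrated with a small Python sketch built on the standard-library xml.etree.ElementTree parser. It is a simplification, not the full W3C SMIL timing model: it assumes every media element carries an explicit dur written in seconds (such as "3s"), and it ignores begin, end, fill, <excl>, and events. In real SMIL a media element without dur takes its duration from the media itself. The file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified SMIL timing sketch (assumption: every media element has dur="Ns";
# begin/end/fill/<excl>/events are ignored).

def parse_seconds(value):
    return float(value.rstrip("s"))

def schedule(elem, start=0.0, out=None):
    """Return (end_time, list of (tag, src, start, end)) for elem and its children."""
    if out is None:
        out = []
    if elem.tag == "seq":                  # children play one after another
        t = start
        for child in elem:
            t, _ = schedule(child, t, out)
        return t, out
    if elem.tag == "par":                  # children start together; group ends with the last one
        end = start
        for child in elem:
            child_end, _ = schedule(child, start, out)
            end = max(end, child_end)
        return end, out
    end = start + parse_seconds(elem.get("dur", "0s"))   # a media element
    out.append((elem.tag, elem.get("src"), start, end))
    return end, out

body = ET.fromstring("""
<body>
  <seq>
    <img src="title.jpg" dur="3s"/>
    <par>
      <audio src="narration.rm" dur="10s"/>
      <img src="slide1.jpg" dur="4s"/>
    </par>
  </seq>
</body>
""")

end, timeline = schedule(body[0])
for item in timeline:
    print(item)
print("total:", end, "s")                  # total: 13.0 s
```

The title image plays from 0 to 3 s; the <par> then starts the narration and the slide image together at 3 s, and the whole sequence ends when the longer child (the narration) ends at 13 s.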
SMIL Example
<smil>
  <head>
    <meta name="title" content="Online Teaching Services promo" />
    <meta name="author" content="Jay Moonah, CAT" />
    <layout type="text/smil-basic-layout">
      <root-layout width="280" height="316" background-color="white"/>
      <region id="AnimChannel1" title="AnimChannel1" left="0" top="0" height="265" width="280" fit="hidden"/>
      <!-- Region "cc" is used by <text> below but was not declared; its size here is an assumption (the 51 px left under the animation). -->
      <region id="cc" title="cc" left="0" top="265" height="51" width="280"/>
    </layout>
  </head>
  <body>
    <par title="Online Teaching Services promo" author="Jay Moonah, CAT">
      <audio src="final.rm" id="Soundtrack" title="Soundtrack"/>
      <animation src="otscompfin.swf" id="Animation" region="AnimChannel1" title="Animation" fill="freeze"/>
      <text src="cc.rt" id="caption" region="cc" title="cc" fill="freeze"/>
    </par>
  </body>
</smil>
Slide from http://www.umiacs.umd.edu/~jimmylin/LBSC690-2007-Spring/content.html (Session 5)
Synchronizing audio, animation, and text.

From Media to Multimedia…
• Tricking the human senses:
  – Blending pixels into a seamless image
  – Rapidly cycling through images to create motion (video)
  – Sampling analog waveforms to create digital recordings
• Lots of information is required to encode images, movies, and sounds
  – Result: bulky digital files, not handy for online distribution and access
  – The key is compression!
• Synchronization of different media sources leads to multimedia applications

Discussion Point: When is Lossless Compression Important?
• For images?
• For text?
• For sound?
• For video?
Please come to Moodle to discuss.
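The difference between the two kinds of compression behind this discussion point can be seen in a short Python sketch. The lossless half uses the standard-library zlib module (DEFLATE, which combines LZ77-style matching with Huffman coding) and gets back exactly the bytes that went in. The lossy half uses simple quantization, a hypothetical stand-in for what lossy codecs do (it is not JPEG or MP3): rounding samples to a coarser step saves bits but cannot be undone.

```python
import zlib

# Lossless: zlib (DEFLATE) returns exactly the bytes that went in.
text = ("The rain in Spain falls mainly in the plain. " * 20).encode("utf-8")
packed = zlib.compress(text)
assert zlib.decompress(packed) == text
print(len(text), "->", len(packed), "bytes")

# Lossy (illustrative stand-in): quantizing samples to a coarser step
# discards detail for good.
def quantize(samples, step):
    return [round(s / step) * step for s in samples]

samples = [100, 104, 107, 109, 110]
coarse = quantize(samples, 4)
print(coarse)    # [100, 104, 108, 108, 112]
```

Nothing in coarse tells you that the original third sample was 107 rather than 108. Whether that loss matters is exactly the question to weigh for text, images, sound, and video.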