Chapter 4 Data Formats

4.1 a. (BL1) Values are given in hexadecimal.

Char  ASCII  EBCDIC    Char  ASCII  EBCDIC    Char  ASCII  EBCDIC
A     41     C1        a     61     81        0     30     F0
B     42     C2        b     62     82        1     31     F1
C     43     C3        c     63     83        2     32     F2
D     44     C4        d     64     84        3     33     F3
E     45     C5        e     65     85        4     34     F4
F     46     C6        f     66     86        5     35     F5
G     47     C7        g     67     87        6     36     F6
H     48     C8        h     68     88        7     37     F7
I     49     C9        i     69     89        8     38     F8
J     4A     D1        j     6A     91        9     39     F9
K     4B     D2        k     6B     92
L     4C     D3        l     6C     93
M     4D     D4        m     6D     94
N     4E     D5        n     6E     95
O     4F     D6        o     6F     96
P     50     D7        p     70     97
Q     51     D8        q     71     98
R     52     D9        r     72     99
S     53     E2        s     73     A2
T     54     E3        t     74     A3
U     55     E4        u     75     A4
V     56     E5        v     76     A5
W     57     E6        w     77     A6
X     58     E7        x     78     A7
Y     59     E8        y     79     A8
Z     5A     E9        z     7A     A9

b. (BL1+) Numeric characters can be converted into numeric values by stripping off, or subtracting, the first hexadecimal digit: 3 in the case of ASCII, F in EBCDIC. Thus, the numeric value is the ASCII character value minus 30 hex (decimal 48), or the EBCDIC character value minus F0 hex (decimal 240).

c. (BL1+) In ASCII, the lower-case letters can be converted to capitals by subtracting 20 hex, or 32 decimal, from the character value. In EBCDIC, the conversion is done by adding 40 hex, or 64 decimal, to the character value.

d. (BL1+) The method is the same, but the constant to be added or subtracted is different. Note that the lower-case-to-capital letter conversion is an addition in one case and a subtraction in the other.

4.2 a. (BL1+)
binary:      0101101 0110011 0101110 0110001 0110100 0110001 0110101
hexadecimal: 2D 33 2E 31 34 31 35
octal:       055 063 056 061 064 061 065
decimal:     45 51 46 49 52 49 53

b. (BL1+)
hexadecimal: 4E F1 6B F2 F5 F0 4B F1

4.3 (BL2-) Converting the code to hexadecimal, the message reads 54 68 69 73 20 69 73 20 45 41 53 59 21. Reading from the table on page 65 of the textbook, the message is: This is EASY!

4.4 (BL2) Reading from the table in Figure E4.2, the message reads: MICKEY MOUSE *LOVES* MINNIE, 5000 KISSES

4.5 a. (BL2-) X *

b. (BL2+) The code is self-delimiting. Each combination of 1's and 0's is unique, so that a sliding pattern matcher can identify each code.
A dropped bit during transmission would make it possible to "lose sync". Suppose the leading 0 in the character 01011 is dropped. If this character is followed by another character that starts with a 0, the system will read the character as 10110, and this error can propagate. If a bit gets switched, it is possible to confuse two characters. For example, if the character code 01101 becomes 01100, it is impossible to distinguish this character from 01110. This error is self-correcting in the sense that the next character can still be correctly identified.

4.6 (BL2) The answer depends upon the system that the student is using, of course, but with most students working on ASCII-based systems, the expected integer values would be ORD(A) = 65, ORD(B) = 66, ORD(c) = 99, results taken from the ASCII table. If the student switches to an EBCDIC-based system, the results would change.

4.7 (BL2) This is a programming problem. It can be solved most easily with a table. An algebraic approach is more difficult because the EBCDIC alphabetic values are split into ranges differently than the ASCII codes.

4.8 (BL2-3) The answer to this question is individual to each student.

4.9 (BL3) Most systems assume that a numerical input value ends at "white space": a space, tab, or carriage return. When a character input variable follows a numerical value, the system treats the character immediately following the numerical value as the desired character. In the example given here, the next character, and the one found in charval, will be the carriage return.

4.10 (BL2) Most modern languages define the internal values of enumerated, or user-defined, data types numerically, starting with 0. The order of the values is defined in the type statement. The ORD value of TODAY will range from 0 to 6 for this example, with ORD(MON) = 0.

4.11 (BL3) This is a programming problem.
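One way to approach Problem 4.11 is sketched below in Python. It assumes the task is to turn an integer value into its decimal character string without relying on a built-in conversion routine. The function name int_to_chars is hypothetical, and the constant 48 (30 hex) is the ASCII offset of the digit characters noted in Problem 4.1(b); an EBCDIC system would use 240 instead.

```python
def int_to_chars(n):
    """Return the decimal character string for the integer n (ASCII assumed)."""
    if n == 0:
        return "0"
    negative = n < 0
    n = abs(n)
    chars = []
    while n > 0:
        n, digit = divmod(n, 10)       # peel off the least significant digit
        chars.append(chr(digit + 48))  # 48 = 30 hex, the ASCII code for '0'
    if negative:
        chars.append("-")
    # Digits were collected from least to most significant, so reverse them.
    return "".join(reversed(chars))
```

The digits come out in reverse order because each division by 10 exposes the least significant digit first, which is why the list is reversed before joining.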
The conversion requires that the program break the number down into its individual digits and then convert each digit to a character.

4.12 (BL1+) Each ASCII character requires one byte. If one assumes that a typical page of text holds roughly 2000 characters, then the CD-ROM can hold about 650 MB / 2 KB, or 325,000 pages. Stored as 16-bit Unicode, the same disc would hold half as many pages, but with multilingual capability. Of course, this result does not include space used for illustrations and the like. It also does not include fonts and graphic formatting used for WYSIWYG display, which would require some space. Nonetheless, the result gives a good idea of the incredible capacity of CD-ROMs.

4.13, 4.14, 4.15 These are project problems.

4.16 (BL2) COBOL defines both numeric characters and numbers using PICTURE statements. PIC (X) values may only be used as characters. PIC (9) values are stored as numeric characters, but may be used in arithmetic operations. COBOL programs convert PIC (9) values internally from character to numerical form when performing arithmetic, and back when the operation is complete.

4.17 (BL2+)
/wedge { ...
} def                                          %define a procedure named "wedge" to draw a wedge
0 0 moveto                                     %set cursor at (x,y) = 0,0
setgray                                        %reads and sets the first argument as a gray level
/angle1 exch def /angle2 exch def              %read the two angle arguments and name them angle1 and angle2
0 0 144 angle1 angle2 arc                      %draws a wedge from angle1 to angle2 with radius 144 points
0 0 lineto                                     %draw the closing line
closepath
/Helvetica-Bold findfont 16 scalefont setfont  %set the font to 16 point Helvetica-Bold
.4 72 108 wedge fill                           %calls wedge procedure with gray set to .4 (dark gray), angle1 = 72, angle2 = 108
.8 108 360 wedge fill                          %calls wedge procedure with gray = .8
32 12 translate                                %move cursor to 32, 12
0 0 72 wedge fill                              %call wedge procedure with gray = 0 (black)
gsave                                          %save parameters of arc for later outline drawing
-8 8 translate                                 %move cursor up and to the left 8 points
1 0 72 wedge                                   %draw white wedge on top of black wedge shadow
0 setgray stroke                               %draw outline in black
grestore                                       %at same place as white wedge
0 setgray                                      %set gray level to black for text
144 144 moveto                                 %move cursor
(baseball cards) show                          %print text in font that was previously set
-30 200 moveto (cash) show                     %move cursor, print text
-216 108 moveto (stocks) show                  %move cursor, print text
32 scalefont                                   %set font size to 32 point
(Personal Assets) show                         %print text
showpage                                       %display the results

4.18 (BL2) 8-bit ASCII is placed in the Unicode table in such a way that the more significant eight bits are always 0. The less significant byte in Unicode is identical to the 8-bit ASCII code. The code 00000000 represents the NUL character, which is ignored by ASCII readers. Thus, the Unicode text would appear to be the same as the corresponding ASCII text, with every other character set to NUL.

4.19 (BL3) MPEG-2 is an algorithm designed to reduce the amount of data present in a video image. MPEG-2 uses a lossy compression algorithm that compresses the data both spatially and temporally. The spatial algorithms are similar to those used for JPEG images.
The main algorithm used is the Discrete Cosine Transform. This technique breaks the image into small square blocks of pixels and searches for redundancies within these blocks. Temporal compression is based on the concept that individual blocks will not normally change very much from frame to frame. This means that it is possible to store or transmit a complete image less frequently, with more rapid updating of just those blocks that have changed. Prediction of movement is also used to reduce the number of blocks that must be updated. More detail may be found at www.st.com/stonline/books/ and many other sites.

4.20 (BL2) MP3 uses a lossy compression algorithm; therefore, the original music cannot be recovered exactly. Subtleties are lost in the compression. CDs are not compressed, so the original quality is maintained (limited only by the digitization).

4.21 (BL3)
(a) PDF creates a page by storing objects of different types (text, bitmap images, object images, form boxes, and the like) in a dictionary and placing them on a page at different page coordinates. The page format uses a device-independent coordinate system for this purpose. The page as a whole can then be appropriately scaled for display or printing, regardless of the resolution of the device.

(b) In simplest terms, a PDF file consists of page descriptions for each page in a document. A page description specifies each object on a page, together with its location. Objects are stored in a table. Objects are identified by their location in the table, as specified by the offset of each object from the beginning of the table. Objects do not have to be stored in any particular sequence. Objects include a string of characters (not to exceed a single line on the page); a bitmap image; an object image; a link to a stream, such as a video stream; ... There is a general set of built-in fonts; in addition, special fonts can be embedded into the file when the page is created, if required.
To build a page for output, each object is located in the table and placed at its specified location on the page. The completed page is scaled as necessary and then presented for display or printing. Because the objects are stored in a separate table and accessed directly, pages may be retrieved rapidly in any order. In comparison with HTML, PDF presents a more accurate page representation. HTML describes a page relatively crudely, leaving it to the Web browser to complete the layout details. It does not consider such details as scaling for different environments or browser window sizes. Instead, it attempts to present a best, reasonable fit. PDF, on the other hand, creates its pages precisely, with objects located at exact, specified locations on the page, and scaled to the appropriate size for the display or print medium.

(c) Like other page components, fonts are managed as objects. There are fourteen built-in typefaces that represent the most commonly used document fonts. A PDF document reader will attempt to substitute one of these for a font that it doesn't recognize. PDF files can also contain embedded font objects. Embedded fonts are specific to a document and are included as part of the document description.

(d) Object images in PDF are represented as a collection of path components. Path components are lines, cubic Bezier curves, and font outline descriptions. Each component is described by its type and its parameters. There are also a number of patterns available for shading and filling objects. Bitmap images are rendered as raster images, then stored in one of a number of PDF-specific compressed binary formats that are designed for efficient document presentation. Common to these formats are a metadata description of the image, together with a "stream" containing the actual image data.

(e)
Although PDF was derived from PostScript, it differs in several important ways:
(1) Except for embedded objects such as binary image and audio objects, PostScript files are stored in alphanumeric form. This includes the PostScript commands, program statements, and descriptions of fonts and vector graphics as well as text. PDF files are stored in a "tokenized" binary form for compactness and efficiency.
(2) PDF document descriptions are based on a subset of PostScript. PostScript is a page description language, whereas PDF is strictly a page description file format. Therefore, features of PostScript that represent programming constructs, such as loops and decision statements, are not present in PDF. Also, the programming construct design of PostScript requires a sequential layout and interpretation of the PostScript code, making access to random pages in a document more difficult and slower.
(3) PDF provides capabilities that are not available in standard PostScript, including transparency and built-in extensions for 3D rendering.

(f) PDF limits the processing that a user can perform on a PDF document in a number of ways:
(1) Since the format is page-focused, edits that expand the document material in such a way as to move material from one page to the next require re-rendering of the document. Even text editing that moves text from one line to the next requires re-rendering, because text is stored as line-by-line objects.
(2) The PDF reader software prohibits a user from modifying a document, although it is usually possible for a reader to fill data into a form and sometimes possible to annotate and add margin notes to a page.
(3) Because PDF stores documents as a series of objects and pointers, it is difficult or almost impossible to cut or copy material from a document for use elsewhere. For example, special software is required to import a PDF document into a word processor.
Security features built into PDF also allow the creator of a document to limit the operations that can be performed by a reader.

4.22 (BL2)

Property: type of image
  GIF, PNG, JPEG: all are bitmap images, used for photos and digital artwork

Property: compression
  GIF:  lossless
  PNG:  lossless
  JPEG: lossy

Property: file size
  GIF:  largest
  PNG:  usually somewhat smaller than GIF, due to improved compression technology
  JPEG: smallest, due to use of lossy compression; depth of compression vs. quality is adjustable when the JPEG image is created

Property: color capability
  GIF:  limited to a 256-color palette selected from a 24-bit color space
  PNG:  up to 48-bit color; 16-bit gray level; or various color palettes
  JPEG: 24-bit color

Property: particular strengths and weaknesses
  JPEG: poor for drawings with text, line art, or sharp edges; excellent for photographs and artwork with lots of color variation and detail; multiple edits are difficult due to generation loss

Property: additional features
  GIF:  animation capability; 4-pass, 1-D interlacing allows rough early identification of image
  PNG:  variable transparency; gamma color correction; 7-pass, 2-D interlacing allows rough early identification of image, faster than GIF

Property: usage restrictions
  GIF:  none; formerly subject to patent licensing restrictions (patents expired in 2006)
  PNG:  none; open standard
  JPEG: probably none; attempts to claim patents on JPEG have been invalidated by various courts and the US Patent Office
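
Supplementary sketch for Problem 4.7: the table-driven approach suggested there might look like the following in Python. This is only an illustration under stated assumptions, not a required solution. The names build_ascii_to_ebcdic, ASCII_TO_EBCDIC, and ascii_to_ebcdic are hypothetical, and the table covers only the letters and digits listed in the answer to Problem 4.1(a).

```python
def build_ascii_to_ebcdic():
    """Build an ASCII-to-EBCDIC lookup table from the runs listed in 4.1(a)."""
    # Each entry: (first ASCII code, first EBCDIC code, number of characters).
    # EBCDIC splits each alphabet into three runs, which is why a table is
    # simpler than a single algebraic offset.
    runs = [
        (0x41, 0xC1, 9),   # A-I
        (0x4A, 0xD1, 9),   # J-R
        (0x53, 0xE2, 8),   # S-Z
        (0x61, 0x81, 9),   # a-i
        (0x6A, 0x91, 9),   # j-r
        (0x73, 0xA2, 8),   # s-z
        (0x30, 0xF0, 10),  # 0-9
    ]
    table = {}
    for ascii_start, ebcdic_start, count in runs:
        for i in range(count):
            table[ascii_start + i] = ebcdic_start + i
    return table


ASCII_TO_EBCDIC = build_ascii_to_ebcdic()


def ascii_to_ebcdic(text):
    """Translate letters and digits to EBCDIC; other characters raise KeyError."""
    return bytes(ASCII_TO_EBCDIC[ord(ch)] for ch in text)
```

For example, ascii_to_ebcdic("Az9") yields the bytes C1 A9 F9, matching the table in 4.1(a). Punctuation and spaces are deliberately left out of the table because their EBCDIC codes are not listed in that answer.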