Chapter 4
Data Formats
4.1
a.
(BL1) Values are given in hexadecimal.

Char  ASCII  EBCDIC     Char  ASCII  EBCDIC     Char  ASCII  EBCDIC
A     41     C1         a     61     81         0     30     F0
B     42     C2         b     62     82         1     31     F1
C     43     C3         c     63     83         2     32     F2
D     44     C4         d     64     84         3     33     F3
E     45     C5         e     65     85         4     34     F4
F     46     C6         f     66     86         5     35     F5
G     47     C7         g     67     87         6     36     F6
H     48     C8         h     68     88         7     37     F7
I     49     C9         i     69     89         8     38     F8
J     4A     D1         j     6A     91         9     39     F9
K     4B     D2         k     6B     92
L     4C     D3         l     6C     93
M     4D     D4         m     6D     94
N     4E     D5         n     6E     95
O     4F     D6         o     6F     96
P     50     D7         p     70     97
Q     51     D8         q     71     98
R     52     D9         r     72     99
S     53     E2         s     73     A2
T     54     E3         t     74     A3
U     55     E4         u     75     A4
V     56     E5         v     76     A5
W     57     E6         w     77     A6
X     58     E7         x     78     A7
Y     59     E8         y     79     A8
Z     5A     E9         z     7A     A9
b.
(BL1+) Numeric characters can be converted into numeric values by stripping off or
subtracting the first hexadecimal digit, 3 in the case of ASCII, F in EBCDIC. Thus, the numeric
value is the ASCII character value minus 30 hex (decimal 48), or the EBCDIC character value
minus F0 hex (decimal 240).
c.
(BL1+) In ASCII, the lower-case letters can be converted to capitals by subtracting 20
hex, or 32 decimal from the character value. In EBCDIC, the conversion is done by adding 40
hex, or 64 decimal to the character value.
d.
(BL1+) The method is the same, but the constant to be added or subtracted is different.
Note that the lower-case-to-capital letter conversion is an addition in one case, a subtraction in
the other.
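The conversions in parts b through d can be sketched in a few lines of Python. This is only an illustration: the function and constant names are invented here, and the code works on raw character codes rather than on any particular system's character type.

```python
# Constants read off the ASCII/EBCDIC table in part a (hexadecimal values).
ASCII_DIGIT_ZONE = 0x30    # code for '0' in ASCII
EBCDIC_DIGIT_ZONE = 0xF0   # code for '0' in EBCDIC
ASCII_CASE_OFFSET = 0x20   # subtracted to go from lower case to capitals
EBCDIC_CASE_OFFSET = 0x40  # added to go from lower case to capitals


def digit_value(code: int, ebcdic: bool = False) -> int:
    """Convert a numeric character code to its numeric value."""
    zone = EBCDIC_DIGIT_ZONE if ebcdic else ASCII_DIGIT_ZONE
    return code - zone


def to_capital(code: int, ebcdic: bool = False) -> int:
    """Convert a lower-case letter code to the matching capital letter code."""
    if ebcdic:
        return code + EBCDIC_CASE_OFFSET
    return code - ASCII_CASE_OFFSET


print(digit_value(0x37))                    # ASCII '7'
print(digit_value(0xF7, ebcdic=True))       # EBCDIC '7'
print(hex(to_capital(0x6A)))                # ASCII 'j' to 'J'
print(hex(to_capital(0x91, ebcdic=True)))   # EBCDIC 'j' to 'J'
```

The same two functions serve both codes; only the constant and, for the case conversion, the direction of the arithmetic change.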
4.2
a.
(BL1+) binary: 0101101 0110011 0101110 0110001 0110100 0110001 0110101
hexadecimal: 2D 33 2E 31 34 31 35
octal: 055 063 056 061 064 061 065
decimal: 45 51 46 49 52 49 53
b.
(BL1+) hexadecimal: 4E F1 6B F2 F5 F0 4B F1
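Answers of this kind can be checked mechanically. The sketch below assumes the two strings being encoded are "-3.1415" and "+1,250.1", which is what the hexadecimal codes in the answers decode to, and it uses Python's cp037 codec as a stand-in for EBCDIC. The function names are illustrative only.

```python
def ascii_representations(text: str) -> dict:
    """Return the 7-bit ASCII codes of text in four number bases."""
    codes = text.encode("ascii")
    return {
        "binary": " ".join(f"{c:07b}" for c in codes),
        "hexadecimal": " ".join(f"{c:02X}" for c in codes),
        "octal": " ".join(f"{c:03o}" for c in codes),
        "decimal": " ".join(str(c) for c in codes),
    }


def ebcdic_hex(text: str) -> str:
    """Return the EBCDIC (code page 037) codes of text in hexadecimal."""
    return " ".join(f"{c:02X}" for c in text.encode("cp037"))


for base, digits in ascii_representations("-3.1415").items():
    print(f"{base}: {digits}")
print("EBCDIC hexadecimal:", ebcdic_hex("+1,250.1"))
```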
4.3
(BL2-) Converting the code to hexadecimal, the message reads
54 68 69 73 20 69 73 20 45 41 53 59 21.
Now reading from the table on page 65 of the textbook, the message is
This is EASY!
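The table lookup can be confirmed with a short Python check. The variable names are illustrative, and the hexadecimal string is copied from the answer above.

```python
# Hexadecimal codes taken from the answer; bytes.fromhex skips the spaces.
hex_message = "54 68 69 73 20 69 73 20 45 41 53 59 21"

# Interpret each byte as an ASCII character code.
decoded = bytes.fromhex(hex_message).decode("ascii")
print(decoded)
```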
4.4
(BL2) Reading from the table in Figure E4.2, the message reads
MICKEY MOUSE *LOVES* MINNIE, 5000 KISSES
4.5
a.
(BL2-) X * 
b.
(BL2+) The code is self-delimiting. Each combination of 1's and 0's is unique, so that a
sliding pattern matcher can identify each code. A dropped bit during transmission would make it
possible to "lose sync". Suppose the leading 0 in the character 01011 is dropped. If this character
is followed by another character that starts with a 0, the system will read the code as the
character 10110, and this error can propagate. If a bit gets switched, it is possible to confuse
two characters. For example, if the character code 01101 becomes 01100, it is impossible to
distinguish this character from 01110. This error is self-correcting in the sense that the next
character can still be correctly identified.
4.6 (BL2) The answer depends upon the system that the student is using, of course, but with most
students working on ASCII-based systems, the expected integer values would be ORD(A) = 65,
ORD(B) = 66, ORD(c) = 99, results taken from the ASCII table. If the student switches to an
EBCDIC-based system, the results would change.
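For instructors who want to show both sets of results side by side, here is a small Python sketch. It assumes Python's cp037 codec as a representative EBCDIC code page; the helper names are invented for this illustration.

```python
def ascii_ord(ch: str) -> int:
    """Character code on an ASCII-based system."""
    return ch.encode("ascii")[0]


def ebcdic_ord(ch: str) -> int:
    """Character code on an EBCDIC-based system (code page 037 assumed)."""
    return ch.encode("cp037")[0]


for ch in "ABc":
    print(ch, ascii_ord(ch), ebcdic_ord(ch))
```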
4.7
(BL2) This is a programming problem. It can be solved most easily with a table. An algebraic
approach is more difficult because the EBCDIC alphabetic values are split up differently than the
ASCII codes.
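One possible shape for a table-driven solution is sketched below in Python. It assumes the conversion runs from ASCII to EBCDIC and covers only the letters and digits listed in the table for problem 4.1; the names are hypothetical, and a student solution in another language would follow the same idea.

```python
def build_ascii_to_ebcdic() -> dict:
    """Build a lookup table from the contiguous runs in the 4.1 table."""
    # Each entry: (first ASCII code, first EBCDIC code, number of characters).
    runs = [
        (0x41, 0xC1, 9), (0x4A, 0xD1, 9), (0x53, 0xE2, 8),   # A-I, J-R, S-Z
        (0x61, 0x81, 9), (0x6A, 0x91, 9), (0x73, 0xA2, 8),   # a-i, j-r, s-z
        (0x30, 0xF0, 10),                                     # 0-9
    ]
    table = {}
    for ascii_start, ebcdic_start, count in runs:
        for i in range(count):
            table[ascii_start + i] = ebcdic_start + i
    return table


ASCII_TO_EBCDIC = build_ascii_to_ebcdic()


def ascii_to_ebcdic(codes: bytes) -> bytes:
    """Translate letters and digits; any other character raises KeyError."""
    return bytes(ASCII_TO_EBCDIC[c] for c in codes)


print(ascii_to_ebcdic(b"Az09").hex(" ").upper())
```

The three separate runs per alphabet show why a single algebraic offset does not work for EBCDIC: the letters are split into groups with gaps between them.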
4.8
(BL2-3) The answer to this question is individual to each student.
4.9
(BL3) Most systems assume that a numerical value input ends at a "white space", a space, tab, or
carriage return. When a character input variable follows a numerical value, the system treats the
character following the numerical value as the desired character. In the example given here, the
next character, and the one found in charval, will be the carriage return.
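The behavior can be imitated with a small Python simulation. The keystrokes, the variable name numval, and both helper functions are assumptions made for this sketch (only charval comes from the problem), and "\n" stands in for the carriage return.

```python
def read_number(buffer: str, pos: int) -> tuple[int, int]:
    """Skip leading white space, read digits, and stop AT the first non-digit."""
    while pos < len(buffer) and buffer[pos].isspace():
        pos += 1
    start = pos
    while pos < len(buffer) and buffer[pos].isdigit():
        pos += 1
    return int(buffer[start:pos]), pos


def read_char(buffer: str, pos: int) -> tuple[str, int]:
    """Read exactly one character, white space included."""
    return buffer[pos], pos + 1


typed = "42\nX\n"   # hypothetical keystrokes: 42, Return, X, Return
numval, pos = read_number(typed, 0)
charval, pos = read_char(typed, pos)
print(numval, repr(charval))
```

The numeric read leaves the terminating white space in the buffer, so the character read picks it up instead of the X the user intended.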
4.10 (BL2) Most modern languages define the internal values of enumerated, or user-defined, data types
numerically, starting with 0. The order of the values is defined in the type statement. The ORD
value of TODAY will range from 0 to 6 for this example, with ORD (MON) = 0.
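A rough Python analogue uses IntEnum. Only MON = 0 and the range 0 to 6 come from the answer above; the remaining member names and their order are assumed for illustration.

```python
from enum import IntEnum


class Day(IntEnum):
    """Enumerated type whose internal values follow declaration order."""
    MON = 0
    TUE = 1
    WED = 2
    THU = 3
    FRI = 4
    SAT = 5
    SUN = 6


today = Day.WED
print(today.name, int(today))   # int() plays the role of ORD here
print([int(d) for d in Day])
```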
4.11 (BL3) This is a programming problem. The conversion requires that the program break the number
down into the values of its individual digits, then convert each digit to a character.
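A minimal Python sketch of the digit-by-digit approach follows. It assumes integer input and ASCII output; the function name is invented for this example.

```python
def int_to_text(n: int) -> str:
    """Convert an integer to its character form one digit at a time."""
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    chars = []
    while n > 0:
        n, digit = divmod(n, 10)           # peel off the low-order digit
        chars.append(chr(0x30 + digit))    # digit value plus ASCII code for '0'
    return sign + "".join(reversed(chars)) # digits were produced low to high


print(int_to_text(1250))
print(int_to_text(-37))
```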
4.12 (BL1+) Each ASCII character requires one byte. If one assumes that a typical page of text holds,
say roughly 2000 characters, then the CD-ROM can hold about
650 MB/2 KB or 325,000 pages.
Sixteen-bit Unicode, at two bytes per character, would store half as many pages, but with multilingual capability. Of course, this result does not
include space used for illustrations and the like. It also does not include fonts and graphic
formatting used for WYSIWYG display, which would require some space. Nonetheless, the result
gives a good idea of the incredible capacity of CD-ROMs.
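The arithmetic can be laid out as a short Python calculation. It assumes decimal units (650 MB = 650,000,000 bytes), 2000 characters per page, and two bytes per character for Unicode; the names are illustrative.

```python
CD_CAPACITY_BYTES = 650_000_000   # assumed: 650 MB in decimal units
CHARS_PER_PAGE = 2000             # rough page size used in the answer


def pages_on_cd(bytes_per_char: int) -> int:
    """Whole pages of plain text that fit on the disc."""
    return CD_CAPACITY_BYTES // (CHARS_PER_PAGE * bytes_per_char)


print("ASCII pages:", pages_on_cd(1))
print("16-bit Unicode pages:", pages_on_cd(2))
```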
4.13, 4.14, 4.15 These are project problems.
4.16 (BL2) COBOL defines both numeric characters and numbers using PICTURE statements. PIC (X)
values may only be used as characters. PIC (9) values are stored as numeric characters, but may be
used in arithmetic operations. COBOL programs convert PIC (9) values internally from character to
numerical form when performing arithmetic, and back when the operation is complete.
4.17 (BL2+) /wedge { ... } def %define a procedure named "wedge" to draw a wedge
0 0 moveto %set cursor at (x,y) = 0,0
setgray %reads and sets the first argument as a gray level
/angle1 exch def
/angle2 exch def %read the two angle arguments and name them angle1 and angle2
0 0 144 angle1 angle2 arc %draws a wedge from angle1 to angle2 with radius 144 points
0 0 lineto %draw the closing line
closepath
/Helvetica-Bold findfont
16 scalefont
setfont %set the font to 16 point Helvetica-Bold
.4 72 108 wedge fill %calls wedge procedure with gray set to .4 (dark gray), angle1 = 72, angle2 = 108
.8 108 360 wedge fill %calls wedge procedure with gray = .8
32 12 translate %move cursor to 32, 12
0 0 72 wedge fill %call wedge procedure with gray = 0 (black)
gsave %save parameters of arc for later outline drawing
-8 8 translate %move cursor up and to the left 8 points
1 0 72 wedge %draw white wedge on top of black wedge shadow
0 setgray stroke %draw outline in black
grestore %at same place as white wedge
0 setgray %set gray level to black for text
144 144 moveto %move cursor
(baseball cards) show %print text in font that was previously set
-30 200 moveto (cash) show %move cursor, print text
-216 108 moveto (stocks) show %move cursor, print text
32 scalefont %set font size to 32 point
(Personal Assets) show %print text
showpage %display the results
4.18 (BL2) 8-bit ASCII is placed in the Unicode table in such a way that the more significant eight bits
are always 0. The less significant byte in Unicode is identical to the 8-bit ASCII code. The ASCII
code 00000000 is represented by the NUL character, which is ignored by ASCII readers. Thus, the
Unicode would appear to be the same as the corresponding ASCII text, with every other character
set to NUL.
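This can be demonstrated in Python. The sketch assumes 16-bit, big-endian Unicode with no byte order mark (Python's utf-16-be codec), and the sample string is arbitrary.

```python
text = "Data"                              # arbitrary sample string
ascii_bytes = text.encode("ascii")
unicode_bytes = text.encode("utf-16-be")   # 16 bits per character, high byte first

print(ascii_bytes)
print(unicode_bytes)

# Split the Unicode bytes into the high-order and low-order byte of each character.
high_bytes = unicode_bytes[0::2]
low_bytes = unicode_bytes[1::2]
print(high_bytes, low_bytes)
```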
4.19 (BL3) MPEG-2 is an algorithm designed to reduce the amount of data present in a video image.
MPEG-2 uses a lossy compression algorithm that compresses the data both spatially and
temporally. The spatial algorithms are similar to those used for JPEG images. The main algorithm
used is the Discrete Cosine Transform. This technique breaks the image into small square blocks of
pixels, and searches for redundancies within these blocks. Temporal compression is based on the
concept that individual blocks will not normally change very much from frame to frame. This
means that it is possible to store or transmit an image less frequently, with more rapid updating of
just those blocks that have changed. Prediction of movement is also used to reduce the number of
blocks that must be updated. More detail may be found at www.st.com/stonline/books/ and many
other sites.
4.20 (BL2) MP3 uses a lossy compression algorithm; therefore, the original music cannot be recovered
exactly. Subtleties are lost in the compression. CDs are not compressed, so the original quality is
maintained (limited only by the digitization).
4.21. (BL 3) (a) PDF creates a page by storing objects of different types (text, bitmap images, object
images, form boxes, and the like) in a dictionary and placing them on a page at different page
coordinates. The page format uses a device-independent coordinate system for this purpose. The
page as a whole can then be appropriately scaled for display or printing, regardless of the resolution
of the device.
(b) In simplest terms, a PDF file consists of page descriptions for each page in a document. A page
description specifies each object on a page, together with its location. Objects are stored in a table.
Objects are identified by their location in the table, as specified by the offset of each object from
the beginning of the table. Objects do not have to be stored in any particular sequence. Objects
include a string of characters, not to exceed a single line on the page, a bitmap image, an object
image, a link to a stream, such as a video stream, ... There is a general set of built-in fonts; in
addition, special fonts can be embedded into the file when the page is created, if required.
To build a page for output, each object is located in the table, and placed at its specified location on
the page. The completed page is scaled as necessary and then presented for display or printing.
Because the objects are stored in a separate table and accessed directly, pages may be retrieved
rapidly in any order.
In comparison with HTML, PDF presents a more accurate page representation. HTML describes a
page relatively crudely, leaving it to the Web browser to complete the layout details. It does not
consider such details as scaling for different environments or browser window sizes. Instead, it
attempts to present a best, reasonable fit. PDF, on the other hand, creates its pages precisely, with
objects located at exact, specified locations on the page, and scaled to appropriate size for the
display or print medium.
(c.) Like other page components, fonts are managed as objects. There are fourteen built-in
typefaces that represent the most commonly used document fonts. A PDF document reader will
attempt to substitute one of these for a font that it doesn't recognize. PDF files can also contain
embedded font objects. Embedded fonts are specific to a document, specifically included as part of
the document description.
(d.) Object images in PDF are represented as a collection of path components. Path components
are lines, cubic Bezier curves, and font outline descriptions. Each component is described by its
type and its parameters. There are also a number of patterns available for shading and filling
objects. Bitmap images are rendered as raster images, then stored in one of a number of
PDF-specific compressed binary formats that are designed for efficient document presentation. Common
to these formats are a metadata description of the image, together with a "stream" containing the
actual image data.
(e.) Although PDF was derived from Postscript, it differs in several important ways:
(1) Except for embedded objects such as binary image and audio objects, Postscript files are stored
in alphanumeric form. This includes the postscript commands, program statements, and
descriptions of fonts and vector graphics as well as text. PDF files are stored in a "tokenized"
binary form for compactness and efficiency.
(2) PDF document descriptions are based on a subset of Postscript. Postscript is a page description
language, whereas PDF is strictly a page description file format. Therefore, features of Postscript
that represent programming constructs, such as loops and decision statements are not present in
PDF. Also, the programming construct design of Postscript requires a sequential layout and
interpretation of the Postscript code, making access to random pages in a document more difficult
and slower.
(3) PDF provides capabilities that are not available in standard Postscript, including transparency,
and built-in extensions for 3D rendering.
(f.) PDF limits the processing that a user can perform on a PDF document in a number of
ways:
(1) Since the format is page-focused, edits that expand the document material in such a way as to
move material from one page to the next require re-rendering of the document. Even text editing
that moves text from one line to the next requires re-rendering, because text is stored as
line-by-line objects.
(2) The PDF reader software prohibits a user from modifying a document, although it is usually
possible for a reader to fill data into a form and sometimes possible to annotate and add margin
notes to a page.
(3) Because PDF stores documents as a series of objects and pointers, it is difficult or almost
impossible to cut or copy material from a document for use elsewhere. For example, special
software is required to import a PDF document into a word processor.
Security features built into PDF also allow the creator of a document to limit the operations that
can be performed by a reader.
4.22. (BL2)
Type of image
    GIF, PNG, JPEG: all are bitmap images, used for photos and digital artwork
Compression
    GIF: lossless
    PNG: lossless
    JPEG: lossy
File size
    GIF: largest
    PNG: usually somewhat smaller than GIF, due to improved compression technology
    JPEG: smallest, due to use of lossy compression; depth of compression vs. quality is adjustable when the JPEG image is created
Color capability
    GIF: limited to a 256-color palette selected from a 24-bit color space
    PNG: up to 48-bit color; 16-bit gray level; or various color palettes
    JPEG: 24-bit color
Particular strengths and weaknesses
    JPEG: poor for drawings with text, line art, or sharp edges; excellent for photographs and artwork with lots of color variation and detail; multiple edits are difficult due to generation loss
Additional features
    GIF: animation capability; 4-pass, 1-d interlacing allows rough early identification of image
    PNG: variable transparency; gamma color correction; 7-pass, 2-d interlacing allows rough early identification of image, faster than GIF
Usage restrictions
    GIF: none; formerly subject to patent licensing restrictions; patents expired in 2006
    PNG: none; open standard
    JPEG: probably none; attempts to claim patents on JPEG have been invalidated by various courts and the US Patent Office