IT 241
Information Discovery Fall 2014 Exam 1
Page 1
Tuesday, Sept. 30, 2014
Name _____________________________
[6 pts]
1. Below is one of the visualization pipelines from the text.
There are three stages (raw data, data tables, and visual structures) and three transformations (data/flow
transformations, visual mappings, view transformations/interact) in the above pipeline for a final
visualization. Match each stage or transformation to the form of the data and final visualization as listed
below.
Handling missing values _______________________
Data found in a relational database and separate CSV files _____________________________
Choosing which attribute will be color and which will be glyph size ____________________
Selecting a datapoint range in a parallel coordinates graph ______________________________
Data merged into a simpler CSV file ________________________
Displaying parallel coordinates or matrix of scatterplots _____________________
2. A graphic can be classified as an exploratory visualization, an explanatory visualization or an example of visual art.
Explain the differences among these three graphics types by giving an example of each and why it is one and not the
others.
[6 pts]
[18 pts]
3. Data coding.
a. The value 1001 0110 in binary is _______________ in decimal
and its corresponding hexadecimal digits are ____ ____.
Converting decimal 39 to binary becomes ______________ .
If the 8 bit ASCII codes in decimal for “A” and “a” are 65 and 97, respectively,
then the decimal ASCII codes for the string “Aced” are _____________________________________.
If we want to store 75 unique values, we would need at least _______ bits to represent those values.
b. A 1200 x 1000 pixel color image coded in RGB (+ 1 alpha byte) format requires ________________ bytes.
c. ______ (True/false) Compression techniques such as MP4 and GIF for images, audio and video are lossy,
meaning that compression reduces their size in a way that the original codings can be recovered.
d. A 60 second stereo sound clip (2 channels) sampled at 48000 samples per second with a 16 bit depth will
result in storing __________________ bytes.
e. What are the colors for these RGB hexadecimal encodings? (Choose from: white, black, grey, red, green,
yellow, blue, brown, dark blue, dark green.)
FFFFFF = __________________
888888 = __________________
FFFF00 = __________________
000044 = __________________
f. ______(True/false) A CSV file is editable in a text editor, exportable from Excel, and a general input format
for most visualization and data mining tools.
g. ______(True/false) An XML file is self-describing, editable in a text editor, supports complex data
organizations and is an input option for visualization and data mining tools.
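As a study aid for the data-coding arithmetic in question 3, the calculations can be sketched in Python. This sketch is not part of the original exam. The function names (`bits_needed`, `image_bytes`, `audio_bytes`, `hex_to_rgb`) are hypothetical, and the sketch assumes uncompressed storage with one byte per color channel.

```python
def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can label n_values distinct values."""
    return max(1, (n_values - 1).bit_length())


def image_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Uncompressed image size; 4 bytes per pixel assumes R, G, B and alpha."""
    return width * height * bytes_per_pixel


def audio_bytes(seconds: int, sample_rate: int, channels: int, bit_depth: int) -> int:
    """Uncompressed PCM audio size; bit_depth is assumed to be a multiple of 8."""
    return seconds * sample_rate * channels * (bit_depth // 8)


def hex_to_rgb(code: str) -> tuple:
    """Split a six-digit hex color such as 'FFFF00' into (red, green, blue)."""
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    value = int("10010110", 2)             # binary string to integer
    print(value, format(value, "02X"))     # 150 96
    print(format(39, "b"))                 # 100111
    print([ord(ch) for ch in "Aced"])      # [65, 99, 101, 100]
    print(bits_needed(75))                 # 7
    print(image_bytes(1200, 1000))         # 4800000
    print(audio_bytes(60, 48000, 2, 16))   # 11520000
    print(hex_to_rgb("FFFF00"))            # (255, 255, 0)
```

Equal red, green and blue components give a shade of grey, which is a quick way to read codes such as 888888 without running anything.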
4. Plot on the number line with small circles this set of 10 univariate numbers {2, 6, 8, 10, 16, 24, 28, 31, 45, 50} then
superimpose a Tukey box plot representing the median and 25th and 75th percentiles.
[10 pts]
┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼
0    5    10   15   20   25   30   35   40   45   50   55
What is the mean of these data? _______
If one standard deviation (roughly, the average distance from the mean), added to and subtracted from the mean, spans
the middle 70% of the data, estimate its value ________ .
If we normalize this data set to fall between 0 and 1, assuming the data range is 0 to 50, then the following three
values are recoded as:
6 -> ______, 24 -> _______, and 50 -> _______
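The summary statistics and the 0-to-1 recoding in question 4 can be sketched in Python as a study aid. This sketch is not part of the original exam, and the helper names `tukey_hinges` and `min_max_normalize` are hypothetical. Quartile conventions differ between textbooks and tools; the sketch uses Tukey's hinges (the medians of the lower and upper halves), which is only one of several common definitions.

```python
from statistics import median


def tukey_hinges(values):
    """Return (lower hinge, median, upper hinge) using Tukey's half-split rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # lower half, excluding the middle value if n is odd
    upper = data[(n + 1) // 2:]     # upper half, excluding the middle value if n is odd
    return median(lower), median(data), median(upper)


def min_max_normalize(x, lo, hi):
    """Rescale x from the assumed range [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)


if __name__ == "__main__":
    data = [2, 6, 8, 10, 16, 24, 28, 31, 45, 50]
    print(sum(data) / len(data))                              # 22.0
    print(tukey_hinges(data))                                 # (8, 20.0, 31)
    print([min_max_normalize(x, 0, 50) for x in (6, 24, 50)])  # [0.12, 0.48, 1.0]
```

Other quartile methods (for example, interpolated percentiles) can give slightly different box edges for the same ten values.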
5. Describe three possible ways to handle missing data in a data set, for example what to do with the observation
record or how to deal with the missing data cell.
[6 pts]
a.
b.
c.
6. The following attributes are found in a daily worker log data set. Associate the best descriptor for each attribute. If
you do not understand the meaning of the attribute, please ask for clarification.
Choices: Nominal-Categorical (Cat), Nominal-Arbitrary (no categories) (Arb),
Nominal-Ranked (Ranked),
Ordinal-Continuous (Cont), Ordinal-Discrete (Disc),
Spatial/geographic (Geo), Temporal/Time (Time)
[9 pts]
Date
Day of week
Latitude-Longitude location of project
Employee name
Hours worked on project (can be fractional)
Number of widgets used
High temperature of day
Project code label
Project priority (5 levels)
[9 pts]
7. Perception true/false
_____ Recognition and organization are elements of perception.
_____ Change blindness is a form of color blindness.
_____ The cones in the human eye react to red, yellow and blue primary colors.
_____ About 2% of the population has a named form of color blindness.
_____ Smooth pursuit eye movements are when each eye tracks or follows an object independently.
_____ Saccadic eye movements are the rapid eye movements to scan targets of interest.
_____ Motion is a strong pre-attentive feature.
_____ Some illusions might portray data that is not there, e.g. Hermann Grid illusions.
_____ Each pre-attentive feature used in a visualization should map to an attribute from the data.
8. True/false statistics.
[15 pts]
_____ Statistics can be computed only if the data columns are continuous ordinal.
_____ Nominal-ranked can be converted to ordinal.
_____ Nominal-categorical unordered can be converted to a series of binary attributes.
_____ A Likert scale is a form of nominal-ranked.
_____ Calculating a mode statistic is best for ordinal-discrete or nominal-ranked.
_____ Use of bins allows conversion of ordinal data to nominal-ranked.
_____ Bins with the ranges [80,90) and [90,100] put the value 90 into the [80,90) bin.
_____ Frequency counts are appropriate only for data in discrete ordinal form.
_____ Correlation calculates a value rating the relationship between pairs of values from the same row.
_____ The correlation statistic is a value in the range [0,1].
_____ A correlation of 0.9 between two attributes says that the paired values in a row are related, roughly, such that
when the first number is high the other number is low.
_____ A correlation close to zero means that both attributes should be ignored.
_____ Linear regression attempts to fit a line to the data that maximizes the y-distance between the line and the
data points.
_____ Linear regression can be applied to 1 dependent variable and many independent variables.
_____ Time series analysis examines a sequence of measurements for its cycles and for prediction of patterns.
9. What are the dimensions of data found in Minard's map? List at least 5.
[5 pts]
10. For each of the types of attributes: ordinal, nominal, and relational, choose the best 2 features that would support the
visualization of a value in such an attribute type. Place an X in the box if the feature is effective in displaying
different values.
[9 pts]
Feature              | Example    | Ordinal | Nominal (unranked) | Relational (between two items)
Position/placement   |            |         |                    |
Length               |            |         |                    |
Shape                |            |         |                    |
Color                | (imagine!) |         |                    |
Connection           |            |         |                    |
11. Basic visualization design. Consider these two graphics we studied in class.
[7 pts]
a. What one improvement would you make to the left graphic? Why? [2]
b. What one improvement would you make to the right graphic? Why? [2]
c. How do the graphics complement each other? [3]