IT 241 Information Discovery Fall 2014 Exam 1 Page 1
Tuesday, Sept. 30, 2014
Name _____________________________

1. Below is one of the visualization pipelines from the text. There are three stages (raw data, data tables, and visual structures) and three transformations (data/flow transformations, visual mappings, view transformations/interact) in this pipeline for a final visualization. Match each stage or transformation to the form of the data and final visualization as listed below. [6 pts]

   Handling missing values _______________________
   Data found in a relational database and separate CSV files _____________________________
   Choosing which attribute will be color and which will be glyph size ____________________
   Selecting a datapoint range in a parallel coordinates graph ______________________________
   Data merged into a simpler CSV file ________________________
   Displaying parallel coordinates or matrix of scatterplots _____________________

2. A graphic can be classified as an exploratory visualization, an explanatory visualization, or an example of visual art. Explain the differences among these three graphic types by giving an example of each and explaining why it is one and not the others. [6 pts]

3. Data coding. [18 pts]

   a. The value 1001 0110 in binary is _______________ in decimal and its corresponding hexadecimal digits are ____ ____.
      Decimal 39 converted to binary is ______________ .
      If the 8-bit ASCII codes in decimal for “A” and “a” are 65 and 97, respectively, then the decimal ASCII codes for the string “Aced” are _____________________________________.
      If we want to store 75 unique values, we would need at least _______ bits to represent those values.

   b. A 1200 x 1000 pixel color image coded in RGB (+ 1 alpha byte) format requires ________________ bytes.

   c. ______ (True/false) Compression techniques such as MP4 and GIF for images, audio and video are lossy, meaning that compression reduces their size in a way that the original codings can be recovered.

   d. A 60 second stereo sound clip (2 channels) sampled at 48000 samples per second with a 16-bit depth will result in storing __________________ bytes.

   e. What are the colors for these RGB hexadecimal encodings? (choose from: white, black, grey, red, green, yellow, blue, brown, dark blue, dark green)
      FFFFFF = __________________
      888888 = ______________________
      FFFF00 = __________________
      000044 = ______________________

   f. ______ (True/false) A CSV file is editable in a text editor, exportable from Excel and a general input format for most visualization and data mining tools.

   g. ______ (True/false) An XML file is self-describing, editable in a text editor, supports complex data organizations and is an input option for visualization and data mining tools.

4. Plot on the number line with small circles this set of 10 univariate numbers {2, 6, 8, 10, 16, 24, 28, 31, 45, 50}, then superimpose a Tukey box plot representing the median and the 25th and 75th percentiles. [10 pts]

   ┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼
   0    5    10   15   20   25   30   35   40   45   50   55

   What is the mean of these data? _______
   If one standard deviation (average distance from the average), added to and subtracted from the mean, spans the middle 70% of the data above and below the mean, estimate its value ________ .
   If we normalize this data set to fall between 0 and 1, assuming the likely data range is 0 to 50, then the following 3 values are recoded as:
   6 -> ______, 24 -> _______, and 50 -> _______

5. Describe three possible ways to handle missing data in a data set, e.g. what to do with the data observation record or how to deal with the missing data cell. [6 pts]

   a.
   b.
   c.

6. The following attributes are found in a daily worker log data set.
Associate the best descriptor for each attribute. If you do not understand the meaning of the attribute, please ask for clarification. [9 pts]

   Choices: Nominal-Categorical (Cat), Nominal-Arbitrary (no categories) (Arb), Nominal-Ranked (Ranked), Ordinal-Continuous (Cont), Ordinal-Discrete (Disc), Spatial/geographic (Geo), Temporal/Time (Time)

   Date
   Day of week
   Latitude-Longitude location of project
   Employee name
   Hours worked on project (can be fractional)
   Number of widgets used
   High temperature of day
   Project code label
   Project priority (5 levels)

7. Perception true/false. [9 pts]

   _____ Recognition and organization are elements of perception.
   _____ Change blindness is a form of color blindness.
   _____ The cones in the human eye react to red, yellow and blue primary colors.
   _____ About 2% of the population has a named form of color blindness.
   _____ Smooth pursuit eye movements are when each eye tracks or follows an object independently.
   _____ Saccadic eye movements are the rapid eye movements to scan targets of interest.
   _____ Motion is a strong pre-attentive feature.
   _____ Some illusions might portray data that is not there, e.g. Hermann Grid illusions.
   _____ Each pre-attentive feature used in a visualization should map to an attribute from the data.

8. True/false statistics. [15 pts]

   _____ Statistics can be computed only if the data columns are continuous ordinal.
   _____ Nominal-ranked can be converted to ordinal.
   _____ Nominal-categorical unordered can be converted to a series of binary attributes.
   _____ A Likert scale is a form of nominal-ranked.
   _____ Calculating a mode statistic is best for ordinal-discrete or nominal-ranked.
   _____ Use of bins allows conversion of ordinal data to nominal-ranked.
   _____ Bins with the ranges [80,90) and [90,100] put the value 90 into the [80,90) bin.
   _____ Frequency counts are appropriate only for data in discrete ordinal form.
   _____ Correlation calculates a value rating the relationship between pairs of values from the same row.
   _____ The correlation statistic is a value in the range [0,1].
   _____ A correlation of 0.9 between two attributes says that the pair of values from a row is related, roughly, such that when the first number is high the other number is low.
   _____ A correlation close to zero means that both attributes should be ignored.
   _____ Linear regression attempts to fit a line to the data that maximizes the y-distance between the line and the data points.
   _____ Linear regression can be applied to 1 dependent variable and many independent variables.
   _____ Time series analysis examines a sequence of measurements for its cycles and prediction of patterns.

9. What are the dimensions of data found in Minard's map? List at least 5. [5 pts]

10. For each of the types of attributes (ordinal, nominal, and relational), choose the best 2 features that would support the visualization of a value in such an attribute type. Place an X in the box if the feature is effective in displaying different values. [9 pts]

   Feature              Example      Ordinal   Nominal (unranked)   Relational (between two items)
   Position/placement
   Length
   Shape
   Color                (imagine!)
   Connection

11. Basic visualization design. Consider these two graphics we studied in class. [7 pts]

   a. What one improvement would you make to the left graphic? Why? [2]
   b. What one improvement would you make to the right graphic? Why? [2]
   c. How do the graphics complement each other? [3]