Data visualization – Tables and Charts

Data preparation
1. Raw data
2. Structure data
3. Data processing
4. Exploratory data analysis (EDA)
5. Insight reports, visual graphs

Data visualization involves
● Creating a summary table for the data
● Generating charts to help interpret and understand the data

Purposes:
● Summarize data by highlighting important relationships and trends
● Identify data errors, if any

Tables: Usually have headers, a title, observations, and variables.

When to use tables?
● To refer to specific numerical values
● To make precise comparisons between different values, not just relative comparisons
● When variables have different units or very different magnitudes

Less is better: keep the data-ink ratio high by removing ink that does not convey data (such as lines separating data).
Data-ink ratio = (ink used to convey the meaning of the data) / (total ink used in the table or chart)

Table design principles
● Avoid vertical lines unless they are necessary for clarity
● Horizontal lines are necessary only for
○ Separating column titles from data values, or
○ Indicating that a calculation has taken place
● Use shading to separate columns
● Right-align numbers so that differences in magnitude stand out
● Show the same number of decimal places for every value (if one value has decimals, give all values the same number of decimals)

Tables
● Crosstabulation: a table describing the relationship between two variables

Scatter charts:
● Show the relationship between two numerical variables

Line charts:
● Similar to scatter charts, with the dots connected
● Useful when one variable is time

Bar charts:
● Need at least one categorical variable (for example, how many observations fall in Jan, Feb, or Mar)
● Should the bars start at 0? Yes: always include 0 in bar charts
● With two categorical variables, a clustered column chart is usually the best choice

Pie charts:
● Avoid pie charts where possible
● Frequently used to compare categorical data
● They compare information using areas and angles, which are very hard for humans to judge
● Most loathed graph of all time

Bubble charts:
● Visualize three variables in a two-dimensional chart (the third variable is shown by bubble size)

Heat maps:
● Use colors to convey information, such as the magnitude of values

Scatter-chart matrix:
● Study the relationships across many pairs of (numerical) variables

Distribution plots:
● Histograms: show the distribution of one numerical variable. Make sure the bins have equal widths and choose a bin width that “makes a story”
● Box plots: side-by-side box plots are useful for comparing subgroups defined by a categorical variable. Use them to identify outliers, then explore whether the outliers are due to experimental errors

Jittering:
● Uncrowds the data so that more markers can be seen
● Moves each marker by a small random amount
● Tells a better story by revealing observations that would otherwise be stacked on top of each other

Map charts:
● Compare data across countries

Geographic information systems (GIS): a system that merges maps and statistics to present data over different geographic areas.

Data dashboard: a data visualization tool that illustrates multiple metrics and automatically updates these metrics as new data become available.

Key performance indicators (KPIs) shown in dashboards, for example:
● Automobile dashboards: current speed, fuel level, and oil pressure
● Business dashboards: financial position, inventory on hand, customer service metrics

Principles of effective data dashboards
● Present all KPIs on a single screen that users can quickly access
● Provide timely summary information on important KPIs
● Call attention to unusual measures that may require attention
● Use color carefully

Tableau benefits:
● Quick, interactive visualization
● Easy to use: drag and drop, only a couple of menus, no coding
● Can merge different datasets
● Handles more data than Excel

Tableau disadvantages:
● Limited data preparation: use Excel to clean data, fill in missing values, or create new variables, and then use Tableau for data visualization

Tableau terms:
● Dimension: categorical data
● Measure: numerical data
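Two of the table design principles are right-aligned numbers and the same number of decimal places for every value. The small Python sketch below illustrates both. The function name `format_column` and the monthly figures are hypothetical and chosen only for illustration.

```python
def format_column(values, decimals=2):
    """Format numbers with the same number of decimals, right-aligned to a common width."""
    # Same number of decimal places for every value, with thousands separators.
    texts = [f"{v:,.{decimals}f}" for v in values]
    # Right-align so that digits of the same place value line up.
    width = max(len(t) for t in texts)
    return [t.rjust(width) for t in texts]


# Hypothetical monthly sales figures with inconsistent precision.
rows = [("Jan", 1250), ("Feb", 980.5), ("Mar", 12034.25)]
cells = format_column([value for _, value in rows])
for (month, _), cell in zip(rows, cells):
    print(f"{month:<5}{cell}")
```

Because every cell has the same width and precision, larger magnitudes stand out immediately when the column is scanned.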
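A crosstabulation counts how often each combination of two categorical variables occurs. The sketch below builds one with the Python standard library and adds a row-total column. The `crosstab` helper and the records are hypothetical. Libraries such as pandas provide `pandas.crosstab` for the same purpose.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Count observations for every (row category, column category) pair."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    # Missing combinations get a count of 0 (Counter returns 0 for absent keys).
    table = {row: {col: counts[(row, col)] for col in cols} for row in rows}
    return rows, cols, table


# Hypothetical observations with two categorical variables.
records = [
    {"region": "East", "channel": "Online"},
    {"region": "East", "channel": "Store"},
    {"region": "West", "channel": "Online"},
    {"region": "West", "channel": "Online"},
    {"region": "East", "channel": "Online"},
]

rows, cols, table = crosstab(records, "region", "channel")
print("".ljust(8) + "".join(c.rjust(8) for c in cols) + "Total".rjust(8))
for row in rows:
    cells = [table[row][col] for col in cols]
    print(row.ljust(8) + "".join(str(n).rjust(8) for n in cells) + str(sum(cells)).rjust(8))
```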
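The rule "always include 0 in bar charts" exists because readers judge a bar's length as its value. Starting the axis anywhere else distorts the comparison. The arithmetic below uses hypothetical values of 90 and 100 and a hypothetical axis that starts at 85.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)


true_ratio = apparent_ratio(90, 100)              # axis starts at 0: bars in true proportion
distorted = apparent_ratio(90, 100, baseline=85)  # truncated axis: bars of length 5 and 15

print(f"Axis from 0:  bar B looks {true_ratio:.2f}x as long as bar A")
print(f"Axis from 85: bar B looks {distorted:.2f}x as long as bar A")
```

When the axis starts at 0, bar B is about 1.11 times as long as bar A, which matches the data. When the axis starts at 85, the same values make bar B look three times as long.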
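A histogram with equal-width bins can be computed by hand. The sketch below divides the range from the minimum to the maximum into `num_bins` bins of equal width and counts the values in each. Each bin includes its left edge but not its right edge, except the last bin, which also includes the maximum. This is one common convention, and charting tools may differ. The function name and data are hypothetical.

```python
def equal_width_histogram(values, num_bins):
    """Count values in `num_bins` bins of equal width spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: a single bin holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum lies exactly on the last edge; keep it in the final bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical measurements.
data = [2, 3, 3, 5, 7, 8, 8, 9, 12, 14]
edges, counts = equal_width_histogram(data, num_bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} | {'*' * count}")
```

Trying a few values of `num_bins` is a quick way to see which bin width “makes a story”.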
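Box plots commonly flag outliers with the 1.5 × IQR rule: any value more than 1.5 interquartile ranges below the first quartile or above the third quartile is an outlier. The sketch below assumes that convention. It uses `statistics.quantiles`, whose default quartile method may give slightly different quartiles than a particular charting tool. The two production-line groups are hypothetical and stand in for the subgroups of a side-by-side box plot.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements for two subgroups (as in side-by-side box plots).
groups = {
    "Line A": [12, 14, 14, 15, 16, 17, 18, 19, 20, 45],
    "Line B": [20, 21, 22, 22, 23, 24, 25, 26],
}
for name, values in groups.items():
    print(name, "outliers:", iqr_outliers(values))
```

A flagged value, such as 45 in Line A, is the kind of observation to investigate for a possible experimental error.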
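Jittering moves each marker by a small random amount so that stacked observations become visible. It changes only where markers are drawn, never the underlying data. In the sketch below, the offset size (0.15), the seed, and the ratings are hypothetical. In practice, the jittered values would be passed to a scatter chart as plotting coordinates.

```python
import random


def jitter(values, amount=0.1, seed=None):
    """Shift each value by a small uniform random offset in [-amount, amount]."""
    rng = random.Random(seed)  # a seeded generator makes the plot reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical 1-to-5 ratings: many identical values would be drawn as one marker.
ratings = [3, 3, 3, 4, 4, 5, 5, 5, 5, 2]
jittered = jitter(ratings, amount=0.15, seed=42)
print(len(set(ratings)), "distinct positions before jittering")
print(len(set(jittered)), "distinct positions after jittering")
```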
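The dashboard principle "call attention to unusual measures" can be sketched as a range check on each KPI. The `Kpi` class, the KPI names and values, and the normal ranges are all hypothetical. A real dashboard tool would apply its own rules and highlight flagged values visually, for example with carefully chosen color.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    value: float
    low: float   # hypothetical lower bound of the normal range
    high: float  # hypothetical upper bound of the normal range


def flag_unusual(kpis):
    """Return the names of KPIs whose current value lies outside its normal range."""
    return [k.name for k in kpis if not (k.low <= k.value <= k.high)]


# Hypothetical business-dashboard KPIs.
kpis = [
    Kpi("Inventory on hand (units)", 1200, 800, 2000),
    Kpi("Average customer wait (min)", 14.5, 0, 10),
    Kpi("Cash position ($K)", 350, 250, 1000),
]
flagged = set(flag_unusual(kpis))
for k in kpis:
    status = "ALERT" if k.name in flagged else "ok"
    print(f"{k.name:<30}{k.value:>10}  {status}")
```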