Data Visualization: Key Concepts, Techniques, and Best Practices
by Dr. Abdul Rehman Abbasi

What is Data Visualization?
Data visualization is the graphical representation of information and data. By using visual elements such as charts, graphs, and maps, data visualization tools provide an accessible way to observe trends, outliers, and patterns in data.

Importance of Data Visualization
• Simplifies Complex Data: Makes large datasets easier to understand.
• Reveals Insights: Highlights trends, correlations, and anomalies.
• Improves Decision-Making: Provides a clearer basis for informed decisions.
• Enhances Communication: Makes presentations more engaging and impactful.

Complex versus Simple
[Example figures not reproduced. Sources: NEPRA State of Industry Report 2023; IAEA Annual Report 2023]

Key Elements of Effective Data Visualization
• Clarity: The visualization should be easy to interpret.
• Accuracy: Data must be presented without distortion or bias.
• Relevance: Focus on what's important for the target audience.
• Design: Use colors, labels, and layouts effectively to avoid clutter.

Common Data Visualization Techniques
1. Charts
• Line Chart: Shows trends over time.
• Bar Chart: Compares categories or groups.
• Pie Chart: Displays proportions or percentages.
• Stacked Bar Chart: Represents cumulative values across categories.
2. Graphs
• Scatter Plot: Shows relationships or correlations between two variables.
• Histogram: Displays frequency distributions.
• Box Plot: Summarizes distributions and outliers.
3. Maps
• Choropleth Map: Uses color shading to show data values across geographical regions.
• Heat Map: Highlights density or intensity of data points.
4. Specialized Visualizations
• Tree Map: Represents hierarchical data using nested rectangles.
• Network Graph: Shows relationships and connections.
• Gantt Chart: Tracks project schedules and timelines.
[Example figures not reproduced. Source: NEPRA State of Industry Report 2023]

Best Practices in Data Visualization
• Choose the Right Chart: Match the type of data with an appropriate visualization.
• Prioritize Simplicity: Eliminate unnecessary elements (e.g., 3D effects, excessive labels).
• Focus on the Audience: Tailor visuals to the knowledge level and interests of the audience.
• Use Colors Strategically: Use consistent color schemes to avoid confusion.
• Include Labels and Legends: Clearly label axes, data points, and categories.
• Maintain Proportionality: Ensure that the visual representation accurately reflects the underlying data.
• Avoid overcomplicated visuals for simple datasets.
• Use clean, uncluttered designs to focus on the data.
• Provide context with titles, legends, and annotations.
• Highlight key data points with contrasting colors.
• Provide a legend for any symbols or color codes.

Tools for Data Visualization
• General Tools
  • Microsoft Excel, Google Sheets: Basic charting tools for small datasets.
• Advanced Tools
  • Tableau: Professional-grade tool for interactive dashboards.
  • Power BI: Integrates well with Microsoft tools for business analytics.
  • Google Data Studio: For creating interactive reports from Google data sources.
• Programming Libraries
  • Python:
    • Matplotlib: Simple plots and charts.
    • Seaborn: Statistical data visualization.
    • Plotly: Interactive and dynamic visuals.
  • R:
    • ggplot2: Sophisticated visualizations.

Matplotlib
    import matplotlib.pyplot as plt

    # Data
    # Numbers from 1 to 8 (range excludes its upper bound, so stop at 9)
    x = range(1, 9)
    # Square of each number
    y = [num ** 2 for num in x]

    # Create the plot
    plt.figure(figsize=(8, 5))
    plt.plot(x, y, marker='o', linestyle='-', color='blue', label='y = x^2')

    # Add labels, title, and legend
    plt.title('Square of Numbers', fontsize=14)
    plt.xlabel('X Label (x)', fontsize=12)
    plt.ylabel('Y Label (y)', fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.6)
    plt.legend(fontsize=12)

    # Show the plot
    plt.show()

Seaborn
• Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics. It is particularly useful for visualizing univariate and bivariate data. Some of its advantages include:
• Built-in Themes: Provides built-in themes for better aesthetics.
• Statistical Functions: Includes statistical functions for better data insights.
• Ease of Use: Simplifies complex visualizations with fewer lines of code.
• Integration: Works seamlessly with Pandas data frames.

Assignment-3(a)
• The following information is available for two alternative machines for a given service. Compare the true equivalent uniform annual cost of the two machines and indicate which one to choose. Draw a visualization to support the decision.

                                             Machine A    Machine B
    Capital cost (in Million Rs.)            3,000        10,000
    Life (in years)                          4            10
    Salvage value (in Million Rs.)           None         2,000
    Annual operating cost (in Million Rs.)   1,400        700
    Min Return on Investment (ROI) in %      10           15

Assignment-3(b): Draw the following:
• Bar Chart: Compare Units Sold or Revenue for each Region or Product.
• Line Chart: Show trends in Revenue or Profit Margin over Months for different Products or Regions.
• Stacked Bar Chart: Represent cumulative Revenue by Region or Month.
• Pie Chart: Show the percentage share of Revenue by Product or Region.
• Heatmap: Visualize the Profit Margin or Units Sold across Regions and Products.
• Scatter Plot: Display the relationship between Units Sold and Profit Margin for different Products.

Assignment 3 Due Date: Tuesday, January 7th, 2025, before 3 P.M.
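
Assignment-3(a) asks for an equivalent uniform annual cost (EUAC) comparison. The sketch below shows one way such a comparison could be computed before plotting it. It assumes the common textbook form EUAC = P(A/P, i, n) - S(A/F, i, n) + annual operating cost, which may differ from the definition intended in the course. The function names and the figures for "Option X" and "Option Y" are hypothetical and are not the assignment data.

```python
def capital_recovery_factor(rate: float, years: int) -> float:
    """(A/P, i, n): spreads a present amount into equal annual amounts."""
    growth = (1 + rate) ** years
    return rate * growth / (growth - 1)


def sinking_fund_factor(rate: float, years: int) -> float:
    """(A/F, i, n): spreads a future amount into equal annual amounts."""
    return rate / ((1 + rate) ** years - 1)


def euac(capital_cost: float, salvage_value: float,
         annual_operating_cost: float, rate: float, years: int) -> float:
    """Assumed form: P(A/P, i, n) - S(A/F, i, n) + annual operating cost."""
    return (capital_cost * capital_recovery_factor(rate, years)
            - salvage_value * sinking_fund_factor(rate, years)
            + annual_operating_cost)


# Hypothetical figures chosen only for illustration (not the assignment data).
options = {
    "Option X": euac(1_000, 0, 300, rate=0.10, years=2),
    "Option Y": euac(2_000, 500, 100, rate=0.10, years=2),
}

for name, cost in options.items():
    print(f"{name}: {cost:,.2f} per year")

# The option with the lower equivalent annual cost is the cheaper one.
cheapest = min(options, key=options.get)
print("Lower annual cost:", cheapest)
```

The resulting dictionary maps each option to a single annual figure, so it can be drawn as a simple bar chart, for example with plt.bar(list(options), list(options.values())).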