Basic Plots

Univariate, bivariate, multivariate
Histogram, boxplot, dotplot, barchart, spine plot
Scatterplot, density plots, mosaic plot
Parallel coordinate plot, profile plots
Maps
Time series plots

Data and its shapes

Data comes in a lot of different formats.
We will assume that we can always get it into a spreadsheet-like shape with
  headers at the top
  columns for each piece of information
  rows for each object

Univariate

A dotplot is used for real-valued variables. A dot is positioned along an axis to represent the data value.
[Figure: dotplot of % capitals, axis from 0 to 15; sample %Cap values: 14, 12.8, 13, 1.3, 13, 10]

Histograms are used for real-valued variables. Values are binned and the count is displayed by a rectangle.
[Figure: two histograms of Tips ($), 0 to 10, with Frequency on the vertical axis: one with breaks at $1, one with breaks at 10c]

Boxplots are used for real-valued variables. The data values are summarized by 5 numbers: min, Q1, median, Q3, max. The boxplot displays just these 5 numbers.
[Figure: boxplot of Tips: Min 1, Q1 2, Median 2.9, Q3 3.6, Max 10]

Barcharts are used for categorical data. The count for each category is represented by the height of a rectangle.
[Figure: barchart of the gender of the person paying the bill: F 87, M 157]

Spine plots are used for categorical variables. The count for each category is represented by the width of a rectangle. Most useful when there are two variables or more.
[Figure: spine plot of Gender: F 87, M 157]

Bivariate

[Figure: scatterplot of Tip against Total Bill]

Total Bill   Tip
16.99        1.01
10.34        1.66
21.01        3.5
23.68        3.31
...          ...
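As a rough sketch of the univariate summaries described earlier, the following uses only the four tip values that appear in the sample rows (1.01, 1.66, 3.5, 3.31), not the full data set. The function names `histogram_counts` and `five_number_summary` are illustrative, and the choice of the "inclusive" quartile method is an assumption, since quartile conventions differ between packages.

```python
from collections import Counter
from statistics import quantiles

# The four tip values (in dollars) visible in the sample rows;
# the full data set is not reproduced here.
tips = [1.01, 1.66, 3.5, 3.31]


def histogram_counts(values, width_cents):
    """Count values per bin; keys are the lower bin edges in cents.

    Working in whole cents avoids floating-point surprises at bin edges.
    """
    cents = [round(v * 100) for v in values]
    return dict(Counter((c // width_cents) * width_cents for c in cents))


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    The 'inclusive' quartile method is an assumption made for this sketch.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


dollar_bins = histogram_counts(tips, 100)  # breaks at $1
dime_bins = histogram_counts(tips, 10)     # breaks at 10c
summary = five_number_summary(tips)

print(dollar_bins)
print(dime_bins)
print(summary)
```

With so few values, the $1 breaks pool the tips into two bins while the 10c breaks put every tip in its own bin, which is a small-scale version of how the choice of breaks changes what a histogram shows.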
Scatterplots place a dot representing a pair of numbers on a Cartesian plane.
[Figure: scatterplot of Tip (0 to 10) against Total Bill (0 to 50), annotated r = 0.68]

Density plots: hexagonal grids (Carr)
[Figure: hexagonally binned plot of Tip against Total Bill, with a Counts legend]

A mosaic plot represents a two-way table of categorical variables. It starts from a spine plot and divides the bars according to counts of a second variable.

Gender   Thu   Fri   Sat   Sun
F         32     9    28    18
M         30    10    59    58

[Figure: mosaic plot of Gender by Day]

Barcharts with two variables

Stacked barchart
Side-by-side barchart
[Figure: stacked and side-by-side barcharts of counts by Day (Thu, Fri, Sat, Sun), split into Female and Male]

Multivariate

A parallel coordinate plot changes from orthogonal Cartesian axes to parallel axes.

Bill     Tip
16.99    1.01
10.34    1.66
21.01    3.5
23.68    3.31
...      ...

[Figure: Bill and Tip drawn on parallel axes; axis titles: Variables, Standardized Values]

Parallel coordinate plots

Look for patterns in the direction of the lines.
[Figure: parallel coordinate plot; labels: Bill, Tip, Gender; axis titles: Variables, Standardized Values]

Parallel coordinate plots

Measurements on Beetles
What patterns do you see here?
[Figure: parallel coordinate plot of the beetle variables tars1, tars2, head, aede1, aede2, aede3; axis titles: Variables, Standardized Values]

Maps

Convention: North at top
The problems with taking longitude as a numerical value
Aspect ratio of latitude to longitude
Small regions/areas and reading information

Longitude as a numerical value and aspect ratio of longitude to latitude

Can you imagine what the world would look like if the vertical and horizontal plot space were equal?
This location is both -180 and 180.
[Figure: world map with longitude from -150 to 150 and latitude from -50 to 50]

Small areas/regions

2004 election results on map: red = Republican, blue = Democrat
Cartogram of 2004 election results
http://www-personal.umich.edu/~mejn/election/

Time series plots

Lines indicate temporal dependency.
Temporal scale: days of the week need to be in conventional order, lines, ...
[Figure: line plot of Count (0 to 80) by Day, with the days in the order Fri, Sat, Sun, Thu]

What’s wrong with this plot?
[Figure: time series of the average number of sunspots (0 to 150)]
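The point about putting days of the week in conventional order can be sketched in a few lines. The counts here are the day totals obtained by summing the F and M rows of the two-way table shown with the mosaic plot; treating those totals as the counts in the day plot is an assumption, as are the names `WEEK_ORDER` and `in_week_order` and the choice to start the week on Monday.

```python
# Counts by day, summed over gender from the two-way table
# (F: 32, 9, 28, 18; M: 30, 10, 59, 58 for Thu, Fri, Sat, Sun).
counts_by_day = {"Fri": 9 + 10, "Sat": 28 + 59, "Sun": 18 + 58, "Thu": 32 + 30}

# Conventional order of the week; starting on Monday is an assumption.
WEEK_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def in_week_order(counts):
    """Return (day, count) pairs sorted by position in the week."""
    return sorted(counts.items(), key=lambda item: WEEK_ORDER.index(item[0]))


# Sorting the labels as plain strings gives alphabetical order,
# which is what many plotting defaults do with categorical labels.
alphabetical = sorted(counts_by_day.items())
ordered = in_week_order(counts_by_day)

print(alphabetical)  # Fri, Sat, Sun, Thu
print(ordered)       # Thu, Fri, Sat, Sun
```

Connecting the points with lines only makes sense for the second ordering, because a line implies that neighbouring points are adjacent in time.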
[Sunspot plot x-axis: Year, labelled every 10 years from 1750 to 1990]

Combinations

Small multiples is an approach advocated by Tufte to plotting multiple variables in a digestible way. This might be considered as combinations of basic plots.
[Figure: small multiples of the Beetles data, with panels labelled tars1, tars2, head, aede1, aede2, aede3]

Dotplots

Barley yield for two different years for 6 locations in Minnesota and 10 varieties
[Figure: dotplots of Barley Yield (bushels/acre), 20 to 60, one panel per location: Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids. Each panel lists the varieties Trebi, Wisconsin No. 38, No. 457, Glabron, Peatland, Velvet, No. 475, Manchuria, No. 462, Svansota. Legend: 1932, 1931]

Boxplots or Dotplots

Different representations of the heights of the New York Choral Society
[Figure: boxplots and dotplots of Height (inches), 60 to 75, by voice part: Soprano 1, Soprano 2, Alto 1, Alto 2, Tenor 1, Tenor 2, Bass 1, Bass 2]

Basic plots form the core

How does Napoleon’s March use basic plots? Time series plot + map + barchart
How is John Snow’s cholera map related to a basic plot? Map + scatterplot

Napoleon’s March
Basic plot(s)?

Cholera map

…And on a Global Scale

[Figure: graphic labelled with country names]
AIDS in the NY Times

[Figure: graphic in which each country is labelled by name and drawn as a grid of squares, grouped into regions: Highly Industrialized Countries, Latin America and the Caribbean, North Africa and the Middle East, Eastern Europe and Central Asia, Southern and Eastern Asia, Sub-Saharan Africa]

In many countries, reliable statistics on AIDS are hard to obtain. Unaids, the United Nations AIDS agency, relies on these numbers. Each square on the grid represents 2,500 people with AIDS.

EACH DOT REPRESENTS 2,500 PEOPLE LIVING WITH AIDS

NEW CASES
The estimated number of new H.I.V./AIDS cases in highly industrialized countries has decreased slightly since the 1980’s but has continued growing in sub-Saharan Africa.
LIVING WITH AIDS
The estimated number of people living with H.I.V./AIDS has exploded in sub-Saharan Africa while staying relatively level in highly industrialized countries.

[Figure: two line charts from ’80 to ’00*, vertical axes in millions of people, with one line per region: Sub-Saharan Africa, Southern and Eastern Asia, Highly Industrialized Countries, Latin America and the Caribbean, Eastern Europe and Central Asia, North Africa and the Middle East]

* Preliminary numbers
Source: UNAIDS
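The encoding used in the AIDS graphic, one square per 2,500 people, is a barchart-like count made of unit marks, and the arithmetic behind it is easy to sketch. The function name `squares_needed`, the rounding-up rule, and the example counts are all assumptions for illustration; none of the counts are taken from the UNAIDS figures.

```python
import math

# Stated in the graphic: each square represents 2,500 people.
PEOPLE_PER_SQUARE = 2500


def squares_needed(people, per_square=PEOPLE_PER_SQUARE):
    """Number of whole squares needed to represent `people`.

    Rounding up, so that any non-zero count gets at least one square,
    is an assumption of this sketch.
    """
    return math.ceil(people / per_square)


# Hypothetical counts chosen only to illustrate the arithmetic;
# they are not taken from the UNAIDS data.
examples = {"region A": 25_000_000, "country B": 140_000, "country C": 1_200}

for name, people in examples.items():
    print(name, squares_needed(people))
```

A count in the tens of millions needs thousands of squares while a small count needs one, which is why small regions are hard to read on such a graphic.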