US News College Information (Treemap, Spotfire)

Jerry Alan Fails
March 01, 2005
Application
CS 838S – Ben Shneiderman
Page 1 of 11
US News College Information
Every year, US News publishes a composite ranking of all undergraduate colleges and
universities. These rankings combine, and even camouflage, numerous features using a nebulous
composite formula. By accessing the individual data attributes, however, students and
faculty may be able to use the data to better guide decisions about where to study or teach. Two
data sets were joined to allow exploring these two perspectives: the US
News College Data from 1995 and the Faculty/Staff Salary Data from the same year. (The data sets
were found at: http://www.amstat.org/publications/jse/jse_data_archive.html.) To get a
geographical perspective of the data, the final data set also included regional data for each college
as specified by the US Census Bureau (see http://www.census.gov/geo/www/us_regdiv.pdf). The
tools used to explore the data were Treemap and Spotfire.
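The join described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`id`, `state`, `full_time`, `avg_salary`), the toy records, and the partial state-to-region table are hypothetical and are not taken from the actual data files or from the tools used in this report.

```python
# Toy records; every name and number here is invented for illustration.
colleges = [
    {"id": 1, "name": "College A", "state": "NY", "full_time": 5000},
    {"id": 2, "name": "College B", "state": "CA", "full_time": 12000},
    {"id": 3, "name": "College C", "state": "TX", "full_time": 8000},
]
salaries = [
    {"id": 1, "avg_salary": 520},
    {"id": 2, "avg_salary": 480},
]
# Partial state -> Census division lookup (only the states used above).
state_region = {
    "NY": "Middle Atlantic",
    "CA": "Pacific",
    "TX": "West South Central",
}


def join_data(colleges, salaries, state_region):
    """Inner-join college and salary rows on 'id', then attach a region."""
    salary_by_id = {row["id"]: row for row in salaries}
    joined = []
    for college in colleges:
        salary = salary_by_id.get(college["id"])
        if salary is None:
            continue  # no salary record: drop the college (inner join)
        row = {**college, **salary}
        row["region"] = state_region.get(college["state"], "Unknown")
        joined.append(row)
    return joined


joined = join_data(colleges, salaries, state_region)
```

An inner join is assumed here, so colleges without a salary record are dropped; a different join policy would keep them with missing salary values.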
In looking at the data I used two main perspectives: that of a future college
student and that of a future college professor. Questions common to prospective
students and teachers include: where do people go to college, where are the largest colleges, and what do
they cost? Questions a prospective student might ask are: which schools can I afford, which
school is ranked highest, and where will I get the most for my money? A prospective professor
might ask: where are the highest salaries, where is the most funding, and where are the “smartest”
students?
Common to both perspectives is an understanding of where college institutions are, how
many full-time students are enrolled and how much tuition costs. Treemap is a perfect tool to view
this hierarchical data (see Figure 1).
Figure 1 — General layout of enrollment in colleges and universities. Size indicates number of students enrolled.
Color represents the price of out-of-state tuition (Green, high; black, low).
Note that with the treemap representation one can identify the number of full-time students
in both the regions and the states. For example, this representation illustrates that the regions with
the most students in descending order are: East North Central (OH, IL, MI, IN, WI), South Atlantic
(NC, VA, FL, GA, MD, SC, WV, DC, DE), Middle Atlantic (NY, PA, NJ), West South Central
(TX, LA, OK, AR), Pacific (CA, WA, OR, HI, AK), West North Central (MO, MN, IA, KS, NE,
SD, ND), East South Central (TN, AL, KY, MS), Mountain (CO, UT, AZ, ID, NM, MT, NV, WY),
and New England (MA, CT, NH, RI, ME, VT). The top four states are obvious as well: CA, NY,
TX, and PA.
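The region/state totals that drive the treemap sizes can be approximated with a simple two-level aggregation. The sketch below is an assumption about how such totals could be computed, not a description of how Treemap works internally; the field names and enrollment figures are hypothetical.

```python
from collections import defaultdict

# Toy rows; regions are real Census divisions, the numbers are invented.
rows = [
    {"region": "Pacific", "state": "CA", "full_time": 12000},
    {"region": "Pacific", "state": "WA", "full_time": 3000},
    {"region": "Middle Atlantic", "state": "NY", "full_time": 9000},
    {"region": "Middle Atlantic", "state": "PA", "full_time": 7000},
    {"region": "Middle Atlantic", "state": "NY", "full_time": 2000},
]


def enrollment_hierarchy(rows):
    """Sum full-time enrollment per state, nested under each region."""
    tree = defaultdict(lambda: defaultdict(int))
    for row in rows:
        tree[row["region"]][row["state"]] += row["full_time"]
    return {region: dict(states) for region, states in tree.items()}


def ranked_regions(tree):
    """Return (region, total enrollment) pairs, largest first."""
    totals = {region: sum(states.values()) for region, states in tree.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


tree = enrollment_hierarchy(rows)
ranking = ranked_regions(tree)
```

Each region's total corresponds to the area of its outer rectangle, and each state's total to the area of the nested rectangle inside it.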
Analyzing the same data with Spotfire shows the same distribution in the graphs of Figure 2.
These graphs additionally make it possible to see the ratio between public and private
institutions.
[Figure 2 graphic. Panels: "Number of Full-time Students by region" (x-axis: Region Name); "Number of Institutions by region" (x-axis: Region Name); a third panel, titled "Number of Institutions by Region" in the source, with x-axis State Full Name.]
Figure 2 — Spotfire representations of the distributions of number of full-time student enrollment (upper left, in
decreasing order), number of institutions (upper right, same order as upper left), and by state (lower middle, in
decreasing order by state). Color represents public (blue) and private (red) institutions.
Analysis of these graphs generated by Spotfire reveals that NY not only
enrolls many undergraduates, but also has many private universities, many more than CA,
which enrolls many students but has mostly public schools. Like NY, PA also has a high
ratio of private to public schools, although not quite as high. TX is more like CA, having fewer private
schools while enrolling many students. At first one may attribute this to a difference between
western and eastern states due to variations in population, population density, and geography (both
CA and TX are large). However, Figure 2, upper left, shows that
most regions have a large percentage of public schools, while the Middle Atlantic and New England regions
have a large percentage of private schools.
Figure 2, upper right, reveals that the West North Central and New England regions
have smaller institutions: the regions are ordered by number of full-time enrolled undergraduates,
yet these two regions have more institutions than their enrollment rank would suggest.
Another overview of the data yields further insights. Figure 3 simply plots in-state tuition against number of full-time undergraduate students. As shown, these features clearly
separate the public and private schools. Also highlighted is a cluster of institutions with
enrollment of 2500 to 12500 and tuition of $13,000 to $20,000. Looking at the names of the
colleges reveals many famous, prestigious universities. The graph also shows a private school
amidst the public schools, namely Brigham Young University, a large private school
with very low tuition.
[Figure 3 graphic. Title: "Number of Full-time Students vs. In-state Tuition"; x-axis: Number of full-time undergraduates (0 to 30000); y-axis: in-state tuition (0 to 25000). Labeled points: Yale University, Dartmouth College, Stanford University, Cornell Univ.-Statutory Clls, Harvard University, Brown University, Washington University, Univ. of Southern California, Emory University, University of Puget Sound, New York University, Northwestern University, Villanova University, Syracuse University, Northeastern University, Brigham Young University.]
Figure 3 — Number of full-time students vs. in-state tuition. Color represents public (blue) and private (red)
institutions. Size of square indicates average faculty salary. As can be seen, there is a cluster of famous private
universities that have about the same in-state tuition and the same number of full-time undergraduates. Also, there is an
outlier private school that has a large enrollment and small tuition (Brigham Young University).
Figure 4, below, shows how two similarly structured treemaps reveal this same relationship
between private and public schools and their tuition costs. Both treemaps use a region/state
hierarchy. The treemap on the left is color-coded by public (blue) and private (red) schools. The treemap
on the right is color-coded by in-state tuition, with black meaning low tuition and green signifying
high tuition. As can be seen, the blue on the left corresponds visually to the black
on the right, and the red on the left to the green on the right.
Figure 4 — Left, shows the region/state hierarchy, coloring public schools blue and private schools red. Right, the
region/state hierarchy, with a color distribution over in-state tuition (black is low in-state tuition, green is high tuition).
Notice the visual correlation between high-tuition and private schools. (The trend for out-of-state tuition is similar,
although slightly blurred.)
Addressing the perspective of a prospective professor, the Spotfire graphs in Figure 5
provide interesting new insights.
[Figure 5 graphic. Panels: "Average Faculty Salaries by Region"; "Average Public Institution Faculty Salaries by Region"; "Average Private Institution Faculty Salaries by Region". x-axis of each panel: Region Name; y-axis: average faculty salary (0 to 500).]
Figure 5 — Average faculty salaries by region and type of institution. Color represents public (blue) and private (red)
institutions. Surprisingly, average public faculty salaries are more than average private faculty salaries.
As displayed in the graphs in Figure 5, a surprising insight emerges: averaged
by region, private school faculty salaries are lower than public school salaries. This is
surprising because private and public schools are generally distinguishable by the amount of tuition
(recall Figures 3 and 4). The somewhat unexpected difference in public and private school faculty
salary can be partly reconciled by the fact that public schools receive public funding.
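The regional averages compared above amount to a grouped mean over institutions. The following sketch assumes an unweighted mean per (region, public/private) group, which is one plausible reading of the averaging in Figure 5; the field names and salary values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy rows; salary values are invented and carry no particular unit.
rows = [
    {"region": "Pacific", "control": "public", "avg_salary": 500},
    {"region": "Pacific", "control": "public", "avg_salary": 460},
    {"region": "Pacific", "control": "private", "avg_salary": 420},
    {"region": "New England", "control": "private", "avg_salary": 470},
    {"region": "New England", "control": "private", "avg_salary": 450},
]


def mean_salary_by_group(rows):
    """Unweighted mean of institution-level salaries per (region, control)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["region"], row["control"])].append(row["avg_salary"])
    return {key: mean(values) for key, values in groups.items()}


result = mean_salary_by_group(rows)
```

Because every institution counts equally here, a region with many small, lower-paying private colleges can show a lower private average even if its largest private universities pay well. Weighting by number of faculty would give a different picture.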
Figure 6 explores the supply/demand and cost benefits for professors. The treemap below
uses the region/state hierarchy. It uses the number of faculty for sizing, and average faculty salary
for coloring (low salary being black [$23200], high salary being green [$86600]). This graph shows
a high demand with a correspondingly high average salary in the Middle Atlantic region as well as in CA
and MA. Somewhat surprisingly, the regions with the highest demand (East North Central and
South Atlantic) do not seem to pay as much as the third and fourth most needy regions (Middle
Atlantic and Pacific).
Figure 6 — The general hierarchical distribution of number of faculty (size) and average salary (high salary green, low
salary black). Middle Atlantic seems to be a high paying region as do CA and MA.
Continuing the exploration of the data, looking for correlates of high faculty salaries,
Figure 7 displays out-of-state tuition vs. instructional expenditure per student. The size of each
square indicates the average faculty salary for that institution. As can be seen, there is a strong
correlation between tuition cost and instructional expenditure, except for the highlighted
institutions. The highlighted institutions have higher instructional expenditure per student than the
amount of tuition. This is probably due to outside funding sources. For example, it is not too
surprising that Gallaudet University has a lot of funding (as it is focused on sign language and the
teaching of the deaf and hard-of-hearing). Cal Tech receives a lot of technical funding. Johns
Hopkins, of course, is known for its medical research.
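The highlighted institutions can be thought of as the result of a simple filter: instructional expenditure per student greater than out-of-state tuition. A minimal sketch follows, with hypothetical field names and invented values; the actual highlighting in Spotfire was done interactively.

```python
# Toy rows; names and dollar amounts are invented for illustration.
rows = [
    {"name": "Institution A", "out_of_state_tuition": 18000,
     "instr_exp_per_student": 45000},
    {"name": "Institution B", "out_of_state_tuition": 15000,
     "instr_exp_per_student": 9000},
    {"name": "Institution C", "out_of_state_tuition": 6000,
     "instr_exp_per_student": 7500},
]


def spends_more_than_tuition(rows):
    """Names of institutions whose per-student spending exceeds tuition."""
    return [
        row["name"]
        for row in rows
        if row["instr_exp_per_student"] > row["out_of_state_tuition"]
    ]


flagged = spends_more_than_tuition(rows)
```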
[Figure 7 graphic. Title: "Out-of-state Tuition vs. Instruction Expenditure per Student (Size = Faculty Salary)"; x-axis: Out-of-state tuition (5000 to 25000); y-axis: instructional expenditure per student (10000 to 60000). Labeled points: California Inst. of Tech., Johns Hopkins University, Washington University, Antioch University, Wake Forest University, Yale University, University of Chicago, Harvard University, Emory University, University of Pennsylvania, Stanford University, Massachusetts Inst. of Tech., Columbia University, Dartmouth College, Princeton University, Northwestern University, University of Rochester, Vanderbilt University, Swarthmore College, Wheaton College, Creighton University, Wellesley College, Pomona College, Univ.of Calif.-Los Angeles, Cornell Univ.-Statutory Clls, Brown University, New York University, Harvey Mudd College, Carnegie-Mellon University, Rice University, Gallaudet University, Case Western Reserve Univ.]
Figure 7 — Out-of-state tuition vs. instruction expenditure per student. Color represents public (blue) and private (red)
institutions. Size of square indicates average faculty salary. As can be seen a lot of “top” universities spend a lot more
on instructional expenditure per student than they receive in tuition. This is likely due to outside funding sources. As
can be noted by the sizes of the squares, there is an obvious correlation between the instructional expenditure per
student and average faculty salaries.
It is also interesting to note that many of the highlighted universities (those whose
instructional expenditure is greater than their tuition) are also “prestigious” institutions.
As indicated by the size of the square for each institution, these top universities also pay their faculty
higher salaries. This is not too surprising since one of the axes is instructional expenditure
per student, although this per-student value also depends on the student/faculty ratio.
Application Evaluation
Spotfire and Treemap were effective tools for visualizing this multi-dimensional data set.
To view information, however, the user of either application must prune
the data and define the specific attributes to view. Spotfire attempted to suggest attributes that
were highly correlated; however, since these suggested views do not include any of the color mappings available
on the main visualization panel, it is difficult to see what the graphs mean. Also, because
the ‘View Tip’ box cannot be expanded, it is impossible to read long comparison names to help
in the pruning process. Spotfire does allow visualization of 4D data in a 2D space via the x-axis, y-axis,
node size and color. This is a nice feature for multi-dimensional data. I also like the aggregate
functions Spotfire allows when creating histograms. Treemap, on the other hand, allows
visualization of more levels if the data include hierarchies, although in most cases it only allows a 2D
representation based on square size and color, grouped by an imposed hierarchy. HCE might have
also yielded interesting results, but it would crash fatally upon viewing this data.
This project took a lot of data massaging using Excel and Access prior to visualization. A
visualization tool that integrated better data manipulation and control would be very desirable.
Other Interesting Graphs
[Figure 8 graphic. Title: "Instruct Expenditure per Student vs. Average Salary"; x-axis: Average salary - all ranks (300 to 800); y-axis: instructional expenditure per student (10000 to 60000). Labeled points: California Inst. of Tech., Johns Hopkins University, Washington University, Antioch University, Wake Forest University, Yale University, U.S. Naval Academy, University of Chicago, Harvard University, Massachusetts Inst. of Tech., Stanford University, Dartmouth College, Emory University, Columbia University, Northwestern University, University of Rochester, Duke University, University of Pennsylvania.]
Figure 8 — This illustrates average salary vs. instructional expenditure. There is a general correlation, with many
outliers. The highlighted outliers have a higher instructional expenditure vs. salary indicating that these universities
probably have more funding that does not necessarily go to the professors. The US Naval Academy is a “public” outlier
among the private schools. This is not surprising since the Naval Academy likely receives a lot of federal funding.
[Figure 9 graphic. Title: "Out-of-state Tuition vs. Instruction Expenditure per Out-of-State Tuition Dollar"; x-axis: Out-of-state tuition (0 to 25000); y-axis: instructional expenditure per out-of-state tuition dollar (1 to 5). Labeled points: Gallaudet University, Grambling State University, Univ. of Hawaii at Hilo, California Inst. of Tech., Univ. Alabama at Birmingham, Brigham Young University, Wake Forest University, Antioch University, Boise State University, Univ.of Calif.-Los Angeles, University of South Alabama, Eastern Oregon State College, Adams State College, Univ.Alaska-Fairbanks, Lincoln University, Univ.of Calif.-San Diego, Johns Hopkins University, Univ. of Hawaii at Manoa, Bellevue College, William Paterson College, Washington University, Rice University, Creighton University, University of Washington, Harvard University, University of Chicago, Stanford University, Yale University, Northwestern University, Dartmouth College, Duke University, Massachusetts Inst. of Tech., Columbia University, Univ.of Calif.-Davis, University of Pennsylvania, University of Rochester, Carnegie-Mellon University.]
Figure 9 — This graph highlights the universities where students get the “biggest bang for their buck”. In other words
how much instructional expenditure they get per tuition dollar. This puts expensive and inexpensive universities on an
equal footing, as their instructional spending is normalized by their out-of-state tuition.
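The normalization behind this graph is a simple ratio: instructional expenditure per student divided by out-of-state tuition. A minimal sketch of that calculation and the resulting ranking is given here; the field names and values are hypothetical, and skipping rows with zero or missing tuition is an assumption made for the example.

```python
# Toy rows; names and dollar amounts are invented for illustration.
rows = [
    {"name": "Institution A", "out_of_state_tuition": 18000,
     "instr_exp_per_student": 45000},
    {"name": "Institution B", "out_of_state_tuition": 15000,
     "instr_exp_per_student": 9000},
    {"name": "Institution C", "out_of_state_tuition": 6000,
     "instr_exp_per_student": 7500},
    {"name": "Institution D", "out_of_state_tuition": 0,
     "instr_exp_per_student": 5000},
]


def expenditure_per_tuition_dollar(rows):
    """Rank institutions by instructional spending per tuition dollar."""
    ranked = []
    for row in rows:
        tuition = row["out_of_state_tuition"]
        if not tuition or tuition <= 0:
            continue  # ratio is undefined without a positive tuition
        ratio = row["instr_exp_per_student"] / tuition
        ranked.append((row["name"], ratio))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


ranking = expenditure_per_tuition_dollar(rows)
```

A ratio above 1 means the institution spends more on instruction per student than it charges in out-of-state tuition.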
[Figure 10 graphic. Title: "Regions vs. Instructional expenditure per Out-of-State Tuition Dollar"; x-axis: Region Name; y-axis: instructional expenditure per out-of-state tuition dollar (0 to 5). Labeled points: Gallaudet University, Grambling State University, Univ. of Hawaii at Hilo, California Inst. of Tech., Univ. Alabama at Birmingham, Brigham Young University, Wake Forest University, Antioch University, Boise State University, Texas Woman's University, University of South Alabama, Washington University, Adams State College, Jersey City State College, Lincoln University, Univ. of Illinois-Chicago, Howard University, Bellevue College, SUNY at Stony Brook, William Paterson College, Kean College of New Jersey, University of Chicago, Harvard University, Creighton University, Yale University, Massachusetts Inst. of Tech., Dartmouth College.]
Figure 10 — This shows which universities, by region, have the “biggest bang for their buck”.