A Baseball Statistics Course

A Baseball Statistics Class
Jim Albert
Department of Mathematics and Statistics
Bowling Green State University
albert@bgnet.bgsu.edu
Supported by the National Science Foundation
Outline






Describe the intro stats class at BGSU
Why focus a class on sports?
Examples of Data analysis
Examples of Probability
Examples of Inference
Address some questions
MATH 115 – Introduction to Statistics





Satisfies math elective for students in
College of Arts and Sciences
Required of students in the health college
Students have range of math skills
Goal of course is statistical literacy – how
does one draw conclusions from data
Book is at the level of Moore, The Basic
Practice of Statistics
Class is hard to teach




No one wants to take stats.
Easy to focus on number crunching rather
than concepts.
Students have little interest in the topics and
datasets discussed.
How to make the class more relevant to
everyday life?
Statistics can be made more interesting if
we capitalize on “good” datasets






Come in raw form
Are authentic
Are intrinsically interesting
Are topical or controversial
Offer substantial learning
Lend themselves to a variety of statistical analyses
Why base a stats course on baseball?





Great American game
Great historical tradition.
Statistics are an integral part of baseball, used
to rate players and teams.
Players are known by their statistics (60, 56,
1.12)
Relatively easy to model using probability.
MATH 115 b



Special section of MATH 115 with a baseball
emphasis
I’ve taught it several times, most recently this
summer.
Text: Albert, Teaching Statistics Using
Baseball, Mathematical Association of
America.
Getting started with data analysis

Looked at Bernie Williams’ baseball card.

Started with a question “Was Bernie a big
home run hitter?”

Used graphs to answer the question.
Great home run hitters





Watched part of Ken Burns' documentary
about Babe Ruth.
Explored Babe Ruth's slugging percentages.
Interesting to plot SLG against his AGE
(his career trajectory)
Notice a familiar pattern.
Interesting outlier (the bellyache heard
around the world)
Do all players show a similar
trajectory?

Look at Barry Bonds’
slugging percentages
over time.

Shows unusual pattern towards the end of
his career.
Baseball shapes

Counts of things, like
home run counts, tend
to be right-skewed.

Derived baseball stats
tend to be symmetric.
The Babe, Roger, and Barry



Watched part of the movie “61*”
Compared the home run rates of players in
1921, 1961, 2001
Which outlier
was most
notable?
The Second Best Baseball Player from
BGSU?




Orel Hershiser was the best.
Who was the 2nd best: Grant Jackson or
Roger McDowell? (Grant’s niece was in my
class.)
Compared their strikeout rates.
Jackson was the better strikeout pitcher.
Fitting lines to scatterplots

Used spaghetti to fit a line to (Home run,
Slugging Percentage) for Mike Piazza’s data
(note the Italian connection).

Talked about the best batting measure. Is
batting average or OBP better in predicting
runs scored per game?
Regression effect




Suppose your favorite team had a crummy
season last year.
I predict they will do better this season.
The regression effect.
Illustrate by looking at the number of wins of
teams for two consecutive seasons.
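The regression effect can also be seen in a small simulation. The sketch below is illustrative only: it assumes each team has a fixed true winning probability drawn from a normal distribution centered at .500 (the spread of 0.05 is an assumed value, not an estimate from real data), and that each game is an independent chance event. The function names are hypothetical.

```python
import random

def simulate_two_seasons(n_teams=30, n_games=162, seed=1):
    """Return a list of (season 1 wins, season 2 wins), one pair per team."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_teams):
        # Assumed model: true ability is a win probability near .500.
        ability = min(max(rng.gauss(0.5, 0.05), 0.0), 1.0)
        # Same ability in both seasons; only the luck differs.
        wins1 = sum(rng.random() < ability for _ in range(n_games))
        wins2 = sum(rng.random() < ability for _ in range(n_games))
        results.append((wins1, wins2))
    return results

def mean_change_of_worst(results, k):
    """Average change in wins for the k teams with the fewest season 1 wins."""
    worst = sorted(results)[:k]
    return sum(w2 - w1 for w1, w2 in worst) / k

seasons = simulate_two_seasons()
print(mean_change_of_worst(seasons, 5))
```

Under this model, the teams with the fewest wins in the first season were partly weak and partly unlucky, so on average they tend to win more games the next season even though their ability has not changed.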
Field of Dreams




Watched part of the movie.
Looked at the statistics of Shoeless Joe
Jackson and Moonlight Graham.
Who was better: Ty Cobb or Shoeless
Jackson?
Can you predict Jackson’s triple count for a
season if you know his double count?
Introducing probability




Played a simple dice game Big League
Baseball.
A single die controls the pitch (ball or strike).
Two dice control the “in play” outcome.
Simple enough you can talk about
probabilities of various events (like a hit).
All-Star Baseball



Spinner game where each spinner controls
the hitting outcome for a single player.
Students had a project where they
constructed a spinner for a player given his
career hitting statistics.
Played a spinner game in class.
Spinner for Mike Schmidt (one of my
favorite players)
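One way to build such a spinner is to give each outcome a wedge proportional to how often it occurred. The sketch below assumes that approach; the function name is hypothetical, and the counts are made-up round numbers for illustration, not any real player's record.

```python
def spinner_angles(counts):
    """Convert outcome counts into spinner wedge sizes in degrees."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Each wedge gets a share of the 360 degrees equal to its share of outcomes.
    return {outcome: 360.0 * n / total for outcome, n in counts.items()}

# Hypothetical career totals, for illustration only.
example_counts = {
    "1B": 100, "2B": 30, "3B": 5, "HR": 25,
    "BB": 40, "SO": 100, "OUT": 300,
}
angles = spinner_angles(example_counts)
for outcome, degrees in angles.items():
    print(f"{outcome}: {degrees:.1f} degrees")
```

Replacing the example counts with a player's actual career totals gives the wedge sizes for that player's spinner.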
The spinner game motivates inference


There is a distinction between a player's
ability and his performance. An ability is an
intrinsic quality of a player, say his batting
talent, that we really don't know exactly. We
do observe a player's performance, say his
batting average for a particular season.
The objective of Statistics is to learn about a
player's ability on the basis of his
performance.
Suppose a player’s true on-base
percentage is .400

Use a 10-sided die to simulate the
performance of a player in 20 plate
appearances.

Big distinction between his ability and his on-base performance in these games.
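The die experiment can be repeated quickly on a computer. The sketch below assumes that faces 1 through 4 of the 10-sided die count as reaching base (any four faces would do); the function name and the seed are arbitrary choices for illustration.

```python
import random

def simulate_plate_appearances(n_pa=20, on_base_faces=4, sides=10, rng=None):
    """Roll a die once per plate appearance; return the observed on-base rate."""
    if rng is None:
        rng = random.Random()
    rolls = [rng.randint(1, sides) for _ in range(n_pa)]
    # Assumption: faces 1..on_base_faces mean the player reached base.
    times_on_base = sum(roll <= on_base_faces for roll in rolls)
    return times_on_base / n_pa

rng = random.Random(2024)
# Ten separate stretches of 20 plate appearances, all with the same true ability.
observed = [simulate_plate_appearances(rng=rng) for _ in range(10)]
print(observed)
```

Each value printed is an observed on-base rate over 20 plate appearances. The ability is fixed at .400 throughout, so any spread in the values is chance variation alone.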
Do we observe chance variation in
baseball?

Watched part of “Angels in the Outfield”.

Went to a Toledo Mud Hens game. Students
were asked to look for lucky things that
happened in the game (such as a groundball
that found the right location for a hit)
Concluded with a discussion of some
interesting issues in inference

Are baseball players really streaky?

Are situational statistics in baseball
meaningful?
(this is how players perform in different
situations like Home/Away, in different
months, against different pitchers, etc.)
Arguments against teaching this type
of course
I’ll describe five objections
“All students aren’t interested in
baseball”

At BGSU, easy to fill one section with
students who like baseball

Don’t need to be a baseball fan, just willing to
learn some baseball and statistics.
“Baseball (game) and statistics
(serious science) don’t mix”

Baseball is a serious business for players,
managers and owners.

Need a proper interpretation of statistics to
be a successful baseball team.

Controversy about the use of statistics –
similar to the mistrust of statistics in the
public arena.
“The course appeals mainly to one
gender”

Course does tend to attract more men.

But the course only requires a willingness to
learn.
“I don’t know any baseball, but my brothers
played sports, and I was willing to learn.”
“Students won’t be able to think
statistically in other settings”

Use baseball as the medium where students
learn statistical concepts, such as learning
about an ability (a parameter).

Once the concept is learned, it is relatively
easy to expose students to other examples
outside baseball.
“Course doesn’t cover all topics in a
first statistics course”

Only topic that didn’t receive much attention
was collecting data through sample surveys
and designed experiments.

But could include these topics within context
of baseball.
Was the course successful?




Fun for both instructor and the students.
Enthusiasm of the instructor about the
material had a positive impact on learning.
Baseball is a great context for learning many
statistics concepts.
Students could make sense of the statistical
conclusions.
Moral of this experiment

Should explore alternative methods of
teaching statistics.

In particular, explore ways of engaging
students through interesting applications so
they can make more sense of statistical
thinking.
Some references
“A Baseball Statistics Class”, Journal of
Statistics Education
http://www.amstat.org/publications/jse/v10n2/albert.html
I created a blog of my recent class.
http://bstats.blogspot.com/
See my website http://bayes.bgsu.edu for
more information about the book.
