Welcome to STAT203 – Statistics for Social Sciences
Today’s agenda:
- Introduction
- Policies
- How to win at statistics
- Ch. 2 start: Nominal, Ordinal, and Interval data
- Video: Joy of Stats - Florence Nightingale
My assumption is that at the beginning of the semester you are…
- Fresh from the break, but probably not super enthusiastic
about class.
- Possibly apprehensive about doing a quantitative class
away from your major.
- Mildly interested in statistics, but not as much as in your own
field.
My hope is that at the end of the semester you are…
- Less intimidated by stats than at the beginning of the
semester.
- Able to handle the most common kinds of statistical
problems, and know what kinds of questions to ask of a
specialist when something more complex comes up.
- 3 credits wiser.
But what are YOUR hopes?
- I’ve sent out a link to an online survey asking what you’re
hoping to get out of this course. Your answers will largely
determine what problems we cover in class and
assignments.
Why SPSS?
- Stands for Statistical Package for the Social Sciences. How can you
argue with a name like that?
- Updated often, but usage changes very little.
- Has a certification system and tech support by IBM.
A note on academic dishonesty.
- This course is weighted more towards assignments than similar
courses in the past. Considering that, plagiarism on assignments is
going to be taken more seriously than usual. Working together on
assignments is good BUT make sure it’s obvious that each of you has
worked through the material independently enough that you could go
through the steps on your own.
- To keep things honest, I expect people working together to say
so on their assignments, and each person must hand in their own assignment.
Blatant copying of each other’s work or the work of students from
previous courses will be considered cheating and a personal insult,
regardless of credit given.
One more note.
- This is your class, not mine. You own it and I’m just the
lecturer. The Stats + ActSci. Department and your own
departments dictate the skeleton of what material needs to
be covered, but the details and method are at our
discretion. If you have any suggestions, comments, or
requests for the course, please e-mail me and I’ll do what I
can to accommodate them within the bounds of the syllabus.
jackd@sfu.ca
www.sfu.ca/~jackd
Grading Scheme
- 4-6 assignments, worth 25% in total.
- Midterm 1 is worth 15%.
- Midterm 2 is worth 20%.
- Final exam is worth 40%.
Grading Philosophy
- This course should take about 100 hours, including
studying, lectures, and assignments, which works out to roughly
one hour per 1% of the course grade. If you’re doing
something for much less than 1% of a course grade, you’re
probably wasting your time.
- The first midterm is worth slightly less to adjust for people
getting used to the grading scheme.
- If you do MUCH worse on one midterm than the other and
the final, the botched midterm will be ignored and the
weight put towards the final. I reserve the right to define
“MUCH” how I like; I’m hoping to prevent sick people from
being forced to come to class and also to prevent people
from feigning illness to avoid doing a midterm unprepared.
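As a rough sketch (not an official grade calculator), the weighted course grade follows directly from the grading scheme, and the midterm rule amounts to moving one midterm's weight onto the final. The function name `course_grade`, the dictionary keys, and the example scores are hypothetical. Because what counts as “MUCH” worse is the instructor's call, the sketch takes the dropped midterm as an input rather than guessing a threshold.

```python
# Weights from the grading scheme; the dictionary keys are hypothetical labels.
WEIGHTS = {"assignments": 0.25, "midterm1": 0.15, "midterm2": 0.20, "final": 0.40}


def course_grade(scores, drop=None):
    """Return the weighted course grade (0-100).

    scores: percent score for each component, keyed like WEIGHTS.
    drop:   optionally "midterm1" or "midterm2"; that midterm is ignored and
            its weight is moved onto the final exam.
    """
    weights = dict(WEIGHTS)  # copy so the module-level weights stay unchanged
    if drop is not None:
        if drop not in ("midterm1", "midterm2"):
            raise ValueError("only a midterm can be dropped")
        weights["final"] += weights.pop(drop)
    return sum(w * scores[part] for part, w in weights.items())


# Hypothetical scores, for illustration only.
scores = {"assignments": 80, "midterm1": 70, "midterm2": 75, "final": 85}
print(round(course_grade(scores), 2))                   # 79.5
print(round(course_grade(scores, drop="midterm1"), 2))  # 81.75
```

Note that dropping a midterm only helps if the final exam score is higher than the dropped midterm score.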
How to win at stats
- You get better at stats by doing stats. You can do lots of
reading but you won’t know your comprehension level until
you try to tackle some questions.
- Know your learning style (Tactile, Visual, Auditory) and play
to your strengths.
- Try to explain the material to someone else.
- Work standing up whenever possible (I apologize for the sit-in-class paradigm)
- There is a statistics workshop in K9501, Shrum Science
Center K, on the 9000 (main) level. It’s on the way towards
Pizza Point / Club Ilia / Cornerstone Mews from here.
- The workshop has SPSS-ready computers and on-site tutors
who can help you Monday to Friday. Use the workshop early, and use
it often.
- Stay ahead. Read the material before class so that you’re
seeing it a second time here. This saves more time than you
think it does. (Also gives you a buffer for papers in other
classes).
- Don’t fall behind. In previous years, I’ve had people ask for
help saying they need 80-100% on the final to pass the
course. So far, all of them have failed.
About the textbook
- Bad news: You’ll need the textbook.
- Good news: Either the new (11th) or the old (10th) edition
will do.
- Some assignment questions will be based on the
textbook problems, but my webpage will list the assigned
questions.
- I’m basing the material on the 11th edition, so which topic is
in which chapter may not line up perfectly if you have the
old edition.
- There will be other readings (parts of Freakonomics, The
Numerati, and Outliers, and clips from The Joy of Stats), but
they will be available free online via the links provided or through
books.google.ca.
Start of Chapter 2 - Nominal Data
- Nominal means ‘name’, as in the name is the
most important part.
- Example: Sex – Male, Female, Other.
- Example: Favourite Ice Cream – Chocolate,
Vanilla, Pistachio, Toenail, Anthrax, RumRaisin.
Nominal data can be expressed as a pie chart because we’re
most interested in the relative frequency of each response (i.e.,
the relative size of each group).
[Figure: two pie charts. Gender: Man 48%, Woman 48%, Other 4%. Favourite Ice Cream: Chocolate 42%, Vanilla 25%, Pistachio 15%, Toenail 9%, Anthrax 6%, RumRaisin 3%.]
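To make the link between relative frequency and a pie chart concrete, here is a minimal sketch: each slice's angle is its relative frequency times 360 degrees. The helper name `pie_slices` and the short list of raw responses are hypothetical; the ice cream numbers are the percentages from the Favourite Ice Cream chart, treated as counts out of 100.

```python
from collections import Counter


def pie_slices(counts):
    """Map each category to (relative frequency, pie-slice angle in degrees)."""
    total = sum(counts.values())
    return {cat: (n / total, 360 * n / total) for cat, n in counts.items()}


# Percentages read off the Favourite Ice Cream chart, treated as counts out of 100.
ice_cream = {"Chocolate": 42, "Vanilla": 25, "Pistachio": 15,
             "Toenail": 9, "Anthrax": 6, "RumRaisin": 3}

for cat, (share, degrees) in pie_slices(ice_cream).items():
    print(f"{cat:<10} {share:6.1%} {degrees:6.1f} degrees")

# Raw nominal responses (hypothetical) can be tallied first with Counter.
responses = ["Chocolate", "Vanilla", "Chocolate", "Pistachio", "Chocolate"]
tally = Counter(responses)
print(pie_slices(tally)["Chocolate"])  # (0.6, 216.0)
```

The order of the categories does not matter here, which is exactly what makes the data nominal.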
Word Cloud (for interest)
- Recently, more creative graphs such as word clouds have been used to
show the frequencies of many categories at once (thanks to
http://www.tocloud.com/).
- Next: A cloud of the word frequencies of
http://en.wikipedia.org/wiki/British_Columbia_history
- The larger a word is, the more often it appears. This graph
is dominated by
o British (used 116 times),
o Columbia (97 times), and
o the phrase “British Columbia” (86 times) in red.
- We can see subtler patterns by ignoring “British” and
“Columbia”.
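A word cloud typically sizes each word by a frequency count like the ones quoted above. The sketch below counts words case-insensitively and can skip dominant words, mirroring the idea of ignoring “British” and “Columbia”. It is an assumption of this sketch that tocloud.com counts words in a similar way; the `word_frequencies` helper and the sample text are hypothetical (the sample is a made-up snippet, not the Wikipedia article).

```python
import re
from collections import Counter


def word_frequencies(text, ignore=()):
    """Count each word (case-insensitive), skipping any word listed in `ignore`."""
    words = re.findall(r"[a-z']+", text.lower())
    skip = {w.lower() for w in ignore}
    return Counter(w for w in words if w not in skip)


# A made-up snippet standing in for the article text.
sample = ("British Columbia joined Canada in 1871. The colony of British Columbia "
          "grew with the gold rush, and the railway reached British Columbia later.")

print(word_frequencies(sample).most_common(3))
# Ignoring the dominant words (and "the") lets the less frequent words stand out.
print(word_frequencies(sample, ignore={"british", "columbia", "the"}).most_common(5))
```

Counting the two-word phrase “British Columbia” separately would require counting pairs of adjacent words instead of single words.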
Ordinal Data
- Ordinal means ‘order’, because the order of the categories is what
matters most.
- Example: Opinion - Strongly Agree, Agree, Neutral,
Disagree, Strongly Disagree
- Example: How much did you drink over the break? – None
at all, A little, A moderate amount, Enough to drop a grizzly
bear.
- Both ordinal and nominal data can be expressed as bar
charts, but for ordinal data, the order of the categories is
implied by the placement of the bars.
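Because the order carries the information in ordinal data, a tally for a bar chart should follow the scale's own order rather than alphabetical or most-frequent-first order. A minimal sketch, assuming a hypothetical `ordered_tally` helper and hypothetical responses, printing text bars in scale order:

```python
from collections import Counter

# Scale order comes from the question itself, not from the alphabet.
LIKERT = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]


def ordered_tally(responses, scale):
    """Return (category, count) pairs in scale order, keeping empty categories."""
    counts = Counter(responses)
    off_scale = set(counts) - set(scale)
    if off_scale:
        raise ValueError(f"responses not on the scale: {sorted(off_scale)}")
    return [(cat, counts[cat]) for cat in scale]


# Hypothetical responses to an opinion question.
responses = ["Agree", "Neutral", "Agree", "Strongly Disagree", "Agree", "Strongly Agree"]

for cat, n in ordered_tally(responses, LIKERT):
    print(f"{cat:<18} {'#' * n}")
```

Keeping empty categories (here, “Disagree”) preserves the spacing of the scale, so a reader still sees where the gap falls.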
Interval Data
- Like ordinal data, but the different categories are evenly
spaced.
- Example: Grades as percent. The 83% category could
include anything in the interval from 82.5% to 83.5% or
from 82.1% to 83% depending on how grades are rounded.
- Example: Number of bearded dragons owned (0, 1, 2, 3, 4,
…). The numbers are discrete, meaning separated, but the
difference between adjacent categories is always one dragon.
- Interval data will be our first focus, because many classic
summary statistics can be computed from them, like the…
o mean,
o median,
o standard deviation,
o interquartile range, and
o skewness.
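A minimal sketch of the summary statistics listed above, using Python's standard `statistics` module on a small hypothetical sample of bearded-dragon counts. Quartile and skewness conventions differ between packages (SPSS may use different ones), so the IQR and skewness here need not match other software exactly; the skewness shown is the simple moment coefficient, m3 / m2^1.5. The `summarize` helper is hypothetical.

```python
import statistics


def summarize(data):
    """Classic summary statistics for interval data (needs at least two distinct values)."""
    n = len(data)
    mean = statistics.mean(data)
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' quartile method
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return {
        "mean": mean,
        "median": statistics.median(data),
        "sd": statistics.stdev(data),             # sample SD, divides by n - 1
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,               # moment coefficient of skewness
    }


dragons = [0, 1, 1, 2, 3]  # hypothetical: bearded dragons owned by five people
for name, value in summarize(dragons).items():
    print(f"{name:<9} {value:.3f}")
```

The positive skewness reflects the longer tail toward owners with more dragons.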
Histogram
- Unlike a bar chart, a histogram is drawn with no gaps
between the bars.
- The lack of gaps emphasizes the evenly spaced categories
that cover all the values in a range.
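To show why a histogram has no gaps, here is a sketch that sorts values into contiguous, equal-width bins: each bin's upper edge is the next bin's lower edge, so every value in the range lands in exactly one bin. The `histogram` helper, the grades, and the bin settings are hypothetical.

```python
import math


def histogram(values, start, width, nbins):
    """Count values into contiguous bins [start + i*width, start + (i+1)*width).

    Values outside the overall range are ignored; a value exactly on the top
    edge goes into the last bin.
    """
    counts = [0] * nbins
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < nbins:
            counts[i] += 1
        elif v == start + nbins * width:
            counts[-1] += 1
    return counts


grades = [52, 61, 67, 70, 73, 74, 78, 81, 85, 88, 90, 100]  # hypothetical percent grades
counts = histogram(grades, start=50, width=10, nbins=5)
print(counts)  # [1, 2, 4, 3, 2]
for i, c in enumerate(counts):
    lo = 50 + i * 10
    print(f"{lo:3d}-{lo + 10:<3d} | {'#' * c}")
```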
Blurred line between ordinal and interval
- If the distance between categories is constant or makes
numerical sense then ordinal data can be treated like
interval data.
- Example (either Ordinal OR Interval) Distance: 0-200km,
200-400km, 400-600km, 600-800km.
- Example (Ordinal but NOT Interval) Distance: 0-20km, 20-50km, 50-200km, more than 200km.
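The test in this section (is the distance between categories constant?) can be sketched as a check on the category edges. Representing each set of categories by its edges, with `math.inf` standing in for the open-ended “more than 200km”, is an assumption of this sketch, and `equal_width` is a hypothetical helper.

```python
import math


def equal_width(edges, tol=1e-9):
    """True if consecutive category edges are evenly spaced (and all finite)."""
    widths = [b - a for a, b in zip(edges, edges[1:])]
    if not widths or any(math.isinf(w) for w in widths):
        return False  # an open-ended category has no fixed width
    return all(abs(w - widths[0]) <= tol for w in widths)


evenly = [0, 200, 400, 600, 800]      # 0-200km, 200-400km, 400-600km, 600-800km
uneven = [0, 20, 50, 200, math.inf]   # 0-20km, 20-50km, 50-200km, more than 200km
print(equal_width(evenly))  # True
print(equal_width(uneven))  # False
```

Equal widths are what allow the first set of categories to be treated as interval data; the second set fails both because the widths differ and because the last category is open-ended.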
Visualization is one of the more active topics within Statistics.
Florence Nightingale [Joy of Stats 23:40 – 27:00]
Next lecture
- Modes
- Symmetry and skew
- Mean and median: which is best?
- Video: Joy of Stats - The mean