Welcome to STAT203 – Statistics for Social Sciences

Today’s agenda:
- Introduction
- Policies
- How to win at statistics
- Ch. 2 start: Nominal, Ordinal, and Interval data
- Video: Joy of Stats - Florence Nightingale

My assumption is that at the beginning of the semester you are…
- Fresh from the break, but probably not super enthusiastic about class.
- Possibly apprehensive about doing a quantitative class away from your major.
- Mildly interested in statistics, but not as much as in your own field.

My hope is that at the end of the semester you are…
- Less intimidated by stats than at the beginning of the semester.
- Able to handle the most common kinds of statistical problems, and know what kinds of questions to ask of a specialist when something more complex comes up.
- 3 credits wiser.

But what are YOUR hopes?
- I’ve sent out a link to an online survey asking what you’re hoping to get out of this course. Your answers will largely determine what problems we cover in class and assignments.

Why SPSS?
- Stands for Statistical Package for Social Sciences. How can you argue with a name like that?
- Updated often, but usage changes very little.
- Has a certification system and tech support by IBM.

A note on academic dishonesty.
- This course is weighted more towards assignments than similar courses in the past. Considering that, plagiarism on assignments is going to be taken more seriously than usual. Working together on assignments is good BUT make sure it’s obvious that each of you has worked through the material independently enough that you could go through the steps on your own.
- To keep things honest, I expect people working together to indicate so on their assignments and that each person hand in an assignment. Blatant copying of each other’s work or the work of students from previous courses will be considered cheating and a personal insult, regardless of credit given.

One more note.
- This is your class, not mine. You own it and I’m just the lecturer. The Stats + ActSci. Department and your own departments dictate the skeleton of what material needs to be covered, but the details and method are at our discretion. If you have any suggestions, comments, or requests for the course, please e-mail me and I’ll do what I can to accommodate them within the bounds of the syllabus.
- jackd@sfu.ca
- www.sfu.ca/~jackd

Grading Scheme
- 4-6 assignments, worth 25% in total.
- Midterm 1 is worth 15%.
- Midterm 2 is worth 20%.
- Final Exam is worth 40%.

Grading Philosophy
- This course should take about 100 hours, including studying, lectures, and assignments, so an hour of work corresponds to roughly 1% of the course grade. If you’re spending an hour on something worth much less than 1% of the course grade, you’re probably wasting your time.
- The first midterm is worth slightly less to adjust for people getting used to the grading scheme.
- If you do MUCH worse on one midterm than on the other midterm and the final, the botched midterm will be ignored and its weight put towards the final. I reserve the right to define “MUCH” how I like; I’m hoping to prevent sick people from being forced to come to class and also to prevent people from feigning illness to avoid doing a midterm unprepared.

How to win at stats
- You get better at stats by doing stats. You can do lots of reading, but you won’t know your comprehension level until you try to tackle some questions.
- Know your learning style (Tactile, Visual, Auditory) and play to your strengths.
- Try to explain the material to someone else.
- Work standing up whenever possible. (I apologize for the sit-in-class paradigm.)
- There is a statistics workshop in K9501, Shrum Science Center K, on the 9000 (main) level. It is on the way towards Pizza Point / Club Ilia / Cornerstone Mews from here.
- The workshop has SPSS-ready computers and on-site tutors who can help you Mon-Fri. Use the workshop early, and use it often.
- Stay ahead. Read the material before class so that you’re seeing it a second time here. This saves more time than you think it does. (It also gives you a buffer for papers in other classes.)
- Don’t fall behind. In previous years, I’ve had people ask for help saying they need 80-100% on the final to pass the course. So far, all of them have failed.

About the textbook
- Bad news: You’ll need the textbook.
- Good news: Either the new (11th) or the old (10th) edition will do.
- Some assignment questions will be based on the textbook problems, but my webpage will have the assigned questions.
- I’m basing the material on the 11th edition, so which topic is in which chapter may not line up perfectly if you have an old edition.
- There will be other readings (parts from Freakonomics, The Numerati, and Outliers; clips from The Joy of Stats), but they will be available free online via links provided or books.google.ca

Start of Chapter 2 - Nominal Data
- Nominal means _______, as in the name is the most important part.
- Example: Sex – Male, Female, Other.
- Example: Favourite Ice Cream – Chocolate, Vanilla, Pistachio, Toenail, Anthrax, RumRaisin.

Nominal data can be expressed as a ___________ because we’re most interested in the ___________ frequency of each response (i.e. the relative size of each group).

[Chart: Gender - Man 48%, Woman 48%, Other 4%]
[Chart: Favourite Ice Cream - Chocolate 42%, Vanilla 25%, Pistachio 15%, Toenail 9%, Anthrax 6%, RumRaisin 3%]

Word Cloud (for interest)
- Recently, more creative graphs like word clouds have been used to show frequencies in many categories at once. (Thanks to http://www.tocloud.com/ )
- Next: A cloud of the word frequencies of http://en.wikipedia.org/wiki/British_Columbia_history
- The larger a word is, the more often it appears. This graph is dominated by
  o British (used 116 times),
  o Columbia (97 times), and
  o the phrase “British Columbia” (86 times), in red.
- We can see subtler patterns by ignoring “British” and “Columbia”.

Ordinal Data
- Means ___________, because the order of the data is the most important.
- Example: Opinion - Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
- Example: How much did you drink over the break? – None at all, A little, A moderate amount, Enough to drop a grizzly bear.
- Both ordinal and nominal data can be expressed as ___________, but for ordinal data, the ___________ of the ___________ is implied in the placement of the bars.

Interval Data
- Like ordinal data, but the different categories are ______________________.
- Example: Grades as percent. The 83% category could include anything in the interval from ___________ to ___________ or from ___________ to ___________, depending on grades.
- Example: Number of bearded dragons owned (0, 1, 2, 3, 4, …). The numbers are ___________, meaning separated, but the difference between each category is still ___________.
- Interval data will be our first focus, because many classic summary statistics can be computed on them, like the…
  o ___________,
  o ___________,
  o ______________________,
  o ______________________, and
  o ___________.

Histogram
- Unlike a bar chart, a histogram is drawn with ___________ between the bars.
- The ___________ emphasizes the ___________ categories that cover all the values in a ___________.

Blurred line between ordinal and interval
- If the distance between categories is ___________ or makes numerical sense, then ordinal data can be treated like interval data.
- Example (either Ordinal OR Interval) Distance: 0-200km, 200-400km, 400-600km, 600-800km.
- Example (Ordinal but NOT Interval) Distance: 0-20km, 20-50km, 50-200km, more than 200km.

Visualization is one of the more active topics within Statistics.
- Florence Nightingale [Joy of Stats 23:40 – 27:00]

Next lecture
- Modes
- Symmetry and Skew
- Mean and Median
- Which is best?
- Video: Joy of Stats - The mean
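
Optional aside (Python, not SPSS): the two distance examples from the “Blurred line between ordinal and interval” slide can be checked mechanically. This is only a minimal sketch, and it rests on two assumptions: that "treatable as interval" means every category has the same width, and that an open-ended category such as “more than 200km” has no defined width. The function name can_treat_as_interval and the (lower, upper) tuple layout are made up for illustration; they are not part of the course materials or of SPSS.

```python
from typing import Optional, Sequence, Tuple

# A category is (lower bound, upper bound) in km.
# upper = None marks an open-ended category such as "more than 200km".
Category = Tuple[float, Optional[float]]


def can_treat_as_interval(categories: Sequence[Category]) -> bool:
    """Return True only if every category is bounded and all widths are equal.

    Assumption for this sketch: equal widths are what makes ordered
    categories behave like interval data.
    """
    widths = []
    for lower, upper in categories:
        if upper is None:
            # An open-ended category has no width, so the spacing is undefined.
            return False
        widths.append(upper - lower)
    # Exactly one distinct width means the categories are evenly spaced.
    return len(set(widths)) == 1


# The two examples from the slide.
equal_bins = [(0, 200), (200, 400), (400, 600), (600, 800)]
unequal_bins = [(0, 20), (20, 50), (50, 200), (200, None)]

print(can_treat_as_interval(equal_bins))    # True
print(can_treat_as_interval(unequal_bins))  # False
```

The second example fails for two separate reasons: the bounded categories have different widths (20, 30, and 150 km), and the last category is open-ended. The second half of the rule on the slide ("or makes numerical sense") is a judgment call that this check does not cover.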