SOC 210: GRADUATE STATISTICS I

_____________________________________________________________________________
Lecture:
Section Number: 14618
Room: SSM 100
Day: Monday & Wednesday
Time: 1:30-2:45 PM
______________________________________________________________________________
Instructor: Kyle Dodson
Office: COB 337
Office Hours: Wednesday, 3:00-4:30 PM (or by appointment)
E-Mail: kdodson2@ucmerced.edu
Course Description
This is the first semester of the two-course sequence in social statistics required of graduate
students in Sociology. This course takes a systematic approach to the general linear model for
continuous dependent variables; the second-semester course covers nonlinear regression models
for categorical and limited dependent variables. In addition to laying the theoretical foundations
for future social science research, this course introduces students to computerized
statistical analysis using the software program Stata. Students are encouraged to think creatively
about how to use statistical methods in their own research.
Students meet twice each week for a 75-minute lecture on statistical fundamentals, theory,
applications, and topics. There are no mathematics prerequisites. Students are not expected to
have a background in calculus, but facility with algebra and exposure to the rudiments of
statistical distribution theory and hypothesis testing are expected.
The course is organized into four sections. The first section of the course covers the fundamental
mathematical and statistical concepts that are the building blocks for regression analysis. The
purpose of this section is both to refresh your memory and to provide a deeper, more formal
presentation of familiar concepts. The second section focuses on the assumptions and mechanics
of the classical linear regression model. At the end of the second section you will have a good
mechanical knowledge of regression analysis. The third section includes a practical exposition of
the general linear model as we begin to relax the assumptions of the classical linear regression
model. At the end of the third section you will have a deeper theoretical and applied
understanding of the flexibility and limitations of the general linear regression model for social
science data. The final section presents an overview of topics in estimation for common
problems in social science research, including an introduction to structural equation models. The
purpose of this brief section is to give you some exposure to more complex models for
continuous dependent variables rather than to ask you to develop sophistication with these
techniques.
Details
The SOC 210 course notes are required reading and the primary source for lectures. The course notes
are posted on the course website. These notes were authored by Professor Patricia McManus at Indiana
University. They may not be cited, reproduced, or distributed without the express
written consent of Professor McManus.
There is no required textbook. An accessible optional text is Jeffrey Wooldridge’s
Introductory Econometrics, 4th edition. This textbook expands on topics covered in the course
notes, covers statistical theory in more detail, and presents material that is beyond the scope of a
one-semester course or is covered only briefly in class. Students who want a more introductory
text may consult Paul Allison’s Multiple Regression: A Primer. This text covers background
material typically taught in an undergraduate course in social statistics and introduces many of
the course topics. Students who want a more advanced, graduate-level econometrics reference
may choose Damodar N. Gujarati’s Basic Econometrics (introductory) or William H. Greene’s
Econometric Analysis (intermediate).
We use Stata version 13 in this class. Academic Technology Services at UCLA maintains an
unparalleled set of online statistical resources, including an excellent site for learning and using
Stata. Students who want more in-depth coverage of Stata can find copies of the Stata manuals
on the computers in the graduate student lab.
Primary texts:
S554 Lecture Notes. Available on the course website.
Recommended texts:
Allison, Paul D. 1998. Multiple Regression: A Primer. Pine Forge Press.
Greene, William H. 2002. Econometric Analysis, 5th Edition. Prentice-Hall.
Gujarati, Damodar N. 2003. Basic Econometrics, 4th Edition. New York: McGraw-Hill.
Wooldridge, Jeffrey. 2009. Introductory Econometrics: A Modern Approach, 4th Edition.
  Mason, Ohio: South-Western College Publishing Co.
Software:
All the necessary software is available in the graduate student lab.
Class assignments will require the use of Stata Version 13, a statistical software package.
Stata includes a text editor for writing and editing batch files containing multiple Stata commands.
These batch files, called “do-files,” can also be edited with any plain-text editor, such as
Notepad or TextPad. Stata’s text editor has become increasingly sophisticated, but my personal
preference is to use TextPad (http://www.textpad.com). A syntax file that enables Stata syntax
highlighting in TextPad can be downloaded from Scott Long’s website
(http://www.indiana.edu/~jslsoc/stata/wf/misc/textpad_syntax_file/stata.syn). Students who wish
to buy Stata for home use can inquire about student discounts through UC Merced IT.
Students will also need a flexible word-processing package such as Microsoft Word to complete
assignments. The word processor and printer must be able to (a) use a fixed-width font
(such as Courier) for tables and (b) use a Greek or mathematical font for common symbols
such as β and σ. Microsoft Word also offers useful matrix and graphics capabilities.
Supplies:
- Your network folder (“U Drive” or “L Drive”) will prove useful for storing data and
  programs.
- You will likely need portable storage. Your best bet is a USB-compatible memory key.
- You’ll need three large Campus Mail envelopes for handing in assignments. Get them
  from the SSHA office or mailroom, and write your name on the outside, along with your
  campus address if you are not in the Sociology department. I use these envelopes to
  return assignments.
- Lots of paper and toner.
Requirements and Grading:
Students are required to complete seven homework assignments during the semester.
Each assignment is worth between 50 and 175 points, for a total of 1000 points over the
semester. The final grade for the course is determined by the sum of the individual assignment
grades. Grading for this course is as follows:
Points       Grade
981-1000     A+
931-980      A
901-930      A-
881-900      B+
831-880      B
801-830      B-
781-800      C+
731-780      C
701-730      C-
601-700      D
0-599        F
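The scale can be read as a list of lower cutoffs checked from the top down. The Python sketch below is a hypothetical illustration, not an official grading tool; the names are made up, and it assumes the lowest score in each range acts as the cutoff, so a total of 600 (which falls between the listed D and F ranges) or a fractional total such as 980.5 receives the lower grade.

```python
# Hypothetical helper: maps a course total (out of 1000) to a letter grade.
# Each cutoff is the lowest score in a range of the syllabus scale.
# Assumption: totals that fall between ranges (e.g., 600 or 980.5) get the lower grade.
GRADE_CUTOFFS = [
    (981, "A+"), (931, "A"), (901, "A-"),
    (881, "B+"), (831, "B"), (801, "B-"),
    (781, "C+"), (731, "C"), (701, "C-"),
    (601, "D"),
]


def letter_grade(total_points: float) -> str:
    """Return the letter grade for a course total out of 1000 points."""
    for cutoff, letter in GRADE_CUTOFFS:  # ordered from highest to lowest
        if total_points >= cutoff:
            return letter
    return "F"  # anything below 601


print(letter_grade(905))  # → A-
```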
Assignments
Each of the assignments includes data analysis exercises using Stata and the course data extracts
provided. The earlier assignments will be very structured, but towards the end of the semester
you will be asked to choose or construct your own variables for analysis. You can use the course
data extracts for these assignments, or you may discuss alternative data sources with me in
advance. Note that you should design these analyses so that the results reveal interesting
relationships. If you choose variables that are not significant and have small effects, your
grade will necessarily be lower.
Deadlines and Late Penalties
It is critical that you keep up with assignments. Assignments should be handed to the professor
or placed in the professor’s box in the Classroom and Office Building on the due date. Be sure to
confirm with me if you need to make alternative arrangements or if you are handing in a late
assignment – don’t expect the box to be checked regularly. Assignments due in class are due at
the start of class on the due date. Late assignments will be penalized by 5 points if they are
received within 24 hours of the time due, 10 points if they are received within 48 hours of the
time due, 25 points if they are received within 72 hours, 50 points if they are received within 96
hours, and 75 points if received after 96 hours. In no case will assignments be accepted on or
after the sixth calendar day after the due date.
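The late penalty is a step function of how long after the due time the work arrives. A minimal Python sketch follows; the function and constant names are hypothetical, and it assumes “within N hours” means at most N hours after the due time and that the sixth-calendar-day cutoff is checked by date alone, regardless of the hour.

```python
from datetime import datetime, timedelta
from typing import Optional

# Penalty tiers from the syllabus: (received within this many hours, points deducted)
PENALTY_TIERS = [(24, 5), (48, 10), (72, 25), (96, 50)]
PENALTY_AFTER_96_HOURS = 75


def late_penalty(due: datetime, received: datetime) -> Optional[int]:
    """Points deducted for a late assignment, or None if it can no longer be accepted."""
    if received <= due:
        return 0  # on time
    # Assumption: not accepted on or after the sixth calendar day after the due date.
    if received.date() >= due.date() + timedelta(days=6):
        return None
    hours_late = (received - due).total_seconds() / 3600
    for max_hours, points in PENALTY_TIERS:
        if hours_late <= max_hours:
            return points
    return PENALTY_AFTER_96_HOURS


# Hypothetical example: an assignment due at the start of a 1:30 PM class.
due = datetime(2015, 2, 2, 13, 30)
print(late_penalty(due, due + timedelta(hours=30)))   # → 10
print(late_penalty(due, datetime(2015, 2, 8, 9, 0)))  # → None
```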
Working Together
Students are encouraged to discuss homework assignments and data preparation with each other.
In particular, when cleaning data and constructing new variables for the early assignments it is a
good idea to compare your data with one or two other students before beginning your write-up.
The final product, however, must reflect your own work. On computer assignments that require
that you choose variables for analysis, everyone is expected to use different variables. If you are
aware that someone else is using the same variables that you are using, one or both of you need
to change variables.
Computer Problems
If you are having problems analyzing your data, be sure to bring a hardcopy listing of the
command file and the output, along with a disk with the command file and the output file. It is
impossible to diagnose error messages without these. If you send a question electronically,
include the Stata log file.
Revisions of Assignments
You may submit a revision of any assignment scoring below 90% of that assignment’s total points.
Sometimes I note on the assignment that a revision should be done, but you can always request a
revision. If you submit a revised assignment, your grade for that assignment will be an average of
the original and the revised work, less any late penalties imposed on the original submission.
Revisions must follow this format:
- Include a brief memo summarizing the problems you have addressed.
- Resubmit the original, graded, marked-up copy of the assignment in its entirety along with
  the revised assignment.
- Include a clean copy of the entire assignment, including questions that have not been
  changed from the original.
- Use highlighting and/or marginal notes on the revised assignment to indicate all sections
  that have been changed.
- If the revision includes any changes in the data analysis, submit a new copy of the entire
  data log.
- Give the entire package to me directly (not to the lab instructor) within three
  weeks of the date the original was returned.
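The revision rule combines an average with the late penalty from the original submission, and the wording leaves the order of operations open. The Python sketch below is one hypothetical reading, not an official formula: it assumes both scores are raw (pre-penalty) scores and subtracts the original's late penalty once from their average.

```python
def revised_assignment_score(original: float, revised: float, max_points: float,
                             original_late_penalty: float = 0.0) -> float:
    """Hypothetical helper: score for a revised assignment.

    Assumes `original` and `revised` are raw scores before any late penalty,
    and that the original's late penalty is subtracted once from their average.
    """
    # Revisions are allowed only for scores below 90% of the assignment's points.
    # Comparing original * 10 with max_points * 9 avoids rounding in 0.9 * max_points.
    if original * 10 >= max_points * 9:
        raise ValueError("revision not allowed: original score is at least 90%")
    return (original + revised) / 2 - original_late_penalty


# Hypothetical example: a 150-point assignment first handed in 30 hours late (10-point penalty).
print(revised_assignment_score(110, 140, 150, original_late_penalty=10))  # → 115.0
```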
Disability Statement
I am committed to providing assistance to help you be successful in this course. Reasonable
accommodations are available for students with a documented disability. If you have a disability
and may need accommodations to fully participate in this class, please visit the Disability
Services Center. All accommodations must be approved through Disability Services (Kolligan
Library, West Wing Suite 109). Please stop by or call 209-228-6996 to make an appointment
with a disability specialist.
Honor Code
If you plagiarize or otherwise cheat on any assignment, you will fail this course and your
transcript will note your violation of the academic honesty policy. Plagiarism involves
intentionally representing someone else’s words or ideas as your own. If you use outside
sources—either in the form of quotes or ideas—you must cite them to indicate where they come
from. Please see or email me if you need help with citations. When in doubt, ask! If you cheat,
or let someone else represent your work as their own, you are in violation of the student code of
conduct. You will fail this course and your failing grade will be identified on your student
transcript as resulting from academic dishonesty. Please consult the office of student life web site
if you require further information: http://studentlife.ucmerced.edu/ (then go to “Student Judicial
Affairs” and look at the “academic honesty policy”). Your enrollment in this course indicates
your willingness to comply with all requirements and policies.
Course Outline & Readings
This outline provides a good but imperfect estimate of the time we will spend on each topic. The
actual schedule is subject to change. Please read the course notes before class. Readings from the
course notes are identified as “Lectures 1 & 2.”
Assignment Due Dates and Point Count
Assignment                                      Due Date      Points
Assignment 1. Introduction to Stata             Mon, Feb 2    130
Assignment 2. Data Management                   Wed, Feb 18   130
Assignment 3. Regression Fit                    Mon, Mar 2    150
Assignment 4. Regression Inference              Mon, Mar 16   150
Assignment 5. Functional Form                   Mon, Apr 13   125
Assignment 6. Diagnostics / Heteroskedasticity  Mon, Apr 27   150
Assignment 7. Multiple Regression               Fri, May 8    165
TOTAL POINTS                                                  1000
I. Preliminary Toolkit: Mathematics, Probability, Statistics

Week #1, January 21
Organizational Meeting

Week #2, January 26-28
Introductory Concepts: Why, What, How
    Course Notes: Lecture 1
    Course Notes: Appendix A

Week #3, February 2-4
Recapitulation: Mathematics, Probability, Statistics
    Course Notes: Lecture 2
    Course Notes: Lecture 3

II. Presentation of the Classical Normal Linear Regression Model

Week #4, February 9-11
Simple Regression: Estimation, Interpretation, Fit
    Course Notes: Lecture 4

Week #5, February 18
Multiple Regression: Estimation, Interpretation, Fit
    Course Notes: Lecture 5

Week #6, February 23-25
Statistical Inference I: Interval Estimation and Hypothesis Testing
    Course Notes: Lecture 6, 6.1 through 6.9

Week #7, March 2-4
Statistical Inference II: Interval Estimation and Hypothesis Testing
    Course Notes: Lecture 6, 6.10 to end
    Course Notes: Appendix D

Week #8, March 9-11
Specification Tests & Functional Form
    Course Notes: Lecture 7

Week #9, March 16-18
Group Differences: Dummy Variables and Interaction Terms
    Course Notes: Lecture 8

Week #10, March 30-April 1
Omitted Variables, Measurement Error, Multicollinearity
    Course Notes: Lecture 9
    Course Notes: Lecture 10

III. The General Linear Model: Relaxing the Assumptions

Week #11, April 6-8
Heteroskedastic Errors, Loglinear Transformation, WLS
    Course Notes: Lecture 11

Week #12, April 13-15
Correlated Errors
    Course Notes: Lecture 12

IV. Topics in Estimation

Week #13, April 20-22
Introduction to Systems of Equations
    Course Notes: Lecture 13

Week #14, April 27-29
Jackknife and Bootstrap Methods for Interval Estimation
    Course Notes: Lecture 14

Week #15, May 4-6
Maximum Likelihood Estimation
    Course Notes: Lecture 15