Analysing synthetic microdata from the National Longitudinal Survey of Children and Youth for ages 16-17
Overview
The National Longitudinal Survey of Children and Youth (NLSCY) is a unique study of Canadians from birth to adulthood. This synthetic data file, based on NLSCY data, contains 359 records. It represents 359 youth, approximately one-fifth of the 16- to 17-year-old cohort who completed the survey in 2002-2003. To preserve the respondents' confidentiality, each record in the synthetic file contains both real and artificial data.
The synthetic data provide sufficiently accurate results for use by high school students to
practise statistical analysis techniques but should not be used to produce estimates for
formal analysis and publication.
The respondents were asked a series of questions about themselves and their lives. The
questions were divided into the following seven categories:
Section A: Friends and Family
Section B: About Me
Section C: Feelings and Behaviours
Section D: Smoking, Drinking, and Drugs
Section E: Health
Section F: My Relationships
Section G: My Parent(s)
In this activity, students will be introduced to the synthetic microdata file from the National Longitudinal Survey of Children and Youth for ages 16-17 and will begin to analyse NLSCY variables from each section of the survey.
Contributors: Jonathan Lee, Queen's University; Tracey Bushnik, Jennifer Hall, and
Joel Yan, Statistics Canada
Objectives
Ask questions and make predictions based on data
Analyse data using one- and two-variable statistical analysis
Use technology to generate graphical summaries of one- and two-variable data, choosing the type of graph to suit the type of data (numeric or categorical)
Analyse data trends and correlations based on the graphs produced
Suggested grade levels and subject area
Grade 11 or 12
Mathematics (Data Management)
Duration
Two to four 75-minute periods
Materials
Computers with Internet access and statistical software
Computer projector
NLSCY synthetic microdata
Analysis question handouts
Student worksheet
Classroom instructions
1. If necessary, give a quick review of one- and two-variable data analysis techniques.
2. Using the computer projector, give students an overview of the NLSCY, based on the information on the Mathematics Data page of the Statistics Canada Learning resources website.
3. Ensure students know how to launch the statistical software and how to download the NLSCY synthetic microdata from the Statistics Canada website and import it into the software.
4. Divide the students into seven groups, one for each section (A to G) of the questionnaire, and give each group the analysis questions for its section. (Note: The analysis questions for all sections are in one file, so you will need to cut them apart by section.) Within each group, ensure that every question is investigated by at least one student.
5. Distribute the student worksheet and have students complete Part 1 independently.
Note: To save time, this step can be completed as homework.
6. After students have independently investigated their two questions, have the groups reconvene to summarize their findings for their assigned section of the survey in Part 2 of the worksheet.
7. Have students from each group share highlights of their overall findings with the class. This can be done via a PowerPoint presentation, a brief oral report, or another format of the teacher's choosing. During the presentations, brainstorm with the class ways that students could use these microdata for a major project, and have students record ideas for further analysis and project work in Part 3 of the worksheet.
8. Collect the worksheets for evaluation.
Evaluation
Students can be informally assessed on their work habits and computer skills throughout
this activity. They can be formally assessed via the worksheet, which can be marked
using a marking scheme of the teacher's choice.
Student worksheet
National Longitudinal Survey of Children and Youth (NLSCY)
Self-reported synthetic microdata file for 16-17 year olds
Name:
Section of Questionnaire:
Group Members:
Objectives
During this activity, you will examine NLSCY synthetic microdata critically. You will:
become familiar with different types of analysis
represent the data visually
draw conclusions based on your analysis
share your findings with the class
Instructions
1. Your group will be assigned a set of questions relating to one of the following sections of the NLSCY questionnaire:
   Section A: Friends and Family
   Section B: About Me
   Section C: Feelings and Behaviours
   Section D: Smoking, Drinking, and Drugs
   Section E: Health
   Section F: My Relationships
   Section G: My Parent(s)
2. With your group, read and discuss the list of analysis questions for your section.
3. From the list, select two questions that interest you. Make sure every question has at least one person in your group assigned to it. If you wish, you may come up with your own question to investigate.
4. Complete Part 1 of the worksheet on your own.
5. Reconvene with your group and complete Part 2 of the worksheet. Be prepared to present your findings to the class.
6. During the group presentations and whole class discussion, complete Part 3 of the worksheet.
Part 1: On your own
The two questions I have selected are:
1.
2.
For my selected questions, I predict the relationship/trend to be:
1.
2.
Download the NLSCY synthetic microdata from the Statistics Canada website and open the file in your statistical software.
Please note:
1. Section headers are in capital letters (e.g., 'PERSONAL DATA'). There are no data corresponding to these headers.
2. Each attribute name begins with its question number (e.g., A1, B4). If a question has multiple parts, the attribute names for all of its parts begin with the same question number. For example, Question A14 asks 'Who besides your friends do you talk to about your problems?' and offers answers such as 'Mother' and 'Father'. The attribute for each of these sub-categories also begins with A14.
3. More information about the study and the attributes in the dataset is available in the Description of items in dataset.
Tips on performing statistical analyses
Summary tables
Summary tables are useful for quickly displaying information about one or two attributes and about how two attributes may be related.
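If your class works in Python rather than a point-and-click statistics package, a two-way summary table is simply a count of the records in each combination of categories. The sketch below assumes the records have already been read into (sex, answer) pairs; the Yes/No labels and all values are invented for illustration and are not NLSCY data, and the real file's column names and answer codes may differ.

```python
from collections import Counter

# Hypothetical records: (sex, Yes/No answer to B8 about being physically attacked).
# Invented for illustration only; these are not NLSCY data.
records = [
    ("Female", "No"), ("Female", "Yes"), ("Male", "No"),
    ("Male", "Yes"), ("Male", "No"), ("Female", "No"),
]


def two_way_table(pairs):
    """Count the records in each (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # A Counter returns 0 for combinations that never occur.
    return rows, cols, {(r, c): counts[(r, c)] for r in rows for c in cols}


rows, cols, table = two_way_table(records)
print("Sex".ljust(8) + "".join(c.rjust(6) for c in cols) + "Total".rjust(7))
for r in rows:
    cells = [table[(r, c)] for c in cols]
    print(r.ljust(8) + "".join(str(n).rjust(6) for n in cells) + str(sum(cells)).rjust(7))
```

Each row of the printed table gives the counts for one sex, with a row total at the end.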
Performing an analysis between two numeric attributes
Plot one numeric attribute on the x-axis and the other numeric attribute on the y-axis. You can then calculate numeric measures, plot the line of best fit, and examine the r² value.
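The line of best fit and r² value that statistical software reports can be checked by hand with the least-squares formulas. A minimal Python sketch, using made-up (x, y) values rather than NLSCY data:

```python
from statistics import mean


def best_fit(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = intercept + slope*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # r = Sxy / sqrt(Sxx * Syy), so r² = Sxy² / (Sxx * Syy)
    return slope, intercept, r_squared


# Hypothetical values for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r2 = best_fit(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x, r² = {r2:.3f}")
```

Here r² is the proportion of the variation in y accounted for by the line: values near 1 mean the points lie close to the line, and values near 0 mean the line explains little of the variation.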
Performing an analysis between a numeric and a categorical attribute
Plot the categorical attribute on one axis and the numeric attribute on the other. A box plot provides a good visual representation. You can also plot numeric measures, such as the mean or median of the numeric attribute, for each category. A summary table can help in identifying these measures.
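A box plot is drawn from each category's five-number summary (minimum, first quartile, median, third quartile, maximum). The Python sketch below computes these values, plus the mean, for each category; the categories and scores are invented for illustration. Statistical packages use slightly different conventions for quartiles, so these quartiles may not match your software's box plot exactly.

```python
from statistics import mean, quantiles

# Hypothetical (category, numeric score) pairs, invented for illustration.
data = [
    ("Female", 12), ("Female", 15), ("Female", 9), ("Female", 14), ("Female", 11),
    ("Male", 10), ("Male", 13), ("Male", 8), ("Male", 16), ("Male", 12),
]


def summary_by_group(pairs):
    """Return {category: (min, Q1, median, Q3, max, mean)} for each category."""
    groups = {}
    for group, value in pairs:
        groups.setdefault(group, []).append(value)
    result = {}
    for group, values in sorted(groups.items()):
        # quantiles(n=4) returns the three cut points Q1, median, and Q3
        # (using Python's default "exclusive" method).
        q1, med, q3 = quantiles(values, n=4)
        result[group] = (min(values), q1, med, q3, max(values), mean(values))
    return result


for group, stats in summary_by_group(data).items():
    print(group, stats)
```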
Performing an analysis between two categorical attributes
Plot one attribute on one axis and add the other attribute to the graph itself, so that its data are overlaid on the existing graph. Be sure to arrange the categories in an order that makes sense (e.g., least to most). A ribbon chart may help you compare proportions accurately.
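Comparing proportions rather than raw counts matters when the groups being compared differ in size. The sketch below computes, for each sex, the proportion of records in each category of a second attribute, keeping the categories in a fixed least-to-most order. The smoking-frequency labels and the records are hypothetical, invented for illustration; the real answer categories for D1 may differ.

```python
from collections import Counter

# Hypothetical (sex, smoking frequency) pairs; labels and values are invented.
records = [
    ("Female", "Not at all"), ("Female", "Occasionally"), ("Female", "Not at all"),
    ("Female", "Daily"), ("Male", "Not at all"), ("Male", "Daily"),
    ("Male", "Daily"), ("Male", "Occasionally"),
]
# Categories arranged from least to most, so the proportions read in a sensible order.
ORDER = ["Not at all", "Occasionally", "Daily"]


def row_proportions(pairs, order):
    """For each row group, return the proportion of its records in each ordered category."""
    counts = Counter(pairs)
    totals = Counter(group for group, _ in pairs)
    return {
        group: [counts[(group, cat)] / totals[group] for cat in order]
        for group in sorted(totals)
    }


for group, props in row_proportions(records, ORDER).items():
    print(group, "  ".join(f"{cat}: {p:.0%}" for cat, p in zip(ORDER, props)))
```

Each group's proportions add up to 1, which is what lets groups of different sizes be compared side by side.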
Relationships
1. What trends or relationships did you find in your data? If you performed a numeric analysis, include the actual measures (e.g., equation of the line of best fit, mean, r² value).
i.
ii.
2. Please explain these relationships (or the lack of relationships).
i.
ii.
Part 2: With your group
In the space provided below, summarize your group's findings about the section of the NLSCY questionnaire you analysed.
Part 3: With the whole class
In the space provided below, make notes on ways you could perform further analyses on
this dataset or use these data in a project.
Analysis questions for the National Longitudinal Survey of Children and Youth synthetic microdata for 16-17 year olds
Section A: Friends and Family
What are the mean, median, and mode for the attribute Friends Score? Do these values differ by sex?
What is the mean number of female (A7) and male (A8) close friends? What are the median and mode? Are there any outlier values? Now examine these values by sex of respondent. Do girls have more female than male friends? Do boys have more male than female friends?
Add the number of close female friends and close male friends together and recalculate the mean, median, and mode. What does this total represent?
Is there a link between friends smoking (A10), drinking (A10), or trying marijuana (A10) and the respondent smoking (D1, D2), drinking (D3), or using marijuana (D5), respectively?
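For the Section A questions on close friends, the following Python sketch computes the mean, median, and mode of the total number of close friends (A7 + A8), overall and by sex, and flags possible outliers using the 1.5 × IQR rule, a common convention (your course or software may define outliers differently). The respondents and their values are invented, and in the real file the columns for A7, A8, and sex may be named differently.

```python
from statistics import mean, median, multimode, quantiles

# Hypothetical respondents: (sex, A7 close female friends, A8 close male friends).
# Invented values for illustration only.
respondents = [
    ("Female", 3, 1), ("Female", 4, 2), ("Female", 2, 0), ("Female", 3, 2),
    ("Male", 1, 3), ("Male", 0, 4), ("Male", 2, 3), ("Male", 1, 12),
]


def describe(values):
    """Mean, median, and all modes of a list of numbers (multimode handles ties)."""
    return mean(values), median(values), multimode(values)


def iqr_outliers(values):
    """Values more than 1.5 × IQR below Q1 or above Q3, a common rule of thumb."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


# Total close friends = A7 + A8 for each respondent.
totals = [a7 + a8 for _, a7, a8 in respondents]
print("All respondents (A7 + A8):", describe(totals))
print("Possible outliers:", iqr_outliers(totals))
for sex in ("Female", "Male"):
    group = [a7 + a8 for s, a7, a8 in respondents if s == sex]
    print(sex, describe(group))
```

`multimode` returns every value that ties for most frequent; if each value occurs only once, it returns all of them, which means the data have no single mode.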
Section B: About Me
What are the mean, median, and mode for the attribute General Self Image Score? Do these values differ by sex?
Is there a link between the attribute General Self Image Score and:
o feeling that close friends know who I am (A5)
o the number of close friends (A7 and A8)
o whether teens have someone else to talk to besides their close friends (A13)
o whether teens are normal weight or overweight (BMI_Cole_Method)
Is there a link between feeling like an outsider (B7) and the attribute General Self Image Score? What about between feeling like an outsider (B7) and feeling happy right now (B3)?
Are girls or boys more likely to have been physically attacked (B8)?
How often do respondents see violence on TV (B10)? Are there any differences by sex?
Section C: Feelings and Behaviours
What are the mean, median, and mode for the attribute Depression Score? Do these values differ by sex?
Is there a correlation between not feeling like eating (C1) and having trouble focussing (C1)?
Examine the mean Depression Score by whether someone at the respondent's school committed suicide (C2) and by whether the respondent knows anyone who committed suicide (C3). Is there any correlation?
Examine Depression Score by whether the respondent has seriously considered suicide (C4).
Is there a correlation between selling drugs (C7) and being questioned by the police (C7)?
Section D: Smoking, Drinking, and Drugs
How frequently are young people smoking (D1)? Does it differ by sex?
How frequently are young people drinking (D3)? Does it differ by sex?
What proportion of young people have been drunk (D4)? How often? Does it differ by sex?
Is there a link between smoking (D1) and drinking (D3) behaviours?
How often are young people using marijuana (D5)? Does it differ by sex?
Are there any links between marijuana use (D5) and smoking (D1) or drinking (D3)?
Are there any links between smoking, drinking, and marijuana use and the following attributes:
o Depression Score
o General Self Image Score
o Friends Score
What proportion of young people have driven while impaired (D7)? Does it differ by sex?
What proportion of young people have been a passenger when the driver was impaired (D8)? Does it differ by sex?
Is there a link between driving while impaired (D7) and being a passenger when the driver is impaired (D8)?
Section E: Health
What are the mean, median, and mode for the attribute Raw value for Body Mass Index? What is the mode for the attribute BMI Cole Method? Do these values differ by sex?
Note: In the questions below, you can choose Raw value for Body Mass Index if you wish to work with numeric data or BMI Cole Method if you wish to work with categorical data.
Is there a correlation between the attribute Depression Score and BMI?
Is there a correlation between the attribute General Self Image Score and BMI?
Is there a link between BMI and thinking about committing suicide (C4)?
Is there a link between BMI and smoking (D1), drinking alcohol (D3), or using marijuana (D5)?
Is there a link between BMI and the respondent's feelings about their weight (E8)?
Do boys and girls differ in their feelings about their weight (E8)?
Section F: My Relationships
What is the mean age at which youth had their first boyfriend/girlfriend (F1)? What are the median and mode? Do these values differ by sex?
What proportion of youth have had consensual sexual intercourse (F5)? Does this differ by sex?
For those who have had sex, what is the mean age at the first experience (F6)? What are the median and mode? Do these values differ by sex?
For those who have had sex, what is the mean age of their partner at the first experience (F7)? Does it differ by sex?
Is there a relationship between the age of the respondent when they first had consensual sex (F6) and the age of their partner (F7)? What is the equation of the line of best fit? What is the r² value? What does this mean?
Examine the attribute General Self Image Score by whether the respondent currently has a boyfriend or girlfriend (F2), has ever had sex (F5), and is currently sexually active (F8).
Section G: My Parent(s)
What are the mean, median, and mode for the attribute Conflict Resolution Score with Mother? Do these values differ by sex?
What are the mean, median, and mode for the attribute Conflict Resolution Score with Father? Do these values differ by sex?
Examine the attributes Conflict Resolution Score with Mother and Conflict Resolution Score with Father by whether the respondent has ever considered suicide (C4). Are there any correlations?
Examine the attributes Conflict Resolution Score with Mother and Conflict Resolution Score with Father by smoking (D1), drinking (D3), and marijuana use (D5). Are there any correlations?
Are there any correlations between the frequency of eating meals together or having discussions (G4 for mother; G9 for father) and the overall relationship with mother (G3) and father (G8)?
Are there any correlations between feeling your parents understand you, are fair to you, or are affectionate to you (G2 for mother; G7 for father) and how often you and your parents disagree and fight (G5 for mother; G10 for father)?