Analysing synthetic microdata from the National Longitudinal Survey of Children and Youth for ages 16-17

Overview

The National Longitudinal Survey of Children and Youth (NLSCY) is a unique study of Canadians from birth to adulthood. This data file is a synthetic file based on the NLSCY data containing 359 records. It represents 359 youth, approximately one fifth of the 16-17 year old cohort who completed the survey in 2002-2003.

In order to preserve confidentiality for the respondents, each record in the synthetic file contains both real and artificial data. The synthetic data provide sufficiently accurate results for use by high school students to practise statistical analysis techniques, but they should not be used to produce estimates for formal analysis and publication.

The respondents were asked a series of questions about themselves and their lives. The questions were divided into the following seven categories:

Section A: Friends and Family
Section B: About Me
Section C: Feelings and Behaviours
Section D: Smoking, Drinking, and Drugs
Section E: Health
Section F: My Relationships
Section G: My Parent(s)

In this activity, students will be introduced to microdata files from the National Longitudinal Survey of Children and Youth for ages 16-17 and will begin to analyse NLSCY variables from each section of the survey.
Contributors: Jonathan Lee, Queen's University; Tracey Bushnik, Jennifer Hall, and Joel Yan, Statistics Canada

Objectives

Ask questions and make predictions based on data
Analyse data using one- and two-variable statistical analysis
Use technology to generate graphical summaries of one- and two-variable data, choosing the type of graph according to the type of data (numeric or categorical)
Analyse data trends and correlations based on the graphs produced

Suggested grade levels and subject area

Grade 11 or 12 Mathematics (Data Management)

Duration

Two to four 75-minute periods

Materials

Computers with Internet access and statistical software
Computer projector
NLSCY synthetic microdata
Analysis question handouts
Student worksheet

Classroom instructions

1. If necessary, give a quick review of one- and two-variable data analysis techniques.
2. Using the computer projector, give the students an overview of the NLSCY, using the information provided on the Mathematics Data page of the Statistics Canada Learning resources website.
3. Ensure students know how to launch the statistical software and how to import the NLSCY synthetic microdata from the Statistics Canada website into it.
4. Divide the students into seven groups, one for each section (A to G) of the questionnaire. Give each group the analysis questions for its section. (Note: All of the analysis questions for the different sections are in one file, so you will need to cut them into the separate sections.) Ensure that within each group, each question is investigated by at least one student.
5. Distribute the student worksheet and have students complete Part 1 independently. Note: To save time, this step can be completed as homework.
6. After students have independently investigated their two questions, have the groups reconvene to summarize their findings for their assigned section of the survey in Part 2 of the worksheet.
7. Have students from each group share highlights of their overall findings with the class. This can be done via a PowerPoint presentation, a brief oral report, or another format of the teacher's choosing. During the presentations, brainstorm with the class ways that students could use these microdata for a major project. Have students record ideas for further analysis and project work in Part 3 of the worksheet.
8. Collect the worksheet for evaluation.

Evaluation

Students can be informally assessed on their work habits and computer skills throughout this activity. They can be formally assessed via the worksheet, which can be marked using a marking scheme of the teacher's choice.

Student worksheet

National Longitudinal Survey of Children and Youth (NLSCY)
Self-reported synthetic microdata file for 16-17 year olds

Name:
Section of Questionnaire:
Group Members:

Objectives

During this activity, you will examine NLSCY synthetic microdata critically. You will:
become familiar with different types of analysis
represent the data visually
draw conclusions based on your analysis
share your findings with the class

Instructions

1. Your group will be assigned a set of questions to examine that relate to one of the following sections of the NLSCY questionnaire:
   Section A: Friends and Family
   Section B: About Me
   Section C: Feelings and Behaviours
   Section D: Smoking, Drinking, and Drugs
   Section E: Health
   Section F: My Relationships
   Section G: My Parent(s)
2. With your group, read and discuss the list of analysis questions for your section.
3. From the list, select two questions that interest you. Make sure every question has at least one person in your group assigned to it. If you wish, you may come up with your own question to investigate.
4. Complete Part 1 of the worksheet on your own.
5. Reconvene with your group and complete Part 2 of the worksheet. Be prepared to present your findings to the class.
6. During the group presentations and whole class discussion, complete Part 3 of the worksheet.

Part 1: On your own

The two questions I have selected are:
1.
2.

For my selected questions, I predict the relationship/trend to be:
1.
2.

Download the NLSCY synthetic microdata from the Statistics Canada website and open it with your statistical software. Please note:
1. Section headers are in capital letters. There are no data corresponding to the headers (e.g., 'PERSONAL DATA').
2. The first characters of each attribute name are the question number (e.g., A1, B4). If a question has multiple parts, each of the corresponding attribute names begins with the same characters. For example, Question A14 asks 'Who besides your friends do you talk to about your problems?' and offers answers such as 'Mother' and 'Father'. Each of these sub-categories is also denoted A14 because it relates to that question.
3. More information about the study and about the attributes in the dataset is available in the Description of items in dataset.

Tips on performing statistical analyses

Summary tables
Summary tables are useful for quickly displaying information about one or two attributes and about how two attributes may be linked.

Performing an analysis between two numeric attributes
Plot one numeric attribute on the x-axis and the other numeric attribute on the y-axis. You can then examine different numeric measures, plot the line of best fit, and examine the r² value.

Performing an analysis between a numeric and a categorical attribute
Plot the categorical attribute on one axis and the numeric attribute on the other. A box plot provides a good visual representation. You can also plot numeric measures such as the mean or median of the numeric attribute for each category. A summary table can help in identifying different measures.

Performing an analysis between two categorical attributes
Plot one attribute on one axis, then add the other attribute to the graph itself to overlay its categories on the existing graph. Be sure to arrange categories in an order that makes sense (e.g., least to most). It may help to use a ribbon chart to compare proportions accurately.

Relationships

1. What trends or relationships did you find in your data? Include actual measures (equation of the line of best fit, mean, r² value, etc.) if performing numeric analysis.
   i.
   ii.
2. Please explain these relationships (or lack of relationships).
   i.
   ii.

Part 2: With your group

In the space provided below, summarize your group's findings about the section of the NLSCY questionnaire you analysed.

Part 3: With the whole class

In the space provided below, make notes on ways you could perform further analyses on this dataset or use these data in a project.

Analysis questions for the National Longitudinal Survey of Children and Youth synthetic microdata for 16-17 year olds

Section A: Friends and Family

What are the mean, median, and mode for the attribute Friends Score? Do these values differ by sex?
What is the mean number of female (A7) and male (A8) close friends? What are the median and mode? Are there any outlier values? Now examine these by sex of respondent. Do girls have more female than male friends? Do boys have more male than female friends?
Add the number of close female friends and close male friends together and recalculate the mean, median, and mode. What does this represent?
Is there a link between friends smoking (A10), drinking (A10), or trying marijuana (A10) and the respondent smoking (D1, D2), drinking (D3), or using marijuana (D5), respectively?

Section B: About Me

What are the mean, median, and mode for the attribute General Self Image Score? Do these values differ by sex?
Is there a link between the attribute General Self Image Score and:
   o feeling that close friends know who I am (A5)
   o the number of close friends (A7 and A8)
   o whether teens have someone else to talk to besides their close friends (A13)
   o whether teens are normal weight or overweight (BMI_Cole_Method)
Is there a link between feeling like an outsider (B7) and the attribute General Self Image Score? How about between feeling like an outsider (B7) and feeling happy right now (B3)?
Are girls or boys more likely to have been physically attacked (B8)?
How often are respondents seeing violence on TV (B10)? Are there any sex differences?

Section C: Feelings and Behaviours

What are the mean, median, and mode for the attribute Depression Score? Do these values differ by sex?
Is there a correlation between not feeling like eating (C1) and having trouble focussing (C1)?
Examine the mean Depression Score by whether anyone at school has committed suicide (C2) and by whether anyone you know has committed suicide (C3). Is there any correlation?
Examine Depression Score by whether the respondent has seriously considered suicide (C4).
Is there a correlation between selling drugs (C7) and being questioned by the police (C7)?

Section D: Smoking, Drinking, and Drugs

How frequently are young people smoking (D1)? Does it differ by sex?
How frequently are young people drinking (D3)? Does it differ by sex?
What proportion of young people have been drunk (D4)? How often? Does it differ by sex?
Is there a link between smoking (D1) and drinking (D3) behaviours?
How often are young people using marijuana (D5)? Does it differ by sex?
Are there any links between marijuana use (D5) and smoking (D1) or drinking (D3)?
Are there any links between smoking, drinking, and marijuana use and the following attributes:
   o Depression Score
   o General Self Image Score
   o Friends Score
What proportion of young people has driven impaired (D7)? Does it differ by sex?
What proportion of young people has been a passenger when the driver was impaired (D8)? Does it differ by sex?
Is there a link between driving impaired (D7) and being a passenger when the driver is impaired (D8)?

Section E: Health

What are the mean, median, and mode for the attribute Raw value for Body Mass Index? What is the mode for the attribute BMI Cole Method? Do these values differ by sex?
Note: In the questions below, you can choose Raw value for Body Mass Index if you wish to work with numeric data or BMI Cole Method if you wish to work with categorical data.
Is there a correlation between the attribute Depression Score and BMI?
Is there a correlation between the attribute General Self Image Score and BMI?
Is there a link between BMI and thinking about committing suicide (C4)?
Is there a link between BMI and smoking (D1), drinking alcohol (D3), or using marijuana (D5)?
Is there a link between BMI and the respondent's feelings about their weight (E8)?
Do boys and girls differ in their feelings about their weight (E8)?

Section F: My Relationships

What is the mean age at which youth had their first boyfriend/girlfriend (F1)? What are the median and mode? Do these values differ by sex?
What proportion of youth has had consensual sexual intercourse (F5)? Does this differ by sex?
For those who have had sex, what is the mean age at the first experience (F6)? What are the median and mode? Do these values differ by sex?
For those who have had sex, what is the mean age of their partner at the first experience (F7)? Does it differ by sex?
Is there a relationship between the age of the respondent when they first had consensual sex (F6) and the age of their partner (F7)? What is the equation of the line of best fit? What is the r² value? What does this mean?
Examine the attribute General Self Image Score by whether the respondent currently has a boyfriend or girlfriend (F2), has ever had sex (F5), and is currently sexually active (F8).
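Several of the questions above, such as the F6/F7 question in Section F and the BMI correlation questions in Section E, ask for the equation of the line of best fit and the r² value for two numeric attributes. Your statistical software computes these for you. The Python sketch below is an assumption, since the activity does not name a particular package, and it shows the least-squares arithmetic behind those two numbers. It assumes the two attributes have already been exported as equal-length lists of numbers with missing values removed. The values in the example are hypothetical and are not NLSCY data.

```python
# Sketch of the least-squares "line of best fit" and r^2 for two numeric
# attributes. Assumption: the values are already in two equal-length lists
# with missing values removed.
from statistics import mean
from typing import Sequence, Tuple


def line_of_best_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, r_squared) for the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values each")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # r = sxy / sqrt(sxx * syy), so r^2 = sxy^2 / (sxx * syy).
    # If every y value is identical, r^2 is undefined; report NaN.
    r_squared = sxy ** 2 / (sxx * syy) if syy != 0 else float("nan")
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical paired values for illustration only; not NLSCY data.
    x_values = [1, 2, 3, 4, 5]
    y_values = [2, 4, 5, 4, 5]
    slope, intercept, r2 = line_of_best_fit(x_values, y_values)
    print(f"y = {slope:.2f}x + {intercept:.2f}, r^2 = {r2:.2f}")  # prints "y = 0.60x + 2.20, r^2 = 0.60"
```

An r² of 0.60 would mean that about 60% of the variation in the y attribute is accounted for by its linear relationship with the x attribute. It does not show that one attribute causes the other.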
Section G: My Parent(s)

What are the mean, median, and mode for the attribute Conflict Resolution Score with Mother? Do these values differ by sex?
What are the mean, median, and mode for the attribute Conflict Resolution Score with Father? Do these values differ by sex?
Examine the attributes Conflict Resolution Score with Mother and Conflict Resolution Score with Father by whether the respondent has ever considered suicide (C4). Are there any correlations?
Examine the attributes Conflict Resolution Score with Mother and Conflict Resolution Score with Father by smoking (D1), drinking (D3), and marijuana use (D5). Are there any correlations?
Are there any correlations between the frequency of eating meals together or having discussions (G4 for mother; G9 for father) and the overall relationship with mother (G3) and father (G8)?
Are there any correlations between feeling your parents understand you, are fair to you, or are affectionate to you (G2 for mother; G7 for father) and the frequency that you and your parents disagree and fight (G5 for mother; G10 for father)?
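Many of the analysis questions ask two kinds of thing. Some ask for the mean, median, and mode of a score and whether these differ by sex. Others ask whether two categorical attributes, such as meal frequency (G4) and overall relationship (G3), are linked.

The following Python sketch shows one way to compute both kinds of summary by hand. Like the line-of-best-fit sketch, it is an assumption rather than part of the activity. It computes:
o a per-category summary of a numeric attribute
o a two-way table of row proportions for two categorical attributes

The record layout and the keys "sex", "score", and "group" in the example are hypothetical, and the real column names in the NLSCY file may differ.

```python
# Sketch: per-category summaries (mean, median, mode) of a numeric attribute,
# and row proportions for two categorical attributes.
# Assumption: records are dicts exported from the data file; the key names
# used in the example below are hypothetical, and None marks a missing value.
from collections import Counter, defaultdict
from statistics import mean, median, multimode


def summary_by_group(records, group_key, value_key):
    """Return {category: (mean, median, list_of_modes)} for a numeric attribute."""
    groups = defaultdict(list)
    for row in records:
        value = row.get(value_key)
        if value is not None:  # skip missing values
            groups[row[group_key]].append(value)
    # multimode returns every most-common value, so ties are not hidden.
    return {
        category: (mean(values), median(values), multimode(values))
        for category, values in groups.items()
    }


def row_proportions(records, row_key, col_key):
    """Two-way table: within each row category, the share of records in each column category."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[row_key]][row[col_key]] += 1
    table = {}
    for row_cat, col_counts in counts.items():
        total = sum(col_counts.values())
        table[row_cat] = {col_cat: n / total for col_cat, n in col_counts.items()}
    return table


if __name__ == "__main__":
    # Hypothetical records for illustration only; not NLSCY data.
    people = [
        {"sex": "F", "score": 10, "group": "x"},
        {"sex": "F", "score": 12, "group": "x"},
        {"sex": "F", "score": 12, "group": "y"},
        {"sex": "M", "score": 8, "group": "y"},
        {"sex": "M", "score": 9, "group": "y"},
        {"sex": "M", "score": None, "group": "x"},
    ]
    print(summary_by_group(people, "sex", "score"))
    print(row_proportions(people, "sex", "group"))
```

Comparing row proportions rather than raw counts matters when the categories differ in size, for example when more girls than boys answered a question. A ribbon chart shows the same kind of proportion comparison visually.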