Baby Names in the United States from 1880 to 2011

LT – Understand how data from a government web site may be transformed into information using Python programming and PSD Google Docs spreadsheet charts.

SC – Use data available from the United States Social Security Office and transform it into meaningful charts using PSD Google Docs spreadsheets. The spreadsheet workbook file is then to be inserted into each student's individual web site.

You are going to make a series of charts in a single PSD Google Docs spreadsheet that show the popularity of baby names from 1880 to 2011. Below is an example of the charts you will be asked to make. This one happens to be for the name Scott and gender male. (As you can see, it was a rare name for a male in the 1930s and earlier, and it has become rare again recently.)

The Social Security Administration (called the Social Security Office in this handout) makes available one text file for each year from 1880 to 2011. Each file lists, by gender, the baby names recorded on Social Security card applications for births in the United States, as long as five or more individuals have that name in a given year. The problem is that this information is only released as a zipped folder of 132 separate text files (see the National Data file at http://www.socialsecurity.gov/OACT/babynames/limits.html). Mr. Durkin has already downloaded this folder and has it inside his Public folder on the Preston server. Directions on how to find it and place a copy in your server folder appear below.

Baby Names in the United States from 1880 to 2011 Monday, February 8, 2016 Page 1 of 8

You will need to copy the "names" folder and paste a copy of it into the web20 folder inside your Preston server folder (your id #). Below is where you can find the "names" folder in Mr. Durkin's Public folder.
(It is inside another folder inside the Public folder.) Most of you have been to Mr. Durkin's Public folder before. Once you find it, right-click on the "names" folder to copy it, and then go to your own folder to paste it there.

To get a copy of the Social Security "names" folder:

1. Open the Computer icon on the desktop of your computer.
2. Open the Preston main server, called PREMAIN (T:/), I believe.
3. Open the Staff folder. (The one that is just called "Staff.")
4. Go to the folder for sdurkin. (If you type "sd" quickly, you will go right to it.)
5. Open the Public folder.
6. Then open the folder inside it named "Other."
7. Inside Other, look for the folder named "names". This is the one you need to copy.
8. Copy the names folder (right-click on the folder and select Copy).
9. Go to your web20 folder in your Preston server folder and paste in the folder you copied by right-clicking in a white area of your web20 folder and selecting Paste.

You now have the files from the Social Security Office. Look in your copy of the folder and open one of the files for any year. It will probably open in Notepad. These are the files that will help us make our charts!

Without a computer program, it would take a long time to open and close all these files while searching each one for the desired name, gender and number of people. You will use a Python program that has already been written for you. For each file, it opens the file, looks for the name and gender you identified, records the year and the number of people of that gender who were given that name, and closes the file. At the end it displays the results. (This is the kind of program computer science students at Fossil write.)

1. Highlight and copy the Python program below, then come back to step 2.
2. Open a new script window in Python.
3. Name it: names.py
4. IMPORTANT: BE SURE TO SAVE names.py INSIDE YOUR NEW names FOLDER (where all the text files from Social Security are). Otherwise this program won't work!
5. Paste the copied code into your names.py file and save the changes.

# names.py: look up one name and gender in every yobYYYY.txt file (1880 to 2011)
name = input("Enter a name: ")
gender = input("Enter a gender: ")
years = []
x = 1880
while x <= 2011:
    found = 0
    s = 'yob' + str(x) + '.txt'
    # "with" closes the file automatically when the block ends
    with open(s, 'r') as f:
        for line in f:
            i = line.strip().split(',')   # each line is Name,Gender,Count
            if i[0] == name and i[1] == gender:
                years += [(x, i[2])]
                found = 1
    if found == 0:
        years += [(x, 0)]   # name not listed for that year
    x += 1
# Display one line per year: the year, a tab, and the count
for i in years:
    print(i[0], '\t', i[1])
print("done!")

After you run the names.py program and enter an appropriate name and gender, it displays each year and the number of people of that gender who were given that name in that year. It is this output you will copy and use to make your charts in a PSD Google Docs spreadsheet. Below is a screen shot of the Python Shell for the name "Rush" and the gender "M." (You have to indicate gender with a single uppercase letter, either "M" or "F." The name must be entered with an initial capital letter and the rest in lowercase. Also keep in mind that if your uncle's name is "Bob," for instance, you will probably want to look for "Robert.")

Run the program, then select and copy just the lines of output from 1880 to 2011.

Create a new PSD Google Docs spreadsheet and name it Names. In the first sheet, click on cell A1 and enter: Year. In cell B1 enter: Number of People. Click in cell A2 and paste the 1880 to 2011 output from your Python program. Below is a screen shot of Rush (Male). Of course, the data goes all the way to row 133.
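The entry rules (capitalized name, "M" or "F") and the two-column layout that ends at row 133 can be sketched in a few lines of Python. This is only an illustration and is not part of names.py: the helper names normalize_entry and rows_for_pasting are made up for this example, and the counts are placeholder zeros, not real Social Security numbers.

```python
# Hypothetical helpers (not part of names.py) that illustrate the entry
# rules and the spreadsheet layout described in this handout.

def normalize_entry(name, gender):
    """Return the name with an initial capital and the gender as M or F."""
    name = name.strip().capitalize()      # "rush" becomes "Rush"
    gender = gender.strip().upper()       # "m" becomes "M"
    if gender not in ("M", "F"):
        raise ValueError("Gender must be M or F")
    return name, gender


def rows_for_pasting(years):
    """Build tab-separated lines: a header row, then one row per year."""
    lines = ["Year\tNumber of People"]
    for year, count in years:
        lines.append(str(year) + "\t" + str(count))
    return lines


# Placeholder data: every year from 1880 to 2011 with a count of 0.
placeholder = [(year, 0) for year in range(1880, 2012)]
lines = rows_for_pasting(placeholder)

print(normalize_entry(" rush ", "m"))   # ('Rush', 'M')
print(len(lines))                       # 133: one header row plus 132 years
```

Because this sketch builds the header row itself, its output would be pasted starting at cell A1. The handout's own steps type the headers by hand and paste the data at A2, which gives the same 133 rows.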
Name this sheet RushData (change it to match whatever name you chose, MaryData, for example).

Next, as you may recall, you have to highlight exactly the data in order to begin making a chart. When there are many cells to highlight, it is sometimes easier to use the following keyboard trick, but you can also just select A1 and manually drag down to cell B133.

Go to cell A1. Hold down the SHIFT key and the CTRL key with one hand. While you keep those held down, press the DOWN ARROW key. (This will highlight all the data in column A.) To also highlight the data in column B, keep SHIFT and CTRL down and press the RIGHT ARROW key. If this is done correctly, all the data from A1 to B133 should now be selected.

While the data is selected, go to the Insert menu and select Chart. The window that shows up has a chart (see below), but the preview shown isn't correct yet!

You have to adjust the settings in the Start tab! In the Recommended charts section, select the Combo Chart. Also check the "Use column A as labels" box. Better!

Next, click on the Customize tab to label the chart title and both axes. If you would like, you can also change the chart type to a line chart. Insert this chart into the spreadsheet. Move the chart to its own sheet. Rename the sheet with the chart RushChart (change it to match the name you chose).

Do at least three or four name/gender combinations, if not many more! Carefully label each chart and be sure to name each sheet. Just keep adding sheets as needed!

Before class is over today, please insert this chart into your PSD academic web site!
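Since you will repeat the lookup for three or four name/gender combinations, one way to avoid retyping is to wrap the lookup in a function and call it once per combination. The following is a minimal sketch, assuming the same yobYYYY.txt files with lines in the form Name,Gender,Count. The function names count_by_year and print_table and the folder parameter are made up for illustration; they are not part of names.py.

```python
import os

def count_by_year(name, gender, folder=".", first_year=1880, last_year=2011):
    """Return a list of (year, count) pairs for one name and gender.

    Assumes each file is named yobYYYY.txt and each line is Name,Gender,Count.
    A year in which the name is not listed gets a count of 0.
    """
    results = []
    for year in range(first_year, last_year + 1):
        path = os.path.join(folder, "yob" + str(year) + ".txt")
        count = 0
        with open(path, "r") as f:          # the file is closed automatically
            for line in f:
                parts = line.strip().split(",")
                if len(parts) == 3 and parts[0] == name and parts[1] == gender:
                    count = int(parts[2])
                    break                   # stop at the first matching line
        results.append((year, count))
    return results


def print_table(name, gender, folder="."):
    """Print a label line, then one year/count row per year, ready to copy."""
    print(name, gender)
    for year, count in count_by_year(name, gender, folder):
        print(str(year) + "\t" + str(count))
```

If this were saved inside the names folder, a call such as print_table("Rush", "M") followed by print_table("Mary", "F") would print one block of rows per name, and each block could be pasted into its own data sheet.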
Please share with your family members what you learned about the popularity of names over the last one hundred years in America by showing them the work you have posted on your academic web site!