Name: _______________________________

MDM4U (Mathematics of Data Management)
Culminating Project

The Task

The main focus of this project is for you to gather, organize, analyze and present data. Your study will include relevant descriptive statistics that you have learned about in this course. Your final project will consist of three parts: a proposal, a written report and a presentation to the class.

This project is to be completed either individually or with one (1) partner. The project is worth 20% of your final mark for this course.

Type of Project

First you will need to decide what type of data you will collect for your project:

1. Primary Data is information that you collect on your own. For example, this could be obtained by having students at CPHS complete a questionnaire on paper or online (using Fathom, surveymonkey.com, etc.). It could also be an actual experiment/simulation that you conduct using a computer.

2. Secondary Data is information that you are taking from another source. It is important to use reliable sources. When choosing your topic, be sure that you will be able to find good data. Some places that students often obtain data from are as follows:
- Statistics Canada: www.statcan.ca
- E-Stat (remote login): Username: uppercan PW: estat
- University of Michigan Library of Statistics: http://www.lib.umich.edu/govdocs/stats.html
- Nation Master: www.nationmaster.com

Sample Projects

Some sample project write-ups are available at http://www.brocku.ca/cmt/mdm4u/asprojects/index.html
Note: the write-up format is not the same as is required for your project.

Other Useful Information

Your textbook has some useful information for working on your final project:
- The introduction on pages vii to xvi
- Various Project Connections on pages 29, 66, 85, 93, 110, 151, 197, 266, 303, 383

1. Scheduling and Due Dates

Scheduling

Draw up a calendar. Set due dates. Leave “mess up” time. Think things through thoroughly before you get started. Make sure that all pieces will work.
Good planning saves time in the end.

Due Dates
- Proposal: Thurs, Nov 24th
- Presentations: Tues, Dec 6th
- Final Written Report: on or before Tues, Dec 6th

2. Proposal Phase

Proposal Requirements (due Thursday, Nov 24th, 2011)

1. An outline describing the procedure you took in researching and gathering the data [3]
2. Thesis: your main thesis question/statement and the sub-problems you are going to answer [5]
3. Population, Sample (if applicable) [3]
4. Analyze: explain each of the following:
  - What are the main variables in your question? [3]
  - Can these variables be measured statistically? [2]
  - Is there enough data to make an interesting analysis? [2]
5. Hypothesis: what do you expect to find / observe? [2]
6. 1) I need to know how the researchers got their data, OR 2) the survey that you are going to use. [5]

HINT: Start your Bibliography as soon as you find your first useful web site. Trying to go back and find information later is a nightmare.

If you have problems gathering ANY of these components SEE ME. You've likely chosen a difficult topic and it needs to be changed. Almost EVERY problem that I have seen on final projects was because of an incomplete or poorly done proposal phase.

The following flawed projects have been seen in past Data Management courses:
- Projects that were far too large in scope. A research team of 100 working for 25 years would be unable to prove causation in the way that these students wished to do. This happens most often with projects like drunk driving, teenage pregnancy or economic problems. Choose less glamorous and smaller topics that you can find data about.
- Projects which attempted to prove causation instead of correlation.
- Projects whose entire body of evidence was based on unreliable sources from the Internet. The students made no attempt to figure out where their sources' data came from.
- Projects where random sampling involved giving a survey to everyone in their class.
- Projects where the students developed their surveys first and their research questions second. They ended up not asking the correct survey questions and were unable to prove their point.

3. Written Report Format

Your report should have the following sections:

1. Title Page
- Thesis, Name, Date, Course Code: MDM 4U, Teacher’s Name
- Relevant picture or graphic
- NOT numbered

2. Table of Contents
- Include section headings and page numbers
- NOT numbered

3. Summary
- Do not write this until you are finished your project!
- Page numbering starts here (1): insert a section break
- In one page, briefly summarize your entire report. A summary section is something that would be read by a manager who didn’t have enough time to read the entire report, so make sure that you have enough details that it can stand by itself.
- At the very least, include the following information:
  - Problem: A clear statement of what you are trying to learn
  - Plan: The procedure you will use to carry out the study (How do you choose people? How do you measure? Who does the measuring? What methods are you going to use?)
  - Data: The data are collected according to the plan (What data did you collect? Where did it come from?)
  - Analysis: The data are summarized and analyzed to answer the thesis question (numerical, graphical, informative sentences)
  - Conclusions are drawn about what has been learned (note any biases, suggest further studies)

4. Problem

Main thesis question: The thesis question is the theme of your report (e.g. What is the relationship between an NBA player’s salary and their success?). Try to use the word “relationship” in your thesis question. Remember, you do not have the tools to try and find any cause and effect.

Sub-questions: The sub-questions are the smaller questions that you will answer that will lead you to conclude on your main thesis question. These should be specific enough that they contain your variables that you will compare.
The problems may evolve slightly throughout the life of your project. (e.g. What is the relationship between salary and a player’s points per game? What is the relationship between salary and a player’s rebounds per game? What is the relationship between salary and the number of games that a player has won?)

- Hypothesis: What do you expect to find?
- Define the population and describe the characteristics of the population (e.g. all players in the NBA that played at least 70 games during the 2006-07 regular season).
- Define the independent variables (e.g. points per game in 2006-07 NBA regular season, rebounds per game in 2006-07 NBA regular season).
- Define the dependent variables (e.g. player salary).

5. Plan
- Select the sampling method and justify your choice
- Design and explain the Experiment/Survey/Questionnaire/Data Collection process
- Identify any possible biases

NOTE: if the data is not your own, you need to find out as much of the above information as possible and point out the parts that you don’t know.

6. Data
- Put all of your raw data collected in an appendix, not in this section
- Include summaries of your data collection here (frequency tables, but not histograms or graphs)
- Identify all possible problems you ran into with your data (Did you need to manipulate it to use it in Excel? Did you alter the scale?)

7. Analysis

For each sub-question identified, use the statistics we learned in class to describe the data or find trends/relationships. Only use those that are relevant.

(a) Numerical Statistics (you must include at least 3)
- Find means, modes, and medians
- Find the standard deviation, Q1, Q3, IQR, percentiles
- Use linear regression and find the correlation coefficient, equation of a line of best fit
- Use non-linear regression and find the coefficient of determination, equation of a curve of best fit
- Relate your data to the Normal Distribution, Binomial Distribution or another distribution. Use z-scores and z-tables to find some useful information.
- Permutations, Combinations and Probability:
  - Predict the probability of certain events using your model
  - Do something else relating to probability
  - Use a simulation to help you discover a probability
  - Use the binomial theorem

(b) Graphical Representations (you must include at least 3)
- Scatter plots (this should be included in every project as you will be finding many relationships)
- Bar graph / histogram / frequency polygon (histogram + curve) / cumulative frequency polygon (each freq. is a cumulative total) / relative frequency polygon (freq. as a %) / line graph / moving average
- Box and whisker

(c) Information: descriptive sentences. This part is very important and often overlooked by students. Don’t just provide numbers and statistics. Be sure to interpret them for the reader. What do the numbers tell you?

8. Conclusion
- Draw conclusions that directly relate to your thesis.
- Note any biases that you believe occurred in your study.
- Make suggestions for further/follow-up studies or any modifications that you would make to the current study.

9. Bibliography
- Web sites cited using APA format.
- General format/sequence: Author. (Date published if available; n.d. (no date) if not). Title of article. Title of web site. Retrieved date. From URL.
- See http://www.liu.edu/CWIS/CWP/library/workshop/citapa.htm for other kinds of media.

10. Appendices
- Actual data set you used (or the data set you gathered)
- Sample questionnaires that have actually been filled out if you used primary data
- Glossary with key terms unique to your project

4. Final Project Rubric

Name: ______________________ Date: _________

MDM4U - Summative Report Rubric

Title Page [2]
  Level 0-1: Title is not representative of the topic and/or missing 3 or more key components of a title page.
  Level 2: Title is lacking insight and/or missing 2 of the key components of a title page.
  Level 3: Title is okay or missing one of the key components of a title page.
  Level 4: Interesting title; includes name, course code, date and teacher’s name; includes a relevant picture or graphic.

Table of contents [2]
  Level 0-1: Not done or done incorrectly.
  Level 2: NA
  Level 3: NA
  Level 4: Lists the main sections of the report and the pages on which they are found.

Report format [4]
  Level 0-1: 3 or more formatting items not completed
  Level 2: 2 formatting items not completed
  Level 3: 1 formatting item not completed
  Level 4: All formatting guidelines were followed.

Summary [4]
  Criteria: Includes Problem, Plan, Data, Analysis and Conclusion
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective

Problem [12]
  Criteria: Thesis and sub-questions are clear and concise. Population and characteristics are described. All independent and dependent variables are defined. Fish Bone Diagram is thorough.
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective

Plan [8]
  Criteria: Sampling method, measurement technique, possible biases
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective

Data [4]
  Criteria: Includes summary of data. Problems with data are summarized.
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective

Analysis of problem (stats of 1 variable) [8]
  Level 0-1: Limited effectiveness. Analysis was incomplete and ineffective at communicating understanding.
  Level 2-3: Some to considerable effectiveness. Uses a small number of techniques from the course to analyze the problem.
  Level 4: Highly effective. Uses a large variety of different mathematical techniques to analyze the data that was obtained or collected in order to solve the thesis problem.

Analysis of problem (regression) [8]
  Level 0-1: Limited effectiveness. Analysis was incomplete and ineffective at communicating understanding.
  Level 2-3: Some to considerable effectiveness. Uses a small number of techniques from the course to analyze the problem.
  Level 4: Highly effective. Uses a large variety of different mathematical techniques to analyze the data that was obtained or collected in order to solve the thesis problem.

Results [8]
  Criteria: Reports findings in graphical form and in textual form when necessary. Results are presented in a logical and sequential manner.
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective. Comments thoroughly on the results.

Use of mathematical terminology and notation [4]
  Level 0-1: Uses terminology or notation inconsistently or incorrectly; makes major errors
  Level 2: Usually uses correct terminology and notation; may make minor errors
  Level 3: Consistently uses correct terminology and notation
  Level 4: Consistently uses correct terminology and notation which enhances the presentation

Mathematical content [4]
  Level 0-1: Presents material with mathematical content that is incorrect or incomplete; major errors or omissions
  Level 2: Presents material with mathematical content that is generally correct and complete; may have minor errors or omissions
  Level 3: Presents material with mathematical content that is completely correct and complete
  Level 4: Presents material with mathematical content that is completely correct, complete and always pertinent to the report.

Logical Reasoning [4]
  Level 0-1: Presents the mathematical content in an illogical manner; major steps are omitted or significant leaps required to follow development
  Level 2: Presents the mathematical content in a fairly logical manner; minor steps may be omitted
  Level 3: Presents the mathematical content in a logical manner.
  Level 4: Presents the mathematical content in a logical manner, with all steps clearly shown.

Conclusion [8]
  Criteria: Student restates and summarizes in a clear manner what the overall results of the work discovered or did not discover and states and comments on any limitations of the analysis.
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective

Future Work [4]
  Criteria: Student outlines, in detail, where the project could go if it were analyzed further. States some limitations of the analysis.
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective

Appendix & Glossary [2]
  Criteria: Student includes appendices with data tables of the raw data when necessary. Uses a small font to reduce the total number of pages used.
  Level 0-1: Limited effectiveness
  Level 2: Some effectiveness
  Level 3: Considerable effectiveness
  Level 4: Highly effective

86 marks total
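
Worked example for the Analysis section (numerical statistics)

If you would rather script your calculations than do them in Excel or Fathom, the numerical statistics listed under Analysis, part (a), could be sketched roughly as follows. This is only a minimal sketch, assuming Python 3.8 or later. The data values are invented for illustration and are not real NBA figures. The function names (describe, linear_fit, z_score) are hypothetical helpers, not part of any course software. Note also that Python's default quartile method may not match the method taught in your textbook, so check Q1 and Q3 by hand.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only (NOT real NBA data):
# points per game (independent variable) and salary in $ millions (dependent).
points = [8.0, 12.0, 15.0, 20.0, 25.0]
salary = [2.0, 4.0, 5.0, 9.0, 10.0]


def describe(values):
    """Return mean, median, sample standard deviation, Q1, Q3 and IQR."""
    # statistics.quantiles uses the 'exclusive' method by default; a textbook
    # "median of each half" method can give slightly different quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def linear_fit(x, y):
    """Least-squares line of best fit: returns slope, intercept and r."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r


def z_score(value, values):
    """Number of sample standard deviations that value lies from the mean."""
    return (value - statistics.mean(values)) / statistics.stdev(values)


if __name__ == "__main__":
    print(describe(points))
    slope, intercept, r = linear_fit(points, salary)
    print(f"salary = {slope:.3f} * points + {intercept:.3f}, r = {r:.3f}")
    print(f"z-score of 25 points per game: {z_score(25.0, points):.2f}")
```

Remember that a value of r close to 1 or -1 describes only the strength of a linear relationship. It does not show cause and effect, and you still need to interpret the numbers in descriptive sentences as required in part (c).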