Lesson Plan Data Mining Basics  BIM 1 

Data Mining Basics, BIM 1
Business Management & Administration Lesson Plan
Performance Objective
The student understands and is able to recall information on data mining basics.

Specific Objectives
• The student is expected to discuss the nature of data mining.
• The student is expected to describe data mining tools and techniques.

Terms
• Data Mining‐ The process of analyzing data from different perspectives and summarizing it into useful information.
• Perspective‐ A particular attitude toward or way of regarding something.
• Database‐ A structured set of data held in a computer.
• Gather‐ To collect.
• Information‐ Facts provided or learned about something or someone.
• Data‐ Facts and statistics collected together for reference or analysis.
• Data Gathering Tool‐ A tool used to collect and record information.
• Analysis‐ Detailed examination of the elements or structure of something, typically as a basis for discussion or interpretation.
• Analytical Tools‐ Any tools used to help with analysis.
• Regression Analysis‐ A procedure for determining a relationship between a dependent variable and an independent variable.
• Query‐ A request for information from a database.
• Consumer Trends‐ Habits or behaviors currently prevalent among consumers of goods or services.
• Extract‐ To get, pull, or draw out, usually with special effort, skill, or force.
• Transform‐ To change in form, appearance, or structure.
• Infrastructure‐ The basic underlying framework or features of a system or organization.

Time
When taught as written, this lesson should take approximately 150 minutes to teach.

Preparation
Copyright © Texas Education Agency, 2014. All rights reserved.
TEKS Correlations
This lesson, as published, correlates to the following TEKS. Any changes/alterations to the activities may result in the elimination of any or all of the TEKS listed.

130.114 (c) Knowledge and Skills
The student applies data mining methods to acquire pertinent information for business decision making. The student is expected to:
(a) Discuss the nature of data mining; and
(b) Describe data mining tools and techniques.

Interdisciplinary Correlations
English‐English IV
• 110.34(b)(1) Reading/Vocabulary Development. Students understand new vocabulary and use it when reading and writing.
• 110.34(b)(17) Students understand the function of and use the conventions of academic language when speaking and writing. Students will continue to apply earlier standards with greater complexity.
• 110.34(b)(18) Students will write legibly and use appropriate capitalization and punctuation conventions in their compositions. Students are expected to correctly and consistently use conventions of punctuation and capitalization.
• 110.34(b)(19) Students are expected to spell correctly, including using various resources to determine and check correct spellings.
• 110.34(b)(12) Students use comprehension skills to analyze how words, images, graphics, and sounds work together to impact meaning.
• 110.34(b)(22) Students clarify research questions and evaluate and synthesize collected information.
• 110.34(b)(23) Students organize and present their ideas and information according to the purpose and research and their audience.

Occupational Correlation (O*Net – www.onetonline.org/)
Job Title: Operations Research Analyst
O*Net Number: 15‐2031.00
Reported Job Titles: Operations Research Analyst, Operations Research Manager, Business Analytics Director
Tasks
• Define data requirements and gather and validate information, applying judgment and statistical tests.
• Perform validation and testing of models to ensure adequacy and reformulate models as necessary.
• Prepare management reports defining and evaluating problems and recommending solutions.
Soft Skills: Complex Problem Solving, Critical Thinking, Judgment and Decision Making

Accommodations for Learning Differences
It is important that lessons accommodate the needs of every learner. These lessons may be modified to accommodate your students with learning differences by referring to the files found on the Special Populations page of this website.
Preparation
• Review and familiarize yourself with the terminology, all website links, and any resource materials required.
• Have materials and websites ready prior to the start of the lesson.

References
• http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

Instructional Aids
• Lesson 5.1 Presentation
• Data Gathering Tool (Spreadsheet)
• Instructor Computer/Projection Unit
• Online Website listed in the References Section

Introduction
The main purpose of this lesson is to help students understand the terminology and nature of data mining and be able to describe and discuss data mining tools and techniques/methods.

Say: Currently, every two days, we create as much information as we did from the dawn of civilization until 2003. (This is a quote from Eric Schmidt.)
Ask: On any given day, what are some ways you gather information?
Ask: Do you think any of these ways are better than others? Why or why not?
Ask: Do you realize what kind of information is being gathered on you on any given day?
Ask: What methods do you think are being used to gather information on you?
Say: Now that we have discussed ways to gather information/data, let’s gather some!
Outline
I. Vocabulary/Personal Word Walls
During the 1st week of school, students will have created personal, possibly electronic, Word Walls as a project. The method and location will be established by the teacher.

II. Introduction (Ask and Say)
Specifics are listed in both this document and in the presentation.

III. Discovery Activity
• How do we gather information?
• Activity Review
After the discussion questions, students will be given a data gathering tool to collect data on their classmates. Once they have gathered and recorded their information, they will analyze their findings and make predictions/assumptions based on them.

IV. Instruction/Discussion
• When did data mining start?
• Why use data mining?
• How does it work?
• What makes it data mining?
• Different levels of analysis
• What kind of technological infrastructure is required?
Share with students the information provided. There are several other options to share the same information via posted videos and/or posted research.

V. Review and Evaluation
Review the main points from the lesson with students and then give them the provided assessment.

VI. Extensions
Extension 1= Students can do individually, with a partner, or within a group.
Extension 2= Classroom
Multiple Intelligences Guide
Existentialist, Interpersonal, Intrapersonal, Kinesthetic/Bodily, Logical/Mathematical, Musical/Rhythmic, Naturalist, Verbal/Linguistic, Visual/Spatial

Application

Discovery Activity
Using the provided template, gather data on classmates. Students can use the provided fields or they can customize their data gathering tool (spreadsheet).

Instruction/Discussion/Information
This information is provided to help get your students ready to do some sort of application of the information in Lesson 3 of this unit. Share this with your students in the manner that works best for you and your classroom.

Assessment
Use the provided assessment. Key is provided.

Summary

Review and Lesson Evaluation
Review the lesson’s purpose and evaluate its effectiveness.

Evaluation

Informal Assessment
Any and all of the following can be used as informal assessments:
• Class participation
• Discovery Activity‐Data Gathering Tool

Formal Assessment
• Data Mining Basics

Enrichment
Extensions
• Data Mining Software
Research and report on three different types of data mining software available for purchase. Include the name of the software, its capabilities, its infrastructure requirements, and any companies that use it (if available).
• Data Mining for Us
Set up the classroom as a business in which people could make purchases. Keep the “store” open for a week and record all daily purchases (time of purchase, gender of purchaser, age of purchaser, item purchased, quantity purchased, etc.). Once you have gathered and recorded data, make predictions about what should be sold next week and where those items should be placed in the classroom.
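For classes that want to analyze the "Data Mining for Us" purchase log with a short program instead of a spreadsheet, the tally step might look like the sketch below. It is only an illustration and not part of the published lesson: the records, field names, and the function names `units_sold_by_item` and `best_sellers` are all hypothetical.

```python
from collections import Counter

# Hypothetical purchase log for the classroom "store". Every value here is
# made up for illustration; the fields follow the ones the extension suggests.
purchases = [
    {"time": "09:05", "gender": "F", "age": 16, "item": "pencil", "quantity": 2},
    {"time": "09:40", "gender": "M", "age": 17, "item": "notebook", "quantity": 1},
    {"time": "11:15", "gender": "F", "age": 15, "item": "pencil", "quantity": 3},
    {"time": "12:30", "gender": "M", "age": 16, "item": "eraser", "quantity": 1},
    {"time": "13:10", "gender": "F", "age": 17, "item": "notebook", "quantity": 2},
]


def units_sold_by_item(records):
    """Add up the quantity purchased for each item."""
    totals = Counter()
    for record in records:
        totals[record["item"]] += record["quantity"]
    return totals


def best_sellers(records, top_n=2):
    """Return the top_n items by units sold, highest first."""
    return [item for item, _ in units_sold_by_item(records).most_common(top_n)]


print(units_sold_by_item(purchases))
print(best_sellers(purchases))
```

The best sellers from one week are a simple starting point for predicting what to stock next week. The same tally could be repeated by time of day or by age of purchaser to look for other patterns.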
Lesson 5.1-Data Mining Basics
Formal Assessment
Objective: To determine your level of understanding of the nature of data mining and your ability to
describe data mining tools and techniques.
Please answer the following questions (be specific and detailed).
1. What is the definition of data mining?
2. Of the two manual methods of data mining that were described in the lesson, pick one and identify
and explain it.
3. What are three reasons someone would use data mining?
4. Of the six different levels of analysis used in data mining, pick one and explain it in your own
words.
5. What are two very important questions someone should ask and answer before purchasing any
data mining software?
Each question will be worth 20 points.
Lesson 5.1-Data Mining Basics
Formal Assessment Key
Objective: To determine your level of understanding of the nature of data mining and your ability to
describe data mining tools and techniques.
Please answer the following questions (be specific and detailed).
1. What is the definition of data mining?
a. Data Mining is the process of analyzing data from different perspectives and summarizing it
into useful information.
2. Of the two manual methods of data mining that were described in the lesson, pick one and identify
and explain it.
• Bayes’ Theorem (Thomas Bayes, 1700s): Probability measures a degree of belief, and
Bayes’ Theorem links the degree of belief in a proposition before and after accounting for
evidence. For example, suppose somebody proposes that a biased coin is twice
as likely to land heads as tails. Degree of belief in this might initially be 50%. The coin is
then flipped a number of times to collect evidence. Belief may rise to 70% if the evidence
supports the proposition.
• Regression Analysis (1800s): A statistical process for estimating the relationships among
variables, specifically between one dependent variable and one or more independent variables. It
helps one understand how the typical value of the dependent variable (or 'Criterion
Variable') changes when any one of the independent variables is varied, while the other
independent variables are held fixed.
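For teachers who want to show the two manual methods with numbers, the sketch below works through both. It is a minimal illustration under stated assumptions, not part of the published key. For Bayes' Theorem it assumes the only alternative to the "twice as likely heads" proposition is a fair coin, and the flip sequence is made up. For regression it fits a straight line to one independent variable using made-up data. The function names are hypothetical.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return the belief in a proposition after one piece of evidence."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)


def belief_after_flips(flips, prior=0.5):
    """Update belief that the coin is biased, one flip at a time.

    Proposition: heads is twice as likely as tails, so P(heads) = 2/3.
    Assumed alternative (not from the lesson): the coin is fair, P(heads) = 1/2.
    """
    belief = prior
    for flip in flips:
        if flip == "H":
            belief = bayes_update(belief, 2 / 3, 1 / 2)
        else:
            belief = bayes_update(belief, 1 / 3, 1 / 2)
    return belief


def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up evidence: 7 heads and 3 tails in 10 flips.
print(belief_after_flips("HHTHHTHHTH"))

# Made-up data: x is the independent variable, y is the dependent variable.
print(fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))
```

With these made-up flips, the belief starts at 50% and ends at roughly 69%, close to the 70% mentioned in the coin example. The regression data lie exactly on a line, so the fitted slope is 2 and the intercept is 1.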
3. What are three reasons someone would use data mining?
• To determine market trends
• To save money
• To make money
• To determine future consumer spending
• To analyze consumer spending habits
• To help determine patterns
• To save time
4. Of the six different levels of analysis used in data mining, pick one and explain it in your own
words.
• Artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.
• Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts of
natural evolution.
• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions
generate rules for the classification of a dataset. Specific decision tree methods include
Classification and Regression Trees (CART) and Chi Square Automatic Interaction
Detection (CHAID). CART and CHAID are decision tree techniques used for classification
of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset
to predict which records will have a given outcome. CART segments a dataset by creating
2-way splits while CHAID segments using chi square tests to create multi-way splits. CART
typically requires less data preparation than CHAID.
• Nearest neighbor method: A technique that classifies each record in a dataset based on a
combination of the classes of the k record(s) most similar to it in a historical dataset (where
k ≥ 1). Sometimes called the k-nearest neighbor technique.
• Rule induction: The extraction of useful if-then rules from data based on
statistical significance.
• Data visualization: The visual interpretation of complex relationships in multidimensional
data. Graphics tools are used to illustrate data relationships.
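Of the six levels of analysis, the nearest neighbor method is the easiest to show in a few lines. The sketch below is a minimal illustration, assuming straight-line (Euclidean) distance as the measure of similarity and a simple majority vote to combine the classes. The customer records, the labels, and the function name `knn_classify` are hypothetical and not part of the lesson.

```python
import math
from collections import Counter


def knn_classify(record, history, k=3):
    """Classify a record by majority vote of its k most similar records.

    history is a list of (features, label) pairs from a historical dataset.
    Similarity is assumed to be Euclidean distance between feature tuples.
    """
    nearest = sorted(history, key=lambda pair: math.dist(record, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up historical dataset: (age, purchases per week) -> usual purchase type.
history = [
    ((15, 1), "snacks"),
    ((16, 2), "snacks"),
    ((17, 1), "snacks"),
    ((40, 8), "supplies"),
    ((45, 9), "supplies"),
    ((50, 7), "supplies"),
]

print(knn_classify((18, 2), history))
print(knn_classify((42, 8), history))
```

Here the new 18-year-old customer is closest to the three teenage records, so the vote predicts "snacks". The 42-year-old customer is closest to the three older records, so the vote predicts "supplies".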
5. What are two very important questions someone should ask and answer before purchasing any
data mining software?
• How big is/will be your database?
• How complex are/will be your queries?
Each question will be worth 20 points.