Statistics for Data Science - 1
Week 1 Graded Assignment

Syllabus covered:
1. Classification of statistics
2. Understanding the notion of sample and population
3. Classification of data
4. Understanding the notion of case/observation and variable
5. Classification of variables: Numerical and categorical
6. Scales of measurement for variable: Nominal, ordinal, interval and ratio

[Total 20 marks]

Use the following information and the data given in Table 1.1.A to answer questions 1 and 2.

A local government wants to assess the number of rich, middle class, and poor people living in a particular region. To understand this, the government defined the categories based on the annual income of its citizens [Table 1.1.A]. Income is measured in lakhs of rupees per year.

Poor : income ≤ 4 lakhs
Middle Class : 4 lakhs < income ≤ 8 lakhs
Rich : income > 8 lakhs

Person      Income
Person 1    Rich
Person 2    Poor
Person 3    Poor
Person 4    Middle class
Person 5    Middle class
Person 6    Rich
Person 7    Middle class
Person 8    Poor
Person 9    Middle class
Person 10   Middle class

Table 1.1.A: Income dataset

1. What kind of a variable is income in the given dataset? [1 mark]
(a) Categorical
(b) Discrete numerical variable
(c) Continuous numerical variable

2. What is the scale of measurement of the variable income in the given dataset? [1 mark]
(a) Nominal
(b) Interval
(c) Ordinal
(d) Ratio

Use the following information and the data given in Table 1.2.A and Table 1.3.A to answer questions 3, 4, and 5.

Google provides users with the option to rate any app on their Play Store. A user can rate an app by giving an integer rating from 1 to 5. The rating of the app is then computed as the arithmetic mean of all user ratings. Tables 1.2.A and 1.3.A show the ratings given by users and the average rating of each app, respectively.
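The averaging rule described above can be checked with a short Python sketch. It transcribes the user ratings from Table 1.2.A, groups them by app, and takes the arithmetic mean of each group. The function name `app_ratings` and the (app, user, rating) tuple layout are hypothetical choices made for illustration; this only demonstrates the arithmetic stated in the assignment, not how the Play Store actually stores or computes ratings.

```python
from collections import defaultdict

# (app, user, rating) rows transcribed from Table 1.2.A
ratings = [
    ("App 1", "User 1", 4),
    ("App 1", "User 2", 3),
    ("App 1", "User 3", 2),
    ("App 2", "User 4", 5),
    ("App 2", "User 5", 4),
    ("App 2", "User 6", 3),
    ("App 2", "User 7", 5),
    ("App 3", "User 8", 4),
    ("App 3", "User 9", 3),
]


def app_ratings(rows):
    """Return each app's rating as the arithmetic mean of its user ratings."""
    by_app = defaultdict(list)
    for app, _user, rating in rows:
        # Each user rating must be an integer from 1 to 5.
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            raise ValueError(f"invalid rating {rating!r} for {app}")
        by_app[app].append(rating)
    # Mean = sum of the app's ratings divided by how many users rated it.
    return {app: sum(vals) / len(vals) for app, vals in by_app.items()}


for app, mean in app_ratings(ratings).items():
    print(app, mean)
```

Running it prints 3.0, 4.25 and 3.5 for App 1, App 2 and App 3, which matches Table 1.3.A (for example, App 2 gets (5 + 4 + 3 + 5) / 4 = 4.25).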
App name  User    Rating given by user
App 1     User 1  4
App 1     User 2  3
App 1     User 3  2
App 2     User 4  5
App 2     User 5  4
App 2     User 6  3
App 2     User 7  5
App 3     User 8  4
App 3     User 9  3

Table 1.2.A: Ratings given by users

App name  Rating of app
App 1     3
App 2     4.25
App 3     3.5

Table 1.3.A: Ratings of apps

3. What kind of a variable is the rating given by each user to an app? [1 mark]
(a) Categorical
(b) Discrete numerical variable
(c) Continuous numerical variable

4. What is the scale of measurement of the rating given by the user? [1 mark]
(a) Ordinal
(b) Nominal
(c) Interval
(d) Ratio

5. What kind of a variable is the overall rating of an app on the Play Store? [1 mark]
(a) Categorical
(b) Discrete numerical variable
(c) Continuous numerical variable

Use the following information and the data given in Table 1.4.A to answer questions 6, 7, and 8.

Karthik wants to start a shoe production company for the entire Indian market. To analyse public preferences, he hires a team of market researchers who conduct market research in two states: Andhra Pradesh and Maharashtra. The market research team prepares a form to collect the data; the collected dataset is shown in Table 1.4.A.

S.No  Name    Age (years)  Shoe material  Purpose  Price range  Suggestions
1     Rahul   37           Leather        Formal   High         Quality should be better, with better pricing
2     Joseph  25           Rubber         Sports   Medium       Heel quality should be better
3     Satya   18           Synthetics     Running  Low          Variety of colors and styles are required
4     Krish   29           Plastic        Running  Medium       Heel height shouldn’t be high
5     Yamini  43           Foam           Sports   Medium       Flexible, abrasive sole
6     Ashish  57           Synthetics     Running  High         Cushioning should be comfortable
7     Shoaib  34           Plastic        Formal   Medium       Breathable mesh for sports shoe
8     Satwik  65           Leather        Formal   Medium       Heel should be very hard for trekking purpose
9     Mayur   22           Plastic        Sports   Low          Should be water resistant and last longer

Table 1.4.A: Shoe manufacturing market research

6. Is the sample representative of the population? [1 mark]
(a) True
(b) False

7. How many variables from the collected survey data are categorical? [1 mark]

8. Which of the collected data is unstructured? [1 mark]
(a) S.No
(b) Name
(c) Age (years)
(d) Shoe material
(e) Purpose
(f) Price range
(g) Suggestions

Use the following information and the data given in Table 1.5.A to answer questions 9, 10, and 11.

Naresh is a bike enthusiast and wants to know the specifications of all the major bikes available in the market. He collects data on different bike models, which is given in Table 1.5.A.

[Table 1.5.A: Bike dataset]

9. Which of the following is (are) case(s) or observation(s) for the given bike dataset? [2 marks]
(a) KTM 790 Duke
(b) Yamaha MT 09
(c) Yamaha
(d) Displacement in CC

10. Which of the following is (are) continuous variable(s) in the given bike dataset? [2 marks]
(a) Mileage
(b) Displacement
(c) Number of cylinders
(d) Average Rating

11. Statement: The sample should be representative of the population. [3 marks]
(a) True, since the sample is a subset of the population.
(b) True, because the sample is a subset of the population and we derive properties of the population, such as the mean and median, from the sample.
(c) False, a sample need not be representative of the population.

Use the following information to answer questions 12, 13, and 14.

The head of a locality in Chennai conducted a survey in her locality regarding Covid-19. According to the survey:
Statement 1: 59% of the residents consider Covid-19 to be a serious disease.
Statement 2: 41% consider it a normal flu and are of the opinion that there should not be any lockdown.

12. Which of the following is a case/observation for the above data? [1 mark]
(a) India
(b) Person who answered the survey
(c) Covid-19
(d) Flu

13. If all the residents of the locality participated in the survey, what statistical analysis can be done using this data? [2 marks]
(a) Inferential statistics
(b) Descriptive statistics

14. Suppose the survey collected data from 100 residents of the locality chosen at random, and statements 1 and 2 above were issued based on that data. Now, what statistical analysis can be done using this data? [2 marks]
(a) Inferential statistics
(b) Descriptive statistics