Laboratory work ”Basic Statistics” Instructions

732A47 Text Mining
Division of Statistics
Department of Computer and Information Science

Please include in your report all diagrams and Python code that support your conclusions.

Assignment 1: Generating random numbers and hypothesis testing

1. Use the module scipy.stats to generate a sample X of 50 observations from a Poisson distribution with mean 4.
2. Write code that approximates the mean and the variance from the sample:
   Mean(X) = \bar{X} = (X_1 + \dots + X_{50}) / 50
   Var(X) = \overline{X^2} - (\bar{X})^2, where \overline{X^2} = (X_1^2 + \dots + X_{50}^2) / 50
3. Look up the theoretical mean and variance of this distribution. How close are they to the values you computed in step 2? What do you conclude?
4. Use the Counter class from the collections module to find the unique values present in X and their frequencies.
5. Create a bar plot showing the frequencies of all observed values of X (use bar() ). Does it look like a Poisson distribution?
6. For each Z from 0 to 13, compute the probability mass function of the Poisson distribution with mean 4 and present the result as a bar plot (use poisson.pmf() and bar() ). Compare this plot with the previous one and draw conclusions.
7. Use the function mstats.ttest_1samp and the sample generated in step 1 to test whether
   a. the mean value is 2.0
   b. the mean value is 3.7
   c. the mean value is 4.3

Assignment 2: Classifying data with Support Vector Machines

This assignment uses the svm module from sklearn: http://scikit-learn.org/stable/modules/svm.html
Read that page before attempting the classification.

1. Generate two samples X1 and X2, each of size n = 100, with values drawn from Exponential(scale=5).
2. For each pair X1(i), X2(i), generate Y(i) according to the following rule:
   Y = 'Blue' if X1*X2 \le 30 + \epsilon
   Y = 'Orange' if X1*X2 > 30 + \epsilon
   where \epsilon ~ Normal(0, scale=10)
3. Plot the data in the coordinates X1 and X2, using Y as the color of the points (use scatter() ). Are the data well separated?
4. Pack X1 and X2 into a sequence X of (X1, X2) pairs with the function zip() (in Python 3, wrap the result in list() ).
5. Fit the following SVM models to the data (X, Y):
   a. kernel='linear'
   b. kernel='rbf', gamma=0.7
6. Use the fitted models to predict the Y values for all X values.
7. Make the same kind of plot as in step 3 for each model, but color the points by the fitted values. What do you conclude?
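The numeric steps of Assignment 1 can be sketched as below. This is one possible outline, not a reference solution. The helper names (draw_sample, sample_mean_var, plot_bars) and the fixed seed are assumptions added for illustration; the lab does not prescribe them. The plotting helper imports matplotlib only when it is called, so the numeric part runs without it.

```python
from collections import Counter

import numpy as np
from scipy import stats


def draw_sample(n=50, mean=4, seed=0):
    """Draw n Poisson(mean) observations (the seed is an assumption, for reproducibility)."""
    return stats.poisson.rvs(mu=mean, size=n, random_state=seed)


def sample_mean_var(x):
    """Return the sample mean and the variance computed as mean(X^2) - mean(X)^2."""
    x = np.asarray(x, dtype=float)
    mean_x = x.sum() / len(x)
    var_x = (x ** 2).sum() / len(x) - mean_x ** 2
    return mean_x, var_x


def plot_bars(values, heights, title):
    """Bar plot helper (hypothetical name); matplotlib is imported lazily."""
    import matplotlib.pyplot as plt

    plt.bar(values, heights)
    plt.title(title)
    plt.xlabel("value")
    plt.show()


# Steps 1-3: sample, then compare with the theoretical mean and variance (both 4).
x = draw_sample()
mean_x, var_x = sample_mean_var(x)
print(f"sample mean = {mean_x:.3f}, sample variance = {var_x:.3f}")

# Step 4: unique values and their frequencies.
freq = Counter(x.tolist())
values = sorted(freq)
counts = [freq[v] for v in values]

# Step 6: theoretical probability mass function for Z = 0..13.
z = np.arange(14)
pmf = stats.poisson.pmf(z, mu=4)

# Step 7: one-sample t-tests against the three hypothesised means.
for mu0 in (2.0, 3.7, 4.3):
    t_stat, p_value = stats.mstats.ttest_1samp(x, mu0)
    print(f"H0: mean = {mu0}: t = {float(t_stat):.3f}, p = {float(p_value):.4f}")
```

For steps 5 and 6, calling plot_bars(values, counts, "Observed frequencies") and plot_bars(z, pmf, "Poisson(4) pmf") draws the two bar plots to compare.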
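A minimal sketch of the Assignment 2 workflow follows, under stated assumptions. It treats the 'Orange' branch as X1*X2 > 30 + ε, the complement of the 'Blue' condition. It draws the samples with NumPy's random generator and a fixed seed. The helper name plot_points is hypothetical. None of these choices come from the lab text.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)  # fixed seed is an assumption, for reproducibility
n = 100

# Step 1: two exponential samples with scale 5.
x1 = rng.exponential(scale=5, size=n)
x2 = rng.exponential(scale=5, size=n)

# Step 2: noisy labels; 'Orange' is assumed to be the complement of 'Blue'.
eps = rng.normal(loc=0, scale=10, size=n)
y = np.where(x1 * x2 <= 30 + eps, "Blue", "Orange")

# Step 4: pack the coordinates into a list of (x1, x2) pairs.
X = list(zip(x1, x2))

# Steps 5-6: fit both kernels and predict on the training points.
models = {
    "linear": svm.SVC(kernel="linear"),
    "rbf": svm.SVC(kernel="rbf", gamma=0.7),
}
predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predictions[name] = model.predict(X)
    accuracy = np.mean(predictions[name] == y)
    print(f"{name}: training accuracy = {accuracy:.2f}")


def plot_points(a, b, labels, title):
    """Scatter plot helper (hypothetical name); matplotlib is imported lazily."""
    import matplotlib.pyplot as plt

    plt.scatter(a, b, c=[str(label).lower() for label in labels])
    plt.title(title)
    plt.xlabel("X1")
    plt.ylabel("X2")
    plt.show()
```

Calling plot_points(x1, x2, y, "Generated labels") gives the plot for step 3, and plot_points(x1, x2, predictions["rbf"], "RBF predictions") gives one of the plots for step 7. The accuracy printed here is measured on the training data only, so it says nothing about performance on new points.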
