732A47 Text Mining
Division of Statistics
Department of Computer and Information Science

Laboratory work "Basic Statistics"

Instructions

Please include in your report all diagrams and Python code that support your conclusions.

Assignment 1: Generating random numbers and hypothesis testing

1. Use the module scipy.stats to generate a sample X of 50 observations from a Poisson distribution with mean 4.
2. Write code that approximates the mean and the variance from the sample:
   $$\text{mean} = \bar{X} = \frac{X_1 + \dots + X_{50}}{50}$$
   $$\text{var} = \overline{X^2} - (\bar{X})^2, \quad \text{where } \overline{X^2} = \frac{X_1^2 + \dots + X_{50}^2}{50}$$
3. Look up the theoretical mean and variance of this distribution. How close are they to the values you computed in step 2? What do you conclude?
4. Use the Counter class from the collections module to find the unique values in X and their frequencies.
5. Create a bar plot of the frequencies of all observed values of X (use bar()). Does it look like a Poisson distribution?
6. For each integer Z from 0 to 13, compute the probability mass function of the Poisson distribution with mean 4 and present the result as a bar plot (use poisson.pmf() and bar()). Compare this plot with the previous one and draw conclusions.
7. Use the function mstats.ttest_1samp and the sample generated in step 1 to test whether
   a. the mean value is 2.0
   b. the mean value is 3.7
   c. the mean value is 4.3

Assignment 2: Classifying data with Support Vector Machines

Here you will use the svm module from sklearn:
http://scikit-learn.org/stable/modules/svm.html
Please read this page before performing the classification in this assignment.

1. Generate two samples X1 and X2, each of size n = 100, with values drawn from Exponential(scale=5).
2. For each pair X1(i), X2(i), generate the label Y(i) according to the following rule:
   $$Y = \begin{cases} \text{'Blue'} & \text{if } X_1 X_2 \le 30 + \epsilon \\ \text{'Orange'} & \text{if } X_1 X_2 > 30 + \epsilon \end{cases} \qquad \text{where } \epsilon \sim \text{Normal}(0,\ \text{scale} = 10)$$
3. Plot the data in the coordinates X1 and X2, coloring each point by Y (use scatter()). Are the data well separated?
4. Pack X1 and X2 into X with the function zip(), so that each element of X is the tuple (X1(i), X2(i)).
5. Fit the following SVM models to the data (X, Y):
   a. kernel='linear'
   b. kernel='rbf', gamma=0.7
6. Use the fitted models to predict Y for all X values.
7. Make the same kind of plot as in step 3, but use the predicted values to color the points. What do you conclude?
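One possible way to carry out Assignment 1, steps 1 to 7, is sketched below. It assumes numpy, scipy and matplotlib are installed. The fixed seed (random_state=1), the non-interactive Agg backend and the output file names poisson_observed.png and poisson_pmf.png are illustrative choices and are not part of the assignment. The variance is computed with the divide-by-n formula from step 2, so it matches numpy's default X.var().

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # assumption: render plots to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import mstats, poisson

# Step 1: 50 observations from Poisson(mean 4); random_state=1 is an arbitrary seed
X = poisson.rvs(mu=4, size=50, random_state=1)

# Step 2: sample mean and variance, following mean = sum(X)/n and var = mean(X^2) - mean^2
n = len(X)
mean = X.sum() / n
var = (X ** 2).sum() / n - mean ** 2

# Step 3: theoretical moments; for a Poisson distribution both equal mu = 4
theo_mean, theo_var = poisson.stats(mu=4, moments="mv")
print(f"mean: sample {mean:.3f}, theoretical {float(theo_mean):.1f}")
print(f"var:  sample {var:.3f}, theoretical {float(theo_var):.1f}")

# Step 4: unique values and their frequencies
freq = Counter(X.tolist())  # tolist() gives plain Python ints as keys
print(sorted(freq.items()))

# Step 5: bar plot of the observed frequencies
values = sorted(freq)
plt.figure()
plt.bar(values, [freq[v] for v in values])
plt.xlabel("x")
plt.ylabel("frequency")
plt.title("Observed frequencies of X")
plt.savefig("poisson_observed.png")

# Step 6: theoretical pmf of Poisson(4) for z = 0, ..., 13
z = np.arange(0, 14)
pmf = poisson.pmf(z, mu=4)
plt.figure()
plt.bar(z, pmf)
plt.xlabel("z")
plt.ylabel("P(X = z)")
plt.title("Poisson(4) probability mass function")
plt.savefig("poisson_pmf.png")

# Step 7: one-sample t-tests of H0: mean = mu0 (two-sided by default)
results = {mu0: mstats.ttest_1samp(X, mu0) for mu0 in (2.0, 3.7, 4.3)}
for mu0, res in results.items():
    print(f"H0: mean = {mu0}: t = {float(res.statistic):.3f}, p = {float(res.pvalue):.4f}")
```

Placing the two saved bar plots side by side in the report makes the comparison in step 6 easier. Dividing the observed frequencies by 50 puts them on the same scale as the pmf values.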
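The sketch below covers Assignment 2, steps 1 to 7. It assumes scikit-learn's svm.SVC classifier, since the assignment names only the svm module. The seeds, the helper function plot_labels and the output file names are hypothetical choices made for illustration. The labels 'Blue' and 'Orange' are lowercased only when they are passed to scatter() as matplotlib color names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render plots to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import expon, norm
from sklearn import svm

n = 100

# Step 1: two Exponential(scale=5) samples; the seeds are arbitrary choices
X1 = expon.rvs(scale=5, size=n, random_state=1)
X2 = expon.rvs(scale=5, size=n, random_state=2)

# Step 2: noisy threshold 30 + eps with eps ~ Normal(0, scale=10)
eps = norm.rvs(loc=0, scale=10, size=n, random_state=3)
Y = np.where(X1 * X2 <= 30 + eps, "Blue", "Orange")


def plot_labels(labels, title, filename):
    """Hypothetical helper: scatter X1 vs X2, coloring each point by its label."""
    plt.figure()
    plt.scatter(X1, X2, c=[label.lower() for label in labels])
    plt.xlabel("X1")
    plt.ylabel("X2")
    plt.title(title)
    plt.savefig(filename)


# Step 3: plot the true labels
plot_labels(Y, "True labels", "svm_true.png")

# Step 4: list of (X1(i), X2(i)) tuples; zip() alone returns an iterator
X = list(zip(X1, X2))

# Step 5: fit both SVM classifiers
models = {
    "linear": svm.SVC(kernel="linear"),
    "rbf": svm.SVC(kernel="rbf", gamma=0.7),
}
for model in models.values():
    model.fit(X, Y)

# Step 6: predict Y for all X
preds = {name: model.predict(X) for name, model in models.items()}

# Step 7: plot the fitted labels and report training accuracy
for name, pred in preds.items():
    plot_labels(pred, f"SVC kernel={name!r}: fitted labels", f"svm_{name}.png")
    print(f"{name}: training accuracy {np.mean(pred == Y):.2f}")
```

The features are not rescaled, and their values often exceed 10. With gamma=0.7 the RBF kernel is therefore quite local, so that model may fit the training points almost perfectly. A high training accuracy for it should be read with caution and not taken as evidence that it generalizes better than the linear model.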