Laboratory work ”Basic Statistics” Instructions

732A47 Text Mining
Division of Statistics
Department of Computer and Information Science
Please include in your report all diagrams and Python code that support your conclusions.
Assignment 1: Generating random numbers and hypothesis testing
1. Use the scipy.stats module to generate a sample X of 50 observations from a Poisson
distribution with mean 4.
2. Write the code that computes an approximation to the mean value and the variance by using
the sample:
(𝑋1 + β‹― + 𝑋50 )
π‘€π‘’π‘Žπ‘›π‘‹ = 𝑋� =
π‘‰π‘Žπ‘Ÿπ‘‹ = οΏ½οΏ½οΏ½οΏ½
𝑋 2 − (𝑋�)2
50
3. Check the theoretical values of the mean and the variance of this distribution. How close
are they to the values you computed in step 2? What do you conclude?
4. Use the Counter class from the collections module to find the unique values present in X and
their frequencies.
5. Create a barplot showing frequencies for all observed values of X. Does this look like a
Poisson distribution? (use bar() )
6. For each Z from 0 to 13, compute the probability mass function of the Poisson distribution
with mean 4 and present the result as a barplot (use poisson.pmf() and bar() ). Compare this
plot with the previous one and draw conclusions.
7. Use the function mstats.ttest_1samp (from scipy.stats) and the sample generated in step 1 to test
whether
a. Mean value is 2.0
b. Mean value is 3.7
c. Mean value is 4.3
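As a rough illustration of how steps 1 to 7 might fit together, the following sketch uses scipy.stats, collections.Counter and, for the plots, matplotlib. The seed, the variable names (mean_x, var_x, counts, pmf) and the helper plot_sample_vs_pmf are assumptions made for this example and are not prescribed by the assignment; bar() is assumed to be matplotlib.pyplot.bar.

```python
from collections import Counter

import numpy as np
from scipy.stats import mstats, poisson

# Step 1: 50 observations from a Poisson distribution with mean 4.
# Assumption: the fixed seed is only there to make the sketch reproducible.
X = poisson.rvs(mu=4, size=50, random_state=12345)

# Step 2: sample mean, and variance computed as mean(X^2) - mean(X)^2.
mean_x = X.sum() / 50
var_x = np.mean(X.astype(float) ** 2) - mean_x ** 2

# Step 3: theoretical mean and variance of Poisson(4).
theo_mean, theo_var = poisson.stats(mu=4, moments="mv")
print(f"sample mean = {mean_x:.3f}, theoretical mean = {float(theo_mean):.3f}")
print(f"sample var  = {var_x:.3f}, theoretical var  = {float(theo_var):.3f}")

# Step 4: unique values in X and how often each occurs.
counts = Counter(X.tolist())
print(sorted(counts.items()))

# Step 6 (numeric part): Poisson(4) pmf for Z = 0, ..., 13.
z = np.arange(14)
pmf = poisson.pmf(z, mu=4)


def plot_sample_vs_pmf(counts, z, pmf):
    """Hypothetical helper for steps 5 and 6: two bar plots side by side."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, (ax_obs, ax_pmf) = plt.subplots(1, 2, figsize=(10, 4))
    values = sorted(counts)
    ax_obs.bar(values, [counts[v] for v in values])
    ax_obs.set_title("Observed frequencies of X")
    ax_pmf.bar(z, pmf)
    ax_pmf.set_title("Poisson(4) pmf")
    return fig


# Step 7: one-sample t-tests against each hypothesized mean.
for mu0 in (2.0, 3.7, 4.3):
    t_stat, p_value = mstats.ttest_1samp(X, mu0)
    print(f"H0: mean = {mu0}: t = {float(t_stat):.3f}, p = {float(p_value):.4f}")
```

Calling plot_sample_vs_pmf(counts, z, pmf) followed by matplotlib.pyplot.show() displays the two bar plots for the comparison in step 6. The printed p-values are the starting point for the conclusions in step 7.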
Assignment 2: Classifying data with Support Vector Machines
Here, you will need to use the svm module from sklearn:
http://scikit-learn.org/stable/modules/svm.html
Please read this page to perform a successful classification in this assignment.
1. Generate two samples X1 and X2, each of size n=100, with values drawn from an
Exponential(scale=5) distribution.
2. For each pair X1(i), X2(i), generate a value Y(i) according to the following formula:
$$Y = \begin{cases} \text{'Blue'} & \text{if } X_1 X_2 \le 30 + \epsilon \\ \text{'Orange'} & \text{if } X_1 X_2 > 30 + \epsilon \end{cases} \qquad \text{where } \epsilon \sim \mathrm{Normal}(0,\ \mathrm{scale} = 10)$$
3. Plot the data in the coordinates X1 and X2, using Y as the color of the points (use scatter() ).
Are the two classes well separated?
4. Pack X1 and X2 into a sequence X of (X1(i), X2(i)) pairs with the function zip() (in Python 3, wrap the result in list() ).
5. Fit the following SVM models to the data (X,Y):
a. kernel='linear'
b. kernel='rbf', gamma=0.7
6. Use the fitted models to predict the Y values for all X values.
7. Make the same kind of plots as in step 3 but use the fitted values to show the color of the
points. Conclusions?
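The steps of this assignment could be sketched roughly as follows. The use of NumPy's random generator for sampling, the seed, the names eps, models and predictions, the helper plot_classes, and the reading of the second case of the formula as X1*X2 > 30 + ε are all assumptions made for illustration; only svm.SVC with the two listed kernels comes from the assignment itself.

```python
import numpy as np
from sklearn import svm

# Assumption: seeded NumPy generator, used only for reproducibility.
rng = np.random.default_rng(0)
n = 100

# Step 1: two samples from Exponential(scale=5).
X1 = rng.exponential(scale=5, size=n)
X2 = rng.exponential(scale=5, size=n)

# Step 2: noisy class labels; eps ~ Normal(0, scale=10), one draw per point.
eps = rng.normal(loc=0, scale=10, size=n)
Y = np.where(X1 * X2 <= 30 + eps, "Blue", "Orange")

# Step 4: pack the coordinates into a list of (x1, x2) pairs.
X = list(zip(X1, X2))

# Step 5: the two SVM models named in the assignment.
models = {
    "linear": svm.SVC(kernel="linear"),
    "rbf": svm.SVC(kernel="rbf", gamma=0.7),
}

# Step 6: fit each model and predict the labels of the training points.
predictions = {}
for name, model in models.items():
    model.fit(X, Y)
    predictions[name] = model.predict(X)
    agreement = np.mean(predictions[name] == Y)
    print(f"{name}: share of points predicted as labeled = {agreement:.2f}")


def plot_classes(x1, x2, labels, title):
    """Hypothetical helper for steps 3 and 7: scatter plot colored by label."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots()
    # Lower-case the labels so they are valid matplotlib color names.
    ax.scatter(x1, x2, c=[str(label).lower() for label in labels])
    ax.set_xlabel("X1")
    ax.set_ylabel("X2")
    ax.set_title(title)
    return fig
```

For step 3, plot_classes(X1, X2, Y, "Generated labels") shows the raw data; for step 7, the same call with predictions["linear"] or predictions["rbf"] in place of Y shows the fitted labels.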