Uploaded by Karthik Kumar

EuclideanDistance

advertisement
05/09/2019
Similarity-distance
Playing arund with Iris
We will use Iris in class to practice some attribute transformations and computing similarities.
In [3]:
import matplotlib.pyplot as plt
from sklearn import datasets
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 2:4] # we only take petal length and petal width.
Y = iris.target
plt.figure(2, figsize=(8, 6))
plt.clf()
# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=Y)
plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.show()
Lets import numpy, we'll use it to get various statistics. Also, let's peek at the data.
https://hub.gke.mybinder.org/user/jupyterlab-jupyterlab-demo-5durbs0y/lab/workspaces/auto-u?clone
1/4
05/09/2019
Similarity-distance
In [4]:
import numpy as np
A = iris.data
a = A[0,:]
b = A[-1,:]
print(a,b)
[5.1 3.5 1.4 0.2] [5.9 3.
5.1 1.8]
In [5]:
c = np.log(a)
In [6]:
d = np.abs(c)
print(d)
[1.62924054 1.25276297 0.33647224 1.60943791]
You can also get the min and max with numpy
In [7]:
for i in range(A.shape[1]):
print(np.min(A[:,i]), np.max(A[:,i]))
4.3
2.0
1.0
0.1
7.9
4.4
6.9
2.5
TODO: Standardize the data by subtracting the mean and dividing by standard deviation.
In [8]:
standardised = ((A-np.mean(A))/np.std(A))
TODO: Compute the Euclidean distance between every pair of points and print the pair with the smallest
distance.
https://hub.gke.mybinder.org/user/jupyterlab-jupyterlab-demo-5durbs0y/lab/workspaces/auto-u?clone
2/4
05/09/2019
Similarity-distance
In [9]:
from sklearn.metrics.pairwise import euclidean_distances
dist = euclidean_distances(standardised)
print(dist)
[[0.
7]
[0.27282639
6]
[0.25832953
6]
...
[2.2594606
5]
[2.35621893
3]
[2.09745567
]]
0.27282639 0.25832953 ... 2.2594606
0.
2.35621893 2.0974556
0.15198777 ... 2.27925353 2.39028652 2.1041753
0.15198777 0.
... 2.3616593
2.45648262 2.1779021
2.27925353 2.3616593
... 0.
0.31230517 0.3243988
2.39028652 2.45648262 ... 0.31230517 0.
0.3891467
2.10417536 2.17790216 ... 0.32439885 0.38914673 0.
In [19]:
for i in range(A.shape[0]):
dist[i,i] = np.max(dist)
print(dist)
[[3.58954366
7]
[0.27282639
6]
[0.25832953
6]
...
[2.2594606
5]
[2.35621893
3]
[2.09745567
6]]
0.27282639 0.25832953 ... 2.2594606
2.35621893 2.0974556
3.58954366 0.15198777 ... 2.27925353 2.39028652 2.1041753
0.15198777 3.58954366 ... 2.3616593
2.27925353 2.3616593
2.45648262 2.1779021
... 3.58954366 0.31230517 0.3243988
2.39028652 2.45648262 ... 0.31230517 3.58954366 0.3891467
2.10417536 2.17790216 ... 0.32439885 0.38914673 3.5895436
In [22]:
print(np.min(dist))
0.0
We can see duplicate data points in dataset, so the min distance will be 0
https://hub.gke.mybinder.org/user/jupyterlab-jupyterlab-demo-5durbs0y/lab/workspaces/auto-u?clone
3/4
05/09/2019
Similarity-distance
In [21]:
indices = list(zip(*np.where(dist==0.0)))
print(indices)
for index in indices:
print("pair of points with minimum distance",A[index[0],:],A[index[1],:])
[(101, 142), (142, 101)]
pair of points with minimum distance [5.8 2.7 5.1 1.9] [5.8 2.7 5.1 1.
9]
pair of points with minimum distance [5.8 2.7 5.1 1.9] [5.8 2.7 5.1 1.
9]
In [ ]:
https://hub.gke.mybinder.org/user/jupyterlab-jupyterlab-demo-5durbs0y/lab/workspaces/auto-u?clone
4/4
Download