CS 4485 Test

Question 1.

(10 points) In the vector space model, document j is represented as

$(w_{1,j}, w_{2,j}, w_{3,j}, \ldots, w_{n,j})$, where there are n terms in total. Each $w_{i,j}$ is computed as

$w_{i,j} = f_{i,j} \cdot \log\frac{N}{n_i}$

$\mathrm{idf}_i = \log\frac{N}{n_i}$

$f_{i,j} = \frac{\mathrm{freq}_{i,j}}{\max_l \mathrm{freq}_{l,j}}$

Explain the meaning of $N$, $n_i$, and $\mathrm{freq}_{i,j}$.
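
As a study aid, the weighting formula can be sketched in Python. This is a minimal sketch under the usual reading of the symbols, which the question itself does not spell out: N is taken to be the number of documents in the collection, n_i the number of documents containing term i, and freq i,j the raw count of term i in document j. The function name `tf_idf_weights`, the token-list input format, and the use of the natural logarithm are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    N = len(docs)  # assumed: total number of documents
    doc_freq = Counter()  # assumed n_i: number of documents containing term i
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        freq = Counter(tokens)  # freq_{i,j}: raw count of term i in document j
        max_freq = max(freq.values()) if freq else 1
        weights.append({
            # f_{i,j} * log(N / n_i), with f_{i,j} = freq_{i,j} / max_l freq_{l,j}
            term: (count / max_freq) * math.log(N / doc_freq[term])
            for term, count in freq.items()
        })
    return weights
```

A term that occurs in every document gets weight 0 under this formula, because log(N / n_i) = log 1 = 0.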

Question 2.

(10 points) Give the definitions of recall and precision.
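
For reference, the two measures can be computed from a retrieved set and a relevant set as in the sketch below. It assumes the standard set-based definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved). The function name `precision_recall` and the convention of returning 0.0 for an empty set are illustrative choices, not part of the question.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Returning 0.0 for an empty set is a convention assumed for this sketch.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```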

Question 3: (15 points) Describe an O(n)-time algorithm that takes a document (a string) as input and finds the number of occurrences of each term in the document.
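
One possible approach is a single left-to-right scan that builds each term character by character and counts it in a hash table. The sketch below assumes that a term is a maximal run of alphanumeric characters and that counting is case-insensitive; neither is stated in the question. With a hash table the O(n) bound holds in expectation; a trie would be one way to make it worst-case.

```python
def count_terms(document):
    """Count occurrences of each term in one pass over the string."""
    counts = {}
    word = []  # characters of the term currently being read
    for ch in document:
        if ch.isalnum():
            word.append(ch.lower())
        elif word:
            # A non-alphanumeric character ends the current term.
            term = "".join(word)
            counts[term] = counts.get(term, 0) + 1
            word = []
    if word:  # flush the last term if the document does not end in a separator
        term = "".join(word)
        counts[term] = counts.get(term, 0) + 1
    return counts
```

Each character is appended once and joined once, so the total work is proportional to the document length.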

Question 4: (15 points) Let X1=(1,0), X2=(1,1), X3=(2,2), and X4=(2,3). D1={X1, X2} and D2={X3, X4}. Use the Perceptron Algorithm to find a vector w such that

$\sum_{i=1}^{m} W_i X_{ij} > 0$ for each $X_j \in D_1$ and

$\sum_{i=1}^{m} W_i X_{ij} < 0$ for each $X_j \in D_2$.
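
The perceptron update rule can be sketched as follows. One assumption is needed that the question does not state: each vector is augmented with a constant component 1 (a bias term), so m = 3. Without it no solution exists for this data, because X3 = 2·X2 and so w·X2 and w·X3 always have the same sign. The function name `perceptron`, the zero starting vector, and the `max_epochs` cap are also illustrative choices.

```python
def perceptron(positive, negative, max_epochs=1000):
    """Find w with w.x > 0 on `positive` and w.x < 0 on `negative`.

    Vectors are augmented with a trailing 1 (assumed bias component).
    """
    pos = [tuple(x) + (1,) for x in positive]
    neg = [tuple(x) + (1,) for x in negative]
    # Negate the second class so every sample must satisfy w.x > 0.
    samples = pos + [tuple(-v for v in x) for x in neg]

    w = [0] * len(samples[0])
    for _ in range(max_epochs):
        updated = False
        for x in samples:
            if sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                # Misclassified sample: move w toward it.
                w = [wi + xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            return w  # a full pass with no mistakes: w separates the classes
    raise RuntimeError("no separating vector found within max_epochs")
```

Calling `perceptron([(1, 0), (1, 1)], [(2, 2), (2, 3)])` returns a three-component vector whose last entry plays the role of the bias.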

Question 5. (10 points) Explain the following terms: stopword, stemming.

Question 6. (10 points) The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)). Explain the meaning of PR(Ti) and C(Ti).
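
The formula can be evaluated by repeated substitution, as in the sketch below. It assumes the usual reading in which T1, ..., Tn are the pages that link to A and C(Ti) is the number of outgoing links on Ti. The function name `pagerank`, the dictionary input format, the starting value 1.0, the fixed iteration count, and the default d = 0.85 (a commonly used damping factor) are assumptions for illustration.

```python
def pagerank(links, d=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: PageRank estimate}."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to `page`.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr
```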

Question 7. (15 points) Use the dynamic programming algorithm to compute the longest common subsequence (LCS) of X=abbcdbba and Y=abcbdba.
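
The standard table-filling method can be sketched as below: entry (i, j) holds the LCS length of the first i characters of X and the first j characters of Y, and one LCS is recovered by walking back from the bottom-right corner. The function name `lcs` and the tie-breaking rule in the backtracking step are illustrative choices; other tie-breaks may return a different subsequence of the same length.

```python
def lcs(x, y):
    """Return (length, one longest common subsequence) of strings x and y."""
    m, n = len(x), len(y)
    # table[i][j] = LCS length of x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return table[m][n], "".join(reversed(out))
```

The table has (m+1)(n+1) entries and each is filled in constant time, so the running time is O(mn).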

Question 8. (15 points) Consider the Fuzzy Information Retrieval model.

Suppose that our system has d1=(1, 0, 0), d2=(1, 0, 0), d3=(0, 1, 1), d4=(0, 0, 1), and d5=(1, 0, 0). The query is k1 and k2. Compute $\mu_{q,j}$ for j=1 and 2.
