CS 4485 Test
Question 1.
(10 points) In the vector space model, document j is represented as $(w_{1,j}, w_{2,j}, w_{3,j}, \ldots, w_{n,j})$, where there are n terms in total. Each $w_{i,j}$ is computed as
$w_{i,j} = f_{i,j} \cdot \log(N / n_i)$, where $idf_i = \log(N / n_i)$ and $f_{i,j} = freq_{i,j} / \max_l freq_{l,j}$.
Explain the meaning of $N$, $n_i$, and $freq_{i,j}$.
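The weight formula in Question 1 can be evaluated directly. The following is a minimal Python sketch, assuming the natural logarithm (the question does not state a base). The function name `tf_idf_weight` and its parameter names are hypothetical and only mirror the symbols in the formula.

```python
import math


def tf_idf_weight(freq_ij, max_freq_j, N, n_i):
    """Return w_{i,j} = f_{i,j} * log(N / n_i).

    freq_ij    -- freq_{i,j} from the formula
    max_freq_j -- max over l of freq_{l,j}
    N, n_i     -- the quantities N and n_i from the formula
    The natural log is an assumption; another base only rescales the weight.
    """
    f_ij = freq_ij / max_freq_j   # normalized frequency f_{i,j}
    idf_i = math.log(N / n_i)     # inverse document frequency idf_i
    return f_ij * idf_i
```

For instance, `tf_idf_weight(2, 4, 100, 10)` evaluates 0.5 * log(10), and the weight is 0 whenever n_i equals N.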
Question 2.
(10 points) Give the definitions of recall and precision.
Question 3. (15 points) Describe an O(n)-time algorithm that takes a document (a string) as input and finds the number of occurrences of each term in the document.
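One possible shape for such an algorithm is a single pass over the document with a hash table. The sketch below is illustrative only: it assumes terms are separated by whitespace, and its O(n) bound is an expected-time bound that relies on constant-time hash-table operations. The name `count_terms` is hypothetical.

```python
def count_terms(document):
    """Count occurrences of each whitespace-separated term in one pass."""
    counts = {}
    for term in document.split():      # splitting scans the string once
        counts[term] = counts.get(term, 0) + 1
    return counts
```

A trie keyed on characters would be an alternative that avoids the reliance on hashing.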
Question 4. (15 points) Let X1 = (1,0), X2 = (1,1), X3 = (2,2), and X4 = (2,3). D1 = {X1, X2} and D2 = {X3, X4}. Use the Perceptron Algorithm to find a vector W such that
$\sum_{i=1}^{m} W_i X_{ij} > 0$ for each $X_j \in D1$ and
$\sum_{i=1}^{m} W_i X_{ij} < 0$ for each $X_j \in D2$.
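The fixed-increment perceptron rule can be sketched as follows. This is an illustration under stated assumptions, not the expected answer: each point is augmented with a constant 1 so that the weight vector includes a bias component, the weights start at zero, and the learning rate is 1. The function name `perceptron` and the sample points in the usage note are hypothetical and are not the exam's data.

```python
def perceptron(d1, d2, max_epochs=1000):
    """Find w with w.x > 0 for x in d1 and w.x < 0 for x in d2.

    Assumption: every point is augmented with a trailing 1 (bias term),
    so the returned w has one more component than the input points.
    """
    # Negate the augmented D2 samples so the goal is w.z > 0 for every z.
    samples = [tuple(x) + (1.0,) for x in d1]
    samples += [tuple(-v for v in tuple(x) + (1.0,)) for x in d2]
    w = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        updated = False
        for z in samples:
            if sum(wi * zi for wi, zi in zip(w, z)) <= 0:  # misclassified
                w = [wi + zi for wi, zi in zip(w, z)]      # w <- w + z
                updated = True
        if not updated:        # a full pass with no mistakes: done
            return w
    raise RuntimeError("no separating vector found within max_epochs")
```

With the hypothetical classes `[(0, 0), (1, 0)]` and `[(0, 2), (1, 3)]`, the call `perceptron([(0, 0), (1, 0)], [(0, 2), (1, 3)])` returns a three-component vector whose last entry multiplies the appended 1.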
Question 5. (10 points) Explain the following terms: stopword and stemming.
Question 6. (10 points) The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).
Explain the meaning of PR(Ti) and C(Ti).
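The formula in Question 6 is usually evaluated by repeated substitution. The sketch below is one hedged way to do that: the link graph format, the damping value d = 0.85, the starting value of 1.0 per page, and the fixed iteration count are all assumptions, and `pagerank` is a hypothetical name. It also assumes every page appears as a key and lists each outgoing link once.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    links -- dict mapping each page to the list of pages it links to
    """
    pr = {page: 1.0 for page in links}          # assumed starting values
    for _ in range(iterations):
        new_pr = {}
        for a in links:
            # Pages T that contain a link to A.
            incoming = [t for t in links if a in links[t]]
            new_pr[a] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in incoming)
        pr = new_pr
    return pr
```

On a hypothetical graph such as `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C ends up with a higher value than page B, and the values sum to the number of pages.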
Question 7. (15 points) Use the dynamic programming algorithm to compute the longest common subsequence (LCS) of X = abbcdbba and Y = abcbdba.
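For reference, the standard table-filling LCS algorithm can be sketched as below. The function name `lcs` is hypothetical, and the strings in the usage note are illustrative rather than the exam's X and Y. When several subsequences share the maximum length, the one returned depends on the tie-breaking in the traceback.

```python
def lcs(x, y):
    """Return one longest common subsequence of strings x and y."""
    m, n = len(x), len(y)
    # c[i][j] = LCS length of the prefixes x[:i] and y[:j].
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Trace back from c[m][n] to recover one subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Calling `lcs("ABCBDAB", "BDCABA")` yields a common subsequence of length 4. The table takes O(mn) time and space to fill.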
Question 8. (15 points) Consider the Fuzzy Information Retrieval model.
Suppose that our system has d1 = (1, 0, 0), d2 = (1, 0, 0), d3 = (0, 1, 1), d4 = (0, 0, 1), and d5 = (1, 0, 0). The query is k1 and k2. Compute $\mu_{q,j}$ for j = 1 and 2.