Web Markov Skeleton Processes and Applications
Zhi-Ming Ma
10 June, 2013, St. Petersburg
Email: mazm@amt.ac.cn
http://www.amt.ac.cn/member/mazhiming/index.html

• Y. Liu, Z. M. Ma, C. Zhou: Web Markov Skeleton Processes and Their Applications, Tohoku Math. J. 63 (2011), 665-695
• Y. Liu, Z. M. Ma, C. Zhou: Further Study on Web Markov Skeleton Processes, in Stochastic Analysis and Applications to Finance, World Scientific, 2012
• C. Zhou: Some Results on Mirror Semi-Markov Processes, manuscript

Web Markov Skeleton Process
Markov chain
conditionally independent given
Define the WMSP by:
Simple WMSP:
Many simple WMSPs are non-Markov processes [LMZ2011a,b]

Mirror Semi-Markov Process
A mirror semi-Markov process is not a Markov skeleton process in the sense of Hou and Liu, i.e. it does not satisfy

Multivariate Point Process Associated with a WMSP
WMSP
Let
Consequently
Define
We can prove that
where
where

Time-homogeneous mirror semi-Markov processes
are all independent of n

More properties of time homogeneity
Renewal theory
Contribution probability
Staying times and first entry times
Limit distribution for semi-Markov processes
Limit distribution for mirror semi-Markov processes
Reconstruction of mirror semi-Markov processes

Why is it called a Web Markov Skeleton Process?
PageRank, a ranking algorithm used by the Google search engine
1998, Sergey Brin and Larry Page, Stanford University
From a probabilistic point of view, PageRank is the stationary distribution of a Markov chain.
A simple Markov skeleton process

Markov chain describing surfing behavior
Web surfers usually have two basic ways to access web pages:
1. with probability α, they visit a web page by clicking a hyperlink;
2. with probability 1-α, they visit a web page by typing in its URL address.
where

Weak points of PageRank
• Uses only the static web graph structure
• Reflects only the will of web managers and ignores the will of users, e.g. the staying time of users on a web page
• Cannot effectively resist spam and junk pages

Data Mining
Browsing Process
• Markov property
• Time homogeneity

Computation of the Stationary Distribution
– Stationary distribution: P(t)
– is the mean of the staying time on page i. The more important a page is, the longer users stay on it.
– is the mean of the first re-visit time at page i. The more important a page is, the shorter the re-visit time and the higher the visit frequency.

BrowseRank: Letting Web Users Vote for Page Importance
Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li
July 23, 2008, Singapore
The 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Best student paper!

• "BrowseRank the next PageRank," says Microsoft
• Browsing processes will be a basic mathematical tool in Internet information retrieval

Beyond:
-- A general framework for browsing processes?
-- What about inhomogeneous processes?
-- Marked point processes
-- Mobile Web: not really Markovian

ExtBrowseRank and semi-Markov processes
MobileRank and mirror semi-Markov processes
Web Markov Skeleton Process

[10] B. Gao, T. Liu, Z. M. Ma, T. Wang, and H. Li: A General Markov Framework for Page Importance Computation, in Proceedings of CIKM 2009
[11] B. Gao, T. Liu, Y. Liu, T. Wang, Z. M. Ma, and H. Li: Page Importance Computation Based on Markov Processes, Information Retrieval, online first: http://www.springerlink.com/content/7mr7526x21671131

Research on Random Complex Networks and Information Retrieval:
In recent years we have been involved in the research direction of random complex networks and information retrieval.
Below are some of the related outputs by our group (in collaboration with Microsoft Research Asia).

More properties of time homogeneity
right-continuous, piecewise constant functions
Theorem [LMZ2011a]
for all n
Theorem [LMZ2011b]
General case
The statistical properties of a time-homogeneous mirror semi-Markov process are completely determined by:

Reconstruction of Mirror Semi-Markov Processes
Given:
Theorem [LMZ2011b]
We can construct such that uniformly

Limit distribution for semi-Markov processes
Limit distribution for mirror semi-Markov processes

Staying Times and First Entry Times
Staying time on the state j:
Distribution
Expectation
First entry time into the state k:
where into k
Distribution
Expectation
Contribution probability from state i to state j:

Renewal Theory
Proposition
Renewal equation [LMZ2011a]
Renewal functional:
where
Below are the results on the renewal functional [LMZ2011a].

Thank you!

Time-Homogeneous WMSP
right-continuous, piecewise constant functions
More properties of time homogeneity
Theorem [LMZ2011b]
for all
Reconstruction of WMSP [LMZ2011b]
Write
is expressed as

Ranking Websites, a Probabilistic View
Internet Mathematics, Volume 3 (2007), Issue 3
Ying Bao, Gang Feng, Tie-Yan Liu, Zhi-Ming Ma, and Ying Wang

AggregateRank: Bring Order to Web Sites
29th Annual International Conference on Research and Development in Information Retrieval (SIGIR'06)
G. Feng, T. Y. Liu, Ying Wang, Y. Bao, Z. M. Ma et al.
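The PageRank surfing model from the slides (click a hyperlink with probability α, type a URL with probability 1-α) can be made concrete numerically. This is a minimal sketch, assuming the standard Brin-Page form of the surfer's transition matrix, P̄ = αP + (1-α)(1/n)ee^T, where P is the row-normalized hyperlink matrix and pages without out-links are assumed to jump uniformly. The helper names `google_matrix` and `pagerank` and the three-page link graph are hypothetical illustrations, not taken from the slides. PageRank is then the stationary distribution of P̄, computed here by power iteration.

```python
def google_matrix(links, n, alpha=0.85):
    """Surfer's transition matrix (hypothetical helper).

    With probability alpha the surfer clicks a uniformly chosen out-link;
    with probability 1 - alpha she types a URL, i.e. jumps to a uniformly
    chosen page. Pages with no out-links are assumed to jump uniformly.
    """
    matrix = []
    for i in range(n):
        out = links.get(i, [])
        row = []
        for j in range(n):
            click = out.count(j) / len(out) if out else 1.0 / n
            row.append(alpha * click + (1.0 - alpha) / n)
        matrix.append(row)
    return matrix


def pagerank(matrix, tol=1e-12, max_iter=10_000):
    """Stationary distribution pi = pi * matrix, found by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical three-page web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
links = {0: [1, 2], 1: [2], 2: [0]}
ranks = pagerank(google_matrix(links, 3, alpha=0.85))
print([round(r, 4) for r in ranks])
```

Power iteration converges here because every entry of P̄ is positive once α < 1; this is exactly what the URL-typing term buys, independently of how the hyperlink graph looks. In this toy graph page 2, which receives links from both other pages, ends up ranked highest.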
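The BrowseRank slide expresses a page's stationary weight through the mean staying time m_i on page i and the mean first re-visit time μ_i. Assuming the browsing process is an ergodic semi-Markov process with finite mean staying times (the classical setting, not the mirror semi-Markov results of [LMZ2011b]), renewal theory gives the long-run fraction of time on page i as π_i = m_i/μ_i = π̃_i m_i / Σ_k π̃_k m_k, where π̃ is the stationary distribution of the embedded jump chain. The sketch below computes both forms; the function names and the example chain are hypothetical.

```python
def embedded_stationary(P):
    """Solve pi = pi P with sum(pi) = 1 for an irreducible jump chain.

    Uses Gaussian elimination instead of power iteration, so periodic
    embedded chains are handled as well.
    """
    n = len(P)
    # Row j encodes sum_i pi_i * (P[i][j] - [i == j]) = 0; these n equations
    # have rank n - 1, so the last one is replaced by the normalization.
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def semi_markov_limit(P, mean_stay):
    """Long-run fraction of time in each state: pi~_j m_j / sum_k pi~_k m_k."""
    pt = embedded_stationary(P)
    weights = [p * m for p, m in zip(pt, mean_stay)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_revisit_times(P, mean_stay):
    """Mean time between successive entries into j: sum_k pi~_k m_k / pi~_j."""
    pt = embedded_stationary(P)
    cycle = sum(p * m for p, m in zip(pt, mean_stay))
    return [cycle / p for p in pt]


# Hypothetical chain: from page 0 the user moves to page 1 or 2, then returns.
P = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
mean_stay = [1.0, 2.0, 4.0]
pi = semi_markov_limit(P, mean_stay)
mu = mean_revisit_times(P, mean_stay)
print(pi, [m / u for m, u in zip(mean_stay, mu)])
```

For this chain π̃ = (0.5, 0.25, 0.25), so π = (0.25, 0.25, 0.5) and μ = (4, 8, 8): pages 1 and 2 are re-visited equally often, and page 2 gets the larger weight purely because users stay on it longer, which is the effect a graph-only ranking cannot see.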
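The two intuitions on the BrowseRank slide (a more important page has a longer mean staying time and a shorter mean re-visit time) can also be checked empirically. This minimal sketch assumes a hypothetical three-state semi-Markov chain whose staying times are not all exponential, so the process is not Markov in continuous time, and estimates both the fraction of time spent in each state and the mean re-visit time from one long simulated path. The helper `simulate_semi_markov`, the chain, and the staying-time distributions are all hypothetical.

```python
import random


def simulate_semi_markov(P, samplers, horizon, start=0, seed=0):
    """Simulate a semi-Markov process on [0, horizon] (hypothetical helper).

    P[i][j]     -- jump probabilities of the embedded chain
    samplers[i] -- draws a staying time in state i from the given RNG
    Returns the fraction of time spent in each state and the empirical
    mean time between successive entries into each state (re-visit time).
    """
    rng = random.Random(seed)
    n = len(P)
    time_in = [0.0] * n
    last_entry = [None] * n
    gap_sum = [0.0] * n
    gap_count = [0] * n
    t, state = 0.0, start
    while t < horizon:
        if last_entry[state] is not None:  # record the re-visit gap
            gap_sum[state] += t - last_entry[state]
            gap_count[state] += 1
        last_entry[state] = t
        # cut the last sojourn at the horizon so total time equals horizon
        stay = min(samplers[state](rng), horizon - t)
        time_in[state] += stay
        t += stay
        state = rng.choices(range(n), weights=P[state])[0]
    fractions = [x / horizon for x in time_in]
    mean_revisit = [s / c if c else float("nan") for s, c in zip(gap_sum, gap_count)]
    return fractions, mean_revisit


P = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
samplers = [
    lambda rng: rng.expovariate(1.0),   # mean staying time 1
    lambda rng: rng.uniform(0.0, 4.0),  # mean staying time 2
    lambda rng: 4.0,                    # deterministic staying time 4
]
means = [1.0, 2.0, 4.0]
fractions, revisit = simulate_semi_markov(P, samplers, horizon=200_000.0)
for i in range(3):
    print(i, round(fractions[i], 3), round(means[i] / revisit[i], 3))
```

For this chain the renewal-theoretic values are time fractions (0.25, 0.25, 0.5) and mean re-visit times (4, 8, 8), so both printed columns should come out close to those fractions; the exact simulated numbers depend on the seed.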