Web Exploration and Search Technology Lab

Department of Computer and Information Science
Polytechnic University
Brooklyn, NY 11201
Torsten Suel
PhD Students:
Qingqing Gan
Hao Yan
Jiangong Zhang
PhD Graduates:
Yen-Yu Chen (2006) -> Yahoo
Utku Irmak (2006) -> Yahoo
Xiaohui Long (2006) -> MSN Search
Looking for additional PhD students …
Polytechnic University:
• “Brooklyn Poly”, founded in 1854
• in downtown Brooklyn
• Engineering, CS, Management
• 1500 undergraduates, 1400 graduate students
• CS: 16 tenured/tenure-track faculty, 40 PhD students
• Algorithms, Networks, Security, Software Eng., Image/Vision/Graphics
• Databases ?
• Information Retrieval ?
• Web Search !!
- core web search
- related work in algorithms, systems, databases
- emerging applications: social networks, blogs, local search, …
low level stuff: “search engine guts”
• Systems/Architectures/Scalability:
- efficient crawling, data distribution, indexing, query execution, link analysis
• Emerging Applications:
- geographic/mobile search, deep web search, blog/RSS search, P2P search
• Web Spam
Some Research Projects
• Scalability of Large Search Engines
- automatic
- interactive
- can we do with less?
- scale to larger data?
- storage/indexing/mining of web archives
• Future Search Architectures
- peer-to-peer as Google killer?
- desktop/client based search
- blogs/social networks/new media
Search Engine Research Cluster at Poly
• Geo / Local Search Engines
Example: Google Local Search
Geo Search Research at Poly
ODISSEA System Architecture
Some Recent Group Publications:
Search Engine Query Processing:
• Three-Level Caching for Efficient Query Processing in Large
Web Search Engines. X. Long, T. Suel. 14th WWW Conf., 2005.
• Optimized Query Execution in Large Search Engines with Global
Page Ordering. X. Long, T. Suel. VLDB, 2003.
Geographic Web Search:
• Efficient Query Processing in Geographic Web Search Engines.
Y. Chen, T. Suel, A. Markowetz. ACM SIGMOD, 2006.
• Design and Implementation of a Geographic Search Engine.
A. Markowetz, Y. Chen, et al. WebDB, 2005.
• Efficient Query Subscription Processing for Prospective
Search Engines. U. Irmak, S. Mihaylov et al. USENIX, 2006.
• Interactive Wrapper Generation with Minimal User Effort.
U. Irmak, T. Suel. 15th WWW Conf., 2006.
• Efficient Query Evaluation on Large Textual Collections in a
P2P Environment. J. Zhang, T. Suel. IEEE Conf. on P2P, 2005.
• Improved Single-Round Protocols for Remote File Synchronization.
U. Irmak, S. Mihaylov, T. Suel. IEEE Infocom, 2005.
• Hierarchical Substring Caching for Efficient Content Distribution to
Low-Bandwidth Clients. U. Irmak, T. Suel. 14th WWW Conf., 2005.