Rakesh Agrawal
Technical Fellow
Search Labs, Microsoft Research – Silicon Valley

• Current state of affairs
• Evolving search
• Search Labs projects

Current state of affairs

Navigational Queries
Pseudo-Navigational Queries
Car GPS around $300
Four day trip to Bhutan from Delhi to visit important Buddhist places
Game Consoles
Party Site

• Search queries are not grammatically correct questions, but they are not bags of words either
• Query terms are often more than strings of characters
• Data often has structure, or structure can be derived
• Search can span multiple sessions over several days
• Search often provides the entry point for browsing; search and browsing are intermixed
• Expectations from search are increasing

Evolving search

Health
Education

“Humanity’s greatest advances are not in its discoveries – but in how those discoveries are applied to reduce inequity.”
Bill Gates. Harvard Commencement. June 7, 2007

“Is it right? Is it just? Is it in the interest of mankind?”
Woodrow Wilson. May 30, 1919.

Applications to benefit individuals and society

Health

[Chart: Deaths due to infectious diseases]
[Chart: Number of People With Chronic Conditions (millions) by year, 1995 to 2030 in five-year steps; plotted values 118, 125, 133, 141, 149, 157, 164, 171]

• New challenge: chronic conditions (illnesses and impairments expected to last a year or more, that limit what one can do and may require ongoing care)
• In 2005, 133 million Americans lived with a chronic condition (up from 118 million in 1995)

• Tremendous simplification in the technologies for effortlessly capturing useful personal information
• Dramatic reduction in the cost and form factor for personal storage
• Cloud computing

• Charts for appropriate demographics?
• Optimum level for Asian Indians: 150 mg/dL (much lower than 200 mg/dL for Westerners), due to elevated levels of lipoprotein(a)*
• Distributed computation and selection across millions of nodes
• Privacy and security

* Enas et al. Coronary Artery Disease in Asian Indians. Internet J. Cardiology. 2001.

Education

                                                    1951      1981      2002(*)
Literacy percentage                                 18.33%    43.57%    65.38%
Educational institutions
  Primary                                           209671    494503    664041
  Upper Primary                                     13596     118555    219626
  High/Hr. Secondary & Inter & Pre Junior College   7416      51573     133492
Enrollments (in millions)
  Primary                                           19.2      73.8      113.9
  Upper Primary                                     3.1       20.7      44.8
  High/Hr. Secondary & Inter & Pre Junior College   1.5       11        30.5
Teachers (in '000)
  Primary                                           538       1363      1928
  Upper Primary                                     86        669       1157
  High/Hr. Secondary & Inter & Pre Junior College   127       926       1777
Pupil-teacher ratio
  Primary                                           20        38        43
  Upper Primary                                     20        33        34
  High/Hr. Secondary & Inter & Pre Junior College   21        27        34
Dropout rates (%)                                   NA        82.5      66
Public expenditure (% of GDP)                       0.64%     2.92%     4.02%

Significant achievements, but problems remain …

Poor performance
• 39% dropouts in primary, an additional 15.6% in secondary, and an additional 11.7% in higher secondary
• Pass-out ratio is 50% at Class X, and the majority of those who pass do so in 3rd division
• Less than 8% finish all schooling to qualify for a college education

Poorly trained teachers
• 51% of primary teachers are educated to higher secondary level or below
• Only 44% have received in-service training
• Absence of learning material for teachers to update their knowledge

Poor teacher-student ratios
• Ratio in primary is 1:43; in secondary and higher secondary it is 1:34. About 9% of primary schools have a teacher-student ratio > 1:100
• 1.4% of primary schools have no teachers; 19% have only one teacher for all classes

Poor quality of material
• Poor quality of textbooks, outdated curriculum

Source: IBM Report on Improving India’s Education System through Information Technology, 2005

Attacking Complex Problems*

Framework
1. Define goal.
2. Find the highest-leverage approach.
3. Discover the ideal technology for that approach.
4. In the meantime, make the smartest application of the technology on-hand.

Application to Education
1. Quality education to all.
2. New pedagogy.
3. Individualized learning with teacher as a discussant.
4. Internet-based mass collaboration to help teachers teach better and improve the educational infrastructure.

* Bill Gates. Harvard Commencement. June 7, 2007

• Participation of experts, teachers, parents and students in the development and revision of curricula
• Sharing and collaborative development of lectures, assignments, tests, etc.
• Tools for capturing feedback on textbooks (errors, better explanations, supplementary readings)
• Collaborative translation and localization of educational material

Search Labs projects

Search Labs
Invent next in Internet search and applications

[Word cloud: computational economics, machine learning, game theory, information retrieval, query processing, web mining, algorithms, privacy, parallel mining, data management, ranking, NLP, inconsistent data, link analysis]

• Structure and semantics in data and queries
• Insights on user behavior from massive data mining
• From ranking to decision making
• Task-orientation

Best car GPS around $300
  Category = “Auto GPS”
  Price = approx(300)
  Order By ReviewRank

Symphony
Enable non-developers to create and monetize custom search applications that combine their data and knowledge with Search services.

[Diagram: Symphony architecture for an example application, ShoeQueen, with the query “Trail running shoes”. Labels: ShoeQueen page with <javascript /> search box (submit, Search ShoeQueen) and <div id=results> </div>; query; Query/Click Logger; Symphony Runtime; Semi-structured Query Processing; ShoeQueen Data (Customer Data + forum/review pages); ShoeQueen Config; Data Services (Web, images, News … Video, 3rd Party, Proprietary); Advertising (Image Advertising, Items Advertising, Component ads); Query clicks; Rev Sharing; Query, Click, App-ID]

Symphony Runtime steps:
1. Collect initial results
2. Add additional data
3. Generate HTML

• Tools for creating and updating content (Wikipedia++)
• Trust and authoritativeness of content
• Personalization of search to find the material suitable for one’s own style of teaching
• Bootstrapping and incentives

• Search is becoming an essential “utility”
• Need to develop new foundations and abstractions to take search to the next level
• Academia can (and must) play a leading role

Search Labs’ mission is to invent next in Internet search and applications
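The structured reading of the query “Best car GPS around $300” from the Search Labs slides (Category = “Auto GPS”, Price = approx(300), Order By ReviewRank) can be illustrated with a small sketch. This is not the Search Labs implementation, which the slides do not describe. The phrase table, the 20% price tolerance, the item fields, and the convention that a lower `review_rank` is better are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrase-to-category table; a real system would derive this from data.
CATEGORY_PHRASES = {"car gps": "Auto GPS"}


@dataclass
class StructuredQuery:
    category: Optional[str]
    approx_price: Optional[float]
    order_by: Optional[str]


def interpret(query):
    """Map a keyword query onto category, approximate price, and ordering."""
    text = query.lower()
    category = next((c for p, c in CATEGORY_PHRASES.items() if p in text), None)
    # "around $300" or "about $300" is read as an approximate price constraint.
    m = re.search(r"(?:around|about)\s*\$\s*(\d+(?:\.\d+)?)", text)
    price = float(m.group(1)) if m else None
    # Assumption: the word "best" asks for ordering by review rank.
    order_by = "ReviewRank" if re.search(r"\bbest\b", text) else None
    return StructuredQuery(category, price, order_by)


def approx(target, value, tolerance=0.2):
    """True when value lies within +/- tolerance (a fraction) of target."""
    return abs(value - target) <= tolerance * target


def answer(query, items):
    """Filter items by the interpreted constraints, then order them."""
    q = interpret(query)
    hits = [
        it for it in items
        if (q.category is None or it["category"] == q.category)
        and (q.approx_price is None or approx(q.approx_price, it["price"]))
    ]
    if q.order_by == "ReviewRank":
        # Assumption: review_rank 1 is the best-reviewed item.
        hits.sort(key=lambda it: it["review_rank"])
    return hits
```

Calling `interpret("Best car GPS around $300")` yields a category of "Auto GPS", an approximate price of 300.0, and ordering by "ReviewRank". The point of the sketch is only that query terms carry more meaning than strings of characters: "around $300" becomes a range constraint and "best" becomes an ordering.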