WeigendNotesStanford2004.doc Version history V01 V00 created 2004.06.30
Week 1 - Class 1 Goals Learning to ask questions Actionable outcomes Definitions What is eBusiness What is data mining? Empirical What else? Insights Observation vs Experiments Experiment: vary the thing you can act upon 2 groups: A, B; or treatment and control Do it in parallel, vs sequentially Other effects, such as day of week can swamp out small effects Levels of data, levels of analysis Exponential growth of automatically collected data Constrain problems from what actions are possible Domain knowledge matters: directs us where to look Behavioral economics Levels of explanation What? vs why? Different depth of “why?” Need to create insights that generalize 5 steps of modeling Not a linear process, but a “dialog with the data” not an IT function, but an iterative, creative process Importance of defining metrics Understand the trade-offs Baseline Recommendations were based on (anonymous) co-purchasing behavior Arrow of time Game metaphor Example of tech transfer Issues Who owns the data State street Ppl who bought also bought Adverse governments
Class 2 Insights Technology enables: Key ingredients Communication Connectivity Standardization D:\106760476.doc Page 1 of 18 From car parts to Size of eBiz O(10%) of overall biz But: Multiple counting of parts, only single-counting of whole? O(10%) of the world’s population But: definition, vs weighted by wealth Impact on productivity Other Organization TAs Project Textbooks Introduction to e-Business Background reading Technology: BFS Ch2 Statistics: B&L Ch5
Class 3 Student backgrounds Understand backgrounds and expectations, what problems and data do they have SCPD off-campus people to write in to Eric What was important last class?
Define e-business Internet age vs industrial age 2 economies of scale: production vs network Key ingredients Communication Standardization Storage is free Search: from finding to ranking 1) context (local) 2) static (link) 3) dynamic (trajectories of customers) New pricing mechanism Buyers (for the company) power shifts to the consumer Inventory Comparison shopping Pay by imprint Pay by click Pay by buy Priceline Auctions Primarily in B2B Trade-offs Privacy vs Convenience Security vs Convenience Power vs Ease of use Review Technology and information Frame: why it matters. Why now. Infocomm Concepts of underlying technologies (and business relevance) TCP/IP VoIP etc Sources of data and scales E-business: Retail, telecom, airlines, hotels Trade-offs for customer (Cost of privacy. Power vs ease of use) Server side data: Customer registration, purchase, payment. Session aggregates, clicks, all that was displayed. Client side data: Toolbars (Yahoo, Google, Alexa); ComScore. Trade-offs in customer perception: privacy vs convenience. Customer education, trust. Web services E-business Scope, Size and growth “Stanford01 ppt” Statistics (B&L Ch 5) Exploratory data analysis Description / characterization of observations of single variable. What makes sense when? Power “laws”. Entropy. Experiments vs observation. Causality vs Correlation Many variables: visualizations.
Representation: time of year examples Statistics: Quantifying uncertainty Making predictions (and knowing how good or bad they are) Making decisions: Combining predictions with preferences Value of information Exploratory: Also Decision Trees, Ch 6 Model based on decision trees can be understood / point to exactly the rule that led to it (e.g., denial of credit) Quantifying uncertainty Don’t blindly trust numbers presented to you Plot early, plot often; look at raw data Understand what questions to ask If you don’t measure, you can’t manage effectively Often, shining a bright light on the problem is already enough Quantitative methods: At their heart, they are about quantifying and managing uncertainty Don’t use intuitions because they can easily be wrong If there were no uncertainty, the method would be very simple Understand what is likely and what is unlikely Predictive ability on new data (out-of-sample) is king! Learn to avoid mistakes Make people aware that they are making decisions even by avoiding making decisions Different kinds of random samples Everything has trade-offs, nothing is black and white: we quantify the different shades of grey Ultimately: need to drive business decisions (under uncertainty) Just do it If you understand the risks / the uncertainty, you can make decisions with more certainty Specific learning Empirical rule (one variable) By observing a few points, you can make predictions for rare events Distinguish chance fluctuations (statistical errors) from bias (systematic errors) Observed X = true X + chance error + bias Statistics can only tell you about the chance error Chance error proportional to 1 / sqrt(N) Understanding correlation (two variables) Limitations, interpretation as linear measure of association Correlation is not causation Size of shoes and linguistic ability are correlated Chapter 17 What data actually look like, in contrast to how you would like them to look Today (Class 3) Applications to Marketing
Definition / Why it is important Segmentation, history of direct marketing Note: act more quantitatively / model based, allows us to create even better models Active learning (underwear) Market research (general) Survey ComScore: Panel (implicit behavior) Email campaigns Measure response Iowa electronic markets P2P example for the labels Observer: queries to download are not the same as listening Streaming service could be different Cluster queries, determine segments Issues Sampling How to sell it? Paradigm shift: specific customer and specific session intention Lesson: focus on current session (E.g., home loan, vs SamTrans; Russian Ladies) Occasionalization ASSIGN PAPER B&L Ch4 Why DM is important in business NYT p 90 Learn to formulate the right questions, to frame it right Ch2 P 23 How important is it to understand the business case correctly P59/ General Mills; Who is a yoghurt lover? They want to be the category manager; who determines what is placed where on What they wanted: ZIP codes etc rather than printing coupons Sources of data Applications: advertising.com vs server Session signatures Cookie IP address Time of day Browser info (color depth, window size) Purchase y/n HTTP referrer Timing Customer signatures and customer segmentation Behavior-based vs demographics-based Clustering algorithms Driving and driven by actions Conjoint analysis Interaction with the Company: Customer experience and satisfaction models Implicit vs explicit data Surveys: Sampling (session-based or customer based?), non-response biases Stated vs revealed preferences (Guest: Ketchpel/Vividence) Shopping process models (PRMs) Bayes Nets cf D’Ambrosio/CleverSet Network effects (Pedro): Traditional marketing vs network marketing Pedro Paper + Talk Data mining for viral marketing Customers influence each other Traditional methods ignore this Model markets as social networks Networks mined from collaborative filtering systems and knowledge sharing sites Some customers have
very high network value Targeted viral marketing much more profitable than direct marketing
Class 4 Summary of Class 3 Empower the end-customer Closing the feedback loop Marketing Create product Esp for digital products Market research P2P activity info Observing Experiments Eg new song Paradigm shifts Time scales: Estimate CURRENT intentions From customer to session From corporate buyers to end-customers Both for physical and for information products Specific example of empowering the end-customer Design of Experiments Statistical design of experiments vs scientific design of experiments Importance of having to pay for things Trade-offs
Class 5 Logistics IM andreasweigend@yahoo.com during class Summary of Class 4: Design of experiments Art notes Key is to vary something… Michelson-Morley Look up year .. but what do we vary: need your creativity, your playfulness Data mining is not opening a book at a (random, often!) page, and applying a recipe (often wrongly) Sad to see that people focus on statistical significance, and customers ask about statistical significance. They should ask about relevance!! About actionability! Examples First, in all cases, agree on what you want to optimize Probability of a worst case, average squared error, average absolute error… Sponsorship matters: DARPA vs VCs Response time of server Assume a model, e.g., quadratic response # of DOF so small that we saw little ellipses as confidence regions AB Tests Discrete Main problem: Meta-theory of what experiments to do, how to generalize from one to the other Relationship to “active learning” 2 panel goldbox Short-term vs long-term Sponsored links People are curious: lots of clicks initially, then no more Vs heat bath / large pool: people always entering new Search Framing: This course is about technologies, about data, methods and the Internet. One of the key technologies that has dramatically changed the way we do things, privately and publicly, is SEARCH.
In this course, we will present 3 perspectives on search: General (Web), Products (in a shopping context), People (in a dating context). There are lots of insights you will gain, but let’s start simple: How do you look for a string in a given doc, on a disk Once vs many times: Trade-off Similar to databases Idea: Indexing [Reference: Managing gigabytes] Q: Spell correction? No, keep all the typos (But might provide tools (1) looking at transpositions etc, possibly having model of what mistakes people make (2) trajectories Show Andreas Weigend at Yahoo, then Google) Example: Clearning Meaning = use: Wittgenstein We have talked about consumer behavior before … Note here the same shift from normative to descriptive (What are the political interests to keep with the normative? What is the point of communication, vs the Academie Francaise, the French Academy) Applications for personal search Not: personals search… Longhorn Search deeply embedded in OS Gmail Filing is hard: never know what will be needed E.g., real estate agent Paradigm shift: from filing to (indexing and) searching Q: Why does search on most intranets suck? Q: Ask students how satisfied they are with their intranet Blue pants example here Search inside the book 1.07 times as many books sold with Search inside the book compared to books that don’t have that feature How about the web? Need to crawl the web Classnotes: no links to them, need to know – no way to find this. Hyperlinks! Different crawl strategies / policies [BFS] Serious CS issues Don’t try this at home Order of magnitude: Number of computers Underlying technologies Communication Standards Storage And, one level deeper, processing power, RAM… Ok, so now we have the index of the web Examples "Bird Diapers" Diapers and beer: Highest nr of adult diapers sold per capita (by ZIP code) "Is it a fruit or not" "Vegan Singles" But what about all the Weigends?
(Also show the Google number) …Solved one problem, but created another one… Like in the good old days, when you were successful, your server would melt down … ranking Paradigm shift: from finding to relevance ranking Relevance of course context dependent George Miller’s literacy test What kind(s) of info can we use here? Local info: where / how often does the word appear Keyword match (title, abstract, body) Anchor Text (referring text) Static info: link structure NYU undergrad course Enough? No – need to consider the quality of the sites linking to me Iterate; formally, the largest eigenvector. Luckily easy to obtain [BFS] Dynamic info Click-through (Direct Hit) But that only goes for the attractiveness of the link! No statement about what is beyond the link! (Can do smart evaluation, Thorsten Joachims, but this problem remains) (Unless more complicated models) Exploration vs exploitation trade-off Trajectories of people …. And… Money: Goto/Overture: rank by $$ Right combination: $$ * click-through Beliefs No human editors Focus on the process Other people’s work Economies of scale Provide the means of production “Platform” Empower people to become authors, editors People want to contribute, (1) make it easy for them to contribute, (2) organize their contributions, (3) learn, and channel them in the direction of max leverage Limitations: political views enter: Loved it: nobody answered. Just nobody clicked Power of community So far, from the perspective of the company providing search… Economics of search Advertising.com Discuss who sees what Product search Price Comparison sites (aggregators) Shopping.com NexTag.com How do ppl search on P2P networks How does their behavior change according to what they get back / mental model of the space What can be done to guide them to more advanced search?
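Sketch for the static (link-structure) ranking noted above under "need to consider the quality of the sites linking to me / Iterate, formally largest eigenvector": a minimal power iteration in Python. Everything specific here is an assumption for illustration and not from the notes: the toy link graph, the function name link_rank, the damping factor 0.85, and the fixed number of iterations. It is not meant as how any particular search engine computes its ranking.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Power iteration over a link graph {page: [pages it links to]}.

    Hypothetical helper: a page's score is fed by the scores of the
    pages linking to it, so quality of the linking sites matters.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(iterations):
        # every page gets a small base score (assumed damping scheme)
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its score on, split evenly over its links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # page without outgoing links: spread its score over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Toy graph (made up): "d" links out but nobody links to it.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Repeating the update until the scores stop changing is what "iterate" amounts to; the fixed point is the leading eigenvector of the (damped) link matrix. The dynamic signals in the notes (click-through, trajectories) would be separate inputs on top of this.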
People search User = product Consider IM activity … Now from the perspective of the user How much time do you spend searching per week? No such notion of a “session” any more Personalization Any search engine in 5 years: knows your preferences, highly targeted Focus on customer, focus on creating value. Not about window dressing and balance sheets creating transparency, through data mining Localization Knowing past searches But for this, need to start a major collection engine that does collect data about the past Toolbars: weak value props Economics for the end-user Advantage Amazon has over companies that don’t sell physical goods
Class 6/ Guests: Chris Pirkner, Jon Herlocker Presentation on the Web
Class 7/ Guest: Jim Oliver, eBay Discussion: PERSONALIZATION Desiderata: What are desired properties of a personalization system? Purpose: Why do we want to offer the customer a personalized experience? Class Not static, but adaptive (slow time scale) Detect current situation (fast time scale) Understandability / Interpretability Making things easier Save time Especially on repeated tasks Help user discover things Metrics: How do we measure whether the system does a good / bad job? Class Accuracy – defn? Relevance – who determines relevance? Ask user explicitly? Does user click on it (short-term)? Does the user return? Timing / Context Note: is a loop!! P(purchase)?? Baselines?? Empirical density Pick random items leaving the warehouse now Customer satisfaction? How do we measure that?? Time scales: Short-term vs long-term Will it dramatically differ according to personality? Track more things Esp behavior over time Proxies for: Satisfaction User experience Inputs: What data are used to build the model (variable selection)?
Class Device (mobile phone vs PC) Geolocation / IP Address Shopping history, both browsing and the order history Past price sensitivity Key distinction: Passive observation vs active collection Personalization Implicit: customer has no choice …vs Customization Explicit set of choices, but people are not doing it unless they see what’s in it for them Explicit ratings (like / dislike) e.g., as basis for recommender system Disambiguates the otherwise unary bought vs don’t know Amount of inference From just storing past searches… To having complicated models User acceptance: are users willing to spend the time to understand why they see what they see? Related Multi-modal personalization Can view as schizophrenic, multiple personalities Either user explicitly logs in with different identities, or sorts actions to belong to identities, or Occasionalization Room-mate, visit, hook-up Availability of historical data Personalization by definition based on persistent data Discuss range of data collection Stateless Scope = session Scope = entire history Do we want to enable the user to edit their history? What people edit out is informative Or at least turn it on or off? Outputs: What does the model produce? Class Perhaps build model of preferences Characterization of the situation Mode: Push vs pull for delivery of the result Independent of initial Company-initiated vs customer-initiated activity? Exploration vs exploitation Wants to delight the customer Acceptance: What does the customer like? What scares them? Does it depend on the individual customer? Psychology of customization What drives people to get special ringtones? Want to express themselves, define themselves Who experiences the results of the customization? Psychographics vs more “objective” What does the customer consider personal or private (vs public)? Blogs Anonymity / Pseudonymity?
Self-realization Psychology of personalization Pay-off matrices: individual differences in false-positives, false negatives Careful to keep in mind the alternative of no customization / personalization Value propositions What’s in it for the user? Co-creation, participatory design Explicit incentives Implicit E.g. better ranking of search results based on their past click behavior How much are they willing to work for good personalization? What’s in it for the sponsor? Lifecycle Up-sell More features, understand how far user willing to go (e.g., good salesperson!) What cues beyond clicks will be possible? Interaction / Discourse / Mode switching: ask key questions for decision explicitly (rather than let user build a mental model of the system!) Cross-sell Market basket analysis Association rules of stuff bought (or in shopping basket) Info through navigation Risk Trust vs bad PR User education is key, but how? Who takes the time?? Privacy, Info leakage What would be good examples? Ads Ask about elevator Computer Operating system Open Sesame, ca 1995, suggested what files to open next Search: How would you do customized vs personalized search Geolocation key Supershuttle example A historic perspective: Personalization 4 years ago What has disappeared? Why? What has survived? Why?
Industry Standard summary
http://www.thestandard.com/article/display/0,1151,12444,00.html
March 06, 2000
The Profilers: Getting Personal
Web software applications that track consumer behavior are springing up all over, and some of them aren't invasive at all.
By Jenny Oh
E-businesses and advertisers are continually searching for new ways to personalize the Web experience and attract new customers. Consumers, meanwhile, are increasingly concerned about controlling their personal information on the Internet. Fueling the privacy race is a variety of companies that offer software and Web-based applications that serve both needs. The following is a rundown of some of the services that are currently on the market, and others that soon will be.

COMPANY: Alexa, San Francisco
PRODUCTS: Alexa's downloadable navigational bar tracks and aggregates users' Web visits and provides a Related Links service. Owned by Amazon.com, the company also offers a beta version of zBubbles, a product-recommendation guide.
PRICING: Free to consumers.
CUSTOMERS: 5 million downloads of Alexa's navigational bar; 250,000 downloads of zBubbles.

COMPANY: Andromedia (acquired by Macromedia in October 1999), San Francisco
PRODUCTS: LikeMinds personalization software aggregates information provided by Web visitors and makes product recommendations based on data collected from "like-minded" users.
PRICING: Starts at $25,000 for 50,000 users and can go as high as $100,000.
CUSTOMERS: More than 120 customers, including the Boston Herald, Cinemax, E-Trade, Levi Strauss, Sun Microsystems and the U.S. Postal Service.

COMPANY: CoreMetrics, San Francisco
PRODUCTS: ELuminate Web-based application tracks and analyzes consumers' browsing and buying behavior.
PRICING: Plans to charge a monthly service fee.
CUSTOMERS: Service will go live in mid-April.

COMPANY: DoubleClick, New York
PRODUCTS: DART technology profiles users based on the ads they click. Boomerang technology tracks product and service preferences of consumers visiting any of DoubleClick Network's 750-plus sites and creates a list of frequent buyers or previous visitors for advertising. Through its merger with Abacus, DoubleClick offers consumer purchasing-behavior information to marketers.
PRICING: CPM rates vary depending on whether the advertiser belongs to the DoubleClick Network and the number of targeting filters.
CUSTOMERS: DoubleClick Network serves over 11,000 sites, including Kelley Blue Book, Thomson Financial Interactive and MindSpring.

COMPANY: Engage Technologies, Andover, Mass.
PRODUCTS: ProfileServer technology tracks customer preferences at specific Web sites. AudienceNet tracks users' browsing habits, then draws on its database of 42 million demographic-based profiles to deliver targeted ads.
PRICING: Pricing for ProfileServer varies. Cost for AudienceNet is CPM-based.
CUSTOMERS: 1,400 customers, including CNET, NetNoir and Image Networks. This figure also includes users of Engage services and products other than those mentioned here.

COMPANY: Lumeria, Berkeley, Calif.
PRODUCTS: SuperProfile technology lets consumers customize and selectively share their personal profiles with Web businesses. The Lumeria Ad Network delivers ads based on consumers' profiles and pays consumers for each click-through.
PRICING: SuperProfile is free to individuals. Lumeria charges an undisclosed ad rate to businesses.
CUSTOMERS: Scheduled to ship in April.

COMPANY: MatchLogic, Westminster, Colo.
PRODUCTS: TrueMatch technology uses cookies to determine demographic information about a Web surfer, then uses MatchLogic's database of 72 million profiles to deliver targeted ads.
PRICING: CPM rate varies.
CUSTOMERS: 400 customers, primarily advertisers. This figure includes customers of other MatchLogic products as well as TrueMatch.

COMPANY: Net Perceptions, Eden Prairie, Minn.
PRODUCTS: E-commerce and Ad Targeting software generate profiles through direct customer queries and by monitoring browsing habits. The software then makes product recommendations and delivers ads.
PRICING: The annual license fee for E-commerce starts at $75,000 for up to 50,000 users. Annual license fee for Ad Targeting begins at $25,000 for sites serving up to 1 million ads per month and goes as high as $250,000 for sites serving 500 million ads per month.
CUSTOMERS: 156 customers, including Dean & Deluca, eToys, HomeGrocer, Lycos and Virgin Online Megastore.

COMPANY: Personify, San Francisco
PRODUCTS: Essentials server software identifies patterns in consumers' Web behavior and integrates data from user registrations and from offline databases. Proactive e-mail app generates targeted e-mail lists based on the same data.
PRICING: License fee for the basic package, which includes Essentials plus one add-on option, is $385,000; the premiere package, which includes Essentials, Proactive and four add-on options, is $490,000.
CUSTOMERS: 75 customers including Drkoop.com, J. Crew, Onsale and Patagonia.

COMPANY: Predictive Networks, Boston
PRODUCTS: Predictive Networks' technology enables ISPs to track the ads and Web sites that consumers click, then aggregate their preferences to deliver highly targeted advertising.
PRICING: N/A
CUSTOMERS: Internet service providers and advertising agencies.

COMPANY: PrivaSeek, Broomfield, Colo.
PRODUCTS: Persona technology allows consumers to customize and selectively share their personal profiles with Web businesses; functions as an e-wallet.
PRICING: Free to consumers.
CUSTOMERS: N/A

COMPANY: WinWin, Boston
PRODUCTS: WinWin technology delivers ads based on demographic and psychographic information provided by individual consumers.
PRICING: Advertisers pay anywhere from 1 cent to $1 per ad, depending on the level of interaction with the consumer.
CUSTOMERS: Hook Media.

COMPANY: Younology, New York
PRODUCTS: Orby technology creates profiles by tracking click-throughs and directly asking consumers about their preferences. Total Perspective for e-businesses tracks Orby users to create a personalized shopping experience. SmartSense Server links consumer profiles to business suppliers and partners in real time.
PRICING: Orby is free to consumers. Total Perspective costs $1,500 per site server plus monthly fees ranging from $2,250 to $60,000 depending on the number of registered Orby users. There is a $9,995 site license fee to link to SmartSense Server.
CUSTOMERS: Scheduled to launch in mid-March. Customers include the Knot, Travelbreak.com and Fuxito Worldwide.

Article from ZDNet http://www.zdnet.com/pcmag/stories/reviews/0,6755,2327781,00.html
Class 8: Infomediary + Unbundling the corporation Reduce cost of interaction / information / transaction US 70% interaction activities India 40% Negotiation tools – Impact of IT? Where to spend the time of the human??
Good point: free up / reallocate that time to more creativity / redeploy Existence of firm: shaped by interaction costs As technology shifts, customers empowered to obtain more info as well as NEGOTIATE with the customers What is a market Customers finding vendors that are best / good matches Infomediary: power is shifting (not sudden, over decades in US: from producers to retailers!!) Cf: Wal*Mart becoming a customer agent, squeezing out suppliers Also: Dell. What is an Infomediary Agent who acts on behalf of the customer Net worth: NOT a privacy event, only an element of the concept Greater value: to be helpful to find vendors What factors need to come together to have these new businesses develop? Examples of startups that emerged (and failed)? Late 90’s: about eyeballs, fast growth (1) BUT not enough quality commitment, takes time to build profiles Q: sub-markets? (2) Overly optimistic on technology (XML already 5 years ago!) (3) Successful intermediary requires trust and profiles / customer information (exists in larger companies, bank, retailers…) … but their mindset is not that of an agent!!! They just want to sell more products clean-slate startups / entrepreneurial Q: verisign? Deutsche Post? Positive examples in business setting Li & Fung HK based, 100 years old They act as agent on behalf of apparel designers They have 7.5 companies whose capabilities they know very well They engineer business processes on behalf of their customers Of course, done with very little technology. Only now moving toward technology They are spending a lot of time up front to discuss trade-offs etc After 9/11, they managed to move production from Pakistan to “safer” countries USD 5bn in revenues, similar to Amazon.com What is the bottleneck?
Not technology, but mindset Business issue of not thinking to send traffic to others Cf Alibaba.com Freemarkets (Pittsburgh), recently acquired by Ariba They would go out and qualify vendors based on the needs Then they create a reverse auction, having Alternative technologies, but different business models Has search replaced the Infomediary? BUT: only a small part is searchable!! What about personal agents (“agent-based technologies”)? Technology: What can be automated? Platform Reputation systems Market design: What are good market mechanisms for information? Economic schemes Is it just that companies can get away without paying?? Or do we need to define paying more broadly (with attention, with time) Example Hire-right: stupid that they don’t contact the customer LinkedIn Why is it so hard to get companies (e.g. HR departments) to close a loop Issues Validity of information!! Acxiom, but what’s in it for the end-customer?? Also: Need mechanisms for fixing simply incorrect information Unbundling the Corporation Less visionary, more what’s on the way already Initially bundled because of communication overhead… but very different mindsets and cultures!! 1. Customer relationship 2. Infrastructure High-volume, routine E.g., manage a logistics network 3. Product innovation Product lifecycle What has happened in the 5 years since this article? Companies outsourcing core operations etc Companies driven by taking out cost of their business Sport shoes Design (cf GA’s) www.spectorCNE.com (98 SLAC story) Thoughts on privacy Auction off information? Add: RFIDs, .4mm on the side, 5 cents, dropping to 1 cent, typically 128 bits Up to 10+ meters Wireless cameras Micro-GPS My HK transmitter Header info/ Mike Schwartz paper Blogs.
IBM CN example: characters expressed in Pinyin and translated into English Understand what is non-communication recording (eg GPS in rental cars), and what is communicating (Payless: $1 / mile outside CA/NV) Can we restrict propagation of existing info (whether correct or not) Drive through toll-station Clothing: conclude from RFIDs + mobile phones who you are (But then of course lost and found) Yet -- same tech can make inventory / supply chain etc way more effective (Wal-mart estimate: save 8bn / year (total sales: 250bn)) 31M surveillance cameras in the world. Data production amount?? 4M in UK (-10yrs: 0.15M) Smart badges, ca 1990 at then-Xerox PARC: made your phone ring in the office you happen to be in (pre-mobile era) Phone sensing volume and being smart about it; elevators A dozen syphilis cases in SF ca 1999 could be traced back to a chatroom SH Jiaotong card/ metro: by the minute and second which gate you entered, which you left HK: Octopus card Should people be made aware when they are being checked out? (Show log of weigend.com) In this world: ppl worry about cookies! Requesting authentication...: listening habits Already: TiVo PVR (Personal Video Recorder, cf the NYT story) So, will my fridge be talking to my mobile when I am at Safeway, and they have caviar on sale? Or will Safeway be talking to me since they know?
Class 9: Spam: Rick Giarrusso Email: Relatively recent phenomenon Mid-80s: email rare Mid-90s: email popular, but not spam yet What is Spam? Unsolicited email (Will frequency of sent-out emails make a difference to the perception of whether it is spam?)
Individual vs community 1) Individual recipient Manually maintaining a list Of senders Of keywords… But lots of difficulties with rules, such as the order they are applied in 2) Community based approach Implicit data Lots of emails arriving from a certain sender Explicit Empowering users to label emails as spam Probabilistic model (Bayesian network) Explicit vs implicit data Issues Problem: Users need to understand the result of their actions People using the spam button instead of the delete button How do you train users? Different people have different modes to learn things What info to use? Header info, including TCP/IP Arms race White-colored text on white background (non-copyrighted stuff) to swamp the filter Return address from Note: Spammers and virus creators make interesting bedfellows People and Processes Ex: Scenarios Interesting: build a Bayes Net to model the outcome of a litigation Helps to focus people on what they really are thinking about / communication issues But: communication does not mean PPT slides… Why do people not have those conversations?? Logical, rational way of doing problem solving? What role does data mining play? Processes How do you quantify the benefits? Chuck Lam’s insight: Scientists are trained to ask questions MBAs are trained to give answers Pricing
Class 10: Analyzing proxy data and data from other sources to create information products for Wall Street www.weigend.com/tmp has several papers by Majestic Research (MR…) Speaker: Seth Goldstein, CEO of Majestic Followed by Reception in the stats department Goal Create transparency Key driver Focus on proprietary data Not accounting irregularities, forensic stuff Dimensions of information products Latency Granularity Exclusivity 2-dimensional plot: Along the diagonal: Tools – Reports – Services – Custom X: $1 … $1bn Y: 1000’s of clients -> 1 client Which are good stocks for them to predict?
Retail, not Enterprise software Enterprise: might be made or killed by one consumer Travel Initially Cool: Then look at their costs! I.e., prices they are paying for keywords (Isn’t that smart?!) And 2 more dimensions Thin – Liquid Calm - Volatile Data Company or Research Company? Service company? How are the data analyzed? For any equity, there is a model out there Same store sales New subscribers License revenue NPD monthly sales … Import, scrub and categorize initial data Build models Assess results Confidence intervals Analysis of residuals Normality assumptions Out-of-sample testing If possible assessment, automate data collection and model updating process What makes an analyst? Quantitative Creative
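Sketch for the "Build models / Assess results / Out-of-sample testing" steps above, in Python. All of it is assumed for illustration: the numbers are made up (a hypothetical proxy measurement per period against a hypothetical reported figure), the names fit_line and rmse are invented here, and a one-variable least-squares line merely stands in for whatever model an analyst would actually build. The point is only the procedure: fit on the earlier periods, then judge the model on later periods it has not seen, against a naive baseline.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Made-up data: proxy measurement and the figure to be predicted, by period.
proxy = [10, 12, 15, 11, 18, 20, 14, 22, 25, 19, 24, 28]
reported = [21, 25, 31, 22, 37, 41, 30, 45, 49, 40, 47, 58]

split = 8  # first 8 periods are in-sample, the last 4 are held out
train_x, test_x = proxy[:split], proxy[split:]
train_y, test_y = reported[:split], reported[split:]

slope, intercept = fit_line(train_x, train_y)

in_sample_error = rmse([slope * x + intercept for x in train_x], train_y)
out_of_sample_error = rmse([slope * x + intercept for x in test_x], test_y)

# Baseline: always predict the in-sample average, ignoring the proxy.
baseline_value = sum(train_y) / len(train_y)
baseline_error = rmse([baseline_value] * len(test_y), test_y)

print("in-sample RMSE:     ", round(in_sample_error, 2))
print("out-of-sample RMSE: ", round(out_of_sample_error, 2))
print("baseline RMSE:      ", round(baseline_error, 2))
```

Splitting by time rather than at random respects the arrow of time: the model is only ever judged on periods later than the ones it was fitted on. The residuals on the held-out periods are also the natural input for the residual analysis and confidence intervals listed in the notes.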