Problems of statistical modeling of whole sessions
Long sessions complicate the model (averages are needed).
Determining the best states.
Determining the transition probabilities.
We don’t know how to estimate the effort corresponding to the time users spend in various states.
It is difficult to capture rich context in a statistical model.
Which system features should be improved based on the model (where to direct research effort, e.g., query reformulation)?
We don’t know how to relate system support to task outcomes.
It is expensive to collect a lot of training data.
It is difficult to obtain representative samples of searches (people, tasks, systems).
How many statistical models should we create (e.g., 3 for our different scenarios)?
Interactive systems have more functionality and therefore more possible states. What are the critical functionalities to model?
Ranking provides retrieval status values (RSVs) rather than probability estimates (one approach would be to use score distribution models).
How to aggregate different score values for different results?
The cost of using the system should be separated from the cost of using the document.
We do not know what people do with the objects found.
Our measures do not take into account sampling without replacement (once a document is saved, it should be removed from the search space for that user in subsequent uses).
We do not have any principles for estimating the effects of simplifying assumptions in our statistical models (we typically just do things to make the math and/or the evaluation setup easier).
How to apply statistical models at the tactical or strategic levels?
How to take auto-complete into account in data collection for statistical modeling?
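
The sampling-without-replacement problem noted above can be made concrete with a small sketch. It assumes a session is a list of (ranked results, documents saved at that query) pairs and that relevance is a known set. The function name, data layout, and the choice of precision as the measure are illustrative assumptions, not something the group specified.

```python
from typing import List, Set, Tuple

def precision_without_replacement(
    session: List[Tuple[List[str], Set[str]]],
    relevant: Set[str],
) -> List[float]:
    """Per-query precision where documents saved earlier in the session
    are removed from later result lists before scoring."""
    already_saved: Set[str] = set()
    scores: List[float] = []
    for results, saved_now in session:
        # Drop documents the user has already saved: they add no new value.
        remaining = [doc for doc in results if doc not in already_saved]
        if remaining:
            hits = sum(1 for doc in remaining if doc in relevant)
            scores.append(hits / len(remaining))
        else:
            scores.append(0.0)
        already_saved |= saved_now
    return scores

# Hypothetical two-query session: d1 is saved at the first query
# and shows up again in the second result list.
session = [
    (["d1", "d2", "d3", "d4"], {"d1"}),
    (["d1", "d5", "d6"], set()),
]
relevant = {"d1", "d5"}
scores = precision_without_replacement(session, relevant)
```

In this example the second query is scored over d5 and d6 only, so the already-saved d1 neither rewards nor penalizes the system a second time.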

Approaches
Use Markovian modeling.
Use higher-order transitions.
Use process mining from workflow to study the whole session.
To make predictions that are more precise (have higher probabilities of being correct), create a more fine-grained model.

Research Proposals
Collect data, create X models with different numbers of states, and compare their predictability by cost.
How to use a statistical model to create criterion-based tests for performance.
Success criteria should relate to the process and not only to the search outcome. A good outcome for a process model includes an ordering factor rather than being simply an unordered set of good documents across the various results.
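
As one possible reading of the Markovian approach, the sketch below estimates transition probabilities from logged sessions and compares a first-order model with a higher-order one by mean log-likelihood on held-out sessions. The state names, the additive smoothing constant `alpha`, and the use of log-likelihood as the measure of predictability are all assumptions made for illustration. The notes do not fix any of them, and the "cost" side of the comparison is not modeled here.

```python
import math
from collections import Counter, defaultdict

def fit_markov(sessions, states, order=1, alpha=1.0):
    """Estimate P(next state | previous `order` states) with additive smoothing."""
    counts = defaultdict(Counter)
    for s in sessions:
        for i in range(order, len(s)):
            counts[tuple(s[i - order:i])][s[i]] += 1

    def prob(context, nxt):
        c = counts.get(tuple(context), Counter())
        return (c[nxt] + alpha) / (sum(c.values()) + alpha * len(states))

    return prob

def mean_log_likelihood(sessions, prob, order=1):
    """Average log-probability per predicted transition (higher is better)."""
    total, n = 0.0, 0
    for s in sessions:
        for i in range(order, len(s)):
            total += math.log(prob(s[i - order:i], s[i]))
            n += 1
    return total / n if n else float("nan")

# Hypothetical interaction states and logged sessions.
states = ["query", "results", "document"]
train = [
    ["query", "results", "document", "results", "query"],
    ["query", "results", "document"],
]
held_out = [["query", "results", "document", "results"]]

p1 = fit_markov(train, states, order=1)
p2 = fit_markov(train, states, order=2)
print(mean_log_likelihood(held_out, p1, order=1))
print(mean_log_likelihood(held_out, p2, order=2))
```

Note that the higher-order model scores one fewer transition per session, because it needs two preceding states before it can predict. A fair comparison would evaluate both models on the same positions. Models with different state sets are not directly comparable by log-likelihood, so comparing across numbers of states would need a shared prediction target.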