Sensor-Based Understanding of Daily Life via Large-Scale Use of Common Sense

William Pentney, Ana-Maria Popescu, Shiaokai Wang, Henry Kautz
Department of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195
{bill,amp,shiaokai,kautz}@cs.washington.edu

Matthai Philipose
Intel Research Seattle, 1100 NE 45th Street, 6th Floor, Seattle, WA 98105
matthai.philipose@intel.com

Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

The use of large quantities of common sense has long been thought to be critical to the automated understanding of the world. To this end, various groups have collected repositories of common sense in machine-readable form. However, efforts to apply these large bodies of knowledge to enable correspondingly large-scale sensor-based understanding of the world have been few. Challenges have included semantic gaps between facts in the repositories and phenomena detected by sensors, fragility of reasoning in the face of noise, incompleteness of repositories, and slowness of reasoning with these large repositories. We show how to address these problems with a combination of novel sensors, probabilistic representation, web-scale information retrieval and approximate reasoning. In particular, we show how to use the 50,000-fact hand-entered OpenMind Indoor Common Sense database to interpret sensor traces of day-to-day activities with 88% accuracy (which is easy) and 32/53% precision/recall (which is not).

Introduction

A system that can track the state of the world at multiple levels as humans go about their day-to-day activities is of interest for both conceptual and practical reasons. Conceptually, the ability to recognize and reason about what activities a person is doing, what the resulting physical state of the world is, what the likely emotional state of the actors is, and so on, is at the heart of computational models of human intelligence. From a pragmatic viewpoint, a whole variety of tasks such as caregiving, security monitoring, training and directing, which are currently considered expensive "high-touch" jobs that depend solely on humans, become amenable to automated support if the computer can reason about the world. Researchers have recognized a variety of factors preventing this level of reasoning in machines, especially the need for very large quantities of common sense (McCarthy 1996; Minsky 2000), for noise-resistant representations and reasoning, and for very large quantities of labeled data connecting sensor signals to symbols. In this paper, we show how to leverage recent advances in very large scale common sense representation, information retrieval, statistical inference and sensing to build an automated system that can track the day-to-day world at multiple levels.

The most ambitious sensor-based day-to-day state estimation systems to date have been human activity recognition systems. Using sensors such as accelerometers and coarse audio (Bao & Intille 2004; Lester et al. 2005), these systems have done an excellent job of recognizing simple activities such as walking, running and climbing stairs. On more complex activities, e.g., various cooking, cleaning and personal grooming tasks, traditional general-purpose systems (typically based on vision (Moore, Essa, & Hayes 1999; Duong et al. 2005), potentially in concert with the above sensors) have had more limited success, for two main reasons. First, it has proven extremely difficult to detect salient high-level features (such as objects in use) robustly under day-to-day conditions. Second, acquiring models for the activities has proven difficult because of the need for very large quantities of labeled data collected under a variety of circumstances. A promising development in feature detection is the emergence of dense sensors, based on Radio Frequency Identification (RFID) and other wireless technology, that can robustly detect the use of even small objects like toothbrushes. Using simple object-use-based models of activities, systems based on these sensors (Philipose et al. 2004; Tapia, Intille, & Larson 2004) have been able to detect a large variety of high-level activities robustly and with high accuracy.

The use of simple object-use models allows an interesting solution to the problem of acquiring models for activities (Perkowitz et al. 2004; Wyatt, Philipose, & Choudhury 2005). Since these models essentially capture the correlation between activity names (e.g., "make coffee") and object names (e.g., "mug", "spoon"), and the mapping is common sense (i.e., most people use most of the same objects), it is possible to mine them using term correlation on a large generic corpus such as the web. The weak classifiers so obtained can serve as priors for unsupervised learning on unlabeled data, which can improve the model automatically. It is intriguing to adapt this idea of using dense sensors, from facilitating commonsense activity recognition to facilitating recognition of generic day-to-day state using common sense at a very large scale. The adaptation is challenging for a variety of reasons. First, it is unclear how to represent models: common sense knowledge relating aspects of the world state tends to be expressed as declarative relations, and the HMMs used for representing activities seem inadequate to the task. Second, it is unclear that term occurrence statistics are a practical means of acquiring arbitrary common sense information from the web.
Third, given that we expect both the number of state variables of interest and the relationships between them to be very large, and given that we want to track state over time, it is unclear how to perform learning and inference efficiently.

In this paper, we present the architecture and implementation of a system called SRCS (for State Recognition using Common Sense) that solves the above problems. SRCS represents information as chain graphs (Buntine 1995), a factored probabilistic graphical representation that allows both directed and undirected dependences. It combines human-entered declarative relational databases of common sense (Singh et al. 2002b; Lenat & Guha 1990) with web-wide information retrieval techniques based on lightweight syntactic analysis (Brill et al. 2001; Etzioni et al. 2004) to produce chain graphs representing the databases. It exploits the structure of the graph to introduce simple but highly effective techniques for performing fast inference on the extremely large graphs generated. We show that when reasoning about 75 minutes of day-to-day activity data from two subjects at a 2.5-second granularity, SRCS is able to track various aspects of the world with accuracy, precision and recall of 88%, 32% and 53%, far above that of baseline schemes. To our knowledge, SRCS is the first system to show how large common sense databases can be used to interpret sensor data collected about the broad state of the world.

Data Acquisition and Representation

Figure 2 shows how SRCS acquires and transforms the representation of commonsense facts that constitute its world view. SRCS operates by collecting sensory input from a user and employing statistical inference to reason about various predefined facts about the state of the world (e.g., "Is the light on?", "Is the user in the kitchen?", "Is the user hungry?"). The model translating between observations and abstract state is acquired from existing hand-created commonsense databases, weighted by quality using automated web-based information retrieval techniques, translated into logical form, converted into a non-temporal probabilistic graphical model (PGM) to enable consistent reasoning under uncertainty, and finally converted into a temporal PGM well suited to efficient inference over time. Below, we describe each of these components in detail.

Figure 2: The SRCS data conversion architecture (Relational DB → mine clause quality from web → Weighted Relations → propositionalize → Weighted Horn Clauses → convert to MRF features → MRF → add temporal smoothing → Chain Graph).

Common Sense Databases

We obtain the basic facts that we reason about, and the relationships between them, from the Open Mind Indoor Common Sense (OMICS) database (Gupta & Kochenderfer 2004). Like Cyc (Lenat & Guha 1990), OMICS is a user-contributed database, based on the interface described by (Singh et al. 2002a); unlike Cyc, which has a small dedicated team of humans adding facts, OMICS allows users from all over the internet to contribute facts. Users are presented with fill-in-the-blank questions such as "You blank when you are blank", with the expectation that users will fill in, e.g., "eat" and "hungry" in the two blanks. The sentence templates map into relations, e.g., the people(Action,Context) relation, which may contain the instance people(eat,hungry). Figure 3 shows some other instances in the database. SRCS uses roughly 50,000 such instances spanning 15 relations.

actiongeneralization('investigate cause of','alarm','smoke alarm')
actiongeneralization('wipe off','floorcover','carpet')
actiongeneralization('clean','floorcover','carpet')
...
contextactions('full garbage bag','put the garbage in','trash')
contextactions('making toasted bread','slice','bread')
...
people('eat','are hungry')
people('drink water','are thirsty')

Figure 3: Sample facts from OMICS. The predicate "actiongeneralization" represents generalized versions of objects used in actions; "contextactions" associates a context with specific actions; "people" associates states of a person with actions performed by that person.

Sensors

Figure 1 shows the iBracelet (Fishkin, Philipose, & Rea 2005), the sensor we use to detect object use during an activity. The iBracelet works as follows. RFID tags are attached to the objects (e.g., the toothbrush and tube of toothpaste in Figure 1) whose use is to be detected. These tags are small, 40-cent battery-free stickers with an embedded processor and antenna. The bracelet issues queries for tags at 1 Hz or faster. When queried by a reader up to 30 cm away, each tag responds (using energy scavenged from the reader signal) with a unique identifier; the identifier can be matched against a separate database to determine the kind of object. The bracelet either stores timestamped tag IDs onboard or transmits read tags wirelessly to an ambient base station, and lasts 12 to 150 hours between charges. We assume that if a bracelet detects a tagged object, the object is in use, i.e., hand proximity to objects implies object use. This definition leads to noise in object-use data, since objects not in use may be near the hand, and some objects may be grasped too far from a tag. Because this noise is low, we assume below that the bracelet yields a stream of the names of objects used in the current task. In its current form, the iBracelet makes it feasible to track the use of thousands of objects in a household.

Figure 1: The iBracelet (left) and RFID tags (right).

Converting from Relational to Propositional Form

We convert the relational entries in the common sense databases into propositional form, such that the atoms of the propositional form correspond to observables and to the propositions we wish to reason about. SRCS's propositions are Horn clauses of the form p1 ∧ ... ∧ pN ⇒ pN+1, where p1, ..., pN are either constants or atoms and pN+1 is an atom. Constants are of five types: object, action, location, context, and state. The types of atoms include (all told, SRCS uses 8 atom types):

• useInferred(O) — object O's use has been observed or indirectly inferred. This is a key atom, since it binds to sensor observations and grounds our inference.
• stateOf(O,S) — object O is currently in state S.
• locationInferred(L) — the current location of the user is L.
• personIn(S) — the user is in state S (e.g., "sleeping" or "happy").
• actionObserved(A) — the action A is observed in the user's world.

We convert from individual relational entries to corresponding Horn clauses using a small fixed set of rewrite rules (approximately 20). For example, one such rule is: people(S,A) → (actionObserved(A) ⇒ personIn(S)). Thus if OMICS contains the fact people(angry,yell), then we define the atoms actionObserved(yell) and personIn(angry), and add them to our set of atoms. Weights from the incoming relations are preserved during rewrites, so that the resulting Horn clauses are weighted. The rewrite rules encode many assumptions about what the relations in the database mean. For instance, we provide a purely propositional view of the world: we cannot quantify over multiple instances. Further, each proposition is assumed to refer to the state of the world in a single time slice. Although these assumptions may not always hold, we believe the end result is still of value.

Gauging the Quality of Facts

Given that the data in the OMICS database is contributed by (non-dedicated) humans, it contains a number of nominal "facts" that go against common sense. The database as it stands provides no information on the degree to which individual relations should be trusted. We use the lightweight web-scale syntactic analysis techniques of the KnowItAll system (Etzioni et al. 2004) to estimate the degree to which each relation should be trusted. KnowItAll is an information retrieval system designed to extract and evaluate widely known facts (and therefore also common sense) from the web. At the heart of KnowItAll is a template-based system that works as follows.
To evaluate instances of a particular relationship, say people(Action,Context), KnowItAll uses a small number of examples of the relation to induce a number of text templates that exemplify it. For instance, it may induce the two templates "[action] when * [state]" and "[state] * [action]" because phrases such as "eat when hungry" appear on the web. Using normalized counts of occurrences of these patterns on the web, KnowItAll produces a measure of how reliable a particular proposition may be.

A limitation of the base KnowItAll system is that it does not directly handle assessment for predicates with more than three arguments. Thus, the four-place stateChange(Action,Object,State,State) cannot be processed directly. In this case, we use a set of three-place templates, "[action] * [state1] [object]", "[state2] because * [action]" and "[state2] due to [ing form(action)]", to assess it. Each template corresponds to a Boolean feature: if the number of hit counts obtained by instantiating it is greater than c = 5, the feature is true, else false. The features are combined to give the final score for the whole instance. Although this technique does not always work well, it is effective at identifying very reliable propositions. The end result of this pass is a database of relation instances r1, ..., rn with a weight wi associated with each ri.

From Weighted Clauses to Markov Random Fields

Say we wish to track information about the state of the environment over a series of time intervals 1, 2, ..., T. We are given a set of objects O1, ..., On whose use may be tracked over this time, and a set of atoms f1, ..., fm which we wish to track. Let oi,t be the random variable representing the use of object Oi at time slice t, and let fi,t be the random variable representing the truth value of atom fi at time slice t. We model the probability p(f1,t, ..., fm,t, o1,t, ..., on,t) of the world at a given time slice t using a Markov Random Field (MRF). An MRF consists of a graph whose set of vertices V — in this case, all fi,t and oi,t for a given t — are connected by a set of cliques ci ⊆ V. Each ci has a potential function φi mapping assignments of ci to nonnegative reals. For an assignment ft, ot to f1,t, ..., fm,t, o1,t, ..., on,t, we have

p(ft, ot) = (1/Z) exp(Σi λi φi(ci(ft, ot)))

where ci(ft, ot) is the assignment imposed on ci by ft, ot and Z is a normalization constant. λi is a weight placed on the clique ci and represents a tunable parameter; for our current purposes, all λi are set to 1.

To represent the state of the world ft given the observations ot, we convert the weighted Horn clauses to an MRF as follows. For each atom and constant in the clauses, we create a node in the MRF. For each weighted clause p1 ∧ ... ∧ pN ⇒ pN+1 with score w, we create a clique c over the nodes ni corresponding to the pi, and associate with it the potential φ(p1, ..., pN+1) = w if p1 ∧ ... ∧ pN ⇒ pN+1 holds, and 1 − w otherwise. In other words, we favor joint assignments that satisfy as many clauses as possible. The actual potential we use is slightly different, so as to discourage assignments that make clauses trivially true by setting the atoms on their left-hand sides to false. This technique of converting weighted logical formulas to MRFs is similar to that used, for instance, in Richardson & Domingos (2006).

Figure 4: The chain graph representing our model over time.

Figure 5: Number of observations, and percentage of total observations, in selected subgraphs of depth d = 2.
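To make the clause-to-potential conversion concrete, the following is a minimal sketch, not the SRCS implementation; the clause, its 0.9 weight, and all function names are our own illustrative choices. It implements the potential φ (w if the clause holds, 1 − w otherwise) and the unnormalized joint score with all λi = 1:

```python
import math

def horn_potential(w):
    """Potential for a weighted clause p1 ∧ ... ∧ pN ⇒ pN+1:
    returns w if the implication holds under the assignment, 1 - w otherwise."""
    def phi(assignment):          # tuple of booleans (p1, ..., pN, pN+1)
        *body, head = assignment
        return w if (not all(body)) or head else 1.0 - w
    return phi

def unnormalized_score(cliques, assignment):
    """exp(sum_i lambda_i * phi_i) with all lambda_i = 1; dividing by Z
    (the sum of this quantity over all assignments) would give p(f_t, o_t)."""
    total = sum(phi(tuple(assignment[n] for n in nodes))
                for nodes, phi in cliques)
    return math.exp(total)

# Toy clause from the people() relation: actionObserved(eat) => personIn(hungry),
# with a hypothetical mined weight of 0.9.
cliques = [(("actionObserved(eat)", "personIn(hungry)"), horn_potential(0.9))]
satisfying = {"actionObserved(eat)": True, "personIn(hungry)": True}
violating = {"actionObserved(eat)": True, "personIn(hungry)": False}
assert unnormalized_score(cliques, satisfying) > unnormalized_score(cliques, violating)
```

As in the text above, assignments satisfying more (and higher-weight) clauses receive higher scores; the refinement that penalizes trivially-true clauses is omitted from this sketch.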
Temporal Dependences via Chain Graphs

The MRF is an effective representation of the relationships between observations and propositions about the world at one moment in time. However, we also wish to incorporate temporal relationships and infer over periods of time. One way to do this is to employ a dynamic MRF, in which we create an MRF for each time slice, then connect them into one larger MRF with potential functions between time slices. For our purposes, however, a dynamic MRF requires an immense amount of computation and is too inefficient: to infer over one time slice t, we must infer over all time slices at once. A better model would permit us to calculate probabilities at time t conditioned only upon those at time t−1 or t+1, not unlike the technique of rollup filtering in dynamic Bayesian networks.

We thus employ a different technique for incorporating temporal relationships into our model. SRCS makes use of a chain graph (CG), described in detail in (Buntine 1995). A chain graph is a hierarchical combination of directed and undirected graphical models. To produce a temporal model, we build a chain graph in which a series of MRFs, each representing a time slice t, are linked by directed edges representing conditional probabilities between nodes in different slices. If ft and ft+1 are nodes in time slices t and t+1, respectively, a directed edge (ft, ft+1) indicates that random variable ft+1 is dependent upon ft. The resulting CG is depicted in Figure 4. As described in (Buntine 1995), there is a natural decomposition of this graph into component random fields and Bayesian networks. The form of our graph allows a natural expression of all the dependencies present in a single time slice, including those from an adjacent time slice, as an MRF.

We describe filtering (i.e., forward inference) here. We may calculate the distribution of a node ft in the CG as P(ft) = Σ_{v ∈ DPa(ft)} P(ft | v) P(v), where DPa(ft) is the set of directed parents of ft, i.e., all nodes vt−1 such that vt−1 has a directed edge to ft; in our current model, DPa(ft) is either empty or {ft−1}. We assume that an atom's truth value degrades at a constant rate absent other observations: if ft is true at time t, then P(ft+k) = pT^k for some pT, and if ft is false at t, then P(ft+k) = pF^k for some pF. We fix pT to 0.95 and pF to 0.095, since we consider propositions more likely to be false than true. This is our model for defining p(ft+1 | ft) for each proposition ft.

Inference

Say that we wish to track the truth values of propositions over a period of time 1, ..., t. To infer state at a time i, we fix those nodes whose truth value is known for time slice i — in this case, all propositions of the form useInferred(O) are set to true or false depending on whether use of object O was detected in time slice i. Using the marginals for propositions at time slice i−1, we may then perform inference on this graph to calculate the probabilities of the unknown variables at time i. We use loopy belief propagation (BP), as described in (Pearl 1988), for inference.

Since the OMICS data is provided by untutored users in natural-language form, the propositions and observations produced in processing it can sometimes be difficult to resolve with observed data.
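As a minimal sketch of the per-slice filtering step just described (our own illustrative code, not the SRCS implementation; atom names are hypothetical), the following combines the evidence-fixing step with one Markov-consistent reading of the constant-rate decay model, using the pT and pF values above. The subsequent loopy BP pass over the slice's MRF is omitted:

```python
def filter_slice(prev_marginals, observed, p_T=0.95, p_F=0.095):
    """One forward-filtering step over a time slice.

    prev_marginals: dict atom -> P(atom true) at slice i-1
    observed: dict atom -> bool for atoms fixed by sensor evidence
              (e.g., useInferred(O) set from the iBracelet trace)

    Unobserved atoms follow the directed temporal edge:
    P(f_i) = P(f_i | f_{i-1}=T) P(f_{i-1}=T) + P(f_i | f_{i-1}=F) P(f_{i-1}=F),
    with P(f_i=T | f_{i-1}=T) = p_T and P(f_i=T | f_{i-1}=F) = p_F.
    In the full system, loopy BP over the slice's MRF would then refine
    these marginals; that step is omitted here.
    """
    new = {}
    for atom, p_prev in prev_marginals.items():
        if atom in observed:
            new[atom] = 1.0 if observed[atom] else 0.0
        else:
            new[atom] = p_prev * p_T + (1.0 - p_prev) * p_F
    return new

m = {"useInferred(mug)": 1.0, "personIn(hungry)": 1.0}
m = filter_slice(m, observed={"useInferred(mug)": False})
# The unobserved atom's marginal decays: 1.0 * 0.95 + 0.0 * 0.095 = 0.95
assert abs(m["personIn(hungry)"] - 0.95) < 1e-9
assert m["useInferred(mug)"] == 0.0
```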
We use a basic synonym-based matching scheme (e.g., matching "bucket" with "pail"), as well as other minor ad hoc techniques, to match the observed uses of objects with the appropriate propositions.

Query-Directed Pruning

This model still poses a problem: representing all the known commonsense propositions about the world at a particular time t can require a huge graph. After processing the OMICS database, our graph contains over 55,000 nodes in a single time slice for representing both observations and hidden variables; inference on each time slice (where each slice represents 2.5 seconds of data) using loopy BP takes approximately 30 minutes. We thus employ techniques to make inference over this graph more tractable. It is unnecessary to perform inference over the entire graph of variables, for two reasons. First, many of the variables may be irrelevant to the current context in which the system is being used; the state of the bathroom sink, for instance, is unlikely to be of interest when observing activities in the garage. Second, we typically use a query set of variables whose truth values we are interested in. Instead of performing inference over the entire graph, we perform inference over a smaller subgraph which includes all possible observations and the query variables we wish to track, in addition to many others. In practice, we do this by selecting the k variables we wish to track and selecting every node within distance d of each such proposition in the time-slice MRF (we used d = 2). To ensure all observations are included in the graph, we then find the shortest path from each proposition to the node useInferred(O) for each object O being tracked by our system, if such a path exists, and select all nodes along these paths as well. The selected nodes, and the potentials between them, comprise the subgraph over which we perform inference. This is the method we use in our experiments to track the state of a subset of variables over time.

Figure 5 gives some insight into why the query-based pruning technique with d = 2 may work for our graph. Many nodes have a grounded observation node within their 2-neighborhood, and quite a few nodes have two or more such nodes. Note that if variables such as locations and primitive actions (such as limb movement) were directly observable and grounded, the density would increase, further favoring pruned inference.

Output Thresholding

The probabilities that SRCS outputs do not, in their current form, have much use as an absolute measure of the probability of an action. However, we have found that they are useful as a relative measure, as they tend to increase as the likelihood of an activity increases. We thus use simple machine learning techniques to identify "threshold" probabilities beyond which we label a proposition true or false. To label traces, we feed SRCS object traces labeled with ground truth for the variables being tracked, and train decision stumps on each proposition to recognize the optimal threshold value for labeling variables. We then perform inference over object traces via the technique described above, with observations of object use fixed to true or false, and label propositions according to whether the output probabilities fall above or below the learned thresholds.

Evaluation Methodology and Results

For our experimental evaluation, we collected traces of the iBracelet's output as worn by three users performing various daily activities in a simulated home environment. The list of activities performed is given in Figure 6. A total of 5-7 minutes of performance of each activity was collected, for a total of approximately 70-75 minutes of data. These traces were divided into time slices of 2.5 seconds; reasoning was performed over each of these time slices.

Figure 6: The set of activities for which experimental data was collected: brush teeth, shave your face, dust shelves, write a letter, take medication, take a shower, eat cereal, make cereal, make tea, water plants, watch television, groom hair, wash windows.

For these activities, we considered a variety of variables about the state of the world which could be relevant. We then selected a set of 24 Boolean variables in the collected SRCS database which represented these variables, or were semantically very close to them; these variables are listed in Figure 7. We then recorded their "truth" value as true or false for each interval of time in the trace. In some instances, the labeling of the variables involved was somewhat subjective; this is a natural consequence of the OMICS database being collaborative, unsupervised, and represented in natural language. However, we are interested in observing how closely the model can track the human interpretation of these variables, so we consider a human labeling of these traces to be an appropriate source of ground truth.

Figure 7: Variables tracked during inference.
a. actioninferred(brush teeth with)     b. contextinferred(brush teeth)
c. stateof(toothbrush,wet)              d. stateof(teeth,clean)
e. stateof(duster,dirty)                f. locationinferred(shower)
g. stateof(cereal,prepared)             h. actioninferred(eat)
i. locationinferred(kitchen)            j. likelyaction(shower with)
k. locationinferred(bathroom)           l. actioninferred(write)
m. personin(want to be entertained)     n. actioninferred(add milk to)
o. stateof(cereal,eaten)                p. likelyaction(swallow)
q. actioninferred(prepare tea in)       r. stateof(tea kettle,hot)
s. actioninferred(shave using)          t. stateof(window,dirty)
u. actioninferred(sit)                  v. actioninferred(write a letter)
w. locationinferred(pantry)             x. locationinferred(greenhouse)

We ran SRCS and trained its decision stumps on a sampling of data for each activity (∼20 minutes total). We then obtained the labeled truth values it produced, as per the method described previously, and compared them to the "ground truth" values provided by human labeling. This was done in two runs: one with the potentials imposed by the OMICS data all set to a uniform strength, and one with the strengths set by the KnowItAll mining. Note that most of the variables are false most of the time (∼94% of labels are false), and finding true variables is of greater interest; we therefore consider the standard IR measures of precision and recall with respect to the discovery of true variables, in addition to labeling accuracy. We are also interested in the effect of the choice of query-directed pruning depth d on the resulting output; we thus ran the experiments for values of d between 1 and 8 (the total diameter of the graph produced).

Total mean accuracy, precision, and recall for all variables combined are compared in Figure 8.

Figure 8: Total mean accuracy, precision, and recall for all variables combined.

Model                  Accuracy   Precision   Recall
Random                 50.00%     8.04%       50.00%
All labeled false      92.55%     —           0.00%
SRCS/uniform prob.     80.26%     18.07%      46.44%
SRCS/KnowItAll prob.   88.42%     31.73%      53.07%

Results for labeling each proposition when the KnowItAll-mined potentials are used are given in Figure 9, compared to random labeling and to labeling all variables false. While the precision and recall figures may seem low, they are far better than those of the baseline strategies, and were achieved using mined and preexisting data with little developer effort spent building the actual model. We see that the use of the mined quality scores considerably improves accuracy and precision. Much of the difference in results is caused by a decrease in positive labels (i.e., variables labeled "true"); while this lowers recall slightly, precision is considerably improved by the decrease in false positives. The mined scores appear to be effective in weeding out lower-quality correlations in the OMICS data.

Figure 9: Per-proposition and mean accuracy, precision, and recall measures as labeled by SRCS.

Performance is inconsistent across these measures: precision and recall are quite high on certain variables and low on others. Part of this is due simply to inadequacies of the OMICS database; because OMICS is represented in natural language and built in an informal fashion, there may be many "holes" in its representation of the world. These deficiencies in the model could be resolved through human entry of correlations between variables or, preferably, automatic discovery of such correlations via mining, machine learning, or other means.

Finally, a graph of precision and recall versus increasing values of the parameter d is given in Figure 10. We see that the improvement in precision and recall levels off at a remarkably small value of the pruning depth: above d = 3, recall is nearly flat, and precision even drops slightly. We chose d = 2 for our experiments because its precision and recall are nearly as good, while inference time is significantly lower, since the number of nodes in the pruned graph grows exponentially with the pruning depth.

Figure 10: Plot of precision and recall versus increasing pruning depth (d).

Future Work

While this work provides an interesting architecture for inference about the state of the world with respect to everyday activity, there remains considerable room for improvement.
For instance, our initial model is built in a relatively simple manner; learning the weights on the graph potentials from labeled object traces is likely to considerably improve inference accuracy. While labeling traces in this manner could represent a considerable cost in human effort, there is much promise in semi-supervised learning methods like those described in (Zhu 2005), in which a small amount of labeled traces is used together with a large set of unlabeled traces; the latter can be collected with minimal effort by simply letting the system passively record activities for some period of time. We are currently exploring effective learning of the weights and potential functions for the graphical model described, as well as making effective use of mined information like that provided by systems such as KnowItAll to minimize the need for labeled training data. We are also exploring the integration of other sources of sensory input into our system; while a trace of object use is useful on its own for recognizing and analyzing activity, other input, such as tracking of movement, can be helpful as well. To this end, we hope to incorporate input from the multi-sensor board described in (Lester et al. 2005), which measures a user's acceleration in each direction along with ambient environmental information. This input could be used to recognize instances of different actions occurring with objects, e.g., "chopping" with a knife. Another promising subject of study is the problem of selecting the subset of variables about the world that are relevant to the user's context. In these experiments, we have considered only a fixed, predefined subset of variables as a means of selecting the variables to use in inference.
A more sophisticated system could attempt to determine the current context of the user (e.g., in the bathroom, making dinner, fixing the car), select the subset of variables relevant to this context, and perform inference over them. We wish to explore solutions to this problem in future work as well.

Conclusions

Densely deployable wireless sensors developed in recent years have made it possible to detect the objects used in daily activities in great detail. We show in this paper that, when coupled with recent advances in collaborative common sense databases, web-scale information retrieval, and large-scale statistical inference, these sensors can yield a system capable of tracking the state of the world at various levels of detail with relatively little human effort. This work suggests many future directions, including the use of other hand-made databases, the use of information retrieval techniques to supplement these databases, the use of other dense sensors such as accelerometers, audio, and GPS to ground even more commonsense nodes, an exploration of scaling inference in sound ways, learning the parameters and structure of the huge network, the use of richer models such as first-order models, and testing the resulting systems on much larger amounts of data.

References

Bao, L., and Intille, S. 2004. Activity recognition from user-annotated acceleration data. In Proc. of PERVASIVE 2004, LNCS 3001, 1–17.

Brill, E.; Lin, J. J.; Banko, M.; Dumais, S. T.; and Ng, A. Y. 2001. Data-intensive question answering. In TREC.

Buntine, W. 1995. Chain graphs for learning. In UAI 1995.

Duong, T. V.; Bui, H. H.; Phung, D. Q.; and Venkatesh, S. 2005. Activity recognition and abnormality detection with the switching hidden semi-Markov model. In CVPR, 838–845.

Etzioni, O.; Cafarella, M.; Downey, D.; Popescu, A.; Shaked, T.; Soderland, S.; Weld, D.; and Yates, A. 2004. Methods for domain-independent information extraction from the web: An experimental comparison. In Proc. of 19th Annual AAAI Conference.

Fishkin, K. P.; Philipose, M.; and Rea, A. 2005. Hands-on RFID: Wireless wearables for detecting use of objects. In ISWC 2005, 38–43.

Gupta, R., and Kochenderfer, M. J. 2004. Common sense data acquisition for indoor mobile robots. In AAAI, 605–610.

Lenat, D., and Guha, R. V. 1990. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley.

Lester, J.; Choudhury, T.; Kern, N.; Borriello, G.; and Hannaford, B. 2005. A hybrid discriminative/generative approach for modeling human activities. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).

McCarthy, J. 1996. From here to human-level AI. In Proc. of Principles of Knowledge Representation and Reasoning (KR).

Minsky, M. 2000. Commonsense-based interfaces. Communications of the ACM 43(8):67–73.

Moore, D. J.; Essa, I. A.; and Hayes, M. H. 1999. Exploiting human actions and object context for recognition tasks. In ICCV, 80–86.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

Perkowitz, M.; Philipose, M.; Patterson, D.; and Fishkin, K. 2004. Mining models of human activities from the web. In WWW 2004.

Philipose, M.; Fishkin, K.; Perkowitz, M.; Patterson, D.; Kautz, H.; and Hahnel, D. 2004. Inferring activities from interactions with objects. IEEE Pervasive Computing Magazine 3(4):50–57.

Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine Learning 62:107–136.

Singh, P.; Lin, T.; Mueller, E. T.; Lim, G.; Perkins, T.; and Zhu, W. L. 2002a. Open Mind Common Sense: Knowledge acquisition from the general public. In CoopIS/DOA/ODBASE, 1223–1237.

Singh, P.; Lin, T.; Mueller, E. T.; Lim, G.; Perkins, T.; and Zhu, W. L. 2002b. OpenMind: Knowledge acquisition from the general public. In Proceedings of the First International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems.

Tapia, E. M.; Intille, S. S.; and Larson, K. 2004. Activity recognition in the home using simple and ubiquitous sensors. In Pervasive, 158–175.

Wyatt, D.; Philipose, M.; and Choudhury, T. 2005. Unsupervised activity recognition using automatically mined common sense. In AAAI, 21–27.

Zhu, X. 2005. Semi-supervised learning literature survey. Computer Sciences TR 1530, University of Wisconsin, Madison.