Mental Models for Intelligent Agents
Nikolaos Mavridis
MIT Media Lab

Early motivation...
- How are people able to think about things that are not directly accessible to their senses at the moment?
- What is required for a machine to be able to talk about things that are out of sight, happened in the past, or to view the world through somebody else's eyes (and mind)?
- What is the machinery required for the comprehension of a sentence like "Give me the green beanbag that was on my left"?

Overview
- Why mental models?
- Architecture
  - W: The descriptive language
    - Reusable models
    - Property description structures
  - S: Sensory structures
  - F: Instantiators / Predictors / Reconciliators
- Closing comments...

Why mental models?
- Goal: provide an intermediate representation, mediating between perception and language
- In essence:
  - an internalized representation of the state of the world as best known so far, in a form convenient for "hooking up" language
  - a way of updating this representation given further relevant sensory data, and of predicting future states in the absence of such data

Why mental models?
But also:
- A useful decomposition of a complex problem, suggesting a practical engineering methodology with reusable components, as well as a theoretical framework
- A unified platform for the instantiation of hypothetical scenarios (useful for planning, instantiation of situations communicated through language, etc.)
- A starting point for experimental simulations of:
  - Multi-agent systems with differing partial world knowledge or model structure
  - Primitive versions of theory of mind, by incorporating the estimated world models of other agents
  - Learning parameters or structures of the architecture, and experimenting with learned vs. innate (predesigned) tradeoffs (for example, learning predictive dynamics, senses-to-model maps, language-to-model maps, etc.)

Notation & Formalities
D = {W, S, F}: a dynamical mental model
- W: Mental Model State
  - W[t]: state of the mental model at time t
  - W: the structure of the state (the chosen descriptive language for the world, an ontology); decompositions might be hierarchical:
    - W = {O1, O2, ...} into Objects/Relations (creations/deletions crucial)
    - Oi = {P1, P2, ...} into Properties (updates of contents, but usually no creations/deletions)
- S: Sensory Input
  - S[t] = {I1, I2, ...} (Modalities/Sensors)
- F: Update / Prediction function
  - W[t+1] = F(W[t], S[t]), as a dynamical system
  - F is a two-argument update / prediction function
  - A decomposition (...also Wh[t]: hypotheticals):
    - (W[t], S[t]) -> Ws[t]   (sensory-driven changes in W)
    - W[t] -> Wp[t]           (prediction-driven changes in W)
    - W[t+1] = R(Ws[t], Wp[t])   (the "reconciliation" function)

Block Diagram (& sync issues!)
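Before turning to the concrete structures, here is a minimal sketch of how the decomposition of F above might look as code. All names (ObjState, WorldState, sensory_update, predict, reconcile, step) are illustrative placeholders, not the actual mental_model API, and the predictor stands in for the ODE-based dynamics described later.

#include <map>

struct ObjState { double pos[3]; double lvel[3]; };
typedef std::map<long, ObjState> WorldState;    // ID -> object state
typedef std::map<long, ObjState> SensoryInput;  // ID -> sensed state

// (W[t], S[t]) -> Ws[t]: create or overwrite whatever the sensors report
WorldState sensory_update(WorldState w, const SensoryInput& s) {
    for (SensoryInput::const_iterator it = s.begin(); it != s.end(); ++it)
        w[it->first] = it->second;
    return w;
}

// W[t] -> Wp[t]: a trivial constant-velocity predictor stands in here for
// the ODE-based dynamics / collision handling of the real predictor
WorldState predict(WorldState w, double dt) {
    for (WorldState::iterator it = w.begin(); it != w.end(); ++it)
        for (int k = 0; k < 3; ++k)
            it->second.pos[k] += dt * it->second.lvel[k];
    return w;
}

// R(Ws, Wp): when an object was sensed this step, trust the senses;
// otherwise fall back on the prediction (the "simplistic" policy of the
// reconciliation slide; confidence-weighted blending is the refinement)
WorldState reconcile(const WorldState& ws, const WorldState& wp,
                     const SensoryInput& s) {
    WorldState next = wp;
    for (SensoryInput::const_iterator it = s.begin(); it != s.end(); ++it)
        next[it->first] = ws.find(it->first)->second;
    return next;
}

// One step of the dynamical system: W[t+1] = F(W[t], S[t]) = R(Ws[t], Wp[t])
WorldState step(const WorldState& w_t, const SensoryInput& s_t, double dt) {
    return reconcile(sensory_update(w_t, s_t), predict(w_t, dt), s_t);
}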
W: the descriptive language
- W in a conversational setting: include me, you, others
- Indexing: internal & external IDs, continuity, signatures
- Bottom-up:
  - Simple_object (e.g. a cylinder)
  - Object_relation (binary) (e.g. a hinge joint)
  - Compound_object = SimpleObjectMap U ObjectRelationMap
  - Agent = Compound_object U Viewpoint U Gripper U Mover?
  - Agent_relation (e.g. inter-agent joints, visibility?)
  - Compound_agent = AgentMap U AgentRelationMap = World
- Basic properties: in simple_object, object_relation
- Property description structures (fixed, with confidences, with stochastic model, observational history, categorical form) - later

Simple_object

class simple_object : public Packable {
  long ID;

  //OBSERVER-INDEPENDENT PROPERTIES

  //Gradations of existence
  int exists;       //exists=1 means sensory object, exists=0 means virtual
  int body_exists;  //for ODE newtonian dynamics
  int geom_exists;  //for ODE collision-detection
  int draw_exists;  //should it be visible in the visualiser?

  //Position, rotation and velocity (second-order state space for rigid body)
  double pos[3];
  double R[12];     //remember to set quaternion, too!
  double lvel[3], avel[3];
  double facc[3], tacc[3];  //force and torque accumulators... are they required?

  //Shape
  int shape;
#define SOBJECT_SHAPE_BOX 1
#define SOBJECT_SHAPE_SPHERE 2
#define SOBJECT_SHAPE_CYLINDER 3
#define SOBJECT_SHAPE_CAPPEDCYLINDER 4
  double shapeparam[3];  //[0] is also radius, [1] is also length

  //Mass -- WHICH SHOULD BE CHOSEN AND WHICH DERIVED?
  double density;
  double mass;      //this is just density*volume, i.e. density*f(shape)
  double weight;    //should this also be here? just mass*gravity

  //Color & texture
  double color[3];
  int texture;

  //Relations with other objects: attachment, visibility
  //(inview_rip is OBSERVER-DEPENDENT in a sense...)
  //THESE MIGHT ALSO BE PART OF RIPLEY'S STRUCTURES
  int attached;
  int inview_rip;
  int inview_rip_x2D, inview_rip_y2D;
};

Object_relation

class binary_object_relation : public Packable {
  friend ostream& operator<<(ostream& os, binary_object_relation &bor);
public:
  long ID;
  long obj1ID;
  long obj2ID;
  //vector<double> params; //will become more specific later

  int type;
#define BOR_TYPE_HINGE 1
#define BOR_TYPE_HINGE2 2
#define BOR_TYPE_BALL 3
#define BOR_TYPE_SLIDER 4

  double axis[3];
#define BOR_DIRECTION_X 1
#define BOR_DIRECTION_Y 2
#define BOR_DIRECTION_Z 3

  double anchor[3];

  double param[10];
#define BOR_PARAM_ANGLE 0
#define BOR_PARAM_ANGLERATE 1
#define BOR_PARAM_HISTOP 2
#define BOR_PARAM_LOSTOP 3
#define BOR_PARAM_VEL 4
#define BOR_PARAM_FMAX 5
#define BOR_PARAM_FUDGEFACTOR 6
#define BOR_PARAM_BOUNCE 7
#define BOR_PARAM_STOPERP 8
#define BOR_PARAM_STOPCFM 9
};

Compound_object

typedef map<long, simple_object> SimpleObjectMap;
typedef map<long, binary_object_relation> BinaryObjectRelationMap;

static long compound_object_ID_counter = 0;  //Object 0 is not allowed!

class compound_object {
  friend ostream& operator<<(ostream& os, compound_object &cobj);
public:
  long ID;
  long signature;  //signature of IDs of component objects and relations
  int exists;      //existence flag...
                   //should we allow existence of subobjects even if globally it doesn't exist?

  SimpleObjectMap objects;
  long internal_object_ID_counter;
  BinaryObjectRelationMap relations;
  long internal_relation_ID_counter;

  compound_object();
  compound_object(long ID_);

  void clear();
  void set_exists();
  void set_notexists();
  void clear_objects();
  void clear_relations();

  long add_object(simple_object &object_in);  //returns new outerID
  long add_relation(binary_object_relation &relation_in);
  void add_objects(SimpleObjectMap &somap_in);
  void add_relations(BinaryObjectRelationMap &bormap_in);
  void add_objects_and_relations(compound_object &cobj_in);
  void add_SimpleObjectMap(SimpleObjectMap &somap_in);
  void add_BinaryObjectRelationMap(BinaryObjectRelationMap &bormap_in);
  void add_compound_object(compound_object &cobj_in);
  int delete_object_innerID(long ID_);
  //... etc. ...
};
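As a usage sketch only (not taken from the actual codebase): two cylinders joined by a hinge, assembled into a compound_object. It assumes the definitions above, that the fields shown are publicly accessible, and that add_object/add_relation behave as the comments suggest; the numeric values and the helper name build_two_link_arm are made up for illustration.

// Hypothetical two-link "arm": two simple_objects plus one hinge relation
compound_object build_two_link_arm() {
    simple_object link1;
    link1.exists = 1;                          //sensory (non-virtual) object
    link1.shape = SOBJECT_SHAPE_CYLINDER;
    link1.shapeparam[0] = 0.05;                //radius
    link1.shapeparam[1] = 0.40;                //length
    link1.pos[0] = 0; link1.pos[1] = 0; link1.pos[2] = 0.20;

    simple_object link2 = link1;
    link2.pos[2] = 0.60;                       //stacked on top of link1

    compound_object arm(++compound_object_ID_counter);  //Object 0 not allowed
    long id1 = arm.add_object(link1);          //returns new outer ID
    long id2 = arm.add_object(link2);

    binary_object_relation hinge;
    hinge.type = BOR_TYPE_HINGE;
    hinge.obj1ID = id1;
    hinge.obj2ID = id2;
    hinge.axis[0] = 1; hinge.axis[1] = 0; hinge.axis[2] = 0;   //rotate about x
    hinge.anchor[0] = 0; hinge.anchor[1] = 0; hinge.anchor[2] = 0.40;
    hinge.param[BOR_PARAM_LOSTOP] = -1.0;      //joint limits (radians)
    hinge.param[BOR_PARAM_HISTOP] =  1.0;
    arm.add_relation(hinge);

    arm.set_exists();
    return arm;
}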
Agent, Compound Agent

class agent : public compound_object, public Packable {
public:
  viewpoint viewpt;
  gripper grip;
  //mover mov;
public:
  void pack(int initsend=1);
  void unpack();
};

typedef map<long, agent_ODE> AgentODEMap;
typedef map<long, binary_agent_relation_ODE> BinaryAgentRelationODEMap;

class compound_agent_ODE : public Packable {
  long ID;
  long signature;  //signature of IDs of component objects and relations
  int exists;      //existence flag...
  AgentODEMap agents;
  long internal_agent_ID_counter;
  BinaryAgentRelationODEMap relations;
  long internal_relation_ID_counter;
  //+ member functions...
};

More on the structures...
- myObjects:
  - Packaging: a dozen .h/.cpp files made into libmyobjects.a (+ some utils); include "myobjects.h"
  - Expand/rethink the types of relations!
  - Think about joint (and relation!) recognition (Mann...)
- myModels:
  - Ready-made models for specific agents (Ripley, human, environment)... packaged in libmymodels... Expand!!!
  - These include parameter sets for customization, and creation and deletion functions (as well as sensory update functions?)
  - Outer IDs and body parts?
- myObjectsODE: ODE-supplemented version for the predictor

A model example: ripley_model.h
EASILY PARAMETRISABLE FOR OTHER n-dof ARMS...

#define SC 6.  //Scaling factor, for some dimensions (rip etc.), not all!
const double ripley_start_pos[3] = {23.2/SC, 0, 5.2/SC};

//Simple objects comprising ripley
#define NUM 6  //Number of links comprising ripley
const double ripley_head_length = 5/SC;
const double ripley_head_radius1 = 3.5/SC;
const double ripley_head_radius2 = 2.5/SC;
const double ripley_head_color[3] = {.5,.5,.5};
const int ripley_head_texture = DS_WOOD;
const double ripley_link_length[NUM] = {2.4/SC, 22.8/SC, 9/SC, 1.8/SC, 2.3/SC,
    ripley_head_length /*0.1/SC*/ /*5.2/SC*/};  //last one - length of camera
const double ripley_color[3] = {.61,.61,.61};
const int ripley_texture = DS_WOOD;
const double ripley_link_radius[NUM] = {2/SC, 2/SC, 1.5/SC, 1.5/SC, 1.1/SC,
    ripley_head_radius2};
//etc...

//*************
//* FUNCTIONS *
//*************
void create_ripley_part_i(simple_object &part, int i, double* midpt, double* R_);  //parts
void create_ripley_bor_i(binary_object_relation &bor_in, SimpleObjectMap &somap_in, int i, double* pos);  //joints
void create_ripley(agent &agent_in, const double* pos);  //initial creation, called in main
void calc_ripley_viewpoint_hpr(agent &agent_in);
void calc_ripley_viewpoint_hpr_nofilter(agent &agent_in);
void calc_ripley_viewpoint_mat(agent &agent_in);
void calc_ripley_viewpoint_nofilter(agent &agent_in);
void calc_ripley_viewpoint(agent &agent_in);

Property description structures
- The near future:
    class property_conf {string name; double value; double confidence;};
  - How to deal with ints/doubles and vectors?
  - How to update conf? Decrease it with time? (see the sketch below)
- 4-tier structure:
    class property_4 {
      categorical_descr c;  //variable granularity, context-sensitive boundaries
      property_conf ml;
      stoch_descr distrib;
      relevant_sensory_history senspointers;
    };
- Advantages:
  - Confidences are vital for incomplete knowledge / information-driven sensing
  - Homogenisation is very useful for later experiments in feature selection etc.
- Suggestions/ideas?
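A rough illustration of the property_conf idea with a simple confidence-decay policy. The exponential decay, the blending rule in observe(), and the time constant are assumptions for discussion, not the final design.

#include <string>
#include <cmath>
#include <algorithm>

class property_conf {
public:
    std::string name;
    double value;
    double confidence;   //in [0,1]

    property_conf(const std::string& n, double v, double c)
        : name(n), value(v), confidence(c) {}

    //Called on every mental-model tick with no new observation:
    //confidence decays towards 0 with time constant tau (seconds)
    void decay(double dt, double tau = 5.0) {
        confidence *= std::exp(-dt / tau);
    }

    //Called when a sensor reports a fresh measurement with its own
    //confidence: blend old and new values by their relative confidences
    void observe(double sensed_value, double sensed_conf) {
        double w = sensed_conf / (sensed_conf + confidence + 1e-9);
        value = (1.0 - w) * value + w * sensed_value;
        confidence = std::max(confidence, sensed_conf);
    }
};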
S: the sensory structures
- Vision:
  - Objectworld from the 2D Objecter
  - Extensions for 3D? Shape models?
  - Partial view integration and the instantiator?
- Proprioception:
  - JointAnglesPacket from Ripley's control
  - Weight measurements
  - Direct access to force feedback?
- Switching from continuous to on-demand feeding of new information (i.e. lookat() etc.)

F: update/prediction function I. Instantiators
- Modality-specific instantiators (& updators/destructors):
  - Send create/update/delete packets to the mental_model
  - They SHOULD know the previous world state
  - Are they modality- or agent-specific? Should the generic agent models include specific sensory update functions?
- Virtual object instantiator:
  - Sometimes also used for the creation of sensory-updated agents (i.e. self) - where are the boundaries?
  - What would the clients need? Let's choose an API

F: update/prediction function II. Predictor & Reconciliator
- Prediction rules:
  - Collision detection (collisions as object relations)
  - Dynamics (reconciliation with senses, inference of internal forces... where to store them?)
  - Out-of-bounds deletions & object stabilisation
- Reconciliation:
  - How to resolve conflicts between sensed, predicted and requested states? (think: multiple sensors in a car)
  - Simplistic: when no other information is available, use the prediction. Otherwise, blend senses with prediction?

Closing comments
- Many open questions / lots of work on the horizon!
- How do you achieve localisation of information and actions in these modules? Who should know what, and how should things be synced? What about global signals sent from outside?
- Let's design for easy customisation/reusability. Significant parallelisation achieved.
- Some landmarks for the future:
  - Confidences in property descriptions
  - Virtual object instantiator connected to a 3D world creation tool for simulated external worlds
  - Better shape description capabilities & vision
  - Connection to a different robot
  - Two virtual agents in a simulated world, each with its own mental model and an estimate of the other's - simple demos
  - 4-tier property descriptors
  - Hypothetical scenarios and planning
- Parallel work:
  - Extend the linguistic modules for more functionality, given the richness of the structures
  - Given confidences, better shape descriptions and an extended Bishop, do action (and speech) selection by maximum expected information return in a general framework

Our ultimate goal...
Let's make Ripley and his brothers more fun to talk to!
And let's learn more about ourselves on the way...

Thanks for your attention!