From: AAAI Technical Report SS-00-01. Compilation copyright © 2000, AAAI (www.aaai.org). All rights reserved.
The Learning Shell
Nico Jacobs
Dept. of Computer Science, K.U.Leuven
Celestijnenlaan 200A, B-3001 Heverlee, Belgium
Nico.Jacobs@cs.kuleuven.ac.be
Abstract
In this paper we present preliminary work in the area of automated construction of user profiles. We consider the task of modeling the behavior of a user working with a command shell (such as csh or bash under the Unix operating system). We illustrate how the technique of inductive logic programming can be used to build a user profile and how this profile can be used to predict the future behavior of the user. We describe a first set of experiments to test these hypotheses and present directions for future work.
Introduction
Computers are probably the most flexible tools mankind ever built. However, most users are not able to optimally use all the features computers can offer because they lack the knowledge about these features or lack the time to optimally configure their hardware and software. By combining domain knowledge about potential optimizations with a user profile describing how the user (mis-)uses the computer, changes can be made to optimize this use. In this paper we focus on the automatic construction of a user profile as a set of rules in first order logic. We implemented this technique to model users of a command shell in a Unix environment and present some preliminary results on using this model to predict the user's future behavior.
Inductive Logic Programming

Inductive logic programming (ILP) (Muggleton & De Raedt 1994) combines machine learning with logic programming. ILP systems take as input a set of examples and produce a hypothesis which generalizes over these examples. Input as well as output are represented in a subset of first order logic. Background knowledge can also be used. This is information provided by a domain expert which can be used in building the output hypothesis.

We can use ILP techniques to automatically build a model of the behavior of the user. The input of the system consists of actions performed by the user, together with a description of the state of the environment if relevant. In the learning from interpretations setting (Blockeel et al. 1999) we represent the input as facts in first order logic. The output is a general description of the behavior of the user, represented as a first order logic theory.

In the rest of the paper we will focus on modeling a command shell user. Although this is a very old and basic user interface, it is still widely used. The main reason for selecting this user interface is that it's very easy to log the user's actions and the state of the environment when using such shells. It's also very easy to adapt the shell so that it uses the generated model. Modern graphical user interfaces (e.g. the Microsoft Windows 2000 operating system) also keep a log of all actions performed, which is comparable with what a user types in at a shell. As a result, the technique we present here can be used to generate user models from the logs of modern GUIs as well. However, the main difference is how the resulting model will be used.

We use the ILP system WARMR (Dehaspe & Toivonen 1999) to discover first order association rules in the user behavior. As input to the system we use the commands typed at the prompt of a Unix command shell, for example:
% cd tex
% emacs -nw -q aaai.tex
These commands are parsed and transformed into logical facts. We split up the information in three facts: information about the command used, the switches used (a switch is a parameter provided to a command to modify its behavior) and the attributes (such as filenames). The first argument of each of these facts is a unique example identifier which indicates the order in which the command was entered; the second argument of the switch/3 and attribute/3 predicates indicates the order of switches and attributes within the command, because the order may be important (e.g. in the cp (copy) command). In Unix it's possible to submit multiple commands on one input line (using the pipe symbol '|') in such a way that the result of the first command is used as the input for the second, etc. To incorporate that in our knowledge representation scheme we can either introduce a fourth
fact which takes as argument the identifier of a piped command, or we can add a third boolean argument to the command fact. This feature is not yet included in the prototype system.
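As an illustration only (the predicate name piped/2 is our hypothetical choice, not part of the prototype), a pipeline such as ls -l | grep tex, say commands 3 and 4 of a log, could then be represented with an extra fact linking the two commands:

command(3,'ls').
switch(3,1,'-l').
command(4,'grep').
attribute(4,1,'tex').
piped(3,4).    % hypothetical fourth fact: the output of command 3 is the input of command 4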
The two example commands shown earlier are represented by the following facts:

command(1,'cd').
attribute(1,1,'tex').
command(2,'emacs').
switch(2,1,'-nw').
switch(2,2,'-q').
attribute(2,1,'aaai.tex').
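As a sketch of how such facts could be derived from a tokenised input line (this is our own illustration, not the authors' parser; parse_command/2 and store_args/4 are hypothetical helpers):

:- dynamic command/2, switch/3, attribute/3.

% parse_command(+Id, +Tokens): assert command/2, switch/3 and
% attribute/3 facts for one tokenised command line.
parse_command(Id, [Cmd|Args]) :-
    assertz(command(Id, Cmd)),
    store_args(Args, Id, 1, 1).

store_args([], _, _, _).
store_args([Arg|Rest], Id, S, A) :-
    (   atom_concat('-', _, Arg)           % tokens starting with '-' are switches
    ->  assertz(switch(Id, S, Arg)),
        S1 is S + 1, A1 = A
    ;   assertz(attribute(Id, A, Arg)),    % all other tokens are attributes
        A1 is A + 1, S1 = S
    ),
    store_args(Rest, Id, S1, A1).

For example, parse_command(2, ['emacs','-nw','-q','aaai.tex']) produces exactly the facts shown above for the second command.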
Now we can start looking for regularities in these examples using WARMR. However, within one example one can only find relations between a command and its attributes, a command and its switches, or between switches and attributes. Interesting rules describe relations between a command and the previously used commands. WARMR doesn't automatically take these relations into account, but using background knowledge we can inform WARMR about these relations by defining relations such as nextcommand and recentcommand. Background information is represented as Prolog programs; see (Bratko 1990) if you are unfamiliar with the Prolog notation used.
nextcommand(CommandID,Nextcommand) :-
    NextID is CommandID - 1,
    command(NextID,Nextcommand).

recentcommand(CommandID,Recentcommand) :-
    command(RecentCommandID,Recentcommand),
    Recentborder is RecentcommandID + 5,
    Recentborder > CommandID.
We can also use this background knowledge to 'enrich' the data that's available. One type of background knowledge is defining some sort of taxonomy of commands (emacs is an editor, editors are text tools, ...). Other information can be provided as well (is this new or old software, does it use a GUI, a network connection, or other resources, ...).
command(Id,Command) :-
    isa(Origcommand,Command),
    command(Id,Origcommand).

isa(emacs,editor).
isa(vi,editor).
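The taxonomy can be extended in the same style, and the other kinds of background information mentioned above could be encoded as further facts. The predicates below are hypothetical examples, not part of the prototype:

isa(editor,texttool).       % next level of the taxonomy: editors are text tools
isa(pico,editor).
uses_network(ssh).          % hypothetical: this command needs a network connection
uses_network(telnet).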
When all input data is converted and all useful background knowledge is defined, WARMR calculates the set of rules which describes all frequent patterns which hold in the dataset (we have to provide a language bias which defines the space of rules we are interested in). These rules can contain constants as well as variables.
IF command(Id,'ls') THEN switch(Id,'-l').

IF recentcommand(Id,'cd') AND command(Id,'ls')
THEN nextcommand(Id,'editor').

For each rule we know the accuracy of the rule, the deviation (an indication of how 'unusual' the rule is: the proportional difference between the number of examples which obey this rule and the number of examples which would obey this rule in a uniform dataset) and the number of examples which obey this rule.
Experiments

To test the quality of the generated user models we performed tests in which we use these models to predict the next command the user will type based on a history of 5 previous commands. This raises the problem of how the user model can be used to predict the next command a user will use, because in most situations more than one rule in the user model will be applicable, often with different conclusions. Therefore we tested 9 strategies for selecting a rule to predict the next command: using (the prediction of) the rule with the highest accuracy, the highest deviation, the highest support, the command which was predicted the most often, the command with the highest accumulated accuracy, accumulated deviation or accumulated support. We also used the command with the highest product of accuracy and deviation, and with the highest accumulated product of these two statistics.
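To make the simplest of these strategies concrete, here is a minimal sketch under our own assumptions, not the authors' implementation: suppose every rule that fires on the current 5-command history is available as a fact rule(Conclusion, Accuracy); the maxacc strategy then simply returns the conclusion of the rule with the highest accuracy.

% hypothetical encoding of two applicable rules and their accuracies
rule(editor, 0.38).
rule(ls, 0.52).

% maxacc: predict the conclusion of the applicable rule with the highest accuracy
predict_maxacc(Prediction) :-
    findall(Acc-Cmd, rule(Cmd, Acc), [First|Rest]),
    best(Rest, First, _BestAcc-Prediction).

best([], Best, Best).
best([Acc-Cmd|Rest], BestAcc-BestCmd, Best) :-
    (   Acc > BestAcc
    ->  best(Rest, Acc-Cmd, Best)
    ;   best(Rest, BestAcc-BestCmd, Best)
    ).

With the two rules above, predict_maxacc(P) yields P = ls.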
In this experiment we used data from real users. Some of the commands typed by the user don't exist (errors), others are only very rarely used. We instructed the system to only learn rules to predict commands from a set of 30 frequently occurring commands (ls, cd, exit, pine, rm, vi, more, pwd, dir, make, cp, ps, finger, telnet, javac, export, pico, tt, lpq, elm, man, java, rlogin, w, mv, grep, logout, less, ssh and mail). We logged the shell use of 10 users, each log consisting of approximately 500 commands. We split up each log into a training set consisting of 66% of the examples; the remaining examples were used as a test set. We performed two experiments. In a first experiment one general user model was built from the combined training data of all the users and we tested this general model on the test data of each user individually. In a second experiment we built an individual user model for each user based on her training data and tested this on the test data for that user.

The first experiment (the 'global' column in Table 1) confirms that the most intuitive rule selection method (namely selecting the rule with the highest accuracy on the training set (maxacc)) leads to the best results: 36.4% of the occurrences of frequent commands were predicted correctly (all percentages are averages over the 10 users). The rule with the highest deviation (maxdev), the highest product of accuracy and deviation (maxpro) and the highest accumulated product of accuracy and deviation (accpro) all perform almost as well. A clearly bad rule selection method is selecting the command which is predicted by the highest number of applicable rules regardless of their accuracy or deviation (count), resulting in a 20.4% accurate prediction. Rule selection methods based on accumulated accuracy (accacc), accumulated deviation (accdev), predictions from rules which cover most examples (maxrul) and commands which have the highest accumulated number of covered examples in the training set (accrul) all perform in between.

method    global    specific
maxacc    36.4      44.7
maxdev    34.8      43.3
maxpro    36.0      45.3
maxrul    28.3      34.4
accacc    32.5      41.3
accdev    34.0      40.1
accpro    35.5      42.5
accrul    30.2      40.0
count     20.5      31.2

Table 1: Prediction accuracies for different rule selection methods and for global as well as user specific models
The second experiment, in which user models are built for each user individually, shows the same results concerning the rule selection methods: maxacc, maxdev, maxpro and accpro perform well, count performs badly and the others are in between. The prediction accuracies are significantly higher than in the first experiment, for all rule selection methods. This shows that constructing user models for each individual separately is beneficial for the resulting user model and confirms previous results in the domain of building user models for the browsing behavior of users (Jacobs 1999).
Future Work
The system and experiments described above are only preliminary work. They illustrate that it's possible to learn user models from logs, and that it's beneficial to learn individual models for each user. However, the system still needs improvements to become practically useful.
In both experiments the best rule selection method doesn't outperform the other methods for every user: for some users other methods perform better than the method with the overall highest accuracy. This information could be used as input for a meta-learning task: which type of rule selection method is best suited for which type of user model. Another possible improvement of the rule selection method is defining new methods as combinations of the above methods (the product of accuracy and deviation is already an example of such a method). Voting or weighted voting techniques could be useful.
The rules that the system builds up till now only predict the command the user will use. Often the parameters the user provides (such as filenames) can be predicted as well. Since we use inductive logic programming as an induction technique, it's easy to adapt the system such that rules can be constructed which not only predict the command but also (some of) the parameters, either as constants or as variables (in this way linking them to parameters of previous commands).
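A rule of this kind could, for instance, look as follows. This is our own hypothetical illustration in the same notation as above; nextattribute is an assumed predicate analogous to nextcommand, and the shared variable File links the parameter of the next command to that of the current one:

IF command(Id,'emacs') AND attribute(Id,1,File)
THEN nextcommand(Id,'latex') AND nextattribute(Id,1,File).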
We focused on how to generate the user model and how to select rules from this model. A next step is the actual use of these models to aid the user. An obvious use of these models is intelligent command completion, in which the computer tries to complete the command the user is typing. With these models this is even possible before the user has typed the first character of the new command! But other, possibly more useful, applications of the models exist. One such application is the automatic construction of useful new commands (macros). The system can do this by selecting a 'good' rule (for example a rule with high accuracy and deviation) if command(A) then nextcommand(B) and creating a command C which just executes A and then B, as sketched below. Another application of this user model is in giving the user information about how he (mis)uses the shell. The system could e.g. detect that the user often removes file A after copying file A to B. The system can then inform the user that it's more efficient to use the move command.
The log file that the system uses to induce a user model lacks information which can be useful. Up till now the file only contains a history of commands, but no information at all about the state of the environment. We can inform the system by including information such as the date, the time, the workload of the machine and the current directory at the moment that a command is invoked in the log file. This information can then be used in the construction of new rules.
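Such state information could simply be added as extra facts per example identifier. The predicate names below are hypothetical illustrations:

entered_at(2, date(2000,3,20), time(14,35)).   % when command 2 was entered
current_dir(2, '/home/nico/papers').           % working directory at that moment
machine_load(2, 0.7).                          % workload of the machine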
Finally, we considered the user model as static. In reality a user model changes often, so our system should build user models in a non-monotonic, incremental way. This can be achieved with the current system by keeping a window on the log file and using those examples to construct a user model from scratch each time. But this is clearly a very inefficient approach, and a truly incremental system would speed up the user model construction/maintenance.
Conclusions
We briefly described how we can use the inductive logic programming system WARMR to build user models and discussed the benefits of this approach: the use of background knowledge to provide all sorts of useful information to the induction process, and an expressive hypothesis space (Horn clause logic) which allows for finding rules that contain variables. We illustrated this in the domain of command shells. We also presented 9 strategies to select rules from this model for prediction. We constructed a global user model as well as individual models for each user and tested their predictive accuracy on real user data, which indicated that individual models perform significantly better than a global model. In future work we have to extend the system to predict the parameters of the commands as well, improve
the rule selection method, incorporate more information about the environment and look for a truly incremental construction of the user model. Finally more
attention should be paid to the use of the model to
improve the user interface.
Acknowledgements:
The author thanks Luc Dehaspe for his help with the WARMR system and Luc De Raedt for the discussions on this subject. Nico Jacobs is financed by a specialization grant of the Flemish Institute for the Promotion of Scientific and Technological Research in the Industry (IWT).
References
Blockeel, H.; De Raedt, L.; Jacobs, N.; and Demoen, B. 1999. Scaling up inductive logic programming by learning from interpretations. Data Mining and Knowledge Discovery 3(1):59-93.
Bratko, I. 1990. Prolog Programming for Artificial Intelligence. Addison-Wesley, 2nd edition.
Dehaspe, L., and Toivonen, H. 1999. Discovery of frequent datalog patterns. Data Mining and Knowledge Discovery 3(1):7-36.
Jacobs, N. 1999. Adaplix: Towards adaptive websites.
In Proceedings of the Benelearn ’99 workshop.
Muggleton, S., and De Raedt, L. 1994. Inductive logic programming: Theory and methods. Journal of Logic Programming 19/20:629-679.