
The two methods differ only in the way they search for the abstraction level of a particular literal:

The old method tries, at each iteration, to create a rule with the next possible abstraction of a given literal: starting with the word itself, it replaces the literal with its closest generalization.

The new method searches multiple levels in parallel: at each abstraction step, four alternatives for the literal are proposed:

1. the abstract form from the next upper level in the taxonomical tree,

2. the abstract form found at ¼ of the distance from the current level to the current possible maximum level of abstraction (quartile1 in the pseudocode),

3. the abstract form found at ½ of the distance from the current level to the maximum level,

4. the abstract form found at ¾ of the distance from the current level to the maximum level (quartile2 in the pseudocode).
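The four alternatives listed above can be sketched in a few lines of Python. This is only an illustration: the function name `candidate_levels` and the use of integer (floor) division to round the quartile positions are assumptions, since the text does not say how fractional levels are rounded.

```python
def candidate_levels(current, maximum):
    """Return the four abstraction levels proposed for one literal.

    Assumption: levels are integers and quartile positions are rounded down.
    """
    span = maximum - current
    next_level = min(current + 1, maximum)   # next upper level in the taxonomy
    quartile1 = current + span // 4          # 1/4 of the way to the maximum
    half = current + span // 2               # 1/2 of the way to the maximum
    quartile2 = current + (3 * span) // 4    # 3/4 of the way to the maximum
    return next_level, quartile1, half, quartile2


print(candidate_levels(0, 8))   # (1, 2, 4, 6)
print(candidate_levels(2, 10))  # (3, 4, 6, 8)
```

With a shallow taxonomy several of the four levels can coincide, so an implementation might want to skip duplicate candidates before scoring them.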

The abstraction interval is defined as the set of possible abstraction levels for a literal at a given iteration that lie between a lower and an upper margin. The margins are initially 0 for the lower and the maximum abstraction level for the upper, and are then updated iteratively as follows:

If the abstraction found at the next level is selected as the best alternative for the literal, the new abstraction interval is centered on the first quartile and runs from the current level to the midpoint of the previous interval.

If the abstraction found at the first quartile is selected as the best alternative for the literal, the new abstraction interval is again centered on the first quartile and runs from the current level to the midpoint of the previous interval (the same update as in the previous case).

If the abstraction found at the midpoint of the interval is selected, the new abstraction interval stays centered on the same location but runs from the first quartile to the third quartile.

If the abstraction found at the third quartile is selected, the new abstraction interval is centered on the third quartile and runs from the midpoint of the previous interval to the maximum level of the previous interval.
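The update rules above might be written as follows. This is a minimal sketch under stated assumptions: the function name `update_interval`, the string labels for the selected alternative, the floor rounding, and the reading of "current level" as the lower margin of the interval are all illustrative choices, not taken from the original implementation.

```python
def update_interval(lower, upper, selected):
    """Return the new (lower, upper) margins after one abstraction step.

    `selected` names the winning alternative: "next", "quartile1",
    "half" or "quartile2" (hypothetical labels).
    """
    span = upper - lower
    quartile1 = lower + span // 4
    half = lower + span // 2
    quartile2 = lower + (3 * span) // 4
    if selected in ("next", "quartile1"):
        return lower, half           # lower half, centered on the first quartile
    if selected == "half":
        return quartile1, quartile2  # middle half, same center as before
    if selected == "quartile2":
        return half, upper           # upper half, centered on the third quartile
    raise ValueError(f"unknown alternative: {selected}")


print(update_interval(0, 8, "next"))       # (0, 4)
print(update_interval(0, 8, "half"))       # (2, 6)
print(update_interval(0, 8, "quartile2"))  # (4, 8)
```

In every case the new interval is half as wide as the previous one, which is what gives the search its binary-search character.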

The purpose of this search technique was to keep low-level abstractions that do not score well from blocking further improvement. During the experiments, I found that even when a higher-level abstraction wins over lower-level abstractions, a second round of abstraction does not follow. This indicates that the usefulness of neighboring abstractions varies rapidly (and the higher the level, the greater the variation, since the number of attributes grouped inside an abstraction increases exponentially with the level of the abstraction).

The results of the experiments were mixed.

For Reuters, only 3 of the classes had different results, 2 of them positive and one negative:

Class      BE with new method   BE with old method
money-fx   67.0                 70.4
crude      84.9                 82.5
acq        89.2                 86.3

The same trend was also observed on the UCI dataset.

The pseudocode:

New method

Prune_By_Abstraction_Binary_Search (Rule, PruneData)
Begin
  ******************************
  Size=Size(Rule)
  Score=V*_Score(Rule, PruneData)
  For i:=0 to Size
    TempRule:=Prefix(Rule, i)
    If(V*_Score(TempRule, PruneData)>Score)
      Score=V*_Score(TempRule, PruneData)
      Prune_pos=i
    Endif
  Endfor
  ******************************
  Improvement=1
  Level=0
  While(Improvement)
    Improvement=0
    Level++
    Quarter1=set_quarter(Interval, 1)
    Quarter2=set_quarter(Interval, 2)
    For j:=Prune_pos to Size
      Literal=Rule(j)
      AbstrRule:=Abstract(Rule, Literal, Level)
      AbstrRule_Quarter1:=Abstract(Rule, Literal, Quarter1)
      AbstrRule_Quarter2:=Abstract(Rule, Literal, Quarter2)
      AbstrRule_Half:=Abstract(Rule, Literal, (Quarter1+Quarter2)/2)
      ScoreIncr=V*_Score(AbstrRule, PruneData)
      ScoreARQ1=V*_Score(AbstrRule_Quarter1, PruneData)
      ScoreARQ2=V*_Score(AbstrRule_Quarter2, PruneData)
      ScoreHalf=V*_Score(AbstrRule_Half, PruneData)
      MaxScore=Max(ScoreARQ1, ScoreARQ2, ScoreHalf, ScoreIncr)
      If(MaxScore>Score)
        If(ScoreIncr==MaxScore)
          Interval=UpdateInterval(Quarter1)
          TmpRule=AbstrRule
        Endif
        If(ScoreARQ1==MaxScore)
          Interval=UpdateInterval(Quarter1)
          TmpRule=AbstrRule_Quarter1
        Endif
        If(ScoreARQ2==MaxScore)
          Interval=UpdateInterval(Quarter2)
          TmpRule=AbstrRule_Quarter2
        Endif
        If(ScoreHalf==MaxScore) …
        Improvement=1
      Endif
    Endfor
    If(Improvement)
      Rule=TmpRule
    Endif
  Endwhile
  Return Rule
End

Old method

Prune_By_Abstraction (Rule, PruneData)
Begin
  **************************
  Size=Size(Rule)
  Score=V*_Score(Rule, PruneData)
  For i:=0 to Size
    TempRule:=Prefix(Rule, i)
    If(V*_Score(TempRule, PruneData)>Score)
      Score=V*_Score(TempRule, PruneData)
      Prune_pos=i
    Endif
  Endfor
  *************************
  Improvement=1
  Level=0
  While(Improvement)
    Improvement=0
    Level++
    For j:=Prune_pos to Size
      Literal=Rule(j)
      AbstrRule:=Abstract(Rule, Literal, Level)
      If(V*_Score(AbstrRule, PruneData)>Score)
        Score=V*_Score(AbstrRule, PruneData)
        TmpRule=AbstrRule
        Improvement=1
      Endif
    Endfor
    If(Improvement)
      Rule=TmpRule
    Endif
  Endwhile
  Return Rule
End
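To make the old method concrete, here is a small runnable Python sketch of the pseudocode above. Several parts are assumptions made only for illustration: `toy_score` is a hypothetical stand-in for V*_Score (covered positives minus covered negatives), the taxonomy is a child-to-parent dictionary, `Level` is counted from the original word, and `prune_pos` defaults to the rule size when no prefix scores better than the full rule (the pseudocode leaves that case undefined).

```python
def climb(word, steps, parent):
    """Walk `steps` levels up the taxonomy, stopping at the root."""
    for _ in range(steps):
        if word not in parent:
            break
        word = parent[word]
    return word


def covers(rule, doc, parent):
    """A document matches if every literal is one of its words or their ancestors."""
    terms = set()
    for word in doc:
        terms.add(word)
        while word in parent:
            word = parent[word]
            terms.add(word)
    return all(literal in terms for literal in rule)


def toy_score(rule, data, parent):
    """Hypothetical stand-in for V*_Score: covered positives minus covered negatives."""
    return sum((1 if label else -1) for doc, label in data if covers(rule, doc, parent))


def prune_by_abstraction(rule, data, parent):
    size = len(rule)
    score = toy_score(rule, data, parent)
    prune_pos = size  # assumption: nothing is abstracted if no prefix scores better
    for i in range(size + 1):
        prefix_score = toy_score(rule[:i], data, parent)
        if prefix_score > score:
            score, prune_pos = prefix_score, i
    original = list(rule)  # assumption: levels are counted from the original words
    level, improved = 0, True
    while improved:
        improved = False
        level += 1
        best_rule = rule
        for j in range(prune_pos, size):
            candidate = rule[:j] + [climb(original[j], level, parent)] + rule[j + 1:]
            candidate_score = toy_score(candidate, data, parent)
            if candidate_score > score:
                score, best_rule, improved = candidate_score, candidate, True
        if improved:
            rule = best_rule
    return rule


# Toy taxonomy and labeled documents (invented for this example).
PARENT = {"dollar": "currency", "yen": "currency"}
DATA = [
    (["rate", "dollar"], True),
    (["rate", "yen"], True),
    (["rate", "yen"], True),
    (["rate", "yen"], True),
    (["rate", "oil"], False),
    (["rate", "oil"], False),
    (["oil"], False),
]
result = prune_by_abstraction(["rate", "dollar"], DATA, PARENT)
print(result)  # ['rate', 'currency']
```

In this toy run the prefix `["rate"]` beats the full rule, so only the second literal is a candidate for abstraction; replacing "dollar" with "currency" scores higher still, and the loop stops at the next level because no further strict improvement is found.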
