International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 10 - Oct 2013
A Transient Pattern Search Algorithm for Event Visualization
Siva Sankar Grandhi1, Srinivasu Varma Penumatsa2, Chinna babu Galinki3
1M.Tech Scholar, 2Associate Professor, 3Associate Professor
1,2,3CSE Department, Avanthi's St.Theressa Institute of Engineering & Technology
Abstract: Pattern searching is an important task when searching records, the web, or other large collections of data. In traditional search operations, the patterns are stored in a single array together with their time stamps. We introduce a new algorithm, called temporal pattern search, that keeps events of the same type, with their time stamps, in separate arrays. It performs binary search over the appropriate time stamps and works efficiently on personal histories.
I. INTRODUCTION
As the use of electronic health records (EHRs) spreads, there are growing opportunities to use them in clinical research and patient care, and search queries often have a temporal component. For example, one query might find all patients who were discharged from the emergency room and then admitted again within a week. Another might find patients who had a normal serum creatinine lab test less than 2 days before a radiology test with intravenous contrast, followed by an increase in serum creatinine of more than 50%. Currently available user interfaces support only simple queries, such as finding patients who had a radiology test with contrast and a high creatinine value, and leave users with the burden of shuffling through large numbers of results in search of matching patients.
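To make the first example concrete, the following is a minimal sketch that assumes each patient's events are already split into time-sorted lists per event type, with times measured in days. The event-type names `discharge_er` and `admit`, the helper `readmitted_within`, and the sample data are hypothetical. For each discharge, a binary search (`bisect_right` from Python's standard library) locates the first later admission, so the admission list is never scanned in full.

```python
from bisect import bisect_right


def readmitted_within(events_by_type, first="discharge_er", second="admit", window=7.0):
    """Return True if some `second` event follows a `first` event within `window` days.

    events_by_type maps an event-type name to a sorted list of time stamps
    (hypothetical layout used only for this illustration).
    """
    firsts = events_by_type.get(first, [])
    seconds = events_by_type.get(second, [])
    for t in firsts:
        # Binary search: index of the first `second` event strictly after t.
        i = bisect_right(seconds, t)
        if i < len(seconds) and seconds[i] - t <= window:
            return True
    return False


patients = {
    "p1": {"discharge_er": [10.0], "admit": [2.0, 14.5]},
    "p2": {"discharge_er": [3.0], "admit": [20.0]},
}
matches = [pid for pid, events in patients.items() if readmitted_within(events)]
print(matches)  # ['p1']
```

Patient p1 is readmitted 4.5 days after discharge; p2's only admission after discharge comes 17 days later, so it does not match.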
Specifying temporal queries in SQL is difficult even for computer professionals who specialize in such queries. Researchers have made progress in representing temporal abstractions and executing complex temporal queries [1, 2, 3, 4], but there is very little research that focuses on making it easy for clinicians and medical researchers both to specify queries and to examine the results visually.
Temporal searches are used in many situations, such as clinical trial recruitment and clinical research, as well as patient care and alarm specification. For example, setting an alarm for patients on heparin who show a precipitous drop in platelet counts (heparin-induced thrombocytopenia) requires a specific definition of "precipitous". An interface for querying existing EHR databases would let the physicians who design the alarm iteratively test its logic and validate it against a large amount of data. Clinicians are always concerned about changes from some baseline state.
A blood pressure of 90/60 may be normal for a 25-year-old woman but may represent severe hypotension in a 65-year-old hypertensive man whose blood pressure during previous visits was 160/100. In such scenarios, changes from the baseline determine whether an intervention should be taken. We believe that interactive query interfaces that allow researchers and clinicians to explore data with specific temporal patterns, in both numerical and categorical data, will dramatically increase the benefits of EHR databases. A detailed presentation of the results can then help users see patterns and exceptions in the data they retrieved and correct their query accordingly.
ISSN: 2231-5381
Much of the seminal work in computer science relating to time [9, 10, 11] stems from temporal reasoning in artificial intelligence and from early natural language processing; this body of work is referred to as time theory.
Databases: Because structured query language (SQL) queries are complex, several approaches have made database querying more accessible to a broader spectrum of users. These include Query By Example (QBE), the visual query mechanism used in Microsoft Access; TSQL2 [8]; a hybrid between QBE and Extended Entity-Relationship diagrams [12, 13]; and MQuery [14], which targets various types of streaming data.
The field of information visualization has emerged from research in human-computer interaction, computer science, graphics, visual design, psychology, and business methods. It is increasingly applied as a critical component in scientific research, digital libraries, data mining, financial data analysis, market studies, manufacturing production control, and drug discovery. Information visualization presumes that visual representations and interaction techniques take advantage of the human eye's broad bandwidth pathway into the mind, allowing users to see and understand large amounts of information; the field focuses on creating approaches for conveying abstract information in intuitive ways.
Harada et al. developed a query language and an algorithm to search for patterns in multiple personal histories. Their implementation assumes a grouping over one column of the data (e.g., customer ID) and an ordering by a second column (e.g., time stamp), and performs pattern search over this structure; they do not use an NFA approach. Their algorithm resembles building a topological graph. To limit cost complexity, their language allows the specification of only limited negation, and this limitation means that their algorithm never has to backtrack.
http://www.ijettjournal.org
II. RELATED WORK
Temporal data includes information about a temporal event, which describes an observation or set of observations of a particular object or group of objects through time. An event therefore includes information about the observation itself (when or where it took place and what activity was observed) as well as information that identifies the object. Tracking Analyst organizes this information into simple and complex temporal events. A simple temporal event contains all the necessary information in one message or record, referred to as the temporal observation component. A complex temporal event includes a second component, referred to as the temporal object component. For fixed-time data stored on disk, these components appear as files or tables.
Using the track identifier field
Whether you are working with simple or complex temporal events, you need to become familiar with the track identifier field, or ID field. The ID field contains an identifier for the objects being observed through time. This value is used to connect different observations of the same object for display and analysis. For example, you may be tracking several trucks, each with a unique ID value, along their routes throughout the day. Using the ID field, you can connect each truck's activities, much like connecting the dots. Social Security numbers can serve the same purpose: the field does not need to be called ID, but it is important to make sure it contains the appropriate identifying information. The line connecting the dots, which you can apply on the Symbology tab, is called a track. Tracks can be applied to simple or complex temporal events when an ID field is set.
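Building a track amounts to grouping observations by the ID field and ordering each group by time. The sketch below illustrates that idea only; the `(id, time, x, y)` tuple layout and the `build_tracks` name are hypothetical and are not how Tracking Analyst stores its data.

```python
from collections import defaultdict


def build_tracks(observations):
    """Group hypothetical (id, time, x, y) observations into per-ID tracks ordered by time."""
    tracks = defaultdict(list)
    for obj_id, time, x, y in observations:
        tracks[obj_id].append((time, x, y))
    # Sort each track chronologically so consecutive points can be joined by a line.
    return {obj_id: sorted(points) for obj_id, points in tracks.items()}


obs = [
    ("truck-2", 9.0, 5.0, 1.0),
    ("truck-1", 8.5, 0.0, 0.0),
    ("truck-1", 8.0, 1.0, 2.0),
    ("truck-2", 8.0, 4.0, 0.0),
]
tracks = build_tracks(obs)
print(tracks["truck-1"])  # [(8.0, 1.0, 2.0), (8.5, 0.0, 0.0)]
```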
A) Simple events
The temporal observation component is part of the data. It consists of at least the date and time. If all the data is organized in one table that includes the date and other attributes, the record (in fixed-time data) or message (in real-time data) is considered a simple event. A simple event contains, in one component, all the elements necessary for Tracking Analyst to process and display it.
B) Complex events
Complex events include two components: an observation component and an object component. If the temporal observation component does not include all the needed information about the object, the additional information may be stored in a second component, referred to as the temporal object component. The contents of this component depend on whether the observed object is moving, static, or a discrete event; at a minimum it includes certain static attributes and the ID field. Merging the temporal observation with the temporal object creates a complex event record or message. The merge uses a field common to both tables, typically the ID field, to combine the two and yield a full picture of each object's information. For real-time data, this merge occurs automatically, so the data message stream arrives with all its necessary components already combined; for more information, see the documentation on real-time data structures. A complex event may further be described as either stationary or dynamic.
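The merge described above is, in effect, a join on the shared ID field. The following minimal sketch assumes both components are held in memory as lists of Python dictionaries with a hypothetical `id` key; it is an illustration of the idea, not Tracking Analyst's own code. Observations with no matching object record are dropped, an inner-join choice made only for this example.

```python
def merge_complex_events(observations, objects, key="id"):
    """Join each observation with its object's static attributes via a shared key field."""
    by_id = {obj[key]: obj for obj in objects}
    merged = []
    for obs in observations:
        obj = by_id.get(obs[key])
        if obj is None:
            continue  # no matching object record: skip the observation (inner join)
        # Object attributes first, then observation fields (which win on name clashes).
        merged.append({**obj, **obs})
    return merged


# Stationary example: the sensor's location lives in the object component.
sensors = [{"id": "S1", "x": 10.0, "y": 20.0, "lanes": 3}]
readings = [
    {"id": "S1", "time": "08:00", "count": 42},
    {"id": "S9", "time": "08:00", "count": 7},
]
events = merge_complex_events(readings, sensors)
print(events)  # [{'id': 'S1', 'x': 10.0, 'y': 20.0, 'lanes': 3, 'time': '08:00', 'count': 42}]
```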
C) Complex stationary events
An example of a complex stationary event is input from a traffic sensor. The sensor's geographic location does not change, so its static coordinates or other location information are stored in the temporal object table, together with the sensor's ID and possibly other attributes. Because this information is stored in the object component, the temporal observation includes the ID, the date and time of the observation, and possibly other attributes, but not the location information.
D) Complex dynamic events
An example of a complex dynamic event is information from an airplane. Its geographic location changes constantly, so its location information, as well as its ID and the date and time of its observations, are stored in the observation component. The temporal object table may include information such as the make and model of the aircraft, its pilot and crew, and the age and capacity of the fuselage.
E) Adding complex events from fixed-time data
The following procedures include steps for adding fixed-time simple and complex temporal events as new layers in ArcMap. When you add complex events from fixed-time data, the Add Temporal Data wizard asks you for the two components described above. The wizard, however, uses the terms input feature class and input table to define how and where the data is stored, and the feature class and the table must reside in the same geodatabase. The input feature class always contains at least the geographic features and the ID for the data you are adding; its other contents depend on whether you are adding a dynamic or a stationary event. For a dynamic event, the input feature class contains the dates and times of observations but not static attributes; for a stationary event, it contains the object's static attributes but not the dates and times of observations. Similarly, the input table always contains at least the ID and attribute information: for a dynamic complex event it contains static object information, and for a stationary event it contains the dates and times of observations.
1. Data constraint. It seems most straightforward to store all events in a single sorted array regardless of type. However, for events that share the same time stamp, this scheme can create conflicts that mislead analysts and produce wrong results. Using one array for each event type allows us to circumvent this problem. How often an event shares its time stamp with another depends on the data set: in the sample clinical data we have been supplied, the proportion of events that share a time stamp ranges from less than 0.5 percent to almost 50 percent. We assume that events with the same type and the same time stamp are the same event, so that events that are in fact the same but come from two data sources are merged. This assumption is practical and reasonable for personal records but may not apply to all temporal event data.
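The storage rule above can be sketched in a few lines: one time-sorted array per event type, with events of the same type and time stamp collapsed into one. The `(type, time)` tuple input and the `index_by_type` name are hypothetical; this is not Lifelines2's actual data structure.

```python
from collections import defaultdict


def index_by_type(events):
    """Build one sorted, de-duplicated time-stamp array per event type.

    Events with the same type and the same time stamp are treated as one event,
    following the assumption stated in the text.
    """
    times = defaultdict(set)
    for event_type, time in events:
        times[event_type].add(time)  # set membership merges same-type duplicates
    return {event_type: sorted(ts) for event_type, ts in times.items()}


record = [("lab", 5), ("xray", 5), ("lab", 2), ("lab", 5), ("xray", 1)]
arrays = index_by_type(record)
print(arrays)  # {'lab': [2, 5], 'xray': [1, 5]}
```

The `lab` and `xray` events at time 5 remain separate because their types differ, which is exactly the case a single mixed array would make ambiguous.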
2. Drawing constraint. Lifelines2 maintains a drawing order of events by event type, and events of the same type are drawn together. Lifelines2 maintains this z-order by event type to avoid visual inconsistencies that could disrupt analytical tasks. Separate arrays give the drawing algorithm an efficient way to access events of the same type.
3. Interface constraint. While searching for temporal patterns is very important, it is not all that Lifelines2 does. Other operators designed for exploratory analysis, such as letting analysts hide event types, also benefit from the separate-arrays approach. These interface features involve finding event data of a specific type, and classifying events into separate arrays by type allows Lifelines2 to support them most efficiently.
Regular search algorithm
Previous search algorithms involve backtracking when a partially successful search path fails. This requires a lot of storage and bookkeeping, and executes slowly. In the regular expression recognition technique described here, each character in the text to be searched is examined in sequence against a list of all possible current characters. During this examination, a new list of all possible next characters is built. When the end of the current list is reached, the new list becomes the current list, the next character is obtained, and the process continues. In the terms of Brzozowski [1], this algorithm continually takes the left derivative of the given regular expression with respect to the text to be searched. The parallel nature of the algorithm makes it extremely fast.
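The list-based technique can be sketched by simulating a small, hand-built NFA: at each text character the current list of states is advanced into a new list, and nothing is ever backtracked. The NFA below encodes the hypothetical pattern `a(b|c)*d`; the compile step that would translate an arbitrary regular expression into such an automaton is omitted, and this Python simulation stands in for, rather than reproduces, the IBM 7094 code described in the text. Each list entry also carries the position where its match began, so that matching substrings can be reported.

```python
# Hand-built NFA for the hypothetical pattern a(b|c)*d:
# TRANSITIONS[state][char] -> set of next states; state 2 accepts.
TRANSITIONS = {
    0: {"a": {1}},
    1: {"b": {1}, "c": {1}, "d": {2}},
    2: {},
}
START, ACCEPT = 0, 2


def find_matches(text):
    """Return (start, end) spans of substrings matching the NFA, scanning the text once."""
    matches = []
    current = set()  # current list: (state, start position) pairs still alive
    for i, ch in enumerate(text):
        current.add((START, i))  # a match may begin at every position
        nxt = set()  # next list, built while examining the current character
        for state, start in current:
            for target in TRANSITIONS[state].get(ch, ()):
                nxt.add((target, start))
        for state, start in nxt:
            if state == ACCEPT:
                matches.append((start, i + 1))
        current = nxt  # the new list becomes the current list
    return sorted(matches)


print(find_matches("xabcbd ad"))  # [(1, 6), (7, 9)]
```

The second match, `ad`, is valid because `(b|c)*` may match the empty string.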
The implementation
The specific implementation of this algorithm is a compiler that translates a regular expression into IBM 7094 code. The compiled code, together with certain runtime routines, accepts the text to be searched as input and finds all substrings in the text that match the regular expression. The compiling phase does not detract from the overall speed, since any search routine must translate the input regular expression into some sort of machine-accessible form.
In the compiled code, the lists mentioned in the algorithm are not characters but transfer instructions into the compiled code. Execution is fast because a transfer to the top of the current list automatically searches for all possible sequel characters in the regular expression.
This compile-search algorithm is incorporated as the context search in a time-sharing text editor. This is by no means the only use of such a search routine; for example, a variant of this algorithm is used as the symbol table search in an assembler.
III. PROPOSED APPROACH
Things get interesting when we need to record the history of changes. Not only do we want to know the current state of the world; we also want to know the state of the world six months ago. Worse, we may want to know what, two months ago, we thought the state of the world six months ago was. These queries lead us into the fascinating ground of temporal patterns, which are all about organizing objects so that we can answer such questions easily without completely tangling up our domain model. Handling time is one of the most common and most complicated challenges of object modeling.
The simplest way to solve this problem is to use an Audit Log. Here we are concerned with keeping a record of changes but do not expect to go back and use it very often, so we want it to be easy to create and minimally intrusive on our work. When someone does need to look at it, they can expect to do a lot of work to dig out the information; as long as they do not need the resulting information quickly, this is fine. Indeed, if you are using a database, an audit log may come essentially for free.
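As a minimal sketch, an Audit Log can be an append-only list of time-stamped changes; answering "what was the state at time t?" then means replaying the log, which is the extra work the text refers to. The `AuditLog` class, its methods, and the sample values are hypothetical.

```python
class AuditLog:
    """Hypothetical append-only log of (time, key, value) changes: cheap to write, costly to query."""

    def __init__(self):
        self.entries = []

    def record(self, time, key, value):
        # Writing is a single append, so logging stays minimally intrusive.
        self.entries.append((time, key, value))

    def state_as_of(self, time):
        """Rebuild the state at `time` by replaying every change made up to that time."""
        state = {}
        for t, key, value in sorted(self.entries, key=lambda entry: entry[0]):
            if t > time:
                break
            state[key] = value
        return state


log = AuditLog()
log.record(1, "address", "12 Main St")
log.record(5, "address", "9 Hill Rd")
log.record(3, "phone", "555-0100")
print(log.state_as_of(4))  # {'address': '12 Main St', 'phone': '555-0100'}
```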
Our proposed algorithm is as follows.
First, the system takes a record and a temporal pattern and indexes the record: each array holds events of the same type, is accessed through the record, and stores each event's time stamp and type.
All the pattern items are stored in an array. Each item includes an event and a temporary storage slot that maintains the inverse of the item, that is, whether the item is negated. The algorithm searches for each pattern item and, wherever it matches, stores the location and the matched item. For an absence (negated) item, it finds the next absence event and checks whether that event occurs between the previous presence-item match and the next presence-item match. If it does, a constraint is violated and the algorithm backtracks. Backtracking means that Temporal Pattern Search looks for an alternative to one or more of its previously made matches, rolling its operations back to the previous search state. Backtracking therefore increases both the number of search steps and the processing time.
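Algorithm:
B: backtrack flag
MT[]: matched times
x: current index into the pattern
T: current time
D: time of the last positive (presence) match
while (x < pattern.length)
    if (pattern[x] is negative)
        check absence of record and pattern
    else
        check presence of record and pattern
    if B is set, move x back to the previous presence item; otherwise x = x + 1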
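The description and pseudocode above can be made concrete with a small backtracking search. This is a minimal sketch under stated assumptions, not the authors' TPS implementation: a record is a hypothetical dictionary from event type to a sorted list of time stamps, presence items must match at strictly increasing times, and a negated item forbids its event type strictly between the neighboring presence matches (open-ended before the first and after the last). The names loosely mirror the pseudocode: the returned list plays the role of MT[], `x` is the current index, `t` the current time T, and `last_time` the time D; instead of an explicit flag B, a recursive call returning `None` signals that the search must backtrack.

```python
from bisect import bisect_right

INF = float("inf")


def occurs_between(times, lo, hi):
    """True if some time stamp t in the sorted list satisfies lo < t < hi."""
    i = bisect_right(times, lo)  # first index with times[i] > lo
    return i < len(times) and times[i] < hi


def tps_match(record, pattern):
    """Return the matched times of the presence items, or None if there is no match.

    record:  dict mapping event type -> sorted list of time stamps (hypothetical layout).
    pattern: list of (event_type, negated) pairs.
    """
    # Attach each run of negated items to the presence item that follows it.
    steps, pending = [], []
    for event_type, negated in pattern:
        if negated:
            pending.append(event_type)
        else:
            steps.append((event_type, pending))
            pending = []
    trailing = pending  # negated items after the last presence item

    def search(x, last_time):
        if x == len(steps):
            if any(occurs_between(record.get(a, []), last_time, INF) for a in trailing):
                return None
            return []
        event_type, absents = steps[x]
        times = record.get(event_type, [])
        # Binary search skips every candidate at or before the previous match.
        for i in range(bisect_right(times, last_time), len(times)):
            t = times[i]
            if any(occurs_between(record.get(a, []), last_time, t) for a in absents):
                # The forbidden event also precedes every later candidate, so stop here.
                break
            rest = search(x + 1, t)
            if rest is not None:
                return [t] + rest
        return None  # no candidate works: backtrack to item x - 1

    return search(0, -INF)


record = {"A": [1, 5], "C": [2], "B": [3, 7]}
pattern = [("A", False), ("C", True), ("B", False)]  # A, then B, with no C in between
print(tps_match(record, pattern))  # [5, 7]
```

Matching A at time 1 fails because C occurs at time 2 before the first B, so the search backtracks and retries A at time 5.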
Checking the presence of record and pattern:
For matching, we use a new method, described below.
This paper gives a new pattern matching algorithm based on a fixed window, where n is the length of the text and m the length of the pattern. The window size is fixed at 2m − 1, and each match starts from the search starting position of a window, which gives the algorithm a new structure. After the search starting position has been matched, the algorithm scans the prefix of the pattern from its beginning and, if that part matches fully, scans the suffix of the pattern from its end. This makes full use of the nature of the pattern, so that the text can be partitioned simply and no match is missed. Analysis shows that the worst-case and best-case time complexities of the algorithm are O(n) and O(n/m), respectively. When the pattern is long, the algorithm performs better than current algorithms; as the alphabet grows, its performance is similar to theirs.
1) The seat shifted table
The seat shifted table is built as a chain, using parallel techniques. Its construction rules are as follows:
a) Handling the alphabet
The size of the first level of the seat shifted table is determined by the alphabet size: if the alphabet has SIZE characters, the first level has SIZE entries. Each character is placed at the first-level position given by the decimal value of its ASCII code. For example, in Figure 1 the first level lies between the red lines; the character 'A' has ASCII code 65, so it occupies position 65 of the first level.
[Figure 1: Seat shifted table]
b) Handling the pattern
Mark the positions of the characters in the pattern from left to right. Then, for each character that appears in the pattern, enter its positions in decreasing order at the first-level slot indicated by its ASCII code; these entries form a chain of further levels. For example, level 2 lies between the green lines and level 3 between the yellow lines.
c) Checking whether each character occurs in the pattern, and marking it
If a character occurs in the pattern, its corresponding position is marked 1; otherwise it is marked 0. In Figure 1, for example, 'A' occurs in the pattern, so position 65 (the ASCII code of 'A') is marked 1. The character '@' (ASCII 64, hexadecimal 40) does not appear in the pattern, so its position is marked 0. With this marking, to find out whether a character occurs in the pattern, we only need to check whether its mark in the table is 1.
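Construction rules a) to c) can be sketched as follows. This minimal illustration assumes an 8-bit alphabet (SIZE = 256) and 1-based pattern positions as in the text; the `build_seat_table` name and the (mark, chains) return layout are hypothetical, and the multi-level chain is represented simply as a Python list of positions per character code.

```python
SIZE = 256  # assumed alphabet size: one first-level slot per 8-bit character code


def build_seat_table(pattern):
    """Return (mark, chains) for the pattern.

    mark[c] is 1 if the character with code c occurs in the pattern, else 0.
    chains[c] lists that character's 1-based pattern positions in decreasing order.
    """
    mark = [0] * SIZE
    chains = [[] for _ in range(SIZE)]
    for pos, ch in enumerate(pattern, start=1):  # scan the pattern left to right
        code = ord(ch)  # first-level slot = decimal ASCII code
        mark[code] = 1
        chains[code].append(pos)
    for chain in chains:
        chain.reverse()  # store positions in decreasing order
    return mark, chains


mark, chains = build_seat_table("ABCA")
print(mark[65], mark[64], chains[65])  # 1 0 [4, 1]
```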
2) Search starting position
a) The search starts from starting positions defined at special positions of the text: {km | 0 < k ≤ ⌊n/m⌋}.
b) Matching window definition: taking a search starting position as the center, the m − 1 characters before it and the m − 1 characters after it are included, so each window has size 2m − 1. The last m − 1 characters of one window are thus the first m − 1 characters of the next window. This guarantees that, after partitioning, every occurrence of the pattern falls within some window, so no data is omitted and the algorithm remains accurate. The search then moves on to the next window.
c) The ⌊n/m⌋-th window may contain only n − m⌊n/m⌋ + m characters. If this is fewer than 2m − 1, the window can be padded to 2m − 1 symbols with a character that does not belong to the text string, such as "\n"; this does not affect the match.
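The rules a) to c) can be sketched as follows. Positions are 1-based as in the text; the `search_windows` name is hypothetical, and `"\n"` is used as the filler on the assumption that it never occurs in the text.

```python
def search_windows(text, m, filler="\n"):
    """Return (search starting position, window) pairs; each window has 2m - 1 characters.

    Assumes `filler` never occurs in `text`, so padding cannot create a false match.
    """
    n = len(text)
    windows = []
    for k in range(1, n // m + 1):
        center = k * m  # search starting position km (1-based)
        left = center - (m - 1)  # 1-based index of the window's first character
        chunk = text[left - 1:center + m - 1]  # m - 1 characters on each side of the center
        chunk += filler * (2 * m - 1 - len(chunk))  # only the last window can be short
        windows.append((center, chunk))
    return windows


print(search_windows("abcdefghij", 4))
# [(4, 'abcdefg'), (8, 'efghij\n')]
```

Here n = 10 and m = 4, so the last window holds n − m⌊n/m⌋ + m = 6 text characters plus one filler, and consecutive windows share m − 1 = 3 characters (`efg`).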
3) Next array
The Next array avoids moving back in the text when a mismatch occurs. Its values depend only on the pattern itself and have nothing to do with the text string. The construction rules are as follows:
We preprocess the pattern P = p1 p2 … pm in advance and generate a function Next[i] (0 < i < m + 1). When a mismatch occurs at the ith character, we look for the largest G such that the first G − 1 characters of the pattern equal the G − 1 characters immediately before position i:
$$\mathrm{Next}[i] = \max\{\, G \mid 1 < G < i,\ p_1 p_2 \cdots p_{G-1} = p_{i-G+1} \cdots p_{i-1} \,\}.$$
If such a G exists, Next[i] = G: at the next comparison the pattern can be shifted right directly by i − Next[i] positions, and comparison resumes from the Gth character of the pattern. If no such G exists, Next[i] = 1.
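The Next function can be computed directly from its definition. The sketch below is a deliberately naive transcription with 1-based indices as in the text (the `build_next` name is hypothetical); it favors clarity over speed, and the classic Knuth-Morris-Pratt preprocessing [12] builds a table of the same kind in O(m) time.

```python
def build_next(pattern):
    """Return Next[1..m] (index 0 unused) as defined in the text.

    Next[i] is the largest G with 1 < G < i such that p_1..p_{G-1} equals
    p_{i-G+1}..p_{i-1}; if no such G exists, Next[i] = 1.
    """
    m = len(pattern)
    nxt = [0] * (m + 1)
    for i in range(1, m + 1):
        nxt[i] = 1
        for g in range(i - 1, 1, -1):  # try the largest G first
            # 1-based p_1..p_{G-1} is pattern[0:g-1]; p_{i-G+1}..p_{i-1} is pattern[i-g:i-1]
            if pattern[:g - 1] == pattern[i - g:i - 1]:
                nxt[i] = g
                break
    return nxt


print(build_next("abaabc")[1:])  # [1, 1, 1, 2, 2, 3]
```

For example, a mismatch at i = 6 gives Next[6] = 3: the pattern shifts right by 6 − 3 = 3 positions and comparison resumes at its third character, because p4 p5 = p1 p2 = "ab".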
C. Matching
Each match starts from a search starting position and uses the seat shifted table and the Next array.
a) First, check whether the mark in the seat shifted table for the character at the kth (0 < k ≤ ⌊n/m⌋) search starting position is 1. If it is 0, go to the (k+1)th search starting position. If it is 1, the character occurs in the pattern, so find the first position of that character in the pattern from the second level of the seat shifted table, and align that position of the pattern with the kth search starting position in the text.
b) Match from the leftmost character of the pattern up to the search starting position. If that part matches completely, match from the rightmost character of the pattern back to the search starting position. If that part also matches completely, a match has been found; jump to the next search starting position and continue.
c) If the match fails at some position i (0 ≤ i ≤ m), look up Next[i]. Then check the seat shifted table, find the next position in the pattern of the character at the kth search starting position, and compute the distance between the two positions; call this value Distance. Compare Distance with Next[i] and take the larger as the jump distance of the pattern. If Next[i] is larger, first match the character at the search starting position; if it matches, continue matching as described above, otherwise go to c). If the character has no further position in the seat shifted table, go to the next search starting position and return to a).
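The matching procedure can be sketched in simplified form. Any occurrence of a length-m pattern covers m consecutive text positions and therefore contains exactly one multiple of m, so trying every alignment that places the character found at position km under its occurrences in the pattern finds each occurrence exactly once. The sketch below keeps the search starting positions, the per-character position chains (a dictionary stands in for the SIZE-slot table, and a missing entry plays the role of mark 0), and the left-then-right verification order; for brevity it omits the Next-array jumps of step c), so it illustrates the windowing idea rather than the authors' complete algorithm. The `fixed_window_search` name is hypothetical, and indices are 0-based.

```python
def fixed_window_search(text, pattern):
    """Return 0-based start indices of all occurrences of `pattern` in `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Character -> its pattern positions in decreasing order (the table's chains).
    chains = {}
    for pos, ch in enumerate(pattern):
        chains.setdefault(ch, []).insert(0, pos)
    found = []
    for k in range(1, n // m + 1):
        s = k * m - 1  # 0-based index of search starting position km
        for j in chains.get(text[s], []):  # no chain means mark == 0: skip this position
            start = s - j  # align pattern position j with text position s
            if start < 0 or start + m > n:
                continue
            # Match the part left of s, starting from the leftmost pattern character...
            left_ok = all(text[start + q] == pattern[q] for q in range(j))
            # ...then the part right of s, starting from the rightmost pattern character.
            right_ok = left_ok and all(
                text[start + q] == pattern[q] for q in range(m - 1, j, -1)
            )
            if right_ok:
                found.append(start)
    return sorted(found)


print(fixed_window_search("abracadabra", "abra"))  # [0, 7]
```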
Value range constraints, on the other hand, specify that matching events must have values within a certain range in order to be considered a match. For example, physicians may look for patients who had a heart attack, followed by heart surgery, followed by a systolic blood pressure reading greater than 140. More complex value range constraints can involve higher-dimensional data and values relative to previously matched items.
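A value range constraint adds a predicate on each matched event's value. The following minimal sketch, in which events are hypothetical `(time, type, value)` tuples and `match_with_ranges` is a hypothetical helper, finds the first heart attack, heart surgery, systolic blood pressure above 140 sequence in one patient's time-ordered record with a single greedy scan. A greedy earliest match is enough here because each predicate looks only at its own event; constraints relative to previously matched items would need the kind of backtracking described earlier.

```python
def match_with_ranges(events, pattern):
    """Return the events matching `pattern` in time order, or None.

    events:  list of (time, event_type, value) tuples sorted by time.
    pattern: list of (event_type, predicate) pairs; predicate(value) -> bool.
    """
    matched, k = [], 0
    for event in events:
        if k == len(pattern):
            break
        event_type, predicate = pattern[k]
        if event[1] == event_type and predicate(event[2]):
            matched.append(event)
            k += 1
    return matched if k == len(pattern) else None


def any_value(value):
    return True


query = [
    ("heart attack", any_value),
    ("heart surgery", any_value),
    ("systolic bp", lambda v: v > 140),
]
record = [
    (1, "heart attack", None),
    (4, "systolic bp", 150),  # too early: precedes the surgery
    (6, "heart surgery", None),
    (9, "systolic bp", 130),  # out of range
    (12, "systolic bp", 145),
]
print(match_with_ranges(record, query))
# [(1, 'heart attack', None), (6, 'heart surgery', None), (12, 'systolic bp', 145)]
```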
IV. CONCLUSION
In the proposed system, the temporal pattern search algorithm (TPS) uses binary searches over a set of time-sorted event arrays, which allows it to skip many irrelevant events. We show that TPS saves a significant amount of time compared with an NFA-based approach when there are many event types, and that TPS is more easily extensible than bit-parallel algorithms such as Shift-And. Finally, we argue that using TPS in our application is a design success and that other similar applications may benefit from TPS.
REFERENCES
[1] J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman,
“Efficient Pattern Matching over Event Streams,” Proc.
ACM SIGMOD Int’l Conf. Management of Data, pp. 147-160, 2008.
[2] R.S. Boyer and J.S. Moore, “A Fast String-Searching
Algorithm,” Comm. ACM, vol. 20, no. 10, pp. 762-772,
1977.
[3] R. Cox, “Regular Expression Matching Can Be Simple
and Fast,” http://swtch.com/~rsc/regexp/regexp1.html, 2007.
[4] DataMontage, http://www.stottlerhenke.com/datamontage/, 2011.
[5] A. Demers, J. Gehrke, M. Hong, M. Riedewald, and W.
White, “Towards Expressive Publish/Subscribe Systems,”
Proc. 10th Int’l Conf. Extending Database Technology
(EDBT), pp. 627-644, 2006.
[6] J. Fails, A. Karlson, L. Shahamat, and B. Shneiderman,
“A Visual Interface for Multivariate Temporal Data:
Finding Patterns of Events across Multiple Histories,”
Proc. IEEE Symp. Visual Analytics Science and
Technology (VAST ’06), pp. 167-174, 2006.
[7] D. Ficara, S. Giordano, G. Procissi, F. Vitucci, G.
Antichi, and A.D. Pietro, “An Improved DFA for Fast
Regular Expression Matching,” ACM SIGCOMM
Computer Comm. Rev., vol. 38, no. 5, pp. 29-40, 2008.
[8] L. Harada and Y. Hotta, “Order Checking in a CPOE
Using Event Analyzer,” Proc. ACM Int’l Conf.
Information and Knowledge Management (CIKM), pp.
549-555, 2005.
[9] L. Harada, Y. Hotta, and T. Ohmori, “Detection of Sequential
Patterns of Events for Supporting Business Intelligence
Solutions,” Proc. Int’l Database Eng. and Applications
Symp. (IDEAS ’04), pp. 475-479, 2004.
[10] J.E. Hopcroft, R. Motwani, and J.D. Ullman, Introduction to
Automata Theory, Languages, and Computation. Addison-Wesley, 2000.
[11] R.M. Karp and M.O. Rabin, “Efficient Randomized
Pattern Matching Algorithms,” Technical Report TR-31-81,
Aiken Computation Laboratory, Harvard Univ., 1981.
[12] D.E. Knuth, J.H. Morris, and V.R. Pratt, “Fast Pattern
Matching in Strings,” SIAM J. Computing, vol. 6, no. 2,
pp. 323-350, 1977.
[13] S. Kumar, B. Chandrasekaran, J. Turner, and G.
Varghese, “Curing Regular Expressions Matching
Algorithms from Insomnia, Amnesia, and Acalculia,” Proc.
Third ACM/IEEE Symp. Architecture for Networking and
Comm. Systems (ANCS), pp. 155-164, 2007.
[14] H. Lam, D. Russell, D. Tang, and T. Munzner,
“Session Viewer: Visual Exploratory Analysis of Web
Session Logs,” Proc. IEEE Symp. Visual Analytics Science
and Technology (VAST ’07), pp. 147-154, 2007.
[15] S. Lam, “PatternFinder in Microsoft Amalga:
Temporal Query Formulation and Result Visualization in
Action,” http://www.cs.umd.edu/hcil/patternFinderInAmalga/PatternFinderSHonorsPaper.pdf, 2011.
[16] Microsoft Amalga, http://www.microsoft.com/amalga/, 2009.
[17] A. Møller, “Regexp Library for Java,”
http://www.brics.dk/automaton/, 2001.
[18] S. Murphy, M. Mendis, K. Hackett, R. Kuttan, W.
Pan, L. Phillips, V. Gainer, D. Berkowicz, J. Glaser, I.
Kohane, and H. Chueh, “Architecture of the Open-Source
Clinical Research Chart from Informatics for Integrating
Biology and the Bedside,” Proc. Am. Medical Informatics
Assoc. Ann. Symp. (AMIA ’07), pp. 548-552, 2007.
[19] G. Navarro, “Pattern Matching,” J. Applied Statistics,
vol. 31, no. 8, pp. 925-949, 2004.
[20] G. Navarro and M. Raffinot, “Fast and Flexible String
Matching by Combining Bit-Parallelism and Suffix
Automata,” ACM J. Experimental Algorithmics, vol. 5,
article 4, Dec. 2000, http://doi.acm.org/10.1145/351827.384246.
BIOGRAPHIES:
Mr. Siva Sankar Grandhi completed his B.Tech (CSE) at Sri Sarathi Institute of Engg. & Technology, Nuzvid, JNTUK, in 2010 and is currently pursuing an M.Tech (Software Engineering) at Avanthi's St.Theressa Institute of Engineering and Technology, Garividi, Vizianagaram, JNT University, Kakinada. His research interests include Data Mining and Software Engineering.
Mr. Srinivasu Varma Penumatsa is currently working as an Associate Professor in the CSE Department, Avanthi's St.Theressa Institute of Engineering & Technology, Garividi, with 4 years of experience. He completed his M.Tech (Computer Science and Engineering) at Acharya Nagarjuna University in 2009. His research areas include Data Mining and Network Security.
Mr. Chinna babu Galinki, a well-known and excellent teacher, received his M.Tech (CSE) from Andhra University and is working as Associate Professor and HOD, Department of Computer Science Engineering, Avanthi's St.Theressa Institute of Engineering and Technology. He has 4 years of teaching experience and has a couple of publications in national and international conferences and journals. His area of interest includes Data Warehouse and Data Mining, Embedded Systems and other advances in computer applications.