Project proposal.

Asaf Meisels
Winter ’09 – Spring ’09
SENIOR PROJECT PROPOSAL
Problem
Databases are a vital way of storing large amounts of information and allowing us to view
this information in any way we specify. A probabilistic database is a special type of database
that provides operations needed for probabilistic data that regular databases lack.
Probabilistic databases have been an active research topic in computer science lately, but
there is still research to be done. Several implementations exist, each with its own
advantages and disadvantages.
A Semi-structured Probabilistic Database allows data to be stored in the database without
having to follow a definite structure. The database is made up of Semi-structured
Probabilistic Objects (SPOs). Each SPO contains four parts: context, variables, the
probability table, and conditionals. Beyond those four parts, different SPOs may contain
completely different fields.
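As a minimal sketch of the four-part structure described above, an SPO could be modeled in Python as shown here. The class name, the field types, and every XML tag and attribute name are assumptions made for illustration; they are not taken from any existing implementation.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class SPO:
    """Hypothetical in-memory form of a Semi-structured Probabilistic Object."""
    context: dict      # free-form descriptive fields; may differ per SPO
    variables: list    # names of the random variables in the table
    table: dict        # maps a tuple of variable values to a probability
    conditionals: dict = field(default_factory=dict)  # variable -> given value


def spo_to_xml(spo: SPO) -> str:
    """Serialize an SPO to XML. Tag and attribute names are illustrative only."""
    root = ET.Element("spo")
    ctx = ET.SubElement(root, "context")
    for name, value in spo.context.items():
        ET.SubElement(ctx, "field", name=name).text = str(value)
    vars_el = ET.SubElement(root, "variables")
    for name in spo.variables:
        ET.SubElement(vars_el, "variable", name=name)
    table_el = ET.SubElement(root, "table")
    for values, prob in spo.table.items():
        row = ET.SubElement(table_el, "row", p=repr(prob))
        for name, value in zip(spo.variables, values):
            ET.SubElement(row, "value", variable=name).text = str(value)
    cond_el = ET.SubElement(root, "conditionals")
    for name, value in spo.conditionals.items():
        ET.SubElement(cond_el, "given", variable=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up example data, used only to exercise the sketch.
example = SPO(
    context={"source": "survey", "year": 2009},
    variables=["weather"],
    table={("sunny",): 0.7, ("rainy",): 0.3},
    conditionals={"season": "summer"},
)
xml_text = spo_to_xml(example)
```

The context is kept as a free-form dictionary because it is the part where two SPOs are most likely to carry different fields.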
Evan Rosson’s XML implementation is claimed to use memory more efficiently and to execute
some operations faster than the existing system design. To validate these claims, a Test
Suite needs to be created that measures whether the new implementation is genuinely more
efficient.
Solution
A series of tests will need to be run on the database under both implementations, each
filled with the same “semi-random” data, and the results compared in order to determine
whether the new implementation is more efficient than the existing system design. The
information stored in the database for the test runs has to be generated randomly so that the
results represent more than just a specific case. However, the data will have to be somewhat
structured so that we can test the database implementation under certain conditions.
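One way to read “semi-random” is sketched below in Python: the structure (which variables exist and which values they can take) is supplied by the user, while the probabilities are drawn at random from a seeded generator so that a run can be repeated. The function name `generate_table` and its parameters are hypothetical, and this is only one possible interpretation of the requirement.

```python
import random


def generate_table(domains, seed):
    """Return a random joint probability table over the given variable domains.

    domains: dict mapping a variable name to its list of possible values
             (the user-defined, structured part of the input).
    seed:    fixes the random part so a test run can be repeated exactly.
    """
    rng = random.Random(seed)
    # Enumerate every combination of values (the fixed structure).
    rows = [()]
    for values in domains.values():
        rows = [row + (v,) for row in rows for v in values]
    # Draw random positive weights and normalize them so they sum to 1.
    weights = [rng.random() + 1e-9 for _ in rows]
    total = sum(weights)
    return {row: w / total for row, w in zip(rows, weights)}


# Made-up domains, used only to exercise the sketch.
table = generate_table(
    {"weather": ["sunny", "rainy"], "traffic": ["light", "heavy"]},
    seed=42,
)
```

Seeding the generator means both implementations can be loaded with identical data, so any difference in the measurements comes from the implementations rather than from the inputs.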
Because there are infinitely many possibilities for the data stored in the database, the
Test Suite will need to be thorough. I will need to test each operation several times with
different test cases to ensure that the results are genuine. The test cases will be designed
as efficiently as possible, covering all cases without overlap.
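The idea of running each operation several times on both implementations could be sketched in Python as follows. The function names and the stand-in workloads are hypothetical; real tests would call the operations of the two database implementations, and memory use would need a separate measurement.

```python
import statistics
import time


def time_operation(operation, repeats=5):
    """Run a zero-argument callable several times; return the median seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(operations_a, operations_b, repeats=5):
    """Time same-named operations of two implementations side by side.

    Both arguments map an operation name to a zero-argument callable.
    Returns {name: (median_a, median_b)} for names present in both.
    """
    results = {}
    for name in operations_a.keys() & operations_b.keys():
        results[name] = (
            time_operation(operations_a[name], repeats),
            time_operation(operations_b[name], repeats),
        )
    return results


# Stand-in workloads; real tests would call the two database implementations.
existing = {"select": lambda: sum(range(10_000))}
xml_based = {"select": lambda: sum(range(1_000))}
report = compare(existing, xml_based)
```

The median is used instead of the mean so that a single slow run, for example one caused by other activity on the machine, does not distort the comparison.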
Schedule
This project will take me two quarters to complete. For the first half of Winter ’09, I will be
doing background research to fully understand the requirements I need to fulfill. The
remainder of the quarter will be spent designing the XML Data Generator and the Test Suite.
All of Spring ’09 will be spent implementing both: the Test Suite, so that it provides
valuable data for measuring performance, and the XML Data Generator, so that it creates data
based on user-defined input.
Meeting Minimum Criteria
Independence
I will do all of the work to create the XML Data Generator and the Test Suite on my own.
Background Research
This project requires me to study Semi-structured Probabilistic Databases and how to manage
sets of SPOs (Semi-structured Probabilistic Objects) in an XML Database. I will also need to
do some research on the topic of testing databases to ensure that my Test Suite will actually
provide useful data.
Creativity
My XML Data Generator will need to be creative so that the data it generates for testing is
“semi-random”. On the other hand, the Test Suite will need to be similar to other Test Suites
so that the data it outputs is valuable. Currently there is no way to tell how efficient the new
implementation is compared to the existing system design.