Senior Design Project Topics

Senior Design Project Topics
Advisor: Yana Kortsarts or/and
For students: please feel free to contact me if you would like to discuss the project details
or/and to obtain more information about each project.
Topic 1: Annotation of DNA sequences
Automatic annotation of large amounts of genomic DNA sequence clearly is and will
continue to be a formidable challenge. This problem will be addressed properly only by
developing very efficient computational tools for initial sequence annotation, treating the
annotations as hypotheses, and testing and verifying them in the laboratory. The "If you
sequence it, the community will annotate it" approach is unlikely to produce desired
results, and new paradigms and possibly new organizational models will need to be
implemented to present genomic sequence in its most useful form.
DNA annotation includes:
Predicted open reading frames (ORFs) and genes
Predicted genetic elements, e.g., tRNAs, rRNAs, snRNsa, LTRs, CENs, ORIs,
ARSs, miscellaneous RNAs
Similarities with other sequences in major public DNA databases
The goal of the project is to develop a DNA sequence viewer and annotation tool that
allows visualization of sequence features and the results of analyses within the context of
the sequence.
Topic 2: Advanced Topics in Cryptology: The Past and Future of
Knapsack Cryptosystem (see links to the papers and Power Point
presentation on
The Merkle-Hellman knapsack cryptosystem is a public-key cryptosystem that was first
published in 1978. It is commonly referred to as the knapsack cryptosystem. It is based
on the subset sum problem in combinatorics. The problem involves selecting a number of
objects with given weights from a large set such that the sum of the weights is equal to a
pre-specified weight. This is considered to be a difficult problem to solve in general, but
certain special cases of the problem are relatively easy to solve, which serve as the
"trapdoor" of the system. The-single iteration knapsack cryptosystem introduced in 1978
was broken by Shamir. Merkle then published the multiple-iteration knapsack problem
which was broken by Brickell. Merkle offered a $100 reward for anybody able to crack
the single iteration knapsack and a $1000 reward for anybody able to crack the multiple
iteration cipher from his own pocket. When they were cracked, he promptly paid up.
The Chor-Rivest knapsack cryptosystem was first published in 1984, followed by a
revised version in 1988. It is the only knapsack-like cryptosystem that does not use
modular multiplication. It was also the only knapsack-like cryptosystem that was secure
for any extended period of time. Unfortunately, Schnorr and Hörner developed an attack
on the Chor-Rivest cryptosystem using improved lattice reduction which reduced to
hours the amount of time needed to crack the cryptosystem for certain parameter values
(though not for those recommended by Chor and Rivest). They also showed how the
attack can be extended to attack Damgård's knapsack hash function.
The goal of the project is to study different Knapsack Cryptosystems and attacks on
different Knapsack Cryptosystems, to conduct a comparative analysis of different
Knapsack Cryptosystems and to investigate if there is a future for Knapsack-kind
Topic 3: Poker over the phone
How might one play a game of cards, such a poker, over the phone?
The difficulty, of course is in dealing the cads.
In 1979 Shamir, Rivest, and Adleman proposed the following simple strategy that using
encryption functions.
 Players, we will call them Alice and Bob, jointly select 52 distinct messages to
represent the cards and a large prime p.
 Each player secretly choose an integer that satisfies a specific conditions and they
are using these integers to encrypt the cards.
 Alice (the dealer) encrypts the cards, change their order and sends the resulting
list to Bob.
 Bob selects five of the cards and returns them to Alice, she decrypts these for her
hand. Bob selects five of the remaining cards for his own hand, encrypt them
with his encryption function and sends the result to Alice
 The rest of the game proceeds in this fashion.
Following this scheme, may attempts have been made to achieve protocols that would
allow people to play “Mental Poker”. There are a several protocols based on public-key
The goal of the project is to study different protocols for the “Mental Poker”, and
implement the game, first for the simple case for five or ten cards and then for the 52
cards, using at least one of the known protocols.
Topic 4: The Bin Packing Problem
One-dimensional bin packing is a classic problem with many practical applications
related to minimization of space or time. It is a hard problem for which many different
heuristic solutions have been proposed. The goal is to pack a collection of objects into the
minimum number of fixed-size "bins".
Consider the following example, we have 5 items: A, B, C, D, E and F. Each item has a
specific weight: A:12, B:8, C:5, D:6, E:6, F:3. We have to place 5 given item into the two
given boxes so that maximum weight of articles in each box is 20 pounds.
How might we go about solving this problem? Perhaps we might start by choosing the
heaviest article and put it into one of the boxes, and then following an algorithm
Repeat for all unpacked boxes:
1. choose heaviest remaining item
2. place it in the lightest box
If you will run this algorithm you will observe the following output for the example that
is given above:
as you can see, item F is left behind.
Here is the solution for the given example:
There are many variants of the bin packing problem, such as, 3D, 2D, linear, pack by
volume, pack by weight, minimize volume, maximize value, fixed shape objects, etc.
The goal of the project is to study the approximation solutions for the bin packing
problem, to implement a couple of the approximation algorithms for the bin packing
problem and perform an analysis of the algorithmic strategies and efficiencies of the
approximation solutions.
Topic 5: Local majorities, coalitions and monopolies in graphs.
Some fault-tolerance aspects of computer/processor networks as well as the propagation
of infections/diseases or the influence of strong coalitions in a political landscape can be
modeled as dynamic majority vote systems.
For example, we can consider a well known distributed coloring game played on a simple
connected graph: initially each vertex is colored color1 or color2; at each round, each
vertex simultaneously recolor itself by the color of the simple (strong) majority of its
neighbors. A set of vertices M is said to be a dynamo, if starting the game with only
vertices of M colored color1, the computation eventually reaches an all-color1
The importance of this game follows from the fact that it models the spread of faults in
point-to-point systems with majority-based voting; in particular, dynamos correspond to
the sets of initial failures which will lead the entire system to fail.
The goal of the project is to study the problem and initial assignments of colors which
lead the system to a monochromic configuration in a finite number of steps, to implement
the game under certain conditions.
Topic 6: Graph Coloring Problem
Have you ever observed the traffic light pattern at intersections and calculated the
number of different light patterns? Or have you ever wondered if the final examinations
at a college can be scheduled so that no student would have a time conflict? Both
problems (and more other problems) can be modeled by graphs and solved by a technique
called graph coloring.
A coloring of a simple graph is an assignment of colors to its vertices, so no two adjacent
vertices are assigned the same color. The chromatic number of a graph is a minimum
number of colors needed for a coloring graph.
The goal of the project is to study different problems that can be modeled by graph
coloring and implement an algorithm for the specific problem, such that exam scheduling
that reduces to an algorithm for graph coloring.
Topic 7: Attacks on the Data Encryption Standard (DES)
In 1973, The National Bureau of Standards (NBS), later to become the National Institute
of Standards and Technology (NIST), issued a public request seeking a cryptographic
algorithm to become a national standard. IBM submitted an algorithm and after some
modification, DES was released.
DES has been used extensively in electronic commerce, for example in the banking
industry. If two banks want to exchange data, they first use a public key method to
transmit a key for DES, then they use DES fro transmitting the data. DES uses key of
effective length 56 bits which creates a possible 2^56 distinct keys. Testing each of these
keys until you find the right one is called a brute force attack.
The goal of the project is to study and implement the different kinds of attack on DES,
such as brute force attacks and key collision attack.
Topic 8: Vertex Cover Problem
Consider the following problem: suppose we need to guard a museum. We could model
the museum as follows: vertices are rooms and an edge between two vertices means that
those two rooms are connected. Now, assuming that a guard standing in a room has full
view of the adjacent rooms, find where we should place guards so that all rooms are
guarded at the least cost (i.e. fewest number of guards)
This problem is known as Vertex Cover problem. A vertex cover in a graph is a subset of
the verticies of the graph, chosen with the property that one of the endpoints of each edge
is in the subset. In the graph at right, {1,3,5,6} is an example of a vertex cover.
The vertex cover problem is the optimization problem of finding a vertex cover of
minimum size in a graph. The problem is a decision problem, so we wonder if a vertex
cover of size k exists in the graph.
The goal of the project is to study the Vertex Cover problem along with applications and
approximation algorithms, implement a couple of the algorithms and perform a
comparison analysis of the algorithmic techniques and efficiencies.
See page 16 in this book:
Topic 9: Project in Mathematical Biology: Reaction Diffusion Equations and
Animal Coat Patterns
The importance of mathematics in certain fields such as physics has been known for a
long time but recently its power has also been discovered in biology. Some phenomena
that one believed to be owing to chance or to the action of genes are revealed to be the
consequence of mathematical dynamics. Perhaps the most spectacular example is that of
the pattern on the coats of animals.
Why are the coats of certain types of animals such as the leopard spotted, whereas the
coats of others are striped (tiger, zebra). Why are the spots of the giraffe much bigger
than and different from those of the leopard? Why do certain animals, such as the mouse
and elephant not have a pattern? Why do some animals, such as the cheetah, jaguar,
leopard and genet, have spotted bodies and striped tails, but there are no animals with
striped bodies and spotted tails?
All of these questions now have a mathematical answer. The model in question describes
the way in which two different chemical products react and are propagated on the skin :
one which colors the skin, and one which does not color it; or more precisely, one which
stimulates the production of melanin (coloring the skin) and one which inhibits this
What is remarkable is that the equations show that the different patterns of coat depend
only on the size and form of the region where they are developed. Stated in another way,
the same basic equation explains all of the patterns. But then, why do the tiger and the
leopard have different patterns given that their bodies are similar? This is because the
formation of the patterns would not be produced at the same moment during the growth
of the embryo. In the first instance, the embryo would be still small and in the other, it
would be at a much bigger stage.
More precisely, the equations show that no pattern is formed if the embryo is very small,
that a stripped pattern is formed if the embryo is a little bigger, a spotted pattern if it is
bigger yet, and no pattern at all if it is too big. This is why the mouse and the elephant
would not have a pattern.
What's more, as to comparable surfaces, the form of the surface makes a difference. Thus,
if one considers a certain surface large enough in order to permit the formation of spots,
and that one gives it a long, cylindrical form (like a tail) without altering its total area,
then the spots are transformed into stripes.
In this way, a unique system of differential equations seems to govern all the coat
patterns that one finds in nature. The same type of equations also permits one to explain
the patterns of the wings of butterflies as well as certain colored patterns of exotic fish.
Let's mention nevertheless that the process of chemical diffusion which we have just
mentioned (called a diffusion-reaction mechanism) have not yet been directly observed
on the skin of animals, although certain indirect evidence seems to confirm their
presence. The chemical substances in question would be actually found in the epidermis
or just below, and it is very difficult to detect them experimentally. For the moment, then,
this model remains a model, although considerable circumstantial evidence seems to
confirm it. In any case, that such a model succeeds in explaining almost all the diversity
and richness of the patterns occurring on animals is surely a sign that it contains a good
deal of truth.
The goal of the project: understand and analyze the reaction-diffusion model for animal
skin patterns. Perform a numerical simulation of the model.
Topic 9: Algorithm and Data Structures Visualization
The project deals with developing visualizations for introductory algorithms and data
Visualization helps to gain insights into algorithms and data structures. Visualization of
algorithms and data structures can be used for several purposes including learning,
debugging and optimization.
The project deals also with exploring ideas for integrating algorithm and data structures
visualization into early computer science curriculum and developing didactic concepts for
teaching introductory algorithms and data structures in early computer science curriculum
using visualizations.
The ideas for developing visualizations for advanced algorithms and data structures in
different areas of computer science will be investigated.
Topic 10: Bioinformatics Algorithms Visualization
The project deals with developing visualizations for bioinformatics algorithms. First step
will be to explore dynamic programming algorithms for local, global and semi-local
sequence alignment and multiple sequences alignment. The next step will be exploring
graph algorithms.
Topic 11: Exploring Steganography
The word steganography literally means covered writing as derived from Greek. It
includes a vast array of methods of secret communications that conceal the very existence
of the message. Among these methods are invisible inks, microdots, character
arrangement (other than the cryptographic methods of permutation and substitution),
digital signatures, covert channels and spread-spectrum communications. Steganography
can be defined as "the art of concealing the existence of information within seemingly
innocuous carriers. Steganography, in an essence, 'camouflages' a message to hide its
existence and make it seem 'invisible' thus concealing the fact that a message is being
sent altogether. An encrypted message may draw suspicion while an invisible message
will not." (Johnson, N. F., Steganography,
Steganography is the art of passing information in a manner that the very existence of the
message is unknown. The goal of steganography is to avoid drawing suspicion to the
transmission of a hidden message. If suspicion is raised, then this goal is defeated.
Steganalysis is the art of discovering and rendering useless such covert messages.
The goal of the project is to explore different steganography and steganalysis techniques
and algorithms and Java implementation of these techniques and algorithms.
Please, read the attached articles for more information.