e-mail: Senior Design Project Topics Advisor: Yana Kortsarts yanako@cs.widener.edu or/and ykortsarts@mail.widener.edu For students: please feel free to contact me if you would like to discuss the project details or/and to obtain more information about each project. Topic 1: Annotation of DNA sequences Automatic annotation of large amounts of genomic DNA sequence clearly is and will continue to be a formidable challenge. This problem will be addressed properly only by developing very efficient computational tools for initial sequence annotation, treating the annotations as hypotheses, and testing and verifying them in the laboratory. The "If you sequence it, the community will annotate it" approach is unlikely to produce desired results, and new paradigms and possibly new organizational models will need to be implemented to present genomic sequence in its most useful form. DNA annotation includes: Predicted open reading frames (ORFs) and genes Predicted genetic elements, e.g., tRNAs, rRNAs, snRNsa, LTRs, CENs, ORIs, ARSs, miscellaneous RNAs Similarities with other sequences in major public DNA databases The goal of the project is to develop a DNA sequence viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence. Topic 2: Advanced Topics in Cryptology: The Past and Future of Knapsack Cryptosystem (see links to the papers and Power Point presentation on cs.widener.edu/~yanako) The Merkle-Hellman knapsack cryptosystem is a public-key cryptosystem that was first published in 1978. It is commonly referred to as the knapsack cryptosystem. It is based on the subset sum problem in combinatorics. The problem involves selecting a number of objects with given weights from a large set such that the sum of the weights is equal to a pre-specified weight. This is considered to be a difficult problem to solve in general, but certain special cases of the problem are relatively easy to solve, which serve as the "trapdoor" of the system. The-single iteration knapsack cryptosystem introduced in 1978 was broken by Shamir. Merkle then published the multiple-iteration knapsack problem which was broken by Brickell. Merkle offered a $100 reward for anybody able to crack the single iteration knapsack and a $1000 reward for anybody able to crack the multiple iteration cipher from his own pocket. When they were cracked, he promptly paid up. The Chor-Rivest knapsack cryptosystem was first published in 1984, followed by a revised version in 1988. It is the only knapsack-like cryptosystem that does not use modular multiplication. It was also the only knapsack-like cryptosystem that was secure for any extended period of time. Unfortunately, Schnorr and Hörner developed an attack on the Chor-Rivest cryptosystem using improved lattice reduction which reduced to hours the amount of time needed to crack the cryptosystem for certain parameter values (though not for those recommended by Chor and Rivest). They also showed how the attack can be extended to attack Damgård's knapsack hash function. The goal of the project is to study different Knapsack Cryptosystems and attacks on different Knapsack Cryptosystems, to conduct a comparative analysis of different Knapsack Cryptosystems and to investigate if there is a future for Knapsack-kind Ciphers. Topic 3: Poker over the phone Description: How might one play a game of cards, such a poker, over the phone? The difficulty, of course is in dealing the cads. In 1979 Shamir, Rivest, and Adleman proposed the following simple strategy that using encryption functions. Players, we will call them Alice and Bob, jointly select 52 distinct messages to represent the cards and a large prime p. Each player secretly choose an integer that satisfies a specific conditions and they are using these integers to encrypt the cards. Alice (the dealer) encrypts the cards, change their order and sends the resulting list to Bob. Bob selects five of the cards and returns them to Alice, she decrypts these for her hand. Bob selects five of the remaining cards for his own hand, encrypt them with his encryption function and sends the result to Alice The rest of the game proceeds in this fashion. Following this scheme, may attempts have been made to achieve protocols that would allow people to play “Mental Poker”. There are a several protocols based on public-key cryptography. The goal of the project is to study different protocols for the “Mental Poker”, and implement the game, first for the simple case for five or ten cards and then for the 52 cards, using at least one of the known protocols. Topic 4: The Bin Packing Problem Description: One-dimensional bin packing is a classic problem with many practical applications related to minimization of space or time. It is a hard problem for which many different heuristic solutions have been proposed. The goal is to pack a collection of objects into the minimum number of fixed-size "bins". Consider the following example, we have 5 items: A, B, C, D, E and F. Each item has a specific weight: A:12, B:8, C:5, D:6, E:6, F:3. We have to place 5 given item into the two given boxes so that maximum weight of articles in each box is 20 pounds. How might we go about solving this problem? Perhaps we might start by choosing the heaviest article and put it into one of the boxes, and then following an algorithm Repeat for all unpacked boxes: 1. choose heaviest remaining item 2. place it in the lightest box If you will run this algorithm you will observe the following output for the example that is given above: as you can see, item F is left behind. Here is the solution for the given example: There are many variants of the bin packing problem, such as, 3D, 2D, linear, pack by volume, pack by weight, minimize volume, maximize value, fixed shape objects, etc. The goal of the project is to study the approximation solutions for the bin packing problem, to implement a couple of the approximation algorithms for the bin packing problem and perform an analysis of the algorithmic strategies and efficiencies of the approximation solutions. Topic 5: Local majorities, coalitions and monopolies in graphs. Description: Some fault-tolerance aspects of computer/processor networks as well as the propagation of infections/diseases or the influence of strong coalitions in a political landscape can be modeled as dynamic majority vote systems. For example, we can consider a well known distributed coloring game played on a simple connected graph: initially each vertex is colored color1 or color2; at each round, each vertex simultaneously recolor itself by the color of the simple (strong) majority of its neighbors. A set of vertices M is said to be a dynamo, if starting the game with only vertices of M colored color1, the computation eventually reaches an all-color1 configuration. The importance of this game follows from the fact that it models the spread of faults in point-to-point systems with majority-based voting; in particular, dynamos correspond to the sets of initial failures which will lead the entire system to fail. The goal of the project is to study the problem and initial assignments of colors which lead the system to a monochromic configuration in a finite number of steps, to implement the game under certain conditions. Topic 6: Graph Coloring Problem Description: Have you ever observed the traffic light pattern at intersections and calculated the number of different light patterns? Or have you ever wondered if the final examinations at a college can be scheduled so that no student would have a time conflict? Both problems (and more other problems) can be modeled by graphs and solved by a technique called graph coloring. A coloring of a simple graph is an assignment of colors to its vertices, so no two adjacent vertices are assigned the same color. The chromatic number of a graph is a minimum number of colors needed for a coloring graph. The goal of the project is to study different problems that can be modeled by graph coloring and implement an algorithm for the specific problem, such that exam scheduling that reduces to an algorithm for graph coloring. Topic 7: Attacks on the Data Encryption Standard (DES) Description: In 1973, The National Bureau of Standards (NBS), later to become the National Institute of Standards and Technology (NIST), issued a public request seeking a cryptographic algorithm to become a national standard. IBM submitted an algorithm and after some modification, DES was released. DES has been used extensively in electronic commerce, for example in the banking industry. If two banks want to exchange data, they first use a public key method to transmit a key for DES, then they use DES fro transmitting the data. DES uses key of effective length 56 bits which creates a possible 2^56 distinct keys. Testing each of these keys until you find the right one is called a brute force attack. The goal of the project is to study and implement the different kinds of attack on DES, such as brute force attacks and key collision attack. Topic 8: Vertex Cover Problem Description: Consider the following problem: suppose we need to guard a museum. We could model the museum as follows: vertices are rooms and an edge between two vertices means that those two rooms are connected. Now, assuming that a guard standing in a room has full view of the adjacent rooms, find where we should place guards so that all rooms are guarded at the least cost (i.e. fewest number of guards) This problem is known as Vertex Cover problem. A vertex cover in a graph is a subset of the verticies of the graph, chosen with the property that one of the endpoints of each edge is in the subset. In the graph at right, {1,3,5,6} is an example of a vertex cover. The vertex cover problem is the optimization problem of finding a vertex cover of minimum size in a graph. The problem is a decision problem, so we wonder if a vertex cover of size k exists in the graph. The goal of the project is to study the Vertex Cover problem along with applications and approximation algorithms, implement a couple of the algorithms and perform a comparison analysis of the algorithmic techniques and efficiencies. References: See page 16 in this book: http://www.designofapproxalgs.com/book.pdf http://domino.research.ibm.com/comm/wwwr_thinkresearch.nsf/pages/antivirus496.html http://www.youtube.com/watch?v=RD7Oaw0H-W4 http://www.cs.sunysb.edu/~algorith/files/vertex-cover.shtml http://www.cs.uiuc.edu/class/sp09/cs598csc/Lectures/lecture_3.pdf Topic 9: Project in Mathematical Biology: Reaction Diffusion Equations and Animal Coat Patterns The importance of mathematics in certain fields such as physics has been known for a long time but recently its power has also been discovered in biology. Some phenomena that one believed to be owing to chance or to the action of genes are revealed to be the consequence of mathematical dynamics. Perhaps the most spectacular example is that of the pattern on the coats of animals. Why are the coats of certain types of animals such as the leopard spotted, whereas the coats of others are striped (tiger, zebra). Why are the spots of the giraffe much bigger than and different from those of the leopard? Why do certain animals, such as the mouse and elephant not have a pattern? Why do some animals, such as the cheetah, jaguar, leopard and genet, have spotted bodies and striped tails, but there are no animals with striped bodies and spotted tails? All of these questions now have a mathematical answer. The model in question describes the way in which two different chemical products react and are propagated on the skin : one which colors the skin, and one which does not color it; or more precisely, one which stimulates the production of melanin (coloring the skin) and one which inhibits this production. What is remarkable is that the equations show that the different patterns of coat depend only on the size and form of the region where they are developed. Stated in another way, the same basic equation explains all of the patterns. But then, why do the tiger and the leopard have different patterns given that their bodies are similar? This is because the formation of the patterns would not be produced at the same moment during the growth of the embryo. In the first instance, the embryo would be still small and in the other, it would be at a much bigger stage. More precisely, the equations show that no pattern is formed if the embryo is very small, that a stripped pattern is formed if the embryo is a little bigger, a spotted pattern if it is bigger yet, and no pattern at all if it is too big. This is why the mouse and the elephant would not have a pattern. What's more, as to comparable surfaces, the form of the surface makes a difference. Thus, if one considers a certain surface large enough in order to permit the formation of spots, and that one gives it a long, cylindrical form (like a tail) without altering its total area, then the spots are transformed into stripes. In this way, a unique system of differential equations seems to govern all the coat patterns that one finds in nature. The same type of equations also permits one to explain the patterns of the wings of butterflies as well as certain colored patterns of exotic fish. Let's mention nevertheless that the process of chemical diffusion which we have just mentioned (called a diffusion-reaction mechanism) have not yet been directly observed on the skin of animals, although certain indirect evidence seems to confirm their presence. The chemical substances in question would be actually found in the epidermis or just below, and it is very difficult to detect them experimentally. For the moment, then, this model remains a model, although considerable circumstantial evidence seems to confirm it. In any case, that such a model succeeds in explaining almost all the diversity and richness of the patterns occurring on animals is surely a sign that it contains a good deal of truth. The goal of the project: understand and analyze the reaction-diffusion model for animal skin patterns. Perform a numerical simulation of the model. Topic 9: Algorithm and Data Structures Visualization The project deals with developing visualizations for introductory algorithms and data structures. Visualization helps to gain insights into algorithms and data structures. Visualization of algorithms and data structures can be used for several purposes including learning, debugging and optimization. The project deals also with exploring ideas for integrating algorithm and data structures visualization into early computer science curriculum and developing didactic concepts for teaching introductory algorithms and data structures in early computer science curriculum using visualizations. The ideas for developing visualizations for advanced algorithms and data structures in different areas of computer science will be investigated. Topic 10: Bioinformatics Algorithms Visualization The project deals with developing visualizations for bioinformatics algorithms. First step will be to explore dynamic programming algorithms for local, global and semi-local sequence alignment and multiple sequences alignment. The next step will be exploring graph algorithms. Topic 11: Exploring Steganography The word steganography literally means covered writing as derived from Greek. It includes a vast array of methods of secret communications that conceal the very existence of the message. Among these methods are invisible inks, microdots, character arrangement (other than the cryptographic methods of permutation and substitution), digital signatures, covert channels and spread-spectrum communications. Steganography can be defined as "the art of concealing the existence of information within seemingly innocuous carriers. Steganography, in an essence, 'camouflages' a message to hide its existence and make it seem 'invisible' thus concealing the fact that a message is being sent altogether. An encrypted message may draw suspicion while an invisible message will not." (Johnson, N. F., Steganography, http://www.jjtc.com/stegdoc/steg1995.html) Steganography is the art of passing information in a manner that the very existence of the message is unknown. The goal of steganography is to avoid drawing suspicion to the transmission of a hidden message. If suspicion is raised, then this goal is defeated. Steganalysis is the art of discovering and rendering useless such covert messages. The goal of the project is to explore different steganography and steganalysis techniques and algorithms and Java implementation of these techniques and algorithms. Please, read the attached articles for more information.