791H Senior Project Proposal

Image Filtering and Enhancement of Scanning Transmission Electron Microscope Images

Submitted for Review to: Dr. Tom Miller
Submitted by: Nathan P. Brouwer

University of New Hampshire
College of Engineering and Physical Sciences
Department of Electrical and Computer Engineering
55 Edgewood St.
Durham, New Hampshire 03824

Created: October 10, 2010
REVISED: October 25, 2010
REV: Final

Table of Contents

1 Abstract
2 Project History and Definition
  2.1 Background
  2.2 Problem
  2.3 Project Objective
3 Methodology
  3.1 Three Phase Iterative Approach
4 Significance/Implications
5 Personal Outcome
6 Location
7 Preparation/Experience
8 Time Table
9 Appendices
  9.1 Timeline for Project
  9.2 Budget
  9.3 Budget Explanation
  9.4 References

1 Abstract

ZSGenetics uses a scanning transmission electron microscope (STEM) to image deoxyribonucleic acid (DNA) directly for research purposes. Because of the high magnification and the way the images are formed, current images from this process are unclear and difficult to analyze directly. The proposed solution is to construct and implement a variety of image processing algorithms that enhance the quality of the DNA images and make it easier to extract information useful to the scientists at ZSGenetics. The project will result in a graphical user interface (GUI) that researchers can use to process these images, making analyses much quicker and more accurate. This is novel work in an emerging engineering field with great potential for publication at its conclusion.

2 Project History and Definition

2.1 Background

Since the invention of the electron microscope (EM), scientists have dreamed of using it to determine the sequence of DNA, which is essential to understanding its role as the “code of life”. DNA carries the genetic information necessary for the development and functioning of all living organisms. DNA is synthesized by the body as two long polymers of simple repeating units called nucleotides; the two strands are joined by hydrogen bonds to form a double-stranded helix. Each nucleotide consists of a nitrogenous base, a simple 5-carbon sugar called deoxyribose, and a phosphate group (PO₄²⁻). There are four possible nitrogenous bases in DNA: adenine, guanine, cytosine, and thymine.
It is the pattern of these four bases that determines the identity, features, and all biological processes of the organism by encoding the amino acid sequence of every protein in the body. It is also this pattern that is determined during sequencing using the STEM technique (Robinson). The full sequence of these bases is unique to the individual and is the true “fingerprint” of an organism, providing insight into its characteristics and functional capabilities.

Significant work has been done to understand and sequence DNA, but there are still many mysteries associated with this process and the molecule itself. A deeper understanding could help cure diseases and other genetic defects, as well as further other areas of research such as cloning. Using a variety of costly and time-intensive techniques, scientists have discovered properties of the structure and learned how to sequence DNA without directly viewing a sample at the atomic level. If DNA could be imaged directly, the images could be used to sequence it without many of the painstaking processes currently used, and with much greater accuracy.

In the past, scientists and engineers have faced numerous difficulties in the direct imaging of DNA using electron microscopes, which are the only instruments with strong enough magnification. The two main limiting factors are the resolution achievable at high magnification and the contrast of the resulting image at that magnification. In the last several years, electron microscopes have made enormous technological advances, most notably improving the achievable resolution to under 0.08 nm. This advancement allows scientists and engineers to view the building blocks of even the smallest particles. Since the average distance between base pairs is 0.34 nm, new electron microscopes operating at close to ideal conditions have overcome the problem of resolution (Bell).
The scanning transmission electron microscope (STEM) is a variety of EM that has recently achieved the magnification and resolution necessary to view and sequence DNA. A STEM works by accelerating a focused beam of high-energy electrons down through a sample to a highly sensitive camera that records the scattering of those electrons. The beam is raster scanned, line by line, across the sample, and the camera records the intensity at every position of the beam, resulting in a two-dimensional grid where each pixel is assigned an intensity value. Because electrons are negatively charged, the beam passing through a sample is deflected mainly by the electrostatic forces of the dense, positively charged nuclei in the sample. Due to wave-particle duality, a high-energy electron behaves like a wave, so collisions with electrons in the electron cloud of the sample are treated as negligible. Deflected electrons are not detected by the sensor at the pixel the beam was aimed at. Larger atoms cause a greater number of deflections and therefore a smaller recorded intensity. On a graph of intensity values, large atoms appear as dark spots, and smaller atoms are not distinguishable because of the inevitable scatter noise.

Once the problem of resolution is overcome, the problem of contrast remains. DNA is a very “light” molecule on the atomic scale, meaning its atoms have relatively low atomic numbers and therefore small nuclei. The main elements that make up DNA are hydrogen, carbon, nitrogen, oxygen, and phosphorus, with an average atomic number of about 5.5. Simply put, the nuclei are not large enough to cause enough collisions to produce a perceptible difference in intensity.

ZSGenetics is a biomedical company in Danvers, Massachusetts, working on imaging DNA with a STEM.
They have devised a patented method to bind certain “heavier” atoms to distinct nucleotide pairs. These are called marker atoms because their nuclei are large enough for a sensor to recognize and distinguish them from the lighter background atoms. If a large marker atom is bound to a specific nucleotide pair, it is possible to tell the exact positions in a given DNA sample where that nucleotide pair exists. This capability opens a new possibility for sequencing DNA through an image.

2.2 Problem

The problem the scientists are facing is that the pictures are very difficult to analyze because a large amount of cluttering information, or noise, interferes with the ability to detect the DNA strands. The camera records the image in grey scale with 256 shades of grey. Since the average human can distinguish only a few dozen shades of grey with the naked eye, it is very difficult to analyze these images accurately. Because of this human limitation, the unprocessed images are virtually useless. With the power of computers and the advances in digital image processing, it is possible to extract improved data from the images that is useful for human interpretation.

A major anticipated problem is unavoidable noise distortion. When dealing with samples at the microscopic level, there is bound to be cluttering noise that interferes with the image of the actual sample. This scatter problem will be a major challenge to overcome.

2.3 Project Objective

The project goal is to provide a solution to the DNA imaging problem by using image processing algorithms and filters to extract information and improve the images to the point where they are useful to the scientists at ZSGenetics. Through a variety of algorithms, it is possible to overcome the scatter noise problem, detect marker atoms, and calculate the distance between markers to determine the number of non-marked base pairs between them.
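As a worked illustration of the last step in the objective, the sketch below converts the measured distance between two detected marker atoms into an estimated count of unmarked base pairs. It assumes the 0.34 nm average base-pair spacing quoted in the Background, a strand that lies straight in the image plane, and a known pixel size. The function name, the pixel_size_nm parameter, and the sample coordinates are hypothetical and are not taken from ZSGenetics data.

```python
import math

BASE_PAIR_SPACING_NM = 0.34  # average base-pair spacing quoted in Section 2.1


def unmarked_base_pairs(marker_a, marker_b, pixel_size_nm):
    """Estimate how many unmarked base pairs lie between two marker atoms.

    marker_a, marker_b: (row, col) pixel coordinates of detected markers.
    pixel_size_nm: physical width of one pixel in nanometres (assumed known).
    """
    distance_px = math.dist(marker_a, marker_b)
    distance_nm = distance_px * pixel_size_nm
    # Number of base-pair steps from one marker to the other.
    steps = round(distance_nm / BASE_PAIR_SPACING_NM)
    # Both endpoints are marked, so the pairs strictly between them
    # number one fewer than the steps.
    return max(steps - 1, 0)


# Hypothetical example: markers 50 pixels apart at 0.068 nm per pixel.
print(unmarked_base_pairs((10, 10), (10, 60), pixel_size_nm=0.068))
```

A real strand bends and may not lie flat, so a practical version would measure distance along the traced strand rather than in a straight line.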
This project has the potential to grow into higher-level graduate work as a master's or even PhD project, and may lead to automatic sequencing of DNA through imaging becoming common practice in industry. This could have a major effect on the medical community by enhancing DNA research that seeks cures for genetic diseases.

3 Methodology

3.1 Three Phase Iterative Approach

This project will reach its objective using an iterative three-phase approach. The preliminary phase will consist of data definition and collection. This will include travelling to Danvers and Cambridge, Massachusetts, to receive additional data sets of still images and video sequences of DNA from ZSGenetics. I will personally receive certified training at Harvard University on how to safely operate the electron microscope. There will be meetings to learn from the scientists exactly what they are looking for and how they want the images enhanced in order to better comprehend the data set. Phase I will end with the compilation of pertinent data sets and a clear idea of which algorithms might produce the desired results.

Some preliminary images from the ZSGenetics and Harvard STEMs have already been received. Examples of two types of raw images and the results of basic enhancements are shown below:

[Figures: Raw DNA Strand Image; Enhanced DNA Image]

The enhanced image above was produced using a very rudimentary algorithm called color space mapping (Gonzalez). The algorithms to be developed and applied in this work will be significantly more complex and, hopefully, more revealing of the DNA structure.

[Figure: Raw DNA strand (Dark Field Imaging)]

The image above was taken using a new technique called dark field imaging. This technique does not use the STEM camera; instead, a single concentrated beam of excited electrons is shot through the sample. A small, donut-shaped metal ring is hit by the deflected electrons.
When the excited electrons hit the ring, they induce a current proportional to the number of electrons deflected. The result is a map of two-dimensional position versus current (intensity), in which the brightest spots mark the largest atoms and will be the focus of our attention. This technique is useful because it provides greater contrast than bright field imaging.

There are a variety of algorithms and extraction techniques to reduce noise and produce an image in which the marker atoms are easier to find. Some basic techniques include thresholding, pseudo color mapping, and stretching the magnitude to a logarithmic scale. There are also some higher-level algorithms that may be useful, such as filtering through time and the maximum entropy method, which estimates the probabilistic noise based on an array of constraints. Once the exact nature of the images that need to be improved is understood, the best combination of image processing algorithms will be determined and applied.

Phase II will consist primarily of applying the algorithms identified in Phase I to the data set and then modifying them with feedback from the experts at ZSGenetics.

Phase III will consist of sending images processed with the current combination of algorithms back to ZSGenetics for additional feedback. They will evaluate how successful the attempts were and suggest what needs to be done to the images for even better clarity. The iterative part of this project is using the feedback from ZSGenetics to go back to the drawing board and further improve the process. A contact at ZSGenetics has already agreed to maintain open communication with me and Professor Messner’s laboratory, which is a necessity for the project. Figure 1 below shows graphically how this phased approach will flow.
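Three of the basic techniques named in this section (logarithmic stretching, thresholding, and a simple form of filtering through time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm set the project will settle on: images are plain nested lists of non-negative intensities, "filtering through time" is approximated by pixel-wise averaging of registered frames, and the function names, cutoff value, and sample frames are all hypothetical.

```python
import math


def log_stretch(image, max_level=255):
    """Map intensities onto a logarithmic scale spanning 0..max_level."""
    peak = max(max(row) for row in image)
    if peak == 0:
        return [[0 for _ in row] for row in image]
    scale = max_level / math.log1p(peak)
    return [[round(scale * math.log1p(v)) for v in row] for row in image]


def threshold(image, cutoff):
    """Return a binary mask: 1 where intensity >= cutoff, else 0."""
    return [[1 if v >= cutoff else 0 for v in row] for row in image]


def average_frames(frames):
    """Average a sequence of equally sized frames pixel by pixel."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


# Hypothetical 2x3 dark-field frames: one bright "marker" pixel plus noise.
frames = [
    [[10, 12, 200], [9, 11, 10]],
    [[14, 8, 210], [11, 9, 12]],
    [[12, 10, 190], [10, 10, 8]],
]
mean_frame = average_frames(frames)        # suppress frame-to-frame noise
mask = threshold(mean_frame, cutoff=100)   # keep only the brightest pixels
print(mask)
```

The mask keeps the brightest pixels because, in dark field images, the brightest spots mark the largest atoms; for bright field images the comparison would be reversed.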
[Figure 1: Project Flow. Stages shown: Research and Data Collection; Algorithms and Testing; Evaluation and Feedback of Algorithms by ZSGenetics; Create Graphical User Interface for use after project completion; Final Report and publication]

The end result will be a set of tuned image processing routines and a graphical user interface (GUI) that scientists and engineers can use for DNA research. We expect that our end results will be publishable and expect to submit our findings to an appropriate journal for publication.

NOTE: The work done will be with the images and videos of DNA, but not the DNA itself. This project will have no interaction with any genetic material.

4 Significance/Implications

To our knowledge, imaging DNA with a scanning transmission electron microscope in order to identify the specific DNA sequence in a sample has never been done before. If the project is successful, it will provide scientists and medical researchers with a method to extract information directly from images of DNA. The publication will be an intellectual contribution that could have profound practical implications. This project may prove to be a direct aid to medical science, allowing DNA sequences to be identified much more precisely and efficiently.

5 Personal Outcome

This project will hopefully result in publishable material that will be submitted to an appropriate journal. Such a publication at this stage in my career will support my goal of pursuing graduate work. This project will dramatically increase my background in image processing and provides an interesting, major-related topic for a senior project. One major goal of this project is to set up a path to graduate school by continuing this research after the completion of my undergraduate career.
Additionally, for personal interest, I hope to learn more about the structure of DNA and how it is analyzed from the researchers at ZSGenetics, gaining experience that will make me a better candidate for jobs in multiple fields of engineering and science.

6 Location

The principal location of the project work will be Professor Messner’s Image Processing lab in Kingsbury S326, on the University of New Hampshire campus. Periodic trips to Danvers, MA and Cambridge, MA to exchange data and get feedback on our work are essential for this project. ZSGenetics has a partner program at Harvard University from which we may use additional data sets. Personal cars will be used, and gas is provided for in the proposed budget. ZSGenetics has already agreed to work cooperatively with us on this project as described above.

7 Preparation/Experience

I have research experience that has prepared me for this senior project. Last summer, I participated in a 10-week undergraduate research program at Colorado State University in Fort Collins, Colorado, where I worked on simulating radar data from the CHILL radar system in MATLAB. There I gained exposure to the research environment and process.

I have also taken related classes: ECE 633H and ECE 634 (Signals and Systems 1 and 2), which have given me an essential background in the topic to be researched. I am also currently enrolled in ECE 714 (Digital Signal Processing), which provides the fundamentals of one-dimensional processing. To further my knowledge, I will follow along with the senior-level digital image processing course on two-dimensional processing taught by Professor Messner. All of the lecture slides are online, so I will review them weekly and occasionally meet with Professor Messner to discuss topics and ensure understanding.

8 Time Table

See Figure 2 in the appendices for a Gantt chart that describes the timeline. The timetable for the project covers the period from September until mid-April.
September and October have been devoted mainly to preliminary research on image processing algorithms and to data collection. Electron microscope training at Harvard University and imaging of DNA samples are scheduled for November. November will also begin the first session of algorithm implementation to see what is successful. In December we will meet again with ZSGenetics for feedback and to confirm that the work being done is correct and useful. The start of the second semester is reserved for continued improvement of the algorithms to solidify the best possible approach. By mid-February, we hope to be finishing the algorithm testing and to begin creating a graphical user interface that gives ZSGenetics a standardized way to use the algorithms. Nearly a month is set aside at the end of the semester for last-minute alterations and, most importantly, the final report and publication of this research. In April, this research will be presented at the Undergraduate Research Conference.

9 Appendices

9.1 Timeline for Project

[Figure 2: Gantt chart timeline]

9.2 Budget

Category        Item                           Quantity   Cost
Supplies        Paper (color printing)         2 reams    $35.98
Supplies        Flash drives (8 GB)            2          $39.98
Travel          Durham-Danvers (96 mi RT)      2 trips    $48.00
Travel          Durham-Cambridge (138 mi RT)   3 trips    $103.50
Other Expenses  Photocopies                    250        $25.00
Total                                                     $252.46

Note: The SURF grant has awarded $150 toward this budget.

9.3 Budget Explanation

A. Paper: This covers the paper used for printing and calculations, as well as the cost of color printing. Any cost above the budgeted amount will be covered by the ECE department.
B. Flash Drives: An easy and universal way to transfer and store images is needed. The images will be much too large to send over email, so flash drives make them far simpler to transport.
C. Travel: Travel is necessary for training and data collection in both Danvers, MA and Cambridge, MA.
D. Photocopies: It will be necessary to reproduce many of the images created.
Any cost above the budgeted amount will be covered by the ECE department.

9.4 References

Bell, David C., Katelyn M. Murtagh, Cheryl A. Dionne, and William R. Glover. Direct observation of single-atom DNA labels with annular dark-field electron microscopy. Submitted to Nature (2010).

Gonzalez, Rafael C., and Richard E. Woods. Digital Image Processing. Upper Saddle River, N.J.: Prentice Hall, 2008.

Nakanishi, Nobuto, Tasutoshi Kotaka, and Takashi Yamazaki. An expanded approach to noise reduction from high-resolution STEM images based on the maximum entropy method. Ultramicroscopy 106 (2006) 233-239.

Robinson, Richard. DNA Structure and Function, History. Genetics (2003).