791H Senior Project Progress Report
Image Filtering and Enhancement of Scanning
Transmission Electron Microscope Images
Submitted for Review to:
Dr. Tom Miller
Submitted by:
Nathan P. Brouwer
University of New Hampshire
College of Engineering and Physical Sciences
Department of Electrical and Computer Engineering
55 Edgewood St.
Durham, New Hampshire 03824
Created: December 12, 2010
REVISED: December 17, 2010
REV: FINAL
Table of Contents
1 Abstract
2 Project History and Definition
2.1 Background
2.2 Problem
2.3 Project Objective
3 Methodology
3.1 Three Phase Iterative Approach
4 Significance/Implications
5 Personal Outcome
6 Location
7 Preparation/Experience
8 Time Table
9 Appendices
9.1 Timeline for Project
9.2 Budget Explanation
9.3 References
1 Abstract
ZSGenetics uses a scanning transmission electron microscope (STEM) to
perform direct imaging of deoxyribonucleic acid (DNA) for research
purposes. Because of the high magnification and the way the images are formed,
current images from this process are unclear and difficult to analyze directly. The
proposed solution is to construct and implement a variety of image processing
algorithms that enhance the quality of the DNA images and make it easier to
extract information that the scientists at ZSGenetics can use.
The project will result in a graphical user interface (GUI) that researchers can
use to process these images and make analyses quicker and more
accurate. This is novel work in an emerging engineering field, with strong potential
for publication at its conclusion.
2 Project Definition and Objectives
2.1 Project Definition
The problem the scientists face is that the images are very difficult to
analyze because a large amount of cluttering information, or noise,
interferes with the ability to detect the DNA strands. It is a scatter problem that
becomes largely statistical, and it is the major challenge to overcome.
ZSGenetics has an innovative patented technique to bind larger atoms to certain
nucleotide pairs in DNA. Because contrast and resolution still make it nearly
impossible to detect the DNA atoms directly, the problem has been recast as
finding the location of the marker atoms. If the location of the marker atoms can be
determined, then the corresponding base pair is also known.
2.2 Project Objective
The project goal is to provide a solution to the DNA imaging problem by using
image processing algorithms and filters to extract information and improve the
images to a point where they can be useful to the scientists at ZSGenetics.
Through a variety of algorithms, it is possible to overcome the scatter problem of
noise, detect marker atoms, and calculate the distance between markers to
determine the number of non-marked base pairs between markers.
Before pursuing the primary objective, it is essential to reliably recognize the
DNA strand using pattern recognition software that I develop. After that has been
accomplished, the main goal of this project is to detect the marker atoms with a
coherent automated algorithm. Only once the strands are detected and the
marker atoms identified can the distance between points be calculated using the
pixels and orientation of the image.
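As a minimal sketch of the distance step described above, and assuming the strand runs in a straight line between two detected markers, the pixel distance can be converted to a count of unmarked base pairs. The function name and the two calibration parameters (nm_per_pixel, nm_per_base_pair) are hypothetical placeholders; this report does not give calibration values, and Python is used here only for illustration.

```python
import math


def unmarked_base_pairs_between(marker_a, marker_b, nm_per_pixel, nm_per_base_pair):
    """Estimate how many unmarked base pairs lie between two marker atoms.

    marker_a, marker_b: (row, col) pixel coordinates of the detected markers.
    nm_per_pixel:       assumed image scale (hypothetical calibration value).
    nm_per_base_pair:   assumed spacing between adjacent base pairs along the
                        strand (hypothetical calibration value).
    Assumes the strand is straight between the two markers.
    """
    d_row = marker_b[0] - marker_a[0]
    d_col = marker_b[1] - marker_a[1]
    distance_px = math.hypot(d_row, d_col)      # straight-line distance in pixels
    distance_nm = distance_px * nm_per_pixel    # convert to physical length
    steps = round(distance_nm / nm_per_base_pair)  # base-pair steps from A to B
    return max(0, steps - 1)                    # pairs strictly between the markers


# Made-up numbers, used only to exercise the function.
count = unmarked_base_pairs_between((0, 0), (3, 4), nm_per_pixel=0.68, nm_per_base_pair=0.34)
print(count)
```

A curved strand would need the distance summed along the detected strand path rather than a single straight segment.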
3 Design Process and Implementation Plan
3.1 Three Phase Iterative Approach
This cyclical three phase approach includes data definition and collection,
algorithms and testing, and evaluation and feedback of algorithms.
The preliminary phase will consist of data definition and collection. This will
include travelling to Cambridge, Massachusetts to receive additional data sets of
still images and video sequences of DNA from ZSGenetics. I will personally be
receiving certified training on how to safely use and operate the electron
microscope at Harvard University. There will be meetings to learn from the
scientists exactly what they are looking for and how they may want the image
enhanced in order to better comprehend the data set. Phase I will end with the
compilation of pertinent data sets with a clear idea of what algorithms might
produce the desired results.
Once the exact nature of the images that need to be improved is understood, the
best combination of image processing algorithms will be determined and applied.
Phase II will be primarily the application of any algorithms identified in Phase I to
the data set and then modifying them with feedback from the experts at
ZSGenetics. Evidence suggests that the scattering noise will inevitably lead to
statistical solutions. The design goal for Phase II is to analyze statistical trends in
DNA images and, using pattern recognition algorithms, correctly find and crop the
DNA sequence according to those trends. Once the strand is found, I will
implement a decision process to determine where each marker is. More research
will be done to identify a reliable set of features to be used as criteria for the
marker decision.
Phase III will gather additional feedback by sending images processed with
the current combination of algorithms back to ZSGenetics for assessment. They
will use their expertise in the subject matter to evaluate how successful the
attempts were and suggest what needs to be done to the images for
even better clarity. The iterative part of this project will be using the feedback from
ZSGenetics to return to the drawing board and further improve our process.
William Glover from ZSGenetics has been in open communication with
me and Professor Messner’s laboratory, which is a necessity for the project.
Figure 1 below shows graphically how this phased approach will flow.
[Figure 1: Project Flow. Boxes: Research and Data Collection; Algorithms and Testing; Evaluation and Feedback of Algorithms by ZSGenetics; Create Graphical User Interface for use after project completion; Final Report and Publication]
The end result will be a set of tuned image processing routines and a graphical
user interface (GUI) able to be used by scientists and engineers for DNA
research. We expect that our end results will be publishable, and we expect to submit
our findings to an appropriate journal for publication.
4 Progress
Phase I is a data collection and interpretation phase. ZSGenetics has acquired
and forwarded to us several batches of images to begin working with. Examples
of two types of raw images collected by ZSGenetics are shown below:
Raw DNA strand (Bright Field Imaging)
Raw DNA strand (Dark Field Imaging)
Bright field and dark field are two distinct data acquisition techniques that
result in very different images. Bright field imaging effectively blasts the sample
with electrons and a sensor monitors the scattering of electrons through the
sample. Dark field imaging uses a scanned electron beam to raster scan the
sample with a concentrated beam of electrons. This builds the image pixel by
pixel. It is anticipated that the dark field will yield better images due to contrast.
This remains to be proven by processing each type of acquired image. By
working on both image sets, I will be able to determine which type will yield the
best results. It is possible with both types of imaging to ascertain the location and
orientation of the strand of DNA under sufficient magnification and focus. The
effectiveness of algorithms to identify such features will be a deciding factor in
which types of images to further pursue.
In addition to data collection, I also travelled down to Harvard to attend an
electron microscope training session. I completed the first part of the electron
microscope training. This session dealt with facility safety training. I will be
scheduling the final part of training over break.
Currently, I am working on Phase II, the algorithm development phase. The first
approach was to develop an algorithm that would be able to find the DNA strand
by looking at local area statistics.
Quad tree decomposition is the technique I
chose to implement. The idea behind the technique is to test the parent
image against certain criteria. If it does not meet the criteria, the parent is split into 4
equally sized child blocks. Each of those blocks is then tested against the same criteria,
and any block that fails is recursively broken down until the criteria are met.
Once the criteria are fulfilled, the block is not broken down further because
it is assumed that all children of that block will also meet them.
Quad Tree Decomposition
This 64x64 pixel image contains a 5x6 object. Instead of having to test each pixel,
which would take 4096 computations, it would only take 84 computations to fully
detect this object’s location. This method is much faster and can be set up to use
any criteria to test each cell. The example uses a simple threshold (if a cell
contains any nonzero value, split it).
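The splitting rule described above can be sketched as follows. This is an illustrative Python version, not the implementation used in the project; it assumes a square image whose side is a power of two, and the names quadtree and is_all_zero are hypothetical. The criterion is passed in as a callable, so any other test can be swapped in.

```python
def is_all_zero(image, row, col, size):
    """Example criterion: the block passes when every pixel in it is zero."""
    return all(
        image[r][c] == 0
        for r in range(row, row + size)
        for c in range(col, col + size)
    )


def quadtree(image, criterion, row=0, col=0, size=None, min_size=1):
    """Return the leaf blocks as (row, col, size) tuples.

    A block becomes a leaf when it meets the criterion or cannot be split
    further; otherwise it is split into four equal children.
    Assumes a square image with a power-of-two side length.
    """
    if size is None:
        size = len(image)
    if size <= min_size or criterion(image, row, col, size):
        return [(row, col, size)]
    half = size // 2
    leaves = []
    for d_row in (0, half):
        for d_col in (0, half):
            leaves.extend(
                quadtree(image, criterion, row + d_row, col + d_col, half, min_size)
            )
    return leaves


# Small synthetic example: an 8x8 image of zeros with one nonzero pixel.
image = [[0] * 8 for _ in range(8)]
image[5][2] = 1
leaves = quadtree(image, is_all_zero)

# Leaves that still fail the criterion mark where the object is.
object_blocks = [b for b in leaves if not is_all_zero(image, *b)]
print(len(leaves), object_blocks)
```

Large empty regions stay as single blocks, and only the area around the object is refined down to single pixels.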
A problem I ran into was that simple thresholds would not suffice to detect the
strand in the real images because of the random noise that appears. I devised a way to create a test
function, called through a function handle, that can test each cell against arbitrary conditions.
The problem is that I do not yet know the best conditions to test. For the time being, the
quad tree decomposition function has been put aside so I can begin looking into the
statistics of the noise. If I can characterize the noise in a coherent statistical
manner, that could provide the conditions for the quad tree decomposition.
The next step entailed inspecting the statistics of the noise with and without the
DNA present. I determined that there is a distinct statistical distribution that exists
throughout the image. An example of a normalized histogram of the intensity
distributions is shown below:
[Figure: Histogram of Characterized Noise. x-axis: Intensity Bins (ticks at 50, 100, 150); y-axis: Normalized Frequency of Occurrence (0 to 0.035)]
Histogram of Scatter Noise
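A normalized histogram of this kind can be computed as sketched below. This is an illustration in Python under the assumption that the image holds integer intensities in the range 0 to num_bins - 1 (for example 8-bit data); the function name is hypothetical and not taken from the project code.

```python
def normalized_histogram(image, num_bins=256):
    """Return the fraction of pixels that fall in each intensity bin.

    image: 2D list of integer intensities in the range 0..num_bins-1
           (assumed; 8-bit data by default).
    """
    counts = [0] * num_bins
    total = 0
    for row in image:
        for value in row:
            counts[value] += 1
            total += 1
    if total == 0:
        raise ValueError("image contains no pixels")
    # Dividing by the pixel count makes the bins sum to 1, so histograms
    # from images of different sizes can be compared directly.
    return [count / total for count in counts]


# Tiny made-up example with four intensity bins.
hist = normalized_histogram([[0, 0, 1], [2, 2, 2]], num_bins=4)
print(hist)
```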
Knowing that there are statistical trends, I have been developing a user-friendly
interface to calculate standard deviation, variance, and higher order moments on the two
dimensional data. Inspecting data sets with and without DNA should give
good insight into how the trends differ. That, in turn, should provide a
method for creating conditions for quad tree decomposition. This would
accomplish our first objective: algorithmically detecting DNA sequences from
images.
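The statistics mentioned above can be sketched for a two dimensional block of pixels as follows. This is an illustrative Python version using population (not sample) moments; the function name is hypothetical, and returning zero skewness and kurtosis for a constant block is a convention chosen here, not something specified in the project.

```python
import math


def intensity_moments(image):
    """Return mean, variance, standard deviation, skewness, and kurtosis
    of all pixel values in a 2D list, using population moments."""
    values = [float(v) for row in image for v in row]
    n = len(values)
    if n == 0:
        raise ValueError("image contains no pixels")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0.0:
        # Constant block: higher moments are undefined, so report 0.0
        # (an assumption made for this sketch).
        skewness = kurtosis = 0.0
    else:
        skewness = sum((v - mean) ** 3 for v in values) / n / std ** 3
        kurtosis = sum((v - mean) ** 4 for v in values) / n / std ** 4
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Made-up 2x2 block, used only to exercise the function.
stats = intensity_moments([[1, 2], [3, 4]])
print(stats)
```

Comparing these values between regions with and without DNA is one way a numeric condition for the quad tree test could be chosen.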
Another technique we employed was linear mapping, which maps all the
values of an image onto a stretched-out scale. As shown in the histogram, there is
little to no information beyond bin 80. Therefore I can map 0-80 to 0-255,
where 255 is the maximum value of an unsigned 8-bit number. This preserves the
information as long as there are no values above 80 in the original image, and it
gives better resolution for visualization because the gray scale has been
expanded.
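A minimal sketch of this linear mapping, written in Python for illustration: the defaults of 80 and 255 come from the text above, while the function name and the choice to clip values above the input maximum to 255 are assumptions made for this example.

```python
def stretch_contrast(image, in_max=80, out_max=255):
    """Linearly map intensities 0..in_max onto 0..out_max.

    Values above in_max saturate at out_max, which is why the mapping only
    preserves information when the original image has no values above in_max.
    Uses integer arithmetic with round-half-up.
    """
    return [
        [min(out_max, (value * out_max + in_max // 2) // in_max) for value in row]
        for row in image
    ]


# Made-up row of pixel values, used only to exercise the function.
stretched = stretch_contrast([[0, 16, 40, 80, 200]])
print(stretched)
```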
5 Significance/Implications
Being on the verge of detecting a DNA sequence in software from a
.TIFF image puts me in good shape to complete the project next
semester. Overcoming the noise problem by characterizing it statistically is
publishable material that could be useful to others. To my knowledge, imaging DNA in
order to identify the specific DNA sequence in a sample via a scanning transmission electron
microscope has not been done before. If the project is successful, it will provide
scientists and medical researchers with a method to extract information directly
from the images of DNA.
6 Time Table
See Figure 2 in the appendices for a Gantt chart that describes the timeline. I
am right on track, in the middle of the algorithm phase, while continually getting
feedback that steers the work in an informed direction. October and
November were mostly spent collecting images and brainstorming possible
approaches. December mostly consisted of writing algorithms. January and
February will be spent writing more code, with plenty of feedback to determine the
best course of action. After that, I will be focusing on GUI implementation and
documentation for publication.
Nearly a month is designated at the end of the semester for last minute
alterations and, most importantly, the final report and publication of this research.
In April, there is an Undergraduate Research Conference where this research will
be presented.
7 Appendices
7.1 Timeline for Project
Figure 2: Gantt chart timeline
7.2 Budget
Category        Item                          Detail      Quantity   Cost
Supplies        Paper                                     2 Reams    $35.98
Supplies        Flash Drives                  8 GB        2          $39.98
Travel          Durham-Danvers                96 mi RT    2 Trips    $48.00
Travel          Durham-Cambridge              138 mi RT   3 Trips    $103.50
Other Expenses  Photo Copies, Color Printing              250        $25
Total                                                                $252.46
Note: SURF grant has awarded $150 for budget
7.3 Budget Explanation
A. Paper – This will cover the actual paper used for printing and calculations, as well
as the cost of color printing. Any cost above the budgeted amount will be covered
by the ECE department.
B. Flash Drives – An easy and universal way to transfer and
store images is necessary. The images will be much too
large to send over email, so flash drives make them much simpler to transport.
C. Travel – Travel will be necessary for training and data collection in both Danvers, MA
and Cambridge, MA.
D. Photocopies – It will be necessary to reproduce many of the images created. Any
cost above the budgeted amount will be covered by the ECE department.