User Manual


MSR User Manual

Program for Mixed Sequence Reader

Mixed Sequence Reader (MSR) for Analyzing DNA Sequences with Heterozygous Base Calling Chromatography to Detect Genomic Variations

Chi-Neu Tsai 1, Jang-Hau Lian 2, Chun-Houh Chen 3, You-Jia Jhang 2, Yan-Cun Lai 1, Che-Wei Lin 1, Chia-Lung Tsai 4, Tzu-Hao Wang 4,5* and Yun-Shien Lee 2,5*

1 Graduate Institutes of Clinical Medical Sciences, Chang Gung University

2 Department of Biotechnology, Ming Chuan University, Tao-Yuan, Taiwan

3 Institute of Statistical Science, Academia Sinica, Taipei, Taiwan

4 Department of Obstetrics and Gynecology, Lin-Kou Medical Center, Chang Gung Memorial Hospital and Chang Gung University, Tao-Yuan, Taiwan

5 Genomic Medicine Research Core Laboratory, Chang Gung Memorial Hospital, Tao-Yuan, Taiwan

*To whom correspondence should be addressed:

1. Tzu-Hao Wang, MD., PhD.

Department of Obstetrics and Gynecology,

Chang Gung Memorial Hospital and Chang Gung University

5 Fu-Hsing Street, Kwei-Shan, Tao-Yuan 333, Taiwan.

Tel: (+886)-3-3281200 ext. 5402, Fax: (+886)-3-3288252

E-mail: knoxtn@cgmh.org.tw

2. Yun-Shien Lee, PhD.

Department of Biotechnology, Ming Chuan University, Tao-Yuan, Taiwan

Tel: 886-3-3507001 ext 3354

Fax: 886-3-3288252

E-mail: bojack@mail.mcu.edu.tw

Note: PSTReader is an open-source program, which means you are free to use and modify it as you see fit. By using it, you also accept the risk of any errors and bugs in the program.


Contents

1. Introduction

2. Installation

3. Execute processes


1. Introduction

Information on human genomic variations, including single nucleotide polymorphisms (SNPs), copy number variations (CNVs), variable numbers of tandem repeats (VNTRs), and short tandem repeats (STRs or microsatellites), is important for studying genomic evolution and related genomic variations in certain diseases.

However, detecting such genomic variations remains a challenge because appropriate tools are lacking. Many commercial or web-based programs have been developed to detect SNPs and insertions/deletions (Indels), but few can detect all types of variations by directly reading the fluorescence chromatogram data generated by autosequencers. In this study, we have developed the Mixed Sequence Reader (MSR), a user-friendly, web-based tool that can analyze heterozygous base-calling fluorescence chromatogram data. For each peak of the chromatogram trace, we defined the log ratio of the fluorescence intensity (LRi) as the logarithm of the major intensity divided by the minor intensity. The LRi value is higher than 5 when there is only one homozygous signal in the chromatogram trace, whereas it can be lower than 2 when heterozygous chromatogram signals are present. For chromatogram traces with Indel and microsatellite sequences, the break-points between LRi >5 and LRi <2 are used to locate structural variations in the genome. The heterozygous sequences can be detected as structural variations by comparing them to the reference human genomic sequences. For chromatogram traces derived by directly sequencing the PCR product of paralogous genes, all of the LRi values were found to be lower than those of homozygous signals. Furthermore, LRi values were directly proportional to the copy numbers of paralogous genes, such as β-defensin 4 (DEFB4). The accuracy of this program was validated by experimental results. We propose that Mixed Sequence Reader may be applied to identify Indels, the copy number of microsatellite markers, and paralogous genes.

This user manual explains in detail:

1. How to install MSR;

2. How to execute the analysis.


2. Installation

1. Download Mix_Sequences_Reader.ZIP, unzip it, and save it to your disk (e.g., D:\MSR\).

2. Launch MATLAB and change the MATLAB working directory to the program's folder (e.g., D:\MSR\).

3. To start MSR, type 'MSR' in the MATLAB command window.


3. Execute processes

Mixed Sequence Reader (MSR) interface

The MSR program is used to read chromatography tracings and detect the position and sequence of variations. Analytical steps of MSR include:

(A). Loading DNA sequencing chromatography:

The chromatography of each DNA sequence is imported in .ab1 or .abi format (a standard format for ABI sequencers) and loaded into Mixed Sequence Reader (indicated as step 1 in Fig. 1).

(B). Defining the log ratio of intensity (LRi):

The value of LRi for each sequence position was defined as the log ratio of the intensity (the logarithm of the major fluorescence intensity divided by the minor fluorescence intensity). If the sequence position contained only one base call, the value of LRi should be more than 5 because the major fluorescence intensity was much higher than the minor fluorescence signal (if any existed). If the DNA sequence contained heterozygous or mixed signals, the LRi decreased dramatically to less than 2.5 (Fig. 1A).
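The LRi definition above can be illustrated with a minimal Python sketch. It is not the MSR source code: the logarithm base (2), the pseudocount that avoids division by zero, the choice of the second-highest channel as the "minor" signal, and the function names are all assumptions made for illustration. Only the cutoffs 5 and 2.5 come from the text.

```python
import math

def log_ratio_intensity(channels, pseudocount=1.0, base=2.0):
    """Return the LRi for one peak.

    channels: dict mapping a base letter to its fluorescence intensity
    at the peak position. The log base and pseudocount are assumptions.
    """
    ranked = sorted(channels.values(), reverse=True)
    major, minor = ranked[0], ranked[1]  # highest and second-highest channel
    return math.log((major + pseudocount) / (minor + pseudocount), base)

def classify_peak(lri, single_cutoff=5.0, mixed_cutoff=2.5):
    """Label a peak using the cutoffs quoted in the manual."""
    if lri > single_cutoff:
        return "single"
    if lri < mixed_cutoff:
        return "mixed"
    return "ambiguous"

# Hypothetical intensities: one clean peak and one heterozygous peak.
clean_peak = {"A": 1800, "C": 12, "G": 5, "T": 20}
mixed_peak = {"A": 900, "C": 15, "G": 700, "T": 10}

clean_label = classify_peak(log_ratio_intensity(clean_peak))
mixed_label = classify_peak(log_ratio_intensity(mixed_peak))
```

With these made-up intensities the first peak falls in the single-base class and the second in the mixed class. With a different logarithm base the same intensities would give different LRi values, so the cutoffs only make sense together with the base actually used by MSR.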

(C). Mapping the variations of the genome:

We apply a “moving average smoother” to the LRi using the MATLAB “convn” function (Fig. 1A, green line). The region with an averaged LRi value >5 was used to locate structural variations in the human genome using the “BLASTALL” program ( http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/ ) with the GRCh37 primary reference assembly, which were packaged in the MATLAB bioinformatics toolbox Version 3.4. The chromosomal number and physical position of the “BLAST” results are shown (Fig. 1D).
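The smoothing and region selection described in this step can be sketched in plain Python as follows. This is an illustrative stand-in for the MATLAB “convn” call, not the MSR implementation: the window size of 5, the shrinking window at the trace edges, and the function names are assumptions. The BLAST search itself is not shown.

```python
def moving_average(values, window=5):
    """Centered moving average; near the edges the window shrinks
    instead of zero-padding (an assumption, convn pads with zeros)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def regions_above(smoothed, cutoff=5.0):
    """Return (start, end) index pairs, end exclusive, of runs
    where the smoothed LRi stays above the cutoff."""
    regions = []
    start = None
    for i, value in enumerate(smoothed):
        if value > cutoff and start is None:
            start = i
        elif value <= cutoff and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(smoothed)))
    return regions

# Hypothetical LRi trace: ten clean positions followed by ten mixed ones.
lri_trace = [7.0] * 10 + [1.0] * 10
clean_regions = regions_above(moving_average(lri_trace, window=5))
```

The bases called inside each returned region would then be the query sent to BLAST. The point where a region ends approximates the break-point between single-signal and mixed-signal sequence, shifted slightly by the smoothing window.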

(D). Finding the optimal combination of two haplo-sequences:

The DNA sequences with heterozygous or mixed signals were separated into two haplo-sequences. We predicted possible structural variations in which one haplo-sequence carried a deletion of 1-4 nucleotides (Fig. 1B, shown by blue, red, cyan, and green lines, respectively), scanning from 20 nucleotides upstream to 40 nucleotides downstream of the variation site. One haplo-sequence was aligned and compared with each predicted deleted (or inserted) haplo-sequence, and the correct rate between the chromatography data and the combined sequences was calculated. The optimal structural variation was then located, and the results were displayed as maximum correct rates (Fig. 1C). The output position and sequences of an indel are shown in Fig. 1D and Fig. 1E, respectively.
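The search for the best deletion size can be sketched in Python under simplifying assumptions. This is not the MSR algorithm itself: it assumes the intact haplo-sequence downstream of the break-point is already known, that each mixed position is observed as an unordered pair of bases, and that the "correct rate" is the fraction of positions where the observed pair matches the predicted pair. The function names are hypothetical, and the scan over the window from 20 nucleotides upstream to 40 nucleotides downstream is omitted.

```python
def predict_mixed(haplo, shift):
    """Base pairs expected at each position when the intact sequence is
    superimposed on a copy missing its first `shift` bases."""
    return [frozenset((haplo[i], haplo[i + shift]))
            for i in range(len(haplo) - shift)]

def correct_rate(observed, predicted):
    """Fraction of compared positions where observed and predicted pairs agree."""
    n = min(len(observed), len(predicted))
    if n == 0:
        return 0.0
    hits = sum(1 for i in range(n) if frozenset(observed[i]) == predicted[i])
    return hits / n

def best_deletion(observed, haplo, max_deletion=4):
    """Try deletions of 1..max_deletion bases and keep the best correct rate."""
    rates = {k: correct_rate(observed, predict_mixed(haplo, k))
             for k in range(1, max_deletion + 1)}
    best = max(rates, key=rates.get)
    return best, rates

# Hypothetical example: mixed signal produced by a 2-base deletion.
haplo = "ACGTTGCAAGCT"
observed = [(haplo[i], haplo[i + 2]) for i in range(len(haplo) - 2)]
best_size, rates = best_deletion(observed, haplo)
```

In this toy case the 2-base deletion reproduces every observed pair, so it has the highest correct rate. On real traces the rates would be lower because of noise, and the candidate with the maximum rate would be reported.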
