
TABLE OF CONTENTS
CHAPTER
TITLE
PAGE
DECLARATION
ii
DEDICATION
iii
ACKNOWLEDGEMENT
iv
ABSTRACT
v
ABSTRAK
vi
TABLE OF CONTENTS
vii
LIST OF TABLES
x
LIST OF FIGURES
xi
LIST OF ABBREVIATIONS
xiii
LIST OF APPENDICES
xiv
1
PROJECT INTRODUCTION
1
1.1 Introduction
1
1.2 Background Problem
2
1.3 Problem Statement
3
1.4 Project Aim
4
1.5 Objective
4
1.6 Scope
5
1.7 Significance of the Study
5
1.8 Organization of the Report
6
2
LITERATURE REVIEW
7
2.1 Introduction
7
2.2 Data Format
9
2.3 Microarray Analysis Process
10
2.3.1 Sharing of Microarray Data
11
2.3.2 Microarray Data Standardization
12
2.3.3 The End Product of Microarray Data Analysis
13
2.4 Enterprise Information Approach
13
2.5 Metadata
14
2.5.1 Function of Metadata
16
2.5.2 Structuring Metadata
16
2.5.3 Metadata Schema and Elements Set
17
2.5.4 Metadata for Dataset
18
2.5.5 Creating Metadata
18
2.6 XML
19
2.6.1 DTD to validate XML Data
19
2.7 Database Management System (DBMS)
21
2.8 Data Model Used Based on Data Flow
21
2.8.1 Relational Data Model
22
2.8.1.1 Notation of Relational Data Model
24
2.8.1.2 Limitation of Relational Data Model
26
2.8.2 XML Data in Relational Database
27
2.8.2.1 Create XML tree
28
2.8.2.2 The Storage of XML Data
29
2.9 Data Repository
31
2.9.1 Data Warehouse
31
2.9.1.1 Drawback of Data Warehousing
32
2.9.1.2 Data Warehouse Metadata
32
2.9.2 Data Marts
33
2.9.3 Data Federation
33
2.9.3.1 Issues in Database Federation
34
2.10 Summary
36
3
METHODOLOGY
37
3.1 Introduction
37
3.2 Operational Framework
37
3.3 Metadata for Data Integration
41
3.4 Metadata Framework for Biological Data
42
3.5 Accurate measure Query of protein secondary structure prediction process
44
3.6 Summary
44
4
EXPERIMENTAL RESULTS AND DISCUSSION
45
4.1 Introduction
45
4.2 Current process for protein secondary structure prediction
46
4.3 Websites used based on the query flow process
48
4.3.1 Motif Website
48
4.3.2 Prosite Database
50
4.3.3 Blast NCBI
52
4.3.4 PRINTS (DbBrowser)
55
4.3.5 PDB
57
4.4 Enterprise Based Data Model
61
4.5 Metadata framework for Integrated Data Model
61
4.5.1 System Overview
63
4.5.1.1 Convert Data to XML
65
4.5.1.2 Create XML schema
68
4.5.1.3 Create Relational Database on XML schema
68
4.5.1.4 The relationships among tables in the relational database
70
4.5.1.5 Query Result
75
4.5.2 XML query
76
4.5.3 Data Store
79
4.6 Summary
80
5
CONCLUSION AND FUTURE WORK
81
5.1 Introduction
81
5.2 Summary of Work
82
5.3 Achievement
83
5.4 Limitation
83
5.5 Future Work and Recommendation
84
5.6 Conclusion
85
REFERENCES
86
APPENDICES A-C
91-95
LIST OF TABLES
TABLE NO
TITLE
PAGE
2.1
Schematic representation of the different elements of a relational table
23
4.1
The data block table with data block name
72
4.2
The entity Poly table
72
4.3
Entity poly category
72
4.4
Entity poly sequence
72
4.5
Atom site table
73
4.6
Entity poly sequence table
73
4.7
Structure reference table
74
4.8
Structure reference sequence table
74
4.9
Structure reference category
74
4.10
Structure reference sequence category
74
4.11
Atom site category
74
LIST OF FIGURES
FIGURE NO
TITLE
PAGE
2.1
Example of flat file format
9
2.2
A portion of a sample protein in a database
9
2.3
Document type definition (DTD), which can have an external portion, an internal portion, or both
20
2.4
Basic components in the notation of the relational model
24
2.5
Diagram of the different tables and the overall relational microarray database structure of ArrayDB
25
2.6
Flow chart from XML element to node of the XML tree
28
2.7
Flow chart of insert data algorithm
30
3.1
Project operational framework
40
3.2
Metadata framework for biological data
43
4.1
The query flow of the PSSP process
46
4.2
The process for searching the “PEEL” motif sequence
48
4.3
Example of the selected database
49
4.4
The result showing the number of motifs found in the selected library databases
49
4.5
Example of the FASTA format
50
4.6
Example of the motif sequence search in the Prosite database
51
4.7
The result of the motif sequence search
51
4.8
Example of the motif sequence search on the BLAST website
52
4.9
The query ID, description, molecule type, query, and database name for the motif sequence “PEEL”
53
4.10
The sequences producing significant alignments
53
4.11
The alignments of the motif sequence with secondary assignment
54
4.12
The search sequence query process
55
4.13
The result of the sequence fragment search
55
4.14a)
Seed alignment with 4 sequences
56
4.14b)
The view alignment of query display sequence “PEEL”
56
4.15
The view structure of query display sequence “PEEL”
56
4.16
The interface for searching motifs on the PDB website
57
4.17
The motif query structure hits from the PDB website
58
4.18
Example of a detailed description of motif query hits
58
4.19
The motif sequence identified as “PEEL” found in the currently displayed seqres sequence
59
4.20
The carbonic anhydrase 3D structure
59
4.21a)
The PDB file in textual file format
60
4.21b)
The example of PDB/XML file
60
4.22
Metadata framework for biological data
62
4.23
Overall system overview
63
4.24
Workflow to transform XML into a relational database
64
4.25a)
Flat file format from the BLAST database
65
4.25b)
XML file format after transformation from the flat file
66
4.25c)
DTD for validating the XML file
66
4.25d)
Code file for validating the XML file against the DTD
67
4.25e)
The XSLT output from the XML file, which shows only the necessary data that scientists need from the XML file
67
4.26a)
The XML schema for the PDB XML file
69
4.26b)
The SQL statement to create the table corresponding to the XML data
69
4.27
The relationships among tables in the relational database
71
4.28a)
The connection string statement and query statement
75
4.28b)
The table view of the search result for the motif sequence
76
4.28c)
The detailed result view for data block name 1BGC
76
4.29a)
The “motiftable.xquery” file
77
4.29b)
The “motiftable.aspx” code file
78
4.29c)
The “motiftable.aspx.cs” code file
78
4.29d)
The output of the XQuery result
79
LIST OF SYMBOLS
<
-
The beginning of the tag element in the DTD
>
-
The end of the tag element in the DTD
?
-
Zero or one in DTD
+
-
One or many in DTD
*
-
Zero or many in DTD
LIST OF ABBREVIATIONS
DBMS
-
Database Management System
DOM
-
Document Object Model
DTD
-
Document Type Definition
EDF
-
Extensible Data Format
FBB
-
Faculty of Bioscience and Bioengineering
PSSP
-
Protein Secondary Structure Prediction
RDB
-
Relational Database
XML
-
Extensible Markup Language
LIST OF APPENDICES
APPENDIX
TITLE
PAGE
A
Gantt Chart for Project 1
91
B
Gantt Chart for Project 2
93
C
Motif Query Research Tool
95