Uploaded by Mohit Suradkar

ASSIGNMENT 2 22110154

ASSIGNMENT 2
ES 243 : BIOLOGY FOR ENGINEERS
Mohit Suradkar 22110154
Screenshots of the human tumor protein 53 (TP53) search in NCBI:
Number and Length of Reference Sequences:
Reference Gene Sequence (1):
NCBI Reference Sequence: NG_017013.2
Length of sequence : 32,772 bp
Reference Protein Sequence (25):
Some of the Reference sequences and their lengths are listed below:
Reference Sequence    Length
NP_001394197.1        341 aa protein
NP_001394198.1        302 aa protein
NP_001394200.1        302 aa protein
NP_001394196.1        354 aa protein
NP_001394194.1        354 aa protein
NP_001394195.1        393 aa protein
NP_001263628.1        187 aa protein
NP_001263627.1        182 aa protein
Reference Transcript Sequence (26):
Some of the Reference sequences and their lengths are listed below:
Reference Sequence    Length
NM_001276698.3        2,136 bp linear mRNA
NM_001276697.3        2,003 bp linear mRNA
NM_001276696.3        2,645 bp linear mRNA
NM_001276699.3        2,063 bp linear mRNA
NM_001276761.3        2,509 bp linear mRNA
NM_001276760.3        2,512 bp linear mRNA
NM_000546.6           2,512 bp linear mRNA
Pairwise sequence alignment of two randomly chosen transcript sequences
Using the Needleman-Wunsch algorithm:
Scoring system used in the alignment above :
Gap open (gap penalty) : 10
Gap extend (extend penalty) : 0.5
End gap penalty : False
End open : 10
End extend : 0.5
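To make the role of the gap-open and gap-extend values concrete, here is a minimal sketch of global alignment scoring with affine gaps (the Gotoh variant of Needleman-Wunsch). It is not the code of the alignment tool used above. The match and mismatch scores (5 and -4) are assumptions, since the substitution matrix is not stated in this report, and for simplicity the sketch penalizes end gaps like internal gaps, which differs from the "End gap penalty : False" setting listed above.

```python
def needleman_wunsch_score(a, b, match=5.0, mismatch=-4.0,
                           gap_open=10.0, gap_extend=0.5):
    """Global alignment score with affine gaps (Gotoh variant).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Assumption: end gaps are penalized like internal gaps.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Open a new gap from M or Y, or extend an existing gap in X.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          Y[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend)
            # Open a new gap from M or X, or extend an existing gap in Y.
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          X[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "ACT"))
```

With these assumed scores, aligning "ACGT" against "ACT" needs one gap of length 1, so three matches (15) minus one gap opening (10) gives the total.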
Using the Smith-Waterman algorithm:
Scoring system used in the alignment above :
Gap open : 10
Gap extend : 0.5
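For comparison with the global method, the following is a minimal sketch of local alignment scoring in the Smith-Waterman style, using the same affine gap values. It is an illustration under assumptions, not the code of the tool used above: the match and mismatch scores (5 and -4) are assumed because the substitution matrix is not stated in this report, and only the best score is returned, without a traceback.

```python
def smith_waterman_score(a, b, match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5):
    """Best local alignment score with affine gaps.

    Unlike the global case, a cell score never drops below 0, so an
    alignment can start and end anywhere in either sequence.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]  # residue vs residue
    X = [[neg] * (m + 1) for _ in range(n + 1)]  # a[i-1] vs gap
    Y = [[neg] * (m + 1) for _ in range(n + 1)]  # b[j-1] vs gap
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = max(0.0, prev + s)  # restart at 0 if negative
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend)
            best = max(best, M[i][j])
    return best


if __name__ == "__main__":
    # The shared core "ACGT" is found despite the unrelated flanks.
    print(smith_waterman_score("TTTACGTTTT", "GGACGTGG"))
```

The reset to 0 is what separates local from global alignment: poorly matching flanks are dropped instead of being paid for with mismatches or gaps.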
Steps for finding the house mouse (Mus musculus) homolog of the TP53 protein :
Pairwise alignment of the human TP53 protein and its mouse homolog :
Identity : 7775/19516 (39.8%)
Similarity : 7775/19516 (39.8%)
Gaps : 8467/19516 (43.4%)
Score : 33462.5
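The percentages above are simple ratios of counts to alignment length. The sketch below shows how such figures can be derived from a pair of gapped sequences and rechecks the reported values. The function names and the toy aligned strings are hypothetical; only the counts 7775, 8467 and 19516 come from this report.

```python
def percent(count, length):
    """Percentage of alignment columns, rounded to one decimal place."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return round(100.0 * count / length, 1)


def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Count identical and gapped columns in two equal-length aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned_a)
    identity = sum(1 for x, y in zip(aligned_a, aligned_b)
                   if x == y and x != gap)
    gaps = sum(1 for x, y in zip(aligned_a, aligned_b)
               if x == gap or y == gap)
    return {
        "length": length,
        "identity": identity,
        "gaps": gaps,
        "identity_pct": percent(identity, length),
        "gaps_pct": percent(gaps, length),
    }


if __name__ == "__main__":
    # Recheck the figures reported above.
    print(percent(7775, 19516))  # identity
    print(percent(8467, 19516))  # gaps
    # Toy example with hypothetical aligned strings.
    print(alignment_stats("ACG-T", "AC-TT"))
```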
Structure of TP53 protein :
PDB (Protein Data Bank) and FAIR Data Principles:
Findable: PDB assigns unique identifiers to protein entries, simplifying structure discovery
through advanced search and query options.
Accessibility: Unique IDs and rich metadata (experimental method, resolution, authors,
etc.) enhance accessibility to PDB data.
Interoperability: PDB collaborates with UniProt, Ensembl, ChEMBL, and PDBe-KB, using
common formats for seamless data sharing.
Reusability: PDB data is reused in education, drug discovery, structural biology,
and bioinformatics.
NCBI (National Center for Biotechnology Information) and FAIR Data Principles:
Findable: Persistent identifiers (Accession numbers, DOIs, PMIDs) and rich metadata aid
in efficient data discovery on NCBI.
Accessibility: Standard protocols (HTTP, FTP, API) and authentication methods (NCBI
account, API key, My NCBI preferences) support easy access to NCBI data.
Interoperability: NCBI exchanges sequence data with EMBL-EBI and DDBJ, cross-references
UniProt, and uses widely accepted formats.
Reusability: NCBI data is reusable for sequence alignment, phylogenetic inference,
literature mining, and genomic analysis, with clear data usage licenses (CC0, CC-BY, public
domain).
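As an illustration of access over a standard protocol, the sketch below builds an HTTP request URL for the NCBI E-utilities `efetch` endpoint, which can return a record in FASTA format for a given accession number. The helper name `efetch_fasta_url` is hypothetical, and the sketch only constructs the URL; it does not send the request or handle API keys and rate limits.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that requests a record as plain-text FASTA.

    db="nuccore" is for nucleotide records (e.g. NM_ transcripts);
    db="protein" is for protein records (e.g. NP_ accessions).
    """
    params = {
        "db": db,
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Accessions taken from the tables earlier in this report.
    print(efetch_fasta_url("NM_000546.6"))
    print(efetch_fasta_url("NP_001263627.1", db="protein"))
```

The resulting URL can be opened in a browser or passed to any HTTP client, which is the sense in which an accession number acts as a persistent, resolvable identifier.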