The DAS Protocol

Andy Jenkinson, EBI
Summary of Topics
• Technical overview
• Principles of communication
• Pros and cons
• DAS capabilities
DAS Architecture
• A client asks for data from many servers
• HTTP requests
• identically structured URLs, the same parameters
• Each server behaves in the same way
• pre-defined set of behaviours
• e.g. provide a sequence, provide annotations of a sequence
• Each server provides different data in the same format
• DAS-XML
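A minimal sketch of the fan-out idea above: because every server accepts identically structured URLs, a client can generate all of its requests from one template. The server prefixes and source names here are hypothetical placeholders, not real DAS servers.

```python
# Hypothetical (site prefix, source name) pairs, for illustration only.
SOURCES = [
    ("http://das.example.org", "genes"),
    ("http://das.example.net", "variation"),
]


def fan_out(sources, command, query):
    """Return one request URL per source; only prefix and source name vary."""
    return [
        f"{prefix}/das/{source}/{command}?{query}"
        for prefix, source in sources
    ]


urls = fan_out(SOURCES, "features", "segment=X:100,200")
for url in urls:
    print(url)
```

The command and its parameters are written once; adding another data source only means adding another entry to the list.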
DAS Concepts
Reference object
• usually a sequence
• e.g. “chromosome X” or “NT_025741”
Annotation
• information attached to a location within a segment
• e.g. “substitution at residue 326 of BRCA1”
DAS Concepts
Reference server
• server that provides “core” reference object data
• e.g. GRCh37 sequence data
Annotation server
• server that provides annotations of reference objects
Segment
• part of a reference object
• e.g. “bases 100 to 200 of chromosome X”
• ties together annotation and reference servers
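One way to picture these concepts is as small data types. The sketch below is illustrative only: the class and field names are assumptions, and it assumes coordinates are 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Part of a reference object, e.g. bases 100 to 200 of chromosome X."""
    reference: str  # ID of the reference object, e.g. "X" or "NT_025741"
    start: int
    end: int

    def __post_init__(self):
        if self.start < 1 or self.end < self.start:
            raise ValueError("expected 1 <= start <= end")

    def length(self):
        # Inclusive coordinates (an assumption of this sketch).
        return self.end - self.start + 1


@dataclass(frozen=True)
class Annotation:
    """Information attached to a location within a segment."""
    segment: Segment
    description: str


seg = Segment("X", 100, 200)
note = Annotation(seg, "example annotation")
print(seg.length(), note.segment.reference)
```

The same `Segment` value can be sent to a reference server (to fetch sequence) and to an annotation server (to fetch annotations), which is how the segment ties the two together.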
Architectural Overview
The DAS Protocol
Defines 3 constraints
• transport layer: HTTP
• query format: constrained REST URLs
• response format: constrained XML
Keyword: constrained
Data transport
• Standard HTTP
• Includes compression
• Some additional headers, e.g. to indicate DAS version
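A rough sketch of what the transport layer means for a client: ask for compression, undo it on receipt, and read the version header. The header name `X-DAS-Version` and the value `DAS/1.6` are stated here from memory of the specification and should be treated as assumptions; the response is simulated so that the example runs without a network connection.

```python
import gzip
from urllib.request import Request


def make_request(url):
    """Build a plain HTTP request that tells the server gzip is acceptable."""
    return Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(headers, body):
    """Undo gzip compression if the server applied it, then decode as text."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


def das_version(headers):
    """Return the DAS version header, or None if the server did not send one."""
    return headers.get("X-DAS-Version")  # header name is an assumption


# Simulated response; a plain dict is case-sensitive, unlike real HTTP headers.
headers = {"Content-Encoding": "gzip", "X-DAS-Version": "DAS/1.6"}
body = gzip.compress(b"<DASSEQUENCE/>")
text = decode_body(headers, body)
print(das_version(headers), text)
```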
Well-defined query URLs
• A client can issue a command
http://das.sanger.ac.uk/das/ccds_mouse/features?segment=...
• site prefix: http://das.sanger.ac.uk
• das: /das
• source: ccds_mouse
• command: features
• arguments: segment=...
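The fixed URL layout means a request can be taken apart mechanically. Below is a small sketch using only the standard library; the function name `split_das_url` and the dictionary keys are illustrative choices, not part of DAS.

```python
from urllib.parse import urlsplit


def split_das_url(url):
    """Split a DAS query URL into site prefix, source, command and arguments."""
    parts = urlsplit(url)
    path = parts.path.strip("/").split("/")
    das_at = path.index("das")  # raises ValueError if there is no /das/ element
    prefix = f"{parts.scheme}://{parts.netloc}"
    if das_at > 0:
        # Anything before /das/ in the path belongs to the site prefix.
        prefix += "/" + "/".join(path[:das_at])
    return {
        "site_prefix": prefix,
        "source": path[das_at + 1],
        "command": path[das_at + 2],
        "arguments": parts.query,
    }


pieces = split_das_url(
    "http://das.sanger.ac.uk/das/ccds_mouse/features?segment=..."
)
print(pieces)
```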
XML format
• server responds with a simple XML document
<SEGMENT id="X" start="1" stop="100">
<FEATURE id="exon1">
<TYPE id="exon">exon</TYPE>
Why DAS?
Fast, targeted queries
• suitable for visual display
Based on existing simple tech
• XML/HTTP/CGI
• “dumb server, clever client” - relatively low knowledge
barrier for bioinformaticians with data to expose
Scalable
• integrators (client software) get more data for zero cost
Why not DAS?
One-dimensional queries
• query only by sequence position
• not by developmental stage, tissue type, etc
• (yet)
Constrained generic format
• clients aren’t “tailored” to each data source
• possible data types are to some extent limited
Not semantically rich
• ontology support optional
Commands: the basics
Sequence
• give me the DNA sequence for a given segment of a
reference object
• e.g. “bases 100k – 200k of chromosome 15”
Features
• give me all annotations offered by the data source that
are attached to a given segment of the sequence
The sequence command
/das/<source>/sequence?<params>
Parameters:
segment=ID:start,end (one or more)
ID of reference object
Example:
/das/<source>/sequence?segment=X:100,200
;segment=Y:500,600
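As a sketch, the example URL above can be produced from a list of segments. The function name `sequence_url` and the source name `mysource` are placeholders.

```python
def sequence_url(source, segments):
    """Build a sequence query; each segment is a (reference ID, start, end) tuple."""
    if not segments:
        raise ValueError("at least one segment is required")
    query = ";".join(
        f"segment={ref}:{start},{end}" for ref, start, end in segments
    )
    return f"/das/{source}/sequence?{query}"


url = sequence_url("mysource", [("X", 100, 200), ("Y", 500, 600)])
print(url)
```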
The sequence command
Response:
<DASSEQUENCE>
<SEQUENCE id="X" start="100" stop="200"
version="1.0">
cctgagccagcagtggcaacccaatggggtccctttcca...
</SEQUENCE>
<SEQUENCE id="Y" start="500" stop="600"
version="1.0">
ctggacagcccggaaaatgagctcctcatctctaaccca...
</SEQUENCE>
</DASSEQUENCE>
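A response of this shape can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the sample document follows the structure shown above, but its sequence strings are shortened placeholders rather than real data.

```python
import xml.etree.ElementTree as ET

# Same structure as the response above; sequences cut short for illustration.
RESPONSE = """<DASSEQUENCE>
  <SEQUENCE id="X" start="100" stop="200" version="1.0">
    cctgagccag
  </SEQUENCE>
  <SEQUENCE id="Y" start="500" stop="600" version="1.0">
    ctggacagcc
  </SEQUENCE>
</DASSEQUENCE>"""


def parse_sequences(xml_text):
    """Map (reference ID, start, stop) to the sequence string."""
    root = ET.fromstring(xml_text)
    result = {}
    for seq in root.iter("SEQUENCE"):
        key = (seq.get("id"), int(seq.get("start")), int(seq.get("stop")))
        # Drop the whitespace and line breaks surrounding the sequence text.
        result[key] = "".join((seq.text or "").split())
    return result


sequences = parse_sequences(RESPONSE)
print(sequences)
```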
The features command
/das/<source>/features?<params>
Parameters:
segment=ID:start,end (one or more)
type=foo (zero or more)
category=bar (zero or more)
Example:
/das/<source>/features?segment=X:100,200
;segment=Y:500,600
;type=SNP
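To illustrate how little a "dumb server" has to understand, here is a sketch that reads the query part of the example above back into its segments and filters. The function name is hypothetical, and no URL-decoding or error recovery is attempted.

```python
def parse_features_query(query):
    """Split a features query into segments, type filters and category filters."""
    segments, types, categories = [], [], []
    for pair in query.split(";"):
        key, _, value = pair.partition("=")
        if key == "segment":
            ref, _, span = value.partition(":")
            start, _, end = span.partition(",")
            segments.append((ref, int(start), int(end)))
        elif key == "type":
            types.append(value)
        elif key == "category":
            categories.append(value)
    if not segments:
        # segment is listed as "one or more", so an empty list is an error.
        raise ValueError("at least one segment is required")
    return segments, types, categories


parsed = parse_features_query("segment=X:100,200;segment=Y:500,600;type=SNP")
print(parsed)
```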
The features command
Response:
<DASGFF>
<GFF version="1.01" href="...">
<SEGMENT id="X" start="100" stop="200">
<FEATURE id="X">
<START>100</START>
<END>200</END>
<TYPE id="SNP" category="variation">SNP</TYPE>
<METHOD id="sequencing">sequencing</METHOD>
<SCORE>86.4</SCORE>
<ORIENTATION>+</ORIENTATION>
</FEATURE>
...
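A client might turn such a response into plain records as follows. This is a sketch: the closing tags in the sample are added here to make it a complete document, and the parser assumes every feature carries the elements shown above, including a numeric score.

```python
import xml.etree.ElementTree as ET

# The response above, with closing tags added so that it parses.
RESPONSE = """<DASGFF>
  <GFF version="1.01" href="...">
    <SEGMENT id="X" start="100" stop="200">
      <FEATURE id="X">
        <START>100</START>
        <END>200</END>
        <TYPE id="SNP" category="variation">SNP</TYPE>
        <METHOD id="sequencing">sequencing</METHOD>
        <SCORE>86.4</SCORE>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Return one dictionary per FEATURE element, tagged with its segment ID."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            type_el = feature.find("TYPE")
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "type": type_el.get("id"),
                "category": type_el.get("category"),
                "method": feature.findtext("METHOD"),
                "score": float(feature.findtext("SCORE")),
                "orientation": feature.findtext("ORIENTATION"),
            })
    return features


features = parse_features(RESPONSE)
print(features[0])
```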
Other Commands
Stylesheet
• hints on how to render different types of feature
• e.g. “exons as blue boxes, SNPs as red triangles”
/das/<source>/stylesheet
Types
• lists the types of feature available
/das/<source>/types
Metadata
With these commands we can build a client that knows how to query a
server and parse the response
BUT something is missing…
• which data sources are available on a server?
• which commands does a source support?
• what kind of reference objects does it know about?
The sources command
<server>/das/sources
• Lists a server’s data sources
For each source:
• text description
• list of “capabilities” (commands)
• list of coordinate systems (type of reference object)
• etc
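The sketch below shows how a client could use this metadata once it has been fetched and parsed. It deliberately does not model the XML returned by the sources command; the record type, the source names and the coordinate system labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceInfo:
    """Per-source metadata of the kind listed above (field names are assumptions)."""
    name: str
    description: str
    capabilities: set        # commands the source supports
    coordinate_systems: set  # types of reference object it knows about


# Hypothetical catalogue, standing in for a parsed sources response.
CATALOGUE = [
    SourceInfo("ref_seq", "Reference sequence", {"sequence"}, {"GRCh37"}),
    SourceInfo("snps", "Variation data", {"features", "types"}, {"GRCh37"}),
    SourceInfo("mouse_genes", "Gene models", {"features"}, {"mouse"}),
]


def usable_sources(sources, command, coordinate_system):
    """Names of sources that support a command on a given coordinate system."""
    return [
        s.name
        for s in sources
        if command in s.capabilities
        and coordinate_system in s.coordinate_systems
    ]


matches = usable_sources(CATALOGUE, "features", "GRCh37")
print(matches)
```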
DAS Registry
• third component of DAS
• catalogue of DAS sources
Human interface
• validate, register, search, view statistics
Programmatic interface
• http://www.dasregistry.org/das/sources
• http://www.dasregistry.org/das/coordinatesystem
• http://www.dasregistry.org/das/organism
SOA
[Diagram: the three DAS components: registry, server and client]
Links
DAS Homepage
• http://www.biodas.org/
DAS Specification
• http://www.biodas.org/documents/spec-1.6.html
DAS in Ensembl:
• http://www.ensembl.org/info/docs/das/index.html
Mailing list:
• http://biodas.org/mailman/listinfo/das