The DAS Protocol
Andy Jenkinson, EBI

Summary of Topics
  • Technical overview
  • Principles of communication
  • Pros and cons
  • DAS capabilities

DAS Architecture
A client asks for data from many servers
  • HTTP requests
  • identically structured URLs, the same parameters
Each server behaves in the same way
  • pre-defined set of behaviours
  • e.g. provide a sequence, provide annotations of a sequence
Each server provides different data in the same format
  • DAS-XML

DAS Concepts
Reference object
  • usually a sequence
  • e.g. “chromosome X” or “NT_025741”
Annotation
  • information attached to a location within a segment
  • e.g. “substitution at residue 326 of BRCA1”
Reference server
  • server that provides “core” reference object data
  • e.g. GRCh37 sequence data
Annotation server
  • server that provides annotations of reference objects
Segment
  • part of a reference object
  • e.g. “bases 100 to 200 of chromosome X”
  • ties together annotation and reference servers

Architectural Overview
[diagram]

The DAS Protocol
Defines 3 constraints
  • transport layer: HTTP
  • query format: constrained REST URLs
  • response format: constrained XML
Keyword: constrained

Data transport
  • Standard HTTP
  • Includes compression
  • Some additional headers, e.g. to indicate DAS version

Well-defined query URLs
  • A client can issue a command

http://das.sanger.ac.uk/das/ccds_mouse/features?segment=...
^^^^^^^^^^^^^^^^^^^^^^^ ^^^ ^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^^
(left to right: site, prefix, DAS source, command, arguments)

XML format
  • server responds with a simple XML document

    <SEGMENT id="X" start="1" stop="100">
      <FEATURE id="exon1">
        <TYPE id="exon">exon</TYPE>
        ...

Why DAS?
Fast, targeted queries
  • suitable for visual display
Based on existing simple tech
  • XML/HTTP/CGI
  • “dumb server, clever client”
      - relatively low knowledge barrier for bioinformaticians with data to expose
Scalable
  • integrators (client software) get more data for zero cost

Why not DAS?
One-dimensional queries
  • query only by sequence position
  • not by developmental stage, tissue type, etc. (yet)
Constrained generic format
  • clients aren’t “tailored” to each data source
  • possible data types are to some extent limited
Not semantically rich
  • ontology support optional

Commands: the basics
Sequence
  • give me the DNA sequence for a given segment of a reference object
  • e.g. “bases 100k – 200k of chromosome 15”
Features
  • give me all annotations offered by the data source that are attached to a given segment of the sequence

The sequence command
/das/<source>/sequence?<params>
Parameters:
  segment=ID:start,end (one or more), where ID is the ID of the reference object
Example:
  /das/<source>/sequence?segment=X:100,200;segment=Y:500,600
Response:
    <DASSEQUENCE>
      <SEQUENCE id="X" start="100" stop="200" version="1.0">
        cctgagccagcagtggcaacccaatggggtccctttcca...
      </SEQUENCE>
      <SEQUENCE id="Y" start="500" stop="600" version="1.0">
        ctggacagcccggaaaatgagctcctcatctctaaccca...
      </SEQUENCE>
    </DASSEQUENCE>

The features command
/das/<source>/features?<params>
Parameters:
  segment=ID:start,end (one or more)
  type=foo (zero or more)
  category=bar (zero or more)
Example:
  /das/<source>/features?segment=X:100,200;segment=Y:500,600;type=SNP
Response:
    <DASGFF>
      <GFF version="1.01" href="...">
        <SEGMENT id="X" start="100" stop="200">
          <FEATURE id="X">
            <START>100</START>
            <END>200</END>
            <TYPE id="SNP" category="variation">SNP</TYPE>
            <METHOD id="sequencing">sequencing</METHOD>
            <SCORE>86.4</SCORE>
            <ORIENTATION>+</ORIENTATION>
          </FEATURE>
          ...

Other Commands
Stylesheet
  • hints on how to render different types of feature
  • e.g. “exons as blue boxes, SNPs as red triangles”
  /das/<source>/stylesheet
Types
  • lists the types of feature available
  /das/<source>/types

Metadata
Can make a client that knows how to query a server and parse the response
BUT something missing…
  • which data sources are available on a server?
  • which commands does a source support?
  • what kind of reference objects does it know about?

The sources command
<server>/das/sources
  • Lists a server’s data sources
For each source:
  • text description
  • list of “capabilities” (commands)
  • list of coordinate systems (type of reference object)
  • etc.

DAS Registry
  • third component of DAS
  • catalogue of DAS sources
Human interface
  • validate, register, search, view statistics
Programmatic interface
  • http://www.dasregistry.org/das/sources
  • http://www.dasregistry.org/das/coordinatesystem
  • http://www.dasregistry.org/das/organism

[Diagram: SOA with Registry, Server and Client]

Links
DAS Homepage
  • http://www.biodas.org/
DAS Specification
  • http://www.biodas.org/documents/spec-1.6.html
DAS in Ensembl
  • http://www.ensembl.org/info/docs/das/index.html
Mailing list
  • http://biodas.org/mailman/listinfo/das
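
Worked example (client side)
As a rough illustration of the "dumb server, clever client" idea, the Python sketch below builds a query URL in the site/prefix/source/command/arguments layout shown earlier and parses a features response. The function names das_url and parse_features and the SAMPLE document are hypothetical, introduced only for this example. The sample XML is the features response from the slides with its closing tags filled in so that it parses. No network request is made, and only the elements shown in the slides are handled.

```python
import xml.etree.ElementTree as ET


def das_url(site, source, command, segments=(), **filters):
    """Build <site>/das/<source>/<command>?<arguments> (hypothetical helper)."""
    # Each segment is an (id, start, end) tuple, e.g. ("X", 100, 200).
    args = [f"segment={seg_id}:{start},{end}" for seg_id, start, end in segments]
    # Filters such as type=SNP may repeat, so accept a string or a list.
    for key, values in filters.items():
        if isinstance(values, str):
            values = [values]
        args.extend(f"{key}={value}" for value in values)
    url = f"{site.rstrip('/')}/das/{source}/{command}"
    # The slides separate arguments with ';'.
    return url + ("?" + ";".join(args) if args else "")


def parse_features(xml_text):
    """Return the features of a DASGFF document as a list of dicts."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            type_el = feature.find("TYPE")
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "type": type_el.get("id") if type_el is not None else None,
            })
    return features


# Assumption: the slide's truncated response, closed off so it is well-formed.
SAMPLE = """
<DASGFF>
  <GFF version="1.01" href="...">
    <SEGMENT id="X" start="100" stop="200">
      <FEATURE id="X">
        <START>100</START>
        <END>200</END>
        <TYPE id="SNP" category="variation">SNP</TYPE>
        <METHOD id="sequencing">sequencing</METHOD>
        <SCORE>86.4</SCORE>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""

if __name__ == "__main__":
    print(das_url("http://das.sanger.ac.uk", "ccds_mouse", "features",
                  [("X", 100, 200), ("Y", 500, 600)], type="SNP"))
    print(parse_features(SAMPLE))
```

Because every source shares the same URL layout and response format, the same two helpers would in principle work against any annotation server; only the site and source names change.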