Microdata and schema.org

Basics
• Microdata is a simple semantic markup scheme that's an alternative to RDFa
• Developed by WHATWG and supported by major search companies (Google, Microsoft, Yahoo, Yandex)
• Like RDFa, it uses HTML tag attributes to host metadata
• Vocabularies are controlled and hosted at schema.org

What is WHATWG?
http://whatwg.org/
• Web Hypertext Application Technology Working Group
  – Community interested in evolving the Web with focus on HTML and Web API development
  – Ian Hickson is a key person, now at Google
• Founded in 2004 by individuals from Apple, Mozilla and Opera after a W3C workshop
  – Concern about W3C's embrace of XHTML
• Current work on HTML5
• Developed Microdata spec

HTML5
• Started by WHATWG as an alternative to XHTML, joined by W3C
  – A W3C candidate recommendation in 2012 (draft)
  – WHATWG will evolve it as a "living standard"
• HTML5 ≈ HTML + CSS + JS
• Native support for graphics, video, audio, speech, semantic markup, …
• Current partial support in major browsers & extensions

HTML taxonomy and status
[Figure: HTML taxonomy and status diagram]

Microdata
• The microdata effort has two parts:
  – A markup scheme
  – A set of vocabularies/ontologies
• The markup is similar to RDFa in providing ways to identify subjects, types, properties & objects
• There's also a standard way to encode microdata as RDFa
• The sanctioned vocabularies are found at schema.org and include a small number of very useful ones: people, movies, etc.

An example

    <div>
      <h1>Avatar</h1>
      <span>Director: James Cameron (born 1954)</span>
      <span>Science fiction</span>
      <a href="avatar-trailer.html">Trailer</a>
    </div>

An example: itemscope
• An itemscope attribute identifies a content subtree that is the subject about which we want to say something

    <div itemscope>
      <h1>Avatar</h1>
      <span>Director: James Cameron (born 1954)</span>
      <span>Science fiction</span>
      <a href="avatar-trailer.html">Trailer</a>
    </div>

An example: itemtype
• An itemscope attribute identifies a content subtree that is the subject about which we want to say something
• The itemtype attribute specifies the subject's type

    <div itemscope itemtype="http://schema.org/Movie">
      <h1>Avatar</h1>
      <span>Director: James Cameron (born 1954)</span>
      <span>Science fiction</span>
      <a href="avatar-trailer.html">Trailer</a>
    </div>

Microdata <-> RDF
http://rdf-translator.appspot.com/

An example: itemtype
• An itemscope attribute identifies a content subtree that is the subject about which we want to say something
• The itemtype attribute specifies the subject's type

    <div itemscope itemtype="http://schema.org/Movie">
      <h1>Avatar</h1>
      <span>Director: James Cameron (born 1954)</span>
      <span>Science fiction</span>
      <a href="avatar-trailer.html">Trailer</a>
    </div>

RDF (Turtle):

    [ ] a schema:Movie .

An example: itemprop
• An itemscope attribute identifies a content subtree that is the subject about which we want to say something
• The itemtype attribute specifies the subject's type
• An itemprop attribute gives a property of that type

    <div itemscope itemtype="http://schema.org/Movie">
      <h1 itemprop="name">Avatar</h1>
      <span>Director: James Cameron (born 1954)</span>
      <span itemprop="genre">Science fiction</span>
      <a href="avatar-trailer.html" itemprop="trailer">Trailer</a>
    </div>
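
To make the roles of itemscope, itemtype and itemprop concrete, here is a minimal Python sketch that reads the flat Movie markup above with the standard library's html.parser. The class FlatMicrodataParser and the function extract_flat_item are hypothetical names invented for this illustration. The sketch assumes a single item with no nested items, and it takes the value of an itemprop from the href of a link or otherwise from the element's text; it is not a full implementation of the Microdata extraction algorithm.

```python
from html.parser import HTMLParser

# Markup from the itemprop example above.
MOVIE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span>Director: James Cameron (born 1954)</span>
  <span itemprop="genre">Science fiction</span>
  <a href="avatar-trailer.html" itemprop="trailer">Trailer</a>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Hypothetical helper: collects the type and properties of one flat item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._open_prop = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # itemscope marks the subject; itemtype names its type.
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the property value is the link target.
            self.properties[prop] = attrs["href"]
        else:
            # Otherwise the value is the element's text, gathered in handle_data.
            self._open_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._open_prop is not None:
            self.properties[self._open_prop] += data

    def handle_endtag(self, tag):
        self._open_prop = None


def extract_flat_item(html):
    """Return (item type, {property: value}) for a single non-nested item."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, {k: v.strip() for k, v in parser.properties.items()}


if __name__ == "__main__":
    print(extract_flat_item(MOVIE_HTML))
```

The director line carries no itemprop, so it contributes nothing to the extracted item. That is the gap the embedded-item markup fills.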

An example: itemprop
• An itemscope attribute identifies a content subtree that is the subject about which we want to say something
• The itemtype attribute specifies the subject's type
• An itemprop attribute gives a property of that type

    <div itemscope itemtype="http://schema.org/Movie">
      <h1 itemprop="name">Avatar</h1>
      <span>Director: James Cameron (born 1954)</span>
      <span itemprop="genre">Science fiction</span>
      <a href="avatar-trailer.html" itemprop="trailer">Trailer</a>
    </div>

RDF (Turtle):

    [ ] a schema:Movie ;
        schema:genre "Science fiction" ;
        schema:name "Avatar" ;
        schema:trailer <avatar-trailer.html> .

An example: embedded items
• An itemprop immediately followed by another itemscope makes the value an object

    <div itemscope itemtype="http://schema.org/Movie">
      <h1 itemprop="name">Avatar</h1>
      <div itemprop="director" itemscope itemtype="http://schema.org/Person">
        Director: <span itemprop="name">James Cameron</span>
        (born <span itemprop="birthDate">1954</span>)
      </div>
      <span itemprop="genre">Science fiction</span>
      <a href="avatar-trailer.html" itemprop="trailer">Trailer</a>
    </div>

RDF (Turtle):

    [ ] a schema:Movie ;
        schema:director [ a schema:Person ;
                schema:birthDate "1954" ;
                schema:name "James Cameron" ] ;
        schema:genre "Science fiction" ;
        schema:name "Avatar" ;
        schema:trailer <avatar-trailer.html> .
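
The nesting in the Turtle above follows directly from the nesting of items. As a rough illustration, the Python sketch below renders a nested item, held as a plain dictionary, as a Turtle blank node. The dictionary layout ("@type" for the item type, "@id" for a link target) is an assumption loosely modelled on JSON-LD, and the names MOVIE, local_name, turtle_object and turtle_node are hypothetical. The sketch only handles types and properties in the http://schema.org/ namespace and omits the @prefix declaration, as the slides do.

```python
SCHEMA = "http://schema.org/"

# The embedded-items example as a nested dictionary (assumed layout).
MOVIE = {
    "@type": "http://schema.org/Movie",
    "name": "Avatar",
    "director": {
        "@type": "http://schema.org/Person",
        "name": "James Cameron",
        "birthDate": "1954",
    },
    "genre": "Science fiction",
    "trailer": {"@id": "avatar-trailer.html"},
}


def local_name(iri):
    """Strip the schema.org namespace so the name can follow the schema: prefix."""
    if not iri.startswith(SCHEMA):
        raise ValueError("only schema.org types are handled in this sketch")
    return iri[len(SCHEMA):]


def turtle_object(value, indent):
    """Render one property value: a string literal, a link, or a nested item."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if "@id" in value:
        return f"<{value['@id']}>"
    return turtle_node(value, indent)


def turtle_node(item, indent=1):
    """Render an item as a Turtle blank node, recursing into embedded items."""
    pad = "    " * indent
    parts = ["a schema:" + local_name(item["@type"])]
    for prop in sorted(p for p in item if p != "@type"):
        parts.append(f"schema:{prop} {turtle_object(item[prop], indent + 1)}")
    return "[ " + f" ;\n{pad}".join(parts) + " ]"


if __name__ == "__main__":
    print(turtle_node(MOVIE) + " .")
```

The sketch writes the top-level item as `[ a schema:Movie ; ... ] .` rather than `[ ] a schema:Movie ; ... .`; both forms describe the same anonymous subject.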

schema.org vocabulary
• Full type hierarchy in one file
• 548 classes, 711 properties (5/4/14)
• Data types: Boolean, Date, DateTime, Number (Float, Integer), Text (URL), Time
• Objects: Rooted at Thing with two 'metaclasses' (Class and Property) and eight subclasses
http://www.schema.org/Recipe

Microdata as a KR language
• More than RDF, less than RDFS
• Properties have an expected type (range)
  – Might be a string
  – A list of types, any of which are OK
• Properties are attached to ≥ 1 types (domain)
• Classes can have multiple parents and inherit (properties) from all of them
• No axioms (e.g., disjointness, cardinality, etc.)
• No subPropertyOf-like relation

Mixing vocabularies
• Microdata is intended to work with just one vocabulary: the one at schema.org
• Advantages
  – Simple, organized, well designed
  – Controlled by the schema.org people
• Disadvantages
  – Too simple, narrow, monolingual
  – Controlled by the schema.org people

Schema <-> RDF
http://schema.rdfs.org
• Schema.rdfs.org defines mappings between schema.org and popular RDF ontologies

Extending the schema.org ontology
• http://www.schema.org/docs/extension.html
• You can subclass existing classes
  – Person/Engineer
  – Person/Engineer/ElectricalEngineer
• Subclass existing properties
  – musicGroupMember/leadVocalist
  – musicGroupMember/leadGuitar1
  – musicGroupMember/leadGuitar2

Extension Problems
• No agreed-upon meaning
  – Cannot be established through axioms supported by the language (e.g., equivalence, disjointness, etc.)
  – No place for documentation (annotations, labels, comments)
• Without a namespace mechanism, your Person/Engineer and mine can be confused and might mean different things

Serialization
• Schema.org has a data model and serializations
  – Microdata is the original, native serialization
  – RDFa is more expressive and works with the RDF stack
  – Everyone agrees that RDFa Lite is a good encoding: as simple as Microdata but more expressive
  – JSON-LD is also an accepted encoding
• Search engines look for Microdata and RDFa encodings and are beginning to look for JSON-LD
• Schema.org considers RDFa to be the "canonical machine representation of schema.org"

Conclusions
• Microdata is a good effort by the search companies to use a simple semantic language
• The semantics is pragmatic
  – e.g., expected types: a string is accepted where a thing is expected ("some data is better than none")
• The real value is in
  – the supported vocabularies and
  – their use by search companies
• => Immediate motivation for using semantic markup
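
The pragmatic reading of expected types mentioned in the conclusions can be sketched in a few lines of Python. The table EXPECTED_TYPES is a small, hand-written excerpt assumed for this illustration, with short type names instead of full URLs, and classify_value is a hypothetical helper, not part of any schema.org tooling. The sketch ignores subclassing, so a value whose type is a subclass of an expected type would be reported as unexpected here.

```python
# Assumed excerpt of expected types (ranges); a property may list several.
EXPECTED_TYPES = {
    "name": ["Text"],
    "genre": ["Text", "URL"],
    "director": ["Person"],
}


def classify_value(prop, value):
    """Classify a property value against the property's expected types.

    Returns "typed" when the value matches one of the expected types,
    "text-fallback" when a plain string stands in for an expected object
    (accepted, on the principle that some data is better than none),
    and "unexpected" when a typed object does not match.
    """
    expected = EXPECTED_TYPES[prop]
    if isinstance(value, dict):
        return "typed" if value.get("@type") in expected else "unexpected"
    if "Text" in expected:
        return "typed"
    return "text-fallback"


if __name__ == "__main__":
    print(classify_value("director", {"@type": "Person", "name": "James Cameron"}))
    print(classify_value("director", "James Cameron"))
```

A consumer following this approach would keep the text-fallback value rather than reject the markup, which matches the behaviour the slides describe for the plain "Director: James Cameron" string.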