Three search giants, Google, Bing, and Yahoo!, have come together to improve search by urging webmasters and web developers to include more semantic data on their websites. To that end they have launched schema.org, which will serve as a repository of common structured data types for the web.
The web holds a lot of data, especially text, but it is often difficult to work out the context and semantics of the large volumes of text one finds online. Is “Orange” a fruit? Is it a colour? Or is it even someone’s name?
HTML Microdata, a standard being developed along with HTML5, can be used to specify the semantics of content on web pages. For example, if you mention a name on your webpage, you can use microdata as follows:
<div>The next event will be hosted by <span itemprop="name">Rupert Orange</span></div>
Google, Bing, and Yahoo! will be supporting this format for specifying semantics on a page, thus improving the reach and power of search engines.
The newly launched schema.org will host the schemas for various types of data, such as books, movies, TV series, music, audio, video, images, events, organizations, and much more.
Data usually has other data associated with it. For example, a music album might have a title, a band, and a listing of tracks, which are themselves audio objects with their own properties. The band in turn has more data associated with it, such as a list of members, an image, an address, and a URL. The band members are people with further data of their own: a name, a birthdate, a phone number, and so on.
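The nesting described above can be pictured as ordinary nested records. The following is only a rough sketch in Python: the class and field names are illustrative choices made here, not the property names schema.org defines, and the sample band and album are entirely made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    birth_date: Optional[str] = None
    telephone: Optional[str] = None


@dataclass
class AudioObject:
    name: str


@dataclass
class Band:
    name: str
    members: List[Person] = field(default_factory=list)
    url: Optional[str] = None


@dataclass
class MusicAlbum:
    title: str
    band: Band
    tracks: List[AudioObject] = field(default_factory=list)


# Hypothetical sample data, for illustration only.
band = Band(
    name="Example Band",
    members=[Person(name="Alice Example"), Person(name="Bob Example")],
    url="http://example.com/band",
)
album = MusicAlbum(
    title="Example Album",
    band=band,
    tracks=[AudioObject("Track One"), AudioObject("Track Two")],
)

# Walking from the album to the band to its members follows the nesting.
member_names = [member.name for member in album.band.members]
```

Each level is a typed item whose properties may themselves be typed items, which is the same shape the microdata markup expresses in HTML.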
Let’s take a look at an example from schema.org of how this will work:
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span> (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
The outermost div establishes that it encloses information about an item by including the itemscope attribute. Its itemtype attribute establishes that the item is of the type defined at http://schema.org/Movie. The Movie, as expected, has a name, a director, a genre, and a link to a trailer.
The director is a Person (defined at http://schema.org/Person), and hence has further information defined about him in a nested div that holds his name and birth date.
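To make the nesting concrete, here is a minimal sketch of how a program might read that markup, using only Python's standard html.parser module. It is not how any search engine actually processes microdata. The class name MicrodataSketch is made up for this example, and the sketch assumes well-formed HTML, takes the href as the value only for a elements, and ignores properties placed on void elements such as img or meta.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are kept off the element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

SAMPLE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Hypothetical helper: collects items as nested dictionaries."""

    def __init__(self):
        super().__init__()
        self.items = []        # top-level items found in the document
        self._item_stack = []  # items whose itemscope element is still open
        self._open = []        # one frame per open (non-void) element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        frame = {"closes_item": False, "prop": None, "owner": None,
                 "text": [], "value": None}
        if "itemscope" in attrs:
            # A new item starts here; attach it to its parent if it is a property.
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and self._item_stack:
                parent = self._item_stack[-1]["properties"]
                parent.setdefault(prop, []).append(item)
            else:
                self.items.append(item)
            self._item_stack.append(item)
            frame["closes_item"] = True
        elif prop and self._item_stack:
            # A plain property of the innermost open item.
            frame["prop"] = prop
            frame["owner"] = self._item_stack[-1]
            if tag == "a":
                frame["value"] = attrs.get("href")
        self._open.append(frame)

    def handle_data(self, data):
        # Text counts toward every property element that is still open.
        for frame in self._open:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        frame = self._open.pop()
        if frame["prop"]:
            value = frame["value"]
            if value is None:
                value = "".join(frame["text"]).strip()
            props = frame["owner"]["properties"]
            props.setdefault(frame["prop"], []).append(value)
        if frame["closes_item"]:
            self._item_stack.pop()


parser = MicrodataSketch()
parser.feed(SAMPLE)
parser.close()

movie = parser.items[0]
director = movie["properties"]["director"][0]
```

After parsing, movie is a dictionary typed as a Movie, and its director property holds a second dictionary typed as a Person, mirroring the nested div elements in the markup.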
In another listing somewhere about James Cameron, this association could be reversed. The article could be about James Cameron and list his movies, with Avatar being one of them. One can imagine how search engines would find this data useful, as it associates two clearly defined pieces of information and states the relationship between them explicitly.
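As a rough illustration of why the explicit relationship is useful, once the movie-to-director link is available as data it can be inverted mechanically. The sketch below assumes the records have already been extracted into plain dictionaries. The function name movies_by_director is made up for this example, and the Titanic entry is added here only to make the grouping visible.

```python
from collections import defaultdict

# Records as they might look after extraction (simplified, hypothetical shape).
movies = [
    {"name": "Avatar", "director": "James Cameron"},
    {"name": "Titanic", "director": "James Cameron"},
]


def movies_by_director(records):
    """Invert the movie -> director relation into director -> list of movies."""
    index = defaultdict(list)
    for record in records:
        index[record["director"]].append(record["name"])
    return dict(index)


by_director = movies_by_director(movies)
```

The same facts now answer the reverse question, "which movies did this person direct?", without any guessing about what the text on the page meant.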