
The Semantic Web, Linked Data and Open Data

June 2, 2012

Back in 2001, Tim Berners-Lee and his collaborators published a seminal article called “The Semantic Web” in which they presented their idea of “a new form of Web content that is meaningful to computers [and] will unleash a revolution of new possibilities”. In the last few years, the idea has gained traction and technologies have become available to build parts of this vision. Unfortunately, getting started is not so easy: there are many concepts with slightly varying names and minute differences in meaning, as well as several technologies with cryptic names. So let’s start with some definitions.

First up is the term Semantic Web. The Semantic Web describes the vision that machines will some day be able to understand the meaning (“semantics”) of information on the Internet, and be able to “perform tasks automatically and locate related information on behalf of the user” (Wikipedia). What is important to understand is that this term describes an amalgam of concepts and technologies (similar to “Web 2.0”) and not a single technology.

One technological concept that is part of the Semantic Web vision is Linked Data, which describes “a method of publishing structured data, so that it can be interlinked and become more useful” (Wikipedia). A simple example shows the power of this: instead of giving our software a meaningless (at least to a machine) string such as “Zürich” as an input, we give it an object with a URI and define this object as being, amongst others, of type populated place.

The meaning of “a populated place” in this case is clearly defined, so that others can look up what it means exactly and also use this definition themselves. This way, if someone uses “a populated place”, everyone talks about the same thing. Also, if we take a look at the definition of label, it says that it is “a human-readable name for the subject”.
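To make the idea concrete, here is a minimal sketch that models the statements above as RDF-style (subject, predicate, object) triples, using plain Python tuples. The `rdf:type` and `rdfs:label` URIs are the standard W3C ones; the resource and ontology URIs follow DBpedia’s naming convention and are meant as illustration, not an exact transcript of DBpedia’s data.

```python
# RDF-style triples as plain Python tuples: (subject, predicate, object).
DBR = "http://dbpedia.org/resource/"   # DBpedia resources (illustrative)
DBO = "http://dbpedia.org/ontology/"   # DBpedia ontology terms (illustrative)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    # "Zürich is a populated place":
    (DBR + "Zürich", RDF_TYPE, DBO + "PopulatedPlace"),
    # "its human-readable name is 'Zürich'":
    (DBR + "Zürich", RDFS_LABEL, "Zürich"),
]

def objects(subject, predicate):
    """Return all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(DBR + "Zürich", RDFS_LABEL))  # ['Zürich']
```

The point is that both the predicate (`rdfs:label`) and the type (`PopulatedPlace`) are URIs anyone can dereference to look up their exact definition, which is what makes the data meaningful to a machine.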

The description of “a populated place” is part of a vocabulary that has been defined in an ontology. What’s interesting is that such an ontology can be defined by anyone. This allows for the creation of ontologies for special areas of interest, such as the “friend of a friend (FOAF)” or the “hCard” vocabularies, which were created by individuals or small groups and have proven useful to their communities. Because these ontologies are distributed, they can be formed bottom-up, saving us from creating The One Global Ontology, which would be a gargantuan task.
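A rough sketch of why a small, community-built vocabulary like FOAF is useful: because its terms live under one well-known namespace, data published independently by different sites can be merged and queried with the same code. The FOAF namespace and the terms `Person`, `name`, and `knows` are real FOAF terms; the two people and their URIs are made up for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # the real FOAF namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Two hypothetical people, published by two different (imaginary) sites:
alice = "http://example.org/people/alice"
bob = "http://example.net/staff/bob"

triples = [
    (alice, RDF_TYPE, FOAF + "Person"),
    (alice, FOAF + "name", "Alice"),
    (alice, FOAF + "knows", bob),   # the link between the two data sets
    (bob, RDF_TYPE, FOAF + "Person"),
    (bob, FOAF + "name", "Bob"),
]

# Both publishers used the same foaf:knows term, so after merging their
# data we can follow the link from Alice to Bob without site-specific code.
friends_of_alice = [o for s, p, o in triples if s == alice and p == FOAF + "knows"]
print(friends_of_alice)
```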

Linked Data by itself doesn’t have to be publicly available data; it can just as well be used in private, so we need one more definition: Open Data. It describes “a philosophy and practice requiring that certain data be freely available to everyone, without restrictions from copyright, patents or other mechanisms of control” (Wikipedia). This is similar in spirit to other movements like Open Source Software, and there is work being done to create licenses that clarify the usage terms of the data (e.g. the Open Definition and the Open Data Commons).

Finally, to describe data that is open and linked, there’s the combination of the two: Linked Open Data. This is the data we, as visualization creators, want, because it has clear license terms and is easily linkable with other data sets. To put these terms in relation to each other, I created the following graphic; in the world of all data, only the blue areas are open to the public, with the dark blue being open and linked.

Democratic governments have always had to make the data they produce transparent to their citizens. However, many do so using proprietary file formats like Excel, machine-unfriendly documents like PDFs, or “hide” the data by distributing it over many government sites, thus making it (unintentionally) hard to find. This is all Open Data, because people can look at and use it.

Luckily, there is a new trend to make data really open, not just legally and as a matter of form. Sites like data.gov have started to provide Open Data as a central, searchable catalog, often with the option of accessing the data through APIs, which makes it a lot easier to consume, as it doesn’t have to be transformed, combined and prepared before a program can use it. With this central catalog in place, they have been able to go a step further and start transforming this data into a huge Linked Open Data set that is accessible to everyone. The graphic below shows the size of the Linked Open Data web at the end of 2010: each bubble is a website that you can access through Linked Open Data technologies in similar ways that you would normally access a database.
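What “accessing a site like a database” can look like in practice: many Linked Open Data sites expose a SPARQL endpoint that accepts a query as an HTTP parameter. The sketch below only builds the request URL (no network call is made); the endpoint is DBpedia’s public one, and the query itself is illustrative.

```python
from urllib.parse import urlencode

endpoint = "http://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

# An illustrative query: ask for the labels of the Zürich resource.
query = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Zürich> rdfs:label ?label .
} LIMIT 5
"""

# The query travels as an ordinary URL parameter; a client would then
# fetch this URL with urllib or any HTTP library.
url = endpoint + "?" + urlencode({"query": query, "format": "application/json"})
print(url[:60])
```

This is the sense in which a Linked Open Data bubble behaves like a database: one standard query language (SPARQL) over plain HTTP, instead of a site-specific API for every data set.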

To get some perspective on these different ways of publishing data, Berners-Lee suggested a 5-star system to describe the accessibility quality of data sets, emphasizing that “the Semantic Web isn’t just about putting data on the web”, but about doing so in ways that allow machines to understand the meaning of the data. The LiDRC Lab has taken Berners-Lee’s proposal and prepared it with examples and annotations. Go and have a look at the Linked Open Data star scheme by example; it’s a good read.
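For quick reference, the five levels can be encoded as a small lookup table. The descriptions below paraphrase the widely cited scheme; see the linked “star scheme by example” for the exact wording and examples.

```python
# Berners-Lee's 5-star deployment scheme for open data, paraphrased.
STARS = {
    1: "on the web, in any format, under an open license",
    2: "machine-readable structured data (e.g. a spreadsheet, not a scan)",
    3: "as 2 stars, but in a non-proprietary format (e.g. CSV, not Excel)",
    4: "as 3 stars, plus using URIs so others can point at your data",
    5: "as 4 stars, plus linking your data to other data for context",
}

def rating(stars: int) -> str:
    """Return the description for a star level (1-5)."""
    return STARS[stars]

print(rating(5))
```

Note how the last two levels are exactly the Linked Data ideas discussed above: URIs as identifiers, and links between data sets.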

What the system does not take into account, however, is the quality of the data itself. As with everything on the Internet, remember that even if you get your hands on a well-published Linked Open Data set, it may be incomplete, taken out of context or badly curated. “Bad content in, bad content out” still applies. This problem is especially acute for Linked Open Data at the moment, because everyone is just starting out with creating the ontologies and links, and there is no way to do this overnight, so incompleteness will probably prevail for a while.
