现在的位置: 首页 > 综合 > 正文

关于RDF与XML的区别

2013年09月20日 ⁄ 综合 ⁄ 共 10322字 ⁄ 字号 评论关闭

      刚开始学习语义网的知识,根据语义网层次推进,看完XML紧接着RDF,忽然有个疑问:为什么我们一定要用RDF而非仅仅使用已经很成熟的XML呢?貌似XML比RDF少了一个推理的系统,可以就推理而言,RDF也绝不是最好的,显然OWL的推理要比RDF强很多啊,那么为什么还需要这个层次呢?

     在网上寻了N久,终于得到自己比较满意的答案,拿出来跟大家共享下。

     首先,XML与RDF的区别:

     1.RDF模式和XML模式是不同的XML数据模式是一个文本可扩展语言,相比之下,RDF有一个非常简单的模式,即二元关系模式。当然,任何的RDF声明形式都可以用XML来表示,但XML是被设定为固定的、树状的文本,在描述数据元上缺乏一定的灵活性。RDF模式却是有足够的灵活来描述这种主观的、分布式的、用不同形式来表达的元数据。

    2.RDF和XML所使用的资源不同
XML中所谈到的节点,是XML文档中的节点,尤其是在文档结构中特定之处。在RDF中,节点不在是节点本身了,而是任何其他可用URIS标识的资源,因此RDF是一种元数据语言。

    3.XML Schema和RDF的语意不同
    XML Schema最初的语意解释是限制在XML文档中的,它是隐含的。RDF原本就是语意解释,用于对那些不能够用树形结构来很好建模的知识进行建模。总之,XML/XML Schema是数据建模语言,RDF是元数据建模语言,当元数据需要编码成

数据时,XML语法就非常的有用,如果纯用XML语言来进行元数据建模那么在灵活性就会受到阻碍。

   其次,为什么要用RDF而非仅仅使用XML(这段是英语文档,不难看懂)

   This has been a question which has been around ever since RDF started. At the W3C Query Language workshop, there was a clear difference of view between those who wanted to query documents and those who wanted to extract the "meaning" in some form and query that. This is typical. I wrote this note in a frustrated attempt to explain whatthe RDF model was for those who though in terms of the XML model. I  later listened to those who thought in terms of the XML model, and tried to writ it the other way around in another note. This note assumes that the XML data model in all its complexity, and the RDF syntax as in RDF Model and Syntax, in all its complexity. It

doesn't try to map one directly onto the other -- it expresses the RDF model using XML.

    Let me take as an example a single RDF assertion. Let's try "The author of the page is Ora". This is traditional. In RDF this is a triple triple(author, page, Ora)  

    which you can think of as represented by the diagram  

  How would this information be typically be represented in XML?

<author>
     <uri>page</uri>
     <name>Ora</name>
</author>

or maybe

<document href="page">
   <author>Ora</author>
</document>

or maybe

<document>
   <details>
    <uri>href="page"</uri>
    <author>
        <name>Ora</name>
    </author>
    </details>
</document>

or maybe

<document>
   <author>
    <uri>href="page"</uri>
    <details>
        <name>Ora</name>
    </details>
    </author>
</document>

<document href="http://www.w3.org/test/page" author="Ora" />

The XML Graph
These are all perfectly good XML documents - and to a person reading then they mean

the same thing. To a machine parsing them, they produce different XML trees.

Suppose you look at the XML tree

<v>
   <x>
    <y> a="ppppp"</y>
    <z>
        <w>qqqqq</w>
    </z>
   </x>
</v>

It's not so obvious what to make of it. The element names were a big hint for a human

reader.

Without looking at the schema, you know things about the document structure, but

nothing else. You can't tell what to deduce. You don't know whether ppppp is a y of

qqqqq, or qqqqq is a z of ppppp or what. You can't even really tell what real questions

can be asked. A source of some confusion is that in the xyz example above, there are

lots of questions you can ask. They are questions like,

Is there a w element within a details element?
What is the content of the w element within the first x element?
What is the content of the w element following the first y element which contains an x

element whose a attribute is "pppp"?
and so on.
These are all questions about the document. If you know the document schema (a big

if) , and if that schema it only gives you a limited number of ways of expressing the

same thing (another big if) , then asking these questions can be in fact equivalent to

asking questions like

What is the author of page?
This is hairy. It is possible because there is a mapping from XML documents to

semantic graphs. In brief, it is hairy because

The mapping is many to one
You need a schema to know what the mapping is
(The schemas we are talking about for XML at the moment do not include that anyway

and would have to have a whole inference language added)
The expression you need for querying something in terms of the XML tree is

necessarily more complicated than the expression you need for querying something in

terms of the RDF tree.
This last is a big one. If you try to write down the expression for the author of a

document where the information is in some arbitrary XML schema, you can probably

do it though it may or may not be very pretty. If you try to combine more than one

property into a combined expression, (give me a list of books by the same author as

this one), saying it in XML gets too clumsy to consider.

(Think of trying to define the addition of numbers by regular expression operations on

the strings. Its possible for addition. When you get to multiplication it gets ridiculous -

to solve the problem you would end up reinventing numbers as a separate type.)

Looking at the simple XML encoding above,

<author>
     <uri>page</uri>
     <name>Ora</name>
</author>

it could be represented as a graph

 

We can represent the tree more concisely if we make a shorthand by writing the name

of each element inside its circle:

 

Of course the RDF tree which this represents (although it isn't obvious from the XML

tree except to those who know) is

 

Here we have made a shorthand again by putting making the label for each part its

URI.

The complexity of querying the XML tree is because there are in general a large

number of ways in which the XML maps onto the logical tree, and the query you write

has to be independent of the choice of them. So much of the query is an attempt to

basically convert the set of all possible representations of a fact into one statement.

This is just what RDF does. It gives you some standard ways of writing statements so

that however it occurs in a document, they produce the same effect in RDF terms. The

same RDF tree results from many XML trees.

Wouldn't it be nice if we could label our XML so that when the parser read it, it could

find the assertions (triples) and distinguish their subjects and objects, so as to just

deduce the logical assertions without needing RDF? This is just what RDF does,

though.

The RDF Graph
In fact RDF is very flexible - it can represent this triple in many ways in XML so as to

be able to fit in with particular applications, but just to pick one way, you could write the

above as

<Description about="http://www.w3.org/test/page" Author ="Ora" />

I have missed out the stuff about namespaces. In fact as anyone can create or own

the verbs, subjects and objects in a distributed Web, any term has to be identified by a

URI somehow. This actual real example works out to in real life more like

<?xml version="1.0"?>
  <Description

          xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
          xmlns:s="http://docs.r.us.com/bibliography-info/"
 
                  about="http://www.w3.org/test/page"
                  s:Author ="http://www.w3.org/staff/Ora" />

You can think that the "description" RDF element gives the clue to the parser as to

how to find the subjects, objects and verbs in what follows.

This is pretty much the most shorthand way of using the base RDF in XML. There are

others which are longer, but more efficient when you have, for instance, sets of many

properties of the same object. The useful thing is that of course they all convey the

same triple

 

It is a mess when you use questions about a document to try to ask questions about

what the document is trying to convey. It will work. In a way. But flagging the grammar

explicitly (RDF syntax is a way of doing this) is a whole lot better.

Things you can do with RDF which you can't do with XML include

You can parse the semantic tree, which end up giving you a set of (possibly mutually

referential) triples and then you can use the ones you want ignoring the ones you don't

understand.
Problems with basing you understanding on the structure include

Without having gone to the trouble of getting the schema, or having an application

hand-programmed to recognise a particular document type, you can't pick up any

semantic information from a document;
When an XML schema changes, it could typically introduce new intermediate elements

(like "details" in the tree above or "div" is HTML). These may or may or may not

invalidate any query which has been based on the structure of the document.
If you haven't gone to the trouble of making a semantic model, then you may not have

a well defined one.
I'll end this with some examples of the last problem. Clearly they can be avoided by

good design even in an XML system which does not use RDF. Using RDF makes

things easier.

Get it right
If you haven't gone to the trouble of making a semantic model, then you may not have

a well defined one. What does that mean? I can give some general examples of

ambiguities which crop up in practice. In RDF, you need a good idea about what is

being said about what, and they would tend not to arise.

Look at a label on the jam jar which says: "Expires 1999". What expires: the label, or

the jam? Here the ambiguity is between a statement about a statement about a

document, and a statement about a document.

Another example is an element which qualifies another apparently element. When

information is assembled in a set of independently thrown in records often ambiguities

can arise because of the lack of logic. HTTP headers (or email headers) are a good

example. These things can work when one program handles all the records, but when

you start mixing records you get trouble. In XML it is all too easy to fall into the trap of

having two elements, one describing the author, and a separate one as a flag that the

"author" element in fact means not the direct author but that of a work translated to

make the book in question. Suddenly, the "author" tag, which used to allow you to

conclude that the author of a finnish document must speak finnish, now can be

invalidated by an element somewhere else on the record.

Another symptom of a specification where the actual semantics may not be as obvious

as as first sight is ordering. When we hear that the order of a set of records is

important, but the records seem to be defined independently, how can that be?

Independent assertions are always valid taken individually or in any order. In a server

configuration file, for example, he statement which looks like "any member has access

to the page" might really mean "any member has access to the page unless there is no

other rule in this file which has matched the page". That isn't what the spec said, but it

did mention that the rules were processed in order until one applied. Represented

logically, in fact there is a large nested conditional. There is implicit ordering when mail

headers say, "this message is encrypted", "this message is compressed", "this

message is ASCII encoded", "this message is in HTML". In fact the message is an

ASCII encoded version of an encrypted version of a compressed version of a

message in HTML. In email headers the logic of this has to be written into the fine print

of the specification.

Order in documents
There is something fundamentally different between giving a machine a knowledge

tree, and giving a person a document. A document for a person is generally serialized

so that, when read serially by a human being, the result will be to build up a graph of

associations in that person's head. The order is important.

For a graph of knowledge, order is not important, so long as the nodes in common

between different statements are identified consistently. (There are concepts of

ordered lists which are important although in RDF they break down at the fine level of

detail to an unordered set of statements like "The first element of L is x", the "third

element of L is z", etc so order disappears at the lowest level.). In machine-readable

documents a list of ostensibly independent statements where order is important often

turn out to be statements which are by no means independent.

Some people have been reluctant to consider using an RDF tree because they do not

wish to give up the order, but my assumption is that this is from constraints on

processing human readable documents. These documents are typically not ripe for

RDF conversion anyway.

Conclusion:

Sometimes it seems there is a set of people for whom the semantic web is the only

graph which they would consider, and another for whom the document tree (or graph if

you include links) is all they would consider. But it is important to recognise the

difference.

  圆满解答,O(∩_∩)O哈!

 

 

 

 

抱歉!评论已关闭.