Evolution

Added 30 Jul 2008

A very important concept on the Semantic Web is that of evolution: going from one system into another. Two key parts of evolvability are partial understanding and transformability. We will find out next how these manifest themselves naturally when changing the scale of a system.

Partial Understanding: Large Scale to Medium Scale

The concept of partial understanding is a very important one on the Semantic Web, and can often be found in older documents that came out about the same time as the Semantic Web was first being theorized.

An example of partial understanding when moving a large scale system to a medium scale system is of a company trying to make sense out of two invoices, one from Company A and one from Company B. The knowledge that both of the companies use similar fields in their invoices is well known, so company trying to make sense out of the invoices can easily compile a master list of expenditures by simply scraping the data from the two invoice languages. Neither Company A nor Company B need to know that this is going on.

Indeed, TimBL included this example in his XML 2000 keynote:-

[...] what we'll end up doing in the future is converting things, so for example [...] in the Semantic Web we will have a relationship between two langauges so that if you get an invoice in a langauge you don't understand, and you have... some business software which can pay invoices... by following links across the Semantic Web, your machine will be able to automatically convert it from one language to another, and so process it.

- Tim Berners-Lee

Transformability: Small Scale to Medium Scale

An example of a small scale Semantic Web system joined together to make a medium sized Semantic Web system could be two groups that have published address book fomats wanting to make a larger and better address book format by merging the two current formats together. Anyone using one of the old address book formats could probably convert them into the new format, and hence there would be a greater sense of interoperability. That's generally what happens when one goes from a small scale Semantic Web system into a medium scale Semantic Web system, although this is often not without some disadvantages and incompatabilites. The Semantic Web takes the sting out of it by automating 99% of the process (it can convert field A into field B, but it can't fill in any new data for you... of course, new fields can always be left empty for a while).

Facilitating Evolvability

How do we document the evolution of languages? This is a very important and indeed urgent question, and one which TimBL summarized quite neatly:-

Where for example a library of congress schema talks of an "author", and a British Library talks of a "creator", a small bit of RDF would be able to say that for any person x and any resource y, if x is the (LoC) author of y, then x is the (BL) creator of y. This is the sort of rule which solves the evolvability problems. Where would a processor find it? [...]

- Semantic Web roadmap, Tim Berners-Lee

One possible answer is: third party databases. Very often, it is not practical to have (in TimBL's example) either the LoC or or BL record the fact that two of their fields are the same, so this information will have to be recorded by a reputable third party.

One such "third party" that was set up to investigate this is SWAG, the Semantic Web Agreement Group. Co-founded by Seth Russell, Sean B. Palmer, Aaron Swartz, and William Loughborough, the group aims to ensure interoperability on the Semantic Web. They set up what is possibly the first ever third party Semantic Web dictionary, the WebNS SWAG Dictionary.

Intertwingling: Difficult, But Important

Although the Semantic Web as a whole is still very much at a grassroots kind of level, people are starting to take notice; they're starting to publish information using RDF, and thereby making it fit for the Semantic Web.

However, not enough is being done to link information together... in other words, the "Semantic" part of the "Semantic Web" is coming along nicely, but where's the "Web"? People are not using other people's terms effectively; when they use other terms, they often do so because they're aimlessly trying to help, but just generating noice in the process. If you're going to use other people's data, try to find out what the advantage is in doing that beforehand. For example, just because you use the term "dc:title" in your RDF rather than a home brewed ":title", does that mean that suddenly a Dublin Core application is going to be able to "understand" your code? Of course not. What it does mean however is that if the "dc:title" property in your instance is being put to use in such a way that information may be need to repurposed from it in the near future, then you may gain some advantage because "dc:title" is such a commonly used term, you may be able to modify a current rules file, or whatever.

Another part of the problem may be due to a problem similar to the one that the early World Wide Web experienced: why bother publishing a Web site when there is no one else's site to link to or be linked to? Why bother publishing a Web site when so few people have browsers? Why bother writing a browser when there are so few Web sites? Some people have to make the leaps for it all to happen, and that's a slow process.

What can be done about the situation? Well, it may hopefully sort itself out. Another well-known principle that applies very well to Semantic Web applications is that there is no point in reinventing the wheel; viz., if someone has already invented a schema which contains a comprehensive and well understood and used set of terms that you also need to use in your application, then there is no point in trying to redo the work that they have done. At some points this may lead to a form of "schema war", but survival of the fittest should see to it that a core of the best schemata are put to the most use. This is probably what TimBL means when he says that terms will just "emerge" out of the Semantic Cloud, that when people keep using the term "zip", rather than just recording that my term "zip" is equivalent to your term "zip" which is equivalent to someone else's term "zip", we'll all just use the same URI, and hence interoperability will be vastly improved.