Ambient Information and SEM

Added 30 Jul 2008

The scope of information was discussed a little, but let's take into consideration what it really means to have a "local" and a "global" system.

In general, there are small and large scale systems, and interactions between the two will most likely form a huge part of the transactions that occur on the Semantic Web. Let's define what we mean by large, medium, and small scale systems.

Large Scale

An example of a large scale system is two companies that are undergoing a merger needing to combine their databases. Another example would be search engines compiling results based upon a huge range of data. Large scale Semantic Web systems generally involve large databases, and heavy duty inference rules and processors are required to handle the databases.

Medium Scale

Medium scale Semantic Web systems attempt to make sense out of the larger scale Semantic Web systems, or are examples of small scale Semantic Web systems joined together. An example of the former is a company trying to partially understand two large scale invoice formats enough to use them together. An example of the latter is of two address book language groups trying to create a super-address book language.

Small Scale

Small scale Semantic Web systems are less widely discussed. By small scale Semantic Web systems, we mean languages that will be used primarily offline, or piles of data that will only be transferred with a limited scope, perhaps between friends, departments, or even two companies.

Sharing data on a local level is a very powerful example of how the Semantic Web can be useful in a myriad of situations. In the next section on evolution we shall be finding out how interactions between the different sized systems will form a key part of the Semantic Web.

SEM - SEmantic Memory

The concept of a SEmantic Memory was first proposed by Seth Russell, who suggested that personal database dumps of RDF that one has collected from the "rest" of the Semantic Web (a kind of Semantic Cloud) would be imperative for maintaining a coherant view of data. For example, a SEM would most likely be partitioned into data which is inherent to the whole Semantic Web (i.e., the schemata for the major languages such as XML RDF, RDF Schema, DAML+OIL, and so on), local data which is important for any Semantic Web applications that may be running (e.g. information about the logic namespace for CWM, which is currently built in), and data that the person has personally been using, is publishing, or that has been otherwise entered into the root context of the SEM.

The internal structure of a SEM will most likely go well beyond the usual triples structure of RDF, perhaps as far as quads or even pents. The extra fields are for contexts (an StID), and perhaps sequences. In other words, they are ways of grouping information within the SEM, for easy maintainence and update. For example, it should become simple to just delete any triple that was added into a certain context by removing all triples with that particular StID.

A lot of work on the Semantic Web has concentrated on making data stores (i.e. SEMs) interoperable, which is good, but that has lead to less work being conducted on what actually happens within the SEM itself, which is not good, because the representation of quads and pents in RDF is therefore up in the air. Obviously, statements can be modelled as they would be for reification:-

rdf:Statement rdfs:subClassOf :Pent .
_:s1 rdf:type :Pent .
_:s1 rdf:subject :x .
_:s1 rdf:predicate :y .
_:s1 rdf:object :z .
_:s1 :context :p .
_:s1 :seq "0" .

But clearly a dedicated pentuples format is always going to be more efficient, and aviod the perils of reification:-

:x :y :z :p "0" .

This language also needs a default context flag that indicates the root context of the document. The root context of a document is the space to which the (non-quoted) assertions are parsed, the conceptual information space in which all of the assertions are taken to be true. Any quoted information in the document (for example, using the Notation3 context syntax) would be in a different (possibly anonymous) context than the root context of the document.

TimBL appears, gaguing from the CWM source code, to be using "...#_formula" as the root context for the document, which (if true) is a bit of a nasty hack... what if one creates a property with the same URI? Maintaining interoperability at this level of the Semantic Web is an important thing for the Semantic Web developers to be investigating at this stage