What is a Semantic Information Model

A Semantic Information Model is an information model in which the meaning of data can be interpreted from the model itself, without the need to consult a meta-model or external documentation.
This implies that a Semantic Information Model is written in a formalized language, such as Formal English, Formal Dutch, etc. The term 'formal' means that the language and its terms and phrases are explicitly and unambiguously defined. All terms and phrases in expressions in the formal language are defined in a formal taxonomic dictionary or in user defined extensions and the expressions shall comply with a formal syntax.

According to the Gellish Semantic Modeling Methodology, information such as statements, questions, commands, etc. about (possible) facts or knowledge is expressed as collections of formalized English expressions. Gellish-enabled software can interpret the meaning (semantics) of the expressions without the need to know or use a separate meta-model. Furthermore, the methodology defines a universal data structure (syntax), so that all databases and messages either have the same data structure or can be converted to and from it, and all use the same language definition. (Because the language-defining ontology is expressed in Formal English itself, that language definition can be imported into databases as their initial content.) As a result, software can interpret the semantic expressions in multiple databases and messages, and different databases can be treated as if they were one distributed database. Such interoperation of databases enables verification and management of the consistency of their content as well as the combination of that content. Collections of expressions can be presented in any user interface layout. Examples of generic user interface layouts are 'Brains' and 'generalized data sheets'.

Semantic information modeling differs from conventional information modeling. In Software Engineering it is a widespread convention to create data models, which are semantic meta models. Such meta models are the basis for database designs and exchange messages (often called interfaces). (A meta model is a model about an instance model.) Such a meta model defines the database structure or message structure and acts as its documentation. Typically the meta model remains separate from the database instances (its content). To interpret the meaning of the data instances in a database or message, it is then necessary to use the meaning that is contained in the semantic meta model. Conventionally, each database and message uses its own meta model, so the data structures of all databases and messages are different. These differing meta models are the root cause of the cost and time required to integrate data from different databases and to develop new interfaces.

Semantic modeling thus means that meaning is included in and can be inferred from the created semantic models. The way in which meaning is modeled in Formal English builds on the principle that semantic models are fact oriented or expression oriented (as opposed to object oriented). Knowledge as well as information (instances) are modeled as expressions of ideas and facts.

It appears that information can be modeled in formalized natural languages as collections of expressions that fit in two universal basic semantic patterns:

  • A pattern for expressions that represent information about individual things
  • A pattern for expressions that represent knowledge, definitions and general requirements, i.e. information about kinds of things

A third category, which consists of relations between individual things and kinds of things, can be modeled by a mixture of these two basic semantic patterns.

These semantic patterns are described in the book 'Semantic Information Modeling in Formalized Languages'. An introduction to that book is presented below.

Universal basic semantic patterns

1. Expression of facts about individual things

Any possible fact that represents information about an individual thing can be expressed in a semantic model as a collection of 'Relation-Role-Object' elementary relations, in which 'Object' stands for anything that acts as a role player. Each of such elementary relations expresses an involvement of something in a role of some kind in a relation. For example, object A1 <is involved in role of kind B1 in> relation R1. In this expression the phrase <is involved in role of kind B1 in> represents a kind of relation.
For an N-ary relation, R1 has N such involvement relations with N role players (A1..An). Ordinary binary relations thus consist of two of such elementary involvement relations, together describing the two kinds of roles (B1 and B2) of the two role players (A1 and A2).

The simplified universal basic semantic pattern for the expression of facts about individual things is illustrated in Figure 1.

Figure 1, Simplified universal basic semantic pattern for the expression of facts about individual things

The pattern of Figure 1 is a semantic model of an expression of a fact or possible fact about an individual thing. It is a simplified version, because in the full version the role that is played by the individual thing and its classification are also explicit. This simplified pattern illustrates that each fact about individual things is represented by an individual relation (the relation-1 box (1)), whereas that fact is expressed by a number of (elementary) involvement relations (7) between involved objects (Oi) and the relation. Binary facts are modeled as a relation between two objects (O1 and O2). Higher order (N-ary) relations are expressed as relations that are related to N individual things by collections of N binary elementary involvement relations. In addition to that, Figure 1 illustrates that the pattern includes the explicit specification of the classification of the relation (2), the classifications of the involvement relation (8) which implies kinds of roles that are played by the related objects and the explicit classification of the related individual things (3).

For ordinary binary relations it is possible to simplify the expressions by replacing a pair of two elementary binary relations by one ordinary binary relation. The reason why that is possible can be described as follows:
Assume that we classify relation R1 by the kind of relation R. Then define the kind of relation R among others by specifying the two kinds of roles (B1 and B2) that are by definition required by every relation of such a kind R. This means that the two kinds of roles B1 and B2 are implied when such a kind of relation R is used in an expression. This enables that each pair of elementary involvement relations can be replaced by one ordinary binary relation (with two implied kinds of roles).
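
The replacement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Involvement`, the table `REQUIRED_ROLES` and the function `to_binary` are hypothetical names chosen for this sketch and are not part of Gellish. The sketch shows that, once the kind of relation R is defined by its two required kinds of roles, a pair of elementary involvement relations carries no more information than one ordinary binary relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Involvement:
    """Elementary relation: a player is involved in a role of some kind in a relation."""
    player: str
    role_kind: str
    relation: str


# Assumed definition of the kind of relation R: it requires the kinds of roles B1 and B2.
REQUIRED_ROLES = {"R": ("B1", "B2")}


def to_binary(relation, relation_kind, involvements):
    """Replace the two involvement relations of `relation` by one binary relation."""
    own = [inv for inv in involvements if inv.relation == relation]
    first_role, second_role = REQUIRED_ROLES[relation_kind]
    players = {inv.role_kind: inv.player for inv in own}
    if len(own) != 2 or set(players) != {first_role, second_role}:
        raise ValueError(f"{relation} is not an ordinary binary relation of kind {relation_kind}")
    # The kinds of roles are implied by the kind of relation, so they can be dropped.
    return (players[first_role], relation_kind, players[second_role])


involvements = [
    Involvement("A1", "B1", "R1"),
    Involvement("A2", "B2", "R1"),
]
binary = to_binary("R1", "R", involvements)
print(binary)
```

The order of the two role players in the result follows the order of the kinds of roles in the definition of R, which is why the roles themselves no longer need to be stated.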

The resulting simplified basic semantic pattern for ordinary binary relations is illustrated in Figure 2.

Figure 2, Simplified basic semantic pattern for ordinary binary relations

Note that the numbers in Figure 2 refer to the same numbers as in Figure 1.
The graphical basic semantic pattern of Figure 2 can be expressed as a pattern for databases and data exchange messages that consists of expressions of four binary relations:

A-1 <R-1> A-2
A-1 <is classified as a> kind of thing
R-1 <is classified as a> kind of relation
A-2 <is classified as a> kind of thing

This pattern of four binary relations should be followed for all facts about individual things in order to be able to unambiguously interpret the meaning of relations such as R-1. The interpretation of expressions that conform to the above pattern relies on the interpretation of the kinds of things. Therefore, such patterns should be accompanied by definitions of the kinds of things and kinds of relations. In other words, the pattern requires modeled definitions that are interpretable as well. Such definitions are provided in a language defining ontology, which is also expressed in the formal language. The expression of definitions of kinds of things (concepts) consists of relations between kinds of things. Such expressions follow a second basic semantic pattern that is briefly described below. The book 'Semantic Modeling in Formal English' provides a more extensive description.

An example of a single ordinary binary relation with two implied kinds of roles (located and locator) that expresses a statement is:

statement 1: the Eiffel tower <is located in> Paris
statement 2: the Eiffel tower <is classified as a> tower
statement 3: Paris <is classified as a> city

Note 1: in the first expression the number 1 represents R-1, whereas the relation between '1' and phrase <is located in> represents the classification relation of R-1.

Note 2: Conventional linguistics treats such an expression (sentence) as a 'model of (eight) words', whereas a semantic model treats the expression as a model of (four) concepts. A semantic model is not a model of words, but a model of related concepts. Therefore, for example, the phrase 'is located in' as a whole is used as a name of a kind of relation. The individual words in the phrase are irrelevant, which is in contrast to conventional linguistic analysis of language.
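
As a minimal sketch, the pattern of four binary relations can be generated for the Eiffel tower example as follows. The function name `four_relation_pattern` and the use of plain tuples are assumptions made for this illustration only. The sketch follows Note 1 in treating '1' as the identifier of the individual relation R-1, which is classified by the phrase 'is located in'.

```python
CLASSIFICATION = "is classified as a"


def four_relation_pattern(left, left_kind, relation_id, relation_kind, right, right_kind):
    """Return the four binary relations that together express one fact about individual things."""
    return [
        (left, relation_id, right),                    # A-1 <R-1> A-2
        (left, CLASSIFICATION, left_kind),             # A-1 <is classified as a> kind of thing
        (relation_id, CLASSIFICATION, relation_kind),  # R-1 <is classified as a> kind of relation
        (right, CLASSIFICATION, right_kind),           # A-2 <is classified as a> kind of thing
    ]


rows = four_relation_pattern(
    "the Eiffel tower", "tower", "1", "is located in", "Paris", "city"
)
for left, relation, right in rows:
    print(f"{left} <{relation}> {right}")
```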

Higher order relations and variable order relations (variable in time) cannot be reduced to such simplified ordinary binary relations. Thus N-ary relations should be expressed as a collection of N elementary binary relations.
This implies that all kinds of relation can be modeled as (collections of) binary relations (either elementary or ordinary atomic).
The binary form is sometimes also called the Object-kind of relationship-Object (ORO) paradigm, although it can also be implemented in a different sequence, such as R(O,O). However, it should be recognized that a full expression requires more than just three components, as will be clarified below.

In its simplest form, the binary relation structure is also supported by technologies, such as RDF and OWL. However, a semantic model includes a number of semantic extensions that support an improved computer interpretation of such sentences and an improved computerized verification of semantic correctness. The main extensions are:

  • Each relation can be extended with an intention, such as 'statement', 'question', 'promise', etc. This makes it possible to express things other than statements or facts in the same language. As a consequence, the language can also be used to express queries and no separate query language is required. (For simplicity, the description below ignores the intentions and talks mainly about 'facts', although other intentions are implied as well.)
  • Each kind of relation shall have a modeled definition in the dictionary. The semantic definition of a relation type includes the definition of the kinds of roles that are by definition played in a relation of that kind, and it includes the allowed kinds of players of such roles. For example, the relation type <is located in> requires a physical object in a 'locator' role and another physical object in a 'located' role.
  • Each individual thing shall be explicitly classified by a kind of thing. This is necessary to enable computer interpretation and semantic verification, because only the kinds of things (classes) are defined in the dictionary. Thus, the meaning of a relation between individual things can only be interpreted correctly when each related individual thing is classified, and also the roles they play and the relation they have are classified by a kind of thing.
  • The kinds of things shall be defined by at least a relation with their supertype kinds of things, thus forming a taxonomy of concepts (a specialization hierarchy, also called a subtype-supertype hierarchy). This is necessary for the interpretation of the meaning of the classifiers (for example, the concepts city, tower, and 'is located in', as well as 'locator' and 'located').
  • Each relation can be accompanied by a number of 'accessory facts' that define a context for interpretation in a standard way. For example, the context in which a statement or requirement is valid, the period during which it is valid, when and by whom it is expressed, the language and language community for the used terms, etc.
  • Each thing (individual things, concepts, aspects and relations) is represented by its own language family wide unique identifier (UID), apart from a free domain. This enables the simultaneous use of various synonyms, homonyms, codes, abbreviations and translations in different languages and language communities. It also allows organization or system specific terminology and codes to be used in expressions, and allows those expressions to be presented in different terminology in communications with third parties.

Note that it is possible to use Gellish while ignoring the above functionality, but that will result in a lower formal language compliance level.

An ordinary binary fact is expressed as a binary relation between some individual thing and another individual thing. In Figure 1 these two related things are represented by the same box, called 'anything'. Thus the expression of a complete binary fact requires the sequence of objects on the lower line from left to right and backward:

  • something plays a role in a relationship in which another role is involved that is played by another thing

The definition of the kind of relationship (also called relation types, or fact types) includes the specification of the required kinds of roles. For a semantic interpretation of the expression it is generally sufficient to use those specifications. This means that in most cases the expression can be summarized as follows:

  • something <has a relationship of a particular kind with> another thing

Semantic expressions of facts about individual things can all be based on the above generic basic semantic pattern (model). Semantic analysis of natural language expressions resulted in the definition of a large number of kinds of relations (relation types), the kinds of roles they require, as well as the kinds of things (concepts) that can play roles of those kinds. Together with names, phrases, synonyms, codes and abbreviations for them, they form a Gellish English Dictionary-Taxonomy. This defines a complete formal language, including a grammar and a dictionary-taxonomy.

Figure 1 also indicates that a kind of thing can have a specialization relation with another kind of thing. This means that a subtype-supertype hierarchy of relations between kinds of things is assumed. In other words the concepts are arranged in a Taxonomy. This also holds for the relation types. This is required, because each kind of relation requires that the related objects are of particular kinds. For example, only physical objects can be located in physical objects. Therefore, the <is located in> relation requires that each located thing and each locator thing is a physical object. For that reason it is required that the concepts 'building' and 'city' both are defined in the Dictionary-Taxonomy as subtypes of the concept 'physical object'. Thus, the subtype-supertype hierarchy of concepts (the Taxonomy) enables automated verification whether an expression is semantically allowed.
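
The automated verification described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the Gellish Dictionary-Taxonomy: the dictionary `SUPERTYPE`, the entries 'tower is a subtype of building' and 'plan', and the function names `is_a` and `is_allowed` are all hypothetical. Only the rule that <is located in> requires physical objects in its 'located' and 'locator' roles is taken from the text.

```python
# Toy taxonomy: each kind of thing points to its direct supertype (assumed entries).
SUPERTYPE = {
    "tower": "building",
    "building": "physical object",
    "city": "physical object",
    "physical object": "anything",
    "plan": "anything",  # hypothetical concept that is not a physical object
}

# Definition of a kind of relation: (kind of role, required kind of role player) per side.
RELATION_KINDS = {
    "is located in": {
        "left": ("located", "physical object"),
        "right": ("locator", "physical object"),
    },
}


def is_a(kind, ancestor):
    """True if `kind` equals `ancestor` or is a direct or indirect subtype of it."""
    while kind is not None:
        if kind == ancestor:
            return True
        kind = SUPERTYPE.get(kind)
    return False


def is_allowed(left_kind, relation_kind, right_kind):
    """Check whether both role players are of the kinds required by the relation type."""
    definition = RELATION_KINDS[relation_kind]
    _, required_left = definition["left"]
    _, required_right = definition["right"]
    return is_a(left_kind, required_left) and is_a(right_kind, required_right)


print(is_allowed("tower", "is located in", "city"))
print(is_allowed("plan", "is located in", "city"))
```

The first check succeeds because 'tower' reaches 'physical object' through the assumed specialization chain; the second fails because 'plan' does not.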

2. Expression of facts about kinds of things

Any fact that represents knowledge, a definition or a general requirement is expressed in a semantic model as a relation between kinds of things. Such expressions follow a universal basic semantic pattern. The simplified pattern (or data structure) is illustrated in Figure 3.

Figure 3, Simplified universal basic semantic pattern for the expression of facts about kinds of things

At the bottom left hand corner of Figure 3 a kind of thing or concept is depicted. The pattern implies that an N-ary relation (11) that represents knowledge or general requirements can be expressed as a collection of N kinds of (elementary) involvement relations (17) with N involved concepts, whereas each kind of involvement relation implies a kind of role. Note that all involved concepts are represented by the same box, called 'kind of thing'. The pattern is called 'simplified', because the full pattern explicitly shows the kinds of roles that are played by the kinds of things.

The expression of a complete binary fact requires two involvement relations. Each pair of such involvement relations can be replaced by one ordinary binary relation in a similar way as is depicted in Figure 2. An ordinary binary relation can also be read as a sequence of objects from left to right and backward in Figure 3:

  • some kind of thing can be involved with a kind of role in a kind of relation (11) in which another kind of thing can be involved that is playing another kind of role,

or in other words:

  • something of a particular kind plays a role of a particular kind in a relation of a particular kind in which another role of a particular kind is involved that is played by another thing of a particular kind.

For a proper interpretation of the meaning of such a kind of relation it is required that the meaning of each component in the expression is known. Each component is a kind of thing or kind of relation that shall be defined by a definition model in a Dictionary-Taxonomy (i.e. it shall be included in a Domain Dictionary-Taxonomy or in a proprietary extension). A proper definition of a concept in a formal language is computer interpretable. This means that the definition of each concept shall include at least one specialization relation (13A, 18 and 12) with a direct supertype of the concept. Such a specialization relation expresses that the defined concept is a proper subtype of that supertype concept and it means that the defined concept inherits all the facts about the supertype concept, including its possible roles in relations. In addition to that a concept may be defined by one or more other facts that are by definition the case.

Facts that represent knowledge include knowledge about what can be the case as well as knowledge about what is by definition the case. General requirements include states of affairs that shall be the case in a particular context.

Examples of knowledge about facts that are by definition the case are:

  • (any) house <has by definition as part a> roof
  • flat roof <is (by definition) a specialization of> roof

The facts about a concept that are by definition the case only define what the essential characteristics of such a kind of thing are. This means that a thing that does not comply with such a fact is not a thing of that kind. Thus, the definition above states that something without a roof is not a (well formed) house (yet).
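
The inheritance of facts along specialization relations can be sketched as follows. This is a hedged illustration: the concept 'bungalow' and its specialization relation with 'house' are hypothetical additions made for this example, and the names `SPECIALIZATIONS`, `DEFINING_FACTS` and `inherited_facts` are not Gellish terms. The sketch shows that a subtype inherits the facts that are by definition the case for its supertypes.

```python
# Specialization relations: subtype -> direct supertype.
SPECIALIZATIONS = {
    "flat roof": "roof",
    "bungalow": "house",  # hypothetical subtype, added for illustration only
}

# Facts that are by definition the case for a kind of thing.
DEFINING_FACTS = [
    ("house", "has by definition as part a", "roof"),
]


def supertypes(kind):
    """Return the chain of direct and indirect supertypes of a kind of thing."""
    chain = []
    while kind in SPECIALIZATIONS:
        kind = SPECIALIZATIONS[kind]
        chain.append(kind)
    return chain


def inherited_facts(kind):
    """Collect the defining facts of a kind of thing, including those of its supertypes."""
    lineage = [kind] + supertypes(kind)
    return [
        (kind, phrase, right)
        for (left, phrase, right) in DEFINING_FACTS
        if left in lineage
    ]


print(inherited_facts("bungalow"))
```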

An example of a general requirement is that in a particular context it may be specified that:

  • (any) house <shall have as part a> flat roof

For a correct interpretation of the meaning of concepts such as 'house', 'roof' and the various kinds of relations (relation types) it is necessary that they are defined through a definition model in a formal Dictionary-Taxonomy.

Further guidance on the definition of relation types and other concepts is provided in the book: 'Semantic Information Modeling Methodology'.

Queries

Queries are collections of expressions in which one or more related things are replaced by unknowns. The use of unknowns in ordinary expressions in a formal language implies that there is no need for a separate 'query language' such as SQL or SPARQL. An example of a simple query is given below in the paragraph about intentions.

Contextual facts

A proper interpretation of the meaning of an expression requires information about the context in which an expression is made. Therefore, semantic modeling not only requires expressions of possible facts, but it also requires that each expression is accompanied by additional contextual expressions of facts about the main possible fact. Such 'contextual' and 'administrative' facts are called 'contextual facts' about a main fact. This information is also required for the management of information. For example, for proper information management it should be recorded who has created an expression, when that was done, what the status of the expression is, since when it is outdated or replaced by another expression of a fact, in what language it is expressed, etc. Each Gellish expression of a fact is therefore accompanied by such contextual facts. The standard 'Gellish Core' of contextual facts is defined in the document "Gellish Syntax and Contextual Facts, Definition of Universal Semantic Databases and Exchange Messages".

Intentions

One of the contextual facts of an expression expresses the 'communicative intention' of the expression. It specifies for example whether the expression is a 'statement' about something that is really the case (which is the default) or whether it expresses a probability, or a question, a denial, a promise, a confirmation, etc.

For example, the expression:

  • question: the Eiffel tower <is located in> where

can be answered by the following expression:

  • answer: the Eiffel tower <is located in> Paris

But also the following expressions are valid:

  • question: the Eiffel tower <is located in> Paris
  • confirmation: the Eiffel tower <is located in> Paris
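
A minimal sketch of how such questions could be matched against stored statements is given below. It rests on several assumptions that are not taken from Gellish: the set `UNKNOWNS` of terms treated as unknowns, the second stored fact about Paris, and the function names `matches` and `respond` are all hypothetical. The sketch shows that a question is an ordinary expression with an unknown, and that a question without unknowns is answered by a confirmation.

```python
# Terms that this sketch treats as unknowns (an assumption, not a Gellish rule).
UNKNOWNS = {"what", "where", "who"}

# Stored statements as (left object, relation type, right object).
FACTS = [
    ("the Eiffel tower", "is located in", "Paris"),
    ("Paris", "is located in", "France"),  # extra fact, added for illustration only
]


def matches(query, fact):
    """A fact matches when every known term of the query equals the term of the fact."""
    return all(q in UNKNOWNS or q == f for q, f in zip(query, fact))


def respond(query, facts):
    """Return the matching facts with the intention 'answer' or 'confirmation'."""
    has_unknown = any(term in UNKNOWNS for term in query)
    intention = "answer" if has_unknown else "confirmation"
    return [(intention, *fact) for fact in facts if matches(query, fact)]


print(respond(("the Eiffel tower", "is located in", "where"), FACTS))
print(respond(("the Eiffel tower", "is located in", "Paris"), FACTS))
```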

Unique identifiers and names of things

The same thing can be denoted in one language by various terms, such as names, abbreviations, codes and synonym names. This is multiplied by the number of languages. Furthermore, one term can be used to denote different things. Such a term is called a homonym term. Semantic modeling is modeling of meaning, irrespective of which terms are used or in which language the meaning is expressed. To support this, it is a rule in the Semantic Modeling Methodology that each thing is uniquely represented by only one language wide unique identifier (a Gellish UID). This holds for anything, including also relation types, facts, etc.

There are three exceptions to this rule:

  1. Strings.
    Character strings have no UID's as the strings themselves are unique (being particular bit patterns in a specified coding system, such as Unicode).
  2. Reserved domain (allocated on request).
    In such a domain the user organization is responsible for allocating UID's. Overlaps will not occur, but the same concept may exist with another UID. Conventions for reserved ranges of UID's and for allocating UID's are described in the Gellish Modeling Methodology, part 7: "General Principles and Guidelines".
  3. Free domain of UID's (above 30 billion).
    In this domain every user organization is responsible for its own UID's and overlap with others may occur.

The Gellish Dictionary-Taxonomy specifies the UID's of concepts for basic Gellish Formal English. The UID's of additional concepts and of individual things should be specified and managed by the persons who define the new things.

This clarifies why Gellish Expression Tables include also columns for auxiliary facts that relate UID's to names, abbreviations, codes, etc. and that specify in which language and language communities those names find their home base.
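
The separation between UID's and names can be sketched as a small naming table. Everything in this sketch is illustrative: the UID values are invented numbers in the free domain (above 30 billion), not UID's from the Gellish Dictionary-Taxonomy, and the names `NAMES`, `uids_for` and `name_of` are hypothetical. The sketch shows how one UID can carry names in several languages and how a homonym term resolves to more than one UID.

```python
# Naming facts: (UID, language, term). UID values are made up for this example.
NAMES = [
    (30_000_000_001, "English", "tower"),
    (30_000_000_001, "Dutch", "toren"),
    (30_000_000_002, "English", "crane"),  # the lifting machine
    (30_000_000_003, "English", "crane"),  # the bird: a homonym of the term above
]


def uids_for(term, language):
    """Return all UID's that a term denotes in a language (more than one for homonyms)."""
    return sorted({uid for uid, lang, name in NAMES if lang == language and name == term})


def name_of(uid, language):
    """Return a name of the thing in the requested language, or None if there is none."""
    for candidate, lang, name in NAMES:
        if candidate == uid and lang == language:
            return name
    return None


print(uids_for("crane", "English"))
print(name_of(30_000_000_001, "Dutch"))
```

Because expressions refer to UID's, the same expression can be presented with the terms of another language or language community without changing its meaning.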

Universal Semantic Databases and Data Exchange Messages

The universal basic semantic structures make it possible to specify Universal Semantic Databases and Messages in a single common data structure. In a Universal Semantic Database the data (expressions) are integrated with the information from which the meaning of the expressions can be interpreted. For example, a Universal Semantic Database may include as an instance the expression: "statement: the Eiffel Tower <is located in> Paris". Such data can be interpreted without the need to know a meta model about buildings and cities, because the expression includes the standard phrase <is located in>, which is defined in the Gellish Formal English Taxonomic Dictionary (upper ontology section) and is thus computer interpretable. That technology made it possible to define Universal Expression format files that are suitable to contain expressions of any kind of fact, as well as universal databases and data exchange messages that consist of one or more of such universal files. A Gellish Database or Message typically consists of multiple Universal Expression format files. The definition of such a universal Gellish Expression format file defines:

  • A simple syntax (table columns) for the expression of any statement, question, etc. as a classified relation.
  • A few 'bootstrapping relation types'.
  • A number of accessory facts (facts about a main fact).

The definition of Formal English itself is specified in Formal English (especially in the Upper Ontology section of the Taxonomic Dictionary). It uses the 'bootstrapping relation types' and defines a large number of other kinds of relations and concepts. Actual information that is provided by language users is also expressed in Formal English. Both collections of expressions can be stored in the tabular universal Gellish Expression Format.
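
The idea of one tabular structure for every expression can be sketched with Python's standard `csv` module. The column names in `COLUMNS`, the author and date values, and the functions `to_table` and `from_table` are assumptions made for this illustration; the actual columns are defined in the Gellish syntax document and are not reproduced here. The sketch shows a main fact and a few accessory facts stored together in one row, with a statement and a question sharing the same structure.

```python
import csv
import io

# Hypothetical column names: a main fact plus a few accessory (contextual) facts.
COLUMNS = ["intention", "left_object", "relation_type", "right_object",
           "language", "author", "date"]

rows = [
    {"intention": "statement", "left_object": "the Eiffel tower",
     "relation_type": "is located in", "right_object": "Paris",
     "language": "English", "author": "example author", "date": "2024-01-01"},
    {"intention": "question", "left_object": "the Eiffel tower",
     "relation_type": "is located in", "right_object": "where",
     "language": "English", "author": "example author", "date": "2024-01-02"},
]


def to_table(expressions):
    """Serialize expressions to one table in which every row has the same columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(expressions)
    return buffer.getvalue()


def from_table(text):
    """Read the expressions back from the tabular form."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


table = to_table(rows)
print(table)
```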

The Gellish Semantic Modeling Methodology includes the definition of a Universal Semantic Database structure (see the document: "Gellish Syntax and Contextual Facts - Definition of Universal Semantic Databases and Data Exchange Messages" in the download area). The ability to include meaning in such semantic databases facilitates building distributed databases that enable applications to interpret the meaning from the content. This implies that Gellish databases can be integrated directly, because all applications use the same standard language and thus use the same standard relation types and the same definitions of the concepts in the Gellish Taxonomic Dictionary. This also implies that in general Gellish Databases have a wider applicability than relational or object oriented databases.