Gellish Syntax and Contextual Facts

Definition of the Gellish Expression Format
for Universal Semantic Databases, Data Exchange Messages and Queries

Gellish Formal English is a universally applicable language. This makes it possible to define universal databases, provided that a universal syntax or data model can be defined. Such a syntax is indeed possible on the basis of the universal basic semantic patterns.
A Gellish Universal Semantic Database or Gellish Data Exchange Message or Query consists of a collection of Gellish Expressions with a uniform structure as briefly described in this section.
Every Gellish expression is an expression of a 'main fact' (a main statement, proposition or query) and a number of 'contextual facts' that are relevant for the correct interpretation of the main fact. Together, the contextual facts form the 'Gellish collection of contextual facts'. This collection is comparable with the Dublin Core metadata set, and it is intended to be a complete set of contextual facts. Each collection is related to a single main fact and does not imply additional main facts. The Gellish collection of contextual facts is briefly described below.

The structure or syntax of Gellish expressions is also universally applicable and does not require a dedicated data model, nor an extensive database design. The Gellish Expression Format consists of a structure that can be implemented in one universal Gellish Expression Table or a semantically equivalent structure, e.g. in RDF/XML.
The document 'Gellish Syntax and Contextual Facts', a definition of the Gellish Expression Format for Universal Semantic Databases, Data Exchange Messages and Queries, defines the full collection of contextual facts that corresponds to columns in a Gellish Expression Table, including a detailed definition of each contextual fact.

Each Gellish Database, Exchange Message or Query consists of one or more Gellish Expression Tables or equivalent formats. All of them have basically the same structure, which is standardized and independent of any application system. This differs from conventional databases, which usually have proprietary data structures and many database tables that all differ from each other. Each collection of Gellish Expressions shall contain at least the obligatory contextual facts of one of the subsets of contextual facts that are defined in the Gellish Syntax definition document, as summarized below.
The Gellish Expressions shall comply with the grammar and the dictionary of Gellish Formal English (or of a Gellish variant in any other natural language). The standardized format, combined with the Gellish formal language and its management of unique identifiers (UIDs), ensures that the content of various collections of expressions can be combined and used as an integrated collection without the need for data harmonization or conversion. This makes it possible to combine an arbitrary number of collections of Gellish Expressions into one (real or virtual) database. Such a database may be centralized, but it can also be distributed. The consistency of the various collections of expressions can be verified by software. Furthermore, a Gellish query can be executed on each independent collection of expressions, after which the results of the query can be combined and presented together to a user. The collections then act as one distributed database.
The various Gellish collections of expressions all have basically the same format, with identical column definitions, except that a collection may use a subset of the contextual facts, as appropriate. Preferred collections of contextual facts are defined in standard Gellish subsets of contextual facts.
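The idea of running one query on several independent collections and merging the answers can be sketched as follows. This is a minimal Python illustration, not Gellish software. It assumes that each collection is a list of main facts reduced to four UIDs (fact, left hand object, relation type, right hand object). The names `Fact`, `query_collection` and `query_collections` are hypothetical, and so are fact 202 and object 103. UIDs 101, 102, 201 and 5138 are taken from the Eiffel tower example in this document.

```python
from typing import Iterable, NamedTuple, Optional


class Fact(NamedTuple):
    """A main fact reduced to its core UIDs (an illustrative subset of columns)."""
    fact_uid: int
    left_uid: int
    rel_type_uid: int
    right_uid: int


def query_collection(collection: Iterable[Fact],
                     left_uid: Optional[int] = None,
                     rel_type_uid: Optional[int] = None,
                     right_uid: Optional[int] = None) -> list[Fact]:
    """Return the facts that match a pattern; None acts as a wildcard."""
    return [f for f in collection
            if (left_uid is None or f.left_uid == left_uid)
            and (rel_type_uid is None or f.rel_type_uid == rel_type_uid)
            and (right_uid is None or f.right_uid == right_uid)]


def query_collections(collections: Iterable[Iterable[Fact]], **pattern) -> list[Fact]:
    """Run the same query on each independent collection and merge the results.

    Because all collections share one format and one set of UIDs, merging is
    a plain union; a fact found in several collections is kept once.
    """
    merged: dict[int, Fact] = {}
    for collection in collections:
        for fact in query_collection(collection, **pattern):
            merged[fact.fact_uid] = fact
    return sorted(merged.values())


# Two independent collections; fact 202 and object 103 are hypothetical.
site_a = [Fact(201, 101, 5138, 102)]  # the Eiffel tower <is located in> Paris
site_b = [Fact(201, 101, 5138, 102), Fact(202, 103, 5138, 102)]
results = query_collections([site_a, site_b], rel_type_uid=5138, right_uid=102)
```

Deduplicating by fact UID is a design choice of this sketch. It relies on fact UIDs being unique across collections, which is what the UID management described above is intended to guarantee.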

A Gellish Database with Gellish Expressions may be implemented in various formats. It may have a tabular form, implemented as flat Unicode tables, SQL databases or even XLS spreadsheet tables. The tabular form may also be converted into non-tabular implementation forms, such as RDF/XML triple stores.

2. Limitations of conventional databases

Conventional databases typically consist of many tables, each of which is composed of a number of columns. The definition of those tables and columns determines the storage capabilities of the database, whereas the relations between the columns define the kinds of facts that can be stored in such a database. Those columns and relations determine the database structure that defines the expression capabilities of the database. Similar rules apply to the structure of data exchange files and thus to the information that is exchanged in electronic data files.
This conventional database technology has some major constraints:

  • When data was not covered during the database design and thus is not included in the data model, then such data cannot be stored in the database nor exchanged via such a data file structure.
  • Different databases have different data structures, so data in one database cannot be integrated with data from other databases, nor exchanged between databases, without dedicated data conversion.
  • A database modification or extension requires redesign of the database structure, modification of software and data conversion, which makes it a relatively complicated and costly exercise.

Another characteristic of conventional databases is that hardly any international standards are available or used for the content of the databases, that is, the data that is entered by their users. This typically means that local conventions are applied to limit the diversity of data that may be entered in those databases. Because local conventions usually differ from each other, data that are entered in one database cannot be compared or integrated with data in other databases, even if the database structures are the same and even if the application domain of the databases is the same. For example, a company may have several implementations of the same system at various sites for storing data about equipment. Even so, the performance data about a type of equipment at one site still cannot be compared with the performance data for the same type of equipment at another site, because the equipment types have different names and the properties are also different.

3. Gellish Expression Format Definition

The document 'Gellish Syntax and Contextual Facts' defines the full collection of main and contextual facts in each Gellish Expression. Such a collection can be a part of a Gellish Database, a Gellish Message or a Gellish Query. The document also defines a number of standardized subsets for usage in applications that do not require the full number of contextual facts. The definition of the Gellish Expression Format is also included in the book 'Semantic Modeling in Formal English'.
One of those subsets, the Business Model subset, is suitable for nearly all database content and data exchange use cases that describe knowledge and propositions. Its application range includes business communication about designs (imaginary objects) as well as about real world objects (observed individual objects) during their lifecycle, and about enquiries, answers, orders, confirmations, etc. This subset is a superset (indicated in bold) of the Product Model subset, so it can also be used for knowledge about classes of objects.
The subsets contain varying numbers of standard kinds of contextual facts, up to more than 30.

A summary of the syntax definition document is given below.

 

The Gellish Expression collection header definition

Each collection of Gellish Expressions, typically in the form of a table, should have a header or table definition that defines the facts (columns in the table), and a body of expressions of main facts and their contextual facts, typically with one row in the table per expression.
Each expression in a collection includes a predefined set of contextual facts. A table can therefore consist either of the complete set of columns or of a subset of columns. The document defines a number of standard subsets of contextual facts.
Each contextual fact or column has a column ID and a column name and has a meaning as defined below.
Note that the presence of a value in a column field implies one or more relations with values in other columns. The semantics of these implied relations are specified in the definitions of the table columns. Those relations define the (accessory) facts about the main fact!

If a collection is implemented in a table in a spreadsheet or ASCII or Unicode file, then the table starts with a header of three lines, as follows:

  • The first line contains a sequence of fields, of which the first four (A1, A2, A3 and A4) shall contain the following text:

A1 = 'Gellish'
A2 = natural language of the expressions in the table (default 'English')
A3 = 'Version:'
A4 = version number of the applicable Gellish dictionary
A5 = date of the release of the facts in this table (optional)
followed by free text fields.

  • The second line contains the column IDs, which are standard numbers, although arbitrarily chosen. They allow the columns to be presented in a different sequence without loss of meaning (the numbers below correspond to those column IDs).
  • The third line contains human readable text in every column field providing a short name of the column. This name is free text.
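A header of this kind could be written and read back as shown below. This Python sketch assumes a tab-separated Unicode text file. The functions `write_gellish_header` and `read_gellish_header` are hypothetical, and so is the dictionary version '9.0'. The column IDs and names are the main-fact columns (1, 2, 60, 15) that are listed further on.

```python
import csv
import io
from typing import Optional


def write_gellish_header(out, language: str, dictionary_version: str,
                         columns: list[tuple[int, str]],
                         release_date: Optional[str] = None) -> None:
    """Write the three header lines of a Gellish Expression Table as TSV.

    Line 1: A1='Gellish', A2=language, A3='Version:', A4=version, A5=date (optional)
    Line 2: the column IDs
    Line 3: short human-readable column names (free text)
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    first = ["Gellish", language, "Version:", dictionary_version]
    if release_date is not None:
        first.append(release_date)
    writer.writerow(first)
    writer.writerow([col_id for col_id, _ in columns])
    writer.writerow([name for _, name in columns])


def read_gellish_header(lines: list[str]):
    """Return (language, version, column IDs, column names) from the header."""
    rows = list(csv.reader(lines[:3], delimiter="\t"))
    first, ids, names = rows
    if first[0] != "Gellish" or first[2] != "Version:":
        raise ValueError("not a Gellish Expression Table header")
    return first[1], first[3], [int(i) for i in ids], names


buf = io.StringIO()
write_gellish_header(buf, "English", "9.0",  # hypothetical dictionary version
                     [(1, "UID of fact"), (2, "UID of left hand object"),
                      (60, "UID of relation type"), (15, "UID of right hand object")])
header = buf.getvalue()
```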
 

The Gellish expression collection body

The lines (rows) in a collection of Gellish expressions are independent of each other and thus the lines may be sorted in any sequence, without loss of semantics (meaning).

Each line in a collection of Gellish expressions (which in a spreadsheet table starts on the fourth line) expresses a group of facts, which consists of a 'main fact' and a number of 'contextual facts' that are defined as follows.

Main fact.
A main fact is expressed by a combination of the following objects (the column IDs are given in parentheses):

  • A UID of a main fact (1)
  • A UID of a left hand object (2)
  • A UID of a relation type (60)
  • A UID of a right hand object (15)
  • A UID of a scale (unit of measure) (66)
  • A UID of an intention (5)
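Body rows are interpreted through the column IDs of the second header line, not through column positions, so a reader can locate the main-fact fields by ID. Below is a minimal Python sketch. It assumes string field values from a text file, and `MainFact` and `parse_main_fact` are hypothetical names. Empty fields are simply left as None here. They are not replaced by the defaults that the syntax document defines for optional columns.

```python
from dataclasses import dataclass
from typing import Optional

# Column IDs of the main fact, as listed above, mapped to field names.
MAIN_FACT_COLUMNS = {
    1: "fact_uid",
    2: "left_uid",
    60: "rel_type_uid",
    15: "right_uid",
    66: "uom_uid",
    5: "intention_uid",
}


@dataclass(frozen=True)
class MainFact:
    fact_uid: int
    left_uid: int
    rel_type_uid: int
    right_uid: int
    uom_uid: Optional[int] = None        # scale (unit of measure), if any
    intention_uid: Optional[int] = None  # left as None when the field is empty


def parse_main_fact(column_ids: list[int], values: list[str]) -> MainFact:
    """Build a MainFact from one body row, using the column-ID header line.

    Columns are located by ID rather than by position, so any column order
    works; columns that are not part of the main fact are ignored.
    """
    fields = {}
    for col_id, raw in zip(column_ids, values):
        name = MAIN_FACT_COLUMNS.get(col_id)
        if name is not None and raw.strip() != "":
            fields[name] = int(raw)
    return MainFact(**fields)


# The same fact, stored with two different column sequences.
a = parse_main_fact([1, 2, 60, 15], ["201", "101", "5138", "102"])
b = parse_main_fact([15, 60, 2, 1], ["102", "5138", "101", "201"])
```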

Prime contextual facts.

The prime contextual facts are represented by the following table columns, each of which implies an expression by a triple of objects (which are implicitly classified). The table columns are:

  • A UID of a left hand kind of role (72)
  • A UID of a right hand kind of role (74)
  • A pair of left hand object cardinalities (44)
  • A pair of right hand object cardinalities (45)
  • A UID of the accuracy of a quantification (76)
  • A UID of a pick list for the qualification of aspects (70)
  • A UID of the validity context for a fact (19)
  • A partial textual definition of a concept or individual thing (65)
  • A full textual definition of a concept or individual thing (4)
  • A textual description of a main fact (42)
  • Remarks on the expression of a main fact (14)
  • Approval status of the expression of a main fact (8)

Secondary contextual facts.

The secondary contextual facts are represented by the following table columns, each of which implies a triple of classified objects. These contextual facts form the context for the validity of the UIDs and of the names for objects that are identified by their UIDs:

  • A reason for latest change of status
  • A UID of the successor of the fact, in case it has the status 'replaced'
  • UID of creator of fact
  • Date-time of start of validity of the fact
  • Date-time of start of availability of the expression
  • Date-time of creation of copy
  • Date-time of latest change of the expression
  • UID of addressee of the expression
  • References
  • UID of the expression of the fact (Line UID)
  • UID of a collection of facts to which the fact belongs
  • A presentation sequence in which the expressions can be presented

In a tabular implementation, the columns with UIDs are accompanied by columns with a name for the thing that is represented by the UID.

Field formats and optionality

Several columns contain unique identifiers (UIDs). Each UID should preferably be represented by a 64-bit integer (8-byte, Int64 or 'bigint'), and only positive values shall be used. Using an unsigned integer (which only allows positive values) is not recommended, because SQL only provides the bigint datatype, which is signed.
Most other columns contain character string values. For database implementations, the document indicates for each such column whether it has a fixed or a variable length (e.g. nvarchar or varchar) or whether the string is stored externally (data types ntext and text). It also indicates whether the cells may contain Unicode.
Fields in columns that are indicated as optional may be left empty, in which case the indicated default value is applicable. Otherwise a field value is obligatory.
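The UID rule (positive values in a signed 64-bit integer) and the rule for optional fields can be sketched in Python with the standard `sqlite3` module, whose INTEGER storage is a signed 64-bit value. The table and column names, `parse_uid` and the CHECK constraints are illustrative assumptions. They are not the column definitions of the syntax document.

```python
import sqlite3
from typing import Optional

INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer (SQL bigint)


def parse_uid(raw: str, optional: bool = False,
              default: Optional[int] = None) -> Optional[int]:
    """Parse a UID field: a positive integer that fits in a signed bigint.

    An empty field is accepted only for optional columns, in which case the
    column's default value (None if there is none) is returned.
    """
    text = raw.strip()
    if text == "":
        if optional:
            return default
        raise ValueError("obligatory UID field is empty")
    uid = int(text)
    if not 0 < uid <= INT64_MAX:
        raise ValueError(f"UID {uid} is not a positive signed 64-bit integer")
    return uid


# Hypothetical minimal table; the CHECK constraints enforce positive UIDs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gellish_expression (
        fact_uid     BIGINT NOT NULL CHECK (fact_uid > 0),
        left_uid     BIGINT NOT NULL CHECK (left_uid > 0),
        rel_type_uid BIGINT NOT NULL CHECK (rel_type_uid > 0),
        right_uid    BIGINT NOT NULL CHECK (right_uid > 0),
        uom_uid      BIGINT CHECK (uom_uid > 0)  -- optional: NULL allowed
    )""")
row = [parse_uid("201"), parse_uid("101"), parse_uid("5138"),
       parse_uid("102"), parse_uid("", optional=True)]
conn.execute("INSERT INTO gellish_expression VALUES (?, ?, ?, ?, ?)", row)
conn.commit()
```

In SQLite a CHECK constraint that evaluates to NULL passes, so the optional `uom_uid` column may stay empty while non-positive values are still rejected.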

Further details of the column definitions are given in the document 'Gellish Syntax and Contextual Facts'.

Language Definition Principles (base ontology)

The base language definition section (or base ontology) of the Gellish Formal English Taxonomic Dictionary defines the core of the Gellish languages, such as Formal English and Formal Dutch (Formeel Nederlands). That section primarily defines the kinds of facts that can be expressed in Gellish. Secondarily, it defines the generic concepts at the top of the specialization hierarchy (the subtype-supertype hierarchy or Taxonomy) of concepts in the Gellish Taxonomic Dictionary.

The base language definition section consists of a number of formal language expressions and typically has the form of a Gellish Expression Table, although other syntaxes are possible. The structure (columns) of that table is defined in the document 'Definition of Universal Semantic Databases and Data Exchange Messages'. Each line (row) in that table expresses a main statement or idea and a number of accessory facts. (Note: when we talk about facts, we also mean other kinds of ideas, such as opinions.)
Each main statement is denoted as a separate 'object' with its own language-independent unique identifier (the UID of the idea). Furthermore, each main statement is expressed in Gellish according to the semantic principle, as a (collection of) binary relation(s) between objects. The following text therefore describes how a main statement, and thus a relation, is expressed in Gellish. The accessory facts are facts about the main statement (such as its status, date of creation and author) and facts that specify names of concepts or contexts for the validity and interpretation of the expression.
Below we focus on the expression of main statements as binary relations between objects.

We distinguish binary relations from higher order relations. Higher order relations are separate objects that are related to more than two involved 'things', as will be explained later.

1. Binary relations

The basic kinds of relations (also called ‘relation types’) are binary relations. A binary relation is a relation that relates two things. A binary relation can be used for expressing a fact, statement or opinion that one thing is related to one other thing.
For example, the fact that 'the Eiffel tower is located in Paris' can be expressed by a binary relation, although its expression requires seven words in English. The expression uses the relation type 'location relation', which is denoted in Gellish Formal English by the standard phrase <is located in>. That relation type has the language-independent unique identifier (UID) 5138. The Gellish Formal English expression is therefore the following statement:

statement:  the Eiffel tower <is located in> Paris

When the Eiffel tower has UID 101, Paris has UID 102, and the whole fact has UID 201, then the language independent expression becomes:

statement 201: 101 5138 102

To support human readability this is preferably expressed in a Gellish English database or exchange file/message as:

statement 201: 101 the Eiffel tower 5138 is located in 102 Paris

or in a Gellish Expression Table form as:

UID of fact | UID of left hand object | Name of left hand object | UID of relation type | Name of relation type | UID of right hand object | Name of right hand object
201 | 101 | the Eiffel tower | 5138 | is located in | 102 | Paris
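The notations above express the same fact and differ only in whether names accompany the UIDs. Below is a Python sketch that produces each form from the same four UIDs. The `NAMES` lookup and the function names are hypothetical. The UIDs and names are those of the example.

```python
# Names for the UIDs of the example (a tiny stand-in for the dictionary).
NAMES = {101: "the Eiffel tower", 102: "Paris", 5138: "is located in"}


def formal_english(left: int, rel: int, right: int) -> str:
    return f"{NAMES[left]} <{NAMES[rel]}> {NAMES[right]}"


def language_independent(fact: int, left: int, rel: int, right: int) -> str:
    return f"statement {fact}: {left} {rel} {right}"


def human_readable(fact: int, left: int, rel: int, right: int) -> str:
    return (f"statement {fact}: {left} {NAMES[left]} "
            f"{rel} {NAMES[rel]} {right} {NAMES[right]}")


def table_row(fact: int, left: int, rel: int, right: int) -> list[str]:
    """One row of the seven-column Gellish Expression Table shown above."""
    return [str(fact), str(left), NAMES[left], str(rel), NAMES[rel],
            str(right), NAMES[right]]


def parse_language_independent(text: str) -> tuple[int, int, int, int]:
    """Inverse of language_independent(), e.g. 'statement 201: 101 5138 102'."""
    head, body = text.split(":", 1)
    fact = int(head.split()[1])
    left, rel, right = (int(x) for x in body.split())
    return fact, left, rel, right
```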

The base language definition section of the Gellish dictionary defines the semantics of all kinds of binary relations (relation types) that belong to the Gellish language. Such a definition recognizes that each of the two objects that are related by a binary relation has a role of a particular kind in the relation. For example, the Eiffel tower has a role that can be classified as ‘located’, whereas Paris has a role that can be classified as ‘locator’. Furthermore, the semantics of each relation type can be defined more precisely by specifying which kinds of things may play the required roles. For example it may be specified that a role of located as well as a role of locator can only be played by a physical object, because only physical objects can be located in space. This means that the semantic definition of a relation type is specified in five steps in the base language definition as follows:

  1. It specifies for each relation type that it <is a kind of> a more general relation type, and it provides at least a textual definition of the relation type. The specialization relations together create a subtype-supertype hierarchy of relation types (a taxonomy of relation types). The result for 'location relation' is that it is indirectly defined to be a subtype of binary relation between individual things. In turn, that is one of the subtypes of relation, which is an indirect subtype of the top concept called 'anything'.

  2. It specifies for each binary relation type what the first and the second kind of role is that is required for the normal sequence in the Gellish expression.

  3. It specifies for each kind of role what kind of thing may play such a kind of role.

  4. It specifies the normal Gellish phrase and the inverse Gellish phrase. The latter phrase requires that the role players appear in the inverse sequence.
    For example: Paris <is the location of> the Eiffel tower.

  5. It defines each kind of role and each kind of thing that can play a role by specifying its direct supertype(s) and providing a textual definition for each of them. These specialization relations together create a subtype-supertype hierarchy of roles and other concepts. A specialization relation implies that the subtype concept inherits all facts that are true for its supertype concept(s). This means that a relation type inherits the roles that are required for its supertype concept(s). The definitions in the base language definition make use of this mechanism in two ways: 1. For some relation types none or only one kind of role is specified, which means that the inherited kind of role is applicable and 2. Every specified kind of role shall be a subtype of the kind of role that is specified for the supertype relation type. 
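Steps 1, 2, 4 and 5 can be sketched in Python as a small specialization hierarchy with role inheritance. The concepts 'location relation', 'located' and 'locator' and the two phrases come from the text above. 'H relation' and 'H located' are hypothetical concepts that are added only to show inheritance. Placing the roles directly under 'role' is also an assumption and does not reflect the actual Gellish hierarchy. Step 3 (allowed role players) is omitted for brevity, and only one supertype per concept is assumed.

```python
from typing import Optional

# Direct supertype of each concept (single supertype assumed for brevity).
SUPERTYPE = {
    "location relation": "binary relation between individual things",
    "binary relation between individual things": "relation",
    "relation": "anything",
    "located": "role",                  # placement of roles: an assumption
    "locator": "role",
    "role": "anything",
    "H relation": "location relation",  # hypothetical subtype
    "H located": "located",             # hypothetical subtype role
}

# Explicitly specified [role-1, role-2] per relation type; None = not specified.
ROLES = {
    "location relation": ["located", "locator"],
    "H relation": ["H located", None],
}

# Normal and inverse phrases (step 4).
PHRASES = {"location relation": ("is located in", "is the location of")}


def is_subtype_of(concept: Optional[str], ancestor: str) -> bool:
    """True if concept equals ancestor or is (indirectly) a specialization of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUPERTYPE.get(concept)
    return False


def required_roles(rel_type: str) -> list[Optional[str]]:
    """Resolve role-1 and role-2, inheriting unspecified roles from supertypes."""
    roles: list[Optional[str]] = [None, None]
    concept: Optional[str] = rel_type
    while concept is not None and None in roles:
        for i, role in enumerate(ROLES.get(concept, [None, None])):
            if roles[i] is None:
                roles[i] = role
        concept = SUPERTYPE.get(concept)
    return roles


def roles_consistent(rel_type: str) -> bool:
    """Rule 2 of step 5: every specified role specializes the supertype's role."""
    supertype = SUPERTYPE.get(rel_type)
    if supertype is None:
        return True
    inherited = required_roles(supertype)
    own = ROLES.get(rel_type, [None, None])
    return all(r is None or i is None or is_subtype_of(r, i)
               for r, i in zip(own, inherited))


def express(left: str, rel_type: str, right: str, inverse: bool = False) -> str:
    """Use the inverse phrase with the role players in inverse sequence."""
    normal, inv = PHRASES[rel_type]
    return f"{right} <{inv}> {left}" if inverse else f"{left} <{normal}> {right}"
```

Here 'H relation' specifies only its first role, so it inherits 'locator' as its second role from 'location relation'.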

In principle, the Gellish language makes it possible to express any kind of fact. This requires that various kinds of binary relations are defined. These include:

  • Kinds of relations that classify relations between individual things. They express real or imaginary ‘facts’, including propositions, statements, questions, answers, denials, confirmations, etc. about a state of affairs.
    For example: John Doe <is an employee of> Ford

  • Relations between kinds of things. These include relations that express 'facts' that are by definition the case for all members of the related kinds of things. They also include expressions of knowledge about what can be the case, or of requirements about what shall be the case (in a particular context), for members of the kinds.
    For example: air bag <can be a part of a> car
    and: car <is a specialization of> vehicle.

  • Relations between individual things and kinds of things. This includes classification relations, but also relations between a kind of thing and an individual object, such as the expression that things of a particular kind are made by a particular company.
    For example: S40 <is made by> Volvo A.G.

  • Relations between individual things and collections of things.
    For example: V-101 <is an element of> stock of valves in company X 

  • Relations between collections.
    For example: collection A1 <is a subset of> collection A.

2. Higher order relations

Occurrences, such as activities, processes and events, and physical laws are typical examples of higher order relations, because they describe interactions or correlations between more than two things. Such higher order relations are expressed in Gellish by defining the relation as a separate object and by specifying a collection of elementary binary relations with that higher order relation. Each of those elementary relations specifies the role that a particular thing plays in the higher order relation.
For example, according to the IDEF0 terminology, an activity such as the construction of the Eiffel tower typically has an input, an output, a control (signal) and a mechanism, usually being a performer, enabler or tool, whereas Gellish also recognizes additional roles. Such a higher order relation is then specified in Gellish by a collection of expressions of partial facts, each of which describes one of the involved physical objects and its relation to the activity. For example:

the construction of the Eiffel tower has as input x tons of steel bars
the construction of the Eiffel tower has as output the Eiffel tower
the construction of the Eiffel tower has as designer Mr. Eiffel
etc.
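The pattern of one object for the occurrence plus one elementary binary relation per involved thing can be sketched in Python. The class and method names are hypothetical, and the phrases are copied from the example. Writing the phrases in angle brackets only mirrors the notation used earlier and does not claim that they are the standard Gellish phrases. A real implementation would identify the relation types by UID rather than by string.

```python
from dataclasses import dataclass, field


@dataclass
class HigherOrderRelation:
    """An occurrence (activity, process, event) modelled as an object of its own."""
    name: str
    # Elementary binary relations: (relation phrase, involved thing).
    involvements: list[tuple[str, str]] = field(default_factory=list)

    def involve(self, phrase: str, thing: str) -> None:
        self.involvements.append((phrase, thing))

    def things_with(self, phrase: str) -> list[str]:
        """Return the things that are involved through a given kind of relation."""
        return [t for p, t in self.involvements if p == phrase]

    def expressions(self) -> list[str]:
        """Render each elementary relation as a binary expression."""
        return [f"{self.name} <{p}> {t}" for p, t in self.involvements]


construction = HigherOrderRelation("the construction of the Eiffel tower")
construction.involve("has as input", "x tons of steel bars")
construction.involve("has as output", "the Eiffel tower")
construction.involve("has as designer", "Mr. Eiffel")
```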

Each of these elementary kinds of binary relation types is defined in the base language definition section in the same way as the ordinary binary relation types.

3. Bootstrapping relation types

All facts in the base language definition are expressed using just six relation types. The facts in the base language definition can therefore be interpreted as soon as the semantics of those six relation types are known, so software that interprets the content of the table should be provided with the meaning of these bootstrapping relation types. All other relation types of the Gellish language are defined in the base language definition using those six relation types. Consequently, other Gellish Database tables can be interpreted only after the base language definition section has been imported and interpreted.
The bootstrapping relation types are:

1146 <is a specialization of>
4731 <requires as role-1 a>
4733 <requires as role-2 a>
4714 <can have a role as a>
1981 <is a synonym of>
1986 <is an inverse of>
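The bootstrapping idea can be sketched in Python. Software that knows only the six relation types above reads the base language definition, learns which concepts are relation types, and only then interprets other tables. UIDs 800001, 800002, 900000 and 900010 and the function names are hypothetical. Treating a relation type as 'known' once it is a specialization of 'relation' is a simplification of full interpretation, because roles and role players are ignored.

```python
from typing import Iterable

# The six bootstrapping relation types (UID -> phrase), as listed above.
BOOTSTRAP = {
    1146: "is a specialization of",
    4731: "requires as role-1 a",
    4733: "requires as role-2 a",
    4714: "can have a role as a",
    1981: "is a synonym of",
    1986: "is an inverse of",
}
SPECIALIZATION = 1146
RELATION_ROOT = 900_000  # hypothetical UID for the concept 'relation'

Fact = tuple[int, int, int, int]  # (fact UID, left UID, relation type UID, right UID)


def uses_only_bootstrap_types(facts: Iterable[Fact]) -> bool:
    return all(rel in BOOTSTRAP for _, _, rel, _ in facts)


def learn_relation_types(base_facts: list[Fact]) -> set[int]:
    """Collect all concepts that are (indirectly) specializations of 'relation'."""
    known = {RELATION_ROOT}
    changed = True
    while changed:  # repeat until no new relation types are found
        changed = False
        for _, left, rel, right in base_facts:
            if rel == SPECIALIZATION and right in known and left not in known:
                known.add(left)
                changed = True
    return known


def can_interpret(facts: Iterable[Fact], known: set[int]) -> bool:
    return all(rel in BOOTSTRAP or rel in known for _, _, rel, _ in facts)


# Hypothetical base-definition excerpt: 900_010 <is a specialization of> relation,
# and 5138 (location relation) <is a specialization of> 900_010.
base = [(800_001, 900_010, 1146, 900_000), (800_002, 5138, 1146, 900_010)]
domain = [(201, 101, 5138, 102)]  # the Eiffel tower <is located in> Paris
```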

The definition of these relation types is provided in the base language definition section itself. 

4. Other kinds of relations

The main content of the base language definition section of the taxonomic dictionary is the definition of a hierarchy (taxonomy) of kinds of relations, their required roles and allowed kinds of role players.

The kinds of relations are arranged in categories as is illustrated in the following figure.


 

The categories are for relations between individual things, relations between kinds of things, relations between an individual thing and a kind of thing and relations between collections.

 

The Formal English language

Formal English is a formalized language for the expression of information, knowledge and requirements. It is also used for the storage and exchange of data in an open, system-independent, human- and computer-interpretable way. Formal English is derived from natural English and is based on various international standards. These include modeling sources such as ISO 10303-42, 50, 202 and 221, ISO 15926 and the W3C sources RDF, RDFS and OWL. They also include terminology sources such as ISO 16354, ISO 1998, IEC 60050 and many others. Gellish includes rules for its own extension.

Formal English consists of three components:

  • The formal language definition (the upper ontology).
    This component defines how thoughts (ideas, statements about facts, etc.) and queries about any object, activity or aspect can be specified in a consistent, computer-interpretable way. It also defines how a number of 'contextual facts' about every main fact can be specified. This makes it possible to store and exchange information and knowledge (including documents and 3D models) in a neutral format, to apply logical reasoning and querying, and to manage all those data and documents. The language definition is based on basic semantic patterns that specify what kinds of relations are required to express meaning. The core of the language specification is formed by the definition of standard kinds of relations, their required roles and the kinds of things that can play such roles in those kinds of relations. The formal languages enable automated translation of expressions between natural languages. The language also covers the expression of dialogs, such as queries and response messages.

  • The Gellish Syntax and Contextual Facts - Definition of the Gellish Expression Format.
    This component defines how every Gellish-enabled database, Data Exchange File or Query and Response Message can be structured and which contextual facts (metadata) can be added. It thus defines a universal data structure. This universal structure, in combination with a common formal language for the content, enables the integration of data from multiple sources. It also enables the seamless cooperation of multiple central or distributed databases as if they were one consistent database. A Gellish-enabled database or data exchange file is a collection of semantic expressions. It therefore includes not only ordinary data, but also the definitions of the used concepts and system-independent rules for the interpretation of the stored expressions.

  • The Taxonomic Dictionary of Formal English.
    The Formal English language definition includes an extensive electronic smart Taxonomic Dictionary. It consists of a generic core section (the Upper Ontology section), which defines common general concepts and kinds of relations, and of various Domain Taxonomic Dictionaries. The dictionary as a whole contains definitions of concepts and terminology (including synonyms as well as homonyms) from a variety of application domains. The dictionary is called an electronic Smart Dictionary because it contains human- as well as computer-interpretable knowledge, thanks to explicit relations between the defined concepts. For example, the concepts in the dictionary are arranged in a strict subtype-supertype hierarchy (also called a taxonomy), which enables inheritance of characteristics from generic concepts to more specific subtype concepts. The core taxonomic dictionary and the domain taxonomic dictionaries together form one consistent whole. The taxonomic dictionaries are extensible with other domain-specific concepts and terminology as well as with proprietary concepts and terminology. New specialized domain dictionaries may be added. For example, standard product types and manufacturers' models can be included by defining them as further specialized proprietary extensions of domain dictionaries, and thus as extensions of the definition of the formal language.

The formal languages have a very rich semantic expression capability that exceeds the capabilities of conventional databases and other formal languages. This is due to the large variety of kinds of relations, with detailed subtypes, that make it possible to express thoughts with semantic precision.

Expressions in the formal languages can be stored and exchanged in various kinds of implementation environments. For example they can be implemented in the form of SQL Database tables, XML files and messages using the XML Schema definition, RDF format, CSV files as well as in standardized tables in ordinary Excel spreadsheet form.

Proprietary and public extensions

Licensees as well as non-licensees of Gellish can create definitions of new concepts and objects. They can apply these themselves directly as proprietary extensions, and they can propose them for consideration and inclusion as public concepts in the taxonomic dictionary of the formal languages. After the Gellish language manager has verified and approved their quality, such additions receive a Gellish Unique Identifier (UID), and the public ones are added to the taxonomic dictionary and published for general use. Proprietary extensions can be certified, but will not be published by Gellish.net. Proposals of licensees will be handled in accordance with a contractual agreement.

The taxonomic dictionary of Formal English

The English variant of the Gellish family of formal languages includes an electronic taxonomic dictionary with ordinary English terms, synonyms and abbreviations, extended with many technical terms, often consisting of multi-word terms. This defines a lot of concepts that are not available in conventional dictionaries. The taxonomic dictionary includes not only definitions of concepts of things, aspects, processes and activities but also definitions of kinds of relations that are used to create expressions. The dictionary complies with the requirements of ISO 16354.

The Taxonomic Dictionary of Formal English contains the following domain taxonomic dictionaries:

  • Generic concepts and relation types (TOPini)
  • Units of Measures, scales and currencies
  • Activities, Events, Processes and Functions
  • Physical objects of various kinds, such as:
    - Static equipment, process units and piping
    - Buildings, civil and structural items
    - Electrical and Instrumentation, Control and Valves
    - Rotating equipment, Transport equipment and Solids Handling
    - Roles of physical objects (usages)
  • Aspects, Properties, Qualities and Roles of aspects
  • Materials of constructions (steel and non-steel), Fluids and Waves
  • Documents and Identification, Information, Symbols and Graphics
  • Geographic objects, including countries
  • Biology
  • Organizations and Procurement
  • Mathematics, Geometry and Shapes
  • Waste water and water treatment

Formal Dutch (Nederlands)

The dictionary of Formal Dutch (Formeel Nederlands) defines the same concepts as the dictionary of Formal English and uses the same unique identifiers (UIDs) to represent those concepts across languages. This supports automatic translation of expressions and models from Dutch to English and vice versa.
An example of a domain Taxonomic Dictionary is the Dutch dictionary for Buildings and Civil technology (Formeel Nederlands Woordenboek voor de Bouw).
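Because both dictionaries use the same UIDs, an expression can be translated by mapping the names in one language to UIDs and then mapping those UIDs to the names in another language. Below is a minimal Python sketch. The function names are hypothetical. The Dutch phrase used for relation type 5138 is an illustrative assumption and is not necessarily the official Formal Dutch phrase.

```python
# Names per language, keyed by the shared language-independent UIDs.
NAMES = {
    "English": {101: "the Eiffel tower", 102: "Paris", 5138: "is located in"},
    # The Dutch phrase for 5138 is a hypothetical rendering.
    "Dutch": {101: "de Eiffeltoren", 102: "Parijs", 5138: "bevindt zich in"},
}


def parse(text: str, language: str) -> tuple[int, int, int]:
    """Map 'left <relation> right' in one language to its three UIDs."""
    uid_of = {name: uid for uid, name in NAMES[language].items()}
    left, rest = text.split(" <", 1)
    rel, right = rest.split("> ", 1)
    return uid_of[left], uid_of[rel], uid_of[right]


def render(uids: tuple[int, int, int], language: str) -> str:
    left, rel, right = uids
    names = NAMES[language]
    return f"{names[left]} <{names[rel]}> {names[right]}"


def translate(text: str, source: str, target: str) -> str:
    """Translate via the UIDs: the meaning stays fixed, only the names change."""
    return render(parse(text, source), target)


dutch = translate("the Eiffel tower <is located in> Paris", "English", "Dutch")
```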

Gellish.net Support

Gellish.net can provide knowledge and experience on the successful application of Semantic Modeling and of language definition and application, especially through train-the-trainer courses. It also advises on the creation of Semantic Databases, Exchange files and Messages, as well as on queries, searching and the generation of response messages in formal languages. Gellish.net also provides services for the creation or extension of taxonomic dictionaries and ontologies, and for mapping between them.

Semantic Information Models

Information that is expressed in Gellish Formal English forms a Semantic Information Model. Such an information model is a collection of formalized expressions of facts and/or opinions about some things, kinds or subjects. It is called an information model, because the expressions are structured in a particular syntax and use terms and concepts from Formal English which makes them computer interpretable.

There are many kinds of Information Models.

For example:

  • Definition models in a Domain Dictionary of concepts, arranged as a Taxonomy.
  • Knowledge models about the design, construction or operation of various kinds of physical objects. 
  • Requirements models that specify requirements for products or processes or what data and/or documents shall be delivered by a project or party.
  • Product models that specify products, for example in supplier catalogs.
  • Facility Information Models (FIMs) that specify data and documents about a facility
  • Building Information Models (BIMs) that specify data and documents about a building
  • Rule models that specify business rules that guide business processes
  • etc.

Technically, a Gellish Information Model is available in the form of one or more Gellish Expression Tables, in a database, file or message.

Semantic information models are information models that are composed of complete expressions, which include the information that is required to interpret their meaning. Such expressions include kinds of relations between data elements as well as contextual information. This means that, in principle, data need not be presented in dedicated screen layouts and graphical user interfaces, although such a presentation is still possible.

The contextual information in Gellish Databases and Messages include Accessory Facts as defined in the document "Definition of Universal Databases and Data Exchange Messages". The Accessory Facts are summarized here.