From 70f119d07ef4f28997baad26b505afb14f2ae6a3 Mon Sep 17 00:00:00 2001 From: Jakob Voss Date: Thu, 7 Oct 2021 12:41:25 +0200 Subject: [PATCH 1/4] Start draft of KDL Data Model --- MODEL.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 MODEL.md diff --git a/MODEL.md b/MODEL.md new file mode 100644 index 0000000..4594218 --- /dev/null +++ b/MODEL.md @@ -0,0 +1,76 @@ +# KDL Data Model + +This document specifies an abstract data model of the KDL document language. + +*This is version `0.0.0` of KDL Data model. It has not been released yet.* + +## Introduction + +KDL is defined by [KDL Specification](SPEC.md) as a formal language with +components such as nodes, identifiers, strings, comments, and whitespace. +Some of these components can be expressed in multiple ways and some of +these components are typically ignored when a KDL document is parsed. + +KDL Data model defines a conceptual structure of semantically relevant elements +expressed in KDL syntax. The data model is required to process KDL documents +independent from syntax and implementations with [KDL Query +Language](QUERY-SPEC.md) and [KDL Schema Language](SCHEMA-SPEC.md). + +## Elements + +Every KDL document represents a [document](#document). + +Sequences and sets can be empty. + +A Unicode string is a sequence of Unicode code points. + +### Document + +A document is a sequence of [nodes](#node). + +A document must not contain itself, directly or indirectly, via node children to avoid +circular structures. 
+ +### Node + +A node consists of three mandatory and two optional elements: + +* a **name** being a Unicode string +* an optional **tag** being a Unicode string +* a sequence of **arguments**, each being a [value](#value) +* a set of **properties**, each consisting of + * a name being Unicode string unique within the set + * a [value](#value) +* an optional list of **children** being a [document](#document) + +### Value + +A value is one of: + +* a Unicode string +* a **number**, consisting of an integer part and a decimal part +* a **boolean**, being one of the special values *true* and *false* +* the special value *null* + +## Application Notes + +Implementations may want to limit the set of processable KDL documents by +limiting properties of the data model such as the following: + +* A node with empty string tag (e.g. `("")node`) differs from a node without + tag (`node`). + +* A node with empty children (e.g. `node {}`) differs from a node without + children (`node`). + +* KDL does not differ between integer numbers and numbers with fractional part + nor does it define a fixed limit to the length or precision of numbers. + +* property names differ also if their distinct strings become equivalent after + Unicode normalization. + +Applications must make clear whether they support full KDL data model or a +specific subset and whether they modify KDL documents to fit to their limited +model. Applications may further extend their document model with additional +information such as line numbers beyond the scope of this specification. 
+ From 64d23f521c17dbd8ae2f038162493b01fe5a6dc3 Mon Sep 17 00:00:00 2001 From: Jakob Voss Date: Fri, 8 Oct 2021 09:14:09 +0200 Subject: [PATCH 2/4] model: simplify definition of number --- MODEL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MODEL.md b/MODEL.md index 4594218..35faec6 100644 --- a/MODEL.md +++ b/MODEL.md @@ -48,7 +48,7 @@ A node consists of three mandatory and two optional elements: A value is one of: * a Unicode string -* a **number**, consisting of an integer part and a decimal part +* a **number**, being an arbitrary-precision, base-10 decimal number value * a **boolean**, being one of the special values *true* and *false* * the special value *null* From 19816c571c8fd96e7857a3ee9dbec6df71eb386f Mon Sep 17 00:00:00 2001 From: Jakob Voss Date: Fri, 8 Oct 2021 10:11:12 +0200 Subject: [PATCH 3/4] Extend KDL data model * Include suggestions from review * Add missing tags on values * Add sections on data binding and limited support of the data model --- MODEL.md | 66 ++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 47 insertions(+), 19 deletions(-) diff --git a/MODEL.md b/MODEL.md index 35faec6..b56180b 100644 --- a/MODEL.md +++ b/MODEL.md @@ -33,44 +33,72 @@ circular structures. 
### Node -A node consists of three mandatory and two optional elements: +A node consists of five elements: * a **name** being a Unicode string -* an optional **tag** being a Unicode string +* a **tag** being a Unicode string * a sequence of **arguments**, each being a [value](#value) * a set of **properties**, each consisting of * a name being Unicode string unique within the set * a [value](#value) -* an optional list of **children** being a [document](#document) +* a list of **children** being a [document](#document) ### Value -A value is one of: +A value consists of two elements: + +* a **tag** being a Unicode string + +and one one of: * a Unicode string * a **number**, being an arbitrary-precision, base-10 decimal number value * a **boolean**, being one of the special values *true* and *false* * the special value *null* -## Application Notes +## Implementation Notes + +### Extensions to the data model + +While valid implementations must support at least the elements described above, +they *may* recognize and preserve additional information not captured in the +data model, such as: + +* Line numbers and character position of parsed KDL syntax elements. + +* Comments and precise details of whitespace and node terminators. + +* Whether a tag is the empty string (`("")node)` or missing. KDL syntax allows + nodes and values with and without tag. Both are identical in KDL data model. + +* Whether a node had an empty child list (`node {}`) or no child list at all. + KDL syntax allows both. KDL data model considers these identical. + +* The precise format of numbers, such as what radix they're specified in + (`0x1a`), whether they are an integer or not (`1` vs `1.0`), the presence of + underscores (`1_234`), etc. KDL syntax supports multiple ways to specify + numbers. KDL data model does not differentiate number types. 
+ +### Data binding -Implementations may want to limit the set of processable KDL documents by -limiting properties of the data model such as the following: +The mapping of KDL elements to data elements of a particular programming or +database language is beyond the scope of this data model. Implementations +should use tags as type annotations to map KDL data model instances to other +type systems. -* A node with empty string tag (e.g. `("")node`) differs from a node without - tag (`node`). +### Limitations to the data model -* A node with empty children (e.g. `node {}`) differs from a node without - children (`node`). +Implementations may choose to limit the set of processable KDL documents for +technical reasons. Such limitations must be stated clearly to indicate a useful +but incomplete support of KDL data model. Reasonable limitations include: -* KDL does not differ between integer numbers and numbers with fractional part - nor does it define a fixed limit to the length or precision of numbers. +* Precision of numbers -* property names differ also if their distinct strings become equivalent after - Unicode normalization. +* Types of elements that can have tags (e.g. disallow tags for boolean values + and `null`) -Applications must make clear whether they support full KDL data model or a -specific subset and whether they modify KDL documents to fit to their limited -model. Applications may further extend their document model with additional -information such as line numbers beyond the scope of this specification. +* Unicode Normalization (e.g. collapse properties into one when their names are + equivalent after normalization) +Implementations must document how limitations to the KDL data model are applied when KDL +documents are read (e.g. give warnings and ignore unsupported elements). 
From 43f16802d3574970966020ddc2baba6634f8848e Mon Sep 17 00:00:00 2001 From: Jakob Voss Date: Tue, 12 Oct 2021 10:25:49 +0200 Subject: [PATCH 4/4] Include results of discussion on KDL Data Model --- MODEL.md | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/MODEL.md b/MODEL.md index b56180b..a164d65 100644 --- a/MODEL.md +++ b/MODEL.md @@ -33,29 +33,31 @@ circular structures. ### Node -A node consists of five elements: +A node consists of four mandatory elements and one optional element: * a **name** being a Unicode string -* a **tag** being a Unicode string +* an optional **tag** being a Unicode string (the empty string tag is distinct from no tag) * a sequence of **arguments**, each being a [value](#value) -* a set of **properties**, each consisting of - * a name being Unicode string unique within the set +* a map of **properties**, each consisting of + * a key being a Unicode string unique within the map * a [value](#value) -* a list of **children** being a [document](#document) +* a sequence of **children** being a [document](#document) ### Value -A value consists of two elements: - -* a **tag** being a Unicode string - -and one one of: +A value consists of one of: * a Unicode string * a **number**, being an arbitrary-precision, base-10 decimal number value * a **boolean**, being one of the special values *true* and *false* * the special value *null* +and + +* an optional **tag** being a Unicode string (the empty string tag is distinct from no tag) + +The data model does not limit the use of tags to specific types of values. + ## Implementation Notes ### Extensions to the data model @@ -68,9 +70,6 @@ data model, such as: * Comments and precise details of whitespace and node terminators. -* Whether a tag is the empty string (`("")node)` or missing. KDL syntax allows - nodes and values with and without tag. Both are identical in KDL data model. - * Whether a node had an empty child list (`node {}`) or no child list at all. 
KDL syntax allows both. KDL data model considers these identical.
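
For illustration only, the data model as it stands after patch 4/4 could be sketched as Python types. The class and field names (`Value`, `Node`, `content`) and the choice of `decimal.Decimal` for numbers are assumptions of this sketch; they are not part of the patch series or of any KDL implementation.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Optional, Union

# Payload of a value: Unicode string, number, boolean, or null (None).
# Decimal is one possible stand-in for "arbitrary-precision, base-10".
Scalar = Union[str, Decimal, bool, None]


@dataclass
class Value:
    content: Scalar
    # None means "no tag"; "" is the empty string tag, which is distinct.
    tag: Optional[str] = None


@dataclass
class Node:
    name: str
    tag: Optional[str] = None
    arguments: List[Value] = field(default_factory=list)
    # A map: keys are unique Unicode strings.
    properties: Dict[str, Value] = field(default_factory=dict)
    # Children form a document, i.e. a (possibly empty) sequence of nodes.
    children: List["Node"] = field(default_factory=list)


# Roughly the model of:
#   (record)person "Alice" age=(u8)30 { email "alice@example.org" }
person = Node(
    name="person",
    tag="record",
    arguments=[Value("Alice")],
    properties={"age": Value(Decimal("30"), tag="u8")},
    children=[Node("email", arguments=[Value("alice@example.org")])],
)
```

With these definitions, the generated dataclass equality treats `Node("node")` and `Node("node", children=[])` as identical and keeps `Node("node", tag="")` distinct from `Node("node")`, matching the final patch. `Decimal("1")` and `Decimal("1.0")` also compare equal, so the number format is not preserved in comparisons. One caveat of the sketch: Python's `==` considers `True` and `Decimal(1)` equal, so an implementation that needs to tell booleans from numbers would need a stricter comparison.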