2021-06-18
Index and transaction log segments could define value types once and the rows would then contain values without type tags.
Optimizing indexes for storage space and read performance should be separated from the application logic. Application could accept arbitrary precision integers but when an index segment is written to the disk it could be noted that using U16 for all the numbers without type tags is the optimal way of serializing the values in that segment.
Strings could use VLQ:s to define the length of the string.
Record types consisting of primitive types could be described with statements. Each record type and attributes in it would be represented with a global entity ids.
One date format could be defined as:
[:year i16
:month u8
:day u8
:date [:year :month :day]]Record types could be described with abstract types without specifying the precies serialization format. For example a date could be defined as:
[:year integer
:month integer
:day integer]There should be a tuple value type. Abstract tuple type could have arbitrary number of heterogeneous values. Concrete tuple type could fix the tuple length and value types.
An enumeration value type could define a primitieve integer type as the storage format and entity ids for each value. Each entity describes how the corresponding number should be interpreted.
Operator in a statement would be one example of an enumeration.
A data structure that provides space and performance efficient way of sharing large amounts of structured information between information systems in a near real time manner.
The data should describe itself in a way that humans can understand it. It should be possible to generate API:s for a given application specific schema for any programming language using the native data types of that programming language. Dynamic programming languages should be automatically be able to provide access to given dali data structure without code generation.
Data should be modelled in a way that combining data from different systems should be natural. Entity ids and attributes in the schemas should be globally unique.
Given urls to multiple dali data structures it should be possible to present a user friendly interface for browsing and querying the data as one big database. The data required for running the queries should be cacheable near the user.
The data structure sould allow distributing it efficiently using a peer to peer protocol.
It should be possible to document schemas in language agnostic way. Attributes could have many names and describtions in different languages.
Schemass should be optional. A dali datastructure should be able to store semistructured data. If the data has regular structure it would only mean that the data can be stored more space efficiently on disk, but if there are exceptions in the structure the serialization format would automatically change to a more verbose one that supports such irregularities.
Data refers to attributes with entity ids. How could a data reader retrieve metadata about those attributes to represent the data in a human readable format?
Since schemas change slowly, they are highly cacheable and they could be shared in a peer to peer manner.
If origin identifiers would follow domain name like hierarchical structure, it would be possible to share responsibility for mapping origin to their urls like in DNS.
DNS itself could be used to hold records that describe where a given origin can be accessed.
Using domain names as origin identifiers would allow using SSL certificates to sign data segments to ensure their authenticity.
Origin id and transaction number form a unique identifier of a schema version described in that origin.
This site is generated with zetgen