|
ASTcentric Structure
This is the core of the ASTcentric Framework. It deals with:
- classes representing an AST
- input/output of an AST
- basic AST transformations
- AST validation
An AST is a tree of nodes. It has a name and an ID (AST ID). Every node except the root node knows its parent node. A node can have
- a name and an ID (internal ID),
- a reference to a node of the same AST or another AST,
- a typed value,
- a sequence of child nodes.
A valid AST name or node name has to fulfill the following rules:
- No trailing or leading space characters.
- No ISO control characters (as specified by the method Character.isISOControl()).
- At least one character.
That is, a name can contain any printable Unicode character. In addition, any number of spaces is allowed between printable characters.
Note that the name does not identify an AST or a node uniquely. That is done by the unique ID.
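The three naming rules can be sketched as a small check. This is only an illustration: the class `NameRules` and the method `isValidName` are hypothetical names, not part of the framework, and "space character" is assumed to mean the plain space U+0020.

```java
// Hypothetical helper; class and method names are illustrative only.
final class NameRules {
    private NameRules() {}

    /** Returns true if the name satisfies the three rules listed above. */
    static boolean isValidName(String name) {
        // Rule: at least one character.
        if (name == null || name.isEmpty()) {
            return false;
        }
        // Rule: no leading or trailing space (assumed to be U+0020 only).
        if (name.charAt(0) == ' ' || name.charAt(name.length() - 1) == ' ') {
            return false;
        }
        // Rule: no ISO control characters.
        for (int i = 0; i < name.length(); i++) {
            if (Character.isISOControl(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, `NameRules.isValidName("my node")` is accepted, while `" node"`, `"node "`, `""` and `"a\tb"` are rejected.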
To identify ASTs and nodes uniquely, universal IDs are used. An AST is identified by an AST ID. A node is identified by an internal ID together with the ID of its AST.
Note that a node with a name must have an ID, and vice versa.
An AST ID has three parts:
- Domain: A fully qualified Internet domain name (without
the final period '.') like astcentric.sourceforge.net.
It has to match the following regular expression
[a-z0-9.-]+
- Creator: Usually the name or the e-mail address of the person
who creates a new node. It has to match the following regular
expression
[a-z0-9._%-]+(@[a-zA-Z0-9.-]+)?
- Timestamp: A 64-bit integer counting the milliseconds since
January 1, 1970.
These three parts should make it very unlikely that two ASTs in the world have the same ID.
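A minimal sketch of how the three parts of an AST ID could be held and checked follows. The class `AstIdSketch`, its fields and the factory method `createNow` are hypothetical and not the framework's actual API. The two patterns are copied from the rules above, and the extra trailing-period check reflects the "without the final period" remark.

```java
import java.util.regex.Pattern;

// Hypothetical value class; names and fields are illustrative only.
final class AstIdSketch {
    // Patterns copied from the rules above.
    private static final Pattern DOMAIN = Pattern.compile("[a-z0-9.-]+");
    private static final Pattern CREATOR =
            Pattern.compile("[a-z0-9._%-]+(@[a-zA-Z0-9.-]+)?");

    final String domain;
    final String creator;
    final long timestamp; // milliseconds since January 1, 1970

    AstIdSketch(String domain, String creator, long timestamp) {
        // The whole string must match, and the final period is not allowed.
        if (!DOMAIN.matcher(domain).matches() || domain.endsWith(".")) {
            throw new IllegalArgumentException("Invalid domain: " + domain);
        }
        if (!CREATOR.matcher(creator).matches()) {
            throw new IllegalArgumentException("Invalid creator: " + creator);
        }
        this.domain = domain;
        this.creator = creator;
        this.timestamp = timestamp;
    }

    /** Creates an ID stamped with the current system time. */
    static AstIdSketch createNow(String domain, String creator) {
        return new AstIdSketch(domain, creator, System.currentTimeMillis());
    }
}
```

For example, `AstIdSketch.createNow("astcentric.sourceforge.net", "jane.doe@example.org")` would be accepted, whereas a domain containing upper-case letters would be rejected by the first pattern.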
The internal ID which specifies a node has two parts:
- Creator: Same as for AST IDs.
[a-z0-9._%-]+(@[a-z0-9.-]+)?
- Number: A 32-bit integer.
To be unique, the number is usually chosen as the largest number among all IDs with the same creator, plus one.
The creator part of the ID guarantees uniqueness in the following typical software development scenario: Two developers manipulate the same AST in parallel. Both create new referable nodes in different sections of the AST. Assuming a line-based textual AST representation, both developers can check in their changes without any merge conflict and without violating the uniqueness of the node IDs.
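The "largest number plus one" rule and the parallel-editing scenario can be sketched as follows. The class `NodeIdSketch` and its method `next` are hypothetical names. The sketch also assumes that numbers are non-negative and that the first number for a creator is 0, neither of which is stated above.

```java
import java.util.List;

// Hypothetical internal node ID; names are illustrative only.
final class NodeIdSketch {
    final String creator;
    final int number;

    NodeIdSketch(String creator, int number) {
        this.creator = creator;
        this.number = number;
    }

    /** Returns a fresh ID for the creator: largest existing number plus one. */
    static NodeIdSketch next(List<NodeIdSketch> existing, String creator) {
        int max = -1; // assumption: numbering starts at 0 for a new creator
        for (NodeIdSketch id : existing) {
            // Only IDs of the same creator are considered.
            if (id.creator.equals(creator) && id.number > max) {
                max = id.number;
            }
        }
        // addExact throws ArithmeticException instead of silently overflowing.
        return new NodeIdSketch(creator, Math.addExact(max, 1));
    }
}
```

If two developers with the hypothetical creator names `alice` and `bob` both call `next` on the same starting list, the resulting IDs differ in their creator part even when the numbers coincide, so the IDs do not collide.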
A node can have a value of one of the following types:
- Boolean: true or false
- Integer: 32-bit signed integer.
- Long: 64-bit signed integer.
- Float: 32-bit floating-point number, like the Java type Float.
- Double: 64-bit floating-point number, like the Java type Double.
- String: Unrestricted sequence of Unicode characters. Empty string allowed.
A node is either childless or has one or more child nodes. The child nodes are ordered. That is, each child node has a child index. The index of the first child is 0.
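The parent link and the ordered, zero-based children can be illustrated with a small tree class. `NodeSketch` and its methods are hypothetical and not the framework's node class. Names, IDs, references and values are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node class showing only the parent/child structure.
final class NodeSketch {
    private NodeSketch parent; // null for the root node
    private final List<NodeSketch> children = new ArrayList<>();

    /** Appends the child at the end and returns it. */
    NodeSketch addChild(NodeSketch child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    NodeSketch getParent() {
        return parent;
    }

    int getChildCount() {
        return children.size();
    }

    NodeSketch getChild(int index) {
        return children.get(index);
    }

    /** Returns the index among the siblings, or -1 for the root node. */
    int getChildIndex() {
        if (parent == null) {
            return -1;
        }
        for (int i = 0; i < parent.children.size(); i++) {
            if (parent.children.get(i) == this) {
                return i;
            }
        }
        throw new IllegalStateException("Node is missing from its parent");
    }
}
```

Returning -1 for the root is a choice of this sketch. The text only states that the first child has index 0.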
Nodes can be referred to by other nodes of the same AST only if they have an internal ID. In addition, they have to be exportable if they are to be referable by nodes from another AST. That is, only exportable nodes are visible from outside the AST. Their IDs are called alias IDs. An alias ID is the combination of the AST ID and an internal ID which is not the same as the original internal ID of the node. For each AST there exists a mapping from the alias IDs onto the internal IDs of the exportable nodes.
The alias IDs allow the replacement of an exportable node by another one with a different internal ID. Only the mapping has to be changed.
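The alias mapping can be pictured as a per-AST lookup table. In the sketch below, the class `AliasTableSketch`, its methods, and the textual ID form `creator#number` are all assumptions made for illustration. The check that an alias differs from the internal ID follows one reading of the rule above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-AST table from alias IDs to internal node IDs.
final class AliasTableSketch {
    private final Map<String, String> aliasToInternal = new HashMap<>();

    /** Binds (or rebinds) an alias to the internal ID of an exportable node. */
    void bind(String aliasId, String internalId) {
        // One reading of the rule: the alias must differ from the internal ID.
        if (aliasId.equals(internalId)) {
            throw new IllegalArgumentException("Alias equals internal ID: " + aliasId);
        }
        aliasToInternal.put(aliasId, internalId);
    }

    /** Returns the internal ID behind the alias, or null if none is exported. */
    String resolve(String aliasId) {
        return aliasToInternal.get(aliasId);
    }
}
```

Replacing an exportable node then amounts to a single rebinding: after `bind("alice#100", "alice#1")`, a later `bind("alice#100", "bob#4")` redirects every external reference to `alice#100` without touching the referring ASTs.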
|