author Aleksey Kladov <[email protected]> 2018-08-24 16:14:21 +0100
committer Aleksey Kladov <[email protected]> 2018-08-24 16:14:21 +0100
commit 2812015d40c94ddb94bb57ed1a9811ff24414c44
tree d134a3af875036f3921c5b620685510f6dbe0bf6 /docs/ARCHITECTURE.md
parent 6cade3f6d8ad7bb5a11b1910689b25f709c12502
README
Diffstat (limited to 'docs/ARCHITECTURE.md')
-rw-r--r-- docs/ARCHITECTURE.md 93
1 files changed, 0 insertions, 93 deletions
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
deleted file mode 100644
index 6b4434396..000000000
--- a/docs/ARCHITECTURE.md
+++ /dev/null
@@ -1,93 +0,0 @@
# Design and open questions about libsyntax

The high-level description of the architecture is in RFC.md. You might
also want to dig through https://github.com/matklad/fall/, which
contains some pretty interesting stuff built using similar ideas.
(Warning: it is completely undocumented, poorly written, and in general
not the thing I would recommend studying. Yes, this is
self-contradictory.)

## Tree

The centerpiece of this whole endeavor is the syntax tree, in the
`tree` module. Open questions:

- how best to represent errors, taking advantage of the fact that
  they are rare while still allowing fully persistent-style structure
  sharing between tree nodes?

- should we make the red/green split from Roslyn more pronounced?

- nodes can be laid out in a single array in such a way that the
  children of a node form a contiguous slice. Seems nifty, but do we
  need it?

- should we use a struct-of-arrays (SoA) or an array-of-structs (AoS)
  layout for `NodeData`?

- should we split leaf nodes and internal nodes into separate arrays?
  Can we use this to save some bits here and there? (For example,
  leaves don't need a `first_child` field.)
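
The contiguous-slice layout from the list above can be sketched as follows, assuming nodes are stored in breadth-first order. A node's children are enqueued together, so they also land next to each other in the array. `SyntaxKind`, `Nested`, `NodeData`, `FlatTree`, and `flatten` are hypothetical stand-ins, not the actual types of the `tree` module. The sketch uses the AoS layout (one struct per node).

```rust
// Hypothetical sketch: none of these names come from the `tree` module.
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    File,
    FnDef,
    Ident,
    LParen,
    RParen,
}

/// A pointer-based tree, used only as input to `flatten`.
pub struct Nested {
    pub kind: SyntaxKind,
    pub children: Vec<Nested>,
}

/// One node in array-of-structs (AoS) form.
#[derive(Debug, Clone, Copy)]
pub struct NodeData {
    pub kind: SyntaxKind,
    /// Index in `FlatTree::nodes` of the first child.
    pub first_child: u32,
    /// Number of children; 0 for leaves.
    pub n_children: u32,
}

/// All nodes in a single array, in breadth-first order.
pub struct FlatTree {
    pub nodes: Vec<NodeData>,
}

impl FlatTree {
    /// The children of node `idx`, as a slice of the shared array.
    pub fn children(&self, idx: usize) -> &[NodeData] {
        let node = self.nodes[idx];
        let start = node.first_child as usize;
        &self.nodes[start..start + node.n_children as usize]
    }
}

/// Lays out `root` breadth-first, so every node's children are adjacent.
pub fn flatten(root: &Nested) -> FlatTree {
    let mut nodes = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back(root);
    // Index at which the next batch of enqueued children will be stored.
    let mut next_free: u32 = 1;
    while let Some(node) = queue.pop_front() {
        let n_children = node.children.len() as u32;
        nodes.push(NodeData { kind: node.kind, first_child: next_free, n_children });
        next_free += n_children;
        queue.extend(node.children.iter());
    }
    FlatTree { nodes }
}
```

Leaves still carry a `first_child` field in this layout. That is exactly the waste the last question in the list is about.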

## Parser

The syntax tree is produced in three stages.

First, the raw text is split into tokens by a lexer (the `lexer` module).
The lexer has a peculiar signature: it is an `Fn(&str) -> Token`, where a
token is a pair of a `SyntaxKind` (you should have read the `tree` module
and RFC by this time! :)) and a length. That is, the lexer chomps only
the first token of the input. This forces the lexer to be stateless and
makes it easy to implement incremental relexing.
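
Here is a minimal sketch of a lexer with this shape. It assumes a toy token set (`Whitespace`, `Ident`, `IntNumber`, `Punct`, `Error`) rather than the real `SyntaxKind`, and `next_token` and `tokenize` are hypothetical names. `next_token` looks only at the start of its input. `tokenize` drives it by slicing off each token's length.

```rust
// Hypothetical sketch with a toy token set, not the real `lexer` module.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    Whitespace,
    Ident,
    IntNumber,
    Punct,
    Error,
}

/// A token is just a kind and a length in bytes: no text, no offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: SyntaxKind,
    pub len: usize,
}

/// Byte length of the longest prefix of `text` whose chars all satisfy `pred`.
fn prefix_len(text: &str, pred: impl Fn(char) -> bool) -> usize {
    text.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(text.len(), |(i, _)| i)
}

/// Lexes only the first token of `text`, which must be non-empty.
pub fn next_token(text: &str) -> Token {
    let first = text.chars().next().expect("next_token called on empty input");
    let (kind, len) = if first.is_whitespace() {
        (SyntaxKind::Whitespace, prefix_len(text, char::is_whitespace))
    } else if first.is_alphabetic() || first == '_' {
        (SyntaxKind::Ident, prefix_len(text, |c| c.is_alphanumeric() || c == '_'))
    } else if first.is_ascii_digit() {
        (SyntaxKind::IntNumber, prefix_len(text, |c| c.is_ascii_digit() || c == '_'))
    } else if "(){}[];,.:#<>=!&|+-*/".contains(first) {
        (SyntaxKind::Punct, first.len_utf8())
    } else {
        (SyntaxKind::Error, first.len_utf8())
    };
    Token { kind, len }
}

/// Drives the stateless lexer: lex one token, skip its length, repeat.
pub fn tokenize(mut text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    while !text.is_empty() {
        let token = next_token(text);
        tokens.push(token);
        text = &text[token.len..];
    }
    tokens
}
```

Because `next_token` carries no state between calls, relexing after an edit could restart from any token boundary before the edited range.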

Then comes the bulk of the work: the parser turns a stream of tokens
into a stream of events (the `parser` module; of particular interest
are the `parser/event` and `parser/parser` modules, which contain the
parsing API, and the `parser/grammar` module, which contains the
actual parsing code for various Rust syntactic constructs). Note that
the parser **does not** construct a tree right away. This is done for
several reasons:

* to decouple the actual tree data structure from the parser: you can
  build any data structure you want from the stream of events

* to make parsing fast: you can produce a list of events without
  allocations

* to make it easy to tweak the tree structure. Consider this code:

  ```
  #[cfg(test)]
  pub fn foo() {}
  ```

  Here, the attribute and the `pub` keyword must be children of the
  `fn` node. However, when parsing them, we don't yet know whether a
  function follows: it might very well be a `struct` instead. With
  events, we generally don't have to care about this *in the parser*
  and can just emit them in order.

* (Is this true?) to make incremental reparsing easier: you can reuse
  the same rope data structure for the original string, the tokens,
  and the events.
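
The third point can be illustrated with a marker that opens a node before its kind is known. The kind is filled in once the parser has seen `fn` or `struct`. This is a minimal sketch, not the actual `parser/event` API. `Event`, `EventSink`, `Marker`, `parse_item`, and the `Tombstone` placeholder kind are hypothetical names, and the whole attribute is collapsed into a single `Attr` token for brevity.

```rust
// Hypothetical sketch, not the real `parser/event` API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    /// Placeholder for a node that has been started but not completed.
    Tombstone,
    FnDef,
    StructDef,
    Error,
    Attr, // stands in for a whole `#[...]` attribute
    PubKw,
    FnKw,
    StructKw,
    Ident,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Start { kind: SyntaxKind },
    Token { kind: SyntaxKind },
    Finish,
}

/// Remembers where a node was started, so its kind can be filled in later.
pub struct Marker(usize);

#[derive(Default)]
pub struct EventSink {
    pub events: Vec<Event>,
}

impl EventSink {
    /// Opens a node whose kind is not known yet.
    pub fn start(&mut self) -> Marker {
        self.events.push(Event::Start { kind: SyntaxKind::Tombstone });
        Marker(self.events.len() - 1)
    }

    pub fn token(&mut self, kind: SyntaxKind) {
        self.events.push(Event::Token { kind });
    }

    /// Patches the real kind into the earlier `Start` and closes the node.
    pub fn complete(&mut self, marker: Marker, kind: SyntaxKind) {
        self.events[marker.0] = Event::Start { kind };
        self.events.push(Event::Finish);
    }
}

/// Parses one item from pre-lexed token kinds.
pub fn parse_item(tokens: &[SyntaxKind], sink: &mut EventSink) {
    let item = sink.start();
    let mut i = 0;
    // Attributes and `pub` are emitted before we know what the item is.
    while i < tokens.len() && matches!(tokens[i], SyntaxKind::Attr | SyntaxKind::PubKw) {
        sink.token(tokens[i]);
        i += 1;
    }
    let kind = match tokens.get(i).copied() {
        Some(SyntaxKind::FnKw) => SyntaxKind::FnDef,
        Some(SyntaxKind::StructKw) => SyntaxKind::StructDef,
        _ => SyntaxKind::Error,
    };
    // The rest of the item is consumed as-is in this sketch.
    for &t in &tokens[i..] {
        sink.token(t);
    }
    sink.complete(item, kind);
}
```

The kind lives in a `Start` event that has already been emitted. The attribute and `pub` tokens therefore end up inside the `FnDef` node without the parser ever backtracking.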

The parser also does not know about whitespace tokens: it's the job of
the next layer to assign whitespace and comments to nodes. However,
the parser can remap contextual tokens, like `>>` or `union`, so it
does have access to the text.
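
This is a minimal sketch of how the parser can skip trivia and still see token text. It assumes a hypothetical `TokenSource` that filters out whitespace and comments but keeps each token's text. The rule used for `union` is a simplification assumed for illustration: it counts as a keyword only when followed by an identifier, as in `union Foo { .. }`. Splitting `>>` is not shown.

```rust
// Hypothetical sketch; `TokenSource` and the `union` rule are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    Whitespace,
    Comment,
    Ident,
    UnionKw,
    LBrace,
    Eq,
}

/// A lexed token together with the text it covers.
#[derive(Debug, Clone, Copy)]
pub struct RawToken<'a> {
    pub kind: SyntaxKind,
    pub text: &'a str,
}

fn is_trivia(kind: SyntaxKind) -> bool {
    matches!(kind, SyntaxKind::Whitespace | SyntaxKind::Comment)
}

/// The parser's view of the input: no trivia, but token text is kept.
pub struct TokenSource<'a> {
    tokens: Vec<RawToken<'a>>,
}

impl<'a> TokenSource<'a> {
    pub fn new(raw: &[RawToken<'a>]) -> Self {
        let tokens = raw.iter().copied().filter(|t| !is_trivia(t.kind)).collect();
        TokenSource { tokens }
    }

    /// Kind of the `pos`-th non-trivia token, as produced by the lexer.
    pub fn kind(&self, pos: usize) -> Option<SyntaxKind> {
        self.tokens.get(pos).map(|t| t.kind)
    }

    /// Kind after contextual remapping: the lexer always emits `union` as
    /// `Ident`, and it is upgraded here when it starts a `union` item.
    pub fn remapped_kind(&self, pos: usize) -> Option<SyntaxKind> {
        let tok = self.tokens.get(pos)?;
        let starts_union_item = tok.kind == SyntaxKind::Ident
            && tok.text == "union"
            && self.kind(pos + 1) == Some(SyntaxKind::Ident);
        Some(if starts_union_item { SyntaxKind::UnionKw } else { tok.kind })
    }
}
```

With this split, `union` used as an ordinary variable name (`union = x`) stays an `Ident`.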

Finally, the TreeBuilder converts the flat stream of events into a
tree structure. It *should* also be responsible for attaching comments
and rebalancing the tree, but it does not do this yet :)
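
Turning the flat stream into a tree only needs a stack of unfinished nodes. This is a minimal sketch, assuming a simplified `Event` enum and an owned `Tree` type (both hypothetical). It does not attach comments or rebalance the tree either.

```rust
// Hypothetical sketch; `Event` and `Tree` are simplified stand-ins.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    File,
    FnDef,
    ParamList,
    FnKw,
    Ident,
    LParen,
    RParen,
}

#[derive(Debug, Clone, Copy)]
pub enum Event {
    Start { kind: SyntaxKind },
    Token { kind: SyntaxKind, len: u32 },
    Finish,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Tree {
    Node { kind: SyntaxKind, children: Vec<Tree> },
    Leaf { kind: SyntaxKind, len: u32 },
}

/// Builds a tree from a well-nested event stream with a single root.
pub fn build_tree(events: &[Event]) -> Tree {
    // Nodes that have been started but not yet finished, innermost last.
    let mut stack: Vec<(SyntaxKind, Vec<Tree>)> = Vec::new();
    for &event in events {
        match event {
            Event::Start { kind } => stack.push((kind, Vec::new())),
            Event::Token { kind, len } => {
                let (_, children) = stack.last_mut().expect("token outside of any node");
                children.push(Tree::Leaf { kind, len });
            }
            Event::Finish => {
                let (kind, children) = stack.pop().expect("unbalanced Finish");
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    Some((_, siblings)) => siblings.push(node),
                    // The root was just closed; any remaining events are ignored.
                    None => return node,
                }
            }
        }
    }
    panic!("event stream ended before the root node was finished")
}
```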

## Validator

The parser and lexer intentionally accept a lot of *invalid* code. The
idea is to post-process the tree to do proper error reporting, literal
conversion, and quick-fix suggestions. There is no design or
implementation for this yet.
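
Since there is no design yet, the following is purely a hypothetical illustration of literal conversion. A validator could take an integer literal that the lexer accepted as-is, convert it to a value, and report malformed input (such as `0b102`) as an error at a given offset. `SyntaxError` and `validate_int_literal` are assumed names. Type suffixes such as `u8` are not handled.

```rust
// Hypothetical sketch: there is no validator design in libsyntax yet.

/// An error found after parsing, pointing at a byte offset in the file.
#[derive(Debug, PartialEq, Eq)]
pub struct SyntaxError {
    pub offset: usize,
    pub message: String,
}

/// Converts an integer literal (without a type suffix) to its value.
pub fn validate_int_literal(lit: &str, offset: usize) -> Result<u128, SyntaxError> {
    let error = |message: String| SyntaxError { offset, message };
    // Split off a radix prefix, if there is one.
    let (radix, body) = match lit.get(..2) {
        Some("0x") => (16, &lit[2..]),
        Some("0o") => (8, &lit[2..]),
        Some("0b") => (2, &lit[2..]),
        _ => (10, lit),
    };
    // Underscores are only visual separators.
    let digits: String = body.chars().filter(|&c| c != '_').collect();
    if digits.is_empty() {
        return Err(error(format!("integer literal `{}` has no digits", lit)));
    }
    u128::from_str_radix(&digits, radix)
        .map_err(|e| error(format!("invalid integer literal `{}`: {}", lit, e)))
}
```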

## AST

Nothing yet; see `AstNode` in `fall`.
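
One possible shape for a typed AST layer is a set of thin wrappers over the untyped tree. Each AST type is a newtype around a node reference, and `cast` succeeds only for nodes of the matching kind. This sketch is not `fall`'s actual `AstNode`. `SyntaxNode`, `FnDef`, `Name`, and the trait signature are hypothetical.

```rust
// Hypothetical sketch, not `fall`'s actual `AstNode`.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    FnDef,
    Name,
}

/// Untyped node (a stand-in for the real syntax tree).
#[derive(Debug)]
pub struct SyntaxNode {
    pub kind: SyntaxKind,
    pub text: String,
    pub children: Vec<SyntaxNode>,
}

/// A typed view of an untyped node.
pub trait AstNode<'a>: Sized {
    /// Succeeds only if `node` has the kind this type represents.
    fn cast(node: &'a SyntaxNode) -> Option<Self>;
    /// The underlying untyped node.
    fn syntax(&self) -> &'a SyntaxNode;
}

pub struct FnDef<'a>(&'a SyntaxNode);
pub struct Name<'a>(&'a SyntaxNode);

impl<'a> AstNode<'a> for FnDef<'a> {
    fn cast(node: &'a SyntaxNode) -> Option<Self> {
        if node.kind == SyntaxKind::FnDef { Some(FnDef(node)) } else { None }
    }
    fn syntax(&self) -> &'a SyntaxNode {
        self.0
    }
}

impl<'a> AstNode<'a> for Name<'a> {
    fn cast(node: &'a SyntaxNode) -> Option<Self> {
        if node.kind == SyntaxKind::Name { Some(Name(node)) } else { None }
    }
    fn syntax(&self) -> &'a SyntaxNode {
        self.0
    }
}

impl<'a> FnDef<'a> {
    /// Typed accessor: the first child that is a `Name`.
    pub fn name(&self) -> Option<Name<'a>> {
        self.syntax().children.iter().find_map(|child| Name::cast(child))
    }
}

impl<'a> Name<'a> {
    pub fn text(&self) -> &'a str {
        &self.syntax().text
    }
}
```

The wrappers only borrow the untyped node, so casting costs nothing and the untyped tree remains the single source of truth.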