author | Aleksey Kladov <[email protected]> | 2018-02-11 14:58:22 +0000 |
---|---|---|
committer | Aleksey Kladov <[email protected]> | 2018-02-11 14:58:22 +0000 |
commit | 59087840f515c809498f09ec535e59054a893525 (patch) | |
tree | 282e1f9606dbeeba6c19a2dcd6ff94da420d155a /docs | |
parent | 9e2c0564783aa91f6440e7cadcc1a4dfda785de0 (diff) |
Document how the parsing works
Diffstat (limited to 'docs')
-rw-r--r-- | docs/ARCHITECTURE.md | 21 |
1 files changed, 9 insertions, 12 deletions
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index a1fa246c2..6b4434396 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -33,19 +33,22 @@ The centerpiece of this whole endeavor is the syntax tree, in the | |||
33 | 33 | ||
34 | The syntax tree is produced using a three-staged process. | 34 | The syntax tree is produced using a three-staged process. |
35 | 35 | ||
36 | First, a raw text is split into tokens with a lexer. Lexer has a | 36 | First, a raw text is split into tokens with a lexer (the `lexer` module). |
37 | peculiar signature: it is an `Fn(&str) -> Token`, where token is a | 37 | Lexer has a peculiar signature: it is an `Fn(&str) -> Token`, where token |
38 | pair of `SyntaxKind` (you should have read the `tree` module and RFC | 38 | is a pair of `SyntaxKind` (you should have read the `tree` module and RFC |
39 | by this time! :)) and a len. That is, lexer chomps only the first | 39 | by this time! :)) and a len. That is, lexer chomps only the first |
40 | token of the input. This forces the lexer to be stateless, and makes | 40 | token of the input. This forces the lexer to be stateless, and makes |
41 | it possible to implement incremental relexing easily. | 41 | it possible to implement incremental relexing easily. |
42 | 42 | ||
43 | Then, the bulk of work, the parser turns a stream of tokens into | 43 | Then, the bulk of work, the parser turns a stream of tokens into |
44 | stream of events. Not that parser **does not** construct a tree right | 44 | stream of events (the `parser` module; of particular interest are |
45 | away. This is done for several reasons: | 45 | the `parser/event` and `parser/parser` modules, which contain parsing |
46 | API, and the `parser/grammar` module, which contains actual parsing code | ||
47 | for various Rust syntactic constructs). Not that parser **does not** | ||
48 | construct a tree right away. This is done for several reasons: | ||
46 | 49 | ||
47 | * to decouple the actual tree data structure from the parser: you can | 50 | * to decouple the actual tree data structure from the parser: you can |
48 | build any datastructre you want from the stream of events | 51 | build any data structure you want from the stream of events |
49 | 52 | ||
50 | * to make parsing fast: you can produce a list of events without | 53 | * to make parsing fast: you can produce a list of events without |
51 | allocations | 54 | allocations |
@@ -77,12 +80,6 @@ And at last, the TreeBuilder converts a flat stream of events into a | |||
77 | tree structure. It also *should* be responsible for attaching comments | 80 | tree structure. It also *should* be responsible for attaching comments |
78 | and rebalancing the tree, but it does not do this yet :) | 81 | and rebalancing the tree, but it does not do this yet :) |
79 | 82 | ||
80 | |||
81 | ## Error reporing | ||
82 | |||
83 | TODO: describe how stuff like `skip_to_first` works | ||
84 | |||
85 | |||
86 | ## Validator | 83 | ## Validator |
87 | 84 | ||
88 | Parser and lexer accept a lot of *invalid* code intentionally. The | 85 | Parser and lexer accept a lot of *invalid* code intentionally. The |