Diffstat (limited to 'docs/dev/syntax.md')
-rw-r--r-- docs/dev/syntax.md 154
1 files changed, 77 insertions, 77 deletions
diff --git a/docs/dev/syntax.md b/docs/dev/syntax.md
index 0a4554c55..4dd1de659 100644
--- a/docs/dev/syntax.md
+++ b/docs/dev/syntax.md
@@ -17,7 +17,7 @@ The things described are implemented in two places

 * Syntax trees are lossless, or full fidelity. All comments and whitespace are preserved.
 * Syntax trees are semantic-less. They describe *strictly* the structure of a sequence of characters; they don't have hygiene, name resolution, or type information attached.
 * Syntax trees are simple value types. It is possible to create trees for a syntax without any external context.
 * Syntax trees have an intuitive traversal API (parent, children, siblings, etc).
 * Parsing is lossless (even if the input is invalid, the tree produced by the parser represents it exactly).
 * Parsing is resilient (even if the input is invalid, the parser tries to recognize as many syntax tree fragments in the input as it can).
@@ -34,12 +34,12 @@ The syntax tree consists of three layers:
 * SyntaxNodes (aka RedNode)
 * AST

 Of these, only GreenNodes store the actual data; the other two layers are (non-trivial) views into the green tree.
 The red-green terminology comes from Roslyn ([link](https://docs.microsoft.com/en-ie/archive/blogs/ericlippert/persistence-facades-and-roslyns-red-green-trees)) and gives the name to the `rowan` library. Green and syntax nodes are defined in rowan; the AST is defined in rust-analyzer.

 Syntax trees are a semi-transient data structure.
 In general, the frontend does not keep syntax trees for all files in memory.
 Instead, it *lowers* syntax trees to a more compact and rigid representation, which is not full-fidelity, but which can be mapped back to a syntax tree if so desired.


 ### GreenNode
@@ -64,7 +64,7 @@ struct Token {
 }
 ```

 All the differences between the above sketch and the real implementation are strictly due to optimizations.

 Points of note:
 * The tree is untyped. Each node has a "type tag", `SyntaxKind`.
@@ -73,7 +73,7 @@ Points of note:
 * Each token carries its full text.
 * The original text can be recovered by concatenating the texts of all tokens in order.
 * Accessing a child of a particular type (for example, the parameter list of a function) generally involves linearly traversing the children, looking for a specific `kind`.
 * Modifying the tree is roughly `O(depth)`.
   We don't make special efforts to guarantee that the depth is not linear, but, in practice, syntax trees are branchy and shallow.
 * If a mandatory (grammar-wise) node is missing from the input, it's just missing from the tree.
 * If extra erroneous input is present, it is wrapped into a node with `ERROR` kind, and treated just like any other node.
@@ -82,29 +82,29 @@ Points of note:
 An input like `fn f() { 90 + 2 }` might be parsed as

 ```
-FN_DEF@[0; 17)
-  FN_KW@[0; 2) "fn"
-  WHITESPACE@[2; 3) " "
-  NAME@[3; 4)
-    IDENT@[3; 4) "f"
-  PARAM_LIST@[4; 6)
-    L_PAREN@[4; 5) "("
-    R_PAREN@[5; 6) ")"
-  WHITESPACE@[6; 7) " "
-  BLOCK_EXPR@[7; 17)
-    BLOCK@[7; 17)
-      L_CURLY@[7; 8) "{"
-      WHITESPACE@[8; 9) " "
-      BIN_EXPR@[9; 15)
-        LITERAL@[9; 11)
-          INT_NUMBER@[9; 11) "90"
-        WHITESPACE@[11; 12) " "
-        PLUS@[12; 13) "+"
-        WHITESPACE@[13; 14) " "
-        LITERAL@[14; 15)
-          INT_NUMBER@[14; 15) "2"
-      WHITESPACE@[15; 16) " "
-      R_CURLY@[16; 17) "}"
+FN_DEF@0..17
+  FN_KW@0..2 "fn"
+  WHITESPACE@2..3 " "
+  NAME@3..4
+    IDENT@3..4 "f"
+  PARAM_LIST@4..6
+    L_PAREN@4..5 "("
+    R_PAREN@5..6 ")"
+  WHITESPACE@6..7 " "
+  BLOCK_EXPR@7..17
+    BLOCK@7..17
+      L_CURLY@7..8 "{"
+      WHITESPACE@8..9 " "
+      BIN_EXPR@9..15
+        LITERAL@9..11
+          INT_NUMBER@9..11 "90"
+        WHITESPACE@11..12 " "
+        PLUS@12..13 "+"
+        WHITESPACE@13..14 " "
+        LITERAL@14..15
+          INT_NUMBER@14..15 "2"
+      WHITESPACE@15..16 " "
+      R_CURLY@16..17 "}"
 ```
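
As an illustration of the points above (untyped nodes, tokens carrying their full text, text recovered by concatenation), here is a minimal sketch of such a tree for the `90 + 2` sub-expression of the dump. The type and function names (`Child`, `token`, `node`, `child_len`, `text_of`, `sample_bin_expr`) are hypothetical and simplified; they are not the `rowan` API.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for the types in the sketch; not rowan's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    BinExpr,
    Literal,
    IntNumber,
    Whitespace,
    Plus,
}

#[derive(Debug)]
struct Token {
    kind: SyntaxKind,
    text: String,
}

#[derive(Debug)]
struct Node {
    kind: SyntaxKind,
    text_len: usize,
    children: Vec<Child>,
}

#[derive(Debug, Clone)]
enum Child {
    Node(Arc<Node>),
    Token(Arc<Token>),
}

fn token(kind: SyntaxKind, text: &str) -> Child {
    Child::Token(Arc::new(Token { kind, text: text.to_string() }))
}

fn node(kind: SyntaxKind, children: Vec<Child>) -> Child {
    // A node only stores the total length of the text it covers.
    let text_len: usize = children.iter().map(child_len).sum();
    Child::Node(Arc::new(Node { kind, text_len, children }))
}

fn child_len(child: &Child) -> usize {
    match child {
        Child::Node(n) => n.text_len,
        Child::Token(t) => t.text.len(),
    }
}

/// Recovers the source text by concatenating token texts in order.
fn text_of(child: &Child, out: &mut String) {
    match child {
        Child::Node(n) => {
            for c in &n.children {
                text_of(c, out);
            }
        }
        Child::Token(t) => out.push_str(&t.text),
    }
}

/// Builds the `BIN_EXPR` subtree for `90 + 2`, including whitespace tokens.
fn sample_bin_expr() -> Child {
    use SyntaxKind::*;
    node(BinExpr, vec![
        node(Literal, vec![token(IntNumber, "90")]),
        token(Whitespace, " "),
        token(Plus, "+"),
        token(Whitespace, " "),
        node(Literal, vec![token(IntNumber, "2")]),
    ])
}
```

Calling `text_of` on the result of `sample_bin_expr()` yields the original six characters, whitespace included, which is the lossless property in miniature.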

 #### Optimizations
@@ -122,20 +122,20 @@ To reduce the amount of allocations, the GreenNode is a DST, which uses a single
 To more compactly store the children, we box *both* interior nodes and tokens, and represent
 `Either<Arc<Node>, Arc<Token>>` as a single pointer with a tag in the last bit.

 To avoid allocating EVERY SINGLE TOKEN on the heap, syntax trees use interning.
 Because the tree is fully immutable, it's valid to structurally share subtrees.
 For example, in `1 + 1`, there will be a *single* token for `1` with ref count 2; the same goes for the ` ` whitespace token.
 Interior nodes are shared as well (for example in `(1 + 1) * (1 + 1)`).

 Note that the result of the interning is an `Arc<Node>`.
 That is, it's not an index into an interning table, so you don't have to have the table around to do anything with the tree.
 Each tree is fully self-contained (although different trees might share parts).
 Currently, the interner is created per-file, but it will be easy to use a per-thread or per-some-context one.
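
A minimal sketch of token interning along these lines, assuming a plain `HashMap` keyed by kind and text. `TokenInterner` and its `intern` method are hypothetical names for illustration; the real implementation differs. The point is that interning hands out `Arc`s, so the table can be dropped while the tokens stay usable.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, PartialEq, Eq)]
struct Token {
    kind: u16, // stand-in for `SyntaxKind`
    text: String,
}

/// Hypothetical per-file interner: equal tokens share a single allocation.
#[derive(Default)]
struct TokenInterner {
    map: HashMap<(u16, String), Arc<Token>>,
}

impl TokenInterner {
    fn intern(&mut self, kind: u16, text: &str) -> Arc<Token> {
        self.map
            .entry((kind, text.to_string()))
            // Allocate only the first time this (kind, text) pair is seen.
            .or_insert_with(|| Arc::new(Token { kind, text: text.to_string() }))
            .clone()
    }
}
```

In this sketch the table itself holds one reference to each token, so while the interner is alive the count for `1` in `1 + 1` is one higher than the two references held by the tree; once the interner is dropped, only the tree's references remain.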

-We use a `TextUnit`, a newtyped `u32`, to store the length of the text.
+We use a `TextSize`, a newtyped `u32`, to store the length of the text.

 We currently use `SmolStr`, a small-object-optimized string, to store text.
 This was mostly relevant *before* we implemented tree interning, to avoid allocating common keywords and identifiers. We should switch to storing text data alongside the interned tokens.

 #### Alternative designs

@@ -153,9 +153,9 @@ struct Token {
 }
 ```

 The tree then contains only non-trivia tokens.

 Another approach (from Dart) is to, in addition to a syntax tree, link all the tokens into a doubly linked list.
 That way, the tree again contains only non-trivia tokens.

 Explicit trivia nodes, like in `rowan`, are used by IntelliJ.
@@ -165,26 +165,26 @@ Explicit trivia nodes, like in `rowan`, are used by IntelliJ.
 As noted before, accessing a specific child in the node requires a linear traversal of the children (though we can skip tokens, because the tag is encoded in the pointer itself).
 It is possible to recover O(1) access with another representation.
 We explicitly store optional and missing (required by the grammar, but not present) nodes.
 That is, we use `Option<Node>` for children.
 We also remove trivia tokens from the tree.
 This way, each child kind generally occupies a fixed position in a parent, and we can use index access to fetch it.
 The cost is that we now need to allocate space for all not-present optional nodes.
 So, `fn foo() {}` will have slots for visibility, unsafeness, attributes, ABI and return type.

 IntelliJ uses linear traversal.
 Roslyn and Swift do `O(1)` access.
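
To make the trade-off concrete, here is a small sketch contrasting the two layouts. The slot order in `FN_SLOTS` and the helper names (`find_child`, `to_slots`) are assumptions made up for this example; they do not describe how Roslyn, Swift, or `rowan` lay out a function node.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Node {
    kind: &'static str, // stand-in for `SyntaxKind`
}

/// Linear layout: only present children are stored, lookup scans them.
fn find_child<'a>(children: &'a [Node], kind: &str) -> Option<&'a Node> {
    children.iter().find(|c| c.kind == kind)
}

/// Hypothetical fixed slot order for a function node.
const FN_SLOTS: [&str; 8] = [
    "ATTRS", "VISIBILITY", "UNSAFE", "ABI", "NAME", "PARAM_LIST", "RET_TYPE", "BODY",
];
const PARAM_LIST_SLOT: usize = 5;
const RET_TYPE_SLOT: usize = 6;

/// Slot layout: one `Option<Node>` per possible child, present or not,
/// so a child of a given kind is fetched by index.
fn to_slots(children: &[Node]) -> Vec<Option<Node>> {
    FN_SLOTS
        .iter()
        .map(|kind| find_child(children, kind).cloned())
        .collect()
}
```

For `fn foo() {}` only the name, parameter list and body are present, so in this sketch five of the eight slots are `None`: that is the memory cost paid for indexed access.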

 ##### Mutable Trees

 IntelliJ uses mutable trees.
 Overall, it creates a lot of additional complexity.
 However, the API for *editing* syntax trees is nice.

 For example, the assist that moves generic bounds to a where clause has this code:

 ```kotlin
 for (typeBound in typeBounds) {
     typeBound.typeParamBounds?.delete()
 }
 ```

@@ -195,7 +195,7 @@ Modeling this with immutable trees is possible, but annoying.
 A functional green tree is not super-convenient to use.
 The biggest problem is accessing parents (there are no parent pointers!).
 But there are also "identity" issues.
 Let's say you want to write code which builds a list of expressions in a file: `fn collect_expressions(file: GreenNode) -> HashSet<GreenNode>`.
 For an input like

 ```rust
@@ -233,7 +233,7 @@ impl SyntaxNode {
         })
     }
     fn parent(&self) -> Option<SyntaxNode> {
         self.parent.clone()
     }
     fn children(&self) -> impl Iterator<Item = SyntaxNode> {
         let mut offset = self.offset;
@@ -251,8 +251,8 @@ impl SyntaxNode {

 impl PartialEq for SyntaxNode {
     fn eq(&self, other: &SyntaxNode) -> bool {
         self.offset == other.offset
             && Arc::ptr_eq(&self.green, &other.green)
     }
 }
 ```
@@ -261,35 +261,35 @@ Points of note:

 * SyntaxNode remembers its parent node (and, transitively, the path to the root of the tree)
 * SyntaxNode knows its *absolute* text offset in the whole file
 * Equality is based on identity. Comparing nodes from different trees does not make sense.

 #### Optimization

 The reality is different though :-)
 Traversal of trees is a common operation, and it makes sense to optimize it.
 In particular, the above code allocates and does atomic operations during a traversal.

 To get rid of atomics, `rowan` uses non thread-safe `Rc`.
 This is OK because tree traversals mostly (always, in case of rust-analyzer) run on a single thread. If you need to send a `SyntaxNode` to another thread, you can send a pair of **root** `GreenNode` (which is thread safe) and a `Range<usize>`.
 The other thread can restore the `SyntaxNode` by traversing from the root green node and looking for a node with the specified range.
 You can also use a similar trick to store a `SyntaxNode`.
 That is, a data structure that holds a `(GreenNode, Range<usize>)` will be `Sync`.
 However, rust-analyzer goes even further.
 It treats trees as semi-transient and instead of storing a `GreenNode`, it generally stores just the id of the file from which the tree originated: `(FileId, Range<usize>)`.
 The `SyntaxNode` is then restored by reparsing the file and traversing it from the root.
 With this trick, rust-analyzer holds only a small number of trees in memory at the same time, which reduces memory usage.
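
The "restore by traversing from the root" step can be sketched as follows. `Green`, `leaf`, `inner`, `resolve` and `sample_tree` are hypothetical names for a simplified tree that stores only lengths. As an extra assumption, this sketch passes a node kind alongside the range, because nested nodes can cover exactly the same range (in the `fn f() { 90 + 2 }` example, `BLOCK_EXPR` and `BLOCK` both span `7..17`).

```rust
use std::ops::Range;

/// Hypothetical simplified green node: a kind, a text length, and children.
#[derive(Debug)]
struct Green {
    kind: &'static str,
    len: usize,
    children: Vec<Green>,
}

fn leaf(kind: &'static str, len: usize) -> Green {
    Green { kind, len, children: Vec::new() }
}

fn inner(kind: &'static str, children: Vec<Green>) -> Green {
    let len: usize = children.iter().map(|c| c.len).sum();
    Green { kind, len, children }
}

/// Walks down from `root`, tracking absolute offsets, until it reaches a node
/// of the given kind that covers exactly `range`.
fn resolve<'a>(root: &'a Green, kind: &str, range: &Range<usize>) -> Option<&'a Green> {
    let mut node = root;
    let mut offset = 0;
    loop {
        if offset == range.start && offset + node.len == range.end && node.kind == kind {
            return Some(node);
        }
        // Descend into the first child whose span contains the whole range.
        let mut child_offset = offset;
        let mut next = None;
        for child in &node.children {
            let child_end = child_offset + child.len;
            if child_offset <= range.start && range.end <= child_end {
                next = Some((child, child_offset));
                break;
            }
            child_offset = child_end;
        }
        let (child, start) = next?;
        node = child;
        offset = start;
    }
}

/// The tree for `fn f() { 90 + 2 }`, with token texts replaced by their lengths.
fn sample_tree() -> Green {
    let bin_expr = inner("BIN_EXPR", vec![
        inner("LITERAL", vec![leaf("INT_NUMBER", 2)]),
        leaf("WHITESPACE", 1),
        leaf("PLUS", 1),
        leaf("WHITESPACE", 1),
        inner("LITERAL", vec![leaf("INT_NUMBER", 1)]),
    ]);
    let block = inner("BLOCK", vec![
        leaf("L_CURLY", 1),
        leaf("WHITESPACE", 1),
        bin_expr,
        leaf("WHITESPACE", 1),
        leaf("R_CURLY", 1),
    ]);
    inner("FN_DEF", vec![
        leaf("FN_KW", 2),
        leaf("WHITESPACE", 1),
        inner("NAME", vec![leaf("IDENT", 1)]),
        inner("PARAM_LIST", vec![leaf("L_PAREN", 1), leaf("R_PAREN", 1)]),
        leaf("WHITESPACE", 1),
        inner("BLOCK_EXPR", vec![block]),
    ])
}
```

The lookup costs one descent from the root, so it is proportional to the depth of the tree times the branching factor along the path, not to the size of the file.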

 Additionally, only the root `SyntaxNode` owns an `Arc` to the (root) `GreenNode`.
 All other `SyntaxNode`s point to corresponding `GreenNode`s with a raw pointer.
 They also point to the parent (and, consequently, to the root) with an owning `Rc`, so this is sound.
 In other words, one needs *one* arc bump when initiating a traversal.

 To get rid of allocations, `rowan` takes advantage of `SyntaxNode: !Sync` and uses a thread-local free list of `SyntaxNode`s.
 In a typical traversal, you only directly hold a few `SyntaxNode`s at a time (and their ancestors indirectly), so a free list proportional to the depth of the tree removes all allocations in a typical case.

 So, while traversal is not exactly incrementing a pointer, it's still pretty cheap: TLS + rc bump!

 Traversal also yields (cheap) owned nodes, which improves ergonomics quite a bit.

 #### Alternative Designs

@@ -309,14 +309,14 @@ struct SyntaxData {
 ```

 This allows using true pointer equality for comparison of identities of `SyntaxNode`s.
 rust-analyzer used to have this design as well, but we've since switched to cursors.
 The main problem with memoizing the red nodes is that it more than doubles the memory requirements for fully realized syntax trees.
 In contrast, cursors generally retain only a path to the root.
 C# combats increased memory usage by using weak references.

 ### AST

 `GreenTree`s are untyped and homogeneous, because it makes accommodating error nodes, arbitrary whitespace and comments natural, and because it makes it possible to write generic tree traversals.
 However, when working with a specific node, like a function definition, one would want a strongly typed API.

 This is what is provided by the AST layer. AST nodes are transparent wrappers over untyped syntax nodes:
@@ -352,13 +352,13 @@ impl AstNode for FnDef {
 }

 impl FnDef {
     pub fn param_list(&self) -> Option<ParamList> {
         self.syntax.children().find_map(ParamList::cast)
     }
     pub fn ret_type(&self) -> Option<RetType> {
         self.syntax.children().find_map(RetType::cast)
     }
     pub fn body(&self) -> Option<BlockExpr> {
         self.syntax.children().find_map(BlockExpr::cast)
     }
     // ...
@@ -409,14 +409,14 @@ Points of note:

 ##### Semantic Full AST

 In IntelliJ the AST layer (dubbed **P**rogram **S**tructure **I**nterface) can have semantics attached, and is usually backed by either syntax tree, indices, or metadata from compiled libraries.
 The backend for PSI can change dynamically.

 ### Syntax Tree Recap

 At its core, the syntax tree is a purely functional n-ary tree, which stores text at the leaf nodes and node "kinds" at all nodes.
 A cursor layer is added on top, which gives owned, cheap to clone nodes with identity semantics, parent links and absolute offsets.
 An AST layer is added on top, which reifies each node `Kind` as a separate Rust type with the corresponding API.

 ## Parsing

@@ -432,17 +432,17 @@ impl GreenNodeBuilder {

     pub fn start_node(&mut self, kind: SyntaxKind) { ... }
     pub fn finish_node(&mut self) { ... }

     pub fn finish(self) -> GreenNode { ... }
 }
 ```

 The parser, ultimately, needs to invoke the `GreenNodeBuilder`.
 There are two principal sources of inputs for the parser:
   * source text, which contains trivia tokens (whitespace and comments)
   * token trees from macros, which lack trivia

 Additionally, input tokens do not correspond 1-to-1 with output tokens.
 For example, two consecutive `>` tokens might be glued, by the parser, into a single `>>`.

 For these reasons, the parser crate defines callback interfaces for both input tokens and output trees.
@@ -474,7 +474,7 @@ pub trait TreeSink {
 }

 pub fn parse(
     token_source: &mut dyn TokenSource,
     tree_sink: &mut dyn TreeSink,
 ) { ... }
 ```
@@ -491,21 +491,21 @@ Syntax errors are not stored directly in the tree.
 The primary motivation for this is that a syntax tree is not necessarily produced by the parser; it may also be assembled manually from pieces (which happens all the time in refactorings).
 Instead, the parser reports errors to an error sink, which stores them in a `Vec`.
 If possible, errors are not reported during parsing and are postponed for a separate validation step.
 For example, the parser accepts visibility modifiers on trait methods, but then a separate tree traversal flags all such visibilities as erroneous.

 ### Macros

 The primary difficulty with macros is that individual tokens have identities, which need to be preserved in the syntax tree for hygiene purposes.
 This is handled by the `TreeSink` layer.
 Specifically, `TreeSink` constructs the tree in lockstep with draining the original token stream.
 In the process, it records which tokens of the tree correspond to which tokens of the input, by using text ranges to identify syntax tokens.
 The end result is that parsing expanded code yields a syntax tree and a mapping of text ranges of the tree to original tokens.

 To deal with precedence in cases like `$expr * 1`, we use special invisible parentheses, which are explicitly handled by the parser.

 ### Whitespace & Comments

 The parser does not see whitespace nodes.
 Instead, they are attached to the tree in the `TreeSink` layer.

 For example, in
@@ -521,7 +521,7 @@ the comment will be (heuristically) made a child of function node.

 Green trees are cheap to modify, so incremental reparse works by patching a previous tree, without maintaining any additional state.
 The reparse is based on a heuristic: we try to contain a change to a single `{}` block, and reparse only this block.
 To do this, we maintain the invariant that, even for invalid code, curly braces are always paired correctly.

 In practice, incremental reparsing doesn't actually matter much for IDE use-cases; parsing from scratch seems to be fast enough.
