diff options
-rw-r--r-- | README.md | 107 | ||||
-rw-r--r-- | docs/ARCHITECTURE.md | 93 | ||||
-rw-r--r-- | docs/RFC.md | 494 | ||||
-rw-r--r-- | docs/TESTS.md | 44 | ||||
-rw-r--r-- | docs/TOOLS.md | 36 |
5 files changed, 102 insertions, 672 deletions
@@ -5,13 +5,110 @@ | |||
5 | [![Build status](https://ci.appveyor.com/api/projects/status/j56x1hbje8rdg6xk/branch/master?svg=true)](https://ci.appveyor.com/project/matklad/libsyntax2/branch/master) | 5 | [![Build status](https://ci.appveyor.com/api/projects/status/j56x1hbje8rdg6xk/branch/master?svg=true)](https://ci.appveyor.com/project/matklad/libsyntax2/branch/master) |
6 | 6 | ||
7 | 7 | ||
8 | libsyntax2.0 is an **experimental** parser of the Rust language, | ||
9 | intended for the use in IDEs. | ||
10 | [RFC](https://github.com/rust-lang/rfcs/pull/2256). | ||
8 | 11 | ||
9 | libsyntax2.0 is an **experimental** implementation of the corresponding [RFC](https://github.com/rust-lang/rfcs/pull/2256). | ||
10 | 12 | ||
11 | See [`docs`](./docs) folder to learn how libsyntax2 works, and check | 13 | ## Quick Start |
12 | [`CONTRIBUTING.md`](./CONTRIBUTING.md) if you want to contribute! | 14 | |
13 | **WARNING** everything is in a bit of a flux recently, the docs are obsolete, | 15 | ``` |
14 | see the recent work on red/green trees. | 16 | $ cargo test |
17 | $ cargo parse < crates/libsyntax2/src/lib.rs | ||
18 | ``` | ||
19 | |||
20 | |||
21 | ## Trying It Out | ||
22 | |||
23 | This installs experimental VS Code plugin | ||
24 | |||
25 | ``` | ||
26 | $ cargo install-code | ||
27 | ``` | ||
28 | |||
29 | It's better to remove existing Rust plugins to avoid interference. | ||
30 | Warning: plugin is not intended for general use, has a lot of rough | ||
31 | edges and missing features (notably, no code completion). That said, | ||
32 | while originally libsyntax2 was developed in IntelliJ, @matklad now | ||
33 | uses this plugin (and thus, libsytax2) to develop libsyntax2, and it | ||
34 | doesn't hurt too much :-) | ||
35 | |||
36 | |||
37 | ### Features: | ||
38 | |||
39 | * syntax highlighting (LSP does not have API for it, so impl is hacky | ||
40 | and sometimes fall-backs to the horrible built-in highlighting) | ||
41 | |||
42 | * commands (`ctrl+shift+p` or keybindings) | ||
43 | - **Show Rust Syntax Tree** (use it to verify that plugin works) | ||
44 | - **Rust Extend Selection** (works with multiple cursors) | ||
45 | - **Rust Matching Brace** (knows the difference between `<` and `<`) | ||
46 | - **Rust Parent Module** | ||
47 | - **Rust Join Lines** (deals with trailing commas) | ||
48 | |||
49 | * **Go to symbol in file** | ||
50 | |||
51 | * **Go to symbol in workspace** (no support for Cargo deps yet) | ||
52 | |||
53 | * code actions: | ||
54 | - Flip `,` in comma separated lists | ||
55 | - Add `#[derive]` to struct/enum | ||
56 | - Add `impl` block to struct/enum | ||
57 | - Run tests at caret | ||
58 | |||
59 | * **Go to definition** ("correct" for `mod foo;` decls, index-based for functions). | ||
60 | |||
61 | |||
62 | ## Code Walk-Through | ||
63 | |||
64 | ### `crates/libsyntax2` | ||
65 | |||
66 | - `yellow`, red/green syntax tree, heavily inspired [by this](https://github.com/apple/swift/tree/ab68f0d4cbf99cdfa672f8ffe18e433fddc8b371/lib/Syntax) | ||
67 | - `grammar`, the actual parser | ||
68 | - `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `yellow` trees | ||
69 | - `grammar.ron` RON description of the grammar, which is used to | ||
70 | generate `syntax_kinds` and `ast` modules. | ||
71 | - `algo`: generic tree algorithms, including `walk` for O(1) stack | ||
72 | space tree traversal (this is cool) and `visit` for type-driven | ||
73 | visiting the nodes (this is double plus cool, if you understand how | ||
74 | `Visitor` works, you understand libsyntax2). | ||
75 | |||
76 | |||
77 | ### `crates/libeditor` | ||
78 | |||
79 | Most of IDE features leave here, unlike `libanalysis`, `libeditor` is | ||
80 | single-file and is basically a bunch of pure functions. | ||
81 | |||
82 | |||
83 | ### `crates/libanalysis` | ||
84 | |||
85 | A stateful library for analyzing many Rust files as they change. | ||
86 | `WorldState` is a mutable entity (clojure's atom) which holds current | ||
87 | state, incorporates changes and handles out `World`s --- immutable | ||
88 | consistent snapshots of `WorldState`, which actually power analysis. | ||
89 | |||
90 | |||
91 | ### `crates/server` | ||
92 | |||
93 | An LSP implementation which uses `libanalysis` for managing state and | ||
94 | `libeditor` for actually doing useful stuff. | ||
95 | |||
96 | |||
97 | ### `crates/cli` | ||
98 | |||
99 | A CLI interface to libsyntax | ||
100 | |||
101 | ### `crate/tools` | ||
102 | |||
103 | Code-gen tasks, used to develop libsyntax2: | ||
104 | |||
105 | - `cargo gen-kinds` -- generate `ast` and `syntax_kinds` | ||
106 | - `cargo gen-tests` -- collect inline tests from grammar | ||
107 | - `cargo install-code` -- build and install VS Code extension and server | ||
108 | |||
109 | ### `code` | ||
110 | |||
111 | VS Code plugin | ||
15 | 112 | ||
16 | 113 | ||
17 | ## License | 114 | ## License |
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md deleted file mode 100644 index 6b4434396..000000000 --- a/docs/ARCHITECTURE.md +++ /dev/null | |||
@@ -1,93 +0,0 @@ | |||
1 | # Design and open questions about libsyntax | ||
2 | |||
3 | |||
4 | The high-level description of the architecture is in RFC.md. You might | ||
5 | also want to dig through https://github.com/matklad/fall/ which | ||
6 | contains some pretty interesting stuff build using similar ideas | ||
7 | (warning: it is completely undocumented, poorly written and in general | ||
8 | not the thing which I recommend to study (yes, this is | ||
9 | self-contradictory)). | ||
10 | |||
11 | ## Tree | ||
12 | |||
13 | The centerpiece of this whole endeavor is the syntax tree, in the | ||
14 | `tree` module. Open questions: | ||
15 | |||
16 | - how to best represent errors, to take advantage of the fact that | ||
17 | they are rare, but to enable fully-persistent style structure | ||
18 | sharing between tree nodes? | ||
19 | |||
20 | - should we make red/green split from Roslyn more pronounced? | ||
21 | |||
22 | - one can layout nodes in a single array in such a way that children | ||
23 | of the node form a continuous slice. Seems nifty, but do we need it? | ||
24 | |||
25 | - should we use SoA or AoS for NodeData? | ||
26 | |||
27 | - should we split leaf nodes and internal nodes into separate arrays? | ||
28 | Can we use it to save some bits here and there? (leaves don't need | ||
29 | first_child field, for example). | ||
30 | |||
31 | |||
32 | ## Parser | ||
33 | |||
34 | The syntax tree is produced using a three-staged process. | ||
35 | |||
36 | First, a raw text is split into tokens with a lexer (the `lexer` module). | ||
37 | Lexer has a peculiar signature: it is an `Fn(&str) -> Token`, where token | ||
38 | is a pair of `SyntaxKind` (you should have read the `tree` module and RFC | ||
39 | by this time! :)) and a len. That is, lexer chomps only the first | ||
40 | token of the input. This forces the lexer to be stateless, and makes | ||
41 | it possible to implement incremental relexing easily. | ||
42 | |||
43 | Then, the bulk of work, the parser turns a stream of tokens into | ||
44 | stream of events (the `parser` module; of particular interest are | ||
45 | the `parser/event` and `parser/parser` modules, which contain parsing | ||
46 | API, and the `parser/grammar` module, which contains actual parsing code | ||
47 | for various Rust syntactic constructs). Not that parser **does not** | ||
48 | construct a tree right away. This is done for several reasons: | ||
49 | |||
50 | * to decouple the actual tree data structure from the parser: you can | ||
51 | build any data structure you want from the stream of events | ||
52 | |||
53 | * to make parsing fast: you can produce a list of events without | ||
54 | allocations | ||
55 | |||
56 | * to make it easy to tweak tree structure. Consider this code: | ||
57 | |||
58 | ``` | ||
59 | #[cfg(test)] | ||
60 | pub fn foo() {} | ||
61 | ``` | ||
62 | |||
63 | Here, the attribute and the `pub` keyword must be the children of | ||
64 | the `fn` node. However, when parsing them, we don't yet know if | ||
65 | there would be a function ahead: it very well might be a `struct` | ||
66 | there. If we use events, we generally don't care about this *in | ||
67 | parser* and just spit them in order. | ||
68 | |||
69 | * (Is this true?) to make incremental reparsing easier: you can reuse | ||
70 | the same rope data structure for all of the original string, the | ||
71 | tokens and the events. | ||
72 | |||
73 | |||
74 | The parser also does not know about whitespace tokens: it's the job of | ||
75 | the next layer to assign whitespace and comments to nodes. However, | ||
76 | parser can remap contextual tokens, like `>>` or `union`, so it has | ||
77 | access to the text. | ||
78 | |||
79 | And at last, the TreeBuilder converts a flat stream of events into a | ||
80 | tree structure. It also *should* be responsible for attaching comments | ||
81 | and rebalancing the tree, but it does not do this yet :) | ||
82 | |||
83 | ## Validator | ||
84 | |||
85 | Parser and lexer accept a lot of *invalid* code intentionally. The | ||
86 | idea is to post-process the tree and to proper error reporting, | ||
87 | literal conversion and quick-fix suggestions. There is no | ||
88 | design/implementation for this yet. | ||
89 | |||
90 | |||
91 | ## AST | ||
92 | |||
93 | Nothing yet, see `AstNode` in `fall`. | ||
diff --git a/docs/RFC.md b/docs/RFC.md deleted file mode 100644 index 2bd9f18f1..000000000 --- a/docs/RFC.md +++ /dev/null | |||
@@ -1,494 +0,0 @@ | |||
1 | - Feature Name: libsyntax2.0 | ||
2 | - Start Date: 2017-12-30 | ||
3 | - RFC PR: (leave this empty) | ||
4 | - Rust Issue: (leave this empty) | ||
5 | |||
6 | |||
7 | >I think the lack of reusability comes in object-oriented languages, | ||
8 | >not functional languages. Because the problem with object-oriented | ||
9 | >languages is they’ve got all this implicit environment that they | ||
10 | >carry around with them. You wanted a banana but what you got was a | ||
11 | >gorilla holding the banana and the entire jungle. | ||
12 | > | ||
13 | >If you have referentially transparent code, if you have pure | ||
14 | >functions — all the data comes in its input arguments and everything | ||
15 | >goes out and leave no state behind — it’s incredibly reusable. | ||
16 | > | ||
17 | > **Joe Armstrong** | ||
18 | |||
19 | # Summary | ||
20 | [summary]: #summary | ||
21 | |||
22 | The long-term plan is to rewrite libsyntax parser and syntax tree data | ||
23 | structure to create a software component independent of the rest of | ||
24 | rustc compiler and suitable for the needs of IDEs and code | ||
25 | editors. This RFCs is the first step of this plan, whose goal is to | ||
26 | find out if this is possible at least in theory. If it is possible, | ||
27 | the next steps would be a prototype implementation as a crates.io | ||
28 | crate and a separate RFC for integrating the prototype with rustc, | ||
29 | other tools, and eventual libsyntax removal. | ||
30 | |||
31 | Note that this RFC does not propose to stabilize any API for working | ||
32 | with rust syntax: the semver version of the hypothetical library would | ||
33 | be `0.1.0`. It is intended to be used by tools, which are currently | ||
34 | closely related to the compiler: `rustc`, `rustfmt`, `clippy`, `rls` | ||
35 | and hypothetical `rustfix`. While it would be possible to create | ||
36 | third-party tools on top of the new libsyntax, the burden of adopting | ||
37 | to breaking changes would be on authors of such tools. | ||
38 | |||
39 | |||
40 | # Motivation | ||
41 | [motivation]: #motivation | ||
42 | |||
43 | There are two main drawbacks with the current version of libsyntax: | ||
44 | |||
45 | * It is tightly integrated with the compiler and hard to use | ||
46 | independently | ||
47 | |||
48 | * The AST representation is not well-suited for use inside IDEs | ||
49 | |||
50 | |||
51 | ## IDE support | ||
52 | |||
53 | There are several differences in how IDEs and compilers typically | ||
54 | treat source code. | ||
55 | |||
56 | In the compiler, it is convenient to transform the source | ||
57 | code into Abstract Syntax Tree form, which is independent of the | ||
58 | surface syntax. For example, it's convenient to discard comments, | ||
59 | whitespaces and desugar some syntactic constructs in terms of the | ||
60 | simpler ones. | ||
61 | |||
62 | In contrast, IDEs work much closer to the source code, so it is | ||
63 | crucial to preserve full information about the original text. For | ||
64 | example, IDE may adjust indentation after typing a `}` which closes a | ||
65 | block, and to do this correctly, IDE must be aware of syntax (that is, | ||
66 | that `}` indeed closes some block, and is not a syntax error) and of | ||
67 | all whitespaces and comments. So, IDE suitable AST should explicitly | ||
68 | account for syntactic elements, not considered important by the | ||
69 | compiler. | ||
70 | |||
71 | Another difference is that IDEs typically work with incomplete and | ||
72 | syntactically invalid code. This boils down to two parser properties. | ||
73 | First, the parser must produce syntax tree even if some required input | ||
74 | is missing. For example, for input `fn foo` the function node should | ||
75 | be present in the parse, despite the fact that there is no parameters | ||
76 | or body. Second, the parser must be able to skip over parts of input | ||
77 | it can't recognize and aggressively recover from errors. That is, the | ||
78 | syntax tree data structure should be able to handle both missing and | ||
79 | extra nodes. | ||
80 | |||
81 | IDEs also need the ability to incrementally reparse and relex source | ||
82 | code after the user types. A smart IDE would use syntax tree structure | ||
83 | to handle editing commands (for example, to add/remove trailing commas | ||
84 | after join/split lines actions), so parsing time can be very | ||
85 | noticeable. | ||
86 | |||
87 | |||
88 | Currently rustc uses the classical AST approach, and preserves some of | ||
89 | the source code information in the form of spans in the AST. It is not | ||
90 | clear if this structure can full fill all IDE requirements. | ||
91 | |||
92 | |||
93 | ## Reusability | ||
94 | |||
95 | In theory, the parser can be a pure function, which takes a `&str` as | ||
96 | an input, and produces a `ParseTree` as an output. | ||
97 | |||
98 | This is great for reusability: for example, you can compile this | ||
99 | function to WASM and use it for fast client-side validation of syntax | ||
100 | on the rust playground, or you can develop tools like `rustfmt` on | ||
101 | stable Rust outside of rustc repository, or you can embed the parser | ||
102 | into your favorite IDE or code editor. | ||
103 | |||
104 | This is also great for correctness: with such simple interface, it's | ||
105 | possible to write property-based tests to thoroughly compare two | ||
106 | different implementations of the parser. It's also straightforward to | ||
107 | create a comprehensive test suite, because all the inputs and outputs | ||
108 | are trivially serializable to human-readable text. | ||
109 | |||
110 | Another benefit is performance: with this signature, you can cache a | ||
111 | parse tree for each file, with trivial strategy for cache invalidation | ||
112 | (invalidate an entry when the underling file changes). On top of such | ||
113 | a cache it is possible to build a smart code indexer which maintains | ||
114 | the set of symbols in the project, watches files for changes and | ||
115 | automatically reindexes only changed files. | ||
116 | |||
117 | Unfortunately, the current libsyntax is far from this ideal. For | ||
118 | example, even the lexer makes use of the `FileMap` which is | ||
119 | essentially a global state of the compiler which represents all know | ||
120 | files. As a data point, it turned out to be easier to move `rustfmt` | ||
121 | into the main `rustc` repository than to move libsyntax outside! | ||
122 | |||
123 | |||
124 | # Guide-level explanation | ||
125 | [guide-level-explanation]: #guide-level-explanation | ||
126 | |||
127 | Not applicable. | ||
128 | |||
129 | |||
130 | # Reference-level explanation | ||
131 | [reference-level-explanation]: #reference-level-explanation | ||
132 | |||
133 | It is not clear if a single parser can accommodate the needs of the | ||
134 | compiler and the IDE, but there is hope that it is possible. The RFC | ||
135 | proposes to develop libsynax2.0 as an experimental crates.io crate. If | ||
136 | the experiment turns out to be a success, the second RFC will propose | ||
137 | to integrate it with all existing tools and `rustc`. | ||
138 | |||
139 | Next, a syntax tree data structure is proposed for libsyntax2.0. It | ||
140 | seems to have the following important properties: | ||
141 | |||
142 | * It is lossless and faithfully represents the original source code, | ||
143 | including explicit nodes for comments and whitespace. | ||
144 | |||
145 | * It is flexible and allows to encode arbitrary node structure, | ||
146 | even for invalid syntax. | ||
147 | |||
148 | * It is minimal: it stores small amount of data and has no | ||
149 | dependencies. For instance, it does not need compiler's string | ||
150 | interner or literal data representation. | ||
151 | |||
152 | * While the tree itself is minimal, it is extensible in a sense that | ||
153 | it possible to associate arbitrary data with certain nodes in a | ||
154 | type-safe way. | ||
155 | |||
156 | |||
157 | It is not clear if this representation is the best one. It is heavily | ||
158 | inspired by [PSI] data structure which used in [IntelliJ] based IDEs | ||
159 | and in the [Kotlin] compiler. | ||
160 | |||
161 | [PSI]: http://www.jetbrains.org/intellij/sdk/docs/reference_guide/custom_language_support/implementing_parser_and_psi.html | ||
162 | [IntelliJ]: https://github.com/JetBrains/intellij-community/ | ||
163 | [Kotlin]: https://kotlinlang.org/ | ||
164 | |||
165 | |||
166 | ## Untyped Tree | ||
167 | |||
168 | The main idea is to store the minimal amount of information in the | ||
169 | tree itself, and instead lean heavily on the source code for the | ||
170 | actual data about identifier names, constant values etc. | ||
171 | |||
172 | All nodes in the tree are of the same type and store a constant for | ||
173 | the syntactic category of the element and a range in the source code. | ||
174 | |||
175 | Here is a minimal implementation of this data structure with some Rust | ||
176 | syntactic categories | ||
177 | |||
178 | |||
179 | ```rust | ||
180 | #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] | ||
181 | pub struct NodeKind(u16); | ||
182 | |||
183 | pub struct File { | ||
184 | text: String, | ||
185 | nodes: Vec<NodeData>, | ||
186 | } | ||
187 | |||
188 | struct NodeData { | ||
189 | kind: NodeKind, | ||
190 | range: (u32, u32), | ||
191 | parent: Option<u32>, | ||
192 | first_child: Option<u32>, | ||
193 | next_sibling: Option<u32>, | ||
194 | } | ||
195 | |||
196 | #[derive(Clone, Copy)] | ||
197 | pub struct Node<'f> { | ||
198 | file: &'f File, | ||
199 | idx: u32, | ||
200 | } | ||
201 | |||
202 | pub struct Children<'f> { | ||
203 | next: Option<Node<'f>>, | ||
204 | } | ||
205 | |||
206 | impl File { | ||
207 | pub fn root<'f>(&'f self) -> Node<'f> { | ||
208 | assert!(!self.nodes.is_empty()); | ||
209 | Node { file: self, idx: 0 } | ||
210 | } | ||
211 | } | ||
212 | |||
213 | impl<'f> Node<'f> { | ||
214 | pub fn kind(&self) -> NodeKind { | ||
215 | self.data().kind | ||
216 | } | ||
217 | |||
218 | pub fn text(&self) -> &'f str { | ||
219 | let (start, end) = self.data().range; | ||
220 | &self.file.text[start as usize..end as usize] | ||
221 | } | ||
222 | |||
223 | pub fn parent(&self) -> Option<Node<'f>> { | ||
224 | self.as_node(self.data().parent) | ||
225 | } | ||
226 | |||
227 | pub fn children(&self) -> Children<'f> { | ||
228 | Children { next: self.as_node(self.data().first_child) } | ||
229 | } | ||
230 | |||
231 | fn data(&self) -> &'f NodeData { | ||
232 | &self.file.nodes[self.idx as usize] | ||
233 | } | ||
234 | |||
235 | fn as_node(&self, idx: Option<u32>) -> Option<Node<'f>> { | ||
236 | idx.map(|idx| Node { file: self.file, idx }) | ||
237 | } | ||
238 | } | ||
239 | |||
240 | impl<'f> Iterator for Children<'f> { | ||
241 | type Item = Node<'f>; | ||
242 | |||
243 | fn next(&mut self) -> Option<Node<'f>> { | ||
244 | let next = self.next; | ||
245 | self.next = next.and_then(|node| node.as_node(node.data().next_sibling)); | ||
246 | next | ||
247 | } | ||
248 | } | ||
249 | |||
250 | pub const ERROR: NodeKind = NodeKind(0); | ||
251 | pub const WHITESPACE: NodeKind = NodeKind(1); | ||
252 | pub const STRUCT_KW: NodeKind = NodeKind(2); | ||
253 | pub const IDENT: NodeKind = NodeKind(3); | ||
254 | pub const L_CURLY: NodeKind = NodeKind(4); | ||
255 | pub const R_CURLY: NodeKind = NodeKind(5); | ||
256 | pub const COLON: NodeKind = NodeKind(6); | ||
257 | pub const COMMA: NodeKind = NodeKind(7); | ||
258 | pub const AMP: NodeKind = NodeKind(8); | ||
259 | pub const LINE_COMMENT: NodeKind = NodeKind(9); | ||
260 | pub const FILE: NodeKind = NodeKind(10); | ||
261 | pub const STRUCT_DEF: NodeKind = NodeKind(11); | ||
262 | pub const FIELD_DEF: NodeKind = NodeKind(12); | ||
263 | pub const TYPE_REF: NodeKind = NodeKind(13); | ||
264 | ``` | ||
265 | |||
266 | Here is a rust snippet and the corresponding parse tree: | ||
267 | |||
268 | ```rust | ||
269 | struct Foo { | ||
270 | field1: u32, | ||
271 | & | ||
272 | // non-doc comment | ||
273 | field2: | ||
274 | } | ||
275 | ``` | ||
276 | |||
277 | |||
278 | ``` | ||
279 | FILE | ||
280 | STRUCT_DEF | ||
281 | STRUCT_KW | ||
282 | WHITESPACE | ||
283 | IDENT | ||
284 | WHITESPACE | ||
285 | L_CURLY | ||
286 | WHITESPACE | ||
287 | FIELD_DEF | ||
288 | IDENT | ||
289 | COLON | ||
290 | WHITESPACE | ||
291 | TYPE_REF | ||
292 | IDENT | ||
293 | COMMA | ||
294 | WHITESPACE | ||
295 | ERROR | ||
296 | AMP | ||
297 | WHITESPACE | ||
298 | FIELD_DEF | ||
299 | LINE_COMMENT | ||
300 | WHITESPACE | ||
301 | IDENT | ||
302 | COLON | ||
303 | ERROR | ||
304 | WHITESPACE | ||
305 | R_CURLY | ||
306 | ``` | ||
307 | |||
308 | Note several features of the tree: | ||
309 | |||
310 | * All whitespace and comments are explicitly accounted for. | ||
311 | |||
312 | * The node for `STRUCT_DEF` contains the error element for `&`, but | ||
313 | still represents the following field correctly. | ||
314 | |||
315 | * The second field of the struct is incomplete: `FIELD_DEF` node for | ||
316 | it contains an `ERROR` element, but nevertheless has the correct | ||
317 | `NodeKind`. | ||
318 | |||
319 | * The non-documenting comment is correctly attached to the following | ||
320 | field. | ||
321 | |||
322 | |||
323 | ## Typed Tree | ||
324 | |||
325 | It's hard to work with this raw parse tree, because it is untyped: | ||
326 | node containing a struct definition has the same API as the node for | ||
327 | the struct field. But it's possible to add a strongly typed layer on | ||
328 | top of this raw tree, and get a zero-cost AST. Here is an example | ||
329 | which adds type-safe wrappers for structs and fields: | ||
330 | |||
331 | ```rust | ||
332 | // generic infrastructure | ||
333 | |||
334 | pub trait AstNode<'f>: Copy + 'f { | ||
335 | fn new(node: Node<'f>) -> Option<Self>; | ||
336 | fn node(&self) -> Node<'f>; | ||
337 | } | ||
338 | |||
339 | pub fn child_of_kind<'f>(node: Node<'f>, kind: NodeKind) -> Option<Node<'f>> { | ||
340 | node.children().find(|child| child.kind() == kind) | ||
341 | } | ||
342 | |||
343 | pub fn ast_children<'f, A: AstNode<'f>>(node: Node<'f>) -> Box<Iterator<Item=A> + 'f> { | ||
344 | Box::new(node.children().filter_map(A::new)) | ||
345 | } | ||
346 | |||
347 | // AST elements, specific to Rust | ||
348 | |||
349 | #[derive(Clone, Copy)] | ||
350 | pub struct StructDef<'f>(Node<'f>); | ||
351 | |||
352 | #[derive(Clone, Copy)] | ||
353 | pub struct FieldDef<'f>(Node<'f>); | ||
354 | |||
355 | #[derive(Clone, Copy)] | ||
356 | pub struct TypeRef<'f>(Node<'f>); | ||
357 | |||
358 | pub trait NameOwner<'f>: AstNode<'f> { | ||
359 | fn name_ident(&self) -> Node<'f> { | ||
360 | child_of_kind(self.node(), IDENT).unwrap() | ||
361 | } | ||
362 | |||
363 | fn name(&self) -> &'f str { self.name_ident().text() } | ||
364 | } | ||
365 | |||
366 | |||
367 | impl<'f> AstNode<'f> for StructDef<'f> { | ||
368 | fn new(node: Node<'f>) -> Option<Self> { | ||
369 | if node.kind() == STRUCT_DEF { Some(StructDef(node)) } else { None } | ||
370 | } | ||
371 | fn node(&self) -> Node<'f> { self.0 } | ||
372 | } | ||
373 | |||
374 | impl<'f> NameOwner<'f> for StructDef<'f> {} | ||
375 | |||
376 | impl<'f> StructDef<'f> { | ||
377 | pub fn fields(&self) -> Box<Iterator<Item=FieldDef<'f>> + 'f> { | ||
378 | ast_children(self.node()) | ||
379 | } | ||
380 | } | ||
381 | |||
382 | |||
383 | impl<'f> AstNode<'f> for FieldDef<'f> { | ||
384 | fn new(node: Node<'f>) -> Option<Self> { | ||
385 | if node.kind() == FIELD_DEF { Some(FieldDef(node)) } else { None } | ||
386 | } | ||
387 | fn node(&self) -> Node<'f> { self.0 } | ||
388 | } | ||
389 | |||
390 | impl<'f> FieldDef<'f> { | ||
391 | pub fn type_ref(&self) -> Option<TypeRef<'f>> { | ||
392 | ast_children(self.node()).next() | ||
393 | } | ||
394 | } | ||
395 | |||
396 | impl<'f> NameOwner<'f> for FieldDef<'f> {} | ||
397 | |||
398 | |||
399 | impl<'f> AstNode<'f> for TypeRef<'f> { | ||
400 | fn new(node: Node<'f>) -> Option<Self> { | ||
401 | if node.kind() == TYPE_REF { Some(TypeRef(node)) } else { None } | ||
402 | } | ||
403 | fn node(&self) -> Node<'f> { self.0 } | ||
404 | } | ||
405 | ``` | ||
406 | |||
407 | Note that although AST wrappers provide a type-safe access to the | ||
408 | tree, they are still represented as indexes, so clients of the syntax | ||
409 | tree can easily associated additional data with AST nodes by storing | ||
410 | it in a side-table. | ||
411 | |||
412 | |||
413 | ## Missing Source Code | ||
414 | |||
415 | The crucial feature of this syntax tree is that it is just a view into | ||
416 | the original source code. And this poses a problem for the Rust | ||
417 | language, because not all compiled Rust code is represented in the | ||
418 | form of source code! Specifically, Rust has a powerful macro system, | ||
419 | which effectively allows to create and parse additional source code at | ||
420 | compile time. It is not entirely clear that the proposed parsing | ||
421 | framework is able to handle this use case, and it's the main purpose | ||
422 | of this RFC to figure it out. The current idea for handling macros is | ||
423 | to make each macro expansion produce a triple of (expansion text, | ||
424 | syntax tree, hygiene information), where hygiene information is a side | ||
425 | table, which colors different ranges of the expansion text according | ||
426 | to the original syntactic context. | ||
427 | |||
428 | |||
429 | ## Implementation plan | ||
430 | |||
431 | This RFC proposes huge changes to the internals of the compiler, so | ||
432 | it's important to proceed carefully and incrementally. The following | ||
433 | plan is suggested: | ||
434 | |||
435 | * RFC discussion about the theoretical feasibility of the proposal, | ||
436 | and the best representation representation for the syntax tree. | ||
437 | |||
438 | * Implementation of the proposal as a completely separate crates.io | ||
439 | crate, by refactoring existing libsyntax source code to produce a | ||
440 | new tree. | ||
441 | |||
442 | * A prototype implementation of the macro expansion on top of the new | ||
443 | sytnax tree. | ||
444 | |||
445 | * Additional round of discussion/RFC about merging with the mainline | ||
446 | compiler. | ||
447 | |||
448 | |||
449 | # Drawbacks | ||
450 | [drawbacks]: #drawbacks | ||
451 | |||
452 | - No harm will be done as long as the new libsyntax exists as an | ||
453 | experiemt on crates.io. However, actually using it in the compiler | ||
454 | and other tools would require massive refactorings. | ||
455 | |||
456 | - It's difficult to know upfront if the proposed syntax tree would | ||
457 | actually work well in both the compiler and IDE. It may be possible | ||
458 | that some drawbacks will be discovered during implementation. | ||
459 | |||
460 | |||
461 | # Rationale and alternatives | ||
462 | [alternatives]: #alternatives | ||
463 | |||
464 | - Incrementally add more information about source code to the current | ||
465 | AST. | ||
466 | |||
467 | - Move the current libsyntax to crates.io as is. In the past, there | ||
468 | were several failed attempts to do that. | ||
469 | |||
470 | - Explore alternative representations for the parse tree. | ||
471 | |||
472 | - Use parser generator instead of hand written parser. Using the | ||
473 | parser from libsyntax directly would be easier, and hand-written | ||
474 | LL-style parsers usually have much better error recovery than | ||
475 | generated LR-style ones. | ||
476 | |||
477 | # Unresolved questions | ||
478 | [unresolved]: #unresolved-questions | ||
479 | |||
480 | - Is it at all possible to represent Rust parser as a pure function of | ||
481 | the source code? It seems like the answer is yes, because the | ||
482 | language and especially macros were cleverly designed with this | ||
483 | use-case in mind. | ||
484 | |||
485 | |||
486 | - Is it possible to implement macro expansion using the proposed | ||
487 | framework? This is the main question of this RFC. The proposed | ||
488 | solution of synthesizing source code on the fly seems workable: it's | ||
489 | not that different from the current implementation, which | ||
490 | synthesizes token trees. | ||
491 | |||
492 | |||
493 | - How to actually phase out current libsyntax, if libsyntax2.0 turns | ||
494 | out to be a success? | ||
diff --git a/docs/TESTS.md b/docs/TESTS.md deleted file mode 100644 index a9d32d1d4..000000000 --- a/docs/TESTS.md +++ /dev/null | |||
@@ -1,44 +0,0 @@ | |||
1 | # libsyntax2.0 testing infrastructure | ||
2 | |||
3 | Libsyntax2.0 tests are in the `tests/data` directory. Each test is a | ||
4 | pair of files, an `.rs` file with Rust code and a `.txt` file with a | ||
5 | human-readable representation of syntax tree. | ||
6 | |||
7 | The test suite is intended to be independent from a particular parser: | ||
8 | that's why it is just a list of files. | ||
9 | |||
10 | The test suite is intended to be progressive: that is, if you want to | ||
11 | write a Rust parser, you can TDD it by working through the test in | ||
12 | order. That's why each test file begins with the number. Generally, | ||
13 | tests should be added in order of the appearance of corresponding | ||
14 | functionality in libsytnax2.0. If a bug in parser is uncovered, a | ||
15 | **new** test should be created instead of modifying an existing one: | ||
16 | it is preferable to have a gazillion of small isolated test files, | ||
17 | rather than a single file which covers all edge cases. It's okay for | ||
18 | files to have the same name except for the leading number. In general, | ||
19 | test suite should be append-only: old tests should not be modified, | ||
20 | new tests should be created instead. | ||
21 | |||
22 | Note that only `ok` tests are normative: `err` tests test error | ||
23 | recovery and it is totally ok for a parser to not implement any error | ||
24 | recovery at all. However, for libsyntax2.0 we do care about error | ||
25 | recovery, and we do care about precise and useful error messages. | ||
26 | |||
27 | There are also so-called "inline tests". They appear as the comments | ||
28 | with a `test` header in the source code, like this: | ||
29 | |||
30 | ```rust | ||
31 | // test fn_basic | ||
32 | // fn foo() {} | ||
33 | fn function(p: &mut Parser) { | ||
34 | // ... | ||
35 | } | ||
36 | ``` | ||
37 | |||
38 | You can run `cargo collect-tests` command to collect all inline tests | ||
39 | into `tests/data/inline` directory. The main advantage of inline tests | ||
40 | is that they help to illustrate what the relevant code is doing. | ||
41 | |||
42 | |||
43 | Contribution opportunity: design and implement testing infrastructure | ||
44 | for validators. | ||
diff --git a/docs/TOOLS.md b/docs/TOOLS.md deleted file mode 100644 index f8754c06f..000000000 --- a/docs/TOOLS.md +++ /dev/null | |||
@@ -1,36 +0,0 @@ | |||
1 | # Tools used to implement libsyntax | ||
2 | |||
3 | libsyntax uses several tools to help with development. | ||
4 | |||
5 | Each tool is a binary in the [tools/](../tools) package. | ||
6 | You can run them via `cargo run` command. | ||
7 | |||
8 | ``` | ||
9 | cargo run --package tools --bin tool | ||
10 | ``` | ||
11 | |||
12 | There are also aliases in [./cargo/config](../.cargo/config), | ||
13 | so the following also works: | ||
14 | |||
15 | ``` | ||
16 | cargo tool | ||
17 | ``` | ||
18 | |||
19 | |||
20 | ## Tool: `gen` | ||
21 | |||
22 | This tool reads a "grammar" from [grammar.ron](../grammar.ron) and | ||
23 | generates the `syntax_kinds.rs` file. You should run this tool if you | ||
24 | add new keywords or syntax elements. | ||
25 | |||
26 | |||
27 | ## Tool: `parse` | ||
28 | |||
29 | This tool reads rust source code from the standard input, parses it, | ||
30 | and prints the result to stdout. | ||
31 | |||
32 | |||
33 | ## Tool: `collect-tests` | ||
34 | |||
35 | This tools collect inline tests from comments in libsyntax2 source code | ||
36 | and places them into `tests/data/inline` directory. | ||