diff options
author | Aleksey Kladov <[email protected]> | 2018-01-10 18:58:38 +0000 |
---|---|---|
committer | Aleksey Kladov <[email protected]> | 2018-01-10 18:58:38 +0000 |
commit | 4c10c31be339ef3082f142061d83149af2a30ec8 (patch) | |
tree | 2343fbe5933e9d02b557d597c26c350825b6691f /docs | |
parent | 5ea7e5fb7ab9f9ed762c8b5220ba01a29796a871 (diff) |
D: start documenting stuff
Diffstat (limited to 'docs')
-rw-r--r-- | docs/ARCHITECTURE.md | 1 | ||||
-rw-r--r-- | docs/CONTRIBUTING.md | 32 | ||||
-rw-r--r-- | docs/TESTS.md | 30 | ||||
-rw-r--r-- | docs/rfc.md | 494 | ||||
-rw-r--r-- | docs/validation.md | 14 |
5 files changed, 571 insertions, 0 deletions
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 000000000..5b50f8faa --- /dev/null +++ b/docs/ARCHITECTURE.md | |||
@@ -0,0 +1 @@ | |||
# Design and open questions about libsyntax. | |||
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md new file mode 100644 index 000000000..5ae2e830e --- /dev/null +++ b/docs/CONTRIBUTING.md | |||
@@ -0,0 +1,32 @@ | |||
1 | The project is in its early stages: contributions are welcome and | ||
2 | would be **very** helpful, but the project is not *yet* optimized for | ||
3 | contributors. Moreover, it is doubly experimental, so there's no | ||
4 | guarantee that any work here would reach production. That said, here | ||
5 | are some arias where contributions would be **especially** welcome: | ||
6 | |||
7 | |||
8 | * Designing internal data structures: RFC only outlines the | ||
9 | constraints, it's an open question how to satisfy them in the | ||
10 | optimal way. See `ARCHITECTURE.md` for current design questions. | ||
11 | |||
12 | * Porting libsyntax parser to libsyntax2: currently libsyntax2 parses | ||
13 | only a tiny subset of Rust. This should be fixed by porting parsing | ||
14 | functions from libsyntax one by one. | ||
15 | |||
16 | * Writing validators: by design, libsyntax2 is very lax about the | ||
17 | input. For example, the lexer happily accepts unclosed strings. The | ||
18 | idea is that there should be a higher level visitor, which walks the | ||
19 | syntax tree after parsing and produces all the warnings. Alas, | ||
20 | there's no such visitor yet :( Would you like to write one? :) | ||
21 | |||
22 | * Creating tests: it would be tremendously helpful to read each of | ||
23 | libsyntax and libsyntax2 parser functions and crate a small separate | ||
24 | test cases to cover each and every edge case. | ||
25 | |||
26 | * Building stuff with libsyntax2: it would be really cool to compile | ||
27 | libsyntax2 to WASM and add *client side* syntax validation to rust | ||
28 | playground! | ||
29 | |||
30 | |||
31 | Do take a look at the issue tracker, and try to read other docs in | ||
32 | this folder. | ||
diff --git a/docs/TESTS.md b/docs/TESTS.md new file mode 100644 index 000000000..8005ec9da --- /dev/null +++ b/docs/TESTS.md | |||
@@ -0,0 +1,30 @@ | |||
1 | # libsyntax2.0 testing infrastructure | ||
2 | |||
3 | Libsyntax2.0 tests are in the `tests/data` directory. Each test is a | ||
4 | pair of files, an `.rs` file with Rust code and a `.txt` file with a | ||
5 | human-readable representation of syntax tree. | ||
6 | |||
7 | The test suite is intended to be independent from a particular parser: | ||
8 | that's why it is just a list of files. | ||
9 | |||
10 | The test suite is intended to be progressive: that is, if you want to | ||
11 | write a Rust parser, you can TDD it by working through the test in | ||
12 | order. That's why each test file begins with the number. Generally, | ||
13 | tests should be added in order of the appearance of corresponding | ||
14 | functionality in libsytnax2.0. If a bug in parser is uncovered, a | ||
15 | **new** test should be created instead of modifying an existing one: | ||
16 | it is preferable to have a gazillion of small isolated test files, | ||
17 | rather than a single file which covers all edge cases. It's okay for | ||
18 | files to have the same name except for the leading number. In general, | ||
19 | test suite should be append-only: old tests should not be modified, | ||
20 | new tests should be created instead. | ||
21 | |||
22 | |||
23 | Note that only `ok` tests are normative: `err` tests test error | ||
24 | recovery and it is totally ok for a parser to not implement any error | ||
25 | recovery at all. However, for libsyntax2.0 we do care about error | ||
26 | recovery, and we do care about precise and useful error messages. | ||
27 | |||
28 | |||
29 | Contribution opportunity: design and implement testing infrastructure | ||
30 | for validators. | ||
diff --git a/docs/rfc.md b/docs/rfc.md new file mode 100644 index 000000000..2bd9f18f1 --- /dev/null +++ b/docs/rfc.md | |||
@@ -0,0 +1,494 @@ | |||
1 | - Feature Name: libsyntax2.0 | ||
2 | - Start Date: 2017-12-30 | ||
3 | - RFC PR: (leave this empty) | ||
4 | - Rust Issue: (leave this empty) | ||
5 | |||
6 | |||
7 | >I think the lack of reusability comes in object-oriented languages, | ||
8 | >not functional languages. Because the problem with object-oriented | ||
9 | >languages is they’ve got all this implicit environment that they | ||
10 | >carry around with them. You wanted a banana but what you got was a | ||
11 | >gorilla holding the banana and the entire jungle. | ||
12 | > | ||
13 | >If you have referentially transparent code, if you have pure | ||
14 | >functions — all the data comes in its input arguments and everything | ||
15 | >goes out and leave no state behind — it’s incredibly reusable. | ||
16 | > | ||
17 | > **Joe Armstrong** | ||
18 | |||
19 | # Summary | ||
20 | [summary]: #summary | ||
21 | |||
22 | The long-term plan is to rewrite libsyntax parser and syntax tree data | ||
23 | structure to create a software component independent of the rest of | ||
24 | rustc compiler and suitable for the needs of IDEs and code | ||
25 | editors. This RFCs is the first step of this plan, whose goal is to | ||
26 | find out if this is possible at least in theory. If it is possible, | ||
27 | the next steps would be a prototype implementation as a crates.io | ||
28 | crate and a separate RFC for integrating the prototype with rustc, | ||
29 | other tools, and eventual libsyntax removal. | ||
30 | |||
31 | Note that this RFC does not propose to stabilize any API for working | ||
32 | with rust syntax: the semver version of the hypothetical library would | ||
33 | be `0.1.0`. It is intended to be used by tools, which are currently | ||
34 | closely related to the compiler: `rustc`, `rustfmt`, `clippy`, `rls` | ||
35 | and hypothetical `rustfix`. While it would be possible to create | ||
36 | third-party tools on top of the new libsyntax, the burden of adopting | ||
37 | to breaking changes would be on authors of such tools. | ||
38 | |||
39 | |||
40 | # Motivation | ||
41 | [motivation]: #motivation | ||
42 | |||
43 | There are two main drawbacks with the current version of libsyntax: | ||
44 | |||
45 | * It is tightly integrated with the compiler and hard to use | ||
46 | independently | ||
47 | |||
48 | * The AST representation is not well-suited for use inside IDEs | ||
49 | |||
50 | |||
51 | ## IDE support | ||
52 | |||
53 | There are several differences in how IDEs and compilers typically | ||
54 | treat source code. | ||
55 | |||
56 | In the compiler, it is convenient to transform the source | ||
57 | code into Abstract Syntax Tree form, which is independent of the | ||
58 | surface syntax. For example, it's convenient to discard comments, | ||
59 | whitespaces and desugar some syntactic constructs in terms of the | ||
60 | simpler ones. | ||
61 | |||
62 | In contrast, IDEs work much closer to the source code, so it is | ||
63 | crucial to preserve full information about the original text. For | ||
64 | example, IDE may adjust indentation after typing a `}` which closes a | ||
65 | block, and to do this correctly, IDE must be aware of syntax (that is, | ||
66 | that `}` indeed closes some block, and is not a syntax error) and of | ||
67 | all whitespaces and comments. So, IDE suitable AST should explicitly | ||
68 | account for syntactic elements, not considered important by the | ||
69 | compiler. | ||
70 | |||
71 | Another difference is that IDEs typically work with incomplete and | ||
72 | syntactically invalid code. This boils down to two parser properties. | ||
73 | First, the parser must produce syntax tree even if some required input | ||
74 | is missing. For example, for input `fn foo` the function node should | ||
75 | be present in the parse, despite the fact that there is no parameters | ||
76 | or body. Second, the parser must be able to skip over parts of input | ||
77 | it can't recognize and aggressively recover from errors. That is, the | ||
78 | syntax tree data structure should be able to handle both missing and | ||
79 | extra nodes. | ||
80 | |||
81 | IDEs also need the ability to incrementally reparse and relex source | ||
82 | code after the user types. A smart IDE would use syntax tree structure | ||
83 | to handle editing commands (for example, to add/remove trailing commas | ||
84 | after join/split lines actions), so parsing time can be very | ||
85 | noticeable. | ||
86 | |||
87 | |||
88 | Currently rustc uses the classical AST approach, and preserves some of | ||
89 | the source code information in the form of spans in the AST. It is not | ||
90 | clear if this structure can full fill all IDE requirements. | ||
91 | |||
92 | |||
93 | ## Reusability | ||
94 | |||
95 | In theory, the parser can be a pure function, which takes a `&str` as | ||
96 | an input, and produces a `ParseTree` as an output. | ||
97 | |||
98 | This is great for reusability: for example, you can compile this | ||
99 | function to WASM and use it for fast client-side validation of syntax | ||
100 | on the rust playground, or you can develop tools like `rustfmt` on | ||
101 | stable Rust outside of rustc repository, or you can embed the parser | ||
102 | into your favorite IDE or code editor. | ||
103 | |||
104 | This is also great for correctness: with such simple interface, it's | ||
105 | possible to write property-based tests to thoroughly compare two | ||
106 | different implementations of the parser. It's also straightforward to | ||
107 | create a comprehensive test suite, because all the inputs and outputs | ||
108 | are trivially serializable to human-readable text. | ||
109 | |||
110 | Another benefit is performance: with this signature, you can cache a | ||
111 | parse tree for each file, with trivial strategy for cache invalidation | ||
112 | (invalidate an entry when the underling file changes). On top of such | ||
113 | a cache it is possible to build a smart code indexer which maintains | ||
114 | the set of symbols in the project, watches files for changes and | ||
115 | automatically reindexes only changed files. | ||
116 | |||
117 | Unfortunately, the current libsyntax is far from this ideal. For | ||
118 | example, even the lexer makes use of the `FileMap` which is | ||
119 | essentially a global state of the compiler which represents all know | ||
120 | files. As a data point, it turned out to be easier to move `rustfmt` | ||
121 | into the main `rustc` repository than to move libsyntax outside! | ||
122 | |||
123 | |||
124 | # Guide-level explanation | ||
125 | [guide-level-explanation]: #guide-level-explanation | ||
126 | |||
127 | Not applicable. | ||
128 | |||
129 | |||
130 | # Reference-level explanation | ||
131 | [reference-level-explanation]: #reference-level-explanation | ||
132 | |||
133 | It is not clear if a single parser can accommodate the needs of the | ||
134 | compiler and the IDE, but there is hope that it is possible. The RFC | ||
135 | proposes to develop libsynax2.0 as an experimental crates.io crate. If | ||
136 | the experiment turns out to be a success, the second RFC will propose | ||
137 | to integrate it with all existing tools and `rustc`. | ||
138 | |||
139 | Next, a syntax tree data structure is proposed for libsyntax2.0. It | ||
140 | seems to have the following important properties: | ||
141 | |||
142 | * It is lossless and faithfully represents the original source code, | ||
143 | including explicit nodes for comments and whitespace. | ||
144 | |||
145 | * It is flexible and allows to encode arbitrary node structure, | ||
146 | even for invalid syntax. | ||
147 | |||
148 | * It is minimal: it stores small amount of data and has no | ||
149 | dependencies. For instance, it does not need compiler's string | ||
150 | interner or literal data representation. | ||
151 | |||
152 | * While the tree itself is minimal, it is extensible in a sense that | ||
153 | it possible to associate arbitrary data with certain nodes in a | ||
154 | type-safe way. | ||
155 | |||
156 | |||
157 | It is not clear if this representation is the best one. It is heavily | ||
158 | inspired by [PSI] data structure which used in [IntelliJ] based IDEs | ||
159 | and in the [Kotlin] compiler. | ||
160 | |||
161 | [PSI]: http://www.jetbrains.org/intellij/sdk/docs/reference_guide/custom_language_support/implementing_parser_and_psi.html | ||
162 | [IntelliJ]: https://github.com/JetBrains/intellij-community/ | ||
163 | [Kotlin]: https://kotlinlang.org/ | ||
164 | |||
165 | |||
166 | ## Untyped Tree | ||
167 | |||
168 | The main idea is to store the minimal amount of information in the | ||
169 | tree itself, and instead lean heavily on the source code for the | ||
170 | actual data about identifier names, constant values etc. | ||
171 | |||
172 | All nodes in the tree are of the same type and store a constant for | ||
173 | the syntactic category of the element and a range in the source code. | ||
174 | |||
175 | Here is a minimal implementation of this data structure with some Rust | ||
176 | syntactic categories | ||
177 | |||
178 | |||
179 | ```rust | ||
180 | #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] | ||
181 | pub struct NodeKind(u16); | ||
182 | |||
183 | pub struct File { | ||
184 | text: String, | ||
185 | nodes: Vec<NodeData>, | ||
186 | } | ||
187 | |||
188 | struct NodeData { | ||
189 | kind: NodeKind, | ||
190 | range: (u32, u32), | ||
191 | parent: Option<u32>, | ||
192 | first_child: Option<u32>, | ||
193 | next_sibling: Option<u32>, | ||
194 | } | ||
195 | |||
196 | #[derive(Clone, Copy)] | ||
197 | pub struct Node<'f> { | ||
198 | file: &'f File, | ||
199 | idx: u32, | ||
200 | } | ||
201 | |||
202 | pub struct Children<'f> { | ||
203 | next: Option<Node<'f>>, | ||
204 | } | ||
205 | |||
206 | impl File { | ||
207 | pub fn root<'f>(&'f self) -> Node<'f> { | ||
208 | assert!(!self.nodes.is_empty()); | ||
209 | Node { file: self, idx: 0 } | ||
210 | } | ||
211 | } | ||
212 | |||
213 | impl<'f> Node<'f> { | ||
214 | pub fn kind(&self) -> NodeKind { | ||
215 | self.data().kind | ||
216 | } | ||
217 | |||
218 | pub fn text(&self) -> &'f str { | ||
219 | let (start, end) = self.data().range; | ||
220 | &self.file.text[start as usize..end as usize] | ||
221 | } | ||
222 | |||
223 | pub fn parent(&self) -> Option<Node<'f>> { | ||
224 | self.as_node(self.data().parent) | ||
225 | } | ||
226 | |||
227 | pub fn children(&self) -> Children<'f> { | ||
228 | Children { next: self.as_node(self.data().first_child) } | ||
229 | } | ||
230 | |||
231 | fn data(&self) -> &'f NodeData { | ||
232 | &self.file.nodes[self.idx as usize] | ||
233 | } | ||
234 | |||
235 | fn as_node(&self, idx: Option<u32>) -> Option<Node<'f>> { | ||
236 | idx.map(|idx| Node { file: self.file, idx }) | ||
237 | } | ||
238 | } | ||
239 | |||
240 | impl<'f> Iterator for Children<'f> { | ||
241 | type Item = Node<'f>; | ||
242 | |||
243 | fn next(&mut self) -> Option<Node<'f>> { | ||
244 | let next = self.next; | ||
245 | self.next = next.and_then(|node| node.as_node(node.data().next_sibling)); | ||
246 | next | ||
247 | } | ||
248 | } | ||
249 | |||
250 | pub const ERROR: NodeKind = NodeKind(0); | ||
251 | pub const WHITESPACE: NodeKind = NodeKind(1); | ||
252 | pub const STRUCT_KW: NodeKind = NodeKind(2); | ||
253 | pub const IDENT: NodeKind = NodeKind(3); | ||
254 | pub const L_CURLY: NodeKind = NodeKind(4); | ||
255 | pub const R_CURLY: NodeKind = NodeKind(5); | ||
256 | pub const COLON: NodeKind = NodeKind(6); | ||
257 | pub const COMMA: NodeKind = NodeKind(7); | ||
258 | pub const AMP: NodeKind = NodeKind(8); | ||
259 | pub const LINE_COMMENT: NodeKind = NodeKind(9); | ||
260 | pub const FILE: NodeKind = NodeKind(10); | ||
261 | pub const STRUCT_DEF: NodeKind = NodeKind(11); | ||
262 | pub const FIELD_DEF: NodeKind = NodeKind(12); | ||
263 | pub const TYPE_REF: NodeKind = NodeKind(13); | ||
264 | ``` | ||
265 | |||
266 | Here is a rust snippet and the corresponding parse tree: | ||
267 | |||
268 | ```rust | ||
269 | struct Foo { | ||
270 | field1: u32, | ||
271 | & | ||
272 | // non-doc comment | ||
273 | field2: | ||
274 | } | ||
275 | ``` | ||
276 | |||
277 | |||
278 | ``` | ||
279 | FILE | ||
280 | STRUCT_DEF | ||
281 | STRUCT_KW | ||
282 | WHITESPACE | ||
283 | IDENT | ||
284 | WHITESPACE | ||
285 | L_CURLY | ||
286 | WHITESPACE | ||
287 | FIELD_DEF | ||
288 | IDENT | ||
289 | COLON | ||
290 | WHITESPACE | ||
291 | TYPE_REF | ||
292 | IDENT | ||
293 | COMMA | ||
294 | WHITESPACE | ||
295 | ERROR | ||
296 | AMP | ||
297 | WHITESPACE | ||
298 | FIELD_DEF | ||
299 | LINE_COMMENT | ||
300 | WHITESPACE | ||
301 | IDENT | ||
302 | COLON | ||
303 | ERROR | ||
304 | WHITESPACE | ||
305 | R_CURLY | ||
306 | ``` | ||
307 | |||
308 | Note several features of the tree: | ||
309 | |||
310 | * All whitespace and comments are explicitly accounted for. | ||
311 | |||
312 | * The node for `STRUCT_DEF` contains the error element for `&`, but | ||
313 | still represents the following field correctly. | ||
314 | |||
315 | * The second field of the struct is incomplete: `FIELD_DEF` node for | ||
316 | it contains an `ERROR` element, but nevertheless has the correct | ||
317 | `NodeKind`. | ||
318 | |||
319 | * The non-documenting comment is correctly attached to the following | ||
320 | field. | ||
321 | |||
322 | |||
323 | ## Typed Tree | ||
324 | |||
325 | It's hard to work with this raw parse tree, because it is untyped: | ||
326 | node containing a struct definition has the same API as the node for | ||
327 | the struct field. But it's possible to add a strongly typed layer on | ||
328 | top of this raw tree, and get a zero-cost AST. Here is an example | ||
329 | which adds type-safe wrappers for structs and fields: | ||
330 | |||
331 | ```rust | ||
332 | // generic infrastructure | ||
333 | |||
334 | pub trait AstNode<'f>: Copy + 'f { | ||
335 | fn new(node: Node<'f>) -> Option<Self>; | ||
336 | fn node(&self) -> Node<'f>; | ||
337 | } | ||
338 | |||
339 | pub fn child_of_kind<'f>(node: Node<'f>, kind: NodeKind) -> Option<Node<'f>> { | ||
340 | node.children().find(|child| child.kind() == kind) | ||
341 | } | ||
342 | |||
343 | pub fn ast_children<'f, A: AstNode<'f>>(node: Node<'f>) -> Box<Iterator<Item=A> + 'f> { | ||
344 | Box::new(node.children().filter_map(A::new)) | ||
345 | } | ||
346 | |||
347 | // AST elements, specific to Rust | ||
348 | |||
349 | #[derive(Clone, Copy)] | ||
350 | pub struct StructDef<'f>(Node<'f>); | ||
351 | |||
352 | #[derive(Clone, Copy)] | ||
353 | pub struct FieldDef<'f>(Node<'f>); | ||
354 | |||
355 | #[derive(Clone, Copy)] | ||
356 | pub struct TypeRef<'f>(Node<'f>); | ||
357 | |||
358 | pub trait NameOwner<'f>: AstNode<'f> { | ||
359 | fn name_ident(&self) -> Node<'f> { | ||
360 | child_of_kind(self.node(), IDENT).unwrap() | ||
361 | } | ||
362 | |||
363 | fn name(&self) -> &'f str { self.name_ident().text() } | ||
364 | } | ||
365 | |||
366 | |||
367 | impl<'f> AstNode<'f> for StructDef<'f> { | ||
368 | fn new(node: Node<'f>) -> Option<Self> { | ||
369 | if node.kind() == STRUCT_DEF { Some(StructDef(node)) } else { None } | ||
370 | } | ||
371 | fn node(&self) -> Node<'f> { self.0 } | ||
372 | } | ||
373 | |||
374 | impl<'f> NameOwner<'f> for StructDef<'f> {} | ||
375 | |||
376 | impl<'f> StructDef<'f> { | ||
377 | pub fn fields(&self) -> Box<Iterator<Item=FieldDef<'f>> + 'f> { | ||
378 | ast_children(self.node()) | ||
379 | } | ||
380 | } | ||
381 | |||
382 | |||
383 | impl<'f> AstNode<'f> for FieldDef<'f> { | ||
384 | fn new(node: Node<'f>) -> Option<Self> { | ||
385 | if node.kind() == FIELD_DEF { Some(FieldDef(node)) } else { None } | ||
386 | } | ||
387 | fn node(&self) -> Node<'f> { self.0 } | ||
388 | } | ||
389 | |||
390 | impl<'f> FieldDef<'f> { | ||
391 | pub fn type_ref(&self) -> Option<TypeRef<'f>> { | ||
392 | ast_children(self.node()).next() | ||
393 | } | ||
394 | } | ||
395 | |||
396 | impl<'f> NameOwner<'f> for FieldDef<'f> {} | ||
397 | |||
398 | |||
399 | impl<'f> AstNode<'f> for TypeRef<'f> { | ||
400 | fn new(node: Node<'f>) -> Option<Self> { | ||
401 | if node.kind() == TYPE_REF { Some(TypeRef(node)) } else { None } | ||
402 | } | ||
403 | fn node(&self) -> Node<'f> { self.0 } | ||
404 | } | ||
405 | ``` | ||
406 | |||
407 | Note that although AST wrappers provide a type-safe access to the | ||
408 | tree, they are still represented as indexes, so clients of the syntax | ||
409 | tree can easily associated additional data with AST nodes by storing | ||
410 | it in a side-table. | ||
411 | |||
412 | |||
413 | ## Missing Source Code | ||
414 | |||
415 | The crucial feature of this syntax tree is that it is just a view into | ||
416 | the original source code. And this poses a problem for the Rust | ||
417 | language, because not all compiled Rust code is represented in the | ||
418 | form of source code! Specifically, Rust has a powerful macro system, | ||
419 | which effectively allows to create and parse additional source code at | ||
420 | compile time. It is not entirely clear that the proposed parsing | ||
421 | framework is able to handle this use case, and it's the main purpose | ||
422 | of this RFC to figure it out. The current idea for handling macros is | ||
423 | to make each macro expansion produce a triple of (expansion text, | ||
424 | syntax tree, hygiene information), where hygiene information is a side | ||
425 | table, which colors different ranges of the expansion text according | ||
426 | to the original syntactic context. | ||
427 | |||
428 | |||
429 | ## Implementation plan | ||
430 | |||
431 | This RFC proposes huge changes to the internals of the compiler, so | ||
432 | it's important to proceed carefully and incrementally. The following | ||
433 | plan is suggested: | ||
434 | |||
435 | * RFC discussion about the theoretical feasibility of the proposal, | ||
436 | and the best representation representation for the syntax tree. | ||
437 | |||
438 | * Implementation of the proposal as a completely separate crates.io | ||
439 | crate, by refactoring existing libsyntax source code to produce a | ||
440 | new tree. | ||
441 | |||
442 | * A prototype implementation of the macro expansion on top of the new | ||
443 | sytnax tree. | ||
444 | |||
445 | * Additional round of discussion/RFC about merging with the mainline | ||
446 | compiler. | ||
447 | |||
448 | |||
449 | # Drawbacks | ||
450 | [drawbacks]: #drawbacks | ||
451 | |||
452 | - No harm will be done as long as the new libsyntax exists as an | ||
453 | experiemt on crates.io. However, actually using it in the compiler | ||
454 | and other tools would require massive refactorings. | ||
455 | |||
456 | - It's difficult to know upfront if the proposed syntax tree would | ||
457 | actually work well in both the compiler and IDE. It may be possible | ||
458 | that some drawbacks will be discovered during implementation. | ||
459 | |||
460 | |||
461 | # Rationale and alternatives | ||
462 | [alternatives]: #alternatives | ||
463 | |||
464 | - Incrementally add more information about source code to the current | ||
465 | AST. | ||
466 | |||
467 | - Move the current libsyntax to crates.io as is. In the past, there | ||
468 | were several failed attempts to do that. | ||
469 | |||
470 | - Explore alternative representations for the parse tree. | ||
471 | |||
472 | - Use parser generator instead of hand written parser. Using the | ||
473 | parser from libsyntax directly would be easier, and hand-written | ||
474 | LL-style parsers usually have much better error recovery than | ||
475 | generated LR-style ones. | ||
476 | |||
477 | # Unresolved questions | ||
478 | [unresolved]: #unresolved-questions | ||
479 | |||
480 | - Is it at all possible to represent Rust parser as a pure function of | ||
481 | the source code? It seems like the answer is yes, because the | ||
482 | language and especially macros were cleverly designed with this | ||
483 | use-case in mind. | ||
484 | |||
485 | |||
486 | - Is it possible to implement macro expansion using the proposed | ||
487 | framework? This is the main question of this RFC. The proposed | ||
488 | solution of synthesizing source code on the fly seems workable: it's | ||
489 | not that different from the current implementation, which | ||
490 | synthesizes token trees. | ||
491 | |||
492 | |||
493 | - How to actually phase out current libsyntax, if libsyntax2.0 turns | ||
494 | out to be a success? | ||
diff --git a/docs/validation.md b/docs/validation.md new file mode 100644 index 000000000..2739bfcdd --- /dev/null +++ b/docs/validation.md | |||
@@ -0,0 +1,14 @@ | |||
1 | Fixmes: | ||
2 | |||
3 | Lexer: | ||
4 | * Fix `is_whitespace`, add more tests | ||
5 | * Add more thorough tests for idents for XID_Start & XID_Continue | ||
6 | * Validate that float and integer literals use digits only of the appropriate | ||
7 | base, and are in range | ||
8 | * Validation for unclosed char literal | ||
9 | * Strings are completely wrong: more tests and comparison with libsyntax. | ||
10 | * Comment lexing is completely wrong | ||
11 | |||
12 | Parser: | ||
13 | * Figure out what is the expected state of attribute grammar. | ||
14 | Token trees or something more structured? Token trees would be unfortunate: no extend selection =/ \ No newline at end of file | ||