aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--rfc.md217
1 files changed, 209 insertions, 8 deletions
diff --git a/rfc.md b/rfc.md
index 6cc3e1a79..46bec624d 100644
--- a/rfc.md
+++ b/rfc.md
@@ -1,4 +1,4 @@
1- Feature Name: libsyntax2 1- Feature Name: libsyntax2.0
2- Start Date: 2017-12-30 2- Start Date: 2017-12-30
3- RFC PR: (leave this empty) 3- RFC PR: (leave this empty)
4- Rust Issue: (leave this empty) 4- Rust Issue: (leave this empty)
@@ -111,6 +111,8 @@ compiler.
111[Kotlin]: https://kotlinlang.org/ 111[Kotlin]: https://kotlinlang.org/
112 112
113 113
114## Untyped Tree
115
114The main idea is to store the minimal amount of information in the 116The main idea is to store the minimal amount of information in the
115tree itself, and instead lean heavily on the source code string for 117tree itself, and instead lean heavily on the source code string for
116the actual data about identifier names, constant values etc. 118the actual data about identifier names, constant values etc.
@@ -123,6 +125,90 @@ syntactic categories
123 125
124 126
125```rust 127```rust
128#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
129pub struct NodeKind(u16);
130
131pub struct File {
132 text: String,
133 nodes: Vec<NodeData>,
134}
135
136struct NodeData {
137 kind: NodeKind,
138 range: (u32, u32),
139 parent: Option<u32>,
140 first_child: Option<u32>,
141 next_sibling: Option<u32>,
142}
143
144#[derive(Clone, Copy)]
145pub struct Node<'f> {
146 file: &'f File,
147 idx: u32,
148}
149
150pub struct Children<'f> {
151 next: Option<Node<'f>>,
152}
153
154impl File {
155 pub fn root<'f>(&'f self) -> Node<'f> {
156 assert!(!self.nodes.is_empty());
157 Node { file: self, idx: 0 }
158 }
159}
160
161impl<'f> Node<'f> {
162 pub fn kind(&self) -> NodeKind {
163 self.data().kind
164 }
165
166 pub fn text(&self) -> &'f str {
167 let (start, end) = self.data().range;
168 &self.file.text[start as usize..end as usize]
169 }
170
171 pub fn parent(&self) -> Option<Node<'f>> {
172 self.as_node(self.data().parent)
173 }
174
175 pub fn children(&self) -> Children<'f> {
176 Children { next: self.as_node(self.data().first_child) }
177 }
178
179 fn data(&self) -> &'f NodeData {
180 &self.file.nodes[self.idx as usize]
181 }
182
183 fn as_node(&self, idx: Option<u32>) -> Option<Node<'f>> {
184 idx.map(|idx| Node { file: self.file, idx })
185 }
186}
187
188impl<'f> Iterator for Children<'f> {
189 type Item = Node<'f>;
190
191 fn next(&mut self) -> Option<Node<'f>> {
192 let next = self.next;
193 self.next = next.and_then(|node| node.as_node(node.data().next_sibling));
194 next
195 }
196}
197
198pub const ERROR: NodeKind = NodeKind(0);
199pub const WHITESPACE: NodeKind = NodeKind(1);
200pub const STRUCT_KW: NodeKind = NodeKind(2);
201pub const IDENT: NodeKind = NodeKind(3);
202pub const L_CURLY: NodeKind = NodeKind(4);
203pub const R_CURLY: NodeKind = NodeKind(5);
204pub const COLON: NodeKind = NodeKind(6);
205pub const COMMA: NodeKind = NodeKind(7);
206pub const AMP: NodeKind = NodeKind(8);
207pub const LINE_COMMENT: NodeKind = NodeKind(9);
208pub const FILE: NodeKind = NodeKind(10);
209pub const STRUCT_DEF: NodeKind = NodeKind(11);
210pub const FIELD_DEF: NodeKind = NodeKind(12);
211pub const TYPE_REF: NodeKind = NodeKind(13);
126``` 212```
127 213
128Here is a rust snippet and the corresponding parse tree: 214Here is a rust snippet and the corresponding parse tree:
@@ -182,22 +268,137 @@ Note several features of the tree:
182 field. 268 field.
183 269
184 270
271## Typed Tree
272
273It's hard to work with this raw parse tree, because it is untyped:
274node containing a struct definition has the same API as the node for
275the struct field. But it's possible to add a strongly typed layer on
276top of this raw tree, and get a zero-cost typed AST. Here is an
277example which adds type-safe wrappers for structs and fields:
278
279```rust
280pub trait AstNode<'f>: Copy + 'f {
281 fn new(node: Node<'f>) -> Option<Self>;
282 fn node(&self) -> Node<'f>;
283}
284
285pub fn child_of_kind<'f>(node: Node<'f>, kind: NodeKind) -> Option<Node<'f>> {
286 node.children().find(|child| child.kind() == kind)
287}
288
289pub fn ast_children<'f, A: AstNode<'f>>(node: Node<'f>) -> Box<Iterator<Item=A> + 'f> {
290 Box::new(node.children().filter_map(A::new))
291}
292
293#[derive(Clone, Copy)]
294pub struct StructDef<'f>(Node<'f>);
295
296#[derive(Clone, Copy)]
297pub struct FieldDef<'f>(Node<'f>);
298
299#[derive(Clone, Copy)]
300pub struct TypeRef<'f>(Node<'f>);
301
302pub trait NameOwner<'f>: AstNode<'f> {
303 fn name_ident(&self) -> Node<'f> {
304 child_of_kind(self.node(), IDENT).unwrap()
305 }
306
307 fn name(&self) -> &'f str { self.name_ident().text() }
308}
309
310
311impl<'f> AstNode<'f> for StructDef<'f> {
312 fn new(node: Node<'f>) -> Option<Self> {
313 if node.kind() == STRUCT_DEF { Some(StructDef(node)) } else { None }
314 }
315 fn node(&self) -> Node<'f> { self.0 }
316}
317
318impl<'f> AstNode<'f> for FieldDef<'f> {
319 fn new(node: Node<'f>) -> Option<Self> {
320 if node.kind() == FIELD_DEF { Some(FieldDef(node)) } else { None }
321 }
322 fn node(&self) -> Node<'f> { self.0 }
323}
324
325impl<'f> AstNode<'f> for TypeRef<'f> {
326 fn new(node: Node<'f>) -> Option<Self> {
327 if node.kind() == TYPE_REF { Some(TypeRef(node)) } else { None }
328 }
329 fn node(&self) -> Node<'f> { self.0 }
330}
331
332impl<'f> NameOwner<'f> for StructDef<'f> {}
333impl<'f> NameOwner<'f> for FieldDef<'f> {}
334
335impl<'f> StructDef<'f> {
336 pub fn fields(&self) -> Box<Iterator<Item=FieldDef<'f>> + 'f> {
337 ast_children(self.node())
338 }
339}
340
341impl<'f> FieldDef<'f> {
342 pub fn type_ref(&self) -> Option<TypeRef<'f>> {
343 ast_children(self.node()).next()
344 }
345}
346```
347
348
349## Missing Source Code
350
351The crucial feature of this syntax tree is that it is just a view into
352the original source code. And this poses a problem for the Rust
353language, because not all compiled Rust code is represented in the
354form of source code! Specifically, Rust has a powerful macro system,
355which effectively allows to create and parse additional source code at
356compile time. It is not entirely clear that the proposed parsing
357framework is able to handle this use case, and it's the main purpose
358of this RFC to figure it out. The current idea for handling macros is
359to make each macro expansion produce a triple of (expansion text,
360syntax tree, hygiene information), where hygiene information is a side
361table, which colors different ranges of the expansion text according
362to the original syntactic context.
363
364
365## Implementation plan
366
367This RFC proposes huge changes to the internals of the compiler, so
368it's important to proceed carefully and incrementally. The following
369plan is suggested:
370
371* RFC discussion about the theoretical feasibility of the proposal.
372
373* Implementation of the proposal as a completely separate crates.io
374 crate.
375
376* A prototype implementation of the macro expansion on top of the new sytnax tree.
377
378* Additional round of discussion/RFC about merging with the mainline
379 compiler.
380
185 381
186# Drawbacks 382# Drawbacks
187[drawbacks]: #drawbacks 383[drawbacks]: #drawbacks
188 384
189Why should we *not* do this? 385- No harm will be done as long as the new libsyntax exists as an
386 experiemt on crates.io. However, actually using it in the compiler
387 and other tools would require massive refactorings.
190 388
191# Rationale and alternatives 389# Rationale and alternatives
192[alternatives]: #alternatives 390[alternatives]: #alternatives
193 391
194- Why is this design the best in the space of possible designs? 392- Incrementally add more information about source code to the current AST.
195- What other designs have been considered and what is the rationale for not choosing them? 393- Move the current libsyntax to crates.io as is.
196- What is the impact of not doing this? 394- Explore alternative representations for the parse tree.
197 395
198# Unresolved questions 396# Unresolved questions
199[unresolved]: #unresolved-questions 397[unresolved]: #unresolved-questions
200 398
201- What parts of the design do you expect to resolve through the RFC process before this gets merged? 399- Is it at all possible to represent Rust parser as a pure function of
202- What parts of the design do you expect to resolve through the implementation of this feature before stabilization? 400 the source code?
203- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? 401- Is it possible to implement macro expansion using the proposed
402 framework?
403- How to actually phase out current libsyntax, if libsyntax2.0 turns
404 out to be a success?