From 2812015d40c94ddb94bb57ed1a9811ff24414c44 Mon Sep 17 00:00:00 2001
From: Aleksey Kladov
Date: Fri, 24 Aug 2018 18:14:21 +0300
Subject: README

---
 docs/ARCHITECTURE.md | 93 ----------------------------------------------------
 1 file changed, 93 deletions(-)
 delete mode 100644 docs/ARCHITECTURE.md

(limited to 'docs/ARCHITECTURE.md')

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
deleted file mode 100644
index 6b4434396..000000000
--- a/docs/ARCHITECTURE.md
+++ /dev/null
@@ -1,93 +0,0 @@
-# Design and open questions about libsyntax
-
-
-The high-level description of the architecture is in RFC.md. You might
-also want to dig through https://github.com/matklad/fall/ which
-contains some pretty interesting stuff build using similar ideas
-(warning: it is completely undocumented, poorly written and in general
-not the thing which I recommend to study (yes, this is
-self-contradictory)).
-
-## Tree
-
-The centerpiece of this whole endeavor is the syntax tree, in the
-`tree` module. Open questions:
-
-- how to best represent errors, to take advantage of the fact that
-  they are rare, but to enable fully-persistent style structure
-  sharing between tree nodes?
-
-- should we make red/green split from Roslyn more pronounced?
-
-- one can layout nodes in a single array in such a way that children
-  of the node form a continuous slice. Seems nifty, but do we need it?
-
-- should we use SoA or AoS for NodeData?
-
-- should we split leaf nodes and internal nodes into separate arrays?
-  Can we use it to save some bits here and there? (leaves don't need
-  first_child field, for example).
-
-
-## Parser
-
-The syntax tree is produced using a three-staged process.
-
-First, a raw text is split into tokens with a lexer (the `lexer` module).
-Lexer has a peculiar signature: it is an `Fn(&str) -> Token`, where token
-is a pair of `SyntaxKind` (you should have read the `tree` module and RFC
-by this time! :)) and a len. That is, lexer chomps only the first
-token of the input. This forces the lexer to be stateless, and makes
-it possible to implement incremental relexing easily.
-
-Then, the bulk of work, the parser turns a stream of tokens into
-stream of events (the `parser` module; of particular interest are
-the `parser/event` and `parser/parser` modules, which contain parsing
-API, and the `parser/grammar` module, which contains actual parsing code
-for various Rust syntactic constructs). Not that parser **does not**
-construct a tree right away. This is done for several reasons:
-
-* to decouple the actual tree data structure from the parser: you can
-  build any data structure you want from the stream of events
-
-* to make parsing fast: you can produce a list of events without
-  allocations
-
-* to make it easy to tweak tree structure. Consider this code:
-
-  ```
-  #[cfg(test)]
-  pub fn foo() {}
-  ```
-
-  Here, the attribute and the `pub` keyword must be the children of
-  the `fn` node. However, when parsing them, we don't yet know if
-  there would be a function ahead: it very well might be a `struct`
-  there. If we use events, we generally don't care about this *in
-  parser* and just spit them in order.
-
-* (Is this true?) to make incremental reparsing easier: you can reuse
-  the same rope data structure for all of the original string, the
-  tokens and the events.
-
-
-The parser also does not know about whitespace tokens: it's the job of
-the next layer to assign whitespace and comments to nodes. However,
-parser can remap contextual tokens, like `>>` or `union`, so it has
-access to the text.
-
-And at last, the TreeBuilder converts a flat stream of events into a
-tree structure. It also *should* be responsible for attaching comments
-and rebalancing the tree, but it does not do this yet :)
-
-## Validator
-
-Parser and lexer accept a lot of *invalid* code intentionally. The
-idea is to post-process the tree and to proper error reporting,
-literal conversion and quick-fix suggestions. There is no
-design/implementation for this yet.
-
-
-## AST
-
-Nothing yet, see `AstNode` in `fall`.
-- 
cgit v1.2.3
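
The removed Parser section describes the lexer as an `Fn(&str) -> Token` that chomps only the first token of its input. A minimal sketch of that shape follows, not the project's actual code: the `SyntaxKind` variants, the `Token` fields, and the helpers `prefix_len`, `first_token` and `tokenize` are all hypothetical names chosen for illustration, and the token classes are a toy subset rather than real Rust lexing rules.

```rust
// Hypothetical names throughout: this is an illustration, not the libsyntax API.

/// Toy token kinds; the real `SyntaxKind` covers far more than this.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyntaxKind {
    Whitespace,
    Ident,
    Number,
    Error,
}

/// A token is just a kind plus a length in bytes; it stores no text or offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: SyntaxKind,
    pub len: usize,
}

/// Byte length of the longest prefix of `text` whose chars all satisfy `pred`.
fn prefix_len(text: &str, pred: impl Fn(char) -> bool) -> usize {
    text.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(text.len(), |(i, _)| i)
}

/// Lexes only the first token of `text`. The function keeps no state between
/// calls. Precondition (an assumption of this sketch): `text` is non-empty.
pub fn first_token(text: &str) -> Token {
    let first = text.chars().next().expect("input must be non-empty");
    let (kind, len) = if first.is_whitespace() {
        (SyntaxKind::Whitespace, prefix_len(text, char::is_whitespace))
    } else if first.is_alphabetic() || first == '_' {
        (SyntaxKind::Ident, prefix_len(text, |c| c.is_alphanumeric() || c == '_'))
    } else if first.is_ascii_digit() {
        (SyntaxKind::Number, prefix_len(text, |c| c.is_ascii_digit()))
    } else {
        // Unknown character: emit a one-char error token and keep going.
        (SyntaxKind::Error, first.len_utf8())
    };
    Token { kind, len }
}

/// Drives `first_token` over the whole input by repeatedly slicing off the
/// token that was just produced.
pub fn tokenize(mut text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    while !text.is_empty() {
        let token = first_token(text);
        text = &text[token.len..];
        tokens.push(token);
    }
    tokens
}
```

Because `first_token` depends only on the slice it is given, lexing can restart from any token boundary, for example `first_token(&source[offset..])`, which is presumably what makes the incremental relexing mentioned in the text straightforward.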