1 files changed, 96 insertions, 1 deletions
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 5b50f8faa..a1fa246c2 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -1 +1,96 @@
-# Design and open questions about libsyntax.
+# Design and open questions about libsyntax
+The high-level description of the architecture is in RFC.md. You might
+also want to dig through https://github.com/matklad/fall/ which
+contains some pretty interesting stuff build using similar ideas
+(warning: it is completely undocumented, poorly written and in general
+not the thing which I recommend to study (yes, this is
+self-contradictory)).
+## Tree
+The centerpiece of this whole endeavor is the syntax tree, in the
+`tree` module. Open questions:
+- how to best represent errors, to take advantage of the fact that
+  they are rare, but to enable fully-persistent style structure
+  sharing between tree nodes?
+  
+- should we make red/green split from Roslyn more pronounced?
+- one can layout nodes in a single array in such a way that children
+  of the node form a continuous slice. Seems nifty, but do we need it?
+  
+- should we use SoA or AoS for NodeData?
+- should we split leaf nodes and internal nodes into separate arrays?
+  Can we use it to save some bits here and there? (leaves don't need
+  first_child field, for example).
+## Parser
+The syntax tree is produced using a three-staged process. 
+First, a raw text is split into tokens with a lexer. Lexer has a
+peculiar signature: it is an `Fn(&str) -> Token`, where token is a
+pair of `SyntaxKind` (you should have read the `tree` module and RFC
+by this time! :)) and a len. That is, lexer chomps only the first
+token of the input. This forces the lexer to be stateless, and makes
+it possible to implement incremental relexing easily.
+Then, the bulk of work, the parser turns a stream of tokens into
+stream of events. Not that parser **does not** construct a tree right
+away. This is done for several reasons:
+* to decouple the actual tree data structure from the parser: you can
+  build any datastructre you want from the stream of events
+  
+* to make parsing fast: you can produce a list of events without
+  allocations
+  
+* to make it easy to tweak tree structure. Consider this code:
+  ```
+  #[cfg(test)]
+  pub fn foo() {}
+  ```
+  
+  Here, the attribute and the `pub` keyword must be the children of
+  the `fn` node. However, when parsing them, we don't yet know if
+  there would be a function ahead: it very well might be a `struct`
+  there. If we use events, we generally don't care about this *in
+  parser* and just spit them in order.
+  
+* (Is this true?)  to make incremental reparsing easier: you can reuse
+  the same rope data structure for all of the original string, the
+  tokens and the events.
+  
+The parser also does not know about whitespace tokens: it's the job of
+the next layer to assign whitespace and comments to nodes. However,
+parser can remap contextual tokens, like `>>` or `union`, so it has
+access to the text.
+And at last, the TreeBuilder converts a flat stream of events into a
+tree structure. It also *should* be responsible for attaching comments
+and rebalancing the tree, but it does not do this yet :) 
+## Error reporing
+TODO: describe how stuff like `skip_to_first` works
+## Validator
+Parser and lexer accept a lot of *invalid* code intentionally. The
+idea is to post-process the tree and to proper error reporting,
+literal conversion and quick-fix suggestions. There is no
+design/implementation for this yet.
+## AST
+Nothing yet, see `AstNode` in `fall`.

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 5b50f8faa..a1fa246c2 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md
@@ -1 +1,96 @@
1	# Design and open questions about libsyntax.	1	# Design and open questions about libsyntax
		2
		3
		4	The high-level description of the architecture is in RFC.md. You might
		5	also want to dig through https://github.com/matklad/fall/ which
		6	contains some pretty interesting stuff build using similar ideas
		7	(warning: it is completely undocumented, poorly written and in general
		8	not the thing which I recommend to study (yes, this is
		9	self-contradictory)).
		10
		11	## Tree
		12
		13	The centerpiece of this whole endeavor is the syntax tree, in the
		14	`tree` module. Open questions:
		15
		16	- how to best represent errors, to take advantage of the fact that
		17	they are rare, but to enable fully-persistent style structure
		18	sharing between tree nodes?
		19
		20	- should we make red/green split from Roslyn more pronounced?
		21
		22	- one can layout nodes in a single array in such a way that children
		23	of the node form a continuous slice. Seems nifty, but do we need it?
		24
		25	- should we use SoA or AoS for NodeData?
		26
		27	- should we split leaf nodes and internal nodes into separate arrays?
		28	Can we use it to save some bits here and there? (leaves don't need
		29	first_child field, for example).
		30
		31
		32	## Parser
		33
		34	The syntax tree is produced using a three-staged process.
		35
		36	First, a raw text is split into tokens with a lexer. Lexer has a
		37	peculiar signature: it is an `Fn(&str) -> Token`, where token is a
		38	pair of `SyntaxKind` (you should have read the `tree` module and RFC
		39	by this time! :)) and a len. That is, lexer chomps only the first
		40	token of the input. This forces the lexer to be stateless, and makes
		41	it possible to implement incremental relexing easily.
		42
		43	Then, the bulk of work, the parser turns a stream of tokens into
		44	stream of events. Not that parser does not construct a tree right
		45	away. This is done for several reasons:
		46
		47	* to decouple the actual tree data structure from the parser: you can
		48	build any datastructre you want from the stream of events
		49
		50	* to make parsing fast: you can produce a list of events without
		51	allocations
		52
		53	* to make it easy to tweak tree structure. Consider this code:
		54
		55	```
		56	#[cfg(test)]
		57	pub fn foo() {}
		58	```
		59
		60	Here, the attribute and the `pub` keyword must be the children of
		61	the `fn` node. However, when parsing them, we don't yet know if
		62	there would be a function ahead: it very well might be a `struct`
		63	there. If we use events, we generally don't care about this *in
		64	parser* and just spit them in order.
		65
		66	* (Is this true?) to make incremental reparsing easier: you can reuse
		67	the same rope data structure for all of the original string, the
		68	tokens and the events.
		69
		70
		71	The parser also does not know about whitespace tokens: it's the job of
		72	the next layer to assign whitespace and comments to nodes. However,
		73	parser can remap contextual tokens, like `>>` or `union`, so it has
		74	access to the text.
		75
		76	And at last, the TreeBuilder converts a flat stream of events into a
		77	tree structure. It also should be responsible for attaching comments
		78	and rebalancing the tree, but it does not do this yet :)
		79
		80
		81	## Error reporing
		82
		83	TODO: describe how stuff like `skip_to_first` works
		84
		85
		86	## Validator
		87
		88	Parser and lexer accept a lot of invalid code intentionally. The
		89	idea is to post-process the tree and to proper error reporting,
		90	literal conversion and quick-fix suggestions. There is no
		91	design/implementation for this yet.
		92
		93
		94	## AST
		95
		96	Nothing yet, see `AstNode` in `fall`.