Diffstat (limited to 'docs/dev/architecture.md')
-rw-r--r-- | docs/dev/architecture.md | 174 |
1 files changed, 174 insertions, 0 deletions
diff --git a/docs/dev/architecture.md b/docs/dev/architecture.md
new file mode 100644
index 000000000..3cd63bf73
--- /dev/null
+++ b/docs/dev/architecture.md
@@ -0,0 +1,174 @@
1 | # Architecture | ||
2 | |||
3 | This document describes the high-level architecture of rust-analyzer. | ||
4 | If you want to familiarize yourself with the code base, you are in just | ||
5 | the right place! | ||
6 | |||
7 | See also the [guide](./guide.md), which walks through a particular snapshot of | ||
8 | the rust-analyzer code base. | ||
9 | |||
10 | Yet another resource is this playlist with videos about various parts of the | ||
11 | analyzer: | ||
12 | |||
13 | https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE | ||
14 | |||
15 | ## The Big Picture | ||
16 | |||
17 | ![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png) | ||
18 | |||
19 | On the highest level, rust-analyzer is a thing which accepts input source code | ||
20 | from the client and produces a structured semantic model of the code. | ||
21 | |||
22 | More specifically, input data consists of a set of text files (`(PathBuf, | ||
23 | String)` pairs) and information about project structure, captured in the so-called | ||
24 | `CrateGraph`. The crate graph specifies which files are crate roots, which cfg | ||
25 | flags are specified for each crate (TODO: actually implement this) and what | ||
26 | dependencies exist between the crates. The analyzer keeps all this input data in | ||
27 | memory and never does any IO. Because the input data is source code, which | ||
28 | typically measures in tens of megabytes at most, keeping all input data in | ||
29 | memory is OK. | ||
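
The shape of this input data can be sketched roughly as follows. This is only an illustration: `CrateGraph` and the `(PathBuf, String)` pairs come from the description above, while `AnalysisInput`, `CrateData`, `add_crate`, `add_dependency` and `sample_input` are hypothetical names, not the actual rust-analyzer API.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for the crate graph: one entry per crate.
#[derive(Default)]
struct CrateGraph {
    crates: Vec<CrateData>,
}

/// Hypothetical per-crate data: the root file, cfg flags and dependencies.
struct CrateData {
    root_file: PathBuf,
    cfg_flags: Vec<String>,
    /// Indices into `CrateGraph::crates`.
    dependencies: Vec<usize>,
}

impl CrateGraph {
    /// Registers a crate and returns its index in the graph.
    fn add_crate(&mut self, root_file: PathBuf, cfg_flags: Vec<String>) -> usize {
        self.crates.push(CrateData { root_file, cfg_flags, dependencies: Vec::new() });
        self.crates.len() - 1
    }

    /// Records that crate `from` depends on crate `to`.
    fn add_dependency(&mut self, from: usize, to: usize) {
        self.crates[from].dependencies.push(to);
    }
}

/// All input lives in memory: file texts plus the project structure.
struct AnalysisInput {
    files: HashMap<PathBuf, String>,
    crate_graph: CrateGraph,
}

/// Builds a tiny two-crate input where `app` depends on `lib`.
fn sample_input() -> AnalysisInput {
    let mut files = HashMap::new();
    files.insert(PathBuf::from("/lib/src/lib.rs"), String::from("pub fn helper() {}"));
    files.insert(PathBuf::from("/app/src/main.rs"), String::from("fn main() { lib::helper() }"));

    let mut crate_graph = CrateGraph::default();
    let lib = crate_graph.add_crate(PathBuf::from("/lib/src/lib.rs"), Vec::new());
    let app = crate_graph.add_crate(PathBuf::from("/app/src/main.rs"), vec![String::from("test")]);
    crate_graph.add_dependency(app, lib);

    AnalysisInput { files, crate_graph }
}
```

Nothing in this sketch touches the file system, which mirrors the "never does any IO" property described above.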
30 | |||
31 | A "structured semantic model" is basically an object-oriented representation of | ||
32 | modules, functions and types which appear in the source code. This representation | ||
33 | is fully "resolved": all expressions have types, all references are bound to | ||
34 | declarations, etc. | ||
35 | |||
36 | The client can submit a small delta of input data (typically, a change to a | ||
37 | single file) and get a fresh code model which accounts for changes. | ||
38 | |||
39 | The underlying engine makes sure that the model is computed lazily (on-demand) and | ||
40 | can be quickly updated for small modifications. | ||
41 | |||
42 | |||
43 | ## Code generation | ||
44 | |||
45 | Some of the components of this repository are generated through automatic | ||
46 | processes. These are outlined below: | ||
47 | |||
48 | - `gen-syntax`: The kinds of tokens are reused in several places, so a generator | ||
49 | is used. We use [tera] templates to generate the files listed below, based on | ||
50 | the grammar described in [grammar.ron]: | ||
51 | - [ast/generated.rs][ast generated] in `ra_syntax` based on | ||
52 | [ast/generated.rs.tera][ast source] | ||
53 | - [syntax_kinds/generated.rs][syntax_kinds generated] in `ra_syntax` based on | ||
54 | [syntax_kinds/generated.rs.tera][syntax_kinds source] | ||
55 | |||
56 | [tera]: https://tera.netlify.com/ | ||
57 | [grammar.ron]: ./crates/ra_syntax/src/grammar.ron | ||
58 | [ast generated]: ./crates/ra_syntax/src/ast/generated.rs | ||
59 | [ast source]: ./crates/ra_syntax/src/ast/generated.rs.tera | ||
60 | [syntax_kinds generated]: ./crates/ra_syntax/src/syntax_kinds/generated.rs | ||
61 | [syntax_kinds source]: ./crates/ra_syntax/src/syntax_kinds/generated.rs.tera | ||
62 | |||
63 | |||
64 | ## Code Walk-Through | ||
65 | |||
66 | ### `crates/ra_syntax`, `crates/ra_parser` | ||
67 | |||
68 | Rust syntax tree structure and parser. See | ||
69 | [RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes. | ||
70 | |||
71 | - [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees. | ||
72 | - `grammar` module is the actual parser. It is a hand-written recursive descent parser, which | ||
73 | produces a sequence of events like "start node X", "finish node Y". It works similarly to [Kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java), | ||
74 | which is a good source of inspiration for dealing with syntax errors and incomplete input. Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs) | ||
75 | is what we use for the definition of the Rust language. | ||
76 | - `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees. | ||
77 | This is the thing that turns a flat list of events into a tree (see `EventProcessor`). | ||
78 | - `ast` provides a type safe API on top of the raw `rowan` tree. | ||
79 | - `grammar.ron`: RON description of the grammar, which is used to | ||
80 | generate the `syntax_kinds` and `ast` modules, using the `cargo gen-syntax` command. | ||
81 | - `algo`: generic tree algorithms, including `walk` for O(1) stack | ||
82 | space tree traversal (this is cool) and `visit` for type-driven | ||
83 | visiting of the nodes (this is double plus cool: if you understand how | ||
84 | `Visitor` works, you understand the design of syntax trees). | ||
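
The split between a tree-agnostic parser that emits events and a separate step that assembles the tree can be illustrated with a small sketch. It is a simplification under stated assumptions: the `Event` and `Tree` types and the `build_tree` function are hypothetical, the finish event carries no node kind, and the real code builds `rowan` trees rather than this toy enum.

```rust
/// Hypothetical flat events, as a tree-agnostic parser might emit them.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Start(&'static str),
    Token(&'static str),
    Finish,
}

/// Hypothetical toy tree; the real implementation uses `rowan` instead.
#[derive(Debug, PartialEq)]
enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(&'static str),
}

/// Turns a flat list of events into a tree.
/// Returns `None` if the events are unbalanced or a token appears outside any node.
/// If the events describe several top-level nodes, the last one is returned.
fn build_tree(events: &[Event]) -> Option<Tree> {
    // Nodes that have been started but not yet finished, innermost last.
    let mut stack: Vec<(&'static str, Vec<Tree>)> = Vec::new();
    let mut root = None;

    for &event in events {
        match event {
            Event::Start(kind) => stack.push((kind, Vec::new())),
            Event::Token(text) => stack.last_mut()?.1.push(Tree::Token(text)),
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    // Attach the finished node to its parent.
                    Some(parent) => parent.1.push(node),
                    // No parent left: this is a top-level node.
                    None => root = Some(node),
                }
            }
        }
    }

    if stack.is_empty() { root } else { None }
}
```

Because the events are a flat list, the parser itself never needs to know which tree representation is built from them.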
85 | |||
86 | Tests for `ra_syntax` are mostly data-driven: `tests/data/parser` contains a bunch of `.rs` | ||
87 | (test vectors) and `.txt` files with corresponding syntax trees. During testing, we check | ||
88 | `.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update | ||
89 | tests). Additionally, running `cargo gen-tests` will walk the grammar module and collect | ||
90 | all `//test test_name` comments into files inside the `tests/data` directory. | ||
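
The "check against `.txt`, create it if missing" workflow can be sketched like this. The function name `check_expectation` and the `dump` parameter are hypothetical; the actual test harness in the repository may be organized differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Compares the dump of an `.rs` file against the sibling `.txt` file.
/// If the `.txt` file is missing, it is created from the current output.
/// Returns `Ok(true)` when the expectation matched or was freshly created.
fn check_expectation(rs_path: &Path, dump: impl Fn(&str) -> String) -> io::Result<bool> {
    let source = fs::read_to_string(rs_path)?;
    let actual = dump(&source);

    let txt_path = rs_path.with_extension("txt");
    if !txt_path.exists() {
        // Missing expectation: record the current output as the new baseline.
        fs::write(&txt_path, &actual)?;
        return Ok(true);
    }

    let expected = fs::read_to_string(&txt_path)?;
    Ok(expected == actual)
}
```

With this shape, updating a test amounts to deleting its `.txt` file and re-running the tests, which matches the workflow described above.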
91 | |||
92 | See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which | ||
93 | fixes a bug in the grammar. | ||
94 | |||
95 | ### `crates/ra_db` | ||
96 | |||
97 | We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and | ||
98 | on-demand computation. Roughly, you can think of salsa as a key-value store, but | ||
99 | it also can compute derived values using specified functions. The `ra_db` crate | ||
100 | provides basic infrastructure for interacting with salsa. Crucially, it | ||
101 | defines most of the "input" queries: facts supplied by the client of the | ||
102 | analyzer. Reading the docs of the `ra_db::input` module should be useful: | ||
103 | everything else is strictly derived from those inputs. | ||
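
The "key-value store that can also compute derived values" idea can be shown with a hand-rolled sketch. It deliberately does not use the salsa API: the `Database` struct, its fields and its methods are hypothetical, and the invalidation here is much cruder than what salsa does.

```rust
use std::collections::HashMap;

/// Hypothetical miniature database with one input and one derived query.
#[derive(Default)]
struct Database {
    /// Input query: file text supplied by the client.
    file_text: HashMap<u32, String>,
    /// Memoized results of the derived `line_count` query.
    line_count_cache: HashMap<u32, usize>,
    /// How many times the derived value was actually recomputed.
    recomputations: usize,
}

impl Database {
    /// Sets an input and invalidates the derived value that depends on it.
    fn set_file_text(&mut self, file: u32, text: &str) {
        self.file_text.insert(file, text.to_string());
        self.line_count_cache.remove(&file);
    }

    /// Derived query: computed on demand and cached until the input changes.
    fn line_count(&mut self, file: u32) -> usize {
        if let Some(&cached) = self.line_count_cache.get(&file) {
            return cached;
        }
        self.recomputations += 1;
        let count = self.file_text.get(&file).map_or(0, |text| text.lines().count());
        self.line_count_cache.insert(file, count);
        count
    }
}
```

Asking for `line_count` twice without changing the input computes it only once; changing the file text forces exactly one recomputation on the next request.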
104 | |||
105 | ### `crates/ra_hir` | ||
106 | |||
107 | HIR provides high-level "object oriented" access to Rust code. | ||
108 | |||
109 | The principal difference between HIR and syntax trees is that HIR is bound to a | ||
110 | particular crate instance. That is, it has cfg flags and features applied (in | ||
111 | theory, in practice this is to be implemented). So, the relation between | ||
112 | syntax and HIR is many-to-one. The `source_binder` module is responsible for | ||
113 | guessing a HIR for a particular source position. | ||
114 | |||
115 | Underneath, HIR works on top of salsa, using a `HirDatabase` trait. | ||
116 | |||
117 | ### `crates/ra_ide_api` | ||
118 | |||
119 | A stateful library for analyzing many Rust files as they change. `AnalysisHost` | ||
120 | is a mutable entity (Clojure's atom) which holds the current state, incorporates | ||
121 | changes and hands out `Analysis` --- an immutable and consistent snapshot of | ||
122 | the world state at a point in time, which actually powers analysis. | ||
123 | |||
124 | One interesting aspect of analysis is its support for cancellation. When a | ||
125 | change is applied to `AnalysisHost`, first all currently active snapshots are | ||
126 | canceled. Only after all snapshots are dropped does the change actually affect the | ||
127 | database. | ||
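
The host/snapshot relationship and cancellation can be sketched as follows. The type names `AnalysisHost` and `Analysis` come from the text, but every method and field here (`analysis`, `apply_change`, `file_len`, the `Canceled` error) is a hypothetical illustration. Unlike the real implementation, which waits for snapshots to be dropped, this sketch copies the state on write so that it stays single-threaded and simple.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Error returned by a snapshot that was canceled by a newer change.
#[derive(Debug, PartialEq)]
struct Canceled;

/// Immutable snapshot of the world state at a point in time.
struct Analysis {
    files: Arc<HashMap<u32, String>>,
    canceled: Arc<AtomicBool>,
}

impl Analysis {
    /// Example query: checks for cancellation before doing any work.
    fn file_len(&self, file: u32) -> Result<usize, Canceled> {
        if self.canceled.load(Ordering::SeqCst) {
            return Err(Canceled);
        }
        Ok(self.files.get(&file).map_or(0, |text| text.len()))
    }
}

/// Mutable entity holding the current state and handing out snapshots.
struct AnalysisHost {
    files: Arc<HashMap<u32, String>>,
    canceled: Arc<AtomicBool>,
}

impl AnalysisHost {
    fn new() -> Self {
        AnalysisHost {
            files: Arc::new(HashMap::new()),
            canceled: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Hands out a snapshot sharing the current state.
    fn analysis(&self) -> Analysis {
        Analysis { files: Arc::clone(&self.files), canceled: Arc::clone(&self.canceled) }
    }

    /// Cancels all outstanding snapshots, then applies the change.
    fn apply_change(&mut self, file: u32, text: &str) {
        self.canceled.store(true, Ordering::SeqCst);
        // Future snapshots get a fresh, unset cancellation flag.
        self.canceled = Arc::new(AtomicBool::new(false));
        // Assumption of this sketch: copy-on-write instead of waiting for drops.
        Arc::make_mut(&mut self.files).insert(file, text.to_string());
    }
}
```

A snapshot taken before a change keeps seeing consistent old data but reports `Canceled` on its next query, while snapshots taken afterwards see the new state.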
128 | |||
129 | APIs in this crate are IDE centric: they take text offsets as input and produce | ||
130 | offsets and strings as output. This works on top of the rich code model powered by | ||
131 | `hir`. | ||
132 | |||
133 | ### `crates/ra_ide_api_light` | ||
134 | |||
135 | All IDE features which can be implemented if you only have access to a single | ||
136 | file. `ra_ide_api_light` could be used to enhance editing of Rust code without | ||
137 | the need to fiddle with build-systems, file synchronization and such. | ||
138 | |||
139 | In a sense, `ra_ide_api_light` is just a bunch of pure functions which take a | ||
140 | syntax tree as input. | ||
141 | |||
142 | The tests for `ra_ide_api_light` are `#[cfg(test)] mod tests` unit-tests spread | ||
143 | throughout its modules. | ||
144 | |||
145 | |||
146 | ### `crates/ra_lsp_server` | ||
147 | |||
148 | An LSP implementation which wraps `ra_ide_api` into a language server protocol. | ||
149 | |||
150 | ### `ra_vfs` | ||
151 | |||
152 | Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read | ||
153 | files from disk at the end of the day. This is what `ra_vfs` does. It also | ||
154 | manages overlays: "dirty" files in the editor, whose "true" contents | ||
155 | differ from the data on disk. This is more or less the single really | ||
156 | platform-dependent component, so it lives in a separate repository and has | ||
157 | extensive cross-platform CI testing. | ||
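
The overlay idea can be illustrated with a minimal sketch. `OverlayFs` and its methods are hypothetical and are not the `ra_vfs` API; the `on_disk` map stands in for contents that would really be read from the file system.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical file store where editor overlays shadow on-disk contents.
#[derive(Default)]
struct OverlayFs {
    /// Stand-in for contents read from disk.
    on_disk: HashMap<PathBuf, String>,
    /// "Dirty" editor buffers that take precedence over the disk contents.
    overlays: HashMap<PathBuf, String>,
}

impl OverlayFs {
    /// Records what was read from disk for `path`.
    fn set_disk_contents(&mut self, path: &Path, text: &str) {
        self.on_disk.insert(path.to_path_buf(), text.to_string());
    }

    /// Called when the editor has unsaved changes for `path`.
    fn set_overlay(&mut self, path: &Path, text: &str) {
        self.overlays.insert(path.to_path_buf(), text.to_string());
    }

    /// Called when the editor closes or saves the file.
    fn remove_overlay(&mut self, path: &Path) {
        self.overlays.remove(path);
    }

    /// The "true" contents: the overlay if present, the disk contents otherwise.
    fn contents(&self, path: &Path) -> Option<&str> {
        self.overlays
            .get(path)
            .or_else(|| self.on_disk.get(path))
            .map(String::as_str)
    }
}
```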
158 | |||
159 | ### `crates/gen_lsp_server` | ||
160 | |||
161 | A language server scaffold, exposing a synchronous crossbeam-channel based API. | ||
162 | This crate handles protocol handshaking and parsing messages, while you | ||
163 | control the message dispatch loop yourself. | ||
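
A channel-based scaffold where the caller owns the dispatch loop might look roughly like this. It is a sketch under assumptions: `std::sync::mpsc` stands in for crossbeam-channel, and the `Message` enum, `main_loop` and `run_session` are hypothetical names, not the `gen_lsp_server` API.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical, heavily simplified protocol messages.
#[derive(Debug, PartialEq)]
enum Message {
    Request { id: u64, method: String },
    Response { id: u64, result: String },
    Exit,
}

/// The dispatch loop written by the user of the scaffold:
/// it receives parsed messages and decides how to answer each one.
fn main_loop(receiver: Receiver<Message>, sender: Sender<Message>) {
    for message in receiver {
        match message {
            Message::Request { id, method } => {
                let result = format!("handled {}", method);
                if sender.send(Message::Response { id, result }).is_err() {
                    break; // The client went away.
                }
            }
            // This toy server never sends requests, so responses are ignored.
            Message::Response { .. } => {}
            Message::Exit => break,
        }
    }
}

/// Drives one session: sends the given requests, then collects all replies.
fn run_session(methods: &[&str]) -> Vec<Message> {
    let (to_server, server_inbox) = channel();
    let (server_outbox, from_server) = channel();
    let server = thread::spawn(move || main_loop(server_inbox, server_outbox));

    for (id, method) in methods.iter().enumerate() {
        to_server
            .send(Message::Request { id: id as u64, method: method.to_string() })
            .unwrap();
    }
    to_server.send(Message::Exit).unwrap();
    server.join().unwrap();

    // The server's sender is dropped once it exits, so this iteration terminates.
    from_server.iter().collect()
}
```

Keeping the loop in user code makes the control flow explicit: the scaffold only moves messages across channels, and the server decides what each message means.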
164 | |||
165 | Run with `RUST_LOG=sync_lsp_server=debug` to see all the messages. | ||
166 | |||
167 | ### `crates/ra_cli` | ||
168 | |||
169 | A command-line interface to rust-analyzer. | ||
170 | |||
171 | |||
172 | ## Testing Infrastructure | ||
173 | |||
174 | |||