Diffstat (limited to 'docs/dev/arhictecture.md')
-rw-r--r-- | docs/dev/arhictecture.md | 200 |
1 files changed, 200 insertions, 0 deletions
diff --git a/docs/dev/arhictecture.md b/docs/dev/arhictecture.md
new file mode 100644
index 000000000..57f76ebae
--- /dev/null
+++ b/docs/dev/arhictecture.md
@@ -0,0 +1,200 @@ | |||
1 | # Architecture | ||
2 | |||
3 | This document describes the high-level architecture of rust-analyzer. | ||
4 | If you want to familiarize yourself with the code base, you are in | ||
5 | just the right place! | ||
6 | |||
7 | See also the [guide](./guide.md), which walks through a particular snapshot of | ||
8 | the rust-analyzer code base. | ||
9 | |||
10 | For syntax trees specifically, there's a [video | ||
11 | walkthrough](https://youtu.be/DGAuLWdCCAI) as well. | ||
12 | |||
13 | ## The Big Picture | ||
14 | |||
15 | ![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png) | ||
16 | |||
17 | On the highest level, rust-analyzer is a thing which accepts input source code | ||
18 | from the client and produces a structured semantic model of the code. | ||
19 | |||
20 | More specifically, input data consists of a set of text files (`(PathBuf, | ||
21 | String)` pairs) and information about project structure, captured in the so-called | ||
22 | `CrateGraph`. The crate graph specifies which files are crate roots, which cfg | ||
23 | flags are specified for each crate (TODO: actually implement this) and what | ||
24 | dependencies exist between the crates. The analyzer keeps all this input data in | ||
25 | memory and never does any IO. Because the input data is source code, which | ||
26 | typically measures in tens of megabytes at most, keeping all input data in | ||
27 | memory is OK. | ||
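
The shape of this input can be sketched as follows. This is a minimal illustration only: `Input`, `CrateId`, and the method names on this simplified `CrateGraph` are assumptions made for the example, not the actual rust-analyzer definitions, and cfg flags are left out.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical identifier of a crate inside the graph.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CrateId(pub u32);

/// Simplified stand-in for the crate graph: crate roots plus dependency edges.
#[derive(Debug, Default)]
pub struct CrateGraph {
    roots: HashMap<CrateId, PathBuf>,
    deps: HashMap<CrateId, Vec<CrateId>>,
}

impl CrateGraph {
    /// Register a file as a crate root and return the id of the new crate.
    pub fn add_crate_root(&mut self, root: PathBuf) -> CrateId {
        let id = CrateId(self.roots.len() as u32);
        self.roots.insert(id, root);
        id
    }

    /// Record that crate `from` depends on crate `to`.
    pub fn add_dep(&mut self, from: CrateId, to: CrateId) {
        self.deps.entry(from).or_default().push(to);
    }

    /// Direct dependencies of a crate (empty if none were recorded).
    pub fn dependencies(&self, krate: CrateId) -> &[CrateId] {
        self.deps.get(&krate).map(Vec::as_slice).unwrap_or(&[])
    }

    /// The root file of a crate, if the crate is known.
    pub fn crate_root(&self, krate: CrateId) -> Option<&PathBuf> {
        self.roots.get(&krate)
    }
}

/// All analyzer input, held entirely in memory: file texts plus project structure.
#[derive(Debug, Default)]
pub struct Input {
    pub files: HashMap<PathBuf, String>,
    pub crate_graph: CrateGraph,
}
```

Nothing here touches the file system: the client is expected to hand over file contents as strings.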
28 | |||
29 | A "structured semantic model" is basically an object-oriented representation of | ||
30 | modules, functions and types which appear in the source code. This representation | ||
31 | is fully "resolved": all expressions have types, all references are bound to | ||
32 | declarations, etc. | ||
33 | |||
34 | The client can submit a small delta of input data (typically, a change to a | ||
35 | single file) and get a fresh code model which accounts for changes. | ||
36 | |||
37 | The underlying engine makes sure that the model is computed lazily (on-demand) and | ||
38 | can be quickly updated for small modifications. | ||
39 | |||
40 | |||
41 | ## Code generation | ||
42 | |||
43 | Some of the components of this repository are generated through automatic | ||
44 | processes. These are outlined below: | ||
45 | |||
46 | - `gen-syntax`: The kinds of tokens are reused in several places, so a generator | ||
47 | is used. We use [tera] templates to generate the files listed below, based on | ||
48 | the grammar described in [grammar.ron]: | ||
49 | - [ast/generated.rs][ast generated] in `ra_syntax` based on | ||
50 | [ast/generated.rs.tera][ast source] | ||
51 | - [syntax_kinds/generated.rs][syntax_kinds generated] in `ra_syntax` based on | ||
52 | [syntax_kinds/generated.rs.tera][syntax_kinds source] | ||
53 | |||
54 | [tera]: https://tera.netlify.com/ | ||
55 | [grammar.ron]: ./crates/ra_syntax/src/grammar.ron | ||
56 | [ast generated]: ./crates/ra_syntax/src/ast/generated.rs | ||
57 | [ast source]: ./crates/ra_syntax/src/ast/generated.rs.tera | ||
58 | [syntax_kinds generated]: ./crates/ra_syntax/src/syntax_kinds/generated.rs | ||
59 | [syntax_kinds source]: ./crates/ra_syntax/src/syntax_kinds/generated.rs.tera | ||
60 | |||
61 | |||
62 | ## Code Walk-Through | ||
63 | |||
64 | ### `crates/ra_syntax` | ||
65 | |||
66 | Rust syntax tree structure and parser. See | ||
67 | [RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes. | ||
68 | |||
69 | - [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees. | ||
70 | - `grammar` module is the actual parser. It is a hand-written recursive descent parser, which | ||
71 | produces a sequence of events like "start node X", "finish node Y". It works similarly to [Kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java), | ||
72 | which is a good source of inspiration for dealing with syntax errors and incomplete input. The original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs) | ||
73 | is what we use for the definition of the Rust language. | ||
74 | - `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees. | ||
75 | This is the thing that turns a flat list of events into a tree (see `EventProcessor`). | ||
76 | - `ast` provides a type safe API on top of the raw `rowan` tree. | ||
77 | - `grammar.ron`: RON description of the grammar, which is used to | ||
78 | generate `syntax_kinds` and `ast` modules, using `cargo gen-syntax` command. | ||
79 | - `algo`: generic tree algorithms, including `walk` for O(1) stack | ||
80 | space tree traversal (this is cool) and `visit` for type-driven | ||
81 | visiting of the nodes (this is double plus cool: if you understand how | ||
82 | `Visitor` works, you understand the design of syntax trees). | ||
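
To make the "flat list of events into a tree" step concrete, here is a minimal sketch of the idea. The `Event` and `Tree` types and the `build_tree` function are hypothetical and much simpler than what `ra_syntax` and `rowan` actually use (for instance, the finish event carries no node kind here, and the kind names in the usage example are only illustrative); the sketch just shows how a stack turns well-nested start/finish events into nested nodes.

```rust
/// Hypothetical parser events, simplified for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// Start a node of the given kind.
    Start(&'static str),
    /// A leaf token with its text.
    Token(&'static str),
    /// Finish the most recently started node.
    Finish,
}

/// A plain owned tree, standing in for a real syntax tree.
#[derive(Debug, PartialEq)]
pub enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(&'static str),
}

/// Fold a flat event list into a tree using an explicit stack of open nodes.
/// Returns `None` if the events are not well nested.
pub fn build_tree(events: &[Event]) -> Option<Tree> {
    let mut stack: Vec<(&'static str, Vec<Tree>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            // Open a new node; its children are collected as events arrive.
            Event::Start(kind) => stack.push((*kind, Vec::new())),
            // A token becomes a child of the innermost open node.
            Event::Token(text) => stack.last_mut()?.1.push(Tree::Token(*text)),
            // Close the innermost node and attach it to its parent, if any.
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    Some(parent) => parent.1.push(node),
                    None => root = Some(node),
                }
            }
        }
    }
    // Any node still open means a `Finish` event was missing.
    if stack.is_empty() { root } else { None }
}
```

Because the parser only emits events, it does not need to know which tree representation is built from them; that separation is what the bridging module provides.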
83 | |||
84 | Tests for ra_syntax are mostly data-driven: `tests/data/parser` contains a bunch of `.rs` | ||
85 | (test vectors) and `.txt` files with corresponding syntax trees. During testing, we check | ||
86 | `.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update | ||
87 | tests). Additionally, running `cargo gen-tests` will walk the grammar module and collect | ||
88 | all `//test test_name` comments into files inside the `tests/data` directory. | ||
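
The check-or-create behaviour described above can be sketched roughly like this. `check_expectation` and its `dump` parameter are hypothetical names for this example; in the real tests the dumped value is a syntax tree, while here any `&str -> String` function will do.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Compare the dump of a `.rs` test vector against the sibling `.txt` file.
/// If the `.txt` file is missing, it is created from the current output.
/// Returns `Ok(true)` on a match (or when the file was just created) and
/// `Ok(false)` on a mismatch.
pub fn check_expectation(
    rs_path: &Path,
    dump: impl Fn(&str) -> String,
) -> io::Result<bool> {
    let source = fs::read_to_string(rs_path)?;
    let actual = dump(&source);
    let txt_path = rs_path.with_extension("txt");
    if !txt_path.exists() {
        // No expectation yet: record the current output as the expectation.
        fs::write(&txt_path, &actual)?;
        return Ok(true);
    }
    Ok(fs::read_to_string(&txt_path)? == actual)
}
```

Under this scheme, updating a test amounts to deleting its `.txt` file and re-running the check.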
89 | |||
90 | See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which | ||
91 | fixes a bug in the grammar. | ||
92 | |||
93 | ### `crates/ra_db` | ||
94 | |||
95 | We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and | ||
96 | on-demand computation. Roughly, you can think of salsa as a key-value store, but | ||
97 | it also can compute derived values using specified functions. The `ra_db` crate | ||
98 | provides basic infrastructure for interacting with salsa. Crucially, it | ||
99 | defines most of the "input" queries: facts supplied by the client of the | ||
100 | analyzer. Reading the docs of the `ra_db::input` module should be useful: | ||
101 | everything else is strictly derived from those inputs. | ||
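
The "key-value store that can also compute derived values" idea can be illustrated with a toy database. This is not salsa's API: `Database`, `set_file_text`, and `line_count` are names made up for this sketch, and real salsa tracks dependencies between queries automatically instead of invalidating by hand.

```rust
use std::collections::HashMap;

/// Toy incremental store: one input (file text) and one derived value
/// (number of lines), computed on demand and cached until the input changes.
#[derive(Default)]
pub struct Database {
    /// Input facts, supplied by the client.
    texts: HashMap<String, String>,
    /// Memoized derived values.
    line_counts: HashMap<String, usize>,
    /// How many times a derived value was actually recomputed.
    pub computations: usize,
}

impl Database {
    /// Set an input: replace one file's text and drop only its cached result.
    pub fn set_file_text(&mut self, path: &str, text: &str) {
        self.texts.insert(path.to_string(), text.to_string());
        self.line_counts.remove(path);
    }

    /// Derived query: computed lazily on first use, then served from the cache.
    pub fn line_count(&mut self, path: &str) -> Option<usize> {
        if let Some(&cached) = self.line_counts.get(path) {
            return Some(cached);
        }
        let count = self.texts.get(path)?.lines().count();
        self.computations += 1;
        self.line_counts.insert(path.to_string(), count);
        Some(count)
    }
}
```

Changing one file leaves the cached results for every other file untouched, which is the property that makes small edits cheap.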
102 | |||
103 | ### `crates/ra_hir` | ||
104 | |||
105 | HIR provides high-level "object oriented" access to Rust code. | ||
106 | |||
107 | The principal difference between HIR and syntax trees is that HIR is bound to a | ||
108 | particular crate instance. That is, it has cfg flags and features applied (in | ||
109 | theory, in practice this is to be implemented). So, the relation between | ||
110 | syntax and HIR is many-to-one. The `source_binder` module is responsible for | ||
111 | guessing a HIR for a particular source position. | ||
112 | |||
113 | Underneath, HIR works on top of salsa, using a `HirDatabase` trait. | ||
114 | |||
115 | ### `crates/ra_ide_api` | ||
116 | |||
117 | A stateful library for analyzing many Rust files as they change. `AnalysisHost` | ||
118 | is a mutable entity (like Clojure's atom) which holds the current state, incorporates | ||
119 | changes and hands out `Analysis` --- an immutable and consistent snapshot of | ||
120 | the world state at a point in time, which actually powers analysis. | ||
121 | |||
122 | One interesting aspect of analysis is its support for cancellation. When a | ||
123 | change is applied to `AnalysisHost`, first all currently active snapshots are | ||
124 | canceled. Only after all snapshots are dropped does the change actually affect the | ||
125 | database. | ||
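
The host/snapshot split and the cancellation flag can be sketched as below. `Host`, `Snapshot`, and `Cancelled` are hypothetical stand-ins for this example, not the real `AnalysisHost` and `Analysis` types. One deliberate simplification: instead of waiting for old snapshots to be dropped before mutating shared state, this sketch marks the old state as cancelled and swaps in a fresh one.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// State shared between the host and its snapshots (here: one file's text).
struct State {
    text: String,
    cancelled: AtomicBool,
}

/// Error returned by queries running against an outdated snapshot.
#[derive(Debug, PartialEq)]
pub struct Cancelled;

/// Hypothetical stand-in for `Analysis`: a cheap, immutable snapshot.
pub struct Snapshot {
    state: Arc<State>,
}

impl Snapshot {
    /// A query that checks the cancellation flag before doing any work.
    pub fn text_len(&self) -> Result<usize, Cancelled> {
        if self.state.cancelled.load(Ordering::SeqCst) {
            return Err(Cancelled);
        }
        Ok(self.state.text.len())
    }
}

/// Hypothetical stand-in for `AnalysisHost`: owns the current state.
pub struct Host {
    state: Arc<State>,
}

impl Host {
    pub fn new(text: &str) -> Host {
        Host { state: Arc::new(Self::fresh_state(text)) }
    }

    /// Hand out a snapshot of the current state.
    pub fn snapshot(&self) -> Snapshot {
        Snapshot { state: Arc::clone(&self.state) }
    }

    /// Cancel all outstanding snapshots, then install the new state.
    pub fn apply_change(&mut self, text: &str) {
        self.state.cancelled.store(true, Ordering::SeqCst);
        self.state = Arc::new(Self::fresh_state(text));
    }

    fn fresh_state(text: &str) -> State {
        State { text: text.to_string(), cancelled: AtomicBool::new(false) }
    }
}
```

A long-running query on an old snapshot returns `Err(Cancelled)` the next time it polls the flag, so the caller can drop the snapshot and retry against a fresh one.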
126 | |||
127 | APIs in this crate are IDE centric: they take text offsets as input and produce | ||
128 | offsets and strings as output. This works on top of the rich code model powered by | ||
129 | `hir`. | ||
130 | |||
131 | ### `crates/ra_ide_api_light` | ||
132 | |||
133 | All IDE features which can be implemented if you only have access to a single | ||
134 | file. `ra_ide_api_light` could be used to enhance editing of Rust code without | ||
135 | the need to fiddle with build-systems, file synchronization and such. | ||
136 | |||
137 | In a sense, `ra_ide_api_light` is just a bunch of pure functions which take a | ||
138 | syntax tree as input. | ||
139 | |||
140 | The tests for `ra_ide_api_light` are `#[cfg(test)] mod tests` unit-tests spread | ||
141 | throughout its modules. | ||
142 | |||
143 | |||
144 | ### `crates/ra_lsp_server` | ||
145 | |||
146 | An LSP implementation which wraps `ra_ide_api` into a language server. | ||
147 | |||
148 | ### `crates/ra_vfs` | ||
149 | |||
150 | Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read | ||
151 | files from disk at the end of the day. This is what `ra_vfs` does. It also | ||
152 | manages overlays: "dirty" files in the editor, whose "true" contents are | ||
153 | different from data on disk. | ||
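
The overlay idea can be sketched with two maps, where editor contents shadow disk contents. `Vfs` here is a hypothetical, simplified type with made-up method names, not the `ra_vfs` API; to keep the example self-contained, "disk" contents are supplied through a method rather than read with real IO.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Toy virtual file system: disk contents plus in-editor overlays.
#[derive(Default)]
pub struct Vfs {
    on_disk: HashMap<PathBuf, String>,
    overlays: HashMap<PathBuf, String>,
}

impl Vfs {
    /// Record what was read from disk for `path`.
    pub fn set_disk_contents(&mut self, path: &Path, text: &str) {
        self.on_disk.insert(path.to_path_buf(), text.to_string());
    }

    /// The editor opened or modified `path`: its buffer shadows the disk copy.
    pub fn add_overlay(&mut self, path: &Path, text: &str) {
        self.overlays.insert(path.to_path_buf(), text.to_string());
    }

    /// The editor closed `path`: fall back to the disk copy.
    pub fn remove_overlay(&mut self, path: &Path) {
        self.overlays.remove(path);
    }

    /// The "true" contents of a file: the overlay if present, else the disk copy.
    pub fn contents(&self, path: &Path) -> Option<&str> {
        self.overlays
            .get(path)
            .or_else(|| self.on_disk.get(path))
            .map(String::as_str)
    }
}
```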
154 | |||
155 | ### `crates/gen_lsp_server` | ||
156 | |||
157 | A language server scaffold, exposing a synchronous crossbeam-channel based API. | ||
158 | This crate handles protocol handshaking and parsing messages, while you | ||
159 | control the message dispatch loop yourself. | ||
160 | |||
161 | Run with `RUST_LOG=gen_lsp_server=debug` to see all the messages. | ||
162 | |||
163 | ### `crates/ra_cli` | ||
164 | |||
165 | A command-line interface to rust-analyzer. | ||
166 | |||
167 | ### `crates/tools` | ||
168 | |||
169 | Custom Cargo tasks used to develop rust-analyzer: | ||
170 | |||
171 | - `cargo gen-syntax` -- generate `ast` and `syntax_kinds` | ||
172 | - `cargo gen-tests` -- collect inline tests from grammar | ||
173 | - `cargo install-code` -- build and install VS Code extension and server | ||
174 | |||
175 | ### `editors/code` | ||
176 | |||
177 | VS Code plugin. | ||
178 | |||
179 | |||
180 | ## Common workflows | ||
181 | |||
182 | To try out the VS Code extension, run `cargo install-code`. This installs both the | ||
183 | `ra_lsp_server` binary and the VS Code extension. To install only the binary, use | ||
184 | `cargo install-lsp` (shorthand for `cargo install --path crates/ra_lsp_server --force`). | ||
185 | |||
186 | To see logs from the language server, set `RUST_LOG=info` env variable. To see | ||
187 | all communication between the server and the client, use | ||
188 | `RUST_LOG=gen_lsp_server=debug` (this will print quite a bit of stuff). | ||
189 | |||
190 | There's a `rust-analyzer: status` command which prints common high-level debug | ||
191 | info. In particular, it prints info about memory usage of various data | ||
192 | structures, and, if compiled with jemalloc support (`cargo jinstall-lsp` or | ||
193 | `cargo install --path crates/ra_lsp_server --force --features jemalloc`), includes | ||
194 | statistics about the heap. | ||
195 | |||
196 | To run tests, just run `cargo test`. | ||
197 | |||
198 | To work on the VS Code extension, launch VS Code inside `editors/code` and use `F5` to | ||
199 | launch/debug. To automatically apply formatter and linter suggestions, use `npm | ||
200 | run fix`. | ||