diff --git a/docs/dev/ARCHITECTURE.md b/docs/dev/ARCHITECTURE.md
new file mode 100644
index 000000000..57f76ebae
--- /dev/null
+++ b/docs/dev/ARCHITECTURE.md
@@ -0,0 +1,200 @@
# Architecture

This document describes the high-level architecture of rust-analyzer.
If you want to familiarize yourself with the code base, you are in just
the right place!

See also the [guide](./guide.md), which walks through a particular snapshot of
the rust-analyzer code base.

For syntax trees specifically, there's a [video
walkthrough](https://youtu.be/DGAuLWdCCAI) as well.

## The Big Picture

![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png)

On the highest level, rust-analyzer is a thing which accepts input source code
from the client and produces a structured semantic model of the code.

More specifically, input data consists of a set of text files (`(PathBuf,
String)` pairs) and information about project structure, captured in the so-called
`CrateGraph`. The crate graph specifies which files are crate roots, which cfg
flags are specified for each crate (TODO: actually implement this) and what
dependencies exist between the crates. The analyzer keeps all this input data in
memory and never does any IO. Because the input data is source code, which
typically measures in tens of megabytes at most, keeping all input data in
memory is OK.
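
To make the shape of this input concrete, here is a minimal sketch of how the files and the crate graph could be modelled. All type and method names below (`FileId`, `CrateId`, `CrateData`, `AnalysisInput`, `add_crate_root`, `add_dep`) are assumptions made for illustration; the actual definitions in the code base may differ.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

// Hypothetical identifiers, used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CrateId(pub u32);

/// Per-crate facts: the root file, cfg flags and direct dependencies.
#[derive(Debug)]
pub struct CrateData {
    pub root_file: FileId,
    pub cfg_flags: HashSet<String>,
    pub dependencies: Vec<CrateId>,
}

#[derive(Debug, Default)]
pub struct CrateGraph {
    crates: HashMap<CrateId, CrateData>,
}

impl CrateGraph {
    /// Registers a new crate whose root module lives in `root_file`.
    pub fn add_crate_root(&mut self, root_file: FileId) -> CrateId {
        let id = CrateId(self.crates.len() as u32);
        let data = CrateData {
            root_file,
            cfg_flags: HashSet::new(),
            dependencies: Vec::new(),
        };
        self.crates.insert(id, data);
        id
    }

    /// Records that crate `from` depends on crate `to`.
    pub fn add_dep(&mut self, from: CrateId, to: CrateId) {
        if let Some(data) = self.crates.get_mut(&from) {
            data.dependencies.push(to);
        }
    }

    pub fn crate_root(&self, krate: CrateId) -> Option<FileId> {
        self.crates.get(&krate).map(|data| data.root_file)
    }

    pub fn dependencies(&self, krate: CrateId) -> &[CrateId] {
        self.crates
            .get(&krate)
            .map(|data| data.dependencies.as_slice())
            .unwrap_or(&[])
    }
}

/// Everything the analyzer knows: file contents plus project structure,
/// all held in memory.
#[derive(Debug, Default)]
pub struct AnalysisInput {
    pub files: HashMap<FileId, (PathBuf, String)>,
    pub crate_graph: CrateGraph,
}
```

Nothing in this sketch touches the file system: the client is expected to supply both the file contents and the crate graph.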

A "structured semantic model" is basically an object-oriented representation of
modules, functions and types which appear in the source code. This representation
is fully "resolved": all expressions have types, all references are bound to
declarations, etc.

The client can submit a small delta of input data (typically, a change to a
single file) and get a fresh code model which accounts for the changes.

The underlying engine makes sure that the model is computed lazily (on-demand) and
can be quickly updated for small modifications.


## Code generation

Some of the components of this repository are generated through automatic
processes. These are outlined below:

- `gen-syntax`: The kinds of tokens are reused in several places, so a generator
  is used. We use [tera] templates to generate the files listed below, based on
  the grammar described in [grammar.ron]:
  - [ast/generated.rs][ast generated] in `ra_syntax` based on
    [ast/generated.rs.tera][ast source]
  - [syntax_kinds/generated.rs][syntax_kinds generated] in `ra_syntax` based on
    [syntax_kinds/generated.rs.tera][syntax_kinds source]

[tera]: https://tera.netlify.com/
[grammar.ron]: ./crates/ra_syntax/src/grammar.ron
[ast generated]: ./crates/ra_syntax/src/ast/generated.rs
[ast source]: ./crates/ra_syntax/src/ast/generated.rs.tera
[syntax_kinds generated]: ./crates/ra_syntax/src/syntax_kinds/generated.rs
[syntax_kinds source]: ./crates/ra_syntax/src/syntax_kinds/generated.rs.tera


## Code Walk-Through

### `crates/ra_syntax`

Rust syntax tree structure and parser. See the
[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes.

- The [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees.
- `grammar`: the actual parser. It is a hand-written recursive descent parser, which
  produces a sequence of events like "start node X", "finish node Y". It works similarly to [Kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
  which is a good source of inspiration for dealing with syntax errors and incomplete input. The original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
  is what we use for the definition of the Rust language.
- `parser_api/parser_impl`: bridges the tree-agnostic parser from `grammar` with `rowan` trees.
  This is the thing that turns a flat list of events into a tree (see `EventProcessor`).
- `ast`: provides a type-safe API on top of the raw `rowan` tree.
- `grammar.ron`: a RON description of the grammar, which is used to
  generate the `syntax_kinds` and `ast` modules, using the `cargo gen-syntax` command.
- `algo`: generic tree algorithms, including `walk` for O(1) stack
  space tree traversal (this is cool) and `visit` for type-driven
  visiting of the nodes (this is double plus cool: if you understand how
  `Visitor` works, you understand the design of syntax trees).
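
The split between a tree-agnostic parser and a tree builder can be illustrated with a small sketch. It assumes a simplified event type with three variants and a plain owned tree; `Event`, `Tree` and `build_tree` are hypothetical names and are not the `rowan` or `EventProcessor` API.

```rust
/// Parser output: a flat list of events, independent of any tree type.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// "start node X"
    Start(&'static str),
    /// A leaf token with its text.
    Token(&'static str),
    /// "finish node": closes the most recently started node.
    Finish,
}

/// A simple owned tree, standing in for the real syntax tree.
#[derive(Debug, PartialEq)]
pub enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(&'static str),
}

/// Folds the flat event list into a tree using a stack of open nodes.
/// Returns `None` if the events are unbalanced or do not form one root.
pub fn build_tree(events: &[Event]) -> Option<Tree> {
    let mut stack: Vec<(&'static str, Vec<Tree>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            Event::Start(kind) => stack.push((*kind, Vec::new())),
            // A token outside of any node is an error.
            Event::Token(text) => stack.last_mut()?.1.push(Tree::Token(*text)),
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    // Attach the finished node to its parent.
                    Some(parent) => parent.1.push(node),
                    // No parent left: this is the root (only one allowed).
                    None if root.is_none() => root = Some(node),
                    None => return None,
                }
            }
        }
    }
    if stack.is_empty() { root } else { None }
}
```

Because the parser only emits events, the same grammar code could in principle drive a different tree representation by swapping out the function that consumes them.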

Tests for `ra_syntax` are mostly data-driven: `tests/data/parser` contains a bunch of `.rs`
files (test vectors) and `.txt` files with the corresponding syntax trees. During testing, we check
`.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update
tests). Additionally, running `cargo gen-tests` will walk the grammar module and collect
all `//test test_name` comments into files inside the `tests/data` directory.
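
The "check against `.txt`, create it if missing" behaviour described above can be sketched roughly as follows. This is not the project's test harness: `check_expectation`, `Outcome` and the `dump` callback (a stand-in for "parse the source and print its syntax tree") are hypothetical names introduced for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The `.txt` file did not exist and was written from the actual output.
    Created,
    /// The `.txt` file matched the actual output.
    Matched,
    /// The `.txt` file differs; carries the expected text for reporting.
    Mismatch { expected: String },
}

/// Compares `dump(source)` for the given `.rs` file with the sibling `.txt` file.
pub fn check_expectation(
    rs_path: &Path,
    dump: impl Fn(&str) -> String,
) -> io::Result<Outcome> {
    let source = fs::read_to_string(rs_path)?;
    let actual = dump(&source);
    let txt_path = rs_path.with_extension("txt");
    if !txt_path.exists() {
        // Missing expectation: record the current output as the new baseline.
        fs::write(&txt_path, &actual)?;
        return Ok(Outcome::Created);
    }
    let expected = fs::read_to_string(&txt_path)?;
    if expected == actual {
        Ok(Outcome::Matched)
    } else {
        Ok(Outcome::Mismatch { expected })
    }
}
```

Under this scheme, updating a test after an intentional parser change amounts to deleting the stale `.txt` file and re-running the tests.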

See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which
fixes a bug in the grammar.

### `crates/ra_db`

We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and
on-demand computation. Roughly, you can think of salsa as a key-value store, but
it can also compute derived values using specified functions. The `ra_db` crate
provides basic infrastructure for interacting with salsa. Crucially, it
defines most of the "input" queries: facts supplied by the client of the
analyzer. Reading the docs of the `ra_db::input` module should be useful:
everything else is strictly derived from those inputs.
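
The "key-value store with derived values" idea can be sketched without salsa itself. The toy `Database` below is an assumption-laden illustration, not salsa's API: it has one input query (`set_file_text`) and one derived query (`line_count`), and it uses revision numbers to decide whether a cached derived value is still valid.

```rust
use std::collections::HashMap;

/// A toy incremental database; all names here are hypothetical.
#[derive(Default)]
pub struct Database {
    /// Bumped on every input change.
    revision: u64,
    /// Input query: file id -> (revision of last change, text).
    file_text: HashMap<u32, (u64, String)>,
    /// Derived query cache: file id -> (revision computed at, line count).
    line_count_cache: HashMap<u32, (u64, usize)>,
    /// Counts how often the derived value was actually recomputed.
    pub recomputations: usize,
}

impl Database {
    /// Sets an input: a fact supplied by the client.
    pub fn set_file_text(&mut self, file: u32, text: &str) {
        self.revision += 1;
        self.file_text.insert(file, (self.revision, text.to_string()));
    }

    /// A derived value: computed on demand and memoized until its input changes.
    pub fn line_count(&mut self, file: u32) -> usize {
        let (changed_at, text) = match self.file_text.get(&file) {
            Some(entry) => entry,
            None => return 0,
        };
        if let Some(&(computed_at, value)) = self.line_count_cache.get(&file) {
            // The cached value is valid if it was computed after the last change.
            if computed_at >= *changed_at {
                return value;
            }
        }
        let value = text.lines().count();
        self.recomputations += 1;
        self.line_count_cache.insert(file, (self.revision, value));
        value
    }
}
```

In this sketch, changing one file leaves the cached values of other files untouched, which is the property that makes small edits cheap. Salsa generalizes this to derived queries that depend on other derived queries.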

### `crates/ra_hir`

HIR provides high-level "object-oriented" access to Rust code.

The principal difference between HIR and syntax trees is that HIR is bound to a
particular crate instance. That is, it has cfg flags and features applied (in
theory; in practice, this is yet to be implemented). So, the relation between
syntax and HIR is many-to-one. The `source_binder` module is responsible for
guessing a HIR for a particular source position.

Underneath, HIR works on top of salsa, using a `HirDatabase` trait.

### `crates/ra_ide_api`

A stateful library for analyzing many Rust files as they change. `AnalysisHost`
is a mutable entity (like Clojure's atom) which holds the current state, incorporates
changes and hands out `Analysis`: an immutable and consistent snapshot of
the world state at a point in time, which actually powers analysis.

One interesting aspect of analysis is its support for cancellation. When a
change is applied to `AnalysisHost`, first all currently active snapshots are
canceled. Only after all snapshots are dropped does the change actually affect the
database.
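
The host/snapshot relationship and cancellation can be sketched as follows. The types reuse the names `AnalysisHost` and `Analysis` from the text, but their fields and methods (`analysis`, `apply_change`, `file_len`, `Canceled`) are assumptions for illustration. Note one deliberate simplification: instead of waiting for old snapshots to be dropped, this sketch copies the state on write via `Arc::make_mut`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Error returned by a snapshot that has been superseded by a change.
#[derive(Debug, PartialEq)]
pub struct Canceled;

/// The mutable entry point that owns the current state.
pub struct AnalysisHost {
    files: Arc<HashMap<u32, String>>,
    canceled: Arc<AtomicBool>,
}

/// An immutable snapshot of the state at a point in time.
pub struct Analysis {
    files: Arc<HashMap<u32, String>>,
    canceled: Arc<AtomicBool>,
}

impl AnalysisHost {
    pub fn new() -> Self {
        AnalysisHost {
            files: Arc::new(HashMap::new()),
            canceled: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Hands out a cheap snapshot sharing the current state.
    pub fn analysis(&self) -> Analysis {
        Analysis {
            files: Arc::clone(&self.files),
            canceled: Arc::clone(&self.canceled),
        }
    }

    pub fn apply_change(&mut self, file: u32, text: &str) {
        // First, tell all outstanding snapshots to stop working.
        self.canceled.store(true, Ordering::SeqCst);
        // Future snapshots get a fresh, unset flag.
        self.canceled = Arc::new(AtomicBool::new(false));
        // Simplification: clone the map if old snapshots still hold it.
        Arc::make_mut(&mut self.files).insert(file, text.to_string());
    }
}

impl Analysis {
    /// A stand-in for a real query; checks for cancellation before doing work.
    pub fn file_len(&self, file: u32) -> Result<Option<usize>, Canceled> {
        if self.canceled.load(Ordering::SeqCst) {
            return Err(Canceled);
        }
        Ok(self.files.get(&file).map(|text| text.len()))
    }
}
```

A long-running query would check the flag periodically rather than once, so that a stale computation stops soon after a change arrives.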

APIs in this crate are IDE-centric: they take text offsets as input and produce
offsets and strings as output. This works on top of the rich code model powered by
`hir`.

### `crates/ra_ide_api_light`

All IDE features which can be implemented if you only have access to a single
file. `ra_ide_api_light` could be used to enhance editing of Rust code without
the need to fiddle with build systems, file synchronization and such.

In a sense, `ra_ide_api_light` is just a bunch of pure functions which take a
syntax tree as input.

The tests for `ra_ide_api_light` are `#[cfg(test)] mod tests` unit tests spread
throughout its modules.


### `crates/ra_lsp_server`

An LSP implementation which exposes `ra_ide_api` over the Language Server Protocol.

### `crates/ra_vfs`

Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read
files from disk at the end of the day. This is what `ra_vfs` does. It also
manages overlays: "dirty" files in the editor, whose "true" contents are
different from the data on disk.
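
The overlay idea can be sketched with two maps, where an editor buffer shadows the on-disk contents for as long as it exists. The `Vfs` type and its methods below are hypothetical and do not reflect the `ra_vfs` API; in particular, the `on_disk` map stands in for real disk reads so that the example stays self-contained.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// A toy virtual file system with editor overlays.
#[derive(Default)]
pub struct Vfs {
    /// Contents as last read from disk (simulated here).
    on_disk: HashMap<PathBuf, String>,
    /// Unsaved editor buffers, which take precedence over `on_disk`.
    overlays: HashMap<PathBuf, String>,
}

impl Vfs {
    /// Records what a disk read returned for `path`.
    pub fn load_from_disk(&mut self, path: &Path, text: &str) {
        self.on_disk.insert(path.to_path_buf(), text.to_string());
    }

    /// The editor opened or modified `path` without saving it.
    pub fn add_overlay(&mut self, path: &Path, text: &str) {
        self.overlays.insert(path.to_path_buf(), text.to_string());
    }

    /// The editor closed the buffer; fall back to the on-disk contents.
    pub fn remove_overlay(&mut self, path: &Path) {
        self.overlays.remove(path);
    }

    /// The contents the analyzer should see: overlay first, then disk.
    pub fn contents(&self, path: &Path) -> Option<&str> {
        self.overlays
            .get(path)
            .or_else(|| self.on_disk.get(path))
            .map(String::as_str)
    }
}
```

With this layering, the analyzer always sees what the user sees in the editor, even when the file on disk is older.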

### `crates/gen_lsp_server`

A language server scaffold, exposing a synchronous crossbeam-channel based API.
This crate handles protocol handshaking and parsing messages, while you
control the message dispatch loop yourself.
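
To show what "you control the dispatch loop" can look like, here is a rough sketch of a caller-owned loop over a pair of channels. It is not the `gen_lsp_server` API: the `Message` enum and `main_loop` function are hypothetical, and the example uses `std::sync::mpsc` instead of crossbeam-channel so that it needs no external crates.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// A simplified message type, standing in for parsed protocol messages.
#[derive(Debug, PartialEq)]
pub enum Message {
    Request { id: u64, method: String },
    Response { id: u64, result: String },
    Notification { method: String },
}

/// The loop owned by the server author: the scaffold would feed `incoming`
/// with parsed messages and write everything sent to `outgoing` to the client.
pub fn main_loop(incoming: Receiver<Message>, outgoing: Sender<Message>) {
    for msg in incoming {
        match msg {
            Message::Request { id, method } => {
                // A real server would dispatch on `method` here.
                let result = format!("handled {}", method);
                if outgoing.send(Message::Response { id, result }).is_err() {
                    break; // The client side has gone away.
                }
            }
            // Stop the loop on an exit notification.
            Message::Notification { method } if method == "exit" => break,
            // Other notifications and stray responses are ignored in this sketch.
            _ => {}
        }
    }
}
```

Keeping the loop in user code means that scheduling decisions, such as which requests to run on worker threads, stay with the server rather than the scaffold.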

Run with `RUST_LOG=gen_lsp_server=debug` to see all the messages.

### `crates/ra_cli`

A command-line interface to rust-analyzer.

### `crates/tools`

Custom Cargo tasks used to develop rust-analyzer:

- `cargo gen-syntax` -- generate `ast` and `syntax_kinds`
- `cargo gen-tests` -- collect inline tests from grammar
- `cargo install-code` -- build and install VS Code extension and server

### `editors/code`

The VS Code plugin.


## Common workflows

To try out the VS Code extension, run `cargo install-code`. This installs both the
`ra_lsp_server` binary and the VS Code extension. To install only the binary, use
`cargo install-lsp` (shorthand for `cargo install --path crates/ra_lsp_server --force`).

To see logs from the language server, set the `RUST_LOG=info` env variable. To see
all communication between the server and the client, use
`RUST_LOG=gen_lsp_server=debug` (this will print quite a bit of stuff).

There's a `rust-analyzer: status` command which prints common high-level debug
info. In particular, it prints info about memory usage of various data
structures, and, if compiled with jemalloc support (`cargo jinstall-lsp` or
`cargo install --path crates/ra_lsp_server --force --features jemalloc`), includes
statistics about the heap.

To run tests, just `cargo test`.

To work on the VS Code extension, open VS Code inside `editors/code` and press `F5` to
launch/debug. To automatically apply formatter and linter suggestions, use `npm
run fix`.