Diffstat (limited to 'docs/dev/architecture.md')
-rw-r--r-- docs/dev/architecture.md 174
1 files changed, 174 insertions, 0 deletions
diff --git a/docs/dev/architecture.md b/docs/dev/architecture.md
new file mode 100644
index 000000000..3cd63bf73
--- /dev/null
+++ b/docs/dev/architecture.md
@@ -0,0 +1,174 @@
# Architecture

This document describes the high-level architecture of rust-analyzer.
If you want to familiarize yourself with the code base, you are in just
the right place!

See also the [guide](./guide.md), which walks through a particular snapshot of
the rust-analyzer code base.

Yet another resource is this playlist with videos about various parts of the
analyzer:

https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE

## The Big Picture

![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png)

At the highest level, rust-analyzer accepts input source code
from the client and produces a structured semantic model of the code.

More specifically, input data consists of a set of text files
(`(PathBuf, String)` pairs) and information about project structure, captured in
the so-called `CrateGraph`. The crate graph specifies which files are crate roots,
which cfg flags are specified for each crate (TODO: actually implement this) and
what dependencies exist between the crates. The analyzer keeps all this input data
in memory and never does any IO. Because the input data is source code, which
typically measures in tens of megabytes at most, keeping all input data in
memory is OK.

A "structured semantic model" is basically an object-oriented representation of
modules, functions and types which appear in the source code. This representation
is fully "resolved": all expressions have types, all references are bound to
declarations, etc.

The client can submit a small delta of input data (typically, a change to a
single file) and get a fresh code model which accounts for the changes.

The underlying engine makes sure that the model is computed lazily (on-demand) and
can be quickly updated for small modifications.


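
As a rough illustration of the input side described above, here is a minimal sketch of in-memory inputs plus a crate graph. The field layout, `AnalysisInput`, `add_crate`, `add_dependency` and `apply_change` are all hypothetical names chosen for this example; only the idea (text files keyed by path, crate roots, cfg flags, dependencies, small deltas) comes from the description.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for the crate graph: roots, cfg flags, dependencies.
#[derive(Default)]
struct CrateGraph {
    /// A crate is identified by its index in this vector.
    crates: Vec<CrateData>,
}

struct CrateData {
    root: PathBuf,
    cfg_flags: Vec<String>,
    /// Indices of the crates this crate depends on.
    dependencies: Vec<usize>,
}

impl CrateGraph {
    fn add_crate(&mut self, root: &str, cfg_flags: &[&str]) -> usize {
        self.crates.push(CrateData {
            root: PathBuf::from(root),
            cfg_flags: cfg_flags.iter().map(|flag| flag.to_string()).collect(),
            dependencies: Vec::new(),
        });
        self.crates.len() - 1
    }

    fn add_dependency(&mut self, from: usize, to: usize) {
        self.crates[from].dependencies.push(to);
    }
}

/// All input lives in memory: file texts plus the project structure.
#[derive(Default)]
struct AnalysisInput {
    files: HashMap<PathBuf, String>,
    crate_graph: CrateGraph,
}

impl AnalysisInput {
    /// A small delta: replace the text of a single file.
    fn apply_change(&mut self, path: &str, new_text: &str) {
        self.files.insert(PathBuf::from(path), new_text.to_string());
    }
}

fn main() {
    let mut input = AnalysisInput::default();
    let core = input.crate_graph.add_crate("core/src/lib.rs", &[]);
    let app = input.crate_graph.add_crate("app/src/main.rs", &["test"]);
    input.crate_graph.add_dependency(app, core);

    input.apply_change("app/src/main.rs", "fn main() {}");
    input.apply_change("app/src/main.rs", "fn main() { run(); }");

    for krate in &input.crate_graph.crates {
        println!("{:?} cfg={:?} deps={:?}", krate.root, krate.cfg_flags, krate.dependencies);
    }
}
```

The sketch only stores inputs; computing the semantic model lazily from them is the job of the query engine described under `crates/ra_db`.
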
## Code generation

Some of the components of this repository are generated through automatic
processes. These are outlined below:

- `gen-syntax`: The kinds of tokens are reused in several places, so a generator
  is used. We use [tera] templates to generate the files listed below, based on
  the grammar described in [grammar.ron]:
  - [ast/generated.rs][ast generated] in `ra_syntax` based on
    [ast/generated.rs.tera][ast source]
  - [syntax_kinds/generated.rs][syntax_kinds generated] in `ra_syntax` based on
    [syntax_kinds/generated.rs.tera][syntax_kinds source]

[tera]: https://tera.netlify.com/
[grammar.ron]: ./crates/ra_syntax/src/grammar.ron
[ast generated]: ./crates/ra_syntax/src/ast/generated.rs
[ast source]: ./crates/ra_syntax/src/ast/generated.rs.tera
[syntax_kinds generated]: ./crates/ra_syntax/src/syntax_kinds/generated.rs
[syntax_kinds source]: ./crates/ra_syntax/src/syntax_kinds/generated.rs.tera


## Code Walk-Through

### `crates/ra_syntax`, `crates/ra_parser`

Rust syntax tree structure and parser. See
[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes.

- The [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees.
- The `grammar` module is the actual parser. It is a hand-written recursive descent parser, which
  produces a sequence of events like "start node X", "finish node Y". It works similarly to [Kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
  which is a good source of inspiration for dealing with syntax errors and incomplete input. The original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
  is what we use for the definition of the Rust language.
- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees.
  This is the thing that turns a flat list of events into a tree (see `EventProcessor`).
- `ast` provides a type-safe API on top of the raw `rowan` tree.
- `grammar.ron` is the RON description of the grammar, which is used to
  generate the `syntax_kinds` and `ast` modules, using the `cargo gen-syntax` command.
- `algo`: generic tree algorithms, including `walk` for O(1) stack
  space tree traversal (this is cool) and `visit` for type-driven
  visiting of the nodes (this is double plus cool: if you understand how
  `Visitor` works, you understand the design of syntax trees).

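
To make the "flat list of events into a tree" step concrete, here is a minimal sketch that assumes a simplified event shape. `Event`, `Element` and `build_tree` are hypothetical names for this example and are not the types used by `ra_syntax` or `rowan`; in particular, the finish event here does not name its node.

```rust
/// Flat parser output: the parser itself knows nothing about tree types.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(&'static str),
    Token(&'static str),
    Finish,
}

/// A very small tree representation, standing in for a real syntax tree.
#[derive(Debug, PartialEq)]
enum Element {
    Node { kind: &'static str, children: Vec<Element> },
    Token(&'static str),
}

/// Turns a well-nested event list with a single root into a tree,
/// using a stack of currently open nodes. Returns `None` if the events
/// are unbalanced or a token appears outside of any node.
fn build_tree(events: &[Event]) -> Option<Element> {
    let mut stack: Vec<(&'static str, Vec<Element>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            Event::Start(kind) => stack.push((*kind, Vec::new())),
            Event::Token(text) => stack.last_mut()?.1.push(Element::Token(*text)),
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Element::Node { kind, children };
                match stack.last_mut() {
                    Some(parent) => parent.1.push(node),
                    None => root = Some(node),
                }
            }
        }
    }
    if stack.is_empty() { root } else { None }
}

fn main() {
    let events = [
        Event::Start("FN_DEF"),
        Event::Token("fn"),
        Event::Start("NAME"),
        Event::Token("foo"),
        Event::Finish,
        Event::Finish,
    ];
    println!("{:#?}", build_tree(&events));
}
```
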
Tests for ra_syntax are mostly data-driven: `tests/data/parser` contains a bunch of `.rs`
files (test vectors) and `.txt` files with the corresponding syntax trees. During testing, we
check each `.rs` file against its `.txt` file. If the `.txt` file is missing, it is created
(this is how you update tests). Additionally, running `cargo gen-tests` will walk the grammar
module and collect all `//test test_name` comments into files inside the `tests/data` directory.

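
The check-or-create behaviour can be sketched as follows. This is an illustration under stated assumptions, not the actual test harness: `dump` is a hypothetical stand-in for "parse the file and print its syntax tree", and `check` and `Outcome` are names invented for this example.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for "parse and dump the syntax tree".
/// Here it just counts whitespace-separated tokens.
fn dump(source: &str) -> String {
    format!("tokens: {}\n", source.split_whitespace().count())
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Created,
    Matched,
    Mismatch,
}

/// Checks `<name>.rs` against `<name>.txt`, creating the `.txt` file if missing.
fn check(rs_path: &Path) -> io::Result<Outcome> {
    let actual = dump(&fs::read_to_string(rs_path)?);
    let txt_path = rs_path.with_extension("txt");
    if !txt_path.exists() {
        fs::write(&txt_path, &actual)?;
        return Ok(Outcome::Created);
    }
    let expected = fs::read_to_string(&txt_path)?;
    Ok(if expected == actual { Outcome::Matched } else { Outcome::Mismatch })
}

fn main() {
    // Work in a scratch directory so the sketch does not touch real test data.
    let dir = std::env::temp_dir().join("data_driven_sketch");
    fs::create_dir_all(&dir).unwrap();
    let rs_path = dir.join("0001_example.rs");
    fs::write(&rs_path, "fn foo() {}").unwrap();
    let _ = fs::remove_file(rs_path.with_extension("txt"));

    println!("{:?}", check(&rs_path).unwrap());
    println!("{:?}", check(&rs_path).unwrap());
}
```
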
See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which
fixes a bug in the grammar.

### `crates/ra_db`

We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and
on-demand computation. Roughly, you can think of salsa as a key-value store that
can also compute derived values using specified functions. The `ra_db` crate
provides basic infrastructure for interacting with salsa. Crucially, it
defines most of the "input" queries: facts supplied by the client of the
analyzer. Reading the docs of the `ra_db::input` module should be useful:
everything else is strictly derived from those inputs.

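
The "key-value store with derived values" idea can be sketched by hand. The code below does not use salsa and does not reflect its API; `Database`, `set_file_text` and `line_count` are hypothetical names, and the revision check is a deliberately simplified version of incremental recomputation.

```rust
use std::collections::HashMap;

/// Toy database: one input query (file text) and one derived query (line count).
#[derive(Default)]
struct Database {
    /// Input: file id -> (text, revision at which it was last set).
    file_text: HashMap<u32, (String, u64)>,
    /// Memoized derived values: file id -> (line count, input revision used).
    line_count_memo: HashMap<u32, (usize, u64)>,
    revision: u64,
    /// Counts real recomputations, only to make the caching observable.
    recomputations: usize,
}

impl Database {
    /// Setting an input bumps the global revision.
    fn set_file_text(&mut self, file: u32, text: &str) {
        self.revision += 1;
        self.file_text.insert(file, (text.to_string(), self.revision));
    }

    /// Derived query: computed on demand, reused while its input is unchanged.
    fn line_count(&mut self, file: u32) -> Option<usize> {
        let (text, changed_at) = self.file_text.get(&file)?;
        if let Some((value, computed_at)) = self.line_count_memo.get(&file) {
            if computed_at == changed_at {
                return Some(*value);
            }
        }
        let value = text.lines().count();
        let changed_at = *changed_at;
        self.recomputations += 1;
        self.line_count_memo.insert(file, (value, changed_at));
        Some(value)
    }
}

fn main() {
    let mut db = Database::default();
    db.set_file_text(1, "fn a() {}\nfn b() {}\n");
    println!("{:?}", db.line_count(1));
    println!("{:?}", db.line_count(1)); // served from the memo table
    println!("recomputations: {}", db.recomputations);
}
```

A real engine tracks dependencies between derived queries as well; this sketch only covers a derived value that depends directly on one input.
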
### `crates/ra_hir`

HIR provides high-level "object oriented" access to Rust code.

The principal difference between HIR and syntax trees is that HIR is bound to a
particular crate instance. That is, it has cfg flags and features applied (in
theory; in practice this is yet to be implemented). So, the relation between
syntax and HIR is many-to-one. The `source_binder` module is responsible for
guessing the HIR for a particular source position.

Underneath, HIR works on top of salsa, using a `HirDatabase` trait.

### `crates/ra_ide_api`

A stateful library for analyzing many Rust files as they change. `AnalysisHost`
is a mutable entity (similar to Clojure's atom) which holds the current state,
incorporates changes and hands out `Analysis`, an immutable and consistent
snapshot of the world state at a point in time, which actually powers analysis.

One interesting aspect of analysis is its support for cancellation. When a
change is applied to `AnalysisHost`, first all currently active snapshots are
canceled. Only after all snapshots are dropped does the change actually affect
the database.

APIs in this crate are IDE-centric: they take text offsets as input and produce
offsets and strings as output. This works on top of the rich code model powered by
`hir`.

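
The host/snapshot split with cancellation might look roughly like the sketch below. Only the names `AnalysisHost` and `Analysis` come from the text; the fields, `file_len`, `apply_change` and `Canceled` are assumptions for this example. It also simplifies one point: instead of waiting for old snapshots to be dropped, it applies the change copy-on-write, so canceled snapshots simply keep the old data.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq)]
struct Canceled;

/// Immutable snapshot of the world; cheap to clone and hand to worker threads.
#[derive(Clone)]
struct Analysis {
    files: Arc<HashMap<String, String>>,
    canceled: Arc<AtomicBool>,
}

impl Analysis {
    /// An IDE-style query: checks for cancellation before doing any work.
    fn file_len(&self, path: &str) -> Result<Option<usize>, Canceled> {
        if self.canceled.load(Ordering::SeqCst) {
            return Err(Canceled);
        }
        Ok(self.files.get(path).map(|text| text.len()))
    }
}

/// Mutable owner of the current state.
#[derive(Default)]
struct AnalysisHost {
    files: Arc<HashMap<String, String>>,
    canceled: Arc<AtomicBool>,
}

impl AnalysisHost {
    fn analysis(&self) -> Analysis {
        Analysis {
            files: Arc::clone(&self.files),
            canceled: Arc::clone(&self.canceled),
        }
    }

    fn apply_change(&mut self, path: &str, text: &str) {
        // Cancel outstanding snapshots, then give future ones a fresh flag.
        self.canceled.store(true, Ordering::SeqCst);
        self.canceled = Arc::new(AtomicBool::new(false));
        // Copy-on-write: snapshots that are still alive keep the old map.
        Arc::make_mut(&mut self.files).insert(path.to_string(), text.to_string());
    }
}

fn main() {
    let mut host = AnalysisHost::default();
    host.apply_change("lib.rs", "fn f() {}");
    let old = host.analysis();
    host.apply_change("lib.rs", "fn f() { g() }");
    println!("old snapshot: {:?}", old.file_len("lib.rs"));
    println!("new snapshot: {:?}", host.analysis().file_len("lib.rs"));
}
```
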
### `crates/ra_ide_api_light`

All IDE features which can be implemented if you only have access to a single
file. `ra_ide_api_light` could be used to enhance editing of Rust code without
the need to fiddle with build systems, file synchronization and such.

In a sense, `ra_ide_api_light` is just a bunch of pure functions which take a
syntax tree as input.

The tests for `ra_ide_api_light` are `#[cfg(test)] mod tests` unit tests spread
throughout its modules.


### `crates/ra_lsp_server`

An LSP implementation which exposes `ra_ide_api` over the Language Server Protocol.

### `ra_vfs`

Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read
files from disk at the end of the day. This is what `ra_vfs` does. It also
manages overlays: "dirty" files in the editor, whose "true" contents are
different from the data on disk. This is more or less the single really
platform-dependent component, so it lives in a separate repository and has
extensive cross-platform CI testing.

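
The overlay idea can be illustrated with a small sketch in which unsaved editor buffers shadow the on-disk contents. The `Vfs` struct and its methods below are hypothetical and are not the `ra_vfs` API; disk contents are supplied by the caller so that the example does no real IO.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

#[derive(Default)]
struct Vfs {
    /// Contents last read from disk (filled in by the caller in this sketch).
    on_disk: HashMap<PathBuf, String>,
    /// Unsaved editor buffers; these shadow the on-disk contents.
    overlays: HashMap<PathBuf, String>,
}

impl Vfs {
    fn set_disk_contents(&mut self, path: &Path, text: &str) {
        self.on_disk.insert(path.to_path_buf(), text.to_string());
    }

    fn add_overlay(&mut self, path: &Path, text: &str) {
        self.overlays.insert(path.to_path_buf(), text.to_string());
    }

    fn remove_overlay(&mut self, path: &Path) {
        self.overlays.remove(path);
    }

    /// The "true" contents: the overlay if the file is dirty, else the disk copy.
    fn contents(&self, path: &Path) -> Option<&str> {
        self.overlays
            .get(path)
            .or_else(|| self.on_disk.get(path))
            .map(String::as_str)
    }
}

fn main() {
    let path = Path::new("src/lib.rs");
    let mut vfs = Vfs::default();
    vfs.set_disk_contents(path, "fn saved() {}");
    vfs.add_overlay(path, "fn unsaved() {}");
    println!("{:?}", vfs.contents(path));
    vfs.remove_overlay(path);
    println!("{:?}", vfs.contents(path));
}
```
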
### `crates/gen_lsp_server`

A language server scaffold, exposing a synchronous crossbeam-channel based API.
This crate handles protocol handshaking and parsing messages, while you
control the message dispatch loop yourself.

Run with `RUST_LOG=sync_lsp_server=debug` to see all the messages.

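
A dispatch loop of the kind you would write yourself might look like the sketch below. It uses `std::sync::mpsc` in place of crossbeam-channel so that it has no dependencies, and `Message` and `main_loop` are hypothetical shapes for this example, not the types exported by `gen_lsp_server`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Simplified message shape, assumed for this sketch only.
#[derive(Debug, PartialEq)]
enum Message {
    Request { id: u64, method: String },
    Response { id: u64, result: String },
    Notification { method: String },
}

/// The caller owns this loop and decides how each incoming message is handled.
fn main_loop(receiver: Receiver<Message>, sender: Sender<Message>) {
    for message in receiver {
        match message {
            Message::Request { id, method } => {
                let result = format!("handled {}", method);
                if sender.send(Message::Response { id, result }).is_err() {
                    break; // the other side hung up
                }
            }
            Message::Notification { method } if method == "exit" => break,
            _ => {} // ignore everything else in this sketch
        }
    }
}

fn main() {
    let (to_server, server_inbox) = channel();
    let (server_outbox, from_server) = channel();
    let server = thread::spawn(move || main_loop(server_inbox, server_outbox));

    to_server
        .send(Message::Request { id: 1, method: "hover".to_string() })
        .unwrap();
    to_server
        .send(Message::Notification { method: "exit".to_string() })
        .unwrap();
    server.join().unwrap();

    for reply in from_server.try_iter() {
        println!("{:?}", reply);
    }
}
```

Keeping the loop in user code means the scaffold never needs to know which requests exist or how long they may run.
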
### `crates/ra_cli`

A command-line interface to rust-analyzer.


## Testing Infrastructure

