From d56c2f24252d5d444267e33950825f0a7cb438ca Mon Sep 17 00:00:00 2001 From: Aleksey Kladov Date: Wed, 20 Mar 2019 16:05:49 +0300 Subject: explain how to launch the thing --- docs/dev/architecture.md | 174 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 docs/dev/architecture.md (limited to 'docs/dev/architecture.md') diff --git a/docs/dev/architecture.md b/docs/dev/architecture.md new file mode 100644 index 000000000..3cd63bf73 --- /dev/null +++ b/docs/dev/architecture.md @@ -0,0 +1,174 @@ +# Architecture + +This document describes the high-level architecture of rust-analyzer. +If you want to familiarize yourself with the code base, you are just +in the right place! + +See also the [guide](./guide.md), which walks through a particular snapshot of +rust-analyzer code base. + +Yet another resource is this playlist with videos about various parts of the +analyzer: + +https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE + +## The Big Picture + +![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png) + +On the highest level, rust-analyzer is a thing which accepts input source code +from the client and produces a structured semantic model of the code. + +More specifically, input data consists of a set of test files (`(PathBuf, +String)` pairs) and information about project structure, captured in the so called +`CrateGraph`. The crate graph specifies which files are crate roots, which cfg +flags are specified for each crate (TODO: actually implement this) and what +dependencies exist between the crates. The analyzer keeps all this input data in +memory and never does any IO. Because the input data is source code, which +typically measures in tens of megabytes at most, keeping all input data in +memory is OK. + +A "structured semantic model" is basically an object-oriented representation of +modules, functions and types which appear in the source code. This representation +is fully "resolved": all expressions have types, all references are bound to +declarations, etc. + +The client can submit a small delta of input data (typically, a change to a +single file) and get a fresh code model which accounts for changes. + +The underlying engine makes sure that model is computed lazily (on-demand) and +can be quickly updated for small modifications. + + +## Code generation + +Some of the components of this repository are generated through automatic +processes. These are outlined below: + +- `gen-syntax`: The kinds of tokens that are reused in several places, so a generator + is used. We use tera templates to generate the files listed below, based on + the grammar described in [grammar.ron]: + - [ast/generated.rs][ast generated] in `ra_syntax` based on + [ast/generated.tera.rs][ast source] + - [syntax_kinds/generated.rs][syntax_kinds generated] in `ra_syntax` based on + [syntax_kinds/generated.tera.rs][syntax_kinds source] + +[tera]: https://tera.netlify.com/ +[grammar.ron]: ./crates/ra_syntax/src/grammar.ron +[ast generated]: ./crates/ra_syntax/src/ast/generated.rs +[ast source]: ./crates/ra_syntax/src/ast/generated.rs.tera +[syntax_kinds generated]: ./crates/ra_syntax/src/syntax_kinds/generated.rs +[syntax_kinds source]: ./crates/ra_syntax/src/syntax_kinds/generated.rs.tera + + +## Code Walk-Through + +### `crates/ra_syntax`, `crates/ra_parser` + +Rust syntax tree structure and parser. See +[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes. + +- [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees. +- `grammar` module is the actual parser. It is a hand-written recursive descent parser, which + produces a sequence of events like "start node X", "finish not Y". It works similarly to [kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java), + which is a good source of inspiration for dealing with syntax errors and incomplete input. Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs) + is what we use for the definition of the Rust language. +- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees. + This is the thing that turns a flat list of events into a tree (see `EventProcessor`) +- `ast` provides a type safe API on top of the raw `rowan` tree. +- `grammar.ron` RON description of the grammar, which is used to + generate `syntax_kinds` and `ast` modules, using `cargo gen-syntax` command. +- `algo`: generic tree algorithms, including `walk` for O(1) stack + space tree traversal (this is cool) and `visit` for type-driven + visiting the nodes (this is double plus cool, if you understand how + `Visitor` works, you understand the design of syntax trees). + +Tests for ra_syntax are mostly data-driven: `tests/data/parser` contains a bunch of `.rs` +(test vectors) and `.txt` files with corresponding syntax trees. During testing, we check +`.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update +tests). Additionally, running `cargo gen-tests` will walk the grammar module and collect +all `//test test_name` comments into files inside `tests/data` directory. + +See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which +fixes a bug in the grammar. + +### `crates/ra_db` + +We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and +on-demand computation. Roughly, you can think of salsa as a key-value store, but +it also can compute derived values using specified functions. The `ra_db` crate +provides basic infrastructure for interacting with salsa. Crucially, it +defines most of the "input" queries: facts supplied by the client of the +analyzer. Reading the docs of the `ra_db::input` module should be useful: +everything else is strictly derived from those inputs. + +### `crates/ra_hir` + +HIR provides high-level "object oriented" access to Rust code. + +The principal difference between HIR and syntax trees is that HIR is bound to a +particular crate instance. That is, it has cfg flags and features applied (in +theory, in practice this is to be implemented). So, the relation between +syntax and HIR is many-to-one. The `source_binder` module is responsible for +guessing a HIR for a particular source position. + +Underneath, HIR works on top of salsa, using a `HirDatabase` trait. + +### `crates/ra_ide_api` + +A stateful library for analyzing many Rust files as they change. `AnalysisHost` +is a mutable entity (clojure's atom) which holds the current state, incorporates +changes and hands out `Analysis` --- an immutable and consistent snapshot of +the world state at a point in time, which actually powers analysis. + +One interesting aspect of analysis is its support for cancellation. When a +change is applied to `AnalysisHost`, first all currently active snapshots are +canceled. Only after all snapshots are dropped the change actually affects the +database. + +APIs in this crate are IDE centric: they take text offsets as input and produce +offsets and strings as output. This works on top of rich code model powered by +`hir`. + +### `crates/ra_ide_api_light` + +All IDE features which can be implemented if you only have access to a single +file. `ra_ide_api_light` could be used to enhance editing of Rust code without +the need to fiddle with build-systems, file synchronization and such. + +In a sense, `ra_ide_api_light` is just a bunch of pure functions which take a +syntax tree as input. + +The tests for `ra_ide_api_light` are `#[cfg(test)] mod tests` unit-tests spread +throughout its modules. + + +### `crates/ra_lsp_server` + +An LSP implementation which wraps `ra_ide_api` into a langauge server protocol. + +### `ra_vfs` + +Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read +files from disk at the end of the day. This is what `ra_vfs` does. It also +manages overlays: "dirty" files in the editor, whose "true" contents is +different from data on disk. This is more or less the single really +platform-dependent component, so it lives in a separate repository and has an +extensive cross-platform CI testing. + +### `crates/gen_lsp_server` + +A language server scaffold, exposing a synchronous crossbeam-channel based API. +This crate handles protocol handshaking and parsing messages, while you +control the message dispatch loop yourself. + +Run with `RUST_LOG=sync_lsp_server=debug` to see all the messages. + +### `crates/ra_cli` + +A CLI interface to rust-analyzer. + + +## Testing Infrastructure + + -- cgit v1.2.3