From 84dfbfbd1d72c276a93518fea41196f75069d17e Mon Sep 17 00:00:00 2001 From: Aleksey Kladov Date: Wed, 29 Jan 2020 15:08:31 +0100 Subject: Freshen Architecture.md document --- docs/dev/README.md | 4 +++ docs/dev/architecture.md | 79 +++++++++++++++++++++++++----------------------- 2 files changed, 45 insertions(+), 38 deletions(-) diff --git a/docs/dev/README.md b/docs/dev/README.md index a2be99858..d30727786 100644 --- a/docs/dev/README.md +++ b/docs/dev/README.md @@ -106,6 +106,10 @@ communication, and `print!` would break it. If I need to fix something simultaneously in the server and in the client, I feel even more sad. I don't have a specific workflow for this case. +Additionally, I use `cargo run --release -p ra_cli -- analysis-stats +path/to/some/rust/crate` to run a batch analysis. This is primarily useful for +performance optimizations, or for bug minimization. + # Logging Logging is done by both rust-analyzer and VS Code, so it might be tricky to diff --git a/docs/dev/architecture.md b/docs/dev/architecture.md index 629645757..9675ed0b6 100644 --- a/docs/dev/architecture.md +++ b/docs/dev/architecture.md @@ -12,6 +12,9 @@ analyzer: https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE +Note that the guide and videos are pretty dated; this document should +generally be fresher. + ## The Big Picture ![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png) @@ -20,13 +23,12 @@ On the highest level, rust-analyzer is a thing which accepts input source code from the client and produces a structured semantic model of the code. More specifically, input data consists of a set of test files (`(PathBuf, -String)` pairs) and information about project structure, captured in the so called -`CrateGraph`. The crate graph specifies which files are crate roots, which cfg -flags are specified for each crate (TODO: actually implement this) and what -dependencies exist between the crates. 
The analyzer keeps all this input data in -memory and never does any IO. Because the input data is source code, which -typically measures in tens of megabytes at most, keeping all input data in -memory is OK. +String)` pairs) and information about project structure, captured in the so +called `CrateGraph`. The crate graph specifies which files are crate roots, +which cfg flags are specified for each crate and what dependencies exist between +the crates. The analyzer keeps all this input data in memory and never does any +IO. Because the input data are source code, which typically measures in tens of +megabytes at most, keeping everything in memory is OK. A "structured semantic model" is basically an object-oriented representation of modules, functions and types which appear in the source code. This representation @@ -43,37 +45,39 @@ can be quickly updated for small modifications. ## Code generation Some of the components of this repository are generated through automatic -processes. These are outlined below: +processes. `cargo xtask codegen` runs all generation tasks. Generated code is +committed to the git repository. + +In particular, `cargo xtask codegen` generates: + +1. [`syntax_kind/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_parser/src/syntax_kind/generated.rs) + -- the set of terminals and non-terminals of rust grammar. -- `cargo xtask codegen`: The kinds of tokens that are reused in several places, so a generator - is used. We use `quote!` macro to generate the files listed below, based on - the grammar described in [grammar.ron]: - - [ast/generated.rs][ast generated] - - [syntax_kind/generated.rs][syntax_kind generated] +2. [`ast/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_syntax/src/ast/generated.rs) + -- AST data structure. 
-[grammar.ron]: ../../crates/ra_syntax/src/grammar.ron -[ast generated]: ../../crates/ra_syntax/src/ast/generated.rs -[syntax_kind generated]: ../../crates/ra_parser/src/syntax_kind/generated.rs +3. [`doc_tests/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_assists/src/doc_tests/generated.rs), + [`test_data/parser/inline`](https://github.com/rust-analyzer/rust-analyzer/tree/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_syntax/test_data/parser/inline) + -- tests for assists and the parser. + +The source for 1 and 2 is in [`ast_src.rs`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/xtask/src/ast_src.rs). ## Code Walk-Through ### `crates/ra_syntax`, `crates/ra_parser` Rust syntax tree structure and parser. See -[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes. +[RFC](https://github.com/rust-lang/rfcs/pull/2256) and [./syntax.md](./syntax.md) for some design notes. - [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees. - `grammar` module is the actual parser. It is a hand-written recursive descent parser, which produces a sequence of events like "start node X", "finish node Y". It works similarly to [kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java), which is a good source of inspiration for dealing with syntax errors and incomplete input. Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs) is what we use for the definition of the Rust language. -- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees. 
- This is the thing that turns a flat list of events into a tree (see `EventProcessor`) +- `TreeSink` and `TokenSource` traits bridge the tree-agnostic parser from `grammar` with `rowan` trees. - `ast` provides a type safe API on top of the raw `rowan` tree. -- `grammar.ron` RON description of the grammar, which is used to - generate `syntax_kinds` and `ast` modules, using `cargo xtask codegen` command. -- `algo`: generic tree algorithms, including `walk` for O(1) stack - space tree traversal (this is cool). +- `ast_src` description of the grammar, which is used to generate `syntax_kinds` + and `ast` modules, using `cargo xtask codegen` command. Tests for ra_syntax are mostly data-driven: `test_data/parser` contains subdirectories with a bunch of `.rs` (test vectors) and `.txt` files with corresponding syntax trees. During testing, we check @@ -81,6 +85,10 @@ Tests for ra_syntax are mostly data-driven: `test_data/parser` contains subdirec tests). Additionally, running `cargo xtask codegen` will walk the grammar module and collect all `// test test_name` comments into files inside `test_data/parser/inline` directory. +Note +[`api_walkthrough`](https://github.com/rust-analyzer/rust-analyzer/blob/2fb6af89eb794f775de60b82afe56b6f986c2a40/crates/ra_syntax/src/lib.rs#L190-L348) +in particular: it shows off various methods of working with syntax tree. + See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which fixes a bug in the grammar. @@ -94,18 +102,22 @@ defines most of the "input" queries: facts supplied by the client of the analyzer. Reading the docs of the `ra_db::input` module should be useful: everything else is strictly derived from those inputs. -### `crates/ra_hir` +### `crates/ra_hir*` crates HIR provides high-level "object oriented" access to Rust code. The principal difference between HIR and syntax trees is that HIR is bound to a -particular crate instance. 
That is, it has cfg flags and features applied (in -theory, in practice this is to be implemented). So, the relation between -syntax and HIR is many-to-one. The `source_binder` module is responsible for -guessing a HIR for a particular source position. +particular crate instance. That is, it has cfg flags and features applied. So, +the relation between syntax and HIR is many-to-one. The `source_binder` module +is responsible for guessing a HIR for a particular source position. Underneath, HIR works on top of salsa, using a `HirDatabase` trait. +`ra_hir_xxx` crates have a strong ECS flavor, in that they work with raw ids and +directly query the database. + +The top-level `ra_hir` façade crate wraps ids into a more OO-flavored API. + ### `crates/ra_ide` A stateful library for analyzing many Rust files as they change. `AnalysisHost` @@ -135,18 +147,9 @@ different from data on disk. This is more or less the single really platform-dependent component, so it lives in a separate repository and has an extensive cross-platform CI testing. -### `crates/gen_lsp_server` - -A language server scaffold, exposing a synchronous crossbeam-channel based API. -This crate handles protocol handshaking and parsing messages, while you -control the message dispatch loop yourself. - -Run with `RUST_LOG=sync_lsp_server=debug` to see all the messages. - ### `crates/ra_cli` -A CLI interface to rust-analyzer. - +A CLI interface to rust-analyzer, mainly for testing. ## Testing Infrastructure -- cgit v1.2.3