| author    | Aleksey Kladov <[email protected]> | 2019-01-19 12:51:46 +0000 |
| committer | Aleksey Kladov <[email protected]> | 2019-01-21 08:27:01 +0000 |
| commit    | 7e1d866481aa6f11dbe96c4d47b6b61b276f07b3 (patch) | |
| tree      | 53355cf1d99d2af3555dab3030c5efc88002a052 | |
| parent    | 237bb929f4332021d97b064dd8178518b30e1f32 (diff) | |
add guide
-rw-r--r-- | guide.md | 364 |
1 file changed, 364 insertions, 0 deletions
diff --git a/guide.md b/guide.md
new file mode 100644
index 000000000..beb8294ba
--- /dev/null
+++ b/guide.md
@@ -0,0 +1,364 @@
1 | # Guide to rust-analyzer | ||
2 | |||
3 | ## About the guide | ||
4 | |||
5 | This guide describes the current state of rust-analyzer as of 2019-01-20 | ||
6 | (git tag `guide-2019-01`). Its purpose is to | ||
7 | document various problems and architectural solutions that arise when | ||
8 | building an IDE-first compiler. | ||
9 | |||
10 | ## The big picture | ||
11 | |||
12 | On the highest possible level, rust-analyzer is a stateful component. A client may | ||
13 | apply changes to the analyzer (the new text of the `foo.rs` file is "fn main() {}") | ||
14 | and may ask semantic questions about the current state (what is the | ||
15 | definition of the identifier at offset 92 in file `bar.rs`?). Two important | ||
16 | properties hold: | ||
17 | |||
18 | * The analyzer does not do any IO. It starts in an empty state and all input data is | ||
19 | provided via the `apply_change` API. | ||
20 | |||
21 | * Only queries about the current state are supported. One can, of course, | ||
22 | simulate undo and redo by keeping a log of changes and inverse changes. | ||
23 | |||
24 | ## IDE API | ||
25 | |||
26 | To see the big picture, let's take a look at the [`AnalysisHost`] and | ||
27 | [`Analysis`] pair of types. `AnalysisHost` has three methods: | ||
28 | |||
29 | * `default` for creating an empty analysis | ||
30 | * `apply_change(&mut self)` to make changes (this is how you get from an empty | ||
31 | state to something interesting) | ||
32 | * `analysis(&self)` to get an instance of `Analysis` | ||
33 | |||
34 | `Analysis` has a ton of methods for IDEs, like `goto_definition`, or | ||
35 | `completions`. Both inputs and outputs of `Analysis`' methods are formulated in | ||
36 | terms of files and offsets, and **not** in terms of Rust concepts like structs, | ||
37 | traits, etc. The "typed" API with Rust-specific types is slightly lower in the | ||
38 | stack; we'll talk about it later. | ||
39 | |||
40 | [`AnalysisHost`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_ide_api/src/lib.rs#L265-L284 | ||
41 | [`Analysis`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_ide_api/src/lib.rs#L291-L478 | ||
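Putting those three methods together, a minimal sketch of the intended usage looks roughly like this. The signatures are simplified for illustration (the exact `add_file` arguments, `SourceRootId` and offset handling are assumptions); the real definitions are behind the links above.

```rust
// A minimal sketch, assuming simplified signatures; see the linked sources
// for the real `AnalysisHost`/`Analysis` definitions.
use ra_ide_api::{AnalysisChange, AnalysisHost, FileId, FilePosition, SourceRootId};

fn main() {
    // `default` gives an empty analysis: no files, no crates.
    let mut host = AnalysisHost::default();

    // Build a change "transaction" (see the Inputs section below).
    let mut change = AnalysisChange::new();
    change.add_file(
        SourceRootId(0),               // which source root the file belongs to
        FileId(1),                     // integer id picked by the client
        "main.rs".into(),              // path relative to the source root
        "fn main() { foo(); }".into(), // current text of the file
    );
    host.apply_change(change);

    // Fork a consistent snapshot and ask an offset-based question.
    let analysis = host.analysis();
    let _nav = analysis.goto_definition(FilePosition {
        file_id: FileId(1),
        offset: 12.into(), // byte offset of `foo` in the text above
    });
}
```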
42 | |||
43 | The reason for the `Analysis` and `AnalysisHost` separation is that we want to apply | ||
44 | changes "uniquely", but we might want to fork an `Analysis` and send it to | ||
45 | another thread for background processing. That is, there is only a single | ||
46 | `AnalysisHost`, but there may be several (equivalent) `Analysis` instances. | ||
47 | |||
48 | Note that all of the `Analysis` API methods return `Cancelable<T>`. This is required | ||
49 | to be responsive in an IDE setting. Sometimes a long-running query is being computed | ||
50 | and the user types something in the editor and asks for completion. In this | ||
51 | case, we cancel the long-running computation (so it returns `Err(Canceled)`), | ||
52 | apply the change and execute the request for completion. We never use stale data to | ||
53 | answer requests. Under the hood, `AnalysisHost` "remembers" all outstanding | ||
54 | `Analysis` instances. The `AnalysisHost::apply_change` method cancels all | ||
55 | `Analysis`es, blocks until all of them are dropped and then applies the change | ||
56 | in-place. This is the read-write lock style of interior mutability that is | ||
57 | familiar to Rustaceans. | ||
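Schematically, the pattern looks like this. This is a minimal sketch with illustrative names, not the actual rust-analyzer internals; the cancellation signalling itself is elided.

```rust
// Sketch of the pattern described above: snapshots hold read guards,
// `apply_change` takes the write lock and therefore waits for all
// outstanding snapshots to be dropped.
use std::sync::{Arc, RwLock};

#[derive(Debug)]
struct Canceled;
type Cancelable<T> = Result<T, Canceled>;

#[derive(Default)]
struct State; // all inputs + memoized analysis results

struct Host {
    state: Arc<RwLock<State>>,
}

struct Snapshot {
    state: Arc<RwLock<State>>,
}

impl Host {
    fn analysis(&self) -> Snapshot {
        Snapshot { state: Arc::clone(&self.state) }
    }

    fn apply_change(&mut self) {
        // The real code first signals cancellation, so in-flight queries
        // return `Err(Canceled)` quickly; then it blocks here until every
        // read guard (i.e. every outstanding snapshot's query) is released.
        let _guard = self.state.write().unwrap();
        // ... mutate inputs in place ...
    }
}

impl Snapshot {
    fn some_query(&self) -> Cancelable<usize> {
        let _guard = self.state.read().unwrap();
        // ... long computation that periodically checks for cancellation ...
        Ok(92)
    }
}
```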
58 | |||
59 | Next, let's talk about what exactly the inputs to the `Analysis` are. | ||
60 | |||
61 | ## Inputs | ||
62 | |||
63 | rust-analyzer never does any IO itself; all inputs get passed explicitly via | ||
64 | the `AnalysisHost::apply_change` method, which accepts a single argument: | ||
65 | `AnalysisChange`. [`AnalysisChange`] is a builder for a single change | ||
66 | "transaction", so it suffices to study its methods to understand all of the | ||
67 | input data. | ||
68 | |||
69 | [`AnalysisChange`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_ide_api/src/lib.rs#L119-L167 | ||
70 | |||
71 | The `(add|change|remove)_file` methods control the set of input files, where | ||
72 | each file has an integer id (`FileId`, picked by the client), text (`String`) | ||
73 | and a filesystem path. Paths are tricky; they'll be explained in the source roots | ||
74 | section, together with the `add_root` method. The `add_library` method allows adding a | ||
75 | group of files which are assumed to rarely change. It's mostly an optimization | ||
76 | and does not change the fundamental picture. | ||
77 | |||
78 | The `set_crate_graph` method controls how the input files are partitioned | ||
79 | into compilation units -- crates. It also controls (in theory, not implemented | ||
80 | yet) `cfg` flags. `CrateGraph` is a directed acyclic graph of crates. Each crate | ||
81 | has a root `FileId`, a set of active `cfg` flags and a set of dependencies. Each | ||
82 | dependency is a pair of a crate and a name. It is possible to have two crates | ||
83 | with the same root `FileId` but different `cfg`-flags/dependencies. This model | ||
84 | is lower-level than Cargo's model of packages: each Cargo package consists of several | ||
85 | targets, each of which is a separate crate (or several crates, if you try | ||
86 | different feature combinations). | ||
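For example, a crate graph with a binary target depending on a library target could be assembled roughly as follows. This is a sketch: the method names follow the description above and are assumptions, not the verbatim `CrateGraph` API.

```rust
// Sketch of assembling a CrateGraph; method names are assumptions based on
// the description above rather than the exact ra_db API.
use ra_db::{CrateGraph, FileId};

fn example_graph() -> CrateGraph {
    let mut graph = CrateGraph::default();

    // Each crate is identified by its root file, e.g. src/lib.rs and src/main.rs.
    let lib = graph.add_crate_root(FileId(1));
    let bin = graph.add_crate_root(FileId(2));

    // The binary depends on the library under the name `my_lib`.
    graph.add_dep(bin, "my_lib".into(), lib);

    // Adding FileId(1) again with different cfg flags would yield a distinct
    // crate, once cfg support is implemented.
    graph
}
```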
87 | |||
88 | Procedural macros should become inputs as well, but currently they are not | ||
89 | supported. A procedural macro will be a black-box `Box<dyn Fn(TokenStream) -> TokenStream>` | ||
90 | function, and will be inserted into the crate graph just like dependencies. | ||
91 | |||
92 | Soon we'll talk about how we build an LSP server on top of `Analysis`, but first, | ||
93 | let's deal with that paths issue. | ||
94 | |||
95 | |||
96 | ## Source roots (aka filesystems are horrible) | ||
97 | |||
98 | This is a non-essential section, feel free to skip. | ||
99 | |||
100 | The previous section said that the file system path is an attribute of a file, | ||
101 | but this is not the whole truth. Making it an absolute `PathBuf` would be bad for | ||
102 | several reasons. First, file systems are full of (platform-dependent) edge cases: | ||
103 | |||
104 | * it's hard (requires a syscall) to decide if two paths are equivalent | ||
105 | * some file systems are case-sensitive and some are not | ||
106 | * paths are not necessarily UTF-8 | ||
107 | * symlinks can form cycles | ||
108 | |||
109 | Second, this might hurt reproducibility and hermeticity of builds. In theory, | ||
110 | moving a project from `/foo/bar/my-project` to `/spam/eggs/my-project` should | ||
111 | not change a bit in the output. However, if the absolute path is a part of the | ||
112 | input, it is, at least in theory, observable, and *could* affect the output. | ||
113 | |||
114 | Yet another problem is that we really, really want to avoid doing IO, but with | ||
115 | Rust the set of "input" files is not necessarily known up-front. In theory, you | ||
116 | can have `#[path="/dev/random"] mod foo;`. | ||
117 | |||
118 | To solve (or explicitly refuse to solve) these problems, rust-analyzer uses the | ||
119 | concept of a source root. Roughly speaking, a source root is the contents of a | ||
120 | directory on a file system, like `/home/matklad/projects/rustraytracer/**.rs`. | ||
121 | |||
122 | More precisely, all files (`FileId`s) are partitioned into disjoint | ||
123 | `SourceRoot`s. Each file has a relative UTF-8 path within its `SourceRoot`. | ||
124 | Each `SourceRoot` has an identity (an integer id). Crucially, the root path of the | ||
125 | source root itself is unknown to the analyzer: the client is supposed to maintain a | ||
126 | mapping between `SourceRoot` ids (which are assigned by the client) and actual | ||
127 | `PathBuf`s. `SourceRoot`s give a sane tree model of the file system to the | ||
128 | analyzer. | ||
129 | |||
130 | Note that `mod`, `#[path]` and `include!()` can only reference files from the | ||
131 | same source root. It is of course possible to explicitly add extra files to | ||
132 | the source root, even `/dev/random`. | ||
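Schematically, the split between what the analyzer sees and what the client keeps looks like this (simplified stand-in types, not the real `SourceRoot` definition):

```rust
// Simplified stand-ins illustrating the partitioning; not the real types.
use std::collections::HashMap;
use std::path::PathBuf;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct FileId(u32);
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct SourceRootId(u32);

// What the analyzer sees: files grouped by root, addressed by relative
// UTF-8 paths such as "src/main.rs".
struct SourceRoot {
    files: HashMap<String, FileId>,
}

// What only the client knows: where each source root lives on disk.
struct ClientPaths {
    roots: HashMap<SourceRootId, PathBuf>,
}
```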
133 | |||
134 | ## Language Server Protocol | ||
135 | |||
136 | Now let's see how the `Analysis` API is exposed via the JSON-RPC based LSP protocol. The | ||
137 | hard part here is managing changes (which can come either from the file system | ||
138 | or from the editor) and concurrency (we want to spawn background jobs for things | ||
139 | like syntax highlighting). We use the event loop pattern to manage the zoo, and | ||
140 | the loop is the [`main_loop_inner`] function. The [`main_loop`] does one-time | ||
141 | initialization and tear-down of the resources. | ||
142 | |||
143 | [`main_loop`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/main_loop.rs#L51-L110 | ||
144 | [`main_loop_inner`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/main_loop.rs#L156-L258 | ||
145 | |||
146 | |||
147 | Let's walk through a typical analyzer session! | ||
148 | |||
149 | First, we need to figure out what to analyze. To do this, we run `cargo | ||
150 | metadata` to learn about the Cargo packages of the current workspace and its dependencies, | ||
151 | and we run `rustc --print sysroot` and scan the sysroot to learn about crates like | ||
152 | `std`. Currently we load this configuration once at the start of the server, but | ||
153 | it should be possible to dynamically reconfigure it later without a restart. | ||
154 | |||
155 | [main_loop.rs#L62-L70](https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/main_loop.rs#L62-L70) | ||
156 | |||
157 | The [`ProjectModel`] we get after this step is very Cargo- and sysroot-specific; | ||
158 | it needs to be lowered to get the input in the form of an `AnalysisChange`. This | ||
159 | happens in the [`ServerWorldState::new`] method. Specifically, it will: | ||
160 | |||
161 | * Create a `SourceRoot` for each Cargo package and sysroot. | ||
162 | * Schedule a file system scan of the roots. | ||
163 | * Create an analyzer's `Crate` for each Cargo **target** and sysroot crate. | ||
164 | * Set up dependencies between the crates. | ||
165 | |||
166 | [`ProjectModel`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/project_model.rs#L16-L20 | ||
167 | [`ServerWorldState::new`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/server_world.rs#L38-L160 | ||
168 | |||
169 | The results of the scan (which may take a while) will be processed in the body | ||
170 | of the main loop, just like any other change. Here's where we handle: | ||
171 | |||
172 | * [File system changes](https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/main_loop.rs#L194) | ||
173 | * [Changes from the editor](https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/main_loop.rs#L377) | ||
174 | |||
175 | After a single turn of the loop, we group the changes into one `AnalysisChange` and | ||
176 | [apply] it. This always happens on the main thread and blocks the loop. | ||
177 | |||
178 | [apply]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/server_world.rs#L216 | ||
179 | |||
180 | To handle requests, like ["goto definition"], we create an instance of the | ||
181 | `Analysis` and [`schedule`] the task (which consumes the `Analysis`) onto the | ||
182 | thread pool. [The task] calls the corresponding `Analysis` method, while | ||
183 | massaging the types into the LSP representation. Keep in mind that if we are | ||
184 | executing "goto definition" on the threadpool and a new change comes in, the | ||
185 | task will be canceled as soon as the main loop calls `apply_change` on the | ||
186 | `AnalysisHost`. | ||
187 | |||
188 | ["goto definition"]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/server_world.rs#L216 | ||
189 | [`schedule`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/main_loop.rs#L426-L455 | ||
190 | [The task]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_lsp_server/src/main_loop/handlers.rs#L205-L223 | ||
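In heavily simplified form, the loop selects over a few channels and dispatches on the result. The real [`main_loop_inner`] handles many more cases; the message types, channel names and helper functions below are invented for illustration.

```rust
// Heavily simplified sketch of the event-loop pattern.
use crossbeam_channel::{select, Receiver};

struct VfsChange;     // a batch of file-system changes
struct EditorMessage; // an LSP request or notification
struct TaskResult;    // the output of a finished background task

fn main_loop_inner(
    fs_events: Receiver<VfsChange>,
    editor_events: Receiver<EditorMessage>,
    task_results: Receiver<TaskResult>,
) {
    loop {
        select! {
            // Fold file-system changes into an AnalysisChange and apply it on
            // this (main) thread; this cancels in-flight background tasks.
            recv(fs_events) -> change => apply_fs_change(change.unwrap()),
            // Editor changes are applied the same way; requests are scheduled
            // onto the thread pool as background tasks.
            recv(editor_events) -> msg => handle_editor_message(msg.unwrap()),
            // Results of finished tasks are sent back to the editor.
            recv(task_results) -> result => send_response(result.unwrap()),
        }
    }
}

fn apply_fs_change(_: VfsChange) {}
fn handle_editor_message(_: EditorMessage) {}
fn send_response(_: TaskResult) {}
```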
191 | |||
192 | This concludes the overview of the analyzer's programming *interface*. Next, let's | ||
193 | dig into the implementation! | ||
194 | |||
195 | ## Salsa | ||
196 | |||
197 | The most straightforward way to implement an "apply change, get analysis, repeat" | ||
198 | API would be to maintain the input state and to compute all possible analysis | ||
199 | information from scratch after every change. This works, but scales poorly with | ||
200 | the size of the project. To make this fast, we need to take advantage of the | ||
201 | fact that most of the changes are small, and that analysis results are unlikely | ||
202 | to change significantly between invocations. | ||
203 | |||
204 | To do this we use [salsa]: a framework for incremental on-demand computation. | ||
205 | You can skip the rest of the section if you are familiar with rustc's red-green | ||
206 | algorithm. | ||
207 | |||
208 | [salsa]: https://github.com/salsa-rs/salsa | ||
209 | |||
210 | It's better to refer to salsa's docs to learn about it. Here's a small excerpt: | ||
211 | |||
212 | The key idea of salsa is that you define your program as a set of queries. Every | ||
213 | query is like a function `K -> V` that maps from some key of type `K` to a value | ||
214 | of type `V`. Queries come in two basic varieties: | ||
215 | |||
216 | * **Inputs**: the base inputs to your system. You can change these whenever you | ||
217 | like. | ||
218 | |||
219 | * **Functions**: pure functions (no side effects) that transform your inputs | ||
220 | into other values. The results of queries are memoized to avoid recomputing | ||
221 | them a lot. When you make changes to the inputs, we'll figure out (fairly | ||
222 | intelligently) when we can re-use these memoized values and when we have to | ||
223 | recompute them. | ||
224 | |||
225 | |||
226 | For further discussion, it's important to understand one bit of "fairly | ||
227 | intelligently". Suppose we have two functions, `f1` and `f2`, and one input, `i`. | ||
228 | We call `f1(X)` which in turn calls `f2(Y)` which inspects `i(Z)`. `i(Z)` | ||
229 | returns some value `V1`, `f2` uses that and returns `R1`, `f1` uses that and | ||
230 | returns `O`. Now, let's change `i` at `Z` from `V1` to `V2` and try to compute | ||
231 | `f1(X)` again. Because `f1(X)` (transitively) depends on `i(Z)`, we can't just | ||
232 | reuse its value as is. However, if `f2(Y)` is *still* equal to `R1` (despite the | ||
233 | `i`'s change), we, in fact, *can* reuse `O` as result of `f1(X)`. And that's how | ||
234 | salsa works: it recomputes results in *reverse* order, starting from inputs and | ||
235 | progressing towards outputs, stopping as soon as it sees an intermediate value | ||
236 | that hasn't changed. | ||
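Here is the same example spelled out as ordinary Rust, with the caching decisions written as comments. Salsa does this bookkeeping automatically; the code below only names the pieces.

```rust
// Plain-Rust illustration of the f1/f2/i example above.
use std::collections::HashMap;

type Key = u32;
type Value = u64;

struct Inputs {
    i: HashMap<Key, Value>, // the input query `i`
}

fn f2(inputs: &Inputs, y: Key) -> Value {
    let v = inputs.i[&y]; // f2(Y) inspects i(Z) ...
    v / 2                 // ... and derives its result from it
}

fn f1(inputs: &Inputs, x: Key) -> Value {
    f2(inputs, x) + 1 // f1(X) depends on i(Z) only through f2(Y)
}

fn main() {
    let mut inputs = Inputs { i: HashMap::from([(92, 10)]) };
    let o = f1(&inputs, 92); // memoize: f2(92) = 5, f1(92) = 6

    // Change the input: V1 = 10 becomes V2 = 11.
    inputs.i.insert(92, 11);

    // Salsa re-runs f2(92) first (it is closest to the input). 11 / 2 is
    // still 5, so the memoized f1(92) = 6 is reused without re-running f1.
    assert_eq!(f2(&inputs, 92), 5);
    assert_eq!(f1(&inputs, 92), o);
}
```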
237 | |||
238 | ## Salsa Input Queries | ||
239 | |||
240 | All analyzer information is stored in a salsa database. The `Analysis` and | ||
241 | `AnalysisHost` types are newtype wrappers around [`RootDatabase`] -- a salsa | ||
242 | database. | ||
243 | |||
244 | [`RootDatabase`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_ide_api/src/db.rs#L88-L134 | ||
245 | |||
246 | Salsa input queries are defined in [`FilesDatabase`] (which is a part of | ||
247 | `RootDatabase`). They closely mirror the familiar `AnalysisChange` structure: | ||
248 | indeed, what `apply_change` does is set the values of the input queries. | ||
249 | |||
250 | [`FilesDatabase`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_db/src/input.rs#L150-L174 | ||
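The correspondence is easy to picture with stand-in types (a simplified illustration, not the actual `RootDatabase`/`FilesDatabase` definitions):

```rust
// Self-contained illustration of "input queries mirror AnalysisChange":
// the types below are stand-ins, not the real database types.
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct FileId(u32);

#[derive(Default)]
struct FakeInputs {
    file_text: HashMap<FileId, Arc<String>>, // input query: FileId -> text
}

#[derive(Default)]
struct FakeChange {
    files_changed: Vec<(FileId, Arc<String>)>,
}

// What `apply_change` morally does: copy the change into the input queries.
fn apply_change(db: &mut FakeInputs, change: FakeChange) {
    for (file_id, text) in change.files_changed {
        db.file_text.insert(file_id, text);
    }
}
```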
251 | |||
252 | ## From text to semantic model | ||
253 | |||
254 | The bulk of rust-analyzer is transforming input text into a semantic model of | ||
255 | Rust code: a web of entities like modules, structs, functions and traits. | ||
256 | |||
257 | An important fact to realize is that (unlike in most other languages, such as C# or | ||
258 | Java) there isn't a one-to-one mapping between source code and the semantic model. A | ||
259 | single function definition in the source code might result in several semantic | ||
260 | functions: for example, the same source file might be included as a module into | ||
261 | several crates, or a single "crate" might be present in the compilation DAG | ||
262 | several times, with different sets of `cfg`s enabled. | ||
263 | |||
264 | The semantic interface is declared in the [`code_model_api`] module. Each entity is | ||
265 | identified by an integer id and has a bunch of methods which take a salsa database | ||
266 | as an argument and return other entities (which are also ids). Internally, these | ||
267 | methods invoke various queries on the database to build the model on demand. | ||
268 | Here's [the list of queries]. | ||
269 | |||
270 | [`code_model_api`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_hir/src/code_model_api.rs | ||
271 | [the list of queries]: https://github.com/rust-analyzer/rust-analyzer/blob/7e84440e25e19529e4ff8a66e521d1b06349c6ec/crates/ra_hir/src/db.rs#L20-L106 | ||
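To make "an entity is an id plus database-taking methods" concrete, here is the rough shape of such a type (a simplified stand-in, not the actual `code_model_api` code):

```rust
// Simplified stand-in for a code_model_api-style entity: a `Function` is just
// an id; everything interesting is computed by querying the salsa database.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct FunctionId(u32);

#[derive(Clone, Copy)]
struct Function {
    id: FunctionId,
}

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ModuleId(u32);

#[derive(Clone, Copy)]
struct Module {
    id: ModuleId,
}

// Stand-in for the salsa database trait with the relevant query.
trait HirDatabase {
    fn function_module(&self, f: FunctionId) -> ModuleId;
}

impl Function {
    // Methods take the database and return other entities (also ids).
    fn module(self, db: &impl HirDatabase) -> Module {
        Module { id: db.function_module(self.id) }
    }
}
```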
272 | |||
273 | The first step of building the model is parsing the source code. | ||
274 | |||
275 | ## Syntax trees | ||
276 | |||
277 | An important property of the Rust language is that each file can be parsed in | ||
278 | isolation. Unlike in, say, `C++`, an `#include` can't change the meaning of the | ||
279 | surrounding syntax. For this reason, rust-analyzer can build a syntax tree for each "source | ||
280 | file", which can then be reused by several semantic models if this file | ||
281 | happens to be a part of several crates. | ||
282 | |||
283 | rust-analyzer uses a representation of syntax trees similar to that of `Roslyn` | ||
284 | and Swift's new | ||
285 | [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax). | ||
286 | Swift's docs give an excellent overview of the approach, so I'll skip that part | ||
287 | here and instead outline the main characteristics of the syntax trees: | ||
288 | |||
289 | * Syntax trees are fully lossless. Converting **any** text to a syntax tree and | ||
290 | back is a total identity function. All whitespace and comments are explicitly | ||
291 | represented in the tree. | ||
292 | |||
293 | * Syntax nodes have generic `(next|previous)_sibling`, `parent`, | ||
294 | `(first|last)_child` functions. You can get from any one node to any other | ||
295 | node in the file using only these functions. | ||
296 | |||
297 | * Syntax nodes know their range (start offset and length) in the file. | ||
298 | |||
299 | * Syntax nodes share the ownership of their syntax tree: if you keep a reference | ||
300 | to a single function, the whole enclosing file is alive. | ||
301 | |||
302 | * Syntax trees are immutable and the cost of replacing the subtree is | ||
303 | proportional to the depth of the subtree. Read Swift's docs to learn how | ||
304 | immutable + parent pointers + cheap modification is possible. | ||
305 | |||
306 | * Syntax trees are built on a best-effort basis. All accessor methods return | ||
307 | `Option`s. The tree for `fn foo` will contain a function declaration with | ||
308 | `None` for the parameter list and body. | ||
309 | |||
310 | * Syntax trees do not know the file they are built from; they only know about | ||
311 | the text. | ||
312 | |||
313 | The implementation is based on the generic [rowan] crate on top of which a | ||
314 | [rust-specific] AST is generated. | ||
315 | |||
316 | [rowan]: https://github.com/rust-analyzer/rowan/tree/100a36dc820eb393b74abe0d20ddf99077b61f88 | ||
317 | [rust-specific]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_syntax/src/ast/generated.rs | ||
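A sketch of what these properties look like in code, assuming an `ra_syntax`-style API (`SourceFile::parse`, `syntax()`, `text()`, `descendants()`); exact names and signatures may differ at this commit:

```rust
// Illustrative sketch of the lossless and best-effort properties.
use ra_syntax::SourceFile;

fn demo() {
    // Incomplete code still parses, on a best-effort basis.
    let text = "fn foo  // no params, no body";
    let file = SourceFile::parse(text);

    // Lossless: whitespace and comments are in the tree, so the round-trip
    // reproduces the original text exactly.
    assert_eq!(file.syntax().text().to_string(), text);

    // Generic traversal: every node knows its parent, children and range.
    for node in file.syntax().descendants() {
        let _range = node.range(); // start offset + length within the text
    }
}
```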
318 | |||
319 | The next step in constructing the semantic model is ... | ||
320 | |||
321 | ## Building a Module Tree | ||
322 | |||
323 | The algorithm for building a tree of modules is to start with a crate root | ||
324 | (remember, each `Crate` from a `CrateGraph` has a `FileId`), collect all `mod` | ||
325 | declarations and recursively process child modules. This is handled by the | ||
326 | [`module_tree_query`](https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_hir/src/module_tree.rs#L116-L123), | ||
327 | with two slight variations. | ||
328 | |||
329 | First, rust-analyzer builds a module tree for all crates in a source root | ||
330 | simultaneously. The main reason for this is historical (`module_tree` predates | ||
331 | `CrateGraph`), but this approach also allows us to account for files which are not | ||
332 | part of any crate. That is, if you create a file but do not include it as a | ||
333 | submodule anywhere, you still get semantic completion, and you get a warning | ||
334 | about the free-floating module (the actual warning is not implemented yet). | ||
335 | |||
336 | The second difference is that `module_tree_query` does not *directly* depend on | ||
337 | the "parse" query (which is confusingly called `source_file`). Why would calling | ||
338 | the parse directly be bad? Suppose the user changes the file slightly, by adding | ||
339 | some insignificant whitespace. Adding whitespace changes the parse tree (because | ||
340 | it includes whitespace), and that means recomputing the whole module tree. | ||
341 | |||
342 | We deal with this problem by introducing an intermediate [`submodules_query`]. | ||
343 | This query processes the syntax tree and extracts the set of declared submodule | ||
344 | names. Now, changing the whitespace results in `submodules_query` being | ||
345 | re-executed for a *single* module, but because the result of this query stays | ||
346 | the same, we don't have to re-execute `module_tree_query`. In fact, we only | ||
347 | need to re-execute it when we add/remove new files or when we change `mod` | ||
348 | declarations. | ||
349 | |||
350 | [`submodules_query`]: https://github.com/rust-analyzer/rust-analyzer/blob/guide-2019-01/crates/ra_hir/src/module_tree.rs#L41 | ||
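The early-cutoff effect is easy to see with a toy version of the idea (a deliberately simplified sketch: the real query walks the syntax tree rather than scanning lines):

```rust
// Simplified sketch of the submodules idea: reduce a source file to the set of
// declared submodule names, so edits that don't touch `mod` items produce an
// identical result and the module tree is not recomputed.
fn declared_submodules(text: &str) -> Vec<String> {
    let mut names: Vec<String> = text
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            line.strip_prefix("mod ")
                .and_then(|rest| rest.strip_suffix(';'))
                .map(|name| name.trim().to_string())
        })
        .collect();
    names.sort();
    names
}

fn main() {
    let before = "mod foo;\nmod bar;\nfn main() {}\n";
    let after = "mod foo;\nmod bar;\n\n\nfn  main( ) {}\n"; // whitespace-only edit
    // Same submodule set => module_tree_query sees an unchanged input.
    assert_eq!(declared_submodules(before), declared_submodules(after));
}
```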
351 | |||
352 | |||
353 | |||
354 | |||
355 | |||
356 | ## Location Interner pattern | ||
357 | |||
358 | ## Macros and recursive locations | ||
359 | |||
360 | ## Name resolution | ||
361 | |||
362 | ## Source Map pattern | ||
363 | |||
364 | ## Tying it all together: completion | ||