author | bors[bot] <26634292+bors[bot]@users.noreply.github.com> | 2020-02-03 22:51:17 +0000
---|---|---
committer | GitHub <[email protected]> | 2020-02-03 22:51:17 +0000
commit | 918547dbe9a2907401102eba491ac25cebe1404d |
tree | e0aa3bdcec597e81f022ac1ce388d42724a92f51 /crates/ra_syntax/test_data/lexer/ok/0004_numbers.txt |
parent | b090ee5a65f9630146c2842bc51fcfcc8da08da1 |
parent | a3e5663ae0206270156fbeb926a174a40abbddb0 |
Merge #2911
2911: Implement collecting errors while tokenizing r=matklad a=Veetaha
Now we are collecting errors from `rustc_lexer` and returning them in `ParsedToken { token, error }` and `ParsedTokens { tokens, errors }` structures **([UPD]: this is now simplified, see updates below)**.
The main changes are introduced in `ra_syntax/parsing/lexer.rs`. It now exposes the following functions and types:
```rust
pub fn tokenize(text: &str) -> ParsedTokens;
pub fn tokenize_append(text: &str, parsed_tokens_to_append_to: &mut ParsedTokens);
pub fn first_token(text: &str) -> Option<ParsedToken>; // allows any number of tokens in text
pub fn single_token(text: &str) -> Option<ParsedToken>; // allows only a single token in text
pub struct ParsedToken { pub token: Token, pub error: Option<SyntaxError> }
pub struct ParsedTokens { pub tokens: Vec<Token>, pub errors: Vec<SyntaxError> }
pub enum TokenizeError { /* Simple enum which reflects rustc_lexer tokenization errors */ }
```
In the first commit I implemented it with iterators. I then decided to simplify it to use vectors, because this crate is ad hoc for `rust-analyzer` and we can clearly see where it is used.
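Moving from an iterator-based design to the vector-based one amounts to draining a stream of `(token, Option<error>)` pairs into two separate vectors. Below is a generic sketch of that step. The helper `collect_tokens_and_errors` is hypothetical and not part of `ra_syntax`.

```rust
/// Drains `(token, optional error)` pairs into a token vector and an error vector.
/// Hypothetical helper, generic over the token type `T` and error type `E`.
fn collect_tokens_and_errors<T, E>(
    pairs: impl IntoIterator<Item = (T, Option<E>)>,
) -> (Vec<T>, Vec<E>) {
    let mut tokens = Vec::new();
    let mut errors = Vec::new();
    for (token, error) in pairs {
        // Every token is kept, even one that produced an error.
        tokens.push(token);
        // `Option<E>` is itself iterable, so this pushes the error only if present.
        errors.extend(error);
    }
    (tokens, errors)
}
```

In this sketch an erroneous token still lands in `tokens`, so the token list covers the whole input. The errors are reported alongside the tokens rather than in place of them.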
This is currently WIP because I want to add tests for error messages generated by the lexer.
I'd like to hear your thoughts on how to define these tests in the `ra_syntax/test_data` dir.
Related issues: #223
**[UPD]**
After the PR review the API was simplified:
```rust
pub fn tokenize(text: &str) -> (Vec<Token>, Vec<SyntaxError>);
// Neither lex function checks for unescape errors
pub fn lex_single_syntax_kind(text: &str) -> Option<(SyntaxKind, Option<SyntaxError>)>;
pub fn lex_single_valid_syntax_kind(text: &str) -> Option<SyntaxKind>;
// This will be removed in the next PR in favour of simplifying `SyntaxError` to `(String, TextRange)`
pub enum TokenizeError { /* Simple enum which reflects rustc_lexer tokenization errors */ }
// This is private, but may be made public if there is demand for it in the future (principle of least privilege)
fn lex_first_token(text: &str) -> Option<(Token, Option<SyntaxError>)>;
```
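The following minimal sketch shows how the simplified functions could relate to one another:

- `tokenize` repeatedly calls `lex_first_token` on the remaining text.
- The two `lex_single_*` helpers accept only input that forms exactly one token.

The lexer is a toy stand-in for `rustc_lexer` that knows only whitespace, integers, identifiers and an error kind. The `Token`, `SyntaxKind` and `SyntaxError` definitions are simplified assumptions, not the real `ra_syntax` types. The helper `prefix_len` is hypothetical.

```rust
// Simplified stand-ins for the real `ra_syntax` types (assumptions for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    Whitespace,
    IntNumber,
    Ident,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: SyntaxKind,
    len: usize, // length in bytes
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct SyntaxError {
    message: String,
    offset: usize, // byte offset of the offending text
}

/// Hypothetical helper: byte length of the longest prefix of `text` whose chars all satisfy `pred`.
fn prefix_len(text: &str, pred: impl Fn(char) -> bool) -> usize {
    text.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(text.len(), |(i, _)| i)
}

/// Toy replacement for `rustc_lexer`: lexes the first token of `text`,
/// returning `None` for empty input. Error offsets are relative to `text`.
fn lex_first_token(text: &str) -> Option<(Token, Option<SyntaxError>)> {
    let first = text.chars().next()?;
    let (kind, len) = if first.is_whitespace() {
        (SyntaxKind::Whitespace, prefix_len(text, char::is_whitespace))
    } else if first.is_ascii_digit() {
        // Trailing letters stay in the number token, mirroring `0z` in `0004_numbers.txt`.
        (SyntaxKind::IntNumber, prefix_len(text, |c| c.is_ascii_alphanumeric() || c == '_'))
    } else if first.is_alphabetic() || first == '_' {
        (SyntaxKind::Ident, prefix_len(text, |c| c.is_alphanumeric() || c == '_'))
    } else {
        (SyntaxKind::Error, first.len_utf8())
    };
    let error = (kind == SyntaxKind::Error).then(|| SyntaxError {
        message: format!("unexpected character {:?}", first),
        offset: 0,
    });
    Some((Token { kind, len }, error))
}

/// Lexes all of `text`, collecting every token and every error.
fn tokenize(text: &str) -> (Vec<Token>, Vec<SyntaxError>) {
    let mut tokens = Vec::new();
    let mut errors = Vec::new();
    let mut offset = 0;
    while let Some((token, error)) = lex_first_token(&text[offset..]) {
        if let Some(mut error) = error {
            error.offset += offset; // convert to an offset into the whole `text`
            errors.push(error);
        }
        tokens.push(token);
        offset += token.len;
    }
    (tokens, errors)
}

/// Returns the kind (and any error) only if `text` is exactly one token.
fn lex_single_syntax_kind(text: &str) -> Option<(SyntaxKind, Option<SyntaxError>)> {
    let (token, error) = lex_first_token(text)?;
    if token.len != text.len() {
        return None; // `text` holds more than one token
    }
    Some((token.kind, error))
}

/// Like `lex_single_syntax_kind`, but also rejects a token that produced an error.
fn lex_single_valid_syntax_kind(text: &str) -> Option<SyntaxKind> {
    match lex_single_syntax_kind(text)? {
        (kind, None) => Some(kind),
        (_, Some(_)) => None,
    }
}
```

For example, `tokenize("foo 92 $")` in this sketch yields five tokens: an identifier, whitespace, an integer, whitespace and an error token. It also yields one error, at byte offset 7.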
Co-authored-by: Veetaha <[email protected]>
Diffstat (limited to 'crates/ra_syntax/test_data/lexer/ok/0004_numbers.txt')
-rw-r--r-- | crates/ra_syntax/test_data/lexer/ok/0004_numbers.txt | 57 |
1 files changed, 57 insertions, 0 deletions
diff --git a/crates/ra_syntax/test_data/lexer/ok/0004_numbers.txt b/crates/ra_syntax/test_data/lexer/ok/0004_numbers.txt
new file mode 100644
index 000000000..e19fc5789
--- /dev/null
+++ b/crates/ra_syntax/test_data/lexer/ok/0004_numbers.txt
@@ -0,0 +1,57 @@
+INT_NUMBER 1 "0"
+WHITESPACE 1 " "
+INT_NUMBER 2 "00"
+WHITESPACE 1 " "
+INT_NUMBER 2 "0_"
+WHITESPACE 1 " "
+FLOAT_NUMBER 2 "0."
+WHITESPACE 1 " "
+INT_NUMBER 2 "0z"
+WHITESPACE 1 "\n"
+INT_NUMBER 5 "01790"
+WHITESPACE 1 " "
+INT_NUMBER 6 "0b1790"
+WHITESPACE 1 " "
+INT_NUMBER 6 "0o1790"
+WHITESPACE 1 " "
+INT_NUMBER 18 "0x1790aAbBcCdDeEfF"
+WHITESPACE 1 " "
+INT_NUMBER 6 "001279"
+WHITESPACE 1 " "
+INT_NUMBER 6 "0_1279"
+WHITESPACE 1 " "
+FLOAT_NUMBER 6 "0.1279"
+WHITESPACE 1 " "
+FLOAT_NUMBER 6 "0e1279"
+WHITESPACE 1 " "
+FLOAT_NUMBER 6 "0E1279"
+WHITESPACE 1 "\n"
+INT_NUMBER 1 "0"
+DOT 1 "."
+DOT 1 "."
+INT_NUMBER 1 "2"
+WHITESPACE 1 "\n"
+INT_NUMBER 1 "0"
+DOT 1 "."
+IDENT 3 "foo"
+L_PAREN 1 "("
+R_PAREN 1 ")"
+WHITESPACE 1 "\n"
+FLOAT_NUMBER 4 "0e+1"
+WHITESPACE 1 "\n"
+INT_NUMBER 1 "0"
+DOT 1 "."
+IDENT 1 "e"
+PLUS 1 "+"
+INT_NUMBER 1 "1"
+WHITESPACE 1 "\n"
+FLOAT_NUMBER 6 "0.0E-2"
+WHITESPACE 1 "\n"
+FLOAT_NUMBER 26 "0___0.10000____0000e+111__"
+WHITESPACE 1 "\n"
+INT_NUMBER 4 "1i64"
+WHITESPACE 1 " "
+FLOAT_NUMBER 7 "92.0f32"
+WHITESPACE 1 " "
+INT_NUMBER 5 "11__s"
+WHITESPACE 1 "\n"
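Each line of the expected-output file above has the shape `KIND LEN "TEXT"`. These are the syntax kind, the token length and the token text in Rust debug-escaped form, which is why newlines appear as `"\n"`. Below is a minimal sketch of a dump function producing that shape. The following are assumptions for illustration, not necessarily what the `ra_syntax` test harness does:

- the name `dump_tokens`,
- the trimmed-down `SyntaxKind` enum,
- the `Token { kind, len }` layout.

Lengths are taken to be byte lengths. Every token in this file is ASCII, so bytes and characters coincide.

```rust
use std::fmt::Write;

// Hypothetical subset of the syntax kinds seen in the test data; the
// SCREAMING_CASE names make `{:?}` print them exactly as in the file.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy)]
enum SyntaxKind {
    INT_NUMBER,
    FLOAT_NUMBER,
    WHITESPACE,
    DOT,
    IDENT,
}

/// A token as a kind plus a length in bytes; its text is recovered by
/// slicing the source at a running offset.
#[derive(Debug, Clone, Copy)]
struct Token {
    kind: SyntaxKind,
    len: usize,
}

/// Renders one `KIND LEN "TEXT"` line per token. `{:?}` on `&str` adds the
/// quotes and escapes control characters, so a newline shows up as "\n".
fn dump_tokens(text: &str, tokens: &[Token]) -> String {
    let mut acc = String::new();
    let mut offset = 0;
    for token in tokens {
        let token_text = &text[offset..offset + token.len];
        offset += token.len;
        writeln!(acc, "{:?} {} {:?}", token.kind, token.len, token_text).unwrap();
    }
    acc
}
```

With input `"0.\n1"` and tokens `FLOAT_NUMBER 2`, `WHITESPACE 1` and `INT_NUMBER 1`, this sketch reproduces the same line shapes as the file: `FLOAT_NUMBER 2 "0."`, `WHITESPACE 1 "\n"` and `INT_NUMBER 1 "1"`.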