author | Akshay <[email protected]> | 2022-08-02 15:20:46 +0100 |
---|---|---|
committer | Akshay <[email protected]> | 2022-08-02 15:27:12 +0100 |
commit | cfc70207996e202edbb577b2ad97a61ba9eb0eaa | |
tree | 97a3f25c3016766d6456efb748d48cbc6c525a47 /readme | |
parent | efd96e8df6805a45aaf5822141dee11c642b51ae |
Structural comparison helps detect the vast majority of duplicates, but it
produces a few false positives when files contain only trivia (such as
comments). Textual similarity can help detect and eliminate those false
positives.
Diffstat (limited to 'readme')
-rw-r--r-- | readme | 7 |
1 file changed, 3 insertions, 4 deletions
```diff
@@ -18,15 +18,14 @@ Internals:
 
 The tool uses tree-sitter to produce ASTs for the given files. It then lazily
 traverses the trees of the two files to be compared and exits on encountering
-the first structural difference in the ASTs.
+the first structural difference in the ASTs. Additionally, it performs a
+textual similarity check to eliminate outliers such as files that consist
+entirely of trivia nodes.
 
 
 Known issues:
 ------------
 
-- A fully commented-out file is equivalent to every other fully commented-out
-  file and to empty files
-
 - Does not account for equivalence of unordered children:
 
 ==== file1.rs ==== ==== file2.rs ====
```
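The two-stage check described in this change (a lazy structural comparison that stops at the first difference, followed by a textual similarity check) can be sketched as below. This is a minimal, self-contained sketch under stated assumptions, not the tool's implementation: `Node` is a hypothetical stand-in for a tree-sitter node, trivia nodes are skipped during traversal (which reproduces the "fully commented-out file equals an empty file" false positive), and the similarity metric (Jaccard similarity over whitespace-separated tokens) and its 0.5 cutoff are assumptions, since the readme does not say which metric the tool uses.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a tree-sitter node: only the node kind,
/// a trivia flag (e.g. comments), and the ordered children are modeled.
struct Node {
    kind: &'static str,
    is_trivia: bool,
    children: Vec<Node>,
}

impl Node {
    fn new(kind: &'static str, children: Vec<Node>) -> Self {
        Node { kind, is_trivia: false, children }
    }
    fn trivia(kind: &'static str) -> Self {
        Node { kind, is_trivia: true, children: Vec::new() }
    }
}

/// Lazy preorder traversal yielding (kind, number of non-trivia children).
/// Including the arity makes the sequence identify the tree shape uniquely.
fn kinds<'a>(root: &'a Node) -> impl Iterator<Item = (&'static str, usize)> + 'a {
    let mut stack: Vec<&'a Node> = vec![root];
    std::iter::from_fn(move || {
        while let Some(node) = stack.pop() {
            if node.is_trivia {
                continue; // skip trivia nodes and anything beneath them
            }
            // Push children in reverse so they are visited left to right.
            stack.extend(node.children.iter().rev());
            let arity = node.children.iter().filter(|c| !c.is_trivia).count();
            return Some((node.kind, arity));
        }
        None
    })
}

/// Walks both trees in lockstep and returns at the first structural difference.
fn structurally_equal(a: &Node, b: &Node) -> bool {
    let (mut left, mut right) = (kinds(a), kinds(b));
    loop {
        match (left.next(), right.next()) {
            (None, None) => return true,
            (Some(x), Some(y)) if x == y => continue,
            _ => return false,
        }
    }
}

/// Assumed metric: Jaccard similarity of whitespace-separated token sets.
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // both texts are empty
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Hypothetical cutoff; the real tool's threshold (if any) is not documented.
const THRESHOLD: f64 = 0.5;

fn is_duplicate(tree_a: &Node, src_a: &str, tree_b: &Node, src_b: &str) -> bool {
    structurally_equal(tree_a, tree_b) && jaccard(src_a, src_b) >= THRESHOLD
}

fn main() {
    // Hand-built, simplified trees; node kinds are illustrative only.
    let func = || {
        Node::new("source_file", vec![Node::new("function_item", vec![
            Node::new("identifier", vec![]),
            Node::new("parameters", vec![]),
            Node::new("block", vec![]),
        ])])
    };
    let commented = Node::new("source_file", vec![Node::trivia("line_comment")]);
    let empty = Node::new("source_file", vec![]);

    // Identical functions: structurally equal and textually similar.
    println!("{}", is_duplicate(&func(), "fn main() {}", &func(), "fn main() {}"));
    // Commented-out file vs empty file: structurally equal on its own...
    println!("{}", structurally_equal(&commented, &empty));
    // ...but the textual check rejects the pair.
    println!("{}", is_duplicate(&commented, "// fn main() {}", &empty, ""));
}
```

Running `main` prints `true`, `true`, `false`: the structural pass alone treats the commented-out file and the empty file as equal, and the textual pass removes that false positive. Token-set Jaccard was chosen here only because it is short and order-insensitive; an edit-distance ratio would slot into `jaccard`'s place just as well.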