From cfc70207996e202edbb577b2ad97a61ba9eb0eaa Mon Sep 17 00:00:00 2001
From: Akshay
Date: Tue, 2 Aug 2022 19:50:46 +0530
Subject: add textual comparison

structural comparison helps detect a vast majority of duplicates, but it
has a few false positives when files contain only trivia. textual
similarity can help detect and eliminate those false positives.
---
 readme | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/readme b/readme
index 869e43e..d92989f 100644
--- a/readme
+++ b/readme
@@ -18,15 +18,14 @@ Internals:
 
 The tool uses tree-sitter to produce ASTs for the given files. It then lazily
 traverses the trees of the two files to be compared and exits on encountering
-the first structural difference in the ASTs.
+the first structural difference in the ASTs. Additionally, it performs a
+textual similarity check to eliminate outliers such as files that consist
+entirely of trivia nodes.
 
 
 Known issues:
 ------------
 
-- A fully commented-out file is equivalent to every other fully commented-out
-  file and to empty files
-
 - Does not account for equivalence of unordered children:
 
   ==== file1.rs ====                  ==== file2.rs ====
-- 
cgit v1.2.3
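The readme describes a two-stage comparison: a lazy walk over both syntax trees that stops at the first structural difference, followed by a textual similarity check. Below is a minimal Rust sketch of that idea under stated assumptions, not the tool's actual code. `Node` is a hypothetical stand-in for tree-sitter nodes, comments are assumed to be flagged as trivia and skipped by the structural walk, and the Jaccard token metric and the `0.5` threshold are illustrative choices, since the patch does not say which similarity measure the tool uses.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified syntax-tree node standing in for a tree-sitter node.
/// `is_trivia` marks nodes such as comments that the structural walk skips.
#[derive(Debug)]
pub struct Node {
    pub kind: String,
    pub is_trivia: bool,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(kind: &str, children: Vec<Node>) -> Self {
        Node { kind: kind.to_string(), is_trivia: false, children }
    }

    pub fn trivia(kind: &str) -> Self {
        Node { kind: kind.to_string(), is_trivia: true, children: Vec::new() }
    }

    /// Lazy pre-order walk yielding (depth, kind) for every non-trivia node.
    /// Pairing each kind with its depth makes the sequence pin down the tree shape.
    pub fn preorder(&self) -> impl Iterator<Item = (usize, &str)> + '_ {
        let mut stack = vec![(0usize, self)];
        std::iter::from_fn(move || {
            while let Some((depth, node)) = stack.pop() {
                if node.is_trivia {
                    continue; // skip comments (and anything under them) entirely
                }
                // Push children in reverse so the leftmost child is visited first.
                for child in node.children.iter().rev() {
                    stack.push((depth + 1, child));
                }
                return Some((depth, node.kind.as_str()));
            }
            None
        })
    }
}

/// `Iterator::eq` stops at the first mismatching pair, so the walk
/// exits on the first structural difference without visiting the rest.
pub fn structurally_equal(a: &Node, b: &Node) -> bool {
    a.preorder().eq(b.preorder())
}

/// Hypothetical textual metric: Jaccard similarity of whitespace-separated tokens.
/// Two texts with no tokens at all (e.g. two empty files) count as identical.
pub fn textual_similarity(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Hypothetical cut-off chosen for illustration only.
pub const SIMILARITY_THRESHOLD: f64 = 0.5;

/// Cheap structural check first; only structurally equal pairs reach the text check.
pub fn is_duplicate(a: &Node, a_src: &str, b: &Node, b_src: &str) -> bool {
    structurally_equal(a, b) && textual_similarity(a_src, b_src) >= SIMILARITY_THRESHOLD
}
```

Under these assumptions, an empty file and a fully commented-out file both reduce to a single `source_file` node, so `structurally_equal` accepts the pair, but their token sets share nothing and `is_duplicate` rejects it, which is the false positive the commit targets. Recording depth alongside kind matters: `root(a(b))` and `root(a, b)` visit the same kinds in the same order, but at different depths.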