tbsp - tree-based source-processing language


tbsp is an awk-like language that operates on tree-sitter
syntax trees. to motivate the need for such a program, we
could begin by writing a markdown-to-html converter using
tbsp and tree-sitter-md [0]. we need some markdown to begin
with:

    # 1 heading

    content of first paragraph

    ## 1.1 heading

    content of nested paragraph

for future reference, this markdown is parsed like so by
tree-sitter-md (visualization generated by tree-viz [1]):

    document
    | section
    | | atx_heading
    | | | atx_h1_marker "#"
    | | | heading_content inline "1 heading"
    | | paragraph
    | | | inline "content of first paragraph"
    | | section
    | | | atx_heading
    | | | | atx_h2_marker "##"
    | | | | heading_content inline "1.1 heading"
    | | | paragraph
    | | | | inline "content of nested paragraph"

onto the converter itself. every tbsp program is written as
a collection of stanzas. typically, we start with a stanza
like so:

    BEGIN {
        int depth = 0;

        print("<html>\n");
        print("<body>\n");
    }

the stanza begins with a "pattern", in this case "BEGIN",
and is followed by a block of code. this block, specifically,
is executed right at the beginning, before traversing the
parse tree. in this stanza, we set a "depth" variable to
keep track of the nesting of markdown headers, and begin our
html document by printing the "<html>" and "<body>" tags.

we can follow this stanza with an "END" stanza, which is
executed after the traversal:

    END {
        print("</body>\n");
        print("</html>\n");
    }

in this stanza, we close off the tags we opened at the start
of the document. we can move onto the interesting bits of
the conversion now:

    enter section {
        depth += 1;
    }
    leave section {
        depth -= 1;
    }

the above stanzas begin with "enter" and "leave" clauses,
followed by the name of a tree-sitter node kind: "section".
the "section" identifier is visible in the
tree-visualization above; it encompasses a markdown-section,
and is created for every markdown header. to understand how
tbsp executes the above stanzas:

    document                                 ...  depth = 0
    | section <-------- enter section (1)    ...  depth = 1
    | | atx_heading
    | | | inline
    | | paragraph
    | | | inline
    | | section <----- enter section (2)     ...  depth = 2
    | | | atx_heading
    | | | | inline
    | | | paragraph
    | | | | inline
    | | | <----------- leave section (2)     ...  depth = 1
    | | <-------------- leave section (1)    ...  depth = 0

the following stanzas should be self-explanatory now:

    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }
    enter inline {
        print(text(node));
    }

but an explanation is included nonetheless:

    document                                 ...  depth = 0
    | section <-------- enter section (1)    ...  depth = 1
    | | atx_heading <- enter atx_heading     ...  print "<h1>"
    | | | inline <--- enter inline           ...  print ..
    | | | <----------- leave atx_heading     ...  print "</h1>"
    | | paragraph
    | | | inline <--- enter inline           ...  print ..
    | | section <----- enter section (2)     ...  depth = 2
    | | | atx_heading enter atx_heading      ...  print "<h2>"
    | | | | inline <- enter inline           ...  print ..
    | | | | <-------- leave atx_heading      ...  print "</h2>"
    | | | paragraph
    | | | | inline <- enter inline           ...  print ..
    | | | <----------- leave section (2)     ...  depth = 1
    | | <-------------- leave section (1)    ...  depth = 0

the examples directory contains a complete markdown-to-html
converter, along with a few other motivating examples.

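
for readers who prefer to see the walk in a general-purpose
language, the enter/leave traversal described above can be
modelled in a few lines of rust. this is only a sketch and
not how tbsp itself is implemented: the "Node" struct, the
"node", "walk", "render" and "sample" functions, and the
hand-built tree are hypothetical stand-ins for a real
tree-sitter parse tree, and only the node kinds used by the
stanzas above are handled.

```rust
// hypothetical stand-in for a tree-sitter node (not the tree-sitter API):
// a kind, the text it covers, and its children
struct Node {
    kind: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

fn node(kind: &'static str, text: &'static str, children: Vec<Node>) -> Node {
    Node { kind, text, children }
}

// depth-first walk: run the "enter" action for this node kind,
// visit the children, then run the "leave" action
fn walk(n: &Node, depth: &mut usize, out: &mut String) {
    match n.kind {
        "section" => *depth += 1,
        "atx_heading" => out.push_str(&format!("<h{}>", *depth)),
        "inline" => out.push_str(n.text),
        _ => {}
    }
    for child in &n.children {
        walk(child, depth, out);
    }
    match n.kind {
        "section" => *depth -= 1,
        "atx_heading" => out.push_str(&format!("</h{}>\n", *depth)),
        _ => {}
    }
}

fn render(root: &Node) -> String {
    // corresponds to the BEGIN stanza
    let mut out = String::from("<html>\n<body>\n");
    let mut depth = 0;
    walk(root, &mut depth, &mut out);
    // corresponds to the END stanza
    out.push_str("</body>\n</html>\n");
    out
}

// the example markdown from above, built by hand
fn sample() -> Node {
    node("document", "", vec![node("section", "", vec![
        node("atx_heading", "", vec![
            node("atx_h1_marker", "#", vec![]),
            node("inline", "1 heading", vec![]),
        ]),
        node("paragraph", "", vec![
            node("inline", "content of first paragraph", vec![]),
        ]),
        node("section", "", vec![
            node("atx_heading", "", vec![
                node("atx_h2_marker", "##", vec![]),
                node("inline", "1.1 heading", vec![]),
            ]),
            node("paragraph", "", vec![
                node("inline", "content of nested paragraph", vec![]),
            ]),
        ]),
    ])])
}

fn main() {
    print!("{}", render(&sample()));
}
```

as with the stanzas shown so far, paragraph text is emitted
without any surrounding tags, since no action is attached to
"paragraph" nodes.
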
---

usage:

the tbsp evaluator is written in rust, use cargo to build
and run:

    cargo build --release
    ./target/release/tbsp --help

tbsp requires three inputs:

- a tbsp program, referred to as "program file"
- a language
- an input file or some input text at stdin

you can run the interpreter like so (this program prints an
overview of a rust file):

    $ ./target/release/tbsp \
          -f./examples/code-overview/overview.tbsp \
          -l rust \
          src/main.rs
    module
       └╴struct Cli
       └╴trait Cli
       └╴fn program
       └╴fn language
       └╴fn file
       └╴fn try_consume_stdin
       └╴fn main

---

roadmap:

- interpreter performance
    - [ ] introduce a hir with arena allocated blocks, expr
    - [ ] bytecode VM?
    - [ ] look into embedding high perf VMs, lua etc.
- pattern matching
    - [x] allow matching on tree-sitter queries
    - [ ] support captures
- language features
    - [ ] arrays and loops
    - [x] access node children
    - [x] access node fields
    - [ ] repr for ranges
    - [ ] comments
    - [ ] regexes

[0]: https://github.com/tree-sitter-grammars/tree-sitter-markdown
[1]: https://git.peppe.rs/cli/tree-viz