tbsp - tree-based source-processing language


tbsp is an awk-like language that operates on tree-sitter
syntax trees. to motivate the need for such a program, let
us write a markdown-to-html converter using tbsp and
tree-sitter-md [0]. we need some markdown to begin with:


    # 1 heading

    content of first paragraph

    ## 1.1 heading

    content of nested paragraph


for future reference, this markdown is parsed like so by
tree-sitter-md (visualization generated by tree-viz [1]):


    document
    |  section
    |  |  atx_heading
    |  |  |  atx_h1_marker "#"
    |  |  |  heading_content inline "1 heading"
    |  |  paragraph
    |  |  |  inline "content of first paragraph"
    |  |  section
    |  |  |  atx_heading
    |  |  |  |  atx_h2_marker "##"
    |  |  |  |  heading_content inline "1.1 heading"
    |  |  |  paragraph
    |  |  |  |  inline "content of nested paragraph"


on to the converter itself. every tbsp program is written as
a collection of stanzas. typically, we start with a stanza
like so:


    BEGIN {
        int depth = 0;

        print("<html>\n");
        print("<body>\n");
    }


the stanza begins with a "pattern", in this case "BEGIN",
followed by a block of code. this particular block is
executed once at the very beginning, before the parse tree
is traversed. in this stanza, we declare a "depth" variable
to keep track of the nesting of markdown headers, and begin
our html document by printing the "<html>" and "<body>"
tags.

we can follow this stanza with an "END" stanza, which is
executed after the traversal:


    END {
        print("</body>\n");
        print("</html>\n");
    }


in this stanza, we close off the tags we opened at the start
of the document. we can move on to the interesting bits of
the conversion now:


    enter section {
        depth += 1;
    }
    leave section {
        depth -= 1;
    }


the above stanzas begin with "enter" and "leave" clauses,
followed by the name of a tree-sitter node kind: "section".
the "section" node is visible in the tree visualization
above; it encompasses a markdown section, and one is created
for every markdown header. tbsp runs the "enter" stanza when
the traversal first reaches a matching node, and the "leave"
stanza once all of its children have been visited, so the
above stanzas execute like so:


    document                                 ...  depth = 0 
    |  section <-------- enter section (1)   ...  depth = 1 
    |  |  atx_heading
    |  |  |  inline
    |  |  paragraph
    |  |  |  inline
    |  |  section <----- enter section (2)   ...  depth = 2 
    |  |  |  atx_heading
    |  |  |  | inline
    |  |  |  paragraph
    |  |  |  | inline
    |  |  | <----------- leave section (2)   ...  depth = 1 
    |  | <-------------- leave section (1)   ...  depth = 0 
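
the same walk can be modelled in a few lines of python. this
is only a toy sketch of the enter/leave order shown above,
not how tbsp is implemented; the "TREE" tuples, the "walk"
function and the "log" list are hypothetical names made up
for illustration:

```python
# toy model of the traversal; NOT tbsp's actual implementation.
# each node is a (kind, children) pair mirroring the simplified
# tree in the diagram above.
TREE = ("document", [
    ("section", [
        ("atx_heading", [("inline", [])]),
        ("paragraph", [("inline", [])]),
        ("section", [
            ("atx_heading", [("inline", [])]),
            ("paragraph", [("inline", [])]),
        ]),
    ]),
])


def walk(node, state, log):
    """depth-first walk that fires the two 'section' stanzas."""
    kind, children = node
    if kind == "section":
        state["depth"] += 1                    # enter section { depth += 1; }
        log.append(("enter", state["depth"]))
    for child in children:
        walk(child, state, log)
    if kind == "section":
        state["depth"] -= 1                    # leave section { depth -= 1; }
        log.append(("leave", state["depth"]))


state = {"depth": 0}                           # BEGIN { int depth = 0; }
log = []
walk(TREE, state, log)
print(log)
```

the log records the value of "depth" after each stanza runs,
in the same order as the arrows in the diagram.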


the following stanzas should be self-explanatory now:


    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }

    enter inline {
        print(text(node));
    }


but an explanation is included nonetheless:


    document                                 ...  depth = 0 
    |  section <-------- enter section (1)   ...  depth = 1 
    |  |  atx_heading <- enter atx_heading   ...  print "<h1>"
    |  |  |  inline <--- enter inline        ...  print ..
    |  |  | <----------- leave atx_heading   ...  print "</h1>"
    |  |  paragraph
    |  |  |  inline <--- enter inline        ...  print ..
    |  |  section <----- enter section (2)   ...  depth = 2 
    |  |  |  atx_heading enter atx_heading   ...  print "<h2>"
    |  |  |  | inline <- enter inline        ...  print ..
    |  |  |  | <-------- leave atx_heading   ...  print "</h2>"
    |  |  |  paragraph
    |  |  |  | inline <- enter inline        ...  print ..
    |  |  | <----------- leave section (2)   ...  depth = 1 
    |  | <-------------- leave section (1)   ...  depth = 0 
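
to see what these stanzas emit as a whole, here is a toy
python model of the program so far. it is a sketch under
assumptions, not tbsp's implementation: the "STANZAS" table,
"visit", "bump", "env" and "out" are hypothetical names, and
the tree is typed in by hand from the visualization above
rather than produced by tree-sitter:

```python
# toy model of the stanzas above; NOT how tbsp is implemented.
# nodes are (kind, text, children) triples copied by hand from
# the tree-sitter-md visualization.
TREE = ("document", "", [
    ("section", "", [
        ("atx_heading", "", [
            ("atx_h1_marker", "#", []),
            ("inline", "1 heading", []),
        ]),
        ("paragraph", "", [("inline", "content of first paragraph", [])]),
        ("section", "", [
            ("atx_heading", "", [
                ("atx_h2_marker", "##", []),
                ("inline", "1.1 heading", []),
            ]),
            ("paragraph", "", [("inline", "content of nested paragraph", [])]),
        ]),
    ]),
])

out = []            # collects everything the stanzas "print"
env = {"depth": 0}  # models: int depth = 0;


def bump(delta):
    env["depth"] += delta


# one entry per stanza, keyed by (clause, node kind).
STANZAS = {
    ("enter", "section"): lambda text: bump(+1),
    ("leave", "section"): lambda text: bump(-1),
    ("enter", "atx_heading"): lambda text: out.append(f"<h{env['depth']}>"),
    ("leave", "atx_heading"): lambda text: out.append(f"</h{env['depth']}>\n"),
    ("enter", "inline"): lambda text: out.append(text),
}


def visit(node):
    """run the matching enter stanza, the children, then the leave stanza."""
    kind, text, children = node
    STANZAS.get(("enter", kind), lambda text: None)(text)
    for child in children:
        visit(child)
    STANZAS.get(("leave", kind), lambda text: None)(text)


out.append("<html>\n<body>\n")    # BEGIN stanza
visit(TREE)
out.append("</body>\n</html>\n")  # END stanza
html = "".join(out)
print(html, end="")
```

in this model, paragraph text comes out without any
surrounding tags or trailing newline, because none of the
stanzas shown so far match "paragraph" nodes.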


the examples directory contains a complete markdown-to-html
converter, along with a few other motivating examples.

---

usage:

the tbsp evaluator is written in rust; use cargo to build
and run:

    cargo build --release
    ./target/release/tbsp --help


tbsp requires three inputs:

- a tbsp program, referred to as "program file"
- a language
- an input file or some input text at stdin


you can run the interpreter like so (this program prints an
overview of a rust file):

    $ ./target/release/tbsp \
          -f./examples/code-overview/overview.tbsp \
          -l rust \
          src/main.rs
    module
       └╴struct Cli
       └╴trait Cli
          └╴fn program
          └╴fn language
          └╴fn file
       └╴fn try_consume_stdin
       └╴fn main

    
---

roadmap:

- interpreter performance
  - [ ] introduce a hir with arena allocated blocks, expr
  - [ ] bytecode VM?
  - [ ] look into embedding high perf VMs, lua etc.
- pattern matching
  - [x] allow matching on tree-sitter queries
  - [ ] support captures
- language features
  - [ ] arrays and loops
  - [x] access node children
  - [x] access node fields
  - [ ] repr for ranges
  - [ ] comments
  - [ ] regexes


[0]: https://github.com/tree-sitter-grammars/tree-sitter-markdown
[1]: https://git.peppe.rs/cli/tree-viz