tbsp - tree-based source-processing language
tbsp is an awk-like language that operates on tree-sitter
syntax trees. to motivate the need for such a program, we
could begin by writing a markdown-to-html converter using
tbsp and tree-sitter-md [0]. we need some markdown to begin
with:
# 1 heading
content of first paragraph
## 1.1 heading
content of nested paragraph
for future reference, this markdown is parsed like so by
tree-sitter-md (visualization generated by tree-viz [1]):
document
| section
| | atx_heading
| | | atx_h1_marker "#"
| | | heading_content inline "1 heading"
| | paragraph
| | | inline "content of first paragraph"
| | section
| | | atx_heading
| | | | atx_h2_marker "##"
| | | | heading_content inline "1.1 heading"
| | | paragraph
| | | | inline "content of nested paragraph"
onto the converter itself. every tbsp program is written as
a collection of stanzas. typically, we start with a stanza
like so:
BEGIN {
    int depth = 0;
    print("<html>\n");
    print("<body>\n");
}
the stanza begins with a "pattern", in this case "BEGIN",
and is followed by a block of code. this particular block is
executed right at the beginning, before the parse tree is
traversed. in this stanza, we set a "depth" variable to keep
track of the nesting of markdown headers, and begin our html
document by printing the "<html>" and "<body>" tags.
we can follow this stanza with an "END" stanza, which is
executed after the traversal:
END {
    print("</body>\n");
    print("</html>\n");
}
in this stanza, we close off the tags we opened at the start
of the document. we can move onto the interesting bits of
the conversion now:
enter section {
    depth += 1;
}
leave section {
    depth -= 1;
}
the above stanzas begin with "enter" and "leave" clauses,
followed by the name of a tree-sitter node kind: "section".
the "section" kind is visible in the tree visualization
above: it encompasses a markdown section, and one is created
for every markdown header. the following trace shows how
tbsp executes the above stanzas:
document                                 ... depth = 0
| section <--------- enter section (1)   ... depth = 1
| | atx_heading
| | | inline
| | paragraph
| | | inline
| | section <------- enter section (2)   ... depth = 2
| | | atx_heading
| | | | inline
| | | paragraph
| | | | inline
| | | <------------- leave section (2)   ... depth = 1
| | <--------------- leave section (1)   ... depth = 0
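the trace above can be reproduced with a small model of the traversal. this is only a sketch in python, not tbsp's actual implementation: the tuple-based node type, the "walk" function and the "trace_depth" helper are all hypothetical names introduced for illustration, and the tree is the simplified one from the trace.

```python
# a hypothetical stand-in for a tree-sitter node: (kind, children).
# this models the simplified tree from the trace, not tbsp internals.
TREE = ("document", [
    ("section", [
        ("atx_heading", [("inline", [])]),
        ("paragraph", [("inline", [])]),
        ("section", [
            ("atx_heading", [("inline", [])]),
            ("paragraph", [("inline", [])]),
        ]),
    ]),
])


def walk(node, enter, leave):
    """depth-first walk: run the "enter" handler for a node kind before
    visiting its children, and the "leave" handler after them."""
    kind, children = node
    if kind in enter:
        enter[kind](node)
    for child in children:
        walk(child, enter, leave)
    if kind in leave:
        leave[kind](node)


def trace_depth(tree):
    """record the value of depth after every enter/leave of a section."""
    state = {"depth": 0}  # plays the role of "int depth = 0;" in BEGIN
    log = []

    def enter_section(_node):
        state["depth"] += 1
        log.append(("enter", state["depth"]))

    def leave_section(_node):
        state["depth"] -= 1
        log.append(("leave", state["depth"]))

    walk(tree, {"section": enter_section}, {"section": leave_section})
    return log


if __name__ == "__main__":
    for event, depth in trace_depth(TREE):
        print(event, "section ... depth =", depth)
```

the recorded depths are 1, 2, 1, 0, in the same order as the trace.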
the following stanzas should be self-explanatory now:
enter atx_heading {
    print("<h");
    print(depth);
    print(">");
}
leave atx_heading {
    print("</h");
    print(depth);
    print(">\n");
}
enter inline {
    print(text(node));
}
but an explanation is included nonetheless:
document                                 ... depth = 0
| section <--------- enter section (1)   ... depth = 1
| | atx_heading <--- enter atx_heading   ... print "<h1>"
| | | inline <------ enter inline        ... print ..
| | | <------------- leave atx_heading   ... print "</h1>"
| | paragraph
| | | inline <------ enter inline        ... print ..
| | section <------- enter section (2)   ... depth = 2
| | | atx_heading <- enter atx_heading   ... print "<h2>"
| | | | inline <---- enter inline        ... print ..
| | | | <----------- leave atx_heading   ... print "</h2>"
| | | paragraph
| | | | inline <---- enter inline        ... print ..
| | | <------------- leave section (2)   ... depth = 1
| | <--------------- leave section (1)   ... depth = 0
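to see what the stanzas so far would emit for the sample markdown, here is a rough python model of the whole run. it is a sketch under assumptions, not how tbsp is implemented: the (kind, text, children) tuples and the "convert" function are hypothetical, and the tree is hand-copied from the tree-viz output earlier in this document.

```python
import io

# hypothetical stand-in for a tree-sitter node: (kind, text, children).
# the tree is copied by hand from the tree-viz output shown earlier.
TREE = ("document", "", [
    ("section", "", [
        ("atx_heading", "", [
            ("atx_h1_marker", "#", []),
            ("inline", "1 heading", []),
        ]),
        ("paragraph", "", [
            ("inline", "content of first paragraph", []),
        ]),
        ("section", "", [
            ("atx_heading", "", [
                ("atx_h2_marker", "##", []),
                ("inline", "1.1 heading", []),
            ]),
            ("paragraph", "", [
                ("inline", "content of nested paragraph", []),
            ]),
        ]),
    ]),
])


def convert(tree):
    """mimic the BEGIN, enter/leave and END stanzas from this document."""
    out = io.StringIO()
    depth = 0

    def visit(node):
        nonlocal depth
        kind, text, children = node
        # "enter" stanzas run before the children are visited
        if kind == "section":
            depth += 1
        elif kind == "atx_heading":
            out.write(f"<h{depth}>")
        elif kind == "inline":
            out.write(text)
        for child in children:
            visit(child)
        # "leave" stanzas run after the children are visited
        if kind == "section":
            depth -= 1
        elif kind == "atx_heading":
            out.write(f"</h{depth}>\n")

    out.write("<html>\n<body>\n")    # BEGIN stanza
    visit(tree)
    out.write("</body>\n</html>\n")  # END stanza
    return out.getvalue()


if __name__ == "__main__":
    print(convert(TREE), end="")
```

in this model the paragraph text comes out bare, with no surrounding tags or trailing newline, because none of the stanzas shown so far match "paragraph" nodes.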
the examples directory contains a complete markdown-to-html
converter, along with a few other motivating examples.
[0]: https://github.com/tree-sitter-grammars/tree-sitter-markdown
[1]: https://git.peppe.rs/cli/tree-viz