-rw-r--r-- | posts/auto-currying_rust_functions.md | 892 |
1 files changed, 892 insertions, 0 deletions
diff --git a/posts/auto-currying_rust_functions.md b/posts/auto-currying_rust_functions.md new file mode 100644 index 0000000..170fd31 --- /dev/null +++ b/posts/auto-currying_rust_functions.md | |||
@@ -0,0 +1,892 @@ | |||
1 | This post contains a gentle introduction to procedural | ||
2 | macros in Rust and a guide to writing a procedural macro to | ||
3 | curry Rust functions. The source code for the entire library | ||
4 | can be found [here](https://github.com/nerdypepper/cutlass). | ||
5 | It is also available on [crates.io](https://crates.io/crates/cutlass). | ||
6 | |||
7 | The following links might prove to be useful before getting | ||
8 | started: | ||
9 | |||
10 | - [Procedural Macros](https://doc.rust-lang.org/reference/procedural-macros.html) | ||
11 | - [Currying](https://en.wikipedia.org/wiki/Currying) | ||
12 | |||
13 | Or you can pretend you read them, because I have included | ||
14 | a primer here :) | ||
15 | |||
16 | |||
17 | ### Contents | ||
18 | |||
19 | 1. [Currying](#currying) | ||
20 | 2. [Procedural Macros](#procedural-macros) | ||
21 | 3. [Definitions](#definitions) | ||
22 | 4. [Refinement](#refinement) | ||
23 | 5. [The In-betweens](#the-in-betweens) | ||
24 | 5.1 [Dependencies](#dependencies) | ||
25 | 5.2 [The attribute macro](#the-attribute-macro) | ||
26 | 5.3 [Function Body](#function-body) | ||
27 | 5.4 [Function Signature](#function-signature) | ||
28 | 5.5 [Getting it together](#getting-it-together) | ||
29 | 6. [Debugging and Testing](#debugging-and-testing) | ||
30 | 7. [Notes](#notes) | ||
31 | 8. [Conclusion](#conclusion) | ||
32 | |||
33 | ### Currying | ||
34 | |||
35 | Currying is the process of transforming a function call | ||
36 | like `f(a, b, c)` into `f(a)(b)(c)`. A curried function | ||
37 | returns a concrete value only when it has received all its | ||
38 | arguments! If it receives fewer arguments than it needs, | ||
39 | say 1 of 3, it returns a *curried function* that returns | ||
40 | a value after receiving the remaining 2 arguments. | ||
41 | |||
42 | ``` | ||
43 | curry(f(a, b, c)) = h(a)(b)(c) | ||
44 | |||
45 | h(x) = g <- curried function that takes up to 2 args (g) | ||
46 | g(y) = k <- curried function that takes up to 1 arg (k) | ||
47 | k(z) = v <- a value (v) | ||
48 | |||
49 | Keen readers will conclude the following, | ||
50 | h(x)(y)(z) = g(y)(z) = k(z) = v | ||
51 | ``` | ||
52 | |||
53 | Mathematically, if `f` is a function that takes two | ||
54 | arguments `x` and `y`, such that `x ∈ X` and `y ∈ Y`, we | ||
55 | write it as: | ||
56 | |||
57 | ``` | ||
58 | f: (X × Y) -> Z | ||
59 | ``` | ||
60 | |||
61 | where `×` denotes the Cartesian product of set `X` and `Y`, | ||
62 | and curried `f` (denoted by `h` here) is written as: | ||
63 | |||
64 | ``` | ||
65 | h: X -> (Y -> Z) | ||
66 | ``` | ||
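To make `h: X -> (Y -> Z)` concrete, here is a minimal hand-written sketch that compiles on stable Rust. It boxes the inner closure (`Box<dyn Fn>`), which is a swapped-in technique and not what the macro in this post generates; `add_curried` is a made-up name for illustration.

```rust
// Hypothetical helper, not part of the cutlass crate: a hand-curried
// three-argument addition that compiles on stable Rust.
fn add_curried(x: u32) -> impl Fn(u32) -> Box<dyn Fn(u32) -> u32> {
    // h(x) = g: the outer closure captures `x` and waits for `y`.
    move |y: u32| -> Box<dyn Fn(u32) -> u32> {
        // g(y) = k: the inner closure captures `x` and `y` and waits for `z`.
        Box::new(move |z: u32| x + y + z)
    }
}
```

Calling `add_curried(4)(5)(6)` applies one argument at a time, and `add_curried(1)(2)` on its own is a reusable function that is still waiting for its last argument.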
67 | |||
68 | ### Procedural Macros | ||
69 | |||
70 | These are functions that take code as input and spit out | ||
71 | modified code as output. Powerful stuff. Rust has three | ||
72 | kinds of proc-macros: | ||
73 | |||
74 | - Function-like macros: called like `println!` or `vec!` (those two are declarative macros, but the call syntax is the same). | ||
75 | - Derive macros: `#[derive(...)]`, used to automatically | ||
76 | implement traits for structs/enums. | ||
77 | - and Attribute macros: applied like `#[test]`, usually slapped onto | ||
78 | functions. | ||
79 | |||
80 | We will be using Attribute macros to convert a Rust function | ||
81 | into a curried Rust function, which we should be able to | ||
82 | call via: `function(arg1)(arg2)`. | ||
83 | |||
84 | ### Definitions | ||
85 | |||
86 | Being respectable programmers, we define the input to and | ||
87 | the output from our proc-macro. Here's a good non-trivial | ||
88 | function to start out with: | ||
89 | |||
90 | ```rust | ||
91 | fn add(x: u32, y: u32, z: u32) -> u32 { | ||
92 | return x + y + z; | ||
93 | } | ||
94 | ``` | ||
95 | |||
96 | Hmm, what would our output look like? What should our | ||
97 | proc-macro generate ideally? Well, if we understood currying | ||
98 | correctly, we should accept an argument and return a | ||
99 | function that accepts an argument and returns ... you get | ||
100 | the point. Something like this should do: | ||
101 | |||
102 | ```rust | ||
103 | fn add_curried1(x: u32) -> ? { | ||
104 | return fn add_curried2 (y: u32) -> ? { | ||
105 | return fn add_curried3 (z: u32) -> u32 { | ||
106 | return x + y + z; | ||
107 | } | ||
108 | } | ||
109 | } | ||
110 | ``` | ||
111 | |||
112 | A couple of things to note: | ||
113 | |||
114 | **Return types** | ||
115 | We have placed `?`s in place of return | ||
116 | types. Let's try to fix that. `add_curried3` returns the | ||
117 | 'value', so `u32` is accurate. `add_curried2` returns | ||
118 | `add_curried3`. What is the type of `add_curried3`? It is a | ||
119 | function that takes in a `u32` and returns a `u32`. So a | ||
120 | `fn(u32) -> u32` will do right? No, I'll explain why in the | ||
121 | next point, but for now, we will make use of the `Fn` trait, | ||
122 | our return type is `impl Fn(u32) -> u32`. This basically | ||
123 | tells the compiler that we will be returning something | ||
124 | function-like, a.k.a, behaves like a `Fn`. Cool! | ||
125 | |||
126 | If you have been following along, you should be able to tell | ||
127 | that the return type of `add_curried1` is: | ||
128 | ``` | ||
129 | impl Fn(u32) -> (impl Fn(u32) -> u32) | ||
130 | ``` | ||
131 | |||
132 | We can drop the parentheses because `->` is right associative: | ||
133 | ``` | ||
134 | impl Fn(u32) -> impl Fn(u32) -> u32 | ||
135 | |||
136 | ``` | ||
137 | |||
138 | **Accessing environment** | ||
139 | A function cannot access its environment. Our solution | ||
140 | will not work. `add_curried3` attempts to access `x`, which | ||
141 | is not allowed! A closure[^closure] however, can. If we are | ||
142 | returning a closure, our return type must be `impl Fn`, and | ||
143 | not `fn`. The difference between the `Fn` trait and | ||
144 | function pointers is beyond the scope of this post. | ||
145 | |||
146 | [^closure]: [https://doc.rust-lang.org/book/ch13-01-closures.html](https://doc.rust-lang.org/book/ch13-01-closures.html) | ||
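A small sketch of the distinction, under the assumption that a toy example is enough here: a plain `fn` sees only its arguments, while a closure can capture a value from the scope that created it. The names `double`, `make_adder` and `apply` are illustrative only.

```rust
// A plain function: it sees only its own arguments.
fn double(y: u32) -> u32 {
    y * 2
}

// Returns a closure that captures `x` from the enclosing call.
// `make_adder` is an illustrative name, not something the macro generates.
fn make_adder(x: u32) -> impl Fn(u32) -> u32 {
    move |y| x + y
}

// Accepts anything function-like: both `fn` items and closures implement `Fn`.
fn apply(f: impl Fn(u32) -> u32, v: u32) -> u32 {
    f(v)
}
```

`double` can be stored in a function pointer (`fn(u32) -> u32`), but the closure returned by `make_adder` cannot, because it carries `x` along with it. That is why the return type has to be `impl Fn`.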
147 | |||
148 | ### Refinement | ||
149 | |||
150 | Armed with knowledge, we refine our expected output, this | ||
151 | time, employing closures: | ||
152 | |||
153 | ``` | ||
154 | fn add(x: u32) -> impl Fn(u32) -> impl Fn(u32) -> u32 { | ||
155 | return move |y| move |z| x + y + z; | ||
156 | } | ||
157 | ``` | ||
158 | |||
159 | Alas, that does not compile either! It errors out with the | ||
160 | following message: | ||
161 | |||
162 | ``` | ||
163 | error[E0562]: `impl Trait` not allowed outside of function | ||
164 | and inherent method return types | ||
165 | --> src/main.rs:17:37 | ||
166 | | | ||
167 | | fn add(x: u32) -> impl Fn(u32) -> impl Fn(u32) -> u32 | ||
168 | | ^^^^^^^^^^^^^^^^^^^ | ||
169 | |||
170 | ``` | ||
171 | |||
172 | As far as I can tell, `impl Trait` is allowed as the return | ||
173 | type of a function, but not nested inside the return type of | ||
174 | the `Fn` trait itself, which is where our inner `impl Fn` sits. | ||
175 | At least, that was the most I could make out of the error message. | ||
176 | |||
177 | We are going to have to cheat a bit to fix this issue, with | ||
178 | type aliases and a convenient nightly feature [^features]: | ||
179 | |||
180 | [^features]: [caniuse.rs](https://caniuse.rs) contains an | ||
181 | indexed list of features and their status. | ||
182 | |||
183 | ```rust | ||
184 | #![feature(type_alias_impl_trait)] // allows us to use `impl Fn` in type aliases! | ||
185 | |||
186 | type T0 = u32; // the return value when zero args are to be applied | ||
187 | type T1 = impl Fn(u32) -> T0; // the return value when one arg is to be applied | ||
188 | type T2 = impl Fn(u32) -> T1; // the return value when two args are to be applied | ||
189 | |||
190 | fn add(x: u32) -> T2 { | ||
191 | return move |y| move |z| x + y + z; | ||
192 | } | ||
193 | ``` | ||
194 | |||
195 | Drop that into a cargo project, call `add(4)(5)(6)`, cross | ||
196 | your fingers, and run `cargo +nightly run`. You should see a | ||
197 | 15 unless you forgot to print it! | ||
198 | |||
199 | ### The In-Betweens | ||
200 | |||
201 | Let us write the magical bits that take us from function to | ||
202 | curried function. | ||
203 | |||
204 | Initialize your workspace with `cargo new --lib currying`. | ||
205 | Proc-macro crates are libraries that can export only procedural | ||
206 | macros; ours exports exactly one. Add a `tests` directory to your crate root. | ||
207 | Your directory should look something like this: | ||
208 | |||
209 | ``` | ||
210 | . | ||
211 | ├── Cargo.toml | ||
212 | ├── src | ||
213 | │ └── lib.rs | ||
214 | └── tests | ||
215 | └── smoke.rs | ||
216 | ``` | ||
217 | |||
218 | #### Dependencies | ||
219 | |||
220 | We will be using a total of 3 external crates: | ||
221 | |||
222 | - [proc_macro2](https://docs.rs/proc-macro2/1.0.12/proc_macro2/) | ||
223 | - [syn](https://docs.rs/syn/1.0.18/syn/index.html) | ||
224 | - [quote](https://docs.rs/quote/1.0.4/quote/index.html) | ||
225 | |||
226 | Here's a sample `Cargo.toml`: | ||
227 | |||
228 | ``` | ||
229 | # Cargo.toml | ||
230 | |||
231 | [dependencies] | ||
232 | proc-macro2 = "1.0.9" | ||
233 | quote = "1.0" | ||
234 | |||
235 | [dependencies.syn] | ||
236 | version = "1.0" | ||
237 | features = ["full"] | ||
238 | |||
239 | [lib] | ||
240 | proc-macro = true # this is important! | ||
241 | ``` | ||
242 | |||
243 | We will be using an external `proc-macro2` crate as well as | ||
244 | an internal `proc-macro` crate. Not confusing at all! | ||
245 | |||
246 | #### The attribute macro | ||
247 | |||
248 | Drop this into `src/lib.rs`, to get the ball rolling. | ||
249 | |||
250 | ```rust | ||
251 | // src/lib.rs | ||
252 | |||
253 | use proc_macro::TokenStream; // 1 | ||
254 | use quote::quote; | ||
255 | use syn::{parse_macro_input, ItemFn}; | ||
256 | |||
257 | #[proc_macro_attribute] // 2 | ||
258 | pub fn curry(_attr: TokenStream, item: TokenStream) -> TokenStream { | ||
259 | let parsed = parse_macro_input!(item as ItemFn); // 3 | ||
260 | generate_curry(parsed).into() // 4 | ||
261 | } | ||
262 | |||
263 | fn generate_curry(parsed: ItemFn) -> proc_macro2::TokenStream { todo!() } // filled in below | ||
264 | ``` | ||
265 | |||
266 | **1. Imports** | ||
267 | |||
268 | A `TokenStream` holds (hopefully valid) Rust code; this | ||
269 | is the type of our input and output. Note that we are | ||
270 | importing this type from `proc_macro` and not `proc_macro2`. | ||
271 | |||
272 | `quote!` from the `quote` crate is a macro that allows us to | ||
273 | quickly produce `TokenStream`s. Much like the LISP `quote` | ||
274 | procedure, you can use the `quote!` macro for symbolic | ||
275 | transformations. | ||
276 | |||
277 | `ItemFn` from the `syn` crate holds the parsed `TokenStream` | ||
278 | of a Rust function. `parse_macro_input!` is a helper macro | ||
279 | provided by `syn`. | ||
280 | |||
281 | **2. The lone export** | ||
282 | |||
283 | Annotate the only `pub` of our crate with | ||
284 | `#[proc_macro_attribute]`. This tells rustc that `curry` is | ||
285 | a procedural macro, and allows us to use it as | ||
286 | `#[crate_name::curry]` in other crates. Note the signature | ||
287 | of the `curry` function. `_attr` is the `TokenStream` | ||
288 | representing the attribute itself, `item` refers to the | ||
289 | thing we slapped our macro onto, in this case a function | ||
290 | (like `add`). The return value is a modified `TokenStream`, | ||
291 | this will contain our curried version of `add`. | ||
292 | |||
293 | **3. The helper macro** | ||
294 | |||
295 | A `TokenStream` is a little hard to work with, which is why | ||
296 | we have the `syn` crate, which provides types to represent | ||
297 | Rust tokens and syntax: an `RArrow` struct represents the return | ||
298 | arrow on a function, and so on. One of those types is | ||
299 | `ItemFn`, which represents an entire Rust function. The | ||
300 | `parse_macro_input!` macro automatically parses the input to our | ||
301 | macro into an `ItemFn`. What a gentleman! | ||
302 | |||
303 | **4. Returning `TokenStream`s** | ||
304 | |||
305 | We haven't filled in `generate_curry` yet, but we can see | ||
306 | that it returns a `proc_macro2::TokenStream` and not a | ||
307 | `proc_macro::TokenStream`, so drop a `.into()` to convert | ||
308 | it. | ||
309 | |||
310 | Let's move on, and fill in `generate_curry`. I would suggest | ||
311 | keeping the documentation for | ||
312 | [`syn::ItemFn`](https://docs.rs/syn/1.0.19/syn/struct.ItemFn.html) | ||
313 | and | ||
314 | [`syn::Signature`](https://docs.rs/syn/1.0.19/syn/struct.Signature.html) | ||
315 | open. | ||
316 | |||
317 | ```rust | ||
318 | // src/lib.rs | ||
319 | |||
320 | fn generate_curry(parsed: ItemFn) -> proc_macro2::TokenStream { | ||
321 | let fn_body = parsed.block; // function body | ||
322 | let sig = parsed.sig; // function signature | ||
323 | let vis = parsed.vis; // visibility, pub or not | ||
324 | let fn_name = sig.ident; // function name/identifier | ||
325 | let fn_args = sig.inputs; // comma separated args | ||
326 | let fn_return_type = sig.output; // return type | ||
327 | } | ||
328 | ``` | ||
329 | |||
330 | We are simply extracting the bits of the function, we will | ||
331 | be reusing the original function's visibility and name. Take | ||
332 | a look at what `syn::Signature` can tell us about a | ||
333 | function: | ||
334 | |||
335 | ``` | ||
336 | .-- syn::Ident (ident) | ||
337 | / | ||
338 | fn add(x: u32, y: u32) -> u32 | ||
339 | (fn_token) / ~~~~~~~,~~~~~~ ~~~~~~ | ||
340 | syn::token::Fn --' / \ (output) | ||
341 | ' `- syn::ReturnType | ||
342 | Punctuated<FnArg, Comma> (inputs) | ||
343 | ``` | ||
344 | |||
345 | Enough analysis, let's produce our first bit of Rust code. | ||
346 | |||
347 | #### Function Body | ||
348 | |||
349 | Recall that the body of a curried `add` should look like | ||
350 | this: | ||
351 | |||
352 | ```rust | ||
353 | return move |y| move |z| x + y + z; | ||
354 | ``` | ||
355 | |||
356 | And in general: | ||
357 | |||
358 | ```rust | ||
359 | return move |arg2| move |arg3| ... |argN| <function body here> | ||
360 | ``` | ||
361 | |||
362 | We already have the function's body, provided by `fn_body`, | ||
363 | in our `generate_curry` function. All that's left to add is | ||
364 | the `move |arg2| move |arg3| ...` stuff, for which we need | ||
365 | to extract the argument identifiers | ||
366 | (doc: | ||
367 | [Punctuated](https://docs.rs/syn/1.0.18/syn/punctuated/struct.Punctuated.html), | ||
368 | [FnArg](https://docs.rs/syn/1.0.18/syn/enum.FnArg.html), | ||
369 | [PatType](https://docs.rs/syn/1.0.18/syn/struct.PatType.html)): | ||
370 | |||
371 | ```rust | ||
372 | // src/lib.rs | ||
373 | use syn::punctuated::Punctuated; | ||
374 | use syn::{parse_macro_input, FnArg, Pat, ItemFn, Block}; | ||
375 | |||
376 | fn extract_arg_idents(fn_args: Punctuated<FnArg, syn::token::Comma>) -> Vec<Box<Pat>> { | ||
377 | return fn_args.into_iter().map(extract_arg_pat).collect::<Vec<_>>(); | ||
378 | } | ||
379 | ``` | ||
380 | |||
381 | Alright, so we are iterating over function args | ||
382 | (`Punctuated` is a collection that you can iterate over) and | ||
383 | mapping an `extract_arg_pat` to every item. What's | ||
384 | `extract_arg_pat`? | ||
385 | |||
386 | ```rust | ||
387 | // src/lib.rs | ||
388 | |||
389 | fn extract_arg_pat(a: FnArg) -> Box<Pat> { | ||
390 | match a { | ||
391 | FnArg::Typed(p) => p.pat, | ||
392 | _ => panic!("Not supported on types with `self`!"), | ||
393 | } | ||
394 | } | ||
395 | ``` | ||
396 | |||
397 | `FnArg` is an enum type as you might have guessed. The | ||
398 | `Typed` variant encompasses args that are written as `name: | ||
399 | type` and the other variant, `Receiver`, refers to `self` | ||
400 | types. Ignore those for now, keep it simple. | ||
401 | |||
402 | Every `FnArg::Typed` value contains a `pat`, which is in | ||
403 | essence, the name of the argument. The type of the arg is | ||
404 | accessible via `p.ty` (we will be using this later). | ||
405 | |||
406 | With that done, we should be able to write the codegen for | ||
407 | the function body: | ||
408 | |||
409 | ```rust | ||
410 | // src/lib.rs | ||
411 | |||
412 | fn generate_body(fn_args: &[Box<Pat>], body: Box<Block>) -> proc_macro2::TokenStream { | ||
413 | quote! { | ||
414 | return #( move |#fn_args| )* #body | ||
415 | } | ||
416 | } | ||
417 | ``` | ||
418 | |||
419 | That is some scary looking syntax! Allow me to explain. The | ||
420 | `quote!{ ... }` returns a `proc_macro2::TokenStream`, if we | ||
421 | wrote `quote!{ let x = 1 + 2; }`, it wouldn't create a new | ||
422 | variable `x` with value 3, it would literally produce a | ||
423 | stream of tokens with that expression. | ||
424 | |||
425 | The `#` enables variable interpolation. `#body` will look | ||
426 | for `body` in the current scope, take its value, and insert | ||
427 | it in the returned `TokenStream`. Kinda like quasi quoting | ||
428 | in LISPs, if you have written one. | ||
429 | |||
430 | What about `#( move |#fn_args| )*`? That is repetition. | ||
431 | `quote` iterates through `fn_args`, and for each one emits a `move` | ||
432 | followed by the argument wrapped in pipes (`|`). | ||
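If the repetition syntax is still fuzzy, this plain-string sketch mimics what `#( move |#fn_args| )* #body` expands to. It deliberately avoids `quote` and works on `&str`s, so it only models the shape of the output, not real token streams; `curried_body` is a hypothetical helper.

```rust
// String-based stand-in for `quote!`'s `#( move |#fn_args| )* #body`.
// `curried_body` is a hypothetical helper for illustration only.
fn curried_body(arg_names: &[&str], body: &str) -> String {
    let mut out = String::from("return ");
    for name in arg_names {
        // One `move |arg|` per remaining argument, in order.
        out.push_str(&format!("move |{}| ", name));
    }
    // The original function body goes last, untouched.
    out.push_str(body);
    out
}
```

With `["y", "z"]` and `{ x + y + z }` as inputs, this produces `return move |y| move |z| { x + y + z }`, the same shape the real macro will print.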
433 | |||
434 | Let us test our first bit of codegen! Modify `generate_curry` like so: | ||
435 | |||
436 | ```rust | ||
437 | // src/lib.rs | ||
438 | |||
439 | fn generate_curry(parsed: ItemFn) -> proc_macro2::TokenStream { | ||
440 | let fn_body = parsed.block; | ||
441 | let sig = parsed.sig; | ||
442 | let vis = parsed.vis; | ||
443 | let fn_name = sig.ident; | ||
444 | let fn_args = sig.inputs; | ||
445 | let fn_return_type = sig.output; | ||
446 | |||
447 | + let arg_idents = extract_arg_idents(fn_args.clone()); | ||
448 | + let first_ident = &arg_idents.first().unwrap(); | ||
449 | |||
450 | + // remember, our curried body starts with the second argument! | ||
451 | + let curried_body = generate_body(&arg_idents[1..], fn_body.clone()); | ||
452 | + println!("{}", curried_body); | ||
453 | |||
454 | return proc_macro2::TokenStream::new(); // empty output for now | ||
455 | } | ||
456 | ``` | ||
457 | Add a little test to `tests/`: | ||
458 | |||
459 | ```rust | ||
460 | // tests/smoke.rs | ||
461 | |||
462 | #[currying::curry] | ||
463 | fn add(x: u32, y: u32, z: u32) -> u32 { | ||
464 | x + y + z | ||
465 | } | ||
466 | |||
467 | #[test] | ||
468 | fn works() { | ||
469 | assert!(true); | ||
470 | } | ||
471 | ``` | ||
472 | |||
473 | You should find something like this in the output of `cargo | ||
474 | test`: | ||
475 | |||
476 | ``` | ||
477 | return move | y | move | z | { x + y + z } | ||
478 | ``` | ||
479 | |||
480 | Glorious `println!` debugging! | ||
481 | |||
482 | #### Function signature | ||
483 | |||
484 | This section gets into the more complicated bits of the | ||
485 | macro, generating type aliases and the function signature. | ||
486 | By the end of this section, we should have a full working | ||
487 | auto-currying macro! | ||
488 | |||
489 | Recall what our generated type aliases should look like, for | ||
490 | our `add` function: | ||
491 | |||
492 | ```rust | ||
493 | type T0 = u32; | ||
494 | type T1 = impl Fn(u32) -> T0; | ||
495 | type T2 = impl Fn(u32) -> T1; | ||
496 | ``` | ||
497 | In general: | ||
498 | |||
499 | ```rust | ||
500 | type T0 = <return type>; | ||
501 | type T1 = impl Fn(<type of arg N>) -> T0; | ||
502 | type T2 = impl Fn(<type of arg N - 1>) -> T1; | ||
503 | . | ||
504 | . | ||
505 | . | ||
506 | type T(N-1) = impl Fn(<type of arg 2>) -> T(N-2); | ||
507 | ``` | ||
508 | |||
509 | To codegen that, we need the types of: | ||
510 | |||
511 | - all our inputs (arguments) | ||
512 | - the output (the return type) | ||
513 | |||
514 | To fetch the types of all our inputs, we can simply reuse | ||
515 | the bits we wrote to fetch the names of all our inputs! | ||
516 | (doc: [Type](https://docs.rs/syn/1.0.18/syn/enum.Type.html)) | ||
517 | |||
518 | ```rust | ||
519 | // src/lib.rs | ||
520 | |||
521 | use syn::{parse_macro_input, Block, FnArg, ItemFn, Pat, ReturnType, Type}; | ||
522 | |||
523 | fn extract_type(a: FnArg) -> Box<Type> { | ||
524 | match a { | ||
525 | FnArg::Typed(p) => p.ty, // notice `ty` instead of `pat` | ||
526 | _ => panic!("Not supported on types with `self`!"), | ||
527 | } | ||
528 | } | ||
529 | |||
530 | fn extract_arg_types(fn_args: Punctuated<FnArg, syn::token::Comma>) -> Vec<Box<Type>> { | ||
531 | return fn_args.into_iter().map(extract_type).collect::<Vec<_>>(); | ||
532 | |||
533 | } | ||
534 | ``` | ||
535 | |||
536 | A good reader would have looked at the docs for the `output` | ||
537 | member of the `syn::Signature` struct. It has the type | ||
538 | `syn::ReturnType`. So there is no extraction to do here | ||
539 | right? There are actually a couple of things we have to | ||
540 | ensure here: | ||
541 | |||
542 | 1. We need to ensure that the function returns! A function | ||
543 | that does not return is pointless in this case, and I | ||
544 | will tell you why, in the [Notes](#notes) section. | ||
545 | |||
546 | 2. A `ReturnType` encloses the arrow of the return as well, | ||
547 | we need to get rid of that. Recall: | ||
548 | ```rust | ||
549 | type T0 = u32 | ||
550 | // and not | ||
551 | type T0 = -> u32 | ||
552 | ``` | ||
553 | |||
554 | Here is the snippet that handles extraction of the | ||
555 | return type (doc: [syn::ReturnType](https://docs.rs/syn/1.0.19/syn/enum.ReturnType.html)): | ||
556 | |||
557 | ```rust | ||
558 | // src/lib.rs | ||
559 | |||
560 | fn extract_return_type(a: ReturnType) -> Box<Type> { | ||
561 | match a { | ||
562 | ReturnType::Type(_, p) => p, | ||
563 | _ => panic!("Not supported on functions without return types!"), | ||
564 | } | ||
565 | } | ||
566 | ``` | ||
567 | |||
568 | You might notice that we are making extensive use of the | ||
569 | `panic!` macro. Well, that is because it is a good idea to | ||
570 | quit on receiving an unsatisfactory `TokenStream`. | ||
571 | |||
572 | With all our types ready, we can get on with generating type | ||
573 | aliases: | ||
574 | |||
575 | ```rust | ||
576 | // src/lib.rs | ||
577 | |||
578 | use quote::{quote, format_ident}; | ||
579 | |||
580 | fn generate_type_aliases( | ||
581 | fn_arg_types: &[Box<Type>], | ||
582 | fn_return_type: Box<Type>, | ||
583 | fn_name: &syn::Ident, | ||
584 | ) -> Vec<proc_macro2::TokenStream> { // 1 | ||
585 | |||
586 | let type_t0 = format_ident!("_{}_T0", fn_name); // 2 | ||
587 | let mut type_aliases = vec![quote! { type #type_t0 = #fn_return_type }]; | ||
588 | |||
589 | // 3 | ||
590 | for (i, t) in (1..).zip(fn_arg_types.into_iter().rev()) { | ||
591 | let p = format_ident!("_{}_{}", fn_name, format!("T{}", i - 1)); | ||
592 | let n = format_ident!("_{}_{}", fn_name, format!("T{}", i)); | ||
593 | |||
594 | type_aliases.push(quote! { | ||
595 | type #n = impl Fn(#t) -> #p | ||
596 | }); | ||
597 | } | ||
598 | |||
599 | return type_aliases; | ||
600 | } | ||
601 | |||
602 | ``` | ||
603 | |||
604 | **1. The return value** | ||
605 | We are returning a `Vec<proc_macro2::TokenStream>`, i.e., a | ||
606 | list of `TokenStream`s, where each item is a type alias. | ||
607 | |||
608 | **2. Format identifier?** | ||
609 | I've got some explanation to do on this line. Clearly, we | ||
610 | are trying to write the first type alias, and initialize our | ||
611 | `TokenStream` vector with `T0`, because it is different from | ||
612 | the others: | ||
613 | |||
614 | ```rust | ||
615 | type T0 = something | ||
616 | // the others are of the form | ||
617 | type Tr = impl Fn(something) -> something | ||
618 | ``` | ||
619 | |||
620 | `format_ident!` is similar to `format!`. Instead of | ||
621 | returning a formatted string, it returns a `syn::Ident`. | ||
622 | Therefore, `type_t0` is actually an identifier for, in the | ||
623 | case of our `add` function, `_add_T0`. Why is this | ||
624 | formatting important? Namespacing. | ||
625 | |||
626 | Picture this, we have two functions, `add` and `sub`, | ||
627 | that we wish to curry with our macro: | ||
628 | |||
629 | ```rust | ||
630 | #[curry] | ||
631 | fn add(...) -> u32 { ... } | ||
632 | |||
633 | #[curry] | ||
634 | fn sub(...) -> u32 { ... } | ||
635 | ``` | ||
636 | |||
637 | Here is the same but with macros expanded: | ||
638 | |||
639 | ```rust | ||
640 | type T0 = u32; | ||
641 | type T1 = impl Fn(u32) -> T0; | ||
642 | fn add( ... ) -> T1 { ... } | ||
643 | |||
644 | type T0 = u32; | ||
645 | type T1 = impl Fn(u32) -> T0; | ||
646 | fn sub( ... ) -> T1 { ... } | ||
647 | ``` | ||
648 | |||
649 | We end up with two definitions of `T0`! Now, if we do the | ||
650 | little `format_ident!` dance we did up there: | ||
651 | |||
652 | ```rust | ||
653 | type _add_T0 = u32; | ||
654 | type _add_T1 = impl Fn(u32) -> _add_T0; | ||
655 | fn add( ... ) -> _add_T1 { ... } | ||
656 | |||
657 | type _sub_T0 = u32; | ||
658 | type _sub_T1 = impl Fn(u32) -> _sub_T0; | ||
659 | fn sub( ... ) -> _sub_T1 { ... } | ||
660 | ``` | ||
661 | |||
662 | Voilà! The type aliases don't tread on each other. Remember | ||
663 | to import `format_ident` from the `quote` crate. | ||
664 | |||
665 | **3. The TokenStream Vector** | ||
666 | |||
667 | We iterate over our types in reverse order (`T0` is the | ||
668 | last return, `T1` is the second last, so on), assign a | ||
669 | number to each iteration with `zip`, generate type names | ||
670 | with `format_ident`, push a `TokenStream` with the help of | ||
671 | `quote` and variable interpolation. | ||
672 | |||
673 | If you are wondering why we used `(1..).zip()` instead of | ||
674 | `.enumerate()`, it's because we wanted to start counting | ||
675 | from 1 instead of 0 (we are already done with `T0`!). | ||
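The numbering is easier to check with plain strings. The sketch below mirrors the loop in `generate_type_aliases` but uses `format!` instead of `quote!` and `format_ident!`, so it is only an analog of the real thing; `alias_lines` is a hypothetical name. As in the macro, the caller is assumed to pass the argument types without the first one.

```rust
// Hypothetical string-based analog of `generate_type_aliases`.
fn alias_lines(fn_name: &str, arg_types: &[&str], return_type: &str) -> Vec<String> {
    // T0 is special: it is just the return type.
    let mut aliases = vec![format!("type _{}_T0 = {}", fn_name, return_type)];

    // Walk the argument types back to front, counting from 1.
    for (i, t) in (1..).zip(arg_types.iter().rev()) {
        aliases.push(format!(
            "type _{name}_T{n} = impl Fn({t}) -> _{name}_T{p}",
            name = fn_name,
            n = i,
            t = t,
            p = i - 1
        ));
    }
    aliases
}
```

For `add`, `alias_lines("add", &["u32", "u32"], "u32")` yields the three `_add_T0`, `_add_T1`, `_add_T2` lines shown earlier. With mixed types the reversal becomes visible: the last argument's type ends up in `T1`.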
676 | |||
677 | |||
678 | #### Getting it together | ||
679 | |||
680 | I promised we'd have a fully working macro by the end of | ||
681 | last section. I lied, we have to tie everything together in | ||
682 | our `generate_curry` function: | ||
683 | |||
684 | ```rust | ||
685 | // src/lib.rs | ||
686 | |||
687 | fn generate_curry(parsed: ItemFn) -> proc_macro2::TokenStream { | ||
688 | let fn_body = parsed.block; | ||
689 | let sig = parsed.sig; | ||
690 | let vis = parsed.vis; | ||
691 | let fn_name = sig.ident; | ||
692 | let fn_args = sig.inputs; | ||
693 | let fn_return_type = sig.output; | ||
694 | |||
695 | let arg_idents = extract_arg_idents(fn_args.clone()); | ||
696 | let first_ident = &arg_idents.first().unwrap(); | ||
697 | let curried_body = generate_body(&arg_idents[1..], fn_body.clone()); | ||
698 | |||
699 | + let arg_types = extract_arg_types(fn_args.clone()); | ||
700 | + let first_type = &arg_types.first().unwrap(); | ||
701 | + let type_aliases = generate_type_aliases( | ||
702 | + &arg_types[1..], | ||
703 | + extract_return_type(fn_return_type), | ||
704 | + &fn_name, | ||
705 | + ); | ||
706 | |||
707 | + let return_type = format_ident!("_{}_{}", &fn_name, format!("T{}", type_aliases.len() - 1)); | ||
708 | |||
709 | + return quote! { | ||
710 | + #(#type_aliases);* ; | ||
711 | + #vis fn #fn_name (#first_ident: #first_type) -> #return_type { | ||
712 | + #curried_body ; | ||
713 | + } | ||
714 | + }; | ||
715 | } | ||
716 | ``` | ||
717 | |||
718 | Most of the additions are self explanatory, I'll go through | ||
719 | the return statement with you. We are returning a `quote!{ | ||
720 | ... }`, so a `proc_macro2::TokenStream`. We are iterating | ||
721 | through the `type_aliases` variable, which you might recall, | ||
722 | is a `Vec<TokenStream>`. You might notice the sneaky | ||
723 | semicolon before the `*`. This basically tells `quote`, to | ||
724 | insert an item, then a semicolon, and then the next one, | ||
725 | another semicolon, and so on. The semicolon is a separator. | ||
726 | We need to manually insert another semicolon at the end of | ||
727 | it all, `quote` doesn't insert a separator at the end of the | ||
728 | iteration. | ||
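The separator behaviour is the same as joining strings: the separator goes between items, never after the last one. A rough string-based analogy, with `join_with_semicolons` as a made-up helper:

```rust
// `join_with_semicolons` is a hypothetical stand-in for `#(#type_aliases);* ;`.
fn join_with_semicolons(items: &[&str]) -> String {
    // `join` puts the separator only between items, like `#(...);*` does.
    let mut out = items.join(" ; ");
    // So the final semicolon has to be added by hand.
    out.push_str(" ;");
    out
}
```

Leaving out the trailing `;` in the macro would leave the last type alias unterminated.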
729 | |||
730 | We retain the visibility and name of our original function. | ||
731 | Our curried function takes as args, just the first argument | ||
732 | of our original function. The return type of our curried | ||
733 | function is actually, the last type alias we create. If you | ||
734 | think back to our manually curried `add` function, we | ||
735 | returned `T2`, which was in fact, the last type alias we | ||
736 | created. | ||
737 | |||
738 | I am sure, at this point, you are itching to test this out, | ||
739 | but before that, let me introduce you to some good methods | ||
740 | of debugging proc-macro code. | ||
741 | |||
742 | ### Debugging and Testing | ||
743 | |||
744 | Install `cargo-expand` via: | ||
745 | |||
746 | ``` | ||
747 | cargo install cargo-expand | ||
748 | ``` | ||
749 | |||
750 | `cargo-expand` is a neat little tool that expands your macro | ||
751 | in places where it is used, and lets you view the generated | ||
752 | code! For example: | ||
753 | |||
754 | ```shell | ||
755 | # create a bin package hello | ||
756 | $ cargo new hello && cd hello | ||
757 | |||
758 | # view the expansion of the println! macro | ||
759 | $ cargo expand | ||
760 | |||
761 | #![feature(prelude_import)] | ||
762 | #[prelude_import] | ||
763 | use std::prelude::v1::*; | ||
764 | #[macro_use] | ||
765 | extern crate std; | ||
766 | fn main() { | ||
767 | { | ||
768 | ::std::io::_print(::core::fmt::Arguments::new_v1( | ||
769 | &["Hello, world!\n"], | ||
770 | &match () { | ||
771 | () => [], | ||
772 | }, | ||
773 | )); | ||
774 | }; | ||
775 | } | ||
776 | ``` | ||
777 | |||
778 | Writing proc-macros without `cargo-expand` is tantamount to | ||
779 | driving a vehicle without rear view mirrors! Keep an eye on | ||
780 | what is going on behind your back. | ||
781 | |||
782 | Now, your macro won't always compile, you might just receive | ||
783 | the bee movie script as an error. `cargo-expand` will not | ||
784 | work in such cases. I would suggest printing out your | ||
785 | variables to inspect them. `TokenStream` implements | ||
786 | `Display` as well as `Debug`. We don't always have to be | ||
787 | respectable programmers. Just print it. | ||
788 | |||
789 | Enough of that, let's get testing: | ||
790 | |||
791 | ```rust | ||
792 | // tests/smoke.rs | ||
793 | |||
794 | #![feature(type_alias_impl_trait)] | ||
795 | |||
796 | #[currying::curry] | ||
797 | fn add(x: u32, y: u32, z: u32) -> u32 { | ||
798 | x + y + z | ||
799 | } | ||
800 | |||
801 | #[test] | ||
802 | fn works() { | ||
803 | assert_eq!(15, add(4)(5)(6)); | ||
804 | } | ||
805 | ``` | ||
806 | |||
807 | Run `cargo +nightly test`. You should see a pleasing | ||
808 | message: | ||
809 | |||
810 | ``` | ||
811 | running 1 test | ||
812 | test works ... ok | ||
813 | ``` | ||
814 | |||
815 | Take a look at the expansion for our curry macro, via | ||
816 | `cargo +nightly expand --tests smoke`: | ||
817 | |||
818 | ```rust | ||
819 | type _add_T0 = u32; | ||
820 | type _add_T1 = impl Fn(u32) -> _add_T0; | ||
821 | type _add_T2 = impl Fn(u32) -> _add_T1; | ||
822 | fn add(x: u32) -> _add_T2 { | ||
823 | return (move |y| { | ||
824 | move |z| { | ||
825 | return x + y + z; | ||
826 | } | ||
827 | }); | ||
828 | } | ||
829 | |||
830 | // a bunch of other stuff generated by #[test] and assert_eq! | ||
831 | ``` | ||
832 | |||
833 | A sight for sore eyes. | ||
834 | |||
835 | Here is a more complex example that generates ten multiples | ||
836 | of the first ten natural numbers: | ||
837 | |||
838 | ```rust | ||
839 | #[curry] | ||
840 | fn product(x: u32, y: u32) -> u32 { | ||
841 | x * y | ||
842 | } | ||
843 | |||
844 | fn multiples() -> Vec<Vec<u32>>{ | ||
845 | let v = (1..=10).map(product); | ||
846 | return (1..=10) | ||
847 | .map(|x| v.clone().map(|f| f(x)).collect()) | ||
848 | .collect(); | ||
849 | } | ||
850 | ``` | ||
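If you don't have nightly handy, roughly the same idea can be tried on stable Rust with a hand-curried `product`. This is a sketch under that assumption, not the macro's output: `product` is written by hand here, and the closures are collected into a `Vec` instead of cloning the iterator.

```rust
// Hand-curried stand-in for the `#[curry]`-generated `product`.
fn product(x: u32) -> impl Fn(u32) -> u32 {
    move |y| x * y
}

fn multiples() -> Vec<Vec<u32>> {
    // Partially apply `product` to 1..=10, giving ten "times n" closures.
    let times: Vec<_> = (1..=10u32).map(product).collect();

    // For every x in 1..=10, feed x to each partially applied closure.
    (1..=10u32)
        .map(|x| times.iter().map(|f| f(x)).collect())
        .collect()
}
```

Row `x` of the result holds `x * 1` through `x * 10`.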
851 | |||
852 | ### Notes | ||
853 | |||
854 | I didn't quite explain why we use `move |arg|` in our | ||
855 | closure. This is because we want to take ownership of the | ||
856 | variable supplied to us. Take a look at this example: | ||
857 | |||
858 | ```rust | ||
859 | let v = add(5); | ||
860 | let g; | ||
861 | { | ||
862 | let x = 5; | ||
863 | g = v(x); | ||
864 | } | ||
865 | println!("{}", g(2)); | ||
866 | ``` | ||
867 | |||
868 | Variable `x` goes out of scope before `g` can return a | ||
869 | concrete value. If we take ownership of `x` by `move`ing it | ||
870 | into our closure, we can expect this to work reliably. In | ||
871 | fact, rustc understands this, and forces you to use `move`. | ||
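Here is a stable-Rust sketch of the same situation, with hand-written closures standing in for the generated code (`offset_by` and `build_in_inner_scope` are illustrative names):

```rust
// `offset_by` is an illustrative name. Without `move`, the closure would only
// borrow `x`, which is dropped when `offset_by` returns, and rustc would
// reject the function.
fn offset_by(x: u32) -> impl Fn(u32) -> u32 {
    move |y| x + y
}

fn build_in_inner_scope() -> impl Fn(u32) -> u32 {
    let g;
    {
        let x = 5; // lives only inside this block
        g = offset_by(x); // the value of `x` ends up inside the closure
    }
    g // still usable: the closure owns its copy of `x`
}
```

Removing the `move` from `offset_by` makes the compiler complain that the closure may outlive the function while borrowing `x`.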
872 | |||
873 | This usage of `move` is exactly why **a curried function | ||
874 | without a return is useless**. Every variable we pass to our | ||
875 | curried function gets moved into its local scope. Playing | ||
876 | with these variables cannot cause a change outside this | ||
877 | scope. Returning is our only method of interaction with | ||
878 | anything beyond this function. | ||
879 | |||
880 | ### Conclusion | ||
881 | |||
882 | Currying may not seem to be all that useful. Curried | ||
883 | functions are unwieldy in Rust because the standard library | ||
884 | is not built around currying. If you enjoy the possibilities | ||
885 | posed by currying, consider taking a look at Haskell or | ||
886 | Scheme. | ||
887 | |||
888 | My original intention with [peppe.rs](https://peppe.rs) was | ||
889 | to post condensed articles, a micro blog, but this one | ||
890 | turned out extra long. | ||
891 | |||
892 | Perhaps I should call it a 'macro' blog :) | ||