From 366df8852f503523cc4f9046d82ba9a99dd51d7f Mon Sep 17 00:00:00 2001
From: Akshay
Date: Sun, 12 Feb 2023 12:13:49 +0530
Subject: new art: lapse

---
 docs/posts/auto-currying_rust_functions/index.html | 496 +++++++++++++++------
 1 file changed, 367 insertions(+), 129 deletions(-)

diff --git a/docs/posts/auto-currying_rust_functions/index.html b/docs/posts/auto-currying_rust_functions/index.html
index 0d6fc77..7b05aeb 100644
--- a/docs/posts/auto-currying_rust_functions/index.html
+++ b/docs/posts/auto-currying_rust_functions/index.html
@@ -28,7 +28,7 @@
09/05 — 2020
- 356.43
+ 356.44
cm
@@ -42,13 +42,22 @@
Auto-currying Rust Functions
-

This post contains a gentle introduction to procedural macros in Rust and a guide to writing a procedural macro to curry Rust functions. The source code for the entire library can be found here. It is also available on crates.io.

-

The following links might prove to be useful before getting started:

+

This post contains a gentle introduction to procedural macros in Rust and a guide to writing a procedural macro to curry Rust functions. The source code for the entire library can be found here. It is also available on crates.io.

+

The following links might prove to be useful before getting started:

-

Or you can pretend you read them, because I have included a primer here :)

+

Or you can pretend you read them, because I have included a primer here :)

Contents

  1. Currying
    @@ -73,7 +82,12 @@
  2. Conclusion

Currying

-

Currying is the process of transforming a function call like f(a, b, c) into f(a)(b)(c). A curried function returns a concrete value only when it receives all its arguments! If it receives fewer arguments than it needs, say 1 of 3, it returns a curried function that returns a value after receiving the remaining 2 arguments.

+

Currying is the process of transforming a function call like f(a, b, c) into f(a)(b)(c). A curried function returns a concrete value only when it receives all its arguments! If it receives fewer arguments than it needs, say 1 of 3, it returns a curried function that returns a value after receiving the remaining 2 arguments.

curry(f(a, b, c)) = h(a)(b)(c)
 
 h(x) = g   <- curried function that takes up to 2 args (g)
@@ -82,49 +96,86 @@ k(z) = v   <- a value (v)
 
 Keen readers will conclude the following,
 h(x)(y)(z) = g(y)(z) = k(z) = v
-

Mathematically, if f is a function that takes two arguments x and y, such that x ϵ X, and y ϵ Y , we write it as:

+

Mathematically, if f is a function that takes two arguments x and y, such that x ∈ X and y ∈ Y, we write it as:

f: (X × Y) -> Z
-

where × denotes the Cartesian product of set X and Y, and curried f (denoted by h here) is written as:

+

where × denotes the Cartesian product of the sets X and Y, and the curried f (denoted by h here) is written as:

h: X -> (Y -> Z)
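
As a rough sketch of the two signatures above, here is a hand-curried function in stable Rust. The names f and h follow the text; the choice X = Y = Z = u32 and the body of f are assumptions made purely for illustration.

```rust
// Uncurried form, f: (X × Y) -> Z, with X = Y = Z = u32 (an arbitrary choice).
fn f(x: u32, y: u32) -> u32 {
    x * 10 + y
}

// Curried form, h: X -> (Y -> Z). Supplying only `x` yields a closure
// that is still waiting for `y`.
fn h(x: u32) -> impl Fn(u32) -> u32 {
    move |y| f(x, y)
}

fn main() {
    let g = h(4); // partially applied: `g` remembers x = 4
    assert_eq!(g(2), f(4, 2));
    println!("{}", h(4)(2));
}
```

Calling h(4) alone gives back a function, which is exactly the "insufficient arguments" case described earlier.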

Procedural Macros

-

These are functions that take code as input and spit out modified code as output. Powerful stuff. Rust has three kinds of proc-macros:

+

These are functions that take code as input and spit out modified code as output. Powerful stuff. Rust has three kinds of proc-macros:

  • Function-like macros
  • -
  • Derive macros: #[derive(...)], used to automatically implement traits for structs/enums
    +
  • Derive macros: #[derive(...)], used to automatically implement traits for structs/enums
  • -
  • and Attribute macros: #[test], usually slapped onto functions
  • +
  • and Attribute macros: #[test], usually slapped onto functions
-

We will be using Attribute macros to convert a Rust function into a curried Rust function, which we should be able to call via: function(arg1)(arg2).

+

We will be using Attribute macros to convert a Rust function into a curried Rust function, which we should be able to call via: function(arg1)(arg2).

Definitions

-

Being respectable programmers, we define the input to and the output from our proc-macro. Here’s a good non-trivial function to start out with:

-
fn add(x: u32, y: u32, z: u32) -> u32 {
-  return x + y + z;
+

Being respectable programmers, we define the input to and the output from our proc-macro. Here’s a good non-trivial function to start out with:

+
fn add(x: u32, y: u32, z: u32) -> u32 {
+  return x + y + z;
 }
-

Hmm, what would our output look like? What should our proc-macro generate ideally? Well, if we understood currying correctly, we should accept an argument and return a function that accepts an argument and returns … you get the point. Something like this should do:

-
fn add_curried1(x: u32) -> ? {
-  return fn add_curried2 (y: u32) -> ? {
-    return fn add_curried3 (z: u32) -> u32 {
-      return x + y + z;
+

Hmm, what would our output look like? What should our proc-macro generate ideally? Well, if we understood currying correctly, we should accept an argument and return a function that accepts an argument and returns … you get the point. Something like this should do:

+
fn add_curried1(x: u32) -> ? {
+  return fn add_curried2 (y: u32) -> ? {
+    return fn add_curried3 (z: u32) -> u32 {
+      return x + y + z;
     }
   }
 }

A couple of things to note:

Return types
-We have placed ?s in place of return types. Let’s try to fix that. add_curried3 returns the ‘value’, so u32 is accurate. add_curried2 returns add_curried3. What is the type of add_curried3? It is a function that takes in a u32 and returns a u32. So a fn(u32) -> u32 will do right? No, I’ll explain why in the next point, but for now, we will make use of the Fn trait, our return type is impl Fn(u32) -> u32. This basically tells the compiler that we will be returning something function-like, a.k.a, behaves like a Fn. Cool!

-

If you have been following along, you should be able to tell that the return type of add_curried1 is:

+We have placed ?s in place of return types. Let’s try to fix that. add_curried3 returns the ‘value’, so u32 is accurate. add_curried2 returns add_curried3. What is the type of add_curried3? It is a function that takes in a u32 and returns a u32. So a fn(u32) -> u32 will do, right? No, I’ll explain why in the next point, but for now, we will make use of the Fn trait: our return type is impl Fn(u32) -> u32. This basically tells the compiler that we will be returning something function-like, i.e., something that behaves like a Fn. Cool!

+

If you have been following along, you should be able to tell that the return type of add_curried1 is:

impl Fn(u32) -> (impl Fn(u32) -> u32)
-

We can drop the parentheses because -> is right associative:

+

We can drop the parentheses because -> is right associative:

impl Fn(u32) -> impl Fn(u32) -> u32
 

Accessing environment
-A function cannot access its environment. Our solution will not work. add_curried3 attempts to access x, which is not allowed! A closure1 however, can. If we are returning a closure, our return type must be impl Fn, and not fn. The difference between the Fn trait and function pointers is beyond the scope of this post.

+A function cannot access its environment. Our solution will not work. add_curried3 attempts to access x, which is not allowed! A closure1 however, can. If we are returning a closure, our return type must be impl Fn, and not fn. The difference between the Fn trait and function pointers is beyond the scope of this post.
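
To make the point about environments concrete, here is a minimal sketch in stable Rust. The name make_adder is hypothetical and not part of the post's library. The commented-out nested fn is the shape that the compiler rejects with error E0434, while the closure compiles because it captures x.

```rust
// `make_adder` is a hypothetical helper used only for this illustration.
fn make_adder(x: u32) -> impl Fn(u32) -> u32 {
    // A nested function item cannot see `x`. Uncommenting the next line
    // fails with error[E0434]: can't capture dynamic environment in a fn item.
    // fn add_x(y: u32) -> u32 { x + y }

    // A closure can capture `x`; `move` copies it into the closure.
    move |y| x + y
}

fn main() {
    let add_five = make_adder(5);
    assert_eq!(add_five(10), 15);

    // A closure that captures nothing can still be used as a plain `fn` pointer.
    let double: fn(u32) -> u32 = |n| n * 2;
    assert_eq!(double(21), 42);
}
```

This is why the return type has to be impl Fn rather than fn: the returned closure carries x along with it.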

Refinement

-

Armed with knowledge, we refine our expected output, this time, employing closures:

-
fn add(x: u32) -> impl Fn(u32) -> impl Fn(u32) -> u32 {
-  return move |y| move |z| x + y + z;
+

Armed with knowledge, we refine our expected output, this time, employing closures:

+
fn add(x: u32) -> impl Fn(u32) -> impl Fn(u32) -> u32 {
+  return move |y| move |z| x + y + z;
 }
-

Alas, that does not compile either! It errors out with the following message:

+

Alas, that does not compile either! It errors out with the following message:

error[E0562]: `impl Trait` not allowed outside of function
 and inherent method return types
   --> src/main.rs:17:37
@@ -132,21 +183,33 @@ and inherent method return types
    | fn add(x: u32) -> impl Fn(u32) -> impl Fn(u32) -> u32
    |                                   ^^^^^^^^^^^^^^^^^^^
 
-

You are allowed to return an impl Fn only inside a function. We are currently returning it from another return! Or at least, that was the most I could make out of the error message.

-

We are going to have to cheat a bit to fix this issue; with type aliases and a convenient nightly feature 2:

-
#![feature(type_alias_impl_trait)]  // allows us to use `impl Fn` in type aliases!
+

You are allowed to use impl Fn only as the return type of a function (or inherent method) itself. We are currently using it inside the return type of another impl Fn! Or at least, that was the most I could make out of the error message.

+

We are going to have to cheat a bit to fix this issue; with type aliases and a convenient nightly feature 2:

+
#![feature(type_alias_impl_trait)]  // allows us to use `impl Fn` in type aliases!
 
 type T0 = u32;                 // the return value when zero args are to be applied
 type T1 = impl Fn(u32) -> T0;  // the return value when one arg is to be applied
 type T2 = impl Fn(u32) -> T1;  // the return value when two args are to be applied
 
 fn add(x: u32) -> T2 {
-  return move |y| move |z| x + y + z;
+  return move |y| move |z| x + y + z;
 }
-

Drop that into a cargo project, call add(4)(5)(6), cross your fingers, and run cargo +nightly run. You should see a 15 unless you forgot to print it!

+

Drop that into a cargo project, call add(4)(5)(6), cross your fingers, and run cargo +nightly run. You should see a 15 unless you forgot to print it!
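
If you would rather experiment on stable Rust first, one possible alternative (not the approach this post takes) is to box the closures. The aliases below reuse the names T0, T1 and T2 from above, but as Box<dyn Fn> types instead of impl Fn, which costs a heap allocation per call.

```rust
// Stable-Rust variant of the curried `add`. This is an assumption-laden
// stand-in for the nightly version: trait objects replace `impl Fn`.
type T0 = u32;
type T1 = Box<dyn Fn(u32) -> T0>;
type T2 = Box<dyn Fn(u32) -> T1>;

fn add(x: u32) -> T2 {
    // Each layer captures the arguments seen so far and waits for one more.
    Box::new(move |y: u32| -> T1 { Box::new(move |z: u32| x + y + z) })
}

fn main() {
    println!("{}", add(4)(5)(6));
}
```

The call syntax add(4)(5)(6) is the same as with the nightly type aliases; only the return types differ.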

The In-Betweens

-

Let us write the magical bits that take us from function to curried function.

-

Initialize your workspace with cargo new --lib currying. Proc-macro crates are libraries with exactly one export, the macro itself. Add a tests directory to your crate root. Your directory should look something like this:

+

Let us write the magical bits that take us from function to curried function.

+

Initialize your workspace with cargo new --lib currying. Proc-macro crates are libraries that can export only procedural macros; ours has exactly one export, the macro itself. Add a tests directory to your crate root. Your directory should look something like this:

.
 ├── Cargo.toml
 ├── src
@@ -156,9 +219,11 @@ and inherent method return types
 

Dependencies

We will be using a total of 3 external crates:

Here’s a sample Cargo.toml:

# Cargo.toml
@@ -173,10 +238,12 @@ features = ["full"]
 
 [lib]
 proc-macro = true  # this is important!
-

We will be using an external proc-macro2 crate as well as an internal proc-macro crate. Not confusing at all!

+

We will be using an external proc-macro2 crate as well as an internal proc-macro crate. Not confusing at all!

The attribute macro

Drop this into src/lib.rs, to get the ball rolling.

-
// src/lib.rs
+
// src/lib.rs
 
 use proc_macro::TokenStream;  // 1
 use quote::quote;
@@ -190,17 +257,49 @@ proc-macro = true  # this is important!
fn generate_curry(parsed: ItemFn) -> proc_macro2::TokenStream {}

1. Imports

-

A TokenStream holds (hopefully valid) Rust code; this is the type of our input and output. Note that we are importing this type from proc_macro and not proc_macro2.

-

quote! from the quote crate is a macro that allows us to quickly produce TokenStreams. Much like the LISP quote procedure, you can use the quote! macro for symbolic transformations.

-

ItemFn from the syn crate holds the parsed TokenStream of a Rust function. parse_macro_input! is a helper macro provided by syn.

+

A TokenStream holds (hopefully valid) Rust code; this is the type of our input and output. Note that we are importing this type from proc_macro and not proc_macro2.

+

quote! from the quote crate is a macro that allows us to quickly produce TokenStreams. Much like the LISP quote procedure, you can use the quote! macro for symbolic transformations.

+

ItemFn from the syn crate holds the parsed TokenStream of a Rust function. parse_macro_input! is a helper macro provided by syn.

2. The lone export

-

Annotate the only pub of our crate with #[proc_macro_attribute]. This tells rustc that curry is a procedural macro, and allows us to use it as #[crate_name::curry] in other crates. Note the signature of the curry function. _attr is the TokenStream representing the attribute itself, item refers to the thing we slapped our macro into, in this case a function (like add). The return value is a modified TokenStream, this will contain our curried version of add.

+

Annotate the only pub of our crate with #[proc_macro_attribute]. This tells rustc that curry is a procedural macro, and allows us to use it as #[crate_name::curry] in other crates. Note the signature of the curry function. _attr is the TokenStream holding any arguments passed to the attribute (empty for a bare #[curry]), and item refers to the thing we slapped our macro onto, in this case a function (like add). The return value is a modified TokenStream; this will contain our curried version of add.

3. The helper macro

-

A TokenStream is a little hard to work with, which is why we have the syn crate, which provides types to represent Rust tokens. An RArrow struct to represent the return arrow on a function and so on. One of those types is ItemFn, that represents an entire Rust function. The parse_macro_input! automatically puts the input to our macro into an ItemFn. What a gentleman!

+

A TokenStream is a little hard to work with, which is why we have the syn crate, which provides types to represent Rust tokens: an RArrow struct to represent the return arrow on a function, and so on. One of those types is ItemFn, which represents an entire Rust function. The parse_macro_input! macro automatically parses the input to our macro into an ItemFn. What a gentleman!

4. Returning TokenStreams

-

We haven’t filled in generate_curry yet, but we can see that it returns a proc_macro2::TokenStream and not a proc_macro::TokenStream, so drop a .into() to convert it.

-

Let’s move on and fill in generate_curry. I would suggest keeping the documentation for syn::ItemFn and syn::Signature open.

-
// src/lib.rs
+

We haven’t filled in generate_curry yet, but we can see that it returns a proc_macro2::TokenStream and not a proc_macro::TokenStream, so drop a .into() to convert it.

+

Let’s move on and fill in generate_curry. I would suggest keeping the documentation for syn::ItemFn and syn::Signature open.

+
// src/lib.rs
 
 fn generate_curry(parsed: ItemFn) -> proc_macro2::TokenStream {
   let fn_body = parsed.block;      // function body
@@ -210,7 +309,9 @@ proc-macro = true  # this is important!
   let fn_args = sig.inputs;          // comma separated args
   let fn_return_type = sig.output;   // return type
 }
-

We are simply extracting the bits of the function, we will be reusing the original function’s visibility and name. Take a look at what syn::Signature can tell us about a function:

+

We are simply extracting the bits of the function; we will be reusing the original function’s visibility and name. Take a look at what syn::Signature can tell us about a function:

                       .-- syn::Ident (ident)
                       /
                  fn add(x: u32, y: u32) -> u32
@@ -220,42 +321,77 @@ syn::token::Fn --'            /               \       (output)
              Punctuated<FnArg, Comma> (inputs)

Enough analysis, let’s produce our first bit of Rust code.

Function Body

-

Recall that the body of a curried add should look like this:

-
return move |y| move |z| x + y + z;
+

Recall that the body of a curried add should look like this:

+
return move |y| move |z| x + y + z;

And in general:

-
return move |arg2| move |arg3| ... |argN| <function body here>
-

We already have the function’s body, provided by fn_body, in our generate_curry function. All that’s left to add is the move |arg2| move |arg3| ... stuff, for which we need to extract the argument identifiers (doc: Punctuated, FnArg, PatType):

-
// src/lib.rs
+
return move |arg2| move |arg3| ... move |argN| <function body here>
+

We already have the function’s body, provided by fn_body, in our generate_curry function. All that’s left to add is the move |arg2| move |arg3| ... stuff, for which we need to extract the argument identifiers (doc: Punctuated, FnArg, PatType):

+
// src/lib.rs
 use syn::punctuated::Punctuated;
 use syn::{parse_macro_input, FnArg, Pat, ItemFn, Block};
 
 fn extract_arg_idents(fn_args: Punctuated<FnArg, syn::token::Comma>) -> Vec<Box<Pat>> { 
-  return fn_args.into_iter().map(extract_arg_pat).collect::<Vec<_>>();
+  return fn_args.into_iter().map(extract_arg_pat).collect::<Vec<_>>();
 }
-

Alright, so we are iterating over function args (Punctuated is a collection that you can iterate over) and mapping an extract_arg_pat to every item. What’s extract_arg_pat?

-
// src/lib.rs
+

Alright, so we are iterating over function args (Punctuated is a collection that you can iterate over) and mapping an extract_arg_pat to every item. What’s extract_arg_pat?

+
// src/lib.rs
 
 fn extract_arg_pat(a: FnArg) -> Box<Pat> {
-  match a {
+  match a {
     FnArg::Typed(p) => p.pat,
     _ => panic!("Not supported on types with `self`!"),
   }
 }
-

FnArg is an enum type as you might have guessed. The Typed variant encompasses args that are written as name: type and the other variant, Receiver, refers to self types. Ignore those for now, keep it simple.

-

Every FnArg::Typed value contains a pat, which is in essence, the name of the argument. The type of the arg is accessible via p.ty (we will be using this later).

-

With that done, we should be able to write the codegen for the function body:

-
// src/lib.rs
+

FnArg is an enum type as you might have guessed. The Typed variant encompasses args that are written as name: type and the other variant, Receiver, refers to self types. Ignore those for now, keep it simple.

+

Every FnArg::Typed value contains a pat, which is in essence, the name of the argument. The type of the arg is accessible via p.ty (we will be using this later).
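
If the shape of FnArg is hard to picture without the syn docs open, here is a toy model of it using only the standard library. The Arg enum and both helper functions are invented stand-ins for illustration, not the real syn types, and they return a Result where the post's code panics.

```rust
// Toy stand-in for `syn::FnArg`; NOT the real syn type.
enum Arg {
    Typed { pat: String, ty: String }, // an argument written as `name: type`
    Receiver,                          // `self`, `&self`, and friends
}

// Mirrors the idea of `extract_arg_pat`: pull out the argument name.
fn extract_pat(a: &Arg) -> Result<&str, String> {
    match a {
        Arg::Typed { pat, .. } => Ok(pat.as_str()),
        Arg::Receiver => Err("not supported on functions taking `self`".to_string()),
    }
}

// Same match, other field: pull out the argument type.
fn extract_ty(a: &Arg) -> Result<&str, String> {
    match a {
        Arg::Typed { ty, .. } => Ok(ty.as_str()),
        Arg::Receiver => Err("not supported on functions taking `self`".to_string()),
    }
}

fn main() {
    let args = vec![
        Arg::Typed { pat: "x".into(), ty: "u32".into() },
        Arg::Typed { pat: "y".into(), ty: "u64".into() },
    ];
    let names: Result<Vec<&str>, String> = args.iter().map(extract_pat).collect();
    let types: Result<Vec<&str>, String> = args.iter().map(extract_ty).collect();
    assert_eq!(names, Ok(vec!["x", "y"]));
    assert_eq!(types, Ok(vec!["u32", "u64"]));
    assert!(extract_pat(&Arg::Receiver).is_err());
}
```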

+

With that done, we should be able to write the codegen for the function body:

+
// src/lib.rs
 
 fn generate_body(fn_args: &[Box<Pat>], body: Box<Block>) -> proc_macro2::TokenStream {
   quote! {
-    return #( move |#fn_args|  )* #body
+    return #( move |#fn_args|  )* #body
   }
 }
-

That is some scary looking syntax! Allow me to explain. The quote!{ ... } returns a proc_macro2::TokenStream, if we wrote quote!{ let x = 1 + 2; }, it wouldn’t create a new variable x with value 3, it would literally produce a stream of tokens with that expression.

-

The # enables variable interpolation. #body will look for body in the current scope, take its value, and insert it in the returned TokenStream. Kinda like quasi quoting in LISPs, you have written one.

-

What about #( move |#fn_args| )*? That is repetition. quote iterates through fn_args, and drops a move behind each one, it then places pipes (|), around it.

-

Let us test our first bit of codegen! Modify generate_curry like so:

-
// src/lib.rs
+

That is some scary looking syntax! Allow me to explain. The quote!{ ... } returns a proc_macro2::TokenStream. If we wrote quote!{ let x = 1 + 2; }, it wouldn’t create a new variable x with value 3; it would literally produce a stream of tokens for that statement.

+

The # enables variable interpolation. #body will look for body in the current scope, take its value, and insert it in the returned TokenStream. Kinda like quasi quoting in LISPs, if you have written one.

+

What about #( move |#fn_args| )*? That is repetition. quote iterates through fn_args, drops a move in front of each one, and places pipes (|) around it.
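
To see what that repetition expands to without pulling in quote, here is a rough imitation that works on plain strings. generate_body_text is a hypothetical helper; the real generate_body operates on tokens, not text.

```rust
// String-based imitation of `#( move |#fn_args| )* #body`.
// This is a sketch for intuition only; `quote!` produces tokens, not a String.
fn generate_body_text(fn_args: &[&str], body: &str) -> String {
    // One `move |arg| ` per argument, in order.
    let closures: String = fn_args
        .iter()
        .map(|arg| format!("move |{}| ", arg))
        .collect();
    format!("return {}{}", closures, body)
}

fn main() {
    let args = ["x", "y", "z"];
    // The first argument stays on the outer function, so skip it.
    let text = generate_body_text(&args[1..], "{ x + y + z }");
    assert_eq!(text, "return move |y| move |z| { x + y + z }");
    println!("{}", text);
}
```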

+

Let us test our first bit of codegen! Modify generate_curry like so:

+
// src/lib.rs
 
  fn generate_curry(parsed: ItemFn) -> TokenStream {
    let fn_body = parsed.block;
@@ -272,10 +408,11 @@ syn::token::Fn --'            /               \       (output)
 +  let curried_body = generate_body(&arg_idents[1..], fn_body.clone());
 +  println!("{}", curried_body);
 
-   return TokenStream::new();
+   return TokenStream::new();
  }

Add a little test to tests/:

-
// tests/smoke.rs
+
// tests/smoke.rs
 
 #[currying::curry]
 fn add(x: u32, y: u32, z: u32) -> u32 {
@@ -286,17 +423,23 @@ syn::token::Fn --'            /               \       (output)
 fn works() {
   assert!(true);
 }
-

You should find something like this in the output of cargo test:

+

You should find something like this in the output of cargo test:

return move | y | move | z | { x + y + z }

Glorious println! debugging!

Function signature

-

This section gets into the more complicated bits of the macro, generating type aliases and the function signature. By the end of this section, we should have a full working auto-currying macro!

-

Recall what our generated type aliases should look like, for our add function:

-
type T0 = u32;
+

This section gets into the more complicated bits of the macro, generating type aliases and the function signature. By the end of this section, we should have a fully working auto-currying macro!

+

Recall what our generated type aliases should look like, for our add function:

+
type T0 = u32;
 type T1 = impl Fn(u32) -> T0;
 type T2 = impl Fn(u32) -> T1;

In general:

-
type T0 = <return type>;
+
type T0 = <return type>;
 type T1 = impl Fn(<type of arg N>) -> T0;
 type T2 = impl Fn(<type of arg N - 1>) -> T1;
 .
@@ -308,42 +451,59 @@ syn::token::Fn --'            /               \       (output)
 
  • all our inputs (arguments)
  • the output (the return type)
  • -

    To fetch the types of all our inputs, we can simply reuse the bits we wrote to fetch the names of all our inputs! (doc: Type)

    -
    // src/lib.rs
    +

    To fetch the types of all our inputs, we can simply reuse the bits we wrote to fetch the names of all our inputs! (doc: Type)

    +
    // src/lib.rs
     
     use syn::{parse_macro_input, Block, FnArg, ItemFn, Pat, ReturnType, Type};
     
     fn extract_type(a: FnArg) -> Box<Type> {
    -  match a {
    +  match a {
         FnArg::Typed(p) => p.ty,  // notice `ty` instead of `pat`
           _ => panic!("Not supported on types with `self`!"),
       }
     }
     
     fn extract_arg_types(fn_args: Punctuated<FnArg, syn::token::Comma>) -> Vec<Box<Type>> {
    -  return fn_args.into_iter().map(extract_type).collect::<Vec<_>>();
    +  return fn_args.into_iter().map(extract_type).collect::<Vec<_>>();
     
     }
    -

    A good reader would have looked at the docs for output member of the syn::Signature struct. It has the type syn::ReturnType. So there is no extraction to do here right? There are actually a couple of things we have to ensure here:

    +

    A good reader would have looked at the docs for the output member of the syn::Signature struct. It has the type syn::ReturnType. So there is no extraction to do here, right? There are actually a couple of things we have to ensure here:

      -
    1. We need to ensure that the function returns! A function that does not return is pointless in this case, and I will tell you why, in the Notes section.

    -
    2. A ReturnType encloses the arrow of the return as well, we need to get rid of that. Recall:

      -
      type T0 = u32
      +
    1. We need to ensure that the function returns! A function that does not return is pointless in this case, and I will tell you why, in the Notes section.

    +
    2. A ReturnType encloses the arrow of the return as well, we need to get rid of that. Recall:

      +
      type T0 = u32
       // and not
       type T0 = -> u32
    -

    Here is the snippet that handles extraction of the return type (doc: syn::ReturnType):

    -
    // src/lib.rs
    +

    Here is the snippet that handles extraction of the return type (doc: syn::ReturnType):

    +
    // src/lib.rs
     
     fn extract_return_type(a: ReturnType) -> Box<Type> {
    -  match a {
    +  match a {
         ReturnType::Type(_, p) => p,
         _ => panic!("Not supported on functions without return types!"),
       }
     }
    -

    You might notice that we are making extensive use of the panic! macro. Well, that is because it is a good idea to quit on receiving an unsatisfactory TokenStream.

    -

    With all our types ready, we can get on with generating type aliases:

    -
    // src/lib.rs
    +

    You might notice that we are making extensive use of the panic! macro. Well, that is because it is a good idea to quit on receiving an unsatisfactory TokenStream.

    +

    With all our types ready, we can get on with generating type aliases:

    +
    // src/lib.rs
     
     use quote::{quote, format_ident};
     
    @@ -357,7 +517,7 @@ syn::token::Fn --'            /               \       (output)
       let mut type_aliases = vec![quote! { type #type_t0 = #fn_return_type  }];
     
       // 3
    -  for (i, t) in (1..).zip(fn_arg_types.into_iter().rev()) {
    +  for (i, t) in (1..).zip(fn_arg_types.into_iter().rev()) {
         let p = format_ident!("_{}_{}", fn_name, format!("T{}", i - 1));
         let n = format_ident!("_{}_{}", fn_name, format!("T{}", i));
     
    @@ -366,45 +526,70 @@ syn::token::Fn --'            /               \       (output)
         });
       }
     
    -  return type_aliases;
    +  return type_aliases;
     }

    1. The return value
    -We are returning a Vec<proc_macro2::TokenStream>, i. e., a list of TokenStreams, where each item is a type alias.

    +We are returning a Vec<proc_macro2::TokenStream>, i.e., a list of TokenStreams, where each item is a type alias.

    2. Format identifier?
    -I’ve got some explanation to do on this line. Clearly, we are trying to write the first type alias, and initialize our TokenStream vector with T0, because it is different from the others:

    -
    type T0 = something
    +I’ve got some explanation to do on this line. Clearly, we are trying to
    +write the first type alias, and initialize our TokenStream
    +vector with T0, because it is different from the
    +others:

    +
    type T0 = something
     // the others are of the form
     type Tr = impl Fn(something) -> something
    -

    format_ident! is similar to format!. Instead of returning a formatted string, it returns a syn::Ident. Therefore, type_t0 is actually an identifier for, in the case of our add function, _add_T0. Why is this formatting important? Namespacing.

    -

    Picture this, we have two functions, add and subtract, that we wish to curry with our macro:

    -
    #[curry]
    +

    format_ident! is similar to format!. Instead of returning a formatted string, it returns a syn::Ident. Therefore, type_t0 is actually an identifier for, in the case of our add function, _add_T0. Why is this formatting important? Namespacing.

    +

    Picture this, we have two functions, add and sub, that we wish to curry with our macro:

    +
    #[curry]
     fn add(...) -> u32 { ... }
     
     #[curry]
     fn sub(...) -> u32 { ... }

    Here is the same but with macros expanded:

    -
    type T0 = u32;
    +
    type T0 = u32;
     type T1 = impl Fn(u32) -> T0;
     fn add( ... ) -> T1 { ... }
     
     type T0 = u32;
     type T1 = impl Fn(u32) -> T0;
     fn sub( ... ) -> T1 { ... }
    -

    We end up with two definitions of T0! Now, if we do the little format_ident! dance we did up there:

    -
    type _add_T0 = u32;
    +

    We end up with two definitions of T0! Now, if we do the little format_ident! dance we did up there:

    +
    type _add_T0 = u32;
     type _add_T1 = impl Fn(u32) -> _add_T0;
     fn add( ... ) -> _add_T1 { ... }
     
     type _sub_T0 = u32;
     type _sub_T1 = impl Fn(u32) -> _sub_T0;
     fn sub( ... ) -> _sub_T1 { ... }
    -

    Voilà! The type aliases don’t tread on each other. Remember to import format_ident from the quote crate.

    +

    Voilà! The type aliases don’t tread on each other. Remember to import format_ident from the quote crate.

    3. The TokenStream Vector

    -

    We iterate over our types in reverse order (T0 is the last return, T1 is the second last, so on), assign a number to each iteration with zip, generate type names with format_ident, push a TokenStream with the help of quote and variable interpolation.

    -

    If you are wondering why we used (1..).zip() instead of .enumerate(), it’s because we wanted to start counting from 1 instead of 0 (we are already done with T0!).

    +

    We iterate over our types in reverse order (T0 is the last return, T1 is the second last, and so on), assign a number to each iteration with zip, generate type names with format_ident, and push a TokenStream with the help of quote and variable interpolation.

    +

    If you are wondering why we used (1..).zip() instead of .enumerate(), it’s because we wanted to start counting from 1 instead of 0 (we are already done with T0!).
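
    The numbering scheme is easier to check on plain strings. The sketch below mimics the loop with format! instead of format_ident! and quote!. The helper alias_lines is hypothetical, and it is assumed here that the caller passes only the argument types after the first one.

```rust
// String-based imitation of the alias generation loop.
// `alias_lines` is a made-up helper; the real code builds TokenStreams.
fn alias_lines(fn_name: &str, arg_types: &[&str], return_type: &str) -> Vec<String> {
    // T0 is special: it is just the return type.
    let mut out = vec![format!("type _{}_T0 = {}", fn_name, return_type)];

    // Walk the argument types back to front, counting from 1.
    for (i, t) in (1..).zip(arg_types.iter().rev()) {
        out.push(format!(
            "type _{f}_T{n} = impl Fn({t}) -> _{f}_T{p}",
            f = fn_name,
            n = i,
            t = t,
            p = i - 1
        ));
    }
    out
}

fn main() {
    // For `fn add(x: u32, y: u32, z: u32) -> u32`, only `y` and `z` need aliases.
    for line in alias_lines("add", &["u32", "u32"], "u32") {
        println!("{};", line);
    }
}
```

    With mixed argument types, the reversal becomes visible: the last argument's type ends up in T1.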

    Getting it together

    -

    I promised we’d have a fully working macro by the end of last section. I lied, we have to tie everything together in our generate_curry function:

    -
    // src/lib.rs
    +

    I promised we’d have a fully working macro by the end of last section. I lied, we have to tie everything together in our generate_curry function:

    +
    // src/lib.rs
     
      fn generate_curry(parsed: ItemFn) -> proc_macro2::TokenStream {
        let fn_body = parsed.block;
    @@ -428,20 +613,39 @@ I’ve got some explanation to do on this line. Clearly, we are trying to write
     
     +  let return_type = format_ident!("_{}_{}", &fn_name, format!("T{}", type_aliases.len() - 1));
     
    -+  return quote! {
    ++  return quote! {
     +      #(#type_aliases);* ;
     +      #vis fn #fn_name (#first_ident: #first_type) -> #return_type {
     +          #curried_body ;
     +      }
     +  };
      }
    -

    Most of the additions are self explanatory, I’ll go through the return statement with you. We are returning a quote!{ ... }, so a proc_macro2::TokenStream. We are iterating through the type_aliases variable, which you might recall, is a Vec<TokenStream>. You might notice the sneaky semicolon before the *. This basically tells quote, to insert an item, then a semicolon, and then the next one, another semicolon, and so on. The semicolon is a separator. We need to manually insert another semicolon at the end of it all, quote doesn’t insert a separator at the end of the iteration.

    -

    We retain the visibility and name of our original function. Our curried function takes as args, just the first argument of our original function. The return type of our curried function is actually, the last type alias we create. If you think back to our manually curried add function, we returned T2, which was in fact, the last type alias we created.

    -

    I am sure, at this point, you are itching to test this out, but before that, let me introduce you to some good methods of debugging proc-macro code.

    +

    Most of the additions are self-explanatory; I’ll go through the return statement with you. We are returning a quote!{ ... }, so a proc_macro2::TokenStream. We are iterating through the type_aliases variable, which, you might recall, is a Vec<TokenStream>. You might notice the sneaky semicolon before the *. This basically tells quote to insert an item, then a semicolon, then the next item, another semicolon, and so on. The semicolon is a separator. We need to manually insert another semicolon at the end of it all, because quote doesn’t insert a separator at the end of the iteration.
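
    The separator behaviour is the same as joining strings, which makes for a quick sanity check. The function join_with_separator below is a made-up helper that imitates #(#items);* on text; it is not part of quote.

```rust
// Imitates `#(#items);*` with strings: the separator goes BETWEEN items only.
// `join_with_separator` is a hypothetical helper for illustration.
fn join_with_separator(items: &[&str], sep: &str) -> String {
    let padded = format!(" {} ", sep);
    items.join(padded.as_str())
}

fn main() {
    let aliases = ["type A = u32", "type B = u64"];
    let joined = join_with_separator(&aliases, ";");
    // No separator after the last item...
    assert_eq!(joined, "type A = u32 ; type B = u64");
    // ...which is why a trailing `;` has to be added by hand.
    let complete = format!("{} ;", joined);
    assert_eq!(complete, "type A = u32 ; type B = u64 ;");
    println!("{}", complete);
}
```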

    +

    We retain the visibility and name of our original function. Our curried function takes as args, just the first argument of our original function. The return type of our curried function is actually, the last type alias we create. If you think back to our manually curried add function, we returned T2, which was in fact, the last type alias we created.

    +

    I am sure, at this point, you are itching to test this out, but before that, let me introduce you to some good methods of debugging proc-macro code.

    Debugging and Testing

    Install cargo-expand via:

    cargo install cargo-expand
    -

    cargo-expand is a neat little tool that expands your macro in places where it is used, and lets you view the generated code! For example:

    +

    cargo-expand is a neat little tool that expands your macro in places where it is used, and lets you view the generated code! For example:

    # create a bin package hello
     $ cargo new hello
     
    @@ -463,10 +667,18 @@ fn main() {
           ));
       };
     }
    -

    Writing proc-macros without cargo-expand is tantamount to driving a vehicle without rear view mirrors! Keep an eye on what is going on behind your back.

    -

    Now, your macro won’t always compile, you might just recieve the bee movie script as an error. cargo-expand will not work in such cases. I would suggest printing out your variables to inspect them. TokenStream implements Display as well as Debug. We don’t always have to be respectable programmers. Just print it.

    +

    Writing proc-macros without cargo-expand is tantamount to driving a vehicle without rear view mirrors! Keep an eye on what is going on behind your back.

    +

    Now, your macro won’t always compile, you might just receive the bee movie script as an error. cargo-expand will not work in such cases. I would suggest printing out your variables to inspect them. TokenStream implements Display as well as Debug. We don’t always have to be respectable programmers. Just print it.

    Enough of that, lets get testing:

    -
    // tests/smoke.rs
    +
    // tests/smoke.rs
     
     #![feature(type_alias_impl_trait)]
     
    @@ -479,55 +691,81 @@ fn main() {
     fn works() {
       assert_eq!(15, add(4)(5)(6));
     }
    -

    Run cargo +nightly test. You should see a pleasing message:

    +

    Run cargo +nightly test. You should see a pleasing message:

    running 1 test
     test tests::works ... ok
    -

    Take a look at the expansion for our curry macro, via cargo +nightly expand --tests smoke:

    -
    type _add_T0 = u32;
    +

    Take a look at the expansion for our curry macro, via cargo +nightly expand --tests smoke:

    +
    type _add_T0 = u32;
     type _add_T1 = impl Fn(u32) -> _add_T0;
     type _add_T2 = impl Fn(u32) -> _add_T1;
     fn add(x: u32) -> _add_T2 {
    -  return (move |y| {
    +  return (move |y| {
         move |z| {
    -      return x + y + z;
    +      return x + y + z;
         }
       });
     }
     
     // a bunch of other stuff generated by #[test] and assert_eq!

    A sight for sore eyes.

    -

    Here is a more complex example that generates ten multiples of the first ten natural numbers:

    -
    #[curry]
    +

    Here is a more complex example that generates ten multiples of the first ten natural numbers:

    +
    #[curry]
     fn product(x: u32, y: u32) -> u32 {
       x * y
     }
     
     fn multiples() -> Vec<Vec<u32>>{
       let v = (1..=10).map(product);
    -  return (1..=10)
    +  return (1..=10)
           .map(|x| v.clone().map(|f| f(x)).collect())
           .collect();
     }

    Notes

    -

    I didn’t quite explain why we use move |arg| in our closure. This is because we want to take ownership of the variable supplied to us. Take a look at this example:

    -
    let v = add(5);
    +

    I didn’t quite explain why we use move |arg| in our closure. This is because we want to take ownership of the variable supplied to us. Take a look at this example:

    +
    let v = add(5);
     let g;
     {
       let x = 5;
       g = v(x);
     }
     println!("{}", g(2));
    -

    Variable x goes out of scope before g can return a concrete value. If we take ownership of x by moveing it into our closure, we can expect this to work reliably. In fact, rustc understands this, and forces you to use move.

    -

    This usage of move is exactly why a curried function without a return is useless. Every variable we pass to our curried function gets moved into its local scope. Playing with these variables cannot cause a change outside this scope. Returning is our only method of interaction with anything beyond this function.

    +

    Variable x goes out of scope before g can return a concrete value. If we take ownership of x by moveing it into our closure, we can expect this to work reliably. In fact, rustc understands this, and forces you to use move.

    +

    This usage of move is exactly why a curried function without a return is useless. Every variable we pass to our curried function gets moved into its local scope. Playing with these variables cannot cause a change outside this scope. Returning is our only method of interaction with anything beyond this function.
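    The ownership argument can be tried out on stable Rust with a small sketch. It assumes a boxed inner closure in place of the type_alias_impl_trait aliases that the macro generates, so it is not the macro's actual expansion; add_boxed and call_with_short_lived_arg are hypothetical names.

```rust
// Stable-Rust stand-in for the curried `add` (an assumption for this
// sketch): the inner closure is boxed instead of being named through a
// `type_alias_impl_trait` alias.
fn add_boxed(x: u32) -> impl Fn(u32) -> Box<dyn Fn(u32) -> u32> {
    // `move` makes the outer closure own its copy of `x` ...
    move |y: u32| {
        // ... and the inner closure own copies of `x` and `y`,
        // so neither closure borrows a local that may disappear.
        Box::new(move |z: u32| x + y + z) as Box<dyn Fn(u32) -> u32>
    }
}

// Mirrors the scoped example from the notes: the inner `x` goes out of
// scope before `g` is finally called.
fn call_with_short_lived_arg() -> u32 {
    let v = add_boxed(5);
    let g;
    {
        let x = 5;
        g = v(x); // the value of `x` is moved (copied) into the closure
    }
    g(2)
}
```

    call_with_short_lived_arg() returns 12 even though the inner x no longer exists when g runs. Dropping the move keywords makes the closures borrow x and y instead, and rustc rejects the function because the returned closure could outlive them.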

    Conclusion

    -

    Currying may not seem to be all that useful. Curried functions are unwieldy in Rust because the standard library is not built around currying. If you enjoy the possibilities posed by currying, consider taking a look at Haskell or Scheme.

    -

    My original intention with peppe.rs was to post condensed articles, a micro blog, but this one turned out extra long.

    +

    Currying may not seem to be all that useful. Curried functions are unwieldy in Rust because the standard library is not built around currying. If you enjoy the possibilities posed by currying, consider taking a look at Haskell or Scheme.

    +

    My original intention with peppe.rs was to post condensed articles, a micro blog, but this one turned out extra long.

    Perhaps I should call it a ‘macro’ blog :)

    -
    +

      -
    1. https://doc.rust-lang.org/book/ch13-01-closures.html↩︎

    2. -
    3. caniuse.rs contains an indexed list of features and their status.↩︎

    4. +
    5. https://doc.rust-lang.org/book/ch13-01-closures.html↩︎

    6. +
    7. caniuse.rs contains an indexed list of features and their status.↩︎

    -- cgit v1.2.3