Thomas Herzog

Creating C/C++ APIs in Rust

Posted at — May 15, 2019

Rust is an amazing language with an even better ecosystem. Many design decisions of Rust make it a great fit to add new functionality to existing C/C++ systems or gradually replace parts of those systems!

When I set out to make a C++ API for a Rust library, my first impression was that using C/C++ from Rust is better documented and a smoother experience than exposing Rust to C/C++.

As I found out, this is not necessarily the case! There are amazing tools to help you make C/C++ APIs. This post presents a tiny bit of my experience in using those tools and hopefully helps somebody with the same quest :)

Using C/C++ APIs in Rust

This is the most common case and also the most widespread use of Rust’s FFI system.

The easiest way to get started is to use the bindgen tool.

bindgen will create Rust code that binds to a given C or C++ API.

This works pretty well for C APIs but generating bindings for C++ APIs is limited. Most notably, inheritance comes with all kinds of troubles since many C++ compilers implement it in different ways and Rust is also limited in how it can mimic it.

Since this is about using Rust to expose functionality to C/C++ (not the other way around), bindgen will not be explored any further here.

Tools for writing C APIs in Rust

The less common case is that you want to make some functionality from Rust available inside of a C or C++ codebase.

The most polished (but tedious) workflow is to manually create a C API for the Rust code.

The tool of choice should be cbindgen, which scans the Rust crate (and optionally dependencies) for items that can be accessed through C.

Binding types with cbindgen

In order to have cbindgen generate C bindings for Rust types, the Rust types need to be pub and #[repr(C)] or #[repr(transparent)] (or #[repr(NUM_TYPE)] for enums).

Example:

#[repr(C)]
#[derive(Copy, Clone)]
pub struct NumPair {
    pub first: u64,
    pub second: usize,
}

This will generate C bindings similar to this:

typedef struct NumPair {
    uint64_t first;
    uintptr_t second;
} NumPair;

Even Rust enums can be bound to C! If #[repr(C)] is applied to an enum then a stable layout is used that can cleanly map to C (using a Tag-enum and a union type).
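As a small illustration of such an enum, here is a minimal sketch. The Shape type and the shape_area function are made up for this example and do not come from any real API.

```rust
/// A data-carrying enum with a C-compatible layout.
/// With #[repr(C)] the layout is defined as a tag enum followed by a
/// union of the variant payloads.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Shape {
    Circle { radius: f32 },
    Rect { width: f32, height: f32 },
    Empty,
}

/// Hypothetical exported function that takes the enum by value.
#[no_mangle]
pub extern "C" fn shape_area(shape: Shape) -> f32 {
    match shape {
        Shape::Circle { radius } => std::f32::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
        Shape::Empty => 0.0,
    }
}
```

On the C side the caller would read the tag first and then access the matching union member.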

Binding functions with cbindgen

Functions have similar requirements to be picked up by cbindgen as types:

A function to be bound needs to be:

- pub
- extern "C"
- #[no_mangle]

If all those conditions are met then a C function declaration will be emitted.

Example:

#[no_mangle]
pub extern "C" fn process_pair(pair: NumPair) -> f64 {
    (pair.first as f64 * pair.second as f64) + 4.2
}

will generate the C function declaration similar to this:

double process_pair(NumPair pair);

Exported functions do not need to be unsafe, but as soon as any kind of C-pointer/Rust-reference passing is involved a function should be marked as unsafe.
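A minimal sketch of such an unsafe export, reusing the NumPair type from above. The function name sum_first is hypothetical, and the null check and wrapping addition are choices made for this sketch so that the function never panics.

```rust
#[repr(C)]
#[derive(Copy, Clone)]
pub struct NumPair {
    pub first: u64,
    pub second: usize,
}

/// Sums the `first` fields of `len` pairs starting at `pairs`.
///
/// # Safety
/// `pairs` must be null or point to `len` valid, initialized `NumPair`s
/// that stay alive for the duration of the call.
#[no_mangle]
pub unsafe extern "C" fn sum_first(pairs: *const NumPair, len: usize) -> u64 {
    if pairs.is_null() || len == 0 {
        return 0;
    }
    // The caller promised that the pointer/length pair is valid.
    let slice = unsafe { std::slice::from_raw_parts(pairs, len) };
    // Wrapping addition avoids an overflow panic inside an exported function.
    slice.iter().fold(0u64, |acc, p| acc.wrapping_add(p.first))
}
```

Marking the function unsafe documents that the compiler cannot check the pointer and that the caller has to uphold the contract in the Safety section.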

Personal experience when exposing multiple Rust functions

In the beginning I used cbindgen to make a C++ API (by creating a C++ class that calls the generated C functions internally). It works nicely, but writing the binding code by hand gets annoying and time-consuming pretty fast. That said, it’s the best option when creating a C API.

For exposing only types to C, cbindgen is really painless: you simply annotate the type and that’s it. 👍

Tools for writing C++ APIs in Rust

Supporting some kind of C++ FFI is generally very challenging to do for a programming language, as C++ name mangling, calling conventions, layout of classes, layout of vtables and other things often differ between compilers.

Because of this, Rust does not have a native way to expose functionality to C++, but both C++ and Rust support C!

Actually, cbindgen also supports outputting data types in C++ style, with templates and all! This is really handy, as types that have generic parameters do not end up with long names like Transform2D_ObjectSpace_WorldSpace but use templates instead.
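To make this concrete, here is a sketch of a generic type that would be a candidate for template output. Point2D and point_scale are hypothetical names. The exact header text depends on the cbindgen version and configuration, so no generated output is shown.

```rust
/// Generic, C-compatible 2D point. `T` is expected to be a C-compatible
/// type itself (f32, i32, ...).
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Point2D<T> {
    pub x: T,
    pub y: T,
}

/// Exported functions cannot be generic, so a concrete instantiation
/// of the generic type is used in the signature.
#[no_mangle]
pub extern "C" fn point_scale(p: Point2D<f32>, factor: f32) -> Point2D<f32> {
    Point2D {
        x: p.x * factor,
        y: p.y * factor,
    }
}
```

With the output language set to C++, the struct would be expected to appear as a template, while plain C output needs one named struct per instantiation.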

Another handy tool for making C++ APIs in Rust is the cpp crate.

The cpp crate allows you to embed C++ code inside of Rust code using the cpp! macro. It does this by taking all the in-line C++ code and writing it into a separate .cpp file which will be compiled into the resulting object code of the Rust crate.

cbindgen and cpp complement each other nicely. cbindgen can be used to make your data types accessible and the cpp crate makes it easy to create binding functions which use those types.

What makes this crate useful is that it also lets you embed Rust code inside of the in-line C++ using a “pseudo-macro” rust!(). Any Rust code inside of the rust!() macro will be placed in an extern "C" function, and a call to this new function is added in the C++ code. This means that calling C++ from Rust and Rust from C++ look like fancy versions of closures or blocks.

cpp!{{
#include <stdio.h>
}}

fn add(a: i32, b: i32) -> i32 {
    // cpp! closures call into C++, so they have to sit in an unsafe block.
    unsafe {
        cpp!([a as "int32_t", b as "int32_t"] -> i32 as "int32_t" {
            printf("adding %d and %d\n", a, b);
            return a + b;
        })
    }
}

cpp!{{
void call_rust() {
    rust!(cpp_call_rust [] {
        println!("This is in Rust!");
    });
}
}}

I think cpp was made primarily to use C++ from inside of Rust, but the rust! pseudo-macro is nice for low-boilerplate cross-language bindings! The only “overhead” is that parameter- and return-types have to be declared in both languages and the Rust pseudo-macro needs a unique name for the generated extern "C" function.

When having to choose between making a C++ class by creating a C API using cbindgen and manually wrapping C calls, or creating the C++ class using cpp, I would choose cpp as the code stays mostly in one place and is easier to update.

Guidelines for creating C++ APIs in Rust

While creating C/C++ APIs for Rust libraries, a few patterns have emerged for me.

Project setup and planning

Generally, the FFI surface should be kept as small as possible. For example, while it is possible to write all code inside of rust!() macros when creating a C++ class, this is not advisable as memory safety issues can creep in slowly. Also the compiler is not happy to have to do so many complex macro expansions (Hello there, #![recursion_limit = "4096"]!).

Instead, a mostly idiomatic Rust API should be created. The FFI code should be separated as much as possible. A few things might still be good to leak into the Rust API, such as trying to maximize the use of Copy types (one such way to achieve this is by creating a “storage” system where resources can be referenced by indices instead of passing complex data around).
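One possible shape for such a “storage” system is sketched below. Storage and Handle are hypothetical names. The sketch never reuses slots, so a stale handle simply yields None. A real implementation might add generation counters and slot reuse.

```rust
/// Copy-able, C-compatible handle that refers to a slot in `Storage`.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Handle {
    pub index: u32,
}

/// Owns the resources; only handles cross the FFI boundary.
pub struct Storage<T> {
    slots: Vec<Option<T>>,
}

impl<T> Storage<T> {
    pub fn new() -> Self {
        Storage { slots: Vec::new() }
    }

    /// Stores `value` and returns a handle to it.
    pub fn insert(&mut self, value: T) -> Handle {
        self.slots.push(Some(value));
        Handle { index: (self.slots.len() - 1) as u32 }
    }

    /// Looks up a resource; unknown or removed handles give `None`.
    pub fn get(&self, handle: Handle) -> Option<&T> {
        self.slots.get(handle.index as usize)?.as_ref()
    }

    /// Removes a resource and returns it, leaving the slot empty.
    pub fn remove(&mut self, handle: Handle) -> Option<T> {
        self.slots.get_mut(handle.index as usize)?.take()
    }
}
```

The C/C++ side then only ever sees a small integer wrapper, and every lookup is bounds-checked on the Rust side.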

One pattern that arose in most of my projects that expose Rust functionality is to have an ffi directory (or a separate FFI crate) which contains the C/C++ headers and the “Rust FFI surface” code. Any header files are copied into target/include/projectname/ via a build script so the “receiving” end can easily include them.
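The header-copying step could look roughly like the helper below, which uses only the standard library. The function name copy_headers and the paths in the trailing comment are placeholders, not the layout of any particular project.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies every `.h`/`.hpp` file directly inside `src_dir` into `dst_dir`,
/// creating `dst_dir` if needed. Returns the number of files copied.
pub fn copy_headers(src_dir: &Path, dst_dir: &Path) -> io::Result<usize> {
    fs::create_dir_all(dst_dir)?;
    let mut copied = 0;
    for entry in fs::read_dir(src_dir)? {
        let path = entry?.path();
        let is_header = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext == "h" || ext == "hpp");
        if is_header && path.is_file() {
            if let Some(name) = path.file_name() {
                fs::copy(&path, dst_dir.join(name))?;
                copied += 1;
            }
        }
    }
    Ok(copied)
}

// In build.rs this might be called as follows (paths are placeholders):
//
// fn main() {
//     copy_headers(Path::new("src/ffi"), Path::new("target/include/projectname"))
//         .expect("failed to copy headers");
//     println!("cargo:rerun-if-changed=src/ffi");
// }
```

The rerun-if-changed line asks Cargo to run the build script again when the header directory changes.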

This “Rust FFI surface code” receives the data passed from C/C++ and creates the idiomatic Rust data from it, then passes control to the “internal” Rust implementation.

src/lib.rs

use log::info;

mod ffi;

#[derive(Default)]
pub struct Adder {
    count: i64,
}

impl Adder {
    pub fn add(&mut self, value: i64) {
        info!("Adder::add()");
        self.count += value;
    }
    
    pub fn tell(&self) -> i64 {
        info!("Adder::tell()");
        self.count
    }
}

src/ffi/adder.hpp

#ifndef ADDER_HPP
#define ADDER_HPP

#include <cstdint>

class Adder {
	void *internal;
public:
	Adder();
	~Adder();
	void add(int64_t value);
	int64_t tell() const;
};

#endif

src/ffi/mod.rs

// src/ffi/mod.rs
use cpp::cpp;

use crate::Adder;

cpp!{{
// Assumption: the directory containing adder.hpp is on the include path.
#include "adder.hpp"

Adder::Adder() {
    this->internal =
        rust!(Adder_constructor [] -> *mut Adder as "void *" {
            let b = Box::new(Adder::default());
            Box::into_raw(b)
        });
}

Adder::~Adder() {
    rust!(Adder_destructor [internal: *mut Adder as "void *"] {
        let _b = unsafe {
            Box::from_raw(internal)
        };
    });
}

void Adder::add(int64_t value) {
    rust!(Adder_add [
        internal: &mut Adder as "void *",
        value: i64 as "int64_t"
    ] {
        internal.add(value);
    });
}

int64_t Adder::tell() const {
    return rust!(Adder_tell [
        internal: &Adder as "void *"
    ] -> i64 as "int64_t" {
        internal.tell()
    });
}

}}
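For comparison, the same Adder could also be exposed through a hand-written C API that cbindgen picks up, with the C++ class then wrapping these calls. This is only a sketch: the adder_* function names are hypothetical, and the Adder below is a copy of the one in src/lib.rs without logging so that the example stands alone.

```rust
#[derive(Default)]
pub struct Adder {
    count: i64,
}

impl Adder {
    pub fn add(&mut self, value: i64) {
        self.count += value;
    }

    pub fn tell(&self) -> i64 {
        self.count
    }
}

/// Allocates an Adder on the heap and hands ownership to the caller.
#[no_mangle]
pub extern "C" fn adder_new() -> *mut Adder {
    Box::into_raw(Box::new(Adder::default()))
}

/// # Safety
/// `adder` must be null or come from `adder_new`, and must not be used
/// after this call.
#[no_mangle]
pub unsafe extern "C" fn adder_free(adder: *mut Adder) {
    if !adder.is_null() {
        drop(unsafe { Box::from_raw(adder) });
    }
}

/// # Safety
/// `adder` must be null or a live pointer from `adder_new`.
#[no_mangle]
pub unsafe extern "C" fn adder_add(adder: *mut Adder, value: i64) {
    if let Some(adder) = unsafe { adder.as_mut() } {
        adder.add(value);
    }
}

/// # Safety
/// `adder` must be null or a live pointer from `adder_new`.
#[no_mangle]
pub unsafe extern "C" fn adder_tell(adder: *const Adder) -> i64 {
    match unsafe { adder.as_ref() } {
        Some(adder) => adder.tell(),
        None => 0,
    }
}
```

Compared to the cpp version, the Rust side stays free of C++ code, but the C++ constructor, destructor and methods have to be written by hand in a separate file.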

Challenges

An FFI “surface” generally consists of two parts: passing control (calling functions across the boundary) and passing data.

Passing control between the caller and callee is a “solved problem” ™️. Usually not many problems arise when calling functions: the C FFIs of most languages are pretty mature and handle ABIs and mangling correctly.

One exception to this is panicking in Rust. If code panics and unwinds across an FFI boundary, your pants might be eaten. The easiest way to solve this is by setting panic = "abort" for the relevant profiles inside the Cargo.toml file.
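Where aborting is not acceptable, a different technique is to catch the panic at the boundary with std::panic::catch_unwind and report it as an error code. This is a sketch of that alternative, not the approach described above. The status constants and the checked_divide function are hypothetical, and it only works while panics unwind (that is, without panic = "abort").

```rust
use std::panic;

/// Hypothetical status codes returned to the C/C++ caller.
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANIC: i32 = -1;

/// Plain Rust function that may panic (for example on division by zero).
fn divide(a: i32, b: i32) -> i32 {
    a / b
}

/// Exported wrapper: catches an unwinding panic and turns it into a
/// status code instead of letting it cross the FFI boundary.
///
/// # Safety
/// `out` must be null or point to writable memory for one `i32`.
#[no_mangle]
pub unsafe extern "C" fn checked_divide(a: i32, b: i32, out: *mut i32) -> i32 {
    match panic::catch_unwind(|| divide(a, b)) {
        Ok(value) => {
            if !out.is_null() {
                unsafe { *out = value };
            }
            STATUS_OK
        }
        Err(_) => STATUS_PANIC,
    }
}
```

The default panic hook still prints the panic message to stderr; only the unwinding is stopped at the wrapper.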

The more critical part to get right is the data interface and making sure that concepts map properly from one language into another and any “reinterpretation” of data is safe.

Trying to cast a pointer given from the FFI into a reference can result in undefined behavior and possibly use-after-free, so it should be avoided unless you are 100% sure that the lifetimes match up. While not always bringing the best performance, using owned data (or Copy types) greatly simplifies guaranteeing memory safety.
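A minimal sketch of the owned-data approach for strings: the input is copied into a String right away, and the result is handed back as a heap allocation that the caller owns and later returns for freeing. The greeting_new and greeting_free names are hypothetical.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Builds a greeting from `name` and returns it as a C string owned by
/// the caller. Returns null on null input or invalid UTF-8.
///
/// # Safety
/// `name` must be null or a valid NUL-terminated string.
#[no_mangle]
pub unsafe extern "C" fn greeting_new(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return std::ptr::null_mut();
    }
    let c_name = unsafe { CStr::from_ptr(name) };
    // Copy immediately; nothing borrowed from the caller is kept around.
    let owned: String = match c_name.to_str() {
        Ok(s) => s.to_owned(),
        Err(_) => return std::ptr::null_mut(),
    };
    match CString::new(format!("Hello, {}!", owned)) {
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Frees a string returned by `greeting_new`.
///
/// # Safety
/// `s` must be null or a pointer from `greeting_new`, freed at most once.
#[no_mangle]
pub unsafe extern "C" fn greeting_free(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}
```

The extra copy costs a little performance, but no Rust reference ever depends on how long the C/C++ side keeps its buffer alive.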

One pattern that emerged in my code is to have two versions of complex data types: the Rust-internal one and a *Ffi version.

use std::os::raw::c_char;
use std::ffi::CStr;
use std::str::Utf8Error;

pub struct Person {
    pub name: String,
    pub favorite_birds: Vec<String>,
}

#[repr(C)]
#[derive(Copy, Clone)]
pub struct PersonFfi {
    pub name: *const c_char,
    pub favorite_birds_data: *const *const c_char,
    pub favorite_birds_len: usize,
}

impl Person {
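    /// Builds an owned Person by copying all strings out of the FFI struct.
    ///
    /// # Safety
    /// All pointers in `p` must be non-null and point to valid,
    /// NUL-terminated strings; `favorite_birds_data` must point to
    /// `favorite_birds_len` such pointers.
    pub unsafe fn from_ffi(p: PersonFfi) -> Result<Self, Utf8Error> {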
        use std::slice::from_raw_parts;
    
        unsafe fn ptr_to_string(
            ptr: *const c_char,
        ) -> Result<String, std::str::Utf8Error> {
            let cstr = CStr::from_ptr(ptr);
            
            Ok(cstr.to_str()?.to_string())
        }
        
        let name = ptr_to_string(p.name)?;
        
        let favorite_birds = {
            let slice = from_raw_parts(
                p.favorite_birds_data,
                p.favorite_birds_len,
            );
            let mut res = Vec::with_capacity(p.favorite_birds_len);
            for bird in slice {
                let name = ptr_to_string(*bird)?;
                res.push(name);
            }
            res
        };

        Ok(Person {
            name,
            favorite_birds,
        })
    }
}

These *Ffi types should reside inside the src/ffi directory (or FFI crate).

I personally found that if the FFI code lives inside the same crate as the “main logic” then another challenge can be to know what items should be pub, pub(crate) or hidden. If all access to your code is via an FFI that lives in the same crate, technically everything can be pub(crate) since no outside crate will ever access any items.

Using pub then becomes a means of getting nice rustdoc documentation for your Rust code, but it also means that warnings such as “unused function” are no longer reported.

Summary

Rust is a good fit for creating C or C++ APIs: the language requires no special runtime, functions are easily callable through a C FFI, and user-created types (and primitives) easily translate into C-compatible types.

There are nice tools such as bindgen, cbindgen and cpp which help reduce boilerplate and automate error-prone binding of types and functions.

When binding a Rust library to C/C++, a clear separation between the core logic and the FFI layer should exist. In the best case the FFI code lives in a separate crate so that designing the Rust API is not affected too much by the presence of an FFI and choosing visibility modifiers becomes easier.

When exposing data types to an FFI, Copy types are the nicest to work with, and for non-Copy types owned data should be preferred over borrowed data.

If there are any other tools available that I missed that try to solve the same problems, please let me know!