= Unambiguous types

Most of these mappings are obvious, but there are some nuances and gotchas with
Rust FFI (Foreign Function Interface).

This document defines clear, one-to-one mappings between primitive types in C,
Rust (and possibly other languages in the future). Its purpose is to eliminate
ambiguity in type widths, signedness, and binary representation across
platforms and languages.

For Git, the only header required to use these unambiguous types in C is
`git-compat-util.h`.

== Boolean types
[cols="1,1", options="header"]
|===
| C Type | Rust Type
| bool^1^ | bool
|===

== Integer types

In C, `<stdint.h>` (or an equivalent) must be included.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| uint8_t | u8
| uint16_t | u16
| uint32_t | u32
| uint64_t | u64

| int8_t | i8
| int16_t | i16
| int32_t | i32
| int64_t | i64
|===

== Floating-point types

Rust requires IEEE-754 semantics.
In C, that is typically true, but not guaranteed by the standard.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| float^2^ | f32
| double^2^ | f64
|===

== Size types

These types represent pointer-sized integers and are typically defined in
`<stddef.h>` or an equivalent header.

Size types should be used any time pointer arithmetic is performed, e.g.
indexing an array, describing the number of elements in memory, etc.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| size_t^3^ | usize
| ptrdiff_t^4^ | isize
|===

== Character types

This is where C and Rust don't have a clean one-to-one mapping.

A C `char` is an 8-bit type whose signedness is left to the implementation (it
is a distinct type from both `signed char` and `unsigned char`), which causes
problems with e.g. `make DEVELOPER=1`. Rust's `char` type is a 32-bit type that
holds a Unicode scalar value.

A C `char` is the same width as `u8`, so where it describes bytes in memory it
should be converted to `u8`. If a C `char` is not describing bytes, then it
should be converted to a more accurate unambiguous type.
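
As a minimal sketch of how the byte and size mappings above combine, a Rust
function meant to be called from C could take a `*const u8` and a `usize`,
matching `const uint8_t *` and `size_t` on the C side. The function name
`byte_sum` and its C prototype are hypothetical and exist only for
illustration.

[source,rust]
----
/// Hypothetical example: sums `len` bytes starting at `data`.
///
/// Assumed C prototype, using only the unambiguous types from the tables:
///     uint32_t byte_sum(const uint8_t *data, size_t len);
///
/// A real export would also need `#[no_mangle]` so that C can link against
/// the symbol; it is left out here to keep the sketch edition-independent.
///
/// # Safety
/// `data` must point to `len` readable bytes, unless `len` is 0.
pub unsafe extern "C" fn byte_sum(data: *const u8, len: usize) -> u32 {
    if data.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller promises that `data` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    // Each byte is widened to u32; the sum wraps instead of panicking.
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}
----

Because every parameter has a fixed width and signedness, neither side has to
guess how `char`, `int` or `long` are defined on the other.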

While you could specify `char` in the C code and `u8` in the Rust code, and
that would work across the FFI boundary, it is then less clear what the
appropriate type is. The bigger problem, however, comes from code generation
tools like cbindgen and bindgen. When cbindgen sees `u8` in Rust it generates
`uint8_t` on the C side, which causes "differ in signedness" warnings/errors.
Similarly, if bindgen sees `char` on the C side it generates
`std::ffi::c_char`, which has its own problems.

=== Notes
^1^ This is only true if stdbool.h (or equivalent) is used. +
^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the
platform/arch for C does not follow IEEE-754 then this equivalence does not
hold. Also, it's assumed that `float` is 32 bits and `double` is 64, but there
may be a strange platform/arch where even this isn't true. +
^3^ C also defines uintptr_t, but this should not be used in Git. +
^4^ C also defines intptr_t (and POSIX defines ssize_t), but these should not
be used in Git. +

== Problems with std::ffi::c_* types in Rust
TL;DR: They're not guaranteed to match C types for all possible C
compilers/platforms/architectures.

Only a few of Rust's C FFI types are considered safe and semantically clear to
use: +

* `c_void`
* `CStr`
* `CString`

Even then, they should be used sparingly, and only where the semantics match
exactly.

The std::os::raw::c_* types (which are deprecated) directly inherit the
problems of core::ffi, whose definitions change over time and seem to be a best
guess at the correct definition for a given platform/target. This is probably
fine for the platforms that Rust supports currently, but can anyone say that
Rust got it right for all C compilers of all platforms/targets?

On top of all of that, we're targeting an older version of Rust which doesn't
have the latest mappings.

To give an example, here is how c_long is defined in two Rust releases.
footnote:[https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#175-189[c_long in 1.63.0]]
footnote:[https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#135-151[c_long in 1.89.0]]

=== Rust version 1.63.0

[source]
----
mod c_long_definition {
    cfg_if! {
        if #[cfg(all(target_pointer_width = "64", not(windows)))] {
            pub type c_long = i64;
            pub type NonZero_c_long = crate::num::NonZeroI64;
            pub type c_ulong = u64;
            pub type NonZero_c_ulong = crate::num::NonZeroU64;
        } else {
            // The minimal size of `long` in the C standard is 32 bits
            pub type c_long = i32;
            pub type NonZero_c_long = crate::num::NonZeroI32;
            pub type c_ulong = u32;
            pub type NonZero_c_ulong = crate::num::NonZeroU32;
        }
    }
}
----

=== Rust version 1.89.0

[source]
----
mod c_long_definition {
    crate::cfg_select! {
        any(
            all(target_pointer_width = "64", not(windows)),
            // wasm32 Linux ABI uses 64-bit long
            all(target_arch = "wasm32", target_os = "linux")
        ) => {
            pub(super) type c_long = i64;
            pub(super) type c_ulong = u64;
        }
        _ => {
            // The minimal size of `long` in the C standard is 32 bits
            pub(super) type c_long = i32;
            pub(super) type c_ulong = u32;
        }
    }
}
----

Even for the cases where C types are correctly mapped to Rust types via
std::ffi::c_* there are still problems. Let's take c_char for example. On some
platforms it's u8, on others it's i8.

=== Subtraction underflow in debug mode

The following code will panic in debug on platforms that define c_char as u8,
but won't if it's an i8.

[source]
----
let mut x: std::ffi::c_char = 0;
x -= 1;
----

=== Inconsistent shift behavior

`x` will be 0xC0 for platforms that use i8, but will be 0x40 where it's u8.

[source]
----
// Cast from u8 so that the literal is also accepted where c_char is i8.
let mut x = 0x80_u8 as std::ffi::c_char;
x >>= 1;
----

=== Equality fails to compile on some platforms

The following will not compile on platforms that define c_char as i8, but will
if it's u8. You can cast x, e.g. `assert_eq!(x as u8, b'a');`, but then you get
a warning on platforms that use u8 and a clean compilation where i8 is used.

[source]
----
let mut x: std::ffi::c_char = 0x61;
assert_eq!(x, b'a');
----

== Enum types

Rust enum types should not be used as FFI types. Rust enum types are more like
C union types than C enums. For something like:

[source]
----
// A single primitive repr pins the discriminant to one byte.
#[repr(u8)]
enum Fruit {
    Apple,
    Banana,
    Cherry,
}
----

it's easy enough to make sure the Rust enum matches what C would expect, but
for a more complex type like:

[source]
----
enum HashResult {
    SHA1([u8; 20]),
    SHA256([u8; 32]),
}
----

the Rust compiler has to add a discriminant to the enum to distinguish between
the variants. The width, location, and values of that discriminant are up to
the Rust compiler and are not ABI stable.
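
One possible way to carry the same information across the FFI boundary without
a Rust enum is to spell out the discriminant by hand, using only the
unambiguous types from this document. The following is a minimal sketch under
that assumption, not an established Git convention: `HashResultFfi`, the
`HASH_ALGO_*` constants and the helper functions are hypothetical names
introduced for illustration.

[source,rust]
----
// Hypothetical tag values; a C header would mirror them as uint8_t constants.
pub const HASH_ALGO_SHA1: u8 = 1;
pub const HASH_ALGO_SHA256: u8 = 2;

/// FFI-friendly stand-in for `HashResult`: an explicit one-byte tag plus a
/// buffer wide enough for the largest digest.
///
/// Assumed C counterpart:
///     struct hash_result_ffi { uint8_t algo; uint8_t bytes[32]; };
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HashResultFfi {
    pub algo: u8,
    pub bytes: [u8; 32],
}

/// Wraps a SHA-1 digest; the unused tail of the buffer is zero-filled.
pub fn from_sha1(digest: [u8; 20]) -> HashResultFfi {
    let mut bytes = [0u8; 32];
    bytes[..20].copy_from_slice(&digest);
    HashResultFfi { algo: HASH_ALGO_SHA1, bytes }
}

/// Wraps a SHA-256 digest, which fills the whole buffer.
pub fn from_sha256(digest: [u8; 32]) -> HashResultFfi {
    HashResultFfi { algo: HASH_ALGO_SHA256, bytes: digest }
}

/// Returns the digest bytes that are in use, or `None` for an unknown tag.
pub fn digest(result: &HashResultFfi) -> Option<&[u8]> {
    match result.algo {
        HASH_ALGO_SHA1 => Some(&result.bytes[..20]),
        HASH_ALGO_SHA256 => Some(&result.bytes[..]),
        _ => None,
    }
}
----

Here the width, location, and values of the tag are chosen by the programmer
rather than by the Rust compiler, at the cost of having to validate the tag on
every read.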