= Unambiguous types

This document defines clear, one-to-one mappings between primitive types in C
and Rust (and possibly other languages in the future). Its purpose is to
eliminate ambiguity in type widths, signedness, and binary representation
across platforms and languages.

Most of these mappings are obvious, but there are some nuances and gotchas with
Rust FFI (Foreign Function Interface).

For Git, the only header required to use these unambiguous types in C is
`git-compat-util.h`.

== Boolean types
[cols="1,1", options="header"]
|===
| C Type | Rust Type
| bool^1^       | bool
|===

== Integer types

In C, `<stdint.h>` (or an equivalent) must be included.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| uint8_t    | u8
| uint16_t   | u16
| uint32_t   | u32
| uint64_t   | u64

| int8_t     | i8
| int16_t    | i16
| int32_t    | i32
| int64_t    | i64
|===
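
As a minimal sketch of how these fixed-width mappings might look at an FFI
boundary, the following Rust function uses only types from the table above.
The function name `example_pack_u16_pair` and its C prototype are hypothetical
and are not part of Git.

```rust
// Hypothetical C prototype, for illustration only:
//     uint32_t example_pack_u16_pair(uint16_t hi, uint16_t lo);
//
// A real exported symbol would also need `#[no_mangle]`; it is omitted here
// so that the sketch stays independent of the Rust edition in use.
pub extern "C" fn example_pack_u16_pair(hi: u16, lo: u16) -> u32 {
    // Widening with `from` is lossless, so no truncating `as` cast is needed.
    (u32::from(hi) << 16) | u32::from(lo)
}

// Compile-time checks that the Rust types have the widths that the
// corresponding C types promise.
const _: () = assert!(std::mem::size_of::<u16>() == 2);
const _: () = assert!(std::mem::size_of::<u32>() == 4);
```

Because both sides name the width explicitly, neither the C nor the Rust
signature depends on what `int` or `long` happens to mean on a given target.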

== Floating-point types

Rust requires IEEE-754 semantics.
In C, that is typically true, but not guaranteed by the standard.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| float^2^      | f32
| double^2^     | f64
|===

== Size types

These types represent pointer-sized integers and are typically defined in
`<stddef.h>` or an equivalent header.

Size types should be used whenever pointer arithmetic is performed, e.g. when
indexing an array or describing the number of elements in memory.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| size_t^3^     | usize
| ptrdiff_t^4^  | isize
|===
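
The following sketch shows one way a length could cross the FFI boundary as
`size_t`/`usize`. The function `example_count_nonzero` and its C prototype are
hypothetical and only illustrate the mapping.

```rust
/// Hypothetical C prototype, for illustration only:
///     size_t example_count_nonzero(const uint32_t *values, size_t len);
///
/// # Safety
/// `values` must point to `len` readable, properly aligned `u32` elements,
/// unless `len` is 0.
pub unsafe extern "C" fn example_count_nonzero(values: *const u32, len: usize) -> usize {
    if values.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller guarantees `values` is valid for `len` elements.
    let slice = unsafe { std::slice::from_raw_parts(values, len) };
    // Both the element count and the result are `usize`, matching `size_t`.
    slice.iter().filter(|&&v| v != 0).count()
}
```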

== Character types

This is where C and Rust don't have a clean one-to-one mapping. A C `char` is
an 8-bit type whose signedness is implementation-defined: it is a distinct type
from both `signed char` and `unsigned char`, which causes problems with e.g.
`make DEVELOPER=1`. Rust's `char` type is a 32-bit value that holds a Unicode
scalar value, so it is not a byte type at all. Because a C `char` is the same
width as `u8`, a `char` that describes bytes in memory should be converted to
`uint8_t` in C (`u8` in Rust). If a C `char` is not describing bytes, then it
should be converted to a more accurate unambiguous type.

While you could specify `char` in the C code and `u8` in the Rust code, and it
would work across the FFI boundary, it is then less clear what the appropriate
type is. The bigger problem comes from code generation tools like cbindgen and
bindgen. When cbindgen sees `u8` in Rust it will generate `uint8_t` on the C
side, which will cause differ-in-signedness warnings/errors. Similarly, if
bindgen sees `char` on the C side it will generate `std::ffi::c_char`, which has
its own problems.
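
A minimal sketch of a byte buffer passed as `uint8_t`/`u8` rather than `char`
follows. The function `example_first_line_len` and its C prototype are
hypothetical names chosen for illustration.

```rust
/// Hypothetical C prototype, for illustration only:
///     size_t example_first_line_len(const uint8_t *buf, size_t len);
///
/// # Safety
/// `buf` must point to `len` readable bytes, unless `len` is 0.
pub unsafe extern "C" fn example_first_line_len(buf: *const u8, len: usize) -> usize {
    if buf.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller guarantees `buf` is valid for `len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(buf, len) };
    // `b'\n'` is a `u8`, so this comparison has the same type on every
    // target, unlike a comparison against `c_char`.
    bytes.iter().position(|&b| b == b'\n').unwrap_or(len)
}
```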

=== Notes
^1^ This is only true if stdbool.h (or equivalent) is used. +
^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the
platform/arch for C does not follow IEEE-754 then this equivalence does not
hold. Also, it's assumed that `float` is 32 bits and `double` is 64, but
there may be a strange platform/arch where even this isn't true. +
^3^ C also defines uintptr_t, but this should not be used in Git. +
^4^ C also defines ssize_t and intptr_t, but these should not be used in Git. +

== Problems with std::ffi::c_* types in Rust
TL;DR: They're not guaranteed to match C types for all possible C
compilers/platforms/architectures.

Only a few of Rust's C FFI types are considered safe and semantically clear to
use: +

* `c_void`
* `CStr`
* `CString`

Even then, they should be used sparingly, and only where the semantics match
exactly.

The `std::os::raw::c_*` types (which are deprecated) directly inherit the
problems of `core::ffi`, whose definitions change over time and seem to be a
best guess at the correct definition for a given platform/target. This probably
isn't a problem for the platforms that Rust currently supports, but can anyone
say that Rust got it right for all C compilers on all platforms/targets?

On top of all of that, we're targeting an older version of Rust, which doesn't
have the latest mappings.

To give an example, the definition of `c_long` changed between Rust 1.63.0
footnote:[https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#175-189[c_long in 1.63.0]]
and Rust 1.89.0
footnote:[https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#135-151[c_long in 1.89.0]],
as the two definitions below show.

=== Rust version 1.63.0

[source]
----
mod c_long_definition {
    cfg_if! {
        if #[cfg(all(target_pointer_width = "64", not(windows)))] {
            pub type c_long = i64;
            pub type NonZero_c_long = crate::num::NonZeroI64;
            pub type c_ulong = u64;
            pub type NonZero_c_ulong = crate::num::NonZeroU64;
        } else {
            // The minimal size of `long` in the C standard is 32 bits
            pub type c_long = i32;
            pub type NonZero_c_long = crate::num::NonZeroI32;
            pub type c_ulong = u32;
            pub type NonZero_c_ulong = crate::num::NonZeroU32;
        }
    }
}
----

=== Rust version 1.89.0

[source]
----
mod c_long_definition {
    crate::cfg_select! {
        any(
            all(target_pointer_width = "64", not(windows)),
            // wasm32 Linux ABI uses 64-bit long
            all(target_arch = "wasm32", target_os = "linux")
        ) => {
            pub(super) type c_long = i64;
            pub(super) type c_ulong = u64;
        }
        _ => {
            // The minimal size of `long` in the C standard is 32 bits
            pub(super) type c_long = i32;
            pub(super) type c_ulong = u32;
        }
    }
}
----

Even for the cases where C types are correctly mapped to Rust types via
`std::ffi::c_*`, there are still problems. Take `c_char` for example: on some
platforms it's `u8`, on others it's `i8`.

=== Subtraction underflow in debug mode

The following code will panic in debug on platforms that define c_char as u8,
but won't if it's an i8.

[source]
----
let mut x: std::ffi::c_char = 0;
x -= 1;
----
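
To make the difference concrete without depending on the build target, the
sketch below spells out both cases with explicit `u8` and `i8` and uses
`checked_sub`, which reports the out-of-range result as `None` instead of
panicking. The function names are hypothetical.

```rust
/// Subtracts one from an unsigned byte. `None` means the result would fall
/// below zero, which is the case that panics in a debug build.
pub fn example_dec_u8(x: u8) -> Option<u8> {
    x.checked_sub(1)
}

/// Subtracts one from a signed byte. `None` means the result would fall
/// below `i8::MIN`, so subtracting from zero is fine here.
pub fn example_dec_i8(x: i8) -> Option<i8> {
    x.checked_sub(1)
}
```

With an unambiguous type, the behavior is decided by the source code rather
than by the target that happens to be compiled for.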

=== Inconsistent shift behavior

`x` will be 0xC0 for platforms that use i8, but will be 0x40 where it's u8.

[source]
----
// The literal `0x80` does not fit in an `i8`, so start from a `u8` and cast.
let mut x = 0x80u8 as std::ffi::c_char;
x >>= 1;
----
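
The two outcomes can be reproduced on any target by naming the signedness
explicitly. This is only a sketch, and the function names are hypothetical.

```rust
/// Logical shift on an unsigned byte: the vacated high bit is filled with 0.
pub fn example_shr_unsigned(x: u8) -> u8 {
    x >> 1
}

/// Arithmetic shift on a signed byte: the sign bit is copied into the
/// vacated high bit. The casts only reinterpret the same 8 bits.
pub fn example_shr_signed(x: u8) -> u8 {
    ((x as i8) >> 1) as u8
}
```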

=== Equality fails to compile on some platforms

The following will not compile on platforms that define `c_char` as `i8`, but
will if it's `u8`. You can cast `x`, e.g. `assert_eq!(x as u8, b'a');`, but then
you get a warning on platforms that use `u8` and a clean compilation where `i8`
is used.

[source]
----
let x: std::ffi::c_char = 0x61;
assert_eq!(x, b'a');
----

== Enum types
Rust enum types should not be used as FFI types. Rust enum types are more like
C union types than C enums. Take something like:

[source]
----
#[repr(u8)]
enum Fruit {
    Apple,
    Banana,
    Cherry,
}
----

It's easy enough to make sure the Rust enum matches what C would expect, but
consider a more complex type like:

[source]
----
enum HashResult {
    SHA1([u8; 20]),
    SHA256([u8; 32]),
}
----

The Rust compiler has to add a discriminant to the enum to distinguish between
the variants. The width, location, and values of that discriminant are up to
the Rust compiler and are not ABI stable.
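
One possible way to avoid a compiler-chosen discriminant is to make the tag an
explicit fixed-width field of a `#[repr(C)]` struct. The sketch below is an
illustration under that assumption, not a layout prescribed by this document.
The names `ExampleHashResult`, `EXAMPLE_HASH_SHA1`, and `EXAMPLE_HASH_SHA256`,
as well as the tag values, are hypothetical.

```rust
// Hypothetical tag values; they are illustrative and not taken from Git.
pub const EXAMPLE_HASH_SHA1: u32 = 1;
pub const EXAMPLE_HASH_SHA256: u32 = 2;

// Hypothetical C counterpart:
//     struct example_hash_result {
//         uint32_t algo;
//         uint8_t bytes[32];
//     };
#[repr(C)]
pub struct ExampleHashResult {
    /// Explicit tag whose width and position are fixed by the declaration.
    pub algo: u32,
    /// Large enough for the longest digest; shorter digests use a prefix.
    pub bytes: [u8; 32],
}

impl ExampleHashResult {
    /// Returns the bytes that are valid for the tagged algorithm, or `None`
    /// if the tag is not one of the known values.
    pub fn digest(&self) -> Option<&[u8]> {
        match self.algo {
            EXAMPLE_HASH_SHA1 => Some(&self.bytes[..20]),
            EXAMPLE_HASH_SHA256 => Some(&self.bytes[..32]),
            _ => None,
        }
    }
}
```

The trade-off is that the struct always occupies the size of the largest
variant, and an unknown tag has to be handled at run time instead of being
ruled out by the type system.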