Inline Function Optimizations in C

This post, and hopefully future ones, will provide useful examples with explanations and demonstrate reasons to trust the compiler. I’m writing it for newer developers, but there could be value for others. You may want to explore Godbolt (Compiler Explorer) before continuing; using it to follow along will be helpful.

This post will show you how function-like macros can go wrong and how inlined functions can be a good alternative.

Inline Functions

I dislike C’s macros and use them sparingly, which means almost never for math. I’d much rather use functions and suggest that the compiler inline them. Functions offer a few advantages that I became very grateful for after having to debug macro-laden code:

  • Functions can better and more easily enforce types.
  • You can turn off the optimizer and step through them easily.
  • You can benefit from your IDE’s function-related behavior, such as showing parameter type information.
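
As a small illustration of the first point, here is a sketch of my own (velocity_t, GET_Y_MACRO and get_y are hypothetical names, not part of the code at the bottom). The macro accepts anything that has a member named y, while the function's signature only admits a point_yx_t.

```c
#include <stdint.h>

typedef struct point_yx {
  uint32_t y;
  uint32_t x;
} point_yx_t;

/* Hypothetical unrelated type that also happens to have a member named y. */
typedef struct velocity {
  int32_t y;
  int32_t x;
} velocity_t;

/* Macro: works on anything with a .y member; no type is checked. */
#define GET_Y_MACRO(p) ((p).y)

/* Function: the parameter type is part of the signature. */
static inline uint32_t get_y(point_yx_t p) { return p.y; }

/* Returns 1 when the macro and the function agree for a real point. */
static inline int point_accessors_agree(void) {
  const point_yx_t pt = {.y = 5, .x = 10};
  return GET_Y_MACRO(pt) == get_y(pt);
}

/* The macro silently accepts the unrelated struct as well. */
static inline int32_t macro_accepts_velocity(void) {
  const velocity_t v = {.y = -3, .x = 4};
  /* get_y(v) would be rejected at compile time: incompatible argument type. */
  return GET_Y_MACRO(v);
}
```

Uncommenting a get_y(v) call inside macro_accepts_velocity() should produce a compile error, which is exactly the kind of help a macro cannot give you.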

The code used in this post is at the bottom. Paste it into Godbolt if you want to follow along. It also contains macro versions of the functions to demonstrate that the same assembly can be output.

The compiler used is armv7-a clang 11.0.0 with -O2. You may also want a preprocessor output tab. I have placed comments in the assembly to make clear which C code maps to which instructions.

Macro Hazards

Macros are often used to define constants, which is perfectly fine. There’s nothing wrong with this:

#define KONAMI 573

Problems can arise when using macros to perform calculations, whether at run time or compile time, and also when chaining macros together. Because the preprocessor performs plain text substitution wherever a macro is used, the placement of parentheses becomes even more important than usual, and arguments with side effects can be evaluated more than once. The example below is contrived but illustrates what can go wrong.

#define SQUARE(x) ((x) * (x))
#define UNSAFE_SQUARE(x) x * x

printf("2 squared = %u\n", SQUARE(2));
printf("2 squared = %u\n", SQUARE(2 * SQUARE(3/2)));
printf("2 squared = %u\n", UNSAFE_SQUARE(2 * UNSAFE_SQUARE(3/2)));

// preprocessor output
printf("2 squared = %u\n", ((2) * (2)));
printf("2 squared = %u\n", ((2 * ((3/2) * (3/2))) * (2 * ((3/2) * (3/2)))));
printf("2 squared = %u\n", 2 * 3/2 * 3/2 * 2 * 3/2 * 3/2);

ldr     r7, .LCPI0_0
mov     r1, #4 ;  ((2) * (2)));
mov     r0, r7
bl      printf

mov     r0, r7
mov     r1, #4 ; ((2 * ((3/2) * (3/2))) * (2 * ((3/2) * (3/2)))));
bl      printf

mov     r0, r7
mov     r1, #18 ; 2 * 3/2 * 3/2 * 2 * 3/2 * 3/2);
mov     r8, #18
bl      printf

2 * 2 = 4; that makes sense.

((2 * ((3/2) * (3/2))) * (2 * ((3/2) * (3/2)))) == ((2 * ((1) * (1))) * (2 * ((1) * (1)))) == (2 * 2) = 4, because integer division truncates 3/2 to 1; not nice to look at but makes sense.

In the third example, the grouping implied by the macro invocations is gone: UNSAFE_SQUARE wraps neither its argument nor its result in parentheses, and the preprocessor only substitutes text. With no parentheses left, * and / are evaluated strictly left to right, so the expression no longer evaluates inside-out and yields 18 instead of 4.
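
To see where 18 comes from, here is the expanded expression worked one operator at a time. This is only a sketch (the function name is mine); it mirrors what the compiler folds at compile time.

```c
/* Evaluates 2 * 3/2 * 3/2 * 2 * 3/2 * 3/2 one operator at a time.
   * and / share a precedence level and associate left to right. */
static inline int unsafe_square_expansion_steps(void) {
  int v = 2;
  v = v * 3; /* 6  */
  v = v / 2; /* 3  */
  v = v * 3; /* 9  */
  v = v / 2; /* 4  (9 / 2 truncates) */
  v = v * 2; /* 8  */
  v = v * 3; /* 24 */
  v = v / 2; /* 12 */
  v = v * 3; /* 36 */
  v = v / 2; /* 18 */
  return v;
}
```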

Again, this example is contrived but this type of issue in real-world code can be annoying to debug.
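
For comparison, here is a sketch of a function replacement for SQUARE, plus the other classic macro problem: an argument with a side effect is evaluated once per appearance in the macro body. square_u32, next_value and the two counting helpers are hypothetical names used only for this illustration.

```c
#include <stdint.h>

#define SQUARE(x) ((x) * (x))

/* Function version: the argument is evaluated exactly once. */
static inline uint32_t square_u32(uint32_t x) { return x * x; }

/* Hypothetical helper with a visible side effect: it counts its calls. */
static uint32_t call_count = 0;
static uint32_t next_value(void) {
  call_count++;
  return 3;
}

/* Number of times next_value() runs when passed through the macro. */
static uint32_t calls_via_macro(void) {
  call_count = 0;
  (void)SQUARE(next_value()); /* expands to ((next_value()) * (next_value())) */
  return call_count;
}

/* Number of times next_value() runs when passed through the function. */
static uint32_t calls_via_function(void) {
  call_count = 0;
  (void)square_u32(next_value());
  return call_count;
}
```

With the function, nesting also behaves the way it reads: square_u32(2 * square_u32(3 / 2)) evaluates inside-out regardless of how the body is written.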

Inline Functions to Constants

The compiler can inline a function on its own initiative or at your suggestion via the inline keyword (as shown in the code example). The keyword is only a hint, so the compiler may still decline to inline the function. In my experience, functions not marked inline are often inlined anyway when there is only one reference to them. You can check by disassembling the output yourself or by looking in Godbolt; I recommend the latter when possible.
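
If you want to experiment with how strongly you can push the compiler either way, the sketch below contrasts the standard hint with two GCC/Clang function attributes, always_inline and noinline. The function names and the FORCE_INLINE / NEVER_INLINE wrappers are my own; the attributes are compiler extensions, so the fallback branch is an assumption about what you would want elsewhere.

```c
#include <stdint.h>

/* Standard C: inline is a hint that the compiler is free to ignore. */
static inline uint32_t half_hint(uint32_t v) { return v / 2; }

#if defined(__GNUC__) /* also defined by Clang */
/* Extension: inline regardless of the optimizer's usual heuristics. */
#define FORCE_INLINE static inline __attribute__((always_inline))
/* Extension: keep a real call, handy when comparing assembly. */
#define NEVER_INLINE static __attribute__((noinline))
#else
/* Fallback for other compilers: plain hints only. */
#define FORCE_INLINE static inline
#define NEVER_INLINE static
#endif

FORCE_INLINE uint32_t half_forced(uint32_t v) { return v / 2; }
NEVER_INLINE uint32_t half_call(uint32_t v) { return v / 2; }
```

All three return the same value; the difference only shows up in the disassembly, where half_call() should remain a branch to a separate function.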

Let’s go through the following code which returns the center-point of a statically allocated 2D array.

  • There is a function that returns the center-point of the array.
  • The function uses constant array dimensions to return the center point.
  • The compiler knows that center_pt does not change between its initialization and the point where its members are passed to printf(), so constant values are passed.
  • The argument to get_my_offset() is a struct whose values are constant up to that point.
  • The values added to the passed-in struct in get_my_offset() are constant.
  • The resulting values of 8 and 17 passed to the second printf() are a chain of constants the compiler managed to figure out.

static const uint32_t MY_ARR_Y_DIM = 10;
static const uint32_t MY_ARR_X_DIM = 20;

static inline point_yx_t get_center_point(void) {
  const point_yx_t center_pt = {.y = MY_ARR_Y_DIM / 2, .x = MY_ARR_X_DIM / 2};
  return center_pt;
}

static inline point_yx_t get_my_offset(point_yx_t point) {
  const uint32_t my_offset_y = 3;
  const uint32_t my_offset_x = 7;

  point.y += my_offset_y;
  point.x += my_offset_x;

  return point;
}

point_yx_t center_pt = get_center_point();
point_yx_t offset_pt = get_my_offset(center_pt);

printf("Center Y,X : %u,%u\n", center_pt.y, center_pt.x);

ldr     r0, .LCPI0_1
mov     r1, #5   ; 10 / 2
mov     r2, #10  ; 20 / 2
bl      printf

printf("Offset Y,X : %u,%u\n", offset_pt.y, offset_pt.x);

; The value moved into r9 gets used further down in the full source code
; I believe the compiler is doing it here as an optimization
; I left it in place here just to avoid a confusing break-up
ldr     r0, .LCPI0_2
mov     r1, #8   ; (10 / 2) + 3
mov     r2, #17  ; (20 / 2) + 7
mov     r9, #8   ; (10 / 2) + 3
bl      printf

This is relatively simple. Macros are often used for constants, but the compiler can reduce function calls to constants as well, so one may as well make use of functions.

What’s more interesting is what happens when not all of the data involved is constant. In the example below, rand_bbox_width and the member values of rand_pt are not known at compile time but there are still constants at play; the compiler breaks up a function in an interesting way.

static inline bounding_box_t get_bounding_box(point_yx_t center,
                                              uint32_t width) {
  const uint32_t center_to_edge = width / 2;

  const point_yx_t upper_left = {.y = center.y - (center_to_edge),
                                 .x = center.x - (center_to_edge)};

  const point_yx_t lower_right = {.y = center.y + (center_to_edge),
                                  .x = center.x + (center_to_edge)};

  return (bounding_box_t){.upper_left = upper_left, .lower_right = lower_right};
}

; The compiler kludged the assembly for all code using rand()
; I split it up here for legibility's sake
const point_yx_t rand_pt = {.y = rand() % MY_ARR_Y_DIM,
                            .x = rand() % MY_ARR_X_DIM};

bl      rand
mov     r5, r0
bl      rand
mov     r4, r0

umull   r1, r2, r5, r0
lsr     r1, r2, #3
add     r1, r1, r1, lsl #2
sub     r7, r5, r1, lsl #1
umull   r2, r3, r4, r0
lsr     r0, r3, #4
add     r0, r0, r0, lsl #2
sub     r4, r4, r0, lsl #2

const uint32_t rand_bbox_width = rand() % 3;

bl      rand
mov     r6, r0

smull   r2, r3, r6, r1
add     r1, r3, r3, lsr #31
add     r1, r1, r1, lsl #1
sub     r5, r6, r1
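
The umull / lsr / add / sub sequences above are the compiler avoiding a divide instruction: it multiplies by a scaled reciprocal, shifts to get the quotient, then subtracts quotient * divisor to get the remainder. Below is a C sketch of the two unsigned cases. The constant 0xCCCCCCCD is an assumption on my part: it is the usual reciprocal for 32-bit unsigned division by 10, but its load is not visible in the trimmed listing. The function names are also mine.

```c
#include <stdint.h>

/* Assumed reciprocal: ceil(2^35 / 10). Not visible in the trimmed listing. */
static const uint32_t RECIP_10 = 0xCCCCCCCDu;

/* x % 10 without a divide instruction. */
static inline uint32_t mod10_by_multiply(uint32_t x) {
  const uint64_t product = (uint64_t)x * RECIP_10;    /* umull              */
  const uint32_t high = (uint32_t)(product >> 32);    /* upper result word  */
  const uint32_t quotient = high >> 3;                /* lsr #3  -> x / 10  */
  const uint32_t times5 = quotient + (quotient << 2); /* add ..., lsl #2    */
  return x - (times5 << 1);                           /* sub ..., lsl #1    */
}

/* x % 20: same reciprocal, one more bit of shift, final scale by 4. */
static inline uint32_t mod20_by_multiply(uint32_t x) {
  const uint64_t product = (uint64_t)x * RECIP_10;    /* umull              */
  const uint32_t high = (uint32_t)(product >> 32);    /* upper result word  */
  const uint32_t quotient = high >> 4;                /* lsr #4  -> x / 20  */
  const uint32_t times5 = quotient + (quotient << 2); /* add ..., lsl #2    */
  return x - (times5 << 2);                           /* sub ..., lsl #2    */
}
```

The rand() % 3 case uses smull instead because both operands are plain int there, so the remainder is signed; the extra add with lsr #31 adjusts the quotient for negative inputs.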

const bounding_box_t bbox = get_bounding_box(rand_pt, rand_bbox_width);

printf("Upper Left  Y,X: %u,%u\n", bbox.upper_left.y, bbox.upper_left.x);

printf("Lower Right Y,X: %u,%u\n", bbox.lower_right.y, bbox.lower_right.x);

ldr     r0, .LCPI0_5
sub     r1, r7, r5, lsr #1 ; bbox.upper_left.y
sub     r2, r4, r5, lsr #1 ; bbox.upper_left.x
bl      printf
ldr     r0, .LCPI0_6
add     r1, r7, r5, lsr #1 ; bbox.lower_right.y
add     r2, r4, r5, lsr #1 ; bbox.lower_right.x
bl      printf

In its efforts to optimize the function call, the compiler has split get_bounding_box() into two pieces of code, one ahead of each printf() that consumes its results. Note that width / 2 never gets its own instruction; it is folded into each sub and add as the lsr #1 operand. If you remove the second printf() in Godbolt, you’ll see its corresponding assembly removed as well. Going further, removing the bbox.upper_left.x argument to the first printf() results in sub r2, r4, r5, lsr #1 being optimized out.
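
Put another way, after inlining nothing of the bounding_box_t struct is left; each printf() is fed by two single instructions. A rough C equivalent of the two surviving pieces might look like this (upper_left_only and lower_right_only are hypothetical names for illustration, and the instruction comments refer to the listing above):

```c
#include <stdint.h>

typedef struct point_yx {
  uint32_t y;
  uint32_t x;
} point_yx_t;

/* Piece 1: what is computed ahead of the "Upper Left" printf(). */
static inline point_yx_t upper_left_only(point_yx_t center, uint32_t width) {
  return (point_yx_t){.y = center.y - (width >> 1),  /* sub r1, r7, r5, lsr #1 */
                      .x = center.x - (width >> 1)}; /* sub r2, r4, r5, lsr #1 */
}

/* Piece 2: what is computed ahead of the "Lower Right" printf(). */
static inline point_yx_t lower_right_only(point_yx_t center, uint32_t width) {
  return (point_yx_t){.y = center.y + (width >> 1),  /* add r1, r7, r5, lsr #1 */
                      .x = center.x + (width >> 1)}; /* add r2, r4, r5, lsr #1 */
}
```

Because each member is computed independently, dropping one printf() argument lets the compiler drop exactly one instruction.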

Closing

This will hopefully convince you to use functions instead of macros wherever possible for your calculation needs. Don’t be concerned about trying to cram everything into a single line with clever macros; use functions with clear terms and let the compiler handle the optimizations. If you’re that concerned about it, you can always refactor to a macro and compare the output.

Full Source Code

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct point_yx {
  uint32_t y;
  uint32_t x;
} point_yx_t;

typedef struct bounding_box {
  point_yx_t upper_left;
  point_yx_t lower_right;
} bounding_box_t;

#define SQUARE(x) ((x) * (x))
#define UNSAFE_SQUARE(x) x * x

static const uint32_t MY_ARR_Y_DIM = 10;
static const uint32_t MY_ARR_X_DIM = 20;

static inline point_yx_t get_center_point(void);

static inline point_yx_t get_my_offset(point_yx_t center_point);

static inline bounding_box_t get_bounding_box(point_yx_t center,
                                              uint32_t width);

/*
Function returning the center-point of a theoretical array.
*/
static inline point_yx_t get_center_point(void) {
  const point_yx_t center_pt = {.y = MY_ARR_Y_DIM / 2, .x = MY_ARR_X_DIM / 2};
  return center_pt;
}

/*
Function returning a pre-defined offset point of the passed-in point.
*/
static inline point_yx_t get_my_offset(point_yx_t point) {
  const uint32_t my_offset_y = 3;
  const uint32_t my_offset_x = 7;

  point.y += my_offset_y;
  point.x += my_offset_x;

  return point;
}

/*
Function returning the bounding box of a passed-in width centered on the passed-in point.
*/
static inline bounding_box_t get_bounding_box(point_yx_t center,
                                              uint32_t width) {
  const uint32_t center_to_edge = width / 2;

  const point_yx_t upper_left = {.y = center.y - (center_to_edge),
                                 .x = center.x - (center_to_edge)};

  const point_yx_t lower_right = {.y = center.y + (center_to_edge),
                                  .x = center.x + (center_to_edge)};

  return (bounding_box_t){.upper_left = upper_left, .lower_right = lower_right};
}

/*
Macro returning the center-point of an array.
*/
#define get_center_point_macro()                                               \
  ((point_yx_t){.y = (MY_ARR_Y_DIM + 2) / 2, .x = (MY_ARR_X_DIM + 4) / 2})

/*
Macro returning a pre-defined offset point of the passed-in point.
*/
#define get_my_offset_macro(point)                                             \
  ((point_yx_t){.y = point.y + 2, .x = point.x + 6})

/*
Macro returning the bounding box of a passed-in width centered on the passed-in point.
*/
#define get_bounding_box_macro(center, width)                                  \
  ((bounding_box_t){.upper_left = {.y = center.y - (width / 2),                \
                                   .x = center.x - (width / 2)},               \
                    .lower_right = {.y = center.y + (width / 2),               \
                                    .x = center.x + (width / 2)}})

/*
Begin main()
*/
int main(void) {
  const point_yx_t rand_pt = {.y = rand() % MY_ARR_Y_DIM,
                              .x = rand() % MY_ARR_X_DIM};

  const uint32_t rand_bbox_width = rand() % 3;

  /*
  Macro hazards
  */
  printf("2 squared = %u\n", SQUARE(2));
  printf("2 squared = %u\n", SQUARE(2 * SQUARE(3 / 2)));
  printf("2 squared = %u\n", UNSAFE_SQUARE(2 * UNSAFE_SQUARE(3 / 2)));

  /*
  Point computation with functions
  */
  const point_yx_t center_pt = get_center_point();
  const point_yx_t offset_pt = get_my_offset(center_pt);

  const point_yx_t rand_pt_offset = get_my_offset(rand_pt);

  printf("Center Y,X : %u,%u\n", center_pt.y, center_pt.x);
  printf("Offset Y,X : %u,%u\n", offset_pt.y, offset_pt.x);

  const bounding_box_t bbox = get_bounding_box(rand_pt, rand_bbox_width);

  printf("Upper Left  Y,X: %u,%u\n", bbox.upper_left.y, bbox.upper_left.x);
  printf("Lower Right Y,X: %u,%u\n", bbox.lower_right.y, bbox.lower_right.x);

  /*
  Point computation with macros
  */
  const point_yx_t center_pt_macro = get_center_point_macro();
  const point_yx_t offset_pt_macro = get_my_offset_macro(center_pt_macro);

  const point_yx_t rand_pt_offset_macro = get_my_offset_macro(rand_pt);

  printf("Center Macro Y,X : %u,%u\n", center_pt_macro.y, center_pt_macro.x);
  printf("Offset Macro Y,X : %u,%u\n", offset_pt_macro.y, offset_pt_macro.x);

  const bounding_box_t bbox_macro =
      get_bounding_box_macro(offset_pt_macro, rand_bbox_width);

  printf("Upper Left  Macro Y,X: %u,%u\n", bbox_macro.upper_left.y,
         bbox_macro.upper_left.x);

  printf("Lower Right Macro Y,X: %u,%u\n", bbox_macro.lower_right.y,
         bbox_macro.lower_right.x);

  printf("End of main.\n");

  return 0;
}