Sometimes one needs to bound a value between two limits, min and max. This operation is called *clamp*.

Let’s consider the clamp of an integer to the interval [0, 255].

int clamp(int x) { return x<0 ? 0 : x > 255 ? 255 : x; }

JPEG decoding requires three such clamp operations per decoded pixel, so we’d like to find a fast way to do it. (I’m mainly concerned with the ARM architecture as often found on Android).
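To show where the three clamps per pixel come from, here is a minimal sketch of a YCbCr-to-RGB conversion using the standard JFIF coefficients in 16.16 fixed point. The helper name `ycbcr_to_rgb` and the fixed-point layout are my own assumptions for illustration, not code from any particular decoder.

```c
/* Reference clamp to [0, 255]. */
static inline int clamp(int x) { return x < 0 ? 0 : x > 255 ? 255 : x; }

/* Hypothetical helper: converts one YCbCr pixel to RGB.
   Constants are the JFIF coefficients scaled by 65536 and rounded:
   1.402 -> 91881, 0.344136 -> 22554, 0.714136 -> 46802, 1.772 -> 116130.
   Assumes >> on a negative int is an arithmetic shift. */
static inline void ycbcr_to_rgb(int y, int cb, int cr, int rgb[3])
{
    cb -= 128;
    cr -= 128;
    /* Each channel can leave [0, 255], hence one clamp per channel. */
    rgb[0] = clamp(y + ((91881 * cr + 32768) >> 16));
    rgb[1] = clamp(y - ((22554 * cb + 46802 * cr + 32768) >> 16));
    rgb[2] = clamp(y + ((116130 * cb + 32768) >> 16));
}
```

With a bright luma and a large Cr, for example, the red channel overshoots 255 before clamping, which is exactly the case the clamp has to catch.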

A faster way to do it is:

int clamp(int x) { return !(x & 0xffffff00) ? x : ((~x >> 31) & 0xff); }

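The *~x>>31* works because, on ARM, the right shift of a signed integer compiles to an ASR, *Arithmetic Shift Right*, which feeds in copies of the sign bit from the left, not zero bits. The result is therefore all ones (which the *& 0xff* turns into 255) when x is too large, and zero when x is negative. Strictly speaking, C leaves the right shift of a negative value implementation-defined, but the compilers commonly used for ARM shift arithmetically.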
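The out-of-range branch can be isolated into a small helper to see the two cases separately. This is only a sketch: the name `saturation_value` is made up for illustration, and it assumes the arithmetic-shift behaviour described above.

```c
#include <stdint.h>

/* Hypothetical helper: the (~x >> 31) & 0xff part of the clamp on its own.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static inline int saturation_value(int32_t x)
{
    /* x < 0:  ~x is non-negative, so the shift yields 0.
       x >= 0: ~x is negative, so the shift yields -1 (all ones). */
    int32_t mask = ~x >> 31;
    return mask & 0xff;   /* 0 for negative x, 255 otherwise */
}
```

In the clamp itself this expression is only reached when x is outside [0, 255], so "non-negative" there always means "greater than 255".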

An even faster way to do it, as discussed here, is:

int clamp(int x) { unsigned y; return !(y=x>>8) ? x : (0xff ^ (y>>24)); }
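Since this version is rather terse, here is the same logic spelled out step by step, together with a brute-force comparison against the straightforward clamp. The names `clamp_ref`, `clamp_shift` and `count_mismatches` are hypothetical, and the code again assumes that right-shifting a negative signed value is arithmetic.

```c
#include <stdint.h>

/* Straightforward reference version. */
static inline int clamp_ref(int x) { return x < 0 ? 0 : x > 255 ? 255 : x; }

/* The shift-based clamp, unrolled for readability. */
static inline int clamp_shift(int32_t x)
{
    /* x >> 8 is zero exactly when 0 <= x <= 255. */
    uint32_t y = (uint32_t)(x >> 8);
    if (y == 0)
        return x;
    /* Negative x: the top byte of y is 0xff, and 0xff ^ 0xff = 0.
       x > 255:    the top byte of y is 0x00, and 0xff ^ 0x00 = 255. */
    return (int)(0xff ^ (y >> 24));
}

/* Counts inputs in [lo, hi] where the two versions disagree. */
static inline long count_mismatches(int32_t lo, int32_t hi)
{
    long bad = 0;
    for (int64_t x = lo; x <= hi; ++x)
        if (clamp_shift((int32_t)x) != clamp_ref((int)x))
            ++bad;
    return bad;
}
```

Calling `count_mismatches` over a range that covers both sides of [0, 255] is a cheap sanity check before swapping the trick into real code.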

Of course, you wouldn’t want to pay the price of a function call for such a tiny operation, so let’s make clamp() inlinable by declaring it *static inline*.

static inline int clamp(int x) { unsigned y; return !(y=x>>8) ? x : (0xff ^ (y>>24)); }

That’s about as fast as it gets while staying at the C level. Perhaps we can make it even faster by using some cool ARM instruction?

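The USAT (*Unsigned SATurate*) ARM instruction does exactly what we want: with a saturation width of 8 bits it clamps a signed value to [0, 255]. Here’s how to put it cleanly in code, using GCC-style inline assembly: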

static inline int clamp(int x) { int ret; asm("USAT %0, #8, %1": "=r" (ret): "r" (x)); return ret; }

This way, we cut clamp() down to a single processor instruction, at the cost of losing the portability of the pure-C code.
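One way to get some of the portability back is to guard the inline assembly with a preprocessor check and fall back to plain C elsewhere. The sketch below assumes a GCC-compatible compiler and uses the ACLE feature macro `__ARM_FEATURE_SAT` to detect USAT support; treat that choice of guard as my assumption rather than the only possible one.

```c
#if defined(__arm__) && defined(__ARM_FEATURE_SAT)
/* 32-bit ARM with saturation instructions: a single USAT. */
static inline int clamp(int x)
{
    int ret;
    __asm__("usat %0, #8, %1" : "=r"(ret) : "r"(x));
    return ret;
}
#else
/* Any other target: the plain C version. */
static inline int clamp(int x)
{
    return x < 0 ? 0 : x > 255 ? 255 : x;
}
#endif
```

Both branches give the same results, so the rest of the code does not need to know which one was compiled in.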

Anything faster? It turns out that USAT can also apply an optional shift (ASR or LSL) to its input within the same instruction. So if you needed a fixed shift before the clamp() anyway, you can merge the two into a shiftAndClamp() and get both the shift and the clamp from a single instruction. We could say that the clamp() alone now takes less than one instruction.
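Here is a minimal sketch of such a shiftAndClamp(). The shift amount has to be a constant encoded in the instruction, so this example hard-codes an arithmetic right shift by 4; that value is arbitrary and only for illustration. As before, the `__ARM_FEATURE_SAT` guard and the C fallback are my assumptions.

```c
/* Shifts x right by 4 (arithmetically), then clamps to [0, 255]. */
#if defined(__arm__) && defined(__ARM_FEATURE_SAT)
static inline int shiftAndClamp(int x)
{
    int ret;
    /* Shift and saturation are done by the same USAT instruction. */
    __asm__("usat %0, #8, %1, asr #4" : "=r"(ret) : "r"(x));
    return ret;
}
#else
static inline int shiftAndClamp(int x)
{
    /* Fallback; assumes >> on a negative int is an arithmetic shift. */
    int s = x >> 4;
    return s < 0 ? 0 : s > 255 ? 255 : s;
}
#endif
```

This fits fixed-point arithmetic well, where the result of a multiply-accumulate has to be scaled back down before it is stored as an 8-bit sample.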
