Fast clamp

Sometimes one needs to bound a value between two limits, min and max. This operation is called clamp.

Let’s consider the clamp of an integer to the interval [0, 255].

int clamp(int x) {
    return x < 0 ? 0 : x > 255 ? 255 : x;
}

JPEG decoding requires three such clamp operations per decoded pixel, so we’d like to find a fast way to do it. (I’m mainly concerned with the ARM architecture as often found on Android).

A faster way to do it is:

int clamp(int x) {
    return !(x & 0xffffff00) ? x : ((~x >> 31) & 0xff);
}

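The ~x >> 31 part works because the shift is an ASR (Arithmetic Shift Right), which feeds in copies of the sign bit from the left rather than zero bits. Strictly speaking, C leaves the right shift of a negative signed value implementation-defined, but GCC and Clang on ARM emit an ASR here.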
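To make the trick easier to follow, here is a small sketch that splits the one-liner into named steps and checks it against the straightforward version. The names clamp_naive, clamp_mask_steps and count_mismatches are made up for this illustration, and the code assumes a 32-bit int and an arithmetic right shift for signed values.

```c
/* Reference version: plain comparisons. */
static int clamp_naive(int x) {
    return x < 0 ? 0 : x > 255 ? 255 : x;
}

/* Same logic as the mask-based one-liner, split into named steps.
   Assumes 32-bit int and arithmetic right shift of signed values. */
static int clamp_mask_steps(int x) {
    if (!(x & 0xffffff00)) {
        return x;  /* no bits set above bit 7: already in [0, 255] */
    }
    /* x is out of range here. ~x flips the sign bit, so the arithmetic
       shift by 31 yields 0 for negative x and -1 (all ones) for x > 255. */
    int all_ones_if_too_big = ~x >> 31;
    return all_ones_if_too_big & 0xff;  /* 0 or 255 */
}

/* Count inputs in [lo, hi] where the two versions disagree. */
static long count_mismatches(int lo, int hi) {
    long bad = 0;
    for (long v = lo; v <= hi; ++v) {
        if (clamp_naive((int)v) != clamp_mask_steps((int)v)) {
            ++bad;
        }
    }
    return bad;
}
```

Running count_mismatches over a range that straddles both limits is a cheap sanity check before swapping the faster version into a decoder.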

An even faster way to do it, as discussed here, is:

int clamp(int x) {
    unsigned y;
    return !(y=x>>8) ? x : (0xff ^ (y>>24));
}

Of course, you wouldn’t want to pay the price of a function call for such a tiny operation, so let’s make clamp() inlinable by declaring it static inline.

static inline int clamp(int x) {
    unsigned y;
    return !(y=x>>8) ? x : (0xff ^ (y>>24));
}
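As a usage sketch, this is roughly how the inlined clamp() might sit in the inner loop of a decoder. The helper clamp_samples and its parameters are hypothetical, not taken from any real JPEG library. It assumes the intermediate values arrive as plain ints, that int is 32 bits wide, and that signed right shifts are arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

static inline int clamp(int x) {
    unsigned y;
    return !(y = x >> 8) ? x : (0xff ^ (y >> 24));
}

/* Hypothetical helper: clamp n intermediate values into 8-bit samples.
   With clamp() declared static inline, the compiler can fold it into
   the loop body instead of emitting a call per sample. */
static void clamp_samples(const int *src, uint8_t *dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (uint8_t)clamp(src[i]);
    }
}
```

For an RGB pixel this would be called with n = 3, matching the three clamps per pixel mentioned earlier.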

That’s about as fast as it gets while staying at the C level. Perhaps we can make it even faster by using some cool ARM instruction?

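The USAT (Unsigned SATurate) ARM instruction does exactly what we want: with a saturation width of 8 bits, it clamps a signed value to [0, 255]. Here’s how to put it cleanly in code, using GCC-style inline assembly: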

static inline int clamp(int x) {
    int ret;
    asm("USAT %0, #8, %1": "=r" (ret): "r" (x));
    return ret;
}
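One way to get part of the portability back is to guard the assembly with a preprocessor check and fall back to the pure-C version elsewhere. The sketch below assumes a GCC- or Clang-style compiler that defines __arm__, __ARM_ARCH and the __thumb__ macros. HAVE_USAT is a name invented for this example, and the exact guard may need adjusting for older toolchains.

```c
/* Assumption: USAT needs ARMv6 or later and is not available in the
   original 16-bit Thumb instruction set. HAVE_USAT is a made-up macro. */
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6) && \
    (!defined(__thumb__) || defined(__thumb2__))
#define HAVE_USAT 1
#else
#define HAVE_USAT 0
#endif

static inline int clamp(int x) {
#if HAVE_USAT
    int ret;
    /* Saturate x to an unsigned 8-bit range, i.e. [0, 255]. */
    __asm__("usat %0, #8, %1" : "=r"(ret) : "r"(x));
    return ret;
#else
    /* Portable fallback: the shift-based C version. */
    unsigned y;
    return !(y = x >> 8) ? x : (0xff ^ (y >> 24));
#endif
}
```

The __asm__ spelling is used instead of asm so the code still compiles under strict -std=c99 style flags.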

This way, we cut clamp() down to a single processor instruction, at the cost of losing the portability of the pure-C code.

Anything faster? It turns out that USAT can also apply an optional shift (ASR or LSL) to its operand within the same instruction. So if you needed a fixed shift before the clamp() anyway, you can merge the two into a shiftAndClamp() and get both the shift and the clamp from a single instruction. We could say that the clamp() alone now takes less than one instruction :)
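A possible shape for shiftAndClamp() is sketched below. The shift amount of 4 is an arbitrary choice for illustration (say, a fixed-point value with 4 fractional bits). It has to be a compile-time constant because USAT encodes it as an immediate. The preprocessor guard is an assumption about GCC- or Clang-style predefined macros, and the fallback branch is plain C.

```c
static inline int shiftAndClamp(int x) {
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6) && \
    (!defined(__thumb__) || defined(__thumb2__))
    int ret;
    /* One instruction: arithmetic shift right by 4, then saturate
       the result to [0, 255]. */
    __asm__("usat %0, #8, %1, asr #4" : "=r"(ret) : "r"(x));
    return ret;
#else
    /* Portable fallback: shift first, then clamp with comparisons.
       Assumes signed right shift is arithmetic. */
    int v = x >> 4;
    return v < 0 ? 0 : v > 255 ? 255 : v;
#endif
}
```

For example, shiftAndClamp(1600) gives 100, while shiftAndClamp(4800) saturates to 255 and any negative input comes out as 0.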
