3430 bytes, the smallest APK ever!

What is the absolute smallest size of a “Hello World” app that can be published on the market?

It’s 3430 bytes, and the app is here for your enjoyment: https://play.google.com/store/apps/details?id=h.w

Size  Compressed  Name
 345         248  META-INF/MANIFEST.MF
 466         314  META-INF/A.SF
 446         363  META-INF/A.RSA
  97          97  res/drawable/a.png
1300         489  AndroidManifest.xml
 588         588  resources.arsc
 820         463  classes.dex
4062        2562  total: 7 files, 0 folders

PS: for an APK that can be installed through ADB but is not publishable on the market, the minimum I found is 2259 bytes — it doesn’t need to have an icon and other elements that are enforced by the market but not by the framework.

The smallest Android license check

We all know the License Verification Library a.k.a. Google LVL. It is a bunch of code which does the IPC call to the Market, verifies the signature of the answer, interprets the result according to policies, stores the result encrypted on storage, etc.

LVL has 20+ classes/interfaces and 2000 lines of code. It is also quite cracked, with tools available that automatically patch an APK that uses the LVL (even obfuscated), producing an un-protected APK that skips the license check, with no effort at all.

Google advises developers to “roll your own” license verification using the LVL source only as a starting point, in order to circumvent the generic cracking tools. You may start with LVL and gradually remove bits you don’t need, or you may start from “zero”, the bare minimum IPC call/response to the Market, and build up.

Here is the simplest IPC call to the Market, for illustration only. It is not fit as a real license check; it’s just a starting point on which you can build. The advantage is that it’s extremely small: 50 lines of code. PS: don’t forget to add <uses-permission android:name="com.android.vending.CHECK_LICENSE" /> to your manifest.

import android.content.*;
import android.os.*;

class LicenseListener extends android.os.Binder {
    static final String LISTENER = "com.android.vending.licensing.ILicenseResultListener";

    public boolean onTransact(int op, Parcel in, Parcel reply, int flags) {
        if (op == 1) {
            in.enforceInterface(LISTENER);
            int code         = in.readInt();
            String data      = in.readString();
            String signature = in.readString();
            if (code == 0 || code == 2) {
                // LICENSED or LICENSED_OLD_KEY
            } else if (code == 1) {
                // NOT_LICENSED
            } else {
                // ERROR
            }
        }
        return true;
    }
}

class License {
    static final String SERVICE = "com.android.vending.licensing.ILicensingService";

    static void check(final Context context) {
        context.bindService(
            new Intent(SERVICE),
            new ServiceConnection() {
                public void onServiceConnected(ComponentName name, IBinder binder) {
                    Parcel d = Parcel.obtain();
                    try {
                        d.writeInterfaceToken(SERVICE);
                        d.writeLong(0); // nonce; a real check should use a random value
                        d.writeString(context.getPackageName());
                        d.writeStrongBinder(new LicenseListener());
                        binder.transact(1, d, null, IBinder.FLAG_ONEWAY);
                    } catch (RemoteException e) {
                        // the Market service went away; no answer will arrive
                    }
                    d.recycle();
                }

                public void onServiceDisconnected(ComponentName name) {}
            },
            Context.BIND_AUTO_CREATE);
    }
}

Android: obfuscating the JNI surface layer

Proguard is a popular Java obfuscator. It makes the compiled bytecode (either JVM or Dalvik) harder to understand when decompiled (obfuscation), and also smaller by class/method renaming and optimization.

There are two aspects of the Java source that are not affected by Proguard. One is the set of classes that interface with Android, basically the Activities, Views, Services, etc., which are referenced by name in the AndroidManifest or in the resource files. Because they are referenced by class name and use special method names (e.g. onCreate), Proguard won’t obfuscate such classes.

The other part that is not touched by Proguard is the native method names. The JNI methods keep their names unchanged, and these names are visible, upon decompilation, in two places: in the Dalvik bytecode, and in the native library (.so).

The fact that this JNI layer interfacing Java with native code is not obfuscated is a significant weakness — for example it allows an attacker to extract and reuse the native library (.so) unchanged, as the JNI interface is clearly exposed.

I present here a simple trick which allows obfuscation of the JNI layer, renaming the method names to meaningless names on both the Java and native side, while keeping the source code relatively readable and maintainable and without affecting performance.

Let’s consider an example, initial situation:

class Native {
    native static int rotateRGBA(int rgb, int w, int h);
}

extern "C" int Java_pakage_Native_rotateRGBA(JNIEnv *env, jclass, int rgb, int w, int h);

In the example above Proguard can’t obfuscate the method name rotateRGBA, which remains visible on the Java side and on the native side.

The solution is to use a meaningless method name directly in the source, while taking care to minimally disrupt the readability and maintainability of the code.

class Native {
    private native static int a(int rgb, int w, int h); //rotateRGBA

    static int rotateRGBA(int rgb, int w, int h) {
        return a(rgb, w, h);
    }
}

// rotateRGBA
extern "C" int Java_pakage_Native_a(JNIEnv *env, jclass, int rgb, int w, int h);

The JNI method is renamed to a meaningless a. But the call on the Java side is wrapped by the meaningfully named method rotateRGBA. The Java clients continue to invoke Native.rotateRGBA() as before, without being affected at all by the rename.

What is interesting is that the new Native.rotateRGBA method is not native anymore, and thus can be renamed by Proguard at will. The result is that the name rotateRGBA completely disappears from the obfuscated code, on both Dalvik and native side. What’s more, Proguard optimizes away the wrapper method, thus removing the (negligible) performance impact of wrapping the native call.

Conclusion: the JNI method name is eliminated from the obfuscated code (both Dalvik bytecode and native library), with minimal impact on readability and no performance impact.

Crude native code protection on Android

Any application code (APK) that is published becomes exposed to decompilation. Obfuscating the Java code (e.g. with Proguard) makes the Java decompilation harder.

But the Proguard obfuscation does not affect the native code (NDK, JNI) of the application in any way. What’s more, the JNI access points to the native library are clearly visible with explicit names. An attacker can simply extract the .so library from the APK and directly reuse it in a different application.

Then, how to protect the native library against such undesired reuse? One tentative solution is presented here: Protecting IP in Android Applications.

But that solution has the drawback of obtaining critical information, such as the application package name, from a Java Context instance that is passed in from the Java side down to the native library. An attacker who has control of the Java side could manufacture a fake instance before passing it in.

A better solution is for the native code to obtain directly the information it needs, such as the package name, without using a Context instance passed in by the attacker. For this the native side can use

static String android.app.ActivityThread.currentPackageName();

This way it becomes significantly harder for the attacker to present a fake package name.

Next, the most basic protection is for the native library to refuse to function if the package name doesn’t match. Of course all this should be done in a rather obfuscated way — otherwise the check can be easily discovered and removed by the attacker through binary patching of the library.
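
As a minimal sketch of such a check, the native side can compare a hash of the package name against a precomputed constant, so that the expected name never appears as a plain string in the .so. The names fnv1a32 and package_allowed are hypothetical, and the choice of the FNV-1a hash is my assumption for illustration, not something the approach requires. How the package name string is obtained (e.g. through JNI from ActivityThread.currentPackageName()) is left out here.

```c
#include <stdint.h>

/* FNV-1a 32-bit hash of a NUL-terminated string.
 * (Illustrative choice of hash; any cheap hash would do.) */
static uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u;           /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;       /* mix in the next byte */
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: returns 1 if the hash of pkg equals
 * expected_hash, 0 otherwise (including when pkg is NULL). */
static int package_allowed(const char *pkg, uint32_t expected_hash) {
    return pkg != 0 && fnv1a32(pkg) == expected_hash;
}
```

The expected hash would be computed offline for the real package name and embedded as a numeric constant. This only hides the string; a determined attacker can still patch the comparison, so it is just one layer of obfuscation.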

One more question: if the native library relies only on a simple check of application package name, what prevents the attacker from reusing the package name of the original application for his own app? The simple reason is the “unique package name” policy that is enforced by the Android Market. Thus the attacker, if he chooses to re-use the package name, would not be able to publish the fake app on the Market, which often is enough.

static: the Android application package

The usual way to obtain the application package (as a string) on Android is invoking getPackageName() on an Activity or Application instance:

String Context.getPackageName();

This method is not static, i.e. you must first have a Context instance at hand in order to obtain the package name.

But it feels like it should be possible to obtain the application package in a static way too. After all, the code is executing in the context of a specific APK, which determines the application package. The package is also globally specified in the AndroidManifest.xml.

Yet after some investigation it becomes clear that the Android API overlords specifically wanted to make this task inaccessible to mortal developers.

After some more searching one solution comes to light:

static String android.app.ActivityThread.currentPackageName();

As you see, this method is static, as we wanted it; but there are two caveats:

  • ActivityThread.currentPackageName() must be invoked from the main thread of the application
  • the android.app.ActivityThread class is API-hidden, so you need tricks to access it

Fast clamp

Sometimes one needs to bound a value between two limits, min and max. This operation is called clamp.

Let’s consider the clamp of an integer to the interval [0, 255].

int clamp(int x) {
    return x < 0 ? 0 : x > 255 ? 255 : x;
}

JPEG decoding requires three such clamp operations per decoded pixel, so we’d like to find a fast way to do it. (I’m mainly concerned with the ARM architecture as often found on Android).

A faster way to do it is:

int clamp(int x) {
    return !(x & 0xffffff00) ? x : ((~x >> 31) & 0xff);
}

The ~x>>31 works because it’s an ASR, Arithmetic Shift Right, which feeds from the left copies of the sign bit, not zero bits.

An even faster way to do it, as discussed here, is:

int clamp(int x) {
    unsigned y;
    return !(y=x>>8) ? x : (0xff ^ (y>>24));
}
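
Since the two branch-light versions are less obvious than the plain one, here is a small sketch that checks all three against each other over a range of inputs. The function names (clamp_ref, clamp_mask, clamp_shift, clamp_variants_agree) are mine, introduced only for this check. Note that both tricks assume that >> on a negative int is an arithmetic shift, which is what gcc does but is implementation-defined in C.

```c
/* Reference version: plain comparisons. */
static int clamp_ref(int x) {
    return x < 0 ? 0 : x > 255 ? 255 : x;
}

/* Mask version: in range if no bit above the low 8 is set;
 * otherwise ~x >> 31 is 0 for negative x and all ones for x > 255. */
static int clamp_mask(int x) {
    return !(x & 0xffffff00) ? x : ((~x >> 31) & 0xff);
}

/* Shift version: y >> 24 is 0xff for negative x and 0 for x > 255. */
static int clamp_shift(int x) {
    unsigned y;
    return !(y = x >> 8) ? x : (0xff ^ (y >> 24));
}

/* Returns 1 if all three versions agree for every x in [lo, hi]. */
static int clamp_variants_agree(int lo, int hi) {
    long long x; /* wider type so the loop also terminates at INT_MAX */
    for (x = lo; x <= hi; x++) {
        int r = clamp_ref((int)x);
        if (clamp_mask((int)x) != r || clamp_shift((int)x) != r)
            return 0;
    }
    return 1;
}
```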

Of course, you wouldn’t like to pay the price of a function call for such a tiny operation, so let’s make the clamp() inlinable by declaring it static inline.

static inline int clamp(int x) {
    unsigned y;
    return !(y=x>>8) ? x : (0xff ^ (y>>24));
}

That’s about as fast as it gets while staying at the C level. Perhaps we can make it even faster by using some cool ARM instruction?

The USAT (Unsigned SATurate) ARM instruction does exactly what we want, and here’s how to put it cleanly in code:

static inline int clamp(int x) {
    int ret;
    asm("USAT %0, #8, %1": "=r" (ret): "r" (x));
    return ret;
}

This way, we cut the clamp() down to a single processor instruction! (yet losing the portability of the pure-C code).

Anything faster? It turns out that USAT can, in the same instruction, also do an optional shift (ASR or LSL). So if you needed a fixed shift before the clamp() anyway, you can integrate the two in a shiftAndClamp(), and achieve both the shift and the clamp with a single instruction. We could say that the clamp() alone now takes less than one instruction :)
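
A sketch of what such a combined function might look like, assuming a fixed right shift of 4 bits (the shift amount and the name shift4_and_clamp are arbitrary, chosen for illustration). The preprocessor guard is a rough assumption about when USAT is available; on other targets the function falls back to plain C, which assumes an arithmetic >> on negative values as gcc provides.

```c
/* Arithmetic shift right by 4, then clamp to [0, 255]. */
static inline int shift4_and_clamp(int x) {
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6) && \
    (!defined(__thumb__) || defined(__thumb2__))
    int ret;
    /* USAT with the optional shift operand: one instruction for both steps. */
    __asm__("usat %0, #8, %1, asr #4" : "=r" (ret) : "r" (x));
    return ret;
#else
    /* Portable fallback with the same result. */
    x >>= 4;
    return x < 0 ? 0 : x > 255 ? 255 : x;
#endif
}
```

The shift amount has to be a compile-time constant because it is encoded in the instruction itself, so each distinct shift needs its own function (or a macro).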

The first Galaxy Nexus challenge too hard for anybody

Google posted the first Galaxy Nexus challenge, which proved too difficult for anybody to solve. After 3 hours of waiting for a solution, Google gave in and posted a spoiler hint, “Dots are like flag messages”, which makes the challenge pretty much trivial.

It looks like the initial task was too hard (nobody was solving it), and after the give-away hint it became too easy. Let’s look forward to the next challenges.

PS: in task #1, each android represents a word, making up “what did j cook call hawaii”, and the answer is “the sandwich islands”.

Thanks to Eric Sink for his new book

I just received by international mail the printed edition of Version Control by Example by Eric Sink, offered as a gift. Thanks!

Fast RGB pixel average on ARM

Here I present a small trick: how to compute the average of two RGBA_8888 values extremely fast on an ARM processor, and how to implement it easily in C++/gcc (e.g. using the Android NDK).

We have two 4-byte values representing two pixels, in format RGBA_8888. Let’s ignore the Alpha channel, assuming both have alpha=0xff (opaque). We want to produce a new 4-byte value with the per-channel average of the two pixels.

Below we see the byte structure of the two pixels and of the desired average:
pixel1: r1|g1|b1|a1
pixel2: r2|g2|b2|a2
average: (r1+r2)/2 | (g1+g2)/2 | (b1+b2)/2 | (a1+a2)/2

To implement this operation in C, we’d need to extract the individual bytes (R,G,B,A) of each pixel, average them (four additions and four shifts) and finally recompose the bytes to form the result (e.g. with shifts and ORs).
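
For comparison, the straightforward byte-by-byte version could look roughly like this (a sketch; the name average_rgba_bytewise is mine). It treats the pixel as four independent 8-bit channels, whatever their order, and rounds each average down.

```c
#include <stdint.h>

/* Per-channel average of two 4x8-bit pixels, one byte at a time. */
static uint32_t average_rgba_bytewise(uint32_t p1, uint32_t p2) {
    uint32_t result = 0;
    int shift;
    for (shift = 0; shift < 32; shift += 8) {
        uint32_t c1 = (p1 >> shift) & 0xff;   /* extract channel of pixel 1 */
        uint32_t c2 = (p2 >> shift) & 0xff;   /* extract channel of pixel 2 */
        result |= ((c1 + c2) >> 1) << shift;  /* average and put it back */
    }
    return result;
}
```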

It turns out there is a much faster way to do it, which takes exactly one CPU instruction!

The magic instruction is UHADD8, which adds the corresponding bytes of two registers, in pairs, halves the result and recomposes the bytes — exactly the operation we needed for our RGBA average.

But how to access this ARM instruction from C++? It turns out, again, that there is a clean and simple way to do it:

extern inline unsigned averageBytes(unsigned a, unsigned b) {
    unsigned ret;
    asm("uhadd8 %0, %1, %2": "=r" (ret): "r" (a), "r" (b));
    return ret;
}

GCC is smart enough that invoking this inline function compiles down to just one CPU instruction (the UHADD8) without any other overhead.

So, problem solved: we have this neat C++ function for averaging two RGBA values, with a total cost of just one CPU instruction!
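
On processors without UHADD8, a well-known bit trick gives the same per-byte result in a handful of portable C operations. This is a different technique from the single instruction above, shown only as a possible fallback; the name average_bytes_swar is mine. It relies on the identity a + b = 2*(a & b) + (a ^ b), applied to all four bytes at once.

```c
#include <stdint.h>

/* Per-byte floor((a + b) / 2) on four packed bytes, without unpacking.
 * (a & b) holds the bits common to both operands; (a ^ b) holds the bits
 * that differ, which contribute half their weight. Masking with
 * 0xfefefefe clears the low bit of each byte before the shift, so it
 * cannot leak into the neighbouring byte. */
static uint32_t average_bytes_swar(uint32_t a, uint32_t b) {
    return (a & b) + (((a ^ b) & 0xfefefefeu) >> 1);
}
```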

Android ARM: PLD preload magic

I just discovered the incredible performance boost that can be achieved by using the PLD (“Preload Data”) ARM assembler instruction.

What I needed to do was convert image pixel data from RGB to RGBA format (from 3 bytes/pixel to 4 bytes/pixel), fullscreen and in real time during animation. The technique applies more generally, any time you need to process a large amount of RAM data really fast.

while (n--) {
    *dest++ = *src++;
}

This loop is plain, it just copies data from a memory source to a destination. It is used here just as a placeholder for some processing of src data. (Of course, if you only need to copy the data you should use memcpy instead.)

Let’s time this loop over 1MB of data, on a Samsung Galaxy Tab 10.1 with a Tegra 2 processor — it takes about 25ms. What slows the loop down is waiting for data that is not in the processor cache to be fetched from the main memory, which is slow. We can fix this by directing the CPU to prefetch data ahead of the read. We modify the loop adding the PLD magic line:

while (n--) {
    asm ("PLD [%0, #128]"::"r" (src));
    *dest++ = *src++;
}

That asm line starts preloading data from memory to the CPU cache, 128 bytes ahead of the current src location, without blocking the CPU.

We measure again, and the same loop over 1MB of data now takes only 8ms instead of 25ms — it is three times faster! Amazing for that 1-liner, I say. By the way, this is now very close to the performance of memcpy, which is itself implemented in highly-optimized ARM assembly.

You may observe that our loop may be optimized a little bit further by doing partial unrolling — processing more than a single element at each iteration.

With partial loop unrolling:

n /= 4; //assume it's multiple of 4
while (n--) {
    asm ("PLD [%0, #128]"::"r" (src));
    *dest++ = *src++;
    *dest++ = *src++;
    *dest++ = *src++;
    *dest++ = *src++;
}
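
If you prefer to avoid inline assembly, gcc also offers the __builtin_prefetch() builtin, which compiles to the target's prefetch instruction where one exists. Below is a sketch of the same loop written that way; the function name copy_with_prefetch is mine, and the 128-byte distance is simply carried over from the PLD example, not a tuned value. I have not timed this variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n 32-bit words, hinting the CPU to fetch data 128 bytes ahead. */
static void copy_with_prefetch(uint32_t *dest, const uint32_t *src, size_t n) {
    while (n--) {
#if defined(__GNUC__)
        /* A prefetch is only a hint and does not fault, so running past
         * the end of the buffer is harmless. The address is computed as
         * an integer to avoid out-of-bounds pointer arithmetic. */
        __builtin_prefetch((const void *)((uintptr_t)src + 128));
#endif
        *dest++ = *src++;
    }
}
```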

The conclusion is that if you find yourself optimizing to death some piece of C/C++ code on Android that reads a lot of memory, you should try using the PLD and profile again to see if it helps. Enjoy!

PS:
If you’re curious about the RGB_888 to RGBA_8888 conversion speed, it is possible to do a fullscreen conversion (1280×752 px) on the Tab in about 7ms, which is quite impressive IMO. This is faster than the corresponding memcpy() RGBA to RGBA which takes about 8ms, and thus makes the case for the introduction of the RGB_888 (3bytes/pixel) Bitmap format in the Android Java API (as it saves RAM and memory bandwidth when the Alpha channel isn’t needed).