Synergetic Stupidity in Java

January 21st, 2009

In the Java “class” file format specification, the length of the bytecode of any method is limited to 64KB. This arbitrary limitation is bad in itself, but see below what it does in conjunction with the other stupider Java ‘feature’.

In any normal language, a literal array constant (used for array initialization) is represented as a compact block of data. For example when initializing an array with 1000 byte values, you’d expect the initialization data to take up 1KB (1000 x 1B). But not so in Java, not at all.

Let’s consider a long array of bytes, initialized with some data.
static final byte B[] = {10, 20, 30, 40, 50, 60, 70, 80, …};

In the “class” format there is no provision for storing the array initialization data in a compact format. Instead, bytecode instructions are generated for filling-in the array, a small block of bytecodes for each entry of the array data:

dup          //1 byte; duplicate the array address on the stack
sipush 300 //3 bytes; the array index where to store the value
bipush 99   //2 bytes; the value to store
bastore      //1 byte; do the store

This adds up to 7 bytes for each entry of the array. So if you initialize an array of 5000 bytes, this would generate 35KB of bytecode instead of the 5KB that would normally be enough to store the data.

For an array of int, float or larger types, the bytecode structure is a bit different, in that the integer value is not inlined in the bytecode, but instead read from a constant pool.

dup          //1 byte
sipush 300 //3 bytes; the array index
ldc_w  400 //3 bytes; load int from constant pool position
iastore      //1 byte

Let’s put in a small table the amount of data Java uses for initializing arrays of different types:

Type Bytes / entry Stupidity boost
byte[] 7 7 x
short[] 8 4 x
int[], float[] 8 + 4 3 x
long[] 8 + 8 2 x

This form of doing array initialization is in my opinion the stupidest miss of the “class” file format.

But lets see where Stupidity Synergy comes into play. Synergy means that sometimes, when you put together one stupid feature with another stupid feature, you get a result that is larger than the sum of the parts. Java achieves synergy by combining the 64KB method size limit with the peculiar way of compiling the array initialization.

As we’ve seen, bytecode (i.e. method code) is needed for any array initialization. Even initializing a static array at the class level produces an invisible method that contains the code for array initialization. Combining this with the method size limit produces very restrictive limits on the size of array initializations.

Type Array initialization size limit
byte 9K
short 8K
int, float 8K

Nice powerful language, where you get a “code too large” compilation error when you try to initialize a byte array with more than 9K entries…

Mobile Monday Warsaw

May 28th, 2008

I just participated to the first MoMo event to take place in Poland, Mobile Monday Warsaw. There were about eight presentations, of which I most enjoyed the NFC talk by Florian Resatsch from Servtag, and the MRIA (Mobile Rich Internet Application) talk by Alex Nerst of fring (both are start-up companies). The event was well organized, and represented a great opportunity to introduce a new product or make some local industry contacts. It seems MoMo will happen again in Warsaw in the coming months, and hopefully it will also get adopted in other Polish cities (like Krakow).

Congratulations to the organizers!

Dynamically Adjusting Touch Buttons

May 20th, 2008

I describe here a simple technique that can be used by touch buttons (on a touch-screen interface) to improve, over time, the accuracy of the user touches.

Each button keeps track of the place where the user touch occurs. The touches which are close enough to the button centre are deemed sure touches, meaning that the button is pretty sure that the user wanted to touch this particular button and not a neighbouring one. For example, the button can use an embedded smaller circle or ellipse, called the sure area, and all touch that occurs within the sure area is a sure touch.

Each button averages all its sure touches that occur over time. When a certain number of such sure touches is accumulated, their average is compared with the ideal button center. If a systematic bias is detected (e.g. let’s say that the user generally touches with 3 pixels above the button center), the button records the bias and corrects it in future touches. (alternatively this can be seen as the button moving its center to the actual place where the sure touches occur in average).

The adjustment also affects the definition of future sure touches, as a touch is first corrected before being checked for happening within the sure area.

After the adjustment, the button continues to keep track of new sure touches as they occur, and to make new adjustments as needed.

This technique is most effective when every button keeps track of its own correction parameters. For example, a button which is located at the top of the screen may have a different bias (and thus need a different correction) than a button located at the bottom of the screen (or buttons located left - right may need different correction, etc).

This technique allows the interface to ‘learn’ over time the touch habits of a particular user (and to optimize itself for that user); but the interface is also able to adapt to a new user (if the device, for example, changes its owner).

For this technique to work, UI elements (such as buttons) must have access to touch events from outside their visual area on the screen. What the dynamic adaptation achieves is a separation and translation of the touch area of the button independently of its visual area.

The technique as described above supports the translation of the touch area, but can be easily extended to handle other transformations (such as dynamically growing the touch area of a button when the distribution of the touches shows too many touches close to its border, suggesting that the button is too small for the particular user).

When the touch area of buttons is segregated from their location (visual area) and is mobile (can move or grow/shrink), conflicts may appear between neighboring buttons. How to elegantly handle such ‘touch area conflicts’ is left as a thinking exercise to the reader or implementer.

Touch Buttons: the Optimal Shape and Size

May 20th, 2008

While physical buttons were the standard for mobile phones in the past, it seems that the industry is gradually adopting touch-screen interfaces. The iPhone is the best example of a touch interface, but other manufacturers (e.g. Sony-Ericsson, Motorola) are also producing mobile phones with touch-screen interfaces (either exclusively touch-screen, or in addition to physical keys).

Talking about the physical keys, we can imagine that their design, shape and size was the result of elaborate ergonomic studies in order to make them as easy to use as possible (hoping that the trendy look of the device was not the only factor deciding the design of the keys).

Perhaps taking inspiration from the physical keys, the touch buttons usually have a rectangular shape (sometimes with rounded corners). But is the rectangular shape the best one? and if it is, which is the best width/height ration of the rectangle (e.g. a horizontal rectangle, a vertical one, a square, etc). Of course, the answer may depend to some degree of the particular device and the particular user, but perhaps there are also general characteristics of a good touch-screen button shape/size.

I describe below a simple experiment which allows to discover the optimal shape/size. Ask the user to touch a specific spot (’target spot’) on the screen, and record the place where the touch actually occurs. Repeat this many times. Afterwards, you can visualize the area where the actual touches occur, which likely is a shape around the target spot.

This area, the place where the actual touches occur when the user is trying to touch the target, describes the optimal shape of a touch button. It gives information about the size of the button — it should be large enough as to embed the area of the actual touches, and about its shape.

I expect that the touch area resulting from such an experiment describes an ellipse (a ‘flattened circle’). While this is not a big surprise (as it’s normal that the actual touch occur ‘around’ the target, and the ellipse is the generalization of a circle), there is valuable information in the ratio of the ellipse (how flattened it is), and in its size. While the ellipse can be covered with a rectangular button, the rectangular ratio should follow the ratio of the ellipse.

Arity has Complex Numbers

May 13th, 2008

I just released version 1.3.0 of the Arity library, with a major addition: Complex Numbers. In addition to basic operators on complex numbers (like addition, multiplication, division), all the standard function are supported: trigonometric & hyperbolic, logarithm, exponential & power, factorial (Gamma), and even combinations, permutations and GCD. It is nice that many analytic functions behave better on the complex domain than on the real domain: for example, the logarithm is defined for negative arguments, asin() can take arguments greater than 1, and a negative number can be raised to any power (which results in NaN on reals).

Ever wondered how much is: e^(i*pi), log(-1), i!, i^i, sinh(i*pi)? You can download the arity-1.3.0.jar, and evaluate any expression like this:

java -jar arity-1.3.0.jar "e^(i*pi)"

An essential trait of Arity is that it works on MIDP devices (mobile phones) which have limited RAM & CPU. As such, it has a lightweight and efficient implementation. For example, the Complex class (implementing the complex operations) is done in a way that avoids new object creation, which is different from some other Complex Java libraries I’ve seen.

Enjoy!

Cracovia Marathon

May 13th, 2008

Mihai at Cracovia Marathon

I’ve just run my second marathon, this time in Krakow instead of Warsaw. In the 5 weeks before the marathon I have run a total of 4km, so I can say that I had zero training — I was wondering whether I’ll be able to finish the marathon at all.. and yes, I did it. I ran slowly, with a total time of 4:51 (half time 2:10). The weather was very cold, windy and rainy, I had to eat bananas all the time to keep up the body heat. It wasn’t easy, but I can say that the satisfaction of running it to the end (without training) was great.

Still, if you plan to run a marathon yourself, I recommend at least three 21km runs prior to the marathon — this can save you from a lot of pain and even injury during the real course.

Running Time

March 31st, 2008

Yesterday I ran the Warsaw Half-Marathon (21km) with time 2:00:58 (two hours and one minute). It was a very nice event, the weather was great (in fact I’ve got a bit sun-burnt during the run) and the running was not painful, quite the opposite. I improved my time from last year’s half-marathon with about 9 minutes.

What was particular this time was that I trained… almost not at all. All my training this year was one 10km run and one 9km run (the reason being that the weather was rather cold this spring, while I enjoy running above 10 Celsius; not to exclude pure laziness though).

Given my lack of training, I was afraid I won’t be able to even finish it — it’d be enough to run a bit too fast in the beginning, and I’d be dead at 15km. But everything went well, I could maintain an even rhythm all along, running with about 10km/h without any breaks. I didn’t notice the two hours going by, it seemed as if it were a mere 15minutes. And the weather was great.

The Modem is in Vogue, Again

March 30th, 2008

I am a big fun of having the web’s huge and rich information available on mobile phones — it is really useful for the user to be able to browse the web with the mobile phone’s browser.

Some time ago I was wondering how this ‘mobile browsing’ experience can be improved, and at that point I was thinking that server-side page adaptation is the answer. In this approach, the web application generating the pages uses techniques such as User-Agent sniffing to detect that the client is a mobile phone, and returns a page specially-tailored for a mobile browser.

But now I realize that this approach is fundamentally flawed. It puts the burden on the web application developer to generate two parallel sets of outputs (one for desktop, one for mobile), and the hard part is keeping these two in sync. The problem is that the mobile version will get behind the desktop version, and the mobile user may be put in the paradoxical situation of wanting to browse the desktop version rather than the mobile version.

And the technology advances: my old mobile phone had a 128×128 screen, but now 240×320 is the norm for feature-phones (a feature-phone is the phone that is not as smart as a smart-phone). And the IPhone has a large QVGA screen (320×480). Such devices come closer and closer to displaying normal web pages, so the content-adaptation becomes less critical.

The solution that I see now is different: forget content-adaptation — instead design a single set of content, but take into consideration that a (growing) percentage of the users will be browsing from a mobile phone. So: keep the pages clean and streamlined, favour text instead of large graphics (text scales better), and keep the page size small.

Because the data connection on a mobile phone is slow, and often costs by amount of data transferred (not flat fee as on the desktop), it is important to keep the (total) page size small. This will greatly improve the browsing experience on the mobile phone.

In history there was a moment when most users were connected to the internet through a modem, with a maximum speed of about 5 KBytes/s. At that point, there was a lot of advice on the web on the lines of: keep the size of your web page under 10 KBytes — it would take so and so many seconds to download a web page of that size over the modem, etc.

I think it would be a good mind-set for present day web developers if they imagined that 20% of their users… are connected to the internet through modems again. So, forget the 1MByte (including images, ajax libraries, etc) web page, cause it takes 200 seconds to download it over the modem.

It’s true though, this new modem is getting faster…

Disclaimer: This blog expresses solely my own personal opinion — it is not endorsed by and does not represent the opinion of my employer.

Arity Performance on Mobile Phone vs. Desktop Computer

March 27th, 2008

I’m so happy I’m blogging again!

So, everybody knows already that I am the author of the Arity Arithmetic Engine, a nice little open-source library for evaluating arithmetic expressions. In this library I put quite some attention on the elegant and minimal code, and on performance. The functionality is mainly split in two parts: compiling an expression (takes a string and returns a Function object), and evaluating the Function.

For example, compiling the string “g(x)=x^2″ produces a Function instance. Calling eval(5) on this function returns 25. These two operations (compilation and evaluation) are separated because you typically compile an expression once, but evaluate it many times (for example when plotting the graph of a function).

On a desktop computer, Arity can do about 50,000 compilations/second, and about 1,000,000 evaluations/second. So the compilation is about 20 times slower than the evaluation.

Why is the compilation so slow? well, you may be surprised, but the bottleneck during compilation is the parsing of a double value from a string (using the java.lang.Double.parseDouble(String)).
And Double.parseDouble() is not only slow, it also does quite some memory allocations (which again result in slowness when the GC is invoked to collect that memory).

One key advantage of the Arity library is that it compiles not only on JavaSE (desktop Java), but also on JavaME/MIDP (mobile Java). So last weekend I decided to measure its performance on my mobile phone (a modest Nokia 6300). I wrote a tiny midlet for the benchmark, and the result is:

On the mobile phone, Arity does about 500 compilations/second, and about 10,000 evaluations/second. So the 20 times factor between compilation and evaluation speed is the same as on desktop.

And the key information, the mobile phone is about 100 times slower than the desktop computer (from Arity’s point of view).

Still, 10,000 evaluations/second on the mobile phone is not bad, I am quite happy with this performance.

PS: go check out Arity: http://arity.googlecode.com/

Arity Engine released

January 19th, 2008

The blog was pretty dead lately, except for some really good comments I got on past posts on MIDlet signing — I was quite busy at the day job, and still am. But I thought I’d share the news in a tiny post:

I’ve taken out the arithmetic engine from Javia Calculator, re-written it, cleaned-up the API, packaged and released it under Apache 2.0 (open source):

Arity, the Arithmetic Engine for Java. Enjoy!

It works for both JavaME (MIDP) and normal Java (SE). I’m rather happy with the API and the code, it turned out elegant, compact and (as usual) very efficient. Hope you like it..

The new XML

October 5th, 2007

JSON is the new XML.

JSON, the “JavaScript Object Notation”, is a very simple notation for representing hierarchical data. It achieves what XML was supposed to do, and in a very simple way.

While the XML spec is hundreds (thousands?) of pages, the JSON spec fits comfortably on one page. Although the name suggest an association to JavaScript, JSON is in fact language-agnostic. There are JSON libraries for almost any language out there.

Sample JSON fragment:

{
"name":"John",
"tel":["+16051231234", "+48885003300"],
"address":{"street":"Dyamond 8", "city":"San Francisco"}
}

JSON has elementary values (strings, numbers; e.g. “foo”, 10), arrays marked by square brackets (e.g. [2, 4, 8]), and maps (also named ‘objects’) which contain key:value pairs between curly brackets (e.g. {”name”:”John”, “age”:”10″}).

Because of its simple and uniform structure, JSON is very simple to parse (this is a major advantage over XML). JSON is more compact then XML. JSON is more ‘pure’, more elegant than XML. And because of its ability of representing arbitrary complex hierarchical structures, JSON is just as powerful as XML.

While XML had all the hype, JSON is quietly getting the job done.

Follow-up: quite funny, only one day after I wrote this article, there was a similarly-themed post on CEO’s blog: JSON for data exchange (vs. XML).

Warszaw Marathon

September 23rd, 2007

Today I run, for the first time, the marathon. The marathon distance is 42.195 km. I finished it in 4:28 (i.e. a little under 4 hours and a half). Although I drank a lot of isotonic drinks and water (about 2 liters) during the course, at the end of the course I was weighting with 2.5kg less then at the start (perhaps lost water). Now my legs hurt and it’s a bit difficult to walk, but during the course I had no particular problems. Now I look forward to doing it again (with a better time?).

Today I run, for the first time, the marathon. The marathon distance is 26 miles 385 yards. I finished it in 4:28 (i.e. a little under 4 hours and a half). Although I drank a lot of isotonic drinks and water (about half a gallon) during the course, at the end of the course I was weighting with 5.5pounds less then at the start (perhaps lost water). Now my legs hurt and it’s a bit difficult to walk, but during the course I had no particular problems. Now I look forward to doing it again (with a better time?).

Simplicity is hard to get

September 16th, 2007

I’ve just seen the new web application Jottit, and I liked the way they get simplicity right.

Because simplicity and elegance is not only in the way it looks, but also: in the size of the web pages, in the URL naming scheme, and most importantly in the interaction design.

About the web page size, do you know many sites with under-1KB size pages? To compare, what about sites with over-100KB size pages? Why would size matter — because more people are browsing the web on mobile phones; but also because simplicity and elegance mean minimalism, i.e. not to put on a page stuff that’s not needed. It doesn’t matter that the slack is not visible because it’s hidden inside the HTML source, having an unnecessarily large page is still a violation.

The URL naming scheme is of paramount importance, yet it’s surprising how much developers aren’t aware of this. I created a test page on Jottit, and it has the URL http://javia.jottit.com/. You can’t get the URL simpler than this. The URL must be short, semantic, and without arbitrary, meaningless parts in it. If you have to put some data key somewhere (in the URL or in a Cookie) make it short! Look at Jottit, when they put a page-code in the URL, it has 4 characters: http://jottit.com/xy4v/ .

Let’s compare: I was recently looking for where to host an open source project. Sourceforge.net is the venerable site, and I have respect for the good they did for many open-source projects, but the site IMO sucks. It is as complicated as it can get! I think they worked hard to put as much stuff as possible on as many pages as possible. And one of the smaller details: they also own the domain name sf.net, which only redirects to sourceforge.net. I can’t begin to understand why, having such an incredibly valuable domain name available, the 2-letter SF.net, they don’t use it. If my project was named let’s say “foo”, I’d love a domain name like “foo.sf.net”.

Google Code Project Hosting (I hope I got the official name right) is a breeze compared to Sourceforge, it is much simpler. Much better. I’d recommend GCPH (Google Code Project Hosting) to anybody over Sourceforge, if only for all the hassle you save yourself when creating a new project. Yet GCPH si far from perfect: they don’t get the URL naming scheme! Let’s consider a project named “menstral”, it will have this URL: http://code.google.com/p/menstral/, and the subversion repository URL http://menstral.googlecode.com/svn/. The problems with these URL names is: 2 different domain names (code.google.com and googlecode.com) inside a single project, and the redundant and arbitrary “p” in the project’s main URL. The URL for the project should have been http://menstral.googlecode.com.

The conclusion: bravo to Jottit for setting a new standarnd in simplicity and giving us an example. And I have a feature request for them: get HTTPS to work on all pages.

How to fix multi-tap

September 9th, 2007

The problem

Let’s consider the classical ITU-T 12-key phone keypad:

  1 2 abc 3 def
4 ghi 5 jkl 6 mno
7pqrs
8 tuv 9 wxyz
  *
  0   #

There are two main ways to enter text using it (for example when writing an SMS): multi-tap, and predictive text input.

Predictive text has its own set of advantages and disadvantages, but I won’t discuss it here.

In multi-tap you push a button multiple times to select a letter from a given position — for example, to type ‘c’ you push the button containing the letters ‘abc’ three times. Multi-tap is popular and easy to understand, yet it’s very inefficient. The UTI-T letter layout (i.e. buttons with ‘abc’, ‘def’, and so no) is a lousy one for text input, and the explanation is historical: in the beginning, the phones had letters next to the numbers in order to allow to dial phone numbers that were presented as words (as in: Call 1-800-FLOWERS or 1-800-GOT-JUNK). So initially, the task was not to enter text, but to dial phone numbers presented as text. And it turns out that the letter layout that was chosen for that task is not at all efficient for text input.

The fix

First thing, let’s make use of the “1″ key — put letters on it to un-crowd the 4-letter keys (e.g. “7″ has “pqrs”) . We still have the three bottom keys (*, 0, #) to put space, symbols and digits switches.

Letter Frequency

We also use the fact that in a given language, the different letters are not equally important — some letters are used much more frequently than others. For example, in English, the top-9 letters (ETAOINSHR) account for 70% of letters used, while the bottom-9 letters (YPBVKJXQZ) account for less than 8%. The obvious thing to do is to place the most-frequently-used letters on the first position on the keys, so that they are easier to type.

One may observe that the letter frequency is language-specific. While this is obviously true, it also happens that there is a large amount of similarity in letter frequency among Latin-alphabet languages from as diverse families as Germanic (e.g. German, Dutch, English), Latin (e.g. French, Spanish, Romanian) and Slavic (e.g. Polish, Czech). Surprisingly, there is a great deal of overlap of the most frequent letters in all Latin-alphabet languages. This means that using letter frequency we can come up with a layout which is good not only for English, but for many languages (a similar situation exists with the Dvorak keyboard layout).

For example, doing an informal frequency-weighting between English, French, German, Spanish, and secondarily Polish and Romanian, I came up with these top-9 letters (in order): EATINSORD. What’s to note is that H, which is rather frequent in English but rare in all the other languages, has been dropped in favour of D, which is frequent in many languages, and not too bad in English either.

So, based on frequency, we categorize the 26 letters of the Latin alphabet in 3 groups: the top-9 which will be placed on the first position, the middle-9 which are placed on the second position, and the bottom-8 which are on the third position on the keys. These groups are:

Most frequent e a t i n s o r d
Middle h l u c m w g f p
Least frequent b y v k j x q z

Bigram Frequency

With multi-tap there is a problem when two successive letters (in the text) happen to be on the same key. In this situation usually the user has to makes a small pause before advancing to the next letter.

For example, to type “ba”, you push twice the “ABC” key, make a pause, and push once more the same key.

This pause is annoying and slows down typing, so it’s good to reduce the number of cases when it’s needed. This is accomplished by putting together on the same key letters that appear seldom in succession. For this we use the bigram frequency, which is the frequency (in a language) of two-letter pairs.

An additional optimization is to eliminate the ‘rotation’ when multi-tapping the same key. Once the last letter on a key is selected, and the user pushes once more the same key, the typing automatically moves to the next character instead of ‘wrapping around’ to the first letter on the key in a circular fashion. The benefit is that after the last key on a button there’s no need for the same-key pause.

Let’s consider a key with 3 letters, let’s name them A,B,C. They are on the key in this order, A-B-C. In this situation, the frequency of the same-key pause for this key is given by: f(wait)=f(AA)+f(AB)+f(AC)+f(BB)+f(BC), where f(AB) denotes the frequency of the bigram AB.

On each key we choose one letter from each of the three frequency categories (most-frequent, middle, least-frequent), as to minimize the same-key wait based of linguistic bigram frequency.

Physical placement

Once we have the groups of letters for each button, we attempt to optimize the physical placement to increase the ergonomy while typing. There are many factors involved here (e.g. how does the user type, one hand or two hands? thumb or index? etc). I attempted to minimize the amount of finger movement between successive letters (because it takes a longer time to move the finger a longer distance) — again by using bigram frequency. Also, I tried to make the layout memorable (the first letters form pronounceable words on rows), and to place the most frequent letters (EAT) on the easiest to reach keys.

The Layout

A tentative layout resulting from these considerations is below. While it may still benefit from some improvements, it’s already much better than the classical UTI-T layout.

Afq Tmk Iuy
Sl Ewj Rhx
Npb
Ogz Dcv
  *
space   #

I call this layout ATI SER NOD, from the first-letter (the most frequent one) read by rows.

Java class literal on CLDC

September 9th, 2007

In Java, a class literal is a construct of the form String.class, which returns the Class object that corresponds to the named class. You can use this to test the exact class of an object, like this:

boolean isInteger(Object obj) {
  return obj.getClass() == Integer.class;
}

(A similar goal can be achieved by using instanceof, even though the semantics and the performance are not exactly the same.)

If you try to use the class literal on CLDC 1.0, it won’t work. You get at compile-time the rather funny error message

class file for java.lang.NoClassDefFoundError not found.

The explanation is that, up to and including Java 1.4, the class literal is emulated this way:

static Class class$java$lang$Integer;

static Class class$(String className) {
  try {
    return Class.forName(className);
  } catch (ClassNotFoundException e) {
    throw new NoClassDefFoundError();
  }
}

boolean isInteger(Object obj) {
  return obj.getClass() ==
    (class$java$lang$Integer != null ?
      class$java$lang$Integer :
      (class$java$lang$Integer = class$("java.lang.Integer")));
}

So the innocent-looking Integer.class is translated into a bunch of code, which can throw a new unchecked exception, NoClassDefFoundError. The problem is that CLDC 1.0 doesn’t provide this exception class, so you get a compilation error.

The solution? Use instanceof if possible. Another solution is to use a static final member initialized by calling getClass() on a dummy instance.

static final Class INTEGER_CLASS = new Integer(0).getClass();

OpenMoko mobile phone available for $300

July 9th, 2007

The OpenMoko mobile phone, with a completely open software stack, is available starting today for $300. http://openmoko.com/

Trivial Templates

June 8th, 2007

A template is used to generate a text string based on some data parameters. A simple template for a letter:

Dear ${name},
please pay us ${amount} dollars.

You use a template like this:

letterTemplate = Template(templateString)
letter1 = letterTemplate(name='John', amount='100')
letter2 = letterTemplate(name='Tom', amount='200')

The template can be more powerful than simple text substitution, by using loops, conditionals, etc.

Hello ${name},
you ordered these ${len(items)} items:

{{for (i, item) in enumerate(items):}}
item ${i+1}: ${item}
{{end}}

Calling the template with some parameters

print template(name='Joshua', items=['bottled water', 'milk', 'icecream'])

generates this output:

Hello Joshua,
you ordered these 3 items:

item 1: bottled water
item 2: milk
item 3: icecream

How difficult is it to implement such a template engine? 60 lines of python code:

import cStringIO as StringIO
import re

_reExpr  = re.compile(r'\${(.*?)}')
_reCode  = re.compile(r'(?:\n *)?{{(.*?)}}')

class Adder():
    def __init__(self):
        self.indent = ''
        self.o = StringIO.StringIO()

    def incIndent(self):
        self.indent = self.indent + '  '

    def decIndent(self):
        self.indent = self.indent[:-2]

    def write(self, txt):
        self.o.write(self.indent + txt + '\n')

    def getvalue(self):
        value = self.o.getvalue()
        self.o.close()
        return value

def pairify(seq):
    it = iter(seq)
    while True:
        yield (it.next(), it.next())

class SimpleTemplate():
    def __init__(self, string, origin = None):
        string = _reExpr.sub(r'{{_o(str(\1))}}', string)
        parts = _reCode.split('{{}}' + string)[1:]
        adder = Adder()
        for (code, txt) in pairify(parts):
            code = code.strip()
            if code:
                if code == 'end':
                    adder.decIndent()
                else:
                    firstWord = code.split(None, 1)[0]
                    if firstWord[-1] == ':': firstWord = firstWord[:-1]
                    if firstWord in ['else', 'elif', 'except', 'finally']:
                        adder.decIndent()
                    adder.write(code)
                    if code[-1] == ':': adder.incIndent()
            if txt:
                txt = txt.replace('\\\\', '\\\\\\\\').replace('"', '\\\\"').replace('\\n', '\\\\n')
                adder.write('_o("' + txt + '")')
        source = adder.getvalue()
        self.code = compile(source, origin if origin else source, 'exec')

    def __call__(self, **varMap):
        output = StringIO.StringIO()
        varMap.update(_o = output.write)
        exec self.code in varMap
        content = output.getvalue()
        output.close()
        return content

How to switch to UTC

April 25th, 2007

UTC (Universal Time Coordinated) is basically the same thing as GMT (Greenwich Mean Time). It is the zero-offset time zone.

As the world gets global, it gets confusing to use all the different timezones. It would be good to have a uniform system to name the time — and UTC is the best candidate.

UTC is already used by airplane pilots when they communicate time — because the planes travel often between timezones, and you don’t want some pilot or control tower to get confused in the time-zone names and time translation.

The local time system that is in use now has the advantage that the time (in the local time zone) corresponds to some degree to the sun position on the sky. In general you expect that 8:00am is in the morning (sunrise), and at 1:00am you expect that it’s dark.

The drawbacks of the local time system become appearent when people travel or communicate between timezones: a traveller has to adjust his watch when travelling, and continously make mental transformations to compute the time in the other time zone. Whatsmore, were there a single (unique) timezone, we could drop the timezone concept altogether, and only talk about time (or Earth time, when need comes).

Also the communication between people is now global — instant messenger, VOIP calls, skype etc. More and more people talk between timezones. It’s easy to see the babel scenario when co-workers use different time-zones each to name the time. For example:

A: What about having the meeting at 9:00 PDT?
B: Sorry, I have to leave at 18:00 CEST.
C: 18:00 EEDT works fine for me.
A: Ok, what about 8:00 PDT then?

Here is the plan for the global switch to UTC:

  1. Eliminate Daylight Saving Time (Summer Time) in all time-zones (have a single time throughout the year in each timezone).
  2. Merge close-by zones: Eastern Europe (EET, e.g. Romania) and Central Europe (CET, e.g. Germany) would adopt U.K. time (UTC); The United States would merge into a unique time zone (e.g. PST or EST). This should reduce the global number of time-zones to about half a dozen.
  3. Start communicating the time both in the local timezone and in UTC (in order for the population to get used to UTC).
  4. Abandon the remaining timezones, everybody adopts UTC only.

Of course, an ambitious endeavor, to unify the timezones. To put it in perspective, is it more difficult than the human flight to Mars? Or is it easier than the power-plug unification?

On the plus side, I think it would be enough for some major countries to adopt UTC (China, India, US, European Union, Russia, Japan) and the others will follow.

Working for the man, part 1

April 17th, 2007

Among the first things I’ve been imprinted with at Google, is that Google is a very secretive company. I’m not allowed to tell almost anything about [the inside of] Google. I don’t even know how I could ever post anything mentioning the word Google on my public blog, but given that I’ve read public blogs of other googlers, I imagine that just blogging may not be a reason for immediate firing. Just for caution, I’ll try to restrict myself to saying only positive things about Google (and this does not imply that there are other-than-positive things to be said about Google) — self-censorship is an interesting attitude.

First, I should stress that there are plenty of great things about Google — the problem is that many of these are already well known (not to mention the food).

My first contact with the inside of Google was through the interviewing process (when I was recruited). I can tell that I was positively impressed by the interviewing process: most of the questions/tasks were relevant and challenging, and the people conducting the interviews seemed compentent and smart. I also had the impression that they were looking for objective hints in order to evaluate my skills (as oposed to subjective judgement). As a simple outline of the interviewing process, I started by sending my Resume (CV) by email to Google. After this followed two phone-screens, and afterwards, four in-person interviews.

What can I say, I’d wish to become as good an interviewer as some of the persons who interviewed me. But there is one small thing that keeps me from being completly extatic about the recruiting process at Google: about a year and a half ago, I sent for the first time my resume to Google. It was not exactly the same resume, and the layout was different, but it was essentially describing the same person (me).

Back then, I got back the standard refusal email, we have no oppenings matching your skills at this time or something like this, but I think that the real message of the email was: based on your resume, we decided it’s not worth proceeding to the phone-screen phase. Meaning that the recruiter was so confident that I’m a bad candidate, that it didn’t warrant a phone-screen test.

I don’t really believe that I improved that much between then and now. I rather think that I experienced a glimpse in the recruitment process: either Google hired a bad candidate now, or the recruiter hurried to drop my resume back then — I hope it’s the last. Of course, Google (like some other companies) is more concerned about avoiding a bad hire than about losing a good candidate — or in other words, a false-negative is preferred to a false-positive. From the employer perspective, it’s preferrable to err on the side of caution; what I’m surprised is that my resume was judged so bad that I didn’t even make it to the phone-screen.

PS: if you’re interested in a software development position at Google, you may be better-off sending your resume (CV) to a Google employee you know for them to refer you, rather than doing a spontaneous application. And second, Google is hiring in (extended) Europe and Russia, so don’t be put-off if you don’t live in the US.

Sidebar: the US measure system

April 14th, 2007
US Metric
1 mile 1.6 km
1 foot (feet, ‘) 0.3 m, ~1/3 m
1 inch (”) 2.5 cm
1 gallon 2.5 liter

Temperature

Celsius Fahrenheit
-10 14
0 32
10 50
20 68
30 86
40 104
Fahrenheit Celsius
25 -4
50 10
75 24
100 38