Archive for January, 2007

Carnival of the Mobilists at Mobbu

Monday, January 29th, 2007

Mobbu is hosting this week’s Carnival of the Mobilists; check it out.

URL Design

Friday, January 26th, 2007

The URLs used to access a web application are important: they are part of the application’s interface. Here I put together some loosely ordered principles which should help towards a quality URL design.

  1. Keep URLs short. Short is easy to remember, to edit in the address bar, to write down, etc. The most often used URLs should be the shortest.
  2. Don’t use file extensions (such as .html, .gif, .php, etc.). The extension is an implementation detail that doesn’t need to be presented to the user.
  3. Normalize URLs. Every entity should have one canonical URL. Other URLs that point to the same entity should use HTTP redirects to the canonical URL.
  4. Normalize the domain name to the form without the leading “www.”, because it becomes shorter. The leading www. doesn’t add information for the user.
  5. A path that can be continued with more segments should behave like a directory, i.e. it should be normalized to end with a slash.
    If http://x.org/user/john is a valid URL, then http://x.org/user should redirect to http://x.org/user/ (note the appended slash).
  6. If a path is valid, then any prefix path should be valid too.
    If http://x.org/admin/user/john is valid, then these should be valid too: http://x.org/admin/ and http://x.org/admin/user/ .
  7. Have the prefix URL allow exploration (browsing) to the possible suffixes that may follow it.
    In the case of http://x.org/country/brazil : http://x.org/country/ may provide a list of possible country names, with links.
    In the case of http://x.org/user/john : http://x.org/user/ may provide an input field for entering a user name if the number of users is too large to enumerate them all in a list.
  8. Avoid ‘query’ parameters in URLs. Instead of http://x.org/?p=101 use http://x.org/p/101 . The exception is form-action URLs, which get the parameters appended by the form submission.
  9. Avoid session-IDs or other long binary (opaque) identifiers in URLs. If you really absolutely need an opaque identifier in the URL, then make the information as short as possible (number of bytes) and encode it in the most compact way (e.g. base64).
  10. Try to have homogeneous levels, where all the continuations (suffix segments) of a URL are of the same kind.
    Example: If http://x.org/user/ is continued with ‘john’, ‘andy’, ‘mary’: it’s good, all the continuations are user names (nouns).
    But if http://x.org/user/ is continued with ‘john’, ‘edit’, ‘remove’, ‘102’, ‘new’: bad, because ‘edit’ and ‘remove’ are actions (verbs), ‘john’ is a noun, ‘102’ (noun too) is an id and not a name, ‘new’ is a noun but designates an action (add), etc. This level is not homogeneous (it’s mixed-up).

It’s great if a web application has such a simple (and nicely structured) URL space that the user, after some use of the application, acquires a mental map of the URL space. From that moment on he no longer feels lost in a huge URL spaghetti. He achieves orientation: by simply looking at the URL he knows where he is and where he can go. He can even extrapolate from the observed URL structure and explore new URLs.
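Principles 3 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `directory_paths` set (the paths that accept further segments) are hypothetical and would come from the application’s own routing table.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, directory_paths=frozenset()):
    """Return the canonical form of url, following principles 3 to 5.

    directory_paths is a hypothetical set of paths that accept further
    segments (for example "/user") and so should end with a slash.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Principle 4: drop the leading "www." from the domain name.
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    # Principle 5: a path that can be continued behaves like a directory.
    if path in directory_paths:
        path += "/"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_target(url, directory_paths=frozenset()):
    """Return the URL to redirect to, or None if url is already canonical."""
    canonical = canonical_url(url, directory_paths)
    return None if canonical == url else canonical
```

With `directory_paths={"/user"}`, a request for http://www.x.org/user would be redirected to http://x.org/user/ , while http://x.org/user/john needs no redirect.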

Google OpenID

Wednesday, January 24th, 2007

OpenID has two good ideas:

  • The user identifies himself using a URL
  • The same ID (URL) is used to log in to multiple web sites

Google, on the other hand:

  • Has a large number of users, and a large internet footprint
  • Uses a single ID (the gmail account name) for login to multiple google services (gmail, analytics, adwords, sitemap, blogger, etc)

Google should realize that allowing the google users to use their same (google) ID on other (non google-affiliated, independent or rival) sites, while maintaining the single google logon, is something that the users want. OpenID offers an API for doing exactly that. In this situation, I see three possibilities:

  • Google embraces OpenID: Google starts offering an (automatic) OpenID URL for each existing google account, thus allowing the google accounts to be used on any OpenID-enabled web site
  • Google develops its own alternative single-logon API, and fights OpenID. The google IDs won’t be URLs in this case, which is unfortunate
  • Google does nothing and misses the single-logon opportunity

Midlet Signing

Wednesday, January 24th, 2007

Disclaimer: this article is in draft state; I plan to come back and improve it after checking the facts, so take it with a grain of salt.

The fact that a midlet is signed by the author (vendor) certifies two things:

  1. that the author really authored that midlet
  2. that the midlet is in original state (i.e. it hasn’t been modified by a third party)

The purpose of signing a midlet is two-fold:

  1. to gain access to security-sensitive operations on the phone (such as sending an SMS or opening a TCP connection)
  2. to make sure that a (malicious) third party can’t present a different midlet as the original one

Digital signing makes use of two concepts: public-key cryptography (a.k.a. asymmetric cryptography) and digital certificates.

Signing works like this: a secure hash (also called message digest) of the midlet is computed, and this hash is encrypted with the private key of the signer. The signature is checked like this: the encrypted hash is decrypted using the public key of the signer, and is checked for equality with a computed hash of the midlet. If the two (the signed hash, and the actual computed hash) are equal it means that the midlet signature was verified successfully.

The idea is that only the owner of the private key can sign the midlet (so that the signature is verified with the corresponding public key), and that any modification to the midlet after the signing will cause the signature verification to fail.
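The hash-then-sign procedure can be tried out with a toy example. This is a minimal sketch, not how real midlet signing is implemented: it uses textbook RSA without padding, and the key pair is built from two publicly known Mersenne primes, so it offers no security at all. All names here are hypothetical.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes (NOT secure).
P = 2**107 - 1
Q = 2**127 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest(data):
    """Secure hash of the data, reduced so that it fits below the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data, private_exponent=D):
    """Compute the hash and transform it with the private key."""
    return pow(digest(data), private_exponent, N)


def verify(data, signature, public_exponent=E):
    """Recover the signed hash with the public key and compare it
    with a freshly computed hash of the data."""
    return pow(signature, public_exponent, N) == digest(data)
```

Signing some bytes and then calling `verify` on the same bytes succeeds; changing even one byte of the data after signing makes `verify` return False, which is the tamper detection described above.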

As you see, all that is needed to check the signature is a public key. The result of the signature verification is a yes/no to whether the midlet was signed by the owner of the private key corresponding to the public key used.

But in order to be meaningful, the result should be whether the midlet was signed by a real-world entity (the midlet author/vendor, a person or company). So the public key must be associated to the identity of some author/vendor. This association between a public-key and a real-world identity is achieved using digital certificates.

A digital certificate claims that some public key belongs to some entity (a person or a company). A digital certificate contains the pair: public key, name of the entity (who owns the key). A certificate is also signed (using the same signing procedure we’ve seen above) in order to prevent the situation that a fake certificate is built by a malicious entity which claims a different identity for its key.

The midlet is signed and the midlet signature is verified against a certificate (which contains the public key needed to check the signature, and the name of that key’s owner). This certificate is also signed, and the certificate signature is verified against another certificate, which contains the public key for checking the first certificate’s signature and the name of this key’s owner. This second certificate (which signs the first certificate) is at its turn signed with another certificate. And so on, we have a chain of signatures.

This chain of signatures ends with a certificate which is not signed by any other certificate (a detail is that this original certificate is self-signed, meaning that it’s signed with the private key matching its own public key, but this has the same effect as not being signed at all). The only way to be sure that this self-signed certificate is not fake is to have it in a list of trusted root certificates.

For example, your browser has a list of trusted root certificates, which contains certificates from well-known and trusted companies such as Thawte and Verisign. Any chain of certificate signatures is followed until it ends with some such trusted root certificate.

But if the final (self-signed) certificate is not found in the list of trusted certificates, then it can’t be trusted (its authenticity can’t be verified), so the whole chain of signatures can’t be verified.
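The chain walk described above can be sketched as follows. This is a simplified illustration: the `Certificate` class and the function name are hypothetical, certificates are identified only by name (real implementations compare keys or fingerprints), and the signature check at each step is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str   # name of the entity that owns the key
    issuer: str    # name of the entity that signed this certificate


def chain_is_trusted(cert, known_certs, trusted_roots, max_depth=10):
    """Follow issuer links from cert until a self-signed certificate.

    known_certs maps a subject name to its Certificate; trusted_roots is
    the set of root subject names shipped with the device or browser.
    Checking each signature along the way is omitted in this sketch.
    """
    for _ in range(max_depth):
        if cert.subject == cert.issuer:       # self-signed: end of the chain
            return cert.subject in trusted_roots
        cert = known_certs.get(cert.issuer)
        if cert is None:                      # issuer unknown: broken chain
            return False
    return False                              # chain too long or circular
```

The same chain gives a different answer depending on the trusted list it is checked against, which is the whole point: trust comes from the list, not from the chain itself.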

Now let’s go back from this general signing stuff to our midlet signing. Any mobile phone comes with a list of trusted root certificates (used for verifying midlet signatures). This list contains root certificates selected by the manufacturer (e.g. Nokia) and usually also contains the root certificate of the network operator (e.g. Orange) in the situation when the mobile phone is distributed by an operator.

Let’s consider a practical example: I am a midlet author. I write a midlet, and I want to sign it, what do I have to do?
First, I need to buy a certificate from some certification authority (i.e. seller of digital certificates), such as Thawte, Verisign, GoDaddy. Because I want to use this certificate for midlet-signing, I have to be careful to select a code-signing certificate (there are also other kinds of certificates, most notably SSL certificates used to certify the identity of web sites, which can’t be used for midlet signing). So I choose to buy a code-signing certificate from Thawte for $200/year (note, the price is per year, recurring, as with a domain name).

To buy the certificate, I first generate a pair of keys, private-key and public-key. I keep the private key secret, and I send the public key to Thawte for inclusion in the certificate. Thawte checks my identity (using perhaps some photo ID that I fax them, and a phone call from them), and afterwards sends me back a certificate containing my public key and my identity (my name). Most importantly, this certificate is signed with Thawte’s root certificate (this is what I paid the certificate’s price of $200/year for).

Using this certificate I just bought, I sign my midlet, and I start distributing the signed midlet. Now some user tries to install my midlet on his phone. The phone automatically checks the midlet signature, i.e. it follows the signature chain, starting with the midlet, continuing with my certificate, and ending at Thawte’s root certificate. If the phone has Thawte’s certificate in its list of trusted certificates, it’s all dandy and the signed midlet is installed. But if it happens that the particular phone model doesn’t contain Thawte’s certificate in its list, the signature verification fails, the midlet isn’t installed, and the user can’t use it at all. This situation means that I paid the certification money to Thawte for nothing.

Now you may think that this example is unrealistic: surely I can buy a certificate which is recognized by all phones? Well, no. Different phone manufacturers (and even different models from the same manufacturer) have different sets of trusted root certificates. For some phones Verisign works and Thawte doesn’t, for other phones Thawte works and Verisign doesn’t, for others neither Thawte nor Verisign works, and so on. The practical solution: buy as many certificates from different certificate vendors as you can afford (all for the same public key), and use them all when signing your midlet. When your midlet is installed on a phone, if at least one of those certificates works, the midlet signature can be verified and all is good. The downside of this method: it costs a lot of money, and it generates a lot of clutter and redundant work.

What’s more: while some phone models allow the user to edit the trusted certificate list (perhaps in order for the user to add a root certificate he finds missing), many phone models do not allow it. (I can’t understand why the manufacturer would prevent the user from adding new trusted certificates; I consider this a crippleware tactic. For example, Nokia S40 2nd edition allowed user access to the trusted certificate list, while the more recent S40 3rd edition doesn’t allow it anymore.)

So, let’s turn back to my case study (and publicity plug). I developed two freeware midlets (Menstral and Javia Calculator), that I distribute free of charge. Let’s say that, for the security of my users (in order to protect them from a malicious distributor who tries to impersonate my midlets), I want to sign my midlets, so as to certify that I really am the author, and that the midlets haven’t been tampered with. So what do I have to do? Pay $200/year for the certificate, and have my midlets fail to install on some of the phones because my certificate isn’t recognized… this is why there are so very few signed midlets.

You’d think it can’t get worse than that? Why, it can: Motorola, Nokia, Siemens, Sony-Ericsson and Sun saw there is a problem and they promptly found the ‘solution’: create a new certificate, and have every phone support only this new certificate, and create a signing program which will bring them more money (and have the midlet authors pay).

This new initiative is called Java Verified. The new certificate, which is the only one supported on newer Nokia phones (see Nokia 5300 at “Java verified root certificate”) is called UTI Root (Unified Testing Initiative).

And how does this Java Verified, Unified Testing Initiative work? Simply: you pay! No, really: you send your midlet to one of the Java Verified Test Providers (like CapGemini), who runs a series of tests on your midlet. For example, they check that your application provides a Help command and an Exit command, that it doesn’t hang, that it doesn’t do anything malicious (but they don’t have access to the source code), etc. They do this by running the application a few times on a real device. If they find some problems they send you back a problem report, and (after attempting to fix it) you have to pay again for one more try at the Java Verified certification. Pay, and repeat.

How much does it cost? With CapGemini, it costs 240 euro for the first submission, and 210 euro for subsequent submissions (after failing certification on the previous attempts). This price is per midlet and per targeted device. That is, if you want your signed midlet to run on multiple phone models, you have to pay once for each phone model (see a list of devices supported by Java Verified, you have to pay for each one).

So Java Verified considers that a midlet author should specify the targeted device for the midlet, the targeted device being something like Nokia 5300 or Nokia 6131. So much for the portability of JavaME applications, which says that a JavaME midlet should be able to run on a wide range of devices, from different manufacturers, with different screen sizes, etc. (And Sun, who should be the main pusher of java portability, is a member of Java Verified.)

For example, my Menstral midlet is supposed to run on any MIDP device with a color screen at least 128×128 pixels in size. I guess such a ‘targeted device’ counts as many tens of Java Verified devices.
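To put a rough number on it, here is a back-of-the-envelope sketch using the CapGemini prices quoted above. The function name and the device count of 30 are hypothetical; 30 is only a stand-in for ‘many tens’ of devices.

```python
FIRST_SUBMISSION_EUR = 240   # CapGemini price for the first submission
RESUBMISSION_EUR = 210       # price for each submission after a failure


def certification_cost(devices, failed_attempts_per_device=0):
    """Total cost in euro for one midlet, paid once per targeted device."""
    per_device = FIRST_SUBMISSION_EUR + RESUBMISSION_EUR * failed_attempts_per_device
    return devices * per_device


print(certification_cost(30))      # 7200 euro if every device passes first time
print(certification_cost(30, 1))   # 13500 euro with one failed attempt per device
```

For a freeware midlet, a bill of this size is clearly out of the question.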

So again, that’s why there are so very few Java Verified midlets…

Midlet-Vendor

Sunday, January 21st, 2007

A mandatory midlet attribute, which must appear in both the jad and the jar, is Midlet-Vendor; it designates, as the name suggests, the vendor (or the author) of the midlet. One problem with this attribute is that the structure of its value is not specified in the MIDP standard: it may contain any string (including whitespace). As a result, while the Midlet-Vendor attribute may still be useful for presentation to the human user, it is of limited use for automatic processing.

Let’s consider some examples that illustrate the point:

Midlet-Vendor: Mihai Preda
Midlet-Vendor: mihai-preda

Midlet-Vendor: www.javia.org
Midlet-Vendor: myself

Midlet-Vendor: FooSoft
Midlet-Vendor: Foo Soft Inc., USA

In the first examples we see different ways of writing essentially the same vendor name, by varying case or whitespace. While it looks the same to a human reader, the vendor string is certainly different from a computer’s perspective. This makes it difficult to automatically associate the different midlets with one and the same vendor. There is also a second problem with a vendor string which contains the name of the author: of course there are multiple persons with the same name (unless the author has some particularly rare name), so it may happen that two distinct authors (with the same name) end up being identified as the same vendor.

Let’s look toward a solution: We want the Midlet-Vendor to have a normal form, which allows two vendor strings to be compared reliably in order to detect whether they designate the same entity or not; and we want to eliminate the possibility that two different entities end up using the same vendor string.

The solution that I propose is to have a URL (URI) as the content of Midlet-Vendor. In order to make it explicit that it’s a URL, the scheme (e.g. http://) should be present.

Midlet-Vendor: http://javia.org/

The URL should point to a page which contains more information about the vendor; it could point to a company’s home page, to the blog of the author, etc.

One objection to this solution could be that the Midlet-Vendor is also intended for presentation to the human user (of the mobile phone), and the company URL is not as clear to a human as the company’s name. While I don’t necessarily agree with this argument, a solution would be to have the Midlet-Vendor contain a free string (the company name, intended for user presentation) and a URL, intended for equality-comparison between different Midlet-Vendors. Examples of good Midlet-Vendor strings:

Midlet-Vendor: Mihai Preda (http://javia.org/)
Midlet-Vendor: Foo Soft Inc. (http://www.foo-soft.com/)
Midlet-Vendor: http://google.com/

The important point is this: the Midlet-Vendor should always contain a URL (optionally enclosed in parentheses), and only the URL should be used for equality-comparison between different Midlet-Vendor strings.

But according to the MIDP standard, the full Midlet-Vendor string must be compared for vendor-equality, so in practice I would suggest using only a URL as the content of the Midlet-Vendor.
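A tool that wanted to follow the proposed convention could compare vendors along these lines. This is a minimal sketch with hypothetical function names; it is not part of MIDP or of any existing library, and the URL normalization is deliberately crude (lowercased host, empty path treated as ‘/’).

```python
import re
from urllib.parse import urlsplit

# A URL with an explicit http(s) scheme, ending at whitespace or a parenthesis.
_URL = re.compile(r"https?://[^\s()]+", re.IGNORECASE)


def vendor_url(vendor):
    """Return the normalized URL found in a Midlet-Vendor value, or None."""
    match = _URL.search(vendor)
    if match is None:
        return None
    parts = urlsplit(match.group(0))
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"


def same_vendor(a, b):
    """Two vendor strings match only if both carry the same URL."""
    url_a, url_b = vendor_url(a), vendor_url(b)
    return url_a is not None and url_a == url_b
```

Under this sketch, ‘Mihai Preda (http://javia.org/)’ and ‘http://javia.org/’ designate the same vendor, while ‘Mihai Preda’ and ‘mihai-preda’ carry no URL and so never match anything.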

Dreamhost

Thursday, January 18th, 2007


I’ve been using Dreamhost for one year now, and I thought I’d write a review, to see how the referral system works (disclaimer: I get referral fees for everybody who signs up following a link from this page).

The lowest plan costs about $10/month, and comes with about 180 GB disk space and 1800 GB transfer/month; this is rather huge. Looking at the status page for my account (which I share with two friends), I see that together our disk usage is at 1% of the space available, and our traffic usage has never reached 1% (it always stays at 0%). So for practical purposes, the disk and transfer seem unlimited.

You can create up to 75 Linux user accounts, with shell (bash, ssh) access. You can host any number of domains, with any number of subdomains and mailboxes; all this free.

Dreamhost offers one-click installs of WordPress, phpBB, Joomla!, MediaWiki, activeCollab and many others.

I’ve installed and I’m using Django (Python web framework) with FastCGI without any problem. Dreamhost also supports Ruby on Rails, although I don’t use Rails myself. For databases, DH supports MySQL (with many storage engines: MyISAM, InnoDB, BerkeleyDB, etc.; InnoDB supports transactions) and of course SQLite (PostgreSQL is not offered yet, but may be added in the future if there is demand for it). You may have an unlimited number of MySQL databases. As web server, Apache versions 1.3 and 2.0 are available, with full .htaccess configuration (including mod_rewrite). PHP (versions 4.4.2 and 5.1.2) is of course available.

Dreamhost also offers subversion (svn) that you can use either through ssh, or access with the browser. I keep my development svn repository at DH, thus taking advantage of their data security and backup (if my laptop gets stolen, or if my hard disk explodes, I won’t lose my source code). DH also offers webdav (mod_webdav in Apache).

It is possible to compile and install at Dreamhost the applications or libraries that you need (if they are not already offered by DH); this comes in handy when developing a web application. For example, I’ve installed on my account: Python 2.5, Django (svn version), GDB, SCons, GeoIP, ImageMagick, readline, ares, and the most recent version of SQLite.

You also have low-level access to the DNS records (so you may add any DNS records on your domains: A, CNAME, MX, NS, PTR, TXT, etc). And the greatest thing: even though wildcard DNS is not officially supported, the DH support team was kind and, at my request, has just activated wildcard DNS on one of my domains (thanks DH, this is great!).

And Dreamhost offers a free domain name registration (com, org, net — which typically cost $9/year) with any account.

What’s more: on my Linux laptop, using FUSE and sshfs, I remotely mount my DH account directory as a mount point on my local filesystem. This way I can work with and edit all the files from DH just as if they were local on my laptop, which is very useful.

About the DH drawbacks: the DH servers do not seem as responsive as they could be. And an account doesn’t come with a dedicated IP included (you may add IPs at additional cost). Because of this, it’s not possible to do SSL (https://) on the hosted domains ‘out-of-the-box’ — you need to add a dedicated IP, which costs $4/month, for that.

In conclusion, Dreamhost offers a huge amount of liberty; it is as close to a VPS (virtual private server) as it gets, while keeping the cost in the low shared-hosting range.

If you have read this far, don’t forget to use the promo-code WILDCARD when you sign up: it will get you the one year plan (which normally costs $119.40) down to $29, with one domain registration included (isn’t that incredible?).

Update

Sunday, January 14th, 2007

Since the 1st of January, 2007, Romania (along with Bulgaria) has been a member of the European Union. That’s great, congratulations to Romania and Bulgaria for making it. (I’m Romanian myself, and yes, this EU accession does feel good.)

The open-moko phone (FIC Neo) is now scheduled for release in February 2007. I can’t wait to get one. Its release becomes even more interesting in light of Apple’s recent iPhone announcement, as the two phones, despite many similarities, are at the two extremes of freedom: Open-Moko comes with full open source code, while the iPhone doesn’t allow installation of third-party applications and comes with a 2-year contract.

The MIDP 3.0 (JSR 271) early draft is available for download. This draft is the best way to find out now what MIDP 3.0 will bring; it should be an interesting read for anybody involved with JavaME. The most important addition seems to be the concept of libraries (which can be shared between midlets).

For the first time, I am a participant in the Carnival of the Mobilists (edition #58), with my article JAD is Bad (which argues that the JAD concept introduces unneeded complexity).

I’ve started working with Django, which is a Python web framework. A nice thing about Django is that the source code is relatively small and readable, so it’s easy to check the code out in order to see how it works or to fix things. (Reading code is also a good way to learn a programming language.) I’ve already submitted a patch regarding the append-slash redirection.

I’ve written a small Python library called Python Mobile User Agent, which analyzes the User-Agent HTTP header in order to detect whether it belongs to a mobile device (and attempts to extract the mobile’s vendor/model from the User-Agent). This is useful for server-side content adaptation, where a web application generates a different variant of a page for a mobile browser vs. a desktop browser. The library is based on Craig Manley’s Perl/PHP MobileUserAgent library. I’ve put my library on Google code hosting (under the MIT license) in order to get a feeling for how Google code hosting works.
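The general idea of User-Agent based detection can be illustrated with a very small heuristic. This sketch is not the API of Python Mobile User Agent or of MobileUserAgent; the function name and the list of hints are assumptions made for the example, and a real library needs far more patterns than this.

```python
# Substrings that commonly appear in the User-Agent of JavaME-era phones.
# This list is a hypothetical, far from complete, illustration.
MOBILE_HINTS = ("midp", "cldc", "symbian", "nokia", "sonyericsson", "up.browser")


def looks_mobile(user_agent):
    """Guess whether a User-Agent header belongs to a mobile device."""
    ua = user_agent.lower()
    return any(hint in ua for hint in MOBILE_HINTS)
```

A web application could call such a function on each request and pick a lighter page template when it returns True.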