POP3 – UIDL is a required command!

RFC 1957 observes, discussing mail reading software that implements the popular POP3 protocol: “two popular clients require optional parts of the RFC. Netscape requires UIDL, and Eudora requires TOP.”

This reads like a complaint, but this tell me that Netscape’s mail reader (which these days is called Thunderbird) is well designed.

The rot started with  RFC 1939, the standard for this protocol. This document specifies that UIDL is optional. This was a mistake. Without UIDL, the protocol is not reliable. I write this in the hope of persuading you that UIDL should not only be considered a requirement for a POP3 server, but that any client software that doesn’t require UIDL should not be trusted. I’m looking at you, Eudora!

What is UIDL and how does it fit into POP3?

UIDL is the “directory listing” command in POP3. When a client issues this request, the server responds with a list of “unique-id” strings that may as well be considered file names.

Opening a POP3 connection, authenticating and performing a “directory listing”.

Each unique-id is paired with a numeric id, starting from 1. The other commands to download and delete messages all use these numeric ids. Each time the client reconnects, it will need to repeat the UIDL command so it knows which numeric ids refer to which messages.

For something as fundamental as a directory listing, it seems odd for that to be optional.

Without UIDL, the client needs to fall-back onto those numeric message ids alone. Instead of UIDL, the STAT command returns the number of messages in a mailbox. With that, the client can loop from 1 to n, downloading and deleting each one, leaving the mailbox empty once they have all been downloaded. As POP3 is explicitly designed for download-and-delete operation and not keeping the messages on the server, you might consider that UIDL is not necessary. So let us follow that road where we don’t have UIDL.

Living in a world without UIDL.

Operating POP3 without UIDL only works in an ideal world. If you had 100% reliable connections to the server then you might get away with it. Reality tells us the world is not ideal.

Let’s think about the step of deleting a message once you’ve downloaded it. You might think that DELE is the request to delete messages you’ve downloaded (or don’t want), but the request to actually delete messages is QUIT.

The client flags the messages to delete with DELE, but those deletes aren’t committed until the client later issues a QUIT request. If the connection stops before a QUIT, the server has to forget about those DELE commands and the messages all have to remain in the mailbox for when you reconnect. This is by design as you wouldn’t want your messages deleted if your client is in an unstable environment that can’t keep a connection open.

Consider though, what would happen if the underlying connection was dropped just as the client issued a QUIT request. You sent the request but no response came back.

Download and delete a single message, but the connection fails a critical point.

What happened? We don’t know. We can’t know. There are three reasonable possibilities…

  • The QUIT command never arrived at the server. The server just saw the connection drop.
  • The server couldn’t process the delete and responded with an error, which got lost.
  • The server successfully deleted the messages, but the response got lost.

You asked for some messages to be deleted, but you don’t know if your instruction was processed or not. The only way to find out is to reconnect (when you can) and see if the messages you asked the server to delete has gone or not.

Let’s say that time has passed and the client is finally able to reconnect to the server again. Last time, the client downloaded a single message and may or may not have deleted it. Now we’ve reconnected we find a single message in the mailbox. Is this the one we deleted before or a new one that’s arrived in the interim? A handy directory listing would be real useful right about now!

This is why I would mistrust any mail reading software that didn’t require that a mail server implements UIDL. Messages might get downloaded twice or wrongly deleted if the wrong assumptions are made.

“Come back!”

The alternatives to UIDL are all unreasonable.

If the above doesn’t convince you that UIDL is necessary, this section is to answer anticipated responses that UIDL is not necessary. Nuh huh!

(If you are already convinced and you don’t want to read my responses to anticipated arguments, you can skip this section.)

“That scenario you describe won’t ever happen in reality.”

Stage one: Denial.

Where is this perfect world where connections don’t stop working at the worst possible time? Where database updates happen instantly? I want to live there!

Think about what a server needs to do to process a QUIT command. Many flagged messages will need to be modified in an atomic transaction such that they won’t be included next time. Indexes will need to be updated and the dust needs to settle before the server can send its acknowledgement. During this time, the underlying TCP connection will be sitting there idle, looking just like a timeout error.

“We wouldn’t have a problem if mail servers were better engineered!”

Stage two: Anger

If your requirements of a mail server include underlying connections over the public internet that never fail, I think your requirements are a little unreasonable.

“So I occasionally see two copies of a message in my mailbox. Big whoop!”

Stage three: Bargaining.

If that started happening in software I was using, I’d file a bug report.

“There are other ways POP3 can resolve this issue.”

Stage four: Depression.

Alas, all of these alternatives that POP3 provide are unreasonable.

You could use the response to LIST as a fall-back? This command requests the size in bytes of each message. Most messages are long enough that they will have a unique size, but this isn’t reliable. Messages are often going to have the same size as others just by accident.

You could use TOP to retrieve just the header and extract something from that to track messages? Problem there is that no single header is a reliable identity. Two adjacent messages might have the same date or the same subject. The closest candidate for a suitable identity is Message-ID but this is generated by the sender, who might not include it or might reuse IDs. If we’re relying on the POP3 server to add them or modify duplicates provided by a sender, we’re back to relying on optional features.

You could use the TOP response and hash the entire header? This could work except message headers can change. I first saw this when experimenting with a mail server and observed that if I connected to a mailbox using IMAP, it would leave IMAP’s version of a unique identity in the header which wasn’t there before. As well as that, anti-spam systems might re-examine a mailbox’s contents and update the anti-spam or anti-virus headers. Any of these changes would look like a new message.

(As well as all that, TOP is itself an optional command, just like UIDL.)

You could download the entire message again and ignore it if you already have it? This would be ultimate fall-back. While I’ve seen headers change, the message body seems to be immutable. This is still an unreasonable situation. We’re downloading the whole message again, just because the server chose not to implement a simple directory-listing command.

Am I certain that the message body is immutable? No, not at all. If someone commented that mail server XYZ updates messages in the form of a MIME attachment, I wouldn’t be at all surprised.

Acceptance?

Because the alternatives are so unreasonable, I consider UIDL a requirement for handling POP3. Servers that don’t implement UIDL are bad servers. Clients that can work without UIDL are unreliable.

Still not convinced? Please leave a comment where you saw this piece posted.

“I’ve seen the future! I’ve seen the future! I’ve seen the future and it’s now!”

IMAP does it wrong.

The other popular mail-reading protocol is IMAP. In contrast to POP3’s download-and-delete model, IMAP’s model is that messages to stay on the server and are only downloaded when the client wishes to read it. This model enables mail readers on low-storage devices such as smartphones.

With IMAP, the IDs are restricted to numeric values and always go upwards, in contrast to the free-for-all “any printable ascii except spaces” allowed by POP3. While this may be nice for the client, by requiring a single source of incrementing ID numbers, it complicates matters for anyone wishing to implement an IMAP server using a distributed database as a back-end.

But the worse thing about IMAP’s message identity system is that the standard permits the server to discard any IDs it has assigned by updating a mailbox’s UIDVALIDITY property. If this value ever changes, it is a signal to the client that any unique IDs it may have remembered are no longer valid.

A client needs a reliable way to identify messages between connections to recover from an unknown state. It does not need for servers to have a license to be unreliable.

If a mail server that implements IMAP wants any respect from me, it would document that its UIDVALIDITY value is fixed and will never change and that the unique-ids it generates are reliable.

POP3 does it wrong too.

If I’m going to criticize IMAP for flaws in its unique ID system, I should address flaws in POP3’s system too, having spent most of this article praising it.

Quoth RFC 1939: “The server should never reuse an unique-id in a given maildrop,” (good) “for as long as the entity using the unique-id exists.” (no!)

Consider that worst case scenario. The client flags a single message to be deleted and finally issues a QUIT command to complete the translation. The server successfully processes the request but the response to the client is lost. As far as the server is concerned, the message is gone and there’s no problem, but as far as the client knows, the continued existence of that message is unknown.

Now consider a new message arrives on the mail server and because the RFC says it can, it assigns the same unique ID to this new message as the one that was just deleted. The client eventually reconnects and requests the list of unique IDs and finds the ID of the message it wanted to delete is still there. It doesn’t know the server used its right to reuse unique IDs and that this is actually a new message!

Now, I’ve never seen a mail server actually reuse a unique ID. The clever people who have developed mail servers in the real world seem to understand that reusing IDs is not something you ever want to do, even if the RFC says you can.

RFC 1939 also says, “this specification is intended to permit unique-ids to be calculated as a hash of the message. Clients should be able to handle a situation where two identical copies of a message in a maildrop have the same unique-id.”

Unique IDs don’t have be unique? Ugh.

This allowance only applies to identical messages. In reality, messages are never identical. After bouncing around the internet and going through various anti-spam and anti-virus servers, messages do accumulate a frightening number of Received: headers left behind from each intermediate hand-over. Each one with a time-stamp and its own ID number. Any one of these is enough to produce a distinct hash.

Picture Credits. (All Creative-Commons licensed.)
Listening to Radio Karnali” by “BBC World Service”.
List 84” by “Weisbaden 2010”.
The Time of Sunset” by Joy Sarah Nawati.
Future” by “Legosz”.
“PuTTY screen-shots” by me.

Dear WordPress. Please stop using MySQL.

This may very well end up being my last Blogger based post as I’m slowly adopting (self-hosted) WordPress as a publishing platform. I have a set of websites running on a commodity cPanel-based shared host, with a view to moving to a dedicated VM in due course. While setting things up and playing about with WordPress, I kept tripping over an obstacle that just kept getting in the way of doing what I wanted to do.

MySQL was that obstacle.

Dear WordPress, please have an option to use a file-based database (such as SQLite) instead.


Why would you want to do such a thing?

First of all, simmer down, MySQL is a perfectly good database. It does the job it was designed to do very well. My problem is that MySQL exists on a server separated from the rest of WordPress.

Think about what makes up a single installation of WordPress. You’ve got a bunch of PHP files, the themes, the plugins, the images and media I’ve uploaded. All of these are in a single folder on the web server. I could ZIP the folder up, UNZIP it later, push the folder into GIT version control, all in the certainty I’m got everything. Except…

Some of my website is not in that folder. It’s on the database. I can’t just ZIP the website up because an essential component is off in another realm. That folder does not contain everything and I now need to keep database backups alongside the folder backups. Grrr…

I was considering adding a plug-in to one of my live websites. Because people were using it, I didn’t want any down-time. Accordingly, I made a copy of the website folder and also made a copy of the database. The new copy then had to be reconfigured to point to the new database and only then could I play about with whatever plug-in or theme I wanted to add. Whatever clones and copies I make, the database is always at arms length and I need to be very careful that the PHP is always linked with the right database and that I’ve not got any cross-overs.

If the data on the database were on a file in that folder, there would be nothing external to keep track of. Copy the whole folder and job done! Taking backups would mean zipping up the folder and everything is there without worrying about keeping the two parts in sync.

Wouldn’t that make the site inefficient?

Maybe, but I plan to use very aggressive caching. There’s a plug-in where the site contents end up as static files and the code accessing the database only has to run when I log into the admin panel or the cache engine decides its time to update itself.

I can imagine this might not work so well for a site where changes happen very frequently. Maybe, but I suspect those are in a minority. For them, MySQL would probably still be an option, but it feels like for such a website, WordPress itself probably isn’t the right tool for the job.

Incidentally, I don’t plan to support commenting because I do find moderation a bit of grind. Because I’m just one person and I have other stuff going on in my life, there would be a very long delay between someone posting a comment and my approving it. The better quality discussion tends to happen on sites like Hacker News where there is an active moderation team that I can’t even hope to match.

Why not use a WP site manager tool?

Since WordPress are probably not on the verge of releasing an update with SQLite option, I will probably end up doing exactly that.

I already have tools for managing folders and zip files. Cloning a website could be a simple folder-copy operation were it not for the separate database. Tools that know about the database are very nice but it all feels like the wrong answer to the question. We’re all in a world where things are in the state they are in and we have to stoically make it work. Site managers fix the symptoms but they don’t address the underlying issue.

Picture Credits. (All CC.)
“Me And My Shadow” by “DaPuglet”.
“Trees” by RichardBH.
“Like a string of pearls” by Thomas Rousing.

My Crazy Software Engineer Tattoo (that I didn’t get)


This isn’t the idea I had,
just a picture for illustration.

I had an idea for a nerdy tattoo a few years ago. It would represent myself as a software engineer and I thought it was quite clever. I seriously considered having it done but decided against it in the end, despite its cleverness.

Ink’d

This is my idea, the “end comment” symbol in many programming languages:

*/

In C, and other languages that can trace their lineage to C, comments start with a /* and end with a */. Anything inside is ignored by the language, allowing the programmer to describe what’s going on. This is tremendously useful when reading other people’s code or even your own code from the past.

    /* This is a comment. */
    Code();

    /* This is another comment. */
    MoreCode();

Another way of looking at it is that these /* and */ symbols mark the change of state between comments and code. /* says “After this is comment” while */ says “After this is code.”

Or to put it another way, */ means “Enough talk, time for action.”

(This is where you exclaim to yourself how clever I am to have thought of that.)

I didn’t have the tattoo done in the end. Describing what it meant would have taken too much explanation. Even if a fellow programmer recognized the symbol, they would probably first think it looked like I’ve been “commented out”, as they wonder if I had the /* on the other side.

Also, rotated a little, it looks a bit like a squinting cyclops.

*/

Picture Credit: Moomin Tattoo by Tiffany Terry.

GIT isn’t perfect. (And other blasphemies.)

I was embarrassingly late in the game coming to GIT as a version control system. Time has since passed and I’m now happily using it. The days when we had to lock files before we worked on them are thankfully a distant memory.

My road was a little bumpy…

Orphan git.

It’s a new dawn, it’s a new day, it’s new version control system and I’m ready to download a project from a GIT server.

The first step was to perform a clone of a project. “Clone? That’s a strange word for it. Maybe this is just it’s own way of saying Download.” Whatever you call it, my “clone” was ready and I can get to work.

Later, I’ve done my work, tested it my dev-environment and I’m ready to check it in. Before I do that I need to check on any changes the others have done. Looking for a “Get Latest” option, I find “Pull”, which seems to do the right thing. Those changes don’t conflict with mine so I can continue.

How do I get my changes back to the server? There’s a “push” option which sounds like it would be the opposite to pull, but that doesn’t do anything. Finally, I find “Commit” and it shows me all my changes with space to write my comments. Success!

I’ve committed my changes to the master branch! Sounds final. The red icons next to the modified files have now turned green. There’s no indication there’s more to do, or at least none that’s obvious to this beginner. My job is done so I can now switch my workstation off and disappear on vacation for a week.

Not quite. The thing called the “master” branch isn’t the master branch. When I had the “clone” operation and thought the name was a bit odd, it turned out I had just made a complete copy of the code repository and it was now separate from the original. To finish the job, I needed to also perform a push, but after having done the commit.

And this is what annoys me about GIT – we profoundly disagree about what’s important. As far as I’m concerned, my local hard disk is just a necessary staging point to the way to the central repository where release builds happen. You perform a commit and that has all the ceremony and pizzazz, but a push deals with the big picture.

Ask most GIT clients what’s important an it will answer that its own local repository is the focus of operations. The remote server is basically an after-thought. “Huzzah! We’ve done a commit! Oh, you want to push too. Okay then. Don’t forget to do the push again next time because I won’t remind you if you forget!”

(I have, many times, ran completely pointless builds having committed but forgotten to push. The build server builds without my changes and I end up wondering why nothing I did made any difference.)

There is no shared server only ZUUL!

Before I proceed, I should probably clear a few things up. Discussing GIT architecture can be a bit tricky because if you talk about a shared server or clients-and-servers as separate things, someone will step in and insist those don’t exist.

Okay. As my mathematics teacher once told me, “You can use what ever fruity language you like, as long as you define your terms.”

A GIT server never produces its own content (except maybe for house-keeping), but instead receives commits pushed from one or more clients.
A GIT client produces commits (by the user) and pushes them to the server.

“I was very happy with SVN thank-you-very-much.”

Enough negativity. What would my ideal client would look like? Maybe it already exists. In a nutshell, there would be no local repository. All operations would take place on the shared server with the client channelling my actions to that server.

  • When I commit, I’m committing to a branch on the server.
        (No more forgetting to push after commit.)
  • When I create a branch, I’m creating the branch on the server.
        (No more forgetting to pull before starting.)
  • When I switch branches, I’m switching to the latest current state of that branch on the server.
  • When I browse the history, I’m browsing the history on the server.
  • If there is anything stored locally, there’s maybe a cache to save time but that’s it.

As far as the server and the rest of the team are concerned, I’m still using traditional GIT. I work, I commit changes, I push them to a branch on the server. Just like everyone else. No-one has to change the way they work to accommodate what I’m doing.

How would it work?

Some clients already have a combined commit-and-push option. My ideal client would take that further into an atomic commit-and-push. The client would create the commit data internally and then attempt to push it to the server. If the push fails, (perhaps someone else has made a change or the server is down) then the error is reported to the user but the commit is rolled back. Once the user has resolved the problem, they may try to commit again.

Other actions would work in a similar way. Actions that would normally happen as actions upon the local branch, are instead applied to the remote branch using pushes and pulls where necessary. If there is a local “cloned” repository behind the scenes, it’s just there as a convenient cache.

What if you are offline?

One of the selling points of GIT is that you can work offline, perhaps while travelling on an airplane. Sometimes the server itself is down and it just keeps on working.

If you do find yourself offline and you can’t push your work, my ideal client would have a way to store an “offline commit”. This would effectively be like how commits work in traditional clients. The difference is that the UI wouldn’t hide that the commits are only offline. The changed files would have a different color and there would be a bright indicator somewhere, warning you that you’ve not really committed your work yet.

What if you send pull-requests or patches instead of pushing?

All of this would only work if you have push rights to the server. Some people don’t work that way and instead can only clone-and-pull, but need other means to get their work into the shared repository. For those people, fair enough, the traditional client is probably best. The best of all would be one that could work in either way, depending on if the user has push rights or not.

I wouldn’t be able to push updates to you directly.

When GIT people say that there’s no such thing as a shared GIT server, they mean it. When you do a “push” action, you don’t have to push to the origin server, you could push to one of your colleague’s workstations instead.

People do that? That sounds like a project management nightmare!

SVN is still available if you prefer that way of working.

I’d still rather be working with GIT than the previous generation such as SVN. Merging changes was a nightmare and I’m happy those days are behind me. Distributed teams couldn’t work well with SVN without a lot of administration.

All in all, I’m only really looking to replace one isolated part of the GIT system. If I switched to my ideal alternative client, the rest of the team wouldn’t have to and could continue using traditional GIT happily.

Picture Credit:
100_2223 by “paolo”. (CC)
Sir Walter Raleigh, by William Segar, 1598. (PD)
(Why yes, I am very clever. Thank you for noticing. I’m sure no-one else has thought of this exact joke.)

Falco T310 – Unleashed

1993. Computers were desktop PCs running MS-DOS and the Internet was unheard of. My school had a number of PCs with Borland Pascal installed which my friends and I happily learnt. Along the way, we wrote a clever variation of the Minesweeper game. Life was good.

That would all change when I started my Computer Science degree course at university that year. Instead of many single-user machines running MS-DOS, we’d all be sharing a multi-user machine running UNIX.

Terminal Illness

To use this multi-user machine, we’d need to log-in from a terminal. If you were fortunate enough to find a vacant PC, you could use the terminal emulator program to connect. This had the very useful feature of being able to switch between screens so you could operate many sessions at once. I would usually have one with the email program running so I could switch to it occasionally to see if any new messages arrived, while a second session would run EMACS for whatever I was writing. A last one would compile and run stuff.

If I wasn’t quite so fortunate to find a vacant PC, I’d have to use one of the Falco T310 terminals. These were serious old-school terminals that connected to that machine over a serial port. Actual RS232 connecting to a multiplexing box in the corner. The university had maybe a hundred of them. Because they did only have a serial connection, you could only have one session per terminal. No fast switching between sessions for you – if you wanted to check your mail you had to shut down whatever you were doing and start up the mail reader.

These terminals weren’t all bad. It understood the standard ANSI codes to move the cursor about, so there wasn’t too much friction moving between the two. We coped and got on with the job.

Loss of control (characters)

One day, I intended to review a source code file, so I typed a “cat” command to show the listing, except I had accidentally run cat for the compiled binary executable instead. Oops! The screen filled with noise punctuated with beeping noises. Efforts to stop the onslaught were in vain as the buffers filled up with unintelligible bytes.

Then something unexpected happened. The screen changed mode and lines were drawn mixed in with the text. Not the box drawing characters I was used to but proper lines, drawn at funky angles spanning across most of the screen. These terminals supported some sort of control codes for vector line drawing, and my executable code just happened to randomly contain those codes. I must find them!

Living the student life, I wasn’t getting much of a chance to exercise my artistic muscles. Back at school, I knew how to program graphics in Borland Pascal and I’d come up with simple games and create animated art. Even dull homework projects would have a bit of a flourish thanks to creative use of the 640x480x16 mode. On UNIX in contrast, I was back in the 80s with an 80×24 character display, yet here was an elusive graphical mode I hadn’t seen in months.

grep -v “a”

Actually finding what those magic control codes were was easier said than done. Once I had accidentally entered this graphical mode, I found I couldn’t type commands anymore. The only way I knew to get back to normal was to power cycle the terminal and login again. My attempts to split the file in half and display one of the halves would be accompanied with incessant loud beeping from all the BEL/7 bytes, which greatly disturbed the other people in the room. That amount of beeping could only mean I was up to no good!

After spending a day trying to extract the codes I needed, I had to give up. I was unfamiliar with working with Unix beyond dealing with plain text files. I knew how to open files in binary mode back on Borland Pascal, but not on any language I had access to in Unix. There was no StackOverflow to ask so I was stuck impotently banging rocks against this monolith. This was software development in those dark ages.

Next: Checking in at The Motel. BBS Systems, Fidonet and reinventing the remote-desktop.

Picture Credit: VT100 in the flesh, by Dana Sibera. (CC licensed.)
(I couldn’t find a picture of the Falco T310, so I used this picture of a VT100 instead. Sorry about that.)

Why I willingly bought a Windows Phone

Without shame or apology, I use a Windows Phone. A bright orange Lumia 630. I purchased it with my own money. No-one pushed me to it or chose it for me. It was entirely my decision.

But why?!

Phones

My story starts in 2012 when I had outgrown my aging Symbian phone. After considering a number of options, I purchased an Android based Samsung Galaxy S2.

I had considered an iPhone at the time, but the main reason I didn’t was that I’d have to buy into the Apple ecosystem, which just wasn’t for me. My primary computer platforms were Windows based and moving to iPhone would be a big culture shock. My Samsung instead fitted into that world quite neatly and I’d remain happy with my choice for years.

Stage Fright!

In 2015, a security vulnerability (known as Stage Fright) was found in many versions of Android, including the one on my phone. All it would take was for someone to send me a malicious text message in the night and my phone would be taken over.

Not to worry, new phones had already been fixed and I was sure it would only be matter of time before that same fix would be pushed out to older phones like mine. Every day for a few weeks, I’d go into the phone’s check-for-updates system to see if a fix was available. Every day, there wasn’t. I’d call tech support to ask when (not if) a fix would become available. “Soon” was always the infuriatingly non-specific answer, occasionally along with the subtle suggestion that maybe I should buy a new handset instead.

Finally, I just couldn’t take it any more and gave up. My phone, despite being only three years old was considered too old to be updated. The risk of keeping it switched on, waiting for a drive-by attacker, was giving me too much stress. I switched the phone off and put it away, never to be used again.

Normally, there will come a natural time with each phone I use when I start to feel it is time to upgrade, having simply outgrown the old one. When that happens, I keep using the old one while I take my time to consider my choices. This time was different.

It was clear to me now that the Android ecosystem had a problem. Security vulnerabilities were not being taken seriously by the handset makers who would rather I just purchased a new device instead. If I had bought a new Android phone back then, I’d be supporting that attitude with my cash!

Choices

Having lost trust in Android, I was left choosing between Apple or Microsoft. At first, I wasn’t even considering Windows Phone, having had bad experiences with the platform some ten years earlier. Faced with an iPhone as my only choice left, I was willing to give the new Windows Phone a try.

Trying out a Lumia 630, I was pleasantly surprised. The tile concept was a welcome relief from the “Space Invader” style rows-of-icons that dominate the rest of the market. Suitably impressed with the whole package I ended up buying one and I’ve not looked back. (Except to write this.)

The lack of apps for this platform is a little annoying, but I get by. I have instant-messaging, a podcast player, a weather tile on the home screen and a few others. For everything else, I use a number of “M Dot” websites. (m.facebook.com, m.youtube.com, etc.)

The Future

How long, after having purchased a smartphone, is it reasonable to expect support in the form of security updates? Back when “Stage Fright” happened, I found that answer for the Android ecosystem was 1½ years. That’s just way too short in my book.

My Lumia 630 is around two years old as I write this and I’ve just installed an update that fixes the WPA2 “KRACK” bug. If I had purchased another Android based phone back in 2015, would I now have an update for this new bug? (Or, would I be back down the shops spending more money to enrich the handset makers who are laughing at the chump that I am…)

While I’m not planning on replacing my phone any time soon, its likely I will feel I’ve outgrown it in maybe a couple of years down the road, especially as Microsoft have announced they will not be actively developing it any more except for those security updates. When that day comes, I hope Android will have taken a tip from Microsoft on how to do updates right.

Picture Credits
Microsoft Lumia 630 running Podcast Lounge. By me, ironically enough, using an iPhone.
Tension, 91/365 by Matt Harris.
Future by “Legosz”.
(Pictures are Creative Commons licensed.)

Is your API broken?

“Welcome to the Example Rutabaga Company. We’ve got a simple REST API for all your rutabaga needs!”

Indeed, it is simple…

   POST https://rutabaga.example.com/Order/ HTTP/1.1
   Content-Type: application/json

   {"Quantity": 5800,
    "Quality": "Tasty!",
    "DeliverTo": "123 Fake Street, New Orleans"}

Send this and you’ll either get an error or an “OK” response with a tracking ID inside. Later, you’ll get several thousand tasty rutabagas in the post. What could go wrong?

Everything.

Schrödinger’s Response

From the client’s point of view, there’s a clear action to take depending on the response code.

  • 200, log the tracking ID.
  • 5xx, try again later.

But what if there’s no response? Perhaps your friendly HTTP client library code has thrown an exception because the connection has broken down. These errors are unavoidable, especially when the client is on a mobile device. What should we do in this situation?

You could try again later? But hang on, this violates the thing that makes POST different from GET and PUT. (GET and PUT are designed to be repeatable, but POST requests are express calls to take action.)

You might reason that the first POST request failed, so you’re not actually repeating anything, but aren’t you? There are two possibilities when you get an error from any sort of network request.

  1. The request was lost on the way and the remote server did not handle the request.
  2. The request arrived and was handled, but the response to the client was lost.

If A, we’re fine to repeat the POST. No problem.
If B, the remote server is already in the process of shipping a truckload of rutabagas to you and has no idea the response got lost. Repeat that request and you’ll end up with two truckloads of rutabagas.

But this is the point, the client has no way of knowing if its A or B. The only entity that knows is the server and we can’t talk to it.

For a surprising number of APIs I’ve written client code for, that’s the end of the story. The API simply has no reliable way for the client to find out what happened.

How does your API handle this situation? Is your API broken?

Opening the box

One way an API designer could resolve this issue is to provide a way to look up the order history.

This is probably what you’d do if (say) you were shopping online and your internet connection died just as you hit the Complete Purchase button. Once you got back online, you’d check to see if the order was in the system before repeating the order.

Sounds simple? This would work but be careful, for alas, this approach has lots of caveats. Fortunately none of them are really insurmountable.

Beware of false duplicates

Say you’re in this worst case scenario and your link to the server has just been restored. Your code dutifully downloads the list of outstanding orders and finds one for 5800 rutabagas. Job done?

Wait! Was that your order? Maybe the account holder deliberately made another identical order from a different machine. We don’t know – We can’t know.

This can be resolved by ensuring the client has the opportunity to supply its own way to identify the the initial request – perhaps with a client supplied ID – and allowing for a lookup later on.

How long should we keep that ID around?

Expire ID records too quickly and a client that’s been offline for a prolonged amount of time will not be able to resynchronize. Store the IDs forever and that would be a waste of space.

You may have a figure in mind that’s reasonable. If not, add an occasional reconciliation of expired IDs to your API.

Who chooses the ID?

The client should be able to freely chose an ID. You may be looking at your database and thinking there’s a field supplied by the client that’s already got a no-duplicates constraint. If those values came from a source external to the client, it won’t be able to control the uniqueness of those important values. That external entity might very well be feeding identical records into the system through different channels and the client won’t know if that duplicate it found was their own or someone else’s.

Whose ID is it anyway?

Make sure the client has a clear space from which to select IDs. We can’t have multiple users all counting from 1 because you’ll get collisions very quickly. A GUID would work as long as they are generated correctly. Maybe if the API requires that the client log-in first, the server could track IDs on a per-user basis, but not all APIs require a log-in or pre-registration.

Avoid colliding with prior attempts still being processed.

Consider this: A client attempts to send a request to a server, but the connection fails with a time-out error. Thirty seconds later, the client asks the server if that prior request made it, which it answers “No”. Time to repeat that first attempt?

But wait! That first attempt timed out because the server was unexpectantly busy and has only just started dealing with your first request.

You can mitigate this (probably rare) scenario by making sure the server will return an error to the second POST request. Almost all DBs allow for any field or combination of fields to have a uniqueness constraint and the error will just happen if this scenario ended up playing out.

Do you have a ticket?

There’s another protocol that works in a similar way but puts the server in control of the IDs, at the cost of requiring two separate phases. (The actual request could be carried along with either the first or second phases.)

The first phase has the client asking the server for an ID while the second phase has the client committing to complete the transaction with that ID.

This protocol does require that when the client begins phase two, they have committed to not return to phase one for this transaction. The client must also store that ID and be ready to use it for when the connection has been restored. Similarly, the server needs to agree that it only starts processing a transaction once the second phase request has arrived.

This two-phase approach covers for failures at any step along the conversation, so long as the client and server stick to the agreement.

  • If the first request is lost, there’s no problem in repeating the first phase.
  • If the first response is lost, the server will have allocated an ID that will never be committed, but will be left indefinitely in an uncommitted state. (A later occasional reconciliation of orphaned IDs would be useful here.)
  • If the second request is lost, the client can later repeat the commitment of the transaction after checking its state using the ID it received in the first phase.
  • If the second response is lost, the client can later check the state of the transaction using the ID and see that it is already committed.

This protocol has a similar caveat from the earlier plan – How long should the server keep track of used ID numbers? The server will be left with IDs that will never be committed as well as committed IDs that the client might still need to check up upon later. Again, you may wish to come up with reasonable time limits or allow for a reconciliation of IDs later on.

While this protocol might be considered more complicated because of the two phases of conversation, there are fewer caveats to this plan and fewer oportunities for things to go wrong. This is my personal favorite.

Do I really need to do this?

As I write this I’m also working on a small web service that uses a REST API with POST requests, but taking none of the advice I offer on this page. Why not? Simply that the cost of the resources being allocated by this API-to-be are so close to zero that making the effort to implement the API robustly is just not worth it in this particular case.

But consider, even if you’re not transmitting invoices worth thousands of dollars, do you really want duplicates turning up?

Picture Credits
“Rutabagas” by Dale Calder
“Barney the cat” by Bill P. Godfrey (me).
“Rutabaga 2” by Dolan Halbrook
“Commit no nuisance” by Pat Joyce

NEVER sanitize your inputs!

I’ve seen this cartoon being linked-to in so many comment threads and forums. Anytime its even a little bit applicable, someone will post a link to this cartoon. It has become so pervasive that if you search Google for “327”, it’ll be the third link returned, right after the Wikipedia pages for the year and the car.

Search “328” and the next XKCD is no-where to be seen.

The lesson, according to this character and so many real people on the internet, is to sanitize your inputs. The school in the cartoon didn’t sanitize its inputs – and one of its database tables got deleted!

Ask anyone about developing websites and they will tell you the first lesson is always to sanitize your inputs. In this day and age you’d have to be crazy not to sanitize your inputs.

Trouble is, sanitizing your inputs is very bad advice.

What went wrong at the school?

A quick aside for what’s going on in this cartoon. A new student named…
      Robert'); DROP TABLE Students; --
… joins a school and the administrators dutifully add a record to their database for the new student. The software takes the new student’s name and builds an SQL instruction.

    string sqlcmd = "INSERT INTO Students (name) VALUES ('" + name + "')";
    // INSERT INTO Students (name) VALUES ('Wilhelm von Hackensplat')

With normal names, the string would be a perfectly valid SQL command which will add a new record into the table named Students. But what about our friend Bobby Tables?

   INSERT INTO Students (name) VALUES ('Robert'); DROP TABLE Students; --')

Because that single-quote character wasn’t sanitized away, an extra command to drop the Students table crept in. This is what we know in the trade as an “SQL Injection” attack, as some unintentional SQL got injected in.

So let’s sanitize it?

We can’t allow people to go about running arbitrary SQL commands willy-nilly. Something must be done!

That single-quote character in the student’s name is clearly the problem, so we’ll take it out while building the SQL command. This fixes the command and you won’t find database tables disappearing. So why do I call this bad advice?

Trouble is, the single-quote character has a bit of a split personality. As well as being a quote, it’s also an apostrophe. Real people have real names with apostrophes and if you’ve ever seen a name where one has clearly been dropped, you’ve seen the mark of the sanitizer.

Perhaps this is why some Irish people prefer to spell their name using the letter Ó. After years of having their name mangled by naive software developers, they made a new letter.

So forget sanitizing your inputs. What you need to do instead is to contain your inputs.

Contain my inputs?

The error made by the programmers at the school was that they failed to contain Bobby’s name. A student’s name is just a sequence of characters, so you need to use it in a way that could only be a sequence of characters.

Lucky for us, all good SQL access libraries support parameters. Instead, you write the command but with placeholders for the values to be added in little boxes later.

   INSERT INTO Students (name) VALUES (@name)

Here, there’s a clear demarcation between what’s the SQL command and what’s the value from outside. The student’s name is inside the little box where the apostrophe is just another character. The name has been contained and that destructive command inside can’t break out.

But that’s what we mean by “sanitize”!

Then you should stop calling it that. The word “sanitize” is a common enough word and most people understand it as a word for cleaning – removing the bad stuff and keeping the good stuff.

  “Did you sanitize the kitchen worktop?”
  “Yes. I put it in that sealed box over there.”
  “That’s not sanitizing!”

  “When I use a word, it means just what I choose it to mean. Neither more nor less.”

There is a real problem with software not accepting names with apostrophes, as discussed earlier. Real software developers are listening to the advice to sanitize and interpreting it to mean they should have the bad characters removed.

Isn’t sanitization still needed with HTML?

HTML has a similar problem with injection. Say you’re building a website that can take comments from the public, like this one, you’d want to prevent people from leaving comments with bits of scripting code inside.

   "I <i>love</i> this website! <script>alert('Baron von Hackensplat Was Here');</script>"

Its fine to allow the emphasis, but if your website also publishes the script, anyone else visiting your site will end up running that script.

Unfortunately, HTML doesn’t support a nice little box from whence nothing can escape, so we need to provide that box of containment ourselves. Any HTML from the public should be parsed and rewritten as safe-HTML, where only a safe subset of tags are allowed.

You might argue that this amounts to sanitization, but it betrays a bad mental model. Okay, you’ve dealt with the big problem, but forgotten about the little problems.

Have you ever seen a comment thread where, starting part way down the page, everything is in italics? This is caused by someone opening italics but not closing them. If your mental model is to sanitize, your natural reaction would be remove the ability to use italics. If your mental model is instead to contain, you know that italics is really harmless and just needs to be closed when left open.

Cross-out Cross Site Scripting

In closing, I’d just like to appeal to the industry to drop the phrase “Cross-Site-Scripting” and call it “HTML Injection” instead.

Any scripting that you didn’t write or don’t trust, cross-site or not, is a very bad thing to have on your website. Putting “Scripting” in the name makes people think of scripting as the problem but its so much more than that.

Calling it “HTML Injection” draws an obvious parallel with “SQL Injection”. Its the same problem with the same solution.

Credits: XKCD 327 – Exploits of a mom by Randall Munroe.
“When I use a word…” is a quote from Lewis Carroll’s “Through the Looking-Glass”.
Second: sanitize the gloves by Thomas Cizauskas.
Fun with cling film by Elizabeth Gomm.

I need a good podcast catcher (and a bit of a rant)

I listen to podcasts on my daily commute. These are radio shows that can be downloaded over the internet and listened to later. However, to keep up with a weekly show, I’d have to – every week – visit the show’s website and manually download the latest episode. That would get real tedious real fast. To resolve the tedium for us all, the podcast catcher app was invented.

Podcast catchers allow me to list all the shows I want to listen to. Every day or so, it automatically checks each show on the list to see there are any new episodes for me. If it finds any, it downloads them and plays them for me.

Currently, I use Google’s ‘Listen’ app, but that service is about to be closed down with the imminent closure of Google Reader. I need to replace it. I’ve downloaded a handful of alternative apps, but they all lacked a feature I find essential. I remain a little flabbergasted that any podcast app out there does it any other way.

“She smoothes her hair with automatic hand and puts a record on the gramophone.”

My daily commute is ~45 minutes of driving each way, so for me, a good player needs an Auto-Play mode. When one show finishes, another should start playing right away. There’s very few places I could safely pull-over and having to push buttons while I’m driving is right out.

But not just any Auto-Play mode. Oh no. All the apps I tried had an Auto-Play mode, but they all did it so very badly.

Ask yourself – When a show finishes playing and Auto-Play is switched on, which show from the list of unplayed shows should your app select to play next?
   A. The one that’s been waiting in the queue longest.
   B. The one that appears next in the list when sorted by episode title.

Did you pick A or B? Sorry, they’re both wrong, and yet these were the only options available on an awful lot of podcast apps.

The right answer, is to play the one the user has queued up next. The “In the order I want” sort criteria. No really, who is actually asking for the order of play-back to be strictly enforced? Would anything else, perhaps, offend your sense of politeness?

   “You want to listen to the latest Cognitive Dissonance show? But what about this episode of Hanselminutes? It has been waiting paitiently in line and this is its turn to be played.”
   “I say! That would be jolly impolite of me. Don’t want to hurt the feelings of those audio files. Pip pip!”

“I sat upon the shore, fishing with the arid plain behind me. Shall I at least set my lands in order?”

With Google Listen, new episodes join the listening queue, but I can arrange them in the order I like. If I’m just not in the mood for the next episode in line, I’ll select another episode that I do want to listen to and bring it to the top using the ‘Move to the top of queue’ button.

Once I’m happy with my selection of the next hour or so’s worth of stuff at the top of the queue, I hit play and drive off. As the first show finishes, its taken off the queue and the next episode I had queued up starts playing, all without any interaction.

The few alternative apps I downloaded did not offer this. It seems such a simple thing and yet I can’t imagine the insanity of not being able to control the playing order.

If one, settling a pillow by her head should say, “That is not what I meant at all.”

Some people reading this, I’m sure, are thinking “He wants a playlist manager”.

To manage a playlist, you’d need to first create a playlist and give it a name. Then you’d need to add shows to the list and save it. Then once its played you’d need to delete that playlist and start a new one.

No. That’s just another level of insanity. All I want is a button on each episode labelled ‘Move to the top of the queue’. That’s it. If I have to perform some ritual every day to create a new playlist or whatever before I can get that button, I’m not going to be happy. Life is too short for pointless ritual.

Maybe if your UI is so user friendly that the ritualistic parts of your playlist manager just disappear, that’s fine but that’s not what I’ve seen out there.

“Oh, I have to chose a name for this new playlist. Why not just pick a random name for me? I’m only going to delete it in an hour’s time anyway.”

So there is my plea. Does anyone please know of a podcast app for Android phones that implements its Auto-Play mode… correctly? I will happily pay a reasonable subscription fee for good quality software.

If you’re an app developer and your podcast app does it correctly, please feel free to use this page’s comments for some free publicity. On the other hand if your app doesn’t do it right, please treat this page as a bug report.

Picture credits:
Day 30.06 Voices on the radio!” by Frerieke on Flickr.
Listening to Radio Karnali” by the BBC World Service.
The section titles were borrowed from The Waste Land and The Love-Song of J. Alfred Prufrock, both by T.S. Eliot.

PHP – Some strings are more equal than others

You may have recently read about the PHP programming language, when it was found that if you compare the two strings "9223372036854775807" and "9223372036854775808" with the == operator, PHP will report these as identical. Most of the time PHP does the right thing, but you need to be careful about these exceptions to the rule.

This was reported as a bug to the people who maintain PHP, but they responded that regarding these two strings as equal was really the correct thing to do. Programmers who feel these two strings should be treated as different should instead use the === operator. This operator checks if two strings are equal, but this time, means it!

But this isn’t the end of the story…

While === is fine for strings containing only digits, there’s a little known feature of Unicode where you can express an accented letter either by a single character such as 'é' (U+00E9), or by using a regular ascii 'e' (U+0065) and then adding a special character (U+0301) which means “put an accent on that last character”. If you want to compare two strings that are the same except they each use different ways of expressing an 'é', you need to add another equal sign and use ==== to differentiate them, as === will see them as equal.

There’s a similar rule about the Unicode smiley face character ‘‘ (U+263A) and the more familiar colon-bracket smiley ':)'. These will compare equal unless you use the ==== operator. As well as that, all of these comparison operators see both the white smiley face ‘‘ and black smiley face ‘‘ (U+263B) as identical, unless php.ini has the ‘Racist’ setting switched on.

Even the ==== operator isn’t the end of the matter. This can’t tell the difference between serif and sans-serif text. Most programmers are happy to treat these as equivalent, but if the text is highly secure, you need the ===== operator which knows that ‘A‘ and ‘A‘ are different.

But the ultimate equality operator is the six equal sign ====== operator. As I write this, no-one has found two values where x======y returns true, even when x and y are copies. Some mathematicians suspect there are no such pairs of values, but a mathematical proof remains elusive.

Picture credits:
‘Equal in stature’ by Kevin Dooley (CC-BY)
‘Equal Opportunity Employment’ by flickr user ‘pasukaru76’ (CC-BY)