Friday, February 8, 2008

CALLing disaster during MySQL upgrade

I've been upgrading one of my servers for the daily use, by migrating the running environment between two PCs. It's not a mission critical server because I don't run a public service there, but I need to do the upgrade carefully anyway because the version number of running FreeBSD and other applications have been changed. I want to keep the old environment as long as possible to use it as a reference, so I'm doing the migration manually by recompiling and reconfiguring the software.

One of the glitches I faced was about MySQL. It's not about the bugs in MySQL, because the database server load is very small. I was trying to transfer a dataset from the old version 4.0 to a new version 5.1 software of MySQL. The mysqldump result of the old 4.0 output didn't get through and reloaded into the 5.1 server. I could not even perform CREATE TABLE. The reason: the column and database identifiers were not properly backquoted.

The database was for my amateur radio activity. Amateur radio stations have callsigns, and in a popular contacting log exchange format called ADIF, the other party's callsign is represented by the identifier CALL. Unfortunately, MySQL 5.1 made the word CALL a reserved word, while in the 4.0 version the word was apparently not. This is a tragedy for an amateur radio enthusiast, and a careless programmer like me who tends to omit proper (back-)quotation.

After an investigation for a few minutes, an idea of referring to the result SHOW CREATE TABLE from the 4.0 server came into my mind. I did it and fortunately the table definition was properly backquoted, so at least I could rebuild the database skeleton. The dumped data was a set of INSERT INTO statements with the VALUES and they were all properly quoted, so I could rebuild the database.

The new mysqldump command's output of MySQL 5.1 looks much better and properly backquoted and quoted all the necessary strings, and even put the SQL statements to lock and unlock the database. The entire dataset is represented by a single INSERT INTO statement VALUES set, so you've got to be careful when you want to use the data set not in the full contents.

I should note that I had to rewrite all the software which generated the SQL statements for the proper (back-)quotation. This was a handful of complicated tasks of fixing the Bourne shell and Perl scripts.

And I realize why SQL injection is so popular for attacking the database servers. Parsing SQL correctly is a non-trivial process. A word can be either a part of a directive or a target identifier, depending on the position where it is in an SQL statement.

So when you want to store CALLs into a MySQL database, you've got to do it carefully with (back-)quotation.