Thursday, January 08, 2009

JournalSpace learns that it's all about the data

Did anyone blog on JournalSpace?

I'll admit it wasn't a site I'd heard of, but they've been around for six years or so.

Until December 18th, when they died. And upon the gravestone they will carve "RAID is not a backup".

"There was no hardware failure. Both drives are operating fine; DriveSavers had no problem in making images of the drives. The data was simply gone. Overwritten."

It is hard to accidentally overwrite large files in full. It's easier for the OS just to mark a few areas of disk as 'available for use' than to overwrite them. It may well have been deliberate sabotage, possibly by that same IT guy. Though I'd have thought that, if he knew that this 'attack' would work, he'd have known the weakness of RAID from a recovery point of view, and would therefore have suggested additional backup measures. Maybe more details will come to light if JournalSpace can establish the cause of the overwrite.

It could be that partial data recovery was physically possible but financially unviable. Reconstructing valid database files, or even extracting the data from them, would be hard enough for 'simple' corruption, let alone the residue from a full overwrite (some security professionals recommend as many as seven overwrites before data is REALLY gone). I can't see recovering blog posts as being cost-effective.

According to this report, the owners of the JournalSpace site also owned the hosting company. I'd expect a hosting company to have some concept of data backups: that, at some point, machines will push (or have pulled from them) a biggish chunk of data.

Oh wait, they did understand some backups.

"He had set up automated backups for the HTTP server which contains the PHP code, but, inscrutibly (sic), had no backup system in place for the SQL data."

It's a lot easier to set up a backup for the application code. Application code probably has a copy on a dev/test machine, on a developer's desktop or in a source control system. [Okay, if they don't have backups, they may well not have source control either.] The code files aren't constantly changing, and are relatively small, so it's pretty easy to arrange a 3am backup with an OS tool. You can use the same idiot-friendly tools you use to back up the digital photos on your home PC. I use SyncBack.
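
To show just how little is involved on the code side, here is a minimal sketch in Python. It is an illustration only, not what JournalSpace ran: the function name `archive_code` and the directory names are made up for the example, and it assumes the code lives in one directory that can simply be archived while the site is running.

```python
import tarfile
import time
from pathlib import Path


def archive_code(source_dir: str, backup_dir: str) -> Path:
    """Write a timestamped .tar.gz of source_dir into backup_dir.

    Hypothetical helper for illustration; returns the path of the archive.
    """
    source = Path(source_dir).resolve()
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # One archive per run, so an older copy is never overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = target_dir / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the code directory.
        tar.add(str(source), arcname=source.name)
    return archive_path
```

Scheduling something like this for 3am is a one-line job for cron or the Windows task scheduler. The point is that static files can be copied at any time without much thought, which is exactly what is not true of a busy database.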

Hosting a database application is only part of the job. If you can't back that database up, you are hosed. Half of Oracle's cloud announcement at OpenWorld was about facilitating backup to Amazon storage.

Properly backing up a 24/7 active database is a job for a professional. It often won't be a full-time job, especially for a small outfit. You can hire someone for a day to set up the backups, then have them put in a few hours every so often to check that they still work. Maybe get your PHP coder to practise some backups and restores.
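
The "check it" part is the bit that gets skipped, so here is a rough sketch of what a backup-then-verify routine looks like. Big assumption: it uses SQLite from the Python standard library purely as a stand-in, because the reports only say "SQL data" and a server database would need its own dump or hot-backup tooling. The function names, the table name and the row threshold are all hypothetical.

```python
import sqlite3


def backup_database(live_path: str, backup_path: str) -> None:
    """Copy a live SQLite database using the online backup API.

    Unlike a plain file copy, this gives a consistent snapshot even
    if other connections are writing at the time.
    """
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def verify_backup(backup_path: str, table: str, min_rows: int) -> bool:
    """Open the backup itself and check that it is usable.

    Runs SQLite's integrity check and confirms the named table holds
    at least min_rows rows. The table name is trusted input here.
    """
    conn = sqlite3.connect(backup_path)
    try:
        status = conn.execute("PRAGMA integrity_check").fetchone()[0]
        rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()
    return status == "ok" and rows >= min_rows
```

A backup that has never been opened and queried is only a hope. Even a crude check like "the posts table exists and isn't empty" would catch the case where the nightly job has been quietly writing empty files for months.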

Because, after all, if the database gets lost, then all that application coding was just a waste.