Thursday, December 11, 2008

The Certainty Bottleneck (and ORMs)

Oraclenerd has opened that can of worms about OO, ORMs and Databases.

I'm in the school that there are some (a lot ?) of application developers who don't understand the role of the database. Especially among those who call it a persistence layer.
They complain that "the database is the bottleneck". Guess what guys. That's the whole point of the database. It is the SINGLE point of reference that stops your Application Servers tripping over each other with their processing. For there to be one correct data value, it can exist in only one place. It's the Certainty principle and the database is the Certainty Layer. And it doesn't matter how you persist your data, if you want a single point of truth, it will be the single bottleneck of truth.

When you just use a RDBMS to persist data, rather than supporting concurrency, consistency and integrity, you've missed the point. You can use XML, flat files or anything for persistance. In Oracle, persistence (or it's ACID cousin, DURABILITY) is ultimately managed though a flat file, the redo log. Persistance isn't the hard bit.

There's a whole buch of Oracle that is simply about fast access to that persisted data. Oracle data files are about structuring the data for fast access. Indexes, buffer cache, are all about performance. There's also a chunk of Oracle about sharing data, especially in regards to locking data.

But what about the rest of ACID. Atomicity is ALL about business processes. It either succeeds or fails. And, to my mind, that is the true object-relational mapping. The OO layer changes an object, and atomicity turns that object-level change into a consistent set of changes to the underlying data (note: data, not tables). Consistency is the flip-side of that. It ensures that a view of data is an accurate representation of the truth at a point-in-time. If your 'order' object has a bunch of ordered items, consistency gives you back that order with all its items.

Durability, atomicity and consistency are aspects of databases I have no problem with being 'invisible' to the application developer. A lot Oracle developers don't know about REDO, UNDO and SCNs. I almost agree with "Persisting something to a database is really a low-level concern that as an application developer I shouldn't even have to be bothered with". From the application point of view, those parts should just work. I say "almost" because developers (database and application developers) need some understanding of the mechanism and its consequences. Anything that takes on the responsibility of consistency and atomicity BECOMES part of the database. And it doesn't have to be an RDBMS. I can see XML or OODBMS being able to do that.

The main advantage I see in the relational model is integrity. The normalisation process forces data to be stored in a way that embodies certain rules, specifically business rules (albeit not every business rule). Application logic can only enforce business rules for incoming data, and maybe filter out or raise alerts about non-conforming data already recorded.

There are applications where the integrity enforced by a normalised database are secondary to performance requirements. Data warehouses fit in here. So do a lot of 'Internet scale' databases, though here performance is more about scalability. A blog post may say '20 comments' but no-one really cares if only 19 show up because the database with the comments had to be refreshed from a backup and is 5 seconds out of step from the database with the posts.

Unfortunately a lot of web app developers seem to be under the mistaken assumption they work for Google, Ebay or Amazon and that they have to be able to scale to 100 million users. So when the big boys say "we sacrificed integrity for performance", they think that is "best practices". It isn't, it is a compromise. Oracle charge lots of money for RAC and people pay it when they need that level of integrity.

For the majority of business applications, scalability requirements can be met without sacrificing the integrity benefits of the relational model. For some of those, cost is a significant factor in sacrificing integrity too. I am confident that in five years time, changes within the RDBMS software will increase its ability to scale out (plus increased cost pressures from Open Source) and there will be LESS reason not to use an RDBMS. There are limits to how far RAC will scale when all nodes are sharing entire databases, but how about if only some tables are shared between some nodes, or with distributed databases with some over-arching recovery mechanism for consistency. There is nothing fundamentally broken about relational theory or relational databases. [Okay, thats debatable. You can certainly get into a situation when old data is so out of step with current business practices it is hard to reconcile the two into a workable data model.]

I'll agree that SQL is a poor language for application logic. I once coded in SQL*Forms before we had PL/SQL. Not good.
PL/SQL is a massive improvement especially with frameworks like Apex andr Quest Code Tester, but has its limits, and is fundamentally a procedural bridge to SQL. Java is poor at handling large volumes of persistent data. It relies on an API (JDBC) to something else that can do the job. A database resident language with static SQL (such as PL/SQL or potentially Java+SQLJ or anything else someone could write to run on Oracle's JVM) has advantages that code running on a separate server can't have (automatic validation against metadata, cursor caching, full error/trace stacks come to mind).

The first move in resolving the weaknesses of Java has already been made. Java applications have given up using JDBC directly and separated out their database logic from the application logic into an ORM. The next logical move is to shift that ORM towards the database where it can be better managed. After all, if you want to separate application logic from relational dependencies, the dumbest place to put an ORM is in the application layer.