Monday, December 11, 2006

NULL is my middle name

Eddie Awad has linked to one of the perennial database discussions, the role of NULLs in the database.

My take on this is coloured by the fact the NULL is my middle name. Or to be more precise, my parents chose not to give me a middle name. So let me make it quite clear that NULL is NOT "Unknown". I know for a fact, backed up by my birth certificate, that I do not have a middle name. Any application that deals with my name needs to get used to that fact, because it is not going to change. I do not want to be addressed as Mr G. M. Yers, or Mr G. . Myers or Mr G. ?. Myers. The vast majority of computer applications don't have a problem with my lack of middle name. The world has not fallen in because of my parents decision.

So if NULL isn't "Unknown", then what is it ? Simply speaking NULL is as NULL does. It is ignored by aggregate functions (count, min, max etc), it is ignored by most predicates (=, IN, LIKE etc) and in fact ignored anywhere that doesn't explicitly cater for them. And that is they key to its usage. You allow a column to be null when you want null values to be processed in the way Oracle processes them.

Null-haters tend to list a bunch of cases where the NULL makes no sense and extend that to say that NULL should be abolished. That assumes NULL is a data problem. It isn't. It is a business issue, and one of a number of several presented by 'extreme' data. A company decides to pay a bonus based on the percentage increase in sales value by individual salesmen between quarters. But Fred has only been there a month, so didn't have sales for the last quarter. Does he get a bonus, and if so, how much ? That is a business decision, not a data one. It doesn't matter whether we have recorded a null or a zero against Fred's sales for a quarter, or whether the row doesn't exist or if we have an exotic datatype that holds either a number or a 'weird data flag'.

There are undoubtably cases where a NULL makes no sense. We are allowed to defined columns as NOT NULL (and even put in other check constraints to ensure they only have acceptable values) and this data integrity constraints should be used wherever possible. The question is not just whether we should allow SALES_VALUE to be null, but should be allow it to hold negative values, sub-cent (or penny) fractions, values over a billion....

And my last word is, if you abolish NULLs, then I'm going to get a middle name that is at least 2001 characters long (and include some heiroglyphics not seen since the pyramids were finished), and that will really sod up your database !


Tony Andrews said...

Your middle name isn't NULL, it is '' (the empty string). Unfortunately, Oracle isn't smart enough to know the difference! '' is to strings as zero is to numbers, whereas NULL is just ... NULL.

Gary Myers said...

Not so.
Firstly, "strings" are a computer concept not a real-world one. In the real world, I do not have a middle name. Alternatively phrased, the number of "middle" names I have is zero (and the total number of names I have is two, a Christian name and a surname). Some people have several middle names.
In purely relational terms, if names were fully normalised, they would be in a child relationship with some form of sequence.

NULL as an SQL concept is a 'shortcut' for an optional parent-child relationship where there's a maximum of one 'child' for a parent.
If you recognise that shortcut, most of the functionality around NULL makes sense. If the shortcut was removed, relational database pedants would be happier. However businesses would still have to deal with 'missing' data.

Jeffrey Kemp said...

You're right, NULLs are a business issue - along with all data modelling issues.

In the relational model, there is no such thing as "absence of value" - every attribute of every tuple must have a value taken from the applicable domain.

Basically, you give a good example of the common practice of trying to shoehorn what are (in reality) two types of entities into one relation, indicated by the fact that a "nullable" field is not unknown, but inapplicable.

As you mentioned - we need a second table. I've seen cases where creating multiple tables with 1:1 relationships with each other simplifies the overall design, improves data integrity, and even improves performance.

Robert Vollman said...

This is no different from a "maiden name" column.

Your middle name is known. It's nothing. NULL might suggest you have a middle name, but it's not known.

How about a numeric field indicating the number of middle names? Of course then, what about the situation where you don't know how many middle names someone has? What numeric value would you use then? You guessed it ... NULL! :)

Anyway, Date's solution would generally be to simply create a table of middle names, and create relationships between people and middle names. For you, there simply would be no such relationship. For others, there would be many ordered relationships.

Gary Myers said...

"Anyway, Date's solution would generally be to simply create a table of middle names, and create relationships between people and middle names."

The purely relational solution works fine for data storage. However the problem is the SQL language mechanism for picking up a parent plus an optional child is through an outer join, which results in NULLs.

Some will view this as a problem with SQL as an implementation of relational database theory (and it is). However for us working with current RDBMSs, we need to recognise that abolishing NULLs in the storage layer doesn't help at the levels of the query, application or business.

Anonymous said...

Hi, Gary noname Myers,

Tony Andrews is right.

You have misunderstood the concept of NULLs. If you read the article you linked to (it was written by me :-), you will clearly see what kind of problems you may enter into (ref. your count example).

That article is just one of a batch: May I suggest you read the whole section:

All these articles describe problems coming from ignorance, bypassing theory, and in some cases, complete stupidity/marketing arguments.

Feel free to take on them in your blog, anf best of luck to you :-)


Gary Myers said...

"You have misunderstood the concept of NULLs."
No, we just disagree about the concept of nulls. Your opinion is that they represent 'unknown'. My opinion is they represent a non-existence of data.
I find my opinion makes it easier to comprehend how Oracle deals with NULLs. I don't get paid to care about the theory of NULLs.

Am I 100% happy with the way that Oracle deals with NULLs ? No. There are inconsistencies such as using || with a null doesn't result in a null, unlike practically every other function.

Do I trust a database with NULLs ? Yes, I use it every day and the same functions with the same values work the same way.