Tuesday, January 16, 2007

Column level dependencies in 11g

One of the reported 11g enhancements is "No recompilation of dependent objects when columns added to tables"

This is going to be an interesting challenge for Oracle as it is not as easy as it first seems. Or at least if they take the simple route it's going to disguise a gotcha for some shoddy code.

Take this little procedure (a useless stub just to demonstrate the issue)

PROCEDURE prc_test (pi_location in VARCHAR2) IS
v_num number;
SELECT count(*) INTO v_num
FROM emp e, dept d
WHERE location = pi_location
and dept_mgr = emp_no;

A decent PL/SQL developer would immediately spot that we can't see which of emp or dept has the columns location, dept_mgr or emp_no. However, if location and dept_mgr are only in dept, and emp_no is only in emp, then the SQL will parse successfully and the procedure compiles fine.

So what will happen in 11g if 'location' or 'dept_mgr' is added to emp or 'emp_no' to dept ?
  • Option 1 is that Oracle will not only maintain the dependencies on columns that exist, but also on ones that don't exist. That way it can perform the necessary invalidation. This is no different to the rows with 'NON-EXISTENT' type that appear in the user_dependencies view when you refer to objects that rely on public synonyms.
  • Option 2 is that the procedure isn't invalidated but the SQL falls over if the SQL is executed.
  • Option 3 is that the procedure isn't invalidated and the SQL continues to run as always....until something else forces recompilation of the procedure somewhere down the line and THEN it fails.
Options 2 and 3 aren't pretty, but I guess Oracle could get away with those without much complaint.

But what happens if a column 'pi_location' was added to emp or dept ?

A decent Oracle shop has variable naming standards in place and a big reason for that is to ensure that the names of PL/SQL variables can't conflict with column names. If every Oracle shop had that in force and guaranteed, Oracle wouldn't have a problem. If....

To accommodate this, Option 1 has to extend to recording dependencies on the names of PL/SQL variables used in SQLs. With this, as in the current versions of Oracle, the procedure is marked as invalid when the column is added. It can recompile successfully, but testing should show up the new error (and the invalidation of the procedure should prompt for it to be tested).

If Oracle doesn't maintain that dependency, then the procedure continues to run as normal (using a pi_location bind variable) until at some point it is recompiled and switches to seeing pi_location as a column and produces unexpected results. If a procedure suddenly produces unexpected results, then you tend to look at what has changed recently. Who is going to look at causes that may have been happened months ago ?

I'm not an 11g Beta tester, so I don't know what approach Oracle has taken. It may be Option 1 and we may benefit from an insight into our PL/SQL variables. It may have a totally different solution. One option would be a tentative recompile if the table structure changes, and only invalidating the original if the tentative recompile comes up different. Another approach may be to change the precedence of PL/SQL variables and column names. Currently, if you create a database function with the same name as a PL/SQL variable used in an SQL, the PL/SQL variable is still used. It is only column names which take precedence.

Whatever is offered in 11g, it's just another reason to enforce some decent variable naming standards.

Developer or DBA

Two posts had me thinking last week.

The first was Howard's mention that his employer in Sydney is looking for a ground-to-middling level DBA. The second was the Alchemist's posting on the DBA / Developer divide.

I'm a developer. I coded Pro*Cobol (when dinosaurs ruled the Earth), wrote Pro*C, developed with Oracle Forms and Reports, threw together some Perl and created buckets of PL/SQL. I've also built a reasonably good understanding of Oracle. I know what segments, extents and tablespaces are about, why we have undo and redo, and even the difference between a database and instance. I've got a shelf full of Tom Kyte books (okay, three of them, but also Jonathan Lewis and Connor McDonald and a couple of others so it is a heavy bookshelf). I have, in extremis, altered datafile and tablespace settings (but only in development environments).

In keeping with the great divide, I'll admit to not knowing an awful lot about what DBAs do all day. As one of the comments to the 'great divide' says, management seem intent on keeping developers and DBAs apart. At one place I worked, the DBAs were on a different continent with about six hours time difference.

I know their responsibilities include ensuring backups are being done in a way that they can actually be used, and that sometimes they'll copy databases, create users, run scripts to create the tables and objects that us mere developers have asked for. They are the gatekeepers to the great and secret passwords. They patch and monitor. And if there is a high database to DBA ratio, this alone can keep them very busy. But I have the feeling that a lot of DBAs do a lot of other things too. I just don't know what and I suspect the scope of DBAing varies a lot between employers.

I've assumed that DBAs know what developers do. There's a lot of time spent in specifications and requirements, trying to 'phrase' the business requirements in a manner that means they can actually be coded. More time is spent writing SQL, PL/SQL and maybe some other languages or in an interface development environment. Then there's getting the code to run properly with some testing. Finally, we wrap it all up in some sort of parcel and throw it at the DBAs and say "put this into test or production or wherever".

I've never used RMAN, OEM or Statspack, patched an Oracle installation, wouldn't even begin to recover a crashed database and I was totally lost with Howard's recent post on fracturing LUNs (which is something to do with disks, apparently). All of which means I'd learn an awful lot as a ground level DBA. I think I'd enjoy learning how to be a DBA.

I'm just not sure whether I'd enjoy BEING a DBA. The word 'administrator' isn't inspiring, the ringing of the phone at odd hours is unappealling, and the thought of someone standing over me saying, "Oh, and if we can't get this data back, we're all out of a job" is frankly scary.

I think I'll stick to development...but I did have to have a think about it.

Wednesday, January 10, 2007

High and Low values from *_TAB_COLUMNS

The various _TAB_COLUMNS views (user, all, dba) contain the high and low values of the column picked up when stats were last gathered on the columns.
The drawback is that they are in RAW format.
Even worse, SQL Developer doesn't display the contents of RAWs yet (but please add your voice to the feature request)
This query will translate the high and low values of the common datatypes into something more readable.

select column_name, data_type, data_length, nullable, num_distinct,sample_size,
case when data_type IN ('CHAR', 'VARCHAR2') then utl_raw.cast_to_varchar2(low_value)
when data_type = 'NUMBER' then to_char(utl_raw.cast_to_number(low_value))
when data_type = 'DATE' then
end trans_low,
case when data_type IN ('CHAR', 'VARCHAR2') then utl_raw.cast_to_varchar2(high_value)
when data_type = 'NUMBER' then to_char(utl_raw.cast_to_number(high_value))
when data_type = 'DATE' then
end trans_high, num_nulls, num_buckets
from dba_tab_columns
where table_name = '...'
and owner = '...'
and column_name in ('...','...')
order by column_id

If someone can do something for timestamps, please add a comment.

After a comment on Martin Widlake's blog from Maxim, the logic for the DATE datatype should be amended to
    + (to_number(substr(high_value,3,2),'XX')-100),'0000'))||'.'||

You can also build some wrapper functions around the procedures in DBMS_STATS (assuming you are allowed to create objects in the database). Thanks for the link Herald

create or replace function stats_raw_to_date (p_in raw) return varchar2 is
  v_date date;
  v_char varchar2(25);
  dbms_stats.CONVERT_RAW_VALUE(p_in, v_date);
  v_char := to_char(v_date,'YYYY.MM.DD.HH24.MI.SS');
  return v_char;
  when others then return null;

Thursday, January 04, 2007

Blogger Comment Feed

I've upgraded to the new blogger engine which should have feeds for comments as well as posts. If it works, I'd encourage all blogspotters to do the same, as comments are just as important as the original post.

The new blogger engine is tied up to Google accounts. As such, I've acquired a new blog, called Possum Port. While not up to a certain individuals menagerie of housemates or the wallabies on the Dizwell blog, my own neighbours deserve a place in the sun.