Sunday, December 27, 2009

Happy New Year and fun with dates

As the calendar ticks over once more, it is time for some fun with date-to-character functions and function-based indexes.
This has been tested in XE (10gR2) and 11gR2 databases.

So nothing up my sleeves...

set echo on
start date_test.sql
clear screen
drop table date_format_test purge;
create table date_format_test (id number not null, val date not null); 

create index date_format_test_ix1 on date_format_test (
  id, to_char(val), to_char(val,'DD-Mon-YYYY'), to_char(val,'DD-fmMonth-YYYY'),  to_char(val,'DD-MM-YYYY'));

Insert a bunch of date values, as they might come in from different sessions in different countries, in different languages:

  alter session set nls_date_format='DD/Mon/YYYY';
  alter session set nls_calendar='Gregorian';

alter session set nls_language = 'English';
insert into date_format_test values (1, date '2009-01-20');
  insert into date_format_test values (2, trunc(sysdate,'Mon'));
  insert into date_format_test values (3, to_date('25122009','ddmmyyyy'));

 alter session set nls_language = 'French';

  insert into date_format_test values (11, date '2009-01-20');
  insert into date_format_test values (12, trunc(sysdate,'Mon'));
  insert into date_format_test values (13, to_date('25122009','ddmmyyyy'));

 alter session set nls_language = 'German';

  insert into date_format_test values (21, date '2009-01-20');
  insert into date_format_test values (22, trunc(sysdate,'Mon'));
  insert into date_format_test values (23, to_date('25122009','ddmmyyyy'));

 alter session set nls_language = 'Spanish';

  alter session set nls_date_format='DD/fmMonth/YYYY';

Now let's see what's in the database...

  select /*+ INDEX(t) */ id, to_char(val,'DD-fmMonth-YYYY') from date_format_test t;

----------- ------------------------------
1.00 20-January-2009
2.00 01-December-2009
3.00 25-December-2009
11.00 20-Janvier-2009
12.00 01-Décembre-2009
13.00 25-Décembre-2009
21.00 20-Januar-2009
22.00 01-Dezember-2009
23.00 25-Dezember-2009

  select /*+ INDEX(t) */ id, to_char(val,'DD-Mon-YYYY') from date_format_test t;
----------- --------------------
1.00 20-Jan-2009
2.00 01-Dec-2009
3.00 25-Dec-2009
11.00 20-Janv.-2009
12.00 01-Déc. -2009
13.00 25-Déc. -2009
21.00 20-Jan-2009
22.00 01-Dez-2009
23.00 25-Dez-2009

In both of those cases, the value 'preserved' in the function based index is in the language of the inserting session, so the querying session sees the value in a variety of languages rather than their own.

  select /*+ INDEX(t) */ id, to_char(val) from date_format_test t;
----------- ------------------------------------------------
1.00 20/Enero/2009
2.00 01/Diciembre/2009
3.00 25/Diciembre/2009
11.00 20/Enero/2009
12.00 01/Diciembre/2009
13.00 25/Diciembre/2009
21.00 20/Enero/2009
22.00 01/Diciembre/2009
23.00 25/Diciembre/2009

In this case, despite the hint, the data comes from the base table and is correctly shown in the language of the selecting session.

But this isn't just about display.

  select id, to_char(val,'DD-Mon-YYYY'), val from date_format_test t where to_char(val,'DD-Mon-YYYY') = '20-Jan-2009';

no rows selected

  select id, to_char(val,'DD-Mon-YYYY') from date_format_test where to_char(val,'DD-Mon-YYYY') = '20-Jan-2009';
----------- --------------------
1.00 20-Jan-2009
21.00 20-Jan-2009

In the first query, since VAL is selected, the data comes from the base table. No records are matched because converting the date in the table to a character format in the current session language (Spanish) doesn't return any values matching the string '20-Jan-2009'. In the second, the data comes from the index and, because some inserting sessions were in English, we get some matches.

The problem is that function-based indexes should only be built on deterministic functions, and TO_CHAR(date,'format mask') doesn't necessarily meet that criterion. Being techies, we'd recognise that linguistic months might be a problem. We can always rely on numbers though, can't we ?
Well, not quite. Did you notice my setup included setting NLS_CALENDAR ?

  alter session set nls_language = 'English';
  alter session set nls_calendar='Persian';

  select /*+ INDEX(t) */ id, to_char(val,'DD-MM-YYYY') from date_format_test t;
----------- ----------
1.00 20-01-2009
2.00 01-12-2009
3.00 25-12-2009
11.00 20-01-2009
12.00 01-12-2009
13.00 25-12-2009
21.00 20-01-2009
22.00 01-12-2009
23.00 25-12-2009

  select id, val, to_char(val,'DD-MM-YYYY') from date_format_test t;
----------- ---------------------- ----------
1.00 01 Bahman 1387 01-11-1387
2.00 10 Azar 1388 10-09-1388
3.00 04 Dey 1388 04-10-1388
11.00 01 Bahman 1387 01-11-1387
12.00 10 Azar 1388 10-09-1388
13.00 04 Dey 1388 04-10-1388
21.00 01 Bahman 1387 01-11-1387
22.00 10 Azar 1388 10-09-1388
23.00 04 Dey 1388 04-10-1388

While you may think we are reaching the end of 2009, that isn't necessarily true.
But for those of us who are, then Happy New Year !

It is possible to specify the date language or the calendar as part of the to_char function. I suspect that would be sufficient to make these deterministic.

select to_char(sysdate,'DD-fmMonth-YYYY','nls_calendar=''Arabic Hijrah''') from dual;
select to_char(sysdate,'DD-fmMonth-YYYY','nls_date_language=''Italian''') from dual;
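Taking that a step further, a sketch (untested, and the index name is invented): pinning the NLS parameter inside the indexed expression should give every session the same stored string, whatever its own NLS_DATE_LANGUAGE setting.

```sql
-- Sketch only: fix the date language inside the indexed expression so the
-- stored value no longer depends on the inserting session's NLS settings
create index date_format_test_ix2 on date_format_test (
  to_char(val,'DD-fmMonth-YYYY','nls_date_language=''English'''));
```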

Thursday, December 24, 2009

Autonomous SERVERERROR trigger

Last month I posted about the 'disappearing row trick'. In my response on stackoverflow I posted an additional demonstration of the issue using a SERVERERROR trigger. That was flawed because, as I just read in the 5th edition of Oracle PL/SQL Programming, the AFTER SERVERERROR trigger fires as an autonomous transaction.

Take the following example:

drop table test_se_auto_tbl;

create table test_se_auto_tbl 
  (id number(2) primary key, val varchar2(20));

create or replace trigger test_se_auto_trg
after servererror on schema
begin
  dbms_output.put_line('Trigger ran');
  for c_rec in (select id, val from test_se_auto_tbl) loop
    dbms_output.put_line(c_rec.id||' : '||c_rec.val);
  end loop;
end;
/

Running a transaction as three separate independent SQL statements shows that, when the third statement fails the SERVERERROR trigger fires but the two successful inserts are not visible (because the trigger is an autonomous transaction).

insert into test_se_auto_tbl values (1,'test ');
insert into test_se_auto_tbl values (10,'test 10');
insert into test_se_auto_tbl values (100,'test 100');

select id, val from test_se_auto_tbl;


Revising the example (as I've done on stackoverflow), it is still possible to demonstrate the disappearing row trick however.

create or replace trigger test_se_auto_trg after servererror on schema
begin
  dbms_output.put_line('Trigger ran');
  insert into test_se_auto_tbl values (1,'test in trigger');
end;
/

  insert into test_se_auto_tbl values (1,'test 1');
  insert into test_se_auto_tbl values (10,'test 10');
  insert into test_se_auto_tbl values (100,'test 100');

select id, val from test_se_auto_tbl;

In this case the trigger fires and is successfully able to insert the "test in trigger" row. This indicates the insert of the "test 1" row has been rolled back. If you run the three individual statements, the last insert errors but reports a deadlock as the autonomous transaction in the SERVERERROR trigger fails trying to violate the primary key on ID.

truncate table test_se_auto_tbl;
insert into test_se_auto_tbl values (1,'test ');
insert into test_se_auto_tbl values (10,'test 10');
insert into test_se_auto_tbl values (100,'test 100');

The "test" row from the first insert is still there but is not directly visible to the autonomous transaction. It can be detected indirectly by trying to insert the duplicate row. This finds the uncommitted "test" row. Normally it would wait until a blocking transaction is committed or rolled-back and then either fail or allow the insert. But because the original transaction cannot complete until the trigger completes and the trigger can't complete until the original transaction completes, the deadlock situation arises.
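If you want to see the autonomy directly, one option (a sketch I haven't run, replacing the trigger above) is to print the trigger's local transaction id, which should differ from that of the failing session:

```sql
create or replace trigger test_se_auto_trg
after servererror on schema
begin
  -- an autonomous transaction has its own (or no) local transaction id
  dbms_output.put_line('Trigger txn: '||
      nvl(dbms_transaction.local_transaction_id, '(none)'));
end;
/
```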

Tuesday, December 22, 2009

Wide records in history tables - Squeezing the Juice

I had one of those WIDE tables, over a hundred and fifty columns, in a history-style table recording the state on particular days.

I've put together a PL/SQL routine that dumps out the columns that actually differed between successive entries.

The first parameter is the table name, or the target of the select statement.
I have used DBMS_SQL.DESCRIBE_COLUMNS, rather than parsing *_TAB_COLUMNS, for this parameter for greater flexibility. It means I don't have to separate out the schema and table into separate parameters. I can even pass an inline view as the target, such as '(select * from schema1.tbl2 where val !=123)'.

The second parameter is the column(s) which identify the ordering, which will normally be a date or sequence. Again it is possible to have multiple values (eg 'batch_id, line').

The third parameter is optional, and identifies the partitioning column(s). If we were looking at an employee history, this would be the employee number.

A final optional parameter allows you to limit the query to a specific value for the identifier column.

Because it uses dynamic SQL, and I've opted for flexibility over restriction, there are obvious injection possibilities. It would be vulnerable if some-one specified a function name as part of the target [eg '(select bad_func from dual)'] or as the ordering column. It is invoker rights so it won't directly allow access to anything the user can't otherwise see or do, but if it gets called through a wrapper then it becomes dangerous. As a pipelined function it can only be called through SQL, so any data changes or DDL would have to be wrapped in an autonomous transaction. That doesn't make it bullet-proof though.
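One partial mitigation I didn't build in would be to push each identifier parameter through DBMS_ASSERT before concatenating it. A sketch (note it would reject legitimate multi-column values like 'batch_id, line', so each column would need checking separately):

```sql
declare
  v_ord varchar2(200);
begin
  -- raises ORA-44004 for anything that isn't a legal (qualified) SQL name
  v_ord := sys.dbms_assert.qualified_sql_name('batch_id');
  dbms_output.put_line('accepted: '||v_ord);
end;
/
```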

clear screen

create or replace function tab_diff
    (p_tab_name in varchar2, p_ord_col in varchar2,
     p_id_col in varchar2 default null, p_id_val in varchar2 default null)
return tab_char_4000 AUTHID CURRENT_USER pipelined is
  -- assumes: create or replace type tab_char_4000 as table of varchar2(4000);
  v_line    varchar2(2000);
  v_col_cnt integer;
  v_ind     number;
  rec_tab   dbms_sql.desc_tab;
  v_tab     dbms_sql.varchar2a;
  v_temp    varchar2(32000);
  v_cursor  number;
  v_clause  varchar2(200);
begin
  -- Basic parameter validation
  if p_id_val is not null and p_id_col is null then
    raise_application_error(-20001,'Cannot specify an id value without an associated column');
  elsif p_tab_name is null or p_ord_col is null then
    raise_application_error(-20001,'Table or query target and order column(s) must be specified');
  end if;
  -- Initial values
  v_ind := 1;
  if p_id_col is not null then
    v_tab(v_ind) := 'select '||p_id_col||','||p_ord_col;
  else
    v_tab(v_ind) := 'select '||p_ord_col;
  end if;
  v_temp := 'select * from '||p_tab_name;
  if p_id_col is null then
    v_clause := ' over (order by '||p_ord_col||')';
  else
    v_clause := ' over (partition by '||p_id_col||' order by '||p_ord_col||')';
  end if;
  -- Identify the columns in the target and build the new query
  v_cursor := dbms_sql.open_cursor;
  dbms_sql.parse(v_cursor, v_temp, dbms_sql.native);
  dbms_sql.describe_columns(v_cursor, v_col_cnt, rec_tab);
  for v_pos in 1..rec_tab.last loop
    v_line := rec_tab(v_pos).col_name;
    if v_line in (p_id_col, p_ord_col) then
      null; -- already selected plainly in the first line of the query
    else
      -- When the new value doesn't match the 'lag' value, return the "new:old" pair
      v_line := ', case when '||v_line||' = lag('||v_line||') '||v_clause||' then null '||
                ' else '||v_line||'||'':''||lag('||v_line||') '||v_clause||' end '||v_line;
      v_ind := v_ind + 1;
      v_tab(v_ind) := v_line;
    end if;
  end loop;
  -- Add the FROM, WHERE and ORDER BY to the new query
  v_ind := v_ind + 1;
  if p_id_val is null then
    if p_id_col is null then
      v_tab(v_ind) := ' from '||p_tab_name||' order by '||p_ord_col;
    else
      v_tab(v_ind) := ' from '||p_tab_name||' order by '||p_id_col||','||p_ord_col;
    end if;
  else
    v_tab(v_ind) := ' from '||p_tab_name||' where '||p_id_col||' = '''||p_id_val||
                    ''' order by '||p_ord_col;
  end if;
  -- Parse, describe, define and execute the new query
  dbms_sql.parse(v_cursor, v_tab, v_tab.first, v_tab.last, true, dbms_sql.native);
  dbms_sql.describe_columns(v_cursor, v_col_cnt, rec_tab);
  for i in 1 .. rec_tab.count loop
    dbms_sql.define_column(v_cursor, i, v_line, 2000);
  end loop;
  v_ind := dbms_sql.execute(v_cursor);
  -- Fetch each row from the result set
  loop
    v_ind := dbms_sql.fetch_rows(v_cursor);
    exit when v_ind = 0;
    pipe row('===================================================================================================');
    -- Go through each column and, if the new:old value for the column is not null, display it
    for v_col_seq in 1 .. rec_tab.count loop
      -- Get the value
      dbms_sql.column_value(v_cursor, v_col_seq, v_line);
      if nvl(v_line,':') != ':' then
        pipe row(rpad(rec_tab(v_col_seq).col_name,35)||'>'||v_line);
      elsif rec_tab(v_col_seq).col_name in (p_id_col, p_ord_col) then
        -- Also show the identifier and order columns
        pipe row(rpad(rec_tab(v_col_seq).col_name,35)||'>'||v_line);
      end if;
    end loop;
  end loop;
  dbms_sql.close_cursor(v_cursor);
  return;
end tab_diff;
/

show errors function tab_diff

select * from table(tab_diff('TABLE_1','BATCH_ID','TAB_ID','TAB_ID_VAL'));

select * from table(tab_diff(
'(select * from TABLE_1 WHERE TAB_ID=1234)','BATCH_ID'));

Tuesday, December 15, 2009

Reading Matters

Chet asked, in a comment on my post, for the security blogs I read.

I was planning a 'reading matter' blog anyway, and there's a bit of crossover so here we go:


I get the OraNA Security feed as my main source. Sometimes I'll add a new feed I come across, but they seem to get added to OraNA very quickly, and then I just remove the specific feed. The remaining ones aren't specific to Oracle (or even necessarily databases).

Imperva blog - Mostly database or data related

-Mostly computer related

Schneier on Security - A bit more high level

Risks Digest - Mostly reports on things that have gone wrong

A Day in the Life of an Information Security Investigator
This one is entertaining, but it has gone quiet over recent months. He's either working hard, or it's because he has cut down on his coffee intake.

I'm glad Chet asked since I found a couple of broken feeds while checking and had to re-point them to the new feed locations.

I follow the blogs of Joel Spolsky and Jeff Atwood.

I've just finished "More Joel on Software" which I found very enjoyable. While it is 'just' a collection of his blog articles, it has been brought together in a very cohesive way, so works well as a book. I have been reading the 'Joel on Software' blog for a couple of years and had previously bought "User Interface Design for Programmers".

I suspect I got to Jeff Atwood's Coding Horror blog from Joel's blog, but it may have been the other way. Either way they got together to produce stackoverflow.

If you haven't already been, go take a look round. I used to hang around a couple of forums but this one just works better. OpenId authentication so I don't have to worry about ANOTHER password to remember. A whole bunch of 'badges' to earn and reputation to be garnered. Users with a high enough reputation get editing privileges on the questions or answers (though there's a wiki-style history to see who did what). As well as originators being able to mark answers as correct, others can vote answers up and down, and vote the question up and down. You can comment on the question and on answers and vote the comments up and down. It's not only free to use, but you can even download the whole Q&A database. If Oracle ever feel like ditching Jive, they can get this off the shelf. That said, I'm sure the thing could be spoiled by the wrong user base.

'Paper' reading

I've started on Oracle PL/SQL Programming, 5th Edition. I won this after Steven's visit to Sydney. I'd previously owned a much earlier edition (okay, it was my wife's but it sat on *our* bookshelf), but eventually got rid of it as it was too dated to be fully trusted. I need to catch up with best practices for the later versions of PL/SQL, and the latest edition goes all the way up to 11gR2. It's a big book, so it will take me a while.

On my non-work stuff, I've been re-reading my set of Jasper Fforde books. Very English and refreshingly different. I'd sort of class them as satire, but they are very tricky to categorise.

Monday, December 14, 2009

Who caught my error ?

This one is prompted by a Stackoverflow question here

DBMS_TRACE is a handy way to peek under the scenes in PL/SQL. The bad news is you'll need someone with SYSDBA privileges to install the required tables. I suggest you buy your DBA a box of chocolates for Christmas, and then suggest that you would be annoying them less often if they'd install the objects. Oh, and while they are there, could they give you some select privileges on some of the V$ views.

The required tables are in a script called tracetab.sql in $ORACLE_HOME/rdbms/admin. They get installed under SYS and you'll want some privileges to manage them too.

D:\oraclexe\app\oracle\product\10.2.0\server\RDBMS\ADMIN>sqlplus / as sysdba
SQL*Plus: Release - Production on Mon Dec 14 09:24:45 2009
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Express Edition Release - Production
SQL> @tracetab.sql
SQL> CREATE PUBLIC SYNONYM plsql_trace_runs FOR plsql_trace_runs;
SQL> CREATE PUBLIC SYNONYM plsql_trace_events FOR plsql_trace_events;
SQL> CREATE PUBLIC SYNONYM plsql_trace_runnumber FOR plsql_trace_runnumber;
SQL> GRANT SELECT ON plsql_trace_runnumber TO PUBLIC;

Now for a demo :

create or replace
procedure test_trace is
  v_test varchar2(3);
begin
  select '12' into v_test from dual;
  select '123' into v_test from dual;
  select '1234' into v_test from dual;
  select '12345' into v_test from dual;
exception
  when value_error then
    null;
end;
/

Somewhere in the mire of code, an exception is being raised. It could be a select or other SQL statement or just a PL/SQL assignment. The benefit of DBMS_TRACE over an SQL trace is that it catches PL/SQL exceptions.

Run the test by clearing out the table of any old junk, setting the flag and executing the procedure:

conn .../...@xe
delete from plsql_trace_events;
exec DBMS_TRACE.set_plsql_trace (DBMS_TRACE.trace_all_exceptions);
exec test_trace;

Then query the results.

conn .../...@xe
select event_kind, event_unit, event_line, stack_depth, excp, event_comment, callstack, errorstack
from plsql_trace_events
where event_kind not in (38,40,43,44)
order by event_seq;

----------- ------------------------------- ----------- ----------- -----------
52.00 TEST_TRACE 6.00 2.00 6,502.00
Exception raised
----- PL/SQL Call Stack -----
object line object
handle number name
3BF0F6D4 6 procedure GARY.TEST_TRACE
3BDF1764 1 anonymous block
ORA-06502: PL/SQL: numeric or value error: character string buffer too small

53.00 TEST_TRACE 11.00 2.00 6,502.00
Exception handled

We can see an exception was raised at line 6 and the fact that it was caught at line 11. The latter is pretty important too. If you have some complex code, it isn't impossible that an exception handler a couple of levels up the call hierarchy may 'handle' an exception for which it was never intended. You can even see the error number which is handy if it got caught by a WHEN OTHERS.

Tuesday, December 08, 2009

The CREATE ANY TRIGGER three card trick.

I've got an interest in security, or at least in vulnerabilities. It isn't something I've really had as part of a job role but I like the tricky way of thinking it entails, and I follow a number of security blogs (both Oracle and generic).

During my degree course, when I was mooching around the Prime computers the students had accounts on I discovered some forgotten directories which allowed me to stop worrying about the number and size of files I wanted to store. My degree course was a 'sandwich' one, where the third year was spent working in the real world, and while there I discovered another 'hole'. In that case it was on a *nix system where, although the login account went straight to a menu rather than giving shell access, I could execute shell commands from within the text editor.

After graduating, I was working for a software provider. One client had an important batch job which failed to run at a weekend. Some clients had support accounts that we could access anytime, others required the account to be unlocked, and one site had to turn on the modem before we could connect. This site was quite secure and I couldn't get to the support account but it was local enough that I could drive to the site. The guy I was dealing with had a basic login with just a list of menu options so I still didn't have any command line or SQL level access to fix the problem. Remembering my previous adventures, I nipped into the MAIL option and from there to the text editor. [Hey, this was pre-internet and pretty much pre-Windows. You sent and read your mail through green-screen terminals.] This was VMS and the environment was smarter than the *nix environment had been and knew I was limited to a locked down account, so the editor wouldn't allow shell commands. But it would let me open up another file in the editor, and so I could edit a file that would be run by the batch job when it restarted. See what I mean about twisty thinking ?

DBAs need to understand the twisty thinking when it comes to protecting databases. So I thought I'd push out an Oracle example of twisty thinking.

DBAs know that the *ANY* privileges are generally dangerous and shouldn't be handed out unless absolutely necessary (and even then, with lots of auditing and a large spiky stick). CREATE ANY TRIGGER is one of the trickier ones. On the face of it, it allows a user to create a trigger on anyone's table (or a DDL or event trigger). The trick is that table triggers do NOT need to be owned by the same schema as the table.

So a user, DICK, with 'CREATE ANY TRIGGER' can create a trigger owned by TOM on a table owned by HARRY. That way anyone who inserts, updates or deletes a row in HARRY's table will fire the trigger that runs with TOM's privileges. And given DICK can create the trigger with any owner he wants, the TOM account will be the one with the most privileges.

Obviously, DBAs don't go around granting CREATE ANY TRIGGER to just anyone. A measly developer may just get an account with CREATE TABLE, but that's about it. Once you consider that the developer's account can be the table owner though (HARRY in the example), we have our first twist. A sneaky cracker will need a powerful DICK to create the trigger so he looks around for an account that already has that privilege, such as MDSYS (which owns the spatial datatypes, packages etc). That account is probably LOCKED and EXPIRED, but that doesn't make it unusable. A cracker will be studying the Oracle CPUs and security sites for SQL or PL/SQL injection vulnerabilities in those privileged (and locked down) accounts and those will be used to get access to the CREATE ANY TRIGGER privilege.

To summarise: the cracker gets a lowly privileged HARRY account, uses vulnerabilities in packages owned by default DICK users (despite them being locked) to create triggers owned by TOM users with very high privileges on HARRY's own tables. By inserting into those tables, the cracker fires a trigger that grants him DBA or whatever.
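A sketch of the mechanics (all object names invented, and the body deliberately inert):

```sql
-- Connected as DICK, who has CREATE ANY TRIGGER but little else
create or replace trigger tom.spy_trg
after insert on harry.target_tbl
for each row
begin
  -- runs with TOM's privileges; a real attack would do its work here,
  -- e.g. via an autonomous transaction and execute immediate
  null;
end;
/
```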

From there you can use CREATE ANY DIRECTORY to start hunting around the server file system, and write ssh or rhosts files to get to the oracle login. That may only be on a development system, but some snazzy rootkit style executables there could help the cracker make the jump to production. Or maybe, by pointing the CREATE ANY DIRECTORY cannon at the directory with the Oracle data files, they use UTL_FILE to read the files/blocks relating to C_USER# (the data segment underlying dba_users) and get the password hashes (which are no longer visible in DBA_USERS in 11g). There are some fun things being done to speed up cracking of Oracle passwords so anyone thinking the eight-character alphanumeric password they have been using for a couple of months is safe should think again. You don't use the same password for anything important do you ?

So, have a think about your password complexity and change requirements, apply those CPUs to development and test databases as well as production, and actively audit the more powerful privileges (by which I mean you READ about any use of them when it happens, not six months later).

Thursday, November 26, 2009

DBMS_METADATA_DIFF

A comment on Tom Kyte's blog alerted me to DBMS_METADATA_DIFF in 11gR2. This allows you to generate a set of 'ALTER ...' scripts that bring two objects into line.

It is documented as being part of DBMS_METADATA at the PSOUG (formerly Morgan's Library) reference. The latest Morgan's Library reference says it is part of, but undocumented for, that release.

Say you have added a column, or some constraints to a development or test environment, this can be used to compare that table definition to the production one and give you the statements that need to be applied to production to bring the tables into line.

The bad news is that it requires the Change Management Option.

Hey Oracle. Can you please STOP installing stuff by default that you can't use without extra payment. Or at least don't create the public synonyms and grants that make it trivial to use.

To use the PL/SQL Profiler, you need to create some SYS tables. There's a script to do this but it has to be manually run by the DBA. Can't they do something similar with the extra cost options ?

Anyway, for the financially challenged, remember that "Personal Edition includes all of the components that are included with Enterprise Edition, as well as all of the options that are available with Enterprise Edition", and this functionality works with database links. So for a relatively small outlay a DBA can have Personal Edition sitting on a Windows box and use this handy addition.

Since this is first documented with 11gR2, you may want to wait until the Windows port of 11gR2 comes out for Personal Edition. If you want to risk it with 11gR1, you'd still need the first patchset, so the OTN version won't do.

09:41:05 GARY@db11gr2 > desc scott.gm_x;
Name Null? Type
--------------- -------- ----------------

09:41:10 GARY@db11gr2 > desc gary.gm_x;
Name Null? Type
------------- -------- ----------------

09:41:14 GARY@db11gr2 >

Thursday, November 19, 2009

The amazing disappearing row trick

This curiosity was prompted by a question on stackoverflow
The code inserts a row into a table and then raises an exception.
Despite the exception being caught by an exception handler, and with no explicit rollback, the inserted row has vanished.

drop table dummy;

create table dummy (id number);

declare
  v_num number;
begin
  begin
    execute immediate 'declare v_num number(2);
      begin
        insert into dummy values (1);
        v_num := 100;
      end;';
  exception when value_error then null;
  end;
  select count(*) into v_num from dummy;  -- returns 0: the row has vanished
end;
/

The simple answer is that EXECUTE IMMEDIATE executes an SQL statement (which may be an anonymous PL/SQL block), and that statement will either succeed or fail. If it errors, atomicity states that changes made as part of that statement must be rolled back.

Without the EXECUTE IMMEDIATE, the entirety of the code is a single statement and there is no need for a rollback, since the statement succeeds due to the exception handler.

declare
  v_num number;
begin
  declare
    v_num number(2);
  begin
    insert into dummy values (1);
    v_num := 100;
  exception when value_error then null;
  end;
  select count(*) into v_num from dummy;  -- returns 1: the row survived
end;
/

PS. You get the same result using EXECUTE IMMEDIATE or DBMS_SQL.

Thursday, November 05, 2009

A quick example of proxy authentication

It is fairly typical to have a bunch of developers working in a single schema. Often you want to keep track of which person did what in that schema. [Ideally, you want to know why, but reading minds is beyond the scope of this article.] One way to improve the traceability of activity in the schema is to give everyone a user account with their own password and then allow those development users to connect as a proxy to the schema user. Since everyone connects to the single schema using their own individual username/password, no-one needs to know the schema password and the DBA can even set it to some random jumble of 20 to 30 characters to prevent guessing.

The proxy user doesn't naturally appear in the V$ views (as far as I can tell), but can be derived from SYS_CONTEXT, and therefore used in a LOGON trigger to set CLIENT_INFO which is visible, or you could trace DDLs with AUDIT or a trigger and store the value there.
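For example, a LOGON trigger along these lines (a sketch, not tested):

```sql
-- Record who proxied in, so it is visible in V$SESSION.CLIENT_INFO
create or replace trigger schema1.logon_proxy_trg
after logon on schema1.schema
begin
  dbms_application_info.set_client_info(
      'proxy_user='||nvl(sys_context('userenv','proxy_user'),'(direct)'));
end;
/
```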

C:\>sqlplus gary/gary@xe
SQL*Plus: Release - Production

SQL> create user schema1 identified by ***** default tablespace users;
User created.
SQL> alter user schema1 quota unlimited on users;
User altered.
SQL> create table schema1.test (id number);
Table created.
SQL> create user devuser identified by dangermouse default tablespace users;
User created.
SQL> alter user schema1 grant connect through devuser;
User altered.
SQL> grant create session to schema1;
Grant succeeded.
SQL> conn devuser[schema1]/dangermouse@xe
SQL> select user from dual;
SQL> select sys_context('userenv','proxy_user') from dual;

With a JDBC client, such as SQuirrel, I simply put the username as DEVUSER[SCHEMA1] and use the DEVUSER password.

Friday, September 04, 2009

11gR2 nuggets

Haven't installed 11gR2 yet, but found a couple of nuggets in the documentation.

Imagine this situation. At 11:59, transaction A inserts a record into a table with a CREATED_ON of SYSDATE, but is not yet committed. At 12:00 a batch job kicks off to pick out all the new records, using the CREATED_ON date. Since the one from transaction A isn't committed yet, it is missed by the batch job. If the batch job says "I picked up everything dated before 12:00" it will miss it next time too.
Previously, you might work around this with v$transaction or queuing. A long time ago I even used a LOCK TABLE IN EXCLUSIVE MODE. It was immediately released after it had been obtained but still provided a potential for an issue (though it isn't so dangerous if you add a WAIT 5 on the end). Now you can simply tell your transaction to wait until any other transaction on the table that started before 12:00 has completed.
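I believe the call in question is DBMS_UTILITY.WAIT_ON_PENDING_DML, new in 11gR2. A sketch (untested, and the table name is invented):

```sql
declare
  v_scn number;  -- null on input means "use the current SCN"
begin
  -- wait up to 60 seconds for transactions that touched NEW_ROWS_TBL
  -- and started before that SCN to commit or roll back
  if dbms_utility.wait_on_pending_dml('NEW_ROWS_TBL', 60, v_scn) then
    dbms_output.put_line('safe to sweep rows created before the cutoff');
  else
    dbms_output.put_line('timed out with transactions still pending');
  end if;
end;
/
```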

Secondly, having the option of an EXIT in SQL*Plus default to quitting with a ROLLBACK rather than a COMMIT.
It is not one I'd thought was an issue, but it came up on Stackoverflow at the same time. Coincidence, I guess. Anyway, I'd expect Instant Clients for 11gR2 to come out fairly soon, so this can be put into production a lot quicker. Be warned though, a DISCONNECT or CONNECT will (apparently) still do an implicit commit. That all seems odd to me, as mentally I'd expect an EXIT to do a disconnect from the database session then exit the client (and a CONNECT to do a disconnect from one session and then a connect to a new one) and that any COMMIT/ROLLBACK option would relate to the disconnect option.

Thirdly, hints.
I'm not sure I like this but IGNORE_ROW_ON_DUPKEY_INDEX allows the statement to silently ignore ORA-00001 errors. The row won't be inserted, but the SQL statement won't fail either. Not sure if AFTER ROW triggers will fire or not. Personally I prefer the thought of the error logging clause for this sort of functionality. I just don't like hints which change what the statement does at a logical level.
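For comparison, sketches of both approaches (table DUMMY and index DUMMY_PK are invented, and the second assumes an error log table created with DBMS_ERRLOG.CREATE_ERROR_LOG):

```sql
-- silently skip the duplicate instead of raising ORA-00001
insert /*+ IGNORE_ROW_ON_DUPKEY_INDEX(dummy, dummy_pk) */
into dummy values (1);

-- the error logging clause: the duplicate lands in ERR$_DUMMY instead
insert into dummy values (1)
log errors into err$_dummy reject limit unlimited;
```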

One hint I want to experiment with is APPEND_VALUES, which is like the APPEND hint, but for FORALL...INSERT in PL/SQL. Not sure how often you would have an array in memory large enough to benefit from this. Also, once you've done a direct-path insert, the table wants a commit/rollback before you can do another. If APPEND_VALUES is the same, you'd need to introduce a commit if you want to use this for looping big chunks of data.
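The shape I have in mind is something like this (invented table, untested):

```sql
declare
  type t_num_tab is table of number;
  v_ids t_num_tab := t_num_tab(1, 2, 3);
begin
  forall i in 1 .. v_ids.count
    insert /*+ APPEND_VALUES */ into dummy values (v_ids(i));
  -- after a direct-path insert, this session can't touch the table
  -- again until the transaction ends, hence the commit
  commit;
end;
/
```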

Friday, August 07, 2009

Interlude - Cheap(er) Spatial

In the previous post, I mentioned the cost of the Spatial option. There is a way to both decrease the cost and increase performance of the geocoding. I thought that would grab your attention.

The secret to improving performance is to precompute the answers. That is, work out the geocode for every street and cross street. I'll cover that in the coming posts.

The secret to reducing cost is Oracle Personal Edition. In the licensing guide, "Personal Edition includes all of the components that are included with Enterprise Edition, as well as all of the options that are available with Enterprise Edition". This is distinct from the Enterprise Edition wording which states that it "can be further enhanced with the purchase of the options". My emphasis. So you get Spatial free with Personal Edition.

Personal Edition is normally $460 according to the Oracle Store. That's for a perpetual license though. If you only wanted to do a one-off job, or just reduce your startup costs, it has the option of a one year license for $92 (though support would double that). And, for reasons I can't even guess at, Personal Edition only runs on Windows so you may have to add a Windows license to that.

Of course, with Personal Edition you do have to be very careful that only one person uses it, so this is best suited to an environment with one-off processing, or where a bunch of data can be imported, processed and extracted as part of a daily or weekly batch by the license holder.

Wednesday, August 05, 2009

I can see my house ... Part 3 - Sax and the Cities

In the previous post in the series, I mentioned trying to work with the raw OSM file from OpenStreetMap. If you could do this, it would remove the 'dependency' on the extracts from CloudMade.


But it isn't easy. The OSM format is XML. But that isn't my problem with it. Yes, XML is verbose and that makes the files big (about 1GB for Australia). I actually think XML is appropriate for this sort of data interchange requirement, as people will be interested in various slices of it and new bits will need adding on as time goes on.

No, what I object to is that practically everything that isn't a reference to a longitude/latitude is a key/value 'tag' element. While I don't like it, I do understand the reason for it given in the FAQ:

"Imposing too many rules on them would reduce mapping from a fun and creative activity to a dull and boring data collection task. We give our data away for free, and if our users need more structure, they can convert our data to match their needs."

At this stage in the project, I agree that data volume outweighs data structure. However I suspect (and hope) there will come a time when the project matures to the point where more structure is imposed.

Structure, what structure?

Currently, the OSM is mostly a bunch of nodes/points (with a latitude, longitude and multiple tags) and ways (an ordered set of points, plus multiple tags). Some 'ways' are tagged as highways, some are coastlines, some are rivers. A single 'way' can be multiple things (e.g. a road or river may serve as a boundary between regions or countries, or a 'way' may serve as both road and tramway). There are also relations, which can group nodes and ways (and have their own tags).

Give this problem to a database designer and they would look at a map and think "Hmmm. We need to differentiate between roads, rivers, train lines etc. And most roads have names, maybe speed limits...". They'd come up with a data model. But rather than store these properties as XML attributes (or sub-elements), OSM relegates them to catch-all key/value pairs.

That means, rather than being documented as part of the XML schema (XSD or DTD), it is all on a wiki. And there's no automated way to validate conformance.

But enough whinging. If you want to use the OSM, here's a few suggestions based on my attempts.

How to get the OSM

You can download the entire planet's OSM file from one of the mirrors listed, but our planet is a big place. Personally, I'm just interested in Australia at the moment, and I can get a download of that from OSM Australia.

If, as is likely, you have a different area of interest, you can download the whole planet and use a tidy java app called osmosis to extract a square, rectangle or other shape containing your region of interest.

They also do daily deltas for the planet.osm. Again, you'd chop out the rectangle of interest and apply that. Australia is surrounded by sea, so it is pretty easy to come up with a rectangle that includes Australia and nothing else. You can do other shapes too, but I like to keep it simple.

java -jar osmosis.jar --read-xml file="/osm/australia.osm" --bounding-box top="-33.50" bottom="-34.12" left="150.54" right="151.45" --write-xml file="greater_sydney.osm"

The osmosis instructions contains some other filters. Alas there isn't one to extract only highways, but it is open source and I've put in a request for that to the dev list.

How not to load the OSM

I tried to load part of Australia (the greater Sydney area - about 70MB) in one chunk into an Object-Relational XML schema and gave up. It worked for a smaller extract of 4MB though, and I was using a single core laptop with 1GB of RAM. A bigger system and/or a smaller region and you may have better luck. The SQL for that is osm_schema.sql. You will have to amend the OSM file to put in the appropriate XML headers. Plus I had some character set issues, which I resolved through uconv (simply deleting the characters it didn't recognise).

How to load the OSM

After hunting around, I found Marco's blog entry on SAXLoader. Thanks Marco. I've copied the source code from the messages on the thread and saved them as separate files.
I didn't want to inflict JDeveloper on my little laptop, so I compiled them through the command line.
Your mileage may vary, but I set the following classpath, and compiled as follows:

export CLASSPATH=$CLASSPATH:.:$ORACLE_HOME/lib/xmlparserv2.jar:$ORACLE_HOME/lib/servlet.jar:$ORACLE_HOME/jdbc/lib/ojdbc6.jar
javac -d . *.java

As I said above, under the top level "osm" element, the XML has three different elements, for "node", "way" and "relation". This means three passes through the SaxLoader. I used three destination tables, all simply created like CREATE TABLE OSM_XML_NODE OF "SYS"."XMLTYPE". The easiest way to script the run is with three separate config XML files and call the java for each type.

java -Xmx256M SAXLoader node_config.xml > node.log

java -Xmx256M SAXLoader rel_config.xml > rel.log

java -Xmx256M SAXLoader way_config.xml > way.log

For me, the 'node' took several hours, 'way' was a couple of hours and 'relation' about ten minutes. Again, your mileage will vary depending on your hardware and input file size. The smaller daily delta files should breeze through.

How to extract relational data from the OSM XML elements

The script xml_to_rel.sql is my first effort at this.

There were almost eight million nodes and about 1.5 million tags for Australia (but 1.2 million tags were 'created by', and I don't care about those). From the nodes I extracted the longitude, latitude, user and timestamp attributes. The tags are populated to a child table. Again, this operation takes several hours.
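A minimal sketch of the kind of extraction xml_to_rel.sql performs (the target table and column names here are illustrative, not the actual script):

```sql
-- Pull the attributes of each <node> element out of the XMLType object table.
insert into osm_node (node_id, latitude, longitude, osm_user, osm_timestamp)
select extractvalue(value(s), '/node/@id'),
       extractvalue(value(s), '/node/@lat'),
       extractvalue(value(s), '/node/@lon'),
       extractvalue(value(s), '/node/@user'),
       extractvalue(value(s), '/node/@timestamp')
from   osm_xml_node s;

-- And the key/value tags into a child table.
insert into osm_node_tag (node_id, tag_key, tag_value)
select extractvalue(value(s), '/node/@id'),
       extractvalue(value(t), '/tag/@k'),
       extractvalue(value(t), '/tag/@v')
from   osm_xml_node s,
       table(xmlsequence(extract(value(s), '/node/tag'))) t;
```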

I extracted the details for WAY in a similar manner (450,000 entries), except there are two child tables. The first is the relationship between WAY and NODE (8 million rows), and the second the tags for the WAY (1.5 million). I don't bother with the visible attribute for WAY either. The osm_way_node extraction took about seven hours. I believe a large chunk of that was the handful of XML documents over 100kB in size, with tens of thousands of nodes (I think the largest of those is the Tasmanian coastline). You could try filtering out based on length(s.SYS_NC_ROWINFO$.getclobval()) if that's a problem for you.
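Something along these lines, say, before running the extraction (table name and size threshold are hypothetical):

```sql
-- Skip the few very large WAY documents (e.g. long coastlines)
-- that dominate the extraction time. 100kB is an arbitrary cutoff.
delete from osm_xml_way s
where  length(s.SYS_NC_ROWINFO$.getclobval()) > 100000;
```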

Finally, I 'promoted' the name and highway to attributes of the way.
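The promotion step could look something like this, assuming relational OSM_WAY and OSM_WAY_TAG tables with the shape described above (names are illustrative):

```sql
alter table osm_way add (name varchar2(200), highway varchar2(50));

-- Copy the two most useful tags up onto the parent WAY row.
update osm_way w
set    w.name    = (select t.tag_value from osm_way_tag t
                    where  t.way_id = w.way_id and t.tag_key = 'name'),
       w.highway = (select t.tag_value from osm_way_tag t
                    where  t.way_id = w.way_id and t.tag_key = 'highway');
```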

I haven't gone as far as pulling the longitude/latitudes into MDSYS.SDO_GEOMETRY datatypes, either on the nodes or ways. That would bring the data into line with that derived in the previous post, removing the reliance on the Cloudmade downloads.

Thursday, July 30, 2009

I can see my house.. - Part 2 : Open Source Geocoding

The drawback of relying on Google APIs is that they've got all these 'Terms of Use'. Unless you pay, you can only call the geocoder so often and can't use it for applications which aren't available freely to the public, and you can't store results except for caching.

What you could really do with is your own geocoding service and that might not be as far-fetched as it seems. If you have enough monkeys and typewriters, they'll eventually write Hamlet. In a more realistic timescale, if you have enough people driving around with GPS devices, they'll come up with a map. They have and it is called OpenStreetMap. It is still a work-in-progress (and always will be, as roads get built or redeveloped). You'll have to see if the data is sufficient for your area of interest. If not, buy a GPS and fix it.

You can get raw data from them, in an XML format called OSM. I had a look at that route and it is not ideal; I'll blog on that later. In the meantime, here's one someone else made earlier. Cloudmade have done a nice chunk of processing for us, and generated Shapefiles similar to those I loaded up in the prior post. From here, I can get a zip file containing shapefiles, one of which is 'australia_highway'. I load that the same way I did the others.

They do other countries too and boundaries as well as highways, so if the Australian postcodes/suburbs from the last post weren't useful to you, this resource may help out.

The local_geocode.sql is an example of how I've used the data to do some basic geocoding. To get that code to work in XE, you'd have to get rid of the call to "sdo_geom.sdo_closest_points" as it is new in 11g. Let's hope that OpenWorld will give us a hint on when we may see 11gR2 XE.

The data isn't at the level of Google's. It doesn't do street numbers, for example. Some streets in the source OSM don't have names, and the street next to the one I live in is missing entirely. But if you are trying to determine the closest store for your customer based on their address, it could be good enough. Even if coverage isn't perfect, you may still be able to reduce the number of calls you make to a "paid for" geocoder (which could reduce what you need to pay for it).

Warning: You may be tempted to switch to sdo_geom.sdo_intersection to find where two streets cross. This is actually part of Oracle Spatial and isn't licensed for use with the basic Locator functionality that comes free with the database. In 10gR2, Locator only included the SDO_GEOM functions SDO_DISTANCE, VALIDATE_GEOMETRY_WITH_CONTEXT and VALIDATE_LAYER_WITH_CONTEXT. In 11g, it includes all subprograms EXCEPT the following: SDO_GEOM.RELATE, SDO_GEOM.SDO_DIFFERENCE, SDO_GEOM.SDO_INTERSECTION, SDO_GEOM.SDO_UNION, SDO_GEOM.SDO_VOLUME, SDO_GEOM.SDO_XOR.

With XE, the licensing is less well-defined. According to this, Spatial is not included with XE. Logically, since sdo_intersection is included in XE, that would mean it can't be part of Spatial. And even if it is part of Spatial, do the license restrictions of the 10gR2 or the 11g 'Spatial' documents apply? Since either would preclude sdo_intersection, that last question is probably academic.

A discussion of Spatial vs Locator licensing can be found here. As reported here, the cost of Spatial has risen from an arm and a leg to both legs, one arm and an ear (or by 52% in dollar terms).

I can see my house from here - Part 1

On a recent project, I was looking at Geocoding with Google Maps.

Searching with a simple string would sometimes produce odd results, like places in America when I really wanted one in Australia, or one in Victoria when I expected it in Sydney. Looking further, I've found that you can tell Google Maps to limit its results to a specific 'box' specified by latitude and longitude. The trick is knowing what latitude and longitude to use as the limits.

In Australia, we can get boundaries from the Australian Bureau of Statistics here. It's a few years out of date, but will be sufficiently accurate for most purposes. The drawback is that this is in a shapefile (which is actually a set of files - go figure).

But a Googlehunt directed me to GDAL/OGR, which can convert shapefiles into Oracle spatial/locator geometries. I downloaded the source to my ubuntu box (which has the database installed), then uncommented and set the line in nmake.opt regarding ORACLE_HOME and installed it (don't forget to set the LD_LIBRARY_PATH).

Unzip the data files, and gdal/ogr can load the shape files straight into the database, even creating the table. That's the ease of use I like.
ogr2ogr -f OCI OCI:dev/[email protected]/db11r1 SSC06aAUST_region.shp
ogr2ogr -f OCI OCI:dev/[email protected]/db11r1 POA06aAUST_region.shp

From there, I want to summarize a 'box' for each postcode/suburb/state that tells me the outer limits of the area. Luckily for me, Australia is positioned such that it is pretty easy to determine whether a value is a latitude or a longitude.

create table aus_postcode as
select a.poa_2006,
decode(state_2006 ,1,'NSW',2,'VIC', 3, 'QLD',
4,'SA', 5,'WA', 6,'TAS',7,'NT',8,'ACT','Other') state,
min(case when b.column_value between 80 and 160 then b.column_value end) west_longitude,
max(case when b.column_value between 80 and 160 then b.column_value end) east_longitude,
min(case when b.column_value between -50 and -20 then b.column_value end) south_latitude,
max(case when b.column_value between -50 and -20 then b.column_value end) north_latitude
from POA06AAUST_REGION a, table(a.ora_geometry.SDO_ORDINATES) b
group by a.poa_2006, state_2006;

create table aus_localities as
select a.name_2006 locality, ssc_2006 locality_number,
decode(state_2006 ,1,'NSW',2,'VIC', 3, 'QLD',
4,'SA', 5,'WA', 6,'TAS',7,'NT',8,'ACT','Other') state,
min(case when b.column_value between 80 and 160 then b.column_value end) west_longitude,
max(case when b.column_value between 80 and 160 then b.column_value end) east_longitude,
min(case when b.column_value between -50 and -20 then b.column_value end) south_latitude,
max(case when b.column_value between -50 and -20 then b.column_value end) north_latitude
from SSC06aAUST_region a, table(a.ora_geometry.SDO_ORDINATES) b
group by a.name_2006, ssc_2006, state_2006;

create table aus_states as
select state, min(west_longitude) west_longitude, max(east_longitude) east_longitude,
min(south_latitude) south_latitude, max(north_latitude) north_latitude
from aus_localities
group by state;

So now, before I geocode an address with suburb, postcode or state, I can query the bounding box and pass it to Google to get a more accurate answer. For an example of the code for this, look at the util_geocode package here. You'll need to plug in your own Google Maps API key before using it. Also, if you want to borrow the code for parsing the returned KML/XML, you may need to make adjustments for the address fields returned for searches in your country of interest.
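As a sketch of the lookup step (the postcode value here is just an example):

```sql
-- Fetch the bounding box for a postcode, to pass as the
-- viewport/bounds when calling the Google geocoder.
select west_longitude, south_latitude, east_longitude, north_latitude
from   aus_postcode
where  poa_2006 = '2000'
and    state    = 'NSW';
```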

Thursday, January 08, 2009

JournalSpace learns that it's all about the data

Did anyone blog on JournalSpace?

I'll admit it wasn't a site I'd heard of, but they've been around for six years or so.

Until December 18th, when they died. And upon the gravestone they will carve "RAID is not a backup".

"There was no hardware failure. Both drives are operating fine; DriveSavers had no problem in making images of the drives. The data was simply gone. Overwritten."

It is hard to accidentally fully overwrite large files. It's easier for the OS just to mark a few areas of disk as 'available for use' than to overwrite them. It may well have been deliberate sabotage, possibly by that same IT guy. Though I'd have thought that, if he knew this 'attack' would work, he'd have known the weakness of RAID from a recovery point of view, and would therefore have suggested additional backup measures. Maybe more details will come to light if JournalSpace can establish the cause of the overwrite.

It could be that partial data recovery was physically possible but financially unviable. Reconstructing valid database files, or even extracting the data from them, would be hard enough for 'simple' corruption, let alone the residue from a full overwrite (security professionals recommend seven overwrites before data is REALLY gone). I can't see recovering blog posts as being cost-effective.

According to this report, the owners of JournalSpace site also owned the hosting company. I'd expect a hosting company to have some concept of data backups. That, at some point, machines will push (or have pulled from them) a biggish chunk of data.

Oh wait, they did understand some backups.

"He had set up automated backups for the HTTP server which contains the PHP code, but, inscrutibly (sic), had no backup system in place for the SQL data. "
It's a lot easier to set up a backup for the application code. Application code probably has a copy on a dev/test machine, a developer's desktop environment or a source control system. [Okay, if they don't have backups, they may well not have source control either.] The code files aren't constantly changing, and are relatively small, so it's pretty easy to arrange a 3am backup with an OS tool. You can use the same idiot-friendly tools you use to back up the digital photos on your home PC. I use SyncBack.

Hosting a database application is only part of the job. If you can't back that database up, you are hosed. Half of Oracle's cloud announcement at OpenWorld was facilitating backup to Amazon storage.

Properly backing up a 24/7 active database is a job for a professional. It often won't be a full-time job, especially for a small outfit. You can hire someone for a day to set up a backup regime, then for a few hours every so often to check it. Maybe get your PHP coder to do some backups/restores.

Because, after all, if the database gets lost, then all that application coding was just a waste.