Saturday, February 11, 2006

More on the MODEL clause

I have been playing with the MODEL clause, on the assumption that it can probably do lots of fun and exciting things if only I knew how to use it. I think the fun part is the ITERATE clause which seems a massive step forward for the SELECT statement. SELECT basically churns through row after row of a result set, the fundamental SEQUENCE component of data processing. The WHERE clause tells it which rows to process and which to ignore, and the CASE statement (and to a lesser extent, DECODE before that) offers a powerful mechanism for choosing how to process the row, representing the SELECTION component. The next step, which had been missing until the MODEL clause, was a way to repeatedly process the same selected row, the ITERATION component.

So, lets take a single row, from the famous dual table, and continually reprocess it to generate multiple entries in a new result set, a magical number generator.

SELECT x from dual
MODEL DIMENSION BY (1 AS z) MEASURES (1 x)
RULES ITERATE (7) (x[ITERATION_NUMBER]=ITERATION_NUMBER+1);
Extending that, lets do some REAL data processing, splitting a single row with a colun containing a list of values into multiple rows.

Firstly, I used the iteration operation to loop around up to 4000 times (based on a maximum size of a VARCHAR2 in the database). But I did not want 4000 rows for each row selected, so I counted the number of delimiters in the list (the 'vcnt' made-up column) and quit the iteration when all list entries have been processed.

The tricky bit is pulling the Nth entry from the list. Firstly, I pick out the start position. For iteration zero, that will be the first character, and for the subsequent (1st, 2nd, 3rd etc) interations, the character after the Nth delimiter. Then I derive the end position using INSTR to locate the Nth+1 delimiter. Where that returns zero, I use the length of the string. Finally, I take the characters from the start position to the end position.

The great thing about the model clause, is that I can use the interim vstart and vend 'made-up columns' to come up with the var2_list column I actually want, simplifying some of the more complex derivations.


drop table str_test;

create table str_test (var_list varchar2(40));

insert into str_test values ('a,b,c,d,e,f,g,h');
insert into str_test values ('AA,BB,CC,DEF,GG');
insert into str_test values ('1,2,,45,6');
insert into str_test values ('fred,is,here,today');
insert into str_test values ('a');
insert into str_test values ('');

commit;

column var_list2 format a5

SELECT var_list, var_list2
FROM str_test
MODEL
PARTITION BY (var_list)
DIMENSION BY (0 AS z)
MEASURES (var_list var_list2,
0 vcnt, 0 as vstart, 0 as vend)
RULES ITERATE (4000) UNTIL (ITERATION_NUMBER+1 = vcnt[0] )
(vcnt[ITERATION_NUMBER]=
nvl(length(cv(var_list)) - length(replace(cv(var_list),',')) + 1,1),
vstart[ITERATION_NUMBER] =
case when ITERATION_NUMBER = 0 then 1
else instr(cv(var_list),',',1,ITERATION_NUMBER)+1 end,
vend[ITERATION_NUMBER] =
case when nvl(instr(cv(var_list),',',1,ITERATION_NUMBER+1),0) = 0
then nvl(length(cv(var_list))+1,1)
else instr(cv(var_list),',',1,ITERATION_NUMBER+1) end,
var_list2[ITERATION_NUMBER] =
substr(cv(var_list),vstart[ITERATION_NUMBER],
vend[ITERATION_NUMBER] - vstart[ITERATION_NUMBER])
)
order by var_list, vstart;

No comments: