Tuesday, February 14, 2006

STRAGG without PL/SQL - MODEL strikes again

Still playing around with MODEL. Last time I posted on using MODEL to break down a single columne containing comma separated values into multiple rows. This time, I'll go the other way, and take multiple rows and 'stragg' them up into a single comma separated list.

I use dense_rank to count, and sequence, the individual values. The order by clause in the function allows you to define how the 'fields' in the concatenated function will be ordered.

I'm using the 'return updated rows' clause to return fewer rows (ie just the one containing the concatenated value). I iterate up to 4000 times (maximum length of VARCHAR2) but quit when there isn't a row for that iteration (eg if there are five rows to be concatenated, there isn't a value for the sixth, so the iteration quits).

I'm not convinced this is the most efficient method as I think I'm iterating for each row selected. However at the moment, my involvement with 10G is purely 'play' (and all on XE), so I'm trying to experiment to see what is possible with MODEL. Once I've got more of its techniques under my belt, I'll have a better idea of how to improve my SQLs.

select data_type, seq, all_cols
from
(select data_type, cast(column_name as varchar2(4000)) cname,
dense_rank() over (partition by data_type order by column_name) -1 seq
from user_tab_columns
where table_name = 'PLAN_TABLE')
model
return updated rows
partition by (data_type)
dimension by (seq as seq)
measures (cname all_cols)
unique single reference
RULES ITERATE (4000)
UNTIL (PRESENTV(all_cols[ITERATION_NUMBER+1],1,0) = 0)
(all_cols[0] = case when ITERATION_NUMBER = 0
then all_cols[0]
else all_cols[0]||','||all_cols[ITERATION_NUMBER] end
)
order by 1,3,2;


PS. I'm trying performancing to publish this entry.

Saturday, February 11, 2006

More on the MODEL clause

I have been playing with the MODEL clause, on the assumption that it can probably do lots of fun and exciting things if only I knew how to use it. I think the fun part is the ITERATE clause which seems a massive step forward for the SELECT statement. SELECT basically churns through row after row of a result set, the fundamental SEQUENCE component of data processing. The WHERE clause tells it which rows to process and which to ignore, and the CASE statement (and to a lesser extent, DECODE before that) offers a powerful mechanism for choosing how to process the row, representing the SELECTION component. The next step, which had been missing until the MODEL clause, was a way to repeatedly process the same selected row, the ITERATION component.

So, lets take a single row, from the famous dual table, and continually reprocess it to generate multiple entries in a new result set, a magical number generator.

SELECT x from dual
MODEL DIMENSION BY (1 AS z) MEASURES (1 x)
RULES ITERATE (7) (x[ITERATION_NUMBER]=ITERATION_NUMBER+1);
Extending that, lets do some REAL data processing, splitting a single row with a colun containing a list of values into multiple rows.

Firstly, I used the iteration operation to loop around up to 4000 times (based on a maximum size of a VARCHAR2 in the database). But I did not want 4000 rows for each row selected, so I counted the number of delimiters in the list (the 'vcnt' made-up column) and quit the iteration when all list entries have been processed.

The tricky bit is pulling the Nth entry from the list. Firstly, I pick out the start position. For iteration zero, that will be the first character, and for the subsequent (1st, 2nd, 3rd etc) interations, the character after the Nth delimiter. Then I derive the end position using INSTR to locate the Nth+1 delimiter. Where that returns zero, I use the length of the string. Finally, I take the characters from the start position to the end position.

The great thing about the model clause, is that I can use the interim vstart and vend 'made-up columns' to come up with the var2_list column I actually want, simplifying some of the more complex derivations.


drop table str_test;

create table str_test (var_list varchar2(40));

insert into str_test values ('a,b,c,d,e,f,g,h');
insert into str_test values ('AA,BB,CC,DEF,GG');
insert into str_test values ('1,2,,45,6');
insert into str_test values ('fred,is,here,today');
insert into str_test values ('a');
insert into str_test values ('');

commit;

column var_list2 format a5

SELECT var_list, var_list2
FROM str_test
MODEL
PARTITION BY (var_list)
DIMENSION BY (0 AS z)
MEASURES (var_list var_list2,
0 vcnt, 0 as vstart, 0 as vend)
RULES ITERATE (4000) UNTIL (ITERATION_NUMBER+1 = vcnt[0] )
(vcnt[ITERATION_NUMBER]=
nvl(length(cv(var_list)) - length(replace(cv(var_list),',')) + 1,1),
vstart[ITERATION_NUMBER] =
case when ITERATION_NUMBER = 0 then 1
else instr(cv(var_list),',',1,ITERATION_NUMBER)+1 end,
vend[ITERATION_NUMBER] =
case when nvl(instr(cv(var_list),',',1,ITERATION_NUMBER+1),0) = 0
then nvl(length(cv(var_list))+1,1)
else instr(cv(var_list),',',1,ITERATION_NUMBER+1) end,
var_list2[ITERATION_NUMBER] =
substr(cv(var_list),vstart[ITERATION_NUMBER],
vend[ITERATION_NUMBER] - vstart[ITERATION_NUMBER])
)
order by var_list, vstart;

Tuesday, February 07, 2006

Oracle security interview - thoughts

Interesting article through Pete Finnegan's blog from an interview with one of oracle's security people.

A few quotes:
"We don’t hide our internally discovered vulnerabilities. When we discover something internally, we still mention it in our Critical Patch Updates."

"There are others [security researchers] who for their own good reasons choose to pressure us and put our customers at risk by a partial or early or zero-day disclosure of vulnerabilities in Oracle products."

So let me get this straight. If Oracle employees find a bug, then disclose it, it is a GOOD THING. If external parties practise partial disclosure, they are putting customers at risk. The exact meaning of 'zero day' disclosure is debateable. If it means disclosure prior to informing Oracle, then there is no excuse and it is BAD. They MAY mean disclosure before Oracle have patched it, but then what is "early disclosure" ?
Where they are referring to disclosure before a patch is available, there is room for debate. If there is a potential workaround to reduce risk, then publicising the issue plus the workaround, could benefit some customers.

"If you look at all of the vulnerabilities that my security group handles, we discover about 75% of them. About 10% is reported to us by our customers. The remainder comes to us through external security researchers. "

Mostly figures should add up to 100% but I don't think this is one of them. Remember, Oracle don't publicise bugs before they patch them, so a vulnerability could be 'discovered' by customers, researchers AND internal Oracle staff, just like America being discovered by the Carthaginians, Vikings, Chinese, Columbus, and so on (not to mention people already actually living there).



HTMLDB rename poll results and blog colours

Firstly, the results of the poll are in, and half opted for APEX as a quickie name for Application Express/HTMLDB, which is fine by me.
Luckily it seems to be what Oracle are going towards, as 'apex' gets used here.

AndyC commented on the history of Oracle Bloggers
Not sure whether the comment about tweaks and garish colours was meant for me. If not, it probably should be. I was looking for some background images, and the bluish mazes one came with OpenOffice as a web-background. I've now toned it down a few shades, but left the original colours for the title box. I think it works, but what do I know ?

My main issue with the original templates are, having the sidebar on the left means some scrolling can be needed in 'narrow' windows. Having it on the right means that, when I'm lazy and use a PRE tag to highlight code, long lines of code can run over the edge of the content area into the sidebar and making it hard to read.

So I've shoved all that stuff down the bottom, with a links up the top. And I had to hunt around for a workaround to IE's inability to respect the MAXWIDTH CSS directive.