Tuesday, December 22, 2009

Wide records in history tables - Squeezing the Juice

I had one of those WIDE tables, over a hundred a fifty columns which was a history style table, recording the state on particular days.

I've put together a PL/SQL routine that would dump out the columns that actually differed between successive entries.


The first parameter is the table name, or the target of the select statement.
I have used DBMS_SQL.DESCRIBE_COLUMNS rather than parsing *_TAB_COLUMNS for this parameter for greater flexibility. It means I don't have to separate out the schema from the table in separate parameters. I can even pass a inline view as the target, such as '(select * from schema1.tbl2 where val !=123)'

The second parameter is the column(s) which identify the ordering, which will normally be a date or sequence. Again it is possible to have multiple values (eg 'batch_id, line').


The third parameter is optional, and identifies the partitioning column(s). If we were looking at an employee history, this would be the employee number.


A final optional parameter allows you to limit the query to a specific value for the identifier column.


Because it uses dynamic SQL, and I've opted for flexibility over restriction, there are obvious injection possibilities. It would be vulnerable if some-one specified a function name as part of the target [eg '(select bad_func from dual)'] or as the ordering column. It is invoker rights so it won't directly allow access to anything the user can't otherwise see or do, but if it gets called through a wrapper then it becomes dangerous. As a pipelined function it can only be called through SQL, so any data changes or DDL would have to be wrapped in an autonomous transaction. But that doesn't make it bullet-proof though.

clear screen

create or replace function tab_diff
(p_tab_name in varchar2, p_ord_col in varchar2, p_id_col in varchar2 default null, p_id_val in varchar2 default null)
return tab_char_4000 AUTHID CURRENT_USER pipelined is
v_line varchar2(2000);
v_col_cnt INTEGER;
v_ind NUMBER;
rec_tab dbms_sql.desc_tab;
v_tab dbms_sql.varchar2a;
v_temp VARCHAR2(32000);
v_cursor NUMBER;
v_clause VARCHAR2(200);
begin
--
-- Basic parameter validation
--
if p_id_val is not null and p_id_col is null then
raise_application_error(-20001,'Cannot specify an id value without an associated column');
elsif p_tab_name is null or p_ord_col is null then
raise_application_error(-20001,'Table or query target and order column(s) must be specified');
end if;
--
-- Initial values
--
v_ind := 1;
IF p_id_col IS NOT NULL THEN
v_tab(v_ind) := 'select '||p_id_col||','||p_ord_col;
ELSE
v_tab(v_ind) := 'select '||p_ord_col;
END IF;
v_temp := 'select * from '||p_tab_name;
IF p_id_col IS NULL THEN
v_clause := ' over (order by '||p_ord_col||')';
ELSE
v_clause := ' over (partition by '||p_id_col||' order by '||p_ord_col||')';
END IF;
--
-- Identify the columns in the target and build the new query
--
v_cursor := dbms_sql.open_cursor;
dbms_sql.parse(v_cursor, v_temp, dbms_sql.native);
dbms_sql.describe_columns(v_cursor, v_col_cnt, rec_tab);
--
FOR v_pos in 1..rec_tab.LAST LOOP
v_line := rec_tab(v_pos).col_name;
IF v_line in (p_id_col, p_ord_col) THEN
null;
ELSE
-- When the new value doesn't match the 'lag' value, return the "new:old"
v_line := ', case when '||v_line|| ' = lag('||v_line||') '||v_clause||' then null '||
' else '||v_line||'||'':''||lag('||v_line||') '||v_clause||' end '||
v_line;
v_ind := v_ind + 1;
v_tab(v_ind) := v_line;
END IF;
END LOOP;
-- Add the FROM, WHERE and ORDER BY to the new query
v_ind := v_ind + 1;
IF p_id_val is null then
IF p_id_col is null then
v_tab(v_ind) := ' from '||p_tab_name||' order by '||p_ord_col;
ELSE
v_tab(v_ind) := ' from '||p_tab_name||' order by '||p_id_col||','||p_ord_col;
END IF;
ELSE
v_tab(v_ind) := ' from '||p_tab_name||' where '||p_id_col||' = '''||p_id_val||''' order by '||p_ord_col;
END IF;
--
-- parse, describe and execute the new query
--
BEGIN
dbms_sql.parse(v_cursor, v_tab, v_tab.first, v_tab.last, true, dbms_sql.native);
EXCEPTION
WHEN OTHERS THEN
FOR i in 1..v_tab.count LOOP
dbms_output.put_line('ERR:'||v_tab(i));
END LOOP;
RAISE;
END;
dbms_sql.describe_columns( v_cursor, v_col_cnt, rec_tab);
--
FOR i IN 1 .. rec_tab.COUNT LOOP
dbms_sql.define_column( v_cursor, i, v_line, 2000);
END LOOP;
v_ind := dbms_sql.execute( v_cursor );
--
-- Fetch each row from the result set
--
LOOP
v_ind := DBMS_SQL.FETCH_ROWS( v_cursor );
EXIT WHEN v_ind = 0;
pipe row( '===================================================================================================');
--
-- Go through each column and if the new:old value for the column is not null then display it
--
FOR v_col_seq IN 1 .. rec_tab.COUNT LOOP
-- Get the value
dbms_sql.column_value( v_cursor, v_col_seq, v_line );
IF NVL(v_line,':') != ':' THEN
pipe row( rpad(rec_tab(v_col_seq).col_name,35)||'>'||v_line);
ELSIF rec_tab(v_col_seq).col_name in (p_id_col, p_ord_col) THEN
--Also show the identifier and order columns
pipe row( rpad(rec_tab(v_col_seq).col_name,35)||'>'||v_line);
END IF;
END LOOP;
END LOOP;
return;
end tab_diff;
/

show errors function tab_diff

select * from table(tab_diff('TABLE_1','BATCH_ID','TAB_ID','TAB_ID_VAL'));

select * from table(tab_diff(
'(select * from TABLE_1 WHERE TAB_ID=1234)','BATCH_ID'));

No comments: