Wednesday, September 28, 2005

J Lewis 'Cost Based Oracle' Apress link

Apress have put up some details on Jonathan Lewis's book here.
Priced at $50 (US, I guess), which is the same as Tom Kyte's.
It comes in at 400 pages. That's less than Tom's (750+ pages), so maybe I'll be able to carry it around for reading on the train without my bag strap leaving a mark :)

Apart from that, there's no formal release date, and no information that wasn't already on Jonathan's site.

Tuesday, September 27, 2005

AskTom changes on a (distant?) horizon

Looks like there'll be some changes to AskTom coming sometime.

"I'm rewriting the site :) will be locking threads after 4 weeks (will allow for emergency bug fixes to the content to be alerted to me of course - so they won't be locked totally, you'll still be able to comment on them but if the comment isn't relevant to the existing text - a question about how it worked or pointing out a mistake - it'll never see the light of day)"

I think it will be an improvement. At the moment there is the conundrum of not wanting to miss anything, but also not wanting to plough through text from several years ago. Plus, topic drift makes searching more difficult: you can get lots of results where the titles don't seem to match what you are looking for, but you still need to read the article to be certain.

The trick for Tom will be managing late comments that are 'sort-of' relevant. Maybe some will make it as distinct questions in their own right. To avoid that becoming a route for bypassing the "I'm not accepting questions now" barrier, he could simply hold off posting those as new questions for a week or two, so that people get the message that it isn't a 'fast track' for answers.

Tuesday, September 20, 2005

Locating an error in a query run using Dynamic SQL

Well, not everyone has managed to get to San Francisco. This morning I was debugging some of my code, which was falling over trying to run a complex query through a ref cursor. Unfortunately, in 9iR2 at least, Oracle's error message was a terse "Invalid argument" and didn't tell me the location of the problem.

So I threw together a quick procedure that uses the LAST_ERROR_POSITION function in DBMS_SQL to help me out. As you can see, it's easy enough to plug into any routine: you just pass in the query and it will parse it, locate the position of the error and let you know. Obviously this example is trivial, but with a SQL statement a couple of dozen lines long it is pretty handy.

declare
  v_query   varchar2(2000);
  rc_dummy  sys_refcursor;

  PROCEDURE p_dynamic_sql_err_pos (p_query IN VARCHAR2) IS
    v_cursor NUMBER;
    v_temp   VARCHAR2(32000) := p_query;
    v_tab    DBMS_SQL.VARCHAR2S;
    v_ind    NUMBER := 1;
  BEGIN
    -- Pass in a query and it will be parsed and the location
    -- of any error will be reported
    v_cursor := dbms_sql.open_cursor;
    IF length(v_temp) > 2000 THEN
      -- Longer statements are parsed from a VARCHAR2S table,
      -- so break the text into 250-character chunks
      WHILE nvl(length(v_temp),0) > 0 LOOP
        v_tab(v_ind) := substr(v_temp,1,250);
        v_temp := substr(v_temp,251);
        v_ind := v_ind + 1;
      END LOOP;
      dbms_sql.parse(v_cursor, v_tab, 1, v_ind - 1, FALSE, 1);
    ELSE
      dbms_sql.parse(v_cursor, v_temp, 1);
    END IF;
    dbms_sql.close_cursor(v_cursor);
  EXCEPTION
    WHEN OTHERS THEN
      -- Grab the error position first, then tidy up the cursor
      v_ind := dbms_sql.last_error_position;
      v_temp := substr(sqlerrm,1,100)||' before '||
                substr(p_query,v_ind,60)||' ('||v_ind||') ';
      IF dbms_sql.is_open(v_cursor) THEN
        dbms_sql.close_cursor(v_cursor);
      END IF;
      raise_application_error(-20001,v_temp);
  END;
begin
  v_query := 'select decode(1) from dual';
  begin
    open rc_dummy for v_query;
  exception
    when others then
      p_dynamic_sql_err_pos(v_query);
  end;
  close rc_dummy;
end;

Friday, September 16, 2005

Tracking my blog's "customers"

Peter Scott's blog recently looked at problems with identification of customers, and the first response remarked on how much harder it is with internet 'click' customers as opposed to 'real' people walking about. I'm going to talk about what I know, or think I know, about my blog's customers.

A blog's customers are its readers and despite (or because of) being new, I've kept an eye on my site stats to try to get a feel for my customers. Since I left my SiteMeter public, you can see them too, just by clicking on the link to SiteMeter at the bottom of the blog or here [go on, have a look at what I can tell about you :) ].

My item on Jonathan Lewis's book got triple the traffic of any other page, though I don't think he should get too excited since we are still talking double figures here. But for me, trying to understand my readers, it is interesting. Most of my visitors, where I can tell, have come from orablogs or a similar blog aggregator. Although there were links about the book on AskTom's front page and Jonathan's own page, neither featured in an RSS feed, so it was probably news to most readers. Of course, I've got no idea how many readers OraBlogs have, so I am probably judging by too small a sample.

Since my previous internet activities were in forums, my blog extended from there, being mostly technical and resembling answers to questions which hadn't actually been asked. While I'll still do that, the 'Jonathan Lewis' spike gave me a feel for my 'customers', suggesting that blogging, and being syndicated by OraBlogs, is more like having a chat in the pub (or coffee shop) with your workmates.

The Oracle WTF [http://oracle-wtf.blogspot.com] definitely has that "You'll never believe what I saw today..." feel, and other blogs have newsy items such as what has been released, who is going where and so forth. I'm therefore going to try to throw some more non-technical talking points out and see what happens.


PS.
Judging by my SiteMeter data, especially with my relatively small traffic figures, it's feasible to track down details of readers, particularly those who leave comments. I wanted a counter, at least to pick up hit numbers, and was a bit surprised by the amount of information it recorded, especially the referrer details. I think I can turn the referrer-details bit off and/or remove public access to my site traffic. However, I don't want to get into some of the heated discussions that can go on, and I don't expect people to post anything that they wouldn't say 'aloud'. Since the information is recorded anyway, I figured I might as well let people see what is collected about them. My own 'Freedom of Information' policy, if you like.

I'm open to persuasion, though ironically you'd need to leave a comment here to voice your argument. Obviously you can use one of the anonymiser sites, and I think the referrer bit can be blocked by disabling Javascript and/or following the instructions here for Mozilla/Firefox.

Wednesday, September 14, 2005

Jonathan Lewis book Cost Based Oracle due Nov

This looks like a book worth having if you really need to understand how to deal with Oracle's CBO.
Nothing on Amazon or the Apress website yet, so I don't know the price. Unfortunately, there isn't an indication of size either. The optimizer is one of those subjects which could, if demonstrations and detailed analysis are shown, easily run to a thousand pages. While that information is valuable, it also puts up the cost and hence the price, which will limit its affordability. Hopefully any such detail has been consigned to the zip file of scripts mentioned.

The technical reviewers were Wolfgang Breitling and Christian Antognini. While I hadn't previously encountered any of Christian's work, I followed the links and found one of his presentations and it's worth a read. I've referred to Wolfgang's work in my post on 10053 and judging by previous papers, I'd be confident that anything he had technically reviewed would be sound.

I do like the teaser in the preface: "The preface exists to tell you what's not in Volume 1."

Views, DB Links, the CBO and indexes

This builds on my previous post, and I'll use the same LOTS_OF_PEOPLE table for the demonstration.

Firstly, I'll create a view that does nothing but rearrange the columns in a table. A single-table view might be used where you want a group of users to see only some of the columns in a table, or where you want to include a pseudo-column derived from one or more of the other columns.

create view lp_view as select first_name, surname, lp_id, gender from lots_of_people;
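
For the derived pseudo-column case, a single-table view might add a column built from the others. The sketch below is purely illustrative - LP_NAMES_VIEW isn't used in what follows:

create view lp_names_view as
select lp_id, initcap(first_name)||' '||initcap(surname) full_name, gender
from lots_of_people;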
Again, I'll use a DB Link and the local DUAL table to ensure the query is split between the local and remote databases.


EXPLAIN PLAN SET STATEMENT_ID = 'TEST1' INTO PLAN_TABLE FOR
select lp_id from lots_of_people@loop, (select 1 from dual where rownum = 1) d
where lp_id = 5;

select level, lpad('_',1*(level-1),'_')||operation||'('||options||')' operation,
object_owner||'.'||object_name obj_name,
ltrim(to_char(cardinality,'999,999,999')) no_rows, object_node
link, id, parent_id, other, filter_predicates
from PLAN_TABLE
start with id=0 and statement_id = 'TEST1'
connect by prior id = parent_id and prior statement_id = statement_id
and prior timestamp <= timestamp order by id, position

LEVEL OPERATION OBJ_NAME NO_ROWS LINK ID PARENT_ID
----- ---------------------- --------- ------- ---- ---- ----------
OTHER FILTER_PRE
---------------------------------------------------------------------- ----------
1 SELECT STATEMENT() . 1 0
2 _NESTED LOOPS() . 1 1 0
3 __REMOTE() . 1 LOOP 2 1
SELECT "LP_ID" FROM "LOTS_OF_PEOPLE" "LOTS_OF_PEOPLE" WHERE "LP_ID"=5
3 __VIEW() USERXX. 1 3 1
4 ___COUNT(STOPKEY) . 4 3
ROWNUM=1
5 ____TABLE ACCESS(FULL) SYS.DUAL 8,168 5 4

I've queried the plan table directly to show the OTHER column, containing the query sent to the remote database, which isn't shown by DBMS_XPLAN in 9.2.0.4.
You can see that the optimizer expects one row to be returned by the query on LP_ID. That indicates that it knows about the unique index on that column.
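
If you want to check what the remote data dictionary says about that index, you can query it over the link. This is just a sanity check of the remote statistics, rather than a peek at what the optimizer itself does internally:

select index_name, uniqueness, distinct_keys
from user_indexes@loop
where table_name = 'LOTS_OF_PEOPLE';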

In contrast, here is a similar query using the remote view.

EXPLAIN PLAN SET STATEMENT_ID = 'TEST2' INTO PLAN_TABLE FOR
select lp_id from lp_view@loop, (select 1 from dual where rownum = 1) d
where lp_id = 5;

select level, lpad('_',1*(level-1),'_')||operation||'('||options||')' operation,
object_owner||'.'||object_name obj_name,
ltrim(to_char(cardinality,'999,999,999')) no_rows, object_node
link, id, parent_id, other, filter_predicates
from PLAN_TABLE
start with id=0 and statement_id = 'TEST2'
connect by prior id = parent_id and prior statement_id = statement_id
and prior timestamp <= timestamp order by id, position;

LEVEL OPERATION OBJ_NAME NO_ROWS LINK ID PARENT_ID
----- ------------------------- --------- ------- ---- ---- ----------
OTHER FILTER_PRE
------------------------------------------------------ ----------
1 SELECT STATEMENT() . 825 0
2 _MERGE JOIN(CARTESIAN) . 825 1 0
3 __VIEW() USERXX. 1 2 1
4 ___COUNT(STOPKEY) . 3 2
ROWNUM=1
5 ____TABLE ACCESS(FULL) SYS.DUAL 8,168 4 3
3 __BUFFER(SORT) . 825 5 1
4 ___REMOTE() . 825 LOOP 6 5
SELECT "LP_ID" FROM "LP_VIEW" "LP_VIEW" WHERE "LP_ID"=5
Here it expects 825 rows, or 1% of the table, to be returned by the remote query. The index on the underlying table doesn't show up against the view, so its information is unavailable.
Both queries in the 'OTHER' column will still be passed to the remote database for parsing, and there the view will be resolved against the table and the index can be used. However, the fact that the local database doesn't know about the index can mean that the best overall plan is missed.
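
As in the previous post, a DRIVING_SITE hint is one way to hand the decision to the site that can actually see the index. The sketch below is only illustrative (I haven't shown its plan, and the TEST3 statement id and V alias are made up for the example):

EXPLAIN PLAN SET STATEMENT_ID = 'TEST3' INTO PLAN_TABLE FOR
select /*+ DRIVING_SITE (v) */ lp_id
from lp_view@loop v, (select 1 from dual where rownum = 1) d
where lp_id = 5;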

Finally, I'll include an extract of the 10053 dump file which, for the table, shows the LP_ID column with no nulls and the same number of distinct values as the table's cardinality, plus an index on the column. The view shows no such information.

***********************
Table stats Table: LOTS_OF_PEOPLE Alias: LOTS_OF_PEOPLE
TOTAL :: CDN: 82522 NBLKS: 496 AVG_ROW_LEN: 37
Column: LP_ID Col#: 1 Table: LOTS_OF_PEOPLE Alias: LOTS_OF_PEOPLE
NDV: 82522 NULLS: 0 DENS: 1.2118e-05 LO: 1 HI: 82522
NO HISTOGRAM: #BKT: 1 #VAL: 2
-- Index stats
INDEX NAME: 0 COL#: 2 3
TOTAL :: LVLS: 2 #LB: 485 #DK: 78007 LB/K: 1 DB/K: 1 CLUF: 14253
INDEX NAME: 0 COL#: 1
TOTAL :: LVLS: 1 #LB: 154 #DK: 82522 LB/K: 1 DB/K: 1 CLUF: 489
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
TABLE: LOTS_OF_PEOPLE ORIG CDN: 82522 ROUNDED CDN: 1 CMPTD CDN: 1
Access path: tsc Resc: 32 Resp: 32
Access path: index (iff)
Index: 0
TABLE: LOTS_OF_PEOPLE
RSC_CPU: 0 RSC_IO: 11
IX_SEL: 0.0000e+00 TB_SEL: 1.0000e+00
Access path: iff Resc: 11 Resp: 11
Access path: index (unique)
Index: 0
TABLE: LOTS_OF_PEOPLE
RSC_CPU: 0 RSC_IO: 1
IX_SEL: 1.0000e+00 TB_SEL: 1.0000e+00
Access path: index (eq-unique)
Index: 0
TABLE: LOTS_OF_PEOPLE
RSC_CPU: 0 RSC_IO: 1
IX_SEL: 0.0000e+00 TB_SEL: 0.0000e+00
One row CDN: 1
BEST_CST: 2.00 PATH: 3 Degree: 1
BEST_CST: 8.00 PATH: 2 Degree: 1

***********************
Table stats Table: LP_VIEW Alias: LP_VIEW
TOTAL :: CDN: 82522 NBLKS: 496 AVG_ROW_LEN: 37
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column: LP_ID Col#: 3 Table: LP_VIEW Alias: LP_VIEW
NO STATISTICS (using defaults)
NDV: 2579 NULLS: 0 DENS: 3.8778e-04
NO HISTOGRAM: #BKT: 0 #VAL: 0
TABLE: LP_VIEW ORIG CDN: 82522 ROUNDED CDN: 825 CMPTD CDN: 825
Access path: tsc Resc: 32 Resp: 32
BEST_CST: 32.00 PATH: 2 Degree: 1
BEST_CST: 8.00 PATH: 2 Degree: 1


Tuesday, September 13, 2005

Views, DB Links and the CBO

Here are two exceptions to some common assumptions about Oracle's CBO, both related to DB Links.
Firstly, that the order of the tables in the FROM clause is no longer relevant.
Secondly, that selecting from a view and from the underlying query are the same thing.

When a query involves objects held in a local database and in a remote database over a database link, query optimization takes on another level of complexity. Simply put, neither database has all the information, and so there is more guesswork involved.

Here's the setup :

drop table lots_of_people;
drop table lookup;

create table lookup (lookup_code varchar2(2) constraint l_pk primary key, lookup_desc varchar2(30));

create table lots_of_people
(lp_id number constraint p_pk primary key, surname varchar2(30), first_name varchar2(30),
gender varchar2(2) constraint lk_fk references lookup);

insert into lookup values ('M','MALE');
insert into lookup values ('F','FEMALE');

insert into lots_of_people (lp_id, surname, first_name, gender)
select rownum, table_name, column_name, decode(mod(rownum,2),1,'M','F')
from dba_tab_columns;

create index lp_ix on lots_of_people (surname, first_name);

BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,tabname => 'LOOKUP',estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY', CASCADE=>TRUE);
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,tabname => 'LOTS_OF_PEOPLE',estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY', CASCADE=>TRUE);
END;
.
/

select table_name, num_rows from user_tables
where table_name in ('LOOKUP','LOTS_OF_PEOPLE')
/

TABLE_NAME NUM_ROWS
------------------------------ ----------
LOOKUP 2
LOTS_OF_PEOPLE 82522

create view lots_of_people_vw1
as select lp.lp_id, lp.surname, lp.first_name, l.lookup_desc gender_desc
from lots_of_people lp, lookup l
where l.lookup_code = lp.gender;

create view lots_of_people_vw2
as select lp.lp_id, lp.surname, lp.first_name, l.lookup_desc gender_desc
from lookup l, lots_of_people lp
where l.lookup_code = lp.gender;

create database link loop connect to **USER** identified by **PASSWORD** using '**ORIGDB**';

Note: The two views differ only in the order of the tables in the FROM clause.
I used a database link back to the same user and database so I didn't need to involve either two users or two databases.

Now we'll try a query that can be fully resolved at the remote database. When no local objects are involved, the local database will throw the problem over to the remote database.

select * from lots_of_people_vw1@loop
where surname = 'PLAN_TABLE';

EXPLAIN PLAN SET STATEMENT_ID = 'TEST1' INTO PLAN_TABLE FOR
select * from lots_of_people_vw1@loop
where surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST1') );

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | | 61 | 2745 | 6 (0)| |
|* 1 | HASH JOIN | | 61 | 2745 | 6 (0)| |
| 2 | TABLE ACCESS BY INDEX ROWID| LOTS_OF_PEOPLE | 61 | 2257 | 3 (0)| ORIGDB |
|* 3 | INDEX RANGE SCAN | LP_IX | 61 | | 3 (0)| ORIGDB |
| 4 | TABLE ACCESS FULL | LOOKUP | 2 | 16 | 2 (0)| ORIGDB |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("L"."LOOKUP_CODE"="LP"."GENDER")
3 - access("LP"."SURNAME"='PLAN_TABLE')

Note: fully remote operation
That gives a sensible result.
Now try it using both a local and a remote table which, without a DRIVING_SITE hint, will generally be resolved locally. I've used DUAL as the local table; from the point of view of the demonstration, the part of the plan(s) relating to DUAL can be ignored.

EXPLAIN PLAN SET STATEMENT_ID = 'TEST2' INTO PLAN_TABLE FOR
select * from lots_of_people_vw1@loop, dual d
where surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST2') );

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Inst |IN-OUT|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 163 | 1630 | 10 | | |
| 1 | MERGE JOIN CARTESIAN| | 163 | 1630 | 10 | | |
| 2 | REMOTE | | 1 | 8 | 2 | LOOP | R->S |
| 3 | BUFFER SORT | | 8168 | 16336 | 8 | | |
| 4 | TABLE ACCESS FULL | DUAL | 8168 | 16336 | 8 | | |
------------------------------------------------------------------------------------

Note: cpu costing is off

Here it thinks only 1 row will typically be returned from the remote operation.
Now using the other view (which differed only in the table order in the FROM clause).

EXPLAIN PLAN SET STATEMENT_ID = 'TEST3' INTO PLAN_TABLE FOR
select * from lots_of_people_vw2@loop, dual d
where surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST3') );

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Inst |IN-OUT|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6740K| 250M| 6632 | | |
| 1 | MERGE JOIN CARTESIAN| | 6740K| 250M| 6632 | | |
| 2 | REMOTE | | 825 | 30525 | 32 | LOOP | R->S |
| 3 | BUFFER SORT | | 8168 | 16336 | 6600 | | |
| 4 | TABLE ACCESS FULL | DUAL | 8168 | 16336 | 8 | | |
------------------------------------------------------------------------------------

Note: cpu costing is off

Now it has decided on 825 rows (or 1% of the total number of rows in the LOTS_OF_PEOPLE table).

Next, with the DRIVING_SITE hint.

EXPLAIN PLAN SET STATEMENT_ID = 'TEST4' INTO PLAN_TABLE FOR
select /*+DRIVING_SITE (l) */ * from lots_of_people_vw2@loop l, dual d
where l.surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST4') );

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | | 500K| 22M| 22 (0)| |
|* 1 | HASH JOIN | | 500K| 22M| 22 (0)| |
| 2 | TABLE ACCESS BY INDEX ROWID| LOTS_OF_PEOPLE | 61 | 2257 | 3 (0)| ORIGDB |
|* 3 | INDEX RANGE SCAN | LP_IX | 61 | | 3 (0)| ORIGDB |
| 4 | MERGE JOIN CARTESIAN | | 16336 | 159K| 18 (0)| |
| 5 | TABLE ACCESS FULL | LOOKUP | 2 | 16 | 2 (0)| ORIGDB |
| 6 | BUFFER SORT | | 8168 | 16336 | 16 (0)| |
| 7 | REMOTE | | 8168 | 16336 | | ! |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("L"."LOOKUP_CODE"="LP"."GENDER")
3 - access("LP"."SURNAME"='PLAN_TABLE')

Note: fully remote operation
Back to a sensible plan.

Finally, using the query itself rather than a view:

EXPLAIN PLAN SET STATEMENT_ID = 'TEST5' INTO PLAN_TABLE FOR
select *
from (select lp.lp_id, lp.surname, lp.first_name, l.lookup_desc gender_desc
from lots_of_people@loop lp, lookup@loop l
where l.lookup_code = lp.gender) l, dual d
where l.surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST5') );
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 249K| 11M| 21 (0)| | |
|* 1 | HASH JOIN | | 249K| 11M| 21 (0)| | |
| 2 | REMOTE | | 15 | 555 | | LOOP | R->S |
| 3 | MERGE JOIN CARTESIAN| | 16336 | 159K| 18 (0)| | |
| 4 | REMOTE | | 2 | 16 | | LOOP | R->S |
| 5 | BUFFER SORT | | 8168 | 16336 | 16 (0)| | |
| 6 | TABLE ACCESS FULL | DUAL | 8168 | 16336 | 8 (0)| | |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("L"."LOOKUP_CODE"="LP"."GENDER")

[Looking at the 'OTHER' column in the plan table shows that ID 2 relates to the LOTS_OF_PEOPLE component and ID 4 is the LOOKUP component].

In summary, when presented with an object in a remote database, the local database doesn't have much idea about it. With a remote table, the remote database will supply some information on the number of rows and indexes. When it comes to views, the remote database gives a 'guess', which appears to be based on the number of rows in the last table mentioned in the FROM clause (the 825 rows above being 1% of LOTS_OF_PEOPLE).

In my next post, I'll look at the 10053 trace for remote views and discuss the repercussions for index usage.

Ashes and my home PC woes. Nothing really Oracley

It's already been mentioned here, but being a Pom living in Australia, I have to mention the Ashes victory too. The urn holding the Ashes stays in England whatever happens. Ironically, it made one trip to Australia in 1988, after England last won the trophy in 1986/87, and now that England have won it again it is expected to make another journey for 2006/07.

Meanwhile American readers are wondering how a game can go on for five days and STILL finish without a result. And anyone who has experienced an English summer wonders how the English could come up with a game that actually takes five days of fine weather to play anyway.
---------------------------------------------------------------------------

A few months ago, my home PC got an overhaul. The main bits were a new 80Gb hard drive, more memory and a switch from Windows 98 to XP. All in all it went pretty well (except for a new firewire card which won't talk to my video camera).


Then two weeks ago, the PC got stroppy and said it wouldn't talk to my new hard drive without an 80-wire cable. Apparently the 40-wire cable that I had for my old 4Gb hard drive wasn't good enough any more. A web search indicated that the 80-wire cable, while not mandatory, was faster, and since it was only $5 it seemed a cheap enough fix. Swapped the cable, and the PC seemed fine.


I wanted Linux on the machine for many reasons. One of them was that the install of Oracle 10g on XP didn't go well (possibly because I refused to let java.exe through the firewall, possibly because I try to rush these things in the hour or so between getting the toddler asleep and falling asleep myself). But after hearing of someone who had his internet savings account hacked, and that Windows really needs twenty gazillion variants of spyware checker to be safe, I moved up my plans.


I didn't think far enough ahead when installing XP, so my 80Gb hard drive had two partitions taking up the first 30Gb of the drive. Apparently Linux likes being installed, at least in part, within the first 8.5Gb. So I created a third partition, moved all the Windows stuff from the second partition to the third, and wiped the second one to give Linux a clean slot within the first 8.5Gb and another slot at the end of the drive. I also (finally) moved a load of old pictures and documents from the Windows 98-era 4Gb drive to this new Windows partition, and wiped that old drive.


About then, the PC started throwing another wobbler, this time reporting a Disk Boot Failure.

The XP disks said they couldn't find an XP installation to repair.


So I tried to see if I could trick it and get Linux installed (at least to get access to my files again). Ubuntu went in quite happily and recognised that I also had Windows XP installed - a shame the XP disks couldn't. It also recognised the existing NTFS partitions and was quite happy to create ext3 partitions for Linux in the spaces between.


It was still coming back with Disk Boot Failure, but I gave it a night's rest powered off. Last night I rebooted, the GRUB menu came up, and both Ubuntu and Windows XP are running fine. Hopefully I'll be able to get Oracle installed and running sometime soon.


PS.

Before anyone asks why I want Windows XP, my three-year-old is already practising mouse and keyboard skills with Winnie the Pooh, Bob the Builder and some Matchbox fire truck/police car game.


Thursday, September 08, 2005

Atom/RSS Feeds from blogger.com

I just noticed that, although BLOGGER.COM makes it pretty easy to set up an Atom feed for feed readers, it doesn't automatically include a link to it in the template.
So I've added the link (and another that gets processed through Feedburner). If you are looking for a feed from another blogger.com blog, just try adding /atom.xml to its main URL and see if one exists.
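
For example, a blog at http://somename.blogspot.com (somename being a made-up placeholder) would, if the feed is switched on, have its Atom feed at http://somename.blogspot.com/atom.xml.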

I've just started using FeedReader and it's a lot easier than running through a list of bookmarks to see if anything is new.

Analytics with ORDER BY and DISTINCT STRAGG

A recent request in a help forum required an SQL that would return rows until 5 distinct values of a specific column had been returned.

Using analytics was proving difficult, as a DISTINCT can't be combined with an ORDER BY in the analytic clause. While a non-analytic solution was found, I offered an analytic SQL that used a user-defined aggregate. This is similar to STRAGG, but eliminates duplicates before returning the result string. [The elimination process uses 'associative arrays' - the old INDEX BY tables, but with a VARCHAR2 index. That was new in 9iR2, so this won't work in earlier versions, where you'd need to develop your own sort routine.]


create or replace type str_dist_agg_type as object
(
total varchar2(4000),
static function
ODCIAggregateInitialize(sctx IN OUT str_dist_agg_type )
return number,
member function
ODCIAggregateIterate(self IN OUT str_dist_agg_type ,
value IN varchar2 )
return number,
member function
ODCIAggregateTerminate(self IN str_dist_agg_type,
returnValue OUT varchar2,
flags IN number)
return number,
member function
ODCIAggregateMerge(self IN OUT str_dist_agg_type,
ctx2 IN str_dist_agg_type)
return number,
member function
ODCIAggregateDelete(self IN OUT str_dist_agg_type,
value in varchar2)
return number
)
/

create or replace type body str_dist_agg_type
is
--
static function ODCIAggregateInitialize(sctx IN OUT str_dist_agg_type)
return number
is
begin
sctx := str_dist_agg_type( null );
return ODCIConst.Success;
end;
--
member function ODCIAggregateIterate(self IN OUT str_dist_agg_type,
value IN varchar2 )
return number
is
begin
--Use chr(7) as an internal separator because, since
--it is a pretty useless control character, it is unlikely
--to be in the input data or cause any internal problems.
self.total := self.total || value || chr(7);
return ODCIConst.Success;
end;
--
member function ODCIAggregateTerminate(self IN str_dist_agg_type,
returnValue OUT varchar2,
flags IN number)
return number
is
type tab_dist is table of number index by varchar2(4000);
t_dist tab_dist;
v_point number := 0;
v_str varchar2(4000);
v_sep varchar2(1) := chr(7);
v_ret varchar2(4000);
begin
v_str := rtrim(ltrim(self.total,v_sep),v_sep);
v_point := instr(v_str,v_sep);
while nvl(v_point,0) != 0 loop
t_dist(substr(v_str,1,v_point-1)) := 1;
v_str := ltrim(substr(v_str,v_point),v_sep);
v_point := instr(v_str,v_sep);
end loop;
t_dist(v_str) := 1;
--
v_str := t_dist.first;
while v_str is not null loop
v_ret := v_ret || v_str;
v_str := t_dist.next(v_str);
end loop;
returnValue := v_ret;
return ODCIConst.Success;
end;
--
member function ODCIAggregateDelete(self IN OUT str_dist_agg_type,
value in varchar2)
return number
is
v_point number;
begin
v_point := instr(self.total,value||chr(7));
if v_point > 0 then
self.total := substr(self.total,1,v_point-1)||
substr(self.total, v_point+length(value)+1);
end if;
return ODCIConst.Success;
end;
--
member function ODCIAggregateMerge(self IN OUT str_dist_agg_type,
ctx2 IN str_dist_agg_type)
return number
is
begin
self.total := self.total ||chr(7)||ctx2.total;
return ODCIConst.Success;
end;
end;
/

CREATE or replace
FUNCTION strdagg(input varchar2 )
RETURN varchar2
PARALLEL_ENABLE AGGREGATE USING str_dist_agg_type;

grant execute on strdagg to public;

Note: STRDAGG uses the CHR(7) character internally as a delimiter, so if this appears in the data you are likely to see problems. No 'visible' delimiter is included in the output, so if you want one you'll need to specify it as part of the expression:

select table_name, column_name, strdagg(data_type||',') over
(partition by table_name order by table_name, column_id) data_types
from user_tab_columns
where table_name in ('CHILD','PARENT')

TABLE_NAME COLUMN_NAME DATA_TYPES
---------- --------------- --------------------
CHILD ID NUMBER,
CHILD PARENT_ID NUMBER,
CHILD DESCRIPTION NUMBER,VARCHAR2,
PARENT ID NUMBER,
PARENT DESCRIPTION NUMBER,VARCHAR2,

I've put all the 'tricky' processing in the TERMINATE function so that it can easily be amended (e.g. if you wanted to return the number of duplicates for each value, or to have a case-insensitive sort).

(If you are interested in the original problem and the non-analytical solution, it is in the Quest Pipelines' PL/SQL forum entitled 'Tricky SQL Challenge' in the 'Writing and Tuning SQL' conference.)
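
Going back to the original requirement (return rows until five distinct values of a column have appeared), the aggregate could be used along these lines. This is only a sketch: the table T, with an ordering column ID and the column of interest VAL, is hypothetical, and it relies on the ',' delimiter being supplied in the expression as above.

-- Each ',' in the running STRDAGG result marks one distinct value seen so far,
-- so counting the delimiters gives a cumulative distinct count.
select id, val
from (select t.id, t.val,
             strdagg(t.val||',') over (order by t.id) distinct_so_far
      from t)
where length(distinct_so_far) - nvl(length(replace(distinct_so_far,',','')),0) <= 5
order by id;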

Tuesday, September 06, 2005

Explain Plans and Scalar Subqueries

Explain plans can give a misleading impression for queries which include a scalar subquery.

Here is an example of some different approaches for a query, and their explain plans and actual runs.

drop table parent;
drop table child;
create table parent (id number primary key, description varchar2(80));
create table child (id number primary key,
parent_id number constraint par_fk references parent,
description varchar2(10));

insert into parent
select rownum, table_name||'.'||column_name
from all_tab_columns
where rownum <= 10000;
insert into child select rownum, p.id, m.column_value
from parent p, table(sys.DBMS_DEBUG_VC2COLL('Number 1','Number 2')) m;
commit;
exec dbms_stats.gather_table_stats(user,'PARENT',null,100);
exec dbms_stats.gather_table_stats(user,'CHILD',null,100);
commit;

Now, have a look at various forms of a query to find the best option.
set autotrace off
set pages 0
set lines 120

Firstly, analytics plus a DISTINCT

EXPLAIN PLAN SET STATEMENT_ID = 'T1' INTO PLAN_TABLE FOR
select distinct p.id, count(c.id) over (partition by c.parent_id)
from parent p, child c
where p.id=c.parent_id;

select * from TABLE( dbms_xplan.display);

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 253K| 140 (1)|
| 1 | SORT UNIQUE | | 20000 | 253K| 140 (1)|
| 2 | WINDOW SORT | | 20000 | 253K| 140 (1)|
| 3 | NESTED LOOPS | | 20000 | 253K| 8 (13)|
| 4 | TABLE ACCESS FULL| CHILD | 20000 | 175K| 6 (0)|
|* 5 | INDEX UNIQUE SCAN| SYS_C00109804 | 1 | 4 | |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("P"."ID"="C"."PARENT_ID")

When run it gave

Statistics
-----------------------------------------------------
0 recursive calls
0 db block gets
20070 consistent gets
0 physical reads
0 redo size
124037 bytes sent via SQL*Net to client
4936 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
10000 rows processed


Now with an inline view.

EXPLAIN PLAN SET STATEMENT_ID = 'T1' INTO PLAN_TABLE FOR
select distinct p.id, c.cnt
from parent p,
(select parent_id, count(id) cnt
from child group by parent_id) c
where p.id=c.parent_id;

select * from TABLE( dbms_xplan.display);

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 205K| 82 (2)|
| 1 | SORT UNIQUE | | 10000 | 205K| 82 (2)|
| 2 | NESTED LOOPS | | 10000 | 205K| 38 (3)|
| 3 | VIEW | | 10000 | 166K| |
| 4 | SORT GROUP BY | | 10000 | 40000 | 36 (0)|
| 5 | TABLE ACCESS FULL| CHILD | 20000 | 80000 | 6 (0)|
|* 6 | INDEX UNIQUE SCAN | SYS_C00109804 | 1 | 4 | |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("P"."ID"="C"."PARENT_ID")

Statistics
----------------------------------------------------
5 recursive calls
0 db block gets
10072 consistent gets
0 physical reads
0 redo size
124001 bytes sent via SQL*Net to client
4936 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
10000 rows processed


Now with a Scalar subquery.

EXPLAIN PLAN SET STATEMENT_ID = 'T1' INTO PLAN_TABLE FOR
select p.id,
(select count(c.id) from child c where c.parent_id = p.id)
from parent p;

select * from TABLE( dbms_xplan.display);

-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 40000 | 3 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
|* 2 | TABLE ACCESS FULL | CHILD | 2 | 8 | 6 |
| 3 | INDEX FAST FULL SCAN| SYS_C00109804 | 10000 | 40000 | 3 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C"."PARENT_ID"=:B1)

Statistics
----------------------------------------------------
0 recursive calls
0 db block gets
680689 consistent gets
0 physical reads
0 redo size
124048 bytes sent via SQL*Net to client
4936 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed
The 'inline view' version ran best of all, mostly because it does exactly what we want: it summarises the CHILD table by parent_id, and joins that to PARENT (or, in this case, just the index on PARENT, as it doesn't need the base table).

The 'scalar subquery' version was the worst performing. Although the CHILD table will be accessed ten thousand times (once per PARENT row), the explain plan shows only 2 rows and a cost of 6 for that step. Furthermore, that cost isn't added to the '3' cost for the parent table, leaving a total cost of 3. Even the 'Bytes' figure is based only on the parent table data. From the optimizer's point of view, it can calculate the cost of running the scalar subquery once; but given that the number of times it has to be run (and therefore its total cost) will not alter whichever plan the optimizer chooses, there is no point in trying to factor that cost in anywhere.

The explain plan is relevant only to the query passed in, and explains HOW that query will be evaluated. The role of the developer is to interpret the plan and work out if that is an appropriate way to go about answering the question for which the query was developed.

PS. This was run in 9iR2. If there is any improvement in later Oracle versions, I'd love to know.

Friday, September 02, 2005

Functions as datasources and NO_DATA_FOUND

Quick post today, on a 'gotcha' with NO_DATA_FOUND.

In Oracle9i, you can define a function that returns a collection and simply do a
SELECT * from table(function_name);

So let's try it.


create or replace function f_alpha
return sys.DBMS_DEBUG_VC2COLL is
v_ret sys.DBMS_DEBUG_VC2COLL := sys.DBMS_DEBUG_VC2COLL ();
begin
v_ret.extend(26);
for i in 1..26 loop
v_ret(i) := chr(64+i);
end loop;
return v_ret;
end;
.
/

select * from table(f_alpha);

That nicely gives you the letters of the alphabet.
Now let's try an error.

create or replace function f_alpha
return sys.DBMS_DEBUG_VC2COLL is
v_ret sys.DBMS_DEBUG_VC2COLL := sys.DBMS_DEBUG_VC2COLL ();
begin
v_ret.extend(26);
for i in 1..26 loop
v_ret(i) := to_char(1/(26-i),'0.000');
end loop;
return v_ret;
end;
.
/

select * from table(f_alpha);

In this case we get no data returned, and a 'divide by zero' error.

Let's try it pipelined.

create or replace function f_alpha
return sys.DBMS_DEBUG_VC2COLL pipelined is
begin
for i in 1..26 loop
pipe row(to_char(1/(26-i),'0.000'));
end loop;
return;
end;
.
/

select * from table(f_alpha);

Here we get some rows returned (depending on arraysize) before the error.

Let's try it with a NO_DATA_FOUND error.

create or replace function f_alpha
return sys.DBMS_DEBUG_VC2COLL pipelined is
v_ret varchar2(30);
begin
for i in 1..26 loop
select 'Number '||i into v_ret from dual where i != 20;
pipe row(v_ret);
end loop;
return;
end;
.
/

select * from table(f_alpha);

Interesting. At row 20, NO_DATA_FOUND was raised.
PL/SQL treats it as an exception and the function fails, but the SQL layer treats NO_DATA_FOUND as exactly that - no more data - so the query ends quietly after 19 rows without reporting an error.

The same happens with a NO_DATA_FOUND from accessing an uninitialized entry in a PL/SQL table:

create or replace function f_alpha
return sys.DBMS_DEBUG_VC2COLL pipelined is
type tab_number is table of number index by pls_integer;
t_num tab_number;
begin
for i in 1..18 loop
t_num(i) := i;
end loop;
for i in 1..26 loop
pipe row(t_num(i));
end loop;
return;
end;
.
/

select * from table(f_alpha);

If you are going to use a function as a data source, it is advisable to catch any NO_DATA_FOUND and re-raise it as a different exception. Of course, prior to 10g, this means you'll lose the line where the error actually occurred.

create or replace function f_alpha
return sys.DBMS_DEBUG_VC2COLL pipelined is
type tab_number is table of number index by pls_integer;
t_num tab_number;
begin
for i in 1..18 loop
t_num(i) := i;
end loop;
for i in 1..26 loop
pipe row(t_num(i));
end loop;
return;
exception
when no_data_found then raise_application_error(-20001,'No Data Found in f_alpha',true);
end;
.
/
select * from table(f_alpha);