Tuesday, September 13, 2005

Views, DB Links and the CBO

Two exceptions to some general conceptions about Oracle's CBO, both related to DB Links.
Firstly, that the order of the tables in the FROM clause is no longer relevant.
Secondly, that selecting from a view and from the underlying query are the same thing.

When a query involves objects held in a local database and on a remote database over the database link, Query Optimization has another level of complexity. Simply put, neither database has all the information, and so there is more guesswork involved.

Here's the setup :

drop table lots_of_people;
drop table lookup;

create table lookup (lookup_code varchar2(2) constraint l_pk primary key, lookup_desc varchar2(30));

create table lots_of_people
(lp_id number constraint p_pk primary key, surname varchar2(30), first_name varchar2(30),
gender varchar2(2) constraint lk_fk references lookup);

insert into lookup values ('M','MALE');
insert into lookup values ('F','FEMALE');

insert into lots_of_people (lp_id, surname, first_name, gender)
select rownum, table_name, column_name, decode(mod(rownum,1),1,'M','F')
from dba_tab_columns;

create index lp_ix on lots_of_people (surname, first_name);

BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,tabname => 'LOOKUP',estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY', CASCADE=>TRUE);
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,tabname => 'LOTS_OF_PEOPLE',estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY', CASCADE=>TRUE);
END;
.
/

select table_name, num_rows from user_tables
where table_name in ('LOOKUP','LOTS_OF_PEOPLE')
/

TABLE_NAME NUM_ROWS
------------------------------ ----------
LOOKUP 2
LOTS_OF_PEOPLE 82522

create view lots_of_people_vw1
as select lp.lp_id, lp.surname, lp.first_name, l.lookup_desc gender_desc
from lots_of_people lp, lookup l
where l.lookup_code = lp.gender;

create view lots_of_people_vw2
as select lp.lp_id, lp.surname, lp.first_name, l.lookup_desc gender_desc
from lookup l, lots_of_people lp
where l.lookup_code = lp.gender;

create database link loop connect to **USER** identified by **PASSWORD** using '**ORIGDB**';

Note: The two views differ only in the order of the tables in the FROM clause.
I used a database link back to the same user and database so I didn't need to involve either two users or two databases.

Now we'll try a query that can be fully resolved at the remote database. When no local objects are involved, the local database will throw the problem over to the remote database.

select * from lots_of_people_vw1@loop
where surname = 'PLAN_TABLE';

EXPLAIN PLAN SET STATEMENT_ID = 'TEST1' INTO PLAN_TABLE FOR
select * from lots_of_people_vw1@loop
where surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST1') );

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | | 61 | 2745 | 6 (0)| |
|* 1 | HASH JOIN | | 61 | 2745 | 6 (0)| |
| 2 | TABLE ACCESS BY INDEX ROWID| LOTS_OF_PEOPLE | 61 | 2257 | 3 (0)| ORIGDB |
|* 3 | INDEX RANGE SCAN | LP_IX | 61 | | 3 (0)| ORIGDB |
| 4 | TABLE ACCESS FULL | LOOKUP | 2 | 16 | 2 (0)| ORIGDB |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("L"."LOOKUP_CODE"="LP"."GENDER")
3 - access("LP"."SURNAME"='PLAN_TABLE')

Note: fully remote operation
That gives a sensible result.
Now try it using both a local and a remote table which, without a DRIVING_SITE hint will generally be resolved locally. I've used dual as the local table. From the point of view of the demonstration, the part of the plan(s) relating to dual can be ignored.

EXPLAIN PLAN SET STATEMENT_ID = 'TEST2' INTO PLAN_TABLE FOR
select * from lots_of_people_vw1@loop, dual d
where surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST2') );

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Inst |IN-OUT|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 163 | 1630 | 10 | | |
| 1 | MERGE JOIN CARTESIAN| | 163 | 1630 | 10 | | |
| 2 | REMOTE | | 1 | 8 | 2 | LOOP | R->S |
| 3 | BUFFER SORT | | 8168 | 16336 | 8 | | |
| 4 | TABLE ACCESS FULL | DUAL | 8168 | 16336 | 8 | | |
------------------------------------------------------------------------------------

Note: cpu costing is off

Here it thinks only 1 row will typically be returned from the remote operation.
Now using the other view (which differed only in the table order in the FROM clause).

EXPLAIN PLAN SET STATEMENT_ID = 'TEST3' INTO PLAN_TABLE FOR
select * from lots_of_people_vw2@loop, dual d
where surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST3') );

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Inst |IN-OUT|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6740K| 250M| 6632 | | |
| 1 | MERGE JOIN CARTESIAN| | 6740K| 250M| 6632 | | |
| 2 | REMOTE | | 825 | 30525 | 32 | LOOP | R->S |
| 3 | BUFFER SORT | | 8168 | 16336 | 6600 | | |
| 4 | TABLE ACCESS FULL | DUAL | 8168 | 16336 | 8 | | |
------------------------------------------------------------------------------------

Note: cpu costing is off

Now it has decided on 825 rows (or 1% of the total number of rows in the LOTS_OF_PEOPLE table).

Finally, with the driving site hint.
EXPLAIN PLAN SET STATEMENT_ID = 'TEST4' INTO PLAN_TABLE FOR
select /*+DRIVING_SITE (l) */ * from lots_of_people_vw2@loop l, dual d
where l.surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST4') );

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | | 500K| 22M| 22 (0)| |
|* 1 | HASH JOIN | | 500K| 22M| 22 (0)| |
| 2 | TABLE ACCESS BY INDEX ROWID| LOTS_OF_PEOPLE | 61 | 2257 | 3 (0)| ORIGDB |
|* 3 | INDEX RANGE SCAN | LP_IX | 61 | | 3 (0)| ORIGDB |
| 4 | MERGE JOIN CARTESIAN | | 16336 | 159K| 18 (0)| |
| 5 | TABLE ACCESS FULL | LOOKUP | 2 | 16 | 2 (0)| ORIGDB |
| 6 | BUFFER SORT | | 8168 | 16336 | 16 (0)| |
| 7 | REMOTE | | 8168 | 16336 | | ! |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("L"."LOOKUP_CODE"="LP"."GENDER")
3 - access("LP"."SURNAME"='PLAN_TABLE')

Note: fully remote operation
Back to a sensible plan.

Finally, using the query itself rather than a view :

EXPLAIN PLAN SET STATEMENT_ID = 'TEST5' INTO PLAN_TABLE FOR
select *
from (select lp.lp_id, lp.surname, lp.first_name, l.lookup_desc gender_desc
from lots_of_people@loop lp, lookup@loop l
where l.lookup_code = lp.gender) l, dual d
where l.surname = 'PLAN_TABLE';

SELECT * FROM TABLE(dbms_xplan.display('PLAN_TABLE','TEST5') );
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 249K| 11M| 21 (0)| | |
|* 1 | HASH JOIN | | 249K| 11M| 21 (0)| | |
| 2 | REMOTE | | 15 | 555 | | LOOP | R->S |
| 3 | MERGE JOIN CARTESIAN| | 16336 | 159K| 18 (0)| | |
| 4 | REMOTE | | 2 | 16 | | LOOP | R->S |
| 5 | BUFFER SORT | | 8168 | 16336 | 16 (0)| | |
| 6 | TABLE ACCESS FULL | DUAL | 8168 | 16336 | 8 (0)| | |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("L"."LOOKUP_CODE"="LP"."GENDER")

[Looking at the 'OTHER' column in the plan table shows that ID 2 relates to the LOTS_OF_PEOPLE component and ID 4 is the LOOKUP component].

In summary, when presented with a object in a remote database, the local database doesn't have much idea about it. With a remote table, the remote database will supply some information on the number of rows and indexes. When it comes down to views, the remote database gives a 'guess', which appears to be the number of rows in the last table mentioned in the FROM clause.

In my next post, I'll look at the 10053 plan for remote views and discuss the repurcussions for index usage.

No comments: