Monday, February 08, 2010

Append Values and how not to break the database

With the advent of the /*+ APPEND_VALUES */ hint in 11gR2, I suspect we will see the feature misused and a bunch of questions on why it "doesn't work". The documentation states that "direct-path INSERT can be considerably faster than conventional INSERT". What it should also state is that it can be considerably slower, and that, if logging is not enforced at the table, tablespace or database level, it can render backups unusable.
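A quick sanity check you can run yourself (a minimal sketch; TEST_IA is the demonstration table created further down, and FORCE LOGGING at the database level overrides any NOLOGGING attribute):

select table_name, logging
from   user_tables
where  table_name = 'TEST_IA';

select force_logging from v$database;

-- alter database force logging;   -- enforce logging database-wide (needs the appropriate privilege)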

Firstly, direct-path inserts work by inserting data at the end of existing data, above the high-water mark. If you have two conventional inserts throwing data at a table, they can each move the high-water mark as required. Session 1 can move it out by 10 blocks, then session 2 can move it out another 2, then session 1 moves it again. The high-water mark is metadata about the table. It is stored in one place, so there is the potential for contention when multiple sessions want to change it at once. For conventional inserts, as soon as a session has adjusted it, it releases its hold and other sessions can make their own adjustments. It doesn't need to wait on session commits, and contention generally isn't a problem.
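If you want to see where the HWM sits for a segment, DBMS_SPACE.UNUSED_SPACE reports the blocks allocated to a segment and the blocks above the HWM. A minimal sketch (run against TEST_IA from the demonstration below):

set serveroutput on
declare
  l_total_blocks  number;
  l_total_bytes   number;
  l_unused_blocks number;
  l_unused_bytes  number;
  l_file_id       number;
  l_block_id      number;
  l_last_block    number;
begin
  dbms_space.unused_space(
    segment_owner             => user,
    segment_name              => 'TEST_IA',
    segment_type              => 'TABLE',
    total_blocks              => l_total_blocks,
    total_bytes               => l_total_bytes,
    unused_blocks             => l_unused_blocks,
    unused_bytes              => l_unused_bytes,
    last_used_extent_file_id  => l_file_id,
    last_used_extent_block_id => l_block_id,
    last_used_block           => l_last_block);
  -- unused_blocks are the blocks allocated above the high-water mark
  dbms_output.put_line('Allocated blocks: '||l_total_blocks||
                       '  Above HWM: '||l_unused_blocks||
                       '  Below HWM: '||(l_total_blocks - l_unused_blocks));
end;
/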

In a direct-path insert, the insert moves the HWM but cannot release its hold on that information. That's because the data it is writing between the old and new HWM is 'dodgy': it isn't yet committed and it shouldn't be read into the buffer cache. [I suspect it is written to the data file as if it were committed, rather than with lock flags and transaction identifiers; that way it avoids the need for a delayed block cleanout when it is subsequently read.] If another insert (or even an update or merge) on the table needs to move the HWM, it has to wait until the direct-path transaction is committed or rolled back. That could happen with a conventional insert/update/merge, but will ALWAYS happen with another direct-path insert.

Try this in one session
drop table TEST_IA purge;
create table TEST_IA (id number, val varchar2(4000));
insert /*+ APPEND */ into TEST_IA
select rownum rn, 'b' val from dual connect by level <= 1000;

And then this in another
insert /*+ APPEND */ into TEST_IA
select rownum rn, 'b' val from dual connect by level <= 1000;

See what I mean? The second insert just hangs, waiting on a lock, until the first session commits or rolls back.
If you have 11gR2, insert /*+ APPEND_VALUES */ into TEST_IA values (1,'b'); will have the same effect.
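While the second session is stuck, a third session can watch the wait; the blocked session should be sitting on a TM enqueue. A quick look:

select sid, event, blocking_session, seconds_in_wait
from   v$session
where  blocking_session is not null;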

So the first failure pattern I predict will be multiple sessions all trying APPEND_VALUES into the same table and knocking heads together. This would most likely happen when developers have tried to parallelise operations outside the database. Knocking heads together may be the appropriate solution too.

I suspect this will be somewhat mitigated by the second failure pattern. Once you've done a direct-path insert, if you try anything else on that table in the same session you'll get an "ORA-12838: cannot read/modify an object after modifying it in parallel" error. I'd say the error message is somewhat misleading, but a quick Google search will tell them that the solution is to commit after the insert. You can split coders into two groups: those who understand the concept of a transaction and those who don't, and I think the number of the latter is increasing. Even where it is okay to commit, you could still rack up log file sync waits.
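A minimal demonstration, all in one session:

insert /*+ APPEND */ into TEST_IA
select rownum rn, 'b' val from dual connect by level <= 10;

select count(*) from TEST_IA;
-- fails with ORA-12838

commit;

select count(*) from TEST_IA;
-- fine once the transaction is committed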

The final failure pattern I predict will be those who think "Ah, I can do inserts without generating log data. That should be faster." The problem is that the metadata changes, moving the HWM, are logged, and it is only the creation of content data that might be unlogged. In the following script, I compare several approaches to inserting a single record in a loop. When compared with a conventional-path insert, it is apparent that a lot more redo is generated for the single-row direct-path insert with a small record size. When I used a larger record size (padding the value to several thousand characters) the redo size was comparable, but there were still fewer redo entries for the conventional-path inserts. I do concede logging is not the only performance factor, and performance may still improve due to bypassing the buffer cache, not needing to locate free space in the table, and so on.

drop table TEST_ROW_IAV purge;
drop table TEST_ROW_IA purge;
drop table TEST_ROW_IV purge;
drop table TEST_ROW_I purge;

create table TEST_ROW_IAV (id number, val varchar2(4000));
create table TEST_ROW_IA  (id number, val varchar2(4000));
create table TEST_ROW_IV  (id number, val varchar2(4000));
create table TEST_ROW_I   (id number, val varchar2(4000));

clear screen
set serveroutput on size unlimited
declare
  cursor c_1 is
    select rownum rn, 'b' val from dual connect by level <= 10000;
  --
  procedure rep_ext (p_seg in varchar2)
  is
    cursor c_e  is
      select rpad(segment_name,20) segment_name, tablespace_name,
             count(extent_id) cnt, sum(round(bytes/1024)) kb, sum(blocks) blocks
      from user_extents
      where segment_name  = p_seg
      group by segment_name, tablespace_name;
  begin
    for c_out in c_e loop
      dbms_output.put_line(to_char(c_out.segment_name)||
        ' '||c_out.tablespace_name||' '||to_char(c_out.cnt,'9990')||
        ' '||to_char(c_out.kb,'999,990.00')||' '||to_char(c_out.blocks,'999,990'));
    end loop;
  end rep_ext;
  --
  procedure rep_redo (p_text in varchar2)
  is
    cursor c_r is
      select    sum(case when name = 'redo entries' then value end) redo_entries,
                sum(case when name = 'redo size' then value end) redo_size
      from v$mystat s join v$statname n on n.statistic# = s.statistic#
      where name in ('redo entries','redo size');
  begin
    dbms_output.put(rpad(p_text,20)||' at '||to_char(sysdate,'hh24:mi:ss')||' ');
    for c_rec in c_r loop
      dbms_output.put_line('Entries:'||to_char(c_rec.redo_entries,'9,999,990')||
                           ' Size:'||to_char(c_rec.redo_size,'999,999,990'));
    end loop;
  end rep_redo;
  --
begin
    commit;
    rep_redo('Start');
    for c_rec in c_1 loop
      insert /*+ APPEND_VALUES */ into TEST_ROW_IAV values (c_rec.rn, c_rec.val);
      commit;
    end loop;
    rep_redo('After /*+ APPEND_VALUES */');
    --
    for c_rec in c_1 loop
      insert /*+ APPEND */ into TEST_ROW_IA select c_rec.rn, c_rec.val from dual;
      commit;
    end loop;
    rep_redo('After /*+ APPEND */');
    --
    for c_rec in c_1 loop
      insert into TEST_ROW_IV values (c_rec.rn, c_rec.val);
      commit;
    end loop;
    rep_redo('After insert values');
    --
    for c_rec in c_1 loop
      insert into TEST_ROW_I select c_rec.rn, c_rec.val from dual;
      commit;
    end loop;
    rep_redo('After insert select');
    --
    rep_ext('TEST_ROW_IAV');
    rep_ext('TEST_ROW_IA');
    rep_ext('TEST_ROW_IV');
    rep_ext('TEST_ROW_I');
    --
end;
/
And my results:


Start                at 14:10:59 Entries:       912 Size:     125,628
After /*+ APPEND_VAL at 14:11:02 Entries:   112,547 Size:  15,995,632
After /*+ APPEND */  at 14:11:08 Entries:   224,184 Size:  31,863,240
After insert values  at 14:11:09 Entries:   234,409 Size:  36,723,128
After insert select  at 14:11:11 Entries:   244,634 Size:  41,422,384
TEST_ROW_IAV         USERS    81   81,920.00   10,240
TEST_ROW_IA          USERS    81   81,920.00   10,240
TEST_ROW_IV          USERS     3      192.00       24
TEST_ROW_I           USERS     3      192.00       24

APPEND_VALUES (and indeed APPEND) is not intended for single-row inserts, but even small arrays will demonstrate similar problems. Ideally you want each insert to leave full blocks, to minimize unused (and probably unusable) space. If you target your array size at a single block, you may find that in practice you get anything from 75% of a block to a block and a bit (leaving more empty space). If, however, each insert fills a hundred blocks, you won't mind that the last one is a bit empty. With this in mind, you probably want to think in terms of arrays of at least thousands of rows, and maybe tens or hundreds of thousands, depending on both row size and block size.
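As a rough sizing sketch (this assumes statistics have been gathered, picks up the block size from the initialisation parameter, and ignores block overhead and PCTFREE; the hundred-block multiplier is just an illustrative target):

select ut.avg_row_len,
       round(p.block_size / nullif(ut.avg_row_len,0))       est_rows_per_block,
       round(p.block_size / nullif(ut.avg_row_len,0)) * 100 array_for_100_blocks
from   user_tables ut
       cross join (select to_number(value) block_size
                   from   v$parameter
                   where  name = 'db_block_size') p
where  ut.table_name = 'TEST_IA';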

One more script, looking at how things should be done. Firstly, the classic insert with the APPEND hint from a select. Secondly, an array of a hundred thousand rows inserted with the APPEND_VALUES hint. Finally, the '10g' way of a direct-path insert from a PL/SQL variable, using SQL types. What you see is that, for this data set, there's not much to choose between the three.

drop table TEST_FORALL_IAV purge;
drop table TEST_IA purge;
drop table TEST_TYPE_IA purge;

drop type type_tab_test_ia;  -- drop the collection type before the object type it depends on
drop type type_test_ia;

create type type_test_ia is object (id number, val varchar2(4000));
/
create type type_tab_test_ia is table of type_test_ia;
/

create table TEST_FORALL_IAV (id number, val varchar2(4000));
create table TEST_IA (id number, val varchar2(4000));
create table TEST_TYPE_IA (id number, val varchar2(4000));

clear screen
set serveroutput on size unlimited
declare
  cursor c_1 is
    select rownum rn, 'b' val from dual connect by level <= 100000;
  TYPE tab_1 is table of c_1%rowtype index by pls_integer;
  t_1    tab_1;
  t_tab    type_tab_test_ia;
  --
  procedure rep_ext (p_seg in varchar2)
  is
    cursor c_e  is
      select rpad(segment_name,20) segment_name, tablespace_name,
             count(extent_id) cnt, sum(round(bytes/1024)) kb, sum(blocks) blocks
      from user_extents
      where segment_name  = p_seg
      group by segment_name, tablespace_name;
  begin
    for c_out in c_e loop
      dbms_output.put_line(to_char(c_out.segment_name)||
        ' '||c_out.tablespace_name||' '||to_char(c_out.cnt,'9990')||
        ' '||to_char(c_out.kb,'999,990.00')||' '||to_char(c_out.blocks,'999,990'));
    end loop;
  end rep_ext;
  --
  procedure rep_redo (p_text in varchar2)
  is
    cursor c_r is
      select sum(case when name = 'redo entries' then value end) redo_entries,
             sum(case when name = 'redo size' then value end) redo_size
      from v$mystat s join v$statname n on n.statistic# = s.statistic#
      where name in ('redo entries','redo size');
  begin
    dbms_output.put(rpad(p_text,20)||' at '||to_char(sysdate,'hh24:mi:ss')||' ');
    for c_rec in c_r loop
      dbms_output.put_line('Entries:'||to_char(c_rec.redo_entries,'9,999,990')||
                           ' Size:'||to_char(c_rec.redo_size,'999,999,990'));
    end loop;
  end rep_redo;
  --
begin
  commit;
  rep_redo('Start');
  insert /*+ APPEND */ into TEST_IA
  select rownum rn, 'b' val from dual connect by level <= 100000;
  rep_redo('After Append dual');
  --
  open c_1;
  fetch c_1 bulk collect into t_1;
  close c_1;
  forall i in 1..t_1.count
      insert /*+ APPEND_VALUES */ into TEST_FORALL_IAV values t_1(i);
  rep_redo('After Append values');
  --
  select cast(collect(type_test_ia(rn,val)) as type_tab_test_ia)
  into t_tab
  from (select rownum rn, 'b' val from dual connect by level <= 100000);
  --
  insert /*+ APPEND */ into    TEST_TYPE_IA
  select * from table(t_tab);
  rep_redo('After Append type');
  --
  rep_ext ('TEST_IA');
  rep_ext ('TEST_FORALL_IAV');
  rep_ext ('TEST_TYPE_IA');
end;
/

And again, my results

Start                at 14:33:04 Entries:       580 Size:     112,392
After Append dual    at 14:33:04 Entries:       850 Size:     153,468
After Append values  at 14:33:04 Entries:     1,116 Size:     193,836
After Append type    at 14:33:05 Entries:     1,383 Size:     234,432
TEST_IA              USERS    17    2,048.00      256
TEST_FORALL_IAV      USERS    17    2,048.00      256
TEST_TYPE_IA         USERS    17    2,048.00      256
