Saturday, March 12, 2011

Why I don't trust deferrable constraints

I don't like deferred constraints. I don't like deferrable constraints either, because there is the risk of someone making them deferred.

There's a Pythian blog on one potential corruption pitfall where you can end up with duplicate primary keys.

A little over a year ago, I commented on a blog by Chrisian Antognini to demonstrate where the optimizer gives incorrect results by invalid table elimination with a deferred constraint.

I've recently retested this in the 11.2.0.2 environment on apex.oracle.com and seen a change in behavior, though the bug is still there.

Firstly, create the parent and child tables, linked by a deferrable constraint that is INITIALLY IMMEDIATE (ie not deferred by default).

CREATE TABLE t1 (
    id NUMBER NOT NULL,
    val varchar2(5),
   CONSTRAINT t1_pk PRIMARY KEY(id)
);


CREATE TABLE t2 (
   id NUMBER NOT NULL,
   t1_id NUMBER NOT NULL,
   val varchar2(5),
   CONSTRAINT t2_pk PRIMARY KEY(id),
   CONSTRAINT t2_t1_fk FOREIGN KEY (t1_id) REFERENCES t1
            deferrable initially immediate
 );

Then add a view to encapsulate the join

CREATE VIEW v12 AS
 SELECT t1.id AS t1_id, t1.val AS t1_val,
        t2.id AS t2_id, t2.val AS t2_val, t2.t1_id AS t2_par_t1
 FROM t1, t2
 WHERE t1.id = t2.t1_id;

Because I was testing on the apex.oracle.com instance, I needed to do it as a single request (because the web-based SQL Worksheet treats a request as a transaction). So I used a stored procedure.


CREATE OR REPLACE PROCEDURE test_v AS
BEGIN
  --
  EXECUTE IMMEDIATE 'alter session set constraints=deferred';
  INSERT INTO t2 (id, t1_id, val) VALUES (10,2,'Child');
  --
  FOR i IN (SELECT t2_id, t2_val, t2_par_t1 FROM v12) LOOP
    DBMS_OUTPUT.PUT_LINE('1:'||i.t2_id||'/'||i.t2_val||
                          '/'||i.t2_par_t1);
  END LOOP;
  --
  FOR j IN (SELECT t1_id FROM v12) LOOP
    DBMS_OUTPUT.PUT_LINE('2:'||j.t1_id);
  END LOOP;
  --
  FOR k IN (SELECT t1_id, t1_val FROM v12) LOOP
    DBMS_OUTPUT.PUT_LINE('3:'||k.t1_id||'/'||k.t1_val);  
  END LOOP;
  --
  ROLLBACK;
  --
END;
/


The procedure sets the transaction to use deferred constraints, then inserts a row into the child table (without the corresponding parent row). Then there are three selects taking different columns from the view.

The first loop picks details that exist only in the child table, the second picks the common identifier and the third picks up details from the parent table.

In XE, only first loop returns records, but the second and third do not. The first loop "works" because the join to the parent table is eliminated on the assumption that the constraint requires that a parent row exists.

In 11.2.0.2, the first and second loops return records. The second loop can work if you assume that, because the ids are common, then it makes no logical difference if you take it from the child table or the parent table. If you assume the parent table will have at most one row for the key then the number of records returned is fully dependent on the child table. It is therefore more efficient to just hit the child table, and it is am improvement to eliminate the join to the parent.

In neither situation does the third query return a row. Because it requires the parent table to be hit, the optimizer cannot eliminate it and when it is hit, it is found to be empty.

Ultimately, the 'solution' would require every cursor to have additional dependencies on the state of every deferrable constraint for the query...or to simply assume that any constraint that is deferrable might be deferred and to optimize it as if it doesn't exist.

No comments: