There's been no activity here in a while, because I haven't done much in the Oracle area. And no-one tagged me in the blog-tag game (so the story about my Dad and Princess Anne stays secret).
This entry is more SQL Server than Oracle, but it is generic in some ways, and its also got the closest I've found to v$sql in SQL Server 2005.
I've been working with a tool that generates procedures for moving data around. You know the sort of thing. Press the button and away it goes and you end up with a plain vanilla script with a big SELECT cursor loop, an update and, if no rows were updated, an insert. It commits every thousand records and, when it inserts more than a thousand records, it gathers stats. Not at the end, but after the thousand so if you insert ten million, the stats reflect the first thousand. Odd, but there you go.
Anyway, its just the sort of GUI button generated code that Noons remarked upon.
Guess what ? It ran pretty slowly when dealing with millions of records.
The initial temptation was to throw it away and write custom code. While there's definately benefits to that, there's also the downside that you lose that automatic generation capacity for the future, and since that was one reason for getting the tool, we need a good reason to swithch.
So in the spirit of BAAG, we took a different approach.
Firstly, we stuck in a bunch of debugs to indicate how long it took for each 250,000 records. Enough messages to see how it was going, not so many that we are overwhelmed.
The result was that it started fast enough (about 1 min per batch) and consistently decelerated (to 10-15 minutes a batch after several million). The first suspect was the update statement. Initially the table would be empty, so there would be no updates and it should find that out quickly. As data is added, the update query would take longer even if no rows were found to update. In fact we knew that the update would never match rows, so one option would be to remove that altogether.
But it was still a guess that the update was the problem. Or at least a hypothesis as it was something we could test. So with sufficient googling, I came up with a query like :
select creation_time, last_execution_time, execution_count,
total_logical_writes, total_logical_reads, total_physical_reads,
(qs.total_worker_time/1000000) tot_wrk_sec, (qs.total_elapsed_time/1000000) tot_secs,
SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) as statement_text
from sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
where last_execution_time > dateadd(Minute,-20,getdate())
order by execution_count, total_logical_writes, total_logical_reads desc
which is sort of like looking at SQL Server's version of v$sql.
It indicated that the time was pretty much all taken up in the insert component. The update was innocent. Actually, given that B-Tree indexes are pretty good at finding stuff, the 'slowing down' should have eased up fairly quickly, going from one hit to two/three for each index check as the 'depth' of the index grew. And even doing three times as many reads for each index check wouldn't have explained a ten-fold decrease in performance.
The underlying culprit was half-a-dozen indexes on the table on low-cardinality columns. Whether those indexes are actually useful or not for the application is a wider question, but for this particular purpose, it was simple enough to drop them before hand and recreate them at the end.
Result - run time cut to about 15% of the original, even including the time taken to rebuild the indexes.
And while the generated code clearly isn't optimal, it is adequate (at least in this case).