Thursday, February 14, 2008

Databases differ. Problem solving approaches don't.

There's been no activity here in a while, because I haven't done much in the Oracle area. And no-one tagged me in the blog-tag game (so the story about my Dad and Princess Anne stays secret).

This entry is more SQL Server than Oracle, but it is generic in some ways, and its also got the closest I've found to v$sql in SQL Server 2005.

I've been working with a tool that generates procedures for moving data around. You know the sort of thing. Press the button and away it goes and you end up with a plain vanilla script with a big SELECT cursor loop, an update and, if no rows were updated, an insert. It commits every thousand records and, when it inserts more than a thousand records, it gathers stats. Not at the end, but after the thousand so if you insert ten million, the stats reflect the first thousand. Odd, but there you go.

Anyway, its just the sort of GUI button generated code that Noons remarked upon.
Guess what ? It ran pretty slowly when dealing with millions of records.

The initial temptation was to throw it away and write custom code. While there's definately benefits to that, there's also the downside that you lose that automatic generation capacity for the future, and since that was one reason for getting the tool, we need a good reason to swithch.

So in the spirit of BAAG
, we took a different approach.

Firstly, we stuck in a bunch of debugs to indicate how long it took for each 250,000 records. Enough messages to see how it was going, not so many that we are overwhelmed.

The result was that it started fast enough (about 1 min per batch) and consistently decelerated (to 10-15 minutes a batch after several million). The first suspect was the update statement. Initially the table would be empty, so there would be no updates and it should find that out quickly. As data is added, the update query would take longer even if no rows were found to update. In fact we knew that the update would never match rows, so one option would be to remove that altogether.

But it was still a guess that the update was the problem. Or at least a hypothesis as it was something we could test. So with sufficient googling, I came up with a query like :

select creation_time, last_execution_time, execution_count,
total_logical_writes, total_logical_reads, total_physical_reads,
(qs.total_worker_time/1000000) tot_wrk_sec, (qs.total_elapsed_time/1000000) tot_secs,
SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
((CASE statement_end_offset
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) as statement_text
from sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
where last_execution_time > dateadd(Minute,-20,getdate())
order by execution_count, total_logical_writes, total_logical_reads desc

which is sort of like looking at SQL Server's version of v$sql.

It indicated that the time was pretty much all taken up in the insert component. The update was innocent. Actually, given that B-Tree indexes are pretty good at finding stuff, the 'slowing down' should have eased up fairly quickly, going from one hit to two/three for each index check as the 'depth' of the index grew. And even doing three times as many reads for each index check wouldn't have explained a ten-fold decrease in performance.

The underlying culprit was half-a-dozen indexes on the table on low-cardinality columns. Whether those indexes are actually useful or not for the application is a wider question, but for this particular purpose, it was simple enough to drop them before hand and recreate them at the end.

Result - run time cut to about 15% of the original, even including the time taken to rebuild the indexes.

And while the generated code clearly isn't optimal, it is adequate (at least in this case).


Noons said...

Excellent stuff, Gary.

Yeah, for some reason inserts on low cardnlty columns are a bit of a disaster in SS2k5.

Not sure if they do bitmap indexes instead of btree without telling anyone - a common occurrance in M$-land - or some other weird coding to check values on the duplicate keys.

Real weird.

There are a few other "gotchas" in that product. I'm finding more and more that good old "rule of thumb" stuff in Oracle also applies to SS2k5. Like drop index-load data-create index.

I wish they had an index disable. They probably do, but I never found the dialogue box! :-)

Have you tried the maintenance plans in Studio 2k5? That's the bit I'm finding more and more interesting: visual programming at its best. Drag and drop modules, customize their interface, create custom connectors and presto: there's a batch maintenance job.

Good stuff.

Alex Gorbachev said...

Hey Gary,
Nice example of SQL Server troubleshooting for Oracle DBA. Too bad SS is still far behind Oracle on instrumentation front and you have to tweak the code to track it down.