Log in

No account? Create an account

Previous Entry | Next Entry

Learning about good SQL performance?

I've worked with eight or nine different database vendors in my career, and I've learned enough SQL to avoid obvious antipatterns (e.g. the N+1 selects problem) and pitfalls (e.g. DB2 page-level locking), and some optimizations by rote (e.g. expanding transitive relationships in selects). I still don't have a particularly good handle on which of two or more competing ways to accomplish something in SQL is more efficient, and it's not only not obvious, it's sometimes counterintuitive.

Can anyone recommend good references, both vendor-specific and in general, that will let me reason about the performance of SQL that I'm writing (or rewriting)? Thanks in advance....



( 10 comments — Leave a comment )
Jun. 14th, 2005 12:55 pm (UTC)
I wish I had good advice for you, but my SQL knowledge extends only as far as an undergrad course, where we didn't do a whole lot of optimization. Are there profiling tools which could help?

I mostly wanted to comment to say that that's an adorable user icon! :)

(On the other hand, I do have my databases textbook still. I can't say it'd be of mucch use, but if you want to look through it, you're welcome to. I think most of our SQL, IIRC, was of the "Write an SQL statement to make this happen," variety rather than the, "Write an SQL statement to make this happen *well*, variety. I also recall that we discussed normalization, but it was mostly presented in an always-a-good-thing light which I don't think jibes with reality.)
Jun. 14th, 2005 01:41 pm (UTC)
I'm really only competent to talk about tuning SQL for Sybase, but while there are definitely some vendor-specific tricks one should know to do the best possible job, I'm sure "generic SQL tips" could probably boil down to a small list.

Make as few trips to the data as possible

Accessing a table once is better than accessing the table many times, all else being equal. You see this problem when traditional programmers aren't comfortable in SQL - rather than touching a table once with set-based operations, they'll fetch a row, operate on it, fetch a row, etc. (Also called "joining in the client" vs "... in the server".

Handle the data as little as possible

I see a lot of code where an entire table is truncated then refreshed from somewhere else, then hit with an update statement against the whole table that's only relevant to a few rows, then the rows not wanted are discarded. In many cases this tip could be summarised as "apply the WHERE clauses as early as possible against as many rows as possible". Filter sooner. Only get the rows you care about. Does that row really need to be handled four times?

SQL is set based

Really. Row by row processing is almost always slower. Operate on sets; often a single SQL statement can replace an entire loop of code.

The fewer the I/Os, the better

This is really what it all boils down to. Fewer I/Os uses less memory (more for something else) and touches disk less. I've already said only fetch the rows you care about, but also only fetch the columns you'll use.

Be as simple as possible, but no simpler

Understand what really does impose an overhead. In some dialects, writing "!<" for "not less than" will force a table scan, but ">=" (semantically equivalent) will not. If all you care about is "was there a non-empty result set", use "EXISTS" rather than "IF COUNT(*)>0". In Sybase there are sometimes arcane rules about ensuring datatypes match on both sides of a join - some "implicit conversions" (like numeric to int) run without error but slow things down a lot.

Don't be afraid to make humans work to understand the SQL

It's sometimes true that the fastest SQL gets rather complicated with CASE statements, characteristic functions, self-joins, cross-tabs, etc. I'm all for writing self-documenting code, but I see a lot of extra effort placed on the server as far as I can tell only to avoid writing a nasty-looking statement.

I can't think of specific references, but most vendors should have many many guides for performance & tuning work with their product...
Jun. 14th, 2005 11:58 pm (UTC)

Is it a bad thing that generalizations about SQL optimization are exciting to me?
Jun. 15th, 2005 01:54 am (UTC)
Possibly TMI: At least one of my sweeties and I use technical discussion as foreplay....
Jun. 15th, 2005 02:37 am (UTC)
You know, I can *totally* understand that. :)
Jun. 15th, 2005 02:04 am (UTC)
That all makes a lot of sense, except for the last point. Cycles are cheap. Hardware is cheap. Maintaining "write-only" code is absurdly expensive....

Jun. 14th, 2005 04:26 pm (UTC)
While slower than just knowing, I'm pretty sure many databases include a statement which takes another statement and tells you how it plans to execute it. In PostGres, it's EXPLAIN. I don't recall what it is in Oracle (INSPECT? blech. it's been a long time.). Combining this with australian_joe's advice would probably take you most/part of the way.
Jun. 15th, 2005 02:02 am (UTC)
EXPLAIN is helpful, but what I'd really like to be able to do is to reason about what's going to be a good approach in the first place, rather than have to discover things empirically.
Jun. 16th, 2005 04:34 pm (UTC)
This is true.

On the other hand, I find empirical research can teach me things faster and better than book knowledge. Although, granted, that sometimes it leads me down false paths. The combination of the two would probably be best.

Also, if you have some queries tagged with "slow" and "fast," and then get a book or something, it might be helpful to EXPLAIN them and match them to things the book/whatever's saying.
Jun. 16th, 2005 03:27 pm (UTC)
I'm totally lost on the discussion, but I'm digging your icon -- it's nice to see your face again!
( 10 comments — Leave a comment )