From: Richard Frith-Macdonald
Subject: Re: [bug #27823] SQLClient drops connections without sending notifications
Date: Wed, 28 Oct 2009 06:41:30 +0000
On 27 Oct 2009, at 19:42, Robert J. Slover wrote:
Richard,
At work, one part of our application is used to 'scan' data gathered
from various devices, normalize it, and insert it into the
database as measurements. This is very performance-critical, and
for a very large percentage of the cases, we are only ever inserting
rows (new measurements). However, on occasion we need to
'reprocess' the data that was originally gathered (for instance, if
the model used to normalize the data has been modified to correct an
error). In this case, the same measurement rows will be inserted,
although they may contain slightly different information. This
component of the application has no knowledge of whether it is
processing data for the first time or reprocessing it (and no need
to know, either). It first attempts to insert a row and if that
fails due to a duplicate key constraint violation, it will do an
update instead. The overhead of querying the database first to see
if the record is already there would in most cases be completely
wasted, and in the rarer case would not save anything over simply
attempting the insert in the first place. This of course uses
straight C and ODBC, but the principle is the same ... if the ODBC
drivers forced a disconnect on every constraint violation, we would
have significantly worse performance, and would have to opt for the
generally slower approach of querying first, since we can only
commit a group of measurements for an interval on success of the
entire scan (it either all goes in or none of it does).
As I said, I don't mind accepting a patch to allow things not to
disconnect on error, but your example really just reinforces my
assertion that it should not be an issue.
You say that your code does not 'need to know' whether it's adding new
records or replacing existing ones, yet it certainly does since it
must handle the errors which will occur if it tries to insert a
duplicate value. This means that your code is more complex than it
would be if it really didn't need to know.
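In outline, the flow you describe amounts to something like this (just
a sketch; the table and column names here are made up, not your actual
schema):

   -- First attempt the insert ...
   INSERT INTO measurement (device_id, interval_start, value)
   VALUES (42, '2009-10-27 19:00:00', 1.23);

   -- ... and only if that fails with a duplicate key violation,
   -- fall back to updating the existing row:
   UPDATE measurement
      SET value = 1.23
    WHERE device_id = 42
      AND interval_start = '2009-10-27 19:00:00';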
Your code will be performing inconsistently ... sometimes (usually) it
will be fast, but other times it will be slow because of the error
handling. When it's slow, it is presumably still 'good enough' for
your current system, but is unlikely to scale well if you start having
to deal with bigger datasets.
If, instead, it were structured as a transaction which first deletes any
existing records and then inserts the new ones, it might be very
slightly slower in the common case, but more consistent and simpler.
It would never be anything like as slow as the case where you try an
insert, catch the error, and then update ... and if that performance
is acceptable then the performance of the simpler, more consistent way
of doing things must be acceptable too. In fact the delete and insert
model is very efficient ... when no deletion is actually needed, the
delete has the effect of reading index information into memory so it's
available for the insertion and is not wasted effort. When a
deletion is needed, the database server is able to optimise it ...
postgres implements an update as a delete and insert anyway, so the
performance in this case is about the same as in the case where no
deletion is needed, which is also about the same as when you just do
an update!
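Roughly (again just a sketch, using the same made-up names):

   BEGIN;
   -- Delete any existing row for this key; in the common case this
   -- matches nothing, but it still pulls the index pages into memory ...
   DELETE FROM measurement
    WHERE device_id = 42
      AND interval_start = '2009-10-27 19:00:00';
   -- ... then insert unconditionally, with no error handling needed.
   INSERT INTO measurement (device_id, interval_start, value)
   VALUES (42, '2009-10-27 19:00:00', 1.23);
   COMMIT;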
The SQLClient library was developed specifically for performance-critical
database coding (in particular, pushing huge numbers of messages to
mobile phones) ...
My idea of performance-critical code is software which runs
consistently fast, and error generation/handling is something you take
great effort to avoid as it is fundamentally opposed to consistency
(when an error occurs performance changes) and speed (error handling
is slow because of the additional client-server messages and
transaction overheads). In fact, avoiding error generation would come
in at about number three on the list of essentials for high-performance
database programming (after use of indexes and batching of
inserts/updates).
The only times I use the design pattern of attempting an operation,
catching errors, and handling the errors separately are:
1. rarely, when performance is truly not an issue (in which case loss
of connection is irrelevant)
2. inside a stored procedure ... so the error handling is all done
within the database server and is therefore much faster as it's all in
a single transaction
The second is not really what we were talking about though ... server-
side error handling is a legitimate tool and means that the client
side doesn't receive an error.
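On postgres that sort of thing can be written as a plpgsql function,
something like this (a sketch only; the function, table and column
names are made up):

   CREATE FUNCTION put_measurement(p_device integer,
                                   p_start timestamp,
                                   p_value double precision)
   RETURNS void AS $$
   BEGIN
     -- Try the insert first; if the row already exists the server
     -- catches the duplicate key error itself and updates instead,
     -- all within one transaction, so the client never sees an error.
     BEGIN
       INSERT INTO measurement (device_id, interval_start, value)
       VALUES (p_device, p_start, p_value);
     EXCEPTION WHEN unique_violation THEN
       UPDATE measurement
          SET value = p_value
        WHERE device_id = p_device
          AND interval_start = p_start;
     END;
   END;
   $$ LANGUAGE plpgsql;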
So as I see it, the only case where this matters is where existing
code happens to be fast enough with the error catching, but not fast
enough if reconnects are required ... a fairly
rare situation, in which I'd see the ability to change the disconnect
behavior as a stop-gap to allow you to keep a system running while
rewriting and testing the critical section to handle heavier loads.