From: Richard Frith-Macdonald
Subject: Re: [bug #27823] SQLClient drops connections without sending notifications
Date: Wed, 28 Oct 2009 06:41:30 +0000


On 27 Oct 2009, at 19:42, Robert J. Slover wrote:

Richard,

At work, one part of our application is used to 'scan' data gathered from various devices, normalize it, and insert it into the database as measurements. This is very performance-critical, and for a very large percentage of the cases, we are only ever inserting rows (new measurements). However, on occasion we need to 'reprocess' the data that was originally gathered (for instance, if the model used to normalize the data has been modified to correct an error). In this case, the same measurement rows will be inserted, although they may contain slightly different information. This component of the application has no knowledge of whether it is processing data for the first time or reprocessing it (and no need to know, either).

It first attempts to insert a row, and if that fails due to a duplicate key constraint violation, it will do an update instead. The overhead of querying the database first to see if the record is already there would in most cases be completely wasted, and in the rarer case would not save anything over simply attempting the insert in the first place. This of course uses straight C and ODBC, but the principle is the same … if the ODBC drivers forced a disconnect on every constraint violation, we would have significantly worse performance, and would have to opt for the generally slower approach of querying first, since we can only commit a group of measurements for an interval on success of the entire scan (it either all goes in or none of it does).
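A minimal SQL-level sketch of that pattern, assuming a hypothetical measurement table (the real schema is not shown in the thread): the client always issues the INSERT first, and falls back to the UPDATE only when the INSERT fails with a duplicate-key error (unique_violation, SQLSTATE 23505, in PostgreSQL):

    -- Hypothetical table and columns, for illustration only.
    -- Step 1: always attempt the insert first.
    INSERT INTO measurement (device_id, interval_start, value)
    VALUES (42, '2009-10-27 19:00', 3.14);

    -- Step 2: issued by the client only if the insert above failed
    -- with a duplicate-key constraint violation.
    UPDATE measurement
       SET value = 3.14
     WHERE device_id = 42
       AND interval_start = '2009-10-27 19:00';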

As I said, I don't mind accepting a patch to allow things not to disconnect on error, but your example really just reinforces my assertion that it should not be an issue.

You say that your code does not 'need to know' whether it's adding new records or replacing existing ones, yet it certainly does, since it must handle the errors that will occur if it tries to insert a duplicate value. This means that your code is more complex than it would be if it really didn't need to know.

Your code will be performing inconsistently ... sometimes (usually) it will be fast, but other times it will be slow because of the error handling. When it's slow, it is presumably still 'good enough' for your current system, but it is unlikely to scale well if you start having to deal with bigger datasets.

If, instead, it were structured as a transaction which first deletes any existing records and then inserts the new ones, it might be very slightly slower in the common case, but more consistent and simpler. It would never be anything like as slow as the case where you try an insert, catch the error, and then update ... and if that performance is acceptable, then the performance of the simpler, more consistent way of doing things must be acceptable too.

In fact the delete and insert model is very efficient ... when no deletion is actually needed, the delete has the effect of reading index information into memory so it's available for the insertion, and was not wasted effort. When a deletion is needed, the database server is able to optimise it ... postgres implements an update as a delete and insert anyway, so the performance in this case is about the same as in the case where no deletion is needed, which is also about the same as when you just do an update!
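A minimal sketch of that structure, reusing the hypothetical measurement table from the sketch above: the whole unit runs in one transaction, the DELETE is simply a no-op on first-time processing, and the INSERT is unconditional, so no constraint error (and no forced reconnect) can arise:

    BEGIN;

    -- Remove any rows left from a previous processing run for this
    -- device and interval; on first-time processing this deletes
    -- nothing, but it still pulls the index pages into memory for
    -- the insert that follows.
    DELETE FROM measurement
     WHERE device_id = 42
       AND interval_start = '2009-10-27 19:00';

    -- Insert the (possibly reprocessed) measurement unconditionally.
    INSERT INTO measurement (device_id, interval_start, value)
    VALUES (42, '2009-10-27 19:00', 3.14);

    COMMIT;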

The SQLClient library was developed specifically for performance-critical database coding (pushing huge numbers of messages to mobile phones) ... My idea of performance-critical code is software which runs consistently fast, and error generation/handling is something you take great effort to avoid, as it is fundamentally opposed to consistency (when an error occurs, performance changes) and speed (error handling is slow because of the additional client-server messages and transaction overheads). In fact, avoiding error generation would come about number three on the list of essentials for high-performance database programming (after use of indexes and batching of inserts/updates).
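As a side note on the batching point, one common form (supported by PostgreSQL) is a multi-row INSERT, so that a whole group of measurements travels to the server in a single statement; again using the hypothetical table from the sketches above:

    -- One statement, one round trip, many rows.
    INSERT INTO measurement (device_id, interval_start, value)
    VALUES (42, '2009-10-27 19:00', 3.14),
           (42, '2009-10-27 19:05', 3.15),
           (43, '2009-10-27 19:00', 2.71);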

The only times I use the design pattern of attempting an operation, catching errors, and handling the errors separately are:
1. rarely, when performance is truly not an issue (in which case loss of connection is irrelevant)
2. inside a stored procedure ... so the error handling is all done within the database server and is therefore much faster, as it's all in a single transaction.

The second is not really what we were talking about though ... server-side error handling is a legitimate tool and means that the client side doesn't receive an error.
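A sketch of that second case as a PostgreSQL PL/pgSQL function (the function and table names are hypothetical): the duplicate-key error is raised and handled entirely inside the server, within a single transaction, so the client never sees an error at all:

    -- Hypothetical server-side upsert: insert, and on a duplicate key
    -- fall back to an update, all within the database server.
    CREATE OR REPLACE FUNCTION store_measurement(p_device integer,
                                                 p_start timestamp,
                                                 p_value double precision)
    RETURNS void AS $$
    BEGIN
        INSERT INTO measurement (device_id, interval_start, value)
        VALUES (p_device, p_start, p_value);
    EXCEPTION WHEN unique_violation THEN
        UPDATE measurement
           SET value = p_value
         WHERE device_id = p_device
           AND interval_start = p_start;
    END;
    $$ LANGUAGE plpgsql;

A client then just calls SELECT store_measurement(...) and never has to handle a constraint error itself.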

So as I see it, the only case where this matters is where existing code happens to be fast enough with the error catching, but not fast enough if reconnects are required ... a fairly rare situation, in which I'd see the ability to change the disconnect behavior as a stop-gap, allowing you to keep a system running while rewriting and testing the critical section to handle heavier loads.




