
RE: merge mode for XML


From: Peter Ring
Subject: RE: merge mode for XML
Date: Wed, 15 May 2002 02:15:50 +0200

A paper that will interest you:

(preliminary version)
http://citeseer.nj.nec.com/cache/papers/cs/15339/http:zSzzSzwww.cs.arizona.eduzSzpeoplezSztodszSzacceptedzSz2000zSzParsonsEmancipating.pdf/parsons00emancipating.pdf

(published)
http://portal.acm.org/citation.cfm?id=357778&coll=portal&dl=ACM&CFID=2131136&CFTOKEN=70981949

Abstract:
"Database design commonly assumes, explicitly or implicitly, that instances
must belong to
classes. This can be termed the assumption of inherent classification. We
argue that the extent and complexity of problems in schema integration,
schema evolution, and interoperability are, to a large extent, consequences
of inherent classification. Furthermore, we make the case that the
assumption of inherent classification violates philosophical and cognitive
guidelines on classification and is, therefore, inappropriate in view of the
role of data modeling in representing knowledge about application domains."

Also, a search for 'semantic interoperability' should return some
interesting hits.

To tell the difference between two (or three) sequences of bytes is not too
difficult; comparing two sequences A and B to determine their longest common
subsequence (LCS), or the edit distance between them, is a well-studied problem.
GNU diff is based on an algorithm published by Eugene W. Myers in 1986.
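For concreteness, here is a textbook dynamic-programming sketch (in Python) of
the LCS computation and the insert/delete edit distance that follows from it.
This is not the O(ND) algorithm from Myers' paper that GNU diff actually uses;
it only illustrates how the two quantities relate.

def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def edit_distance(a, b):
    """Minimum number of insertions and deletions turning a into b
    (the distance a line-based diff minimizes)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)

print(lcs_length("ABCABBA", "CBABAC"))     # 4
print(edit_distance("ABCABBA", "CBABAC"))  # 5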

To tell the difference (distance) between two semantic structures is
difficult in a very fundamental way.

Kind regards
Peter Ring


-----Original Message-----
From: address@hidden [mailto:address@hidden Behalf Of
Glew, Andy
Sent: 13 May 2002 19:32
To: address@hidden; Glew, Andy
Cc: Gary Bisaga
Subject: RE: merge mode for XML


> > Motivation: schema changes in most existing relational databases are
> > onerous.
>
> For very good reason.

And what is that reason?

OK, I admit that some RDBMS applications in production
need stability - just like some systems software applications
(the kind Greg seems to work on, the kind I used to
work on) value stability above all else, and actively
want to make it hard to change things.

However, there are other application domains
- in programming, the domains attacked by agile
methodologies like XP (eXtreme Programming).
{Donning asbestos underwear, expecting Greg
to flame.}

An application area that I frequently work in nowadays
is experimental databases - databases for experimental data.
I want to archive all of my experimental data in a form that
allows me to do arbitrary SQL-like queries over it.

Problem is, as I continue my research, the format of
my records is continually changing.  For example, a few years
ago I might have recorded CPU MHz and Cache Size as
configuration parameters - now I have to record at least
3 different cache sizes, as well as multiple clock domain
frequencies. Not to mention that the observations that
I record are constantly changing.
        Rather than continually reformatting my database,
adding new fields which are "Unknown" or "Null" on old data,
I find it easier to add records containing fields that were not
known earlier.
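
To make that concrete, here is a minimal sketch of the record-per-observation
idea (field names invented for illustration): older records simply lack fields
that did not exist yet, and queries skip them instead of wading through NULLs.

# Each run is a self-describing record; fields unknown at the time
# are simply absent rather than stored as NULL.
old_run = {"cpu_mhz": 450, "cache_kb": 512, "ipc": 1.1}
new_run = {"cpu_mhz": 2000, "l1_cache_kb": 32, "l2_cache_kb": 256,
           "l3_cache_kb": 2048, "bus_mhz": 400, "ipc": 1.8}
runs = [old_run, new_run]

# An SQL-like query over heterogeneous records: IPC for every run
# that recorded an L2 cache size, skipping runs that predate the field.
result = [(r["ipc"], r["l2_cache_kb"]) for r in runs if "l2_cache_kb" in r]
print(result)   # [(1.8, 256)]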

<snip />



