[Gnu-arch-users] the buzzword paper
From: Thomas Lord
Subject: [Gnu-arch-users] the buzzword paper
Date: Mon, 12 Dec 2005 11:38:25 -0800
Andy:
>> This research paper may be of interest...
>> http://eprints.ecs.soton.ac.uk/11606/
>> Tom, you have competitions related to your research!
John:
> I've only gotten through the first page, and I'm willing to believe
> that there is some content here, or you wouldn't have posted it.
> But could this paper *have* more buzzwords in it?
> We'll see where it goes, but it does sound a bit like a Wiki.
It's not the best written paper one is likely to encounter but
it makes, to my eyes, two contributions:
One is simply what we presume is Yet Another small suggestion
that standards being developed as part of Semantic Web efforts
are applicable to specific problems in spite of their very
abstract nature. (We expect many such papers in the coming
years.)
The other contribution is the idea that queries over revision
control history which ask about coincidences between what
was changed and by whom -- and how that relates to the usual
patterns of who makes what changes -- may provide insights
for project management. One of their examples: if one
programmer is seen to be consistently reverting the changes
made by another, perhaps there is a problem there worthy of
investigation (they specifically suggest a "social" problem).
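As a rough illustration of that kind of query (a sketch only, not
taken from the paper): assume the history has already been flattened
into chronological (author, path, content_hash) records, and treat a
commit as a "revert" when it restores a file to exactly the content it
had before the previous commit by someone else. The record layout, the
name count_reverts, and that definition of a revert are all
assumptions made for the example.

```python
from collections import Counter


def count_reverts(commits):
    """Count who reverts whom.

    commits: iterable of (author, path, content_hash) tuples in
    chronological order (a hypothetical, simplified history format).
    Returns a Counter keyed by (reverter, reverted_author).
    """
    history = {}  # path -> list of (author, content_hash), oldest first
    reverts = Counter()
    for author, path, digest in commits:
        past = history.setdefault(path, [])
        if len(past) >= 2:
            prev_author, _ = past[-1]
            _, before_prev = past[-2]
            # The file is back to what it was before the previous
            # commit, and the previous commit was someone else's.
            if digest == before_prev and prev_author != author:
                reverts[(author, prev_author)] += 1
        past.append((author, digest))
    return reverts


if __name__ == "__main__":
    log = [
        ("alice", "Foo.java", "h1"),
        ("bob", "Foo.java", "h2"),
        ("alice", "Foo.java", "h1"),  # undoes bob's change
        ("bob", "Foo.java", "h3"),
        ("alice", "Foo.java", "h1"),  # undoes bob's change again
    ]
    for (reverter, reverted), n in count_reverts(log).items():
        print(reverter, "reverted", reverted, n, "time(s)")
```

A high count for one pair is at most a prompt to go and look at what
actually happened; it says nothing by itself about why.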
Also jumbled into the paper is a suggestion that it is best
for a revision control system to sign each individual file
checked in rather than signing, say, a changeset. It isn't
clear to me how this relates to what they are attempting to
contribute.
Observations and criticisms:
~ It's disappointing that this paper is just that: some ideas.
This seems to me like a work-in-progress report, not a paper.
It sounds plausible that the kinds of query functionality they
are enabling may be of some use but the presentation does not show
that, only provides a suggestion and appeals to intuition.
~ It is strange that they frame the task of enabling these kinds
of queries in terms of advancing revision control technology.
While revision control provides *some* of the meta-data they
need to support such queries, revision control also does much
more besides and, additionally, *other* parts of the meta-data
they are manipulating have nothing at all to do with revision
control (e.g., the coincidence of one-Java-class-per-file).
Their approach would be more interesting and convincing if they
worked to make it orthogonal to revision control. A descriptive
framework for capturing patterns of how teams change software
sounds like it might be useful, but why, other than for the
convenience of the authors of this system, does that require
a fundamental change to revision control technology? They
hand-wave that question, imo. (A cynic might note where the
paper is being published and understand the linkage they try
to draw in those terms.)
Worse than handwaving -- they undermine their own claims of a
deficiency in revision control technology by reporting that,
to prove their concept, they have automatically imported data
from a CVS repository. If CVS data supports their idea then it
follows that CVS is just fine for the work they are doing -- there
is nothing wrong with that revision control system at all (from
their perspective).
~ Now is a good time for philosophers, logicians, and those with
expertise in library science -- especially members of those groups
with deep understandings of software, 60s-style A.I., and
graph-oriented databases -- to participate in software development
for the Semantic Web. There is considerable long-standing canon
in those other disciplines that is "suddenly" relevant to the
current standardization and development efforts.
The absence of awareness of the basics in those fields can be felt
in papers such as this. This paragraph of the paper really irked me:
Our use of existing ontologies is important because
simply defining a new ontology does not help in shared
understanding across domains. This concern was voiced
by Guus Schreiber [23], who stated "Good ontologies
are used in applications. They represent some form of
consensus in a community...creating my own ontology
is a misappropriation of the term. Ontology is about
shared understanding" [24].
That paragraph represents a profound philosophical error of
considerable political significance (limited, of course, by
the overall significance of the paper).
The word they are grasping for is *taxonomy*, not *ontology*.
Taxonomy: How that which is is conventionally classified.
Ontology: What that which is, is.
An example of an ontological hypothesis is "All that exists is
formed from earth, air, water, and fire."
An example of a taxonomic practice is: "Books about computer
programming are shelved in the `600s' section".
Ontological facts, to the extent there are any, have absolutely
nothing at all to do with consensus in a community. What is, is.
Taxonomic practices have absolutely everything to do with
consensus in a community. We agree that the "Javascript Pocket
Reference" should be shelved in the 600s section.
Taxonomy and ontology relate, of course. We are collectively
insane if our taxonomies have no grounding in ontological reality.
Yet taxonomies, while they may be grounded in ontological reality,
are simplifications -- reductions of that reality. They are by
nature incomplete. As a guide to reality, a given taxonomy is just
a small fragment of a map.
And there, exactly, is what irks me about the sloppiness of their
(borrowed) language: the conflation (confusion between -- missing
of a distinction between) taxonomy and ontology.
As products of pedagogy and culture, our ontological perceptions are
often shaped by our taxonomic context. Psychologically, we are
often unprepared to recognize the *existence* of real things for
which we lack *names* or *categories*. Even when all that is real
fits somewhere in our taxonomic framework, where the structure of
that framework fails to reflect the reality of that which is, it
often leads us to fail to appreciate the nature of that which is
classified.
Some classic examples of what goes wrong when people confuse their
taxonomy with the world's ontology: racism, sexism, homophobia, and
other prejudices. The confusion of taxonomy with ontology is called,
in some circles, "essentialism" (the false reduction of a thing to
some presumed essence based on a dubious classification of the thing).
Bad software engineering may or may not be on par with racism, sexism,
homophobia et al. but this much is certain: working backwards from
accepted taxonomic frameworks to derive presumed engineering
ontologies is, literally, medieval pseudo-logic. Before one goes
about applying received taxonomic frameworks to engineering facts,
one has an obligation to demonstrate and characterize the engineering
facts. Then one can show how they fit in to the taxonomic framework.
It is not, as the authors abuse the language, "important" to use
"existing ontologies [sic -- read `taxonomies']" BECAUSE they are
existing. The authors have put the cart before the horse, here.
Some people will read this and correctly acknowledge that I make
a valid philosophical point but question why I bring it up here.
Isn't this esoteric point a minor thing in this context? Isn't
their software engineering theorizing just fine, even if their
word choices are a bit wrong?
I say it matters quite a bit, even if the work in this paper doesn't
specifically go very far. Work *like this* will. Our critical
standards matter. Here is a scenario:
Suppose that, 3 years hence, a project like this publishes results
rather than speculation. In their study, certain patterns of changes
to software correlate very strongly with, say, social problems among
the team or programmers making mistakes because they are operating out
of their area of expertise. Those would be interesting correlations
but it would be tragic if project managers began relying on them as
ontology -- "a programmer whose commits fit such-and-such a pattern,
clearly being up to no good, shall be fired, without exception".
This is not an esoteric, "purely philosophical" concern. On the
contrary, taxonomic v. ontological confusion is real, active, and
increasingly dominant in the free software industry. It directly
affects your career options and mine. A good example is CMU's
recent work on "readiness metrics" for open source projects which,
not especially original to CMU, reflect criteria already in use by
key industry decision makers: "Good projects have mailing lists
featuring particular activity levels and natures; good projects
have such and such a change rate; etc." Very bad projects can
and do sail past -- even `game' -- such criteria. Very good
projects can be sabotaged by manipulating the measured quantities
and qualities of these criteria. Application of these criteria
is no substitute for actually (gasp) examining the content of the
project critically and in detail -- yet such rote application is
touted as, and used as, a substitute for caring about the actual
facts.
~ As a practical matter, the paper relies on some syntactic quirks
of Java and thus its applicability in general is limited. *If*
the authors produce compelling software engineering results out
of these ideas, it will be interesting to see whether and how they
try to generalize these ideas. Perhaps they might one day have some
good advice for language designers.
-t