[Gnu-arch-users] the buzzword paper


From:	Thomas Lord
Subject:	[Gnu-arch-users] the buzzword paper
Date:	Mon, 12 Dec 2005 11:38:25 -0800
Andy:
>> This research paper may be of interest... 

>> http://eprints.ecs.soton.ac.uk/11606/ 

>> Tom, you have competitions related to your research!

John:
> I've only gotten through the first page, and I'm willing to believe 
> that there is some content here, or you wouldn't have posted it.

> But could this paper *have* more buzzwords in it?

> We'll see where it goes, but it does sound a bit like a Wiki.

It's not the best written paper one is likely to encounter but
it makes, to my eyes, two contributions:

One is simply what we presume is Yet Another small suggestion
that standards being developed as part of Semantic Web efforts
are applicable to specific problems in spite of their very 
abstract nature.  (We expect many such papers in the coming
years.)

The other contribution is the idea that queries over revision
control history which ask about coincidences between what
was changed and by whom -- and how that relates to the usual
patterns of who makes what changes -- may provide insights 
for project management.   One of their examples: if one 
programmer is seen to be consistently reverting the changes
made by another, perhaps there is a problem there worthy of
investigation (they specifically suggest a "social" problem).
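
To make the reverting example concrete, here is a minimal sketch of
that kind of query.  It is an illustration only, not the paper's
method: the Commit record, its `reverts` field, and the sample history
are all hypothetical, and it assumes that "commit X undoes commit Y"
has already been worked out by some other means.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified view of one entry in a revision history."""
    commit_id: str
    author: str
    reverts: Optional[str] = None  # id of the commit this one undoes, if any


def revert_pairs(commits: Iterable[Commit]) -> Counter:
    """Count (reverter, reverted_author) pairs, ignoring self-reverts."""
    commits = list(commits)
    author_of = {c.commit_id: c.author for c in commits}
    pairs: Counter = Counter()
    for c in commits:
        if c.reverts is None:
            continue
        original_author = author_of.get(c.reverts)
        # Skip reverts of unknown commits and people undoing their own work.
        if original_author is None or original_author == c.author:
            continue
        pairs[(c.author, original_author)] += 1
    return pairs


# Made-up sample history, for illustration only.
history = [
    Commit("c1", "alice"),
    Commit("c2", "bob", reverts="c1"),
    Commit("c3", "alice"),
    Commit("c4", "bob", reverts="c3"),
    Commit("c5", "carol"),
    Commit("c6", "carol", reverts="c5"),
]

pairs = revert_pairs(history)
for (reverter, reverted), count in pairs.most_common():
    print(f"{reverter} reverted {reverted} {count} time(s)")
```

A high count for one pair is at most a prompt to go and look at what
actually happened; the count by itself says nothing about why the
reverts occurred.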

Also jumbled into the paper is a suggestion that it is best
for a revision control system to sign each individual file
checked in rather than signing, say, a changeset.  It isn't
clear to me how this relates to what they are attempting to
contribute.


Observations and criticisms:

~ It's disappointing that this paper is just that: some ideas.
  This seems to me like a work-in-progress report, not a paper.
  It sounds plausible that the kinds of query functionality they
  are enabling may be of some use, but the presentation does not show
  that; it only offers a suggestion and appeals to intuition.

~ It is strange that they frame the task of enabling these kinds
  of queries in terms of advancing revision control technology.
  While revision control provides *some* of the meta-data they 
  need to support such queries, revision control also does much
  more besides and, additionally, *other* parts of the meta-data
  they are manipulating have nothing at all to do with revision 
  control (e.g., the coincidence of one-Java-class-per-file).

  Their approach would be more interesting and convincing if they
  worked to make it orthogonal to revision control.   A descriptive
  framework for capturing patterns of how teams change software
  sounds like it might be useful, but why, other than for the 
  convenience of the authors of this system, does that require
  a fundamental change to revision control technology?  They 
  hand-wave that question, imo.   (A cynic might note where the 
  paper is being published and understand the linkage they try
  to draw in those terms.)

  Worse than handwaving -- they undermine their own claims of a 
  deficiency in revision control technology by reporting that,
  to prove their concept, they have automatically imported data
  from a CVS repository.  If CVS data supports their idea then it
  follows that CVS is just fine for the work they are doing -- there
  is nothing wrong with that revision control system at all (from
  their perspective).

~ Now is a good time for philosophers, logicians, and those with
  expertise in library science -- especially members of those groups
  with deep understandings of software, 60s-style A.I., and 
  graph-oriented databases -- to participate in software development   
  for the Semantic Web.   There is considerable long-standing canon 
  in those other disciplines that is "suddenly" relevant to the 
  current standardization and development efforts.

  The absence of awareness of the basics in those fields can be felt
  in papers such as this.  This paragraph of the paper really irked me:

      Our use of existing ontologies is important because
      simply defining a new ontology does not help in shared
      understanding across domains.  This concern was voiced
      by Guus Schreiber [23], who stated "Good ontologies
      are used in applications.  They represent some form of
      consensus in a community...creating my own ontology 
      is a misappropriation of the term.  Ontology is about
      shared understanding" [24].

  That paragraph represents a profound philosophical error of
  considerable political significance (limited, of course, by
  the overall significance of the paper).

  The word they are grasping for is *taxonomy*, not *ontology*.

  Taxonomy: How that which is is conventionally classified.

  Ontology: What that which is, is.

  An example of an ontological hypothesis is "All that exists is
  formed from earth, air, water, and fire."

  An example of a taxonomic practice is: "Books about computer
  programming are shelved in the `600s' section".

  Ontological facts, to the extent there are any, have absolutely
  nothing at all to do with consensus in a community.  What is, is.

  Taxonomic practices have absolutely everything to do with
  consensus in a community.  We agree that the "Javascript Pocket
  Reference" should be shelved in the 600s section.

  Taxonomy and ontology relate, of course.  We are collectively 
  insane if our taxonomies have no grounding in ontological reality.
  Yet taxonomies, while they may be grounded in ontological reality,
  are simplifications -- reductions of that reality.  They are by
  nature incomplete.   As a guide to reality, a given taxonomy is just
  a small fragment of a map.

  And there, exactly, is what irks me about the sloppiness of their
  (borrowed) language:  the conflation (confusion between -- missing
  of a distinction between) taxonomy and ontology.

  As products of pedagogy and culture, our ontological perceptions are
  often shaped by our taxonomic context.  Psychologically, we are 
  often unprepared to recognize the *existence* of real things for
  which we lack *names* or *categories*.   Even when all that is real
  fits somewhere in our taxonomic framework, where the structure of
  that framework fails to reflect the reality of that which is, it 
  often leads us to fail to appreciate the nature of that which is 
  classified.

  Some classic examples of what goes wrong when people confuse their
  taxonomy with the world's ontology: racism, sexism, homophobia, and
  other prejudices.  The confusion of taxonomy with ontology is called,
  in some circles, "essentialism" (the false reduction of a thing to
  some presumed essence based on a dubious classification of the thing).

  Bad software engineering may or may not be on par with racism, sexism,
  homophobia et al. but this much is certain:  working backwards from
  accepted taxonomic frameworks to derive presumed engineering
  ontologies is, literally, medieval pseudo-logic.  Before one goes
  about applying received taxonomic frameworks to engineering facts,
  one has an obligation to demonstrate and characterize the engineering
  facts.  Then one can show how they fit into the taxonomic framework.

  It is not, as the authors put it in their abuse of the language,
  "important" to use "existing ontologies [sic -- read `taxonomies']"
  BECAUSE they are existing.  The authors have put the cart before
  the horse, here.

  Some people will read this and correctly acknowledge that I make
  a valid philosophical point but question why I bring it up here.
  Isn't this esoteric point a minor thing in this context?  Isn't
  their software engineering theorizing just fine, even if their
  word choices are a bit wrong?

  I say it matters quite a bit, even if the work in this paper doesn't
  specifically go very far.   Work *like this* will.  Our critical 
  standards matter.  Here is a scenario:

  Suppose that, 3 years hence, a project like this publishes results
  rather than speculation.   In their study, certain patterns of changes
  to software correlate very strongly with, say, social problems among
  the team or programmers making mistakes because they are operating out
  of their area of expertise.   Those would be interesting correlations
  but it would be tragic if project managers began relying on them as
  ontology -- "a programmer whose commits fit such-and-such a pattern,
  clearly being up to no good, shall be fired, without exception".

  This is not an esoteric, "purely philosophical" concern.  On the 
  contrary, taxonomic v. ontological confusion is real, active, and
  increasingly dominant in the free software industry.  It directly
  affects your career options and mine.   A good example is CMU's 
  recent work on "readiness metrics" for open source projects which,
  not especially original to CMU, reflect criteria already in use by
  key industry decision makers:  "Good projects have mailing lists 
  featuring particular activity levels and natures;  good projects
  have such and such a change rate; etc."   Very bad projects can
  and do sail past -- even `game' -- such criteria.   Very good 
  projects can be sabotaged by manipulating the measured quantities
  and qualities of these criteria.  Application of these criteria
  is no substitute for actually (gasp) examining the content of the
  project critically and in detail -- yet such rote application is 
  touted as, and used as, a substitute for caring about the actual 
  facts.

~ As a practical matter, the paper relies on some syntactic quirks
  of Java and thus its applicability in general is limited.  *If*
  the authors produce compelling software engineering results out
  of these ideas, it will be interesting to see whether and how they
  try to generalize these ideas.   Perhaps they might one day have some
  good advice for language designers.

-t