Re: [Monotone-devel] Re: [RFC] M.T. phone home


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: [RFC] M.T. phone home
Date: Tue, 13 Jun 2006 18:55:03 -0700
User-agent: Mutt/1.5.11+cvs20060403

On Mon, Jun 12, 2006 at 01:24:23AM -0700, Graydon Hoare wrote:
> Nathaniel Smith wrote:
> 
> >If you made it this far, thanks for reading :-).  I'll probably start
> >implementing this in the next few weeks (assuming that the response to
> >this isn't an overwhelming "this would be a horrible violation and
> >can't be done at all!"), but really want to make sure that we get the
> >details right so that people don't feel spied-on or otherwise
> >uncomfortable.  So, comments?
> 
> I know I discussed this with you, and expressed initial enthusiasm. But 
> the more I think about it, the more clearly my preferences err on the 
> side of privacy. I kept writing up lists of conditions and criteria, and 
> they kept getting narrower and narrower, until I was left with the 
> simple conclusion:
> 
>   Don't do it.
>
> Much as I've often wished for such abilities in my programs, I have to 
> say I now think it's a bad plan. I think people are so fed up with lying 
> spyware that the merest hint of "gathering data" is enough to make users 
> foam at the mouth. Maybe not every user, but programmers more than 
> average, and crypto-y programmers more still.
[...]
> Currently we're trying to beat the reputation of being "the VC that's 
> too slow to use"; imagine how much worse it'd be to have a reputation as 
> "the VC that phones home". No amount of user data is worth that. It's 
> even worse than being "the VC which is full of buffer overruns" or such.

These are good arguments, but I'd like to hear a little more
fine-grained discussion before I'm fully convinced.  Maybe a useful
prompt for that discussion would be this question:

Here are some options on how one could gather usage data from users:
  0) do nothing, continue exactly as now
  1) add usage instrumentation to mtn on a branch; users who want to
     be helpful can pull, build, and use that branch
  2) add usage instrumentation to mainline mtn, but have it be
     completely disabled by default.  Users can, if they want, create
     a magic file in ~/.monotone that enables statistics gathering,
     and then when they remember, they can email that file to us, or
     hit an "upload file" button on a web form somewhere.
  3) same as above, but also add a command like "mtn send-usage-stats"
     that sends the email/pushes the button for them.
  4) same as above, but also add a command like "mtn
     always-send-usage-stats" that toggles a flag so that monotone
     automatically sends the email/pushes the button at useful times.
  5) same as above, but gather data automatically, and nudge the user
     in the UI to either send it or disable it.
  6) same as above, but automatically send data as it is gathered
(I am also assuming that we would, of course, fully document exactly
how personal the info we collect is, make it all fully human-readable,
put a header on the statistics file describing what exactly it was and
how to read the format in case anyone found it by accident, and so
on.)
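
To make option (2) a bit more concrete, here is a minimal sketch of the
"disabled unless the user creates a magic file" behaviour. It is in Python
only for brevity (monotone itself is C++), and everything named in it is an
assumption for illustration: the file name `usage-stats`, the header wording,
the one-line-per-command record format, and the `record_command` helper are
all hypothetical, not an agreed design.

```python
import os
import time

# Hypothetical file name; the proposal only says "a magic file in ~/.monotone".
STATS_FILENAME = "usage-stats"

# Hypothetical header, so that anyone who finds the file can tell what it is.
HEADER = (
    "# monotone usage statistics (illustrative format)\n"
    "# This file is only written to because you created it to opt in.\n"
    "# Each line below is: <UTC timestamp> <command name>\n"
    "# Delete this file to stop all gathering.\n"
)


def record_command(conf_dir, command, now=None):
    """Append one record if, and only if, the stats file already exists.

    Returns True when a record was written, False when gathering is disabled.
    """
    path = os.path.join(conf_dir, STATS_FILENAME)
    if not os.path.exists(path):
        # Disabled by default: never create the file on the user's behalf.
        return False
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    with open(path, "a", encoding="utf-8") as f:
        if os.path.getsize(path) == 0:
            # First record after opt-in: explain the file before any data.
            f.write(HEADER)
        f.write(f"{stamp} {command}\n")
    return True
```

Under these assumptions, opting in is just creating an empty file in the
configuration directory, opting out is deleting it, and nothing ever leaves
the machine unless the user sends the file themselves.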

We all definitely agree that (0) is fine, and that (6) is not.
Therefore, we probably each have some first number that we think would
be unacceptable.  When you say "Don't do it", do you mean that for
you the line of acceptability falls between (0) and (1), so that (1)
is already too much?  If not, can you expand on where exactly you
think the line is?

(Dan Carosone: I'd love to hear your answer here too, since you've
also come down strongly against this general approach on IRC.)

From previous mails, I believe that Nuno Lucas's line falls between
(3) and (4), and on further thought, mine is between (4) and (5) (my
original email sketch design is for (5), but I'm feeling more
conservative now).

Of course, then there are other questions, like whether it's even
worth the effort to do something like (1), but I'm inclined to think
that it is, given how _very_ little we know about how people use this
kind of software "in the wild"... but, well, if it can't be done, it
can't be done.

It might also be worthwhile to look at some analogous projects.  The
two that I can think of are:
  CBI: http://www.cs.wisc.edu/cbi/
    gathers sparse statistics on _every_ program run
    users enable by installing special builds of popular software that
      they provide
    data is kept private
    used for research on automatically finding bugs
  popcon: http://popcon.debian.org/
    gathers statistics on which packages are installed, and used
      (based on atime)
    users enable by 'apt-get install popcon'
    only summary statistics are published, though the developers get
      quite detailed information (per-host UUID, full list of atimes
      and ctimes, file by file, for each package installed)
    used to determine which packages are higher priority for debian
    used by debian maintainers to track their package's popularity
    used as a data source for other views, e.g.:
      http://qa.debian.org/address@hidden
(There are also things like crash-reporter tools in mozilla, gnome,
etc., but those are only enabled at crash time, which, from a UI
point of view, is very different from something that tries to be
unobtrusive.)

-- Nathaniel

-- 
"If you can explain how you do something, then you're very very bad at it."
  -- John Hopfield



