gwl-devel

Re: Managing data files in workflows


From: Konrad Hinsen
Subject: Re: Managing data files in workflows
Date: Fri, 02 Apr 2021 10:41:35 +0200

Hi Ricardo,

> Maybe.  You could run with “--dry-run” to see what GWL claims it would
> do to confirm that it considers the file to be “not cached”.
>
> Also enable more log events (in particular cache events) with
>
> “--log-events=error,info,execute,cache,debug”

Thanks, I think I made progress with those nice debugging aids.

When I run my workflow for the first time, I see

  cache: Caching `./data/weekly-incidence.csv' as
  `/tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra/./data/weekly-incidence.csv'

The '.' in there looks suspect. Let's see what I got:

   $ ls -lR /tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra
   /tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra:
   total 4
   drwxrwxr-x 2 hinsen hinsen 4096  2 avril 10:13 data

   /tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra/data:
   total 0
   lrwxrwxrwx 1 hinsen hinsen 27  2 avril 10:13 weekly-incidence.csv -> ./data/weekly-incidence.csv

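That's a dangling symbolic link: its relative target is resolved against
the link's own directory inside the cache, not against my project
directory. So it's not surprising that a second run doesn't find the
cached file.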
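
To illustrate why the relative name breaks, here is a minimal Python sketch. It does not use GWL at all; the helper name `cache_with_symlink` and the directory names `project`, `cache-rel`, and `cache-abs` are made up for the example. It only relies on the usual POSIX rule that a relative symlink target is resolved from the directory containing the link.

```python
import os
import tempfile


def cache_with_symlink(project_dir, cache_dir, name, absolute):
    """Create a symlink cache entry for `name` and return its path.

    Hypothetical helper, not GWL code: with absolute=False the link
    target is the relative name itself, as in the listing above.
    """
    entry = os.path.normpath(os.path.join(cache_dir, name))
    os.makedirs(os.path.dirname(entry), exist_ok=True)
    target = os.path.join(project_dir, name) if absolute else name
    os.symlink(target, entry)
    return entry


def demo():
    with tempfile.TemporaryDirectory() as root:
        project = os.path.join(root, "project")
        os.makedirs(os.path.join(project, "data"))
        with open(os.path.join(project, "data", "weekly-incidence.csv"), "w") as f:
            f.write("week,incidence\n")

        name = "./data/weekly-incidence.csv"
        rel = cache_with_symlink(project, os.path.join(root, "cache-rel"), name, False)
        abs_ = cache_with_symlink(project, os.path.join(root, "cache-abs"), name, True)

        # os.path.exists follows symlinks, so it is False for a dangling link.
        return os.path.exists(rel), os.path.exists(abs_)


print(demo())
```

The relative link ends up looking for `data/weekly-incidence.csv` underneath the cache's own `data` directory, which does not exist, while the link built from the absolute name resolves to the real file.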

When I use an absolute filename to refer to my download target, the
symlink in the cache is valid and points to the downloaded file. And
when I run the workflow a second time, it skips the "download" process
as expected. But then, it fails trying to "restore" the file:

   run: Skipping process "download" (cached at /tmp/gwl/ubvscxwoezl63qmvyfszlf6azmuc655h7gbbtosqshlm5r6ckyhq/).
   cache: Restoring `/tmp/gwl/ubvscxwoezl63qmvyfszlf6azmuc655h7gbbtosqshlm5r6ckyhq//home/hinsen/projects/mooc-workflows/influenza-analysis/data/weekly-incidence.csv'
     to `/home/hinsen/projects/mooc-workflows/influenza-analysis/data/weekly-incidence.csv'
   Backtrace:
              6 (primitive-load "/home/hinsen/.config/guix/current/bin/guix")
   In guix/ui.scm:
     2164:12  5 (run-guix-command _ . _)
   In srfi/srfi-1.scm:
      460:18  4 (fold #<procedure 7f45ba1d5c40 at gwl/workflows.scm:388:2 (ite…> …)
      460:18  3 (fold #<procedure 7f45ba1d5c20 at gwl/workflows.scm:391:13 (pr…> …)
   In gwl/workflows.scm:
      392:21  2 (_ #<process download> ())
   In srfi/srfi-1.scm:
       634:9  1 (for-each #<procedure 7f45ba1d57e0 at gwl/workflows.scm:549:26…> …)
   In guix/ui.scm:
       566:4  0 (_ system-error "symlink" _ _ _)

   guix/ui.scm:566:4: In procedure symlink: Operation not permitted: "/home/hinsen/projects/mooc-workflows/influenza-analysis/data/weekly-incidence.csv"

Looking at the source code in (gwl cache), restoring means symlinking
the target file to the cached file, which can't work given that the
cache is already a symlink to the target file.
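
The circularity can be sketched in a few lines of Python. This is not the code from (gwl cache); the file names are invented, and the sketch does not try to reproduce the exact "Operation not permitted" error from the backtrace. It only shows that when the cache entry is a symlink to the target, creating a symlink from the target to the cache entry either fails because the target exists or, if the target is removed first, leaves two links pointing at each other with no data behind them.

```python
import errno
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "weekly-incidence.csv")
        cache_entry = os.path.join(root, "cache-entry.csv")

        # "Caching": the cache entry is a symlink to the real file.
        with open(target, "w") as f:
            f.write("week,incidence\n")
        os.symlink(target, cache_entry)

        # "Restoring", attempt 1: link the target to the cache entry
        # while the target still exists.
        try:
            os.symlink(cache_entry, target)
            first = "ok"
        except FileExistsError:
            first = "exists"

        # "Restoring", attempt 2: remove the target first. The two
        # symlinks now point at each other and the data is gone.
        os.remove(target)
        os.symlink(cache_entry, target)
        try:
            open(target).close()
            second = "ok"
        except OSError as e:
            second = errno.errorcode[e.errno]

        return first, second


print(demo())
```

On Linux the first attempt is rejected because the target already exists, and the second leaves a symlink loop that cannot be opened. Either way, a restore step built on symlinks only makes sense if the cache holds real copies of the data.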

So... I don't understand how the cache is supposed to work. If it stores
symlinks, there is no need to restore anything. If it is supposed to
store copies, then that's not what it does. My original issue with the
relative filename is a detail that should be easy to fix.

Cheers,
  Konrad.


