Re: How to debug `parallel` crash?
From: Ole Tange
Subject: Re: How to debug `parallel` crash?
Date: Sun, 10 Jul 2022 13:04:30 +0800
On Sun, Jul 10, 2022 at 5:35 AM Nagle, Michael F
<michael.nagle@oregonstate.edu> wrote:
>
> First, I’d like to thank the developers and community for producing GNU
> Parallel and supporting it.
Thanks. You can help by:
• (Re-)walk through the tutorial if you have not done so in the past
year (https://www.gnu.org/software/parallel/parallel_tutorial.html)
• Give a demo at your local user group/your team/your colleagues
• Post the intro videos and the tutorial on Reddit, Mastodon,
Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
Linkedin, and mailing lists
• Request or write a review for your favourite blog or magazine
(especially if you do something cool with GNU parallel)
• Invite me for your next conference
If you use GNU parallel for research:
• Please cite GNU parallel in your publications (use --citation)
If GNU parallel saves you money:
• (Have your company) donate to FSF or become a member
https://my.fsf.org/donate/
> I use GNU parallel for a particular part of a scientific workflow, and it
> worked great on a previous machine. On a new machine (with many more cores),
> I’m now having it crash sometimes and am having trouble debugging this.
If you can, you should follow:
https://www.gnu.org/software/parallel/man.html#reporting-bugs
And in your case:
https://www.gnu.org/software/parallel/man.html#bug-dependent-on-environment
In a few weeks I will have access to a 64-core AMD 512 GB server
running Ubuntu 22.04, so it should be possible to get *very* close to
the environment you experience this in.
In your case you should try to answer:
* Can the bug be triggered reliably with multiple copies of the same
input file? Or do the input files need to be different?
* Can it be triggered by running fewer jobs in parallel?
* Can it be triggered by converting the code to `xargs -P`? (in which
case it is probably not GNU Parallel that is the root cause).
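The checks above can be sketched roughly as follows. This is only a minimal illustration, not the original workflow: `my_step` is a hypothetical stand-in for the real per-file command, the input is a dummy file, and the job count of 4 is arbitrary. It makes identical copies of one input and runs them through `xargs -P` with a reduced number of jobs.

```shell
#!/bin/sh
# Sketch only: 'my_step' and the dummy input are placeholders, not the real workflow.
workdir=$(mktemp -d)

# Make several identical copies of one input file, to test whether
# the crash needs different inputs or also happens with the same one.
printf 'sample\n' > "$workdir/input.txt"
for i in 1 2 3 4 5 6 7 8; do
  cp "$workdir/input.txt" "$workdir/copy_$i.txt"
done

# Placeholder per-file command: count the lines of "$1" into "$1.out".
# Replace this with the command that crashes.
my_step='wc -l < "$1" > "$1.out"'

# Run at most 4 jobs at a time with xargs instead of GNU Parallel.
# If the crash also shows up here, GNU Parallel is probably not the root cause.
printf '%s\n' "$workdir"/copy_*.txt | xargs -P 4 -n 1 sh -c "$my_step" sh
xargs_status=$?
echo "xargs exit status: $xargs_status"

# The roughly equivalent GNU Parallel call, for comparison (lower -j to run fewer jobs):
#   parallel -j 4 sh -c "$my_step" sh {} ::: "$workdir"/copy_*.txt
```

Raising or lowering the number after `-P` (or `-j`) shows whether the failure depends on how many jobs run at once.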
To help you think outside the box, see
https://github.com/tesseract-ocr/tesseract/issues/3109
It shows Tesseract working badly if multiple copies are run in
parallel. GNU Parallel is not the root cause there; it merely uncovers
the problem.
/Ole