pspp-users

Re: Excessive file system usage


From: Dave Trollope
Subject: Re: Excessive file system usage
Date: Wed, 4 Dec 2019 09:24:47 -0600

Hi Alan,

Sorry, yes, I forgot to mention: this is Linux, Debian GNU/Linux 9.
Linux e1e6db1d8408 4.9.184-linuxkit #1 SMP Tue Jul 2 22:58:16 UTC 2019 x86_64 GNU/Linux

I've reproduced this behavior both in Kubernetes and outside Kubernetes, in a raw
Docker container, so it's not Kubernetes-specific. It may, however, be related to
the way the containerized image is built in Docker.

We haven't observed this on our standard EC2 instances, but to be honest we
haven't monitored them the same way. We have enough space there that it could
have gone unnoticed. I will try that and see.

What I'm doing is watching the filesystem while the SAVE TRANSLATE command is
running, using: watch -n 0.5 "df -H; ls -ltr /tmp"
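The same watch loop can be turned into a single number. The Python sketch below is an illustration, not something from the thread; the names Sample, take_sample, amplification and monitor are hypothetical. It samples the free space of the filesystem holding the output file (shutil.disk_usage) together with the file's apparent size (st_size) and allocated size (st_blocks * 512, the Linux convention), then reports how many bytes of free space were consumed per byte written. If the allocated size stays close to the apparent size while free space keeps falling, the missing space is going somewhere other than the CSV, which bears on the seeks-and-insertions theory in the original message.

```python
from __future__ import annotations

import os
import shutil
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of the filesystem and the output file."""
    t: float        # seconds since monitoring started
    free: int       # free bytes on the filesystem holding the output
    size: int       # apparent size of the output file (st_size)
    allocated: int  # bytes allocated to the file (st_blocks * 512 on Linux)


def take_sample(out_path: str, t0: float) -> Sample:
    # Measure the filesystem that contains (or will contain) the output file.
    fs_dir = os.path.dirname(os.path.abspath(out_path))
    free = shutil.disk_usage(fs_dir).free
    try:
        st = os.stat(out_path)
        size, allocated = st.st_size, st.st_blocks * 512
    except FileNotFoundError:
        size, allocated = 0, 0  # the output file has not been created yet
    return Sample(time.monotonic() - t0, free, size, allocated)


def amplification(first: Sample, last: Sample) -> float:
    """Free space consumed per byte written to the output file."""
    consumed = first.free - last.free
    written = last.size - first.size
    return consumed / written if written > 0 else float("inf")


def monitor(out_path: str, seconds: float, interval: float = 0.5) -> list[Sample]:
    # Sample every `interval` seconds until `seconds` have elapsed.
    t0 = time.monotonic()
    samples = [take_sample(out_path, t0)]
    while samples[-1].t < seconds:
        time.sleep(interval)
        samples.append(take_sample(out_path, t0))
    return samples
```

For example, run monitor("/tmp/out.csv", 60) in a second process while PSPP exports to /tmp/out.csv, then compute amplification(samples[0], samples[-1]). The numbers in the original report (over 7 GB used for a 150 MB CSV) would correspond to a ratio of roughly 47. Other writers on the same filesystem also affect the free-space figure, so this is only a rough measure.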

The only file being written is the CSV, but the filesystem's free space is
dropping much faster than data is being written. No other temp files are being
placed in /tmp.
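One caveat about ls -ltr /tmp: it only lists files that still have a directory entry. On Linux, a process can create a file, unlink it, and keep using it through its open descriptor. That space still counts toward df until the descriptor is closed, but the file never appears in ls. This thread does not establish whether PSPP does this here. The hypothetical helper deleted_open_files below is just one way to check. It assumes Linux's /proc filesystem and permission to read the target process's /proc/<pid>/fd.

```python
import os


def deleted_open_files(pid: int) -> list:
    """Return (target, size) for each descriptor of `pid` that refers to a
    file which has been unlinked but is still open (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for fd in os.listdir(fd_dir):
        link = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(link)
            if not target.endswith(" (deleted)"):
                continue
            # stat() follows the /proc link to the unlinked file itself,
            # so st_size is the space that file is still holding.
            found.append((target, os.stat(link).st_size))
        except OSError:
            # The descriptor was closed between listdir() and readlink()/stat().
            continue
    return found
```

For example, find the PSPP process ID while the export runs (with pgrep pspp, say) and sum the sizes returned by deleted_open_files(pid). A large total would account for space that df reports as used but ls cannot show.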

I also reproduced this using a RAM-based filesystem. If you watch the usage, it
behaves the same way, so I don't think it's specific to dockerized filesystems,
but I may yet be wrong on that.

The link you shared is about a common problem when starting out with containers:
the build process creates lots of images, and as you build more of them, you
have to clean up. It's one of the first things you learn as you step into the
container world!

Appreciate the quick reply. It certainly was a shocking observation when I 
found it :-)

Cheers
Dave


On Dec 4, 2019, 8:29 AM -0600, Alan Mead <address@hidden>, wrote:
> Wow, that's a lot. Do you mean that 7GB of space is needed (for, I guess,
> temporary files)? And you did not observe that previously?
>
> Maybe the devs are familiar with Kubernetes; I only know the name. Can you
> describe the environment (e.g., OS)? And the PSPP version? In how many
> conversions have you observed this behavior?
>
> And you're sure this isn't a Kubernetes problem (like it making snapshots
> as it writes the file or something)? I ask because when I google this, it
> looks like there are sharp edges. Glancing through, these don't seem to
> directly and specifically address the behavior you're seeing, but it looks
> like there could be these kinds of issues with Kubernetes, and the PSPP devs
> wouldn't be able to help unless they knew Kubernetes:
>
> https://cntnr.io/whats-eating-my-disk-docker-system-commands-explained-d778178f96f1
> https://softwareengineeringdaily.com/2019/01/11/why-is-storage-on-kubernetes-is-so-hard/
>
> -Alan
>
>
> On 12/4/2019 6:40 AM, Dave Trollope wrote:
> > We just moved PSPP to Kubernetes containers, where we use it to extract CSVs
> > from SAV files. The SAV files are about 1 GB and each CSV is about 150 MB.
> >
> > We've watched the file system as it does this, and over 7 GB of the file
> > system is used while writing 150 MB. I assume the SAVE TRANSLATE command is
> > doing lots of seeks and insertions in the file, magnifying the file system
> > usage. Are there any options to limit this behavior?
> >
> > Here is the script we are using:
> >
> > GET FILE = "{}".
> >
> > SAVE TRANSLATE
> >  /OUTFILE="{}"
> >  /TYPE=CSV
> >  /FIELDNAMES
> >  /REPLACE
> >  /KEEP={}
> >  /MISSING=RECODE
> >  /CELLS=LABELS.
> >
> > Cheers
> > Dave
> >
>
> --
>
> Alan D. Mead, Ph.D.
> President, Talent Algorithms Inc.
>
> science + technology = better workers
>
> http://www.alanmead.org
>
> The irony of this ... is that the Internet is
> both almost-infinitely expandable, while at the
> same time constrained within its own pre-defined
> box. And if that makes no sense to you, just
> reflect on the existence of Facebook. We have
> the vastness of the internet and yet billions
> of people decided to spend most of them time
> within a horribly designed, fake-news emporium
> of a website that sucks every possible piece of
> personal information out of you so it can sell it
> to others. And they see nothing wrong with that.
>
> -- Kieren McCarthy, commenting on why we are not
>                    all using IPv6
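
The script in the original message contains {} placeholders, which suggests it is filled in programmatically before each run. The thread does not say how. Assuming Python's str.format, here is a minimal sketch of that templating step. build_script and pspp_string are hypothetical names. The sketch also quotes the file paths itself, doubling any embedded double quote, which is how PSPP string literals escape the quote character.

```python
def pspp_string(value: str) -> str:
    """Quote `value` as a PSPP string literal, doubling embedded quotes."""
    return '"' + value.replace('"', '""') + '"'


# Same commands as the script in the thread; quoting moved into pspp_string.
SCRIPT_TEMPLATE = """GET FILE = {sav}.

SAVE TRANSLATE
 /OUTFILE={csv}
 /TYPE=CSV
 /FIELDNAMES
 /REPLACE
 /KEEP={keep}
 /MISSING=RECODE
 /CELLS=LABELS.
"""


def build_script(sav_path: str, csv_path: str, variables: list) -> str:
    # /KEEP takes a space-separated list of variable names.
    if not variables:
        raise ValueError("/KEEP needs at least one variable name")
    return SCRIPT_TEMPLATE.format(
        sav=pspp_string(sav_path),
        csv=pspp_string(csv_path),
        keep=" ".join(variables),
    )
```

The resulting text can be written to a .sps file and run with pspp <file>.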

