bug-datamash

Re: "Segmentation fault" when input contains embedded NUL characters


From: Erik Auerswald
Subject: Re: "Segmentation fault" when input contains embedded NUL characters
Date: Mon, 6 Jun 2022 19:23:44 +0200

Hi Catalin,

On Sat, Oct 24, 2020 at 02:44:18AM -0000, Catalin Patulea wrote:
> 
> $ datamash  --version
> datamash (GNU datamash) 1.6
> 
> $ dd if=/dev/zero bs=100 count=1 | datamash countunique 1
> 1+0 records in
> 1+0 records out
> 100 bytes copied, 0.000125612 s, 796 kB/s
> Segmentation fault
> 
> backtrace:
> 
> (gdb) bt
> #0  0x000055555555c95c in field_op_get_string_ptrs (op=0x55555557a5f0,
> sort_case_sensitive=sort_case_sensitive@entry=true, sort=true)
>     at src/field-ops.c:278
> [...]
> Simply, field_op_get_string_ptrs, and probably datamash in general,
> assumes input will not contain embedded NULs:
> https://github.com/agordon/datamash/blob/v1.6/src/field-ops.c#L279

Thanks for the bug report!

Indeed, many operations do not produce useful results when the input
data contains NUL bytes.

Your test input of 100 NUL bytes does work fine with the checksumming
operations, thus I would not say that GNU Datamash in general assumes
no NUL bytes in the input.

> For my application, the embedded NULs are an accident, and I can
> resolve that and resume using datamash. datamash does not need to
> support inputs with embedded NULs. But it should not crash on such
> inputs, either. Perhaps output a message warning the user that such
> inputs are not supported.

I have tested quite a few GNU datamash operations with input data
comprising only NUL bytes.  Only the "unique" and "countunique" operations
resulted in memory errors and crashes in my tests.

The reason seems to be a missing bounds check in the
field_op_get_string_ptrs() function.  I have added the bounds
check in git commit 71bfdc5aa07c47496bd9dc8d9f6c60c77804d256
(https://git.savannah.gnu.org/gitweb/?p=datamash.git;a=commit;h=71bfdc5aa07c47496bd9dc8d9f6c60c77804d256).
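
To illustrate the kind of check involved, here is a minimal sketch.
It is not the code from the commit: the function name
collect_string_ptrs() and its parameters are hypothetical, and it
assumes the strings are stored back to back in one buffer, each
terminated by a NUL byte, while the pointer array was sized from an
expected number of values.  Embedded NUL bytes in the input then yield
more strings than expected, and an unchecked loop would write past the
end of the pointer array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the datamash source: record the start of each
   NUL-terminated string stored back to back in buf[0..used).
   Stores at most max_ptrs pointers and returns the number stored.  */
static size_t
collect_string_ptrs (const char *buf, size_t used,
                     const char **ptrs, size_t max_ptrs)
{
  size_t n = 0;
  const char *p = buf;
  const char *end = buf + used;

  /* The "n < max_ptrs" test is the bounds check: without it, a buffer
     holding more terminators than expected would overflow ptrs[].  */
  while (p < end && n < max_ptrs)
    {
      ptrs[n++] = p;

      /* memchr() never reads past end, unlike strlen() on a tail that
         lacks a terminator.  */
      const char *nul = memchr (p, '\0', (size_t) (end - p));
      if (nul == NULL)
        break;
      p = nul + 1;
    }

  return n;
}
```

With the six-byte buffer "a\0\0\0b\0" and room for only two pointers,
this stops after two entries instead of storing all four string starts.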

Best regards,
Erik


