[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
C++ hash map usage
From: |
Gavin Smith |
Subject: |
C++ hash map usage |
Date: |
Mon, 14 Oct 2024 13:25:29 +0100 |
On Sun, Oct 06, 2024 at 02:28:02PM +0200, Patrice Dumas wrote:
> On Sun, Oct 06, 2024 at 12:09:45PM +0100, Gavin Smith wrote:
> > On Sun, Oct 06, 2024 at 09:53:22AM +0200, Patrice Dumas wrote:
> > > > How much slower would the linear search actually be?
> > >
> > > It is much slower (if I recall well, it was something like 100 times
> > > slower for the texi2any manual). I juste tested the overall effect and
> > > for the pure C demonstrator for the Texinfo manual, time is about 0.375
> > > s with linear search and 0.305 with the hash map.
>
> > Does it scale with the size of the input? Perhaps a certain part of
> > this 0.07 sec is fixed regardless of the size of the input file. Have
> > you tried it with a larger manual like glibc or elisp?
>
> I think that it should be worse for bigger manuals, as the linear search
> time increases with the number of entries for each search, while the
> hashmap time traversal should be less dependent to the number of
> entries.
>
> For the elisp manual, the times are 1.12s with hashmaps and 1.51s
> with linear search.
I get a similar comparison.
I understand that the C++ hash maps make a large difference in
relative terms, but it is still no more than half a second shorter run
time. I doubt this is worth the extra complexity of having part of
the source code in C++.
Even if there is back-up code if a C++ compiler isn't available, it will
tend not be tested and may become buggy and/or inefficient over time. The
presence of C++ code will also make more C++ more likely in the future.
It may be possible to get almost the same amount of efficiency with a
relatively small amount of C code.
I thought it would be interesting to see the hash usage for some manuals
(tested with teximakehtml):
manual number of keys number of buckets
elisp 9779 10273
libc 8018 10273
texinfo 3997 5087
gawk 5079 5087
m4 793 1109
(groff.texi gave a segmentation fault - I haven't investigated.)
These numbers do not seem so high that we need an ultra-optimised
algorithm (developed by C++ boffins) to deal with them - anything faster
than a linear search would probably good enough.
I have some spare time over the next couple of days so may be able
to come up with something.
We do not need the full level of generality that a general purpose library
in C would likely provide as the hashmap is from std::string (char * from
the C code) and maps to an int, and moreover keys only take the value 1 or
are undefined.
Tested with the following patch:
diff --git a/tp/Texinfo/XS/convert/call_html_cxx_function.cpp
b/tp/Texinfo/XS/convert/call_html_cxx_function.cpp
index 429d0ab600..4d174da100 100644
--- a/tp/Texinfo/XS/convert/call_html_cxx_function.cpp
+++ b/tp/Texinfo/XS/convert/call_html_cxx_function.cpp
@@ -52,6 +52,7 @@ hashmap_register_id (CONVERTER *self, const char *in_string)
CXX_HASHMAP *registered_id = (CXX_HASHMAP *)self->registered_ids_hashmap;
std::string string (in_string);
(*registered_id)[string] = 1;
+ std::cout << registered_id->size() << ":" << registered_id->bucket_count()
<< "\n";
}
void
diff --git a/tp/Texinfo/XS/teximakehtml.c b/tp/Texinfo/XS/teximakehtml.c
index 16029d4bd8..9f91ae5a06 100644
--- a/tp/Texinfo/XS/teximakehtml.c
+++ b/tp/Texinfo/XS/teximakehtml.c
@@ -314,9 +314,8 @@ main (int argc, char *argv[])
&convert_options,
/* default, use C++ hashmap if available */
0);
- /* to test linear search
- CONVF_string_list);
- */
+ /* to test linear search */
+ //CONVF_string_list);
free_strings_list (&converter_texinfo_language_config_dirs);
free_strings_list (&texinfo_language_config_dirs);
- Re: Flood of commits from July?, (continued)
- Re: Flood of commits from July?, Gavin Smith, 2024/10/06
- C++ hashmap implementation, Gavin Smith, 2024/10/06
- Re: C++ hashmap implementation, Patrice Dumas, 2024/10/06
- Re: C++ hashmap implementation, Patrice Dumas, 2024/10/06
- Re: C++ hashmap implementation, Patrice Dumas, 2024/10/06
- Re: C++ hashmap implementation, Patrice Dumas, 2024/10/06
- Re: C++ hashmap implementation, Patrice Dumas, 2024/10/06
- Re: C++ hashmap implementation, Gavin Smith, 2024/10/15
- Re: C++ hashmap implementation, Patrice Dumas, 2024/10/15
- Re: Flood of commits from July?, Patrice Dumas, 2024/10/06
- C++ hash map usage,
Gavin Smith <=
- Re: C++ hash map usage, Gavin Smith, 2024/10/14
- Re: C++ hash map usage, Patrice Dumas, 2024/10/14
- Re: C++ hash map usage, Gavin Smith, 2024/10/15
- Re: C++ hash map usage, Patrice Dumas, 2024/10/15
- Re: C++ hash map usage, Gavin Smith, 2024/10/17
- Re: C++ hash map usage, Patrice Dumas, 2024/10/17
- Re: C++ hash map usage, Patrice Dumas, 2024/10/17
- Re: C++ hash map usage, Gavin Smith, 2024/10/18
- Re: C++ hash map usage, Gavin Smith, 2024/10/19
- Re: Flood of commits from July?, Patrice Dumas, 2024/10/06