Re: [PATCH] Optimise memset on i386
From: Vladimir 'φ-coder/phcoder' Serbinenko
Subject: Re: [PATCH] Optimise memset on i386
Date: Fri, 25 Jun 2010 20:04:41 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100515 Icedove/3.0.4
On 06/23/2010 11:38 PM, Colin Watson wrote:
> With this approach, one of the most noticeable time sinks is that
> setting a graphical video mode (I'm using the VBE backend) takes ages:
> 1.6 seconds, which is a substantial percentage of this project's total
> boot time. It turns out that most of this is spent initialising
> double-buffering: doublebuf_pageflipping_init calls
> grub_video_fb_create_render_target_from_pointer twice, and each call
> takes a little over 600 milliseconds. Now,
> grub_video_fb_create_render_target_from_pointer is basically just a big
> grub_memset to clear framebuffer memory, so this equates to under two
> frames per second. What's going on?
>
> It turns out that write caching is disabled on video memory when GRUB is
> running, so we take a cache stall on every single write, and it's
> apparently hard to enable caching without implementing MTRRs. People
> who know more about this than I do tell me that this can get
> unpleasantly CPU-specific at times, although I still hold out some hope
> that it's possible in GRUB.
>
>
On non-device memory GRUB should take advantage of the cache. On MIPS,
caching is enabled or disabled by accessing memory through a different
address, so all the infrastructure needed to differentiate cacheable
from non-cacheable memory is already present. Enabling the cache on
video memory is more troublesome, however. One of the reasons is that
cache mishandling produces bugs that are difficult to track down.
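To illustrate the MIPS remark: on MIPS32 the same physical memory is
visible through two unmapped windows, kseg0 (cached) and kseg1
(uncached), so choosing the address chooses the caching behaviour. The
sketch below shows only the address arithmetic. The helper and macro
names are hypothetical and are not GRUB's own.

```c
#include <stdint.h>

/* MIPS32 unmapped segments. Both alias the low 512 MiB of physical memory. */
#define SKETCH_KSEG0_BASE 0x80000000u  /* cached window */
#define SKETCH_KSEG1_BASE 0xa0000000u  /* uncached window */
#define SKETCH_PHYS_MASK  0x1fffffffu  /* low 29 bits = physical address */

/* Hypothetical helper: address that reaches `addr` through the cache. */
static inline uint32_t
sketch_to_cached (uint32_t addr)
{
  return (addr & SKETCH_PHYS_MASK) | SKETCH_KSEG0_BASE;
}

/* Hypothetical helper: address that reaches `addr` bypassing the cache. */
static inline uint32_t
sketch_to_uncached (uint32_t addr)
{
  return (addr & SKETCH_PHYS_MASK) | SKETCH_KSEG1_BASE;
}
```

Addresses are held in plain uint32_t values here so that the arithmetic
can be checked on any host. x86 has no equivalent aliasing, which is why
caching of video memory there has to be controlled by other means such
as MTRRs.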
> However, there's a way to substantially speed things up without that.
> The naïve implementation of grub_memset writes a byte at a time, and for
> that matter on i386 it compiles to a poorly-optimised loop rather than
> using REP STOS or similar. grub_memset is an inner loop practically by
> definition, and it's worth optimising. We can fix both of these
> weaknesses by importing the optimised memset from GNU libc: since it
> writes four bytes at a time except (sometimes) at the start and end, it
> should take about a quarter the number of cache stalls. And, indeed,
> measurement bears this out: instead of taking over 600 milliseconds per
> call to grub_video_fb_create_render_target_from_pointer (I think it was
> actually 630 or so, though I neglected to write that down), GRUB now
> takes about 160 milliseconds per call. Much better!
>
> The optimised memset is LGPLv2.1 or later, and I've preserved that
> notice, but as far as I know this should be fine for use in GRUB; it can
> be upgraded to LGPLv3, and that's just GPLv3 with some additional
> permissions. It's already assigned to the FSF due to being in glibc.
>
>
It's OK to use this code, but be sure to mention its origin. It's also OK
to keep its license unless big divergence is to be expected.
Did you test it on x86_64?
> +void *
> +grub_memset (void *s, int c, grub_size_t n)
> +{
> + unsigned char *p = (unsigned char *) s;
> +
> + while (n--)
> + *p++ = (unsigned char) c;
> +
> + return s;
> +}
>
This can be optimised the same way as the i386 part: just replace stos
with a loop that stores a word at a time through a pointer aligned to
the word size.
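A minimal sketch of that suggestion in portable C, assuming unsigned
long is the natural word size of the target. The name sketch_memset is
hypothetical. This is neither the glibc code nor the patch under review,
only an illustration of the head/body/tail structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-at-a-time memset: byte stores until the pointer is
   word-aligned, then whole-word stores, then byte stores for the tail. */
void *
sketch_memset (void *s, int c, size_t n)
{
  unsigned char *p = (unsigned char *) s;
  unsigned char b = (unsigned char) c;

  if (n >= sizeof (unsigned long))
    {
      /* Replicate the byte into every byte of a word:
         ~0UL / 0xff is 0x0101...01.  */
      unsigned long pattern = (~0UL / 0xff) * b;
      unsigned long *w;

      /* Head: at most sizeof (unsigned long) - 1 byte stores.  */
      while (((uintptr_t) p & (sizeof (unsigned long) - 1)) != 0)
        {
          *p++ = b;
          n--;
        }

      /* Body: one aligned store per word.  */
      w = (unsigned long *) p;
      while (n >= sizeof (unsigned long))
        {
          *w++ = pattern;
          n -= sizeof (unsigned long);
        }
      p = (unsigned char *) w;
    }

  /* Tail (or the whole buffer, if it was shorter than one word).  */
  while (n--)
    *p++ = b;

  return s;
}
```

Since every write to uncached memory stalls, the saving comes from the
body loop issuing one store where the byte loop issues
sizeof (unsigned long).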
> Thanks,
>
>
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko