
Re: Multipage requests for GNU Mach 1.3


From: Sergio Lopez
Subject: Re: Multipage requests for GNU Mach 1.3
Date: Sat, 18 Dec 2004 05:02:38 +0100

On Fri, 17 Dec 2004 11:09:26 +0000
"Neal H. Walfield" <neal@walfield.org> wrote:

> Hi,
> 
> > I've been playing a little with GNU Mach, and I think there is a
> > thing that could be nice to implement in it. In "vm/vm_fault.c",
> > when the kernel is requesting some data from a translator for a
> > memory_object, we can read this code:
> > 
> >     if ((rc = memory_object_data_request(object->pager,
> >             object->pager_request,
> >             m->offset + object->paging_offset,
> >             PAGE_SIZE, access_required)) != KERN_SUCCESS) {
> > 
> > And this is the syntax for m_o_d_request (from The GNU Mach
> > Reference Manual):
> > 
> >     kern_return_t seqnos_memory_object_data_request (
> >             memory_object_t memory_object,
> >             mach_port_seqno_t seqno,
> >             memory_object_control_t memory_control,
> >             vm_offset_t offset, 
> >             vm_offset_t length,
> >             vm_prot_t desired_access)
> > 
> > As you can see, the parameter for "length" is always "PAGE_SIZE"
> > (you know, 4K on x86) in GNU Mach. This means that for a translator
> > which works by reading and writing from a disk (like ext2fs), every
> > I/O operation is split up into 4K fragments.
> 
> A small nit: data_request is only for page in, not page outs.  That
> is, this does not affect every I/O operation, only input operations
> (and specifically those resulting from vm faults).  Page outs are only
> grouped together in memory_object_lock_request (vm/memory_object.c)
> which is only invoked from user space.  When Mach evicts pages, it
> calls vm_pageout_page (vm/vm_pageout.c) which only returns a single
> page at a time (using memory_object_data_return) rather than trying to
> coalesce them in, for instance, vm_object_terminate (vm/vm_object.c).
> Maybe you would be interested in looking at this problem after you get
> page in clusters to work.
> 

Yes, I forgot to talk about page outs :-) As you said,
m_o_lock_request calls m_o_data_return sending
multiple pages at once, but due to a limitation in libpager's API,
writing is done page by page:

      /* libpager's data_return path: each non-omitted page of the
         returned cluster is written back with its own pager_write_page
         call, i.e. one disk operation per 4K page.  */
      for (i = 0; i < npages; i++)
        if (!(omitdata & (1 << i)))
          pagerrs[i] = pager_write_page (p->upi,
                                         offset + (vm_page_size * i),
                                         data + (vm_page_size * i));

By changing libpager's API and doing a little work in the translators,
this issue can be solved easily. Also, I'll take a look at
vm_pageout_page and vm_pageout_scan (I've done that before, but
touching them made Mach somewhat unstable; I was probably doing it the
wrong way).
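
Just to sketch the direction (pager_write_pages below is hypothetical,
it is not part of the current libpager API): a batched callback would
let the loop above hand contiguous runs of pages to the translator, so
it could issue a single disk operation per run:

      /* Hypothetical batched callback; NOT part of today's libpager.  */
      error_t pager_write_pages (struct user_pager_info *upi,
                                 vm_offset_t offset, vm_address_t data,
                                 int npages);

      /* The per-page loop could then collapse runs of pages that are
         not marked in omitdata (only the first page of a run records
         the error here, which is enough for a sketch): */
      i = 0;
      while (i < npages)
        {
          int run;

          if (omitdata & (1 << i))
            {
              i++;
              continue;
            }
          for (run = 1;
               i + run < npages && !(omitdata & (1 << (i + run)));
               run++)
            ;
          pagerrs[i] = pager_write_pages (p->upi,
                                          offset + vm_page_size * i,
                                          data + vm_page_size * i, run);
          i += run;
        }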

> > But, in OSF Mach, things are a bit different. The memory_objects
> > have a property named "cluster_size", and "length" in m_o_d_request
> > is determined by that. I don't know where OSF Mach sets the value of
> > cluster_size, but we can do it in m_o_ready/m_o_set_attributes, so
> > every translator can set this as it wants.
> 
> Mac OS X, which is based on OSF's Mach, supports this in
> memory_object_change_attributes [1].  I think it makes sense to be
> compatible with their interface even if it means updating our API.
> 

Fine; in any case, adding multipage support requires API changes.
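
For reference, a rough sketch of what setting the cluster size might
look like, loosely modeled on the OSF/Mac OS X attribute interface
mentioned above (the flavor and structure names are OSF Mach's; they
do not exist in GNU Mach today):

      /* Illustrative only: GNU Mach's memory_object_change_attributes
         currently takes no cluster_size.  */
      memory_object_attr_info_data_t attr;
      kern_return_t err;

      attr.copy_strategy    = MEMORY_OBJECT_COPY_DELAY;
      attr.cluster_size     = 4 * vm_page_size;  /* ask for 16K clusters */
      attr.may_cache_object = TRUE;
      attr.temporary        = FALSE;

      err = memory_object_change_attributes (control,
                                             MEMORY_OBJECT_ATTRIBUTE_INFO,
                                             (memory_object_info_t) &attr,
                                             MEMORY_OBJECT_ATTR_INFO_COUNT);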

> > but even with this issue, benchmarks [1] (I've
> > made a fast (ugly, buggy and dirty) implementation over GNU Mach
> > to test it [2]) show that the performance of I/O operations is
> > slightly increased.
> 
> That looks promising.  I assume that you must have also changed
> libpager as hurd/libpager/data_request.c only supports length ==
> vm_page_size?  I looked in the CVS repository on bee.nopcode.org;
> however, I did not see a Hurd tree. 

Yes, I changed libpager, ext2fs and isofs for testing purposes. I've
just uploaded the Hurd tree to Bee's CVS (*.mp are the modified
directories).

If you look at that code, you'll find some calls to *_direct functions,
and that there are no memory allocations in file_page_read_multipage().
As I said before, the code in CVS contains other changes, not only the
ones related to multipage requests. Those *_direct functions are an
experimental change which makes the glue code insert the requested
pages directly into the pager's memory object.

I think I'll reimplement multipage support on a clean GNU Mach tree,
to keep the changes incremental and easy to review.
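
The core of the vm_fault.c change can stay quite small.  A minimal
sketch, assuming vm_object grows a cluster_size field (defaulting to
PAGE_SIZE when the pager never sets it) and leaving the partial-fill
problem below to the pager side:

      vm_size_t cluster = object->cluster_size ?
                          object->cluster_size : PAGE_SIZE;
      /* Align the faulting offset down to a cluster boundary.  */
      vm_offset_t base = (m->offset / cluster) * cluster;

      if ((rc = memory_object_data_request(object->pager,
              object->pager_request,
              base + object->paging_offset,
              cluster, access_required)) != KERN_SUCCESS) {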

> > But with this strategy we have a problem that must be resolved. Many
> > times, GNU Mach requests more pages than the translator (ext2fs in
> > my tests) can fill (if you are dumping a 17K file with a 16K
> > cluster_size (4 pages), the first call will fill all the pages, and
> > the second only one), and we must free the remaining pages some way.
> > I think that m_o_d_unavailable and m_o_d_error don't fit well for
> > this purpose, so I've hacked the glue code (linux/dev/glue/block.c)
> > to make "device_read" write the pages directly to the memory_object,
> > freeing the unused ones at the same time (probably there is a much
> > better way to do this ;-).
> 
> The memory object operates at the page level granularity.  So if you
> do a memory_object_data_supply for the pages you do have and
> memory_object_data_error for those you don't, it would seem to me that
> it should work.  Is this not the case?
> 

I think I tried m_o_d_error and had some trouble, but I can't
remember right now. Anyway, I'll take a look at this again.
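
From the translator side, that would look roughly like the sketch
below, assuming "data" holds "avail" bytes of a "cluster_size"-byte
request starting at "offset" (the names are illustrative):

      /* Supply the pages we could actually read...  */
      memory_object_data_supply (control, offset,
                                 (vm_offset_t) data, round_page (avail),
                                 VM_PROT_NONE /* lock_value */,
                                 FALSE /* precious */, MACH_PORT_NULL);

      /* ...and flag the tail of the cluster as an error so Mach can
         release the outstanding pages.  */
      if (round_page (avail) < cluster_size)
        memory_object_data_error (control, offset + round_page (avail),
                                  cluster_size - round_page (avail),
                                  KERN_FAILURE);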

> > What do you think about this?
> 
> This looks promising.  I look forward to seeing the patch and the
> results of benchmarks with various cluster sizes with ext2fs before I
> advocate upstream inclusion.  (Also, we need to think about copyright
> assignment to the FSF for both the Hurd and Mach if you have not done
> that yet.)

For the copyright issue, what do I need to do?

> 
> Thanks for your work,

Hey, playing with GNU Mach is really fun! ;-)



