Re: Multipage requests for GNU Mach 1,3

From: Neal H. Walfield
Subject: Re: Multipage requests for GNU Mach 1,3
Date: Fri, 17 Dec 2004 11:09:26 +0000
User-agent: Wanderlust/2.10.1 (Watching The Wheels) SEMI/1.14.6 (Maruoka) FLIM/1.14.6 (Marutamachi) APEL/10.6 Emacs/21.2 (i386-debian-linux-gnu) MULE/5.0 (SAKAKI)


> I've been playing a little with GNU Mach, and I think there is a thing
> that could be nice to implement in it. In "vm/vm_fault.c", when the
> kernel is requesting some data from a translator for a memory_object,
> we can read this code:
>       if ((rc = memory_object_data_request(object->pager,
>               object->pager_request,
>               m->offset + object->paging_offset,
>               PAGE_SIZE, access_required)) != KERN_SUCCESS) {
> And this is the syntax for m_o_d_request (from The GNU Mach Reference
> Manual):
>       kern_return_t seqnos_memory_object_data_request (
>               memory_object_t memory_object,
>               mach_port_seqno_t seqno,
>               memory_object_control_t memory_control,
>               vm_offset_t offset, 
>               vm_offset_t length,
>               vm_prot_t desired_access)
> As you can see, the parameter for "length" is always "PAGE_SIZE" (you
> know, 4K in x86) in GNU Mach. This means that for a translator which
> works by reading from and writing to a disk (like ext2fs), every I/O
> operation is split into 4K fragments.

A small nit: data_request is only used for page-ins, not page-outs.  That
is, this does not affect every I/O operation, only input operations
(and specifically those resulting from VM faults).  Page-outs are only
grouped together in memory_object_lock_request (vm/memory_object.c),
which is only invoked from user space.  When Mach evicts pages, it
calls vm_pageout_page (vm/vm_pageout.c), which returns only a single
page at a time (using memory_object_data_return) rather than trying to
coalesce them in, for instance, vm_object_terminate (vm/vm_object.c).
Maybe you would be interested in looking at this problem after you get
page-in clusters to work.
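Coalescing evicted pages could look roughly like the following sketch. It assumes the dirty pages of one object have already been collected as sorted, page-aligned byte offsets and that the page size is 4K; `struct pageout_run` and `coalesce_pageouts` are hypothetical names, not GNU Mach functions.  Each resulting run would presumably become one memory_object_data_return call instead of one call per page.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* assumption: x86 page size */

/* One contiguous range of dirty pages that could be handed back in a
   single data_return-style call (hypothetical type). */
struct pageout_run {
    unsigned long offset;   /* byte offset of the first page */
    unsigned long length;   /* length in bytes, a multiple of PAGE_SIZE */
};

/* Group sorted, page-aligned offsets into contiguous runs of at most
   MAX_LEN bytes.  RUNS must have room for at least N entries.
   Returns the number of runs written. */
static size_t coalesce_pageouts(const unsigned long *offsets, size_t n,
                                unsigned long max_len,
                                struct pageout_run *runs)
{
    size_t nruns = 0;

    assert(max_len >= PAGE_SIZE && max_len % PAGE_SIZE == 0);
    for (size_t i = 0; i < n; i++) {
        assert(offsets[i] % PAGE_SIZE == 0);
        if (nruns > 0) {
            struct pageout_run *last = &runs[nruns - 1];
            /* Extend the current run if this page directly follows it
               and the run has not yet reached the cluster limit. */
            if (offsets[i] == last->offset + last->length
                && last->length + PAGE_SIZE <= max_len) {
                last->length += PAGE_SIZE;
                continue;
            }
        }
        /* Otherwise start a new single-page run. */
        runs[nruns].offset = offsets[i];
        runs[nruns].length = PAGE_SIZE;
        nruns++;
    }
    return nruns;
}
```

For example, dirty pages at 0, 4K, 8K, 16K and 20K with a 16K limit yield two runs, (0, 12K) and (16K, 8K), so two messages would be sent instead of five.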

> But, in OSF Mach, things are a bit different. The memory_objects have a
> property named "cluster_size", and "length" in m_o_d_request is
> determined by that. I don't know where OSF Mach sets the value of
> cluster_size, but we can do it in m_o_ready/m_o_set_attributes, so every
> translator can set this as it wants.

Mac OS X, which is based on OSF's Mach, supports this in
memory_object_change_attributes [1].  I think it makes sense to be
compatible with their interface even if it means updating our API.

> This means that, when a page fault is triggered from a memory_object
> that needs data, vm_fault.c fills (cluster_size/PAGE_SIZE) pages,
> starting with the one that generated the fault. Many times, we read more
> data than we ever use,

This is true.  However, since the external memory managers are able to
specify the size, we can assume that they know best.  Indeed, it
often makes sense to read larger blocks from the disk, as reading 16k,
for instance, is only marginally more expensive than reading 4k.
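How the kernel might turn a manager-supplied cluster size into the request it sends is sketched below.  It assumes a 4K page and a per-object `cluster_size` field set by the manager (for instance through m_o_ready or m_o_set_attributes, as proposed above); `struct pager_attrs` and `cluster_request` are hypothetical names.  The request starts at the faulting page, and an unset or unusable cluster size falls back to a single page, matching current GNU Mach behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* assumption: x86 page size */

/* Round a byte offset down to the start of its page. */
#define TRUNC_PAGE(x) ((x) & ~(PAGE_SIZE - 1))

/* Hypothetical per-object attribute chosen by the memory manager;
   0 means the manager never set it. */
struct pager_attrs {
    unsigned long cluster_size;
};

/* Compute the offset and length of the data request issued for a
   fault at FAULT_OFFSET.  The request starts at the faulting page and
   spans cluster_size bytes; anything that is not a nonzero multiple
   of PAGE_SIZE falls back to a single page. */
static void cluster_request(const struct pager_attrs *attrs,
                            unsigned long fault_offset,
                            unsigned long *offset, unsigned long *length)
{
    unsigned long cs;

    assert(attrs != NULL && offset != NULL && length != NULL);
    cs = attrs->cluster_size;
    if (cs < PAGE_SIZE || cs % PAGE_SIZE != 0)
        cs = PAGE_SIZE;

    *offset = TRUNC_PAGE(fault_offset);
    *length = cs;
}
```

With a 16K cluster size, a fault at byte 16484 would produce a request for offset 16384 and length 16384, i.e. four pages.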

> but even with this issue, benchmarks [1] (I've
> made a fast (ugly, buggy and dirty) implementation over GNU Mach to
> test it [2]) show that the performance for I/O operations is slightly
> increased.

That looks promising.  I assume that you must have also changed
libpager, as hurd/libpager/data_request.c only supports length ==
vm_page_size?  I looked in the CVS repository on bee.nopcode.org;
however, I did not see a Hurd tree.  You may want to look at
resurrecting [2].  I recollect that when Roland did eventually look at
it (I can't seem to find any message in the list archives), he said
that he preferred to use byte offsets rather than page offsets.  But I
think that that was so long after I wrote that patch that I was
working on other things and didn't have time to fix it up.  I would be
happy to work with you on this, however.
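In its simplest form, lifting the length == vm_page_size restriction could mean walking the requested range one page at a time.  The sketch below is hypothetical: `handle_data_request`, `page_reader_t` and the `recorder` test reader are illustrative names, not libpager's API, and a 4K page is assumed.  It uses byte offsets throughout, in the spirit of Roland's preference mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* assumption: x86 page size */

/* Hypothetical per-page callback: fill the page at byte offset OFF.
   Returns 0 on success, nonzero on failure. */
typedef int (*page_reader_t)(void *ctx, unsigned long off);

/* Handle a request covering [offset, offset + length) by visiting each
   page in turn.  Returns the number of pages read successfully; stops
   at the first page whose read fails. */
static size_t handle_data_request(unsigned long offset, unsigned long length,
                                  page_reader_t read_page, void *ctx)
{
    size_t done = 0;

    /* Requests from the kernel are expected to be page-aligned. */
    assert(offset % PAGE_SIZE == 0);
    assert(length > 0 && length % PAGE_SIZE == 0);

    for (unsigned long off = offset; off < offset + length; off += PAGE_SIZE) {
        if (read_page(ctx, off) != 0)
            break;
        done++;
    }
    return done;
}

/* Illustrative reader: records the offsets it is asked for and fails
   for any offset at or beyond LIMIT (standing in for end of file). */
struct recorder {
    unsigned long seen[16];
    size_t count;
    unsigned long limit;
};

static int record_page(void *ctx, unsigned long off)
{
    struct recorder *r = ctx;

    if (off >= r->limit)
        return 1;
    if (r->count < 16)
        r->seen[r->count++] = off;
    return 0;
}
```

A real handler would also have to report the pages after a failure back to the kernel rather than simply stopping; this sketch only shows the iteration.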

> But with this strategy we have a problem that must be resolved. Many
> times, GNU Mach requests more pages than the translator (ext2fs in my
> tests) can fill (if you are dumping a 17K file with a 16K cluster_size
> (4 pages), the first call will fill all the pages, and the second only 1),
> and we must free them somehow. I think that m_o_d_unavailable and
> m_o_d_error don't fit well for this purpose, so I've hacked the glue
> code (linux/dev/glue/block.c) so that "device_read" writes the
> pages directly to the memory_object, freeing the unused ones at the same
> time (probably, there is a much better way to do this ;-).

The memory object operates at page-level granularity.  So if you
do a memory_object_data_supply for the pages you do have and
memory_object_data_error for those you don't, it would seem to me that
it should work.  Is this not the case?
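Following the suggestion above, a translator could split each multi-page request into a part to pass to memory_object_data_supply and a part to pass to memory_object_data_error.  The sketch assumes a 4K page and a known file size; `struct request_split` and `split_request` are hypothetical names.  It only computes the two ranges; the actual calls and the zero-filling of the last partial page would be up to the translator.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL  /* assumption: x86 page size */

/* Hypothetical result: which part of a request has backing data and
   which part lies entirely past the end of the file. */
struct request_split {
    unsigned long supply_offset, supply_length;  /* for data_supply */
    unsigned long error_offset, error_length;    /* for data_error */
};

/* Split the page-aligned request [offset, offset + length) against a
   file of FILE_SIZE bytes.  Pages overlapping the file are supplied;
   pages wholly beyond it are reported as errors. */
static struct request_split split_request(unsigned long offset,
                                          unsigned long length,
                                          unsigned long file_size)
{
    struct request_split s;
    unsigned long end = offset + length;
    /* First page boundary at or beyond the end of the file. */
    unsigned long file_end = (file_size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    unsigned long cut;

    assert(offset % PAGE_SIZE == 0 && length % PAGE_SIZE == 0);

    if (file_end <= offset)
        cut = offset;      /* request lies entirely past EOF */
    else if (file_end >= end)
        cut = end;         /* request lies entirely within the file */
    else
        cut = file_end;    /* request straddles EOF */

    s.supply_offset = offset;
    s.supply_length = cut - offset;
    s.error_offset = cut;
    s.error_length = end - cut;
    return s;
}
```

For the 17K file with a 16K cluster from the example above, the first request would be supplied in full, while the second would supply one page at 16K and report the three pages from 20K onward as errors.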

> What do you think about this?

This looks promising.  I look forward to seeing the patch and the
results of benchmarks with various cluster sizes with ext2fs before I
advocate upstream inclusion.  (Also, if you have not done so yet, we
need to think about copyright assignment to the FSF for both the Hurd
and Mach.)

Thanks for your work,

[2] http://lists.gnu.org/archive/html/bug-hurd/2002-04/msg00110.html
    and http://lists.gnu.org/archive/html/bug-hurd/2002-04/msg00203.html
