commit-hurd
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[SCM] GNU Mach branch, master, updated. v1.8-902-ge2332ece


From: Samuel Thibault
Subject: [SCM] GNU Mach branch, master, updated. v1.8-902-ge2332ece
Date: Fri, 5 Apr 2024 12:07:32 -0400 (EDT)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Mach".

The branch, master has been updated
       via  e2332eceff15dc37193a4f82fbac6a913af540ca (commit)
       via  13751a0fc146976ac722e8424f9c35c59d678f4f (commit)
       via  1a4ce71588630224412891fae16c59cde94190bf (commit)
       via  6e5e1f09ef6e1ae0b92d38f0c7960ef3fed9f63b (commit)
       via  7b87e10b89a190bca0d9e6c5e183984411840dc3 (commit)
      from  df19628b2b6665468e698e290dfd1568720ba042 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit e2332eceff15dc37193a4f82fbac6a913af540ca
Author: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date:   Fri Apr 5 18:01:52 2024 +0200

    SMP: force APIC
    
    We need it to properly driver interrupts etc. of APs

commit 13751a0fc146976ac722e8424f9c35c59d678f4f
Author: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date:   Fri Apr 5 17:51:06 2024 +0200

    linux: Do not enable in SMP, it is not MP-safe

commit 1a4ce71588630224412891fae16c59cde94190bf
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Fri Apr 5 18:18:50 2024 +0300

    vm: Mark entries as in-transition while wiring down
    
    When operating on the kernel map, vm_map_pageable_scan() does what
    the code itself describes as "HACK HACK HACK HACK": it unlocks the map,
    and calls vm_fault_wire() with the map unlocked. This hack is required
    to avoid a deadlock in case vm_fault or one of its callees (perhaps, a
    pager) needs to allocate memory in the kernel map. The hack relies on
    other kernel code being "well-behaved", in particular on that nothing
    will do any serious changes to this region of memory while the map is
    unlocked, since this region of memory is "owned" by the caller.
    
    Even if the kernel code is "well-behaved" and doesn't alter VM regions
    that it doesn't "own", it can still access adjacent regions. While this
    doesn't affect the region being wired down as such, it can still end up
    causing trouble due to extension & coalescence (merging) of VM entries.
    
    VM entry coalescence is an optimization where two adjacent VM entries
    with identical properties are merged into a single one that spans the
    combined region of the two original entries. VM entry extension is a
    similar an optimization where an existing VM entry is extended to cover
    an adjacent region, instead of a new VM entry being created to describe
    the region.
    
    These optimizations are a private implementation detail of vm_map, and
    (while they can be observed through e.g. vm_region) they are not
    supposed to cause any visible effects to how the described regions of
    memory behave; coalescence/extension and clipping happen automatically
    as needed when adding or removing mappings, or changing their
    properties. This is why it's fine for "well-behaved" kernel code to
    unknowingly cause extension or coalescence of VM entries describing a
    region by operating on adjacent VM regions.
    
    The "HACK HACK HACK HACK" code path relies on the VM entries in the
    region staying intact while it keeps the map unlocked, as it passes
    direct pointers to the entries into vm_fault_wire(), and also walks the
    list of entries in the region by following the vme_next pointers in the
    entries. Yet, this assumption is violated by the entries getting
    concurrently modified by other kernel code operating on adjacent VM
    regions, as described above. This is not only undefined behavior in the
    sense of the C language standard, but can also cause very real issues.
    
    Specifically, we've been seeing the VM subsystem deadlock when building
    Mach with SMP support and running a test program that calls
    mach_port_names() concurrently and repearedly. mach_port_names()
    implementation allocates and wires down memory, and when called from
    multiple threads, it was likely to allocate, and wire, several adjacent
    regions of memory, which would then cause entry coalescence/extension
    and clipping to kick in. The specific sequence of events that led to a
    deadlock appear to have been:
    
    1. Multiple threads execute mach_port_names() concurrently.
    2. One of the threads is wiring down a memory region, another is
       unwiring an adjacent memory region.
    3. The wiring thread has unlocked the ipc_kernel_map, and called into
       vm_fault_wire().
    4. Due to entry coalescence/extension, the entry the wiring thread was
       going to wire down now describes a broader region of memory, namely
       it includes an adjustent region of memory that has previously been
       wired down by the other thread that is about to unwire it.
    5. The wiring thread sets the busy bit on a wired-down page that the
       unwiring thread is about to unwire, and is waiting to take the map
       lock for reading in vm_map_verify().
    6. The unwiring thread holds the map lock for writing, and is waiting
       for the page to lose its busy bit.
    7. Deadlock!
    
    To prevent this from happening, we have to ensure that the VM entries,
    at least as passed into vm_fault_wire() and as used for walking the list
    of such entries, stay intact while we have the map unlocked. One simple
    way to achieve that that I have proposed previously is to make a
    temporary copy of the VM entries in the region, and pass the copies into
    vm_fault_wire(). The entry copies would not be affected by coalescence/
    extension, even if the original entries in the map are. This is however
    only straigtforward to do when there's just a single entry describing
    the while region, and there are further concerns with e.g. whether the
    underlying memory objects could, too, get coalesced.
    
    Arguably, making copies of the memory entries is making the hack even
    bigger. This patch instead implements a relatively clean solution that,
    arguably, makes the whole thing less of a hack: namely, making use of
    the in-transition bit on VM entries to prevent coalescence and any other
    unwanted effects. The entry in-transition bit was introduced for a very
    similar use case: the VM map copyout logic has to temporarily unlock the
    map to run its continuation, so it marks the VM entries it copied out
    into the map up to that point as being "in transition", asking other
    code to hold off making any serious changes to those entries. There's a
    companion "needs wakeup" bit that other code can set to block on the VM
    entry exiting this in-transition state; the code that puts an entry into
    the in-transition state is expected to, when unsetting the in-transition
    bit back, check for needs_wakeup being set, and wake any waiters up in
    that case, so they can retry whatever operation they wanted to do. There
    is no need to check for needs_wakeup in case of vm_map_pageable_scan(),
    however, exactly because we expect kernel code to be "well-behaved" and
    not make any attempts to modify the VM region.
    
    This relies on the in-transition bit inhibiting coalescence/extension,
    as implemented in the previous commit.
    
    Also, fix a tiny sad misaligned comment line.
    
    Reported-by: Damien Zammit <damien@zamaudio.com>
    Helped-by: Damien Zammit <damien@zamaudio.com>
    Message-ID: <20240405151850.41633-3-bugaevc@gmail.com>

commit 6e5e1f09ef6e1ae0b92d38f0c7960ef3fed9f63b
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Fri Apr 5 18:18:49 2024 +0300

    vm: Don't attempt to extend in-transition entries
    
    The in-transition mechanism exists to make it possible to unlock a map
    while still making sure some VM entries won't disappear from under you.
    This is currently used by the VM copyin mechanics.
    
    Entries in this state are better left alone, and extending/coalescing is
    only an optimization, so it makes sense to skip it if the entry to be
    extended is in transition. vm_map_coalesce_entry() already checks for
    this; check for it in other similar places too.
    
    This is in preparation for using the in-transition mechanism for wiring,
    where it's much more important that the entries are not extended while
    in transition.
    Message-ID: <20240405151850.41633-2-bugaevc@gmail.com>

commit 7b87e10b89a190bca0d9e6c5e183984411840dc3
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Fri Apr 5 18:18:48 2024 +0300

    vm: Fix use-after-free in vm_map_pageable_scan()
    
    When operating on the kernel map, vm_map_pageable_scan() does what
    the code itself describes as "HACK HACK HACK HACK": it unlocks the map,
    and calls vm_fault_wire() with the map unlocked. This hack is required
    to avoid a deadlock in case vm_fault or one of its callees (perhaps, a
    pager) needs to allocate memory in the kernel map. The hack relies on
    other kernel code being "well-behaved", in particular on that nothing
    will do any serious changes to this region of memory while the map is
    unlocked, since this region of memory is "owned" by the caller.
    
    This reasoning doesn't apply to the validity of the 'end' entry (the
    first entry after the region to be wired), since it's not a part of the
    region, and is "owned" by someone else. Once the map is unlocked, the
    'end' entry could get deallocated. Alternatively, a different entry
    could get inserted after the VM region in front of 'end', which would
    break the 'for (entry = start; entry != end; entry = entry->vme_next)'
    loop condition.
    
    This was not an issue in the original Mach 3 kernel, since it used an
    address range check for the loop condition, but got broken in commit
    023401c5b97023670a44059a60eb2a3a11c8a929 "VM: rework map entry wiring".
    Fix this by switching the iteration back to use an address check.
    
    This partly fixes a deadlock with concurrent mach_port_names() calls on
    SMP, which was
    
    Reported-by: Damien Zammit <damien@zamaudio.com>
    Message-ID: <20240405151850.41633-1-bugaevc@gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 i386/configfrag.ac  |  6 ++++++
 linux/configfrag.ac |  5 +++++
 vm/vm_map.c         | 57 ++++++++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 57 insertions(+), 11 deletions(-)


hooks/post-receive
-- 
GNU Mach



reply via email to

[Prev in Thread] Current Thread [Next in Thread]