grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [edk2] [PATCH 2/2] efinet: fix lost packets due to active MNP instan


From: Michael Chang
Subject: Re: [edk2] [PATCH 2/2] efinet: fix lost packets due to active MNP instances
Date: Tue, 21 Apr 2015 14:12:54 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Apr 20, 2015 at 02:30:00PM +0800, Michael Chang wrote:
> On Sun, Apr 19, 2015 at 11:01:11AM +0300, Andrei Borzenkov wrote:
> > EDK2 network stack is based on Managed Network Protocol which is layered
> > on top of Simple Management Protocol and does background polling. This
> > polling races with grub for received (and probably trasmitted) packets
> > which causes either serious slowdown or complete failure to load files.
> > 
> > Additionaly PXE driver creates two child devices - IPv4 and IPv6 - with
> > bound SNP instance as well. This means we get three cards for every
> > physical adapter when enumerating. Not only is this confusing, this
> > may result in grub ignoring packets that come in via the "wrong" card.
> > 
> > Example of device hierarchy is
> > 
> >  Ctrl[91] PciRoot(0x0)/Pci(0x3,0x0)
> >    Ctrl[95] PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)
> >      Ctrl[B4] PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0)
> >      Ctrl[BC] 
> > PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000)
> > 
> > Skip PXE created virtual devices and open SNP on base device exclusively.
> > This destroys all child MNP instances and stops background polling.  We
> > cannot do it for virtual devices anyway as PXE would probably attempt
> > to destroy them when stopped and current OVMF crashes when we try to do it.
> > 
> > Exclusive open cannot be done when enumerating cards, as it would destroy
> > PXE information we need to autoconfigure interface; and it cannot be done
> > during autoconfiguration as we need to do it for non-PXE boot as well. So
> > move SNP open to card ->open method and add matching ->close to clean up.
> > 
> > References: https://savannah.gnu.org/bugs/?41731
> > ---
> >  grub-core/net/drivers/efi/efinet.c | 147 
> > ++++++++++++++++++++++++++++---------
> >  1 file changed, 112 insertions(+), 35 deletions(-)
> > 
> > diff --git a/grub-core/net/drivers/efi/efinet.c 
> > b/grub-core/net/drivers/efi/efinet.c
> > index f171f20..5777016 100644
> > --- a/grub-core/net/drivers/efi/efinet.c
> > +++ b/grub-core/net/drivers/efi/efinet.c
> > @@ -142,9 +142,72 @@ get_card_packet (struct grub_net_card *dev)
> >    return nb;
> >  }
> >  
> > +static grub_err_t
> > +open_card (struct grub_net_card *dev)
> > +{
> > +  grub_efi_simple_network_t *net;
> 
> I'm not sure about adding null check for dev->efi_net here is necessary
> or not, as we don't close it in close_card so that reopening a closed
> interface would make open_protocol fail with EFI_ACCESS_DENIED and in
> turn makes the entire open_card funtion fail with GRUB_ERR_BAD_DEVICE.
> 
> > +
> > +  net = grub_efi_open_protocol (dev->efi_handle, &net_io_guid,
> > +                           GRUB_EFI_OPEN_PROTOCOL_BY_EXCLUSIVE);
> > +  if (! net)
> > +    /* This should not happen... Why?  */
> > +    return GRUB_ERR_BAD_DEVICE;
> > +
> > +  if (net->mode->state == GRUB_EFI_NETWORK_STOPPED
> > +      && efi_call_1 (net->start, net) != GRUB_EFI_SUCCESS)
> > +    return GRUB_ERR_BAD_DEVICE;
> > +
> > +  if (net->mode->state == GRUB_EFI_NETWORK_STOPPED)
> > +    return GRUB_ERR_BAD_DEVICE;
> > +
> > +  if (net->mode->state == GRUB_EFI_NETWORK_STARTED
> > +      && efi_call_3 (net->initialize, net, 0, 0) != GRUB_EFI_SUCCESS)
> > +    return GRUB_ERR_BAD_DEVICE;
> > +
> > +  dev->mtu = net->mode->max_packet_size;
> > +
> > +  dev->txbufsize = ALIGN_UP (dev->mtu, 64) + 256;
> > +  dev->txbuf = grub_zalloc (dev->txbufsize);
> > +  if (!dev->txbuf)
> > +    {
> > +      grub_print_error ();
> > +      return GRUB_ERR_BAD_DEVICE;
> > +    }
> > +  dev->txbusy = 0;
> > +
> > +  dev->rcvbufsize = ALIGN_UP (dev->mtu, 64) + 256;
> > +
> > +  grub_memcpy (dev->default_address.mac,
> > +          net->mode->current_address,
> > +          sizeof (dev->default_address.mac));
> > +
> > +  dev->efi_net = net;
> > +  return GRUB_ERR_NONE;
> > +}
> > +
> > +static void
> > +close_card (struct grub_net_card *dev)
> > +{
> > +  if (dev->txbuf)
> > +    {
> > +      grub_free (dev->txbuf);
> > +      dev->txbuf = NULL;
> > +    }
> > +
> > +  if (dev->rcvbuf)
> > +    {
> > +      grub_free (dev->rcvbuf);
> > +      dev->rcvbuf = NULL;
> > +    }
> > +
> > +  /* FIXME do we need to close Simple Network Protocol here? */
> 
> The question from me is why not? :)
> 
> If we don't close it, the consequence might prevent other application
> (for eg, the chainloaded one from grub) from using managed network
> protocol or related one ?
> 
> The rest of the patch looks good to me and a lot better than my
> workaround patch. Thanks for you work on this.
> 
> I can give this patch a test to see if it fixed the slowness issue I
> have also experienced in OVMF for IPv4 networking and also together with
> my net_bootp6 patch.

The patch works well with IPv4 as expected, I verfied it to fix the
slowness problem.

But with IPv6, there were some problems found.

1. It failed with "packet too big" in grub_net_send_ip6_packet. The
reason is the card not opened at that time so that the mtu is unset
(zero). That makes the packet size testing against mtu "too big".

The call stack looks like.

grub_net_link_layer_resolve grub_net_icmp6_send_request
grub_net_send_ip_packet grub_net_send_ip6_packet
    
Meanwhile the IPv4 call stack:

grub_net_link_layer_resolve grub_net_arp_send_request
send_ethernet_packet

It calls send_ethernet_packet which do not have the mtu size check and
also the open the card device if not opened.

2. I tried to get it going by adding card open to
grub_net_send_ip6_packet (), but experienced another problem as the link
layer resolve timeout due to the network interface's hardware mac
address unset (inf->hwaddress.mac). After some debugging I realized
that's in my net_bootp6 patch, the network interface is added during
handling of cached dhcpv6 reply packets using the hardware mac provided
by the card instance, which is again not yet opened.:(

The reason why IPv4 works is because it doesn't use card instance's mac
address but mac_addr from the bootp's ack packet.

It's possble to find mac_addr from the DHCPv6 Reply packets, but it's
unreliable you will always have one. I think it's related to what DUID
type is using in the CLIENT_ID option. The spec defines

DUID-LLT  1 DUID-EN   2 DUID-LL   3

With DUID-LLT, DUID-LL you can read the mac_addr (and type) but with
DUID-EN you are out of luck. The final round made me surrender is that
the cached reply packet in UEFI, the DUID seems to be undefined type
"4"..

3. Even I can add the card open earler before hadling the dhcpv6
packets, it will freeze at grub_efi_open_protocol if the option in use
is GRUB_EFI_OPEN_PROTOCOL_BY_EXCLUSIVE. I was actually using
GRUB_EFI_OPEN_PROTOCOL_GET_PROTOCOL in the entire test and am running
out of idea how to deal with that diversity.

Thanks,
Michael



reply via email to

[Prev in Thread] Current Thread [Next in Thread]