grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V2] ieee1275/ofdisk: retry on open and read failure


From: Daniel Kiper
Subject: Re: [PATCH V2] ieee1275/ofdisk: retry on open and read failure
Date: Wed, 5 Apr 2023 18:41:53 +0200
User-agent: NeoMutt/20170113 (1.7.2)

First of all, please start new thread when you post next version of the patch.

On Wed, Mar 29, 2023 at 11:00:13AM +0530, Mukesh Kumar Chaurasiya wrote:
> Sometimes, when booting from a very busy SAN, the access to the
> disk can fail and then grub will eventually drop to grub prompt.
> This scenario is more frequent when deploying many machines at
> the same time using the same SAN.
> This patch aims to force the ofdisk module to retry the open or
> read function after it fails. We use DEFAULT_RETRY_TIMEOUT, which
> is 15000ms or 15 seconds to specify the time it'll retry to
> access the disk before it definitely fails. The timeout can
> be changed by setting the environment variable
> ofdisk_retry_timeout.
>
> Signed-off-by: Diego Domingos <diegodo@linux.vnet.ibm.com>
> Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.vnet.ibm.com>
> ---
>  grub-core/disk/ieee1275/ofdisk.c | 65 +++++++++++++++++++++++++++++++-
>  1 file changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/grub-core/disk/ieee1275/ofdisk.c 
> b/grub-core/disk/ieee1275/ofdisk.c
> index c6cba0c8a..f4183a531 100644
> --- a/grub-core/disk/ieee1275/ofdisk.c
> +++ b/grub-core/disk/ieee1275/ofdisk.c
> @@ -24,6 +24,9 @@
>  #include <grub/ieee1275/ofdisk.h>
>  #include <grub/i18n.h>
>  #include <grub/time.h>
> +#include <grub/env.h>
> +
> +#define RETRY_DEFAULT_TIMEOUT 15000
>
>  static char *last_devpath;
>  static grub_ieee1275_ihandle_t last_ihandle;
> @@ -452,7 +455,7 @@ compute_dev_path (const char *name)
>  }
>
>  static grub_err_t
> -grub_ofdisk_open (const char *name, grub_disk_t disk)
> +grub_ofdisk_open_real (const char *name, grub_disk_t disk)
>  {
>    grub_ieee1275_phandle_t dev;
>    char *devpath;
> @@ -525,6 +528,41 @@ grub_ofdisk_open (const char *name, grub_disk_t disk)
>    return 0;
>  }
>
> +static grub_uint64_t
> +grub_ofdisk_disk_timeout(void)

s/grub_ofdisk_disk_timeout(void)/grub_ofdisk_disk_timeout (void)/

> +{
> +   if(grub_env_get("ofdisk_retry_timeout") != NULL)

if (grub_env_get ("ofdisk_retry_timeout") != NULL)

> +     {
> +     grub_uint64_t retry = 
> grub_strtoul(grub_env_get("ofdisk_retry_timeout"), 0, 10);

Please move variable declaration to the beginning of the function.

And you blindly assume grub_strtoul() call will not fail. This is not
true. Please fix it. The commit ac8a37dda (net/http: Allow use of
non-standard TCP/IP ports) is good example how it should be done.

And please document ofdisk_retry_timeout in the docs/grub.texi.
I think ofdisk_retry_timeout should be expressed in seconds.

> +     if(retry)
> +       return retry;
> +     }
> +
> +   return RETRY_DEFAULT_TIMEOUT;

In general please stick to the GRUB coding style [1]. You can also take
a look at grub-core/kern/efi/sb.c.

> +}
> +
> +static grub_err_t
> +grub_ofdisk_open (const char *name, grub_disk_t disk)
> +{
> +  grub_err_t err;
> +  grub_uint64_t timeout = grub_get_time_ms () + grub_ofdisk_disk_timeout();
> +
> + retry:
> +  err = grub_ofdisk_open_real (name, disk);
> +
> +  if (err == GRUB_ERR_UNKNOWN_DEVICE)
> +    {
> +      if (grub_get_time_ms () < timeout)
> +        {
> +          grub_dprintf ("ofdisk","Failed to open disk %s. Retrying...\n", 
> name);
> +          grub_errno = GRUB_ERR_NONE;
> +          goto retry;

Please use while() or for() instead of goto.

And are you sure you want retry at full speed? Should not you introduce,
e.g., 1s delay between subsequent grub_ofdisk_open_real() calls?

> +     }
> +    }
> +
> +  return err;
> +}
> +
>  static void
>  grub_ofdisk_close (grub_disk_t disk)
>  {
> @@ -568,7 +606,7 @@ grub_ofdisk_prepare (grub_disk_t disk, grub_disk_addr_t 
> sector)
>  }
>
>  static grub_err_t
> -grub_ofdisk_read (grub_disk_t disk, grub_disk_addr_t sector,
> +grub_ofdisk_read_real (grub_disk_t disk, grub_disk_addr_t sector,
>                 grub_size_t size, char *buf)
>  {
>    grub_err_t err;
> @@ -587,6 +625,29 @@ grub_ofdisk_read (grub_disk_t disk, grub_disk_addr_t 
> sector,
>    return 0;
>  }
>
> +static grub_err_t
> +grub_ofdisk_read (grub_disk_t disk, grub_disk_addr_t sector,
> +               grub_size_t size, char *buf)
> +{
> +  grub_err_t err;
> +  grub_uint64_t timeout = grub_get_time_ms () + grub_ofdisk_disk_timeout();
> +
> + retry:
> +  err = grub_ofdisk_read_real (disk, sector, size, buf);
> +
> +  if (err == GRUB_ERR_READ_ERROR)
> +    {
> +      if (grub_get_time_ms () < timeout)
> +        {
> +          grub_dprintf ("ofdisk","Failed to read disk %s. Retrying...\n", 
> (char*)disk->data);
> +          grub_errno = GRUB_ERR_NONE;
> +          goto retry;

Same comments as above...

Daniel

[1] https://www.gnu.org/software/grub/manual/grub-dev/grub-dev.html#Coding-style



reply via email to

[Prev in Thread] Current Thread [Next in Thread]