[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH V2] ieee1275/ofdisk: retry on open and read failure
From: |
Daniel Kiper |
Subject: |
Re: [PATCH V2] ieee1275/ofdisk: retry on open and read failure |
Date: |
Wed, 5 Apr 2023 18:41:53 +0200 |
User-agent: |
NeoMutt/20170113 (1.7.2) |
First of all, please start new thread when you post next version of the patch.
On Wed, Mar 29, 2023 at 11:00:13AM +0530, Mukesh Kumar Chaurasiya wrote:
> Sometimes, when booting from a very busy SAN, the access to the
> disk can fail and then grub will eventually drop to grub prompt.
> This scenario is more frequent when deploying many machines at
> the same time using the same SAN.
> This patch aims to force the ofdisk module to retry the open or
> read function after it fails. We use DEFAULT_RETRY_TIMEOUT, which
> is 15000ms or 15 seconds to specify the time it'll retry to
> access the disk before it definitely fails. The timeout can
> be changed by setting the environment variable
> ofdisk_retry_timeout.
>
> Signed-off-by: Diego Domingos <diegodo@linux.vnet.ibm.com>
> Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.vnet.ibm.com>
> ---
> grub-core/disk/ieee1275/ofdisk.c | 65 +++++++++++++++++++++++++++++++-
> 1 file changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/grub-core/disk/ieee1275/ofdisk.c
> b/grub-core/disk/ieee1275/ofdisk.c
> index c6cba0c8a..f4183a531 100644
> --- a/grub-core/disk/ieee1275/ofdisk.c
> +++ b/grub-core/disk/ieee1275/ofdisk.c
> @@ -24,6 +24,9 @@
> #include <grub/ieee1275/ofdisk.h>
> #include <grub/i18n.h>
> #include <grub/time.h>
> +#include <grub/env.h>
> +
> +#define RETRY_DEFAULT_TIMEOUT 15000
>
> static char *last_devpath;
> static grub_ieee1275_ihandle_t last_ihandle;
> @@ -452,7 +455,7 @@ compute_dev_path (const char *name)
> }
>
> static grub_err_t
> -grub_ofdisk_open (const char *name, grub_disk_t disk)
> +grub_ofdisk_open_real (const char *name, grub_disk_t disk)
> {
> grub_ieee1275_phandle_t dev;
> char *devpath;
> @@ -525,6 +528,41 @@ grub_ofdisk_open (const char *name, grub_disk_t disk)
> return 0;
> }
>
> +static grub_uint64_t
> +grub_ofdisk_disk_timeout(void)
s/grub_ofdisk_disk_timeout(void)/grub_ofdisk_disk_timeout (void)/
> +{
> + if(grub_env_get("ofdisk_retry_timeout") != NULL)
if (grub_env_get ("ofdisk_retry_timeout") != NULL)
> + {
> + grub_uint64_t retry =
> grub_strtoul(grub_env_get("ofdisk_retry_timeout"), 0, 10);
Please move variable declaration to the beginning of the function.
And you blindly assume grub_strtoul() call will not fail. This is not
true. Please fix it. The commit ac8a37dda (net/http: Allow use of
non-standard TCP/IP ports) is good example how it should be done.
And please document ofdisk_retry_timeout in the docs/grub.texi.
I think ofdisk_retry_timeout should be expressed in seconds.
> + if(retry)
> + return retry;
> + }
> +
> + return RETRY_DEFAULT_TIMEOUT;
In general please stick to the GRUB coding style [1]. You can also take
a look at grub-core/kern/efi/sb.c.
> +}
> +
> +static grub_err_t
> +grub_ofdisk_open (const char *name, grub_disk_t disk)
> +{
> + grub_err_t err;
> + grub_uint64_t timeout = grub_get_time_ms () + grub_ofdisk_disk_timeout();
> +
> + retry:
> + err = grub_ofdisk_open_real (name, disk);
> +
> + if (err == GRUB_ERR_UNKNOWN_DEVICE)
> + {
> + if (grub_get_time_ms () < timeout)
> + {
> + grub_dprintf ("ofdisk","Failed to open disk %s. Retrying...\n",
> name);
> + grub_errno = GRUB_ERR_NONE;
> + goto retry;
Please use while() or for() instead of goto.
And are you sure you want retry at full speed? Should not you introduce,
e.g., 1s delay between subsequent grub_ofdisk_open_real() calls?
> + }
> + }
> +
> + return err;
> +}
> +
> static void
> grub_ofdisk_close (grub_disk_t disk)
> {
> @@ -568,7 +606,7 @@ grub_ofdisk_prepare (grub_disk_t disk, grub_disk_addr_t
> sector)
> }
>
> static grub_err_t
> -grub_ofdisk_read (grub_disk_t disk, grub_disk_addr_t sector,
> +grub_ofdisk_read_real (grub_disk_t disk, grub_disk_addr_t sector,
> grub_size_t size, char *buf)
> {
> grub_err_t err;
> @@ -587,6 +625,29 @@ grub_ofdisk_read (grub_disk_t disk, grub_disk_addr_t
> sector,
> return 0;
> }
>
> +static grub_err_t
> +grub_ofdisk_read (grub_disk_t disk, grub_disk_addr_t sector,
> + grub_size_t size, char *buf)
> +{
> + grub_err_t err;
> + grub_uint64_t timeout = grub_get_time_ms () + grub_ofdisk_disk_timeout();
> +
> + retry:
> + err = grub_ofdisk_read_real (disk, sector, size, buf);
> +
> + if (err == GRUB_ERR_READ_ERROR)
> + {
> + if (grub_get_time_ms () < timeout)
> + {
> + grub_dprintf ("ofdisk","Failed to read disk %s. Retrying...\n",
> (char*)disk->data);
> + grub_errno = GRUB_ERR_NONE;
> + goto retry;
Same comments as above...
Daniel
[1] https://www.gnu.org/software/grub/manual/grub-dev/grub-dev.html#Coding-style
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [PATCH V2] ieee1275/ofdisk: retry on open and read failure,
Daniel Kiper <=