bug-grub

Strange behavior with grub (and disk failure)


From: Laurent Michel
Subject: Strange behavior with grub (and disk failure)
Date: 06 Jul 2001 12:24:02 -0400
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

Hi!

I am having some trouble installing grub on my hard disk, and I think I
ran into bugs. I ran grub in gdb to see what is going on and collected
quite a bit of information.

First, here is the system description:

Hardware: MB: Abit KT7-Raid
          CPU: Athlon TB 1.1 GHz
          RAM: 256 MB
          kernel: 2.4.4
          
The machine has three hard disks, two of them on ide0 (hda/hdb). The
system normally boots from hda. The third disk is on an ATA RAID
interface, i.e., hde on ide2.

The Linux root partition is on hde1.

I set up grub on a floppy first, and that works great. Now I am
trying to install it on the hard drive.

Here is the partition layout:
hda (hda1): hda1 W2K
hdb (hdb1, hdb5): hdb1 reiserfs, hdb5 ext2
hde (hde1, hde2): hde1 reiserfs, hde2 reiserfs

Here is how I tried to setup grub:

root (hd2,0)  ;; root fs is on hde1
setup (hd0)   ;; install grub in MBR of hda

When I run grub, I get a disk write error, as this output shows:


 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/reiserfs_stage1_5" exists... yes
 Running "embed /boot/grub/reiserfs_stage1_5 (hd0)"...  18 sectors are embedded.
succeeded
 Running "install /boot/grub/stage1 d (hd0) (hd0)1+18 p (hd2,0)/boot/grub/stage2"... failed

Error 29: Disk write error


I then ran grub under gdb to debug it. The failure happens when the
system tries to write to hde1.

Here is the stack trace:

#0  write_to_partition (map=0x806b150, drive=130, partition=65535, sector=1472473, size=1, buf=0x401dce00 "êp\202") at device.c:589
#1  0x805451f in install_func (arg=0x401ccd78 "/boot/grub/stage1 d (hd0) (hd0)1+18 p (hd2,0)/boot/grub/stage2", flags=1)
    at builtins.c:1919
#2  0x8055f04 in setup_func (arg=0x4017bc4e "(hd0)", flags=1) at builtins.c:3581
#3  0x80566b5 in enter_cmdline (heap=0x4017bc48 "setup (hd0)", forever=1) at cmdline.c:168
#4  0x80527ca in cmain () at stage2.c:907
#5  0x804a606 in init_bios_info () at common.c:282
#6  0x80495da in doit () at asmstub.c:120
#7  0x80497d2 in grub_stage2 () at asmstub.c:176
#8  0x8049596 in main (argc=1, argv=0xbffffc2c) at main.c:238
#9  0x4007bbcc in __libc_start_main () from /lib/libc.so.6

I stepped into write_to_partition to see what was going on and came
upon this piece of code:

577     #else
578       {
579         off_t offset = (off_t) sector * (off_t) SECTOR_SIZE;
580     
581         if (lseek (fd, offset, SEEK_SET) != offset)
582           {
583             errnum = ERR_DEV_VALUES;
584             return 0;
585           }
586       }
587     #endif
588       
589       if (write (fd, buf, size * SECTOR_SIZE) != (size * SECTOR_SIZE))
590         {
591           close (fd);
592           errnum = ERR_WRITE;
593           return 0;
594         }

The routine misbehaves right away.

At the call site of write_to_partition, the value passed in is
saved_sector - part_start:


(gdb) p saved_sector - part_start
$3 = 1472473

This is consistent with what we see on the stack (sector=1472473).

However, inside the routine, the first thing I see is:

(gdb) p sector
$1 = 65535

Even if I assume that the debugger is somehow confused, the
interesting part starts at line 579. sector is an int (4 bytes)
and is equal to:

(gdb) p sector
$16 = 32768
(gdb) whatis sector
type = int
(gdb) p sizeof(sector)
$17 = 4

SECTOR_SIZE is #define'd as 0x200 (a 512-byte sector size).

So the next excerpt from gdb is interesting:

579         off_t offset = (off_t) sector * (off_t) SECTOR_SIZE;
(gdb) n
581         if (lseek (fd, offset, SEEK_SET) != offset)
(gdb) p offset
$4 = 4619790794267537920

which is surprising, as the actual product ought to be 753906176
(1472473 * 512).

Now, take a look at this:

(gdb) whatis offset
type = off_t
(gdb) p sizeof(offset)
$5 = 8
(gdb) p sizeof(off_t)
$6 = 8


This caught me by surprise, so I checked with a small C program and got
an off_t of size 4:

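#include <stdio.h>
#include <sys/types.h>   /* off_t */

#define SECTOR_SIZE 0x200

int main()
{
   int sector = 1472473;
   off_t offset = (off_t)sector * (off_t)SECTOR_SIZE;

   /* print the size too; the cast makes %lld correct for 4- or 8-byte off_t */
   printf("sizeof(off_t) = %lu\n", (unsigned long)sizeof(off_t));
   printf("offset = %lld\n", (long long)offset);
   return 0;
}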

So I am a little confused here. Note that the seek actually succeeds,
but the write call fails, returning -1 with errno 9 (EBADF, "Bad file
number"). Note also that the argument to the open call was:

(gdb) p dev
$10 = 
"/dev/address@hidden@address@hidden@address@hidden&@address@hidden&@address@hidden&@"


Note that the file is opened O_RDONLY, yet we are trying to write to
it! So this code has me completely confused.

Would you be so kind as to tell me what exactly is going on?

BTW, I am using gdb 5.0. I compiled grub (0.5.96) myself from source
as follows:

./configure --prefix=/usr
make; make install

gcc is the following version:

thorgal:/usr/local/src/grub-0.5.96/grub# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.2/specs
gcc version 2.95.2 20000220 (Debian GNU/Linux)

from the standard stable Debian potato (2.2r3) distribution.

hde1 and hde2 are both mounted when I run the grub commands. Here is
the output of fdisk on /dev/hde, in case it helps:

Command (m for help): p

Disk /dev/hde: 16 heads, 63 sectors, 39870 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hde1   *         1     10159   5120104+  83  Linux
/dev/hde2         10160     39870  14974344   83  Linux

And the same for hda:

Command (m for help): p

Disk /dev/hda: 128 heads, 63 sectors, 935 cylinders
Units = cylinders of 8064 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1       935   3769888+   b  Win95 FAT32


My kernel is 2.4.4:
thorgal:/usr/local/src/grub-0.5.96/grub# uname -a
Linux thorgal 2.4.4 #19 Tue Jul 3 18:45:47 EDT 2001 i686 unknown

And the following modules are loaded:

thorgal:/usr/local/src/grub-0.5.96/grub# lsmod
Module                  Size  Used by
ide-floppy              9584   0  (autoclean)
ipt_state                960   3  (autoclean)
ipt_limit               1200  29  (autoclean)
iptable_filter          2080   0  (autoclean) (unused)
iptable_mangle          2048   0  (unused)
ipt_LOG                 3472   1 
ipt_MIRROR              1312   0  (unused)
ipt_MASQUERADE          1488   1 
ipt_TOS                 1248   0  (unused)
ipt_REDIRECT            1088   0  (unused)
iptable_nat            15184   0  [ipt_MASQUERADE ipt_REDIRECT]
ipt_REJECT              3328   0  (unused)
ip_tables              10432  13  [ipt_state ipt_limit iptable_filter iptable_mangle ipt_LOG ipt_MIRROR ipt_MASQUERADE ipt_TOS ipt_REDIRECT iptable_nat ipt_REJECT]
ip_conntrack           14240   2  [ipt_state ipt_MASQUERADE ipt_REDIRECT iptable_nat]
via686a                 8160   0  (unused)
eeprom                  3216   0  (unused)
adm1021                 5600   0  (unused)
sensors                 6144   0  [via686a eeprom adm1021]
i2c-isa                 1200   0  (unused)
i2c-viapro              3936   0  (unused)
i2c-core               13072   0  [via686a eeprom adm1021 sensors i2c-isa i2c-viapro]
rtc                     5376   0  (autoclean)
nls_iso8859-1           2864   0  (unused)
nls_cp437               4384   0  (unused)
vfat                    9104   0  (unused)
fat                    31488   0  [vfat]


I would appreciate any form of feedback. 

Thanks a lot,

-- 
  Laurent


