RE: parted - larger logical and physical block sizes on GPT disks
From: Elliott, Robert (Server Storage)
Subject: RE: parted - larger logical and physical block sizes on GPT disks
Date: Fri, 4 Nov 2005 08:51:42 -0600
There is a web site with presentations from Hitachi, Intel, LSI,
Microsoft, Seagate, and WD discussing the topic:
http://www.bigsector.org.
--
Rob Elliott, address@hidden
Hewlett-Packard Industry Standard Server Storage Advanced Technology
https://ecardfile.com/id/RobElliott
> -----Original Message-----
> From: K.G. [mailto:address@hidden
> Sent: Friday, November 04, 2005 8:22 AM
> To: Elliott, Robert (Server Storage); address@hidden;
> Sven Luther
> Cc: address@hidden; address@hidden
> Subject: Re: parted - larger logical and physical block sizes
> on GPT disks
>
> Hi,
>
> Only my opinion:
>
> On Wed, 2 Nov 2005 15:18:53 -0600 "Elliott, Robert (Server
> Storage)" <address@hidden> wrote:
> > I noticed a few things in parted's source code that might warrant
> > fixing, particularly for the GUID Partition Table (GPT) partition format
> > used by Extensible Firmware Interface (EFI) systems. The Unified EFI
> > Specification will discuss these issues.
> >
> > 1. Logical block sizes are not necessarily 512 bytes; they could be 1024,
> > 2048, or 4096 bytes (at least). Both the ATA and the SCSI block command
> > sets support this. ATA devices typically do not implement it; SCSI
> > logical units sometimes do. The code has "512" sprinkled throughout,
> > which will probably cause problems.
>
> Well, the sector size is a known problem in Parted. Until recently we
> didn't receive many reports about it, probably because about 99.9999999%
> of people are using 512-byte sectors. But now multiples of 512 bytes
> are beginning to be seen sometimes (in RAID systems? very big disks?) and
> Parted is mostly unusable when that happens.
> I believe this should be fixed in the whole program, but unfortunately
> this would probably involve a lot of work.
> Also, some file systems and disk labels are only described for 512-byte
> sectors, so this might be a problem. In the disk_atari.c I've
> recently written, I explicitly discard Atari disklabel probing if the
> sector size isn't 512; I guess we should probably add those kinds of
> tests for problematic FS/disklabels.
>
> > 2. Even if the logical block size is 512 bytes, the underlying physical
> > block size may be a multiple of that. The drive performs
> > read-modify-write when a full physical block is not accessed, incurring
> > a performance hit but maintaining compatibility with software that uses
> > 512-byte logical blocks.
> >
> > Serial ATA disks are expected to start doing this soon; their physical
> > block may contain 1, 2, 4, or 8 logical blocks (the ATA IDENTIFY DEVICE
> > command indicates how many). SCSI doesn't have a way to report this
> > type of behavior yet (it has always assumed that software would support
> > a larger logical block size) but it might be added to match ATA.
>
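To make the read-modify-write condition concrete, here is a minimal sketch in C. The function name and its parameters are made up for illustration and are not part of parted; `logical_per_physical` stands for the count (1, 2, 4, or 8) that the drive is said to report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a write starting at start_lba and spanning count
   logical blocks covers only whole physical blocks, so the drive should
   not need a read-modify-write cycle.  Hypothetical helper. */
bool write_is_physically_aligned(uint64_t start_lba, uint64_t count,
                                 uint32_t logical_per_physical)
{
    assert(logical_per_physical > 0);
    return start_lba % logical_per_physical == 0
        && count % logical_per_physical == 0;
}
```

With 8 logical blocks per physical block, an 8-block write at LBA 34 straddles two physical blocks, while the same write at LBA 40 does not.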
> Interesting. I guess this divides data structures into 2 sets: old ones
> which aren't aware of the logical vs physical disk block size issue and
> only consider logical sizes - and new ones like GPT which handle it fine
> with size fields and backward compatibility with systems that don't probe
> the physical sector size, right?
> (indeed there's a third set: the ones that just assume 512-byte sectors)
>
> > In this situation, it is important to align key structures like
> > partition boundaries on the physical block boundaries; if they are
> > unaligned, then accesses that are aligned to the start of the partition
> > will actually result in excessive read-modify-writes by the disk.
> >
> > For the GPT partition format, the first partition naturally starts at
> > LBA 34, which is fine for 512 and 1024 byte physical block sizes but not
> > good for 2048 or 4096 byte physical block sizes. Partition tools like
> > parted should, unless specifically requested otherwise by a
> > knowledgeable user, start aligning their GPT partitions on larger
> > boundaries (e.g. 128KiB would suffice for many years).
>
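The arithmetic behind the LBA 34 remark: with 512-byte logical blocks, LBA 34 sits at byte 17408, which is a multiple of 1024 but not of 2048 or 4096. A rough sketch of rounding the first usable LBA up to an alignment boundary follows; the function names are invented for this example and do not correspond to anything in parted.

```c
#include <assert.h>
#include <stdint.h>

/* Round lba up to the next multiple of blocks_per_align logical blocks. */
uint64_t align_lba_up(uint64_t lba, uint64_t blocks_per_align)
{
    assert(blocks_per_align > 0);
    uint64_t rem = lba % blocks_per_align;
    return rem == 0 ? lba : lba + (blocks_per_align - rem);
}

/* First LBA at or after first_usable_lba that falls on an
   alignment_bytes boundary.  The alignment is assumed to be a whole
   number of logical blocks. */
uint64_t first_aligned_lba(uint64_t first_usable_lba,
                           uint32_t logical_block_size,
                           uint64_t alignment_bytes)
{
    assert(logical_block_size > 0);
    assert(alignment_bytes % logical_block_size == 0);
    return align_lba_up(first_usable_lba,
                        alignment_bytes / logical_block_size);
}
```

For 512-byte logical blocks, a 128KiB alignment moves the first partition from LBA 34 to LBA 256, and a 4096-byte alignment moves it to LBA 40.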
> I believe this could be easily done with a constraint in *_partition_align
> functions. I think that when we get the logical and physical disk block
> size and handle the logical size cleanly, we should put that kind of
> alignment for most disklabels (even if they know nothing about
> logical/physical sector sizes) but this might be a problem if "cylinder"
> alignment is needed.
>
> > Excerpts from disk_gpt.c that might have problems:
> > typedef struct _GuidPartitionTableHeader_t {
> >         uint64_t Signature;
> >         uint32_t Revision;
> >         uint32_t HeaderSize;
> >         uint32_t HeaderCRC32;
> >         uint32_t Reserved1;
> >         uint64_t MyLBA;
> >         uint64_t AlternateLBA;
> >         uint64_t FirstUsableLBA;
> >         uint64_t LastUsableLBA;
> >         efi_guid_t DiskGUID;
> >         uint64_t PartitionEntryLBA;
> >         uint32_t NumberOfPartitionEntries;
> >         uint32_t SizeOfPartitionEntry;
> >         uint32_t PartitionEntryArrayCRC32;
> >         uint8_t Reserved2[512 - 92];
> > } __attribute__ ((packed)) GuidPartitionTableHeader_t;
> >
> > Comment: The header (and its Reserved2 field) actually fills up the
> > entire logical block, not just 512.
> >
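One possible way to express that comment in code is to keep only the 92 bytes of defined fields in the struct and compute the reserved tail from the device's logical block size at run time. This is just a sketch: `guid16_t`, `gpt_header_fields_t`, and `gpt_header_reserved_bytes` are hypothetical names, and `guid16_t` is only a 16-byte stand-in for `efi_guid_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte stand-in for parted's efi_guid_t. */
typedef struct { uint8_t bytes[16]; } guid16_t;

/* Only the defined header fields; the reserved tail is left out because
   its length depends on the logical block size. */
typedef struct {
        uint64_t Signature;
        uint32_t Revision;
        uint32_t HeaderSize;
        uint32_t HeaderCRC32;
        uint32_t Reserved1;
        uint64_t MyLBA;
        uint64_t AlternateLBA;
        uint64_t FirstUsableLBA;
        uint64_t LastUsableLBA;
        guid16_t DiskGUID;
        uint64_t PartitionEntryLBA;
        uint32_t NumberOfPartitionEntries;
        uint32_t SizeOfPartitionEntry;
        uint32_t PartitionEntryArrayCRC32;
} __attribute__ ((packed)) gpt_header_fields_t;

/* The fields add up to the 92 used in "Reserved2[512 - 92]". */
_Static_assert(sizeof(gpt_header_fields_t) == 92,
               "GPT header fields must be 92 bytes");

/* Reserved bytes following the defined fields within one logical block. */
size_t gpt_header_reserved_bytes(uint32_t sector_size)
{
        assert(sector_size >= sizeof(gpt_header_fields_t));
        return sector_size - sizeof(gpt_header_fields_t);
}
```

For a 512-byte block this gives the familiar 420 reserved bytes; for a 4096-byte block it gives 4004.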
> > data_start = 2 + GPT_DEFAULT_PARTITION_ENTRY_ARRAY_SIZE / 512;
> > data_end = dev->length - 2
> > - GPT_DEFAULT_PARTITION_ENTRY_ARRAY_SIZE / 512;
> >
> > Comment: The logical block size is not always 512 bytes.
> > Comment: This probably leads to the first partition starting at LBA 34,
> > which is not aligned for 2048 or 4096 byte sectors.
> >
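A sketch of the same computation with the sector size passed in rather than hard-coded. `ENTRY_ARRAY_BYTES` is an assumption: 16384 is the value implied by the LBA 34 figure (2 + 16384 / 512 = 34), standing in here for GPT_DEFAULT_PARTITION_ENTRY_ARRAY_SIZE. The function names are hypothetical, and this does not include any extra alignment of the first partition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the partition entry array in bytes (see note above). */
#define ENTRY_ARRAY_BYTES 16384u

/* Number of sectors needed to hold `bytes`, rounded up. */
uint64_t sectors_for_bytes(uint64_t bytes, uint32_t sector_size)
{
        assert(sector_size > 0);
        return (bytes + sector_size - 1) / sector_size;
}

/* First data sector: protective MBR + GPT header + entry array. */
uint64_t gpt_data_start(uint32_t sector_size)
{
        return 2 + sectors_for_bytes(ENTRY_ARRAY_BYTES, sector_size);
}

/* Mirrors the quoted data_end expression, with the device length given
   in sectors and the sector size no longer fixed at 512. */
uint64_t gpt_data_end(uint64_t dev_length, uint32_t sector_size)
{
        uint64_t entry_sectors =
                sectors_for_bytes(ENTRY_ARRAY_BYTES, sector_size);
        assert(dev_length > 2 + entry_sectors);
        return dev_length - 2 - entry_sectors;
}
```

With 512-byte sectors this reproduces the current result of 34; with 4096-byte sectors the entry array needs only 4 sectors, so data would start at LBA 6.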
> >
> > if (!ped_device_read (dev, gpt, sector,
> >                       sizeof (GuidPartitionTableHeader_t) / 512))
> >
> > Comment: The logical block size is not always 512 bytes.
> >
> > if ((PedSector) PED_LE64_TO_CPU (gpt.AlternateLBA)
> >     < disk->dev->length - 1) {
> >         char zeros[512];
> >
> > #ifndef DISCOVER_ONLY
> >         if (ped_exception_throw (
> >                 PED_EXCEPTION_ERROR,
> >                 PED_EXCEPTION_FIX | PED_EXCEPTION_CANCEL,
> >                 _("The backup GPT table is not at the end of the disk, as it "
> >                   "should be. This might mean that another operating system "
> >                   "believes the disk is smaller. Fix, by moving the backup "
> >                   "to the end (and removing the old backup)?"))
> >             == PED_EXCEPTION_CANCEL)
> >                 goto error;
> >
> >         write_back = 1;
> >         memset (zeros, 0, 512);
> >         ped_device_write (disk->dev, zeros,
> >                           PED_LE64_TO_CPU (gpt.AlternateLBA),
> >                           1);
> > #endif /* !DISCOVER_ONLY */
> > Comment: The logical block size is not always 512 bytes.
> >
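For the fixed `char zeros[512]` buffer, one option might be to allocate a zero-filled buffer sized to the device's logical block instead. The helper below is hypothetical and deliberately leaves out the ped_device_write call, since how the sector size would be obtained from the device is not shown in the excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate one zero-filled logical block of sector_size bytes.
   Returns NULL on allocation failure; the caller must free() it. */
unsigned char *alloc_zeroed_block(uint32_t sector_size)
{
        assert(sector_size > 0);
        return calloc(1, sector_size);
}
```

A one-sector write of this buffer would then clear the whole old backup header regardless of the logical block size.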
> > ...
> > etc. (search on "512" to find likely problems)
>
> As I said before, disk_gpt.c is only a small part of the problem... :/
>
> Cheers,
> Guillaume Knispel
>