[Operating Systems] Formatting partitions larger than 16 TB

File systems have limits. That is no surprise. ext3 was limited to a file system size of 16 TB. If you needed more space, you had to use another file system, for instance XFS or JFS, or split the capacity across multiple mount points.

ext4 was designed to allow far larger file systems than ext3. According to Wikipedia, ext4 has a maximum file system size of 1 EiB (about one exabyte, or 1,048,576 TiB).

Yet if you try to create a single large ext4 file system on any Linux distribution out there (including OEL 6.1, as of 18 August 2011), you end up with:

[root@localhost ~]# mkfs.ext4 /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4: Size of device /dev/iscsi/test too big to be expressed in 32 bits
	using a blocksize of 4096.

This post is about how to solve the issue.

The demo system

My demo system consists of one large 18 TB LUN managed by LVM, with a 17 TB logical volume, on Oracle Enterprise Linux (OEL 5.5):

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[root@localhost ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 19791.2 GB, 19791209299968 bytes
255 heads, 63 sectors/track, 2406144 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdb doesn't contain a valid partition table
[root@localhost ~]# vgdisplay iscsi
--- Volume group ---
VG Name               iscsi
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  2
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                1
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               18.00 TB
PE Size               4.00 MB
Total PE              4718591
Alloc PE / Size       4456448 / 17.00 TB
Free  PE / Size       262143 / 1024.00 GB
VG UUID               tdi4f2-3ZYr-c1P0-NuSl-i3w2-5qQl-K75guj
[root@localhost ~]# lvdisplay iscsi
--- Logical volume ---
LV Name                /dev/iscsi/test
VG Name                iscsi
LV UUID                8q1UrT-ludC-FEkT-NExO-4Gzd-cn5H-FYJcB1
LV Write Access        read/write
LV Status              available
# open                 0
LV Size                17.00 TB
Current LE             4456448
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           253:2

Creating file systems larger than 16 TB with ext4:

If you try to create an ext4 file system on the 17 TB logical volume:

[root@localhost ~]# mkfs.ext4 /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4: Size of device /dev/iscsi/test too big to be expressed in 32 bits
	using a blocksize of 4096.

OK. Maybe with ext4dev:

[root@localhost ~]# mkfs.ext4dev /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4dev: Size of device /dev/iscsi/test too big to be expressed in 32 bits
	using a blocksize of 4096.

Nope, no success. The reason is that e2fsprogs (called e4fsprogs on OEL) cannot deal with file systems larger than about 16 TB.

To be specific: Even with the most recent e2fsprogs 1.41.14 there is no way to create file systems larger than 16 TB.
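The error message itself hints at where the limit comes from. A rough sketch of the arithmetic, assuming 32-bit block numbers and the 4096-byte block size named in the message (the function names are made up for illustration):

```shell
#!/bin/sh
# Largest file system (in bytes) that block numbers of the given width can address.
max_fs_bytes() {
    bits=$1
    block_size=$2
    echo $(( (1 << bits) * block_size ))
}

# Number of blocks a device of the given size (in bytes) would need.
blocks_needed() {
    echo $(( $1 / $2 ))
}

# 2^32 blocks of 4096 bytes: 17592186044416 bytes, i.e. 16 TiB
max_fs_bytes 32 4096

# A 17 TiB volume needs 4563402752 blocks, more than 2^32 - 1 = 4294967295
blocks_needed $(( 17 * 1024 * 1024 * 1024 * 1024 )) 4096
```

The second number is the same block count that mke2fs reports later for the 17 TB logical volume.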

But according to this post, it should have been working since June:

It’s taken way too long, but I’ve finally finished integrating the 64-bit patches into e2fsprogs’s mainline repository. All of the necessary patches should now be in the master branch for e2fsprogs. The big change from before is that I replaced Val’s changes for fixing up how mke2fs picked the correct fs-type profile from mke2fs.conf with something that I think works much better and leaves the code much cleaner. With this change you need to add the following to your /etc/mke2fs.conf file if you want to enable the 64-bit feature flag automatically for a big disk:

[fs_types]
	ext4 = {
		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
		auto_64-bit_support = 1 # <---- add this line
		inode_size = 256
	}

Alternatively you can change the features line to include the feature “64bit”; this will force the use of the 64-bit fields, and double the size of the block group descriptors, even for smaller file systems that don’t require the 64-bit support. (This was one of my problems with Val’s implementation; it forced the mke2fs.conf file to always enable the 64-bit feature flag, which would cause backwards compatibility issues.) This might be a good thing to do for debugging purposes, though, so this is an option which I left open, but the better way of doing things is to use the auto_64-bit_support flag.

So the change must be there. A short look at the ‘WIP’ (work-in-progress) branch of e2fsprogs confirmed the integration.
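Before relying on the automatic behaviour described in the quoted post, it may help to check whether the flag is actually present in the configuration. The following is only a crude sketch: `has_auto_64bit` is a made-up helper name, and the check does not verify that the line sits inside the ext4 stanza.

```shell
#!/bin/sh
# Succeeds if the given mke2fs.conf-style file contains "auto_64-bit_support = 1".
# Crude check: it does not verify which fs_types stanza the line belongs to.
has_auto_64bit() {
    grep -Eq '^[[:space:]]*auto_64-bit_support[[:space:]]*=[[:space:]]*1' "$1" 2>/dev/null
}

conf=${1:-/etc/mke2fs.conf}
if has_auto_64bit "$conf"; then
    echo "$conf: auto_64-bit_support is enabled"
else
    echo "$conf: auto_64-bit_support is not set"
fi
```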

So I tried to build the most recent e2fsprogs (remember: these are *development* tools, use at your OWN RISK):

[root@localhost ~]# git clone git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
[root@localhost ~]# cd e2fsprogs
[root@localhost e2fsprogs]# mkdir build ; cd build/
[root@localhost build]# ../configure
[root@localhost build]# make
[root@localhost build]# make install
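After building, it can be worth confirming which version a given binary reports before pointing it at a real device. The sketch below parses a version banner of the form seen in this post; the helper name `is_64bit_capable` and the threshold of 1.42 are assumptions drawn from the observation here that 1.41.x fails and 1.42-WIP works.

```shell
#!/bin/sh
# Succeeds if a version banner such as "mke2fs 1.42-WIP (02-Jul-2011)"
# reports e2fsprogs 1.42 or newer (threshold is an assumption, see above).
is_64bit_capable() {
    ver=$(printf '%s\n' "$1" | sed -n 's/^[a-z0-9]* \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$ver" ] || return 1
    major=${ver% *}
    minor=${ver#* }
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 42 ]; }
}

for banner in "mke4fs 1.41.9 (22-Aug-2009)" "mke2fs 1.42-WIP (02-Jul-2011)"; do
    if is_64bit_capable "$banner"; then
        echo "$banner: new enough"
    else
        echo "$banner: too old"
    fi
done
```

On a real system the banner could be taken from the first line of `./misc/mke2fs -V 2>&1` inside the build directory.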

So let's try to create a file system:

[root@localhost misc]# ./mke2fs -O 64bit,has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize -i 4194304 /dev/iscsi/test
mke2fs 1.42-WIP (02-Jul-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
4456448 inodes, 4563402752 blocks
228170137 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=6710886400
139264 block groups
32768 blocks per group, 32768 fragments per group
32 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
	2560000000, 3855122432

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 0 mounts or 0 days,
whichever comes first.  Use tune2fs -c or -i to override.
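The inode count in this output follows directly from the "-i" value. A small sketch of that arithmetic, assuming mke2fs simply divides the device size by the bytes-per-inode ratio (it may round per block group in other cases); `inode_count` is a made-up helper:

```shell
#!/bin/sh
# Approximate inode count for a device size and a bytes-per-inode ratio (-i).
inode_count() {
    echo $(( $1 / $2 ))
}

# Size of the logical volume: 4456448 extents of 4 MB each (from lvdisplay)
lv_bytes=$(( 4456448 * 4 * 1024 * 1024 ))

inode_count "$lv_bytes" 4194304   # -i 4194304, one inode per 4 MB: 4456448
inode_count "$lv_bytes" 8388608   # one inode per 8 MB would give 2228224
```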

OK. Seems to have worked. Let's check it:

[root@localhost misc]# mount /dev/iscsi/test /mnt
[root@localhost misc]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   18G  2.6G   14G  16% /
/dev/sda1                         99M   13M   82M  14% /boot
tmpfs                            502M     0  502M   0% /dev/shm
/dev/mapper/iscsi-test            17T  229M   17T   1% /mnt
[root@localhost misc]# mount | grep mnt
/dev/mapper/iscsi-test on /mnt type ext4 (rw)

As you can see, with the most recent development e2fsprogs it is possible to create ext4 file systems larger than 16 TB.

I even tried it with a 50 TB file system (because that is what I needed in my use case):

[root@localhost misc]# df -h
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/iscsi-test   50T  237M   48T   1% /mnt

Update:

Today I tested some more user space tools.

fsck

Perhaps the most important tool if journaling fails. I copied roughly 2 TB of data to the file system, which left 73% of my 6.5 million inodes (one inode per 8 MB) allocated. Running fsck on my demo system with 1 GB of memory yields:

[root@localhost ~]# fsck.ext4 -f /dev/iscsi/test
e2fsck 1.42-WIP (02-Jul-2011)
Pass 1: Checking inodes, blocks, and sizes
Error allocating block bitmap (4): Memory allocation failed

fsck is rather hungry for memory. Increasing the memory to 8 GB solved it. While fsck was running I saw memory consumption of up to 3.4 GB! So large file systems require a lot of memory for fsck, and even more with more inodes!
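A back-of-the-envelope estimate makes the "Error allocating block bitmap" failure on a 1 GB machine plausible. The sketch assumes a flat bitmap with one bit per block, which is an assumption about e2fsck internals and was not verified for this WIP build; the block count is the one tune2fs reports for the 50 TB file system.

```shell
#!/bin/sh
# Bytes needed for one flat bitmap with one bit per block (rounded up).
bitmap_bytes() {
    echo $(( ($1 + 7) / 8 ))
}

blocks=13414400000   # "Block count" of the 50 TB file system as reported by tune2fs -l
bytes=$(bitmap_bytes "$blocks")
echo "one block bitmap: $bytes bytes ($(( bytes / 1024 / 1024 )) MiB)"
```

Under that assumption a single block bitmap already needs about 1.6 GB, and fsck works with more than one bitmap, which fits the observed peak of 3.4 GB.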

resize2fs

After running fsck on my file system I tried to resize it:

[root@localhost sbin]# lvresize -l +7199 /dev/iscsi/test
  Extending logical volume test to 50.00 TB
  Logical volume test successfully resized
[root@localhost sbin]# resize2fs /dev/iscsi/test
resize2fs 1.42-WIP (02-Jul-2011)
resize2fs: New size too large to be expressed in 32 bits

As you can see, resizing the file system is not yet supported. It is therefore wise to create the file system at its final size from the start, since growing it is NOT possible!

tune2fs

tune2fs seems to work. At least it dumps the superblock contents:

[root@localhost sbin]# tune2fs -l /dev/iscsi/test
tune2fs 1.42-WIP (02-Jul-2011)
Filesystem volume name:   <none>
Last mounted on:          /mnt/mnt
Filesystem UUID:          a754e947-8b89-415d-909d-000e6c95c44a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              6550000
Block count:              13414400000
Reserved block count:     670720000
Free blocks:              13394134177
Free inodes:              1484526
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16
Inode blocks per group:   1
Flex block group size:    16
Filesystem created:       Wed Oct 19 17:09:06 2011
Last mount time:          Wed Oct 19 18:45:47 2011
Last write time:          Wed Oct 19 18:45:47 2011
Mount count:              1
Maximum mount count:      20
Last checked:             Wed Oct 19 18:35:36 2011
Check interval:           0 (<none>)
Lifetime writes:          2511 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      ea117174-a04a-412e-a067-7972804f83d7
Journal backup:           inode blocks

Setting properties works as well:

[root@localhost sbin]# tune2fs -L test /dev/iscsi/test
tune2fs 1.42-WIP (02-Jul-2011)
[root@localhost sbin]# tune2fs -l /dev/iscsi/test | head -10
tune2fs 1.42-WIP (02-Jul-2011)
Filesystem volume name:   test
Last mounted on:          /mnt/mnt
[...]
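The "Filesystem features" line of the tune2fs -l dump is the place to confirm that the 64bit flag really ended up in the superblock. A small sketch of such a check; `has_feature` is a made-up helper that reads tune2fs -l style text on standard input, and the sample lines are copied from the dump in this post.

```shell
#!/bin/sh
# Succeeds if the feature named in $1 appears on the "Filesystem features:"
# line of tune2fs -l output read from standard input.
has_feature() {
    awk -v want="$1" -F: '
        /^Filesystem features:/ {
            n = split($2, f, " ")
            for (i = 1; i <= n; i++) if (f[i] == want) found = 1
        }
        END { exit !found }'
}

sample='Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Block size:               4096'

if printf '%s\n' "$sample" | has_feature 64bit; then
    echo "64bit feature is set"
fi
```

Against a real device the input would come from `tune2fs -l /dev/iscsi/test` instead of the sample text.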

e4defrag

e4defrag is a new tool to defragment the ext4 file system. According to the man page:

e4defrag reduces fragmentation of extent based file. The file targeted by e4defrag is created on ext4 filesystem made with “-O extent” option (see mke2fs(8)). The targeted file gets more contiguous blocks and improves the file access speed.

I am not yet sure how this affects file systems used for Oracle datafiles. All I can say is that e4defrag seems to work with file systems larger than 16 TB:

[root@localhost sbin]# e4defrag /mnt/
ext4 defragmentation for directory(/mnt/)
[....]
        Success:                        [ 4772040/5065465 ]
        Failure:                        [ 293425/5065465 ]

The failures are from directories which cannot be defragmented.

Conclusion

With the most recent e2fsprogs (1.42-WIP) it is possible to create ext4 file systems larger than 16 TB.

If you do so remember the following:

  • the tools are still in development: use at your own risk!
  • tune the values for the automatic check (after x mounts / after y days) with tune2fs -c and -i
  • adjust the “-i” switch, which defines the bytes-per-inode ratio; the 17 TB example above uses -i 4194304 (one inode per 4 MB), while the 50 TB file system was created with one inode per 8 MB
  • the more inodes you create, the longer fsck takes and the more memory it needs
  • resizing the file system (growing / shrinking) is NOT possible at the moment
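The points above can be folded into a small planning helper that only prints a command line and never touches a device. This is a sketch, not a recommendation: `plan_mke2fs` is a made-up name, the feature list is copied from the mke2fs call earlier in this post, and the rule "add 64bit only when the block count exceeds 32 bits at a 4096-byte block size" is an assumption based on the error message.

```shell
#!/bin/sh
# Prints (does not run) a mke2fs command line for a device of the given size.
# Adds the 64bit feature only when the block count no longer fits in 32 bits.
plan_mke2fs() {
    device=$1
    size_bytes=$2
    bytes_per_inode=$3
    features=has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
    if [ "$(( size_bytes / 4096 > 4294967295 ))" -eq 1 ]; then
        features=64bit,$features
    fi
    echo "mke2fs -O $features -i $bytes_per_inode $device"
}

# 17 TiB volume, one inode per 4 MB, as in the example in this post
plan_mke2fs /dev/iscsi/test $(( 17 * 1024 * 1024 * 1024 * 1024 )) 4194304
```

Since growing is not possible, the size passed in should be the final size of the volume.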

http://blog.ronnyegner-consulting.de/2011/08/18/ext4-and-the-16-tb-limit-now-solved/