Options for performance improvements on very big Filesystems and high IOWAIT
I have an Ubuntu 16.04 backup server with 8x10TB HDDs on a SATA 3.0 backplane. The 8 hard disks are assembled into a RAID6 array with an ext4 filesystem on top. This filesystem stores a huge number of small files, which means very many seek operations but low I/O throughput. The files come from different servers and are snapshotted via rsnapshot every day (so many directory entries are hard links to the same files). Performance has been very poor since the filesystem (60TB net) exceeded 50% usage. At the moment, usage is at 75% and a
du -sch /backup-root/
takes several days(!). The machine has 8 cores and 16G of RAM. The RAM is fully used by the OS filesystem cache, and 7 of 8 cores are always idle because of IOWAIT.
Filesystem volume name: <none>
Last mounted on: /
Filesystem UUID: 5af205b0-d622-41dd-990e-b4d660c12bd9
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 912203776
Block count: 14595257856
Reserved block count: 0
Free blocks: 4916228709
Free inodes: 793935052
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 2048
Inode blocks per group: 128
RAID stride: 128
RAID stripe width: 768
Flex block group size: 16
Filesystem created: Wed May 31 21:47:22 2017
Last mount time: Sat Apr 14 18:48:25 2018
Last write time: Sat Apr 14 18:48:18 2018
Mount count: 9
Maximum mount count: -1
Last checked: Wed May 31 21:47:22 2017
Check interval: 0 (<none>)
Lifetime writes: 152 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 513933330
Default directory hash: half_md4
Directory Hash Seed: 5e822939-cb86-40b2-85bf-bf5844f82922
Journal backup: inode blocks
Journal features: journal_incompat_revoke journal_64bit
Journal size: 128M
Journal length: 32768
Journal sequence: 0x00c0b9d5
Journal start: 30179
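A few derived figures make the dump above easier to read. The following is a minimal Python sketch that only redoes arithmetic on values copied from the tune2fs output; the variable names are illustrative, and the figure of 6 data disks (8 disks minus 2 for RAID6 parity) is an assumption taken from the description of the array, not from the dump.

```python
# Values copied verbatim from the tune2fs output above.
BLOCK_SIZE = 4096
BLOCK_COUNT = 14595257856
FREE_BLOCKS = 4916228709
INODE_COUNT = 912203776
FREE_INODES = 793935052
RAID_STRIDE = 128
RAID_STRIPE_WIDTH = 768

# Assumption: RAID6 over 8 disks leaves 6 data disks per stripe.
DATA_DISKS = 8 - 2


def summarize():
    """Return derived figures as a dict (pure arithmetic, no disk access)."""
    used_blocks = BLOCK_COUNT - FREE_BLOCKS
    used_inodes = INODE_COUNT - FREE_INODES
    return {
        "size_tb": BLOCK_COUNT * BLOCK_SIZE / 1e12,
        "block_usage_pct": 100 * used_blocks / BLOCK_COUNT,
        "used_inodes": used_inodes,
        "inode_usage_pct": 100 * used_inodes / INODE_COUNT,
        "stripe_matches_geometry": RAID_STRIDE * DATA_DISKS == RAID_STRIPE_WIDTH,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The used-inode count is the figure that matters most for metadata-heavy operations such as du. The block usage computed from the superblock may differ from what df reports on a mounted filesystem.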
I'm lacking experience with this kind of filesystem usage. What options do I have to tune this? What filesystem would perform better in this scenario? Are there any options to use RAM for caching other than the OS built-in cache?
How do you handle very large numbers of small files on large RAID arrays?
Thanks,
Sebastian
ubuntu-16.04 ext4 performance-tuning
Faster disks, preferably SSD. As much RAM as possible for read caching. 16GiB isn't even on the same planet as enough RAM. Get LOTS of it, even 512GiB or more. And of course don't use RAID 6.
– Michael Hampton♦
7 hours ago
Thanks for your reply. I'm aware of the SSD option, but this makes the difference between a $7,000 server and a $70,000 server for backing up data. The RAM hint is a good one, but I fear that I will only get like-new filesystem performance if I totally avoid disk I/O for seek operations, which at 60TB net capacity means a 60TB RAM cache, doesn't it? I have avoided filesystems other than ext2/3/4 in the past, but now I am totally open to options in this direction, if they will help. :)
– t2m
7 hours ago
What's your recommendation for a RAID6 replacement at this disk configuration?
– t2m
7 hours ago
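On the 60TB RAM cache question in the comments above: to avoid most seeks during a directory walk, it is the metadata that needs to stay cached, not the file contents. A rough Python sketch of that sizing follows. The inode counts and the 256-byte on-disk inode size come from the tune2fs output in the question; the per-object in-memory sizes are placeholder assumptions (real values depend on the kernel and can be checked in /proc/slabinfo), and directory blocks and extent trees are ignored.

```python
INODE_COUNT = 912203776      # "Inode count" from the tune2fs output
FREE_INODES = 793935052      # "Free inodes" from the tune2fs output
ON_DISK_INODE_SIZE = 256     # "Inode size" from the tune2fs output

GIB = 1024 ** 3


def metadata_cache_gib(used_inodes, bytes_per_inode):
    """Size in GiB of a cache holding `bytes_per_inode` for every used inode."""
    return used_inodes * bytes_per_inode / GIB


used = INODE_COUNT - FREE_INODES

# Lower bound: the raw on-disk inode tables for the inodes in use.
on_disk = metadata_cache_gib(used, ON_DISK_INODE_SIZE)

# Assumption: roughly 1 KiB per cached inode plus 192 bytes per dentry in
# kernel memory. These are illustrative figures, not measured on this machine.
in_memory = metadata_cache_gib(used, 1024 + 192)

print(f"used inodes:          {used}")
print(f"on-disk inode tables: {on_disk:.1f} GiB")
print(f"in-memory (assumed):  {in_memory:.1f} GiB")
```

Under these assumptions the result is in the range of tens to low hundreds of GiB rather than 60TB. Hard-linked snapshots create many more directory entries than inodes, so the dentry share is probably underestimated here.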
asked 8 hours ago by t2m
3 Answers
I have a similar (albeit smaller) setup, with 12x 2TB disks in a RAID6 array, used for the very same purpose (rsnapshot backup server).
First, it is perfectly normal for du -hs to take so much time on such a large, and used, filesystem. This is especially true due to the -h option, which causes considerable and bursty CPU load in addition to the obvious I/O load.
Your slowness is due to the filesystem metadata being located in very distant (in LBA terms) blocks, causing many seeks. As a normal 7.2K RPM disk provides only about 100 IOPS, you can see how hours, if not days, are needed to load all metadata.
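The "hours, if not days" figure can be reproduced with back-of-the-envelope arithmetic. This is a minimal Python sketch, not a measurement: the 100 IOPS figure is the one quoted above, the used-inode count is derived from the tune2fs output in the question, and `inodes_per_seek` is a hypothetical locality factor introduced only for illustration. It also assumes a single-threaded walk that waits for one seek at a time, and it ignores directory blocks.

```python
def walk_time_days(used_inodes, iops, inodes_per_seek=1.0):
    """Estimate days needed to read all inode metadata from a cold cache.

    inodes_per_seek is a hypothetical locality factor: 1.0 means every inode
    costs its own seek; 16 would mean each 4 KiB block read (16 inodes of
    256 bytes) yields only inodes the walk actually needs.
    """
    seeks = used_inodes / inodes_per_seek
    seconds = seeks / iops
    return seconds / 86400


USED_INODES = 912203776 - 793935052  # inode count minus free inodes
IOPS = 100                           # figure quoted for a 7.2K RPM disk

for locality in (1.0, 4.0, 16.0):
    days = walk_time_days(USED_INODES, IOPS, locality)
    print(f"inodes per seek = {locality:>4}: about {days:.1f} days")
```

Even with optimistic locality the walk takes most of a day, which is why keeping metadata in cache matters more than raw throughput here.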
Some things you can try to (non-destructively) ameliorate the situation:
- be sure not to have mlocate/slocate indexing your /backup-root/ (you can use the PRUNEFS/PRUNEPATHS settings in updatedb.conf to avoid that), or metadata cache thrashing will severely impair your backup time;
- for the same reason, avoid running du on /backup-root/. If needed, run du only on the specific subfolder of interest;
- lower vfs_cache_pressure from the default value (100) to a more conservative one (10 or 20). This will instruct the kernel to prefer metadata caching, rather than data caching; this should, in turn, speed up the rsnapshot/rsync discovery phase;
- you can try adding a writethrough metadata caching device, for example via lvmcache or bcache. This metadata device should obviously be an SSD;
- increase your available RAM;
- as you are using ext4, be aware of inode allocation issues (read here for an example). This is not directly correlated to performance, but it is an important factor when having so many files on an ext-based filesystem.
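The vfs_cache_pressure suggestion from the list above can be tried without a reboot. The Python sketch below reads the current value from procfs and renders a line for a sysctl configuration file. The value 10 is one of those suggested above, the file name in the comment is hypothetical, and since changing the setting requires root the script only prints what to do rather than applying it.

```python
from pathlib import Path

# Standard procfs location of the vm.vfs_cache_pressure sysctl on Linux.
PROC_PATH = Path("/proc/sys/vm/vfs_cache_pressure")


def read_cache_pressure(path=PROC_PATH):
    """Return the current value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def sysctl_line(value):
    """Render a line suitable for a sysctl configuration file."""
    if value < 0:
        raise ValueError("vfs_cache_pressure must not be negative")
    return f"vm.vfs_cache_pressure = {value}"


if __name__ == "__main__":
    print("current value:", read_cache_pressure())
    # Hypothetical target file: /etc/sysctl.d/90-backup-cache.conf
    print("to persist   :", sysctl_line(10))
    print("to apply now : sysctl -w vm.vfs_cache_pressure=10")
```

Watching the dentry and inode slab sizes before and after a backup run is one way to check whether the lower value actually keeps more metadata cached on a given machine.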
Other things you can try, but these are destructive operations:
- use XFS with both the ftype and finobt options set;
- use ZFS on Linux (ZoL) with compressed ARC and the primarycache=metadata setting (and, maybe, an L2ARC for read-only cache).
Thank you very much for this reply. As you might have expected, I've got something to read now. The vfs_cache_pressure option is very interesting. I've played around with the caches for a few minutes now and I think the system has become a bit more responsive (directory listings, autocomplete, etc.). I'll check the other points as well and give feedback. Thanks again.
– t2m
6 hours ago
"primarycache=metadata setting (and, maybe, an L2ARC for read-only cache)." ZFS can't do both, I had a write up on its most prominent down sides: medium.com/@poige/zfs-is-raid5-of-2010s
– poige
36 mins ago
This Filesystem stores a huge amount of small files with very many SEEK operations but low IO throughput.
🎉
This is a thing that catches lots of people nowadays. Alas, conventional filesystems do not scale well here. I can probably give you just a few pieces of advice when it comes to the setup you already have, ext4 over RAID-6 on HDDs:
- Lower vm.vfs_cache_pressure, say to 1. This shifts the caching bias towards preserving more metadata (inodes, dentries) instead of the data itself, and it should have a positive effect in reducing the number of seeks.
- Add more RAM. Although it might look strange for a server that doesn't run any piggy apps, remember: the only way to reduce seeks is to keep more metadata in faster storage. Given that you have only 16 GB, it should be relatively easy to increase the amount of RAM.
- As I've said, ext4 isn't a good choice for your use case, but you can still put some of the features it offers to use to soothe the pain:
  - An external journal is supported, so you can try adding an SSD (better mirrored) and placing the journal there. Check out "ext4: external journal caveats".
  - Try switching the journal mode to "all data is journaled" by mounting with data=journal.
- Try moving files outside of the single-FS scope. For example, if you have LVM-2 here, you can create smaller volumes and use one for the time being; when it gets full, create another one, and so on.
  - If you don't have LVM-2, you can try doing that with /dev/loop, but it's not that convenient and probably less performant.
That's probably most of what can be improved without a from-scratch redesign.
I have a very poor performance since the file system (60TB net) exceeded 50% usage. At the moment, the usage is at 75%
That's a very serious issue, because such a high level of disk space occupancy only worsens fragmentation, and more fragmentation means more seeks. It is no wonder that performance was more or less acceptable before reaching 50%. Lots of manuals clearly recommend not letting filesystems grow beyond 75-80%.
RAID6 does not help you much in this case; something like ZFS might enable much faster metadata and directory access while keeping speeds about the same.
answered 7 hours ago by shodanshok (first answer above)
Thank you very much for this reply. As you've might have expected, I've got something to read now. The vfs_cache_pressure option is very interesting. I've played around with the caches for some minutes now and I think, the System became a bit more responsive (directory listings, autocomplete, etc..). I'll check the other points as well and give a feedback. Thanks again.
– t2m
6 hours ago
"primarycache=metadata setting (and, maybe, an L2ARC for read-only cache)." ZFS can't do both, I had a write up on its most prominent down sides: medium.com/@poige/zfs-is-raid5-of-2010s
– poige
36 mins ago
This Filesystem stores a huge amount of small files with very many SEEK operations but low IO throughput.
🎉
This is a thing that catches lots of people nowadays. Alas, conventional filesystems do not scale well here. I can probably give you just a few pieces of advice for the setup you already have, EXT4 over RAID-6 on HDDs:
- Lower vm.vfs_cache_pressure, say to 1. This shifts the caching bias towards preserving more metadata (inodes, dentries) instead of the data itself, which should reduce the number of seeks.
- Add more RAM. Although it might look strange for a server that doesn't run any memory-hungry apps, remember: the only way to reduce seeks is to keep more metadata in faster storage. Given that you have only 16 GB, it should be relatively easy to increase the amount of RAM.
- As I've said, EXT4 isn't a good choice for the use case you have, but you can still use some of the features it has to soothe the pain:
  - An external journal is supported, so you can try adding an SSD (better mirrored) and placing the journal there. Check out "ext4: external journal caveats".
  - Try switching the journal mode to full data journaling by mounting with data=journal.
- Try moving files outside the scope of a single FS. E.g., if you have LVM-2 here, you can create smaller volumes and use one for the time being; when it gets full, create another one, and so on.
  - If you don't have LVM-2, you can try doing that with /dev/loop, but it's not that convenient and probably less performant.
That's probably most of what can be improved without a from-scratch redesign.
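A minimal sketch of the cache-pressure and journal suggestions above, written as a dry run that prints the commands rather than executing them. The device names and the sysctl.d file name are hypothetical placeholders; the mount point is borrowed from the `du` command in the question. Moving the journal requires the filesystem to be cleanly unmounted first, so treat the journal part as an outline, not a tested procedure.

```shell
#!/bin/sh
# Dry-run sketch: prints commands, changes nothing.
# FS_DEV, JOURNAL_DEV and CONF are hypothetical placeholders (assumptions).
FS_DEV="${FS_DEV:-/dev/md0}"
JOURNAL_DEV="${JOURNAL_DEV:-/dev/md1}"
MOUNTPOINT="${MOUNTPOINT:-/backup-root}"
CONF="${CONF:-/etc/sysctl.d/90-vfs-cache.conf}"
PROC_FILE="${PROC_FILE:-/proc/sys/vm/vfs_cache_pressure}"

current_pressure() {
    # The kernel default is 100; report "unknown" if /proc is not readable.
    if [ -r "$PROC_FILE" ]; then cat "$PROC_FILE"; else echo unknown; fi
}

pressure_plan() {
    # $1 = new value; the first line applies it now, the second persists it.
    echo "sysctl -w vm.vfs_cache_pressure=$1"
    echo "echo 'vm.vfs_cache_pressure = $1' > $CONF"
}

journal_plan() {
    # The journal device block size must match the filesystem (4096 here).
    echo "mke2fs -O journal_dev -b 4096 $JOURNAL_DEV"
    echo "tune2fs -O ^has_journal $FS_DEV"
    echo "tune2fs -J device=$JOURNAL_DEV $FS_DEV"
    echo "mount -o data=journal $FS_DEV $MOUNTPOINT"
}

echo "current vfs_cache_pressure: $(current_pressure)"
pressure_plan 1
journal_plan
```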
I have a very poor performance since the file system (60TB net) exceeded 50% usage. At the moment, the usage is at 75%
That's a very serious issue, because such a high level of disk space occupancy only worsens fragmentation, and more fragmentation means more seeks. No wonder it gave more or less acceptable performance before reaching 50%. Lots of manuals clearly recommend not letting filesystems grow beyond 75-80%.
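To keep an eye on that threshold, something along these lines could run from cron. It is only a sketch: the block counts in the example call are made-up round numbers, and the 75% limit is simply the lower end of the range mentioned above.

```shell
#!/bin/sh
# usage_pct TOTAL FREE -> integer percentage of blocks in use
usage_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# check_usage PCT LIMIT -> prints a warning at or above LIMIT
check_usage() {
    if [ "$1" -ge "$2" ]; then
        echo "WARN: ${1}% used (limit ${2}%)"
    else
        echo "OK: ${1}% used"
    fi
}

# Example with made-up numbers: 1000 blocks in total, 250 of them free.
check_usage "$(usage_pct 1000 250)" 75
```

On a real system the two counts could come from the "Block count" and "Free blocks" lines of the superblock dump, or the percentage could be read directly from `df`.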
edited 27 mins ago
answered 1 hour ago
poige
RAID6 does not help you much in this case; something like ZFS might enable much faster metadata and directory access while keeping speeds about the same.
answered 2 hours ago
John Keates
Faster disks, preferably SSD. As much RAM as possible for read caching. 16GiB isn't even on the same planet as enough RAM. Get LOTS of it, even 512GiB or more. And of course don't use RAID 6.
– Michael Hampton♦
7 hours ago
Thanks for your reply. I'm aware of the SSD option, but it makes the difference between a $7,000 server and a $70,000 server for backing up data. The RAM hint is a good one, but I fear that I will only get fresh-filesystem performance if I totally avoid disk IO for seek operations, which at 60TB net capacity means a 60TB RAM cache, doesn't it? I have avoided filesystems other than EXT2/3/4 in the past, but now I am totally open to options in this direction, if they will help. :)
– t2m
7 hours ago
What's your recommendation for a RAID6 replacement with this disk configuration?
– t2m
7 hours ago
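On the "60TB RAM cache" worry in the comments: the seeks in question are mostly for metadata, so a rough lower bound on what needs caching can be estimated from the superblock dump in the question (inode count, free inodes, inode size). The sketch below only does that arithmetic. It ignores directory blocks, extent trees and the kernel's in-memory overhead per cached inode, so the real working set is larger than this figure.

```shell
#!/bin/sh
# Values copied from the superblock dump in the question.
INODE_COUNT=912203776
FREE_INODES=793935052
INODE_SIZE=256   # bytes per on-disk inode

used_inodes() { echo $(( $1 - $2 )); }
inode_bytes() { echo $(( $1 * $2 )); }

USED=$(used_inodes "$INODE_COUNT" "$FREE_INODES")
BYTES=$(inode_bytes "$USED" "$INODE_SIZE")

echo "inodes in use: $USED"
echo "on-disk inode data: $(( BYTES / 1073741824 )) GiB"
```

That works out to about 118 million inodes and roughly 28 GiB of on-disk inode data, which is more than the 16G installed but a long way from 60TB.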