
BTRFS and Performance Troubles

Please support my work.

openSUSE has a really nice BTRFS setup that makes it easy to use BTRFS as your main file system, and you get a lot of benefits from doing so: automatic snapshots around installations, easy time-based snapshots, and of course subvolume management and the like. But I encountered horrible BTRFS performance after my installation a little while ago. It was so bad that GNOME actually balked and timed out on a number of things, leading to all sorts of trouble. I didn't understand what was going on, but now I do. So here are some things that you can do to ensure that you have a smooth-running BTRFS file system that is suitable for general-purpose use.

The Problem. One of the biggest problems in BTRFS performance is also one of its best features. The copy-on-write (COW) semantics of the file system mean that it is easy to get fragmentation with certain workflows. One of these is constantly reading or touching lots of files throughout the week. A read operation on Unix systems is traditionally also a write operation, because it updates the access time. Modern systems, including openSUSE, compromise on this with the relatime mount option, which updates the access time at most about once a day unless the file has been modified since it was last accessed. However, if you are accessing those files every day, and there are many of them, you are going to end up fragmenting your system even with the relatime option enabled.

Another case is when you are doing lots of random writes within a file. Because of the COW semantics, every overwritten block lands in a new location instead of being updated in place, so you'll end up with a massively fragmented file over time, and possibly in very short order. The BTRFS Gotchas page lists this as its first problem, but I didn't find that page right away. I did not come across a link to it anywhere, either in my searching for this problem or on any of the mailing list threads that discussed the issue. In the end, it's a very telling hint, and we will get back to it in a little bit.

There are two situations where you will see this massive fragmentation problem on a daily basis. The first is with the personal databases that are used by most modern desktop environments. They often use SQLite or something like it to store information, or they write to a text file rapidly throughout the day. These sorts of files are going to be subject to a lot of fragmentation. The other is the virtual machine disk file, such as those created by VMware. This file is the epitome of the bad corner case for BTRFS, and in my situation it led to a single file with over 20k extents, despite its size being around 10G. Suffice it to say that this is enough to bring any system to a crawl. Think massive slowdown, sluggishness, whatever.
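To see whether a file is in this state, one option is to count its extents with filefrag(8) from e2fsprogs. The helpers below are a minimal sketch rather than a definitive tool: the function names `parse_extents` and `list_fragmented` are made up for illustration, and the parsing assumes filefrag's usual one-line summary of the form "path: N extents found".

```shell
# Extract the extent count from one line of filefrag output,
# e.g. "disk.vmdk: 20345 extents found" -> 20345.
parse_extents() {
    awk '{ print $(NF - 2) }'
}

# Print "<extents> <path>" for the regular files under a directory,
# most fragmented first. $1 = directory, $2 = how many to show (default 20).
list_fragmented() {
    find "$1" -xdev -type f -exec filefrag {} + 2>/dev/null |
        awk '{ n = $(NF - 2); sub(/: [0-9]+ extents? found$/, ""); print n, $0 }' |
        sort -rn | head -n "${2:-20}"
}

# Usage (hypothetical paths):
#   filefrag ~/vms/disk.vmdk | parse_extents
#   list_fragmented "$HOME" 10
```

A count in the tens of thousands for a file of a few gigabytes, like the VM disk described above, is a good sign that the file is worth defragmenting.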

So how can we address the issues?

Defragmentation. The way to fix an already fragmented file system is to run the defragmentation operation on all of the files and directories. On older versions of btrfs-progs the btrfs filesystem defragment command is not recursive, so you need to run it individually for every file (newer releases accept a -r option that recurses for you). If you don't have -r, find(1) is your friend at this point.
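The find(1) approach can be sketched roughly as follows. This is an illustration under a few assumptions: `defrag_tree` and the `BTRFS` override variable are names I made up, the function needs to run as root on a real system, and `-xdev` typically stops at subvolume boundaries as well as mount points, so it has to be run once per subvolume.

```shell
# Command used to invoke btrfs; set BTRFS=echo for a dry run that only
# prints the arguments instead of defragmenting anything.
BTRFS="${BTRFS:-btrfs}"

# Defragment every regular file and directory under the given path.
# "+" batches many paths into each btrfs invocation.
defrag_tree() {
    find "$1" -xdev \( -type f -o -type d \) \
        -exec "$BTRFS" filesystem defragment {} +
}

# Usage (as root):
#   defrag_tree /home
```

Doing a dry run first is a cheap way to check which paths would be touched before committing to a long-running pass over the disk.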

Auto-defragmentation. There is a mount option, autodefrag, that you can use for defragmenting files on the fly. It's not yet enabled by default, but it helps to keep BTRFS from getting too fragmented on certain workloads. It works well for normal desktop environments, and it is well suited for databases like SQLite and the types of things that you see in your desktop environment daemons. However, by itself it is insufficient for virtual machines, and is likely detrimental to their performance.
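As a rough sketch, the option can be tried on a running system with a remount and then checked in the active mount options. The helper names `enable_autodefrag` and `has_mount_option` are hypothetical, and the remount assumes a kernel that accepts autodefrag on remount.

```shell
# Return success if a comma-separated mount option string contains the option.
# $1 = option string (e.g. "rw,relatime,autodefrag"), $2 = option to look for.
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Remount an already mounted BTRFS file system with autodefrag (run as root).
enable_autodefrag() {
    mount -o remount,autodefrag "$1"
}

# Usage (as root):
#   enable_autodefrag /
#   has_mount_option "$(findmnt -no OPTIONS /)" autodefrag && echo "autodefrag is active"
```

A remount only lasts until the next boot; to make it permanent the option has to go into the options field of the matching /etc/fstab entry.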

Disabling access times. You can avoid the fragmentation that comes with access time updates by simply disabling them with the noatime mount option. Most applications do not need access times. Unfortunately, some do, and if you use an application that does need them, then there is little that you can do to avoid this situation except for regular defragmentation.
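To make noatime stick across reboots it belongs in /etc/fstab. The filter below is one hedged way to add an option to every btrfs entry without editing the file by hand; `add_btrfs_option` is a hypothetical helper, it writes to standard output only, and it collapses the whitespace of the lines it changes to single spaces.

```shell
# Read fstab-format lines on stdin and append an option to the fourth
# (options) field of every btrfs entry that does not already carry it.
# Comments, short lines and other file system types pass through unchanged.
add_btrfs_option() {
    awk -v opt="$1" '
        /^[[:space:]]*#/ || NF < 4 || $3 != "btrfs" { print; next }
        index("," $4 ",", "," opt ",") == 0 { $4 = $4 "," opt }
        { print }
    '
}

# Usage: write a candidate file, review it, and only then install it as root.
#   add_btrfs_option noatime < /etc/fstab > /tmp/fstab.new
#   diff -u /etc/fstab /tmp/fstab.new
```

Reviewing the diff before replacing /etc/fstab matters here, since a broken fstab can keep the system from booting.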

Enabling Compression. The latest versions of BTRFS have the lzo compression feature, which is a fast compression algorithm that works well to speed up your system. It reduces the write burden and read burden of the system in exchange for a little CPU time, and in this day and age, it's the right trade-off most of the time. It will also save space.
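A minimal sketch of turning this on, assuming the compress=lzo mount option and the -clzo flag of btrfs filesystem defragment are available in your versions. The mount option only affects data written after it is set, so existing files stay uncompressed until they are rewritten; the second helper rewrites them. The names `enable_lzo`, `recompress_tree` and the `BTRFS` override are made up for this example.

```shell
# Command used to invoke btrfs; set BTRFS=echo for a dry run.
BTRFS="${BTRFS:-btrfs}"

# Enable lzo compression for new writes on a mounted file system (run as root).
enable_lzo() {
    mount -o remount,compress=lzo "$1"
}

# Rewrite the existing regular files under a path with lzo compression.
recompress_tree() {
    find "$1" -xdev -type f -exec "$BTRFS" filesystem defragment -clzo {} +
}

# Usage (as root):
#   enable_lzo /
#   recompress_tree /usr
```

Because recompression goes through the defragment command, the same caution about existing snapshots applies to it as to any other defragment run.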

Dealing with VMs. Virtual machines are not going to be well served by any of the above techniques. However, it's just not acceptable for me to have to keep a separate partition dedicated just to VMs on my machine. So what's a poor guy to do? Well, if I were willing to give up on all the COW niceties, I could mount some partition with the nodatacow option, but unfortunately, you cannot mount a subvolume with nodatacow and still mount the main partition without it (it's a limitation of the mount options that you can use when manually mounting a subvolume). However, modern systems now support the C file attribute (see lsattr(1) and chattr(1)). With this, you can selectively disable the COW semantics on certain directories or files, making it possible for you to have a directory that is dedicated to virtual machines. All new files created in that directory will have COW disabled, which avoids the excessive fragmentation that COW causes for VM disks. It's a nice, relatively elegant and simple solution that works well.
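The C attribute setup can be sketched as below. The helper names and the ~/vms path are hypothetical. One caveat from chattr(1): the flag is only reliable on new or empty files, so an existing disk image should be copied into the directory, so that its data is written afresh, rather than flagged in place.

```shell
# Create a directory and set the C (no copy-on-write) attribute on it,
# so that files created inside it afterwards inherit the attribute.
make_nocow_dir() {
    mkdir -p "$1" && chattr +C "$1"
}

# Return success if a line of lsattr output shows the C flag.
# Only the first field (the flag string) is inspected, not the path.
has_nocow_flag() {
    case "$(echo "$1" | awk '{ print $1 }')" in
        *C*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   make_nocow_dir ~/vms
#   has_nocow_flag "$(lsattr -d ~/vms)" && echo "new files in ~/vms will skip COW"
#   cp /old/location/disk.vmdk ~/vms/    # copy existing images in, then delete the original
```

Checking a freshly created file in the directory with lsattr is a quick way to confirm that the attribute is really being inherited.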

So there you have it: enable compression, disable access times, and selectively disable COW on those files that don't work well with it. Doing this, along with an occasional manual defragment, will likely keep your BTRFS system in good shape. You'll also be able to take advantage of all the great goodies that are available to you through BTRFS. A word of warning, though: many BTRFS installations do not yet support snapshot-aware defragmentation. Without it, defragmenting a file breaks the sharing of its data with existing snapshots, so the rewritten data takes up additional space. You will want to avoid defragmenting a system where you care about keeping snapshots from before the defragment. For me, if I need to defragment such a system, I delete all the snapshots, defragment, and then resume snapshotting after that. This means that you have to be okay with dropping some of those snapshots, which may or may not be acceptable to you.


Featured Image from Goliath.