Geom-based Disk Schedulers

Disk scheduling is a well-established topic in computer science. Some of the best-known results from the research community have even been implemented on FreeBSD, but did not make it into the main system for a number of reasons.
Together with Fabio Checconi, we have developed a prototype framework to introduce pluggable disk schedulers in the Geom layer. It basically consists of a Geom class that queues up the requests directed to the provider it is attached to, and releases them according to a scheduling algorithm implemented in an external module. For a description of what geom_sched does, please have a look at the BSDCan 2009 slides.
The code is currently (April 2010) available as part of FreeBSD HEAD. A version for FreeBSD 8 and FreeBSD 7 is available as 20100412-geom_sched.tgz
As a quick example of what this code can give you, try running "dd", "tar", or some other program with a highly SEQUENTIAL access pattern, together with "cvs", "cvsup", "svn" or another program with a highly RANDOM access pattern. This is not a made-up example: it is pretty common for developers to have one or more applications doing random accesses while others do sequential accesses, e.g. loading large binaries from disk, checking the integrity of tarballs, or watching media streams.
These are the results we get on a local machine (AMD BE2400 dual core CPU, SATA 250GB disk):
/mnt is a partition mounted on /dev/ad0s1f

  cvs:      cvs -d /mnt/home/ncvs-local update -Pd /mnt/ports
  dd-read:  dd bs=128k of=/dev/null if=/dev/ad0 (or ad0.sched.)
  dd-write: dd bs=128k if=/dev/zero of=/mnt/largefile

                   NO SCHEDULER            RR SCHEDULER
                   dd        cvs           dd        cvs
  dd-read only     72 MB/s   ---           72 MB/s   ---
  dd-write only    55 MB/s   ---           55 MB/s   ---
  dd-read+cvs       6 MB/s   ok            30 MB/s   ok
  dd-write+cvs     55 MB/s   slooow        14 MB/s   ok
As you can see, when a cvs is running concurrently with dd, the performance drops dramatically, and depending on read or write mode, one of the two is severely penalized. The use of the RR scheduler in this example makes the dd-reader go much faster when competing with cvs, and lets cvs progress when competing with a writer.
To try the code, load the geom_sched kernel module together with one of the scheduler modules, then attach a scheduling node to the disk with the "geom sched" command as described below; the gsched(8) manual page documents the details.
NOTES ON THE SCHEDULERS

The important contribution of this code is the framework to experiment with different scheduling algorithms. 'Anticipatory scheduling' is a very powerful technique based on the following reasoning:
Disk throughput is much better when serving sequential requests. If we have a mix of sequential and random requests and we see a non-sequential request, do not serve it immediately; instead, wait a little bit (2..5 ms) to see if another request comes in that the disk can serve more efficiently.

Many details must be added to make the mechanism effective with different workloads and systems, to gain a few extra percent in performance, and to improve fairness, isolation among processes, etc. A discussion of the vast literature on the subject is beyond the purpose of this short note.
geom_sched is an ordinary geom module; however, it is convenient to plug it transparently into the geom graph, so that one can enable or disable scheduling on a mounted filesystem, and the names in /etc/fstab do not depend on the presence of the scheduler.
To understand how this works in practice, remember that in GEOM we have "provider" and "geom" objects. Say we want to hook a scheduler onto provider "ad0", accessible through pointer 'pp'. Originally, pp is attached to geom "ad0" (same name, different object), accessible through pointer old_gp:
BEFORE ---> [ pp --> old_gp ...]
A normal "geom sched create ad0" call would create a new geom node on top of provider ad0/pp, and export a newly created provider ("ad0.sched." accessible through pointer newpp).
AFTER create ---> [ newpp --> gp --> cp ] ---> [ pp --> old_gp ... ]
On top of newpp, a whole tree is created automatically, so we can e.g. mount partitions on /dev/ad0.sched.s1d; those requests will go through the scheduler, whereas requests to partitions mounted on the pre-existing device entries will not.
With the transparent insert mechanism, the original provider "ad0" is hooked to the newly created geom, as follows:
AFTER insert ---> [ pp --> gp --> cp ] ---> [ newpp --> old_gp ... ]
so anything that was previously using provider pp will now have the requests routed through the scheduler node.
A removal ("geom sched destroy ad0.sched.") will restore the original configuration.