The Lustre system is designed for high performance in two major regards: (1) parallelism, scaling, and redundancy; and (2) separation of concerns. Most conventional file systems (as opposed to storage systems) implement storage management at all levels within a single overall software structure. Manipulation of user-level directories and files, of logical and physical devices, and the indexing, allocation, and recovery of storage blocks all take place within utilities built into the operating system of the computer that uses the file system.
Storage systems take a different approach, implementing these functions on different devices and at different levels, more or less independently of the interfaces provided by the host OS. For instance, managing the storage system’s name-space, which involves indexing the files and directories (i.e., managing the meta-data), is separated from managing the logical/physical device duality and the low-level storage blocks. This allows meta-data and user data operations to proceed concurrently: a file can be read and written, possibly changing in size and location, independently of the bookkeeping that allows it to be locked piecewise for safety and to grow or shrink dynamically.
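The separation between name-space management and data storage can be illustrated with a deliberately simplified toy model. In this sketch, which assumes a client looks up a file’s location once in a meta-data index and then reads and writes a data target directly, a rename (a pure meta-data operation) proceeds without touching the data path. The names MetadataIndex, DataTarget, and Client are hypothetical and are not Lustre interfaces; locking, striping, and recovery are omitted.

```python
"""Toy model: meta-data (name-space) management kept separate from data storage.

All class names are hypothetical illustrations, not Lustre interfaces.
"""
from dataclasses import dataclass, field


@dataclass
class DataTarget:
    """Stores raw object bytes; knows nothing about file names."""
    objects: dict[int, bytearray] = field(default_factory=dict)

    def write(self, obj_id: int, offset: int, data: bytes) -> None:
        buf = self.objects.setdefault(obj_id, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))  # grow the object on demand
        buf[offset:end] = data

    def read(self, obj_id: int, offset: int, length: int) -> bytes:
        return bytes(self.objects.get(obj_id, b"")[offset:offset + length])


@dataclass
class MetadataIndex:
    """Manages the name-space: maps a path to (target index, object id)."""
    layouts: dict[str, tuple[int, int]] = field(default_factory=dict)
    next_obj: int = 0

    def create(self, path: str, target: int) -> tuple[int, int]:
        self.next_obj += 1
        self.layouts[path] = (target, self.next_obj)
        return self.layouts[path]

    def lookup(self, path: str) -> tuple[int, int]:
        return self.layouts[path]

    def rename(self, old: str, new: str) -> None:
        # A pure name-space operation: no data target is touched.
        self.layouts[new] = self.layouts.pop(old)


class Client:
    """Consults the index to locate a file, then talks to the data target directly."""

    def __init__(self, index: MetadataIndex, targets: list[DataTarget]) -> None:
        self.index = index
        self.targets = targets

    def open(self, path: str) -> tuple[DataTarget, int]:
        target, obj_id = self.index.lookup(path)
        return self.targets[target], obj_id


index = MetadataIndex()
targets = [DataTarget(), DataTarget()]
client = Client(index, targets)

index.create("/home/a.dat", target=1)
tgt, obj = client.open("/home/a.dat")       # one meta-data lookup
index.rename("/home/a.dat", "/home/b.dat")  # meta-data change...
tgt.write(obj, 0, b"hello")                 # ...while the data path is unaffected
print(client.open("/home/b.dat")[0].read(obj, 0, 5))  # b'hello'
```

Because the client holds only an object reference after the lookup, data writes need no further involvement from the index, which is the property the paragraph above describes.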
Lustre takes this further. It allows the separation of concerns to be more finely resolved and formalized, so that, for instance, the devices making up the user data and meta-data storage pools can grow and shrink dynamically, recover from faults, and be partitioned along with the computer system itself as needed. Redundancy at all levels provides resiliency, and parallelism through replication provides performance, to an almost unbounded extent. The limitations that currently exist stem from Lustre’s newness and the lack of organized support within the community. For instance, the low-level device file systems used so far are reaching their limits, even as the capacities of individual devices continue to increase. However, Lustre’s flexibility allows that underlying file system to be replaced without greatly affecting Lustre’s overall operation.
Status
Lustre is currently available as a stable, functional utility on several very large computer systems, and is in general use at many facilities. It has been developed and maintained through a combination of local expertise at the facilities that use it, commercial effort, and community effort. This situation will allow continued use while Whamcloud begins to grow and evolve Lustre toward greater platform agnosticism, improved robustness, and enhanced manageability.
Planned Developments
The Lustre high-performance parallel storage system is poised to become an essential component of any computing system intended to deliver high performance or manipulate very large data sets. Its current state is one of utility without polish. Its use on a majority of the top systems on the TOP500 list testifies to its utility, while its users broadly agree that it needs several kinds of enhancement.
Whamcloud’s first planned improvements will target Lustre’s reliability by enhancing its fault tolerance. Planning is now under way to formalize its fault-reporting and response/recovery mechanisms. Such enhancements fit naturally within its overall design philosophy, but have so far taken second place to more basic functionality concerns. Redundancy at all levels, from the target devices and their individual file systems to the management structures and servers, is so fundamental to Lustre that these enhancements can likely be delivered as updates to the code base already running at user facilities.
Second, but no less important, are high-level management tools. These will make it easier to exploit Lustre’s flexible architecture. Hot-swapping of target storage devices, for both data and meta-data, is already possible. What is envisioned is easy modification of the overall storage system morphology, as well as inquiry into device and system state, fault prediction, and even changes to the low-level file system. Currently, these matters are best left to highly experienced system administrators, and some require downtime to implement. High-level GUI-based tools will provide more qualitative views of system performance and status, and will allow configuration changes during operation in response to failures or expansions of the overall storage system.
Third, the underlying device-level file system is nearing the limit of its expandability, and potential replacements are under study. ZFS is one possibility, and Btrfs is another. The final choice will depend on both technical merit and availability. This change, however, will have minimal effect on Lustre’s higher layers.
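The expectation that swapping the device-level file system leaves the higher layers largely untouched rests on those layers depending only on a narrow interface to the backing store. The Python sketch below illustrates that idea under stated assumptions: BackingFS, DictBackend, ChunkedBackend, and StorageTarget are hypothetical names, and the two backends are toy stand-ins, not models of ZFS, Btrfs, or any file system Lustre actually uses.

```python
from abc import ABC, abstractmethod


class BackingFS(ABC):
    """Hypothetical narrow interface that upper layers rely on (not a Lustre API)."""

    @abstractmethod
    def put(self, obj_id: int, data: bytes) -> None: ...

    @abstractmethod
    def get(self, obj_id: int) -> bytes: ...


class DictBackend(BackingFS):
    """Toy stand-in for one device-level file system: one blob per object."""

    def __init__(self) -> None:
        self._store: dict[int, bytes] = {}

    def put(self, obj_id: int, data: bytes) -> None:
        self._store[obj_id] = bytes(data)

    def get(self, obj_id: int) -> bytes:
        return self._store[obj_id]


class ChunkedBackend(BackingFS):
    """Toy stand-in for a different file system that stores fixed-size chunks."""

    def __init__(self, chunk: int = 4) -> None:
        self._chunk = chunk
        self._chunks: dict[int, list[bytes]] = {}

    def put(self, obj_id: int, data: bytes) -> None:
        c = self._chunk
        self._chunks[obj_id] = [data[i:i + c] for i in range(0, len(data), c)]

    def get(self, obj_id: int) -> bytes:
        return b"".join(self._chunks[obj_id])


class StorageTarget:
    """Upper-layer code written only against BackingFS, so it needs no
    changes when the backend is swapped."""

    def __init__(self, backend: BackingFS) -> None:
        self.backend = backend

    def store(self, obj_id: int, data: bytes) -> None:
        self.backend.put(obj_id, data)

    def load(self, obj_id: int) -> bytes:
        return self.backend.get(obj_id)


results = []
for backend in (DictBackend(), ChunkedBackend()):
    target = StorageTarget(backend)  # same upper-layer code for each backend
    target.store(7, b"lustre data")
    results.append(target.load(7))
```

The design choice illustrated is simply programming the upper layer against an abstract interface; how closely Lustre’s own layering matches this is not claimed here.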
Lustre’s flexible architecture allows more or less “on the fly” modification of the low-level organization of component devices, as well as the addition of high-level management machinery.
