The open source parallel file system Lustre is in a stronger position than it has ever been. Lustre has undeniable strengths in performance, scalability, and open, collaborative innovation. Because of this it is used in over 60% of the TOP100 supercomputing sites. But it was not an overnight success. It required a strong response from the community to pull together and get us here.
Two years ago, in August 2011, I wrote a piece in HPCwire called “The State of the Lustre Community.” I had a chance to reread it recently, and as I see the HPDD Lustre team giving presentations and interacting with colleagues at the annual Lustre User Group meeting (LUG 2013) this week, I am struck by what incredible strides Lustre has made in the past two years.
In 2011, the State of the Lustre Community was on the right track, having pulled together organizationally. At LUG 2011, the two US-based Lustre community groups had announced a merger, and at the International Supercomputing (ISC) show in Germany, shortly thereafter, the two existing community groups, OpenSFS and EOFS, signed a memorandum of understanding to show their allegiance.
But there was a perception problem. Lustre was trusted and used only in the capacity of a /tmp directory. This was because Lustre was seen as a science project. While Lustre had amazing performance and scalability, it had a reputation as something only the best HPC installations in the world, with their needs, deep knowledge and patience, were willing to run despite stability issues and unclear release schedules.
That time, and more importantly, that perception, is at least two years behind us. Perceptions – and realities – have changed. Today, Lustre is often found in the /home directory as a key technology. For 2013, we expect it to be found more and more in enterprise use, as a foundational technology in both traditional and commercial HPC settings.
Getting to this point has been hard work. My company at the time, Whamcloud, did a lot of heavy lifting to lead the technical direction of Lustre. I do not think that is boasting. I think there were legitimate concerns about the future of Lustre in general and very specific issues around its feature set and product roadmap. Whamcloud’s impact on Lustre helped correct these issues.
We especially focused on committing to and following through with a clear release schedule and a very open process that encouraged participation.
Throughout, Whamcloud had a clear goal: to foster an environment where no one entity controlled the technology. We still believe this is key, and an important ingredient in Lustre’s rapid development. Lustre is open source, and has been paid for by the community. We feel it should be controlled by the broader community.
So, what is the current state of this community today?
I see a united community, a vibrant vendor environment, an accelerating technology and both a push and a pull into the broader space of Big Data.
First of all, Lustre has an incredibly united community. Born out of necessity two years ago, it has consistently pulled in the same direction. Both OpenSFS and EOFS have rallied behind one code tree. Open development continues and the OpenSFS working groups are the established clearing houses for the community. This has allowed very real progress for over two years on the technical front. This is all extremely positive news.
Also very important, vendors have noticed. The quest to expand the vendor base has produced a vibrant Lustre vendor community with multiple product and service offerings that simply did not exist before. The Whamcloud.com website, for example, lists 23 resellers. This is incredible commercial progress.
And the technology development is accelerating quickly. I mean that in two dimensions! Code contributions have grown from 35k lines of code (LOC) in Lustre 2.1, to 50k in Lustre 2.2, to 85k in Lustre 2.3, and we expect over 200k LOC changes in Lustre 2.4. This is not inefficiency, and it is not bloat. In fact, it includes both additions and subtractions. What this shows is the growth of a maturing technology, with a wider and wider feature set with uses in more and more markets.
For example, Hierarchical Storage Management (HSM), a data storage technique that facilitates the movement of data between high-cost and low-cost storage media, is coming out soon. A major, long-awaited feature for enterprise customers, HSM will help accelerate Lustre use in the commercial high performance I/O space.
But there’s another way that the technology is accelerating. More groups are now contributing. At the same time as LOC has increased, the core HPDD team's share of the code modifications has gone from 90% to 80%, and we expect it to be 75% or less in Lustre 2.4. We think this is a fabulous development because the number of contributors has climbed to 12 or more groups in Lustre 2.4.
This makes Lustre stronger and more in the control of the community.
And when talking about the current state of Lustre in 2013, it is important to note the increasing connection to Big Data. If we can find solid ground for the technology in Big Data, we will have broadened our customer base and reduced the load on the HPC community who have been so supportive. To be sure, the relation to Big Data at this stage should not be overstated, but as a key concept in Lustre development, we are interested in extending Lustre to help expand the market for the vendors who support us today.
Lustre is also strong in future HPC directions now being mapped out via the Fastforward program. Lustre is an obvious technology choice for the journey to exascale speeds. We are optimistic about the future of Lustre in HPC for the next ten years based on the groundbreaking work going on there.
Brent is General Manager, High Performance Data Division, Intel Corp. He lives in the East Bay Area, loves his family and his road bike.