Researchers at IBM's Almaden, California research lab are building what
will be the world's largest data array--a monstrous repository of
200,000 individual hard drives, all interlaced. Altogether, it has a
storage capacity of 120 petabytes, or 120 million gigabytes.
There
are plenty of challenges inherent in building this kind of
groundbreaking array, which, says IBM, is destined to be used by, as
Technology Review writes, "an unnamed client that needs a new
supercomputer for detailed simulations of real-world phenomena." For one
thing, IBM had to rely on water-cooling units rather than traditional
fans, as this many hard drives create heat that can't be subdued in the
normal manner. There's also a sophisticated backup system that senses
the number of hard disk failures and adjusts the speed of rebuilding
data accordingly--the more failures, the faster it rebuilds. According
to IBM, that should allow the array to operate with minimal data
loss--possibly none at all.
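The idea behind that adaptive rebuild can be sketched in a few lines. This is only an illustration, not IBM's actual system: the function name, thresholds, and rates below are all invented, and the real array would tune rebuild bandwidth far more carefully.

```python
# Hypothetical sketch of the adaptive-rebuild idea: the more concurrent disk
# failures, the more bandwidth the array devotes to rebuilding lost data.
# Base rate, doubling rule, and cap are all invented for illustration.

def rebuild_rate_mb_s(failed_disks: int, base_rate: int = 50, max_rate: int = 800) -> int:
    """Scale rebuild bandwidth (MB/s) with the number of failed disks."""
    if failed_disks == 0:
        return 0  # nothing to rebuild, spend no bandwidth
    # Each additional failure doubles the urgency, capped at max_rate.
    rate = base_rate * (2 ** (failed_disks - 1))
    return min(rate, max_rate)

print(rebuild_rate_mb_s(1))  # single failure: gentle background rebuild
print(rebuild_rate_mb_s(5))  # many failures: rebuild at the cap
```

The point of the scheme is the trade-off: a slow rebuild leaves the array's normal performance untouched, while a fast one shrinks the window during which another failure could cause real data loss.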
IBM's also using a new filesystem, designed
in-house, that writes individual files to multiple disks so different
parts of the file can be read and written simultaneously.
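That write-to-multiple-disks technique is commonly called striping, and a toy version is easy to show. This sketch is not IBM's filesystem; the chunk size and round-robin layout are assumptions chosen purely to make the idea concrete.

```python
# Toy illustration of striping: a file is split into fixed-size chunks that
# are written round-robin across several disks, so different parts of the
# file can be serviced by different drives in parallel.

STRIPE_SIZE = 4  # bytes per chunk; real filesystems use far larger stripes

def stripe(data: bytes, n_disks: int) -> list[list[bytes]]:
    """Distribute chunks of `data` round-robin over `n_disks` disks."""
    disks = [[] for _ in range(n_disks)]
    for i in range(0, len(data), STRIPE_SIZE):
        disks[(i // STRIPE_SIZE) % n_disks].append(data[i:i + STRIPE_SIZE])
    return disks

def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the file by reading chunks back in round-robin order."""
    out = []
    for depth in range(max(len(d) for d in disks)):
        for d in disks:
            if depth < len(d):
                out.append(d[depth])
    return b"".join(out)

layout = stripe(b"abcdefghijklmnop", 3)
print(layout)            # chunks spread across three "disks"
print(unstripe(layout))  # reassembles the original bytes
```

Because consecutive chunks land on different drives, a read of the whole file keeps every drive busy at once instead of waiting on a single spindle.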
This
kind of array is bottlenecked pretty severely by the speed of the drives
themselves, so IBM has to rely on software improvements like the new
recovery system and filesystem to boost speed and make it practical to
use so many different drives at once.