Storage on Steroids: Can We ‘Save Everything’?
We’ve been hearing it for some time: storage requirements are exploding. The nature of what we need to store has changed, too. We now have to find ways to save video and audio files, as well as images. Layer on top of that the various compliance mandates that require organizations to hold on to certain types of data for extended periods of time.
To look at just one aspect of the challenge, consider all the email proliferating throughout organizations. I was recently chatting with the IT director of one of the nation’s largest insurance companies, who remarked that as a result of Sarbanes-Oxley, emails and everything else stay. “Right now, we save everything – no matter how far back,” he said. And this is an organization that handles almost a million and a half emails a day. When I asked how they planned to store and manage all those files, he acknowledged that it was a “constant battle” for his department. Currently, everything is backed up to disk, and his department is investigating ways to move older files to tape.
Such is the state of society and the legal system that enterprises are being turned into pack rats, forced to spend time and money on the ludicrous exercise of saving and cataloging every scrap of correspondence and data that ever crossed into their domains. As a result, along with some very legitimate data requirements, terabytes have become commonplace. As reported in the pages of this publication last June, Winter Corporation, which tracks the world’s largest databases, has found database deployments that break the 100- to 200-terabyte mark. A decade ago, the largest databases were just topping a terabyte. And research from the Independent Oracle Users Group (conducted by Unisphere Research) finds that multi-terabyte databases are now fairly common.
I recently had the opportunity to speak with Craig Butler, manager of disk, SAN, and NAS product marketing for IBM, about the state of all these growing storage requirements. IBM is researching ways to pack all this data into denser storage media. The occasion was the 50th anniversary of disk-based storage (yes, time flies, doesn’t it), and Butler was taking a reflective look at where we are and where we are going. Storage devices have been able to keep up with exploding data demands, he explained, because a kind of ‘Moore’s Law’ is at work in storage. Over the past 50 years, capacity has grown by at least 40 percent a year, and that rate has accelerated to 100 percent a year over the last decade, all with no change in price. The good news, Butler said, is that this growth is likely to continue for at least another 10 years.
Nothing has come along that can really take the place of disk, or of tape for longer-term storage, Butler said, and nothing is emerging on the horizon, at least not yet. In fact, Butler predicted that 10 years from now, disk drives will still be the way data is stored, only smaller and even more ubiquitous.
Butler said we face two challenges with storage going forward. First, we need better ways to manage and access the huge volumes of data piling up in enterprises. “We need new search and retrieval techniques to find the right data,” he explained. “We have amassed all this capacity, but human beings don’t have time to search through it all.” Take video files, for example. When the police need to review security camera footage for suspects, someone has to spend perhaps hours or even days watching analog video.
The other challenge is being able to access storage media decades after the data or image is captured. “A lot of valuable data could get stranded in an application that no longer exists, or in a file format that no longer exists, or on a hard disk storage of some type that no longer works with new systems,” Butler said. IBM is working on a project to address this longevity issue, he added. “We’ve got a research project that looks at storing metadata with data that explains what that data is, and how to interpret it to an application.”
Ultimately, it’s not disk technology that will constrain storage; it’s the applications and organization around it. “The energy is shifting from how we store on our hard disk drives to the applications, security concerns, how we use all this data, and how we search through all this data,” Butler said. “Then there’s the privacy and ethical considerations of keeping all this data, because a lot of it is going to be about you and me. Who owns it? How do we keep it safe? We’ve got a lot of legal, ethical, and privacy concerns to sort through.”