Research@DBTA

Hyper Storage: We’re Turning Our Enterprises into Pack Rats

IBM recently released figures estimating that worldwide digital archive capacity is expected to increase at a nearly 60 percent compound annual growth rate by 2012, which means the total amount of data will have increased by 800 percent during that time.
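The arithmetic behind these figures can be checked with a short sketch. It assumes growth compounds once a year at exactly 60 percent and reads "increased by 800 percent" as a ninefold total. The function names are illustrative and do not come from IBM's report.

```python
import math


def growth_multiple(cagr: float, years: float) -> float:
    """Total growth factor after compounding `cagr` annually for `years` years."""
    return (1.0 + cagr) ** years


def years_to_reach(multiple: float, cagr: float) -> float:
    """Years of annual compounding at `cagr` needed to reach `multiple`."""
    if multiple <= 0 or cagr <= 0:
        raise ValueError("multiple and cagr must be positive")
    return math.log(multiple) / math.log(1.0 + cagr)


if __name__ == "__main__":
    cagr = 0.60  # assumed: the "nearly 60 percent" figure, rounded up
    for years in (3, 4, 5):
        print(f"{years} years at {cagr:.0%}: {growth_multiple(cagr, years):.2f}x")
    # An 800 percent increase means ending at 9 times the starting amount.
    print(f"Years to reach 9x: {years_to_reach(9.0, cagr):.2f}")
```

Under these assumptions, a ninefold total corresponds to a little under five years of compounding.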

Organizations face a sharp increase in the amount of digital content they must manage under new compliance and discovery requirements. Worldwide file, database and email archive capacity will each skyrocket at a compound annual growth rate of up to 73 percent, altogether totaling nearly two trillion full filing cabinets of information.

To put it bluntly, that’s a lot of bits and bytes to be managed. We all know that storage requirements — for video, audio files, images, and transactional data — are exploding. Layer on top of that the various compliance mandates that dictate that organizations hold on to certain types of data for extended periods of time, and you get the idea.

A couple of years back, I was chatting with the IT director for one of the nation’s largest insurance companies, who remarked that as a result of Sarbanes-Oxley, when it comes to e-mails, or anything else, the information stays. “Right now, we save everything, no matter how far back,” he said. And this is an organization that handles almost a million and a half e-mails a day. I asked him how they were going to store and manage all those files. At the time, everything was backed up on disk, and his department was investigating ways to move older files to tape.

Society and the legal system are turning enterprises into pack rats, forcing them to spend time and money on the ludicrous exercise of saving and cataloging every scrap of correspondence and data that ever crossed their domains. As a result of this, and some very legitimate data requirements, multi-terabyte databases have become commonplace. Many databases are now cracking the 100-TB mark, compared to a few that finally hit the one-terabyte mark a decade ago. Our research with the Independent Oracle Users Group found that multi-terabyte databases are fairly common these days.

I had the chance to speak with Craig Butler, IBM’s storage products marketing manager, on the occasion of the 50th anniversary of disk-based storage. Basically, Butler explained, storage devices have been able to keep up with exploding data demands because of a sort of “Moore’s Law of storage.” Over the past 50 years, capacity has grown by at least 40 percent a year, and even accelerated to 100 percent a year over the last decade. All this came with no change in price, and the trend is likely to continue into the near future.

However, nothing has come along that really can take the place of disk or tape, Butler said, and there is nothing emerging on the horizon. In fact, Butler predicts that we’ll still be relying on disk drives a decade from now, except that they’ll be smaller and more ubiquitous.

There are two challenges that we face with storage as time goes by, Butler said. First, there’s a need to be able to better manage and access the huge proliferation of data piling up in enterprises. “We need new search and retrieval techniques to find the right data,” he said. “We have amassed all this capacity, but human beings don’t have time to search through it all.” Unstructured information is a real challenge. Take video files, for example: when the police need to review security camera footage for suspects, someone has to spend hours or even days watching analog video.

The other challenge is being able to access storage media decades from the time the data or image is captured. “A lot of valuable data could get stranded in an application that no longer exists, or in a file format that no longer exists, or a hard disk storage of some type that no longer works with new systems,” Butler said. IBM has been working on a research initiative that has been looking at storing data and artifacts with descriptive metadata, he said.

In terms of hardware, IBM has been exploring approaches such as molecular-size switches that could toggle molecules to different states — a sort of dense way to store ones and zeros.

Ultimately, Butler pointed out, it’s not disk technology that will restrain storage; it’s the organization and applications around it. “The energy is shifting from how we store our hard disk drives to the applications, security concerns, how we use all this data, and how we search through all this data,” he said. “Then there’s the privacy and ethical considerations of keeping all this data, because a lot of it is going to be about you and me. Who owns it? How do we keep it safe? We’ve got a lot of legal, ethical, and privacy concerns to sort through.”

Get Your Kicks on CICS for SOA

RedMonk Analyst James Governor, who has had one ear to the IBM space and another to the burgeoning SOA/Enterprise 2.0 space for some time now, recently reflected on the changing role of CICS, once the standard middleware for mainframes. (Around before middleware was even called middleware.)

James wonders whether IBM’s pitch to abstract CICS functions into an SOA-based service layer can alleviate the need for hard-core mainframe programming and administration skills. Under the law of “leaky abstractions,” he says, things go wrong, and there will always be a need to dig underneath the virtual layer and learn to code what’s underneath.

Thus, if the mainframe is to be a player in SOA, companies will need a solid core of mainframe skills — not just SOA skills.

Still, maybe CICS’s time has come again. “Originally it was a customer information control system. I think it’s time to make CICS a front and center information management platform again, rather than just providing transaction management services. That’s a potential new frame for the frame. And yes – IBM’s moves to offer REST interfaces to CICS is a good move in that direction, as is all the WS-* service enablement.”

Million-Server Data Centers? Welcome to the Era of Extreme Integration

“This is the era of extreme integration… We’re not too far away from seeing a million-server data center.” This prediction by Kirk Skaugen, vice president and co-general manager at Intel Corporation, was part of an experts’ panel at the recent Windows 2008 launch, hosted by Al Gillen, vice president at IDC. (Archived video of the panel is available; viewing it requires a Microsoft Silverlight download.)

While the vast majority of enterprises, of course, will not be faced with managing a million servers anytime soon, it’s fair to say that server proliferation and sprawl is now a pervasive problem, compounded by enormous growth in data and storage requirements.

Add to this the impetus to better align with the business, by being more flexible and adaptable to fast-changing processes. Given the prominence of the panel at the Windows 2008 event, this signals that data center efficiency is clearly an emerging positioning strategy for Microsoft – and perhaps a defensive one as well. Other leading vendors, including IBM and Sun, have been promoting this concept for several years, and have gained a lot of traction by blaming Windows-based servers for a lot of the server sprawl and waste that now dots enterprise landscapes.

Skaugen fired a shot across the bows of the big server companies by pointing out that while distributed systems and storage put their share of strain on data center environments, large centralized systems are not above blame, either. “Intel and AMD today have 94 percent of the units of servers, but six percent of what is still RISC and mainframe computing make up almost 50 percent of the IT spend on hardware spend,” he said.

Regardless of platform environment, panelists agreed that the emphasis of data center value is rapidly evolving from “watts and slots” to achieving higher efficiency and better business delivery. The bottom line is that data center infrastructures need to deliver more for the business, while reducing their own footprints. “If you look at the emergence of these huge data centers, the new metric is going to be performance per watt per cubic foot – how much computational ability can you put into a volume space,” said Randy Allen, corporate vice president of AMD’s Server and Workstation Division.
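The metric Allen describes can be written as a simple ratio. The sketch below is one possible reading, assuming performance is a benchmark score in any consistent unit, power is average draw in watts, and volume is the occupied space in cubic feet. The function name and all sample figures are hypothetical and are not taken from the panel.

```python
def perf_per_watt_per_cubic_foot(performance: float,
                                 power_watts: float,
                                 volume_cubic_feet: float) -> float:
    """Computational output normalized by both power draw and physical volume."""
    if power_watts <= 0 or volume_cubic_feet <= 0:
        raise ValueError("power and volume must be positive")
    return performance / power_watts / volume_cubic_feet


if __name__ == "__main__":
    # Hypothetical configurations: (label, benchmark score, watts, cubic feet)
    configs = [
        ("older rack", 1000.0, 5000.0, 40.0),
        ("denser rack", 4000.0, 8000.0, 40.0),
    ]
    for label, perf, watts, volume in configs:
        score = perf_per_watt_per_cubic_foot(perf, watts, volume)
        print(f"{label}: {score:.4f} units per watt per cubic foot")
```

Because the score divides by both watts and volume, a configuration only improves on it by delivering more work without a proportional rise in power or floor space.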

James Mouton, vice president of the platform division within the HP Industry Standard Server business unit, suggested that executives and managers view data center metrics over at least a three-year time horizon. “You’ve got to look at that cost at least over a three-year period, including the power and cooling prices,” he said. “The manageability of the infrastructure overall is key. Unless you can see what’s going on, and map that to the application service level, you’re missing the forward-looking thinking. You have to make sure you’re looking at that very holistically, including business outcomes. It may or may not be a server uptime question. More likely, it’s a question of, ‘Are your apps doing what they’re expected to do? Are your IT costs going down, as well as your overall three-year horizon of overall costs?’”
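A three-year view of this kind can be sketched as purchase price plus energy plus staffing. The model below is a simplification, not a method described by the panel: it assumes constant average power draw, treats cooling as a fixed fraction of server energy, and uses hypothetical prices throughout.

```python
HOURS_PER_YEAR = 24 * 365


def three_year_cost(purchase_price: float,
                    avg_power_watts: float,
                    price_per_kwh: float,
                    cooling_overhead: float = 0.5,
                    annual_admin_cost: float = 0.0,
                    years: int = 3) -> float:
    """Purchase price plus energy (with cooling overhead) plus administration."""
    server_kwh = avg_power_watts / 1000.0 * HOURS_PER_YEAR * years
    # cooling_overhead is an assumed fraction of server energy spent on cooling
    energy_cost = server_kwh * (1.0 + cooling_overhead) * price_per_kwh
    return purchase_price + energy_cost + annual_admin_cost * years


if __name__ == "__main__":
    # Hypothetical server: $3,000 to buy, 400 W average draw, $0.10 per kWh
    total = three_year_cost(3000.0, 400.0, 0.10,
                            cooling_overhead=0.5, annual_admin_cost=500.0)
    print(f"Three-year cost: ${total:,.2f}")
```

Even with these made-up figures, operating costs roughly match the purchase price over three years, which is why the entry price alone can mislead.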

Panelists also agreed that virtualization was clearly the best strategy for managing the complexity of server sprawl. Allen noted that the ability to pack greater processing power into smaller, more efficient spaces is a natural opportunity for virtualizing resources. “Virtualization flows right into that, offering simplicity in terms of managing all the complexity,” he said.

As enterprises demand more and more computing power, the challenge is more efficiently managing large-scale resources, and virtualization is one key element of such an approach. The challenges not only include improving the efficiency of data centers, but also reining in energy consumption while correcting the underutilization of servers. As Skaugen put it, “The challenge that everyone’s facing today is that servers are only used 10 to 15 percent of their utilization in a standard data center, which is appalling.”
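The utilization figures Skaugen cites suggest a rough way to estimate what consolidation could achieve. The sketch below assumes identical servers whose workloads add up linearly, and it ignores memory, I/O and peak-load effects, so it is a back-of-the-envelope estimate only. The 60 percent target and the function name are assumptions for illustration.

```python
def hosts_after_consolidation(server_count: int,
                              avg_utilization_pct: int,
                              target_utilization_pct: int) -> int:
    """Estimate hosts needed if the same load runs at a higher target utilization."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    if not 0 < avg_utilization_pct <= target_utilization_pct <= 100:
        raise ValueError("need 0 < average <= target <= 100")
    total_load = server_count * avg_utilization_pct
    # Integer ceiling division avoids floating-point rounding surprises
    return -(-total_load // target_utilization_pct)


if __name__ == "__main__":
    for avg in (10, 15):
        hosts = hosts_after_consolidation(100, avg, 60)
        print(f"100 servers at {avg}% -> about {hosts} hosts at 60%")
```

A real consolidation plan would size for peak rather than average load, so the actual host count would likely be higher than this estimate.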

Rick Becker, vice president of software and solutions for Dell, added another element that should be considered alongside virtualization: intelligent network storage. “Intelligent storage is the next piece of the puzzle,” he said. “The only thing growing faster than server proliferation is data proliferation. As we go digital, we have to get intelligent about how we manage and grow storage. You also get more benefit out of a virtual environment when in fact the workloads are on a shared resource.”

Enterprises need to view their data center costs and benefits more holistically in terms of the business, panelists agreed. “In the old days, the question was on the entry price of the gear, and sometimes, enterprises would miss the whole, ‘once I deploy it, how do I deploy it, how do I maintain it, what type of staff do I need, what type of training do they need to have?’” said Mouton.

Ultimately, the data center needs to view itself as a business, and not as a technology shop – a view echoed by another panelist, Ajei Gopal, senior vice president and general manager of the Enterprise Systems Management business unit at CA. Gopal said there’s an evolution underway, from “IT as a cost center to IT as a driver of the business.” Data centers need to think of themselves more as service centers, he added. As part of this initiative, Gopal advises data center executives and managers to embrace best practices such as ITIL [Information Technology Infrastructure Library].

Ultimately, through strategies such as adopting best practices and virtualization, many companies may be able to forgo the need to dramatically expand or build new data centers to alleviate the crush of data and applications. “We have the ability to find the hidden data centers we have in our data centers,” said Becker.