Research@DBTA

How Far Has XML Come?

(This post is excerpted from an article that appeared in the August 2007 issue of Database Trends & Applications.)

It was well over a decade ago that XML was first introduced as a lingua franca that could bring together even the most disparate data environments. While it has become a fairly ubiquitous part of the enterprise landscape, has it lived up to its promises? A mixed picture emerges. Many companies are leveraging the standardization that is possible through XML-based interfaces to link up and better integrate with partners’ systems. But challenges remain, and adoption still touches relatively few applications.

In one sense, XML is helping to transform the industry as we know it, in ways that go well beyond the original predictions. Nowhere is this more apparent than in the rise of software as a service, in which data and calls to applications can be seamlessly exchanged across corporate boundaries. “We wouldn’t as a company be able to have the success that we have, nor deliver the kind of results that we have for our customers, without XML,” said Adam Gross, vice president of developer marketing for Salesforce.com. “XML powers the advanced integration capabilities for our largest sophisticated enterprise accounts, like Dell, Cisco, and Merrill Lynch.”

Indeed, whether it’s for the external enterprise or for inside-the-firewall deployments, there is still no shortage of enthusiasm over the potential benefits XML can deliver. One company that serves the publishing industry, Mark Logic, reports that all of its customers are either already using XML or are interested in adopting it as a new standard for managing content. “Companies are increasingly becoming excited about what XML will enable them to do with their content,” John Kreisa, director of product management at Mark Logic, told DBTA. “Organizations are realizing the importance of storing the information about their data along with the data itself, and that this is the key to enabling content reuse and repurposing.”

Standardized Integration: For many companies concerned with data and application integration, there simply has been no better way than through XML. “XML provides a standards-based way for data to interact between applications, databases, and even legacy operational systems,” Scott Gidley, co-founder and chief technology officer for DataFlux (a subsidiary of SAS), told DBTA. “From a vendor perspective, we can focus on development of data quality and data management services instead of developing APIs for specific platforms, applications, and programming languages. From a customer perspective, companies have a standardized method of application integration that doesn’t change from vendor to vendor.”

Salesforce’s Gross observes that his company now handles 50 percent of all of its transactions with XML. “All of the actual requests into our servers and our data center, are now XML and SOAP, as opposed to HTTP and HTML Web browser requests. We actually do more XML than we do non-XML transactions,” he said. As recently as four years ago, all transactions were HTML, Gross said.

“One of the things that people frequently have questions about is integrating with our service,” Gross explained. “How do you integrate with an application that lives out in somebody else’s data center? For a lot of our customers, integration is a key requirement. XML and Web services and SOAP have allowed us to do that.”
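To make the contrast with browser traffic concrete, here is a minimal Python sketch of what an XML/SOAP request can look like on the wire and how a receiving system might read it back. It uses only the standard library's ElementTree. The SOAP 1.1 envelope namespace is the real one, but the service namespace `urn:example:crm`, the `getAccount` operation, and its `accountId` parameter are hypothetical; this is not Salesforce's actual API.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:crm"                              # hypothetical service namespace

# Friendly prefixes for serialized output (otherwise ElementTree emits ns0, ns1, ...).
ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("crm", APP_NS)


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1]


def build_request(operation: str, **params: str) -> bytes:
    """Wrap a (hypothetical) operation call in a SOAP 1.1 Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def read_request(payload: bytes) -> tuple[str, dict[str, str]]:
    """Receiving side: recover the operation name and its parameters."""
    root = ET.fromstring(payload)
    call = root.find(f"{{{SOAP_ENV}}}Body")[0]  # first element inside <Body>
    params = {_local_name(child.tag): child.text for child in call}
    return _local_name(call.tag), params


payload = build_request("getAccount", accountId="001-42")
print(payload.decode("utf-8"))
print(read_request(payload))  # ('getAccount', {'accountId': '001-42'})
```

The point of the design is that neither side needs to know the other's platform or language: both only agree on the envelope namespace and the element names inside the body.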

Adoption Rates: However, while adoption of XML is wide, it is not deep. Recent data from Evans Data, for example, finds that 61 percent of 400 surveyed companies use XML in at least some of their applications. Yet only three percent report that XML is now supported by the majority of their applications.

For example, Bart Grantham, director of software development for LogicWorks, told DBTA that his company currently uses “XML for very little of our internal development work, generally restricted to AJAX for client-side work.” Grantham pointed out, however, that his company’s commercial products use XML.

Kreisa agrees that adoption is still relatively low, noting that XML “is still a relatively new format.” He added, however, that “with the adoption of XQuery as a standard query language for XML, organizations will now have the ability to build applications and more fully leverage their XML content. Consequently, we see more and more organizations creating their content directly into an XML format.”
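Python's standard library has no XQuery engine, so the sketch below is only a rough stand-in: it uses ElementTree's limited XPath subset to show how markup that describes content (the "information about their data" Kreisa mentions) lets an application pull out just the pieces it needs. The article markup and the `topic` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: the tags and attributes describe what each piece of content is.
ARTICLE = """
<article>
  <title>How Far Has XML Come?</title>
  <section topic="integration">
    <para>XML-based interfaces link partner systems.</para>
  </section>
  <section topic="performance">
    <para>Parsing adds processing overhead.</para>
    <para>Binary encodings reduce verbosity.</para>
  </section>
</article>
"""


def paras_for_topic(xml_text: str, topic: str) -> list[str]:
    """Return the text of every <para> inside sections tagged with the given topic."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a small XPath subset, including [@attr='value'] predicates.
    return [p.text for p in root.findall(f"./section[@topic='{topic}']/para")]


print(paras_for_topic(ARTICLE, "performance"))
# ['Parsing adds processing overhead.', 'Binary encodings reduce verbosity.']
```

A real XQuery processor goes much further (joins, FLWOR expressions, constructing new XML), but the underlying idea is the same: queries address content by its markup rather than by position in a blob of text.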

What are some of the issues impacting XML? Grantham stated that the markup language adds “a great deal of overhead to the processing of data as well as complexity for software development. Not all data structures map cleanly onto trees, and many data stores do not need human readability or editability.”

“One of the biggest challenges with XML, besides separating the hyperbole from the reality, is syntax fragility,” Grantham noted. “Solid libraries do much to alleviate this problem, but as a data format there is much that can go wrong in syntax and parsing of XML. XML’s strength is in being human-readable, and leveraging the industry’s familiarity with SGML-derived languages. It is also quite flexible. But, in my opinion, it should not be a first choice for a machine-to-machine data format, due to processing and memory overhead tradeoffs.”
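Grantham's point about syntax fragility is easy to demonstrate. The XML 1.0 specification treats a well-formedness error as fatal, so a conforming parser rejects the whole document instead of skipping the one bad field. A minimal Python sketch using the standard library's ElementTree, with a hypothetical order document:

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text: str) -> str:
    """Parse xml_text and report the root tag, or the parser's error message."""
    try:
        root = ET.fromstring(xml_text)
        return f"ok: <{root.tag}>"
    except ET.ParseError as err:
        # One syntax slip (here, a missing close tag) makes the entire payload unreadable.
        return f"error: {err}"


well_formed = "<order><id>42</id></order>"
malformed = "<order><id>42</order>"  # <id> is never closed

print(try_parse(well_formed))  # ok: <order>
print(try_parse(malformed))    # an "error: ..." message describing the mismatched tag
```

Mature libraries make the error easy to catch and report, which is Grantham's "solid libraries" point, but they cannot recover the data a sender failed to encode correctly.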

Performance Issues: Appliance solutions on the market, such as IBM DataPower, are designed to offload XML processing overhead from servers and onto dedicated hardware that resides on the network.

However, not everyone believes XML will drag down performance. Mark Logic’s Kreisa, for one, believes hardware capacity will keep up with XML-based workloads. “With the continued reduction in storage costs we rarely see this as a consideration regarding converting existing content to XML,” he explained. “Organizations should look for a content server or content base that has the scalability and performance to be able to handle the high volume of XML content.”

The industry has been responding with new approaches, including work at the World Wide Web Consortium (W3C) on binary XML, which encodes XML information in a compact binary form rather than as more human-readable, but more verbose, text.

The most pronounced issue with XML cited in the Evans Data research was the difficulty of writing XML schemas and document type definitions (DTDs), which define the structure of XML-based documents. One out of four enterprise developers say this is an issue, along with one out of five who feel that XML syntaxes create performance overhead, especially at the server level for XML parsing.

Issues with semantics also hamper XML adoption. “XML has helped standardize the ‘syntax’ for sharing data between systems, but has not addressed the far more important issue of semantics: what the data means,” Cliff Longman, chief technology officer of Kalido, told DBTA. “It is as though XML has allowed a phone connection between two people, but if one speaks English and the other speaks Chinese, the phone connection does not help much. It is semantic interoperability on top of XML that has become the new battleground for standardization at the enterprise level,” Longman said.

“Metadata really is one of the dirtier aspects of information integration,” Michael Curry, director of product strategy and management, IBM Information Platform and Solutions, told DBTA. “For example, a business might refer to customer information in one database with the phrase ‘customer ID,’ and put the same information under the phrase ‘customer account number’ in another database. This adds to the confusion.”
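Longman's and Curry's point can be sketched in a few lines of Python. Both records below are well-formed XML and parse without complaint, but nothing in XML itself says that `customer_id` and `customer_account_number` mean the same thing; that knowledge has to be supplied separately, here as a hand-written synonym table. The element names, values, and the canonical name `customer_key` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical source systems: same syntax, different names for one business concept.
CRM_RECORD = "<customer><customer_id>C-1001</customer_id></customer>"
BILLING_RECORD = (
    "<customer><customer_account_number>C-1001</customer_account_number></customer>"
)

# The semantic mapping lives outside XML: someone must declare these names equivalent.
SYNONYMS = {
    "customer_id": "customer_key",
    "customer_account_number": "customer_key",
}


def canonical_fields(xml_text: str) -> dict:
    """Parse a record and rename each child element to its canonical field name."""
    root = ET.fromstring(xml_text)
    return {SYNONYMS.get(child.tag, child.tag): child.text for child in root}


print(canonical_fields(CRM_RECORD) == canonical_fields(BILLING_RECORD))  # True
```

Without the `SYNONYMS` table the two parsed records would simply look different, which is the "English and Chinese over the same phone line" problem in miniature.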

Other Challenges: Other issues dampen XML adoption as well. Xiong Wang, associate professor in the Department of Computer Science at California State University, Fullerton, notes that the industry still suffers from “a shortage of mature XML authoring tools, a lack of stable standards, and numerous customizations required to make software solutions work.”

Nor is XML a good fit for structured data, Wang stated. “With structured data XML induces too much unnecessary redundancy. RDBMS on the other hand is a perfect fit for such data,” he said. In addition, Wang continued, “lack of efficient storage structures is another challenge with XML.”
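Wang's redundancy argument can be checked with a small Python sketch that serializes the same hypothetical three-row table as CSV and as element-per-field XML. In CSV the column names appear once, in a header; in this XML layout every value is wrapped in a start and end tag that repeat its field name, so tag overhead grows with the number of rows. Attribute-based or otherwise more compact XML designs would narrow the gap, so treat this as an illustration of the tendency, not a benchmark.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tabular data: every record has the same three fields.
FIELDS = ("id", "name", "state")
ROWS = [
    ("1001", "Acme Corp", "NY"),
    ("1002", "Globex", "CA"),
    ("1003", "Initech", "TX"),
]


def as_csv(rows) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(FIELDS)  # field names appear once, in the header
    writer.writerows(rows)
    return buf.getvalue()


def as_xml(rows) -> str:
    root = ET.Element("customers")
    for row in rows:
        rec = ET.SubElement(root, "customer")
        for field, value in zip(FIELDS, row):
            # field names are repeated as start and end tags around every value
            ET.SubElement(rec, field).text = value
    return ET.tostring(root, encoding="unicode")


print("CSV bytes:", len(as_csv(ROWS)), "XML bytes:", len(as_xml(ROWS)))
```

Running it on these three rows, the XML form comes out more than three times the size of the CSV form; a relational table avoids even the per-row delimiters by storing the schema once in the catalog.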

There are situations where data is better left as it is, and not converted to XML. “It’s not necessary to use XML in situations where the data has a regular structure and fits neatly into rows and columns,” Kreisa pointed out. “While it is possible to use XML for this type of content, you’re probably not fully leveraging the strengths of the XML standard.” Clearly, XML has to be thought of as an option that must be applied appropriately.

In addition, the RDBMS vendors are increasingly addressing XML integration. For example, IBM’s recently released DB2 9 (“Viper 2”) is described as a “hybrid data server to serve data from both pure relational and pure XML structures,” Bernie Spang, director of IBM data servers, told DBTA. According to IBM, the aim is “lower development time and cost savings that make ‘XML as data’ cost-effective for the first time.”

“DB2 accomplishes this functionality by storing XML data in a hierarchical structure that naturally reflects the structure of XML, which allows DB2 to efficiently manage this data and eliminate much of the complex and time-consuming parsing required for XML,” Spang said.

Five Trends Bubbling Under the Surface of IT

There are always many new and shifting paradigms shaping the direction of enterprise information technology. We’ve talked many times here about the megatrends, such as open source and SOA, that are upending our preconceived notions of how data centers should be built, run, and financed. Here at Database Trends and Applications and Unisphere Research, we have also been watching a proliferation of new trends bubbling under the surface. Will they emerge as full-fledged data center megatrends in their own right? It remains to be seen; enterprise IT and data centers tend to be relatively conservative and cautious about leaping into new paradigms. So the impact of some trends may take years to be fully felt.

The money is now following software as a service. They say if you want to know where things are going, follow the money. At the beginning of March, i2 and IBM announced an initiative in which the two companies will collaborate to offer i2’s FreightMatrix application on an on-demand subscription basis. Dave Mitchell, director of software as a service strategy for IBM, told me in a recent interview that nowadays, “most of the start-up application vendors are selecting SaaS as their primary model, and increasingly as their sole model for delivering applications. It’s gotten to the point now where most of the venture capital firms are only investing in ISVs that deliver their applications as a software service.” Networking giant Cisco also sees gold in the SaaS model, having just plunked $3.2 billion down on WebEx, the online conferencing service.

While SaaS shows a lot of promise as a software delivery method, the model has yet to be proven, and there are still concerns about data security and service reliability.

Virtualization is hot, thanks to data center consolidation initiatives. One major analyst firm reportedly cut its server shipment forecasts through 2010, especially for commodity platforms, by 4.5 million servers, all due to virtualization. Of course, the mainframe has been virtualized for years, meaning it has been stealthily taking on new workloads the whole time, well beneath the radar of industry analysts and pundits. If anything, there are more end users for mainframe-based applications than at any time in the platform’s history, thanks to Web services and SOA. You just don’t need as many mainframes to support this growth, thanks to virtualization and the efficiency of the way the system handles workloads.

Expect to see more virtualization across the board, especially in non-mainframe environments. “There is absolutely no question in our minds that data centers are compelled to virtualize and they are going to do it in large numbers this year,” stated Earl Hines, director of product marketing at uXcomm. “They must, simply because they are running out of power and cooling and rack space. There is no additional space to meet their needs,” said Hines.

The problem is managing large-scale operations, which virtualization does not make easy. “There is a fundamentally different set of problems that occur when you go from tens or hundreds of something to having thousands or tens of thousands of any new technology deployed in the data center,” said Hines.

Data centers get greener. Earlier this year, IBM announced “Project Big Green,” in which it is redirecting $1 billion per year across its businesses, mobilizing the company’s resources to “dramatically increase the level of energy efficiency in IT.” The savings are substantial: according to IBM, clients with an average 25,000-square-foot data center should be able to achieve 42 percent energy savings. Based on the energy mix in the U.S., these savings equate to 7,439 tons of carbon emissions saved per year.

Project Big Green targets corporate data centers where energy constraints and costs can limit their ability to grow. IBM will promote high-density computing systems utilizing virtualization technology, along with energy-efficient power and cooling technologies. At the same time, Google announced that it is nearing completion of its solar power project at its Mountain View headquarters, and estimates that it can meet 30 percent of its peak electricity demand from solar panels.

This is all commendable, and it certainly is in the best self-interest of companies to cut power costs. But how many tons of carbon emissions does the use of information technology actually help reduce? Probably far more than it emits.

Enterprise 2.0 will help increase data center productivity and empower end users. This year, vendors, experts, and pundits have been touting Enterprise 2.0, an umbrella term that encompasses everything from wikis to application mashups to software as a service, as the “Next Big Thing” poised to sweep enterprise computing. Harvard’s Andrew McAfee, for one, predicts major shifts in the way IT is organized across the enterprise. However, Tom Davenport, noted evangelist of “Competing on Analytics,” said Enterprise 2.0 is nice to have, but will have little impact on corporate IT operations for the foreseeable future.

As with SaaS, the advantages of Enterprise 2.0 approaches have yet to be tested and proven. Enterprise 2.0 may not deliver measurable impacts for now, but it has the potential to evolve into an enabler of greater participation and flexibility in IT development and management, though that potential is hard to quantify.

Do more with less. That’s a phrase that has been uttered endlessly down through the decades. And, no matter what the state of the economy, the pressure remains on data center executives to tamp down spending and TCO as much as possible.

The greatest obstacle in the way of IT departments achieving their goals for 2007 is simple – a lack of budget, according to a proprietary study conducted by Unisphere Research, the research arm of DBTA. In a survey of 248 data management professionals and executives, 39 percent of the respondents said that a lack of budget was the primary obstacle to their IT groups achieving their goals this year. A lack of time was the second greatest obstacle, mentioned by 29 percent of the respondents, followed by lack of skilled professionals, which was tabbed by 16 percent of those who participated in the survey.

While organizations remain stingy with IT funding, they are quite willing to open up their wallets for initiatives that can directly impact the bottom line – business intelligence and analytics continue to be big spending categories.

Competing on Analytics

In theory, companies are now capable of capturing and analyzing the details of every minute transaction and event that occurs within their walls. Although businesses are being inundated with data, much of it is the wrong data. It’s not timely, and it’s not getting to the right end users. This is perhaps one of the most vexing challenges to “competing on analytics,” now seen as a key strategy for attaining competitive differentiation, and well documented in popular books by industry experts such as Tom Davenport of Babson College.

“In the old days, you could more or less rely on your competitors being at about the same level of efficiency, but analytics changes the playing field dramatically,” Joe Pusztai, director of product marketing for Applix, told DBTA. “Business process automation is important too, but it essentially only enables you to execute strategy; analytics is what enables you to set the strategy in the first place, for example, by detecting trends and ‘seismic shifts’ in your industry early.”

How can such shifts be accurately and quickly detected? Some leaders in competing on analytics have employed multi-faceted approaches that leverage a wide range of data sources, and they extend this capability to as many end users as possible.

One such organization, BlueCross BlueShield of Tennessee (BCBST), offers account reporting to its largest groups, which allows the company to respond more effectively to RFPs to acquire new business and retain existing clients, Frank Brooks, senior manager of data resource management and chief data architect for BCBST, told DBTA. Analytical capabilities currently delivered via the Internet to BCBST clients include utilization management through interactive reports and OLAP data cubes. BCBST plans to provide additional analytical capabilities for its account reporting packages, including national and regional benchmarking data from Blue Health Intelligence (a national data warehouse of BlueCross BlueShield Plans), Brooks said. “We’re now in the process of enhancing our business intelligence and analytical infrastructure to also support instant access to the results of text analytics and predictive analytics processing.”

BCBST is taking a multi-pronged approach involving traditional business intelligence tools, as well as data mining, text analytics, and enterprise search to sift through a variety of company data sources to spot trends and patterns in service, claims, and utilization.

Clearly, the industry is moving into a new generation of tools that focus more on real-time delivery of operational data, as well as extending reporting capabilities to corporate performance management dashboard systems to provide a picture of the health of the business.

However, many companies are inundated with data, and are still mired in earlier generations of query and reporting products. “Most organizations are barely at the toddler stage when it comes to analytics,” Eric Blankenburg, vice president of application and integration solutions at Avanade, told DBTA. “We are drowning in information. It’s past the point where it is even possible for us to interpret the data and make reasoned decisions without some significant level of analytical support.”

A recent survey of 296 data applications managers, conducted by Unisphere Research for the Oracle Applications Users Group (OAUG) in partnership with Cognos, found that a paradox exists in most organizations today. Decision-makers are overwhelmed by information, but at the same time, there isn’t enough of the right information available. The study found that 91 percent of companies said that their decision-making capabilities were stymied by a lack of complete information. Yet, three out of four also report they suffer from “information overload.” Identifying and separating out the pieces of data that have the most value may be like looking for a particular piece of straw in a haystack. Add to this the fact that most end users do not have access to the latest BI tools, and still have to go through IT or other departments. The majority of respondents to the OAUG survey, in fact, report that it takes three to five days or longer to get a report out of IT. Overall, the survey found, fewer than 10 percent of employees have access to BI and corporate performance management tools.

“We’re still only touching the surface of business intelligence,” Marc Andrews, director of strategy and business development for unstructured information at IBM, told DBTA. “The number of business processes and the number of users across the organization that are leveraging the technologies is still only a fraction of the potential population.”

Other industry experts strongly agree that BI has not proliferated as thoroughly as it should. “Companies have yet to find an effective way to deliver BI capabilities to more than a handful of ‘power users’ who have the technical expertise to leverage BI tools,” Mark Lorion, director of product marketing for the Spotfire Division of TIBCO, told DBTA. “Instead, their employees are using spreadsheets and other packaged applications because the BI platforms are not flexible enough to suit their analysis needs or pace. BI tools frequently are not intuitive, and require heavy IT involvement to reconfigure cubes or generate new reports. Because they require IT involvement, they do not work at the speed of frontline decision-makers.”

How does a company leverage such overwhelming data stores and learn how to compete on analytics? To successfully compete on analytics, companies need to embed analytic functionality in every mission-critical application across the enterprise, IBM’s Andrews pointed out. “Most companies are using BI for traditional querying and reporting, not for real-time operational business intelligence. They’re not using it as part of their business applications: as part of processing a claim, as part of helping a customer resolve a problem, or as part of processing a transaction. The future is enabling people to access business intelligence within a call center application, not as a standalone application that they have to go to for querying and reporting.”

Data quality also takes on greater urgency as companies turn on operational analytics. Mary Crissey, analytics marketing manager at SAS Institute, sounded a note of caution that many companies may move too quickly to rely on real-time or near real-time data without vetting it for accuracy or timeliness. “With business intelligence, there’s a data integration piece, which involves the storage and cleansing of the data. We’re all putting data together from different sources: some people are keystroking it in, some people are collecting it in from the Web, some people are getting it over the phone. You get all this data coming in, lots of times, different formats, and you have to merge it all together. Cleansing of that data in real time is critical.”
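As a rough illustration of the cleansing step Crissey describes, the Python sketch below normalizes dates and phone numbers that arrive in different formats from different hypothetical channels, so records can be merged on consistent values. The accepted date formats, including the assumption that slashed dates are US month-first, are illustrative choices; values that match no known format are flagged (returned as `None`) rather than guessed.

```python
from datetime import datetime
from typing import Optional

# Hypothetical formats, one per intake channel. Slashed dates are ASSUMED to be US month-first.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flag for review instead of guessing


def normalize_phone(raw: str) -> str:
    """Keep digits only, so '(615) 555-0100' and '615.555.0100' compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


incoming = [
    {"source": "web", "date": "2007-08-01", "phone": "(615) 555-0100"},
    {"source": "phone", "date": "08/01/2007", "phone": "615.555.0100"},
    {"source": "keyed", "date": "1 Aug 2007", "phone": "615 555 0100"},
    {"source": "keyed", "date": "Aug 1st", "phone": "n/a"},
]

for rec in incoming:
    print(rec["source"], normalize_date(rec["date"]), normalize_phone(rec["phone"]))
```

The first three records collapse to the same date and phone number; the last one is surfaced as unparseable, which is the kind of record that should be vetted before it feeds real-time analytics.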

Eventually, prices of sophisticated analytical tools, still out of the reach of many companies, may begin to come down as capabilities become more widespread. This would dramatically improve the availability of such tools and capabilities. “There has not been enough innovation in the BI industry for years,” said Scott Yara, co-founder and president of Greenplum, who posited that more powerful commodity systems and open source software are poised to disrupt the entire BI industry. “Only very recently has it become possible to buy a high-performance database for large-scale BI for under $1 million per terabyte. By comparison, you can go to the store today and buy a terabyte of storage for well under a thousand dollars. It’s the cost and performance of the traditional solutions that have made it difficult for companies to adopt BI to analyze all, or any significant portion, of their data.”

In the meantime, data managers need to sharpen their selling skills to cost-justify BI expenditures to skeptical corporate management. Demonstrating ROI on new BI technologies was the greatest challenge for BCBST, Brooks related. “Our biggest issue is the justification of new technology where the value cannot be easily quantified,” he explained. “Unlike operational systems where projects or enhancements provide cost reductions in the form of increased efficiency and productivity, information management infrastructure enhancements often enable a more effective organization where cost savings or increased revenue are difficult to correlate.” For many of the potential uses of enterprise data warehouses, for example, it is “difficult to forecast a return on investment,” he said.

This requires greater understanding and education across the business as a whole. “Analytics at the strategic and competitive level of decision-making in enterprises is typically under-resourced, misunderstood, and doesn’t lend itself as well to digital solutions as the kind of tactical and day-to-day decisions that analytics, BI and knowledge management solutions are most commonly applied to,” observed Craig Fleisher, co-author of Business and Competitive Analysis (Financial Times Press) and professor at the University of Windsor. “Many companies are, to some degree, competing on analytics, but the bigger issue is to what degree are they competing on analytics? Since the field of analytics, particularly where databases, systems, solutions and applications are concerned, is still in its early stages, companies are in various stages of moving up their analytics learning curves,” he told DBTA.

Ultimately, Pusztai observed, the best pitch for greater analytics comes from the late management thinker Peter Drucker, who said, “We have to stop counting and start measuring.” “This means that many business analysts out there are actually ‘counting,’ not analyzing. BI and analytics puts an enormous amount of power into people’s hands, but they have to learn how to leverage it better,” Pusztai said.
