Sandeepan Banerjee



Related Topics: Virtualization Magazine, Java EE Journal, XML Magazine, SOA & WOA Magazine


The Information Grid - XML and Databases Moving Toward Convergence


Two somewhat contrary-sounding drivers fuel the emerging renaissance in enterprise data management - virtualization and convergence. Virtualization is a framework for dividing up the resources of an organization into multiple execution environments through the application of one or more technologies such as hardware clustering, software partitioning, application modularization, emulation, and so on. Convergence, on the other hand, tries to bring diverse information assets - databases, mail stores, documents - under unified management. The coming Information Grid unites these opposing drivers.

The driver behind virtualization is lower cost. Today's emerging grid computing environments not only enable the virtualization of IT resources such as storage, bandwidth, and CPU cycles (supporting ad hoc provisioning, on-demand deployment, and decentralized management of those resources), but also allow looser coupling between applications and modules, which are no longer assumed to be monolithic clients and servers. Loosely coupled applications will run different modules on different nodes of a virtualized IT fabric, invoke functionality from remote Web services, exchange self-describing marked-up data, and orchestrate the behavior of diverse process modules. XML technologies underpin loosely coupled grid-computing applications. Within the data center, the first-generation XML Web services-based service-oriented architectures (SOAs) are already in development.

Convergence, on the other hand, seeks to bring together the management of all of your data assets. Today, less than 10 percent of the world's information is managed, and most of what is found to be valuable to manage - capture, store, index, search, analyze, share, and repurpose - falls into the category of traditional rows and columns, that is, structured data. Being able to manage the remaining data is what convergence is all about. Here again, XML technologies underpin the renaissance. In XML we finally have a data model that is capable of addressing highly structured data (rows and columns), textual unstructured data (documents), and anything semi-structured in between (messages, template-based business data documents, or metadata). Document-intensive industries are already benefiting from standardizing their document formats on XML. Content-creation vendors are XML-enabling their tools to make it easier to capture information in content repositories. Vendors are XML-enabling business intelligence tools, application servers, enterprise portals, and other infrastructure products to make it easier to share and repurpose XML-based information.
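To make the point about a single data model concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The element names (assets, row, message, memo) and the sample content are assumptions invented for illustration; they are not taken from any product or schema. The sketch shows one search running unchanged over a structured row, a semi-structured message, and a free-text memo with inline markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names and content are illustrative only.
DOC = """
<assets>
  <row><product>hinge</product><defects>12</defects></row>
  <row><product>latch</product><defects>0</defects></row>
  <message><subject>Complaint</subject><body>The hinge cracked.</body></message>
  <memo>Quarterly review: <b>hinge</b> defects rose.</memo>
</assets>
"""

def find_mentions(xml_text, term):
    """Return the tag of every top-level item whose text mentions `term`."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for item in root:
        # itertext() walks all nested text, so rows, messages, and
        # mixed-content documents are all searched the same way.
        text = " ".join(t.strip() for t in item.itertext() if t.strip())
        if term.lower() in text.lower():
            hits.append(item.tag)
    return hits

print(find_mentions(DOC, "hinge"))
```

The same function handles all three shapes because the XML tree, not the function, carries the structure of each item.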

The real driver behind convergence is better business intelligence across all assets. When unstructured information becomes a managed resource, it can be integrated into more day-to-day organizational processes, such as search and compliance, which are really types of business intelligence. Users can search across information that was previously stored in silos, such as file systems, document repositories, Web sites, and e-mail. Collaborative processes can be automated. Compliance policies - privacy, information life-cycle management, and audit - can be implemented uniformly across all organizational assets.

XML's applicability to both virtualization and convergence allows the industry to make progress on both fronts without the need for multiple disruptive paradigm shifts. Moving toward a new data-management architecture based on XML-backed information repositories distributed across XML/SOA fabrics will be a key future step for organizations. This architecture, which combines virtualization and convergence, can be called the Information Grid.

The Information Grid and Its Components
Grid computing can virtualize any IT resource, including infrastructure, applications, and information. In the Information Grid, resources span all of the data in the organization, as well as all of the metadata required to make that data meaningful. This data may be structured, semi-structured, or unstructured; stored in any location, such as databases, local file systems, or e-mail servers; and created by any application. The vision for the Information Grid builds on technologies such as semantics, distributed query, and distributed data management. The goal is to enable organizations to view all of their assets in a smooth continuum, from the Internet to the intranet, with uniform, semantically rich access.

Application Grid vs. Information Grid
Within an Application Grid, individual modules run on different parts of the infrastructure, with sharing of application state and control enabled via Web services. Each module, however, may still be tightly coupled to its data - database, file system, e-mail server - and intelligence about the data has to be compiled into the application module. An Information Grid, in contrast, is self-describing: the application modules can discover what sources exist, what data they possess, what the life cycle of that data is, and how that data should be interpreted. The Information Grid builds on the Infrastructure and Application Grids.

Let's say a manufacturing organization is interested in tracking product defects. The defect reports come into the organization in a variety of ways - customer e-mail, news stories, phone calls to support centers, and so on. At a pure application level, the organization could build e-mail-analysis, RSS-feed-search, or CRM defect-tracking modules to be dispatched across the grid, with each module hardwired to analyze exactly one kind of data. However, if new kinds of defect reports appear unpredictably (suddenly Internet blogs become a major source of defect information), then modules that are hard-coded to a particular kind of data prove fragile, and the Application Grid falls short. An Information Grid, where the defect reports describe their own meaning and modules interact with the reports to understand their semantics, is more flexible.
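The defect-tracking scenario above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any actual Information Grid implementation: the Report class, its descriptor field, and the role names "product" and "complaint" are all hypothetical. The idea is that each report carries a small self-description mapping semantic roles to its own field names, so one generic module can read reports from sources it has never seen before.

```python
from dataclasses import dataclass

@dataclass
class Report:
    """A hypothetical self-describing defect report."""
    source: str       # where the report came from, e.g. "email" or "blog"
    payload: dict     # raw content, in the source's own shape
    descriptor: dict  # self-description: semantic role -> payload key

def extract_defect(report):
    """Read the product and complaint using the report's own descriptor."""
    d = report.descriptor
    return {
        "source": report.source,
        "product": report.payload[d["product"]],
        "complaint": report.payload[d["complaint"]],
    }

email = Report(
    source="email",
    payload={"subject": "Model X hinge", "body": "Cracked after a week."},
    descriptor={"product": "subject", "complaint": "body"},
)

# A new kind of source needs only its own descriptor; extract_defect
# is not modified.
blog = Report(
    source="blog",
    payload={"title": "Model X hinge", "post": "Mine broke too."},
    descriptor={"product": "title", "complaint": "post"},
)

for r in (email, blog):
    print(extract_defect(r))
```

A module hardwired to e-mail would look up "subject" and "body" directly and fail on the blog report; here the knowledge of field names travels with the data.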

Infrastructure Provisioning and Failover
What are the major components of the Information Grid? At the very basic level, any grid involves the virtualization of resources. Infrastructure Grid resources include hardware resources such as storage, processors, memory, and networks, as well as software designed to manage this hardware, such as databases, storage management, system management, application servers, and operating systems. Provisioning of infrastructure resources involves pooling the resources together and allocating them to the appropriate consumers based on policies. For example, one policy might be to load-balance processing power across a farm of Web servers depending on the amount of processing demanded by each, thus treating the overall processing resource as a single pool and allocating that resource through supply and demand. In addition to the cost savings that accrue from better overall CPU utilization, spreading computing capacity among many different computers or spreading storage capacity across multiple disk groups removes single points of failure.
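As a rough illustration of the load-balancing policy described above, the following Python sketch treats processing power as a single pool and divides it among Web servers according to demand. The function name allocate, the server names, and the proportional-scaling rule are assumptions chosen for this example; real provisioning policies may weigh priorities, minimum guarantees, or cost.

```python
def allocate(pool_capacity, demands):
    """Share a single pool of capacity among consumers by demand.

    If total demand fits within the pool, every consumer gets what it
    asked for. Otherwise each share is scaled down proportionally, so
    the pool is never over-committed.
    """
    total = sum(demands.values())
    if total == 0:
        return {name: 0.0 for name in demands}
    if total <= pool_capacity:
        return {name: float(d) for name, d in demands.items()}
    scale = pool_capacity / total
    return {name: d * scale for name, d in demands.items()}

# Hypothetical farm of two Web servers drawing on 100 units of CPU.
print(allocate(100, {"web1": 30, "web2": 50}))   # demand fits in the pool
print(allocate(100, {"web1": 150, "web2": 50}))  # demand exceeds the pool
```

Because capacity is assigned from the pool rather than tied to one machine, a consumer's share can shift when demand elsewhere changes.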

More Stories By Sandeepan Banerjee

Sandeepan Banerjee is director of product management in Oracle's Server Technologies division. He is responsible for SQL, XML, and Text Search infrastructure, and especially their convergence into one platform for all data. Sandeepan has worked with database technologies for over 15 years, and the majority of them have been with Oracle.
