Data is overshadowing IT in size, volume, and complexity, and it grows daily. The Data Big Bang is here. Fortunately, technology is finally catching up, making it feasible, even for smaller enterprises, to transform their big data into useful, actionable business intelligence.
Organizations finally understand that they have always been sitting on valuable data that they should mine and transform into information to better understand their customers, vendors, and employees. There is only one problem: the data usually does not reside in a centralized database, nor is it in the formats and structures that allow analysis fast enough to turn it into meaningful business information.
Significant advances in processing power and storage have eliminated most hardware challenges. Architecture, integration, and data access are the real issues. Each organization needs to examine its existing infrastructure to determine the best way to connect its data structures, so that it can ultimately use that data to deliver the answers it needs to grow the business.
The Nature of Big Data
In addition to standard data (sales per quarter, inventory levels, average purchase per customer), big data consists of large amounts of dynamic, growing, and changing structured and unstructured data in which relationships are frequently inferred rather than declared. Product mentions and customer feedback on Twitter and Facebook are typical examples of such unstructured data.
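To make "inferred rather than declared" concrete, here is a minimal sketch, assuming a hypothetical product catalog (structured, with declared keys) and a few hypothetical social media posts (unstructured, with no keys at all). The link between a post and a product is inferred by matching product names in the text; production systems would use far more robust text analytics than a name match.

```python
import re

# Hypothetical structured data: a catalog with declared product keys.
PRODUCTS = {"P100": "AcmePhone", "P200": "AcmeTab"}

# Hypothetical unstructured data: posts that carry no product key.
POSTS = [
    "Loving my new AcmePhone, battery lasts all day!",
    "The AcmeTab screen cracked after a week.",
    "Anyone else waiting for the next release?",
]


def infer_mentions(posts: list[str], products: dict[str, str]) -> list[tuple[int, str]]:
    """Infer (post index, product id) links by finding product names in free text.

    No foreign key connects a post to a product; the relationship is inferred.
    """
    links = []
    for i, text in enumerate(posts):
        for pid, name in products.items():
            # Case-insensitive, whole-word match on the product name.
            if re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE):
                links.append((i, pid))
    return links


print(infer_mentions(POSTS, PRODUCTS))  # [(0, 'P100'), (1, 'P200')]
```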
According to Gartner, big data is best understood through the three Vs: volume, velocity, and variety. The McKinsey Global Institute (MGI) defines big data as "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." In practical terms, much of it is the unstructured data and metadata already lying around your business, which you can analyze, organize, and use to create business benefit or better value.
The trend toward handling larger data sets has grown because a single large set of related data yields more information than separate smaller sets with the same total amount of data: relationships that span the smaller sets only become visible once the data is combined. A big data set allows broader correlations and deeper analysis, enabling businesses to gain rapid insights, spot new business trends, uncover hidden relationships, build predictive algorithms, and much more.
One example of big data is real estate listings. At any time, more than two million real estate listings are active in the United States. In any given 15-minute interval, 85 percent of the listings are added, removed, or changed, which amounts to almost seven million updates per hour. Each listing contains the house price and basic specifications (which are generally static) as well as data elements such as average prices in the neighborhood, community data, and industry averages, which are always in flux.
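The "almost seven million updates per hour" figure follows directly from the listing count and the 85 percent churn per 15-minute interval:

```latex
\begin{aligned}
2{,}000{,}000 \text{ listings} \times 0.85 &= 1{,}700{,}000 \text{ updates per 15 minutes} \\
1{,}700{,}000 \times \tfrac{60}{15} &= 6{,}800{,}000 \text{ updates per hour}
\end{aligned}
```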
Integration: The "Big Integration Bang"
Integration is the most critical component of your ability to leverage big data. Isolated data silos do not work in the age of big data. It's hard to get a 360-degree view when your ERP, CRM, SCM, and other data reside in separate systems that do not talk to one another. You need to synchronize or replicate both structured and unstructured data between systems and into a single data warehouse for viewing, usage, or analysis. Systems that were not designed to communicate must now do so, and they must share data the right way. That requires middleware that is up to the task. By its very nature, however, middleware generates huge amounts of metadata, which makes it a natural candidate for big data approaches such as an in-memory data grid.
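A minimal sketch of what replicating records from separate systems into one warehouse view can look like, assuming hypothetical ERP and CRM extracts keyed by the same customer ID. Real integrations also have to reconcile mismatched keys and conflicting values; here each attribute is simply prefixed with its source system so that nothing is silently overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical unified customer view: customer_id -> attributes from all sources."""

    customers: dict[str, dict] = field(default_factory=dict)

    def upsert(self, source: str, customer_id: str, attrs: dict) -> None:
        """Replicate one source-system record into the unified view."""
        row = self.customers.setdefault(customer_id, {})
        for key, value in attrs.items():
            # Prefix with the source system so ERP and CRM fields never collide.
            row[f"{source}.{key}"] = value


# Hypothetical extracts from two systems that "do not talk to one another".
erp_orders = {"C1": {"open_orders": 3, "credit_limit": 5000}}
crm_contacts = {"C1": {"segment": "enterprise"}, "C2": {"segment": "smb"}}

wh = Warehouse()
for cid, attrs in erp_orders.items():
    wh.upsert("erp", cid, attrs)
for cid, attrs in crm_contacts.items():
    wh.upsert("crm", cid, attrs)

print(wh.customers["C1"])
# {'erp.open_orders': 3, 'erp.credit_limit': 5000, 'crm.segment': 'enterprise'}
```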
Creating a true big data "stream" requires a flow of information from mission-critical systems such as CRM, ERP, SCM, and legacy systems; in short, from all critical systems. It isn't trivial to synchronize or replicate data in a service-oriented, real-time, event-driven architecture and pass it to a data warehouse. Best practice calls for flexible integration tools that build and access connections at the business-rules level, using vendor-approved adapters or APIs. Direct database-to-database integration can void the support obligations of your ERP and other critical system vendors, because those vendors do not allow it.
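The following is a minimal sketch of the event-driven pattern, assuming a hypothetical `Adapter` interface that wraps a vendor-approved API and an in-process event bus standing in for real middleware. The topic name `erp.order.changed` and the `FakeErpAdapter` class are illustrative inventions, not any vendor's actual API. The point is the shape: a change event triggers a fetch through the adapter, never a direct read of the vendor's database.

```python
from collections import defaultdict
from typing import Callable, Protocol


class Adapter(Protocol):
    """Hypothetical adapter interface: wraps a vendor-approved API, not the raw database."""

    def fetch(self, record_id: str) -> dict: ...


class EventBus:
    """Minimal in-process publish/subscribe bus standing in for event-driven middleware."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(event)


class FakeErpAdapter:
    """Hypothetical stand-in; a real adapter would call the vendor's documented API."""

    def __init__(self) -> None:
        self._records = {"SO-1": {"id": "SO-1", "total": 120.0}}

    def fetch(self, record_id: str) -> dict:
        return dict(self._records[record_id])


warehouse: list[dict] = []
erp: Adapter = FakeErpAdapter()
bus = EventBus()

# When the ERP announces a change, pull the record through the adapter
# and pass it on toward the data warehouse.
bus.subscribe("erp.order.changed", lambda event: warehouse.append(erp.fetch(event["id"])))
bus.publish("erp.order.changed", {"id": "SO-1"})
print(warehouse)  # [{'id': 'SO-1', 'total': 120.0}]
```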
Your integration tool must operate at the data-to-data level, take business rules into account, and be able to trigger processes in real time. Because cloud and social computing add communication and transport latency to many processes, the middleware architecture must be optimized to minimize integration latency behind the scenes.
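A minimal sketch of business rules triggering processes as each event arrives. The `Rule` type, the two rules, and the event fields (`sku`, `stock`, `qty`) are hypothetical; an actual integration platform would express rules in its own configuration and launch real workflows rather than return strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical business rule: a condition plus the process it triggers."""

    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


def on_event(event: dict, rules: list[Rule]) -> list[str]:
    """Evaluate every rule as soon as an event arrives and run the actions that fire."""
    return [rule.action(event) for rule in rules if rule.condition(event)]


rules = [
    Rule("low_stock",
         lambda e: "stock" in e and e["stock"] < 10,
         lambda e: f"reorder {e['sku']}"),
    Rule("large_order",
         lambda e: e.get("qty", 0) >= 100,
         lambda e: f"notify sales about {e['sku']}"),
]

print(on_event({"sku": "A1", "stock": 4, "qty": 150}, rules))
# ['reorder A1', 'notify sales about A1']
print(on_event({"sku": "B2", "qty": 5}, rules))  # []
```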
Integration and the Data Grid
Traditional extract, transform, and load (ETL) tools are inadequate for enterprise integration, especially in the era of big data. In fact, ETL tools can make the situation worse: although you will be able to collect the data, the result may be even larger and more complex to handle.
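One reason batch ETL strains under big data volumes is that a full extract-and-reload moves every row on every run, even when only a handful have changed. The sketch below contrasts that with an incremental sync that moves only changed rows. It assumes the changed keys are already known (in practice they would come from change events or change data capture), and the `transform` step and listing data are hypothetical.

```python
def transform(row: dict) -> dict:
    """Hypothetical transformation step: normalize price to two decimals."""
    return {**row, "price": round(row["price"], 2)}


def full_etl(source: dict[str, dict]) -> tuple[dict[str, dict], int]:
    """Batch ETL: extract every row, transform it, and reload the whole target."""
    target = {key: transform(row) for key, row in source.items()}
    return target, len(source)  # every row moves on every run


def delta_sync(source: dict[str, dict], target: dict[str, dict], changed: list[str]) -> int:
    """Incremental sync: move only the rows reported as changed."""
    for key in changed:
        if key in source:
            target[key] = transform(source[key])
        else:
            target.pop(key, None)  # the row was deleted at the source
    return len(changed)


source = {f"L{i}": {"price": 100_000.0 + i} for i in range(1000)}
target, _ = full_etl(source)

# Between runs, five hypothetical listings change price.
changed = [f"L{i}" for i in range(5)]
for key in changed:
    source[key]["price"] *= 1.01

reloaded, moved_full = full_etl(source)
moved_delta = delta_sync(source, target, changed)
print(moved_full, moved_delta)  # 1000 5
```

Both approaches end in the same target state; the difference is how much data crosses the wire to get there.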
Big data access means managing large volumes of data, handling unstructured data, and delivering all of the information in real time. Latency is a challenge: businesses must provide fast data retrieval for applications that require fast response times. When an insurance company's customer service representative is on the phone with a customer who is waiting for a real-time quote, speed is of the essence.
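A minimal single-process sketch of why serving lookups from memory matters for the insurance-quote scenario. The rate table, ZIP-code keys, and the 50 ms delay are hypothetical, and `functools.lru_cache` is only a local stand-in for a distributed in-memory store: the first lookup pays the simulated disk cost, while repeats are answered from memory.

```python
import time
from functools import lru_cache

CALLS = {"db": 0}  # counts how often the slow backing store is hit


def slow_rate_lookup(zip_code: str) -> float:
    """Hypothetical disk-backed rate query with simulated latency."""
    CALLS["db"] += 1
    time.sleep(0.05)  # simulated disk/network latency
    return {"10001": 1.25, "94105": 1.40}.get(zip_code, 1.0)


@lru_cache(maxsize=10_000)
def cached_rate(zip_code: str) -> float:
    # After the first call for a ZIP code, the rate is served from memory.
    return slow_rate_lookup(zip_code)


def quote(base_premium: float, zip_code: str) -> float:
    return round(base_premium * cached_rate(zip_code), 2)


print(quote(800.0, "10001"))  # 1000.0 (first call hits the backing store)
print(quote(800.0, "10001"))  # 1000.0 (served from memory)
print(CALLS["db"])            # 1
```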
An integration system needs to deliver performance, scalability, and redundancy, as well as the right set of tools to manage data and take control right after the big bang has occurred. Integration built on in-memory data grid computing is a strong option for doing just that: it harnesses big data and manages enterprise business processes using big data techniques.
An in-memory data grid is middleware composed of multiple server processes running on multiple machine instances (physical or virtual) that work together to store large amounts of data in memory, thereby achieving high performance, elastic scalability, and fail-safe redundancy. Keeping the data in memory allows fast random access with very low latency, in contrast to disk-based methods that rely on slower, often sequential, access.
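The core ideas of a data grid (spreading data across nodes and keeping a backup copy so a failed node does not lose data) can be sketched in a few lines. This is a toy, single-process illustration under stated assumptions: the "nodes" are plain dictionaries, keys are assigned by simple hash-modulo partitioning, and each entry has one backup on the next node. Real grid products use their own partitioning schemes (often fixed partitions or consistent hashing), networking, and rebalancing, none of which is modeled here.

```python
import hashlib


class ToyDataGrid:
    """Toy in-memory data grid: hash-partitioned keys, one backup copy per entry."""

    def __init__(self, node_count: int) -> None:
        self.nodes: list[dict] = [dict() for _ in range(node_count)]
        self.alive = [True] * node_count

    def owner(self, key: str) -> int:
        # Stable hash so the same key always maps to the same primary node.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def _replicas(self, key: str) -> tuple[int, int]:
        primary = self.owner(key)
        return primary, (primary + 1) % len(self.nodes)

    def put(self, key: str, value: object) -> None:
        # Write to the primary and to the backup on the next node.
        for node in self._replicas(key):
            self.nodes[node][key] = value

    def get(self, key: str) -> object:
        # Read from the primary; fall back to the backup if the primary is down.
        for node in self._replicas(key):
            if self.alive[node] and key in self.nodes[node]:
                return self.nodes[node][key]
        raise KeyError(key)

    def fail(self, node: int) -> None:
        self.alive[node] = False


grid = ToyDataGrid(node_count=4)
for i in range(100):
    grid.put(f"listing:{i}", {"price": 200_000 + i})

grid.fail(grid.owner("listing:7"))  # take down the node that owns listing:7
print(grid.get("listing:7"))        # {'price': 200007}, served by the backup
```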
The integration platform ensures that data relevant to business requirements is pulled from the master databases of all environments and made available in the data warehouse for OLAP and other business intelligence needs. The in-memory database can deliver this information faster than existing environments, allowing real-time analysis of even terabytes of information.
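As a small illustration of the kind of OLAP-style question the warehouse answers, here is a roll-up sketch over hypothetical sales rows held in memory. Real OLAP engines add cubes, hierarchies, and indexing; this only shows the basic group-and-aggregate step at two levels of detail.

```python
from collections import defaultdict

# Hypothetical fact rows already pulled into the warehouse.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 120.0},
    {"region": "East", "quarter": "Q2", "amount": 80.0},
    {"region": "West", "quarter": "Q1", "amount": 200.0},
    {"region": "West", "quarter": "Q1", "amount": 50.0},
]


def rollup(rows: list[dict], dims: list[str], measure: str) -> dict[tuple, float]:
    """Sum `measure` grouped by the given dimension columns."""
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


print(rollup(sales, ["region"], "amount"))
# {('East',): 200.0, ('West',): 250.0}
print(rollup(sales, ["region", "quarter"], "amount"))
# {('East', 'Q1'): 120.0, ('East', 'Q2'): 80.0, ('West', 'Q1'): 250.0}
```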
Integration and IT Leadership
Integration significantly improves workflows and overall efficiency within the organization. However, not all integration platforms are created equal. For the best integration performance on big data challenges, you need middleware that leverages in-memory data grid computing. Don't overlook, however, the need for business process management and optimization of the enterprise systems you are integrating.
Integration that leverages big data technology accelerates data retrieval and, as a positive side effect, provides a platform for improving business processes and data warehousing. It's a win-win for IT and business.
Glenn Johnson is a Senior Vice President at Magic Software Enterprises, Inc. Active in the software industry since 1984, he frequently speaks at industry conferences and writes for numerous publications.
Originally published on TDWI (The Data Warehousing Institute): http://tdwi.org/Articles/2014/02/11/Integrating-Systems-Big-Data.aspx?Page=1