Big data presents the challenge of managing massive amounts of information, but the increase in data is only half the story. Metadata, which in the past was limited to names and labels, today encompasses information about the relationships between data, reports, and processes; because of its growing volume, it now requires its own strategy for efficient management.
Everything Must Stay: Managing the Data
Some of the largest volumes of metadata are created by the middleware products used for systems integration. Middleware is a major generator of big metadata because when business processes span applications, it is necessary to maintain an Operational Data Store (ODS) for all the data that flows through the system. Metadata gets logged in the ODS before data is handled, after it is handled, or both. With the current trend toward “retain everything” data approaches, the ODS keeps growing and becomes difficult to manage. In addition, the messaging that flows between an application’s API and the middleware server carries metadata of its own, consuming processing power, memory, and bandwidth.
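As a minimal sketch of this pattern, the hypothetical function below logs a metadata entry before and after a message is handled; the field names (`stage`, `ts`, `size`) and the in-memory list standing in for the ODS are illustrative assumptions, not a real middleware API.

```python
import json
import time
import uuid

def handle_with_ods_logging(message, handler, ods_log):
    """Log metadata to a stand-in ODS before and after handling a message."""
    entry_id = str(uuid.uuid4())
    # Pre-handling metadata: an identifier, a stage, a timestamp, and the
    # payload size -- note that the metadata, not the payload, is logged.
    ods_log.append({"id": entry_id, "stage": "received",
                    "ts": time.time(), "size": len(json.dumps(message))})
    result = handler(message)
    # Post-handling metadata for the same logical entry.
    ods_log.append({"id": entry_id, "stage": "processed", "ts": time.time()})
    return result

ods = []
result = handle_with_ods_logging({"order": 42}, lambda m: m["order"] * 2, ods)
```

Even this toy version shows why ODS volume grows quickly: every message that passes through generates two metadata records, regardless of how small the payload is.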
In many situations, reducing the time it takes to produce value from all this data is critical to success. For example, AppNexus, an online ad exchange, processes between 200,000 and 275,000 online advertising bids per second with an expected response time of less than 100 milliseconds. Because of these data volumes, the company reports that updating its cache can sometimes take three minutes. According to AppNexus CEO Brian O’Kelley, network latency can cost $10,000 per minute.
Reduce Latency, Increase Efficiency
A common technological approach to reducing latency is in-memory computing. Processing data as it arrives, without first writing it to a database, improves performance and reduces data volume. Avoiding writes of intermediate values to disk eliminates another source of latency.
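The idea can be sketched with a small in-memory aggregator, assumed here for illustration: each event is folded into running totals as it arrives, so no intermediate values are ever written to disk and only the summary need be persisted later.

```python
from collections import defaultdict

class InMemoryAggregator:
    """Fold events into running totals in memory as they arrive."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def ingest(self, event):
        # event is a (key, value) pair, e.g. ("bids", 2.0)
        key, value = event
        self.totals[key] += value
        self.counts[key] += 1

    def average(self, key):
        # Derived values are computed on demand from in-memory state.
        return self.totals[key] / self.counts[key] if self.counts[key] else 0.0

agg = InMemoryAggregator()
for event in [("bids", 2.0), ("bids", 4.0)]:
    agg.ingest(event)
```

Production in-memory platforms add partitioning, replication, and eviction on top of this basic pattern, but the latency win comes from the same place: state lives in RAM, not behind a disk write.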
Financial institutions have used in-memory computing for credit card fraud detection and algorithmic trading for many years, and Google uses in-memory approaches to search massive amounts of data. Metadata handling can benefit from the same real-time platform capabilities.
In addition, improved data handling can increase metadata processing efficiency. When data reaches a middleware layer for integration, it can have one of several destinations: it may be transferred to another application or a messaging queue, converted to another format, saved in one or more tables or databases, or used to determine other actions. Each of these processes can be examined and tuned for better efficiency.
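A minimal dispatcher illustrates these branching destinations; the routing rules, record shapes, and handler names below are assumptions for the sketch, not any particular middleware product's API.

```python
def route(record, rules, handlers):
    """Send a record to the handler named by its routing rule."""
    destination = rules.get(record.get("type"), "default")
    return handlers[destination](record)

handlers = {
    # Forward to a messaging queue (simulated by tagging the record).
    "queue":   lambda r: ("queued", r),
    # Convert to another format (here, a trivial key transformation).
    "convert": lambda r: ("converted", {k.upper(): v for k, v in r.items()}),
    # Fall through to database storage.
    "default": lambda r: ("stored", r),
}
rules = {"order": "queue", "invoice": "convert"}

result = route({"type": "order", "id": 1}, rules, handlers)
```

Because every record passes through one routing step, this is also the natural place to measure and tune per-destination cost.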
For example, greater efficiency can be achieved by using direct statements to update only the information that has changed rather than rewriting an entire table. In event-driven or real-time scenarios, a simple flow can be invoked to update just one field or record. To cut back metadata volumes, more efficient commands can be used, such as Replicate_Last_Updates to propagate only changes or Multicast to update multiple clients simultaneously.
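Commands such as Replicate_Last_Updates are platform-specific, but the underlying delta-update idea can be sketched generically: compare old and new values, then issue a targeted UPDATE that touches only the changed columns. The function and table below are illustrative assumptions, using SQLite for a self-contained demonstration.

```python
import sqlite3

def update_changed(conn, table, key, old_row, new_row):
    """Issue an UPDATE covering only the columns whose values changed."""
    changed = {k: v for k, v in new_row.items() if old_row.get(k) != v}
    if not changed:
        return 0  # nothing to do: no statement, no replication traffic
    assignments = ", ".join(f"{col} = ?" for col in changed)
    sql = f"UPDATE {table} SET {assignments} WHERE id = ?"
    return conn.execute(sql, (*changed.values(), key)).rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")

old = {"name": "Ada", "city": "London"}
new = {"name": "Ada", "city": "Paris"}
update_changed(conn, "customers", 1, old, new)  # only city is written
```

The statement generated here names a single column, so both the SQL sent over the wire and any change-log metadata it produces stay proportional to what actually changed.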
Visualizing Data Flows
To analyze the most efficient processes for handling metadata, it helps to use design tools that visualize data flows. One of the most powerful and efficient ways to visualize data- and metadata-handling instructions is a data mapping service.
A data mapping service typically shows source data on the left and destination data on the right, and provides tools to connect source fields to destination fields visually while applying business functions or expression logic for transformation. The steps before and after a data mapping service may contain data handling logic as well, which can be visualized through an integration flowchart.
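What a visual mapper draws as lines between fields can be sketched in code as a mapping table: each destination field names its source field plus a transformation expression. The field names and transformations below are invented for illustration.

```python
# Destination field -> (source field, transformation expression).
mapping = {
    "full_name": ("name", str.title),
    "country":   ("country_code", lambda c: {"US": "United States"}.get(c, c)),
    "revenue":   ("revenue_cents", lambda cents: cents / 100),
}

def apply_mapping(source, mapping):
    """Build the destination record by applying each mapped transformation."""
    return {dest: transform(source[src])
            for dest, (src, transform) in mapping.items()}

record = {"name": "ada lovelace", "country_code": "US", "revenue_cents": 1999}
mapped = apply_mapping(record, mapping)
```

Keeping the mapping as data rather than hard-coded logic is what lets a visual tool render, audit, and edit it, which is precisely the efficiency-analysis benefit the article describes.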
Business Process Execution Language (BPEL) is one especially helpful visualization approach. Because the logic of an integration flowchart or BPEL service can be nested to call other integration flows or BPEL services, these capabilities are extremely versatile and powerful. Administering them properly leads to improved data handling and metadata efficiency.
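The nesting idea can be sketched abstractly, under the simplifying assumption that a flow is just a sequence of steps and a step may itself be another flow, loosely mirroring how a BPEL service invokes sub-services.

```python
def run_flow(steps, payload):
    """Run a flow; any step that is itself a list is run as a nested sub-flow."""
    for step in steps:
        payload = run_flow(step, payload) if isinstance(step, list) else step(payload)
    return payload

clean = [str.strip, str.lower]                 # reusable sub-flow
main = [clean, lambda s: s.replace(" ", "_")]  # main flow calls the sub-flow

result = run_flow(main, "  Order Created ")
```

Real BPEL adds compensation, fault handling, and partner links, but the versatility the article notes comes from the same composition principle: flows that call flows.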
Getting ready for big metadata means making certain you have a strategy for adopting in-memory computing techniques so that processing power is available when you need it. Evaluating your current ETL, data management, operational data store, and integration practices for outdated and inefficient data and metadata handling will also help get your big metadata under control.
Glenn Johnson is Senior Vice President of Magic Software Enterprises Americas. He is the author of the award-winning blog Integrate My JDE and contributor to the Business Week Guide to Multimedia Presentations (Osborne-McGraw Hill). He has presented at Collaborate, Interop, COMMON, CIO Logistics Forum and dozens of other user groups and conferences. His interviews on software industry issues have been aired on the NBC Today Show, E! News, Discovery and more.