Efficacy is defined succinctly as the ability to produce a desired or intended effect. Organizations manage data primarily to enable informed decisions, so "data efficacy" is, in essence, an organization's ability to manage data in a way that drives informed decisions. Conceptually, data efficacy assesses stored enterprise data in terms of the value it brings to the enterprise and the efficiency with which it is stored, accessed, and retrieved. With the rapid growth of Big Data, measuring data efficacy would seem to be an important task for organizations; a 2010 Gartner survey reported that 47 percent of IT staffers ranked data growth as one of their IT organization's top three challenges. That said, most organizations don't measure data efficacy because they aren't interested, don't understand the concepts, or view it as too difficult. This blog will describe three key factors that have most contributed to data inefficacy, or the lack of data efficacy. I will provide some key metrics for measuring data efficacy in a future blog.
1. Migration to Client-Server Models
Although a lack of data efficacy has existed for years, the push in the 1990s to downsize data centers by migrating from mainframe systems to client-server models exacerbated the issue. Mainframe ecosystems typically offered a single authoritative data source for each business entity, such as orders, customers, etc. Client-server computing, which decentralized processing, often replicated the same data across different computing ecosystems, possibly segregated by business function, region, or other business unit. Even with extensible markup language (XML) data integration, service-oriented architecture (SOA), and similar technology innovations in the new millennium, data became more duplicative and segregated. Efforts to integrate and/or consolidate data, such as operational data stores and data warehouses, often added to the data inefficacy issue by simply introducing yet another copy of the data, albeit with a different model.
2. The Junk Drawer
Cloud technology has only worsened the issue of data inefficacy, because organizations are now consolidating disparate data centers into virtual clouds, duplicative data and all. Data clouds typically consolidate tens, hundreds, or even thousands of data sources without applying the analytics and modeling needed to identify duplicate data, the value of data, derived data, master data, and other important factors and metrics. This has led to many data clouds becoming "junk drawers." The junk drawer in my kitchen contains super glue, batteries, a broken remote [waiting for me to fix], markers, two screwdrivers, a flash drive, and a deck of cards embossed with a picture of a very young Elvis. With the exception of the Elvis cards - they go with anything - these items have nothing in common. We can still find batteries in the junk drawer -- just not efficiently. I found two "AA" batteries under the super glue, which was under the markers. And if we added a second junk drawer to better organize the first, we would simply find something new to put in there instead. This is the concept behind data efficacy, or the lack of it in this case. Data centers in general, and data clouds specifically, have become like the junk drawer in my kitchen: the data is there, but the lack of organization and a disciplined approach hides its value.
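To make the "analytics before consolidation" point concrete, here is a minimal, hypothetical sketch of the kind of duplicate-data check a consolidation effort could run before copying sources into a cloud. The source names, field names, and records are all illustrative assumptions, not a real system; the technique is simply fingerprinting normalized records and noting which sources hold the same copy.

```python
import hashlib

# Hypothetical sample: three consolidated sources holding customer records.
# Source names, fields, and data are illustrative only.
sources = {
    "crm":       [{"name": "Ada Lovelace", "email": "ada@example.com"},
                  {"name": "Alan Turing",  "email": "alan@example.com"}],
    "billing":   [{"name": "Ada Lovelace", "email": "ada@example.com"}],
    "warehouse": [{"name": "Alan Turing",  "email": "alan@example.com"}],
}

def fingerprint(record):
    """Hash a normalized view of the record so exact duplicates collide."""
    canonical = "|".join(f"{k}={record[k].strip().lower()}" for k in sorted(record))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Map each fingerprint to the sources that hold a copy of that record.
copies = {}
for source, records in sources.items():
    for record in records:
        copies.setdefault(fingerprint(record), []).append(source)

# Any fingerprint seen in more than one source is duplicated data.
duplicates = {fp: srcs for fp, srcs in copies.items() if len(srcs) > 1}
for fp, srcs in duplicates.items():
    print(f"record {fp[:8]} duplicated across: {srcs}")
```

Real consolidation analytics would also need fuzzy matching, since duplicates rarely agree byte-for-byte, but even an exact-match pass like this one separates the batteries from the super glue before everything lands in the same drawer.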
3. Growth of Unstructured Data
Finally, the rapid growth of unstructured data is responsible for increased data inefficacy. A 2011 survey by MarkLogic suggests that a large percentage of senior IT respondents believe unstructured data will overtake structured data as the largest cause of Big Data concerns by 2014 ... that's next year. Master data management (MDM), a discipline that has had success increasing data efficacy for structured data sources, has no equivalent in the world of unstructured and semi-structured sources. The only way to find "master data" in unstructured sources is to extract the key facts from them, possibly using natural language processing (NLP) technologies. Keyword searches over unstructured text surface only a portion of the true value that lies within. Measuring data efficacy for unstructured sources therefore requires both fact extraction and advanced analytics.
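The gap between keyword search and fact extraction can be sketched in a few lines. The snippet below is not a real NLP pipeline; it uses crude regular expressions as a stand-in for fact extraction, over a made-up document, purely to show that extraction yields structured candidate "master data" (amounts, dates, names) where a keyword search yields only a hit/no-hit answer.

```python
import re

# Illustrative unstructured text; the company names and figures are made up.
document = (
    "Acme Corp signed a 3-year support contract worth $250,000 "
    "with Globex on 2013-06-01. The renewal contact is Jane Smith."
)

# A keyword search only tells us the term appears somewhere.
print("contract" in document.lower())  # True, but with no structure attached

# A regex-based stand-in for NLP fact extraction pulls out candidate
# "master data" facts: monetary amounts, ISO dates, and two-word proper names.
facts = {
    "amounts": re.findall(r"\$[\d,]+", document),
    "dates":   re.findall(r"\d{4}-\d{2}-\d{2}", document),
    "names":   re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", document),
}
print(facts)
```

Real fact extraction would use trained NLP models rather than patterns, but the shape of the output is the point: structured facts can be matched, deduplicated, and measured the way MDM treats structured records, while raw keyword hits cannot.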
I will dedicate my next post to technologies that increase data efficacy and metrics used to measure data efficacy. In the meantime, what are your thoughts on data efficacy?