01/06/2014 11:11 am ET Updated Mar 08, 2014

Handling Big Data: An Interview with Author William McKnight

The buzz on Big Data is nothing short of deafening, and I often have to shut down. As a guy who knows a thing or two about information management and Big Data, it's often a bit frustrating going to conferences and listening to people pontificate, dropping jargon and buzzwords ostensibly for the sake of appearing smart.

By contrast, William McKnight is a breath of fresh air. A frequent and excellent speaker, he explains technical concepts in very conversational terms, sprinkled with real-world examples. I recently sat down with him to discuss his new book Information Management: Strategies for Gaining a Competitive Advantage with Data (The Savvy Manager's Guides).


PS: In the book, you explain how important information is to business success today. Why is that and are companies making the most of this asset?

WM: No matter what business you're in, if you think about what sets it apart from the competition and where you need to take it, most of these strategies involve having access to reliable information when you need it. What used to be optional and prioritized well after the operational aspects of storefront, supply chain and transacting business is what now sets companies apart. If you look at the leaders in industry, they are masters of the information asset. They save more information, make it more accessible to a broad internal community and have developed business leaders that are able to consume the data and advance the business with it.

PS: How can the technology people in organizations trying to compete best go about building an information architecture that aids company decision making? What if different people have different perceptions about performance?

WM: The key is to get the right workload - a combination of the data and its processing - into the right platform based on its unique characteristics. Today, there are many options, including many that organizations have not considered, at their peril. These include master data management, Hadoop, NoSQL systems, data stream processing and graph databases. And for all of it, there is the cloud computing option. I describe the options in the book. Supporting infrastructure and practices such as master data management, data virtualization and agile methods are helpful in enabling new technologies, and I describe these as well. Perception matters in determining the suitable performance of a platform, but know that expectations go nowhere but up over time. Again, that platform choice is crucial. Most organizations need more possibilities at their disposal.

It is difficult to replace something that is working. As long as SAP is running - albeit slower than some would like - or the fraud detection system is working - though it is too slow to stop much of the fraud - performance can always be promised as being "around the corner." One more round of parameter tuning, one more patch, one more code tweak and it will perform. However, most of the time, the improvements are minor at best. At the same time, the data is growing, the plans for the usage of the system are growing and the reliance on it is greater than ever. I encourage my readers to architect their information systems for scalability.

PS: It has been touted for decades that a sound data strategy is to dump all company data into a data warehouse, yet you take issue with that approach.

WM: A "catch all" data warehouse is still a must. It will serve the bulk of the company reporting needs and it's still a great place to calculate analytics and broker data out to other databases for more specific needs. Many data warehouses were built years ago and as this "one size fits all" approach begins to not work for certain uses, either the data warehouse starts to only pass through data or these other uses don't use the data warehouse at all. There are so many innovations in databases, like columnar databases and in-memory high-performance databases, that would improve data warehouse performance quite a bit, but when you have so many people using it, changing the platform and potentially slowing some things down becomes untenable. In the book, I encourage the right platform for the job, along with means like data virtualization and master data management to hold it all together well.

PS: When explaining how to approach these projects, you mention that an organization must be agile. How does an organization become agile? Does that set the stage for information success?

WM: It's easy to say you're agile, but there is also a body of work out there that details specifics that organizations can adopt in becoming agile. A lot of that is overkill for information management projects, but much of it is not. I distill the essence of agility in the book and provide the hard-learned specifics from my consulting practice of what to adopt to get these projects to their checkpoints as quickly as possible, while staying aligned with the business. In terms of the organization, it helps those delivering information management projects to be surrounded by similarly agile internal organizations. That does set the stage for information success. So does organizational change management, which is led by the information management team and ensures the stakeholders come on board with the projects appropriately. Taking care of people is the "other half" - the non-technology half - of these projects that is required for success.

PS: You cover Big Data in the book in a very pragmatic, hype-free way. How do you define Big Data and is it as important as we are hearing?

WM: Big Data is a rather abused term these days, but I define it as data that doesn't make sense to have in systems built 10 plus years ago and based on principles built 30 years ago, specifically relational databases. Big Data is constantly being emitted - by sensor devices and by the Internet. As such, it accumulates much more rapidly than the data most organizations are used to. I believe strongly in an organizational fit for relational databases, just not for everything anymore. Many data stores will thrive because all data is so important to business. Hadoop, NoSQL databases and graph databases all have a different pattern to them than relational databases. Big Data is not going to be as important per capita, but it still delivers return on investment and may just be what defines the leaders of industry in the near future.