I recently attended the NoSQL Now! Conference 2011 in San Jose, which drew 512 attendees. The conference focused primarily on the rise of NoSQL databases as the future of database management systems.
It is an ironic coincidence that the conference was held in the same city, 42 years and four days after Dr. Edgar Frank (E. F.) "Ted" Codd submitted RJ599 internally at IBM. That research report led to his June 1970 article in Communications of the ACM, which provided a scientific approach to data modeling and storage and is generally credited with fueling the relational database evolution.
The NoSQL Now! Conference 2011 delivered several examples of emerging non-relational databases, including XML databases, graph databases, RDF databases (triple stores), object databases, document databases, and column-oriented databases. One keynote presentation named at least 20 different types of databases that are available in the market and in use today.
This suggests that the NoSQL market is making the selection of database systems more complex, or that niche databases are required for niche processing. Currently, RDBMSs are the core for most online transaction processing (OLTP) systems.
The "relational wars" were already fought in the 70s, 80s and 90s against hierarchical, network and object databases, and the RDBMS emerged as the victor. One RDBMS vendor advertises that it counts 98 percent of the Fortune 500 among its customers. That does not necessarily mean its software is actually in use at each of them, but it does imply that RDBMSs are used throughout major corporations. The same vendor also advertises that its RDBMS processes many of the data types that purportedly warrant the 20+ database types, such as XML, objects, text, and RDF. Nonetheless, NoSQL databases are finding success in organizations' Web transactions, especially those involving large volumes of data.
Object databases (ODBMS) have been around since the 80s and have been deployed successfully in thousands of organizations (an estimate based on customer lists available from ODBMS vendors -- Versant, Progress, Objectivity, etc.), but they have not greatly impacted traditional OLTP environments. Document databases (CouchDB and MongoDB appear to be the most widely used) also have thousands of implementations, but are not pervasively entrenched in the OLTP market in most major organizations... yet.
In addition to storing documents, document databases are primarily used for storing and retrieving structures like JSON, which makes them suitable for storing Web transaction data for sites that use JSON. XML databases are used primarily for XML document storage and retrieval and for XML traffic caching, and have thousands of implementations. Graph databases are an emerging database type -- two new graph databases were announced to coincide with the conference -- and focus primarily on social network analysis. RDF databases are a type of graph database that supports the W3C RDF standard internally. Graph and RDF databases are not immediate threats to OLTP reliance on the RDBMS.
Column-oriented databases have been around since at least the 80s as well, and claim faster queries for data analysis applications because table entries are stored by column (so the values being searched sit together) rather than by row. My first exposure -- SQL Expressway, which was acquired by Sybase and became the core of SAP's Sybase IQ product -- delivered data analysis queries 500 times faster than an RDBMS.
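To illustrate why storing entries by column can help analytic queries, here is a minimal Python sketch. It is not how any particular product works; the table, the column names, and the helper functions are all hypothetical, and real column stores add compression and indexing that this ignores. The point is only that an aggregate over one column touches every full record in a row layout, but a single list in a column layout.

```python
# Hypothetical sample data; the table and column names are illustrative only.
rows = [
    {"order_id": 1, "region": "west", "amount": 120.0},
    {"order_id": 2, "region": "east", "amount": 75.5},
    {"order_id": 3, "region": "west", "amount": 30.0},
]


def to_columns(rows):
    """Pivot a row-oriented table into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(rows, column):
    """Row layout: every full record is visited to read one field."""
    return sum(row[column] for row in rows)


def total_column_store(columns, column):
    """Column layout: only the one needed column is scanned."""
    return sum(columns[column])


columns = to_columns(rows)
```

Both functions return the same total for the "amount" column; the difference is how much unrelated data each has to walk past to get there, which is where the claimed speedup for analysis queries comes from.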
Google's Bigtable paper fueled a resurgence in column and column-hybrid databases, such as HBase and Cassandra. These databases are primarily used for processing unstructured, semi-structured, and even structured data to provide fast search and manipulation of data. Again, these databases do not show immediate promise of replacing the RDBMS as the traditional OLTP database.
"NewSQL" databases, formerly known as HybridSQL or ScalableSQL databases, keep the relational model, the SQL interface, and ACID compliance, but add scalability. Different "NewSQL" databases (VoltDB, NuoDB, ScaleDB, etc.) provide this scalability in different ways. One vendor spokesman and RDBMS guru suggests that an average RDBMS spends 96 percent of its processing on a TPC-C transaction in overhead (buffer pool management, locking, latching, and recovery) and only 4 percent performing actual work.
By serializing tasks and keeping the database in memory, these NewSQL databases can achieve much faster performance than a traditional RDBMS. However, this vendor does not recommend long-running transactions, since they would slow all transactions... the downside of transaction serialization.
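The downside of serialization described above can be shown with a small simulation. This is a sketch under simplifying assumptions, not a model of any vendor's engine: all transactions are assumed to arrive at time 0, durations are in arbitrary simulated units, and the function name and transaction names are hypothetical.

```python
def run_serially(transactions):
    """Run transactions one at a time and return each one's finish time.

    `transactions` is a list of (name, duration) pairs, all assumed to
    arrive at time 0. Durations are simulated, in arbitrary units.
    """
    clock = 0
    finished = {}
    for name, duration in transactions:
        clock += duration  # nothing else runs while this transaction does
        finished[name] = clock
    return finished


# Three short transactions: the last one finishes at time 3.
short_only = run_serially([("t1", 1), ("t2", 1), ("t3", 1)])

# One long-running transaction at the front delays everything behind it.
with_long = run_serially([("report", 100), ("t2", 1), ("t3", 1)])
```

In the second run the two short transactions do the same amount of work as before, yet they finish at times 101 and 102 because they must wait behind the long one. Serial execution removes the need for locking and latching, but it leaves no way for short work to overtake long work.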
Over the years, traditional RDBMSs have evolved with the demands placed on them, such as the capability to support non-relational structures (text, objects, XML, documents, triples, etc.), long-running transactions, ACID transactions, "warm" recovery, etc. However, these databases have, in general, not proven to scale out efficiently, and in many cases represent large expenditures for license management.
OLTP transactions have changed over the 40 years since the advent of the relational model, and the value the relational model provided by reducing disk storage has been diminished by the declining cost of disk storage. NoSQL databases provide niche capabilities for specific structures, and most scale out very efficiently. However, many do not provide ACID compliance, which means that any required transactional guarantees must be coded into the applications.
NewSQL databases provide enhanced scalability for the RDBMS while maintaining ACID compliance (in many cases). However, some do so by limiting the overall capabilities of the database, such as support for long-running transactions.
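As a rough illustration of what coding this into the application can mean, the Python sketch below hand-rolls atomicity for a two-key update on a plain dictionary standing in for a key-value store. Everything here is an assumption made for illustration: the `transfer` function, the `InsufficientFunds` exception, and the account keys are hypothetical, and the sketch covers only atomicity within a single process. Isolation between concurrent clients and durability across crashes would need further application code.

```python
class InsufficientFunds(Exception):
    """Raised when the source key holds less than the requested amount."""


def transfer(store, source, target, amount):
    """Move `amount` between two keys, undoing partial work on failure."""
    # Snapshot the values we are about to change so we can undo them.
    original = {source: store[source], target: store[target]}
    try:
        store[target] += amount  # first write succeeds on its own
        if store[source] < amount:
            raise InsufficientFunds(source)
        store[source] -= amount  # second write completes the transfer
    except Exception:
        store.update(original)  # application-level rollback
        raise
```

With an ACID-compliant database the rollback in the `except` branch would be the database's job; without one, every multi-step update in the application needs some equivalent of it.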
The NoSQL database market, including NewSQL databases, offers promise for replacing the traditional RDBMS for OLTP and data analysis; however, don't expect the traditional RDBMS to utter "The rest is silence" anytime soon.