For many years the relational database management system (RDBMS) has been the developer’s go-to when a database is required as part of the solution. The RDBMS has been so ubiquitous that, even with the rise of the so-called ‘NoSQL’ family of databases, in many cases it still becomes the de facto, if less than ideal, database of choice for projects.
In this article we’ll take a high-level look at the motivations behind ‘NoSQL’ databases, and at three of the most common categories of NoSQL database in use today. This will allow us to understand the benefits and pitfalls of each, and the types of data and tasks that each best suits.
With the advent of the NoSQL movement, we now have a wider choice of database technologies available. By understanding these choices, we can ensure we’re picking the best tool for the job, rather than spending our time working around the limitations of a poorly chosen database technology.
The aim of this article is not to say that there is no longer a place for the RDBMS in modern software systems; in many cases it’s still the best option. The point is rather that the RDBMS is no longer the only option.
NoSQL databases generally share a number of motivations for their existence:
- Simplicity of design over traditionally large RDBMS, especially when being deployed into more modern microservice-style architectures.
- Support for horizontal scaling and clustering, which has traditionally been extremely challenging for RDBMS.
- Removal, or reduction, of the impedance mismatch between the schema inherent in code and the one imposed by a highly schema-oriented RDBMS.
- Increased performance through creation of specialized data structures and techniques, rather than the generalized nature of an RDBMS.
Within these common motivations, and with a desire to create specialized classes of database optimized for specific tasks, a number of categories of database have been created (or in some cases rediscovered from the distant past!).
The key-value store is one of the simplest non-trivial data storage concepts, and has existed in programming almost since its inception in the form of maps and dictionaries. As the name implies, each item in the store is represented by a tuple of a key and a value. Keys and values may be anything from a trivial type (string, integer, etc.) to a complex representation, depending on the implementation of the data store. Keys may also be stored in lexicographical order, allowing for optimized retrieval of both single values and ranges of values.
A key consideration of a true key-value database is that data can only be queried by key. Whilst it may be possible to perform complex queries on the key (assuming the key is, itself, a complex data structure), it’s not possible to select values directly by querying the value.
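As a rough illustration of these ideas, here is a minimal in-memory sketch in Python. The class `SortedKeyValueStore`, its method names and the `user:`/`order:` key prefixes are all invented for this example and do not correspond to any particular product. It keeps keys in lexicographical order so that a range of keys can be read efficiently, and it offers no way to search by value.

```python
import bisect


class SortedKeyValueStore:
    """Toy key-value store that keeps its keys in sorted order (illustrative only)."""

    def __init__(self):
        self._keys = []    # keys, kept in lexicographical order
        self._values = {}  # key -> value

    def put(self, key, value):
        # Only insert the key into the sorted list the first time we see it.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        # The only way to reach a value is through its key.
        return self._values.get(key, default)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedKeyValueStore()
store.put("user:001", {"name": "Ada"})
store.put("user:002", {"name": "Grace"})
store.put("order:001", {"total": 42})

single = store.get("user:001")
# ";" is the character after ":" in ASCII, so this range covers every "user:" key.
all_users = store.range("user:", "user;")
print(single)
print(all_users)
```

Note that finding, say, every user named "Grace" would mean reading every value and inspecting it in client code; the store itself only understands keys.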
Key-value style databases are commonly used to implement high-speed, highly scalable caches for software, often with the data held in memory only, rather than persisted to physical storage as would be common with other forms of database. Redis and Memcached are probably the most common key-value databases in use, most often (although not always) as caches.
The relative simplicity of the key-value store also means that it’s often used as a building block of other, more complex forms of database, in particular graph-style databases.
After the key-value store, document-style databases are probably the most common form of NoSQL database, and as such are most developers’ first experience of the paradigm.
At its heart a document database stores documents, typically in some form of structured encoding such as XML, JSON or YAML. Despite being encoded with structure, document databases are generally schema-less; that is, the content and structure of a document is not strictly defined by the database, allowing the schema to be defined in the code accessing the data.
By using structure to define the data, a single document (record) can hold a variety of nested data that in a more traditional relational database would be normalized into separate tables and require joins to be retrieved. This can give significant advantages in terms of speed and complexity when retrieving complex models from a document database rather than an RDBMS, as joins aren’t needed. Equally, if the client doesn’t need the full representation of the data back, most document query languages allow the document to be projected to a specific form during the running of the query.
Typically documents in a document database are addressed by a unique key assigned to each document, and in this way they can be thought of in the same way as a key-value database. Unlike a key-value database however, a document database will include an API (query language) that allows the contents of the document, the ‘value’ in a key-value database, to be indexed and queried, thus allowing a richer but more complex set of operations to be performed.
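The difference from a key-value store can be sketched in a few lines of Python. This is a toy under stated assumptions: `DocumentStore`, `insert`, `get` and `find` are hypothetical names for this example, not the API of any real document database, and the query is a plain full scan, whereas a real product would normally use indexes and its own query language.

```python
import copy


class DocumentStore:
    """Toy document store: lookup by key, plus querying and projection on content."""

    def __init__(self):
        self._docs = {}  # document id -> document (nested dicts and lists)

    def insert(self, doc_id, document):
        # No schema is enforced; any nested structure is accepted.
        self._docs[doc_id] = copy.deepcopy(document)

    def get(self, doc_id):
        # Key-based access, exactly as in a key-value store.
        return copy.deepcopy(self._docs.get(doc_id))

    def find(self, predicate, fields=None):
        """Return documents matching `predicate`, optionally projected to `fields`."""
        results = []
        for doc in self._docs.values():
            if predicate(doc):
                if fields is None:
                    results.append(copy.deepcopy(doc))
                else:
                    results.append({f: copy.deepcopy(doc[f]) for f in fields if f in doc})
        return results


db = DocumentStore()
db.insert("c1", {"name": "Ada", "city": "London",
                 "orders": [{"item": "book", "qty": 2}]})
db.insert("c2", {"name": "Grace", "city": "New York", "orders": []})

# Query on the document's contents, and project the result down to one field.
london_names = db.find(lambda d: d.get("city") == "London", fields=["name"])
print(london_names)
```

The nested `orders` list travels with its customer document, so reading a customer and their orders is a single lookup with no join.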
As previously noted, document databases are probably the most widespread of the NoSQL family of databases, and as such there is a wide variety of implementations available, including MongoDB, CouchDB and pure cloud offerings such as Microsoft’s Azure-based Cosmos DB. Their relative simplicity and their prevalence make them most people’s ‘first stop’ when considering moving from an RDBMS into the world of NoSQL, and typically they do make a good general-purpose option.
The most challenging aspect of moving from an RDBMS to a document database is probably arriving at a good design for the documents. Much time is often spent deciding whether related data should be nested (i.e. arrays within documents) or expressed as relationships (e.g. links between documents). There are no hard and fast rules for this, so a degree of experience and experimentation is required.
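To make the trade-off concrete, the sketch below models the same data both ways using plain Python dictionaries. The customer and order fields, the `_id` and `customer_id` names and the `orders_for` helper are all hypothetical, chosen only for illustration.

```python
# Option 1: nest the orders inside the customer document.
embedded_customer = {
    "_id": "c1",
    "name": "Ada",
    "orders": [
        {"item": "book", "qty": 2},
        {"item": "pen", "qty": 10},
    ],
}

# Option 2: keep orders as separate documents that link back to the customer.
customers = {"c1": {"_id": "c1", "name": "Ada"}}
orders = {
    "o1": {"_id": "o1", "customer_id": "c1", "item": "book", "qty": 2},
    "o2": {"_id": "o2", "customer_id": "c1", "item": "pen", "qty": 10},
}


def orders_for(customer_id):
    """Resolve the link by hand; this is the work a join would do in an RDBMS."""
    return [o for o in orders.values() if o["customer_id"] == customer_id]


# Nested: a single read returns the customer and all of their orders.
embedded_items = [o["item"] for o in embedded_customer["orders"]]

# Linked: a second lookup is needed, but orders can be read and changed on their own.
referenced_items = [o["item"] for o in orders_for("c1")]

print(embedded_items, referenced_items)
```

Nesting tends to suit data that is always read together with its parent, while links tend to suit data that is large, shared between documents or updated independently; which applies is a judgment call for each model.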
Graph databases could easily be argued to be even more ‘relational’ than traditional RDBMS systems, in that the strength of the graph database lies in its ability to attach data to relationships. Graph databases may be the most complex of the common NoSQL styles of database to understand, but when applied to the right problems there is no doubting how effectively they can simplify the overall solution.
Graph databases rely on mathematical graph theory. The most common type of graph in a graph database consists of nodes, which are entities with data such as the users of a social network, and edges, which in our example would be the relationships between those users. The key strength of the graph database is that those edges can themselves contain data, such as the nature of the relationship (for example, x is a friend of y, or x is married to y) or the date x and y connected. This data is stored as a series of key-value tuples.
As with most NoSQL databases, these nodes and edges are also schema-less, meaning that adding new relationship types and data is a trivial operation that is generally non-breaking.
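The social network example can be sketched as a small in-memory property graph in Python. Everything here is an assumption made for illustration: the `PropertyGraph` class, its methods, the names and the dates are invented, and edges are treated as directed for simplicity. A real graph database would add its own query language, indexing and persistence.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: nodes and edges both carry key-value data."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(list)  # source id -> [(target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, target, **props):
        # Edges are directed here; the data lives on the relationship itself.
        self.edges[source].append((target, props))

    def neighbours(self, source, **match):
        """Targets of edges leaving `source` whose properties include `match`."""
        return [
            target
            for target, props in self.edges.get(source, [])
            if all(props.get(k) == v for k, v in match.items())
        ]


g = PropertyGraph()
g.add_node("x", name="Xavier")
g.add_node("y", name="Yasmin")
g.add_node("z", name="Zoe")

g.add_edge("x", "y", type="friend", since="2019-05-01")
g.add_edge("x", "z", type="married", since="2021-08-14")
# A new relationship type with different properties needs no schema change.
g.add_edge("y", "z", type="colleague", team="platform")

friends_of_x = g.neighbours("x", type="friend")
everyone_linked_to_x = g.neighbours("x")
print(friends_of_x, everyone_linked_to_x)
```

Because each edge is just a bag of key-value pairs, the `colleague` edge can carry a `team` property that the other edges lack without affecting existing queries.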
It’s certainly true that graph databases don’t make especially good general-purpose data stores. However, for tasks where relationships between entities are key, the graph database is often unequaled in the ease with which that data can be modeled and manipulated. These tasks are certainly possible with an RDBMS, but the complexity of querying the structures is significantly greater.
The more specific nature of the domains best suited to graph databases means that they are not as widely used or understood as key-value and document databases, and as such there are fewer implementations available; Neo4j is the most widely used graph database at present.
As can be seen, the NoSQL movement has brought with it a number of classifications of implementation, each with its own approach to storing and querying data, and each with a domain of tasks for which it is best suited. We need to move beyond our dogmatic reliance on the relational database, and understand that these tools are available for us to use to best solve our database needs; as with so much in software engineering, there isn’t a ‘one size fits all’ solution.