You will need MongoDB version 4 or later, Apache Kafka, Apache Spark, and Apache Hadoop installed to follow this chapter smoothly. The code used in all the chapters can be found at: https://github.com/PacktPublishing/Mastering-MongoDB-4.x-Second-Edition.
Structured Query Language (SQL) existed even before the WWW. Dr. E. F. Codd originally published the paper, A Relational Model of Data for Large Shared Data Banks, in June 1970, in the Association for Computing Machinery (ACM) journal, Communications of the ACM. SQL was initially developed at IBM by Chamberlin and Boyce in 1974. Relational Software (now Oracle Corporation) was the first to develop a commercially available implementation of SQL, targeted at United States governmental agencies.
The first American National Standards Institute (ANSI) SQL standard came out in 1986. Since then, there have been eight revisions, with the most recent being published in 2016 (SQL:2016).
SQL was not particularly popular at the start of the WWW. Static content could just be hardcoded into the HTML page without much fuss. However, as the functionality of websites grew, webmasters wanted to generate web page content driven by offline data sources, in order to generate content that could change over time without redeploying code.
Common Gateway Interface (CGI) scripts, written in Perl or as Unix shell scripts, drove the early database-driven websites of Web 1.0. With Web 2.0, the web evolved from directly injecting SQL results into the browser to using two-tier and three-tier architectures that separated views from the business and model logic, allowing SQL queries to be modular and isolated from the rest of the web application.
On the other hand, Not only SQL (NoSQL) is much more recent and followed the evolution of the web, rising at the same time as Web 2.0 technologies. The term was first coined by Carlo Strozzi in 1998, for his open source database that did not follow the SQL standard, but was still relational.
This is not what we currently expect from a NoSQL database. Johan Oskarsson, a developer at Last.fm at the time, reintroduced the term in early 2009, in order to group a set of distributed, non-relational data stores that were being developed. Many of them were based on Google's Bigtable and MapReduce papers, or on Amazon's Dynamo, a highly available key-value storage system.
NoSQL's foundations grew upon relaxing atomicity, consistency, isolation, and durability (ACID) guarantees in exchange for performance, scalability, flexibility, and reduced complexity. Most NoSQL databases have gone one way or another in providing as many of the previously mentioned qualities as possible, even offering adjustable guarantees to the developer. The following diagram describes the evolution of SQL and NoSQL:
10gen started to develop a cloud computing stack in 2007 and soon realized that the most important innovation was centered around the document-oriented database that they built to power it, which was MongoDB. MongoDB was initially released on August 27, 2009.
Version 1 of MongoDB was pretty basic in terms of features, authorization, and ACID guarantees, but it made up for these shortcomings with performance and flexibility.
In the following sections, we will highlight the major features of MongoDB, along with the version numbers with which they were introduced.
The different features of versions 1.0 and 1.2 are as follows:
- Document-based model
- Global lock (process level)
- Indexes on collections
- CRUD operations on documents
- No authentication (authentication was handled at the server level)
- Master and slave replication
- MapReduce (introduced in v1.2)
- Stored JavaScript functions (introduced in v1.2)
The different features of version 2.0 are as follows:
- Background index creation (since v1.4)
- Sharding (since v1.6)
- More query operators (since v1.6)
- Journaling (since v1.8)
- Sparse and covered indexes (since v1.8)
- Compact command to reduce disk usage
- More efficient memory usage
- Concurrency improvements
- Index performance enhancements
- Replica sets are now more configurable and data center aware
- MapReduce improvements
- Authentication (since v2.0, for sharding and most database commands)
- Geospatial features introduced
- Aggregation framework (since v2.2) and enhancements (since v2.6)
- TTL collections (since v2.2)
- Concurrency improvements, among which is DB-level locking (since v2.2)
- Text searching (since v2.4) and integration (since v2.6)
- Hashed indexes (since v2.4)
- Security enhancements and role-based access (since v2.4)
- V8 JavaScript engine instead of SpiderMonkey (since v2.4)
- Query engine improvements (since v2.6)
The different features of version 3.0 are as follows:
- Pluggable storage engine API
- WiredTiger storage engine introduced, with document-level locking, while the previous storage engine (now called MMAPv1) supports collection-level locking
- Replication and sharding enhancements (since v3.2)
- Document validation (since v3.2)
- Aggregation framework enhanced operations (since v3.2)
- Multiple storage engines (since v3.2, only in Enterprise Edition)
- Query language and indexes collation (since v3.4)
- Read-only database views (since v3.4)
- Linearizable read concern (since v3.4)
The different features of version 4.0 are as follows:
- Multi-document ACID transactions
- Change streams
- MongoDB tools (Stitch, Mobile, Sync, and Kubernetes Operator)
The following diagram shows MongoDB's evolution:
As we can observe, version 1 was pretty basic, whereas version 2 introduced most of the features present in the current version, such as sharding, usable and special indexes, geospatial features, and memory and concurrency improvements.
On the way from version 2 to version 3, the aggregation framework was introduced, mainly as a supplement to the ageing MapReduce framework, which was never on par with dedicated frameworks such as Hadoop. Then, text search was added, and slowly but surely, the database improved in performance, stability, and security, to adapt to the increasing enterprise load of customers using MongoDB.
With WiredTiger's introduction in version 3, locking became much less of an issue for MongoDB, as it was brought down from the process (global lock) to the document level, almost the most granular level possible.
Version 4 marked a major transition, bridging the SQL and NoSQL worlds with the introduction of multi-document ACID transactions. This allowed a wider range of applications to use MongoDB, especially applications that require a strong real-time consistency guarantee. Further, the introduction of change streams allowed for a faster time to market for real-time applications using MongoDB. A series of tools has also been introduced to facilitate serverless, mobile, and Internet of Things (IoT) development.
In its current state, MongoDB is a database that can handle loads ranging from startup MVPs and POCs to enterprise applications with hundreds of servers.
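To make the idea of a multi-document transaction more concrete, the following is a minimal sketch in Python. It assumes that `client` is a `pymongo.MongoClient` connected to a MongoDB 4.0+ replica set (transactions are not available on a standalone server). The `transfer` function, the `bank` database, the `accounts` collection, and the `balance` field are hypothetical names chosen for illustration only.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents in one transaction.

    Assumption: `client` is a pymongo.MongoClient connected to a
    MongoDB 4.0+ replica set. The "bank" database, "accounts"
    collection, and "balance" field are hypothetical names.
    """
    accounts = client["bank"]["accounts"]
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it, so either both
        # updates become visible or neither does.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id},
                {"$inc": {"balance": -amount}},
                session=session,
            )
            accounts.update_one(
                {"_id": to_id},
                {"$inc": {"balance": amount}},
                session=session,
            )
```

Passing `session=session` to each operation is what ties it to the transaction; an operation issued without the session would run outside of it. The sketch deliberately omits retry logic for transient errors, which a production application would likely need.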
MongoDB was developed in the Web 2.0 era. By then, most developers had been using SQL or o...