NoSQL Databases

In this chapter, we will talk about clusters, sharding, replication and NoSQL databases (especially Google Firestore).

Relational Model

Persistence - capacity to persist large amounts of data, which is infeasible to fit in memory;
Concurrency - capacity to handle multiple clients at the same time;
Transactions - capacity to ensure that the database is always in a consistent state, even in the presence of failures;
Integration - different technologies can access the database;
Standards - SQL is a standard language for relational databases;
Impedance Mismatch - the difference between the relational model and the object-oriented model;
- Limitation of the relational model - it is not possible to store objects (aggregated data) in a relational database;
- Object-relational mapping (ORM) - a technique to map objects to relational databases.

Clustering

Clusters allow to scale the database horizontally (scale-out);
In a scale-up approach, the database is scaled vertically, by adding more resources to the same machine - this requires a lot of resources;
In the other hand, in a scale-out approach, the database is scaled horizontally, by adding more machines to the cluster - this is more cost-effective;

Problem: the relational model is not designed to work in a distributed environment - companies like Google and Amazon created their own solutions to this problem.

Vantages of Clustering

High Scalability - the database can be scaled horizontally, by adding more machines to the cluster;
Reduced Cost - the database can be scaled horizontally, which is more cost-effective than scaling vertically;
Flexibility - data does not need to be stored using a schema, like in the relational model, allowing flexible formats - key-value, document, column-oriented, graph, etc;
Availability - the database can be replicated, allowing to handle failures.

Note: the CAP theorem also applies to distributed databases - impossible to have all three properties at the same time - consistency, availability and partition tolerance.

Distributed Databases

Sharding - the database is split into multiple shards, which are distributed across the cluster; in the relational model, this can be done in two ways:
- Vertical - each shard contains a subset of the columns;
- Horizontal - each shard contains a subset of the rows;
Replication - each shard is replicated across multiple machines, to handle failures; this can be done in the following ways:
- Master-slave - one machine is the master, and the others are slaves; the master is responsible for writing, and the slaves are responsible for reading;
- Peer-to-peer - all machines are equal, and can read and write;
- Master-slave reduce the number of conflicts, but peer-to-peer is more scalable, because reduce the write overhead in the master;

Some systems use a combination of sharding and replication.

NoSQL Databases

A NoSQL database is a schema-less database that is not based on the relational model - data is stored in collections.

Does not use the relational model;
Supports horizontal scaling, using clusters;
Multiple open-source implementations;
Schemaless - there is no rigid schema, allowing to store different types of data in the same collection;
Polyglot persistence - different types of data can be stored in different types of databases - e.g. relational, document, graph, etc.

BASE Properties

Basically Available - the database is always available, even if it is not consistent;
Soft state - the database is not always consistent;
Eventual consistency - the database will eventually be consistent.

Aggregate Data Models

NoSQl databases support aggregate data models, which can be classified in the following categories:

Key-value - data is stored as key-value pairs; e.g. Redis;
Document - data is stored as documents, which are similar to JSON objects; e.g. MongoDB;
Column-family - data is stored in columns; e.g. Cassandra;
Graph - data is stored as a graph, where nodes and edges represent entities and relationships, respectively; e.g. Neo4j.

Google Firestore

Firestore is a NoSQL database, which is part of the Google Cloud Platform.

The database is a set of collections;
Collections contain documents, which are aggregates of fields and values (with various types - string, number, boolean, timestamp, reference, geo-point, array, map, etc) - similar to JSON objects;
Documents in the same collection can have different fields - schemaless;
A document can reference other collections - until 100 levels of nesting;
Documents should be small, because they are replicated across the cluster - more than 1 MiB is not recommended;
Documents are identified by IDs - they can be generated automatically, or manually by the user.

Java API

public class FirestoreClient {
  public static void main(String[] args) {
        FirestoreOptions options = FirestoreOptions.getDefaultInstance().getService();
        Firestore db = options.getService();
      
        // Get collection
        CollectionReference collection = db.collection("users");

        // List documents
        Iterable<DocumentReference> documents = collection.listDocuments();
        for (DocumentReference document : documents) {
            ApiFuture<DocumentSnapshot> future = document.get();
            DocumentSnapshot documentSnapshot = future.get();
            if (documentSnapshot.exists()) {
                System.out.println("Document data: " + documentSnapshot.getData());
            } else {
                System.out.println("No such document!");
            }
        }

        // Get document
        DocumentReference document = collection.document("user1"); // user1 is the ID of the document

        // Get document data
        ApiFuture<DocumentSnapshot> future = document.get();
        DocumentSnapshot documentSnapshot = future.get();
        if (documentSnapshot.exists()) {
            System.out.println("Document data: " + documentSnapshot.getData());
        } else {
            System.out.println("No such document!");
        }

        // Add document
        Map<String, Object> data = new HashMap<>();
        data.put("name", "John");
        data.put("age", 30);
        ApiFuture<WriteResult> result = document.set(data);
        System.out.println("Update time: " + result.get().getUpdateTime());

        // Update document
        ApiFuture<WriteResult> update = document.update("age", 40);
        System.out.println("Update time: " + update.get().getUpdateTime());

        // Delete document
        ApiFuture<WriteResult> delete = document.delete();
        System.out.println("Update time: " + delete.get().getUpdateTime());

        // Add document with ID
        DocumentReference document2 = collection.document("user2");
        Map<String, Object> data2 = new HashMap<>();
        data2.put("name", "Mary");
        data2.put("age", 25);
        ApiFuture<WriteResult> result2 = document2.set(data2);
        System.out.println("Update time: " + result2.get().getUpdateTime());

        // Add document with auto-generated ID
        Map<String, Object> data3 = new HashMap<>();
        data3.put("name", "Peter");
        data3.put("age", 35);
        ApiFuture<DocumentReference> addedDocRef = collection.add(data3);
        System.out.println("Added document with ID: " + addedDocRef.get().getId());

        // Read field from document
        ApiFuture<DocumentSnapshot> future2 = document.get();
        DocumentSnapshot documentSnapshot2 = future2.get();
        if (documentSnapshot2.exists()) {
            System.out.println("Document data: " + documentSnapshot2.getData());
            System.out.println("Name: " + documentSnapshot2.getString("name"));
            System.out.println("Age: " + documentSnapshot2.getLong("age"));
        } else {
            System.out.println("No such document!");
        }
        // or
        documentSnapshot2.toObject(User.class);

        // Delete field from document
        Map<String, Object> updates = new HashMap<>();
        updates.put("age", FieldValue.delete());
        ApiFuture<WriteResult> writeResult = document.update(updates);
        System.out.println("Update time: " + writeResult.get().getUpdateTime());

        // Simple query
        Query query = collection.whereEqualTo("age", 30); // whereLessThan, whereLessThanOrEqualTo, whereGreaterThan, whereGreaterThanOrEqualTo, whereArrayContains, whereArrayContainsAny, whereIn, whereNotIn
        ApiFuture<QuerySnapshot> querySnapshot = query.get();
        List<QueryDocumentSnapshot> documents2 = querySnapshot.get().getDocuments();

        // Simple query of complex field (nested)
        FieldPath fieldPath = FieldPath.of("address", "city"); // address.city
        Query query2 = collection.whereEqualTo(fieldPath, "New York");
        
        // Complex query
        Query query3 = collection.whereEqualTo("age", 30).whereEqualTo("name", "John");
    }
}

In order to use complex queries (more than one field/condition), it is necessary to create an index in the Firestore console.

Indexing refers to the process of creating additional data structures that enable efficient data retrieval. Essentially, it's a way to assist the database in quickly locating the data you require without having to scan the entire database.

Limitations

It is not possible to use equality operators (whereEqualTo, whereLessThan, etc) on multiple fields in the same query;
Read for more limitations here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

7-nosql-databases.md

7-nosql-databases.md

NoSQL Databases

Relational Model

Clustering

Vantages of Clustering

Distributed Databases

NoSQL Databases

BASE Properties

Aggregate Data Models

Google Firestore

Java API

Limitations

Files

7-nosql-databases.md

Latest commit

History

7-nosql-databases.md

File metadata and controls

NoSQL Databases

Relational Model

Clustering

Vantages of Clustering

Distributed Databases

NoSQL Databases

BASE Properties

Aggregate Data Models

Google Firestore

Java API

Limitations