MongoDB is a NoSQL, document-oriented database that provides high performance, high availability, and easy scalability. It stores data in flexible, JSON-like documents.
MongoDB offers schema flexibility, scalability, high performance, and support for dynamic queries. It also has built-in replication and failover support for high availability.
A MongoDB document is a JSON-like data structure composed of key-value pairs. Each document can have a different structure, allowing for flexible schema design.
A collection in MongoDB is a grouping of MongoDB documents. It's the equivalent of a table in relational databases. Collections don't enforce a schema, allowing for flexible data storage.
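As a minimal sketch in the mongo shell (the users collection and its fields are illustrative), inserting a document implicitly creates the collection, and documents with different shapes can coexist:
// Inserting a document creates the "users" collection if it does not yet exist
db.users.insertOne({ name: "Alice", age: 30, skills: ["java", "mongodb"] });
// A second document with a different structure is perfectly legal in the same collection
db.users.insertOne({ name: "Bob", email: "bob@example.com" });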
BSON (Binary JSON) is the binary-encoded serialization of JSON-like documents used by MongoDB to store data in a more compact and efficient format.
A MongoDB database is a container for collections. It holds sets of collections, each of which is similar to a table in a relational database.
MongoDB ensures high availability through features like replica sets, which maintain multiple copies of data across multiple servers, and automatic failover mechanisms.
A replica set is a group of MongoDB servers that maintain the same data set. It provides redundancy and high availability, automatically electing a primary node for read and write operations.
rs.initiate({
_id: "rs0",
members: [
{ _id: 0, host : "mongo1.example.net:27017" },
{ _id: 1, host : "mongo2.example.net:27017" },
{ _id: 2, host : "mongo3.example.net:27017" }
]
});
Sharding in MongoDB is a method for distributing data across multiple servers to support horizontal scaling. It helps manage large datasets and high-throughput applications by dividing the data into smaller, more manageable pieces, called shards.
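A rough setup sketch in the mongo shell, assuming a hypothetical mydb.orders collection sharded on customerId:
sh.enableSharding("mydb");                                      // enable sharding on the database
db.getSiblingDB("mydb").orders.createIndex({ customerId: 1 });  // the shard key needs a supporting index
sh.shardCollection("mydb.orders", { customerId: 1 });           // distribute documents by customerId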
Indexing in MongoDB is the process of creating indexes to improve query performance by allowing the database to quickly locate and retrieve documents based on the indexed fields.
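A minimal sketch, assuming a hypothetical users collection:
db.users.createIndex({ email: 1 }, { unique: true });  // single-field unique index
db.users.createIndex({ lastName: 1, firstName: 1 });   // compound index for common filter/sort patterns
db.users.find({ email: "alice@example.com" }).explain("executionStats");  // confirm the index is used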
Some key differences between SQL databases and MongoDB are:
Feature | SQL Databases | MongoDB |
---|---|---|
Data Model | Follows a rigid, tabular structure | Follows a flexible, document-oriented structure |
Schema | Enforces a fixed schema | Has a dynamic schema, allowing flexible structures |
Query Language | Uses SQL (Structured Query Language) | Uses MongoDB Query Language (based on JSON) |
Scaling | Primarily scales vertically | Scales horizontally with ease (sharding) |
Transactions | Supports ACID transactions (atomic, consistent, isolated, durable) | Supports multi-document transactions (from version 4.0) |
Schema Migration | Schema changes require altering existing tables | Flexible schema allows easy schema updates |
Data Integrity | Strong data integrity constraints | Flexible schema may require additional validation |
Joins | Supports complex JOIN operations | Favors denormalized data structures |
Indexing | Supports traditional indexing techniques | Supports various index types (e.g., single field, compound, multikey) |
Scalability | Vertical scaling (adding more resources to a single server) | Horizontal scaling (distributing data across multiple servers) |
Use Cases | Well-suited for structured data with complex relationships | Ideal for unstructured or semi-structured data, real-time analytics, and high-volume applications |
Community Adoption | Widely adopted with mature ecosystem | Growing adoption, especially in web and mobile app development |
MongoDB supports multi-document transactions starting from version 4.0, allowing developers to perform atomic operations on multiple documents within a single transaction.
const session = db.getMongo().startSession();
session.startTransaction();
try {
  const sdb = session.getDatabase("mydb");  // database name is illustrative
  sdb.collection1.updateOne({ _id: 1 }, { $set: { status: "processed" } });
  sdb.collection2.deleteOne({ _id: 1 });
  session.commitTransaction();
} catch (error) {
  session.abortTransaction();
  print("Transaction aborted: " + error);
} finally {
  session.endSession();
}
GridFS is a specification for storing and retrieving large files, such as images, videos, and audio files, in MongoDB. It divides files into smaller chunks for efficient storage and retrieval.
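A minimal sketch using the Node.js driver's GridFSBucket (the connection string, database, and file names are illustrative); GridFS stores files in fs.files and fs.chunks, splitting them into 255 KB chunks by default:
const { MongoClient, GridFSBucket } = require("mongodb");
const fs = require("fs");

async function uploadVideo() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const bucket = new GridFSBucket(client.db("media"));   // backed by fs.files / fs.chunks collections
  fs.createReadStream("./intro.mp4")
    .pipe(bucket.openUploadStream("intro.mp4"))          // streams the file into chunks
    .on("finish", () => client.close());
}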
Write Concern in MongoDB determines the level of acknowledgment requested from MongoDB for write operations. It specifies how many replica set members (for example w: 1, w: 2, or w: "majority") must acknowledge a write for it to be considered successful.
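For example (collection name illustrative), a write that must be acknowledged by at least two replica set members within five seconds:
db.orders.insertOne(
  { item: "book", qty: 1 },
  { writeConcern: { w: 2, wtimeout: 5000 } }  // wait for 2 members, error if it takes longer than 5 s
);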
Aggregation Framework in MongoDB is a powerful tool for performing data aggregation operations, such as grouping, filtering, and transforming documents, to obtain aggregated results.
Concurrency control in MongoDB depends on the storage engine. The default WiredTiger engine provides document-level concurrency, using optimistic concurrency control with intent locks at the global, database, and collection levels, so clients can modify different documents in the same collection simultaneously. Older engines such as MMAPv1 used coarser database- or collection-level locking with a multiple-reader/single-writer model.
// Update only if the document is still in the expected state (a common optimistic pattern)
db.collection.updateOne(
  { _id: 1, status: "pending" },
  { $set: { status: "processed" } }
);
The ObjectId is a unique identifier generated by MongoDB for each document in a collection. It is a 12-byte value consisting of a 4-byte timestamp, a 5-byte random value (derived from machine and process identifiers in older driver versions), and a 3-byte incrementing counter.
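A quick shell sketch showing the embedded timestamp:
const id = ObjectId();   // e.g. ObjectId("65a1f3c2e4b0a1b2c3d4e5f6")
id.getTimestamp();       // ISODate derived from the first 4 bytes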
The mongod process is the primary daemon process for MongoDB, responsible for managing data storage, handling client requests, and performing administrative tasks within a MongoDB deployment.
The TTL (Time-To-Live) index in MongoDB is a special type of index that automatically removes documents from a collection after a specified period, allowing for the automatic expiration of data.
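A minimal sketch, assuming a hypothetical sessions collection with a createdAt date field:
// Documents are removed roughly 3600 seconds after the value of createdAt
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
db.sessions.insertOne({ user: "alice", createdAt: new Date() });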
MongoDB handles unstructured data by storing it as flexible, JSON-like documents within collections. This allows developers to store and retrieve data without predefined schemas, making it suitable for handling diverse data types.
Secondary nodes in a MongoDB replica set replicate data from the primary node and can serve read operations. They provide redundancy and high availability, allowing for failover in case the primary node becomes unavailable.
// rs.slaveOk() is deprecated; prefer rs.secondaryOk() or an explicit read preference
db.getMongo().setReadPref("secondary");
db.collection.find();
MongoDB does not support traditional server-side table joins the way SQL databases do. Instead, it favors embedded documents and references to represent relationships between data, reducing the need for complex joins; for join-like queries, the aggregation framework's $lookup stage performs a left outer join between collections, as shown in the sketch below.
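A sketch of the join-like alternative (collection and field names are illustrative); $lookup attaches matching customers to each order:
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",         // collection to join with
      localField: "customerId",  // field in orders
      foreignField: "_id",       // field in customers
      as: "customer"             // resulting array of matched customer documents
    }
  }
]);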
WiredTiger is the default storage engine for MongoDB starting from version 3.2. It provides features like compression, document-level concurrency control, and support for transactions.
MongoDB's flexible schema design makes schema migrations less disruptive compared to relational databases. Developers can add new fields to documents without requiring alterations to existing documents.
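For instance (field and collection names are illustrative), adding a new field to existing documents is a single update rather than an ALTER TABLE:
// Add a default loyaltyPoints field only to documents that do not have it yet
db.users.updateMany(
  { loyaltyPoints: { $exists: false } },
  { $set: { loyaltyPoints: 0 } }
);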
The difference between find() and findOne() in MongoDB lies in their behavior and the results they return:
Feature | find() Method | findOne() Method |
---|---|---|
Syntax | db.collection.find(query, projection) | db.collection.findOne(query, projection) |
Returns | Cursor pointing to matching documents | Single document matching the query |
Behavior | Returns all documents matching criteria | Returns the first document matching criteria or null if none |
Matching Documents | Returns multiple documents | Returns a single document |
Empty Result | Returns an empty cursor if no match | Returns null if no match found |
Default Behavior | Returns all documents if no criteria | Returns the first document if no criteria specified |
Use Case | Retrieving multiple matching documents | Retrieving a single matching document |
// Using find()
db.collection.find({ status: "active" });
// Using findOne()
db.collection.findOne({ status: "active" });
Read Concern in MongoDB determines the level of consistency for read operations. It specifies the visibility of data changes to read operations, ensuring that clients receive consistent data views.
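A minimal sketch requesting majority-committed data (collection name illustrative):
// Only return data acknowledged by a majority of replica set members
db.orders.find({ status: "shipped" }).readConcern("majority");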
Document embedding in MongoDB refers to a data modeling technique where one document contains another document or documents within its structure. This nesting allows for the representation of complex relationships between data entities directly within a single document, rather than using separate collections and establishing explicit relationships through references.
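For example, an order document embedding its customer details and line items directly (a sketch with illustrative fields):
db.orders.insertOne({
  orderId: 1001,
  customer: { name: "Alice", email: "alice@example.com" },  // embedded sub-document
  items: [                                                  // embedded array of sub-documents
    { sku: "A1", qty: 2, price: 9.99 },
    { sku: "B7", qty: 1, price: 24.5 }
  ]
});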
The differences between a compound index and a multikey index in MongoDB are summarized below; an index-creation sketch follows the table:
Feature | Compound Index | Multikey Index |
---|---|---|
Definition | Index that combines multiple fields into one index | Index created on an array field, indexing each array element |
Syntax | { field1: 1, field2: 1 } | { arrayField: 1 } |
Fields | Indexes multiple fields together | Indexes each element of an array field |
Example | { "name": 1, "age": 1 } | { "tags": 1 } |
Query Optimization | Useful for queries that involve multiple fields | Useful for queries on array fields |
Storage | Consumes less storage compared to multikey indexes | May consume more storage due to indexing each array element |
Use Cases | Queries involving multiple fields | Queries on arrays or sub-documents within documents |
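A sketch of creating both kinds (collection names illustrative); MongoDB makes the index multikey automatically when the indexed field holds arrays:
db.users.createIndex({ name: 1, age: 1 });  // compound index on two scalar fields
db.posts.createIndex({ tags: 1 });          // becomes a multikey index because tags is an array
db.posts.find({ tags: "mongodb" });         // can use the multikey index to match array elements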
Capped collections in MongoDB are fixed-size collections that maintain insertion order. Once a collection reaches its maximum size, older documents are automatically removed to accommodate new ones.
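A minimal sketch creating a capped collection for recent log entries (name and limits illustrative):
// Fixed at ~1 MB or 5000 documents, whichever limit is hit first; the oldest entries are overwritten
db.createCollection("recentLogs", { capped: true, size: 1048576, max: 5000 });
db.recentLogs.insertOne({ level: "info", msg: "service started", ts: new Date() });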
The oplog (operation log) in MongoDB is a special capped collection that records all write operations in a replica set. It allows secondary nodes to replicate changes from the primary node asynchronously.
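For inspection, the most recent oplog entry can be read from the local database (a sketch; run against a replica set member):
db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(1);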
MongoDB ensures data durability by persisting write operations to the WiredTiger storage engine's journal, a write-ahead log. With a journaled write concern such as j: true or w: "majority", writes are acknowledged only after they are safely recorded on disk.
db.collection.insertOne({ name: "John" }, { writeConcern: { w: "majority" } });
The differences between a primary key and a shard key in MongoDB are:
Feature | Primary Key | Shard Key |
---|---|---|
Purpose | Uniquely identifies each document in a collection | Determines how data is distributed across shards |
Uniqueness | Must be unique within the collection | Should ideally have high cardinality to evenly distribute data |
Indexing | The _id field is automatically indexed by MongoDB | Must be backed by an index on the shard key fields |
Default Behavior | _id field is automatically used as the primary key | Shard key needs to be explicitly chosen or created |
Data Distribution | Does not affect data distribution | Determines how data is distributed across shards |
Impact on Queries | Used to efficiently retrieve individual documents | Can impact query performance and data distribution |
Use Cases | Ensuring document uniqueness and integrity | Scaling databases horizontally by distributing data |
MongoDB supports various types of indexes to optimize query performance and enforce constraints such as uniqueness. These include single field, compound, multikey (for array fields), text, geospatial (2d and 2dsphere), hashed, wildcard, and TTL indexes.
MongoDB offers several security features such as authentication, authorization, encryption at rest, encryption in transit, role-based access control (RBAC), auditing, and TLS/SSL support to ensure data protection and access control.
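As a sketch of role-based access control (the user name and target database are illustrative):
db.getSiblingDB("admin").createUser({
  user: "reportingUser",
  pwd: passwordPrompt(),                  // prompt for the password instead of hard-coding it
  roles: [{ role: "read", db: "sales" }]  // read-only access to the sales database
});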
The aggregation pipeline in MongoDB is a framework for performing data aggregation operations on documents. It consists of multiple stages, each representing a transformation step, allowing developers to process and analyze data efficiently.
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customer", total: { $sum: "$amount" } } }
]);
Both $push and $addToSet are update operators in MongoDB used to modify arrays within documents; they are compared in the table and example below:
Feature | $push Operator | $addToSet Operator |
---|---|---|
Purpose | Adds a value to an array field | Adds a value to an array field if not already present |
Syntax | { $push: { <field>: <value> } } | { $addToSet: { <field>: <value> } } |
Behavior | Appends value to the array unconditionally | Appends value to the array if not already present |
Duplicates | Allows duplicates | Ensures uniqueness of elements within the array |
Effect on Array | May result in duplicate values in array | Ensures each value is unique in the array |
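A quick sketch on a hypothetical users collection with a tags array:
db.users.updateOne({ _id: 1 }, { $push: { tags: "mongodb" } });      // may create duplicates
db.users.updateOne({ _id: 1 }, { $addToSet: { tags: "mongodb" } });  // added only if not already present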
MongoDB ensures high availability in a sharded environment by deploying replica sets for each shard, providing redundancy and failover capabilities. Additionally, it uses the config servers to manage metadata and maintain cluster configuration.
The MongoDB Connector for BI (Business Intelligence) bridges the gap between MongoDB's flexible document-oriented data model and traditional BI tools: it exposes MongoDB collections through a SQL interface so that SQL-based analysis and visualization tools can query them.
MongoDB Atlas is a fully managed cloud database service provided by MongoDB, Inc. It offers features such as automated backups, scaling, security, monitoring, and global clusters, allowing developers to deploy MongoDB databases easily on the cloud.
const { MongoClient } = require('mongodb');
const uri = "mongodb+srv://<username>:<password>@<cluster>/<dbname>?retryWrites=true&w=majority";
// The useNewUrlParser and useUnifiedTopology options are obsolete in driver 4.x and later
const client = new MongoClient(uri);
MongoDB uses replica sets to achieve data replication, where each replica set consists of multiple MongoDB instances (nodes) that replicate data asynchronously from the primary node. This ensures data redundancy and fault tolerance.
MongoDB uses replica sets and distributed transactions to ensure data consistency in a distributed environment. Replica sets maintain multiple copies of data for redundancy, while transactions provide atomicity and isolation guarantees.
The balancer in MongoDB sharding is responsible for redistributing data across shards to ensure a balanced distribution of data and equal utilization of resources among shard nodes.
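The balancer can be inspected and toggled from the mongo shell, for example:
sh.getBalancerState();   // true if the balancer is enabled
sh.isBalancerRunning();  // whether a balancing round is currently in progress
sh.stopBalancer();       // disable balancing (e.g. during a maintenance window)
sh.startBalancer();      // re-enable balancing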
The differences between a document-oriented database and a relational database are:
Feature | Document-Oriented Database | Relational Database |
---|---|---|
Data Model | Stores data in flexible, schema-less documents | Organizes data into structured tables |
Schema Flexibility | Offers schema flexibility, allowing varying document structures | Enforces rigid schema with fixed table structures |
Relationships | Supports embedded documents and references for relationships | Relies on foreign keys and joins for relationships |
Scalability | Scales horizontally by sharding data across multiple servers | Traditionally scales vertically with single-server upgrades |
Query Language | Uses query languages based on JSON-like syntax | Uses SQL (Structured Query Language) for querying |
MongoDB uses replica sets and automatic failover to handle network partitioning. In the event of a partition, the side that retains a majority of voting members elects a new primary, while an isolated former primary steps down to a secondary, preserving availability and data consistency.
MongoDB Compass is a graphical user interface (GUI) tool provided by MongoDB, Inc. It allows users to visually explore, query, and manipulate MongoDB data, as well as perform administrative tasks such as index management and schema validation.
Document validation in MongoDB is a feature that allows developers to define rules and constraints on the structure and content of documents stored in a collection. These validation rules are specified using JSON Schema validation, which defines the expected structure, data types, and validation criteria for documents within the collection.
db.createCollection("employees", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "age", "department"],
properties: {
name: { bsonType: "string" },
age: { bsonType: "int", minimum: 18 },
department: { bsonType: "string" }
}
}
}
});
MongoDB provides several backup and disaster recovery options, including snapshot backups, continuous backups with MongoDB Atlas, and third-party backup solutions. These options help organizations protect against data loss and recover data in case of emergencies.
MongoDB Change Streams is a feature that allows developers to monitor real-time changes to MongoDB collections, such as insertions, updates, and deletions, by opening a persistent stream of change events. It enables building reactive applications and implementing data synchronization mechanisms.
// Node.js driver: "collection" is a Collection object obtained via client.db("<dbname>").collection("<coll>")
const pipeline = [{ $match: { operationType: { $in: ["insert", "update", "delete"] } } }];
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
  console.log("Change Event:", change);
});
The WiredTiger cache in MongoDB is an in-memory cache used to store frequently accessed data and indexes, improving read and write performance. It helps reduce disk I/O operations by caching frequently accessed data pages and index entries.
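Cache usage can be inspected from the shell via serverStatus (a sketch; the quoted field names follow the WiredTiger statistics output):
const cache = db.serverStatus().wiredTiger.cache;
print("configured max bytes:", cache["maximum bytes configured"]);
print("bytes currently in cache:", cache["bytes currently in the cache"]);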
MongoDB uses the concept of zone sharding to control data locality in a sharded environment. By defining zones based on a shard key range, developers can ensure that data belonging to specific regions or criteria is stored on designated shards, optimizing query performance and resource utilization.
// Shard and zone names below are illustrative; a compound shard key lets MinKey/MaxKey bound each country's range
sh.shardCollection("test.users", { "country": 1, "_id": 1 });
sh.addShardToZone("shardA", "USA");
sh.updateZoneKeyRange("test.users", { "country": "USA", "_id": MinKey }, { "country": "USA", "_id": MaxKey }, "USA");
sh.addShardToZone("shardB", "UK");
sh.updateZoneKeyRange("test.users", { "country": "UK", "_id": MinKey }, { "country": "UK", "_id": MaxKey }, "UK");