MongoDB GridFS for Large File Storage
MongoDB is designed to store data in BSON documents, and each document has a maximum size limit of 16 MB. While this size is sufficient for most application data, it becomes a limitation when dealing with large files such as videos, audio recordings, high-resolution images, PDFs, backups, medical scans, and other multimedia content. To overcome this restriction, MongoDB provides a specification called GridFS, which enables efficient storage and retrieval of files larger than 16 MB.
What is GridFS?
GridFS is a file storage convention, implemented by the MongoDB drivers, that divides large files into smaller chunks and stores each chunk as a separate document. Instead of storing an entire file in a single document, GridFS breaks the file into manageable pieces and keeps track of their order.
This approach allows MongoDB to handle files of virtually any size while maintaining the benefits of a database system, such as replication, sharding, indexing, and backup management.
For example, at the default chunk size of 255 KB, GridFS divides a 100 MB video file into about 400 chunks and stores each chunk separately. When the file is requested, the driver reads the chunks back in order and reconstructs the file.
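The arithmetic behind that example can be sketched in a few lines. This is an illustration only: `countChunks` is a hypothetical helper, not part of any MongoDB driver, and it assumes the default chunk size of 255 KB (261,120 bytes).

```javascript
// Hypothetical helper (not a driver API): how many chunks a file of
// `length` bytes needs, and how large the final chunk is.
function countChunks(length, chunkSizeBytes = 255 * 1024) {
  const chunks = Math.ceil(length / chunkSizeBytes);
  // The last chunk holds whatever is left over; it is only full-sized
  // when the length is an exact multiple of the chunk size.
  const lastChunkBytes =
    length === 0 ? 0 : length - (chunks - 1) * chunkSizeBytes;
  return { chunks, lastChunkBytes };
}

const result = countChunks(100 * 1024 * 1024); // the 100 MB video
console.log(result); // { chunks: 402, lastChunkBytes: 148480 }
```

Only the final chunk is smaller than the configured chunk size; every other chunk is exactly `chunkSizeBytes` long.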
How GridFS Works
GridFS uses two collections to store file information:
fs.files Collection
This collection stores metadata about each file, including:
- File name
- Upload date
- File size
- Content type
- Chunk size
- Additional custom metadata
Example document:
    {
        "_id": ObjectId("65ab123"),
        "filename": "training_video.mp4",
        "length": 104857600,
        "chunkSize": 261120,
        "uploadDate": ISODate("2026-06-12"),
        "metadata": {
            "department": "Training",
            "author": "Admin"
        }
    }
fs.chunks Collection
This collection stores the actual file data in small chunks.
Example document:
    {
        "_id": ObjectId("65ab456"),
        "files_id": ObjectId("65ab123"),
        "n": 0,
        "data": BinData(...)
    }
Here:
- files_id links the chunk to its parent file.
- n indicates the sequence number of the chunk.
- data contains the binary content.
When the file is retrieved, the driver reads its chunks in order of n and reconstructs the original file.
File Upload Process
When a file is uploaded using GridFS:
- The driver reads the file as a stream.
- The stream is divided into chunks.
- Each chunk is stored as a separate document in the fs.chunks collection.
- Once all chunks are written, the file's metadata is stored as a single document in the fs.files collection.
- The files_id field of each chunk references the _id of that metadata document.
For instance, at the default chunk size of 255 KB, a 50 MB file is divided into 201 chunks. A larger chunk size produces fewer chunks.
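The splitting step can be illustrated in memory, without a database. The sketch below is not how a driver is implemented: `splitIntoChunks` and the string identifier are hypothetical stand-ins, and a real driver streams the data and uses ObjectId values. It only shows the shape of the documents that end up in the two collections.

```javascript
// In-memory illustration only: a real driver writes these documents to
// fs.chunks and fs.files instead of returning them.
function splitIntoChunks(fileId, filename, data, chunkSizeBytes = 255 * 1024) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSizeBytes, n++) {
    chunks.push({
      files_id: fileId, // link back to the parent file document
      n, // zero-based sequence number
      data: data.subarray(offset, offset + chunkSizeBytes),
    });
  }
  const fileDoc = {
    _id: fileId,
    filename,
    length: data.length,
    chunkSize: chunkSizeBytes,
    uploadDate: new Date(),
  };
  return { fileDoc, chunks };
}

// A 50 MB buffer of zeros stands in for the uploaded file.
const { fileDoc, chunks } = splitIntoChunks(
  "demo-id",
  "backup.zip",
  Buffer.alloc(50 * 1024 * 1024)
);
console.log(chunks.length, chunks[chunks.length - 1].data.length); // 201 204800
```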
File Download Process
When a user requests a file:
- MongoDB locates the file metadata in fs.files.
- The associated chunks are identified through the files_id field.
- Chunks are retrieved in order.
- The chunks are combined to recreate the original file.
- The completed file is delivered to the application.
This process is transparent to the user.
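The download steps above can be sketched against plain arrays. This is a simplified, in-memory illustration under stated assumptions: `reassemble` is a hypothetical helper, identifiers are plain strings rather than ObjectId values, and a real driver streams chunks rather than loading them all at once.

```javascript
// In-memory illustration only; a driver streams chunks from fs.chunks
// instead of holding them all in an array.
function reassemble(fileDoc, chunkDocs) {
  const ordered = chunkDocs
    .filter((c) => c.files_id === fileDoc._id)
    .sort((a, b) => a.n - b.n);

  // Every sequence number from 0 upward must be present exactly once.
  ordered.forEach((chunk, index) => {
    if (chunk.n !== index) {
      throw new Error(`missing or duplicate chunk at position ${index}`);
    }
  });

  const data = Buffer.concat(ordered.map((c) => c.data));
  if (data.length !== fileDoc.length) {
    throw new Error(`expected ${fileDoc.length} bytes, got ${data.length}`);
  }
  return data;
}

const helloFile = { _id: "f1", filename: "hello.txt", length: 11 };
const stored = [
  { files_id: "f1", n: 1, data: Buffer.from("world") },
  { files_id: "f1", n: 0, data: Buffer.from("hello ") },
];
console.log(reassemble(helloFile, stored).toString()); // hello world
```

The two checks mirror what makes reconstruction safe: the sequence numbers must be contiguous, and the combined size must match the length recorded in the file document.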
Default Chunk Size
GridFS uses a default chunk size of 255 KB (261,120 bytes).
The chunk size can be customized during file upload.
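Example in Node.js:
    // fileStream is a readable stream of the file's contents
    const uploadStream = bucket.openUploadStream("largefile.zip", {
        chunkSizeBytes: 1024 * 1024 // 1 MB per chunk
    });
    fileStream.pipe(uploadStream);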
In this example, each chunk is 1 MB.
Choosing an appropriate chunk size can improve performance depending on the application's requirements.
Advantages of GridFS
Handles Very Large Files
GridFS allows storage of files significantly larger than MongoDB's 16 MB document limit.
Supports Streaming
Files can be streamed directly from MongoDB without loading the entire file into memory.
This is especially useful for:
- Video streaming
- Audio streaming
- Document viewing
- Cloud storage systems
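Streaming works because a byte range maps to a small, predictable set of chunks. The sketch below shows that mapping; `chunksForRange` is a hypothetical helper written for illustration, not a driver function. In the Node.js driver, a partial read is requested by passing `start` and `end` byte offsets to `openDownloadStream`, and the driver then fetches only the chunks it needs.

```javascript
// Hypothetical helper: which chunk sequence numbers cover the byte range
// [start, end) of a file stored with the given chunk size.
function chunksForRange(start, end, chunkSizeBytes = 255 * 1024) {
  if (start < 0 || end <= start) {
    throw new RangeError("expected 0 <= start < end");
  }
  const first = Math.floor(start / chunkSizeBytes);
  const last = Math.floor((end - 1) / chunkSizeBytes);
  return { first, last, count: last - first + 1 };
}

// Bytes 1,000,000 to 1,999,999 of a file with default-sized chunks.
console.log(chunksForRange(1_000_000, 2_000_000));
// { first: 3, last: 7, count: 5 }
```

Serving that one-megabyte slice touches five chunk documents regardless of how large the whole file is.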
Replication Support
Files stored in GridFS automatically benefit from MongoDB's replica sets.
This provides:
- High availability
- Data redundancy
- Automatic failover
Sharding Support
Large file collections can be distributed across multiple servers using MongoDB sharding.
Benefits include:
- Better scalability
- Load distribution
- Increased storage capacity
Metadata Storage
Applications can attach additional metadata to files.
Examples include:
- User information
- Categories
- Tags
- Upload history
- Security classifications
This metadata can be searched and indexed efficiently.
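A minimal sketch of indexing and querying custom metadata follows. It assumes the official `mongodb` Node.js driver, a connected `db`, a `bucket` using the default `fs` bucket name, and a custom field named `metadata.department` (taken from the example document earlier; the field is whatever the application chooses). The two wrapper functions are hypothetical names introduced here.

```javascript
// Assumes `db` is a connected Db and `bucket` a GridFSBucket from the
// official "mongodb" Node.js driver, using the default "fs" bucket name.
// "metadata.department" is an application-defined custom field.
function indexDepartment(db) {
  // Indexes go on the files collection, where the metadata is stored.
  return db.collection("fs.files").createIndex({ "metadata.department": 1 });
}

function findByDepartment(bucket, department) {
  // GridFSBucket.find() queries the files collection and returns a cursor.
  return bucket.find({ "metadata.department": department }).toArray();
}
```

Both functions return promises, so a caller would typically write `await indexDepartment(db)` once at startup and `await findByDepartment(bucket, "Training")` per query. Each returned document is a file metadata document whose `_id` can be passed to `openDownloadStream`.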
GridFS Architecture
A typical GridFS architecture consists of:
          Application
               |
               V
         MongoDB Driver
               |
               V
         GridFS Bucket
               |
       +-------+--------+
       |                |
       V                V
    fs.files        fs.chunks
The application communicates with MongoDB through a GridFS bucket. The bucket manages storage and retrieval operations across the two collections.
Creating a GridFS Bucket
In MongoDB, a bucket acts as a logical container for files.
Example in Node.js:
    const { GridFSBucket } = require("mongodb");
    // db is a connected Db instance
    const bucket = new GridFSBucket(db, {
        bucketName: "documents"
    });
This bucket stores its data in the collections documents.files and documents.chunks instead of the default fs.files and fs.chunks.
This allows separation of different file categories.
Uploading Files with GridFS
Example using Node.js:
    const fs = require("fs");

    // Stream the local file into GridFS under the name "report.pdf"
    const uploadStream = bucket.openUploadStream("report.pdf");
    fs.createReadStream("report.pdf")
        .pipe(uploadStream);
The driver automatically splits the stream into chunks and stores them.
Downloading Files with GridFS
Example:
    // fileId is the _id (an ObjectId) of the file's document in fs.files
    bucket.openDownloadStream(fileId)
        .pipe(fs.createWriteStream("output.pdf"));
The file is reconstructed automatically during download.
Deleting Files
Deleting a GridFS file removes both its metadata and associated chunks.
Example:
    // delete() returns a Promise, so call it from an async function
    await bucket.delete(fileId);
This prevents orphaned chunks from remaining in the database.
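Orphaned chunks can still appear in other ways, for example when an upload is interrupted before the file document is written, or when a document is removed from the files collection directly instead of through the bucket. The following in-memory sketch shows the idea behind detecting them; `findOrphanedFileIds` is a hypothetical helper and the identifiers are plain strings standing in for ObjectId values.

```javascript
// In-memory illustration: given the documents from fs.files and
// fs.chunks, list the files_id values that no longer have a parent.
function findOrphanedFileIds(fileDocs, chunkDocs) {
  const known = new Set(fileDocs.map((f) => String(f._id)));
  const orphaned = new Set();
  for (const chunk of chunkDocs) {
    const id = String(chunk.files_id);
    if (!known.has(id)) orphaned.add(id);
  }
  return [...orphaned];
}

const files = [{ _id: "a" }];
const chunkDocs = [
  { files_id: "a", n: 0 },
  { files_id: "b", n: 0 }, // parent document was removed
  { files_id: "b", n: 1 },
];
console.log(findOrphanedFileIds(files, chunkDocs)); // [ 'b' ]
```

Against a real database the same check would be expressed as a query or aggregation rather than by loading both collections into memory.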
Common Use Cases
Content Management Systems
Organizations store PDFs, documents, images, and presentations within MongoDB.
Video Streaming Platforms
Large video files can be streamed directly through GridFS.
Medical Applications
Hospitals often store MRI scans, X-rays, and medical reports along with metadata.
Cloud Storage Services
GridFS can serve as a backend storage mechanism for custom file-sharing applications.
IoT Systems
Devices may generate large logs and binary data that need long-term storage.
Limitations of GridFS
Performance Overhead
Breaking files into chunks introduces additional database operations: every upload or download touches the file's metadata document plus one document per chunk.
Not Always Ideal for Static Content
For websites serving millions of images or videos, dedicated object storage systems may perform better.
Examples include:
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
Increased Database Size
Storing many large files can significantly increase MongoDB storage requirements.
GridFS vs Traditional File System
| Feature | GridFS | File System |
|---|---|---|
| Large File Support | Yes | Yes |
| Metadata Storage | Built-in | Separate mechanism required |
| Replication | Automatic | Additional setup needed |
| Sharding | Supported | Complex implementation |
| Querying Metadata | Easy | Limited |
| Scalability | High | Depends on infrastructure |
Best Practices
- Use GridFS only when files need to be managed within MongoDB.
- Store meaningful metadata for easier searching and categorization.
- Create indexes on frequently queried metadata fields.
- Use replica sets for reliability.
- Implement sharding when handling massive file collections.
- Monitor storage growth and chunk distribution.
- Consider object storage solutions when serving extremely high volumes of media content.
Conclusion
GridFS is MongoDB's specialized solution for storing and managing files that exceed the 16 MB document limit. By dividing files into smaller chunks and storing metadata separately, GridFS enables efficient handling of large multimedia files, documents, backups, and binary data. It integrates seamlessly with MongoDB features such as replication, sharding, indexing, and metadata querying, making it a powerful choice for applications that require both file storage and database capabilities within a unified system.