MongoDB GridFS for Large File Storage

MongoDB is designed to store data in BSON documents, and each document has a maximum size limit of 16 MB. While this size is sufficient for most application data, it becomes a limitation when dealing with large files such as videos, audio recordings, high-resolution images, PDFs, backups, medical scans, and other multimedia content. To overcome this restriction, MongoDB provides a specification called GridFS, which enables efficient storage and retrieval of files larger than 16 MB.

What is GridFS?

GridFS is a specification for storing large files in MongoDB, implemented by the official drivers rather than by the server itself. Instead of storing an entire file in a single document, GridFS divides the file into smaller chunks and stores each chunk as a separate document, alongside one document that describes the file.

This approach allows MongoDB to handle files of virtually any size while maintaining the benefits of a database system, such as replication, sharding, indexing, and backup management.

For example, a 100 MB video file stored with the default 255 KB chunk size is divided into 402 chunks, each stored as a separate document. When the file is requested, the driver reads these chunks back in order and reconstructs the file.

How GridFS Works

GridFS uses two collections to store file information:

fs.files Collection

This collection stores metadata about each file, including:

  • File name

  • Upload date

  • File size

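  • Content type (deprecated in the current GridFS specification, which recommends keeping it in the custom metadata instead)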

  • Chunk size

  • Additional custom metadata

Example document:

{
  "_id": ObjectId("65ab123"),
  "filename": "training_video.mp4",
  "length": 104857600,
  "chunkSize": 261120,
  "uploadDate": ISODate("2026-06-12"),
  "metadata": {
    "department": "Training",
    "author": "Admin"
  }
}

fs.chunks Collection

This collection stores the actual file data in small chunks.

Example document:

{
  "_id": ObjectId("65ab456"),
  "files_id": ObjectId("65ab123"),
  "n": 0,
  "data": BinData(...)
}

Here:

  • files_id links the chunk to its parent file.

  • n indicates the sequence number of the chunk.

  • data contains the binary content.

When a driver retrieves the file, it reads all chunks that share the file's files_id in ascending order of n and concatenates their data to reconstruct the original file.
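The reassembly step can be illustrated without a database. The sketch below is a plain in-memory simulation, not driver code: `reassemble` is a hypothetical helper, and the sample objects only mimic the shape of fs.chunks documents, with `data` held as a Node.js Buffer.

```javascript
// In-memory sketch: rebuild a file from objects shaped like fs.chunks documents.
// `reassemble` is a hypothetical helper, not part of the MongoDB driver.
function reassemble(chunks) {
  // Chunks may arrive in any order, so sort a copy by sequence number n.
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

// Three chunks of one file, deliberately out of order.
const sampleChunks = [
  { files_id: "file-1", n: 2, data: Buffer.from("FS") },
  { files_id: "file-1", n: 0, data: Buffer.from("Gr") },
  { files_id: "file-1", n: 1, data: Buffer.from("id") },
];

console.log(reassemble(sampleChunks).toString()); // prints "GridFS"
```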

File Upload Process

When a file is uploaded using GridFS:

  1. The driver reads the file, typically as a stream.

  2. The data is divided into chunks of the configured chunk size.

  3. Each chunk is stored as a separate document in the fs.chunks collection.

  4. Once all chunks are written, a metadata document is stored in the fs.files collection.

  5. Each chunk refers to its metadata document through the files_id field.

For instance, with the default 255 KB chunk size, a 50 MB file is divided into 201 chunks: 200 full chunks and a final chunk of 200 KB.
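The chunking step can be sketched in memory as follows. This is an illustration only: `splitIntoChunks` is a hypothetical helper, the "file-1" id is a placeholder, and a real driver performs the equivalent work internally while streaming rather than holding the whole file in a buffer.

```javascript
// In-memory sketch of the chunking step. `splitIntoChunks` is a hypothetical
// helper; a real driver does this internally while streaming.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes, the GridFS default

function splitIntoChunks(fileBuffer, filesId, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < fileBuffer.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: filesId, // link back to the parent file
      n,                 // sequence number, starting at 0
      // subarray clamps the end index, so the last chunk may be shorter
      data: fileBuffer.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// A 50 MB buffer splits into 201 chunks at the default size.
const fiftyMb = Buffer.alloc(50 * 1024 * 1024);
const parts = splitIntoChunks(fiftyMb, "file-1");
console.log(parts.length);           // 201
console.log(parts[200].data.length); // 204800 (the final, partial chunk)
```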

File Download Process

When a user requests a file:

  1. The driver locates the file metadata in fs.files.

  2. The associated chunks are identified through the files_id field.

  3. Chunks are retrieved in order.

  4. The chunks are combined to recreate the original file.

  5. The completed file is delivered to the application.

This process is transparent to the user.
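Before combining chunks, it is reasonable to check that none are missing. The sketch below shows one way such a check could look; `verifyChunks` is a hypothetical helper operating on plain objects, and actual drivers implement their own validation, which may differ.

```javascript
// Sketch of a completeness check before reassembly. `verifyChunks` is a
// hypothetical helper; drivers implement their own validation.
function verifyChunks(fileDoc, chunks) {
  const expectedCount = Math.ceil(fileDoc.length / fileDoc.chunkSize);
  const ordered = [...chunks].sort((a, b) => a.n - b.n);

  if (ordered.length !== expectedCount) {
    return { ok: false, reason: `expected ${expectedCount} chunks, found ${ordered.length}` };
  }
  // Sequence numbers must run 0, 1, 2, ... with no gaps or duplicates.
  for (let i = 0; i < ordered.length; i++) {
    if (ordered[i].n !== i) {
      return { ok: false, reason: `chunk ${i} is missing` };
    }
  }
  // The chunk payloads together must match the length recorded in the metadata.
  const totalBytes = ordered.reduce((sum, chunk) => sum + chunk.data.length, 0);
  if (totalBytes !== fileDoc.length) {
    return { ok: false, reason: `expected ${fileDoc.length} bytes, found ${totalBytes}` };
  }
  return { ok: true };
}

const tinyFile = { length: 10, chunkSize: 4 };
const tinyChunks = [
  { n: 0, data: Buffer.from("abcd") },
  { n: 1, data: Buffer.from("efgh") },
  { n: 2, data: Buffer.from("ij") },
];
console.log(verifyChunks(tinyFile, tinyChunks).ok);                  // true
console.log(verifyChunks(tinyFile, tinyChunks.slice(0, 2)).reason);  // expected 3 chunks, found 2
```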

Default Chunk Size

GridFS uses a default chunk size of 255 KB (261,120 bytes).

The chunk size can be customized during file upload.

Example in Node.js:

// Pass chunkSizeBytes in the options when opening the upload stream.
fileStream.pipe(
  bucket.openUploadStream("largefile.zip", {
    chunkSizeBytes: 1024 * 1024
  })
);

In this example, each chunk is 1 MB.

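Larger chunks mean fewer documents and fewer round trips per file, while smaller chunks reduce the memory needed per read and suit partial reads. The best value depends on typical file sizes and access patterns.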
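To see how chunk size affects the number of documents written, the arithmetic can be checked directly. `chunkCount` is a hypothetical helper used only for this illustration.

```javascript
// Number of chunk documents needed for a file of `length` bytes.
// `chunkCount` is a hypothetical helper for illustration.
function chunkCount(length, chunkSizeBytes) {
  return Math.ceil(length / chunkSizeBytes);
}

const hundredMb = 100 * 1024 * 1024; // 104857600 bytes

console.log(chunkCount(hundredMb, 255 * 1024));  // 402 (default chunk size)
console.log(chunkCount(hundredMb, 1024 * 1024)); // 100 (1 MB chunks)
```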

Advantages of GridFS

Handles Very Large Files

GridFS allows storage of files significantly larger than MongoDB's 16 MB document limit.

Supports Streaming

Files can be streamed directly from MongoDB without loading the entire file into memory.

This is especially useful for:

  • Video streaming

  • Audio streaming

  • Document viewing

  • Cloud storage systems
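Streaming part of a file, such as seeking within a video, only requires the chunks that cover the requested bytes. The sketch below shows that arithmetic; `chunksForRange` is a hypothetical helper, and the range is assumed to be half-open (start inclusive, end exclusive).

```javascript
// Which chunks cover the byte range [start, end)? `chunksForRange` is a
// hypothetical helper showing the arithmetic only.
function chunksForRange(start, end, chunkSizeBytes = 255 * 1024) {
  if (start < 0 || end <= start) {
    throw new RangeError("expected 0 <= start < end");
  }
  return {
    firstChunk: Math.floor(start / chunkSizeBytes),
    lastChunk: Math.floor((end - 1) / chunkSizeBytes),
  };
}

// Bytes 1,000,000 up to (not including) 2,000,000 with the default chunk size.
const range = chunksForRange(1_000_000, 2_000_000);
console.log(range.firstChunk, range.lastChunk); // 3 7
```

The Node.js driver's openDownloadStream accepts start and end byte offsets in its options for this kind of partial read; the driver documentation describes the exact semantics.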

Replication Support

Files stored in GridFS automatically benefit from MongoDB's replica sets.

This provides:

  • High availability

  • Data redundancy

  • Automatic failover

Sharding Support

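Large file collections can be distributed across multiple servers using MongoDB sharding. In practice it is the chunks collection that is sharded, using { files_id: 1, n: 1 } or { files_id: 1 } as the shard key, because it holds almost all of the data.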

Benefits include:

  • Better scalability

  • Load distribution

  • Increased storage capacity
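As a sketch of what sharding a bucket might involve, the helper below builds a shardCollection admin command for the chunks collection, keyed on files_id and n. The function name `shardChunksCommand` and the database name "media" are assumptions for illustration; running the command requires a sharded cluster and a connection to a mongos router.

```javascript
// Builds the admin command document for sharding a bucket's chunks collection.
// `shardChunksCommand` and the database name are illustrative assumptions.
function shardChunksCommand(dbName, bucketName = "fs") {
  return {
    shardCollection: `${dbName}.${bucketName}.chunks`,
    key: { files_id: 1, n: 1 }, // keeps a file's chunks together and ordered
  };
}

const shardCommand = shardChunksCommand("media");
console.log(shardCommand.shardCollection); // media.fs.chunks

// Hypothetical usage with a connected MongoClient named `client`:
//   await client.db("admin").command(shardCommand);
```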

Metadata Storage

Applications can attach additional metadata to files.

Examples include:

  • User information

  • Categories

  • Tags

  • Upload history

  • Security classifications

This metadata can be searched and indexed efficiently.
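Searching by metadata might look like the sketch below. It assumes `bucket` is a GridFSBucket from the official Node.js driver and that files carry a `metadata.department` field as in the earlier example document; `findByDepartment` is a hypothetical helper.

```javascript
// Sketch: find files by a custom metadata field. Assumes `bucket` is a
// GridFSBucket from the official Node.js driver; `findByDepartment` is a
// hypothetical helper.
async function findByDepartment(bucket, department) {
  // bucket.find() queries the files collection and returns a cursor.
  const files = await bucket
    .find({ "metadata.department": department })
    .sort({ uploadDate: -1 }) // newest first
    .toArray();
  // Return only the fields a file listing typically needs.
  return files.map((file) => ({
    id: file._id,
    filename: file.filename,
    length: file.length,
  }));
}

// Hypothetical usage inside an async function:
//   const trainingFiles = await findByDepartment(bucket, "Training");
```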

GridFS Architecture

A typical GridFS architecture consists of:

Application
     |
     V
MongoDB Driver
     |
     V
GridFS Bucket
     |
     +----------------+
     |                |
     V                V
 fs.files       fs.chunks

The application communicates with MongoDB through a GridFS bucket. The bucket manages storage and retrieval operations across the two collections.

Creating a GridFS Bucket

In MongoDB, a bucket acts as a logical container for files.

Example in Node.js:

const { GridFSBucket } = require("mongodb");

// db is assumed to be a connected Db instance
const bucket = new GridFSBucket(db, {
    bucketName: "documents"
});

Files uploaded through this bucket are stored in the collections:

documents.files
documents.chunks

instead of the default:

fs.files
fs.chunks

This allows separation of different file categories.

Uploading Files with GridFS

Example using Node.js:

const fs = require("fs");

// openUploadStream returns a writable stream for the new file
const uploadStream = bucket.openUploadStream(
    "report.pdf"
);

fs.createReadStream("report.pdf")
  .pipe(uploadStream);

The driver automatically splits the file into chunks and stores them.
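An upload usually needs to report completion and failure. The sketch below wraps the upload in a promise using Node's stream pipeline, which rejects if either stream fails, whereas a bare .pipe() does not forward errors. It assumes `bucket` is a GridFSBucket from the official Node.js driver; `uploadFile` is a hypothetical helper.

```javascript
const fs = require("fs");
const { pipeline } = require("stream/promises");

// Sketch: upload a local file and resolve with the new file's _id.
// Assumes `bucket` is a GridFSBucket; `uploadFile` is a hypothetical helper.
async function uploadFile(bucket, sourcePath, filename, metadata = {}) {
  const uploadStream = bucket.openUploadStream(filename, { metadata });
  // pipeline() resolves once all data is written and rejects on any stream error.
  await pipeline(fs.createReadStream(sourcePath), uploadStream);
  return uploadStream.id; // the _id assigned to the document in the files collection
}

// Hypothetical usage inside an async function:
//   const id = await uploadFile(bucket, "report.pdf", "report.pdf", { author: "Admin" });
```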

Downloading Files with GridFS

Example:

const fs = require("fs");
// fileId is the _id of the file's document in the files collection
bucket.openDownloadStream(fileId)
      .pipe(fs.createWriteStream("output.pdf"));

The file is reconstructed automatically during download.
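A download can be wrapped the same way so the caller knows when the file is fully written or why it failed. This is a sketch assuming `bucket` is a GridFSBucket from the official Node.js driver; `downloadToFile` is a hypothetical helper.

```javascript
const fs = require("fs");
const { pipeline } = require("stream/promises");

// Sketch: stream a GridFS file to disk. Assumes `bucket` is a GridFSBucket;
// `downloadToFile` is a hypothetical helper.
async function downloadToFile(bucket, fileId, destinationPath) {
  // Chunks flow through the stream one at a time, so the whole file is
  // never held in memory.
  await pipeline(
    bucket.openDownloadStream(fileId),
    fs.createWriteStream(destinationPath)
  );
  return destinationPath;
}

// Hypothetical usage inside an async function:
//   await downloadToFile(bucket, fileId, "output.pdf");
```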

Deleting Files

Deleting a GridFS file removes both its metadata and associated chunks.

Example:

// delete returns a promise; await it inside an async function
await bucket.delete(fileId);

This prevents orphaned chunks from remaining in the database.
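Depending on the driver, deleting an id that does not exist may raise an error, so some applications check for the file first. The sketch below shows one such approach, assuming `bucket` is a GridFSBucket from the official Node.js driver; `deleteIfExists` is a hypothetical helper, and the check and the delete are not atomic.

```javascript
// Sketch: delete a file only if its metadata document exists.
// Assumes `bucket` is a GridFSBucket; `deleteIfExists` is a hypothetical helper.
async function deleteIfExists(bucket, fileId) {
  const matches = await bucket.find({ _id: fileId }).limit(1).toArray();
  if (matches.length === 0) {
    return false; // nothing to delete
  }
  // Removes the files document and all of its chunks.
  await bucket.delete(fileId);
  return true;
}

// Hypothetical usage inside an async function:
//   const removed = await deleteIfExists(bucket, fileId);
```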

Common Use Cases

Content Management Systems

Organizations store:

  • PDFs

  • Documents

  • Images

  • Presentations

within MongoDB.

Video Streaming Platforms

Large video files can be streamed directly through GridFS.

Medical Applications

Hospitals often store:

  • MRI scans

  • X-rays

  • Medical reports

along with metadata.

Cloud Storage Services

GridFS can serve as a backend storage mechanism for custom file-sharing applications.

IoT Systems

Devices may generate large logs and binary data that need long-term storage.

Limitations of GridFS

Performance Overhead

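Reading or writing a file touches many chunk documents plus a metadata document, so GridFS involves more database operations than reading a single file from disk or object storage.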

Not Always Ideal for Static Content

For websites serving millions of images or videos, dedicated object storage systems may perform better.

Examples include:

  • Amazon S3

  • Google Cloud Storage

  • Azure Blob Storage

Increased Database Size

Storing many large files can significantly increase MongoDB storage requirements.

GridFS vs Traditional File System

Feature              GridFS       File System
Large File Support   Yes          Yes
Metadata Storage     Built-in     Separate mechanism required
Replication          Automatic    Additional setup needed
Sharding             Supported    Complex implementation
Querying Metadata    Easy         Limited
Scalability          High         Depends on infrastructure

Best Practices

  1. Use GridFS only when files need to be managed within MongoDB.

  2. Store meaningful metadata for easier searching and categorization.

  3. Create indexes on frequently queried metadata fields.

  4. Use replica sets for reliability.

  5. Implement sharding when handling massive file collections.

  6. Monitor storage growth and chunk distribution.

  7. Consider object storage solutions when serving extremely high volumes of media content.
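As an illustration of the indexing practice above, the sketch below creates an ascending index on one custom metadata field of a bucket's files collection. It assumes `db` is a connected Db instance from the official Node.js driver; `ensureMetadataIndex` and the field name in the usage line are hypothetical.

```javascript
// Sketch: index a custom metadata field on a bucket's files collection.
// Assumes `db` is a Db instance; `ensureMetadataIndex` is a hypothetical helper.
async function ensureMetadataIndex(db, field, bucketName = "fs") {
  const filesCollection = db.collection(`${bucketName}.files`);
  // Computed key, for example { "metadata.department": 1 }
  return filesCollection.createIndex({ [`metadata.${field}`]: 1 });
}

// Hypothetical usage inside an async function:
//   await ensureMetadataIndex(db, "department");
```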

Conclusion

GridFS is MongoDB's specialized solution for storing and managing files that exceed the 16 MB document limit. By dividing files into smaller chunks and storing metadata separately, GridFS enables efficient handling of large multimedia files, documents, backups, and binary data. It integrates seamlessly with MongoDB features such as replication, sharding, indexing, and metadata querying, making it a powerful choice for applications that require both file storage and database capabilities within a unified system.