MongoDB data modeling is not about copying relational tables into collections. It is about designing documents around how the application reads, writes, scales, and evolves. MongoDB’s own guidance is very clear: model data based on application access patterns, and keep data accessed together in the same document where it makes sense.

This post explains the most important MongoDB data modeling patterns: where each one fits, when to avoid it, and how to implement it, with practical examples.

The Golden Rule of MongoDB Data Modeling

In relational modeling, we usually start with entities and normalization.

In MongoDB, start with:

What does the application need to read together?
What does it update together?
What must scale independently?
What query must be fast?
What data grows without limit?

A good MongoDB model is not always the most normalized model. It is the model that minimizes unnecessary joins, reduces query fan-out, supports indexing, and keeps documents within healthy size and growth boundaries.

MongoDB documents can have flexible structures, and documents in the same collection do not need to have identical fields. This flexibility supports polymorphic models, evolving schemas, and domain-oriented design.

Pattern Selection Matrix

Pattern	Best For	Avoid When
Embedded Document Pattern	One-to-one, one-to-few, tightly coupled data	Child array grows without limit
Reference Pattern	Many-to-many, large independent entities	You always need child data with parent
Extended Reference Pattern	Reducing repeated lookups	Referenced fields change very frequently
Subset Pattern	Product pages, profile previews, dashboards	Full child data is always needed
Bucket Pattern	Time-series, telemetry, logs, IoT events	Random updates inside buckets dominate
Attribute Pattern	Dynamic attributes, product specs, metadata	Attributes are fixed and simple
Computed Pattern	Expensive repeated calculations	Source data changes every second
Outlier Pattern	Handling exceptional large documents	Most records are large, not exceptional
Archive Pattern	Lifecycle management, cold data	Historical data must be updated frequently
Polymorphic / Single Collection Pattern	Similar entities with slight variation	Query patterns are completely different
Approximation Pattern	Counters, likes, views, analytics	Exact real-time precision is mandatory
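Tree Pattern	Hierarchies, org charts, categories	Deep graph traversal is required
Schema Versioning Pattern	Evolving schemas, zero-downtime migrations	All records can be migrated safely before release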

1. Embedded Document Pattern

What it solves

Use embedding when related data is naturally owned by the parent and commonly read together.

Classic examples:

Customer with addresses
Product with top reviews
Order with line items
Vehicle trip with route summary
Employee with department snapshot

MongoDB recommends embedding when data is accessed together because it can reduce the need for application-side joins and allow one-query reads.

Example: E-commerce Order

db.orders.insertOne({
  orderId: "ORD-1001",
  customerId: "CUST-501",
  orderDate: ISODate("2026-05-20T10:30:00Z"),
  status: "CONFIRMED",
  shippingAddress: {
    name: "Sachin Gupta",
    city: "Delhi",
    pincode: "110001",
    country: "India"
  },
  items: [
    {
      sku: "SHOE-001",
      name: "Running Shoes",
      qty: 1,
      price: 5999
    },
    {
      sku: "TSHIRT-022",
      name: "Cotton T-Shirt",
      qty: 2,
      price: 999
    }
  ],
  payment: {
    mode: "UPI",
    status: "SUCCESS"
  }
})

Why this works

The order and its line items are normally read together. Embedding avoids querying orders, then order_items, then products just to show an order confirmation page.
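To make the one-read point concrete, here is a minimal plain-JavaScript sketch (it runs in Node.js or mongosh, no database needed) of what the confirmation page can compute from the single embedded order document. The helper name `orderTotal` is illustrative, not a MongoDB API.

```javascript
// Sums the embedded line items of one order document.
// Everything needed is inside the document, so no second query is required.
function orderTotal(order) {
  return order.items.reduce((sum, item) => sum + item.qty * item.price, 0);
}

// Same shape as the ORD-1001 example (dates, address, and payment omitted).
const order = {
  orderId: "ORD-1001",
  items: [
    { sku: "SHOE-001", name: "Running Shoes", qty: 1, price: 5999 },
    { sku: "TSHIRT-022", name: "Cotton T-Shirt", qty: 2, price: 999 }
  ]
};

console.log(orderTotal(order)); // 7997
```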

Index

db.orders.createIndex({ customerId: 1, orderDate: -1 })
db.orders.createIndex({ orderId: 1 }, { unique: true })

When not to use

Avoid embedding when:

The child array grows endlessly.
Child records are updated independently at high frequency.
Many parents share the same child record.
The embedded array can become very large.

2. Reference Pattern

What it solves

Use references when related entities have independent lifecycles or many-to-many relationships.

Examples:

Users and roles
Products and suppliers
Students and courses
Vehicles and drivers
Customers and loyalty programs

Example: Product and Supplier

db.products.insertOne({
  sku: "CAMERA-001",
  title: "Mirrorless Camera",
  category: "Electronics",
  supplierId: ObjectId("6650aa111111111111111111"),
  price: 84999
})

db.suppliers.insertOne({
  _id: ObjectId("6650aa111111111111111111"),
  name: "Global Camera Distributors",
  country: "India",
  rating: 4.7
})

Query with $lookup

db.products.aggregate([
  {
    $match: {
      sku: "CAMERA-001"
    }
  },
  {
    $lookup: {
      from: "suppliers",
      localField: "supplierId",
      foreignField: "_id",
      as: "supplier"
    }
  },
  {
    $unwind: "$supplier"
  }
])
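One detail worth knowing: by default, `$unwind` outputs nothing for a document whose `supplier` array is empty, so a product with a dangling `supplierId` silently disappears from the result. The plain-JavaScript sketch below imitates `$lookup` followed by `$unwind` on in-memory arrays to show that behavior. `lookupAndUnwind` is a hypothetical helper, string IDs stand in for `ObjectId`s, and `LENS-404` is an invented product used only to illustrate a missing supplier.

```javascript
// In-memory imitation of $lookup (left outer join into an array)
// followed by $unwind on the joined array.
function lookupAndUnwind(products, suppliers, preserveNullAndEmptyArrays = false) {
  const result = [];
  for (const product of products) {
    // $lookup: every supplier whose _id equals product.supplierId
    const matches = suppliers.filter((s) => s._id === product.supplierId);
    if (matches.length === 0) {
      // $unwind on an empty array drops the document, unless
      // preserveNullAndEmptyArrays is true; then it is kept without the field.
      if (preserveNullAndEmptyArrays) result.push({ ...product });
      continue;
    }
    // $unwind: one output document per matched supplier
    for (const supplier of matches) {
      result.push({ ...product, supplier });
    }
  }
  return result;
}

const suppliers = [{ _id: "6650aa111111111111111111", name: "Global Camera Distributors" }];
const products = [
  { sku: "CAMERA-001", supplierId: "6650aa111111111111111111" },
  { sku: "LENS-404", supplierId: "deleted-supplier" } // hypothetical dangling reference
];

console.log(lookupAndUnwind(products, suppliers).length);       // 1
console.log(lookupAndUnwind(products, suppliers, true).length); // 2
```

In a real pipeline, the equivalent of the second call is `{ $unwind: { path: "$supplier", preserveNullAndEmptyArrays: true } }`.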

When this works

Use referencing when the supplier is reused across thousands of products and supplier details are updated independently.

When not to use

Avoid pure referencing when the application always needs the supplier name, rating, and country with every product listing. In that case, use the Extended Reference Pattern.

3. Extended Reference Pattern

What it solves

The Extended Reference Pattern duplicates selected fields from a referenced document into the main document to reduce frequent lookups.

MongoDB describes this pattern as useful when an application performs repeated joins to look up the same data: bringing the frequently accessed fields into the main document makes reads faster, at the cost of duplicating data.

Example: Product with Supplier Snapshot

db.products.insertOne({
  sku: "CAMERA-001",
  title: "Mirrorless Camera",
  category: "Electronics",
  price: 84999,
  supplierRef: {
    supplierId: ObjectId("6650aa111111111111111111"),
    name: "Global Camera Distributors",
    country: "India",
    rating: 4.7
  }
})

Now the product listing page does not need a $lookup:

db.products.find(
  {
    category: "Electronics"
  },
  {
    title: 1,
    price: 1,
    "supplierRef.name": 1,
    "supplierRef.rating": 1
  }
)

Good use cases

Product catalog
Order history
Customer profile snapshot
Vendor snapshot in purchase orders
Driver snapshot in trip records

Critical warning

Do not duplicate fields that change frequently.

Good duplicated fields:

supplier name
customer name
product title at order time
city
rating category

Risky duplicated fields:

inventory
wallet balance
real-time price
credit limit
current fraud score
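When one of the safely duplicated fields does change (a supplier is renamed, say), every copy has to be refreshed. A minimal sketch, assuming the `supplierRef` layout above: the hypothetical helper `buildSnapshotSync` builds the filter and update you would pass to `db.products.updateMany(filter, update)`. Only fields listed in `SNAPSHOT_FIELDS` are copied, so changes to supplier fields that are not duplicated never touch products. The new name "GCD India" is invented for the example.

```javascript
// Supplier fields duplicated into products.supplierRef (matches the example document).
const SNAPSHOT_FIELDS = ["name", "country", "rating"];

// Builds an updateMany filter/update that refreshes every product's copy of
// one supplier. Returns null when none of the duplicated fields changed.
function buildSnapshotSync(supplierId, changes) {
  const set = {};
  for (const field of SNAPSHOT_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(changes, field)) {
      set[`supplierRef.${field}`] = changes[field];
    }
  }
  if (Object.keys(set).length === 0) return null;
  return {
    filter: { "supplierRef.supplierId": supplierId },
    update: { $set: set }
  };
}

// "phone" is not duplicated, so it is ignored.
const sync = buildSnapshotSync("6650aa111111111111111111", { name: "GCD India", phone: "n/a" });
// In mongosh (with an ObjectId instead of the string):
//   db.products.updateMany(sync.filter, sync.update)
console.log(JSON.stringify(sync));
```

An index on `supplierRef.supplierId` keeps this update from scanning the whole products collection.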

4. Subset Pattern

What it solves

The Subset Pattern stores the most frequently accessed subset of child data inside the parent document, while the full child data remains in another collection.

MongoDB documentation gives a similar e-commerce example: product pages may show only the five most recent reviews, while older reviews can stay in a separate collection.

Example: Product with Recent Reviews

db.products.insertOne({
  sku: "SHOE-001",
  title: "Running Shoes",
  price: 5999,
  recentReviews: [
    {
      reviewId: ObjectId(),
      userName: "Amit",
      rating: 5,
      comment: "Very comfortable",
      reviewDate: ISODate("2026-05-18T10:00:00Z")
    },
    {
      reviewId: ObjectId(),
      userName: "Neha",
      rating: 4,
      comment: "Good for daily running",
      reviewDate: ISODate("2026-05-17T09:00:00Z")
    }
  ],
  reviewCount: 245,
  avgRating: 4.6
})

Full reviews collection:

db.productReviews.insertOne({
  productSku: "SHOE-001",
  userId: "USER-889",
  rating: 5,
  comment: "Excellent cushioning for long runs",
  reviewDate: ISODate("2026-05-18T10:00:00Z"),
  images: [
    "review1.jpg",
    "review2.jpg"
  ]
})

Indexes

db.products.createIndex({ sku: 1 }, { unique: true })
db.productReviews.createIndex({ productSku: 1, reviewDate: -1 })
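Keeping `recentReviews` bounded is the part that needs care. A minimal sketch, assuming you keep five reviews: after writing the full review to `productReviews`, push a summary into the product using `$push` with the `$each`, `$sort`, and `$slice` modifiers. MongoDB applies the sort before the slice regardless of the order they are written, so the array keeps only the newest five. The helper name, the limit, and the reviewer "Ravi" are illustrative; recomputing `avgRating` is left out here.

```javascript
const RECENT_REVIEW_LIMIT = 5; // assumption: size of the embedded subset

// Builds the update for db.products.updateOne({ sku }, update):
// append the review, re-sort newest first, then keep the first N.
function buildRecentReviewPush(review) {
  return {
    $push: {
      recentReviews: {
        $each: [review],
        $sort: { reviewDate: -1 },
        $slice: RECENT_REVIEW_LIMIT
      }
    },
    $inc: { reviewCount: 1 }
  };
}

const update = buildRecentReviewPush({
  userName: "Ravi", // hypothetical reviewer
  rating: 5,
  comment: "Great grip",
  reviewDate: new Date("2026-05-19T08:00:00Z")
});
// In mongosh: db.products.updateOne({ sku: "SHOE-001" }, update)
console.log(JSON.stringify(update));
```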

Best for

Product detail pages
Profile pages
News feeds
Dashboards
Recent transactions
Recent alerts

Avoid when

The application always needs all child records. In that case, referencing or pagination is better.

5. Bucket Pattern

What it solves

The Bucket Pattern groups many events into one document. It is especially useful for time-series, IoT, telemetry, logs, sensor data, and clickstream events.

MongoDB also has native Time Series Collections (available since MongoDB 5.0), which are optimized for timestamped data and handle bucketing internally. For modern telemetry and IoT use cases, prefer native Time Series Collections unless you have a custom bucketing requirement. MongoDB's data modeling guidance also lists storage strategy and lifecycle management as key modeling considerations.

Manual Bucket Example: Vehicle Telematics

db.vehicleTelemetryBuckets.insertOne({
  vehicleId: "VH-1001",
  bucketStart: ISODate("2026-05-20T10:00:00Z"),
  bucketEnd: ISODate("2026-05-20T10:05:00Z"),
  count: 2, // must match the number of entries in readings
  readings: [
    {
      ts: ISODate("2026-05-20T10:00:01Z"),
      speed: 68,
      fuel: 44,
      engineTemp: 91,
      gps: [77.5946, 12.9716]
    },
    {
      ts: ISODate("2026-05-20T10:00:02Z"),
      speed: 70,
      fuel: 44,
      engineTemp: 92,
      gps: [77.5948, 12.9719]
    }
  ]
})
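Manual buckets are usually filled with one upsert per reading rather than with `insertOne`. The sketch below assumes five-minute buckets and a hypothetical cap of 300 readings per bucket; it computes the bucket boundaries and builds the filter, update, and options for `db.vehicleTelemetryBuckets.updateOne(...)`. Because `count` is part of the filter, a full bucket stops matching and the upsert starts a new one; the upsert copies `vehicleId` and `bucketStart` from the equality conditions, and `$inc` initializes `count` to 1.

```javascript
const BUCKET_MS = 5 * 60 * 1000; // assumption: 5-minute buckets
const MAX_READINGS = 300;        // assumption: cap per bucket document

// Returns { filter, update, options } for one telemetry reading.
function buildBucketUpsert(vehicleId, reading) {
  // Round the timestamp down to the start of its bucket.
  const startMs = Math.floor(reading.ts.getTime() / BUCKET_MS) * BUCKET_MS;
  return {
    filter: {
      vehicleId,
      bucketStart: new Date(startMs),
      count: { $lt: MAX_READINGS } // full buckets no longer match
    },
    update: {
      $push: { readings: reading },
      $inc: { count: 1 },
      $setOnInsert: { bucketEnd: new Date(startMs + BUCKET_MS) }
    },
    options: { upsert: true }
  };
}

const op = buildBucketUpsert("VH-1001", {
  ts: new Date("2026-05-20T10:03:27Z"),
  speed: 71, fuel: 43, engineTemp: 92, gps: [77.5950, 12.9721]
});
// In mongosh: db.vehicleTelemetryBuckets.updateOne(op.filter, op.update, op.options)
console.log(op.filter.bucketStart.toISOString()); // 2026-05-20T10:00:00.000Z
```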

Native Time Series Collection

db.createCollection("vehicleTelemetry", {
  timeseries: {
    timeField: "timestamp",
    metaField: "vehicle",
    granularity: "seconds"
  }
})

db.vehicleTelemetry.insertOne({
  vehicle: {
    vehicleId: "VH-1001",
    fleetId: "FLEET-NORTH",
    vehicleType: "truck"
  },
  timestamp: ISODate("2026-05-20T10:00:01Z"),
  speed: 68,
  fuel: 44,
  engineTemp: 91,
  location: {
    type: "Point",
    coordinates: [77.5946, 12.9716]
  }
})

Query: average speed and maximum engine temperature per vehicle

db.vehicleTelemetry.aggregate([
  {
    $match: {
      timestamp: {
        $gte: ISODate("2026-05-20T10:00:00Z"),
        $lt: ISODate("2026-05-20T11:00:00Z")
      },
      "vehicle.fleetId": "FLEET-NORTH"
    }
  },
  {
    $group: {
      _id: "$vehicle.vehicleId",
      avgSpeed: { $avg: "$speed" },
      maxTemp: { $max: "$engineTemp" }
    }
  }
])

Best for

Telematics
IoT sensors
Clickstream
Logs
Financial ticks
Energy meter readings
Industrial equipment monitoring

Avoid when

Every event needs independent frequent updates.
You need strict event-level transactional updates.
Buckets can grow unpredictably.

6. Attribute Pattern

What it solves

The Attribute Pattern is useful when documents have many optional, dynamic, or sparsely populated attributes.

Common use cases:

Product specifications
Vehicle features
Medical observations
Real estate attributes
Insurance policy clauses
Marketplace filters

Instead of creating hundreds of sparse fields, store attributes as key-value objects or arrays.

Bad Model

{
  "sku": "LAPTOP-001",
  "ram": "16GB",
  "processor": "Intel i7",
  "screenSize": "14 inch",
  "battery": "70Wh",
  "touchscreen": true,
  "graphicsCard": "RTX 4060"
}

This is okay for one category, but not for a marketplace with laptops, gowns, cameras, shoes, jewellery, and furniture.

Attribute Pattern Model

db.products.insertOne({
  sku: "LAPTOP-001",
  title: "Business Laptop",
  category: "Electronics",
  price: 85000,
  attributes: [
    { k: "ram", v: "16GB" },
    { k: "processor", v: "Intel i7" },
    { k: "screenSize", v: "14 inch" },
    { k: "touchscreen", v: true }
  ]
})

Index

db.products.createIndex({
  category: 1,
  "attributes.k": 1,
  "attributes.v": 1
})

Query

db.products.find({
  category: "Electronics",
  attributes: {
    $elemMatch: {
      k: "ram",
      v: "16GB"
    }
  }
})
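Faceted search usually filters on several attributes at once. One way to express that is `$all` over a list of `$elemMatch` clauses, so each key/value pair must be satisfied by a single array element (a product with `ram: "8GB"` and some other attribute whose value is "16GB" will not match). The sketch below builds such a filter with hypothetical helpers; `toAttributes` also converts a flat spec object into the `k`/`v` array shape used above.

```javascript
// Converts { ram: "16GB", touchscreen: true } into [{ k: "ram", v: "16GB" }, ...]
function toAttributes(specs) {
  return Object.entries(specs).map(([k, v]) => ({ k, v }));
}

// Builds a find() filter requiring every given attribute pair to be present.
// Each $elemMatch must be satisfied by one element, so k and v cannot come
// from two different attributes.
function buildAttributeFilter(category, required) {
  return {
    category,
    attributes: {
      $all: toAttributes(required).map((pair) => ({ $elemMatch: pair }))
    }
  };
}

const filter = buildAttributeFilter("Electronics", { ram: "16GB", processor: "Intel i7" });
// In mongosh: db.products.find(filter)
console.log(JSON.stringify(filter));
```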

Best for

E-commerce catalog filters
Dynamic metadata
Multi-category platforms
Configurable products
Search facets

Avoid when

Fields are stable, mandatory, and heavily queried. In that case, top-level indexed fields are simpler and faster.

7. Computed Pattern

What it solves

The Computed Pattern stores pre-calculated values to avoid expensive repeated computation.

Examples:

Average rating
Order total
Wallet ledger balance
Product review count
Monthly revenue
Driver score
Customer lifetime value

Example: Product Rating Summary

db.products.insertOne({
  sku: "SHOE-001",
  title: "Running Shoes",
  ratingSummary: {
    avgRating: 4.6,
    reviewCount: 245,
    fiveStarCount: 180,
    fourStarCount: 40
  }
})

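When a new five-star review arrives, update the counters and recompute the average in one atomic operation. An update with an aggregation pipeline (MongoDB 4.2+) computes the new average from the stored values instead of relying on a hard-coded number from the application:

db.products.updateOne(
  { sku: "SHOE-001" },
  [
    {
      $set: {
        // New average = (old average × old count + new rating) / (old count + 1);
        // all expressions in this stage read the values from before the update.
        "ratingSummary.avgRating": {
          $divide: [
            { $add: [{ $multiply: ["$ratingSummary.avgRating", "$ratingSummary.reviewCount"] }, 5] },
            { $add: ["$ratingSummary.reviewCount", 1] }
          ]
        },
        "ratingSummary.reviewCount": { $add: ["$ratingSummary.reviewCount", 1] },
        "ratingSummary.fiveStarCount": { $add: ["$ratingSummary.fiveStarCount", 1] }
      }
    }
  ]
)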
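Incrementally maintained values can drift if a write is lost or applied twice, so the cheap incremental path is often paired with an occasional full recomputation. A minimal sketch under that assumption: `applyReview` mirrors the update above for any star rating, and `recomputeSummary` rebuilds the summary from raw ratings, for example in a periodic reconciliation job. Both function names and the `oneStarCount` to `threeStarCount` fields are hypothetical extensions of the example document.

```javascript
const STAR_FIELDS = {
  1: "oneStarCount", 2: "twoStarCount", 3: "threeStarCount",
  4: "fourStarCount", 5: "fiveStarCount"
};

// Incremental path: constant-time update of a stored summary with one rating.
function applyReview(summary, rating) {
  const field = STAR_FIELDS[rating];
  const count = summary.reviewCount + 1;
  return {
    ...summary,
    reviewCount: count,
    avgRating: (summary.avgRating * summary.reviewCount + rating) / count,
    [field]: (summary[field] || 0) + 1
  };
}

// Expensive path: rebuild the summary by scanning every rating.
function recomputeSummary(ratings) {
  const summary = { reviewCount: ratings.length, avgRating: 0 };
  let total = 0;
  for (const r of ratings) {
    total += r;
    summary[STAR_FIELDS[r]] = (summary[STAR_FIELDS[r]] || 0) + 1;
  }
  summary.avgRating = ratings.length ? total / ratings.length : 0;
  return summary;
}

// One more 5-star review on the example summary: (4.6 * 245 + 5) / 246, about 4.6016.
const next = applyReview({ avgRating: 4.6, reviewCount: 245, fiveStarCount: 180, fourStarCount: 40 }, 5);
console.log(next.avgRating.toFixed(4)); // 4.6016
```

A reconciliation job could compare the stored summary with `recomputeSummary` over the reviews collection and overwrite it when they disagree.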

Best for

Dashboard metrics
Frequently displayed counts
Aggregated KPIs
Leaderboards
Product ratings

Avoid when

Exact real-time correctness is mandatory and updates are extremely frequent.
Many concurrent updates hit the same document, causing write contention.

In high-write workloads, consider partitioned counters.

8. Outlier Pattern

What it solves

Most documents follow normal size and growth behavior, but a few become unusually large. The Outlier Pattern keeps normal documents simple and moves exceptional data into a separate collection.

MongoDB University’s material on advanced schema design patterns covers the Outlier Pattern as a way to handle these exceptional cases.

Example: Influencer Profile with a Huge Follower List

Normal users:

db.users.insertOne({
  userId: "USER-1001",
  name: "Regular User",
  followerIds: [
    "USER-2001",
    "USER-2002"
  ],
  hasOutlierFollowers: false
})

Outlier user:

db.users.insertOne({
  userId: "USER-9999",
  name: "Celebrity User",
  followerPreview: [
    "USER-2001",
    "USER-2002",
    "USER-2003"
  ],
  followerCount: 9000000,
  hasOutlierFollowers: true
})

Separate follower collection:

db.userFollowers.insertOne({
  userId: "USER-9999",
  followerId: "USER-8888",
  followedAt: ISODate("2026-05-20T10:00:00Z")
})

Index

db.userFollowers.createIndex({ userId: 1, followedAt: -1 })
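The switch from the embedded array to the separate collection usually happens at write time. A minimal sketch, assuming a hypothetical limit of 1,000 embedded follower IDs: `planFollow` (an illustrative helper) decides where a new follower goes and returns the writes to run. When a user first crosses the limit, `hasOutlierFollowers` flips to `true` and `followerCount` is seeded; moving the already-embedded IDs into `userFollowers` is left to a one-off migration, and concurrent follows are ignored for brevity.

```javascript
const EMBED_LIMIT = 1000; // assumption: max follower IDs embedded in a user document

// Returns the writes needed when followerId starts following user.
function planFollow(user, followerId, now = new Date()) {
  const embeddedCount = (user.followerIds || []).length;

  if (!user.hasOutlierFollowers && embeddedCount < EMBED_LIMIT) {
    // Normal user: keep the follower embedded ($addToSet avoids duplicates).
    return [{
      collection: "users",
      filter: { userId: user.userId },
      update: { $addToSet: { followerIds: followerId } }
    }];
  }

  // Outlier: store the edge separately and keep only a count on the user.
  const userUpdate = user.hasOutlierFollowers
    ? { $inc: { followerCount: 1 } }
    : { $set: { hasOutlierFollowers: true, followerCount: embeddedCount + 1 } };

  return [
    { collection: "userFollowers", insert: { userId: user.userId, followerId, followedAt: now } },
    { collection: "users", filter: { userId: user.userId }, update: userUpdate }
  ];
}

const celebrity = { userId: "USER-9999", followerCount: 9000000, hasOutlierFollowers: true };
console.log(planFollow(celebrity, "USER-8888").map((w) => w.collection)); // ["userFollowers", "users"]
```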

Best for

Social followers
Product reviews
Viral posts
Popular products
Chat groups with huge memberships

Avoid when

Most documents are outliers. Then it is not an outlier problem; it is your core data model.

9. Archive Pattern

What it solves

The Archive Pattern separates hot operational data from cold historical data.

MongoDB’s best-practice guidance asks modelers to consider lifecycle management from creation to archiving and deletion for performance, cost, and security.

Example: Orders

Hot collection:

db.orders.insertOne({
  orderId: "ORD-1001",
  customerId: "CUST-501",
  orderDate: ISODate("2026-05-20T10:30:00Z"),
  status: "DELIVERED",
  totalAmount: 7997
})

Archive collection:

db.orders_archive.insertOne({
  orderId: "ORD-2019-9001",
  customerId: "CUST-501",
  orderDate: ISODate("2019-02-10T10:30:00Z"),
  status: "DELIVERED",
  totalAmount: 4999,
  archivedAt: ISODate("2026-05-20T00:00:00Z")
})
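Moving documents from the hot collection into the archive can be done server-side with an aggregation that ends in `$merge`, followed by a delete that uses the same cutoff. The sketch below builds that pipeline in plain JavaScript. The two-year retention window and the helper names are assumptions; note that `$merge` with `on: "orderId"` requires a unique index on `orderId` in `orders_archive`, and `$$NOW` is the server's current time (MongoDB 4.2+).

```javascript
const RETENTION_DAYS = 730; // assumption: keep about two years of orders hot

// Orders placed before this instant are eligible for archiving.
function archiveCutoff(now = new Date()) {
  return new Date(now.getTime() - RETENTION_DAYS * 24 * 60 * 60 * 1000);
}

// Pipeline for db.orders.aggregate(...): copy old orders into orders_archive.
function buildArchivePipeline(cutoff) {
  return [
    { $match: { orderDate: { $lt: cutoff } } },
    { $set: { archivedAt: "$$NOW" } }, // time of the archive run
    {
      $merge: {
        into: "orders_archive",
        on: "orderId",               // needs a unique index on orders_archive.orderId
        whenMatched: "keepExisting", // re-running the job leaves existing copies alone
        whenNotMatched: "insert"
      }
    }
  ];
}

const cutoff = archiveCutoff(new Date("2026-05-20T00:00:00Z"));
// In mongosh, delete only after the merge has succeeded:
//   db.orders.aggregate(buildArchivePipeline(cutoff))
//   db.orders.deleteMany({ orderDate: { $lt: cutoff } })
console.log(cutoff.toISOString()); // 2024-05-20T00:00:00.000Z
```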

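TTL for Temporary Data

db.sessions.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0 }
)

With expireAfterSeconds set to 0, each session document is removed once the date stored in its expiresAt field has passed. The background TTL task runs about once a minute, so deletion is not instantaneous.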

Best for

Logs
Old orders
Old trips
Old telemetry
Compliance retention
Cost optimization

Avoid when

The application frequently updates old data.

10. Polymorphic / Single Collection Pattern

What it solves

Use this when multiple document types share common query patterns but have different fields.

Examples:

Notifications
Activity feeds
Content management
Payments
Events
Audit logs

Example: Activity Feed

db.activities.insertMany([
  {
    activityType: "ORDER_PLACED",
    userId: "USER-1001",
    createdAt: ISODate("2026-05-20T10:00:00Z"),
    orderId: "ORD-1001",
    amount: 5999
  },
  {
    activityType: "PRODUCT_REVIEWED",
    userId: "USER-1001",
    createdAt: ISODate("2026-05-20T11:00:00Z"),
    productSku: "SHOE-001",
    rating: 5
  },
  {
    activityType: "LOGIN",
    userId: "USER-1001",
    createdAt: ISODate("2026-05-20T12:00:00Z"),
    ip: "103.20.10.1"
  }
])


Index

db.activities.createIndex({ userId: 1, createdAt: -1 })
db.activities.createIndex({ activityType: 1, createdAt: -1 })

Query

db.activities.find({
  userId: "USER-1001"
}).sort({
  createdAt: -1
}).limit(20)
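Reading a polymorphic collection typically means branching on the discriminator field, here `activityType`. A minimal rendering sketch with a hypothetical `describeActivity` helper: each type reads only the fields it owns, and unknown types fall through to a generic label, so adding a new activity type does not break older readers. The `WISHLIST_ADDED` type in the usage is invented for illustration.

```javascript
// Turns one activity document into a feed line, based on its activityType.
function describeActivity(activity) {
  switch (activity.activityType) {
    case "ORDER_PLACED":
      return `Placed order ${activity.orderId} for ${activity.amount}`;
    case "PRODUCT_REVIEWED":
      return `Rated ${activity.productSku} ${activity.rating}/5`;
    case "LOGIN":
      return `Logged in from ${activity.ip}`;
    default:
      // Tolerate types this reader does not know about yet.
      return `Activity: ${activity.activityType}`;
  }
}

const feed = [
  { activityType: "ORDER_PLACED", userId: "USER-1001", orderId: "ORD-1001", amount: 5999 },
  { activityType: "LOGIN", userId: "USER-1001", ip: "103.20.10.1" },
  { activityType: "WISHLIST_ADDED", userId: "USER-1001" } // hypothetical newer type
];
feed.forEach((a) => console.log(describeActivity(a)));
```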

Best for

Feed design
Audit trail
Notification system
Event store
Timeline views

Avoid when

Each type has completely different query patterns, retention policies, and indexes.

11. Approximation Pattern

What it solves

The Approximation Pattern sacrifices exactness for scale and cost efficiency.

Examples:

View counts
Like counts
Impressions
Page analytics

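Instead of updating the same counter document for every event, aggregate events periodically, write only a sample of events, or spread writes across partitioned counters. Partitioned counters keep the total exact; their main benefit is removing write contention on a single hot document.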

Example: Partitioned Counter

db.productCounters.updateOne(
  {
    sku: "SHOE-001",
    counterType: "views",
    // Pick one of 20 counter documents at random to spread the writes
    partition: Math.floor(Math.random() * 20)
  },
  {
    $inc: {
      count: 1
    }
  },
  {
    upsert: true
  }
)

Read Total

db.productCounters.aggregate([
  {
    $match: {
      sku: "SHOE-001",
      counterType: "views"
    }
  },
  {
    $group: {
      _id: "$sku",
      totalViews: { $sum: "$count" }
    }
  }
])
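The partitioned counter is still exact: every event is written, just to one of 20 documents. The approximation itself comes from writing less often. A minimal sketch of one common variant: on each event, write with probability 1/N and increment by N, so the stored count is correct on average while the database receives roughly 1/N of the writes. `N = 10` is an arbitrary choice, and the random source is a parameter so the behavior can be tested deterministically.

```javascript
const SAMPLE_RATE = 10; // assumption: write roughly once per 10 events

// Returns how much to add to the stored counter for one event (0 means skip the write).
// Expected value per event: (1 / SAMPLE_RATE) * SAMPLE_RATE = 1.
function sampledIncrement(random = Math.random) {
  return random() < 1 / SAMPLE_RATE ? SAMPLE_RATE : 0;
}

const amount = sampledIncrement();
if (amount > 0) {
  // In mongosh:
  //   db.productCounters.updateOne(
  //     { sku: "SHOE-001", counterType: "views" },
  //     { $inc: { count: amount } },
  //     { upsert: true })
  console.log(`write +${amount}`);
}
```

This trades exactness for write volume, which fits the view and impression counters above but not anything that must be reconciled exactly.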

Best for

High-volume counters
Social media metrics
Product views
Ad impressions
Trending calculations

Avoid when

You need financial-grade accuracy on every read.

12. Tree Pattern

What it solves

Use Tree Patterns for hierarchical data.

Examples:

Product categories
Organization hierarchy
Location hierarchy
Folder structures
Menu navigation

Materialized Path Example

db.categories.insertMany([
  {
    categoryId: "electronics",
    name: "Electronics",
    path: ",electronics,"
  },
  {
    categoryId: "mobiles",
    name: "Mobiles",
    parentId: "electronics",
    path: ",electronics,mobiles,"
  },
  {
    categoryId: "android-phones",
    name: "Android Phones",
    parentId: "mobiles",
    path: ",electronics,mobiles,android-phones,"
  }
])

Query a category and all its descendants

db.categories.find({
  path: /^,electronics,mobiles,/
})

Index

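db.categories.createIndex({ path: 1 })

Because the regular expression is case-sensitive and anchored with ^, MongoDB can answer the descendant query with a bounded range scan of this index instead of examining every category.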
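Two small helpers make materialized paths easier to maintain. They are sketched below as hypothetical functions: `childPath` builds a child's path from its parent's, and `subtreeRegex` builds the anchored prefix regex used for descendant queries, escaping regex metacharacters in case a category ID ever contains characters such as `.` or `+`. Ancestors can be read straight out of the path string without any query.

```javascript
// ",electronics,mobiles," + "android-phones" -> ",electronics,mobiles,android-phones,"
function childPath(parentPath, categoryId) {
  return `${parentPath}${categoryId},`;
}

// Anchored, case-sensitive prefix regex matching a node and all of its descendants.
function subtreeRegex(path) {
  const escaped = path.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped}`);
}

// Ancestor IDs, root first, excluding the node itself.
function ancestors(path) {
  return path.split(",").filter(Boolean).slice(0, -1);
}

const phones = childPath(",electronics,mobiles,", "android-phones");
// In mongosh: db.categories.find({ path: subtreeRegex(",electronics,mobiles,") })
console.log(ancestors(phones)); // → ["electronics", "mobiles"]
```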

Best for

Category trees
Navigation menus
Org structures
File folders

Avoid when

You need complex graph traversal. For deep graph relationships, use $graphLookup carefully or consider a graph-specialized design.

13. Schema Versioning Pattern

What it solves

Applications evolve. Schema Versioning allows documents with different versions to coexist safely.

db.customers.insertOne({
  customerId: "CUST-1001",
  schemaVersion: 2,
  name: {
    first: "Sachin",
    last: "Gupta"
  },
  contact: {
    email: "sachin@example.com",
    mobile: "+91XXXXXXXXXX"
  }
})

Older document:

db.customers.insertOne({
  customerId: "CUST-999",
  schemaVersion: 1,
  fullName: "Old Customer",
  email: "old@example.com"
})

Application migration logic:

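function normalizeCustomer(doc) {
  if (doc.schemaVersion === 1) {
    // Split "First Middle Last" into a first name and the remainder;
    // tolerate a missing or empty fullName.
    const [first = "", ...rest] = (doc.fullName || "").trim().split(/\s+/)
    return {
      _id: doc._id,
      customerId: doc.customerId,
      schemaVersion: 2,
      name: {
        first: first,
        last: rest.join(" ")
      },
      contact: {
        email: doc.email
      }
    }
  }

  // Version 2 documents already have the current shape
  return doc
}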
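Normalizing on read keeps old documents usable; many teams also write the upgraded shape back so that version 1 documents gradually disappear (a lazy migration). A minimal sketch, assuming the two versions shown above: the hypothetical `buildUpgrade` returns a filter and update for `db.customers.updateOne`. Including `schemaVersion: 1` in the filter means that if another process has already upgraded the document, the update matches nothing instead of overwriting newer data. The `_id` value in the usage is a placeholder.

```javascript
// Builds an in-place v1 -> v2 upgrade for one customer document,
// or returns null if the document is already current.
function buildUpgrade(doc) {
  if (doc.schemaVersion !== 1) return null;
  const [first = "", ...rest] = (doc.fullName || "").trim().split(/\s+/);
  return {
    filter: { _id: doc._id, schemaVersion: 1 }, // only upgrade if still v1
    update: {
      $set: {
        schemaVersion: 2,
        name: { first, last: rest.join(" ") },
        contact: { email: doc.email }
      },
      $unset: { fullName: "", email: "" } // drop the v1-only fields
    }
  };
}

const upgrade = buildUpgrade({
  _id: "placeholder-id", customerId: "CUST-999", schemaVersion: 1,
  fullName: "Old Customer", email: "old@example.com"
});
// In mongosh: db.customers.updateOne(upgrade.filter, upgrade.update)
console.log(JSON.stringify(upgrade.update.$set.name)); // {"first":"Old","last":"Customer"}
```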

Best for

Large collections
Zero-downtime migrations
Evolving APIs
SaaS platforms

Avoid when

You can migrate all records safely before release.