MongoDB data modeling is not about copying relational tables into collections. It is about designing documents around how the application reads, writes, scales, and evolves. MongoDB’s own guidance is very clear: model data based on application access patterns, and keep data accessed together in the same document where it makes sense. 

This blog explains the most important MongoDB data modeling patterns, where each pattern fits, when to avoid it, and how to implement it with practical examples.

The Golden Rule of MongoDB Data Modeling

In relational modeling, we usually start with entities and normalization.

In MongoDB, start with:

  • What does the application need to read together?
  • What does it update together?
  • What must scale independently?
  • What query must be fast?
  • What data grows without limit?

A good MongoDB model is not always the most normalized model. It is the model that minimizes unnecessary joins, reduces query fan-out, supports indexing, and keeps documents within healthy size and growth boundaries.

MongoDB documents can have flexible structures, and documents in the same collection do not need to have identical fields. This flexibility supports polymorphic models, evolving schemas, and domain-oriented design. 

Pattern Selection Matrix

| Pattern | Best For | Avoid When |
| --- | --- | --- |
| Embedded Document Pattern | One-to-one, one-to-few, tightly coupled data | Child array grows without limit |
| Reference Pattern | Many-to-many, large independent entities | You always need child data with parent |
| Extended Reference Pattern | Reducing repeated lookups | Referenced fields change very frequently |
| Subset Pattern | Product pages, profile previews, dashboards | Full child data is always needed |
| Bucket Pattern | Time-series, telemetry, logs, IoT events | Random updates inside buckets dominate |
| Attribute Pattern | Dynamic attributes, product specs, metadata | Attributes are fixed and simple |
| Computed Pattern | Expensive repeated calculations | Source data changes every second |
| Outlier Pattern | Handling exceptional large documents | Most records are large, not exceptional |
| Archive Pattern | Lifecycle management, cold data | Historical data must be updated frequently |
| Polymorphic / Single Collection Pattern | Similar entities with slight variation | Query patterns are completely different |
| Approximation Pattern | Counters, likes, views, analytics | Exact real-time precision is mandatory |
| Tree Pattern | Hierarchies, org charts, categories | Deep graph traversal is required |

1. Embedded Document Pattern

What it solves

Use embedding when related data is naturally owned by the parent and commonly read together.

Classic examples:

  • Customer with addresses
  • Product with top reviews
  • Order with line items
  • Vehicle trip with route summary
  • Employee with department snapshot

MongoDB recommends embedding when data is accessed together because it can reduce the need for application-side joins and allow one-query reads.

Example: E-commerce Order

db.orders.insertOne({
orderId: "ORD-1001",
customerId: "CUST-501",
orderDate: ISODate("2026-05-20T10:30:00Z"),
status: "CONFIRMED",
shippingAddress: {
name: "Sachin Gupta",
city: "Delhi",
pincode: "110001",
country: "India"
},
items: [
{
sku: "SHOE-001",
name: "Running Shoes",
qty: 1,
price: 5999
},
{
sku: "TSHIRT-022",
name: "Cotton T-Shirt",
qty: 2,
price: 999
}
],
payment: {
mode: "UPI",
status: "SUCCESS"
}
})

Why this works

The order and its line items are normally read together. Embedding avoids querying orders, then order_items, then products just to show an order confirmation page.

Index

db.orders.createIndex({ customerId: 1, orderDate: -1 })
db.orders.createIndex({ orderId: 1 }, { unique: true })

When not to use

Avoid embedding when:

  • The child array grows endlessly.
  • Child records are updated independently at high frequency.
  • Many parents share the same child record.
  • The embedded array can become very large.
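One rough way to apply this checklist is to estimate how large a parent document could become if its children were embedded. The sketch below is only a ballpark: the helper names `estimateEmbeddedChars` and `canEmbed` and the 10% budget are illustrative assumptions, and JSON length is a crude stand-in for BSON size. The 16 MiB per-document limit is the only MongoDB fact used.

```javascript
// MongoDB's hard limit for a single BSON document is 16 MiB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: approximates the size of an embedded array by
// multiplying the JSON length of one sample child by the expected count.
function estimateEmbeddedChars(sampleChild, expectedChildCount) {
  return JSON.stringify(sampleChild).length * expectedChildCount;
}

// Assumed rule of thumb: embed only if the projected array stays well
// below the limit. The 10% budget is an illustrative choice, not a
// MongoDB recommendation.
function canEmbed(sampleChild, expectedChildCount, budgetRatio = 0.1) {
  const projected = estimateEmbeddedChars(sampleChild, expectedChildCount);
  return projected <= MAX_BSON_BYTES * budgetRatio;
}

const lineItem = { sku: "SHOE-001", name: "Running Shoes", qty: 1, price: 5999 };

console.log(canEmbed(lineItem, 50));      // a typical order
console.log(canEmbed(lineItem, 5000000)); // an array that keeps growing
```

A check like this does not replace the other three questions in the list. An array can be small and still be a poor fit for embedding if its elements are shared or updated independently.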

2. Reference Pattern

What it solves

Use references when related entities have independent lifecycles or many-to-many relationships.

Examples:

  • Users and roles
  • Products and suppliers
  • Students and courses
  • Vehicles and drivers
  • Customers and loyalty programs

Example: Product and Supplier

db.products.insertOne({
sku: "CAMERA-001",
title: "Mirrorless Camera",
category: "Electronics",
supplierId: ObjectId("6650aa111111111111111111"),
price: 84999
})
db.suppliers.insertOne({
_id: ObjectId("6650aa111111111111111111"),
name: "Global Camera Distributors",
country: "India",
rating: 4.7
})

Query with $lookup

db.products.aggregate([
{
$match: {
sku: "CAMERA-001"
}
},
{
$lookup: {
from: "suppliers",
localField: "supplierId",
foreignField: "_id",
as: "supplier"
}
},
{
$unwind: "$supplier"
}
])
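To make the shape of the result concrete, here is a small in-memory sketch of what the `$lookup` and `$unwind` stages do. It uses plain arrays and string ids instead of collections and ObjectIds, and the `LENS-007` product is a hypothetical record added only to show the no-match case.

```javascript
// In-memory stand-ins for the two collections (strings replace ObjectId).
const products = [
  { sku: "CAMERA-001", title: "Mirrorless Camera", supplierId: "S1" },
  { sku: "LENS-007", title: "Prime Lens", supplierId: "S404" } // hypothetical, no such supplier
];
const suppliers = [
  { _id: "S1", name: "Global Camera Distributors", country: "India", rating: 4.7 }
];

// Emulates { $lookup: { ..., as } } followed by { $unwind: "$<as>" }.
function lookupAndUnwind(localDocs, foreignDocs, localField, foreignField, as) {
  const out = [];
  for (const doc of localDocs) {
    const matches = foreignDocs.filter(f => f[foreignField] === doc[localField]);
    // $unwind emits one document per array element, so a parent with
    // zero matches produces no output document at all.
    for (const match of matches) {
      out.push({ ...doc, [as]: match });
    }
  }
  return out;
}

const joined = lookupAndUnwind(products, suppliers, "supplierId", "_id", "supplier");
console.log(joined.map(d => `${d.sku} -> ${d.supplier.name}`));
```

Note that the product without a matching supplier disappears from the output. In a real pipeline, the `preserveNullAndEmptyArrays: true` option of `$unwind` keeps such documents.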

When this works

Use referencing when the supplier is reused across thousands of products and supplier details are updated independently.

When not to use

Avoid pure referencing when the application always needs supplier name, rating, and country with every product listing. In that case, use the Extended Reference Pattern.

3. Extended Reference Pattern

What it solves

The Extended Reference Pattern duplicates selected fields from a referenced document into the main document to reduce frequent lookups.

MongoDB describes this pattern as useful when applications perform repetitive joins to lookup data; by bringing frequently accessed fields into the main document, reads become faster, but the tradeoff is duplication. 

Example: Product with Supplier Snapshot

db.products.insertOne({
sku: "CAMERA-001",
title: "Mirrorless Camera",
category: "Electronics",
price: 84999,
supplierRef: {
supplierId: ObjectId("6650aa111111111111111111"),
name: "Global Camera Distributors",
country: "India",
rating: 4.7
}
})

Now the product listing page does not need a $lookup:

db.products.find(
{
category: "Electronics"
},
{
title: 1,
price: 1,
"supplierRef.name": 1,
"supplierRef.rating": 1
}
)

Good use cases

  • Product catalog
  • Order history
  • Customer profile snapshot
  • Vendor snapshot in purchase orders
  • Driver snapshot in trip records

Critical warning

Do not duplicate fields that change frequently.

Good duplicated fields:

  • supplier name
  • customer name
  • product title at order time
  • city
  • rating category

Risky duplicated fields:

  • inventory
  • wallet balance
  • real-time price
  • credit limit
  • current fraud score
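The cost of duplication shows up when a duplicated field does change: every copy has to be refreshed. A minimal sketch of that propagation step follows, assuming a hypothetical helper `buildSnapshotRefresh` that turns a supplier change into a filter and update for the products collection. The supplier id is shown as a plain string here; in the collection it would be an ObjectId.

```javascript
// Fields copied into products.supplierRef in the example above.
const DUPLICATED_SUPPLIER_FIELDS = ["name", "country", "rating"];

// Builds the filter and update needed to propagate a supplier change to
// every product that embeds a snapshot of that supplier.
function buildSnapshotRefresh(supplierId, changedFields) {
  const set = {};
  for (const [field, value] of Object.entries(changedFields)) {
    if (DUPLICATED_SUPPLIER_FIELDS.includes(field)) {
      set[`supplierRef.${field}`] = value;
    }
  }
  if (Object.keys(set).length === 0) return null; // no duplicated field changed
  return {
    filter: { "supplierRef.supplierId": supplierId },
    update: { $set: set }
  };
}

const refresh = buildSnapshotRefresh("6650aa111111111111111111", {
  name: "Global Camera Distributors Pvt Ltd", // illustrative new value
  phone: "+91XXXXXXXXXX" // not duplicated, so it is ignored
});
console.log(JSON.stringify(refresh, null, 2));

// In mongosh this could be applied as:
//   db.products.updateMany(refresh.filter, refresh.update)
```

If this update would touch thousands of products many times a day, that is a sign the field belongs in the risky list rather than in the snapshot.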

4. Subset Pattern

What it solves

The Subset Pattern stores the most frequently accessed subset of child data inside the parent document, while the full child data remains in another collection.

MongoDB documentation gives a similar e-commerce example: product pages may show only the five most recent reviews, while older reviews can stay in a separate collection. 

Example: Product with Recent Reviews

db.products.insertOne({
sku: "SHOE-001",
title: "Running Shoes",
price: 5999,
recentReviews: [
{
reviewId: ObjectId(),
userName: "Amit",
rating: 5,
comment: "Very comfortable",
reviewDate: ISODate("2026-05-18T10:00:00Z")
},
{
reviewId: ObjectId(),
userName: "Neha",
rating: 4,
comment: "Good for daily running",
reviewDate: ISODate("2026-05-17T09:00:00Z")
}
],
reviewCount: 245,
avgRating: 4.6
})

Full reviews collection:

db.productReviews.insertOne({
productSku: "SHOE-001",
userId: "USER-889",
rating: 5,
comment: "Excellent cushioning for long runs",
reviewDate: ISODate("2026-05-18T10:00:00Z"),
images: [
"review1.jpg",
"review2.jpg"
]
})

Indexes

db.products.createIndex({ sku: 1 }, { unique: true })
db.productReviews.createIndex({ productSku: 1, reviewDate: -1 })
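Keeping the embedded subset bounded is the part that is easy to get wrong. One possible approach, sketched below, uses the `$push` modifiers `$each`, `$sort`, and `$slice` so the array is re-sorted and trimmed in the same update. The helper name `buildRecentReviewUpdate` and the sample review are assumptions for illustration.

```javascript
// Number of reviews kept inline; five matches the product-page example.
const RECENT_REVIEW_LIMIT = 5;

// Builds an update that appends a review preview, keeps the array sorted
// newest-first, and trims it to the most recent N entries.
function buildRecentReviewUpdate(review, limit = RECENT_REVIEW_LIMIT) {
  return {
    $push: {
      recentReviews: {
        $each: [review],
        $sort: { reviewDate: -1 },
        $slice: limit // a positive value keeps the first N after sorting
      }
    },
    $inc: { reviewCount: 1 }
  };
}

const update = buildRecentReviewUpdate({
  userName: "Ravi", // hypothetical reviewer
  rating: 5,
  comment: "Great grip on wet roads",
  reviewDate: new Date("2026-05-19T08:00:00Z")
});
console.log(JSON.stringify(update, null, 2));

// In mongosh, alongside inserting the full review into productReviews:
//   db.products.updateOne({ sku: "SHOE-001" }, update)
```

This sketch leaves avgRating untouched; recomputing it is a separate concern.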

Best for

  • Product detail pages
  • Profile pages
  • News feeds
  • Dashboards
  • Recent transactions
  • Recent alerts

Avoid when

The application always needs all child records. In that case, referencing or pagination is better.

5. Bucket Pattern

What it solves

The Bucket Pattern groups many events into one document. It is especially useful for time-series, IoT, telemetry, logs, sensor data, and clickstream events.

MongoDB also has native Time Series Collections, which are optimized for timestamped data. For modern telemetry and IoT use cases, prefer native Time Series Collections unless you have a custom bucketing requirement. MongoDB data modeling guidance includes storage strategy and lifecycle management as key modeling considerations. 

Manual Bucket Example: Vehicle Telematics

db.vehicleTelemetryBuckets.insertOne({
vehicleId: "VH-1001",
bucketStart: ISODate("2026-05-20T10:00:00Z"),
bucketEnd: ISODate("2026-05-20T10:05:00Z"),
count: 2,
readings: [
{
ts: ISODate("2026-05-20T10:00:01Z"),
speed: 68,
fuel: 44,
engineTemp: 91,
gps: [77.5946, 12.9716]
},
{
ts: ISODate("2026-05-20T10:00:02Z"),
speed: 70,
fuel: 44,
engineTemp: 92,
gps: [77.5948, 12.9719]
}
]
})
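In practice, manual buckets are usually filled with upserts rather than whole-document inserts. The following sketch shows one way to route a reading to its five-minute bucket; the helper names `bucketStartFor` and `buildBucketUpsert` and the sample reading are assumptions, not part of any MongoDB API.

```javascript
const BUCKET_MS = 5 * 60 * 1000; // five-minute buckets, as in the example above

// Floors a timestamp to the start of its bucket.
function bucketStartFor(ts) {
  return new Date(Math.floor(ts.getTime() / BUCKET_MS) * BUCKET_MS);
}

// Builds an upsert that appends a reading to the right bucket and creates
// the bucket document if it does not exist yet.
function buildBucketUpsert(vehicleId, reading) {
  const bucketStart = bucketStartFor(reading.ts);
  return {
    // On insert, the equality fields of the filter become document fields.
    filter: { vehicleId, bucketStart },
    update: {
      $push: { readings: reading },
      $inc: { count: 1 },
      // Written only when the upsert creates a new bucket document.
      $setOnInsert: { bucketEnd: new Date(bucketStart.getTime() + BUCKET_MS) }
    },
    options: { upsert: true }
  };
}

const op = buildBucketUpsert("VH-1001", {
  ts: new Date("2026-05-20T10:03:17Z"),
  speed: 71,
  fuel: 43,
  engineTemp: 92,
  gps: [77.5951, 12.9723]
});
console.log(op.filter.bucketStart.toISOString()); // 2026-05-20T10:00:00.000Z

// In mongosh:
//   db.vehicleTelemetryBuckets.updateOne(op.filter, op.update, op.options)
```

Because `count` is incremented in the same update that pushes the reading, it stays in step with the length of `readings`.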

Native Time Series Collection

db.createCollection("vehicleTelemetry", {
timeseries: {
timeField: "timestamp",
metaField: "vehicle",
granularity: "seconds"
}
})
db.vehicleTelemetry.insertOne({
vehicle: {
vehicleId: "VH-1001",
fleetId: "FLEET-NORTH",
vehicleType: "truck"
},
timestamp: ISODate("2026-05-20T10:00:01Z"),
speed: 68,
fuel: 44,
engineTemp: 91,
location: {
type: "Point",
coordinates: [77.5946, 12.9716]
}
})

Query: average speed per vehicle

db.vehicleTelemetry.aggregate([
{
$match: {
timestamp: {
$gte: ISODate("2026-05-20T10:00:00Z"),
$lt: ISODate("2026-05-20T11:00:00Z")
},
"vehicle.fleetId": "FLEET-NORTH"
}
},
{
$group: {
_id: "$vehicle.vehicleId",
avgSpeed: { $avg: "$speed" },
maxTemp: { $max: "$engineTemp" }
}
}
])

Best for

  • Telematics
  • IoT sensors
  • Clickstream
  • Logs
  • Financial ticks
  • Energy meter readings
  • Industrial equipment monitoring

Avoid when

  • Every event needs independent frequent updates.
  • You need strict event-level transactional updates.
  • Buckets can grow unpredictably.

6. Attribute Pattern

What it solves

The Attribute Pattern is useful when documents have many optional, dynamic, or sparsely populated attributes.

Common use cases:

  • Product specifications
  • Vehicle features
  • Medical observations
  • Real estate attributes
  • Insurance policy clauses
  • Marketplace filters

Instead of creating hundreds of sparse fields, store attributes as key-value objects or arrays.

Bad Model

{
"sku": "LAPTOP-001",
"ram": "16GB",
"processor": "Intel i7",
"screenSize": "14 inch",
"battery": "70Wh",
"touchscreen": true,
"graphicsCard": "RTX 4060"
}

This is okay for one category, but not for a marketplace with laptops, gowns, cameras, shoes, jewellery, and furniture.

Attribute Pattern Model

db.products.insertOne({
sku: "LAPTOP-001",
title: "Business Laptop",
category: "Electronics",
price: 85000,
attributes: [
{
k: "ram",
v: "16GB"
},
{
k: "processor",
v: "Intel i7"
},
{
k: "screenSize",
v: "14 inch"
},
{
k: "touchscreen",
v: true
}
]
})

Index

db.products.createIndex({
category: 1,
"attributes.k": 1,
"attributes.v": 1
})

Query

db.products.find({
category: "Electronics",
attributes: {
$elemMatch: {
k: "ram",
v: "16GB"
}
}
})
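Two small helpers make this model easier to work with: one converts a flat specification object into the k/v array shape, and one builds a filter that requires several attributes at once. Both function names are illustrative assumptions. The filter relies on `$all` combined with `$elemMatch`, so each key and value must match inside the same array element.

```javascript
// Converts a flat spec object into the k/v array shape used above.
function toAttributes(specs) {
  return Object.entries(specs).map(([k, v]) => ({ k, v }));
}

// Builds a filter that requires every wanted attribute to be present.
// Each $elemMatch forces k and v to match within one array element.
function buildAttributeFilter(category, wanted) {
  return {
    category,
    attributes: {
      $all: toAttributes(wanted).map(pair => ({ $elemMatch: pair }))
    }
  };
}

const attrs = toAttributes({ ram: "16GB", processor: "Intel i7", touchscreen: true });
console.log(attrs);

const filter = buildAttributeFilter("Electronics", { ram: "16GB", touchscreen: true });
console.log(JSON.stringify(filter, null, 2));

// In mongosh:
//   db.products.find(filter)
```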

Best for

  • E-commerce catalog filters
  • Dynamic metadata
  • Multi-category platforms
  • Configurable products
  • Search facets

Avoid when

Fields are stable, mandatory, and heavily queried. In that case, top-level indexed fields are simpler and faster.

7. Computed Pattern

What it solves

The Computed Pattern stores pre-calculated values to avoid expensive repeated computation.

Examples:

  • Average rating
  • Order total
  • Wallet ledger balance
  • Product review count
  • Monthly revenue
  • Driver score
  • Customer lifetime value

Example: Product Rating Summary

db.products.insertOne({
sku: "SHOE-001",
title: "Running Shoes",
ratingSummary: {
avgRating: 4.6,
reviewCount: 245,
fiveStarCount: 180,
fourStarCount: 40
}
})

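When a new five-star review arrives, increment the counters and write back the recomputed average (the value shown is illustrative):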

db.products.updateOne(
{ sku: "SHOE-001" },
{
$inc: {
"ratingSummary.reviewCount": 1,
"ratingSummary.fiveStarCount": 1
},
$set: {
"ratingSummary.avgRating": 4.61
}
}
)
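The average itself has to come from somewhere. A minimal sketch, assuming the application has just read the current ratingSummary, derives the new average incrementally instead of rescanning all reviews. The helper name and the counter names for one to three stars are assumptions that extend the two counters shown in the example.

```javascript
// Counter field per star rating; only the four- and five-star names
// appear in the example document, the rest are assumed to follow suit.
const STAR_FIELDS = [
  "oneStarCount", "twoStarCount", "threeStarCount", "fourStarCount", "fiveStarCount"
];

// Builds the update for one new review from the current summary.
function buildRatingUpdate(summary, newRating) {
  const newCount = summary.reviewCount + 1;
  // Incremental mean: (old average * old count + new value) / new count.
  const newAvg = (summary.avgRating * summary.reviewCount + newRating) / newCount;
  return {
    $inc: {
      "ratingSummary.reviewCount": 1,
      [`ratingSummary.${STAR_FIELDS[newRating - 1]}`]: 1
    },
    $set: { "ratingSummary.avgRating": Math.round(newAvg * 100) / 100 }
  };
}

const ratingUpdate = buildRatingUpdate({ avgRating: 4.6, reviewCount: 245 }, 5);
console.log(JSON.stringify(ratingUpdate, null, 2));

// In mongosh:
//   db.products.updateOne({ sku: "SHOE-001" }, ratingUpdate)
```

Two caveats apply. Recomputing from an already rounded average drifts over time, so storing a running sum of ratings and dividing on read may be more robust. Also, because the summary is read before the update is written, two concurrent reviews can overwrite each other's average unless the write is guarded.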

Best for

  • Dashboard metrics
  • Frequently displayed counts
  • Aggregated KPIs
  • Leaderboards
  • Product ratings

Avoid when

  • Exact real-time correctness is mandatory and updates are extremely frequent.
  • Many concurrent updates hit the same document, causing write contention.

In high-write workloads, consider partitioned counters.

8. Outlier Pattern

What it solves

Most documents follow normal size and growth behavior, but a few become unusually large. The Outlier Pattern keeps normal documents simple and moves exceptional data into a separate collection.

MongoDB University’s advanced schema design material lists the Outlier Pattern as a way to handle exceptional cases.

Example: Influencer Profile with Huge Followers

Normal users:

db.users.insertOne({
userId: "USER-1001",
name: "Regular User",
followerIds: [
"USER-2001",
"USER-2002"
],
hasOutlierFollowers: false
})

Outlier user:

db.users.insertOne({
  userId: "USER-9999",
  name: "Celebrity User",
  followerPreview: [
    "USER-2001",
    "USER-2002",
    "USER-2003"
  ],
  followerCount: 9000000,
  hasOutlierFollowers: true
})

Separate follower collection:

db.userFollowers.insertOne({
userId: "USER-9999",
followerId: "USER-8888",
followedAt: ISODate("2026-05-20T10:00:00Z")
})

Index

db.userFollowers.createIndex({ userId: 1, followedAt: -1 })
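The application has to decide, on every write, which of the two shapes it is dealing with. One way to sketch that routing is shown below. The threshold of 1000 and the helper name `planFollowerWrite` are assumptions, and moving existing embedded ids into a preview array when a user crosses the threshold is left out as a separate step.

```javascript
// Assumed threshold at which a user is treated as an outlier.
const MAX_EMBEDDED_FOLLOWERS = 1000;

// Decides where a new follower should be written, based on the flag
// stored in the user document.
function planFollowerWrite(user, followerId, now = new Date()) {
  if (user.hasOutlierFollowers) {
    // Outliers store each follower as its own document.
    return {
      collection: "userFollowers",
      op: "insertOne",
      doc: { userId: user.userId, followerId, followedAt: now }
    };
  }
  const embeddedCount = (user.followerIds || []).length;
  const update = { $addToSet: { followerIds: followerId } };
  if (embeddedCount + 1 >= MAX_EMBEDDED_FOLLOWERS) {
    // Flag the document so later writes go to the overflow collection.
    update.$set = { hasOutlierFollowers: true };
  }
  return {
    collection: "users",
    op: "updateOne",
    filter: { userId: user.userId },
    update
  };
}

const regular = { userId: "USER-1001", followerIds: ["USER-2001"], hasOutlierFollowers: false };
const celebrity = { userId: "USER-9999", hasOutlierFollowers: true };

console.log(planFollowerWrite(regular, "USER-3001").collection);
console.log(planFollowerWrite(celebrity, "USER-3001").collection);
```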

Best for

  • Social followers
  • Product reviews
  • Viral posts
  • Popular products
  • Chat groups with huge memberships

Avoid when

Most documents are outliers. Then it is not an outlier problem; it is your core data model.

9. Archive Pattern

What it solves

The Archive Pattern separates hot operational data from cold historical data.

MongoDB’s best-practice guidance asks modelers to consider lifecycle management from creation to archiving and deletion for performance, cost, and security.

Example: Orders

Hot collection:

db.orders.insertOne({
orderId: "ORD-1001",
customerId: "CUST-501",
orderDate: ISODate("2026-05-20T10:30:00Z"),
status: "DELIVERED",
totalAmount: 7997
})

Archive collection:

db.orders_archive.insertOne({
orderId: "ORD-2019-9001",
customerId: "CUST-501",
orderDate: ISODate("2019-02-10T10:30:00Z"),
status: "DELIVERED",
totalAmount: 4999,
archivedAt: ISODate("2026-05-20T00:00:00Z")
})
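Moving documents between the two collections is typically a scheduled job. The sketch below covers the pure logic of such a job: computing the cutoff date, building the selection filter, and stamping archivedAt. The five-year retention period, the restriction to delivered orders, and all helper names are illustrative assumptions.

```javascript
// Returns the date before which orders count as cold.
function archiveCutoff(now, retentionYears) {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - retentionYears);
  return cutoff;
}

// Builds the filter selecting orders eligible for archiving.
// Assumption: only completed (DELIVERED) orders are moved.
function buildArchiveFilter(now, retentionYears) {
  return {
    status: "DELIVERED",
    orderDate: { $lt: archiveCutoff(now, retentionYears) }
  };
}

// Copies an order and stamps the time it was archived.
function toArchiveDoc(order, now) {
  return { ...order, archivedAt: now };
}

const now = new Date("2026-05-20T00:00:00Z");
const archiveFilter = buildArchiveFilter(now, 5);
console.log(archiveFilter.orderDate.$lt.toISOString()); // 2021-05-20T00:00:00.000Z

// In mongosh, one simple (non-transactional) loop could be:
//   db.orders.find(archiveFilter).forEach(o => {
//     db.orders_archive.insertOne(toArchiveDoc(o, now));
//     db.orders.deleteOne({ _id: o._id });
//   })
```

Inserting into the archive before deleting from the hot collection means a failure midway leaves a duplicate rather than a lost order.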

TTL for Temporary Data

db.sessions.createIndex(
{ expiresAt: 1 },
{ expireAfterSeconds: 0 }
)

Best for

  • Logs
  • Old orders
  • Old trips
  • Old telemetry
  • Compliance retention
  • Cost optimization

Avoid when

The application frequently updates old data.

10. Polymorphic / Single Collection Pattern

What it solves

Use this when multiple document types share common query patterns but have different fields.

Examples:

  • Notifications
  • Activity feeds
  • Content management
  • Payments
  • Events
  • Audit logs

Example: Activity Feed

db.activities.insertMany([
{
activityType: "ORDER_PLACED",
userId: "USER-1001",
createdAt: ISODate("2026-05-20T10:00:00Z"),
orderId: "ORD-1001",
amount: 5999
},
{
activityType: "PRODUCT_REVIEWED",
userId: "USER-1001",
createdAt: ISODate("2026-05-20T11:00:00Z"),
productSku: "SHOE-001",
rating: 5
},
{
activityType: "LOGIN",
userId: "USER-1001",
createdAt: ISODate("2026-05-20T12:00:00Z"),
ip: "103.20.10.1"
}
])


Index

db.activities.createIndex({ userId: 1, createdAt: -1 })
db.activities.createIndex({ activityType: 1, createdAt: -1 })

Query

db.activities.find({
userId: "USER-1001"
}).sort({
createdAt: -1
}).limit(20)
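Since each document in the result can have a different shape, the reading code usually dispatches on the discriminator field. A minimal sketch of that dispatch, with message formats invented purely for illustration:

```javascript
// One renderer per activityType; the message formats are illustrative.
const renderers = new Map([
  ["ORDER_PLACED", a => `Placed order ${a.orderId} for ${a.amount}`],
  ["PRODUCT_REVIEWED", a => `Rated ${a.productSku} ${a.rating}/5`],
  ["LOGIN", a => `Logged in from ${a.ip}`]
]);

// Picks a renderer by type and falls back gracefully for unknown types,
// so new activity types do not break older readers.
function describeActivity(activity) {
  const render = renderers.get(activity.activityType);
  return render ? render(activity) : `Unknown activity: ${activity.activityType}`;
}

const feed = [
  { activityType: "ORDER_PLACED", orderId: "ORD-1001", amount: 5999 },
  { activityType: "PRODUCT_REVIEWED", productSku: "SHOE-001", rating: 5 },
  { activityType: "LOGIN", ip: "103.20.10.1" }
];
feed.forEach(a => console.log(describeActivity(a)));
```

The fallback branch matters in a polymorphic collection: writers can introduce a new activityType before every reader has been updated.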

Best for

  • Feed design
  • Audit trail
  • Notification system
  • Event store
  • Timeline views

Avoid when

Each type has completely different query patterns, retention policies, and indexes.

11. Approximation Pattern

What it solves

The Approximation Pattern sacrifices exactness for scale and cost efficiency.

Examples:

  • View counts
  • Like counts
  • Impressions
  • Page analytics

Instead of updating the same counter document for every event, aggregate periodically or use partitioned counters.

Example: Partitioned Counter

db.productCounters.updateOne(
{
sku: "SHOE-001",
counterType: "views",
partition: Math.floor(Math.random() * 20)
},
{
$inc: {
count: 1
}
},
{
upsert: true
}
)

Read total

db.productCounters.aggregate([
{
$match: {
sku: "SHOE-001",
counterType: "views"
}
},
{
$group: {
_id: "$sku",
totalViews: { $sum: "$count" }
}
}
])
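To see why partitioning does not lose counts, the following in-memory simulation spreads 1000 increments across partitions and then sums them, mirroring the update and aggregation above. A Map stands in for the productCounters collection, and the function names are assumptions.

```javascript
const PARTITIONS = 20; // same partition count as the update above

// Picks a random partition; the random source is injectable for testing.
function partitionKey(sku, counterType, random = Math.random) {
  return { sku, counterType, partition: Math.floor(random() * PARTITIONS) };
}

// In-memory stand-in for productCounters: one entry per partition document.
const store = new Map();

// Emulates updateOne with $inc and upsert: true.
function increment(key) {
  const id = `${key.sku}|${key.counterType}|${key.partition}`;
  store.set(id, (store.get(id) || 0) + 1);
}

// Emulates the $match + $group + $sum aggregation.
function readTotal(sku, counterType) {
  let total = 0;
  for (const [id, count] of store) {
    if (id.startsWith(`${sku}|${counterType}|`)) total += count;
  }
  return total;
}

for (let i = 0; i < 1000; i++) {
  increment(partitionKey("SHOE-001", "views"));
}
console.log(readTotal("SHOE-001", "views"), "views in", store.size, "partition documents");
```

The sum is exact at the moment it is computed. The approximation typically enters when the application caches that sum and refreshes it periodically instead of aggregating on every page view.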

Best for

  • High-volume counters
  • Social media metrics
  • Product views
  • Ad impressions
  • Trending calculations

Avoid when

You need financial-grade accuracy on every read.

12. Tree Pattern

What it solves

Use Tree Patterns for hierarchical data.

Examples:

  • Product categories
  • Organization hierarchy
  • Location hierarchy
  • Folder structures
  • Menu navigation

Materialized Path Example

db.categories.insertMany([
{
categoryId: "electronics",
name: "Electronics",
path: ",electronics,"
},
{
categoryId: "mobiles",
name: "Mobiles",
parentId: "electronics",
path: ",electronics,mobiles,"
},
{
categoryId: "android-phones",
name: "Android Phones",
parentId: "mobiles",
path: ",electronics,mobiles,android-phones,"
}
])

Query descendants

db.categories.find({
path: /^,electronics,mobiles,/
})

Index

db.categories.createIndex({ path: 1 })
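A few helpers keep materialized paths consistent. The sketch below builds a child path from its parent, creates the anchored prefix regex used for subtree queries, and reads ancestors straight from the path. All function names are illustrative assumptions.

```javascript
// Builds a child's materialized path from its parent's path.
// A missing parent path means the node is a root.
function childPath(parentPath, categoryId) {
  return `${parentPath || ","}${categoryId},`;
}

// Escapes regex metacharacters so ids are matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored prefix regex: matches the node itself and all its descendants.
// A left-anchored pattern like this can make use of the index on path.
function subtreeRegex(path) {
  return new RegExp("^" + escapeRegex(path));
}

// Lists ancestor ids from a path, root first, without any extra query.
function ancestorsOf(path) {
  return path.split(",").filter(Boolean).slice(0, -1);
}

const mobiles = childPath(",electronics,", "mobiles");
console.log(mobiles);
console.log(subtreeRegex(mobiles).test(",electronics,mobiles,android-phones,"));
console.log(ancestorsOf(",electronics,mobiles,android-phones,"));

// In mongosh:
//   db.categories.find({ path: subtreeRegex(",electronics,mobiles,") })
```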

Best for

  • Category trees
  • Navigation menus
  • Org structures
  • File folders

Avoid when

You need complex graph traversal. For deep graph relationships, use $graphLookup carefully or consider a graph-specialized design.

13. Schema Versioning Pattern

What it solves

Applications evolve. Schema Versioning allows documents with different versions to coexist safely.

db.customers.insertOne({
customerId: "CUST-1001",
schemaVersion: 2,
name: {
first: "Sachin",
last: "Gupta"
},
contact: {
email: "sachin@example.com",
mobile: "+91XXXXXXXXXX"
}
})

Older document:

db.customers.insertOne({
customerId: "CUST-999",
schemaVersion: 1,
fullName: "Old Customer",
email: "old@example.com"
})

Application migration logic:

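function normalizeCustomer(doc) {
  // Upgrade version 1 documents to the version 2 shape at read time.
  if (doc.schemaVersion === 1) {
    const [first, ...rest] = doc.fullName.split(" ")
    return {
      customerId: doc.customerId,
      schemaVersion: 2,
      name: {
        first: first,
        last: rest.join(" ")
      },
      contact: {
        email: doc.email
      }
    }
  }

  // Version 2 documents already have the current shape.
  return doc
}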
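Normalizing on read can be paired with a lazy write-back, so each old document is upgraded the first time it is touched. A sketch of the update such a write-back might issue is shown below; the helper name `buildV1ToV2Update` is an assumption.

```javascript
// Builds the filter and update that rewrite a version 1 customer in place.
// Returns null for documents that are already on the current version.
function buildV1ToV2Update(doc) {
  if (doc.schemaVersion !== 1) return null;
  const [first = "", ...rest] = (doc.fullName || "").trim().split(/\s+/);
  return {
    // Matching on schemaVersion makes the update a no-op if another
    // process has already migrated this document.
    filter: { customerId: doc.customerId, schemaVersion: 1 },
    update: {
      $set: {
        schemaVersion: 2,
        name: { first, last: rest.join(" ") },
        contact: { email: doc.email }
      },
      // Remove the fields that only exist in the old shape.
      $unset: { fullName: "", email: "" }
    }
  };
}

const migration = buildV1ToV2Update({
  customerId: "CUST-999",
  schemaVersion: 1,
  fullName: "Old Customer",
  email: "old@example.com"
});
console.log(JSON.stringify(migration, null, 2));

// In mongosh:
//   db.customers.updateOne(migration.filter, migration.update)
```

Over time the share of version 1 documents shrinks, and the read-time branch can eventually be retired.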

Best for

  • Large collections
  • Zero-downtime migrations
  • Evolving APIs
  • SaaS platforms

Avoid when

You can migrate all records safely before release.