MongoDB data modeling is not about copying relational tables into collections. It is about designing documents around how the application reads, writes, scales, and evolves. MongoDB’s own guidance is very clear: model data based on application access patterns, and keep data accessed together in the same document where it makes sense.
This blog explains the most important MongoDB data modeling patterns, where each pattern fits, when to avoid it, and how to implement it with practical examples.
The Golden Rule of MongoDB Data Modeling
In relational modeling, we usually start with entities and normalization.
In MongoDB, start with:
- What does the application need to read together?
- What does it update together?
- What must scale independently?
- What query must be fast?
- What data grows without limit?
A good MongoDB model is not always the most normalized model. It is the model that minimizes unnecessary joins, reduces query fan-out, supports indexing, and keeps documents within healthy size and growth boundaries.
MongoDB documents can have flexible structures, and documents in the same collection do not need to have identical fields. This flexibility supports polymorphic models, evolving schemas, and domain-oriented design.
Pattern Selection Matrix
| Pattern | Best For | Avoid When |
|---|---|---|
| Embedded Document Pattern | One-to-one, one-to-few, tightly coupled data | Child array grows without limit |
| Reference Pattern | Many-to-many, large independent entities | You always need child data with parent |
| Extended Reference Pattern | Reducing repeated lookups | Referenced fields change very frequently |
| Subset Pattern | Product pages, profile previews, dashboards | Full child data is always needed |
| Bucket Pattern | Time-series, telemetry, logs, IoT events | Random updates inside buckets dominate |
| Attribute Pattern | Dynamic attributes, product specs, metadata | Attributes are fixed and simple |
| Computed Pattern | Expensive repeated calculations | Source data changes every second |
| Outlier Pattern | Handling exceptional large documents | Most records are large, not exceptional |
| Archive Pattern | Lifecycle management, cold data | Historical data must be updated frequently |
| Polymorphic / Single Collection Pattern | Similar entities with slight variation | Query patterns are completely different |
| Approximation Pattern | Counters, likes, views, analytics | Exact real-time precision is mandatory |
| Tree Pattern | Hierarchies, org charts, categories | Deep graph traversal is required |
| Schema Versioning Pattern | Large collections, zero-downtime migrations, evolving APIs | All records can be migrated safely before release |
Embedded Document Pattern
What it solves
Use embedding when related data is naturally owned by the parent and commonly read together.
Classic examples:
- Customer with addresses
- Product with top reviews
- Order with line items
- Vehicle trip with route summary
- Employee with department snapshot
MongoDB recommends embedding when data is accessed together because it can reduce the need for application-side joins and allow one-query reads.
Example: E-commerce Order
db.orders.insertOne({
  orderId: "ORD-1001",
  customerId: "CUST-501",
  orderDate: ISODate("2026-05-20T10:30:00Z"),
  status: "CONFIRMED",
  shippingAddress: {
    name: "Sachin Gupta",
    city: "Delhi",
    pincode: "110001",
    country: "India"
  },
  items: [
    {
      sku: "SHOE-001",
      name: "Running Shoes",
      qty: 1,
      price: 5999
    },
    {
      sku: "TSHIRT-022",
      name: "Cotton T-Shirt",
      qty: 2,
      price: 999
    }
  ],
  payment: {
    mode: "UPI",
    status: "SUCCESS"
  }
})
Why this works
The order and its line items are normally read together. Embedding avoids querying orders, then order_items, then products just to show an order confirmation page.
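As a rough illustration of the one-document read, the sketch below derives an order summary purely from the embedded items. It is plain JavaScript working on an in-memory copy of the order; `summarizeOrder` is a hypothetical helper name, not part of any MongoDB API.

```javascript
// Sample order shaped like the document above (only the fields needed here).
const order = {
  orderId: "ORD-1001",
  items: [
    { sku: "SHOE-001", name: "Running Shoes", qty: 1, price: 5999 },
    { sku: "TSHIRT-022", name: "Cotton T-Shirt", qty: 2, price: 999 }
  ]
};

// Hypothetical helper: builds the confirmation summary from embedded items only.
function summarizeOrder(doc) {
  const lines = doc.items.map((item) => ({
    sku: item.sku,
    lineTotal: item.qty * item.price
  }));
  const total = lines.reduce((sum, line) => sum + line.lineTotal, 0);
  return { orderId: doc.orderId, lines, total };
}

const summary = summarizeOrder(order);
console.log(summary.total); // 7997
```

No second collection is consulted at any point, which is the property the embedded model is meant to provide.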
Index
db.orders.createIndex({ customerId: 1, orderDate: -1 })
db.orders.createIndex({ orderId: 1 }, { unique: true })
When not to use
Avoid embedding when:
- The child array grows without bound and could push the document toward MongoDB's 16 MB document size limit.
- Child records are updated independently at high frequency.
- Many parents share the same child record.
Reference Pattern
What it solves
Use references when related entities have independent lifecycles or many-to-many relationships.
Examples:
- Users and roles
- Products and suppliers
- Students and courses
- Vehicles and drivers
- Customers and loyalty programs
Example: Product and Supplier
db.products.insertOne({
  sku: "CAMERA-001",
  title: "Mirrorless Camera",
  category: "Electronics",
  supplierId: ObjectId("6650aa111111111111111111"),
  price: 84999
})
db.suppliers.insertOne({
  _id: ObjectId("6650aa111111111111111111"),
  name: "Global Camera Distributors",
  country: "India",
  rating: 4.7
})
Query with $lookup
db.products.aggregate([
  {
    $match: {
      sku: "CAMERA-001"
    }
  },
  {
    $lookup: {
      from: "suppliers",
      localField: "supplierId",
      foreignField: "_id",
      as: "supplier"
    }
  },
  {
    $unwind: "$supplier"
  }
])
When this works
Use referencing when the supplier is reused across thousands of products and supplier details are updated independently.
When not to use
Avoid pure referencing when the application always needs the supplier name, rating, and country with every product listing. In that case, use the Extended Reference Pattern.
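To make the cost of referencing concrete, here is a minimal in-memory sketch of the join that the $lookup and $unwind stages perform. It assumes string ids as stand-ins for ObjectId values; `joinSuppliers` and the second product (`LENS-002`) are hypothetical and exist only for illustration.

```javascript
const products = [
  { sku: "CAMERA-001", title: "Mirrorless Camera", supplierId: "sup-1" },
  { sku: "LENS-002", title: "Prime Lens", supplierId: "sup-1" } // hypothetical sample
];
const suppliers = [
  { _id: "sup-1", name: "Global Camera Distributors", country: "India", rating: 4.7 }
];

// Index suppliers by _id once, then attach the matching supplier to each product.
function joinSuppliers(productDocs, supplierDocs) {
  const byId = new Map(supplierDocs.map((s) => [s._id, s]));
  return productDocs
    // $unwind without extra options drops products that have no match.
    .filter((p) => byId.has(p.supplierId))
    .map((p) => ({ ...p, supplier: byId.get(p.supplierId) }));
}

const joined = joinSuppliers(products, suppliers);
console.log(joined[0].supplier.name); // Global Camera Distributors
```

Every listing request pays for this extra resolution step, which is why a page that always shows supplier details may be better served by duplicating a few fields.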
Extended Reference Pattern
What it solves
The Extended Reference Pattern duplicates selected fields from a referenced document into the main document to reduce frequent lookups.
MongoDB describes this pattern as useful when applications perform repetitive joins to look up data; by bringing frequently accessed fields into the main document, reads become faster, but the tradeoff is duplication.
Example: Product with Supplier Snapshot
db.products.insertOne({
  sku: "CAMERA-001",
  title: "Mirrorless Camera",
  category: "Electronics",
  price: 84999,
  supplierRef: {
    supplierId: ObjectId("6650aa111111111111111111"),
    name: "Global Camera Distributors",
    country: "India",
    rating: 4.7
  }
})
Now the product listing page does not need a $lookup:
db.products.find(
  {
    category: "Electronics"
  },
  {
    title: 1,
    price: 1,
    "supplierRef.name": 1,
    "supplierRef.rating": 1
  }
)
Good use cases
- Product catalog
- Order history
- Customer profile snapshot
- Vendor snapshot in purchase orders
- Driver snapshot in trip records
Critical warning
Do not duplicate fields that change frequently.
Good duplicated fields:
- supplier name
- customer name
- product title at order time
- city
- rating category
Risky duplicated fields:
- inventory
- wallet balance
- real-time price
- credit limit
- current fraud score
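Duplicated fields still need to be refreshed when the source changes, however rarely. The sketch below builds the filter and update documents that could be handed to `db.products.updateMany` after a supplier is renamed. `buildSnapshotUpdate` is a hypothetical helper, the new supplier name is invented, and the id is a plain string here, whereas in mongosh it would be wrapped in ObjectId.

```javascript
// Hypothetical helper: maps changed supplier fields onto the embedded snapshot.
function buildSnapshotUpdate(supplierId, changedFields) {
  const set = {};
  for (const [field, value] of Object.entries(changedFields)) {
    set[`supplierRef.${field}`] = value; // dotted path into the snapshot
  }
  return {
    filter: { "supplierRef.supplierId": supplierId },
    update: { $set: set }
  };
}

const snapshotUpdate = buildSnapshotUpdate("6650aa111111111111111111", {
  name: "Global Camera Distribution Ltd" // invented value for illustration
});
console.log(JSON.stringify(snapshotUpdate.update));
// {"$set":{"supplierRef.name":"Global Camera Distribution Ltd"}}
```

The wider this fan-out update becomes, and the more often it runs, the weaker the case for duplicating that field.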
Subset Pattern
What it solves
The Subset Pattern stores the most frequently accessed subset of child data inside the parent document, while the full child data remains in another collection.
MongoDB documentation gives a similar e-commerce example: product pages may show only the five most recent reviews, while older reviews can stay in a separate collection.
Example: Product with Recent Reviews
db.products.insertOne({
  sku: "SHOE-001",
  title: "Running Shoes",
  price: 5999,
  recentReviews: [
    {
      reviewId: ObjectId(),
      userName: "Amit",
      rating: 5,
      comment: "Very comfortable",
      reviewDate: ISODate("2026-05-18T10:00:00Z")
    },
    {
      reviewId: ObjectId(),
      userName: "Neha",
      rating: 4,
      comment: "Good for daily running",
      reviewDate: ISODate("2026-05-17T09:00:00Z")
    }
  ],
  reviewCount: 245,
  avgRating: 4.6
})
Full reviews collection:
db.productReviews.insertOne({
  productSku: "SHOE-001",
  userId: "USER-889",
  rating: 5,
  comment: "Excellent cushioning for long runs",
  reviewDate: ISODate("2026-05-18T10:00:00Z"),
  images: [
    "review1.jpg",
    "review2.jpg"
  ]
})
Indexes
db.products.createIndex({ sku: 1 }, { unique: true })
db.productReviews.createIndex({ productSku: 1, reviewDate: -1 })
Best for
- Product detail pages
- Profile pages
- News feeds
- Dashboards
- Recent transactions
- Recent alerts
Avoid when
The application always needs all child records. In that case, referencing or pagination is better.
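The subset only stays useful if it is trimmed on every write. The following sketch shows one possible way to keep `recentReviews` capped: first as an in-memory function, then as an update document that relies on the `$each`, `$sort`, and `$slice` modifiers of `$push`. The cap of five and both helper names are assumptions, and recomputing `avgRating` is left out for brevity.

```javascript
const MAX_RECENT = 5; // assumed cap, matching the "five most recent reviews" example

// In-memory equivalent: append, sort newest first, keep the first MAX_RECENT.
function addRecentReview(product, review) {
  const recentReviews = [...product.recentReviews, review]
    .sort((a, b) => b.reviewDate - a.reviewDate)
    .slice(0, MAX_RECENT);
  return { ...product, recentReviews, reviewCount: product.reviewCount + 1 };
}

// The same intent as an update document for the products collection.
function buildRecentReviewUpdate(review) {
  return {
    $push: {
      recentReviews: { $each: [review], $sort: { reviewDate: -1 }, $slice: MAX_RECENT }
    },
    $inc: { reviewCount: 1 }
  };
}

// Usage: add six reviews to an empty product; only the newest five remain.
let product = { sku: "SHOE-001", recentReviews: [], reviewCount: 0 };
for (let day = 1; day <= 6; day++) {
  product = addRecentReview(product, {
    rating: 5,
    reviewDate: new Date(Date.UTC(2026, 4, day))
  });
}
console.log(product.recentReviews.length); // 5
```

The full review would still be inserted into the separate reviews collection; only the preview array is trimmed.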
Bucket Pattern
What it solves
The Bucket Pattern groups many events into one document. It is especially useful for time-series, IoT, telemetry, logs, sensor data, and clickstream events.
MongoDB also has native Time Series Collections, which are optimized for timestamped data. For modern telemetry and IoT use cases, prefer native Time Series Collections unless you have a custom bucketing requirement. MongoDB data modeling guidance includes storage strategy and lifecycle management as key modeling considerations.
Manual Bucket Example: Vehicle Telematics
db.vehicleTelemetryBuckets.insertOne({
  vehicleId: "VH-1001",
  bucketStart: ISODate("2026-05-20T10:00:00Z"),
  bucketEnd: ISODate("2026-05-20T10:05:00Z"),
  count: 2,
  readings: [
    {
      ts: ISODate("2026-05-20T10:00:01Z"),
      speed: 68,
      fuel: 44,
      engineTemp: 91,
      gps: [77.5946, 12.9716]
    },
    {
      ts: ISODate("2026-05-20T10:00:02Z"),
      speed: 70,
      fuel: 44,
      engineTemp: 92,
      gps: [77.5948, 12.9719]
    }
  ]
})
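With manual bucketing, the application has to decide which bucket each reading belongs to. The sketch below assumes fixed five-minute buckets aligned to UTC clock time, as in the example above, and builds an upsert that appends a reading to its bucket. `bucketFor` and `buildBucketUpsert` are hypothetical helper names.

```javascript
const BUCKET_MS = 5 * 60 * 1000; // assumed bucket width: five minutes

// Round a timestamp down to the start of its bucket.
function bucketFor(ts) {
  const startMs = Math.floor(ts.getTime() / BUCKET_MS) * BUCKET_MS;
  return {
    bucketStart: new Date(startMs),
    bucketEnd: new Date(startMs + BUCKET_MS)
  };
}

// Build the pieces of an upsert that appends one reading to its bucket.
function buildBucketUpsert(vehicleId, reading) {
  const { bucketStart, bucketEnd } = bucketFor(reading.ts);
  return {
    filter: { vehicleId, bucketStart },
    update: {
      $push: { readings: reading },
      $inc: { count: 1 },
      $setOnInsert: { bucketEnd } // only written when the bucket is created
    },
    options: { upsert: true }
  };
}

const bucketOp = buildBucketUpsert("VH-1001", {
  ts: new Date("2026-05-20T10:03:17Z"),
  speed: 70
});
console.log(bucketOp.filter.bucketStart.toISOString()); // 2026-05-20T10:00:00.000Z
```

In mongosh, the three parts would be passed as `db.vehicleTelemetryBuckets.updateOne(filter, update, options)`. A production version would likely also cap the number of readings per bucket.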
Native Time Series Collection
db.createCollection("vehicleTelemetry", {
  timeseries: {
    timeField: "timestamp",
    metaField: "vehicle",
    granularity: "seconds"
  }
})
db.vehicleTelemetry.insertOne({
  vehicle: {
    vehicleId: "VH-1001",
    fleetId: "FLEET-NORTH",
    vehicleType: "truck"
  },
  timestamp: ISODate("2026-05-20T10:00:01Z"),
  speed: 68,
  fuel: 44,
  engineTemp: 91,
  location: {
    type: "Point",
    coordinates: [77.5946, 12.9716]
  }
})
Query: average speed per vehicle
db.vehicleTelemetry.aggregate([
  {
    $match: {
      timestamp: {
        $gte: ISODate("2026-05-20T10:00:00Z"),
        $lt: ISODate("2026-05-20T11:00:00Z")
      },
      "vehicle.fleetId": "FLEET-NORTH"
    }
  },
  {
    $group: {
      _id: "$vehicle.vehicleId",
      avgSpeed: { $avg: "$speed" },
      maxTemp: { $max: "$engineTemp" }
    }
  }
])
Best for
- Telematics
- IoT sensors
- Clickstream
- Logs
- Financial ticks
- Energy meter readings
- Industrial equipment monitoring
Avoid when
- Every event needs independent frequent updates.
- You need strict event-level transactional updates.
- Buckets can grow unpredictably.
Attribute Pattern
What it solves
The Attribute Pattern is useful when documents have many optional, dynamic, or sparsely populated attributes.
Common use cases:
- Product specifications
- Vehicle features
- Medical observations
- Real estate attributes
- Insurance policy clauses
- Marketplace filters
Instead of creating hundreds of sparse fields, store attributes as key-value objects or arrays.
Bad Model
{
  "sku": "LAPTOP-001",
  "ram": "16GB",
  "processor": "Intel i7",
  "screenSize": "14 inch",
  "battery": "70Wh",
  "touchscreen": true,
  "graphicsCard": "RTX 4060"
}
This is okay for one category, but not for a marketplace with laptops, gowns, cameras, shoes, jewellery, and furniture.
Attribute Pattern Model
db.products.insertOne({
  sku: "LAPTOP-001",
  title: "Business Laptop",
  category: "Electronics",
  price: 85000,
  attributes: [
    {
      k: "ram",
      v: "16GB"
    },
    {
      k: "processor",
      v: "Intel i7"
    },
    {
      k: "screenSize",
      v: "14 inch"
    },
    {
      k: "touchscreen",
      v: true
    }
  ]
})
Index
db.products.createIndex({
  category: 1,
  "attributes.k": 1,
  "attributes.v": 1
})
Query
db.products.find({
  category: "Electronics",
  attributes: {
    $elemMatch: {
      k: "ram",
      v: "16GB"
    }
  }
})
Best for
- E-commerce catalog filters
- Dynamic metadata
- Multi-category platforms
- Configurable products
- Search facets
Avoid when
Fields are stable, mandatory, and heavily queried. In that case, top-level indexed fields are simpler and faster.
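When an existing catalog stores specifications as flat fields, converting to and from the k/v array is mechanical. The sketch below shows one way to do that conversion and to build the matching `$elemMatch` filter; all three helper names are hypothetical.

```javascript
// Flat specs object -> array of { k, v } pairs.
function toAttributes(specs) {
  return Object.entries(specs).map(([k, v]) => ({ k, v }));
}

// Array of { k, v } pairs -> flat object, convenient for rendering.
function fromAttributes(attributes) {
  return Object.fromEntries(attributes.map(({ k, v }) => [k, v]));
}

// Filter that matches a single attribute pair within one category.
function attributeFilter(category, k, v) {
  return { category, attributes: { $elemMatch: { k, v } } };
}

const laptopAttributes = toAttributes({ ram: "16GB", touchscreen: true });
console.log(JSON.stringify(laptopAttributes));
// [{"k":"ram","v":"16GB"},{"k":"touchscreen","v":true}]
```

Using `$elemMatch` matters here: it requires the key and the value to match inside the same array element rather than anywhere in the array.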
Computed Pattern
What it solves
The Computed Pattern stores pre-calculated values to avoid expensive repeated computation.
Examples:
- Average rating
- Order total
- Wallet ledger balance
- Product review count
- Monthly revenue
- Driver score
- Customer lifetime value
Example: Product Rating Summary
db.products.insertOne({
  sku: "SHOE-001",
  title: "Running Shoes",
  ratingSummary: {
    avgRating: 4.6,
    reviewCount: 245,
    fiveStarCount: 180,
    fourStarCount: 40
  }
})
When a new review comes:
db.products.updateOne(
  { sku: "SHOE-001" },
  {
    $inc: {
      "ratingSummary.reviewCount": 1,
      "ratingSummary.fiveStarCount": 1
    },
    // New average computed by the application: (4.6 * 245 + 5) / 246 ≈ 4.60
    $set: {
      "ratingSummary.avgRating": 4.6
    }
  }
)
Best for
- Dashboard metrics
- Frequently displayed counts
- Aggregated KPIs
- Leaderboards
- Product ratings
Avoid when
- Exact real-time correctness is mandatory and updates are extremely frequent.
- Many concurrent updates hit the same document, causing write contention.
In high-write workloads, consider partitioned counters.
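The new average does not require rescanning every review; it can be derived from the stored average and count. A minimal sketch of that arithmetic follows, with `applyRating` as a hypothetical helper name.

```javascript
// Fold one new rating into an existing { avgRating, reviewCount } summary.
function applyRating(summary, rating) {
  const reviewCount = summary.reviewCount + 1;
  const avgRating =
    (summary.avgRating * summary.reviewCount + rating) / reviewCount;
  return { ...summary, reviewCount, avgRating };
}

const nextSummary = applyRating({ avgRating: 4.6, reviewCount: 245 }, 5);
console.log(nextSummary.reviewCount);          // 246
console.log(nextSummary.avgRating.toFixed(2)); // "4.60"
```

If the stored average is rounded on each write, small errors can accumulate over time. Keeping a running sum of ratings next to the count and dividing at read time is one way to avoid that drift.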
Outlier Pattern
What it solves
Most documents follow normal size and growth behavior, but a few become unusually large. The Outlier Pattern keeps normal documents simple and moves exceptional data into a separate collection.
MongoDB University’s advanced schema design patterns include the Outlier Pattern as a pattern for handling exceptional cases in schema design.
Example: Influencer Profile with Huge Followers
Normal users:
db.users.insertOne({
  userId: "USER-1001",
  name: "Regular User",
  followerIds: [
    "USER-2001",
    "USER-2002"
  ],
  hasOutlierFollowers: false
})
Outlier user:
db.users.insertOne({
  userId: "USER-9999",
  name: "Celebrity User",
  followerPreview: [
    "USER-2001",
    "USER-2002",
    "USER-2003"
  ],
  followerCount: 9000000,
  hasOutlierFollowers: true
})
Separate follower collection:
db.userFollowers.insertOne({
  userId: "USER-9999",
  followerId: "USER-8888",
  followedAt: ISODate("2026-05-20T10:00:00Z")
})
Index
db.userFollowers.createIndex({ userId: 1, followedAt: -1 })
Best for
- Social followers
- Product reviews
- Viral posts
- Popular products
- Chat groups with huge memberships
Avoid when
Most documents are outliers. Then it is not an outlier problem; it is your core data model.
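The application needs a rule for deciding which path a new follower takes. The sketch below is one possible routing function built around the `hasOutlierFollowers` flag; the threshold of 1000 embedded followers and the name `planFollowerWrite` are assumptions, not values from MongoDB guidance.

```javascript
const FOLLOWER_EMBED_LIMIT = 1000; // assumed threshold, tune per workload

// Decide whether a new follower is embedded or written to the overflow collection.
function planFollowerWrite(user, followerId) {
  const embedded = Array.isArray(user.followerIds) ? user.followerIds.length : 0;
  if (!user.hasOutlierFollowers && embedded < FOLLOWER_EMBED_LIMIT) {
    return {
      collection: "users",
      update: { $addToSet: { followerIds: followerId } }
    };
  }
  return {
    collection: "userFollowers",
    document: { userId: user.userId, followerId },
    // The user is flagged as an outlier once they cross the limit.
    userUpdate: { $set: { hasOutlierFollowers: true }, $inc: { followerCount: 1 } }
  };
}

const regularPlan = planFollowerWrite(
  { userId: "USER-1001", followerIds: ["USER-2001"], hasOutlierFollowers: false },
  "USER-3001"
);
const celebrityPlan = planFollowerWrite(
  { userId: "USER-9999", hasOutlierFollowers: true },
  "USER-8888"
);
console.log(regularPlan.collection, celebrityPlan.collection); // users userFollowers
```

Readers follow the same flag: most profiles are served from the user document alone, and only flagged users trigger a second query.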
Archive Pattern
What it solves
The Archive Pattern separates hot operational data from cold historical data.
MongoDB’s best-practice guidance asks modelers to consider lifecycle management from creation to archiving and deletion for performance, cost, and security.
Example: Orders
Hot collection:
db.orders.insertOne({
  orderId: "ORD-1001",
  customerId: "CUST-501",
  orderDate: ISODate("2026-05-20T10:30:00Z"),
  status: "DELIVERED",
  totalAmount: 7997
})
Archive collection:
db.orders_archive.insertOne({
  orderId: "ORD-2019-9001",
  customerId: "CUST-501",
  orderDate: ISODate("2019-02-10T10:30:00Z"),
  status: "DELIVERED",
  totalAmount: 4999,
  archivedAt: ISODate("2026-05-20T00:00:00Z")
})
TTL for Temporary Data
db.sessions.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0 }
)
Best for
- Logs
- Old orders
- Old trips
- Old telemetry
- Compliance retention
- Cost optimization
Avoid when
The application frequently updates old data.
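An archive job needs a clear rule for what counts as cold. The sketch below assumes a simple policy, delivered orders older than a retention window, and shows both the filter document and an in-memory split for checking the rule. The retention value and helper names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Orders dated before this cutoff are candidates for the archive.
function archiveCutoff(now, retentionDays) {
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// Filter document that could select archive candidates in the hot collection.
function buildArchiveFilter(now, retentionDays) {
  return {
    status: "DELIVERED",
    orderDate: { $lt: archiveCutoff(now, retentionDays) }
  };
}

// In-memory split, useful for verifying the rule on sample data.
function splitHotCold(orders, now, retentionDays) {
  const cutoff = archiveCutoff(now, retentionDays);
  const hot = [];
  const cold = [];
  for (const order of orders) {
    const isCold = order.status === "DELIVERED" && order.orderDate < cutoff;
    (isCold ? cold : hot).push(order);
  }
  return { hot, cold };
}

const sampleOrders = [
  { orderId: "ORD-1001", status: "DELIVERED", orderDate: new Date("2026-05-20T10:30:00Z") },
  { orderId: "ORD-2019-9001", status: "DELIVERED", orderDate: new Date("2019-02-10T10:30:00Z") }
];
const split = splitHotCold(sampleOrders, new Date("2026-05-21T00:00:00Z"), 365);
console.log(split.cold.map((o) => o.orderId)); // [ 'ORD-2019-9001' ]
```

A real job would typically copy the matching documents into the archive collection first and delete them from the hot collection only after the copy is confirmed.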
Polymorphic / Single Collection Pattern
What it solves
Use this when multiple document types share common query patterns but have different fields.
Examples:
- Notifications
- Activity feeds
- Content management
- Payments
- Events
- Audit logs
Example: Activity Feed
db.activities.insertMany([
  {
    activityType: "ORDER_PLACED",
    userId: "USER-1001",
    createdAt: ISODate("2026-05-20T10:00:00Z"),
    orderId: "ORD-1001",
    amount: 5999
  },
  {
    activityType: "PRODUCT_REVIEWED",
    userId: "USER-1001",
    createdAt: ISODate("2026-05-20T11:00:00Z"),
    productSku: "SHOE-001",
    rating: 5
  },
  {
    activityType: "LOGIN",
    userId: "USER-1001",
    createdAt: ISODate("2026-05-20T12:00:00Z"),
    ip: "103.20.10.1"
  }
])
Index
db.activities.createIndex({ userId: 1, createdAt: -1 })
db.activities.createIndex({ activityType: 1, createdAt: -1 })
Query
db.activities.find({
  userId: "USER-1001"
}).sort({
  createdAt: -1
}).limit(20)
Best for
- Feed design
- Audit trail
- Notification system
- Event store
- Timeline views
Avoid when
Each type has completely different query patterns, retention policies, and indexes.
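On the application side, a polymorphic collection is usually consumed by branching on the discriminator field. Here is a minimal sketch of that dispatch, using the `activityType` values from the feed example; `describeActivity` is a hypothetical helper.

```javascript
// Turn any activity document into a display string based on its type.
function describeActivity(activity) {
  switch (activity.activityType) {
    case "ORDER_PLACED":
      return `Placed order ${activity.orderId} for ${activity.amount}`;
    case "PRODUCT_REVIEWED":
      return `Rated ${activity.productSku} ${activity.rating}/5`;
    case "LOGIN":
      return `Logged in from ${activity.ip}`;
    default:
      // Unknown types are tolerated so new activity kinds do not break old readers.
      return `Unknown activity: ${activity.activityType}`;
  }
}

const feed = [
  { activityType: "ORDER_PLACED", orderId: "ORD-1001", amount: 5999 },
  { activityType: "PRODUCT_REVIEWED", productSku: "SHOE-001", rating: 5 },
  { activityType: "LOGIN", ip: "103.20.10.1" }
];
const feedLines = feed.map(describeActivity);
console.log(feedLines[1]); // Rated SHOE-001 5/5
```

The shared fields (`userId`, `createdAt`) drive the query and the index; the type-specific fields are only interpreted at render time.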
Approximation Pattern
What it solves
The Approximation Pattern sacrifices exactness for scale and cost efficiency.
Examples:
- View counts
- Like counts
- Impressions
- Page analytics
Instead of updating the same counter document for every event, aggregate periodically or use partitioned counters.
Example: Partitioned Counter
db.productCounters.updateOne(
  {
    sku: "SHOE-001",
    counterType: "views",
    partition: Math.floor(Math.random() * 20)
  },
  {
    $inc: {
      count: 1
    }
  },
  {
    upsert: true
  }
)
Read Total
db.productCounters.aggregate([
  {
    $match: {
      sku: "SHOE-001",
      counterType: "views"
    }
  },
  {
    $group: {
      _id: "$sku",
      totalViews: { $sum: "$count" }
    }
  }
])
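The write and read operations above can be simulated in memory to see how the partitions behave. The sketch below uses a Map in place of the counters collection and the same 20 partitions; the function names are hypothetical.

```javascript
const PARTITIONS = 20; // same partition count as the example above

// Increment one randomly chosen partition for the given sku.
function incrementView(counters, sku, random = Math.random) {
  const partition = Math.floor(random() * PARTITIONS);
  const key = `${sku}:views:${partition}`;
  counters.set(key, (counters.get(key) || 0) + 1);
}

// Sum all partitions for the sku, like the $group stage does.
function totalViews(counters, sku) {
  let total = 0;
  for (const [key, count] of counters) {
    if (key.startsWith(`${sku}:views:`)) total += count;
  }
  return total;
}

const viewCounters = new Map();
for (let i = 0; i < 1000; i++) {
  incrementView(viewCounters, "SHOE-001");
}
console.log(totalViews(viewCounters, "SHOE-001")); // 1000
```

Writes are spread over at most 20 documents instead of one, which reduces contention; the price is that every read has to sum the partitions, or serve a periodically refreshed total.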
Best for
- High-volume counters
- Social media metrics
- Product views
- Ad impressions
- Trending calculations
Avoid when
You need financial-grade accuracy on every read.
Tree Pattern
What it solves
Use Tree Patterns for hierarchical data.
Examples:
- Product categories
- Organization hierarchy
- Location hierarchy
- Folder structures
- Menu navigation
Materialized Path Example
db.categories.insertMany([
  {
    categoryId: "electronics",
    name: "Electronics",
    path: ",electronics,"
  },
  {
    categoryId: "mobiles",
    name: "Mobiles",
    parentId: "electronics",
    path: ",electronics,mobiles,"
  },
  {
    categoryId: "android-phones",
    name: "Android Phones",
    parentId: "mobiles",
    path: ",electronics,mobiles,android-phones,"
  }
])
Query descendants
db.categories.find({
  path: /^,electronics,mobiles,/
})
Index
db.categories.createIndex({ path: 1 })
Best for
- Category trees
- Navigation menus
- Org structures
- File folders
Avoid when
You need complex graph traversal. For deep graph relationships, use $graphLookup carefully or consider a graph-specialized design.
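For the materialized path approach shown earlier, two small helpers cover most of the work: building a child's path from its parent's, and selecting descendants by prefix. The sketch below runs against an in-memory array; the helper names are hypothetical, and unlike the query above it excludes the node itself from its descendants.

```javascript
// Build a node's path from its parent's path (empty parent means a root node).
function childPath(parentPath, categoryId) {
  return `${parentPath || ","}${categoryId},`;
}

// Escape regex metacharacters so ids are matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// All nodes whose path starts with the given path, excluding the node itself.
function descendantsOf(categories, path) {
  const prefix = new RegExp(`^${escapeRegex(path)}`);
  return categories.filter((c) => c.path !== path && prefix.test(c.path));
}

const electronicsPath = childPath("", "electronics");
const mobilesPath = childPath(electronicsPath, "mobiles");
const categoryDocs = [
  { categoryId: "electronics", path: electronicsPath },
  { categoryId: "mobiles", path: mobilesPath },
  { categoryId: "android-phones", path: childPath(mobilesPath, "android-phones") }
];
console.log(descendantsOf(categoryDocs, electronicsPath).map((c) => c.categoryId));
// [ 'mobiles', 'android-phones' ]
```

Because the pattern is anchored at the start of the string, the equivalent database query can use the index on `path`. Moving a subtree, on the other hand, means rewriting the path of every descendant.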
Schema Versioning Pattern
What it solves
Applications evolve. Schema Versioning allows documents with different versions to coexist safely.
db.customers.insertOne({
  customerId: "CUST-1001",
  schemaVersion: 2,
  name: {
    first: "Sachin",
    last: "Gupta"
  },
  contact: {
    email: "sachin@example.com",
    mobile: "+91XXXXXXXXXX"
  }
})
Older document:
db.customers.insertOne({
  customerId: "CUST-999",
  schemaVersion: 1,
  fullName: "Old Customer",
  email: "old@example.com"
})
Application migration logic:
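function normalizeCustomer(doc) {
  // Documents that are not version 1 already have the current shape.
  if (doc.schemaVersion !== 1) {
    return doc
  }
  // Version 1 stored a single fullName and a top-level email.
  const [first, ...rest] = doc.fullName.split(" ")
  return {
    customerId: doc.customerId,
    schemaVersion: 2, // mark the result as upgraded to the current shape
    name: {
      first,
      last: rest.join(" ")
    },
    contact: {
      email: doc.email
    }
  }
}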
Best for
- Large collections
- Zero-downtime migrations
- Evolving APIs
- SaaS platforms
Avoid when
You can migrate all records safely before release.