Every SaaS company faces the same architectural decision early on: how do you store data for hundreds — eventually thousands — of customers in a way that’s secure, performant, and won’t require a painful rewrite when you hit scale?
Get it right and your infrastructure grows with your revenue. Get it wrong and your engineering team spends six months rebuilding the data layer instead of shipping features.
Here are the three patterns, when to use each, and the tradeoffs nobody tells you about until it’s too late.
Pattern 1: Shared Database, Shared Schema
Every tenant’s data lives in the same tables. A tenant_id column on every row separates Customer A’s data from Customer B’s.
users table:
| id | tenant_id | name | email |
|----|-----------|------------|--------------------|
| 1 | acme | Alice | alice@acme.com |
| 2 | globex | Bob | bob@globex.com |
When to use it: Early-stage SaaS (pre-Series A), simple data models, low compliance requirements, rapid iteration.
Advantages:
- Simplest to build and deploy. One database, one schema, one migration.
- Cheapest infrastructure cost. One PostgreSQL instance serves all tenants.
- Easiest to query across tenants for internal analytics.
- Schema changes apply instantly to all tenants.
Risks:
- Data isolation depends entirely on application logic. One missing
WHERE tenant_id = ?clause and you’ve leaked data between customers. This is the number one cause of SaaS security incidents. - Noisy neighbor problem. One tenant running a heavy export job degrades performance for everyone.
- Compliance friction. GDPR “right to deletion” means you’re deleting rows from shared tables. SOC 2 auditors will ask hard questions about isolation.
- Hard to offer per-tenant customization. Custom fields, custom workflows, or tenant-specific schema changes become awkward.
Row-Level Security (RLS) as mitigation: PostgreSQL and most modern databases support RLS policies that enforce tenant isolation at the database level — not just the application. This moves the security boundary from “developer remembered to add the WHERE clause” to “the database won’t return rows you shouldn’t see.” If you use this pattern, RLS is not optional.
Pattern 2: Database-Per-Tenant
Every customer gets their own database (or schema). Complete isolation.
When to use it: Enterprise SaaS, regulated industries (healthcare, finance, insurance), large tenants with custom requirements.
Advantages:
- Perfect data isolation. Impossible to accidentally leak data. Each tenant’s database is a separate security boundary.
- Per-tenant performance tuning. You can give your biggest customer dedicated resources without affecting anyone else.
- Simple compliance. Data deletion = drop database. Data residency = put that database in the EU region.
- Backup and restore per tenant. You can restore one customer’s data without touching anyone else.
Risks:
- Operational overhead explodes. At 100 tenants you manage 100 databases. At 1,000 you manage 1,000. Schema migrations must be applied to every single one.
- Cross-tenant analytics is painful. Your internal BI team can’t simply query across tenants — they need a separate analytics pipeline that aggregates data from all databases.
- Cost scales linearly. Each database has a base cost. At scale, this is significantly more expensive than shared.
- Connection pool exhaustion. Your application needs connections to every tenant database. At scale, this becomes a serious engineering challenge.
Managed solutions: Citus (distributed PostgreSQL), Azure SQL Elastic Pools, and AWS RDS Proxy can reduce the operational burden. But they don’t eliminate it — they just push the complexity to a different layer.
Pattern 3: Hybrid (The One Most Scaling SaaS Companies End Up With)
Operational data is shared (Pattern 1 with RLS). Analytics data is separated. Large enterprise tenants get isolated databases. Small tenants share.
When to use it: Series B+ SaaS, mixed customer base (SMB + enterprise), need for both operational efficiency and enterprise-grade isolation.
Architecture:
Tier 1 (SMB customers):
→ Shared PostgreSQL with RLS
→ Shared analytics in BigQuery/Snowflake (partitioned by tenant_id)
Tier 2 (Enterprise customers):
→ Dedicated database per tenant
→ Dedicated Snowflake warehouse or BigQuery dataset
→ Custom data retention policies
All tiers:
→ Shared event pipeline (Kafka → data warehouse)
→ Shared application layer with tenant-aware routing
Advantages:
- Optimizes cost for the majority (SMB) while satisfying enterprise requirements.
- Lets you charge more for “dedicated infrastructure” as an enterprise plan feature.
- Analytics pipeline is unified — your data team sees all tenants in one place, with proper access controls.
Risks:
- Most complex to build and maintain. Two infrastructure patterns means two sets of runbooks, monitoring, and deployment processes.
- Tenant routing logic. Your application needs to know which tenant is on which tier and route accordingly. This is a source of bugs.
- Migration between tiers. When a customer upgrades from SMB to enterprise, you need to migrate their data from shared to dedicated. This needs to be automated and tested.
How to Choose
| Factor | Shared (Pattern 1) | Dedicated (Pattern 2) | Hybrid (Pattern 3) |
|---|---|---|---|
| Team size | < 5 engineers | > 10 engineers | > 5 engineers |
| Number of tenants | 10-10,000 | 10-500 | 100-100,000 |
| Largest tenant | < 1% of total data | > 10% of total data | Mix of both |
| Compliance | Basic | SOC 2, HIPAA, GDPR strict | Mixed requirements |
| Budget | Minimize infra cost | Enterprise can pay premium | Optimize by tier |
| Time to market | Critical | Less urgent | Have runway to build properly |
The Analytics Layer (Regardless of Pattern)
Whichever pattern you choose for your operational database, your analytics infrastructure should be separate. Never run analytics queries against your production database.
The standard SaaS analytics stack:
- Event collection — Segment, Rudderstack, or custom Kafka pipeline
- Data warehouse — Snowflake, BigQuery, or Redshift (partitioned by tenant)
- Transformation — dbt for SQL-based transforms, tenant-aware models
- BI layer — Metabase, Looker, or embedded analytics (Qrvey, Explo)
- Reverse ETL — Census or Hightouch to push insights back into the product
Embedded Analytics: The Feature Your Customers Will Ask For
At some point, your customers will say: “I want to see dashboards inside your product.”
This is embedded analytics — and it’s a separate architectural decision. Options:
- Build it yourself — Full control, high engineering cost. D3.js, Recharts, or custom charts.
- Embedded BI tool — Metabase (open source), Qrvey, Explo, or Looker embedded. Faster to ship, per-tenant security built in.
- Notebook-style — Hex or Observable for data-heavy customers who want to explore.
The key constraint is multi-tenant security. Every embedded chart must filter to the current tenant’s data. RLS at the warehouse level is the safest approach.
Common Mistakes
- Choosing database-per-tenant at day one because “we might need it later.” You probably won’t for 2+ years. Start shared, add isolation when a paying customer demands it.
- No tenant_id on analytics events. If your event pipeline doesn’t tag every event with a tenant identifier, your analytics warehouse will be useless for per-customer insights.
- Running analytics on the production database. This will slow down your product for all tenants. Separate the read path from day one.
- Ignoring data deletion. GDPR and CCPA require you to delete a customer’s data on request. In a shared database, this is row-level deletion across dozens of tables. Plan for it before an enterprise customer asks.
- No cost attribution per tenant. If you can’t measure how much infrastructure each tenant consumes, you can’t price your enterprise tier correctly.
Simba Hu helps companies make better decisions with data and AI — from strategy to implementation. Based in Tokyo, serving clients globally. Book a strategy call or visit simbahu.com.