
MongoDB to PostgreSQL Migration: What We Learned Moving 130M Rows

We migrated 2M MongoDB documents to 130M PostgreSQL rows — not because we were broken, but because we could see the break coming. Here's the full story.

Siddhartha Katiyar

Software Engineer

Tags: postgresql, mongodb, database, migration, backend

Let me say this upfront: we didn't migrate because our system was collapsing. We migrated because it would.

That's a different story than most migration posts you read. There was no outage, no 3 AM PagerDuty spiral, no moment where the database literally caught fire. It was quieter than that — and in some ways more unsettling. Just a slow realization that the architecture we had was debt that would eventually come due.

This is that story. I'm Siddhartha Katiyar, an SDE at Jsmon, an early-stage attack surface management startup. This is a first-hand account of how we migrated our entire production database — 2 million MongoDB documents into 130 million PostgreSQL rows — without a single major incident.

TL;DR

  • Migrated 2M MongoDB documents into 130M PostgreSQL rows (normalization isn't free)
  • Relational queries were taking 5+ minutes; data consistency was becoming a real problem
  • Used a JSONB bridge for fast cut-over + built a custom Go writer for high-throughput inserts
  • Complex queries now run in 25 seconds instead of 5 minutes; schema constraints caught bugs that had been silently lurking for months
  • Used AI agents (Claude + Gemini) to compress what would've been ~2 weeks of execution into 2 hours — more on that at the end

What We Were Building

Jsmon is an attack surface management (ASM) platform — continuous surveillance of your external-facing infrastructure. Subdomains, exposed secrets, vulnerabilities, DNS changes, JS file diffs, across thousands of assets, running 24/7.

The data model is not simple. Workspaces, users, findings, secrets, scan results, domain records, URL snapshots — and these things are deeply relational. A finding belongs to a scan. A scan belongs to a workspace. A secret is tied to a URL which is tied to a domain which is tied to a scan run.

At the time of migration:

  • ~3,000 users
  • ~2 million documents in MongoDB
  • A Node.js + Go backend, with Redis for queuing

After migration: 130 million rows in PostgreSQL. That's not a typo — 2M documents → 130M rows. Normalization isn't free.

Why MongoDB in the First Place?

Honest answer: I don't fully know. I joined after the initial architectural decisions were made.

But I've seen enough early-stage startups to reconstruct the reasoning. MongoDB is the "just ship it" database. Schema flexibility means you don't have to think hard upfront, documents map cleanly to JSON objects, and that maps cleanly to how Node.js devs think. You iterate fast, nest things, move on.

It's not a bad choice for day one — I want to be clear about that. The early team made a reasonable call. The problem is what MongoDB implicitly encourages, and what you don't notice until you're already knee-deep in it.

The Warning Signs

Nobody came to me with a postmortem. There was no incident. What we had instead were specific pain points that kept resurfacing.

Aggregations across collections were painful

MongoDB is excellent when your data fits neatly inside a single document. Jsmon's data doesn't. A query like "show me all high-severity findings for this workspace, grouped by scan, with the associated user details" means you either embed everything (hello, duplication), do manual joins in application code across multiple queries, or write a $lookup aggregation pipeline that looks like it was designed by someone who genuinely dislikes you.

We were doing all three at different points. The $lookup pipelines for our more complex queries were deeply unreadable. And when a pipeline takes 5 minutes to run — yes, five minutes — you stop trusting your own system.
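
To make that concrete, here's a rough sketch of what one of those pipelines looks like from Go using the official mongo-driver. The collection and field names are illustrative, not our actual schema, and the real pipelines were longer than this:

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// Illustrative sketch only: collection and field names are assumptions.
func highSeverityByScan(ctx context.Context, db *mongo.Database, workspaceID string) error {
	pipeline := mongo.Pipeline{
		// Restrict to one workspace, high severity only.
		bson.D{{Key: "$match", Value: bson.D{
			{Key: "workspaceId", Value: workspaceID},
			{Key: "severity", Value: "high"},
		}}},
		// "Join" findings against scans by hand.
		bson.D{{Key: "$lookup", Value: bson.D{
			{Key: "from", Value: "scans"},
			{Key: "localField", Value: "scanId"},
			{Key: "foreignField", Value: "_id"},
			{Key: "as", Value: "scan"},
		}}},
		bson.D{{Key: "$unwind", Value: "$scan"}},
		// Another "join" for the user who started the scan.
		bson.D{{Key: "$lookup", Value: bson.D{
			{Key: "from", Value: "users"},
			{Key: "localField", Value: "scan.createdBy"},
			{Key: "foreignField", Value: "_id"},
			{Key: "as", Value: "user"},
		}}},
		bson.D{{Key: "$unwind", Value: "$user"}},
		// Group findings per scan, keeping the user's email.
		bson.D{{Key: "$group", Value: bson.D{
			{Key: "_id", Value: "$scanId"},
			{Key: "user", Value: bson.D{{Key: "$first", Value: "$user.email"}}},
			{Key: "findings", Value: bson.D{{Key: "$push", Value: "$$ROOT"}}},
		}}},
	}

	cur, err := db.Collection("findings").Aggregate(ctx, pipeline)
	if err != nil {
		return err
	}
	defer cur.Close(ctx)

	var results []bson.M
	if err := cur.All(ctx, &results); err != nil {
		return err
	}
	log.Printf("%d scans with high-severity findings", len(results))
	return nil
}
```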

Data duplication was everywhere

Mongo nudges you toward denormalization, so we had the same data living in multiple places. User metadata duplicated across workspaces. Scan metadata embedded in findings. The consistency burden fell entirely on application code, which meant it was inconsistent.

No real transactions

This one's underrated. When you're managing security scan state — a scan starts, runs, writes findings, updates counts, marks completion — you want those operations to be atomic. MongoDB's multi-document transactions exist but feel bolted-on, with real performance overhead. We had logic that was "probably consistent" rather than "guaranteed consistent." That's fine until it isn't.
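
To be concrete about what "atomic" means here: in Postgres, that whole sequence becomes one transaction. A minimal sketch with Go's database/sql, using table and column names that are illustrative rather than our actual schema:

```go
package main

import (
	"context"
	"database/sql"
)

// Finding is an illustrative stand-in for a scan result row.
type Finding struct {
	Severity string
	Detail   string
}

// finishScan writes findings, updates counts, and marks completion as one
// atomic unit: either everything commits or nothing does.
func finishScan(ctx context.Context, db *sql.DB, scanID int64, findings []Finding) (err error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// Any error below rolls the whole scan update back; nothing half-written.
	defer func() {
		if err != nil {
			tx.Rollback()
		}
	}()

	for _, f := range findings {
		if _, err = tx.ExecContext(ctx,
			`INSERT INTO findings (scan_id, severity, detail) VALUES ($1, $2, $3)`,
			scanID, f.Severity, f.Detail); err != nil {
			return err
		}
	}
	if _, err = tx.ExecContext(ctx,
		`UPDATE scans SET status = 'completed', finding_count = $2 WHERE id = $1`,
		scanID, len(findings)); err != nil {
		return err
	}
	return tx.Commit()
}
```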

Debugging felt like archaeology

Querying MongoDB in Compass for simple things was fine. But when something went wrong in production and you needed to cross-reference findings with scan states with workspace configs, you were writing nested pipelines in a shell. There was no moment of clarity where you could just look at the data and understand what happened.

The Mental Model That Broke

MongoDB's core philosophy is: model your data around how you access it, not around how it relates. This works perfectly when your access patterns are known, stable, and simple. Jsmon's access patterns are none of those things.

Security data is inherently cross-cutting. You want to ask things like:

  • "Which workspaces have domains that share this vulnerable JS library?"
  • "Which scans found secrets that haven't been remediated across all users?"

These are relational questions. They're questions relational databases were literally invented to answer. Trying to answer them in MongoDB is like trying to drive a screw with a hammer. The "denormalize everything" philosophy made sense for document-level access. It became a liability the moment we wanted analytics, reporting, or any query joining more than two logical entities.

Why PostgreSQL? Why Not Just Fix Mongo?

We considered it. There are things you can do to make MongoDB more bearable — better indexing, tighter embedding strategy, schema validation (yes, Mongo has it), materialized views via application-level caching.

But the fundamental issue wasn't performance tuning. It was the data model. You can't index your way out of a normalization problem.

We briefly considered DynamoDB too, and ruled it out fast. DynamoDB is excellent if your access patterns are rigid and you're operating at Amazon scale. We're not, and its query model would've made the analytical problem worse.

PostgreSQL won for four specific reasons:

  1. Joins. Actual, real, performant joins. This alone fixes 80% of our query problems.
  2. Transactions with rollbacks. Scan state can now be atomic — everything commits or nothing does.
  3. Constraints. Foreign keys, unique constraints, not-null guarantees. Data integrity enforced at the database layer, not scattered across application code.
  4. JSONB. This is underrated as a migration enabler. PostgreSQL's JSONB type allowed us to migrate MongoDB documents as raw binary JSON columns first, then normalize them progressively into proper relational tables. It gave us a much softer landing than a hard cut-over to a fully normalized schema from day one.

How the Migration Actually Happened

This is the part most posts skip over or sanitize.

The JSONB bridge strategy

The JSONB bridge is a migration technique where MongoDB documents are first imported as raw JSONB columns in PostgreSQL before being progressively normalized into relational tables. Rather than requiring a perfect schema upfront, you get your data into Postgres quickly, then clean it up incrementally. This is what made our one-day migration possible — we weren't fully transforming 130M rows on day one, we were landing them safely first.
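
Here's a minimal sketch of what the bridge looks like in practice, written against database/sql with illustrative table and column names (the real normalization happened table by table and over several passes, not in one statement):

```go
package main

import (
	"context"
	"database/sql"
)

// bridgeFindings sketches the two phases of the JSONB bridge. Table and column
// names are assumptions, not Jsmon's real schema.
func bridgeFindings(ctx context.Context, db *sql.DB) error {
	stmts := []string{
		// Phase 1: a landing table that accepts Mongo documents as-is.
		`CREATE TABLE IF NOT EXISTS findings_raw (
			id        BIGSERIAL PRIMARY KEY,
			doc       JSONB NOT NULL,
			migrated  BOOLEAN NOT NULL DEFAULT FALSE
		)`,
		// The normalized target table.
		`CREATE TABLE IF NOT EXISTS findings (
			id       BIGSERIAL PRIMARY KEY,
			scan_id  BIGINT NOT NULL,
			severity TEXT NOT NULL,
			detail   TEXT
		)`,
		// Phase 2 (run later, incrementally): peel fields out of the raw JSONB
		// into proper columns, marking each source row as migrated in the same
		// statement so it can be re-run safely.
		`WITH moved AS (
			UPDATE findings_raw SET migrated = TRUE
			WHERE NOT migrated
			RETURNING doc
		)
		INSERT INTO findings (scan_id, severity, detail)
		SELECT (doc->>'scanId')::BIGINT, doc->>'severity', doc->>'detail'
		FROM moved`,
	}
	for _, s := range stmts {
		if _, err := db.ExecContext(ctx, s); err != nil {
			return err
		}
	}
	return nil
}
```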

The strategy: big bang rewrite

We didn't dual-write. We didn't run parallel systems for weeks. We rewrote all queries at once and cut over.

I know. "That's terrifying." It was. But it worked because Jsmon is a startup without bandwidth for a 6-month incremental migration, our data model was changing significantly enough that dual-writing would've been its own engineering project, and we had confidence in the new schema — because that's where we spent most of our time, not on the queries.

We also built a custom pgwriter — a Go service that consumes from Redis queues and handles all writes to PostgreSQL. This decoupled write throughput from the application layer. At peak we're handling ~2,000 writes per second through this pipeline.
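
Here's a heavily simplified sketch of that writer's shape: a loop that pops jobs off a Redis list and writes them to Postgres. The queue name, payload format, and table are assumptions, and the real pgwriter batches inserts and handles retries:

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

// writeJob is an assumed payload shape; the real queue format will differ.
type writeJob struct {
	ScanID   int64  `json:"scan_id"`
	Severity string `json:"severity"`
	Detail   string `json:"detail"`
}

// run drains one Redis list and writes each job to Postgres, decoupling write
// throughput from the application layer.
func run(ctx context.Context, rdb *redis.Client, db *sql.DB) error {
	for {
		// Block up to 5s waiting for the next job on the "pg_writes" list.
		res, err := rdb.BLPop(ctx, 5*time.Second, "pg_writes").Result()
		if err == redis.Nil {
			continue // nothing queued, keep polling
		}
		if err != nil {
			return err
		}
		var job writeJob
		if err := json.Unmarshal([]byte(res[1]), &job); err != nil {
			log.Printf("bad payload, skipping: %v", err)
			continue
		}
		if _, err := db.ExecContext(ctx,
			`INSERT INTO findings (scan_id, severity, detail) VALUES ($1, $2, $3)`,
			job.ScanID, job.Severity, job.Detail); err != nil {
			return err
		}
	}
}
```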

The schema was the hard part

Everyone focuses on query rewrites. The schema design took longer and was far more consequential. Getting the normalization right, deciding what to index (and critically, what not to index — indexes aren't free), designing for query patterns we knew were coming — this is where the real work was.

Some index decisions were painful lessons: we over-indexed early and saw write performance suffer; some queries that seemed like they'd benefit from a composite index actually performed better with a partial index; JSONB columns need GIN indexes, not the default B-tree.
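
In DDL terms, those lessons translate into statements like the following (table and column names illustrative, matching the bridge sketch above):

```go
package main

import (
	"context"
	"database/sql"
)

// applyIndexes shows the two index lessons above with assumed names. Keep in
// mind every index added here also costs something on every write.
func applyIndexes(ctx context.Context, db *sql.DB) error {
	stmts := []string{
		// JSONB containment and path queries need a GIN index; a default
		// B-tree on a jsonb column only helps whole-value comparisons.
		`CREATE INDEX IF NOT EXISTS findings_raw_doc_gin
		   ON findings_raw USING GIN (doc)`,
		// A partial index covering only the rows a hot query actually touches
		// can beat a wider composite index and is much cheaper to maintain.
		`CREATE INDEX IF NOT EXISTS findings_high_by_scan
		   ON findings (scan_id)
		   WHERE severity = 'high'`,
	}
	for _, s := range stmts {
		if _, err := db.ExecContext(ctx, s); err != nil {
			return err
		}
	}
	return nil
}
```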

The data migration took one day

Including debugging. 2 million Mongo documents → 130 million PostgreSQL rows in one day. Some of that speed came from the JSONB bridge — we could import documents without fully transforming them immediately. The rest came from aggressive parallel inserts and good schema prep.

What Actually Got Better

Query performance. The most dramatic improvement: a complex aggregation that was taking 5 minutes now runs in 25 seconds. Several analytical queries went from 8 seconds → 500ms, and that includes Redis caching on top of Postgres. Join queries that were previously multi-round-trips in application code are now single SQL statements.

The confidence improvement matters too. When a Postgres query returns data, I trust it. With Mongo aggregations I was never fully sure about edge cases in the pipeline.

Data integrity. We caught bugs that had been silently present for months in MongoDB — foreign key violations that would've caused silent data corruption, nulls where there shouldn't be nulls. Postgres told us at insertion time. Mongo would've silently accepted it and caused a production incident six months later.

Developer experience for analytical work. A SQL query with three JOINs is infinitely more readable than the equivalent Mongo aggregation pipeline. For simple CRUD though — more on that below.
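
For a sense of that readability difference, here's roughly the SQL equivalent of the pipeline sketched earlier, again with illustrative table and column names:

```go
package main

import (
	"context"
	"database/sql"
	"log"
)

// Illustrative only: table and column names are assumptions.
const highSeverityByScanSQL = `
SELECT s.id AS scan_id, u.email, COUNT(f.id) AS findings
FROM findings f
JOIN scans s      ON s.id = f.scan_id
JOIN workspaces w ON w.id = s.workspace_id
JOIN users u      ON u.id = s.created_by
WHERE w.id = $1 AND f.severity = 'high'
GROUP BY s.id, u.email`

func highSeverityByScanPG(ctx context.Context, db *sql.DB, workspaceID int64) error {
	rows, err := db.QueryContext(ctx, highSeverityByScanSQL, workspaceID)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var scanID, count int64
		var email string
		if err := rows.Scan(&scanID, &email, &count); err != nil {
			return err
		}
		log.Printf("scan %d (%s): %d high-severity findings", scanID, email, count)
	}
	return rows.Err()
}
```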

What Got Worse (Honesty Section)

Simple CRUD got more verbose. With MongoDB and a JS ORM, inserting a document is trivial. With raw pg queries — we chose not to use an ORM, intentionally — even a simple insert requires more boilerplate. Real cost.
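
A small example of that boilerplate, assuming plain database/sql and an illustrative workspaces table:

```go
package main

import (
	"context"
	"database/sql"
)

// In Mongo the equivalent was roughly one line: collection.insertOne(doc).
// With raw SQL and no ORM, even a trivial insert carries its own plumbing.
// Table and column names here are illustrative.
func createWorkspace(ctx context.Context, db *sql.DB, name string, ownerID int64) (int64, error) {
	var id int64
	err := db.QueryRowContext(ctx,
		`INSERT INTO workspaces (name, owner_id, created_at)
		 VALUES ($1, $2, NOW())
		 RETURNING id`,
		name, ownerID,
	).Scan(&id)
	return id, err
}
```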

Schema rigidity cuts both ways. When you know what you're storing, constraints are wonderful. When you're still figuring it out, they're friction. Early-stage products sometimes need to move faster than Postgres's discipline allows.

I genuinely miss Compass. MongoDB Compass for visual document exploration is great. pgAdmin exists but the raw data browsing experience in Compass is just more intuitive. Small thing, but I notice it.

Would I Start with MongoDB Again?

Here's where I'll probably lose some people: yes.

The speed of early iteration matters. When you don't actually know your data model — really know it, not just think you know it — Mongo's flexibility is genuinely valuable. You can evolve the schema as you learn what you're building.

What I'd change: plan for the migration earlier. The mistake isn't choosing MongoDB. The mistake is treating it as a permanent decision. If you're building something with relational data (and most business applications have relational data, whether you admit it or not), build with the assumption that you'll migrate when the model stabilizes.

For anyone starting fresh with a clearer product vision: start with Postgres. The initial schema work is harder. Early iteration is slower. But you avoid the migration debt entirely.

The migration isn't the worst thing in the world — but two weeks of planning plus execution time is not nothing.

The Part Nobody Puts in the Blog Post

This migration was done with AI.

The planning happened over Cursor with Claude Opus for architecture — 16 minutes to produce the migration plan. The execution used Gemini 2.5 Pro, running up to 10 parallel agents, completing in 47 minutes.

Total time: 2 hours from final prompt to finished migration. Total cost: ₹1,563 in Gemini API credits.

Two weeks of planning and schema design — that was human. The execution that would've taken a developer one to two weeks? Two hours.

I'm not writing this to flex. I'm writing it because I think we're at an inflection point most engineers haven't fully processed yet. The fundamentals still matter — I needed to understand normalization, indexing, query optimization, Redis architecture, and transaction semantics to write a prompt that produced correct output. The AI doesn't replace understanding. It replaces typing.

But if you have the understanding, your execution velocity has changed by an order of magnitude. What used to be a "senior engineer, two weeks" project is now an "engineer with clear thinking, two hours" project. That compression ratio is worth sitting with.

Hard Lessons, Summarized

  1. Your data is probably relational. Admit it early. MongoDB is great for truly document-centric data. Most business apps aren't document-centric.
  2. Normalization debt compounds. Every shortcut you take embedding or duplicating data is a future migration pain point. You don't pay now, but you pay.
  3. The schema IS the architecture. More time on schema design means less time on everything else. We spent more time on the PostgreSQL schema than on the query rewrites. Right call.
  4. Indexing strategy is not obvious. Read the explain plans. Measure. Don't assume a column needs an index just because you query on it.
  5. Transactions aren't optional for stateful workflows. Scan state, payment state, order state — anything where partial writes cause corruption. MongoDB's transactions are an afterthought. PostgreSQL's are first-class.
  6. AI is a force multiplier, not a replacement. Fundamentals got more important, not less. But with clear fundamentals, execution speed is no longer your bottleneck.

Frequently Asked Questions

How long does a MongoDB to PostgreSQL migration take? It depends heavily on data volume and schema complexity. For Jsmon, migrating 2 million MongoDB documents into 130 million normalized PostgreSQL rows took one day of actual execution — but two weeks of planning, schema design, and query rewriting preceded that. Using a JSONB bridge to stage the import significantly reduced execution time. Teams running smaller datasets with simpler schemas can reasonably expect a few hours to a few days for the full process.

When should you migrate from MongoDB to PostgreSQL? The right time to migrate from MongoDB to PostgreSQL is when your data access patterns become primarily relational — meaning you're frequently joining across collections, struggling with data consistency across embedded documents, or finding that aggregation pipelines are becoming unreadable or slow. If your $lookup queries are taking minutes or your application code is doing manual joins across multiple queries, that's a strong signal. The longer you wait after those signs appear, the larger the migration debt becomes.

What are the performance gains from switching to PostgreSQL? Performance gains vary by query type. For complex aggregations involving multiple joins, the improvement can be dramatic — at Jsmon, one query went from 5 minutes down to 25 seconds after migration. Analytical queries that previously took 8 seconds dropped to 500ms with Redis caching on top. Simple CRUD operations see little to no improvement and may actually require more verbose code depending on whether you use an ORM. The biggest gains are in cross-entity queries that MongoDB handles poorly by design.

What is the JSONB bridge migration strategy? The JSONB bridge is a two-phase MongoDB to PostgreSQL migration approach where MongoDB documents are first imported as raw JSONB columns in PostgreSQL, then progressively normalized into relational tables. Instead of requiring a perfect normalized schema on day one, you land your data safely into Postgres first — preserving the document structure — and clean it up incrementally. This significantly reduces migration risk and execution time, since you're not blocked on finalizing every table relationship before you can start importing.

Should early-stage startups use MongoDB or PostgreSQL? For a brand-new product where the data model is still being discovered, MongoDB's schema flexibility offers real advantages — faster iteration, no migration overhead, and a natural fit with JSON-heavy Node.js development. However, most business applications eventually develop relational data needs. The pragmatic approach is to start with MongoDB if your domain is unclear, but plan explicitly for a PostgreSQL migration once your data model stabilizes — rather than treating MongoDB as a permanent architectural decision.

Closing

We moved from MongoDB to PostgreSQL not because we were broken, but because we could see the break coming. That kind of preemptive engineering is harder to justify than reactive firefighting — there's no incident report to point to, just the intuition that the current path leads somewhere bad.

It led somewhere better.

130 million rows. 25-second queries where there used to be 5-minute ones. Atomic scan state. Data integrity we can actually trust. And a codebase that, while more verbose in places, says what it means.

Was it worth it? Yes. Would I do it differently? Start with Postgres if the domain is clear. Start with Mongo if you're still learning what you're building.

Either way — the migration is survivable. Especially with good fundamentals, a clear schema, and ₹1,563 in API credits.

Siddhartha Katiyar is an SDE at Jsmon, where he works on backend infrastructure, database engineering, and security tooling. He holds a BTech in CSE from JIIT, Noida and is a Codeforces Expert (peak rating 1735). You can find him on X or read more at siddhartha.work.
