Using a CSV File in S3 as a “Database”: A Surprisingly Practical Pattern

Edge SQLite databases are popular right now: Cloudflare, Bunny.net, and many other edge cloud providers offer a SQLite-compatible edge database to their customers. A SQLite file can, in turn, be thought of as an indexed CSV file. So what if we put a CSV file in S3 and treat it as a read-only edge database? The knee-jerk reaction is usually:

That’s a hack.
That won’t scale.
Just use a real database.

Sometimes that reaction is right.
But sometimes it’s lazy.

For a large class of web and mobile applications, an S3-hosted CSV can be a perfectly valid — and even elegant — data store.

Let’s talk about when this works, why it works, and where it absolutely doesn’t.


The Core Idea

The pattern is simple:

  • Your data lives in a CSV file
  • The file is stored in Amazon S3
  • Your app:
    • Downloads it
    • Parses it
    • Uses it as a read-only or mostly-read dataset
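The three steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical CSV with `sku`, `name`, and `price` columns. The download step is shown only as a comment (boto3's `get_object` is one way to do it) so the parsing logic runs on its own; the bucket and key names are placeholders.

```python
import csv
import io


def parse_catalog(csv_text: str) -> dict[str, dict[str, str]]:
    """Parse CSV text into a dict keyed by the (assumed) `sku` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["sku"]: row for row in reader}


# In a real app the text would come from S3, for example:
#   body = boto3.client("s3").get_object(Bucket="my-bucket", Key="catalog.csv")["Body"]
#   csv_text = body.read().decode("utf-8")
# "my-bucket" and "catalog.csv" are placeholders, not names from this article.
SAMPLE = "sku,name,price\nA1,Widget,9.99\nB2,Gadget,19.50\n"

catalog = parse_catalog(SAMPLE)
print(catalog["A1"]["name"])  # Widget
```

Once parsed, the dataset is an ordinary in-memory dictionary, so every lookup is a local operation with no network round trip.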

No database server.
No connection pooling.
No schema migrations.

Just a file.


Why This Sounds Wrong (At First)

We’ve been trained to think:

  • Apps need databases
  • Databases need servers
  • Servers need maintenance

But that mental model assumes:

  • High write volume
  • Complex queries
  • Concurrent updates
  • Strong consistency guarantees

Many apps don’t actually need any of that.


Where This Pattern Shines

1. Read-heavy applications

If your app mostly reads data and rarely writes:

  • Product catalogs
  • Feature flags
  • Configuration tables
  • Static reference data
  • Game levels
  • Pricing matrices
  • Lookup tables

A CSV in S3 works extremely well.


2. Infrequent updates

If data updates:

  • Daily
  • Weekly
  • On deploy
  • Via an admin workflow

Then atomic file replacement in S3 is enough.

Upload a new CSV → done.
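Because every reader will pick up whatever you upload, it is worth validating the file before replacing it. The sketch below is one possible approach, not a prescribed workflow: the required columns are a hypothetical schema, and `s3_client` is assumed to be a boto3 S3 client (or anything exposing a compatible `put_object` method).

```python
import csv
import io

REQUIRED_COLUMNS = {"sku", "name", "price"}  # hypothetical schema


def validate_csv(csv_text: str) -> int:
    """Return the row count, or raise ValueError if the file looks wrong."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    if not rows:
        raise ValueError("refusing to publish an empty dataset")
    return len(rows)


def publish(s3_client, bucket: str, key: str, csv_text: str) -> int:
    """Validate first, then replace the object with a single PUT."""
    row_count = validate_csv(csv_text)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return row_count
```

A call might look like `publish(boto3.client("s3"), "my-bucket", "catalog.csv", text)`, where the bucket and key are again placeholders. A single-object PUT either succeeds in full or leaves the previous object in place, so readers should never see a half-written file.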


3. Predictable access patterns

CSV files are ideal when:

  • You load the whole dataset
  • Or scan sequentially
  • Or filter in memory

They are not ideal for ad-hoc querying across millions of rows.
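To make "filter in memory" concrete, here is a small sketch using only the standard library. The column names and sample rows are invented for illustration. It shows a one-off sequential scan, then a simple index for lookups that repeat.

```python
import csv
import io
from decimal import Decimal

SAMPLE = (
    "sku,category,price\n"
    "A1,tools,9.99\n"
    "B2,toys,19.50\n"
    "C3,tools,4.25\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Sequential scan with an in-memory filter: simple, and fast for small datasets.
cheap_tools = [
    r["sku"]
    for r in rows
    if r["category"] == "tools" and Decimal(r["price"]) < Decimal("5")
]
print(cheap_tools)  # ['C3']

# For repeated lookups, build an index once instead of rescanning every time.
by_category: dict[str, list[dict[str, str]]] = {}
for r in rows:
    by_category.setdefault(r["category"], []).append(r)
print(len(by_category["tools"]))  # 2
```

Note that every value comes out of the CSV as a string, so numeric comparisons need an explicit conversion such as `Decimal` above.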


The Hidden Advantages

Simplicity beats sophistication

An S3-backed CSV gives you:

  • No database provisioning
  • No migrations
  • No ORM
  • No connection errors
  • No cold starts (if cached properly)

Your failure modes shrink dramatically.


Cost is effectively zero

  • S3 storage costs pennies
  • Bandwidth is cheap
  • No idle database instances

For small to medium apps, this matters.


Operational robustness

S3 gives you:

  • High durability
  • Built-in redundancy
  • Strong read-after-write consistency, including overwrites of existing objects

In practice, it’s more reliable than many self-managed databases.


Easy local development

You can:

  • Download the CSV
  • Open it in Excel
  • Edit it by hand
  • Commit it to Git
  • Upload it to S3

No special tooling required.


Architecture Pattern

A common setup looks like this:

  1. CSV stored in S3
  2. CDN (CloudFront) in front of it
  3. App:
    • Fetches the file
    • Caches it in memory
    • Refreshes periodically
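The "cache in memory, refresh periodically" step could look roughly like the class below. It is a sketch under stated assumptions: `fetch` is any zero-argument function that returns the CSV text (for example, a wrapper around an HTTP GET to the CDN), and the five-minute default TTL is an arbitrary choice. The injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Optional


class CachedCsv:
    """Keep the fetched CSV text in memory and refetch after `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._text: Optional[str] = None
        self._loaded_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._text is None or now - self._loaded_at >= self._ttl:
            try:
                self._text = self._fetch()
                self._loaded_at = now
            except Exception:
                # Serve the stale copy if a refresh fails; re-raise only
                # when there is nothing cached yet. The next call retries.
                if self._text is None:
                    raise
        return self._text
```

Serving the stale copy on a failed refresh is a design choice that suits slowly changing reference data: a brief S3 or network hiccup then degrades freshness rather than availability.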

For mobile apps:

  • Fetch once on startup
  • Cache locally
  • Update in the background

This is shockingly fast and scalable.
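For the background update, a client does not have to re-download the file every time. S3 and CloudFront return an `ETag` header and honor `If-None-Match`, so a conditional request can skip the body when nothing has changed. The function below is a sketch using only Python's standard library; the URL you pass in and the 10-second timeout are assumptions.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def fetch_if_changed(
    url: str, etag: Optional[str]
) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, new_etag), or (None, etag) when the server answers 304."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the local copy
            return None, etag
        raise
```

The caller stores the returned ETag next to its local copy of the CSV and passes it back on the next check. A `None` body means the cached file is still current.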


What About Writes?

This is where discipline matters.

Good write patterns:

  • Admin-only updates
  • Batch uploads
  • Replace-the-file semantics
  • Append-only logs processed offline

Bad write patterns:

  • Per-user updates
  • Concurrent writes
  • Transactional requirements
  • Partial row updates

If your app needs frequent writes, this pattern breaks down fast.
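As an illustration of "append-only logs processed offline" combined with replace-the-file semantics, the sketch below folds a change log into the current snapshot and produces a new CSV that a batch job could then upload. The `sku` key column, the rule that a later entry replaces the whole row, and the sample data are all assumptions made for this example.

```python
import csv
import io


def apply_log(snapshot_csv: str, log_csv: str, key: str = "sku") -> str:
    """Fold an append-only change log into a snapshot; later entries win."""
    snapshot = csv.DictReader(io.StringIO(snapshot_csv))
    fieldnames = list(snapshot.fieldnames or [])
    rows = {row[key]: row for row in snapshot}

    for change in csv.DictReader(io.StringIO(log_csv)):
        rows[change[key]] = change  # upsert: replace or add the whole row

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()


snapshot = "sku,price\nA1,9.99\nB2,19.50\n"
log = "sku,price\nB2,17.00\nC3,4.25\n"
print(apply_log(snapshot, log))
```

With the sample data, the result keeps `A1` unchanged, updates `B2` to `17.00`, and appends `C3`. Because the merge runs in a single offline job, there are no concurrent writers to coordinate.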


CSV vs “Real” Databases: The Real Comparison

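| Requirement          | CSV in S3    | Traditional DB |
|----------------------|--------------|----------------|
| Read scalability     | ✅ Excellent | ✅ Excellent   |
| Write concurrency    | ❌ Poor      | ✅ Strong      |
| Query flexibility    | ❌ Limited   | ✅ Powerful    |
| Operational overhead | ✅ Minimal   | ❌ High        |
| Cost                 | ✅ Very low  | ❌ Higher      |
| Developer velocity   | ✅ High      | ⚠️ Medium      |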

The mistake is assuming every app needs everything in the right-hand column.


When This Is a Bad Idea

Be honest with yourself. Don’t use this if you need:

  • High-frequency writes
  • User-generated content
  • Transactions
  • Row-level locking
  • Complex joins
  • Real-time consistency

This pattern is not a database replacement.

It’s a data distribution strategy.


A Useful Mental Model

Instead of asking:

Is this a “real database”?

Ask:

Is my data closer to configuration… or interaction?

  • Configuration → CSV in S3 is often perfect
  • Interaction → you probably need a database

Final Takeaway

Using a CSV file in S3 as a backend isn’t a hack.

It’s a deliberate trade-off:

  • Less flexibility
  • More simplicity
  • Fewer moving parts

For read-heavy, low-write, predictable workloads:

A CSV in S3 can be the cleanest, cheapest, and most reliable “database” you’ll ever use.

The real mistake isn’t avoiding databases.

It’s using them when you don’t actually need one.
