Edge SQLite databases are popular right now: Cloudflare, Bunny.net, and many other edge cloud providers offer SQLite-compatible edge databases to their customers. SQLite, in turn, can be treated as an indexed CSV file. So what if we go one step simpler and put a CSV file in S3 to serve as a read-only edge database? The knee-jerk reaction is usually:
That’s a hack.
That won’t scale.
Just use a real database.
Sometimes that reaction is right.
But sometimes it’s lazy.
For a large class of web and mobile applications, an S3-hosted CSV can be a perfectly valid — and even elegant — data store.
Let’s talk about when this works, why it works, and where it absolutely doesn’t.
The Core Idea
The pattern is simple:
- Your data lives in a CSV file
- The file is stored in Amazon S3
- Your app:
  - Downloads it
  - Parses it
  - Uses it as a read-only or mostly-read dataset
No database server.
No connection pooling.
No schema migrations.
Just a file.
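In code, the whole load path fits in a few lines. This is a minimal Python sketch using only the standard library; the function names, the `id` key column, and the URL in the usage line are hypothetical. Decoding as `utf-8-sig` is an assumption that tolerates the byte-order mark spreadsheet tools often prepend when saving CSV.

```python
import csv
import io
import urllib.request


def parse_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by(rows: list[dict[str, str]], key: str) -> dict[str, dict[str, str]]:
    """Build an in-memory lookup table keyed on one column (later rows win on duplicates)."""
    return {row[key]: row for row in rows}


def load_table(url: str, key: str = "id", timeout: float = 10.0) -> dict[str, dict[str, str]]:
    """Download the whole file once, parse it, and index it by `key`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # "utf-8-sig" drops a leading byte-order mark if present; otherwise it is plain UTF-8.
        text = resp.read().decode("utf-8-sig")
    return index_by(parse_rows(text), key)
```

Usage would look like `products = load_table("https://example.com/catalog.csv")` followed by `products["42"]["price"]`. Every value comes back as a string: CSV has no types, so converting prices or booleans is the application's job.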
Why This Sounds Wrong (At First)
We’ve been trained to think:
- Apps need databases
- Databases need servers
- Servers need maintenance
But that mental model assumes:
- High write volume
- Complex queries
- Concurrent updates
- Strong consistency guarantees
Many apps don’t actually need any of that.
Where This Pattern Shines
1. Read-heavy applications
If your app mostly reads data and rarely writes:
- Product catalogs
- Feature flags
- Configuration tables
- Static reference data
- Game levels
- Pricing matrices
- Lookup tables
A CSV in S3 works extremely well.
2. Infrequent updates
If data updates:
- Daily
- Weekly
- On deploy
- Via an admin workflow
Then atomic file replacement in S3 is enough.
Upload a new CSV → done.
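Because the swap itself is a single upload, the discipline belongs in checking the file before it goes out. A minimal sketch, assuming an `id` key column and `boto3` for the upload (`put_object` is a real boto3 S3 client method; the bucket, object key, and column names are whatever you pass in). The validation is plain Python, so it could also run in CI on every commit of the CSV.

```python
import csv
import io


def validate_csv(text: str, required: set[str], id_column: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks safe to publish."""
    required = set(required) | {id_column}
    reader = csv.DictReader(io.StringIO(text))
    missing = required - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems: list[str] = []
    seen: set[str] = set()
    for row in reader:
        where = f"line {reader.line_num}"
        if None in row:  # DictReader files surplus fields under the key None
            problems.append(f"{where}: too many fields")
        if any(row[c] is None for c in required):  # short rows are padded with None
            problems.append(f"{where}: too few fields")
            continue
        key = row[id_column]
        if not key:
            problems.append(f"{where}: empty {id_column!r}")
        elif key in seen:
            problems.append(f"{where}: duplicate {id_column!r} {key!r}")
        seen.add(key)
    return problems


def publish(path: str, bucket: str, s3_key: str, required: set[str], id_column: str = "id") -> None:
    """Validate a local CSV and, only if it is clean, replace the object in S3."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    problems = validate_csv(text, required, id_column)
    if problems:
        raise ValueError("refusing to publish:\n" + "\n".join(problems))
    import boto3  # imported here so validation also runs where the AWS SDK is not installed

    boto3.client("s3").put_object(
        Bucket=bucket, Key=s3_key, Body=text.encode("utf-8"), ContentType="text/csv"
    )
```

Turning on S3 bucket versioning as well keeps every previous file, so a bad publish can be rolled back by restoring the prior version.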
3. Predictable access patterns
CSV files are ideal when:
- You load the whole dataset
- Or scan sequentially
- Or filter in memory
They are not ideal for ad-hoc querying across millions of rows.
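When in-memory scans stop being enough, the observation that SQLite is essentially an indexed CSV suggests a middle ground: load the same CSV into an in-memory SQLite database at startup and index the columns you filter on. A minimal sketch with Python's built-in `sqlite3`; the table and column names are hypothetical, and the header is assumed to be trusted because it is interpolated into SQL.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(text: str, table: str, index_on: list[str]) -> sqlite3.Connection:
    """Load CSV text into an in-memory SQLite table and index the given columns.

    Table and column names are interpolated into SQL, so this assumes a trusted,
    admin-authored file whose names contain no double quotes.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    # Columns are declared without a type, so values stay TEXT exactly as in the CSV;
    # use CAST(price AS REAL) or similar for numeric comparisons.
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    # Blank lines are skipped; a row with the wrong number of fields raises
    # sqlite3.ProgrammingError.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', (row for row in reader if row)
    )
    for col in index_on:
        conn.execute(f'CREATE INDEX "idx_{table}_{col}" ON "{table}" ("{col}")')
    conn.commit()
    return conn
```

The same conversion could also run offline, publishing a ready-built SQLite file to S3 next to (or instead of) the CSV.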
The Hidden Advantages
Simplicity beats sophistication
An S3-backed CSV gives you:
- No database provisioning
- No migrations
- No ORM
- No connection errors
- No cold starts (if cached properly)
Your failure modes shrink dramatically.
Cost is effectively zero
- S3 storage costs pennies
- Bandwidth is cheap
- No idle database instances
For small to medium apps, this matters.
Operational robustness
S3 gives you:
- High durability
- Built-in redundancy
- Strong read-after-write consistency, including for overwrites of existing objects
In practice, it’s more reliable than many self-managed databases.
Easy local development
You can:
- Download the CSV
- Open it in Excel
- Edit it by hand
- Commit it to Git
- Upload it to S3
No special tooling required.
Architecture Pattern
A common setup looks like this:
- CSV stored in S3
- CDN (CloudFront) in front of it
- App:
  - Fetches the file
  - Caches it in memory
  - Refreshes periodically
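The fetch, cache, and refresh loop can be sketched as a small TTL cache. It assumes the file is served with an `ETag` (S3 returns one for every object), so a refresh can be a conditional GET with `If-None-Match` that comes back `304 Not Modified` when nothing changed. `CsvCache`, `http_fetcher`, and the 60-second default are hypothetical choices, and the class is not thread-safe as written.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# A fetcher takes the last ETag seen (or None) and returns (etag, text) for new
# content, or None when the server answers 304 Not Modified.
Fetcher = Callable[[Optional[str]], Optional[tuple[str, str]]]


def http_fetcher(url: str, timeout: float = 10.0) -> Fetcher:
    def fetch(etag: Optional[str]) -> Optional[tuple[str, str]]:
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.headers.get("ETag", ""), resp.read().decode("utf-8-sig")
        except urllib.error.HTTPError as err:
            if err.code == 304:  # urlopen raises on 304; treat it as "unchanged"
                return None
            raise
    return fetch


class CsvCache:
    """Serves the CSV text from memory and re-checks the origin at most once per `ttl` seconds."""

    def __init__(self, fetch: Fetcher, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._etag: Optional[str] = None
        self._text: Optional[str] = None
        self._checked_at = 0.0

    def get(self) -> Optional[str]:
        now = self._clock()
        if self._text is None or now - self._checked_at >= self._ttl:
            try:
                result = self._fetch(self._etag)
            except Exception:
                if self._text is None:
                    raise  # nothing cached yet, so the caller has to see the failure
                result = None  # keep serving the stale copy until the next TTL window
            if result is not None:
                self._etag, self._text = result
            self._checked_at = now
        return self._text
```

A caller would build `cache = CsvCache(http_fetcher("https://example.com/catalog.csv"))` once and call `cache.get()` per request. If the origin becomes unreachable after the first load, the cache keeps serving the last good copy instead of failing.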
For mobile apps:
- Fetch once on startup
- Cache locally
- Update in the background
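Reads come from memory, local storage, or the nearest CDN edge, so this is shockingly fast and scales with the CDN rather than with a database server.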
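For the mobile variant, the key detail is that the on-disk copy is only ever replaced whole, so a crash mid-download cannot leave a truncated file behind. The sketch uses Python for consistency with the other examples; a real app would use its platform's storage and background-task APIs. All names are hypothetical, and `fetch` stands in for whatever download call the app uses.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def save_atomically(path: Path, text: str) -> None:
    """Write to a temp file next to `path`, then rename it over the old copy."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        # os.replace is atomic on POSIX when both paths are on the same filesystem,
        # so readers see either the old file or the new one, never a partial write.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def startup_load(cache_path: Path, fetch: Callable[[], str]) -> str:
    """Return the on-disk copy instantly if one exists; otherwise block on the first download."""
    if cache_path.exists():
        return cache_path.read_text(encoding="utf-8")
    text = fetch()
    save_atomically(cache_path, text)
    return text


def background_refresh(cache_path: Path, fetch: Callable[[], str]) -> Optional[str]:
    """Meant to run off the UI thread; a failed download leaves the cached copy untouched."""
    try:
        text = fetch()
    except OSError:  # network errors from urllib are OSError subclasses
        return None
    save_atomically(cache_path, text)
    return text
```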
What About Writes?
This is where discipline matters.
Good write patterns:
- Admin-only updates
- Batch uploads
- Replace-the-file semantics
- Append-only logs processed offline
Bad write patterns:
- Per-user updates
- Concurrent writes
- Transactional requirements
- Partial row updates
If your app needs frequent writes, this pattern breaks down fast.
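The "append-only logs processed offline" pattern can be sketched as a compaction job: replay a log of changes onto the current snapshot and emit the next snapshot, which is then published with the same replace-the-file upload. The log format here (the snapshot's columns plus an `op` column of `upsert` or `delete`) is a hypothetical convention, and each upsert carries the full row, since partial row updates are exactly what this pattern avoids.

```python
import csv
import io


def compact(snapshot: str, changes: str, id_column: str = "id") -> str:
    """Replay an append-only change log onto a snapshot and return the next snapshot CSV.

    Assumes `changes` has the snapshot's columns plus an `op` column ("upsert" or
    "delete"), in the order the changes were appended; the last change per id wins.
    """
    base = csv.DictReader(io.StringIO(snapshot))
    fieldnames = list(base.fieldnames or [])
    rows = {row[id_column]: row for row in base}
    for change in csv.DictReader(io.StringIO(changes)):
        op = change.pop("op")
        if op == "delete":
            rows.pop(change[id_column], None)
        elif op == "upsert":
            rows[change[id_column]] = change  # whole-row replacement, not a partial update
        else:
            raise ValueError(f"unknown op {op!r}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for key in sorted(rows):  # stable ordering keeps successive snapshots diff-friendly
        writer.writerow(rows[key])
    return out.getvalue()
```

Sorting the output by key means consecutive snapshots differ only where the data changed, which helps when the CSV is also committed to Git.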
CSV vs “Real” Databases: The Real Comparison
| Requirement | CSV in S3 | Traditional DB |
|---|---|---|
| Read scalability | ✅ Excellent | ✅ Excellent |
| Write concurrency | ❌ Poor | ✅ Strong |
| Query flexibility | ❌ Limited | ✅ Powerful |
| Operational overhead | ✅ Minimal | ❌ High |
| Cost | ✅ Very low | ❌ Higher |
| Developer velocity | ✅ High | ⚠️ Medium |
The mistake is assuming every app needs every strength in the right-hand column.
When This Is a Bad Idea
Be honest with yourself. Don’t use this if you need:
- High-frequency writes
- User-generated content
- Transactions
- Row-level locking
- Complex joins
- Real-time consistency
This pattern is not a database replacement.
It’s a data distribution strategy.
A Useful Mental Model
Instead of asking:
Is this a “real database”?
Ask:
Is my data closer to configuration… or interaction?
- Configuration → CSV in S3 is often perfect
- Interaction → you probably need a database
Final Takeaway
Using a CSV file in S3 as a backend isn’t a hack.
It’s a deliberate trade-off:
- Less flexibility
- More simplicity
- Fewer moving parts
For read-heavy, low-write, predictable workloads:
A CSV in S3 can be the cleanest, cheapest, and most reliable “database” you’ll ever use.
The real mistake isn’t avoiding databases.
It’s using them when you don’t actually need one.