Maybe I misunderstand what this is, but why would I use this and not MySQL, Postgres, or any other proper database? Seems like a hack to get SQLite to do what those do by design.

One of the big advantages people enjoy is the elimination of network latency between the application server and the DB. With SQLite your DB is right there, often directly attached over NVMe. This improves all access latencies and even enables patterns like N+1 queries, which would typically be considered anti-patterns in other DBs.

> One of the big advantages people enjoy is the elimination of network latency between the application server and the DB. With SQLite your DB is right there, often directly attached over NVMe.

You can install MySQL/PostgreSQL on the application server, connect over a Unix socket, and get the same benefits as if you'd used SQLite on the application server (no network latency, fast queries), plus the other benefits that come from using these database servers (Postgres extensions, remote connections, standard tooling, etc.). I'm guessing more RAM is required on the application server than if you used SQLite, but I haven't benchmarked it.
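
For example, connecting over the local Unix-domain socket instead of TCP is a one-liner (a sketch; the socket directory varies by distro, and "myapp" is a made-up database name):

  # /var/run/postgresql is the Debian/Ubuntu default socket directory
  psql "host=/var/run/postgresql dbname=myapp"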

Unix sockets don't actually give you the same benefit. You're still doing IPC, which can incur substantial memory-subsystem utilization. SQLite runs in-process, on the same thread/core as whatever is using it.

TIL. Thanks!

Real talk, how do you actually avoid N+1? I realize you can do complicated JOINs, but isn't that almost as bad from a performance perspective? What are you really supposed to do if you need to, e.g., fetch a list of posts along with the number of comments on each post?

Often you can use joins to get the data in a single complex SQL query. The number of comments for a post is relatively straightforward, but you can also do increasingly complex associated-data fetches with modern databases.

In particular, JSON aggregations mean you can have a single query that does things like fetch a blog entry and the earliest 10 comments in a single go. I wrote up some patterns for doing that in SQLite and PostgreSQL here: https://github.com/simonw/til/blob/main/sqlite/related-rows-...

Here's an example PostgreSQL query that does this to fetch tags for posts: https://simonwillison.net/dashboard/json-agg-example

  select
    blog_entry.id,
    title,
    slug,
    created,
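    -- aggregate this entry's tags into a JSON array; the filter skips the
    -- nulls produced by the left join, and coalesce() turns "no tags" into []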
    coalesce(json_agg(json_build_object(blog_tag.id, blog_tag.tag)) filter (
      where
        blog_tag.tag is not null
    ), json_build_array()) as tags
  from
    blog_entry
    left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
    left join blog_tag on blog_entry_tags.tag_id = blog_tag.id
  group by
    blog_entry.id
  order by
    blog_entry.id desc

The alternative, more common path is the pattern that Django calls "prefetch_related". It effectively looks like this:

  select id, title, created from posts order by created desc limit 20

  -- Now extract the id values from that and run:

  select
    blog_entry.id,
    blog_tag.tag
  from
    blog_entry
    join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
    join blog_tag on blog_entry_tags.tag_id = blog_tag.id
  where
    blog_entry.id in (?, ?, ?, ...)

  -- Now you can re-assemble the list of tags for
  -- each entry in your application logic

Once you have a list of e.g. 20 IDs, you can run a bunch of cheap additional queries to fetch extra data about all 20 of those items.

> I realize you can do complicated JOINs, but isn't that almost as bad from a performance perspective?

No, JOINs should be orders of magnitude faster.

> What are you really supposed to do if you need to, e.g. fetch a list of posts along with the number of comments on each post?

You're really supposed to do a JOIN, together with a GROUP BY and a COUNT(). This is elementary SQL.
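
Something like this (a sketch; the posts/comments tables and columns are assumed, not from the thread):

  select
    posts.id,
    posts.title,
    -- left join so posts with zero comments still appear, with a count of 0
    count(comments.id) as comment_count
  from
    posts
    left join comments on comments.post_id = posts.id
  group by
    posts.id,
    posts.title
  order by
    posts.id desc
  limit 20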

It gets more complicated when you also need to display something like "last comment: <author> <3 days ago>" for each post, or if the comment counts need to be filtered by various flags/states/etc.

Of course, it's all possible with custom SQL, but it gets complicated quickly.

That's like saying it gets more complicated when you have to use loops with break statements in programming. It's just what programming is.

The filtering you describe is trivial with COUNT(*) FILTER (WHERE flag IN (...) AND state = ...), or SUM(CASE WHEN ... THEN 1 END), etc.

If you want to retrieve data on the last comment, as opposed to an aggregate function of all comments, you can do that with window functions (or with JOIN LATERAL for something idiomatic specifically to Postgres).
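
A hedged sketch of both techniques in Postgres (the posts/comments tables and the state column are invented for illustration):

  -- filtered count: only count comments in the states you care about
  select
    posts.id,
    count(*) filter (where comments.state = 'approved') as approved_comments
  from
    posts
    left join comments on comments.post_id = posts.id
  group by
    posts.id;

  -- "last comment" per post via a lateral subquery
  select
    posts.id,
    last_comment.author,
    last_comment.created
  from
    posts
    left join lateral (
      select author, created
      from comments
      where comments.post_id = posts.id
      order by created desc
      limit 1
    ) last_comment on true;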

Learning how to do JOINs in SQL is like learning pointers in C, in that it's a basic building block of the language. Learning window functions is like learning loops. These are just programming basics.

Yes, all that’s possible. But it’s not straightforward in the majority of ORMs.

Almost sounds like ORMs are a bad idea

Right, and ORMs are the main cause of the N+1 problem, and atrocious database performance in general.

For anything that isn't just a basic row lookup from a single table, you should really just be writing the SQL yourself.

AFAIK the problem with N+1 isn't so much the extra DB queries as the extra network round-trips. So if for each page of your app you have an API endpoint that provides exactly all of the data required for that page, it doesn't matter how many DB queries your API server makes to fulfill that request (provided that the API server and the DB are on the same machine).

This is essentially what GraphQL does: instead of crafting a super-tailored API endpoint for each of your screens, you use its query language to ask for the data you want, and it queries the DB for you and gets the data back in a single network round-trip from the user's perspective.

(Not an expert, so I trust comments to correct what I got wrong)

You still have to write the resolver for GraphQL. I've seen N+1 with GraphQL if you don't actually use the DataLoader + batch pattern, or if you use it incorrectly.

A proper join is the right answer. But it's not always possible to make those run well. [1] A "client-side join" in the right situation can be much better, but then you probably want to do a 1+1 rather than an N+1: do the first query to get the ids for the second query, and then construct the second query with IN or UNION, depending on what works best for your database. UNION likely bloats your query string, but I've seen plenty of situations where UNION is gobs faster than IN.
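
A rough sketch of that 1+1 shape (table names and ids are illustrative):

  -- query 1: get the ids
  select id from posts order by created desc limit 3;

  -- query 2, IN form:
  select * from comments where post_id in (1, 2, 3);

  -- query 2, UNION form (UNION ALL skips the dedup step); bulkier,
  -- but some planners handle the separate branches better than a big IN list
  select * from comments where post_id = 1
  union all
  select * from comments where post_id = 2
  union all
  select * from comments where post_id = 3;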

Alternately, if you can separate query issuance from result parsing, you can make N+1 palatable. E.g., do your query to get the ids, wait for the results, loop and issue the N queries, then loop and wait for the results in order. That will be two-ish round trips rather than N+1 round trips. But you have to search to find database APIs that allow that kind of separation.

[1] You can almost always express the query you want in SQL, but that doesn't mean it will have a reasonable runtime. Sometimes a server-side join and a client-side join have about the same runtime; if the load is significant and you have the usual case where clients are easier to scale than database servers, it might be worthwhile to run the join on the client to reduce server load.

A JOIN is fast. Fetching the whole list in one extra query with "WHERE id IN (...)" is also pretty fast, and results in less complex queries if you have several of these. Doing all the queries separately is slow because of the network round-trip for each query.

The performance problem in N+1 is (mostly) not in fetching the N rows from disk, but in multiplying the network latency by a factor of N. Joins solve this; so do stored procedures.

In general, you want to ask the remote server once for all the data you need, then read all the results. It applies to databases as well as APIs.
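
To illustrate the stored-procedure route: the server runs the whole join, so the client pays one round-trip regardless of N. A sketch in Postgres, assuming hypothetical posts/comments tables where posts.id is the primary key:

  create function recent_posts_with_counts(lim int)
  returns table (id int, title text, comment_count bigint)
  language sql
  as $$
    select p.id, p.title, count(c.id)
    from posts p
    left join comments c on c.post_id = p.id
    group by p.id  -- grouping by the primary key covers title and created
    order by p.created desc
    limit lim;
  $$;

  -- one network round-trip from the client:
  select * from recent_posts_with_counts(20);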

Pipelined requests also solve the problem and can be more flexible.

Also, joins can be optimised in different ways. Sometimes the optimal way to do a join isn't to look up each row one by one but, when the rows you want are a large fraction of the rows that exist, to do something like building a Bloom filter of the rows you want and then sequentially reading all the rows in the table.

No, JOINs are pretty much always faster than performing N+1 queries.

Either joins for a fat query, or aggregate the subqueries.

For the latter, it's along the lines of `select * from posts where ...` and `select * from authors where id in {posts.map(author_id)}`. And then once it's in memory you manually work out the associations (or rely on your ORM to do it).

You do indeed use JOINs. The goal is to retrieve exactly the data you require in a single query. Then you get the DB to `EXPLAIN VERBOSE` or similar and ensure that full table scans aren't happening and that you have indexed the columns the query is being filtered on.
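
For example, in Postgres (the tables are illustrative; plan output varies by version and data):

  explain (analyze, verbose)
  select posts.id, count(comments.id)
  from posts
  left join comments on comments.post_id = posts.id
  group by posts.id;

  -- if the plan shows a Seq Scan over comments where you expected an
  -- index scan, add the missing index:
  create index if not exists comments_post_id_idx on comments (post_id);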

Avoiding N+1 doesn't have to mean limiting yourself to 1 query. You can still fetch the posts in one query and the comments of _all_ posts in a separate query, just don't issue a query for _each_ post.

More formally, the number of queries should be constant rather than scaling linearly with the number of rows you're processing.

The actual thing that we're getting N+1 of is network round-trips. An additional network round-trip is way, way slower than an extra JOIN clause. That's why N+1 query patterns aren't a problem when you're using a local database: There's no round-trip.

A well-written JOIN against a well-designed database (regardless if we're talking postgres, SQLite, MySQL/MariaDB, or MS SQL) should not be slow. If it's slow, you're using it wrong.

To avoid operating a database by yourself and dealing with incidents, backups, replicas, failovers, etc., you can use cheap commoditised S3-like storage and run your application statelessly.

If you have access to a database that is well managed on your behalf, I would definitely still go with that for many use cases.

It's significantly faster and incurs less ops overhead. That's it.

But most apps should just use a classic n-tier database architecture like Postgres. We mostly do too (though Litestream does back some stuff here like our token system).

Why use Postgres if all you need is sqlite? Postgres is way overkill for a simple app with few users and no advanced database functionality.

Because Postgres is mature, works, and has a version number above v1.0?

If v1.0 is your North Star, you should re-evaluate a whole lot of software in your stack: https://0ver.org/#notable-zerover-projects

I think you're focusing on the wrong parts of the comment.

People care about things like long-term support. Postgres 13, from 2020, is still officially supported. Litestream 0.1.0 was the first release, also from 2020, but I can't tell if it is still supported. Worrying about the maturity, stability, and support of an application database is very reasonable in risk-averse projects.

Litestream is just a backup solution. It should probably be compared to a backup solution for Postgres that does automated backups over the network, etc. That isn't part of Postgres.

Besides, the question wasn't Litestream vs. Postgres backup apps. It was SQLite vs. Postgres.

The original response at least concerned Litestream, because the not-1.0 comment only applies to that.

Version numbers don't mean anything, as the whole Elixir ecosystem shows :D

This is essentially the "no one ever got fired for buying IBM" statement. One counter is: why buy & manage a rack-mounted server when all you need is a Raspberry Pi?

Postgres can also run on the Pi.

More than once I've started a project with SQLite and then had to migrate to Postgres. In my experience it's because of the way SQLite has to lock the whole database file while writing to it, blocking concurrent reads - this isn't a problem in Postgres. (There's WAL mode, but it still serialises all writes, and requires periodic explicit checkpointing IME.)
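
(For reference, the explicit checkpoint is a one-liner; TRUNCATE is one of several checkpoint modes:)

  PRAGMA wal_checkpoint(TRUNCATE);  -- fold the WAL back into the main db file and reset it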

You may also find you really want a feature postgres has, for example more advanced data types (arrays, JSON), more advanced indices (GIN inverted index on array members), replication...

I'm guessing this is a joke?

Why would saying that Postgres is a mature database - more mature than SQLite - be a joke?

SQLite is probably more mature than Postgres in terms of service hours.

I understood the comparison to be with Litestream, not SQLite in general.

Why would we compare Litestream (a backup solution) with Postgres (a database)? Would be like comparing Linux with NGINX.

Postgres has a similar replication solution built in. It's like comparing Samba with Windows Server if your purpose is to run an SMB file server.

Both are mature. There are way more SQLite databases running than Postgres. The code base is smaller and has fewer new features added to it every year. What is unstable about SQLite?

For a cloud service, I think it comes down to whether you’ll ever want more than one app server.

If you’re building something as a hobby project and you know it will always fit on one server, sqlite is perfect.

If it’s meant to be a startup and grow quickly, you don’t want to have to change your database to horizontally scale.

Deploying without downtime is also much easier with multiple servers. So again, it depends whether you’re doing something serious enough that you can’t tolerate dropping any requests during deploys.

This is the idea behind LiteFS: to transparently scale out SQLite (in some very common configurations and workloads) to support multiple app servers. It's still there and it works! It's just a little ahead of its time. :)

That makes sense, and it seems really cool from a tech perspective. I guess I'm just inherently skeptical about using something shiny and new vs. battle hardened databases that were designed from the beginning to be client-server.

It's definitely really nice though that if you do choose sqlite initially to keep things as small and simple as possible, you don't immediately need to switch databases if you want to scale.

I think that's very fair. But the use case for Litestream is much simpler and you can get your head around it immediately. It also doesn't ask you to do anything that would commit you to SQLite rather than switching to Postgres later. It's just a way of very easily getting a prod caliber backend up for an app without needing a database server.

99.99% of apps don't need more than one app server. You can serve a lot of traffic on the largest instances.

For sure, zero-downtime deploys are easier with Kubernetes etc., but again, that's overkill for 99.99% of apps.

Serving users is one thing. But then you want to run some interactive analytics, or cron jobs for cleanup, etc. on the DB. Even if the load can handle it, how would the admin jobs connect to the database? I've never seen a DB with only one client. There is always some auxiliary thing, even when you don't consider yourself a microservice shop.

For analytics the solution is very simple: just copy the database and run the queries on the copy.
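
A minimal sketch with the sqlite3 CLI (the paths and the events table are made up); .backup uses SQLite's online-backup API, so it's safe to run against a live database:

  sqlite3 /srv/app/prod.db ".backup /tmp/analytics.db"
  sqlite3 /tmp/analytics.db "select date(created), count(*) from events group by 1"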

Agree - with SQLite and DuckDB I've really switched my mindset from one perfect, pristine DB to multiple copies and a medallion architecture that looks more like participation ribbons for everyone! The resources required are so cheap & lightweight when you avoid the big "BI focused" tech stacks.

You can also scale out across unlimited tiny servers, because the entire stack is so lightweight and cheap. This will also force you to focus on devops, which otherwise can become a grind with this approach. The only challenge is when you have cross-DB concerns, in either data or clients.

Right, but if your goal is to have a lot of users (and minimal downtime), there's no point in putting a big avoidable obstacle in your path when the alternative is just as easy.

If your goal is to serve billions of users you should probably use Cassandra etc. Why limit yourself to Postgres if your goal is to have a billion users online at the same time?

Because Cassandra isn't easy to set up and has all kinds of tradeoffs on consistency, transactions, and the like compared to a SQL DB.

On the other side, why not just store everything in memory and flush to a local JSON file if you won't have any users? SQLite is overkill!

Your goal should be to solve this problem when you actually have a million or 10M concurrent users. YAGNI.

I'd argue that anything larger than a desktop app should not use SQLite. If you need Litestream for replication and backup, it is probably better to just use Postgres. There are a ton of one-click deployment offerings for proper databases; Fly.io actually offers managed Postgres.

Why would you argue that? Do you have some benchmarks backing it up or is it more a personal preference?

It's literally what they're designed for.

SQLite is designed for one local client at a time. Client-server relational databases are designed for many clients at a time.

That's not entirely true. SQLite is designed to support many processes reading the same file on disk at once. It only allows one process to write at a time, using locks - but since most writes finish in less than a millisecond, having a process wait until another process finishes its write isn't actually a problem in most cases.

If you have lots of concurrent writes SQLite isn't the right solution. For concurrent reads it's fine.
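
If writers do occasionally collide, the standard mitigations are a couple of pragmas (a sketch; the timeout value is arbitrary):

  PRAGMA journal_mode = WAL;   -- readers no longer block the single writer
  PRAGMA busy_timeout = 5000;  -- wait up to 5s for the write lock instead of failing immediately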

SQLite also isn't a network database out-of-the-box. If you want to be able to access it over the network you need to solve that separately.

(Don't try and use NFS. https://sqlite.org/howtocorrupt.html#_filesystems_with_broke... )

The reality is very few workloads have access patterns that SQLite can't support. I would much rather start with a strategy like: 1. use SQLite for my beta / single client, 2. duplicate the entire environment for the next n clients, 3. solve the "my application is wildly successful and SQLite is no longer appropriate" problem at a future date. Spoiler: you're never going to get to step #3.

> 2. duplicate the entire environment for the next n clients

That becomes an instant problem if users ever write to your database. You can't duplicate the environment unless it's read-only.

And even if the database is read-only for users, the fact that every time you update it you need to redeploy the database to every client is pretty annoying.

That's why it's usually better to start with Postgres or MySQL. A single source of truth for data makes everything vastly easier.

Not true. Can you back up your claim that the developers of SQLite say they don't recommend it for web servers? (Hint: they recommend it.)

If you have a read-heavy app (99% of SaaS) that runs on one server and doesn't have millions of users, then SQLite is a great option.

I didn't say that. I said one local client at a time. If you're running on one server then your webserver is the one local client.

Usually you want to be able to run multiple webservers against a single database though, since that's the first thing you'll usually need to scale.

Let's say I'm building a small app that I'm hosting on some shared VPS. If I weigh the effort involved in setting up SQLite with Litestream against just getting a $5 (or free) Postgres provider, I don't think SQLite makes my life easier.

Now if I'm building a local app then absolutely SQLite makes the most sense, but I don't see it otherwise.

Litestream is dead simple to set up. You make an S3 bucket (or any compatible storage bucket), paste the access keys and the path to your db file into /etc/litestream.yml, and then run

  dpkg -i litestream.deb
  systemctl enable litestream
  systemctl start litestream

The fact it's so simple is my favourite thing about it.
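
For reference, a minimal config looks something like this (a sketch following the Litestream docs; the bucket name and paths are made up):

  # /etc/litestream.yml
  access-key-id: <your key>
  secret-access-key: <your secret>
  dbs:
    - path: /srv/app/db.sqlite3
      replicas:
        - url: s3://my-bucket/db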

Are there any use cases/documentation about how Litestream can be used within a Docker-based deployment? (E.g. where systemctl wouldn't be used.)

Here's their docs on running in a Docker container: https://litestream.io/guides/docker/

You'd probably want to put the sqlite db in a volume & run litestream in a separate container that restarts automatically on failure.

systemctl's only in there to restart it if it crashes; Litestream itself is (IIRC) a single CLI binary.

This is documented on the Litestream website.

Setting up Litestream and SQLite takes less time than signing up for Supabase. And you can have 100 apps, each with its own database, for almost free (just a few cents of storage) vs. $5 x 100 for Postgres.

I love Postgres, but in no way is it as simple to run as SQLite (pretty sure even the Postgres core team would agree that Postgres is more complex than SQLite).

The common answer (especially from Fly.io) is "at-the-edge" computing/querying. There is network latency involved in sending a query to MySQL or Postgres and getting the data returned, whereas with Litestream you could put a read replica of the entire SQLite DB at every edge. Queries become fast and efficient, hitting only the local read replica. There's still network latency associated with updating that read replica over time, but it is amortized over the number of overall writes rather than the number of queries, and it is more fault tolerant in "eventually consistent" workflows (you can answer queries from the read replica at the edge in the state that you have it while you wait for the network to reconnect and replay the writes you missed during the fault). And with SQLite backing it, you still have much of the same full relational query power you would expect from a larger (or "proper") database like MySQL or Postgres.

Law Theorem[1] fits perfectly for this scenario

1- https://law-theorem.com/

I have a branch office in the boondocks with a limited internet connection. The branch office cannot manage an RDBMS or access cloud services. They can use an SQLite app on the LAN, and we can do reconciliation at the end of the business day.

They can also run the entire application in these scenarios on the resources of a 10-year-old phone.

I'm not sure, I've never done it, but I think the idea is to have many tiny customer-specific databases, powered by SQLite, located very close to the customer.

But I'd love to hear more from someone more well-versed in the use cases for reliable SQLite.

I find myself mostly in this camp now.

In every case where I had an SQLite vertical that required resilience, the customer simply configured the block storage device for periodic snapshots. Litestream is approximately the same idea, except with block device snapshots you get them implicitly as part of being in the cloud. There is no extra machinery to worry about, and you won't forget about a path/file/etc.

Also, streaming replication to S3 is not that valuable an idea to me when we consider the recovery story. All other solutions support hot & ready replicas within seconds.

It's a good question, and I don't think it's been answered sufficiently in the recent SQLite hype.

In my opinion, if you have an easy way to run Postgres, MySQL, etc., just run that.

There are usually a lot of quirks in the details of DB usage, even when it doesn't immediately seem like it (I got bitten by this a few times): features not supported, different semantics, ...

IMO every project has an "experimental stuff" budget, and if you go over it the project is too broken to recover. For most projects there's just not that much to win by spending that budget on a new database thing.

> the recent sqlite hype.

This is an interesting take; why do you see recent hype around the most boring and stone-age of technologies, SQLite?

The Rails creator DHH has been hyping it up a lot in the first six months of this year, and quite a few of the "dev influencers" scene followed. Fly's Litestream came out around that time, and there have been more SQLite-in-the-cloud companies/discussions, in particular around the AI agent use case.

Not super sure who followed whom, but there was all of a sudden a lot of excitement.

Litestream's first release was February 2021: https://news.ycombinator.com/item?id=26103776

SQLite's "buzz" isn't new, type "sqlite" into my https://tools.simonwillison.net/hacker-news-histogram tool and you'll see interest (on HN at least) has been pretty stable since 2021.

Maybe it's a local bump, but it sure seems like SQLite has become a far more popular topic in the Rails world. I wouldn't expect to find that in an HN search tool. SQLite has gone from the little database you might use to bootstrap or simplify local development to something products are shipping with in production. Functionality like solid_cable, solid_cache, and solid_queue allows SQLite to be used in more areas of Rails applications and is pitched as a way to simplify the stack.

While I don't have stats about every conference talk for the last decade, my experience has been that SQLite has been featured more in Rails conference talks. There's a new book titled "SQLite on Rails: The Workbook" that I don't think would have had an audience five years ago. And I've noticed more blog posts and more discussion in Rails-related discussion platforms. Moreover, I expect we'll see SQLite gain even more in popularity as it simplifies multi-agent development with multiple git worktrees.

My bad, I vaguely remember that Fly.io had a "LiteFS released" post, but I seem to have confused timelines.

Yeah, LiteFS was more recent - September 2022, https://fly.io/blog/introducing-litefs/ - and the cloud-hosted version was July 2023: https://fly.io/blog/litefs-cloud/

Oh right, the cloud version is what I remember - thanks for clarifying.

Bit more stretched out than I thought it had been.

This is infra for a single-user app. SQLite is THE replacement for file databases like MS Access, but if the box goes down, your database dies with all your data.

So this fills that gap by giving you a database-as-a-service level of QoL without needing to provision a database-as-a-service backend. Otherwise you're dicking about maintaining a service with all that comes with that (provisioning, updating, etc.) when really all you need is a file that is automagically backed up or placed somewhere on the web to avoid the drawbacks of the local file system.

But aren't many single-user apps still multi-platform? For example, as an Android application but also as a web app the user might access from their desktop device?

Yeah, that's fine, because the same user's access won't be concurrent. We're avoiding data corruption and the need to take out an expensive and broad write lock.

SQLite and MS Access can be, and often have been, used with multiple users. I have experience with the latter in the 2000s, with Access on a network share.

It's not the correct solution for multiple users. If you want that, then you should be running a database as a service.

> If there are many client programs sending SQL to the same database over a network, then use a client/server database engine instead of SQLite.

https://www.sqlite.org/whentouse.html

An argument for using Postgres is that you can still use one server, and Postgres spreads concurrent queries across multiple cores (one backend process per connection), which allows for more performance.

To enable local-first or offline-first design. I prefer having data stored on-device and only optionally backed up to the cloud.

Whatever database you end up using, you'll need some sort of backup solution. Litestream is a streamed backup solution which effectively doubles as replication for durability purposes.

MySQL, Postgres, etc. have a much greater overhead for setup, unless you want to pay for a managed database, which is not going to be worth the price for small quantities of data.