Instead of building a new data processing library, I would have offered only the novel NLP part and exposed it for use with existing libraries like pandas, Polars, and spaCy.

Does it batch requests?

Yeah, that was just a design choice that I made: I wanted a library that worked with `Iterator`s; it felt more lightweight to me and fit my immediate needs better. I'm personally not a huge fan of Pandas DataFrames for certain applications.

LOTUS (by Liana Patel et al., folks from Stanford and Berkeley; https://arxiv.org/abs/2407.11418) extends Pandas DataFrames with semantic operators, you could check out their open-source library: https://github.com/lotus-data/lotus

Semlib does batch requests; that was one of the primary motivations (I wanted to solve some concrete data processing tasks, started using the OpenAI API directly, then started calling LLMs in a for-loop, then wanted concurrency...). Semlib lets you set `max_concurrency` when you construct a session, and then many of the algorithms like `map` and `sort` take advantage of I/O concurrency (e.g., see the heart of the implementation of Quicksort with I/O concurrency: https://github.com/anishathalye/semlib/blob/5fa5c4534b91aa0e...). I wrote a bit more about the origins of this library on my blog, if you are interested: https://anishathalye.com/semlib/

ETA: I interpreted “batching” as I/O concurrency. If you were referring to the batch APIs that some providers offer: Semlib does not use those. They are too slow for the kind of data processing I wanted to do, and not great when you have a lot of data dependencies. For example, a semantic Quicksort would take forever if each batch takes up to 24 hours to process (the upper bound when using Anthropic’s batch API, for example), since each round of comparisons depends on the results of the previous one.