I've read the link and the GitHub readme page.

I'm sure I'm in the top 1% of software devs by number of timestamps parsed. [1]

DST is not a problem in Python. Parsing string timestamps is. All libraries are bad at it, including this one, except Pandas. Pandas does great at DST too, btw.

And I'm not shilling for Pandas either. I'm a Polars user who helicopters Pandas in whenever there's a timestamp that needs to be parsed.

Pandas has great defaults. Here are the string timestamps I expect to be parsed by default; I'm willing to pass a timezone in the case of naive timestamps (a quick sketch checking these against Pandas follows the list):

* All ISO 8601 formats and all its weird mutant children that differ by a tiny bit.

* 2025-05-01 (parsed not as date, but as timestamp)

* 2025-05-01 00:00:00 (or 00.0 or 00.000 or 0.000000 etc)

* 2025-05-01 00:00:00z (or uppercase Z or 00.0z or 00.000z or 0.000000z)

* 2025-05-01 00:00:00+02:00 (I don't need this converted to some time zone. Store the offset if you must, or convert to UTC. It should be comparable to other non-naive timestamps.)

* 2025-03-30 02:30:00+02:00 (This is a nonexistent timestamp wrt European DST, but a legitimate timestamp as a representation, therefore it should be allowed unless I specify CET or Europe/Berlin or whatever.)

* There are other timestamp formats that are non-standard but obvious. Allow for a Boolean parameter called accept_sensible_string_parsing and then parse the following:

  * 2025-05-01 00:00 (HH:mm format)

  * 2025-05-01 00:00+01:00 (HH:mm format)
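To make the wishlist concrete, here's a minimal sketch of what I'd expect to just work, assuming a recent Pandas (caveat: untested, per the disclaimer below):

```python
import pandas as pd

# Parsed one at a time, to sidestep Pandas' rules about mixing naive
# and tz-aware values in a single to_datetime call.
samples = [
    "2025-05-01",                  # bare date -> midnight timestamp
    "2025-05-01 00:00:00.000",     # fractional seconds
    "2025-05-01 00:00:00Z",        # UTC designator
    "2025-05-01 00:00:00+02:00",   # fixed offset, comparable to other aware values
    "2025-03-30 02:30:00+02:00",   # nonexistent under CEST, fine as a fixed offset
]
for s in samples:
    print(s, "->", repr(pd.to_datetime(s)))
```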

[1] It's not a real statistic, it's just that I work with a lot of time series and customer data.

Disclaimer: I'm on the phone and on the couch so I wasn't able to test the lib for its string parsing before posting this comment.

Author here. It's indeed a hard problem to parse "All ISO 8601 formats and all its weird mutant children that differ by a tiny bit." Since the ISO standard is so expansive, every library needs to decide for itself what to support. The ISO standard allows all sorts of weird things, like 2-digit years, fractional months, disallowing -00:00 offset, ordinal days, etc.

JavaScript's big datetime redesign (Temporal) has an interesting overview of the decisions they made [1]. Whenever is currently undergoing an expansion of its ISO support as well, if you'd like to chime in [2].

[1] https://tc39.es/proposal-temporal/#sec-temporal-iso8601gramm... [2] https://github.com/ariebovenberg/whenever/issues/204#issueco...

Thanks for the reply, and apologies for the general cynicism. It's not lost on me that it's people like you who build the tools that make the world tick. I'm just a loud potential customer, forwarding the frustration I have with my own customers onto you :)

Your customers are software devs like me. When we're in control of generating timestamps, we know we must use standard ISO formatting.

However, what do I do when my customers give me access to an S3 bucket with 1 billion timestamps in an arbitrary (yet decipherable) format?

In the GitHub issue you seem to have undergone an evolution from purity to pragmatism. I support this 100%.

What I've also noticed is that you seem to look for grounding or motivation for "where to draw the line" in what's already been done in Temporal or the Python stdlib etc. This is where I'd like to challenge your intuitions and ask you instead to open the floodgates and accept any format that is theoretically sensible under ISO 8601.

Why? The damage has already been done. Any format you can think of, already exists out there. You just haven't realized it yet.

You know who has accepted this? Pandas devs (I assume; I don't know them). The following are legitimate timestamps under Pandas (2.2.x); see the snippet after this list:

* 2025-03-30T (nope, not a typo)

* 2025-03-30T01 (HH)

* 2025-03-30 01 (same as above)

* 2025-03-30  01 (two or more spaces between date and time are also acceptable)
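If you want to check those claims yourself, a quick sketch (assuming Pandas 2.2.x, and that I'm remembering these correctly):

```python
import pandas as pd

# The formats claimed above, verbatim:
for s in ["2025-03-30T", "2025-03-30T01", "2025-03-30 01", "2025-03-30  01"]:
    print(repr(s), "->", pd.to_datetime(s))
```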

In my opinion Pandas doesn't go far enough. Here's an example from real customer data I've seen in the past that Pandas doesn't parse.

* 2025-03-30+00:00 (this is very sensible in my opinion, unless the pattern creates a deeper conflict with other parts of the ISO grammar)

Here's an example that isn't decipherable under a flexible ISO interpretation and shouldn't be supported.

* 2025-30-03 (theoretically you could infer that 30 is the day and 03 the month, BUT you shouldn't accept this. Pandas used to allow such things; I believe they no longer do.)

I understand writing these flexible regexes or if-else statements will hurt your benchmarks and will be painful to maintain. Maybe release them under a new call like `parse_best_effort` (or even `youre_welcome`) and document the pitfalls and performance degradation. Trust me, I'd rather use a reliable, generic, but slow parser than spend hours writing a god-awful regex that I will only use once (I've spent literal weeks writing regexes and fixes in the last decade).
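To sketch what I mean (purely hypothetical API, name and all: a documented fallback chain layered over the strict path):

```python
from datetime import datetime

# Hypothetical `parse_best_effort`: try strict ISO first, then a short,
# documented list of looser formats. Slow and ugly on purpose; the fast
# strict parser stays the default.
def parse_best_effort(s: str) -> datetime:
    s = " ".join(s.split())  # collapse runs of whitespace
    try:
        # Python 3.11+ fromisoformat also handles "Z" and fractional seconds.
        return datetime.fromisoformat(s)
    except ValueError:
        pass
    # Looser shapes, e.g. "2025-03-30T" and "2025-03-30 01".
    for fmt in ("%Y-%m-%dT", "%Y-%m-%dT%H", "%Y-%m-%d %H"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {s!r}")
```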

Pandas has been around since 2012 dealing with customer data. They have seen it all, and you can learn a lot from them. ISOs and RFCs don't mean squat when it comes to timestamps. If possible, try to make Whenever useful rather than fast or pure. I'd rather use a slimmer, faster alternative to Pandas for parsing timestamps if one were available, but there aren't any at the moment.

If time permits I'll try to compile a non-exhaustive list of real-world timestamp formats and post it in the issue.

Thank you for your work!

P.S. seeing BurntSushi in the GitHub issue gives me imposter syndrome :)

Because you pinged me... Jiff also generally follows in Temporal's footsteps here. Your broader point of supporting things beyond the specs (ISO 8601, RFC 3339, RFC 9557, RFC 2822 and so on) has already been absorbed into the Temporal ISO 8601 extensions. And that's what Jiff supports (and presumably, whenever, although I don't know enough about whenever to be absolutely precise in what it supports). So I think the philosophical point has already been conceded by the Temporal project itself. What's left, it seems, is a measure of degree. How far do you go in supporting oddball formats?

I honestly do not know the answer to that question myself. But I wouldn't necessarily look to Pandas as the shining beacon on a hill here. Not because Pandas is doing anything wrong per se, but because it's a totally different domain and use case. On the one hand, you have a general purpose library that needs to consider all of its users for all general purpose datetime use cases. On the other hand, you have a data science-y library designed for slurping up and making sense of messy data at scale. There may be things that make sense in the latter that don't in the former.

In particular, a major gap in your reasoning, from what I can see, is that constraints beget better error reporting. I don't know how to precisely weigh error reporting versus flexible parsing, but there ought to be some deliberation there. The more flexible your format, the harder it is to give good error messages when you get invalid data.
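To illustrate the trade-off with a contrived Python example (stdlib strict parsing versus python-dateutil's flexible parsing):

```python
from datetime import datetime
from dateutil import parser  # pip install python-dateutil

s = "03-04-2025"

# Strict: fails loudly, pointing at the offending input.
try:
    datetime.fromisoformat(s)
except ValueError as e:
    print("strict:", e)  # Invalid isoformat string: '03-04-2025'

# Flexible: "succeeds" by silently guessing month-first...
print(parser.parse(s))                 # 2025-03-04 00:00:00
# ...or day-first, with no error either way.
print(parser.parse(s, dayfirst=True))  # 2025-04-03 00:00:00
```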

Moreover, "flexible parsing" doesn't actually have to be in the datetime library. The task of flexible parsing is not, in and of itself, overtly challenging. It's a tedious task that can be build on top of the foundation of a good datetime library. I grant that this is a bit of a cop-out, but it's part of the calculus when designing ecosystem libraries like this.

Speaking for me personally (in the context of Jiff), something I wouldn't mind so much is adding a dedicated "flexible" parsing mode that one can opt into. But I don't think I'd want to make it the default.