I'm impressed. You guys cloned a whole suite of products in a short period of time that cost millions of dollars. Even the little bits of humor look costly.

On the other hand, it's way more information than I expected. I can see why someone would hesitate to release them - there's a lot to sift through and it's likely even the government couldn't sift through all of them to make sure their friends weren't mentioned somewhere.

Thanks! And it's a lot of info, yeah. ~90% of new data in yesterday's drop was photographs, which they redacted for us.

The House Oversight Committee's giant drop in November had tons of data we still didn't take advantage of even after doing the original Jmail, like flight logs.

For the Yahoo release, which is still ongoing, the folks at Drop Site News (see https://www.jmail.world/about) are handling the manual redaction, which has been very time consuming, even with tons of AI to help in the background.

Would be nice to explain at some point how we structured the unstructured data.

For now we’re focusing on fixing the bugs because we’re already seeing an insane wave of traffic so most of us are focused on keeping the site alive.

Hey, I’d be interested in your thoughts on this, or the key ideas/research results you relied on:

Yes! We used our friends at Reducto (https://reducto.ai/) for all document extraction and parsing (one of the best companies I've ever referred to YC ;) )

We did an initial parsing pass of all four DOJ document batches on Friday. This takes a raw PDF and returns chunks containing typed blocks—each with a type (Title, Text, Figure, etc.), bounding boxes, content, and confidence scores. For PDFs that were just scans of photographs (which was like 90% of new content in Friday's release), it gave in depth descriptions of those! You can type search terms like "door" at https://www.jmail.world/photos to see what I mean.
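To make the shape of that output concrete, here is a minimal sketch of how typed blocks like these could be modeled and searched. The `Block` class, its field layout, the 0-to-1 confidence scale, and the sample data are all assumptions for illustration; this is not Reducto's actual response format or API.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for one parsed block; field names are assumed."""
    type: str                                  # e.g. "Title", "Text", "Figure"
    content: str                               # extracted text or image description
    bbox: tuple[float, float, float, float]    # assumed (left, top, width, height)
    confidence: float                          # assumed to be a score in [0, 1]


def search_blocks(blocks, term, min_confidence=0.5):
    """Return blocks whose content mentions `term`, skipping low-confidence ones."""
    term = term.lower()
    return [
        b for b in blocks
        if b.confidence >= min_confidence and term in b.content.lower()
    ]


# Made-up sample data, only to exercise the function.
blocks = [
    Block("Title", "Property photographs", (0.1, 0.05, 0.8, 0.05), 0.98),
    Block("Figure", "A wooden door at the end of a hallway", (0.1, 0.2, 0.8, 0.6), 0.91),
    Block("Text", "illegible door stamp", (0.1, 0.85, 0.3, 0.03), 0.20),
]

hits = search_blocks(blocks, "door")
for b in hits:
    print(b.type, "|", b.content)
```

A photo search like the one described would presumably index the Figure descriptions this way, though the real pipeline may differ.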

For apps like Jmail and JFlights we use their structured extraction endpoint instead—you define a schema (e.g. {from, to, subject, date, body} for emails or {departure_airport, arrival_airport, passengers[], date} for flights) and it pulls those fields directly into JSON.
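As a rough illustration of the schema idea, the two field sets above could be written as JSON-Schema-style dictionaries and checked against whatever JSON comes back. The dictionary layout, the `missing_or_mistyped` helper, and the placeholder record are assumptions for this sketch, not the real endpoint's request or response format.

```python
import json

# Field names come from the comment above; the JSON-Schema-style layout is assumed.
EMAIL_SCHEMA = {
    "type": "object",
    "properties": {
        "from": {"type": "string"},
        "to": {"type": "string"},
        "subject": {"type": "string"},
        "date": {"type": "string"},
        "body": {"type": "string"},
    },
    "required": ["from", "to", "subject", "date", "body"],
}

FLIGHT_SCHEMA = {
    "type": "object",
    "properties": {
        "departure_airport": {"type": "string"},
        "arrival_airport": {"type": "string"},
        "passengers": {"type": "array"},
        "date": {"type": "string"},
    },
    "required": ["departure_airport", "arrival_airport", "passengers", "date"],
}

_PY_TYPES = {"string": str, "array": list, "object": dict}


def missing_or_mistyped(record, schema):
    """List problems found when checking an extracted record against a schema."""
    problems = []
    for name in schema["required"]:
        if name not in record:
            problems.append(f"missing: {name}")
    for name, spec in schema["properties"].items():
        if name in record and not isinstance(record[name], _PY_TYPES[spec["type"]]):
            problems.append(f"wrong type: {name}")
    return problems


# Placeholder record, not taken from any real document.
raw = (
    '{"departure_airport": "AAA", "arrival_airport": "BBB", '
    '"passengers": ["Passenger One"], "date": "2000-01-01"}'
)
flight = json.loads(raw)
print(missing_or_mistyped(flight, FLIGHT_SCHEMA))
```

Checking the output against the schema before rendering a flight card is one cheap way to catch fields the extractor could not find.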

The JFlights example served as the best ad for Reducto and how doc parsing technology can speed up hours of journalistic investigations like this.

See for yourself. Given this document

https://www.jmail.world/drive/HOUSE_OVERSIGHT_002031

It inferred and enriched multiple flight cards on JFlights (https://www.jmail.world/flights). I was really shook when I first saw this.

This might be our coolest case study yet. Thanks for the mention!

One interesting thread to pull is "Stuff released and then Yanked back" ...

Images removed from Epstein files less than a day after being posted - https://www.abc.net.au/news/2025-12-21/images-removed-from-e...

promises all the sleuthing excitement of chasing the significance of Donald in a Drawer.

Images were also planted to falsely suggest incriminating evidence.

while true, it would probably be useful to provide examples. The one that I am aware of seems to be a picture showing Clinton, Michael Jackson, and Diana Ross with "redacted" victims

https://www.imdb.com/news/ni65628031/

https://bsky.app/profile/meidastouch.com/post/3mag7myutmc2d

however it seems that this photo is actually taken from a 2003 Democratic fundraiser, and the redacted images of victims were of Diana Ross' son Evan, and Michael Jackson's kids, Paris and Prince Jackson. This may or may not be accurate either, since I have not been able to dig down into the photo and determine if it has any connections to a supposed 2003 fundraiser.

But it seems more likely to be true than not that this was sloppily planted evidence that was especially insultingly fake.

on edit: looking closer, these do not seem to be the exact same photo, but two different photos taken at the same time and place, so both from the 2003 Dem fundraiser. So it could be that Epstein had it and DOJ thought hey, look at these pervs! Let's release!!

Is it possible that one is an input photo and the other is generative AI output?

As you say, it's not the same photo. If the one in the dump was in Epstein's possession, the reason for the redactions are either that some drone in the DOJ just redacted all children out of habit, or that it was deliberately done in such a way as to frame Clinton. I can't decide which I find more credible.

I think if it hadn't been those adults with the kids an alert staffer might have thought "whose kids are these, these aren't young teenage girls, I better double check" But Michael Jackson, kids, Clinton arms around him, Diana Ross with young male, they're thinking they walked into an armory filled with nothing but smoking guns!

>the reason for the redactions are either that some drone in the DOJ just redacted all children out of habit, or that it was deliberately done in such a way as to frame Clinton

They were supposed to redact all minors, not just "victims".

There’s no need to frame Clinton, there is plenty of evidence he was friends with and spent a lot of time with Epstein.

Similar situation with Trump, for that matter.

It is perfectly possible, even common, to frame the guilty. It’s easier than finding real evidence.

Sure, but in this case there already is plenty of real evidence.

I see people are not clued into this and incredulously downvote because the file release appears to be in good faith to them such that illegal evidence tampering is out of the question

See https://news.ycombinator.com/item?id=46341688

The post you link to is deleted.

[flagged]

But, whoever’s doing the redacting sees the original right? What prevents the redactor from saying, “here’s what the document really said.” Or “here’s who’s in the image, I saw it before I redacted it?”

The idea of spending the rest of their life in prison is what stops them

Yeah but a few words from somebody like Ghislaine could completely fuck shit up for a lot of people.

Of course, she'll have hanged herself shortly afterward while the security cameras were malfunctioning.

Part of the law mandates that all redactions will be listed for Congress within 15 days.

I’d guess a first pass is done automatically? E.g. if a page mentions Trump, just redact that whole page/paragraph/etc. So the people who have done the closer reading to redact further probably don’t actually know the scale of what was already redacted. Just a guess though.

People who they think will do this don't get to be redactors. It's all about power and relationships, not technology.

Given how MTG went completely silent despite her high profile platform, I'm guessing the civil (or at this point, royal) servants don't want their families harmed.

That’s a good point. I would imagine they break it up into pieces - in a reCAPTCHA sorta way - and any given person sees a sentence or a piece of a sentence.

An alternative would be to strip out all obvious known words and only leave unknowns (i.e., names) and then have those fragments reviewed (in a reCAPTCHA sorta way).

Finally, for images, cover all faces and then one by one decide which should remain covered and which should not.

LOTS of work but there are workflows to mitigate the ability for reviewers to connect more than they should.
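A toy sketch of the first idea, assuming sentence-level fragments and a simple round-robin assignment so that no reviewer ever sees two neighboring fragments. The function names and the sample text are made up, and this says nothing about how the actual review was organized.

```python
import re


def split_fragments(text):
    """Split text into sentence-like fragments on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def assign(fragments, reviewers):
    """Deal fragments out round-robin so adjacent ones go to different reviewers."""
    if len(reviewers) < 2:
        raise ValueError("need at least two reviewers to separate adjacent fragments")
    queues = {r: [] for r in reviewers}
    for i, frag in enumerate(fragments):
        queues[reviewers[i % len(reviewers)]].append((i, frag))
    return queues


# Placeholder text, not from any real document.
frags = split_fragments("A met B. They flew to C. D paid.")
queues = assign(frags, ["r1", "r2"])
for reviewer, items in queues.items():
    print(reviewer, items)
```

With more reviewers the gap between fragments any one person sees grows, which is the property the workflow is after; a real system would also need to keep reviewers from comparing notes.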

I'm being snarky and this isn't such a serious comment and I don't really mean this for Gemini but can you imagine using something like Gemini ("Hi, please comb through this") and it just refuses on ethical grounds

We found that Codex indeed refuses but Claude + Gemini are willing to RAG it

also, shoutout to Jason Liu (https://news.ycombinator.com/user?id=jxnlco) for discovering that one. His turbopuffer-based version of Jemini is coming soon!

Usually Claude is the prude. Personally I haven't even tried for fear of what I'd find. I can stomach homicide and war pictures, but Epstein is too much.

I just have real institutional problems with Google, they have all the best tech minds but some things are just off limits to them being politically correct

And no, not Epstein. It's a general statement; but it's disappointing that they're like this (and of course Gemini was famously the one that gave black Nazis and things like that)

Google has never fixed their black people/gorilla issue. The foundational tech that all of their products run on going back a decade is fundamentally flawed (and produces outputs that many would say align with racist ideologies, among others).

> You guys cloned a whole suite of products in a short period of time that cost millions of dollars.

At the risk of stating the obvious, the functionality isn't actually cloned, only the UI. The actual code powering Gmail probably dates back to the late 80s or early 90s and has had several hundred thousands of hours of work put into it. This is just a webpage that looks kind of similar.

I point this out only because I've seen people saying that software businesses don't have moats anymore because of this, which is taking away a completely false lesson.

> The actual code powering Gmail probably dates back to the late 80s or early 90s and has had several hundred thousands of hours of work put into it.

no. google did not exist until the late 90s.

various forms of internet email sure did, but most popular mtas of the google era shared very little code with predecessors from the 80s and early 90s (maybe sendmail) and google almost certainly wrote their own from scratch.

but your first point. that an archive browser that looks like gmail is not equivalent to a full tilt email service backend is valid.

I mean, it is so obvious, and the use of the word "cloned" strikes me as so weird, that I feel it needs to be said.

The UI cloning doesn't feel exactly correct either; there are things that are slightly off.

But I just find the "cloned" wrong, because obviously you cannot send an email from this account, you cannot log in to the service as Jeffrey Epstein, you cannot delete emails, create alerts based on searches, do actions on selected emails (create new tag, move under that tag)

there are so many functionalities that are not cloned because obviously they could not be cloned because they would make no sense for what this project is. So just the praise for cloning so quickly makes me sort of mad.

You could theoretically make something like this that allowed log in so you got personalized Epstein mails, and then could do all that, and perhaps get more mails sent in as files get released, and perhaps create Google alerts on Epstein in the news etc. that would come as mails, and maybe the code could put news that came in into the appropriate tags etc.

But until that time "cloned" is just very wrong.

For the holidays, they should at least implement a Shockingly Distasteful Jeffrey Epstein Christmas Card Meme Generator.

Out of curiosity, would you explain what you mean by that? Google was founded in 1998 and writing a mail client isn't terribly complicated. Did they buy some code for Gmail from an older company? Is Gmail older than Google?

A full featured mail client is insanely complicated. If you think a mail client is just smtp, you probably think word is just text with some styling and excel is just some cells and functions.

I’m sure, buried somewhere deep in Google systems, are vestiges of mail server code originally written in the 80s. But when people use the name Gmail, they are generally referring to the client facing web app, which does not have any such code.

If it exists, it's probably not at all related to Gmail or only used for testing. I don't think Google reuses a lot of third party code in its first party server software.

https://www.supremecourt.gov/opinions/20pdf/18-956_d18f.pdf

I mean it has happened in other Google products...

Even "just smtp" isn't trivial.

It is, or was at least. At the age of 13, I've created one for Windows. It was relatively widely used at the time.

> Did they buy some code for Gmail from an older company?

They bought both Deja and Neotonic.

it is not. gmail is 100% from paul buchheit.

He wasn't sitting there writing binary code and implementing all 7 layers of the OSI stack by hand, he was gluing together pre-existing components. And the pre-existing components he had access to include two major email startups acquired by Google in 2001 and 2003, which were founded in 1995 and 1997 respectively. (Although he does have at least two patents for features and algorithms he co-invented while making Gmail.)

If I invite you to a barbeque and tell you I made lunch, will you tell me off because I didn't raise and butcher the cow?

This is more like using a sauce that someone else made.

[deleted]

Gmail is not just a mail client.

The spam checker alone is a ton of work. It needs to handle millions of mails for millions of users a day.

Nitpick: pretty sure both of those are in the billions.

Mails could even be in the trillions.

Why stop there, I'm sure you can trace Gmail all the way back to the Roman aqueducts.

The Link Between a Horse's Arse and the Space Shuttle • Physics Forums https://share.google/UnmMwwQv9kyksKhkI

I'm not a physicist, but after getting into the rotten fruit this fall, I would bet my friend's horse could launch a space shuttle from her arse. Such a sweet mare, but she has no hesitation blasting Venetian atmosphere right into your face while you're scraping the shit out of her feet. At least she has the decency to make eye contact while doing it

I mean technically if we didn't have Roman aqueducts, would we have Gmail today?

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh water system, public health, and GMail what have the Romans ever done for us?

All right, but apart from Google Wave, Google Reader, Google+, Inbox, Stadia, Project Ara, Google Glass, Loon, Picasa, Orkut, Hangouts, Allo, Duo, Google Domains, Google Health, Google Notebook, iGoogle, Knol, Jaiku, Daydream VR, Google Play Music, Nexus, Fusion Tables, Bump, Revolv, Songza, QuickOffice, Meebo, Panoramio, Milk, Schemer, Sparrow, Poly, Tilt Brush, Tour Builder, URL Shortener, Latitude, Spaces, Google Hire, Google Bulletin, Shoelace, and Neighbourly, Android Things, Project Tango, Ara Module Marketplace, Google TV, Nexus Q, Google Play Newsstand, Google Play Movies & TV, Google Podcasts, Google Now, Google Now Launcher, Google Goggles, Gesture Search, MyTracks, Google Play Edition, Android Auto for Phone Screens, All Access, Google Currents, Google SMS Search, Google Cloud Messaging, Android Beam, Androidify, Field Trip, Google Currents, and Google Play Artist Hub, what has Google ever done for us?

I'm a little out of the loop. Are any of those still active projects?

[deleted]

I mean, I would really only include the code for things like:

- Fetching email messages

- Parsing email headers

- Mime parsing

- Converting the text of email bodies into UTF-8

- Threading messages

- Eliding reply text

Given that the official story is that pb made the first version of Gmail in a day, does anyone actually believe that he wrote the code for any of those things in a day? If you honestly believe that I have a bridge to sell you.
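For a sense of what several items in that list involve, here is a minimal sketch using Python's standard `email` package for header parsing, MIME decoding, and charset conversion, plus naive versions of reply eliding and threading. The sample message, `elide_quotes`, and `build_thread_index` are invented for illustration; fetching is left out, and nothing here reflects how Gmail actually does any of it.

```python
from email import message_from_bytes
from email.policy import default

# Invented sample message: Latin-1 subject and body, quoted-printable encoded.
RAW = (
    b"From: Alice <alice@example.com>\r\n"
    b"To: Bob <bob@example.com>\r\n"
    b"Subject: =?iso-8859-1?q?Caf=E9_plans?=\r\n"
    b"Message-ID: <2@example.com>\r\n"
    b"In-Reply-To: <1@example.com>\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"See you at the caf=E9.\r\n"
    b"> Where should we meet?\r\n"
)

# Header parsing: the default policy decodes RFC 2047 encoded words for us.
msg = message_from_bytes(RAW, policy=default)
subject = str(msg["Subject"])

# MIME parsing and charset conversion: get_content() returns a Python str.
body = msg.get_body(preferencelist=("plain",)).get_content()


def elide_quotes(text):
    """Naive reply eliding: drop every line that starts with '>'."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    return "\n".join(kept)


def build_thread_index(messages):
    """Naive threading: map each parent Message-ID to the IDs of its replies."""
    children = {}
    for m in messages:
        parent = m["In-Reply-To"]
        if parent is not None:
            children.setdefault(str(parent).strip(), []).append(
                str(m["Message-ID"]).strip()
            )
    return children


visible = elide_quotes(body)
thread_index = build_thread_index([msg])
print(subject, "|", visible, "|", thread_index)
```

Even this toy version leans entirely on library code for the hard parts, and real mail adds malformed headers, nested multiparts, and missing or wrong charsets on top.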

Wait till you learn that the source code in Chrome also predates the existence of Google.

At least to the Black Death.

I don't know if I'm just misremembering but it feels like over the last three years or so the technical knowledge on HN has gone down the toilet.

Could it instead be that less technically inclined people feel more empowered to hang out here?

"less technically inclined" doesn't mean people can make whatever incorrect claims unchecked like on reddit, where you get banned because you post inconvenient facts in the wrong sub.

And this is exactly why I stopped participating in discussions on reddit and never did on LinkedIn. Discussions on HN are so much more civil and respectful

P.S. if the top level comment was indeed posted by a "less technically inclined" person, I hope this is a humbling, positive educational experience, at least that's how I would take it

Maybe that and manipulating technical tools requires far less background knowledge than it did, meaning the definition of “technically inclined” has shifted, as it often does.

Most *nix tools have their origins in the ‘70s-80s.

Email as a technology is ancient by today’s standards. The SMTP protocol was established in 1982. Even sendmail dates back to the early ‘80s, and its predecessor delivermail to the late ‘70s.

This is a pretty good talk on the history of email: https://www.youtube.com/watch?v=mrGfahzt-4Q

And the earlier technology of homing pigeons goes back even further

It's Reddit type conversation often

[flagged]

>I decided to get the Max 20x plan, and prompting 4 projects with each 2 to 3 running 'conversations' , never hit the limit anymore.

Can you expand on this please? Really cool btw.

> I'm impressed. You guys cloned a whole suite of products in a short period of time that cost millions of dollars. Even the little bits of humor look costly.

The cynic in me would assume that someone with a lot of money wants to hide some of the emails and the best way to do that (at this point) is to release them filtered with a great UI.

That’s not cynicism, it’s conspiracy theorism. That leads you to “the whole Epstein thing is a hoax designed to distract from what’s really going on”.

No, because the hypothesis is relatively easily falsifiable. This does not hold for conspiracy theories.

Would you like to do the work of falsifying it? (Since you made the claim, and posted it online, I'd argue you have some responsibility to do so.)

The thing I got from reading the majority of these emails is that the Epstein / Trump connection was not that strong in later years. I feel JE humored Trump to a degree and disliked him to an even larger degree. They may have had strong relations in the beginning, but he was NOT pleased that Trump was winning the presidency at all. He made multiple references to dirt on DT, and at one point there was even the question of whether Trump set him up. Not to say JE did no wrong, because the evidence is 100% there for that, but it's super interesting, having read the actual files, to see the various media spins on all sides. If anything, though, it's led me to believe there are much stronger ties between DT and Russia than I thought before (Palm Beach house, the casino, models coming from those areas, etc.).

Well there's only 2500 emails here. They definitely had time to sift through these to make sure friends weren't mentioned.

I read through 80% of them last night by myself. I mean, I didn't go to bed until 3am but spread across a handful of agents? yeah you could do it in an hour.

Regardless, it seems like they used the Ctrl-a key to speed up redacting

> it's likely even the government couldn't sift through all of them

How could you tell?

They also have a “promotions” tab listing all promo content. I wonder whether this is real or mock data.

there's a lot to sift through

The total archive size is 300GB. AFAIK they have only released around 2GB. Curious what is in the rest of it assuming it does not get [redacted] out or deleted. I am also curious how they intend to release the rest of it in time to meet the requirements of the act. Discussion [1] Epstein Files bill sponsor Ro Khanna and Hassan, no dogs being zapped.

[1] - https://www.youtube.com/watch?v=KT2u0Fp3hQg [video][1hr12m]

Yeah, there’s a ton of information. https://epsteinsecrets.com/network is another tool to pursue the data dumps.

[deleted]

"…there's a lot to sift through…"

A job for an LLM…

> I can see why someone would hesitate to release them - there's a lot to sift through and it's likely even the government couldn't sift through all of them to make sure their friends weren't mentioned somewhere.

Jared Kushner, is that you?

Please don't be snarky or post shallow internet tropes. We're trying for something else here.

You're welcome, of course, to make your substantive points thoughtfully.

https://news.ycombinator.com/newsguidelines.html

It’s ok, it follows the rules - I made the comment very thoughtfully and it is indeed substantive. It’s also not snarky and isn’t a shallow internet trope.

Interpretations can differ, of course, but I don't think this was a borderline call.

Thinking Gmail costs "millions to develop" sounds exactly like the kind of price unawareness that comes from that family.

I would bet the Gmail team has single employee salaries in that range.

To be fair, millions could be hundreds of millions.

Sure. And you are inches tall.

What do these million dollar salary employees at Gmail do?

Three things, not all of which any specific employee does: 1. Fix security issues 2. Create “features” in order to seem useful that the world was better without 3. Rest upon laurels of gmail from 15 years ago

Make Google multiple millions by improving ad delivery and conversion within Gmail. Probably by also helping Google land big corporate or public contracts, but last I checked most of the money was made via ads in the free tier of GMail.

"whole suite of products in a short period of time that cost millions of dollars."

but they just copy the "UI" not the whole product

[flagged]

> - there's a lot to sift through and it's likely even the government couldn't sift through all of them to make sure their friends weren't mentioned somewhere.

if only there were some kind of universal summary engine that never gets tired and is essentially free.