I found this part interesting:

There are also other documents that appear to simulate a scanned document but completely lack the “real-world noise” expected with physical paper-based workflows. The much crisper images appear almost perfect without random artifacts or background noise, and with the exact same amount of image skew across multiple pages. Thanks to the borders around each page of text, page skew can easily be measured, such as with VOL00007\IMAGES\0001\EFTA00009229.pdf. It is highly likely these PDFs were created by rendering original content (from a digital document) to an image (e.g., via print to image or save to image functionality) and then applying image processing such as skew, downscaling, and color reduction.
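To make the "skew can easily be measured" point concrete, here is a minimal sketch of the arithmetic, assuming you have read off the pixel coordinates of two points along the same page border by hand (for example, in an image viewer). The function name `skew_degrees` and the sample coordinates are made up for illustration; nothing here comes from the article itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate page skew from two points on one border line.
# Arguments: x1 y1 x2 y2, in pixels (image coordinates, y grows downward).
# A positive result means the line drops toward the right, i.e. the page
# appears rotated clockwise on screen.
skew_degrees() {
  LC_ALL=C awk -v x1="$1" -v y1="$2" -v x2="$3" -v y2="$4" 'BEGIN {
    pi = atan2(0, -1)                       # awk has no built-in pi constant
    printf "%.2f\n", atan2(y2 - y1, x2 - x1) * 180 / pi
  }'
}
```

For example, `skew_degrees 100 100 1100 107` prints `0.40`. Running this on the same border of several pages and getting the identical value every time would be the kind of uniformity the article describes.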

GNOME Desktop users can put this in a Bash script in ~/.local/share/nautilus/scripts/ for more convincing-looking fake PDF scans, accessible from the right-click menu. I don't recall where I originally copied it from, so I can't give proper credit; thanks, random internet person (probably on Stack Exchange). It works perfectly.

  ROTATION=$(shuf -n 1 -e '-' '')$(shuf -n 1 -e $(seq 0.05 .5))

  for pdf in "$@"; do
    magick -density 150 "$pdf" \
              -linear-stretch '1.5%x2%' \
              -rotate 0.4 \
              -attenuate '0.01' \
              +noise  Multiplicative \
              -colorspace 'gray' \
              "${pdf%.*}-fakescan.${pdf##*.}"
  done

That seq is probably supposed to be $(seq 0.05 0.05 0.5). With only two arguments, seq uses an increment of 1, so right now it only ever yields 0.05.

Note that you can get random numbers straight from Bash with $RANDOM. It's only 15 bits (0 to 32767), but that's good enough here; this gives a value between 0.05 and 0.5: $(printf "0.%.4d\n" $((500 + RANDOM % 4501)))
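As a rough sketch of the $RANDOM idea, assuming Bash (plain sh has no $RANDOM) and using a made-up function name `random_rotation`, the same arithmetic can be wrapped in a helper that also picks a random sign:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a signed angle whose magnitude is
# between 0.0500 and 0.5000 degrees, using only Bash built-ins.
random_rotation() {
  local sign=''
  # Coin flip for the direction of rotation.
  if (( RANDOM % 2 )); then sign='-'; fi
  # 500..5000, zero-padded to four digits after "0.".
  printf '%s0.%04d\n' "$sign" $(( 500 + RANDOM % 4501 ))
}

random_rotation   # prints a value such as -0.2317
```

The modulo introduces a slight bias because 32768 is not a multiple of 4501, which should not matter for faking a scan.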

Nothing about this is specific to GNOME, right? ImageMagick is cross-platform.

I guess the GNOME-specific part is that GNOME comes with the Nautilus file browser, and the instructions add a script for Nautilus.

But yeah, this will work as long as you have ImageMagick and Nautilus installed.

Oh I missed that part, was just looking at the script

Or just run the script and pass the PDF as an argument...

Shouldn't $ROTATION be set inside the loop and actually used in the magick command?

You know, now that you point it out that seems obvious. I think maybe I was experimenting with rotation and left that in, unused. I did this years ago. The loop works OK though. Thanks for the feedback (and now I have to finish editing that script ...)
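For what it's worth, here is one way the finished script might look with both fixes from this thread applied: the three-argument seq, and a rotation picked inside the loop and actually passed to -rotate. This is only a sketch, not the original author's final version; the helper names `pick_rotation` and `fakescan_name` are invented here, and it assumes GNU seq and shuf plus ImageMagick 7's `magick` command.

```shell
#!/usr/bin/env bash
# Sketch only: the script from this thread with the two suggested fixes.
# pick_rotation and fakescan_name are hypothetical helper names.

# Print a random signed angle: '-' or '' followed by one of 0.05 .. 0.50.
pick_rotation() {
  local sign magnitude
  sign=$(shuf -n 1 -e '-' '')
  magnitude=$(LC_ALL=C seq 0.05 0.05 0.5 | shuf -n 1)
  printf '%s%s\n' "$sign" "$magnitude"
}

# Derive the output path: "dir/report.pdf" becomes "dir/report-fakescan.pdf".
fakescan_name() {
  printf '%s\n' "${1%.*}-fakescan.${1##*.}"
}

for pdf in "$@"; do
  # A fresh angle per file, so a batch does not share one skew value.
  magick -density 150 "$pdf" \
         -linear-stretch '1.5%x2%' \
         -rotate "$(pick_rotation)" \
         -attenuate '0.01' \
         +noise Multiplicative \
         -colorspace 'gray' \
         "$(fakescan_name "$pdf")"
done
```

Note that every page within a single PDF still gets the same angle, which is exactly the tell the quoted article describes.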

[flagged]

you sound as grumpy as my cat looks. there's no need for this language

The real question is: Which of the documents are the ones that are "simulating" scanned documents, and what political narrative do they reinforce?

The only reason I can think of for why someone would want to do this is to pass off fraudulent or AI generated images as real.

A simpler explanation could be wanting to skip the print->sign->scan ceremony required by some institutions.

This. Slip in a few thousand “fakes” with the trove of goods to be able to fabricate a narrative.

Very interesting. That document in particular seems to be an interview of A. Acosta by the DoJ from 2019. But what reason would the FBI have for pretending it's a scanned document, if it is genuine? Perhaps there's some aspect of Epstein's deal with Acosta that they'd rather not reveal to the public?

https://www.justice.gov/epstein/files/DataSet%207/EFTA000092...

Not that I can speak from personal experience or anything... But somebody on an email chain may have requested a scanned version of the document to ensure there is no metadata and the employee might have found it easier to just flatten the pdf and apply a graphical filter to make the document appear like a scanned document. There might even be a webtool available somewhere to do so, I wouldn't know...

[dead]

Straight to the signup page? A bit blatant, no?

> the employee might have found it easier to just flatten the pdf and apply a graphical filter to make the document appear like a scanned document

Is that remotely plausible? I can't imagine faking a scan being easier than just walking down the hall to the copier room.

If I look at my personal work situation, working from home would mean I can't do it immediately, but would have to remember to do it the next day. Or just do it digitally right now in a few minutes and have it off my to-do list

Don't attribute to malice what can be attributed to laziness, these are government workers

I think maybe the old "don't attribute to malice" adage goes out the window when we're talking about a coverup of a giant child sex trafficking ring run by high-up people in the government.

While I don't disagree with your point about the Epstein case being a massive CYA for a ton of people in power, the fact is that if they really wanted to cover something up, the right way to do it would be to actually print it and scan it. This looks like someone took a shortcut on some broad order to print and scan all digital media.

Look, what I'm saying is that I don't have a scanner at home or at work and I've done this.

The time advantage of faking a scan becomes better the more pages you have to scan.

https://xkcd.com/1205/

Nice. But 5 years seems unrealistic. Who stays in the same job using the same processes for 5 years these days? Even if the task remains the same, input formats might change, requiring extra maintenance on the tool. I should recalculate that for 3 years before using it in my automation decisions.

you do not work in the public sector, where processes change rarely, slowly, and partially

Working from home and no scanner in the house?

No printer.

If it's already scanned, then you don't have to leave your desk.

You’re talking about 1,000 FBI agents locked in a building. There’s no printer.

It's thousands of pages, surely investing some time in a script is faster. They were in a rush as well.

If they were faking the documents rather than the delivery method they definitely could have invested some time in flawless looks.

Or more-realistic flawed looks as the case is here.

Depending on their technical capability, yes.

I mean even in this thread you got what are essentially one-liners to do it.

Definitely less hassle than doing it IRL.

How big a percentage of FBI / DoJ employees are running Linux (with ImageMagick) as their work computer? I'd be surprised to see a similar one-liner for a stock Windows installation.

Yeah they might have used some web converter, but that on the other hand would have been extremely incompetent handling of the secret data.

Installing MSYS2 takes a few minutes. There is also WSL, and macOS ships with a POSIX shell. ImageMagick is likely already installed as a dependency of something else, as ffmpeg often is.

I know I'm not the brightest bulb by any measure, but do some people really take less than at least a few minutes to come up with one-liners for problems as novel as graphical transformations to PDFs? Maybe if the presumed techie hacker / federal worker took it as an amusing challenge I could see this being done, but genuinely out of pure laziness? That's incredible if true.

It's not a novel problem. But yes, I don't think people quite appreciate how quick and easy it is for people who are in the habit of brewing up one-liners to solve simple problems to do that. I've done it here on HN for jq toy problems before, and I don't really doubt there are people similarly familiar with imagemagick.

It’s a mix of “they’ve done it many times before” and these days AI. But remember the “they’ve done it many times before” just means that in a technical and popular forum you’re likely to find the handful of people who have done so regularly enough to remember the one liner. Also this is probably easily searchable as well so even prior to AI not super hard.

There is nothing novel about it. I saw at least one person say that they have done exactly the same thing out of laziness.

I am only guessing that they had to remove the document from a classified network in a way where data won't possibly leak

Such a weird way to do it when it would be vastly easier to just blow the document out to paper and re-scan it.

Vastly easier when you do it to one or a handful of documents.

But if you want to do it to 2000 documents...

But at that point why bother with the fakery? Why does it matter if it's obviously of digital origin? As long as it's rendered down to an image, problem solved.

Was the motivation for this benign (an employee skirting regulations) or malicious?

4 reams (4×500) is hardly a lot for commercial equipment to handle; paper trays will take a ream at a time. Document analysis would still show some shenanigans were in play, but you'd get a bit of variation at least.

I mean, I do that all the time when they ask me to print something, sign it, and then scan it.

Sign a blank paper, scan it, paste the original doc on it. Then keep the scan for future docs.

An easier trick I've used is just sign directly on the computer screen over the displayed document with a whiteboard marker and take a photo with my phone.