> “If you run this on a big repository, it will take quite a lot of time because `git log -n1` takes a long time. I think this is the fastest way to get the most recent commit time on a single file? (That's the assertion that I hope someone can correct me on!) In any case, `bttf tag exec` is using parallelism under the hood to make this even faster.”
Instead of running `git log -n1` on every file, I think you can walk through the commits backwards, skipping any files that have been seen. Something like this (these two commands could be followed by bttf commands):
git log --pretty=format:"DATE:%aI" --name-only |
awk '/^DATE:/ {date=substr($0, 6); next} $0!="" && !seen[$0]++ {print date, $0}'
This seems to run much faster. The only problem is it'll include files that have been renamed or removed. I got an AI to fix that too, but it starts getting awkward (still fast though!):
git ls-files |
awk '
# Read all existing files from git ls-files into an array
NR==FNR { lsfiles[$0]; next }
# Process the git log stream
/^DATE:/ { date=substr($0, 6); next }
$0!="" && ($0 in lsfiles) && !seen[$0]++ { print date, $0 }
' - <(git log --pretty=format:"DATE:%aI" --name-only)
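If you want to sanity-check that the single pass agrees with the per-file `git log -n1` approach, something like this rough sketch might help. It assumes bash (for the process substitution) and paths that git doesn't quote; `check_dates` is just a name I made up for illustration, and I'd expect the odd mismatch around merges or renames rather than treat this as proof either way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the single-pass dates against per-file
# `git log -n1` for the first N files (default 20) the fast pass prints.
check_dates() {
  local n="${1:-20}" mismatches=0 date file slow
  # `read` puts the first field in $date and the rest of the line in $file.
  while read -r date file; do
    # The slow, per-file lookup that the fast pass is meant to replace.
    slow=$(git log -n1 --pretty=format:%aI -- "$file")
    if [ "$date" != "$slow" ]; then
      echo "MISMATCH $file: fast=$date slow=$slow"
      mismatches=$((mismatches + 1))
    fi
  done < <(
    git ls-files |
    awk '
      # First input (stdin): remember every file that still exists
      NR==FNR { lsfiles[$0]; next }
      # Second input: the git log stream, newest commit first
      /^DATE:/ { date=substr($0, 6); next }
      $0!="" && ($0 in lsfiles) && !seen[$0]++ { print date, $0 }
    ' - <(git log --pretty=format:"DATE:%aI" --name-only) |
    head -n "$n"
  )
  echo "mismatches: $mismatches"
}
```

Run it from inside a repository as `check_dates 50`. It still pays the `git log -n1` cost for each sampled file, so keep the sample small on a big repo.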
Oh interesting! That is indeed quite a bit faster.
I'll have to noodle on this one. `bttf tag exec` works with arbitrary commands that can print any kind of date. But your approach requires a different access pattern. I can either specialize the use case in bttf (blech) or I can figure out how to generalize your approach.
I think the key issue here is probably that it isn't line oriented. bttf composes well, but only when you have a one-to-one relationship between date and data. (Or a many-to-one is also supported, but it's many dates to one datum, not one date to many datums.) So maybe that relational model is worth figuring out how to streamline. Then I think this use case would work better.
Also, thank you! This is exactly the kind of reply I was hoping for! :D