Hacker News

The repetitive pattern detection approach described here is fascinating from an implementation perspective. We ran into similar challenges when building our interview feedback system, specifically around detecting and removing filler phrases that add no value ("um", "like", "you know").

What worked well for us was a two-stage pipeline: first a sliding window (n=3) to detect repeated n-grams, then cosine similarity with a threshold of 0.85 to catch semantic duplicates. This reduced redundant content by ~40% while preserving meaningful repetition (e.g., when candidates deliberately emphasize key points).
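
For anyone who wants to play with the idea, here is a rough sketch of that two-stage shape in plain Python. It is not our production code: the function names (`tokenize`, `repeated_ngrams`, `cosine`, `drop_near_duplicates`) are made up for illustration, and the second stage uses simple word-count vectors as a stand-in for whatever sentence embeddings you would actually use, so it only catches near-verbatim duplicates rather than true paraphrases.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep only alphabetic words (plus apostrophes).
    return re.findall(r"[a-z']+", text.lower())


def repeated_ngrams(tokens, n=3):
    # Stage 1: slide a window of size n over the tokens and
    # return every n-gram that occurs more than once, with its count.
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in counts.items() if c > 1}


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def drop_near_duplicates(sentences, threshold=0.85):
    # Stage 2: keep a sentence only if it is below the similarity
    # threshold against every sentence already kept.
    # ASSUMPTION: word-count vectors stand in for real embeddings here.
    kept, kept_vecs = [], []
    for sentence in sentences:
        vec = Counter(tokenize(sentence))
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(sentence)
            kept_vecs.append(vec)
    return kept
```

Calling `repeated_ngrams(tokenize(transcript))` gives you the candidate filler spans, and `drop_near_duplicates(sentences)` gives you the deduplicated list. Swapping the `Counter` vectors for embedding vectors would only require changing how `vec` is built and how `cosine` is computed; the 0.85 threshold would likely need retuning for any other vector representation.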

One challenge we haven't fully solved: distinguishing between harmful repetition and intentional rhetorical devices. Have others found effective heuristics for this? We're currently experimenting with attention patterns in the transformer layers to identify deliberate vs. unintentional repetition, but results are mixed.