Hacker News

This looks great! Duplication and dead code are especially tricky to catch because they are not visible in diffs.

Since you mentioned the implementation details, a couple questions come to mind:

1. Are there any research papers you found helpful or influential when building this? For example, I need to read up on using tree edit distance for code duplication.

2. How hard do you think this would be to generalize to support other programming languages?

I see you are using tree-sitter which supports many languages, but I imagine a challenge might be CFGs and dependencies.

I’ll add a Qlty plugin for this (https://github.com/qltysh/qlty) so it can be run with other code quality tools and reported back to GitHub as pass/fail commit statuses and comments. That way, the AI coding agents can take action based on the issues that pyscn finds directly in a cloud dev env.

Thank you! 1.For tree edit distance, I referred to "APTED: A Fast Tree Edit Distance Algorithm" (Pawlik & Augsten, 2016), but the algorithm works as O(n²) so I also implemented LSH (classic one) for large codebases.The other analyses also use classical compiler theory and techniques. 2. Should be straightforward! tree-sitter gives us parsers for 40+ languages. CFG construction is just tracking control flow, and the core algorithm stays the same.

I focused on Python first because vibe coding with Python tends to accumulate more structural issues. But the same techniques should apply to other languages as well.

Excited about the Qlty integration - that would make pyscn much more accessible and would be amazing!