I have worked on it for several years now as part of studying Japanese. It can scan for clipboard images, or directly act as a "text hooker" by capturing screen portions/windows (and diffing them for changes over time), or receive images through websockets or a Unix socket. All the GUI portions (configuration, the log viewer that replaces the terminal in the packaged versions, coordinate selection) use tkinter. It can use several OCR engines, both local and online; some were reverse engineered for it by me and a friend (Google Lens uses the Chrome API, Apple Live Text uses a private Objective-C version of the API meant for WebKit). Other interesting ones: OneOCR is a local version of the Azure OCR model that they somehow ship in the Windows 11 Snipping Tool, and Chrome Screen AI seems to be a local version of Lens that they ship for offline PDF annotations in Chrome (both are extremely good). It has feature parity on Windows/macOS/Linux, including Wayland through a shim that emulates mss (a Python screenshot library) with the screencast API and an original implementation of ext_data_control_v1 (for those Wayland compositors that support it).
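
For anyone curious about the "diffing for changes over time" part, here is a minimal sketch of the general idea, not the actual code from the project. The names `changed_frames` and `grab` are hypothetical, the frames are stand-in byte strings rather than real screenshots, and an exact hash comparison is just the simplest possible diff; a real capture loop would probably want something more tolerant of noise.

```python
import hashlib
from typing import Callable, Iterator, Optional


def changed_frames(grab: Callable[[], Optional[bytes]]) -> Iterator[bytes]:
    """Yield a frame only when its bytes differ from the last yielded frame.

    `grab` is a hypothetical callable returning raw pixel bytes for the
    captured region, or None when there is nothing left to capture.
    """
    last_digest = None
    while True:
        frame = grab()
        if frame is None:  # assumed end-of-stream signal for this sketch
            return
        # Hash the frame so only a small digest has to be kept around.
        digest = hashlib.blake2b(frame, digest_size=16).digest()
        if digest != last_digest:
            last_digest = digest
            yield frame  # this is where OCR would run on the new content


# Stand-in "screenshots": two identical frames, a change, a repeat, a change back.
blank = b"\x00" * 12
text = b"\x00" * 11 + b"\xff"
fake_capture = iter([blank, blank, text, text, blank, None])

result = list(changed_frames(lambda: next(fake_capture)))
print(len(result))
```

Only the frames that differ from the previous one get through, so the OCR engine is not called again while the screen stays the same.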