Fair enough - it's honestly not something I expected anyone to be interested in enough that an about page would be required.
At a high level, it reads RSS feeds from a number of sources, and uses LLMs to identify clusters of stories about the same thing, group them, tag them, and designate them a "top" story or not. That's it.
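For anyone curious what that flow might look like, here is a rough sketch, not the actual implementation. It parses an RSS 2.0 document with the standard library and greedily groups items using a pluggable `same_story` check. The sample feed, the function names, and the word-overlap heuristic are all assumptions for illustration; the overlap check is only a stand-in for the LLM call.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed, purely illustrative.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Acme ships new database engine</title><link>https://example.com/a</link></item>
<item><title>New database engine shipped by Acme</title><link>https://example.com/b</link></item>
<item><title>City council approves bike lanes</title><link>https://example.com/c</link></item>
</channel></rss>"""


def parse_items(rss_text):
    """Return a list of {'title', 'link'} dicts for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        }
        for item in root.iter("item")
    ]


def word_overlap(a, b):
    """Jaccard similarity of lowercase word sets (stand-in for an LLM judgment)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster(items, same_story):
    """Greedy grouping: add each item to the first cluster whose first member matches."""
    clusters = []
    for item in items:
        for group in clusters:
            if same_story(group[0], item):
                group.append(item)
                break
        else:
            # No existing cluster matched, so this item starts a new one.
            clusters.append([item])
    return clusters


items = parse_items(SAMPLE_RSS)
groups = cluster(
    items,
    lambda a, b: word_overlap(a["title"], b["title"]) >= 0.5,  # threshold is arbitrary
)
```

In a real pipeline the lambda would be replaced by a model call, and tagging and "top" designation would be further steps on each group.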
The biggest thing I've learned in all of this is that o3-mini is far and away the best at following instructions (for this use case). Periodically I'll cycle through the models available on Groq, and always come back to o3-mini.
Very nice, I've been working on something similar, but for regular news. But I want to summarize complete articles, and RSS only provides the headlines and sometimes the first paragraph of an article.
So I decided to write web crawlers, but then you run into CAPTCHA stuff. So I instead used Selenium to automate my browser to fetch the news articles. That worked well, but I haven't worked on it since.
Now I'm thinking that with all these AI browsers around these days, maybe that's actually easier than doing it with Selenium. But I haven't researched it properly yet.
In any case, the LLM work of detecting whether two articles are reporting the same news, and summarizing the story, is the same in your project. So in case your project is open source, I would be interested in that part.
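To make the "same news or not" part concrete, here is a minimal sketch of how such a pairwise check could be wrapped, assuming a yes/no prompt. The prompt wording and the `ask_llm` callable are hypothetical: `ask_llm` is any function that takes a prompt string and returns the model's text reply, so no particular provider API is implied.

```python
def build_prompt(article_a, article_b):
    """Build a yes/no question for the model (wording is an assumption)."""
    return (
        "Do these two news items report the same underlying event? "
        "Answer with exactly YES or NO.\n\n"
        f"Item 1: {article_a}\n"
        f"Item 2: {article_b}"
    )


def same_story(article_a, article_b, ask_llm):
    """Return True if the model's reply starts with YES (case-insensitive)."""
    reply = ask_llm(build_prompt(article_a, article_b))
    return reply.strip().upper().startswith("YES")


def canned_yes(prompt):
    """Fake model for local testing; always agrees."""
    return "Yes."


def canned_no(prompt):
    """Fake model for local testing; always disagrees."""
    return "NO"


match = same_story("Acme ships new database engine",
                   "New database engine shipped by Acme",
                   canned_yes)
mismatch = same_story("Acme ships new database engine",
                      "City council approves bike lanes",
                      canned_no)
```

Keeping the model behind a plain callable makes it easy to swap models and compare how well each follows the "YES or NO only" instruction.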