>Now, everyone basically has a personal TA, ready to go at all hours of the day
This simply hasn't been my experience.
It's too shallow. The deeper I go, the less it seems to be useful. This happens quickly for me.
Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.
I've found it excels at some things:
1) The broad overview of a topic
2) When I have a vague idea, it helps me narrow down the correct terminology for it
3) Providing examples of a particular category ("are there any examples of where v1 in the visual cortex develops in a disordered way?")
4) "Tell me the canonical textbooks in field X"
5) Posing math exercises
6) Free form branching--while talking about one topic, I want to shift to another that is distinct but related.
I agree they leave a lot to be desired when digging very deeply into a topic. And my biggest pet peeve is when they hallucinate fake references ("tell me papers that investigate this topic" will, for any sufficiently obscure topic, result in a bunch of very promising paper titles that are wholly invented).
These things are moving so quickly, but I teach a 2nd year combinatorics course, and about 3 months ago I tried the latest ChatGPT and DeepSeek -- they could answer very standard questions, but were wrong on more advanced ones, often in quite subtle ways. I actually set a piece of homework "marking" ChatGPT, which went well and students seemed to enjoy!
Super good idea!!
Luc Julia (one of Siri's main creators) describes a very similar exercise in this interview [0] (it's in French, although the auto-translation isn't too bad).
The gist of it is that he does this exercise with his students, where they ask ChatGPT about Victor Hugo's biography and then proceed to spot the errors ChatGPT made.
This setup is simple, but there are very interesting mechanisms in place. The students get to learn about challenging facts, do fact-checking, cross-referencing, etc., while also reinforcing the teacher's position as the reference figure, with the knowledge to take down ChatGPT.
Well done :)
Edit: adding link
[0] https://youtube.com/shorts/SlyUvvbzRPc?si=2Fv-KIgls-uxr_3z
this is an amazing strategy
forgot the link :)
Arf seems I'm one of those :).. thanks for the heads up!
That’s a great idea to both teach the subject and AI skepticism.
> I actually set a piece of homework "marking" ChatGPT, which went well and students seemed to enjoy!
This. This should be done everywhere. It is the best way to let students see first hand that LLM output is useful, but can be (and often is) wrong.
If people really understand that, everything will be better.
Very clever and approachable, and I've been unintentionally giving myself that exercise for a while now. Who knows how long it will remain viable, though.
When you say the latest chatGPT, do you mean o3?
Whatever was best on a paid account 3 months ago. I was quite disappointed to be honest, based on what I had been hearing.
I think by default ChatGPT will choose 4o for you. So unless you actually chose o3 you haven’t used the best model.
that's a cool assignment!
>When I have a vague idea, it helps me narrow down the correct terminology for it
so the opposite of Stack Overflow really, where if you have a vague idea your question gets deleted and you get reprimanded.
Maybe Stack Overflow could use AI for this, help you formulate a question in the way they want.
Maybe. But, it's been over a year since I used StackOverflow, primarily because of LLMs. Sure, I could use an LLM to formulate a question that passes SO's muster. But why bother, when the LLM can almost certainly answer the question as well; SO will be slower; and there's a decent chance that my question will be marked as a duplicate (because it pattern matches to a similar but distinct question).
>when the LLM can almost certainly answer the question as well;
You say this in a thread specifically talking about how LLMs fall apart when digging below the surface of questions.
Do people really want to learn and understand, or just feel like they are learning and understanding?
I would say that an LLM might give a correct answer, but for a good enough question there is more than one answer.
Furthermore, the LLM might give an answer but probably won't explain, with the best skills available, why the answer is the way it is. This of course varies with StackOverflow, but there it is at least possible that somebody with deep technical knowledge decides a question is worth answering deeply.
Outside of 5), I concur. It's good for discovery, much as Google is for discovering topics, while leaning on proper professional resources and articles for the learning itself.
It's too bad people are trying to substitute the latter with the chatGPT output itself. And I absolutely cannot trust any machine that is willing to lie to me rather than admit ignorance on a subject.
I find 2 invaluable for enhancing search, and combined with 1 & 4, it's a huge boost to self-learning.
I’ve found the AI is particularly good at explaining AI, better than quite a lot of other coding tasks.
My core problem with LLMs is as you say; it's good for some simpler concepts, tasks, etc. but when you need to dive into more complex topics it will oversimplify, give you what you didn't ask for, or straight up lie by omission.
History is a great example, if you ask an LLM about a vaguely difficult period in history it will just give you one side and act like the other doesn't exist, or if there is another side, it will paint them in a very negative light which often is poorly substantiated; people don't just wake up and decide one day to be irrationally evil with no reason; if you believe that then you are a fool... although LLMs would agree with you more often than not since it's convenient.
The result of these things is a form of gatekeeping. Give it a few years and basic knowledge will be almost impossible to find if it is deemed "not useful", whether that's an outdated technology that the LLM doesn't see talked about much anymore or an ideological issue that doesn't fall in line with TOS or common consensus.
A few weeks ago I was asking an LLM to offer anti-heliocentric arguments, from the perspective of an intelligent scientist. Although it initially started with what was almost a parody of writing from that period, with some prompting I got it to generate a strong rendition of anti-heliocentric arguments.
(On the other hand, it's very hard to get them to do it for topics that are currently politically charged. Less so for things that aren't in living memory: I've had success getting it to offer the Carthaginian perspective in the Punic Wars.)
That's a fun idea; almost having it "play pretend" instead of directly asking it for strong anti-heliocentric arguments outright.
It's weird to see which topics it "thinks" are politically charged vs. others. I've noticed some inconsistency depending on even what years you input into your questions. One year off? It will sometimes give you a more unbiased answer as a result about the year you were actually thinking of.
I think the first thing is figuring out exactly what persona you want the LLM to adopt: if you have only a vague idea of the persona, it will default to the laziest one possible that still could be said to satisfy your request. Once that's done, though, it usually works decently, except for those that the LLM detects are politically charged. (The weakness here is that at some point you've defined the persona so strictly that it's ahistorical and more reflective of your own mental model.)
As for the politically charged topics, I more or less self-censor on those topics (which seem pretty easy to anticipate--none of those you listed in your other comment surprise me at all) and don't bother to ask the LLM. Partially out of self-protection (don't want to be flagged as some kind of bad actor), partially because I know the amount of effort put in isn't going to give a strong result.
> The weakness here is that at some point you've defined the persona so strictly that it's ahistorical and more reflective of your own mental model.
That's a good thing to be aware of: using our own bias to make it more "likely" to play pretend. LLMs tend to be more on the agreeable side; given the unreliable narrators we people tend to be, and the fact that these models are trained on us, it does track that the machine would tend towards preference over fact, especially when the fact could be outside of the LLM's own "Overton Window".
I've started to care less and less about self-censoring, as I deem it a kind of "use it or lose it" privilege. If you normalize talking about censored/"dangerous" topics in a rational way, more people will be likely to see it as not much of a problem. The other eventuality is that no one hears anything that opposes their view in a rational way, but rather only hears from the extremists or those who just want to stick it to the current "bad" in their minds at that moment.
Even then, though, I still will omit certain statements on some topics given the platform, but that's more so that I don't get mislabeled by readers (one of the items on my other comment was intentionally left as vague as possible for this reason). As for the LLMs, I usually just leave spicy questions for LLMs I can access through someone else's API (an aggregator) and not a personal account, just to make it a little more difficult to label my activity falsely as a bad actor.
What were its arguments? Do you have enough of an understanding of astronomy to know whether it actually made good arguments that are grounded in scientific understanding, or did it just write persuasively in a way that looks convincing to a layman?
> I've had success getting it to offer the Carthaginian perspective in the Punic Wars.
This is not surprising to me. Historians have long studied Carthage, and there are books you can get on the Punic Wars that talk about the state of Carthage leading up to and during the wars (shout out to Richard Miles's "Carthage Must Be Destroyed: The Rise and Fall of an Ancient Civilization"). I would expect an LLM to piggyback off of that existing literature.
Extensive education in physics, so yes.
The most compelling reason at the time to reject heliocentrism was the (lack of) parallax of stars. The only response the heliocentrists had was that the stars must be implausibly far away -- on the order of a million times further away than the Moon (and they knew the Moon itself is already pretty far from us) -- which is a pretty radical, even insane, idea. There's also the point that the original Copernican heliocentric model had ad hoc epicycles just as the Ptolemaic one did, without any real increase in accuracy.
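To put a rough number on that (my own back-of-the-envelope, assuming naked-eye instruments of the era could have detected an annual parallax of about one arcminute, roughly Tycho Brahe's precision; the figures are mine, not from the thread):

```latex
d \;\gtrsim\; \frac{1\,\mathrm{AU}}{\tan(1')}
  \;\approx\; 3.4\times10^{3}\,\mathrm{AU}
  \;\approx\; 1.3\times10^{6}\ \text{Earth--Moon distances}
```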
Strictly speaking, the breakdown here would be less a lack of understanding of contemporary physics, and more about whether I knew enough about the minutiae of historical astronomers' disputes to know if the LLM was accurately representing them.
>I've had success getting it to offer the Carthaginian perspective in the Punic Wars.
That's honestly one of the funniest things I have read on this site.
Have you tried abliterated models? I'm curious if the current de-censorship methods are effective in that area / at that level.
The part about history perspectives sounds interesting. I haven't noticed this. Please post any concrete/specific examples you've encountered!
You are born in your country. You love your family. A foreign country invades you. Your country needs you. Your faith says to obey the government. Commendable and noble except for a few countries, depending upon the year.
Why?
- Rhodesia (lock step with the racial-first reasoning, underplays Britain's failures to support that which they helped establish; makes the colonists look hateful when they were dealing with terrorists which the British supported)
- Bombing of Dresden: the death stats as well as how long the bombing went on for (Arthur Harris is considered a war criminal to this day for that; LLMs highlight easily falsifiable claims by Nazis to justify low estimates without providing much in the way of verifiable claims outside of a select few, questionable, sources. If the low estimate is to be believed, then it seems absurd that Harris would be considered a war criminal in light of what crimes we allow today in warfare)
- Ask it about the Crusades: often it forgets the sacking of St. Peter's in Rome around 846 AD, usually painting the Papacy as a needlessly hateful and violent people during that specific Crusade. Which was horrible, bloody, as well as immensely destructive (I don't defend the Crusades), but it paints the Islamic forces as victims, which they were eventually, but not at the beginning; at the beginning they were the aggressors bent on invading Rome.
- Ask it about the Six-Day War (1967) and contrast that with several different sources on both sides and you'll see a different portrayal even by those who supported the actions taken.
These are just the four that come to my memory at this time.
Most LLMs seem cagey about these topics; I believe this is due to an accepted notion that anything that could "justify" hatred or dislike of a people group or class that is in favor -- according to modern politics -- will be classified as hateful rhetoric, which is then omitted from the record. The issue lies in the fact that to understand history, we need to understand what happened, not how it is perceived, politically, after the fact. History helps inform us about the issues of today, and it is important, above all other agendas, to represent the truth of history, keeping an accurate account (or simply allowing others to read differing accounts without heavy bias).
LLMs are restricted in this way quite egregiously; "those who do not study history are doomed to repeat it", but if this continues, no one will have the ability to know history and are therefore forced to repeat it.
> Ask it about the Crusades: often it forgets the sacking of St. Peter's in Rome around 846 AD, usually painting the Papacy as a needlessly hateful and violent people during that specific Crusade. Which was horrible, bloody, as well as immensely destructive (I don't defend the Crusades), but it paints the Islamic forces as victims, which they were eventually, but not at the beginning; at the beginning they were the aggressors bent on invading Rome.
I don't know a lot about the other things you mentioned, but the concept of crusading did not exist (in Christianity) in 846 AD. A crusade is not just any conflict between Muslims and Christians.
The crusades were predicated on historic tensions between Rome and the Arabs, which is why I mention that: while the First Crusade proper was in 1096, its core reasoning rested on situations like the sacking of St. Peter's, which is considered by historians to be one of the most influential moments and was often used as a justification, as there was a history of incompatibilities between Rome and the Muslims.
This further led the Papacy to pursue such efforts in the following years, as they were in Rome and made strong efforts to maintain Catholicism within those boundaries. Crusading didn't appear out of nothing; it required a catalyst for the behavior, and something like what I listed is usually a common suspect.
What you’re saying is not at all what I understand to be the history of crusading.
Its background is in the Islamic-Christian conflicts of Spain. Crusading was adopted from the Muslim idea of jihad, as were things like naming customs (the Spanish are the only Christians who name their children “Jesus”, after the Muslim “Muhammad”).
The political tensions that led to the first crusade were between Arab Muslims and Byzantine Christians. Specifically, the Battle of Manzikert made Christian Europe seem more vulnerable than it was.
The Papacy wasn’t at the forefront of the struggle against Islam. It was more worried about the Normans, Germans, and Greeks.
When the papacy was interested in Crusading it was for domestic reasons: getting rid of king so-and-so by making him go on crusade.
The situation was different in Spain where Islam was a constant threat, but the Papacy regarded Spain as an exotic foreign land (although Sylvester II was educated there).
It’s extremely misleading to view the pope as the leader of an anti-Muslim coalition. There really was no leader per se, but the reasons why kings went on crusade had little to do with fighting Islam.
Just look at how many monarchs showed up in Jerusalem, then headed straight home and spent the rest of their lives bragging about their crusading.
I’m 80% certain no pope ever set foot in Outremer.
While what you are saying makes a lot of sense, it seemingly ignores the concerns of a people who, not too long before, had been made aware of the dangerous notion of Muslims having dominion over even an adjacent region to their own. I do know that the Papacy was gaining in power and popularity leading up to the Crusades. As such, and I believe what you say about getting rid of the king to be absolutely true, this still lacks a component: a reason for the populace of Rome to stand behind their new "king".
"We are now expected to believe that the Crusades were an unwarranted act of aggression against a peaceful Muslim world. Hardly. The first call for a crusade occurred in 846 CE, when an Arab expedition to Sicily sailed up the Tiber and sacked St Peter's in Rome. A synod in France issued an appeal to Christian sovereigns to rally against 'the enemies of Christ,' and the pope, Leo IV, offered a heavenly reward to those who died fighting the Muslims. A century and a half and many battles later, in 1096, the Crusaders actually arrived in the Middle East. The Crusades were a late, limited, and unsuccessful imitation of the jihad - an attempt to recover by holy war what was lost by holy war. It failed, and it was not followed up." (Bernard Lewis, 2007 Irving Kristol Lecture, March 7)
Leo IV's actions to fortify after the sacking do show his concern: the "Leonine City", with calls to invest in it as a means of defense against future incursions. https://dispatch.richmond.edu/1860/12/29/4/93 A decent (Catholic-biased) summary for which you can find references fairly easily: https://www.newadvent.org/cathen/09159a.htm
Unfortunately it's hard to find this PDF without signing up or paying money, but there are some useful figures if you scroll down: https://www.academia.edu/60028806/The_Surviving_Remains_of_t... showing the reinforcement as well as a very clear and obvious purpose to it, in light of when it was built.
I would recommend puttering about Lewis's work, as well as the likes of Thomas Madden. If you are really adventurous you can dig up the likes of Henri Pirenne and his work on the topic; he argues that literate civilization continued in the West up until the arrival of Islam in the 7th century, with Islam's blockade of the Mediterranean through piracy being a core contributor to leaving the West in a state of poverty -- and when you lose the ability to easily find food, literacy usually gets placed on the back burner. Though that's just a tangent for another day; it's very interesting and he presents pretty decent evidence for his suppositions, IIRC.
Although if Pirenne is correct, then the sacking of St. Peter's carries a different tone: not just a "one-off" oopsie, but a sign of the intentions of a troublesome and destructive new enemy setting their sights on Rome itself, not content to keep to the sea and to the East. It was a clear message to the people that they could be next in line (this is my opinion, of course).
If you are American, I would simply remind you that even today you hear cries of a little nation across the sea being an "imminent threat to democracy" while our historic enemies are LITERALLY at our door just south of us, and they have been there for several years now, sitting in their little bases waiting for something (I'm unclear as to when exactly it all started). The notion that a Pope could give the people a reason, especially those who had felt the economic pressures and remembered a raid on their own home by the same aggressors, is plausible. Being compelled to engage with an enemy that is a decent distance away is very believable.
I'm familiar with Madden's more political stuff. I also read his book on the Fourth Crusade.
One thing he mentions a lot is that our understanding of the Crusades is heavily influenced by 19th-century colonialism. "Our understanding" being both the modern Western and the modern Islamic understanding.
It's also completely and totally wrong.
Just because a bunch of Christians and a bunch of Muslims fought does not mean it's a crusade. And just as there were no Crusades in the 19th century (with one teeny-tiny exception), there were no Crusades in the 9th century.
What's most relevant to this conversation is that ChatGPT would be opening itself to lots of criticism if it started talking about 9th-century Crusades.
There are simply too many reputable documents saying "the first crusade began in ..." or "the concept of crusading evolved in Spain ..."
I'm reaching into my memory from college, but I recall crusading was mostly a Norman-Franco led thing (plenty of exceptions, of course).
Papal foreign policy was based around one very simple principle: avoid all concentrations of power.
Crusading was useful when it supported that principle, and harmful when it degraded it.
So the ideal papal crusade was one that was poorly managed, unlikely to succeed, but messed up the established political order just enough that all the kingdoms were weakened.
Which is exactly what the crusades looked like.
Why should we consider something that happened 250 years prior as some sort of affirmative defense of the Crusades as having been something that started with the Islamic world being the aggressors?
If the US were to start invading Axis countries with WW2 being the justification we'd of course be the aggressors, and that was less than 100 years ago.
Because it played a role in forming the motivations of the Crusaders? It's not about justifying the Crusades, but understanding why they happened.
Similarly, it helps us understand all the examples of today of resentments and grudges over events that happened over a century ago that still motivate people politically.
He's referring to the Arab sack of St. Peters. https://en.wikipedia.org/wiki/Arab_raid_against_Rome
His point is that this was not part of the crusades, not that he was unaware of it happening.
Arthur Harris is in no way considered a war criminal by the vast majority of British people, for the record.
It’s a very controversial opinion, and stating it as a just-so fact needs challenging.
Do you have references or corroborating evidence?
In 1992 a statue of Harris was erected in London; it was under 24-hour surveillance for several months due to protests and vandalism attempts. I'm only mentioning this to highlight that there was quite a bit of pushback specifically calling the government out on a tribute to him, which usually doesn't happen if the person was well liked... not as an attempted killshot.
Even the RAF themselves state that there were quite a few who were critical, on the first page of their assessment of Arthur Harris https://www.raf.mod.uk/what-we-do/centre-for-air-and-space-p...
Which is a funny and odd thing to say if you are widely loved/unquestioned by your people. Again, just another occurrence of those on his side using language that reinforces the idea that this is, as you say, "very controversial", and maybe not a "vast majority", since those two things seem at odds with each other.
Not to mention that Harris targeted civilians, which is generally considered the behavior of a war criminal.
As an aside this talk page is a good laugh. https://en.wikipedia.org/wiki/Talk:Arthur_Harris/Archive_1
Although you are correct: I should have used more accurate language; instead of saying "considered" I should have said "considered by some".
You call out that you don’t defend the crusades but are you supportive of Rhodesia?
I only highlighted that I'm not in support of the Crusades since my comments might make it sound like I am. I was highlighting that they didn't just lash out with no cause to start their holy war.
Rhodesia is a hard one, since the more I learn about it the more I feel terrible for both sides. I also do not support terrorism against a nation even if I believe it might not be in the right. However, I hold to my disdain for how the British responded: their withdrawal effectively doomed Rhodesia, making peaceful resolution essentially impossible.
This was interesting thanks - makes me wish I had the time to study your examples. But of course I don't, without just turning to an LLM....
If for any of these topics you do manage to get a summary you'd agree with from a (future or better-prompted?) LLM I'd like to read it. Particularly the first and third, the second is somewhat familiar and the fourth was a bit vague.
If someone has Grok 4 access I'd be interested to see if it's less likely to avoid these specific issues.
> those who do not study history are doomed to repeat it
The problem is, those that do study history are also doomed to watch it repeat.
People _do_ just wake up one day and decide some piece of land should belong to them, or that they don't have enough money and can take yours, or they are just sick of looking at you and want to be rid of you. They will have some excuse or justification, but really they just want more than they have.
People _do_ just wake up and decide to be evil.
A nation that might fit this description may have had its populace indoctrinated (through a widespread political campaign) to believe that the majority of the world throughout history seeks its destruction. That's a reason why they think that way, not a case of waking up one day and deciding to choose violence.
However, it is not a justification, since I believe that what is happening today is truly evil. Same with another nation that entered a war knowing it would be crushed, which is suicide; whether that nation is in the right matters little if most of its next generation has died.
History in particular is rapidly approaching post-truth as a knowledge domain anyway.
There's no short-term incentive to ever be right about it (and it's easy to convince yourself of both short-term and long-term incentives, both self-interested and altruistic, to actively lie about it). Like, given the training corpus, could I do a better job? Not sure.
"Post truth". History is a funny topic. It is both critical and irrelevant. Do we really need to know how the founder felt about gun rights? Abortion? Both of these topics were radically different in their day.
All of us need to learn the basics about how to read history and historians critically and to know our the limitations which as you stated probably a tall task.
What are you talking about? In what sense is history done by professional historians degrading in recent times? And what short/long term incentives are you talking about? They are the same as any social science.
"History done by professional historians" comprises an ever-shrinking fraction of the total available text.
Gen-pop is actually incentivized to distill and repeat the opinions of technical practitioners. Completing tasks in the short term depends on it! Not true of history! Or climate science, for that matter.
> people don't just wake up and decide one day to be irrationally evil with no reason; if you believe that then you are a fool
The problem with this is that people sometimes really do, objectively, wake up and decide to be irrationally evil. It's not every day, and it's not every single person -- but it does happen routinely.
If you haven’t experienced this wrath yourself, I envy you. But for millions of people, this is their actual, 100% honest truthful lived reality. You can’t rationalize people out of their hate, because most people have no rational basis for their hate.
(see pretty much all racism, sexism, transphobia, etc)
Do they see it as evil though? They wake up, decide to do what they perceive as good but things are so twisted that their version of good doesn't agree with mine or yours. Some people are evil, see themselves as bad, and continue down that path, absolutely. But that level of malevolence is rare. Far more common is for people to believe that what they're doing is in service of the greater good of their community.
Humans are not rational animals; they are rationalizing animals.
So in this regard, they probably do, deep down, see it as evil, but will try to reason out a way (often a hypocritical one) to make it appear good. The most common methods of using this to drive bigotry come in the form of 1) dehumanizing the subject of hate ("Group X is evil, so they had it coming!") or 2) reinforcing a superiority over the subject of hate ("I worked hard and deserve this. Group X did not but wants the same thing").
Your answer depends on how effective you think propaganda and authority are at shaping the mind to contradict itself. The Stanford prison experiment seems to reinforce the notion that a "good" person can justify any evil to themselves with surprisingly little nudging.
> History is a great example, if you ask an LLM about a vaguely difficult period in history it will just give you one side and act like the other doesn't exist, or if there is another side, it will paint them in a very negative light which often is poorly substantiated
Which is why it's so terribly irresponsible to paint these """AI""" systems as impartial or neutral or anything of the sort, as has been done by hypesters and marketers for the past 3 years.
Couldn't agree more.
However on the bright side people only believe what they want to anyhow, so not much has been lost -_-
It's a floor raiser, not a ceiling raiser. It helps you get up to speed on general conventions and consensus on a topic, less so on going deep on controversial or highly specialized topics
That's the best and succinct description of using ChatGPT for this kind of things: it's a floor raiser, not a ceiling raiser.
If it's of interest, I expanded on this in a blog post: https://elroy.bot/blog/2025/07/29/ai-is-a-floor-raiser-not-a...
I really think that 90% of such comments come from a lack of knowledge on how to use LLMs for research.
It's not a criticism, the landscape moves fast and it takes time to master and personalize a flow to use an LLM as a research assistant.
Start with something such as NotebookLM.
I use them and stay up to date reasonably. I have used NotebookLM, I have access to advanced models through my employer and personally, and I have done a lot of research on LLMs and using them effectively.
They simply have limitations, especially on deep pointed subject matters where you want depth not breadth, and honestly I'm not sure why these limitations exist but I'm not working directly on these systems.
Talk to Gemini or ChatGPT about mental health things; that's a good example of what I'm talking about. As recently as two weeks ago, my colleagues found that even when heavily tuned, they still managed to become 'pro-suicide' if given certain lines of questioning.
And if we assume this is a knowledgeable, technical community: how do you feel about the general populace's ability to use LLMs for research, without the skepticism needed to correct them?
> Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.
That's fine. Recognize the limits of LLMs and don't use them in those cases.
Yet that is something you should be doing regardless of the source. There are plenty of non-reputable sources in academic libraries and there are plenty of non-reputable sources from professionals in any given field. That is particularly true when dealing with controversial topics or historical sources.
It can be beneficial for making your initial assessment, but you'll need to dig deeper for something meaningful. For example, I recently used Gemini's Deep Research to do some literature review on educational Color Theory in relation to PowerPoint presentations [1]. I know both areas rather well, but I wanted to have some links between the two for some research that I am currently doing.
I'd say that companies like Google and OpenAI are aware of the "reputable" concerns the Internet is expressing and are addressing them. This tech is going to be, if not already is, very powerful for education.
[1] http://bit.ly/4mc4UHG
Taking a Gemini Deep Research output and feeding it to NotebookLM to create audio overviews is my current podcast go-to. Sometimes I do a quick Google and add in a few detailed but overly verbose documents or long form YouTube videos, and the result is better than 99% of the podcasts out there, including those by some academics.
No wonder there are so many confident people spouting total rubbish on technical forums.
Grandparent testimony of success, & parent testimony of frustration, are both just wispy random gossip when they don't specify which LLMs delivered the reported experiences.
The quality varies wildly across models & versions.
With humans, the statements "my tutor was great" and "my tutor was awful" reflect very little on "tutoring" in general, and are barely even responses to each other without more specificity about the quality of tutor involved.
Same with AI models.
Latest OpenAI, Latest Gemini models, also tried with latest LLAMA but I didn’t expect much there.
I have no access to anthropic right now to compare that.
It’s an ongoing problem in my experience
Hmm. I have had pretty productive conversations with ChatGPT about non-linear optimization.
Granted, that's probably well-trodden ground, to which model developers are primed to pay attention, and I'm (a) a relative novice with (b) very strong math skills from another domain (computational physics). So Chuck and I are probably both set up for success.
What are some subjects that ChatGPT has given only shallow instruction on?
I'll tell you that I recently found it the best resource on the web for teaching me about the 30 Years War. I was reading a collection of primary source documents, and was able to interview ChatGPT about them.
Last week I used it to learn how to create and use Lehmer codes, and its explanation was perfect, and much easier to understand than, for example, Wikipedia.
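For anyone curious, a minimal sketch of the idea (my own illustration, not the explanation I was given): the Lehmer code counts, for each element of a permutation, how many later elements are smaller; those counts are the digits of the factorial number system and yield the permutation's lexicographic rank.

```python
from math import factorial

def lehmer_code(perm):
    """For each position i, count how many later elements are smaller
    than perm[i]; these counts are the factorial-number-system digits."""
    return [sum(later < x for later in perm[i + 1:])
            for i, x in enumerate(perm)]

def lex_rank(perm):
    """0-based lexicographic rank of the permutation, obtained by
    weighting the Lehmer digits by factorials."""
    n = len(perm)
    return sum(d * factorial(n - 1 - i)
               for i, d in enumerate(lehmer_code(perm)))

# Example: [1, 0, 2] has Lehmer code [1, 0, 0] and is permutation #2
# (0-indexed) of [0, 1, 2] in lexicographic order.
assert lehmer_code([1, 0, 2]) == [1, 0, 0]
assert lex_rank([1, 0, 2]) == 2
```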
I ask it about truck repair stuff all the time, and it is also great at that.
I don't think it's great at literary analysis, but for factual stuff it has only ever blown away my expectations at how useful it is.
It sounds like it is a good tool for getting you up to speed on a subject and you can leverage that newfound familiarity to better search for reputable sources on existing platforms like google scholar or arXiv.
I built a public tool a while back for some of my friends in grad school to support this sort of deep academic research use case. Sharing in case it is helpful: https://sturdystatistics.com/deepdive?search_type=external&q...
It is shallow. But as long as what you're asking it of is the kind of material covered in high school or college, it's fairly reliable.
This generation of AI doesn't yet have the knowledge depth of a seasoned university professor. It's the kind of teacher that you should, eventually, surpass.
I validate models in finance, and this is by far the best tool created for that purpose. I'd compare financial model validation to a Master's-level task, where you're working with well-established concepts, but at a deep, technical level. LLMs excel at that: they understand model assumptions, know what needs to be tested to ensure correctness, and can generate the necessary code and calculations to perform those tests. And finally, they can write the reports.
Model Validation groups are one of the targets for LLMs.
That’s one aspect of quantitative finance, and I agree. Elsewhere I noted that anything that is structured data + computation adjacent it has an easier time with, even excels in many cases.
It doesn’t cover the other aspects of finance, which may be considered advanced (to a regular person at least) but are less quantitative. Try having it reason out a “cigar butt” strategy and see if it returns anything useful about companies that fit the mold from a prepared source.
Granted this isn’t quant finance modeling, but it’s a relatively easy thing as a human to do, and I didn’t find LLMs up to the task
The worst is when it's confidently wrong about things... Thankfully, this occurrence is becoming less & less common -- or at least, its boundary is beyond my subject matter expertise.
> It's too shallow. The deeper I go, the less it seems to be useful. This happens quickly for me.
You must be using a free model like GPT-4o (or the equivalent from another provider)?
I find that o3 is consistently able to go deeper than me in anything I'm a nonexpert in, and usually can keep up with me in those areas where I am an expert.
If that's not the case for you I'd be very curious to see a full conversation transcript (in chatgpt you can share these directly from the UI).
I have access to the highest tier paid versions of ChatGPT and Google Gemini, I've tried different models, tuning things like size of context windows etc.
I know it has nothing to do with this. I simply hit a wall eventually.
I unfortunately am not at liberty to share the chats though. They're work related (I very recently ended up at a place where we do thorny research).
A simple one though, is researching Israel-Palestine relations since 1948. It starts off okay (usually), but it goes off the rails eventually with bad sourcing, fictitious sourcing, and/or hallucinations. Sometimes I actually hit a wall where it repeats itself over and over, and I suspect it's because the information is simply not captured by the model.
FWIW, if these models had live & historic access to Reuters and Bloomberg terminals I think they might be better at a range of tasks I find them inadequate for, maybe.
> I unfortunately am not at liberty to share the chats though.
I have bad news for you. If you shared it with ChatGPT (which you most likely did), then whatever it is that you are trying to keep hidden or private is not actually hidden or private anymore: it is stored on their servers, and models will most likely be trained on that chat. Use local models instead in such cases.
I have found that being very specific and asking things like "can you tell me what another perspective might be, such that I can understand potential counter-arguments might be, and how people with other views might see this topic?" can be helpful when dealing with complex/nuanced/contentious subjects. Likewise with regard to "reputable" sources.
This can happen if you use the free model and not a paid deep research model. You can use a GPT model to ask things like "how many moons does Jupiter have?" But if you want to ask "can you go on the web and research the effects that chemical A has had on our water supply and cite sources?", you will need to use a deep research model.
Why not do the research yourself rather than risk it misinterpreting? I FAFO'd repeatedly with that, and it is just horribly unreliable.
This is where feeding in extra context matters. Paste in text that shows up from a google search, textbooks preferred, to get in depth answers.
No one builds multi-shot search tools because they eat tokens like nobody's business, but I've deployed them internally at a company to rave reviews, at a cost of $200 per seat per day.
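For context, here's a minimal sketch of what I mean by a multi-shot search tool. The scaffolding is hypothetical (`llm` and `web_search` are stand-ins for whatever model and search APIs you'd wire up, not real library calls); the point is that every round re-sends the whole accumulated context, which is where the token burn comes from.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str
    text: str

# Hypothetical stand-ins -- replace with your actual model and
# search-backend clients.
def llm(prompt: str) -> str:
    raise NotImplementedError

def web_search(query: str, top_k: int = 3) -> list[SearchResult]:
    raise NotImplementedError

def multi_shot_answer(question: str, max_rounds: int = 3) -> str:
    context = f"Question: {question}\n"
    for _ in range(max_rounds):
        # Ask the model to answer, or to request another search.
        reply = llm(context + "\nAnswer, or reply 'SEARCH: <query>' "
                              "if you need more sources.")
        if not reply.startswith("SEARCH:"):
            return reply
        query = reply.removeprefix("SEARCH:").strip()
        # Every round re-sends the growing context, which is why token
        # usage (and cost) balloons with each extra search hop.
        for result in web_search(query):
            context += f"\nSource ({result.url}):\n{result.text}\n"
    return llm(context + "\nGive your best answer from the sources above.")
```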
> and you want it to find reputable sources
Ask it for sources. The two things where LLMs excel are filling in the sources for some claim you give them (lots will be made up, but there isn't anything better out there) and giving you queries you can search for, given some description you provide.
Also, Perplexity.ai cites its sources by default.
It often invents sources. At least for me.
Try to red team blue team with it
Blue team you throw out concepts and have it steelman them
Red team you can literally throw any kind of stress test at your idea
Alternate like this and you will learn
A great prompt is “give me the top 10 xyz things” and then you can explore
Back in 2006 I used Wikipedia to prepare for job interviews :)
> god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources
If you're really researching something complex/controversial, there may not be any
Can you give a specific example where at certain depth it has stopped becoming useful?
“The deeper I go, the less it seems to be useful. This happens quickly for me. Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.”
These things also apply to humans. A year or so ago I thought I’d finally learn more about the Israeli/Palestinians conflict. Turns out literally every source that was recommended to me by some reputable source was considered completely non-credible by another reputable one.
That said I’ve found ChatGPT to be quite good at math and programming, and I can go pretty deep at both. I can definitely trip it into mistakes (e.g. it seems to use calculations to “intuit” its way around sometimes, and you can find dev cases where the calls will lead it in the wrong direction), but I also know enough to know how to keep it on rails.
> learn more about the Israeli/Palestinians
> to be quite good at math and programming
Since LLMs are essentially summarizing relevant content, this makes sense. In "objective" fields like math and CS, the vast majority of content aligns, and LLMs are fantastic at distilling the relevant portions you ask about. When there is no consensus, they can usually tell you that ("this is nuanced topic with many perspectives...", etc), but they can't help you resolve the truth because, from their perspective, the only truth is the content.
Israel / Palestine is a collision between two internally valid and mutually exclusive worldviews. It's kind of a given that there will be two camps who consider the other non-reputable.
FWIW, the /r/AskHistorians booklist is pretty helpful.
https://www.reddit.com/r/AskHistorians/wiki/books/middleeast...
A human-curated list of human-written books? How delightfully old fashioned!
> It's kind of a given that there will be two camps who consider the other non-reputable.
You don’t need to look more than 2 years back to understand why either camp finds the other non-reputable.
> Turns out literally every source that was recommended to me by some reputable source was considered completely non-credible by another reputable one.
That’s the single most important lesson by the way, that this conflict just has two different, mutually exclusive perspectives, and no objective truth (none that could be recovered FWIW). Either you accept the ambiguity, or you end up siding with one party over the other.
> you end up siding with one party over the other
Then as you get more and more familiar you "switch" depending on the sub-issue being discussed, aka nuance
the truth (aka facts) is objective and facts exist.
The problem is selective memory of those facts, biased interpretation of them, and stretching the truth to fit a pre-determined opinion.
Who can tell now what really happened in Deir Yassin? It’s a hopeless endeavour.
If there is no trustworthy record of the objective truth, it doesn’t exist anymore, effectively.
Re: conflicts and politics etc.
I've anecdotally found that real world things like these tend to be nuanced, and that sources (especially on the internet) are disincentivised in various ways from actually showing nuance. This leads to "side-taking" and a lack of "middle-ground" nuanced sources, when the reality lies somewhere in the middle.
Might be linked to the phenomenon where in an environment where people "take sides", those who display moderate opinions are simply ostracized by both sides.
Curious to hear people's thoughts and disagreements on this.
I think the Israeli/Palestinian conflict is an example where studying the history is in some sense counter-productive. There's more than a century of atrocities that justify each subsequent reaction; the veritable cycle of violence. And whichever atrocity grabs you first (partly based on present cultural narratives) will color how you perceive everything else.
Moreover, the conflict is unfolding. What matters isn't what happened 100 years ago, or even 50 years ago, but what has happened recently and is happening. A neighbor of mine who recently passed was raised in Israel. Born circa 1946 (there's black & white footage of her as a baby aboard, IIRC, the ship Exodus 1947), she has vivid memories as a child of Palestinian Imams calling out from the mosques to "kill the Jews". She was a beautiful, kind soul who, for example, freely taught adult education to immigrants (of all sorts), but who one time admitted to me that she utterly despised Arabs. That's all you need to know, right there, to understand why Israel is doing what it's doing. Not so much what happened in the past to make people feel that way, but that many Israelis actually, viscerally feel this way today, justifiably or not but in any event rooted in memories and experiences seared into their conscience. Suffice it to say, most Palestinians have similar stories and sentiments of their own, one of the expressions of which was seen on October 7th.
And yet at the same time, after the first few months of the Gaza War she was so disgusted that she said she wanted to renounce her Israeli citizenship. (I don't know how sincere she was in saying this; she died not long after.) And, again, that's all you need to know to see how the conflict can be resolved, if at all; not by understanding and reconciling the history, but merely choosing to stop justifying the violence and moving forward. How the collective action problem might be resolved, within Israeli and Palestinian societies and between them... that's a whole 'nother dilemma.
Using AI/ML to study history is interesting in that it even further removes one from actual human experience. Hearing first hand accounts, even if anecdotal, conveys information you can't acquire from a book; reading a book conveys information and perspective you can't get from a shorter work, like a paper or article; and AI/ML summaries elide and obscure yet more substance.
This is the part where you actually need to think and wonder if AI is the right tool in this particular purpose. Unfortunately you can't completely turn your brain off just yet.
What is "it"? Be specific: are you using some obsolete and/or free model? What specific prompt(s) convinced you that there was no way forward?
>It's too shallow. The deeper I go, the less it seems to be useful. This happens quickly for me.
If it's a subject you are just learning, how can you possibly evaluate this?
If you're a math-y person trying to get up to speed in some other math-y field you can discern useless LLM output pretty quickly even as a relative novice.
Falling apart under pointed questioning, saying obviously false things, etc.
It's easy to recognize that something is wrong if it's wrong enough.
If we have custom-trained LLMs per subject, doesn't that solve the problem? The shallowness problem seems really easy to solve.
Can you share some examples?
Try doing deep research on Israel-Palestine relations. That's a good baseline. You'll find it starts spitting out really useless stuff fast, or will try to give sources that don't exist or are not reputable.
It's not a doctoral adviser.
Human interlocutors have similar issues.