The Turing Test, AI, and LLMs

15/01/2026

Alan Turing, one of my ‘heroes’, along with Ada Lovelace, who will be the subject of a separate blog.


Seventy-five years after it was first proposed, the Turing Test still sits at the centre of debates about artificial intelligence, even as the machines it was meant to challenge now converse with us daily. That alone says something important: not just about technology, but about the question Alan Turing was really asking.


Alan Turing remains one of the most influential thinkers of the twentieth century. During the Second World War he played a central role at Bletchley Park, devising methods and machines that helped break the German Enigma cipher, an achievement widely believed to have shortened the war by several years. Yet his wartime work, extraordinary as it was, represents only part of his legacy. Turing also laid the theoretical foundations of modern computer science, most famously through the concept of the "Turing machine", which still underpins how computation itself is understood.


In 1950, Turing turned his attention to a more provocative question: can machines think? Rather than becoming trapped in philosophical arguments about the nature of thought, he reframed the problem into what he called the imitation game. The idea was deceptively simple. A human judge engages in text-based conversation with two unseen participants, one human and one machine. If the judge cannot reliably tell which is which, the machine is said to have passed the test. The emphasis is not on internal processes, consciousness, or intent, but on observable behaviour.
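

For readers who like to see the structure made concrete, here is a minimal sketch of that setup in Python. It is purely illustrative: human_reply, machine_reply, and the placeholder judge are hypothetical stand-ins for the participants Turing described, and the only point is the shape of the protocol, that the judge sees nothing but text.

    import random

    def imitation_game(questions, human_reply, machine_reply):
        # human_reply and machine_reply are hypothetical stand-ins for the
        # two hidden participants; each maps a question to an answer.
        # Randomly assign them to the anonymous channels A and B.
        channels = {"A": human_reply, "B": machine_reply}
        if random.random() < 0.5:
            channels = {"A": machine_reply, "B": human_reply}

        # The judge sees only labelled text, never the participants.
        transcript = [(label, q, reply(q))
                      for q in questions
                      for label, reply in channels.items()]

        guess = judge(transcript)  # the channel the judge names as the machine
        return channels[guess] is not machine_reply  # True: the machine passed

    def judge(transcript):
        # Placeholder judge: a real judge reads the answers. Turing's test is
        # whether they can identify the machine reliably better than chance.
        return random.choice(["A", "B"])

    # Trivial stand-ins, just to make the sketch runnable:
    human = lambda q: "I would have to think about that."
    machine = lambda q: "I would have to think about that."
    print(imitation_game(["What is humour?"], human, machine))

Turing's criterion is statistical rather than a single pass or fail: the machine succeeds if, over many such rounds, the judge cannot do reliably better than the coin flip in the placeholder above.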


This framing was radical for its time. Early computers were widely regarded as glorified calculators, powerful but rigid and unimaginative. Turing sidestepped metaphysics and asked an operational question instead: if a system behaves as if it were intelligent, on what grounds do we deny it that label? His approach reflected an engineer's mindset rather than a philosopher's, grounding intelligence in performance rather than essence.


The constraints of the test were deliberate. Communication is textual, removing visual and auditory cues that would make the distinction trivial. The machine is not expected to be perfect; indeed, occasional hesitation or error may make it more convincing. Intelligence, in this framing, includes ambiguity, humour, context, and inconsistency, features of human communication that go far beyond raw calculation.


Today, the Turing Test is no longer treated by serious researchers as a definitive goal. Modern AI systems routinely outperform humans in narrow domains while failing badly at common-sense reasoning, emotional understanding, or moral judgement. As a result, the test is better understood as a historical milestone and a conceptual provocation rather than a final exam for machine intelligence.


A clearer, more contemporary way of expressing Turing's original insight might be this: if a machine can participate in human conversation so naturally that its artificial nature becomes irrelevant, we are forced to reconsider what we mean by intelligence. It is that challenge, not the mechanics of the test itself, that remains his most enduring contribution.


Despite those achievements, Turing's life ended tragically. He was prosecuted in 1952 for homosexual acts, then a criminal offence in Britain, subjected to chemical castration, and died in 1954. His posthumous pardon, granted in 2013, stands as both recognition of his scientific importance and a stark reminder of the social injustices of his time. Today he is remembered not only as a pioneer of computing, but as a symbol of intellectual courage and human dignity.


So, does anything actually pass the Turing Test as Turing conceived it? The most accurate answer remains no, not yet.


What has happened is that machines have become exceptionally good at passing simplified or weakened versions of the test. In brief, time-limited, text-only conversations with non-expert judges, modern AI systems can and do convince people that they are human. There have even been staged demonstrations and competitions where judges were misled at rates that would have astonished researchers a generation ago. These, however, are best understood as demonstrations of linguistic fluency rather than evidence of human-level intelligence.


Turing's original thought experiment implicitly assumed something far tougher than a few minutes of polite chat. He envisaged sustained dialogue, with the interrogator free to change direction, probe inconsistencies, revisit earlier claims, and test memory, humour, moral judgement, and the ability to cope with ambiguity over time. Under those conditions, all existing systems eventually reveal their limits. They lose coherence, fabricate facts, misunderstand intent, or fail to maintain a stable internal model of the world.


It is also worth stressing that no single, formal version of the Turing Test was ever fixed. Turing deliberately left it loose. He was more interested in provoking debate than in setting an exam paper for future machines. That looseness has allowed the goalposts to drift. When headlines claim that a chatbot has "passed the Turing Test", it is usually a narrow, game-like interpretation that strips away much of the depth Turing assumed but never codified.


Perhaps the most revealing point is this: humans pass the test effortlessly not because they are good at conversation, but because conversation is grounded in lived experience. Humans remember past interactions, revise beliefs, feel embarrassment, change their minds, and care whether what they say is true. No existing artificial system does these things in a literal sense. They simulate the outputs of such processes, sometimes uncannily well, but without the continuity that gives those outputs meaning over time.


Large language models make this tension impossible to ignore. In brief, tightly scoped exchanges, especially when judges are not actively probing for weaknesses, LLMs can appear indistinguishable from humans. In that narrow, operational sense, one could argue that the test has been functionally passed. But this was never what Turing had in mind. Over longer interactions, or when pushed outside learned patterns, the cracks appear: confident errors, shallow reasoning, invented facts, and an absence of genuine grounding in the real world.


The deeper issue is that LLMs do not understand in the human sense. They do not hold beliefs, intentions, or experiences. They predict language rather than grasp meaning. Ironically, their very fluency can give the game away: they are often too consistent, too polite, too composed, qualities that an experienced interrogator may recognise as artificial.
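

A toy example makes the point concrete. The sketch below, again in Python, is nothing like a real large language model, but it shows the underlying move: predicting which word tends to come next from counted patterns alone, with no beliefs or world behind the words.

    from collections import Counter, defaultdict

    # Toy 'language model': count which word follows which in a tiny corpus,
    # then predict the most frequent continuation. It captures surface
    # patterns of text with no grounding in what the words refer to.
    corpus = "the judge asks the question and the machine answers the question".split()

    following = defaultdict(Counter)
    for word, nxt in zip(corpus, corpus[1:]):
        following[word][nxt] += 1

    def predict_next(word):
        counts = following.get(word)
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("the"))  # -> 'question': frequency, not understanding

Real LLMs replace this counting with probabilities learned over vast corpora and far longer contexts, but the distinction drawn above, fluency without grounding, survives the scale-up.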


There is also a shift in context that Turing could not have anticipated. In 1950, the challenge was whether machines could imitate humans at all. Today, we know they can. The more relevant question is whether imitation is even the right benchmark. In a world where AI systems draft contracts, analyse trade documents, and support real-world decision-making, sounding human matters far less than being reliable, auditable, and trustworthy.


In that sense, the Turing Test has not been passed, but it has been outgrown. By coming so close to Turing's benchmark, modern AI has exposed its limitations. The real challenge now is not whether machines can think like us, but whether thinking like us was ever the right goal in the first place.


www.tradefinance.training 