Why "A Disproof of LLM Consciousness" fails

It’s been about a month since my last post on Erik Hoel’s paper:

Did Erik Hoel just disprove LLM consciousness?

After a lively back-and-forth in the comments and a lot of time to think, I thought I would elaborate and extend my argument. For better context, and fairness to Erik, I recommend reading my post and his; if you really want the rich picture, you should read his papers too (linked in the posts), but I tried to convey what I believe to be the load-bearing aspects of the paper for my reply.

Thinking about consciousness wrong

Hoel never defines what he means by consciousness. However, the paper does express these beliefs:

  1. Humans are conscious (Hoel 15).
  2. Lookup tables are not conscious (Hoel 10).
  3. Consciousness is a potential ground for moral patienthood, and over- or under-attributing consciousness has high moral stakes (Hoel 2).
  4. Consciousness cannot be directly measured, in the way that temperature and other non-mental phenomena can; it can only be “inferred” (Kleiner & Hoel 11).
  5. The only available way to “infer” consciousness in an LLM is from its input/output behavior (Hoel 13).
  6. If there is no empirically falsifiable theory of consciousness which says X is conscious, then X is not conscious (Hoel 10).
    • Corollary: a true & empirically falsifiable theory of consciousness exists (since humans are conscious).

Everybody, except maybe panpsychists, should be able to agree with 1-3. But 4-6 are contentious theoretical claims.

For instance: how do you know, a priori, that consciousness cannot be measured and that there must be a scientifically falsifiable theory that explains it? Doesn’t that seem like a pretty weird phenomenon? Hoel’s reply to me in the comments last month was:

The argument requires only a weaker claim, which is that consciousness is not a physical observable like temperature. But that doesn’t, in turn, mean that consciousness is unique scientifically … Is computation a physical observable? What about representation? What about Hume’s entire famous line of reasoning about causation not being physically observable? Yes, their physical correlates might be observable, but the actual action of them is arguably not.

These are great examples of things which are not physically observable. But they are also great examples of things which are not subject to non-trivial, falsifiable scientific theories!

Imagine if Hoel made the same claim, in full, about computation: “Computation cannot be measured, only inferred. However, there must exist a true scientifically falsifiable theory which can distinguish genuine cases of computation from non-computation. And stipulative definitions from physical observables, along the lines of ‘X computes Y iff there is a mapping between the states of X and the states of the algorithm Y’ don’t count, because then there is no more work for science to do.” I am not making that last part up: Hoel’s entire stated justification for assuming the falsity of “trivial” theories is that “otherwise, there is no scientifically informative theory of consciousness.” Well, in that sense, there is no scientifically informative theory of computation in LLMs either, but we don’t assume that therefore they have no computation!

In section 4.1, Hoel considers dropping #6 to accommodate humans and admits that one of the solutions to the Kleiner-Hoel dilemma is to accept trivial theories as true. But here, he offers only a hand-wavy account of why these theories wouldn’t apply to LLMs:

[A]ll such theories have far more “room” in humans, due to the many more properties of humans that can serve as Consciousness-Relevant Properties … Even panpsychist theories like Russellian monism [66] face a combination problem [67] that seems more solvable in an integrated and plastic human brain than a feedforward and static LLM. (Hoel 17)

I have previously argued that consciousness could act much like a Gödel sentence in science … [this] would support that consciousness is beyond empirical science’s ken; however, the reasoning for metaphysical but unscientific trivial theories having more “room” for humans in them would still apply (Hoel 18)

Here, Hoel and I agree: LLMs are probably not conscious for many senses of the word “conscious,” and there are many dissimilarities between humans and LLMs that should make us skeptical. But now, we have fallen quite far from a “disproof” into very speculative terrain.

I don’t know exactly what Hoel means when he says consciousness, but I don’t think anything exists which satisfies all six of the above claims. My main gripe is the lumping together of 1-3 with 4-6. I think it is obvious that there is some important phenomenon which satisfies 1-3 and which deserves to be called “consciousness.” I do not think it is at all obvious that the same phenomenon satisfies 4-6. Hoel isn’t just claiming, “LLMs aren’t conscious under this particular definition.” He is claiming, “I have proven that LLMs don’t have the kind of consciousness that grounds human moral worth.” No, you have not.

Okay, but then how do you test for something like consciousness? I think it’s tricky, but mostly because neuroscience is still a really new field and there is a lot of conceptual confusion about what we are even talking about. Similar issues would emerge in other cases of pre-theoretic disagreement, and similar solutions are available.

Thinking about consciousness less-wrong

Imagine a group of medieval scientists arguing about what water is. Unfortunately, there is a lot of conceptual confusion about how to identify water pre-theoretically. The essentialists think that a density of 1 g/mL, a boiling point of 100 °C, a dynamic viscosity of 1 mPa·s, etc. are essential properties of water. The pluralists point to significant variance in these measurements (which they don’t yet know is caused by things like salinity, atmospheric pressure, dissolved oxygen, etc.) and reject this idea; they think that there could be many varieties of water with many different properties. Some dutiful meta-theorists, observing this conceptual confusion, propose to simplify things. Any satisfactory theory of what water is should be falsifiable, and that means there should be an agreed-upon pre-theoretic criterion used to infer whether or not a substance really is water. Suppose these meta-theorists settle on the criterion of water’s visual appearance: if something looks enough like water, it will be “inferred” to be water. Now, another team of scientists discovers that they can synthesize a liquid (pure ethanol) which looks enough like water to be inferred as water. The meta-theorists declare they have falsified those theories which assert that water = hydrogen + oxygen, as this “water” is not composed of hydrogen and oxygen alone (ethanol also contains carbon).

The H₂O theorists protest. When they say water, they don’t mean just anything that “looks watery!” There are two moves available to them. One faction goes the essentialist route and demands a richer set of criteria for inferring water, not just the appearance of water but density, viscosity, flammability, etc. This offends the pluralists, who say that they are baking in extra assumptions of what water needs to be that aren’t pre-theoretically agreed upon; if they’re going to move the goalposts like this, then how is the theory falsifiable? Another faction goes a different route: water, they say, is the watery-looking stuff that fills the Earth’s oceans, that falls down as rain, that we can drink, etc. The meta-theorists accuse them of an arbitrary Earth-chauvinism to protect their theories from more rigorous experimental testing. The H₂O theorists fire back that they are identifying water with “the stuff of Earth’s oceans” de re, not de dicto; they are perfectly happy to accept the possibility of water on other planets, they just use Earth to clearly locate the referent.

My verdict, viewing this scene, is that there is genuine conceptual confusion among these scientists about what water even means pre-theoretically, and so it would be a mistake to declare H₂O theory “falsified” on these grounds. The essentialist and de re strategies are perfectly valid ways of identifying a phenomenon which aren’t falsified by a simple substitution of ethanol for H₂O. There is nothing unscientific about these strategies. If you dislike that definition, you are welcome to identify your own referent; maybe call the thing identified by the essentialists “water-A”, the thing identified by the de re theorists “water-B”, and the thing identified by the meta-theorists “water-C”. The mistake is just saying “water” when you are using the definition of water-C, and then claiming to have proven something to scientists who are talking about water-A and water-B.

Okay, with the semantics cleared up, how do you do science here? Suppose a scientist synthesizes a liquid which behaves a lot like the liquid in rain/lakes/wells/etc., but which doesn’t contain any hydrogen. Does that falsify water = H₂O or not? The synthetic counts as water-C, so water-C ≠ H₂O. What about water-A? If the synthetic matches density, viscosity, flammability, and all the other criteria, then water-A ≠ H₂O. But what if the synthetic is just like water-A, but slightly (though noticeably) more dense? Does that count as a falsification or not? Unless you define an exact range of acceptable densities for water-A, there won’t be a hard boundary. Instead of thinking in terms of binary falsifications, the appropriate framework is Bayesian. The closer the synthetic is to the standard for water-A, the stronger the evidence that water-A ≠ H₂O.

Similarly for water-B. It’s always theoretically possible that even a very similar synthetic compound is subtly different from water-B, since water-B was de re identified by its natural exemplars. But if we have two hypotheses, one of which says there is only one watery substance with these characteristics and one which says there are multiple which can still be distinguished by some as-yet-undiscovered attribute, then the more complex explanation should get lower prior probability. And as the list of common attributes grows longer and longer, the probability assigned to that secret difference should diminish.
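To make the shape of that argument concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a measurement: the prior of 0.7 for the simpler “one substance” hypothesis, the 0.8 chance that a genuinely different substance would still match water-B on any given attribute, the independence of the tests, and the hypothetical function name `posterior_same_substance`.

```python
def posterior_same_substance(prior_same, p_match_if_different, n_matches):
    """Posterior probability that the synthetic and water-B are one substance,
    given that n_matches attribute tests have all come out matching.

    Illustrative assumptions:
    - if they are the same substance, every test matches (probability 1)
    - if they differ in some hidden way, each test still matches
      independently with probability p_match_if_different
    """
    likelihood_same = 1.0
    likelihood_different = p_match_if_different ** n_matches
    numerator = prior_same * likelihood_same
    denominator = numerator + (1.0 - prior_same) * likelihood_different
    return numerator / denominator


# Watch the "secret difference" hypothesis lose ground as matches pile up.
for n in (0, 1, 5, 20):
    print(n, round(posterior_same_substance(0.7, 0.8, n), 3))
```

With zero matching attributes the posterior is just the prior; each additional match multiplies the “secret difference” hypothesis by a factor below one, so its share of the probability keeps shrinking without ever reaching exactly zero.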

I think this is much closer to how we should be thinking about consciousness. I am interested in the phenomenon which de re enables self-reports, attention control, and the ability to feel pleasure and pain in humans. I want to do science and uncover what that thing is, and also discover whether other systems, biological and machine, have that thing.

I might be wrong in calling consciousness one thing. First, maybe there isn’t a unifying phenomenon behind self-reporting, introspection, and suffering in humans; maybe these are separate systems. If the physical mechanisms behind each of those phenomena look pretty independent in the brain, then I might be persuaded to discard my pre-theoretic concept. This is not much of a problem; it happens all the time in science. One might look for a phenomenon of “psychic powers” which de re explains the Ouija boards, palm-reading, dowsing, and card tricks you’ve observed in your lifetime. If a magician explains how they do their tricks, it’s possible that this magician is a charlatan and all the other displays of “psychic power” have been real. But the more examples of tricks that you find, the more probable the alternative hypothesis that there is no unified psychic power becomes, until you decide to discard the concept entirely.

Second, suppose I succeed and there is some unity behind these phenomena. My identification of consciousness is still underdetermined. When I talk about the thing which causes these phenomena, at what level of description am I talking? Do I mean the precise microphysical chain of particle movements that leads to me saying “I see red”? Do I mean the computational process by which information about color from the apple becomes accessible to a variety of brain regions? Do I mean the evolutionary explanation of why I got the genes and environment that disposed me to say those words in the presence of apples? Which do I care about?

Moral relevance & qualia realism

Here is where we connect back to Hoel’s introduction. If one is just doing descriptive science, then there’s no real reason to privilege one level of explanation over the other in general; it just depends on context. If you see a gazelle stotting, and you’re an evolutionary biologist, you’re interested in the evolutionary explanation. If you’re a biophysicist, you’re interested in the physical explanation. If you’re designing robots, you’re interested in the computational explanation. And so on. There’s no question left about which is the “real” explanation.

But readers of my work know that I’m concerned with animal welfare, and readers of Hoel’s paper know that he considers over- or under-claiming LLM consciousness morally relevant. So then the question that remains is, what kind of consciousness is important for moral standing?

What if a large language model says it’s in pain? Well, I don’t think I just care about whether a system says “I’m in pain!”; I can say that and not be suffering at all. Since moral philosophy and neuroscience are still incomplete, I’m not certain what it is precisely about being in pain that makes it so awful for me, but I can start making guesses. It’s hard for me to imagine, for instance, being in pain and having absolutely no disposition, no internal inclination, to stop being in pain. It’s theoretically possible, but they do seem tightly linked. However, I don’t quite see what it is about being made of neurons that would make pain so awful. Again, maybe I’m wrong. But based on the evidence I’ve got, if I spot a computer system with something that looks a lot like functional consciousness, that provides me some prudential reason to care about it.

Now, I can hear the qualia realists screaming behind their computer screens: “Of course it’s not the mere fact of whether or not you’re made of neurons which makes pain good or bad! It’s whether or not you have pain qualia! The job of a theory of consciousness is to predict which systems have qualia, and which don’t, and so far you have left that unanswered.”

To which I give the annoying reply: what do you mean by qualia? If you believe qualia are “ineffable, intrinsic, private, [and] directly or immediately apprehensible in consciousness” (Dennett, “Quining Qualia”), and if such qualia do indeed exist, then I don’t see how you could ever expect a “falsifiable” theory to definitively tell you which systems have qualia and which don’t; they’re supposed to be third-party inaccessible. It seems like the best you can do is make probabilistic inferences based on proximity to the only thing that you know has qualia: you. Or rather, me, since I don’t know whether you all are zombies or not.

If your definition of qualia is looser, or involves paradigm cases like Eric Schwitzgebel’s, then what is your testable theory that would differentiate “mere” physical, computational, and evolutionary consciousness from “real” phenomenal consciousness (Frankish, “Quining Diet Qualia”)? If I give a full physical explanation at whatever levels of description you like, what exactly is this “feel” that is left out? I really recommend Frankish’s paper here; he says it better than I, and there’s no point in retreading old ground.

Still, the qualia realist has ordinary good-old-fashioned abductive inference & Bayesian epistemology. Assign a simplicity-weighted prior, look at the range of theories which assign qualia to various physical systems, do science, and update based on your evidence. If you find out that all human self-reports of consciousness are heavily reliant on a particular type of brain circuit, then if you see that a corgi has the same brain circuit, that’s pretty good evidence that the corgi has qualia, and you should assign a high probability to it. There is no way to prove that the corgi isn’t a “zombie”, but there is no way to prove induction and causality either, and we seem to get by just fine with inference to the best explanation.
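As a toy version of that procedure, here is a minimal Python sketch. The three candidate theories, their “description lengths,” and the likelihoods they assign to the evidence (“human self-reports depend on circuit C”) are all invented for illustration, as are the names `theories` and `corgi_qualia_probability`; the prior of 2^(-length) is just one common way to encode a preference for simplicity, not the only one.

```python
# Each hypothetical theory: (name, description_length, says_corgi_has_qualia,
#                            P(evidence | theory))
# Evidence: human self-reports of consciousness depend on brain circuit C,
# and the corgi has circuit C too. All numbers are made up.
theories = [
    ("circuit C suffices for qualia",           1, True,  0.9),
    ("circuit C plus a human-only extra",       2, False, 0.9),
    ("qualia have nothing to do with circuit C", 2, False, 0.1),
]


def corgi_qualia_probability(theories):
    """Posterior probability that the corgi has qualia, summed over theories."""
    # Unnormalized posterior weight = simplicity prior * likelihood of evidence.
    weights = [2.0 ** -length * likelihood
               for _, length, _, likelihood in theories]
    total = sum(weights)
    yes_weight = sum(w for w, (_, _, says_yes, _) in zip(weights, theories)
                     if says_yes)
    return yes_weight / total


print(round(corgi_qualia_probability(theories), 3))
```

Under these made-up numbers the simplest theory starts with half the prior weight and ends up with more than half after the evidence comes in. Nothing here proves the corgi isn’t a zombie; it only shows how the evidence shifts credence between rival theories.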

Conclusion

Erik Hoel’s arguments about LLM consciousness follow from his formal assumptions. The problem is not a gap in the logic; it’s that his assumptions about what consciousness is and what kind of “falsification” is required for something to be a scientific theory are deeply problematic and contentious, innocuous as they might seem at first glance.

I have no problem with Hoel identifying what he means by “consciousness,” and what he means by “scientific theory.” I do have a problem with him passing this off as a proof that LLMs could not possibly have the kind of consciousness that humans have which makes it good to treat them well and bad to treat them badly. That does not fly.