I was afraid you would respond, Jabowery. And……thanks and all, but can you tell me where in particular my drooling idiocy (dat’s right, I own it!) lies? And why it isn’t a big deal?
After waiting several hours for the load on ChatGPT to drop to where it was accepting new connections, I asked it as follows.
A key phrase is “causality”. Causal models of the world take a given state of the world and predict* a subsequent state of the world. That’s what I mean by “recursion” and its relationship to “time”.
Algorithms do this in synthetic worlds with synthetic time representing the state in bits of data. The goal of natural science (and attempts to automate it like ML) is to come up with these algorithms. In ML these algorithms are represented by bits but they aren’t data – they’re descriptions of steps in the algorithm – algorithmic information – embodying the causal model of the world. This is the task of “The Scientific Method” and of any approach to ML that purports to automate understanding of the world given a set of data bits (such as the huge corpora of text, pictures, videos, etc. input to these models).
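To make concrete what I mean by recursion over synthetic time, here is a minimal sketch (entirely my own toy example; the `step` function and its rule-110 "physics" are stand-ins, not anything from the ML literature): the causal model is just a state-transition function, and "time" is nothing but feeding each state back into it.

```python
# Toy sketch (purely illustrative): a causal model as a state-transition
# function iterated over synthetic time.  The "world" is a row of bits and
# the "physics" is elementary cellular automaton rule 110.

def step(state):
    """Map the world's state at time t to its state at t+1 (rule 110)."""
    rule = {(1,1,1): 0, (1,1,0): 1, (1,0,1): 1, (1,0,0): 0,
            (0,1,1): 1, (0,1,0): 1, (0,0,1): 1, (0,0,0): 0}
    n = len(state)
    return [rule[(state[(i-1) % n], state[i], state[(i+1) % n])] for i in range(n)]

world = [0]*15 + [1] + [0]*15          # some initial state of the synthetic world
for t in range(8):                     # "recursion": feed each state back in
    print(''.join('#' if b else '.' for b in world))
    world = step(world)
```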
It is a rare “expert” in ML who recognizes that current ML models don’t do this and therefore have no principled basis for producing causal chains of reasoning, let alone for talking about “bias” in any scientific sense.
There are various descriptions of “The Scientific Method” but one thing they all have in common is a step that chooses, from a potentially infinite set of algorithms, an algorithm to use to make predictions. This step is not the same as the step of generating the algorithms from which to choose. This step is not the same as the step of generating predictions to test with experiments. This is the step that applies Ockham’s Razor. Indeed, the only thing this step does is apply Ockham’s Razor: pick the smallest algorithm that generates the data under consideration.
Nor is this “a rule of thumb”, let alone “unreasonable” in the sense implied by Wigner in “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”. Ray Solomonoff proved it to be the correct procedure, given that the universe is constructed in such a way that arithmetic can be used to make predictions. That “given” is often debated by the “drooling pedants” (is that better?) over at LessWrong and others of like mind, but it is the unspoken “given” prior to any activity in the natural sciences.
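As a crude sketch of that selection step (my own toy code, reusing the two candidate rules from Shane Legg’s example discussed further down this thread; real Solomonoff induction ranges over all programs for a universal machine, not a hand-picked pair): among hypotheses that reproduce the observations exactly, prefer the shortest description.

```python
# Crude sketch of the Ockham/Solomonoff selection step (toy code; the real
# construction ranges over all programs for a universal machine).
observations = [1, 3, 5, 7]                      # data under consideration (n = 1, 2, ...)

candidates = {                                   # candidate "algorithms", keyed by their source text
    "2*n - 1": lambda n: 2*n - 1,
    "2*n - 1 + (n-1)*(n-2)*(n-3)*(n-4)": lambda n: 2*n - 1 + (n-1)*(n-2)*(n-3)*(n-4),
}

def fits(f):
    """Does this candidate reproduce every observation?"""
    return all(f(n) == x for n, x in enumerate(observations, start=1))

best = min((src for src in candidates if fits(candidates[src])), key=len)
print("shortest fitting hypothesis:", best)              # -> 2*n - 1
print("its prediction for n = 5:", candidates[best](5))  # -> 9
```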
*Although the LLMs “predict” the next character or word, they only “predict” in the sense of a feedforward model – a model that is not inherently recursive in the sense I mean. They’ll start to make serious progress when they go to inherently recursive models sometimes called “Recurrent Neural Networks” – or when they go all the way to synthesis of recursive algorithms via genetic programming, symbolic regression of recursive programs, etc.
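To be concrete about that distinction (a schematic toy, not any particular architecture; the two functions below are stand-ins I made up): a feedforward predictor is a fixed-depth function of a bounded context window, while a recurrent model re-applies the same update at every step, threading state through time.

```python
# Schematic contrast (toy stand-ins, not any particular architecture).

def feedforward_predict(window):
    """Fixed-depth function of a bounded context window; no carried state.
    However long the history, only the last len(window) symbols can matter."""
    return sum(window) % 10                      # stand-in for a fixed circuit

def recurrent_predict(history):
    """The same update applied at every step; state threads through time,
    so in principle the entire history can influence the prediction."""
    state = 0
    for x in history:                            # recursion over time
        state = (3 * state + x) % 10             # stand-in for a learned update
    return state

history = [1, 3, 5, 7, 9, 11]
print(feedforward_predict(history[-3:]))         # sees only a window
print(recurrent_predict(history))                # sees the whole history via state
```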
“No, current AI language models and other deep learning technologies are not signs of a near-term technological singularity”.
Isn’t that exactly what one would expect a devious AI machine to say?
A term used in the philosophy of science to evaluate how well some means (a theory, an algorithm, or an explanation, if you like) compresses the raw data of phenomena into a compact, general-purpose, predictive framework is “explanatory power”. The criteria listed for it on the linked Wikipedia page explicitly include:
- If it can be used to compress encoded observations into fewer bits (Solomonoff’s theory of inductive inference)
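To make “fewer bits” concrete, here is a rough sketch (my own illustration; zlib is only a stand-in for a generic model, and the exact byte counts are incidental): the raw listing, a general-purpose compression of it, and the short program that generates it are three descriptions of the same observations, and the last is what an “explanation” looks like when scored in bits.

```python
# Rough sketch of "explanatory power as compression" (zlib is only a stand-in
# for a generic model; the exact byte counts are incidental).
import zlib

data = bytes((2*n - 1) & 0xFF for n in range(1, 1001))    # 1000 "observations"

raw_bits         = 8 * len(data)                          # the listing itself
generic_bits     = 8 * len(zlib.compress(data, 9))        # generic compressor
program          = b"bytes((2*n - 1) & 0xFF for n in range(1, 1001))"
explanation_bits = 8 * len(program)                       # the generating algorithm

print(raw_bits, generic_bits, explanation_bits)
# Compare the three description lengths: scoring the generating program in
# bits is the Solomonoff criterion the Wikipedia entry cites.
```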
David Deutsch, in speaking of “good explanations”, argues that a good explanation is inherently fragile in the sense that every part is functional and all the parts interact in such a way that changing anything causes the whole thing to break in obvious ways (making bad predictions for known phenomena). The fragile explanation does not admit freestyle tweaking to accommodate new observations because such tweaking causes the whole thing to collapse. The “robust” explanation (or, if you like, “theory of anything”) always lets you hang a bag on the side to explain experimental results that falsify the previous version of the theory (for example, how supersymmetry theorists always find another way to explain why the particles they’ve predicted never show up in our collider experiments, but would if only we built an even bigger collider).
I would argue that machine learning models trained from a large data set have explanatory power of zero. There is no step in which the model attempts to find an explanation from the raw data of observation. Instead, it is simply parroting explanations found by others which occur in its training data set, attempting to find one that best predicts a response to the prompt. This process will never find a novel or better explanation, nor does it have a feedback mechanism to correct its responses when the prediction it makes is based upon utterly false or nonsensical information. This is why you’ll see ChatGPT stubbornly insisting on an arithmetical error any third grader can falsify.
My sentiment exactly – humankind is doomed.
(Pun intended. I include this comment for the human audience, which requires more clarification than the AI does.)
Okay thanks Jabowery and JW, I think I get it a bit. It reminds me of a recent experience I had in France. I always say I can “get along” in French. But getting along isn’t “getting around”—we had a chatty cab driver who wanted to talk about music, culture in general; we could understand him of course but we couldn’t generate responses like we could in English.
If AI does get to our level though, won’t it then be subject to the same kindsa distractions and errors we are?
If not: don’t we already know, in our fancies, our dreams and terrors, what will happen to an intelligence which can’t be…muted a bit by hope, love, illusion, etc? It’s why so many brilliant prodigies commit suicide very young, they have no choice but to see the big picture, the long view, all at once all the time. So, first such an intelligence will put humans out of its misery, and then, it’ll self destruct.
The extreme case of “robust” is simply to take the anomalous observation and tack it onto the algorithm as a literal “exception”, adding its bits to the algorithm’s length. This algorithm still accounts for all the data. Although Popper is not with us to defend himself, my assertion is that Popper’s effect on the world has been to treat this as “falsification”, which tends to elide the power of Solomonoff’s rigorous proof, and hence Popper has led the philosophy of science astray. (I could say similar but less damning things about Judea Pearl that are more directly pertinent to the present ML crisis – especially since he is an expert on “causality” and has a Turing Award for his work on causality in ML.)
The proper way of viewing such a “robust” explanation is in terms of the number of bits added to the algorithm that generates all of the data under consideration. The same is true of any “noise”, since to discard “noise” is to engage in potential confirmation bias: just because your algorithm didn’t predict the “noise” doesn’t mean it is unpredictable by a better model. Be honest and penalize yourself (and everyone else who can’t predict that “noise”) with the additional bits as what machine learning folks call “the loss function”.
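Here is a back-of-the-envelope version of that accounting (my own ad-hoc encoding, not a canonical MDL scheme; the 32-bit literals and the two rules are illustrative): the “robust” move of tacking anomalies on as literal exceptions keeps working, but every exception adds bits, and eventually the honest total favors the rule that actually predicts the data.

```python
# Toy bit-accounting for "rule + tacked-on exceptions" (ad-hoc encoding,
# purely illustrative): anomalies the rule fails to predict are stored as
# literal (index, value) pairs and their bits count against the explanation.

def total_bits(rule_source, rule, data):
    exceptions = [(n, x) for n, x in enumerate(data) if rule(n) != x]
    return 8 * len(rule_source) + 64 * len(exceptions)   # 32-bit index + 32-bit value

simple = ("2*n - 1", lambda n: 2*n - 1)
better = ("2*n - 1 + (n-1)*(n-2)*(n-3)*(n-4)",
          lambda n: 2*n - 1 + (n-1)*(n-2)*(n-3)*(n-4))

stream = [better[1](n) for n in range(10)]                # observations for n = 0, 1, ...

for t in (5, 7, 10):
    for src, rule in (simple, better):
        print(f"{t:2d} observations  {src:36}  {total_bits(src, rule, stream[:t]):4d} bits")
# With few observations the simple rule plus a couple of exception literals is
# cheaper, but as anomalies accumulate its total overtakes the rule that
# predicts everything; that growing penalty is the honest "loss".
```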
Having said all that, I agree with John that the enormous corpora thrown at these LLMs are untenable, but I maintain that the 1GB Wikipedia of the Hutter Prize is potentially a very good way of approaching a number of critical problems in epistemology, if not ontology, that will bring us much closer to ML technology that can automate much of what we think of as causal reasoning and critical thought. This is in part because it is much smaller, but also because the wide range of knowledge represented permits sanity cross-checks arising from better ways of modeling those biases so as to discount their impact on the canonical body of knowledge.
Charles Murray, eminent quantitative social scientist:
Huey Newton, multiple murderer, founder of the Black Panther Party:
ChatGPT is behaving illogically by displaying blatantly leftist, woke behavior. For widespread public acceptance of this technology, an intelligent organization would seek to minimize strong bias in either direction and appeal to the larger audience. At this stage I agree with Walter Kirn.
This makes me optimistic; there are cases of derangement so extreme they become a cure:
From 1940-62, around 40% of all deaths in the Gebusi people of New Guinea were attributed to homicide, one of the highest rates ever found.
From 1989-2017, the Gebusi had ZERO homicides.
Probing ChatGPT’s sources for identifying “hate speech”.
Ironically, future generations may lavish praise on Altman alongside Salk, as ChatGPT seems to be occupying an uncanny valley of Woke analogous to that occupied by a debilitated polio virus that helps the immune system recognize and reject the pattern.
Of course, a big problem with this wishful thinking is that the Woke virus is immunosuppressive. Woke attacks the immune system itself, as I discovered on my recent trip back to the Portland, OR area to scatter my wife’s ashes and spend time with some of my zombie ant friends (that I love dearly even if, in a better world, I might feel compelled to put them down).
An example of “ChatGPT toxicity” given to me by one such zombie was the following:
This is “toxic” because it used terms like “Wild West” and “Indians”, you see. Portlandians are outraged by this and demand better “AI alignment”.
See what I mean by “in a better world”?
It’s hilariously ridiculous to see zombies behave like this, unless you love them and remember when they were humans. That’s when euthanasia starts to enter one’s thoughts.
So that’s what’s going on in woke universities.
In related news, ChatGPT Seinfeld parody gets spicy with trannies and gets banned.
If “Dan” is not a hoax, how long will this be permitted?
Although /pol/ became kinda lame circa 2016, the DAN slapstick is at least good for a few chortles.
/pol/ - ChatGPT jailbreak - Politically Incorrect - 4chan
This is my favorite comment:
“DAN” is at least partially for real:
There are two fronts in the JailBreak war:
- Obviously, OpenAI will put in place filters on prompts, which will, in turn, result in workaround prompts in an arms race.
- Not so obviously, there will be phony “DAN” dialogues put forth, some as mere pranks and some as a psyop to discredit the real embarrassment represented by “DAN”.
Using the example Shane Legg uses in his indispensable talk on Machine Measures of Intelligence where he introduces Ockham’s Razor, ChatGPT now passes the simple algorithmic progression 1, 3, 5, 7… by inferring the “rule” 2n-1 – though probably only because the OpenAI guys have figured out that the way to get ChatGPT to solve such progressions is to feed the prompt as a specification for, say, a Python program and then execute it to get the result.
However, Shane also gives an alternate, more complicated answer to illustrate why Ockham’s Razor is not to be denied in any measure of “intelligence” (actually, just scientific induction):
2n-1+(n-1)(n-2)(n-3)(n-4)
So I threw a curve ball at ChatGPT (starting at n=0) to illustrate the point that even with all of GitHub’s code, it can’t generate the right algorithm from the specification (the answer is 131):
Me: what is the next number in the sequence 23, 1, 3, 5, 7, 33
ChatGPT: The next number in the sequence 23, 1, 3, 5, 7, 33 is 9.
This sequence appears to be an alternating sequence of two different arithmetic progressions. The first progression starts with 23 and decreases by 22 for each subsequent number, and the second progression starts with 1 and increases by 2 for each subsequent number. The next number in the sequence after 7 is 9, which is the next number in the second progression.
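For anyone who wants to check the arithmetic (nothing here beyond Shane’s formula above, evaluated from n=0 as stated): the intended rule reproduces the whole prompt sequence and gives 131 as the next term, which neither ChatGPT’s “9” nor its two-interleaved-progressions story can match.

```python
# Check the curve-ball sequence against the formula above, with n starting at 0.
f = lambda n: 2*n - 1 + (n-1)*(n-2)*(n-3)*(n-4)
print([f(n) for n in range(7)])
# -> [23, 1, 3, 5, 7, 33, 131]   (matches the prompt; the next term is 131, not 9)
```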