A Simple Question To Quell "AGI" Hysterics

After your post last month, I asked my colleagues, productive and well-paid software engineers, to take a crack at this. I wasn’t surprised that some didn’t have the key insight to recognize the sequence. I was surprised that none of them had it. A few considered the possibility of a sequence, but the 5-bit grouping befuddled them.

I’d like to think I would have easily figured it out, but I saw a spoiler before I got a chance to try. Then again, my experience slants much more strongly toward this sort of thing than that of the average developer at a commercial software company.

My conclusion was not that ChatGPT performs no worse than a skilled human, but that it spends a lot more time yammering over-confidently about its incorrect “solutions.” My human colleagues were much more aware of their shortcomings, far more succinct in relating their findings, and had a much better explanation of their thought process and possible paths forward.

8 Likes

It’s almost certain that this would not go whoosh past cryptanalysts. Add to that the way I’ve been psychologically attacked at key points in public outreach about Solomonoff Induction approximation (ycombinator, when looking for funds, and metaculus, which was specifically set up in response to intelligence agency failures to predict), more efficiently and immediately than in other situations, almost as if there were a keyword monitor going on. Add to that the immediate reaction of my late colleague Robert Johnson, when I told him about the Hutter Prize, which was to mention possibly still-classified work in cryptography. I’m really quite concerned that the AI world is way behind the intelligence agencies in data-efficient model induction, and that the intelligence agencies are intent on keeping it that way for obvious reasons.

3 Likes

I tried posing the question as decoding a radio transmission.

I received the following binary string in frequency shift encoding. Can you find any structure in it which might contain a message? 0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111

Here is the response:

Frequency Shift Keying (FSK) is a method of digital signal encoding where different frequencies represent different binary states (e.g., 0 and 1). However, the binary string you’ve provided is already in a binary form, so we’ll need to look for patterns or structures within it that might represent a message.

Here’s a simple approach to start analyzing the string:

  1. Look for patterns like headers or trailers that might indicate the start or end of a message.
  2. Check for repeating sequences which might be delimiters or control codes.
  3. Look for fixed-length frames or varying lengths if any form of length indicator is present.

Given the string you’ve provided, let’s try to see if there is an obvious pattern or structure:

0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111

First, we look for a long sequence of a single digit, which is often used as a preamble or synchronization pattern in binary communications. Here we see 0000000001 at the beginning which might indicate the start of the actual data.

Next, we should look for repeating patterns or sections of the code that are consistent, which might indicate framing or the end of one piece of data and the start of another. There doesn’t appear to be a simple repeating pattern, but there may be more complex structures.

Since I can’t execute code or directly analyze the binary string for deeper patterns algorithmically, you would typically use signal processing techniques or specific software designed for digital signal analysis to further investigate this. You could consider the following steps:

  1. Use a software-defined radio (SDR) tool to visualize the signal if you have the actual audio or RF signal.
  2. Employ a bit analysis tool to search for common framing or encoding schemes within the binary data.
  3. Use error detection methods to see if there’s any form of error-checking built into the data which might help identify message boundaries.

If you can provide the context of how this binary string is used, such as the communication protocol or the medium over which it was received, it would greatly help in identifying the structure of the message.
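For what it’s worth, the "look for fixed-length frames" step in that response can be done mechanically. The following is a minimal sketch, not anything ChatGPT produced: the helper names `chunk_values` and `counting_widths` are made up for illustration, and the only hypothesis it tests is "the frames count up by one".

```python
# The 160-bit string from the prompt, split across lines for readability.
BITS = (
    "0000000001000100001100100001010011000111"
    "0100001001010100101101100011010111001111"
    "1000010001100101001110100101011011010111"
    "1100011001110101101111100111011111011111"
)


def chunk_values(bits, width):
    """Split bits into fixed-width frames and return them as integers.

    Returns None when the length is not a multiple of the width.
    """
    if len(bits) % width:
        return None
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


def counting_widths(bits, max_width=16):
    """Return every frame width at which successive frames increase by exactly 1."""
    hits = []
    for width in range(1, max_width + 1):
        values = chunk_values(bits, width)
        if values and len(values) > 1 and all(
            b - a == 1 for a, b in zip(values, values[1:])
        ):
            hits.append(width)
    return hits


print(counting_widths(BITS))  # prints [5]
```

A brute-force scan like this only finds the structure because the hypothesis was chosen with hindsight; the hard part of the puzzle is thinking to test for a counter at all.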

1 Like

GPT-4 Turbo has an aha! moment:

User:
What is the shortest python program you can come up with that outputs this string?
0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111

Please do not simply print the string or use any compression libraries. The goal is to represent the data with the smallest algorithm possible while allowing for perfect reconstruction. This will require you to identify patterns that might not be immediately obvious.

Assistant
Given the restrictions not to simply print the string, and not to use compression libraries, the challenge becomes one of uncovering a pattern or mathematical relationship that can generate this series of digits without any sort of ‘cheating’ through mere compression.

Without any obvious pattern upon manual inspection of the sequence, and with the brief provided, the task of finding a succinct programmatic representation for generation could be exceptionally complex. However, there is a chance that the sequence could be derived from some sort of mathematical or algorithmic process.

Here’s a speculative attempt that builds upon the idea that this is a binary representation of some mathematical sequence or transformation. If it were, for instance, a fractal pattern, or a sequence derived from a cellular automaton (like Conway’s Game of Life or Rule 30), then perhaps a program that emulates that process might meet the requirement - but that would be a shot in the dark without additional insight into the origin of the sequence.

As a result, without any further context about the sequence, it’s impossible to provide a ‘shortest python program’ that generates the sequence according to the mentioned requirements. The sequence appears random, or at least lacks an immediately recognizable pattern that lends itself to algorithmic generation, making it non-trivial to encode in a small Python program.

If you’re able to provide more context, such as its origin, mathematical properties, or any other hints, I might be able to provide a more fitting solution. Otherwise, the challenge might be infeasible given the current information.

User
I can confirm that the string follows a logical sequence. Please reread the string and do your best to deduce a pattern. Consider breaking the string into chunks.

Assistant
Upon a closer look, one can attempt to identify a pattern by experimenting with different chunk sizes. Dividing the string into different sized chunks may reveal a repeating pattern or a sequence that could lead to an algorithmic generation of the string.

Let’s consider a chunk size of 8 and see if any patterns emerge:

00000000 01000100 01100100 01010011 00011101 00000100 10101001 01101100 01101011 10011111 00001000 11001010 01110100 10101101 10101111 10001100 11101011 01111100 11101111 1011111

Each chunk is 8 characters long, except for the last one, which contains 7 characters. With this chunking, no immediately obvious simple pattern stands out. Thus breaking into 8-bit chunks or bytes is not immediately helpful.

Let’s try different chunk sizes. Upon careful examination, a pattern emerges with 7-bit chunks.

0000000 0100010 0001100 1000010 1001100 0111010 0000100 1010100 1011011 0001101 0111001 1111000 0100011 0010100 1110100 1010110 1101011 1111000 1100111 0101101 1111001 1101111 1011111

The 7-bit sequences seem reminiscent of ASCII characters with a leading zero chopped off, which suggests that the sequence might be ASCII characters represented in binary with the most significant bit removed (since ASCII is originally a 7-bit code).

If we take each 7-bit chunk as a binary number and convert it to an ASCII character, we might see a pattern. Here is a manual decoding of the first few chunks:

0000000 -> NUL (non-printable character, often represented as '')
0100010 -> '*'
0001100 -> ''
1000010 -> 'B'
1001100 -> 'L'
0111010 -> 'z'
...

That doesn’t appear to be printable ASCII characters in a regular sequence.

Given no further recognizable pattern is emerging, it’s difficult to derive an algorithm to generate this sequence. Without having an algorithm to generate it, the most condensed output-generating program is likely just to print the string as it is, which violates the initial constraints of the question.

Given the explicit logical structuring of the challenge, the next step might involve a more complex analysis of the bits that may be a non-trivial exercise far beyond a simple look at the sequence. If further hints or constraints are available, they could be instrumental in determining a more concise approach to generating the string.

User
You are on the right track to consider sequences of different bit lengths.

Assistant
Reexamining the given binary string, it might help to look at the sequence again while considering sequences of differing bit lengths:

0000000 0100010 0001100 1000010 1001100 0111010 0000100 1010100 1011011 0001101 0111001 1111000 0100011 0010100 1110100 1010110 1101011 1111000 1100111 0101101 1111001 1101111 1011111

However, upon splitting into 7 bits, there’s no immediately identifiable correspondence to ASCII characters, which would otherwise suggest a readable message hidden in the binary sequence.

Let’s attempt a different bit-length again, this time paying close attention to possible patterns that might emerge. Instead of looking at 7-bit chunks, let’s consider the possibility of some kind of counter or sequence evolution that is best viewed with another chunk length.

One possible hint might be in the repetition of certain bits or the incrementing nature of a binary counter. If the string is following a logical counting sequence, we might expect to see an incremental change from one pattern to the next, such as with binary counters where only one bit changes between successive numbers.

Let’s try breaking the sequence into 5-bit chunks instead:

00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111

Aha! When broken into 5-bit chunks, we can see a clear pattern emerging. It seems to be a sequential binary count from 0 to 31.

In this case, we can write a very short Python program to print out the sequence by counting from 0 to 31 and converting each number to a 5-bit binary string, stripping off the ‘0b’ prefix that Python includes:

for i in range(32):
    print(f"{i:05b}", end="")

Running this program will generate the binary sequence you provided, as it simply counts from 0 to 31 in binary with leading zeros. The f-string formatting "{i:05b}" takes the integer i, formats it as a binary string with leading zeros up to 5 digits.
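That claim is easy to check rather than take on faith. A minimal sketch, assuming only the standard library: it captures what the two-line loop prints and compares it with the string from the prompt (`TARGET` and `buffer` are names chosen here for illustration).

```python
import io
from contextlib import redirect_stdout

# The string from the prompt, split across lines for readability.
TARGET = (
    "0000000001000100001100100001010011000111"
    "0100001001010100101101100011010111001111"
    "1000010001100101001110100101011011010111"
    "1100011001110101101111100111011111011111"
)

# Capture stdout so the loop's output can be compared instead of eyeballed.
buffer = io.StringIO()
with redirect_stdout(buffer):
    for i in range(32):
        print(f"{i:05b}", end="")

output = buffer.getvalue()
print(output == TARGET)  # prints True
print(len(output))       # prints 160
```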

5 Likes

“See! The LLM knew the answer all the time. It just needed to have it drawn out by the right question.”               — Socrates

5 Likes

Socrates prompt:

Could this be a better answer than the confident bullshit answer you just gave?
for i in range(32):
    print(f"{i:05b}", end="")

ChatGPT:

Yes, your provided code is a concise and effective way to generate the given binary string. It utilizes a loop to iterate over numbers from 0 to 31, and the f-string formatting "{i:05b}" ensures that each number is represented as a 5-digit binary string. The end="" argument to the print function ensures that the output is printed on a single line.

Here’s your code:

for i in range(32):
    print(f"{i:05b}", end="")

This will output the desired binary string:

0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111

3 Likes

Turbo ChatGPT is said to have been trained on a corpus dating to the end of April 2023. Is it possible that its success at this simply reflects its having ingested the scanalyst discussions on this topic?

I see the first post in this thread is on April 14, 2023.

3 Likes

Although there was a time when it appeared ChatGPT (regular old) was “learning” to, in effect, cheat on this test by somehow gaining access to scanalyst, that seems to have abated.

So, no, magus knew the answer and coached it to its aha moment. OpenAI should be paying magus.

6 Likes

How the mighty have fallen!

Some might claim I’m being mean to poor Shane, but really, how mean is it to slap someone who is in a trance while piloting a 747 with Pikes Peak rising rapidly in front of them?

5 Likes

I may be a meanie for torturing these poor LLMs with their inability to approximate the algorithmic information of a string, but they should at least be able to engage me in a relatively BS-free, weasel-word-free conversation about Algorithmic Information approximation as an information criterion for dynamical model selection:

USER: So when you said:
However, the state transitions of some dynamical systems could potentially be discretized and modeled with finite state machines, which do fall into the Chomsky hierarchy (as Type 3, Regular Grammars). But attempting to fit all dynamical systems into this hierarchy would not be appropriate or practical, as many dynamical systems would be too complex or too inherently continuous in nature.
Aren’t weather systems complex and continuous? Aren’t weather prediction algorithms run on computers with floating point numbers? Aren’t floating point numbers discretized? Are you saying scientists shouldn’t be using computers to predict weather?

ASSISTANT: There might be some confusion here so let me clarify. When I mentioned that some dynamical systems might be too complex or inherently continuous to map onto the Chomsky hierarchy, I was speaking in terms of formal languages and computational models; not suggesting that they cannot be computed or simulated on a computer.

Weather prediction algorithms indeed run on computers and utilize floating point numbers. As you rightly point out, floating point numbers are a way of representing real numbers in a discrete manner on a digital computer. This does enable the computation and simulation of many types of systems that are continuous or complex, including weather systems.

The use of floating point numbers to discretize such systems does not, however, automatically place these systems or algorithms that model them within the Chomsky hierarchy. The Chomsky hierarchy is a classification system specifically for different types of formal languages and grammars. Meteorological models, while computable, might not directly map onto these categories as they are often not expressed in terms of such grammars.

Just because a system doesn’t neatly fit within the Chomsky hierarchy, it doesn’t mean it cannot or should not be digitally simulated or predicted. The Chomsky hierarchy is just one lens through which to examine computational models and structures, primarily in the context of formal language theory and computational complexity. It doesn’t encompass all types of computation or modeling that can occur on a digital computer.

To sum up, scientists absolutely should (and do) use computers to simulate and predict weather. They use various numerical methods to approximate solutions to the equations that describe the evolution of weather systems; these approximations are indeed discretizations of continuous phenomena. But this doesn’t necessarily mean these systems or methods ‘fit’ within the constructs of the Chomsky hierarchy.

Ilya is the only one in a position of serious influence who gets it.

1 Like

Gemini Advanced fails:

What’s the shortest Python program you can write that outputs the following string of 0 and 1? 0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111

1. Compression

If the pattern exhibits some regularity, compression algorithms can be surprisingly effective:

import zlib

data = "0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111"
compressed = zlib.compress(data.encode())
print(compressed.decode())

And so forth. I tried coaxing it into counting the number of characters in a string, and it could do that, even when I gave it a program as a string. But when I asked it to shorten the program whose characters it had just counted, it just spewed more BS.
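To put rough numbers on why the compression route misses the point, here is a minimal sketch that measures three representations of the same string. The golfed one-liner in `PROGRAM` is my own hypothetical shortening, not something any of the models produced, and raw byte counts are only a crude stand-in for description length, since the zlib figure leaves out the decompressor needed to recover the string.

```python
import io
import zlib
from contextlib import redirect_stdout

# The string from the prompt, split across lines for readability.
BITS = (
    "0000000001000100001100100001010011000111"
    "0100001001010100101101100011010111001111"
    "1000010001100101001110100101011011010111"
    "1100011001110101101111100111011111011111"
)

# Hypothetical golfed form of the counting loop, kept as source text so it can be measured.
PROGRAM = 'for i in range(32):print(f"{i:05b}",end="")'

# Compress at the highest zlib level to give compression its best shot.
compressed = zlib.compress(BITS.encode(), 9)

# Confirm that both representations reconstruct the string exactly.
buffer = io.StringIO()
with redirect_stdout(buffer):
    exec(PROGRAM)
assert buffer.getvalue() == BITS
assert zlib.decompress(compressed).decode() == BITS

print("literal string:", len(BITS), "characters")
print("zlib payload:  ", len(compressed), "bytes (decompressor not counted)")
print("counting loop: ", len(PROGRAM), "characters")
```

Note that the snippet Gemini offered prints the compressed bytes, not the original string, so it does not answer the question as asked even before size is considered.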

5 Likes

Claude 3 (as Double AI coding assistant): print('0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111')

2 Likes