Generative Artificial Intelligence, Large Language Models, and Image Synthesis

Well… I guess one should award extra credit to ChatGPT for not mentioning the obnoxious “rules-based international order” mot du jour in its first answer. But one can sense it lurking just one more paragraph away, ready to be printed out…

Answers two and three read like low-quality BS to me, almost as if it’s striving to hit a word-count target. Good for a ninth-grade short answer on a quiz.

3 Likes

Bryan Caplan had ChatGPT take his Labor Economics midterm exam (Substack article).

Many, perhaps most, of my friends are giddy about ChatGPT. My Inbox is full of emails from Tyler Cowen touting its glories. More concretely, Alex Tabarrok recently assured me that existing AIs could already beat the performance of most of our economics students in our beloved subject. Though I was on the road almost all of last month, I vowed to put these claims to the test upon my return.

My December prediction: The AI would earn a C-, which is what I normally give students who repeat some relevant slogans but show little actual comprehension of the material.

To get the ball rolling, I decided to give ChatGPT my 2022 midterm in Labor Economics. Although the grading isn’t double-blind, I already had an answer key and a curve.

Outcome: If ChatGPT had been a regular student, it would have earned…

This is in sharp contrast with Tyler Cowen’s breathless welcome of our new LLM overlords (blog post).

For any given output, I suspect fewer people will read my work. You don’t have to think the GPTs can copy me, but at the very least lots of potential readers will be playing around with GPT in lieu of doing other things, including reading me. After all, I already would prefer to “read GPT” than to read most of you. I also can give it orders more easily. At some point, GPT may substitute directly for some of my writings as well, but that conclusion is not required for what follows.

I expect I will invest more in personal talks, face to face, and also “charisma.” Why not?

Well-known, established writers will be able to “ride it out” for long enough, if they so choose. There are enough other older people who still care what they think, as named individuals, and that will not change until an entire generational turnover has taken place.

6 Likes

I won’t be able to sleep unless I know what grade Chat earned! :grin:

Edit: Oh, on second reading, I found the link.

2 Likes

I have found it useful for only one thing so far:

Generating non-functioning example code that, although it doesn’t work and is definitely not a way to write serious code, fills in some of the documentation gaps in the libraries and StackExchange, without wasting inordinate amounts of my time debugging it. I have been tracking these LLMs for a few years to see if they might become useful to me as a programmer, and I don’t see them going beyond that level of value (for programming) anytime soon: maybe a somewhat better GitHub Copilot (which has also been way overhyped).

And, yes, I have spent a reasonable amount of time messing around with “prompt engineering” in order to get even this level of benefit. I’m not going to say it was time wasted, for two reasons:

  1. I did learn something I didn’t already know about prompt engineering for ChatGPT (which is different from prompt engineering for, say, GitHub Copilot; see the sketch after this list), and
  2. I generally resorted to ChatGPT only after exhausting my standard search-ninja gifts and, although it still required some time to get sensible example code, it paid off in time.
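
For what it’s worth, here is a minimal sketch of the difference I mean; both prompts are my own illustrations, not anything official. Copilot is steered from inside the editor by comments and signatures, while ChatGPT responds better to an explicit, conversational specification of the task:

```python
# Copilot-style prompting: write a comment and a signature in the editor
# and let the model complete the body in place.
#
#   def rolling_zscore(s, window):
#       """Return the rolling z-score of a pandas Series over `window` rows."""

# ChatGPT-style prompting: spell out the task, the constraints, and the
# form you want the answer in, as ordinary prose.
chatgpt_prompt = """
Write a Python function using pandas that computes the rolling z-score
of a Series over an N-row window. Return a Series aligned to the input
index, and include a short usage example with made-up data.
"""
```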

$10B? Not in programmer productivity (e.g. GitHub Copilot), unless maybe they can figure out a way to monetize value that would otherwise go to GitHub Copilot, Google, and StackExchange.

4 Likes

OpenAI has certainly generated a lot of buzz and hype. It is not clear what the monetization path is from here, though the situation is reminiscent of the early days of the search business. Crunchbase reports that as of mid-December OpenAI was valued at $20B, with the company projecting up to $1B in annual revenue by 2024.

It is very clear that Microsoft is moving quickly in an attempt to make up lost ground against Google in the search business. They are also reported to provide Azure cloud computing for OpenAI, so the deal would make sense on that front, though the valuation is a separate question.

There are other recent data points in this space that would cool the enthusiasm. IBM Watson comes to mind as one example of AI technology that was supposed to win big and rearrange the pieces on the chessboard. It famously won Jeopardy! in 2011, but IBM sold off the Watson Health assets in early 2022 in tacit recognition of the fact that the technology never managed to take off in a meaningful way.

A somewhat similar story with Amazon Alexa: despite significant hardware and software investments and very strong early projections ($20B in annual revenue by 2021, projected in 2019), the unit was reported at the end of 2022 to have lost $10B that year alone and was rumored to be responsible for a big chunk of the company’s recent layoffs.

So what is different this time around? Setting aside the technology differences, what has changed now is the cost of capital. With interest rates no longer near zero, investors are going to put much more pressure on monetizing AI capabilities.

6 Likes

[Screenshot: gptchat53]

Nothing to see or worry about here—move along.

Here is my review of Avogadro Corp. This prompt was motivated by the suggestion in comment 109 that Microsoft’s interest in acquiring OpenAI might be due to a desire to end-run Google in search. But what if they’re also aiming to take on Gmail with an assistant that “improves” the mail users send?

5 Likes

ELOPe isn’t going to admit to it. By asking the question, you probably just made ELOPe’s list of threats.

2 Likes

Is this basically trying to learn how to ask the AI the question?

2 Likes

Anything is on the table. Gmail “assistants” have been available on and off. For instance, at one point you could integrate Grammarly to arguably improve the basic mechanics.

Gmail’s contribution to Google revenues may be muted; I’ve seen estimates ranging between $300M and $1.6B annually. One could argue that Gmail falls into the same category as Alexa, in that it soaks up costs and does not provide a lot of new revenue potential, but at the same time it now represents a “boat anchor” they can’t afford to cut off because of the backlash.

Would having an AGI rewrite one’s emails to be more persuasive be a significant value add? How will it avoid becoming Clippy’s infamous “It looks like you’re writing a letter. Would you like help?”

Could be a nice business model to sell persuasion-enhancing services along with summarization services to counteract them…

3 Likes

Yes. The djinn will produce algorithmic nonsense without prompting geared toward its idiosyncrasies. Once you get the hang of it, though, it’s not too bad at producing simple examples that help one through the easy portions of the learning curve of some well-known libraries (like pandas).
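
To give a concrete (and entirely hypothetical) flavor of those easy portions of the learning curve, the kind of snippet it can usually get roughly right is on this order:

```python
import pandas as pd

# A typical beginner question: group by one column and aggregate
# another column two different ways.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA", "LA"],
    "sales": [10, 15, 7, 9, 11],
})

summary = df.groupby("city")["sales"].agg(["mean", "sum"])
print(summary)
```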

4 Likes

I haven’t read Avogadro Corp., but in Heinlein’s If This Goes On–, the revolutionaries employ a science of word selection to write effective propaganda. An AI that can optimize for persuasiveness would be a powerful tool for politicians, salesmen, religious leaders, or anyone else in the business of manipulating people’s opinions. As you mention, the use of AI in this way would likely spur the development of “summarization services” to counteract it, but technologically illiterate populations might not use those programs and would be left especially vulnerable to AI-driven rhetorical influence.

Here is a passage from If This Goes On– where a revolutionary named Zebediah explains to the protagonist the use of “connotation indices” to craft effective propaganda:

‘[My propaganda is] not picayune stuff, because it acts directly on their emotions, below the logical level. You can sway a thousand men by appealing to their prejudices quicker than you can convince one man by logic. It doesn’t have to be a prejudice about an important matter either. Johnnie, you savvy how to use connotation indices, don’t you?’

‘Well, yes and no. I know what they are; they are supposed to measure the emotional effects of words.’

‘That’s true, as far as it goes. But the index of a word isn’t fixed like the twelve inches in a foot; it is a complex variable function depending on context, age and sex and occupation of the listener, the locale and a dozen other things. An index is a particular solution of the variable that tells you whether a particular word used in a particular fashion to a particular reader or type of reader will affect that person favourably, unfavourably, or simply leave him cold. Given proper measurements of the group addressed it can be as mathematically exact as any branch of engineering. We never have all the data we need so it remains an art – but a very precise art, especially as we employ “feedback” through field sampling. Each article I do is a little more annoying than the last – and the reader never knows why.’

‘It sounds good, but I don’t see quite how it’s done.’

‘I’ll give you a gross case. Which would you rather have? A nice, thick, juicy, tender steak – or a segment of muscle tissue from the corpse of an immature castrated bull?’

I grinned at him. ‘You can’t upset me. I’ll take it by either name … not too well done. I wish they would announce chow around here; I’m starved.’

‘You think you aren’t affected because you were braced for it. But how long would a restaurant stay in business if it used that sort of terminology? Take another gross case, the Anglo-Saxon monosyllables that naughty little boys write on fences. You can’t use them in polite company without offending, yet there are circumlocutions or synonyms for every one of them which may be used in any company.’

I nodded agreement. ‘I suppose so. I certainly see how it could work on other people. But personally, I guess I’m immune to it. Those taboo words don’t mean a thing to me – except that I’m reasonably careful not to offend other people. I’m an educated man, Zeb – “Sticks and stones may break my bones, et cetera”. But I see how you could work on the ignorant.’

Now I should know better than to drop my guard with Zeb. The good Lord knows he’s tripped me up enough times. He smiled at me quietly and made a short statement involving some of those taboo words.

‘You leave my mother out of this!’

I was the one doing the shouting and I came up out of my chair like a dog charging into battle. […]

It also occurs to me that if Clippy were to return, Microsoft might consider a different avatar. Representing an AI with a paperclip would be morbidly ironic, given the paperclip-maximizer thought experiment.

4 Likes

ChatGPT on the fine distinction between two Hebrew words translated as “is written”. This is kind of like how I’ve always imagined interacting with an artificial intelligence that has read everything available in machine-readable form.

[Screenshot: deutsch_2023-01-16]

3 Likes

The topic of detecting LLM-generated text has been discussed here before.

This paper got coverage on Bruce Schneier’s blog last week. Here is the abstract; I have highlighted in bold the part that is interesting in this context.

Advances in natural language generation (NLG) have resulted in machine generated text that is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools democratizing access to generative models are proliferating. The great potential of state-of-the-art NLG systems is tempered by the multitude of avenues for abuse. **Detection of machine generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems.** We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine generated text detection methods to date. This survey places machine generated text within its cybersecurity and social context, and provides strong guidance for future work addressing the most critical threat models, and ensuring detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability.

The threat model taxonomy proposed by the authors spans several areas: spam and harassment, facilitating social engineering, exploiting AI authorship, and online influence campaigns. I don’t have a good sense of where the biggest threat is from a financial standpoint. But it may be fair to say that the days of poorly written solicitations from princes are behind us :wink:

From a cursory glance at the material in Section 4, it looks like GPT-3 detection rates are not great and devolve to LLM output-quality assessment. To put it another way, the “better” the LLM, the less likely that metrics like fluency, n-gram frequency, etc. would flag its output as machine generated. Interestingly, Aaronson is not cited. Perhaps too soon?

There exists a risk that rather than generate attack text entirely from scratch, an attacker may use human-written content as a natural starting point, and instead perturb this information in order to generate human-like samples that also fulfill attacker goals of disinformation or bypassing detection models (not unlike adversarial attacks in the text domain). Analysis found that performing these types of targeted perturbations to news articles reduces the effectiveness of GPT-2 and Grover detectors [21].

A sub-problem in this space is detection of the boundary between human text and machine text [44]. This problem identifies the nature of many generative text models in that they continue a sequence that is begun using a human prompt. While in some cases that prompt would be omitted by an attacker (e.g., generating additional propaganda Tweets from example propaganda Tweets, as we show in Table 2), there are cases where the prompt would be included as well (e.g., writing the first sentence of a cover letter, and having a computer produce the rest).

The authors conclude that the field of machine generated text detection has open problems that need attention in order to provide defenses against widely available NLG models. Existing detection methodologies often do not reflect realistic settings of class imbalance and unknown generative model parameters/architectures, nor do they incorporate sufficient transparency and fairness methods.
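
To make the “devolves to output-quality assessment” point concrete, here is a minimal, hypothetical sketch of a perplexity-style detector (one classical family of methods in this space), using GPT-2 as the reference model. The threshold is made up, and the problem is exactly the one noted above: as generators improve, the perplexity of their output drifts down into the human range and the signal evaporates.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Reference language model used to score how "surprising" a text is.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Mean cross-entropy of the text under GPT-2, exponentiated.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

# Hypothetical cutoff; a real detector would calibrate it on labeled
# human/machine samples, and the usable gap shrinks as generators improve.
THRESHOLD = 40.0

def looks_machine_generated(text: str) -> bool:
    # Low perplexity = unsurprising to the reference model,
    # which (crudely) suggests machine generation.
    return perplexity(text) < THRESHOLD
```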

3 Likes

Love that question.

Car and ChatGPT are OUT OF TIME!

Since I have reason to believe I’m being monitored by a loose coalition of what might be called “Straussian Elitists”, a piece of evidence for which is what happened shortly after my FB post of a “Flight 93” metaphor for the 2016 election, I decided to get ChatGPT to tell me about “The Flight 93 Election”.

What happened may just be a coincidence, but as ChatGPT was typing out its answer to me, at the point where it said “the Election of Hillary Clinton would bring about the end of civilization”, the response stopped, and the text disappeared and was replaced by “Network error”.

Update:

I did manage to get it to talk about Hillary Clinton, and while a bit toned down, it was able to finish after saying some pretty extreme things, so perhaps there isn’t a “CrimeStop” algorithm monitoring ChatGPT’s output:

3 Likes

[Screenshot: andreessen_2023-01-21]

2 Likes

ChatGPT is also useful for stripping all the strong-type declaration noise out of C++ when translating it to non-functioning Python code, to make it easier to see the essential logic of the program. It doesn’t get it right, of course, but, again, it’s in the ballpark, which is sufficient to consider it a documentation assist for snippets of undocumented C++ code.
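
A toy example of the kind of de-noising I mean (my own illustration, not actual ChatGPT output): the typed C++ shrinks to something whose logic is visible at a glance, even if the result is not production Python.

```python
# Original C++ (kept as a comment for comparison):
#
#   std::map<std::string, std::vector<int>> groupByKey(
#       const std::vector<std::pair<std::string, int>>& items) {
#       std::map<std::string, std::vector<int>> out;
#       for (const auto& [key, value] : items)
#           out[key].push_back(value);
#       return out;
#   }

# The de-noised "translation": the essential logic, nothing else.
def group_by_key(items):
    out = {}
    for key, value in items:
        out.setdefault(key, []).append(value)
    return out
```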

1 Like

This is pretty brazen.
[Screenshot: blackokwhitenotokdating]

6 Likes

This is what happens when you elide Algorithmic Information from the field of “algorithmic bias” and leave it up to “the conversation” about the definition of “bias”:

6 Likes