GPT-3 Can Write Regular Expressions

I have written here numerous times about GPT-3, the language model developed by OpenAI, which has produced stunning and sometimes scary results on problems such as text completion, question answering, writing computer code, and generating text-based adventure games. Just search for “GPT-3” in the site search box for links.

Now, “Aarya” has applied GPT-3 to one of the most arcane corners of programming: composing regular expressions to match patterns in text. If you have never encountered regular expressions, you may consider yourself privileged, although gnarly-fingered programmers might call you sheltered. Regular expressions pack a lot of power into a few characters, but they can drive you crazy to write and debug. A simple example might be to find words that begin with a vowel and end with “ology”.
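Here is a rough hand-written Python sketch of that search, assuming plain ASCII letters. The pattern and the function name `find_vowel_ology` are illustrative; this is not AutoRegex output.

```python
import re

# Illustrative pattern (assumption, not AutoRegex output): a vowel at a word
# boundary, any letters, then "ology" at the end of the word.
VOWEL_OLOGY = re.compile(r"\b[aeiouAEIOU][a-zA-Z]*ology\b")

def find_vowel_ology(text: str) -> list[str]:
    """Return every word in text that begins with a vowel and ends with 'ology'."""
    return VOWEL_OLOGY.findall(text)

print(find_vowel_ology("Astrology, biology, ecology and oncology"))
# ['Astrology', 'ecology', 'oncology']
```

The leading `\b` is what keeps “biology” out: the “i” is not at the start of a word, so the vowel class cannot match there.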


Here’s one that validates MasterCard numbers.
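For comparison, here is a commonly used hand-written MasterCard pattern in Python. It is an assumption, not the regex from the post. It accepts 16 digits starting with 51 to 55 or 2221 to 2720, and it checks only prefix and length, not the Luhn checksum. `looks_like_mastercard` is a hypothetical helper name.

```python
import re

# Assumed MasterCard pattern: a 4-digit prefix in 5100-5599 or 2221-2720,
# followed by 12 more digits (16 digits total). No Luhn check is performed.
MASTERCARD = re.compile(
    r"(?:5[1-5][0-9]{2}"       # 5100-5599
    r"|222[1-9]|22[3-9][0-9]"  # 2221-2299
    r"|2[3-6][0-9]{2}"         # 2300-2699
    r"|27[01][0-9]|2720)"      # 2700-2720
    r"[0-9]{12}"               # remaining 12 digits
)

def looks_like_mastercard(number: str) -> bool:
    """True if the whole string has a MasterCard prefix and length."""
    return MASTERCARD.fullmatch(number) is not None

print(looks_like_mastercard("5555555555554444"))  # True
print(looks_like_mastercard("4111111111111111"))  # False (a Visa-style number)
```

Using `fullmatch` instead of `^...$` anchors ensures the entire string must match, with no trailing newline allowed.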


How about one that matches an HTML tag in a document and its matching closing tag?
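The key trick for “its matching closing tag” is a backreference: capture the tag name, then require the same text after `</`. Below is a minimal hand-written sketch in Python. The pattern and `tag_pairs` are illustrative assumptions, not the regex from the post. Like any regex, it cannot handle arbitrarily nested tags of the same name; a real HTML parser is needed for that.

```python
import re

# Group 1 captures the tag name; \b stops "<pre>" from being read as "<p" + "re";
# \1 requires the closing tag to repeat the captured name exactly.
TAG_PAIR = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1>", re.DOTALL)

def tag_pairs(html: str) -> list[tuple[str, str]]:
    """Return (tag name, inner text) for each tag that has a matching close."""
    return TAG_PAIR.findall(html)

print(tag_pairs('<p class="x">Hello</p> <b>bold</b> <i>mismatch</b>'))
# [('p', 'Hello'), ('b', 'bold')]
```

The mismatched `<i>...</b>` pair is skipped because `\1` would need `</i>`.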


The GPT-3 based tool is called AutoRegex, and requires you to create a free account or sign in with Google in order to use it. You can enter an English description of what you want to match, and it generates a regular expression to do that. For example:

duplicate words

words beginning with a vowel and ending with “ology”

valid IPv4 address
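To sanity-check what a tool like this returns, it helps to have hand-written versions of the same requests. The Python patterns below are illustrative assumptions, not AutoRegex output. The IPv4 pattern restricts each octet to 0 through 255 with no leading zeros. `duplicate_words` and `is_valid_ipv4` are hypothetical helper names.

```python
import re

# Duplicate words: capture a word, then require whitespace and the same word again.
# The trailing \b keeps "test testing" from counting as a duplicate.
DUPLICATE_WORDS = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# One IPv4 octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
# Four octets separated by literal (escaped) dots; {{3}} becomes {3} in the f-string.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

def duplicate_words(text: str) -> list[str]:
    """Return each word that is immediately repeated."""
    return DUPLICATE_WORDS.findall(text)

def is_valid_ipv4(address: str) -> bool:
    """True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(address) is not None

print(duplicate_words("This is is a test test."))  # ['is', 'test']
print(is_valid_ipv4("192.168.0.1"))                # True
print(is_valid_ipv4("256.1.1.1"))                  # False
```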

You can also give it a regular expression and ask it to explain, in English, what it does. Let’s try the HTML tag matcher from above.

The regular expression matches an opening HTML tag (<), followed by one or more characters ([a-z]+) for the tag name, followed by zero or more characters ([^<]+) that are not “<” characters, followed by a closing angle bracket (>). The regular expression also includes a non-capturing group (?:>) that matches either a closing HTML tag with the same name as the opening tag, or a space followed by a forward slash (/).

As with everything GPT-3 related, it is not perfect, but it is phenomenal. If you get any interesting or enlightening results, please post them here as comments.


This is related: Using GPT-3 to explain how code works

There’s an unreleased ‘assistant’ model that excels at this.


Not sure if it has a sense of humor or is just being obtuse, but when I typed “IPv4 Address” (omitting the word “valid”), it gave me:


It has the pattern of an IP address but will let you use any numbers.


It’s also wrong in that it doesn’t escape the periods, which means they’re interpreted as metacharacters that match any character. Thus it would consider “123-201:218=11” valid.
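A quick Python check of the unescaped-period problem. These two patterns are illustrative assumptions with the same dotted-quad shape; they are not the exact regex AutoRegex returned.

```python
import re

# "." is a metacharacter matching any character; "\." matches a literal dot.
UNESCAPED = re.compile(r"\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}")
ESCAPED = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

sample = "123-201:218=11"
print(bool(UNESCAPED.fullmatch(sample)))            # True: "-", ":" and "=" each satisfy "."
print(bool(ESCAPED.fullmatch(sample)))              # False: the separators are not dots
print(bool(ESCAPED.fullmatch("999.999.999.999")))   # True: escaping does not limit octets to 0-255
```

Escaping the dots fixes the separators, but the “any numbers” problem remains; that takes an explicit 0 to 255 range for each octet.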
