I have written here numerous times about GPT-3, the language model developed by OpenAI, which has produced stunning and sometimes scary results on problems such as text completion, question answering, writing computer code, and generating text-based adventure games. Just search for “GPT-3” in the site search box for links.
Now, “Aarya” has applied GPT-3 to one of the most arcane corners of programming: composing regular expressions to match patterns in text. If you have never encountered regular expressions, you may consider yourself to have lived a privileged life, although gnarly-fingered programmers might deem it too sheltered. Regular expressions pack a lot of power into a few characters, but can drive you crazy when you write and debug them. A simple example might be to find words that begin with a vowel and end with “ology”.
^[aeiouy][a-z]*ology$
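To see what a pattern like this accepts, here is a minimal Python sketch. It assumes the vowel class is spelled `[aeiouy]`; the word list and the name `VOWEL_OLOGY` are made up for illustration.

```python
import re

# Lowercase word: one vowel (y counted as a vowel), any letters, then "ology".
VOWEL_OLOGY = re.compile(r"^[aeiouy][a-z]*ology$")

# Hypothetical sample words, chosen only to exercise the pattern.
words = ["ecology", "biology", "ontology", "oology", "Ecology", "ology"]
matched = [w for w in words if VOWEL_OLOGY.search(w)]

print(matched)  # ['ecology', 'ontology', 'oology']
```

Note that "Ecology" is rejected because the pattern only allows lowercase letters, and "ology" alone is rejected because a vowel must come before the "ology" ending.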
Here’s one that validates MasterCard numbers.
^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$
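A quick way to try this pattern is the Python sketch below. The helper name `looks_like_mastercard` and the sample numbers are illustrative assumptions, not real accounts. The pattern only checks the leading digits (51 to 55, or 2221 to 2720) and the 16-digit length; it does not verify the card's checksum.

```python
import re

MASTERCARD = re.compile(
    r"^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$"
)


def looks_like_mastercard(number: str) -> bool:
    """Return True if a digits-only string fits the prefix and length rules."""
    return MASTERCARD.fullmatch(number) is not None


# Hypothetical inputs: two in-range prefixes, the upper boundary (2720),
# one just past the boundary (2721), and a number starting with 4.
samples = [
    "5555555555554444",
    "2223003122003222",
    "2720990000000000",
    "2721000000000000",
    "4111111111111111",
]
for number in samples:
    print(number, looks_like_mastercard(number))
```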
How about one that matches an HTML tag in a document and its matching closing tag?
<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
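Here is a small Python sketch of that pattern at work, using made-up HTML snippets. The back-reference `\1` is what forces the closing tag to carry the same name as the opening one. The sample strings are kept short on purpose: the nested quantifier in `([^<]+)*` can make matching slow on long inputs.

```python
import re

TAG = re.compile(r"<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)")

# Hypothetical snippets: a paired tag, a self-closing tag, a mismatched pair.
paired = TAG.search('<div class="x">Hi</div>')
self_closing = TAG.search("<br />")
mismatched = TAG.search("<b>Hi</i>")

print(paired.group(1), paired.group(3))  # div Hi
print(self_closing.group(0))             # <br />
print(mismatched)                        # None
```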
The GPT-3 based tool is called AutoRegex, and requires you to create a free account or sign in with Google in order to use it. You can enter an English description of what you want to match, and it generates a regular expression to do that. For example:
duplicate words
\b(\w+)\b\s+\1
words beginning with a vowel and ending with “ology”
/^[aeiouAEIOU]\w+ology$/
valid IPv4 address
^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25
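The first two generated patterns can be tried out with a short Python sketch. The test strings are invented, and the `/` delimiters around the second pattern are assumed to be JavaScript-style literal syntax, so they are dropped here. The IPv4 pattern is left out because it is cut off above.

```python
import re

# Generated pattern for "duplicate words".
DUPLICATE = re.compile(r"\b(\w+)\b\s+\1")

# Generated pattern for the "ology" words, without the surrounding / delimiters.
GENERATED_OLOGY = re.compile(r"^[aeiouAEIOU]\w+ology$")

repeated = [m.group(1) for m in DUPLICATE.finditer("It was was the the best")]
print(repeated)  # ['was', 'the']

# No word boundary follows \1, so a repeated prefix also counts as a match.
prefix_hit = DUPLICATE.search("the theme")
print(prefix_hit.group(0))  # the the

for word in ["ecology", "Ecology", "oology", "biology"]:
    print(word, bool(GENERATED_OLOGY.search(word)))
```

Compared with the hand-written pattern earlier in the post, the generated one accepts a capital first letter and insists on at least one character between the vowel and "ology", so "Ecology" passes and "oology" does not.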
You can also give it a regular expression and ask it to explain, in English, what it does. Let’s try the HTML tag matcher from above.
<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
The regular expression matches an opening HTML tag (<), followed by one or more characters ([a-z]+) for the tag name, followed by zero or more characters ([^<]+) that are not “<” characters, followed by a closing angle bracket (>). The regular expression also includes a non-capturing group (?:>) that matches either a closing HTML tag with the same name as the opening tag, or a space followed by a forward slash (/).
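For comparison with that explanation, the same pattern can be written out piece by piece using Python's `re.VERBOSE` flag, which ignores whitespace and allows comments inside the pattern. The annotations are my own reading of the expression, not output from the tool.

```python
import re

TAG_VERBOSE = re.compile(r"""
    <                 # literal opening angle bracket
    ([a-z]+)          # group 1: tag name, one or more lowercase letters
    ([^<]+)*          # group 2: optional run of characters other than "<"
    (?:               # non-capturing group holding two alternatives
        >(.*)<\/\1>   #   ">", content (group 3), then "</name>" via \1
      |               #   or
        \s+\/>        #   whitespace followed by "/>" (self-closing tag)
    )
""", re.VERBOSE)

# Hypothetical inputs to confirm the annotated form behaves like the original.
result = TAG_VERBOSE.search("<p>Hi</p>")
print(result.group(1), result.group(3))  # p Hi
print(TAG_VERBOSE.search("<hr />").group(0))  # <hr />
```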
As with everything GPT-3 related, it is not perfect, but it is phenomenal. If you get any interesting or enlightening results, please post them here as comments.