The “Big List of Naughty Strings”

When developing any software that accepts user input, programmers must take great care to “sanitise” that input to guard against content which, if passed to libraries or operating systems, or included in output generated by the program, might damage or hijack the system in various ways. For example, consider the “SQL injection attack” that figures in the classic xkcd 327 “Little Bobby Tables” cartoon.

Here, the name in panel 3, keyboarded from a paper form and then passed to a database in an SQL command, used an embedded quote to terminate the string and then supplied an executable command that deleted the school’s entire table of students. A trailing comment marker caused the rest of the original command to be ignored.
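To make the mechanism concrete, here is a minimal Python sketch using the standard library’s sqlite3 module and an in-memory database. The Students table and the helper names are hypothetical. The unsafe version deliberately uses executescript(), which runs every statement in a string, to mimic an interface that lets an injected second command through; plain execute() refuses to run more than one statement at a time.

```python
import sqlite3

# The child's name as it appears in panel 3 of xkcd 327.
BOBBY = "Robert'); DROP TABLE Students;--"


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a (hypothetical) Students table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (name TEXT)")
    return conn


def add_student_unsafe(conn: sqlite3.Connection, name: str) -> None:
    # DON'T DO THIS: the name is pasted into the SQL text. executescript()
    # runs every statement in the string, so the quote in the name closes
    # the string literal and the rest of the "name" is executed as SQL.
    conn.executescript(f"INSERT INTO Students (name) VALUES ('{name}');")


def add_student_safe(conn: sqlite3.Connection, name: str) -> None:
    # The ? placeholder binds the name as a value; it is never parsed as SQL.
    conn.execute("INSERT INTO Students (name) VALUES (?)", (name,))


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    unsafe = make_db()
    add_student_unsafe(unsafe, BOBBY)
    print(table_exists(unsafe, "Students"))  # False: the table is gone

    safe = make_db()
    add_student_safe(safe, BOBBY)
    print(table_exists(safe, "Students"))  # True
    print(safe.execute("SELECT name FROM Students").fetchall())
```

The safe version is the standard cure: keep the text of the query fixed and pass user input only as a bound parameter, so no choice of name, however naughty, can change the structure of the command.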

SQL injection has been a common vector for “hacking” incompetently designed Web sites, in particular those foolish enough to use WordPress, which has no coherent concept of quoting things and is built on top of PHP, arguably the worst-designed widely used programming language in history. But there are many other ways user input can break everything from parsers to low-level operating system and network services.

Max Woolf, “minimaxir” on GitHub, has compiled a tremendously valuable resource, the “Big List of Naughty Strings”, which is available there in several formats, from plain text to a JSON data structure for automated testing.
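As a rough idea of how the JSON version might drive automated tests, here is a minimal Python sketch. It assumes a local copy saved as blns.json containing a single JSON array of strings; the function names, and the deliberately fragile ascii_label() standing in for the code under test, are hypothetical. If the file is absent it falls back to a few sample strings.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_naughty_strings(path: Path) -> list[str]:
    """Load the list, assuming the file is a single JSON array of strings."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(s, str) for s in data):
        raise ValueError(f"{path} is not a JSON array of strings")
    return data


def find_failures(
    func: Callable[[str], object], strings: Iterable[str]
) -> list[tuple[str, Exception]]:
    """Call func on every string and collect those that make it raise."""
    failures = []
    for s in strings:
        try:
            func(s)
        except Exception as exc:  # record any failure, not just expected ones
            failures.append((s, exc))
    return failures


def ascii_label(s: str) -> bytes:
    # Hypothetical code under test that silently assumes ASCII input.
    return s.encode("ascii")


if __name__ == "__main__":
    path = Path("blns.json")  # assumed local copy of the list
    if path.exists():
        strings = load_naughty_strings(path)
    else:
        strings = ["hello", "X Æ A-12", "😍"]
    for s, exc in find_failures(ascii_label, strings):
        print(repr(s), type(exc).__name__)
```

Collecting failures rather than stopping at the first one gives a single report of every category of input the code mishandles.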

The list contains numerous categories of strings which might cause problems when input by a user. Here are just a few.

  • Strings which may be used elsewhere in code
  • Strings which can be interpreted as numeric
  • ASCII punctuation
  • Non-whitespace C0 controls
  • Whitespace
  • Strings which contain misplaced quotation marks
  • Strings which contain two-byte letters
  • Japanese Emoticons
  • Right-To-Left Strings
  • Unicode Upsidedown
  • Script Injection
  • SQL Injection
  • Server Code Injection
  • Unwanted Interpolation
  • File Inclusion
  • MSDOS/Windows Special Filenames
  • Innocuous strings which may be blocked by profanity filters
  • Terminal escape codes
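Several of these categories can be spotted mechanically. The following Python sketch uses the standard unicodedata module to flag three of them: non-whitespace C0 controls, terminal escape codes (most of which begin with the ESC character, U+001B), and right-to-left text, including the invisible bidirectional override characters. The function name and labels are hypothetical and not part of the list itself; treat it as a way to log or reject suspicious input, not as a substitute for testing against the full list.

```python
import unicodedata

# Whitespace C0 controls: tab, line feed, vertical tab, form feed, carriage return.
C0_WHITESPACE = {"\t", "\n", "\v", "\f", "\r"}

# Unicode bidirectional classes of the explicit embedding, override and
# isolate controls (e.g. U+202E RIGHT-TO-LEFT OVERRIDE is class "RLO").
BIDI_CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def flag_hazards(s: str) -> set[str]:
    """Return hypothetical labels for a few risky features found in s."""
    flags: set[str] = set()
    for ch in s:
        # C0 controls are U+0000 to U+001F.
        if ch < " " and ch not in C0_WHITESPACE:
            flags.add("c0-control")
            if ch == "\x1b":  # ESC introduces terminal escape sequences
                flags.add("terminal-escape")
        bidi = unicodedata.bidirectional(ch)
        if bidi in BIDI_CONTROLS:
            flags.add("bidi-control")
        elif bidi in ("R", "AL"):  # strongly right-to-left (Hebrew, Arabic, ...)
            flags.add("right-to-left")
    return flags


if __name__ == "__main__":
    for sample in ["\x1b[31mred\x1b[0m", "abc\u202edef", "שלום", "plain text"]:
        print(repr(sample), sorted(flag_hazards(sample)))
```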

If you don’t want to clone and install the whole package, here is a direct link to the raw text file. If displaying this with your browser instead of directly downloading it ends badly, don’t blame me.
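If you do download the plain-text file, reading it is itself a small exercise in not being tripped up by naughty strings. This Python sketch assumes the file is UTF-8, that lines beginning with “#” are comments, and that blank lines are only separators (so the empty string cannot be represented); the file name blns.txt and the function name are assumptions for illustration.

```python
from pathlib import Path


def parse_blns_text(raw: bytes) -> list[str]:
    """Split the plain-text list into strings.

    Assumes lines beginning with '#' are comments and blank lines are
    separators, so the empty string cannot be represented in this format.
    """
    # Decode the raw bytes ourselves: reading in text mode would apply
    # universal-newline translation and turn a lone "\r" into "\n".
    text = raw.decode("utf-8")
    # Split on "\n" only: str.splitlines() would also break lines at
    # characters such as U+2028 and U+0085, which are themselves test cases.
    return [
        line for line in text.split("\n") if line and not line.startswith("#")
    ]


if __name__ == "__main__":
    path = Path("blns.txt")  # assumed name of the downloaded file
    if path.exists():
        strings = parse_blns_text(path.read_bytes())
        print(len(strings), "strings loaded")
```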

Just wait until Elon Musk’s son “X Æ A-Xii” (originally named “X Æ A-12”) goes to school.