Charles Murray of the American Enterprise Institute has announced “Data Tools 1: Deciphering the Location of Respondents in the American Community Survey” [PDF], the first in a series of databases he has personally curated over the years and used in research for his books. He notes:
For more than 50 years, I have loved to prepare and explore big databases. It has felt closer to a vocation than a profession. Don’t ask me why. The “aha!” moments are vanishingly rare, lost among the days, weeks, and months spent on tasks that meet every definition of “tedious.” But I long ago stopped farming out these tedious tasks to research assistants because I realized that, for me, they were fun and that giving those tasks to research assistants was reducing the satisfaction I found in my work.
The oddest part of this odd vocation is that in recent years I have become as absorbed in the problems of preparing databases for analysis as in exploring the data’s implications. Most people have a rough understanding of what the word “exploring” means when it comes to data analysis. But unless you work directly with databases, you are unlikely to realize how much of the iceberg is below the surface. To give you an idea, preparing the seven new variables in the data file I am about to describe took around 300 hours of work.
Typically, all this preparation is used by a researcher or team of researchers for a single study and never used again. Sometimes this is unavoidable because a database is so specific to a particular issue that it has little utility for anyone else. But over the years, I have prepared databases that have potential for exploring many topics that I did not. I am thinking especially of the databases I prepared for Human Accomplishment, Coming Apart, and Facing Reality, plus a few others that I assembled but never used for published work.
Over the next year or so, I plan to remedy this situation, sharing databases that I hope can be useful to others. But I will also post some new data files that can inform entire classes of analyses—hence the title of the series, “Data Tools.”
Here is the PUMA-descriptors database posted on GitHub.