Unless you live completely off the grid, you have likely heard of, and contribute to, “big data,” the often-used catchphrase describing massive (and ever-increasing) volumes of information stored digitally on computers, servers and clouds.
From advertisers using data mined from customer interactions, to government agencies making data public so developers can create beneficial mobile apps, to farmers applying statistical data to determine their production and marketing practices, a wide variety of people and industries use big data.
So what implications, then, might big data have for the production of official statistics? Dr. Daniel Pfeffermann, current President of the International Association of Survey Statisticians, addressed this topic at the Morris Hansen Lecture, an annual, open-to-the-public education and outreach event recently held at USDA’s Jefferson Auditorium in Washington, D.C. Pfeffermann stressed that big data presents some impressive opportunities and notable challenges.
Big data might improve the timeliness of statistics while reducing response burden, he said, but only if big data can be properly assessed, analyzed and interpreted to provide high-quality, accurate information that is truly of value to users. Big data is interesting and valuable for us at the National Agricultural Statistics Service, where we strive to provide timely, accurate, and useful statistics in service to U.S. agriculture. We accomplish this mission by administering hundreds of surveys online, over the phone and in person each year, conducting the Census of Agriculture every five years, and preparing reports covering nearly every aspect of U.S. agriculture. In short, we generate tremendous amounts of statistics about agriculture.
We have used two classes of big data for many years in the production of official statistics – remotely sensed satellite data and administrative records. Other sources of big data are still to be explored and may add to the quality or detail of the current information.
Unfortunately, big data is likely not the silver bullet for statistical agencies confronting reduced response rates and demands for more, better, faster data with fewer and fewer resources. Technological advancements, such as the rise of big data, are certainly worthy of exploration, to the extent that they might improve our ability to provide timely, accurate, and useful statistics to the people we serve.
It looks like we are still at the beginning of big data. It will take a lot of research and continued collaboration among statisticians, computer scientists, software designers, engineers, and the public. I, for one, can’t wait to see what happens when big data and official statistics find a way to merge. The opportunities will be endless.