By David Morganstein, Tom Krenzke, and Sylvia Dohrmann
Government surveys get a bad rap. Detractors cry that the surveys "pry into our daily lives." The Founding Fathers, however, recognized the important role information can play in tracking the development of our country, so much so that they penned a requirement for a decennial census into the first Article of the U.S. Constitution.
The founders' appreciation of government-sponsored survey data didn't stop there. In 1790, James Madison requested the extension of data collection to information on agriculture, commerce, and manufacturing. This information, he asserted, would enable legislators to "adapt the public measures to the particular circumstances of the community." Madison asked whether legislators would rather "rest their arguments on facts, instead of assertions and conjectures?"
Ten years later, Thomas Jefferson requested a further expansion of census data. His focus was on what influences life and health, specifically how soil, climate and occupation affect life expectancy. An expanded census, he explained, would provide a valuable opportunity to gather "facts highly important to society."
Today, important and valuable information is collected through surveys, such as monthly numbers on employment and unemployment; international trade data; health status of the population; how many scientists, engineers and mathematicians the nation graduates each year; and whether prices are changing from month to month.
Critics of government surveys bark another flawed argument: that, once collected, personal information is at risk. These people have, obviously, never met a statistician. Statisticians take data very seriously. We are data-control fanatics. We make sure your personal identifiers are deleted and the other data remains just that: a bunch of codes and numbers that don't identify you or connect you to the information you've provided.
The statisticians who collect all this information for the government follow a strict code of ethics, one that follows them from data collection to final reporting. This code is all about performing statistical work responsibly, practicing secure statistics, as it were. Statisticians use a wide variety of techniques to ensure each person providing the information remains anonymous.
You're Just a Number to Me
There's probably not another scenario in life in which you want to be just a number (or a bunch of codes and numbers), but statisticians do just that to protect your identity. We have many techniques to separate your personal identification from the other more important information you provide. Your name and address are jettisoned. Sure, there are cases when a study participant, with consent, needs to be identified (e.g., in a clinical trial of a new drug). But, for most government surveys, your individual identity is not integral to the survey question; it's what you represent in the larger picture that counts.
Take, for example, your brother's daily detours to the coffee shop on his way to work each morning. Statisticians aren't interested in his coffee habit. They are crunching the numbers on the thousands of commuters who travel each day in that metropolitan area because it helps planners develop a comprehensive assessment of transportation needs.
What about your elderly aunt, whose lumbago has flared up and who has been in and out of the doctor's office as a result? Statisticians are focused, again, not on your aunt's lumbago but on the numbers: on what services Medicare beneficiaries are using, in order to chart over time how that group is affected by changes to the Medicare program.
Information culled from survey respondents, whose identities are protected, is combined to produce group-level statistics that help answer these and other important questions. The job of the statistician is to "look through" the individual information to get the bigger picture.
We Have Ways of Making You ... Invisible
The personal information you provide in a government survey is protected, by law, from public release for any purpose. Statisticians also review the data to ensure that no individual can be identified from the published summaries by combining the other, non-identifying information you provide.
There are a number of sophisticated and powerful privacy protection methods statisticians use to ensure your identity is divorced from your data:
• Statisticians evaluate the data for high-risk values that, if disclosed, might lead to unique individuals in the population being identified, such as a Hawaiian construction worker in New Hampshire who weighs 300 pounds and was in the hospital for three months. If left as is, someone might be able to identify this person in the data. These items are dots that should not be connected.
• Next, statisticians apply a variety of data-treatment methods that don't corrupt the information; rather, they mask associations in the data that might point to an individual.
- One method makes the data less specific. A typical application of this is trimming off outlier values. For example, in a survey of wealth, extremes are grouped into a single, less extreme category. It would be easy to identify Bill Gates in a survey if a category were "net worth of $67 billion," but not so if everyone with "net worth $100 million plus" is put into a single group.
- How about a little random perturbation? Perturbing data is statistics-speak for tweaking the data with a touch of randomness, so that the individual is protected while the message and quality of the information remain the same. In other words, you can't see the tree, but you can still see the forest!
- Another approach moves pieces of data from one individual to another, sometimes even across different geographic regions, to further conceal the identity of an individual. People who use the file have no idea which pieces were moved and, therefore, won't be certain that they know anything about specific responses.
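For readers who like to see the moving parts, the methods above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any statistical agency actually uses: the function names, the sample records and the $100 million cap are all invented for the example, and real data swapping typically moves only selected records rather than shuffling a whole column.

```python
import random
from collections import Counter


def find_unique_records(records, keys):
    """Flag high-risk records: those whose combination of `keys` is one of a kind."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if combos[tuple(r[k] for k in keys)] == 1]


def top_code(values, cap):
    """Make extremes less specific: anything above `cap` is reported as `cap`."""
    return [min(v, cap) for v in values]


def perturb(values, scale, rng):
    """Add zero-mean random noise so exact values no longer match a person."""
    return [v + rng.gauss(0, scale) for v in values]


def swap_field(records, field, rng):
    """Simplified swapping: shuffle one field across records.

    Every value still appears in the file, but not necessarily on the
    record it came from. The input records are left unchanged.
    """
    shuffled = [r[field] for r in records]
    rng.shuffle(shuffled)
    return [{**r, field: v} for r, v in zip(records, shuffled)]


# Hypothetical records, invented for illustration only.
people = [
    {"state": "NH", "job": "construction", "region": "north", "weight": 300},
    {"state": "NH", "job": "teacher", "region": "south", "weight": 150},
    {"state": "NH", "job": "teacher", "region": "north", "weight": 165},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

risky = find_unique_records(people, ["state", "job"])    # the lone construction worker
capped = top_code([50_000, 67_000_000_000], cap=100_000_000)
noisy = perturb([r["weight"] for r in people], scale=5, rng=rng)
swapped = swap_field(people, "region", rng)
```

The only record flagged as risky here is the one whose state-and-job combination appears once. The capped list reports the $67 billion fortune as $100 million, and the swapped file still contains the same regions overall, just not necessarily attached to the same people.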
After applying these techniques, statisticians then go back to evaluate the impact on the message and quality of the statistics.
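One simple way to picture that check is to compute the same group-level statistic before and after treatment and see how far it moved. The sketch below is only an assumption about what such a check might look like: the incomes, the $250,000 cap and the `relative_change` helper are hypothetical, and real evaluations look at many more statistics than these two.

```python
from statistics import mean, median


def relative_change(original, treated, stat):
    """Fractional change in a summary statistic caused by the treatment."""
    before = stat(original)
    after = stat(treated)
    return (after - before) / before


# Hypothetical incomes with one extreme value, invented for illustration.
incomes = [30_000, 45_000, 52_000, 61_000, 5_000_000]

CAP = 250_000  # hypothetical top-code threshold
treated = [min(x, CAP) for x in incomes]

median_shift = relative_change(incomes, treated, median)  # median is untouched
mean_shift = relative_change(incomes, treated, mean)      # mean drops sharply
```

In this toy example the median survives the treatment unchanged while the mean falls by more than 90 percent, which is the kind of trade-off statisticians weigh when deciding how much protection a statistic can absorb.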
These techniques are never an end in themselves. Statisticians are constantly developing and evaluating new methods and new approaches to ensure your identity is protected.
Morganstein is vice president and director of the statistical staff at Westat and will be the American Statistical Association's president-elect in 2014. Krenzke is an associate director and senior statistician at Westat. Dohrmann is a senior statistician at Westat. Special thanks to Joan Murphy, a senior writer at Westat.