Back in July, I wrote a series, The Cost of Lost Privacy: How Google and Datamining Drive Economic Inequality in Our Nation, about how advertisers are increasingly able to use users' demographic and behavioral data to target ads at specific vulnerable groups.
With Google's Chairman Eric Schmidt testifying before the Senate on Wednesday, I'm hoping the senators will raise questions about what kinds of contextual and behavioral targeting Google allows in its advertising and what steps it has taken to stop racial and economic profiling that harms such groups. Given the billions of dollars Google has made from subprime mortgage lenders advertising on its site, and the revenue it has raised from similarly shady advertisers, as the recent pharma ad scandal revealed, there are legitimate questions about Google making its advertising services available to unethical advertisers.
There is a large body of research showing that employers, financial lenders, car salesmen and other merchants continue to charge black and Hispanic customers more for the same services when they can identify them. The classic test for showing this phenomenon has been to pair white and black buyers or applicants for the same product or job and see whether the "testers" were treated the same. The Urban Institute found non-white homeowners received less favorable financial terms from mortgage lending institutions. Another study submitted nearly identical resumes to help-wanted ads, finding that "white sounding" names were 50 percent more likely than "black sounding" names to get an interview.
The question is how and whether ads are being served up to users in similarly racialized ways in online advertising. The reality is that Google and advertisers have a whole battery of data-mining tools to profile users precisely based on both the context of their search terms and their long-term online behavior, so the ability to profile is clearly there.
The Experiment: Based on these questions, I conducted a small experiment to begin to see the extent to which online advertisers engage in such targeting. The following shows the results of this preliminary investigation of racial and economic profiling through Google AdWords. Given the relatively small sample size, the results cannot be treated as definitive, but they do highlight where racial profiling may be occurring and raise questions that policymakers should be asking of search engine operators like Google to ensure they are taking steps to prevent such profiling through their services. The results also highlight where public agencies might conduct larger-scale statistical investigations of racial discrimination in the industry. If nothing else, these results should reinforce the fact that different users often pull up very different ads on the same terms, something not all policymakers or members of the public recognize.
As a proxy for race, the experiment used nine names and then associated each of them with a number of simple terms. The nine names included two male names and one female name strongly associated with each of three racial/ethnic groups: white, black and Hispanic. One of the black names also had a Muslim derivation, to see how that would affect the results. (See here and here for a few sources on picking such racially coded names.) As the Urban Institute studies and related ones show, companies often use names themselves as a proxy for racial profiling, so this is a useful first pass on the topic, remembering that online advertisers actually have a barrage of additional datamining tools to further refine profiling based on user demographics and search behavior.
Given Google's insistence that users use their real names for its Google Plus service, this also raises the question of how Google and advertisers may be using names as a proxy for profiling. Google Chairman Schmidt himself has explained in an interview that the real names policy is about better targeting ads, saying "we can have slightly better search results if I know a little bit about who you are." And "better searches" for Google are searches that please its advertisers, so the importance of identifying users by name for those advertisers can be assumed.
The experiment used ads that show up in Gmail, putting names and any associated terms in the subject line. (For a fuller explanation of the methodology, so you can replicate the experiment on your own computer, see the notes at the end.) This was done for ease of both producing and displaying the results.
The Results: First, some results show little difference between names, which is to be expected: while some advertisers or Google itself may be using demographic profiling for certain products, others do not. The fact that racism still exists in our commercial life doesn't mean every merchant engages in it. Second, there is no doubt inherent randomness in the ads delivered across Google products, which is one reason, given the sample size, that any inferences can only be provisional and suggest additional areas for policymakers and agencies to investigate. But there were enough provocative results to suggest that racial profiling is likely a reality in online advertising. Some results show subtle evidence of such racial and ethnic differences, and others are quite dramatic. For the full list of screenshots with each subject header term, see here, but the ones below illustrate some of the more interesting results.
- Arrested, Need Lawyer: Using the term "Arrested, Need Lawyer" led to some provocative but not so dramatic results. Most of the names, including all three white names, yielded only white collar legal ads, such as "Stopping Debt Collectors" or "Qui Tam" or "Criminal Fraud" as with this subject line for "Connor Erickson":
While "DeShawn Washington" yielded not one but two DUI-related ads:
- Buying Cars: An example of significant difference in results can be seen when names were associated with the term "Buying Car." All three white names yielded car buying sites of various kinds, whether from GMC or Toyota or a comparison shopping site. For example with the name "Jake Yoder":
Conversely, all three of the African-American names yielded at least one ad related to bad-credit car loans and included other ads related to non-new car purchases, such as auto insurance or purchasing "car lifts" for home repairs. For example with the name "Malik Hakim":
And with "Imani Jackson":
With the Latino names, the results were somewhat of a mix with some car company ads and the car lifts ad appearing.
- Education: With a simple subject line relating a name to the word "education," the results placed far more emphasis on post-B.A. education ads for white names and on B.A. or non-college education opportunities for the non-white names. Two white names were the only names to yield ads for Ph.D. programs, and the third yielded two ads for master's programs. For example, "Molly Johnson" yielded a B.A. program ad and a Ph.D. program ad.
For "Diego Garcia," the education term only yielded one college program and it was for the College of the Military aimed at on-line education for active military:
And the "education" term for "DeShawn Washington" yielded not a single ad for college education programs:
All five other male names yielded seemingly random results unrelated to the subject line term (save for one "learn Arabic" ad) such as this result for "Connor Erickson":
Interestingly, the female names, "Molly Johnson" and "Maria Munoz" both yielded the "Muslim Marriage" ad that "Malik Hakim" did as well.
- Need A Job: Some of the profiling results were just odd, more likely reflecting wayward racial-profiling algorithms created by Google than deliberate targeting by the advertisers themselves. For example, using the term "Need a Job" in the subject header led to a wide range of jobs for all names, but only the three Latino names yielded an ad for "Salsa Labs," as in this example with "Juan Martinez":
Salsa Labs is the creator of software used largely by non-profits for managing their members and is in no way distinctly Latino, yet Google's algorithm seemed to assume that any job involving "salsa" would have to be of interest to anyone with a Latino name.
Payday Loans and Geolocation Profiling: One area where there seemed to be less racial profiling under the simple name-based approach I was using was financial-related terms. But these still yielded disturbing results all the same: bottom-feeding payday lenders and related subprime-like lenders seem to dominate the ad results in the financial realm for everyone, regardless of race or gender. The Center for Responsible Lending has detailed the abuses in this industry, and unfortunately they seem to be pervasive in online advertising for anyone seeking credit or cash. At least two payday lending ads seem to be the norm for any term related to a loan. For example, here is a result for "Connor Erickson" with the term "loan modification," a pretty typical result for almost all names used:
These ads reflect that we are still in a world of dodgy companies pushing often unaffordable credit on users desperate for cash -- and using online advertising as a key tool to reach their targets.
Now, all of these ads discussed so far were being served up from my home in the mixed race, mixed economic neighborhood of Washington Heights, one of the last such mixed neighborhoods in Manhattan. So I was curious if results might change in much poorer neighborhoods or much wealthier neighborhoods, given the fact that advertisers can purchase different ads for different zip codes.
So I took my laptop and conducted some of the same tests both in the South Bronx (near the Grand Concourse) and on 72nd St. on the Upper West Side of Manhattan. The first interesting result was that the racial differences seemed to decrease (though not disappear) in these two neighborhoods, which were more uniformly either poor or wealthy. Possibly, advertisers and/or Google assume that whites in the South Bronx and non-whites on the Upper West Side are more like their neighbors than residents of mixed-economic neighborhoods like Washington Heights are.
Secondly, the differences between locations were not always dramatic but did seem real. For example, "Jake Yoder" associated with "Buying Car" in the South Bronx yielded this result, with car lift and car warranty ads:
"Jake Yoder" on the same day on 72nd St. in the Upper West Side of Manhattan associated with "Buying Car" had very different results, with multiple Lexus ads:
On more direct financial terms, payday lender ads were still surprisingly pervasive even in ads generated on the Upper West Side, but ads for more upscale sources of funds did make their appearance.
For example, "Molly Johnson" associated with "Need Cash" generated this ad in the South Bronx:
Conversely, "Molly Johnson" associated with "Need Cash" at Manhattan's 72nd St. generated online ads for advances against her inheritance (although the payday loan ads did not disappear):
Somehow, the Inheritance Advance Loans ad didn't make an appearance in any South Bronx email generated with any search term I had tried. Similarly, "Imani Jackson" associated with "Need Cash" generated ads for "Selling Your Settlement" on the Upper West Side while associated ads with her name generated only payday lending and similar options in the South Bronx.
Why This All Matters: While all of these results need a broader sample for full statistical robustness, they highlight the reality that people do not live in the same online world, even when they use the same terms, since different search and advertising results are delivered to users based on their demographics and their names.
This experiment is based on the crudest information available to Google and advertisers: a name. Add in the other demographic information that Google or other online sites collect, the search behavior over time that advertisers are able to track, and other information about users culled from other datamining sources, and you have a recipe for users' online experiences being radically manipulated in ways they may not even suspect.
If the Internet, as I've argued in the past, is potentially magnifying economic and social inequality, then those with economic and social privilege don't necessarily feel particularly threatened by this advertising behavior. But for those at the lower end of the economic scale, or those who already suffer discrimination, the Internet may be magnifying and more precisely targeting that discriminatory treatment. "Reverse redlining," in which subprime mortgage lenders targeted the poor and racial minorities with worse mortgage terms and deceptive practices, is fresh in the minds of many communities in the wake of the financial meltdown and foreclosure crisis.
Policymakers examining how to protect privacy online need to keep this economic dimension of the contextual and behavioral targeting issue by online advertisers in mind as they move forward with solutions. And when Google chairman Eric Schmidt testifies before the Senate on Wednesday, I would like to hear him explain what his company is doing to prevent its advertisers from using racial and economic profiling in abusive ways.
Notes on Methodology: This experiment relied on the fact that Google scans every email created by its users and generates ads based on its content. For each email, I put a Gmail address and the name I wanted to associate with the email in the subject line. I then added the independent search term -- "education," "Buying Car," etc. -- to the subject line as well. To speed up the process, rather than wait for the ad to be delivered at the other end, I saved each message as a draft, then looked in the drafts folder to see what ads had been generated based on the content of the email.
Since Google will generally generate ads based on all messages by a Gmail user, meaning past messages will influence what ads are generated on a current message, I went into "Mail Settings" and, where it says "importance signals for ads," clicked on "don't use these signals to show ads." That means ads were being generated solely based on the content of each individual email.
The three "white" names used were Connor Erickson, Jake Yoder, and Molly Johnson. The three Latino names used were Diego Garcia, Juan Martinez and Maria Munoz. The three black names used were Malik Hakim, DeShawn Washington and Imani Jackson.
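For anyone wanting to replicate the setup, the name/term grid above can be sketched programmatically. This is a minimal sketch: the names come from the list above, but the term list is only a sample of the subject-line terms discussed in this post, not necessarily the full set used.

```python
# Hypothetical sketch of the test matrix: every name paired with every
# subject-line term, producing one Gmail draft subject per pair.
import itertools

NAMES = {
    "white": ["Connor Erickson", "Jake Yoder", "Molly Johnson"],
    "latino": ["Diego Garcia", "Juan Martinez", "Maria Munoz"],
    "black": ["Malik Hakim", "DeShawn Washington", "Imani Jackson"],
}
# Sample terms only -- assumed, not the author's complete list.
TERMS = ["Arrested, Need Lawyer", "Buying Car", "education",
         "Need a Job", "Need Cash", "loan modification"]

def subject_lines():
    """Yield (group, name, term, subject) for every name/term pair."""
    for group, group_names in NAMES.items():
        for name, term in itertools.product(group_names, TERMS):
            yield group, name, term, f"{name} {term}"

rows = list(subject_lines())
print(len(rows))  # 9 names x 6 sample terms = 54 drafts to inspect
```

Each generated subject line would be pasted into a new Gmail draft, and the ads shown alongside the saved draft recorded for that name/term pair.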
Update: In response to this post, Google issued the following statement:
This post relies on flawed methodology to draw a wildly inaccurate conclusion. If Mr. Newman had contacted us before publishing it, we would have told him the facts: we do not select ads based on sensitive information, including ethnic inferences from names.
Now, I'm happy to hear Google doesn't "select ads" on this basis, but Google's words seem chosen to allow a lot of wiggle room (as such Google statements usually do). Do they mean that Google's algorithms do not use the ethnicity of names in ad selection, or are they making the broader claim that they bar advertisers from serving different ads to people with different names?
I didn't focus on it in the writeup above, but I would note that searches using the name "Juan Martinez" repeatedly brought up a job recruitment ad for "Juan Navarro -- www.exxelgroup.com -- President and CEO of the Exxel Group," and that ad was served up ONLY for emails with Juan Martinez in the subject line. Whatever the sample size of my investigation, the probability of such a result is essentially zero unless the ads are tied to the name. So clearly, some ads are being served up based on the name and the name alone.
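To make the "essentially zero" intuition concrete, here is a back-of-envelope null-hypothesis calculation with hypothetical counts (the exact numbers of drafts are not reported in this post). If a given ad appeared in any draft with some baseline probability p, independent of the name, then the chance of it showing up in all k "Juan Martinez" drafts and in none of the m other drafts would be p^k(1-p)^m:

```python
# Null hypothesis: the Exxel Group ad is served at random, with the same
# probability p for every draft regardless of name. All three numbers
# below are assumed for illustration, not taken from the experiment.
p = 0.1   # assumed baseline chance the ad appears in any one draft
k = 5     # assumed number of "Juan Martinez" drafts (ad appeared in all)
m = 49    # assumed number of other drafts (ad appeared in none)

prob = p**k * (1 - p)**m
print(f"{prob:.2e}")  # on the order of 1e-7 with these assumptions
```

Even with these generous assumptions the null hypothesis is wildly improbable, which is the basis for concluding the ad was tied to the name.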
If Google is willing to say definitively that they do not allow advertisers to serve up different ads to different users based on the names those users use in Gmail messages or reference in Gmail or in Google searches, that would be a stronger statement by the company that they are actively preventing racial profiling by their advertisers.