November 22, 2016
Discovery: Research at Princeton 2016-2017
Bias in the machine: Internet algorithms reinforce harmful stereotypes
THE ARTIFICIAL-INTELLIGENCE (AI) SYSTEMS that suggest our search terms and otherwise determine what we see online rely on data that can be biased against women and racial and religious groups, according to a study led by researchers in Princeton’s Center for Information Technology and Policy (CITP).
As machine-learning and AI algorithms become increasingly common, this phenomenon could inadvertently cement and amplify biases already present in society or in a user’s mind, according to the study, which was led by Arvind Narayanan, an assistant professor of computer science. The paper was posted in August 2016 on the preprint server arXiv.
The team found that the algorithms tended to associate domestic words more with women than men, and associated negative terms with the elderly and certain races and religions. “For just about every kind of bias that’s been documented in people, including gender stereotypes and racial prejudice, we were able to replicate it in today’s machine-learning models,” said Narayanan, who worked with CITP postdoctoral research associate Aylin Caliskan-Islam and Joanna Bryson, a professor of computer science at the University of Bath and a visiting scholar at CITP.
Machine-learning algorithms build models of language by exploring how words are used in context — for example, by combing through all of Wikipedia or gigabytes of news clippings. Each time the model learns a word, it assigns that word a set of geometric coordinates corresponding to a position in a many-dimensional constellation of words. Words that frequently appear near each other are given nearby coordinates, and these positions reflect the words’ meanings.
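To make the idea concrete, here is a rough sketch (not code from the study) of how such coordinates can arise: count which words appear near each other in a small, made-up corpus, factor the counts into short vectors, and check that words used in similar contexts land close together. Everything in it (the corpus, the window size, the number of dimensions) is illustrative only; real models train on billions of words.

```python
# Toy illustration of word embeddings: build a word-by-word co-occurrence
# matrix from a tiny invented corpus, then factorize it so each word gets
# a short vector of coordinates.
import numpy as np

corpus = [
    "the doctor examined the patient in the clinic",
    "the nurse examined the patient in the clinic",
    "the doctor prescribed medicine to the patient",
    "the nurse gave medicine to the patient",
    "the engineer designed the bridge",
    "the engineer built the machine",
]

# Vocabulary and co-occurrence counts within a window of 2 words.
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
window = 2
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Factorize the counts so each word gets 4 coordinates.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :4] * S[:4]

def vec(word):
    return embeddings[index[word]]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words used in similar contexts ("doctor", "nurse") should end up closer
# together than words used in different contexts ("doctor", "bridge").
print(cosine(vec("doctor"), vec("nurse")))
print(cosine(vec("doctor"), vec("bridge")))
```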
Biases develop as a result of the positions of these words. If the text used to train the model more often associates “doctor” with words relating to men, ambition and medicine, while linking “nurse” to words related to women, nurturing and medicine, the model would come to assume that “nurse” is feminine, possibly even the feminine version of the masculine “doctor.”
To measure biases in algorithm results, the researchers adapted the Implicit Association Test, long used to reveal implicit bias in human subjects, for use on the language models. The human version of the test measures how long it takes a subject to associate words such as “evil” or “beautiful” with names and faces of people from different demographics. Thanks to the geometric model of language that machine-learning algorithms use, their biases can be measured more directly, by finding the distance between the name of a group and positive, negative or stereotypical words.
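In code, that distance-based measurement might look something like the simplified sketch below: it computes the difference in average cosine similarity between target words (say, names from one group) and two attribute sets (say, pleasant and unpleasant words). The paper’s actual statistic, the Word Embedding Association Test, also includes an effect size and a significance test; the vectors here are random placeholders rather than real embeddings.

```python
# Simplified sketch of an embedding-based association score, not the
# paper's exact statistic.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def association(word_vec, attr_a, attr_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    return (np.mean([cosine(word_vec, v) for v in attr_a])
            - np.mean([cosine(word_vec, v) for v in attr_b]))

def bias_score(targets_x, targets_y, attr_a, attr_b):
    """How differently two target groups (e.g. two sets of names) associate
    with the attribute sets; 0 means no measured difference."""
    return (np.mean([association(w, attr_a, attr_b) for w in targets_x])
            - np.mean([association(w, attr_a, attr_b) for w in targets_y]))

# Placeholder vectors purely to show the calling convention; in practice
# these would be looked up in a trained embedding model.
rng = np.random.default_rng(0)

def fake_vector():
    return rng.normal(size=50)

names_group_1 = [fake_vector(), fake_vector()]
names_group_2 = [fake_vector(), fake_vector()]
pleasant = [fake_vector(), fake_vector()]
unpleasant = [fake_vector(), fake_vector()]

print(bias_score(names_group_1, names_group_2, pleasant, unpleasant))
```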
Such biases can have very real effects. For example, in 2013 researchers at Harvard University led by Latanya Sweeney found that searches for African American-sounding names were far more likely to be paired with ads for arrest records. Such pairings could lead to unintentional discrimination when, say, a potential employer searches the internet for an applicant’s name.
“AI is no better and no worse than we are,” Bryson said. “However, we can continue to learn, but the machine learning for an AI program might be turned off, freezing it in a prejudiced state.” But if we can measure this bias, Narayanan said, we can take steps to mitigate it. This could mean mathematically correcting a language model’s bias or simply being aware of the algorithms’ faults — and our own.
–By Bennett McIntosh