Projects  |  February 23, 2018

Gender Shades Project

A project that evaluates the accuracy of AI-powered gender classification products.

How well do IBM, Microsoft, and Face++ AI services guess the gender of a face? This evaluation uses gender classification as a motivating example to show the need for increased transparency in the performance of any AI products and services that focus on human subjects. Bias in this context is defined as having practical differences in gender classification error rates between groups.

A benchmark of 1,270 images was assembled for this gender classification performance test. The subjects were selected from three African countries and three European countries. The subjects were then grouped by gender, by skin type, and by the intersection of gender and skin type.
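
To make the definition of bias above concrete, here is a minimal Python sketch of how error rates could be compared across groups. It is not the project's actual evaluation code. The record fields (`gender`, `skin`, `predicted`), the "darker"/"lighter" labels, the helper names `error_rates` and `max_gap`, and the toy records are all assumptions made up for illustration. They are not data or results from the benchmark.

```python
from collections import defaultdict


def error_rates(records, key):
    """Return {group: misclassification rate}, where key(record) names the group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        # An error is a predicted gender that differs from the labeled gender.
        if record["predicted"] != record["gender"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_gap(rates):
    """Largest difference in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy records invented for illustration only (hypothetical, not benchmark data).
records = [
    {"gender": "female", "skin": "darker", "predicted": "male"},
    {"gender": "female", "skin": "darker", "predicted": "female"},
    {"gender": "female", "skin": "lighter", "predicted": "female"},
    {"gender": "female", "skin": "lighter", "predicted": "female"},
    {"gender": "male", "skin": "darker", "predicted": "male"},
    {"gender": "male", "skin": "darker", "predicted": "male"},
    {"gender": "male", "skin": "lighter", "predicted": "male"},
    {"gender": "male", "skin": "lighter", "predicted": "male"},
]

# The three groupings described above: gender, skin type, and their intersection.
by_gender = error_rates(records, lambda r: r["gender"])
by_skin = error_rates(records, lambda r: r["skin"])
by_both = error_rates(records, lambda r: (r["gender"], r["skin"]))

for name, rates in [("gender", by_gender), ("skin", by_skin), ("both", by_both)]:
    print(name, rates, "gap:", max_gap(rates))
```

In this toy data the intersectional grouping shows a larger gap than either single-attribute grouping. This illustrates why a benchmark might report results for the intersection of gender and skin type as well as for each attribute alone.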


  • Joy Buolamwini, Lead Author
  • Timnit Gebru, PhD, Co-Author
  • Dr. Helen Raynham, Clinical Expert
  • Deborah Raji, Data Opps
  • Ethan Zuckerman, Advisor