IBM Research on Tuesday released a new data set of 1 million images of diverse human faces, with the aim of advancing fairness and accuracy in facial recognition technology.
“For the facial recognition systems to perform as desired — and the outcomes to become increasingly accurate — training data must be diverse and offer a breadth of coverage,” wrote John Smith, an IBM fellow, in a blog post. “The images must reflect the distribution of features in faces we see in the world.”
The release follows reports of bias in the artificial intelligence behind facial recognition systems. Last week, an MIT study found that Amazon’s Rekognition technology had a harder time recognizing the gender of darker-skinned women, and made more mistakes identifying gender overall, than competing technologies from Microsoft and IBM.
While researchers are already working with attributes like age, gender and skin tone, these attributes alone can’t adequately characterize everyone, according to IBM. Other characteristics also need to be considered, such as face symmetry, facial contrast, head pose, and the length or width of the eyes, nose, forehead, mouth and more.
IBM’s data set, called Diversity in Faces, has 10 coding schemes, which include features like head length, nose length, forehead height, facial ratios, age, gender, pose, resolution and more.
The million-face data set is available today to researchers around the world on request.