[Image: a young couple]

Facial measures derived from neural networks predict in-person ratings of facial attractiveness

– by Amy Zhao and Brendan Zietsch

Facial attractiveness studies have typically relied on asking people to rate facial photos of real-life participants or images of computer-generated faces. However, these ratings can be subjective and affected by rater biases. More recent studies (such as our own) have attempted to avoid such biases by using facial landmarks to derive objective measures of facial traits. But landmark-based measures ignore features thought to be relevant to face perception, such as skin colour and contrast, hair, and eye colour. Here, we introduce deep neural networks as a method that combines the strengths of both approaches while addressing the limitations of facial landmarks.

Facial recognition neural network models are designed to extract abstract facial features from images. Each input image yields one set of coordinates in a multidimensional feature space: a compressed representation of the original image, in which each dimension corresponds to an abstract facial feature. The distance between two points (i.e. faces) in feature space reflects facial similarity, with similar faces represented by points that are closer together. While these coordinates lack direct interpretability, they effectively quantify abstract facial qualities that can be used to calculate facial traits relevant to facial attractiveness research.
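To make this concrete, below is a minimal sketch of mapping face images to feature-space coordinates and comparing faces by distance. The facenet-pytorch embedder here is our stand-in assumption (the study itself used VGG16); any pretrained face-recognition network illustrates the same idea.

```python
# Minimal sketch: map face images to points in feature space and compare
# faces by distance. facenet-pytorch is a stand-in assumption; the study
# used VGG16, but any face-recognition embedder works the same way.
import torch
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained='vggface2').eval()

def embed(face: torch.Tensor) -> torch.Tensor:
    """Map one preprocessed face image (3 x 160 x 160, values roughly in
    [-1, 1]) to a 512-dimensional point in feature space."""
    with torch.no_grad():
        return model(face.unsqueeze(0)).squeeze(0)

def face_distance(a: torch.Tensor, b: torch.Tensor) -> float:
    """Euclidean distance in feature space; smaller means more similar faces."""
    return torch.dist(a, b).item()
```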

We applied an existing facial recognition neural network model (VGG16) to facial images from our speed-dating study (n = 682). We used the extracted feature space coordinates to calculate traits such as facial averageness, similarity, and masculinity, and used these traits to predict in-person ratings of facial attractiveness and kindness. We then compared this neural network method to traditional manual (and automatic) landmark methods.
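To illustrate how such traits can be derived from feature-space coordinates, here is a hedged sketch. These operationalisations (distance to the sample mean for averageness, distance between two faces for similarity, projection onto the female-to-male axis for masculinity) are plausible readings of the approach, not necessarily our exact formulas; the code we actually used is linked in the paper.

```python
# Hedged sketch of deriving facial traits from feature-space coordinates.
# E is an (n_faces, n_dims) matrix of embeddings; is_male is a boolean array.
import numpy as np

def averageness(E: np.ndarray) -> np.ndarray:
    """Negative distance of each face from the sample mean face;
    higher values mean a more average face."""
    centre = E.mean(axis=0)
    return -np.linalg.norm(E - centre, axis=1)

def similarity(e_a: np.ndarray, e_b: np.ndarray) -> float:
    """Negative distance between two faces; higher means more similar."""
    return -float(np.linalg.norm(e_a - e_b))

def masculinity(E: np.ndarray, is_male: np.ndarray) -> np.ndarray:
    """Projection of each face onto the unit vector running from the mean
    female embedding to the mean male embedding."""
    axis = E[is_male].mean(axis=0) - E[~is_male].mean(axis=0)
    axis /= np.linalg.norm(axis)
    return (E - E.mean(axis=0)) @ axis
```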

An issue alluded to in past studies is that landmark measures of masculinity could be influenced by facial pitch (the upward or downward tilt of a face). In our images, men tended to tilt their heads upward compared to women (there was a significant difference in facial pitch angle between genders). We found facial pitch was highly correlated with landmark measures of masculinity (-.73 ≤ r ≤ -.17). In contrast, there was little to no correlation between facial pitch and neural network measures of masculinity (-.23 ≤ r ≤ .00). Differences in the way that men and women pose for photos therefore likely bias typical landmark masculinity measures. Here, we demonstrate that neural networks can extract facial information without being affected by this limitation of landmarks.
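The confound check itself is straightforward to run. The sketch below uses synthetic data (the arrays and effect sizes are purely illustrative, not our study's data) to show the kind of correlation test involved.

```python
# Sketch of the pitch-confound check on synthetic, purely illustrative data:
# correlate facial pitch with each masculinity measure.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
pitch = rng.normal(0, 5, 682)                         # pitch angle (degrees)
landmark_masc = -0.5 * pitch + rng.normal(0, 3, 682)  # pitch-contaminated
network_masc = rng.normal(0, 1, 682)                  # pitch-independent

for name, masc in [("landmark", landmark_masc), ("network", network_masc)]:
    r, p = pearsonr(pitch, masc)
    print(f"{name} masculinity vs pitch: r = {r:.2f}, p = {p:.3g}")
```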

Overall, facial measures derived from neural networks predicted in-person ratings, largely replicating what we found in our previous study using manual landmarks. One difference was that neural network measures of masculinity robustly predicted facial attractiveness in men, whereas our previous study found only context-dependent evidence for this. We also found novel evidence for assortative preferences for facial masculinity. For example, participants with sex-atypical faces (a masculine woman or feminine man) revealed stronger preferences for a partner who was sex-typical (i.e. they rated sex-typical partners as more attractive) than did participants with sex-typical faces. We believe such effects emerged with neural network measures of masculinity because these measures extract more visual information from participants' images and carry less noise from facial pitch.

Neural network-derived measures had small to moderate correlations with landmark-based measures (.11 ≤ r ≤ .33), while manual and automatic landmark measures were, as expected, moderately to strongly correlated (.29 ≤ r ≤ .86). Neural network masculinity measures classified participant sex more accurately (accuracy ≥ 95.6%) than landmark measures did (75.3% ≤ accuracy ≤ 88.8%). Both the low correlations between neural network and landmark measures and the higher sex-classification accuracy of neural network measures suggest that facial photos contain relevant information that landmarks do not capture. However, we did not find that neural network measures explained more variance in in-person ratings than landmark measures did.
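For the sex-classification comparison, one plausible procedure (our assumption here, not necessarily the paper's exact method) is cross-validated logistic regression on each masculinity measure:

```python
# Hedged sketch: score a masculinity measure by how accurately it predicts
# participant sex, using 5-fold cross-validated logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sex_classification_accuracy(masc: np.ndarray, is_male: np.ndarray) -> float:
    """Mean cross-validated accuracy of predicting sex from one score."""
    X = masc.reshape(-1, 1)      # a single feature: the masculinity measure
    return cross_val_score(LogisticRegression(), X, is_male, cv=5).mean()
```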

While we found that neural network-derived measures do indeed predict in-person ratings, the underlying method is not well understood. With landmarks, we understand that the “average” male face is the one whose facial structure (as described by landmark coordinates) most resembles the mean male face; with neural networks, we are unclear what the “average” face resembles, that is, which aspects of the face contribute more or less to its position in feature space. While there are some ways in which we can use landmarks to describe (and visualise) shape variation, we are unaware of any straightforward way to visualise variation in feature space coordinates. (We note that we did create composite images of the top 20 participants for each trait.) And while we controlled for ethnicity variables, this does not mitigate any systematic biases that may arise from imbalances in the training data of the face recognition models used by automatic landmarks and neural networks.

Given the lack of transparency behind neural network models, we suggest that researchers use caution when employing these methods. However, we also believe that neural networks are a fast, reproducible, and powerful way to extract visual information without the limitations associated with landmarks. A link to instructions and code for obtaining feature space values using neural networks is available in the full text of this paper.

Read the original paper: Zhao, A.A.Z., & Zietsch, B. (2024). Deep neural networks generate facial metrics that overcome limitations of previous methods and predict in-person attraction. Evolution & Human Behavior, 45(6), 106632.