Emotional facial expressions play an important role in social communication across primates. Despite major progress made in our understanding of categorical information processing such as for objects and faces, little is known, however, about how the primate brain evolved to process emotional cues. In this study, we used functional magnetic resonance imaging (fMRI) to compare the processing of emotional facial expressions between monkeys and humans. We used a 2 × 2 × 2 factorial design with species (human and monkey), expression (fear and chewing) and configuration (intact versus scrambled) as factors. At the whole brain level, selective neural responses to conspecific emotional expressions were anatomically confined to the superior temporal sulcus (STS) in humans. Within the human STS, we found functional subdivisions with a face-selective right posterior STS area that also responded selectively to emotional expressions of other species and a more anterior area in the right middle STS that responded specifically to human emotions. Hence, we argue that the latter region does not show a mere emotion-dependent modulation of activity but is primarily driven by human emotional facial expressions. Conversely, in monkeys, emotional responses appeared in earlier visual cortex and outside face-selective regions in inferior temporal cortex that responded also to multiple visual categories. Within monkey IT, we also found areas that were more responsive to conspecific than to non-conspecific emotional expressions but these responses were not as specific as in human middle STS. Overall, our results indicate that human STS may have developed unique properties to deal with social cues such as emotional expressions.