Abstract: The ability of organisms to discriminate social signals, such as affective displays, using different sensory modalities is important for social communication. However, a major problem for understanding the evolution and integration of multimodal signals is determining how humans and animals attend to different sensory modalities, and these different modalities contribute to the perception and categorization of social signals. Using a matching-to-sample procedure, chimpanzees discriminated videos of conspecifics' facial expressions that contained only auditory or only visual cues by selecting one of two facial expression photographs that matched the expression category represented by the sample. Other videos were edited to contain incongruent sensory cues, i.e., visual features of one expression but auditory features of another. In these cases, subjects were free to select the expression that matched either the auditory or visual modality, whichever was more salient for that expression type. Results showed that chimpanzees were able to discriminate facial expressions using only auditory or visual cues, and when these modalities were mixed. However, in these latter trials, depending on the expression category, clear preferences for either the visual or auditory modality emerged. Pant-hoots and play faces were discriminated preferentially using the auditory modality, while screams were discriminated preferentially using the visual modality. Therefore, depending on the type of expressive display, the auditory and visual modalities were differentially salient in ways that appear consistent with the ethological importance of that display's social function.