Abstract

We tested whether dogs have a cross-modal representation of human individuals. We presented domestic dogs with a photo of either the owner's or a stranger's face on an LCD monitor after playing back the voice of one of those persons. The voice and the face matched in half of the trials (Congruent condition) and mismatched in the other half (Incongruent condition). If the subjects activate a visual image of the speaker on hearing the voice, their expectation would be violated in the Incongruent condition, and they should look longer at the face in the Incongruent condition than in the Congruent condition. Our subject dogs did look longer at the visual stimulus in the Incongruent condition than in the Congruent condition. This suggests that dogs actively generate an internal representation of the owner's face when they hear the owner calling them. This is the first demonstration that nonhuman animals do not merely associate auditory and visual stimuli but also actively generate a visual image from auditory information. Furthermore, our subjects also looked longer at the visual stimulus in the Incongruent condition in which the owner's face followed an unfamiliar person's voice than in the Congruent condition in which the owner's face followed the owner's voice. Generating a particular visual image in response to an unfamiliar voice should be difficult, and any image expected from that voice ought to be more obscure or less well defined than that of the owner. That the subjects nevertheless looked longer at the owner's face in this Incongruent condition may indicate that, on hearing the unfamiliar person's voice, the dogs predicted that the face would not be the owner's.