Humans and songbirds are both referred to as vocal learners, as they share the rare capacity to modify the production of vocalizations as a result of experience. In a continuist scenario, this convergence is based upon the cooption of pre-existing genetic and developmental patterns. Here I review evidence testing two predictions: 1) that conserved genetic networks and brain regions have been independently coopted by humans and songbirds for vocal learning, and 2) that rudiments of vocal learning can be found among non-learners, such as chimpanzees and gibbons. In particular, I propose that these behavioural rudiments may point to processes of affiliation and intimacy as preconditions for vocal learning in the hominin lineage. Consistent with this hypothesis, emotional and reward mechanisms seem to support language learning and the overall vocal development of human babies. A broader comparison between vocal and non-vocal learners may thus encourage a deeper investigation of the emotions that motivate humans to speak.