GPT-4 and similar AI systems learn language from vast amounts of text, far exceeding what children encounter during language acquisition: children hear millions of words per year, while these models train on trillions of words.
To probe this data gap, NYU researchers trained a multimodal AI system on the experiences of a single child, using headcam video recorded from the age of six months through the child's second birthday. Although the footage captured only about 1% of the child's waking hours, the model learned a substantial number of words and concepts, offering new insight into early language acquisition.
The researchers used contrastive learning to link visual and linguistic cues, mimicking how children associate words with what they see. They tested the model's word learning with evaluations adapted from infant studies, showing that it could generalize words to new and varied visual instances.
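The core idea of contrastive learning can be sketched briefly: embeddings of co-occurring image and word pairs are pulled together while mismatched pairs are pushed apart. Below is a minimal, hypothetical NumPy illustration of such a contrastive (InfoNCE-style) loss; the function name, dimensions, and temperature value are illustrative assumptions, not the study's actual code.

```python
import numpy as np

def contrastive_loss(image_emb, word_emb, temperature=0.07):
    """Illustrative InfoNCE-style loss: row i of image_emb and row i of
    word_emb are a matched (co-occurring) pair; all other pairings are
    treated as negatives to be pushed apart."""
    # L2-normalize so the dot product is cosine similarity
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    wrd = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    logits = img @ wrd.T / temperature          # pairwise similarity matrix
    # Softmax over each row; the correct word for image i sits on the diagonal
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    n = len(image_emb)
    return -np.log(probs[np.arange(n), np.arange(n)]).mean()

rng = np.random.default_rng(0)
aligned = rng.normal(size=(4, 8))
# Perfectly aligned pairs yield a low loss; random pairings yield a higher one
loss_aligned = contrastive_loss(aligned, aligned)
loss_random = contrastive_loss(aligned, rng.normal(size=(4, 8)))
```

Minimizing this loss over many image-word pairs drives matched pairs toward similar embeddings, which is the mechanism that lets the model associate a spoken word with the objects in view.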
These findings speak to long-standing debates about language learning, showing that AI models can offer insight into children's learning processes without relying on massive datasets. The study received support from the U.S. Department of Defense and the National Science Foundation, with ethical approval from NYU's Institutional Review Board.