AI vs Human Reasoning: GPT-3 Matches College Undergraduates

In an eye-opening study, researchers revealed that GPT-3, a popular artificial intelligence language model, performs comparably to college undergraduates in solving reasoning problems that typically appear on intelligence tests and the SAT. However, the study’s authors question whether GPT-3 is merely mimicking human reasoning as a by-product of its training dataset or is using a novel cognitive process.

OpenAI’s renowned AI-powered tool, GPT-3, has been revealed to possess reasoning abilities comparable to those of college undergraduate students, as evidenced by scientific research.

In the study, the large language model (LLM) was given intricate reasoning problems of the kind found on intelligence tests and standardized exams such as the SAT, PTI reported. Such tests play a pivotal role in admissions to higher education institutions in the United States and around the globe. The results underscore GPT-3’s ability to handle complex cognitive tasks.

The study, conducted at the University of California, Los Angeles (UCLA), asked GPT-3 to predict the next shape in elaborate arrangements of shapes and to answer SAT analogy questions. Both types of problem were entirely novel to the AI.

The researchers also enlisted 40 UCLA undergraduate students to attempt the same challenges. In the shape prediction assessment, GPT-3 achieved an accuracy rate of 80%, well above the human participants’ average score of just below 60% and within the range of the highest individual human scores.

UCLA psychology professor Hongjing Lu, the senior author of the study published in the journal Nature Human Behaviour, stated, “Surprisingly, GPT-3 not only performed at a level comparable to humans but also made similar types of errors.”

In the realm of SAT analogies, GPT-3 displayed exceptional performance, surpassing the average scores of its human counterparts. Analogical reasoning involves confronting novel challenges by drawing parallels to familiar scenarios and applying analogous solutions to the new context. The test questions required participants to identify pairs of words that shared analogous relationships. For instance, in the provided example “‘Love’ is to ‘hate’ as ‘rich’ is to which word?,” the correct response would be “poor.”

However, when faced with analogies rooted in short narratives, the AI’s performance was not as robust as that of the students. These specific challenges necessitated reading a passage and then discerning another story conveying a comparable meaning.

UCLA psychology professor Keith Holyoak, a co-author of the study, remarked, “Language learning models primarily focus on predicting words, so it’s astonishing to witness their aptitude for reasoning. In the past couple of years, this technology has experienced a significant leap from its previous iterations.”

Because GPT-3’s internal mechanisms are closely guarded by its creator, OpenAI, the researchers acknowledged that they cannot be sure what processes drive its reasoning. It remains unclear whether large language models (LLMs) are genuinely beginning to “think” like humans or are doing something entirely different that merely mimics human thought. The researchers expressed a keen interest in exploring the question in subsequent investigations.

Holyoak added, “We are eager to ascertain whether GPT-3 is genuinely employing methods akin to human cognition, or if it has introduced something entirely novel, a true manifestation of artificial intelligence, which would indeed be a remarkable achievement.”

Source: LiveMint