For a while now, companies like OpenAI and Google have been touting advanced "reasoning" capabilities as the next big step in their latest artificial intelligence models. Now, though, a new study from six Apple engineers shows that the mathematical "reasoning" displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems.

The fragility highlighted in these new results helps support previous research suggesting that LLMs' use of probabilistic pattern matching is missing the formal understanding of underlying concepts needed for truly reliable mathematical reasoning capabilities. "Current LLMs are not capable of genuine logical reasoning," the researchers hypothesize based on these results. "Instead, they attempt to replicate the reasoning steps observed in their training data."

Mix it up

In "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models"—currently available as a pre-print paper—the six Apple researchers start with GSM8K's standardized set of over 8,000 grade-school level mathematical word problems, which is often used as a benchmark for modern LLMs' complex reasoning capabilities. They then take the novel approach of modifying a portion of that testing set to dynamically replace certain names and numbers with new values—so a question about Sophie getting 31 building blocks for her nephew in GSM8K could become a question about Bill getting 19 building blocks for his brother in the new GSM-Symbolic evaluation.
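To make the substitution idea concrete, here is a minimal Python sketch of how a word problem might be turned into a template whose names and numbers are resampled for each variant. Everything in it is an assumption for illustration: the template text, the name and relative lists, the number ranges, and the helper names `make_variant` and `make_variants` are made up, and none of it is taken from GSM8K or from the researchers' actual tooling.

```python
import random

# Hypothetical template, not taken from GSM8K or the paper: the placeholders
# in braces are the parts that get swapped out for each new variant.
TEMPLATE = (
    "{name} gets {blocks} building blocks for a {relative}. "
    "The {relative} already has {owned} blocks. "
    "How many blocks does the {relative} have now?"
)

# Illustrative value pools (assumed, not from the source).
NAMES = ["Sophie", "Bill", "Maria", "Chen"]
RELATIVES = ["nephew", "brother", "niece", "cousin"]


def make_variant(rng):
    """Fill the template with fresh values and compute the matching answer."""
    values = {
        "name": rng.choice(NAMES),
        "relative": rng.choice(RELATIVES),
        "blocks": rng.randint(5, 40),
        "owned": rng.randint(1, 20),
    }
    question = TEMPLATE.format(**values)
    # The ground-truth answer is recomputed from the sampled numbers, so
    # every variant stays correct even though its surface form changes.
    answer = values["blocks"] + values["owned"]
    return question, answer


def make_variants(count, seed=0):
    """Generate `count` variants reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_variant(rng) for _ in range(count)]


if __name__ == "__main__":
    for question, answer in make_variants(3):
        print(question, "->", answer)
```

The key point of the sketch is that the reasoning needed to solve each variant is identical. Only the surface details differ, so a model that truly reasons should score about the same on every variant.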




News Apple study exposes deep cracks in LLMs’ “reasoning” capabilities
