• Thread starter: News

    News Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

    For a while now, companies like OpenAI and Google have been touting advanced "reasoning" capabilities as the next big step in their latest artificial intelligence models. Now, though, a new study from six Apple engineers shows that the mathematical "reasoning" displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems.

    The fragility highlighted in these new results helps support previous research suggesting that LLMs' use of probabilistic pattern matching is missing the formal understanding of underlying concepts needed for truly reliable mathematical reasoning capabilities. "Current LLMs are not capable of genuine logical reasoning," the researchers hypothesize based on these results. "Instead, they attempt to replicate the reasoning steps observed in their training data."

    Mix it up


    In "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models"—currently available as a pre-print paper—the six Apple researchers start with GSM8K's standardized set of over 8,000 grade-school level mathematical word problems, which is often used as a benchmark for modern LLMs' complex reasoning capabilities. They then take the novel approach of modifying a portion of that testing set to dynamically replace certain names and numbers with new values—so a question about Sophie getting 31 building blocks for her nephew in GSM8K could become a question about Bill getting 19 building blocks for his brother in the new GSM-Symbolic evaluation.
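
    The substitution idea described above can be illustrated with a minimal Python sketch. This is not the researchers' code: the template wording, the name and relative lists, the value ranges, and the helper `make_variant` are all hypothetical, chosen only to show how one word problem can be turned into many variants whose correct answers are recomputed from the substituted numbers.

```python
import random

# Hypothetical template: the wording and value ranges are illustrative
# assumptions, not taken from GSM8K or the GSM-Symbolic paper.
TEMPLATE = (
    "{name} buys {total} building blocks for a {relative} and gives away "
    "{given} of them. How many building blocks are left?"
)
NAMES = ["Sophie", "Bill", "Maria", "Omar"]
RELATIVES = ["nephew", "brother", "niece", "cousin"]


def make_variant(rng):
    """Fill the template with fresh values and compute the matching answer."""
    total = rng.randint(10, 50)
    given = rng.randint(1, total - 1)  # keep the answer positive
    question = TEMPLATE.format(
        name=rng.choice(NAMES),
        total=total,
        relative=rng.choice(RELATIVES),
        given=given,
    )
    return question, total - given


# A seeded generator makes the set of variants reproducible.
rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

    Because the ground-truth answer is derived from the same values that fill the template, every generated variant stays consistent, and a model's accuracy can be compared across many surface-level rewordings of the same underlying problem.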
