[long] Some tests of how much AI "understands" what it says (spoiler: very little)

diz@awful.systems · 4 months ago

But if your response to the obvious misrepresentation that a chatbot is a person of ANY level of intelligence is to point out that it’s dumb, you’ve already accepted the premise.

How am I accepting the premise, though? I do call it an Absolute Imbecile, but that’s more of a play on the “AI” moniker.

What I do accept is the unfortunate fact that they did get their “AIs” to score very highly on various “reasoning” benchmarks (some of their own design), standardized tests, and so on and so forth. The models answer correctly across most simple variations of a problem, such as changing the numbers or the word order.
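To make "simple variations" concrete, here is a minimal sketch of how such variants could be generated for a word problem. It is only an illustration, not anything from the original tests: the helper names (`vary_numbers`, `shuffle_setup`, `make_variants`) and the sample problem are hypothetical, and no model is called. Reordering the setup sentences is a crude stand-in for changing the word order.

```python
import random
import re


def vary_numbers(problem: str, rng: random.Random, low: int = 2, high: int = 20) -> str:
    """Replace every integer in the text with a random integer in [low, high]."""
    return re.sub(r"\d+", lambda m: str(rng.randint(low, high)), problem)


def shuffle_setup(problem: str, rng: random.Random) -> str:
    """Reorder the setup sentences, keeping the final sentence (the question) last."""
    sentences = re.split(r"(?<=[.?!])\s+", problem.strip())
    setup, question = sentences[:-1], sentences[-1]
    rng.shuffle(setup)
    return " ".join(setup + [question])


def make_variants(problem: str, n: int, seed: int = 0) -> list[str]:
    """Produce n surface-level variants: new numbers, reordered setup sentences."""
    rng = random.Random(seed)  # seeded so the same variants can be regenerated
    return [shuffle_setup(vary_numbers(problem, rng), rng) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical sample problem, used only to show the output shape.
    sample = "A farmer has 3 sheep. He buys 4 more. How many sheep does he have?"
    for variant in make_variants(sample, 3):
        print(variant)
```

Variants like these only change the surface of a problem, which is exactly the kind of change the models tend to handle.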

They really did a very good job at faking reasoning. I feel that even though LLMs are complete bullshit, the sheer strength of that bullshit is easy to underestimate.

self@awful.systems · 4 months ago

given how none of their rant applied to your OP, I’m fairly certain they didn’t read it and were just going off the title. see also how fast they went from a false critique of LLMs (“of course they’re not people”) to an appeal to an imaginary middle ground (“both proponents and critics of LLMs anthropomorphize them/think they’re sci-fi marvels”, a ridiculous claim to apply to your OP or to serious LLM skepticism in general) to smuggling in hype (“…but of course LLMs are revolutionary and we don’t know what they’re capable of”)

in short, don’t bother with this shithead, they’re just marketing OpenAI products to a particularly hostile crowd

A couple simple probes:

GPT4 is uncannily good at recognizing the river crossing puzzle

An Idiot With a Petascale Cheat Sheet

Is this a “hallucination”?

But after an update, GPT-whatever is so much better at such prompts.

The need for an Absolute Imbecile Level Reasoning Benchmark

Randomness in bullshitting