• Boddhisatva@lemmy.world
    link
    fedilink
    English
    arrow-up
    30
    arrow-down
    3
    ·
    1 year ago

    OpenAI discontinued its AI Classifier, which was an experimental tool designed to detect AI-written text. It had an abysmal 26 percent accuracy rate.

    If you ask this thing whether or not some given text is AI generated, and it is only right 26% of the time, then I can think of a real quick way to make it 74% accurate.

    • Leate_Wonceslace@lemmy.dbzer0.com
      link
      fedilink
      English
      arrow-up
      14
      ·
      1 year ago

      I feel like this must stem from a misunderstanding of what 26% accuracy means, but for the life of me, I can’t figure out what it would be.

      • dartos@reddthat.com
        link
        fedilink
        English
        arrow-up
        10
        ·
        edit-2
        1 year ago

        Looks like they got that number from this quote from another arstechnica article ”…OpenAI admitted that its AI Classifier was not “fully reliable,” correctly identifying only 26 percent of AI-written text as “likely AI-written” and incorrectly labeling human-written works 9 percent of the time”

        Seems like it mostly wasn’t confident enough to make a judgement, but 26% it correctly detected ai text and 9% incorrectly identified human text as ai text. It doesn’t tell us how often it labeled AI text as human text or how often it was just unsure.

        EDIT: this article https://arstechnica.com/information-technology/2023/07/openai-discontinues-its-ai-writing-detector-due-to-low-rate-of-accuracy/

        • cmfhsu@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          1 year ago

          In statistics, everything is based off probability / likelihood - even binary yes or no decisions. For example, you might say “this predictive algorithm must be at least 95% statistically confident of an answer, else you default to unknown or safe answer”.

          What this likely means is only 26% of the answers were confident enough to say “yes” (because falsely accusing somebody of cheating is much worse than giving the benefit of the doubt) and were correct.

          There is likely a large portion of answers which could have been predicted correctly if the company was willing to chance more false positives (potentially getting studings mistakenly expelled).

    • notatoad@lemmy.world
      link
      fedilink
      English
      arrow-up
      3
      ·
      1 year ago

      it seemed like a really weird decision for OpenAI to have an AI classifier in the first place. their whole business is to generate output that’s good enough that it can’t be distinguished from what a human might produce, and then they went and made a tool to try and point out where they failed.

      • Boddhisatva@lemmy.world
        link
        fedilink
        English
        arrow-up
        2
        ·
        1 year ago

        That may have been the goal. Look how good our AI is, even we can’t tell if its output is human generated or not.