This comic follows on from the Previous comic which will almost certainly provide context.

You might not wanna be famous, but when you’re level 10, every organization within a mile is watching what you’re doing.

  • AhdokOP
    link
    fedilink
    arrow-up
    93
    ·
    edit-2
    15 days ago

    Here’s a short little “meanwhile” comic as a bonus, since it’s been a while.

    Oh, apologies about my personal website. My hosting is done by a friend with a small server, and… well basically every wordpress site in existence is now under constant effective-DDoS by AI bots trying to scrape all the data. They’re not subtle about it, and just try to download all the pictures simultaneously. My server is too small to handle that load, so just reboots when that happens (it’s usually down for about a minute).

    The fact that it’s near constantly down is just a product of how often I’m getting these requests.

    • Forester@yiffit.net
      link
      fedilink
      arrow-up
      47
      ·
      edit-2
      15 days ago

      This won’t fix it but it might help.

      Make sure you have a robots.txt file with a crawl delay set for all agents once every 30 seconds and that you are disallowing most of the WordPress directories such as WP admin, the media directory, etc.

      I would also strongly recommend that you use a caching system if you are not using one. It’s a lot more efficient to serve the same image a hundred times to different bots from the ram than loading it off your drive.

      Just my personal opinions working in a web hosting environment.

      That’ll probably help if it’s i/o issues.

      • AhdokOP
        link
        fedilink
        arrow-up
        36
        ·
        15 days ago

        most of these AI scrapers don’t respect robots.txt, so I’m not sure that really helps much, but… we have tried doing all of these things.

        • itslilith@lemmy.blahaj.zone
          link
          fedilink
          arrow-up
          22
          ·
          15 days ago

          Someone on lemmy suggested to create a dummy endpoint that normal people won’t be able to navigate to, and disallow it in robots.txt

          Then when somebody crawls it you know they are ignoring robots.txt, and you ip ban them

          • AhdokOP
            link
            fedilink
            arrow-up
            13
            ·
            15 days ago

            That’s pretty clever.

            I think that these AI scrapers might be smart enough that this doesn’t really work though - at least if I were designing them I’d have them all come from dynamic IPs and not have any of them bother hitting the same target more than once. These things are very dedicated to acquiring content without consent, and if they’re capable of causing problems for (say) Reddit, I’m not sure my little website is going to have much luck deterring them.

            Honestly a better strategy might be to just glaze everything I draw.

            • Johanno@feddit.org
              link
              fedilink
              arrow-up
              6
              ·
              14 days ago

              I am not sure if it costs money, but you could implement captchas.

              Or use cloudflare to do that bot detecting for you.

              Worst case you make it so you need to create an account to see content.

              • AhdokOP
                link
                fedilink
                arrow-up
                4
                ·
                14 days ago

                Well, we are already using cloudflare, that’s one of the other reasons why the site is so slow… I don’t think the other two suggestions prevent a scraper from requesting the information from the server… I think they’d just make it more arduous for real people to access the content.

            • Lumisal@lemmy.world
              link
              fedilink
              arrow-up
              3
              ·
              14 days ago

              Instead of a tech solution, why not a legal one? Place somewhere in the website that refusal to follow your robots.txt is agreement to pay you x amount of money for your content. Then combine that with the dummy page solution the other person brought up so you can record the IP address, then take them to court so they pay you. Has potential to bring you a really really nice chunk of money.

              • AhdokOP
                link
                fedilink
                arrow-up
                4
                ·
                14 days ago

                I believe that there are multiple very high profile billion-dollar lawsuits being run against AI companies right now. I don’t really have the budget to sue these companies.

            • MouseKeyboard
              link
              fedilink
              arrow-up
              2
              ·
              12 days ago

              Honestly a better strategy might be to just glaze everything I draw.

              I doubt that will help, they can still scrape the site and then wait until whatever version of Glaze was applied is cracked.

        • Forester@yiffit.net
          link
          fedilink
          arrow-up
          15
          ·
          15 days ago

          Whoever their host is, they already appear to have some type of load balancing based on the four IPS. But I would also agree that a free cloudflare account does wonders for most WordPress users. But that’s probably mostly because it filters out a shitload of bots and known bad actors. Just make sure you set up your origin certificates if you use a cloudflare account.

          • AhdokOP
            link
            fedilink
            arrow-up
            12
            ·
            15 days ago

            We are already using cloudflare.

              • AhdokOP
                link
                fedilink
                arrow-up
                4
                ·
                edit-2
                14 days ago

                I shall mention it to my web admin

                (Edit): Bot fight mode breaks wordpress’ commenting system, which ideally I’d like to preserve.

    • Godort@lemm.ee
      link
      fedilink
      arrow-up
      23
      ·
      15 days ago

      Welcome back. I love these comics. The website situation is super shitty, I wish you luck on that endless battle.

      I ended up just switching to your Tumblr page in my webcomic rotation to get around it.