OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series
A new research paper laid out ways in which AI developers should try to avoid showing that LLMs have been trained on copyrighted material.
How do they get the data if it’s not purchased or freely available?
It may be freely available for non-commercial use, e.g. photos on Photobucket, the Internet Archive’s free book collections, etc.
Most everything is on the internet these days, copyrighted or not. I’m sure if I googled enough I could find the entire text of Harry Potter for free. I still haven’t purchased it, and technically it’s not legally freely available. But in training these models I guarantee they didn’t care where the data came from, just that it was data.
I’m against piracy as well, for the record, but pretty much everything is available through torrenting and pirate sites at this point, copyright be damned.
Don’t care. It’s not my problem, or these LLMs’ problem, that they don’t secure their copyright. They shouldn’t come asking others to pay because they didn’t secure their data. I see it as a double-edged sword.
I really hope this is a wake-up call to all creative types to pack up and stop using the internet like a street corner to busk on.
If they want to come online to contribute like everybody else, just have fun and post stuff, that’s great. But all of them are no different than any other greedy corporation. They all want more toll roads. When they do make it, earn millions, and get our attention, they exploit it with more ads. It swallows all the good free content. Sites gear themselves towards these rich creators. They lawyer up and sue everybody and everything that looks or sounds like them. We lose all our good spaces to them.
I hope the LLM allows regular people to shit post in peace finally.
Creative types are greedy for wanting compensation for their creations? Is a car mechanic greedy for wanting money for fixing your car?