
There’s been a lot of sturm und drang on Facebo*k and elsewhere over an article in The Atlantic about the data used to train M*ta’s and other Large Language Models (LLMs), first in the artists’ pages and now in the writers’ as well. AI art and AI stories are already clogging up submissions at the magazines and such. Curiosity and ego got the better of me, so I did the DB search thing and came up with this:
Yamada Monogatari: Demon Hunter, Yamada Monogatari: To Break the Demon Gate, Yamada Monogatari: The War God’s Son, and Yamada Monogatari: The Emperor in Shadow.
All present. The only one they missed was Yamada Monogatari: Troubled Spirits, the final collection.
The online outrage has been pretty intense, and on one level I understand completely. I do have a dog in this fight. No one asked our permission, and the fact that the text (albeit tokenized beyond human recognition) is there at all can be interpreted as an unauthorized publication. Stephen King and other prominent writers have already filed suit and the courts will have to deal with it. Of course I’m curious to see how that all shakes out, since we’re in mostly uncharted territory here with AI in general and this instance in particular. Can the ones assembling the material argue “fair use” or will the courts decide the writers’ IP rights were violated? Some legal clarity here would be nice, however it shakes out.
At this stage I confess to being too ambivalent to share the outrage. Concern, yes. On the one hand, we should have been asked. On the other, is it really publication? As the data is fed into the LLM it’s barely recognizable as words. On the other other hand, will plagiarism result? Here I’ve seen enough in my own experiments with ChatGPT to know it can happen: sometimes the AI puts out work purporting to be original that is an almost word-for-word copy of something it was trained on. That needs to be addressed. Can it be prevented? Too many questions remain unanswered.
What I mostly want at this point is some answers to all the above. I’m willing to wait, assuming I have the option. This may take a while.
I hope you have the option.
Good to hear from you. Stay healthy.
Lifting phrases has gone on for a long time:
Rosy fingered dawn – Homer
Wonderful to tell – Virgil
Happily ever after – Brothers Grimm
A long time ago – Lucas most recently 🙂
and on.
And in the world of music
https://en.wikipedia.org/wiki/Music_plagiarism
Do grammar checkers (Grammarly, etc.) count?
It would be nice if there were some way to label writing, or parts of writing, as AI generated.
Isn’t this a growing problem for college papers?