The long arm of the law may have finally caught up with Artificial Intelligence (AI) research company OpenAI. American newspaper The New York Times (NY Times) recently updated its terms of service to prohibit AI companies from using its content to train large language models, and is reportedly preparing a lawsuit that could see OpenAI pay a crippling $150,000 per piece of infringing content if the case succeeds.
This development follows a series of legal challenges brought by creators across various fields who have taken a proactive stance to safeguard their livelihoods. One high-profile case involved comedian Sarah Silverman, who asserted that ChatGPT's ability to accurately summarise her entire 2010 memoir "The Bedwetter" underscored the unauthorised use of her work for financial gain, without "consent, credit, or compensation".
But NY Times is a whole other beast compared to independent creators. Founded in 1851, the paper boasts a rich heritage and a global readership that extends far beyond its nearly 10 million paid print and digital subscribers. OpenAI's response to these allegations thus far appears to hint at a guilty conscience. The company, alongside other prominent AI developers such as Google, Meta, and Microsoft, has ceased disclosing the data its AI models are trained on. Even so, AI researchers from ByteDance still found instances of ChatGPT generating copyrighted material, including near-exact duplicates of excerpts from the Harry Potter book series.
The reaction from creatives is understandable. If found liable, OpenAI would essentially be profiting off the labour of journalists, writers, and photographers: by scraping the web for freshly posted content and regenerating it on the ChatGPT interface, it drives would-be readers away from NY Times itself.
Creatives across the board, many of whom have spent lifetimes honing their craft, expect their jobs to be affected to some extent, alongside an estimated 300 million others, all while OpenAI's venture capital haul of $11.3 billion grows at their expense. So far, NY Times and other creatives do appear to have a case, since OpenAI would have no way of generating timely news content if it weren't freely available online in the first place.
As a result, calls for artist compensation have also been growing louder. In fact, OpenAI is already rewarding some companies for contributing to their dataset. The Associated Press (AP), for one, has granted OpenAI a licence to access its extensive news story catalogue that dates back to 1985. This isn’t the first time a tech company has coughed up money to access data, either. Both Google and Facebook have been reported to pay news sites for that very purpose. Unfortunately, the same cannot be said for OpenAI’s legal opponents.
Copyright infringement occurs whenever there is unauthorised use of another person’s work. While there are nuances to consider, such as the difference between being inspired by a piece of work and plagiarising it altogether, once use is deemed to be unauthorised, simply acknowledging the original source will not solve the problem.
The only foolproof way to avoid copyright infringement is to seek approval (written is always best) from the original copyright holder. The catch is that the average user has no way of knowing all of the sources an AI tool was trained on, nor can they tell whether the content they've generated is an aggregation of a training dataset or a verbatim copy of work that already exists.
While AI companies could confine their generative AI datasets to works solely in the public domain, popular tools such as ChatGPT appear to have been trained on far more than that. In fact, most AI companies pass responsibility down to the average user by stating in their licensing agreements that users must be aware of copyright risks.
But what does all this mean for the average corporate worker who finds generative AI increasingly integrated into day-to-day work? According to experts in AI ethics and law, anyone who uses ChatGPT's outputs for their own purposes may be on the hook for costly lawsuits if that content violates another person's Intellectual Property (IP) rights.
While this may not happen to the average person who doesn't garner much attention online, this is the internet we're talking about, where overnight viral sensations are the norm. Any AI-generated image, blog post, or status update uploaded to a personal social media account may attract just enough attention to alert the original creator, who may then send a copyright infringement notice to a generative AI user who is none the wiser.
The outcome of ongoing legal cases could potentially reshape the future of OpenAI and the field of generative AI, but as recent years have shown, the future is never set in stone. To understand how generative AI will develop and how we can (or cannot) use it to expedite our workflows, let’s take a closer look at how other AI companies are approaching the touchy topic of ethics.
In a landscape where artists and AI platforms constantly clash, the developers of Hadar AI are working to create self-sustaining platforms that reward creative individuals for contributing to AI training datasets. Compensation is determined by 'quality', which is in turn measured by how frequently contributed data is used in responses to user API requests. In essence, creators whose contributions inspire outputs more often receive a larger share of the platform's revenue.
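Hadar AI has not published its exact formula, but the usage-proportional payout it describes can be sketched in a few lines. In this hypothetical sketch, `usage_log` records a contributor ID each time their data influences an API response, and the revenue pool is split in proportion to those counts:

```python
from collections import Counter

def revenue_shares(usage_log, revenue_pool):
    """Split a revenue pool among contributors in proportion to how
    often their contributed data was used to answer API requests.

    usage_log: list of contributor IDs, one entry per time that
    contributor's data influenced an output (hypothetical schema).
    """
    counts = Counter(usage_log)
    total = sum(counts.values())
    return {c: revenue_pool * n / total for c, n in counts.items()}

# Contributor "a" was used three times and "b" once, sharing a $100 pool,
# so "a" receives $75 and "b" receives $25.
shares = revenue_shares(["a", "a", "b", "a"], 100.0)
```

The payout scales automatically as more requests come in, which is why a balanced ratio of requests to data points matters so much to the model's viability.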
Of course, the viability of Hadar AI's model hinges on obtaining a balanced ratio of requests to data points. The developers navigate this challenge with a pay-per-use microtransaction approach that differs from ChatGPT's subscription model. While promising in theory, corporations tend to gravitate towards affordable alternatives, and at the moment ChatGPT presents stiff competition to more sustainable AI models: GPT-3.5 is completely free to use, while the enhanced GPT-4 is capped at 50 messages every three hours.
In contrast to most AI companies, which revise their language models based on user input and feedback, the creators of Claude AI at Anthropic (who also happen to be former senior members of OpenAI) are pioneering an ethics-first methodology in which the model is trained to critique and revise its responses based on rules outlined in "Claude's Constitution".
The constitution incorporates principles from the United Nations' Universal Declaration of Human Rights, Apple's data privacy rules, and other widely accepted ethics guidelines, and has proven successful in avoiding toxic, sexist, racist, or illegal content. While Anthropic acknowledges that their stringent model might come across as "judgemental" or "annoying", the fact that Claude AI can summarise up to 75,000 words, where ChatGPT starts to struggle at around 3,000, is a testament to how ethical guardrails don't necessarily have to inhibit AI's developmental potential.
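The critique-and-revise loop described above can be sketched in miniature. Anthropic's actual pipeline is not public, so `generate`, `critique`, and `revise` below are hypothetical stand-ins for model calls, and the two principles are paraphrased from the article rather than quoted from the real constitution:

```python
# A minimal sketch of a constitutional critique-and-revise loop,
# under the assumption that each principle can be checked independently.

CONSTITUTION = [
    "Avoid toxic, sexist, racist, or illegal content.",
    "Respect user data privacy.",
]

def constitutional_response(prompt, generate, critique, revise, max_rounds=3):
    """Draft a response, then repeatedly critique it against each
    constitutional principle and revise until no principle is violated
    (or the round limit is reached)."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        # Collect every principle the current draft violates.
        violated = [p for p in CONSTITUTION if critique(draft, p)]
        if not violated:  # draft passes the whole constitution
            break
        draft = revise(draft, violated)
    return draft
```

The key design choice is that the rules live in the loop rather than in human feedback: the model polices its own drafts against a fixed, inspectable list of principles instead of learning norms implicitly from user ratings.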
You don’t need us to tell you that AI is a rapidly developing field, or that technology is still in its infancy. But what you can gain by following us on LinkedIn and Facebook are practical tips on how you can act on the latest tech trends to build a fulfilling career that suits you.