According to the News Media Alliance (NMA), artificial intelligence (AI) developers are heavily dependent on illegally scraping copyrighted material from news publications and journalists to train their models. In a 77-page white paper and accompanying submission to the United States Copyright Office, the NMA claims that the data sets used to train AI models consist of a significant amount of news publisher content. As a result, AI models “copy and use publisher content in their outputs,” which infringes on copyright and puts news outlets in competition with these AI models.
The NMA emphasizes that many AI developers choose to scrape publisher content without permission, using it to train models and to create competing products in real time. The group argues that while news publishers invest the resources and take on the risks, it is AI developers who reap the rewards in users, data, brand creation, and advertising dollars. The result for news publishers, the group says, is reduced revenues, fewer employment opportunities, and strained relationships with viewers.
To address these issues, the NMA recommends that the Copyright Office declare the use of a publication’s content to monetize AI systems as harmful to publishers. The group also calls for the implementation of various licensing models and transparency measures to restrict the ingestion of copyrighted materials. Additionally, the NMA suggests that the Copyright Office adopt measures to remove protected content from third-party websites.
The NMA acknowledges the benefits of generative AI, noting that publications and journalists can use it for proofreading, idea generation, and search engine optimization. However, the methods used to train AI models have faced criticism, and several copyright infringement claims have reached the courts. Comedian Sarah Silverman, for example, sued OpenAI and Meta in July, alleging that they used her copyrighted work without permission to train their AI systems.
OpenAI and Google have also faced separate class-action lawsuits related to the scraping of private user information from the internet. Google has promised to assume legal responsibility if its customers are accused of copyright infringement while using its generative AI products on Google Cloud and Workspace. However, Google’s Bard search tool is not covered by this legal protection. Neither OpenAI nor Google has yet responded to requests for comment on the matter.
In conclusion, the NMA’s white paper alleges that AI developers rely on illegally scraped copyrighted material from news publishers and journalists to train their models. The group calls for measures to protect publishers’ content, including licensing models and transparency requirements. These recommendations aim to address the copyright infringement concerns and the negative impact on news publishers’ revenues and relationships with viewers.