On July 1, 2023, Google announced changes to its Privacy Policy. These changes allow the company to use publicly available information on the internet to train its artificial intelligence models. However, a recent lawsuit claims that Google has been “data scraping” in secret for years. Data scraping involves automated “bots” copying and storing information from web pages. Many Google users are concerned about the copyright and personal privacy issues that have come to light.
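To illustrate the mechanics in the simplest terms, the sketch below shows what a basic scraping bot does: it downloads a page and stores its text. It is a generic, minimal example using the common Python libraries requests and BeautifulSoup and a placeholder address (example.com); it is not a representation of Google’s actual crawling systems.

    # Minimal illustration of data scraping: fetch one page and store its text.
    # Generic sketch only; not Google's crawler.
    import requests
    from bs4 import BeautifulSoup

    def scrape_page(url: str) -> dict:
        """Download a web page and extract its title and visible paragraph text."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, "html.parser")
        return {
            "url": url,
            "title": soup.title.string if soup.title else "",
            # Keep the visible text of every paragraph on the page.
            "paragraphs": [p.get_text(strip=True) for p in soup.find_all("p")],
        }

    if __name__ == "__main__":
        # Placeholder target; a real bot would visit many pages and save the results.
        page = scrape_page("https://example.com")
        print(page["title"], "-", len(page["paragraphs"]), "paragraphs stored")

Run at scale across millions of pages, this same pattern is how large text collections for AI training are assembled, which is the practice at issue in the lawsuit.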
Under the heading “publicly accessible sources,” Google’s new policy states, “[W]e may collect information that’s publicly available online or from other public sources to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities. Or, if your business’s information appears on a website, we may index and display it on Google services.”
Just ten days after the new policy took effect, the Clarkson Law Firm filed a class action lawsuit against Google for allegedly harvesting users’ personal data.
The introduction to the complaint bluntly states, “It has very recently come to light that Google has been secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans. Google has taken all our personal and professional information, our creative and copywritten works, our photographs, and even our emails—virtually the entirety of our digital footprint—and is using it to build commercial Artificial Intelligence (‘AI’) Products like ‘Bard’.”
The lawsuit also alleges that the practice itself is not new: “For years, Google harvested this data in secret, without notice or consent from anyone.” However, Google’s general counsel, Halimah DeLaine Prado, responded in a statement to CNN by saying, “We’ve been clear for years that we use data from public sources — like information published to the open web and public datasets — to train the AI models behind services like Google Translate, responsibly and in line with our AI Principles.”
Google is vague about where it gets the data for Infiniset, the dataset used to train the language model behind Bard, but states that 50 percent of it comes from “public forums.” However, the company does not clearly describe what defines a “public forum,” which the lawsuit claims leaves users “in the dark about the exact origins and nature of the data influencing half of the AI’s training.”
Some of the data allegedly accessed includes “conversational data between humans,” “[c]reative and expressive works,” and, in at least one instance, photographs taken from a private medical file.
Comedian Sarah Silverman has filed similar copyright infringement lawsuits against OpenAI and Meta, suggesting that legal challenges over AI training data are an emerging pattern.