Data scraping is the original secret agent of the web. It is everywhere, surreptitiously and indiscriminately following our footprints across our digital lives and gathering the associated data on a colossal scale, for purposes that are often unclear or even unknown. In its current form, this clandestine, invisible processing of our personal data effectively undermines every core principle of established privacy and data protection laws. To name a few: there is little or no transparency; no data minimisation; no control over further uses of our data; no control over onward transfers of that data; and no guarantees over how that data will be protected.
Data scraping is clearly necessary: Google uses it to index websites and, most obviously, Large Language Models could not be trained without the vast data hoards that scraping provides. Why, then, does the topic remain hidden in plain sight, the unnamed elephant in the room that threatens to openly mock data protection and AI laws? This webinar will discuss that question and the following key issues:
• How data scraping works as a practice and why it is needed.
• The privacy-AI regulatory impacts of data scraping.
• Whether data scraping, in its current form, can ever be considered lawful under existing privacy-AI laws.
• The public-private distinction and how data scraping trades on a misunderstanding of what it means for personal data to be public.
• How regulators are beginning to address the lawfulness of data scraping and the investigations underway.
• What solutions might work: can data scraping be salvaged and made lawful, and should the law change or should data scraping change?