The evolution of data, privacy & IP: What you need to know in the age of AI
01 June 2023 11:00
Of all the issues posed by the growth of AI, the data privacy and IP implications are certainly among the most interesting. What data was the AI bot trained on? How is it used? How easily can another party access it? What rights does a creator have if their work is being replicated or adapted without their consent? Questions abound in this space, and for many of them the answer is, “we’re not entirely sure just yet.”
Alec Christie is a Partner in the Digital Law team at Clyde & Co, and - as an authoritative voice on these issues for the better part of 30 years - has extensive experience across privacy, cyber, digital transformation, AI, Internet of Things, blockchain and cryptocurrency. “A lot of focus has been on the ethical and the quality issues around AI, while the data and privacy issues have not had a lot of attention except in specialist practices like mine,” he says. “At the core of the issue is the possibility of inadvertently disclosing client confidential information or other confidential business information without realising it.”
Alec says the allure of getting accurate information from increasingly intelligent AI systems is leading more people to disclose sensitive information to get more precise responses to their questions. Many are unaware that once asked, the question, information provided, and the answer are all retained by the AI engine for learning and to assist with future queries. This can lead to situations where, for example, an individual’s personal information is disclosed even though it’s unintended, or unrelated to the answer being sought.
“But you may also be on the other side of that, collecting personal information you've got no right to collect,” says Alec. “We've done tests where, by putting in information and trying to work out who an anonymous person in a news report was, we got down to two people, and one of them was correct. So if you're recording that in your system, you’re recording personal information outside the bounds of the privacy law and therefore unlawfully. In that same scenario, you may be breaching non-disclosure or non-publication laws of the courts. So I’d say to lawyers out there, be really careful, and if you’re allowing team members to use it, quality and ethical issues aside, make sure they're drilled on not accidentally or inadvertently disclosing that personal or confidential information.”
These issues of disclosure and usage carry over to the intellectual property space, where debate is increasingly heating up around whether AI companies should pay to access the data they use to train their models. Lawsuits are already springing up to this effect, with businesses saying their IP (either in the form of data, products, or other ideas) has been infringed through its use as training data for an AI model.
“There are two aspects to this,” says Alec. “First, there’s a privacy aspect that comes into play when there’s personal information being disclosed. And there are specific rules around that - you can’t just scrape the internet for data because personal data needs to be handled a particular way, although there are plenty of overseas companies who don't adhere to those rules." He notes that anyone who sees this sort of behaviour occurring can lodge a complaint with the Privacy Commissioner.
“On the IP side, we don’t really have copyright law for data per se,” he continues. “Maybe if it’s in a database which has unique characteristics or a unique set of data, the copying of the whole database might be unlawful; but just ripping off bits like birthdates or other sorts of particular data doesn't attract copyright protection in this country. So you've really gotta go to the default and what I would say to everyone is the best mechanism, which is the contractual basis or the terms and conditions on the website.”
He notes that such protections can even apply to bots (or, more accurately, their owners) - where even a bot implicitly accepts the T&Cs of a website. If those T&Cs are well drafted and include a no scraping provision, for example, the site owner may have some legal recourse. By pushing these issues into the realm of contract law, and away from copyright law, much wider coverage is afforded. “If they've taken a huge slab of text, which is hard to prove in the AI space,” says Alec, “then you do have copyright protection for that. Many big media organisations operate this way - their terms and conditions for use of the website, materials, tool, or database specifically state no scraping, no copying, and no reuse. That forces the other party to come back and talk to you if they're legitimate and negotiate a licence fee."
And while businesses will no doubt be rushing to check and update their website’s T&Cs, this of course does not stop infringements from actually occurring. So what are the options available to businesses if they realise their data has been used to train AI without their consent, and it poses a risk to their business? “Where it's personal information, businesses should send the equivalent of a cease and desist for personal information or breach of privacy,” says Alec. “If it isn’t, then you can try to enforce your T&Cs, but outside of that, there's not really a lot you can do to protect yourself from this.”
As regulations continue to evolve with this rapidly changing technology across the world, it remains to be seen which way Australia will go - and which of the more advanced markets it will aim to follow: the US, EU/Asia, or the UK. For Alec, it’s about taking a smart and targeted approach that deals with the specific problems of AI, rather than a broad-brush law that changes the approach to the whole area. “We need to regulate the problems, because there's a whole lot in AI that's non-problematic,” he says. “In some places, we can lean on existing laws like privacy laws - they’re not specific to AI, but they cover AI, so perhaps building in large fines for privacy breaches will drive up compliance. So I think it's educational. I think it's tweaking laws where necessary, but I think the first issue is to think about what the problems are.”
As AI continues to become more firmly entrenched in every aspect of our lives, there’s little doubt that issues around data privacy and IP will be ironed out - at least in the longer term - as regulators establish clear guidelines that protect without stifling innovation. More immediately, though, there is significant ambiguity around how these issues are treated under the law, how they can be remedied, and how more complex jurisdictional matters can be handled, given the increasingly global nature of business. With Generative AI now opening a new world of creatively driven possibility, issues around ownership and reproduction will only accelerate, so this will be an interesting - and certainly complex - area to watch in the coming months and years.
This article is based on our AI Decoded podcast series. You can listen to the full episode here.