As LLMs have become ubiquitous, companies are looking to collect all the data they possibly can to train models of their own.
Zoom has updated its Terms of Service to allow it to use all data collected from meetings on its software platform to train AI models. “You consent to Zoom’s access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage of Service Generated Data for any purpose, to the extent and in the manner permitted under applicable Law, including for the purpose of product and service development, marketing, analytics, quality assurance, machine learning or artificial intelligence (including for the purposes of training and tuning of algorithms and models), training, testing, improvement of the Services, Software, or Zoom’s other products, services, and software, or any combination thereof, and as otherwise provided in this Agreement,” Zoom’s updated Terms of Service now say.
It appears that the line about machine learning and artificial intelligence was added on 2nd April this year. In the web archive copy of the page from 1st April, the line is absent, but it appears in the web archive copy from 2nd April 2023.
Zoom isn’t the only company that appears to have realized the value of user data since the LLM revolution began. Around the same time, Reddit announced that it would begin charging for data access. All the way back in February, Twitter had also said that it would impose more restrictions on how its data was used, and begin charging users for access to it.
It’s perhaps for good reason that these companies are looking to either protect their data, or, like Zoom, collect as much data as they possibly can. As LLMs have become more commoditized, with open-source LLMs now competing with state-of-the-art closed models from just a few months ago, what can really set them apart is the data used to train them. Companies like Reddit and Twitter have text data collected over the years, and it can be a treasure trove for training LLMs. Zoom is used by millions of users around the world, and if it can collect and store data from meetings, it can build some very powerful LLMs of its own.
But there will be concerns about whether users will be comfortable letting Zoom use their data to train models. As far as most people can tell, Zoom didn’t prominently disclose this change when it was implemented in April, and quietly changed its Terms of Service. The move will also call into question the privacy of meetings held on Zoom: Zoom appears to be storing meetings on its platform, and this data could conceivably be leaked, or demanded by governments as part of investigations. All of this goes to show that as the AI revolution progresses, there might be an ever-greater premium on obtaining user data, which could make privacy much harder to preserve as a result.