There have been many reported instances of hackers stealing personal data such as credit card details. Adobe and Equifax customers experienced this in 2013 and 2017 respectively. It’s clear that these breaches are illegal, and that the likely motivation is direct financial gain through fraudulent use of the data. Off the back of this, a nascent cyber security industry has developed, with companies offering products designed to prevent breaches and keep the hackers out.
There’s a sticking point though: even in the absence of credit card or banking information, personal data as simple as an address or date of birth can have value when collated into a large database with thousands of others. Technology has progressed sufficiently in recent years that there is no longer any need to hack illegally into a privately owned database to gather this type of information, because it is publicly listed in many different forms. Individuals may be completely unaware of just how much information can be gleaned from their social media accounts, or from more esoteric sources.
An interesting recent example of this came to light from a 3D virtual tour of a home for sale on the property platform Rightmove. Because physical house viewings haven’t always been popular during the pandemic, and to keep the property market moving, the website introduced virtual tours built from composite photographs, allowing users to get a three-dimensional view of each room. One tour of a house in Devon inadvertently revealed detailed financial information about the current owners: simply zooming in on some of the photographs showed a dividend cheque, an insurance policy document and an invoice. Other information could also be gathered, such as the state of their health, based on asthma inhalers on display; their political views, based on visible reading material; and the names of their pets, something people commonly use in passwords.
This might seem like a unique example, but a great deal of information can be gathered systematically using a process known as ‘data scraping’. This is where a computer program systematically searches publicly viewable pages of one or more websites and collects all the data it can on individuals. If the program is sophisticated enough, it can cross-reference this with data previously collected from other sources and build up a rich picture of thousands of users. The practice is not explicitly illegal, although collecting the data itself, as well as using it for commercial purposes, is morally ambiguous at the very least. Lawmakers are significantly behind the curve on protecting private data, although the introduction of GDPR in 2018 has gone some way to addressing this imbalance by calling for companies to provide a reasonable level of protection for users’ data. The regulation has yet to be truly tested in the case of data scraping of public information.
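To make the cross-referencing step concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two ‘sites’ are made-up HTML snippets held in memory, the `data-field` markup is a hypothetical convention, the people are fictional, and matching on a lower-cased name is far cruder than anything a real operation would rely on. It is not a description of how any particular company works.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collects fields from <div class="profile"> blocks.

    The class name and the data-field attribute are hypothetical markup,
    invented for this example.
    """

    def __init__(self):
        super().__init__()
        self.records = []      # one dict per profile found
        self._current = None   # profile currently being read
        self._field = None     # field name awaiting its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "profile":
            self._current = {}
        elif self._current is not None and "data-field" in attrs:
            self._field = attrs["data-field"]

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def extract(html):
    """Return the list of profile records found in one page."""
    parser = ProfileParser()
    parser.feed(html)
    return parser.records


def cross_reference(*sources):
    """Merge records from several sources, keyed on a lower-cased name."""
    merged = {}
    for records in sources:
        for record in records:
            key = record["name"].casefold()
            merged.setdefault(key, {}).update(record)
    return merged


# Fictional pages standing in for two unrelated public websites.
SITE_A = """
<div class="profile"><span data-field="name">Alex Example</span>
<span data-field="town">Exampleton</span></div>
<div class="profile"><span data-field="name">Sam Sample</span>
<span data-field="town">Sampleford</span></div>
"""
SITE_B = """
<div class="profile"><span data-field="name">alex example</span>
<span data-field="employer">Example Ltd</span></div>
"""

merged = cross_reference(extract(SITE_A), extract(SITE_B))
for person in merged.values():
    print(person)
```

Neither page says much on its own, but the merged record for the first person now holds both a town and an employer. Repeated across many sources and thousands of people, that accumulation is what gives the resulting database its value.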
Facebook recently announced that the data of over 530 million members had been harvested in this way, by exploiting a vulnerability in a tool that allows users to add friends on the platform from contact information such as a phone number. In response to an enquiry from a Belgian journalist, a Facebook representative mistakenly attached an internal email which stated, ‘Longer term, though, we expect more scraping incidents and it’s important to both frame this as a broad industry issue and normalise the fact that this activity happens regularly.’ In their official response, Facebook recommended that users undergo regular ‘privacy check-ups’, in an attempt to share some of the responsibility with them. Facebook has come under increasing regulatory scrutiny in recent years and there are currently 15 ongoing investigations into the platform and its subsidiaries Instagram and WhatsApp.
A New York University research project developed a browser extension used by over 6,000 volunteers that allowed for the scraping of information from their Facebook feed. The accompanying study was aimed at understanding how Facebook were targeting users with political adverts, but those conducting the study received a cease and desist letter from Facebook to halt the project.
Outside academic research, the commercial opportunity that accompanies data scraping is significant. A data science company called HiQ utilised scraping techniques for several years to take data from LinkedIn user profiles. They used this publicly available data, alongside some quantitative analysis, to create two products aimed at employers. The first, called Keeper, analysed trends in LinkedIn profiles to identify which employees a company was most at risk of losing to rival firms. The second, Skill Mapper, provided a summary of the skills that existing staff members possessed, in order to establish whether there were opportunities to use their human capital in a more efficient and profitable manner. LinkedIn did not oppose this data scraping until they launched a tool very similar to Skill Mapper, at which time they issued a cease and desist letter to HiQ, who responded in turn with a lawsuit. The case is currently working its way through the US courts.
As the digital world becomes a bigger part of our lives and our online presence grows, the fight for control over our data will intensify. Whilst the use of personal data has been black and white in the past, the acceleration of computing power and the ever-increasing sophistication of computer programming have created a grey area, one that could be worth a significant sum to the eventual victor.
Richard O'Sullivan, Investment Research Manager, RSMR