Is scraping legal?

Scraping can be a divisive topic. In general, it is perceived as something slightly shady or illicit by people who are unfamiliar with it. Consequently, the legality of scraping is frequently questioned. This article will clear up some of the common misconceptions and break down the facts on legal scraping.

Disclaimer: We are not lawyers, and these comments are solely based on our personal and business experience. If you are in doubt about the legality of your use case, seek professional legal advice.

Yes, but …

The simple answer is that web scraping is legal, but like most things in life, it is a little more complicated than that. Take driving a car, for example: that too is legal. However, there are cases when driving a car would be illegal, such as driving under the influence of drugs or without a valid license. Scraping is similar. There is nothing inherently illicit about scraping, but there are some gray areas and even some practices which are not legal.

Generally, three aspects are relevant when determining whether a specific case is legal:

  1. The type of data being scraped
  2. How this data is used
  3. How the data was extracted

Types of data

Personal and copyrighted data are the main types to worry about when evaluating the legality of a use case.

Personal data

Personal data, technically known as personally identifiable information (PII), is data that can directly or indirectly identify a specific individual.

Every legal jurisdiction has different regulations governing personal data. In jurisdictions with the latest consumer privacy legislation (the EU, California, etc.), it is generally illegal for companies to obtain, store and/or use someone’s personal data without their consent or without having a lawful reason for doing so.

Types of personal data include:

  • Name
  • Email
  • Phone Number
  • Address
  • Username
  • IP Address
  • Date of Birth
  • Employment Info
  • Bank or Credit Card Info
  • Medical Data
  • Biometric Data

In the vast majority of cases (lead generation, sales intelligence, etc.), when you scrape personal data from a website, you don’t have the consent of the data owner (the person whose data you are scraping), and it’s very hard to argue that you have one of these lawful reasons to do so:

  • Consent – the data subject consented to you having their data.
  • Contract – the personal data is required for performance of a contract with the data subject.
  • Compliance – necessary for compliance with a legal obligation.
  • Vital Interest, Public Interest, or Official Authority – typically only applicable for state-run bodies where access to personal data is in the public’s interest.
  • Legitimate Interest – necessary for your legitimate interests.

As a result, scraping the personal data of someone who lives in the EU or California could result in your web scraping being deemed illegal.

If you’re not extracting any personal data, or only the personal data of people outside the EU and California, then you are likely safe to keep scraping.
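
As a rough illustration of the point above, one way to reduce exposure is to drop personal fields from each scraped record before storing it. The Python sketch below assumes records arrive as dictionaries. The field names in PERSONAL_FIELDS and the function strip_personal_fields are hypothetical, loosely based on the list of personal data types earlier in this section. Matching on key names alone is naive and will miss personal data that sits inside free text.

```python
# Hypothetical field names, loosely based on the list of personal data types
# in this article. Adjust them to the keys your own scraper produces.
PERSONAL_FIELDS = {
    "name", "email", "phone_number", "address", "username", "ip_address",
    "date_of_birth", "employment_info", "bank_or_credit_card_info",
    "medical_data", "biometric_data",
}


def strip_personal_fields(record):
    """Return a copy of record without keys that look like personal data."""
    return {
        key: value
        for key, value in record.items()
        # Normalize "Phone Number" -> "phone_number" before comparing.
        if key.lower().replace(" ", "_") not in PERSONAL_FIELDS
    }


scraped = {"product": "Kettle", "price": "19.99", "Email": "a@example.com"}
print(strip_personal_fields(scraped))  # {'product': 'Kettle', 'price': '19.99'}
```

Filtering like this only helps if the personal fields are never written to disk or logs in the first place, so it is best applied as early in the pipeline as possible.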

Copyrighted data

The second type of data you need to be careful of scraping is copyrighted data.

Copyrighted data is data owned by businesses and individuals with explicit control over its reproduction and capture. Like the use of copyrighted images and songs, just because the data is publicly available on the internet doesn’t mean it is legal for it to be scraped without the owner’s consent. You could be infringing the owner’s copyright by scraping their data. This generally applies to the following types of web data:

  • Articles
  • Videos
  • Pictures
  • Stories
  • Music
  • Databases

Scraping copyrighted data isn’t illegal in itself. It’s what you plan to do with the copyrighted data that could potentially make it so.

One person could scrape a copyrighted article perfectly legally, while someone else could scrape the same article and be found to have breached the owner’s copyright.

It really depends on how you plan to use the data after you’ve scraped it.

Can you argue fair use? For example, instead of replicating the article in full, you plan to use only snippets of the original article.

Can you argue that the data is factual, and therefore not copyrightable? Facts like product names, prices, features, etc. aren’t covered by copyright laws, so can you argue that the data you plan to scrape is factual in nature?

A trickier aspect of copyright law, however, is the issue of database rights. A database is an organized collection of materials that permits a user to search for and access individual pieces of information contained within the materials. Such a collection can be protected as a whole, even when the individual facts inside it are not.

This means that it can be illegal to scrape a full database from the web and then reproduce it exactly for your own purposes.

Again, the US and the EU have different regulations around what constitutes a database and what legal protections they give to the database owner, so it is important to understand the rules and regulations for the legal jurisdictions you are scraping in.

The risks of infringing someone’s database rights can be mitigated by altering how the data is scraped and used. These two tips help ensure you’re conducting ethical data scraping with copyrighted data:

  • Only scrape some of the available data
  • Do not replicate the organizational structure of the original database

Okay, so far we’ve covered what types of data can be illegal to scrape, and have seen that how you plan to use the scraped data can affect its legality.

The legality of scraping itself

It’s pretty straightforward to determine if scraping personal or copyrighted data will make your web scraping illegal because there are clear laws that set out what is legal and what is illegal.

It gets a lot trickier when it comes to the act of web scraping itself, because no government has passed any law explicitly legalizing or outlawing web scraping. Instead, we have to go off the verdicts of lawsuits between web scrapers and website owners, of which there are many:

  • Craigslist vs 3Taps
  • Ryanair vs PR Aviation
  • Facebook vs Power Ventures
  • HiQ vs LinkedIn

The main issue in all these cases is whether the Terms of Service listed on many websites that forbid web scraping (or automated access) are legally enforceable. Of course, with websites that allow web scraping, there are no issues.

Although cases on the topic of web scraping have gone both ways, as of 2021 the courts are beginning to clarify the legality of data scraping for web scrapers.

The most recent of these cases, HiQ vs LinkedIn, found that scraping data from a website doesn’t violate anti-hacking laws as long as the data is public and the scraper hasn’t explicitly agreed to the website’s terms and conditions in advance.

What this means is that so long as the data is publicly available on a website, and doesn’t require the web scraper to log in and explicitly accept the terms and conditions of the website, the web scraper is within their rights to scrape the publicly available data.

So how does this affect web scrapers?

If you are scraping a website, then you need to ask these questions to determine if it is legal or not:

Is the data publicly available? If the data isn’t hidden behind a login, then the website’s terms and conditions aren’t enforceable, so you can legally scrape the public data.

Do you need to create an account and log in to access the data? If this is the case, then you need to examine the terms and conditions you agreed to when you created the account, because by agreeing to them, you made them legally enforceable.

A lot of websites include in their Terms and Conditions (that you agree to when you create an account with their site) that they forbid you to scrape content from their site. So as a rule of thumb, you should always assume that logging into a site and scraping is illegal unless you’ve examined their T&Cs.
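
The questions above can be sketched as a small decision helper. This is only an illustration of the rule of thumb in this article, written in Python. The function name scraping_checklist, its parameters, and the returned labels are all hypothetical, and the final branch (terms examined and no ban found) is an assumption that goes slightly beyond what the article states.

```python
def scraping_checklist(requires_login, terms_reviewed=False, terms_forbid_scraping=False):
    """Map the two questions above to a rough label (hypothetical names)."""
    if not requires_login:
        # Publicly available data: no terms were explicitly accepted.
        return "public"
    if not terms_reviewed:
        # Rule of thumb: assume scraping behind a login is not allowed
        # until the accepted terms and conditions have been examined.
        return "review_terms"
    if terms_forbid_scraping:
        return "forbidden"
    # Assumption: the terms were examined and no scraping ban was found.
    return "no_objection_found"


print(scraping_checklist(requires_login=False))
print(scraping_checklist(requires_login=True))
print(scraping_checklist(requires_login=True, terms_reviewed=True, terms_forbid_scraping=True))
```

A label such as "public" only covers the terms-of-service question. The type of data and how it will be used still need to be checked separately.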