What is Data Scraping? Is it legal? Who does it?


What is it?

“Data scraping is a technique in which a computer program [usually called a ‘bot’] extracts data from human-readable output coming from another program.” (Source: https://en.wikipedia.org/wiki/Data_scraping)

According to the School of Data, it is the method of extracting data hidden in documents, such as web pages and PDFs, to make it usable for further processing. Scraper extensions available from the Chrome Store can help you get data into a spreadsheet format, and anyone with a moderate understanding of programming can write a scraper of their own.
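
To make the idea concrete, here is a minimal sketch in Python of pulling a table out of a web page and turning it into spreadsheet-style CSV. It uses only the standard library. The sample HTML and the names TableScraper and scrape_table_to_csv are made up for illustration; a real scraper would download the page first and would need to cope with messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content, invented for this example.
# A real scraper would fetch the HTML from a website instead.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Kettle</td><td>19.99</td></tr>
  <tr><td>Toaster</td><td>24.50</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # the row currently being read, if any
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Tidy up whitespace once the whole cell has been read.
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data


def scrape_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


print(scrape_table_to_csv(SAMPLE_HTML))
```

The printed CSV text can be saved to a file and opened in any spreadsheet program, which is roughly what the browser extensions mentioned above do for you.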

Is it legal?

The legality is still debated. Aggressive scraping of content from your website by bots can be seen as stealing or a breach of copyright, and court cases against scraping have been both won and lost.

In the case of Craigslist vs 3Taps in 2013, 3Taps was found to have violated Craigslist’s terms of service and thus to have accessed the content illegally. Craigslist invoked a controversial anti-hacking law, and a federal judge ruled that the “IP rotation technology” 3Taps used to disguise its identity and continue scraping data from the Craigslist site was enough to violate that statute.

Many companies now add a clause to their website’s Terms & Conditions that treats scraping as copyright infringement. Scraping that violates these T&Cs could be deemed 'accessing the content without authorisation' and therefore a violation of the 'Computer Fraud and Abuse Act'. However, simply having a clause in your T&Cs does not necessarily make it a legally binding agreement.

For blog and main page content, if bots steal your content and post it elsewhere, it becomes duplicate content, which can hurt not only your SEO but also your brand if it’s used in the wrong way. You can quickly check whether anyone has stolen or duplicated your content by using Copyscape.

Who does it?

Scraping can also be a good thing. In medical journalism, for example, it is used to examine difficult and confusing data such as the financial ties between drug companies and doctors.

Others do it too: retailer and manufacturer websites scrape data to show price comparisons, companies scrape business reviews and profiles to track their online presence and reputation on social network sites, and organisations scrape news websites or job adverts to give their clients more targeted information all in one place.

Scraping can be useful, and even Google does it, but it depends on what you’re scraping, whether you have permission to do so, and what you then do with that data. If you do not want your website to be scraped in any way, it is worth adding a clause to your Terms & Conditions and looking for a secure and trusted company that provides software protection for your website against data scraping.

Blog written by Natalie Wiggins