
Is it legal to scrape data from LinkedIn?





A ruling by the 9th Circuit Court of Appeals has upheld an earlier decision by a lower court judge who barred LinkedIn from blocking hiQ Labs, a data analytics startup, from scraping publicly available member profiles. The case centered on whether a company can legally collect and use information gathered from public web pages without permission from the site that hosts them.

hiQ Labs is a San Francisco-based startup whose products analyze workforce data drawn from public LinkedIn profiles. It gathered that data automatically, using software that visited profile pages and copied the information anyone can see without logging in. There has been no shortage of controversy over how this works, specifically with regard to the way personal information about individuals is collected and sold. In May 2017, LinkedIn sent hiQ a cease-and-desist letter alleging that the scraping violated its Terms of Service (TOS), which LinkedIn calls its User Agreement, and demanding that it stop.

In August 2017, a federal district court judge sided with hiQ, granting a preliminary injunction that ordered LinkedIn to stop blocking hiQ's access to public profiles. LinkedIn appealed, and in September 2019 the 9th Circuit Court of Appeals affirmed the injunction.

LinkedIn argued that once it had told hiQ to stop, any further scraping was access "without authorization" under the Computer Fraud and Abuse Act (CFAA). According to the appellate panel, this argument failed to hold up against scrutiny at the preliminary stage: the court found that hiQ had raised serious questions about whether the CFAA applies at all to data that is open to the general public.

This matters because public LinkedIn profiles can be viewed by anyone with a web browser, with no account or password required. If you copy the URL of a public profile and open it in a private window, the same content shows up. In the court's view, data that is not protected by a login is not the kind of information the CFAA was written to guard. The court also noted that LinkedIn does not own the information its members choose to publish on their profiles.

Does LinkedIn have a data API?

Scraping allows us to gather large amounts of information much faster than we could by hand. It's often used when you're looking for specific data points to answer questions related to business analysis. For example, let's say you wanted to know the number of registered patents per state, or perhaps the total revenue generated by each restaurant chain within a certain radius of a given location. These are difficult numbers to find yourself unless you spend countless hours doing research. Instead, you can use a web scraping tool to collect the data automatically from every relevant page.

While it may seem obvious that web scraping is both harmless and beneficial, it's important to keep in mind that many people don't realize that their personal data is being collected and sold by some of the biggest tech giants around. There are plenty of services that provide similar functionality, but none quite come close to offering the comprehensive coverage provided by LinkedIn. For instance, while Google provides free search results based upon keywords entered, LinkedIn gives you unfiltered data from millions of different profiles across the globe. You'll never get anywhere near that level of detail via other methods.

LinkedIn also offers a paid subscription service called LinkedIn Premium. Pricing varies by plan, and the service includes additional features such as advanced analytics tools, enhanced messaging capabilities and more. While this option certainly comes with advantages, it's worth noting that most of the benefits offered by premium accounts aren't accessible through regular browsing: you must be signed in to a subscribed account in order to see them.

It seems strange that LinkedIn would restrict access to their platform in this manner, especially considering the vast amount of useful information that can be gleaned from scraping the website. After all, why wouldn't they want anyone to take advantage of the valuable insights contained therein? Is it possible that they just didn't think ahead enough before launching the program? Or did they foresee potential lawsuits coming down the road and decided to preemptively lock things down beforehand? We won't ever truly know the truth behind the situation, but I'm sure we'll continue to hear more stories like this until someone comes forward to shed light on the issue.

What is scraping and why does it matter?

In short, scraping involves using automated software programs to extract data from websites. The goal of scraping is to gather as much relevant information about a topic or product as possible.
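To make "extracting data from websites" concrete, here is a minimal sketch using only Python's standard library. It parses an HTML string that is embedded in the example (the page content, the `LinkAndTitleParser` class, and the `extract` helper are all made up for illustration) and pulls out the page title and the links. A real scraper would first download the page, which is left out here.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every hyperlink target in a document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def extract(html):
    """Return the title and links found in an HTML string."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


# Hypothetical page content, embedded so the sketch runs without a network.
SAMPLE_PAGE = """
<html><head><title>Example Product</title></head>
<body><a href="/pricing">Pricing</a> <a href="/reviews">Reviews</a></body></html>
"""

result = extract(SAMPLE_PAGE)
print(result)
```

The same idea scales up: swap the embedded string for downloaded pages and add the fields you care about.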

Scraping allows marketers and businesses alike to gain valuable insights into how consumers are interacting with their products. It also provides them with a more accurate understanding of what people want to buy, which helps companies make better decisions regarding pricing strategies, marketing campaigns, sales promotions, etc.

When combined with other forms of social media analysis (such as sentiment analysis), scraped content can be used to identify consumer trends and patterns, allowing businesses to target specific segments of customers based on demographic characteristics such as age group, gender, income level, education level, etc. This leads to increased efficiency when making business decisions because you know exactly where your money should go.

Although there have been many debates over the years regarding the legality of scraping, the issue at hand here isn't necessarily whether or not it's illegal -- rather, if it's ethical. In fact, some argue that scraping is actually beneficial to society since it allows consumers to access useful information they wouldn't otherwise have access to.

This brings us back to the question at hand.

"Should I worry about my personal privacy being violated?" If so, it is worth checking what your own public profile exposes, because that is exactly what a scraper can collect.



Why do we care about scraping?

There are several reasons why we might choose to scrape publicly available data instead of simply accessing it directly ourselves. For example, search engines index pages so you can look them up by keyword, but they do not hand you the underlying records in a structured form. As a result, answering a question that spans thousands of pages still means visiting each page, unless software does it for you.

In August 2017, a federal judge sided with hiQ Labs and granted its request for a preliminary injunction, preventing LinkedIn from blocking the scraper from gathering publicly accessible information about users who had not given permission for their profiles to be collected by third-party services like hiQ Labs.

The ruling came after LinkedIn sent hiQ Labs a cease-and-desist letter, saying the startup violated its Terms of Service (TOS) when it built a tool that automatically collects public information about users without their consent. The TOS prohibits using software, scripts, or bots to scrape the service.

hiQ Labs argued that its business depended on data from public LinkedIn profiles and that it would not survive if it were cut off. The court agreed that hiQ faced irreparable harm, and the Appeals Court later affirmed. The case is widely cited on the question of how much power LinkedIn has to stop developers from scraping its public content. It does not settle every question, though, because a site can still bring claims for breach of its own terms.

Can I collect data from LinkedIn?

Not under LinkedIn's own rules. As stated in its User Agreement, LinkedIn does not permit scraping or other automated collection of member data such as names, addresses, phone numbers and email addresses. If you are interested in information that is not publicly viewable, you should contact LinkedIn directly.

Can you get data from LinkedIn?

LinkedIn provides several APIs to access its service, including ones that allow developers to build applications based on members' profile information with the member's permission. Access to most of these APIs requires approval from LinkedIn. Other platforms have also gone to court over unauthorized collection: Facebook, for example, has sued companies for scraping its platform.

In fact, many developers say that disputes rarely arise over the data itself; the problem arises when someone attempts to make money off of it. When developers try to monetize an app built on scraped data, they can run into issues with contract terms, copyright infringement and intellectual property law.

What data is available from LinkedIn?

Like most websites, LinkedIn offers various types of data depending on the permissions each individual grants to LinkedIn. Here's a breakdown of what data is available:

Profile Information - Things like job title, location, education history, skillset, work experience, etc.

Public Profile Link - A link to a person's public page, where everyone can see the person's name, photo, bio, etc.

Company Page - Links to a person's company pages, where everyone can see the company's name, logo, description, etc.

Job Postings - Lists of openings, positions, and opportunities posted by employers or recruiters.

Connections - Listing of connections between individuals.

Research Reports - Documents published by LinkedIn Research Services.

News Feed - News feeds containing status updates, articles, videos, photos, etc.

Forums - Forums created by members themselves.

Other Content - Anything else that isn't listed above.

If someone wants to access more than just public info, they will likely require additional permission from LinkedIn, usually through its developer or partner programs.

Can you scrape LinkedIn profile?

Technically, yes, although doing so conflicts with LinkedIn's User Agreement. There are several libraries out there that let you pull specific pieces of information from a LinkedIn profile:

scrape_linkedin : Python library for extracting LinkedIn profile information in JSON format.

Scrapery : Ruby gem for pulling links to external sites, images, documents, etc. from a LinkedIn profile.

Scrapy: Python framework for crawling websites and automatically extracting structured data from HTML pages.

LinkedIn Extractor: JavaScript library for extracting data from a LinkedIn profile.

But before you start using any of them, keep in mind that profile data is personal information. Store only what you need and remove the identifying details from the results wherever you can. And if you plan on running this kind of activity commercially, be sure to check out LinkedIn's licensing terms first.
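As an illustration of removing identifying details, here is a minimal sketch. The field names, the `pseudonymize` helper, and the sample record are assumptions made up for this example, not the output format of any library listed above. It drops directly identifying fields and replaces them with a salted hash so records can still be told apart. Note that this is pseudonymization rather than true anonymization: job title and location together may still identify someone.

```python
import hashlib

# Hypothetical field names; adjust to whatever your scraper actually returns.
IDENTIFYING_FIELDS = {"name", "email", "phone", "profile_url", "photo_url"}


def pseudonymize(profile, salt):
    """Return a copy of the record without identifying fields.

    A salted SHA-256 digest of the profile URL is kept as a stable record ID.
    """
    source = (salt + profile.get("profile_url", "")).encode("utf-8")
    cleaned = {
        key: value
        for key, value in profile.items()
        if key not in IDENTIFYING_FIELDS
    }
    cleaned["record_id"] = hashlib.sha256(source).hexdigest()[:16]
    return cleaned


# Made-up sample record for demonstration only.
sample = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "profile_url": "https://example.com/in/jane-doe",
    "job_title": "Data Analyst",
    "location": "Paris",
}

cleaned = pseudonymize(sample, salt="change-me")
print(sorted(cleaned))
```

Keep the salt secret and out of the dataset, otherwise the record IDs can be matched back to profile URLs.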

Here's what you need to know about scraping and the hiQ Labs case.

What is scraping and why does it matter?

Scraping involves taking the contents of webpages off site (i.e., outside the domain where they are hosted) using automated software or applications.

Many scraping tools are open source, so you can read the source code online if you want to learn how these programs work. In other cases the code is proprietary and you will need special permission to access it.

The purpose of scraping websites is to collect specific pieces of content which may include text, images, video, audio files, HTML codes and links. These pieces of content are then stored into databases.
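Storing scraped content in a database can be sketched with Python's built-in `sqlite3` module. The table name `scraped_items`, its columns, and the sample rows are assumptions chosen for this example. The unique constraint means that re-running the scraper does not create duplicate rows.

```python
import sqlite3


def store_items(conn, items):
    """Insert (url, kind, content) rows, silently skipping exact duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_items ("
        "url TEXT NOT NULL, kind TEXT NOT NULL, content TEXT NOT NULL, "
        "UNIQUE(url, kind, content))"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO scraped_items (url, kind, content) "
            "VALUES (?, ?, ?)",
            items,
        )


def count_by_kind(conn):
    """Return how many items of each kind are stored."""
    rows = conn.execute(
        "SELECT kind, COUNT(*) FROM scraped_items GROUP BY kind ORDER BY kind"
    ).fetchall()
    return dict(rows)


# In-memory database and made-up rows, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
items = [
    ("https://example.com/a", "text", "Hello world"),
    ("https://example.com/a", "link", "https://example.com/b"),
    ("https://example.com/a", "image", "https://example.com/logo.png"),
    ("https://example.com/a", "link", "https://example.com/b"),  # duplicate
]
store_items(conn, items)
counts = count_by_kind(conn)
print(counts)
```

For a real project you would pass a file path to `sqlite3.connect` instead of `":memory:"`.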

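There are two terms you will hear a lot: crawlers and spiders.

In practice they mean the same thing. A crawler (or spider) moves through a website by following links, looking for new pages to visit while skipping URLs it has already seen. A scraper then extracts the specific content it needs from each page.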
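The link-following behavior of a crawler can be sketched as a breadth-first traversal that remembers which URLs it has already seen. To keep the example runnable offline, the "website" is a hypothetical dictionary of URL to HTML (`FAKE_SITE`), and `fetch` is any function that returns the HTML for a URL or `None`. Both are assumptions for illustration only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
FAKE_SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="/jobs">Jobs</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/jobs": '<a href="/jobs/1">Job 1</a> <a href="/about">About</a>',
    "https://example.com/jobs/1": "No links here",
}


class HrefParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first, following links and skipping seen URLs."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched
            continue
        visited.append(url)
        parser = HrefParser()
        parser.feed(html)
        for href in parser.hrefs:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


visited = crawl("https://example.com/", FAKE_SITE.get)
print(visited)
```

The `max_pages` limit is a simple safety valve; a real crawler would also add delays between requests.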

Popular tools for scraping websites include Scrapy, Beautiful Soup and Selenium.

Why would anyone use a tool such as hiQ Labs?

hiQ Labs used the information on public LinkedIn profiles to build workforce analytics for employers, without requiring individuals to fill out lengthy forms or provide their personal details.

It collects all publicly visible profile data including job titles, contact information, industry affiliations and education history.

It also includes information such as current employer, years working for previous employers and educational institutions attended.

Other reasons

LinkedIn has been involved in several disputes over scraping. In the hiQ case, the appeals court noted that LinkedIn does not own the information its members post on their public profiles.

A federal appellate court on Monday ruled against LinkedIn and upheld a lower court's preliminary injunction, finding that scraping publicly available data from its site likely does not violate the U.S. Computer Fraud and Abuse Act (CFAA). The ruling came amid an ongoing battle between LinkedIn and a startup called hiQ Labs over whether or not the latter's automated tool violated LinkedIn's Terms of Service (TOS). The decision is likely to influence disputes involving other websites with similar TOS policies. Here's what we know about how the case came down, why the judges decided against LinkedIn, and what it means going forward.

Is data scraping LinkedIn legal?

LinkedIn has been fighting hiQ Labs since May 2017, when it sent hiQ a cease-and-desist letter alleging that the startup had accessed profile information in violation of its TOS and demanding that it stop. In response, hiQ Labs filed its own lawsuit, arguing that it was only accessing publicly available data and asking the court to stop LinkedIn from blocking it.

U.S. District Judge Edward Chen agreed with hiQ. In August 2017 he granted a preliminary injunction ordering LinkedIn to remove the technical barriers it had put in place against hiQ's scraper.

LinkedIn appealed. In September 2019, the United States Court of Appeals for the Ninth Circuit affirmed the lower court's decision. In 2021 the Supreme Court sent the case back for reconsideration in light of its ruling in Van Buren v. United States, and in April 2022 the Ninth Circuit reached the same conclusion again. The litigation did not end entirely in hiQ's favor: later in 2022 the district court found that hiQ had breached LinkedIn's User Agreement, and the parties settled.

Data scraping refers to pulling data off a public website through automation rather than copying it by hand from the browser. There are many different ways to do this, including using APIs, browser extensions, or tools such as Scrapy. Data scraping can be used for good purposes -- to gather data for research projects, for example -- but can also lead to disputes over intellectual property. That's especially true when companies don't want to share their own data with others in order to protect their business interests.

If your employer wants to collect data on employees' social media accounts, then it may ask you to sign away those rights, but most people wouldn't realize they're giving up these rights unless someone points them out. It's possible to take advantage of this lack of knowledge by searching online for free data sources. For instance, you could use a tool like ScraperWiki to find open datasets related to specific topics -- including ones related to education, health insurance, or employment -- and create a spreadsheet with relevant information. Then, when you send this spreadsheet back to your boss, you'll probably receive some strange looks.

It's important to note that although scraping might seem harmless at first glance, it's often done without permission and can have legal consequences depending on who owns the content being scraped and how it is accessed. In the case of LinkedIn, the scraper did not have permission to copy profile data and had been told to stop. hiQ argued that this did not matter, pointing to the fact that the data was already freely accessible on the internet, and at the preliminary injunction stage the judges agreed that the CFAA likely does not cover public data. That does not make scraping inherently okay just because something is publicly visible. In practice, three factors tend to matter: 1) whether the data is open to the public or sits behind a login, 2) whether the scraper agreed to terms that prohibit automated access, and 3) whether the scraped content is protected by copyright.

Many of the older cases dealing with scraping involve copyrighted material, so the guidelines around scraping non-copyrighted materials are less clear. And although it seems like scraping public data would be considered acceptable in general, cases involving financial data tend to be more complicated. One thing that can help determine legality here is the purpose for which the data is being scraped. Is it solely for personal interest? Or does it have a business purpose behind it? If the scraping serves a commercial purpose, it is more likely to attract scrutiny from the site owner and the courts.



Can you get in trouble for scraping a website?

Yes! As mentioned above, scraping can run into copyright law, breach of contract claims, and computer misuse statutes such as the CFAA. This is particularly true when it involves collecting data that belongs to another person and making money off of it. The dispute between hiQ Labs and LinkedIn was a civil one, and no criminal prosecution was involved. Even so, winning the preliminary injunction on the CFAA question did not shield hiQ from liability under LinkedIn's contract terms.

Of course, the picture changes completely once scraping goes beyond public pages. Tools that collect login credentials or pull data out of accounts without the owner's permission are not scraping public information at all, and that kind of conduct can lead to criminal charges such as unauthorized access to computer systems and wire fraud.

The key difference between that situation and cases like hiQ's is authorization. Public profiles can be read by anyone, while passwords and private account data are protected precisely because users never agreed to share them.

Another interesting point worth mentioning is that the Supreme Court recently narrowed the reach of the CFAA. In 2021, in Van Buren v. United States, the justices held that a person "exceeds authorized access" only by entering parts of a computer system that are off limits to them, not by misusing information they were entitled to see. The decision did not address scraping directly, so it's unclear how far the court would go in protecting it.

Can you scrape data from any website legally?

There are plenty of instances where scraping data isn't technically illegal, but doing so can come with consequences. To avoid any problems, try to stick within the confines of the law. If you're working on a school assignment, then scraping a small amount of public data from a particular database carries little risk. However, if you're looking to sell the results of your scraping activity, then you need to ensure that you follow appropriate procedures. Make sure you've got proper permissions from the owners of the data and that you're willing to abide by whatever rules and regulations are put in place.
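One practical way to respect a site's stated rules is to check its robots.txt file before fetching a page. This is a sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `my-research-bot`, and the URLs are assumptions for illustration. Keep in mind that robots.txt is a convention, not a legal permission, so it complements the site's terms rather than replacing them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, embedded so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_checker(robots_txt):
    """Parse robots.txt text and return a parser that answers can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


checker = build_checker(ROBOTS_TXT)

# "my-research-bot" is a made-up user agent name.
allowed_public = checker.can_fetch("my-research-bot", "https://example.com/jobs")
allowed_private = checker.can_fetch(
    "my-research-bot", "https://example.com/private/data"
)
print(allowed_public, allowed_private)
```

Against a live site you would call `set_url(...)` and `read()` on the parser instead of passing the text in by hand.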

If you're interested in learning more about scraping, check out our guide to building a simple Python web crawler. You'll learn everything you need to start gathering data from the web today.


Author

Anyleads

San Francisco

We are the leading marketing automation platform serving more than 100,000 businesses daily. We operate in 3 countries, based in San Francisco, New York, Paris & London.
