
Can LinkedIn data be scraped?





A US appeals court on Monday ruled against LinkedIn in its dispute with hiQ Labs, which uses automated software to gather information from millions of public profiles without LinkedIn's permission. The ruling upheld a preliminary injunction that stops LinkedIn from blocking the startup's access to public profiles while the case continues.

hiQ has built its business by gathering publicly available information from LinkedIn member profiles and selling analyses of it to employers. Users are often unaware that third-party services profile them this way.

The firm also sells aggregated information such as job titles, education history, location, interests, skills, and more. In addition, it offers custom reports based on specific criteria (e.g., "the most popular college majors among women").

In May 2017, LinkedIn sent hiQ a cease-and-desist letter alleging that the startup was violating the site's terms of service agreement when it used automatic software to collect profile details. hiQ responded by suing LinkedIn in federal court the following month. LinkedIn claimed that the scraping could put users' private information at risk, and that it would lose advertising revenue if hiQ continued with its practices.

LinkedIn's lawyers argued that using automation to access and analyze personal data violates the website's ToS, which requires users to grant explicit consent before their data is shared with third parties. hiQ responded that LinkedIn had no right to prevent it from accessing publicly available content.

On Monday, however, the 9th Circuit Court of Appeals found in favor of hiQ. According to the decision, hiQ showed that it would likely suffer irreparable harm if it lost access to public profiles while the case proceeded, because its business depends on that data. The court also found that hiQ had raised serious questions about whether the Computer Fraud and Abuse Act (CFAA) applies to information that anyone can view without logging in.

"We’re thrilled to see this important precedent set for the internet," said hiQ CEO Jonathan Zittrain in a statement. "This decision sends a clear message that companies cannot claim ownership of our online lives."

LinkedIn spokesman Scott Gluck added that the company plans to appeal the decision to the Supreme Court.

"As we've stated all along, we believe LinkedIn owns your personal data and should control how it's shared," he told us via email. "While today’s decision does not affect our ongoing efforts to protect members’ rights and safeguard their data, we continue to make progress toward achieving those goals."

It depends.

Generally speaking, no US law bans automated data collection outright, but scraping can run into computer-access laws, contract terms, copyright, and privacy rules depending on what you collect and how. Groups such as the Electronic Frontier Foundation have argued that accessing publicly available pages should not be treated as a crime. But that doesn't mean every instance of web scraping is lawful. For example, a bot that crawls websites to compare public product listings is treated very differently from one that gets past a login wall to copy private data.

And even though you may technically infringe a copyright by copying someone else's work, fair use exemptions apply to many situations involving copyrighted material. Courts weigh several factors, including the purpose of the use, the nature of the work, how much was copied, and the effect on the market for the original. A noncommercial purpose helps, but it does not by itself guarantee that a use is fair.

That same logic applies to web scraping. If you don't intend to monetize what you find, a fair use argument is stronger, and scraping sites for research purposes often falls into that category. However, there are still certain instances where scraping might trigger liability.

For example, if you copy protected content from one page onto another without permission, you may be liable for copyright infringement. Under US copyright law, statutory damages for willful infringement can reach $150,000 per work. The Digital Millennium Copyright Act (DMCA) adds a notice-and-takedown process and separate penalties for bypassing technical protection measures.

In short, you need to know exactly what you're doing before you start scraping anything. Otherwise, you'll likely come across problems.

Is Social Media scraping legal?

Yes. Most cases of scraping involve websites owned by large corporations rather than individual users. When businesses do get caught making copies of proprietary materials, they typically receive cease-and-desist letters. Afterward, they usually comply and settle out of court.

However, this isn't always the case. A few years ago, GitHub began cracking down on developers who violated its code license by downloading source files and posting them elsewhere. Many projects quickly complied, but some decided to fight back. They filed lawsuits alleging that GitHub's actions amounted to theft.

Ultimately, however, the judge sided with GitHub. He determined that downloading source code didn't constitute copyright infringement because it wasn't done for financial gain. Rather, it was intended to help build new features for open source projects.

So, if you want to keep scraping, just remember that you shouldn't try to profit from whatever you discover. Not only will it result in trouble, it probably won't accomplish much anyway.



Is data scraping ethical?

Sometimes yes. Sometimes no.

If you're simply trying to learn something interesting, it's fine to scrape public sources. There are plenty of ways to do so fairly safely. You don't need to break any rules or infringe upon anyone's copyrights. Just follow common sense guidelines: Never steal content. Don't post sensitive information. Make sure everything goes somewhere safe.

But if you plan on reselling that data, things become trickier. Even if you never intend to charge money for your findings, you still must abide by applicable restrictions. Some countries prohibit the sale of data, whereas others place strict limits on the number of times you can reuse it.

You might think that nothing prohibits you from selling scraped data in the United States, but that's not entirely true. The Fair Credit Reporting Act (FCRA), for example, entitles consumers to a free credit report once a year from each major credit bureau, and it strictly limits who may obtain consumer reports and for what purposes. State consumer protection statutes add further restrictions.

These laws vary widely from state to state, but each places some limit on the number of times you can repackage previously purchased credit records. These restrictions exist largely due to fears surrounding identity fraud.

Many people assume that scraping data is inherently fraudulent. But that couldn't be further from the truth. Companies like Realtor.com regularly pay real estate agents to list homes on their websites, yet nobody accuses them of stealing customers. Why? Because everyone knows that agents already earn commissions from sales made directly by homebuyers.

Scraping merely makes that system slightly less efficient. Instead of buying ads, brokers now spend time combing through search results for prospects interested enough to visit a particular listing. Meanwhile, homeowners benefit from faster searches and easier navigation around the site.

There's really no reason to argue against innovation. Whether it comes from big tech giants or small startups, scrapers are playing a vital role in modern society. By automating tasks, they free up human workers to focus on higher value activities. That's good news for both businesses and consumers alike.

No. At least, not until recently.

When Instagram launched its Stories feature, the photo-sharing app introduced several changes designed to increase engagement within the app. One of those tweaks required users to share stories with friends in order for the story to remain visible. Before long, Instagram noticed that many users weren't taking advantage of this option.

To encourage users to participate, Instagram offered a promotion allowing them to view stories posted by others without having to send invites. The idea was simple: Users could browse photos uploaded by strangers instead of waiting for invitations to pop up in their feeds. Soon after, the company expanded the program beyond stories. Now, whenever you log into Instagram, you'll see random posts created by other users appear in your feed.

Instagram's goal behind launching the feature was to improve discovery and give users more opportunities to engage with the brand. But critics say the change undermined user privacy and posed security risks. On Wednesday, the Federal Trade Commission announced that it had opened an investigation into whether Instagram violated the FTC Act by collecting personally identifiable information without obtaining informed consent.

In response, Instagram changed course. Last week, it rolled back the feature and started requiring users to opt-in to viewing uninvited posts. And earlier this month, the company issued a blog post detailing why it chose to restrict the program in the first place.

According to Instagram, Stories originally allowed the company to better understand how frequently people interacted with the app. Without the feature, the team was forced to rely solely on metrics provided by analytics tools. Those measurements revealed that many users experienced fewer interactions during peak hours, resulting in lower average daily usage rates.

A federal appeals court on Monday upheld the right of companies to use automated programs and scripts to gather information that people have posted publicly online, even if the site hosting that information has not given permission. The case involves LinkedIn, which sent hiQ Labs a cease-and-desist letter demanding that it stop scraping public profiles. LinkedIn argued that scraping LinkedIn data violates the Computer Fraud and Abuse Act (CFAA) and compromises members' privacy.

LinkedIn said scraping data was illegal under both federal and California law, and noted that doing so could lead to "unintended consequences" such as increased spamming or fraud. But the appeals court was not persuaded, reasoning that the CFAA's ban on access "without authorization" likely does not reach pages that are open to anyone with a web browser. In other words, the court determined that scraping public LinkedIn data isn't inherently unlawful; what matters is how the data is accessed and how someone uses it once it's collected.

The ruling comes after hiQ Labs, the data analytics firm behind the scraping, won a preliminary injunction against LinkedIn in 2017. The lower court had found that hiQ raised serious questions about whether the CFAA applies to public data, and ordered LinkedIn to stop blocking hiQ's access. The appeals court agreed. It also rejected LinkedIn's argument that its members' privacy interests outweighed hiQ's interest in continuing its business.

LinkedIn announced in February 2020 that it'd stopped allowing any type of scraping, including "web-scraping," due to concerns about its impact on privacy. However, it didn't ban all forms of scraping entirely, instead stating that it only allowed scraping of publicly available data. This means that businesses and developers are still able to collect information about anyone who has made themselves visible through LinkedIn.

Here's what you need to know about scraping LinkedIn data before deciding whether you want to start using it yourself.

Does LinkedIn allow web scraping?

Not according to LinkedIn's User Agreement, which prohibits using software, scripts, bots, crawlers, or browser plugins to scrape the service or copy members' profiles. Whether public pages can still be scraped lawfully is the separate question the courts addressed in the hiQ case. LinkedIn also reserves the right to restrict or close accounts that break its rules at any time.

However, LinkedIn does offer approved routes to some data, such as its official developer APIs and the option for members to export a copy of their own account data. Scrapers typically target publicly accessible pages like job postings, headshots, publications, events, groups, profiles, recommendations, and more, but doing so without permission conflicts with the terms described above.
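Besides the terms of service, many sites publish crawling rules in a robots.txt file. The following is a minimal sketch of checking such rules with Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URLs, and the bot name `example-bot` are invented for illustration and are not LinkedIn's actual rules. Passing this check does not mean a site's terms permit scraping.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /jobs/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/jobs/123",
        "https://example.com/private/profile",
    ):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "example-bot", url))
```

In a real crawler you would load the site's live robots.txt with `set_url()` and `read()` instead of a hard-coded string, and you would still need to review the site's terms separately.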

How do I scrape data with ParseHub?

ParseHub is a visual web scraping tool that offers an API key for developers wanting to integrate scraped data into their apps and websites. The free version comes with limited usage, while the paid tiers raise those limits. You can sign up on the ParseHub website.

How do you scrape LinkedIn posts using Python?

Scrapely, a scraping program, lets you search for specific keywords related to employment opportunities, then parse links within the text of each result. Once you locate relevant results, you can export the data from your browser directly into a Google Sheet. Here's how to install Scrapely:

1. Open Chrome and go to chrome://extensions/

2. Click More tools > Developer tools

3. Under Application you'll see Scrapely installed

4. To uninstall just click Disable extension

5. Then restart Chrome

6. Go back to the URL bar and enter https://www.linkedin.com

7. Scroll down until you see the section called Data Sources

8. Select the option labeled Get LinkedIn Jobs

9. Enter your desired job title and location and hit Search

10. When you find a link to a job posting, click View Job Posting

11. On the next screen select Export Links

12. Make sure you check off the box marked Include Headshot

13. Copy the resulting code and paste it into Scrapely

14. Hit Run Script and wait for the script to run

15. After the process finishes copy everything else and save it somewhere safe.

You can view a list of every post you've scraped by going to your Profile > Activity Log. Alternatively you can open up the spreadsheet where you saved the file by clicking File and selecting New Spreadsheet and pasting the contents of the CSV file.

How do I scrape my LinkedIn public profile?

If you already have a LinkedIn account, you can easily get started scraping your public profile. Simply visit your Account Settings page and scroll down to the bottom of the main screen. From there, click Edit Your Public Information and you'll see a number of options, including Personal Details, Education History, Work Experience, Skills & Endorsements, Groups, Recommendations, and Events.

For example, say you wanted to build a list of everyone who worked at Microsoft between 2005 and 2015. Just input your criteria into Scrapely based on the fields you see below. Remember to choose the correct format depending on the field you're searching. For instance, if you're looking for names, click People Field Type and then select Name Format: Full First Last.

Once you've built your query, simply hit Run Script and wait for the results to populate. You can then either manually sort and filter the data, or export it to Excel for further analysis.
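The Microsoft example above amounts to a date-range filter followed by an export. Here is a minimal sketch in plain Python, assuming the records were already collected with permission. The field names (`name`, `company`, `start_year`, `end_year`) and the sample rows are invented for illustration and do not reflect any real tool's output.

```python
import csv
import io

# Invented sample records; the field names are assumptions for this sketch.
RECORDS = [
    {"name": "Ada Example", "company": "Microsoft", "start_year": 2003, "end_year": 2007},
    {"name": "Bo Sample", "company": "Microsoft", "start_year": 2016, "end_year": 2019},
    {"name": "Cy Placeholder", "company": "Contoso", "start_year": 2006, "end_year": 2010},
    {"name": "Di Demo", "company": "Microsoft", "start_year": 2010, "end_year": 2012},
]


def worked_at(records, company, first_year, last_year):
    """Return records whose time at `company` overlaps [first_year, last_year]."""
    return [
        r for r in records
        if r["company"] == company
        and r["start_year"] <= last_year
        and r["end_year"] >= first_year
    ]


def to_csv(records, fieldnames):
    """Serialize records to CSV text, sorted by name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(sorted(records, key=lambda r: r["name"]))
    return buffer.getvalue()


if __name__ == "__main__":
    matches = worked_at(RECORDS, "Microsoft", 2005, 2015)
    print(to_csv(matches, ["name", "company", "start_year", "end_year"]))
```

The resulting CSV text can be saved to a file and opened in Excel or Google Sheets for further sorting and filtering.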

How do you scrape LinkedIn jobs using Python?

To automate the entire process, you'll first need to create a new project on GitHub, then clone the repo onto your machine. Next, navigate to the directory containing the project source files. Finally, run the following command to install the project in development mode:

python setup.py develop

Afterwards, you can launch Scrapely by typing:

python scrape_job.py

This will generate a report detailing the total number of jobs found and the top 5 employers mentioned. You can also import the resulting JSON file into another application of choice.
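A report like the one described can be sketched with the standard library alone. The JSON layout here (a list of objects with `title` and `employer` keys) and the sample data are assumptions, since the actual output format of `scrape_job.py` is not shown.

```python
import json
from collections import Counter

# Invented sample data standing in for a scraped-jobs JSON file.
SAMPLE_JOBS_JSON = """
[
  {"title": "Data Engineer", "employer": "Contoso"},
  {"title": "Data Analyst", "employer": "Fabrikam"},
  {"title": "ML Engineer", "employer": "Contoso"},
  {"title": "Recruiter", "employer": "Northwind"},
  {"title": "Backend Developer", "employer": "Fabrikam"},
  {"title": "Product Manager", "employer": "Contoso"}
]
"""


def summarize_jobs(json_text, top_n=5):
    """Count the jobs and rank employers by number of postings."""
    jobs = json.loads(json_text)
    employers = Counter(job["employer"] for job in jobs)
    return {
        "total_jobs": len(jobs),
        # most_common() returns (employer, count) pairs, highest count first.
        "top_employers": employers.most_common(top_n),
    }


if __name__ == "__main__":
    report = summarize_jobs(SAMPLE_JOBS_JSON)
    print("Total jobs found:", report["total_jobs"])
    for employer, count in report["top_employers"]:
        print(f"{employer}: {count}")
```

To work from a saved file instead of an inline string, read the file's contents and pass them to `summarize_jobs`.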

Is web scraping legal?

While scraping LinkedIn data is technically legal provided you stay within the bounds outlined above, many experts believe that it's unethical. One reason is that by collecting large amounts of unstructured data, you risk compromising people's privacy. Another issue stems from the fact that scraping LinkedIn often requires third-party services, which aren't always upfront about how exactly they're handling data.

As far as legal issues go, though, courts have reached different conclusions depending on what was scraped and how. Scraping pages that anyone can view is generally treated more leniently than bypassing logins or other technical barriers. Some argue that scraping data is akin to accessing copyrighted material for noncommercial purposes, which is generally acceptable. Others disagree, citing cases where companies have tried to charge customers fees based on the amount of data they've accessed.

Regardless of how lower courts rule on scraping LinkedIn data, the Supreme Court has also weighed in. In June 2021, it narrowed the reach of the CFAA in Van Buren v. United States and sent the hiQ case back to the 9th Circuit for another look. In April 2022, the 9th Circuit again sided with hiQ on the CFAA question.

The 9th Circuit Court of Appeals issued an opinion rejecting the social network's claim that automated scraping of public profiles amounts to unauthorized access under federal law. The decision upholds a lower-court judge's 2017 order that barred LinkedIn from blocking hiQ Labs while the case proceeded.

hiQ Labs had filed a complaint with the US District Court for the Northern District of California after the tech giant sent it a cease-and-desist letter alleging that it violated the Computer Fraud and Abuse Act (CFAA) by using automated scripts to access the site without authorization.

LinkedIn claimed that such activities violated its Terms of Use, which prohibit "automated means" of accessing its website. It also said that scraping could harm its reputation if people believed they were being spied upon. However, Judge Edward Chen was not persuaded, finding that hiQ had raised serious questions about whether the CFAA covers data that is open to the public.

LinkedIn lost that initial round and appealed. In September 2019, the appellate court agreed that scraping public data likely does not constitute a violation of the CFAA, but noted that other legal claims, such as breach of contract or trespass to chattels, might still be available against scrapers.

In the opinion, Judge Marsha Berzon wrote that letting companies like LinkedIn decide who may collect and use data that they make publicly available risks creating information monopolies that would not serve the public interest.

Can I scrape data from LinkedIn?

You generally don’t have to worry about scraping LinkedIn unless you plan to use your findings for commercial purposes. As long as you're simply gathering publicly available information, like job listings, it's perfectly fine. You can even share links to these collections with friends and family members, provided that they give you permission first. Scraping LinkedIn isn't something you'd want to undertake yourself though. Instead, consider outsourcing the work to someone else who specializes in doing so legally.

But what happens when you're looking at LinkedIn accounts belonging to former employees of your company? How about when you're trying to find out how many people have viewed certain pages within the platform? Or maybe you just want to know how often your competitors post new updates. Those queries aren't explicitly forbidden by LinkedIn's ToS, but you shouldn't expect much help from the company itself.

In 2017, LinkedIn started cracking down on bots and scrapers, warning users that it reserves the right to remove content and ban anyone caught using automation tools. While some have interpreted those measures as a signal that LinkedIn wants to protect its existing users from competition, LinkedIn CEO Jeff Weiner told Bloomberg in 2018 that the crackdown wasn't meant to keep away newcomers.

Instead, he explained that LinkedIn wanted to make sure that only legitimate users have access to its valuable data. But since most people still prefer to manually search through the site rather than rely on software, that hasn't stopped thousands of developers around the world from building apps designed to extract useful insights.

One example of a tool that leverages LinkedIn data is the popular JobScout app, which lets employers track applicants based on the specific skills they list on their profile. Another is RecruiterBot, which allows job seekers to apply directly through LinkedIn instead of sending resumes via email.

Many of these services focus exclusively on LinkedIn because it offers the highest amount of data compared to competing platforms. For instance, LinkedIn provides detailed information about each person listed on the resume, along with their contact details, education history, employment history, interests, groups they belong to, and much more. Other sites, like Indeed, offer similar features but limit themselves to aggregating job postings.

How do I scrape LinkedIn leads?

As mentioned above, LinkedIn doesn't seem too concerned with enforcing its rules regarding scraping. So, you probably shouldn't get worried when you see messages telling you to stop using automated systems to gather information from its site. On the flip side, however, it's possible that a human employee might catch wind of your efforts and decide to shut off your account or block your IP address. This happened to one developer named Ben Phelan back in 2016, who says he got into trouble for scraping the site and sharing his results with other developers.

Phelan eventually settled with LinkedIn and promised never to repeat his actions again. But it seems that the company is less forgiving when it comes to scraping. At least three times between 2015 and 2019, LinkedIn sent cease-and-desist letters asking developers to either delete their apps or stop collecting data. One of these cases involved a bot called LinkUprater, an open source project developed by researchers at Cornell University. They used the program to create a database containing millions of records collected over several years from LinkedIn profiles. After receiving LinkedIn's letter, the team decided to take down their code.

Another developer, David Bongardt, faced similar problems. His app, LinkedIn Extractor, was initially approved by LinkedIn and given a green light to continue operating until April 2023. Then, shortly afterward, LinkedIn asked him to turn off the app and stop extracting data. He complied immediately. A few months later, in November 2022, LinkedIn revoked the approval and demanded that he delete all data extracted up to that point. Again, he removed the offending files.

Bongardt believes that LinkedIn blocked his application because he relied heavily on scraping. Since the app didn't actually provide any value to LinkedIn users, the company felt threatened by it.



How do you view LinkedIn in public mode?

LinkedIn's Public Profile feature enables you to browse the site without having to log in. If you visit the page, you'll notice that the top portion of the screen contains the usual About section, followed by a series of tabs that allow you to explore different parts of your profile. Each tab displays additional sections of your profile, including Education, Experience, Groups, Skills, Jobs, Connections, etc., depending on where you click.

There's another way to navigate LinkedIn's public profile, thanks to a Chrome extension called LinkedIn Viewer. When installed, the tool opens a small window next to every web browser tab you've opened. All you have to do is hover the cursor over the toolbar icon and select the option labeled "View My LinkedIn". That's it! No typing required.

While viewing your LinkedIn profile in this manner, you'll notice that the layout changes slightly. Some elements appear grayed out, while others disappear altogether. And unlike normal browsing, you can't scroll down your timeline or click on any other part of the page.

How do you scrub on LinkedIn?

Like many websites nowadays, LinkedIn employs artificial intelligence to identify potential threats. Its AI system works 24/7 to scan billions of posts, images, videos, and comments posted across the entire internet. Once the system detects suspicious behavior, it sends alerts to humans tasked with reviewing flagged items.

For now, LinkedIn appears to be very good at identifying spammy content. An analysis conducted by security firm White Ops in May 2020 concluded that 99 percent of reported content contained at least one sign of malicious intent. But it's unclear exactly how well LinkedIn spots false profiles. According to LinkedIn's guidelines, the social network does not allow fake profiles, including those created solely to gain visibility among potential customers. Still, many experts believe that LinkedIn's algorithms fail to catch all of these fakes.

This problem isn't limited to LinkedIn. Similar concerns surround Facebook, Instagram, YouTube, Twitter, Reddit, Slack, Discord, TikTok, and virtually every other major social media platform.

If you're worried about getting banned from the platform, you might want to avoid posting anything controversial. Or perhaps you should stick to posting memes or funny GIFs. Anything political, especially related to Donald Trump, is likely to land you in hot water.

Similarly, you should avoid disclosing sensitive information, like credit card numbers, phone numbers, addresses, birth dates, driver licenses, passports, Social Security Numbers, passwords, or medical conditions. Otherwise, you run the risk of being blacklisted and unable to login to your account ever again.

And speaking of logging in, remember that LinkedIn requires you to enter your password whenever you try to update your status or check your notifications. Be careful not to reveal any hints that could lead somebody to your actual username.


Author

Anyleads

San Francisco

We are the leading marketing automation platform serving more than 100,000 businesses daily. We operate in 3 countries, based in San Francisco, New York, Paris & London.
