OWASP Automated Threat (OAT-011) Scraping
Scraping: Collect application content and/or other data for use elsewhere.
Scraping refers to the collection of an application’s content and/or other data for use elsewhere. It is an automated threat that aims to steal sensitive data from a victim or abuse functionality.
What is Scraping?
Scraping means collecting accessible data and/or processed output from an application. Some scraping may use fake or compromised accounts, or the information may be accessible without authentication. The scraper may attempt to read all accessible paths and parameter values for web pages and APIs, collecting the responses and extracting data from them.
Scraping may occur in real time or be more periodic in nature. Some scraping may be used to gain insight into how the application is constructed and operates, perhaps for cryptanalysis, reverse engineering or session analysis.
Scraping is also known by terms such as API provisioning, bargain hunting, comparative shopping, content scraping, data aggregation, database scraping, farming, harvesting, metasearch scraper, mining, mirroring, pagejacking, powering APIs, ripping, scraper bot, screen scraping, and search / social media bot.
The Symptoms of Scraping
OWASP, a worldwide not-for-profit charitable organization focused on improving the security of software, notes that there are several possible symptoms of scraping. These include:
- Unusual request activity for selected resources (e.g., high rate, high number, fixed period)
- Duplicated content from multiple sources in search engine results
- Decreased search engine ranking
- Increased network bandwidth usage and/or throughput problems
- New competitors with similar service offerings
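The first symptom in the list, unusual request activity (high rate, high number, fixed period), can be checked from request timestamps. The following is a minimal sketch, not an OWASP-provided implementation: the class name `RequestActivityMonitor`, the thresholds and the idea of keying on a `client_id` (an IP address, account or session) are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class RequestActivityMonitor:
    """Hypothetical helper: keeps recent request timestamps per client and
    flags a high request rate or a near-constant request period."""

    def __init__(self, window_seconds=60.0, max_requests=100,
                 min_samples=10, max_jitter_ratio=0.05):
        # All thresholds are illustrative assumptions; tune them per application.
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.min_samples = min_samples
        self.max_jitter_ratio = max_jitter_ratio
        self._history = defaultdict(deque)

    def record(self, client_id, timestamp):
        """Store a request time (in seconds) and drop entries older than the window."""
        history = self._history[client_id]
        history.append(timestamp)
        while history and timestamp - history[0] > self.window_seconds:
            history.popleft()

    def is_high_rate(self, client_id):
        """True when the client exceeded max_requests inside the window."""
        return len(self._history.get(client_id, ())) > self.max_requests

    def is_fixed_period(self, client_id):
        """True when the gaps between requests are almost identical,
        which suggests a scheduled job rather than a person browsing."""
        times = list(self._history.get(client_id, ()))
        if len(times) < self.min_samples:
            return False
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        average_gap = mean(gaps)
        if average_gap <= 0:
            return False
        # Coefficient of variation: spread of the gaps relative to their mean.
        return pstdev(gaps) / average_gap <= self.max_jitter_ratio


monitor = RequestActivityMonitor()
for second in range(0, 60, 2):  # one request every 2 seconds
    monitor.record("client-a", float(second))
print(monitor.is_high_rate("client-a"), monitor.is_fixed_period("client-a"))
```

A check like this only surfaces candidates for review. Legitimate monitoring tools and search engine crawlers can show the same regular pattern, so a flag is a reason to look closer rather than proof of scraping.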
Sectors Targeted by Scraping
According to the Automated Threat Handbook published by OWASP, scraping is aimed at companies in industries including education, entertainment, government, financial, healthcare, retail, technology and social networking.
OWASP says data commonly misused in scraping incidents includes authentication credentials, payment cardholder data and other financial data, medical and other personal data, intellectual property and other business data, and public information.
The handbook notes that some scraping might use compromised accounts, or the information might be accessible without authentication. The scraper might attempt to read all accessible paths and parameter values for web pages and APIs, collecting the responses and extracting data from them. Scraping can occur in real-time or be more periodic in nature.
Ways to Prevent the Scraping Security Threat
OWASP suggests several possible countermeasures for organizations to address the threat of scraping. These include:
- Reducing the data fields collected and subsequently output, and/or reducing the retention period.
- Documenting what is acceptable usage and what is unacceptable scraping.
- Defining test cases for scraping that confirm an application will detect and/or prevent users attempting to scrape content and other data.
- Randomizing the content and URLs of content, linking these changes to an individual user’s session, verifying the changes at each request and restricting any identified automated usage.
- Identifying and restricting automated usage by fingerprinting before a scraping attack can occur; requiring stronger identity authentication for access, pre-registering users and implementing strong authentication for access to any exposed APIs.
- Participating in ecommerce threat intelligence exchanges and contributing any relevant attack data to sector-wide sharing systems.
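One of the countermeasures above, randomizing URLs, tying them to an individual user's session and verifying them at each request, can be approximated with a keyed hash. The sketch below is one possible approach under stated assumptions, not a method prescribed by OWASP: the function names `sign_path` and `verify_path`, the `t` query parameter and the `SERVER_KEY` secret are hypothetical, and it covers only the URL part of that countermeasure, not content randomization.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; in practice it would be loaded from
# protected configuration rather than generated at import time.
SERVER_KEY = secrets.token_bytes(32)


def _token(session_id: str, path: str, key: bytes) -> str:
    """HMAC-SHA256 over the session ID and path, as a hex string."""
    message = f"{session_id}\n{path}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_path(session_id: str, path: str, key: bytes = SERVER_KEY) -> str:
    """Return the path with a token that is only valid for this session."""
    return f"{path}?t={_token(session_id, path, key)}"


def verify_path(session_id: str, path: str, token: str,
                key: bytes = SERVER_KEY) -> bool:
    """Check, at each request, that the token matches this session and path."""
    expected = _token(session_id, path, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, token)


url = sign_path("session-a", "/products/42")
token = url.split("?t=", 1)[1]
print(verify_path("session-a", "/products/42", token))  # same session and path
print(verify_path("session-b", "/products/42", token))  # link reused by another session
```

With links signed this way, a URL harvested in one session fails verification in another, so a scraper cannot simply replay a collected list of links. A scraper that keeps a single session alive is not stopped by this alone, which is why the list pairs it with restricting identified automated usage.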
These are the primary security checks against scraping attacks, but a few dedicated fraudsters will go to great lengths to strengthen their scraping efforts, often operating through privacy browsers, VPNs and proxy servers to obscure their online identity. The measures above help fight back against malicious activity such as scraping without harming your legitimate users.
Online businesses can also opt for a bot mitigation solution that prevents scraping and other OWASP automated threats in real time without affecting legitimate visitors. Bot mitigation is probably the most accurate way to prevent OWASP automated threats, and it provides real-time protection against malicious bots. A bot mitigation solution can block the automated methods that bots use to expedite actions on websites.
With these security measures in place, your website will be able to defend against online security threats such as scraping and other OWASP automated threats in real time.