Add Preface
niespodd authored Feb 8, 2021
1 parent 52e0c4f commit d6b288b
Showing 1 changed file (`README.md`) with 38 additions and 3 deletions.

# Browser Fingerprinting 👫 Bot Detection 👨‍🔧 Technical dive

A ⭐ on this repo will be **appreciated**!

Here I study various aspects of existing evasion techniques to get around anti-bot systems. The technical findings I share below are based on a few months of running web scraping scripts against websites protected by:
* [Akamai Bot Manager by Akamai](https://www.akamai.com/uk/en/products/security/bot-manager.jsp)
* [Advanced Bot Protection by Imperva (formerly Distil Networks)](https://www.imperva.com/products/advanced-bot-protection-management/)
* [DataDome](https://datadome.co/)
* [PerimeterX](https://www.perimeterx.com/)
* [Shape Security](https://www.shapesecurity.com/)

and a few other custom-built systems (incl. social media platforms). [Having trouble bypassing one of them?](#support)

# Preface

Before I dive into technical details, I need to make a general remark to people who are evaluating and/or planning to introduce anti-bot software on their websites. **Anti-bot software is nonsense.** It's snake oil sold for big bucks to people without technical knowledge. Most of the time you do not need it.

Blocking bot traffic is based on the premise that you (or your technology provider) can **distinguish bots from real users**. To make this happen, vendors apply various techniques that I discuss further in this document. However, to date none of them appears to be successful against specialized web scraping tools. In other words, **anti-bot software is all about reducing cheap bot traffic**: most often it only makes the process more expensive and complicated for the people running web scrapers, but **not entirely impossible**.

Today anti-bot vendors apply two major techniques to detect (and block) bot traffic:

### Binary detection

This is the case when no specialized web scraping software is used. The vendor can detect the bad traffic based on information openly disclosed by the scraper, e.g. the `User-Agent` header, connection parameters, etc.

As a result, **only bots that are not tailored to scrape a specific website are blocked**. This makes most managers happy, because the overall amount of *bad traffic* goes down and it may almost look like there is no more bot traffic on the website. **Wrong.**
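To make the *binary detection* idea concrete, here is a minimal, hypothetical sketch of the kind of server-side check a vendor might run. It only inspects what the client openly discloses in its `User-Agent` header. The marker list and the function name `is_self_declared_bot` are illustrative assumptions, not any vendor's actual rules, and connection-level signals are left out.

```python
import re
from typing import Mapping

# Hypothetical, non-exhaustive substrings that self-identifying HTTP clients
# and headless browsers put into their User-Agent header by default.
AUTOMATION_MARKERS = re.compile(
    r"python-requests|curl|wget|scrapy|go-http-client|headlesschrome|phantomjs",
    re.IGNORECASE,
)


def is_self_declared_bot(headers: Mapping[str, str]) -> bool:
    """Return True if the request headers openly reveal automation.

    Only information disclosed by the client itself is used, so a scraper
    that sends a copy of a real browser's headers is not flagged.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    if not user_agent:
        # Mainstream browsers send a User-Agent by default; a missing one is a giveaway.
        return True
    return AUTOMATION_MARKERS.search(user_agent) is not None


if __name__ == "__main__":
    print(is_self_declared_bot({"User-Agent": "python-requests/2.25.1"}))  # True
    print(is_self_declared_bot({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
    }))  # False
```

Any scraper that simply copies a real browser's headers passes this check, which is why this kind of detection only catches bots that were never tailored to the target website.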

### Traffic clustering

More advanced web scrapers use residential proxies and implement complex evasion techniques to fool anti-bot software into thinking that the web scraper is a real user. Due to technical limitations of web browsers, no detection mechanism exists that can reliably counter this.

In this case, the vendor will most of the time only be able to **cluster the bad traffic** by finding patterns in bot traffic and behavior. The problem with banning this traffic is that it may turn out to be a risky operation when bots successfully mimic real users: there is a chance that **by blocking bots, the website will become unavailable to real visitors**.
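The clustering idea can be sketched in the same spirit. This hypothetical example groups requests by an assumed fingerprint made of user agent, screen size and timezone, and flags fingerprints that show up from many different IP addresses with high volume, which is roughly what a rotating residential proxy pool can look like. The `Request` fields, the function names and the thresholds are illustrative assumptions, not taken from any vendor.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical fingerprint: (user agent, screen size, timezone).
FingerprintKey = Tuple[str, str, str]


@dataclass(frozen=True)
class Request:
    ip: str
    user_agent: str
    screen: str    # client-reported, e.g. "1920x1080" (assumed field)
    timezone: str  # client-reported, e.g. "Europe/Warsaw" (assumed field)


def cluster_requests(requests: List[Request]) -> Dict[FingerprintKey, List[Request]]:
    """Group requests that share the exact same fingerprint."""
    clusters: Dict[FingerprintKey, List[Request]] = defaultdict(list)
    for req in requests:
        clusters[(req.user_agent, req.screen, req.timezone)].append(req)
    return dict(clusters)


def suspicious_clusters(
    requests: List[Request], min_ips: int = 3, min_requests: int = 10
) -> List[FingerprintKey]:
    """Flag fingerprints seen from many distinct IPs with high request volume.

    The thresholds are arbitrary placeholders, not tuned values.
    """
    flagged = []
    for key, members in cluster_requests(requests).items():
        distinct_ips = {req.ip for req in members}
        if len(distinct_ips) >= min_ips and len(members) >= min_requests:
            flagged.append(key)
    return flagged


if __name__ == "__main__":
    # 12 requests with one fingerprint, rotated across 4 proxy IPs.
    pool = [Request(f"10.0.0.{i % 4}", "UA-X", "1920x1080", "UTC") for i in range(12)]
    # 2 requests from a single visitor with a different fingerprint.
    visitor = [Request("192.168.1.5", "UA-Y", "1366x768", "Europe/Warsaw") for _ in range(2)]
    print(suspicious_clusters(pool + visitor))  # [('UA-X', '1920x1080', 'UTC')]
```

Note that a very common real-world configuration (a popular browser on a popular laptop in one timezone) produces exactly the same kind of cluster, so acting on these flags risks blocking real visitors.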

### *Gateways, captchas & co*

If you think this is the way to go, [google "captcha resolve api"](https://letmegooglethat.com/?q=captcha+resolve+api).

# Technicalities

*I constantly add stuff to this section.*

## Random, maybe useful
* [Cap FPS for Chromium with software rendering (`--use-gl=swiftshader`)](https://gist.github.com/niespodd/c7fd14e0e58652e74c0f1fdbd819112d) - Limit SwiftShader CPU usage by capping the redraw frequency of Chromium in AVD (Android Virtual Device)