Tyler Young

Product Developer

I’m a product-focused developer working at Jump, where we’re using LLMs to help financial advisors do less tedious stuff and more of the work they love. Formerly at Felt (the “Figma for maps”), I’ve worked in a number of soft real-time, distributed systems domains, including networking, video streaming, and IoT command-and-control systems. I fell in love with Elixir in 2019, when I used it to build a massively multiplayer game server for the X-Plane flight simulator, and I’ve been working primarily with Elixir ever since.

Talk:
Cat and Mouse: Challenges in Adversarial Web Scraping

The time comes in every developer’s career when they need to scrape a web page. If you’re lucky, a simple HTTP request gets you what you need, or maybe you have to spoof some browser headers. But if that’s not enough, what can you do?

And from the other side, as a site operator, how can you prevent your site from being scraped by any script kiddie who knows what a user agent is?

In this talk, we’ll explore the dark art of scraping the web from both perspectives: the bots, and the services that try to confound them. We’ll look at a number of techniques for detecting non-human traffic, and show how a respectful, ethical scraper might get around them. (Hint: You can’t use OTP’s built-in HTTP stack for this!) We’ll also look at the gold standard for bot detection, and test the limits for how sites can prevent automated access.

Key Takeaways:

Methods for bot detection go way beyond user agents; they go as low in the stack as the TLS handshake, and as high as JavaScript execution and mouse movement tracking
Aggressive prevention of bot traffic does a disservice to human users
Guidelines for ethical web scraping include adhering to the law and avoiding putting an excessive load on the remote server
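
As a rough illustration of the last takeaway, here is a minimal Python sketch of a "politeness" policy a scraper might consult before each request. It is not taken from the talk: the class name `PoliteFetchPolicy`, the `example-bot` user agent, and the one-second default delay are all assumptions made for the example. It only covers honoring robots.txt and spacing out requests, which is one small part of scraping ethically and says nothing about legal compliance.

```python
from urllib import robotparser


class PoliteFetchPolicy:
    """Hypothetical helper: decides whether and when a scraper may request a URL."""

    def __init__(self, robots_txt: str, user_agent: str, min_delay: float = 1.0):
        self.user_agent = user_agent
        self._parser = robotparser.RobotFileParser()
        # Parse robots.txt content that the caller has already downloaded.
        self._parser.parse(robots_txt.splitlines())
        # Honor the site's Crawl-delay if it is longer than our own minimum.
        crawl_delay = self._parser.crawl_delay(user_agent)
        self.delay = max(min_delay, crawl_delay or 0)
        self._last_request = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self._parser.can_fetch(self.user_agent, url)

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next request, given the current time."""
        if self._last_request is None:
            return 0.0
        return max(0.0, self.delay - (now - self._last_request))

    def record_request(self, now: float) -> None:
        """Remember when the most recent request was sent."""
        self._last_request = now
```

In use, a scraper would skip any URL for which `allowed` returns false, sleep for `wait_time(time.monotonic())` seconds, make the request, and then call `record_request(time.monotonic())`. Passing the clock in explicitly keeps the policy easy to test without real delays.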

Target Audience:

Developers who want to know how bot detection works, and how to get around it
Site operators who want to prevent bots from accessing their site
