I’m a product-focused developer working at Jump, where we’re using LLMs to help financial advisors do less tedious stuff and more of the work they love. Formerly at Felt (the “Figma for maps”), I’ve worked in a number of soft real-time, distributed systems domains, including networking, video streaming, and IoT command-and-control systems. I fell in love with Elixir in 2019, when I used it to build a massively multiplayer game server for the X-Plane flight simulator, and I’ve been working primarily with Elixir ever since.
The time comes in every developer’s career when they need to scrape a web page. If you’re lucky, a simple HTTP request gets you what you need, or maybe you have to spoof some browser headers. But if that’s not enough, what can you do?
And from the other side, as a site operator, how can you prevent your site from being scraped by any script kiddie who knows what a user agent is?
In this talk, we’ll explore the dark art of scraping the web from both perspectives: the bots and the services that try to confound them. We’ll look at a number of techniques for detecting non-human traffic, and show how a respectful, ethical scraper might get around them. (Hint: You can’t use OTP’s built-in HTTP stack for this!) We’ll also look at the gold standard for bot detection, and test the limits of how far sites can go to prevent automated access.
Key Takeaways:
Target Audience: