BeautifulSoup Scraper

Fetch and parse HTML in the browser with requests (via micropip) and BeautifulSoup.


How it Works

Outside the browser, requests sends HTTP over operating-system sockets, which code running in a browser page cannot open. PyRun instead maps requests calls onto the browser's fetch API.

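Because every request goes through fetch, the browser's CORS rules apply: you can fetch and parse a page only if its server allows cross-origin reads.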
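Whether a page is "open" is decided by the browser, not by Python. The sketch below approximates that check for a simple GET sent without credentials: the response must carry an Access-Control-Allow-Origin header that is either `*` or exactly the origin of the page running your code. The helper name `allows_cross_origin_read` and the example origins are hypothetical, and the sketch deliberately ignores preflight requests and credentialed fetches, where the real rules are stricter.

```python
def allows_cross_origin_read(headers, origin):
    """Hypothetical helper: approximate the browser's CORS check for a
    simple, non-credentialed GET issued from `origin`."""
    # HTTP header names are case-insensitive, so normalise them first
    normalised = {name.lower(): value for name, value in headers.items()}
    allowed = normalised.get("access-control-allow-origin")
    if allowed is None:
        # No header at all: the browser refuses to expose the response body
        return False
    allowed = allowed.strip()
    # "*" opens the resource to every origin; otherwise it must match exactly
    return allowed == "*" or allowed == origin


# Example origins below are placeholders, not real PyRun hosts
print(allows_cross_origin_read({"Access-Control-Allow-Origin": "*"}, "https://app.example"))  # → True
print(allows_cross_origin_read({"Content-Type": "text/html"}, "https://app.example"))  # → False
print(allows_cross_origin_read({"access-control-allow-origin": "https://other.example"}, "https://app.example"))  # → False
```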

Source Code

This example fetches the Hacker News front page and prints its top five story headlines with their links.

scraper.py
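import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# PyRun routes requests' HTTP calls through the browser's fetch API;
# the packages themselves are installed with micropip
url = 'https://news.ycombinator.com'
print(f"Fetching {url}...\n")

response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, 'html.parser')

print("=== Hacker News Top Stories ===")
# Each story title sits inside <span class="titleline">; keep the first 5
titles = soup.find_all('span', class_='titleline', limit=5)

for i, title in enumerate(titles, start=1):
    link = title.find('a')  # the first <a> holds the headline and its URL
    if link is None:
        continue
    text = link.get_text(strip=True)
    # Self posts such as "Ask HN" use relative links like "item?id=..."
    href = urljoin(url, link.get('href'))
    print(f"{i}. {text}")
    print(f"   Link: {href}\n")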
Terminal Output
Fetching https://news.ycombinator.com...

=== Hacker News Top Stories ===
1. ...
   Link: ...

2. ...
   Link: ...
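Because the live front page changes constantly and the fetch itself can be blocked, it can help to test the parsing step offline against a saved snippet. The sketch below assumes the same `span.titleline` markup the scraper relies on; `SAMPLE_HTML` is a hypothetical, hand-written fixture (not real Hacker News content) and `extract_stories` is an illustrative helper name.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://news.ycombinator.com"

# Hypothetical fixture imitating the Hacker News title markup
SAMPLE_HTML = """
<table>
  <tr class="athing"><td class="title"><span class="titleline">
    <a href="https://example.com/post">Example story</a>
    <span class="sitebit comhead">(example.com)</span>
  </span></td></tr>
  <tr class="athing"><td class="title"><span class="titleline">
    <a href="item?id=1">Ask HN: Example question?</a>
  </span></td></tr>
</table>
"""


def extract_stories(html, limit=5):
    """Return (headline, absolute_url) pairs, one per span.titleline."""
    soup = BeautifulSoup(html, "html.parser")
    stories = []
    for title in soup.find_all("span", class_="titleline", limit=limit):
        link = title.find("a")  # the first <a> is the headline link
        if link is None:
            continue
        # Resolve relative links such as "item?id=1" against the site root
        stories.append((link.get_text(strip=True), urljoin(BASE_URL, link.get("href"))))
    return stories


for number, (text, href) in enumerate(extract_stories(SAMPLE_HTML), start=1):
    print(f"{number}. {text}")
    print(f"   Link: {href}")
```

Keeping the parsing in a function that takes an HTML string separates it from the network call, so a CORS failure and a selector that no longer matches show up as different problems.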

Real-world Applications

  • Web Scraping
  • Data Aggregation
  • SEO Auditing

Frequently Asked Questions

Why does a request occasionally fail?

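Browsers only let a page read a cross-origin response when the target server allows it with an Access-Control-Allow-Origin header. If a site omits that header, or restricts it to other origins, the browser blocks the fetch and the request fails before any HTML reaches Python. news.ycombinator.com currently permits open cross-origin reads.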
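To keep a script from crashing when a fetch is refused, you can wrap the call and check for a missing result. This is a minimal sketch that assumes PyRun surfaces a blocked or failed fetch as a `requests.RequestException` subclass; the source does not say which exception is raised, so widen the `except` clause if your environment raises something else. The helper name `fetch_html` is hypothetical.

```python
import requests


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the page body, or None if requests
    reports a failure (network error, blocked fetch, HTTP error status)."""
    try:
        # Assumption: `timeout` may have no effect when requests runs over fetch
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as exc:
        # Assumption: a CORS-blocked fetch is reported as a RequestException
        print(f"Could not fetch {url}: {exc}")
        return None
    return response.text
```

Call `fetch_html(url)` and only pass the result to `BeautifulSoup` when it is not `None`.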