BeautifulSoup Scraper
Fetch and parse HTML via micropip/requests.
How it Works
requests normally opens an OS-level socket, which code running in the browser cannot do, so PyRun routes requests calls through the browser's fetch API instead.
This makes it possible to fetch and parse pages whose servers allow cross-origin reads (open CORS).
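As a rough illustration of what an "open CORS" page means: for a simple GET without credentials, the browser exposes the response to the calling page only when the Access-Control-Allow-Origin response header is `*` or matches the page's origin. The sketch below approximates that check in plain Python. The function name `allows_cross_origin_read` and the example origins are hypothetical, chosen for illustration only; they are not part of PyRun or requests, and the real check is done by the browser and covers more cases (credentials, preflighted requests).

```python
def allows_cross_origin_read(response_headers, page_origin):
    """Approximate the browser's check for a simple, credential-free GET."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {
        name.lower(): value.strip()
        for name, value in response_headers.items()
    }
    allowed = normalised.get("access-control-allow-origin")
    if allowed is None:
        # No CORS header at all: the browser blocks the cross-origin read.
        return False
    # "*" opens the resource to any origin; otherwise it must match exactly.
    return allowed == "*" or allowed == page_origin


# Hypothetical origin of the page running the scraper.
origin = "https://pyrun.example"

print(allows_cross_origin_read({"Access-Control-Allow-Origin": "*"}, origin))
print(allows_cross_origin_read({"access-control-allow-origin": "https://other.example"}, origin))
print(allows_cross_origin_read({}, origin))
```

The first call reports an open page, while the other two describe responses a browser would refuse to hand to the script.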
Source Code
Scrapes the top five story headlines and their links from the Hacker News front page.
scraper.py
import requests
from bs4 import BeautifulSoup
# PyRun uses micropip internally to make requests
# through the browser's fetch API
url = 'https://news.ycombinator.com'
print(f"Fetching {url}...\n")
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
print("=== Hacker News Top Stories ===")
# Find the story titles
titles = soup.find_all('span', class_='titleline', limit=5)
for i, title in enumerate(titles):
    link = title.find('a')
    text = link.text
    href = link.get('href')
    print(f"{i+1}. {text}")
    print(f" Link: {href}\n")
Terminal Output
Fetching https://news.ycombinator.com...
=== Hacker News Top Stories ===
1. ...
Link: ...
2. ...
Link: ...
Real-world Applications
- Web Scraping
- Data Aggregation
- SEO Auditing
Frequently Asked Questions
Why does a request occasionally fail?
A request fails when the target site's CORS headers do not permit cross-origin fetches from the browser. news.ycombinator.com currently permits open cross-origin reads.
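One way to keep a script running when a fetch is blocked is to wrap the call and fall back to a default value. The sketch below is a minimal illustration under stated assumptions: `fetch_text` and `blocked_get` are hypothetical names, and because the exact exception type raised when a fetch is blocked under PyRun is not documented here, the wrapper catches any exception. The getter is passed in as a parameter so the example runs without network access.

```python
def fetch_text(url, get, fallback=None):
    """Return the response body as text, or `fallback` if the fetch fails.

    `get` is any callable with the shape of requests.get: it takes a URL
    and returns an object with a `.text` attribute.
    """
    try:
        response = get(url)
    except Exception as exc:
        # Assumption: a blocked fetch surfaces as some exception; the
        # precise type is not known, so this handler is deliberately broad.
        print(f"Could not fetch {url}: {exc}")
        return fallback
    return response.text


# Stand-in getter that simulates a request blocked by the browser.
def blocked_get(url):
    raise ConnectionError("blocked by CORS policy")


html = fetch_text("https://example.org", blocked_get, fallback="")
print(repr(html))
```

In the scraper itself the call would presumably look like `fetch_text(url, requests.get)`, with the result checked before it is handed to BeautifulSoup.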