Python Regex: Pattern Matching with the re Module

Master regular expressions (regex) in Python. Learn to search, match, split, and substitute string data using the built-in re module.


Overview

Regular expressions (often shortened to regex) are a mini-language for describing text patterns, used to search, match, and manipulate strings. Instead of checking for one fixed string, a regex describes the shape of the text you want (for example, "any run of digits"), and the engine finds every substring that fits, however large the input. For instance, finding all email addresses, phone numbers, or URLs in a document takes a single pattern in regex, but would require many lines of manual, character-by-character checks with ordinary string methods such as `str.find()` and `str.split()`.
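As a minimal sketch of the idea, the snippet below finds every phone number in a sentence with one pattern. The sample text and the NNN-NNN-NNNN phone format are assumptions chosen for illustration; real-world phone formats vary far more.

```python
import re

# Hypothetical sample text; the NNN-NNN-NNNN phone format is an assumption for this sketch
text = "Call 555-123-4567 or 555-987-6543. Order #12345 ships Monday."

# \d{3} means "exactly three digits"; the hyphens are literal characters
phone_pattern = r"\d{3}-\d{3}-\d{4}"

phones = re.findall(phone_pattern, text)
print(phones)  # ['555-123-4567', '555-987-6543']
```

The order number is skipped because it lacks the hyphens the pattern requires.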

Python provides built-in support for regular expressions through the `re` module. Its core functions include `re.search()`, which returns a match object for the first match (or `None` if there is none); `re.findall()`, which returns all matches as a list of strings (or as a list of tuples, one element per group, when the pattern contains groups); and `re.finditer()`, which yields a match object for each match. A match object exposes the matched text through `.group()` and its position in the string through `.start()`, `.end()`, and `.span()`.
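A short sketch of the three functions on made-up sample text, showing what each one returns:

```python
import re

# Hypothetical sample text used only for this illustration
text = "Order 42 shipped on day 7; order 108 is pending."

# re.search returns the first Match object, or None if nothing matches
first = re.search(r"\d+", text)
if first:
    print(first.group(), first.start(), first.end())  # 42 6 8

# re.findall returns every matching substring as a list of strings
numbers = re.findall(r"\d+", text)
print(numbers)  # ['42', '7', '108']

# re.finditer yields Match objects, which carry position metadata
for match in re.finditer(r"\d+", text):
    print(match.group(), match.span())
# 42 (6, 8)
# 7 (24, 25)
# 108 (33, 36)
```

`re.finditer()` is the better choice when you need positions or when the input is large, since it produces matches one at a time instead of building a full list.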

Regex patterns use special metacharacters: `\d` matches any digit, `\w` matches any word character (letters, digits, and the underscore), `+` matches one or more repetitions of the preceding element, and square brackets `[...]` define a character set that matches any single character listed inside (inside brackets, most metacharacters such as `.` and `+` lose their special meaning). Parentheses `(...)` create groups that let you extract specific parts of a match. Finally, `re.sub()` replaces every match of a pattern with a new string. Regex syntax takes time to learn, but once mastered it is a compact and versatile tool for text processing.
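The snippet below exercises each of these pieces on a made-up key=value record (the record format is purely illustrative). Note that `re.findall()` returns tuples once the pattern contains groups, which is why the email example that follows can unpack `user, domain`.

```python
import re

# Hypothetical record used only for this illustration
record = "user_id=417 name=Ada_Lovelace score=98"

# \w+ matches a run of word characters (letters, digits, underscore),
# \d+ matches one or more digits, and the parentheses capture each part
match = re.search(r"name=(\w+) score=(\d+)", record)
if match:
    name, score = match.groups()
    print(name, score)  # Ada_Lovelace 98

# With two groups, findall returns a list of (group1, group2) tuples
print(re.findall(r"(\w+)=(\d+)", record))  # [('user_id', '417'), ('score', '98')]

# A character set: [aeiou] matches any single lowercase vowel
print(re.findall(r"[aeiou]", "regex"))  # ['e', 'e']

# re.sub replaces every match of the pattern with the replacement string
print(re.sub(r"\d+", "#", record))  # user_id=# name=Ada_Lovelace score=#
```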

Code Example

Using the re module to extract the user and domain parts of email addresses, then redact them.

regex_demo.py
import re

text = "Contact support at info@pyrun.xyz or developer@pyrun.xyz today."

# Pattern to match email addresses: group 1 captures the user part, group 2 the domain
email_pattern = r"([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"

# Find all matches; with two groups, findall returns one (user, domain) tuple per address
emails = re.findall(email_pattern, text)
print("Emails found:")
for user, domain in emails:
    print(f"User: {user} | Domain: {domain}")

# Redact emails: re.sub replaces each full match (not just the groups) with the placeholder
redacted_text = re.sub(email_pattern, "[REDACTED]", text)
print(f"\nRedacted Text:\n{redacted_text}")
Terminal Output
Emails found:
User: info | Domain: pyrun.xyz
User: developer | Domain: pyrun.xyz

Redacted Text:
Contact support at [REDACTED] or [REDACTED] today.
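The example above extracts addresses from running text. To check whether a single input string is an email address, one common approach is `re.fullmatch()`, which only succeeds when the whole string matches the pattern. The helper name `is_valid_email` is hypothetical, and the pattern is a practical approximation rather than a complete implementation of the email address standard.

```python
import re

# The same email pattern as above, without capture groups (groups are not needed for a yes/no check)
EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

def is_valid_email(candidate: str) -> bool:
    """Hypothetical helper: True only if the entire string looks like an email address."""
    # re.fullmatch fails if there is any extra text before or after the address
    return re.fullmatch(EMAIL_PATTERN, candidate) is not None

for candidate in ["info@pyrun.xyz", "info@pyrun", "contact info@pyrun.xyz"]:
    print(candidate, "->", is_valid_email(candidate))
```

The second candidate fails because the domain has no dot-separated suffix, and the third fails because `fullmatch` rejects the leading text that `findall` would have skipped over.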

Real-world Use Cases

  • Validating the format of inputs such as emails, phone numbers, and passwords
  • Parsing server access logs for status codes
  • Extracting information from raw HTML and Markdown reports
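As a rough sketch of the log-parsing use case, the snippet pulls the method, path, and status code out of a few access-log lines and counts the status codes. The log lines are invented and only loosely follow the Common Log Format, so a real log may need a different pattern. Two new tokens appear here: `\S` matches any non-whitespace character, and `[^"]` matches any character except a double quote.

```python
import re
from collections import Counter

# Hypothetical access-log lines, loosely modeled on the Common Log Format
log_lines = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.5 - - [10/Oct/2023:13:56:01 +0000] "POST /login HTTP/1.1" 200 128',
]

# Capture the method, the path, and the three-digit status code
# that follows the closing quote of the request section
log_pattern = re.compile(r'"(\w+) (\S+) [^"]*" (\d{3})')

status_counts = Counter()
for line in log_lines:
    match = log_pattern.search(line)
    if match:
        method, path, status = match.groups()
        print(method, path, status)
        status_counts[status] += 1

print(dict(status_counts))  # {'200': 2, '404': 1}
```

Compiling the pattern once with `re.compile()` avoids re-parsing it on every line, which is a reasonable habit when the same pattern runs in a loop.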

Frequently Asked Questions

What is the significance of the 'r' prefix in regex patterns?

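It denotes a 'raw string'. In a raw string, backslashes (\) are kept as literal characters instead of starting escape sequences, so regex tokens such as `\d` and `\b` reach the regex engine unchanged. Without the prefix, Python processes some sequences first: in a normal string, `"\b"` becomes a backspace character rather than the regex word-boundary token.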
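A quick way to see the difference is to compare the same pattern with and without the prefix. The sample text is made up for illustration.

```python
import re

# In a normal string literal, "\b" is one backspace character; r"\b" is two characters
print(len("\b"), len(r"\b"))  # 1 2

text = "concatenate the cat"

# r"\b" reaches the regex engine as backslash + b: a word boundary,
# so "cat" inside "concatenate" is not matched
print(re.findall(r"\bcat\b", text))  # ['cat']

# Without the r prefix, the pattern contains literal backspace characters,
# which never occur in the text, so nothing matches
print(re.findall("\bcat\b", text))  # []
```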

What is the difference between re.match and re.search?

re.match only checks for a match starting at the very beginning of the string, while re.search scans the entire string for a match.
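A minimal comparison on a made-up error message:

```python
import re

# Hypothetical sample text used only for this illustration
text = "Error 404: page not found"

# re.match only tries position 0: the string starts with "Error", not a digit
print(re.match(r"\d+", text))            # None
print(re.match(r"Error", text).group())  # Error

# re.search scans forward and returns the first match anywhere in the string
print(re.search(r"\d+", text).group())   # 404
```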
