How can I find all matches to a regular expression in Python?

In Python, you typically use re.findall() or re.finditer() to retrieve all matches of a regular expression in a string. Below are the common ways and their differences.

1. re.findall() for a List of Matches

import re

text = "Hello 123, goodbye 456"
pattern = r"\d+"

matches = re.findall(pattern, text)
print(matches)
# Output: ['123', '456']
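  • re.findall(pattern, string) returns a list of all non-overlapping matches, in the order they are found.
  • If the pattern has no capturing groups, each item is the full matched substring.
  • If the pattern has capturing groups, findall returns the groups instead of the full match:
    • a list of strings, each holding that group's text (if there's exactly one capturing group), or
    • a list of tuples, one element per group (if there are multiple capturing groups).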

Example with Capturing Groups

import re

text = "Name: John, Age: 30"
pattern = r"(\w+):\s(\w+)"

# Each match has two groups -> result is a list of tuples
matches = re.findall(pattern, text)
print(matches)
# Output: [('Name', 'John'), ('Age', '30')]
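
For contrast, here is a small sketch of how the result shape changes with the number of groups. It reuses the text from the example above; the three patterns are illustrative variations, not part of the original example.

```python
import re

text = "Name: John, Age: 30"

# No capturing groups: each item is the whole match.
whole = re.findall(r"\w+:\s\w+", text)

# One capturing group: each item is that group's text only.
values = re.findall(r"\w+:\s(\w+)", text)

# Non-capturing group (?:...): groups the alternation without
# changing what findall returns.
grouped = re.findall(r"(?:Name|Age):\s\w+", text)

print(whole)    # ['Name: John', 'Age: 30']
print(values)   # ['John', '30']
print(grouped)  # ['Name: John', 'Age: 30']
```

If you need grouping only for alternation or repetition, a non-capturing group keeps the full match in the result.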

2. re.finditer() for an Iterator of Match Objects

import re

text = "Hello 123, goodbye 456"
pattern = r"\d+"

for match in re.finditer(pattern, text):
    print("Match:", match.group(0), "at", match.span())
  • re.finditer(pattern, string) returns an iterator of match objects (re.Match in Python 3.7+).
  • Each match object gives you start/end indices (.span()), the full match (.group(0)), and any capturing groups (e.g. .group(1), .group(2), etc.).
  • Ideal if you need detailed info about the positions or groups for each match.
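
As a minimal sketch of using that extra information, the snippet below collects each match's text and position into a list of dictionaries. The named group "num" and the dictionary keys are assumptions chosen for illustration.

```python
import re

text = "Hello 123, goodbye 456"

# "num" is a hypothetical group name used only for this example.
pattern = r"(?P<num>\d+)"

# Build one record per match from the match object's methods.
records = [
    {"text": m.group("num"), "start": m.start(), "end": m.end()}
    for m in re.finditer(pattern, text)
]

print(records)
# [{'text': '123', 'start': 6, 'end': 9}, {'text': '456', 'start': 19, 'end': 22}]
```

Because finditer is lazy, matches are produced one at a time, which can help when the input is large and you may stop early.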

3. Other Tips & Flags

3.1 Regex Flags

import re

text = "HELLO\nhello"
pattern = r"hello"

# re.IGNORECASE -> case-insensitive
# re.DOTALL -> '.' matches newline
# re.MULTILINE -> '^' and '$' match start/end of lines
matches = re.findall(pattern, text, flags=re.IGNORECASE)
print(matches)  # ['HELLO', 'hello']

Common flags include:

  • re.IGNORECASE or re.I: case-insensitive matching.
  • re.DOTALL or re.S: '.' matches newline.
  • re.MULTILINE or re.M: ^ and $ match start/end of each line, not just the entire string.
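
The other two flags can be sketched the same way. The sample text and variable names below are made up for illustration; the last call shows that flags combine with the | operator.

```python
import re

text = "first line\nSecond line\nthird"

# re.MULTILINE: '^' anchors at the start of every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Without the flag, '^' anchors only at the start of the string.
only_first = re.findall(r"^\w+", text)

# re.DOTALL: '.' also matches '\n', so '.+' can span several lines.
spans_all = re.findall(r"first.+third", text, flags=re.DOTALL)

# Flags combine with '|'.
combined = re.findall(r"^second", text, flags=re.IGNORECASE | re.MULTILINE)

print(first_words)  # ['first', 'Second', 'third']
print(only_first)   # ['first']
print(spans_all)    # ['first line\nSecond line\nthird']
print(combined)     # ['Second']
```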

3.2 Overlapping Matches

  • findall() and finditer() find non-overlapping matches: after each match, the search resumes at the end of that match. If you need overlapping matches, write a custom loop (e.g. advancing the search start index by one on each iteration) or wrap the pattern in a capturing group inside a lookahead. For example:
    import re
    
    text = "aaaa"
    pattern = r"(?=(aa))"  # lookahead-based approach
    
    matches = re.findall(pattern, text)
    print(matches)  # ['aa', 'aa', 'aa']
    
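    This captures overlapping occurrences of "aa". The lookahead itself consumes no characters, so the search can try again at the next position, and the capturing group inside it is what findall returns.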
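
The custom-loop alternative mentioned above could look roughly like this. It is a sketch, not a standard library feature: find_overlapping is a hypothetical helper, and it reports at most one match per start position by restarting the search one character after the previous match began.

```python
import re

def find_overlapping(pattern, text):
    """Return (start, matched_text) pairs, allowing matches to overlap."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        # Pattern.search accepts a start position as its second argument.
        m = compiled.search(text, pos)
        if m is None:
            break
        results.append((m.start(), m.group(0)))
        # Resume one character after this match started, not after it ended.
        pos = m.start() + 1
    return results

print(find_overlapping(r"aa", "aaaa"))
# [(0, 'aa'), (1, 'aa'), (2, 'aa')]
```

Compared with the lookahead trick, the loop leaves the pattern unchanged and gives you the start position of every match directly.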

4. Summary

  • re.findall(pattern, string): Returns a list of all matched substrings (or the group's text if the pattern has one capturing group, or tuples if it has several).
  • re.finditer(pattern, string): Returns an iterator of match objects, offering more control (like match positions, individual groups, etc.).
  • Non-overlapping: Both functions return only non-overlapping matches; use a lookahead or a custom loop when you need overlaps.

Conclusion: Use re.findall() or re.finditer() to retrieve all regex matches in Python. findall gives you a list of matches or tuples, while finditer yields match objects for more detailed info.