Python Regular Expressions – re Module

The re module in Python provides functions for matching patterns in strings and for checking whether a string contains a pattern.
Python’s core developers bundled these functions together in the re module, but to use them you need a regular expression pattern. For example, re.match(pattern, string) takes a regular expression and a string, then tries to match the pattern at the beginning of the string.

So to become good at using the re module functions, you first need to know regular expressions and how to write them.
For learning this, I recommend going through the tutorials on the RegexOne website. Back in the day, when I was a first-year student at university, my professor recommended going through these tutorials before coming to the Introduction to Programming class. (Writing this article reminded me of my first year at university 😂 😂)
RegexOne is great for learning how to form regular expressions, so please go there if you’re not already familiar with them. After going through the tutorials on RegexOne, please do some exercises; for that, use Pythex, a free Python regular expression testing tool.

Also note that in this article I’ll be explaining how the functions in the re module can be leveraged for pattern matching and searching in Python, rather than explaining what regex itself is.

Common re module Functions in Python – Table

Function Name | Description
re.compile(pattern, flags=0) | Compile a regular expression pattern into a regular expression object
re.search(pattern, string, flags=0) | Scan through string looking for the first location where the pattern produces a match and return a corresponding match object; otherwise return None
re.match(pattern, string, flags=0) | If zero or more characters at the beginning of string match the pattern, return a corresponding match object; otherwise return None
re.fullmatch(pattern, string, flags=0) | If the whole string matches the pattern, return a corresponding match object; otherwise return None
re.split(pattern, string, maxsplit=0, flags=0) | Split string by occurrences of pattern
re.findall(pattern, string, flags=0) | Return all non-overlapping matches of pattern in string as a list of strings (or of tuples, if the pattern has more than one group)
re.sub(pattern, repl, string, count=0, flags=0) | Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with the replacement repl
re.escape(pattern) | Escape special characters in pattern
re.purge() | Clear the regular expression cache
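Several functions from the table (compile, findall, split and escape) don't get their own section further down, so here is a minimal sketch of them. The sample strings and variable names are made up for illustration.

```python
import re

# compile() builds a reusable pattern object
word = re.compile(r"\w+")
words = word.findall("Computer Science Hub")   # all non-overlapping matches

# split() cuts the string wherever the pattern matches
parts = re.split(r",\s*", "red, green,blue")

# escape() backslash-escapes characters that are special in a pattern,
# so "+" below is treated as a literal plus sign
literal = re.escape("1+1=2")
plus_match = re.search(literal, "so 1+1=2 holds")

print(words)                  # ['Computer', 'Science', 'Hub']
print(parts)                  # ['red', 'green', 'blue']
print(plus_match.group(0))    # 1+1=2
```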

Using re.match() Function for Matching Strings

The re module’s match() function checks whether zero or more characters at the beginning of a string match the regular expression pattern. If they do, it returns a corresponding match object; otherwise it returns None.
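Since match() only looks at the beginning of the string, it helps to see it side by side with search() and fullmatch(). The snippet below is a small sketch with made-up variable names.

```python
import re

text = "Computer Science Hub"

at_start = re.match(r"Computer", text)      # match object: pattern found at position 0
not_at_start = re.match(r"Science", text)   # None: "Science" is not at the start
anywhere = re.search(r"Science", text)      # match object: search() scans the whole string
whole = re.fullmatch(r"Computer", text)     # None: pattern does not cover the whole string

print(at_start is not None)   # True
print(not_at_start)           # None
print(anywhere.start())       # 9
print(whole)                  # None
```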

Finding first character of String

import re

text = "Computer Science Hub"         # Some string

m = re.match(".", text)
# m is a match object; m[0] is the matched text, here the first character of the string

print(m[0])                           # Prints out C

Matching against whole of String

import re

text = "Computer Science Hub"       # Python String

m = re.match(".*", text)
# m is a match object; m[0] is the matched text, here the whole string (it contains no newline)

print(m[0])                         # Prints out "Computer Science Hub"

Finding First Sequence of Letters in String

import re

text = "   Computer Science Hub"        # Python String, note - Spaces

m = re.match(r"\w+", text)
# m is None here: match() only looks at the start of the string, and a space is not a word character

print(m[0])                       # Raises TypeError: 'NoneType' object is not subscriptable

# Because the string starts with spaces, \w+ (letters, digits, underscore) cannot match at position 0.
# But if text = "Computer Science Hub" then m[0] will be 'Computer' for re.match(r"\w+", text)
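One way to avoid that TypeError is to check for None before indexing, and to either use search() or allow leading whitespace in the pattern. This is only a sketch; the variable names are mine.

```python
import re

text = "   Computer Science Hub"

# Guard against None instead of indexing blindly
m = re.match(r"\w+", text)
first_word = m[0] if m else None          # None, match() fails on the leading spaces

# search() skips ahead to the first place the pattern matches
s = re.search(r"\w+", text)
found = s[0] if s else None               # 'Computer'

# Or keep match() and allow optional whitespace explicitly
m2 = re.match(r"\s*(\w+)", text)
captured = m2.group(1) if m2 else None    # 'Computer'

print(first_word, found, captured)
```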

Using re module for Taking out matching Substring from Large String

import re

text = "10/15/99"

m = re.match(r"(\d{2})/(\d{2})/(\d{2,4})", text)   # Raw string keeps \d as a regex escape
if m:
    print(m.group(1, 2, 3))

# Prints out 
('10', '15', '99')
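If numbered groups get hard to keep track of, the same date pattern can be written with named groups. The group names month, day and year below are my own choice, so treat this as a sketch rather than the only way to do it.

```python
import re

# (?P<name>...) gives each group a name in addition to its number
date_pattern = re.compile(r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{2,4})")

m = date_pattern.match("10/15/99")
if m:
    all_groups = m.groups()        # every group as a tuple
    year = m.group("year")         # look a group up by name
    as_dict = m.groupdict()        # names mapped to matched text
    print(all_groups)              # ('10', '15', '99')
    print(year)                    # 99
    print(as_dict)                 # {'month': '10', 'day': '15', 'year': '99'}
```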

Using re Module for Searching a Substring

import re

text = "Example 3: There is 1 date 10/25/95 in here!"

m = re.search(r"(\d{1,2})/(\d{1,2})/(\d{2,4})", text)

print(m.group(1), m.group(2), m.group(3))

month, day, year = m.group(1, 2, 3)
print(month, day, year)

date = m.group(0)
print(date)

# Above code prints out
10 25 95
10 25 95
10/25/95
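re.search() stops at the first match and returns None when there is none. If a string may hold several dates (or no date at all), something like the following sketch could be used; the sample text is invented for illustration.

```python
import re

text = "Dates: 10/25/95 and 1/2/2003, nothing else."
date_pattern = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{2,4})")

# findall() returns a list of tuples when the pattern has several groups
all_dates = date_pattern.findall(text)

# finditer() yields match objects, so the position of each match is available too
spans = [(m.group(0), m.start()) for m in date_pattern.finditer(text)]

# search() returns None when nothing matches, so check before calling group()
missing = date_pattern.search("no dates here")

print(all_dates)   # [('10', '25', '95'), ('1', '2', '2003')]
print(spans)       # [('10/25/95', 7), ('1/2/2003', 20)]
print(missing)     # None
```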

Using re Module for Replacing some part of String

import re

text = "you're no fun anymore..."

# literal replace (str.replace is faster)
print(re.sub("fun", "entertaining", text))

# collapse all non-word-character sequences to a single dash
print(re.sub(r"[^\w]+", "-", text))

# convert all words to beeps
print(re.sub(r"\S+", "-BEEP-", text))

# Above code prints out
you're no entertaining anymore...
you-re-no-fun-anymore-
-BEEP- -BEEP- -BEEP- -BEEP-
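re.sub() can also limit the number of replacements with count and reuse captured groups in the replacement string. Here is a short sketch of both, plus re.subn(), which additionally reports how many replacements happened. The sample strings are mine.

```python
import re

text = "you're no fun anymore..."

# count=1 replaces only the first match
first_only = re.sub(r"\S+", "-BEEP-", text, count=1)

# \1, \2 and \3 in the replacement refer to the captured groups
swapped = re.sub(r"(\d{2})/(\d{2})/(\d{2})", r"\2/\1/\3", "10/15/99")

# subn() returns a (new_string, number_of_replacements) tuple
result, n = re.subn(r"\s+", " ", "too    many   spaces")

print(first_only)   # -BEEP- no fun anymore...
print(swapped)      # 15/10/99
print(result, n)    # too many spaces 2
```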

Using re Module to Replace Substrings via callback Function

import re

text = "a line of text\\012another line of text\\012etc..."

def octal(match):
    # replace octal code with corresponding ASCII character
    # (int(..., 8) parses the three captured digits as base 8)
    return chr(int(match.group(1), 8))

octal_pattern = re.compile(r"\\(\d\d\d)")

print(text)
print(octal_pattern.sub(octal, text))

# Above code prints out
a line of text\012another line of text\012etc...
a line of text
another line of text
etc...
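The callback receives a match object for every match and must return the replacement string. As another small sketch of the same idea, with a hypothetical double() helper that is not part of the re module:

```python
import re

def double(match):
    # match.group(0) is the matched run of digits; return its doubled value as text
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)   # 6 apples and 20 pears
```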

Using re Module to Match Against One of Many Patterns

import re

def combined_pattern(patterns):
    # Wrap each pattern in a capturing group and join them with "|".
    # This assumes the patterns contain no capturing groups of their own.
    p = re.compile("|".join("(" + x + ")" for x in patterns))
    def fixup(v):
        m = p.match(v)
        if m is None:
            return None  # no pattern matched at the start of v
        # return the index of the first pattern whose group took part in the match
        for i in range(len(patterns)):
            if m.group(i + 1) is not None:
                return i
    return fixup

#
# try it out!

patterns = [
    r"\d+",
    r"abc\d{2,4}",
    r"p\w+"
]

p = combined_pattern(patterns)

print(p("129391"))
print(p("abc800"))
print(p("abc1600"))
print(p("python"))
print(p("perl"))
print(p("tcl"))

# Above code prints out
0
1
1
2
2
None
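If you would rather get a label back than a list index, named groups together with the match object's lastgroup attribute offer an alternative. This is a sketch, not a drop-in replacement: the labels number, abc_code and p_word and the classify() helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical labels for the three patterns used above
named_patterns = {
    "number": r"\d+",
    "abc_code": r"abc\d{2,4}",
    "p_word": r"p\w+",
}

# Build (?P<number>\d+)|(?P<abc_code>abc\d{2,4})|(?P<p_word>p\w+)
combined = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in named_patterns.items())
)

def classify(value):
    m = combined.match(value)
    # lastgroup is the name of the last capturing group that matched
    return m.lastgroup if m else None

for sample in ["129391", "abc1600", "python", "tcl"]:
    print(sample, classify(sample))
```

As with the index-based version, this assumes the individual patterns do not contain capturing groups of their own.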


Gagan

Hi there, I'm the founder of ComputerScienceHub (started to bring useful Computer Science information together in one place). I've been doing JavaScript and Python development since 2015: I've worked on a couple of web development projects and done some data science work using Python. Nowadays I primarily work as a freelance JavaScript developer (web developer) while managing a team of Computer Science specialists at ComputerScienceHub.io
