Python Regular Expressions (RegEx)


A Regular Expression, or RegEx, is a sequence of characters that forms a search pattern. RegEx can be used to check whether a string contains the specified search pattern.


For beginners, think of RegEx as a much more advanced "Search" or "Find" (Ctrl+F) feature. Where a regular search can only find exact words, RegEx lets you find patterns, such as all phone numbers, email addresses, or dates in a very long document, without knowing the specific text in advance.

RegEx can look confusing at first because it is full of special symbols, but mastering it makes text processing and user input validation in your Python applications much more efficient.

Python has a built-in module named re, which can be used to work with Regular Expressions.

Using re Module

To use RegEx in Python, you must import the re module:

import re

Functions in re Module

The re module offers a set of functions that allows us to search a string for a match:

Function	Description
`findall`	Returns a list containing all matches
`search`	Returns a Match object if there is a match anywhere in the string
`split`	Returns a list where the string has been split at each match
`sub`	Replaces one or many matches with a string

search() Function

The search() function searches the string for a match, and returns a Match object if there is a match. If more than one match is found, only the first occurrence of the match is returned.

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x:
  print("YES! We have a match!")
else:
  print("No match")
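As a small sketch of what the returned Match object holds, the example below searches for the literal pattern "ai" in the same sample text. The variable names `match` and `missing` are illustrative only.

```python
import re

txt = "The rain in Spain"

# search() stops at the first occurrence, here the "ai" in "rain"
match = re.search(r"ai", txt)

if match:
    print(match.group())  # the text that matched
    print(match.start())  # index where the match begins
    print(match.span())   # (start, end) positions of the match

# When nothing matches, search() returns None instead of a Match object
missing = re.search(r"Portugal", txt)
print(missing)
```

Because a failed search returns None, checking the result with `if` before calling methods on it avoids an AttributeError.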

findall() Function

The findall() function returns a list containing all matches.

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

The list contains the matches in the order they are found. If no matches are found, an empty list is returned.
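The two cases described above can be compared side by side. This is a minimal sketch; the second pattern, "Portugal", is just an arbitrary word that does not occur in the text.

```python
import re

txt = "The rain in Spain"

# "ai" occurs in both "rain" and "Spain", so two matches are returned in order
matches = re.findall(r"ai", txt)

# "Portugal" does not occur, so the result is an empty list
no_matches = re.findall(r"Portugal", txt)

print(matches)
print(no_matches)
print(len(matches))  # number of occurrences found
```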

split() Function

The split() function returns a list where the string has been split at each match.

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

You can control the number of occurrences by specifying the maxsplit parameter:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)

sub() Function

The sub() function replaces the matches with the text of your choice.

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

You can control the number of replacements by specifying the count parameter:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)

Metacharacters

Metacharacters are characters with a special meaning:

Character	Description	Example
`[]`	A set of characters	`"[a-m]"`
`\`	Signals a special sequence (can also be used to escape special characters)	`"\d"`
`.`	Any character (except newline character)	`"he..o"`
`^`	Starts with	`"^hello"`
`$`	Ends with	`"world$"`
`*`	Zero or more occurrences	`"aix*"`
`+`	One or more occurrences	`"aix+"`
`{}`	Exactly the specified number of occurrences	`"al{2}"`
`|`	Either or	`"falls|stays"`
`()`	Capture and group
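To see several of the metacharacters from the table in action, here is a rough sketch run against one sample sentence. The sentence and the variable names are made up for illustration.

```python
import re

txt = "hello world, the rain falls in Spain"

dot = re.findall(r"he..o", txt)           # "." matches any single character
star = re.findall(r"aix*", txt)           # "ai" followed by zero or more "x"
plus = re.findall(r"aix+", txt)           # "ai" followed by at least one "x"
braces = re.findall(r"al{2}", txt)        # "a" followed by exactly two "l"
either = re.findall(r"falls|stays", txt)  # either alternative
starts = bool(re.search(r"^hello", txt))  # anchored at the start of the string
ends = bool(re.search(r"Spain$", txt))    # anchored at the end of the string

# Parentheses capture part of the match: here, the word before " falls"
captured = re.search(r"(\w+) falls", txt).group(1)

print(dot, star, plus, braces, either, starts, ends, captured)
```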

Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character	Description	Example
`\A`	Returns a match if the specified characters are at the beginning of the string	`r"\AThe"`
`\b`	Returns a match where the specified characters are at the beginning or at the end of a word	`r"\bain"` `r"ain\b"`
`\B`	Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word	`r"\Bain"` `r"ain\B"`
`\d`	Returns a match where the string contains digits (numbers from 0-9)	`r"\d"`
`\D`	Returns a match where the string contains a character that is NOT a digit	`r"\D"`
`\s`	Returns a match where the string contains a white space character	`r"\s"`
`\S`	Returns a match where the string contains a character that is NOT white space	`r"\S"`
`\w`	Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)	`r"\w"`
`\W`	Returns a match where the string contains a character that is NOT a word character	`r"\W"`
`\Z`	Returns a match if the specified characters are at the end of the string	`r"Spain\Z"`
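The sketch below tries a few of these special sequences on one sample sentence. The sentence is invented for illustration, and the patterns are written as raw strings (the `r` prefix) so that Python passes the backslash to the `re` module unchanged.

```python
import re

txt = "The rain in Spain stays 24 hours"

digits = re.findall(r"\d", txt)            # every single digit
word_start = re.findall(r"\bain", txt)     # "ain" at the start of a word
word_end = re.findall(r"ain\b", txt)       # "ain" at the end of a word
words = re.findall(r"\w+", txt)            # runs of word characters
at_start = bool(re.search(r"\AThe", txt))  # "The" at the very beginning
at_end = bool(re.search(r"hours\Z", txt))  # "hours" at the very end

print(digits)
print(word_start)
print(word_end)
print(words)
print(at_start, at_end)
```

No word in the sample begins with "ain", so `word_start` is empty, while "rain" and "Spain" both end with it.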

Sets

A set is a set of characters inside a pair of square brackets [] with a special meaning:

Set	Description
`[arn]`	Returns a match where one of the specified characters (`a`, `r`, or `n`) is present
`[a-n]`	Returns a match for any lower case character, alphabetically between `a` and `n`
`[^arn]`	Returns a match for any character EXCEPT `a`, `r`, and `n`
`[0123]`	Returns a match where any of the specified digits (`0`, `1`, `2`, or `3`) are present
`[0-9]`	Returns a match for any digit between `0` and `9`
`[0-5][0-9]`	Returns a match for any two-digit number from `00` to `59`
`[a-zA-Z]`	Returns a match for any character alphabetically between `a` and `z`, lower case OR upper case
`[+]`	In sets, `+`, `*`, `.`, `|`, `()`, `$`, `{}` have no special meaning, so `[+]` means: return a match for any `+` character in the string

Example

import re

txt = "The rain in Spain"

# Find every lowercase letter alphabetically between a and n
x = re.findall("[a-n]", txt)
print(x)

If no match is found, `findall()` will return an empty list.
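A few more sets from the table can be sketched the same way. The time pattern below is a hypothetical example that uses `[0-5][0-9]` for the minutes part; it is not a complete time validator, and the sample strings are invented.

```python
import re

txt = "Trains leave at 08:15, 12:75 and 23:59"

# Two digits, a colon, then minutes restricted to 00-59 by [0-5][0-9]
valid_minutes = re.findall(r"\b[0-9]{2}:[0-5][0-9]\b", txt)

# [^arn] matches any character except a, r, and n
not_arn = re.findall(r"[^arn]", "rain")

# Inside a set, + is a literal plus sign
plus_signs = re.findall(r"[+]", "1+1=2")

print(valid_minutes)
print(not_arn)
print(plus_signs)
```

The value 12:75 is skipped because its minutes start with 7, which falls outside `[0-5]`.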
