Python RegEx
A Regular Expression, or RegEx, is a sequence of characters that defines a
search pattern used to find a string or a set of strings.
It can detect the presence or absence of text by matching it against a
particular pattern, and a pattern can itself be divided into one or more
sub-patterns.
Regex Module in Python
Python has a built-in module named "re" for working with regular expressions.
Import it with the import statement:
import re
How to Use RegEx in Python?
You can use RegEx in Python after importing re module.
Example:
This Python code uses regular expressions to search for the word "portal" in
the given string and then prints the start and end indices of the matched word
within the string.
import re
s = 'GeeksforGeeks: A computer science portal for geeks'
match = re.search(r'portal', s)
print('Start Index:', match.start())
print('End Index:', match.end())
Output
Start Index: 34
End Index: 40
Note: Here the r prefix (r'portal') stands for raw, not regex. A raw string
differs from a regular string in that it does not interpret the \ character as
an escape character. This matters because the regular expression engine uses
the \ character for its own escaping.
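The difference described in the note can be sketched as follows. The sample path is only an illustrative assumption, not part of the original example.

```python
import re

# A regular string turns "\n" into a single newline character;
# a raw string keeps the backslash and the "n" as two characters.
regular = '\n'
raw = r'\n'
print(len(regular), len(raw))  # 1 2

# To match one literal backslash, the regex itself must be \\.
# As a raw string that is r'\\'; as a regular string it is '\\\\'.
path = r'C:\temp'  # hypothetical sample text
raw_match = re.search(r'\\', path)
regular_match = re.search('\\\\', path)
print(raw_match.start(), regular_match.start())  # 2 2
```

Both patterns find the same backslash; the raw string is simply easier to read.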
Before starting with the Python regex module let's see how to actually write
regex using metacharacters or special sequences.
RegEx Functions
The re module in Python provides various functions that help search, match,
and manipulate strings using regular expressions.
Below are main functions available in the re module:
Function        Description
re.findall()    Finds all matching occurrences and returns them in a list
re.compile()    Compiles a regular expression into a pattern object
re.split()      Splits a string at the occurrences of a character or a pattern
re.sub()        Replaces all occurrences of a character or pattern with a
                replacement string
re.subn()       Similar to re.sub(), but returns a tuple:
                (new_string, number_of_substitutions)
re.escape()     Escapes special characters
re.search()     Searches for the first occurrence of a character or pattern
Let's see the working of these RegEx functions with definition and examples:
1. re.findall()
Returns all non-overlapping matches of a pattern in the string as a list. It
scans the string from left to right.
Example: This code uses regular expression \d+ to find all sequences of one
or more digits in the given string.
import re
string = """Hello my Number is 123456789 and
my friend's number is 987654321"""
regex = r'\d+'
match = re.findall(regex, string)
print(match)
Output
['123456789', '987654321']
2. re.compile()
Compiles a regex into a pattern object, which can be reused for matching or
substitutions.
Example 1: The pattern [a-e] matches every lowercase letter from 'a' to 'e'
in the input string "Aye, said Mr. Gibenson Stark". The output is
['e', 'a', 'd', 'b', 'e', 'a'], the matching characters in order.
import re
p = re.compile('[a-e]')
print(p.findall("Aye, said Mr. Gibenson Stark"))
Output
['e', 'a', 'd', 'b', 'e', 'a']
Explanation:
● The first match is 'e' in "Aye" and not 'A', because matching is case-sensitive.
● The next matches are 'a' and then 'd' in "said", followed by 'b' and
'e' in "Gibenson"; the last 'a' comes from "Stark".
● The backslash metacharacter '\' plays an important role because it introduces
various special sequences. To match a backslash without its special
meaning as a metacharacter, use '\\'.
Example 2: The code uses regular expressions to find and list all single digits
and sequences of digits in the given input strings. It finds single digits with \d
and sequences of digits with \d+.
import re
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
Output
['1', '1', '4', '1', '8', '8', '6']
['11', '4', '1886']
Example 3: Word and non-word characters
● \w matches a single word character.
● \w+ matches a group of word characters.
● \W matches non-word characters.
import re
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))
p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he \
said *** in some_language."))
p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))
Output
['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_',
'l', 'a', 'n', 'g']
['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said',
'in', 'some_language']
[' ', ' ', '*', '*', '*', ' ', ' ', '.']
Example 4: The regular expression pattern 'ab*' finds every occurrence of 'a'
followed by zero or more 'b' characters. For the input string "ababbaabbb"
it returns the following list of matches: ['ab', 'abb', 'a', 'abbb'].
import re
p = re.compile('ab*')
print(p.findall("ababbaabbb"))
Output
['ab', 'abb', 'a', 'abbb']
Explanation:
● Output 'ab', is valid because of single 'a' accompanied by single 'b'.
● Output 'abb', is valid because of single 'a' accompanied by 2 'b'.
● Output 'a', is valid because of single 'a' accompanied by 0 'b'.
● Output 'abbb', is valid because of single 'a' accompanied by 3 'b'.
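The examples above call only findall() on the compiled pattern. As a minimal sketch of the "reused" part of the definition, the same pattern object can serve several operations; the variable names here are illustrative.

```python
import re

# Compile once, then reuse the same pattern object for several operations.
digits = re.compile(r'\d+')

text = "I went to him at 11 A.M. on 4th July 1886"
all_numbers = digits.findall(text)   # every run of digits
first_number = digits.search(text)   # only the first run of digits
masked = digits.sub('#', text)       # replace each run with '#'

print(all_numbers)           # ['11', '4', '1886']
print(first_number.group())  # 11
print(masked)                # I went to him at # A.M. on #th July #
```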
3. re.split()
Splits a string wherever the pattern matches. The remaining characters are
returned as list elements.
Syntax:
re.split(pattern, string, maxsplit=0, flags=0)
● pattern: Regular expression to match split points.
● string: The input string to split.
● maxsplit (optional): Limits the number of splits. Default is 0 (no
limit).
● flags (optional): Apply regex flags like re.IGNORECASE.
Example 1: Splitting by non-word characters or digits
This example demonstrates how to split a string using different patterns like
non-word characters (\W+), apostrophes, and digits (\d+).
from re import split
print(split(r'\W+', 'Words, words , Words'))
print(split(r'\W+', "Word's words Words"))
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))
Output
['Words', 'words', 'Words']
['Word', 's', 'words', 'Words']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']
Example 2: Using maxsplit and flags
This example shows how to limit the number of splits using maxsplit, and how
flags can control case sensitivity.
import re
print(re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1))
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here',
               flags=re.IGNORECASE))
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here'))
Output
['On ', 'th Jan 2016, at 11:02 AM']
['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']
['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']
Note: In the second and third cases above, [a-f]+ splits the string on any run
of lowercase letters from 'a' to 'f'. The re.IGNORECASE flag extends the match
to uppercase letters as well.
4. re.sub()
The re.sub() function replaces all occurrences of a pattern in a string with a
replacement string.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
● pattern: The regex pattern to search for.
● repl: The string to replace matches with.
● string: The input string to process.
● count (optional): Maximum number of substitutions (default is 0,
which means replace all).
● flags (optional): Regex flags like re.IGNORECASE.
Example 1: The following examples show different ways to replace the
pattern 'ub' with '~*', using various flags and count values.
import re
# Case-insensitive replacement of all 'ub'
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             flags=re.IGNORECASE))
# Case-sensitive replacement of all 'ub'
print(re.sub('ub', '~*', 'Subject has Uber booked already'))
# Replace only the first 'ub', case-insensitive
print(re.sub('ub', '~*', 'Subject has Uber booked already', count=1,
             flags=re.IGNORECASE))
# Replace "AND" with "&", ignoring case
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam',
             flags=re.IGNORECASE))
Output
S~*ject has ~*er booked already
S~*ject has Uber booked already
S~*ject has Uber booked already
Baked Beans & Spam
5. re.subn()
The re.subn() function works just like re.sub(), but instead of returning only
the modified string, it returns a tuple: (new_string, number_of_substitutions)
Syntax:
re.subn(pattern, repl, string, count=0, flags=0)
Example: Substitution with count
This example shows how re.subn() gives both the replaced string and the
number of times replacements were made.
import re
# Case-sensitive replacement
print(re.subn('ub', '~*', 'Subject has Uber booked already'))
# Case-insensitive replacement
t = re.subn('ub', '~*', 'Subject has Uber booked already',
            flags=re.IGNORECASE)
print(t)
print(len(t)) # tuple length
print(t[0]) # modified string
Output
('S~*ject has Uber booked already', 1)
('S~*ject has ~*er booked already', 2)
2
S~*ject has ~*er booked already
6. re.escape()
The re.escape() function adds a backslash (\) before all special characters in
a string. This is useful when you want to match a string literally, including any
characters that have special meaning in regex (like ., *, [, ], etc.).
Syntax:
re.escape(string)
Example: Escaping special characters
This example shows how re.escape() treats spaces, brackets, dashes, and
tabs as literal characters.
import re
print(re.escape("This is Awesome even 1 AM"))
print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))
Output
This\ is\ Awesome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \ \ \^WoW
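A typical use of re.escape() is searching for text that happens to contain metacharacters. The sketch below uses a made-up literal and sentence to show how the result changes once the literal is escaped.

```python
import re

# Hypothetical literal text that contains the metacharacter '+'.
literal = "1+1=2"
text = "We know that 1+1=2, not 11=2."

# Unescaped, '+' is a quantifier: "one or more 1s, then 1=2".
unescaped = re.findall(literal, text)
# Escaped, every character is matched literally.
escaped = re.findall(re.escape(literal), text)

print(unescaped)  # ['11=2']
print(escaped)    # ['1+1=2']
```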
7. re.search()
The re.search() function searches for the first occurrence of a pattern in a
string. It returns a match object if found, otherwise None.
Note: Use it when you want to check if a pattern exists or extract the first
match.
Example: Search and extract values
This example searches for a date pattern with a month name (letters) followed
by a day (digits) in a sentence.
import re
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "I was born on June 24")
if match:
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match:", match.group(0))
    print("Month:", match.group(1))
    print("Day:", match.group(2))
else:
    print("The regex pattern does not match.")
Output
Match at index 14, 21
Full match: June 24
Month: June
Day: 24
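Because re.search() returns None when nothing matches, calling a method on the result without checking it raises an AttributeError. A minimal sketch of the guard, wrapped in a hypothetical helper named first_number (not part of the re module):

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits as an int, or None."""
    match = re.search(r'\d+', text)
    # re.search returns None when nothing matches, so check before .group()
    return int(match.group()) if match else None

print(first_number("I was born on June 24"))  # 24
print(first_number("no digits here"))         # None
```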
Meta-characters
Metacharacters are special characters in regular expressions used to define
search patterns. The re module in Python supports several metacharacters
that help you perform powerful pattern matching.
Below is a quick reference table:
MetaCharacter   Description
\               Used to drop the special meaning of the character following it
[]              Represents a character class
^               Matches the beginning
$               Matches the end
.               Matches any character except newline
|               Means OR (matches any of the patterns separated by it)
?               Matches zero or one occurrence
*               Any number of occurrences (including 0 occurrences)
+               One or more occurrences
{}              Indicates the number of occurrences of the preceding regex
                to match
()              Encloses a group of regex
Let's discuss each of these metacharacters in detail:
1. \ - Backslash
The backslash (\) makes sure that the character following it is not treated in
a special way. This can be considered a way of escaping metacharacters.
For example, if you search for a dot (.) in a string, the dot is treated as a
special character because it is one of the metacharacters (as shown in the
table above). In this case, place a backslash (\) just before the dot (.) so
that it loses its special meaning. See the example below for a better
understanding.
Example: The first search (re.search(r'.', s)) matches any character, not just
the period, while the second search (re.search(r'\.', s)) specifically looks for
and matches the period character.
import re
s = 'geeks.forgeeks'
# without using \
match = re.search(r'.', s)
print(match)
# using \
match = re.search(r'\.', s)
print(match)
Output
<re.Match object; span=(0, 1), match='g'>
<re.Match object; span=(5, 6), match='.'>
2. [] - Square Brackets
Square Brackets ([]) represent a character class consisting of a set of
characters that we wish to match. For example, the character class [abc] will
match any single a, b, or c.
We can also specify a range of characters using - inside the square brackets.
For example,
● [0-3] is the same as [0123]
● [a-c] is same as [abc]
We can also invert the character class using the caret(^) symbol. For
example,
● [^0-3] means any character except 0, 1, 2, or 3
● [^a-c] means any character except a, b, or c
Example: This code uses a regular expression to find all the characters in
the string that fall within the range 'a' to 'm'. The re.findall() function
returns a list of all such characters. In the given string, the matching
characters are, in order: 'h', 'e', 'i', 'c', 'k', 'b', 'f', 'j', 'm', 'e',
'h', 'e', 'l', 'a', 'd', 'g'.
import re
string = "The quick brown fox jumps over the lazy dog"
pattern = "[a-m]"
result = re.findall(pattern, string)
print(result)
Output
['h', 'e', 'i', 'c', 'k', 'b', 'f', 'j', 'm', 'e', 'h', 'e', 'l',
'a', 'd', 'g']
3. ^ - Caret
Caret (^) symbol matches the beginning of the string i.e. checks whether the
string starts with the given character(s) or not. For example -
● ^g will check if the string starts with g such as geeks, globe, girl, g,
etc.
● ^ge will check if the string starts with ge such as geeks,
geeksforgeeks, etc.
Example: This code uses a regular expression to check whether each string in
a list starts with "The". If a string begins with "The", it is marked as
"Matched"; otherwise, it is labeled as "Not matched".
import re
regex = r'^The'
strings = ['The quick brown fox', 'The lazy dog', 'A quick brown fox']
for string in strings:
    if re.match(regex, string):
        print(f'Matched: {string}')
    else:
        print(f'Not matched: {string}')
Output
Matched: The quick brown fox
Matched: The lazy dog
Not matched: A quick brown fox
4. $ - Dollar
Dollar ($) symbol matches the end of the string, i.e. checks whether the string
ends with the given character(s) or not. For example -
● s$ will check for the string that ends with s such as geeks, ends, s,
etc.
● ks$ will check for the string that ends with ks such as geeks,
geeksforgeeks, ks, etc.
Example: This code uses a regular expression to check if the string ends with
"World!". If a match is found, it prints "Match found!" otherwise, it prints
"Match not found".
import re
string = "Hello World!"
pattern = r"World!$"
match = re.search(pattern, string)
if match:
    print("Match found!")
else:
    print("Match not found.")
Output
Match found!
5. . - Dot
Dot (.) symbol matches any single character except the newline
character (\n). For example -
● a.b will check for the string that contains any character at the place of
the dot such as acb, acbd, abbb, etc.
● .. will check if the string contains at least 2 characters
Example: This code uses a regular expression to search for the pattern
"brown.fox" within the string. The dot (.) in the pattern represents any
character. If a match is found, it prints "Match found!"; otherwise, it prints
"Match not found.".
import re
string = "The quick brown fox jumps over the lazy dog."
pattern = r"brown.fox"
match = re.search(pattern, string)
if match:
    print("Match found!")
else:
    print("Match not found.")
Output
Match found!
6. | - Or
The | operator means either pattern on its left or right can match. a|b will
match any string that contains a or b such as acd, bcd, abcd, etc.
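A short sketch of the strings mentioned above, plus a word-level alternation. The sample strings 'xyz' and the pet sentence are illustrative assumptions.

```python
import re

# 'a|b' succeeds wherever either alternative is found.
samples = ['acd', 'bcd', 'abcd', 'xyz']
either = {s: bool(re.search(r'a|b', s)) for s in samples}
print(either)
# {'acd': True, 'bcd': True, 'abcd': True, 'xyz': False}

# Alternatives can be whole words, not just single characters.
pets = re.findall(r'cat|dog', 'cat, dog, bird, catfish')
print(pets)  # ['cat', 'dog', 'cat']
```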
7. ? - Question Mark
The question mark (?) indicates that the preceding element should be
matched zero or one time. It allows you to specify that the element is optional,
meaning it may occur once or not at all.
For example, ab?c will be matched for the string ac, acb, dabc but will not be
matched for abbc because there are two b. Similarly, it will not be matched for
abdc because b is not followed by c.
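The cases listed above can be checked with re.search(), which looks for the pattern anywhere in the string:

```python
import re

samples = ['ac', 'acb', 'dabc', 'abbc', 'abdc']
# True where 'a', an optional single 'b', then 'c' appears in the string.
optional_b = {s: bool(re.search(r'ab?c', s)) for s in samples}
print(optional_b)
# {'ac': True, 'acb': True, 'dabc': True, 'abbc': False, 'abdc': False}
```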
8. * - Star
Star (*) symbol matches zero or more occurrences of the regex preceding the
* symbol.
For example, ab*c will be matched for the string ac, abc, abbbc, dabc, etc.
but will not be matched for abdc because b is not followed by c.
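The same strings, checked in code as a quick sketch:

```python
import re

samples = ['ac', 'abc', 'abbbc', 'dabc', 'abdc']
# True where 'a', any number of 'b' (including none), then 'c' appears.
zero_or_more_b = {s: bool(re.search(r'ab*c', s)) for s in samples}
print(zero_or_more_b)
# {'ac': True, 'abc': True, 'abbbc': True, 'dabc': True, 'abdc': False}
```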
9. + - Plus
Plus (+) symbol matches one or more occurrences of the regex preceding the
+ symbol.
For example, ab+c will be matched for the string abc, abbc, dabc, but will not
be matched for ac or abdc, because there is no b in ac and b is not followed by
c in abdc.
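A quick sketch confirming these cases:

```python
import re

samples = ['abc', 'abbc', 'dabc', 'ac', 'abdc']
# True where 'a', at least one 'b', then 'c' appears in the string.
one_or_more_b = {s: bool(re.search(r'ab+c', s)) for s in samples}
print(one_or_more_b)
# {'abc': True, 'abbc': True, 'dabc': True, 'ac': False, 'abdc': False}
```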
10. {m,n} - Braces
Braces match from m to n repetitions (both inclusive) of the preceding regex.
The quantifier is written without spaces inside the braces.
For example, a{2,4} will be matched for the string aaab, baaaac, gaad, but
will not be matched for strings like abc, bc because there is only one a or no a
in both the cases.
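A small sketch of these cases. Note that the pattern is written as a{2,4} with no space after the comma; the last check illustrates what Python does when a space is present.

```python
import re

samples = ['aaab', 'baaaac', 'gaad', 'abc', 'bc']
two_to_four = {s: bool(re.search(r'a{2,4}', s)) for s in samples}
print(two_to_four)
# {'aaab': True, 'baaaac': True, 'gaad': True, 'abc': False, 'bc': False}

# The quantifier is greedy: it takes up to four a's when they are available.
longest = re.search(r'a{2,4}', 'baaaaaac').group()
print(longest)  # aaaa

# With a space inside the braces, Python treats the braces as literal text.
with_space = re.search(r'a{2, 4}', 'aaaa')
print(with_space)  # None
```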
11. (<regex>) - Group
Group symbol is used to group sub-patterns.
For example, (a|b)cd will match for strings like acd, abcd, gacd, etc.
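The sketch below checks these strings and shows why the parentheses matter: without them, the alternation applies to everything on either side of the | symbol. The extra sample 'cd' is an illustrative assumption.

```python
import re

samples = ['acd', 'abcd', 'gacd', 'cd']
grouped = {s: bool(re.search(r'(a|b)cd', s)) for s in samples}
print(grouped)
# {'acd': True, 'abcd': True, 'gacd': True, 'cd': False}

# Without parentheses, 'a|bcd' means "a" OR "bcd", so a lone 'a' matches.
ungrouped = re.search(r'a|bcd', 'a')
with_group = re.search(r'(a|b)cd', 'a')
print(bool(ungrouped), with_group)  # True None

# The group also captures which alternative matched.
captured = re.search(r'(a|b)cd', 'abcd').group(1)
print(captured)  # b
```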
Special Sequences
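Special sequences are backslash-prefixed shorthands. Some, such as \A, \b,
and \Z, do not match an actual character in the string; instead they specify
the location in the search string where the match must occur. Others, such as
\d and \w, stand for a common class of characters. They make it easier to write
commonly used patterns.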
List of special sequences
Special    Description                                      Example   Example
Sequence                                                    pattern   strings
\A         Matches if the string begins with the given      \Afor     for geeks
           character                                                  for the world
\b         Matches if the word begins or ends with the      \bge      geeks
           given character. \b(string) will check for the             get
           beginning of the word and (string)\b will check
           for the ending of the word.
\B         It is the opposite of \b, i.e. the string        \Bge      together
           should not start or end with the given regex.              forge
\d         Matches any decimal digit; this is equivalent    \d        123
           to the set class [0-9]                                     gee1
\D         Matches any non-digit character; this is         \D        geeks
           equivalent to the set class [^0-9]                         geek1
\s         Matches any whitespace character.                \s        gee ks
                                                                      a bc a
\S         Matches any non-whitespace character             \S        a bd
                                                                      abcd
\w         Matches any alphanumeric character; this is      \w        123
           equivalent to the class [a-zA-Z0-9_].                      geeKs4
\W         Matches any non-alphanumeric character.          \W        >$
                                                                      gee<>
\Z         Matches if the string ends with the given        ab\Z      abcdab
           regex                                                      abababab
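A few of the sequences from the table can be tried out as follows. This is a minimal sketch; the longer sample strings are illustrative assumptions built from the table's examples.

```python
import re

# \A anchors at the start of the string, \Z at the end.
starts_with_for = bool(re.search(r'\Afor', 'for geeks'))     # True
not_at_start = bool(re.search(r'\Afor', 'geeks for geeks'))  # False
ends_with_ab = re.search(r'ab\Z', 'abcdab').span()           # (4, 6)

# \b matches at a word boundary, \B only where there is no boundary.
at_boundary = re.findall(r'\bge', 'geeks get together')  # in "geeks" and "get"
inside_word = re.findall(r'\Bge', 'geeks get together')  # in "together"

# Character-class shorthands.
digits = re.findall(r'\d', 'gee1')      # ['1']
non_digits = re.findall(r'\D', 'gee1')  # ['g', 'e', 'e']
spaces = re.findall(r'\s', 'a bc a')    # [' ', ' ']
non_word = re.findall(r'\W', 'gee<>')   # ['<', '>']

print(starts_with_for, not_at_start, ends_with_ab)
print(len(at_boundary), len(inside_word))  # 2 1
print(digits, non_digits, spaces, non_word)
```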
Sets for character matching
A Set is a set of characters enclosed in '[]' brackets. Sets are used to match a
single character in the set of characters specified between brackets. Below is
the list of Sets:
Set           Description
{n,}          Quantifies the preceding character or group and matches at
              least n occurrences.
*             Quantifies the preceding character or group and matches zero
              or more occurrences.
[0123]        Matches the specified digits (0, 1, 2, or 3)
[^arn]        Matches any character EXCEPT a, r, and n
\d            Matches any digit (0-9).
[0-5][0-9]    Matches any two-digit number from 00 to 59
\w            Matches any alphanumeric character (a-z, A-Z, 0-9, or _).
[a-n]         Matches any lowercase letter between a and n.
\D            Matches any non-digit character.
[arn]         Matches where one of the specified characters (a, r, or n) is
              present
[a-zA-Z]      Matches any letter between a and z, lowercase OR uppercase
[0-9]         Matches any digit between 0 and 9
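A brief sketch of three of the sets above in use. The sample strings are illustrative assumptions.

```python
import re

# [0-5][0-9]: two-digit numbers from 00 to 59.
minutes = re.findall(r'[0-5][0-9]', 'Times: 07, 45, 59, 60, 99')
print(minutes)  # ['07', '45', '59']

# [^arn]: any character except a, r, and n.
not_arn = re.findall(r'[^arn]', 'brand')
print(not_arn)  # ['b', 'd']

# [a-n]: any lowercase letter between a and n.
a_to_n = re.findall(r'[a-n]', 'Python')
print(a_to_n)  # ['h', 'n']
```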
Match Object
A Match object contains all the information about the search and the result
and if there is no match found then None will be returned. Let's see some of
the commonly used methods and attributes of the match object.
1. Getting the string and the regex
The match.re attribute returns the regular expression passed and the
match.string attribute returns the string passed.
Example:
The code searches for the letter "G" at a word boundary in the string
"Welcome to GeeksForGeeks" and prints the regular expression pattern
(res.re) and the original string (res.string).
import re
s = "Welcome to GeeksForGeeks"
res = re.search(r"\bG", s)
print(res.re)
print(res.string)
Output
re.compile('\\bG')
Welcome to GeeksForGeeks
2. Getting index of matched object
● start() method returns the starting index of the matched substring
● end() method returns the ending index of the matched substring
● span() method returns a tuple containing the starting and the ending
index of the matched substring
Example: Getting index of matched object
The code searches for substring "Gee" at a word boundary in string "Welcome
to GeeksForGeeks" and prints start index of the match (res.start()), end index
of the match (res.end()) and span of the match (res.span()).
import re
s = "Welcome to GeeksForGeeks"
res = re.search(r"\bGee", s)
print(res.start())
print(res.end())
print(res.span())
Output
11
14
(11, 14)
3. Getting matched substring
group() method returns the part of the string for which the patterns match. See
the below example for a better understanding.
Example: Getting matched substring
The code searches for a sequence of two non-digit characters followed by a
space and the letter 't' in the string "Welcome to GeeksForGeeks" and prints
the matched text using res.group().
import re
s = "Welcome to GeeksForGeeks"
res = re.search(r"\D{2} t", s)
print(res.group())
Output
me t
In the above example, the pattern looks for exactly two non-digit characters
followed by a space, and that space is followed by a t.
Basic RegEx Patterns
Let's understand some of the basic regular expressions. They are as follows:
1. Character Classes
Character classes allow matching any one character from a specified set.
They are enclosed in square brackets [].
import re
print(re.findall(r'[Gg]eeks', 'GeeksforGeeks: \
A computer science portal for geeks'))
Output
['Geeks', 'Geeks', 'geeks']
2. Ranges
In RegEx, a range allows matching characters or digits within a span using -
inside []. For example, [0-9] matches digits, [A-Z] matches uppercase letters.
import re
print('Range', re.search(r'[a-zA-Z]', 'x'))
Output
Range <re.Match object; span=(0, 1), match='x'>
3. Negation
Negation in a character class is specified by placing a ^ at the beginning of the
brackets, meaning match anything except those characters.
Syntax:
[^a-z]
Example:
import re
print(re.search(r'[^a-z]', 'c'))
print(re.search(r'G[^e]', 'Geeks'))
Output
None
None
3. Shortcuts
Shortcuts are shorthand representations for common character classes. Let's
discuss some of the shortcuts provided by the regular expression engine.
● \w - matches a word character
● \d - matches digit character
● \s - matches whitespace character (space, tab, newline, etc.)
● \b - matches a zero-width word boundary (the position between a word
character and a non-word character)
import re
print('Geeks:', re.search(r'\bGeeks\b', 'Geeks'))
print('GeeksforGeeks:', re.search(r'\bGeeks\b', 'GeeksforGeeks'))
Output
Geeks: <re.Match object; span=(0, 5), match='Geeks'>
GeeksforGeeks: None
4. Beginning and End of String
The ^ character anchors a match at the beginning of a string and the $
character anchors it at the end of a string.
import re
# Beginning of String
match = re.search(r'^Geek', 'Campus Geek of the month')
print('Beg. of String:', match)
match = re.search(r'^Geek', 'Geek of the month')
print('Beg. of String:', match)
# End of String
match = re.search(r'Geeks$', 'Compute science portal-GeeksforGeeks')
print('End of String:', match)
Output
Beg. of String: None
Beg. of String: <re.Match object; span=(0, 4), match='Geek'>
End of String: <re.Match object; span=(31, 36), match='Geeks'>
5. Any Character
The . character represents any single character outside a bracketed character
class.
import re
print('Any Character', re.search(r'p.th.n', 'python 3'))
Output
Any Character <re.Match object; span=(0, 6), match='python'>
6. Optional Characters
The regular expression engine allows you to specify optional characters using
the ? character. It allows a character or character class to be present either
once or not at all. Let's consider the example of a word with an alternative
spelling - color or colour.
import re
print('Color', re.search(r'colou?r', 'color'))
print('Colour', re.search(r'colou?r', 'colour'))
Output
Color <re.Match object; span=(0, 5), match='color'>
Colour <re.Match object; span=(0, 6), match='colour'>
7. Repetition
Repetition enables you to repeat the same character or character class.
Consider an example of a date that consists of day, month, and year. Let's
use a regular expression to identify the date (dd-mm-yyyy).
import re
print('Date{dd-mm-yyyy}:', re.search(r'[\d]{2}-[\d]{2}-[\d]{4}', '18-08-2020'))
Output
Date{dd-mm-yyyy}: <re.Match object; span=(0, 10), match='18-08-2020'>
Here, the regular expression engine checks for two consecutive digits. Upon
finding the match, it moves to the hyphen character. It then checks the
next two consecutive digits, and the process is repeated.
Let's discuss three other regular expressions under repetition.
7.1 Repetition ranges
The repetition range is useful when you have to accept one or more formats.
Consider a scenario where both three digits, as well as four digits, are
accepted. Let's have a look at the regular expression.
import re
print('Three Digit:', re.search(r'[\d]{3,4}', '189'))
print('Four Digit:', re.search(r'[\d]{3,4}', '2145'))
Output
Three Digit: <re.Match object; span=(0, 3), match='189'>
Four Digit: <re.Match object; span=(0, 4), match='2145'>
7.2 Open-Ended Ranges
There are scenarios where there is no limit on how often a character may
repeat. In such scenarios, you can leave the upper limit open, which makes it
unbounded. A common example is matching street addresses. Let's have a look:
import re
print(re.search(r'[\d]{1,}','5th Floor, A-118,\
Sector-136, Noida, Uttar Pradesh - 201305'))
Output
<re.Match object; span=(0, 1), match='5'>
7.3 Shorthand
Shorthand characters allow you to use the + character to specify one or more
({1,}) and the * character to specify zero or more ({0,}).
import re
print(re.search(r'[\d]+', '5th Floor, A-118,\
Sector-136, Noida, Uttar Pradesh - 201305'))
Output
<re.Match object; span=(0, 1), match='5'>
8. Grouping
Grouping is the process of separating an expression into groups by using
parentheses, and it allows you to fetch each individual matching group.
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp)
Output
<re.Match object; span=(0, 10), match='26-08-2020'>
Let's see some of its functionality.
8.1 Return the entire match
The re module allows you to return the entire match using the group() method
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})','26-08-2020')
print(grp.group())
Output
26-08-2020
8.2 Return a tuple of matched groups
You can use groups() method to return a tuple that holds individual matched
groups
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})','26-08-2020')
print(grp.groups())
Output
('26', '08', '2020')
8.3 Retrieve a single group
Upon passing the index to a group method, you can retrieve just a single
group.
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})','26-08-2020')
print(grp.group(3))
Output
2020
8.4 Name your groups
The re module allows you to name your groups. Let's look into the syntax.
import re
match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.group('mm'))
Output
08
8.5 Individual match as a dictionary
We have seen how a regular expression provides a tuple of individual groups.
It can also provide the individual matches as a dictionary in which
the name of each group acts as the dictionary key.
import re
match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.groupdict())
Output
{'dd': '26', 'mm': '08', 'yyyy': '2020'}
9. Lookahead
A negated character class must still consume a character, so it fails when
there is no character left to check against it: n[^e] cannot match the final n
of "Python" because nothing follows it. A lookahead overcomes this; it accepts
or rejects a match based on the presence or absence of the content that
follows, without consuming any characters.
import re
print('negation:', re.search(r'n[^e]', 'Python'))
print('lookahead:', re.search(r'n(?!e)', 'Python'))
Output
negation: None
lookahead: <re.Match object; span=(5, 6), match='n'>
A lookahead can also require that the match be followed by a particular
character. This is called a positive lookahead, and it is written by
replacing the ! character with the = character.
import re
print('positive lookahead', re.search(r'n(?=e)', 'jasmine'))
Output
positive lookahead <re.Match object; span=(5, 6), match='n'>
10. Substitution
A regular expression can replace part of a string and return the result
using the re.sub() function. It is useful when you want to remove characters
such as /, -, ., etc. before storing a value in a database. It takes three
arguments:
● the regular expression
● the replacement string
● the source string being searched
Let's have a look at the code below, which removes the - character from a
credit card number.
import re
print(re.sub(r'([\d]{4})-([\d]{4})-([\d]{4})-([\d]{4})', r'\1\2\3\4',
             '1111-2222-3333-4444'))
Output
1111222233334444
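Backreferences in the replacement string can also reorder the captured groups. The following sketch rewrites a dd-mm-yyyy date as yyyy-mm-dd, first with numbered groups and then with named groups; the group names dd, mm, and yyyy follow the naming used in the grouping section.

```python
import re

date = '26-08-2020'

# Numbered backreferences: \1 is the day, \2 the month, \3 the year.
iso_numbered = re.sub(r'(\d{2})-(\d{2})-(\d{4})', r'\3-\2-\1', date)

# Named backreferences use \g<name> in the replacement string.
iso_named = re.sub(
    r'(?P<dd>\d{2})-(?P<mm>\d{2})-(?P<yyyy>\d{4})',
    r'\g<yyyy>-\g<mm>-\g<dd>',
    date,
)

print(iso_numbered)  # 2020-08-26
print(iso_named)     # 2020-08-26
```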