Python RegEx

The document provides an overview of Regular Expressions (RegEx) in Python, explaining its purpose and how to use the built-in 're' module for various string manipulation tasks. It details key functions such as re.findall(), re.compile(), re.split(), re.sub(), and re.search(), along with examples demonstrating their usage. Additionally, it covers metacharacters and their significance in defining search patterns.

Uploaded by

Mathura Subhash

Python RegEx

A Regular Expression, or RegEx, is a special sequence of characters that
defines a search pattern for finding a string or set of strings.

It can detect the presence or absence of text by matching it against a
particular pattern, and it can also split a pattern into one or more
sub-patterns.

Regex Module in Python

Python has a built-in module named "re" for working with regular expressions.

Import the re module with the following statement:

import re

How to Use RegEx in Python?

You can use RegEx in Python after importing re module.

Example:

This Python code uses regular expressions to search for the word "portal" in
the given string and then prints the start and end indices of the matched word
within the string.
import re

s = 'GeeksforGeeks: A computer science portal for geeks'

match = re.search(r'portal', s)

print('Start Index:', match.start())
print('End Index:', match.end())

Output
Start Index: 34
End Index: 40

Note: Here the r prefix (r'portal') stands for raw, not regex. A raw string
differs from a regular string in that Python does not interpret the \ character
as an escape character. This matters because the regular expression engine
uses the \ character for its own escaping.
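The difference between a raw and a regular string can be seen in a small sketch. The pattern and text below are made up for illustration; the point is only that "\b" means a backspace character to Python but a word boundary to the regex engine.

```python
import re

# In a regular string literal, "\b" is the backspace character (ASCII 8).
plain = "\bcat\b"
# In a raw string the backslash is kept, so the regex engine receives \b
# and treats it as a word boundary.
raw = r"\bcat\b"

text = "the cat sat"

plain_match = re.search(plain, text)  # looks for backspace + "cat" + backspace
raw_match = re.search(raw, text)      # looks for the whole word "cat"

print(len(plain), len(raw))
print(plain_match)
print(raw_match.group())
```

The regular string is two characters shorter because each "\b" collapsed into a single backspace, which is why only the raw-string pattern finds the word.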

Before starting with the Python regex module let's see how to actually write
regex using metacharacters or special sequences.

RegEx Functions
The re module in Python provides various functions that help search, match,
and manipulate strings using regular expressions.

Below are main functions available in the re module:

Function       Description
re.findall()   Finds and returns all matching occurrences in a list
re.compile()   Compiles a regular expression into a pattern object
re.split()     Splits a string by the occurrences of a character or a pattern
re.sub()       Replaces all occurrences of a character or pattern with a replacement string
re.subn()      Similar to re.sub(), but returns a tuple: (new_string, number_of_substitutions)
re.escape()    Escapes special characters
re.search()    Searches for the first occurrence of a character or pattern

Let's see the working of these RegEx functions with definition and examples:

1. re.findall()
Returns all non-overlapping matches of a pattern in the string as a list. It
scans the string from left to right.

Example: This code uses regular expression \d+ to find all sequences of one
or more digits in the given string.
import re
string = """Hello my Number is 123456789 and
my friend's number is 987654321"""

regex = r'\d+'
match = re.findall(regex, string)
print(match)

Output
['123456789', '987654321']

2. re.compile()
Compiles a regex into a pattern object, which can be reused for matching or
substitutions.

Example 1: The pattern [a-e] matches all lowercase letters between 'a' and
'e' in the input string "Aye, said Mr. Gibenson Stark". The output is
['e', 'a', 'd', 'b', 'e', 'a'], which are the matching characters.
import re
p = re.compile('[a-e]')
print(p.findall("Aye, said Mr. Gibenson Stark"))

Output
['e', 'a', 'd', 'b', 'e', 'a']

Explanation:

● The first occurrence is 'e' in "Aye" and not 'A', as matching is case
sensitive.

● The next occurrence is 'a' in "said", then 'd' in "said", followed by 'b' and
'e' in "Gibenson"; the last 'a' matches in "Stark".
● The metacharacter backslash '\' has a very important role as it signals
various sequences. If the backslash is to be used without its special
meaning as a metacharacter, use '\\'.

Example 2: The code uses regular expressions to find and list all single digits
and sequences of digits in the given input strings. It finds single digits with \d
and sequences of digits with \d+.
import re
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

Output
['1', '1', '4', '1', '8', '8', '6']
['11', '4', '1886']

Example 3: Word and non-word characters

● \w matches a single word character.

● \w+ matches a group of word characters.
● \W matches non-word characters.

import re

p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))

p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he \
said *** in some_language."))

p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))

Output
['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_',
'l', 'a', 'n', 'g']
['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said',
'in', 'some_language']
[' ', ' ', '*', '*', '*', ' ', ' ', '.']

Example 4: The regular expression pattern 'ab*' finds and lists all
occurrences of 'a' followed by zero or more 'b' characters. For the input string
"ababbaabbb", it returns the following list of matches: ['ab', 'abb', 'a', 'abbb'].
import re
p = re.compile('ab*')
print(p.findall("ababbaabbb"))

Output
['ab', 'abb', 'a', 'abbb']

Explanation:

● Output 'ab', is valid because of single 'a' accompanied by single 'b'.

● Output 'abb', is valid because of single 'a' accompanied by 2 'b'.
● Output 'a', is valid because of single 'a' accompanied by 0 'b'.
● Output 'abbb', is valid because of single 'a' accompanied by 3 'b'.
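The main benefit of compiling is reuse: one pattern object can serve several operations. The following is a minimal sketch with a made-up sample string, showing the same compiled pattern used for findall(), search(), and sub().

```python
import re

# Compile once, then reuse the pattern object for several operations.
digits = re.compile(r'\d+')

text = "Order 66 shipped in 3 boxes"  # illustrative input, not from the examples above

all_numbers = digits.findall(text)   # every match as a list
first = digits.search(text)          # first match only (a match object)
masked = digits.sub('#', text)       # replace every match

print(all_numbers)
print(first.group())
print(masked)
```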

3. re.split()
Splits a string wherever the pattern matches. The remaining characters are
returned as list elements.

Syntax:

re.split(pattern, string, maxsplit=0, flags=0)

● pattern: Regular expression to match split points.

● string: The input string to split.
● maxsplit (optional): Limits the number of splits. Default is 0 (no
limit).
● flags (optional): Apply regex flags like re.IGNORECASE.

Example 1: Splitting by non-word characters or digits

This example demonstrates how to split a string using different patterns like
non-word characters (\W+), apostrophes, and digits (\d+).
from re import split

print(split(r'\W+', 'Words, words , Words'))
print(split(r'\W+', "Word's words Words"))
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))

Output
['Words', 'words', 'Words']
['Word', 's', 'words', 'Words']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']
Example 2: Using maxsplit and flags

This example shows how to limit the number of splits using maxsplit, and how
flags can control case sensitivity.
import re
print(re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1))
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here',
               flags=re.IGNORECASE))
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here'))

Output
['On ', 'th Jan 2016, at 11:02 AM']
['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']
['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']

Note: In the second and third cases above, [a-f]+ splits the string on any
run of lowercase letters from 'a' to 'f'. The re.IGNORECASE flag
includes uppercase letters in the match.

4. re.sub()
The re.sub() function replaces all occurrences of a pattern in a string with a
replacement string.

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

● pattern: The regex pattern to search for.

● repl: The string to replace matches with.
● string: The input string to process.
● count (optional): Maximum number of substitutions (default is 0,
which means replace all).
● flags (optional): Regex flags like re.IGNORECASE.

Example 1: The following examples show different ways to replace the
pattern 'ub' with '~*', using various flags and count values.
import re

# Case-insensitive replacement of all 'ub'
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             flags=re.IGNORECASE))

# Case-sensitive replacement of all 'ub'
print(re.sub('ub', '~*', 'Subject has Uber booked already'))

# Replace only the first 'ub', case-insensitive
print(re.sub('ub', '~*', 'Subject has Uber booked already', count=1,
             flags=re.IGNORECASE))

# Replace "AND" with "&", ignoring case
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam',
             flags=re.IGNORECASE))

Output
S~*ject has ~*er booked already
S~*ject has Uber booked already
S~*ject has Uber booked already
Baked Beans & Spam

5. re.subn()
re.subn() function works just like re.sub(), but instead of returning only the
modified string, it returns a tuple: (new_string, number_of_substitutions)

Syntax:
re.subn(pattern, repl, string, count=0, flags=0)

Example: Substitution with count

This example shows how re.subn() gives both the replaced string and the
number of times replacements were made.
import re

# Case-sensitive replacement
print(re.subn('ub', '~*', 'Subject has Uber booked already'))

# Case-insensitive replacement
t = re.subn('ub', '~*', 'Subject has Uber booked already',
            flags=re.IGNORECASE)
print(t)
print(len(t))  # tuple length
print(t[0])    # modified string

Output
('S~*ject has Uber booked already', 1)
('S~*ject has ~*er booked already', 2)
2
S~*ject has ~*er booked already

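6. re.escape()
re.escape() function adds a backslash (\) before all special characters in a
string. This is useful when you want to match a string literally, including any
characters that have special meaning in regex (like ., *, [, ], etc.).

Syntax:

re.escape(string)

Example: Escaping special characters

This example shows how re.escape() treats spaces, brackets, dashes, and
tabs as literal characters.
import re
print(re.escape("This is Awesome even 1 AM"))
print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))

Output
This\ is\ Awesome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \ \ \^WoW

Note: This output is from Python 3.7 or later, where re.escape() escapes only
characters that can have special meaning in a regular expression, so the comma
is left unescaped (older versions print \,). The tab in the second string is
escaped as a backslash followed by a literal tab character, which shows up as
whitespace in the printed output.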

7. re.search()
The re.search() function searches for the first occurrence of a pattern in a
string. It returns a match object if found, otherwise None.

Note: Use it when you want to check if a pattern exists or extract the first
match.

Example: Search and extract values

This example searches for a date pattern with a month name (letters) followed
by a day (digits) in a sentence.
import re

regex = r"([a-zA-Z]+) (\d+)"

match = re.search(regex, "I was born on June 24")

if match:
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match:", match.group(0))
    print("Month:", match.group(1))
    print("Day:", match.group(2))
else:
    print("The regex pattern does not match.")

Output
Match at index 14, 21
Full match: June 24
Month: June
Day: 24

Meta-characters
Metacharacters are special characters in regular expressions used to define
search patterns. The re module in Python supports several metacharacters
that help you perform powerful pattern matching.

Below is a quick reference table:

Metacharacter   Description
\               Drops the special meaning of the character following it
[]              Represents a character class
^               Matches the beginning
$               Matches the end
.               Matches any character except newline
|               Means OR (matches any of the patterns separated by it)
?               Matches zero or one occurrence
*               Matches any number of occurrences (including 0 occurrences)
+               Matches one or more occurrences
{}              Indicates the number of occurrences of the preceding regex to match
()              Encloses a group of regex

Let's discuss each of these metacharacters in detail:

1. \ - Backslash

The backslash (\) makes sure that the character is not treated in a special
way. This can be considered a way of escaping metacharacters.

For example, if you search for a dot (.) in a string, the dot will be treated
as a special character because it is one of the metacharacters (as shown in
the above table). In this case, place a backslash (\) just before the dot (.)
so that it loses its special meaning. See the example below for a better
understanding.

Example: The first search (re.search(r'.', s)) matches any character, not just
the period, while the second search (re.search(r'\.', s)) specifically looks for
and matches the period character.
import re

s = 'geeks.forgeeks'

# without using \
match = re.search(r'.', s)
print(match)

# using \
match = re.search(r'\.', s)
print(match)

Output
<re.Match object; span=(0, 1), match='g'>
<re.Match object; span=(5, 6), match='.'>

2. [] - Square Brackets

Square Brackets ([]) represent a character class consisting of a set of
characters that we wish to match. For example, the character class [abc] will
match any single a, b, or c.

We can also specify a range of characters using - inside the square brackets.
For example,

● [0-3] is the same as [0123]

● [a-c] is the same as [abc]

We can also invert the character class using the caret (^) symbol. For
example,

● [^0-3] means any character except 0, 1, 2, or 3

● [^a-c] means any character except a, b, or c

Example: This code uses a regular expression to find all the characters in
the string that fall within the range 'a' to 'm'. The re.findall() function
returns a list of all such characters, in order of appearance and including
repeats. The pattern is case sensitive, so the 'T' in "The" is not matched.
import re

string = "The quick brown fox jumps over the lazy dog"
pattern = "[a-m]"
result = re.findall(pattern, string)

print(result)

Output
['h', 'e', 'i', 'c', 'k', 'b', 'f', 'j', 'm', 'e', 'h', 'e', 'l',
'a', 'd', 'g']

3. ^ - Caret

Caret (^) symbol matches the beginning of the string i.e. checks whether the
string starts with the given character(s) or not. For example -

● ^g will check if the string starts with g such as geeks, globe, girl, g,
etc.
● ^ge will check if the string starts with ge such as geeks,
geeksforgeeks, etc.

Example: This code uses regular expressions to check whether each string in a
list starts with "The". If a string begins with "The", it's marked as
"Matched"; otherwise, it's labeled as "Not matched".
import re
regex = r'^The'
strings = ['The quick brown fox', 'The lazy dog', 'A quick brown fox']
for string in strings:
    if re.match(regex, string):
        print(f'Matched: {string}')
    else:
        print(f'Not matched: {string}')

Output
Matched: The quick brown fox
Matched: The lazy dog
Not matched: A quick brown fox

4. $ - Dollar

Dollar($) symbol matches the end of the string i.e checks whether the string
ends with the given character(s) or not. For example-

● s$ will check for the string that ends with s such as geeks, ends, s,
etc.
● ks$ will check for the string that ends with ks such as geeks,
geeksforgeeks, ks, etc.

Example: This code uses a regular expression to check if the string ends with
"World!". If a match is found, it prints "Match found!"; otherwise, it prints
"Match not found.".
import re
string = "Hello World!"
pattern = r"World!$"

match = re.search(pattern, string)

if match:
    print("Match found!")
else:
    print("Match not found.")

Output
Match found!

5. . - Dot

Dot (.) symbol matches any single character except the newline
character (\n). For example -

● a.b will check for the string that contains any character at the place of
the dot such as acb, acbd, abbb, etc.
● .. will check if the string contains at least 2 characters

Example: This code uses a regular expression to search for the pattern
"brown.fox" within the string. The dot (.) in the pattern represents any
character. If a match is found, it prints "Match found!"; otherwise, it prints
"Match not found.".
import re

string = "The quick brown fox jumps over the lazy dog."
pattern = r"brown.fox"

match = re.search(pattern, string)

if match:
    print("Match found!")
else:
    print("Match not found.")

Output
Match found!

6. | - Or

The | operator means either pattern on its left or right can match. a|b will
match any string that contains a or b such as acd, bcd, abcd, etc.
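As a quick check of the a|b example, the sketch below runs the pattern against the strings mentioned above plus one made-up string ('xyz') that contains neither letter.

```python
import re

pattern = re.compile(r'a|b')

# 'xyz' is an extra illustrative input that contains neither a nor b.
samples = ['acd', 'bcd', 'abcd', 'xyz']
contains = {s: bool(pattern.search(s)) for s in samples}

# findall() lists each individual match of either alternative.
found = pattern.findall('abcd')

print(contains)
print(found)
```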

7. ? - Question Mark

The question mark (?) indicates that the preceding element should be
matched zero or one time. It allows you to specify that the element is optional,
meaning it may occur once or not at all.

For example, ab?c will be matched for the string ac, acb, dabc but will not be
matched for abbc because there are two b. Similarly, it will not be matched for
abdc because b is not followed by c.

8. * - Star

Star (*) symbol matches zero or more occurrences of the regex preceding the
* symbol.

For example, ab*c will be matched for the string ac, abc, abbbc, dabc, etc.
but will not be matched for abdc because b is not followed by c.

9. + - Plus
Plus (+) symbol matches one or more occurrences of the regex preceding the
+ symbol.

For example, ab+c will be matched for the strings abc, abbc, dabc, but will not
be matched for ac or abdc, because there is no b in ac and b is not followed by
c in abdc.
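The three quantifiers ?, *, and + differ only in how many repetitions of the preceding element they allow. A small sketch comparing them on the same inputs (the sample strings are taken from the examples above):

```python
import re

samples = ['ac', 'abc', 'abbc', 'abdc']
patterns = {'?': r'ab?c', '*': r'ab*c', '+': r'ab+c'}

# For each quantifier, record whether the pattern is found in each sample.
table = {
    q: [bool(re.search(p, s)) for s in samples]
    for q, p in patterns.items()
}

for q, row in table.items():
    print(q, row)
```

Reading the rows in order of the samples: ? accepts zero or one b (ac, abc), * accepts any number (ac, abc, abbc), and + requires at least one (abc, abbc). None of them matches abdc.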

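10. {m,n} - Braces

Braces match between m and n repetitions (both inclusive) of the preceding
regex. Write the quantifier without a space after the comma: Python's re
module does not treat {m, n} (with a space) as a quantifier.

For example, a{2,4} will be matched for the string aaab, baaaac, gaad, but
will not be matched for strings like abc, bc because there is only one a or no a
in both the cases.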
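A short sketch of the brace quantifier on the strings above. The last two lines illustrate the spacing pitfall: as far as I know, Python's re module falls back to treating the braces as literal text when the quantifier contains a space.

```python
import re

pattern = re.compile(r'a{2,4}')

matches = {}
for s in ['aaab', 'baaaac', 'gaad', 'abc', 'bc']:
    m = pattern.search(s)
    matches[s] = m.group() if m else None

print(matches)

# With a space after the comma the braces are matched literally,
# so this pattern does not behave as a quantifier.
spaced_on_letters = re.search(r'a{2, 4}', 'aaaa')
spaced_on_literal = re.search(r'a{2, 4}', 'a{2, 4}')
print(spaced_on_letters, spaced_on_literal is not None)
```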

11. (<regex>) - Group

Group symbol is used to group sub-patterns.

For example, (a|b)cd will match for strings like acd, abcd, gacd, etc.
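The grouping example can be sketched as follows. The string 'cd' is an extra illustrative input that should not match, since the group requires an a or b before cd.

```python
import re

pattern = re.compile(r'(a|b)cd')

results = {}
for s in ['acd', 'abcd', 'gacd', 'cd']:
    m = pattern.search(s)
    # group(0) is the whole match; group(1) is what the parentheses captured.
    results[s] = (m.group(0), m.group(1)) if m else None

print(results)
```

Note that for 'abcd' the match is 'bcd': the group picks the b alternative because only b is directly followed by cd.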

Special Sequences
Special sequences either mark a specific location in the search string where
the match must occur (such as \A or \b) or stand for a predefined class of
characters (such as \d or \w). They make it easier to write commonly used
patterns.

List of special sequences

Special Sequence   Example   Sample Strings             Description
\A                 \Afor     for geeks, for the world   Matches if the string begins with the given character
\b                 \bge      geeks, get                 Matches if the word begins or ends with the given character. \b(string) will check for the beginning of the word and (string)\b will check for the ending of the word.
\B                 \Bge      together, forge            It is the opposite of \b i.e. the string should not start or end with the given regex.
\d                 \d        123, gee1                  Matches any decimal digit, this is equivalent to the set class [0-9]
\D                 \D        geeks, geek1               Matches any non-digit character, this is equivalent to the set class [^0-9]
\s                 \s        gee ks, a bc a             Matches any whitespace character.
\S                 \S        a bd, abcd                 Matches any non-whitespace character
\w                 \w        123, geeKs4                Matches any alphanumeric character, this is equivalent to the class [a-zA-Z0-9_].
\W                 \W        >$, gee<>                  Matches any non-alphanumeric character.
\Z                 ab\Z      abcdab, abababab           Matches if the string ends with the given regex
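A few of these sequences are easier to tell apart by running them. The sketch below is illustrative only; it joins several of the sample words into one made-up string and records where \bge and \Bge match.

```python
import re

text = 'geeks get together forge'

# \bge: 'ge' at the start of a word.
starts_word = [m.start() for m in re.finditer(r'\bge', text)]
# \Bge: 'ge' that is NOT at the start of a word.
inside_word = [m.start() for m in re.finditer(r'\Bge', text)]

# \A anchors at the start of the string, \Z at the end.
begins = re.search(r'\Afor', 'for geeks') is not None
begins_later = re.search(r'\Afor', 'geeks for') is not None
ends_span = re.search(r'ab\Z', 'abcdab').span()

print(starts_word, inside_word)
print(begins, begins_later, ends_span)
```

The two position lists do not overlap: \bge finds the ge that opens "geeks" and "get", while \Bge finds the ge inside "together" and "forge".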

Sets for character matching

A Set is a set of characters enclosed in '[]' brackets. Sets are used to match a
single character in the set of characters specified between brackets. Below is
the list of Sets:

Set          Description
{n,}         Quantifies the preceding character or group and matches at least n occurrences
*            Quantifies the preceding character or group and matches zero or more occurrences
[0123]       Matches the specified digits (0, 1, 2, or 3)
[^arn]       Matches any character EXCEPT a, r, and n
\d           Matches any digit (0-9)
[0-5][0-9]   Matches any two-digit number from 00 to 59
\w           Matches any alphanumeric character (a-z, A-Z, 0-9, or _)
[a-n]        Matches any lowercase letter between a and n
\D           Matches any non-digit character
[arn]        Matches where one of the specified characters (a, r, or n) is present
[a-zA-Z]     Matches any letter between a and z, lower case OR upper case
[0-9]        Matches any digit between 0 and 9
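A brief sketch of a few sets from the list. Using [0-5][0-9] to check a minutes field is an illustrative use, not something the list prescribes; fullmatch() is used so the whole string must fit the pattern.

```python
import re

# [0-5][0-9]: any two-digit number from 00 to 59 (e.g. a minutes field).
minutes = re.compile(r'[0-5][0-9]')
valid = [bool(minutes.fullmatch(s)) for s in ['07', '59', '60']]

# [arn] matches a, r, or n; [^arn] matches everything else.
in_set = re.findall(r'[arn]', 'rain')
not_in_set = re.findall(r'[^arn]', 'rain')

print(valid)
print(in_set, not_in_set)
```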

Match Object
A Match object contains all the information about the search and the result
and if there is no match found then None will be returned. Let's see some of
the commonly used methods and attributes of the match object.

1. Getting the string and the regex

match.re attribute returns the regular expression passed and match.string
attribute returns the string passed.

Example:

The code searches for the letter "G" at a word boundary in the string
"Welcome to GeeksForGeeks" and prints the regular expression pattern
(res.re) and the original string (res.string).
import re
s = "Welcome to GeeksForGeeks"
res = re.search(r"\bG", s)

print(res.re)
print(res.string)

Output
re.compile('\\bG')
Welcome to GeeksForGeeks
2. Getting index of matched object

● start() method returns the starting index of the matched substring

● end() method returns the ending index of the matched substring
● span() method returns a tuple containing the starting and the ending
index of the matched substring

Example: Getting index of matched object

The code searches for substring "Gee" at a word boundary in string "Welcome
to GeeksForGeeks" and prints start index of the match (res.start()), end index
of the match (res.end()) and span of the match (res.span()).
import re

s = "Welcome to GeeksForGeeks"
res = re.search(r"\bGee", s)

print(res.start())
print(res.end())
print(res.span())

Output
11
14
(11, 14)

3. Getting matched substring

group() method returns the part of the string for which the patterns match. See
the below example for a better understanding.

Example: Getting matched substring

The code searches for a sequence of two non-digit characters followed by a
space and the letter 't' in the string "Welcome to GeeksForGeeks" and prints
the matched text using res.group().
import re
s = "Welcome to GeeksForGeeks"
res = re.search(r"\D{2} t", s)
print(res.group())

Output
me t

In the above example, the pattern matches exactly two non-digit characters
followed by a space, and that space is followed by a t.

Basic RegEx Patterns

Let's understand some of the basic regular expressions. They are as follows:

1. Character Classes

Character classes allow matching any one character from a specified set.
They are enclosed in square brackets [].
import re
print(re.findall(r'[Gg]eeks', 'GeeksforGeeks: \
A computer science portal for geeks'))

Output
['Geeks', 'Geeks', 'geeks']

2. Ranges
In RegEx, a range allows matching characters or digits within a span using -
inside []. For example, [0-9] matches digits, [A-Z] matches uppercase letters.
import re
print('Range', re.search(r'[a-zA-Z]', 'x'))

Output
Range <re.Match object; span=(0, 1), match='x'>

3. Negation

Negation in a character class is specified by placing a ^ at the beginning of the

brackets, meaning match anything except those characters.

Syntax:

[^a-z]

Example:
import re

print(re.search(r'[^a-z]', 'c'))
print(re.search(r'G[^e]', 'Geeks'))

Output
None
None

3. Shortcuts
Shortcuts are shorthand representations for common character classes. Let's
discuss some of the shortcuts provided by the regular expression engine.

● \w - matches a word character

● \d - matches a digit character
● \s - matches a whitespace character (space, tab, newline, etc.)
● \b - matches a word boundary (a zero-length position, not a character)

import re

print('Geeks:', re.search(r'\bGeeks\b', 'Geeks'))
print('GeeksforGeeks:', re.search(r'\bGeeks\b', 'GeeksforGeeks'))

Output
Geeks: <re.Match object; span=(0, 5), match='Geeks'>
GeeksforGeeks: None

4. Beginning and End of String

The ^ character matches the beginning of a string and the $ character
matches the end of a string.
import re

# Beginning of String
match = re.search(r'^Geek', 'Campus Geek of the month')
print('Beg. of String:', match)

match = re.search(r'^Geek', 'Geek of the month')
print('Beg. of String:', match)

# End of String
match = re.search(r'Geeks$', 'Compute science portal-GeeksforGeeks')
print('End of String:', match)

Output
Beg. of String: None
Beg. of String: <re.Match object; span=(0, 4), match='Geek'>
End of String: <re.Match object; span=(31, 36), match='Geeks'>

5. Any Character

The . character represents any single character outside a bracketed character
class.
import re
print('Any Character', re.search(r'p.th.n', 'python 3'))

Output
Any Character <re.Match object; span=(0, 6), match='python'>

6. Optional Characters

The regular expression engine allows you to specify optional characters using
the ? character. It allows a character or character class either to be present
once or not to occur at all. Let's consider the example of a word with an
alternative spelling - color or colour.
import re

print('Color', re.search(r'colou?r', 'color'))
print('Colour', re.search(r'colou?r', 'colour'))

Output
Color <re.Match object; span=(0, 5), match='color'>
Colour <re.Match object; span=(0, 6), match='colour'>

7. Repetition

Repetition enables you to repeat the same character or character class.

Consider an example of a date that consists of day, month, and year. Let's
use a regular expression to identify the date (dd-mm-yyyy).
import re
print('Date{dd-mm-yyyy}:', re.search(r'[\d]{2}-[\d]{2}-[\d]{4}', '18-08-2020'))

Output
Date{dd-mm-yyyy}: <re.Match object; span=(0, 10), match='18-08-2020'>

Here, the regular expression engine checks for two consecutive digits. Upon
finding the match, it moves to the hyphen character. It then checks the
next two consecutive digits, and the process is repeated.

Let's discuss three other regular expressions under repetition.

7.1 Repetition ranges

The repetition range is useful when you have to accept one or more formats.
Consider a scenario where both three digits, as well as four digits, are
accepted. Let's have a look at the regular expression.
import re

print('Three Digit:', re.search(r'[\d]{3,4}', '189'))
print('Four Digit:', re.search(r'[\d]{3,4}', '2145'))

Output
Three Digit: <re.Match object; span=(0, 3), match='189'>
Four Digit: <re.Match object; span=(0, 4), match='2145'>

7.2 Open-Ended Ranges

There are scenarios where there is no limit on character repetition. In such
scenarios, you can leave the upper limit open, which makes it infinite. A
common example is matching street addresses. Let's have a look.
import re

print(re.search(r'[\d]{1,}', '5th Floor, A-118, '
                'Sector-136, Noida, Uttar Pradesh - 201305'))

Output
<re.Match object; span=(0, 1), match='5'>

7.3 Shorthand

Shorthand characters allow you to use the + character to specify one or more
({1,}) and the * character to specify zero or more ({0,}).
import re

print(re.search(r'[\d]+', '5th Floor, A-118, '
                'Sector-136, Noida, Uttar Pradesh - 201305'))

Output
<re.Match object; span=(0, 1), match='5'>

8. Grouping

Grouping is the process of separating an expression into groups by using
parentheses, and it allows you to fetch each individual matching group.
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp)

Output
<re.Match object; span=(0, 10), match='26-08-2020'>

Let's see some of its functionality.

8.1 Return the entire match

The re module allows you to return the entire match using the group() method.
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp.group())

Output

26-08-2020

8.2 Return a tuple of matched groups

You can use the groups() method to return a tuple that holds the individual
matched groups.
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp.groups())

Output

('26', '08', '2020')

8.3 Retrieve a single group

Upon passing the index to a group method, you can retrieve just a single
group.
import re
grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp.group(3))

Output

2020

8.4 Name your groups

The re module allows you to name your groups using the (?P<name>...) syntax.
Let's look into the syntax.
import re
match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.group('mm'))

Output

08

8.5 Individual match as a dictionary

We have seen how regular expression provides a tuple of individual groups.
Not only tuple, but it can also provide individual match as a dictionary in which
the name of each group acts as the dictionary key.
import re
match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.groupdict())

Output

{'dd': '26', 'mm': '08', 'yyyy': '2020'}

9. Lookahead

In the case of a negated character class, the match fails if there is no
character present to check against the negated class. We can overcome this
by using a lookahead, which accepts or rejects a match based on the presence
or absence of the content that follows, without consuming it.
import re
print('negation:', re.search(r'n[^e]', 'Python'))
print('lookahead:', re.search(r'n(?!e)', 'Python'))

Output
negation: None
lookahead: <re.Match object; span=(5, 6), match='n'>

The form above, (?!...), is a negative lookahead. A lookahead can also
disqualify the match if it is not followed by a particular character. This is
called a positive lookahead, and can be achieved by simply replacing the !
character with the = character.
import re
print('positive lookahead', re.search(r'n(?=e)', 'jasmine'))

Output
positive lookahead <re.Match object; span=(5, 6), match='n'>
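Because a lookahead checks what follows without including it in the match, it is handy for extracting a value based on its suffix. The sketch below uses a made-up CSS-like string; the patterns are illustrative assumptions, not part of the examples above.

```python
import re

text = 'width 100px, height 50em, margin 8px'  # hypothetical input

# Positive lookahead: digits that are followed by 'px'.
# The 'px' itself is not part of the match.
px_values = re.findall(r'\d+(?=px)', text)

# Negative lookahead: digits NOT followed by 'px'. The extra \d alternative
# stops the engine from backtracking to a shorter number such as '10' in '100px'.
other_values = re.findall(r'\d+(?!\d|px)', text)

print(px_values)
print(other_values)
```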

10. Substitution

A regular expression can replace parts of a string and return the result
using the re.sub() method. It is useful when you want to strip characters such
as /, -, ., etc. before storing a value in a database. It takes three arguments:

● the regular expression

● the replacement string
● the source string being searched

Let's have a look at the below code that removes the - character from a credit
card number.
import re
print(re.sub(r'([\d]{4})-([\d]{4})-([\d]{4})-([\d]{4})', r'\1\2\3\4',
             '1111-2222-3333-4444'))

Output

1111222233334444

Mastering Regular Expressions Basics
128 pages
Python Regular Expressions Guide
No ratings yet
Python Regular Expressions Guide
8 pages
Python Regular Expressions Tutorial
No ratings yet
Python Regular Expressions Tutorial
23 pages
Python Regular Expressions Guide
No ratings yet
Python Regular Expressions Guide
17 pages
Python Regular Expressions Guide
No ratings yet
Python Regular Expressions Guide
18 pages
Python Multithreading Overview
No ratings yet
Python Multithreading Overview
72 pages
Web Server Security Vulnerabilities Guide
No ratings yet
Web Server Security Vulnerabilities Guide
45 pages
Candidate Evaluation: Govinda Pedhiwal
No ratings yet
Candidate Evaluation: Govinda Pedhiwal
11 pages
Frontend Development Plan for Blogs
No ratings yet
Frontend Development Plan for Blogs
7 pages
Java Generics Lab: Type-Safe Coding
No ratings yet
Java Generics Lab: Type-Safe Coding
8 pages
Toga App: Sum and Difference GUI
No ratings yet
Toga App: Sum and Difference GUI
40 pages
System Design Interview Preparation Guide
No ratings yet
System Design Interview Preparation Guide
245 pages
PAM Audit Checklist
No ratings yet
PAM Audit Checklist
2 pages
Exponential Smoothing in Time Series Forecasting
No ratings yet
Exponential Smoothing in Time Series Forecasting
5 pages
Evolution of Computing Systems Explained
No ratings yet
Evolution of Computing Systems Explained
5 pages
Types of Multiprocessor Operating Systems
No ratings yet
Types of Multiprocessor Operating Systems
5 pages
Virginia Tech Data Annotation Expert
No ratings yet
Virginia Tech Data Annotation Expert
3 pages
Web Proxies for Penetration Testing
No ratings yet
Web Proxies for Penetration Testing
60 pages
Whitepaper JavaBeans
No ratings yet
Whitepaper JavaBeans
114 pages
Managing HIL SIL and MIL Simulation With SIM WB
No ratings yet
Managing HIL SIL and MIL Simulation With SIM WB
14 pages
VMAX All Flash and VMAX3 Configuration Management Course Description
No ratings yet
VMAX All Flash and VMAX3 Configuration Management Course Description
3 pages
OS Practice Questions for Assessment 5.1.11
No ratings yet
OS Practice Questions for Assessment 5.1.11
9 pages
Currency Conversion in IBS Project
No ratings yet
Currency Conversion in IBS Project
4 pages
CAD Software and Hardware Essentials
No ratings yet
CAD Software and Hardware Essentials
5 pages
Data Analytics Course Overview
No ratings yet
Data Analytics Course Overview
8 pages
Product Data Sheet Ovation Excitation en 657152
No ratings yet
Product Data Sheet Ovation Excitation en 657152
10 pages
Blood Bank and Donor Management System
No ratings yet
Blood Bank and Donor Management System
32 pages
Information Systems and Technology Overview
No ratings yet
Information Systems and Technology Overview
71 pages
CC Functional Matrix
No ratings yet
CC Functional Matrix
6 pages
Air Ticket Reservation Project Overview
No ratings yet
Air Ticket Reservation Project Overview
25 pages
Microsoft Word Features & Versions
No ratings yet
Microsoft Word Features & Versions
7 pages
Free Morph PowerPoint Template
No ratings yet
Free Morph PowerPoint Template
10 pages
Java Primitive Data Types Explained
No ratings yet
Java Primitive Data Types Explained
6 pages
Mirth 1.7.1 Reference Guide Overview
No ratings yet
Mirth 1.7.1 Reference Guide Overview
144 pages
One Touch Hotel Management Software
No ratings yet
One Touch Hotel Management Software
13 pages
IoT Course Week 7 Assignment Questions
No ratings yet
IoT Course Week 7 Assignment Questions
10 pages