
Python Regular Expression

A regular expression is a sequence of characters used to match strings, supported in Python by the re module. The document outlines various regular expression symbols, special characters, and methods for matching patterns, including functions like match(), search(), findall(), sub(), and split(). It provides examples and explanations for using these features effectively in Python programming.

Uploaded by saradha
Copyright © All Rights Reserved

REGULAR EXPRESSION

A regular expression is a special sequence of characters that helps you match or find other
strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are
widely used in the UNIX world.

The module re provides support for Perl-like regular expressions in Python. The re module
raises the exception re.error if an error occurs while compiling or using a regular expression.

The following table lists the regular expression symbols available in Python:

Pattern Description

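^ Matches beginning of string (beginning of each line with re.MULTILINE).

$ Matches end of string or just before a trailing newline (end of each line with re.MULTILINE).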

. Matches any single character except newline. The re.DOTALL (re.S) flag allows it to match newline as well.

[...] Matches any single character in brackets.

[^...] Matches any single character not in brackets.

re* Matches 0 or more occurrences of preceding expression.

re+ Matches 1 or more occurrences of preceding expression.

re? Matches 0 or 1 occurrence of preceding expression.

re{n} Matches exactly n occurrences of preceding expression.

re{n,} Matches n or more occurrences of preceding expression.

re{n,m} Matches at least n and at most m occurrences of preceding expression.

a|b Matches either a or b.

(re) Groups regular expressions and remembers matched text.
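The flag-dependent behaviour of ., ^ and $ in the table can be checked with a short sketch. It assumes Python 3 and the standard re module; the sample text and variable names are illustrative only.

```python
import re

text = "first line\nsecond line"  # illustrative sample text

# Without flags, '.' stops at the newline, so only the first line is matched.
no_flag = re.match(r".+", text).group()            # 'first line'

# re.DOTALL (alias re.S) lets '.' match the newline as well.
dotall = re.match(r".+", text, re.DOTALL).group()  # the whole string

# Without flags, '^' anchors only at the start of the whole string.
starts = re.findall(r"^\w+", text)                 # ['first']

# re.MULTILINE (alias re.M) makes '^' and '$' match at every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)  # ['first', 'second']
line_ends = re.findall(r"\w+$", text, re.MULTILINE)    # ['line', 'line']

print(no_flag, starts, line_starts, line_ends)
```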

Special Characters

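\w Matches word characters: letters, digits and the underscore. For ASCII text, equivalent to [A-Za-z0-9_]; str patterns also match Unicode word characters by default.

\W Matches nonword characters; the inverse of \w.

\s Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].

\S Matches non-whitespace.

\d Matches any decimal digit. For ASCII text, equivalent to [0-9].

\D Matches non-digits.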

\A Matches beginning of string.

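\Z Matches end of string only. Unlike Perl's \Z, it does not match before a trailing newline.

\z Perl syntax for the absolute end of string. Older Python versions reject it; use \Z, which already behaves this way in Python.

\G Perl syntax for the point where the last match finished. Not supported by Python's re module.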

\b Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.

\B Matches nonword boundaries.

\n, \t, etc. Matches newlines, carriage returns, tabs, etc.
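A few of the special characters above, shown on concrete strings. This is a sketch assuming Python 3's re module with default (Unicode) str patterns; the sample strings and variable names are illustrative.

```python
import re

sample = "id_7: 42 apples\tand 3 pears"  # illustrative sample text

words = re.findall(r"\w+", sample)   # ['id_7', '42', 'apples', 'and', '3', 'pears']
digits = re.findall(r"\d+", sample)  # ['7', '42', '3']
spaces = re.findall(r"\s", sample)   # [' ', ' ', '\t', ' ', ' ']

# \A anchors to the start of the string even with re.MULTILINE.
at_start = re.findall(r"\A\w+", "one\ntwo", re.MULTILINE)  # ['one']

# \Z is the absolute end; '$' may also match just before a trailing newline.
at_end = re.findall(r"\w+\Z", "one\ntwo\n")      # []
dollar_end = re.findall(r"\w+$", "one\ntwo\n")   # ['two']

# Inside brackets, \b means the backspace character (0x08).
backspace = re.findall(r"[\b]", "a\bb")          # ['\x08']
```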

Matching more than one RE pattern with Alternation

| - is used to choose one of several alternative regular expressions.

Eg: at|home - at or home

Matches any single character except for newline (.)

Eg: f.o - fao, fvo, f9o, f#o,..

.end - any single character followed by the string end
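The alternation and dot patterns above can be tried with re.findall(). A minimal sketch assuming Python 3's re module; the input strings are made up for illustration.

```python
import re

alternatives = re.findall(r"at|home", "at home, or at the office")  # ['at', 'home', 'at']

dotted = re.findall(r"f.o", "fao fvo f9o f#o fo")  # ['fao', 'fvo', 'f9o', 'f#o']
no_newline = re.match(r"f.o", "f\no")              # None: '.' does not match '\n'

ends = re.findall(r".end", "bend send end")        # ['bend', 'send', ' end']
```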

Matching from the Beginning or End of Strings or Word Boundaries

^ - Matches beginning of string (or of each line, with re.MULTILINE).

$ - Matches end of string (or of each line, with re.MULTILINE).

\Z - Matches end of string.

\b - Matches word boundaries when outside brackets.

\B - Matches nonword boundaries.

Eg:

RE pattern

^from - any string that starts with from

/bin/tcsh$ - any string that ends with /bin/tcsh

\bthe - any word that starts with ‘the’

\Bthe - ‘the’ inside a word but not at its beginning
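The four anchor patterns above, run against sample strings. A sketch assuming Python 3's re module; `match_starts` is a hypothetical helper introduced here, and the input strings are illustrative.

```python
import re


def match_starts(pattern, text):
    """Hypothetical helper: start index of every non-overlapping match."""
    return [m.start() for m in re.finditer(pattern, text)]


from_ok = re.search(r"^from", "from: alice") is not None          # True
from_bad = re.search(r"^from", "mail from bob") is not None       # False
tcsh_ok = re.search(r"/bin/tcsh$", "SHELL=/bin/tcsh") is not None  # True

sentence = "theory of the other bathe"
word_initial = match_starts(r"\bthe", sentence)  # [0, 10]: 'theory', 'the'
inside_word = match_starts(r"\Bthe", sentence)   # [15, 22]: 'other', 'bathe'
```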

Creating character classes ([ ])


[ ] - Matches any single character in brackets.

RE pattern

b[aeiu]t - bat, bet, bit, but

[cr][23][dp] - c2d, c3d, c2p, c3p, …
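Each bracketed class consumes exactly one character, which a quick check with re.findall() shows. A minimal sketch assuming Python 3's re module; the input strings are illustrative.

```python
import re

# 'bot' is skipped because 'o' is not in the class [aeiu].
vowels = re.findall(r"b[aeiu]t", "bat bet bit bot but")  # ['bat', 'bet', 'bit', 'but']

# One character from each of the three classes, in order.
codes = re.findall(r"[cr][23][dp]", "c2d r3p c4d x2p")    # ['c2d', 'r3p']
```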

Ranges (-) and Negation (^)

- - range of characters inside brackets, e.g. [a-z]

^ - as the first character inside brackets, matches any character not in the given set

Eg:

RE pattern

z.[0-9] - z followed by any character then followed by a single digit.

[^aeiou] - any character that is not a lowercase vowel.
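Ranges and negation in practice, including what goes wrong when a space precedes the caret. A sketch assuming Python 3's re module; the sample strings are illustrative.

```python
import re

z_digit = re.findall(r"z.[0-9]", "za1 z-9 zz zb")     # ['za1', 'z-9']
lower_runs = re.findall(r"[a-z]+", "Hello World 42")  # ['ello', 'orld']

# '^' negates the set only when it comes first inside the brackets:
# deleting every non-vowel leaves just the vowels.
only_vowels = re.sub(r"[^aeiou]", "", "regular expression")  # 'euaeeio'

# With a leading space, '^' is just a literal member of the set.
literal_caret = re.findall(r"[ ^aeiou]", "a^b c")  # ['a', '^', ' ']
```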

Multiple Occurrence/Repetition Using Closure Operators

* - Matches 0 or more occurrences of preceding expression.

+ - Matches 1 or more occurrences of preceding expression.

? - Matches 0 or 1 occurrence of preceding expression.

{n} - Matches exactly n occurrences of preceding expression.

{n,m} - Matches at least n and at most m occurrences of preceding expression.

{n,} - Matches n or more occurrences of preceding expression.

Eg:
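Each closure operator can be shown on a short string. This sketch assumes Python 3's re module; \b is added around the digit runs so that whole numbers are compared rather than pieces of longer ones. The inputs are illustrative.

```python
import re

star = re.findall(r"ab*", "a ab abbb")             # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")             # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']

numbers = "1 12 123 1234"
exactly_three = re.findall(r"\b\d{3}\b", numbers)   # ['123']
two_to_three = re.findall(r"\b\d{2,3}\b", numbers)  # ['12', '123']
three_or_more = re.findall(r"\b\d{3,}\b", numbers)  # ['123', '1234']
```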

re module:
1. re.compile(pattern, flags=0)
Compile a regular expression pattern into a regular expression object, which can
be used for matching using its match() and search() methods, described below.

The expression’s behaviour can be modified by specifying a flags value. Values
can be flag constants such as re.IGNORECASE, re.MULTILINE and re.DOTALL, combined using bitwise OR (the | operator).

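The sequence

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

but saving the compiled object for reuse is more efficient when the expression is used several times in one program.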
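A sketch of re.compile() with two flags combined by bitwise OR, assuming Python 3's re module. The log text and the pattern are made up for illustration.

```python
import re

# Two flags combined with bitwise OR.
prog = re.compile(r"^error: (\w+)", re.IGNORECASE | re.MULTILINE)

log = "info: start\nERROR: disk\nerror: net"  # illustrative log text
codes = prog.findall(log)                     # ['disk', 'net']

# The compiled object gives the same result as the module-level call.
compiled = prog.match("Error: cpu").group(1)  # 'cpu'
direct = re.match(r"^error: (\w+)", "Error: cpu", re.IGNORECASE).group(1)  # 'cpu'
```

Without re.MULTILINE, ^ would anchor only at the start of the log, so neither error line would be found.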


2. match.group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the
result is a single string; if there are multiple arguments, the result is a tuple with
one item per argument. Without arguments, group1 defaults to zero (the whole
match is returned). If a groupN argument is zero, the corresponding return value
is the entire matching string; if it is in the inclusive range [1..99], it is the string
matching the corresponding parenthesized group. If a group number is negative
or larger than the number of groups defined in the pattern,
an IndexError exception is raised. If a group is contained in a part of the
pattern that did not match, the corresponding result is None. If a group is
contained in a part of the pattern that matched multiple times, the last match is
returned.

>>> m = re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123')
>>> m.group()      # The entire match
'abc-123'
>>> m.group(1)     # The first parenthesized subgroup
'abc'
>>> m.group(2)     # The second parenthesized subgroup
'123'
>>> m.groups()
('abc', '123')
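The remaining rules in the group() description (several arguments, out-of-range numbers, groups that did not participate, and repeated groups) can be checked directly. A sketch assuming Python 3's re module; the patterns are illustrative.

```python
import re

m = re.match(r"(\w\w\w)-(\d\d\d)", "abc-123")
pair = m.group(1, 2)  # ('abc', '123'): several arguments give a tuple

out_of_range = False
try:
    m.group(3)  # the pattern defines only two groups
except IndexError:
    out_of_range = True

unmatched = re.match(r"(a)|(b)", "b").group(1)  # None: group 1 did not participate
last = re.match(r"(..)+", "a1b2c3").group(1)    # 'c3': only the last repetition is kept
```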
3. match.groups(default=None)

Return a tuple containing all the subgroups of the match, from 1 up to however many
groups are in the pattern. The default argument is used for groups that did not
participate in the match; it defaults to None.

For example:

>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> [Link]()
('24', '1632')
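The default argument only matters when a group did not take part in the match. A minimal sketch assuming Python 3's re module; the optional second group is what makes the difference here.

```python
import re

m = re.match(r"(\d+)\.?(\d+)?", "24")
plain = m.groups()      # ('24', None): the second group did not participate
filled = m.groups("0")  # ('24', '0'): the default replaces None
```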

4. The match Function

This function tries to match the RE pattern at the beginning of string, with
optional flags.

Here is the syntax for this function:

re.match(pattern, string, flags=0)

Here is the description of the parameters:

Parameter Description

pattern This is the regular expression to be matched.

string This is the string, which is matched against the pattern starting at its first character.

flags You can specify different flags using bitwise OR (|). These are modifiers
such as re.IGNORECASE, re.MULTILINE and re.DOTALL.

The re.match function returns a match object on success, None on failure.


We use the group(num) or groups() method of the match object to get the matched
expression.

Match Object Method Description

group(num=0) This method returns the entire match (or specific subgroup num).

groups() This method returns all matching subgroups in a tuple (empty if there weren't any).
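A small check of the two methods on a pattern without parentheses. This is a sketch assuming Python 3's re module; the input string is illustrative.

```python
import re

m = re.match(r"\d+", "2024 was a year")
whole = m.group()       # '2024'
same = m.group(0)       # '2024': num defaults to 0
subgroups = m.groups()  # (): the pattern has no parenthesized groups
```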

Example
1. m = re.match('foo', 'foo')
   if m is not None:
       print(m.group())    # foo

   The match succeeds.

2. m = re.match('foo', 'bar')
   if m is not None:
       print(m.group())

   The pattern does not match, so m is None and nothing is printed.

5. re.search(pattern, string, flags=0)
search() checks for a match anywhere in the string.

Eg:

1. m = re.search('foo', 'seafood')
   if m is not None:
       print(m.group())    # foo

   The match succeeds: 'foo' is found inside 'seafood'.
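The difference between match() and search() on the same input, as a minimal sketch assuming Python 3's re module.

```python
import re

anchored = re.match("foo", "seafood")   # None: 'foo' is not at the start
anywhere = re.search("foo", "seafood")  # found inside the string
position = anywhere.span()              # (3, 6)

# A leading '^' makes search() behave like match() (without re.MULTILINE).
caret = re.search("^foo", "seafood")    # None
```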

6. Matching more than one string (|)

1. bt = 'bat|bit|bet'

   m = re.match(bt, 'bat')

   print(m.group())    # bat

2. m = re.match(bt, 'he bit me')

   Does not match: match() only looks at the beginning of the string.

   m = re.search(bt, 'he bit me')

   Match: search() finds 'bit'.

7. Matching any single character (.)

1. anyend = '.end'

   m = re.match(anyend, 'bend')

   print(m.group())    # bend

2. m = re.match(anyend, 'end')
   Does not match: no character precedes 'end'.
3. m = re.match(anyend, '\nend')
   Does not match, i.e. . matches any char except \n.
4. m = re.search(anyend, ' The end')
   Match: m.group() is ' end'.

8. Finding Every Occurrence with findall()

It is similar to search() in that it performs a string search, but it differs from
match() and search() in that findall() always returns a list (empty if nothing matches).

1. >>> re.findall('car', 'car')

   ['car']

2. >>> re.findall('car', 'scary')

   ['car']

3. >>> re.findall('car', 'carry the barcardi to the car')

   ['car', 'car', 'car']
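Two details worth knowing alongside findall(): when the pattern contains groups, the list holds the group text (or tuples of groups) rather than whole matches, and re.finditer() yields match objects if positions are needed. A sketch assuming Python 3's re module; the key=value string is illustrative.

```python
import re

text = "carry the barcardi to the car"
positions = [m.start() for m in re.finditer("car", text)]  # [0, 13, 26]

pairs = "a=1, b=22, c=333"                   # illustrative input
names = re.findall(r"(\w+)=\d+", pairs)      # ['a', 'b', 'c']
both = re.findall(r"(\w+)=(\d+)", pairs)     # [('a', '1'), ('b', '22'), ('c', '333')]
nothing = re.findall("car", "bus")           # []
```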

9. Searching and replacing with sub() and subn()

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

Replace all occurrences of the RE pattern in string with repl, substituting all
occurrences unless a nonzero count is provided.

subn() is exactly the same as sub(), but it returns a tuple of the new string and the total number of
substitutions made.

>>> re.sub('X', 'Smith', 'attn: X Dear X')

'attn: Smith Dear Smith'

>>> re.subn('X', 'Smith', 'attn: X Dear X')

('attn: Smith Dear Smith', 2)
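Limiting replacements with count, unpacking the subn() tuple, and using group backreferences in repl. A sketch assuming Python 3's re module; count is passed by keyword, and the sample strings are illustrative.

```python
import re

letter = "attn: X Dear X"
first_only = re.sub("X", "Smith", letter, count=1)  # 'attn: Smith Dear X'
new_text, total = re.subn("X", "Smith", letter)     # 'attn: Smith Dear Smith', 2

# Backreferences such as \1 and \2 in repl insert the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Smith John")  # 'John Smith'
```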

10. Splitting with split()

Syntax:

re.split(pattern, string, maxsplit=0, flags=0)

Split string into a list using occurrences of the RE pattern as delimiters and return the list of pieces,
splitting at most maxsplit times (0 means no limit).

Eg:

>>> re.split(':', 'str1:str2:str3')

['str1', 'str2', 'str3']
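Because the delimiter is a regular expression, split() can handle mixed separators. maxsplit caps the number of splits, and a capturing group keeps the delimiters in the result. A sketch assuming Python 3's re module; the inputs are illustrative.

```python
import re

fields = re.split(r"[,;]\s*", "a, b;c,  d")            # ['a', 'b', 'c', 'd']
limited = re.split(":", "str1:str2:str3", maxsplit=1)  # ['str1', 'str2:str3']
with_delims = re.split(r"(:)", "str1:str2")            # ['str1', ':', 'str2']
```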
