COMP(2041|9044) 25T1 — Python Regular Expressions

Regular Expression History Revisited

1950s: mathematician Stephen Kleene develops the theory
1960s: Ken Thompson develops syntax and a practical implementation, two versions:
    POSIX Basic Regular Expressions
        limited syntax, e.g. no |
        used by grep & sed
        needed when computers were very slow, to make regex matching faster
    POSIX Extended Regular Expressions - superset of Basic Regular Expressions
        used by grep -E & sed -E
1980s: Henry Spencer produces an open source regex C library
    used in many places, e.g. postgresql, tcl
    extended Ken’s regex language with added features & syntax
1987: Perl (Larry Wall) copied Henry’s library & extended it much further
    available outside Perl via the Perl Compatible Regular Expressions (PCRE) library
    used by grep -P
1990s: Python standard re package also copied Henry’s library
    added most of the features in Perl/PCRE
    many commonly used features are common to both
we will cover some (not all) useful extra regex features found in both Python & Perl/PCRE
note [Link] lets you specify which regex language

Python re package - useful functions

re.search(regex, string, flags)

# search for a *regex* match within *string*
# return object with information about match or `None` if match fails
# optional parameter flags modifies matching,
# e.g. make matching case-insensitive with: `flags=re.I`

re.match(regex, string, flags)

# only match at start of string
# same as `re.search` with a regex starting with `^`

re.fullmatch(regex, string, flags)

# only match the full string
# same as `re.search` with a regex starting with `^` and ending with `$`
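The difference between the three functions is easiest to see side by side. The following is a minimal sketch; the sample string and the variable names are made up for illustration.

```python
import re

text = "unit COMP2041 runs in 25T1"   # hypothetical sample string

# re.search finds a match anywhere in the string
found_anywhere = re.search(r"\d+", text)

# re.match only succeeds if the match begins at index 0
found_at_start = re.match(r"\d+", text)           # None: text starts with 'u'
found_word_at_start = re.match(r"[a-z]+", text)   # matches 'unit'

# re.fullmatch requires the regex to match the entire string
whole_string = re.fullmatch(r"[A-Z]{4}\d{4}", "COMP2041")
not_whole = re.fullmatch(r"[A-Z]{4}", "COMP2041")  # None: digits left over

# flags=re.I makes matching case-insensitive
case_insensitive = re.search(r"comp", text, flags=re.I)

print(found_anywhere.group(0))       # 2041
print(found_at_start)                # None
print(found_word_at_start.group(0))  # unit
print(whole_string.group(0))         # COMP2041
print(not_whole)                     # None
print(case_insensitive.group(0))     # COMP
```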

Python re package - useful functions

re.sub(regex, replacement, string, count, flags)

# return *string* with every match of *regex* substituted by *replacement*
# optional parameter *count*, if non-zero, sets maximum number of substitutions

re.findall(regex, string, flags)

# return all non-overlapping matches of pattern in string
# if pattern contains () return part matched by ()
# if pattern contains multiple () return a list of tuples

re.split(regex, string, maxsplit, flags)

# split *string* everywhere *regex* matches
# optional parameter *maxsplit*, if non-zero, sets maximum number of splits
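A short sketch of the optional count and maxsplit parameters, using a made-up sample string. They are passed as keyword arguments here, which keeps the calls readable.

```python
import re

line = "a1b22c333d"   # hypothetical sample string

# count=0 (the default) replaces every match; count=2 stops after two
all_replaced = re.sub(r"\d+", "#", line)
two_replaced = re.sub(r"\d+", "#", line, count=2)

# findall returns every non-overlapping match as a list of strings
digit_runs = re.findall(r"\d+", line)

# maxsplit=1 splits at the first match only
first_split = re.split(r"\d+", line, maxsplit=1)

print(all_replaced)   # a#b#c#d
print(two_replaced)   # a#b#c333d
print(digit_runs)     # ['1', '22', '333']
print(first_split)    # ['a', 'b22c333d']
```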

Python Character Classes (also in PCRE)

\d matches any digit, for ASCII: [0-9]
\D matches any non-digit, for ASCII: [^0-9]
\w matches any word char, for ASCII: [a-zA-Z_0-9]
\W matches any non-word char, for ASCII: [^a-zA-Z_0-9]
\s matches any whitespace, for ASCII: [ \t\n\r\f\v]
\S matches any non-whitespace, for ASCII: [^ \t\n\r\f\v]
\b matches at a word boundary
\B matches except at a word boundary
\A matches only at the start of the string (^ also matches after each newline when re.M is set)
\Z matches only at the end of the string ($ also matches just before a final newline)

these classes are convenient and make your regex more likely to be portable to non-English locales
\b and \B are like ^ and $ - they don’t match characters, they anchor the match
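The classes and anchors above can be tried out as follows. This is a small sketch with made-up sample strings, not part of the course code.

```python
import re

# \d, \w and \s stand in for the bracketed classes listed above
digits = re.findall(r"\d+", "tel 9385 4329")
words = re.findall(r"\w+", "snake_case, x2!")
fields = re.split(r"\s+", "a \t b\nc")

# \b anchors at a word boundary without consuming a character
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
# \B is the opposite: here 'cat' must not start at a word boundary
inside_word = re.findall(r"\Bcat", "cat concat")

# $ also matches just before a final newline, \Z only at the very end
dollar = re.search(r"end$", "the end\n")
capital_z = re.search(r"end\Z", "the end\n")

print(digits)       # ['9385', '4329']
print(words)        # ['snake_case', 'x2']
print(fields)       # ['a', 'b', 'c']
print(whole_words)  # ['cat', 'cat']
print(inside_word)  # ['cat']
print(dollar is not None, capital_z is None)  # True True
```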

raw strings

a Python raw string is prefixed with an r (for raw)

the r prefix can be used with strings quoted with ' " ''' or """
backslashes have no special meaning in a raw string, except before quotes
backslashes escape quotes but also stay in the string
regexes often contain backslashes - using raw strings makes them more readable

>>> print('Hello\nAndrew')
Hello
Andrew
>>> print(r'Hello\nAndrew')
Hello\nAndrew
>>> r'Hello\nAndrew' == 'Hello\\nAndrew'
True
>>> len('\n')
1
>>> len(r'\n')
2
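Why this matters for regexes can be seen with \b, which means backspace in an ordinary Python string but word boundary in a regex. A minimal sketch, with variable names chosen only for illustration:

```python
import re

# In an ordinary string '\b' is the backspace character (one char),
# so the regex engine never sees the word-boundary escape.
plain = '\bcat\b'
raw = r'\bcat\b'

print(len(plain))   # 5: backspace, c, a, t, backspace
print(len(raw))     # 7: backslash and b are separate characters

plain_result = re.search(plain, "a cat sat")
raw_result = re.search(raw, "a cat sat")
print(plain_result)           # None
print(raw_result.group(0))    # cat

# Without a raw string every backslash has to be doubled
doubled_result = re.search('\\bcat\\b', "a cat sat")
print(doubled_result.group(0))  # cat
```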

Match objects

re.search, re.match, re.fullmatch return a match object if a match succeeds, None if it fails
hence their return value can be used to control if or while

print("Destroy the file system? ")
answer = input()
if re.search(r'yes|ok|affirmative', answer, flags=re.I):
    subprocess.run("rm -r /", shell=True)

the match object can provide useful information:

>>> m = re.search(r'[aiou].*[aeiou]', 'pillow')
>>> m
<re.Match object; span=(1, 5), match='illo'>
>>> m.group(0)
'illo'
>>> m.span()
(1, 5)
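A match object can drive a while loop as well as an if. The sketch below assumes two things not shown on the slides: re.compile, which builds a reusable pattern object, and the optional start position accepted by that object's search method. The sample string is made up.

```python
import re

text = "x=10, y=20, z=30"   # hypothetical sample string

# A match object is truthy and None is falsy, so search results can
# drive a while loop: each pass resumes after the previous match.
pattern = re.compile(r"\d+")
position = 0
spans = []
while m := pattern.search(text, position):
    spans.append((m.group(0), m.start(), m.end()))
    position = m.end()

print(spans)  # [('10', 2, 4), ('20', 8, 10), ('30', 14, 16)]
```

m.start() and m.end() are the two halves of the tuple returned by m.span().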

Capturing Parts of a Regex Match

brackets are used for grouping (like arithmetic) in extended regular expressions
in Python (& PCRE) brackets also capture the part of the string matched
group(n) returns the part of the string matched by the nth pair of brackets

>>> m = re.search(r'(\w+)\s+(\w+)', 'Hello Andrew')
>>> m.groups()
('Hello', 'Andrew')
>>> m.group(1)
'Hello'
>>> m.group(2)
'Andrew'

\number can be used to refer to a group number in an re.sub replacement string

>>> re.sub(r'(\d+) and (\d+)', r'\2 or \1', "The answer is 42 and 43?")
'The answer is 43 or 42?'
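Capturing and \number replacement combine naturally when reordering parts of a string. The following is a sketch; the function name given_name_first and the sample date string are hypothetical.

```python
import re

def given_name_first(full_name):
    """Rewrite 'Surname, Given Names' as 'Given Names Surname'."""
    # group 1 is everything before the comma, group 2 everything after it
    return re.sub(r"^([^,]+),\s*(.+)$", r"\2 \1", full_name)

print(given_name_first("Hopper, Grace Brewster Murray"))
# Grace Brewster Murray Hopper

# groups() unpacks neatly when the number of brackets is known
m = re.search(r"(\d+)-(\d+)-(\d+)", "due 2025-03-14 at noon")
year, month, day = m.groups()
print(year, month, day)   # 2025 03 14
```

A name without a comma does not match the regex, so re.sub returns it unchanged.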

Back-referencing

\number can also be used within a regex

usually called a back-reference
e.g. r'^(\d+) (\1)$' matches the same integer twice

>>> re.search(r'^(\d+) (\d+)$', '42 43')
<re.Match object; span=(0, 5), match='42 43'>
>>> re.search(r'^(\d+) (\1)$', '42 43')
>>> re.search(r'^(\d+) (\1)$', '42 42')
<re.Match object; span=(0, 5), match='42 42'>

back-references allow matching that is impossible with classical regular expressions

Python supports up to 99 back-references, \1, \2, \3, …, \99

\01 or \100 is interpreted as an octal number
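A common use of back-references is spotting accidentally doubled words. This is a sketch only; the function names are hypothetical and re.I is used so that "The the" also counts as a repeat.

```python
import re

def repeated_words(text):
    """Return each word that is immediately followed by itself."""
    # \1 must match the same text that group 1 captured
    return re.findall(r"\b(\w+)\s+\1\b", text, flags=re.I)

def collapse_repeats(text):
    """Replace a run of the same word with a single copy."""
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.I)

print(repeated_words("it is is the the best"))    # ['is', 'the']
print(collapse_repeats("it is is the the best"))  # it is the best
print(repeated_words("This is fine"))             # []
```

The \b anchors stop the "is" inside "This" from being paired with the following "is".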

Non-Capturing Group

(?:...) is a non-capturing group

it has the same grouping behaviour as (...)
it doesn’t capture the part of the string matched by the group

>>> m = re.search(r'.*(?:[aeiou]).*([aeiou]).*', 'abcde')
>>> m
<re.Match object; span=(0, 5), match='abcde'>
>>> m.group(1)
'e'
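The choice between (...) and (?:...) is most visible with re.findall, which returns captured text when the regex has a capturing group. A minimal sketch with made-up sample strings:

```python
import re

text = "ababab cd abab"   # hypothetical sample string

# Capturing group: findall returns what the group last captured
captured = re.findall(r"(ab)+", text)

# Non-capturing group: same grouping, but findall returns whole matches
whole = re.findall(r"(?:ab)+", text)

# Non-capturing groups do not take up a group number
m = re.search(r"(?:Mr|Ms|Dr)\.? (\w+)", "Dr. Hopper")

print(captured)      # ['ab', 'ab']
print(whole)         # ['ababab', 'abab']
print(m.group(1))    # Hopper
```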

Greedy versus non-Greedy Pattern Matching

The default semantics for pattern matching is greedy:

    start the match at the first place it can succeed
    make the match as long as possible

Adding ? after a repetition operator (e.g. *? or +?) changes it to non-greedy:

    start the match at the first place it can succeed
    make the match as short as possible

>>> s = "abbbc"
>>> re.sub(r'ab+', 'X', s)
'Xc'
>>> re.sub(r'ab+?', 'X', s)
'Xbbc'
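The classic place where greedy matching surprises people is text with repeated delimiters. A small sketch, using a made-up HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"   # hypothetical sample string

# Greedy: .* runs to the last '>' in the string
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first '>' that lets the match succeed
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Non-greedy still has to let the rest of the regex match
greedy_groups = re.search(r"(\d+)(0*)$", "120000").groups()
lazy_groups = re.search(r"(\d+?)(0*)$", "120000").groups()
print(greedy_groups)  # ('120000', '')
print(lazy_groups)    # ('12', '0000')
```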

Why Implementing Regex Matching isn’t Easy

regex matching starts the match at the first place it can succeed

but a regex can partly match in many places

>>> re.sub(r'ab+c', 'X', "abbabbbbbbbabbbc")
'abbabbbbbbbX'

and may need to backtrack, e.g.:

>>> re.sub(r'a.*bc', 'X', "abbabbbbbbbcabbb")
'Xabbb'

poorly designed regex engines can get very slow

slow regex matching has been used for denial-of-service attacks
Python (PCRE) regex matching is NP-hard due to back-references
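The cost of backtracking can be observed directly. The sketch below does not use back-references; it swaps in a simpler, well-known trigger (a nested repetition that can never match) as an assumed illustration. The function name time_failed_match is hypothetical and the timings depend on the machine, so none are shown.

```python
import re
import time

def time_failed_match(n):
    """Time a match that must fail: n 'a's followed by a 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    # (a+)+ can divide the run of a's in many ways, and a backtracking
    # engine tries them all before giving up at the final 'b'
    result = re.search(r"^(a+)+$", text)
    return result, time.perf_counter() - start

for n in (10, 14, 18):
    result, seconds = time_failed_match(n)
    print(n, result, f"{seconds:.4f}s")
```

Each extra character roughly doubles the work on a backtracking engine, which is why n is kept small here.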

re.findall

re.findall returns a list of the matched strings, e.g.:

>>> re.findall(r'\d+', "-5==10zzz200_")
['5', '10', '200']

if the regex contains () only the captured text is returned

>>> re.findall(r'(\d)\d*', "-5==10zzz200_")
['5', '1', '2']

if the regex contains multiple () a list of tuples is returned

>>> re.findall(r'(\d)\d*(\d)', "-5==10zzz200_")
[('1', '0'), ('2', '0')]
>>> re.findall(r'([^,]*), (\S+)', "Hopper, Grace Brewster Murray")
[('Hopper', 'Grace')]
>>> re.findall(r'([A-Z])([aeiou])', "Hopper, Grace Brewster Murray")
[('H', 'o'), ('M', 'u')]
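Because a two-group regex gives a list of pairs, the result of re.findall can be passed straight to dict. A minimal sketch, with a made-up input line:

```python
import re

config_line = "host=localhost port=5432 user=andrew"   # hypothetical input

# two groups, so findall returns a list of (key, value) tuples
pairs = re.findall(r"(\w+)=(\S+)", config_line)
settings = dict(pairs)

print(pairs)
# [('host', 'localhost'), ('port', '5432'), ('user', 'andrew')]
print(settings["port"])   # 5432
```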

re.split

re.split splits a string where a regex matches

>>> re.split(r'\d+', "-5==10zzz200_")
['-', '==', 'zzz', '_']

like cut in Shell scripts - but more powerful

for example, you can’t do this with cut

>>> re.split(r'\s*,\s*', "abc,de, ghi ,jk , mn")
['abc', 'de', 'ghi', 'jk', 'mn']

see also the string join function

>>> a = re.split(r'\s*,\s*', "abc,de, ghi ,jk , mn")
>>> a
['abc', 'de', 'ghi', 'jk', 'mn']
>>> ':'.join(a)
'abc:de:ghi:jk:mn'
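Two details of re.split are worth trying out: capturing brackets in the separator regex keep the separators in the result, and a separator at either end of the string produces an empty string. The sketch below uses made-up inputs.

```python
import re

text = "-5==10zzz200_"

# A capturing group in the separator keeps the separators in the result
with_separators = re.split(r"(\d+)", text)

# A separator at the start or end of the string produces an empty string
edges = re.split(r",", ",a,b,")

# maxsplit=1 leaves the rest of the string unsplit
fields = re.split(r"\s*:\s*", "name : Grace : Hopper", maxsplit=1)

print(with_separators)  # ['-', '5', '==', '10', 'zzz', '200', '_']
print(edges)            # ['', 'a', 'b', '']
print(fields)           # ['name', 'Grace : Hopper']
```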

Example - printing the last number

# Print the last number (real or integer) on every line
# Note: regexp to match number: -?\d+\.?\d*
# Note: use of assignment operator :=
import re, sys
for line in sys.stdin:
    if m := re.search(r'(-?\d+\.?\d*)\D*$', line):
        print(m.group(1))
source code for print_last_number.py
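The same regex can be wrapped in a function so it is easy to try on individual lines. This is a sketch; the function name last_number and the sample lines are hypothetical.

```python
import re

def last_number(line):
    """Return the last number on line as a string, or None if there is none."""
    # \D*$ insists that only non-digits follow the captured number
    if m := re.search(r"(-?\d+\.?\d*)\D*$", line):
        return m.group(1)
    return None

print(last_number("a 1 b 22 c 3.5 d\n"))   # 3.5
print(last_number("temp -4\n"))            # -4
print(last_number("no digits here\n"))     # None
print(last_number("pages 10-5\n"))         # -5
```

The last call shows a limitation of the regex: a hyphen directly before the final number is always read as a minus sign.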

Example - finding numbers #0

# print the sum and mean of any positive integers found on stdin
# Note regexp to split on non-digits
# Note check to handle empty string from split
# Only positive integers handled
import re, sys
input_as_string = sys.stdin.read()
numbers = re.split(r"\D+", input_as_string)
total = 0
n = 0
for number in numbers:
    if number:
        total += int(number)
        n += 1
# re.split always returns at least one string, so test n rather than numbers
if n:
    print(f"{n} numbers, total {total}, mean {total / n:.1f}")
source code for find_numbers.[Link]

Example - finding numbers #1

# print the sum and mean of any numbers found on stdin
# Note regexp to match number -?\d+\.?\d*
# match positive & negative integers & floating-point numbers
import re, sys
input_as_string = sys.stdin.read()
numbers = re.findall(r"-?\d+\.?\d*", input_as_string)
n = len(numbers)
total = sum(float(number) for number in numbers)
if numbers:
    print(f"{n} numbers, total {total}, mean {total / n:.1f}")
source code for find_numbers.[Link]
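The two versions behave differently on the same text, which a small experiment makes clear. The sample string below is made up for illustration.

```python
import re

text = "3 apples, -2.5 degrees, 10 items"   # hypothetical input

# Splitting on non-digits loses signs and decimal points,
# and leaves an empty string when the text ends with a non-digit
pieces = re.split(r"\D+", text)

# Matching the numbers themselves keeps them intact
numbers = re.findall(r"-?\d+\.?\d*", text)
total = sum(float(number) for number in numbers)

print(pieces)                 # ['3', '2', '5', '10', '']
print(numbers)                # ['3', '-2.5', '10']
print(total, total / len(numbers))   # 10.5 3.5
```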

Example - counting enrollments with regexes & dicts

import re
course_names = {}
with open(COURSE_CODES_FILE, encoding="utf-8") as f:
    for line in f:
        if m := re.match(r"(\S+)\s+(.*\S)", line):
            course_names[m.group(1)] = m.group(2)
enrollments_count = {}
with open(ENROLLMENTS_FILE, encoding="utf-8") as f:
    for line in f:
        # delete everything from the first | to the end of the line
        course_code = re.sub(r"\|.*\n", "", line)
        if course_code not in enrollments_count:
            enrollments_count[course_code] = 0
        enrollments_count[course_code] += 1
for (course_code, enrollment) in sorted(enrollments_count.items()):
    # if no name for course_code use ???
    name = course_names.get(course_code, "???")
    print(f"{enrollment:4} {course_code} {name}")
source code for count_enrollments.[Link]

Example - counting enrollments with split & counters
import collections
course_names = {}
with open(COURSE_CODES_FILE, encoding="utf-8") as f:
    for line in f:
        course_code, course_name = line.strip().split("\t", maxsplit=1)
        course_names[course_code] = course_name
enrollments_count = collections.Counter()
with open(ENROLLMENTS_FILE, encoding="utf-8") as f:
    for line in f:
        course_code = line.split("|")[0]
        enrollments_count[course_code] += 1
for (course_code, enrollment) in sorted(enrollments_count.items()):
    # if no name for course_code use ???
    name = course_names.get(course_code, "???")
    print(f"{enrollment:4} {course_code} {name}")
source code for count_enrollments.[Link]
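What collections.Counter saves compared with a plain dict can be seen on a few in-memory lines. The sample lines below are invented; they only assume the "course|student number|name" field order that the code above relies on.

```python
import collections

# hypothetical enrollment lines: course code | student number | name
lines = [
    "COMP1511|5000001|Hopper, Grace\n",
    "COMP2041|5000002|Lovelace, Ada\n",
    "COMP1511|5000003|Turing, Alan\n",
]

# Counter: a missing key counts as 0, so no membership test is needed
counts = collections.Counter()
for line in lines:
    counts[line.split("|")[0]] += 1

# Plain dict: the default has to be supplied by hand
plain = {}
for line in lines:
    code = line.split("|")[0]
    plain[code] = plain.get(code, 0) + 1

print(sorted(counts.items()))   # [('COMP1511', 2), ('COMP2041', 1)]
print(counts == plain)          # True
```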

Example - counting first names

import collections, re
already_counted = set()
first_name_count = collections.Counter()
with open(ENROLLMENTS_FILE, encoding="utf-8") as f:
    for line in f:
        _, student_number, full_name = line.split("|")[0:3]
        # count each student once, however many courses they take
        if student_number in already_counted:
            continue
        already_counted.add(student_number)
        if m := re.match(r".*,\s+(\S+)", full_name):
            first_name = m.group(1)
            first_name_count[first_name] += 1
# put the count first in the tuples so sorting orders on count before name
count_name_tuples = [(c, f) for (f, c) in first_name_count.items()]
# print first names in decreasing order of popularity
for (count, first_name) in sorted(count_name_tuples, reverse=True):
    print(f"{count:4} {first_name}")
source code for count_first_names.py

Example - finding duplicate first names using dict of dicts

import re, sys
course_first_name_count = {}
with open(ENROLLMENTS_FILE, encoding="utf-8") as f:
    for line in f:
        course_code, _, full_name = line.split("|")[0:3]
        if m := re.match(r".*,\s+(\S+)", full_name):
            first_name = m.group(1)
        else:
            print("Warning could not parse line", line.strip(), file=sys.stderr)
            continue
        if course_code not in course_first_name_count:
            course_first_name_count[course_code] = {}
        if first_name not in course_first_name_count[course_code]:
            course_first_name_count[course_code][first_name] = 0
        course_first_name_count[course_code][first_name] += 1
for course in sorted(course_first_name_count.keys()):
    for (first_name, count) in course_first_name_count[course].items():
        if count >= REPORT_MORE_THAN_STUDENTS:
            print(course, "has", count, "students named", first_name)
source code for duplicate_first_names.[Link]

Example - finding duplicate first names using split & defaultdict of counters
import collections
course_first_name_count = collections.defaultdict(collections.Counter)
with open(ENROLLMENTS_FILE, encoding="utf-8") as f:
    for line in f:
        course_code, _, full_name = line.split("|")[0:3]
        given_names = full_name.split(",")[1].strip()
        first_name = given_names.split(" ")[0]
        course_first_name_count[course_code][first_name] += 1
for (course, name_counts) in sorted(course_first_name_count.items()):
    for (first_name, count) in name_counts.items():
        if count > REPORT_MORE_THAN_STUDENTS:
            print(course, "has", count, "students named", first_name)
source code for duplicate_first_names.[Link]
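The defaultdict of Counters can be exercised on a few in-memory lines. The data below is invented and only assumes the "course|student number|Surname, Given names" layout used above.

```python
import collections

# hypothetical enrollment lines: course code | student number | name
lines = [
    "COMP1511|5000001|Hopper, Grace Brewster\n",
    "COMP1511|5000002|Kelly, Grace\n",
    "COMP2041|5000003|Lovelace, Ada\n",
]

# an unseen course code gets an empty Counter,
# and an unseen first name within it starts at 0
name_counts = collections.defaultdict(collections.Counter)
for line in lines:
    course_code, _, full_name = line.split("|")[0:3]
    given_names = full_name.split(",")[1].strip()
    first_name = given_names.split(" ")[0]
    name_counts[course_code][first_name] += 1

print(name_counts["COMP1511"]["Grace"])   # 2
print(name_counts["COMP2041"]["Ada"])     # 1
```

Unlike the regex version, which prints a warning, this split-based version raises IndexError for a name that contains no comma.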

Example - Changing Filenames with Regex

# written by andrewt@[Link] for COMP(2041|9044)
#
# Change the names of the specified files
# by substituting occurrences of regex with replacement
# (simple version of the perl utility rename)
import os
import re
import sys
if len(sys.argv) < 3:
    print(f"Usage: {sys.argv[0]} <regex> <replacement> [files]", file=sys.stderr)
    sys.exit(1)
regex = sys.argv[1]
replacement = sys.argv[2]
for old_pathname in sys.argv[3:]:
    new_pathname = re.sub(regex, replacement, old_pathname, count=1)
    if new_pathname == old_pathname:
        continue
    # never overwrite an existing file
    if os.path.exists(new_pathname):
        print(f"{sys.argv[0]}: '{new_pathname}' exists", file=sys.stderr)
        continue
    try:
        os.rename(old_pathname, new_pathname)
    except OSError as e:
        print(f"{sys.argv[0]}: '{new_pathname}' {e}", file=sys.stderr)
source code for rename_regex.py

Example - Changing Filenames with Regex & Eval

# written by andrewt@[Link] for COMP(2041|9044)
#
# Change the names of the specified files
# by substituting occurrences of regex with replacement
# (simple version of the perl utility rename)
#
# also demonstrating argument processing and use of eval
# beware eval can allow arbitrary code execution,
# it should not be used where security is important
import argparse
import os
import re
import sys
parser = argparse.ArgumentParser()
# add required arguments
parser.add_argument("regex", type=str, help="match against filenames")
parser.add_argument("replacement", type=str, help="replaces matches with this")
parser.add_argument("filenames", nargs="*", help="filenames to be changed")
# add some optional boolean arguments
parser.add_argument(
    "-d", "--dryrun", action="store_true", help="show changes but don't make them"
)
parser.add_argument(
    "-v", "--verbose", action="store_true", help="print more information"
)
parser.add_argument(
    "-e",
    "--eval",
    action="store_true",
    help="evaluate replacement as python expression, match available as _",
)
# optional integer argument which defaults to 1
parser.add_argument(
    "-n",
    "--replace_n_matches",
    type=int,
    default=1,
    help="replace n matches (0 for all matches)",
)
args = parser.parse_args()
def eval_replacement(match):
    """if --eval given, evaluate replacement string as Python
    with the variable _ set to the matching part of the filename
    """
    if not args.eval:
        return args.replacement
    _ = match.group(0)
    return str(eval(args.replacement))
for old_pathname in args.filenames:
    try:
        new_pathname = re.sub(
            args.regex, eval_replacement, old_pathname, count=args.replace_n_matches
        )
    except OSError as e:
        print(
            f"{sys.argv[0]}: '{old_pathname}': '{args.replacement}' {e}",
            file=sys.stderr,
        )
        continue
    if new_pathname == old_pathname:
        if args.verbose:
            print("no change:", old_pathname)
        continue
    if os.path.exists(new_pathname):
        print(f"{sys.argv[0]}: '{new_pathname}' exists", file=sys.stderr)
        continue
    if args.dryrun:
        print(old_pathname, "would be renamed to", new_pathname)
        continue
    if args.verbose:
        print("renaming", old_pathname, "to", new_pathname)
    try:
        os.rename(old_pathname, new_pathname)
    except OSError as e:
        print(f"{sys.argv[0]}: '{new_pathname}' {e}", file=sys.stderr)
source code for rename_regex_eval.py
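The script above passes a function, not a string, as the replacement argument of re.sub. A minimal sketch of that feature on its own, without eval; the function name double_number and the filenames are made up.

```python
import re

def double_number(match):
    """Replacement function: receives a match object, returns a string."""
    return str(int(match.group(0)) * 2)

# the function is called once for each match
all_doubled = re.sub(r"\d+", double_number, "lab03_q7.py")
first_doubled = re.sub(r"\d+", double_number, "lab03_q7.py", count=1)

# a lambda works the same way
shouted = re.sub(r"[a-z]+", lambda m: m.group(0).upper(), "lab03.py", count=1)

print(all_doubled)     # lab6_q14.py
print(first_doubled)   # lab6_q7.py
print(shouted)         # LAB03.py
```

The string returned by a replacement function is inserted literally, so escapes such as \1 are not expanded in it.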

Example - When Harry Met Hermione #0

# For each file given as argument replace occurrences of Hermione
# allowing for some misspellings with Harry and vice-versa.
# Relies on Zaphod not occurring in the text.
import re, sys, os
for filename in sys.argv[1:]:
    tmp_filename = filename + ".new"
    if os.path.exists(tmp_filename):
        print(f"{sys.argv[0]}: {tmp_filename} already exists\n", file=sys.stderr)
        sys.exit(1)
    with open(filename) as f:
        with open(tmp_filename, "w") as g:
            for line in f:
                changed_line = re.sub(r"Herm[io]+ne", "Zaphod", line)
                changed_line = changed_line.replace("Harry", "Hermione")
                changed_line = changed_line.replace("Zaphod", "Harry")
                g.write(changed_line)
    os.rename(tmp_filename, filename)
source code for change_names.[Link]

Example - When Harry Met Hermione #1

# For each file given as argument replace occurrences of Hermione
# allowing for some misspellings with Harry and vice-versa.
# Relies on Zaphod not occurring in the text.
import re, sys, os, shutil, tempfile
for filename in sys.argv[1:]:
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp:
        with open(filename) as f:
            for line in f:
                changed_line = re.sub(r"Herm[io]+ne", "Zaphod", line)
                changed_line = changed_line.replace("Harry", "Hermione")
                changed_line = changed_line.replace("Zaphod", "Harry")
                tmp.write(changed_line)
    shutil.move(tmp.name, filename)
source code for change_names.[Link]

Example - When Harry Met Hermione #2

# For each file given as argument replace occurrences of Hermione
# allowing for some misspellings with Harry and vice-versa.
# Relies on Zaphod not occurring in the text.
# modified text is stored in a list then file over-written
import re, sys, os
for filename in sys.argv[1:]:
    changed_lines = []
    with open(filename) as f:
        for line in f:
            changed_line = re.sub(r"Herm[io]+ne", "Zaphod", line)
            changed_line = changed_line.replace("Harry", "Hermione")
            changed_line = changed_line.replace("Zaphod", "Harry")
            changed_lines.append(changed_line)
    with open(filename, "w") as g:
        g.write("".join(changed_lines))
source code for change_names.[Link]

Example - When Harry Met Hermione #3

# For each file given as argument replace occurrences of Hermione
# allowing for some misspellings with Harry and vice-versa.
# Relies on Zaphod not occurring in the text.
# modified text is stored in a single string then file over-written
import re, sys, os
for filename in sys.argv[1:]:
    with open(filename) as f:
        text = f.read()
    changed_text = re.sub(r"Herm[io]+ne", "Zaphod", text)
    changed_text = changed_text.replace("Harry", "Hermione")
    changed_text = changed_text.replace("Zaphod", "Harry")
    with open(filename, "w") as g:
        # changed_text is already a single string, so no join is needed
        g.write(changed_text)
source code for change_names.[Link]
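The three-step swap used in all four versions can be isolated in a function and tried on a string. This is a sketch: the function name swap_names is hypothetical, and the check that the placeholder is absent is an added assumption, since the scripts above simply rely on Zaphod not occurring in the text.

```python
import re

def swap_names(text, placeholder="Zaphod"):
    """Swap Hermione (and misspellings) with Harry via a placeholder."""
    if placeholder in text:
        raise ValueError(f"placeholder {placeholder!r} already occurs in text")
    # step 1: park every Hermione under the placeholder
    text = re.sub(r"Herm[io]+ne", placeholder, text)
    # step 2: Harry can now become Hermione without being swapped back
    text = text.replace("Harry", "Hermione")
    # step 3: the parked names become Harry
    return text.replace(placeholder, "Harry")

print(swap_names("Harry met Hermoine and Hermione."))
# Hermione met Harry and Harry.
```

Without the placeholder, replacing Hermione with Harry and then Harry with Hermione would turn every name into Hermione.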
