Regular Expressions in Python

Exercises: Regular Expression (regex) module

© Kaixin Wang, Winter 2020

This set of exercises is based on the handout at http://stanford.edu/~risi/tutorials/regex.html

Library Import

In [1]:
import re
In [2]:
test_string = "This is a TEST string 123. abc@email.com"

Exercise 1: character matching

In [3]:
# Find all lowercase letters that appear in the string (one match per letter).
re.findall(r'[a-z]', test_string)
Out[3]:
['h',
 'i',
 's',
 'i',
 's',
 'a',
 's',
 't',
 'r',
 'i',
 'n',
 'g',
 'a',
 'b',
 'c',
 'e',
 'm',
 'a',
 'i',
 'l',
 'c',
 'o',
 'm']

Exercise 2: character matching

In [4]:
# Find all uppercase letters that appear in the string (one match per letter).
re.findall(r'[A-Z]', test_string)
Out[4]:
['T', 'T', 'E', 'S', 'T']
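
Exercises 1 and 2 return one list element per letter because a bare character class matches exactly one character. As a small sketch of the contrast, the snippet below adds the `+` quantifier so that consecutive letters are grouped into runs. The string `sample` is a shortened, illustrative copy of the test string, not part of the original exercise.

```python
import re

# Illustrative sample: a shortened copy of the exercise string
sample = "This is a TEST string"

# A bare character class matches exactly one character per match
single = re.findall(r'[A-Z]', sample)

# Adding + groups consecutive uppercase letters into one match
grouped = re.findall(r'[A-Z]+', sample)

print(single)   # ['T', 'T', 'E', 'S', 'T']
print(grouped)  # ['T', 'TEST']
```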

Exercise 3: character matching

In [5]:
# Find all words (substrings) that are made up of digits only.
re.findall(r'[0-9]+', test_string)
Out[5]:
['123']
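
A short sketch of two related points: `\d+` can usually stand in for `[0-9]+`, and `re.findall` returns strings, so a conversion is needed before doing arithmetic. The `sample` text is hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample text, not taken from the exercise
sample = "Order 66 shipped 3 items on day 12."

# [0-9]+ matches each maximal run of ASCII digits
runs = re.findall(r'[0-9]+', sample)

# \d+ gives the same result here; for str patterns, \d also matches
# non-ASCII Unicode digits unless the re.ASCII flag is passed
runs_d = re.findall(r'\d+', sample)

# findall returns strings, so convert explicitly when numbers are needed
numbers = [int(run) for run in runs]

print(runs)     # ['66', '3', '12']
print(numbers)  # [66, 3, 12]
```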

Exercise 4: multiple groups matching

In [6]:
# Find all words that start with the letter “T”.
re.findall(r'T[a-zA-Z]+', test_string)
Out[6]:
['This', 'TEST']
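
The pattern `T[a-zA-Z]+` happens to work on the test string, but it does not itself require the "T" to be the first letter of a word. The sketch below, on a hypothetical `sample` sentence, shows the difference once the pattern is anchored with the word boundary `\b` (a zero-width match between a word character and a non-word character).

```python
import re

# Hypothetical sample in which "T" also occurs inside a word
sample = "Tom saw STOP signs near The Tower"

# Unanchored: also picks up "TOP" from the middle of "STOP"
loose = re.findall(r'T[a-zA-Z]+', sample)

# Anchored with \b: only words whose first letter is "T";
# * instead of + would also allow a one-letter word "T"
anchored = re.findall(r'\bT[a-zA-Z]*\b', sample)

print(loose)     # ['Tom', 'TOP', 'The', 'Tower']
print(anchored)  # ['Tom', 'The', 'Tower']
```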

Exercise 5: multiple groups matching

In [7]:
# Find all words (substrings) that contain an email address.
re.findall(r'[a-zA-Z]+@[a-zA-Z.]+', test_string)
Out[7]:

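A pattern of the form letters, "@", then letters and dots is enough for a simple test string, but it has two easy-to-miss limits: it absorbs a trailing full stop, and it cuts off local parts that contain a dot. The sketch below compares it with a somewhat stricter pattern. Both the `sample` text and the stricter pattern are illustrative assumptions, and neither pattern is a full validator for real-world email addresses.

```python
import re

# Hypothetical sample text using reserved example domains
sample = "Write to alice@example.com. Or bob.smith@mail.example.org today"

# Simple pattern: letters, "@", then letters and dots
simple = re.findall(r'[a-zA-Z]+@[a-zA-Z.]+', sample)

# Stricter sketch: a wider local part, and a domain made of labels
# separated by single dots; (?:...) is non-capturing, so findall
# still returns the whole match
strict = re.findall(
    r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+', sample
)

print(simple)  # ['alice@example.com.', 'smith@mail.example.org']
print(strict)  # ['alice@example.com', 'bob.smith@mail.example.org']
```

The simple pattern keeps the sentence-ending dot and drops "bob." from the second address, while the stricter one requires each dot in the domain to be followed by another label.
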
Exercise 6: word boundary

In [8]:
# Find words (substrings) that consist of only lowercase letters.
re.findall(r'\b[a-z]+\b', test_string)
Out[8]:
['is', 'a', 'string', 'abc', 'email', 'com']
In [9]:
# Note: without the word boundary "\b", every maximal run of lowercase
# letters is returned, including fragments of longer words ('his' from 'This').
re.findall(r'[a-z]+', test_string)
Out[9]:
['his', 'is', 'a', 'string', 'abc', 'email', 'com']
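
To see where each match sits in the string, `re.finditer` can be used in place of `re.findall`, since it yields match objects that carry start and end positions. The sketch below is one way to compare the two patterns from this exercise; `spans` is a hypothetical helper and `sample` is a shortened, illustrative copy of the test string.

```python
import re

# Illustrative sample: a shortened copy of the exercise string
sample = "This is a TEST string 123."

def spans(pattern, text):
    """Return (matched text, start, end) for every match of pattern."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

bounded = spans(r'\b[a-z]+\b', sample)
unbounded = spans(r'[a-z]+', sample)

# Matches found only without \b are fragments of longer words
fragments = [item for item in unbounded if item not in bounded]

print(bounded)    # [('is', 5, 7), ('a', 8, 9), ('string', 15, 21)]
print(fragments)  # [('his', 1, 4)]
```

The only extra match is 'his' at positions 1 to 4: there is no word boundary between "T" and "h", so the bounded pattern cannot start there.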