In [3]:
import re
In [1]:
#Sample Msg
sample="This is roger. my contact no is 415-555-4242,& 415-555-4243, 416-555-4244"
In [12]:
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# search() returns only the first match
mob=phoneNumRegex.search(sample)
print mob.group()
415-555-4242

In [15]:
# findall() returns all matches as a list
mob=phoneNumRegex.findall(sample)
print mob
['415-555-4242', '415-555-4243', '416-555-4244']
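
One case the cells above do not cover is a string with no phone number in it. search() then returns None, so calling .group() on the result directly would raise an AttributeError, while findall() simply returns an empty list. A minimal sketch, assuming a made-up input string (no_number is not part of the original notebook):

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Hypothetical input that contains no phone number
no_number = "This is roger. I have no phone."

# search() returns None when nothing matches, so check before calling group()
mob = phoneNumRegex.search(no_number)
if mob is None:
    result = 'no match'
else:
    result = mob.group()
print(result)

# findall() never returns None; with no matches it returns an empty list
all_numbers = phoneNumRegex.findall(no_number)
print(all_numbers)
```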

Understanding Groups

Suppose we want to categorise the result: each phone number above consists of an area code and the actual number. By wrapping parts of the pattern in parentheses (groups), we can capture each part and use it separately.

In [24]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mob=phoneNumRegex.findall(sample)
print mob
# When the regex has groups, findall() returns a list of tuples; each tuple holds the 2 groups defined in our regex
[('415', '555-4242'), ('415', '555-4243'), ('416', '555-4244')]
In [26]:
mo=phoneNumRegex.search(sample)
print 'area code: ',mo.group(1)
print 'phone no :',mo.group(2)
area code:  415
phone no : 555-4242
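
As a small extension of the cell above, the Match object also exposes the whole match as group(0) and all groups at once through groups(). The sketch below reuses the same sample string and pattern; the variable names whole, area_code and number are only illustrative:

```python
import re

sample = "This is roger. my contact no is 415-555-4242,& 415-555-4243, 416-555-4244"
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

mo = phoneNumRegex.search(sample)

# group(0) (same as group()) is the entire matched text
whole = mo.group(0)

# groups() returns every captured group as a tuple, which can be unpacked
area_code, number = mo.groups()

print(whole)
print(area_code)
print(number)
```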

Matching Multiple Groups with the Pipe

The regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'. When both Batman and Tina Fey occur in the searched string, search() returns the first occurrence of matching text as the Match object, while findall() returns every occurrence.

In [27]:
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
print mo1.group()
Batman
In [29]:
mo2 = heroRegex.search('Tina Fey and Batman.')
print mo2.group()
Tina Fey
In [31]:
mo3=heroRegex.findall('Tina Fey and Batman.')
print mo3
['Tina Fey', 'Batman']

More Examples of the Pipe

In [34]:
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
print mo.group()
Batmobile
In [35]:
mo = batRegex.search('Batbat lost a wheel')
print mo.group()
Batbat
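
Because the alternatives sit inside parentheses, the part that actually matched is also available as group 1. A short sketch using the same pattern as above (full_match and suffix are illustrative names):

```python
import re

batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')

# group() is the full matched text; group(1) is only the alternative
# that matched inside the parentheses
full_match = mo.group()
suffix = mo.group(1)

print(full_match)
print(suffix)
```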

Optional Matching with the Question Mark

In [36]:
batRegex = re.compile(r'Bat(wo)?man')
# Here the group "wo" is optional: it may appear zero times or once

mo1 = batRegex.search('The Adventures of Batman')
print mo1.group()
Batman
In [37]:
mo1 = batRegex.search('The Adventures of Batwoman')
print mo1.group()
Batwoman
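
When the optional group does not take part in the match, group(1) returns None instead of raising an error. The following sketch checks both cases with the same pattern (without_wo and with_wo are illustrative names):

```python
import re

batRegex = re.compile(r'Bat(wo)?man')

without_wo = batRegex.search('The Adventures of Batman')
with_wo = batRegex.search('The Adventures of Batwoman')

# The optional group is absent in 'Batman', so group(1) is None
print(without_wo.group(1))

# The optional group is present in 'Batwoman', so group(1) is 'wo'
print(with_wo.group(1))
```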

Matching Zero or More with the Star

The * (called the star or asterisk) means "match zero or more": the group that precedes the star can occur any number of times in the text, including not at all.

In [38]:
batRegex = re.compile(r'Bat(wo)*man')

mo1 = batRegex.search('The Adventures of Batman')
print mo1.group()
Batman
In [39]:
mo1 = batRegex.search('The Adventures of Batwoman')
print mo1.group()
Batwoman
In [40]:
mo1 = batRegex.search('The Adventures of Batwowowoman')
print mo1.group()
Batwowowoman

Note: While * means "match zero or more," the + (or plus) means "match one or more." With +, the group must appear at least once for the pattern to match.
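
To make the difference from the star concrete, the sketch below swaps * for + in the same Batman pattern. The name batPlusRegex is illustrative and does not appear in the original notebook:

```python
import re

# Same pattern as before, but with + instead of *
batPlusRegex = re.compile(r'Bat(wo)+man')

# 'Batman' has zero occurrences of 'wo', so + finds no match
mo1 = batPlusRegex.search('The Adventures of Batman')
print(mo1 is None)

# One or more occurrences of 'wo' do match
mo2 = batPlusRegex.search('The Adventures of Batwoman')
print(mo2.group())

mo3 = batPlusRegex.search('The Adventures of Batwowowoman')
print(mo3.group())
```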

Matching Specific Repetitions with Curly Brackets

The regex (Ha){3} will match the string 'HaHaHa'.
The regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.

In [41]:
haRegex = re.compile(r'(Ha){3}')
mo1 = haRegex.search('HaHaHa')
print mo1.group()
HaHaHa
In [42]:
haRegex = re.compile(r'(Ha){3,5}')
mo1 = haRegex.search('HaHaHaHa')
print mo1.group()
HaHaHaHa
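
The bounds of {3,5} can be checked by running the same pattern against strings with two to six repetitions of 'Ha'. This is a minimal sketch; candidates and results are illustrative names. Two repetitions are too few to match, and with six repetitions the match stops after the first five:

```python
import re

haRegex = re.compile(r'(Ha){3,5}')

# Build 'HaHa' (2 repetitions) up to 'HaHaHaHaHaHa' (6 repetitions)
candidates = ['Ha' * n for n in range(2, 7)]

results = {}
for text in candidates:
    mo = haRegex.search(text)
    # Store the matched text, or None when the pattern does not match
    results[text] = mo.group() if mo else None
    print(results[text])
```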