Regular Expression Magic Examples

Use %reload_ext to force a reload of the magic during development.

In [1]:
%reload_ext regexmagic
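
For context, %reload_ext works with any module that exposes IPython's extension hook, load_ipython_extension. The skeleton below is a rough sketch of what such a module might look like. It is not the actual regexmagic source, and the body of matchlines is a placeholder assumption.

```python
# Hypothetical skeleton of an extension module; NOT the real regexmagic source.

def matchlines(line, cell):
    """Placeholder cell magic: `line` is the pattern text, `cell` is the block below it."""
    return f"Pattern: {line}\n\n{cell}"


def load_ipython_extension(ipython):
    """Hook IPython calls when the extension is loaded; registers the magic."""
    ipython.register_magic_function(
        matchlines, magic_kind="cell", magic_name="matchlines"
    )
```

Reloading matters during development because it re-imports the module, so edits to the magic's code take effect without restarting the kernel.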

matchlines cell magic

Use %%matchlines pattern followed by a block of text to match and colorize instances of pattern in that text.

In [2]:
%%matchlines a+b
xabx
Pattern: a+b

xabx

Match colors alternate so that adjacent matches are visually distinct.

In [3]:
%%matchlines a+b
xyz
aaabxx
xaababx
xyzabab
xabxabx
Pattern: a+b

xyz
aaabxx
xaababx
xyzabab
xabxabx
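
One way the alternating highlight could be produced is sketched below. This is only an illustration under stated assumptions: the function name colorize, the two color names, the HTML span markup, and the choice to restart the alternation on every line are all made up for the example and are not taken from regexmagic.

```python
import re
from html import escape

# Two highlight colors, chosen arbitrarily for this sketch.
COLORS = ("red", "blue")


def colorize(pattern, text):
    """Wrap each match of `pattern` in a <span>, alternating between COLORS."""
    rendered = []
    for line in text.splitlines():
        pieces, last = [], 0
        for i, m in enumerate(re.finditer(pattern, line)):
            # Unmatched text before this match is emitted as-is (HTML-escaped).
            pieces.append(escape(line[last:m.start()]))
            color = COLORS[i % 2]  # alternation restarts on each line (assumption)
            pieces.append(f'<span style="color:{color}">{escape(m.group())}</span>')
            last = m.end()
        pieces.append(escape(line[last:]))
        rendered.append("".join(pieces))
    return "<br/>\n".join(rendered)


print(colorize("a+b", "xaababx"))
```

With a single color, the adjacent matches aab and ab in xaababx would look like one long match; alternating makes the boundary visible.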

The usual backslash escapes work.

In [4]:
%%matchlines /\w+ \d+, \d+/
Site/Date/Evil
Davison/May 22, 2010/1721.3
Pertwee/May 24, 2010/2103.8
Pattern: /\w+ \d+, \d+/

Site/Date/Evil
Davison/May 22, 2010/1721.3
Pertwee/May 24, 2010/2103.8

However, IPython interprets {x} as "expand the variable x", so repetition counts like {4} need to be written as {{4}} (doubling up the curly braces).

In [5]:
%%matchlines /\w{{4}} \d+, \d+/
Site/Date/Evil
Davison/May 22, 2010/1721.3
Pertwee/June 24, 2010/2103.8
Pattern: /\w{4} \d+, \d+/

Site/Date/Evil
Davison/May 22, 2010/1721.3
Pertwee/June 24, 2010/2103.8
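
The brace doubling behaves much like Python's own str.format, which makes a convenient stand-in for trying it outside a notebook. The snippet below is an analogy rather than IPython's actual expansion code; it shows the doubled braces collapsing to {4} and why only the June line matches.

```python
import re

# What is typed after %%matchlines (braces doubled to survive expansion).
typed = r"/\w{{4}} \d+, \d+/"

# str.format() collapses {{ and }} into literal braces.
pattern = typed.format()
print(pattern)

for line in ["Davison/May 22, 2010/1721.3", "Pertwee/June 24, 2010/2103.8"]:
    m = re.search(pattern, line)
    # "May" has only three word characters, so \w{4} cannot match there.
    print(line, "->", m.group() if m else None)
```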

Everything following the first space after the %%matchlines directive is part of the pattern, so the pattern can have leading and trailing spaces, although this is not very readable. Also, because spaces aren't converted to &nbsp; in the output, leading and trailing spaces don't show up in the output display.

In [6]:
%%matchlines  ab 
x ab x
 ab  ab x
Pattern: ab

x ab x
ab ab x
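
The two behaviours described above can be sketched in a few lines. Both helpers are hypothetical (split_directive and show_spaces are names invented for this example): the first splits at the first space only, so any further spaces stay in the pattern, and the second shows what converting spaces to &nbsp; would look like if the display did it.

```python
def split_directive(first_line):
    """Split '%%name<space>pattern' at the first space only; keep the rest verbatim."""
    name, _, pattern = first_line.partition(" ")
    return name, pattern


def show_spaces(text):
    """Replace spaces with non-breaking spaces so HTML rendering keeps them."""
    return text.replace(" ", "&nbsp;")


name, pattern = split_directive("%%matchlines  ab ")
print(repr(pattern))         # the pattern keeps its leading and trailing space
print(show_spaces(pattern))
```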

imatchlines cell magic

In [7]:
%%imatchlines
The Zen of Python, by Tim Peters

  Beautiful is better than ugly.
  Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Pattern: ^ .*$
Options: match lines separately

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

matchfile line magic

Here's the text file used to demonstrate %matchfile.

In [8]:
!cat data.txt
xyz
aaabxx
xabbbx
xyzab
xabxabx

And here's %matchfile itself. It uses the same engine as %%matchlines, so the only significant difference is that everything following the space after the filename is interpreted as the pattern.

In [9]:
%matchfile data.txt a+b  
Pattern: a+b

xyz
aaabxx
xabbbx
xyzab
xabxabx
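
The argument handling described above can be sketched as follows. The helper names parse_matchfile_args and match_spans are assumptions made for this example, not regexmagic's API, and the sketch reports match positions instead of colorizing. It writes a temporary copy of the data.txt contents shown earlier so that it runs on its own.

```python
import re
import tempfile
from pathlib import Path


def parse_matchfile_args(line):
    """Split 'filename pattern' at the first space; the remainder is the pattern."""
    filename, _, pattern = line.partition(" ")
    return filename, pattern


def match_spans(path, pattern):
    """Return (line, [(start, end), ...]) for every line of the file."""
    regex = re.compile(pattern)
    return [
        (line, [m.span() for m in regex.finditer(line)])
        for line in Path(path).read_text().splitlines()
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for the data.txt file used in this notebook.
    data = Path(tmp) / "data.txt"
    data.write_text("xyz\naaabxx\nxabbbx\nxyzab\nxabxabx\n")
    filename, pattern = parse_matchfile_args("data.txt a+b")
    spans = match_spans(data, pattern)

for line, found in spans:
    print(line, found)
```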

imatchfile line magic

In [10]:
%imatchfile data.txt
Pattern: a+b+

xyz
aaabxx
xabbbx
xyzab
xabxabx