I am a Quantitative Analyst/Developer and Data Scientist with a background in the finance, education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.
The `re` module was added in Python 1.5 and provides Perl-style regular expression patterns. Its predecessor, the `regex` module, was removed completely in Python 2.5.
`match()` checks for a match only at the beginning of the string.
`search()` checks for a match anywhere in the string.
`findall()` finds all substrings where the RE matches and returns them as a list.
`finditer()` finds all substrings where the RE matches and returns them as an iterator.
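To make the difference between the four functions concrete, here is a minimal sketch in Python 3 syntax. The sample sentence is the one used in the examples on this page; the variable names `words` and `spans` are only for illustration.

```python
import re

sent = 'What is first and second? what! *'

# match() only succeeds if the pattern matches at index 0.
print(re.match('first', sent))            # None: 'first' is not at the start
print(re.match('What', sent).group())     # What

# search() scans the whole string and returns the first match.
print(re.search('first', sent).span())    # (8, 13)

# findall() returns every non-overlapping match as a list of strings.
words = re.findall(r'\w+', sent)
print(words)   # ['What', 'is', 'first', 'and', 'second', 'what']

# finditer() yields match objects one at a time, so positions are available too.
spans = [(m.group(), m.start())
         for m in re.finditer('what', sent, re.IGNORECASE)]
print(spans)   # [('What', 0), ('what', 26)]
```

Note that `match()` and `search()` return `None` when nothing is found, so the result should be checked before calling `group()` on it.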
import re
pat = ['first', 'second']
sent = 'What is first and second? what! *'
# search() returns a match object, or None if the pattern is not found
match = re.search(pat[0], sent)
if match:
    print("found")
else:
    print("not found")
import re
pat = ['first', 'second']
sent = 'What is first and second? what! *'
# Try each pattern in turn against the same sentence
for p in pat:
    if re.search(p, sent):
        print("found")
    else:
        print("not found")
import re
pat = 'first'
sent = 'What is first and second? what! *'
match = re.search(pat, sent)
# start() and end() give the slice of the string that matched
s = match.start()
e = match.end()
print('"%s" is found in "%s" from index %d to %d ("%s")' %
      (match.re.pattern, match.string, s, e, sent[s:e]))
print("match.group():", match.group())
output
"first" is found in "What is first and second? what! *" from index 8 to 13 ("first")
match.group(): first
import re
pat = 'first'
sent = 'What is first and second? what! *'
# sub() returns a new string with every match replaced
result = re.sub(pat, "1st", sent)
print(result)
output
What is 1st and second? what! *
\d | Any digit-> [0-9] |
\D | Any non-digit-> [^0-9] |
\s | Any whitespace-> [ \t\n\r\f\v] |
\S | Any non-whitespace-> [^ \t\n\r\f\v] |
\w | Any alphanumeric-> [a-zA-Z0-9_] |
\W | Any non-alphanumeric-> [^a-zA-Z0-9_] |
[\s,.] | Any whitespace or ',' or '.' |
. | Anything except a newline |
ca*t | ct, cat, caaat, etc |
a[bcd]*b | a + zero or more letters from [bcd] + b |
ca+t | cat, caat, but won't match ct |
home-?brew | homebrew or home-brew |
a/{1,3}b | a/b, a//b, a///b, but won't match ab |
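The character classes and quantifiers in the table above can be checked with a short sketch like the following. The `matches` helper and the sample strings are made up for illustration; `re.fullmatch()` is used so that a pattern has to cover the whole candidate string.

```python
import re

# Character classes
digits = re.findall(r'\d+', 'room 101, floor 7')
print(digits)   # ['101', '7']
parts = re.split(r'[\s,.]+', 'a, b.c  d')
print(parts)    # ['a', 'b', 'c', 'd']


def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full (illustrative helper)."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


# Quantifiers: * (zero or more), + (one or more), ? (optional), {m,n} (range)
print(matches(r'ca*t', ['ct', 'cat', 'caaat', 'cut']))    # ['ct', 'cat', 'caaat']
print(matches(r'ca+t', ['ct', 'cat', 'caat']))            # ['cat', 'caat']
print(matches(r'home-?brew', ['homebrew', 'home-brew', 'home--brew']))
print(matches(r'a/{1,3}b', ['ab', 'a/b', 'a//b', 'a///b', 'a////b']))
```

Writing the patterns as raw strings (`r'...'`) keeps Python from interpreting the backslashes before the `re` module sees them.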
import re
# Compile the pattern once so it can be reused
p = re.compile('ab*c')
match = p.search("Where is abc?")
print(match.group())
output
abc
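a|b | "Or" operator: matches any string that matches either a or b |
^ | Matches at the beginning of the string, and at the beginning of each line in MULTILINE mode. |
$ | Matches at the end of the string (or just before a trailing newline), and at the end of each line in MULTILINE mode. |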
\A | Matches only at the start of the string. |
\Z | Matches only at the end of the string. |
\b | Word boundary: the empty string at the beginning or end of a word |
\B | Non-boundary: the empty string not at the beginning or end of a word |
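The anchors in this table are easiest to tell apart on a multi-line string. The following is a small sketch with made-up sample text; the variable names are illustrative only.

```python
import re

sent = 'What is first and second? what! *'
either = re.findall(r'first|second', sent)
print(either)          # ['first', 'second']

text = 'cat\ncatalog\nbobcat'

# Without re.MULTILINE, ^ anchors only to the start of the whole string.
start_default = re.findall(r'^cat', text)
# With re.MULTILINE, ^ and $ anchor to every line.
start_multi = re.findall(r'^cat', text, re.MULTILINE)
end_multi = re.findall(r'cat$', text, re.MULTILINE)
print(start_default, start_multi, end_multi)

# \A and \Z ignore re.MULTILINE and always refer to the whole string.
start_abs = re.findall(r'\Acat', text, re.MULTILINE)
end_abs = re.findall(r'cat\Z', text, re.MULTILINE)
print(start_abs, end_abs)

# \b requires a word boundary, \B requires that there is none.
words = 'cat catalog bobcat'
whole_word = [m.start() for m in re.finditer(r'\bcat\b', words)]
word_suffix = [m.start() for m in re.finditer(r'\Bcat\b', words)]
print(whole_word)      # [0]  -> the standalone word 'cat'
print(word_suffix)     # [15] -> the 'cat' at the end of 'bobcat'
```

Because `\b` inside an ordinary Python string literal means a backspace character, patterns that use it should be written as raw strings.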