import fuzzingbook_utils
!pip install coverage gcovr
Requirement already satisfied: coverage in /opt/conda/lib/python3.6/site-packages (4.5.2) Requirement already satisfied: gcovr in /opt/conda/lib/python3.6/site-packages (4.1) Requirement already satisfied: jinja2 in /opt/conda/lib/python3.6/site-packages (from gcovr) (2.10) Requirement already satisfied: MarkupSafe>=0.23 in /opt/conda/lib/python3.6/site-packages (from jinja2->gcovr) (1.0)
Testing regular expression (regex) parsers is a challenging task. A simple regex, such as a?(b|c)+d*
, requires dozens of inputs to cover all possibilities.
To test regex, we use the python regex module (provided in the data/regex
directory of this project). We use the source code of the regex
module so that we can compute the coverage of the module.
from data.regex.regex_3 import _regex_core
from data.regex.regex_3 import regex
pattern = 'a?(b|c)+d*$'
print('a matches %s: %s' % (pattern, regex.match(pattern, 'a') != None))
print('b matches %s: %s' % (pattern, regex.match(pattern, 'ab') != None))
a matches a?(b|c)+d*$: False b matches a?(b|c)+d*$: True
An efficient approach to test regex is to use grammars. Regex are instances of regular languages - a subset of context-free grammar (CFG) - and, thus, can be tested with the techniques presented in this course.
Assume the following grammar, which represents the a?(b|c)+d*
regular expression.
example_grammar = {'<start>': ['<A><BC><D>'],
'<A>': ['a', '<empty>'],
'<BC>': ['b<BC2>', 'c<BC2>'],
'<BC2>': ['b<BC2>', 'c<BC2>', '<empty>'],
'<D>': ['d<D>', '<empty>'],
'<empty>': [''],
}
Reusing the techniques from the lecture we expand and visualize it.
from GrammarFuzzer import GrammarFuzzer, display_tree
def as_tree(grammar):
f = GrammarFuzzer(grammar)
derivation_tree = f.init_tree()
for i in range(6):
derivation_tree = f.expand_tree_once(derivation_tree)
display_tree(derivation_tree)
as_tree(example_grammar)
And we can use it to produce valid input values
def test_input(grammar):
f = GrammarFuzzer(grammar)
input_value = f.fuzz()
print('%s matches %s: %s' % (input_value, pattern, regex.match(pattern, input_value) != None))
for i in range(5):
test_input(example_grammar)
ccbbcbbb matches a?(b|c)+d*$: True abbbbd matches a?(b|c)+d*$: True c matches a?(b|c)+d*$: True bbc matches a?(b|c)+d*$: True bbcbbccbcbcc matches a?(b|c)+d*$: True
The goal of this project is to implement an algorithm that constructs a CFG from an arbitrary regular expression. Then, use the techniques learned in the lecture (i.e. efficient grammar-based fuzzing) to automatically expand your grammar and to automatically produce a set of inputs that match the given regular expression, covering as many grammar productions as possible.
In this project, we restrict the regex specification into its simpler constructions and grouped then into 3 categories: basic syntax, complex syntax and bonus as follows:
Category | Components |
---|---|
Basic Syntax | Anchors ^ and $ Groupings () Whitespace \s = [ \t\n\r\f\v] Quantifiers * , + , ? concatenation OR operator | |
Complex syntax | Character classes [0-9] , [a-z] , [A-Z] , [0-9A-Za-z] Special character classes \d = [0-9] , \w = [0-9a-zA-Z_] Bounded quantifier {} Set Negation [^] (requires look-ahead) \D = [^0-9] \W = [^0-9a-zA-Z_] \S = [^\s] |
Bonus | Boundaries \b and \B |
In your implementation, ignore any non-listed regex elements, such as flags, back references, other types of look-ahead, look-behind and greedy/lazy match.
I have also updated the project accordingly, you can git pull
to obtain the
latest version of the project.
We use the regex parser, version 2.4.151, instead of Python's native re
, because re
is just a wrapper for a C library. Confirm regex version using:
print(regex.__version__)
2.4.151
Ensure your implementation works on the following regex examples, and covers the provided minimum required coverage for parsing a represententative input for each regex:
Category | Examples | Min. Cov. LOC _regex_core.py | Min. Cov. LOC *.c |
---|---|---|---|
Basic Syntax | ^ab$ |
241 (8%) | 988 (7%) |
Basic Syntax | b$ |
170 (6%) | 949 (6%) |
Basic Syntax | ab*$ |
207 (7%) | 1142 (8%) |
Basic Syntax | a|b |
308 (11%) | 1024 (7%) |
Basic Syntax | a(b|c) |
321 (11%) | 1198 (8%) |
Basic Syntax | (a|b)\s |
334 (12%) | 1131 (8%) |
Basic Syntax | ^a*(b+|c)d? |
413 (15%) | 1442 (10%) |
Basic Syntax | ^a*(b+|c)d?\se |
455 (16%) | 1632 (11%) |
Basic Syntax | ^a+(b*|c)d?\se |
426 (15%) | 1564 (11%) |
Complex Syntax | ^[0-9]+\s[a-z]*\s?[A-Z]+$ |
303 (11%) | 1228 (9%) |
Complex Syntax | \w$ |
169 (6%) | 878 (6%) |
Complex Syntax | \d$ |
169 (6%) | 875 (6%) |
Complex Syntax | (a|b){2,5}$ |
401 (14%) | 1380 (10%) |
Complex Syntax | [0-9a-zA-Z]$ |
252 (9%) | 902 (6%) |
Complex Syntax | [0-9]$ |
222 (8%) | 852 (6%) |
Complex Syntax | [a-z]$ |
222 (8%) | 852 (6%) |
Complex Syntax | [^a-zA-Z]$ |
213 (7%) | 879 (6%) |
Complex Syntax | [^0-9]$ |
178 (6%) | 825 (6%) |
Complex Syntax | \D$ |
168 (6%) | 877 (6%) |
Complex Syntax | \W$ |
122 (4%) | 852 (6%) |
Complex Syntax | \S$ |
122 (4%) | 852 (6%) |
Complex Syntax | [a-zA-Z]\s\w+\s\d\s$ |
324 (11%) | 1295 (9%) |
Complex Syntax | a(bc){2,5}$ |
297 (10%) | 1472 (10%) |
Bonus | \babc\b$ |
258 (9%) | 865 (6%) |
Bonus | abc\B |
177 (6%) | 987 (7%) |
Bonus | \babc\B |
247 (9%) | 928 (6%) |
Note that the code coverage during the execution of the import statement (import regex
) is unacccounted for, since this module was executed before the collection of coverage data began. We also do not track or evaluate the coverage of the python code (regex.py
), since it is always the same forregex.match
method call.
Coverage data for Python (i.e. _regex_core.py
) were obtained using a python utility for measuring code coverage of Python programs called Coverage.py, version 4.5.2. Confirm your Coverage.py version:
!coverage --version
Coverage.py, version 4.5.2 with C extension Documentation at https://coverage.readthedocs.io
Coverage data for the C extensions of regex (i.e. _regex.c and _regex_unicode.c) were obtained using the python utility for GNU gcov utility Gcovr, version 4.1. You can confirm your Gcovr version using:
!gcovr --version
gcovr 4.1 Copyright 2013-2018 the gcovr authors Copyright 2013 Sandia Corporation Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains certain rights in this software.
Your generated test inputs will be executed on the fuzzingbook server, based on the amount of executed methods and lines of code (LOC).
Note: if you are developing locally, use the source code of version 2018.11.07 of the regex parser, however test on the fuzzingbook server before submission.
Your implementation will be tested with a secret set of regex, encompassing all previously specified syntax categories.
For grading:
The following functions are available to assist your implementation and tests:
Using the GrammarCoverageFuzzer, produce unique set of inputs from your grammar:
from GrammarCoverageFuzzer import GrammarCoverageFuzzer
from Grammars import is_valid_grammar, def_used_nonterminals
import random, time
def generate_inputs(grammar):
assert is_valid_grammar(grammar)
start_seed, end_seed = 2000, 2005
inputs = []
for seed in range(start_seed, end_seed):
random.seed(seed)
f = GrammarCoverageFuzzer(grammar)
#time_spent < 3 hrs = 10800 sec
start_time = time.time()
time_spent = 0
while (time_spent <= 10800) and (len(inputs) <= 1000) and (len(f.max_expansion_coverage() - f.expansion_coverage()) > 0):
inputs.append(f.fuzz())
end_time = time.time()
time_spent = end_time - start_time
#print("set(inputs): ", set(inputs))
return set(inputs)
Your grammar should produce only valid inputs. To ensure that all inputs your grammar can produce are valid you can use:
def ensure_all_valid(pattern, inputs):
valid = True
res = [[inp, regex.match(pattern, inp) != None] for inp in inputs]
failures = list(filter(lambda x: not x[1], res))
if len(failures) > 0:
#raise ValueError("Invalid inputs: " + str([x[0] for x in failures]))
valid = False
return valid
You can use the run_with_py_coverage
function to obtain the python code covered by each input generated by your grammar.
from coverage import Coverage
import os
def run_with_py_coverage(pattern, inp):
cwd = os.getcwd()
if cwd.split("/")[-2] != "data" and cwd.split("/")[-1] != "regex":
target_dir = os.path.join(cwd,"data", "regex")
else:
target_dir = cwd
os.chdir(target_dir)
regex._cache = {}
cov = Coverage(include=["*.py"])
cov.start()
regex.match(pattern, str(inp))
cov.stop()
cov.save()
results = cov.analysis2("regex_3/_regex_core.py")
covered = set(results[1]) - set(results[3])
os.chdir(cwd)
return covered, len(set(results[1]))
We implement helper function to execute bash commands:
import subprocess
def execute_bash_command(command):
process = subprocess.Popen(command, shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, error = process.communicate()
error = error.decode("utf-8").strip()
if error:
raise ValueError("bash command error: '{0}',\n during execution of command: \n '{1}': ".format(error, command ))
res = out.decode("utf-8").strip()
return res
We implement the function replace_all_special_chars()
to escape all special chars in bash command.
def replace_all_special_chars(inp):
#print("inp: ", inp)
special_char_repl = {"\\": "\\\\\\\\", "\n": "\\\n", "\r": "\\\r", \
"\'": "\\\'", '"':'\\"', '`':'\\`'}
#"\\": "\\\\\\"
for key in special_char_repl:
inp = inp.replace(key, special_char_repl[key])
#print("replaced inp: ", inp)
return inp
We implement the function run_with_c_coverage()
to obtain coverage for the C classes in the regex module.
import os
def run_with_c_coverage(pattern, inp):
cwd = os.getcwd()
if cwd.split("/")[-2] != "data" and cwd.split("/")[-1] != "regex":
target_dir = os.path.join(cwd , "data", "regex")
else:
target_dir = cwd
if os.getcwd() != target_dir and os.path.isdir(target_dir):
os.chdir(target_dir)
#remove old build
build_dir = os.path.join(os.getcwd(), "build")
if os.path.isdir(build_dir):
rm_command = "rm -r " + build_dir
execute_bash_command(rm_command)
#re-build regex
c_flags = 'CFLAGS="-fprofile-arcs -ftest-coverage -fPIC -g -O0" '
build_instr = 'python setup.py build_ext --inplace > /dev/null'
build_command = c_flags + build_instr
execute_bash_command(build_command)
#run regex match test
run_test_command = """python -c "from regex_3 import _regex_core;\
from regex_3 import regex;pattern = '{0}'; inp = '{1}';\
regex._cache = {{}}; regex.match(pattern, str(inp))" """.\
format(pattern, replace_all_special_chars(inp))
execute_bash_command(run_test_command)
#obtain coverage of c extensions
gcovr_instr = 'gcovr -r . '
grep_TOTAL = 'grep -n "TOTAL" '
c_cov_command = gcovr_instr + "| " + grep_TOTAL
cov_out = execute_bash_command(c_cov_command)
coverage = cov_out.split()[2]
percent_coverage = cov_out.split()[3]
os.chdir(cwd)
return coverage, percent_coverage
For all your inputs, use population_coverage
to obtain the overall cummulative coverage of Python code and the maximum coverage for C Code.
def population_coverage(pattern, population):
py_all_coverage = set()
c_max_coverage = 0
c_max_percent = 0
py_max_percent = ""
for inp in population:
#obtain all covered LOC for Python code
py_covered, py_length = run_with_py_coverage(pattern, inp)
py_all_coverage |= py_covered
#obtain max coverage for C code
c_covered, percent_covered = run_with_c_coverage(pattern, inp)
if c_max_coverage < int(c_covered):
c_max_coverage = int(c_covered)
c_max_percent = percent_covered
py_max_percent = str(int(len(py_all_coverage)/py_length * 100)) + "%"
return len(py_all_coverage), c_max_coverage, py_max_percent, c_max_percent
Write your code here to create a grammar out of an arbitrary regular expression.
The following code will be used to run your evaluation.
Note that the set of regular expression which will be used for evaluation may be different from those provided as example.
old_simple = [[r'^ab$', 241, 988], [r'b$', 170, 949],[r'ab*$', 207, 1142],\
[r'a|b$',308, 1024], [r'a(b|c)$', 321, 1198], [r'(a|b)\s$',334, 1131],\
[r'^a*(b+|c)d?$', 413, 1442],[r'^a*(b+|c)d?\se$', 455, 1632],\
[r'^a+(b*|c)d?\se$',426, 1564]]
new_simple =[[r'^\s+$', 278, 1099],[r'^$', 157, 820], [r'/b(a|e|i|o|u)t/', 303, 1171], [r'^The$', 241, 988], \
[r'of despair$', 170, 998], [r'^abc$', 241, 988], [r'grammar$', 170, 999], [r'hi|hello$', 272, 1088], \
[r'ab+$', 207, 1153], [r'ab?$', 207, 1140], [r'a?b+$', 284, 1202], [r'(b|cd)ef$', 389, 1294], \
[r'a(bc)*$', 273, 1439], [r'(a|b)*c$', 398, 1482]]
simple_regex = old_simple + new_simple
old_complex = [[r'^[0-9]+\s[a-z]*\s?[A-Z]+$', 303, 1228],\
[r'\w$', 169, 878], [r'\d$', 169, 875], [r'(a|b){2,5}$', 401, 1380],\
[r'[0-9a-zA-Z]$', 252, 902], [r'[0-9]$', 222, 852], [r'[a-z]$', 222, 852],\
[r'[^a-zA-Z]', 213, 879], [r'[^0-9]', 178, 825],\
[r'\D$', 168, 877], [r'\W$', 122, 852] , [r'\S$', 122, 852],\
[r'[a-zA-Z]\s\w+\s\d\s$', 324, 1295], [r'a(bc){2,5}$', 297, 1472]]
new_complex = [[r'(</?(\w|\s)*>|<.+\W>)$', 423, 1752], [r'\d{2}[ ]?\d{3}$', 356, 1100], \
[r'\d{3}-\d{4}$', 330, 1173], [r'[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ ]?\d[ABCEGHJ-NPRSTV-Z]\d$', 342, 1160], \
[r'\d{5}([ -]\d{4})?$', 430, 1470], [r'\d{5}$', 299, 1072], [r'(/[*]+)((\w| )*)([*]+/)$', 424, 1637], \
[r'^((\w| )+)/([^/]+)$', 473, 1618], [r'^[A-PR-WY][1-9]\d\s?\d{4}[1-9]$', 390, 1201], \
[r'(^\d{5}(-\d{4})?)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1})$', 524, 1587], \
[r'^([1-9]|0[1-9]|[12][0-9]|3[01])\D([1-9]|0[1-9]|1[012])\D(19[0-9][0-9]|20[0-9][0-9])$', 455, 1366], \
[r'^3[47][0-9]{13}$', 375, 1244], [r'^(\w|,|\s|-)+[.][A-Za-z]{3}$', 520, 1673], \
[r'^([a-z][a-z0-9-]+([.]|-*[.]))+[a-z]{2,6}$', 494, 1763], \
[r'^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$', 530, 1891], \
[r'^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])[.](\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]){3}$', 487, 1723], \
[r'^((\w| )*[a-z])((\w| )*[A-Z])((\w| )*\d)(\w| ){6,12}$', 482, 1640], [r'^[a-z0-9_-]{6,18}$', 361, 1149], \
[r'^([a-z0-9_.+-]+)@([0-9a-z.-]+)[.]([a-z.]{2,6})$', 444, 1571], [r'^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+[.][a-zA-Z]{2,6})*$', 431, 1670], \
[r'^([a-z0-9_.-]+)@([0-9a-z.-]+)[.]([a-z.]{2,6})$', 444, 1571],[r'^[a-z0-9_-]{3,16}$', 361, 1149], \
[r'^(19|20)\d{2}$', 465, 1415],[r'^[+]?([0-9]|\s)+[(]?([0-9]|\s){10,}$', 475, 1697], \
[r'^[+]?([0-9]|\s){3,}$', 475, 1638],[r'^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$', 401, 1324], \
[r'^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$', 496, 1468],[r'^#?([a-f0-9]{6}|[a-f0-9]{3})$', 496, 1468], \
[r'^[a-z0-9-]+$', 336, 1147], [r'^[+-<>., ]+$', 336, 1151], \
[r'^([01]?[0-9]|2[0-3]):[0-5][0-9]$', 462, 1584],[r'^(0?[1-9]|[12][0-9]|3[01])([ /-])(0?[1-9]|1[012])2([0-9][0-9][0-9][0-9])(([ -])([0-1]?[0-9]|2[0-3]):[0-5]?[0-9]:[0-5]?[0-9])?$', 465, 1786], \
[r'^-?\d*([.]\d+)?$', 364, 1451], [r'^\d*([.]\d+)?$', 363, 1430], [r'^\d*[.]\d+$', 359, 1241], \
[r'^-?\d*[.]?\d+$', 353, 1284], [r'^-\d*[.]?\d+$', 331, 1310],[r'^\d*[.]?\d+$', 352, 1262], \
[r'^(\w| )+[.](png|jpg|jpeg|gif|webp)$', 501, 1639],[r'[-]?[0-9]+[,.]?[0-9]*([/][0-9]+[,.]?[0-9]*)*$', 394, 1549], \
[r'^-?\d+$', 317, 1161], [r'^-\d+$', 294, 1189], [r'^\d+$', 278, 1097] , [r'^([0-9]{3}|)$', 0, 0]
]
complex_regex = old_complex + new_complex
old_bonus = [[r'\babc\b$', 258, 865], [r'abc\B', 177, 987], [r'\babc\B', 247, 928]]
new_bonus = [[r'\b\d{13,16}\b$', 303, 1063],[r'\bon\w+=\S+((\w| )*>)$', 436, 1263]]
bonus_regex = old_bonus + new_bonus
old_secret_set_of_regex = old_simple + old_complex + old_bonus
new_secret_set_of_regex = new_simple + new_complex + new_bonus
secret_set_of_regex = old_secret_set_of_regex + new_secret_set_of_regex
from multiprocessing import Process
import traceback
def run_with_limited_time(func, args, kwargs, time):
p = Process(target=func, args=args, kwargs=kwargs)
p.start()
p.join(time)
if p.is_alive():
p.terminate()
return False
return True
to_process = []
for pattern, py_min_coverage, c_min_coverage in secret_set_of_regex:
try:
if run_with_limited_time(regex_to_CFG, (pattern,), {}, 60):
grammar = regex_to_CFG(pattern)
to_process.append([pattern, py_min_coverage, c_min_coverage, grammar])
except Exception as e:
e.__traceback__ = None
traceback.clear_frames(e.__traceback__)
pass
Each regex will be evaluated individually, against a minimum precomputed coverage. Plot coverage result and indicate errors.
def evaluate(grammar, pattern, py_min_coverage, c_min_coverage):
py_all_coverage, c_max_coverage = 0, 0
py_percent, c_percent = "0", "0"
try:
inputs = generate_inputs(grammar)
if ensure_all_valid(pattern, inputs):
py_all_coverage, c_max_coverage, py_percent, c_percent = population_coverage(pattern, inputs)
except Exception as e:
e.__traceback__ = None
traceback.clear_frames(e.__traceback__)
pass
return py_all_coverage >= py_min_coverage and c_max_coverage >= c_min_coverage, py_all_coverage, c_max_coverage, py_percent, c_percent
Plot coverage result and indicate errors.
import numpy as np
import matplotlib.pyplot as plt
def plot_bar_chart(py_cov_tuple, c_cov_tuple, py_err, c_err, num_grammars):
fig, ax = plt.subplots(figsize=(20,10))
index = np.arange(num_grammars)
bar_width = 0.4
opacity = 0.4
error_config = {'ecolor': '0.3'}
rects1 = ax.bar(index, py_cov_tuple, bar_width,
alpha=opacity, color='b', yerr=py_err,
error_kw=error_config,
label='Python Code')
rects2 = ax.bar(index + bar_width, c_cov_tuple, bar_width,
alpha=opacity, color='r', yerr=c_err,
error_kw=error_config,
label='C Code')
ax.set_xlabel('Regex')
ax.set_ylabel('Coverage (LOC)')
ax.set_title('Coverage of Regex by Language (Python & C)')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(tuple([str(y) for y in tuple(range(1, len(secret_set_of_regex) + 1))]))
ax.legend()
fig.tight_layout()
plt.show()
Final result will be computed accordingly:
def compute_results():
overall_result, py_cov_list, c_cov_list, py_err, c_err, evaluation_list, failed_regexes = [], [], [], [], [], [], []
dash = '-' * 98
print(dash)
print('{:<28s}{:>12s}{:>12s}{:>12s}{:>12s}{:>18s}'.format("Pattern", "Py_cov (LOC)", "Py_cov (%)", "C_cov (LOC)", "C_cov (%)", "Pass(1)/Fail(0)"))
print(dash)
passed_regexes = set()
for pattern, py_min_coverage, c_min_coverage, grammar in to_process:
evaluation_result, py_all_coverage, c_max_coverage, py_percent, c_percent = evaluate(grammar, pattern, py_min_coverage, c_min_coverage)
py_cov_list.append(py_all_coverage)
c_cov_list.append(c_max_coverage)
evaluation_list.append(evaluation_result)
if py_all_coverage < py_min_coverage:
py_err.append((py_all_coverage - py_min_coverage ))
failed_regexes.append([pattern, "Py"])
else:
py_err.append(0)
if c_max_coverage < c_min_coverage:
c_err.append((c_max_coverage - c_min_coverage ))
failed_regexes.append([pattern, "C"])
else:
c_err.append(0)
if py_all_coverage >= py_min_coverage and c_max_coverage >= c_min_coverage:
passed_regexes.add(pattern)
print('{:<28s}{:>12d}{:>12s}{:>12d}{:>12s}{:>12b}'.format(pattern, py_all_coverage, py_percent, c_max_coverage, c_percent, c_max_coverage >= c_min_coverage and py_all_coverage >= py_min_coverage))
overall_result.append([pattern, evaluation_result])
plot_bar_chart(tuple(py_cov_list), tuple(c_cov_list), tuple(py_err), tuple(c_err), len(to_process))
score = sum(evaluation_list)
print("Score: {0}/{1}\nScore Percentage: {2}%".format(score,len(secret_set_of_regex), ((score*100)/len(secret_set_of_regex))))
if failed_regexes:
print("The following regexes failed the coverage baseline: ", failed_regexes)
return failed_regexes, passed_regexes
failed_regex, passed_regex = compute_results()
-------------------------------------------------------------------------------------------------- Pattern Py_cov (LOC) Py_cov (%) C_cov (LOC) C_cov (%) Pass(1)/Fail(0) -------------------------------------------------------------------------------------------------- ^ab$ 241 8% 988 7% 1 b$ 170 6% 949 6% 1 ab*$ 207 7% 1142 8% 1 a|b$ 308 11% 1024 7% 1 a(b|c)$ 321 11% 1198 8% 1 (a|b)\s$ 334 12% 1131 8% 1 ^a*(b+|c)d?$ 413 15% 1442 10% 1 ^a*(b+|c)d?\se$ 455 16% 1633 11% 1 ^a+(b*|c)d?\se$ 426 15% 1564 11% 1 ^[0-9]+\s[a-z]*\s?[A-Z]+$ 303 11% 1228 9% 1 \w$ 169 6% 878 6% 1 \d$ 169 6% 875 6% 1 (a|b){2,5}$ 401 14% 1380 10% 1 [0-9]$ 222 8% 852 6% 1 [a-z]$ 222 8% 852 6% 1 [^a-zA-Z] 213 7% 879 6% 1 [^0-9] 178 6% 825 6% 1 \D$ 168 6% 877 6% 1 \W$ 168 6% 879 6% 1 \S$ 168 6% 879 6% 1 [a-zA-Z]\s\w+\s\d\s$ 324 11% 1295 9% 1 a(bc){2,5}$ 297 10% 1472 10% 1 \babc\b$ 258 9% 865 6% 1 abc\B 177 6% 987 7% 1 \babc\B 247 9% 928 6% 1 ^\s+$ 278 10% 1099 8% 1 ^$ 157 5% 820 6% 1 /b(a|e|i|o|u)t/ 303 11% 1171 8% 1 ^The$ 241 8% 988 7% 1 of despair$ 170 6% 998 7% 1 ^abc$ 241 8% 988 7% 1 grammar$ 170 6% 999 7% 1 hi|hello$ 272 9% 1088 8% 1 ab+$ 207 7% 1153 8% 1 ab?$ 207 7% 1140 8% 1 a?b+$ 284 10% 1202 8% 1 (b|cd)ef$ 389 14% 1294 9% 1 a(bc)*$ 273 9% 1439 10% 1 (a|b)*c$ 398 14% 1482 10% 1 (</?(\w|\s)*>|<.+\W>)$ 423 15% 1752 12% 1 \d{2}[ ]?\d{3}$ 356 12% 1100 8% 1 \d{3}-\d{4}$ 330 12% 1173 8% 1 [ABCEGHJKLMNPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ ]?\d[ABCEGHJ-NPRSTV-Z]\d$ 342 12% 1160 8% 1 \d{5}([ -]\d{4})?$ 430 15% 1470 10% 1 \d{5}$ 299 10% 1072 7% 1 (/[*]+)((\w| )*)([*]+/)$ 424 15% 1637 12% 1 ^((\w| )+)/([^/]+)$ 473 17% 1618 11% 1 ^[A-PR-WY][1-9]\d\s?\d{4}[1-9]$ 390 14% 1201 8% 1 (^\d{5}(-\d{4})?)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1})$ 524 19% 1587 11% 1 ^([1-9]|0[1-9]|[12][0-9]|3[01])\D([1-9]|0[1-9]|1[012])\D(19[0-9][0-9]|20[0-9][0-9])$ 455 16% 1366 10% 1 ^3[47][0-9]{13}$ 375 13% 1244 9% 1 ^(\w|,|\s|-)+[.][A-Za-z]{3}$ 520 18% 1673 12% 1 ^([a-z][a-z0-9-]+([.]|-*[.]))+[a-z]{2,6}$ 494 18% 1763 12% 1 ^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$ 0 0 0 0 0 ^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])[.](\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]){3}$ 487 17% 1723 12% 1 ^((\w| )*[a-z])((\w| )*[A-Z])((\w| )*\d)(\w| ){6,12}$ 482 17% 1640 12% 1 ^[a-z0-9_-]{6,18}$ 361 13% 1149 8% 1 ^([a-z0-9_.+-]+)@([0-9a-z.-]+)[.]([a-z.]{2,6})$ 444 16% 1571 11% 1 ^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+[.][a-zA-Z]{2,6})*$ 431 15% 1670 12% 1 ^([a-z0-9_.-]+)@([0-9a-z.-]+)[.]([a-z.]{2,6})$ 444 16% 1571 11% 1 ^[a-z0-9_-]{3,16}$ 361 13% 1149 8% 1 ^(19|20)\d{2}$ 465 16% 1415 10% 1 ^[+]?([0-9]|\s)+[(]?([0-9]|\s){10,}$ 0 0 0 0 0 ^[+]?([0-9]|\s){3,}$ 0 0 0 0 0 ^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$ 401 14% 1324 9% 1 ^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$ 496 18% 1468 10% 1 ^#?([a-f0-9]{6}|[a-f0-9]{3})$ 496 18% 1468 10% 1 ^[+-<>., ]+$ 336 12% 1151 8% 1 ^([01]?[0-9]|2[0-3]):[0-5][0-9]$ 462 16% 1513 11% 0 ^(0?[1-9]|[12][0-9]|3[01])([ /-])(0?[1-9]|1[012])2([0-9][0-9][0-9][0-9])(([ -])([0-1]?[0-9]|2[0-3]):[0-5]?[0-9]:[0-5]?[0-9])?$ 465 16% 1786 13% 1 ^-?\d*([.]\d+)?$ 364 13% 1451 10% 1 ^\d*([.]\d+)?$ 363 13% 1430 10% 1 ^\d*[.]\d+$ 359 13% 1241 9% 1 ^-?\d*[.]?\d+$ 353 12% 1284 9% 1 ^-\d*[.]?\d+$ 331 12% 1302 9% 0 ^\d*[.]?\d+$ 352 12% 1262 9% 1 ^(\w| )+[.](png|jpg|jpeg|gif|webp)$ 501 18% 1639 12% 1 [-]?[0-9]+[,.]?[0-9]*([/][0-9]+[,.]?[0-9]*)*$ 394 14% 1549 11% 1 ^-?\d+$ 317 11% 1161 8% 1 ^-\d+$ 294 10% 1189 8% 1 ^\d+$ 278 10% 1097 8% 1 ^([0-9]{3}|)$ 412 15% 1334 9% 1 \b\d{13,16}\b$ 303 11% 1063 7% 1 \bon\w+=\S+((\w| )*>)$ 436 15% 1263 9% 1
Score: 81/86 Score Percentage: 94.18604651162791% The following regexes failed the coverage baseline: [['^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$', 'Py'], ['^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$', 'C'], ['^[+]?([0-9]|\\s)+[(]?([0-9]|\\s){10,}$', 'Py'], ['^[+]?([0-9]|\\s)+[(]?([0-9]|\\s){10,}$', 'C'], ['^[+]?([0-9]|\\s){3,}$', 'Py'], ['^[+]?([0-9]|\\s){3,}$', 'C'], ['^([01]?[0-9]|2[0-3]):[0-5][0-9]$', 'C'], ['^-\\d*[.]?\\d+$', 'C']]
total_simple = len(simple_regex)
total_complex = len(complex_regex)
total_bonus = len(bonus_regex)
num_passed_simple, num_passed_complex, num_passed_bonus = 0,0,0
def getRegexPatterns(eval_regex):
res = set()
for elem in eval_regex:
res.add(str(elem[0]))
return res
simple_patterns = getRegexPatterns(simple_regex)
complex_patterns = getRegexPatterns(complex_regex)
bonus_patterns = getRegexPatterns(bonus_regex)
for item in passed_regex:
if item in simple_patterns:
num_passed_simple += 1
continue
if item in complex_patterns:
num_passed_complex += 1
continue
if item in bonus_patterns:
num_passed_bonus += 1
continue
print("Detailed report of the implementation's score for each category:")
print("\tSimple Regex: {0}/{1}".format(num_passed_simple, total_simple))
print("\tComplex Regex: {0}/{1}".format(num_passed_complex, total_complex))
print("\tBonus Regex: {0}/{1}".format(num_passed_bonus, total_bonus))
Detailed report of the implementation's score for each category: Simple Regex: 23/23 Complex Regex: 53/58 Bonus Regex: 5/5
print("failed_regex: ", getRegexPatterns(failed_regex))
failed_regex: {'^-\\d*[.]?\\d+$', '^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])[.]){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$', '^([01]?[0-9]|2[0-3]):[0-5][0-9]$', '^[+]?([0-9]|\\s)+[(]?([0-9]|\\s){10,}$', '^[+]?([0-9]|\\s){3,}$'}
set_unattempted_regex = getRegexPatterns(secret_set_of_regex) - getRegexPatterns(failed_regex) - passed_regex
print("set_unattempted_regex: ", set_unattempted_regex)
set_unattempted_regex: set()
print("# failed_regex: ", len(getRegexPatterns(failed_regex)))
print("# set_unattempted_regex: ", len(set_unattempted_regex))
# failed_regex: 5 # set_unattempted_regex: 0