In [ ]:

```
__author__ = "Chris Potts"
__version__ = "CS224u, Stanford, Spring 2018 term"
```

Make sure the

`sys.path.append`

value is the path to your local SippyCup repository. (Alternatively, you can add SippyCup to your Python path; see one of the teaching team if you'd like to do that but aren't sure how.)Make sure that semparse_math_bakeoff_data.py is in the current directory (or available via your Python path).

In [ ]:

```
from copy import copy
import random
import sys
sys.path.append('../sippycup')
```

It's an indeterminate time in the future. An alien invasion is imminent. We have intercepted many of the aliens' transmissions and begun the process of decoding their language. Luckily, we have found a small database of alien language statements paired with numbers that seem to be the denotations of those statements.

Linguists, working tirelessly, have translated the numbers into standard arabic notation, but they have made little headway in understanding the meanings of the words and phrases in the statements. Standard bag-of-words classifiers were little help with the high-dimensional output space.

You've been called in personally by World President Zahara Jolie-Pitt-Kardashian to complete the translation task. Your goal is to use the available data to induce a lexicon mapping alien words to their associated mathematical concepts. Time is of the essence.

The data are available in `semparse_math_bakeoff_data.py`

, which contains two lists of SippyCup `Example`

instances:

In [ ]:

```
from semparse_math_bakeoff_data import mathbake_train, mathbake_dev
```

Let's check out some examples:

In [ ]:

```
for i in range(5):
ex = mathbake_train[random.randint(0, len(mathbake_train))]
print(ex.input, ex.denotation)
```

To start, the goal should be to create a grammar that can find at least one parse with the correct denotation. With that done, we can rely on features and our training data to find weights that favor the correct parses.

The other linguists on the team have extracted the vocabulary, and they can say with confidence that the words in the grammar can be classified as follows:

In [ ]:

```
integers = ['fribbs', 'volms', 'scincs', 'kugns', 'glarc', 'sherle']
predicates = ['sniese', 'thouch', 'sklofg', 'scwokt']
```

We can begin building a crude grammar on this basis. We'll start with an empty one:

In [ ]:

```
import itertools
from parsing import Grammar
# Increasing this value will increase your chances of finding
# correct parses, but it will slow everything down.
import parsing
parsing.MAX_CELL_CAPACITY = 1000
gram = Grammar(start_symbol='$E')
```

To start, we assume that the integers all have their denotations somewhere in the interval [0,5], and we consider every hypothesis of that form:

In [ ]:

```
from parsing import Rule, add_rule
for w, i in itertools.product(integers, range(len(integers))):
add_rule(gram, Rule('$E', w, i))
```

We assume also that there are unary and binary operators, so we add those combination rules:

In [ ]:

```
# Unary connective, as in English "minus one":
add_rule(gram, Rule('$E', '$UnOp $E', lambda sems: (sems[0], sems[1])))
# First stage of binary connective, as in English "two plus":
add_rule(gram, Rule('$EBO', '$E $BinOp', lambda sems: (sems[1], sems[0])))
# Second stage of binary connective, as in English "(two plus) seven":
add_rule(gram, Rule('$E', '$EBO $E', lambda sems: (sems[0][0], sems[0][1], sems[1])))
```

Determining the semantic space of the operators is harder. The executor from SippyCup's `arithmetic.py`

seems like a reasonable place to start:

In [ ]:

```
unary_ops = {
'~': lambda x: -x
}
binary_ops = {
'+': lambda x, y: x + y,
'-': lambda x, y: x - y,
'*': lambda x, y: x * y
}
##################################################
#### Consider extending one or both ops dicts ####
ops = {key: val for key, val in itertools.chain(unary_ops.items(), binary_ops.items())}
def execute(semantics):
if isinstance(semantics, tuple):
op = ops[semantics[0]]
args = [execute(arg) for arg in semantics[1:]]
return op(*args)
else:
return semantics
```

**TO DO**: Bring in the words in `predicates`

, in the form of a set of grammar rules like those we added for the integers. Since you don't yet know whether the predicates are unary or binary, you'll have to add rules that allow for all possible meanings.

In [ ]:

```
##################################################
########## Add your operators rules here #########
```

If all is going well, the vast majority of sentences of the alien language should now have a parse with a correct denotation. That is, our oracle accuracy should be at least 80%. (In fact, if it is this high, it is probably 100% but the target sometimes wasn't included in the sample of parses found during search.)

In [ ]:

```
from parsing import parse_input
def check_oracle_accuracy(grammar=None, examples=mathbake_train, verbose=True):
oracle = 0
for ex in examples:
# All the denotations for all the parses:
dens = [execute(parse.semantics) for parse in gram.parse_input(ex.input)]
if ex.denotation in dens:
oracle += 1
elif verbose:
print("=" * 70)
print(ex.input)
print(set(dens))
print(ex.denotation)
percent_correct = int(round((oracle/float(len(examples)))*100, 0))
print("Oracle accuracy: %s / %s (%s%%)" % (oracle, len(examples), percent_correct))
```

In [ ]:

```
check_oracle_accuracy(grammar=gram, examples=mathbake_train, verbose=False)
```

If your oracle accuracy isn't above 80%, then consider expanding the space of operators defined by `ops`

and expanding the space of rules accordingly. **There's no guarantee that the alien language uses precisely the operators given by ops!** (Hint: it would be a lot of trouble to deal with operators that could return non-

`int`

values.)Your grammar is now successful in that it finds correct parses and associated denotations for the alien language. However, World President Zahara Jolie-Pitt-Kardashian is unlikely to be impressed, because you can't tell her *which* denotation is correct, and so you can't induce a translation lexicon either. To address this, we need to find feature weights that are effective at using the training data to identify the best hypotheses allowed by the grammar.

We can start with the core features given by `scoring.rule_features`

:

In [ ]:

```
from scoring import Model, rule_features
from arithmetic import ArithmeticDomain # A source of more feature functions!
def arithmetic_features(parse):
features = rule_features(parse)
# Consider adding to the features dict based on properties of
# parse and/or parse.semantics. SippyCup's `ArithmeticDomain`
# has a method `operator_precedence_features` that might be
# helpful here, for example.
return features
```

**TO DO**: Improve on the features returned by `arithmetic_features`

.

Now can build and train the model.

In [ ]:

```
model = Model(grammar=gram, feature_fn=arithmetic_features, executor=execute)
```

**TO DO**: Improve on the optimizer settings!

In [ ]:

```
from learning import latent_sgd
from metrics import DenotationAccuracyMetric
##################################################
#### Consider improving the optimizer settings ###
trained_model = latent_sgd(
model,
mathbake_train,
training_metric=DenotationAccuracyMetric(),
T=5,
loss='hinge',
l2_penalty=0.0,
eta=0.1)
```

Now that the model is trained, we can evaluate it on the held-out data:

In [ ]:

```
from experiment import evaluate_model
from metrics import denotation_match_metrics
evaluate_model(
model=trained_model,
examples=mathbake_dev,
examples_label="Dev",
metrics=denotation_match_metrics(),
print_examples=False)
```

- Enter your "denotation accuracy" score (non-oracle) from the above into the bake-off.
- Enter a description of the feature functions and optimization settings you used.

Submission URL: https://goo.gl/forms/DCVvih3prRRa06ov2

Our primary objective was to learn how to translate the alien language into our own language for math (basic arithmetic). To see how well we did, we can look at the weights the classifier learned for the core rule-based features.

In [ ]:

```
from operator import itemgetter
from collections import defaultdict
def view_lexical_features(weights):
# Get the lexical features:
feats = [(featname, val) for featname, val in weights.items()
if val > 0.0 and isinstance(featname, str) and featname.startswith('Rule')]
# Get the core parts:
lex = defaultdict(list)
for featname, val in feats:
r = eval(featname)
lex[r.rhs[0]].append((r.sem, val))
# Restrict to the highest weights for each feature:
for w, vals in lex.items():
maxval = max([x[1] for x in vals])
vals = [x for x in vals if x[1]==maxval]
lex[w] = vals
# Printout sorted by our own semantic operators:
for featname, vals in sorted(lex.items(), key=(lambda item: str(item[1]))):
for val in vals:
print("'%s' means %s (weight %0.02f)" % (featname, val[0], val[1]))
```

In [ ]:

```
view_lexical_features(trained_model.weights)
```

**Are you correct?**

Paste in your output from view_lexical_features here to find out!

(Paste in the entire output as printed in the cell above; the script that checks the input is pretty strict about the formatting.)

This isn't part of the bake-off submission. The stakes are higher here!