Learning an alien language

In [ ]:
__author__ = "Chris Potts"
__version__ = "CS224u, Stanford, Spring 2016 term"

Set-up

  1. Make sure the sys.path.append value is the path to your local SippyCup repository. (Alternatively, you can add SippyCup to your Python path; see one of the teaching team if you'd like to do that but aren't sure how.)

  2. Make sure that semparse_math_bakeoff_data.py is in the current directory (or available via your Python path).

In [ ]:
import sys
sys.path.append('../sippycup')

The doomsday scenario

It's an indeterminate time in the future. An alien invasion is imminent. We have intercepted many of the aliens' transmissions and begun the process of decoding their language. Luckily, we have found a small database of alien language statements paired with numbers that seem to be the denotations of those statements.

Linguists, working tirelessly, have translated the numbers into standard arabic notation, but they have made little headway in understanding the meanings of the words and phrases in the statements. Standard bag-of-words classifiers were little help with the high-dimensional output space.

You've been called in personally by World President Zahara Jolie-Pitt-Kardashian to complete the translation task. Your goal is to use the available data to induce a lexicon mapping alien words to their associated mathematical concepts. Time is of the essence.

In-class bake-off

Turn in the "denotation accuracy" score you obtain from the the evaluate_model use at the end of the "Objective 2" section. We'll report on the results in the discussion forum.

(You'll probably want to keep going from there to find out whether you cracked the code.)

In [ ]:
from copy import copy
import random

The data

The data are available in semparse_math_bakeoff_data.py, which contains two lists of SippyCup Example instances:

In [ ]:
from semparse_math_bakeoff_data import mathbake_train, mathbake_dev

Let's check out some examples:

In [ ]:
for i in range(5):
    ex = mathbake_train[random.randint(0, len(mathbake_train))]
    print(ex.input, ex.denotation)

Objective 1: Oracle accuracy

To start, the goal should be to create a grammar that can find at least one parse with the correct denotation. With that done, we can rely on features and our training data to find weights that favor the correct parses.

The other linguists on the team have extracted the vocabulary, and they can say with confidence that the words in the grammar can be classified as follows:

In [ ]:
integers = ['fribbs', 'volms', 'scincs', 'kugns', 'glarc', 'sherle']
predicates = ['sniese', 'thouch', 'sklofg', 'scwokt']

We can begin building a crude grammar on this basis. We'll start with an empty one:

In [ ]:
import itertools
from parsing import Grammar

# Increasing this value will increase your chances of finding 
# correct parses, but it will slow everything down.
import parsing
parsing.MAX_CELL_CAPACITY = 1000

gram = Grammar(start_symbol='$E') 

To start, we assume that the integers all have their denotations somewhere in the interval [0,5], and we consider every hypothesis of that form:

In [ ]:
from parsing import Rule, add_rule

for w, i in itertools.product(integers, range(len(integers))):
    add_rule(gram, Rule('$E', w, i))

We assume also that there are unary and binary operators, so we add those combination rules:

In [ ]:
# Unary connective, as in English "minus one":
add_rule(gram, Rule('$E', '$UnOp $E', lambda sems: (sems[0], sems[1])))

# First stage of binary connective, as in English "two plus":
add_rule(gram, Rule('$EBO', '$E $BinOp', lambda sems: (sems[1], sems[0])))

# Second stage of binary connective, as in English "(two plus) seven":
add_rule(gram, Rule('$E', '$EBO $E', lambda sems: (sems[0][0], sems[0][1], sems[1])))

Determining the semantic space of the operators is harder. The executor from SippyCup's arithmetic.py seems like a reasonable place to start:

In [ ]:
unary_ops = {
    '~': lambda x: -x
}

binary_ops = {
    '+': lambda x, y: x + y,
    '-': lambda x, y: x - y,
    '*': lambda x, y: x * y
}

##################################################
#### Consider extending one or both ops dicts ####




ops = {key:val for key, val in itertools.chain(unary_ops.items(), binary_ops.items())}

def execute(semantics):
    if isinstance(semantics, tuple):
        op = ops[semantics[0]]
        args = [execute(arg) for arg in semantics[1:]]
        return op(*args)
    else:
        return semantics

TO DO: bring in the words in predicates, in the form of a set of grammar rules like those we added for the integers:

In [ ]:
##################################################
########## Add your operators rules here ######### 

If all is going well, the vast majority of sentences of the alien language should now have a parse with a correct denotation. That is, our oracle accuracy should be at least 80%. (In fact, if it is this high, it is probably 100% but the target sometimes wasn't included in the sample of parses found during search.)

In [ ]:
from parsing import parse_input

def check_oracle_accuracy(grammar=None, examples=mathbake_train, verbose=True):
    oracle = 0
    for ex in examples:
        # All the denotations for all the parses:
        dens = [execute(parse.semantics) for parse in gram.parse_input(ex.input)]
        if ex.denotation in dens:
            oracle += 1
        elif verbose:
            print("=" * 70)
            print(ex.input)
            print(set(dens))
            print(ex.denotation)
    percent_correct = int(round((oracle/float(len(examples)))*100, 0))
    print("Oracle accuracy: %s / %s (%s%%)" % (oracle, len(examples), percent_correct))
In [ ]:
check_oracle_accuracy(grammar=gram, examples=mathbake_train, verbose=False)

If your oracle accuracy isn't above 80%, then consider expanding the space of operators defined by ops and expanding the space of rules accordingly. There's no guarantee that the alien language uses precisely the operators given by ops!

Objective 2: Predictive accuracy

Your grammar is now successful in that it finds correct parses and associated denotations for the alien language. However, World President Zahara Jolie-Pitt-Kardashian is unlikely to be impressed, because you can't tell her which denotation is correct, and so you can't induce a translation lexicon either. To address this, we need to find feature weights that are effective at using the training data to identify the best hypotheses allowed by the grammar.

We can start with the core features given by scoring.rule_features:

In [ ]:
from scoring import Model, rule_features
from arithmetic import ArithmeticDomain # A source of more feature functions!

def arithmetic_features(parse):
    features = rule_features(parse)
    # Consider adding to the features dict based on properties of
    # parse and/or parse.semantics
    
    
    
    return features

TO DO: Improve on the features returned by arithmetic_features.

Now can build and train the model.

In [ ]:
model = Model(grammar=gram, feature_fn=arithmetic_features, executor=execute)

TO DO: improve on the optimizer settings!

In [ ]:
from learning import latent_sgd
from metrics import DenotationAccuracyMetric

##################################################
#### Consider improving the optimizer settings ###

trained_model = latent_sgd(
    model, 
    mathbake_train, 
    training_metric=DenotationAccuracyMetric(), 
    T=10, 
    eta=0.1)

Now that the model is trained, we can evaluate it on the held-out data:

In [ ]:
from experiment import evaluate_model
from metrics import denotation_match_metrics

evaluate_model(
    model=trained_model, 
    examples=mathbake_dev, 
    examples_label="Dev",
    metrics=denotation_match_metrics(),
    print_examples=False)

Enter your "denotation accuracy" score (non-oracle) from the above into the bake-off.

Objective 3: The translation function

Our primary objective was to learn how to translate the alien language into our own language for math (basic arithmetic). To see how well we did, we can look at the weights the classifier learned for the core rule-based features.

In [ ]:
from operator import itemgetter
from collections import defaultdict

def view_lexical_features(weights):
    # Get the lexical features:        
    feats = [(featname, val) for featname, val in weights.items() 
             if val > 0.0 and isinstance(featname, str) and featname.startswith('Rule')]
    # Get the core parts:
    lex = defaultdict(list)
    for featname, val in feats:
        r = eval(featname)
        lex[r.rhs[0]].append((r.sem, val))    
    # Restrict to the highest weights for each feature:
    for w, vals in lex.items():
        maxval = max([x[1] for x in vals])
        vals = [x for x in vals if x[1]==maxval]
        lex[w] = vals  
    # Printout sorted by our own semantic operators:
    for featname, vals in sorted(lex.items(), key=(lambda item: str(item[1]))):
        for val in vals:
            print("'%s' means %s (weight %0.02f)" % (featname, val[0], val[1]))
In [ ]:
view_lexical_features(trained_model.weights)