#!/usr/bin/env python
# coding: utf-8
# Bill MacCartney
# Spring 2015
#
#
#     Rule('$BikeMode', 'bike')
#     Rule('$BikeMode', 'by bike')
#
# with a single rule containing an optional element:
#
#     Rule('$BikeMode', '?by bike')
#
# The optional mechanism also makes it easy to define one category as a sequence of another category:
#
#     Rule('$Thing', 'thing')
#     Rule('$Things', '$Thing ?$Things')
#
# The following functions enable the optional mechanism.

# In[14]:

from types import FunctionType

def is_optional(label):
    """
    Returns true iff the given RHS item is optional, i.e., is marked with an
    initial '?'.
    """
    return label.startswith('?') and len(label) > 1

def contains_optionals(rule):
    """Returns true iff the given Rule contains any optional items on the RHS."""
    return any([is_optional(rhsi) for rhsi in rule.rhs])

def add_rule_containing_optional(grammar, rule):
    """
    Handles adding a rule which contains an optional element on the RHS.
    We find the leftmost optional element on the RHS, and then generate two
    variants of the rule: one in which that element is required, and one in
    which it is removed.  We add these variants in place of the original rule.
    (If there are more optional elements further to the right, we'll wind up
    recursing.)

    For example, if the original rule is:

        Rule('$Z', '$A ?$B ?$C $D')

    then we add these rules instead:

        Rule('$Z', '$A $B ?$C $D')
        Rule('$Z', '$A ?$C $D')
    """
    # Find index of the first optional element on the RHS.
    first = next((idx for idx, elt in enumerate(rule.rhs) if is_optional(elt)), -1)
    assert first >= 0
    assert len(rule.rhs) > 1, 'Entire RHS is optional: %s' % rule
    prefix = rule.rhs[:first]
    suffix = rule.rhs[(first + 1):]
    # First variant: the first optional element gets deoptionalized.
    deoptionalized = (rule.rhs[first][1:],)
    add_rule(grammar, Rule(rule.lhs, prefix + deoptionalized + suffix, rule.sem))
    # Second variant: the first optional element gets removed.
    # If the semantics is a value, just keep it as is.
    sem = rule.sem
    # But if it's a function, we need to supply a dummy argument for the
    # removed element.
    if isinstance(rule.sem, FunctionType):
        sem = lambda sems: rule.sem(sems[:first] + [None] + sems[first:])
    add_rule(grammar, Rule(rule.lhs, prefix + suffix, sem))
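
# As a quick sanity check, the cell below replays the expansion described in the docstring on the rule `Rule('$Z', '$A ?$B ?$C $D')`. This is just an illustrative sketch: it uses a simplified stand-in for SippyCup's `Rule`, and collects the expanded rules in a list instead of adding them to a grammar.

# In[ ]:

from collections import namedtuple

# A simplified stand-in for Rule, just for this demonstration.
DemoRule = namedtuple('DemoRule', ['lhs', 'rhs', 'sem'])

def demo_expand(rule, expanded):
    """Recursively expands optionals, collecting fully deoptionalized rules."""
    if contains_optionals(rule):
        first = next(idx for idx, elt in enumerate(rule.rhs) if is_optional(elt))
        prefix, suffix = rule.rhs[:first], rule.rhs[(first + 1):]
        required = (rule.rhs[first][1:],)
        demo_expand(DemoRule(rule.lhs, prefix + required + suffix, rule.sem), expanded)
        demo_expand(DemoRule(rule.lhs, prefix + suffix, rule.sem), expanded)
    else:
        expanded.append(rule)

expanded = []
demo_expand(DemoRule('$Z', ('$A', '?$B', '?$C', '$D'), None), expanded)
for rule in expanded:
    print(rule.lhs, rule.rhs)

# Expected output: four fully expanded variants, with and without $B and $C:
#   $Z ('$A', '$B', '$C', '$D')
#   $Z ('$A', '$B', '$D')
#   $Z ('$A', '$C', '$D')
#   $Z ('$A', '$D')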
# ## Grammar engineering

# It's time to start writing some grammar rules for the travel domain. We're going to adopt a data-driven approach. Using the 75 training examples as a development set, we will iteratively:
#
# - look at examples which are not yet parsed correctly,
# - identify common obstacles or sources of error,
# - introduce new rules to address those problems, and then
# - re-evaluate on the 75 training examples.
#
# During grammar engineering, the performance metric we'll focus on is *oracle* accuracy (the proportion of examples for which *any* parse is correct), not accuracy (the proportion of examples for which the *first* parse is correct). Remember that oracle accuracy is an upper bound on accuracy. Oracle accuracy is a measure of the expressive power of the grammar: does it have the rules it needs to generate the correct parse? The gap between oracle accuracy and accuracy, on the other hand, reflects the ability of the scoring model to bring the correct parse to the top of the candidate list.

# ### Phrase-bag grammars

# For the travel domain, we're going to develop a style of grammar known as a *phrase-bag grammar*. To get a sense of how it will work, let's look at ten example inputs from the training data.
#
#     travel boston to fr. myers fla
#     how do i get from tulsa oklahoma to atlantic city. new jersey by air
#     airbus from boston to europe
#     cheap tickets to south carolina
#     birmingham al distance from indianapolish in
#     transportation to the philadelphia airport
#     one day cruise from fort lauderdale florida
#     directions from washington to canada
#     flights from portland or to seattle wa
#     honeymoon trip to hawaii
#
# Here we've highlighted phrases in different colors, according to the roles they play in building the meaning of the input.
#
# - Green phrases indicate the destination of travel.
# - Blue phrases indicate the origin of travel.
# - Yellow phrases indicate a mode of travel: air, boat, etc.
# - Orange phrases indicate travel of some kind, but do not specify a travel mode.
# - Red phrases indicate a specific type of information sought: distance, directions, etc.
# - Gray phrases indicate "optional" words which contribute little to the semantics. (The modifiers "one day" and "honeymoon" may be quite meaningful to the user, but our semantic representation is too impoverished to capture them, so they are not relevant for our grammar.)
#
# Note that the *ordering* of phrases in a query isn't particularly important. Whether the user says "`directions from washington to canada`", or "`from washington to canada directions`", or even "`to canada directions from washington`", the intent is clearly the same. Some of these formulations might be more natural than others — and more common in the query logs — but all of them should be interpretable by our grammar.
#
# That's the motivation for the phrase-bag style of grammar. In a simple phrase-bag grammar, a valid query is made up of one or more *query elements*, which can appear in any order, and *optionals*, which can be scattered freely amongst the query elements. (More complex phrase-bag grammars can impose further constraints.) For the travel domain, we can identify two types of query elements: *travel locations* and *travel arguments*. A travel location is either a *to location* (destination) or a *from location* (origin). A travel argument is either a *travel mode*, a *travel trigger*, or a *request type*.
#
# We're ready to write our first grammar rules for the travel domain.

# In[15]:

def sems_0(sems):
    return sems[0]

def sems_1(sems):
    return sems[1]

def merge_dicts(d1, d2):
    if not d2:
        return d1
    result = d1.copy()
    result.update(d2)
    return result

rules_travel = [
    Rule('$ROOT', '$TravelQuery', sems_0),
    Rule('$TravelQuery', '$TravelQueryElements',
         lambda sems: merge_dicts({'domain': 'travel'}, sems[0])),
    Rule('$TravelQueryElements', '$TravelQueryElement ?$TravelQueryElements',
         lambda sems: merge_dicts(sems[0], sems[1])),
    Rule('$TravelQueryElement', '$TravelLocation', sems_0),
    Rule('$TravelQueryElement', '$TravelArgument', sems_0),
]

# These rules are incomplete: `$TravelLocation` and `$TravelArgument` are not yet defined, and there is no mention of optionals. But these rules define the high-level structure of our phrase-bag grammar.
#
# Pay attention to the semantic functions attached to each rule. Remember that our semantic representations are maps of key-value pairs. In the rules which define `$TravelQueryElement`, the semantic function propagates the semantics of the child unchanged. In the rule which defines `$TravelQueryElements`, the semantic function merges the semantics of the children. And in the rule which defines `$TravelQuery`, the semantic function adds a key-value pair to the semantics of the child.
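
# To make the composition concrete, the cell below applies `merge_dicts` by hand to the semantics of two hypothetical query elements. (The Boston values mirror the GeoNames representation introduced below; they are illustrative, not computed.)

# In[ ]:

origin_sem = {'origin': {'id': 4930956, 'name': 'Boston, MA, US'}}
mode_sem = {'mode': 'air'}

# $TravelQueryElements merges the semantics of adjacent query elements ...
elements_sem = merge_dicts(origin_sem, mode_sem)
print(elements_sem)

# ... and $TravelQuery stamps the result with the domain.
print(merge_dicts({'domain': 'travel'}, elements_sem))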
# ### Travel locations

# The rules introduced above left `$TravelLocation` undefined. Let's add some rules to define it.

# In[16]:

rules_travel_locations = [
    Rule('$TravelLocation', '$ToLocation', sems_0),
    Rule('$TravelLocation', '$FromLocation', sems_0),
    Rule('$ToLocation', '$To $Location', lambda sems: {'destination': sems[1]}),
    Rule('$FromLocation', '$From $Location', lambda sems: {'origin': sems[1]}),
    Rule('$To', 'to'),
    Rule('$From', 'from'),
]

# This looks good, but we still need something that defines `$Location`, which will match a phrase like "`boston`" and assign to it a semantic representation like `{id: 4930956, name: 'Boston, MA, US'}`. If only we had such a thing.

# ### The `GeoNamesAnnotator`

# [Geocoding][] is the process of mapping a piece of text describing a location (such as a place name or an address) into a canonical, machine-readable representation (such as [geographic coordinates][] or a unique identifier in a geographic database). Because geocoding is a common need across many applications, many geocoding services are available.
#
# One such service is [GeoNames][]. GeoNames defines a large geographic database in which each location is identified by a unique integer. For example, Boston is identified by [4930956][]. GeoNames also provides a free, [RESTful][] API for geocoding requests. For example, a request to geocode "Palo Alto" looks like this:
#
# [http://api.geonames.org/searchJSON?q=Palo+Alto&username=demo](http://api.geonames.org/searchJSON?q=Palo+Alto&username=demo)
#
# [geocoding]: http://en.wikipedia.org/wiki/Geocoding
# [geographic coordinates]: http://en.wikipedia.org/wiki/Geographic_coordinate_system
# [GeoNames]: http://www.geonames.org/
# [RESTful]: http://en.wikipedia.org/wiki/Representational_state_transfer
# [4930956]: http://www.geonames.org/4930956/
#
# In SippyCup, the `GeoNamesAnnotator` (defined in [`geonames.py`](./geonames.py)) is implemented as a wrapper around the GeoNames API. The job of the `GeoNamesAnnotator` is to recognize `$Location`s and generate semantics for them. Take a few minutes to skim that code now.
#
# Note that the `GeoNamesAnnotator` uses a persistent cache which is pre-populated for phrases in the 100 annotated travel examples, avoiding the need for live calls to the GeoNames API. However, if you run on any other examples, you will be making live calls. By default, your requests will specify your username as `wcmac`. That's fine, but each (free) GeoNames account is limited to 2000 calls per hour. If too many people are making calls as `wcmac`, that quota could be exhausted quickly. If that happens, you'll want to [create your own account](http://www.geonames.org/login) on GeoNames.
#
# You will soon observe that the annotations are far from perfect. For example, "`florida`" is annotated as `{'id': 3442584, 'name': 'Florida, UY'}`, which denotes a city in Uruguay. The current implementation of `GeoNamesAnnotator` could be improved in many ways. For example, while the GeoNames API can return multiple results for ambiguous queries such as "`florida`", the `GeoNamesAnnotator` considers only the first result — and there is no guarantee that the first result is the best. It may also be possible to improve the quality of the results by playing with some of the API request parameters documented [here](http://www.geonames.org/export/geonames-search.html). Some of the [exercises](#travel-exercises) at the end of this unit ask you to explore these possibilities.
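
# If you want to experiment with the GeoNames API directly, outside the annotator's cache, a request can be issued as sketched below. This is only an illustration of the `searchJSON` endpoint shown above, not code from `geonames.py`; the `demo` account is heavily rate-limited, so substitute your own GeoNames username before making real calls.

# In[ ]:

import json
from urllib.parse import urlencode
from urllib.request import urlopen

def geonames_search(query, username='demo', max_rows=3):
    """Issues a geocoding request against the GeoNames searchJSON endpoint."""
    params = urlencode({'q': query, 'username': username, 'maxRows': max_rows})
    with urlopen('http://api.geonames.org/searchJSON?%s' % params) as response:
        return json.loads(response.read().decode('utf-8'))

# Uncomment to make a live call. Each result carries a geonameId and a name,
# from which an annotator can build semantics like
# {'id': 4930956, 'name': 'Boston, MA, US'}.
# for geo in geonames_search('Palo Alto').get('geonames', []):
#     print(geo['geonameId'], geo['name'], geo['countryCode'])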
# We're now ready to see our grammar parse an input:

# In[17]:

from collections import defaultdict
from geonames import GeoNamesAnnotator
from parsing import *

geonames_annotator = GeoNamesAnnotator()
rules = rules_travel + rules_travel_locations
travel_annotators = [geonames_annotator]
grammar = Grammar(rules=rules, annotators=travel_annotators)
parses = grammar.parse_input('from boston to austin')
print(parses[0].semantics)

# ### Travel modes

# Let's turn our attention from travel locations to travel arguments. One kind of travel argument is a travel mode. Our semantic representation defines eight travel modes (air, bike, boat, bus, car, taxi, train, and transit), and our training examples illustrate some common ways of expressing each travel mode. In fact, individual examples will suggest specific lexical rules. Consider this example:
#
#     Example(input='flights from portland or to seattle wa',
#             semantics={'domain': 'travel', 'mode': 'air',
#                        'origin': {'id': 5746545, 'name': 'Portland, OR, US'},
#                        'destination': {'id': 5809844, 'name': 'Seattle, WA, US'}}),
#
# The pairing of the token "`flights`" with the semantic fragment `{'mode': 'air'}` suggests the rule:
#
#     Rule('$AirMode', 'flights', {'mode': 'air'})
#
# Actually, for simplicity, we're going to add the '`air`' semantics one level "higher" in the grammar, like this:
#
#     Rule('$TravelMode', '$AirMode', {'mode': 'air'})
#     Rule('$AirMode', 'flights')
#
# The rules below illustrate this approach. These rules will allow our grammar to handle the most common ways of referring to each travel mode. All of the lexical rules below use phrases which are either highly obvious (such as '`taxi`' for `$TaxiMode`) or else are motivated by specific examples in our training dataset (not the test set!).

# In[18]:

rules_travel_modes = [
    Rule('$TravelArgument', '$TravelMode', sems_0),

    Rule('$TravelMode', '$AirMode', {'mode': 'air'}),
    Rule('$TravelMode', '$BikeMode', {'mode': 'bike'}),
    Rule('$TravelMode', '$BoatMode', {'mode': 'boat'}),
    Rule('$TravelMode', '$BusMode', {'mode': 'bus'}),
    Rule('$TravelMode', '$CarMode', {'mode': 'car'}),
    Rule('$TravelMode', '$TaxiMode', {'mode': 'taxi'}),
    Rule('$TravelMode', '$TrainMode', {'mode': 'train'}),
    Rule('$TravelMode', '$TransitMode', {'mode': 'transit'}),

    Rule('$AirMode', 'air fare'),
    Rule('$AirMode', 'air fares'),
    Rule('$AirMode', 'airbus'),
    Rule('$AirMode', 'airfare'),
    Rule('$AirMode', 'airfares'),
    Rule('$AirMode', 'airline'),
    Rule('$AirMode', 'airlines'),
    Rule('$AirMode', '?by air'),
    Rule('$AirMode', 'flight'),
    Rule('$AirMode', 'flights'),
    Rule('$AirMode', 'fly'),

    Rule('$BikeMode', '?by bike'),
    Rule('$BikeMode', 'bike riding'),

    Rule('$BoatMode', '?by boat'),
    Rule('$BoatMode', 'cruise'),
    Rule('$BoatMode', 'cruises'),
    Rule('$BoatMode', 'norwegian cruise lines'),

    Rule('$BusMode', '?by bus'),
    Rule('$BusMode', 'bus tours'),
    Rule('$BusMode', 'buses'),
    Rule('$BusMode', 'shutle'),
    Rule('$BusMode', 'shuttle'),

    Rule('$CarMode', '?by car'),
    Rule('$CarMode', 'drive'),
    Rule('$CarMode', 'driving'),
    Rule('$CarMode', 'gas'),

    Rule('$TaxiMode', 'cab'),
    Rule('$TaxiMode', 'car service'),
    Rule('$TaxiMode', 'taxi'),

    Rule('$TrainMode', '?by train'),
    Rule('$TrainMode', 'trains'),
    Rule('$TrainMode', 'amtrak'),

    Rule('$TransitMode', '?by public transportation'),
    Rule('$TransitMode', '?by ?public transit'),
]
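
# Note that several of the lexical rules above use the `?` optional marker, so a rule like `Rule('$AirMode', '?by air')` will match both "by air" and "air". As a quick check (an illustrative sketch, reusing the helper defined earlier), we can count how many of these rules the grammar will expand via `add_rule_containing_optional()`:

# In[ ]:

n_optional = len([r for r in rules_travel_modes if contains_optionals(r)])
print('%d of %d travel-mode rules contain optionals' % (n_optional, len(rules_travel_modes)))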
# Let's see the rules in action on a toy example.

# In[19]:

rules = rules_travel + rules_travel_locations + rules_travel_modes
grammar = Grammar(rules=rules, annotators=travel_annotators)
parses = grammar.parse_input('from boston to austin by train')
print(parses[0].semantics)

# Great, it works.
#
# We're far from done with the travel grammar, but we now have enough in place that we should be able to parse several of the examples in our training data. This means that we can start hill-climbing on oracle accuracy!
#
# To drive our grammar engineering process, we're going to use a SippyCup utility function called `sample_wins_and_losses()`, which will report our oracle accuracy on the training data, and show some examples we're parsing correctly and some examples we're not.
#
# (Note that `sample_wins_and_losses()` requires a `Domain`, a `Model`, and a `Metric`. A description of these classes is tangential to our presentation. If you're interested, read some SippyCup code! It's not very complicated.)

# In[20]:

from experiment import sample_wins_and_losses
from metrics import SemanticsOracleAccuracyMetric
from scoring import Model
from travel import TravelDomain

domain = TravelDomain()
model = Model(grammar=grammar)
metric = SemanticsOracleAccuracyMetric()

sample_wins_and_losses(domain=domain, model=model, metric=metric, seed=31)

# As you can see, `sample_wins_and_losses()` doesn't print many details — just a few evaluation metrics, and then examples of wins and losses with the current grammar. (A "win" is an example on which our primary metric has a positive value.) Our primary metric, semantics oracle accuracy, stands at 0.133 — not great, but greater than zero, so it's a start. The wins make sense: we can parse queries which consist solely of travel locations and travel modes, with no extraneous elements. The losses are more interesting, because they will provide the motivation for the next phase of work.

# ### Travel triggers

# Many of the examples we're currently failing to parse contain phrases (such as "tickets" or "transportation") which indicate a travel intent, but do not specify a travel mode. These phrases constitute the second type of travel argument, namely *travel triggers*. Inspection of the examples in our training dataset suggests a small number of lexical rules, shown here.

# In[21]:

rules_travel_triggers = [
    Rule('$TravelArgument', '$TravelTrigger', {}),
    Rule('$TravelTrigger', 'tickets'),
    Rule('$TravelTrigger', 'transportation'),
    Rule('$TravelTrigger', 'travel'),
    Rule('$TravelTrigger', 'travel packages'),
    Rule('$TravelTrigger', 'trip'),
]

# Let's run `sample_wins_and_losses()` again, to see how much we've gained, and what to work on next.

# In[22]:

rules = (rules_travel + rules_travel_locations + rules_travel_modes +
         rules_travel_triggers)
grammar = Grammar(rules=rules, annotators=travel_annotators)
model = Model(grammar=grammar)

sample_wins_and_losses(domain=domain, model=model, metric=metric, seed=1)

# So we've gained about four points (that is, 0.04) in oracle accuracy on the training dataset. We're making progress! Again, the losses suggest where to go next.
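
# When a loss looks mysterious, it can help to parse the input directly and inspect every candidate. For instance, here's a quick diagnostic (not part of the SippyCup workflow) on one of the ten example inputs from above:

# In[ ]:

# At this point 'cheap' is not covered by any rule, so this input should yield
# no complete parse. The optionals introduced below will fix that.
for parse in grammar.parse_input('cheap tickets to south carolina'):
    print(parse.semantics)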
# ### Request types

# The third and last kind of travel argument is a request type, which indicates a specific type of information sought, such as directions or distance. We'll adopt the same methodology, adding lexical rules motivated by specific training examples, tied together by higher-level rules which add semantics.

# In[23]:

rules_request_types = [
    Rule('$TravelArgument', '$RequestType', sems_0),
    Rule('$RequestType', '$DirectionsRequest', {'type': 'directions'}),
    Rule('$RequestType', '$DistanceRequest', {'type': 'distance'}),
    Rule('$RequestType', '$ScheduleRequest', {'type': 'schedule'}),
    Rule('$RequestType', '$CostRequest', {'type': 'cost'}),
    Rule('$DirectionsRequest', 'directions'),
    Rule('$DirectionsRequest', 'how do i get'),
    Rule('$DistanceRequest', 'distance'),
    Rule('$ScheduleRequest', 'schedule'),
    Rule('$CostRequest', 'cost'),
]

# Again, we'll check our progress using `sample_wins_and_losses()`.

# In[24]:

rules = (rules_travel + rules_travel_locations + rules_travel_modes +
         rules_travel_triggers + rules_request_types)
grammar = Grammar(rules=rules, annotators=travel_annotators)
model = Model(grammar=grammar)

sample_wins_and_losses(domain=domain, model=model, metric=metric, seed=1)

# Great, oracle accuracy is up to 0.20. But there's still one big piece we're missing.

# ### Optionals

# A key ingredient of the phrase-bag approach to grammar building is the ability to accept *optional* elements interspersed freely among the query elements. Optionals are phrases which can be either present or absent; typically, they contribute nothing to the semantics.
#
# The following rules illustrate one approach to allowing optionals. The first two rules allow any `$TravelQueryElement` to combine with an `$Optionals` either to the right or to the left, while ignoring its semantics. The third rule defines `$Optionals` as a sequence of one or more `$Optional` elements, while the following rules define several specific categories of optionals. As usual, most of the lexical rules are motivated by specific examples from the training dataset, with a few extras included just because they are super obvious.
#
# This is not necessarily the best design! One of the [exercises](#travel-exercises) will challenge you to do better.

# In[25]:

rules_optionals = [
    Rule('$TravelQueryElement', '$TravelQueryElement $Optionals', sems_0),
    Rule('$TravelQueryElement', '$Optionals $TravelQueryElement', sems_1),
    Rule('$Optionals', '$Optional ?$Optionals'),

    Rule('$Optional', '$Show'),
    Rule('$Optional', '$Modifier'),
    Rule('$Optional', '$Carrier'),
    Rule('$Optional', '$Stopword'),
    Rule('$Optional', '$Determiner'),

    Rule('$Show', 'book'),
    Rule('$Show', 'give ?me'),
    Rule('$Show', 'show ?me'),

    Rule('$Modifier', 'cheap'),
    Rule('$Modifier', 'cheapest'),
    Rule('$Modifier', 'discount'),
    Rule('$Modifier', 'honeymoon'),
    Rule('$Modifier', 'one way'),
    Rule('$Modifier', 'direct'),
    Rule('$Modifier', 'scenic'),
    Rule('$Modifier', 'transatlantic'),
    Rule('$Modifier', 'one day'),
    Rule('$Modifier', 'last minute'),

    Rule('$Carrier', 'delta'),
    Rule('$Carrier', 'jet blue'),
    Rule('$Carrier', 'spirit airlines'),
    Rule('$Carrier', 'amtrak'),

    Rule('$Stopword', 'all'),
    Rule('$Stopword', 'of'),
    Rule('$Stopword', 'what'),
    Rule('$Stopword', 'will'),
    Rule('$Stopword', 'it'),
    Rule('$Stopword', 'to'),

    Rule('$Determiner', 'a'),
    Rule('$Determiner', 'an'),
    Rule('$Determiner', 'the'),
]

# Again, we'll check our progress using `sample_wins_and_losses()`.

# In[26]:

rules = (rules_travel + rules_travel_locations + rules_travel_modes +
         rules_travel_triggers + rules_request_types + rules_optionals)
grammar = Grammar(rules=rules, annotators=travel_annotators)
model = Model(grammar=grammar)

sample_wins_and_losses(domain=domain, model=model, metric=metric, seed=1)

# Adding support for optionals has doubled oracle accuracy on the training dataset, from 0.200 to 0.400. This is a big gain!
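
# As a spot check, the example that failed the diagnostic above should now parse, since 'cheap' is covered by `$Modifier`. (The exact GeoNames values in the output depend on the annotator's cache.)

# In[ ]:

parses = grammar.parse_input('cheap tickets to south carolina')
print(parses[0].semantics)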
# However, there are still many losses, and many of them share a property: they are *negative examples*.

# ### Negative examples

# A semantic parsing model for a given domain should be able to predict that a given input does *not* belong to the domain. We call such inputs *negative examples*. For the travel domain, negative examples include:
#
#     discount tickets to new york city ballet
#     george washington borrows 500 000 from pennsylvania farmer to finance war
#     ride this train to roseburg oregon now ther's a town for ya
#
# Much of the academic literature on semantic parsing describes systems which are required always to produce an in-domain semantic representation, no matter what the input. If the input is *not* in-domain, the result is usually garbage. In real-world applications, it's much better to have models which can learn when to produce no positive output.
#
# The easiest way to achieve this is to introduce some rules which allow any input to be parsed with "negative" semantics, and then learn weights for those rule features in the scoring model. In the travel domain, the "negative" semantic representation is the special value `{'domain': 'other'}`.

# In[27]:

rules_not_travel = [
    Rule('$ROOT', '$NotTravelQuery', sems_0),
    Rule('$NotTravelQuery', '$Text', {'domain': 'other'}),
    Rule('$Text', '$Token ?$Text'),
]

# Note that the last rule depends on the `$Token` category, which can be applied to any token by the `TokenAnnotator`. So let's add the `TokenAnnotator` to our list of annotators.

# In[28]:

from annotator import TokenAnnotator

travel_annotators = [geonames_annotator, TokenAnnotator()]

# As usual, we'll check our progress using `sample_wins_and_losses()`.

# In[29]:

rules = (rules_travel + rules_travel_locations + rules_travel_modes +
         rules_travel_triggers + rules_request_types + rules_optionals +
         rules_not_travel)
grammar = Grammar(rules=rules, annotators=travel_annotators)
model = Model(grammar=grammar)

sample_wins_and_losses(domain=domain, model=model, metric=metric, seed=1)

# We've achieved another big gain in oracle accuracy, from 0.400 to 0.573, just by ensuring that we offer a "negative" prediction for every input. (Note that the mean number of parses has increased by exactly 1, from 0.773 to 1.773.) However, for the first time, a big gap has opened between accuracy, at 0.173, and oracle accuracy, at 0.573. The problem is that we don't yet have a scoring model for the travel domain, so the ranking of parses is arbitrary. In order to close the gap, we need to create a scoring model. One of the [exercises](#travel-exercises) will ask you to pursue this.
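
# To confirm that the grammar now offers a "negative" prediction for every input, we can parse one of the negative examples listed above and look for the `{'domain': 'other'}` semantics among its parses (a quick check, not part of the evaluation):

# In[ ]:

parses = grammar.parse_input('george washington borrows 500 000 from pennsylvania farmer to finance war')
print(any(p.semantics == {'domain': 'other'} for p in parses))
# Should print True: the $Text rule guarantees a negative parse for any input.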
# ## Exercises

# Several of these exercises ask you to measure the impact of your change on key evaluation metrics. Part of your job is to decide which evaluation metrics are most relevant for the change you're making. It's probably best to evaluate only on training data, in order to keep the test data unseen during development. (But the test data is hardly a state secret, so whatever.)
#
# ### Straightforward
#
# 1. Select 20 of the 6,588 queries in [aol-travel-queries.txt](./aol-travel-queries.txt) and manually annotate them with target semantics. Select your queries using uniform random sampling — this will minimize overlap between different people completing this exercise, and therefore maximize the value of the aggregate labor. (In Python, you can sample from a list using `random.sample()`.) You will likely find some cases where it's not clear what the right semantics are. Do your best. The point of the exercise is to develop an awareness of the challenges of annotation, and to recognize that there's no such thing as perfectly annotated data.
#
# 1. Many of the remaining errors on training examples occur because the origin isn't marked by "from". Examples include "`transatlantic cruise southampton to tampa`", "`fly boston to myrtle beach spirit airlines`", and "`distance usa to peru`". Extend the grammar to handle examples like these. Measure the impact on key evaluation metrics.
#
# 1. Does your solution to the previous exercise handle examples where some other query element intervenes between origin and destination? Examples include "`university of washington transportation to seatac`", "`birmingham al distance from indianapolish in`", and "`nyc flights to buffalo ny`". If not, extend the grammar to handle examples like these. Measure the impact on key evaluation metrics.
#
# 1. The current structure of the grammar permits parses containing any number of `$FromLocation`s and `$ToLocation`s, including zero. Find a way to require that (a) there is at least one `$FromLocation` or `$ToLocation`, and (b) there are not multiple `$FromLocation`s or `$ToLocation`s.
#
# 1. The travel grammar is lacking a scoring model, and it shows a big gap between accuracy and oracle accuracy. Examine and diagnose some examples where accuracy is 0 even though oracle accuracy is 1. Propose and implement a scoring model, and measure its efficacy in closing that gap.
#
# ### Challenging
#
# 1. Extend `Grammar` to allow rules which mix terminals and non-terminals on the RHS, such as `Rule('$RouteQuery', '$TravelMode from $Location to $Location')`.
#
# 1. Try to improve the precision of the `GeoNamesAnnotator` by fiddling with the GeoNames API request parameters documented [here](http://www.geonames.org/export/geonames-search.html). For example, the `featureClass`, `countryBias`, or `orderby` parameters seem like promising targets.
#
# 1. Try to improve the coverage of the `GeoNamesAnnotator` by enabling it to return multiple annotations for ambiguous location names. Investigate the impact of varying the maximum number of annotations on various performance metrics, including accuracy, oracle accuracy, and number of parses. How would you characterize the tradeoff you're making?
#
# 1. Building on the previous exercise, implement a feature which captures information about the result rank of annotations generated by the `GeoNamesAnnotator`, and see if you can use this feature to narrow the gap between accuracy and oracle accuracy.
#
# 1. You have probably noticed that one of our standard evaluation metrics is something called "spurious ambiguity". Dig into the SippyCup codebase to figure out what spurious ambiguity is. Here's a hint: it's something bad, so we want to push that metric toward zero. Find a training example where it's not zero, and figure out why the example exhibits spurious ambiguity. Are there changes we can make to the grammar to reduce spurious ambiguity? Also, why is spurious ambiguity undesirable?
#
# 1. In its current form, the travel grammar parses lots of queries it shouldn't. (By "parses", we mean "generates a *positive* parse".) This problem is known as *overtriggering*. The overtriggering problem is hard to observe on our tiny dataset of 100 examples, where most of the examples are positive examples. Investigate overtriggering by downloading the AOL query dataset, identifying the 1,000 most frequent queries, and running them through the grammar.
# How many cases of overtriggering do you find? Can you suggest some simple changes to minimize overtriggering? (Warning: the AOL dataset contains lots of queries which may be offensive. Skip this exercise if you're not down with that.)
#
# 1. Consider the queries "`flights los angeles hawaii`" vs. "`flights los angeles california`". Despite the superficial resemblance, the second query seems to mean flights *to* Los Angeles, not flights *from* Los Angeles *to* California. But the grammar can successfully interpret both queries only if it permits both interpretations for each query. This creates a ranking problem: which interpretation should be scored higher? Can you add to the scoring model a feature which solves the problem?
#
# 1. Develop and execute a strategy to annotate all 6,588 queries in [aol-travel-queries.txt](./aol-travel-queries.txt) with target semantics using crowdsourcing. Warning: this is an ambitious exercise.

# Copyright (C) 2015 Bill MacCartney