Lexique 3 is a French word database from the Université de Savoie. It includes orthographic and phonemic word forms, lemmas, parts of speech, and frequency counts drawn from both film subtitles and books. It's a great database and a ton of fun. In this notebook, we use a copy of Lexique that has been loaded into an SQLite 3 database. But since we don't want to get bogged down in the messy details of the Python code, we build the database using a Makefile, and we keep all of our Python utility functions in an external file named lexique.py. If you want to see all those details, or customize this analysis, check out this notebook on GitHub.
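For orientation, the sql() helper used throughout this notebook is essentially a thin wrapper around sqlite3 and pandas. Here's a minimal sketch of what it might look like, assuming a database file named lexique.sqlite3; this is not the actual contents of lexique.py, which presumably also defines the colors palette, save_tsv(), and the other helpers used later.

# Hypothetical sketch of the sql() helper from lexique.py (not the real code).
import sqlite3
import pandas as pd

_conn = sqlite3.connect('lexique.sqlite3')  # assumed database filename

def sql(query, index_col=None):
    """Run a query against the Lexique database and return a DataFrame."""
    return pd.read_sql_query(query, _conn, index_col=index_col)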
First, let's get everything loaded.
%matplotlib inline
%run lexique.py
Now let's take a look at what data we have available. First, there's the raw data from Lexique, which includes inflected forms in the ortho column:
sql("SELECT * FROM lexique WHERE lemme = 'avoir' LIMIT 5")
| | ortho | phon | lemme | cgram | genre | nombre | freqlemfilms2 | freqlemlivres | freqfilms2 | freqlivres |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | a | avoir | AUX | | | 18559.22 | 12800.81 | 6350.91 | 2926.69 |
| 1 | a | a | avoir | VER | | | 13572.40 | 6426.49 | 5498.34 | 1669.39 |
| 2 | ai | E | avoir | AUX | | | 18559.22 | 12800.81 | 4902.10 | 2119.12 |
| 3 | ai | E | avoir | VER | | | 13572.40 | 6426.49 | 2475.78 | 619.05 |
| 4 | aie | E | avoir | AUX | | | 18559.22 | 12800.81 | 31.75 | 21.69 |
We have another table, lemme, which sums over all the orthographies associated with a given lemma. The lemme column, however, is still not unique: if a given word can be either a noun or a verb, it will appear twice, and so on.
sql("SELECT * FROM lemme ORDER BY freqlivres DESC LIMIT 5")
| | lemme | cgram | genre | nombre | freqfilms2 | freqlivres |
|---|---|---|---|---|---|---|
| 0 | de | PRE | | | 25220.86 | 38928.92 |
| 1 | la | ART:def | f | s | 14946.48 | 23633.92 |
| 2 | et | CON | | | 12909.08 | 20879.73 |
| 3 | à | PRE | | | 12190.40 | 19209.05 |
| 4 | le | ART:def | m | s | 13652.76 | 18310.95 |
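The Makefile that builds these derived tables isn't shown in this notebook, but as a rough sketch (an assumption on my part, not the actual build step), a table like lemme could be produced from the raw lexique data with an aggregate query along these lines:

# Hypothetical sketch, not the actual Makefile step: deriving a lemme-style
# table from the raw lexique data by summing over the inflected forms.
import sqlite3
conn = sqlite3.connect('lexique.sqlite3')  # assumed database filename
conn.executescript("""
CREATE TABLE IF NOT EXISTS lemme AS
  SELECT lemme, cgram, genre, nombre,
         SUM(freqfilms2) AS freqfilms2,
         SUM(freqlivres) AS freqlivres
    FROM lexique
   GROUP BY lemme, cgram, genre, nombre;
""")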
And finally, we collapse all the cgram, genre and nombre values associated with a given value of lemme, to give us unique lemmas with frequency data:
sql("SELECT * FROM lemme_simple ORDER BY freqlivres DESC LIMIT 5")
| | lemme | freqfilms2 | freqlivres |
|---|---|---|---|
| 0 | de | 25220.96 | 38928.92 |
| 1 | la | 16028.08 | 24877.30 |
| 2 | être | 40411.41 | 21709.87 |
| 3 | et | 12909.08 | 20879.73 |
| 4 | le | 16953.50 | 20735.14 |
Note that we have two sets of frequency data: freqfilms2, which is based on a corpus of film subtitles, and freqlivres, which is based on a corpus of books. There are some important differences. For example, French films use the passé composé far more often than books, which raises the frequencies of the auxiliary verbs être and avoir:
sql("SELECT * FROM lemme_simple ORDER BY freqfilms2 DESC LIMIT 5")
| | lemme | freqfilms2 | freqlivres |
|---|---|---|---|
| 0 | être | 40411.41 | 21709.87 |
| 1 | avoir | 32134.77 | 19230.64 |
| 2 | je | 25988.48 | 10862.77 |
| 3 | de | 25220.96 | 38928.92 |
| 4 | ne | 22297.51 | 13852.97 |
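To put a rough number on the passé composé effect, we can compare the two frequency columns directly for the auxiliaries. From the figures above, être shows up about 1.9 times as often in subtitles as in books, and avoir about 1.7 times:

# Compare film vs. book frequency for the two auxiliary verbs.
sql("""
SELECT lemme, freqfilms2, freqlivres,
       ROUND(freqfilms2 / freqlivres, 2) AS ratio
  FROM lemme_simple
 WHERE lemme IN ('être', 'avoir')
""")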
Using the film dataset, let's take a look at the parts of speech:
cgram_freq = sql("""
SELECT cgram, SUM(freqfilms2) AS freqfilms2, SUM(freqlivres) AS freqlivres
FROM lemme GROUP BY cgram
""", index_col='cgram')
cgram_freq
| cgram | freqfilms2 | freqlivres |
|---|---|---|
| | 1.20 | 0.00 |
| ADJ | 42939.77 | 56548.13 |
| ADJ:dem | 6363.94 | 6802.23 |
| ADJ:ind | 2999.34 | 3737.10 |
| ADJ:int | 1273.62 | 582.91 |
| ADJ:num | 2525.93 | 4680.43 |
| ADJ:pos | 19106.03 | 20005.62 |
| ADV | 97693.38 | 69747.44 |
| ART:def | 54495.63 | 83470.15 |
| ART:ind | 26051.19 | 33763.58 |
| AUX | 26633.45 | 19302.64 |
| CON | 29730.47 | 38189.17 |
| LIA | 0.00 | 412.57 |
| NOM | 144894.66 | 186537.81 |
| ONO | 6291.15 | 1501.06 |
| PRE | 77439.28 | 118274.42 |
| PRO:dem | 15700.13 | 7549.99 |
| PRO:ind | 7538.14 | 5716.50 |
| PRO:int | 1612.64 | 736.37 |
| PRO:per | 133651.20 | 90995.84 |
| PRO:pos | 334.17 | 322.16 |
| PRO:rel | 11547.03 | 14483.19 |
| VER | 198390.74 | 150350.57 |
cgram_freq_summary = cgram_freq.groupby(lambda x: x[0:3]).sum()
plt.figure(figsize=(7,7))
plt.subplot(aspect=True)
plt.pie(cgram_freq_summary.freqfilms2, labels=cgram_freq_summary.index.values, colors=colors)
plt.title("Parts of speech")
[Figure: pie chart "Parts of speech", film-subtitle frequency by part of speech.]
How many words do we need to know to understand 98% of the individual words which appear on a given page?
coverage = sql("""
SELECT lemme, freqfilms2 FROM lemme_simple
ORDER BY freqfilms2 DESC""")
coverage.index += 1
coverage['film_coverage'] = \
100*coverage['freqfilms2'].cumsum() / coverage['freqfilms2'].sum()
del coverage['lemme']
del coverage['freqfilms2']
coverage[0:5]
| | film_coverage |
|---|---|
| 1 | 4.454456 |
| 2 | 7.996598 |
| 3 | 10.861248 |
| 4 | 13.641296 |
| 5 | 16.099099 |
book_coverage = sql("""
SELECT lemme, freqlivres FROM lemme_simple
ORDER BY freqlivres DESC""")
book_coverage.index += 1
coverage['book_coverage'] = \
100*book_coverage['freqlivres'].cumsum() / book_coverage['freqlivres'].sum()
coverage[0:5]
| | film_coverage | book_coverage |
|---|---|---|
| 1 | 4.454456 | 4.260534 |
| 2 | 7.996598 | 6.983203 |
| 3 | 10.861248 | 9.359217 |
| 4 | 13.641296 | 11.644377 |
| 5 | 16.099099 | 13.913712 |
plt.plot(coverage.index.values, coverage.film_coverage, label="Film Coverage")
plt.plot(coverage.index.values, coverage.book_coverage, label="Book Coverage")
plt.legend(loc = 'lower right')
plt.title('Text Coverage')
plt.xlabel('Vocabulary size')
plt.ylabel('% coverage')
plt.xlim((0,10000))
[Figure: "Text Coverage", % coverage vs. vocabulary size for the film and book corpora.]
Or, in table form, here's how many words you need to know to get a given percentage of coverage:
coverage.loc[[250, 500, 1000, 2000, 4000, 8000, 16000], :]
| | film_coverage | book_coverage |
|---|---|---|
| 250 | 76.164757 | 68.074024 |
| 500 | 82.792428 | 74.990087 |
| 1000 | 88.386602 | 81.450089 |
| 2000 | 93.004247 | 87.535363 |
| 4000 | 96.410076 | 92.756703 |
| 8000 | 98.554939 | 96.606475 |
| 16000 | 99.666146 | 99.017399 |
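We can also run the lookup in the other direction and ask for the smallest vocabulary that reaches a given coverage target, for example 95% of the film corpus (which, per the table above, falls somewhere between 2,000 and 4,000 lemmas):

# Smallest vocabulary size whose cumulative film coverage reaches 95%.
coverage[coverage['film_coverage'] >= 95].index[0]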
We want to get a feel for how many nouns, verbs, etc., are required in a well-balanced vocabulary. This requires grouping words by part of speech, sorting them by frequency, and graphing the cumulative text coverage for a given number of words. This takes a fair bit of work to set up.
First, we need to do quite a bit of data munging:
# Merge related cgrams, sum frequency over (cgram, lemme) groups,
# and sort by (cgram,freqfilms2).
cgram_lemme_freq = sql("""
SELECT cgram, SUM(freqfilms2) AS freqfilms2
FROM (SELECT CASE WHEN cgram='AUX' THEN 'VER'
ELSE SUBSTR(cgram, 1, 3)
END AS cgram,
lemme, freqfilms2
FROM lemme)
GROUP BY cgram, lemme
ORDER BY cgram, freqfilms2 DESC
""")
# Convert freqfilms2 to a cumulative percentage over each cgram group.
cgram_col = cgram_lemme_freq['cgram']
normalized_freq = cgram_lemme_freq.groupby(cgram_col).transform(lambda x: x/x.sum())
cumulative_freq = normalized_freq.groupby(cgram_col).cumsum()
cgram_lemme_freq['freqfilms2'] = 100.0*cumulative_freq
# Sequentially number the rows within each cgram group so we can see the
# vocabulary size corresponding to each cumulative percentage.
cgram_lemme_freq['rang'] = cgram_lemme_freq.groupby(cgram_col).cumcount()+1
# Index by cgram group, and vocabulary size within the group. Uncomment the
# last line to view the data.
cgram_lemme_freq.set_index(['cgram', 'rang'], inplace = True)
#cgram_lemme_freq
Now that we have the data, we can plot it using two different graphs: One for the "large" parts of speech (nouns, etc.), and one for the parts of speech which are either closed classes, or at least very small.
def plot_cgrams(labels):
for key in labels.keys():
cgram_group = cgram_lemme_freq.loc[key]
plt.plot(cgram_group.index.values, cgram_group.freqfilms2, label=labels[key])
plt.legend(loc = 'lower right')
plt.title('Text Coverage by Part of Speech (films)')
plt.xlabel('Words known by part of speech')
plt.ylabel('% coverage')
plt.ylim((0,100))
plt.axhline(y=90, color='k', ls='dashed')
plt.figure(figsize=(12,4))
plt.subplot(121)
small_cgram_labels = {
'PRO': 'Pronouns',
'ADV': 'Adverbs',
'PRE': 'Prepositions',
    'CON': 'Conjunctions',
'ART': 'Articles'
}
plot_cgrams(small_cgram_labels)
plt.xlim((0,150))
plt.subplot(122)
large_cgram_labels = {
'NOM': 'Nouns',
'VER': 'Verbs',
'ADJ': 'Adjectives'
}
plot_cgrams(large_cgram_labels)
plt.xlim((0,10000))
[Figure: "Text Coverage by Part of Speech (films)", two panels: closed classes (pronouns, adverbs, prepositions, conjunctions, articles) up to 150 words, and nouns, verbs, and adjectives up to 10,000 words, with a dashed line at 90% coverage.]
It would be nice to have this as a table, too, so we can figure out—for example—how many nouns we need to get 75% coverage. Once again, this will require a fair bit of data munging.
# Only include the parts of speech used in our graph.
cgram_labels = small_cgram_labels.copy()
cgram_labels.update(large_cgram_labels)
interesting = cgram_lemme_freq.loc[cgram_labels.keys()]
# We'll use this to build a list of columns in our final table.
columns = []
# Calculate minimum number words for a given percentage of coverage.
for threshold in [75,90,95,98,99,99.5]:
# Discard all the rows below our threshold.
over_threshold = interesting[interesting['freqfilms2'] >= threshold]
# Take the first row that remains.
over_threshold.reset_index(inplace=True)
over_threshold.set_index('cgram', inplace=True)
first_over = over_threshold.groupby(level=0).first()
    # Keep only a single column named after our threshold.
del first_over['freqfilms2']
first_over.rename(columns={'rang': '%r%%' % threshold}, inplace=True)
columns.append(first_over)
# Join all the columns together.
table = columns[0].join(columns[1:])
# Clean up the table a bit and add a total
table.index.names = ['Part of speech']
table.index = table.index.map(lambda i: cgram_labels[i])
table.loc['TOTAL'] = table.sum()
table
| Part of speech | 75% | 90% | 95% | 98% | 99% | 99.5% |
|---|---|---|---|---|---|---|
| Adjectives | 136 | 620 | 1367 | 2686 | 3742 | 4736 |
| Adverbs | 17 | 42 | 69 | 118 | 182 | 277 |
| Articles | 6 | 8 | 9 | 9 | 10 | 10 |
| Conjunctions | 5 | 9 | 11 | 14 | 15 | 17 |
| Nouns | 1137 | 3115 | 5215 | 8454 | 10956 | 13347 |
| Prepositions | 6 | 9 | 14 | 21 | 26 | 30 |
| Pronouns | 16 | 24 | 29 | 40 | 50 | 65 |
| Verbs | 63 | 290 | 583 | 1108 | 1601 | 2149 |
| TOTAL | 1386 | 4117 | 7297 | 12450 | 16582 | 20631 |
We divide verbs into the three standard groups: -er, -ir and -re. We split aller into its own group, because it's the only irregular -er verb. For now, we treat the auxiliary uses of être and avoir in the passé composé as ordinary verbs.
verbs = sql("SELECT * FROM verbe ORDER BY freqfilms2 DESC")
verbs[0:15]
| | lemme | groupe | prototype | conjugaison | aux | freqfilms2 | freqlivres |
|---|---|---|---|---|---|---|---|
| 0 | être | re | être | être | avoir | 40310.72 | 21587.31 |
| 1 | avoir | ir | avoir | avoir | avoir | 32131.64 | 19227.33 |
| 2 | aller | aller | aller | aller | être | 9992.78 | 2854.92 |
| 3 | faire | re | .*faire | faire | avoir | 8813.52 | 5328.99 |
| 4 | dire | re | dire\|redire | dire | avoir | 5946.18 | 4832.51 |
| 5 | pouvoir | ir | pouvoir | pouvoir | avoir | 5524.46 | 2659.75 |
| 6 | vouloir | ir | .*vouloir | vouloir | avoir | 5249.31 | 1640.16 |
| 7 | savoir | ir | .*savoir | savoir | avoir | 4516.72 | 2003.59 |
| 8 | voir | ir | .*voir\|.*oir | voir | avoir | 4119.47 | 2401.76 |
| 9 | devoir | ir | .*devoir | devoir | avoir | 3232.59 | 1318.20 |
| 10 | venir | ir | .*venir | venir | être | 2763.82 | 1514.53 |
| 11 | suivre | re | .*suivre | suivre | avoir | 2090.55 | 949.13 |
| 12 | parler | er | .*er | -er | avoir | 1970.53 | 1086.02 |
| 13 | prendre | re | .*prendre | prendre | avoir | 1913.84 | 1466.44 |
| 14 | croire | re | .*croire | croire | avoir | 1712.02 | 947.25 |
As the charts below show, all three groups have roughly equal text coverage, but there are actually far more -er verbs than all the others combined. This suggests that a small number of -ir and -re verbs are disproportionately common.
plt.figure(figsize=(8,8))
plt.subplot(121, aspect=True)
group_freq = verbs.groupby(verbs['groupe']).sum()
plt.pie(group_freq.freqfilms2, labels=group_freq.index.values, colors=colors)
plt.title("Verb group frequency")
plt.subplot(122, aspect=True)
group_size = verbs.groupby(verbs['groupe']).count()
plt.pie(group_size.lemme, labels=group_size.index.values, colors=colors)
plt.title("Verb group size (words)")
[Figure: two pie charts, "Verb group frequency" and "Verb group size (words)".]
# Extract the columns we need, and get rid of 'aller'.
group_freq = verbs[['groupe', 'lemme', 'freqfilms2']].copy()
group_freq = group_freq[group_freq['groupe'] != 'aller']
# Calculate coverage percentages for frequency ranks in each group.
groupe_col = group_freq['groupe']
normalized_freq = group_freq.groupby(groupe_col).transform(lambda x: x/x.sum())
cumulative_freq = 100.0*normalized_freq.groupby(groupe_col).cumsum()
group_freq['freqfilms2'] = cumulative_freq
group_freq['rang'] = group_freq.groupby(groupe_col).cumcount()+1
group_freq.set_index(['groupe', 'rang'], inplace=True)
group_freq[0:10]
| groupe | rang | lemme | freqfilms2 |
|---|---|---|---|
| re | 1 | être | 53.707239 |
| ir | 1 | avoir | 44.826275 |
| re | 2 | faire | 65.449768 |
| re | 3 | dire | 73.372051 |
| ir | 2 | pouvoir | 52.533350 |
| ir | 3 | vouloir | 59.856569 |
| ir | 4 | savoir | 66.157764 |
| ir | 5 | voir | 71.904763 |
| ir | 6 | devoir | 76.414491 |
| ir | 7 | venir | 80.270247 |
# Sigh. My database is in French, and my libraries are in English.
# There's no way to avoid coding in franglais, I fear.
for group in ['er', 'ir', 're']:
g = group_freq.loc[group]
plt.plot(g.index.values, g.freqfilms2, label=group)
plt.title('Verb Coverage by Group')
plt.legend(loc = 'lower right')
plt.xlabel('Verbs known in group')
plt.ylabel('% coverage')
plt.xlim((1,100))
plt.ylim((0,100))
[Figure: "Verb Coverage by Group", % coverage vs. verbs known for the -er, -ir, and -re groups.]
If we take the first 40 -ir and -re verbs, we get better than 96% coverage. Even the first 20 in each group will give us better than 92% coverage. Here's a list for people who want to master all the high-frequency irregular verbs.
def html_for_group(groupe):
lst = ', '.join(group_freq.loc[groupe].loc[1:40]['lemme'].tolist())
return '<p><i>-%s</i> verbs: %s.</p>' % (groupe, lst)
HTML("<p><i>-er</i> verbs: aller.</p>" + html_for_group('ir') + html_for_group('re'))
-er verbs: aller.
-ir verbs: avoir, pouvoir, vouloir, savoir, voir, devoir, venir, falloir, partir, mourir, sortir, revenir, finir, sentir, tenir, devenir, ouvrir, dormir, asseoir, souvenir, servir, valoir, agir, recevoir, mentir, offrir, choisir, revoir, courir, réussir, prévenir, découvrir, maintenir, réfléchir, souffrir, couvrir, obtenir, appartenir, ressentir, prévoir.
-re verbs: être, faire, dire, suivre, prendre, croire, attendre, mettre, connaître, comprendre, entendre, plaire, perdre, vivre, rendre, foutre, apprendre, boire, écrire, lire, répondre, descendre, suffire, vendre, battre, promettre, permettre, conduire, disparaître, taire, remettre, reconnaître, rire, reprendre, détruire, paraître, craindre, naître, rejoindre, défendre.
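As a quick sanity check on those coverage claims, we can read the cumulative percentages at ranks 20 and 40 straight out of group_freq:

# Cumulative film coverage (%) after the first 20 and first 40 verbs per group.
{g: [round(group_freq.loc[g].loc[n, 'freqfilms2'], 1) for n in (20, 40)]
 for g in ('ir', 're')}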
Of course, not all -er verbs are completely regular, and there are patterns among the other verb groups. Fortunately, there's a nice XML file of French verb conjugation rules that we can use to examine these hidden details. Combining that with quite a bit of custom code, we can assign a "conjugator" to each verb prototype, and verify that the generated forms match the XML data. This gives us a much shorter list of key forms.
verbs2 = sql("""
SELECT conjugaison.nom AS conjugaison, lemme, freqfilms2, resume
FROM verbe
LEFT OUTER JOIN conjugaison
ON verbe.conjugaison = conjugaison.nom
ORDER BY freqfilms2 DESC
""")
verbs2['freqfilms2'] = 100 * verbs2['freqfilms2'] / verbs2['freqfilms2'].sum()
def summarize_conjugator(grp):
return pd.Series(dict(exemples=', '.join(grp.lemme[0:5]),
compte=grp.lemme.count(),
freqfilms2=grp.freqfilms2.sum(),
resume=grp.resume.iloc[0]))
conjugators = verbs2.groupby('conjugaison').apply(summarize_conjugator).sort_values('freqfilms2', ascending=False)
conjugators.reset_index(inplace=True)
conjugators.index.names = ['rang']
conjugators.reset_index(inplace=True)
conjugators['rang'] = conjugators['rang'] + 1
conjugators['freqfilms2'] = conjugators['freqfilms2'].cumsum()
conjugators.set_index('rang', inplace=True)
save_tsv('conjugators.tsv', conjugators)
conjugators
| rang | conjugaison | compte | exemples | freqfilms2 | resume |
|---|---|---|---|---|---|
| 1 | -er | 5187 | parler, aimer, passer, penser, trouver | 27.057864 | (p.p.) parlé, je parle, tu parles, il parle, n... |
| 2 | être | 1 | être | 44.971814 | (irregular) |
| 3 | avoir | 1 | avoir | 59.251008 | (irregular) |
| 4 | aller | 1 | aller | 63.691766 | (irregular) |
| 5 | faire | 5 | faire, refaire, satisfaire, défaire, contrefaire | 67.651207 | (irregular) |
| 6 | dire | 2 | dire, redire | 70.297491 | Like interdire, except: vous dites |
| 7 | pouvoir | 1 | pouvoir | 72.752543 | (irregular) |
| 8 | venir | 26 | venir, revenir, tenir, devenir, souvenir | 75.130407 | Like -ir, except: (p.p.) venu, tu viens, nous ... |
| 9 | vouloir | 2 | vouloir, revouloir | 77.463383 | (irregular) |
| 10 | savoir | 3 | savoir, non-savoir, assavoir | 79.470603 | (irregular) |
| 11 | -re | 50 | attendre, entendre, perdre, rendre, répondre | 81.428246 | (p.p.) attendu, tu attends, nous attendons, il... |
| 12 | voir | 5 | voir, revoir, entrevoir, ravoir, comparoir | 83.332903 | Like -ir, except: (p.p.) vu, nous voyons, ils ... |
| 13 | partir | 18 | partir, sortir, sentir, dormir, servir | 84.950418 | Like -ir, except: tu pars |
| 14 | prendre | 12 | prendre, comprendre, apprendre, reprendre, sur... | 86.441360 | Like -re, except: (p.p.) pris, nous prenons, i... |
| 15 | devoir | 2 | devoir, redevoir | 87.877921 | (irregular) |
| 16 | -ir (-iss-) | 304 | finir, agir, choisir, réussir, réfléchir | 89.052373 | (p.p.) fini, tu finis, nous finissons, ils fin... |
| 17 | suivre | 2 | suivre, poursuivre | 90.010207 | Like -re, except: (p.p.) suivi, tu suis |
| 18 | espérer | 199 | espérer, inquiéter, préférer, protéger, répéter | 90.856970 | Like -er, except: tu espères, ils espèrent |
| 19 | acheter | 64 | acheter, emmener, amener, ramener, enlever | 91.628962 | Like -er, except: tu achètes, ils achètent, il... |
| 20 | croire | 1 | croire | 92.389778 | Like -re, except: (p.p.) cru, nous croyons, (p... |
| 21 | appeler | 112 | appeler, rappeler, jeter, rejeter, projeter | 93.148066 | Like -er, except: tu appelles, ils appellent, ... |
| 22 | mettre | 15 | mettre, promettre, permettre, remettre, admettre | 93.891337 | Like battre, except: (p.p.) mis, (p.s.) il mit |
| 23 | falloir | 1 | falloir | 94.626262 | (irregular) |
| 24 | connaître | 10 | connaître, disparaître, reconnaître, paraître,... | 95.267398 | Like -re, except: (p.p.) connu, je connais, tu... |
| 25 | essayer | 30 | essayer, payer, effrayer, balayer, rayer | 95.784982 | Like ennuyer, except: tu essaies/tu essayes, i... |
| 26 | ouvrir | 9 | ouvrir, offrir, découvrir, souffrir, couvrir | 96.201333 | Like -ir, except: (p.p.) ouvert, j'ouvre, tu o... |
| 27 | mourir | 1 | mourir | 96.608587 | Like -ir, except: (p.p.) mort, tu meurs, ils m... |
| 28 | plaire | 3 | plaire, déplaire, complaire | 96.884291 | Like taire, except: il plaît/il plait |
| 29 | vivre | 3 | vivre, survivre, revivre | 97.143889 | Like suivre, except: (p.p.) vécu, (p.s.) il vécut |
| 30 | conduire | 24 | conduire, détruire, construire, produire, réduire | 97.396542 | Like interdire, except: (p.s.) il conduisit |
| ... | ... | ... | ... | ... | ... |
| 34 | écrire | 12 | écrire, décrire, inscrire, prescrire, réécrire | 98.157882 | Like -re, except: (p.p.) écrit, nous écrivons,... |
| 35 | boire | 2 | boire, reboire | 98.308728 | Like -re, except: (p.p.) bu, nous buvons, ils ... |
| 36 | asseoir | 2 | asseoir, rasseoir | 98.453020 | Like -ir, except: (p.p.) assis, tu assieds/tu ... |
| 37 | lire | 4 | lire, élire, relire, réélire | 98.586161 | Like interdire, except: (p.p.) lu, (p.s.) il lut |
| 38 | battre | 9 | battre, abattre, combattre, débattre, rabattre | 98.718538 | Like -re, except: tu bats |
| 39 | recevoir | 9 | recevoir, apercevoir, décevoir, concevoir, per... | 98.845991 | Like -ir, except: (p.p.) reçu, tu reçois, nous... |
| 40 | ennuyer | 52 | ennuyer, nettoyer, appuyer, noyer, employer | 98.964911 | Like -er, except: tu ennuies, ils ennuient, il... |
| 41 | valoir | 2 | valoir, équivaloir | 99.070811 | (irregular) |
| 42 | suffire | 1 | suffire | 99.173942 | Like interdire, except: (p.p.) suffi |
| 43 | rire | 2 | rire, sourire | 99.260244 | Like -re, except: (p.p.) ri, nous rions, ils r... |
| 44 | courir | 9 | courir, parcourir, secourir, accourir, recourir | 99.339511 | Like -ir, except: (p.p.) couru, tu cours, il c... |
| 45 | taire | 1 | taire | 99.408153 | Like -re, except: (p.p.) tu, nous taisons, ils... |
| 46 | fuir | 2 | fuir, enfuir | 99.471239 | Like -ir, except: tu fuis, nous fuyons, ils fu... |
| 47 | naître | 1 | naître | 99.522834 | Like connaître, except: (p.p.) né, (p.s.) il n... |
| 48 | ficher | 1 | ficher | 99.566318 | Like -er, except: (p.p.) fiché/fichu |
| 49 | convaincre | 3 | convaincre, vaincre, reconvaincre | 99.603891 | Like -re, except: je convaincs, tu convaincs, ... |
| 50 | interdire | 6 | interdire, prédire, contredire, médire, adire | 99.639585 | Like -re, except: (p.p.) interdit, nous interd... |
| 51 | prévoir | 1 | prévoir | 99.674342 | Like voir, except: il prévoira |
| 52 | pleuvoir | 1 | pleuvoir | 99.704196 | (defective) |
| 53 | parfaire | 1 | parfaire | 99.730887 | (defective) |
| 54 | haïr | 1 | haïr | 99.755520 | Like -ir (-iss-), except: tu hais, (p.s.) il h... |
| 55 | accueillir | 3 | accueillir, cueillir, recueillir | 99.779695 | Like -ir, except: j'accueille, tu accueilles, ... |
| 56 | bénir | 1 | bénir | 99.801364 | Like -ir (-iss-), except: (p.p.) béni/bénit |
| 57 | faillir | 1 | faillir | 99.821006 | (irregular) |
| 58 | résoudre | 1 | résoudre | 99.839200 | Like -re, except: (p.p.) résolu, tu résous, no... |
| 59 | conclure | 2 | conclure, exclure | 99.856353 | Like -re, except: (p.p.) conclu, (p.s.) il con... |
| 60 | conquérir | 6 | conquérir, acquérir, requérir, reconquérir, en... | 99.870845 | Like -ir, except: (p.p.) conquis, tu conquiers... |
| 61 | distraire | 10 | distraire, extraire, traire, soustraire, rentr... | 99.885199 | (defective) |
| 62 | pourvoir | 1 | pourvoir | 99.894749 | Like prévoir, except: (p.s.) il pourvut |
| 63 | douer | 1 | douer | 99.903770 | (defective) |

63 rows × 5 columns
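As with vocabulary size, we can ask how many conjugation patterns it takes to reach a given coverage target. Reading from the table above, 95% takes just the first 24 conjugators:

# Smallest number of conjugation patterns reaching 95% film coverage.
conjugators[conjugators['freqfilms2'] >= 95].index[0]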
plt.plot(conjugators.index.values, conjugators.freqfilms2)
plt.title('Verb Coverage by Conjugator')
#plt.legend(loc = 'lower right')
plt.xlabel('Verb conjugations known')
plt.ylabel('% coverage')
plt.ylim((0,100))
plt.xlim((1,60))
[Figure: "Verb Coverage by Conjugator", % coverage vs. number of verb conjugations known.]