Lexique 3 is a French word database from the Université de Savoie. It includes orthographic and phonemic word forms, lemmas, parts of speech, and frequency counts drawn from both film subtitles and books. It's a great database and a ton of fun. In this notebook, we use a copy of Lexique that has been loaded into an SQLite 3 database. But since we don't want to get bogged down in the messy details of the Python code, we build the database using a Makefile, and we keep all of our Python utility functions in an external file named lexique.py. If you want to see all those details, or customize this analysis, check out this notebook on GitHub.
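For orientation, the sql() helper used throughout this notebook is essentially a thin wrapper around sqlite3 and pandas. Here's a minimal sketch of what it might look like, assuming a database file named lexique.sqlite3; this is not the actual contents of lexique.py, which presumably also defines the colors palette, save_tsv(), and the other helpers used later.

# Hypothetical sketch of the sql() helper from lexique.py (not the real code).
import sqlite3
import pandas as pd

_conn = sqlite3.connect('lexique.sqlite3')  # assumed database filename

def sql(query, index_col=None):
    """Run a query against the Lexique database and return a DataFrame."""
    return pd.read_sql_query(query, _conn, index_col=index_col)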
First, let's get everything loaded.
%matplotlib inline
%run lexique.py
Now let's take a look at what data we have available. First, there's the raw data from Lexique, which includes inflected forms in the ortho column:
sql("SELECT * FROM lexique WHERE lemme = 'avoir' LIMIT 5")
| | ortho | phon | lemme | cgram | genre | nombre | freqlemfilms2 | freqlemlivres | freqfilms2 | freqlivres |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | a | avoir | AUX | | | 18559.22 | 12800.81 | 6350.91 | 2926.69 |
| 1 | a | a | avoir | VER | | | 13572.40 | 6426.49 | 5498.34 | 1669.39 |
| 2 | ai | E | avoir | AUX | | | 18559.22 | 12800.81 | 4902.10 | 2119.12 |
| 3 | ai | E | avoir | VER | | | 13572.40 | 6426.49 | 2475.78 | 619.05 |
| 4 | aie | E | avoir | AUX | | | 18559.22 | 12800.81 | 31.75 | 21.69 |
We have another table, lemme, which sums over all the orthographies associated with a given lemma. The lemme column, however, is still not unique: if a given word can be either a noun or a verb, it will appear twice, and so on.
sql("SELECT * FROM lemme ORDER BY freqlivres DESC LIMIT 5")
| | lemme | cgram | genre | nombre | freqfilms2 | freqlivres |
|---|---|---|---|---|---|---|
| 0 | de | PRE | | | 25220.86 | 38928.92 |
| 1 | la | ART:def | f | s | 14946.48 | 23633.92 |
| 2 | et | CON | | | 12909.08 | 20879.73 |
| 3 | à | PRE | | | 12190.40 | 19209.05 |
| 4 | le | ART:def | m | s | 13652.76 | 18310.95 |
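The Makefile that builds these derived tables isn't shown in this notebook, but as a rough sketch (an assumption on my part, not the actual build step), a table like lemme could be produced from the raw lexique data with an aggregate query along these lines:

# Hypothetical sketch, not the actual Makefile step: deriving a lemme-style
# table from the raw lexique data by summing over the inflected forms.
import sqlite3
conn = sqlite3.connect('lexique.sqlite3')  # assumed database filename
conn.executescript("""
CREATE TABLE IF NOT EXISTS lemme AS
  SELECT lemme, cgram, genre, nombre,
         SUM(freqfilms2) AS freqfilms2,
         SUM(freqlivres) AS freqlivres
    FROM lexique
   GROUP BY lemme, cgram, genre, nombre;
""")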
And finally, we collapse all the cgram, genre and nombre values associated with a given value of lemme, to give us unique lemmas with frequency data:
sql("SELECT * FROM lemme_simple ORDER BY freqlivres DESC LIMIT 5")
| | lemme | freqfilms2 | freqlivres |
|---|---|---|---|
| 0 | de | 25220.96 | 38928.92 |
| 1 | la | 16028.08 | 24877.30 |
| 2 | être | 40411.41 | 21709.87 |
| 3 | et | 12909.08 | 20879.73 |
| 4 | le | 16953.50 | 20735.14 |
Note that we have two sets of frequency data: freqfilms2, which is based on a corpus of film subtitles, and freqlivres, which is based on a corpus of books. There are some important differences. For example, French films use the passé composé far more often than books, which raises the frequencies of the auxiliary verbs être and avoir:
sql("SELECT * FROM lemme_simple ORDER BY freqfilms2 DESC LIMIT 5")
| | lemme | freqfilms2 | freqlivres |
|---|---|---|---|
| 0 | être | 40411.41 | 21709.87 |
| 1 | avoir | 32134.77 | 19230.64 |
| 2 | je | 25988.48 | 10862.77 |
| 3 | de | 25220.96 | 38928.92 |
| 4 | ne | 22297.51 | 13852.97 |
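To put a rough number on the passé composé effect, we can compare the two frequency columns directly for the auxiliaries. From the figures above, être shows up about 1.9 times as often in subtitles as in books, and avoir about 1.7 times:

# Compare film vs. book frequency for the two auxiliary verbs.
sql("""
SELECT lemme, freqfilms2, freqlivres,
       ROUND(freqfilms2 / freqlivres, 2) AS ratio
  FROM lemme_simple
 WHERE lemme IN ('être', 'avoir')
""")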
Using the film dataset, let's take a look at the parts of speech:
cgram_freq = sql("""
SELECT cgram, SUM(freqfilms2) AS freqfilms2, SUM(freqlivres) AS freqlivres
FROM lemme GROUP BY cgram
""", index_col='cgram')
cgram_freq
| cgram | freqfilms2 | freqlivres |
|---|---|---|
| | 1.20 | 0.00 |
| ADJ | 42939.77 | 56548.13 |
| ADJ:dem | 6363.94 | 6802.23 |
| ADJ:ind | 2999.34 | 3737.10 |
| ADJ:int | 1273.62 | 582.91 |
| ADJ:num | 2525.93 | 4680.43 |
| ADJ:pos | 19106.03 | 20005.62 |
| ADV | 97693.38 | 69747.44 |
| ART:def | 54495.63 | 83470.15 |
| ART:ind | 26051.19 | 33763.58 |
| AUX | 26633.45 | 19302.64 |
| CON | 29730.47 | 38189.17 |
| LIA | 0.00 | 412.57 |
| NOM | 144894.66 | 186537.81 |
| ONO | 6291.15 | 1501.06 |
| PRE | 77439.28 | 118274.42 |
| PRO:dem | 15700.13 | 7549.99 |
| PRO:ind | 7538.14 | 5716.50 |
| PRO:int | 1612.64 | 736.37 |
| PRO:per | 133651.20 | 90995.84 |
| PRO:pos | 334.17 | 322.16 |
| PRO:rel | 11547.03 | 14483.19 |
| VER | 198390.74 | 150350.57 |
cgram_freq_summary = cgram_freq.groupby(lambda x: x[0:3]).sum()
plt.figure(figsize=(7,7))
plt.subplot(aspect=True)
plt.pie(cgram_freq_summary.freqfilms2, labels=cgram_freq_summary.index.values, colors=colors)
plt.title("Parts of speech")
[Figure: pie chart "Parts of speech", film-subtitle frequency by part of speech.]
How many words do we need to know to understand 98% of the individual words which appear on a given page?
coverage = sql("""
SELECT lemme, freqfilms2 FROM lemme_simple
ORDER BY freqfilms2 DESC""")
coverage.index += 1
coverage['film_coverage'] = \
100*coverage['freqfilms2'].cumsum() / coverage['freqfilms2'].sum()
del coverage['lemme']
del coverage['freqfilms2']
coverage[0:5]
| | film_coverage |
|---|---|
| 1 | 4.454456 |
| 2 | 7.996598 |
| 3 | 10.861248 |
| 4 | 13.641296 |
| 5 | 16.099099 |
book_coverage = sql("""
SELECT lemme, freqlivres FROM lemme_simple
ORDER BY freqlivres DESC""")
book_coverage.index += 1
coverage['book_coverage'] = \
100*book_coverage['freqlivres'].cumsum() / book_coverage['freqlivres'].sum()
coverage[0:5]
| | film_coverage | book_coverage |
|---|---|---|
| 1 | 4.454456 | 4.260534 |
| 2 | 7.996598 | 6.983203 |
| 3 | 10.861248 | 9.359217 |
| 4 | 13.641296 | 11.644377 |
| 5 | 16.099099 | 13.913712 |
plt.plot(coverage.index.values, coverage.film_coverage, label="Film Coverage")
plt.plot(coverage.index.values, coverage.book_coverage, label="Book Coverage")
plt.legend(loc = 'lower right')
plt.title('Text Coverage')
plt.xlabel('Vocabulary size')
plt.ylabel('% coverage')
plt.xlim((0,10000))
[Figure: "Text Coverage", % coverage vs. vocabulary size for the film and book corpora.]
Or, in table form, here's how many words you need to know to get a given percentage of coverage:
coverage.loc[[250, 500, 1000, 2000, 4000, 8000, 16000], :]
| | film_coverage | book_coverage |
|---|---|---|
| 250 | 76.164757 | 68.074024 |
| 500 | 82.792428 | 74.990087 |
| 1000 | 88.386602 | 81.450089 |
| 2000 | 93.004247 | 87.535363 |
| 4000 | 96.410076 | 92.756703 |
| 8000 | 98.554939 | 96.606475 |
| 16000 | 99.666146 | 99.017399 |
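We can also run the lookup in the other direction and ask for the smallest vocabulary that reaches a given coverage target, for example 95% of the film corpus (which, per the table above, falls somewhere between 2,000 and 4,000 lemmas):

# Smallest vocabulary size whose cumulative film coverage reaches 95%.
coverage[coverage['film_coverage'] >= 95].index[0]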
We want to get a feel for how many nouns, verbs, etc., are required in a well-balanced vocabulary. This requires grouping words by part of speech, sorting them by frequency, and graphing the cumulative text coverage for a given number of words. This takes a fair bit of work to set up.
First, we need to do quite a bit of data munging:
# Merge related cgrams, sum frequency over (cgram, lemme) groups,
# and sort by (cgram,freqfilms2).
cgram_lemme_freq = sql("""
SELECT cgram, SUM(freqfilms2) AS freqfilms2
FROM (SELECT CASE WHEN cgram='AUX' THEN 'VER'
ELSE SUBSTR(cgram, 1, 3)
END AS cgram,
lemme, freqfilms2
FROM lemme)
GROUP BY cgram, lemme
ORDER BY cgram, freqfilms2 DESC
""")
# Convert freqfilms2 to a cumulative percentage over each cgram group.
cgram_col = cgram_lemme_freq['cgram']
normalized_freq = cgram_lemme_freq.groupby(cgram_col).transform(lambda x: x/x.sum())
cumulative_freq = normalized_freq.groupby(cgram_col).cumsum()
cgram_lemme_freq['freqfilms2'] = 100.0*cumulative_freq
# Sequentially number the rows within each cgram group so we can see the
# vocabulary size corresponding to each cumulative percentage.
cgram_lemme_freq['rang'] = cgram_lemme_freq.groupby(cgram_col).cumcount()+1
# Index by cgram group, and vocabulary size within the group. Uncomment the
# last line to view the data.
cgram_lemme_freq.set_index(['cgram', 'rang'], inplace = True)
#cgram_lemme_freq
Now that we have the data, we can plot it using two different graphs: One for the "large" parts of speech (nouns, etc.), and one for the parts of speech which are either closed classes, or at least very small.
def plot_cgrams(labels):
for key in labels.keys():
cgram_group = cgram_lemme_freq.loc[key]
plt.plot(cgram_group.index.values, cgram_group.freqfilms2, label=labels[key])
plt.legend(loc = 'lower right')
plt.title('Text Coverage by Part of Speech (films)')
plt.xlabel('Words known by part of speech')
plt.ylabel('% coverage')
plt.ylim((0,100))
plt.axhline(y=90, color='k', ls='dashed')
plt.figure(figsize=(12,4))
plt.subplot(121)
small_cgram_labels = {
'PRO': 'Pronouns',
'ADV': 'Adverbs',
'PRE': 'Prepositions',
    'CON': 'Conjunctions',
'ART': 'Articles'
}
plot_cgrams(small_cgram_labels)
plt.xlim((0,150))
plt.subplot(122)
large_cgram_labels = {
'NOM': 'Nouns',
'VER': 'Verbs',
'ADJ': 'Adjectives'
}
plot_cgrams(large_cgram_labels)
plt.xlim((0,10000))
[Figure: "Text Coverage by Part of Speech (films)", two panels: closed classes (pronouns, adverbs, prepositions, conjunctions, articles) up to 150 words, and nouns, verbs, and adjectives up to 10,000 words, with a dashed line at 90% coverage.]
It would be nice to have this as a table, too, so we can figure out—for example—how many nouns we need to get 75% coverage. Once again, this will require a fair bit of data munging.
# Only include the parts of speech used in our graph.
cgram_labels = small_cgram_labels.copy()
cgram_labels.update(large_cgram_labels)
interesting = cgram_lemme_freq.loc[cgram_labels.keys()]
# We'll use this to build a list of columns in our final table.
columns = []
# Calculate minimum number words for a given percentage of coverage.
for threshold in [75,90,95,98,99,99.5]:
# Discard all the rows below our threshold.
over_threshold = interesting[interesting['freqfilms2'] >= threshold]
# Take the first row that remains.
over_threshold.reset_index(inplace=True)
over_threshold.set_index('cgram', inplace=True)
first_over = over_threshold.groupby(level=0).first()
    # Keep only a single column named after our threshold.
del first_over['freqfilms2']
first_over.rename(columns={'rang': '%r%%' % threshold}, inplace=True)
columns.append(first_over)
# Join all the columns together.
table = columns[0].join(columns[1:])
# Clean up the table a bit and add a total
table.index.names = ['Part of speech']
table.index = table.index.map(lambda i: cgram_labels[i])
table.loc['TOTAL'] = table.sum()
table
| Part of speech | 75% | 90% | 95% | 98% | 99% | 99.5% |
|---|---|---|---|---|---|---|
| Adjectives | 136 | 620 | 1367 | 2686 | 3742 | 4736 |
| Adverbs | 17 | 42 | 69 | 118 | 182 | 277 |
| Articles | 6 | 8 | 9 | 9 | 10 | 10 |
| Conjunctions | 5 | 9 | 11 | 14 | 15 | 17 |
| Nouns | 1137 | 3115 | 5215 | 8454 | 10956 | 13347 |
| Prepositions | 6 | 9 | 14 | 21 | 26 | 30 |
| Pronouns | 16 | 24 | 29 | 40 | 50 | 65 |
| Verbs | 63 | 290 | 583 | 1108 | 1601 | 2149 |
| TOTAL | 1386 | 4117 | 7297 | 12450 | 16582 | 20631 |
We divide verbs into the three standard groups: -er, -ir and -re. We split aller into its own group, because it's the only irregular -er verb. For now, we treat the auxiliary uses of être and avoir in the passé composé as ordinary verbs.
verbs = sql("SELECT * FROM verbe ORDER BY freqfilms2 DESC")
verbs[0:15]
| | lemme | groupe | prototype | conjugaison | aux | freqfilms2 | freqlivres |
|---|---|---|---|---|---|---|---|
| 0 | être | re | être | être | avoir | 40310.72 | 21587.31 |
| 1 | avoir | ir | avoir | avoir | avoir | 32131.64 | 19227.33 |
| 2 | aller | aller | aller | aller | être | 9992.78 | 2854.92 |
| 3 | faire | re | .*faire | faire | avoir | 8813.52 | 5328.99 |
| 4 | dire | re | dire\|redire | dire | avoir | 5946.18 | 4832.51 |
| 5 | pouvoir | ir | pouvoir | pouvoir | avoir | 5524.46 | 2659.75 |
| 6 | vouloir | ir | .*vouloir | vouloir | avoir | 5249.31 | 1640.16 |
| 7 | savoir | ir | .*savoir | savoir | avoir | 4516.72 | 2003.59 |
| 8 | voir | ir | .*voir\|.*oir | voir | avoir | 4119.47 | 2401.76 |
| 9 | devoir | ir | .*devoir | devoir | avoir | 3232.59 | 1318.20 |
| 10 | venir | ir | .*venir | venir | être | 2763.82 | 1514.53 |
| 11 | suivre | re | .*suivre | suivre | avoir | 2090.55 | 949.13 |
| 12 | parler | er | .*er | -er | avoir | 1970.53 | 1086.02 |
| 13 | prendre | re | .*prendre | prendre | avoir | 1913.84 | 1466.44 |
| 14 | croire | re | .*croire | croire | avoir | 1712.02 | 947.25 |
As the charts below show, all three groups have roughly equal text coverage, but there are actually far more -er verbs than all the others combined. This suggests that a small number of -ir and -re verbs are disproportionately common.
plt.figure(figsize=(8,8))
plt.subplot(121, aspect=True)
group_freq = verbs.groupby(verbs['groupe']).sum()
plt.pie(group_freq.freqfilms2, labels=group_freq.index.values, colors=colors)
plt.title("Verb group frequency")
plt.subplot(122, aspect=True)
group_size = verbs.groupby(verbs['groupe']).count()
plt.pie(group_size.lemme, labels=group_size.index.values, colors=colors)
plt.title("Verb group size (words)")
[Figure: two pie charts, "Verb group frequency" and "Verb group size (words)".]
# Extract the columns we need, and get rid of 'aller'.
group_freq = verbs[['groupe', 'lemme', 'freqfilms2']].copy()
group_freq = group_freq[group_freq['groupe'] != 'aller']
# Calculate coverage percentages for frequency ranks in each group.
groupe_col = group_freq['groupe']
normalized_freq = group_freq.groupby(groupe_col).transform(lambda x: x/x.sum())
cumulative_freq = 100.0*normalized_freq.groupby(groupe_col).cumsum()
group_freq['freqfilms2'] = cumulative_freq
group_freq['rang'] = group_freq.groupby(groupe_col).cumcount()+1
group_freq.set_index(['groupe', 'rang'], inplace=True)
group_freq[0:10]
| groupe | rang | lemme | freqfilms2 |
|---|---|---|---|
| re | 1 | être | 53.707239 |
| ir | 1 | avoir | 44.826275 |
| re | 2 | faire | 65.449768 |
| re | 3 | dire | 73.372051 |
| ir | 2 | pouvoir | 52.533350 |
| ir | 3 | vouloir | 59.856569 |
| ir | 4 | savoir | 66.157764 |
| ir | 5 | voir | 71.904763 |
| ir | 6 | devoir | 76.414491 |
| ir | 7 | venir | 80.270247 |
# Sigh. My database is in French, and my libraries are in English.
# There's no way to avoid coding in franglais, I fear.
for group in ['er', 'ir', 're']:
g = group_freq.loc[group]
plt.plot(g.index.values, g.freqfilms2, label=group)
plt.title('Verb Coverage by Group')
plt.legend(loc = 'lower right')
plt.xlabel('Verbs known in group')
plt.ylabel('% coverage')
plt.xlim((1,100))
plt.ylim((0,100))
[Figure: "Verb Coverage by Group", % coverage vs. verbs known for the -er, -ir, and -re groups.]
If we take the first 40 -ir and -re verbs, we get better than 96% coverage. Even the first 20 in each group will give us better than 92% coverage. Here's a list for people who want to master all the high-frequency irregular verbs.
def html_for_group(groupe):
lst = ', '.join(group_freq.loc[groupe].loc[1:40]['lemme'].tolist())
return '<p><i>-%s</i> verbs: %s.</p>' % (groupe, lst)
HTML("<p><i>-er</i> verbs: aller.</p>" + html_for_group('ir') + html_for_group('re'))
-er verbs: aller.
-ir verbs: avoir, pouvoir, vouloir, savoir, voir, devoir, venir, falloir, partir, mourir, sortir, revenir, finir, sentir, tenir, devenir, ouvrir, dormir, asseoir, souvenir, servir, valoir, agir, recevoir, mentir, offrir, choisir, revoir, courir, réussir, prévenir, découvrir, maintenir, réfléchir, souffrir, couvrir, obtenir, appartenir, ressentir, prévoir.
-re verbs: être, faire, dire, suivre, prendre, croire, attendre, mettre, connaître, comprendre, entendre, plaire, perdre, vivre, rendre, foutre, apprendre, boire, écrire, lire, répondre, descendre, suffire, vendre, battre, promettre, permettre, conduire, disparaître, taire, remettre, reconnaître, rire, reprendre, détruire, paraître, craindre, naître, rejoindre, défendre.
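As a quick sanity check on those coverage claims, we can read the cumulative percentages at ranks 20 and 40 straight out of group_freq:

# Cumulative film coverage (%) after the first 20 and first 40 verbs per group.
{g: [round(group_freq.loc[g].loc[n, 'freqfilms2'], 1) for n in (20, 40)]
 for g in ('ir', 're')}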
Of course, not all -er verbs are completely regular, and there are patterns among the other verb groups. Fortunately, there's a nice XML file of French verb conjugation rules that we can use to examine these hidden details. Combining that with quite a bit of custom code, we can assign a "conjugator" to each verb prototype, and verify that the generated forms match the XML data. This gives us a much shorter list of key forms.
verbs2 = sql("""
SELECT conjugaison.nom AS conjugaison, lemme, freqfilms2, resume
FROM verbe
LEFT OUTER JOIN conjugaison
ON verbe.conjugaison = conjugaison.nom
ORDER BY freqfilms2 DESC
""")
verbs2['freqfilms2'] = 100 * verbs2['freqfilms2'] / verbs2['freqfilms2'].sum()
def summarize_conjugator(grp):
return pd.Series(dict(exemples=', '.join(grp.lemme[0:5]),
compte=grp.lemme.count(),
freqfilms2=grp.freqfilms2.sum(),
resume=grp.resume.iloc[0]))
conjugators = verbs2.groupby('conjugaison').apply(summarize_conjugator).sort_values('freqfilms2', ascending=False)
conjugators.reset_index(inplace=True)
conjugators.index.names = ['rang']
conjugators.reset_index(inplace=True)
conjugators['rang'] = conjugators['rang'] + 1
conjugators['freqfilms2'] = conjugators['freqfilms2'].cumsum()
conjugators.set_index('rang', inplace=True)
save_tsv('conjugators.tsv', conjugators)
conjugators
| rang | conjugaison | compte | exemples | freqfilms2 | resume |
|---|---|---|---|---|---|
| 1 | -er | 5187 | parler, aimer, passer, penser, trouver | 27.057864 | (p.p.) parlé, je parle, tu parles, il parle, n... |
| 2 | être | 1 | être | 44.971814 | (irregular) |
| 3 | avoir | 1 | avoir | 59.251008 | (irregular) |
| 4 | aller | 1 | aller | 63.691766 | (irregular) |
| 5 | faire | 5 | faire, refaire, satisfaire, défaire, contrefaire | 67.651207 | (irregular) |
| 6 | dire | 2 | dire, redire | 70.297491 | Like interdire, except: vous dites |
| 7 | pouvoir | 1 | pouvoir | 72.752543 | (irregular) |
| 8 | venir | 26 | venir, revenir, tenir, devenir, souvenir | 75.130407 | Like -ir, except: (p.p.) venu, tu viens, nous ... |
| 9 | vouloir | 2 | vouloir, revouloir | 77.463383 | (irregular) |
| 10 | savoir | 3 | savoir, non-savoir, assavoir | 79.470603 | (irregular) |
| 11 | -re | 50 | attendre, entendre, perdre, rendre, répondre | 81.428246 | (p.p.) attendu, tu attends, nous attendons, il... |
| 12 | voir | 5 | voir, revoir, entrevoir, ravoir, comparoir | 83.332903 | Like -ir, except: (p.p.) vu, nous voyons, ils ... |
| 13 | partir | 18 | partir, sortir, sentir, dormir, servir | 84.950418 | Like -ir, except: tu pars |
| 14 | prendre | 12 | prendre, comprendre, apprendre, reprendre, sur... | 86.441360 | Like -re, except: (p.p.) pris, nous prenons, i... |
| 15 | devoir | 2 | devoir, redevoir | 87.877921 | (irregular) |
| 16 | -ir (-iss-) | 304 | finir, agir, choisir, réussir, réfléchir | 89.052373 | (p.p.) fini, tu finis, nous finissons, ils fin... |
| 17 | suivre | 2 | suivre, poursuivre | 90.010207 | Like -re, except: (p.p.) suivi, tu suis |
| 18 | espérer | 199 | espérer, inquiéter, préférer, protéger, répéter | 90.856970 | Like -er, except: tu espères, ils espèrent |
| 19 | acheter | 64 | acheter, emmener, amener, ramener, enlever | 91.628962 | Like -er, except: tu achètes, ils achètent, il... |
| 20 | croire | 1 | croire | 92.389778 | Like -re, except: (p.p.) cru, nous croyons, (p... |
| 21 | appeler | 112 | appeler, rappeler, jeter, rejeter, projeter | 93.148066 | Like -er, except: tu appelles, ils appellent, ... |
| 22 | mettre | 15 | mettre, promettre, permettre, remettre, admettre | 93.891337 | Like battre, except: (p.p.) mis, (p.s.) il mit |
| 23 | falloir | 1 | falloir | 94.626262 | (irregular) |
| 24 | connaître | 10 | connaître, disparaître, reconnaître, paraître,... | 95.267398 | Like -re, except: (p.p.) connu, je connais, tu... |
| 25 | essayer | 30 | essayer, payer, effrayer, balayer, rayer | 95.784982 | Like ennuyer, except: tu essaies/tu essayes, i... |
| 26 | ouvrir | 9 | ouvrir, offrir, découvrir, souffrir, couvrir | 96.201333 | Like -ir, except: (p.p.) ouvert, j'ouvre, tu o... |
| 27 | mourir | 1 | mourir | 96.608587 | Like -ir, except: (p.p.) mort, tu meurs, ils m... |
| 28 | plaire | 3 | plaire, déplaire, complaire | 96.884291 | Like taire, except: il plaît/il plait |
| 29 | vivre | 3 | vivre, survivre, revivre | 97.143889 | Like suivre, except: (p.p.) vécu, (p.s.) il vécut |
| 30 | conduire | 24 | conduire, détruire, construire, produire, réduire | 97.396542 | Like interdire, except: (p.s.) il conduisit |
| ... | ... | ... | ... | ... | ... |
| 34 | écrire | 12 | écrire, décrire, inscrire, prescrire, réécrire | 98.157882 | Like -re, except: (p.p.) écrit, nous écrivons,... |
| 35 | boire | 2 | boire, reboire | 98.308728 | Like -re, except: (p.p.) bu, nous buvons, ils ... |
| 36 | asseoir | 2 | asseoir, rasseoir | 98.453020 | Like -ir, except: (p.p.) assis, tu assieds/tu ... |
| 37 | lire | 4 | lire, élire, relire, réélire | 98.586161 | Like interdire, except: (p.p.) lu, (p.s.) il lut |
| 38 | battre | 9 | battre, abattre, combattre, débattre, rabattre | 98.718538 | Like -re, except: tu bats |
| 39 | recevoir | 9 | recevoir, apercevoir, décevoir, concevoir, per... | 98.845991 | Like -ir, except: (p.p.) reçu, tu reçois, nous... |
| 40 | ennuyer | 52 | ennuyer, nettoyer, appuyer, noyer, employer | 98.964911 | Like -er, except: tu ennuies, ils ennuient, il... |
| 41 | valoir | 2 | valoir, équivaloir | 99.070811 | (irregular) |
| 42 | suffire | 1 | suffire | 99.173942 | Like interdire, except: (p.p.) suffi |
| 43 | rire | 2 | rire, sourire | 99.260244 | Like -re, except: (p.p.) ri, nous rions, ils r... |
| 44 | courir | 9 | courir, parcourir, secourir, accourir, recourir | 99.339511 | Like -ir, except: (p.p.) couru, tu cours, il c... |
| 45 | taire | 1 | taire | 99.408153 | Like -re, except: (p.p.) tu, nous taisons, ils... |
| 46 | fuir | 2 | fuir, enfuir | 99.471239 | Like -ir, except: tu fuis, nous fuyons, ils fu... |
| 47 | naître | 1 | naître | 99.522834 | Like connaître, except: (p.p.) né, (p.s.) il n... |
| 48 | ficher | 1 | ficher | 99.566318 | Like -er, except: (p.p.) fiché/fichu |
| 49 | convaincre | 3 | convaincre, vaincre, reconvaincre | 99.603891 | Like -re, except: je convaincs, tu convaincs, ... |
| 50 | interdire | 6 | interdire, prédire, contredire, médire, adire | 99.639585 | Like -re, except: (p.p.) interdit, nous interd... |
| 51 | prévoir | 1 | prévoir | 99.674342 | Like voir, except: il prévoira |
| 52 | pleuvoir | 1 | pleuvoir | 99.704196 | (defective) |
| 53 | parfaire | 1 | parfaire | 99.730887 | (defective) |
| 54 | haïr | 1 | haïr | 99.755520 | Like -ir (-iss-), except: tu hais, (p.s.) il h... |
| 55 | accueillir | 3 | accueillir, cueillir, recueillir | 99.779695 | Like -ir, except: j'accueille, tu accueilles, ... |
| 56 | bénir | 1 | bénir | 99.801364 | Like -ir (-iss-), except: (p.p.) béni/bénit |
| 57 | faillir | 1 | faillir | 99.821006 | (irregular) |
| 58 | résoudre | 1 | résoudre | 99.839200 | Like -re, except: (p.p.) résolu, tu résous, no... |
| 59 | conclure | 2 | conclure, exclure | 99.856353 | Like -re, except: (p.p.) conclu, (p.s.) il con... |
| 60 | conquérir | 6 | conquérir, acquérir, requérir, reconquérir, en... | 99.870845 | Like -ir, except: (p.p.) conquis, tu conquiers... |
| 61 | distraire | 10 | distraire, extraire, traire, soustraire, rentr... | 99.885199 | (defective) |
| 62 | pourvoir | 1 | pourvoir | 99.894749 | Like prévoir, except: (p.s.) il pourvut |
| 63 | douer | 1 | douer | 99.903770 | (defective) |

63 rows × 5 columns
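As with vocabulary size, we can ask how many conjugation patterns it takes to reach a given coverage target. Reading from the table above, 95% takes just the first 24 conjugators:

# Smallest number of conjugation patterns reaching 95% film coverage.
conjugators[conjugators['freqfilms2'] >= 95].index[0]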
plt.plot(conjugators.index.values, conjugators.freqfilms2)
plt.title('Verb Coverage by Conjugator')
#plt.legend(loc = 'lower right')
plt.xlabel('Verb conjugations known')
plt.ylabel('% coverage')
plt.ylim((0,100))
plt.xlim((1,60))
[Figure: "Verb Coverage by Conjugator", % coverage vs. number of verb conjugations known.]