This is a revised version of a notebook shared on r/cbaduk about a week ago, expanded with more statistics on progress so far, charts, and predictions of future progress.
The notebook starts with a bunch of code you can skip if you're not interested; the actual charts and estimates are a couple of screens down.
from bs4 import BeautifulSoup
from urllib.request import urlopen
from copy import deepcopy
import json
import itertools
INTERNAL_RATINGS_URL = 'http://zero.sjeng.org/data/elograph.json'
CGOS_BAYES_URL = 'http://www.yss-aya.com/cgos/19x19/bayes.html'
CGOS_STANDINGS_URL = 'http://www.yss-aya.com/cgos/19x19/standings.html'
import warnings
warnings.filterwarnings('ignore')
def get_best_networks():
    with urlopen(INTERNAL_RATINGS_URL) as resp:
        int_str = resp.read().decode('utf8')
    internal = json.loads(int_str)
    # oldest first, best (promoted) networks only
    return reversed([i for i in internal if i['best'] == 'true'])
def get_cgos_bayes():
    with urlopen(CGOS_BAYES_URL) as resp:
        cgos_bayes_str = resp.read().decode('utf8')
    doc = BeautifulSoup(cgos_bayes_str, 'html5lib')
    # map bot name -> BayesElo rating, skipping the header row
    return {
        row.contents[3].a.string: int(row.contents[5].string)
        for row in itertools.islice(doc.find_all('tr'), 1, None)
    }
def get_merged_ratings():
    bests = get_best_networks()
    cgos_bayes = get_cgos_bayes()
    res = []
    for b in bests:
        name = 'LZ-%s-t1-p1600' % b['hash']
        # keep only the networks that actually played on CGOS
        if name in cgos_bayes:
            b['cgos'] = cgos_bayes[name]
            res.append(b)
    return res
import matplotlib.pyplot as plt
import numpy as np
from numpy.polynomial import polynomial as poly
from datetime import date
def spread_equidistantly(ratings, start, end):
    """Respace the game counts ('net') of the networks strictly between
    `start` and `end` evenly between the two anchors."""
    res = deepcopy(ratings)
    start_net = 0
    end_net = 0
    count = 0
    to_fix = []
    for r in res:
        if r['hash'] == start:
            start_net = r['net']
            continue
        if r['hash'] == end:
            end_net = r['net']
            break
        if start_net:
            count += 1
            to_fix.append(r)
    # count intermediate networks => count + 1 equal steps from start to end
    diff = (end_net - start_net) / (count + 1)
    for i, r in enumerate(to_fix):
        r['net'] = start_net + (i + 1) * diff
    return res
def linear_cgos_game_count(ratings):
    internal = [r['rating'] for r in ratings]
    cgos = [r['cgos'] for r in ratings]
    game_count = [r['net'] for r in ratings]
    # fit internal = n + k * cgos (polyfit returns coefficients low-to-high)
    n, k = poly.polyfit(cgos, internal, 1)
    scaled = (np.asarray(internal) - n) / k
    fig, ax = plt.subplots(figsize=(12, 9))
    ax.plot(game_count, scaled, marker='o', label='scaled')
    ax.plot(game_count, cgos, marker='s', label='cgos')
    ax.legend()
    ax.grid(True)
    ax.set(xlabel='Selfplay games', ylabel='Elo',
           title='Internal Elo = CGoS * {:.2f} {:=+6d}'.format(k, int(n)))
    plt.show()
def cgos_internal_projections(ratings):
    internal = np.asarray([r['rating'] for r in ratings])
    cgos = np.asarray([r['cgos'] for r in ratings])
    coef1 = poly.polyfit(internal, cgos, 1)
    coef_latest = poly.polyfit(internal[-30:], cgos[-30:], 1)
    coef2 = poly.polyfit(internal, cgos, 2)
    # extend each fit out to internal Elo 10,500
    proj = np.concatenate([internal, [10500]])
    proj_latest = np.concatenate([internal[-30:], [10500]])
    fig, ax = plt.subplots(figsize=(12, 9))
    ax.plot(internal, cgos, label='cgos')
    ax.plot(proj, poly.polyval(proj, coef1), '--', label='linear')
    ax.plot(proj_latest, poly.polyval(proj_latest, coef_latest), '-.', label='linear last 30 nets')
    ax.plot(proj, poly.polyval(proj, coef2), ':', label='quadratic')
    ax.set(xlabel='Internal Elo', ylabel='CGoS Elo', title='Projections')
    ax.legend()
    ax.grid(True)
    plt.show()
def avg_gain(ratings, key, ylabel='rating gain', window_size=10):
    avgs = []
    gain_min = []
    gain_max = []
    # overall average gain per promoted network
    avg = (ratings[-1][key] - ratings[0][key]) / (len(ratings) - 1)
    for i in range(len(ratings) - window_size + 1):
        window = ratings[i:i + window_size]
        # window_size networks span window_size - 1 promotion steps
        avgs.append((window[-1][key] - window[0][key]) / (window_size - 1))
        diffs = [b[key] - a[key] for a, b in zip(window, window[1:])]
        gain_min.append(min(diffs))
        gain_max.append(max(diffs))
    fig, ax = plt.subplots(figsize=(8, 6))
    cnt = len(avgs)
    ax.plot(range(cnt), avgs, label='window average')
    ax.plot(range(cnt), gain_min, '--', label='window min')
    ax.plot(range(cnt), gain_max, '--', label='window max')
    ax.plot(range(cnt), np.ones([cnt]) * avg, ':', label='overall average')
    ax.set(ylabel=ylabel,
           title='Gain across a sliding window of {:d} consecutive networks, AVG = {:.2f}'.format(window_size, avg))
    ax.legend()
    ax.grid(True)
    plt.show()
def rating_gamecount_log_projection(ratings, key='cgos', ylabel='rating', rank=1, predictions=1):
    rating = np.asarray([r[key] for r in ratings])
    game_count = np.asarray([r['net'] for r in ratings])
    # fit a degree-`rank` polynomial in log(game count)
    lg = np.log(game_count)
    lgfit = poly.polyfit(lg, rating, rank)
    fig, ax = plt.subplots(figsize=(12, 9))
    ax.plot(game_count, rating)
    # extend the x axis by `predictions` million-game steps beyond 5M
    x = np.concatenate([game_count, [(5 + i) * 1000 * 1000 for i in range(predictions)]])
    ax.plot(x, poly.polyval(np.log(x), lgfit), '--', label='log fit')
    ax.set(ylabel=ylabel, xlabel='Selfplay games',
           title='Rating prediction')
    ax.legend()
    ax.grid(True)
    plt.show()
merged = get_merged_ratings()
Disclaimer: I know just enough linear algebra to be dangerous with it, but not enough statistics to really know if what I'm doing makes sense. Statistical significance, confidence intervals, standard deviations, etc. are nowhere to be found on this page, which only proves that these numbers shouldn't be trusted :) In my defense, I'll express my final estimate in weeks, so this analysis should be good enough for that level of precision.
Let's first assume that CGoS rating and internal Elo are linearly correlated, and see how the rating has progressed with the number of self-play games so far:
linear_cgos_game_count(merged)
That was nice, but with a scale factor of 3.25, we could expect LZ to measure up to the top contestants on CGoS only when it reaches an internal Elo of around 12,000. Considering the diminishing pace of progress, that could take a very long time. Let's see if a linear relation is really the best way to predict future progress:
cgos_internal_projections(merged)
A bit of relief. It turns out simple linear regression is the most pessimistic in its predictions, and it doesn't fit the data very well anyway. The two competing models are a quadratic fit and a linear fit over only the last 30 networks.
So why linear on just the last 30 networks? Well, earlier networks were smaller, the code changed in the meantime, and bugs got fixed. Maybe it makes sense to ignore how older code / smaller networks progressed and focus on the current state. This seems like a realistic lower bound on progress. At around 10,000 internal Elo, LZ would sit comfortably in the top 100 on CGoS, and maybe even the top 50. Domination expected around Elo 11,000.
The quadratic fit is the most optimistic of the three, and it also fits the data very well. It says that the stronger the network is, the more of its Elo gain will transfer to CGoS. The reasoning behind it is that earlier networks could exploit weaknesses in previous iterations and achieve easy wins, but those exploits didn't generalize well to other opponents. As newer networks have fewer obvious exploits, real strength gains are required to get promoted. An internal Elo of 10,000 would be enough to match the top bots on CGoS.
So how much internal Elo gain can a new network expect anyway?
avg_gain(merged, 'rating', 'Internal gain')
Ouch. So internal Elo transfers better and better to CGoS rating, but each new network gains less and less compared to the previous one. Do these two effects cancel out?
avg_gain(merged, 'cgos', 'CGoS gain')
Pretty much. While a new network can gain as much as 200 points (or lose as much as 100), the average gain merrily oscillates around 26 points. The best LZ is around 2900 CGoS rating, while the top bots are at ~4000, so it seems we are a bit more than 40 networks away from the top spot. Considering that recent networks gain ~60 internal points each and that the minimum for promotion is ~40 points, "a bit more than 40 networks" translates to about 2500 internal Elo. So we get a figure of 10,500 internal Elo to reach the top, which is smack in between our previous two predictions. We are onto something here... But how much is that in real time?
Dates are not available in the ratings JSON, but the site currently claims 54,360 self-play games in the last 24 hours. The latest candidate network is at 3.7 million games. Network 55b... was at 1.7 million on January 4th.
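As a sanity check on the arithmetic above, here's a quick back-of-envelope computation. All inputs are the rough figures quoted in the text (the current internal Elo of 8000 is a hypothetical value implied by the 2500-point gap and the 10,500 target, not taken from the data):

```python
# Back-of-envelope check; all numbers are the approximate figures from the text.
top_cgos = 4000          # rating of the top CGoS bots
lz_cgos = 2900           # current best LZ network on CGoS
avg_cgos_gain = 26       # average CGoS gain per promoted network
internal_gain = 60       # recent internal Elo gain per promoted network
current_internal = 8000  # hypothetical: implied by the 10,500 target

networks_needed = (top_cgos - lz_cgos) / avg_cgos_gain
internal_needed = networks_needed * internal_gain
print('networks to the top: %.0f' % networks_needed)   # "a bit more than 40"
print('internal Elo to gain: %.0f' % internal_needed)  # ~2500
print('target internal Elo: %.0f' % (current_internal + internal_needed))
```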
diff = date(2018, 2, 10) - date(2018, 1, 4)
print('days: ', diff.days)
print('games per day: ', 2000000 / diff.days)
days:  37
games per day:  54054.05405405405
So ~54,000 games a day seems like a pretty stable estimate. Now all that is left to figure out is how many self-play games are needed to get to 4k CGoS Elo.
This is where it gets a bit ugly. The reason is that a bug in the training code caused a long stretch of slow improvement, followed by a burst of new networks once the bug was fixed. So I'll (gasp) tweak the data to make the curves fit better.
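Since game counts keep getting converted into calendar time below, here's a small helper one could use for that conversion, assuming the ~54,000 games/day rate just estimated:

```python
# Convert a number of additional self-play games into calendar time,
# assuming the ~54,000 games/day rate estimated above.
GAMES_PER_DAY = 54000

def games_to_weeks(games, games_per_day=GAMES_PER_DAY):
    return games / games_per_day / 7

# e.g. 2 million extra games works out to roughly 5 weeks:
print('%.1f weeks' % games_to_weeks(2000000))
```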
tweaked = spread_equidistantly(merged, 'ffc1e5', 'd16fa4')
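To illustrate what the tweak does, here's a self-contained sketch of the same idea as spread_equidistantly on made-up toy data (the hashes and game counts are invented): the networks strictly between the two anchors get their game counts respaced evenly.

```python
from copy import deepcopy

# Toy data: 'bbb' and 'ccc' are bunched up right after 'aaa'.
toy = [
    {'hash': 'aaa', 'net': 1000000},
    {'hash': 'bbb', 'net': 1010000},
    {'hash': 'ccc', 'net': 1020000},
    {'hash': 'ddd', 'net': 1600000},
]

def respace(ratings, start, end):
    # Evenly respace the 'net' values strictly between start and end.
    res = deepcopy(ratings)
    hashes = [r['hash'] for r in res]
    i, j = hashes.index(start), hashes.index(end)
    step = (res[j]['net'] - res[i]['net']) / (j - i)
    for k in range(i + 1, j):
        res[k]['net'] = res[i]['net'] + (k - i) * step
    return res

print([r['net'] for r in respace(toy, 'aaa', 'ddd')])
# → [1000000, 1200000.0, 1400000.0, 1600000]
```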
So what kind of curve should we fit? This slowing-down business sounds logarithmic, so let's try that:
rating_gamecount_log_projection(tweaked, key='rating', ylabel='Internal Elo', predictions=2)
It doesn't quite fit the data, but we have ourselves a first estimate: in about 2 million more games, or about 5 weeks, LZ will rule the world. Let's cross-check with a fit on the CGoS rating:
rating_gamecount_log_projection(tweaked, key='cgos', ylabel='CGoS Elo', predictions=4)
Nope. This is a ridiculous extrapolation from the available data, and a fit that isn't that good anyway, but let's call it 5 million more games, or about 12 weeks. Can we get better predictions?
rating_gamecount_log_projection(tweaked, key='cgos', ylabel='CGoS Elo', predictions=2, rank=2)
The fit is better, but the result is... never? What does internal Elo say?
rating_gamecount_log_projection(tweaked, key='rating', ylabel='Internal Elo', predictions=1, rank=2)
OMG it's going backwards! Tweak some more parameters!
rating_gamecount_log_projection(tweaked, key='rating', ylabel='Internal Elo', predictions=4, rank=3)
I'm assuming a cubic polynomial of a logarithm makes no sense whatsoever, but there you go... another 12-week estimate.
Well, it turned out that fitting this curve was a bit trickier than I expected. It means making an exact prediction is hard, but there is one simple reason to be optimistic: if progress stalls, we can always add more layers :)
Joking aside, instead of mindless curve fitting, I assume a better way to predict the rating is to model the rating distribution of candidate networks as a function of the current rating, and work out the probability of a network getting promoted along with its Elo gain. That said, it's not something I know how to do right now. Maybe next week :)
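For the curious, a very rough Monte Carlo sketch of what such a model might look like: assume each candidate's true strength is normally distributed around the current best network, and a candidate is promoted if it wins more than 55% of a 400-game match against it. All the parameters here (mean gain of 20 Elo, spread of 60) are illustrative guesses, not values fitted to the LZ data:

```python
import random

random.seed(0)

def winrate(elo_diff):
    # expected score implied by an Elo difference
    return 1 / (1 + 10 ** (-elo_diff / 400))

def simulate_promotions(n_candidates=5000, mean_gain=20, spread=60,
                        games=400, threshold=0.55):
    # Monte Carlo: draw each candidate's true Elo edge over the current
    # best network, then simulate the promotion match.
    promoted, total_gain = 0, 0.0
    for _ in range(n_candidates):
        gain = random.gauss(mean_gain, spread)  # candidate's true Elo edge
        p_win = winrate(gain)
        wins = sum(random.random() < p_win for _ in range(games))
        if wins / games > threshold:
            promoted += 1
            total_gain += gain
    return promoted / n_candidates, total_gain / max(promoted, 1)

p, avg = simulate_promotions()
print('promotion probability: %.2f' % p)
print('avg true Elo gain of promoted nets: %.1f' % avg)
```

From a model like this one could read off both how often promotions should happen and how much Elo each promotion is worth, which is exactly the pair of numbers the curve fitting above tries to guess.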