Exact solutions for iAF1260 in cobra

This paper alleges that iAF1260 only computes due to numerical error. Here, I will demonstrate that this is false by computing rational and integer solutions to the model.

Setting up the environment

In [1]:
from os.path import expanduser
from fractions import Fraction

from numpy import array, ceil, sign
import pandas

import cobra

Print the installed cobrapy solver versions. cglpk and glpk are just different bindings to different versions of the GLPK solver, while gurobi and cplex are from the python bindings shipped by the supplier.

In [2]:
from gurobipy import gurobi
import glpk

print "cglpk:", cobra.solvers.cglpk.__glpk_version__
print "glpk:", ".".join(str(i) for i in glpk.env.version)
print "gurobi:", ".".join(str(i) for i in gurobi.version())
print "cplex:", cobra.solvers.cplex_solver.Cplex().get_version()
cglpk: 4.55
glpk: 4.45
gurobi: 5.6.3
cplex: 12.4.0.0

Getting the model

This file encoding iAF1260 was downloaded from bigg

In [3]:
original = cobra.io.read_legacy_sbml(expanduser("~/Downloads/iAF1260.xml"))
original.id = original.description = "iAF1260"

Solving rationally with E-solver

First we solve the problem using each of the floating point solvers

In [4]:
solver_results = {solver: original.optimize(solver=solver).f for solver in cobra.solvers.solver_dict}
solver_results = pandas.Series(solver_results)
solver_results
Out[4]:
cglpk     0.736701
cplex     0.736701
glpk      0.736701
gurobi    0.736701
dtype: float64

The MATLAB COBRA toolbox with GLPK also has an identical solution.

In [5]:
cobra.io.save_matlab_model(original, "iAF1260.mat")
In [6]:
%load_ext pymatbridge
Starting MATLAB on ZMQ socket ipc:///tmp/pymatbridge
Send 'exit' command to kill the server
.MATLAB started and connected!
In [7]:
%%matlab --silent -o mat_sol
addpath('~/cobratoolbox')
initCobraToolbox
load iAF1260.mat
result = optimizeCbModel(iAF1260)
mat_sol = result.f
In [8]:
solver_results["COBRA toolbox"] = mat_sol
mat_sol
Out[8]:
0.7367009387595181

Now we try it again with E-solver

In [9]:
lp = cobra.solvers.glpk_solver.create_problem(original)
lp.write(cpxlp="iAF1260.lp")
!esolver -O iAF1260 iAF1260.lp
Using EGlib (SVN-version 2.6.18:412, built Mar 29 2012-09:24:26)
Using QSopt_ex (SVN-version 2.5.10:518, built Mar 29 2012-09:29:26)
Host: clostridium, Linux 3.2.0-69-generic x86_64
Current process id: 7764
Using EG-GMP mempool
Reading problem from iAF1260.lp
Cur rtime limit -1, trying to set to 2.14748e+09
New rtime limit 2147483647 (2.15e+09)
Cur data limit -1,-1 (soft,hard)
New data limit 4294967295,-1 (soft,hard)
Cur address space limit -1,-1 (soft,hard)
New address space limit 4294967295,-1 (soft,hard)
New core dump space limit 0,-1 (soft,hard)
Data Warning: Setting problem name to "unnamed".
mpq_ILLlp_add_logicals ...
Time for SOLVER_READ_MPQ: 0.02 seconds.
================================================================================
	Trying double precision
================================================================================
starting dbl_ILLsimplex on scaled_lp...
Problem has 1668 rows and 4050 cols and 10899 nonzeros
starting primal phase I, nosolve 1
(0): primal infeas = 861351480.8485694
(100): primal infeas = 34111034.5800002
(200): primal infeas = 5287009.7090741
(300): primal infeas = 2592998.2718045
(400): primal infeas = 892639.6506070
(500): primal infeas = 451457.5295782
(600): primal infeas = 29546.6237273
starting primal phase II, nosolve 66
(700): primal objval = -0.0000020
(800): primal objval = -0.0000299
(900): primal objval = -0.0008766
(1000): primal objval = -0.7367155
completed dbl_ILLsimplex
scaled_lp: time = 0.180, pI = 611, pII = 391, dI = 0, dII = 0, opt = -0.736701
starting dbl_ILLsimplex on dbl_problem...
Problem has 1668 rows and 4050 cols and 10899 nonzeros
completed dbl_ILLsimplex
dbl_problem: time = 0.004, pI = 0, pII = 0, dI = 0, dII = 0, opt = -0.736701
LP Value: 0.736701, status 1
solution is infeasible for constraint zn2_c, violation -9.59801e-20, in QSexact_optimal_test (src/exact.c:553)
Performing Rational Basic Solve on unnamed, RAT_optimal, check done in 0.076005 seconds, PS F 0, DS F 0, in QSexact_basis_status (src/exact.c:1087)
Retesting solution, in QSexact_solver (src/exact.c:1519)
Problem solved to optimality, LP value 0.736701, in QSexact_optimal_test (src/exact.c:746)
================================================================================
	Problem Solved Exactly
================================================================================
Time for SOLVER: 0.28 seconds.
Disabling EG-GMP mempool

The problem solves exactly, with an identical solution to the ones from the floating point solvers with a precision of $1\times10^{-10}$.

In [10]:
!gzip -d -f iAF1260.sol.gz
exact_result_str = !head iAF1260.sol | grep Value
exact_result = exact_result_str[0].split("=")[1].strip()
exact_result
Out[10]:
'146888000/199386199'
In [11]:
pandas.DataFrame.from_dict({"values": solver_results, "error": solver_results - float(Fraction(exact_result))})
Out[11]:
error values
cglpk 2.087197e-11 0.736701
cplex -2.066314e-11 0.736701
glpk 1.858336e-11 0.736701
gurobi -1.203304e-11 0.736701
COBRA toolbox -1.053628e-10 0.736701

Create an "integer" version of the model

In [12]:
m = original.copy()
m.id += "_int"

Stoichiometry

In [13]:
# make the biomass an integerized version
biomass = m.reactions.get_by_id("Ec_biomass_iAF1260_core_59p81M")
biomass._metabolites = {met: ceil(abs(stoic)) * sign(stoic) for met, stoic in biomass.metabolites.items()}
In [14]:
m_array = m.to_array_based_model()
S = m_array.S.todense()
m_index, r_index = abs(S - S.astype(int)).nonzero()
# I checked, these all have 0.5 O2 in the stoichiometry
for r in r_index.tolist()[0]:
    m.reactions[r] *= 2  # scaling a reaction by 2 is the same reaction, only now we have integer entries

We now cast $S$ as an integer matrix, and ensure it is equal to the floating point version.

In [15]:
m_array = m.to_array_based_model()
S_float = m_array.S.todense()
S = S_float.astype(int)
abs(S - S_float).max()
Out[15]:
0.0

Bounds

We increase the flux through the already open exchange reactions to compensate for increased biomass requirements and to make all the bounds integers as well. We also will round up ATPM, which fixes the flux to 8.39, a non-integer value.

In [16]:
for r in m.reactions.query("EX_"):
    if r.lower_bound < 0:
        r.lower_bound = -999999.
m.reactions.ATPM.lower_bound = ceil(m.reactions.ATPM.lower_bound)
m.reactions.ATPM.upper_bound = ceil(m.reactions.ATPM.upper_bound)

Integer Computation

Convert the problem to MILP

In [17]:
for r in m.reactions:
    r.variable_kind = "integer"

Using integer math, we can prove that our solution fulfills $ S \cdot v = 0$ exactly

In [18]:
sol = m.optimize(solver="gurobi")  # gurobi has good performance for MILP's
v = array(sol.x).reshape(len(sol.x), 1).round(3).astype(int)  # cast v to int type
abs(S * v).max()
Out[18]:
0

We still satisfy $ \text{upper_bounds} \ge v \ge \text{lower_bounds} $

In [19]:
m_array = m.to_array_based_model()
v_array = v[:, 0]
bool(all((m_array.upper_bounds >= v_array) & (v_array >= m_array.lower_bounds)))
Out[19]:
True

This is not because our solution is simply 0

In [20]:
sol.f
Out[20]:
2256.0

Additionally, iAF1260 can produce every biomass component with an integer solution (exactly 0 error).

In [21]:
m2 = m.copy()
for metabolite in biomass.metabolites:
    r = cobra.Reaction("DM_BIOMASS_" + metabolite.id)
    r.add_metabolites({m2.metabolites.get_by_id(metabolite.id): -1})
    r.variable_kind = "integer"
    m2.add_reaction(r)
m_array = m2.to_array_based_model()
S_float = m_array.S.todense()
S = S_float.astype(int)
abs(S - S_float).max()  # once again, this S consists only of integer values.
Out[21]:
0.0
In [22]:
table = {}
for r in m2.reactions.query("DM_BIOMASS"):
    m2.change_objective(r)
    sol = m2.optimize(solver="gurobi")
    v = array(sol.x).reshape(len(sol.x), 1).round(3).astype(int)
    table[r.id] = {"error": abs(S * v).max(), "production": sol.f}
In [23]:
pandas.DataFrame.from_dict(table).T.min()
Out[23]:
error            0
production    1000
dtype: float64