[1]:
import pandas as pd
import cobra
from BFAIR.mfa.INCA import INCA_reimport
from BFAIR.mfa.sampling import (
    model_rxn_overlap,
    rxn_coverage,
    split_lumped_rxns,
    split_lumped_reverse_rxns,
    find_reverse_rxns,
    combine_split_rxns,
    cobra_add_split_rxns,
    find_biomass_reaction,
    replace_biomass_rxn_name,
)
Determination of memory status is not supported on this
 platform, measuring for memoryleaks will never fail

INCA re-import

First, let’s reimport the data using our BFAIR INCA_reimport tools

[2]:
filename = 'data/MFA_modelInputsData/TestFile.mat'
simulation_info = pd.read_csv('data/MFA_modelInputsData/Re-import/experimentalMS_data_I.csv')
simulation_id = 'WTEColi_113C80_U13C20_01'

Here we re-import the INCA output

[3]:
reimport_data = INCA_reimport()
(fittedData,
 fittedFluxes,
 fittedFragments,
 fittedMeasuredFluxes,
 fittedMeasuredFragments,
 fittedMeasuredFluxResiduals,
 fittedMeasuredFragmentResiduals,
 simulationParameters) = reimport_data.reimport(
    filename,
    simulation_info,
    simulation_id
)
[4]:
fittedFluxes
[4]:
simulation_id simulation_dateAndTime rxn_id flux flux_stdev flux_lb flux_ub flux_units fit_alf fit_chi2s fit_cor fit_cov free used_ comment_
0 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 26dap_DASH_MSYN 2.295040e-01 0.002608 0.224392 0.234616 mmol*gDCW-1*hr-1 0.05 None None None False True None
1 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ACONTa_ACONTb 2.074886e+00 16996.864976 1.185984 1000.000000 mmol*gDCW-1*hr-1 0.05 None None None False True None
2 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ACONTa_ACONTb_reverse 8.690514e-07 15432.592032 0.000000 28.927600 mmol*gDCW-1*hr-1 0.05 None None None True True None
3 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 AKGDH 1.423617e-01 7673.615592 0.000000 1.919800 mmol*gDCW-1*hr-1 0.05 None None None False True None
4 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ALATA_L 3.435520e-01 0.003904 0.335900 0.351204 mmol*gDCW-1*hr-1 0.05 None None None False True None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
92 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 TPI 2.175603e+00 25620.656341 0.000000 1000.000000 mmol*gDCW-1*hr-1 0.05 None None None False True None
93 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 TPI_reverse 8.689299e-07 24739.593649 0.000000 1000.000000 mmol*gDCW-1*hr-1 0.05 None None None True True None
94 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 TrpSYN 3.801600e-02 0.000432 0.037169 0.038863 mmol*gDCW-1*hr-1 0.05 None None None False True None
95 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 TyrSYN 9.222400e-02 0.001048 0.090170 0.094278 mmol*gDCW-1*hr-1 0.05 None None None False True None
96 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ValSYN 2.830080e-01 0.003216 0.276705 0.289311 mmol*gDCW-1*hr-1 0.05 None None None False True None

97 rows × 15 columns

Here we import the model

[5]:
model = cobra.io.load_json_model('data/FIA_MS_example/database_files/iJO1366.json')
Academic license - for non-commercial use only - expires 2021-07-30
Using license file /Users/matmat/gurobi.lic
[6]:
rxn_coverage(fittedFluxes, model)
61.0 %

61 % of all the reimported reactions have non-overlapping names with the model we want to use for simulations! That a bit much. Let’s do something about that

First, let’s find the name of the biomass reaction in the model and replace the one in our data with it

[7]:
find_biomass_reaction(model)
[7]:
['BIOMASS_Ec_iJO1366_WT_53p95M', 'BIOMASS_Ec_iJO1366_core_53p95M']

This model has two biomass reactions, a full and a reduced core biomass reaction. In the summary we can see that the core biomass reaction is the assigned objective function, so we will reassign this name to our biomass function

[8]:
model.summary()
[8]:

Objective

1.0 BIOMASS_Ec_iJO1366_core_53p95M = 0.9823718127269793

Uptake

Metabolite Reaction Flux C-Number C-Flux
ca2_e EX_ca2_e 0.005113 0 0.00%
cl_e EX_cl_e 0.005113 0 0.00%
cobalt2_e EX_cobalt2_e 2.456E-05 0 0.00%
cu2_e EX_cu2_e 0.0006965 0 0.00%
fe2_e EX_fe2_e 0.01578 0 0.00%
glc__D_e EX_glc__D_e 10 6 100.00%
k_e EX_k_e 0.1918 0 0.00%
mg2_e EX_mg2_e 0.008522 0 0.00%
mn2_e EX_mn2_e 0.0006788 0 0.00%
mobd_e EX_mobd_e 0.0001267 0 0.00%
nh4_e EX_nh4_e 10.61 0 0.00%
ni2_e EX_ni2_e 0.0003173 0 0.00%
o2_e EX_o2_e 17.58 0 0.00%
pi_e EX_pi_e 0.9476 0 0.00%
so4_e EX_so4_e 0.2478 0 0.00%
zn2_e EX_zn2_e 0.000335 0 0.00%

Secretion

Metabolite Reaction Flux C-Number C-Flux
4crsol_c DM_4crsol_c -0.0002191 7 0.01%
5drib_c DM_5drib_c -0.000221 5 0.01%
amob_c DM_amob_c -1.965E-06 15 0.00%
mththf_c DM_mththf_c -0.0004401 5 0.01%
co2_e EX_co2_e -19.68 1 99.98%
h2o_e EX_h2o_e -45.62 0 0.00%
h_e EX_h_e -9.026 0 0.00%
meoh_e EX_meoh_e -1.965E-06 1 0.00%
[9]:
fittedFluxes = replace_biomass_rxn_name(fittedFluxes, biomass_string='Biomass', biomass_rxn_name='BIOMASS_Ec_iJO1366_core_53p95M')

Next step, adjust the names of our MFA data so that they can be assigned to our model’s reactions

[10]:
model_rxn_overlap(fittedFluxes, model)
[10]:
0                       26dap_DASH_MSYN
1                         ACONTa_ACONTb
2                 ACONTa_ACONTb_reverse
5                                ArgSYN
14               EX_ac_LPAREN_e_RPAREN_
15              EX_co2_LPAREN_e_RPAREN_
16    EX_co2_LPAREN_e_RPAREN__unlabeled
17              EX_glc_LPAREN_e_RPAREN_
18              EX_nh4_LPAREN_e_RPAREN_
19               EX_o2_LPAREN_e_RPAREN_
20              EX_so4_LPAREN_e_RPAREN_
21           FADR_NADH_CYTBD_HYD_ATPS4r
23                          FBA_reverse
25                          FUM_reverse
26                          G6PDH2r_PGL
27                             GAPD_PGK
28                     GAPD_PGK_reverse
30                       GHMT2r_reverse
34                               GluSYN
37                               HisSYN
39                       ICDHyr_reverse
41                               IleSYN
42                               LeuSYN
45                          MDH_reverse
48                               MetSYN
49                             MlthfSYN
50                     MlthfSYN_reverse
53                NADH_CYTBD_HYD_ATPS4r
54                       NADTRHD_THD2pp
55               NADTRHD_THD2pp_reverse
59                          PGI_reverse
61                          PGM_reverse
62                               PheSYN
65                               ProSYN
67                        PTAr_ACKr_ACS
68                PTAr_ACKr_ACS_reverse
71                          RPE_reverse
73                          RPI_reverse
74                           SERAT_CYSS
75                               SerSYN
76                              SUCCOAS
77                      SUCCOAS_reverse
79                        SUCDi_reverse
81                         TALA_reverse
82                           THRD_GLYAT
83                               ThrSYN
84                                TKT1a
85                        TKT1a_reverse
86                                TKT1b
87                        TKT1b_reverse
88                                TKT2a
89                        TKT2a_reverse
90                                TKT2b
91                        TKT2b_reverse
93                          TPI_reverse
94                               TrpSYN
95                               TyrSYN
96                               ValSYN
Name: rxn_id, dtype: object

Observations: 1) some reaction names include more than one metabolite 2) many unassigned amino acids end with SYN and 3) some exchange reactions include LPAREN_ and RPAREN_. Let’s try to do something about that 4) probably all _reverse reactions could not be assigned

  1. Split the lumped reactions and give all of them the same bounds

So let’s pick the ones we want. Let’s save the reverse reactions for a separate step

[11]:
lumped_ids = [1, 21, 26, 27, 53, 54, 67, 74, 82]
mask = []
overlap = model_rxn_overlap(fittedFluxes, model)
for i in overlap.iteritems():
    if i[0] in lumped_ids:
        mask.append(True)
    else:
        mask.append(False)
[12]:
lumped_rxns = model_rxn_overlap(fittedFluxes, model)[mask]
fittedFluxes = split_lumped_rxns(lumped_rxns, fittedFluxes)
[13]:
lumped_reverse_ids = [2, 28, 55, 68]
mask_reverse = []
for i in model_rxn_overlap(fittedFluxes, model).iteritems():
    if i[0] in lumped_reverse_ids:
        mask_reverse.append(True)
    else:
        mask_reverse.append(False)
[14]:
lumped_reverse_rxns = model_rxn_overlap(fittedFluxes, model)[mask_reverse]
fittedFluxes = split_lumped_reverse_rxns(lumped_reverse_rxns, fittedFluxes)
ACONTa_ACONTb_reverse
GAPD_PGK_reverse
NADTRHD_THD2pp_reverse
PTAr_ACKr_ACS_reverse
[15]:
model_rxn_overlap(fittedFluxes, model)
[15]:
0              26dap_DASH_MSYN
2               ACONTa_reverse
5                       ArgSYN
14      EX_ac_LPAREN_e_RPAREN_
15     EX_co2_LPAREN_e_RPAREN_
                ...
112             ACONTb_reverse
113                PGK_reverse
114             THD2pp_reverse
115               ACKr_reverse
116                ACS_reverse
Name: rxn_id, Length: 63, dtype: object
[16]:
rxn_coverage(fittedFluxes, model)
54.0 %
  1. SYN, these reactions might be lumped; let’s investigate!

[17]:
for rxn in model.reactions:
    if 'ARG' in rxn.id:
        print(rxn.id)
ARGAGMt7pp
ARGDC
ARGDCpp
ARGORNt7pp
ARGSL
ARGSS
ARGTRS
ARGabcpp
ARGt3pp
ARGtex

Yeah I guess so… This sucks, not sure of we can do anything about that

  1. Let’s remove the extra bits in the exchange reaction strings

[18]:
for i, row in fittedFluxes.iterrows():
    if 'LPAREN_' in row['rxn_id']:
        fittedFluxes.at[i, 'rxn_id'] = row['rxn_id'].replace('LPAREN_', '').replace('_RPAREN_', '')
[19]:
rxn_coverage(fittedFluxes, model)
50.0 %
  1. Reverse. Let’s check if the forward and reverse fluxes are actually separate. If not, then the two of them will define the bounds together. If they are, then we should add new reverse reactions to the model.

[20]:
find_reverse_rxns(fittedFluxes)
[20]:
forward reverse
2 ACONTa ACONTa_reverse
23 FBA FBA_reverse
25 FUM FUM_reverse
28 GAPD GAPD_reverse
30 GHMT2r GHMT2r_reverse
39 ICDHyr ICDHyr_reverse
45 MDH MDH_reverse
50 MlthfSYN MlthfSYN_reverse
55 NADTRHD NADTRHD_reverse
59 PGI PGI_reverse
61 PGM PGM_reverse
68 PTAr PTAr_reverse
71 RPE RPE_reverse
73 RPI RPI_reverse
77 SUCCOAS SUCCOAS_reverse
79 SUCDi SUCDi_reverse
81 TALA TALA_reverse
85 TKT1a TKT1a_reverse
87 TKT1b TKT1b_reverse
89 TKT2a TKT2a_reverse
91 TKT2b TKT2b_reverse
93 TPI TPI_reverse
112 ACONTb ACONTb_reverse
113 PGK PGK_reverse
114 THD2pp THD2pp_reverse
115 ACKr ACKr_reverse
116 ACS ACS_reverse
[21]:
fittedFluxes, rxns_to_split = combine_split_rxns(fittedFluxes)
These reactions need to be split into two: ACONTa
These reactions need to be split into two: FUM
These reactions need to be split into two: GAPD
These reactions need to be split into two: ICDHyr
These reactions need to be split into two: MlthfSYN
These reactions need to be split into two: PGM
These reactions need to be split into two: PTAr
These reactions need to be split into two: ACONTb
These reactions need to be split into two: PGK
These reactions need to be split into two: ACKr
These reactions need to be split into two: ACS
[22]:
fittedFluxes
[22]:
index simulation_id simulation_dateAndTime rxn_id flux flux_stdev flux_lb flux_ub flux_units fit_alf fit_chi2s fit_cor fit_cov free used_ comment_
0 0 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 26dap_DASH_MSYN 2.295040e-01 2.607999e-03 0.224392 0.234616 mmol*gDCW-1*hr-1 0.05 None None None False True None
1 1 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ACONTa 2.074886e+00 1.699686e+04 1.185984 1000.000000 mmol*gDCW-1*hr-1 0.05 None None None False True None
2 2 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ACONTa_reverse 8.690514e-07 1.543259e+04 0.000000 28.927600 mmol*gDCW-1*hr-1 0.05 None None None True True None
3 3 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 AKGDH 1.423617e-01 7.673616e+03 0.000000 1.919800 mmol*gDCW-1*hr-1 0.05 None None None False True None
4 4 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ALATA_L 3.435520e-01 3.904002e-03 0.335900 0.351204 mmol*gDCW-1*hr-1 0.05 None None None False True None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
96 111 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 GLYAT 1.000000e-07 6.096786e-12 0.000000 0.035201 mmol*gDCW-1*hr-1 0.05 None None None False True None
97 112 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ACONTb_reverse 8.690514e-07 1.543259e+04 0.000000 28.927600 mmol*gDCW-1*hr-1 0.05 None None None True True None
98 113 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 PGK_reverse 8.689295e-07 9.920303e+03 0.000000 24.418874 mmol*gDCW-1*hr-1 0.05 None None None True True None
99 115 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ACKr_reverse 8.695792e-07 2.139235e+04 0.000000 28.262368 mmol*gDCW-1*hr-1 0.05 None None None True True None
100 116 WTEColi_113C80_U13C20_01 2021-04-16 18:29:37 ACS_reverse 8.695792e-07 2.139235e+04 0.000000 28.262368 mmol*gDCW-1*hr-1 0.05 None None None True True None

101 rows × 16 columns

The reactions that are acutally separate (i.e. non-overlapping bounds, exchange fluxes) are a problem. COBRA has some ways to account for that but they seem to be quite involved. An easier way to deal with that is that just add the reverse reaction as a separate reaction to the model; it’s the same reaction, just with the inverse direction. The following method is “destructive”, i.e. it will alter the model. Be aware of that.

[28]:
rxns_to_split
[28]:
['ACONTa',
 'FUM',
 'GAPD',
 'ICDHyr',
 'MlthfSYN',
 'PGM',
 'PTAr',
 'ACONTb',
 'PGK',
 'ACKr',
 'ACS']
[23]:
cobra_add_split_rxns(rxns_to_split, model)
- Added ACONTa to model
- Added FUM to model
- Added GAPD to model
- Added ICDHyr to model
# Could not add MlthfSYN to model
- Added PGM to model
- Added PTAr to model
- Added ACONTb to model
- Added PGK to model
- Added ACKr to model
- Added ACS to model
[24]:
model_rxn_overlap(fittedFluxes, model)
[24]:
0        26dap_DASH_MSYN
5                 ArgSYN
16    EX_co2_e_unlabeled
17              EX_glc_e
21                  FADR
32                GluSYN
35                HisSYN
39                IleSYN
40                LeuSYN
45                MetSYN
46              MlthfSYN
47      MlthfSYN_reverse
50                  NADH
57                PheSYN
60                ProSYN
68                SerSYN
69               SUCCOAS
73                ThrSYN
74                 TKT1a
75                 TKT1b
76                 TKT2a
77                 TKT2b
79                TrpSYN
80                TyrSYN
81                ValSYN
83                  NADH
84                 CYTBD
85                   HYD
86                ATPS4r
89                 CYTBD
90                   HYD
91                ATPS4r
Name: rxn_id, dtype: object
[25]:
rxn_coverage(fittedFluxes, model)
32.0 %

Ok… well, guess you can’t please everyone. You could of course find a way to add the remaining reactions manually or investigate further how to distribute their values to other reactions but this is as far as we will go here.

For the next steps, please check the MFA_feasibility_and_sampling notebook

[26]:
fittedFluxes.to_pickle("data/MFA_sampling/preprocessed_fittedFluxes.obj")
[27]:
cobra.io.save_json_model(model, "data/MFA_sampling/preprocessed_model.json")
[ ]: