[1]:

import pandas as pd
import cobra
from BFAIR.mfa.INCA import INCA_reimport
from BFAIR.mfa.sampling import (
    model_rxn_overlap,
    rxn_coverage,
    split_lumped_rxns,
    split_lumped_reverse_rxns,
    find_reverse_rxns,
    combine_split_rxns,
    cobra_add_split_rxns,
    find_biomass_reaction,
    replace_biomass_rxn_name,
)

Determination of memory status is not supported on this
 platform, measuring for memoryleaks will never fail

INCA re-import¶

First, let’s reimport the data using our BFAIR INCA_reimport tools

[2]:

filename = 'data/MFA_modelInputsData/TestFile.mat'
simulation_info = pd.read_csv('data/MFA_modelInputsData/Re-import/experimentalMS_data_I.csv')
simulation_id = 'WTEColi_113C80_U13C20_01'

Here we re-import the INCA output

[3]:

reimport_data = INCA_reimport()
(fittedData,
 fittedFluxes,
 fittedFragments,
 fittedMeasuredFluxes,
 fittedMeasuredFragments,
 fittedMeasuredFluxResiduals,
 fittedMeasuredFragmentResiduals,
 simulationParameters) = reimport_data.reimport(
    filename,
    simulation_info,
    simulation_id
)

[4]:

fittedFluxes

[4]:

	simulation_id	simulation_dateAndTime	rxn_id	flux	flux_stdev	flux_lb	flux_ub	flux_units	fit_alf	fit_chi2s	fit_cor	fit_cov	free	used_	comment_
0	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	26dap_DASH_MSYN	2.295040e-01	0.002608	0.224392	0.234616	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
1	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ACONTa_ACONTb	2.074886e+00	16996.864976	1.185984	1000.000000	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
2	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ACONTa_ACONTb_reverse	8.690514e-07	15432.592032	0.000000	28.927600	mmolgDCW-1hr-1	0.05	None	None	None	True	True	None
3	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	AKGDH	1.423617e-01	7673.615592	0.000000	1.919800	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
4	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ALATA_L	3.435520e-01	0.003904	0.335900	0.351204	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
92	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	TPI	2.175603e+00	25620.656341	0.000000	1000.000000	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
93	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	TPI_reverse	8.689299e-07	24739.593649	0.000000	1000.000000	mmolgDCW-1hr-1	0.05	None	None	None	True	True	None
94	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	TrpSYN	3.801600e-02	0.000432	0.037169	0.038863	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
95	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	TyrSYN	9.222400e-02	0.001048	0.090170	0.094278	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
96	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ValSYN	2.830080e-01	0.003216	0.276705	0.289311	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None

97 rows × 15 columns

Here we import the model

[5]:

model = cobra.io.load_json_model('data/FIA_MS_example/database_files/iJO1366.json')

Academic license - for non-commercial use only - expires 2021-07-30
Using license file /Users/matmat/gurobi.lic

[6]:

rxn_coverage(fittedFluxes, model)

61.0 %

61 % of all the reimported reactions have non-overlapping names with the model we want to use for simulations! That a bit much. Let’s do something about that

First, let’s find the name of the biomass reaction in the model and replace the one in our data with it

[7]:

find_biomass_reaction(model)

[7]:

['BIOMASS_Ec_iJO1366_WT_53p95M', 'BIOMASS_Ec_iJO1366_core_53p95M']

This model has two biomass reactions, a full and a reduced core biomass reaction. In the summary we can see that the core biomass reaction is the assigned objective function, so we will reassign this name to our biomass function

[8]:

model.summary()

[8]:

Objective

1.0 BIOMASS_Ec_iJO1366_core_53p95M = 0.9823718127269793

Uptake

Metabolite	Reaction	Flux	C-Number	C-Flux
ca2_e	EX_ca2_e	0.005113	0	0.00%
cl_e	EX_cl_e	0.005113	0	0.00%
cobalt2_e	EX_cobalt2_e	2.456E-05	0	0.00%
cu2_e	EX_cu2_e	0.0006965	0	0.00%
fe2_e	EX_fe2_e	0.01578	0	0.00%
glc__D_e	EX_glc__D_e	10	6	100.00%
k_e	EX_k_e	0.1918	0	0.00%
mg2_e	EX_mg2_e	0.008522	0	0.00%
mn2_e	EX_mn2_e	0.0006788	0	0.00%
mobd_e	EX_mobd_e	0.0001267	0	0.00%
nh4_e	EX_nh4_e	10.61	0	0.00%
ni2_e	EX_ni2_e	0.0003173	0	0.00%
o2_e	EX_o2_e	17.58	0	0.00%
pi_e	EX_pi_e	0.9476	0	0.00%
so4_e	EX_so4_e	0.2478	0	0.00%
zn2_e	EX_zn2_e	0.000335	0	0.00%

Secretion

Metabolite	Reaction	Flux	C-Number	C-Flux
4crsol_c	DM_4crsol_c	-0.0002191	7	0.01%
5drib_c	DM_5drib_c	-0.000221	5	0.01%
amob_c	DM_amob_c	-1.965E-06	15	0.00%
mththf_c	DM_mththf_c	-0.0004401	5	0.01%
co2_e	EX_co2_e	-19.68	1	99.98%
h2o_e	EX_h2o_e	-45.62	0	0.00%
h_e	EX_h_e	-9.026	0	0.00%
meoh_e	EX_meoh_e	-1.965E-06	1	0.00%

[9]:

fittedFluxes = replace_biomass_rxn_name(fittedFluxes, biomass_string='Biomass', biomass_rxn_name='BIOMASS_Ec_iJO1366_core_53p95M')

Next step, adjust the names of our MFA data so that they can be assigned to our model’s reactions

[10]:

model_rxn_overlap(fittedFluxes, model)

[10]:

0                       26dap_DASH_MSYN
1                         ACONTa_ACONTb
2                 ACONTa_ACONTb_reverse
5                                ArgSYN
14               EX_ac_LPAREN_e_RPAREN_
15              EX_co2_LPAREN_e_RPAREN_
16    EX_co2_LPAREN_e_RPAREN__unlabeled
17              EX_glc_LPAREN_e_RPAREN_
18              EX_nh4_LPAREN_e_RPAREN_
19               EX_o2_LPAREN_e_RPAREN_
20              EX_so4_LPAREN_e_RPAREN_
21           FADR_NADH_CYTBD_HYD_ATPS4r
23                          FBA_reverse
25                          FUM_reverse
26                          G6PDH2r_PGL
27                             GAPD_PGK
28                     GAPD_PGK_reverse
30                       GHMT2r_reverse
34                               GluSYN
37                               HisSYN
39                       ICDHyr_reverse
41                               IleSYN
42                               LeuSYN
45                          MDH_reverse
48                               MetSYN
49                             MlthfSYN
50                     MlthfSYN_reverse
53                NADH_CYTBD_HYD_ATPS4r
54                       NADTRHD_THD2pp
55               NADTRHD_THD2pp_reverse
59                          PGI_reverse
61                          PGM_reverse
62                               PheSYN
65                               ProSYN
67                        PTAr_ACKr_ACS
68                PTAr_ACKr_ACS_reverse
71                          RPE_reverse
73                          RPI_reverse
74                           SERAT_CYSS
75                               SerSYN
76                              SUCCOAS
77                      SUCCOAS_reverse
79                        SUCDi_reverse
81                         TALA_reverse
82                           THRD_GLYAT
83                               ThrSYN
84                                TKT1a
85                        TKT1a_reverse
86                                TKT1b
87                        TKT1b_reverse
88                                TKT2a
89                        TKT2a_reverse
90                                TKT2b
91                        TKT2b_reverse
93                          TPI_reverse
94                               TrpSYN
95                               TyrSYN
96                               ValSYN
Name: rxn_id, dtype: object

Observations: 1) some reaction names include more than one metabolite 2) many unassigned amino acids end with SYN and 3) some exchange reactions include LPAREN_ and RPAREN_. Let’s try to do something about that 4) probably all _reverse reactions could not be assigned

Split the lumped reactions and give all of them the same bounds

So let’s pick the ones we want. Let’s save the reverse reactions for a separate step

[11]:

lumped_ids = [1, 21, 26, 27, 53, 54, 67, 74, 82]
mask = []
overlap = model_rxn_overlap(fittedFluxes, model)
for i in overlap.iteritems():
    if i[0] in lumped_ids:
        mask.append(True)
    else:
        mask.append(False)

[12]:

lumped_rxns = model_rxn_overlap(fittedFluxes, model)[mask]
fittedFluxes = split_lumped_rxns(lumped_rxns, fittedFluxes)

[13]:

lumped_reverse_ids = [2, 28, 55, 68]
mask_reverse = []
for i in model_rxn_overlap(fittedFluxes, model).iteritems():
    if i[0] in lumped_reverse_ids:
        mask_reverse.append(True)
    else:
        mask_reverse.append(False)

[14]:

lumped_reverse_rxns = model_rxn_overlap(fittedFluxes, model)[mask_reverse]
fittedFluxes = split_lumped_reverse_rxns(lumped_reverse_rxns, fittedFluxes)

ACONTa_ACONTb_reverse
GAPD_PGK_reverse
NADTRHD_THD2pp_reverse
PTAr_ACKr_ACS_reverse

[15]:

model_rxn_overlap(fittedFluxes, model)

[15]:

0              26dap_DASH_MSYN
2               ACONTa_reverse
5                       ArgSYN
14      EX_ac_LPAREN_e_RPAREN_
15     EX_co2_LPAREN_e_RPAREN_
                ...
112             ACONTb_reverse
113                PGK_reverse
114             THD2pp_reverse
115               ACKr_reverse
116                ACS_reverse
Name: rxn_id, Length: 63, dtype: object

[16]:

rxn_coverage(fittedFluxes, model)

54.0 %

SYN, these reactions might be lumped; let’s investigate!

[17]:

for rxn in model.reactions:
    if 'ARG' in rxn.id:
        print(rxn.id)

ARGAGMt7pp
ARGDC
ARGDCpp
ARGORNt7pp
ARGSL
ARGSS
ARGTRS
ARGabcpp
ARGt3pp
ARGtex

Yeah I guess so… This sucks, not sure of we can do anything about that

Let’s remove the extra bits in the exchange reaction strings

[18]:

for i, row in fittedFluxes.iterrows():
    if 'LPAREN_' in row['rxn_id']:
        fittedFluxes.at[i, 'rxn_id'] = row['rxn_id'].replace('LPAREN_', '').replace('_RPAREN_', '')

[19]:

rxn_coverage(fittedFluxes, model)

50.0 %

Reverse. Let’s check if the forward and reverse fluxes are actually separate. If not, then the two of them will define the bounds together. If they are, then we should add new reverse reactions to the model.

[20]:

find_reverse_rxns(fittedFluxes)

[20]:

	forward	reverse
2	ACONTa	ACONTa_reverse
23	FBA	FBA_reverse
25	FUM	FUM_reverse
28	GAPD	GAPD_reverse
30	GHMT2r	GHMT2r_reverse
39	ICDHyr	ICDHyr_reverse
45	MDH	MDH_reverse
50	MlthfSYN	MlthfSYN_reverse
55	NADTRHD	NADTRHD_reverse
59	PGI	PGI_reverse
61	PGM	PGM_reverse
68	PTAr	PTAr_reverse
71	RPE	RPE_reverse
73	RPI	RPI_reverse
77	SUCCOAS	SUCCOAS_reverse
79	SUCDi	SUCDi_reverse
81	TALA	TALA_reverse
85	TKT1a	TKT1a_reverse
87	TKT1b	TKT1b_reverse
89	TKT2a	TKT2a_reverse
91	TKT2b	TKT2b_reverse
93	TPI	TPI_reverse
112	ACONTb	ACONTb_reverse
113	PGK	PGK_reverse
114	THD2pp	THD2pp_reverse
115	ACKr	ACKr_reverse
116	ACS	ACS_reverse

[21]:

fittedFluxes, rxns_to_split = combine_split_rxns(fittedFluxes)

These reactions need to be split into two: ACONTa
These reactions need to be split into two: FUM
These reactions need to be split into two: GAPD
These reactions need to be split into two: ICDHyr
These reactions need to be split into two: MlthfSYN
These reactions need to be split into two: PGM
These reactions need to be split into two: PTAr
These reactions need to be split into two: ACONTb
These reactions need to be split into two: PGK
These reactions need to be split into two: ACKr
These reactions need to be split into two: ACS

[22]:

fittedFluxes

[22]:

	index	simulation_id	simulation_dateAndTime	rxn_id	flux	flux_stdev	flux_lb	flux_ub	flux_units	fit_alf	fit_chi2s	fit_cor	fit_cov	free	used_	comment_
0	0	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	26dap_DASH_MSYN	2.295040e-01	2.607999e-03	0.224392	0.234616	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
1	1	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ACONTa	2.074886e+00	1.699686e+04	1.185984	1000.000000	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
2	2	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ACONTa_reverse	8.690514e-07	1.543259e+04	0.000000	28.927600	mmolgDCW-1hr-1	0.05	None	None	None	True	True	None
3	3	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	AKGDH	1.423617e-01	7.673616e+03	0.000000	1.919800	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
4	4	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ALATA_L	3.435520e-01	3.904002e-03	0.335900	0.351204	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
96	111	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	GLYAT	1.000000e-07	6.096786e-12	0.000000	0.035201	mmolgDCW-1hr-1	0.05	None	None	None	False	True	None
97	112	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ACONTb_reverse	8.690514e-07	1.543259e+04	0.000000	28.927600	mmolgDCW-1hr-1	0.05	None	None	None	True	True	None
98	113	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	PGK_reverse	8.689295e-07	9.920303e+03	0.000000	24.418874	mmolgDCW-1hr-1	0.05	None	None	None	True	True	None
99	115	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ACKr_reverse	8.695792e-07	2.139235e+04	0.000000	28.262368	mmolgDCW-1hr-1	0.05	None	None	None	True	True	None
100	116	WTEColi_113C80_U13C20_01	2021-04-16 18:29:37	ACS_reverse	8.695792e-07	2.139235e+04	0.000000	28.262368	mmolgDCW-1hr-1	0.05	None	None	None	True	True	None

101 rows × 16 columns

The reactions that are acutally separate (i.e. non-overlapping bounds, exchange fluxes) are a problem. COBRA has some ways to account for that but they seem to be quite involved. An easier way to deal with that is that just add the reverse reaction as a separate reaction to the model; it’s the same reaction, just with the inverse direction. The following method is “destructive”, i.e. it will alter the model. Be aware of that.

[28]:

rxns_to_split

[28]:

['ACONTa',
 'FUM',
 'GAPD',
 'ICDHyr',
 'MlthfSYN',
 'PGM',
 'PTAr',
 'ACONTb',
 'PGK',
 'ACKr',
 'ACS']

[23]:

cobra_add_split_rxns(rxns_to_split, model)

- Added ACONTa to model
- Added FUM to model
- Added GAPD to model
- Added ICDHyr to model
# Could not add MlthfSYN to model
- Added PGM to model
- Added PTAr to model
- Added ACONTb to model
- Added PGK to model
- Added ACKr to model
- Added ACS to model

[24]:

model_rxn_overlap(fittedFluxes, model)

[24]:

0        26dap_DASH_MSYN
5                 ArgSYN
16    EX_co2_e_unlabeled
17              EX_glc_e
21                  FADR
32                GluSYN
35                HisSYN
39                IleSYN
40                LeuSYN
45                MetSYN
46              MlthfSYN
47      MlthfSYN_reverse
50                  NADH
57                PheSYN
60                ProSYN
68                SerSYN
69               SUCCOAS
73                ThrSYN
74                 TKT1a
75                 TKT1b
76                 TKT2a
77                 TKT2b
79                TrpSYN
80                TyrSYN
81                ValSYN
83                  NADH
84                 CYTBD
85                   HYD
86                ATPS4r
89                 CYTBD
90                   HYD
91                ATPS4r
Name: rxn_id, dtype: object

[25]:

rxn_coverage(fittedFluxes, model)

32.0 %

Ok… well, guess you can’t please everyone. You could of course find a way to add the remaining reactions manually or investigate further how to distribute their values to other reactions but this is as far as we will go here.

For the next steps, please check the MFA_feasibility_and_sampling notebook

[26]:

fittedFluxes.to_pickle("data/MFA_sampling/preprocessed_fittedFluxes.obj")

[27]:

cobra.io.save_json_model(model, "data/MFA_sampling/preprocessed_model.json")

[ ]: