[1]:
import pandas as pd
import cobra
from BFAIR.mfa.INCA import INCA_reimport
from BFAIR.mfa.sampling import (
model_rxn_overlap,
rxn_coverage,
split_lumped_rxns,
split_lumped_reverse_rxns,
find_reverse_rxns,
combine_split_rxns,
cobra_add_split_rxns,
find_biomass_reaction,
replace_biomass_rxn_name,
)
Determination of memory status is not supported on this
platform, measuring for memoryleaks will never fail
INCA re-import¶
First, let’s reimport the data using our BFAIR INCA_reimport
tools
[2]:
filename = 'data/MFA_modelInputsData/TestFile.mat'
simulation_info = pd.read_csv('data/MFA_modelInputsData/Re-import/experimentalMS_data_I.csv')
simulation_id = 'WTEColi_113C80_U13C20_01'
Here we re-import the INCA output
[3]:
reimport_data = INCA_reimport()
(fittedData,
fittedFluxes,
fittedFragments,
fittedMeasuredFluxes,
fittedMeasuredFragments,
fittedMeasuredFluxResiduals,
fittedMeasuredFragmentResiduals,
simulationParameters) = reimport_data.reimport(
filename,
simulation_info,
simulation_id
)
[4]:
fittedFluxes
[4]:
simulation_id | simulation_dateAndTime | rxn_id | flux | flux_stdev | flux_lb | flux_ub | flux_units | fit_alf | fit_chi2s | fit_cor | fit_cov | free | used_ | comment_ | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | 26dap_DASH_MSYN | 2.295040e-01 | 0.002608 | 0.224392 | 0.234616 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
1 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ACONTa_ACONTb | 2.074886e+00 | 16996.864976 | 1.185984 | 1000.000000 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
2 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ACONTa_ACONTb_reverse | 8.690514e-07 | 15432.592032 | 0.000000 | 28.927600 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | True | True | None |
3 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | AKGDH | 1.423617e-01 | 7673.615592 | 0.000000 | 1.919800 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
4 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ALATA_L | 3.435520e-01 | 0.003904 | 0.335900 | 0.351204 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
92 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | TPI | 2.175603e+00 | 25620.656341 | 0.000000 | 1000.000000 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
93 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | TPI_reverse | 8.689299e-07 | 24739.593649 | 0.000000 | 1000.000000 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | True | True | None |
94 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | TrpSYN | 3.801600e-02 | 0.000432 | 0.037169 | 0.038863 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
95 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | TyrSYN | 9.222400e-02 | 0.001048 | 0.090170 | 0.094278 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
96 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ValSYN | 2.830080e-01 | 0.003216 | 0.276705 | 0.289311 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
97 rows × 15 columns
Here we import the model
[5]:
model = cobra.io.load_json_model('data/FIA_MS_example/database_files/iJO1366.json')
Academic license - for non-commercial use only - expires 2021-07-30
Using license file /Users/matmat/gurobi.lic
[6]:
rxn_coverage(fittedFluxes, model)
61.0 %
61 % of all the reimported reactions have non-overlapping names with the model we want to use for simulations! That a bit much. Let’s do something about that
First, let’s find the name of the biomass reaction in the model and replace the one in our data with it
[7]:
find_biomass_reaction(model)
[7]:
['BIOMASS_Ec_iJO1366_WT_53p95M', 'BIOMASS_Ec_iJO1366_core_53p95M']
This model has two biomass reactions, a full and a reduced core biomass reaction. In the summary we can see that the core biomass reaction is the assigned objective function, so we will reassign this name to our biomass function
[8]:
model.summary()
[8]:
Objective
1.0 BIOMASS_Ec_iJO1366_core_53p95M = 0.9823718127269793
Uptake
Metabolite | Reaction | Flux | C-Number | C-Flux |
---|---|---|---|---|
ca2_e | EX_ca2_e | 0.005113 | 0 | 0.00% |
cl_e | EX_cl_e | 0.005113 | 0 | 0.00% |
cobalt2_e | EX_cobalt2_e | 2.456E-05 | 0 | 0.00% |
cu2_e | EX_cu2_e | 0.0006965 | 0 | 0.00% |
fe2_e | EX_fe2_e | 0.01578 | 0 | 0.00% |
glc__D_e | EX_glc__D_e | 10 | 6 | 100.00% |
k_e | EX_k_e | 0.1918 | 0 | 0.00% |
mg2_e | EX_mg2_e | 0.008522 | 0 | 0.00% |
mn2_e | EX_mn2_e | 0.0006788 | 0 | 0.00% |
mobd_e | EX_mobd_e | 0.0001267 | 0 | 0.00% |
nh4_e | EX_nh4_e | 10.61 | 0 | 0.00% |
ni2_e | EX_ni2_e | 0.0003173 | 0 | 0.00% |
o2_e | EX_o2_e | 17.58 | 0 | 0.00% |
pi_e | EX_pi_e | 0.9476 | 0 | 0.00% |
so4_e | EX_so4_e | 0.2478 | 0 | 0.00% |
zn2_e | EX_zn2_e | 0.000335 | 0 | 0.00% |
Secretion
Metabolite | Reaction | Flux | C-Number | C-Flux |
---|---|---|---|---|
4crsol_c | DM_4crsol_c | -0.0002191 | 7 | 0.01% |
5drib_c | DM_5drib_c | -0.000221 | 5 | 0.01% |
amob_c | DM_amob_c | -1.965E-06 | 15 | 0.00% |
mththf_c | DM_mththf_c | -0.0004401 | 5 | 0.01% |
co2_e | EX_co2_e | -19.68 | 1 | 99.98% |
h2o_e | EX_h2o_e | -45.62 | 0 | 0.00% |
h_e | EX_h_e | -9.026 | 0 | 0.00% |
meoh_e | EX_meoh_e | -1.965E-06 | 1 | 0.00% |
[9]:
fittedFluxes = replace_biomass_rxn_name(fittedFluxes, biomass_string='Biomass', biomass_rxn_name='BIOMASS_Ec_iJO1366_core_53p95M')
Next step, adjust the names of our MFA data so that they can be assigned to our model’s reactions
[10]:
model_rxn_overlap(fittedFluxes, model)
[10]:
0 26dap_DASH_MSYN
1 ACONTa_ACONTb
2 ACONTa_ACONTb_reverse
5 ArgSYN
14 EX_ac_LPAREN_e_RPAREN_
15 EX_co2_LPAREN_e_RPAREN_
16 EX_co2_LPAREN_e_RPAREN__unlabeled
17 EX_glc_LPAREN_e_RPAREN_
18 EX_nh4_LPAREN_e_RPAREN_
19 EX_o2_LPAREN_e_RPAREN_
20 EX_so4_LPAREN_e_RPAREN_
21 FADR_NADH_CYTBD_HYD_ATPS4r
23 FBA_reverse
25 FUM_reverse
26 G6PDH2r_PGL
27 GAPD_PGK
28 GAPD_PGK_reverse
30 GHMT2r_reverse
34 GluSYN
37 HisSYN
39 ICDHyr_reverse
41 IleSYN
42 LeuSYN
45 MDH_reverse
48 MetSYN
49 MlthfSYN
50 MlthfSYN_reverse
53 NADH_CYTBD_HYD_ATPS4r
54 NADTRHD_THD2pp
55 NADTRHD_THD2pp_reverse
59 PGI_reverse
61 PGM_reverse
62 PheSYN
65 ProSYN
67 PTAr_ACKr_ACS
68 PTAr_ACKr_ACS_reverse
71 RPE_reverse
73 RPI_reverse
74 SERAT_CYSS
75 SerSYN
76 SUCCOAS
77 SUCCOAS_reverse
79 SUCDi_reverse
81 TALA_reverse
82 THRD_GLYAT
83 ThrSYN
84 TKT1a
85 TKT1a_reverse
86 TKT1b
87 TKT1b_reverse
88 TKT2a
89 TKT2a_reverse
90 TKT2b
91 TKT2b_reverse
93 TPI_reverse
94 TrpSYN
95 TyrSYN
96 ValSYN
Name: rxn_id, dtype: object
Observations: 1) some reaction names include more than one metabolite 2) many unassigned amino acids end with SYN
and 3) some exchange reactions include LPAREN_
and RPAREN_
. Let’s try to do something about that 4) probably all _reverse
reactions could not be assigned
Split the lumped reactions and give all of them the same bounds
So let’s pick the ones we want. Let’s save the reverse reactions for a separate step
[11]:
lumped_ids = [1, 21, 26, 27, 53, 54, 67, 74, 82]
mask = []
overlap = model_rxn_overlap(fittedFluxes, model)
for i in overlap.iteritems():
if i[0] in lumped_ids:
mask.append(True)
else:
mask.append(False)
[12]:
lumped_rxns = model_rxn_overlap(fittedFluxes, model)[mask]
fittedFluxes = split_lumped_rxns(lumped_rxns, fittedFluxes)
[13]:
lumped_reverse_ids = [2, 28, 55, 68]
mask_reverse = []
for i in model_rxn_overlap(fittedFluxes, model).iteritems():
if i[0] in lumped_reverse_ids:
mask_reverse.append(True)
else:
mask_reverse.append(False)
[14]:
lumped_reverse_rxns = model_rxn_overlap(fittedFluxes, model)[mask_reverse]
fittedFluxes = split_lumped_reverse_rxns(lumped_reverse_rxns, fittedFluxes)
ACONTa_ACONTb_reverse
GAPD_PGK_reverse
NADTRHD_THD2pp_reverse
PTAr_ACKr_ACS_reverse
[15]:
model_rxn_overlap(fittedFluxes, model)
[15]:
0 26dap_DASH_MSYN
2 ACONTa_reverse
5 ArgSYN
14 EX_ac_LPAREN_e_RPAREN_
15 EX_co2_LPAREN_e_RPAREN_
...
112 ACONTb_reverse
113 PGK_reverse
114 THD2pp_reverse
115 ACKr_reverse
116 ACS_reverse
Name: rxn_id, Length: 63, dtype: object
[16]:
rxn_coverage(fittedFluxes, model)
54.0 %
SYN, these reactions might be lumped; let’s investigate!
[17]:
for rxn in model.reactions:
if 'ARG' in rxn.id:
print(rxn.id)
ARGAGMt7pp
ARGDC
ARGDCpp
ARGORNt7pp
ARGSL
ARGSS
ARGTRS
ARGabcpp
ARGt3pp
ARGtex
Yeah I guess so… This sucks, not sure of we can do anything about that
Let’s remove the extra bits in the exchange reaction strings
[18]:
for i, row in fittedFluxes.iterrows():
if 'LPAREN_' in row['rxn_id']:
fittedFluxes.at[i, 'rxn_id'] = row['rxn_id'].replace('LPAREN_', '').replace('_RPAREN_', '')
[19]:
rxn_coverage(fittedFluxes, model)
50.0 %
Reverse. Let’s check if the forward and reverse fluxes are actually separate. If not, then the two of them will define the bounds together. If they are, then we should add new reverse reactions to the model.
[20]:
find_reverse_rxns(fittedFluxes)
[20]:
forward | reverse | |
---|---|---|
2 | ACONTa | ACONTa_reverse |
23 | FBA | FBA_reverse |
25 | FUM | FUM_reverse |
28 | GAPD | GAPD_reverse |
30 | GHMT2r | GHMT2r_reverse |
39 | ICDHyr | ICDHyr_reverse |
45 | MDH | MDH_reverse |
50 | MlthfSYN | MlthfSYN_reverse |
55 | NADTRHD | NADTRHD_reverse |
59 | PGI | PGI_reverse |
61 | PGM | PGM_reverse |
68 | PTAr | PTAr_reverse |
71 | RPE | RPE_reverse |
73 | RPI | RPI_reverse |
77 | SUCCOAS | SUCCOAS_reverse |
79 | SUCDi | SUCDi_reverse |
81 | TALA | TALA_reverse |
85 | TKT1a | TKT1a_reverse |
87 | TKT1b | TKT1b_reverse |
89 | TKT2a | TKT2a_reverse |
91 | TKT2b | TKT2b_reverse |
93 | TPI | TPI_reverse |
112 | ACONTb | ACONTb_reverse |
113 | PGK | PGK_reverse |
114 | THD2pp | THD2pp_reverse |
115 | ACKr | ACKr_reverse |
116 | ACS | ACS_reverse |
[21]:
fittedFluxes, rxns_to_split = combine_split_rxns(fittedFluxes)
These reactions need to be split into two: ACONTa
These reactions need to be split into two: FUM
These reactions need to be split into two: GAPD
These reactions need to be split into two: ICDHyr
These reactions need to be split into two: MlthfSYN
These reactions need to be split into two: PGM
These reactions need to be split into two: PTAr
These reactions need to be split into two: ACONTb
These reactions need to be split into two: PGK
These reactions need to be split into two: ACKr
These reactions need to be split into two: ACS
[22]:
fittedFluxes
[22]:
index | simulation_id | simulation_dateAndTime | rxn_id | flux | flux_stdev | flux_lb | flux_ub | flux_units | fit_alf | fit_chi2s | fit_cor | fit_cov | free | used_ | comment_ | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | 26dap_DASH_MSYN | 2.295040e-01 | 2.607999e-03 | 0.224392 | 0.234616 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
1 | 1 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ACONTa | 2.074886e+00 | 1.699686e+04 | 1.185984 | 1000.000000 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
2 | 2 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ACONTa_reverse | 8.690514e-07 | 1.543259e+04 | 0.000000 | 28.927600 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | True | True | None |
3 | 3 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | AKGDH | 1.423617e-01 | 7.673616e+03 | 0.000000 | 1.919800 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
4 | 4 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ALATA_L | 3.435520e-01 | 3.904002e-03 | 0.335900 | 0.351204 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
96 | 111 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | GLYAT | 1.000000e-07 | 6.096786e-12 | 0.000000 | 0.035201 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | False | True | None |
97 | 112 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ACONTb_reverse | 8.690514e-07 | 1.543259e+04 | 0.000000 | 28.927600 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | True | True | None |
98 | 113 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | PGK_reverse | 8.689295e-07 | 9.920303e+03 | 0.000000 | 24.418874 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | True | True | None |
99 | 115 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ACKr_reverse | 8.695792e-07 | 2.139235e+04 | 0.000000 | 28.262368 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | True | True | None |
100 | 116 | WTEColi_113C80_U13C20_01 | 2021-04-16 18:29:37 | ACS_reverse | 8.695792e-07 | 2.139235e+04 | 0.000000 | 28.262368 | mmol*gDCW-1*hr-1 | 0.05 | None | None | None | True | True | None |
101 rows × 16 columns
The reactions that are acutally separate (i.e. non-overlapping bounds, exchange fluxes) are a problem. COBRA has some ways to account for that but they seem to be quite involved. An easier way to deal with that is that just add the reverse reaction as a separate reaction to the model; it’s the same reaction, just with the inverse direction. The following method is “destructive”, i.e. it will alter the model. Be aware of that.
[28]:
rxns_to_split
[28]:
['ACONTa',
'FUM',
'GAPD',
'ICDHyr',
'MlthfSYN',
'PGM',
'PTAr',
'ACONTb',
'PGK',
'ACKr',
'ACS']
[23]:
cobra_add_split_rxns(rxns_to_split, model)
- Added ACONTa to model
- Added FUM to model
- Added GAPD to model
- Added ICDHyr to model
# Could not add MlthfSYN to model
- Added PGM to model
- Added PTAr to model
- Added ACONTb to model
- Added PGK to model
- Added ACKr to model
- Added ACS to model
[24]:
model_rxn_overlap(fittedFluxes, model)
[24]:
0 26dap_DASH_MSYN
5 ArgSYN
16 EX_co2_e_unlabeled
17 EX_glc_e
21 FADR
32 GluSYN
35 HisSYN
39 IleSYN
40 LeuSYN
45 MetSYN
46 MlthfSYN
47 MlthfSYN_reverse
50 NADH
57 PheSYN
60 ProSYN
68 SerSYN
69 SUCCOAS
73 ThrSYN
74 TKT1a
75 TKT1b
76 TKT2a
77 TKT2b
79 TrpSYN
80 TyrSYN
81 ValSYN
83 NADH
84 CYTBD
85 HYD
86 ATPS4r
89 CYTBD
90 HYD
91 ATPS4r
Name: rxn_id, dtype: object
[25]:
rxn_coverage(fittedFluxes, model)
32.0 %
Ok… well, guess you can’t please everyone. You could of course find a way to add the remaining reactions manually or investigate further how to distribute their values to other reactions but this is as far as we will go here.
For the next steps, please check the MFA_feasibility_and_sampling
notebook
[26]:
fittedFluxes.to_pickle("data/MFA_sampling/preprocessed_fittedFluxes.obj")
[27]:
cobra.io.save_json_model(model, "data/MFA_sampling/preprocessed_model.json")
[ ]: