## Similar protocols

## Protocol publication

[…] The core procedure of the flux balance technique has been thoroughly described elsewhere [, ]. It is a linear optimization method for studying metabolism based solely on stoichiometric knowledge of the organism’s network of biochemical reactions without relying on knowledge of enzyme kinetics. By constructing an in silico model of the metabolic network and defining a suitable metabolic objective function, FBA then identifies an optimal metabolic phenotype as a vector of fluxes for each reaction present in the model. A large genome-scale in silico network of the Mycobacterium tuberculosis has already been constructed and is presented in []. The in silico reconstruction of this GSMN (Genome-Scale Metabolic Network) comprises 856 metabolic reactions involving metabolites with 726 specific enzymes catalyzing associated reactions. The model is available in the SBML (Systems Biology Markup Language) and CSV formats in . The SBML format can be used in several standalone packages with parsers for the format (such as JyMet explained below), while the CSV format is usually more flexible for inclusion in text processing pipelines and simple scripts written in languages yet without specific SBML parsers.Several methods have already been presented that incorporate experimental data onto FBA methods to achieve metabolic predictions that better characterize the different environmental and experimental conditions [, ]. Data is usually incorporated in the form of hard or soft constraints, and when both transcriptomics and proteomics data are used, it is to solve for inconsistencies between measurements of transcriptional and translational activities []. To the best of our knowledge, experimental data has not been used in the specification of the objective function. In this work we propose the use of proteomics to determine the FBA objective function. In this study we are attempting to estimate the complete metabolic state of the bacterial cell using an in silico model. Although it is clear that quantitative enzyme information by itself (without using e.g. metabolomics, signalling proteins, pathway knowledge, RNA expression, etc.), when used in a in silico model to improve results of an optimization procedure cannot produce a complete picture of the metabolism under drug exposure, it is our conjecture that it can produce a better characterization of the metabolism in situations where the assumption of an objective function for optimal growth (maximization of biomass) can not be made.Simulation scripts for all the FBA calculations used in this paper were written in the R programming language and are available in the supplementary data file . For solving the flux balance analysis optimization problems we used a freely available wrapper on the GLPK (GNU Linear Programming Kit) called **SurreyFBA**. The program is freeware and available for download from (http://sysbio3.fhms.surrey.ac.uk/). Its main features have been presented in []. The program can be used as a CLI (command-line interface) program, which is more suitable for inclusion in the R scripts used in our simulations, but also as a GUI (graphical user interface) using the JyMet frontend, that is included with SurreyFBA. The SurreyFBA program accepts in silico metabolic models in both SBML format and simpler CSV text files, and both these models are also included as supplementary information . Alternatively, several other packages are also available for performing similar calculations []. [...] In this section we describe our proposed objective function. We explain how proteomics data can be used to determine an objective function for FBA in cases where the organism is under drug-induced stress, and thus optimal growth should not be assumed as the metabolic goal of the cell. We propose to define the objective to be a linear combination of fluxes as in [, ]. However, the weights associated to the different fluxes are determined in a different manner. The objective function is the linear combination
f(v)=∑k=1Nckvk=cTv(1)
where v = [v
1, v
2, …, v
N] is the flux vector, and c is the vector of coefficients of the linear combination. The dimension N of both vectors is equal to the number of reactions (i.e. metabolic fluxes) in the model. In the case of biomass maximization, vector c is an all-zero vector except for a one (1.0) in the position corresponding to the biomass reaction:
fbio(v)=cTv=[1,0,0,…,0][vbiov1v2⋮](2)
where v
bio is the corresponding flux for biomass in the model. In our method we propose to use the same linear combination of fluxes for the objective function, but instead of maximizing the biomass flux, we calculate the coefficients c
k from proteomics data.We first define a vector p containing the quantitative values of protein levels obtained with the proteomics experiment (cf. ) for one experimental condition and time point. Vector p is of length K and given by:
p=[p1p2⋮pK](3)
where each element p
k corresponds to a value representing the level of protein k in the sample. While proteomics experiments are very precise and reliable (as opposed to microarray experiments, where the signal-to-noise ratio is much lower), the number K of identified proteins is only a subset of the whole proteome of the cell, a difficulty stemming from two facts; (1) the method performs identification by comparison of peptide fragments against a database, so that only previously identified proteins are accounted for and (2) not all proteins may be susceptible to the cleavage procedure.Moreover, since some proteins may not appear in a specific sample (e.g. protein A may be present in sample 6h control but not in sample day 1 mefloquine), we set the levels of all proteins absent from a sample to zero. It is also important to note that while the proteome experiment quantifies both enzymes and signaling proteins, vector p contains only the subset of proteins with enzymatic activity and that are present in the in silico model. Although this leaves a number of important regulatory proteins not accounted for, we propose to include regulatory information in a subsequent analysis.As a second step, vector p is normalized by its maximum value to yield a vector of relative protein levels p˜:
p˜=p/max{p}(4)
where p˜ is also of dimension K. This normalized vector will be used in the expressions from the GSMN-TB model that combine enzyme actions, as we explain shortly. It is important to note that each value p˜k is associated with one enzyme.Following the normalization procedure we identify which reactions in the metabolic model are catalyzed by the enzymes represented in p˜. The values of the elements p˜k of p˜ are then used to determine the values of the coefficients c
k for our proposed objective. In the simplest case where each enzyme catalyzes one reaction, we set the corresponding coefficient c
k of each reaction to the respective value p˜k of the catalyzing enzyme. For all the other fluxes, i.e. those not present in p˜, the respective coefficients are set to zero since these represent proteins not observed in the proteomics experiment.More complex situations arise when one reaction is catalyzed not by a single enzyme, but by the combined action of a set of enzymes. In such situations, the determination of the coefficients c
k of the objective function depends on the relationship between enzymes and metabolic reactions. Therefore, it is important to understand how enzymes and reactions are related in the metabolic model, and how this information is used in the definition of the proposed objective function. In the GSMN-TB model, metabolic reactions that are catalyzed by a set of enzymes are associated with a boolean expression that combines the action of all enzymes needed for catalysis of the reaction. These expressions were manually curated by the authors of the GSMN-TB model and more details and references can be found in []. As an example, let us check model reaction R022 (myo-inositol synthesis). For this reaction the catalytic proteins are combined in the following boolean expression:
Rv0046c∨(Rv2612c∧Rv1822)(5)
where the operators ∨ and ∧ represent the boolean operators OR and AND respectively. The proteins in this expression are the two transferases Rv2612c (IGY178), Rv1822 (I6X2EA) and the synthase Rv0046c (I6X8D3), all involved in inositol synthesis [].In these cases, to obtain the coefficients for the fluxes that will be used in the objective function, it is first necessary to substitute the proteomics data in p˜ into the corresponding boolean expressions (similar to ). The mathematical operation for the boolean OR results in the maximum of the terms, while the AND operation results in the minimum of the two terms, in a similar fashion as done in []. The resulting values from the boolean expressions are used as the final values for the elements in c of coefficients for the objective function. We note that vector c will have a number of non-zero elements that is usually different from K, unless we have the very specific situation mentioned before where each reaction is catalyzed by a distinct enzyme. Moreover, we will usually have K (the number of proteins available in the proteomics experiment) larger than the number of non-zero elements in c (reactions catalyzed by these enzymes), although it is possible that in some situations the number of reactions in the objective function is greater than the number of enzymes, for example when the same enzyme (or combination of enzymes) catalyzes several reactions in the model.As is clear from the objective function , each weight c
k multiplies the corresponding metabolic flux v
k. As an example, suppose we identify three proteins 1, 2 and 3 with levels p
1 = 1.0, p
2 = 2.0 and p
3 = 3.0 in a given sample. The corresponding normalized vector is obtained using and is equal to p˜=[1/3;2/3;1]. For the sake of this example, we assume that these three proteins catalyze three reactions R023, R042 and R128 in the GSMN model. We assume further that protein 1 catalyzes reaction v
23, protein 2 catalyzes reaction v
42 and an AND combination of proteins 1 and 3 catalyze reaction v
128. The objective function to be maximized by FBA is then defined to be:
f(v)=13v23+23v42+13v128(6)
where the vector of weights is given by c = [1/3;2/3;1/3], since the AND expression is calculated using the minimum of the two values in p˜.We note that by using data from different time points it is possible to define different objective functions for different time points, something that it is not possible when either the more common objective biomass yield or any other time-independent objective function is used. In this work we assume that our proteomics data for the control group should not be significantly different even for different time points, thus we use an average value of these data to perform FBA for the control condition. Our proposed methodology focus on defining an objective function by using proteomics data. Therefore, our FBA constraints are unaltered and are the same as the ones used with the regular FBA procedure. This way, when we present results of comparisons between our method and the **E-flux** method, we are comparing two variants of the plain FBA procedure modified by the introduction of proteomics data: one method (E-flux) uses proteomics data to redefine flux constraints and a biomass objective function, while the method proposed here uses proteomics data to define the objective function and unconstrained internal fluxes (as in the regular FBA method).In this study we have performed comparisons of the proposed methodology with the alternative method E-flux []. This method was originally devised to use transcriptomics data to adjust FBA constraints using a “pipe capacity” analogy. Since the E-flux method can also be used with proteomics data, we run FBA using our proposed methodology and the E-flux method with the same set of data and compare the results obtained. We first simulated both methods and the regular FBA procedure using only biomass maximization without any experimental data. Biomass was constrained to the same value as the regular FBA procedure for comparison. We analyzed the results comparing the number of reactions in the model that carried zero flux in the optimal vector and are catalyzed by essential enzymes needed for growth as presented by []. Here we restrict our analysis to the control experimental condition, where we assume that the objective of the organism is optimal growth, represented by maximization of biomass production. It is expected that in a situation where the organism is not under any biological stress we would end up with a small number of these essential reactions (i.e. metabolic reactions catalyzed by essential enzymes).As a validation step, we performed comparison of prediction error between our proposed method and the E-flux method for a control condition and treatment conditions in different time points. We perform FBA for both methods using randomly chosen subsets of our proteomics data and the remaining subset of proteins to evaluate the prediction of the model in terms of enzyme abundances. This cross-validation step was performed as follows: For one specific time point and experimental condition the set of proteomics data was randomly split in two groups using a 80/20 rule, i.e. 80% of the proteins were used to defined the objective function in our proposed method and to define normalized constraints for the E-flux method, while 20% of the proteins were used as a validation set to compare fluxes and enzyme abundances. This procedure was repeated 16 times with different random splits. The enzymes in the validation set were evaluated in the boolean expressions and the resulting values were used for comparison with the corresponding fluxes obtained with FBA. Normalized squared prediction error was used to evaluate prediction accuracy of the two methods. In doing so, we normalized the error by the number of reactions in the validation set, as without normalization different splits would result in a different number of reactions depending on the boolean expressions evaluated. The validation procedure was repeated and results obtained in different realizations are in agreement. In order to obtain some insight into the metabolic behavior for alternative optima, we perform a final step of flux variability analysis, again comparing both methods. Our results are presented and discussed in the next section. […]

## Pipeline specifications

Software tools | SurreyFBA, E-flux |
---|---|

Application | Metabolic engineering |

Organisms | Mycobacterium tuberculosis |

Diseases | Tuberculosis |

Chemicals | Mefloquine |