3BIO-BioInfo Team Publications

Peer-reviewed journal articles

2025

SOuLMuSiC, a novel tool for predicting the impact of mutations on protein solubility

Attanasio, S., Kwasigroch, J.-M., Rooman, M., & Pucci, F. (2025). SOuLMuSiC, a novel tool for predicting the impact of mutations on protein solubility. Scientific reports, 15(1), 27531. doi:10.1038/s41598-025-11326-x

Protein solubility problems arise in a wide range of applications, from antibody development to enzyme production, and are linked to several major disorders, including cataracts and Alzheimer's disease. To assist scientists in designing proteins with improved solubility and better understand solubility-related diseases, we introduce SOuLMuSiC, a computational tool for the fast and accurate prediction of the impact of single-site mutations on protein solubility. Our model is based on a simple artificial neural network that takes as input a series of features, including biophysical properties of wild-type and mutated residues, energetic values computed using various statistical potentials, and mutational scores derived from protein language models. SOuLMuSiC has been trained on a curated dataset of about 700 single-site mutations with known solubility values, collected and manually verified from original literature. It significantly outperforms current state-of-the-art predictors in strict cross validation: the Spearman correlation reaches 0.5 when solubility changes are represented categorically; for the subset with quantitative values, it increases to 0.7. SOuLMuSiC also shows good performance on external datasets containing high-throughput enzyme solubility-related data as well as protein aggregation propensities. In summary, SOuLMuSiC is a valuable tool for identifying mutations that impact protein solubility, and can play a major role in the rational design of proteins with improved solubility and in understanding the effects of genetic variants. It is freely available for academic use at http://babylone.ulb.ac.be/SoulMuSiC/.

https://dipot.ulb.ac.be/dspace/bitstream/2013/393911/1/doi_377555.pdf

AbAgym: a well-curated dataset for the mutational analysis of antibody-antigen complexes

Cia Beriain, G., Li, D., Poblete, S., Rooman, M., & Pucci, F. (2025). AbAgym: a well-curated dataset for the mutational analysis of antibody-antigen complexes. mAbs, 17(1). doi:10.1080/19420862.2025.2592421

With monoclonal antibodies becoming one of the largest classes of biopharmaceuticals, it is important to have curated data to train computational models that can accelerate their design. Despite the massive amount of mutagenesis data generated on antibody-antigen interactions, only a few small, well-curated datasets are available. This paper introduces AbAgym, a manually curated repository comprising approximately 324k mutations in antibody-antigen complexes, including approximately 10% of interface mutations, whose effects on antibody-antigen binding have been experimentally quantified through deep mutational scanning (DMS) experiments. We collected and curated 68 DMS datasets from the literature together with the three-dimensional structure of each antibody-antigen complex. We benchmarked the performance of established force field methods as well as recent machine learning models that predict the change in binding affinity upon mutation. The former achieved modest performance, whereas the latter performed only marginally better than random. Finally, our analysis of hotspot residues responsible for immune evasion highlights the importance of accounting for biological complexities, such as conformational changes or oligomeric states that influence antibody-antigen binding, which are often overlooked. AbAgym is freely available for academic use at https://github.com/3BioCompBio/Abagym.

https://dipot.ulb.ac.be/dspace/bitstream/2013/397097/3/zzzl.pdf

Residue conservation and solvent accessibility are (almost) all you need for predicting mutational effects in proteins

Tsishyn, M., Hermans, P., Rooman, M., & Pucci, F. (2025). Residue conservation and solvent accessibility are (almost) all you need for predicting mutational effects in proteins. Bioinformatics, 41(6). doi:10.1093/bioinformatics/btaf322

Motivation: Predicting how mutations impact protein biophysical properties remains a significant challenge in computational biology. In recent years, numerous predictors, primarily deep learning models, have been developed to address this problem; however, issues such as their lack of interpretability and limited accuracy persist. Results: We showed that a simple evolutionary score, based on the log-odds ratio of wild-type and mutated residue frequencies in evolutionarily related proteins, when scaled by the residue's relative solvent accessibility, performs on par with or slightly outperforms most of the benchmarked predictors, many of which are considerably more complex. The evaluation is performed on mutations from the ProteinGym deep mutational scanning dataset collection, which measures various properties such as stability, activity or fitness. This raises further questions about what these complex models actually learn and highlights their limitations in predicting mutational landscapes. Availability and implementation: The RSALOR model is available as a user-friendly Python package that can be installed from the PyPI repository. The code is freely available at https://github.com/3BioCompBio/RSALOR.

https://dipot.ulb.ac.be/dspace/bitstream/2013/392379/3/btaf322.pdf
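As a rough illustration of the kind of score described in the abstract above, the following minimal sketch computes a log-odds ratio of wild-type versus mutant residue frequencies in one alignment column and scales it by solvent accessibility. It is not the RSALOR package's implementation: the function names, the pseudocount, and in particular the weighting by (1 - RSA) are assumptions made here for illustration, since the abstract only states that the score is scaled by the relative solvent accessibility.

```python
import math
from collections import Counter


def log_odds_ratio(column, wild_type, mutant, pseudocount=1.0):
    """Log-odds ratio of wild-type vs. mutant residue frequencies in one alignment column.

    `column` is a string holding the residues observed at one position across
    homologous sequences. The pseudocount is an assumption of this sketch; it keeps
    both frequencies strictly between 0 and 1 so the odds stay finite.
    """
    counts = Counter(column)
    total = len(column) + 2 * pseudocount
    f_wt = (counts[wild_type] + pseudocount) / total
    f_mt = (counts[mutant] + pseudocount) / total
    odds_wt = f_wt / (1.0 - f_wt)
    odds_mt = f_mt / (1.0 - f_mt)
    return math.log(odds_wt / odds_mt)


def rsa_scaled_score(column, wild_type, mutant, rsa):
    """Scale the log-odds ratio by relative solvent accessibility (RSA).

    Assumption: the weight is (1 - RSA), with RSA clipped to [0, 1], so that
    buried residues contribute more than exposed ones. The published model may
    use a different scaling.
    """
    weight = 1.0 - min(max(rsa, 0.0), 1.0)
    return weight * log_odds_ratio(column, wild_type, mutant)


# Toy alignment column: alanine is well conserved, glycine is rare.
column = "AAAAAAAAGG"
buried = rsa_scaled_score(column, "A", "G", rsa=0.1)
exposed = rsa_scaled_score(column, "A", "G", rsa=0.9)
print(f"buried: {buried:.3f}, exposed: {exposed:.3f}")
```

Under these assumptions, substituting a conserved residue gives a positive score that shrinks as the position becomes more exposed, which is the qualitative behaviour the abstract describes.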

Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A

Jain, S., Trinidad, M., Nguyen, T. B., Jones, K., Neto, S. D., Ge, F., Glagovsky, A., Jones, C., Moran, G., Wang, B., Rahimi, K., Çalıcı, S. Z., Cedillo, L., Berardelli, S., Özden, B., Chen, K., Katsonis, P., Williams, A., Lichtarge, O., Rana, S., Pradhan, S., Srinivasan, R., Sajeed, R., Joshi, D., Faraggi, E., Jernigan, R., Kloczkowski, A., Xu, J., Song, Z., Özkan, S., Padilla, N., de la Cruz, X., Acuna-Hidalgo, R., Grafmüller, A., Barrón, L. T. J., Manfredi, M., Savojardo, C., Babbi, G., Martelli, P. L., Casadio, R. R., Sun, Y., Zhu, S., Shen, Y., Pucci, F., Rooman, M., Cia, G., Raimondi, D., Hermans, P., Kwee, S., Chen, E., Astore, C., Kamandula, A., Pejaver, V., Ramola, R., Velyunskiy, M., Zeiberg, D., Mishra, R., Sterling, T., Goldstein, J. L., Lugo-Martinez, J., Kazi, S., Li, S., Long, K., Brenner, S. E., Bakolitsa, C., Radivojac, P., Suhr, D., Suhr, T., & Clark, W. T. (2025). Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A. Human genetics, 144(2-3), 295-308. doi:10.1007/s00439-025-02731-3

Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by utilizing 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams, and evaluated additional predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine-learning tools in Python, demonstrated superior performance among submissions. Furthermore, the study observed that state-of-the-art deep learning methods provided small but statistically significant improvement in predictive performance compared to less elaborate techniques. These findings underscore the utility of variant effect prediction, and the potential for models trained with modest resources to accurately classify VUS in genetic and clinical research.

https://dipot.ulb.ac.be/dspace/bitstream/2013/392381/3/2024.05.16.594558v2.full.pdf

Critical assessment of missense variant effect predictors on disease-relevant variant data

Rastogi, R., Chung, R., Li, S., Li, C., Lee, K., Woo, J., Kim, D. W., Keum, C., Babbi, G., Martelli, P. L., Savojardo, C., Casadio, R. R., Chennen, K., Weber, T. S., Poch, O., Ancien, F., Camperio Ciani, G., Pucci, F., Raimondi, D., Vranken, W., Rooman, M., Marquet, C., Olenyi, T., Rost, B., Andreoletti, G., Kamandula, A., Peng, Y., Bakolitsa, C., Mort, M., Cooper, D. N., Bergquist, T., Pejaver, V., Liu, X., Radivojac, P., Brenner, S. E., & Ioannidis, N. M. (2025). Critical assessment of missense variant effect predictors on disease-relevant variant data. Human genetics, 144(2-3), 281-293. doi:10.1007/s00439-025-02732-2

Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.

https://dipot.ulb.ac.be/dspace/bitstream/2013/392380/1/doi_376024.pdf

Alterations in the renin-angiotensin system during septic shock

Benaroua, C., Pucci, F., Rooman, M., Picod, A., Favory, R., Legrand, M., Vincent, J. L., Creteur, J., Taccone, F. S., Annoni, F., & Garcia, B. (2025). Alterations in the renin-angiotensin system during septic shock. Annals of intensive care, 15(1). doi:10.1186/s13613-025-01463-x

Background: Alterations in the classical Renin-Angiotensin Aldosterone System (RAAS) have been described during septic shock and are associated with patient outcomes. Since the alternative RAAS has also been reported to be altered in critically ill patients, and given that the RAAS can be modulated by specific therapeutics, such as angiotensin II, understanding its pathophysiology is of primary interest. Objective: To describe the alterations in the classical and alternative RAAS during septic shock in comparison with healthy controls. Methods: This prospective, monocentric, controlled study enrolled 20 patients fulfilling the septic shock diagnosis, as defined by the Sepsis-3 criteria, along with 30 controls. The main exclusion criteria were the use of any prior medication modifying the RAAS, prior liver failure (Child-Pugh score > 9), or chronic kidney disease (estimated glomerular filtration rate < 30 ml/min/1.73 m²). Equilibrium concentrations of RAAS peptides were analyzed using a liquid chromatography-mass spectrometry method from heparinized plasma. Circulating angiotensin-converting enzyme (cACE), cACE type 2 (cACE2) activities, and circulating dipeptidyl peptidase 3 (cDPP3) concentrations were assessed. Values were measured at diagnosis, 6 h after diagnosis and on days 1 and 3. The main timepoint of interest was 6 h after diagnosis. Values 6 h after diagnosis were compared to 30 controls. Results: In septic shock patients, increased concentrations of the main peptides of the classical and alternative RAAS were observed compared to controls, particularly angiotensin I (Ang I) and angiotensin-(1-7) (Ang-(1-7)). Additionally, there was a significant increase in the Ang I/Ang II ratio (1.16 [0.74-3.31] vs. 0.34 [0.25-0.43], p < 0.05) and the Ang-(1-7)/Ang II ratio (0.15 [0.08-1.30] vs. 0.03 [0.02-0.04], p < 0.05). We also observed a significant reduction in cACE activity (3.38 [2.29-6.8] vs. 7.89 [6.39-9.05] nmol Ang II/L/h), an increase in cACE2 activity (814 [669-1987] vs. 214 [132-293] pmol Ang-(1-7)/L/h), and increased cDPP3 concentrations (54.6 [35-142.2] ng/mL vs. 13.7 [11.9-15.4] ng/mL, all p < 0.05). Conclusions: Septic shock was associated with increased Ang I/Ang II and Ang-(1-7)/Ang II ratios, along with reduced cACE activity, increased cACE2 activity, and elevated cDPP3 concentrations compared to healthy controls.

https://dipot.ulb.ac.be/dspace/bitstream/2013/392378/1/doi_376022.pdf

2024

Assessing predictions on fitness effects of missense variants in HMBS in CAGI6

Zhang, J., Kinch, L., Katsonis, P., Lichtarge, O., Jagota, M., Song, Y. S., Sun, Y., Shen, Y., Kuru, N., Dereli, O., Adebali, O., Alladin, M. A., Pal, D., Capriotti, E., Turina, M. P., Savojardo, C., Martelli, P. L., Babbi, G., Casadio, R. R., Pucci, F., Rooman, M., Cia Beriain, G., Tsishyn, M., Strokach, A., Hu, Z., van Loggerenberg, W., Roth, F. P., Radivojac, P., Brenner, S. E., Cong, Q., & Grishin, N. (2024). Assessing predictions on fitness effects of missense variants in HMBS in CAGI6. Human genetics. doi:10.1007/s00439-024-02680-3

This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, we noticed that more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions still falls far short of the positive control, which is derived from experimental scores, indicating the necessity for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.

https://dipot.ulb.ac.be/dspace/bitstream/2013/377441/3/s00439-024-02680-3

Prediction of Paratope-Epitope Pairs Using Convolutional Neural Networks

Li, D., Pucci, F., & Rooman, M. (2024). Prediction of Paratope-Epitope Pairs Using Convolutional Neural Networks. International Journal of Molecular Sciences, 25(10), 5434. doi:10.3390/ijms25105434

Antibodies play a central role in the adaptive immune response of vertebrates through the specific recognition of exogenous or endogenous antigens. The rational design of antibodies has a wide range of biotechnological and medical applications, such as in disease diagnosis and treatment. However, there are currently no reliable methods for predicting the antibodies that recognize a specific antigen region (or epitope) and, conversely, epitopes that recognize the binding region of a given antibody (or paratope). To fill this gap, we developed ImaPEp, a machine learning-based tool for predicting the binding probability of paratope-epitope pairs, where the epitope and paratope patches were simplified into interacting two-dimensional patches, which were colored according to the values of selected features, and pixelated. The specific recognition of an epitope image by a paratope image was achieved by using a convolutional neural network-based model, which was trained on a set of two-dimensional paratope-epitope images derived from experimental structures of antibody-antigen complexes. Our method achieves good performance in cross-validation, with a balanced accuracy of 0.8. Finally, we showcase examples of application of ImaPEp, including extensive screening of large libraries to identify paratope candidates that bind to a selected epitope, and rescoring and refining antibody-antigen docking poses.

https://dipot.ulb.ac.be/dspace/bitstream/2013/375643/1/doi_359287.pdf

FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction

Tsishyn, M., Cia Beriain, G., Hermans, P., Kwasigroch, J.-M., Rooman, M., & Pucci, F. (2024). FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction. Human genomics, 18(1). doi:10.1186/s40246-024-00605-9

Systematically predicting the effects of mutations on protein fitness is essential for the understanding of genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC's robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC's qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and does not require any bioinformatics expertise, making the tool accessible to the entire scientific community.

https://dipot.ulb.ac.be/dspace/bitstream/2013/373278/1/doi_356922.pdf

MHCII-peptide presentation: an assessment of the state-of-the-art prediction methods

Yang, Y., Wei, Z., Cia Beriain, G., Song, X., Pucci, F., Rooman, M., Xue, F., & Hou, Q. (2024). MHCII-peptide presentation: an assessment of the state-of-the-art prediction methods. Frontiers in immunology, 15. doi:10.3389/fimmu.2024.1293706

Major histocompatibility complex Class II (MHCII) proteins initiate and regulate immune responses by presentation of antigenic peptides to CD4+ T-cells and self-restriction. The interactions between MHCII and peptides determine the specificity of the immune response and are crucial in immunotherapy and cancer vaccine design. With the ever-increasing amount of MHCII-peptide binding data available, many computational approaches have been developed for MHCII-peptide interaction prediction over the last decade. There is thus an urgent need to provide an up-to-date overview and assessment of these newly developed computational methods. To benchmark the prediction performance of these methods, we constructed an independent dataset containing binding and non-binding peptides to 20 human MHCII protein allotypes from the Immune Epitope Database, covering DP, DR and DQ alleles. After collecting 11 known predictors up to January 2022, we evaluated those available through a webserver or standalone packages on this independent dataset. The benchmarking results show that MixMHC2pred and NetMHCIIpan-4.1 achieve the best performance among all predictors. In general, newly developed methods perform better than older ones due to the rapid expansion of data on which they are trained and the development of deep learning algorithms. Our manuscript not only draws a full picture of the state-of-art of MHCII-peptide binding prediction, but also guides researchers in the choice among the different predictors. More importantly, it will inspire biomedical researchers in both academia and industry for the future developments in this field.

https://dipot.ulb.ac.be/dspace/bitstream/2013/373279/1/doi_356923.pdf

Quantification of biases in predictions of protein-protein binding affinity changes upon mutations

Tsishyn, M., Pucci, F., & Rooman, M. (2024). Quantification of biases in predictions of protein-protein binding affinity changes upon mutations. Briefings in bioinformatics, 25(1). doi:10.1093/bib/bbad491

Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and well-used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with the human angiotensin-converting enzyme 2. We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they suffer from limited generalizability properties and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.

https://dipot.ulb.ac.be/dspace/bitstream/2013/369329/1/doi_352973.pdf

2023

Enzyme Stability-Activity Trade-Off: New Insights from Protein Stability Weaknesses and Evolutionary Conservation

Hou, Q., Rooman, M., & Pucci, F. (2023). Enzyme Stability-Activity Trade-Off: New Insights from Protein Stability Weaknesses and Evolutionary Conservation. Journal of chemical theory and computation, 19(12), 3664-3671. doi:10.1021/acs.jctc.3c00036

A general limitation of the use of enzymes in biotechnological processes under sometimes nonphysiological conditions is the complex interplay between two key quantities, enzyme activity and stability, where the increase of one is often associated with the decrease of the other. A precise stability-activity trade-off is necessary for the enzymes to be fully functional, but its weight in different protein regions and its dependence on environmental conditions is not yet elucidated. To advance this issue, we used the formalism that we have recently developed to effectively identify stability strength and weakness regions in protein structures and applied it to a large set of globular enzymes with known experimental structure and catalytic sites. Our analysis showed a striking oscillatory pattern of free energy compensation centered on the catalytic region. Indeed, catalytic residues are usually nonoptimal with respect to stability, but residues in the first shell around the catalytic site are, on the average, stability strengths and thus compensate for this lack of stability; residues in the second shell are weaker again, and so on. This trend is consistent across all enzyme families. It is accompanied by a similar, but less pronounced, pattern of residue conservation across evolution. In addition, we analyzed cold- and heat-adapted enzymes separately and highlighted different patterns of stability strengths and weaknesses, which provide insight into the longstanding problem of catalytic rate enhancement in cold environments. The successful comparison of our stability and conservation results with experimental fitness data, obtained by deep mutagenesis scanning, led us to propose criteria for improving catalytic activity while maintaining enzyme stability, a key goal in enzyme design.

https://dipot.ulb.ac.be/dspace/bitstream/2013/360350/3/acs.jctc.3c00036-2.pdf

Estimating the Vertical Ionization Potential of Single-Stranded DNA Molecules

Rooman, M., & Pucci, F. (2023). Estimating the Vertical Ionization Potential of Single-Stranded DNA Molecules. Journal of chemical information and modeling, 63(6), 1766-1775. doi:10.1021/acs.jcim.2c01525

The electronic properties of DNA molecules, defined by the sequence-dependent ionization potentials of nucleobases, enable long-range charge transport along the DNA stacks. This has been linked to a range of key physiological processes in the cells and to the triggering of nucleobase substitutions, some of which may cause diseases. To gain molecular-level understanding of the sequence dependence of these phenomena, we estimated the vertical ionization potential (vIP) of all possible nucleobase stacks in B-conformation, containing one to four Gua, Ade, Thy, Cyt, or methylated Cyt. To do this, we used quantum chemistry calculations and more precisely the second-order Møller-Plesset perturbation theory (MP2) and three double-hybrid density functional theory methods, combined with several basis sets for describing atomic orbitals. The calculated vIP of single nucleobases were compared to experimental data and those of nucleobase pairs, triplets, and quadruplets, to observed mutability frequencies in the human genome, reported to be correlated with vIP values. This comparison selected MP2 with the 6-31G* basis set as the best of the tested calculation levels. These results were exploited to set up a recursive model, called vIPer, which estimates the vIP of all possible single-stranded DNA sequences of any length based on the calculated vIPs of overlapping quadruplets. vIPer's vIP values correlate well with oxidation potentials measured by cyclic voltammetry and activities obtained through photoinduced DNA cleavage experiments, further validating our approach. vIPer is freely available on the github.com/3BioCompBio/vIPer repository.

https://dipot.ulb.ac.be/dspace/bitstream/2013/360349/3/2023.02.27.530325v1.full.pdf

Critical review of conformational B-cell epitope prediction methods

Cia Beriain, G., Pucci, F., & Rooman, M. (2023). Critical review of conformational B-cell epitope prediction methods. Briefings in bioinformatics, 24(1). doi:10.1093/bib/bbac567

Accurate in silico prediction of conformational B-cell epitopes would lead to major improvements in disease diagnostics, drug design and vaccine development. A variety of computational methods, mainly based on machine learning approaches, have been developed in the last decades to tackle this challenging problem. Here, we rigorously benchmarked nine state-of-the-art conformational B-cell epitope prediction webservers, including generic and antibody-specific methods, on a dataset of over 250 antibody-antigen structures. The results of our assessment and statistical analyses show that all the methods achieve very low performances, and some do not perform better than randomly generated patches of surface residues. In addition, we also found that commonly used consensus strategies that combine the results from multiple webservers are at best only marginally better than random. Finally, we applied all the predictors to the SARS-CoV-2 spike protein as an independent case study, and showed that they perform poorly in general, which largely recapitulates our benchmarking conclusions. We hope that these results will lead to greater caution when using these tools until the biases and issues that limit current methods have been addressed, promote the use of state-of-the-art evaluation methodologies in future publications and suggest new strategies to improve the performance of conformational B-cell epitope prediction methods.

https://dipot.ulb.ac.be/dspace/bitstream/2013/355297/5/2023_Briefings_Cia_etal_BenchmarkEpitopes.pdf

pyScoMotif: discovery of similar 3D structural motifs across proteins

Cia Beriain, G., Kwasigroch, J.-M., Stamatopoulos, B., Rooman, M., & Pucci, F. (2023). pyScoMotif: discovery of similar 3D structural motifs across proteins. Bioinformatics Advances, 3(1). doi:10.1093/bioadv/vbad158

Motivation: The fast and accurate detection of similar geometrical arrangements of protein residues, known as 3D structural motifs, is highly relevant for many applications such as binding region and catalytic site detection, drug discovery and structure conservation analyses. With the recent publication of new protein structure prediction methods, the number of available protein structures is exploding, which makes efficient and easy-to-use tools for identifying 3D structural motifs essential. Results: We present an open-source Python package that enables the search for both exact and mutated motifs with position-specific residue substitutions. The tool is efficient, flexible, accurate, and suitable to run both on computer clusters and personal laptops. Two successful applications of pyScoMotif for catalytic site identification are showcased. Availability and implementation: The pyScoMotif package can be installed from the PyPI repository and is also available at https://github.com/3BioCompBio/pyScoMotif. It is free to use for non-commercial purposes.

https://dipot.ulb.ac.be/dspace/bitstream/2013/367924/3/vbad158.pdf

2022

SpikePro: a webserver to predict the fitness of SARS-CoV-2 variants

Cia Beriain, G., Kwasigroch, J.-M., Rooman, M., & Pucci, F. (2022). SpikePro: a webserver to predict the fitness of SARS-CoV-2 variants. Bioinformatics, 38(18), 4418-4419. doi:10.1093/bioinformatics/btac517

The SARS-CoV-2 virus has shown a remarkable ability to evolve and spread across the globe through successive waves of variants since the original Wuhan lineage. Despite all the efforts of the last 2 years, the early and accurate prediction of variant severity is still a challenging issue which needs to be addressed to help, for example, the decision of activating COVID-19 plans long before the peak of new waves. Upstream preparation would indeed make it possible to avoid the overflow of health systems and limit the most severe cases.

https://dipot.ulb.ac.be/dspace/bitstream/2013/355299/3/2022_Bioinformatics_Cia_etal_SpikePro2pdf.pdf https://dipot.ulb.ac.be/dspace/bitstream/2013/355299/4/2022_Bioinformatics_Cia_etal_SpikePro2pdf.pdf

NPTX1 mutations trigger endoplasmic reticulum stress and cause autosomal dominant cerebellar ataxia

Coutelier, M., Jacoupy, M., Janer, A., Renaud, F., Auger, N., Saripella, G.-V., Ancien, F., Pucci, F., Rooman, M., Gilis, D., Larivière, R., Sgarioto, N., Valter, R., Guillot-Noel, L., Le Ber, I., Sayah, S., Charles, P., Nümann, A., Pauly, M. G., Helmchen, C., Deininger, N., Haack, T., Brais, B., Brice, A., Trégouët, D.-A., El Hachimi, K., Shoubridge, E., Durr, A., & Stevanin, G. (2022). NPTX1 mutations trigger endoplasmic reticulum stress and cause autosomal dominant cerebellar ataxia. Brain, 145(4), 1519-1534. doi:10.1093/brain/awab407

With more than forty causative genes identified so far, autosomal dominant cerebellar ataxias exhibit a remarkable genetic heterogeneity. Yet, half the patients are lacking a molecular diagnosis. In a large family with nine sampled affected members, we performed exome sequencing combined with whole-genome linkage analysis. We identified a missense variant in NPTX1, NM_002522.3: c.1165G>A: p.G389R, segregating with the phenotype. Further investigations with whole exome sequencing and an amplicon-based panel identified four additional unrelated families segregating the same variant, for whom a common founder effect could be excluded. A second missense variant, NM_002522.3: c.980A>G: p.E327G, was identified in a fifth familial case. The NPTX1-associated phenotype consists of a late-onset, slowly progressive, cerebellar ataxia, with downbeat nystagmus, cognitive impairment reminiscent of cerebellar cognitive affective syndrome, myoclonic tremor and mild cerebellar vermian atrophy on brain imaging. NPTX1 encodes the neuronal pentraxin 1, a secreted protein with various cellular and synaptic functions. Both variants affect conserved amino-acid residues and are extremely rare or absent from public databases. In COS7 cells, overexpression of both neuronal pentraxin 1 variants altered endoplasmic reticulum morphology and induced ATF6-mediated endoplasmic reticulum stress, associated with cytotoxicity. In addition, the p.E327G variant abolished neuronal pentraxin 1 secretion, as well as its capacity to form a high molecular weight complex with the wild-type protein. Co-immunoprecipitation experiments coupled with mass spectrometry analysis demonstrated abnormal interactions of this variant with the cytoskeleton. In agreement with these observations, in silico modelling of the neuronal pentraxin 1 complex evidenced a destabilizing effect for the p.E327G substitution, located at the interface between monomers. On the contrary, the p.G389 residue, located at the protein surface, had no predictable effect on the complex stability. Our results establish NPTX1 as a new causative gene in autosomal dominant cerebellar ataxias. We suggest that variants in NPTX1 can lead to cerebellar ataxia due to endoplasmic reticulum stress, mediated by ATF6, and associated to a destabilization of NP1 polymers in a dominant-negative manner for one of the variants.

https://dipot.ulb.ac.be/dspace/bitstream/2013/334504/4/2021_Coutelier_Brain.pdf

Analysis of the Neutralizing Activity of Antibodies Targeting Open or Closed SARS-CoV-2 Spike Protein Conformations.

Cia Beriain, G., Pucci, F., & Rooman, M. (2022). Analysis of the Neutralizing Activity of Antibodies Targeting Open or Closed SARS-CoV-2 Spike Protein Conformations. International journal of molecular sciences, 23(4). doi:10.3390/ijms23042078

SARS-CoV-2 infection elicits a polyclonal neutralizing antibody (nAb) response that primarily targets the spike protein, but it is still unclear which nAbs are immunodominant and what distinguishes them from subdominant nAbs. This information would however be crucial to predict the evolutionary trajectory of the virus and design future vaccines. To shed light on this issue, we gathered 83 structures of nAbs in complex with spike protein domains. We analyzed in silico the ability of these nAbs to bind the full spike protein trimer in open and closed conformations, and predicted the change in binding affinity of the most frequently observed spike protein variants in the circulating strains. This led us to define four nAb classes with distinct variant escape fractions. By comparing these fractions with those measured from plasma of infected patients, we showed that the class of nAbs that most contributes to the immune response is able to bind the spike protein in its closed conformation. Although this class of nAbs only partially inhibits the spike protein binding to the host's angiotensin converting enzyme 2 (ACE2), it has been suggested to lock the closed pre-fusion spike protein conformation and therefore prevent its transition to an open state. Furthermore, comparison of our predictions with mRNA-1273 vaccinated patient plasma measurements suggests that spike proteins contained in vaccines elicit a different nAb class than the one elicited by natural SARS-CoV-2 infection and suggests the design of highly stable closed-form spike proteins as next-generation vaccine immunogens.

https://dipot.ulb.ac.be/dspace/bitstream/2013/355301/1/doi_338945.pdf

Artificial intelligence challenges for predicting the impact of mutations on protein stability.

Pucci, F., Schwersensky, M., & Rooman, M. (2022). Artificial intelligence challenges for predicting the impact of mutations on protein stability. Current opinion in structural biology, 72, 161-168. doi:10.1016/j.sbi.2021.11.001

Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Many studies have been devoted over the past decades to build new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.

https://dipot.ulb.ac.be/dspace/bitstream/2013/336730/5/2022_CurrOpin_Pucci_etal_AIreview.pdf

Using metagenomic data to boost protein structure prediction and discovery

Hou, Q., Pucci, F., Pan, F., Xue, F., Rooman, M., & Feng, Q. (2022). Using metagenomic data to boost protein structure prediction and discovery. Computational and Structural Biotechnology Journal, 20, 434-442. doi:10.1016/j.csbj.2021.12.030

Over the past decade, metagenomic sequencing approaches have been providing an ever-increasing amount of protein sequence data at an astonishing rate. These constitute an invaluable source of information which has been exploited in various research fields such as the study of the role of the gut microbiota in human diseases and aging. However, only a small fraction of all metagenomic sequences collected have been functionally or structurally characterized, leaving much of them completely unexplored. Here, we review how this information has been used in protein structure prediction and protein discovery. We begin by presenting some widely used metagenomic databases and analyze in detail how metagenomic data has contributed to the impressive improvement in the accuracy of structure prediction methods in recent years. We then examine how metagenomic information can be exploited to annotate protein sequences. More specifically, we focus on the role of metagenomes in the discovery of enzymes and new CRISPR-Cas systems, and in the identification of antibiotic resistance genes. With this review, we provide an overview of how metagenomic data is currently revolutionizing our understanding of protein science.

https://dipot.ulb.ac.be/dspace/bitstream/2013/338024/1/doi_321668.pdf

MutaFrame—an interpretative visualization framework for deleteriousness prediction of missense variants in the human exome

Ancien, F., Pucci, F., Vranken, W. F., & Rooman, M. (2022). MutaFrame—an interpretative visualization framework for deleteriousness prediction of missense variants in the human exome. Bioinformatics, 38(1), 265-266. doi:10.1093/bioinformatics/btab453

Motivation: High-throughput experiments are generating ever increasing amounts of various -omics data, thus shedding new light on the link between human disorders, their genetic causes and the related impact on protein behavior and structure. While numerous bioinformatics tools now exist that predict which variants in the human exome cause diseases, few tools predict the reasons why they might do so. Yet, understanding the impact of variants at the molecular level is a prerequisite for the rational development of targeted drugs or personalized therapies. Results: We present the updated MutaFrame webserver, which aims to meet this need. It offers two deleteriousness prediction tools, DEOGEN2 and SNPMuSiC, and is designed for bioinformaticians and medical researchers who want to gain insights into the origins of monogenic diseases. It contains information at two levels for each human protein: its amino acid sequence and its three-dimensional structure; we used the experimental structures whenever available, and modeled structures otherwise. MutaFrame also includes higher-level information, such as protein essentiality and protein-protein interactions. It has a user-friendly interface for the interpretation of results and a convenient visualization system for protein structures, in which the variant positions introduced by the user and other structural information are shown. In this way, MutaFrame aids our understanding of the pathogenic processes caused by single-site mutations and their molecular and contextual interpretation. Availability and implementation: MutaFrame webserver at http://mutaframe.com/. Supplementary information: Supplementary data are available at Bioinformatics online.

https://dipot.ulb.ac.be/dspace/bitstream/2013/338026/1/doi_321670.pdf

2021

BRANEart: Identify Stability Strength and Weakness Regions in Membrane Proteins

Basu, S. C., Assaf, S., Teheux, F., Rooman, M., & Pucci, F. (2021). BRANEart: Identify Stability Strength and Weakness Regions in Membrane Proteins. Frontiers in bioinformatics, 1, 742843. doi:10.3389/fbinf.2021.742843

Understanding the role of stability strengths and weaknesses in proteins is a key objective for rationalizing their dynamical and functional properties such as conformational changes, catalytic activity, and protein-protein and protein-ligand interactions. We present BRANEart, a new, fast and accurate method to evaluate the per-residue contributions to the overall stability of membrane proteins. It is based on an extended set of recently introduced statistical potentials derived from membrane protein structures, which better describe the stability properties of this class of proteins than standard potentials derived from globular proteins. We defined a per-residue membrane propensity index from combinations of these potentials, which can be used to identify residues which strongly contribute to the stability of the transmembrane region or which would, on the contrary, be more stable in extramembrane regions, or vice versa. Large-scale application to membrane and globular protein sets and application to test cases show excellent agreement with experimental data. BRANEart thus appears as a useful instrument to analyze in detail the overall stability properties of a target membrane protein, to position it relative to the lipid bilayer, and to rationally modify its biophysical characteristics and function. BRANEart can be freely accessed from http://babylone.3bio.ulb.ac.be/BRANEart.

https://dipot.ulb.ac.be/dspace/bitstream/2013/338027/3/2021_FrontBioinform_Basu_etal_BRANEart.pdf

Perturbing dimer interactions and allosteric communication modulates the immunosuppressive activity of human galectin-7

Pham, N. T. H., Létourneau, M., Fortier, M., Bégin, G., Al-Abdul-Wahid, S., Pucci, F., Folch, B., Rooman, M., Chatenet, D., St Pierre, Y., Lagüe, P., Calmettes, C., & Doucet, N. (2021). Perturbing dimer interactions and allosteric communication modulates the immunosuppressive activity of human galectin-7. The Journal of biological chemistry, 297(5), 101308. doi:10.1016/j.jbc.2021.101308

The design of allosteric modulators to control protein function is a key objective in drug discovery programs. Altering functionally essential allosteric residue networks provides unique protein family subtype specificity, minimizes unwanted off-target effects, and helps avert resistance acquisition typically plaguing drugs that target orthosteric sites. In this work, we used protein engineering and dimer interface mutations to positively and negatively modulate the immunosuppressive activity of the proapoptotic human galectin-7 (GAL-7). Using the PoPMuSiC and BeAtMuSiC algorithms, mutational sites and residue identity were computationally probed and predicted to either alter or stabilize the GAL-7 dimer interface. By designing a covalent disulfide bridge between protomers to control homodimer strength and stability, we demonstrate the importance of dimer interface perturbations on the allosteric network bridging the two opposite glycan-binding sites on GAL-7, resulting in control of induced apoptosis in Jurkat T cells. Molecular investigation of G16X GAL-7 variants using X-ray crystallography, biophysical, and computational characterization illuminates residues involved in dimer stability and allosteric communication, along with discrete long-range dynamic behaviors involving loops 1, 3, and 5. We show that perturbing the protein-protein interface between GAL-7 protomers can modulate its biological function, even when the overall structure and ligand-binding affinity remain unaltered. This study highlights new avenues for the design of galectin-specific modulators influencing both glycan-dependent and glycan-independent interactions.

https://dipot.ulb.ac.be/dspace/bitstream/2013/338025/1/doi_321669.pdf

Quantifying renin-angiotensin-system alterations in COVID-19

Pucci, F., Annoni, F., Dos Santos, R. A. S., Taccone, F. S., & Rooman, M. (2021). Quantifying renin-angiotensin-system alterations in COVID-19. Cells, 10(10), 2755. doi:10.3390/cells10102755

The renin-angiotensin system (RAS) plays a pivotal role in a wide series of physiological processes, including inflammation and blood pressure regulation. One of its key components, the angiotensin-converting enzyme 2, has been identified as the entry point of the SARS-CoV-2 virus into the host cells, and therefore a lot of research has been devoted to studying RAS dysregulation in COVID-19. Here we discuss the alterations of the regulatory RAS axes due to SARS-CoV-2 infection on the basis of a series of recent clinical investigations and experimental analyses quantifying, e.g., the levels and activity of RAS components. We performed a comprehensive meta-analysis of these data with a view to disentangling the links between the impaired RAS functioning and the pathophysiological characteristics of COVID-19. We also review the effects of several RAS-targeting drugs and how they could potentially help restore the normal RAS functionality and minimize the COVID-19 severity. Finally, we discuss the conflicting evidence found in the literature and the open questions on RAS dysregulation in SARS-CoV-2 infection whose resolution would improve our understanding of COVID-19.

https://dipot.ulb.ac.be/dspace/bitstream/2013/333673/1/doi_317317.pdf

Discopolis 2.0: A new recursive version of the algorithm for uniform sampling of metabolic flux distributions with linear programming

Bogaerts, P., & Rooman, M. (2021). Discopolis 2.0: A new recursive version of the algorithm for uniform sampling of metabolic flux distributions with linear programming. IFAC-PapersOnLine, 54(3), 300-305. doi:10.1016/j.ifacol.2021.08.258

Metabolic flux values are subject to equality (e.g., mass balances, measured fluxes) and inequality (e.g., upper and lower flux bounds) constraints. The system is generally underdetermined, i.e. with more unknown fluxes than equations, and all the admissible solutions belong to a convex polytope. Sampling that polytope allows subsequently computing marginal distributions for each metabolic flux. We propose a new version of the DISCOPOLIS algorithm (DIscrete Sampling of COnvex POlytopes via Linear program Iterative Sequences) that provides the same weight to all the samples and that approximates a uniform distribution thanks to a recursive loop that computes variable numbers (called grid points) of samples depending on the fluxes that have already been fixed in former iterations. The method is illustrated on three different case studies (with 3, 95 and 1054 fluxes) and shows interesting results in terms of flux distribution convergence and large ranges of the marginal flux distributions. Three consistent criteria are proposed to choose the optimal maximum number of grid points.

https://dipot.ulb.ac.be/dspace/bitstream/2013/335189/1/elsevier_318833.pdf
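The constraint structure described in the Discopolis 2.0 abstract (equality constraints such as mass balances, plus lower and upper flux bounds, defining a convex polytope from which marginal flux distributions are computed) can be illustrated with a minimal sketch. This is not the DISCOPOLIS algorithm, which relies on linear programming and grid points; it swaps in plain rejection sampling. The three-flux network, its bounds, and all function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy network (an assumption, not from the paper):
# one mass balance v1 = v2 + v3, with lower/upper bounds on each flux.
BOUNDS = {"v1": (0.0, 10.0), "v2": (0.0, 8.0), "v3": (0.0, 6.0)}


def sample_fluxes(n_samples, seed=0):
    """Draw flux vectors uniformly from the admissible polytope by rejection."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n_samples:
        # Sample the two free fluxes uniformly inside their bounding box.
        v2 = rng.uniform(*BOUNDS["v2"])
        v3 = rng.uniform(*BOUNDS["v3"])
        # The equality constraint (mass balance) fixes the third flux.
        v1 = v2 + v3
        # Keep the point only if the inequality constraints on v1 hold.
        lo, hi = BOUNDS["v1"]
        if lo <= v1 <= hi:
            samples.append((v1, v2, v3))
    return samples


def marginal_range(samples, index):
    """Return the (min, max) of one flux over all sampled points."""
    values = [s[index] for s in samples]
    return min(values), max(values)


samples = sample_fluxes(2000)
for i, name in enumerate(("v1", "v2", "v3")):
    lo, hi = marginal_range(samples, i)
    print(f"{name}: sampled range [{lo:.2f}, {hi:.2f}]")
```

Rejection sampling is only practical for very small networks such as this one; for case studies with hundreds of fluxes, as in the paper, the acceptance rate would collapse, which is one motivation for dedicated polytope-sampling methods.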

Prediction and evolution of the molecular fitness of SARS-CoV-2 variants: Introducing SpikePro

Pucci, F., & Rooman, M. (2021). Prediction and evolution of the molecular fitness of SARS-CoV-2 variants: Introducing SpikePro. Viruses, 13(5), 935. doi:10.3390/v13050935

The understanding of the molecular mechanisms driving the fitness of the SARS-CoV-2 virus and its mutational evolution is still a critical issue. We built a simplified computational model, called SpikePro, to predict the SARS-CoV-2 fitness from the amino acid sequence and structure of the spike protein. It contains three contributions: the inter-human transmissibility of the virus predicted from the stability of the spike protein, the infectivity computed in terms of the affinity of the spike protein for the ACE2 receptor, and the ability of the virus to escape from the human immune response based on the binding affinity of the spike protein for a set of neutralizing antibodies. Our model reproduces well the available experimental, epidemiological and clinical data on the impact of variants on the biophysical characteristics of the virus. For example, it is able to identify circulating viral strains that, by increasing their fitness, recently became dominant at the population level. SpikePro is a useful, freely available instrument which predicts rapidly and with good accuracy the dangerousness of new viral strains. It can be integrated and play a fundamental role in the genomic surveillance programs of the SARS-CoV-2 virus that, despite all the efforts, remain time-consuming and expensive.

https://dipot.ulb.ac.be/dspace/bitstream/2013/326911/1/doi_310555.pdf

In silico analysis of the molecular-level impact of SMPD1 variants on Niemann-Pick disease severity

Ancien, F., Pucci, F., & Rooman, M. (2021). In silico analysis of the molecular-level impact of SMPD1 variants on Niemann-Pick disease severity. International journal of molecular sciences, 22(9), 4516. doi:10.3390/ijms22094516

Sphingomyelin phosphodiesterase (SMPD1) is a key enzyme in the sphingolipid metabolism. Genetic SMPD1 variants have been related to the Niemann-Pick lysosomal storage disorder, which has different degrees of phenotypic severity ranging from severe symptomatology involving the central nervous system (type A) to milder ones (type B). They have also been linked to neurodegenerative disorders such as Parkinson's and Alzheimer's diseases. In this paper, we leveraged structural, evolutionary and stability information on SMPD1 to predict and analyze the impact of variants at the molecular level. We developed the SMPD1-ZooM algorithm, which is able to predict with good accuracy whether variants cause Niemann-Pick disease and its phenotypic severity; the predictor is freely available for download. We performed a large-scale analysis of all possible SMPD1 variants, which led us to identify protein regions that are either robust or fragile with respect to amino acid variations, and show the importance of aromatic-involving interactions in SMPD1 function and stability. Our study also revealed a good correlation between SMPD1-ZooM scores and in vitro loss of SMPD1 activity. The understanding of the molecular effects of SMPD1 variants is of crucial importance to improve genetic screening of SMPD1-related disorders and to develop personalized treatments that restore SMPD1 functionality.

https://dipot.ulb.ac.be/dspace/bitstream/2013/325010/1/doi_308654.pdf

SWOTein: a structure-based approach to predict stability Strengths and Weaknesses of prOTEINs.

Hou, Q., Pucci, F., Ancien, F., Kwasigroch, J.-M., Bourgeas, R., & Rooman, M. (2021). SWOTein: a structure-based approach to predict stability Strengths and Weaknesses of prOTEINs. Bioinformatics. doi:10.1093/bioinformatics/btab034

Although structured proteins adopt their lowest free energy conformation in physiological conditions, the individual residues are generally not in their lowest free energy conformation. Residues that are stability weaknesses are often involved in functional regions, whereas stability strengths ensure local structural stability. The detection of strengths and weaknesses provides key information to guide protein engineering experiments aiming to modulate folding and various functional processes.

https://dipot.ulb.ac.be/dspace/bitstream/2013/326615/3/2021_Bioinformatics_Hou_etal_SWOTein.pdf

2020

Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness

Schwersensky, M., Rooman, M., & Pucci, F. (2020). Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC biology, 18(1), 146. doi:10.1186/s12915-020-00870-9

Background: How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. Results: At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. Conclusion: Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
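The ranking by codon position reported in this abstract (smallest destabilization for base III substitutions, then base I, then base II) can be related to the degeneracy of the standard genetic code. The sketch below does not reproduce the paper's folding free energy calculations. It only counts, for each codon position, the fraction of single-base substitutions between sense codons that leave the amino acid unchanged. The function name and the choice to skip stop codons are assumptions made for this illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G at each position; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction(position):
    """Fraction of single-base changes at `position` (0, 1 or 2) that are
    synonymous, counting only substitutions between sense codons."""
    total = 0
    synonymous = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            if CODE[mutant] == "*":
                continue  # skip nonsense mutations
            total += 1
            if CODE[mutant] == aa:
                synonymous += 1
    return synonymous / total


for pos, label in enumerate(("base I", "base II", "base III")):
    print(f"{label}: {synonymous_fraction(pos):.3f} synonymous")
```

Under these assumptions the synonymous fraction is highest for base III, lower for base I, and zero for base II, the same order as the destabilization ranking in the abstract. Synonymous changes are only part of the picture; the paper's analysis rests on estimated folding free energy changes.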

Modeling the Molecular Impact of SARS-CoV-2 Infection on the Renin-Angiotensin System

Pucci, F., Bogaerts, P., & Rooman, M. (2020). Modeling the Molecular Impact of SARS-CoV-2 Infection on the Renin-Angiotensin System. Viruses, 12(12), 1367. doi:10.3390/v12121367

SARS-CoV-2 infection is mediated by the binding of its spike protein to the angiotensin-converting enzyme 2 (ACE2), which plays a pivotal role in the renin-angiotensin system (RAS). The study of RAS dysregulation due to SARS-CoV-2 infection is fundamentally important for a better understanding of the pathogenic mechanisms and risk factors associated with COVID-19 coronavirus disease and to design effective therapeutic strategies. In this context, we developed a mathematical model of RAS based on data regarding protein and peptide concentrations; the model was tested on clinical data from healthy normotensive and hypertensive individuals. We used our model to analyze the impact of SARS-CoV-2 infection on RAS, which we modeled through a downregulation of ACE2 as a function of viral load. We also used it to predict the effect of RAS-targeting drugs, such as RAS-blockers, human recombinant ACE2, and angiotensin 1-7 peptide, on COVID-19 patients; the model predicted an improvement of the clinical outcome for some drugs and a worsening for others. Our model and its predictions constitute a valuable framework for in silico testing of hypotheses about the COVID-19 pathogenic mechanisms and the effect of drugs aiming to restore RAS functionality.

https://dipot.ulb.ac.be/dspace/bitstream/2013/315105/1/doi_298749.pdf

Inhibition of aquaporin-1 prevents myocardial remodeling by blocking the transmembrane transport of hydrogen peroxide

Montiel, V., Bella, R., Michel, L. Y. M., Esfahani, H., De Mulder, D., Robinson, E. L., Deglasse, J.-P., Tiburcy, M., Chow, P. H., Jonas, J.-C. J., Gilon, P., Steinhorn, B., Michel, T., Beauloye, C., Bertrand, L., Farah, C., Dei Zotti, F., Debaix, H., Bouzin, C., Brusa, D., Horman, S., Vanoverschelde, J.-L., Bergmann, O., Gilis, D., Rooman, M., Ghigo, A., Geninatti-Crich, S., Yool, A., Zimmermann, W. H., Roderick, L., Devuyst, O., & Balligand, J.-L. (2020). Inhibition of aquaporin-1 prevents myocardial remodeling by blocking the transmembrane transport of hydrogen peroxide. Science Translational Medicine, 12(564), eaay2176. doi:10.1126/scitranslmed.aay2176

Pathological remodeling of the myocardium has long been known to involve oxidant signaling, but strategies using systemic antioxidants have generally failed to prevent it. We sought to identify key regulators of oxidant-mediated cardiac hypertrophy amenable to targeted pharmacological therapy. Specific isoforms of the aquaporin water channels have been implicated in oxidant sensing, but their role in heart muscle is unknown. RNA sequencing from human cardiac myocytes revealed that the archetypal AQP1 is a major isoform. AQP1 expression correlates with the severity of hypertrophic remodeling in patients with aortic stenosis. The AQP1 channel was detected at the plasma membrane of human and mouse cardiac myocytes from hypertrophic hearts, where it colocalized with NADPH oxidase-2 and caveolin-3. We show that hydrogen peroxide (H2O2), produced extracellularly, is necessary for the hypertrophic response of isolated cardiac myocytes and that AQP1 facilitates the transmembrane transport of H2O2 through its water pore, resulting in activation of oxidant-sensitive kinases in cardiac myocytes. Structural analysis of the amino acid residues lining the water pore of AQP1 supports its permeation by H2O2. Deletion of Aqp1 or selective blockade of the AQP1 intrasubunit pore inhibited H2O2 transport in mouse and human cells and rescued the myocyte hypertrophy in human induced pluripotent stem cell-derived engineered heart muscle. Treatment of mice with a clinically approved AQP1 inhibitor, Bacopaside, attenuated cardiac hypertrophy. We conclude that cardiac hypertrophy is mediated by the transmembrane transport of H2O2 by the water channel AQP1 and that inhibitors of AQP1 represent new possibilities for treating hypertrophic cardiomyopathies.

https://dipot.ulb.ac.be/dspace/bitstream/2013/313486/3/2020_Montiel_SciTransMed.pdf

Protein Thermal Stability Engineering Using HoTMuSiC.

Pucci, F., Kwasigroch, J.-M., & Rooman, M. (2020). Protein Thermal Stability Engineering Using HoTMuSiC. Methods in molecular biology, 2112, 59-73. doi:10.1007/978-1-0716-0270-6_5

The rational design of enzymes is a challenging research field, which plays an important role in the optimization of a wide series of biotechnological processes. Computational approaches make it possible to screen all possible amino acid substitutions in a target protein and to identify a subset likely to have the desired properties. They can thus be used to guide and restrict the huge, time-consuming search in sequence space to reach protein optimality. Here we present HoTMuSiC, a tool that predicts the impact of point mutations on the protein melting temperature, which uses the experimental or modeled protein structure as sole input and is available at the dezyme.com website. Its main advantages include accuracy and speed, which makes it a perfect instrument for thermal stability engineering projects aiming at designing new proteins that feature increased heat resistance or remain active and stable in nonphysiological conditions. We set up a HoTMuSiC-based pipeline, which uses additional information to avoid mutations of functionally important residues, identified as being too well conserved among homologous proteins or too close to annotated functional sites. The efficiency of this pipeline is successfully demonstrated on Rhizomucor miehei lipase.

https://dipot.ulb.ac.be/dspace/bitstream/2013/309533/3/2020_StrucBioinf_Pucci_etal._HoTMuSiCchapter.pdf

SOLart: a structure-based method to predict protein solubility and aggregation

Hou, Q., Kwasigroch, J.-M., Rooman, M., & Pucci, F. (2020). SOLart: a structure-based method to predict protein solubility and aggregation. Bioinformatics, 36(5), 1445-1452. doi:10.1093/bioinformatics/btz773

Motivation: The solubility of a protein is often decisive for its proper functioning. Lack of solubility is a major bottleneck in high-throughput structural genomic studies and in high-concentration protein production, and the formation of protein aggregates causes a wide variety of diseases. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools. Results: We have recently introduced solubility-dependent distance potentials that are able to unravel the role of residue-residue interactions in promoting or decreasing protein solubility. Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of Escherichia coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart performances are very good, with a Pearson correlation coefficient between experimental and predicted solubility values of almost 0.7 both in cross-validation on the training dataset and in an independent set of Saccharomyces cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and clearly outperforms state-of-the-art solubility predictors. It is available through a user-friendly webserver, which is easy to use by non-expert scientists. Availability and implementation: The SOLart webserver is freely available at http://babylone.ulb.ac.be/SOLART/. Supplementary information: Supplementary data are available at Bioinformatics online.

https://dipot.ulb.ac.be/dspace/bitstream/2013/300304/3/600734v1.full.pdf

Digenic inheritance of human primary microcephaly delineates centrosomal and non centrosomal pathways.

Duerinckx, S., Jacquemin, V., Drunat, S., Vial, Y., Passemard, S., Perazzolo, C., Massart, A., Soblet, J., Racapé, J., Desmyter, L., Badoer, C., Papadimitriou, S., Le Borgne, Y.-A., Lefort, A., Libert, F., De Maertelaer, V., Rooman, M., Costagliola, S., Verloes, A., Lenaerts, T., Pirson, I., & Abramowicz, M. (2020). Digenic inheritance of human primary microcephaly delineates centrosomal and non centrosomal pathways. Human mutation, 41(2), 512-524. doi:10.1002/humu.23948

Primary Microcephaly (PM) is characterized by a small head since birth and is vastly heterogeneous both genetically and phenotypically. While most cases are monogenic, genetic interactions between Aspm and Wdr62 have recently been described in a mouse model of PM. Here, we used two complementary, holistic in vivo approaches: high throughput DNA sequencing of multiple PM genes in human PM patients, and genome-edited zebrafish modeling for digenic inheritance of PM. Exomes of PM patients showed a significant burden of variants in 75 PM genes, that persisted after removing monogenic causes of PM (e.g., biallelic pathogenic variants in CEP152). This observation was replicated in an independent cohort of PM patients, where a PM gene panel showed in addition that the burden was carried by six centrosomal genes. Allelic frequencies were consistent with digenic inheritance. In zebrafish, non-centrosomal gene casc5 -/- produced a severe PM phenotype, that was not modified by centrosomal genes aspm or wdr62 invalidation. A digenic, quadriallelic PM phenotype was produced by aspm and wdr62. Our observations provide strong evidence for digenic inheritance of human PM, involving centrosomal genes. Absence of genetic interaction between casc5 and aspm or wdr62 further delineates centrosomal and non-centrosomal pathways in PM.

https://dipot.ulb.ac.be/dspace/bitstream/2013/296188/3/Duerinckx_et_al-2019-Human_Mutation.pdf
https://dipot.ulb.ac.be/dspace/bitstream/2013/296188/4/Supp_Mat.pdf

2019

DISCOPOLIS : an algorithm for uniform sampling of metabolic flux distributions via iterative sequences of linear programs

Bogaerts, P., & Rooman, M. (2019). DISCOPOLIS : an algorithm for uniform sampling of metabolic flux distributions via iterative sequences of linear programs. IFAC-PapersOnLine, 52(26), 269-274.

https://dipot.ulb.ac.be/dspace/bitstream/2013/300983/3/2019_IFAC_Bogaerts_Rooman_DISCOPOLIS.pdf

A comprehensive computational study of amino acid interactions in membrane proteins.

Mbaye, M. N., Hou, Q., Basu, S. C., Teheux, F., Pucci, F., & Rooman, M. (2019). A comprehensive computational study of amino acid interactions in membrane proteins. Scientific reports, 9(1), 12043. doi:10.1038/s41598-019-48541-2

Transmembrane proteins play a fundamental role in a wide series of biological processes but, despite their importance, they are less studied than globular proteins, essentially because their embedding in lipid membranes hampers their experimental characterization. In this paper, we improved our understanding of their structural stability through the development of new knowledge-based energy functions describing amino acid pair interactions that prevail in the transmembrane and extramembrane regions of membrane proteins. The comparison of these potentials and those derived from globular proteins yields an objective view of the relative strength of amino acid interactions in the different protein environments, and their role in protein stabilization. Separate potentials were also derived from α-helical and β-barrel transmembrane regions to investigate possible dissimilarities. We found that, in extramembrane regions, hydrophobic residues are less frequent but interactions between aromatic and aliphatic amino acids as well as aromatic-sulfur interactions contribute more to stability. In transmembrane regions, polar residues are less abundant but interactions between residues of equal or opposite charges or non-charged polar residues as well as anion-π interactions appear stronger. This shows indirectly the preference of the water and lipid molecules to interact with polar and hydrophobic residues, respectively. We applied these new energy functions to predict whether a residue is located in the trans- or extramembrane region, and obtained an AUC score of 83% in cross validation, which demonstrates their accuracy. As their application is, moreover, extremely fast, they are optimal instruments for membrane protein design and large-scale investigations of membrane protein stability.

https://dipot.ulb.ac.be/dspace/bitstream/2013/292345/4/doi_275972.pdf

Relation between DNA ionization potentials, single base substitutions and pathogenic variants

Pucci, F., & Rooman, M. (2019). Relation between DNA ionization potentials, single base substitutions and pathogenic variants. BMC genomics, 20, 551. doi:10.1186/s12864-019-5867-y

Background: It is nowadays clear that single base substitutions that occur in the human genome, of which some lead to pathogenic conditions, are non-random and influenced by their flanking nucleobase sequences. However, despite recent progress, the understanding of these "non-local" effects is still far from being achieved. Results: To advance this problem, we analyzed the relationship between the base mutability in specific gene regions and the electron hole transport along the DNA base stacks, as it is one of the mechanisms that have been suggested to contribute to these effects. More precisely, we studied the connection between the normalized frequency of single base substitutions and the vertical ionization potential of the base and its flanking sequence, estimated using MP2/6-31G* ab initio quantum chemistry calculations. We found a statistically significant overall anticorrelation between these two quantities: the lower the vIP value, the more probable the substitution. Moreover, the slope of the regression lines varies. It is larger for introns than for exons and untranslated regions, and for synonymous than for missense substitutions. Interestingly, the correlation appears to be more pronounced when considering the flanking sequence of the substituted base in the 3' rather than in the 5' direction, which corresponds to the preferred direction of charge migration. A weaker but still statistically significant correlation is found between the ionization potentials and the pathogenicity of the base substitutions. Moreover, pathogenicity is also preferentially associated with larger changes in ionization potentials upon base substitution. Conclusions: With this analysis we gained new insights into the complex biophysical mechanisms that are at the basis of mutagenesis and pathogenicity, and supported the role of electron-hole transport in these matters.

https://dipot.ulb.ac.be/dspace/bitstream/2013/297124/1/doi_280768.pdf

Rational antibiotic design: in silico structural comparison of the functional cavities of penicillin-binding proteins and β-lactamases

Mbaye, M. N., Gilis, D., & Rooman, M. (2019). Rational antibiotic design: in silico structural comparison of the functional cavities of penicillin-binding proteins and β-lactamases. Journal of biomolecular structure & dynamics, 37(1), 65-74. doi:10.1080/07391102.2017.1418678

The class of β-lactam antibiotics has proven highly efficient in targeting bacterial penicillin-binding proteins (PBP) leading to the blocking of the bacterial cell wall synthesis. However, the benefit of these drugs is limited because of bacterial resistance mechanisms; the most widespread resistance involves β-lactamase enzymes (βLACT) that inactivate β-lactam-based molecules. We focused on PBPs and βLACTs from enterobacteria, and performed a detailed in silico study of PBPs whose inactivation is lethal for the bacteria and of βLACTs that have a PBP-type catalytic mechanism. The comparison of the sequences and structures of PBPs and βLACTs shows an almost perfect conservation of the catalytic site, and a high spatial resemblance of the whole functional cavity despite a very low overall sequence identity. Some notable differences in the functional cavity were observed in the vicinity of the catalytic site: four tyrosines are well conserved in the PBPs, whereas the residues occurring at equivalent positions in the βLACT families present other physicochemical properties. These tyrosines are thus good candidates to be targeted in designing new antibiotic molecules with increased affinity and specificity for PBPs, with the goal of overcoming drug resistance. Our analysis also identified residues that have similar characteristics in most βLACT families and different properties in PBPs; these are interesting targets for new ligands that specifically inhibit βLACT proteins. The in silico approach presented here can be extended to other protein systems in view of guiding and improving rational drug design.

https://dipot.ulb.ac.be/dspace/bitstream/2013/296578/3/2018_Mbaye_JBSD.pdf

2018

Computational analysis of the amino acid interactions that promote or decrease protein solubility

Hou, Q., Bourgeas, R., Pucci, F., & Rooman, M. (2018). Computational analysis of the amino acid interactions that promote or decrease protein solubility. Scientific reports, 8(1), 14661. doi:10.1038/s41598-018-32988-w

The solubility of globular proteins is a basic biophysical property that is usually a prerequisite for their functioning. In this study, we probed the solubility of globular proteins with the help of the statistical potential formalism, in view of objectifying the connection of solubility with structural and energetic properties and of the solubility-dependence of specific amino acid interactions. We started by setting up two independent datasets containing either soluble or aggregation-prone proteins with known structures. From these two datasets, we computed solubility-dependent distance potentials that are by construction biased towards the solubility of the proteins from which they are derived. Their analysis showed the clear preference of amino acid interactions such as Lys-containing salt bridges and aliphatic interactions to promote protein solubility, whereas others such as aromatic, His-π, cation-π, amino-π and anion-π interactions rather tend to reduce it. These results indicate that interactions involving delocalized π-electrons favor aggregation, unlike those involving no (or few) dispersion forces. Furthermore, using our potentials derived from either highly or weakly soluble proteins to compute protein folding free energies, we found that the difference between these two energies correlates better with solubility than other properties analyzed before such as protein length, isoelectric point and aliphatic index. This is, to the best of our knowledge, the first comprehensive in silico study of the impact of residue-residue interactions on protein solubility properties. The results of this analysis provide new insights that will facilitate future rational protein design applications aimed at modulating the solubility of targeted proteins.

https://dipot.ulb.ac.be/dspace/bitstream/2013/282571/3/doi_266198.pdf

Deciphering noise amplification and reduction in open chemical reaction networks

Pucci, F., & Rooman, M. (2018). Deciphering noise amplification and reduction in open chemical reaction networks. Journal of the Royal Society interface, 15(149), 20180805. doi:10.1098/rsif.2018.0805

The impact of fluctuations on the dynamical behaviour of complex biological systems is a longstanding issue, whose understanding would elucidate how evolutionary pressure tends to modulate intrinsic noise. Using the Itō stochastic differential equation formalism, we performed analytic and numerical analyses of model systems containing different molecular species in contact with the environment and interacting with each other through mass-action kinetics. For networks of zero deficiency, which admit a detailed- or complex-balanced steady state, all molecular species are uncorrelated and their Fano factors are Poissonian. Systems of higher deficiency have non-equilibrium steady states and non-zero reaction fluxes flowing between the complexes. When they model homo-oligomerization, the noise on each species is reduced when the flux flows from the oligomers of lowest to highest degree, and amplified otherwise. In the case of hetero-oligomerization systems, only the noise on the highest-degree species shows this behaviour.

https://dipot.ulb.ac.be/dspace/bitstream/2013/283642/4/2018_JRoyalSocInterface_Pucci_Rooman_Noise.pdf

Quantification of biases in predictions of protein stability changes upon mutations.

Pucci, F., Bernaerts, K., Kwasigroch, J.-M., & Rooman, M. (2018). Quantification of biases in predictions of protein stability changes upon mutations. Bioinformatics, 34(21), 3659-3665. doi:10.1093/bioinformatics/bty348

Bioinformatics tools that predict protein stability changes upon point mutations have made a lot of progress in the last decades and have become accurate and fast enough to make computational mutagenesis experiments feasible, even on a proteome scale. Despite these achievements, they still suffer from important issues that must be solved to allow further improving their performances and utilizing them to deepen our insights into protein folding and stability mechanisms. One of these problems is their bias toward the learning datasets which, being dominated by destabilizing mutations, causes predictions to be better for destabilizing than for stabilizing mutations.

Large-scale in-silico statistical mutagenesis analysis sheds light on the deleteriousness landscape of the human proteome.

Raimondi, D., Orlando, G., Tabaro, F., Lenaerts, T., Rooman, M., Moreau, Y., & Vranken, W. F. (2018). Large-scale in-silico statistical mutagenesis analysis sheds light on the deleteriousness landscape of the human proteome. Scientific reports, 8(1), 16980. doi:10.1038/s41598-018-34959-7

Next generation sequencing technologies are providing increasing amounts of sequencing data, paving the way for improvements in clinical genetics and precision medicine. The interpretation of the observed genomic variants in the light of their phenotypic effects is thus emerging as a crucial task to solve in order to advance our understanding of how exomic variants affect proteins and how the proteins' functional changes affect human health. Since the experimental evaluation of the effects of every observed variant is unfeasible, Bioinformatics methods are being developed to address this challenge in-silico, by predicting the impact of millions of variants, thus providing insight into the deleteriousness landscape of entire proteomes. Here we show the feasibility of this approach by using the recently developed DEOGEN2 variant-effect predictor to perform the largest in-silico mutagenesis scan to date. We computed the deleteriousness score of 170 million variants over 15000 human proteins and we analysed the results, investigating how the predicted deleteriousness landscape of the proteins relates to known functionally and structurally relevant protein regions and biophysical properties. Moreover, we qualitatively validated our results by comparing them with two mutagenesis studies targeting two specific proteins, showing the consistency of DEOGEN2 predictions with respect to experimental data.

https://dipot.ulb.ac.be/dspace/bitstream/2013/283645/4/doi_267272.pdf

Insights into noise modulation in oligomerization systems of increasing complexity

Pucci, F., & Rooman, M. (2018). Insights into noise modulation in oligomerization systems of increasing complexity. Physical Review E, 98(1), 012137. doi:10.1103/PhysRevE.98.012137

Understanding under which conditions the increase of systems complexity is evolutionarily advantageous, and how this trend is related to the modulation of the intrinsic noise, are fascinating issues of utmost importance for synthetic and systems biology. To get insights into these matters, we analyzed a series of chemical reaction networks with different topologies and complexity, described by mass-action kinetics. We showed, analytically and numerically, that the global level of fluctuations at the steady state, measured by the sum over all species of the Fano factors of the number of molecules, is directly related to the network's deficiency. For zero-deficiency systems, this sum is constant and equal to the rank of the network. For higher deficiencies, additional terms appear in the Fano factor sum, which are proportional to the net reaction fluxes between the molecular complexes. We showed that the system's global intrinsic noise is reduced when all fluxes flow from lower to higher degree oligomers, or equivalently, towards the species of higher complexity, whereas it is amplified when the fluxes are directed towards lower complexity species.

Lexicon Visualization Library and Javascript for Scientific data visualization

Tanyalçin, I., Ferte, J., Ancien, F., Smits, G., Rooman, M., & Vranken, W. F. (2018). Lexicon Visualization Library and Javascript for Scientific data visualization. Computing in science & engineering, 20, 50-65. doi:10.1109/MCSE.2018.011111125

It is becoming increasingly challenging to efficiently visualize and extract useful insight from complex and big datasets. JavaScript stands out as a suitable programming choice that offers mature libraries, easy implementation, and extensive customization, all of which stay in the shadow of new and rapid developments in the language. To illustrate the use of JavaScript in a scientific context, this article elaborates on Lexicon, a collection of JavaScript libraries for generating interactive visualizations in bioinformatics and other custom libraries.

Prediction and interpretation of deleterious coding variants in terms of protein structural stability

Ancien, F., Pucci, F., Godfroid, M., & Rooman, M. (2018). Prediction and interpretation of deleterious coding variants in terms of protein structural stability. Scientific reports, 8, 4480. doi:10.1038/s41598-018-22531-2

The classification of human genetic variants into deleterious and neutral is a challenging issue, whose complexity is rooted in the large variety of biophysical mechanisms that can be responsible for disease conditions. For non-synonymous mutations in structured proteins, one of these is the protein stability change, which can lead to loss of protein structure or function. We developed a stability-driven knowledge-based classifier that uses protein structure, artificial neural networks and solvent accessibility-dependent combinations of statistical potentials to predict whether destabilizing or stabilizing mutations are disease-causing. Our predictor yields a balanced accuracy of 71% in cross validation. As expected, it has a very high positive predictive value of 89%: it predicts with high accuracy the subset of mutations that are deleterious because of stability issues, but is by construction unable to classify variants that are deleterious for other reasons. Its combination with an evolutionary-based predictor increases the balanced accuracy up to 75%, and allowed predicting more than 1/4 of the variants with 95% positive predictive value. Our method, called SNPMuSiC, can be used with both experimental and modeled structures and compares favorably with other prediction tools on several independent test sets. It constitutes a step towards interpreting variant effects at the molecular scale. SNPMuSiC is freely available at https://soft.dezyme.com/.
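
For readers unfamiliar with the two metrics quoted in this abstract, the following minimal sketch shows how balanced accuracy and positive predictive value are computed from binary labels. It is not SNPMuSiC code: the function names and the toy labels (1 = deleterious, 0 = neutral) are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, 1 = deleterious, 0 = neutral."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def positive_predictive_value(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
bacc = balanced_accuracy(tp, fp, tn, fn)
ppv = positive_predictive_value(tp, fp)
```

A predictor can combine a moderate balanced accuracy with a high positive predictive value when it misses many positives but is rarely wrong about the ones it does flag, which is the behaviour the abstract describes.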

Intrinsic noise modulation in closed oligomerization-type system

Rooman, M., & Pucci, F. (2018). Intrinsic noise modulation in closed oligomerization-type system. IFAC-PapersOnLine, 51(2), 649-653. doi:10.1016/j.ifacol.2018.03.110

How random fluctuations impact on biological systems and what is their relationship with complexity and energetic cooperativity are challenging questions that are far from being elucidated. Using the stochastic differential equation formalism, we studied analytically the effect of fluctuations on a series of oligomerization processes, in which several molecules of the same or different species interact to form complexes, without interaction with the environment. The conservation of the total number of molecules within the systems imposes constraints on the stochastic quantities, among which the negativity of the covariances and the vanishing of the determinant of the covariance matrix. The intrinsic noise on the number of molecules of each species is represented by the Fano factor, defined as the variance to mean ratio. At the equilibrium steady states, the sum of the Fano factors of all molecular species is equal to the rank of the system, independently of the parameters. The Fano factors of the individual molecular species are, however, parameter dependent. We found that when the free energy cooperativity of the reactions increases, the intrinsic noise on the oligomeric product decreases, and is compensated by a higher noise on the monomeric reactants and/or intermediate states. The noise reduction is moreover more pronounced for higher complexity systems, involving oligomers of higher degrees.
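
The Fano factor defined in this abstract, the variance-to-mean ratio of the number of molecules, can be estimated from sampled copy numbers as in the sketch below. This only illustrates the definition and is not the authors' analytical derivation; the function name and the sample counts are hypothetical.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Variance-to-mean ratio of a sequence of molecule counts.

    A value of 1 corresponds to Poissonian noise; values below 1
    indicate noise reduction and values above 1 noise amplification.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined for a zero mean")
    # Population variance is used here; for long trajectories the
    # difference from the sample variance is negligible.
    return pvariance(counts) / m


# Hypothetical copy numbers of one species sampled at steady state.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
f = fano_factor(counts)
```

Summing `fano_factor` over the trajectories of all species would give an empirical estimate of the quantity that the abstract states equals the rank of the system at the equilibrium steady state.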

https://dipot.ulb.ac.be/dspace/bitstream/2013/264290/3/Elsevier_247917.pdf

2017

SCooP: an accurate and fast predictor of protein stability curves as a function of temperature.

Pucci, F., Kwasigroch, J.-M., & Rooman, M. (2017). SCooP: an accurate and fast predictor of protein stability curves as a function of temperature. Bioinformatics, 33(21), 3415-3422. doi:10.1093/bioinformatics/btx417

The molecular bases of protein stability remain far from elucidated even though substantial progress has been made through both computational and experimental investigations. One of the most challenging goals is the development of accurate prediction tools of the temperature dependence of the standard folding free energy ΔG(T). Such predictors have an enormous series of potential applications, which range from drug design in the biopharmaceutical sector to the optimization of enzyme activity for biofuel production. There is thus an important demand for novel, reliable and fast predictors.

DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins.

Raimondi, D., Tanyalçin, I., Ferte, J., Gazzo, A., Orlando, G., Lenaerts, T., Rooman, M., & Vranken, W. F. (2017). DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins. Nucleic acids research, 45(W1), W201-W206. doi:10.1093/nar/gkx390

High-throughput sequencing methods are generating enormous amounts of genomic data, giving unprecedented insights into human genetic variation and its relation to disease. An individual human genome contains millions of Single Nucleotide Variants: to discriminate the deleterious from the benign ones, a variety of methods have been developed that predict whether a protein-coding variant likely affects the carrier individual's health. We present such a method, DEOGEN2, which incorporates heterogeneous information about the molecular effects of the variants, the domains involved, the relevance of the gene and the interactions in which it participates. This extensive contextual information is non-linearly mapped into one single deleteriousness score for each variant. Since for the non-expert user it is sometimes still difficult to assess what this score means, how it relates to the encoded protein, and where it originates from, we developed an interactive online framework (http://deogen2.mutaframe.com/) to better present the DEOGEN2 deleteriousness predictions of all possible variants in all human proteins. The prediction is visualized so both expert and non-expert users can gain insights into the meaning, protein context and origins of each prediction.

https://dipot.ulb.ac.be/dspace/bitstream/2013/254250/3/doi_237877.pdf

Physical and molecular bases of protein thermal stability and cold adaptation.

Pucci, F., & Rooman, M. (2017). Physical and molecular bases of protein thermal stability and cold adaptation. Current opinion in structural biology, 42, 117-128. doi:10.1016/j.sbi.2016.12.007

The molecular bases of thermal and cold stability and adaptation, which allow proteins to remain folded and functional in the temperature ranges in which their host organisms live and grow, are still only partially elucidated. Indeed, both experimental and computational studies fail to yield a fully precise and global physical picture, essentially because all effects are context-dependent and thus quite intricate to unravel. We present a snapshot of the current state of knowledge of this highly complex and challenging issue, whose resolution would enable large-scale rational protein design.

https://dipot.ulb.ac.be/dspace/bitstream/2013/246177/1/Elsevier_229804.pdf

SEPIa, a knowledge-driven algorithm for predicting conformational B-cell epitopes from the amino acid sequence.

Dalkas, G. A., & Rooman, M. (2017). SEPIa, a knowledge-driven algorithm for predicting conformational B-cell epitopes from the amino acid sequence. BMC bioinformatics, 18(1), 95. doi:10.1186/s12859-017-1528-9

The identification of immunogenic regions on the surface of antigens, which are able to be recognized by antibodies and to trigger an immune response, is a major challenge for the design of new and effective vaccines. The prediction of such regions through computational immunology techniques is a challenging goal, which will ultimately lead to a drastic limitation of the experimental tests required to validate their efficiency. However, current methods are far from being sufficiently reliable and/or applicable on a large scale.

https://dipot.ulb.ac.be/dspace/bitstream/2013/247338/4/doi_230965.pdf

2016

Improved insights into protein thermal stability: From the molecular to the structurome scale

Pucci, F., & Rooman, M. (2016). Improved insights into protein thermal stability: From the molecular to the structurome scale. Philosophical transactions - Royal Society. Mathematical, Physical and engineering sciences, 374(2080), 20160141. doi:10.1098/rsta.2016.0141

Despite the intense efforts of the last decades to understand the thermal stability of proteins, the mechanisms responsible for its modulation still remain debated. In this investigation, we tackle this issue by showing how a multiscale perspective can yield new insights. With the help of temperature-dependent statistical potentials, we analysed some amino acid interactions at the molecular level, which are suggested to be relevant for the enhancement of thermal resistance. We then investigated the thermal stability at the protein level by quantifying its modification upon amino acid substitutions. Finally, a large scale analysis of protein stability, at the structurome level, contributed to the clarification of the relation between stability and natural evolution, thereby showing that the mutational profile of proteins differs according to their thermal properties. Some considerations on how the multiscale approach could help in unravelling the protein stability mechanisms are briefly discussed. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.

Single Mutations in the Transmembrane Domains of Maize Plasma Membrane Aquaporins Affect the Activity of Monomers within a Heterotetramer.

Berny, M. C., Gilis, D., Rooman, M., & Chaumont, F. (2016). Single Mutations in the Transmembrane Domains of Maize Plasma Membrane Aquaporins Affect the Activity of Monomers within a Heterotetramer. Molecular Plant, 9(7), 986-1003. doi:10.1016/j.molp.2016.04.006

Aquaporins are channels facilitating the diffusion of water and/or small uncharged solutes across biological membranes. They assemble as homotetramers but some of them also form heterotetramers, especially in plants. In Zea mays, aquaporins belonging to the plasma membrane intrinsic protein (PIP) subfamily are clustered into two groups, PIP1 and PIP2, which exhibit different water-channel activities when expressed in Xenopus oocytes. When PIP1 and PIP2 isoforms are co-expressed, they physically interact to modulate their subcellular localization and channel activity. Here, we demonstrated by affinity chromatography purification that, when co-expressed in Xenopus oocytes, the maize PIP1;2 and PIP2;5 isoforms assemble as homo- and heterodimers within heterotetramers. We built the 3D structure of such heterotetramers by comparative modeling on the basis of the spinach SoPIP2;1 X-ray structure and identified amino acid residues in the transmembrane domains which putatively interact at the interfaces between monomers. Their roles in the water-channel activity, subcellular localization, protein abundance, and physical interaction were investigated by mutagenesis. We highlighted single-residue substitutions that either inactivated PIP2;5 or activated PIP1;2 without affecting their interaction. Interestingly, the Phe220Ala mutation in the transmembrane domain 5 of PIP1;2 activated its water-channel activity and, at the same time, inactivated PIP2;5 within a heterotetramer. Altogether, these data contribute to a better understanding of the interaction mechanisms between PIP isoforms and the role of heterotetramerization on their water-channel activity.

https://dipot.ulb.ac.be/dspace/bitstream/2013/241497/1/Elsevier_225124.pdf

High-quality thermodynamic data on the stability changes of proteins upon single-site mutations

Pucci, F., Bourgeas, R., & Rooman, M. (2016). High-quality thermodynamic data on the stability changes of proteins upon single-site mutations. Journal of physical and chemical reference data, 45(2), 023104. doi:10.1063/1.4947493

We have set up and manually curated a dataset containing experimental information on the impact of amino acid substitutions in a protein on its thermal stability. It consists of a repository of experimentally measured melting temperatures (Tm) and their changes upon point mutations (ΔTm) for proteins having a well-resolved x-ray structure. This high-quality dataset is designed for being used for the training or benchmarking of in silico thermal stability prediction methods. It also reports other experimentally measured thermodynamic quantities when available, i.e., the folding enthalpy (ΔH) and heat capacity (ΔCP) of the wild type proteins and their changes upon mutations (ΔΔH and ΔΔCP), as well as the change in folding free energy (ΔΔG) at a reference temperature. These data are analyzed in view of improving our insights into the correlation between thermal and thermodynamic stabilities, the asymmetry between the number of stabilizing and destabilizing mutations, and the difference in stabilization potential of thermostable versus mesostable proteins.

Multilevel biological characterization of exomic variants at the protein level significantly improves the identification of their deleterious effects

Raimondi, D., Gazzo, A., Rooman, M., Lenaerts, T., & Vranken, W. (2016). Multilevel biological characterization of exomic variants at the protein level significantly improves the identification of their deleterious effects. Bioinformatics, 32(12), 1797-1804. doi:10.1093/bioinformatics/btw094

Motivation: There are now many predictors capable of identifying the likely phenotypic effects of single nucleotide variants (SNVs) or short in-frame Insertions or Deletions (INDELs) on the increasing amount of genome sequence data. Most of these predictors focus on SNVs and use a combination of features related to sequence conservation, biophysical, and/or structural properties to link the observed variant to either neutral or disease phenotype. Despite notable successes, the mapping between genetic variants and their phenotypic effects is riddled with levels of complexity that are not yet fully understood and that are often not taken into account in the predictions, despite their promise of significantly improving the prediction of deleterious mutants. Results: We present DEOGEN, a novel variant effect predictor that can handle both missense SNVs and in-frame INDELs. By integrating information from different biological scales and mimicking the complex mixture of effects that lead from the variant to the phenotype, we obtain significant improvements in the variant-effect prediction results. Next to the typical variant-oriented features based on the evolutionary conservation of the mutated positions, we added a collection of protein-oriented features that are based on functional aspects of the gene affected. We cross-validated DEOGEN on 36 825 polymorphisms, 20 821 deleterious SNVs, and 1038 INDELs from SwissProt. The multilevel contextualization of each (variant, protein) pair in DEOGEN provides a 10% improvement of MCC with respect to current state-of-the-art tools.

Predicting protein thermal stability changes upon point mutations using statistical potentials: introducing HoTMuSiC

Pucci, F., Bourgeas, R., & Rooman, M. (2016). Predicting protein thermal stability changes upon point mutations using statistical potentials: introducing HoTMuSiC. Scientific reports, 6, 23257. doi:10.1038/srep23257

The accurate prediction of the impact of an amino acid substitution on the thermal stability of a protein is a central issue in protein science, and is of key relevance for the rational optimization of various bioprocesses that use enzymes in unusual conditions. Here we present one of the first computational tools to predict the change in melting temperature ΔTm upon point mutations, given the protein structure and, when available, the melting temperature Tm of the wild-type protein. The key ingredients of our model structure are standard and temperature-dependent statistical potentials, which are combined with the help of an artificial neural network. The model structure was chosen on the basis of a detailed thermodynamic analysis of the system. The parameters of the model were identified on a set of more than 1,600 mutations with experimentally measured ΔTm. The performance of our method was tested using a strict 5-fold cross-validation procedure, and was found to be significantly superior to that of competing methods. We obtained a root mean square deviation between predicted and experimental ΔTm values of 4.2 °C that reduces to 2.9 °C when ten percent outliers are removed. A webserver-based tool is freely available for non-commercial use at soft.dezyme.com.

Stability strengths and weaknesses in protein structures detected by statistical potentials. Application to bovine seminal ribonuclease

De Laet, M., Gilis, D., & Rooman, M. (2016). Stability strengths and weaknesses in protein structures detected by statistical potentials. Application to bovine seminal ribonuclease. Proteins, 84(1), 143-158. doi:10.1002/prot.24962

We present an in silico method to estimate the contribution of each residue in a protein to its overall stability using three database-derived statistical potentials that are based on inter-residue distances, backbone torsion angles and solvent accessibility, respectively. Residues that contribute very unfavorably to the folding free energy are defined as stability weaknesses, whereas residues that show a highly stabilizing contribution are called stability strengths. Strengths and/or weaknesses on residues that are in spatial contact are clustered into 3-dimensional (3D) stability patches. The identification and analysis of strength- and weakness-containing regions in a protein may reveal structural or functional characteristics, and/or interesting spots to introduce mutations. To illustrate the power of our method, we apply it to bovine seminal ribonuclease. This enzyme catalyzes the degradation of RNA strands, and has the peculiarity of undergoing 3D domain swapping in physiological conditions. The weaknesses and strengths were compared among the monomeric, dimeric and swapped dimeric forms. We identified weaknesses among the catalytic residues and a mixture of weaknesses and strengths among the substrate-binding residues in the three forms. In the regions involved in 3D swapping, we observed an accumulation of weaknesses in the monomer, which disappear in the dimer and especially in the swapped dimer. Moreover, monomeric homologous proteins were found to exhibit fewer weaknesses in these regions, whereas mutants known to favor unswapped dimerization appear stabilized in this form. Our method has several perspectives for functional annotation, rational prediction of targeted mutations, and mapping of stability changes upon conformational rearrangements.

https://dipot.ulb.ac.be/dspace/bitstream/2013/220611/3/220611.pdf

Probability distributions for multimeric systems are skew normal

Albert, J., & Rooman, M. (2016). Probability distributions for multimeric systems are skew normal. Journal of mathematical biology, 72(1-2), 157-169. doi:10.1007/s00285-015-0877-0

We propose a fast and accurate method of obtaining the equilibrium mono-modal joint probability distributions for multimeric systems. The method necessitates only two assumptions: the copy number of all species of molecule may be treated as continuous; and the probability density functions (pdf) are well-approximated by multivariate skew normal distributions (MSND). Starting from the master equation, we convert the problem into a set of equations for the statistical moments which are then expressed in terms of the parameters intrinsic to the MSND. Using an optimization package on Mathematica, we minimize a Euclidean distance function comprising a sum of the squared differences between the left and the right hand sides of these equations. Comparison of results obtained via our method with those rendered by the Gillespie algorithm demonstrates our method to be highly accurate as well as efficient.

2015

Towards an accurate prediction of the thermal stability of homologous proteins

Pucci, F., & Rooman, M. (2015). Towards an accurate prediction of the thermal stability of homologous proteins. Journal of biomolecular structure & dynamics, 34(5), 1132-1142. doi:10.1080/07391102.2015.1073631

Is the cell nucleus a necessary component in precise temporal patterning?

Albert, J., & Rooman, M. (2015). Is the cell nucleus a necessary component in precise temporal patterning? PloS one, 10(7), e0134239. doi:10.1371/journal.pone.0134239

One of the functions of the cell nucleus is to help regulate gene expression by controlling molecular traffic across the nuclear envelope. Here we investigate, via stochastic simulation, what effects, if any, segregation of a system into the nuclear and cytoplasmic compartments has on the stochastic properties of a motif with a negative feedback. One of the effects of the nuclear barrier is to delay the nuclear protein concentration, allowing it to behave in a switch-like manner. We found that this delay, defined as the time for the nuclear protein concentration to reach a certain threshold, has an extremely narrow distribution. To show this, we considered two models. In the first one, the proteins could diffuse freely from cytoplasm to nucleus (simple model); and in the second one, the proteins required assistance from a special class of proteins called importins. For each model, we generated fifty parameter sets, chosen such that the temporal profiles they effectuated were very similar, and whose average threshold time was approximately 150 minutes. The standard deviations of the threshold times computed over one hundred realizations were found to be between 1.8 and 7.16 minutes across both models. To see whether a genetic motif in a prokaryotic cell can achieve this degree of precision, we also simulated five variations on the coherent feedforward motif (CFFM), three of which contained a negative feedback. We found that the performance of these motifs was nowhere near as impressive as the one found in the eukaryotic cell; the best standard deviation was 6.6 minutes. We argue that the significance of these results, the fact and necessity of spatiotemporal precision in the developmental stages of eukaryotes, and the absence of such a precision in prokaryotes, all suggest that the nucleus has evolved, in part, under the selective pressure to achieve highly predictable phenotypes.

Symmetry principles in optimization problems: an application to protein stability prediction

Pucci, F., Bernaerts, K., Teheux, F., Gilis, D., & Rooman, M. (2015). Symmetry principles in optimization problems: an application to protein stability prediction. IFAC-PapersOnLine, 48, 458-463. doi:10.1016/j.ifacol.2015.05.062

In this paper, we show how the adequate use of the intrinsic symmetry of a system when setting up its model structure can avoid unwanted biases in the parameter optimization phase. The playground of our analysis is the prediction of protein thermodynamic stability changes upon single amino acid substitutions (point mutations). Using a simple artificial neural network (ANN), sixteen different energy-like contributions are combined to predict the change in folding free energy (ΔΔG). We show that the presence of terms violating the symmetry under inverse mutations induces a bias towards the dataset on which the ANN is trained, even if a strict n-fold cross-validation procedure is performed. A completely symmetric free energy functional is then introduced, which gives predictions that are slightly less efficient in terms of root mean square error with respect to the experimental ΔΔG values, but appear to be basically independent of the training dataset and are thus more satisfactory.

https://dipot.ulb.ac.be/dspace/bitstream/2013/205308/3/Elsevier_188935.pdf
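The symmetry under inverse mutations mentioned in the abstract above means that the stability change of the reverse substitution should be the opposite of the forward one, ΔΔG(B→A) = −ΔΔG(A→B). The following Python sketch illustrates the idea on a toy score only; it is not the symmetric functional of the paper. The `toy_score` function, its constant bias and the small `HYDRO` table are hypothetical, illustrative values, and a real predictor would also take the wild-type and mutant structures into account.

```python
# Illustrative per-residue values (hypothetical numbers, for the example only).
HYDRO = {"A": 1.8, "G": -0.4, "L": 3.8}

def toy_score(wild_type, mutant):
    """Toy ΔΔG-like score with a constant bias that violates the symmetry."""
    return HYDRO[mutant] - HYDRO[wild_type] + 0.3

def antisymmetrize(score):
    """Wrap a score so that the result changes sign under inverse mutation."""
    def symmetric_score(wild_type, mutant):
        forward = score(wild_type, mutant)
        backward = score(mutant, wild_type)
        return 0.5 * (forward - backward)
    return symmetric_score

sym_score = antisymmetrize(toy_score)
# Forward and inverse predictions of the wrapped score cancel exactly,
# whereas the raw toy score leaves a residual of twice the bias.
print(toy_score("A", "L") + toy_score("L", "A"))
print(sym_score("A", "L") + sym_score("L", "A"))
```

Antisymmetrizing removes any term that has the same sign for a mutation and its inverse, which is the kind of term the abstract identifies as a source of bias.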

2014

Cation-pi, amino-pi, pi-pi, and H-bond interactions stabilize antigen-antibody interfaces.

Dalkas, G. A., Teheux, F., Kwasigroch, J.-M., & Rooman, M. (2014). Cation-pi, amino-pi, pi-pi, and H-bond interactions stabilize antigen-antibody interfaces. Proteins, 82(9), 1734-1746. doi:10.1002/prot.24527

The identification of immunogenic regions on the surface of antigens, which are able to stimulate an immune response, is a major challenge for the design of new vaccines. Computational immunology aims at predicting such regions, in particular B-cell epitopes, but is far from being reliably applicable on a large scale. To gain understanding into the factors that contribute to the antigen-antibody affinity and specificity, we perform a detailed analysis of the amino acid composition and secondary structure of antigen and antibody surfaces, and of the interactions that stabilize the complexes, in comparison with the composition and interactions observed in other heterodimeric protein interfaces. We make a distinction between linear and conformational B-cell epitopes, according to whether they consist of successive residues along the polypeptide chain or not. The antigen-antibody interfaces were shown to differ from other protein-protein interfaces by their smaller size, their secondary structure with fewer helices and more loops, and the interactions that stabilize them: more H-bond, cation-π, amino-π, and π-π interactions, and less hydrophobic packing; linear and conformational epitopes can clearly be distinguished. Often, chains of successive interactions, called cation/amino-π and π-π chains, are formed. The amino acid composition differs significantly between the interfaces: antigen-antibody interfaces are less aliphatic and more charged, polar and aromatic than other heterodimeric protein interfaces. Moreover, paratopes and epitopes, albeit to a lesser extent, have amino acid compositions that are distinct from general protein surfaces. This specificity holds promise for improving B-cell epitope prediction.

https://dipot.ulb.ac.be/dspace/bitstream/2013/171908/3/171908.pdf

Stochastic noise reduction upon complexification: Positively correlated birth-death type systems.

Rooman, M., Albert, J., & Duerinckx, M. (2014). Stochastic noise reduction upon complexification: Positively correlated birth-death type systems. Journal of theoretical biology, 354, 113-123. doi:10.1016/j.jtbi.2014.03.007

Cell systems consist of a huge number of various molecules that display specific patterns of interactions, which have a determining influence on the cell's functioning. In general, such complexity is seen to increase with the complexity of the organism, with a concomitant increase of the accuracy and specificity of the cellular processes. The question thus arises how the complexification of systems - modeled here by simple interacting birth-death type processes - can lead to a reduction of the noise - described by the variance of the number of molecules. To gain understanding of this issue, we investigated the difference between a single system containing molecules that are produced and degraded, and the same system - with the same average number of molecules - connected to a buffer. We modeled these systems using Itō stochastic differential equations in discrete time, as they allow straightforward analytical developments. In general, when the molecules in the system and the buffer are positively correlated, the variance on the number of molecules in the system is found to decrease compared to the equivalent system without a buffer. Only buffers that are too noisy themselves tend to increase the noise in the main system. We tested this result on two model cases, in which the system and the buffer contain proteins in their active and inactive state, or protein monomers and homodimers. We found that in the second test case, where the interconversion terms are non-linear in the number of molecules, the noise reduction is much more pronounced; it reaches up to 20% reduction of the Fano factor with the parameter values tested in numerical simulations on an unperturbed birth-death model. We extended our analysis to two arbitrary interconnected systems, and found that the sum of the noise levels in the two systems generally decreases upon interconnection if the molecules they contain are positively correlated.

https://dipot.ulb.ac.be/dspace/bitstream/2013/171905/1/Elsevier_155535.pdf

Stability Curve Prediction of Homologous Proteins Using Temperature-Dependent Statistical Potentials

Pucci, F., & Rooman, M. (2014). Stability Curve Prediction of Homologous Proteins Using Temperature-Dependent Statistical Potentials. PLoS computational biology, 10(7), e1003689. doi:10.1371/journal.pcbi.1003689

The unraveling and control of protein stability at different temperatures is a fundamental problem in biophysics that is substantially far from being quantitatively and accurately solved, as it requires a precise knowledge of the temperature dependence of amino acid interactions. In this paper we attempt to gain insight into the thermal stability of proteins by designing a tool to predict the full stability curve as a function of the temperature for a set of 45 proteins belonging to 11 homologous families, given their sequence and structure, as well as the melting temperature (Tm) and the change in heat capacity (ΔCp) of proteins belonging to the same family. Stability curves constitute a fundamental instrument to analyze in detail the thermal stability and its relation to the thermodynamic stability, and to estimate the enthalpic and entropic contributions to the folding free energy. In summary, our approach for predicting the protein stability curves relies on temperature-dependent statistical potentials derived from three datasets of protein structures with targeted thermal stability properties. Using these potentials, the folding free energies (ΔG) at three different temperatures were computed for each protein. The Gibbs-Helmholtz equation was then used to predict the protein's stability curve as the curve that best fits these three points. The results are quite encouraging: the standard deviations between the experimental and predicted Tm's, ΔCp and folding free energies at room temperature (ΔG25) are equal to 13 °C, 1.3 kcal/(mol °C) and 4.1 kcal/mol, respectively, in cross-validation. The main sources of error and some further improvements and perspectives are briefly discussed.
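As an illustration of the last step described in the abstract above, the Gibbs-Helmholtz equation in its usual form, ΔG(T) = ΔHm (1 − T/Tm) − ΔCp [(Tm − T) + T ln(T/Tm)], can be passed through three (T, ΔG) points. The Python sketch below is a minimal example under stated assumptions, not the authors' implementation: the function names, the search bracket for Tm and the numerical values are all hypothetical, ΔCp is taken as temperature-independent, and temperatures are in kelvin. It uses the fact that ΔG(T) = a + b·T + c·T·ln T is linear in (a, b, c).

```python
import math

def gibbs_helmholtz(T, Tm, dHm, dCp):
    """Folding free energy at temperature T (T and Tm in kelvin)."""
    return dHm * (1.0 - T / Tm) - dCp * ((Tm - T) + T * math.log(T / Tm))

def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_stability_curve(points, bracket=(300.0, 400.0)):
    """Fit (Tm, dHm, dCp) to exactly three (T, dG) points.

    `bracket` is an assumed temperature interval containing Tm.
    """
    # Solve dG = a + b*T + c*T*ln(T) for (a, b, c) with Cramer's rule.
    rows = [[1.0, T, T * math.log(T)] for T, _ in points]
    rhs = [dG for _, dG in points]
    det = _det3(rows)
    coeffs = []
    for k in range(3):
        replaced = [row[:] for row in rows]
        for r in range(3):
            replaced[r][k] = rhs[r]
        coeffs.append(_det3(replaced) / det)
    a, b, c = coeffs

    def curve(T):
        return a + b * T + c * T * math.log(T)

    # Tm is the root of dG(T) = 0 inside the bracket (bisection).
    lo, hi = bracket
    if curve(lo) * curve(hi) > 0:
        raise ValueError("bracket does not contain a sign change of dG")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if curve(lo) * curve(mid) <= 0:
            hi = mid
        else:
            lo = mid
    Tm = 0.5 * (lo + hi)
    dCp = -c              # coefficient of T*ln(T) is -dCp
    dHm = a + dCp * Tm    # constant term is dHm - dCp*Tm
    return Tm, dHm, dCp

# Synthetic check: generate three points from known (hypothetical) parameters.
true_params = (330.0, -100.0, -2.0)   # Tm [K], dHm [kcal/mol], dCp [kcal/(mol K)]
points = [(T, gibbs_helmholtz(T, *true_params)) for T in (273.15, 298.15, 323.15)]
fitted = fit_stability_curve(points)
print(fitted)
```

With three points and three parameters the curve passes through the points exactly; with more points, a least-squares fit of the same linear form would be the natural extension.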

Sequence and conformation effects on ionization potential and charge distribution of homo-nucleobase stacks using M06-2X hybrid density functional theory calculations.

Rooman, M., & Wintjens, R. (2014). Sequence and conformation effects on ionization potential and charge distribution of homo-nucleobase stacks using M06-2X hybrid density functional theory calculations. Journal of biomolecular structure & dynamics, 32(4), 532-545. doi:10.1080/07391102.2013.783508

DNA is subject to oxidative damage due to radiation or by-products of cellular metabolism, thereby creating electron holes that migrate along the DNA stacks. A systematic computational analysis of the dependence of the electronic properties of nucleobase stacks on sequence and conformation was performed here, on the basis of single- and double-stranded homo-nucleobase stacks of 1-10 bases or 1-8 base pairs in standard A-, B-, and Z-conformation. First, several levels of theory were tested for calculating the vertical ionization potentials of individual nucleobases; the M06-2X/6-31G* hybrid density functional theory method was selected by comparison with experimental data. Next, the vertical ionization potential, and the Mulliken charge and spin density distributions were calculated and considered on all nucleobase stacks. We found that (1) the ionization potential decreases with the number of bases, the lowest being reached by Gua≡Cyt tracts; (2) the association of two single strands into a double-stranded tract lowers the ionization potential significantly; and (3) differences in ionization potential due to sequence variation are roughly three times larger than those due to conformational modifications. The charge and spin density distributions were found (1) to be located toward the 5'-end for single-stranded Gua-stacks and toward the 3'-end for Cyt-stacks and basically delocalized over all bases for Ade- and Thy-stacks; (2) the association into double-stranded tracts empties the Cyt- and Thy-strands of most of the charge and all the spin density and concentrates them on the Gua- and Ade-strands. The possible biological implications of these results for transcription are discussed.

https://dipot.ulb.ac.be/dspace/bitstream/2013/171917/4/PMC3919198.pdf

Protein Thermostability Prediction within Homologous Families Using Temperature-Dependent Statistical Potentials

Pucci, F., Dhanani, M., Dehouck, Y., & Rooman, M. (2014). Protein Thermostability Prediction within Homologous Families Using Temperature-Dependent Statistical Potentials. PloS one, 9(3), e91659. doi:10.1371/journal.pone.0091659

https://dipot.ulb.ac.be/dspace/bitstream/2013/360343/1/doi_343987.pdf

Modeling the Drosophila gene cluster regulation network for muscle development.

Haye, A., Albert, J., & Rooman, M. (2014). Modeling the Drosophila gene cluster regulation network for muscle development. PloS one, 9(3), e90285. doi:10.1371/journal.pone.0090285

The development of accurate and reliable dynamical modeling procedures that describe the time evolution of gene expression levels is a prerequisite to understanding and controlling the transcription process. We focused on data from DNA microarray time series for 20 Drosophila genes involved in muscle development during the embryonic stage. Genes with similar expression profiles were clustered on the basis of a translation-invariant and scale-invariant distance measure. The time evolution of these clusters was modeled using coupled differential equations. Three model structures involving a transcription term and a degradation term were tested. The parameters were identified in successive steps: network construction, parameter optimization, and parameter reduction. The solutions were evaluated on the basis of the data reproduction and the number of parameters, as well as on two biology-based requirements: the robustness with respect to parameter variations and the values of the expression levels not being unrealistically large upon extrapolation in time. Various solutions were obtained that satisfied all our evaluation criteria. The regulatory networks inferred from these solutions were compared with experimental data. The best solution has half of the experimental connections, which compares favorably with previous approaches. Biasing the network toward the experimental connections led to the identification of a model that is only slightly less good on the basis of the evaluation criteria. The non-uniqueness of the solutions and the variable agreement with experimental connections were discussed in the context of the different hypotheses underlying this type of approach.

https://dipot.ulb.ac.be/dspace/bitstream/2013/171906/4/doi_155536.pdf

2013

BeAtMuSiC: prediction of changes in protein-protein binding affinity on mutations.

Dehouck, Y., Kwasigroch, J.-M., Rooman, M., & Gilis, D. (2013). BeAtMuSiC: prediction of changes in protein-protein binding affinity on mutations. Nucleic acids research, 41, 333-339. doi:10.1093/nar/gkt450

The ability of proteins to establish highly selective interactions with a variety of (macro)molecular partners is a crucial prerequisite to the realization of their biological functions. The availability of computational tools to evaluate the impact of mutations on protein-protein binding can therefore be valuable in a wide range of industrial and biomedical applications, and help rationalize the consequences of non-synonymous single-nucleotide polymorphisms. BeAtMuSiC (http://babylone.ulb.ac.be/beatmusic) is a coarse-grained predictor of the changes in binding free energy induced by point mutations. It relies on a set of statistical potentials derived from known protein structures, and combines the effect of the mutation on the strength of the interactions at the interface, and on the overall stability of the complex. The BeAtMuSiC server requires as input the structure of the protein-protein complex, and gives the possibility to assess rapidly all possible mutations in a protein chain or at the interface, with predictive performances that are in line with the best current methodologies.

https://dipot.ulb.ac.be/dspace/bitstream/2013/145142/4/doi_128943.pdf

Community-wide evaluation of methods for predicting the effect of mutations on protein-protein interactions

Moretti, R., Fleishman, S. S., Agius, R., Torchala, M., Bates, P. P., Kastritis, P. P., Rodrigues, J. J., Trellet, M., Bonvin, A. A., Gilis, D., Rooman, M., Dehouck, Y., et al. (2013). Community-wide evaluation of methods for predicting the effect of mutations on protein-protein interactions. Proteins, 81(11), 1980-1987. doi:10.1002/prot.24356

Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side-chain sampling and backbone relaxation, evaluated packing, electrostatic, and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of both existing and new prediction methodologies.

https://dipot.ulb.ac.be/dspace/bitstream/2013/177972/4/177972.pdf

2012

Structure-based mutant stability predictions on proteins of unknown structure.

Gonnelli, G., Rooman, M., & Dehouck, Y. (2012). Structure-based mutant stability predictions on proteins of unknown structure. Journal of biotechnology, 161(3), 287-293. doi:10.1016/j.jbiotec.2012.06.020

The ability to rapidly and accurately predict the effects of mutations on the physicochemical properties of proteins holds tremendous importance in the rational design of modified proteins for various types of industrial, environmental or pharmaceutical applications, as well as in elucidating the genetic background of complex diseases. In many cases, the absence of an experimentally resolved structure represents a major obstacle, since most currently available predictive software crucially depend on it. We investigate here the relevance of combining coarse-grained structure-based stability predictions with a simple comparative modeling procedure. Strikingly, our results show that the use of average to high quality structural models leads to virtually no loss in predictive power compared to the use of experimental structures. Even in the case of low quality models, the decrease in performance is quite limited and this combined approach remains markedly superior to other methods based exclusively on the analysis of sequence features.

https://dipot.ulb.ac.be/dspace/bitstream/2013/129960/1/Elsevier_111860.pdf

A conserved cysteine residue is involved in disulfide bond formation between plant plasma membrane aquaporin monomers.

Bienert, G. P., Cavez, D., Besserer, A., Berny, M. C., Gilis, D., Rooman, M., & Chaumont, F. (2012). A conserved cysteine residue is involved in disulfide bond formation between plant plasma membrane aquaporin monomers. Biochemical journal, 445(1), 101-111. doi:10.1042/BJ20111704

AQPs (aquaporins) are conserved in all kingdoms of life and facilitate the rapid diffusion of water and/or other small solutes across cell membranes. Among the different plant AQPs, PIPs (plasma membrane intrinsic proteins), which fall into two phylogenetic groups, PIP1 and PIP2, play key roles in plant water transport processes. PIPs form tetramers in which each monomer acts as a functional channel. The intermolecular interactions that stabilize PIP oligomer complexes and are responsible for the resistance of PIP dimers to denaturing conditions are not well characterized. In the present study, we identified a highly conserved cysteine residue in loop A of PIP1 and PIP2 proteins and demonstrated by mutagenesis that it is involved in the formation of a disulfide bond between two monomers. Although this cysteine seems not to be involved in regulation of trafficking to the plasma membrane, activity, substrate selectivity or oxidative gating of ZmPIP1s (Zm is Zea mays), ZmPIP2s and hetero-oligomers, it increases oligomer stability under denaturing conditions. In addition, when PIP1 and PIP2 are co-expressed, the loop A cysteine of ZmPIP1;2, but not that of ZmPIP2;5, is involved in the mercury sensitivity of the channels.

https://dipot.ulb.ac.be/dspace/bitstream/2013/142309/1/2012_Bienert_BiochemJ.pdf

Design principles of a genetic alarm clock.

Albert, J., & Rooman, M. (2012). Design principles of a genetic alarm clock. PloS one, 7(11), e47256. doi:10.1371/journal.pone.0047256

Turning genes on and off is a mechanism by which cells and tissues make phenotypic decisions. Gene network motifs capable of supporting two or more steady states and thereby providing cells with a plurality of possible phenotypes are referred to as genetic switches. Modeled on the basis of naturally occurring genetic networks, synthetic biologists have successfully constructed artificial switches, thus opening a door to new possibilities for improvement of the known, but also the design of new synthetic genetic circuits. One of many obstacles to overcome in such efforts is to understand and hence control intrinsic noise which is inherent in all biological systems. For some motifs the noise is negligible; for others, fluctuations in the particle number can be comparable to its average. Due to their slowed dynamics, motifs with positive autoregulation tend to be highly sensitive to fluctuations of their chemical environment and are in general very noisy, especially during transition (switching). In this article we use stochastic simulations (Gillespie algorithm) to model such a system, in particular a simple bistable motif consisting of a single gene with positive autoregulation. Due to cooperativity, the dynamical behavior of this kind of motif is reminiscent of an alarm clock - the gene is (nearly) silent for some time after it is turned on and becomes active very suddenly. We investigate how these sudden transitions are affected by noise and show that under certain conditions accurate timing can be achieved. We also examine how promoter complexity influences the accuracy of this timing mechanism.

https://dipot.ulb.ac.be/dspace/bitstream/2013/136846/4/doi_119609.pdf
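To make the kind of simulation named in the abstract above more concrete, here is a minimal Python sketch of the Gillespie algorithm applied to a single gene whose product activates its own synthesis through a cooperative (Hill-type) term. It is an illustrative toy model under stated assumptions, not the model of the paper: the reaction scheme is reduced to one species, and the parameter names and values (`basal`, `vmax`, `K`, `hill`, `gamma`) are hypothetical.

```python
import random

def propensities(n, basal=0.05, vmax=10.0, K=40.0, hill=4, gamma=0.1):
    """Reaction propensities for protein copy number n (hypothetical parameters)."""
    # Synthesis: small basal leak plus cooperative positive autoregulation.
    synthesis = basal + vmax * n**hill / (K**hill + n**hill)
    # Degradation: first-order decay.
    degradation = gamma * n
    return synthesis, degradation

def gillespie(n0=0, t_end=200.0, seed=0, **params):
    """Simulate one stochastic trajectory; returns a list of (time, copy number)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        synthesis, degradation = propensities(n, **params)
        total = synthesis + degradation
        if total <= 0.0:
            break
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total < synthesis:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory

trajectory = gillespie(seed=1)
print(len(trajectory), trajectory[-1])
```

Running the function with many different seeds and recording the first time the copy number crosses a chosen threshold gives a distribution of switching times, which is the sort of quantity one would inspect to study timing accuracy.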

Robust non-linear differential equation models of gene expression evolution across Drosophila development.

Haye, A., Albert, J., & Rooman, M. (2012). Robust non-linear differential equation models of gene expression evolution across Drosophila development. BMC research notes, 5, 46. doi:10.1186/1756-0500-5-46

This paper lies in the context of modeling the evolution of gene expression away from stationary states, for example in systems subject to external perturbations or during the development of an organism. We base our analysis on experimental data and proceed in a top-down approach, where we start from data on a system's transcriptome, and deduce rules and models from it without a priori knowledge. We focus here on a publicly available DNA microarray time series, representing the transcriptome of Drosophila across evolution from the embryonic to the adult stage.

https://dipot.ulb.ac.be/dspace/bitstream/2013/136848/4/doi_119614.pdf

2011

Flanking domain stability modulates the aggregation kinetics of a polyglutamine disease protein.

Saunders, H. M., Gilis, D., Rooman, M., Dehouck, Y., Robertson, A. L., & Bottomley, S. P. (2011). Flanking domain stability modulates the aggregation kinetics of a polyglutamine disease protein. Protein science, 20(10), 1675-1681. doi:10.1002/pro.698

Spinocerebellar Ataxia Type 3 (SCA3) is one of nine polyglutamine (polyQ) diseases that are all characterized by progressive neuronal dysfunction and the presence of neuronal inclusions containing aggregated polyQ protein, suggesting that protein misfolding is a key part of this disease. Ataxin-3, the causative protein of SCA3, contains a globular, structured N-terminal domain (the Josephin domain) and a flexible polyQ-containing C-terminal tail, the repeat-length of which modulates pathogenicity. It has been suggested that the fibrillogenesis pathway of ataxin-3 begins with a non-polyQ-dependent step mediated by Josephin domain interactions, followed by a polyQ-dependent step. To test the involvement of the Josephin domain in ataxin-3 fibrillogenesis, we have created both pathogenic and nonpathogenic length ataxin-3 variants with a stabilized Josephin domain, and have both stabilized and destabilized the isolated Josephin domain. We show that changing the thermodynamic stability of the Josephin domain modulates ataxin-3 fibrillogenesis. These data support the hypothesis that the first stage of ataxin-3 fibrillogenesis is caused by interactions involving the non-polyQ containing Josephin domain and that the thermodynamic stability of this domain is linked to the aggregation propensity of ataxin-3.

https://dipot.ulb.ac.be/dspace/bitstream/2013/129964/5/129964.pdf

Dynamic modeling of gene expression in prokaryotes: application to glucose-lactose diauxie in Escherichia coli.

Albert, J., & Rooman, M. (2011). Dynamic modeling of gene expression in prokaryotes: application to glucose-lactose diauxie in Escherichia coli. Systems and Synthetic Biology, 5(1-2), 33-43. doi:10.1007/s11693-011-9079-2

Coexpression of genes or, more generally, similarity in the expression profiles poses an insurmountable obstacle to inferring the gene regulatory network (GRN) based solely on data from DNA microarray time series. Clustering of genes with similar expression profiles allows for a coarse-grained view of the GRN and a probabilistic determination of the connectivity among the clusters. We present a model for the temporal evolution of a gene cluster network which takes into account interactions of gene products with genes and, through a non-constant degradation rate, with other gene products. The number of model parameters is reduced by using polynomial functions to interpolate temporal data points. In this manner, the task of parameter estimation is reduced to a system of linear algebraic equations, thus making the computation time shorter by orders of magnitude. To eliminate irrelevant networks, we test each GRN for stability with respect to parameter variations, and impose restrictions on its behavior near the steady state. We apply our model and methods to DNA microarray time series data collected on Escherichia coli during glucose-lactose diauxie and infer the most probable cluster network for different phases of the experiment.

https://dipot.ulb.ac.be/dspace/bitstream/2013/136847/4/doi_119613.pdf
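
The abstract above notes that interpolating the time series with polynomials turns parameter estimation into a system of linear algebraic equations. The following is a minimal sketch of that general idea only, not the authors' model: it assumes a plain linear network dx/dt = A x (no non-constant degradation term), and the function name `fit_linear_network` and its parameters are hypothetical.

```python
import numpy as np


def fit_linear_network(t, x, degree=3):
    """Estimate A in dx/dt = A x from a time series.

    t: 1-D array of time points.
    x: array of shape (n_times, n_clusters), one column per gene cluster.
    The linear model and the polynomial degree are assumptions made for
    this illustration; they are not taken from the paper.
    """
    n_times, n_clusters = x.shape
    smooth = np.empty((n_times, n_clusters), dtype=float)
    deriv = np.empty((n_times, n_clusters), dtype=float)
    for j in range(n_clusters):
        # Fit a polynomial to each profile, then differentiate it analytically.
        poly = np.polynomial.Polynomial.fit(t, x[:, j], degree)
        smooth[:, j] = poly(t)
        deriv[:, j] = poly.deriv()(t)
    # deriv is approximately smooth @ A.T, a linear least-squares problem.
    a_transposed, *_ = np.linalg.lstsq(smooth, deriv, rcond=None)
    return a_transposed.T
```

Because the derivatives come from the fitted polynomials, no numerical integration of the differential equations is needed, which is where the speed-up mentioned in the abstract would plausibly come from.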

Conformations consistent with charge migration observed in DNA and RNA X-ray structures

Rooman, M., Cauet, E., Liévin, J., & Wintjens, R. (2011). Conformations consistent with charge migration observed in DNA and RNA X-ray structures. Journal of biomolecular structure & dynamics, 28, 949-954.

Detection of perturbation phases and developmental stages in organisms from DNA microarray time series data.

Rooman, M., Albert, J., Dehouck, Y., & Haye, A. (2011). Detection of perturbation phases and developmental stages in organisms from DNA microarray time series data. PloS one, 6(12), e27948. doi:10.1371/journal.pone.0027948

Available DNA microarray time series that record gene expression along the developmental stages of multicellular eukaryotes, or in unicellular organisms subject to external perturbations such as stress and diauxie, are analyzed. By pairwise comparison of the gene expression profiles on the basis of a translation-invariant and scale-invariant distance measure corresponding to least-rectangle regression, it is shown that peaks in the average distance values are noticeable and are localized around specific time points. These points systematically coincide with the transition points between developmental phases or just follow the external perturbations. This approach can thus be used to identify automatically, from microarray time series alone, the presence of external perturbations or the succession of developmental stages in arbitrary cell systems. Moreover, our results show that there is a striking similarity between the gene expression responses to these a priori very different phenomena. In contrast, the cell cycle does not involve a perturbation-like phase, but rather continuous gene expression remodeling. Similar analyses were conducted using three other standard distance measures, showing that the one we introduced was superior. Based on these findings, we set up an adapted clustering method that uses this distance measure and classifies the genes on the basis of their expression profiles within each developmental stage or between perturbation phases.

https://dipot.ulb.ac.be/dspace/bitstream/2013/129961/4/doi_111861.pdf
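
To make the notion of a translation-invariant and scale-invariant distance between expression profiles concrete, here is a small sketch. It is an assumption-laden stand-in, not the least-rectangle measure defined in the paper: each profile is standardized (mean removed, divided by its standard deviation) before a root-mean-square difference is taken. The name `invariant_distance` is hypothetical.

```python
import numpy as np


def invariant_distance(p, q):
    """RMS difference between two standardized expression profiles.

    Unchanged when either profile is shifted by a constant or multiplied
    by a positive factor. Illustrative only; profiles with zero variance
    are not handled.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    zp = (p - p.mean()) / p.std()
    zq = (q - q.mean()) / q.std()
    return float(np.sqrt(np.mean((zp - zq) ** 2)))
```

With this definition, a profile and any shifted, positively rescaled copy of it are at distance zero, while a profile and its mirror image are at the maximal distance of 2.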

Robustness analysis of a linear dynamical model of the Drosophila gene expression

Haye, A., Albert, J., & Rooman, M. (2011). Robustness analysis of a linear dynamical model of the Drosophila gene expression. Lecture notes in computer science, 6685 LNBI, 242-252. doi:10.1007/978-3-642-21946-7_19

The evolution of the gene expression levels of Drosophila melanogaster, from the embryonic to adult development phases, has been studied on the basis of a microarray time series involving the expression levels of more than 4000 genes over 67 time-points, and has been modeled by a system of linear differential equations with constant coefficients. Here we investigate the robustness of this model against perturbations of its parameters and of the initial data values. We found that the model is not robust at all for fully connected networks, but that the robustness significantly increases after parameter reduction. This puts some limits to the biological relevance of linear models for gene expression evolution. © 2011 Springer-Verlag Berlin Heidelberg.

PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality.

Dehouck, Y., Kwasigroch, J.-M., Gilis, D., & Rooman, M. (2011). PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC bioinformatics, 12, 151. doi:10.1186/1471-2105-12-151

https://dipot.ulb.ac.be/dspace/bitstream/2013/92196/4/doi_70825.pdf

2010

Evidence that interaction between conserved residues in transmembrane helices 2, 3 and 7 are crucial for human VPAC1 receptor activation.

Chugunov, A. O., Simms, J., Poyner, D. R., Dehouck, Y., Rooman, M., Gilis, D., & Langer, I. (2010). Evidence that interaction between conserved residues in transmembrane helices 2, 3 and 7 are crucial for human VPAC1 receptor activation. Molecular pharmacology, 78(3), 394-401. doi:10.1124/mol.110.063578

The VPAC1 receptor belongs to family B of G protein coupled receptors (GPCR-B) and is activated upon binding of the VIP peptide. Despite the recent solving of the structure of the N-terminus of several members of this receptor family, little is known about the structure of the transmembrane (TM) region and about the molecular mechanisms leading to activation. In the present study we designed a new structural model of the TM domain and combined it with mutagenesis experiments to investigate the interaction network that governs ligand binding and receptor activation. Our results suggest that this network involves the cluster of residues R188 in TM2, Q380 in TM7 and N229 in TM3. This cluster is expected to be altered upon VIP binding, as R188 has previously been shown to interact with D3 of VIP. Several point mutations at positions 188, 229 and 380 were experimentally characterized and shown to severely affect VIP binding and/or VIP mediated cAMP production. Double mutants built from reciprocal residue exchanges exhibit strong cooperative or anti-cooperative effects, thereby indicating the spatial proximity of residues R188, Q380 and N229. As these residues are highly conserved in the GPCR-B family, they can moreover be expected to have a general role in mediating function.

Thermo- and mesostabilizing protein interactions identified by temperature-dependent statistical potentials.

Folch, B., Dehouck, Y., & Rooman, M. (2010). Thermo- and mesostabilizing protein interactions identified by temperature-dependent statistical potentials. Biophysical journal, 98(4), 667-677. doi:10.1016/j.bpj.2009.10.050

The goal of controlling protein thermostability is tackled here through establishing, by in silico analyses, the relative weight of residue-residue interactions in proteins as a function of temperature. We have designed for that purpose a (melting-) temperature-dependent, statistical distance potential, where the interresidue distances are computed between the side-chain geometric centers or their functional centers. Their separate derivation from proteins of either high or low thermal resistance reveals the interactions that contribute most to stability in different temperature ranges. Thermostabilizing interactions include salt bridges and cation-pi interactions (especially those involving arginine), aromatic interactions, and H-bonds between negatively charged and some aromatic residues. In contrast, H-bonds between two polar noncharged residues or between a polar noncharged residue and a negatively charged residue are relatively less stabilizing at high temperatures. An important observation is that it is necessary to consider both repulsive and attractive interactions in overall thermostabilization, as the degree of repulsion may also vary with temperature. These temperature-dependent potentials are not only useful for the identification of meso- and thermostabilizing pair interactions, but also exhibit predictive power, as illustrated by their ability to predict the melting temperature of a protein based on the melting temperature of homologous proteins.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71780/1/Elsevier_49137.pdf

Gene expression model (in)validation by Fourier analysis.

Konopka, T., & Rooman, M. (2010). Gene expression model (in)validation by Fourier analysis. BMC Systems Biology, 4, 123. doi:10.1186/1752-0509-4-123

The determination of the right model structure describing a gene regulation network and the identification of its parameters are major goals in systems biology. The task is often hampered by the lack of relevant experimental data with sufficiently low noise level, but the subset of genes whose concentration levels exhibit an oscillatory behavior in time can readily be analyzed on the basis of their Fourier spectrum, known to turn complex signals into few relatively noise-free parameters. Such genes therefore offer opportunities of understanding gene regulation quantitatively.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71779/4/doi_49136.pdf

Interaction among the gene products contributes to the regulation of gene expression

Albert, J., & Rooman, M. (2010). Interaction among the gene products contributes to the regulation of gene expression. IFAC proceedings volumes, 11(PART 1), 245-250. doi:10.3182/20100707-3-BE-2012.0025

Owing to its complexity, the mechanism of gene regulation for the whole genome is not easily probed with the conventional gene-level analysis. Models of gene regulatory networks (GRN) that are based on a coarse-grained view of gene regulation, where genes with similar expression profiles are grouped in clusters, can aid the inference of important correlations between genes. We propose a model of gene regulation which incorporates the effects of interaction among gene products on the transcription rates. We introduce for that purpose a non-constant rate of gene product degradation which depends on the concentrations of other gene products. Guided by previous reports on the sparseness of GRNs we employ a method of parameter reduction based on interpolating polynomials. We test our model on DNA microarray time series data of Escherichia coli during a glucose-lactose diauxie and report good agreement between data and theory. Our analysis shows that on going from glucose to lactose phase the system becomes highly connected and becomes again sparsely connected in the lactose phase. Out of the many compatible GRNs that our model predicts we chose the ones that satisfy criteria based on robustness, behavior near a fixed point, and fit of the data. © 2010 IFAC.

2009

The first peptides: the evolutionary transition between prebiotic amino acids and early proteins.

van der Gulik, P., Massar, S., Gilis, D., Buhrman, H., & Rooman, M. (2009). The first peptides: the evolutionary transition between prebiotic amino acids and early proteins. Journal of theoretical biology, 261(4), 531-539. doi:10.1016/j.jtbi.2009.09.004

The issues we attempt to tackle here are what the first peptides looked like when they emerged on the primitive earth, and what simple catalytic activities they fulfilled. We conjecture that the early functional peptides were short (3-8 amino acids long), were made of those amino acids, Gly, Ala, Val and Asp, that are abundantly produced in many prebiotic synthesis experiments and observed in meteorites, and that the neutralization of Asp's negative charge is achieved by metal ions. We further assume that some traces of these prebiotic peptides still exist, in the form of active sites in present-day proteins. Searching these proteins for prebiotic peptide candidates led us to identify three main classes of motifs, bound mainly to Mg²⁺ ions: D(F/Y)DGD corresponding to the active site in RNA polymerases, DGD(G/A)D present in some kinds of mutases, and DAKVGDGD in dihydroxyacetone kinase. All three motifs contain a DGD submotif, which is suggested to be the common ancestor of all active peptides. Moreover, all three manipulate phosphate groups, which was probably a very important biological function in the very first stages of life. The statistical significance of our results is supported by the frequency of these motifs in today's proteins, which is three times higher than expected by chance, with a P-value of 3 × 10⁻². The implications of our findings in the context of the appearance of life and the possibility of an experimental validation are discussed.

Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0.

Dehouck, Y., Grosfils, A., Folch, B., Gilis, D., Bogaerts, P., & Rooman, M. (2009). Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics, 25(19), 2537-2543. doi:10.1093/bioinformatics/btp445

MOTIVATION: The rational design of proteins with modified properties, through amino acid substitutions, is of crucial importance in a large variety of applications. Given the huge number of possible substitutions, every protein engineering project would benefit strongly from the guidance of in silico methods able to predict rapidly, and with reasonable accuracy, the stability changes resulting from all possible mutations in a protein. RESULTS: We exploit newly developed statistical potentials, based on a formalism that highlights the coupling between four protein sequence and structure descriptors, and take into account the amino acid volume variation upon mutation. The stability change is expressed as a linear combination of these energy functions, whose proportionality coefficients vary with the solvent accessibility of the mutated residue and are identified with the help of a neural network. A correlation coefficient of R = 0.63 and a root mean square error of σc = 1.15 kcal/mol between measured and predicted stability changes are obtained upon cross-validation. These scores reach R = 0.79 and σc = 0.86 kcal/mol after exclusion of 10% outliers. The predictive power of our method is shown to be significantly higher than that of other programs described in the literature. AVAILABILITY: http://babylone.ulb.ac.be/popmusic

A robust method for the joint estimation of yield coefficients and kinetic parameters in bioprocess models.

Vastemans, V., Rooman, M., & Bogaerts, P. (2009). A robust method for the joint estimation of yield coefficients and kinetic parameters in bioprocess models. Biotechnology progress, 25(3), 606-618. doi:10.1002/btpr.89

Bioprocess model structures that require nonlinear parameter estimation, thus initialization values, are often subject to poor identification performances because of the uncertainty on those initialization values. Under some conditions on the model structure, it is possible to partially circumvent this problem by an appropriate decoupling of the linear part of the model from the nonlinear part of it. This article provides a procedure to be followed when these structural conditions are not satisfied. An original method for decoupling two sets of parameters, namely, kinetic parameters from maximum growth, production, decay rates, and yield coefficients, is presented. It exhibits the advantage of requiring only initialization of the first subset of parameters. In comparison with a classical nonlinear estimation procedure, in which all the parameters are freed, results show enhanced robustness of model identification with regard to parameter initialization errors. This is illustrated by means of three simulation case studies: a fed-batch Human Embryo Kidney cell cultivation process using a macroscopic reaction scheme description, a process of cyclodextrin-glucanotransferase production by Bacillus circulans, and a process of simultaneous starch saccharification and glucose fermentation to lactic acid by Lactobacillus delbrückii, both based on a Luedeking-Piret model structure. Additionally, perspectives of the presented procedure in the context of systematic bioprocess modeling are promising.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71781/3/71781.pdf

Modeling the temporal evolution of the Drosophila gene expression from DNA microarray time series.

Haye, A., Dehouck, Y., Kwasigroch, J.-M., Bogaerts, P., & Rooman, M. (2009). Modeling the temporal evolution of the Drosophila gene expression from DNA microarray time series. Physical biology, 6(1), 016004. doi:10.1088/1478-3975/6/1/016004

The time evolution of gene expression across the developmental stages of the host organism can be inferred from appropriate DNA microarray time series. Modeling this evolution aims eventually at improving the understanding and prediction of the complex phenomena that are the basis of life. We focus on the embryonic-to-adult development phases of Drosophila melanogaster, and chose to model the expression network with the help of a system of differential equations with constant coefficients, which are nonlinear in the transcript concentrations but linear in their logarithms. To reduce the dimensionality of the problem, genes having similar expression profiles are grouped into 17 clusters. We show that a simple linear model is able to reproduce the experimental data with very good precision, owing to the large number of parameters that represent the connections between the clusters. Remarkably, the parameter reduction allowed elimination of up to 80-85% of these connections while keeping fairly good precision. This result supports the low-connectivity hypothesis of gene expression networks, with about three connections per cluster, without introducing a priori hypotheses. The core of the network shows a few gene clusters with negative self-regulation, and some highly connected clusters involving proteins with crucial functions.

2008

Revisiting the correlation between proteins' thermoresistance and organisms' thermophilicity

Dehouck, Y., Folch, B., & Rooman, M. (2008). Revisiting the correlation between proteins' thermoresistance and organisms' thermophilicity. Protein engineering, design & selection, 21(4), 275-278. doi:10.1093/protein/gzn001

The possibility to rationally design protein mutants that remain structured and active at high temperatures strongly depends on a better understanding of the mechanisms of protein thermostability. Studies devoted to this issue often rely on the living temperature (Tenv) of the host organism rather than on the melting temperature (Tm) of the analyzed protein. To investigate the scale of this approximation, we probed the relationship between Tm and Tenv on a dataset of 127 proteins, and found a much weaker correlation than previously expected: the correlation coefficient is equal to 0.59 and the regression line is Tm ≈ 42.9°C + 0.62Tenv. To illustrate the effect of using Tenv rather than Tm to analyze protein thermoresistance, we derive statistical distance potentials, describing Glu-Arg and Asp-Arg salt bridges, from protein structure sets with high or low Tm or Tenv. The results show that the more favorable nature of salt bridges, relative to other interactions, at high temperatures is more clear-cut when defining thermoresistance in terms of Tm. The Tenv-based sets nevertheless remain informative. © The Author 2008. Published by Oxford University Press. All rights reserved.
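
As a worked illustration of the regression line quoted in the abstract above (Tm ≈ 42.9°C + 0.62 Tenv), the sketch below simply evaluates it. The function name `estimate_tm_from_tenv` is hypothetical, and since the reported correlation coefficient is only 0.59, the result should be read as a rough central estimate rather than a prediction for any particular protein.

```python
def estimate_tm_from_tenv(tenv_celsius: float) -> float:
    """Evaluate the regression line Tm = 42.9 + 0.62 * Tenv (degrees Celsius).

    The coefficients are those quoted in the abstract; the function itself
    is only an illustration and carries no error estimate.
    """
    return 42.9 + 0.62 * tenv_celsius
```

For an organism living at 37°C this gives roughly 65.8°C, and for one living at 100°C roughly 104.9°C, which shows how the slope below 1 compresses the range of Tm relative to Tenv.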

Mn/Fe superoxide dismutase interaction fingerprints and prediction of oligomerization and metal cofactor from sequence

Wintjens, R., Gilis, D., & Rooman, M. (2008). Mn/Fe superoxide dismutase interaction fingerprints and prediction of oligomerization and metal cofactor from sequence. Proteins, 70(4), 1564-1577. doi:10.1002/prot.21650

Fe- and Mn-containing superoxide dismutase (sod) enzymes are closely related and similar in both amino acid sequence and structure, but differ in their mode of oligomerization and in their specificity for the Fe or Mn cofactor. The goal of the present work is to identify and analyze the sequence and structure characteristics that ensure the cofactor specificities and the oligomerization modes. For that purpose, 374 sod sequences and 17 sod crystal structures were collected and aligned. These alignments were searched for residues and interresidue interactions that are conserved within the whole sod family, or alternatively, that are specific to a given sod subfamily sharing common characteristics. This led us to define key residues and interresidue interaction fingerprints in each subfamily. The comparison of these fingerprints allows, on a rational basis, the design of mutants likely to modulate the activity and/or specificity of the target sod, in good agreement with the available experimental results on known mutants. The key residues and interaction fingerprints are furthermore used to predict if a novel sequence corresponds to a sod enzyme, and if so, what type of sod it is. The predictions of this fingerprint method reach much higher scores and present much more discriminative power than the commonly used method that uses pairwise sequence comparisons. © 2007 Wiley-Liss, Inc.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60555/1/2008_Wintjens_Proteins.pdf
https://dipot.ulb.ac.be/dspace/bitstream/2013/60555/4/60555.pdf

Thermostability of salt bridges versus hydrophobic interactions in proteins probed by statistical potentials

Folch, B., Rooman, M., & Dehouck, Y. (2008). Thermostability of salt bridges versus hydrophobic interactions in proteins probed by statistical potentials. Journal of chemical information and modeling, 48(1), 119-127. doi:10.1021/ci700237g

The temperature dependence of the interactions that stabilize protein structures is a long-standing issue, the elucidation of which would enable the prediction and the rational modification of the thermostability of a target protein. It is tackled here by deriving distance-dependent amino acid pair potentials from four datasets of proteins with increasing melting temperatures (Tm). The temperature dependence of the interactions is determined from the differences in the shape of the potentials derived from the four datasets. Note that, here, we use an unusual dataset definition, which is based on the Tm values, rather than on the living temperature of the host organisms. Our results show that the stabilizing weight of hydrophobic interactions (between Ile, Leu, and Val) remains constant as the temperature increases, compared to the other interactions. In contrast, the two minima of the Arg-Glu and Arg-Asp salt bridge potentials show a significant Tm dependence. These two minima correspond to two geometries: the fork-fork geometry, where the side chains point toward each other, and the fork-stick geometry, which involves the Nε side chain atom of Arg. These two types of salt bridges were determined to be significantly more stabilizing at high temperature. Moreover, a preference for more-compact salt bridges is noticeable in heat-resistant proteins, especially for the fork-fork geometry. The Tm-dependent potentials that have been defined here should be useful for predicting thermal stability changes upon mutation. © 2008 American Chemical Society.

SODa: an Mn/Fe superoxide dismutase prediction and design server

Kwasigroch, J.-M., Wintjens, R., Gilis, D., & Rooman, M. (2008). SODa: an Mn/Fe superoxide dismutase prediction and design server. BMC bioinformatics, 9, 257. doi:10.1186/1471-2105-9-257

Background: Superoxide dismutases (SODs) are ubiquitous metalloenzymes that play an important role in the defense of aerobic organisms against oxidative stress, by converting reactive oxygen species into nontoxic molecules. We focus here on the SOD family that uses Fe or Mn as cofactor. Results: The SODa webtool http://babylone.ulb.ac.be/soda predicts if a target sequence corresponds to an Fe/Mn SOD. If so, it predicts the metal ion specificity (Fe, Mn or cambialistic) and the oligomerization mode (dimer or tetramer) of the target. In addition, SODa proposes a list of residue substitutions likely to improve the predicted preferences for the metal cofactor and oligomerization mode. The method is based on residue fingerprints, consisting of residues conserved in SOD sequences or typical of SOD subgroups, and of interaction fingerprints, containing residue pairs that are in contact in SOD structures. Conclusion: SODa is shown to outperform and to be more discriminative than traditional techniques based on pairwise sequence alignments. Moreover, the fact that it proposes selected mutations makes it a valuable tool for rational protein design. © 2008 Kwasigroch et al; licensee BioMed Central Ltd.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60543/4/doi_36882.pdf

2007

Enhancing the stability and solubility of TEV protease using in silico design.

Cabrita, L. D., Gilis, D., Robertson, A. L., Dehouck, Y., Rooman, M., & Bottomley, S. P. (2007). Enhancing the stability and solubility of TEV protease using in silico design. Protein science, 16(11), 2360-2367. doi:10.1110/ps.072822507

The ability to rationally increase the stability and solubility of recombinant proteins has long been a goal of biotechnology and has significant implications for biomedical research. Poorly soluble enzymes, for example, result in the need for larger reaction volumes, longer incubation times, and more restricted reaction conditions, all of which increase the cost and have a negative impact on the feasibility of the process. Rational design is achieved here by means of the PoPMuSiC program, which performs in silico predictions of stability changes upon single-site mutations. We have used this program to increase the stability of the tobacco etch virus (TEV) protein. TEV is a 27-kDa nuclear inclusion protease with stringent specificity that is commonly used for the removal of solubility tags during protein purification protocols. However, while recombinant TEV can be produced in large quantities, a limitation is its relatively poor solubility (generally approximately 1 mg/mL), which means that large volumes and often long incubation times are required for efficient cleavage. Following PoPMuSiC analysis of TEV, five variants predicted to be more stable than the wild type were selected for experimental analysis of their stability, solubility, and activity. Of these, two were found to enhance the solubility of TEV without compromising its functional activity. In addition, a fully active double mutant was found to remain soluble at concentrations in excess of 40 mg/mL. This modified TEV appears thus as an interesting candidate to be used in recombinant protein technology.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60546/6/PMC2211701.pdf

2006

Deformation of D-branes in three-dimensional anti-de Sitter black holes

Bieliavsky, P., Detournay, S., Rooman, M., & Spindel, P. (2006). Deformation of D-branes in three-dimensional anti-de Sitter black holes. Journal of physics. Conference series, 53(1), 059, 900-911. doi:10.1088/1742-6596/53/1/059

Prelude and Fugue, predicting local protein structure, early folding regions and structural weaknesses.

Kwasigroch, J.-M., & Rooman, M. (2006). Prelude and Fugue, predicting local protein structure, early folding regions and structural weaknesses. Bioinformatics, 22(14), 1800-1802. doi:10.1093/bioinformatics/btl176

Prelude&Fugue are bioinformatics tools aiming at predicting the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude(&Fugue) computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, and possibly satisfying some interresidue distance constraints specified by the user. (Prelude&)Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold. AVAILABILITY: http://babylone.ulb.ac.be/Prelude_and_Fugue.

A new generation of statistical potentials for proteins

Dehouck, Y., Gilis, D., & Rooman, M. (2006). A new generation of statistical potentials for proteins. Biophysical journal, 90(11), 4010-4017. doi:10.1529/biophysj.105.079434

We propose a novel and flexible derivation scheme of statistical, database-derived, potentials, which allows one to take simultaneously into account specific correlations between several sequence and structure descriptors. This scheme leads to the decomposition of the total folding free energy of a protein into a sum of lower order terms, thereby giving the possibility to analyze independently each contribution and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addition, this derivation scheme appears as quite general, for many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60558/4/PMC1459517.pdf

Development of novel statistical potentials describing cation-pi interactions in proteins and comparison with semiempirical and quantum chemistry approaches

Gilis, D., Biot, C., Buisine, E., Dehouck, Y., & Rooman, M. (2006). Development of novel statistical potentials describing cation-pi interactions in proteins and comparison with semiempirical and quantum chemistry approaches. Journal of chemical information and modeling, 46(2), 884-893. doi:10.1021/ci050395b

Novel statistical potentials derived from known protein structures are presented. They are designed to describe cation-pi and amino-pi interactions between a positively charged amino acid or an amino acid carrying a partially charged amino group and an aromatic moiety. These potentials are based on the propensity of residue types to be separated by a certain spatial distance or to have a given relative orientation. Several such potentials, describing different kinds of correlations between residue types, distances, and orientations, are derived and combined in a way that maximizes their information content and minimizes their redundancy. To test the ability of these potentials to describe cation-pi and amino-pi systems, we compare their energies with those computed with the CHARMM molecular mechanics force field and with quantum chemistry calculations at the Hartree-Fock level (HF) and at the second order of the Møller-Plesset perturbation theory (MP2). The latter calculations are performed in the gas phase and in acetone, in order to mimic the average dielectric constant of protein environments. The energies computed with the best of our statistical potentials and with gas-phase HF or MP2 show correlation coefficients up to 0.96 when considering one side-chain degree of freedom in the statistical potentials and up to 0.94 when using a totally simplified model excluding all side-chain degrees of freedom. These potentials perform as well as, or better than, the CHARMM molecular mechanics force field that uses a much more detailed protein representation. The good performance of our cation-pi statistical potentials suggests their utility in protein structure and stability prediction and in protein design.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60556/1/2006_Gilis-b_JChemInfModel.pdf

2005

Histidine-aromatic interactions in proteins and protein-ligand complexes

Cauet, E., Rooman, M., Wintjens, R., Liévin, J., & Biot, C. (2005). Histidine-aromatic interactions in proteins and protein-ligand complexes: quantum chemical study of X-ray and model structures. Journal of chemical theory and computation, 1(3), 472-483. doi:10.1021/ct049875k

His-aromatic complexes, with the His located above the aromatic plane, are stabilized by π-π, δ+-π and/or cation-π interactions according to whether the His is neutral or protonated and the partners are in stacked or T-shape conformations. Here we attempt to probe the relative strength of these interactions as a function of the geometry and protonation state, in gas phase, in water and protein-like environments (acetone, THF and CCl4), by means of quantum chemistry calculations performed up to second order of the Møller-Plesset perturbation theory. Two sets of conformations are considered for that purpose. The first set contains 89 interactions between His and Phe, Tyr, Trp, or Ade, observed in X-ray structures of proteins and protein-ligand complexes. The second set contains model structures obtained by moving an imidazolium/imidazole moiety above a benzene ring or an adenine moiety. We found that the protonated complexes are much more stable than the neutral ones in gas phase. This higher stability is due to the electrostatic contributions, the electron correlation contributions being equally important in the two forms. Thus, π-π and δ+-π interactions present essentially favorable electron correlation energy terms, whereas cation-π interactions feature in addition favorable electrostatic energies. The protonated complexes remain more stable than the neutral ones in protein-like environments, but the difference is drastically reduced. Furthermore, the T-shape conformation is undoubtedly more favorable than the stacked one in gas phase. This advantage decreases in the solvents, and the stacked conformation becomes even slightly more favorable in water. The frequent occurrence of His-aromatic interactions in catalytic sites, at protein-DNA or protein-ligand interfaces and in 3D domain swapping proteins emphasizes their importance in biological processes. © 2005 American Chemical Society.

2004

Database-derived potentials dependent on protein size for in silico folding and design

Dehouck, Y., Gilis, D., & Rooman, M. (2004). Database-derived potentials dependent on protein size for in silico folding and design. Biophysical journal, 87(1), 171-181. doi:10.1529/biophysj.103.037861

Knowledge-based potentials are widely used in simulations of protein folding, structure prediction, and protein design. Their advantages include limited computational requirements and the ability to deal with low-resolution protein models compatible with long-scale simulations. Their drawbacks comprehend their dependence on specific features of the dataset from which they are derived, such as the size of the proteins it contains, and their physical meaning is still a subject of debate. We address these issues by probing the theoretical validity of these potentials as mean-force potentials that take the solvent implicitly into account and involve entropic contributions due to atomic degrees of freedom and solvation. The dependence on the size of the system is checked on distance-dependent amino acid pair potentials, derived from six protein structure sets containing proteins of increasing length N. For large inter-residue distances, they are found to display the theoretically predicted 1/N behavior weighted by a factor depending on the boundaries and the compressibility of the system. For short distances, different trends are observed according to the nature of the residue pairs and their ability to form, for example, electrostatic, cation-pi or pi-pi interactions, or hydrophobic packing. The results of this analysis are used to devise a novel protein size-dependent distance potential, which displays an improved performance in discriminating native sequence-structure matches among decoy models.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60561/4/PMC1304340.pdf

Star products on extended massive non-rotating BTZ black holes

Bieliavsky, P., Detournay, S., Spindel, P., & Rooman, M. (2004). Star products on extended massive non-rotating BTZ black holes. The journal of high energy physics (Online), 8(6), 697-721.

Cation-pi/H-bond stair motifs at protein-DNA interfaces

Biot, C., Wintjens, R., & Rooman, M. (2004). Cation-pi/H-bond stair motifs at protein-DNA interfaces: nonadditivity of H-bond, stacking, and cation-π interactions. Journal of the American Chemical Society, 126(20), 6220-6221. doi:10.1021/ja049620g

At the interface between protein and double-stranded DNA, stair motifs simultaneously involve three different types of pairwise interactions: aromatic base stacking, hydrogen bonding, and cation-π. The relative importance of these interactions is studied in the stair motif occurring in the 1TC3 crystal structure, which involves an arginine and two stacked guanines, by means of Hartree-Fock (HF) and Møller-Plesset energy and free energy calculations, including vibrational, rotational, translational contributions, both in a vacuum and various solvents. The results obtained show an anti-cooperative tendency of the HF energy and vibrational free energy terms, and the cooperativity of the rotational, translational, and solvation free energies. Hence, the cooperativity of the stair motif interactions, in the context of protein-DNA recognition, can be viewed as arising from the environment. Copyright © 2003 American Chemical Society.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71786/4/a73d6ec3-2242-4a29-b1d4-20269e2fc947.txt

Specificity and phenetic relationships of iron- and manganese-containing superoxide dismutases on the basis of structure and sequence comparisons.

Wintjens, R., Noël, C., May, A. C. W., Gerbod, D., Dufernez, F., Capron, M., Viscogliosi, E., & Rooman, M. (2004). Specificity and phenetic relationships of iron- and manganese-containing superoxide dismutases on the basis of structure and sequence comparisons. The Journal of biological chemistry, 279(10), 9248-9254. doi:10.1074/jbc.M312329200

The iron- and manganese-containing superoxide dismutases (Fe/Mn-SOD) share the same chemical function and spatial structure but can be distinguished according to their modes of oligomerization and their metal ion specificity. They appear as homodimers or homotetramers and usually require a specific metal for activity. On the basis of 261 aligned SOD sequences and 12 superimposed x-ray structures, two phenetic trees were constructed, one sequence-based and the other structure-based. Their comparison reveals the imperfect correlation of sequence and structural changes; hyperthermophilicity requires the largest sequence alterations, whereas dimer/tetramer and manganese/iron specificities are induced by the most sizable structural differences within the monomers. A systematic investigation of sequence and structure characteristics conserved in all aligned SOD sequences or in subsets sharing common oligomeric and/or metal specificities was performed. Several residues were identified as guaranteeing the common function and dimeric conformation, others as determining the tetramer formation, and yet others as potentially responsible for metal specificity. Some form cation-pi interactions between an aromatic ring and a fully or partially positively charged group, suggesting that these interactions play a significant role in the structure and function of SOD enzymes. Dimer/tetramer- and iron/manganese-specific fingerprints were derived from the set of conserved residues; they can be used to propose selected residue substitutions in view of the experimental validation of our in silico derived hypotheses.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71787/1/Wintjens-2004.pdf

2003

Free-energy calculations of protein-ligand cation-pi and amino-pi interactions: from vacuum to proteinlike environments

Biot, C., Buisine, E., & Rooman, M. (2003). Free-energy calculations of protein-ligand cation-pi and amino-pi interactions: from vacuum to proteinlike environments. Journal of the American Chemical Society, 125(46), 13988-13994. doi:10.1021/ja035223e

To probe the role of cation-pi and amino-pi interactions in the context of protein-ligand interactions, the stability of 55 X-ray cation/amino-pi motifs involving the Ade moieties of cofactor molecules and Arg, Lys, Asn, or Gln side chains of their host protein was evaluated using quantum chemistry calculations. The conjunction of vacuum interaction energies, vibrational entropy, and solvation contributions led to identify Arg-Ade as the most favorable cation/amino-pi complex in the solvents considered, followed by Asn/Gln-Ade and Lys-Ade: their minimum interaction free energies are approximately equal to -7, -4, and -2 kcal/mol, respectively, in the solvents of dielectric constant similar to that estimated for proteins (i.e., acetone, THF, and CCl4). Remarkably, these free-energy values of cation/amino-pi interactions correlate well with their frequency of occurrences in protein-ligand structures, which corroborates our approach in the absence of experimental data.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71788/4/5b29d7bc-1505-4fcb-a489-8867aa08142e.txt

Global geometry of the 2 + 1 rotating black hole

Bieliavsky, P., Detournay, S., Herquet, M., Rooman, M., & Spindel, P. (2003). Global geometry of the 2 + 1 rotating black hole. Physics letters. Section B, 570(3-4), 231-236. doi:10.1016/j.physletb.2003.07.055

The generic rotating BTZ black hole, obtained by identifications in AdS3 space through a discrete subgroup of its isometry group, is investigated within a Lie theoretical context. This space is found to admit a foliation by two-dimensional leaves, orbits of a two-parameter subgroup of S̃L(2, ℝ) and invariant under the BTZ identification subgroup. A global expression for the metric is derived, allowing a better understanding of the causal structure of the black hole. © 2003 Elsevier B.V. All rights reserved.

Basis set and electron correlation effects on ab initio calculations of cation-pi/H-bond stair motifs

Wintjens, R., Biot, C., Rooman, M., & Liévin, J. (2003). Basis set and electron correlation effects on ab initio calculations of cation-pi/H-bond stair motifs. The Journal of Physical Chemistry. A, 107(32), 6249-6258. doi:10.1021/jp034103q

Cation-π/H-bond stair motifs are recurrently found at the binding interface between protein and DNA. They involve two nucleobases and an amino acid side chain, and encompass three different types of interactions: nucleobase stacking, nucleobase-amino acid H-bond and nucleobase-amino acid cation-π interaction. The interaction energies of the 77 stair motif geometries identified in a data set of 52 high-resolution protein-DNA complexes were investigated by means of ab initio quantum chemistry calculations. Using the standard 6-31G* basis set, we first establish the value of the Gaussian αd-exponent of d-polarization functions on heavy atoms, which optimizes the MP2 interaction energies. We show that, although the default value of αd = 0.8 is appropriate to minimize the total MP2 energy of a system, the value of αd = 0.2 is optimal for the three types of pairwise interactions studied and yields MP2 interaction energies quite similar to those calculated with more extended basis sets. Indeed, the more diffuse nature of the αd = 0.2 basis functions allows a spatial overlap between the orbitals of the interacting partners. Such functions are also shown to improve the multipole electric moments in the interaction region, which results in a stabilizing polarization effect and a better description of the dispersive energy contributions. Using the MP2 computation level and the 6-31G* basis set with αd = 0.2 instead of αd = 0.8, we computed the interaction energies of the 77 observed stair motif geometries and found that, in a vacuum, the cation-π energy is much less favorable, about 3 times, than the H-bond energy and of the same order of magnitude as the π-π stacking energy. Furthermore, the convergence of the MP perturbation theory expansions was analyzed by computing the MP3 and MP4 corrections on simplified complexes. These expansions exhibited an oscillatory behavior, where MP2 seems to provide a satisfactory approximation, albeit slightly overestimated, to the interaction energy.

https://dipot.ulb.ac.be/dspace/bitstream/2013/72007/4/6c4b27af-f38c-4c30-a20f-96f9c504fe96.txt

Sequence-structure signals of 3D domain swapping in proteins

Dehouck, Y., Biot, C., Gilis, D., Kwasigroch, J.-M., & Rooman, M. (2003). Sequence-structure signals of 3D domain swapping in proteins. Journal of Molecular Biology, 330(5), 1215-1225. doi:10.1016/S0022-2836(03)00614-4

Three-dimensional domain swapping occurs when two or more identical proteins exchange identical parts of their structure to generate an oligomeric unit. It affects proteins with diverse sequences and structures, and is expected to play important roles in evolution, functional regulation and even conformational diseases. Here, we search for traces of domain swapping in the protein sequence, by means of algorithms that predict the structure and stability of proteins using database-derived potentials. Regions whose sequences are not optimal with regard to the stability of the native structure, or showing marked intrinsic preferences for non-native conformations in absence of tertiary interactions are detected in most domain-swapping proteins. These regions are often located in areas crucial in the swapping process and are likely to influence it on a kinetic or thermodynamic level. In addition, cation-pi interactions are frequently observed to zip up the edges of the interface between intertwined chains or to involve hinge loop residues, thereby modulating stability. We end by proposing a set of mutations altering the swapping propensities, whose experimental characterization would contribute to refine our in silico derived hypotheses.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60564/4/Elsevier_70816.pdf

In vitro and in silico design of alpha1-antitrypsin mutants with different conformational stabilities

Gilis, D., McLennan, H. R., Dehouck, Y., Cabrita, L. D., Rooman, M., & Bottomley, S. P. (2003). In vitro and in silico design of alpha1-antitrypsin mutants with different conformational stabilities. Journal of Molecular Biology, 325(3), 581-589. doi:10.1016/S0022-2836(02)01221-4

α1-Antitrypsin, a protein belonging to the serine protease inhibitor (serpin) superfamily, is characterized by the ability to undergo dramatic conformational changes leading to inactive polymers. Serpin polymerization, which causes a range of diseases such as emphysema, thrombosis and dementia, occurs through a process in which the reactive center loop residues of one serpin molecule insert into the A β-sheet of another. PoP-MuSiC, a program that uses database-derived mean force potentials to predict changes in folding free energy resulting from single-site mutations, was used to modulate rationally the polymerization propensity of α1-antitrypsin. This was accomplished by generating mutants with a stabilized active form and destabilized polymerized form, or the converse. Of these mutants, five were expressed and characterized experimentally. In agreement with the predictions, three of them, K331F, K331I and K331V, were shown to stabilize the active form and decrease the polymerization rate, and one of them, S330R, to destabilize the active form and to increase polymerization. Only one mutant (K331T) did not display the expected behavior. Thus, strikingly, the adjacent positions 330 and 331, which are located at the beginning of the β-strand next to the additionally inserted β-strand in the polymerized form, have opposite effects on the conformational change. These residues therefore appear to play a key role in inducing or preventing such conformational change. © 2003 Elsevier Science Ltd. All rights reserved.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60565/1/2003_Gilis_JMolBiol.pdf

2002

Regular Poisson structures on massive non-rotating BTZ black holes

Bieliavsky, P., Rooman, M., & Spindel, P. (2002). Regular Poisson structures on massive non-rotating BTZ black holes. Nuclear physics. B, 645(3), 349-364.

PoPMuSiC, rationally designing point mutations in protein structures

Kwasigroch, J.-M., Gilis, D., Dehouck, Y., & Rooman, M. (2002). PoPMuSiC, rationally designing point mutations in protein structures. Bioinformatics, 18(12), 1701-1702. doi:10.1093/bioinformatics/18.12.1701

PoPMuSiC is an efficient tool for rational computer-aided design of single-site mutations in proteins and peptides. Two types of queries can be submitted. The first option allows one to estimate the changes in folding free energy for specific point mutations given by the user. In the second option, all possible point mutations in a given protein or protein region are performed and the most stabilizing or destabilizing mutations, or the neutral mutations with respect to thermodynamic stability, are selected. For each sequence position or secondary structure the deviation from the most stable sequence is moreover evaluated, which helps to identify the most suitable sites for the introduction of mutations.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60568/4/doi_36905.pdf

What is paradoxical about Levinthal paradox?

Rooman, M., Dehouck, Y., Kwasigroch, J.-M., Biot, C., & Gilis, D. (2002). What is paradoxical about Levinthal paradox? Journal of biomolecular structure & dynamics, 20(3), 327-329. doi:10.1080/07391102.2002.10506850

We would be tempted to state that there has never been a Levinthal paradox. Indeed, Levinthal raised an interesting problem about protein folding, as he realized that proteins have no time to explore exhaustively their conformational space on the way to their native structure. He did not seem to find this paradoxical and immediately proposed a straightforward solution, which has essentially never been refuted. In other words, Levinthal solved his own paradox.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60566/4/2c092a64-b70d-4a90-8853-c62ff6f0aeee.txt

Probing the energetic and structural role of amino acid/nucleobase cation-pi interactions in protein-ligand complexes.

Biot, C., Buisine, E., Kwasigroch, J.-M., Wintjens, R., & Rooman, M. (2002). Probing the energetic and structural role of amino acid/nucleobase cation-pi interactions in protein-ligand complexes. The Journal of biological chemistry, 277(43), 40816-40822. doi:10.1074/jbc.M205719200

X-ray structures of proteins bound to ligand molecules containing a nucleic acid base were systematically searched for cation-pi interactions between the base and a positively charged or partially charged side chain group located above it, using geometric criteria. Such interactions were found in 38% of the complexes and are thus even more frequent than pi-pi stacking interactions. They are moreover well conserved in families of related proteins. The overwhelming majority of cation-pi contacts involve Ade bases, as these constitute by far the most frequent ligand building block; Arg-Ade is the most frequent cation-pi pair. Ab initio energy calculations at MP2 level were performed on all recorded pairs. Though cation-pi interactions involving the net positive charge carried by Arg or Lys side chains are the most favorable energetically, those involving the partial positive charge of Asn and Gln side chain amino groups (sometimes referred to as amino-pi interactions) are favorable too, owing to the electron correlation energy contribution. Chains of cation-pi interactions with a nucleobase bound simultaneously to two charged groups or a charged group sandwiched between two aromatic moieties are found in several complexes. The systematic association of these motifs with specific ligand molecules in unrelated protein sequences raises the question of their role in protein-ligand structure, stability, and recognition.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71789/6/bc7ee094-be71-4578-8143-b82558dd98f2.txt

Cation-pi/H-bond stair motifs at protein-DNA interfaces

Rooman, M., Liévin, J., Buisine, E., & Wintjens, R. (2002). Cation-pi/H-bond stair motifs at protein-DNA interfaces. Journal of Molecular Biology, 319(1), 67-76. doi:10.1016/S0022-2836(02)00263-2

H-bonds and cation-π interactions between nucleic acid bases and amino acid side-chains are known to occur often concomitantly at the interface between protein and double-stranded DNA. Here we define and analyze stair-shaped motifs, which simultaneously involve base stacking, H-bond and cation-π interactions. They consist of two successive bases along the DNA stack, one in cation-π interaction with an amino acid side-chain that carries a total or partial positive charge, and the other H-bonded with the same side-chain. A survey of 52 high-resolution structures of protein/DNA complexes reveals the occurrence of such motifs in the majority of the complexes, the most frequent of these motifs involving Arg side-chains and G bases. These stair motifs are sometimes part of larger motifs, called multiple stair motifs, which contain several successive stairs; zinc finger proteins for example exhibit up to quadruple stairs. In another kind of stair motif extension, termed cation-π chain motif, an amino acid side-chain or a nucleic acid base forms simultaneously two cation-π interactions. Such a motif is observed in several homeodomains, where it involves a DNA base in cation-π interactions with an Arg in the minor groove and an Asn in the major groove. A different cation-π chain motif contains an Arg in cation-π with a G and a Tyr, and is found in ets transcription factors. Still another chain motif is encountered in proteins that expulse a base from the DNA stack and replace it by an amino acid side-chain carrying a net or partial positive charge, which forms cation-π interactions with the two neighboring bases along the DNA strand. The striking conservation of typical stair and cation-π chain motifs within families of protein/DNA complexes suggests that they might play a structural and/or functional role and might moreover influence electron migration through the DNA double helix. © 2002 Elsevier Science Ltd. All rights reserved.

https://dipot.ulb.ac.be/dspace/bitstream/2013/145289/1/Elsevier_129112.pdf

2001

Uniqueness of the asymptotic AdS3 geometry

Rooman, M., & Spindel, P. (2001). Uniqueness of the asymptotic AdS3 geometry. Classical and quantum gravity, 18(11), 2117-2123. doi:10.1088/0264-9381/18/11/309

Identification and ab initio simulations of early folding units in proteins

Gilis, D., & Rooman, M. (2001). Identification and ab initio simulations of early folding units in proteins. Proteins, 42(2), 164-176. doi:10.1002/1097-0134(20010201)42:2<164::AID-PROT30>3.0.CO;2-#

The location of protein subunits that form early during folding, constituted of consecutive secondary structure elements with some intrinsic stability and favorable tertiary interactions, is predicted using a combination of threading algorithms and local structure prediction methods. Two folding units are selected among the candidates identified in a database of known protein structures: the fragment 15-55 of 434 cro, an all-alpha protein, and the fragment 1-35 of ubiquitin, an alpha/beta protein. These units are further analyzed by means of Monte Carlo simulated annealing using several database-derived potentials describing different types of interactions. Our results suggest that the local interactions along the chain dominate in the first folding steps of both fragments, and that the formation of some of the secondary structures necessarily occurs before structure compaction. These findings led us to define a prediction protocol, which is efficient to improve the accuracy of the predicted structures. It involves a first simulation with a local interaction potential only, whose final conformation is used as a starting structure of a second simulation that uses a combination of local interaction and distance potentials. The root mean square deviations between the coordinates of predicted and native structures are as low as 2-4 Å in most trials. The possibility of extending this protocol to the prediction of full proteins is discussed. Proteins 2001;42:164-176.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60570/4/60570.pdf

Holonomies, anomalies and the Fefferman-Graham ambiguity in AdS3 gravity

Rooman, M., & Spindel, P. (2001). Holonomies, anomalies and the Fefferman-Graham ambiguity in AdS3 gravity. Nuclear physics. B, 594(1-2), 329-353. doi:10.1016/S0550-3213(00)00636-2

Using the Chern-Simons formulation of (2+1)-gravity, we derive, for the general asymptotic metrics given by the Fefferman-Graham-Lee theorems, the emergence of the Liouville mode associated to the boundary degrees of freedom of (2+1)-dimensional anti-de-Sitter geometries. Holonomies are described through multi-valued gauge and Liouville fields and are found to algebraically couple the fields defined on the disconnected components of spatial infinity. In the case of flat boundary metrics, explicit expressions are obtained for the fields and holonomies. We also show the link between the variation under diffeomorphisms of the Einstein theory of gravitation and the Weyl anomaly of the conformal theory at infinity. © 2001 Elsevier Science B.V.

https://dipot.ulb.ac.be/dspace/bitstream/2013/72010/1/Elsevier_49385.pdf

Role of salt bridges in homeodomains investigated by structural analyses and molecular dynamics simulations

Iurcu-Mustata, G., Van Belle, D., Wintjens, R., Prévost, M., & Rooman, M. (2001). Role of salt bridges in homeodomains investigated by structural analyses and molecular dynamics simulations. Biopolymers, 59(3), 145-159. doi:10.1002/1097-0282(200109)59:3<145::AID-BIP1014>3.0.CO;2-Z

https://dipot.ulb.ac.be/dspace/bitstream/2013/145296/4/145296.pdf

https://dipot.ulb.ac.be/dspace/bitstream/2013/145296/1/Role-salt-bridges-homeodomains.pdf

Optimality of the genetic code with respect to protein stability and amino-acid frequencies

Gilis, D., Massar, S., Cerf, N., & Rooman, M. (2001). Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biology, 2, 0049.

BACKGROUND: The genetic code is known to be efficient in limiting the effect of mistranslation errors. A misread codon often codes for the same amino acid or one with similar biochemical properties, so the structure and function of the coded protein remain relatively unaltered. Previous studies have attempted to address this question quantitatively, by estimating the fraction of randomly generated codes that do better than the genetic code in respect of overall robustness. We extended these results by investigating the role of amino-acid frequencies in the optimality of the genetic code. RESULTS: We found that taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code. This effect is particularly pronounced when more refined measures of the amino-acid substitution cost are used than hydrophobicity. To show this, we devised a new cost function by evaluating in silico the change in folding free energy caused by all possible point mutations in a set of protein structures. With this function, which measures protein stability while being unrelated to the code's structure, we estimated that around two random codes in a billion (10^9) are fitter than the natural code. When alternative codes are restricted to those that interchange biosynthetically related amino acids, the genetic code appears even more optimal. CONCLUSIONS: These results lead us to discuss the role of amino-acid frequencies and other parameters in the genetic code's evolution, in an attempt to propose a tentative picture of primitive life.

https://dipot.ulb.ac.be/dspace/bitstream/2013/30416/4/PMC60310.pdf

Ab initio structure predictions using a hierarchic approach applied to 434 CRO and Drosophila homeodomain

Gilis, D., & Rooman, M. (2001). Ab initio structure predictions using a hierarchic approach applied to 434 CRO and Drosophila homeodomain. Theoretical Chemistry accounts, 106(1-2), 69-75. doi:10.1007/s002140000221

A discrete-state ab initio protein structure prediction procedure is presented, based on the assumption that some proteins fold in a hierarchical way, where the early folding of independent units precedes and helps complete structure formation. It involves a first step predicting, by means of threading algorithms and local structure prediction methods, the location of autonomous protein subunits presenting favorable local and tertiary interactions. The second step consists of predicting the structure of these units by Monte Carlo simulated annealing using several database-derived potentials. In a last step, these predicted structures are used as starting conformations of additional simulations, keeping these structures frozen and including the complete protein sequence. This procedure is applied to two small DNA-binding proteins, 434 cro and the Drosophila melanogaster homeodomain that contain 65 and 47 residues, respectively, and is compared to the nonhierarchical procedure where the whole protein is predicted in a single run. The best predicted structures were found to present root-mean-square deviation relative to the native conformation of 2.7 Å in the case of the homeodomain and of 3.9 Å for 434 cro; these structures thus represent low-resolution models of the native structures. Strikingly, not only the helices were correctly predicted but also intervening turn motifs.

2000

PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins.

Gilis, D., & Rooman, M. (2000). PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins. Protein engineering, 13(12), 849-856.

A novel tool for computer-aided design of single-site mutations in proteins and peptides is presented. It proceeds by performing in silico all possible point mutations in a given protein or protein region and estimating the stability changes with linear combinations of database-derived potentials, whose coefficients depend on the solvent accessibility of the mutated residues. Upon completion, it yields a list of the most stabilizing, destabilizing or neutral mutations. This tool is applied to mouse, hamster and human prion proteins to identify the point mutations that are the most likely to stabilize their cellular form. The selected mutations are essentially located in the second helix, which presents an intrinsic preference to form beta-structures, with the best mutations being T183-->F, T192-->A and Q186-->A. The T183 mutation is predicted to be by far the most stabilizing one, but should be considered with care as it blocks the glycosylation of N181 and this blockade is known to favor the cellular to scrapie conversion. Furthermore, following the hypothesis that the first helix might induce the formation of hydrophilic beta-aggregates, several mutations that are neutral with respect to the structure's stability but improve the helix hydrophobicity are selected, among which is E146-->L. These mutations are intended as good candidates to undergo experimental tests.

Contribution of cation-pi interactions to the stability of protein-DNA complexes

Wintjens, R., Liévin, J., Rooman, M., & Buisine, E. (2000). Contribution of cation-pi interactions to the stability of protein-DNA complexes. Journal of Molecular Biology, 302(2), 395-410.

The Fefferman-Graham ambiguity and ADS black holes

Bautier, K., Englert, F., Rooman, M., & Spindel, P. (2000). The Fefferman-Graham ambiguity and ADS black holes. Physics letters. Section B, 479(1-3), 291-298. doi:10.1016/S0370-2693(00)00339-7

Asymptotically anti-de Sitter space-times in pure gravity with negative cosmological constant are described, in all space-time dimensions greater than two, by classical degrees of freedom on the conformal boundary at space-like infinity. Their effective boundary action has a conformal anomaly for even dimensions and is conformally invariant for odd ones. These degrees of freedom are encoded in traceless tensor fields in the Fefferman-Graham asymptotic metric for any choice of conformally flat boundary and generate all Schwarzschild and Kerr black holes in anti-de Sitter space-time. We argue that these fields describe components of an energy-momentum tensor of a boundary theory and show explicitly how this is realized in 2 + 1 dimensions. There, the Fefferman-Graham fields reduce to the generators of the Virasoro algebra and give the mass and the angular momentum of the BTZ black holes. Their local expression is the Liouville field in a general curved background. © 2000 Elsevier Science B.V.

https://dipot.ulb.ac.be/dspace/bitstream/2013/72011/1/Elsevier_49387.pdf

Aspects of (2+1) dimensional gravity: AdS3 asymptotic dynamics in the framework of Fefferman-Graham-Lee theorems

Rooman, M., & Spindel, P. (2000). Aspects of (2+1) dimensional gravity: AdS3 asymptotic dynamics in the framework of Fefferman-Graham-Lee theorems. Annalen der Physik, 9, 161-167.

1999

Expression of a novel factor in human breast cancer cells with metastatic potential

Ree, A. H., Tvermyr, M., Engebraaten, O., Rooman, M., Røsok, Ø., Hovig, E., Meza-Zepeda, L. A., Bruland, O., & Fodstad, Ø. (1999). Expression of a novel factor in human breast cancer cells with metastatic potential. Cancer research, 59(18), 4675-4680.

Prediction of stability changes upon single-site mutations using database-derived potentials

Gilis, D., & Rooman, M. (1999). Prediction of stability changes upon single-site mutations using database-derived potentials. Theoretical Chemistry accounts, 101(1-3), 46-50. doi:10.1007/s002140050404

One of the purposes of studying protein stability changes upon mutations is to get information about the dominating interactions that drive folding and stabilise the native structure. With this in mind, we present a method that predicts folding free-energy variations caused by point mutations using combinations of two types of database-derived potentials, i.e. backbone torsion-angle potentials and distance potentials, describing local and non-local interactions along the chain, respectively. The method is applied to evaluate the folding free-energy changes of 344 single-site mutations introduced in six different proteins and a synthetic peptide. We found that the relative importance of local versus non-local interactions along the chain is essentially a function of the solvent accessibility of the mutated residues. For the subset of totally buried residues, the optimal potential is the sum of a distance potential and a torsion potential weighted by a factor of 0.4. This combination yields a correlation coefficient between measured and computed changes in folding free energy of 0.80. For mutations of partially buried residues, the best potential is the sum of a torsion potential and a distance potential weighted by 0.7. For fully accessible residues, the torsion potentials taken alone perform best, reaching correlation coefficients of 0.87 on all but 10 mutations; the excluded mutations seem to modify the backbone structure or to involve interactions that are atypical for the surface. These results show that the relative weight of non-local interactions along the sequence decreases as the solvent accessibility of the mutated residue increases, and vanishes at the protein surface. On the contrary, the weight of local interactions increases with solvent accessibility. The latter interactions are nevertheless never negligible, even for the most buried residues.

(2+1)-dimensional stars

Lubo, M., Rooman, M., & Spindel, P. (1999). (2+1)-dimensional stars. Physical Review D - Particles, Fields, Gravitation and Cosmology, 59(4), 1-12.

1998

Gödel metric as a squashed anti-de Sitter geometry

Rooman, M., & Spindel, P. (1998). Gödel metric as a squashed anti-de Sitter geometry. Classical and quantum gravity, 15(10), 3241-3249.

Typical interaction patterns in alpha-beta and beta-alpha turn motifs.

Wintjens, R., Wodak, S., & Rooman, M. (1998). Typical interaction patterns in alpha-beta and beta-alpha turn motifs. Protein engineering, 11(7), 505-522.

A fully automatic classification procedure of short protein fragments is applied to identify connections between alpha-helices and beta-strands in a dataset of 141 protein chains. It yields 15 structural families of alpha-beta turns and 15 families of beta-alpha turns with at least five members. The sequence and structural features of these turn motifs are analysed with the focus on the local interactions located at alpha-helix and beta-strand ends. This analysis reveals specific interaction patterns that occur frequently among the members of many of the identified turn motifs. For the beta-strands, novel patterns are identified at the strands' entry and exit; they involve side chain/side chain contacts and beta-turns, generally of type I or II. For the alpha-helices, the interaction patterns consist of several backbone/backbone or backbone/side chain hydrogen bonds and of hydrophobic contacts; they generalize the well known N-terminal capping and C-terminal Schellman motifs. The interaction patterns at both ends of alpha-helices and beta-strands are found to constitute favourable structure motifs with low amino acid sequence specificity; their possible stabilizing role is discussed. Finally, the robustness of our classification procedure and of the description of N- and C-cap interaction patterns is validated by repeating our analysis on a larger dataset of 381 protein chains and showing that the results are maintained.

Different derivations of knowledge-based potentials and analysis of their robustness and context-dependent predictive power.

Rooman, M., & Gilis, D. (1998). Different derivations of knowledge-based potentials and analysis of their robustness and context-dependent predictive power. European journal of biochemistry / FEBS, 254(1), 135-143. doi:10.1046/j.1432-1327.1998.2540135.x

The possibility of defining effective potentials from known protein structures, which are sufficiently accurate to be used for protein-structure-prediction purposes, is investigated. Three types of distance potentials and three types of backbone torsion potentials are defined, based on propensities of amino acid pairs to be separated by a given spatial distance or to be associated to a backbone torsion angle domain. Their differences reside in the way the physical correlations between the amino acids and the conformational states are extracted from the bulk interactions due to the presence of many residues in a protein. For the distance potentials, a physical meaning can be associated to the different definitions, given that some of the potentials favor hydrophobic interactions and others favor interactions between oppositely charged residues. The performance of the different torsion and distance potentials in structure prediction procedures, in particular native-fold recognition and evaluation of protein stability changes upon point mutations, is analyzed. It appears to differ according to the specific proteins and protein environments. In particular, one of the distance potentials performs better than the others for membrane proteins and in protein regions involving charged residues, but less well in other protein regions. Furthermore, the dependence of the potentials on the characteristics of the proteins from which they are derived is analyzed. It is shown that the dependence of the potentials on the length, amino acid composition and secondary-structure content of the proteins from the dataset is either very limited or rather strong, according to the type of potential. The results obtained suggest that the main problem limiting the performance of database-derived potentials is their lack of universality: each potential describes with satisfactory accuracy only the interactions present in certain protein environments.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60572/1/1998_Rooman_EurJBiochem.pdf

Structural classification of alpha-beta-beta and beta-beta-alpha supersecondary structure units in proteins

Boutonnet, N., Kajava, A., & Rooman, M. (1998). Structural classification of alpha-beta-beta and beta-beta-alpha supersecondary structure units in proteins. Proteins, 30(2), 193-212. doi:10.1002/(SICI)1097-0134(19980201)30:2<193::AID-PROT9>3.0.CO;2-O
https://dipot.ulb.ac.be/dspace/bitstream/2013/145291/3/145291.pdf

1997

Chiral supersymmetric pp-wave solutions of IIA supergravity

Gabriel, C., Spindel, P., & Rooman, M. (1997). Chiral supersymmetric pp-wave solutions of IIA supergravity. Physics letters. Section B, 415(1), 54-62.

Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence.

Gilis, D., & Rooman, M. (1997). Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. Journal of Molecular Biology, 272(2), 276-290. doi:10.1006/jmbi.1997.1237

For 238 mutations of residues totally or partially buried in the protein core, we estimate the folding free energy changes upon mutation using database-derived potentials and correlate them with the experimentally measured ones. Several potentials are tested, representing different kinds of interactions. Local interactions along the chain are described by torsion potentials, based on propensities of amino acids to be associated with backbone torsion angle domains. Non-local interactions along the sequence are represented by distance potentials, derived from propensities of amino acid pairs or triplets to be at a given spatial distance. We find that for the set of totally buried residues, the best performing potential is a combination of a distance potential and a torsion potential weighted by a factor of 0.4; it yields a correlation coefficient between computed and measured changes in folding free energy of 0.80. For mutations of partially buried residues, the best potential is a combination of a torsion potential and a distance potential weighted by a factor of 0.7, and for the previously analysed mutations of solvent accessible residues, it is a torsion potential taken individually; the respective correlation coefficients reach 0.82 and 0.87. These results show that distance potentials, dominated by hydrophobic interactions, represent best the main interactions stabilizing the protein core, whereas torsion potentials, describing local interactions along the chain, represent best the interactions at the protein surface. The prediction accuracy reached by the distance potentials is, however, lower than that of the torsion potentials. A possible reason for this is that distance potentials would not describe correctly the effect on protein stability due to cavity formation upon mutating a large into a small amino acid. 
Last but not least, our results indicate that although local interactions, responsible for secondary structure formation, do not dominate in the protein core, they are not negligible for all that. They have a significant weight in the delicate balance between all the interactions that ensure protein stability.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60573/1/1997_Gilis_JMolBiol.pdf

1996

Structural classification of HTH DNA-binding domains and protein-DNA interaction modes.

Wintjens, R., & Rooman, M. (1996). Structural classification of HTH DNA-binding domains and protein-DNA interaction modes. Journal of Molecular Biology, 262(2), 294-313. doi:10.1006/jmbi.1996.0514

This paper constitutes an attempt to rationalize the structural similarities and differences that are observed among the HTH DNA-binding domains, and the various modes of protein-DNA interactions. It consists of classifying all the domains of known structure into families on the basis of the spatial arrangement of their helices, irrespective of the type of loops and the presence of beta-strands, and examining the interaction patterns between amino acids and DNA within each family. It is found that the recognition helix and the preceding helix along the chain always have the same relative orientation. Structural differences arise when considering three helices, corresponding usually to the recognition helix and the two preceding ones, but sometimes to the recognition helix and the two flanking helices. Using an automatic classification procedure, seven main families are obtained, whose members have in common the spatial arrangement of their three key helices, but sometimes have different topology and belong to different species. The structural divergence among these families and the existence of structural intermediates are analyzed. Searching these families systematically for recurrent motifs leads to the identification of two specific turns, besides the HTH turn. They both link the two helices preceding the recognition helix and are each characteristic of a given family. Furthermore, the conservation of protein-DNA interaction patterns is examined with respect to the structural alignments. These patterns are found to be relatively well conserved within each family and to be different between the different families. The agreement between the structural classification and the patterns of protein-DNA contacts justifies our approach, and suggests its applicability, in particular for modelling protein-DNA interactions.

Stability changes upon mutation of solvent-accessible residues in proteins evaluated by database-derived potentials.

Gilis, D., & Rooman, M. (1996). Stability changes upon mutation of solvent-accessible residues in proteins evaluated by database-derived potentials. Journal of Molecular Biology, 257(5), 1112-1126. doi:10.1006/jmbi.1996.0226

The stability changes in peptides and proteins caused by the substitution of a single amino acid, which can be measured experimentally by the change in folding free energy, are evaluated here using effective potentials derived from known protein structures. The analysis is focused on mutations of residues that are accessible to the solvent. These represent in total 106 mutations, introduced at different sites in barnase, bacteriophage T4 lysozyme and chymotrypsin inhibitor 2, and in a synthetic helical peptide. Assuming that the mutations do not modify the backbone structure, the changes in folding free energies are computed using various types of database-derived potentials and are compared with the measured ones. Distance-dependent residue-residue potentials are found to be inadequate for estimating the stability changes caused by these mutations, as they are dominated by hydrophobic interactions, which do not play an essential role at the protein surface. On the contrary, the potentials based on backbone torsion angle propensities yield quite good results. Indeed, for a subset of 96 out of the 106 mutations, the computed and measured changes in folding free energy correlate with a linear correlation coefficient of 0.87. Moreover, the ten mutations that are excluded from the correlation either seem to cause modifications of the backbone structure or to involve strong hydrophobic interactions, which are atypical for solvent-accessible residues. We find furthermore that raising the ionic strength of the solvent used for measuring the changes in folding free energies improves the correlation, as it tends to mask the electrostatic interactions. 
When adding to these 106 mutations 44 mutations performed in staphylococcal nuclease and chemotactic protein, which were first discarded because some of them were suspected to affect the backbone conformation or the denatured state, the correlation between measured and computed folding free energy changes remains quite good: the correlation coefficient is 0.86 for 135 out of the 150 mutations. The success of the backbone torsion potentials in predicting stability changes indicates that the approximations made for deriving these potentials are adequate. It suggests moreover that the local interactions along the chain dominate at the protein surface.

https://dipot.ulb.ac.be/dspace/bitstream/2013/60574/1/1996_Gilis_JMolBiol.pdf

Automatic classification and analysis of alpha alpha-turn motifs in proteins.

Wintjens, R., Rooman, M., & Wodak, S. (1996). Automatic classification and analysis of alpha alpha-turn motifs in proteins. Journal of Molecular Biology, 255(1), 235-253. doi:10.1006/jmbi.1996.0020

An automatic procedure for the classification of short protein fragments, representing turn motifs between two consecutive secondary structures, is presented. This procedure has two steps. Fragments of given length are first grouped on the basis of their backbone dihedral angle values, and then clustered as a function of the root-mean-square deviation of their superimposed backbone atoms. The classification procedure identifies 63 families of turn motifs with at least five members, in a dataset of 141 proteins. A detailed analysis is presented of the ten identified alpha alpha-turn families, of which four correspond to novel motifs. The sequence and structure features that characterize these families are described. It is found that some features are conserved within the fragments belonging to the same family, but their environment in the parent protein varies considerably. N-capping interactions and helix stop signals are encountered in a number of families, where they seem to stabilize the motif conformation. In two families, one with three residues in the loop, and one with four, an appreciable fraction of the members displays both types of characteristic helix end interactions in the same motif. Interestingly, contrary to most other alpha alpha-turns, the relative frequency of these two motifs is much higher than that of short protein segments with the same loop conformation. Furthermore, the family with three residues in the loop includes the helix-turn-helix motif known to bind DNA. It seems to be the only one among the ten identified families that can be related to biological function.

1995

Protein structure prediction by threading methods: evaluation of current techniques.

Lemer, C., Rooman, M., & Wodak, S. (1995). Protein structure prediction by threading methods: evaluation of current techniques. Proteins, 23(3), 337-355. doi:10.1002/prot.340230308

This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure.

Automatic analysis of protein conformational changes by multiple linkage clustering.

Boutonnet, N., Rooman, M., & Wodak, S. (1995). Automatic analysis of protein conformational changes by multiple linkage clustering. Journal of Molecular Biology, 253(4), 633-647. doi:10.1006/jmbi.1995.0578

An automatic algorithm is presented for analyzing protein conformational changes such as those occurring upon substrate binding or in different crystal forms of the same protein. Using, as sole information, the atomic coordinates of a pair of protein structures, the procedure first generates structure alignments, which optimize the root-mean-square deviation of the backbone atoms. To this end, equivalent secondary structures and/or loops from both proteins are combined by a multiple linkage hierarchic clustering algorithm, which generates several intertwined clustering trees. Automatic analysis of these clustering trees is used to dissect the mechanism of the conformational change. It allows the identification of the static core, representing the collection of secondary structures which undergo no structural changes, as well as other entities which move like rigid bodies. It also permits the description of the movement of secondary structures or loops relative to this core or entities. Using this information, it can be inferred whether a particular conformational change involves shear or hinge motion, or components of both. The algorithm is applied to the analysis of the conformational changes of citrate synthase, lactate dehydrogenase, lactoferrin and beta-glucosyltransferase, representing typical examples of shear- and hinge-type mechanisms, and a varied range in movement size. The results are shown to be in excellent agreement with previous analyses, and to provide additional information which gives a more complete and objective picture of the conformational change. Using our automatic algorithm, we find that any conformational change may be viewed as having components of both shear- and hinge-type motion. Determining which of these is most appropriate requires the combination of the information provided by our procedure with detailed knowledge of the protein tertiary structures.

Are database-derived potentials valid for scoring both forward and inverted protein folding?

Rooman, M., & Wodak, S. (1995). Are database-derived potentials valid for scoring both forward and inverted protein folding? Protein engineering, 8(9), 849-858. doi:10.1093/protein/8.9.849

Database-derived potentials, compiled from frequencies of sequence and structure features, are often used for scoring the compatibility of protein sequences and conformations. It is often believed that these scores correspond to differences in free energy with, in addition, a term containing the partition function of the system. Since this function does not depend on the conformation, the potentials are considered to be valid for scoring the compatibility of different conformations with a given sequence ('forward folding'), but not of sequences with a given structure ('inverted folding'). This interpretation is questioned here. It is argued that when many-body effects, which dominate frequencies compiled from the protein database, are corrected for, the potentials approximate a physically meaningful free energy difference from which the partition function term cancels out. It is the difference between the free energy of a given sequence in a specific conformation and that of the same sequence in a denatured-like state. Two examples of denatured-like states are discussed. Depending on the considered state, the free energy difference reduces to the commonly used scoring scheme, or contains additional terms that depend on the sequence. In both cases, all the terms can be derived from sequence-structure frequencies in the database. Such a free energy difference, commonly defined as the folding free energy, is a measure of protein stability and can be used for scoring both forward and inverted protein folding. The implications for the use of knowledge-based potentials in protein structure prediction are described. Finally, the difficulty of designing tests that could validate the proposed approach, and the inherent limitations of such tests, are discussed.

Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins.

Boutonnet, N., Rooman, M., Ochagavia, M. E., Richelle, J., & Wodak, S. (1995). Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins. Protein engineering, 8(7), 647-662.

A fully automatic procedure for aligning two protein structures is presented. It uses as sole structural similarity measure the root mean square (r.m.s.) deviation of superimposed backbone atoms (N, C alpha, C and O) and is designed to yield optimal solutions with respect to this measure. In a first step, the procedure identifies protein segments with similar conformations in both proteins. In a second step, a novel multiple linkage clustering algorithm is used to identify segment combinations which yield optimal global structure alignments. Several structure alignments can usually be obtained for a given pair of proteins, which are exploited here to define automatically the common structural core of a protein family. Furthermore, an automatic analysis of the clustering trees is described which enables detection of rigid-body movements between structure elements. To illustrate the performance of our procedure, we apply it to families of distantly related proteins. One groups the three alpha + beta proteins ubiquitin, ferredoxin and the B1-domain of protein G. Their common structure motif consists of four beta-strands and the only alpha-helix, with one strand and the helix being displaced as a rigid body relative to the remaining three beta-strands. The other family consists of beta-proteins from the Greek key group, in particular actinoxanthin, the immunoglobulin variable domain and plastocyanin. Their consensus motif, composed of five beta-strands and a turn, is identified, mostly intact, in all Greek key proteins except the trypsins, and interestingly also in three other beta-protein families, the lipocalins, the neuraminidases and the lectins. This result provides new insights into the evolutionary relationships in the very diverse group of all beta-proteins.

1994

Conformational properties of four peptides corresponding to alpha-helical regions of Rhodospirillum cytochrome c2 and bovine calcium binding protein.

Pintar, A., Chollet, A., Bradshaw, C., Chaffotte, A., Cadieux, C., Rooman, M., Hallenga, K., Knowles, J., Goldberg, M., & Wodak, S. (1994). Conformational properties of four peptides corresponding to alpha-helical regions of Rhodospirillum cytochrome c2 and bovine calcium binding protein. Biochemistry, 33(37), 11158-11173.

Four peptides corresponding to alpha-helical regions delimited by residues 63-73 and 97-112 of cytochrome c2 (Rhodospirillum) and residues 24-36 and 45-55 of bovine calcium binding protein are predicted to be alpha-helical by a recently developed method [Rooman, M., Kocher, J.P., & Wodak, S.J. (1991) J. Mol. Biol. 221, 961-979], synthesized by solid phase methods, and purified by HPLC, and their solution conformations are determined by NMR and CD. The observed conformational properties of these peptides in solution confirmed prediction results: in water/TFE (60/40, v/v) at room temperature, these peptides adopt an alpha-helical conformation, as shown by an extended pattern of strong, sequential dNN(i,i + 1) NOE cross-peaks, d alpha N(i,i + 1) NOEs of reduced intensity, several medium-range [d alpha N(i,i + 3), d alpha N(i,i + 4), d alpha beta-(i,i + 3)] NOE connectivities, small 3JH alpha N values, and more upfield alpha-proton chemical shifts. CD studies at different TFE concentrations and at room temperature provide further evidence of the propensity of these peptides to adopt an alpha-helical conformation in solution, as determined by the ellipticity values at 222 nm, and by deconvolution of the CD spectra. According to the method used, helicities in the range 34-50% and 55-75% are found for the 63-73 and 97-112 fragments of cytochrome c2, respectively, and in the range 53-80% and 42-65% for the fragments 24-36 and 45-55 of calcium binding protein in water/TFE (60/40, v/v) at 298 K. In addition, the experiments and predictions agree for those residues that are more flexible. Finally, the relevance of our results for the protein folding pathways is discussed.

Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches.

Kocher, J.-P., Rooman, M., & Wodak, S. (1994). Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. Journal of Molecular Biology, 235(5), 1598-1613. doi:10.1006/jmbi.1994.1109

Several types of potentials are derived from a dataset of known protein structures by computing statistical relations between amino acid sequence and different descriptions of the protein conformation. These potentials formulate in different ways backbone dihedral angle preferences, pairwise distance-dependent interactions between amino acid residues, and solvation effects based on accessible surface area calculations. Parameters affecting the characteristics and the performance of the potentials are critically assessed by monitoring recognition of the native fold in a strict screening test, where each sequence in the dataset is threaded through a repertoire of motifs, generated from all corresponding structures. Sequence gaps are not allowed, to avoid additional approximations. Results show that residue interaction potentials computed from distances between average side-chain centroids perform significantly better on this test than those computed considering inter-C alpha or inter-C beta distances. Combining potentials that are based on different structural descriptions and different interactions is also beneficial. The performance of some of these potentials is in fact so good that they recognize the correct fold for all the tested proteins, including subunits known to be unstable in the absence of quaternary interactions. Most strikingly, potentials representing backbone dihedral angle preferences recognize as many as 68 protein chains out of a total of 74, even though they consider solely local interactions along the chain, which, being the same as those considered in secondary structure prediction methods, are well known to be incapable of determining the full three-dimensional fold. This leads us to question the ability of procedures that screen a limited repertoire of structures to act as a stringent test for the potentials. 
We concede, however, that they are useful and fast tests, capable of revealing gross shortcomings of the potentials, or possible biases towards native recognition due, for example, to effects of sequence memory.

Identification of short turn motifs in proteins using sequence and structure fingerprints

Wintjens, R., Rooman, M., & Wodak, S. (1994). Identification of short turn motifs in proteins using sequence and structure fingerprints. Israel Journal of Chemistry, 34(2), 257-269. doi:10.1002/ijch.199400030

Families of 20‐residue turn motifs are identified in a dataset of known protein structures using a fully automatic classification procedure that relies on dihedral angle values and atomic root mean square deviations of the polypeptide backbone. Four of the identified motifs, a novel αα connection, and three well‐known αα, βα, and ββ motifs, are used to investigate the possibility of identifying sequences that adopt these motifs in a library of 20‐residue sequence segments from the protein dataset. To this end, several types of fingerprints are derived for individual members of each turn motif family, and for each family as a whole. These fingerprints represent the sequence conservation among family members, or different reduced descriptions of the protein 3D structure that consider the backbone conformation or the tertiary interactions in the context of the parent proteins. All sequence segments in the library are successively mounted onto the fingerprints, without allowing gaps, and the energy of each mount is evaluated using effective potentials derived from known protein structures. The results show that the ability to recognize native sequences associated with a turn motif is improved when different types of fingerprints are combined, and that it fluctuates significantly according to the specific turn family considered. Overall, however, this ability remains rather limited to a level which is much below the native recognition performance generally achieved for full proteins. The fingerprints and their associated potentials are nevertheless found to be quite effective in generating subsets of solutions that are significantly enriched for the correct sequence‐structure combinations. This may have very useful applications in protein folding simulations and in homology modeling.

1993

Generating and testing protein folds

Wodak, S., & Rooman, M. (1993). Generating and testing protein folds. Current opinion in structural biology, 3(2), 247-259.

1992

Extracting information on folding from the amino acid sequence: consensus regions with preferred conformation in homologous proteins.

Rooman, M., & Wodak, S. (1992). Extracting information on folding from the amino acid sequence: consensus regions with preferred conformation in homologous proteins. Biochemistry, 31(42), 10239-10249.

It is investigated whether protein segments predicted to have a well-defined conformational preference in the absence of tertiary interactions are conserved in families of homologous proteins. The prediction method follows the procedures of Rooman, M., Kocher, J.-P., and Wodak, S. (preceding paper in this issue). It uses a knowledge-based force field that incorporates only local interactions along the sequence and identifies segments whose lowest energy structure displays a sizable energy gap relative to other computed conformations. In 13 of the protein families and subfamilies considered that are sufficiently homologous to have similar 3D structures, at least one region is consistently predicted as having the same preferred conformation in virtually all family members. These regions are between 4 and 26 residues long. They are often located at chain ends and correspond primarily to segments of secondary structure heavily involved in interactions with the rest of the protein, suggesting that they could act as nuclei around which other parts of the structure would assemble. Experimental data on early folding intermediates or on protein fragments with appreciable structure in aqueous solution are available for more than half of the protein families. Comparison of our results with these data is quite favorable. They reveal that each of the experimentally identified early formed, or independently stable, substructures harbors at least one of the segments consistently predicted as having a preferred conformation by our procedure. The implications of our findings for the conservation of folding pathways in homologous proteins are discussed.

Extracting information on folding from the amino acid sequence: accurate predictions for protein regions with preferred conformation in the absence of tertiary interactions.

Rooman, M., Kocher, J.-P., & Wodak, S. (1992). Extracting information on folding from the amino acid sequence: accurate predictions for protein regions with preferred conformation in the absence of tertiary interactions. Biochemistry, 31(42), 10226-10238.

A recently developed procedure to predict backbone structure from the amino acid sequence [Rooman, M., Kocher, J.-P., & Wodak, S. (1991) J. Mol. Biol. 221, 961-979] is fine-tuned to identify protein segments, of length 5-15 residues, that adopt well-defined conformations in the absence of tertiary interactions. These segments are obtained by requiring that their predicted lowest energy structures have a sizable energy gap relative to other computed conformations. Applying this procedure to 69 proteins of known structure, we find that regions with largest energy gaps (those having highly preferred conformations) are also the most accurately predicted ones. On the basis of previous findings that such regions correlate well with sites that become structured early during folding, our approach provides the means of identifying such sites in proteins without prior knowledge of the tertiary structure. Furthermore, when predictions are performed so as to ignore the influence of residues flanking each segment along the sequence, a situation akin to excising the considered peptide from the rest of the chain, they offer the possibility of identifying protein segments liable to adopt well-defined conformations on their own. The described approach should have useful applications in experimental and theoretical investigations of protein folding and stability, and aid in designing peptide drugs and vaccines.

1991

Prediction of protein backbone conformation based on seven structure assignments. Influence of local interactions.

Rooman, M., Kocher, J.-P., & Wodak, S. (1991). Prediction of protein backbone conformation based on seven structure assignments. Influence of local interactions. Journal of Molecular Biology, 221(3), 961-979. doi:10.1016/0022-2836(91)80186-X

A method is developed to compute backbone tertiary folds from the amino acid sequence. In this method, the number of degrees of freedom is drastically reduced by neglecting side-chain flexibility, and by describing backbone conformations as combinations of only seven structural states. These are characterized by single values of the dihedral angles phi, psi and omega, representing allowed conformations of the isolated dipeptide. We show that this restrictive model is none the less capable of describing native backbones to within acceptable deviations. Using our backbone description, potentials of mean force are derived from a database of known protein structures, based on statistical influences of single residues and residue pairs on the conformational states in their vicinity along the chain. This yields the force-field component due to local interactions, which is then used to predict lowest-energy conformations from any given amino acid sequence. The prediction algorithm does not require searching conformational space and is therefore extremely fast. Another important asset of our method is that it is able to compute not only the minimum energy conformation, but any number of lowest energy structures, whose relative preferences can be determined from the corresponding computed energy values. The performance of our procedure is tested on short peptides that are likely to be stabilized by local interactions. These include several helical structures and a hexapeptide with a beta-bend conformation, corresponding to peptides shown to have relatively well-defined conformations in aqueous solution, and to protein segments believed to adopt their native conformation early during folding. In addition, several flexible peptides are analysed. Except for the problems encountered in predicting observed disulphide bridges in two of the flexible peptides, and in a somewhat larger fragment comprising residues 30 to 51 of bovine trypsin inhibitor, prediction results compare very favourably with experimental data. Potential applications of our procedure to protein modelling and its extension to protein folding are discussed.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71944/1/Elsevier_49312.pdf

Weak correlation between predictive power of individual sequence patterns and overall prediction accuracy in proteins.

Rooman, M., & Wodak, S. (1991). Weak correlation between predictive power of individual sequence patterns and overall prediction accuracy in proteins. Proteins, 9(1), 69-78. doi:10.1002/prot.340090108

Patterns in amino acid properties (polar, hydrophobic, etc.) that characterize secondary structure motifs are derived from a database containing 75 protein structures, with the aim of circumventing the limitations due to database size so as to increase structure prediction score. Many such sequence-structure associations with high intrinsic predictive power are found, which turn out to be correct 78% of the time when applied individually to proteins outside the learning set. Based on these associations, a prediction method is developed, which reaches the score of 62% on the 3 states alpha-helix, beta-strand, and loop, without using additional constraints. Though this score is quite good compared to that of other available prediction methods, it is much lower than could be expected from the high intrinsic predictive power of the associations used. The reasons underlying this surprising result, which indicate that prediction score and intrinsic predictive power are only weakly coupled, are discussed. It is also shown that the size of the present database still seriously limits prediction scores, even when property patterns are used, and that higher scores are expected in large databases. Clues are provided on the relative influence of neglecting spatial interactions on prediction efficiency, suggesting that, in sufficiently large databases, predicted secondary structures would correspond to those formed early in the folding process. This hypothesis is tested by confronting present predictions with available experimental data on early protein folding intermediates and on small peptides that adopt a relatively stable conformation in water. Although admittedly there are still too few such data, results suggest that the hypothesis might be well founded.

1990

Relations between protein sequence and structure and their significance.

Rooman, M., Rodriguez, J., & Wodak, S. (1990). Relations between protein sequence and structure and their significance. Journal of Molecular Biology, 213(2), 337-350. doi:10.1016/S0022-2836(05)80195-0

The relation between amino acid sequence and local structure in proteins is investigated. The local structures considered are either the four classes of secondary structure (H, E, T and C) or four classes of local conformations defined using measures of conformational similarity based on distances between C alpha atoms. The classes are obtained by applying an automatic clustering procedure to short polypeptide fragments of uniform length from a database of 75 known protein structures. The thrust of our investigation consists of systematically searching the database for simple amino acid patterns of the type Gly-X-Ala-X-X-Val, where X denotes an arbitrary residue. Patterns that are nearly always associated with the same structure are retained. Finding many such associations, we then evaluate by a statistical approach how many among them are non-random and compare the results for different definitions of local structure. A similar comparison is made for the predictive value of retained associations, which is assessed using an internal test based on dividing the database into "learning" and "test" subsets. While we find that local structures defined by conformational similarity are not superior to secondary structure for prediction purposes, they help us gain insight into the factors that influence the predictive value of derived associations. A major conclusion is that the number of retained associations is in large excess over the number expected from a random correlation between sequence and structure, irrespective of how local conformation is defined. However, only a very small number of these associations can be earmarked as reliable using statistical criteria, due to the limited size of the database. We find, for instance, that the pattern Ala-Ala-X-X-Lys reliably characterizes helix, and the pattern Val-X-Val-X-X-X-Ala reliably characterizes extended structure and beta-strand. The possibility is discussed that these and other reliable associations correspond to regions of the polypeptide chain whose conformations are locally determined and that these regions may play a role in folding.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71946/1/Elsevier_49316.pdf

Automatic definition of recurrent local structure motifs in proteins.

Rooman, M., Rodriguez, J., & Wodak, S. (1990). Automatic definition of recurrent local structure motifs in proteins. Journal of Molecular Biology, 213(2), 327-336. doi:10.1016/S0022-2836(05)80194-9

An automatic procedure for defining recurrent folding motifs in proteins of known structure is described. These motifs are formed by short polypeptide fragments of equal size containing between four and seven residues. The method applies a classical clustering algorithm that operates on distances between selected backbone atoms. In one application, we use it to cluster all protein fragments into only four structural classes. This classification is rough considering the observed diversity of local structures, but comparable in homogeneity to the four classes of secondary structure (alpha-helix, beta-strand, turn and coil). Yet, it discriminates between extended and curved coil and distinguishes beta-bulges from beta-strands. In a second application, the clustering procedure is combined with assignment of backbone dihedral angles to allowed regions in the Ramachandran map. This produces an exhaustive repertoire of highly homogeneous families of structural motifs that contains all the beta-hairpins, beta alpha- and alpha beta-loops previously defined by manual procedures, and new structural families of which two examples, a beta alpha-loop and an alpha-helix beginning, are analyzed in detail. The described automatic procedures should be useful in categorizing structure information in proteins, thereby increasing our ability to analyze relations between structure and sequence.

https://dipot.ulb.ac.be/dspace/bitstream/2013/71947/1/Elsevier_49317.pdf

1989

Amino acid sequence templates derived from recurrent turn motifs in proteins: critical evaluation of their predictive power.

Rooman, M., Wodak, S., & Thornton, J. (1989). Amino acid sequence templates derived from recurrent turn motifs in proteins: critical evaluation of their predictive power. Protein engineering, 3(1), 23-27. doi:10.1093/protein/3.1.23

Amino acid sequence patterns suggested to characterize specific recurrent turn conformations in proteins are tested as to their predictive power in a database containing 75 proteins of known structure. Many of these patterns are found to be associated with local structures that differ from the motifs originally used to derive them. It is therefore concluded that, while they could be useful for improving predictions made by other methods, their stand-alone predictive power is poor. The issue of deriving and validating consensus sequence patterns for use in protein structure prediction is raised.

1988

Identification of predictive sequence motifs limited by protein structure data base size.

Rooman, M., & Wodak, S. (1988). Identification of predictive sequence motifs limited by protein structure data base size. Nature (London), 335(6185), 45-49. doi:10.1038/335045a0

Associations between short amino acid sequence patterns and protein secondary structure classes can be found by searching a data base of known protein structures. Analysis of these associations suggests that secondary structure of proteins can be determined locally by sequence motifs of high predictive value, but at present our ability to find these motifs is limited by the size of the available data bases.

1987

Metric space-time as fixed point of the renormalization group equations on fractal structures

Englert, F., Frère, J.-M., Rooman, M., & Spindel, P. (1987). Metric space-time as fixed point of the renormalization group equations on fractal structures. Nuclear physics. B, 280, 147-180.

1986

Chiral fermions on two-dimensional fractal structures

Rooman, M. (1986). Chiral fermions on two-dimensional fractal structures. Physics letters. Section B, 182(3-4), 358-364.

Stability of the large-scale metric in fractal structures

Rooman, M. (1986). Stability of the large-scale metric in fractal structures. Physics letters. Section B, 169(2-3), 253-258.

1984

The mass spectrum of supergravity on the round seven-sphere

Casher, A., Englert, F., Nicolai, H., & Rooman, M. (1984). The mass spectrum of supergravity on the round seven-sphere. Nuclear physics. B, 243(1), 173-188.

Eleven-dimensional supergravity and octonions

Rooman, M. (1984). Eleven-dimensional supergravity and octonions. Nuclear physics. B, 236(2), 501-521.

The fluctuating seven-sphere in eleven-dimensional supergravity

Biran, B., Casher, A., Englert, F., Rooman, M., & Spindel, P. (1984). The fluctuating seven-sphere in eleven-dimensional supergravity. Physics letters. Section B, 134(3-4), 179-183.

1983

Symmetries in eleven-dimensional supergravity compactified on a parallelized seven-sphere

Englert, F., Rooman, M., & Spindel, P. (1983). Symmetries in eleven-dimensional supergravity compactified on a parallelized seven-sphere. Physics letters. Section B, 130(1-2), 50-54.

Supersymmetry breaking by torsion and the Ricci-flat squashed seven-spheres

Englert, F., Rooman, M., & Spindel, P. (1983). Supersymmetry breaking by torsion and the Ricci-flat squashed seven-spheres. Physics letters. Section B, 127(1-2), 47-50.

An open universe solution to Einstein's equations coupled to a quantum scalar field

Nardone, P., & Rooman, M. (1983). An open universe solution to Einstein's equations coupled to a quantum scalar field. Physics letters. Section B, 123(3-4), 182-184.

1982

The avoidance of singularity in a Friedmann universe by particle production

Rooman, M., & Spindel, P. (1982). The avoidance of singularity in a Friedmann universe by particle production. Nuclear physics. B, 209(2), 497-519.

Updated on October 11, 2021