A1 Original article in a scientific journal
Approaching Optimal pH Enzyme Prediction with Large Language Models (2024)
Zaretckii, M., Buslaev, P., Kozlovskii, I., Morozov, A., & Popov, P. (2024). Approaching Optimal pH Enzyme Prediction with Large Language Models. ACS Synthetic Biology, Early online. https://doi.org/10.1021/acssynbio.4c00465
JYU authors or editors
Publication details
All authors or editors of the publication: Zaretckii, Mark; Buslaev, Pavel; Kozlovskii, Igor; Morozov, Alexander; Popov, Petr
Journal or series: ACS Synthetic Biology
eISSN: 2161-5063
Publication year: 2024
Publication date: 28 August 2024
Volume: Early online
Publisher: American Chemical Society
Country of publication: United States (USA)
Language of publication: English
DOI: https://doi.org/10.1021/acssynbio.4c00465
Open access of the publication: Openly available
Open access of the publication channel: Partially open publication channel
The publication is self-archived (JYX): https://jyx.jyu.fi/handle/123456789/97023
Abstract
Enzymes are widely used in biotechnology for their ability to catalyze chemical reactions: food production, laundry, pharmaceutics, textiles, and brewing all benefit from various enzymes. Proton concentration (pH) is one of the key factors that determine enzyme function and efficiency. Usually, an enzyme is active only within a narrow range of pH values. Designing an enzyme with optimal activity in a given pH range is therefore a common problem in biotechnology. A large part of this task can be completed in silico by predicting the optimal pH of designed candidates. The success of such computational methods depends critically on the available data. In this study, we developed a language-model-based approach to predict the optimal pH range from the enzyme sequence. We used different splitting strategies based on sequence similarity, protein family annotation, and enzyme classification to validate the robustness of the proposed approach. The derived machine-learning models demonstrated high accuracy across proteins from different protein families and on proteins with lower sequence similarity to the training set. The proposed method is fast enough for high-throughput virtual exploration of protein space in search of sequences with desired optimal pH levels.
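The splitting strategies mentioned in the abstract (by sequence similarity, protein family, or enzyme class) share one idea: all sequences belonging to the same group go entirely into either the training set or the test set. The following is a minimal sketch of such a group-aware split, not the authors' actual procedure. The function name `group_split`, the record fields, and the toy family labels and pH values are all hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Hypothetical helper: `group_key` names the field holding the group
    label (e.g. a protein family or a similarity-cluster ID).
    """
    # Collect records by their group label.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group labels reproducibly.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    # Assign whole groups to the test set until it reaches the target size.
    target = test_fraction * len(records)
    train, test = [], []
    for gid in group_ids:
        if len(test) < target:
            test.extend(groups[gid])
        else:
            train.extend(groups[gid])
    return train, test


# Made-up toy data: IDs, family labels and optimal pH values are illustrative.
records = [
    {"id": "enz1", "family": "famA", "ph_opt": 7.0},
    {"id": "enz2", "family": "famA", "ph_opt": 7.5},
    {"id": "enz3", "family": "famB", "ph_opt": 4.5},
    {"id": "enz4", "family": "famB", "ph_opt": 5.0},
    {"id": "enz5", "family": "famC", "ph_opt": 9.0},
    {"id": "enz6", "family": "famD", "ph_opt": 6.0},
]

train, test = group_split(records, "family", test_fraction=0.3)
train_families = {r["family"] for r in train}
test_families = {r["family"] for r in test}
print("train families:", sorted(train_families))
print("test families:", sorted(test_families))
```

A model evaluated on the `test` part then sees only families it was never trained on, which is a stricter check than a purely random split. The same function would work with similarity-cluster IDs or enzyme class labels as the group field.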
YSO keywords: enzymes; pH; machine learning; language models; biotechnology; computational chemistry; in silico method
Free keywords: enzyme optimal pH; large language models; machine learning; protein engineering
Related organizations
Projects in which the publication was produced
- Computational screening of enzymes
- Buslaev, Pavel
- Research Council of Finland (Suomen Akatemia)
Ministry reporting (OKM): Yes
VIRTA submission year: 2024
Preliminary JUFO level: 2