### Abstract

The identification of chemical structures in natural product mixtures is an important task in drug discovery but remains challenging, because structural elucidation is time-consuming and is limited by the available mass spectra of known natural products. Computer-aided structure elucidation (CASE) strategies seek to automatically propose a list of possible chemical structures in mixtures by utilizing chromatographic and spectroscopic methods. However, current CASE tools still cannot solve structures automatically as well as experienced natural product chemists can. Here, after analyzing a collection of 243,130 natural products, we formulated the structural elucidation of natural products in a mixture as a computational problem of extending a list of scaffolds with a weighted side chain list, and designed an efficient algorithm to precisely identify the chemical structures. This problem is NP-complete. A dynamic programming (DP) algorithm can solve it in pseudo-polynomial time after converting floating-point molecular weights into integers. However, the running time of the DP algorithm grows exponentially with the number of decimal places of precision retained from the mass spectrometry experiment. To approach polynomial running time, we proposed a novel iterative DP algorithm that can quickly recognize the chemical structures of natural products. When the algorithm was used to elucidate four natural products whose structures had been determined experimentally, it found the exact solutions, and its running time was shown to be polynomial in the average case. The proposed method speeds up the structural elucidation of natural products and helps broaden the spectrum of compounds available as new drug candidates. A web service built for structural elucidation studies is freely accessible at http://csccp.cmdm.tw/.
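
The abstract does not spell out the DP recurrence. As a minimal sketch, assuming the mass-matching step reduces to choosing a multiset of side chains (each reusable, at most `max_chains` per scaffold) whose masses, added to the scaffold mass, match the measured mass within a tolerance, a plain pseudo-polynomial DP over integer-scaled masses could look like the following. It illustrates the baseline DP the abstract describes, not the authors' iterative DP; the side-chain weights are ignored, and all names (`side_chain_combinations`, `to_int_mass`, `R1`, `R2`, `R3`) and masses are hypothetical.

```python
from typing import Dict, List, Tuple


def to_int_mass(mass: float, decimals: int) -> int:
    """Scale a floating-point mass to an integer, keeping `decimals` digits."""
    return round(mass * 10 ** decimals)


def side_chain_combinations(
    target_mass: float,
    scaffold_mass: float,
    side_chains: Dict[str, float],
    decimals: int = 2,
    tol: float = 0.01,
    max_chains: int = 3,
) -> Tuple[List[Tuple[str, ...]], int]:
    """Return every multiset of side chains (at most `max_chains`) whose scaled
    masses, added to the scaffold, match `target_mass` within `tol`, plus the
    DP table size, which grows as 10**decimals (pseudo-polynomial)."""
    # Hypothetical formulation: target = scaffold + sum of chosen side chains.
    remainder = to_int_mass(target_mass, decimals) - to_int_mass(scaffold_mass, decimals)
    tol_int = round(tol * 10 ** decimals)
    cap = remainder + tol_int
    if cap < 0:
        return [], 0
    # combos[m]: all side-chain multisets whose scaled masses sum exactly to m.
    combos: List[List[Tuple[str, ...]]] = [[] for _ in range(cap + 1)]
    combos[0] = [()]
    # Side chains in the outer loop so each multiset is built once (coin-change
    # style); ascending m lets the same side chain be reused.
    for name, mass in side_chains.items():
        w = to_int_mass(mass, decimals)
        if w <= 0:
            raise ValueError(f"side chain {name!r} must have a positive mass")
        for m in range(w, cap + 1):
            for prev in combos[m - w]:
                if len(prev) < max_chains:
                    combos[m].append(prev + (name,))
    # Collect every combination whose total falls inside the tolerance window.
    hits: List[Tuple[str, ...]] = []
    for m in range(max(0, remainder - tol_int), cap + 1):
        hits.extend(combos[m])
    return hits, cap + 1


if __name__ == "__main__":
    chains = {"R1": 15.02, "R2": 16.00, "R3": 31.02}  # hypothetical masses
    for d in (2, 3):
        hits, size = side_chain_combinations(147.02, 100.00, chains, decimals=d)
        print(d, size, hits)
```

With these made-up masses, both (R2, R3) and (R1, R2, R2) match the target, showing why a mass alone can be ambiguous. Keeping one more decimal place leaves the answer unchanged but grows the table from 4,704 to 47,031 entries, which is the precision-driven blow-up the abstract attributes to the plain DP. Storing every combination is only practical for small examples; a boolean reachability table is enough when only feasibility is needed.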

Original language | English |
---|---|
Article number | 57 |
Journal | Journal of Cheminformatics |
Volume | 9 |
Issue number | 1 |
DOIs | https://doi.org/10.1186/s13321-017-0244-9 |
Publication status | Published - Nov 15 2017 |
Externally published | Yes |

### Keywords

- CASE
- Dynamic programming
- Natural products
- Polynomial time

### ASJC Scopus subject areas

- Computer Science Applications
- Physical and Theoretical Chemistry
- Computer Graphics and Computer-Aided Design
- Library and Information Sciences

### Cite this

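Su, B. H., Shen, M. Y., Harn, Y. C., Wang, S. Y., Schurz, A., Lin, C., Lin, O. A., & Tseng, Y. J. (2017). An efficient computer-aided structural elucidation strategy for mixtures using an iterative dynamic programming algorithm. *Journal of Cheminformatics*, *9*(1), [57]. https://doi.org/10.1186/s13321-017-0244-9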

**An efficient computer-aided structural elucidation strategy for mixtures using an iterative dynamic programming algorithm.** / Su, Bo Han; Shen, Meng Yu; Harn, Yeu Chern; Wang, San Yuan; Schurz, Alioune; Lin, Chieh; Lin, Olivia A.; Tseng, Yufeng J.

Research output: Contribution to journal › Article

*Journal of Cheminformatics*, vol. 9, no. 1, 57. https://doi.org/10.1186/s13321-017-0244-9

TY - JOUR

T1 - An efficient computer-aided structural elucidation strategy for mixtures using an iterative dynamic programming algorithm

AU - Su, Bo Han

AU - Shen, Meng Yu

AU - Harn, Yeu Chern

AU - Wang, San Yuan

AU - Schurz, Alioune

AU - Lin, Chieh

AU - Lin, Olivia A.

AU - Tseng, Yufeng J.

PY - 2017/11/15

Y1 - 2017/11/15

KW - CASE

KW - Dynamic programming

KW - Natural products

KW - Polynomial time

UR - http://www.scopus.com/inward/record.url?scp=85034565912&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85034565912&partnerID=8YFLogxK

U2 - 10.1186/s13321-017-0244-9

DO - 10.1186/s13321-017-0244-9

M3 - Article

VL - 9

JO - Journal of Cheminformatics

JF - Journal of Cheminformatics

SN - 1758-2946

IS - 1

M1 - 57

ER -