kinCSM: using graph-based signatures to predict small molecule CDK2 inhibitors 

Abstract: Protein phosphorylation acts as an essential on/off switch in many cellular signalling pathways, regulating protein function. This has led to ongoing interest in targeting kinases for therapeutic intervention. Computer-aided drug discovery has proven a useful and cost-effective approach for prioritising and enriching screening libraries. Limited effort, however, has been devoted to developing and tailoring in silico tools that assist the development of kinase inhibitors and provide insights into what makes an inhibitor potent.

To fill this gap, we developed kinCSM, an integrative computational tool capable of accurately identifying potent cyclin-dependent kinase 2 (CDK2) inhibitors, quantitatively predicting CDK2 ligand-kinase inhibition constants (pKi) and classifying inhibition modes without kinase information. kinCSM predictive models were built using supervised learning and leveraged the concept of graph-based signatures to capture both the physicochemical and the geometric properties of small molecules. CDK2 inhibitors were accurately identified with a Matthews correlation coefficient (MCC) of up to 0.74, and inhibition constants were predicted with a Pearson’s correlation of up to 0.76; performance was consistent on a non-redundant blind test (0.66 and 0.68, respectively). kinCSM was also able to identify the potential type of inhibition for a given molecule, achieving an MCC of up to 0.80 on cross-validation and 0.73 on the blind test. Analysing the molecular composition of kinase inhibitors revealed chemical fragments enriched in CDK2 inhibitors and in different types of inhibitors, providing insights into the molecular mechanisms behind ligand-kinase interactions. We believe kinCSM will be an invaluable tool to guide future kinase drug discovery.
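For readers less familiar with the two metrics reported above, the sketch below shows how a Matthews correlation coefficient (for the classification tasks) and a Pearson's correlation (for the pKi regression task) are conventionally computed. It is illustrative only: the function names and the toy labels and values are assumptions made for this example, not kinCSM code or data, and the abstract does not state which software was used for the evaluation.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = inhibitor, 0 = non-inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def pearson_r(x, y):
    """Pearson's correlation between experimental and predicted values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical toy data, not taken from the kinCSM data sets.
toy_labels      = [1, 1, 1, 0, 0, 0, 1, 0]
toy_predictions = [1, 1, 0, 0, 0, 1, 1, 0]
toy_mcc = matthews_corrcoef(toy_labels, toy_predictions)

toy_pki_measured  = [5.0, 6.0, 7.0, 8.0]
toy_pki_predicted = [5.5, 6.5, 7.5, 8.5]
toy_r = pearson_r(toy_pki_measured, toy_pki_predicted)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when inhibitors and non-inhibitors are imbalanced. Pearson's correlation measures linear agreement only: in the toy regression data every prediction is offset by 0.5 pKi units, yet the correlation is still perfect.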