CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function

Protein–carbohydrate interactions are crucial for many cellular processes, but can be challenging to characterise biologically. To improve our understanding of these molecular interactions and our ability to model them, we used a carefully curated set of 370 protein–carbohydrate complexes with experimental structural and biophysical data. We then trained and validated our tool, CSM-carbohydrate, using machine learning algorithms to accurately predict binding affinity and to serve as a scoring function for ranking docking poses.

We show that CSM-carbohydrate significantly outperformed previous approaches. We have implemented our method and made all data freely available through both a user-friendly web interface and an API for programmatic access. We believe CSM-carbohydrate will be an invaluable tool for assessing docking poses and the effects of mutations on protein–carbohydrate affinity, helping to unravel important aspects that drive binding recognition.