Protein target prediction

Protein target prediction is a critical component of modern drug design and pharmaceutical research. Computational methods provide fast and cost-efficient ways of obtaining initial hypotheses in areas such as virtual screening, drug repurposing and side-effect prediction. Their predictive value keeps increasing, benefitting from the exponential growth of available data and computing power. Given the naturally imperfect reproducibility of experimental activity assessments, virtual assays are currently estimated to perform comparably overall to their in vitro counterparts, while offering significantly broader insights at a fraction of the cost and effort.

Our activity predictions are based on a neural network model trained on small-molecule activity data available in the public domain, particularly those deposited in the ChEMBL database. The model is regularly updated to include the newest records. After quality-based filtering of the available data and selection of protein targets with favourable prediction accuracy, the model provides small-molecule activity estimates against over a thousand individual protein targets across several species, with performance exceeding that of state-of-the-art competing methods. Its high computational efficiency allows millions of compounds to be screened per 24 hours.

The model accepts input queries as a compound database in any of the commonly used molecular representations (e.g., SMILES, MOL2, SDF). Each molecular structure is standardised, transformed into a vectorised representation, and then submitted to a neural network-based activity predictor.
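The three stages described above (standardisation, vectorisation, prediction) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions and not the production model: the standardisation rule, the hashed n-gram fingerprint, the layer sizes, the randomly initialised weights and the target names are all hypothetical stand-ins chosen so that the example runs with the standard library only.

```python
import hashlib
import math
import random

N_BITS = 256      # assumed fingerprint length (hypothetical)
N_HIDDEN = 16     # assumed hidden-layer width (hypothetical)
TARGETS = ["TARGET_A", "TARGET_B", "TARGET_C"]  # placeholder target names


def standardise(smiles):
    """Crude stand-in for standardisation: trim whitespace and keep the
    longest dot-separated fragment. String length is only a rough proxy
    for fragment size; a real pipeline would use a cheminformatics toolkit."""
    fragments = smiles.strip().split(".")
    return max(fragments, key=len)


def vectorise(smiles, n_bits=N_BITS, n=3):
    """Hash character n-grams of the SMILES string into a fixed-length
    bit vector (a toy substitute for a structural fingerprint)."""
    bits = [0.0] * n_bits
    for i in range(max(len(smiles) - n + 1, 1)):
        gram = smiles[i:i + n]
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1.0
    return bits


def make_random_weights(n_in, n_out, rng):
    """Untrained placeholder weights: one row of n_in values per output unit."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x):
    """Matrix-vector product for one fully connected layer (no bias)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def predict(vector, w_hidden, w_out):
    """One ReLU hidden layer followed by a sigmoid output per target."""
    hidden = [max(0.0, h) for h in dense(w_hidden, vector)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(w_out, hidden)]


# Fixed seed so the untrained toy network gives repeatable scores.
_rng = random.Random(0)
W_HIDDEN = make_random_weights(N_BITS, N_HIDDEN, _rng)
W_OUT = make_random_weights(N_HIDDEN, len(TARGETS), _rng)


def score_compound(smiles):
    """Run one structure through all three stages and return a score per target."""
    vector = vectorise(standardise(smiles))
    return dict(zip(TARGETS, predict(vector, W_HIDDEN, W_OUT)))


if __name__ == "__main__":
    # A salt form as input; the counter-ion is dropped before vectorisation.
    print(score_compound("CC(=O)Oc1ccccc1C(=O)[O-].[Na+]"))
```

Because the weights here are random, the scores carry no chemical meaning; the sketch only shows how a single input structure flows through the pipeline to one value per target.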

For each compound, the primary output includes an affinity estimate for every target, together with an assessment of response selectivity and sensitivity.