HitPickV2 is a web server to predict molecular targets of compounds.

HitPickV2 employs a novel ligand-based approach to predict the targets of compounds. Compared with the previous version of HitPick (1), the number of targets that can be predicted has increased to 2739 (1350 more than in HitPick). For a query compound, HitPickV2 explores structurally similar compounds within a chemical space of annotated compound-target interactions using k-nearest neighbours (k-NN) chemical similarity search (2) and scores the first 10 protein targets encountered in this space.
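The k-NN walk described above can be sketched as follows. This is a minimal, brute-force illustration under stated assumptions, not HitPickV2's implementation: fingerprints are modelled as sets of "on" bit positions, the chemical space is a hypothetical dictionary mapping compound identifiers to (fingerprint, known targets), and the helper names `tanimoto` and `first_targets` are illustrative. Because compounds are visited from most to least similar, the Tc recorded when a target is first encountered is the Tc of the most similar compound that interacts with it.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a fingerprint is the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient (Tc) of two binary fingerprints: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def first_targets(
    query: Fingerprint,
    space: Dict[str, Tuple[Fingerprint, List[str]]],
    n_targets: int = 10,
) -> List[Tuple[str, float]]:
    """Brute-force k-NN walk: visit annotated compounds from most to least
    similar and collect the first n_targets distinct targets, each paired with
    the Tc of the first (i.e. most similar) compound that interacts with it."""
    scored = sorted(
        ((tanimoto(query, fp), targets) for fp, targets in space.values()),
        key=lambda item: item[0],
        reverse=True,
    )
    found: Dict[str, float] = {}
    for tc, compound_targets in scored:
        for target in compound_targets:
            if target not in found:
                found[target] = tc
                if len(found) == n_targets:
                    return list(found.items())
    return list(found.items())


# Usage with a toy chemical space of three annotated compounds.
space = {
    "c1": (frozenset({1, 2, 3}), ["T1"]),
    "c2": (frozenset({1, 2}), ["T2", "T1"]),
    "c3": (frozenset({7, 8}), ["T3"]),
}
print(first_targets(frozenset({1, 2, 3}), space, n_targets=2))
# [('T1', 1.0), ('T2', 0.6666666666666666)]
```

A production system would use an indexed similarity search instead of sorting the whole space; the full sort is kept here only for readability.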



Target prediction data input

Query compounds can be entered in the text area or uploaded as a file (without a header line). Each compound must be on its own line as a SMILES string, optionally preceded by a molecule identifier separated from the SMILES string by whitespace (see the example input data).
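The input format above can be parsed as in the following minimal sketch. It assumes that SMILES strings and identifiers contain no internal whitespace; the function name `parse_queries` and the fallback names of the form `query_<line>` for unnamed compounds are hypothetical and do not describe how the web server itself labels such compounds.

```python
from typing import List, Tuple


def parse_queries(text: str) -> List[Tuple[str, str]]:
    """Parse one compound per line, either 'SMILES' or 'identifier SMILES'
    (separated by any whitespace). Blank lines are skipped."""
    queries: List[Tuple[str, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        if len(fields) == 1:
            # No identifier given: use a hypothetical positional name.
            queries.append((f"query_{line_no}", fields[0]))
        elif len(fields) == 2:
            queries.append((fields[0], fields[1]))
        else:
            raise ValueError(f"line {line_no}: expected '[identifier] SMILES', got {line!r}")
    return queries


# Usage: one named compound (aspirin) and one unnamed compound (caffeine).
print(parse_queries("aspirin CC(=O)Oc1ccccc1C(=O)O\nCn1cnc2c1c(=O)n(C)c(=O)n2C"))
# [('aspirin', 'CC(=O)Oc1ccccc1C(=O)O'), ('query_2', 'Cn1cnc2c1c(=O)n(C)c(=O)n2C')]
```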


Target prediction output

An interactive example of the target prediction output is available on the HitPickV2 web server.



HitPickV2 Target Prediction Method

To predict the protein targets of a query compound, HitPickV2 places the query into the surrounding chemical space of annotated compound-target interactions using k-NN chemical similarity search (2) and selects the 10 closest targets in this space. It then scores these 10 targets using three parameters: the Tanimoto coefficient (Tc) between the query and the most similar compound in this space that interacts with the target; a target rank that takes into account both Tc and the scores of Laplacian-modified naïve Bayesian target models (3); and occur, a parameter introduced in HitPickV2, defined as the number of compounds in this space that interact with the target. Finally, HitPickV2 assigns each predicted target a precision taken from the precision table (Table 1).
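The Bayesian target models cited above (3) can be illustrated with the commonly used Laplacian-corrected estimator, in which each fingerprint feature F receives the weight log((A_F + 1) / (T_F · P_base + 1)), where T_F is the number of training compounds containing F, A_F the number of those active on the target, and P_base the overall fraction of actives. This is a sketch under the assumption that HitPickV2's models follow this formulation; this page does not confirm it, and it does not describe how the resulting scores are combined with Tc into the target rank, so that step is not shown. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List


def train_laplacian_nb(compounds: List[FrozenSet[int]], actives: List[bool]) -> Dict[int, float]:
    """Per-feature weights of a Laplacian-corrected naive Bayesian model for one target.

    weight(F) = log((A_F + 1) / (T_F * P_base + 1))
    """
    p_base = sum(actives) / len(actives)  # fraction of training compounds active on the target
    total: Counter = Counter()   # T_F: compounds containing feature F
    active: Counter = Counter()  # A_F: active compounds containing feature F
    for features, is_active in zip(compounds, actives):
        total.update(features)
        if is_active:
            active.update(features)
    return {f: math.log((active[f] + 1) / (total[f] * p_base + 1)) for f in total}


def nb_score(weights: Dict[int, float], features: FrozenSet[int]) -> float:
    """Model score of a compound: sum of its feature weights (unseen features add 0)."""
    return sum(weights.get(f, 0.0) for f in features)


# Usage: two actives and two inactives described by toy feature sets.
w = train_laplacian_nb(
    [frozenset({1, 2}), frozenset({1, 3}), frozenset({3, 4}), frozenset({4})],
    [True, True, False, False],
)
print(round(nb_score(w, frozenset({1, 2})), 3))  # 0.693, i.e. log(1.5) + log(4/3) = log(2)
```

Features seen mostly in actives get positive weights, features seen mostly in inactives get negative weights, and the +1 terms damp the weights of rarely observed features.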

Assessment of the precision of predicted targets

We assessed the precision of HitPickV2 target prediction by cross-validation. After filtering out protein targets with fewer than three known associated compounds, we split the compound-target database into a training set (85%, 366,764 compounds) and a validation set (15%, 64,588 compounds), distributing the compounds of each target between the two sets in an 85:15 ratio. We used the training set to build the Bayesian target models and the validation set to test the precision of the HitPickV2 target prediction method. To do so, we compared the predicted targets of each validation compound with its known targets.
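The per-target 85:15 split described above can be sketched as follows. Because one compound can interact with several targets, a per-target split must resolve conflicts; this sketch assumes, as a hypothetical rule not stated on this page, that a compound keeps whichever set it was first placed in and that later targets only distribute their still-unassigned compounds. The function name, the fixed random seed and the rounding of small targets are likewise illustrative choices.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def split_by_target(
    pairs: List[Tuple[str, str]],
    val_frac: float = 0.15,
    min_compounds: int = 3,
    seed: int = 0,
) -> Tuple[Set[str], Set[str]]:
    """Split compounds into training and validation sets, dividing the
    compounds of each target roughly (1 - val_frac):val_frac.

    pairs: (compound_id, target_id) associations.
    """
    by_target: Dict[str, Set[str]] = defaultdict(set)
    for compound, target in pairs:
        by_target[target].add(compound)
    # Drop targets with fewer than min_compounds known compounds.
    kept = {t: cs for t, cs in by_target.items() if len(cs) >= min_compounds}

    rng = random.Random(seed)
    train: Set[str] = set()
    valid: Set[str] = set()
    for target in sorted(kept):
        compounds = sorted(kept[target])
        rng.shuffle(compounds)
        # Assumption: a compound already placed via another target keeps its set.
        wanted = round(len(compounds) * val_frac) - len(valid.intersection(compounds))
        free = [c for c in compounds if c not in train and c not in valid]
        n_val = max(0, min(wanted, len(free)))
        valid.update(free[:n_val])
        train.update(free[n_val:])
    return train, valid


# Usage: target "A" has 20 compounds; target "B" has only 2 and is filtered out.
pairs = [(f"c{i}", "A") for i in range(20)] + [("x1", "B"), ("x2", "B")]
train, valid = split_by_target(pairs)
print(len(train), len(valid))  # 17 3
```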

Table 1. Precision (%) of the first five ranked predicted targets as a function of the Tc similarity (in green) between a validation compound and the most similar molecule in the training set, and of the occur parameter. Cells with a precision higher than 50% are marked in red. Precision values marked with * were determined from fewer than 30 compound-target predictions. Owing to the design of the widely and successfully used fingerprint scheme, a Tc of 1 does not necessarily mean that two molecules are identical (for example, stereoisomers have the same 2D fingerprint). Predicted compound-target pairs that are already annotated in our in-house compound-protein database are assigned a target prediction precision of 100%.
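How a precision table like Table 1 can be built from validation predictions and then used to annotate new predictions is sketched below. The bin edges for Tc and occur are hypothetical placeholders (the real boundaries are those of Table 1), predictions of all five ranks are pooled into one table as a simplifying assumption, and all function names are illustrative. The 30-prediction threshold for the * flag and the 100% rule for already annotated pairs come from the caption above.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical bin edges; the real boundaries are those of Table 1.
TC_EDGES = [0.2, 0.4, 0.6, 0.8, 1.0]
OCCUR_EDGES = [2, 4, 6]
MIN_SUPPORT = 30  # cells backed by fewer predictions are flagged with '*'

Cell = Tuple[int, int]
Table = Dict[Cell, Tuple[float, bool]]


def cell(tc: float, occur: int) -> Cell:
    """Map a (Tc, occur) pair to a (row, column) cell of the precision table."""
    return bisect_right(TC_EDGES, tc), bisect_right(OCCUR_EDGES, occur)


def build_table(predictions: List[Tuple[float, int, bool]]) -> Table:
    """predictions: (Tc, occur, is_correct) for validation compound-target predictions.
    Returns cell -> (precision in %, flagged because of low support)."""
    hits: Dict[Cell, int] = defaultdict(int)
    counts: Dict[Cell, int] = defaultdict(int)
    for tc, occur, correct in predictions:
        key = cell(tc, occur)
        counts[key] += 1
        hits[key] += int(correct)
    return {k: (100.0 * hits[k] / n, n < MIN_SUPPORT) for k, n in counts.items()}


def assign_precision(
    table: Table, tc: float, occur: int, compound: str, target: str, known: Set[Tuple[str, str]]
) -> Optional[float]:
    """Precision for one predicted pair; pairs already in the database get 100%."""
    if (compound, target) in known:
        return 100.0
    entry = table.get(cell(tc, occur))
    return entry[0] if entry else None


# Usage: 3 correct and 1 wrong prediction fall into the same cell.
table = build_table([(0.9, 5, True)] * 3 + [(0.9, 5, False)])
print(table[cell(0.9, 5)])  # (75.0, True): 75% precision, flagged as based on < 30 predictions
```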

Compound-protein interaction database

HitPickV2 predicts targets using an in-house database of 891,629 physical interactions between chemicals and human protein targets, covering 521,682 compounds and 3,235 protein targets. This database integrates interactions from the public drug-target resources ChEMBL, BindingDB, T3DB, DrugBank, KiDB, LigExpo and TTD. Only compound-target associations observed in biochemical assays with an affinity (typically the half maximal inhibitory concentration, IC50) below 10 µM were included.
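The 10 µM inclusion criterion above amounts to a unit-aware filter on assay records. In this minimal sketch the record layout (`Activity`) and the unit labels are hypothetical; the public resources listed above each use their own formats, which would first have to be mapped onto such a structure. The comparison is strict, matching "below 10 µM".

```python
from typing import Iterable, List, NamedTuple


class Activity(NamedTuple):
    compound: str
    target: str
    value: float  # measured affinity, e.g. an IC50
    unit: str     # one of "pM", "nM", "uM", "mM", "M"


# Conversion factors from each unit to micromolar.
TO_MICROMOLAR = {"pM": 1e-6, "nM": 1e-3, "uM": 1.0, "mM": 1e3, "M": 1e6}


def keep_potent(records: Iterable[Activity], cutoff_um: float = 10.0) -> List[Activity]:
    """Keep associations whose affinity is strictly below the cutoff (10 µM by default)."""
    return [r for r in records if r.value * TO_MICROMOLAR[r.unit] < cutoff_um]


# Usage: 250 nM and 9 µM pass; 12 µM and exactly 10 µM do not.
records = [
    Activity("a", "T", 250, "nM"),
    Activity("b", "T", 12, "uM"),
    Activity("c", "T", 10, "uM"),
    Activity("d", "T", 0.009, "mM"),
]
print([r.compound for r in keep_potent(records)])  # ['a', 'd']
```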

HitPickV2 uses this database both as its chemical space and to build 2739 Bayesian target models. Because HitPickV2 uses 2D fingerprints, stereochemistry is taken into account neither when generating the target models nor when calculating similarities. Interaction data for sets of stereoisomers are therefore merged into new records, each identified by the concatenation of the identifiers of all contributing compounds.
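The stereoisomer merging described above can be sketched at the string level. This is a rough illustration under strong assumptions: it removes only the SMILES stereo marks `@`, `/` and `\` (extended chirality classes such as `@TH1` are not handled), it merges only compounds whose input SMILES become identical after stripping (so inputs are assumed to be canonical SMILES), and the `_` separator used to concatenate identifiers is hypothetical. A real pipeline would typically rely on a cheminformatics toolkit for canonicalization.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Tetrahedral (@, @@) and double-bond (/, \) stereo marks in SMILES.
STEREO_MARKS = re.compile(r"[@/\\]")


def strip_stereo(smiles: str) -> str:
    """Remove stereo marks, e.g. 'C[C@@H](N)C(=O)O' -> 'C[CH](N)C(=O)O'."""
    return STEREO_MARKS.sub("", smiles)


def merge_stereoisomers(
    compounds: Dict[str, Tuple[str, List[str]]]
) -> Dict[str, Tuple[str, List[str]]]:
    """compounds: id -> (SMILES, targets).
    Returns merged_id -> (stereo-free SMILES, union of targets), where
    merged_id concatenates the ids of all contributing compounds."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for cid, (smiles, _targets) in compounds.items():
        groups[strip_stereo(smiles)].append(cid)
    merged: Dict[str, Tuple[str, List[str]]] = {}
    for flat_smiles, ids in groups.items():
        ids.sort()
        targets = sorted({t for cid in ids for t in compounds[cid][1]})
        merged["_".join(ids)] = (flat_smiles, targets)
    return merged


# Usage: L- and D-alanine collapse into one record; glycine stays separate.
merged = merge_stereoisomers({
    "L-ala": ("C[C@@H](N)C(=O)O", ["T1"]),
    "D-ala": ("C[C@H](N)C(=O)O", ["T2"]),
    "gly": ("NCC(=O)O", ["T1"]),
})
print(merged["D-ala_L-ala"])  # ('C[CH](N)C(=O)O', ['T1', 'T2'])
```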


Privacy

To protect user privacy, uploaded data and results are accessible only to the user who submitted them. In addition, all data are deleted automatically after seven days.


Availability

HitPickV2 is free for academic use.


Processing time

Target prediction takes about 1.2 seconds per query compound when 1000 compounds are submitted, i.e. roughly 20 minutes for the whole batch.


References

1. Liu,X., Vogt,I., Haque,T. and Campillos,M. (2013) HitPick: a web server for hit identification and target prediction of chemical screenings. Bioinformatics, 29(15): 1910-1912.

2. Schuffenhauer,A., Floersheim,P., Acklin,P. and Jacoby,E. (2003) Similarity metrics for ligands reflecting the similarity of the target proteins. Journal of Chemical Information and Computer Sciences, 43: 391-405.

3. Nidhi, Glick,M., Davies,J.W. and Jenkins,J.L. (2006) Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. Journal of Chemical Information and Modeling, 46: 1124-1133.

4. Rogers,D. and Hahn,M. (2010) Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5): 742-754.

5. Bemis,G.W. and Murcko,M.A. (1996) The properties of known drugs. 1. Molecular frameworks. Journal of Medicinal Chemistry, 39(15): 2887-2893.