HitPickV2 is a web server to predict molecular targets of compounds.
HitPickV2 employs a novel ligand-based approach to predict targets of compounds. In this new version of HitPick (1), the number of possible predicted targets has increased to 2,739 (1,350 more than in the former version). For a query compound, HitPickV2 explores structurally similar compounds within a chemical space of annotated compound-target interactions using a k-nearest neighbours (k-NN) chemical similarity search (2) and scores the first 10 protein targets encountered in this space.
Target prediction data input
Query compounds can be entered in the text area or uploaded as a file (without headers). Each compound must be on its own line, represented as a SMILES string, which can be preceded by a molecule identifier separated from the SMILES string by whitespace (example of input data).
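The input format described above can be read with a few lines of code. The following is a minimal sketch, not the server's own parser: the function name `parse_query_lines` is hypothetical, and it assumes that the last whitespace-separated token on a line is the SMILES string and that anything before it is the identifier.

```python
def parse_query_lines(text):
    """Parse query text into (identifier, smiles) pairs, one compound per line.

    Assumption: the SMILES string is the last whitespace-separated token and
    the optional identifier is everything in front of it.
    """
    compounds = []
    for raw_line in text.splitlines():
        tokens = raw_line.split()
        if not tokens:
            continue  # skip blank lines
        smiles = tokens[-1]  # SMILES strings contain no whitespace
        identifier = " ".join(tokens[:-1]) or None  # None when no identifier is given
        compounds.append((identifier, smiles))
    return compounds


example = """aspirin CC(=O)OC1=CC=CC=C1C(=O)O
CN1C=NC2=C1C(=O)N(C(=O)N2C)C
"""
for identifier, smiles in parse_query_lines(example):
    print(identifier, smiles)
```

The sketch does not check that a SMILES string is chemically valid; a cheminformatics toolkit would be needed for that.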
Target prediction output
To see an interactive target prediction output example, please click here.
The Change columns button allows the user to select the visible output fields. The copy, csv and pdf buttons copy the output table, export it as a csv file, or export it as a pdf, respectively.
Table output fields
Displays the two dimensional chemical structure of the query compound along with a molecule identifier (if defined).
Displays the smallest protein complex from the Reactome database that the Target (predicted or known) belongs to. More information about the protein complex can be accessed via the complex link. Since HitPickV2 provides predictions at the level of individual protein targets, this information helps to identify proteins that belong to the same protein complex and might be considered as joint units.
The precision of the predicted target. For the first ten ranked predicted targets according to the Bayesian model scores, it is calculated within intervals of chemical similarity (Tc, between the query and the i-th closest compound in the k-NN chemical space annotated to the predicted target) as well as within intervals of Target occurrence (occur) of the predicted target within that space (Table 1).
Displays the two dimensional chemical structure of the most similar compound annotated to the predicted target in the k-NN chemical space.
Tanimoto coefficient (Tc) between the query compound and the closest compound in the k-NN chemical space annotated to the predicted target. Tc=1 indicates identical fingerprints (maximal similarity), while Tc=0 indicates that the fingerprints share no features. The Tc compares Functional-Class Fingerprints (FCFP)-like (4) circular Morgan fingerprints using feature-invariants as implemented in RDKit.
Number of compounds annotated to the predicted target in the k-NN chemical space covering 10 distinct protein targets. This field is enhanced with a scaffold perception feature displaying the three most abundant Murcko scaffolds (5) and their occurrence in this set of compounds. The number of unique scaffolds is reported in brackets. Since Murcko scaffolds are not defined for compounds without ring systems, such compounds are indicated as “acyclic compounds”.
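The scaffold summary in the last field can be illustrated as a simple counting step. This is a minimal sketch under stated assumptions: the function name `summarize_scaffolds` is hypothetical, the scaffold SMILES are assumed to be precomputed elsewhere (for instance with a cheminformatics toolkit such as RDKit), an empty string stands for a compound without ring systems, and the acyclic group is counted as one entry in the number of unique scaffolds.

```python
from collections import Counter


def summarize_scaffolds(scaffolds, top_n=3):
    """Return the top_n most abundant scaffolds and the number of unique ones.

    scaffolds: one precomputed scaffold SMILES per compound; an empty string
    (or None) marks a compound without ring systems.
    """
    labels = [s if s else "acyclic compounds" for s in scaffolds]
    counts = Counter(labels)
    return counts.most_common(top_n), len(counts)


# Hypothetical scaffolds for six compounds annotated to one predicted target.
scaffolds = ["c1ccccc1", "c1ccccc1", "c1ccccc1", "C1CCNCC1", "C1CCNCC1", ""]
top, unique = summarize_scaffolds(scaffolds)
for scaffold, occurrence in top:
    print(scaffold, occurrence)
print("unique scaffolds:", unique)
```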
HitPickV2 Target Prediction Method
To predict associations of a query compound to protein targets, HitPickV2 places the query compound into its surrounding chemical space of annotated compound-target interactions using a k-NN chemical similarity search (2). Then, it selects the closest 10 targets in this space. Afterward, HitPickV2 scores these 10 targets based on three parameters: the Tanimoto coefficient (Tc) between the query and the most similar compound interacting with the target in this space; a target rank that considers Tc and the scores of Laplacian-modified naïve Bayesian target models (3); and a novel parameter introduced in HitPickV2, the number of compounds interacting with each target (occur) in this space. In a final step, HitPickV2 assigns the precision based on the precision table (Table 1).
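The neighbour search and two of the three parameters (Tc and occur) can be sketched as follows. This is an illustration, not the HitPickV2 implementation: fingerprints are represented as plain sets of on-bit indices, the record layout and the function names `tanimoto` and `nearest_targets` are hypothetical, and the Bayesian model scoring and target rank are left out. As a simplification, a neighbour annotated to several targets may push the result slightly above the requested number of targets.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def nearest_targets(query_bits, database, max_targets=10):
    """Walk the neighbours by decreasing Tc until max_targets targets are seen.

    Returns {target: {"tc": Tc of the closest annotated compound,
                      "occur": number of visited neighbours annotated to it}}.
    """
    scored = sorted(
        ((tanimoto(query_bits, rec["bits"]), rec) for rec in database),
        key=lambda pair: pair[0],
        reverse=True,
    )
    summary = {}
    for tc, rec in scored:
        if len(summary) >= max_targets:
            break  # the neighbourhood now covers enough distinct targets
        for target in sorted(rec["targets"]):
            # The first neighbour seen for a target is its closest compound.
            entry = summary.setdefault(target, {"tc": tc, "occur": 0})
            entry["occur"] += 1
    return summary


# Hypothetical toy database; max_targets is lowered to 2 to keep it small.
database = [
    {"id": "c1", "bits": {1, 2, 3}, "targets": {"T1"}},
    {"id": "c2", "bits": {1, 2, 3, 4}, "targets": {"T1"}},
    {"id": "c3", "bits": {1, 9}, "targets": {"T2"}},
    {"id": "c4", "bits": {7, 8}, "targets": {"T3"}},
]
print(nearest_targets({1, 2, 3}, database, max_targets=2))
```

In the toy example, T3 is never reached because the walk stops as soon as two distinct targets have been encountered.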
Assessment of the precision of predicted targets
We assessed the precision of HitPickV2 target prediction in a cross-validation approach. After filtering out protein targets with fewer than three known associated compounds, we split the compound-target database into a training set (85%, 366,764 compounds) and a validation set (15%, 64,588 compounds), distributing the compounds of each target into the two sets in an 85:15 ratio. We used the training set to build Bayesian models of drug targets and the validation set to test the precision of the HitPickV2 target prediction method. To that end, we compared the predicted targets of each compound in the validation set to its actual known targets.
Table 1. Precision (%) for the first five ranked predicted targets in relation to the Tc similarity (in green) of a validation compound to the most similar molecule in the training set and the occur parameter. Cells with a precision higher than 50% are marked in red. The precision for cells marked as * was determined with fewer than 30 compound-target predictions. Due to the design of the widely and successfully used fingerprint scheme, a Tc of 1 does not necessarily mean that two molecules are identical. For predicted compound-target pairs already annotated in our in-house compound-protein database, we assign a target prediction precision of 100%.
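The comparison of predicted and known targets amounts to computing, per group of predictions, the percentage that is correct. The sketch below illustrates this under stated assumptions: the function name `precision_table`, the tuple layout and the Tc interval boundaries in `TC_EDGES` are hypothetical, and predictions are grouped by the exact occur value rather than by occur intervals.

```python
from bisect import bisect_right
from collections import defaultdict

TC_EDGES = [0.2, 0.4, 0.6, 0.8]  # hypothetical Tc interval boundaries


def precision_table(predictions, known_targets):
    """Precision (%) per (Tc interval index, occur) cell.

    predictions: iterable of (compound_id, predicted_target, tc, occur)
    known_targets: dict mapping compound_id to the set of its known targets
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for compound_id, target, tc, occur in predictions:
        cell = (bisect_right(TC_EDGES, tc), occur)
        totals[cell] += 1
        if target in known_targets.get(compound_id, set()):
            hits[cell] += 1  # the prediction matches a known target
    return {cell: 100.0 * hits[cell] / totals[cell] for cell in totals}


# Hypothetical validation predictions and annotations.
predictions = [
    ("a", "T1", 0.90, 3),
    ("b", "T1", 0.85, 3),
    ("c", "T2", 0.30, 1),
]
known_targets = {"a": {"T1"}, "b": {"T9"}, "c": {"T2"}}
print(precision_table(predictions, known_targets))
```

Cells backed by only a few predictions give unstable estimates, which is why a minimum count per cell is worth tracking alongside the percentage.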
Compound-protein interaction database
To predict targets of compounds, HitPickV2 uses an in-house database of 891,629 physically interacting human chemical-protein target associations including 521,682 compounds and 3,235 protein targets. This in-house database integrates interactions from the ChEMBL, BindingDB, T3DB, DrugBank, KiDB, LigExpo and TTD public drug-target resources. Compound-target associations observed in biochemical assays with an association affinity (typically the half maximal inhibitory concentration, IC50) lower than 10 µM were included in the database.
HitPickV2 uses this database to create 2,739 Bayesian models of protein targets and as a chemical space. As we are using 2D fingerprints, stereochemistry is not taken into account during the generation of target models or in similarity calculations. Therefore, interaction data for sets of stereoisomers are merged into new records, identified by the concatenation of all individual identifiers of the contributing compounds.
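The merging of stereoisomers can be pictured as grouping records by a stereochemistry-free structure key. The following is a minimal sketch, assuming that such a key (for example a canonical SMILES without stereo information) has already been computed for every record; the function name `merge_stereoisomers`, the record layout and the `|` separator used for the concatenated identifier are hypothetical.

```python
from collections import defaultdict


def merge_stereoisomers(records, separator="|"):
    """Merge records that share a stereo-free key.

    records: iterable of (compound_id, stereo_free_key, target)
    Returns {merged identifier: set of targets of all contributing compounds}.
    """
    ids = defaultdict(list)
    targets = defaultdict(set)
    for compound_id, key, target in records:
        if compound_id not in ids[key]:
            ids[key].append(compound_id)  # keep first-seen order, no duplicates
        targets[key].add(target)
    return {separator.join(ids[key]): targets[key] for key in ids}


# Hypothetical records: two alanine stereoisomers share one stereo-free key.
records = [
    ("L-ala", "CC(N)C(=O)O", "T1"),
    ("D-ala", "CC(N)C(=O)O", "T2"),
    ("ethanol", "CCO", "T3"),
]
print(merge_stereoisomers(records))
```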
To preserve the privacy of the user data, only users are able to access their uploaded data and results. In addition, all data will be deleted automatically after seven days.
HitPickV2 is free for academic use.
The target prediction takes around 1.2 seconds per query compound when 1,000 compounds are submitted as a query.
1. Liu,X., Vogt,I., Haque,T., Campillos,M. (2013) HitPick: a web server for hit identification and target prediction of chemical screenings. Bioinformatics 29(15):1910-2.
2. Schuffenhauer,A., Floersheim,P., Acklin,P. and Jacoby,E. (2003) Similarity metrics for ligands reflecting the similarity of the target proteins. Journal of Chemical Information and Computer Sciences, 43: 391-405.
3. Nidhi, Glick,M., Davies,J.W. and Jenkins,J.L. (2006) Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. Journal of Chemical Information and Modeling, 46: 1124-1133.
4. Rogers,D. and Hahn,M. (2010) Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5): 742-754.
5. Bemis,G.W. and Murcko,M.A. (1996) The properties of known drugs. 1. Molecular frameworks. Journal of Medicinal Chemistry, 39(15): 2887-2893.