Calculating physico chemical descriptors with opentox algorithm
in-silico toxicology gmbh blog
Opentox-ruby has facilities to calculate more than 300 non-proprietary physico-chemical descriptors, whose calculation can be easily triggered with REST calls.
Three libraries are employed for descriptor calculation:
-
Chemistry Development Kit (CDK), developed by Chris Steinbeck, Egon Willighagen, and others.
-
OpenBabel, developed by Noel O’Boyle and others.
-
Joelib2, developed by Jörg Wegener.
CDK and OpenBabel are actively maintained packages with a large community, while Joelib seems is more or less a single man’s work.
Descriptors have been categorized across libraries. It is possible to calculate descriptors individually, or calculate groups of descriptors according to categories and/or libraries.
Categories
Descriptors were categorized based on work by R. Guha. The list gives code names and number of descriptors in braces:
-
geometrical (20)
-
topological (186)
-
electronic (10)
-
cpsa (28)
-
constitutional (48)
-
hybrid (23)
Libraries
Three libraries are employed. The list gives code names and number of descriptors in braces.
-
cdk (263)
-
openbabel (20)
-
joelib (32)
Creating a Feature Dataset
The deployment on ot-test.in-silico.ch
is used to demonstrate the usage:
Get a list of all descriptors: Query the pc descriptor service without parameters.
curl "http://ot-test.in-silico.ch/algorithm/pc"
Calculate all descriptors (using AllDescriptors from the above list) by POSTing a dataset uri.
curl -X POST \
--data-urlencode "dataset_uri=..." \
"http://ot-test.in-silico.ch/algorithm/
pc/AllDescriptors"
Pass arguments pc_type and/or lib to constrain the query to certain categories and/or libraries. Use code names from above as argument values. E.g., to calculate constitutional and electronic descriptors from CDK, use pc_type=constitutional,electronic
and lib=cdk
.
curl -X POST \
--data-urlencode "dataset_uri=..." \
--data-urlencode "pc_type=constitutional,electronic", \
--data-urlencode "lib=cdk" \
"http://ot-test.in-silico.ch/algorithm/pc/AllDescriptors"
It is possible to combine values by appending them in a comma-separated list. Moreover,
-
Calculate individual descriptors by POSTing to one of the other URIs from the list.
-
The POST returns a task URI that can be queried for progress.
That’s it! The resulting feature dataset contains extensive OWL feature metadata:
-
OT.hasSource
: The dataset passed as dataset_uri argument -
DC.creator
: The webservice used to calculate the feature -
DC.description
: Free text with a human readable description, library, and category of the feature
Notes
-
As with any dataset, resulting feature datasets may also be requested in CSV or YAML formats by specifying appropriate accept-Headers.
-
Further processing may include feature selection as next step. An implementation based on Random Forests is available in opentox-ruby.
-
The dataset (perhaps after feature selection) can be passed to the Lazar algorithm as feature dataset.
-
From a higher perspective: A complete tutorial that streamlines the process.
blog comments powered by Disqus