Flyers, Posters and Presentations

Download our flyers to get a quick overview of our products.

OC|miner Overview Flyer - (PDF 1.1 MB)

Our Products Presentation - (PDF 3.5 MB)

Name2Chemistry Information - (PDF 0.1 MB)


Software

Download SODIAC (free for academic use), our ontology editor designed specifically for chemical ontologies, with special features such as compound classification. Please note that the download contains only a demo version which, although fully functional, has a reduced feature set and does not fully represent the product available for purchase. Some of the chemical functionality requires a ChemAxon license.

(ZIP-archive 86 MB)


Ontologies

Download an example ontology showcasing our chemical compound classes.

(GZ-archive 0.5 MB)


SARminer Gold Corpus

We present the SARminer Gold Corpus, a gold standard for chemistry-disease relations in patent texts. The corpus consists of excerpts from US patent applications and includes annotations of named entities from the chemistry domain (e.g. "propranolol") and the disease domain (e.g. "hypertension"), as well as from related domains such as methods and substances. In addition, domain-relevant relations between these entities, e.g. "propranolol treats hypertension", have been annotated manually. The corpus is intended for developing and evaluating relation extraction methods. It is available in the BRAT standoff annotation format. Details on the creation of the corpus can be found in the following publication:

Schlaf, Antje, Claudia Bobach, and Matthias Irmer (2014): Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patent Texts. In Nicoletta Calzolari et al., editors: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland. European Language Resources Association (ELRA). (PDF)

This work is distributed under the Creative Commons license. Any licensee shall acknowledge use of the corpus in all publications of research based in whole or in part on its use through citation of the above publication.
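For readers new to the BRAT standoff format: each document consists of a plain-text file plus an `.ann` file in which every line is one tab-separated annotation, for example `T1` text-bound entities with a type and character offsets, and `R1` binary relations whose `Arg1:`/`Arg2:` fields reference entity IDs. The following minimal Python sketch reads entities and relations from such lines. It is not part of the corpus distribution, and the type labels in the example (`Chemical`, `Disease`, `Treats`) are hypothetical; the actual labels are those used in the corpus's own annotation files.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    spans: list  # (start, end) character offsets; several for discontinuous entities
    text: str


@dataclass
class Relation:
    id: str
    type: str
    arg1: str  # entity ID of the first argument
    arg2: str  # entity ID of the second argument


def parse_ann(lines):
    """Parse BRAT standoff lines into dicts of entities and relations keyed by ID.

    Only text-bound annotations (T...) and binary relations (R...) are read;
    other kinds (events, attributes, notes) are skipped.
    """
    entities, relations = {}, {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # "Type start end" or, for discontinuous spans, "Type s1 e1;s2 e2"
            ann_type, offsets = fields[1].split(" ", 1)
            spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            entities[ann_id] = Entity(ann_id, ann_type, spans, fields[2])
        elif ann_id.startswith("R"):
            # "Type Arg1:T1 Arg2:T2"
            ann_type, a1, a2 = fields[1].split()
            relations[ann_id] = Relation(ann_id, ann_type, a1.split(":", 1)[1], a2.split(":", 1)[1])
    return entities, relations


# Example with hypothetical labels for the sentence "propranolol treats hypertension"
sample = [
    "T1\tChemical 0 11\tpropranolol",
    "T2\tDisease 19 31\thypertension",
    "R1\tTreats Arg1:T1 Arg2:T2\t",
]
ents, rels = parse_ann(sample)
r = rels["R1"]
print(ents[r.arg1].text, r.type, ents[r.arg2].text)  # propranolol Treats hypertension
```

Since `parse_ann` accepts any iterable of lines, an open file handle (e.g. from `open(path, encoding="utf-8")`) can be passed directly.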


SARminer Gold Corpus 1

For creating this corpus, we randomly selected 21 US patent applications from 2010 that contained a claimed SAR relation. We annotated named entities in these documents with OCMiner and selected all sentences that contained both a chemistry term and a disease term, resulting in a total of 365 sentences. This co-occurrence-based selection provides a first approximation of the maximum recall of chemistry-disease relations expressed in the text. Please note that relations expressed across more than one sentence were not considered for the SARminer Gold 1 corpus (but see SARminer Claim Sections below). The selected sentences were then annotated manually with the open-source tool BRAT. This manual annotation included correcting erroneous automatic named entity annotations (which reduced the number of sentences actually containing a chemistry-disease co-occurrence from 365 to 270), adding missing named entities from the relevant domains, and annotating the relations between named entities. Additionally, we applied chain reasoning, a method for automatically inferring additional relations via relation chains, and integrated the inferred relations into the corpus.
Size: 270 sentences from patent texts (from both description and claim sections), exhaustively annotated with named entities and the relevant relations between them.
Source: 21 US patent applications from 2010 containing a claimed SAR relation.
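Chain reasoning is only named above, not specified. To illustrate the general idea, the following Python sketch derives new relations from chains head → middle → tail by applying composition rules repeatedly until no new relation appears (a fixpoint). The function name, the relation names (`same_as`, `treats`) and the single rule are hypothetical; they are not the rules actually used for the corpus.

```python
def chain_infer(relations, rules):
    """Infer new (head, relation, tail) triples from relation chains until a fixpoint.

    relations: iterable of (head, rel, tail) triples.
    rules: dict mapping (rel1, rel2) to the relation inferred for the chain
           head -rel1-> middle -rel2-> tail (hypothetical rule format).
    Returns only the newly inferred triples.
    """
    original = set(relations)
    known = set(original)
    while True:
        new = set()
        for h1, r1, t1 in known:
            for h2, r2, t2 in known:
                if t1 == h2 and (r1, r2) in rules:
                    triple = (h1, rules[(r1, r2)], t2)
                    if triple not in known:
                        new.add(triple)
        if not new:
            return known - original
        known |= new


# Hypothetical rule: if X is the same entity as Y and Y treats Z, then X treats Z
rules = {("same_as", "treats"): "treats"}
facts = {
    ("compound 1", "same_as", "propranolol"),
    ("propranolol", "treats", "hypertension"),
}
print(chain_infer(facts, rules))  # {('compound 1', 'treats', 'hypertension')}
```

The loop terminates because only finitely many triples can be built from the given entities and relation types; the quadratic pairing is fine for sentence-sized inputs but would need indexing by head entity for large graphs.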

sarminer_gold1
(ZIP-archive 0.4 MB)


SARminer Gold Corpus 2

From the chemistry-related US patent applications of 2010 (approx. 43,000 out of approx. 340,000 in total), we selected the patent claims (approx. 22,000) that contained co-occurrences of chemistry and disease terms. From these claim sentences, we randomly chose 1,000 and randomized their order. The annotation was performed in the same way as for SARminer Gold Corpus 1, except that SARminer Gold Corpus 2 contains only sentences from patent claims, and the annotated sentences originate from a much larger number of randomly selected patent applications.
Size: 1,094 patent claims exhaustively annotated with named entities and relevant relations between them.
Source: Automatic selection and randomization of claims from US patent applications from 2010.
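The two-step selection described above (keep only claims with a chemistry-disease co-occurrence, then draw a random sample) can be sketched in Python as follows. This assumes the named entity annotations are already available as a set of entity types per claim; the labels `Chemistry` and `Disease`, the function name and the `seed` parameter are illustrative assumptions, not OCMiner's interface.

```python
import random


def select_cooccurrence_claims(claims, k, seed=None):
    """Keep claims that mention both a chemistry and a disease term, then sample k of them.

    claims: list of (claim_text, entity_types) pairs, where entity_types is the set of
            entity types assigned by a named entity recognizer (hypothetical labels).
    """
    required = {"Chemistry", "Disease"}
    candidates = [claim for claim in claims if required <= claim[1]]
    rng = random.Random(seed)  # seeded for a reproducible selection
    # sample() draws without replacement and returns the items in random order
    return rng.sample(candidates, min(k, len(candidates)))


claims = [
    ("A method of treating hypertension comprising administering propranolol.", {"Chemistry", "Disease"}),
    ("A tablet comprising lactose.", {"Chemistry"}),
    ("Use of compound A for the treatment of asthma.", {"Chemistry", "Disease"}),
]
for text, _ in select_cooccurrence_claims(claims, k=1000, seed=42):
    print(text)
```

Because `random.sample` already returns its picks in random order, sampling and randomizing the order happen in one step; with the approx. 22,000 candidate claims and `k=1000`, the call would mirror the numbers given above.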

sarminer_gold2
(ZIP-archive 2.4 MB)


SARminer Claim Sections

The claim sections of the patent applications chosen for SARminer Gold Corpus 1 have been annotated with named entities and the relevant relations between them. Additionally, relations between named entities in different patent claims have been annotated, including anaphoric relations (coreference).
Size: Entire claim sections of 21 patent applications, annotated with named entities and relevant relations between them, both within a claim and in different claims.
Source: Same patent sources as SARminer Gold 1.

sarminer_claimsections
(ZIP-archive 0.5 MB)


Our Publications


2015

Krallinger, Martin, et al. (2015): The CHEMDNER corpus of chemicals and drugs and its annotation principles. Journal of Cheminformatics 2015;7. (PDF 2.3 MB)

Bobach, Claudia, et al. (2015): Screening of synthetic and natural product databases: Identification of novel androgens and antiandrogens. Eur J Med Chem, 2015 Jan 27. (LINK)


Irmer, Matthias, Lutz Weber, Timo Böhme, Anett Püschel, Claudia Bobach, and Ulf Laube (2015): OCMiner for Patents. Extracting Chemical Information from Patent Texts. Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, 119-123. (PDF 0.2 MB)


2014

Böhme, Timo, Matthias Irmer, Anett Püschel, Claudia Bobach, Ulf Laube and Lutz Weber (2014): OCMiner: Text processing, annotation and relation extraction for the Life Sciences. In Adrian Paschke, Albert Burger, Paolo Romano, M. Scott Marshall and Andrea Splendiani, editors: Proceedings of the 7th International Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2014), Berlin, Germany. (PDF 0.4 MB)

Schlaf, Antje, Claudia Bobach, and Matthias Irmer (2014): Creating a Gold Standard corpus for the extraction of chemistry-disease relations from patent texts. In Nicoletta Calzolari et al., editors: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland. European Language Resources Association (ELRA). (PDF 0.2 MB)


Hiss, Jan A, Michael Reutlinger, Christian P Koch, Anna M Perna, Petra Schneider, Tiago Rodrigues, Sarah Haller, Gerd Folkers, Lutz Weber, Renato B Baleeiro, Peter Walden, Paul Wrede, and Gisbert Schneider (2014): Combinatorial chemistry by ant colony optimization. Future Medicinal Chemistry 2014 6:3, 267-280 (LINK)


2013

Weber, Lutz, Timo Böhme and Matthias Irmer (2013): Ontology-based content analysis of US patent applications from 2001–2010. Pharmaceutical Patent Analyst, 2(1):39–54. (LINK)

Irmer, Matthias, Claudia Bobach, Timo Böhme, Ulf Laube, Anett Püschel and Lutz Weber (2013): Adapting the OCMiner text processing system to the CTD controlled vocabulary. In Cecilia Arighi et al., editors: Proceedings of the 4th BioCreative challenge evaluation workshop, volume 1, Bethesda, MD, USA, 114-117. (PDF 0.5 MB)

Irmer, Matthias, Claudia Bobach, Timo Böhme, Ulf Laube, Anett Püschel and Lutz Weber (2013): Chemical Named Entity Recognition with OCMiner. In Martin Krallinger et al., editors: Proceedings of the 4th BioCreative challenge evaluation workshop, volume 2, Bethesda, MD, USA, 42-96. (PDF 1.0 MB)

Irmer, Matthias, Claudia Bobach, Timo Böhme, Anett Püschel and Lutz Weber (2013): Using a chemical ontology for detecting and classifying chemical terms mentioned in texts. In Nigam Shah, Susanna-Assunta Sansone, Larisa Soldatova, and Michel Dumontier, editors: Proceedings of Bio-Ontologies 2013, Berlin, Germany, page 43.

Eyrisch, S., T. Girschick, G. Ross, C. Kalinski, V. Khazak, and L. Weber (2013): PriaXplore® - a novel technology platform for the identification of small molecule modulators of protein-protein interactions. Journal of Cheminformatics 2013;5(Suppl 1):P35. (PDF 0.2 MB)

2012

Bobach, Claudia, Timo Böhme, Ulf Laube, Anett Püschel, and Lutz Weber (2012): Automated compound classification using a chemical ontology. Journal of Cheminformatics 2012;4:40. (PDF 1.6 MB)