The more, the better.

Since you can never have enough content, we offer an unprecedented range of data sources, from scientific literature to patents and from public internet news feeds to your very own internal data, all of which we can deliver to you as normalized data streams. These open source and proprietary content sources not only form the basis of our own data mining products, such as the OC|miner, but are also available for you to use freely in your own data management and data extraction pipelines. All external and internal sources are normalized, annotated and indexed, and are therefore searchable just like your own internal documents, if you so wish. Every source can be switched on and off according to your needs: just give us a call or do it yourself using our OC|manager tool.

For your Big Data predictive projects we can, for example, deliver extracted information such as compound lists with biological and physico-chemical properties. These streams contain up-to-date information from the content of your choice in selectable formats such as XML, ANSI or TSV to meet your data needs.
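
As an illustration of how such a stream could be consumed on your side, here is a minimal Python sketch that reads a TSV compound list and groups the properties by compound. The column names (compound, property, value, unit) and the sample values are assumptions made for this example only; the actual layout of a delivered stream may differ.

```python
import csv
import io

# Hypothetical sample of a TSV stream. The column names and the values
# are illustrative assumptions, not the actual delivery schema.
SAMPLE_TSV = (
    "compound\tproperty\tvalue\tunit\n"
    "aspirin\tmolecular_weight\t180.16\tg/mol\n"
    "aspirin\tlogP\t1.19\t\n"
    "caffeine\tmolecular_weight\t194.19\tg/mol\n"
)


def read_compound_properties(stream):
    """Group property rows by compound name.

    Returns a dict mapping compound -> {property: (value, unit or None)}.
    """
    compounds = {}
    for row in csv.DictReader(stream, delimiter="\t"):
        props = compounds.setdefault(row["compound"], {})
        # An empty unit column is stored as None.
        props[row["property"]] = (float(row["value"]), row["unit"] or None)
    return compounds


compounds = read_compound_properties(io.StringIO(SAMPLE_TSV))
for name, props in compounds.items():
    print(name, props)
```

The same function works on any text file object, so a stream saved to disk could be passed in via open(path, newline="") instead of the in-memory sample.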

All the sources below are part of our OC|miner and can also be delivered as on-demand data streams.


Open source content (normalized, annotated, indexed):

  • Medline abstracts (approx. 27 million abstracts from over 30,000 journals, growing by about 3,100 abstracts every 24 hours)
  • full-text journal articles from PubMed Central (approx. 1.6 million journal articles from over 11,000 journals, growing by about 900 full-text articles every 24 hours)
  • full-text open access journals (approx. 2.5 million full-text articles from almost 9,500 journals that you can choose from, automatically downloadable)
  • health-related web pages (e.g. FDA, EMEA, EFSA, etc.)
  • ChEMBL (approx. 2,040,000 compounds) and databases (approx. 244,000 studies)


Proprietary content (normalized, annotated, indexed):

  • European patents (EPO) (approx. 5.3 million patents)
  • US patents (USPTO) (approx. 11.3 million patents)
  • World patents (WIPO) (approx. 3.6 million patents)
  • all three patent sources growing by approx. 3,200 patents every 24 hours
  • news flow (customizable web feeds with daily updates from newspapers, blogs, forums and social media - approx. 10 million news items per day that can be filtered according to your wishes)
  • knowledge graph - structured extracted properties


Derived content:

  • PhytoBase® - our comprehensive and dynamic plant database where we used text analysis and knowledge mining technologies to retrieve qualified relationship information about plants, natural products that occur in these plants, their uses and biological activities, as well as molecular pathways including proteins and genes



The content is automatically incorporated into OC|miner on a daily basis. If you are interested in the raw data itself, it can be delivered on demand as data streams, news flows or alerts via e-mail, SFTP or a web service. Do you have other sources that you need normalized and delivered to you daily? Ask us about it!

