Dylan Storey

Recovering academic, hacker, tinkerer, scientist.

Getting All the KEGG pathways down from the KEGG server.

Ever wanted to get all the KEGG pathways from the KEGG server?

So, we’re looking at a lot of our data through the lens of metabolism now, and as a result we needed a way to tie into well-known and well-curated metabolomic databases.

For those of you who don’t know, KEGG is arguably the best of these as far as curation and ease of use are concerned. Unfortunately, the lab responsible for keeping KEGG running has had to start charging yearly access fees for their databases so that they can continue to support them. That they need to do this sucks, but in today’s funding environment it’s understandable. As something of a compromise with the research community, they’ve opened up a REST API that can be used to query and retrieve information.
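To give a feel for what talking to that REST API looks like, here is a minimal Python sketch. It is not the code from the gist. It assumes the public service lives at `https://rest.kegg.jp`, that `list/pathway/ko` returns one tab-separated `id<TAB>name` line per reference pathway, and that `get/<id>/kgml` returns the KGML for one pathway. The helper names (`list_url`, `kgml_url`, `parse_pathway_list`, `fetch`) are made up for illustration.

```python
from urllib.request import urlopen

# Assumption: base URL of the public KEGG REST service.
KEGG_REST = "https://rest.kegg.jp"


def list_url(prefix="ko"):
    """URL that lists pathways for a prefix ('ko' = KO reference pathways)."""
    return f"{KEGG_REST}/list/pathway/{prefix}"


def kgml_url(pathway_id):
    """URL that returns the KGML (XML) for a single pathway."""
    return f"{KEGG_REST}/get/{pathway_id}/kgml"


def parse_pathway_list(text):
    """Turn tab-separated 'id<TAB>name' lines into a {id: name} dict.

    An optional 'path:' prefix on the id is stripped, since the listing
    may or may not include it.
    """
    pathways = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        pathway_id, _, name = line.partition("\t")
        pathway_id = pathway_id.strip()
        if pathway_id.startswith("path:"):
            pathway_id = pathway_id[len("path:"):]
        pathways[pathway_id] = name.strip()
    return pathways


def fetch(url):
    """Download a URL and return the body as text (performs a network call)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

Something like `parse_pathway_list(fetch(list_url()))` would then give you the ids to feed into `kgml_url`, one request per pathway. If you loop over everything, pause between requests so you don't hammer the server.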

The Makefile that downloads all of the pathways as KGML files is on GitHub as a gist. Some quick notes about the structure:

- I’ve tried to maintain the hierarchy that KEGG implements as much as possible.
- Each type of pathway (e.g. Carbohydrate Metabolism, Energy Metabolism, etc.) is organised the same way KEGG keeps it. These are the lists of ko numbers at the top of the file.
- I’ve also grouped all of the pathways into their larger groups (e.g. Metabolism, Global Maps, etc.), so you can get these by issuing something like: `make genetic_information_processing`
- `make all` will fetch the whole database, but please be nice and only do it if you need it.

The full file is up on GitHub as a gist.
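If the layout is easier to picture as data, the sketch below mimics it in Python: lists of ko numbers per pathway type, larger groups built from those types, and an `all` target that expands to everything. This is an illustration of the idea only, not a transcription of the Makefile. The target names and the handful of ko numbers are a small hand-picked sample, and `resolve` is a hypothetical helper.

```python
# Illustrative sample only: a few ko numbers per pathway type.
CATEGORIES = {
    "carbohydrate_metabolism": ["ko00010", "ko00020"],  # glycolysis, TCA cycle
    "energy_metabolism": ["ko00190", "ko00195"],        # oxidative phosphorylation, photosynthesis
    "transcription": ["ko03020"],                       # RNA polymerase
    "translation": ["ko03010"],                         # ribosome
}

# Larger groups are just lists of pathway types.
GROUPS = {
    "metabolism": ["carbohydrate_metabolism", "energy_metabolism"],
    "genetic_information_processing": ["transcription", "translation"],
}


def resolve(target):
    """Expand a target name into the list of ko numbers it covers."""
    if target == "all":
        # Every pathway from every type, de-duplicated and sorted.
        return sorted({ko for kos in CATEGORIES.values() for ko in kos})
    if target in GROUPS:
        return [ko for category in GROUPS[target] for ko in CATEGORIES[category]]
    if target in CATEGORIES:
        return list(CATEGORIES[target])
    raise KeyError(f"unknown target: {target}")


if __name__ == "__main__":
    print(resolve("genetic_information_processing"))
```

A Makefile expresses the same thing with variables holding the lists and phony targets depending on them, which is why asking for a group pulls down every pathway underneath it.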

I haven’t tested this at all beyond what I specifically needed, so if you find any problems, please share!

