KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database that helps users understand interactions within a biological system at the molecular level using gene and genome sequences.  KEGG compiles genetic information to diagram networks of molecular interactions.  Started in 1995 as part of the Human Genome Program in Japan, KEGG is run by the laboratory of Minoru Kanehisa at Kyoto University (1).
KEGG databases

Table of KEGG databases and their organization. KEGG website.

The database is composed of 16 smaller databases that can be organized, based on content, into 3 categories: systems information, chemical information, and genomic information.  These databases are integrated to form a visual representation of various biological pathways.  The pathways are built from available genomic sequences and the functional data associated with those sequences.


KEGG uses

Schematic of ways to use KEGG. KEGG website.

One of the most useful aspects of this database is the ability to visualize different biological pathways (individually or in the context of a particular organism) along with the enzymes, reactions, and substrates/products involved.  KEGG is also useful for determining which enzymes are present in different organisms and how those enzymes are used to create a certain metabolite.

From a human biology perspective, researchers have used KEGG to gather genetic information on different diseases.  Examples include the genetic interactions in the commonly used breast cancer cell line MCF-7 and susceptibility to coronary artery disease (2, 3).


This section is a quick guide to the basics of the KEGG Pathway feature.  Please refer to the slideshow to see screenshots of the major steps described.

  • KEGG homepage
  • KEGG Pathway Database
  • Metabolism pathways
  • Histidine metabolism pathway map
  • Page for substrate imidazole-acetol phosphate
  • Histidine metabolism pathway map for T. maritima
  • Page for E.C. imidazoleglycerol-phosphate dehydratase

For the purpose of this demonstration, we will look at histidine metabolism in the hyperthermophilic bacterial species Thermotoga maritima.  Start at the KEGG homepage.  The link titled "KEGG Pathway" will bring us to the KEGG Pathway database, where we can select from a set of categories focusing on biological systems.  By selecting metabolism, we are brought to a list of metabolic pathways for which genomic and functional data are available.  When we select "Histidine metabolism" under section 1.5 Amino Acid Metabolism, we are brought to a map depicting the enzymes and substrates needed to generate histidine, its precursors, and products created from histidine.

From here, we can select any of the metabolites or enzymes, for example the page for E.C. (imidazoleglycerol-phosphate dehydratase).  These pages include a large amount of information on each enzyme, including name, orthology, associated genes, reactions carried out, and papers relating to the particular enzyme.  Also included are links to other useful databases.  Included in the slideshow is an image of the page for the product of this enzyme, imidazole-acetol phosphate.

We can go back to the histidine map and select an organism to see which enzymes are in that organism's genome and which substrates it can utilize.  KEGG contains complete genomes for over 2500 organisms (201 eukaryotes, 2458 bacteria, and 160 archaea).  When we choose Thermotoga maritima, we see the exact same map, except the genes found in that organism are highlighted in green.  When we select E.C. while on the organism's map, the information becomes specific to the enzyme in that particular organism, including amino acid and nucleotide sequences.
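The same walk-through can also be approximated programmatically. The following is a minimal sketch, not an official client. It assumes that KEGG's REST service is reachable at https://rest.kegg.jp, that the histidine metabolism reference map has the number 00340, and that the KEGG organism code for Thermotoga maritima is tma. None of these identifiers appear in the text above, so verify them on the KEGG website before relying on them. The helper names (kegg_url, pathway_id, parse_two_columns, fetch_text) are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base address of the public KEGG REST service.
KEGG_REST = "https://rest.kegg.jp"

# Assumption: identifiers matching the example in the text.
HISTIDINE_MAP = "00340"   # histidine metabolism reference map number
T_MARITIMA = "tma"        # KEGG organism code for Thermotoga maritima


def kegg_url(operation, *args):
    """Join an operation (e.g. "get", "list") and its arguments into a URL."""
    parts = [KEGG_REST, operation] + [quote(a, safe=":+") for a in args]
    return "/".join(parts)


def pathway_id(prefix, map_number):
    """Build a pathway identifier: 'map' for the reference map, or an organism code."""
    return f"{prefix}{map_number}"


def parse_two_columns(text):
    """Parse tab-delimited 'identifier<TAB>description' lines into a dict."""
    entries = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("\t", 1)
        entries[key.strip()] = value.strip()
    return entries


def fetch_text(url, timeout=30):
    """Download a URL and return its body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Build the URLs for the reference map, the organism-specific map,
# and the list of all pathways annotated for the organism.
reference_url = kegg_url("get", pathway_id("map", HISTIDINE_MAP))
organism_url = kegg_url("get", pathway_id(T_MARITIMA, HISTIDINE_MAP))
organism_pathways_url = kegg_url("list", "pathway", T_MARITIMA)

print(reference_url)
print(organism_url)
print(organism_pathways_url)
```

The script only builds and prints URLs, so it runs without network access. To retrieve data, pass one of the URLs to fetch_text. The pathway list should come back as tab-delimited text that parse_two_columns can turn into a dictionary of pathway identifiers and names, although the exact response format is an assumption to check against the KEGG API documentation.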


1. Kanehisa, M and Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucl. Acids Res. (2000)

2. Huan, J, et al. Insights into significant pathways and gene interaction networks underlying breast cancer cell line MCF-7 treated with 17beta-Estradiol (E2). Gene. (2013)

3. Duan, S, et al. Identification of susceptibility modules for coronary heart disease using a genome wide integrated network analysis. Gene. (2013)