Skip to content

Genes and transcripts

Scout stores information about genes and transcripts. The information is collected from a couple of resources, these can be updated manually if desired. Definition of what genes that exists and their correct id and symbols are collected from HGNC. Unfortunately HGNC does only maintain a distribution for GRCh38, at this time (mid 2018) there are many resources that lack support for build 38 so many investigators still use build 37. We therefore load two gene sets, one for each build.

Genes with missing RefSeq transcripts

There is a issue where this problem is discussed here, please read that since it is important. When we first look at all genes we could see the following:

Nr of genes: 33578 Nr without transcripts: 11048

So 1/3 of all genes are missing any refseq transcript. This came out as a high number at first, after some thought one might realise that many of these genes are not protein coding etc.

If we look at disease causing genes from OMIM there are 16 genes that are missing refseq id for any transcript in ensembl build 37, 7 genes in build 38.

The genes in OMIM without refseq transcripts in build 37 are the following:

TTC25
PTPRQ
SRD5A2
PIGY
FGF16
TRAC
PADI6
GDF1
TUBB3
IGHG2
IGHM
IGKC
FCGR2C
KMT2B
NEFL
NR2E3

In the gene panels there are one more gene outside OMIM wihtout refseq transcripts: MAP3K14

We will here look at all these genes in detail:

TTC25

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 HG185_PATCH ENSG00000260703 TTC25 ENST00000569427 -
37 HG185_PATCH ENSG00000260703 TTC25 ENST00000561994 -
37 HG185_PATCH ENSG00000260703 TTC25 ENST00000569541 NM_031421
37 HG185_PATCH ENSG00000260703 TTC25 ENST00000565052 -
37 17 ENSG00000204815 TTC25 ENST00000593239 -
37 17 ENSG00000204815 TTC25 ENST00000591658 -
37 17 ENSG00000204815 TTC25 ENST00000585530 -
37 17 ENSG00000204815 TTC25 ENST00000377540 -
--- --- --- --- --- ---
38 17 ENSG00000204815 TTC25 ENST00000593239 -
38 17 ENSG00000204815 TTC25 ENST00000591658 -
38 17 ENSG00000204815 TTC25 ENST00000377540 NM_031421
38 17 ENSG00000204815 TTC25 ENST00000377540 NM_031421
38 17 ENSG00000204815 TTC25 ENST00000585530 -

PTPRQ

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 12 ENSG00000139304 PTPRQ ENST00000551042 -
37 12 ENSG00000139304 PTPRQ ENST00000547376 -
37 12 ENSG00000139304 PTPRQ ENST00000551573 -
37 12 ENSG00000139304 PTPRQ ENST00000526956 -
37 12 ENSG00000139304 PTPRQ ENST00000547485 -
37 12 ENSG00000139304 PTPRQ ENST00000551624 -
37 12 ENSG00000139304 PTPRQ ENST00000547881 -
37 12 ENSG00000139304 PTPRQ ENST00000549355 -
37 12 ENSG00000139304 PTPRQ ENST00000266688 -
--- --- --- --- --- ---
38 12 ENSG00000139304 PTPRQ ENST00000623635 -
38 12 ENSG00000139304 PTPRQ ENST00000551042 -
38 12 ENSG00000139304 PTPRQ ENST00000547376 -
38 12 ENSG00000139304 PTPRQ ENST00000551573 -
38 12 ENSG00000139304 PTPRQ ENST00000614701 NM_001145026
38 12 ENSG00000139304 PTPRQ ENST00000547485 -
38 12 ENSG00000139304 PTPRQ ENST00000551624 -
38 12 ENSG00000139304 PTPRQ ENST00000547881 -
38 12 ENSG00000139304 PTPRQ ENST00000549355 -
38 12 ENSG00000139304 PTPRQ ENST00000616559 XM_017019274

SRD5A2

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 2 ENSG00000049319 SRD5A2 ENST00000405650 -
37 2 ENSG00000049319 SRD5A2 ENST00000233139 -
--- --- --- --- --- ---
38 2 ENSG00000277893 SRD5A2 ENST00000622030 NM_000348

Here we can see that two transcripts has been merged to one and been given a refseq identifier. Most probable one of the old transcripts match the new one. We can not be sure if variants are missed here.

PIGY

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 4 ENSG00000255072 PIGY ENST00000527353 -
--- --- --- --- --- ---
38 4 ENSG00000255072 PIGY ENST00000527353 -

Here the transcript is identical for both builds so we dear to guess that we do not miss any variants.

FGF16

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 X ENSG00000196468 FGF16 ENST00000439435 -
37 HG1426_PATCH ENSG00000268853 FGF16 ENST00000600602 -
--- --- --- --- --- ---
38 X ENSG00000196468 FGF16 ENST00000439435 NM_003868

The refseq transcript exists in build 37 but is missing the refseq identifier. We would NOT miss any variants here.

TRAC

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 14 ENSG00000229164 TRAC ENST00000478163 -
--- --- --- --- --- ---
38 14 ENSG00000277734 TRAC ENST00000611116 -

Transcripts have different ENS ids in both builds but are most probably the same. No refseq identifier for any build.

PADI6

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 1 ENSG00000256049 PADI6 ENST00000434762 -
37 1 ENSG00000256049 PADI6 ENST00000358481 -
--- --- --- --- --- ---
38 1 ENSG00000276747 PADI6 ENST00000619609 NM_207421
38 CHR_HG2095_PATCH ENSG00000280949 PADI6 ENST00000625380 NM_207421

Unclear what happens here...

GDF1

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 19 ENSG00000130283 GDF1 ENST00000247005 -
--- --- --- --- --- ---
38 19 ENSG00000130283 GDF1 ENST00000247005 -

Same transcripts, no refseq in both builds. No variants will be missed here

TUBB3

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 16 ENSG00000198211 TUBB3 ENST00000556922 -
--- --- --- --- --- ---
38 16 ENSG00000258947 TUBB3 ENST00000555810 -
38 16 ENSG00000258947 TUBB3 ENST00000554444 NM_001197181
38 16 ENSG00000258947 TUBB3 ENST00000556565 -
38 16 ENSG00000258947 TUBB3 ENST00000315491 NM_006086
38 16 ENSG00000258947 TUBB3 ENST00000553656 -
38 16 ENSG00000258947 TUBB3 ENST00000556536 -
38 16 ENSG00000258947 TUBB3 ENST00000554116 -
38 16 ENSG00000258947 TUBB3 ENST00000554927 -
38 16 ENSG00000258947 TUBB3 ENST00000557262 -
38 16 ENSG00000258947 TUBB3 ENST00000557490 -
38 16 ENSG00000258947 TUBB3 ENST00000555576 -
38 16 ENSG00000258947 TUBB3 ENST00000554336 -
38 16 ENSG00000258947 TUBB3 ENST00000555609 -
38 16 ENSG00000258947 TUBB3 ENST00000553967 -
38 16 ENSG00000258947 TUBB3 ENST00000625617 -

This is very unclear, it goes from one to many transcripts between the builds. Hard to say if variants are missed here.

IGHG2

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 14 ENSG00000211893 IGHG2 ENST00000390545 -
37 HG1592_PATCH ENSG00000270895 IGHG2 ENST00000603473
--- --- --- --- --- ---
38 CHR_HSCHR14_3_CTG1 ENSG00000274497 IGHG2 ENST00000621803 -
38 14 ENSG00000211893 IGHG2 ENST00000641095 -
38 14 ENSG00000211893 IGHG2 ENST00000390545 -

No refseq in any build. Same transcripts in both builds except a new one in a patch. No variants would be missed here.

IGHM

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 14 ENSG00000211899 IGHM ENST00000390559 -
37 HG1592_PATCH ENSG00000271541 IGHM ENST00000605693 -
--- --- --- --- --- ---
37 CHR_HSCHR14_3_CTG1 ENSG00000282657 IGHM ENST00000626472 -
37 14 ENSG00000211899 IGHM ENST00000637539 -
37 14 ENSG00000211899 IGHM ENST00000390559 -

Same as above, no variants missing here

IGKC

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 2 ENSG00000211592 IGKC ENST00000390237 -
--- --- --- --- --- ---
38 2 ENSG00000211592 IGKC ENST00000390237 -

Same in both builds. No variants would be missing here

FCGR2C

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 1 ENSG00000244682 FCGR2C ENST00000502411 -
37 1 ENSG00000244682 FCGR2C ENST00000496692 -
37 1 ENSG00000244682 FCGR2C ENST00000466542 -
37 1 ENSG00000244682 FCGR2C ENST00000465075 -
37 1 ENSG00000244682 FCGR2C ENST00000473530 -
37 1 ENSG00000244682 FCGR2C ENST00000473712 -
37 1 ENSG00000244682 FCGR2C ENST00000482226 -
37 1 ENSG00000244682 FCGR2C ENST00000467903 -
37 1 ENSG00000244682 FCGR2C ENST00000507374 -
37 1 ENSG00000244682 FCGR2C ENST00000508651 -
37 1 ENSG00000244682 FCGR2C ENST00000543859 -
--- --- --- --- --- ---
38 1 ENSG00000244682 FCGR2C ENST00000502411 -
38 1 ENSG00000244682 FCGR2C ENST00000496692 -
38 1 ENSG00000244682 FCGR2C ENST00000466542 -
38 1 ENSG00000244682 FCGR2C ENST00000465075 -
38 1 ENSG00000244682 FCGR2C ENST00000473530 -
38 1 ENSG00000244682 FCGR2C ENST00000473712 -
38 1 ENSG00000244682 FCGR2C ENST00000482226 -
38 1 ENSG00000244682 FCGR2C ENST00000467903 -
38 1 ENSG00000244682 FCGR2C ENST00000507374 -
38 1 ENSG00000244682 FCGR2C ENST00000508651 -
38 1 ENSG00000244682 FCGR2C ENST00000611236 -
38 1 ENSG00000244682 FCGR2C ENST00000543859 NR_047648

Most probably no variants would be missing here. All transcripts seems to be correct between the builds.

KMT2B

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 19 ENSG00000105663 KMT2B ENST00000606995 -
37 19 ENSG00000105663 KMT2B ENST00000607650 -
37 19 ENSG00000105663 KMT2B ENST00000592092 -
37 19 ENSG00000105663 KMT2B ENST00000585476 -
37 19 ENSG00000105663 KMT2B ENST00000586308 -
--- --- --- --- --- ---
38 19 ENSG00000272333 KMT2B ENST00000420124 NM_014727
38 19 ENSG00000272333 KMT2B ENST00000606995 -
38 19 ENSG00000272333 KMT2B ENST00000592092 -
38 19 ENSG00000272333 KMT2B ENST00000585476 -
38 19 ENSG00000272333 KMT2B ENST00000586308 -

Here the transcript ENST00000607650 in build 37 have probably changed id to ENST00000420124 and been given a refseq identifier.

NEFL

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 8 ENSG00000104725 NEFL ENST00000221169 -
--- --- --- --- --- ---
38 8 ENSG00000277586 NEFL ENST00000610854 NM_006158
38 8 ENSG00000277586 NEFL ENST00000615973 -
38 8 ENSG00000277586 NEFL ENST00000639464 -
38 8 ENSG00000277586 NEFL ENST00000619417 -

This is unclear, probably ENST00000221169 has been renamed to ENST00000610854 and been given a refseq identifier.

NR2E3

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 15 ENSG00000031544 NR2E3 ENST00000561604 -
37 15 ENSG00000031544 NR2E3 ENST00000567496 -
37 15 ENSG00000031544 NR2E3 ENST00000562839 -
37 15 ENSG00000031544 NR2E3 ENST00000562925 -
37 15 ENSG00000031544 NR2E3 ENST00000563709 -
37 15 ENSG00000031544 NR2E3 ENST00000398840 -
37 15 ENSG00000031544 NR2E3 ENST00000326995 -
--- --- --- --- --- ---
38 15 ENSG00000278570 NR2E3 ENST00000621736 -
38 15 ENSG00000278570 NR2E3 ENST00000617575 NM_014249
38 15 ENSG00000278570 NR2E3 ENST00000621098 NM_016346
38 15 ENSG00000278570 NR2E3 ENST00000563709 -

This is unclear. We might miss variants here

MAP3K14

Build Chrom EnsGeneID Gene Name EnsTransID RefSeq mRNA
37 17 ENSG00000006062 MAP3K14 ENST00000587332 -
37 17 ENSG00000006062 MAP3K14 ENST00000592267 -
37 17 ENSG00000006062 MAP3K14 ENST00000586644 -
37 17 ENSG00000006062 MAP3K14 ENST00000344686 -
37 17 ENSG00000006062 MAP3K14 ENST00000376926 -
--- --- --- --- --- ---
38 17 ENSG00000006062 MAP3K14 ENST00000344686 NM_003954
38 17 ENSG00000006062 MAP3K14 ENST00000640087 -
38 17 ENSG00000006062 MAP3K14 ENST00000592267 -
38 17 ENSG00000006062 MAP3K14 ENST00000586644 -
38 17 ENSG00000006062 MAP3K14 ENST00000617331 -
38 17 ENSG00000006062 MAP3K14 ENST00000376926 -
38 CHR_HSCHR17_2_CTG5 ENSG00000282637 MAP3K14 ENST00000633437 -
38 CHR_HSCHR17_2_CTG5 ENSG00000282637 MAP3K14 ENST00000634016 -

Here one of the transcripts have been given a refseq identifier in build 38, that transcript exists in build 37 so no variants would be missed here.