Annotations
Scout is a primarily a visualisation tool with some other functionality. One could imagine that in the future, some or all annotations could be performed by Scout. For now scout will look for some known keys when uploading a VCF and extract information for those. VEP is the tool supported for functional and regional annotations at the moment, SnpEff will be added in the near future. For the other types of annotations Scout will look for certain keys in the INFO field of the vcf and expect the value to be of a specific type. This means that there is not a dependency on any other specific annotation tool besides VEP, just make sure that the key and values are correct according to the specification below.
Rank score
One of the hard problem when dealing with whole genome data is the huge amount variants that are generated in every analysis. Scout was developed to be used in rare variant analysis, this means that there is ony a small number of variants that are actually interesting to look at. We do not want to store all variants from each case in a database that should be able to controll thousands of cases. To solve this problem we are working with rank scores, each variant is scored according to a scoring schema then we only upload and sort the variants based on their rank score. In this way the users can start by looking at the variants that looks potentially most dangerous from a bioinformatic perspective. We use the tool genmod to (among other things) score the variant, but as long as there is a RankScore
-field in the INFO
field of the VCF with a float as value it is handeled by Scout.
Annotation keys and tool suggestions
In this section all the different annotation keys and suggestions of tools that can be used to annotate them are listed. If nothing else is stated Scout will search in the INFO field to locate the key value pair.
Frequencies
1000G
The frequency from the 1000G population database.
1000G_MAX_AF
The maximum allele frequency of all populations in the 1000G population database.
- Key:
1000G_MAX_AF
- Value:
Float
- Tools: custom made, we have modified the 1000G file and use genmod
ExAC
The frequency from the ExAC population database.
ExAC_MAX_AF
The maximum allele frequency of all populations in the ExAC population database.
- Key:
EXAC_MAX_AF
- Value:
Float
- Tools: custom made, we have modified the exac file and use genmod
GnomAD
The maximum allele frequency of all populations in gnomAD population database.
- Key:
gnomAD_AF
in VEPCSQ
field - Value:
Float
- Tools: VEP
Severity
CADD score
The Combined Annotation Dependent Depletion(CADD) score. A prediction of the deleterioussness for a variant.
SIFT
The SIFT prediction for how a variation affects the protein.
- Key:
SIFT
in VEPCSQ
field - Value:
String
- Tools: VEP
PolyPhen
The PolyPhen prediction for how a variation affects the protein.
- Key:
PolyPhen
in VEPCSQ
field - Value:
String
- Tools: VEP
SpliceAI
The SpliceAI prediction for how a variant affects splicing. SpliceAI transcript delta score, defined as max of all transcipt delta scores DS, can be interpreted as a probabilty of splicing being affected for the current transcript. Jaganathan 2019 consider thresholds 0.2 high recall, 0.5 recommended, and 0.8 high precision. The DS and DP pairs describe probability and relative position scores, with negative position values being upstream. E.g. donor gain DS_DG 0.08 at DP_DG -31, acceptor loss DS_AL 0.58 at DP_AL -2.
- Key
CSQ
-SpliceAI_pred_DS_AG
,SpliceAI_pred_DP_AG
,SpliceAI_pred_DS_AL
,SpliceAI_pred_DP_AL
,SpliceAI_pred_DS_DG
,SpliceAI_pred_DP_DG
,SpliceAI_pred_DS_DL
,SpliceAI_pred_DP_DL
- Value:
Float
inDS
andInt
inDP
- Tools: SpliceAI
Rank score
The combined rank score for a variant. For exact info see test
- Key:
RankScore
- Value:
Float
- Tools: genmod
Conservation
Gerp
The Genomic Evolutionary Rate Profiling(GERP) conservation string. An estimation of how conserved this position is.
- Key:
GERP++_RS_prediction_term
- Value:
String
- Tools: SnpSift
phastCons
The PHASTcons conservation string.
- Key:
phastCons100way_vertebrate_prediction_term
- Value:
String
- Tools: SnpSift
phylop
The phylop 100 way predicted conservation string.
- Key:
phyloP100way_vertebrate_prediction_term
- Value:
String
- Tools: SnpSift
Inheritance
Genetic models
What genetics models are followed for the variant in the particular family
- Key:
GeneticModels
- Value: list of
String
- Tools: genmod
Autosomal Recessive Compounds
What variants is this variant in Autosomal Recessive Compound with?
- Key:
Compounds
- Value: list of
String
- Tools: genmod