Getting started

Introduction

mutfunc is a resource for annotating variants. It highlights the variants that are likely to be deleterious to function and shows their predicted consequences on protein stability, interaction interfaces, regulatory regions (TF binding sites), PTMs, linear motifs, conservation and start/stop codons. The annotations and predictions are based on precomputing the impact of all possible variants using existing algorithms that cover the different mechanisms listed below.

This help page will guide you through using the database. For any additional questions or suggestions, please see the about page for contact information.

Data input

There are two accepted formats. These are described in further detail below.

Plain text variants

The plain text format is a simplified format for variants, with one variant per line. Variants can be entered in the textbox or uploaded as a file, and should be formatted as either NAME X123Y or NAME 123 X Y, where

NAME Name of the protein or chromosome. For protein names, UniProt accessions, gene names or IDs are acceptable
X Reference amino acid or base
Y Mutated amino acid or base
123 Amino acid or chromosome position

Note that the separator can be any of the following symbols , /, a space character or a tab character.

Here are a few valid examples:
YDL203C K184A
ACK1 K184A
Q07622 184 K/A
chrI 61165 T/A

Note that DNA variants should always be provided with respect to the positive strand, and not the negative.
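
The two layouts described above can be checked locally before uploading. The following is a minimal sketch, not mutfunc's own parser: the names Variant and parse_variant are hypothetical, and it assumes the accepted separators are a comma, a slash, a space or a tab.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    name: str      # protein or chromosome name
    position: int  # amino acid or chromosome position
    ref: str       # reference amino acid or base
    alt: str       # mutated amino acid or base


# Compact form such as K184A: reference, position, mutated residue.
_COMPACT = re.compile(r"^([A-Za-z*])(\d+)([A-Za-z*])$")


def parse_variant(line: str) -> Variant:
    """Parse 'NAME X123Y' or 'NAME 123 X Y' (assumed separators: , / space tab)."""
    tokens = [t for t in re.split(r"[\s,/]+", line.strip()) if t]
    if len(tokens) == 2:
        match = _COMPACT.match(tokens[1])
        if match is None:
            raise ValueError(f"unrecognised variant: {line!r}")
        ref, pos, alt = match.groups()
        return Variant(tokens[0], int(pos), ref.upper(), alt.upper())
    if len(tokens) == 4 and tokens[1].isdigit():
        name, pos, ref, alt = tokens
        return Variant(name, int(pos), ref.upper(), alt.upper())
    raise ValueError(f"unrecognised variant: {line!r}")
```

Calling parse_variant on each non-empty line of an input file and collecting the ValueError cases gives a quick list of lines that need fixing.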

Gene and chromosome names

Accepted gene and chromosome nomenclature will vary from organism to organism. Please make sure you are using one of the following. Note: gene names are case insensitive.

Yeast
Accepted gene nomenclature: ORF identifiers (e.g. YNL064C), gene names (e.g. MAS5), UniProt accessions (e.g. P25491) and UniProt entry IDs (e.g. MAS5_YEAST)
Accepted chromosome nomenclature: Numeric (e.g. chr1, chr2), Roman numerals (e.g. chrI, chrII) and NCBI IDs (e.g. NC_001133)

E. coli
Accepted gene nomenclature: Gene names (e.g. casA), locus IDs or b-numbers (e.g. b2760), ECK identifiers (e.g. ECK2755), UniProt accessions (e.g. Q46901) and UniProt entry IDs (e.g. CSE1_ECOLI)
Accepted chromosome nomenclature: NCBI IDs (NC_000913 or NC_000913.3) or simply chr

Human
Accepted gene nomenclature: UniProt accessions (e.g. P04637), UniProt entry IDs (e.g. P53_HUMAN) and Entrez gene identifiers (e.g. 7157)
Accepted chromosome nomenclature: Numeric (e.g. chr1, chr2) or simply the chromosome number (e.g. 1, 2)

VCF file format

VCF files are accepted. Simply upload your VCF file, preferably via Google Drive or Dropbox. Once uploaded, coding variants will be identified and used along with non-coding variants to query the database.

Note that all sample information is ignored. In other words, all called variants in the VCF file will be used, regardless of quality or sample. Therefore, if your VCF file is large we suggest removing sample information and/or low quality variants to avoid prolonged upload and processing times.
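
One way to shrink a large VCF file before uploading is sketched below. This is an illustration only: the function name slim_vcf and the QUAL cut-off of 30 are assumptions, not mutfunc recommendations. It keeps the eight fixed VCF columns, drops the FORMAT and sample columns, and skips records whose QUAL is below the cut-off. Records with a missing QUAL (".") are kept.

```python
from typing import Iterable, Iterator


def slim_vcf(lines: Iterable[str], min_qual: float = 30.0) -> Iterator[str]:
    """Yield VCF lines without sample columns or low-quality records."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Keep CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO only.
            yield "\t".join(line.split("\t")[:8])
        elif line:
            fields = line.split("\t")
            qual = fields[5]
            # Assumed cut-off; a missing QUAL (".") is not filtered out.
            if qual != "." and float(qual) < min_qual:
                continue
            yield "\t".join(fields[:8])
```

To use it on a file, open the input for reading and the output for writing, then write each yielded line followed by a newline.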

Interpreting results

In expanded rows, clicking links will display a popup dialog with additional information on the consequence. This section aims to explain dialogs displayed by different consequences.

Phosphorylation

Protein kinases phosphorylate serine (S), threonine (T) and tyrosine (Y) residues. Kinases often have specificity for the residues flanking the central STY residue, and this specificity is crucial for phosphorylation. A variant that hits the central STY residue will always result in the phosphorylation site being lost. A variant that hits the flanking region of the site may also disrupt the kinase's specificity, leading to loss of phosphorylation. We use the MIMP program to predict this loss of phosphorylation.
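
The distinction between a central and a flanking hit can be sketched as follows. This is only an illustration of the idea, not how MIMP or mutfunc classify variants: the function name phosphosite_hit and the flank width of 7 residues on each side are assumptions.

```python
def phosphosite_hit(site_pos: int, variant_pos: int, flank: int = 7) -> str:
    """Classify a variant relative to a phosphosite at site_pos.

    Returns 'central' if the variant changes the modified S/T/Y residue,
    'flanking' if it falls within `flank` residues on either side
    (an assumed window size), and 'outside' otherwise.
    """
    offset = variant_pos - site_pos
    if offset == 0:
        return "central"
    if abs(offset) <= flank:
        return "flanking"
    return "outside"
```

A "central" hit corresponds to certain loss of the site, while a "flanking" hit is the case where a kinase specificity model is needed to decide whether phosphorylation is likely to be lost.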

The top of the dialog shows the sequence of the phosphorylation site. If a kinase is affected, the sequence logo visualizing the specificity of the kinase will also be shown.

The following are the fields which appear:

Phosphosite
Residue and position of the phosphorylation site affected
Experimental kinases
Experimentally identified kinases, if any, for this site
Function
Known function, if any, for the site
Site evidence
PubMed evidence of the phosphorylation site
Kinase evidence
PubMed evidence for experimental kinases
Function evidence
PubMed evidence for the function of the site

If a kinase is affected, the dialog will show a few extra fields:

Kinase lost
Kinase predicted to have lost phosphorylation of this site
Wildtype score
Score of the phosphosite against the kinase specificity model, before the variant. The score ranges from 0 to 1.
Mutant score
Score of the phosphosite against the kinase specificity model, after the variant
Probability of loss
Probability that phosphorylation of this site by this kinase is disrupted

Other PTMs

Unlike kinases, most enzymes responsible for other PTMs show no specificity for the residues flanking their target site. Therefore, for other PTMs such as acetylation and ubiquitylation, we only report whether a variant changes the central modified residue.

The top of the dialog shows the sequence of the modified site, before and after the variant.

The fields displayed in the dialog are as follows:

Site
Residue and position of the modified site
Modification
Type of PTM
Residue type
Type of residue being modified
Reference
PubMed evidence for modification

Linear motifs

Short linear motifs are sequence patterns required for recognition and targeting activities, for example cleavage sites and protein localization.

If your variant falls within a linear motif site, it may or may not be impactful, depending on whether it affects the recognition pattern. For example, if the pattern is [KR].D, a variant that changes a site from KND to RND would not be impactful, whereas a change to DND would be. You can show only impactful variants by setting the impactful variants only option.
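
The [KR].D example can be reproduced with a regular expression. This is a minimal sketch that assumes the motif pattern is valid Python regex syntax. The function name is_impactful is hypothetical and this is not necessarily the exact test mutfunc applies.

```python
import re


def is_impactful(pattern: str, wildtype: str, mutant: str) -> bool:
    """Return True if the wildtype site matches the motif but the mutant does not."""
    matched_before = re.fullmatch(pattern, wildtype) is not None
    matched_after = re.fullmatch(pattern, mutant) is not None
    return matched_before and not matched_after


# K -> R keeps the [KR] position satisfied, so the motif still matches.
print(is_impactful(r"[KR].D", "KND", "RND"))
# K -> D breaks the [KR] position, so the motif no longer matches.
print(is_impactful(r"[KR].D", "KND", "DND"))
```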

The top of the dialog shows the sequence of the experimentally confirmed linear motif site, before and after the variant.

The fields displayed in the dialog are as follows:

ELM ID
Linear motif ID
ELM Description
Description of the linear motif
Pattern
Pattern of the linear motif
Relative position
Relative position of the variant within the linear motif

Protein stability

If an amino acid substitution puts too much strain on the protein structure, it will often destabilise it, resulting in loss of function. We use the FoldX program to calculate the free energy of unfolding of the protein structure before (ΔGwt) and after (ΔGmt) your variant. If the difference between these two values (ΔΔGpred) is high, often above 2, the variant is said to be destabilising. We use both experimental and homology-modelled structures.
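
The rule of thumb above can be written out as a small calculation. This sketch assumes the sign convention ΔΔGpred = ΔGmt - ΔGwt, so that positive values mean destabilisation, and treats the cut-off of 2 as a default rather than a hard limit. The function names are hypothetical.

```python
def ddg_pred(dg_wt: float, dg_mt: float) -> float:
    """Difference between mutant and wildtype free energy (assumed sign convention)."""
    return dg_mt - dg_wt


def is_destabilising(dg_wt: float, dg_mt: float, threshold: float = 2.0) -> bool:
    """Apply the 'often above 2' rule of thumb to a pair of free energies."""
    return ddg_pred(dg_wt, dg_mt) > threshold
```

For example, with a wildtype value of -10.0 and a mutant value of -7.0, the difference is 3.0, which exceeds the cut-off.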

The top of the dialog shows a protein structure viewer, with the variant in red. Note that only the mutated chain is shown in this viewer. You can switch the viewer to full screen mode by clicking the full screen icon, and change the view and coloring of the structure through the viewer's settings.

The fields displayed in the dialog are as follows:

Model
Structural model used for the stability prediction
Mutated chain
Chain in the model which is mutated
PDB Position
Position of the variant within the PDB coordinate file. Because a PDB file may use its own residue numbering, this can differ from the position queried.
ΔΔGpred
Difference between the predicted ΔG before and after the variant. A value above 2 often indicates a destabilising variant.

Protein interaction interfaces

Variants within protein interaction interfaces, if destabilising, can disrupt the interaction. Similar to protein stability, we compute the ΔΔGpred in binary interaction structures defined by the Interactome3D database.

If a residue is within an interaction interface but not destabilising, it will not have a red dot.

The top of the dialog shows a protein structure viewer. The two proteins are shown in blue and white. Interface residues of the mutated protein are shown in yellow, and the mutated residue is shown in red.

The fields displayed in the dialog are the same as those in the protein stability section with the exception of the following:

Interaction
Interaction predicted to be impacted by this variant

Conservation

If a variant occurs within a conserved region, it's likely to have an impact on protein function. We use the SIFT program to predict whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.

The fields displayed in the dialog are as follows:

SIFT score
The SIFT score ranges from 0 to 1 and is analogous to a p-value, i.e. a variant with a score below 0.05 is considered deleterious.
Number of sequences in alignment
Number of sequences, out of all sequences, with a non-gapped amino acid at this position
Positional information content in alignment
The information content of the position. This indicates how conserved this position is and ranges between 0-4.32. A high value indicates high conservation.
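
The upper bound of 4.32 corresponds to log2(20), the information content of a position where only one of the 20 amino acids is ever observed. The sketch below shows one standard way to compute this quantity for an alignment column, as the maximum entropy minus the observed Shannon entropy. It is an illustration under that assumption and may differ in detail from how SIFT derives the reported value. The function name is hypothetical.

```python
import math
from collections import Counter

MAX_IC = math.log2(20)  # about 4.32 bits for 20 amino acids
GAPS = {"-", "."}


def information_content(column: str) -> float:
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column.upper() if r not in GAPS]
    if not residues:
        # No residues observed: report 0 by convention in this sketch.
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return MAX_IC - entropy
```

A fully conserved column such as "KKKKKK" gives the maximum of log2(20), while a column with all 20 amino acids at equal frequency gives a value of approximately 0.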

Transcription factor binding sites

Transcription factors have specificities towards their target sites. We identify biologically relevant TF binding sites and use the known specificities of TFs to score these sites before and after variants.

The top of the dialog shows the sequence of the TFBS along with the sequence logo visualizing the specificity of the TF.

The fields displayed in the dialog are as follows:

TF lost
The transcription factor predicted to be impacted
TFBS
Locus and strand of the binding site
Downstream ORF
Gene thought to be regulated by the TF
TF knockout p-value
p-value for the over- or under-expression of the downstream gene when the TF is knocked out. These values are obtained from knockdown experiments.
ChIP-chip region
ChIP-chip evidence for the TF binding to this site
Wildtype score
Score of the binding site against the TF specificity model, before the variant. The score ranges from 0 to 1.
Mutant score
Score of the binding site against the TF specificity model, after the variant.
Score difference
Difference between the wildtype and mutant scores
IC difference
Difference in information content between the wildtype and mutant base. This is represented as the height of the base in the sequence logo.
Percentile difference
Difference between the percentiles of the wildtype and mutant scores in a distribution of scores for the TF specificity model against randomly generated oligonucleotides.
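
To make the wildtype score, mutant score and score difference fields more concrete, here is a toy example of scoring a site against a position weight matrix. It is a sketch under stated assumptions, not mutfunc's scoring method: the matrix is made up, the background frequency is taken as uniform, and the 0 to 1 scaling uses a common min-max normalisation of the log-odds score. All names are hypothetical.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequency

# Made-up 3-position matrix: one dict of base probabilities per position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def normalised_score(pwm, site: str) -> float:
    """Log-odds score of `site`, rescaled so the worst site is 0 and the best is 1."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the matrix length")
    log_odds = [{b: math.log2(p / BACKGROUND) for b, p in col.items()} for col in pwm]
    raw = sum(col[base] for col, base in zip(log_odds, site.upper()))
    best = sum(max(col.values()) for col in log_odds)
    worst = sum(min(col.values()) for col in log_odds)
    return (raw - worst) / (best - worst)


def score_difference(pwm, wildtype: str, mutant: str) -> float:
    """Wildtype score minus mutant score (positive when the variant weakens the site)."""
    return normalised_score(pwm, wildtype) - normalised_score(pwm, mutant)
```

With this toy matrix, the consensus site ACG scores 1, and mutating its first base to T lowers the score by one third, since one of three equally weighted positions goes from its best to its worst base.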

Start/stop codons

If queried variants disrupt start codons, disrupt stop codons, or introduce stop codons, they are expected to impact the translation of the protein.

  • Start codon loss: the protein is never translated, since the start codon is required for translation initiation
  • Stop codon loss: protein translation does not stop when it should, resulting in a non-functional run-off protein that is typically degraded by the cell
  • Stop codon gain: protein translation halts early, resulting in a truncated, misfolded protein that is typically degraded by the cell
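
The three cases above can be told apart by comparing the reference and mutated codons. The sketch below assumes the standard genetic code with ATG as the only start codon and TAA, TAG and TGA as stop codons. Some organisms also use alternative start codons, which it does not handle. The function name is hypothetical.

```python
from typing import Optional

START_CODON = "ATG"                 # assumption: canonical start codon only
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def classify_codon_change(ref: str, alt: str, is_start_codon: bool = False) -> Optional[str]:
    """Return the start/stop consequence of changing codon `ref` to `alt`, if any.

    `is_start_codon` marks whether `ref` is the first codon of the ORF.
    """
    ref, alt = ref.upper(), alt.upper()
    if is_start_codon and ref == START_CODON and alt != START_CODON:
        return "start codon loss"
    if ref in STOP_CODONS and alt not in STOP_CODONS:
        return "stop codon loss"
    if ref not in STOP_CODONS and alt in STOP_CODONS:
        return "stop codon gain"
    return None
```

Codon changes that return None do not affect start or stop codons and would be covered by the other categories on this page.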

FAQs

How do I submit a bug report?

We have set up a public bug tracker on GitHub. Alternatively, you can get in touch with us directly. We are also keen to receive new feature requests.

How long are my results stored for?

All jobs are stored for 48 hours. After that period, you will likely get a "job not found" error and will have to re-upload your variants or VCF file.

What does the "no deleterious impacts found" error message mean?

This error shows up when a submitted variant is not found in the database. Please note that for the protein stability and conservation categories, we currently store only variants with a deleterious impact. This is due to limitations in the database underlying mutfunc.

Copyright EMBL-EBI 2016 · EBI is an outstation of the European Molecular Biology Laboratory · Terms of use