mutfunc is a resource for annotating variants. It highlights the variants that are likely to be deleterious to function and reports their predicted consequences on protein stability, interaction interfaces, regulatory regions (TF binding sites), PTMs, linear motifs, conservation and start/stop codons. The annotations and predictions are based on precomputing the impact of all possible variants using existing algorithms that cover the different mechanisms listed below.
This help page will guide you through using the database. For any additional questions or suggestions, please see the about page for contact information.
The plain text format is a simplified way to specify variants, with one variant per line. Variants can be entered in the textbox or uploaded as a file, and should be formatted as follows:
NAME X123Y
or NAME 123 X Y
where
Field | Description |
---|---|
NAME | Name of the protein or chromosome. For protein names, UniProt accessions, gene names or IDs are acceptable |
X | Reference amino acid or base |
Y | Mutated amino acid or base |
123 | Amino acid or chromosome position |
Note that the separator can be any of the following: a comma (,), a slash (/), a space character or a tab character.
Here are a few valid examples:
YDL203C K184A
ACK1 K184A
Q07622 184 K/A
chrI 61165 T/A
Note that DNA variants should always be provided with respect to the positive strand, and not the negative.
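The format rules above can be sketched as a small parser. This is an illustrative reading of the description, not mutfunc's own code: the names `Variant` and `parse_variant` are hypothetical, and the sketch assumes that single letters are used for amino acids and bases.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    name: str      # protein or chromosome name
    position: int  # amino acid or chromosome position
    ref: str       # reference amino acid or base
    alt: str       # mutated amino acid or base


# Separators named in the help text: comma, slash, space, tab.
_SEPARATORS = re.compile(r"[,/ \t]+")
# Compact form such as K184A (reference, position, mutated).
_COMPACT = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_variant(line: str) -> Variant:
    """Parse 'NAME X123Y' or 'NAME 123 X Y' into a Variant."""
    tokens = [t for t in _SEPARATORS.split(line.strip()) if t]
    if len(tokens) == 2:
        match = _COMPACT.match(tokens[1])
        if match is None:
            raise ValueError(f"cannot parse variant: {line!r}")
        ref, pos, alt = match.groups()
        return Variant(tokens[0], int(pos), ref.upper(), alt.upper())
    if len(tokens) == 4 and tokens[1].isdigit():
        name, pos, ref, alt = tokens
        return Variant(name, int(pos), ref.upper(), alt.upper())
    raise ValueError(f"cannot parse variant: {line!r}")
```

Both layouts reduce to the same four fields, so `parse_variant("ACK1 K184A")` and `parse_variant("ACK1 184 K/A")` return equal values.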
Accepted gene and chromosome nomenclature will vary from organism to organism. Please make sure you are using one of the following. Note: gene names are case insensitive.
Organism | Accepted gene nomenclature | Accepted chromosome nomenclature |
---|---|---|
Yeast | ORF identifiers (e.g. YNL064C), gene names (e.g. MAS5), UniProt accessions (e.g. P25491), and UniProt entry IDs (e.g. MAS5_YEAST) | Numeric (e.g. chr1, chr2), roman numerals (e.g. chrI, chrII) and NCBI IDs (e.g. NC_001133) |
Ecoli | Gene names (e.g. casA), locus IDs or b-numbers (e.g. b2760), ECK identifiers (e.g. ECK2755), UniProt accessions (e.g. Q46901), and UniProt entry IDs (e.g. CSE1_ECOLI) | NCBI IDs (NC_000913 or NC_000913.3) and simply using chr |
Human | UniProt accessions (e.g. P04637), UniProt entry IDs (e.g. P53_HUMAN) and Entrez gene identifiers (e.g. 7157) | Numeric (e.g. chr1, chr2) or simply the chromosome number (e.g. 1, 2) |
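If your own data mixes the numeric and roman numeral yeast chromosome labels, it can help to normalise them before submitting. The sketch below is a hypothetical helper (`yeast_chromosome_number` is not part of mutfunc); it handles only the `chr1` and `chrI` styles and does not cover NCBI IDs or the mitochondrial chromosome.

```python
import re

_ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral: str) -> int:
    """Convert a roman numeral made of I, V and X to an integer."""
    total = 0
    largest_seen = 0
    # Read right to left: a smaller value before a larger one is subtracted.
    for char in reversed(numeral.upper()):
        value = _ROMAN_VALUES[char]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


def yeast_chromosome_number(label: str) -> int:
    """Map labels such as 'chr2' or 'chrII' to the chromosome number."""
    match = re.fullmatch(r"chr([0-9]+|[IVX]+)", label, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised chromosome label: {label!r}")
    token = match.group(1)
    return int(token) if token.isdigit() else roman_to_int(token)
```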
VCF files are accepted. Simply upload your VCF file, preferably via Google Drive or Dropbox. Once uploaded, coding variants will be identified and used along with non-coding variants to query the database.
Note that all sample information is ignored. In other words, all called variants in the VCF file will be used, regardless of quality or sample. Therefore, if your VCF file is large we suggest removing sample information and/or low quality variants to avoid prolonged upload and processing times.
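One way to slim down a large VCF file before uploading is sketched below. It keeps only the eight fixed VCF columns (CHROM to INFO), which removes the FORMAT and sample columns, and drops records whose QUAL is below a cutoff. The function name `slim_vcf_lines` and the default cutoff of 30 are assumptions chosen for illustration, not recommendations from mutfunc.

```python
def slim_vcf_lines(lines, min_qual=30.0):
    """Yield VCF lines without sample columns, skipping low-QUAL records."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Header line: keep only the eight fixed columns.
            yield "\t".join(fields[:8])
            continue
        if len(fields) < 8:
            # Skip blank or malformed records.
            continue
        qual = fields[5]
        # A missing QUAL (".") is kept; only known low values are dropped.
        if qual != "." and float(qual) < min_qual:
            continue
        yield "\t".join(fields[:8])
```

For a file on disk, the generator can be fed an open file handle and its output written line by line to a new file.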
In expanded rows, clicking links will display a popup dialog with additional information on the consequence. This section aims to explain dialogs displayed by different consequences.
Protein kinases phosphorylate serine (S), threonine (T) and tyrosine (Y) residues. Kinases often have sequence specificities around the central STY residue that are crucial for phosphorylation. A variant hitting the central STY residue will always result in the phosphorylation site being lost. If the variant hits the flanking region of the site, it may also disrupt the kinase's specificity, leading to loss of phosphorylation. We use the MIMP program to predict loss of phosphorylation here.
The top of the dialog shows the sequence of the phosphorylation site. If a kinase is affected, the sequence logo visualizing the specificity of the kinase will also be shown.
The following are the fields which appear:
If a kinase is affected, the dialog will show a few extra fields:
Unlike kinases, most other enzymes responsible for other PTMs do not confer any flanking specificity towards their target site. Therefore, for other PTMs such as acetylation and ubiquitylation, we only report if a variant changes the central modified residue.
The top of the dialog shows the sequence of the modified site, before and after the variant.
The fields displayed in the dialog are as follows:
Short linear motifs are sequence patterns required for recognition and targeting activities, for example cleavage sites and protein localization.
If your variant falls within a linear motif site, it may or may not be impactful, depending on whether it affects the recognition pattern. For example, if the pattern is [KR].D, a variant that changes a site from KND to RND would not be impactful, whereas a change to DND would be. You can show only impactful variants by enabling the "impactful variants only" option.
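The [KR].D example can be checked with a regular expression. The sketch below treats a variant as impactful when the reference site matches the pattern and the mutated site no longer does; the function name `motif_impact` is hypothetical and this is not necessarily how mutfunc evaluates motifs internally.

```python
import re


def motif_impact(pattern: str, site: str, mutated_site: str) -> bool:
    """Return True if the variant breaks a motif match the reference site had."""
    regex = re.compile(pattern)
    matched_before = regex.fullmatch(site) is not None
    matched_after = regex.fullmatch(mutated_site) is not None
    return matched_before and not matched_after


# K -> R keeps the [KR] position satisfied; K -> D does not.
print(motif_impact("[KR].D", "KND", "RND"))  # False
print(motif_impact("[KR].D", "KND", "DND"))  # True
```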
The top of the dialog shows the sequence of the experimentally confirmed linear motif site, before and after the variant.
The fields displayed in the dialog are as follows:
If an amino acid substitution puts too much strain on the protein structure, it will often destabilise it, resulting in loss of function. We use the FoldX program to calculate the free energy of unfolding of the protein structure before (ΔGwt) and after (ΔGmt) your variant. If the difference between these two values (ΔΔGpred) is high, often above 2 kcal/mol, the variant is considered destabilising. We use both experimental and homology-modelled structures.
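As a worked illustration of the threshold, the sketch below computes ΔΔGpred from the two free energies and applies the cutoff of 2. It assumes the common convention ΔΔGpred = ΔGmt − ΔGwt, with positive values meaning destabilisation and energies in kcal/mol; the function names are hypothetical and the input values are made up.

```python
DESTABILISING_THRESHOLD = 2.0  # cutoff mentioned in the text (kcal/mol assumed)


def ddg_pred(dg_wt: float, dg_mt: float) -> float:
    """Difference in free energy, assuming the convention dG_mt - dG_wt."""
    return dg_mt - dg_wt


def is_destabilising(dg_wt: float, dg_mt: float,
                     threshold: float = DESTABILISING_THRESHOLD) -> bool:
    """Flag a variant whose predicted ddG exceeds the threshold."""
    return ddg_pred(dg_wt, dg_mt) > threshold


# Made-up energies for illustration only.
print(ddg_pred(-10.0, -7.5))          # 2.5
print(is_destabilising(-10.0, -7.5))  # True
print(is_destabilising(-10.0, -9.5))  # False
```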
The top of the dialog shows a protein structure viewer, with the variant in red. Note that only the mutated chain is shown in this viewer. You can switch the viewer to full screen mode by clicking the full screen icon, and change the view and coloring of the structure through the viewer settings.
The fields displayed in the dialog are as follows:
Variants within protein interaction interfaces, if destabilising, can disrupt the interaction. As with protein stability, we compute ΔΔGpred for binary interaction structures defined by the Interactome3D database.
If a residue is within an interaction interface but not destabilising, it will not have a red dot.
The top of the dialog shows a protein structure viewer. The two proteins are shown in blue and white. Interface residues of the mutated protein are shown in yellow, and the mutated residue is shown in red.
The fields displayed in the dialog are the same as those in the protein stability section with the exception of the following:
If a variant occurs within a conserved region, it's likely to have an impact on protein function. We use the SIFT program to predict whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.
The fields displayed in the dialog are as follows:
Transcription factors have specificities towards their target sites. We identify biologically relevant TF binding sites and use the known specificities of TFs to score these sites before and after variants.
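To make "scoring a site before and after a variant" concrete, the sketch below uses a log-odds score against a position frequency matrix. Everything in it is an assumption for illustration: the 4-position matrix `PFM` is invented, a uniform background of 0.25 is assumed, and mutfunc's actual scoring method may differ.

```python
import math

# Hypothetical position frequency matrix for a 4-base site (rows sum to 1).
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds_score(site: str) -> float:
    """Sum of log2(frequency / background) over the positions of the site."""
    if len(site) != len(PFM):
        raise ValueError("site length must match the matrix length")
    return sum(
        math.log2(column[base] / BACKGROUND)
        for column, base in zip(PFM, site.upper())
    )


def score_change(ref_site: str, offset: int, alt_base: str) -> float:
    """Score after the variant minus score before; negative means weaker."""
    alt_site = ref_site[:offset] + alt_base + ref_site[offset + 1:]
    return log_odds_score(alt_site) - log_odds_score(ref_site)


# Replacing the preferred A at the first position lowers the score.
print(score_change("ACGT", 0, "C") < 0)  # True
```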
The top of the dialog shows the sequence of the TFBS along with the sequence logo visualizing the specificity of the TF.
The fields displayed in the dialog are as follows:
If queried variants disrupt start codons, disrupt stop codons, or introduce stop codons, they are expected to impact the translation of the protein.
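The three cases can be sketched as a comparison of the reference and mutated codons. The example assumes the standard genetic code with ATG as the only start codon, so alternative start codons are not handled; the function name `codon_consequence` and the returned labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
START_CODON = "ATG"                  # assumption: canonical start only


def codon_consequence(ref_codon, alt_codon, is_first_codon=False):
    """Classify a codon change as start_lost, stop_lost, stop_gained or None."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if is_first_codon and ref == START_CODON and alt != START_CODON:
        return "start_lost"
    if ref in STOP_CODONS and alt not in STOP_CODONS:
        return "stop_lost"
    if ref not in STOP_CODONS and alt in STOP_CODONS:
        return "stop_gained"
    return None


print(codon_consequence("ATG", "ACG", is_first_codon=True))  # start_lost
print(codon_consequence("TAA", "CAA"))                       # stop_lost
print(codon_consequence("TGG", "TGA"))                       # stop_gained
```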
We have set up a public bug tracker on GitHub. Alternatively, you can get in touch with us. We are also keen to receive new feature requests.
All jobs are stored for 48 hours. After that period, you'll likely get a "job not found" error and will have to re-upload your variants or VCF file.
This error shows up when a submitted variant is not found in the database. Please note that for variants modelled by the protein stability and conservation categories, we are currently storing only the information of variants with a deleterious impact. This is due to limitations in the database underlying mutfunc.