Last updated: 2022-10-08


Knit directory: Bio322/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.



These are the previous versions of the repository in which changes were made to the R Markdown (analysis/2022_genome.function1.Rmd) and HTML (docs/2022_genome.function1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd e972b6c mariesaitou 2022-10-08 wflow_publish("analysis/2022_genome.function1.Rmd")
html 7c92f8f mariesaitou 2022-10-08 Build site.
Rmd b929869 mariesaitou 2022-10-08 wflow_publish("analysis/2022_genome.function1.Rmd")
html ab351d7 mariesaitou 2022-10-04 Build site.
Rmd c9b08bd mariesaitou 2022-10-04 wflow_publish("analysis/2022_genome.function1.Rmd")

Genome variation and function 1

Goal: Today, we learn how to analyze genetic variants among individuals of the same species.

  1. How to read a “variant call format” (VCF) file

  2. How to investigate the phenotypic effect of variants

1 Variant call format

Interpret a VCF file

ref: Variant calling format: VCF file

ref: How to read a VCF file

Lines starting with “#” are header lines that describe the fields.

CHROM: chromosome of the variant location

POS: position of the variant

ID: ID of each variant to make it easy to manage

REF: reference allele (the same type as the reference genome)

ALT: alternative allele (the mutant type which is different from the reference genome)

QUAL, FILTER, INFO, FORMAT: variant-calling quality and other details of the variant

Sample columns: the genotype of each sample. “0” means the reference allele and “1” means the alternative allele.

Example

  1. The first variant is on chromosome 2, at position 81170.

  2. The first variant’s reference allele is C and its alternative allele is T.

  3. For the first variant, Sample1’s genotype is C/C, Sample2’s is C/C and Sample3’s is C/T.
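
The same reading can be done in code. The following is a minimal Python sketch that decodes one VCF data line into allele letters per sample. The line is constructed by hand to match the example above: the ID, QUAL, FILTER and INFO values are placeholders (“.”), the genotype codes are assumed to be 0/0, 0/0 and 0/1, and the function name decode_genotypes is only for illustration.

```python
# Minimal sketch: decode one VCF data line into allele letters per sample.
# The ID, QUAL, FILTER and INFO values are placeholders ("."), not from a real file.

def decode_genotypes(vcf_line, sample_names):
    """Return (chrom, pos, ref, alt, {sample: genotype written with allele letters})."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    alleles = [ref] + alt.split(",")          # index 0 = REF, 1 and up = ALT alleles
    gt_index = fields[8].split(":").index("GT")  # position of GT inside the FORMAT column
    genotypes = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        gt = sample_field.split(":")[gt_index]
        calls = gt.replace("|", "/").split("/")  # accept phased (|) and unphased (/)
        genotypes[name] = "/".join(
            alleles[int(call)] if call != "." else "." for call in calls
        )
    return chrom, int(pos), ref, alt, genotypes


# Hand-made line that mirrors the example: chromosome 2, position 81170, C > T.
line = "2\t81170\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/0\t0/1"
samples = ["Sample1", "Sample2", "Sample3"]

chrom, pos, ref, alt, genotypes = decode_genotypes(line, samples)
print(chrom, pos, ref, alt)
print(genotypes)
```

The key step is the lookup alleles[int(call)]: code 0 is replaced by the REF allele and code 1 by the first ALT allele.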

2 The effect of genetic variants

You found a genetic variant in your sequenced individual, and you want to investigate its potential effect.

Assume that you found the following deletion polymorphism in a rabbit genome.

12: 107,236,296-107,236,969

Go to the Variant Effect Predictor

Input the following information:

Species - rabbit (Oryctolagus_cuniculus)

Variant - 12 107236296 107236969 DEL + deletion1

The variant format, from left to right: chromosome, start position, end position, variant type, strand, variant ID (you can name it as you like)
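
As a small illustration of this layout, the sketch below assembles the input line from its six parts and computes the length of the deleted region. The function name vep_region_line is hypothetical, and the length calculation assumes that both the start and the end position are included in the deletion.

```python
# Minimal sketch: build the whitespace-separated input line described above.
# vep_region_line is an illustrative helper, not part of any Ensembl tool.

def vep_region_line(chrom, start, end, variant_type, strand, variant_id):
    """Join the six fields in the order: chromosome, start, end, type, strand, ID."""
    return f"{chrom} {start} {end} {variant_type} {strand} {variant_id}"


start, end = 107236296, 107236969
line = vep_region_line("12", start, end, "DEL", "+", "deletion1")

# Assumption: coordinates are inclusive, so the length is end - start + 1.
deleted_bases = end - start + 1

print(line)
print(deleted_bases)
```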


You can also find various acceptable input formats here

The result shows that the deletion affects a gene, SLC35F1, with the predicted consequences coding sequence variant, splice variant, feature truncation and intron variant. You can customize the result table by clicking “show/hide columns”.

Now you have found that the variant overlaps a gene, and you want to know the function of that gene.

Click “ENSOCUG00000015307” to learn more about this gene.

Gene function is often not well described in non-model species. However, the function of the homologous gene may have been investigated in other species, such as mouse.

In the gene page, you can investigate its evolutionary history and potential function.

Evolutionary insights

Let’s explore “Gene gain/loss tree”, “Ontologies” and “Phenotypes”.

Gene gain/loss tree

Most mammalian species have two copies of this gene.

See what is happening in fish!

Gene Ontology: The gene is related to “transmembrane transport”.

Phenotypes: There is no phenotype information for this gene in rabbits, but there are some reports in mouse and rat, such as phenotypes related to leukocyte cell number.

3 Hands-on part

Now you have a VCF file of dogs (Canis_lupus_familiaris): click here.

Please explore the genetic variation in the file with the tools above.
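
If you prefer to check a VCF file programmatically, the following Python sketch counts the variant records and the sample columns. It runs on a small made-up VCF (two variants, two samples named dogA and dogB), not on the course file, and the function name summarize_vcf is only for illustration.

```python
# Minimal sketch: count variant records and sample columns in VCF text.
# The toy data below is invented for illustration; it is not the course file.

def summarize_vcf(lines):
    """Return (number of variant records, list of sample names)."""
    sample_names = []
    n_variants = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue                              # skip blank and meta-information lines
        if line.startswith("#CHROM"):
            sample_names = line.split("\t")[9:]   # samples follow the 9 fixed columns
            continue
        n_variants += 1                           # every remaining line is one variant
    return n_variants, sample_names


toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdogA\tdogB",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0|0\t0|1",
    "1\t200\t.\tT\tC\t.\t.\t.\tGT\t1|1\t0|0",
]

n_variants, sample_names = summarize_vcf(toy_vcf)
print(n_variants, len(sample_names))
```

Because the function only iterates over lines, an open file handle for an uncompressed VCF can be passed in place of the list.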

Questions 1

Q1-1. How many variants are there in the VCF file?

Q1-2. How many individuals are there in the VCF file?

Q1-3. What is the genotype of sample 1 for variant 1?

Q1-4. What are the reference and alternative alleles of variant 2?

Q1-5. Please describe the genotype frequency of variant 3.

Q1-6. Please describe the allele frequency of variant 15.

Result 1

A1-1. 15 variants

A1-2. 18 samples

A1-3. GG

A1-4. Reference: T. Alternative: TAGACACTGTCTACCGAAATGTAGACACTGTCTACCGAAATG

A1-5.

There are sixteen 0|0, one 0|1 and one 1|1 individuals.

The frequency of each genotype is:

G/G: 0.89 (16 out of 18)

G/GGG: 0.056 (1 out of 18)

GGG/GGG: 0.056 (1 out of 18)

A1-6.

There are fifteen 0|0, two 0|1 and one 1|1 individuals.

There are 2 * 18 = 36 alleles (each individual has two alleles)

Each allele frequency is:

A allele: 0.89 (15 * 2 + 2 * 1 = 32, 32 out of 36)

G allele: 0.11 (2 * 1 + 1 * 2 = 4, 4 out of 36)
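
The arithmetic in A1-6 can be reproduced with a short Python sketch. It takes a list of genotype codes (here fifteen 0|0, two 0|1 and one 1|1, as counted above) and returns genotype and allele frequencies. The function names are illustrative, and diploid genotypes without missing calls are assumed.

```python
# Minimal sketch: genotype and allele frequencies from a list of genotype codes.
# Assumes diploid genotypes with no missing calls.
from collections import Counter


def split_genotype(gt):
    """Split '0|1' or '0/1' into a sorted list of allele codes."""
    return sorted(gt.replace("/", "|").split("|"))


def genotype_frequencies(genotypes):
    """Fraction of individuals carrying each genotype (0|1 and 1|0 are pooled)."""
    counts = Counter("|".join(split_genotype(gt)) for gt in genotypes)
    return {gt: n / len(genotypes) for gt, n in counts.items()}


def allele_frequencies(genotypes):
    """Fraction of all allele copies that carry each allele code."""
    counts = Counter()
    for gt in genotypes:
        counts.update(split_genotype(gt))   # each individual contributes two alleles
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


genotypes = ["0|0"] * 15 + ["0|1"] * 2 + ["1|1"] * 1

geno_freq = genotype_frequencies(genotypes)
allele_freq = allele_frequencies(genotypes)
print({gt: round(f, 3) for gt, f in geno_freq.items()})
print({allele: round(f, 2) for allele, f in allele_freq.items()})
```

Allele code 0 corresponds to the reference allele (A) and code 1 to the alternative allele (G) for this variant.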

Questions 2

Q2-1 How many genes overlap with the submitted variants?

Q2-2 How many variants overlap with intronic regions?

Q2-3 How many variants are known, and how many are not reported?

Q2-4 What kind of exonic variants are there?

Q2-5 What amino acid change is predicted for the “missense variant” among them? Which exon is it in?

Result 2

A2-1. Three (LCT, SPOPL, UBXN4)

A2-2. Ten variants

A2-3. Nine variants are already reported (existing), and six variants are not reported (novel).

A2-4. inframe_insertion, synonymous_variant, frameshift_variant, missense_variant

A2-5. Q -> R (Glutamine -> Arginine), the 8th of the 17 exons.

Questions 3

Q3-1 What is the “ontology” of the LCT gene?

Q3-2 What are the phenotypes of LCT-deficient rodents?

Q3-3 Do mammalian and non-mammalian species have the same number of LCT genes?

Result 3

A3-1

Molecular function: protein homodimerization activity

Cellular component: plasma membrane

Biological process: metabolic process

In summary, LCT is located in the plasma membrane and contributes to metabolic processes through its protein homodimerization activity.

A3-2

Lactose intolerance, diarrhea, colitis, etc.

A3-3

Most mammals have one LCT gene. Fish species have variable numbers of LCT genes, such as eight in the bicolor damselfish.