Bioinformatics Midterm exam spring 2026
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
#
# BiocManager::install("PCAtools")
Load libraries
You will need the “tidyverse”, “broom” and “ggbeeswarm” packages (hint: library()). If any of them is missing, install it with install.packages().
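This is a minimal sketch of this step. It assumes a CRAN mirror is configured, in case anything needs installing. It installs only the packages that are missing and then attaches all three.

```r
# Packages required for this exam
pkgs <- c("tidyverse", "broom", "ggbeeswarm")

# requireNamespace() quietly returns FALSE for packages that are not installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!is_installed)) install.packages(pkgs[!is_installed])

library(tidyverse)   # attaches readr, dplyr, ggplot2, tibble, ...
library(broom)       # tidy() / glance() turn test and model results into tibbles
library(ggbeeswarm)  # provides geom_quasirandom()
```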
Load datasets
1. Load the mice_weights_and_diet.tsv and tissue_gene_expression.tsv data sets (hint: read_tsv()).
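read_tsv() (from readr, part of the tidyverse) reads a tab-separated file into a tibble. The exam files are not bundled here. As a stand-in, this sketch writes the built-in mtcars data to a temporary TSV and reads it back. With the real files you would call read_tsv() on the two .tsv names above; the object names are your choice.

```r
library(readr)
library(tibble)

# Stand-in file: built-in mtcars written to a temporary TSV (row names kept as a column)
tmp_tsv <- tempfile(fileext = ".tsv")
write_tsv(rownames_to_column(mtcars, "car"), tmp_tsv)

# The call is the same for the exam files, e.g.
# mice_weights <- read_tsv("mice_weights_and_diet.tsv")
cars_tbl <- read_tsv(tmp_tsv, show_col_types = FALSE)
cars_tbl
```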
Examine Data
2. Inspect the mice_weights object with head() and tail().
3. What are the dimensions of mice_weights?
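The same inspection functions applied to the built-in mtcars data frame, which stands in for mice_weights here:

```r
# Built-in mtcars as a stand-in for mice_weights
head(mtcars)      # first 6 rows
tail(mtcars, 3)   # last 3 rows (the default is 6)
dim(mtcars)       # c(number of rows, number of columns)
nrow(mtcars); ncol(mtcars)
```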
Subsetting Data using base R
4. Select the 2nd column of mice_weights using only square brackets.
5. Select the 42nd row of mice_weights using square brackets.
6. Select rows 10 to 25 and columns 1 and 5 of mice_weights using square brackets.
7. Select the sex column of mice_weights using the dollar sign ($).
8. Find the minimum and maximum of mice_weights.
9. Use arrange() to reorder mice_weights by the “percent_fat” column so that the largest entry comes first.
10. Find the mean and standard deviation of litter size for females in mice_weights (hint: use filter(), group_by() and summarize()).
11. Find the average weight and its standard deviation for males and females on each diet in the mice_weights dataset (hint: use group_by()).
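The sketch below runs the same kinds of operations on the built-in mtcars data frame (32 rows, 11 numeric columns) as a stand-in for mice_weights. The columns used (mpg, am, wt, cyl) are illustrative only.

One detail matters for the exam. read_tsv() returns a tibble, and on a tibble `x[, 2]` stays a one-column tibble, whereas on a plain data.frame it drops to a vector. Use `x[[2]]` or `x$name` when you need the vector itself.

```r
library(dplyr)

# --- Base R subsetting with [row, column] and $ ---
col2    <- mtcars[, 2]             # 2nd column (a vector, because mtcars is a data.frame)
row5    <- mtcars[5, ]             # one row (mtcars has only 32 rows, so no 42nd)
block   <- mtcars[10:25, c(1, 5)]  # rows 10-25, columns 1 and 5
mpg_vec <- mtcars$mpg              # one column by name
c(min = min(mpg_vec), max = max(mpg_vec))  # range(mpg_vec) gives the same two numbers

# --- dplyr verbs ---
# Sort so that the largest value comes first
sorted <- mtcars %>% arrange(desc(mpg))

# Mean and SD of one column within one subset of rows
manual <- mtcars %>%
  filter(am == 1) %>%              # manual-transmission cars only
  summarize(mean_wt = mean(wt), sd_wt = sd(wt))

# Mean and SD for every combination of two grouping columns
by_group <- mtcars %>%
  group_by(cyl, am) %>%
  summarize(mean_wt = mean(wt), sd_wt = sd(wt), n = n(), .groups = "drop")
by_group
```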
Graphics
12. Load tissue_gene_expression.tsv if you did not do it in step 1, and look at it with head().
Scatter plot
13. From the tissue gene expression dataset, make a scatter plot of two randomly selected genes against each other to show their correlation.
Scatter plot with linear fit
14. Add a linear trend line to your plot and change the plot theme with theme_bw(), theme_classic(), theme_minimal() or any other theme_...() function.
15. Modify the previous plot by splitting the data by tissue of origin when plotting this correlation, and switch off the standard error band with the se = FALSE option.
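This ggplot2 sketch covers the scatter-plot steps above using the built-in iris data. Two randomly picked numeric columns stand in for two genes, and Species stands in for tissue.

For the exam table, the candidate gene columns would be something like `setdiff(names(expr), c("sample", "tissue"))`. Both `expr` and those column names are assumptions. The `.data[[...]]` pronoun lets aes() take column names stored as strings. `formula = y ~ x` only silences the message geom_smooth() otherwise prints.

```r
library(ggplot2)

# Pick two numeric columns at random (stand-ins for two genes)
set.seed(42)                      # makes the random pick reproducible
numeric_cols <- names(iris)[1:4]
picked <- sample(numeric_cols, 2)

# Scatter plot with a linear trend line (grey band = standard error)
p_fit <- ggplot(iris, aes(x = .data[[picked[1]]], y = .data[[picked[2]]])) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_bw()

# Same pair split by group: one colour and one fitted line per group, no SE band
p_split <- ggplot(iris, aes(x = .data[[picked[1]]], y = .data[[picked[2]]],
                            colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_classic()

p_fit
p_split
```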
Box plot
16. Pick one random gene and make a box plot of it with tissue type on the x-axis and expression level on the y-axis, coloring (fill) by tissue.
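A box-plot sketch on iris, where Petal.Length stands in for the expression of one gene and Species for tissue type. Mapping `fill` to the grouping column colors each box.

```r
library(ggplot2)

# One numeric column (stand-in for a gene) per group (stand-in for tissue)
p_box <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot()
p_box
```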
Density plot
17. Change the previous plot of your randomly selected gene to a density plot colored by tissue type (set alpha = 0.2 in the density plot).
18. To make the previous plot more informative, facet it by tissue; as the theme, try theme_minimal().
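The density and faceting steps on the same iris stand-in. `alpha` makes overlapping curves see-through, and `facet_wrap()` gives each group its own panel.

```r
library(ggplot2)

p_dens <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.2) +   # semi-transparent fill
  facet_wrap(~ Species) +        # one panel per group (tissue in the exam)
  theme_minimal()
p_dens
```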
Statistical test
T-test
19. Go back to the mice_weights dataset and make a box plot of body_weight by diet and sex, using color and faceting (e.g. x-axis: sex, y-axis: body_weight, facet by diet). (Extra bonus: to make it prettier, add the points to your box plot with geom_quasirandom(alpha = 0.5).)
20. In the mice_weights dataset: Q: Is there an effect of diet on weight? (hint: use t.test()) Please answer the question in writing! One sentence is enough.
21. In the mice_weights dataset: Q: Is there an effect of sex on weight? (hint: use t.test()) Please answer the question in writing! One sentence is enough.
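This sketch uses the built-in ToothGrowth data in place of mice_weights. In ToothGrowth, len is tooth length, supp is the supplement type (OJ or VC), and dose is in mg/day. Here supp plays the role of a two-level factor such as sex or diet.

t.test() with a formula assumes the grouping variable has exactly two levels, and it runs Welch's test by default. broom::tidy() returns the estimate, statistic and p.value as a one-row tibble.

```r
library(ggplot2)
library(ggbeeswarm)  # geom_quasirandom()
library(broom)

# Box plot by one factor, faceted by another, with the raw points overlaid
p_tg <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot(outlier.shape = NA) +  # points are drawn next, so hide duplicate outliers
  geom_quasirandom(alpha = 0.5) +
  facet_wrap(~ dose)
p_tg

# Two-sample (Welch) t-test: does the mean of len differ between the two supp levels?
tt <- t.test(len ~ supp, data = ToothGrowth)
tidy(tt)  # compare p.value with your chosen significance level, e.g. 0.05
```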
ANOVA
22. Re-plot the box plot from step 16. Now use an ANOVA test (aov()) to test whether the expression of your gene differs across tissue types. Q: Is there a difference in your gene's expression across tissue types, i.e. is at least one tissue different in expression from the others? Please answer the question in writing! One sentence is enough.
23. Do a post-hoc test for step 22. If your ANOVA test is significant, at least one tissue has different expression. Which one (or two or more) is/are different? Q1: Which two tissues are the most different in your gene expression analysis? Q2: Is there any pair where the difference in expression is not significant (expression is the same or comparable)? Please answer the questions in writing! One sentence is enough.
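A sketch of ANOVA followed by a Tukey HSD post-hoc test. It uses base R's stats functions on the built-in PlantGrowth data, where plant weight is measured in three groups that stand in for tissues.

In the Tukey table, a pair with a small adjusted p-value differs significantly. A pair whose confidence interval (lwr to upr) contains 0 does not.

```r
library(broom)

# One-way ANOVA: dried plant weight in three groups (ctrl, trt1, trt2)
fit_aov <- aov(weight ~ group, data = PlantGrowth)
summary(fit_aov)   # F test, H0: all group means are equal

# Post-hoc Tukey HSD: every pairwise difference with a family-wise adjusted p-value
tukey <- TukeyHSD(fit_aov)
tukey$group        # columns: diff, lwr, upr, p adj
tidy(tukey)        # the same table as a tibble
```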
Linear modeling
Test the linear dependence of the two genes from the tissue gene expression dataset that you picked in steps 13-15.
24. Redo the plot you made in step 14: re-plot gene1 against gene2.
25. Use a linear model to test whether these two genes are dependent (correlated) or not. Q: Is the expression of your two genes linearly dependent (correlated)? (H0: no correlation) Please answer the question in writing! One sentence is enough.
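A sketch of testing linear dependence between two numeric variables, with two iris columns standing in for the two genes. In the slope row of tidy(), the p-value tests H0: slope = 0.

With a single predictor, that slope t statistic is identical to the one from a Pearson cor.test(). Also, R² equals the squared correlation. So the two approaches answer the same question.

```r
library(broom)

# Simple linear regression of one numeric variable on another
fit_lm <- lm(Petal.Length ~ Sepal.Length, data = iris)
tidy(fit_lm)    # the Sepal.Length row tests H0: slope = 0
glance(fit_lm)  # r.squared and overall model p.value

# Pearson correlation test on the same pair of variables
ct <- cor.test(iris$Sepal.Length, iris$Petal.Length)
ct
```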
Bonus
26. Bonus (if you do not feel you can do it, just skip it): in the mice_weights dataset, can you combine steps 20 and 21 into one test that checks for diet and sex simultaneously? (hint: use a linear model with covariates, lm(y ~ x1 + x2))
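With lm(), both predictors enter one model, so each effect is estimated while adjusting for the other. This is a sketch on ToothGrowth, with dose converted to a factor so that both predictors are categorical, as diet and sex would be.

anova() on an lm fit uses sequential sums of squares. ToothGrowth is balanced (10 animals per cell), so here the order of the predictors does not change the result.

```r
library(broom)

tg <- transform(ToothGrowth, dose = factor(dose))  # 0.5, 1, 2 as categories

# Additive model: len depends on supplement and dose simultaneously
fit2 <- lm(len ~ supp + dose, data = tg)
tidy(fit2)    # one row per coefficient: intercept, suppVC, dose1, dose2
anova(fit2)   # one F test per predictor
```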
Principal Component Analysis (PCA)
Run PCA on the tissue expression dataset, preferably using the PCAtools package.
If you do not have PCAtools installed, install it by uncommenting and running the BiocManager code at the top of this document. Then load it:
library("PCAtools")
Loading required package: ggplot2
Loading required package: ggrepel
Attaching package: 'PCAtools'
The following objects are masked from 'package:stats':
biplot, screeplot
To help you reformat the data, here is a reformatting step (it puts genes in rows and samples in columns, which is the layout pca() expects):
# data_4_pca <- your_gene_expression %>%
#   select(-tissue) %>%
#   column_to_rownames("sample") %>%
#   t() %>%
#   as_tibble(rownames = "geneID") %>%
#   column_to_rownames("geneID")
#
# metadata_4_pca <- your_gene_expression %>%
#   select(sample, tissue) %>%
#   column_to_rownames("sample")
27. Run PCA (the easiest way is to use pca() from the PCAtools package).
28. Plot a scree plot that shows the importance of the components (hint: use the screeplot() function). Please do not plot all components; 1:10 is enough! (hint: use the option components = 1:10)
29. Make a scatter plot of the first few PCs as a biplot (hint: use biplot() with the option colby = 'tissue').
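This is a sketch of the PCA steps with PCAtools on the built-in iris measurements, assuming PCAtools is installed. pca() expects variables in rows and samples in columns, with metadata row names matching the column names of the matrix. That is why the reformatting step above transposes the expression table.

iris has only four variables, so this example plots four components rather than ten. The sample names s1, s2, ... are made up for the example. `lab = NULL` turns off the per-point text labels in biplot().

```r
library(PCAtools)

# Variables (the 4 iris measurements; genes in the exam) in rows, samples in columns
mat <- t(as.matrix(iris[, 1:4]))
colnames(mat) <- paste0("s", seq_len(ncol(mat)))   # hypothetical sample IDs

# Sample metadata: row names must match colnames(mat)
meta <- data.frame(Species = iris$Species, row.names = colnames(mat))

p <- pca(mat, metadata = meta)   # centred, unscaled PCA by default
p$variance                       # % of variance explained per component

# Scree plot of the components, then a PC1 vs PC2 biplot coloured by group
screeplot(p, components = getComponents(p, 1:4))
biplot(p, colby = "Species", lab = NULL)
```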