Introducing R and RStudio IDE


  • R is a powerful, popular open-source scripting language.
  • You can customize the layout of RStudio, and use the project feature to manage the files and packages used in your analysis.
  • RStudio allows you to run R in an easy-to-use interface and makes it easy to find help.

R Basics


  • Effectively using R is a journey of months or years. Still, you don’t have to be an expert to use R, and you can start analyzing your data with about a day’s worth of training.
  • It is important to understand how R organizes data in a given object type, and how the mode of that object (e.g. numeric, character, logical) determines how R operates on the data.
  • Working with vectors effectively prepares you for understanding how data are organized in R.
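
The points about vectors and modes can be illustrated with a minimal sketch. The vector names and values below are hypothetical and chosen only for illustration.

```r
# A numeric vector: arithmetic is applied to every element
positions <- c(100, 2500, 40000)
mode(positions)
positions / 1000

# A character vector has the same structure but a different mode
bases <- c("A", "C", "T")
mode(bases)

# Mixing modes in one vector coerces every element to a common mode
mixed <- c(1, "two", TRUE)
mode(mixed)

# A logical vector can be used to subset another vector
is_large <- positions > 1000
positions[is_large]
```

Because a vector holds a single mode, the mixed vector is stored entirely as character data, which is why arithmetic on it would fail.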

Introduction to the example dataset and file type


  • The dataset comes from a real-world experiment in E. coli.
  • Publicly available FASTQ files can be downloaded from NCBI SRA.
  • Several steps are taken outside of R/RStudio to create VCF files from FASTQ files.
  • VCF files store variant calls in a special format.
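
To make the VCF format more concrete, the sketch below splits one tab-separated data line into the eight fixed VCF columns. The record itself is made up for illustration and is not taken from the lesson's dataset.

```r
# Hypothetical VCF data line (tab-separated); the values are invented
vcf_line <- "contig_1\t1521\t.\tC\tT\t207\t.\tDP=9"

# The eight fixed columns that start every VCF data line
vcf_fields <- c("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")

# Split the line on tabs and label each value with its column name
record <- strsplit(vcf_line, "\t", fixed = TRUE)[[1]]
names(record) <- vcf_fields

record[["REF"]]               # reference base at this position
record[["ALT"]]               # alternate base called in the sample
as.integer(record[["POS"]])   # position, converted from character to integer
```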

R Basics continued - factors and data frames


  • It is easy to import data into R from tabular formats, including Excel. However, you still need to check that R has imported and interpreted your data correctly.
  • There are best practices for organizing your data (keeping it tidy), and R provides good tools for following them.
  • Base R has many useful functions for manipulating your data, but all of R’s capabilities are greatly enhanced by software packages developed by the community.
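
A minimal sketch of checking imported data and working with a factor follows. The data frame is built by hand with hypothetical values; in practice it would usually come from a function such as read.csv().

```r
# Hypothetical variant table standing in for imported data
variants <- data.frame(
  sample_id = c("s1", "s1", "s2", "s3"),
  POS       = c(1521, 1612, 9092, 9972),
  REF       = c("C", "A", "A", "T"),
  stringsAsFactors = FALSE
)

# Check that each column was interpreted with the expected type
str(variants)
summary(variants$POS)

# Convert a character column to a factor to treat it as categorical
variants$REF <- factor(variants$REF)
levels(variants$REF)   # levels are sorted alphabetically by default
table(variants$REF)    # number of rows per level
```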

Using packages from Bioconductor


  • Bioconductor is an alternative package repository for bioinformatics packages.
  • Bioconductor packages are installed with BiocManager::install() rather than the install.packages() function used for CRAN.
  • Check Bioconductor to see if there is a package relevant to your analysis before writing code yourself.
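
The installation pattern can be sketched as a small helper. The function name install_from_bioconductor is hypothetical, and the package in the usage comment is only an example of a Bioconductor package. Running the helper requires network access.

```r
# Hypothetical helper wrapping the usual Bioconductor installation steps
install_from_bioconductor <- function(pkg) {
  # BiocManager itself is distributed on CRAN
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # BiocManager::install() resolves the package against Bioconductor
  BiocManager::install(pkg)
}

# Example usage (commented out because it downloads and installs software):
# install_from_bioconductor("VariantAnnotation")
```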

Data Wrangling and Analyses with Tidyverse


  • Use the dplyr package to manipulate data frames.
  • Use glimpse() to quickly look at your data frame.
  • Use select() to choose variables from a data frame.
  • Use filter() to choose data based on values.
  • Use mutate() to create new variables.
  • Use group_by() and summarize() to work with subsets of data.
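
The verbs listed above can be combined as in the following sketch. It assumes the dplyr package is installed, and the data frame and its values are hypothetical.

```r
library(dplyr)

# Hypothetical variant calls, invented for illustration
variants <- data.frame(
  sample_id = c("s1", "s1", "s2", "s2", "s3"),
  POS       = c(1521, 1612, 9092, 9972, 10563),
  QUAL      = c(210, 225, 4, 225, 64),
  DP        = c(10, 13, 2, 11, 6)
)

# Quick look at column names, types, and the first values
glimpse(variants)

# Choose columns, keep rows by value, then add a derived column
high_quality <- variants %>%
  select(sample_id, POS, QUAL, DP) %>%
  filter(QUAL >= 100) %>%
  mutate(qual_per_read = QUAL / DP)

# Summarize each sample separately
per_sample <- variants %>%
  group_by(sample_id) %>%
  summarize(n_variants = n(), mean_qual = mean(QUAL))

high_quality
per_sample
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be chained with the pipe.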

Data Visualization with ggplot2


  • ggplot2 is a powerful tool for high-quality plots.
  • ggplot2 provides a flexible and readable grammar to build plots.
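
As a rough illustration of the grammar, a plot is built by adding layers to a base ggplot() call. The sketch assumes ggplot2 is installed and uses hypothetical data.

```r
library(ggplot2)

# Hypothetical variant calls, invented for illustration
variants <- data.frame(
  sample_id = c("s1", "s1", "s2", "s2", "s3"),
  POS       = c(1521, 1612, 9092, 9972, 10563),
  DP        = c(10, 13, 2, 11, 6)
)

# Data and aesthetic mappings first, then geometry, labels, and theme
p <- ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) +
  geom_point(alpha = 0.7) +
  labs(x = "Base pair position", y = "Read depth (DP)") +
  theme_bw()

print(p)
```

Each component added with + can be changed independently, for example swapping geom_point() for another geometry without touching the mappings.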

Getting help with R


  • R provides thousands of functions for analyzing data, and provides several ways to get help.
  • Using R will mean searching for online help, and there are tips and resources on how to search effectively
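
A few of the built-in ways to get help are sketched below. The functions chosen as examples are arbitrary.

```r
# Open the documentation page for a function (same as typing ?mean)
help("mean")

# List objects on the search path whose names contain a pattern
matches <- apropos("mean")
matches

# Show the arguments a function accepts
round_args <- args(round)
round_args

# Report the R version and loaded packages; useful to include
# when asking for help online
sessionInfo()
```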