Tutorial on regions versus signals

In this tutorial, we demonstrate how to use RGT-Viz to visualize signals in different regions.

Download the data

We will use the epigenetic data from a dendritic cell (DC) development study [1] as an example. It contains ChIP-Seq data for the transcription factors PU.1 and IRF8 and for the histone modifications H3K4me1, H3K4me3, H3K9me3, H3K27me3 and H3K27ac in four cellular states: multipotent progenitors (MPP), common dendritic cell progenitors (CDP), classical dendritic cells (cDC) and plasmacytoid dendritic cells (pDC). The histone marks have the following functional annotations:

  • H3K4me1 is enriched at active and primed enhancers;

  • H3K4me3 is highly enriched at active promoters near transcription start sites (TSS);

  • H3K9me3 is a marker of heterochromatin and has a pivotal role during lineage commitment;

  • H3K27me3 is associated with the downregulation of nearby genes via the formation of heterochromatic regions;

  • H3K27ac is associated with increased transcriptional activation and is considered a marker of active enhancers.

The PU.1 and IRF8 peaks were further split into three groups: peaks shared by PU.1 and IRF8, PU.1 peaks without IRF8, and IRF8 peaks without PU.1. The corresponding files for pDC and cDC are:

  • PU1_IRF8_pDC_overlap_peaks.bed

  • PU1_pDC_noIRF8_peaks.bed

  • IRF8_pDC_noPU1_peaks.bed

  • PU1_IRF8_cDC_overlap_peaks.bed

  • PU1_cDC_noIRF8_peaks.bed

  • IRF8_cDC_noPU1_peaks.bed

Next, please download the folder “rgt_viz_example” from here.

unzip rgt_viz_example
cd rgt_viz_example

You should now have the following files:

data/
├── bw
│   ├── H3K27ac_cDC.bw
│   ├── H3K27ac_CDP.bw
│   ├── H3K27ac_MPP.bw
│   ├── H3K27ac_pDC.bw
│   ├── H3K27me3_cDC.bw
│   ├── H3K27me3_CDP.bw
│   ├── H3K27me3_MPP.bw
│   ├── H3K27me3_pDC.bw
│   ├── H3K4me1_cDC.bw
│   ├── H3K4me1_CDP.bw
│   ├── H3K4me1_MPP.bw
│   ├── H3K4me1_pDC.bw
│   ├── H3K4me3_cDC.bw
│   ├── H3K4me3_CDP.bw
│   ├── H3K4me3_MPP.bw
│   ├── H3K4me3_pDC.bw
│   ├── H3K9me3_cDC.bw
│   ├── H3K9me3_CDP.bw
│   ├── H3K9me3_MPP.bw
│   ├── H3K9me3_pDC.bw
│   ├── IRF8_cDC.bw
│   ├── IRF8_pDC.bw
│   ├── PU1_cDC.bw
│   ├── PU1_CDP.bw
│   ├── PU1_MPP.bw
│   └── PU1_pDC.bw
└── peaks
    ├── H3K4me3_cDC_WT_peaks.bed
    ├── H3K4me3_CDP_WT_peaks.bed
    ├── H3K4me3_MPP_WT_peaks.bed
    ├── H3K4me3_pDC_WT_peaks.bed
    ├── PU1_IRF8_cDC_overlap_peaks.bed
    ├── IRF8_cDC_noPU1_peaks.bed
    ├── PU1_cDC_noIRF8_peaks.bed
    ├── PU1_IRF8_pDC_overlap_peaks.bed
    ├── IRF8_pDC_noPU1_peaks.bed
    └── PU1_pDC_noIRF8_peaks.bed

These directories contain the genomic signals of the histone modifications and transcription factors (.bw files generated by bamCoverage) and the genomic regions of the H3K4me3, PU.1 and IRF8 peaks (.bed files derived from MACS2 .narrowPeak output) in the different DC states.
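Before plotting, it can help to confirm that the download is complete. The POSIX shell sketch below checks for every file in the listing above and reports the ones that are missing. The function names (check_rgt_viz_data, _check_one) are our own, and the expected file names are simply copied from the tree shown here.

```shell
# Sketch: report files from the data/ listing above that are missing.
# Assumption: called from inside "rgt_viz_example"; names copied from the tree.

_check_one() {
  # Count one expected file (path relative to $rv_root) and report it if absent.
  rv_total=$((rv_total + 1))
  if [ ! -f "$rv_root/$1" ]; then
    echo "missing: $rv_root/$1"
    rv_missing=$((rv_missing + 1))
  fi
}

check_rgt_viz_data() {
  rv_root=${1:-data}
  rv_total=0
  rv_missing=0
  # Five histone marks in all four cell states.
  for rv_mark in H3K27ac H3K27me3 H3K4me1 H3K4me3 H3K9me3; do
    for rv_cell in MPP CDP cDC pDC; do
      _check_one "bw/${rv_mark}_${rv_cell}.bw"
    done
  done
  # Transcription factor signals.
  for rv_f in IRF8_cDC IRF8_pDC PU1_MPP PU1_CDP PU1_cDC PU1_pDC; do
    _check_one "bw/${rv_f}.bw"
  done
  # H3K4me3 peaks in all four cell states.
  for rv_cell in MPP CDP cDC pDC; do
    _check_one "peaks/H3K4me3_${rv_cell}_WT_peaks.bed"
  done
  # The three PU.1/IRF8 peak groups in cDC and pDC.
  for rv_cell in cDC pDC; do
    _check_one "peaks/PU1_IRF8_${rv_cell}_overlap_peaks.bed"
    _check_one "peaks/PU1_${rv_cell}_noIRF8_peaks.bed"
    _check_one "peaks/IRF8_${rv_cell}_noPU1_peaks.bed"
  done
  echo "$rv_total files expected, $rv_missing missing"
  # Exit status 0 only if nothing is missing.
  [ "$rv_missing" -eq 0 ]
}
```

Run it as check_rgt_viz_data data from inside rgt_viz_example; it returns a non-zero status if any of the 36 listed files is absent.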

Creating Line Plots with RGT-Viz

Here we demonstrate how to use RGT-Viz for drawing line plots. These allow, for example, inspecting the ChIP-Seq signals around particular genomic regions, such as PU.1 peaks. Before you proceed, please install RGT-Viz.

Understand the experimental matrix

Before using RGT-Viz, you must define an experimental matrix. This tab-separated file provides the information RGT needs to interpret your data, i.e. file paths, the protein measured in each ChIP-Seq experiment, the file type, and so on.

For example, “Matrix_cDC.txt” lists the files needed to assess the association of the genomic signals with the PU.1 and IRF8 peaks:

name                 type     file                                          factor          cell
cDC_PU1_IRF8_peaks   regions  ./data/peaks/PU1_IRF8_cDC_overlap_peaks.bed   PU1_IRF8_peaks  cDC
cDC_PU1_peaks        regions  ./data/peaks/PU1_cDC_noIRF8_peaks.bed         PU1_peaks       cDC
cDC_IRF8_peaks       regions  ./data/peaks/IRF8_cDC_noPU1_peaks.bed         IRF8_peaks      cDC
cDC_PU1              reads    ./data/bw/PU1_cDC.bw                          PU.1            cDC
cDC_H3K4me1          reads    ./data/bw/H3K4me1_cDC.bw                      H3K4me1         cDC
cDC_H3K4me3          reads    ./data/bw/H3K4me3_cDC.bw                      H3K4me3         cDC
cDC_H3K9me3          reads    ./data/bw/H3K9me3_cDC.bw                      H3K9me3         cDC
cDC_H3K27me3         reads    ./data/bw/H3K27me3_cDC.bw                     H3K27me3        cDC
cDC_H3K27ac          reads    ./data/bw/H3K27ac_cDC.bw                      H3K27ac         cDC

The first column (name) is a unique name for labeling the data. The second column (type) indicates the type of data: either “regions” (genomic regions in BED format) or “reads” (genomic signals in bigWig or BAM format). The third column (file) is the path to the data file. You can include additional columns to annotate your data. In our example, the fourth column (factor) indicates the protein or histone modification measured by ChIP-Seq, and the fifth column (cell) indicates the cell type in which the experiment was performed. You can add as many columns as you like; each column name defines a tag that can be used to group, sort or color the data (for example, “cell” in the plots below).
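Because the matrix is a plain tab-separated file, it can also be written from the shell. The sketch below reproduces the Matrix_cDC.txt table above with printf and then runs a few sanity checks with awk. The helper names (matrix_row, write_matrix_cdc, check_matrix) are hypothetical, and the checks (same number of columns in every row, a type of “regions” or “reads”, unique names) are our own assumptions about a well-formed matrix rather than rules documented by RGT-Viz.

```shell
# Sketch: write the Matrix_cDC.txt table as tab-separated text and check it.
# Helper names and validation rules are illustrative assumptions.

matrix_row() {
  # Print the five columns name, type, file, factor, cell separated by tabs.
  printf '%s\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" "$5"
}

write_matrix_cdc() {
  out=${1:-Matrix_cDC.txt}
  {
    matrix_row name type file factor cell
    matrix_row cDC_PU1_IRF8_peaks regions ./data/peaks/PU1_IRF8_cDC_overlap_peaks.bed PU1_IRF8_peaks cDC
    matrix_row cDC_PU1_peaks      regions ./data/peaks/PU1_cDC_noIRF8_peaks.bed      PU1_peaks      cDC
    matrix_row cDC_IRF8_peaks     regions ./data/peaks/IRF8_cDC_noPU1_peaks.bed      IRF8_peaks     cDC
    matrix_row cDC_PU1            reads   ./data/bw/PU1_cDC.bw      PU.1     cDC
    matrix_row cDC_H3K4me1        reads   ./data/bw/H3K4me1_cDC.bw  H3K4me1  cDC
    matrix_row cDC_H3K4me3        reads   ./data/bw/H3K4me3_cDC.bw  H3K4me3  cDC
    matrix_row cDC_H3K9me3        reads   ./data/bw/H3K9me3_cDC.bw  H3K9me3  cDC
    matrix_row cDC_H3K27me3       reads   ./data/bw/H3K27me3_cDC.bw H3K27me3 cDC
    matrix_row cDC_H3K27ac        reads   ./data/bw/H3K27ac_cDC.bw  H3K27ac  cDC
  } > "$out"
}

check_matrix() {
  # Exit status 0 if every row has as many columns as the header, the type
  # is "regions" or "reads", and no name appears twice.
  awk -F'\t' '
    NR == 1 { ncol = NF; next }
    NF != ncol { printf "line %d: %d columns, expected %d\n", NR, NF, ncol; bad = 1 }
    $2 != "regions" && $2 != "reads" { printf "line %d: unknown type \"%s\"\n", NR, $2; bad = 1 }
    seen[$1]++ { printf "line %d: duplicate name \"%s\"\n", NR, $1; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

For example, write_matrix_cdc my_matrix.txt && check_matrix my_matrix.txt. Writing to a new file name avoids overwriting the Matrix_cDC.txt that ships with the example.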

Boxplot

After defining the experimental matrix, you can run RGT-Viz from the “rgt_viz_example” directory:

rgt-viz boxplot Matrix_cDC.txt -o results -t boxplot_cDC -g None -s reads -c regions

The boxplot shows the signal within each set of regions and reports their association with p-values. The command above uses the following parameters:

  • Matrix_cDC.txt is the experimental matrix which contains the design of the data;

  • -o indicates the output directory;

  • -t defines the title of this experiment;

  • -g defines how the analyses are grouped; “None” means no grouping;

  • -c defines how the boxes in the boxplot are colored;

  • -s defines how the data are sorted along the row axis.

This command writes the resulting boxplot to the “results” directory.

This plot gives some interesting insights into the data. For example, co-binding peaks (IRF8+PU.1) have higher IRF8 signals than IRF8 peaks without PU.1, and the same pattern is observed for PU.1 signals. This suggests a synergistic effect, i.e. more probable binding, when both factors are present in cDCs.

Lineplot

The next example generates a lineplot, which shows the average signal across the defined regions computed over a sliding window. Run it with the command below:

rgt-viz lineplot Matrix_cDC.txt -o results -t lineplot_cDC -col reads -c regions

  • Matrix_cDC.txt is the experimental matrix containing the design of the data;

  • -o indicates the output directory;

  • -t defines the title of this experiment;

  • -col defines how the data are grouped into columns; here we use “reads”, i.e. one column per signal (reads) entry in Matrix_cDC.txt;

  • -c defines how the lines are colored; here we use “regions”, so that each set of regions is shown in its own color.

This command generates a directory “results” containing figures and HTML pages. You can inspect the results by opening results/index.html.
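To make the idea of a sliding-window average concrete, here is a simplified sketch in shell and awk. It illustrates the general technique only and is not RGT-Viz's actual implementation: it assumes a hypothetical per-base signal file with chrom, position and value columns (positions without an entry count as zero) instead of a bigWig file, centers a window on each region midpoint plus an offset, and averages the window means across all regions for every offset.

```shell
# Sketch: sliding-window average profile around region midpoints.
# NOT RGT-Viz's implementation; input formats are illustrative assumptions.
#   $1 = signal file "chrom<TAB>position<TAB>value" (must be non-empty)
#   $2 = regions in BED format (chrom, start, end)
#   $3 = flank (bp), $4 = step between offsets (bp), $5 = window half-width (bp)
average_profile() {
  awk -F'\t' -v flank="$3" -v step="$4" -v half="$5" '
    BEGIN { flank += 0; step += 0; half += 0 }     # force numeric parameters
    FNR == NR { sig[$1 SUBSEP $2] = $3; next }     # first file: load the signal
    {
      mid = int(($2 + $3) / 2)                     # midpoint of this region
      n++
      for (d = -flank; d <= flank; d += step) {
        s = 0
        for (p = mid + d - half; p <= mid + d + half; p++) {
          key = $1 SUBSEP p
          if (key in sig) s += sig[key]
        }
        total[d] += s / (2 * half + 1)             # window mean for this region
      }
    }
    END {
      # Average over regions: one "offset<TAB>mean" line per offset.
      for (d = -flank; d <= flank; d += step)
        printf "%d\t%.3f\n", d, (n ? total[d] / n : 0)
    }
  ' "$1" "$2"
}
```

Each output line pairs an offset from the region midpoint with the mean signal across regions, which is the kind of curve a lineplot draws per region set. The nested loops are fine for toy inputs but far too slow for genome-wide data.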

This lineplot shows the genomic signals of the different histone modifications on the PU.1/IRF8 peaks in cDCs. It is analogous to the previous boxplot, but also shows how the reads are distributed around the midpoints of the peaks. One interesting observation is that IRF8-PU.1 co-binding is strongly associated with enhancers (higher H3K4me1), while IRF8 alone shows higher promoter marks (H3K4me3). This suggests that IRF8 binding without PU.1 is preferentially promoter-related, while IRF8+PU.1 co-binding is enhancer-related.

Add one more cell type

Lineplots can also compare more categories of data. In this example we include a second cell type, so that the matrix covers both cDC and pDC. This allows us to inspect whether the IRF8/PU.1 patterns are cell-type specific.

rgt-viz lineplot Matrix_cDC_pDC.txt -o results -t lineplot_cDC_pDC -col reads -c regions -row cell -scol

  • Matrix_cDC_pDC.txt is the experimental matrix containing the design of the data;

  • -c defines how the lines are colored; here we use “regions” as the tag, so that different regions are shown in different colors;

  • -row defines how the data are grouped into rows; here we use “cell”;

  • -scol shares the y-axis among the plots in the same column.
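The content of Matrix_cDC_pDC.txt is not shown in this tutorial. As a hypothetical reconstruction, a two-cell-type matrix can be generated by looping over the cell types and reusing the naming scheme of the data/ folder. The function name and the row names are assumptions; the file shipped with the example may name or order its rows differently.

```shell
# Sketch: generate a cDC + pDC matrix with one block of rows per cell type.
# Hypothetical layout; the cDC rows match the Matrix_cDC.txt table above.
write_matrix_cdc_pdc() {
  out=${1:-Matrix_cDC_pDC.txt}
  {
    printf 'name\ttype\tfile\tfactor\tcell\n'
    for cell in cDC pDC; do
      # The three PU.1/IRF8 peak groups (printf reuses the format per 4 args).
      printf '%s\tregions\t%s\t%s\t%s\n' \
        "${cell}_PU1_IRF8_peaks" "./data/peaks/PU1_IRF8_${cell}_overlap_peaks.bed" PU1_IRF8_peaks "$cell" \
        "${cell}_PU1_peaks" "./data/peaks/PU1_${cell}_noIRF8_peaks.bed" PU1_peaks "$cell" \
        "${cell}_IRF8_peaks" "./data/peaks/IRF8_${cell}_noPU1_peaks.bed" IRF8_peaks "$cell"
      # PU.1 signal followed by the five histone marks.
      printf '%s\treads\t%s\t%s\t%s\n' "${cell}_PU1" "./data/bw/PU1_${cell}.bw" PU.1 "$cell"
      for mark in H3K4me1 H3K4me3 H3K9me3 H3K27me3 H3K27ac; do
        printf '%s\treads\t%s\t%s\t%s\n' "${cell}_${mark}" "./data/bw/${mark}_${cell}.bw" "$mark" "$cell"
      done
    done
  } > "$out"
}
```

Calling write_matrix_cdc_pdc my_matrix_cDC_pDC.txt writes a header plus nine rows per cell type. Adding further cell types only requires extending the list in the outer loop, provided the corresponding files exist in data/.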

This lineplot shows the differences in the histone signatures of IRF8 and PU.1 peaks between cDCs and pDCs. It indicates, for example, stronger IRF8 signals (with or without PU.1) in cDCs than in pDCs. Moreover, in contrast to cDCs, pDCs show no signs of promoter-associated binding of IRF8 (or PU.1).

You can also reorganize the results to contrast the signals across cell types:

rgt-viz lineplot Matrix_cDC_pDC.txt -o results -t lineplot_cDC_pDC_2 -col reads -c cell -row regions

  • -col defines the way to group data in columns, here we use “reads”;

  • -c defines the way to color the lines, here we use “cell” as the tag to show different cells in different colors;

  • -row defines the way to group data in rows, here we use “regions”.

This makes the difference in IRF8 signals between pDCs and cDCs clearer than in the previous visualization. By changing the experimental matrix or the plot layout, you can generate more complex lineplots that compare your data across cell types, treatments, histone modifications or any other design. RGT-Viz supports several other plot variants as well.

Heatmap

The data can also be presented as a heatmap. Heatmaps are produced by the lineplot command with the -heatmap option, for example:

rgt-viz lineplot Matrix_cDC_pDC.txt -o results -t heatmap_cDC_pDC -heatmap -row reads -col regions -g cell -c None
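To regenerate every figure of this tutorial in one pass, the commands can be collected in a small shell function. The rgt-viz calls are copied verbatim from the sections above; the wrapper itself (run_step, run_tutorial and the DRY_RUN variable) is our own convenience and not part of RGT-Viz.

```shell
# Hypothetical wrapper: runs (or, with DRY_RUN=1, only prints) the tutorial
# commands. Call it from inside the "rgt_viz_example" directory.

run_step() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"   # dry run: show the command without executing it
  else
    "$@"        # execute the command exactly as given
  fi
}

run_tutorial() {
  # Each step runs only if the previous one succeeded.
  run_step rgt-viz boxplot Matrix_cDC.txt -o results -t boxplot_cDC -g None -s reads -c regions &&
  run_step rgt-viz lineplot Matrix_cDC.txt -o results -t lineplot_cDC -col reads -c regions &&
  run_step rgt-viz lineplot Matrix_cDC_pDC.txt -o results -t lineplot_cDC_pDC -col reads -c regions -row cell -scol &&
  run_step rgt-viz lineplot Matrix_cDC_pDC.txt -o results -t lineplot_cDC_pDC_2 -col reads -c cell -row regions &&
  run_step rgt-viz lineplot Matrix_cDC_pDC.txt -o results -t heatmap_cDC_pDC -heatmap -row reads -col regions -g cell -c None
}
```

DRY_RUN=1 run_tutorial prints the five commands; run_tutorial alone executes them and stops at the first failure.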

References

  1. Lin Q, Chauvistre H, Costa IG, Mitzka S, Gusmao EG, Haenzelmann S, Baying B, Hennuy B, Smeets H, Hoffmann K, Benes V, Sere K, Zenke M. Epigenetic and Transcriptional Architecture of Dendritic Cell Development. Nucleic Acids Research, 43:9680-9693.