Tutorial of regions versus regions

In this tutorial, we will demonstrate how we can use RGT-Viz to visualize association among different region sets.

Download the data

We will use the epigenetic data from dendritic cell development study as example. There, we have ChIP-Seq data from the transcription factor PU.1 and IRF8, and histone modifications H3K4me1, H3K4me3, H3K9me3, H3K27me3, and H3K27ac on four cellular states: multipotent progenitors (MPP), dendritic cell progenitors (CDP), common dendritic cells (cDC) and plamatocyte dendritic cells (pDC). The functional annotation of these histone markers are showed as follows:

  • H3K4me1 is enriched at active and primed enhancers;

  • H3K4me3 is highly enriched at active promoters near Transcription start site (TSS);

  • H3K9me3 is a marker of heterochromatin which has pivotal role during lineage commitement;

  • H3K27me3 is associated with the downregulation of nearby genes via the formation of heterochromatic regions;

  • H3K27ac is accociated with the higher activation of transcription and defined as an active enhancer marker.

The peaks of PU.1 and IRF8 are further processed into 3 groups: overlapping peaks of PU.1 and IRF8, PU.1 peaks (no IRF8), and IRF8 peaks (no PU.1). Those files are listed below:

  • PU1_IRF8_pDC_overlap_peaks.bed

  • PU1_pDC_noIRF8_peaks.bed

  • IRF8_pDC_noPU1_peaks.bed

  • PU1_IRF8_cDC_overlap_peaks.bed

  • PU1_cDC_noIRF8_peaks.bed

  • IRF8_cDC_noPU1_peaks.bed

Next, please download the folder “rgt_viz_example” from here.

unzip rgt_viz_example
cd rgt_viz_example

Now you have the files as described below:

data/
├── bw
│   ├── H3K27ac_cDC.bw
│   ├── H3K27ac_CDP.bw
│   ├── H3K27ac_MPP.bw
│   ├── H3K27ac_pDC.bw
│   ├── H3K27me3_cDC.bw
│   ├── H3K27me3_CDP.bw
│   ├── H3K27me3_MPP.bw
│   ├── H3K27me3_pDC.bw
│   ├── H3K4me1_cDC.bw
│   ├── H3K4me1_CDP.bw
│   ├── H3K4me1_MPP.bw
│   ├── H3K4me1_pDC.bw
│   ├── H3K4me3_cDC.bw
│   ├── H3K4me3_CDP.bw
│   ├── H3K4me3_MPP.bw
│   ├── H3K4me3_pDC.bw
│   ├── H3K9me3_cDC.bw
│   ├── H3K9me3_CDP.bw
│   ├── H3K9me3_MPP.bw
│   ├── H3K9me3_pDC.bw
│   ├── IRF8_cDC.bw
│   ├── IRF8_pDC.bw
│   ├── PU1_cDC.bw
│   ├── PU1_CDP.bw
│   ├── PU1_MPP.bw
│   └── PU1_pDC.bw
└── peaks
    ├── H3K4me3_cDC_WT_peaks.bed
    ├── H3K4me3_CDP_WT_peaks.bed
    ├── H3K4me3_MPP_WT_peaks.bed
    ├── H3K4me3_pDC_WT_peaks.bed
    ├── PU1_IRF8_cDC_overlap_peaks.bed
    ├── IRF8_cDC_noPU1_peaks.bed
    ├── PU1_cDC_noIRF8_peaks.bed
    ├── PU1_IRF8_pDC_overlap_peaks.bed
    ├── IRF8_pDC_noPU1_peaks.bed
    └── PU1_pDC_noIRF8_peaks.bed

These directories include the genomic signals of histone modifications (files with a .bw ending as generated by bamCoverage) and the genomic regions of PU.1 and IRF8 peaks (files with .narrowPeak endings as generated by MACS2) in different DC cells.

With these data, the first question we would like to ask is: are PU.1 and IRF8 co-binders in DC differentiation? If so, in which cells?

Intersection test

For evaluating the association between PU.1 and IRF8, the intersection test is applied on the ChIP-seq binding regions of PU.1 on all cell types to compare with the ChIP-seq binding regions of IRF8 on cDC and pDC (the ChIP-seq binding regions of IRF8 in CDP and MPP are not available).

rgt-viz intersect -r Matrix_PU1.txt -q Matrix_IRF8.txt -o results -t PU1_IRF8_intersection -organism mm9 -stest 30
  • -r is reference region set as the base for statistics;

  • -q is query region set for testing its association with the reference regions;

  • -o indicates the output directory;

  • -t defines the title of this experiment;

  • -c defines the color tag for cloring the test;

  • -organism defines the genome assembly used here;

  • -stest defines the repitition times of random subregion test between reference and query. The more repitition times are, the more reliable the result is. However, it take time to run.

This command will generate a directory “results/PU1_IRF8_intersection” with figures and html pages.

The exact numbers of intersected regions between PU.1 and IRF8, and p-values are shown in below table:

Reference
name
Query
name
Reference
number
Query
number
Intersect. Average
intersect.
Chi-square
statistic
Positive
Association
p-value
Negative
Association
p-value
MPP_PU1_peaks cDC_IRF8_peaks 6212 34003 4412 1163 3359 0 1.00
MPP_PU1_peaks pDC_IRF8_peaks 6212 6467 1745 848.5 351.3 5.1e-77 1.00
CDP_PU1_peaks cDC_IRF8_peaks 20237 34003 14003 6574 4438 0 1.00
CDP_PU1_peaks pDC_IRF8_peaks 20237 6467 4078 1494 1979 0 1.00
cDC_PU1_peaks cDC_IRF8_peaks 20054 34003 13973 6538 4489 0 1.00
cDC_PU1_peaks pDC_IRF8_peaks 20054 6467 4066 1497 1955 0 1.00
pDC_PU1_peaks cDC_IRF8_peaks 21050 34003 14307 6757 4367 0 1.00
pDC_PU1_peaks pDC_IRF8_peaks 21050 6467 4137 1478 2100 0 1.00

The intersection test reveals that PU.1 and IRF8 are associated significantly in all cell types as shown in the 8th column. Though there are many overlaps between IRF8 and PU.1 in all cell types, the table shows that the highest number of overlaps appears between PU.1 and IRF8 in cDC.

Jaccard test

Alternatively, we can use Jaccard test to evaluate the association level between PU.1 and IRF8 by comparing with jaccard index from repeating randomization.

Run the command:

rgt-viz jaccard -r Matrix_PU1.txt -q Matrix_IRF8.txt -o results -t PU1_IRF8_jaccard -organism mm9

This command will generate a directory “results/PU1_IRF8_jaccard” with figures and html pages.

We can also look at the statistic numbers and p-values as shown below:

Reference
name
Query
name
Reference
number
Query
number
True
Jaccard
index
Average
random
Jaccard
Positive
Association
p-value
Negative
Association
p-value
MPP_PU1_peaks cDC_IRF8_peaks 6212 34003 0.1101 0.0007 0 1.00
MPP_PU1_peaks pDC_IRF8_peaks 6212 6467 0.1387 0.0004 0 1.00
CDP_PU1_peaks cDC_IRF8_peaks 20237 34003 0.3125 0.0019 0 1.00
CDP_PU1_peaks pDC_IRF8_peaks 20237 6467 0.1430 0.0007 0 1.00
cDC_PU1_peaks cDC_IRF8_peaks 20054 34003 0.3140 0.0019 0 1.00
cDC_PU1_peaks pDC_IRF8_peaks 20054 6467 0.1437 0.0007 0 1.00
pDC_PU1_peaks cDC_IRF8_peaks 21050 34003 0.3147 0.0020 0 1.00
pDC_PU1_peaks pDC_IRF8_peaks 21050 6467 0.1402 0.0007 0 1.00

Projection test

We next evaluate the association between IRF8 and PU.1 binding sites and histone modification markers. For this, we can use projection test. It evaluates the association between a set of region sets (peaks of Irf8, PU1 and Irf8/PU1) vs. a background regions (H3K4me1 peaks in cDC cells) by evaluating intersection counts with a random binomial model.

rgt-viz projection -r Matrix_H3K4me1.txt -q Matrix_cDC_pDC.txt -o results -t projection -c factor -organism mm9 -g cell
  • -r is reference region set as the base for statistics;

  • -q is query region set for testing its association with the reference regions;

  • -o indicates the output directory;

  • -t defines the title of this experiment;

  • -c defines the color tag for cloring the test;

  • -organism defines the genome assembly used here;

  • -g defines the group tag for grouping the test.

This command will generate a directory “results/projection” with figures and html pages.

These results indicates the majority of peaks associated with H3K4me3 in cDC are of PU.1 and Irf8 co-binding, while H34me3 pekas are associated with Irf8 peaks.

Combinatorial test

We next ask if co-binding of PU.1 and IRF8 relates to different histone modifications. For this, we use combinatorial test. This is another variant of intersection test that checks all the combinations of the given query regions and calcuate their intersections to the reference. This test is useful in exploring the unknown association between region sets.

rgt-viz combinatorial -o results -q Matrix_H3K4me1_cDC_pDC.txt -r Matrix_PU1_IRF8_peaks.txt -t combinatorial -organism mm9 -g cell -c factor