Tutorial of regions versus regions
In this tutorial, we will demonstrate how we can use RGT-Viz to visualize association among different region sets.
Download the data
We will use the epigenetic data from dendritic cell development study as example. There, we have ChIP-Seq data from the transcription factor PU.1 and IRF8, and histone modifications H3K4me1, H3K4me3, H3K9me3, H3K27me3, and H3K27ac on four cellular states: multipotent progenitors (MPP), dendritic cell progenitors (CDP), common dendritic cells (cDC) and plamatocyte dendritic cells (pDC). The functional annotation of these histone markers are showed as follows:
H3K4me1 is enriched at active and primed enhancers;
H3K4me3 is highly enriched at active promoters near Transcription start site (TSS);
H3K9me3 is a marker of heterochromatin which has pivotal role during lineage commitement;
H3K27me3 is associated with the downregulation of nearby genes via the formation of heterochromatic regions;
H3K27ac is accociated with the higher activation of transcription and defined as an active enhancer marker.
The peaks of PU.1 and IRF8 are further processed into 3 groups: overlapping peaks of PU.1 and IRF8, PU.1 peaks (no IRF8), and IRF8 peaks (no PU.1). Those files are listed below:
PU1_IRF8_pDC_overlap_peaks.bed
PU1_pDC_noIRF8_peaks.bed
IRF8_pDC_noPU1_peaks.bed
PU1_IRF8_cDC_overlap_peaks.bed
PU1_cDC_noIRF8_peaks.bed
IRF8_cDC_noPU1_peaks.bed
Next, please download the folder “rgt_viz_example” from here.
unzip rgt_viz_example
cd rgt_viz_example
Now you have the files as described below:
data/
├── bw
│ ├── H3K27ac_cDC.bw
│ ├── H3K27ac_CDP.bw
│ ├── H3K27ac_MPP.bw
│ ├── H3K27ac_pDC.bw
│ ├── H3K27me3_cDC.bw
│ ├── H3K27me3_CDP.bw
│ ├── H3K27me3_MPP.bw
│ ├── H3K27me3_pDC.bw
│ ├── H3K4me1_cDC.bw
│ ├── H3K4me1_CDP.bw
│ ├── H3K4me1_MPP.bw
│ ├── H3K4me1_pDC.bw
│ ├── H3K4me3_cDC.bw
│ ├── H3K4me3_CDP.bw
│ ├── H3K4me3_MPP.bw
│ ├── H3K4me3_pDC.bw
│ ├── H3K9me3_cDC.bw
│ ├── H3K9me3_CDP.bw
│ ├── H3K9me3_MPP.bw
│ ├── H3K9me3_pDC.bw
│ ├── IRF8_cDC.bw
│ ├── IRF8_pDC.bw
│ ├── PU1_cDC.bw
│ ├── PU1_CDP.bw
│ ├── PU1_MPP.bw
│ └── PU1_pDC.bw
└── peaks
├── H3K4me3_cDC_WT_peaks.bed
├── H3K4me3_CDP_WT_peaks.bed
├── H3K4me3_MPP_WT_peaks.bed
├── H3K4me3_pDC_WT_peaks.bed
├── PU1_IRF8_cDC_overlap_peaks.bed
├── IRF8_cDC_noPU1_peaks.bed
├── PU1_cDC_noIRF8_peaks.bed
├── PU1_IRF8_pDC_overlap_peaks.bed
├── IRF8_pDC_noPU1_peaks.bed
└── PU1_pDC_noIRF8_peaks.bed
These directories include the genomic signals of histone modifications (files with a .bw ending as generated by bamCoverage) and the genomic regions of PU.1 and IRF8 peaks (files with .narrowPeak endings as generated by MACS2) in different DC cells.
With these data, the first question we would like to ask is: are PU.1 and IRF8 co-binders in DC differentiation? If so, in which cells?
Intersection test
For evaluating the association between PU.1 and IRF8, the intersection test is applied on the ChIP-seq binding regions of PU.1 on all cell types to compare with the ChIP-seq binding regions of IRF8 on cDC and pDC (the ChIP-seq binding regions of IRF8 in CDP and MPP are not available).
rgt-viz intersect -r Matrix_PU1.txt -q Matrix_IRF8.txt -o results -t PU1_IRF8_intersection -organism mm9 -stest 30
-r is reference region set as the base for statistics;
-q is query region set for testing its association with the reference regions;
-o indicates the output directory;
-t defines the title of this experiment;
-c defines the color tag for cloring the test;
-organism defines the genome assembly used here;
-stest defines the repitition times of random subregion test between reference and query. The more repitition times are, the more reliable the result is. However, it take time to run.
This command will generate a directory “results/PU1_IRF8_intersection” with figures and html pages.
The exact numbers of intersected regions between PU.1 and IRF8, and p-values are shown in below table:
Reference name |
Query name |
Reference number |
Query number |
Intersect. | Average intersect. |
Chi-square statistic |
Positive Association p-value |
Negative Association p-value |
---|---|---|---|---|---|---|---|---|
MPP_PU1_peaks | cDC_IRF8_peaks | 6212 | 34003 | 4412 | 1163 | 3359 | 0 | 1.00 |
MPP_PU1_peaks | pDC_IRF8_peaks | 6212 | 6467 | 1745 | 848.5 | 351.3 | 5.1e-77 | 1.00 |
CDP_PU1_peaks | cDC_IRF8_peaks | 20237 | 34003 | 14003 | 6574 | 4438 | 0 | 1.00 |
CDP_PU1_peaks | pDC_IRF8_peaks | 20237 | 6467 | 4078 | 1494 | 1979 | 0 | 1.00 |
cDC_PU1_peaks | cDC_IRF8_peaks | 20054 | 34003 | 13973 | 6538 | 4489 | 0 | 1.00 |
cDC_PU1_peaks | pDC_IRF8_peaks | 20054 | 6467 | 4066 | 1497 | 1955 | 0 | 1.00 |
pDC_PU1_peaks | cDC_IRF8_peaks | 21050 | 34003 | 14307 | 6757 | 4367 | 0 | 1.00 |
pDC_PU1_peaks | pDC_IRF8_peaks | 21050 | 6467 | 4137 | 1478 | 2100 | 0 | 1.00 |
The intersection test reveals that PU.1 and IRF8 are associated significantly in all cell types as shown in the 8th column. Though there are many overlaps between IRF8 and PU.1 in all cell types, the table shows that the highest number of overlaps appears between PU.1 and IRF8 in cDC.
Jaccard test
Alternatively, we can use Jaccard test to evaluate the association level between PU.1 and IRF8 by comparing with jaccard index from repeating randomization.
Run the command:
rgt-viz jaccard -r Matrix_PU1.txt -q Matrix_IRF8.txt -o results -t PU1_IRF8_jaccard -organism mm9
This command will generate a directory “results/PU1_IRF8_jaccard” with figures and html pages.
We can also look at the statistic numbers and p-values as shown below:
Reference name |
Query name |
Reference number |
Query number |
True Jaccard index |
Average random Jaccard |
Positive Association p-value |
Negative Association p-value |
---|---|---|---|---|---|---|---|
MPP_PU1_peaks | cDC_IRF8_peaks | 6212 | 34003 | 0.1101 | 0.0007 | 0 | 1.00 |
MPP_PU1_peaks | pDC_IRF8_peaks | 6212 | 6467 | 0.1387 | 0.0004 | 0 | 1.00 |
CDP_PU1_peaks | cDC_IRF8_peaks | 20237 | 34003 | 0.3125 | 0.0019 | 0 | 1.00 |
CDP_PU1_peaks | pDC_IRF8_peaks | 20237 | 6467 | 0.1430 | 0.0007 | 0 | 1.00 |
cDC_PU1_peaks | cDC_IRF8_peaks | 20054 | 34003 | 0.3140 | 0.0019 | 0 | 1.00 |
cDC_PU1_peaks | pDC_IRF8_peaks | 20054 | 6467 | 0.1437 | 0.0007 | 0 | 1.00 |
pDC_PU1_peaks | cDC_IRF8_peaks | 21050 | 34003 | 0.3147 | 0.0020 | 0 | 1.00 |
pDC_PU1_peaks | pDC_IRF8_peaks | 21050 | 6467 | 0.1402 | 0.0007 | 0 | 1.00 |
Projection test
We next evaluate the association between IRF8 and PU.1 binding sites and histone modification markers. For this, we can use projection test. It evaluates the association between a set of region sets (peaks of Irf8, PU1 and Irf8/PU1) vs. a background regions (H3K4me1 peaks in cDC cells) by evaluating intersection counts with a random binomial model.
rgt-viz projection -r Matrix_H3K4me1.txt -q Matrix_cDC_pDC.txt -o results -t projection -c factor -organism mm9 -g cell
-r is reference region set as the base for statistics;
-q is query region set for testing its association with the reference regions;
-o indicates the output directory;
-t defines the title of this experiment;
-c defines the color tag for cloring the test;
-organism defines the genome assembly used here;
-g defines the group tag for grouping the test.
This command will generate a directory “results/projection” with figures and html pages.
These results indicates the majority of peaks associated with H3K4me3 in cDC are of PU.1 and Irf8 co-binding, while H34me3 pekas are associated with Irf8 peaks.
Combinatorial test
We next ask if co-binding of PU.1 and IRF8 relates to different histone modifications. For this, we use combinatorial test. This is another variant of intersection test that checks all the combinations of the given query regions and calcuate their intersections to the reference. This test is useful in exploring the unknown association between region sets.
rgt-viz combinatorial -o results -q Matrix_H3K4me1_cDC_pDC.txt -r Matrix_PU1_IRF8_peaks.txt -t combinatorial -organism mm9 -g cell -c factor