Introduction

Method

HINT (Hmm-based IdeNtification of Transcription factor footprints) is a framework that uses open chromatin data to identify active transcription factor binding sites. The method was originally proposed to model active binding sites by simultaneously analyzing DNase-seq and ChIP-seq profiles of histone modifications on a genome-wide level (paper).

The HMM takes as input a normalized signal and a slope signal for both DNase-seq and one histone mark. It can therefore detect the increasing, top, and decreasing regions of both the histone modification and the DNase signals. We later modified HINT to work with DNase-seq data alone by removing the three histone-level states and by using a bias-corrected DNase-seq signal before the normalization steps.
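To see why a slope signal helps separate increasing, top, and decreasing regions, here is a toy sketch. It is not HINT's actual implementation, which this section does not detail: the slope is simply approximated by a central difference over a per-position signal. The function name slope_signal and the one-value-per-line input format are assumptions made for illustration.

```shell
#!/bin/sh
# Toy illustration only (not HINT's code): approximate the slope of a
# per-position signal with a central difference, (s[i+1] - s[i-1]) / 2.
# Input: one signal value per line on stdin (hypothetical format).
# Output: "position value slope label" for every interior position.
slope_signal() {
    awk '
    { s[NR] = $1 }
    END {
        for (i = 2; i < NR; i++) {
            d = (s[i + 1] - s[i - 1]) / 2
            if (d > 0)      label = "increase"
            else if (d < 0) label = "decrease"
            else            label = "top_or_flat"
            print i, s[i], d, label
        }
    }'
}
```

A positive slope marks a rising flank, a negative slope a falling flank, and a slope near zero at a high signal value marks the top. A model that sees both the signal and its slope can therefore tell these three situations apart.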

More recently, we extended HINT to ATAC-seq, an assay for identifying accessible DNA regions, taking protocol-specific characteristics into account.

Basic Usage

We describe here how to detect footprints using HINT for ATAC-seq, DNase-seq, and histone modification data. To perform footprinting, you need at least two files: one with the aligned reads of your chromatin data and another describing the regions in which to detect footprints. You can use a peak caller, such as MACS2, to define these regions of interest.
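As a small sketch of preparing the regions file, the helper below keeps the peaks from one chromosome and writes at most N of them as a three-column BED. The function name subset_peaks is hypothetical. It only assumes a tab-separated peak file with chromosome, start, and end in the first three columns, as in BED or MACS2 narrowPeak output.

```shell
#!/bin/sh
# subset_peaks PEAKS CHROM N   (hypothetical helper, not part of HINT)
# Print at most N peaks located on CHROM as a 3-column BED (chrom, start, end).
# Lines starting with "#", "track" or "browser" are skipped as headers.
subset_peaks() {
    peaks=$1 chrom=$2 n=$3
    awk -v chrom="$chrom" -v n="$n" 'BEGIN { FS = OFS = "\t" }
        /^(#|track|browser)/ { next }
        $1 == chrom && kept < n { print $1, $2, $3; kept++ }
    ' "$peaks"
}
```

For instance, calling subset_peaks on a full peak file with chr1 and 1000 as arguments, and redirecting the output to a new BED file, would give a small region set that is convenient for a quick test run.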

Footprinting for ATAC-seq data

Download here the example data for ATAC-seq, based on chromosome 1 of the GM12878 cell line. Execute the following commands to extract the data from the downloaded file:

tar xvfz HINT_ATACTest.tar.gz
cd HINT_ATACTest

Then run the command below to perform footprinting:

rgt-hint footprinting --atac-seq ATAC.bam ATACPeaks.bed

For simplicity, we use only the first 1000 peaks from chromosome 1. The above command will output a BED file containing the footprints in your current folder, with footprints as the file name prefix. You can also set the arguments below

--output-location=your_directory  --output-prefix=your_prefix

to tell HINT your preferred output directory and file name prefix. Each footprint, i.e. each line of the BED file, carries a tag-count score (number of reads). This score can be used to assess footprint quality (higher values indicate better candidates). In addition, a file with details on the reads and footprints is written to the same folder as the BED file.
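Since higher scores indicate better candidates, a simple way to inspect the results is to sort the footprints by score. The sketch below assumes the tag-count score sits in column 5, the standard BED score field. This is an assumption, not something stated above, so the column can be overridden. The function name top_footprints is hypothetical.

```shell
#!/bin/sh
# top_footprints BED N [SCORE_COLUMN]   (hypothetical helper)
# Print the N footprints with the highest tag-count score.
# ASSUMPTION: the score is in column 5 (the standard BED score field);
# pass a different column number if your output file differs.
top_footprints() {
    bed=$1 n=$2 col=${3:-5}
    sort -t "$(printf '\t')" -k "${col},${col}nr" "$bed" | head -n "$n"
}
```

With the default prefix, the output file is presumably named footprints.bed, so top_footprints footprints.bed 20 would list the 20 highest-scoring footprints.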

If your data is paired-end, you may want to try another model which is optimized for paired-end sequencing data:

rgt-hint footprinting --atac-seq --paired-end --output-prefix=fp_paired ATAC.bam ATACPeaks.bed

Note: HINT performs bias correction for ATAC-seq by default, so you must download the genomes following these instructions and specify the correct genome reference by passing the following option to the footprinting command:

--organism=genome_version

Currently, the default setting is hg19. See here for more information.
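The options introduced so far can be combined in a single call. The sketch below is a hypothetical convenience wrapper, not part of HINT. It only prints the command line so that it can be checked before running, and it uses exactly the options named above. The defaults it fills in (hg19, the current directory, and footprints as the prefix) mirror the text.

```shell
#!/bin/sh
# hint_atac_cmd BAM PEAKS [ORGANISM] [OUT_DIR] [PREFIX]   (hypothetical wrapper)
# Print (do not run) an rgt-hint command line for ATAC-seq footprinting.
# Only options described in this section are used; the defaults below
# mirror the text. Paths containing spaces are not handled by this sketch.
hint_atac_cmd() {
    bam=$1 peaks=$2
    organism=${3:-hg19} out_dir=${4:-.} prefix=${5:-footprints}
    printf 'rgt-hint footprinting --atac-seq --organism=%s --output-location=%s --output-prefix=%s %s %s\n' \
        "$organism" "$out_dir" "$prefix" "$bam" "$peaks"
}
```

Calling hint_atac_cmd ATAC.bam ATACPeaks.bed prints the full command with the defaults. Passing a genome version, an output directory, and a prefix as the third to fifth arguments overrides them. The genome version must be one you have actually installed.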

Footprinting for DNase-seq data

Example DNase-seq data can be found here. Execute the following commands to extract the data from the compressed file:

tar xvfz HINT_DNaseTest.tar.gz
cd HINT_DNaseTest

Then run the following command to call the footprints:

rgt-hint footprinting --dnase-seq DNase.bam DNasePeaks.bed

We recommend using cleavage bias correction. This can be done with the following command:

rgt-hint footprinting --dnase-seq --bias-correction DNase.bam DNasePeaks.bed

Don’t forget to specify the proper genome reference using:

--organism=genome_version

Currently, the default setting is hg19.

Footprinting for histone modification data

Download here the example data for histone modifications. Execute the following commands to extract the data:

tar xvfz HINT_HistoneTest.tar.gz
cd HINT_HistoneTest

Then call the footprints:

rgt-hint footprinting --histone histone.bam histonePeaks.bed

Citation

If you use HINT with ATAC-seq, please cite the following publication:

If you use HINT with DNase-seq and bias correction, please cite:

If you use HINT with DNase-seq or histone modifications, please cite the following publication: