Recently I went on a seminar trip, where I happened to discuss DNase-seq data analysis with colleagues. I realized that many people didn’t know about our paper on the potential issues with DNase-seq footprint analysis.
After our original paper was published, the Stam lab submitted a correspondence to Nature Methods challenging our study, and we were asked to submit a response. Both were sent out for review, and the reviewers turned out to be overwhelmingly supportive of our study. Unfortunately, based on these reviews, the editor decided not to publish either the correspondence or our response, which could have been informative to the research community. I still see people toil over DNase-seq footprint analysis today, only to reach conclusions similar to those we reached in 2014. So, instead of writing a lengthy blog post about our original paper, I would like to include the response we wrote, which clearly summarizes the technical issues with DNase-seq footprint analysis. I apologize for not being able to include the original correspondence from the Stam lab, because I don’t have their permission to post it here, but I hope readers can guess its content.
Overall, the DNase-seq data from the Stam lab have been of very high quality and extremely valuable to the community, and they are one of the crowning successes of the ENCODE project. However, we cautioned against liberal calling of DNase-seq footprints because of DNase I cutting bias and the overdispersion of sequencing noise. Similar caution should be applied to ATAC-seq footprint analysis; in fact, we see even stronger cutting bias in ATAC-seq. Instead of footprint analysis, we believe DNase-seq/ATAC-seq peak heights (or read counts in the peaks) combined with motif hits could better predict TF binding.