Jul 26 2012

On a recent NIH study section trip, Xihong told me that it is important to belong to a community. I have picked computational cancer epigenetics as my future research direction, which is naturally interdisciplinary. As a result, I don’t quite belong to any of the following communities (characterized by the conferences they go to): statistics (JSM), bioinformatics (ISMB), genomics (Biology of Genomes), chromatin (CSHL, Gordon, or Keystone conferences on chromatin and epigenetics), or cancer (AACR). Most of my closest colleagues go to the Cold Spring Harbor Systems Biology of Gene Regulation meeting, which draws people who use genomics and bioinformatics approaches to study transcriptional and epigenetic gene regulation. However, because I already follow their work closely, I sometimes don’t learn as much from this meeting anymore.

Over the years, I have grown to enjoy the domain biology meetings, such as cancer or chromatin meetings, much more than the genomics and bioinformatics meetings. In the future, I should alternate between the CSHL Systems Biology and Biology of Genomes meetings, then go to one cancer (AACR) and one epigenetics (Keystone or CSHL) meeting every year. Recently there have also been some good cancer epigenetics meetings, which could be very interesting.

Jul 15 2012

Wei Li stopped by Shanghai to attend the Tongji summer camp (for graduate student recruitment). He told me that Gongming Pu said that nowadays the way to publish a Cell, Nature, or Science (CNS) paper is to use new technologies to re-investigate decade-old problems that were published in CNS, and it is especially exciting when the new technology gives results different from those previously reported in those CNS papers.

This is quite an interesting idea. I only decided to focus my research on cancer last spring, so I don’t quite know the general landscape of the cancer field, nor do I understand what the big and important cancer problems are. I started searching for original research papers related to cancer that were published in Nature, Science, Cell, and Cancer Cell, and found only ~1800 hits since 1990. Even if I include JAMA and the New England Journal, the total is fewer than 4000 hits. If I read the abstracts, and occasionally the full paper, of 20 papers a day, four days a week, I could finish them all in about a year. I will do it with members of the lab in the coming year.
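The reading-plan arithmetic can be checked with a quick sketch. The function name `weeks_to_finish` is just an illustrative helper, and the totals are the rough hit counts quoted above (~1800 and the upper bound of 4000), not exact search results.

```python
import math

def weeks_to_finish(total_papers, papers_per_day, days_per_week):
    """Whole weeks needed to get through a reading list at a fixed pace."""
    papers_per_week = papers_per_day * days_per_week
    return math.ceil(total_papers / papers_per_week)

# Rough totals from the search described above (approximate, not exact counts).
reading_lists = [
    ("Nature, Science, Cell, Cancer Cell (~1800)", 1800),
    ("plus JAMA and New England Journal (<4000)", 4000),
]

for label, total in reading_lists:
    weeks = weeks_to_finish(total, papers_per_day=20, days_per_week=4)
    print(f"{label}: {weeks} weeks")
```

At 80 papers a week, the larger list takes 50 weeks, which is where the "about a year" estimate comes from.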

Jul 13 2012

In 2011, we spent some effort looking at integrating ChIP-seq with GWAS data. That led me to the realization that for cancer studies, it is much more fruitful to study somatic mutations than germline mutations, and that studying normal populations is less likely to be cost-effective.

Ever since we sequenced the LNCaP / abl and MCF7 / LTED genomes, I have been thinking of establishing our whole genome sequence analysis capacity at Tongji University, China. Our assistant professor Jianxing Feng got his CS PhD from Tsinghua University specializing in algorithms, so we thought that he would like the computational challenge. We held a focused journal club reviewing the high-impact computational and biological papers on genome sequencing. To our surprise and disappointment, most of the existing algorithms are just brute-force, intuitive software with little algorithmic or statistical component.

Going to IBW, I realized that we are late to the whole genome or exome sequencing game. Many computational groups, both domestic and overseas, are already analyzing massive amounts of genome/exome sequencing data. The trend is clear: the first group can publish a good paper with only one whole genome; the second group will need to sequence 2 genomes; then future groups will need to sequence 5 (pairs of) genomes, 10, 50, 100, etc. to publish a good paper. The bar will rise just as it did for GWAS studies: the community will expect sequencing studies to explain the function and consequences of these mutations. That’s where we have some expertise and should be prepared to make an impact.

Recent exome sequencing and whole genome sequencing studies comparing cancer vs. normal or primary vs. metastatic cancer genomes have yielded many exciting findings. The easy cases for investigating functional mutations are genes with copy number gain or loss, and most of these genes are clear oncogenes or tumor suppressors that were likely already identified with CGH or SNP arrays. The functional consequences of these genes are easy to investigate with knockdown / knockout or overexpression assays. Our current approach of combining RNA-seq with DNase-seq to profile the wild type vs knockdown / overexpression conditions is a good screening approach to generate initial hypotheses.

One area that is likely to create new research opportunities is long noncoding RNA (lncRNA). Theoretically, CGH and SNP studies should already contain information on lncRNA copy number changes, except that previously people didn’t realize that they were genes. In addition to using RNA-seq and DNase-seq to investigate their function, one informative experiment might be to use oligo probes to specifically pull down the lncRNA, followed by mass spec to study the proteins that interact with it. John Rinn seems to have some expertise in this area, and we should also explore this technique.

If enough tumors have been sequenced, and people still observe only point mutations but not copy number variations in a gene, it would indicate that the mutation is not simply weakening or strengthening the regulation of the existing network of genes. The reason is that tumors could increase or decrease copy numbers to achieve the same goal of exerting stronger or weaker regulation. Instead, the mutation must be creating new links in the regulatory network. This type of gain-of-function mutation could be investigated by knocking in genes carrying the specific mutation and examining the downstream consequences. This is not a trivial experiment, and we might need to think of more efficient ways to study these mutations.
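The reasoning above can be written as a rough triage rule. This is only a sketch of the heuristic, not an established method: the `GeneTally` record, the `classify` function, the two fraction thresholds, and the example gene names and counts are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class GeneTally:
    """Per-gene counts across a sequenced tumor cohort (hypothetical record)."""
    gene: str
    tumors_sequenced: int
    point_mutated: int        # tumors with a point mutation in the gene
    copy_number_altered: int  # tumors with a copy number gain or loss

def classify(tally, min_point_fraction=0.05, max_cnv_fraction=0.01):
    """Apply the heuristic; both thresholds are arbitrary assumptions."""
    point_frac = tally.point_mutated / tally.tumors_sequenced
    cnv_frac = tally.copy_number_altered / tally.tumors_sequenced
    if cnv_frac > max_cnv_fraction:
        # Copy number changes suggest tumors are tuning dosage,
        # i.e. stronger or weaker regulation of the existing network.
        return "dosage"
    if point_frac >= min_point_fraction:
        # Recurrent point mutations with essentially no copy number changes:
        # a candidate for creating new links in the regulatory network.
        return "candidate new link"
    return "insufficient evidence"

# Made-up counts, purely to show the three outcomes.
cohort = [
    GeneTally("GENE_A", 200, 30, 0),
    GeneTally("GENE_B", 200, 5, 40),
    GeneTally("GENE_C", 200, 2, 0),
]
for tally in cohort:
    print(tally.gene, "->", classify(tally))
```

A rule like this would only prioritize genes for knock-in experiments; it says nothing about what the new regulatory links actually are.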