Feb 28, 2012

When I was searching for online teaching material on cancer, I stumbled on the MIT open course Introduction to Biology. It was taught by Eric Lander and Bob Weinberg in 2004. The two are pioneers in genomics and cancer biology, respectively, and both are members of the National Academy of Sciences. They delivered over 30 amazing lectures (with publicly available videos) on introductory biology. They explained biology in a simple yet fascinating storytelling style, using only chalk and no PowerPoint slides. The concepts they explain are super clear and their messages definitely stick. You can totally see their passion for biology and their charismatic style.

Most elite universities value faculty research over teaching. I always have special respect for faculty who care to teach well in addition to doing good research. Bob and Eric epitomize this type of faculty. I wish I had learned Introduction to Biology from them when I was a freshman in college. As a matter of fact, I would urge all biology majors to watch these videos.

Actually I just found that the videos are hosted here, which also has many other interesting free online courses. E.g. Rafa told me about this Machine Learning course from Stanford. Hooray! I will be going to these lectures for sure! Many of the courses are given a rating, e.g. the Machine Learning one is rated A. Great resource, I hope they keep adding more material over time.

Feb 25, 2012

With the development of high throughput sequencing, ChIP-seq and DNase-seq are becoming increasingly popular for investigating transcriptional and epigenetic gene regulation. As of the end of 2011, over 5,000 ChIP-seq and DNase-seq samples are publicly available for human and mouse, but their meta-data annotation has many inconsistencies that hinder data sharing and reuse. At the suggestion of Martha Bulyk, we created a web resource called CistromeMap. Have you ever had questions like, “has anyone done a ChIP-seq of factor X in cell Y?” CistromeMap gives you the answer.

We started this project 1.5 years ago, and were quite bogged down by meta-data inconsistency, mostly in the cell annotations and gene names. We had to train students and redo things a few times. It made me realize the importance of controlled vocabulary in data integration. Yesterday, members of the Center for Functional Cancer Epigenetics at DFCI (Myles Brown, Nelly Polyak, Ramesh Shivdasani, and Prakash Rao) spent a whole afternoon with our CistromeMap team to correct the cell annotation. I was quite moved by my colleagues’ willingness to help make the cell annotation better.
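To make the controlled vocabulary point concrete, here is a minimal Python sketch. It is not how CistromeMap is implemented: the vocabulary, the sample records, and the function names are all made up for illustration. The idea is to map every free-text cell name to one canonical term before answering a "factor X in cell Y" question.

```python
import re

# Hypothetical controlled vocabulary: canonical cell name -> free-text variants.
CELL_VOCAB = {
    "MCF-7": ["mcf7", "mcf 7", "MCF-7 cells"],
    "HeLa": ["hela", "HeLa cells"],
    "K562": ["k-562", "K562 cells"],
}


def _key(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def build_lookup(vocab):
    """Map the normalized form of every known spelling to its canonical name."""
    lookup = {}
    for canonical, variants in vocab.items():
        for name in [canonical, *variants]:
            lookup[_key(name)] = canonical
    return lookup


def normalize(name, lookup):
    """Return the canonical cell name, or None so a curator can review it."""
    return lookup.get(_key(name))


def find_samples(samples, factor, cell, lookup):
    """Answer 'has anyone done a ChIP-seq of factor X in cell Y?'."""
    target = normalize(cell, lookup)
    if target is None:  # unknown cell name: nothing can be matched reliably
        return []
    return [
        s for s in samples
        if s["factor"].upper() == factor.upper()
        and normalize(s["cell"], lookup) == target
    ]


LOOKUP = build_lookup(CELL_VOCAB)

# Hypothetical sample records with inconsistent free-text annotation.
SAMPLES = [
    {"id": "sample_1", "factor": "ESR1", "cell": "MCF7"},
    {"id": "sample_2", "factor": "esr1", "cell": "mcf-7 cells"},
    {"id": "sample_3", "factor": "CTCF", "cell": "HeLa"},
    {"id": "sample_4", "factor": "ESR1", "cell": "some new line"},
]

hits = find_samples(SAMPLES, "ESR1", "MCF-7", LOOKUP)
print([s["id"] for s in hits])
```

Names that return None from `normalize` are the ones a human expert has to look at, which is the part that took us an afternoon with our colleagues.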

Since the prototype of CistromeMap became available in early Jan, I have been using it almost daily. Hopefully other colleagues will find it useful too. Given the increasing rate at which ChIP-seq and DNase-seq papers are published, it takes a lot of work to keep the database accurate and up to date. We hope the community will help us maintain and update the database. If you have just published a new ChIP-seq or DNase-seq paper, please visit here. Just enter the PMID or GSEID/GSMID (or SRA / EBI ID), and we will update the rest of the meta-data within two weeks. We will also create a function to allow users to correct any mistakes in our annotation.

Feb 23, 2012

Postdocs in my lab often come from one of two backgrounds: those with a good genomics informatics background but not much quantitative methodology background, and those with a good quantitative methodology background but not much genomics background. The ones good in both would probably look for even better labs for their postdoc or go directly to a faculty position, and the ones good in neither would never make it into our group :).

What I have found over the years is that postdocs with a good quantitative methodology background should start with a collaboration project involving a medium level of genomics data complexity. They usually have good methodology development experience and several first author papers in decent quantitative journals, but have not worked on genomics or real data before. From a single collaboration with experimental biologists, these postdocs come to understand genomics biology better, learn to manipulate high throughput data, use bioinformatics algorithms, interpret the results, and quickly see what computational biology is. Then they can use their previous quantitative skills to develop better quantitative methods to solve real problems they see during the collaboration or on new data. Postdocs with a good genomics background, however, should start with a methodology project. These postdocs usually come from laboratories with easy access to unpublished data and already have several second or co-first author papers in excellent journals. They need protected time to think of their own ideas (rather than having them dictated by a biologist mentor, as before), develop methods or algorithms to analyze or integrate published data, and figure out how to computationally validate their methods for publication.

If a postdoc knows both how to develop and publish their own quantitative methods, and how to interact with experimental biologists to answer important biological questions, they are ready to become a faculty member. It takes a lot of reading, thinking and hard work to get there, and I try to provide the guidance and environment to make this happen.

Feb 16, 2012

In preparation for the application to The Yangtze River Scholar, I have been thinking about our future research directions. The three areas below best combine my personal interest, existing expertise, and good purpose, so I am very happy to settle on them. There is still a lot to learn, but hopefully with deliberate practice over time I can become a real expert in them.

  1. Bioinformatics: The development of high throughput genomics technologies has created many exciting opportunities and analysis challenges. Our group has developed some of the most widely used and cited bioinformatics methods to analyze these high throughput data. We will continue to develop novel computational algorithms for new high throughput technologies and techniques. We will also conduct efficient data integration to better mine the hidden biological insights from publicly available high throughput data and refine hypotheses. Finally, we will integrate genomics experimental design and bioinformatics analysis to best utilize the newest technologies in gene regulation studies.
  2. Epigenetics: Epigenetics plays an important role in gene regulation, and includes diverse topics such as DNA methylation, nucleosome positioning, histone marks, epigenetic enzymes, and higher order chromatin interactions. We will focus on two major areas of epigenetic research. The first is to use the dynamics of histone mark ChIP-seq and DNase-seq to infer in vivo transcription factor binding and understand transcription regulatory mechanisms. The second is to use genome-wide approaches to understand the specificity and mechanism of epigenetic enzymes and lncRNAs (with epigenetic function). These areas are mostly unexplored, and will offer a lot of exciting opportunities in the future.
  3. Cancer: Studying the mechanism of cancer and finding a cure is an honorable cause, as one in three people in developed countries will get cancer. Cancer is a genetic disease amenable to research using genomic approaches, and many recent cancer studies have found mutations or misregulation in epigenetic enzymes, so perhaps it is also an epigenetic disease. Many pharmaceutical and biotech companies as well as academic scientists are actively developing cancer drugs targeting epigenetic enzymes. We will study the genome-wide function and response of cancer cells to epigenetic drugs, and identify cancer patients that might respond better to certain cancer drugs based on the genetic and epigenetic status of their tumor.

Feb 11, 2012

I just became a regular member of the NIH GCAT study section, and went to my first review on Thur and Fri. I heard that GCAT is the most frequently requested study section now, and we have ~80 submissions each cycle. The review was a good experience, and here are some of my thoughts. I hope they will help others in their grant writing in the future.

  1. Reviewers appreciate proposals that have well motivated and focused biological questions, and reasonable (not overly sophisticated) computational methods. Experimental and computational biologists should work out together a good experimental design that really answers a good biological question.
  2. GCAT is a diverse study section, so try to write something (at least in language and style) that appeals to people with biological, genomics, computational, or statistical backgrounds alike.
  3. For computational PIs, it is better to motivate the methodology development on real data, have an experimental co-investigator, have some experimental validation, and have a publication record showing that their computational work is biologically relevant. For experimental PIs, the proposal shouldn’t just generate a lot of data, but should explain how the data will be analyzed and what biology can be learned from it.
  4. Make reviewers’ job easier. Write very clearly the hypothesis, aims, innovations, and significance. Also, have shorter paragraphs, use bold and italics to focus reviewers’ attention and improve the readability of a proposal.
  5. In the current funding environment, any grant with a single weakness could lose out. For any potential weakness, add co-investigators or support letters from real experts to address it. Also, it is better to have publications that show a previous collaboration record with the co-investigator, even if they are only mentioned in the support letter.
  6. Reviewers definitely look at productive publication record and previous work impact. It takes years of focused efforts to become a real expert and have a real impact in some fields. Use new technologies well and wisely, but don’t always chase the newest and hottest thing.
  7. The PI should devote reasonable effort, usually 15-25% (new investigators could go higher). Reviewers prefer applicants with a few well focused PI grants to those with too many (either PI or co-PI) grants relative to their productivity.
  8. Make genomics data/resource generation and software publicly available and widely used. It will pay off for future applications. Do well by doing good.
  9. New investigators asking for a non-modular budget, or non-modular proposals asking for $499K, could be perceived by the reviewers as too greedy. Also, computational PIs proposing too many or too expensive wet lab experiments could be criticized.
  10. Proposals not discussed at one review might not be worth a revised submission. It is better to work hard on projects dear to your heart and submit a new proposal.

With the current funding environment and grant scoring system, grant reviews definitely involve an element of chance. So submit more proposals (e.g. one of my colleagues taught me to always have two grants pending), just so you have a chance of getting some funded.