Feb 25, 2014
 

Saw some very provocative articles on Lior Pachter’s blog attacking Barabasi’s and Kellis’s recent network biology papers, and the Kellis group’s response. Lior is probably a little harsh in his tone, but the irregularities he pointed out in the Kellis study are probably real. Otherwise, it would have been easy for Manolis to show Lior the code / data to reproduce the figures and claim the $100, story closed. The group’s reputation is worth the effort, regardless of whether the wager is $100 or $60K. In fact, any bright graduate student working on network biology should try out the step-by-step instructions by Manolis, study the code, and post their results on Lior’s blog. This is not only a good learning experience for the students, but also a big favor to the many scientists who are curious about the results.

Confucius says (don’t know whether I can translate this well): if someone points out a fault of ours and we indeed have it, we correct it; if we don’t, we should remind ourselves not to fall into it. We have experienced some difficulties in trying to reproduce results from other high-profile papers, mostly because the supplementary material was not detailed enough. It is a painful process, I have to say, so I think calling for more detailed supplementary material and code is reasonable.

I myself might fall victim too if others examined our studies more closely, and that’s what I will ask my group to be careful about in our future publications. For the papers we already published, I can only pray we did as much as we should have. Jun Liu once told me that papers we published are like spilled water: once out, they cannot be taken back. In Chinese, this phrase is used to describe married daughters. Actually, my close colleagues published a Nat Genetics paper evaluating the analysis results of all the Nat Genetics papers from the previous 2 years. Our own Carroll et al paper was evaluated and unfortunately was not among the 2 they called reproducible. Although I wasn’t happy that we were called “irreproducible” when their differentially expressed gene list was within 1% of our reported list, I could only ask lab members to be more specific about our parameter settings the next time.

Lior’s next blog is quite personal and damning, which I am not sure I approve of. But I like the last two paragraphs on “Methods matter”, and especially the objective comments by Erik van Nimwegen and Marc Robinson-Rechavi. The blog calls for scientists to provide enough methodological details and code for their papers, reviewers to take a more serious look at the methods in manuscript evaluation, and newcomers not to judge a paper merely by its journal’s impact factor. To get bioinformatics papers into high-profile journals, studies often overstate their results. The blog about publishing bioinformatics in high-profile journals, although funny and cynical, has some truth to it. Good computational biology methods will stand the test of time, and good conceptual ideas might benefit many other computational biology studies, even if they don’t look totally novel or revolutionary or get published in high-profile journals.

It might not be fair to focus just on individual investigators. Computational biology as a discipline should aim to establish our credibility with, and earn respect from, colleagues in maths / statistics, computer science, and biology. I hope computational biologists can recognize the problem and have a community of peers to discuss it and work on it together. I hope knowing there are scientists like Lior out there will make all of us more rigorous scientists. In fact, Lior’s blog puts him to the test too, and he has to be a good model himself in his own future studies. Time will tell…

P.S. Had a talk with Rafa and Cliff 2 days after I posted the original blog, and they made some excellent points. It is OK for Lior to attack Manolis’ paper on technical grounds. Scientists should be able to openly criticize other people’s science, whether the authors are friends or foes. But accusing the authors of fraud is something very serious and totally different in nature from calling a paper nonsense. We genomics and informatics people, when reading the blog, can understand this. However, if people outside the field who don’t understand the nuances get the message that an MIT computational biology professor committed fraud, it really hurts people. This type of damage is a genie you can’t put back in the bottle, and this kind of accusation is unhealthy for the field. From Lior’s blog, what I agree with is that Manolis’ method might not be as novel, the parameter setting was not rigorous, and it might not work as well as the paper claimed. This has been a systemic problem of a promising new field, an issue Rafa and many other colleagues have raised before and one we should all be more careful about, but I would never accuse the authors of fraud. Lior’s points about “methods matter” and “not overstating the results” could have reached a wider audience if he had attacked the science of the paper rather than the authors’ integrity. If my blog above has led people to believe otherwise, then I can understand the authors’ chagrin, and I offer my public apologies to Manolis and his co-authors.

Feb 11, 2014
 

Just heard about Illumina’s NextSeq machine. It is a desktop machine that delivers both the speed (one-day runs) and the reads (400M 2 × 150 bp reads), and it will really democratize sequencing. This might be the best machine for several labs in a department, a floor, or a small center to share. It might be extremely valuable for clinical applications, and it will probably replace HiSeq, MiSeq, or Ion Proton as the workhorse for research investigator sequencing. Found a very interesting blog about NextSeq. I do believe that the NextSeq will be very appealing to many labs if the two-color thing works out.

If departments or several labs do indeed share a NextSeq, the informatics might become a bottleneck. That’s where Bina Technologies might come to the rescue.

Update May 2014: DFCI bought a NextSeq and it is really delivering both the speed and the reads, so we are getting another one. It is amazing how quickly and reliably Illumina is pushing out each new generation of their sequencing machines.