Aug 14, 2014

Recently a postdoc in the lab asked me whether it is worth joining the editorial board of a new open access journal, and whether this would be viewed favorably during his later faculty job search.

There has been a wave of new open access journals. Frankly, with the large number of journals making papers open 6 months after publication, the NIH requirement to deposit all NIH-funded publications in PubMed Central, and the large number of existing good open access journals, it doesn’t make a whole lot of sense from a scientific point of view to start new ones. Unless we work in a new field (like bioinformatics 15 years ago or nanotechnology 10 years ago), don’t we have enough places to publish good science already?

Many good journals don’t ask their faculty editors to handle peer review; instead, they have well-trained scientists as full-time editors to handle the logistics. These journals consult their faculty editors on topics to cover in a special issue, on writing special reviews or giving expert opinions in interviews, or on reaching a decision on a paper where the reviewers could not reach consensus. I understand that, depending on the field, this may not be financially possible for some good journals, but I believe it is a much better use of faculty expertise and time.

Having been on several faculty recruitment and promotion committees, and having written promotion evaluation letters for many colleagues, I would say that being on a journal editorial board is useful, but only if the journal is reputable. Instead of being on the editorial board of low-profile journals, postdocs and junior faculty can probably benefit more from the experience of reviewing papers for high-profile journals. Since “high profile” might mean different things to different people and fields, I would say: only serve on the editorial board of a journal if you often read papers from that journal.

Aug 02, 2014

Recently I encountered a number of students from China using my signature. When visiting students asked me to write letters for visiting invitations, apartment rentals, or bank applications, I often asked them to draft the letter so I could put it on my letterhead with my signature. When the students sent me the drafted letter, to my surprise the Word document a number of times already contained my electronic signature. I asked the students where they got my signature, and they answered that they had cropped it from a PDF I had sent them earlier and pasted it into the Word document.

Students probably don’t understand that it is a very serious problem to reuse another person’s signature without permission from the signature owner each and every time the signature is used. This violates the honor code, and is treated much like cheating in exams, fabricating data in papers, or stealing other people’s credit cards. People who crop someone else’s signature to use in one letter will always be under suspicion of fabricating reference letters later. I would like to seriously warn students against ever doing this.

Jul 25, 2014

Just finished the book “Walt Disney: The Triumph of American Imagination”. Although I later found that it is not the Disney book with the best reviews, I still thoroughly enjoyed it. The introduction at the beginning was a bit scattered and boring, but as soon as the story begins, it is fun to read.

As I was “reading” (listening to the audiobook on my commute), I felt that in many ways Disney is similar to Steve Jobs. Like Steve Jobs, Walt Disney was always totally passionate about and absorbed in his projects, and extremely detail-oriented. Unlike Steve Jobs, who was always (at least as portrayed in the book) completely confident like a maniac, Walt Disney had his doubts and struggles when what he believed in didn’t get the desired outcome, and he would revise his approach or let his delegates try new things to see how they worked. He continued to challenge himself to be more creative, and was willing to make a lot of practical compromises. The book also covered a more humane side of him than Jobs’s book did, e.g. how Disney treated his family, his old colleagues, and his teachers, and how he worked with the Disneyland workers late into the night before it opened. As a scientist, I got totally inspired!

By chance I found this blog: Ten Things I’ve Learned from Walt Disney. Can’t agree more!

Jul 01, 2014

Recently heard some talks about single-cell gene expression using either Fluidigm’s microfluidic chip or CyTOF. Fluidigm is a microfluidics approach that can do multiplex RNA expression of a few hundred genes in single cells, and the user just needs to custom-design qPCR primers for the target genes. CyTOF is a proteomics approach to look at the protein expression of ~50 genes in single cells, if antibodies for the proteins of interest are available. Both give amazingly robust results and, at the current level, seem to be more cost-effective than single-cell RNA-seq. For most biological systems, robustly testing 50-300 genes in single cells in a population will be enough to gain the insights people get from single-cell RNA-seq, at a much lower cost. A potentially interesting bioinformatics problem will be better selection of the genes to test. Heard from colleagues that Fluidigm bought the company that made CyTOF, and that they will be pushing out a new machine that combines the two capabilities. I look forward to many exciting new discoveries and opportunities in this area.

May 01, 2014

I found Skype interviews (with video) to be very effective in screening applicants. An hour invested in Skype can save a whole-day process. Through these interviews, I found some common issues with candidates that I would like to discuss in this blog. Hopefully it will help future candidates better prepare for a Skype interview.

    The day before or a few hours before the Skype meeting, the candidate should send an email with his/her CV and presentation slides. The presentation slides should summarize the candidate’s previous research work, and present one good study in more detail. The total time of the presentation should be about 20 min.
    The meeting starts with the candidate checking with the faculty that s/he has received all the application material and reference letters. If any letters are missing, the candidate should follow up with his/her references after the meeting to get those letters sent.
    The first 20-25 min will be spent on the candidate going through the presentation on his/her previous work. This will showcase the candidate’s technical and communication skills.
    The next 20 min is often spent discussing the current research in the faculty’s lab. This includes discussion of the faculty’s published work, on which the candidate can ask for clarification or raise questions. The candidate should do his/her due diligence, and show a good mastery of the faculty’s published work. The discussion also includes the faculty explaining recent unpublished results in the lab, which will showcase some of the most exciting projects and opportunities in the lab.
    The last 15 min will be about an area or a project that the candidate plans to work on in the faculty’s lab during his/her postdoc period. It should be an area of great interest to both the candidate and the faculty, one that can use the existing expertise of both but also allows both (especially the postdoc) to expand their expertise, learn new things, and make an impact. It is very likely that, depending on new technology development, resources, or other opportunities, the postdoc might end up doing something different from what s/he proposed initially. However, having this discussion during the interview demonstrates the candidate’s ability for independent and critical thinking, and the ability to identify good problems to solve.

In summary, candidates should not treat a Skype interview as an hour of free chat. Instead, they should read, think, and prepare well, so that the candidate and the faculty can learn the most about each other in a short time. This not only allows the faculty to better evaluate a candidate, but should also be a beneficial experience for the candidate in learning about and planning his/her future work.

Feb 25, 2014

Saw some very provocative blog articles on Lior Pachter’s blog attacking Barabasi and Kellis’s recent network biology papers, and the Kellis group response. Lior is probably a little harsh in his tone, but the irregularities he pointed out in the Kellis study are probably true. Otherwise, it would have been easy for Manolis to show Lior the code / data to reproduce the figures and claim the $100, story closed. The group’s reputation is worth the effort, regardless of whether the wager is $100 or $60K. In fact, any bright graduate student working on network biology should try out the step-by-step instructions by Manolis, study the code, and post their results on Lior’s blog. This not only is a good learning experience for the students, but also a big favor to the many scientists who are curious about the results.

Confucius says (don’t know whether I can translate this well): if someone points out a problem in us and we indeed have it, we correct it; if we don’t, we should remind ourselves not to fall into such problems. We have experienced some difficulties in trying to reproduce the results of other high-profile papers, mostly because the supplementary material was not detailed enough. It is a painful process, I have to say, so I think calling for more detailed supplementary material and code is reasonable.

I myself might fall victim too if others take a closer look at our studies, and that’s what I will ask my group to be careful about in our future publications. For the papers we have already published, I can only pray we did as much as we should have. Jun Liu once told me that papers we published are like sprinkled water, and in Chinese, this phrase is used to describe married daughters. Actually, my close colleagues published a Nat Genetics paper evaluating the analysis results of all the Nat Genetics papers from the previous 2 years. Our own Carroll et al paper was evaluated and unfortunately was not among the 2 they called reproducible. Although I wasn’t happy that we were called “irreproducible” when their differentially expressed gene list was within 1% of our reported list, I could only ask lab members to be more specific about our parameter settings the next time.

Lior’s next blog is quite personal and damning, which I am not sure I approve of. But I like the last two paragraphs on “Methods matter”, and especially the objective comments by Erik van Nimwegen and Marc Robinson-Rechavi. The blog calls for scientists to provide enough methodological details and code for their papers, for reviewers to take a more serious look at the methods when evaluating manuscripts, and for newcomers not to judge a paper merely by its journal’s impact factor. To get bioinformatics papers into high-profile journals, studies often overstate their results. The blog about publishing bioinformatics in high-profile journals, although funny and cynical, has some truth to it. Good computational biology methods will stand the test of time, and good conceptual ideas might benefit many other computational biology studies, even if they don’t look totally novel or revolutionary or get published in high-profile journals.

It might not be fair to focus just on individual investigators. Computational biology as a discipline should aim to establish its credibility and earn the respect of colleagues in math / statistics, computer science, and biology. I hope computational biologists can recognize the problem, and have a community of peers with whom to discuss it and work together. I hope knowing there are scientists like Lior out there will make all of us more rigorous scientists. In fact, Lior’s blog puts him to the test too, and he has to be a good model himself in his own future studies. Time will tell…

P.S. Had a talk with Rafa and Cliff 2 days after I posted the original blog, and they made some excellent points. It is OK for Lior to attack Manolis’ paper on technical grounds. Scientists should be able to openly criticize other people’s science, whether the authors are friends or foes. But accusing the authors of fraud is something very serious and totally different in nature from calling a paper nonsense. We genomics and informatics people, when reading the blog, can understand this. However, if people outside the field who don’t understand the nuances get the message that an MIT computational biology professor committed fraud, it really hurts people. With this type of damage, you can’t put the genie back in the bottle, and this kind of accusation is unhealthy for the field. From Lior’s blog, what I agree with is that Manolis’ method might not be as novel, the parameter setting was not rigorous, and the method might not work as well as the paper claimed. This has been a systemic problem of a promising new field, an issue Rafa and many other colleagues have raised before and one we should all be more careful about, but I would never accuse the authors of fraud. Lior’s points about “methods matter” and “not overstating the results” could have reached a wider audience if he had attacked the science of the paper rather than the authors’ integrity. If my blog above has led people to believe otherwise, then I can understand the authors’ chagrin, and I offer my public apologies to Manolis and his co-authors.

Feb 11, 2014

Just heard about Illumina’s NextSeq machine. It is a desktop machine that delivers the speed (one-day runs) and the reads (400M 2 × 150 bp reads), and will really democratize sequencing. This might be the best machine for several labs in a department, a floor, or a small center. It might be extremely valuable for clinical applications, and will probably replace HiSeq, MiSeq, or Ion Proton as the workhorse for research investigator sequencing. Found a very interesting blog about NextSeq. I do believe that the NextSeq will be very appealing to many labs if the two-color thing works out.

If indeed departments or several labs share a NextSeq, the informatics might become a bottleneck. That’s where Bina Technologies might come to the rescue.

Update May 2014: DFCI bought a NextSeq and it is really delivering both the speed and the reads, so we are getting another one. It is amazing how quickly and reliably Illumina is pushing out each new generation of their sequencing machines.

Jan 16, 2014

Here is to another unfinished blog article I started last summer…

When I was a first semester graduate student at Stanford, because of some difficulties in the AI class (I took CS221 without taking the prerequisite CS121), I felt that Stanford made a mistake admitting me there. When people praise our work after my talk, I also often feel afraid that they will find out some caveats in our algorithms or findings that our work could not fully address. I have a wonderful team of students, postdocs, and research scientists at DFCI, and I often worry that my team will think I am not smart, hard working, or caring enough.

During the career training in Texas in 2012, I learned that as women we are particularly vulnerable to feeling that we are not as great as people think, and to being afraid that sooner or later people will find out what a fraud we are. This is called Impostor Syndrome, and Sheryl Sandberg mentioned it in her book Lean In as well. It was an interesting revelation to me, although it didn’t stop me from feeling so just the same.

On a recent China trip, I watched the movie Hyde Park on Hudson. We normally see FDR (Franklin D. Roosevelt, not false discovery rate 🙂 ) pictured as the charming and confident president. But seeing FDR being carried from place to place by his valet, I wonder how humiliated he must have felt. In the movie, his night conversation with George VI was quite interesting and endearing. We all have our vulnerabilities, but that’s OK.

When reading shorter biographies of George Washington before, I couldn’t help marveling at his character, beneficence, and good judgement. During the China trip, I read a more detailed Washington biography by Ron Chernow. By the way, Ron Chernow is quite a master of biographies, and I read his biography of Alexander Hamilton 3 times. Anyway, the Washington book mentioned not only some blunders of his youth, but also his personality flaws and quirkiness. I guess none of us are saints… George Washington might not have been the most brilliant of generals, but his character and integrity made him one of the most respected founding fathers of America and probably one of the best human beings I have read about.

There are two things I learned from the Washington book that are directly applicable to the impostor syndrome. The first is that George Washington was always modest and respectful toward his colleagues, even the competitor generals during the war who reviled him behind his back. The second is that George Washington was extremely loyal to and supportive of his team members. It is like saying, “Sure, I may not be the best, but I never acted like I was. I just do the best I can.” Who can criticize that? This really disarms the impostor syndrome, but it is easier said than done. Interestingly, looking at my colleagues, I found that Bing Ren best fits these traits. No wonder he has earned the respect of so many colleagues!

Jan 15, 2014

I started this blog last summer, but didn’t finish, so let me kick off this year’s blog by finishing it…

A colleague sent me this blog from a Harvard Computer Science professor on how to have a balanced and happy tenure-track experience.

Very interesting advice! I especially liked her idea of considering the job a 7-year postdoc (the FAS tenure track is 7 years) and of working a fixed number of hours on a fixed amount of work. Actually, her advice could be good even for people with tenure. I thought life after tenure would be more relaxed, but the reality is that life after tenure can feel like being on a hamster wheel. So it is good to read an article like this sometimes, to slow down a little to smell the flowers and listen to our hearts. Indeed, we may not be the perfect parent, researcher, or professor, but there will certainly be a place for us in this world, and that will just have to be good enough for me.

Once I was quite stressed out at work, and my PhD advisor Jun Liu kindly offered to chat over coffee. He asked me whether I liked what I was doing. I said yes, but there was just too much to do. Jun said, “That makes the solution easier: pick the lifestyle you want to have, and make work fit in it.” Another colleague, Zhiping Weng, also once told me that we don’t want work to kill us, so if we sometimes have to let some balls drop because we are too swamped, there is no need to feel guilty. Before that, I often felt guilty for not contributing enough to writing a grant, missing paper submission deadlines, or giving an unsatisfactory talk. These conversations definitely made my life much better! I have learned not to feel guilty about past failures, so I can save some energy to do better the next time.

Oct 23, 2013

Just returned from a recent China trip, during which I attended an epigenetics retreat organized by Yang Shi, and a workshop for young bioinformatics PIs organized by Yi Zhao. Both were excellent meetings, and one common theme came up: should computational biologists do experiments? I should first say that even as a computational biologist, I am relatively weak in computer science, statistics, and machine learning. If I had Hastie / Tibshirani’s machine learning or Speed / Jun Liu’s statistics, doing experiments might not be necessary. With my limited quantitative abilities, my answer to the question is: we should do cutting-edge high-throughput experiments, develop the computational methods to help advance good techniques, answer useful biological questions in our own domains of interest, and continue to collaborate with experimental biologists.

I believe genomics and bioinformatics are like molecular biology in the 1980s. They are very useful, but only as tools to answer specific biological questions. From the recent epigenetics retreat, it is quite clear that many experimental labs are becoming very genomics and bioinformatics savvy. They have overcome the genomics learning curve, which will force computational biologists to be more independent. We have to have our own biological domain and our own biological questions. Sometimes it is impossible to answer a specific biological question completely by mining public data, so doing some experiments to generate data is necessary.

Of course, for very biologically focused questions, if simple experiments are enough to answer a great biological question, by all means do them. And if we can pay a company or core facility to do them for us, even better. Make sure the hypothesis is generated from mining a lot of data, and check the literature to make sure it hasn’t been published already. If it is an important biological question that any experimental biologist could easily come up with and the experiments are easy, rest assured these have been tried by the experimental biologists already. If the hypothesis is novel and arises from mining and modeling public data, but experimental validation is complicated, I would suggest collaborating with an experimental group to validate it. The problem with doing experimental validation ourselves is that, depending on the hypothesis, validation could mean a cell biology experiment one day, a biochemistry experiment the next, an imaging experiment the day after, and an animal model experiment next week. We simply don’t have the capacity to learn them all. This is no different from experimental biologists collaborating with each other, and collaborating with experimental biologists doesn’t mean we don’t have our own biological questions. As long as the biological question is really good, and the evidence for our hypothesis is strong enough, there will be interested experimental groups who are willing to help us, especially if we have helped these groups with informatics before.

There is one type of experiment which computational biologists should give priority to: one that not only generates data for us to analyze or validate, but also produces interesting data that allow us to develop computational algorithms. I would quote a colleague and collaborator of mine, Mitch Lazar: “computational biologists should stay at the cutting edge of technologies”. As computational biologists, we are most likely not strong enough in genomics to invent new techniques (if you can, that’s great), and inventing new techniques is risky and consumes time and money. Instead, early adoption of new techniques might be a better option. We can adopt these techniques to answer interesting biological questions, develop computational methods to help other biologists adopt these new techniques (e.g. MACS for ChIP-seq), identify potential biases that the technique developers had in analyzing their data (stay tuned for our Nat Methods paper on DNase-seq analysis, which is in revision), and find novel uses of these techniques that the original developers didn’t intend (e.g. using histone mark ChIP-seq to predict nucleosome positioning and TF binding).

In addition to being cutting edge, the experiments we do should:
1. Generate high-throughput data so we can apply our bioinformatics expertise to it. E.g. even though CRISPR/Cas9 is cutting edge, it might not be high throughput enough by itself, so we need to combine it with another good cutting-edge high-throughput experiment to make it work for us.
2. Be really as good as the technique developer claimed in the paper figures. With high-throughput data, anybody could pick some examples to show how great it works. But the key is to download the data and see how noisy the data is, whether the conclusion holds, and whether it really answers questions that previous experimental methods couldn’t. Sometimes one dataset is not enough, which might lead to methods that overfit, so we will need >=3 good datasets to develop a working algorithm.
3. Answer the biological questions we are interested in. E.g. if I am interested in transcription regulation, then ribo-seq would not be a good fit for me, because it investigates translational efficiency, not transcription regulation.
4. Not be too hard or too expensive to do. First of all, this would allow us to master the experiment within a reasonable time and budget. Also, for an algorithm to work well, there must later be enough public data to help improve the algorithm, and the wisdom of the crowd is important for evaluating different algorithms. E.g. for PPI, there are 3 groups generating the high-throughput data and 1,000 bioinformatics groups developing algorithms, and in the end the data generation groups’ algorithms always win no matter how much better the other algorithms are.

Computational biology PIs like me are often no good at attempting these experiments ourselves. We simply don’t have the time or the neat hands to make the experiment work, although as a young PI I had the opportunity of making such foolish attempts. Instead, hire a capable experimental postdoc to do it, just as experimental PIs recruit a capable bioinformatics postdoc. Even better, co-supervise the experimental postdoc with an experimental collaborator. This will help recruit a better experimental postdoc, and together the two labs could come up with a better experimental design and biological question. Besides, the experimental postdoc will also be more motivated to try these new techniques, rather than feeling like an experimental technician for the computational biology lab. The experimental group could help troubleshoot experimental difficulties, and the computational group could share its informatics expertise in analysis and algorithm development. Once we master the technique, we could use it to answer interesting biological questions in other systems and help advance the adoption of such techniques by others in the community, both experimentally (teaching others how to do the experiments) and computationally (developing algorithms for people to use in analyzing their new data). I really admire Myles Brown and Jason Carroll, as well as many pioneers in the genomics community, for their generosity in helping so many labs learn new genomic techniques. The field is moving so quickly that keeping techniques secret will only result in their being overtaken by newer techniques. Also, I believe in Benjamin Franklin’s idea of “doing well (themselves) by doing good (to others)”.

Besides the above reasons, doing experiments is also a logistical necessity in China. Funding agencies cap salary at 15% of the total grant amount. With an RMB 1M grant, if a computational biologist spent only 150K on salary and 150K on equipment and supplies, in the next round the funding agency might give only 300K, of which only 45K could be used on salary, and so on. Computational biology labs will be forced to close down in a few years if they don’t do experiments.
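To make the shrinking-budget arithmetic concrete, here is a minimal Python sketch. It rests on two simplifying assumptions taken from the example above rather than from any actual funding rule: the next grant equals whatever was spent in the previous round, and non-salary spending matches salary spending. The function name `simulate_grant_rounds` and its parameters are made up for illustration.

```python
SALARY_CAP_PERCENT = 15  # salary capped at 15% of the total grant, per the text


def simulate_grant_rounds(initial_grant_rmb, rounds):
    """Return a list of (grant, max_salary) tuples, one per funding round.

    Assumptions (illustrative only, not an actual agency policy):
      * the lab spends the full salary cap each round;
      * non-salary spending (equipment, supplies) equals salary spending;
      * the next grant equals the total spent in the current round.
    Amounts are integers in RMB to avoid floating-point rounding.
    """
    history = []
    grant = initial_grant_rmb
    for _ in range(rounds):
        max_salary = grant * SALARY_CAP_PERCENT // 100
        non_salary = max_salary  # assumption: purely computational lab
        history.append((grant, max_salary))
        grant = max_salary + non_salary  # assumption: next grant = amount spent
    return history


if __name__ == "__main__":
    for i, (grant, salary) in enumerate(simulate_grant_rounds(1_000_000, 4), 1):
        print(f"Round {i}: grant RMB {grant:,}, salary cap RMB {salary:,}")
```

Under these assumptions the grant goes from 1M to 300K to 90K within three rounds, with the salary cap falling from 150K to 45K to 13.5K, which is why a purely computational lab cannot sustain itself this way.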

Update in 2014: CRISPR/Cas9 is becoming high throughput, which means we are getting excited about it. Wink wink!