Oct 11, 2014

Over the years, I have decided to categorize bioinformatics (or computational biology) research into the following five levels. Here I don’t make any distinction between bioinformatics and computational biology, and so I use the words interchangeably.

Level 0 (if you remember Kung Fu Panda) is “modeling for modeling’s sake”. I remember years ago someone asked me, “There are plenty of opportunities to do modeling with the large amount of available GEO data. What project shall we work on?” I asked him, “What problems would you like to answer?”, and he answered, “Modeling problems”. This is totally OK if the scientists consider themselves only mathematicians, statisticians, computer scientists, or physicists, since there are indeed many good theoretical modeling problems in their respective fields, but it is not OK if they are serious about bioinformatics or computational biology research. Many Level 0 bioinformaticians never read or publish in biological journals or attend biological conferences, so in a way they haven’t yet got in the door of biomedical research. Level 0 research is often read and cited only by the authors themselves and by other people who also conduct only Level 0 research, so it is quite a waste of resources.

Level 1 is analyzing unpublished data from one’s own lab or collaborators and trying to make novel biological findings. This is a much more useful endeavor than Level 0 bioinformatics, and is a great way to train bioinformaticians. We can practice our existing bioinformatics skills to make real biological findings, learn new bioinformatics skills and a great deal of biology, and, more importantly, stimulate insights and ideas for Level 2 and Level 3 projects. A Level 1 study can be evaluated on several points. First, how complicated the data is (e.g. the total data volume and the number of data types). Second, whether the bioinformatician needed to create new algorithms or only used other people’s tools to analyze the data (e.g. as described in the methods section). Third, how essential the bioinformatics analysis is to the overall project (e.g. how many figures were generated by the bioinformatician, and whether the main hypothesis came from an informatics analysis). Fourth, whether the experimental and computational sides had real, fruitful interactions. In a published paper, more cycles of alternating experimental and computational results suggest that the experiments and computational analyses informed each other at each next step. This is in contrast to studies where all the data was generated first and bioinformatics analysis was then used to summarize and integrate it, which sometimes yields no real findings, only descriptive results without experimental validation. Finally, whether there are real and significant biological findings in the study (which can be judged from the abstract).

Level 2 is developing 1) methods to solve general quantitative problems in big data studies that are especially relevant to biomedical research (e.g. Qvalue for FDR), 2) computational algorithms for analyzing data from a new high throughput technique (e.g. RMA or Bowtie), or 3) databases or resources that integrate many other public data sets (e.g. Oncomine). I consider this a higher level of bioinformatics research, since in a Level 1 project the bioinformatician only helps their own collaborators, while a good Level 2 project can help thousands of other biologists. Usually these algorithms or resources should address an important and timely biological problem or technical challenge. They don’t have to be published in a high profile venue, and only time can tell their real significance, based on usage and citations. The method may or may not be extremely novel (a previously developed statistical or computational method applied to a new biological problem is sufficiently novel), but it really has to work and be user friendly. The developers often need to put in a lot of additional effort after the initial publication to maintain and update the algorithm or resource, even without future publications. The developers don’t necessarily get sufficient credit from the publication directly, but will do well (when their papers or grants get reviewed) by doing good for the community. Also, to do well in Level 2 research, bioinformaticians should stay focused on their biological domain, so that they have a good understanding of the new computational methods or experimental techniques that are the most relevant or useful in that domain.

Level 3 is integrating public high throughput data in a smart way to make good biological findings, so the study often starts from public data and ends in experimental validation. This requires the bioinformatician to have solid biological knowledge and to be able to come up with their own interesting biological questions. The bioinformatician can lead a biological project in which experimental collaborators trust the correctness and significance of the predictions enough to be willing to conduct experimental validation. Some well designed Level 3 findings can even be validated in silico, although unfortunately experimental biologists sometimes do not accept even a solid in silico validation. With more and more public data in resources like GEO, there will be increasing opportunities for Level 3 research. These studies should be evaluated by whether the biological question is interesting, whether the integration is smart and sound, and often by the level of the journal where the study is published (as compared to purely experimental studies).

Level X is where bioinformaticians provide the key integration and modeling for the massive amounts of data generated by big consortia. A good biomarker for Level X bioinformatics (good specificity, not so good sensitivity) is when numbers appear in the paper’s title. Only bioinformaticians with a good Level 1 and Level 2 track record and good leadership in team science are recruited to the consortia and eligible for Level X research. These studies often get published in very high profile journals with excellent citations, and take tremendous effort from the informatics lead authors and coordination from all the senior authors. Although the informatics integration is necessary to get the consortium paper published, sometimes the data trumps the informatics, i.e. the journal judges the paper by its data and potential citations and not by the informatics. Also, first authorship often reflects the leadership of the first author’s PI more than the technical capability and creativity of the first author, so first authors may not get sufficient recognition in their future careers for Level X publications. Therefore, first authors of these studies, especially after they become independent, need to establish their own scientific reputation independent of the Level X projects. It might be beneficial for a PI to be involved in some Level X consortia, as these consortia often have members who are pioneers in their respective communities. However, being funded only by Level X projects and publishing only Level X studies might be a sign that the PI is focused more on the politics than on the science.

A bioinformatician in training should probably first learn the basic bioinformatics skills and start on Level 1 projects, then move towards Level 2 and Level 3 projects as his / her biological understanding and computational techniques improve. As the bioinformatician matures and gains experience over time, s/he should preferably have a balance of Level 1, 2, and 3 projects, with the option of doing some Level X studies. In fact, if resources allow, it is probably healthier for an established bioinformatics PI to conduct research at all levels from 1 to X than at just one level. There are also many bioinformaticians, including myself, who are starting to conduct experimental research and generate experimental data themselves. The experimental component of their research should be compared with the work of other experimental biologists, and the informatics component could still be evaluated according to the above 5 categories. Next time you read genomics and bioinformatics papers, ask “what is the level of their bioinformatics work?” Try to evaluate the bioinformatics work objectively, instead of by the impact factor of the journal in which the study is published.

  4 Responses to “Levels of Bioinformatics Research”

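  1. […] Enough chit-chat, on to business. On the evening of the 8th, the scholars attending the meeting held a “youth salon”. The format was lively and all sorts of questions were discussed; someone (I forget who) asked, roughly, how to evaluate the level of a bioinformatics researcher. Everyone chimed in for quite a while, and in the end Shirley summed things up; everyone listened and agreed it made sense. So after getting back, Shirley wrote the blog post “Levels of Bioinformatics Research”. I suggest my fellow colleagues read it: a five-star recommendation. Bioinformatics is a relatively new research field and an interdisciplinary one, so in general the orthodox computational people feel you have made no methodological contribution, while the biologists feel you are really just the person who fixes computers. Just as I was writing this, the phone rang: the second floor was calling to ask me to come and see why a computer screen had gone black. Fine. I clattered down the stairs, puzzled over the computer for a while, and figured it out: the plug was not pushed in properly. Sorted, back to the office. So evaluation in an interdisciplinary field is a problem: you please neither side, and being rated as a bit player already counts as being shown some respect. That is why peer evaluation is the reasonable approach, and fortunately there are now quite a few of us in bioinformatics, so setting up a peer-evaluation mechanism is not difficult. My blogging style is like stepping on a watermelon rind: I write wherever I slide. For this post I will try to stay faithful to Shirley’s original and not change the main points. […]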

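  2. level 0: modeling for modeling’s sake; the focus is really on purely technical matters;
    level 1: the lab generates its own data and does some simple analyses, basically serving only itself or its collaborators;
    level 2: creating algorithms to solve quantitative problems in big data processing;
    level 3: mining new biological findings from public data and validating them;
    level X: carrying out the key integration and modeling of large amounts of public data.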

  3. […] Shirley Liu’s Blog Site: Levels of Bioinformatics Research, followed by the translation by Xue Yu (薛宇) on ScienceNet (科学网): How to Become a Top Bioinformatician […]
