
By: Tuomo Hartonen and Alison Ollikainen
Let’s meet Tuomo Hartonen, a doctoral student in the Medical Systems Biology group at the University of Helsinki! The interdisciplinary research group is composed of both experimental and computer scientists, and it gives the Center of Excellence world-leading strength in high-throughput screening and the handling of big datasets. In this blog post, Tuomo tells us how he found himself leaving the world of physics to work in cancer research and explains how his coding experience helps tackle some of the biological questions about cancer cell proliferation.
Can you introduce yourself?
I work as a doctoral student studying gene regulation in colorectal cancer in the Medical Systems Biology group led by Professor Jussi Taipale. My title is doctoral student, but that is a little misleading: I actually spend very little time on courses and most of my time on my own research. On the other hand, I study and learn something new every day by reading new scientific publications and through discussions with my supervisors and colleagues.

Where do you work?
My office is on the Meilahti campus of the University of Helsinki, where our research group is part of the Center of Excellence in Tumor Genetics. I have actually just returned from a nine-month research visit to the Department of Biochemistry at the University of Cambridge in the UK, where I continued working on my PhD. I believe it is important in science to be open to new ideas and collaborations, both of which can be inspired by a change of scenery! For me, this was actually my second research visit abroad; the first one was at CERN, during my theoretical physics studies…

Wait a second! So you studied theoretical physics? How did you find yourself working in cancer research?
Well, yes, I actually did both my bachelor’s and master’s theses in theoretical physics at the University of Helsinki, so I guess my background sounds a little unusual for cancer research. During my studies, I gradually grew interested in applying physics and mathematical methods to biological problems. Initially, I thought about becoming a particle physicist (hence the summer internship at CERN), but in the end, as my interest in biology and medical research grew, I noticed that the Faculty of Medicine offered a Master’s degree in “Translational Medicine”. I quickly learned that many of the computational methods and techniques I had studied in physics and computer science are also relevant and applicable here.
What are you studying?
I study gene regulation in colorectal cancer, focusing on the non-coding genome. The human genome can be divided into the coding and the non-coding genome. Only a small part of the non-coding genome is actually involved in regulating gene expression in any given cell. It is also not fully understood which DNA sequence elements, and which combinations of them, actively regulate gene expression in different cell types, or how they do so. Therefore, it is more difficult to understand the effect of a mutation in the non-coding genome than in the coding genome and, thus far, the disease-causing mechanism has been explained for only a handful of these non-coding mutations. We are trying to establish what kinds of patterns in the DNA sequence mark positions that actively regulate gene expression in cancer cells and in normal cells. With the help of these results, we can better understand how a cell knows how much to grow and which proteins it should produce. I believe it will then be easier to explain how mutations in the non-coding genome relate to cancer and other diseases. Understanding these disease mechanisms can open up new avenues for diagnosis and treatment.
From the perspective of a data analyst, what does a typical day look like for you?
My typical day starts with a nice 5 km cycle to work. Once I arrive, I usually log in to our group’s computing servers to see whether my jobs have finished during the night. By a “job”, I mean a computational data-analysis task. Nowadays, for example, we study gene regulation by training neural networks on enhancer activity data measured from cell lines, and I run these analyses on our group’s servers. If some jobs have finished successfully, I inspect the results, and if there is something really interesting, I discuss it with my supervisors and we decide how to proceed. If the results don’t look quite as good as expected, we think about why that is:
Is it possible that there is a mistake in the analysis code?
Are the parameter values used in the analysis sensible or should they be changed?
Was our initial hypothesis incorrect?
Data-driven science is very iterative, meaning that it usually takes many different types of analysis of big data sets to understand the biological processes that were at work during the experiment. Most of my time on a typical day goes to writing analysis code or learning how to use analysis software that other scientists have published. The process starts with trying to conceptualise the problem we are studying and then thinking about what kind of analysis I need in order to solve it. First, I check whether someone has already solved the problem, so that we can use their software or analytical results to help analyse our data (standing on the shoulders of giants, right? 😉).
If no existing tools are applicable, I will write a computer program to solve the problem. I do this by breaking our question into a series of tasks the computer needs to do in order for us to “get the results”. How to break the scientific question into these smaller tasks is something you learn through experience in “coding”. I think the best part of my job is that I constantly get to learn new things, and because of that, the work is rarely repetitive.
How do you know where to start when you have to write the code yourself?
I am going to let you in on a secret: most programmers don’t just sit in front of the computer and write the program from memory. Instead, there are many resources on the internet where people describe the most efficient ways of solving small, common programming tasks in the programming language of your choice. I have also found that following relevant scientific journals and scientists on Twitter is a good way to keep up with the latest research in my field. Sometimes we get lucky and are able to combine methods published by other scientists to analyse our experimental data, but most of the time it is a combination of using some existing tools and writing our own new analysis code.

Click on the links to check out a couple of papers in Nature and Bioinformatics that Tuomo has contributed to during his PhD studies.