Advisor Interview: The Paradigm of Computational Biology Deepened by AI and HPC — From Parametron to Bioinformatics, Interview with Professor Kentaro Shimizu
March 15, 2025
Kentaro Shimizu, Professor Emeritus, University of Tokyo
Biological research faces the formidable challenges of massive data and complex computations. Predicting gene sequences and protein structures, or unraveling the principles behind them, requires time-intensive simulations and specialized software development, yet researchers capable of both are scarce. To break through these barriers and accelerate human progress, the VN Machine project has been launched. Today, we speak with Professor Kentaro Shimizu, an advisor to the project.
Professor Kentaro Shimizu, Professor Emeritus at the University of Tokyo, has been a leader in computational biology and bioinformatics for decades, carving a unique path from computer science to biology. In the 1960s, he earned his doctorate in Eiichi Goto’s lab, contributing to the development of the parametron*1 computer, one of the earliest innovative digital computers. In an era before machine learning became mainstream, Professor Shimizu pioneered computational methods to predict 3D protein structures from amino acid sequences alone, as well as advanced AI-driven approaches for genomics and proteomics, influencing countless subsequent studies. A leading figure in molecular dynamics (MD) simulation*2, he has studied protein folding and ligand binding, with significant applications in drug discovery and design, earning widespread recognition in the scientific community.
The VNM project builds on the insights of pioneers like Professor Shimizu, merging large-scale AI and high-performance computing (HPC) to pursue a world where large-scale data analysis and simulations are accessible even without specialized computing expertise. Today, we explore the possibilities and the future of biological research with him.
Kentaro Shimizu (Shimizu Kentaro)
Born in 1938. Professor Emeritus, University of Tokyo. After contributing to the development of the parametron computer in Eiichi Goto’s lab, he shifted his focus to bioinformatics in the early 1980s. As a pioneer in protein 3D structure prediction using machine learning and through molecular dynamics simulations of protein folding and ligand binding, he has significantly influenced drug discovery and the understanding of biology.
From Parametron to Bioinformatics
Kazuki Otsuka (hereafter, Otsuka)
Before you pursued bioinformatics, I understand that you were deeply involved in computer research. What were your main interests back then?
Professor Kentaro Shimizu (hereafter, Shimizu)
It was a lot of fun, simply put. I suspect you might feel the same way, Otsuka—when I’m coding, it makes me really happy. The deeper I go into the computer’s core or low-level software, the more it excites me.
Back then, computer resources were becoming available to a wider group of users, and I was extremely interested in creating something that couldn’t be done by one person or a small team alone, something cooperative—distributed processing or collaborative software. I continued working on aspects of that after moving into biology.
For example, I even published a paper on running MD (molecular dynamics) simulations*2 faster in a distributed environment.
Otsuka
So in a way, you were automating distributed processing. Considering this was in the 1990s, that sounds quite pioneering.
What do you consider your most representative work, Professor Shimizu?
Shimizu
Well, from before AI became the huge trend it is today, I was using machine learning and similar techniques for prediction—and even exploring automatic generation of prediction tools.
For instance, we developed software to predict protein structures and functions based on amino acid sequences.
Handling Massive Data – Practical Challenges in Computational Biology
Otsuka
I imagine biological data is quite large.
Shimizu
Yes. Right now, for example, we’re doing work that goes from sequence to structure, and the datasets are enormous. Some major databases won’t allow programmatic bulk downloads, and if you try to pull large amounts of data via a web interface, you’ll often run into restrictions.
So what we do instead is bring it all in locally. But because the data is so large, we struggle with how to handle it.
Otsuka
Large data also takes a lot of time to process.
Shimizu
Exactly. If it’s on a database site with fast external access, that would be ideal. But I was just reminded that it’s quite difficult to rapidly process or query large datasets at scale.
Otsuka
How big are we talking—terabytes?
Shimizu
Just the sequence data alone can amount to hundreds of gigabytes, and when you add 3D structures and dynamic structural data, it grows to terabytes.
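When datasets reach hundreds of gigabytes, a common workaround is to stream records one at a time rather than loading a file into memory. The sketch below is a minimal, hypothetical illustration of that pattern for FASTA-formatted sequence data; the filename and the length filter are invented examples, not part of Professor Shimizu's actual pipeline.

```python
# Illustrative sketch: streaming a large FASTA file so that sequence
# data far bigger than RAM never has to be loaded at once.
# The filename used in the usage comment is a hypothetical example.

def read_fasta(path):
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)

# Example use: count sequences longer than 300 residues without
# ever holding the whole file in memory.
# long_count = sum(1 for _, seq in read_fasta("sequences.fasta")
#                  if len(seq) > 300)
```

Because `read_fasta` is a generator, memory use stays proportional to a single record regardless of file size.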
Otsuka
I personally think it would be easiest if we had something like a supercomputer with effectively unlimited storage, where the data could reside and be accessed whenever needed for computation.
Shimizu
Yes, I agree. It’d be great if that environment were easier to come by. It’s also quite important that it be connected to the internet.
In other words, the data we use is typically public, not from a private drive somewhere. Having a way to easily use publicly available data as if it were local would be fantastic.
Otsuka
At VNM, we’re looking into a data hub that would provide common access from servers on the same local network, minimizing the need for downloads or hard copies.
We’re also considering how to commercialize publicly available data. Do you think data sales are realistic?
Shimizu
It used to be difficult, but if it’s specialized—say, targeted toward a specific type of R&D—then yes, it’s possible. We might need a TLO (Technology Licensing Organization) or a similar framework in place.
Ease of Use Can Prompt Investigations into Underexplored Mechanisms
Otsuka
People likely have a lot of ideas they’d love to pursue but can’t. How widespread do you think that issue is?
Shimizu
Looking at MD (molecular dynamics) simulations, for example, even though computing power keeps improving, fully enumerating all states is still impractical, so we rely on sampling. We do use AI in these areas, but it often becomes a black box.
If we could do something that genuinely explains phenomena, it would advance natural science.
Otsuka
So not just getting an answer you can plug in somewhere, but really revealing the underlying mechanism.
Shimizu
Yes. MD proceeds step by step according to physical laws, but when phenomena take a long time to occur, straightforward simulation can't always keep up. We need a more coarse-grained approach, or we have to sample the configurations generated by MD in effective ways, so various methods have been proposed.
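The "step by step according to physical laws" idea can be made concrete with a toy integrator. The sketch below applies velocity Verlet, a standard MD-family integrator, to a single particle in a one-dimensional harmonic potential; this is purely illustrative, standing in for the far larger force fields and atom counts of real MD software.

```python
# Toy illustration of stepwise physics-based simulation: a
# velocity-Verlet integrator for one particle in a 1-D harmonic
# potential. The harmonic force is a stand-in for a real force field.

def force(x, k=1.0):
    # Harmonic restoring force F = -k x.
    return -k * x

def velocity_verlet(x, v, dt, n_steps, m=1.0):
    """Advance position x and velocity v through n_steps of size dt."""
    f = force(x)
    traj = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / m) * dt * dt  # position update
        f_new = force(x)
        v = v + 0.5 * (f + f_new) / m * dt        # velocity update
        f = f_new
        traj.append(x)
    return x, v, traj
```

Verlet-family integrators are widely used in MD precisely because they conserve energy well over long runs, which matters when the interesting phenomenon may only appear late in the trajectory.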
Otsuka
I was talking to another biologist who said there’s still a lot to uncover simply by applying existing methods—plenty of data is still unexplored, and that work can yield results. Then there are researchers who aim to uncover the fundamental mechanisms themselves, and that likely requires writing new systems.
Ultimately, it depends on human curiosity or the specific type of problem each researcher wants to solve.
Shimizu
That’s a very important perspective, indeed.
Otsuka
If that’s the case, how many fall into the latter category? Someone suggested that maybe 10% of researchers in a given group focus on fundamental mechanisms. That would mean 10 in a group of 100, then 100 across Japan, then 10,000 globally.*3
Shimizu
Since they’re researchers, I think everyone wants to get at the mechanisms on some level. However, given that many grants demand results within a set timeframe, it can be hard to devote time to deeper questions.
Otsuka
A variety of factors—funding constraints, etc.—likely play a role.
Shimizu
Exactly. So if there’s software that, with a bit more work, can clarify or explain the phenomena, it might encourage people to give it a try.
Otsuka
If there were tools that didn’t require too much effort to master, people might use spare time to explore new ideas. Does that sound plausible?
Shimizu
Yes, it does. And I believe it’s important for scientific advancement.
Otsuka
I suspect there are quite a few researchers who’ve had ideas shelved for a long time because they weren’t feasible with current tools.
Shimizu
Right, that’s a big challenge.
Otsuka
I hope we can create a framework that makes those ideas feasible.
Fundamental concepts are often put off precisely because they’re so foundational. Having a way to execute them would be wonderful.
Shortening Computation Time Is Key to Discovery
Otsuka
What about HPC (High-Performance Computing)?
Shimizu
Simulations of proteins or nucleic acids are a good example of where faster is better. For instance, if you want to understand how a protein interacts with another molecule or how it changes conformation, you need prolonged simulations or extensive sampling. It would be great if we could speed that up.
Otsuka
Is it simply that the matrices are too big, or is there some other reason it takes so long?
Shimizu
It’s that the range of possible molecular configurations is huge, making it very challenging to sample broadly.
Otsuka
So there’s a massive combinatorial space to explore.
Shimizu
Exactly. There’s something called docking, where we examine how molecule A binds to molecule B.
Even when crystallography shows a particular binding pose, physics-based simulations sometimes fail to find that pose.
Broadly speaking, everyone knows that running MD for longer should yield better results, so if we can accelerate it with specialized hardware or refine the simulation granularity, accuracy improves.
We also need better models, but thorough sampling is key, which depends on runtime. You run it long enough to see a variety of configurations, and eventually the phenomenon you’re interested in might appear.
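One family of methods for exploring a huge configuration space without simply running MD longer is Monte Carlo sampling. The sketch below is a minimal Metropolis sampler over a hypothetical one-dimensional double-well energy landscape; the two wells loosely play the role of alternative binding poses, and everything here (the energy function, temperature parameter, step size) is an invented illustration, not any specific docking method.

```python
import math
import random

# Illustrative Metropolis sampler on a 1-D double-well energy surface,
# a toy stand-in for a molecule's configuration space. Long enough
# sampling visits both wells (both "poses").

def energy(x):
    return (x * x - 1.0) ** 2  # minima at x = -1 and x = +1

def metropolis(n_steps, beta=3.0, step=0.5, seed=0):
    """Return a list of sampled positions after n_steps proposals."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        d_e = energy(trial) - energy(x)
        # Accept downhill moves always, uphill moves with
        # Boltzmann probability exp(-beta * dE).
        if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
            x = trial
        samples.append(x)
    return samples
```

The acceptance rule means high-energy barriers are crossed only occasionally, which is the toy version of the point above: the phenomenon of interest may only appear after enough sampling time.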
Specialized Expertise vs. Generalized Models
Otsuka
From talking with experts in many fields, I’ve realized that not everyone loves formulas or computation.
Some are fully outside the realm of computing, others are deeply immersed, and some float in between.
I sense there’s a lot of untapped potential in those differences.
Shimizu
Indeed, things vary among individuals. In the early days, when I talked to people in, say, the Faculty of Agriculture, they were often very focused on “this particular protein” or “this particular gene,” drilling down into specifics. So not everyone was interested in modeling or generalization, just as you said.
From their perspective, “tweaking parameters” to interpret their hard-won experimental data could seem suspect.
Otsuka
How do you address that concern about “tweaking parameters” arbitrarily?
Shimizu
Informatics approaches have steadily improved accuracy. Even a provisional model that explains the phenomena has value, since it can serve as a hypothesis.
Furthermore, when more detailed experimental data emerges, you can fit it to the model to see whether it explains the observations. That’s important.
Otsuka
So even if someone does experiments, they might not necessarily build a model afterward.
But creating a model lets you make predictions, right?
Shimizu
Precisely—it allows prediction. That means if we can simulate and explain a phenomenon, we can then predict what happens next.
In the end, each researcher has a motivation to investigate whatever protein phenomenon they’re focusing on.
Otsuka
They’ll solve specific problems, but only some will go as far as building generalized models.
Shimizu
Yes, exactly. And when you generalize, they tend to be quite strict about parameter reliability. For instance, I’ve seen people question why an AUC-ROC of 0.9 isn’t 1.0. Some say that if it’s not perfect, they can’t move forward with it in a scientific context.
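For readers unfamiliar with the AUC-ROC figure mentioned here: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). The sketch below computes it directly from that definition; the example scores are invented for illustration.

```python
# AUC-ROC via the Mann-Whitney formulation: the fraction of
# (positive, negative) pairs where the positive is scored higher,
# counting ties as half.

def auc_roc(scores, labels):
    """labels: 1 for positive, 0 for negative; scores: model outputs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect separator gives 1.0; random scoring hovers around 0.5,
# which is why 0.9 is strong even though it is not 1.0.
```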
VNM’s Approach: Creating Custom Software Through Interaction
Otsuka
There’s a U.S. company called Rescale that recently raised a massive amount of funding to offer a research HPC cloud service mainly for enterprise clients. They host a huge variety of top-tier software that runs on the cloud.
For existing, well-established MD software, using those solutions is fine. We’re not aiming to compete directly; we’re more interested in the unmet needs that current solutions can’t address.
Shimizu
Yes, that resonates.
In protein dynamics, you have large domains within a protein, but AlphaFold struggles to predict how these domains move relative to each other—especially if flexible linkers connect them. So investigating how those domains move is itself a research subject. If we had breakthrough software for that, it’d be really valuable.
At the domain or chain level, how does the protein structure move or interact? There’s definitely a need there.
Otsuka
Thank you so much for sharing such valuable insights. It’s been tremendously helpful to learn from someone with deep knowledge of both computing and biology. I look forward to working with you in the future.
Shimizu
Likewise. Thank you, and let’s continue collaborating.
*1 Parametron is a logic circuit element invented in 1954 by Eiichi Goto, then a graduate student at the University of Tokyo. It significantly reduced the need for vacuum tubes and transistors in computer construction, leading to the building of several parametron-based computers. By the 1960s, parametron technology was largely replaced by transistor-based systems, but the same principle was later realized in various physical systems. Since the 2010s, parametron technology has attracted renewed attention in relation to quantum computing. (Source: Wikipedia)
*2 MD (molecular dynamics) simulation is a method for tracking the stepwise physical interactions of molecules, predicting the motion of proteins, nucleic acids, etc. AlphaFold excels in predicting static 3D structures but does not directly account for time-dependent changes (dynamics). Although it is useful for initial hypotheses in drug discovery, phenomena like ligand binding and molecular motion require physics-based MD simulation.
*3 The global researcher population (all fields) is estimated to be around 8.8 million. Biology and medical/life sciences account for around 36% of published papers, suggesting at least several million researchers in these areas. Thus, the “10,000” figure is a conservative back-of-the-envelope estimate. Considering, for instance, that the NIH alone funds over 27,000 basic research principal investigators, the number of researchers focusing on “fundamental mechanism elucidation” is easily in the tens or even hundreds of thousands. (Sources: UNESCO, national statistics, NSF reports on researcher numbers and publication data, etc.)