Multitask Machine Learning

Industries

Healthcare & Life Sciences

Expertise

Artificial Intelligence & Machine Learning

Technologies

Python

Client

Our client is a major international pharmaceutical company that conducts research and development across a wide range of human medical disorders, including mental illness, neurological disorders, and cancer.

Challenge

For every pharmaceutical company, it is vitally important to maintain its drug discovery pipeline, that is, the set of drug candidates under development. The search for drug candidates is extremely expensive, because it requires thousands of experiments to find substances that have the desired effect on biological targets.

Our client tasked us with delivering a very fast and highly scalable data pipeline that uses different machine learning algorithms to learn and predict chemical compound activity, thereby reducing the number of real experiments required.

Among other tasks (see the case study ‘Machine Learning for Biochemistry’ for a full description), we were to develop a solution that supports multi-task learning, i.e. the ability of a model to solve several learning tasks at the same time and to exploit commonalities and differences across them. This makes learning more efficient, because resources are shared between tasks and related tasks can improve one another.

Solution

Our ML models support multitasking. This means that the target tensor is a 2D matrix that covers several tasks at once (e.g. several biological targets). For classification, we also support multilabel multitask target tensors, which contain multiple labels for each task.
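
To make the idea of a 2D multitask target concrete, here is a minimal sketch in plain Python. It is an illustration only, not the framework's actual code. It assumes that rows are compounds and columns are tasks, that labels are binary, and that an unmeasured compound/target pair is marked with `None`. The function name `masked_multitask_bce` and the `None` convention are hypothetical.

```python
import math


def masked_multitask_bce(probs, targets):
    """Mean binary cross-entropy per task, skipping missing labels.

    probs[i][t]   -- predicted probability for compound i on task t
    targets[i][t] -- 0, 1, or None (assumed marker for "not measured")
    Returns one loss per task; a task with no labels at all yields None.
    """
    n_tasks = len(targets[0])
    sums = [0.0] * n_tasks
    counts = [0] * n_tasks
    eps = 1e-12  # keeps log() away from 0
    for p_row, y_row in zip(probs, targets):
        for t, (p, y) in enumerate(zip(p_row, y_row)):
            if y is None:
                continue  # missing label: contributes nothing to this task
            p = min(max(p, eps), 1.0 - eps)
            sums[t] -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            counts[t] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Three compounds (rows) measured against three targets (columns).
targets = [
    [1, 0, None],
    [0, None, 1],
    [1, 1, 0],
]
probs = [
    [0.9, 0.2, 0.5],
    [0.1, 0.7, 0.8],
    [0.8, 0.6, 0.3],
]
losses = masked_multitask_bce(probs, targets)
```

Keeping one loss value per task, rather than a single scalar, is what lets task-balancing methods weight or group the tasks afterwards.
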
We also created a framework that speeds up multitask calculations and improves model performance using several modern approaches:

Task Affinity Grouping improves model accuracy and reduces the number of tasks by finding groups of tasks that have a positive impact on each other.
AdaShare (an adaptive sharing approach) uses parameter sharing between tasks. The method learns how to decide ‘what to share across which tasks to achieve the best recognition accuracy, while taking resource efficiency into account’.
L-DRO (lookahead distributionally robust optimization) uses game theory to try to improve the task with the worst accuracy.
PCGrad operates on gradients: when the gradients of two tasks conflict, it projects one onto the normal plane of the other, which helps tasks with conflicting gradients.
MolTSE (Molecular Tasks Similarity Estimator) ‘projects individual tasks into a latent space and measures the distance between the embedded vectors to derive the task similarity estimation and thus enhance the molecular prediction results’.
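
As an illustration of one of these approaches, the sketch below shows the gradient projection step of PCGrad in plain Python, following the published description of the method rather than the framework's own implementation. Gradients are flat lists of floats, and the names `dot` and `pcgrad` are hypothetical. If a task's gradient has a negative inner product with another task's gradient, the conflicting component is removed by projection.

```python
import random


def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads, rng=None):
    """Project away conflicting gradient components, then sum over tasks.

    task_grads -- list of per-task gradients, each a flat list of floats
    rng        -- optional random.Random used to shuffle the task order
    Returns (combined_gradient, list_of_projected_task_gradients).
    """
    rng = rng or random.Random(0)
    projected = []
    for i, g_i in enumerate(task_grads):
        g = list(g_i)  # working copy of task i's gradient
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)  # other tasks are visited in random order
        for j in others:
            g_j = task_grads[j]  # original (unprojected) gradient of task j
            inner = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            if inner < 0 and norm_sq > 0:
                # Conflict: subtract the component of g that lies along g_j.
                scale = inner / norm_sq
                g = [x - scale * y for x, y in zip(g, g_j)]
        projected.append(g)
    combined = [sum(col) for col in zip(*projected)]
    return combined, projected


# Two tasks whose gradients conflict (their inner product is -1).
combined, projected = pcgrad([[1.0, 0.0], [-1.0, 1.0]])
```

After projection, each task's gradient is orthogonal to the gradient it conflicted with, so a step along the combined gradient no longer works directly against either task.
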

Results & Benefits

We adapted some of the original algorithms to handle a large number of tasks (over 500), so that the problem can be solved in a reasonable time. In our generic framework, users can plug in arbitrary neural networks of their own for binary and multilabel classification problems and for regression problems.
