Proteomics is rapidly advancing as a promising avenue for discovering predictive cancer biomarkers. Recently, ProCan and the Wellcome Sanger Institute published the world’s largest pan-cancer proteomic dataset of 949 cell lines, treated with 625 anti-cancer drugs, representing a rich resource for biomarker discovery. Furthermore, the rapid expansion of multi-omic datasets promises new insights into drug response mechanisms, but adequate tools to harness such multifaceted data are currently lacking.
Previous studies have correlated single proteins in cell line proteomic data with drug susceptibility. However, due to high computational demand, identifying pair-wise and higher-order (triplet and quadruplet) interactions that synergistically modulate drug susceptibility is beyond the scope of current methods. We introduce a novel machine learning method utilising the random forest algorithm to identify higher-order proteomic synergies underlying drug response. By incorporating AlphaFold in our approach, we can propose putative interactions between our significant hits. Furthermore, our versatile methodology can identify cross-omic higher-order signatures in any multi-omic dataset.
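To give a sense of the computational demand referred to above, the following minimal sketch counts how many candidate combinations an exhaustive search would need to test at each interaction order. It is an illustration only and not part of our method: the function name `candidate_counts` and the protein count of 5,000 are hypothetical, chosen purely for the arithmetic.

```python
from math import comb


def candidate_counts(n_proteins: int, max_order: int = 4) -> dict[int, int]:
    """Count the distinct protein combinations of each order from 2 to max_order.

    Order 2 corresponds to pairs, 3 to triplets and 4 to quadruplets.
    """
    return {k: comb(n_proteins, k) for k in range(2, max_order + 1)}


if __name__ == "__main__":
    # Hypothetical number of quantified proteins (an assumption for
    # illustration, not a figure taken from the dataset described above).
    n_proteins = 5000
    for order, count in candidate_counts(n_proteins).items():
        print(f"order {order}: {count:,} candidate combinations per drug")
```

Because the number of combinations of order k grows roughly as n^k / k!, each additional order multiplies the search space by a factor on the scale of the number of proteins, and the whole search must be repeated for every drug.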
Our method uncovers “global” baseline signatures predicting drug susceptibility that recur across all drug classes, and “local” signatures that exclusively predict susceptibility to specific drug classes. We replicate 183 interactions in an independent dataset of 76 breast cancer cell lines. Among our notable findings, EGFR is recurrently identified as a “hub” protein central to many interactions involved in sensitivity to tyrosine kinase inhibitors (TKIs), corroborating clinical observations that certain EGFR mutations predict favourable responses to TKIs such as erlotinib and gefitinib. Conversely, vimentin is identified as a resistance biomarker for TKIs, aligning with studies showing that epithelial-to-mesenchymal transition and gefitinib resistance are associated with increased vimentin expression.
Taken together, our findings contribute towards the goal of leveraging ‘omic data to guide cancer precision medicine, leading to more effective, personalised treatments for cancer patients.