Speaker: 

Uri Keich

Institution: 

University of Sydney (Australia)

Time: 

Monday, January 25, 2021 - 4:00pm to 5:00pm

Location: 

Zoom (follow link below)
A typical shotgun proteomics experiment produces thousands of tandem mass spectra, each of which can be tentatively assigned a corresponding peptide by using a database search procedure that looks for a peptide-spectrum match (PSM) that optimizes the score assigned to a matched pair. Some of the resulting PSMs will be correct while others will be false, and we need to decide which ones we report.
Determining which PSMs are correct is an example of the multiple testing problem. More generally, researchers these days often test thousands of null hypotheses simultaneously (e.g., "the PSM is incorrect"), trying to decide which ones to reject or, equivalently, report as discoveries. The common approach to controlling the statistical error is to employ procedures that control the False Discovery Rate (FDR), which is defined as the expected proportion of false discoveries among all reported discoveries.
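For reference, the FDR definition above is usually written as follows. The symbols are notation introduced here for illustration, not taken from the talk: R is the number of reported discoveries and V is the number of those that are false, with the proportion taken to be 0 when nothing is reported.

```latex
\mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right]
```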
Starting with Benjamini & Hochberg's 1995 paper, there is a rich statistical literature on controlling the FDR, assuming we can assign a p-value to each of our tests. In the absence of such p-values, the mass spectrometry community has largely relied on a "home grown" method called target-decoy competition (Elias & Gygi 2007) to control the FDR in the PSM context. This general approach of generating competing null scores recently gained popularity after Barber and Candès (2015) used the same principle in their knockoff+ procedure to control the FDR in feature selection in a classical linear regression model.
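As a rough illustration of the competition idea, the sketch below pairs each spectrum's best target score with its best decoy score, keeps the winner, and reports target wins above the lowest score cutoff whose estimated FDR stays within a threshold. It is a minimal sketch under stated assumptions, not the speaker's method: the function name tdc_discoveries and its parameters are hypothetical, ties between target and decoy are simply dropped (they are typically broken at random in practice), and the "+1" in the estimate follows the knockoff+ style correction rather than the original target-decoy formulation.

```python
def tdc_discoveries(target_scores, decoy_scores, alpha):
    """Return indices of spectra whose target PSM is reported at level alpha.

    Hypothetical helper for illustration. target_scores[i] and decoy_scores[i]
    are the best target and best decoy match scores for spectrum i
    (higher is better).
    """
    # Competition step: each spectrum keeps only its better-scoring match.
    winners = []  # entries are (score, is_target, spectrum_index)
    for i, (t, d) in enumerate(zip(target_scores, decoy_scores)):
        if t > d:
            winners.append((t, True, i))
        elif d > t:
            winners.append((d, False, i))
        # Assumption: exact ties are dropped to keep the sketch deterministic.

    # Rank the winning matches from best to worst score.
    winners.sort(key=lambda w: w[0], reverse=True)

    # Walk down the ranking; decoy wins stand in for false target wins.
    n_targets = 0
    n_decoys = 0
    best_k = 0
    for k, (_, is_target, _) in enumerate(winners, start=1):
        if is_target:
            n_targets += 1
        else:
            n_decoys += 1
        # Assumption: knockoff+ style estimate with the "+1" correction.
        estimated_fdr = (n_decoys + 1) / max(n_targets, 1)
        if estimated_fdr <= alpha:
            best_k = k  # keep the largest cutoff that satisfies the bound

    # Report only the target wins among the top best_k matches.
    return sorted(i for _, is_target, i in winners[:best_k] if is_target)


if __name__ == "__main__":
    targets = [10, 9, 8, 7, 1]
    decoys = [2, 3, 1, 2, 6]
    print(tdc_discoveries(targets, decoys, alpha=0.25))
```

With so few spectra the "+1" term dominates the estimate, so small thresholds yield no discoveries; with thousands of spectra, as in a typical experiment, its effect is much smaller.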
We will start by introducing the problem of tandem mass spectrum identification and giving an overview of FDR control in the canonical multiple testing problem. The rest of the talk will focus on how the competition-based approach works and on our recent contributions to this area.