Please use this identifier to cite or link to this item:
https://www.arca.fiocruz.br/handle/icict/51188
Type
Article
Copyright
Open access
Collections
- IOC - Artigos de Periódicos [12747]
LEVERAGING THE PARTITION SELECTION BIAS TO ACHIEVE A HIGH-QUALITY CLUSTERING OF MASS SPECTRA
Author
Affiliation
Fiocruz Paraná. Instituto Carlos Chagas. Laboratório de Proteômica Estrutural e Computacional. Curitiba, PR, Brasil.
Department of Chemical Biology, Leibniz – Forschungsinstitut für Molekulare Pharmakologie (FMP). Berlin, Germany.
Fiocruz Paraná. Instituto Carlos Chagas. Laboratório de Proteômica Estrutural e Computacional. Curitiba, PR, Brasil.
Mass Spectrometry for Biology Unit, CNRS USR 2000. Institut Pasteur, Paris, France.
Mass Spectrometry for Biology Unit, CNRS USR 2000. Institut Pasteur, Paris, France.
Fiocruz Paraná. Instituto Carlos Chagas. Laboratório de Proteômica Estrutural e Computacional. Curitiba, PR, Brasil.
Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Toxinologia. Rio de Janeiro, RJ, Brasil / Centre de Recherche en Cancérologie et Immunologie Nantes-Angers (CRCINA), Team SOAP, INSERM U1232. Nantes, France.
Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Toxinologia. Rio de Janeiro, RJ, Brasil.
Universidade Federal do Rio de Janeiro. Programa de Engenharia de Sistemas e Ciência da Computação. Rio de Janeiro, RJ, Brasil.
Fiocruz Paraná. Instituto Carlos Chagas. Laboratório de Proteômica Estrutural e Computacional. Curitiba, PR, Brasil.
Abstract in English
In proteomics, the identification of peptides from mass spectral data can be mathematically described as the
partitioning of mass spectra into clusters (i.e., groups of spectra derived from the same peptide). How partitions
are validated is just as important; validation has evolved side by side with the clustering algorithms themselves
and has given rise to many partition assessment measures. An assessment measure is said to have a selection bias if,
and only if, the probability that a randomly chosen partition scores a high value depends on the number of
clusters in the partition. In the context of clustering mass spectra, such a bias might mislead the validation process into
favoring clustering algorithms that generate too many (or too few) spectral clusters, regardless of the underlying peptide
sequence. A selection bias toward the number of peptides is desirable for proteomics as it estimates the number of
peptides in a complex protein mixture. Here, we introduce an assessment measure that is purposely biased toward
the number of peptide ion species. We also introduce a partition assessment framework for proteomics,
called the Partition Assessment Tool, and demonstrate its importance by evaluating the performance of eight
clustering algorithms on seven proteomics datasets while discussing the trade-offs involved.
Significance: Clustering algorithms are widely adopted in proteomics for several tasks, such as
speeding up search engines, generating consensus mass spectra, and aiding in the classification of proteomic
profiles. Choosing which algorithm best fits the task at hand is not simple, as each algorithm has advantages
and disadvantages; furthermore, specifying clustering parameters is also a necessary and fundamental step, for
example, deciding whether to generate “pure clusters” or fewer clusters at the cost of accepting noise. With this as
motivation, we verify the performance of several widely adopted algorithms on proteomic datasets and introduce
a theoretical framework for drawing conclusions on which approach is suitable for the task at hand.
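
The notion of selection bias defined in the abstract can be illustrated with a small simulation. The sketch below is not the measure or the tool introduced by the authors; it uses cluster purity, a common assessment measure chosen here only as an assumption for illustration, together with a hypothetical toy ground truth of 60 spectra from 3 peptides. It scores randomly generated partitions with at most k clusters and averages the result, so any dependence of the mean score on k reflects a bias of the measure rather than the quality of a clustering.

```python
import random
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of items that share the majority true label of their cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, Counter())[true_label] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(labels_true)


def mean_random_purity(labels_true, k, trials=200, seed=0):
    """Average purity of random partitions that use at most k clusters."""
    rng = random.Random(seed)
    n = len(labels_true)
    total = 0.0
    for _ in range(trials):
        # Each item is assigned to one of k clusters uniformly at random,
        # so the partition carries no information about the true labels.
        random_partition = [rng.randrange(k) for _ in range(n)]
        total += purity(labels_true, random_partition)
    return total / trials


# Hypothetical toy ground truth: 60 spectra, 3 peptides, 20 spectra each.
labels_true = [i % 3 for i in range(60)]

scores = {k: mean_random_purity(labels_true, k) for k in (2, 5, 20, 60)}
for k, score in scores.items():
    print(f"k={k:2d}  mean purity of random partitions = {score:.3f}")
```

Under these assumptions the mean purity of random partitions grows with k, and a partition of singletons always reaches a purity of 1. A validation based on such a measure would therefore tend to favor algorithms that generate too many clusters, which is the situation the abstract describes.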
Keywords in Portuguese
Agrupamento (clustering)
Espectros de massa em tandem (tandem mass spectra)
Ferramenta de avaliação de partição (partition assessment tool)