Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets

dc.WoS.categories: Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology
dc.authorid: 0000-0002-9253-8152
dc.contributor.author: Otu, Hasan Hüseyin
dc.contributor.author: Albayrak, Aydın
dc.contributor.author: Sezerman, Uğur
dc.date.accessioned: 2021-01-18T11:19:25Z
dc.date.available: 2021-01-18T11:19:25Z
dc.date.issued: 2010-08-18
dc.description.abstract: Background: Phylogenetic analysis can be used to divide a protein family into subfamilies in the absence of experimental information. Most phylogenetic analysis methods rely on multiple alignment of sequences and are based on an evolutionary model. However, multiple alignment is not an automated procedure and requires human intervention to maintain alignment integrity and to produce phylogenies consistent with the functional splits in the underlying sequences. To address this problem, we propose to use the alignment-free Relative Complexity Measure (RCM) combined with reduced amino acid alphabets to cluster protein families into functional subtypes purely on sequence criteria. A comparison with an alignment-based approach was also carried out to test the quality of the clustering. Results: We demonstrate the robustness of RCM with reduced alphabets in clustering protein sequences into families in a simulated dataset and seven well-characterized protein datasets. On the protein datasets, crotonases, mandelate racemases, nucleotidyl cyclases and glycoside hydrolase family 2 were clustered into subfamilies with 100% accuracy, whereas acyl transferase domains, haloacid dehalogenases, and vicinal oxygen chelates could be assigned to subfamilies with 97.2%, 96.9% and 92.2% accuracy, respectively. Conclusions: The overall combination of methods in this paper is useful for clustering protein families into subtypes based solely on protein sequence information. The method is also flexible and computationally fast because it does not require multiple alignment of sequences.
dc.fullTextLevel: Full Text
dc.identifier.doi: 10.1186/1471-2105-11-428
dc.identifier.issn: 1471-2105
dc.identifier.pmid: 20718947
dc.identifier.scopus: 2-s2.0-77955613051
dc.identifier.uri: https://hdl.handle.net/11411/3127
dc.identifier.uri: https://doi.org/10.1186/1471-2105-11-428
dc.identifier.wos: WOS:000293622900001
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: PubMed
dc.language.iso: en
dc.national: International
dc.numberofauthors: 3
dc.publisher: BMC
dc.relation.ispartof: BMC Bioinformatics
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: PHYLOGENETIC ANALYSIS
dc.subject: TREE
dc.subject: EVOLUTION
dc.subject: SEQUENCES
dc.subject: DISTANCE
dc.subject: PROFILE
dc.title: Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets
dc.type: Article
dc.volume: 11

Files

Original bundle
Name: Otu 2010.pdf
Size: 741.54 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission