Izvestiya Vysshikh Uchebnykh Zavedeniy

Applied Nonlinear Dynamics

ISSN 0869-6632 (Print)
ISSN 2542-1905 (Online)


For citation:

Sheinman M. Exact sequence matches in genomic studies // Izvestiya VUZ. Applied Nonlinear Dynamics. 2023. Vol. 31, iss. 6. P. 739-756. DOI: 10.18500/0869-6632-003073, EDN: TVKYRH


This article is published under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Publication language: English
Article type: Review article
UDC: 573.2

Exact sequence matches in genomic studies

Authors:
Mikhail Sheinman, Sevastopol State University
Abstract:

Purpose. This review considers how exact matches between genomic sequences are used in different contexts.

Methods. The review presents a deliberately non-exhaustive list of studies in which the authors obtained interpretable results from the statistics of exact matches, either between different genomic sequences or between different parts of a single genome (self-matches).

Results. In genomic studies, different genomic loci often have different statistical properties, while the boundaries between them are not known in advance. In such cases, analyzing the statistical properties of exact matches is a useful alternative to other techniques, for example those that divide the genome into segments of arbitrary size, known as sliding or non-sliding windows.

Conclusion. The review demonstrates that the analysis of exact matches is not merely an auxiliary step in aligning genomic sequences; it also reveals biological patterns in a variety of contexts.
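
As a rough illustration of the quantity the abstract refers to, and not the method of any study covered by the review, the following Python sketch finds maximal exact matches between two short sequences and tabulates their lengths. The function names, the `min_len` parameter, and the toy sequences are assumptions made for this example. The quadratic-time dynamic program is chosen only for brevity; it is practical for short sequences, whereas genome-scale tools rely on index structures such as suffix trees.

```python
from collections import Counter


def maximal_exact_matches(a, b, min_len=1):
    """Return (start_in_a, start_in_b, length) for every exact match between
    a and b that cannot be extended to the left or to the right."""
    matches = []
    # prev[j] holds the length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extending the match on the same diagonal keeps it left-maximal.
                curr[j] = prev[j - 1] + 1
                # The match is right-maximal if a sequence ends here
                # or the next pair of characters differs.
                at_end = i == len(a) or j == len(b)
                if (at_end or a[i] != b[j]) and curr[j] >= min_len:
                    length = curr[j]
                    matches.append((i - length, j - length, length))
        prev = curr
    return matches


def match_length_histogram(a, b, min_len=1):
    """Count how many maximal exact matches occur at each length."""
    return Counter(length for _, _, length in maximal_exact_matches(a, b, min_len))


if __name__ == "__main__":
    seq_a = "GATTACA"  # toy sequences, hypothetical
    seq_b = "CATTAGA"
    print(maximal_exact_matches(seq_a, seq_b, min_len=3))  # [(1, 1, 4)]
    print(match_length_histogram(seq_a, seq_b, min_len=3))
```

Unlike a window-based summary, the match boundaries here are set by the sequences themselves, so no window size has to be chosen in advance.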

Acknowledgments:
I thank P. F. Arndt and F. Massip for helpful comments and discussions. The numerical analysis was carried out on the "Afalina" supercomputer cluster at Sevastopol State University. The work was performed within the "Priority-2030" program of Sevastopol State University (strategic project No. 3, No. 121121700318-1) and project FE FM-2023-0005.
Received: 28.09.2023
Accepted: 05.10.2023
Published online: 17.11.2023
Published: 30.11.2023