ISSN 0869-6632 (Print)
ISSN 2542-1905 (Online)


For citation:

Zimnyakov D. A., Alonova M. V., Skripal A. V., Inkin M. G., Zaytsev S. S., Feodorova V. Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics. Izvestiya VUZ. Applied Nonlinear Dynamics, 2024, vol. 32, iss. 4, pp. 439–459. DOI: 10.18500/0869-6632-003110, EDN: CGGWGX

This is an open access article distributed under the terms of Creative Commons Attribution 4.0 International License (CC-BY 4.0).
Language: Russian
Article type: Article
UDC: 004.93:577.113.5:535-4
EDN: CGGWGX
Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics

Authors:
Zimnyakov Dmitry Aleksandrovich, Yuri Gagarin State Technical University of Saratov
Alonova Marina Vasilevna, Yuri Gagarin State Technical University of Saratov
Skripal Anatolij Vladimirovich, Saratov State University
Inkin Maksim Glebovich, Saratov State University
Abstract: 

Purpose. This work presents a comparative analysis of two approaches to synthesizing two-dimensional binary identifiers of nucleotide sequences obtained by DNA sequencing of biological objects.

Methods. The first approach is based on modeling the polarization-dependent diffraction of a coherent readout beam by a two-dimensional phase-modulating structure (phase screen) associated with the symbolic sequence obtained by DNA sequencing. The second approach maps the symbolic sequence onto a plane using the chaos game representation (CGR). To obtain a finite-element CGR map, the CGR is partitioned into a given number of cells, chosen so that the synthesized binary identifier remains acceptably sensitive to structural changes in the mapped sequence.
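A minimal Python sketch of the CGR branch, assuming Jeffrey's corner assignment (A, C, G, T at the corners (0, 0), (0, 1), (1, 1), (1, 0) of the unit square) and a square grid of 2^k × 2^k cells. The abstract does not say how cell counts are converted into bits, so thresholding each cell at the median count is an assumption made here for illustration only; the function names `cgr_points` and `cgr_binary_identifier` are hypothetical. With a 2^k × 2^k grid, the count in each cell is essentially the frequency of one k-mer (the so-called frequency CGR), so a finer grid resolves longer k-mers.

```python
import numpy as np

# Jeffrey's (1990) corner assignment on the unit square (assumed here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return an (n, 2) array of CGR points for the nucleotide string seq."""
    x, y = 0.5, 0.5  # start at the centre of the square
    pts = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts.append((x, y))
    return np.array(pts, dtype=float).reshape(-1, 2)


def cgr_binary_identifier(seq, k):
    """Bin CGR points into a 2**k x 2**k grid and binarize at the median count."""
    pts = cgr_points(seq)
    if len(pts) == 0:
        raise ValueError("sequence contains no A, C, G or T symbols")
    n = 2 ** k
    counts, _, _ = np.histogram2d(
        pts[:, 0], pts[:, 1], bins=n, range=[[0.0, 1.0], [0.0, 1.0]]
    )
    # Assumed binarization rule: 1 where a cell holds more points than the median cell.
    return (counts > np.median(counts)).astype(np.uint8)
```

For a nucleotide string `seq`, `cgr_binary_identifier(seq, k=6)` would return a 64 × 64 array of 0/1 values; raising `k` refines the grid, and with it the identifier's sensitivity, at the cost of a larger identifier.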

Results. The comparative analysis was carried out on fragments of the symbolic sequences of several SARS-CoV-2 strains (Wuhan, Delta, Omicron). Correlation coefficients between the binary identifiers of different strains were computed and compared.
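The abstract does not specify which correlation coefficient was used. A plausible reading, assumed in this sketch, is the Pearson coefficient computed over all cells of two binary identifiers of equal size; for 0/1 data it coincides with the phi coefficient. The helper `identifier_correlation` is hypothetical.

```python
import numpy as np


def identifier_correlation(a, b):
    """Pearson correlation between two equally sized binary identifiers, over all cells."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("identifiers must have the same shape")
    da = a.ravel() - a.mean()  # deviations from the mean fill fraction
    db = b.ravel() - b.mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    if denom == 0:
        raise ValueError("correlation is undefined for a constant identifier")
    return float((da * db).sum() / denom)
```

Identical identifiers give 1, complementary ones (every bit flipped) give -1, and values near 0 indicate bit patterns that are linearly unrelated.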

Conclusion. Binary identifiers synthesized with the polarization encoding technique were found to be significantly more sensitive to structural changes in the analyzed sequences, and smaller in size, than CGR binary identifiers.

Acknowledgments: 
This work was supported by the Russian Science Foundation, grant No. 22-21-00194
References:
  1. Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics. 2016;17(6):333–351. DOI: 10.1038/nrg.2016.49.
  2. Neidle S, Sanderson M. Principles of Nucleic Acid Structure. Academic Press; 2021. 454 p.
  3. Randic M, Vracko M, Lers N, Plavsic D. Novel 2-D graphical representation of DNA sequences and their numerical characterization. Chemical Physics Letters. 2003;368(1–2):1–6. DOI: 10.1016/S0009-2614(02)01784-0.
  4. Randic M, Vracko M, Nandy A, Basak SC. On 3-D graphical representation of DNA primary sequence and their numerical characterization. Journal of Chemical Information and Computer Sciences. 2000;40(5):1235–1244. DOI: 10.1021/ci000034q.
  5. Xie G, Mo Z. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications. Journal of Theoretical Biology. 2011;269(1):123–130. DOI: 10.1016/j.jtbi.2010.10.018.
  6. Jafarzadeh N, Iranmanesh A. A novel graphical and numerical representation for analyzing DNA sequences based on codons. Match-Communications in Mathematical and Computer Chemistry. 2012;68(2):611–620.
  7. Jafarzadeh N, Iranmanesh A. C-curve: A novel 3D graphical representation of DNA sequence based on codons. Mathematical Biosciences. 2013;241(2):217–224. DOI: 10.1016/j.mbs.2012.11.009.
  8. Hamori E, Ruskin J. H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences. Journal of Biological Chemistry. 1983;258(2):1318–1327. DOI: 10.1016/S0021-9258(18)33196-X.
  9. Zhang CT, Zhang R, Ou HY. The Z-curve databases: A graphic representation of genome sequence. Bioinformatics. 2003;19(5):593–599. DOI: 10.1093/bioinformatics/btg041.
  10. Yu ZG, Wang B. A time series model of CDS sequences in complete genome. Chaos Solitons Fractals. 2001;12(3):519–526. DOI: 10.1016/S0960-0779(99)00208-8.
  11. Jeffrey HJ. Chaos game representation of gene structure. Nucleic Acids Research. 1990;18(8):2163–2170. DOI: 10.1093/nar/18.8.2163.
  12. Anitas EM. Small-angle scattering and multifractal analysis of DNA sequences. International Journal of Molecular Sciences. 2020;21(13):4651. DOI: 10.3390/ijms21134651.
  13. Burma PK, Raj A, Deb JK, Brahmachari SK. Genome analysis: a new approach for visualization of sequence organization in genomes. Journal of Biosciences. 1992;17(4):395–411. DOI: 10.1007/BF02720095.
  14. Huynen MA, Konings DAM, Hogeweg P. Equal G and C contents in histone genes indicate selection pressures on mRNA secondary structure. Journal of Molecular Evolution. 1992;34(4):280–291. DOI: 10.1007/BF00160235.
  15. Hill KA, Schisler NJ, Singh SM. Chaos game representation of coding regions of human globin genes and alcohol dehydrogenase genes of phylogenetically divergent species. Journal of Molecular Evolution. 1992;35(3):261–269. DOI: 10.1007/BF00178602.
  16. Almeida JS, Carrico JA, Maretzek A, Noble PA, Fletcher M. Analysis of genomic sequences by chaos game representation. Bioinformatics. 2001;17(5):429–437. DOI: 10.1093/bioinformatics/17.5.429.
  17. Zimnyakov DA, Alonova MV, Skripal AnV, Zaitsev SS, Feodorova VA. Polarization analysis of gene sequence structures: Mapping of extreme local polarization states. Journal of Biomedical Photonics & Engineering. 2022;8(4):040302. DOI: 10.18287/JBPE22.08.040302.
  18. Zimnyakov DA, Alonova MV, Skripal AnV, Dobdin SY, Feodorova VA. Quantification of the diversity in gene structures using the principles of polarization mapping. Current Issues in Molecular Biology. 2023;45(2):1720–1740. DOI: 10.3390/cimb45020111.
  19. Ulyanov SS, Ulianova OV, Zaytsev SS, Saltykov YV, Feodorova VA. Statistics on gene-based laser speckles with a small number of scatterers: implications for the detection of polymorphism in the Chlamydia trachomatis omp1 gene. Laser Physics Letters. 2018;15:045601. DOI: 10.1088/1612-202X/aaa11c.
  20. Rak A, Isakova-Sivak I, Rudenko L. Overview of Nucleocapsid-Targeting Vaccines against COVID-19. Vaccines. 2023;11(12):1810. DOI: 10.3390/vaccines11121810.
  21. Telenti A, Hodcroft EB, Robertson DL. The Evolution and Biology of SARS-CoV-2 Variants. Cold Spring Harbor Perspectives in Medicine. 2022;12:a041390. DOI: 10.1101/cshperspect.a041390.
  22. Bergmann CC, Silverman RH. COVID-19: coronavirus replication, pathogenesis, and therapeutic strategies. Cleveland Clinic Journal of Medicine. 2020;87:321–327. DOI: 10.3949/ccjm.87a.20047.
  23. Shang J, Wan Y, Luo C, Ye G, Geng Q, Auerbach A, Li F. Cell entry mechanisms of SARS-CoV-2. Proceedings of the National Academy of Sciences. 2020;117:11727–11734. DOI: 10.1073/pnas.2003138117.
  24. Grobbelaar LM, Venter C, Vlok M, Ngoepe M, Laubscher GJ, Lourens PJ, Steenkamp J, Kell DB, Pretorius E. SARS-CoV-2 spike protein S1 induces fibrin(ogen) resistant to fibrinolysis: implications for microclot formation in COVID-19. Bioscience Reports. 2021;41(8):BSR20210611. DOI: 10.1042/BSR20210611.
  25. Singh D, Yi SV. On the origin and evolution of SARS-CoV-2. Experimental & Molecular Medicine. 2021;53:537–547. DOI: 10.1038/s12276-021-00604-z.
  26. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, Si HR, Zhu Y, Li B, Huang CL, Chen HD, Chen J, Luo Y, Guo H, Jiang RD, Liu MQ, Chen Y, Shen XR, Wang X, Zheng XS, Zhao K, Chen QJ, Deng F, Liu LL, Yan B, Zhan FX, Wang YY, Xiao GF, Shi ZL. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579(7798):270–273. DOI: 10.1038/s41586-020-2012-7.
  27. Chakraborty C, Bhattacharya M, Chopra H, Bhattacharya P, Islam MA, Dhama K. Recently emerged omicron subvariant BF.7 and its R346T mutation in the RBD region reveal increased transmissibility and higher resistance to neutralization antibodies: need to understand more under the current scenario of rising cases in China and fears of driving a new wave of the COVID-19 pandemic. International Journal of Surgery. 2023;109(4):1037–1040. DOI: 10.1097/JS9.0000000000000219.
  28. GISAID: Official hCoV-19 Reference Sequence. Acc. ID: EPI_ISL_402124. Available online: https://gisaid.org/wiv04/.
  29. GISAID: Official hCoV-19 Reference Sequence. Acc. ID: EPI_ISL_2552101. Available online: https://gisaid.org/wiv04/.
  30. GISAID: Official hCoV-19 Reference Sequence. Acc. ID: EPI_ISL_9991311. Available online: https://gisaid.org/wiv04/.
  31. Goodman JW. Introduction to Fourier Optics, 4th ed. New York: Macmillan Learning; 2017. 491 p.
  32. Bracewell R. The Fourier Transform and Its Applications. New York: McGraw Hill; 1986. 474 p.
  33. Chipman R, Lam WST, Young G. Polarized Light and Optical Systems (Optical Sciences and Applications of Light). Boca Raton: CRC Press; 2018. 1036 p.
  34. Anitas EM. Fractal analysis of DNA sequences using frequency chaos game representation and small-angle scattering. International Journal of Molecular Sciences. 2022;23(3):1847. DOI: 10.3390/ijms23031847.
Received: 07.11.2023
Accepted: 28.02.2024
Available online: 28.05.2024
Published: 31.07.2024