Computer Engineering, Vol. 51, Issue 8, 53 (2025)

Sequence Alignment Algorithm Based on Combined minimizer Seeds on Pan-Genome Graph

GAO Jia1,2 and XU Yun1,2,*
Author Affiliations
  • 1School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
  • 2Key Laboratory of High Performance Computing of Anhui Province, Hefei 230027, Anhui, China

    GAO Jia, XU Yun. Sequence Alignment Algorithm Based on Combined minimizer Seeds on Pan-Genome Graph[J]. Computer Engineering, 2025, 51(8): 53

    Paper Information

    Received: Jan. 15, 2024

    Accepted: Aug. 26, 2025

    Published Online: Aug. 26, 2025

    The Author Email: XU Yun (xuyun@ustc.edu.cn)

    DOI:10.19678/j.issn.1000-3428.0069237
