Optics and Precision Engineering, Vol. 33, Issue 1, 135 (2025)

Super-resolution reconstruction of text image with multimodal semantic interaction

Yulan HAN*, Yihong LUO, Yujie CUI, and Chaofeng LAN
Author Affiliations
  • College of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080, China
    Figures & Tables(11)
    Overall architecture of MSISR
    Visual dual flow integration module
    Cross-modal adaptive fusion module
    Visualization results of different methods on the TextZoom dataset
    • Table 1. Influence of the number of MSIBs on recognition accuracy

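      | Number of MSIBs | CRNN easy | CRNN medium | CRNN hard | CRNN avg |
      |---|---|---|---|---|
      | 3 | 62.6 | 50.8 | 37.9 | 51.2 |
      | 4 | 64.0 | 52.7 | 39.1 | 52.7 |
      | 5 | 64.8 | 54.0 | 39.8 | 53.6 |
      | 6 | 64.1 | 53.2 | 39.4 | 53.0 |
      | 7 | 63.4 | 51.7 | 38.3 | 51.9 |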
    • Table 2. Recognition accuracy of different modules

      Swin | Semantic prior | VDFI | CAFM | L_EA | avg/%
      ××××  44.2
      ×××  52.3
      ×××  52.0
      ×××  51.4
      ××  53.2
      ×  52.9
      ×  53.6
    • Table 3. Impact of different fusion strategies on recognition accuracy

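      | Fusion strategy | CRNN easy | CRNN medium | CRNN hard | CRNN avg |
      |---|---|---|---|---|
      | C | 61.7 | 50.6 | 37.0 | 50.5 |
      | A | 61.2 | 50.8 | 36.7 | 50.3 |
      | C+CA | 61.9 | 51.2 | 37.3 | 50.9 |
      | CAFM | 63.1 | 52.6 | 38.1 | 52.0 |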
    • Table 4. Impact of different loss functions on recognition accuracy

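      | Loss | CRNN easy | CRNN medium | CRNN hard | CRNN avg |
      |---|---|---|---|---|
      | L_GP | 61.7 | 50.6 | 37.0 | 50.5 |
      | L_EG | 62.5 | 51.1 | 37.3 | 51.1 |
      | L_EA | 62.8 | 51.6 | 37.5 | 51.4 |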
    • Table 5. Influence of different β values on recognition accuracy

      | β | Average recognition accuracy/% |
      |---|---|
      | 1×10⁻⁵ | 53.2 |
      | 1×10⁻⁴ | 53.6 |
      | 1×10⁻³ | 53.1 |
      | 1×10⁻² | 52.9 |
      | 1×10⁻¹ | 52.5 |
    • Table 6. Recognition accuracy of different methods on TextZoom dataset

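      | Method | ASTER easy | ASTER medium | ASTER hard | ASTER avg | MORAN easy | MORAN medium | MORAN hard | MORAN avg | CRNN easy | CRNN medium | CRNN hard | CRNN avg |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | Bicubic | 64.7 | 42.4 | 31.2 | 47.2 | 60.6 | 37.9 | 30.8 | 44.1 | 36.4 | 21.1 | 21.1 | 26.8 |
      | SRCNN [3] | 69.4 | 43.4 | 32.2 | 49.5 | 63.2 | 39.0 | 30.2 | 45.3 | 38.7 | 21.6 | 20.9 | 27.7 |
      | HAN [4] | 71.1 | 52.8 | 39.0 | 55.3 | 67.4 | 48.5 | 35.4 | 51.5 | 51.6 | 35.8 | 29.0 | 39.6 |
      | TSRN [13] | 75.1 | 56.3 | 40.1 | 58.3 | 70.1 | 53.3 | 37.9 | 54.8 | 52.5 | 38.2 | 31.4 | 41.4 |
      | PCAN [24] | 77.5 | 60.7 | 43.1 | 61.5 | 73.7 | 57.6 | 41.0 | 58.5 | 59.6 | 45.4 | 34.8 | 47.4 |
      | TBSRN [25] | 75.7 | 59.9 | 41.6 | 60.0 | 74.1 | 57.0 | 40.8 | 58.4 | 59.6 | 47.1 | 35.3 | 48.1 |
      | TG [26] | 77.9 | 60.2 | 42.4 | 61.3 | 75.8 | 57.8 | 41.4 | 59.4 | 61.2 | 47.6 | 35.5 | 48.9 |
      | MTSR [27] | 75.6 | 59.8 | 43.4 | 58.9 | 73.9 | 57.2 | 41.8 | 56.0 | 56.2 | 47.0 | 35.3 | 45.4 |
      | TATT [16] | 78.9 | 63.4 | 45.4 | 63.6 | 72.5 | 60.2 | 43.1 | 59.5 | 62.6 | 53.4 | 39.8 | 52.6 |
      | DPGSR [17] | 75.5 | 57.8 | 41.9 | 59.4 | 69.7 | 53.4 | 39.7 | 55.2 | 57.6 | 43.0 | 33.4 | 45.5 |
      | TPGSR [14] | 77.0 | 60.9 | 42.4 | 61.2 | 72.2 | 57.8 | 41.3 | 58.1 | 61.0 | 49.9 | 36.7 | 49.9 |
      | TPGSR-3 [14] | 78.9 | 62.7 | 44.5 | 62.8 | 74.9 | 60.5 | 44.1 | 60.5 | 63.1 | 52.0 | 38.6 | 51.8 |
      | Ours | 80.0 | 63.6 | 45.6 | 64.1 | 76.5 | 60.9 | 44.8 | 61.7 | 64.8 | 54.0 | 39.8 | 53.6 |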
    • Table 7. PSNR and SSIM of different methods on TextZoom dataset

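      | Method | PSNR easy | PSNR medium | PSNR hard | PSNR avg | SSIM easy | SSIM medium | SSIM hard | SSIM avg |
      |---|---|---|---|---|---|---|---|---|
      | Bicubic | 22.35 | 18.98 | 19.39 | 20.35 | 0.7884 | 0.6254 | 0.6592 | 0.6961 |
      | SRCNN [3] | 23.48 | 19.06 | 19.34 | 20.78 | 0.8379 | 0.6323 | 0.6791 | 0.7227 |
      | HAN [4] | 23.30 | 19.02 | 20.16 | 20.95 | 0.8691 | 0.6537 | 0.7387 | 0.7597 |
      | TSRN [13] | 25.07 | 18.86 | 19.71 | 21.42 | 0.8897 | 0.6676 | 0.7302 | 0.7690 |
      | PCAN [24] | 24.57 | 19.14 | 20.26 | 21.49 | 0.8830 | 0.6781 | 0.7475 | 0.7752 |
      | TBSRN [25] | 23.46 | 19.17 | 19.68 | 20.91 | 0.8729 | 0.6455 | 0.7452 | 0.7603 |
      | TG [26] | 23.82 | 19.17 | 19.68 | 21.05 | 0.8660 | 0.6533 | 0.7490 | 0.7614 |
      | MTSR [27] | 23.55 | 19.88 | 19.64 | 21.16 | 0.8734 | 0.6843 | 0.7476 | 0.7739 |
      | TATT [16] | 24.72 | 19.02 | 20.31 | 21.53 | 0.9006 | 0.6911 | 0.7703 | 0.7930 |
      | DPGSR [17] | 23.36 | 18.76 | 19.77 | 20.77 | 0.8711 | 0.6719 | 0.7507 | 0.7698 |
      | TPGSR [14] | 23.73 | 18.68 | 20.06 | 20.97 | 0.8805 | 0.6738 | 0.7440 | 0.7719 |
      | TPGSR-3 [14] | 24.35 | 18.73 | 19.93 | 21.18 | 0.8860 | 0.6784 | 0.7507 | 0.7774 |
      | Ours | 24.76 | 19.98 | 20.39 | 21.88 | 0.9013 | 0.6976 | 0.7780 | 0.7977 |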

    Yulan HAN, Yihong LUO, Yujie CUI, Chaofeng LAN. Super-resolution reconstruction of text image with multimodal semantic interaction[J]. Optics and Precision Engineering, 2025, 33(1): 135

    Paper Information

    Received: Jul. 31, 2024

    Accepted: --

    Published Online: Apr. 1, 2025

    The Author Email: Yulan HAN (hanyulan@hrbust.edu.cn)

    DOI:10.37188/OPE.20253301.0135