Journal of Electronic Science and Technology, Vol. 23, Issue 2, 100315 (2025)

Efficient feature selection based on Gower distance for breast cancer diagnosis

Salwa Shakir Baawi, Mustafa Noaman Kadhim*, and Dhiah Al-Shammary
Author Affiliations
  • College of Computer Science and Information Technology, University of Al-Qadisiyah, Al Diwaniyah, 58001, Iraq
    Figures & Tables (13)
    Cloud-based feature selection framework using the Gower distance for enhanced breast cancer diagnosis in medical organizations.
    Workflow of the proposed methodology.
    Normalization stage with min-max scaler: (a) before and (b) after min-max normalization.
    Confusion matrix.
    Confusion matrices of standard classifiers without feature selection: (a) KNN, (b) NB, (c) SVM, (d) DT, (e) RF, and (f) LR.
    Accuracy comparison of classifiers with and without the proposed feature selection method based on the Gower distance.
    Confusion matrices of standard classifiers with the proposed feature selection method based on the Gower distance: (a) KNN, (b) NB, (c) SVM, (d) DT, (e) RF, and (f) LR.
    Execution time of classifiers with and without the proposed feature selection method.
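The min-max normalization and Gower-distance steps named in the figures above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: `rank_features_by_gower` is a hypothetical relevance score that ranks each feature by its range-normalized (Gower-style) separation between the two class means, and the paper's block-based selection over sample blocks is not reproduced here.

```python
import numpy as np

def min_max_scale(X):
    """Min-max normalization: rescale each feature to [0, 1]."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard constant features
    return (X - mins) / span

def gower_distance(x, y, ranges):
    """Gower distance for numeric samples: the mean of per-feature
    absolute differences, each normalized by that feature's range."""
    return np.mean(np.abs(x - y) / ranges)

def rank_features_by_gower(X, y, top_k):
    """Hypothetical filter: score each feature by the Gower-style
    separation between class means and return the top_k feature indices."""
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0                 # avoid division by zero
    mean_neg = X[y == 0].mean(axis=0)
    mean_pos = X[y == 1].mean(axis=0)
    scores = np.abs(mean_neg - mean_pos) / ranges
    return np.argsort(scores)[::-1][:top_k]
```

On the WDBC data this kind of filter would keep only the highest-scoring subset (e.g. 12 of the 30 features, as in Table 5) before training the classifiers.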
    • Table 1. WDBC dataset details.

      Parameter                  | Characteristic
      Feature type               | Real
      Dataset characteristics    | Multivariate
      Number of cases            | 569
      Number of healthy people   | 357
      Number of unhealthy people | 212
      Number of features         | 30
      Missing values             | N/A
      Classification type        | Binary classification
    • Table 2. System environment and setup.

      Specification           | Description
      RAM                     | 16 GB
      Processor               | 6th Gen Intel® Core™ i7
      Development environment | Visual Studio Code
      Software environment    | Python 3.12
    • Table 3. Performance evaluation of traditional classifiers without feature selection.

      Classifier | Recall (%) | Precision (%) | F1-score (%) | Accuracy (%)
      KNN        | 92.10      | 97.22         | 94.60        | 92.98
      NB         | 90.54      | 93.06         | 91.78        | 89.47
      DT         | 89.61      | 95.83         | 92.62        | 90.35
      RF         | 97.10      | 93.06         | 95.04        | 93.86
      SVM        | 89.74      | 97.22         | 93.33        | 91.23
      LR         | 92.00      | 95.83         | 93.88        | 92.10
    • Table 4. Accuracy results of classifiers and their impact on performance with varying feature ratios and block sizes.

      Accuracy (%) by feature ratio selected from the dataset:

      Block size           | Classifier | 10%   | 20%   | 30%   | 40%   | 50%   | 60%   | 70%
      5 samples per block  | KNN        | 87.71 | 89.08 | 93.22 | 98.24 | 90.35 | 92.10 | 91.22
                           | NB         | 85.96 | 87.08 | 91.01 | 96.49 | 91.22 | 88.59 | 90.35
                           | DT         | 82.45 | 89.47 | 92.21 | 97.36 | 93.85 | 94.73 | 91.22
                           | RF         | 88.59 | 92.10 | 94.73 | 99.12 | 94.73 | 95.61 | 95.61
                           | SVM        | 83.34 | 84.56 | 92.91 | 95.61 | 94.76 | 92.30 | 90.74
                           | LR         | 85.95 | 88.40 | 91.50 | 97.36 | 90.89 | 87.89 | 90.21
      10 samples per block | KNN        | 87.71 | 85.08 | 91.22 | 89.47 | 91.22 | 92.10 | 91.22
                           | NB         | 85.96 | 85.98 | 84.21 | 88.59 | 89.47 | 90.33 | 90.35
                           | DT         | 80.70 | 88.59 | 82.45 | 92.98 | 88.59 | 92.98 | 93.85
                           | RF         | 89.47 | 92.10 | 92.10 | 94.73 | 95.61 | 95.61 | 94.73
                           | SVM        | 84.65 | 92.40 | 88.40 | 93.94 | 92.20 | 88.90 | 86.30
                           | LR         | 89.20 | 90.80 | 92.02 | 92.98 | 94.34 | 94.87 | 93.33
      15 samples per block | KNN        | 88.21 | 85.08 | 89.47 | 87.71 | 91.22 | 91.22 | 92.11
                           | NB         | 86.96 | 85.08 | 85.96 | 85.08 | 90.08 | 90.35 | 93.21
                           | DT         | 81.70 | 88.59 | 89.47 | 89.47 | 92.98 | 92.98 | 94.73
                           | RF         | 88.47 | 92.98 | 90.35 | 92.10 | 94.73 | 94.73 | 95.61
                           | SVM        | 88.90 | 90.32 | 89.02 | 85.41 | 89.89 | 90.87 | 94.67
                           | LR         | 90.21 | 92.56 | 90.20 | 91.21 | 93.43 | 94.67 | 96.89
      20 samples per block | KNN        | 89.31 | 85.08 | 92.22 | 87.71 | 93.12 | 92.10 | 93.33
                           | NB         | 86.46 | 85.99 | 88.21 | 85.08 | 92.47 | 88.59 | 90.35
                           | DT         | 83.40 | 88.59 | 89.33 | 88.59 | 91.22 | 92.98 | 91.22
                           | RF         | 91.22 | 92.10 | 93.85 | 92.98 | 94.73 | 93.85 | 95.61
                           | SVM        | 89.20 | 87.60 | 89.80 | 90.65 | 91.30 | 87.80 | 89.50
                           | LR         | 84.50 | 90.40 | 92.31 | 91.83 | 92.50 | 91.12 | 90.32
      25 samples per block | KNN        | 87.71 | 85.08 | 89.47 | 87.71 | 91.22 | 92.10 | 91.22
                           | NB         | 85.96 | 85.08 | 85.96 | 85.08 | 89.47 | 90.11 | 90.35
                           | DT         | 83.33 | 87.71 | 91.22 | 88.59 | 90.35 | 93.85 | 92.98
                           | RF         | 90.35 | 92.98 | 92.98 | 92.10 | 94.73 | 95.61 | 94.73
                           | SVM        | 84.20 | 87.60 | 89.54 | 89.60 | 90.30 | 90.59 | 90.74
                           | LR         | 88.80 | 91.50 | 90.89 | 91.23 | 93.98 | 94.43 | 93.20
    • Table 5. Performance comparison with recently related works on the WDBC dataset.

      Year | Reference | Feature selection                 | Selected features | Classifier                                 | Accuracy (%) | Execution time
      2022 | [26]      | Univariate and recursive          | 16                | Deep extreme gradient descent optimization | 98.73        | N/A
      2023 | [17]      | PCC                               | 15                | KNN                                        | 91.2         | N/A
           |           |                                   |                   | RF                                         | 96.5         |
           |           |                                   |                   | LR                                         | 94.7         |
           |           |                                   |                   | XGBoost                                    | 97.4         |
      2023 | [20]      | PCA                               | 16                | SVM                                        | 98.07        | N/A
           |           |                                   |                   | DT                                         | 94.20        |
           |           |                                   |                   | KNN                                        | 96.84        |
           |           |                                   |                   | RF                                         | 94.20        |
           |           |                                   |                   | MLP                                        | 97.54        |
           |           |                                   |                   | NB                                         | 91.04        |
           |           |                                   |                   | LR                                         | 98.42        |
           |           |                                   |                   | LR+SVM                                     | 98.77        |
      2023 | [27]      | Chi-square                        | 15                | MLP                                        | 95           | N/A
           |           |                                   |                   | LR                                         | 92           |
           |           |                                   |                   | KNN                                        | 96           |
           |           |                                   |                   | SVM                                        | 92           |
           |           |                                   |                   | RF                                         | 94           |
      2023 | [28]      | Gorilla troops optimization (GTO) | 30                | Deep Q learning (DQL)                      | 98.88        | N/A
      2023 | [29]      | ESO                               | 12                | RF                                         | 98.95        | 4 s
      2024 | [30]      | EHO                               | 18                | KNN                                        | 97.96        | N/A
      2024 | Proposed in this paper | Gower distance       | 12                | KNN                                        | 98.24        | 0.1472 ms
           |           |                                   |                   | NB                                         | 96.49        | 0.0087 ms
           |           |                                   |                   | DT                                         | 97.36        | 0.0021 ms
           |           |                                   |                   | RF                                         | 99.12        | 0.0583 ms
           |           |                                   |                   | SVM                                        | 95.61        | 0.0262 ms
           |           |                                   |                   | LR                                         | 97.36        | 0.0018 ms
    Salwa Shakir Baawi, Mustafa Noaman Kadhim, Dhiah Al-Shammary. Efficient feature selection based on Gower distance for breast cancer diagnosis[J]. Journal of Electronic Science and Technology, 2025, 23(2): 100315

    Paper Information

    Received: Dec. 3, 2024

    Accepted: Apr. 21, 2025

    Published Online: Jun. 16, 2025

    Corresponding author: Mustafa Noaman Kadhim (mustafa.noaman@qu.edu.iq)

    DOI: 10.1016/j.jnlest.2025.100315
