Journal of Electronic Science and Technology, Vol. 23, Issue 2, 100315 (2025)

Efficient feature selection based on Gower distance for breast cancer diagnosis

Salwa Shakir Baawi, Mustafa Noaman Kadhim*, and Dhiah Al-Shammary
Author Affiliations
  • College of Computer Science and Information Technology, University of Al-Qadisiyah, Al Diwaniyah, 58001, Iraq
    Figures & Tables (13)
    Cloud-based feature selection framework using the Gower distance for enhanced breast cancer diagnosis in medical organizations.
    Workflow of the proposed methodology.
    Normalization stage with min-max scaler: (a) before and (b) after min-max normalization.
    Confusion matrix.
    Confusion matrices of standard classifiers without feature selection: (a) KNN, (b) NB, (c) SVM, (d) DT, (e) RF, and (f) LR.
    Accuracy comparison of classifiers with and without the proposed feature selection method based on the Gower distance.
    Confusion matrices of standard classifiers with the proposed feature selection method based on the Gower distance: (a) KNN, (b) NB, (c) SVM, (d) DT, (e) RF, and (f) LR.
    Execution time of classifiers with and without the proposed feature selection method.
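The min-max normalization and Gower-distance steps named in the figures above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: `rank_features_by_gower` is a hypothetical relevance score that ranks each feature by its range-normalized (Gower-style) separation between the two class means, and the paper's block-based selection over sample blocks is not reproduced here.

```python
import numpy as np

def min_max_scale(X):
    """Min-max normalization: rescale each feature to [0, 1]."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard constant features
    return (X - mins) / span

def gower_distance(x, y, ranges):
    """Gower distance for numeric samples: the mean of per-feature
    absolute differences, each normalized by that feature's range."""
    return np.mean(np.abs(x - y) / ranges)

def rank_features_by_gower(X, y, top_k):
    """Hypothetical filter: score each feature by the Gower-style
    separation between class means and return the top_k feature indices."""
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0                 # avoid division by zero
    mean_neg = X[y == 0].mean(axis=0)
    mean_pos = X[y == 1].mean(axis=0)
    scores = np.abs(mean_neg - mean_pos) / ranges
    return np.argsort(scores)[::-1][:top_k]
```

On the WDBC data this kind of filter would keep only the highest-scoring subset (e.g. 12 of the 30 features, as in Table 5) before training the classifiers.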
    • Table 1. WDBC dataset details.

      Parameter                  | Characteristic
      Feature type               | Real
      Dataset characteristics    | Multivariate
      Number of cases            | 569
      Number of healthy people   | 357
      Number of unhealthy people | 212
      Number of features         | 30
      Missing values             | N/A
      Classification type        | Binary classification
    • Table 2. System environment and setup.

      Specification           | Description
      RAM                     | 16 GB
      Processor               | 6th Gen Intel® Core™ i7
      Development environment | Visual Studio Code
      Software environment    | Python 3.12
    • Table 3. Performance evaluation of traditional classifiers without feature selection.

      Classifier | Recall (%) | Precision (%) | F1-score (%) | Accuracy (%)
      KNN        | 92.10      | 97.22         | 94.60        | 92.98
      NB         | 90.54      | 93.06         | 91.78        | 89.47
      DT         | 89.61      | 95.83         | 92.62        | 90.35
      RF         | 97.10      | 93.06         | 95.04        | 93.86
      SVM        | 89.74      | 97.22         | 93.33        | 91.23
      LR         | 92.00      | 95.83         | 93.88        | 92.10
    • Table 4. Accuracy results of classifiers and their impact on performance with varying feature ratios and block sizes.

      Accuracy (%) by feature ratio selected from the dataset:

      Block size           | Classifier | 10%   | 20%   | 30%   | 40%   | 50%   | 60%   | 70%
      5 samples per block  | KNN        | 87.71 | 89.08 | 93.22 | 98.24 | 90.35 | 92.10 | 91.22
                           | NB         | 85.96 | 87.08 | 91.01 | 96.49 | 91.22 | 88.59 | 90.35
                           | DT         | 82.45 | 89.47 | 92.21 | 97.36 | 93.85 | 94.73 | 91.22
                           | RF         | 88.59 | 92.10 | 94.73 | 99.12 | 94.73 | 95.61 | 95.61
                           | SVM        | 83.34 | 84.56 | 92.91 | 95.61 | 94.76 | 92.30 | 90.74
                           | LR         | 85.95 | 88.40 | 91.50 | 97.36 | 90.89 | 87.89 | 90.21
      10 samples per block | KNN        | 87.71 | 85.08 | 91.22 | 89.47 | 91.22 | 92.10 | 91.22
                           | NB         | 85.96 | 85.98 | 84.21 | 88.59 | 89.47 | 90.33 | 90.35
                           | DT         | 80.70 | 88.59 | 82.45 | 92.98 | 88.59 | 92.98 | 93.85
                           | RF         | 89.47 | 92.10 | 92.10 | 94.73 | 95.61 | 95.61 | 94.73
                           | SVM        | 84.65 | 92.40 | 88.40 | 93.94 | 92.20 | 88.90 | 86.30
                           | LR         | 89.20 | 90.80 | 92.02 | 92.98 | 94.34 | 94.87 | 93.33
      15 samples per block | KNN        | 88.21 | 85.08 | 89.47 | 87.71 | 91.22 | 91.22 | 92.11
                           | NB         | 86.96 | 85.08 | 85.96 | 85.08 | 90.08 | 90.35 | 93.21
                           | DT         | 81.70 | 88.59 | 89.47 | 89.47 | 92.98 | 92.98 | 94.73
                           | RF         | 88.47 | 92.98 | 90.35 | 92.10 | 94.73 | 94.73 | 95.61
                           | SVM        | 88.90 | 90.32 | 89.02 | 85.41 | 89.89 | 90.87 | 94.67
                           | LR         | 90.21 | 92.56 | 90.20 | 91.21 | 93.43 | 94.67 | 96.89
      20 samples per block | KNN        | 89.31 | 85.08 | 92.22 | 87.71 | 93.12 | 92.10 | 93.33
                           | NB         | 86.46 | 85.99 | 88.21 | 85.08 | 92.47 | 88.59 | 90.35
                           | DT         | 83.40 | 88.59 | 89.33 | 88.59 | 91.22 | 92.98 | 91.22
                           | RF         | 91.22 | 92.10 | 93.85 | 92.98 | 94.73 | 93.85 | 95.61
                           | SVM        | 89.20 | 87.60 | 89.80 | 90.65 | 91.30 | 87.80 | 89.50
                           | LR         | 84.50 | 90.40 | 92.31 | 91.83 | 92.50 | 91.12 | 90.32
      25 samples per block | KNN        | 87.71 | 85.08 | 89.47 | 87.71 | 91.22 | 92.10 | 91.22
                           | NB         | 85.96 | 85.08 | 85.96 | 85.08 | 89.47 | 90.11 | 90.35
                           | DT         | 83.33 | 87.71 | 91.22 | 88.59 | 90.35 | 93.85 | 92.98
                           | RF         | 90.35 | 92.98 | 92.98 | 92.10 | 94.73 | 95.61 | 94.73
                           | SVM        | 84.20 | 87.60 | 89.54 | 89.60 | 90.30 | 90.59 | 90.74
                           | LR         | 88.80 | 91.50 | 90.89 | 91.23 | 93.98 | 94.43 | 93.20
    • Table 5. Performance comparison with recently related works on the WDBC dataset.

      Year | Reference | Feature selection                 | Selected features | Classifier                                 | Accuracy (%) | Execution time
      2022 | [26]      | Univariate and recursive          | 16                | Deep extreme gradient descent optimization | 98.73        | N/A
      2023 | [17]      | PCC                               | 15                | KNN                                        | 91.2         | N/A
           |           |                                   |                   | RF                                         | 96.5         |
           |           |                                   |                   | LR                                         | 94.7         |
           |           |                                   |                   | XGBoost                                    | 97.4         |
      2023 | [20]      | PCA                               | 16                | SVM                                        | 98.07        | N/A
           |           |                                   |                   | DT                                         | 94.20        |
           |           |                                   |                   | KNN                                        | 96.84        |
           |           |                                   |                   | RF                                         | 94.20        |
           |           |                                   |                   | MLP                                        | 97.54        |
           |           |                                   |                   | NB                                         | 91.04        |
           |           |                                   |                   | LR                                         | 98.42        |
           |           |                                   |                   | LR+SVM                                     | 98.77        |
      2023 | [27]      | Chi-square                        | 15                | MLP                                        | 95           | N/A
           |           |                                   |                   | LR                                         | 92           |
           |           |                                   |                   | KNN                                        | 96           |
           |           |                                   |                   | SVM                                        | 92           |
           |           |                                   |                   | RF                                         | 94           |
      2023 | [28]      | Gorilla troops optimization (GTO) | 30                | Deep Q learning (DQL)                      | 98.88        | N/A
      2023 | [29]      | ESO                               | 12                | RF                                         | 98.95        | 4 s
      2024 | [30]      | EHO                               | 18                | KNN                                        | 97.96        | N/A
      2024 | Proposed in this paper | Gower distance       | 12                | KNN                                        | 98.24        | 0.1472 ms
           |           |                                   |                   | NB                                         | 96.49        | 0.0087 ms
           |           |                                   |                   | DT                                         | 97.36        | 0.0021 ms
           |           |                                   |                   | RF                                         | 99.12        | 0.0583 ms
           |           |                                   |                   | SVM                                        | 95.61        | 0.0262 ms
           |           |                                   |                   | LR                                         | 97.36        | 0.0018 ms
    Salwa Shakir Baawi, Mustafa Noaman Kadhim, Dhiah Al-Shammary. Efficient feature selection based on Gower distance for breast cancer diagnosis[J]. Journal of Electronic Science and Technology, 2025, 23(2): 100315

    Paper Information

    Received: Dec. 3, 2024

    Accepted: Apr. 21, 2025

    Published Online: Jun. 16, 2025

    Corresponding author: Mustafa Noaman Kadhim (mustafa.noaman@qu.edu.iq)

    DOI: 10.1016/j.jnlest.2025.100315
