Chained semantic generation network for video captioning

	MSR-VTT				MSVD
Mode	B-4	C	M	R	B-4	C	M	R
G	45.83	53.16	29.28	63.64	62.35	109.71	39.04	77.04
L	45.21	52.63	29.38	63.59	61.07	113.41	39.71	77.34
G-L	46.70	52.93	29.55	64.03	65.20	119.58	41.51	79.02
Cross	47.44	52.98	29.75	64.11	66.29	119.93	42.25	79.30
Non-cross	46.72	52.67	29.60	63.99	65.47	120.09	41.49	79.24
k=1	46.70	52.93	29.55	64.03	65.20	119.58	41.51	79.02
k=2	47.44	52.98	29.75	64.11	66.29	119.93	42.25	79.30
k=3	47.70	53.64	29.78	64.31	65.62	122.44	42.25	79.93
k=4	48.20	53.75	29.98	64.48	66.51	122.26	41.98	80.05
k=5	48.35	54.22	29.90	64.47	64.45	121.09	42.13	79.85

Table 2. Performance comparison on MSR-VTT dataset


Method	B-4	C	M	R
MTVC^［20］	40.8	47.1	28.8	60.2
CIDEnt-RL^［21］	40.5	51.7	28.4	61.4
SibNet^［22］	40.9	47.5	27.5	60.2
HACA^［23］	43.4	49.7	29.5	61.8
TAMoE^［24］	42.2	48.9	29.4	62.0
POS^［25］	41.3	53.4	28.7	62.1
MARN^［10］	40.4	47.1	28.1	60.7
JSRL-VCT^［26］	42.3	49.1	29.7	62.8
GRU-EVE^［27］	38.3	48.1	28.4	60.7
STG-KD^［28］	40.5	47.1	28.3	60.9
SAAT^［29］	39.9	51.0	27.7	61.2
ORG-TRL^［9］	43.6	50.9	28.8	62.1
SAVC^［8］	45.8	53.2	29.3	63.6
ChainS-Net	48.2	53.8	30.0	64.5

Table 3. Performance comparison on MSVD dataset


Method	B-4	C	M	R
LSTM-E^［30］	45.3	-	31.0	-
h-RNN^［31］	49.9	65.8	32.6	-
aLSTMs^［32］	50.8	74.8	33.3	-
SCN^［14］	51.1	77.7	33.5	-
MTVC^［20］	54.5	92.4	36.0	72.8
ECO^［18］	53.5	85.8	35.0	-
SibNet^［22］	54.2	88.2	34.8	71.7
POS^［25］	53.9	91.0	34.9	72.1
MARN^［10］	48.6	92.2	35.1	71.9
JSRL-VCT^［26］	52.8	87.8	36.1	71.8
GRU-EVE^［27］	47.9	78.1	35.0	71.5
STG-KD^［28］	52.2	93.0	36.9	73.9
SAAT^［29］	46.5	81.0	33.5	69.4
ORG-TRL^［9］	54.3	95.2	36.4	73.9
SAVC^［8］	62.4	109.7	39.0	77.0
ChainS-Net	65.6	122.4	42.3	79.9

Table 4. Runtime comparison
Method	Runtime/ms
SAVC	517
ChainS-Net	774
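
Tables 2 to 4 together describe an accuracy-versus-runtime trade-off between SAVC and ChainS-Net. The following is a minimal sketch of that arithmetic, using only the values printed in those tables. The variable and function names are illustrative assumptions, and the runtime is taken per whatever unit of work Table 4 measures, which the table itself does not state.

```python
# Values copied from Tables 2-4 above; all names here are illustrative.
runtime_ms = {"SAVC": 517, "ChainS-Net": 774}

scores = {
    "MSR-VTT": {
        "SAVC": {"B-4": 45.8, "C": 53.2, "M": 29.3, "R": 63.6},
        "ChainS-Net": {"B-4": 48.2, "C": 53.8, "M": 30.0, "R": 64.5},
    },
    "MSVD": {
        "SAVC": {"B-4": 62.4, "C": 109.7, "M": 39.0, "R": 77.0},
        "ChainS-Net": {"B-4": 65.6, "C": 122.4, "M": 42.3, "R": 79.9},
    },
}


def relative_overhead(base: float, new: float) -> float:
    """Fractional increase of `new` over `base`."""
    return (new - base) / base


def metric_gains(base: dict, new: dict) -> dict:
    """Absolute per-metric difference (new - base), rounded to one decimal."""
    return {name: round(new[name] - base[name], 1) for name in base}


overhead = relative_overhead(runtime_ms["SAVC"], runtime_ms["ChainS-Net"])
gains = {
    dataset: metric_gains(rows["SAVC"], rows["ChainS-Net"])
    for dataset, rows in scores.items()
}

print(f"Runtime overhead of ChainS-Net over SAVC: {overhead:.1%}")
for dataset, diff in gains.items():
    print(dataset, diff)
```

Read this way, the comparison sets roughly half again as much runtime against gains on all four metrics on both datasets, with the largest difference in CIDEr on MSVD.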


Lin MAO, Hang GAO, Dawei YANG, Rubo ZHANG. Chained semantic generation network for video captioning[J]. Optics and Precision Engineering, 2022, 30(24): 3198
