A Q-learning method for creating a Hex opening library

Hex is a perfect-information board game. Its opening library, an essential component of a Hex-playing system, has traditionally been generated from human expertise and Monte Carlo tree search (MCTS) algorithms. However, this approach is computationally expensive and may not consistently ensure accuracy. This study proposes a self-play method based on Q-learning for the efficient construction of Hex opening libraries. The proposed method employs multi-threaded simulations and an improved upper confidence bound applied to trees (UCT) algorithm to identify promising opening moves. An enhanced ε-greedy strategy is incorporated to improve the convergence rate of the Q-learning algorithm. To further improve performance, Q-values are integrated into the upper confidence bound (UCB) formula as prior knowledge, with the aim of improving decision-making accuracy during gameplay. Experimental results indicate that after 3 000 training iterations the Q-values across the board converge, suggesting that the method can learn a stable policy. In comparative evaluations, the generated opening library achieved a 62.9% average win rate against the improved UCT algorithm. When Q-values were used as prior input to the UCB formula, the average win rate increased to 75.9%. The method was also applied in the Chinese Computer Game Competition, where the implementation received a first-place award, supporting the practical applicability of the approach.
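The three ingredients named in the abstract (ε-greedy move selection, a Q-learning update, and a UCB score that takes a Q-value as prior knowledge) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the enhanced ε-greedy rule or the modified UCB formula, so the code below uses plain ε-greedy, the textbook tabular Q-learning update, and an assumed prior term that fades as a node gathers visits. All function names, parameters (`alpha`, `gamma`, `c`, `w`), and the state and move labels are hypothetical.

```python
import math
import random


def epsilon_greedy(q, state, legal_moves, epsilon, rng=random):
    """Plain epsilon-greedy (the paper's enhanced variant is not specified here):
    explore with probability epsilon, otherwise pick the highest-Q legal move."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    return max(legal_moves, key=lambda m: q.get((state, m), 0.0))


def q_update(q, state, move, reward, next_state, next_moves,
             alpha=0.1, gamma=0.9):
    """Textbook tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    An empty next_moves list marks a terminal position (no bootstrap term)."""
    best_next = max((q.get((next_state, m), 0.0) for m in next_moves),
                    default=0.0)
    old = q.get((state, move), 0.0)
    q[(state, move)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, move)]


def ucb_with_prior(wins, visits, parent_visits, q_prior, c=1.4, w=1.0):
    """UCB1 score plus a Q-value prior.
    ASSUMPTION: the prior enters as w * q_prior / (visits + 1), so it matters
    most for rarely visited moves; the paper's actual formula may differ."""
    if visits == 0:
        return float("inf")  # unvisited moves are always tried first
    exploit = wins / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore + w * q_prior / (visits + 1)


if __name__ == "__main__":
    q = {}
    # One self-play game that ended in a win after opening move "a1".
    q_update(q, "empty", "a1", reward=1.0, next_state="end", next_moves=[],
             alpha=0.5)
    print(epsilon_greedy(q, "empty", ["a1", "b2"], epsilon=0.0))
    print(ucb_with_prior(wins=5, visits=10, parent_visits=100,
                         q_prior=q[("empty", "a1")]))
```

With two moves that have identical search statistics, the one with the larger learned Q-value receives the higher score under this assumed formula, which is the effect the abstract attributes to using Q-values as prior knowledge.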

Keywords

computer game; Hex; opening library; Q-learning; reinforcement learning


XU Zhifan, LI Yuan, WANG Jingwen, LI Zhuoxuan, CAO Yiding. A Q-learning method for creating a Hex opening library[J]. Journal of Nantong University (Natural Science Edition), 2025, 24(2): 22


Paper Information

Received: May. 29, 2024

Accepted: Aug. 25, 2025

Published Online: Aug. 25, 2025

The Author Email: LI Yuan (syliyuan@sut.edu.cn)

DOI:10.12194/j.ntu.20240529001
