Journal of Nantong University (Natural Science Edition), Vol. 24, Issue 2, 22 (2025)
A Q-learning method for creating a Hex opening library
Hex is a perfect-information board game, and its opening library, an essential component of the game system, has traditionally been generated from human expertise and Monte Carlo tree search (MCTS) algorithms. However, this approach is computationally expensive and does not consistently ensure accuracy. This study proposes a self-play method based on Q-learning for the efficient construction of Hex opening libraries. The proposed method employs multi-threaded simulations and an improved upper confidence bound applied to trees (UCT) algorithm to identify promising opening moves. An enhanced ε-greedy strategy is incorporated to improve the convergence rate of the Q-learning algorithm. To further improve performance, Q-values are integrated into the upper confidence bound (UCB) formula as prior knowledge, which is intended to enhance decision-making accuracy during gameplay. Experimental results indicate that after 3,000 training iterations, the Q-values across the board converge, suggesting the method's potential for stable policy learning. In comparative evaluations, the generated opening library achieved a 62.9% average win rate against the improved UCT algorithm. When Q-values were used as prior input to the UCB formula, the average win rate increased to 75.9%. The method was also applied in the Chinese Computer Game Competition, where the implementation received a first-place award, supporting the practical applicability of the approach.
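The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general shape of the ideas it names: ε-greedy selection over opening moves, a Q-value update from a self-play outcome, and a UCB score with a Q-value prior. Every function name, parameter, and constant here (`epsilon_at`, `prior_weight`, the decay schedule, the `1 / (1 + visits)` fading term) is an assumption chosen for illustration and is not taken from the paper.

```python
import math
import random

def epsilon_at(iteration, eps_start=1.0, eps_min=0.05, decay=0.999):
    """Hypothetical decaying exploration rate (the paper's schedule is not given)."""
    return max(eps_min, eps_start * decay ** iteration)

def choose_move(q_values, legal_moves, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take the best-known move."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    return max(legal_moves, key=lambda m: q_values.get(m, 0.0))

def q_update(q_values, move, reward, alpha=0.1):
    """Move Q(move) toward the self-play outcome (assumed terminal-reward form)."""
    old = q_values.get(move, 0.0)
    q_values[move] = old + alpha * (reward - old)
    return q_values[move]

def ucb_with_prior(wins, visits, parent_visits, q_prior, c=1.4, prior_weight=1.0):
    """Standard UCB1 score plus an assumed Q-value bonus that fades with visits."""
    if visits == 0:
        return float("inf")  # unvisited children are tried first
    exploit = wins / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore + prior_weight * q_prior / (1 + visits)
```

A possible usage, with opening moves written as `(row, col)` cells: keep a dictionary `q = {}`, call `choose_move(q, legal_moves, epsilon_at(i))` at training iteration `i`, play the game out, and then call `q_update(q, move, reward)` with the result. At play time, `ucb_with_prior` can rank child nodes using the learned `q.get(move, 0.0)` as `q_prior`. Letting the prior term shrink as visits accumulate is one common design choice, so that search statistics eventually dominate the learned estimate.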
XU Zhifan, LI Yuan, WANG Jingwen, LI Zhuoxuan, CAO Yiding. A Q-learning method for creating a Hex opening library[J]. Journal of Nantong University (Natural Science Edition), 2025, 24(2): 22
Received: May 29, 2024
Accepted: Aug. 25, 2025
Published Online: Aug. 25, 2025
Author Email: LI Yuan (syliyuan@sut.edu.cn)