
Utilizing Python to integrate Reinforcement Learning with Alpha-Beta Search in Chinese chess
by Châu Khải Phát in Other, Scripts & Code on September 8, 2023

“AlphaBing is a lightweight Chinese Chess (Xiangqi) engine that incorporates modified concepts from AlphaZero, enabling it to run efficiently on limited hardware. It provides users with the option to challenge the AI through a minimalist, user-friendly, and intuitive user interface.
The primary objective of this project is to enhance the accessibility of the Alpha(Go)Zero algorithm for developers. The downscaled and highly optimized algorithm showcased its full functionality and remarkable efficiency when applied to the domain of Xiangqi on consumer-grade hardware.
AlphaBing’s foundation lies in the fusion of traditional AI techniques, such as optimized alpha-beta search, with innovative reinforcement learning concepts. This amalgamation allows the strengths of each approach to compensate for the weaknesses of the other.
The development of AlphaBing is driven by the desire to make AlphaZero’s codebase more accessible to the community and to eliminate the need for expensive resources to run the system. AlphaBing operates smoothly on a single device and offers adaptable skill levels for players.
For researchers and enthusiasts, this repository includes a variety of visualization scripts using matplotlib.”
Dependencies
Using pip:
Setting up the Environment with Conda
Mac, Windows & Linux
Apple Silicon
Activate the environment
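The commands for the steps above are omitted in this listing. Judging from the files in the repository (requirements.txt, environment.yml, metal.yml) and the environment name used below (cheapchess), they most likely look like the following; treat this as a reconstruction, not the project's official instructions:

```shell
# Using pip
pip install -r requirements.txt

# Setting up the environment with conda (Mac, Windows & Linux)
conda env create -f environment.yml

# Apple Silicon (metal.yml presumably pins a Metal-accelerated TensorFlow build)
conda env create -f metal.yml

# Activate the environment
conda activate cheapchess
```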
Training the agent
Open another terminal, then run:
conda activate cheapchess
tensorboard --logdir core/engine/ai/selfplay_rl/checkpoints/logs
Playing against AI
Usage
positional arguments:
{ab,az,abz} AI-agent playing in interactive environment (ab: Alpha-Beta, az: AlphaZero, abz: Alpha-Beta-Zero) (default: ab)
cores maximum number of processors to use for pipeline (default: multiprocessing.cpu_count())
time time on the clock in minutes (default: 5)
options:
-h, --help show this help message and exit
--chinese rendering chinese style UI
--perft run performance tests for move generation speed and accuracy
--pipeline run the self-play and training pipeline (to evaluate, see --eval)
--eval add evaluation to the pipeline
--nui no UI
--black play black
--second move second
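Putting the arguments together, a game against the combined agent might be launched like this (the exact argument handling is an assumption based on the help text above, with `main.py` as the entry point, as the file structure below suggests):

```shell
# Alpha-Beta-Zero agent, default core count and clock,
# Chinese-style UI; the human plays black and moves second
python main.py abz --chinese --black --second
```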
File Structure Overview
├── README.md
├── assets
├── core
│ ├── checkpoints
│ │ └── examples
│ ├── engine
│ │ ├── AI
│ │ │ ├── ABMM
│ │ │ │ ├── AI_diagnostics.py
│ │ │ │ ├── __init__.py
│ │ │ │ ├── agent.py
│ │ │ │ ├── eval_utility.py
│ │ │ │ ├── move_ordering.py
│ │ │ │ ├── piece_square_tables.py
│ │ │ │ ├── search.py
│ │ │ │ └── transposition_table.py
│ │ │ ├── AlphaZero
│ │ │ │ ├── MCTS.py
│ │ │ │ ├── __init__.py
│ │ │ │ ├── agent.py
│ │ │ │ ├── checkpoints
│ │ │ │ │ ├── checkpoint_new.h5
│ │ │ │ │ ├── examples
│ │ │ │ │ └── logs
│ │ │ │ ├── config.py
│ │ │ │ ├── model.py
│ │ │ │ ├── nnet.py
│ │ │ │ └── selfplay.py
│ │ │ ├── EvaluateAgent
│ │ │ │ ├── __init__.py
│ │ │ │ └── evaluate.py
│ │ │ ├── SLEF
│ │ │ │ ├── README.md
│ │ │ │ ├── __init__.py
│ │ │ │ ├── eval_data_black.csv
│ │ │ │ ├── eval_data_collection.py
│ │ │ │ └── eval_data_red.csv
│ │ │ ├── agent_interface.py
│ │ │ └── mixed_agent.py
│ │ ├── UI.py
│ │ ├── __init__.py
│ │ ├── board.py
│ │ ├── clock.py
│ │ ├── config.py
│ │ ├── data_init.py
│ │ ├── fast_move_gen.py
│ │ ├── game_manager.py
│ │ ├── move_generator.py
│ │ ├── piece.py
│ │ ├── precomputed_move_data.py
│ │ ├── test.py
│ │ ├── tt_entry.py
│ │ ├── verbal_command_handler.py
│ │ └── zobrist_hashing.py
│ └── utils
│ ├── __init__.py
│ ├── board_utils.py
│ ├── claim_copyright.py
│ ├── modify_pst.py
│ ├── perft_utility.py
│ ├── select_agent.py
│ └── timer.py
├── environment.yml
├── main.py
├── metal.yml
└── requirements.txt
Methods Roadmap
Engine
- Move generation
- A novel optimization of Zobrist Hashing
- FEN utility
- Bitboard representation
- UI / UX: pygame-based, provisional; drag & drop, sound effects, move highlighting, etc.
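The roadmap credits a novel optimization of Zobrist hashing (see core/engine/zobrist_hashing.py). That optimization is not described in this listing, but the baseline technique it builds on is standard: assign every (square, piece) pair a fixed random key, XOR the keys of all occupied squares to hash a position, and update incrementally on each move instead of rehashing the whole board. A minimal sketch, with board sizes chosen for Xiangqi:

```python
import random

# Xiangqi board: 10 ranks x 9 files = 90 squares; 14 piece types (7 per side).
NUM_SQUARES, NUM_PIECES = 90, 14

# One fixed random 64-bit key per (square, piece) pair, generated at startup.
rng = random.Random(42)
ZOBRIST = [[rng.getrandbits(64) for _ in range(NUM_PIECES)]
           for _ in range(NUM_SQUARES)]

def full_hash(board):
    """Hash a position from scratch. `board` maps square -> piece index."""
    h = 0
    for square, piece in board.items():
        h ^= ZOBRIST[square][piece]
    return h

def update_hash(h, piece, src, dst, captured=None):
    """Incrementally update the hash for one move (three XORs at most)."""
    h ^= ZOBRIST[src][piece]          # remove the piece from its old square
    if captured is not None:
        h ^= ZOBRIST[dst][captured]   # remove a captured piece, if any
    h ^= ZOBRIST[dst][piece]          # place the piece on its new square
    return h
```

Because XOR is its own inverse, the incremental update after a move yields exactly the same hash as recomputing from scratch, which is what makes transposition-table lookups cheap during search.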
Alpha-Beta-Search
- Piece-square-table implementation
- Minimax-Search with Alpha-Beta-Pruning
- Move ordering
- Multiprocessing
- Transposition Tables
- Iterative Deepening
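AlphaBing's actual search lives in core/engine/AI/ABMM/search.py and is not reproduced in this listing. As a minimal illustration of two of the bullets above, minimax with alpha-beta pruning plus move ordering, here is a sketch that searches a toy game tree (nested lists with numeric leaves) rather than a Xiangqi position:

```python
def quick_eval(node):
    # Cheap heuristic used only for ordering. A real engine would order by
    # captures, killer moves, or transposition-table hits instead.
    return node if isinstance(node, (int, float)) else max(map(quick_eval, node))

def alphabeta(node, depth, alpha, beta, maximizing):
    # Leaf (or depth limit): return the static evaluation; here the leaf value
    # itself stands in for a piece-square-table evaluation.
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        best = float("-inf")
        # Move ordering: searching promising children first tightens the
        # alpha/beta window sooner and prunes more of the tree.
        for child in sorted(node, key=lambda c: -quick_eval(c)):
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff: the opponent will avoid this line
        return best
    else:
        best = float("inf")
        for child in sorted(node, key=quick_eval):
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break  # alpha cutoff
        return best
```

Iterative deepening wraps calls like this in a loop over increasing depths, reusing the best move from the previous iteration to seed the move ordering of the next.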
Reinforcement Learning
- Deep Convolutional ResNet Architecture
- Fast MCTS
- Self-Play policy iteration and Q-Learning
- Training Pipeline
- Evaluation: Elo & win-rate diagnostics
- Parallelism with TensorFlow sessions: parallelized pipeline
- Train the agent on server (in progress)
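The "Fast MCTS" bullet presumably refers to an AlphaZero-style tree search (see core/engine/AI/AlphaZero/MCTS.py). The core of such a search is the PUCT selection rule: at each node, pick the child that balances its mean value against the policy network's prior, discounted by how often it has already been visited. A minimal sketch of that rule follows; the constant and exact weighting AlphaBing uses are assumptions here:

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U, the AlphaZero-style PUCT rule.

    `children` is a list of dicts with:
      prior  - policy-network probability P(s, a)
      visits - visit count N(s, a)
      value  - accumulated value W(s, a)
    """
    total_visits = sum(c["visits"] for c in children)

    def score(c):
        q = c["value"] / c["visits"] if c["visits"] else 0.0  # mean value Q(s, a)
        # Exploration term: large for moves the network likes but that have
        # been visited rarely relative to their siblings.
        u = c_puct * c["prior"] * math.sqrt(total_visits) / (1 + c["visits"])
        return q + u

    return max(children, key=score)
```

Self-play policy iteration then repeatedly runs this search, plays the move distribution it produces, and trains the network on the resulting (position, search policy, outcome) triples.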
| Download Category | Other, Scripts & Code |
| File Type | Py |
| File Size | 3.52 MB |





