Gyozas | Olivier Juan

Gyozas is an open-source reinforcement learning framework designed for combinatorial optimization research. It provides a clean, Gymnasium-compatible interface to train RL agents that make decisions inside SCIP’s branch-and-bound solver — specifically variable selection (branching) and node selection.

v1.0 was released in April 2026, after a year of private development. The library is intentionally lightweight and installs with a single pip command:

pip install gyozas

My role

Creator and Lead Developer, with minor contributions from Paul Strang. The need came from internal EDF work at the intersection of GenAI and optimization, where Ecole was a blocker: it was difficult to install in a corporate environment and no longer compatible with recent SCIP releases. I wanted to keep Ecole’s user-facing surface (observations, rewards, instance generators), so I reimplemented an Ecole equivalent in pure Python on top of PySCIPOpt.

Motivation

Leading open-source MILP solvers like SCIP rely on hand-crafted heuristics for branching and node selection. Recent research — including PlanB&B (Strang et al., 2026) and BBMDP (Strang et al., 2025) — has shown that learned policies can outperform these defaults.

Until 2023, Ecole was the de facto environment for this line of work. Its last release predates SCIP 8, and running BBMDP and PlanB&B made the consequences concrete: incompatibility with current SCIP releases, and friction whenever a new observation or reward had to be prototyped through Ecole’s C++ layer.

Gyozas is the response to that experience. It provides an Ecole-style API — existing Ecole scripts port over with minor refactoring — while keeping the user-facing layer in Python, so observation functions, rewards, and instance generators can be customized without touching C++. SCIP 8+ is supported out of the box, and the library exposes both the Ecole-style interface and a standard Gymnasium interface from the same environment.
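As an illustration of what a pure-Python component could look like, here is a minimal sketch of a custom reward written against Ecole's reward-function convention (`before_reset(model)` and `extract(model, done)`). Whether Gyozas uses exactly these method names is an assumption and is not confirmed here. `FakeModel` and its `n_nodes` attribute are hypothetical stand-ins so the example runs without a solver.

```python
class FakeModel:
    """Hypothetical stand-in for a solver model; only exposes a node counter."""

    def __init__(self):
        self.n_nodes = 0


class NegativeNodeDelta:
    """Reward = minus the number of B&B nodes created since the last call.

    Follows Ecole's before_reset/extract convention (assumed, not confirmed,
    to carry over to Gyozas).
    """

    def before_reset(self, model):
        # Remember the node count at the start of the episode.
        self._last = model.n_nodes

    def extract(self, model, done=False):
        # Report the change since the previous extract() call.
        delta = model.n_nodes - self._last
        self._last = model.n_nodes
        return -delta


model = FakeModel()
reward_fn = NegativeNodeDelta()
reward_fn.before_reset(model)
model.n_nodes = 3  # pretend the solver processed three nodes
first = reward_fn.extract(model)
model.n_nodes = 5  # two more nodes
second = reward_fn.extract(model)
```

Because the component is an ordinary Python class, changing the reward signal means editing a few lines and rerunning, with no compilation step.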

Key features

Dual API — Ecole-style interface for porting existing codebases and a standard Gymnasium reset()/step() interface for new work
SCIP 8+ support — tracks current SCIP / PySCIPOpt releases, where Ecole stopped at SCIP 7
Variable selection or node selection — one decision type per environment (combined mode on the roadmap)
Bipartite graph observations — LP-feature-based observations following Gasse et al. (2019)
Pluggable components — swap reward functions, observation generators, and instance generators in pure Python, no recompilation
Built-in problem generators — Set Cover, Independent Set, Combinatorial Auction, Facility Location (Ecole-style signatures)
B&B tree visualization — inspect solver behavior during and after training
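
To make the bipartite observation concrete: in Gasse et al. (2019), a MILP is encoded as a graph with one node per constraint, one node per variable, and an edge wherever a variable has a nonzero coefficient in a constraint. The sketch below builds only that edge structure from a dense constraint matrix. Node features are omitted, and the function name `bipartite_edges` is illustrative, not part of Gyozas.

```python
def bipartite_edges(A):
    """Return (edge_index, edge_values) for a dense constraint matrix A.

    edge_index is a list of (constraint_row, variable_col) pairs, one per
    nonzero coefficient; edge_values holds the matching coefficients.
    """
    edge_index, edge_values = [], []
    for i, row in enumerate(A):
        for j, coef in enumerate(row):
            if coef != 0:
                edge_index.append((i, j))
                edge_values.append(coef)
    return edge_index, edge_values


# Two constraints over three variables:
#   x0 + 2*x2 <= 4
#   3*x1 - x2 <= 1
A = [
    [1, 0, 2],
    [0, 3, -1],
]
edges, values = bipartite_edges(A)
```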

Future work

Benchmark performance against Ecole
Maintain performance parity with Ecole, applying cffi or numba where measurements warrant
Port BBMDP and PlanB&B onto Gyozas as reference use cases
Provide an environment for simultaneous branching and node selection
Add more realistic instance generators

Quick start

Ecole-style API — port existing Ecole pipelines with minor refactoring:

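import gyozas

# Random set-cover instances with 100 rows (constraints) and 200 columns (variables).
instances = gyozas.SetCoverGenerator(n_rows=100, n_cols=200)
env = gyozas.Environment(
    instance_generator=instances,
    observation_function=gyozas.NodeBipartite(),  # bipartite graph observation
    reward_function=gyozas.NNodes(),              # node-count reward
)

# Ecole-style loop: reset() and step() both return the same 5-tuple.
obs, action_set, reward, done, info = env.reset()
while not done:
    # Placeholder policy: always take the first available action.
    action = action_set[0]
    obs, action_set, reward, done, info = env.step(action)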

A standard Gymnasium interface is also exposed by the same environment — see the documentation for the gymnasium.make(...) entry points.
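
For orientation, the Gymnasium convention differs from the Ecole-style loop in its return values: `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. The sketch below runs one episode against any object that follows that convention. `StubEnv` is a hypothetical three-step environment used only to keep the example self-contained; it is not a Gyozas class, and how Gyozas exposes the action set under Gymnasium is not shown here.

```python
class StubEnv:
    """Hypothetical Gymnasium-style environment that terminates after 3 steps."""

    def reset(self, seed=None, options=None):
        self._t = 0
        return self._t, {}  # (observation, info)

    def step(self, action):
        self._t += 1
        terminated = self._t >= 3
        # (observation, reward, terminated, truncated, info)
        return self._t, -1.0, terminated, False, {}


def run_episode(env, policy):
    """Roll out one episode and return the cumulative reward."""
    obs, info = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


total_reward = run_episode(StubEnv(), policy=lambda obs: 0)
```

The same `run_episode` helper should work unchanged with any environment that honors the Gymnasium return signatures.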

Technical stack

Dynamics: Configuration · PrimalSearch · Branching selection · Node selection
Reward functions: Node count · Solving time · LP iterations · Bound integrals (primal, dual, primal/dual)
Problem generators: Set Cover · Independent Set · Combinatorial Auction · Facility Location
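
As a rough illustration of the bound-integral rewards, a primal integral can be viewed as the area under the primal bound as a function of solving time, where the bound is a step function that changes each time a better solution is found. The sketch below integrates such a step function from a list of `(time, bound)` events. The function name, the event format, and the choice of initial bound are assumptions made for illustration; the actual Gyozas implementation and any normalization it applies may differ.

```python
def step_integral(events, initial_bound, t_end):
    """Integrate a piecewise-constant bound over [0, t_end].

    events: list of (time, new_bound) pairs sorted by time, with time <= t_end.
    initial_bound: value of the bound before the first event.
    """
    area, t_prev, bound = 0.0, 0.0, initial_bound
    for t, new_bound in events:
        area += bound * (t - t_prev)  # bound held constant since t_prev
        t_prev, bound = t, new_bound
    area += bound * (t_end - t_prev)  # tail after the last event
    return area


# Minimization: the primal bound improves from 10 to 6 at t=2, then to 5 at t=3.
primal_integral = step_integral(
    [(2.0, 6.0), (3.0, 5.0)], initial_bound=10.0, t_end=5.0
)
```

Here the three segments contribute 10·2, 6·1, and 5·2, so a policy that finds good solutions earlier yields a smaller integral.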

Links

GitHub
Documentation
PyPI
License: MIT

Mixed-Integer Linear Programming (MILP) lies at the core of many real-world combinatorial optimization (CO) problems, traditionally solved by branch-and-bound (B&B). A key driver of the efficiency of B&B solvers is the variable selection heuristic that guides branching decisions. Looking to move beyond static, hand-crafted heuristics, recent work has explored adapting traditional reinforcement learning (RL) algorithms to the B&B setting, aiming to learn branching strategies tailored to specific MILP distributions. In parallel, RL agents have achieved remarkable success in board games, a very specific type of combinatorial problem, by leveraging environment simulators to plan via Monte Carlo Tree Search (MCTS). Building on these developments, we introduce Plan-and-Branch-and-Bound (PlanB&B), a model-based reinforcement learning (MBRL) agent that leverages a learned internal model of the B&B dynamics to discover improved branching strategies. Computational experiments empirically validate our approach, with our MBRL branching agent outperforming previous state-of-the-art RL methods across four standard MILP benchmarks.

Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, often solved by Branch and Bound (B&B). A key factor influencing the performance of B&B solvers is the variable selection heuristic governing branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B&B setting to learn optimal branching policies, through formulations inspired by Markov Decision Processes (MDPs), and ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, making it possible to leverage a broad range of RL algorithms for the purpose of learning optimal B&B heuristics. Computational experiments validate our model empirically, as our branching agent outperforms prior state-of-the-art RL agents on four standard MILP benchmarks.