nd2py.search.mcts package

Submodules

nd2py.search.mcts.mcts module

nd2py.search.mcts.mcts.simplify(eq: Symbol)
class nd2py.search.mcts.mcts.Node(eqtree: Symbol)

Bases: object

__init__(eqtree: Symbol)
UCT(c) → float
to_route(N=5, c=1.41) → str

Root
├ Node1
┆ ├ self
┆ └ Node1-2
└ Node2

copy() → Node
class nd2py.search.mcts.mcts.MCTS(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter=100, use_tqdm=False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, time_limit=None, sample_num=300, keep_vars=False, normalize_y=False, normalize_X=False, remove_abnormal=False, train_eval_split=1.0, child_num=50, n_playout=100, d_playout=10, max_len=30, c=1.41, eta=0.999, **kwargs)

Bases: BaseEstimator, RegressorMixin

Monte Carlo Tree Search-based Symbolic Regression

__init__(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter=100, use_tqdm=False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, time_limit=None, sample_num=300, keep_vars=False, normalize_y=False, normalize_X=False, remove_abnormal=False, train_eval_split=1.0, child_num=50, n_playout=100, d_playout=10, max_len=30, c=1.41, eta=0.999, **kwargs)

Initialize a Monte Carlo Tree Search symbolic regression estimator.

This configures the function set, search hyperparameters, logging behavior, optional graph structure, and various data preprocessing options used during MCTS-based exploration of expression trees.

Parameters:
  • variables (List[Variable]) – List of input variables that can be used in generated expressions.

  • binary (List[Symbol], optional) – Binary operator symbols available to the search (for example Add, Sub, Mul). Defaults to a standard arithmetic and min/max set.

  • unary (List[Symbol], optional) – Unary operator symbols available to the search (for example Sqrt, Log, Sin). Defaults to a standard set of common functions.

  • max_params (int, optional) – Maximum number of numeric parameters (Number nodes) allowed in an expression. Defaults to 2.

  • const_range (Tuple[float, float], optional) – Range from which random constants are sampled. Defaults to (-1.0, 1.0).

  • depth_range (Tuple[int, int], optional) – Minimum and maximum tree depth for randomly generated expressions. Defaults to (2, 6).

  • nettype (Optional[Literal["node", "edge", "scalar"]], optional) – Net type of the target expression when working with graph data: one of "node", "edge", or "scalar". Defaults to "scalar".

  • log_per_iter (float, optional) – Log progress every log_per_iter iterations; use float("inf") to disable iteration-based logging. Defaults to float("inf").

  • log_per_sec (float, optional) – Log progress every log_per_sec seconds; use float("inf") to disable time-based logging. Defaults to float("inf").

  • log_detailed_speed (bool, optional) – If True, include detailed timing information for individual steps in logs. Defaults to False.

  • save_path (str, optional) – Directory in which per-iteration records are written, in JSON Lines format, to records.jsonl. If None, records are not written to disk. Defaults to None.

  • random_state (Optional[int], optional) – Seed used to control randomness for reproducible runs. Defaults to None.

  • n_iter (int, optional) – Maximum number of MCTS iterations. Defaults to 100.

  • use_tqdm (bool, optional) – If True, wrap the main search loop with a tqdm progress bar. Defaults to False.

  • edge_list (Tuple[List[int], List[int]], optional) – Optional graph edge list (sources, targets) used when evaluating graph operators. Defaults to None.

  • num_nodes (int, optional) – Number of nodes in the underlying graph; if None, it may be inferred elsewhere. Defaults to None.

  • time_limit (float, optional) – Maximum wall-clock time (in seconds) for the search; if exceeded, the search terminates early. Defaults to None.

  • sample_num (int, optional) – Number of samples drawn when evaluating or sampling candidate expressions. Defaults to 300.

  • keep_vars (bool, optional) – If True, keep variable names instead of renaming them during preprocessing. Defaults to False.

  • normalize_y (bool, optional) – If True, normalize target values before fitting. Defaults to False.

  • normalize_X (bool, optional) – If True, normalize input features before fitting. Defaults to False.

  • remove_abnormal (bool, optional) – If True, attempt to remove abnormal samples before training. Defaults to False.

  • train_eval_split (float, optional) – Fraction of data used for training; the remainder may be used for evaluation. Defaults to 1.0.

  • child_num (int, optional) – Maximum number of child nodes expanded from a node during expansion. Defaults to 50.

  • n_playout (int, optional) – Number of rollouts performed from a node during simulation. Defaults to 100.

  • d_playout (int, optional) – Maximum depth of each simulation rollout. Defaults to 10.

  • max_len (int, optional) – Maximum allowed expression length; used to constrain actions. Defaults to 30.

  • c (float, optional) – Exploration constant used in the UCT formula during selection. Defaults to 1.41.

  • eta (float, optional) – Complexity penalty factor used in the reward function, where larger eta discounts complex expressions less. Defaults to 0.999.

  • **kwargs – Additional unused keyword arguments; a warning is logged if any are provided.
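To illustrate how a factor like eta can trade accuracy against complexity, the sketch below uses the form eta**complexity / (1 + RMSE), which is common in MCTS-based symbolic regression. This formula and the helper name penalized_reward are assumptions made for illustration; the documentation above only states that eta is a complexity penalty factor and that larger values discount complex expressions less.

```python
import math

def penalized_reward(y_true, y_pred, complexity, eta=0.999):
    """Hypothetical reward: eta**complexity / (1 + RMSE).

    Assumed form for illustration only; not confirmed as nd2py's formula.
    """
    n = len(y_true)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)
    return eta ** complexity / (1.0 + rmse)

y = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]

# Two equally accurate expressions: the shorter one earns the higher reward.
r_short = penalized_reward(y, perfect, complexity=3)
r_long = penalized_reward(y, perfect, complexity=30)
```

Under this assumed form, eta = 1.0 removes the penalty entirely, and values further below 1 favor shorter expressions more strongly.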

fit(X: ndarray | DataFrame | Dict[str, ndarray], y: ndarray | Series)

Fit the MCTS model to training data by exploring expression trees.

The input features can be provided as a NumPy array, a pandas DataFrame, or a dictionary mapping variable names to arrays. The method builds a Monte Carlo search tree, repeatedly performs selection–expansion–simulation–backpropagation steps, and tracks the best discovered symbolic expression in self.eqtree.

Parameters:
  • X (ndarray | DataFrame | Dict[str, ndarray]) – Input features with shape (n_samples, n_dims) or an equivalent mapping from variable names to 1D arrays.

  • y (ndarray | Series) – Target values with shape (n_samples,).

Returns:

The fitted estimator instance.

Return type:

MCTS

Raises:

ValueError – If X is of an unsupported type.

predict(X: ndarray | DataFrame | Dict[str, ndarray]) → ndarray

Predict target values for X using the best expression found by MCTS.

action(state: Node, action: Tuple[Symbol, Symbol]) → Node

Apply an action to a state by replacing an empty placeholder with a symbol.
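A minimal sketch of placeholder replacement, assuming an expression stored as a prefix-notation token list in which the string "?" marks an empty placeholder. The representation, the ARITY table, and the apply_action helper are hypothetical and do not reflect nd2py's Symbol or Node classes. The sketch always fills the first placeholder, whereas the documented action is a pair of symbols.

```python
# Hypothetical arity table: how many new placeholders each symbol opens.
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "y": 0}
EMPTY = "?"

def apply_action(tokens, symbol):
    """Return a new prefix expression with the first placeholder replaced by `symbol`."""
    idx = tokens.index(EMPTY)  # raises ValueError if no placeholder is left
    # The new symbol brings one fresh placeholder per operand it needs.
    return tokens[:idx] + [symbol] + [EMPTY] * ARITY[symbol] + tokens[idx + 1:]

# Build add(x, sin(y)) one action at a time, starting from a single placeholder.
state = [EMPTY]
for sym in ["add", "x", "sin", "y"]:
    state = apply_action(state, sym)
```

Returning a new list instead of mutating the input keeps each search-tree state independent of its parent.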

check_valid_action(state: Node, action: Tuple[Symbol, Symbol]) → bool

Return whether a proposed action is valid under length and nettype constraints.

iter_valid_action(state: Node, shuffle=False) → Generator[Tuple[Symbol, Symbol], None, None]

Iterate over all valid actions from a given state, optionally in random order.

pick_valid_action(state: Node) → Tuple[Symbol, Symbol]

Randomly sample a single valid action from the current state.

fill_to_complete(state: Node) → Node

Fill all remaining empty leaves in a state with compatible variables.

select(root: Node) → Node

Select a leaf node from the tree using the UCT rule.
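The sketch below shows UCT-based selection in its textbook form: each child is scored as mean reward plus c * sqrt(ln(parent visits) / child visits), and the search descends to the highest-scoring child until it reaches a leaf. ToyNode and its attributes are hypothetical stand-ins, and the exact score computed by Node.UCT in nd2py may differ.

```python
import math

class ToyNode:
    """Hypothetical stand-in for a search-tree node; not nd2py's Node."""

    def __init__(self, name, visits=0, total_reward=0.0, parent=None):
        self.name = name
        self.visits = visits
        self.total_reward = total_reward
        self.parent = parent
        self.children = []

    def uct(self, c):
        if self.visits == 0:
            return math.inf  # unvisited children are tried first
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select(root, c=1.41):
    """Descend from the root, always taking the child with the highest UCT score."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.uct(c))
    return node

root = ToyNode("root", visits=10)
a = ToyNode("a", visits=8, total_reward=6.4, parent=root)  # mean reward 0.8
b = ToyNode("b", visits=2, total_reward=1.2, parent=root)  # mean reward 0.6
root.children = [a, b]

explored = select(root, c=1.41)  # exploration bonus favors the rarely visited child
greedy = select(root, c=0.0)     # pure exploitation favors the higher mean reward
```

With c = 1.41 the rarely visited child b wins despite its lower mean reward; with c = 0 the choice falls back to the child with the best average.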

expand(node: Node, X: Dict[str, ndarray], y: ndarray) → Node

Expand a node by generating child states and return one child for simulation.

simulate(node: Node, X: Dict[str, ndarray], y: ndarray) → Tuple[Node, float]

Run playout simulations from a node and return the best simulated state and reward.
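As a rough sketch of the playout idea, the function below runs n_playout random rollouts of d_playout steps each and keeps the best-scoring state it sees. The step and score callbacks, the seed argument, and the integer toy problem are all hypothetical; the real method operates on Node objects and scores expressions against X and y.

```python
import random

def simulate(start, step, score, n_playout=100, d_playout=10, seed=0):
    """Run random playouts from `start`.

    Returns the best state seen (the start itself or a playout's end state)
    together with its reward.
    """
    rng = random.Random(seed)
    best_state, best_reward = start, score(start)
    for _ in range(n_playout):
        state = start
        for _ in range(d_playout):
            state = step(state, rng)  # one random move per rollout step
        reward = score(state)
        if reward > best_reward:
            best_state, best_reward = state, reward
    return best_state, best_reward

# Toy problem: a random walk on the integers whose reward peaks at 3.
def toy_step(state, rng):
    return state + rng.choice([-1, 0, 1])

def toy_score(state):
    return 1.0 / (1.0 + abs(state - 3))

best_state, best_reward = simulate(0, toy_step, toy_score)
```

Keeping only the best rollout, rather than averaging over all of them, matches the description of returning the best simulated state and reward.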

backpropagate(node: Node, reward: float)

Backpropagate a reward up the tree, updating visit counts and value estimates.
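A minimal sketch of backpropagation, assuming each node stores a visit count and a running reward total and links to its parent. ToyNode and its attribute names are hypothetical, and nd2py may aggregate rewards differently (for example by tracking a maximum instead of a sum).

```python
class ToyNode:
    """Hypothetical node with only the fields backpropagation needs."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total_reward = 0.0

def backpropagate(node, reward):
    """Walk from `node` up to the root, updating statistics along the way."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent

root = ToyNode()
child = ToyNode(parent=root)
leaf = ToyNode(parent=child)

backpropagate(leaf, 0.5)   # updates leaf, child, and root
backpropagate(child, 1.0)  # updates child and root only
```

After both calls the root and the child have each been visited twice with a total reward of 1.5, while the leaf has one visit and a total of 0.5.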

set_reward(node: Node, X: Dict[str, ndarray], y: ndarray)

Fit parameters, evaluate an expression, and assign reward and diagnostics to a node.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MCTS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

nd2py.search.mcts.mcts_forest module

class nd2py.search.mcts.mcts_forest.NodeForest(eqtree: Symbol)

Bases: Node

class nd2py.search.mcts.mcts_forest.MCTSForest(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter=100, use_tqdm=False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, time_limit=None, sample_num=300, keep_vars=False, normalize_y=False, normalize_X=False, remove_abnormal=False, train_eval_split=1.0, child_num=50, n_playout=100, d_playout=10, max_len=30, c=1.41, eta=0.999, **kwargs)

Bases: MCTS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MCTSForest

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

nd2py.search.mcts.utils module

nd2py.search.mcts.utils.preprocess(X: ndarray | DataFrame | Dict[str, ndarray])
nd2py.search.mcts.utils.sample_Xy(X: Dict[str, ndarray], y: ndarray, sample_num)
nd2py.search.mcts.utils.rename_variable(eqtree: Symbol, variable_mapping: Dict[str, str])