nd2py.search.gp package

Submodules

nd2py.search.gp.gp module

class nd2py.search.gp.gp.Individual(eqtree: Symbol)

Bases: object

__init__(eqtree: Symbol)

copy() → Individual

class nd2py.search.gp.gp.GP(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, elitism_k: int = 10, population_size: int = 1000, tournament_size: int = 20, p_crossover: float = 0.9, p_subtree_mutation: float = 0.01, p_hoist_mutation: float = 0.01, p_point_mutation: float = 0.01, p_point_replace: float = 0.05, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), full_prob: float = 0.5, nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', n_jobs: int = None, log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter=100, use_tqdm=False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, **kwargs)

Bases: BaseEstimator, RegressorMixin

Genetic Programming-based Symbolic Regression

__init__(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, elitism_k: int = 10, population_size: int = 1000, tournament_size: int = 20, p_crossover: float = 0.9, p_subtree_mutation: float = 0.01, p_hoist_mutation: float = 0.01, p_point_mutation: float = 0.01, p_point_replace: float = 0.05, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), full_prob: float = 0.5, nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', n_jobs: int = None, log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter=100, use_tqdm=False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, **kwargs)

Initialize a genetic-programming-based symbolic regression estimator.

This configures the function set, search hyperparameters, logging behavior, and optional graph structure used for nettype-aware expressions.

Parameters:

variables (List[Variable]) – List of input variables that can be used in generated expressions.
binary (List[Symbol], optional) – Binary operator symbols available to the GP (for example Add, Sub, Mul). Defaults to [Add, Sub, Mul, Div, Max, Min].
unary (List[Symbol], optional) – Unary operator symbols available to the GP (for example Sqrt, Log, Sin). Defaults to [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan].
max_params (int, optional) – Maximum number of numeric parameters (Number nodes) allowed in an expression. Defaults to 2.
elitism_k (int, optional) – Number of top individuals carried over unchanged between generations. Defaults to 10.
population_size (int, optional) – Number of individuals in each generation. Defaults to 1000.
tournament_size (int, optional) – Number of individuals competing in each tournament during parent selection. Defaults to 20.
p_crossover (float, optional) – Probability of applying subtree crossover. Defaults to 0.9.
p_subtree_mutation (float, optional) – Probability of applying subtree mutation. Defaults to 0.01.
p_hoist_mutation (float, optional) – Probability of applying hoist mutation. Defaults to 0.01.
p_point_mutation (float, optional) – Probability of applying point mutation. Defaults to 0.01.
p_point_replace (float, optional) – Probability of replacing a node during point mutation. Defaults to 0.05.
const_range (Tuple[float, float], optional) – Range from which random constants are sampled. Defaults to (-1.0, 1.0).
depth_range (Tuple[int, int], optional) – Minimum and maximum tree depth for randomly generated expressions. Defaults to (2, 6).
full_prob (float, optional) – Probability of using the “full” method rather than “grow” when generating random trees. Defaults to 0.5.
nettype (Optional[Literal["node", "edge", "scalar"]], optional) – Nettype of the target expression, used when working with graph data. Defaults to "scalar".
n_jobs (int, optional) – Number of parallel jobs used for evolving the population. If None, evolution is run in a single process. Defaults to None.
log_per_iter (float, optional) – Log progress every log_per_iter iterations; use float("inf") to disable iteration-based logging. Defaults to float("inf").
log_per_sec (float, optional) – Log progress every log_per_sec seconds; use float("inf") to disable time-based logging. Defaults to float("inf").
log_detailed_speed (bool, optional) – If True, include detailed timing information for individual steps in logs. Defaults to False.
save_path (str, optional) – File path to which JSON lines of per-iteration records are appended. If None, records are not written to disk. Defaults to None.
random_state (Optional[int], optional) – Seed for the internal RNG to make runs reproducible. Defaults to None.
n_iter (int, optional) – Maximum number of evolution iterations. Defaults to 100.
use_tqdm (bool, optional) – If True, wrap the main evolution loop with a tqdm progress bar. Defaults to False.
edge_list (Tuple[List[int], List[int]], optional) – Optional graph edge list (sources, targets) used when evaluating graph operators. If provided and num_nodes is None, the number of nodes is inferred. Defaults to None.
num_nodes (int, optional) – Number of nodes in the underlying graph. If None and edge_list is provided, it is inferred from the edges. Defaults to None.
**kwargs – Additional unused keyword arguments; a warning is logged if any are provided.
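When edge_list is given without num_nodes, the node count is inferred from the edges. A minimal sketch, assuming nodes are labelled 0..N-1 so that N is the largest index in either endpoint list plus one; `infer_num_nodes` is a hypothetical helper, not part of nd2py.

```python
from typing import List, Optional, Tuple


def infer_num_nodes(edge_list: Tuple[List[int], List[int]],
                    num_nodes: Optional[int] = None) -> int:
    """Hypothetical helper: return num_nodes, inferring it from edge_list if None."""
    if num_nodes is not None:
        return num_nodes  # an explicit value always wins
    sources, targets = edge_list
    if len(sources) != len(targets):
        raise ValueError("sources and targets must have the same length")
    if not sources:
        return 0
    # Assumption: node ids are 0..N-1, so N = max id + 1.
    return max(max(sources), max(targets)) + 1


# A directed 3-cycle 0 -> 1 -> 2 -> 0
print(infer_num_nodes(([0, 1, 2], [1, 2, 0])))  # 3
```

Under this rule, isolated nodes whose ids exceed every id that appears in an edge cannot be detected, which is one reason to pass num_nodes explicitly.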

Raises:

AssertionError – If the sum of mutation and crossover probabilities exceeds 1.0.
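The assertion exists because the four operator probabilities partition a single uniform draw. A minimal sketch of gplearn-style dispatch, assuming (not confirmed by these docs) that any leftover probability mass means the parent is reproduced unchanged; `choose_operator` is a hypothetical name.

```python
import random


def choose_operator(rng: random.Random,
                    p_crossover: float = 0.9,
                    p_subtree_mutation: float = 0.01,
                    p_hoist_mutation: float = 0.01,
                    p_point_mutation: float = 0.01) -> str:
    """Hypothetical dispatch: pick one variation operator from one uniform draw."""
    probs = [p_crossover, p_subtree_mutation, p_hoist_mutation, p_point_mutation]
    # Mirrors the constructor's AssertionError condition.
    assert sum(probs) <= 1.0, "operator probabilities must sum to at most 1.0"
    names = ["crossover", "subtree_mutation", "hoist_mutation", "point_mutation"]
    u = rng.random()
    cumulative = 0.0
    for name, p in zip(names, probs):
        cumulative += p
        if u < cumulative:
            return name
    # Assumption: the remaining mass (about 0.07 with the defaults) copies the parent.
    return "reproduction"


rng = random.Random(0)
counts = {}
for _ in range(10_000):
    op = choose_operator(rng)
    counts[op] = counts.get(op, 0) + 1
print(counts)
```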

fit(X: ndarray | DataFrame | Dict[str, ndarray], y: ndarray | Series)

Fit the GP model to training data by evolving expression trees.

The input features can be provided as a NumPy array, a pandas DataFrame, or a dictionary mapping variable names to arrays. The method runs the evolutionary loop, tracks the best individual, and stores its expression tree in self.eqtree.

Parameters:

X (ndarray | DataFrame | Dict[str, ndarray]) – Input features with shape (n_samples, n_dims) or an equivalent mapping from variable names to 1D arrays.
y (ndarray | Series) – Target values with shape (n_samples,).

Returns:

The fitted estimator instance.

Return type:

GP

Raises:

ValueError – If X is of an unsupported type.
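The three accepted input forms all reduce to the Dict[str, ndarray] form that evolve and set_fitness take. A minimal sketch of that normalization, assuming NumPy columns map to variables in the order they were passed as variables and that a DataFrame is recognized by duck typing on .columns; `to_feature_dict` is hypothetical.

```python
import numpy as np


def to_feature_dict(X, names):
    """Hypothetical helper: convert X to {variable_name: 1D array}."""
    if isinstance(X, dict):
        return {k: np.asarray(v) for k, v in X.items()}
    if hasattr(X, "columns"):  # pandas DataFrame, detected by duck typing
        return {str(c): np.asarray(X[c]) for c in X.columns}
    if isinstance(X, np.ndarray):
        if X.ndim != 2 or X.shape[1] != len(names):
            raise ValueError(f"expected shape (n_samples, {len(names)}), got {X.shape}")
        # Assumption: column i corresponds to the i-th variable.
        return {name: X[:, i] for i, name in enumerate(names)}
    raise ValueError(f"Unsupported type for X: {type(X).__name__}")


X = np.array([[1.0, 2.0], [3.0, 4.0]])
print(to_feature_dict(X, ["x", "y"]))
```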

predict(X: ndarray | DataFrame | Dict[str, ndarray]) → ndarray

Predict target values for X using the best evolved expression tree.
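Prediction evaluates the best tree on all samples at once. To illustrate vectorized evaluation, here is a minimal sketch over a hypothetical prefix-order (Polish notation) token list, the encoding gplearn uses; nd2py's Symbol trees are structured differently, and the operator table is illustrative.

```python
import numpy as np

# Hypothetical operator table: name -> (arity, vectorized NumPy function).
OPS = {
    "add": (2, np.add),
    "mul": (2, np.multiply),
    "sin": (1, np.sin),
}


def evaluate(program, X):
    """Evaluate a prefix-order token list on a dict of 1D feature arrays."""
    pos = 0

    def step():
        nonlocal pos
        token = program[pos]
        pos += 1
        if token in OPS:
            arity, fn = OPS[token]
            args = [step() for _ in range(arity)]
            return fn(*args)
        if isinstance(token, str):
            return np.asarray(X[token], dtype=float)  # variable lookup
        return token  # numeric constant, broadcast by NumPy

    out = step()
    n = len(next(iter(X.values())))
    # Broadcast so constant-only trees still return one value per sample.
    return np.broadcast_to(out, (n,)).astype(float)


X = {"x": np.array([0.0, 1.0, 2.0])}
print(evaluate(["add", "x", "mul", 2.0, "x"], X))  # [0. 3. 6.]
```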

evolve(population: List[Individual], X: Dict[str, ndarray], y: ndarray, children_size=None, elitism_k=None) → List[Individual]

Evolve a population for one generation and return the offspring.
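One generation can be pictured as elitism plus variation. A minimal sketch, assuming higher fitness is better and that the elitism_k best individuals are copied unchanged before children_size new ones are bred; the dict-based individual and the `make_child` callback (standing in for tournament selection followed by crossover or mutation) are hypothetical.

```python
import random
from typing import Callable, Dict, List

Individual = Dict[str, float]  # hypothetical stand-in for nd2py's Individual


def evolve(population: List[Individual],
           make_child: Callable[[List[Individual]], Individual],
           children_size: int,
           elitism_k: int) -> List[Individual]:
    """Return the next generation: elites first, then freshly bred children."""
    # Assumption: higher fitness is better.
    ranked = sorted(population, key=lambda ind: ind["fitness"], reverse=True)
    elites = [dict(ind) for ind in ranked[:elitism_k]]  # carried over unchanged
    children = [make_child(population) for _ in range(children_size)]
    return elites + children


rng = random.Random(0)
pop = [{"fitness": rng.random()} for _ in range(20)]


def toy_child(parents: List[Individual]) -> Individual:
    # Toy variation: real GP would select, recombine, and re-score a tree.
    return {"fitness": rng.choice(parents)["fitness"] * 0.5}


# children_size = population_size - elitism_k keeps the population size fixed.
next_gen = evolve(pop, toy_child, children_size=15, elitism_k=5)
print(len(next_gen))  # 20
```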

init_population(X: Dict[str, ndarray], y: ndarray) → List[Individual]

Initialize the first population of individuals from the generator.
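The depth_range and full_prob parameters suggest the classic ramped half-and-half initialization (Koza 1992). A minimal sketch under that assumption, using a hypothetical prefix-order token list: each tree draws a maximum depth from depth_range and uses the "full" method with probability full_prob, otherwise "grow". The function and terminal sets are illustrative.

```python
import random

ARITY = {"add": 2, "mul": 2, "sin": 1}  # illustrative function set
TERMINALS = ["x", "y"]                  # illustrative variables


def random_tree(rng, depth, full, const_range=(-1.0, 1.0)):
    """Prefix token list of depth exactly `depth` (full) or at most `depth` (grow)."""
    n_funcs, n_terms = len(ARITY), len(TERMINALS) + 1  # +1: a random constant
    # full: always expand until depth runs out; grow: expand with probability
    # proportional to the number of functions.
    if depth > 0 and (full or rng.random() < n_funcs / (n_funcs + n_terms)):
        name = rng.choice(list(ARITY))
        tokens = [name]
        for _ in range(ARITY[name]):
            tokens += random_tree(rng, depth - 1, full, const_range)
        return tokens
    if rng.random() < 1 / n_terms:
        return [rng.uniform(*const_range)]
    return [rng.choice(TERMINALS)]


def init_population(rng, size, depth_range=(2, 6), full_prob=0.5):
    """Assumed ramped half-and-half: random depth per tree, full or grow at random."""
    population = []
    for _ in range(size):
        depth = rng.randint(*depth_range)  # inclusive on both ends
        full = rng.random() < full_prob
        population.append(random_tree(rng, depth, full))
    return population


def tree_depth(tokens):
    """Depth of a prefix token list (a lone terminal has depth 0)."""
    def walk(pos):
        arity = ARITY.get(tokens[pos], 0)
        pos += 1
        child_depth = -1
        for _ in range(arity):
            d, pos = walk(pos)
            child_depth = max(child_depth, d)
        return child_depth + 1, pos
    return walk(0)[0]


print(init_population(random.Random(0), 3))
```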

tournament(population: List[Individual], num) → List[Individual]

Select individuals from the population via tournament selection.
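Tournament selection makes selection pressure tunable through tournament_size. A minimal sketch, assuming higher fitness is better and that contestants are drawn without replacement within each tournament; the signature (a list of fitness values in, winner indices out) is illustrative, not nd2py's.

```python
import random
from typing import List, Sequence


def tournament(fitness: Sequence[float], num: int, tournament_size: int,
               rng: random.Random) -> List[int]:
    """Return indices of `num` winners, one independent tournament per winner."""
    winners = []
    for _ in range(num):
        # Draw distinct contestants, then keep the fittest one.
        contestants = rng.sample(range(len(fitness)), tournament_size)
        winners.append(max(contestants, key=lambda i: fitness[i]))
    return winners


rng = random.Random(42)
fitness = [rng.random() for _ in range(1000)]
picked = tournament(fitness, num=5, tournament_size=20, rng=rng)
print([round(fitness[i], 3) for i in picked])
```

With tournament_size equal to the population size this always picks the best individual; with a size of 1 it degenerates to uniform random selection.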

crossover(parent: Individual, donor: Individual) → Individual

Create a child by replacing a subtree of parent with one from donor.
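Subtree crossover is easiest to see on a flat prefix-order encoding, which is how gplearn stores programs: every subtree is a contiguous slice, so crossover is a single splice. A minimal sketch under that assumed encoding (nd2py operates on Symbol trees instead); subtree roots are picked uniformly here for brevity.

```python
import random

ARITY = {"add": 2, "mul": 2, "sin": 1}  # illustrative function set


def subtree_end(program, start):
    """Index one past the end of the subtree rooted at `start`."""
    need, end = 1, start
    while need > 0:
        need += ARITY.get(program[end], 0) - 1
        end += 1
    return end


def crossover(parent, donor, rng):
    """Replace a random subtree of parent with a random subtree of donor."""
    start = rng.randrange(len(parent))
    end = subtree_end(parent, start)
    d_start = rng.randrange(len(donor))
    d_end = subtree_end(donor, d_start)
    # Slicing builds a new list, so neither parent nor donor is modified.
    return parent[:start] + donor[d_start:d_end] + parent[end:]


parent = ["add", "x", "mul", "y", 2.0]  # x + y * 2
donor = ["sin", "x"]                    # sin(x)
print(crossover(parent, donor, random.Random(0)))
```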

subtree_mutation(parent: Individual) → Individual

Create a child by replacing a random subtree with a newly generated random tree.
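gplearn implements subtree mutation as crossover against a freshly generated random tree ("headless chicken" crossover). A minimal sketch of that approach on a hypothetical prefix-order token list, with a tiny grow-style generator; both helpers are illustrative, and nd2py may implement this step differently.

```python
import random

ARITY = {"add": 2, "mul": 2, "sin": 1}  # illustrative function set
TERMINALS = ["x", "y"]


def subtree_end(program, start):
    """Index one past the end of the subtree rooted at `start`."""
    need, end = 1, start
    while need > 0:
        need += ARITY.get(program[end], 0) - 1
        end += 1
    return end


def grow(rng, depth):
    """Random tree of depth <= `depth` (grow method, no constants)."""
    if depth > 0 and rng.random() < 0.5:
        name = rng.choice(list(ARITY))
        out = [name]
        for _ in range(ARITY[name]):
            out += grow(rng, depth - 1)
        return out
    return [rng.choice(TERMINALS)]


def subtree_mutation(parent, rng, max_depth=3):
    """Swap a random subtree of parent for a brand-new random subtree."""
    start = rng.randrange(len(parent))
    end = subtree_end(parent, start)
    return parent[:start] + grow(rng, max_depth) + parent[end:]


parent = ["add", "x", "mul", "y", "x"]
print(subtree_mutation(parent, random.Random(3)))
```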

hoist_mutation(parent: Individual) → Individual

Create a child by hoisting a randomly chosen subtree to the root.
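Read literally, hoisting promotes a randomly chosen subtree to become the whole expression, which shrinks trees and counters bloat. A minimal sketch of that reading on a hypothetical prefix-order token list; note that gplearn's hoist variant instead replaces a chosen subtree with one of its own sub-subtrees, so the root is not always replaced there.

```python
import random

ARITY = {"add": 2, "mul": 2, "sin": 1}  # illustrative function set


def subtree_end(program, start):
    """Index one past the end of the subtree rooted at `start`."""
    need, end = 1, start
    while need > 0:
        need += ARITY.get(program[end], 0) - 1
        end += 1
    return end


def hoist_mutation(parent, rng):
    """Return a random subtree of parent as the new (never larger) tree."""
    start = rng.randrange(len(parent))
    return parent[start:subtree_end(parent, start)]


parent = ["add", "sin", "x", "mul", "y", "x"]  # sin(x) + y * x
print(hoist_mutation(parent, random.Random(0)))
```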

point_mutation(parent: Individual) → Individual

Create a child by performing point mutation with random symbol replacements or insertions.
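In gplearn, point mutation visits every node and, with probability p_point_replace, swaps it for a random symbol of the same arity, so the tree shape is preserved. A minimal sketch of that replacement step on a hypothetical prefix-order token list; the insertion variant named in the description is not covered, and the symbol sets are illustrative.

```python
import random

BY_ARITY = {2: ["add", "sub", "mul"], 1: ["sin", "cos", "neg"]}  # illustrative
ARITY = {name: a for a, names in BY_ARITY.items() for name in names}
TERMINALS = ["x", "y"]


def point_mutation(parent, rng, p_point_replace=0.05, const_range=(-1.0, 1.0)):
    """Replace each node, with probability p_point_replace, by a same-arity symbol."""
    child = list(parent)
    for i, token in enumerate(child):
        if rng.random() >= p_point_replace:
            continue  # this node survives unchanged
        if token in ARITY:
            # Functions are swapped only for functions of the same arity.
            child[i] = rng.choice(BY_ARITY[ARITY[token]])
        elif rng.random() < 0.5:
            child[i] = rng.choice(TERMINALS)
        else:
            child[i] = rng.uniform(*const_range)
    return child


parent = ["add", "sin", "x", "mul", "y", 0.5]
print(point_mutation(parent, random.Random(0), p_point_replace=0.5))
```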

set_fitness(individual: Individual, X: Dict[str, ndarray], y: ndarray)

Compute and assign complexity, accuracy, and fitness for an individual.
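These docs do not say how complexity and accuracy are combined into fitness. Purely as a hypothetical illustration of one common pattern: node count as complexity, R² as accuracy, and fitness as accuracy minus a parsimony penalty. The formula, the 0.001 coefficient, and the name `compute_fitness` are all assumptions, not nd2py's definition.

```python
import numpy as np


def compute_fitness(tokens, y_pred, y, parsimony=1e-3):
    """Hypothetical fitness: R^2 accuracy minus a per-node size penalty."""
    complexity = len(tokens)  # node count of a prefix-order token list
    ss_res = float(np.sum((y - y_pred) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    # Constant target: R^2 is undefined, so treat accuracy as 0 (assumption).
    accuracy = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    if not np.isfinite(accuracy):
        accuracy = -np.inf  # NaN/inf predictions rank last
    fitness = accuracy - parsimony * complexity
    return {"complexity": complexity, "accuracy": accuracy, "fitness": fitness}


y = np.array([1.0, 2.0, 3.0])
print(compute_fitness(["add", "x", "x"], y.copy(), y))
```

The parsimony term breaks ties between equally accurate trees in favor of the smaller one.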

get_random_subtree(individual: Individual | Symbol, nettypes: Set[Literal['node', 'edge', 'scalar']] = None) → Symbol

Sample a subtree from an individual following the gplearn/Koza (1992) node-selection strategy.
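In gplearn's subtree selection, each function node gets weight 0.9 and each terminal weight 0.1 before normalization, following Koza (1992), so variation operators favor meaningful building blocks over single leaves. A minimal sketch of that rule on a hypothetical prefix-order token list; the nettypes filter is omitted.

```python
import random

ARITY = {"add": 2, "mul": 2, "sin": 1}  # illustrative function set


def subtree_end(program, start):
    """Index one past the end of the subtree rooted at `start`."""
    need, end = 1, start
    while need > 0:
        need += ARITY.get(program[end], 0) - 1
        end += 1
    return end


def get_random_subtree(program, rng):
    """Pick a subtree root: weight 0.9 per function node, 0.1 per terminal."""
    weights = [0.9 if tok in ARITY else 0.1 for tok in program]
    start = rng.choices(range(len(program)), weights=weights, k=1)[0]
    return program[start:subtree_end(program, start)]


program = ["add", "x", "mul", "y", 2.0]  # 2 functions, 3 terminals
rng = random.Random(0)
draws = [get_random_subtree(program, rng) for _ in range(10_000)]
share = sum(d[0] in ARITY for d in draws) / len(draws)
# Expected share of function-rooted subtrees: 1.8 / 2.1 ≈ 0.857
print(f"function-rooted share: {share:.2f}")
```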

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GP

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object
