nd2py.search.ndformer package#

class nd2py.search.ndformer.NDFormerDataGenerator(config: NDFormerModelConfig)[source]#

Bases: object

__init__(config: NDFormerModelConfig)[source]#

sample(eqtree: Symbol, dist_type: Literal['GMM', 'Uniform', 'Gaussian'] = 'GMM', edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, sample_num: int = None, _rng: Generator = None, **kwargs)[source]#

Arguments:

eqtree – a symbolic expression tree

Returns:

var_dict – a dict with the following entries:

A: np.ndarray, shape (V, V)
G: np.ndarray, shape (E, 2)
out: np.ndarray, shape (N, V) or (N, E)
v1/v2/v3/v4/v5: np.ndarray, shape (N, V)
e1/e2/e3/e4/e5: np.ndarray, shape (N, E)
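The shape conventions above can be illustrated with a small sketch that builds a dense (V, V) matrix and an (E, 2) edge array from a (sources, targets) edge list. The helper name `adjacency_from_edges` is hypothetical and not part of nd2py, and treating A as a directed 0/1 adjacency matrix is an assumption; the documentation only states the shapes.

```python
from typing import List, Tuple


def adjacency_from_edges(
    edge_list: Tuple[List[int], List[int]], num_nodes: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: return (A, G) with A of shape (V, V) and G of shape (E, 2)."""
    sources, targets = edge_list
    # A[s][t] = 1 marks an edge from s to t (assumed directed convention).
    A = [[0] * num_nodes for _ in range(num_nodes)]
    G = []
    for s, t in zip(sources, targets):
        A[s][t] = 1
        G.append([s, t])
    return A, G


# A directed 3-cycle: 0 -> 1 -> 2 -> 0
A, G = adjacency_from_edges(([0, 1, 2], [1, 2, 0]), num_nodes=3)
```

Here V = 3 and E = 3, so A has three rows of three entries and G has three rows of two entries.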

generate_normal_data(size, mean=None, std=None, _rng: Generator = None)[source]#

generate_uniform_data(size, low=None, high=None, _rng: Generator = None)[source]#

generate_GMM_data(size, L=1, _rng: Generator = None)[source]#

class nd2py.search.ndformer.NDFormerEqtreeGenerator(variables: List[Variable], binary: List[str | Symbol] = [Add, Sub, Mul, Div], unary: List[str | Symbol] = [Sqrt, SqrtAbs, Pow2, Pow3, Log, LogAbs, Exp, Abs, Neg, Inv, Sin, Cos, Tan, Tanh, Sigmoid, Aggr, Sour, Targ, Readout], full_prob: float = 0.5, depth_range: Tuple[int, int] = (2, 6), const_range: Tuple[float, float] = None, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, scalar_number_only=True)[source]#

Bases: object

__init__(variables: List[Variable], binary: List[str | Symbol] = [Add, Sub, Mul, Div], unary: List[str | Symbol] = [Sqrt, SqrtAbs, Pow2, Pow3, Log, LogAbs, Exp, Abs, Neg, Inv, Sin, Cos, Tan, Tanh, Sigmoid, Aggr, Sour, Targ, Readout], full_prob: float = 0.5, depth_range: Tuple[int, int] = (2, 6), const_range: Tuple[float, float] = None, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, scalar_number_only=True)[source]#

generate_node(nettypes: Set[NetType], _rng: np.random.Generator = None) → Symbol[source]#

generate_leaf(nettypes: Set[NetType], _rng: np.random.Generator = None) → Number | Variable[source]#

sample(nettypes: Set[NetType] = 'scalar', assign_root_nettypes=True, _rng: np.random.Generator = None) → Symbol[source]#
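The roles of full_prob and depth_range are not documented here. One plausible reading, borrowed from genetic programming, is that a depth is drawn from depth_range and the tree is then built with the "full" method (every branch reaches that depth) with probability full_prob, and with the "grow" method (branches may stop early) otherwise. The sketch below illustrates that reading only; the operator lists, the 0.3 early-stop chance, and all function names are hypothetical, and nd2py's actual generator may work differently.

```python
import random
from typing import Optional, Tuple, Union

Tree = Union[str, tuple]

BINARY = ["Add", "Sub", "Mul", "Div"]
UNARY = ["Sin", "Cos", "Exp"]
LEAVES = ["x", "y", "1.0"]


def random_tree(depth: int, full: bool, rng: random.Random) -> Tree:
    """Build a nested-tuple expression tree of at most `depth` levels."""
    if depth <= 1:
        return rng.choice(LEAVES)
    # "grow" mode may terminate a branch early; "full" mode never does.
    if not full and rng.random() < 0.3:
        return rng.choice(LEAVES)
    if rng.random() < 0.5:
        return (rng.choice(UNARY), random_tree(depth - 1, full, rng))
    return (
        rng.choice(BINARY),
        random_tree(depth - 1, full, rng),
        random_tree(depth - 1, full, rng),
    )


def sample_tree(
    full_prob: float = 0.5,
    depth_range: Tuple[int, int] = (2, 6),
    seed: Optional[int] = None,
) -> Tree:
    rng = random.Random(seed)
    depth = rng.randint(*depth_range)  # inclusive on both ends
    full = rng.random() < full_prob
    return random_tree(depth, full, rng)


def tree_depth(tree: Tree) -> int:
    if isinstance(tree, str):
        return 1
    return 1 + max(tree_depth(child) for child in tree[1:])
```

With full_prob=1.0 every sampled tree reaches exactly the drawn depth; with full_prob=0.0 the depth is only an upper bound.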

class nd2py.search.ndformer.NDFormerGraphGenerator(config: NDFormerModelConfig)[source]#

Bases: object

__init__(config: NDFormerModelConfig)[source]#

sample(topology: Literal['ER', 'BA', 'WS', 'Complete'] = None, _rng: Generator = None, **kwargs)[source]#

Arguments:

V – number of nodes
topology – one of 'ER', 'BA', 'WS', 'Complete'
kwargs – topology-specific options:

When topology is 'ER': p (edge probability), directed (whether the graph is directed)
When topology is 'BA': m (number of edges to attach from a new node to existing nodes)
When topology is 'WS': k (each node is connected to its k nearest neighbors in a ring topology), p (probability of rewiring each edge)
When topology is 'Complete': none

Returns:

edge_list – edge list of shape (2, E)
num_nodes – int, number of nodes
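As a rough illustration of the 'ER' case and the (2, E) return layout, the following standard-library sketch samples an Erdős–Rényi graph by including each candidate edge with probability p. The function name `sample_er_graph` and its `seed` argument are hypothetical; the real generate_ER_graph takes V, E and directed, so its sampling procedure may differ from this one.

```python
import random
from typing import List, Optional, Tuple


def sample_er_graph(
    num_nodes: int, p: float, directed: bool = False, seed: Optional[int] = None
) -> Tuple[Tuple[List[int], List[int]], int]:
    """Hypothetical G(n, p) sampler returning ((sources, targets), num_nodes)."""
    rng = random.Random(seed)
    sources: List[int] = []
    targets: List[int] = []
    for u in range(num_nodes):
        # Undirected graphs only consider each unordered pair once (u < v).
        start = 0 if directed else u + 1
        for v in range(start, num_nodes):
            if u == v:
                continue  # no self-loops
            if rng.random() < p:
                sources.append(u)
                targets.append(v)
    return (sources, targets), num_nodes


edge_list, num_nodes = sample_er_graph(5, p=1.0, directed=False, seed=0)
```

The two inner lists have the same length E, which matches the (2, E) shape described above.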

generate_ER_graph(V=None, E=None, directed=None, _rng: Generator = None)[source]#

generate_BA_graph(V=None, m=None, _rng: Generator = None)[source]#

generate_WS_graph(V=None, k=None, p=None, _rng: Generator = None)[source]#

generate_complete_graph(V=None, _rng: Generator = None)[source]#

class nd2py.search.ndformer.NDFormerModelConfig(n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100, max_var_num: int = 10, model: str = 'default', n_head: int = 8, d_emb: int = 128, d_ff: int = 512, dropout: float = 0.2, n_GNN_layers: int = 2, n_transformer_encoder_layers: int = 2, n_transformer_decoder_layers: int = 2, use_aux_input: bool = True, n_induction_points: int = 128, max_seq_len: int = 100, operands: Tuple[str] = <factory>, min_data_num: int = 100, max_data_num: int = 200, min_node_num: int = 10, max_node_num: int = 100, min_edge_num: int = 20, max_edge_num: int = 600, min_var_val: int = -10, max_var_val: int = 10, min_coeff_val: int = -20, max_coeff_val: int = 20)[source]#

Bases: object

Configuration for NDFormer model architecture and capabilities.

Purpose

This class defines the model’s structure and capabilities:

Model architecture (transformer layers, GNN layers, embedding dimensions)
Tokenization scheme (number encoding, vocabulary)
Supported operators and sequence length limits

Relationship with NDFormerMCTS (inference-time search)

TL;DR: Users of NDFormerMCTS do not need to interact with this class directly.

When using a pre-trained model with NDFormerMCTS:

The trained model + config is a black box: The model and its associated config are loaded together from a checkpoint. The config is used internally to reconstruct the tokenizer and model architecture.
No control over search behavior: NDFormerMCTS does NOT use this config to control how search proceeds. Search parameters (beam_width, temperature, c, etc.) are configured directly in NDFormerMCTS.__init__().
Capability validation only: NDFormerMCTS may use the config to verify that search settings are within the model’s capabilities:

max_len (search) should not exceed max_seq_len (model capability)
The operator set should be compatible with the trained vocabulary
The variable count should not exceed max_var_num

This design follows standard ML practice where model architecture config is separate from inference/search hyperparameters.
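The three capability checks listed above can be sketched as a small standalone function. `ModelCapabilities` and `check_search_settings` are hypothetical stand-ins written for this illustration; they are not nd2py classes, and how NDFormerMCTS actually reports a mismatch (warning, exception, or nothing) is not stated in this documentation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ModelCapabilities:
    """Hypothetical stand-in holding the three limits named above."""

    max_seq_len: int = 100
    max_var_num: int = 10
    operands: Tuple[str, ...] = ("Add", "Sub", "Mul", "Div", "Sin", "Cos")


def check_search_settings(
    cap: ModelCapabilities,
    max_len: int,
    operators: Sequence[str],
    n_variables: int,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    if max_len > cap.max_seq_len:
        problems.append(f"max_len={max_len} exceeds max_seq_len={cap.max_seq_len}")
    unknown = sorted(set(operators) - set(cap.operands))
    if unknown:
        problems.append(f"operators not in trained vocabulary: {unknown}")
    if n_variables > cap.max_var_num:
        problems.append(f"{n_variables} variables exceed max_var_num={cap.max_var_num}")
    return problems


cap = ModelCapabilities()
ok = check_search_settings(cap, max_len=30, operators=["Add", "Sin"], n_variables=2)
bad = check_search_settings(cap, max_len=200, operators=["Add", "Tanh"], n_variables=11)
```

Returning a list of problems rather than raising keeps the check advisory, which fits the "may use the config to verify" wording above.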

Usage

Training a new model:

```python
import torch

config = NDFormerModelConfig(model='default', n_head=16, d_emb=256)
tokenizer = NDFormerTokenizer(config, variables)
model = NDFormerModel.create(config, tokenizer)
# ... train on dataset ...
torch.save({'model': model.state_dict(), 'config': config}, 'checkpoint.pth')
```

Using a pre-trained model (automatic; users don't handle the config directly):

```python
search = NDFormerMCTS(variables=[x, y])
search.load_ndformer('hf://YuMeow/ndformer:best.pth')
# Config is automatically loaded and used for capability validation
search.fit(X, y)
```

Creating alternative model architectures:

```python
@NDFormerModel.register_model('gcn')
class GCNNDFormer(NDFormerModel):
    def __init__(self, config, tokenizer):
        super().__init__(config, tokenizer)
        # ... custom architecture ...

config = NDFormerModelConfig(model='gcn')
model = NDFormerModel.create(config, tokenizer)
```

Attributes

n_mantissa: int = 4#: Number of digits in mantissa for number tokenization.

min_exponent: int = -100#: Minimum exponent value for number tokenization.

max_exponent: int = 100#: Maximum exponent value for number tokenization.

max_var_num: int = 10#: Maximum number of variables per nettype (node/edge/scalar).

model: str = 'default'#

Model architecture type. Used by NDFormerModel.create() to select subclass.

Available models are registered via @NDFormerModel.register_model('name'). Default is 'default' (the base NDFormerModel architecture).
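The register-then-create mechanism described here is a common decorator-based registry pattern. The sketch below shows the general idea with hypothetical classes (`SketchModel`, `SketchGCN`) and a plain dict as the config; it is not NDFormerModel's implementation, whose internals are not shown in this documentation.

```python
from typing import Callable, Dict, Type


class SketchModel:
    """Hypothetical base class with a name -> subclass registry."""

    _registry: Dict[str, Type["SketchModel"]] = {}

    def __init__(self, config: dict):
        self.config = config

    @classmethod
    def register_model(
        cls, name: str
    ) -> Callable[[Type["SketchModel"]], Type["SketchModel"]]:
        def decorator(subclass: Type["SketchModel"]) -> Type["SketchModel"]:
            cls._registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def create(cls, config: dict) -> "SketchModel":
        # The 'model' key selects the subclass; 'default' is the fallback.
        name = config.get("model", "default")
        try:
            model_cls = cls._registry[name]
        except KeyError:
            raise ValueError(f"unknown model type: {name!r}") from None
        return model_cls(config)


# The base architecture is registered under 'default'.
SketchModel._registry["default"] = SketchModel


@SketchModel.register_model("gcn")
class SketchGCN(SketchModel):
    pass


model = SketchModel.create({"model": "gcn"})
```

Because the registry maps strings to classes, a checkpoint only needs to store the string to reconstruct the right architecture.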

n_head: int = 8#: Number of attention heads in multi-head self-attention.

d_emb: int = 128#: Dimension of token embeddings and hidden states.

d_ff: int = 512#: Dimension of feed-forward network intermediate layer.

dropout: float = 0.2#: Dropout probability applied to embeddings and attention.

n_GNN_layers: int = 2#: Number of graph neural network layers for encoding graph topology.

n_transformer_encoder_layers: int = 2#: Number of transformer encoder layers.

n_transformer_decoder_layers: int = 2#: Number of transformer decoder layers for autoregressive generation.

use_aux_input: bool = True#: Whether to use auxiliary inputs (parent/nettype information).

__init__(n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100, max_var_num: int = 10, model: str = 'default', n_head: int = 8, d_emb: int = 128, d_ff: int = 512, dropout: float = 0.2, n_GNN_layers: int = 2, n_transformer_encoder_layers: int = 2, n_transformer_decoder_layers: int = 2, use_aux_input: bool = True, n_induction_points: int = 128, max_seq_len: int = 100, operands: Tuple[str] = <factory>, min_data_num: int = 100, max_data_num: int = 200, min_node_num: int = 10, max_node_num: int = 100, min_edge_num: int = 20, max_edge_num: int = 600, min_var_val: int = -10, max_var_val: int = 10, min_coeff_val: int = -20, max_coeff_val: int = 20) → None#

n_induction_points: int = 128#: Number of induction points for Set Transformer encoder (FLASH-ANSR). Only used when model='flash_ansr'.

max_seq_len: int = 100#

Maximum sequence length the model can handle.

Note: NDFormerMCTS uses this for capability validation - search with max_len > max_seq_len may produce unreliable results.

operands: Tuple[str]#

Tuple of operator class names in the model vocabulary.

Note: NDFormerMCTS may check if its operator set is compatible with the trained model’s vocabulary.

min_data_num: int = 100#: Minimum number of samples per training equation.

max_data_num: int = 200#: Maximum number of samples per training equation.

min_node_num: int = 10#: Minimum number of nodes in generated graphs.

max_node_num: int = 100#: Maximum number of nodes in generated graphs.

min_edge_num: int = 20#: Minimum number of edges in generated graphs.

max_edge_num: int = 600#: Maximum number of edges in generated graphs.

min_var_val: int = -10#: Minimum absolute value for variable sampling.

max_var_val: int = 10#: Maximum absolute value for variable sampling.

min_coeff_val: int = -20#: Minimum value for equation coefficients.

max_coeff_val: int = 20#: Maximum value for equation coefficients.

class nd2py.search.ndformer.NDFormerTokenizer(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#

Bases: object

__init__(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#

property vocab_size#

property pad_token_id#

property sos_token_id#

property eos_token_id#

property unk_token_id#

encode(eqtree: Symbol, mode: Literal['token', 'token_id'] = 'token') → Tuple[List[int], List[int], List[int]][source]#

decode(tokens: List[str], parents: List[str], nettypes: List[str], mode: Literal['token', 'token_id'] = 'token') → Symbol[source]#

encode_array(data: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#: Converts a plain float array into tokens or token IDs.

decode_array(tokens: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#: Converts an array of tokens or token IDs back into a plain float array.
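The exact token layout used by encode_array and decode_array is not documented here. As a sketch of how a float can be split into sign, mantissa, and exponent tokens in the spirit of n_mantissa, min_exponent and max_exponent, the following example uses a hypothetical three-token scheme (`+`/`-`, `N<digits>`, `E<exponent>`). The token names and the clamping behavior are assumptions made for illustration.

```python
from typing import List


def encode_float(
    x: float, n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100
) -> List[str]:
    """Hypothetical scheme: x ~ sign * mantissa * 10**exponent, mantissa an integer."""
    if x == 0:
        return ["+", "N" + "0" * n_mantissa, "E0"]
    # Scientific notation with n_mantissa significant digits, e.g. '3.142e+00'.
    sci = f"{abs(x):.{n_mantissa - 1}e}"
    digits, exp = sci.split("e")
    mantissa = digits.replace(".", "")
    # Shift the exponent so that the mantissa is read as an integer.
    exponent = int(exp) - (n_mantissa - 1)
    # Clamping changes the represented value for out-of-range magnitudes.
    exponent = max(min_exponent, min(max_exponent, exponent))
    sign = "-" if x < 0 else "+"
    return [sign, "N" + mantissa, f"E{exponent}"]


def decode_float(tokens: List[str]) -> float:
    sign, mantissa, exponent = tokens
    value = int(mantissa[1:]) * 10.0 ** int(exponent[1:])
    return -value if sign == "-" else value


tokens = encode_float(3.14159)
restored = decode_float(tokens)
```

The round trip is lossy by design: only n_mantissa significant digits survive, so 3.14159 comes back as approximately 3.142.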

to_dict() → dict[source]#: Exports the core configuration for serialization.

classmethod from_dict(config: dict) → NDFormerTokenizer[source]#

save(filepath: str)[source]#: Saves to a local JSON file.

classmethod load(filepath: str) → NDFormerTokenizer[source]#: Loads from a local JSON file.

nd2py.search.ndformer.setup_lazy_imports(module_name: str, import_mapping: Dict[str, Tuple[str, str]])[source]#

Set up lazy imports for a module’s __init__.py.

Returns (__getattr__, __dir__, __all__) which should be assigned at the module level so that from package import OptionalClass works without importing the optional dependency until it is actually needed.

Parameters:

module_name – The __name__ of the calling module.
import_mapping – A dict mapping attribute names to (module_path, requires) tuples. module_path is a relative import path (e.g. ".torch_calc") and requires is the optional-dependency group name (e.g. "nn") shown in the error message when the dependency is missing.

Usage:

# __init__.py
from typing import TYPE_CHECKING

from .core import CoreClass
from ..utils.lazy_loader import setup_lazy_imports

# Imported only for static type checkers; never executed at runtime.
if TYPE_CHECKING:
    from .optional import OptionalClass

__getattr__, __dir__, __all__ = setup_lazy_imports(__name__, {
    "OptionalClass": (".optional", "nn"),
})
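The mechanism behind this kind of helper is the module-level `__getattr__` hook from PEP 562. The following sketch shows the core idea using only the standard library. `make_lazy_attrs` is a hypothetical simplification: it takes absolute module paths, returns only two of the three values, and is not the implementation of setup_lazy_imports.

```python
import importlib
from typing import Callable, Dict, List, Tuple


def make_lazy_attrs(
    import_mapping: Dict[str, Tuple[str, str]],
) -> Tuple[Callable[[str], object], Callable[[], List[str]]]:
    """Return (__getattr__, __dir__) closures that import on first access."""

    def __getattr__(name: str):
        if name not in import_mapping:
            raise AttributeError(f"no attribute {name!r}")
        module_path, requires = import_mapping[name]
        try:
            # The import happens here, not when the package is first loaded.
            module = importlib.import_module(module_path)
        except ImportError as exc:
            raise ImportError(
                f"{name} requires the optional dependency group '{requires}'"
            ) from exc
        return getattr(module, name)

    def __dir__() -> List[str]:
        return sorted(import_mapping)

    return __getattr__, __dir__


lazy_getattr, lazy_dir = make_lazy_attrs({
    "OrderedDict": ("collections", "stdlib"),
    "Missing": ("module_that_does_not_exist_xyz", "nn"),
})
```

Assigning the two closures to `__getattr__` and `__dir__` at module level is what lets `from package import Name` trigger the deferred import.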

Submodules#

nd2py.search.ndformer.ndformer_config module#

class nd2py.search.ndformer.ndformer_config.NDFormerModelConfig(n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100, max_var_num: int = 10, model: str = 'default', n_head: int = 8, d_emb: int = 128, d_ff: int = 512, dropout: float = 0.2, n_GNN_layers: int = 2, n_transformer_encoder_layers: int = 2, n_transformer_decoder_layers: int = 2, use_aux_input: bool = True, n_induction_points: int = 128, max_seq_len: int = 100, operands: Tuple[str] = <factory>, min_data_num: int = 100, max_data_num: int = 200, min_node_num: int = 10, max_node_num: int = 100, min_edge_num: int = 20, max_edge_num: int = 600, min_var_val: int = -10, max_var_val: int = 10, min_coeff_val: int = -20, max_coeff_val: int = 20)[source]#

Bases: object

Configuration for NDFormer model architecture and capabilities.

This class defines the model’s structure and capabilities:

Model architecture (transformer layers, GNN layers, embedding dimensions)
Tokenization scheme (number encoding, vocabulary)
Supported operators and sequence length limits

TL;DR: Users of NDFormerMCTS do not need to interact with this class directly.

When using a pre-trained model with NDFormerMCTS:

The trained model + config is a black box: The model and its associated config are loaded together from a checkpoint. The config is used internally to reconstruct the tokenizer and model architecture.
No control over search behavior: NDFormerMCTS does NOT use this config to control how search proceeds. Search parameters (beam_width, temperature, c, etc.) are configured directly in NDFormerMCTS.__init__().
Capability validation only: NDFormerMCTS may use the config to verify that search settings are within the model’s capabilities:

max_len (search) should not exceed max_seq_len (model capability)
The operator set should be compatible with the trained vocabulary
The variable count should not exceed max_var_num

This design follows standard ML practice where model architecture config is separate from inference/search hyperparameters.

Creating alternative model architectures:

```python
@NDFormerModel.register_model('gcn')
class GCNNDFormer(NDFormerModel):
    def __init__(self, config, tokenizer):
        super().__init__(config, tokenizer)
        # ... custom architecture ...

config = NDFormerModelConfig(model='gcn')
model = NDFormerModel.create(config, tokenizer)
```

n_mantissa: int = 4#: Number of digits in mantissa for number tokenization.

min_exponent: int = -100#: Minimum exponent value for number tokenization.

max_exponent: int = 100#: Maximum exponent value for number tokenization.

max_var_num: int = 10#: Maximum number of variables per nettype (node/edge/scalar).

model: str = 'default'#

Model architecture type. Used by NDFormerModel.create() to select subclass.

Available models are registered via @NDFormerModel.register_model('name'). Default is 'default' (the base NDFormerModel architecture).

n_head: int = 8#: Number of attention heads in multi-head self-attention.

d_emb: int = 128#: Dimension of token embeddings and hidden states.

d_ff: int = 512#: Dimension of feed-forward network intermediate layer.

dropout: float = 0.2#: Dropout probability applied to embeddings and attention.

n_GNN_layers: int = 2#: Number of graph neural network layers for encoding graph topology.

n_transformer_encoder_layers: int = 2#: Number of transformer encoder layers.

n_transformer_decoder_layers: int = 2#: Number of transformer decoder layers for autoregressive generation.

use_aux_input: bool = True#: Whether to use auxiliary inputs (parent/nettype information).

__init__(n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100, max_var_num: int = 10, model: str = 'default', n_head: int = 8, d_emb: int = 128, d_ff: int = 512, dropout: float = 0.2, n_GNN_layers: int = 2, n_transformer_encoder_layers: int = 2, n_transformer_decoder_layers: int = 2, use_aux_input: bool = True, n_induction_points: int = 128, max_seq_len: int = 100, operands: Tuple[str] = <factory>, min_data_num: int = 100, max_data_num: int = 200, min_node_num: int = 10, max_node_num: int = 100, min_edge_num: int = 20, max_edge_num: int = 600, min_var_val: int = -10, max_var_val: int = 10, min_coeff_val: int = -20, max_coeff_val: int = 20) → None#

n_induction_points: int = 128#: Number of induction points for Set Transformer encoder (FLASH-ANSR). Only used when model='flash_ansr'.

max_seq_len: int = 100#

Maximum sequence length the model can handle.

Note: NDFormerMCTS uses this for capability validation - search with max_len > max_seq_len may produce unreliable results.

operands: Tuple[str]#

Tuple of operator class names in the model vocabulary.

Note: NDFormerMCTS may check if its operator set is compatible with the trained model’s vocabulary.

min_data_num: int = 100#: Minimum number of samples per training equation.

max_data_num: int = 200#: Maximum number of samples per training equation.

min_node_num: int = 10#: Minimum number of nodes in generated graphs.

max_node_num: int = 100#: Maximum number of nodes in generated graphs.

min_edge_num: int = 20#: Minimum number of edges in generated graphs.

max_edge_num: int = 600#: Maximum number of edges in generated graphs.

min_var_val: int = -10#: Minimum absolute value for variable sampling.

max_var_val: int = 10#: Maximum absolute value for variable sampling.

min_coeff_val: int = -20#: Minimum value for equation coefficients.

max_coeff_val: int = 20#: Maximum value for equation coefficients.

nd2py.search.ndformer.ndformer_dataset module#

class nd2py.search.ndformer.ndformer_dataset.InfiniteSampler(*args: Any, **kwargs: Any)[source]#: Bases: Sampler

class nd2py.search.ndformer.ndformer_dataset.NDFormerDataset(*args: Any, **kwargs: Any)[source]#

Bases: Dataset

__init__(config: NDFormerModelConfig, eqtree_generator: NDFormerEqtreeGenerator, topo_generator: NDFormerGraphGenerator, data_generator: NDFormerDataGenerator, tokenizer: NDFormerTokenizer, n_samples: int | None = None, random_state: int | None = None)[source]#

collate_fn(batch)[source]#

get_sampler()[source]#

nd2py.search.ndformer.ndformer_generator module#

class nd2py.search.ndformer.ndformer_generator.NDFormerEqtreeGenerator(variables: List[Variable], binary: List[str | Symbol] = [Add, Sub, Mul, Div], unary: List[str | Symbol] = [Sqrt, SqrtAbs, Pow2, Pow3, Log, LogAbs, Exp, Abs, Neg, Inv, Sin, Cos, Tan, Tanh, Sigmoid, Aggr, Sour, Targ, Readout], full_prob: float = 0.5, depth_range: Tuple[int, int] = (2, 6), const_range: Tuple[float, float] = None, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, scalar_number_only=True)[source]#

Bases: object

__init__(variables: List[Variable], binary: List[str | Symbol] = [Add, Sub, Mul, Div], unary: List[str | Symbol] = [Sqrt, SqrtAbs, Pow2, Pow3, Log, LogAbs, Exp, Abs, Neg, Inv, Sin, Cos, Tan, Tanh, Sigmoid, Aggr, Sour, Targ, Readout], full_prob: float = 0.5, depth_range: Tuple[int, int] = (2, 6), const_range: Tuple[float, float] = None, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, scalar_number_only=True)[source]#

generate_node(nettypes: Set[NetType], _rng: np.random.Generator = None) → Symbol[source]#

generate_leaf(nettypes: Set[NetType], _rng: np.random.Generator = None) → Number | Variable[source]#

sample(nettypes: Set[NetType] = 'scalar', assign_root_nettypes=True, _rng: np.random.Generator = None) → Symbol[source]#

class nd2py.search.ndformer.ndformer_generator.NDFormerGraphGenerator(config: NDFormerModelConfig)[source]#

Bases: object

__init__(config: NDFormerModelConfig)[source]#

sample(topology: Literal['ER', 'BA', 'WS', 'Complete'] = None, _rng: Generator = None, **kwargs)[source]#

Arguments:

V – number of nodes
topology – one of 'ER', 'BA', 'WS', 'Complete'
kwargs – topology-specific options:

When topology is 'ER': p (edge probability), directed (whether the graph is directed)
When topology is 'BA': m (number of edges to attach from a new node to existing nodes)
When topology is 'WS': k (each node is connected to its k nearest neighbors in a ring topology), p (probability of rewiring each edge)
When topology is 'Complete': none

Returns:

edge_list – edge list of shape (2, E)
num_nodes – int, number of nodes

generate_ER_graph(V=None, E=None, directed=None, _rng: Generator = None)[source]#

generate_BA_graph(V=None, m=None, _rng: Generator = None)[source]#

generate_WS_graph(V=None, k=None, p=None, _rng: Generator = None)[source]#

generate_complete_graph(V=None, _rng: Generator = None)[source]#

class nd2py.search.ndformer.ndformer_generator.NDFormerDataGenerator(config: NDFormerModelConfig)[source]#

Bases: object

__init__(config: NDFormerModelConfig)[source]#

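sample(eqtree: Symbol, dist_type: Literal['GMM', 'Uniform', 'Gaussian'] = 'GMM', edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, sample_num: int = None, _rng: Generator = None, **kwargs)[source]#

Arguments:

eqtree – a symbolic expression tree

Returns:

var_dict – a dict with the following entries:

A: np.ndarray, shape (V, V)
G: np.ndarray, shape (E, 2)
out: np.ndarray, shape (N, V) or (N, E)
v1/v2/v3/v4/v5: np.ndarray, shape (N, V)
e1/e2/e3/e4/e5: np.ndarray, shape (N, E)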

generate_normal_data(size, mean=None, std=None, _rng: Generator = None)[source]#

generate_uniform_data(size, low=None, high=None, _rng: Generator = None)[source]#

generate_GMM_data(size, L=1, _rng: Generator = None)[source]#

nd2py.search.ndformer.ndformer_mcts module#

NDFormer-guided MCTS for Symbolic Regression

Uses a pre-trained NDFormer model to guide MCTS search via PUCT.

class nd2py.search.ndformer.ndformer_mcts.NDFormerNode(eqtree: Symbol)[source]#

Bases: Node

__init__(eqtree: Symbol)[source]#

UCT(c) → float[source]#

PUCT score for a node

PUCT(s, a) = Q(s, a) + c_puct * P(s, a) * sqrt(sum(N(s, b)) / (1 + N(s, a)))

class nd2py.search.ndformer.ndformer_mcts.NDFormerMCTS(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter: int = 100, use_tqdm: bool = False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, time_limit: float = None, sample_num: int = 300, keep_vars: bool = False, normalize_y: bool = False, normalize_X: bool = False, remove_abnormal: bool = False, train_eval_split: float = 1.0, child_num: int = 50, n_playout: int = 100, d_playout: int = 10, max_len: int = 30, c: float = 1.41, eta: float = 0.999, ndformer: NDFormerModel | None = None, tokenizer: NDFormerTokenizer | None = None, ndformer_temperature: float = 1.0, beam_width: int = 10, **kwargs)[source]#

Bases: MCTS

NDFormer-guided Monte Carlo Tree Search for Symbolic Regression.

This class extends MCTS by using a pre-trained NDFormer model to provide prior probabilities for action selection via the PUCT algorithm:

PUCT(s, a) = Q(s, a) + c_puct * P(s, a) * sqrt(sum(N(s, b)) / (1 + N(s, a)))

where P(s, a) is the prior probability from NDFormer’s policy head.
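The formula can be transcribed directly into a few lines of Python. The function below follows the expression exactly as written above, with the square root taken over the whole ratio; the function and argument names are chosen for this illustration and are not nd2py identifiers.

```python
import math
from typing import Sequence


def puct_score(
    q: float,
    prior: float,
    child_visits: int,
    sibling_visits: Sequence[int],
    c_puct: float = 1.41,
) -> float:
    """PUCT(s, a) = Q(s, a) + c_puct * P(s, a) * sqrt(sum_b N(s, b) / (1 + N(s, a)))."""
    total_visits = sum(sibling_visits)  # sum over all actions b at state s
    exploration = c_puct * prior * math.sqrt(total_visits / (1 + child_visits))
    return q + exploration


# Q = 0.5, P = 0.2, N(s, a) = 3, visits over all actions = 3 + 5 + 8 = 16
score = puct_score(0.5, 0.2, 3, [3, 5, 8], c_puct=1.0)
```

In this example the exploration term is 0.2 * sqrt(16 / 4) = 0.4, so the score is 0.9. A rarely visited action with a high prior gets a larger bonus, which is how the policy prior steers the search.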

The pre-trained model is treated as a black box - it provides policy priors but does not control search behavior. Search is controlled by parameters like beam_width, ndformer_temperature, c, and eta.

Usage Examples#

# Load pre-trained model from Hugging Face Hub
search = NDFormerMCTS(variables=[x, y])
search.load_ndformer('hf://YuMeow/ndformer:best.pth')
search.fit(X, y)

# Or pass model directly
model = NDFormerModel(config)
model.load_state_dict(checkpoint)
tokenizer = NDFormerTokenizer(config, [x, y])
search = NDFormerMCTS(
    variables=[x, y],
    ndformer=model,
    tokenizer=tokenizer,
    beam_width=20,
    c=1.5,
)
search.fit(X, y)

Parameters:

variables (List[nd.Variable]) – Input variables for symbolic regression.
binary (List[nd.Symbol], optional) – Binary operators for search. Default: [Add, Sub, Mul, Div, Max, Min].
unary (List[nd.Symbol], optional) – Unary operators for search. Default: [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan].
max_params (int, optional) – Maximum number of numeric parameters in expressions. Default: 2.
const_range (Tuple[float, float], optional) – Range for constant initialization. Default: (-1.0, 1.0).
depth_range (Tuple[int, int], optional) – Depth range for generated expressions. Default: (2, 6).
nettype (Literal["node", "edge", "scalar"], optional) – Network type for the search. Default: "scalar".
log_per_iter (int, optional) – Log every N iterations. Default: inf.
log_per_sec (float, optional) – Log every N seconds. Default: inf.
log_detailed_speed (bool, optional) – Log detailed timing information. Default: False.
save_path (str, optional) – Directory to save search records. Default: None.
random_state (int, optional) – Random seed for reproducibility. Default: None.
n_iter (int, optional) – Maximum number of MCTS iterations. Default: 100.
use_tqdm (bool, optional) – Show progress bar. Default: False.
edge_list (Tuple[List[int], List[int]], optional) – Graph edge list for network operators. Default: None.
num_nodes (int, optional) – Number of nodes in the graph. Default: None.
time_limit (float, optional) – Maximum search time in seconds. Default: None.
sample_num (int, optional) – Number of samples for evaluation. Default: 300.
keep_vars (bool, optional) – Keep original variable names. Default: False.
normalize_y (bool, optional) – Normalize target values. Default: False.
normalize_X (bool, optional) – Normalize input features. Default: False.
remove_abnormal (bool, optional) – Remove abnormal samples. Default: False.
train_eval_split (float, optional) – Train/eval data split ratio. Default: 1.0.
child_num (int, optional) – Maximum children per expansion. Default: 50.
n_playout (int, optional) – Number of playouts per simulation. Default: 100.
d_playout (int, optional) – Maximum depth per playout. Default: 10.
max_len (int, optional) – Maximum expression length during search. Default: 30.
c (float, optional) – PUCT exploration constant. Default: 1.41.
eta (float, optional) – Complexity penalty factor for reward. Default: 0.999.
ndformer (NDFormerModel, optional) – Pre-trained NDFormer model. Default: None.
tokenizer (NDFormerTokenizer, optional) – Tokenizer for the model. Default: None.
ndformer_temperature (float, optional) – Temperature for policy softmax. Default: 1.0.
beam_width (int, optional) – Beam size for leaf selection. Default: 10.
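The effect of a softmax temperature such as ndformer_temperature can be seen in a small standalone sketch. It assumes the usual convention of dividing logits by the temperature before normalizing; whether NDFormerMCTS applies the temperature in exactly this way is not stated in this documentation, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(
    logits: Sequence[float], temperature: float = 1.0
) -> List[float]:
    """Temperature-scaled softmax (assumed convention: logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum for numerical stability; this does not change the result.
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


sharp = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.5)
base = softmax_with_temperature([2.0, 1.0, 0.0], temperature=1.0)
flat = softmax_with_temperature([2.0, 1.0, 0.0], temperature=5.0)
```

Under this convention, temperatures below 1 concentrate the prior on the top-ranked action, while temperatures above 1 flatten it and give the search more room to explore.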

__init__(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter: int = 100, use_tqdm: bool = False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, time_limit: float = None, sample_num: int = 300, keep_vars: bool = False, normalize_y: bool = False, normalize_X: bool = False, remove_abnormal: bool = False, train_eval_split: float = 1.0, child_num: int = 50, n_playout: int = 100, d_playout: int = 10, max_len: int = 30, c: float = 1.41, eta: float = 0.999, ndformer: NDFormerModel | None = None, tokenizer: NDFormerTokenizer | None = None, ndformer_temperature: float = 1.0, beam_width: int = 10, **kwargs)[source]#

Initialize a Monte Carlo Tree Search symbolic regression estimator.

This configures the function set, search hyperparameters, logging behavior, optional graph structure, and various data preprocessing options used during MCTS-based exploration of expression trees.

Parameters:

variables (List[Variable]) – List of input variables that can be used in generated expressions.
binary (List[Symbol], optional) – Binary operator symbols available to the search (for example Add, Sub, Mul). Defaults to a standard arithmetic and min/max set.
unary (List[Symbol], optional) – Unary operator symbols available to the search (for example Sqrt, Log, Sin). Defaults to a standard set of common functions.
max_params (int, optional) – Maximum number of numeric parameters (Number nodes) allowed in an expression. Defaults to 2.
const_range (Tuple[float, float], optional) – Range from which random constants are sampled. Defaults to (-1.0, 1.0).
depth_range (Tuple[int, int], optional) – Minimum and maximum tree depth for randomly generated expressions. Defaults to (2, 6).
nettype (Optional[Literal["node", "edge", "scalar"]], optional) – Nettype of the target expression when working with graph data. Defaults to "scalar".
log_per_iter (float, optional) – Log progress every log_per_iter iterations; use float("inf") to disable iteration-based logging. Defaults to float("inf").
log_per_sec (float, optional) – Log progress every log_per_sec seconds; use float("inf") to disable time-based logging. Defaults to float("inf").
log_detailed_speed (bool, optional) – If True, include detailed timing information for individual steps in logs. Defaults to False.
save_path (str, optional) – Directory in which JSON lines of per-iteration records are stored as records.jsonl. If None, records are not written to disk. Defaults to None.
random_state (Optional[int], optional) – Seed used to control randomness for reproducible runs. Defaults to None.
n_iter (int, optional) – Maximum number of MCTS iterations. Defaults to 100.
use_tqdm (bool, optional) – If True, wrap the main search loop with a tqdm progress bar. Defaults to False.
edge_list (Tuple[List[int], List[int]], optional) – Optional graph edge list (sources, targets) used when evaluating graph operators. Defaults to None.
num_nodes (int, optional) – Number of nodes in the underlying graph; if None, it may be inferred elsewhere. Defaults to None.
time_limit (float, optional) – Maximum wall-clock time (in seconds) for the search; if exceeded, the search terminates early. Defaults to None.
sample_num (int, optional) – Number of samples drawn when evaluating or sampling candidate expressions. Defaults to 300.
keep_vars (bool, optional) – If True, keep variable names instead of renaming them during preprocessing. Defaults to False.
normalize_y (bool, optional) – If True, normalize target values before fitting. Defaults to False.
normalize_X (bool, optional) – If True, normalize input features before fitting. Defaults to False.
remove_abnormal (bool, optional) – If True, attempt to remove abnormal samples before training. Defaults to False.
train_eval_split (float, optional) – Fraction of data used for training; the remainder may be used for evaluation. Defaults to 1.0.
child_num (int, optional) – Maximum number of child nodes expanded from a node during expansion. Defaults to 50.
n_playout (int, optional) – Number of rollouts performed from a node during simulation. Defaults to 100.
d_playout (int, optional) – Maximum depth of each simulation rollout. Defaults to 10.
max_len (int, optional) – Maximum allowed expression length; used to constrain actions. Defaults to 30.
c (float, optional) – Exploration constant used in the UCT formula during selection. Defaults to 1.41.
eta (float, optional) – Complexity penalty factor used in the reward function, where larger eta discounts complex expressions less. Defaults to 0.999.
**kwargs – Additional unused keyword arguments; a warning is logged if any are provided.
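The description of eta says only that larger values discount complex expressions less. One common form consistent with that statement multiplies an accuracy term by eta raised to the expression complexity. The sketch below uses that form purely as an illustration; the accuracy term 1 / (1 + error), the function name, and the use of an integer complexity are assumptions, and the reward actually used by the MCTS classes may differ.

```python
def penalized_reward(error: float, complexity: int, eta: float = 0.999) -> float:
    """Assumed reward form: eta**complexity / (1 + error)."""
    if error < 0 or complexity < 0:
        raise ValueError("error and complexity must be non-negative")
    accuracy = 1.0 / (1.0 + error)  # 1.0 for a perfect fit, approaching 0 as error grows
    return (eta ** complexity) * accuracy


short_expr = penalized_reward(error=0.1, complexity=5)
long_expr = penalized_reward(error=0.1, complexity=30)
```

With equal error, the shorter expression receives the higher reward, and moving eta closer to 1 shrinks the gap between the two.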

fit(X: ndarray | DataFrame | Dict[str, ndarray], y: ndarray | Series)[source]#

Fit the model using NDFormer-guided MCTS with batch expansion.

The method first encodes the graph data and caches the encoder memory, then runs the MCTS search, using beam-search-based selection and batched expansion for efficiency.

load_ndformer(checkpoint_path='hf://YuMeow/ndformer:best.pth', device=None)[source]#

Load pre-trained NDFormer model and tokenizer from checkpoint.

The model config is automatically loaded from the checkpoint. Users do not need to provide a config manually.

Parameters:

checkpoint_path – Path to the model checkpoint. Can be:
- Local file path: "/path/to/checkpoint.pth"
- HF shorthand: "YuMeow/ndformer:best.pth"
- HF full syntax: "hf://YuMeow/ndformer:best.pth"
device – Device to load model on. If None, auto-detects CUDA/CPU.
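The three accepted forms of checkpoint_path can be told apart with simple string checks. The sketch below is a hypothetical parser, not the library's resolver: parse_checkpoint_path and CheckpointRef are illustrative names, and the rule used to recognise the shorthand (a relative string of the form owner/repo:file) is an assumption.

```python
from typing import NamedTuple, Optional

class CheckpointRef(NamedTuple):
    repo_id: Optional[str]  # None means a local file
    filename: str

def parse_checkpoint_path(path: str) -> CheckpointRef:
    """Hypothetical parser for the three documented checkpoint_path forms."""
    if path.startswith("hf://"):
        repo_id, _, filename = path[len("hf://"):].partition(":")
        return CheckpointRef(repo_id, filename)
    # Assumed shorthand rule: relative "owner/repo:file" with exactly one '/'.
    head, sep, tail = path.partition(":")
    if sep and tail and not path.startswith(("/", ".")) and head.count("/") == 1:
        return CheckpointRef(head, tail)
    return CheckpointRef(None, path)

full = parse_checkpoint_path("hf://YuMeow/ndformer:best.pth")
short = parse_checkpoint_path("YuMeow/ndformer:best.pth")
local = parse_checkpoint_path("/path/to/checkpoint.pth")
```

With this rule the full syntax and the shorthand resolve to the same repository and file, while anything else is treated as a local path.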

encode_data(X: Dict[str, ndarray], y: ndarray)[source]#: Encode graph data using NDFormer encoder and cache memory for reuse

set_policy_prior(actions_dict: Dict[NDFormerNode, List[Tuple[Symbol, Symbol]]])[source]#

Set prior probabilities from NDFormer for valid actions by decoding the current partial sequences in batch

Parameters:

states – List of MCTS nodes
actions_dict – Dictionary mapping each MCTS node to its list of valid (empty, operator) tuples

Returns:

List of dictionaries mapping actions to prior probabilities (one per node)
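One common way to turn decoder logits into priors over a restricted action set is a softmax taken only over the valid entries. The sketch below assumes that approach; how set_policy_prior actually normalizes its scores is not documented here, and masked_prior is a hypothetical helper.

```python
import math
from typing import Dict, List

def masked_prior(logits: List[float], valid_ids: List[int]) -> Dict[int, float]:
    """Softmax restricted to the valid token ids (assumed normalization)."""
    top = max(logits[i] for i in valid_ids)  # subtract the max for stability
    exps = {i: math.exp(logits[i] - top) for i in valid_ids}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Vocabulary of four tokens, of which only tokens 0 and 3 are valid here.
logits = [2.0, 0.5, -1.0, 2.0]
prior = masked_prior(logits, valid_ids=[0, 3])
```

Invalid tokens receive no entry at all, so the priors of the valid actions always sum to one.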

select(root: NDFormerNode) → List[NDFormerNode][source]#

Select leaf nodes using Beam Search with PUCT

Returns a list of leaf nodes to expand in batch
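A minimal sketch of beam-style selection with PUCT scores follows. The Node class, the beam_width parameter, and the exact PUCT variant (Q + c * P * sqrt(N_parent) / (1 + N_child)) are assumptions for illustration; the library's NDFormerNode and scoring may differ.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:  # hypothetical stand-in for NDFormerNode
    name: str
    prior: float = 1.0
    visits: int = 0
    value_sum: float = 0.0
    children: List["Node"] = field(default_factory=list)

def puct(parent: Node, child: Node, c: float = 1.41) -> float:
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u

def beam_select(root: Node, beam_width: int = 2, c: float = 1.41) -> List[Node]:
    """Descend level by level, keeping the beam_width best-scoring children."""
    frontier, leaves = [root], []
    while frontier:
        candidates = []
        for node in frontier:
            if not node.children:
                leaves.append(node)  # reached a leaf: schedule it for expansion
            else:
                candidates += [(puct(node, ch, c), ch) for ch in node.children]
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [ch for _, ch in candidates[:beam_width]]
    return leaves

root = Node("root", visits=10, children=[
    Node("a", prior=0.6, visits=5, value_sum=2.0),
    Node("b", prior=0.3, visits=1, value_sum=0.9),
    Node("c", prior=0.1),
])
leaves = beam_select(root, beam_width=2)
```

Returning several leaves at once is what allows the subsequent expansion step to be batched.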

expand(nodes: List[NDFormerNode], X: Dict[str, ndarray], y: ndarray) → List[NDFormerNode][source]#

Expand multiple nodes with NDFormer-guided action selection in batch

Parameters:

nodes – List of leaf nodes to expand

Returns:

List of selected child nodes for simulation

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NDFormerMCTS#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

nd2py.search.ndformer.ndformer_model module#

class nd2py.search.ndformer.ndformer_model.NDFormerModel(*args: Any, **kwargs: Any)[source]#

Bases: FactoryMixin, Module

Base class for NDFormer models.

Inherits from FactoryMixin, which provides:
- NDFormerModel.register_model('name'): Decorator to register subclasses
- NDFormerModel.create(config, tokenizer): Factory method to create instances

The base class is automatically registered as ‘default’ model.

Example

    @NDFormerModel.register_model('gcn')
    class GCNNDFormer(NDFormerModel):
        def __init__(self, config, tokenizer):
            super().__init__(config, tokenizer)
            # ... custom architecture ...

    # Usage
    config = NDFormerModelConfig(model='gcn')
    model = NDFormerModel.create(config, tokenizer)
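The register-then-create pattern described above can be sketched without the library. Everything below is a hypothetical stand-in: this FactoryMixin, the dictionary-based config, and the way model_name is set by the decorator are assumptions, and the real FactoryMixin may be implemented differently.

```python
class FactoryMixin:
    """Hypothetical stand-in for the library's FactoryMixin."""
    _registry = {}  # shared name -> class table

    @classmethod
    def register_model(cls, name):
        def decorator(subclass):
            cls._registry[name] = subclass
            subclass.model_name = name
            return subclass
        return decorator

    @classmethod
    def create(cls, config, *args, **kwargs):
        # Look up the class named in the config and instantiate it.
        model_cls = cls._registry[config["model"]]
        return model_cls(config, *args, **kwargs)

@FactoryMixin.register_model("default")
class BaseModel(FactoryMixin):
    def __init__(self, config, tokenizer=None):
        self.config, self.tokenizer = config, tokenizer

@BaseModel.register_model("gcn")
class GCNModel(BaseModel):
    pass

model = BaseModel.create({"model": "gcn"})
```

The caller only ever names the base class; which subclass gets built is decided by the config value.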

__init__(config: NDFormerModelConfig, tokenizer: NDFormerTokenizer)[source]#

encode_graph(data_node, data_edge, data_scalar, edge_list, num_nodes, node_batch_idx=None, timer=None)[source]#

Graph encoding stage: called only once, at initialization or when the graph topology changes.

Args:
- data_node: (SampleNum, NodeNum, max_var_num+1, 3)
- data_edge: (SampleNum, EdgeNum, max_var_num+1, 3)
- data_scalar: (SampleNum, BatchNum, max_var_num+1, 3)
- edge_list: (2, EdgeNum)
- num_nodes: int
- node_batch_idx: (TotalNodeNum,), index of the graph each node belongs to

Returns:
- memory: (BatchNum, MaxNodeNum, d_emb)
- memory_key_padding_mask: (BatchNum, MaxNodeNum), or None if node_batch_idx is None; True marks padding positions to be ignored
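The relationship between node_batch_idx and the padding mask can be shown with plain lists. This is a sketch under the assumption that each graph's real nodes come first and padding fills the tail of each row; build_padding_mask is a hypothetical helper, not part of the library.

```python
from typing import List

def build_padding_mask(node_batch_idx: List[int]) -> List[List[bool]]:
    """Return a (BatchNum, MaxNodeNum) mask where True marks padding."""
    num_graphs = max(node_batch_idx) + 1
    counts = [0] * num_graphs
    for graph_id in node_batch_idx:
        counts[graph_id] += 1  # number of real nodes in each graph
    max_nodes = max(counts)
    return [[pos >= n for pos in range(max_nodes)] for n in counts]

# Three graphs with 3, 2, and 1 nodes respectively.
mask = build_padding_mask([0, 0, 0, 1, 1, 2])
```

The True convention matches the key_padding_mask argument of PyTorch attention layers, where True positions are excluded from attention.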

decode_sequence(memory, partial_eq, memory_key_padding_mask=None, seq_batch_idx=None, timer=None)[source]#

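Sequence decoding stage: supports 1-to-N broadcasting and can be called at high frequency.

Args:
- memory: (BatchNum, NodeNum*SampleNum, d_emb), the output of encode_graph
- partial_eq: (SeqNum, MaxSeqLen)
- memory_key_padding_mask: (BatchNum, NodeNum*SampleNum) or None; True marks padding positions to be ignored
- seq_batch_idx: (SeqNum,), index of the graph each sequence belongs to, used to broadcast the node features in memory to the correct sequences

Returns:
- logits: (SeqNum, vocab_size)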
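The 1-to-N broadcast amounts to an index lookup: each partial sequence picks the memory row of the graph it belongs to. The sketch below uses nested lists and a hypothetical helper, broadcast_memory; with tensors the same effect is usually obtained by integer indexing, though how the model implements it internally is not documented here.

```python
from typing import List, Sequence

def broadcast_memory(memory: Sequence, seq_batch_idx: List[int]) -> list:
    """Select, for every sequence, the memory of the graph it belongs to."""
    return [memory[graph_id] for graph_id in seq_batch_idx]

# BatchNum = 2 graphs, one node each, d_emb = 2.
memory = [[[0.1, 0.2]], [[0.9, 0.8]]]
# SeqNum = 4 partial sequences: three for graph 0, one for graph 1.
seq_batch_idx = [0, 0, 1, 0]
per_seq = broadcast_memory(memory, seq_batch_idx)
```

Because the graph is encoded once and only indexed afterwards, many candidate sequences can share a single encode_graph call.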

forward(batch_dict, timer=None)[source]#: Entry point used during training; it consumes the output of the Dataset's collate_fn directly.

nd2py.search.ndformer.ndformer_model_flash_ansr module#

FLASH-ANSR variant of NDFormer.

Based on the paper describing the FLASH-ANSR architecture:
- Pre-norm Transformer (norm_first=True)
- Set Transformer encoder with induction points
- FlashAttention support (via torch.nn.MultiheadAttention with backend selection)

class nd2py.search.ndformer.ndformer_model_flash_ansr.SetTransformerEncoder(*args: Any, **kwargs: Any)[source]#

Bases: Module

Set Transformer Encoder with induction points.

Drop-in replacement for nn.TransformerEncoder with identical forward signature. Induction points are used internally but output shape matches input shape.

Architecture (Lee et al., 2019):
1. Induction points attend to input data (cross-attention)
2. Self-attention among induction points
3. Induction points attend back to original positions (output projection)
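The three steps can be sketched with NumPy using a single attention head and no learned projections, residual connections, or normalization, all of which a real layer would have. Step 3 is interpreted here as the input positions querying the induction points, since that is what makes the output shape match the input; this reading, and every name in the block, is an assumption rather than the module's actual code.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention without learned weights."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model, n_induction = 10, 16, 4
x = rng.normal(size=(seq_len, d_model))
induction = rng.normal(size=(n_induction, d_model))  # learnable in a real model

h = attention(induction, x, x)  # 1. induction points attend to the input
h = attention(h, h, h)          # 2. self-attention among induction points
out = attention(x, h, h)        # 3. input positions read back from the points
```

The attention cost scales with seq_len * n_induction rather than seq_len squared, which is the usual motivation for induction points.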

Parameters:

encoder_layer – Not used (kept for API compatibility)
num_layers – Number of transformer layers
norm – Final normalization layer
d_model – Embedding dimension
n_induction_points – Number of learnable induction points

__init__(encoder_layer=None, num_layers=2, norm=None, enable_nested_tensor=True, mask_check=True, d_model: int = None, n_induction_points: int = 128, n_head: int = 8)[source]#

forward(src: torch.Tensor, mask=None, src_key_padding_mask: torch.Tensor | None = None, is_causal=None) → torch.Tensor[source]#

Parameters:

src – Input tensor (batch, seq_len, d_model) - GNN encoded nodes
mask – Not used (kept for API compatibility)
src_key_padding_mask – Padding mask (batch, seq_len), True for padding
is_causal – Not used (kept for API compatibility)

Returns:

Output tensor with same shape as input (batch, seq_len, d_model)

class nd2py.search.ndformer.ndformer_model_flash_ansr.FlashANSRNDFormer(*args: Any, **kwargs: Any)[source]#

Bases: NDFormerModel

FLASH-ANSR: Transformer-based symbolic regression with Set Transformer encoder and pre-norm architecture.

Key features:
- Pre-norm Transformer (norm_first=True)
- Set Transformer encoder with learnable induction points
- LayerNorm for normalization

Reuses NDFormerModel.encode_graph() and NDFormerModel.decode_sequence().

__init__(config: NDFormerModelConfig, tokenizer: NDFormerTokenizer)[source]#

model_name = 'flash_ansr'#

nd2py.search.ndformer.ndformer_tokenizer module#

class nd2py.search.ndformer.ndformer_tokenizer.NumberTokenizer(n_mantissa=4, min_exponent=-100, max_exponent=100)[source]#

Bases: object

__init__(n_mantissa=4, min_exponent=-100, max_exponent=100)[source]#

encode(value: float | List[float], mode: Literal['token', 'token_id'] = 'token') → List[str | int][source]#

decode(tokens: List[str], mode: Literal['token', 'token_id'] = 'token') → List[float][source]#
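The parameters n_mantissa, min_exponent, and max_exponent suggest a sign/mantissa/exponent encoding of floats. The token format is not documented in this reference, so the scheme below (a sign token, an N-prefixed integer mantissa, and an E-prefixed exponent) is purely an assumed illustration, and encode_number and decode_number are hypothetical names.

```python
from typing import List

def encode_number(value: float, n_mantissa: int = 4,
                  min_exponent: int = -100, max_exponent: int = 100) -> List[str]:
    """Assumed scheme: [sign, 'N' + mantissa digits, 'E' + exponent]."""
    sign = "-" if value < 0 else "+"
    # Scientific notation with n_mantissa significant digits, e.g. "3.142e+00".
    mantissa_str, exponent_str = f"{abs(value):.{n_mantissa - 1}e}".split("e")
    digits = mantissa_str.replace(".", "")
    # Shift the exponent so the mantissa is an integer: 3.142e0 == 3142e-3.
    exponent = int(exponent_str) - (n_mantissa - 1)
    exponent = max(min_exponent, min(max_exponent, exponent))
    return [sign, f"N{digits}", f"E{exponent}"]

def decode_number(tokens: List[str]) -> float:
    sign, mantissa, exponent = tokens
    value = int(mantissa[1:]) * 10.0 ** int(exponent[1:])
    return -value if sign == "-" else value

tokens = encode_number(3.14159)
restored = decode_number(tokens)
```

In a scheme like this, the round trip is exact only up to n_mantissa significant digits, so 3.14159 comes back as approximately 3.142.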

class nd2py.search.ndformer.ndformer_tokenizer.NDFormerTokenizer(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#

Bases: object

__init__(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#

property vocab_size#

property pad_token_id#

property sos_token_id#

property eos_token_id#

property unk_token_id#

encode(eqtree: Symbol, mode: Literal['token', 'token_id'] = 'token') → Tuple[List[int], List[int], List[int]][source]#

decode(tokens: List[str], parents: List[str], nettypes: List[str], mode: Literal['token', 'token_id'] = 'token') → Symbol[source]#

encode_array(data: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#: Converts a plain float array into tokens or token ids.

decode_array(tokens: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#: Converts an array of tokens or token ids back into a plain float array.

to_dict() → dict[source]#: Export the core configuration for serialization.

classmethod from_dict(config: dict) → NDFormerTokenizer[source]#

save(filepath: str)[source]#: Save to a local JSON file.

classmethod load(filepath: str) → NDFormerTokenizer[source]#: Load from a local JSON file.
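The to_dict / from_dict / save / load quartet follows a common JSON round-trip pattern, sketched below with a toy class. TinyTokenizer and its vocab field are hypothetical; the fields that NDFormerTokenizer actually serializes are not listed in this reference.

```python
import json
import os
import tempfile

class TinyTokenizer:
    """Toy class illustrating the serialization pattern only."""
    def __init__(self, vocab):
        self.vocab = list(vocab)

    def to_dict(self) -> dict:
        return {"vocab": self.vocab}

    @classmethod
    def from_dict(cls, config: dict) -> "TinyTokenizer":
        return cls(config["vocab"])

    def save(self, filepath: str) -> None:
        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(self.to_dict(), f)

    @classmethod
    def load(cls, filepath: str) -> "TinyTokenizer":
        with open(filepath, encoding="utf-8") as f:
            return cls.from_dict(json.load(f))

original = TinyTokenizer(["<pad>", "<sos>", "<eos>", "Add", "x1"])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tokenizer.json")
    original.save(path)
    restored = TinyTokenizer.load(path)
```

Routing save and load through to_dict and from_dict keeps the on-disk format and the in-memory export in one place.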
