nd2py.search.ndformer package


class nd2py.search.ndformer.NDFormerDataGenerator(config: NDFormerModelConfig)[source]#

Bases: object

__init__(config: NDFormerModelConfig)[source]#
sample(eqtree: Symbol, dist_type: Literal['GMM', 'Uniform', 'Gaussian'] = 'GMM', edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, sample_num: int = None, _rng: Generator = None, **kwargs)[source]#

Arguments:
  • eqtree – a symbolic expression tree

Returns:

var_dict = dict(
    A: np.ndarray, (V, V)
    G: np.ndarray, (E, 2)
    out: np.ndarray, (N, V) or (N, E)
    v1/v2/v3/v4/v5: np.ndarray, (N, V)
    e1/e2/e3/e4/e5: np.ndarray, (N, E)
)
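The shapes listed for the returned dictionary can be illustrated with a small standalone sketch. It is not the library's implementation: `build_var_dict` is a hypothetical helper, nested Python lists stand in for `np.ndarray`, uniform values replace the real sampling distributions, and the convention that `A[s][t] = 1` marks an edge from `s` to `t` is an assumption.

```python
import random
from typing import Dict, List, Tuple


def build_var_dict(
    edge_list: Tuple[List[int], List[int]],
    num_nodes: int,
    sample_num: int,
    seed: int = 0,
) -> Dict[str, list]:
    """Hypothetical helper: build containers with the documented shapes."""
    sources, targets = edge_list
    rng = random.Random(seed)
    V, E, N = num_nodes, len(sources), sample_num

    # A: (V, V) adjacency matrix (assumed: A[s][t] = 1 for an edge s -> t).
    A = [[0] * V for _ in range(V)]
    for s, t in zip(sources, targets):
        A[s][t] = 1

    # G: (E, 2), one (source, target) pair per edge.
    G = [[s, t] for s, t in zip(sources, targets)]

    # v1: (N, V) node-level variable; e1: (N, E) edge-level variable.
    v1 = [[rng.uniform(-1.0, 1.0) for _ in range(V)] for _ in range(N)]
    e1 = [[rng.uniform(-1.0, 1.0) for _ in range(E)] for _ in range(N)]
    return {"A": A, "G": G, "v1": v1, "e1": e1}


var_dict = build_var_dict(([0, 1], [1, 2]), num_nodes=3, sample_num=4)
print(len(var_dict["v1"]), len(var_dict["v1"][0]))  # N rows, V columns
```

Here N is the number of samples, V the number of nodes and E the number of edges, matching the shape annotations in the docstring.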

generate_normal_data(size, mean=None, std=None, _rng: Generator = None)[source]#
generate_uniform_data(size, low=None, high=None, _rng: Generator = None)[source]#
generate_GMM_data(size, L=1, _rng: Generator = None)[source]#
class nd2py.search.ndformer.NDFormerEqtreeGenerator(variables: List[Variable], binary: List[str | Symbol] = [Add, Sub, Mul, Div], unary: List[str | Symbol] = [Sqrt, SqrtAbs, Pow2, Pow3, Log, LogAbs, Exp, Abs, Neg, Inv, Sin, Cos, Tan, Tanh, Sigmoid, Aggr, Sour, Targ, Readout], full_prob: float = 0.5, depth_range: Tuple[int, int] = (2, 6), const_range: Tuple[float, float] = None, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, scalar_number_only=True)[source]#

Bases: object

__init__(variables: List[Variable], binary: List[str | Symbol] = [Add, Sub, Mul, Div], unary: List[str | Symbol] = [Sqrt, SqrtAbs, Pow2, Pow3, Log, LogAbs, Exp, Abs, Neg, Inv, Sin, Cos, Tan, Tanh, Sigmoid, Aggr, Sour, Targ, Readout], full_prob: float = 0.5, depth_range: Tuple[int, int] = (2, 6), const_range: Tuple[float, float] = None, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, scalar_number_only=True)[source]#
generate_node(nettypes: Set[NetType], _rng: np.random.Generator = None) Symbol[source]#
generate_leaf(nettypes: Set[NetType], _rng: np.random.Generator = None) Number | Variable[source]#
sample(nettypes: Set[NetType] = 'scalar', assign_root_nettypes=True, _rng: np.random.Generator = None) Symbol[source]#
class nd2py.search.ndformer.NDFormerGraphGenerator(config: NDFormerModelConfig)[source]#

Bases: object

__init__(config: NDFormerModelConfig)[source]#
sample(topology: Literal['ER', 'BA', 'WS', 'Complete'] = None, _rng: Generator = None, **kwargs)[source]#

Arguments:
  • V – node num
  • topology – 'ER', 'BA', 'WS', 'Complete'
  • kwargs:
      (When topology is 'ER')
        - p: edge probability
        - directed: directed or not
      (When topology is 'BA')
        - m: number of edges to attach from a new node to existing nodes
      (When topology is 'WS')
        - k: each node is connected to k nearest neighbors in ring topology
        - p: probability of rewiring each edge
      (When topology is 'Complete')
        - None

Return:
  • edge_list – (2, E), edge list
  • num_nodes – int, node num
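As a rough illustration of the return convention described above, the following sketch generates an Erdős–Rényi graph and a complete graph using only the standard library. The function names `er_graph` and `complete_graph` are hypothetical, and storing an undirected edge as two directed entries is an assumption; the library's own generators may differ.

```python
import random
from typing import List, Optional, Tuple

EdgeList = Tuple[List[int], List[int]]  # (sources, targets), i.e. shape (2, E)


def er_graph(
    V: int, p: float, directed: bool = False, seed: Optional[int] = None
) -> Tuple[EdgeList, int]:
    """Hypothetical ER sampler: keep each candidate edge with probability p."""
    rng = random.Random(seed)
    sources: List[int] = []
    targets: List[int] = []
    for u in range(V):
        for v in range(V):
            if u == v:
                continue  # no self-loops
            if not directed and u > v:
                continue  # visit each unordered pair once
            if rng.random() < p:
                sources.append(u)
                targets.append(v)
                if not directed:
                    # Assumption: undirected edges are stored in both directions.
                    sources.append(v)
                    targets.append(u)
    return (sources, targets), V


def complete_graph(V: int) -> Tuple[EdgeList, int]:
    """Every ordered pair of distinct nodes is connected."""
    return er_graph(V, p=1.0, directed=True)


edge_list, num_nodes = complete_graph(4)
print(len(edge_list[0]), num_nodes)
```

Returning the pair `(edge_list, num_nodes)` mirrors the documented return value, so the result could be unpacked the same way as the output of `sample()`.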

generate_ER_graph(V=None, E=None, directed=None, _rng: Generator = None)[source]#
generate_BA_graph(V=None, m=None, _rng: Generator = None)[source]#
generate_WS_graph(V=None, k=None, p=None, _rng: Generator = None)[source]#
generate_complete_graph(V=None, _rng: Generator = None)[source]#
class nd2py.search.ndformer.NDFormerModelConfig(n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100, max_var_num: int = 10, model: str = 'default', n_head: int = 8, d_emb: int = 128, d_ff: int = 512, dropout: float = 0.2, n_GNN_layers: int = 2, n_transformer_encoder_layers: int = 2, n_transformer_decoder_layers: int = 2, use_aux_input: bool = True, n_induction_points: int = 128, max_seq_len: int = 100, operands: Tuple[str] = <factory>, min_data_num: int = 100, max_data_num: int = 200, min_node_num: int = 10, max_node_num: int = 100, min_edge_num: int = 20, max_edge_num: int = 600, min_var_val: int = -10, max_var_val: int = 10, min_coeff_val: int = -20, max_coeff_val: int = 20)[source]#

Bases: object

Configuration for NDFormer model architecture and capabilities.

PURPOSE

This class defines the model’s structure and capabilities:

  • Model architecture (transformer layers, GNN layers, embedding dimensions)

  • Tokenization scheme (number encoding, vocabulary)

  • Supported operators and sequence length limits

RELATIONSHIP WITH NDFormerMCTS (INFERENCE-TIME SEARCH)

TL;DR: Users of NDFormerMCTS do not need to interact with this class directly.

When using a pre-trained model with NDFormerMCTS:

  1. The trained model + config is a black box: The model and its associated config are loaded together from a checkpoint. The config is used internally to reconstruct the tokenizer and model architecture.

  2. No control over search behavior: NDFormerMCTS does NOT use this config to control how search proceeds. Search parameters (beam_width, ndformer_temperature, c, etc.) are configured directly in NDFormerMCTS.__init__().

  3. Capability validation only: NDFormerMCTS may use the config to verify that search settings are within the model’s capabilities:
       - max_len (search) should not exceed max_seq_len (model capability)
       - Operator set should be compatible with trained vocabulary
       - Variable count should not exceed max_var_num

This design follows standard ML practice where model architecture config is separate from inference/search hyperparameters.
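The three capability checks listed above can be sketched as a plain function. This is an illustration only: `ModelCapabilities` is a hypothetical stand-in for the relevant fields of NDFormerModelConfig, and `check_search_settings` is not a function of the library.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ModelCapabilities:
    """Hypothetical subset of the config fields used for validation."""
    max_seq_len: int = 100
    max_var_num: int = 10
    operands: Tuple[str, ...] = ("Add", "Sub", "Mul", "Div")


def check_search_settings(
    cap: ModelCapabilities,
    max_len: int,
    operators: Sequence[str],
    n_variables: int,
) -> List[str]:
    """Return a list of human-readable problems; empty means compatible."""
    problems: List[str] = []
    if max_len > cap.max_seq_len:
        problems.append(f"max_len={max_len} exceeds max_seq_len={cap.max_seq_len}")
    unknown = sorted(set(operators) - set(cap.operands))
    if unknown:
        problems.append(f"operators not in trained vocabulary: {unknown}")
    if n_variables > cap.max_var_num:
        problems.append(f"{n_variables} variables exceed max_var_num={cap.max_var_num}")
    return problems


cap = ModelCapabilities()
print(check_search_settings(cap, max_len=30, operators=["Add", "Mul"], n_variables=2))
```

Collecting problems in a list rather than raising on the first one is a design choice of this sketch; it lets a caller report every incompatibility at once.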

USAGE

Training a new model:

```python
config = NDFormerModelConfig(model='default', n_head=16, d_emb=256)
tokenizer = NDFormerTokenizer(config, variables)
model = NDFormerModel.create(config, tokenizer)
# ... train on dataset ...
torch.save({'model': model.state_dict(), 'config': config}, 'checkpoint.pth')
```

Using a pre-trained model (automatic, users don’t handle config directly):

```python
search = NDFormerMCTS(variables=[x, y])
search.load_ndformer('hf://YuMeow/ndformer:best.pth')
# Config is automatically loaded and used for capability validation
search.fit(X, y)
```

Creating alternative model architectures:

```python
@NDFormerModel.register_model('gcn')
class GCNNDFormer(NDFormerModel):
    def __init__(self, config, tokenizer):
        super().__init__(config, tokenizer)
        # ... custom architecture ...

config = NDFormerModelConfig(model='gcn')
model = NDFormerModel.create(config, tokenizer)
```

ATTRIBUTES

n_mantissa: int = 4#

Number of digits in mantissa for number tokenization.

min_exponent: int = -100#

Minimum exponent value for number tokenization.

max_exponent: int = 100#

Maximum exponent value for number tokenization.
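To make the roles of n_mantissa, min_exponent and max_exponent concrete, here is one possible number tokenization in the sign/mantissa/exponent style. The token spellings (`+`, `N3142`, `E-3`), the function names, and the clamping of out-of-range exponents are all assumptions for illustration; the actual NDFormerTokenizer scheme may differ.

```python
from typing import List


def encode_number(
    x: float, n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100
) -> List[str]:
    """Hypothetical encoding: x ~ sign * mantissa * 10**exponent."""
    sign = "-" if x < 0 else "+"
    if x == 0:
        return [sign, "N" + "0" * n_mantissa, "E0"]
    # Scientific notation with n_mantissa significant digits, e.g. "3.142e+00".
    mantissa_str, exp_str = f"{abs(x):.{n_mantissa - 1}e}".split("e")
    digits = mantissa_str.replace(".", "")
    # Shift the exponent so that the mantissa is an integer.
    exponent = int(exp_str) - (n_mantissa - 1)
    # Assumption: exponents outside the supported range are clamped.
    exponent = max(min_exponent, min(max_exponent, exponent))
    return [sign, "N" + digits, f"E{exponent}"]


def decode_number(tokens: List[str]) -> float:
    """Inverse of encode_number for in-range values."""
    sign, mantissa, exponent = tokens
    value = int(mantissa[1:]) * 10.0 ** int(exponent[1:])
    return -value if sign == "-" else value


print(encode_number(3.14159))
print(decode_number(encode_number(3.14159)))
```

With n_mantissa = 4 the value is rounded to four significant digits, which shows why a larger mantissa trades vocabulary size for precision.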

max_var_num: int = 10#

Maximum number of variables per nettype (node/edge/scalar).

model: str = 'default'#

Model architecture type. Used by NDFormerModel.create() to select subclass.

Available models are registered via @NDFormerModel.register_model('name'). Default is 'default' (the base NDFormerModel architecture).
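The registration mechanism described here follows a common decorator-registry pattern. The sketch below shows one way such a pattern can work; `BaseModel`, `GCNModel` and the dictionary-based config are hypothetical simplifications, not the NDFormerModel implementation.

```python
from typing import Callable, Dict, Type


class BaseModel:
    """Hypothetical base class holding a name -> subclass registry."""

    _registry: Dict[str, Type["BaseModel"]] = {}

    def __init__(self, config: dict) -> None:
        self.config = config

    @classmethod
    def register_model(cls, name: str) -> Callable[[Type["BaseModel"]], Type["BaseModel"]]:
        def decorator(subclass: Type["BaseModel"]) -> Type["BaseModel"]:
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def create(cls, config: dict) -> "BaseModel":
        # Look up the subclass by the 'model' field and instantiate it.
        try:
            model_cls = cls._registry[config["model"]]
        except KeyError:
            raise ValueError(f"unknown model type: {config['model']!r}") from None
        return model_cls(config)


# The base architecture is registered under the default name.
BaseModel._registry["default"] = BaseModel


@BaseModel.register_model("gcn")
class GCNModel(BaseModel):
    pass


print(type(BaseModel.create({"model": "gcn"})).__name__)
```

Because `create()` only consults the registry, new architectures can be added without editing the factory itself.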

n_head: int = 8#

Number of attention heads in multi-head self-attention.

d_emb: int = 128#

Dimension of token embeddings and hidden states.

d_ff: int = 512#

Dimension of feed-forward network intermediate layer.

dropout: float = 0.2#

Dropout probability applied to embeddings and attention.

n_GNN_layers: int = 2#

Number of graph neural network layers for encoding graph topology.

n_transformer_encoder_layers: int = 2#

Number of transformer encoder layers.

n_transformer_decoder_layers: int = 2#

Number of transformer decoder layers for autoregressive generation.

use_aux_input: bool = True#

Whether to use auxiliary inputs (parent/nettype information).

__init__(n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100, max_var_num: int = 10, model: str = 'default', n_head: int = 8, d_emb: int = 128, d_ff: int = 512, dropout: float = 0.2, n_GNN_layers: int = 2, n_transformer_encoder_layers: int = 2, n_transformer_decoder_layers: int = 2, use_aux_input: bool = True, n_induction_points: int = 128, max_seq_len: int = 100, operands: Tuple[str] = <factory>, min_data_num: int = 100, max_data_num: int = 200, min_node_num: int = 10, max_node_num: int = 100, min_edge_num: int = 20, max_edge_num: int = 600, min_var_val: int = -10, max_var_val: int = 10, min_coeff_val: int = -20, max_coeff_val: int = 20) None#
n_induction_points: int = 128#

Number of induction points for Set Transformer encoder (FLASH-ANSR). Only used when model='flash_ansr'.

max_seq_len: int = 100#

Maximum sequence length the model can handle.

Note: NDFormerMCTS uses this for capability validation - search with max_len > max_seq_len may produce unreliable results.

operands: Tuple[str]#

Tuple of operator class names in the model vocabulary.

Note: NDFormerMCTS may check if its operator set is compatible with the trained model’s vocabulary.

min_data_num: int = 100#

Minimum number of samples per training equation.

max_data_num: int = 200#

Maximum number of samples per training equation.

min_node_num: int = 10#

Minimum number of nodes in generated graphs.

max_node_num: int = 100#

Maximum number of nodes in generated graphs.

min_edge_num: int = 20#

Minimum number of edges in generated graphs.

max_edge_num: int = 600#

Maximum number of edges in generated graphs.

min_var_val: int = -10#

Minimum value for variable sampling.

max_var_val: int = 10#

Maximum value for variable sampling.

min_coeff_val: int = -20#

Minimum value for equation coefficients.

max_coeff_val: int = 20#

Maximum value for equation coefficients.

class nd2py.search.ndformer.NDFormerTokenizer(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#

Bases: object

__init__(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#
property vocab_size#
property pad_token_id#
property sos_token_id#
property eos_token_id#
property unk_token_id#
encode(eqtree: Symbol, mode: Literal['token', 'token_id'] = 'token') Tuple[List[int], List[int], List[int]][source]#
decode(tokens: List[str], parents: List[str], nettypes: List[str], mode: Literal['token', 'token_id'] = 'token') Symbol[source]#
encode_array(data: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#

Convert an array of plain floating-point numbers into tokens or token IDs.

decode_array(tokens: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#

Convert an array of tokens or token IDs back into an array of plain floating-point numbers.

to_dict() dict[source]#

Export the core configuration for serialization.

classmethod from_dict(config: dict) NDFormerTokenizer[source]#
save(filepath: str)[source]#

Save to a local JSON file.

classmethod load(filepath: str) NDFormerTokenizer[source]#

Load from a local JSON file.
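The to_dict / from_dict / save / load quartet suggests a JSON round trip. A minimal sketch of that pattern follows; `TinyTokenizer` and the two fields it stores are hypothetical and much simpler than whatever NDFormerTokenizer actually serializes.

```python
import json
import os
import tempfile
from typing import List, Optional


class TinyTokenizer:
    """Hypothetical tokenizer that only stores serializable settings."""

    def __init__(self, n_mantissa: int = 4, variables: Optional[List[str]] = None) -> None:
        self.n_mantissa = n_mantissa
        self.variables = list(variables or [])

    def to_dict(self) -> dict:
        # Export only plain JSON-compatible values.
        return {"n_mantissa": self.n_mantissa, "variables": self.variables}

    @classmethod
    def from_dict(cls, config: dict) -> "TinyTokenizer":
        return cls(n_mantissa=config["n_mantissa"], variables=config["variables"])

    def save(self, filepath: str) -> None:
        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(self.to_dict(), f)

    @classmethod
    def load(cls, filepath: str) -> "TinyTokenizer":
        with open(filepath, "r", encoding="utf-8") as f:
            return cls.from_dict(json.load(f))


path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
TinyTokenizer(variables=["x", "y"]).save(path)
print(TinyTokenizer.load(path).to_dict())
```

Routing both `save` and `load` through the dictionary form keeps the on-disk format and the in-memory export in one place.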

nd2py.search.ndformer.setup_lazy_imports(module_name: str, import_mapping: Dict[str, Tuple[str, str]])[source]#

Set up lazy imports for a module’s __init__.py.

Returns (__getattr__, __dir__, __all__) which should be assigned at the module level so that from package import OptionalClass works without importing the optional dependency until it is actually needed.

Parameters:
  • module_name – The __name__ of the calling module.

  • import_mapping – A dict mapping attribute names to (module_path, requires) tuples. module_path is a relative import path (e.g. ".torch_calc") and requires is the optional-dependency group name (e.g. "nn") shown in the error message when the dependency is missing.

Usage:

# __init__.py
from typing import TYPE_CHECKING

from .core import CoreClass
from ..utils.lazy_loader import setup_lazy_imports

if TYPE_CHECKING:
    from .optional import OptionalClass

__getattr__, __dir__, __all__ = setup_lazy_imports(__name__, {
    "OptionalClass": (".optional", "nn"),
})
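For readers curious how such a helper can work, the following sketch builds the three returned objects on top of module-level `__getattr__` (PEP 562) and `importlib.import_module`. It is an assumption-laden illustration: `make_lazy_imports` is a hypothetical name, and the real setup_lazy_imports may handle caching, error messages and `__all__` differently.

```python
import importlib
from typing import Any, Callable, Dict, List, Tuple


def make_lazy_imports(
    module_name: str, import_mapping: Dict[str, Tuple[str, str]]
) -> Tuple[Callable[[str], Any], Callable[[], List[str]], List[str]]:
    """Hypothetical sketch returning (__getattr__, __dir__, __all__)."""

    def __getattr__(name: str) -> Any:
        if name not in import_mapping:
            raise AttributeError(f"module {module_name!r} has no attribute {name!r}")
        module_path, requires = import_mapping[name]
        try:
            # Relative paths such as ".optional" resolve against module_name.
            module = importlib.import_module(module_path, package=module_name)
        except ImportError as exc:
            raise ImportError(
                f"{name} requires the optional dependency group {requires!r}"
            ) from exc
        return getattr(module, name)

    def __dir__() -> List[str]:
        return sorted(import_mapping)

    return __getattr__, __dir__, list(import_mapping)


# Demonstration with an absolute module path from the standard library.
getattr_, dir_, all_ = make_lazy_imports("demo", {"JSONDecoder": ("json", "std")})
print(getattr_("JSONDecoder").__name__, dir_(), all_)
```

The import only happens inside `__getattr__`, so a missing optional dependency surfaces when the attribute is first accessed rather than when the package is imported.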

Submodules#

nd2py.search.ndformer.ndformer_config module#

class nd2py.search.ndformer.ndformer_config.NDFormerModelConfig(n_mantissa: int = 4, min_exponent: int = -100, max_exponent: int = 100, max_var_num: int = 10, model: str = 'default', n_head: int = 8, d_emb: int = 128, d_ff: int = 512, dropout: float = 0.2, n_GNN_layers: int = 2, n_transformer_encoder_layers: int = 2, n_transformer_decoder_layers: int = 2, use_aux_input: bool = True, n_induction_points: int = 128, max_seq_len: int = 100, operands: Tuple[str] = <factory>, min_data_num: int = 100, max_data_num: int = 200, min_node_num: int = 10, max_node_num: int = 100, min_edge_num: int = 20, max_edge_num: int = 600, min_var_val: int = -10, max_var_val: int = 10, min_coeff_val: int = -20, max_coeff_val: int = 20)[source]#

Bases: object

Configuration for NDFormer model architecture and capabilities.


nd2py.search.ndformer.ndformer_dataset module#

class nd2py.search.ndformer.ndformer_dataset.InfiniteSampler(*args: Any, **kwargs: Any)[source]#

Bases: Sampler

class nd2py.search.ndformer.ndformer_dataset.NDFormerDataset(*args: Any, **kwargs: Any)[source]#

Bases: Dataset

__init__(config: NDFormerModelConfig, eqtree_generator: NDFormerEqtreeGenerator, topo_generator: NDFormerGraphGenerator, data_generator: NDFormerDataGenerator, tokenizer: NDFormerTokenizer, n_samples: int | None = None, random_state: int | None = None)[source]#
collate_fn(batch)[source]#
get_sampler()[source]#

nd2py.search.ndformer.ndformer_generator module#

class nd2py.search.ndformer.ndformer_generator.NDFormerEqtreeGenerator(variables: List[Variable], binary: List[str | Symbol] = [Add, Sub, Mul, Div], unary: List[str | Symbol] = [Sqrt, SqrtAbs, Pow2, Pow3, Log, LogAbs, Exp, Abs, Neg, Inv, Sin, Cos, Tan, Tanh, Sigmoid, Aggr, Sour, Targ, Readout], full_prob: float = 0.5, depth_range: Tuple[int, int] = (2, 6), const_range: Tuple[float, float] = None, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, scalar_number_only=True)[source]#

Bases: object

class nd2py.search.ndformer.ndformer_generator.NDFormerGraphGenerator(config: NDFormerModelConfig)[source]#

Bases: object

class nd2py.search.ndformer.ndformer_generator.NDFormerDataGenerator(config: NDFormerModelConfig)[source]#

Bases: object


nd2py.search.ndformer.ndformer_mcts module#

NDFormer-guided MCTS for Symbolic Regression

Uses a pre-trained NDFormer model to guide MCTS search via PUCT.

class nd2py.search.ndformer.ndformer_mcts.NDFormerNode(eqtree: Symbol)[source]#

Bases: Node

__init__(eqtree: Symbol)[source]#
UCT(c) float[source]#

PUCT score for a node

PUCT(s, a) = Q(s, a) + c_puct * P(s, a) * sqrt(sum(N(s, b)) / (1 + N(s, a)))

class nd2py.search.ndformer.ndformer_mcts.NDFormerMCTS(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter: int = 100, use_tqdm: bool = False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, time_limit: float = None, sample_num: int = 300, keep_vars: bool = False, normalize_y: bool = False, normalize_X: bool = False, remove_abnormal: bool = False, train_eval_split: float = 1.0, child_num: int = 50, n_playout: int = 100, d_playout: int = 10, max_len: int = 30, c: float = 1.41, eta: float = 0.999, ndformer: NDFormerModel | None = None, tokenizer: NDFormerTokenizer | None = None, ndformer_temperature: float = 1.0, beam_width: int = 10, **kwargs)[source]#

Bases: MCTS

NDFormer-guided Monte Carlo Tree Search for Symbolic Regression.

This class extends MCTS by using a pre-trained NDFormer model to provide prior probabilities for action selection via the PUCT algorithm:

PUCT(s, a) = Q(s, a) + c_puct * P(s, a) * sqrt(sum(N(s, b)) / (1 + N(s, a)))

where P(s, a) is the prior probability from NDFormer’s policy head.
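The formula above can be evaluated directly. The sketch below computes the score exactly as written in this documentation (with the division inside the square root) and picks the highest-scoring action; `puct_score` and `select_action` are hypothetical helper names, not part of NDFormerMCTS.

```python
import math
from typing import List


def puct_score(q: float, prior: float, n_action: int, n_total: int, c: float = 1.41) -> float:
    """Q(s, a) + c * P(s, a) * sqrt(sum_b N(s, b) / (1 + N(s, a)))."""
    return q + c * prior * math.sqrt(n_total / (1 + n_action))


def select_action(qs: List[float], priors: List[float], visits: List[int], c: float = 1.41) -> int:
    """Return the index of the child with the highest PUCT score."""
    n_total = sum(visits)
    scores = [puct_score(q, p, n, n_total, c) for q, p, n in zip(qs, priors, visits)]
    return max(range(len(scores)), key=scores.__getitem__)


# Equal Q and equal priors: the less-visited child gets the larger bonus.
print(select_action(qs=[0.0, 0.0], priors=[0.5, 0.5], visits=[10, 0]))
```

The prior P(s, a) scales the exploration bonus, so actions the model considers unlikely need a higher Q value before they are selected.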

The pre-trained model is treated as a black box - it provides policy priors but does not control search behavior. Search is controlled by parameters like beam_width, ndformer_temperature, c, and eta.

Usage Examples#

# Load pre-trained model from Hugging Face Hub
search = NDFormerMCTS(variables=[x, y])
search.load_ndformer('hf://YuMeow/ndformer:best.pth')
search.fit(X, y)

# Or pass model directly
model = NDFormerModel(config)
model.load_state_dict(checkpoint)
tokenizer = NDFormerTokenizer(config, [x, y])
search = NDFormerMCTS(
    variables=[x, y],
    ndformer=model,
    tokenizer=tokenizer,
    beam_width=20,
    c=1.5,
)
search.fit(X, y)

Parameters:
  • variables (List[nd.Variable]) – Input variables for symbolic regression.

  • binary (List[nd.Symbol], optional) – Binary operators for search. Default: [Add, Sub, Mul, Div, Max, Min].

  • unary (List[nd.Symbol], optional) – Unary operators for search. Default: [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan].

  • max_params (int, optional) – Maximum number of numeric parameters in expressions. Default: 2.

  • const_range (Tuple[float, float], optional) – Range for constant initialization. Default: (-1.0, 1.0).

  • depth_range (Tuple[int, int], optional) – Depth range for generated expressions. Default: (2, 6).

  • nettype (Literal["node", "edge", "scalar"], optional) – Network type for the search. Default: "scalar".

  • log_per_iter (int, optional) – Log every N iterations. Default: inf.

  • log_per_sec (float, optional) – Log every N seconds. Default: inf.

  • log_detailed_speed (bool, optional) – Log detailed timing information. Default: False.

  • save_path (str, optional) – Directory to save search records. Default: None.

  • random_state (int, optional) – Random seed for reproducibility. Default: None.

  • n_iter (int, optional) – Maximum number of MCTS iterations. Default: 100.

  • use_tqdm (bool, optional) – Show progress bar. Default: False.

  • edge_list (Tuple[List[int], List[int]], optional) – Graph edge list for network operators. Default: None.

  • num_nodes (int, optional) – Number of nodes in the graph. Default: None.

  • time_limit (float, optional) – Maximum search time in seconds. Default: None.

  • sample_num (int, optional) – Number of samples for evaluation. Default: 300.

  • keep_vars (bool, optional) – Keep original variable names. Default: False.

  • normalize_y (bool, optional) – Normalize target values. Default: False.

  • normalize_X (bool, optional) – Normalize input features. Default: False.

  • remove_abnormal (bool, optional) – Remove abnormal samples. Default: False.

  • train_eval_split (float, optional) – Train/eval data split ratio. Default: 1.0.

  • child_num (int, optional) – Maximum children per expansion. Default: 50.

  • n_playout (int, optional) – Number of playouts per simulation. Default: 100.

  • d_playout (int, optional) – Maximum depth per playout. Default: 10.

  • max_len (int, optional) – Maximum expression length during search. Default: 30.

  • c (float, optional) – PUCT exploration constant. Default: 1.41.

  • eta (float, optional) – Complexity penalty factor for reward. Default: 0.999.

  • ndformer (NDFormerModel, optional) – Pre-trained NDFormer model. Default: None.

  • tokenizer (NDFormerTokenizer, optional) – Tokenizer for the model. Default: None.

  • ndformer_temperature (float, optional) – Temperature for policy softmax. Default: 1.0.

  • beam_width (int, optional) – Beam size for leaf selection. Default: 10.
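The ndformer_temperature parameter is described as a temperature for the policy softmax. A generic temperature-scaled softmax, shown below as a standalone sketch, illustrates the usual effect: values above 1 flatten the prior, values below 1 sharpen it. How NDFormerMCTS applies the temperature internally is not shown in this documentation, so the function here is an assumption about the standard technique.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Standard softmax over logits / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, temperature=1.0))
print(softmax_with_temperature(logits, temperature=2.0))
```

A flatter prior spreads the PUCT exploration bonus over more actions, while a sharper prior concentrates the search on the model's top choices.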

__init__(variables: List[Variable], binary: List[Symbol] = [Add, Sub, Mul, Div, Max, Min], unary: List[Symbol] = [Sqrt, Log, Abs, Neg, Inv, Sin, Cos, Tan], max_params: int = 2, const_range: Tuple[float, float] = (-1.0, 1.0), depth_range: Tuple[int, int] = (2, 6), nettype: Literal['node', 'edge', 'scalar'] | None = 'scalar', log_per_iter: int = inf, log_per_sec: float = inf, log_detailed_speed: bool = False, save_path: str = None, random_state: int | None = None, n_iter: int = 100, use_tqdm: bool = False, edge_list: Tuple[List[int], List[int]] = None, num_nodes: int = None, time_limit: float = None, sample_num: int = 300, keep_vars: bool = False, normalize_y: bool = False, normalize_X: bool = False, remove_abnormal: bool = False, train_eval_split: float = 1.0, child_num: int = 50, n_playout: int = 100, d_playout: int = 10, max_len: int = 30, c: float = 1.41, eta: float = 0.999, ndformer: NDFormerModel | None = None, tokenizer: NDFormerTokenizer | None = None, ndformer_temperature: float = 1.0, beam_width: int = 10, **kwargs)[source]#

Initialize a Monte Carlo Tree Search symbolic regression estimator.

This configures the function set, search hyperparameters, logging behavior, optional graph structure, and various data preprocessing options used during MCTS-based exploration of expression trees.

Parameters:
  • variables (List[Variable]) – List of input variables that can be used in generated expressions.

  • binary (List[Symbol], optional) – Binary operator symbols available to the search (for example Add, Sub, Mul). Defaults to a standard arithmetic and min/max set.

  • unary (List[Symbol], optional) – Unary operator symbols available to the search (for example Sqrt, Log, Sin). Defaults to a standard set of common functions.

  • max_params (int, optional) – Maximum number of numeric parameters (Number nodes) allowed in an expression. Defaults to 2.

  • const_range (Tuple[float, float], optional) – Range from which random constants are sampled. Defaults to (-1.0, 1.0).

  • depth_range (Tuple[int, int], optional) – Minimum and maximum tree depth for randomly generated expressions. Defaults to (2, 6).

  • nettype (Optional[Literal["node", "edge", "scalar"]], optional) – Nettype of the target expression when working with graph data. Defaults to "scalar".

  • log_per_iter (float, optional) – Log progress every log_per_iter iterations; use float("inf") to disable iteration-based logging. Defaults to float("inf").

  • log_per_sec (float, optional) – Log progress every log_per_sec seconds; use float("inf") to disable time-based logging. Defaults to float("inf").

  • log_detailed_speed (bool, optional) – If True, include detailed timing information for individual steps in logs. Defaults to False.

  • save_path (str, optional) – Directory in which JSON lines of per-iteration records are stored as records.jsonl. If None, records are not written to disk. Defaults to None.

  • random_state (Optional[int], optional) – Seed used to control randomness for reproducible runs. Defaults to None.

  • n_iter (int, optional) – Maximum number of MCTS iterations. Defaults to 100.

  • use_tqdm (bool, optional) – If True, wrap the main search loop with a tqdm progress bar. Defaults to False.

  • edge_list (Tuple[List[int], List[int]], optional) – Optional graph edge list (sources, targets) used when evaluating graph operators. Defaults to None.

  • num_nodes (int, optional) – Number of nodes in the underlying graph; if None, it may be inferred elsewhere. Defaults to None.

  • time_limit (float, optional) – Maximum wall-clock time (in seconds) for the search; if exceeded, the search terminates early. Defaults to None.

  • sample_num (int, optional) – Number of samples drawn when evaluating or sampling candidate expressions. Defaults to 300.

  • keep_vars (bool, optional) – If True, keep variable names instead of renaming them during preprocessing. Defaults to False.

  • normalize_y (bool, optional) – If True, normalize target values before fitting. Defaults to False.

  • normalize_X (bool, optional) – If True, normalize input features before fitting. Defaults to False.

  • remove_abnormal (bool, optional) – If True, attempt to remove abnormal samples before training. Defaults to False.

  • train_eval_split (float, optional) – Fraction of data used for training; the remainder may be used for evaluation. Defaults to 1.0.

  • child_num (int, optional) – Maximum number of child nodes expanded from a node during expansion. Defaults to 50.

  • n_playout (int, optional) – Number of rollouts performed from a node during simulation. Defaults to 100.

  • d_playout (int, optional) – Maximum depth of each simulation rollout. Defaults to 10.

  • max_len (int, optional) – Maximum allowed expression length; used to constrain actions. Defaults to 30.

  • c (float, optional) – Exploration constant used in the UCT formula during selection. Defaults to 1.41.

  • eta (float, optional) – Complexity penalty factor used in the reward function, where larger eta discounts complex expressions less. Defaults to 0.999.

  • **kwargs – Additional unused keyword arguments; a warning is logged if any are provided.
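The roles of c and eta can be illustrated with a minimal sketch. The UCT formula below is the standard one; the reward shape `eta**complexity / (1 + rmse)` is an assumption chosen to match the description of eta (a larger eta discounts complex expressions less) and may differ from what the library actually computes. The function names are hypothetical.

```python
import math

def uct_score(total_reward, visits, parent_visits, c=1.41):
    """Standard UCT value of a child: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def penalized_reward(rmse, complexity, eta=0.999):
    """Assumed reward shape (not confirmed by the source): eta**complexity / (1 + rmse)."""
    return eta ** complexity / (1.0 + rmse)

# A larger c favours rarely visited children; a larger eta penalizes length less.
print(uct_score(total_reward=5.0, visits=10, parent_visits=100))
print(penalized_reward(rmse=0.1, complexity=12))
```

With eta close to 1 (the default is 0.999), the penalty only becomes noticeable for long expressions, so fit quality dominates the reward for short ones.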

fit(X: ndarray | DataFrame | Dict[str, ndarray], y: ndarray | Series)[source]#

Fit the model using NDFormer-guided MCTS with batch expansion.

First encodes the graph data and caches the encoder memory, then runs the MCTS search, using beam-search-based selection and batched expansion for efficiency.

load_ndformer(checkpoint_path='hf://YuMeow/ndformer:best.pth', device=None)[source]#

Load pre-trained NDFormer model and tokenizer from checkpoint.

The model config is automatically loaded from the checkpoint. Users do not need to provide a config manually.

Parameters:
  • checkpoint_path – Path to the model checkpoint. Can be:
      - Local file path: "/path/to/checkpoint.pth"
      - HF shorthand: "YuMeow/ndformer:best.pth"
      - HF full syntax: "hf://YuMeow/ndformer:best.pth"

  • device – Device to load model on. If None, auto-detects CUDA/CPU.
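The three accepted checkpoint_path forms can be told apart with a small parser. The sketch below is illustrative only: `parse_checkpoint_path`, `CheckpointRef`, and the `owner/repo:filename` regular expression are hypothetical and not taken from nd2py, whose actual resolution logic may differ.

```python
import re
from typing import NamedTuple, Optional

class CheckpointRef(NamedTuple):
    kind: str                # "local" or "hf"
    repo_id: Optional[str]   # e.g. "YuMeow/ndformer" for Hugging Face references
    filename: str            # file inside the repo, or the local path itself

# Assumed pattern for "owner/repo:filename"; hypothetical, not from nd2py.
_HF_SHORTHAND = re.compile(r"^(?P<repo>[\w.\-]+/[\w.\-]+):(?P<file>.+)$")

def parse_checkpoint_path(path: str) -> CheckpointRef:
    """Classify a checkpoint string as a local path or a Hugging Face reference."""
    explicit_hf = path.startswith("hf://")
    spec = path[len("hf://"):] if explicit_hf else path
    match = _HF_SHORTHAND.match(spec)
    if match:
        return CheckpointRef("hf", match["repo"], match["file"])
    if explicit_hf:
        raise ValueError(f"Malformed hf:// reference: {path!r}")
    return CheckpointRef("local", None, path)

print(parse_checkpoint_path("/path/to/checkpoint.pth"))
print(parse_checkpoint_path("YuMeow/ndformer:best.pth"))
print(parse_checkpoint_path("hf://YuMeow/ndformer:best.pth"))
```

One limitation of this heuristic: a relative local path that happens to look like `dir/name:file` would be classified as a Hugging Face reference, so an explicit prefix such as `./` would be needed to disambiguate it.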

encode_data(X: Dict[str, ndarray], y: ndarray)[source]#

Encode graph data using the NDFormer encoder and cache the resulting memory for reuse.

set_policy_prior(actions_dict: Dict[NDFormerNode, List[Tuple[Symbol, Symbol]]])[source]#

Set prior probabilities from NDFormer for valid actions by decoding the current partial sequences in batch

Parameters:
  • states – List of MCTS nodes

  • actions_dict – Dictionary mapping each MCTS node to its list of valid (empty, operator) tuples

Returns:

List of dictionaries mapping actions to prior probabilities (one per node)
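Turning decoder logits into a prior over valid actions typically amounts to a softmax restricted to those actions. The sketch below shows that step for a single node, assuming the logits are available as a plain dictionary keyed by action; `masked_prior` is a hypothetical helper and not the library's implementation.

```python
import math
from typing import Dict, Hashable, List

def masked_prior(logits: Dict[Hashable, float],
                 valid_actions: List[Hashable]) -> Dict[Hashable, float]:
    """Softmax over the logits of valid actions only; invalid actions get no mass."""
    if not valid_actions:
        return {}
    scores = [logits[action] for action in valid_actions]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return {action: w / total for action, w in zip(valid_actions, weights)}

# "Sin" has the largest logit but is not valid here, so it is excluded entirely.
prior = masked_prior({"Add": 1.0, "Mul": 1.0, "Sin": 5.0}, ["Add", "Mul"])
print(prior)
```

Renormalizing after masking keeps the prior a proper distribution, which matters when it is later multiplied into a PUCT-style exploration term.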

select(root: NDFormerNode) List[NDFormerNode][source]#

Select leaf nodes using beam search with PUCT scores.

Returns a list of leaf nodes to expand in batch.
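A beam search driven by PUCT can be sketched as follows. This is a toy illustration under stated assumptions: the `Node` class, the `beam_width` parameter, and the AlphaZero-style PUCT formula `Q + c * P * sqrt(N_parent) / (1 + N_child)` are hypothetical stand-ins, and the actual scoring and beam handling in NDFormerMCTS may differ.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    prior: float = 1.0        # policy prior P assigned to this node
    visits: int = 0           # visit count N
    value_sum: float = 0.0    # accumulated reward
    children: List["Node"] = field(default_factory=list)

def puct(parent: Node, child: Node, c: float = 1.41) -> float:
    """Assumed PUCT form: Q + c * P * sqrt(N_parent) / (1 + N_child)."""
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u

def beam_select(root: Node, beam_width: int = 2, c: float = 1.41) -> List[Node]:
    """Descend level by level, keeping the beam_width best-scoring children."""
    frontier, leaves = [root], []
    while frontier:
        candidates = []
        for node in frontier:
            if not node.children:
                leaves.append(node)  # reached a leaf: collect it for expansion
            else:
                candidates.extend((puct(node, ch, c), ch) for ch in node.children)
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [child for _, child in candidates[:beam_width]]
    return leaves
```

Because leaves are collected at every depth, the returned list can be longer than `beam_width`; all of them can then be expanded together in one batched model call.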

expand(nodes: List[NDFormerNode], X: Dict[str, ndarray], y: ndarray) List[NDFormerNode][source]#

Expand multiple nodes with NDFormer-guided action selection in batch

Parameters:

nodes – List of leaf nodes to expand

Returns:

List of selected child nodes for simulation

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') NDFormerMCTS#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

nd2py.search.ndformer.ndformer_model module#

class nd2py.search.ndformer.ndformer_model.NDFormerModel(*args: Any, **kwargs: Any)[source]#

Bases: FactoryMixin, Module

Base class for NDFormer models.

Inherits from FactoryMixin, which provides:

  • NDFormerModel.register_model('name'): Decorator to register subclasses

  • NDFormerModel.create(config, tokenizer): Factory method to create instances

The base class is automatically registered as ‘default’ model.

Example

    @NDFormerModel.register_model('gcn')
    class GCNNDFormer(NDFormerModel):
        def __init__(self, config, tokenizer):
            super().__init__(config, tokenizer)
            # ... custom architecture ...

    # Usage
    config = NDFormerModelConfig(model='gcn')
    model = NDFormerModel.create(config, tokenizer)
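The register/create pattern described above can be reproduced in a few lines of plain Python. The sketch below is a self-contained illustration of how such a mixin could work; `FactorySketch`, `BaseModel`, and `GCNModel` are hypothetical names, the config is a plain dict rather than an NDFormerModelConfig, and the real FactoryMixin may be implemented differently.

```python
from typing import Callable, Dict

class FactorySketch:
    """Hypothetical stand-in for FactoryMixin: a name-to-class registry."""
    _registry: Dict[str, type] = {}

    @classmethod
    def register_model(cls, name: str) -> Callable[[type], type]:
        def decorator(subclass: type) -> type:
            cls._registry[name] = subclass
            subclass.model_name = name  # remember the registered name on the class
            return subclass
        return decorator

    @classmethod
    def create(cls, config: dict, tokenizer=None):
        name = config.get("model", "default")
        if name not in cls._registry:
            raise ValueError(f"Unknown model {name!r}; known: {sorted(cls._registry)}")
        return cls._registry[name](config, tokenizer)

class BaseModel(FactorySketch):
    def __init__(self, config, tokenizer):
        self.config = config
        self.tokenizer = tokenizer

# Register the base class itself under 'default', as the docstring describes.
BaseModel.register_model("default")(BaseModel)

@BaseModel.register_model("gcn")
class GCNModel(BaseModel):
    pass

model = BaseModel.create({"model": "gcn"})
print(type(model).__name__, model.model_name)
```

The benefit of this design is that the model variant is chosen by a string in the config, so a checkpoint can record which architecture to rebuild without the loading code importing each variant by name.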

__init__(config: NDFormerModelConfig, tokenizer: NDFormerTokenizer)[source]#
encode_graph(data_node, data_edge, data_scalar, edge_list, num_nodes, node_batch_idx=None, timer=None)[source]#

Graph encoding stage: called only once, at initialization or when the graph topology changes.

Args:

  • data_node: (SampleNum, NodeNum, max_var_num+1, 3)

  • data_edge: (SampleNum, EdgeNum, max_var_num+1, 3)

  • data_scalar: (SampleNum, BatchNum, max_var_num+1, 3)

  • edge_list: (2, EdgeNum)

  • num_nodes: int

  • node_batch_idx: (TotalNodeNum,), index of the graph that each node belongs to

Returns:

  • memory: (BatchNum, MaxNodeNum, d_emb)

  • memory_key_padding_mask: (BatchNum, MaxNodeNum), or None if node_batch_idx is None; True marks padding positions that must be ignored
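The relationship between node_batch_idx, the padded memory, and the padding mask can be shown with a small pure-Python sketch. `pad_by_graph` is a hypothetical helper that uses nested lists instead of tensors; it only illustrates the shape convention described above, not the model's actual encoder.

```python
from typing import List, Tuple

def pad_by_graph(node_feats: List[List[float]],
                 node_batch_idx: List[int]) -> Tuple[list, list]:
    """Group per-node features by graph and pad every graph to MaxNodeNum."""
    num_graphs = max(node_batch_idx) + 1
    d_emb = len(node_feats[0])
    groups = [[] for _ in range(num_graphs)]
    for feat, graph_id in zip(node_feats, node_batch_idx):
        groups[graph_id].append(feat)
    max_nodes = max(len(group) for group in groups)
    memory, mask = [], []
    for group in groups:
        n_pad = max_nodes - len(group)
        memory.append(group + [[0.0] * d_emb for _ in range(n_pad)])
        mask.append([False] * len(group) + [True] * n_pad)  # True marks padding
    return memory, mask

# Three nodes in total: two belong to graph 0, one to graph 1.
memory, mask = pad_by_graph([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [0, 0, 1])
print(memory)
print(mask)
```

The True-means-ignore convention matches the key_padding_mask semantics of PyTorch attention layers, so the mask can be passed to the decoder unchanged.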

decode_sequence(memory, partial_eq, memory_key_padding_mask=None, seq_batch_idx=None, timer=None)[source]#

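Sequence decoding stage: supports 1-to-N broadcasting and can be called at high frequency.

Args:

  • memory: (BatchNum, NodeNum*SampleNum, d_emb), the output of encode_graph

  • partial_eq: (SeqNum, MaxSeqLen)

  • memory_key_padding_mask: (BatchNum, NodeNum*SampleNum) or None; True marks padding positions that must be ignored

  • seq_batch_idx: (SeqNum,), index of the graph that each sequence belongs to, used to broadcast the node features in memory to the correct sequences

Returns:

  • logits: (SeqNum, vocab_size)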
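The 1-to-N broadcast driven by seq_batch_idx is essentially a gather along the batch dimension: each partial sequence picks up the cached memory of its own graph. The sketch below illustrates this with nested lists; `broadcast_memory` is a hypothetical helper, and in a tensor library the same effect would usually be achieved by integer indexing.

```python
from typing import List, Optional, Tuple

def broadcast_memory(memory: list,
                     seq_batch_idx: List[int],
                     mask: Optional[list] = None) -> Tuple[list, Optional[list]]:
    """Give every sequence the encoder memory (and mask) of the graph it belongs to."""
    seq_memory = [memory[graph_id] for graph_id in seq_batch_idx]
    seq_mask = None if mask is None else [mask[graph_id] for graph_id in seq_batch_idx]
    return seq_memory, seq_mask

# Two encoded graphs, four partial equations: three for graph 0 and one for graph 1.
graph_memory = [["graph0-node-features"], ["graph1-node-features"]]
seq_memory, seq_mask = broadcast_memory(graph_memory, [0, 0, 1, 0])
print(seq_memory)
```

Since the graphs are encoded once and only indexed here, many candidate sequences can be decoded without repeating the graph encoding step.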

forward(batch_dict, timer=None)[source]#

Entry point for training: directly consumes the batches produced by the Dataset's collate_fn.

nd2py.search.ndformer.ndformer_model_flash_ansr module#

FLASH-ANSR variant of NDFormer.

Based on the paper describing FLASH-ANSR architecture:

  • Pre-norm Transformer (norm_first=True)

  • Set Transformer encoder with induction points

  • FlashAttention support (via torch.nn.MultiheadAttention with backend selection)

class nd2py.search.ndformer.ndformer_model_flash_ansr.SetTransformerEncoder(*args: Any, **kwargs: Any)[source]#

Bases: Module

Set Transformer Encoder with induction points.

Drop-in replacement for nn.TransformerEncoder with identical forward signature. Induction points are used internally but output shape matches input shape.

Architecture (Lee et al., 2019):

  1. Induction points attend to input data (cross-attention)

  2. Self-attention among induction points

  3. Induction points attend back to original positions (output projection)
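The three steps above can be sketched with plain dot-product attention. This is a deliberately stripped-down illustration: it uses a single head, no learned projections, no normalization, and no padding mask, and the induction points are fixed vectors rather than learnable parameters. The function names are hypothetical and the real SetTransformerEncoder is more involved.

```python
import math
from typing import List

Matrix = List[List[float]]

def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # stabilize the softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / total
                    for j in range(len(values[0]))])
    return out

def induced_encoder_pass(x: Matrix, induction_points: Matrix) -> Matrix:
    h = attend(induction_points, x, x)  # 1. induction points read the input set
    h = attend(h, h, h)                 # 2. self-attention among induction points
    return attend(x, h, h)              # 3. input positions read back from them
```

The output has one row per input position, which is why such an encoder can stand in for nn.TransformerEncoder. With m induction points and n inputs, each attention step costs on the order of n*m or m*m score computations rather than n*n.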

Parameters:
  • encoder_layer – Not used (kept for API compatibility)

  • num_layers – Number of transformer layers

  • norm – Final normalization layer

  • d_model – Embedding dimension

  • n_induction_points – Number of learnable induction points

__init__(encoder_layer=None, num_layers=2, norm=None, enable_nested_tensor=True, mask_check=True, d_model: int = None, n_induction_points: int = 128, n_head: int = 8)[source]#
forward(src: torch.Tensor, mask=None, src_key_padding_mask: torch.Tensor | None = None, is_causal=None) torch.Tensor[source]#
Parameters:
  • src – Input tensor (batch, seq_len, d_model) - GNN encoded nodes

  • mask – Not used (kept for API compatibility)

  • src_key_padding_mask – Padding mask (batch, seq_len), True for padding

  • is_causal – Not used (kept for API compatibility)

Returns:

Output tensor with same shape as input (batch, seq_len, d_model)

class nd2py.search.ndformer.ndformer_model_flash_ansr.FlashANSRNDFormer(*args: Any, **kwargs: Any)[source]#

Bases: NDFormerModel

FLASH-ANSR: Transformer-based symbolic regression with Set Transformer encoder and pre-norm architecture.

Key features:

  • Pre-norm Transformer (norm_first=True)

  • Set Transformer encoder with learnable induction points

  • LayerNorm for normalization

Reuses NDFormerModel.encode_graph() and NDFormerModel.decode_sequence().

__init__(config: NDFormerModelConfig, tokenizer: NDFormerTokenizer)[source]#
model_name = 'flash_ansr'#

nd2py.search.ndformer.ndformer_tokenizer module#

class nd2py.search.ndformer.ndformer_tokenizer.NumberTokenizer(n_mantissa=4, min_exponent=-100, max_exponent=100)[source]#

Bases: object

__init__(n_mantissa=4, min_exponent=-100, max_exponent=100)[source]#
encode(value: float | List[float], mode: Literal['token', 'token_id'] = 'token') List[str | int][source]#
decode(tokens: List[str], mode: Literal['token', 'token_id'] = 'token') List[float][source]#
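The constructor arguments (n_mantissa, min_exponent, max_exponent) suggest a sign/mantissa/exponent encoding of floats. The sketch below shows one common scheme of that kind; the token spellings ("+", "N3142", "E-3") and the helper names are assumptions made for illustration and are not confirmed to match NumberTokenizer's vocabulary or behaviour.

```python
from typing import List

def encode_number(value: float, n_mantissa: int = 4,
                  min_exponent: int = -100, max_exponent: int = 100) -> List[str]:
    """Encode a finite float as [sign, mantissa, exponent] tokens (assumed scheme)."""
    sign = "-" if value < 0 else "+"
    if value == 0:
        return [sign, "N" + "0" * n_mantissa, "E0"]
    # Scientific notation with n_mantissa significant digits, e.g. "3.142e+00".
    mantissa_str, exp_str = f"{abs(value):.{n_mantissa - 1}e}".split("e")
    digits = mantissa_str.replace(".", "")
    # Shift the exponent so the mantissa is read as an integer.
    exponent = int(exp_str) - (n_mantissa - 1)
    exponent = max(min_exponent, min(max_exponent, exponent))  # clamp to the vocabulary
    return [sign, "N" + digits, f"E{exponent}"]

def decode_number(tokens: List[str]) -> float:
    """Inverse of encode_number, up to the rounding applied to the mantissa."""
    sign, mantissa, exponent = tokens
    value = float(f"{mantissa[1:]}e{exponent[1:]}")
    return -value if sign == "-" else value

tokens = encode_number(3.14159)
print(tokens, decode_number(tokens))
```

In this sketch the round trip is lossy by design: values are rounded to n_mantissa significant digits, and exponents outside the allowed range are clamped, which changes the decoded value. Non-finite inputs (inf, nan) are not handled.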
class nd2py.search.ndformer.ndformer_tokenizer.NDFormerTokenizer(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#

Bases: object

__init__(config: NDFormerModelConfig, variables: List[Symbol] | None = None)[source]#
property vocab_size#
property pad_token_id#
property sos_token_id#
property eos_token_id#
property unk_token_id#
encode(eqtree: Symbol, mode: Literal['token', 'token_id'] = 'token') Tuple[List[int], List[int], List[int]][source]#
decode(tokens: List[str], parents: List[str], nettypes: List[str], mode: Literal['token', 'token_id'] = 'token') Symbol[source]#
encode_array(data: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#

Converts a pure floating-point array into tokens or token IDs.

decode_array(tokens: ndarray, mode: Literal['token', 'token_id'] = 'token_id')[source]#

Converts an array of tokens or token IDs back into a pure floating-point array.

to_dict() dict[source]#

Export the core configuration for serialization.

classmethod from_dict(config: dict) NDFormerTokenizer[source]#
save(filepath: str)[source]#

Save to a local JSON file.

classmethod load(filepath: str) NDFormerTokenizer[source]#

Load from a local JSON file.
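The to_dict / from_dict / save / load quartet follows a common serialization pattern, sketched below with a toy class. `TokenizerSketch` and its single `vocab` field are hypothetical; the real NDFormerTokenizer is built from an NDFormerModelConfig and exports whatever its own to_dict defines.

```python
import json
import os
import tempfile
from typing import List

class TokenizerSketch:
    """Toy tokenizer showing a JSON round trip via to_dict/from_dict."""

    def __init__(self, vocab: List[str]):
        self.vocab = list(vocab)
        self.token_to_id = {token: i for i, token in enumerate(self.vocab)}

    def to_dict(self) -> dict:
        # Export only what is needed to rebuild the object.
        return {"vocab": self.vocab}

    @classmethod
    def from_dict(cls, config: dict) -> "TokenizerSketch":
        return cls(config["vocab"])

    def save(self, filepath: str) -> None:
        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(self.to_dict(), f, ensure_ascii=False, indent=2)

    @classmethod
    def load(cls, filepath: str) -> "TokenizerSketch":
        with open(filepath, encoding="utf-8") as f:
            return cls.from_dict(json.load(f))

# Round trip through a temporary file.
original = TokenizerSketch(["<pad>", "<sos>", "<eos>", "Add", "Sin"])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tokenizer.json")
    original.save(path)
    restored = TokenizerSketch.load(path)
print(restored.vocab == original.vocab)
```

Keeping save and load as thin wrappers around to_dict and from_dict means the same dictionary can also be embedded in a model checkpoint, so the tokenizer travels with the weights.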