Frequently Asked Questions
==========================

This page provides answers to common questions about PyTorch-BSF, its mathematical foundations, and practical usage.

General Information
-------------------

How does PyTorch-BSF differ from other hyperparameter optimization tools?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The key difference is that PyTorch-BSF **exploits problem structure** rather than treating the objective as a black box.

* **Dramatically fewer evaluations:** Black-box methods such as Bayesian optimization make no assumptions about the objective and must explore the search space from scratch. Approximating a Pareto front to reasonable accuracy can require hundreds of evaluations. Because PyTorch-BSF assumes the problem is *weakly simplicial*, it can often recover the entire Pareto front from as few as 50 points with higher accuracy.
* **Regression-based approach:** Unlike search methods that find discrete points, PyTorch-BSF fits a continuous parametric surface (a Bézier simplex). Once trained, you can evaluate any point on the trade-off surface instantly.
* **Dimension-free convergence:** When data lie along a low-dimensional manifold embedded in a high-dimensional space, the convergence rate depends on the **intrinsic dimension** of the simplex, not on the ambient space dimension. This avoids the curse of dimensionality common in black-box methods.

What applications are there beyond multi-objective optimization?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While primarily used for Pareto front approximation, Bézier simplex fitting is a general-purpose regression technique for any continuous map from a simplex to a Euclidean space.
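
To make "a continuous map from a simplex to a Euclidean space" concrete, the following minimal pure-Python sketch evaluates a Bézier simplex from its control points using the standard Bernstein-polynomial form. It does not use the PyTorch-BSF API; the function name ``bezier_simplex`` and the dictionary layout of ``control_points`` are assumptions made only for illustration.

.. code-block:: python

    from math import factorial, prod


    def bezier_simplex(t, control_points, degree):
        """Evaluate the sum over d of multinomial(degree; d) * prod(t_i ** d_i) * p_d.

        t              -- point on the simplex (non-negative entries summing to 1)
        control_points -- dict mapping a multi-index d (tuple summing to `degree`)
                          to a control point p_d (list of floats); hypothetical layout
        """
        dim = len(next(iter(control_points.values())))
        value = [0.0] * dim
        for d, p in control_points.items():
            # Multinomial coefficient degree! / (d_1! * ... * d_M!)
            coeff = factorial(degree)
            for d_i in d:
                coeff //= factorial(d_i)
            # Bernstein basis weight of this control point at t
            weight = coeff * prod(t_i ** d_i for t_i, d_i in zip(t, d))
            for k in range(dim):
                value[k] += weight * p[k]
        return value


    # Hypothetical degree-2 curve: two simplex coordinates, two output values
    control_points = {
        (2, 0): [1.0, 0.0],
        (1, 1): [0.0, 0.0],
        (0, 2): [0.0, 1.0],
    }
    print(bezier_simplex((1.0, 0.0), control_points, degree=2))  # an endpoint
    print(bezier_simplex((0.5, 0.5), control_points, degree=2))  # the midpoint

At a vertex of the simplex the curve passes through the corresponding control point, while the remaining control points only pull the surface toward them.
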
Potential applications include:

* **Interpolation of parametric families:** When a model's behavior varies continuously with coefficients on a simplex (e.g., mixture weights, regularization strengths in Elastic Net), a Bézier simplex can compactly represent the entire family.
* **Shape modeling:** Bézier simplices generalize Bézier triangles used in CAD and computer graphics; they can represent smooth curved surfaces of any dimension.
* **Solution manifolds:** Any problem whose solution set forms a continuous simplex-structured manifold is a candidate for fitting.
* **Scientific data fitting:** Modeling physical phenomena where constraints naturally form a simplex (e.g., chemical concentrations in a mixture).

Mathematical Foundations
------------------------

What is the "weakly simplicial" assumption?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A problem is **weakly simplicial** if its Pareto set (and Pareto front) is the continuous image of a standard simplex. Topologically, this means the set of optimal trade-offs has no "holes" or disconnected components and can be "stretched" or "bent" from a simplex.

This assumption is remarkably broad. For example, it has been mathematically proven that **all unconstrained strongly convex optimization problems are weakly simplicial**. This covers a wide class of practical problems, including Elastic Net regression, Ridge regression, and many regularized empirical risk minimization tasks.

See the :doc:`whatis` section for formal definitions.

Can I verify the "weakly simplicial" assumption for my problem?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes. If your problem is unconstrained and strongly convex, it is guaranteed to be weakly simplicial.

For other cases, you can use a **data-driven statistical test** based on persistent homology. The test checks whether the topology of the sampled Pareto set is consistent with a simplex structure.
If the test rejects the simplicial hypothesis, a Bézier simplex model may not be appropriate. If it does not reject, you have statistical evidence supporting the use of PyTorch-BSF.

Detailed information on these tests can be found in :cite:t:`hamada2018data` and :cite:t:`hamada2020test`.

Practical Usage
---------------

How do I choose the degree and estimate the required sample size?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The complexity of a Bézier simplex is determined by its **degree** (:math:`D`) and the number of **objectives** (:math:`M`). The number of control points is given by the formula:

.. math::

   N_{cp} = \binom{D+M-1}{M-1}

**Guidelines:**

* **Start low:** A degree of 2 or 3 is usually a good starting point. Low-degree models are less prone to overfitting and faster to train.
* **Sample size:** You need at least as many training samples as there are control points (:math:`N_{cp}`) for the problem to be well-determined. In practice, having **2 to 3 times as many samples** as control points leads to more stable and reliable fits.
* **Refine as needed:** If the residuals (fitting errors) are high, increase the degree. If the model overfits (low training error but poor generalization), increase the sample size or decrease the degree.

How do I normalize my parameters or values?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``fit()`` function requires that each row of the ``params`` tensor sums to 1 (i.e., lies on the standard simplex :math:`\Delta^{M-1}`). If your raw parameters don't satisfy this, you must normalize them manually.

Additionally, normalizing the output ``values`` can improve fitting stability and accuracy. PyTorch-BSF provides several options for automatic value normalization in the CLI/MLflow interface.

Please refer to the :doc:`advanced/normalization` page for detailed instructions on how to normalize your parameter and value tensors.

Can I use GPU or multi-node training?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Yes.** Since PyTorch-BSF is built on PyTorch and PyTorch Lightning, it supports hardware acceleration and distributed training out of the box.

Please refer to the :doc:`advanced/acceleration` page for detailed instructions on using GPUs (single or multiple), multi-node clusters, and mixed-precision training.

How do I save and load a trained model?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can save and load models using the ``torch_bsf.bezier_simplex`` module. Supported formats include ``.pt`` (PyTorch), ``.csv``, ``.tsv``, ``.json``, and ``.yaml``.

.. code-block:: python

    from torch_bsf.bezier_simplex import save, load

    # Save the trained model
    save("my_model.pt", bs)

    # Load it back
    bs = load("my_model.pt")

How can I perform cross-validation or grid search?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch-BSF includes built-in tools for model selection and specific task automation:

* **K-Fold Cross-Validation:** Run ``python -m torch_bsf.model_selection.kfold`` to evaluate model performance across different data splits.
* **Elastic Net Grid Search:** Run ``python -m torch_bsf.model_selection.elastic_net_grid`` to generate parameter grids specifically for Elastic Net regularization paths.

When should I use the ``fix`` argument?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``fix`` argument allows you to hold specific control points constant during training. This is useful for:

* **Boundary constraints:** If you know the exact values at the vertices of the simplex (e.g., results of single-objective optimizations), fix those vertices and fit only the interior.
* **Incremental refinement:** Fit a low-degree model first, then use its control points as initialization for a higher-degree model, fixing the well-estimated parts to stabilize training.
* **Encoding prior knowledge:** If theoretical or physical constraints dictate the value at certain parameter combinations, you can pin those points to ensure the model respects them.

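
The vertex control points mentioned under *Boundary constraints* can be identified by enumerating the multi-indices that label the control points of a Bézier simplex. The sketch below is pure Python and does not call PyTorch-BSF; the helper name ``multi_indices`` is hypothetical, and the exact format that ``fix`` expects is not shown here, so check the API reference before passing these indices on.

.. code-block:: python

    from math import comb


    def multi_indices(degree, n_objectives):
        """Yield every tuple of n_objectives non-negative integers summing to degree."""
        if n_objectives == 1:
            yield (degree,)
            return
        for first in range(degree, -1, -1):
            for rest in multi_indices(degree - first, n_objectives - 1):
                yield (first,) + rest


    degree, n_objectives = 3, 3
    indices = list(multi_indices(degree, n_objectives))

    # The count agrees with the control-point formula binom(D + M - 1, M - 1)
    assert len(indices) == comb(degree + n_objectives - 1, n_objectives - 1)

    # Vertices of the simplex are the indices with all weight on one coordinate
    vertices = [d for d in indices if max(d) == degree]
    # Every other control point would remain free during fitting
    free = [d for d in indices if max(d) < degree]
    print(vertices)
    print(len(free))

With degree 3 and three objectives there are 10 control points, of which 3 are vertices. By the sample-size guideline earlier in this section, roughly 20 to 30 samples would be a comfortable starting point for such a model.
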

Troubleshooting
---------------

Are approximation results always reliable?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not necessarily. While the Universal Approximation Theorem guarantees that a Bézier simplex *can* approximate any continuous map, a model with a **fixed** degree might not be sufficient for highly complex or "wiggly" surfaces.

To ensure reliability:

1. **Check Residuals:** Large fitting errors indicate the degree might be too low.
2. **Cross-Validation:** Use the built-in k-fold tools to ensure the model generalizes well to unseen data.
3. **Visualization:** If the dimension allows, plot the resulting surface against the training points.
4. **Domain Knowledge:** Verify that the predicted trade-offs make sense according to the physics or logic of your problem.

How can I improve fitting convergence or accuracy?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you encounter poor accuracy or optimization issues, try these steps:

1. **Check Data Quality:** Ensure ``params`` sum to 1 and that you have enough samples relative to the degree.
2. **Adjust the Degree:** If the surface is too complex, increase the degree. If the model is oscillating or overfitting, decrease it.
3. **Better Initialization:** Use the ``init`` argument to provide a better starting point, perhaps from a coarse fit or domain knowledge.
4. **Increase Training Epochs:** For complex surfaces, the L-BFGS optimizer might need more iterations. In the CLI, use ``--max_epochs``.
5. **Data Normalization:** Scale your output values (``values``) to a similar range (e.g., using ``--normalize std``) to help the optimizer converge faster.

References
----------

.. bibliography::