Elastic Net Grid Sampling ========================= When fitting a Bézier simplex to the elastic-net regularization map you first need to choose a set of weight vectors on the standard 2-simplex :math:`\Delta^2` at which to evaluate the model. The :func:`~torch_bsf.model_selection.elastic_net_grid.elastic_net_grid` function generates a purpose-built grid that respects the intrinsic geometry of the elastic-net hyperparameter space. The Hyperparameter Space ------------------------ The standard Elastic Net regression problem is formulated as: .. math:: \min_{\beta \in \mathbb{R}^N} \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \Bigl(\alpha \|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\Bigr) where :math:`\lambda \ge 0` is the overall regularization strength and :math:`\alpha \in [0, 1]` controls the L1/L2 mixing ratio (setting :math:`\alpha = 1` recovers the Lasso; :math:`\alpha = 0` gives Ridge regression). To cast this into the multi-objective framework, we identify three objectives over :math:`\beta \in \mathbb{R}^N`: .. math:: f_{\text{data}}(\beta) &= \frac{1}{2n}\|y - X\beta\|_2^2 + \frac{\epsilon}{2}\|\beta\|_2^2 \\ f_{\text{sparse}}(\beta) &= \|\beta\|_1 + \frac{\epsilon}{2}\|\beta\|_2^2 \\ f_{\text{smooth}}(\beta) &= \frac{1 + \epsilon}{2}\|\beta\|_2^2 where :math:`n` is the number of observations and :math:`\epsilon > 0` is a small constant. It appears as :math:`\frac{\epsilon}{2}\|\beta\|_2^2` in :math:`f_{\text{data}}` and :math:`f_{\text{sparse}}` to make those terms strongly convex, and it is absorbed into :math:`f_{\text{smooth}}` via the coefficient :math:`\frac{1+\epsilon}{2}`. This ensures all three objectives are strongly convex, which is required for the solution map to be weakly simplicial; see :doc:`../whatis` for a detailed discussion. Equivalently, elastic-net optimization can be written as a convex combination of the same three objectives: .. math:: \min_{\beta} \; w_1 \, f_{\text{data}}(\beta) + w_2 \, f_{\text{sparse}}(\beta) + w_3 \, f_{\text{smooth}}(\beta), \qquad (w_1, w_2, w_3) \in \Delta^2. The conventional elastic-net parameters :math:`\lambda \ge 0` (overall regularization strength) and :math:`\alpha \in [0, 1]` (L1 mixing ratio) relate to the simplex weight vector by: .. math:: w_1 = \frac{1}{1+\lambda}, \qquad w_2 = \frac{\lambda\,\alpha}{1+\lambda}, \qquad w_3 = \frac{\lambda\,(1-\alpha)}{1+\lambda}. With these weights the convex-combination objective equals the :math:`\epsilon`-regularized elastic net objective divided by the positive constant :math:`(1 + \lambda)`. Because this scale factor does not depend on :math:`\beta`, both formulations share the same minimizer. The :math:`(\lambda, \alpha)` parameter space is a semi-infinite rectangle :math:`[0,\infty) \times [0,1]`. When :math:`\lambda = 0` the regularization terms vanish and the solution depends only on the data, *regardless of* :math:`\alpha`. Therefore the entire edge :math:`\{\lambda = 0\} \times [0, 1]` maps to the single vertex :math:`(w_1, w_2, w_3) = (1, 0, 0)` of the simplex. Identifying this edge with a single point transforms the rectangle into a triangle – the 2-simplex :math:`\Delta^2`. Conversely, as :math:`\lambda \to \infty` the regularization overwhelms the data term and drives all model coefficients to zero, regardless of :math:`\alpha`. In the elastic net this limit is called the **null model** (all :math:`\beta_i = 0`). All weight vectors on the opposite edge of the simplex :math:`\{(w_1, w_2, w_3) : w_1 = 0\}` — the *base edge* connecting :math:`(0, 1, 0)` and :math:`(0, 0, 1)` — therefore correspond to the same solution. Since the Bézier simplex (and the underlying solution map) must assign a single output to each input weight, all of these base-edge weights are identified with a single null-model point :math:`P^*` in the solution space. The :func:`~torch_bsf.model_selection.elastic_net_grid.elastic_net_grid` function still returns multiple distinct base-edge weights (:math:`w_1 = 0` with varying :math:`w_2`, :math:`w_3`), but they all evaluate to this same null-model solution. The resulting quotient space is a **leaf/eye-shaped CW complex**: two 0-cells (:math:`A` and :math:`P^*`), two 1-cells (the former edges :math:`AB` and :math:`AC`, now connecting :math:`A` to :math:`P^*` as curves), and one 2-cell (the interior). This identification gives the interior a **leaf (foliation) structure**: for each fixed value of :math:`w_1 \in (0, 1]`, the set of corresponding weight vectors :math:`\{(w_1, w_2, w_3) : w_2 + w_3 = 1 - w_1,\; w_2, w_3 \ge 0\}` is a line segment (a *leaf*) parametrized by :math:`\alpha`. As :math:`w_1 \to 0` (i.e. :math:`\lambda \to \infty`), the images of these leaves under the solution map shrink to the single null-model point :math:`P^*`. .. figure:: ../_static/elastic_net_leaf_space.png :width: 100% All points are colored by :math:`(w_1, w_2, w_3) \mapsto (R, G, B)`, so the same weight vector has the same color in every panel. **Left** – The :math:`(\lambda, \alpha)` hyperparameter space (x: regularization strength, y: L1 mixing ratio). The red line at :math:`\lambda = 0` is the identified edge; all points on it share the color :math:`(1, 0, 0)` = red because :math:`w = (1, 0, 0)` there. **Center** – The 2-simplex with vertices :math:`(1,0,0)` at the bottom-left (red), :math:`(0,1,0)` at the top (green), and :math:`(0,0,1)` at the bottom-right (blue). The gradient right edge (green→blue) is the null-model base edge to be identified. **Right** – The quotient space: vertex :math:`A` = :math:`(1,0,0)` (red) at the left, and the null-model point :math:`P^*` at the right shown as a large green dot :math:`(0,1,0)` behind a smaller blue dot :math:`(0,0,1)`, reflecting that both endpoints of the base edge are identified to :math:`P^*`. Grid Structure -------------- A uniform grid in :math:`(\lambda, \alpha)` is sub-optimal because solutions change rapidly near :math:`\lambda = 0` and slowly for large :math:`\lambda`. :func:`~torch_bsf.model_selection.elastic_net_grid.elastic_net_grid` therefore uses: * **Log-scale spacing in the data-fidelity weight coordinate** :math:`w_1` – the :func:`~torch_bsf.model_selection.elastic_net_grid.reverse_logspace` routine generates ``n_lambdas - 1`` values of :math:`w_1 \in [0, 1)` that serve as :math:`w_1`-levels for iso-:math:`w_1` leaves, including :math:`w_1 = 0` (which lies on the null-model base edge and corresponds to :math:`\lambda = \infty`) and excluding the data-fidelity vertex at :math:`w_1 = 1`. The vertex at :math:`w_1 = 1` (i.e. :math:`\lambda = 0`) is appended separately. As :math:`w_1` increases from 0 towards 1, these levels move from the base edge into the simplex interior towards the data-fidelity vertex. Since :math:`\lambda = (1 - w_1) / w_1` is finite only for :math:`w_1 > 0`, all finite values of :math:`\lambda` arise from :math:`0 < w_1 < 1`. The log spacing in :math:`w_1` produces more :math:`w_1` levels close to the data-fidelity vertex and therefore near :math:`\lambda = 0`. The steepness of this clustering is controlled by the ``base`` parameter: ``base=1`` gives uniform spacing in :math:`w_1`, while larger values concentrate points further towards :math:`w_1 = 1` (i.e. towards smaller :math:`\lambda`). * **Uniform spacing along** :math:`\alpha` – for each :math:`w_1` level, the ``n_alphas`` values of :math:`\alpha` are placed uniformly in :math:`[0, 1]` to span the corresponding iso-:math:`w_1` leaf. The ``n_vertex_copies`` parameter adds extra copies of each simplex vertex. This is useful when the grid is passed to k-fold cross-validation (see :doc:`auto_degree`): set ``n_vertex_copies >= k`` for k-fold CV so that, when the fold-splitting procedure distributes rows approximately evenly, each fold will contain every vertex at least once. Using fewer copies than folds can inflate cross-validation variance. .. figure:: ../_static/elastic_net_grid_comparison.png :width: 100% Grid points on the 2-simplex for different values of the ``base`` parameter (``n_lambdas=20``, ``n_alphas=10``). Larger bases push more points towards the data-fidelity vertex :math:`(1, 0, 0)` (bottom-left corner in the figure), which is appropriate when you expect the optimal model to have small :math:`\lambda`. Usage ----- As a Python Function ~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import numpy as np from torch_bsf.model_selection.elastic_net_grid import elastic_net_grid # Generate a grid with 102 lambda levels, 12 alpha levels, and 10 vertex # copies (useful for 10-fold cross-validation). grid = elastic_net_grid( n_lambdas=102, n_alphas=12, n_vertex_copies=10, base=10, ) # grid.shape == (1240, 3) # Each row is a weight vector (w1, w2, w3) on the 2-simplex. np.savetxt("weights.csv", grid, delimiter=",", fmt="%.17e") The returned array can be saved to a CSV file and then either (a) loaded back into an array or tensor and passed as the ``params`` argument to :func:`torch_bsf.fit`, or (b) passed as a file path to the ``--params`` CLI option, to train a Bézier simplex over the elastic-net regularization map. As a Python Module (CLI) ~~~~~~~~~~~~~~~~~~~~~~~~ Run the module directly to print the grid as a CSV file to *stdout*: .. code-block:: bash python -m torch_bsf.model_selection.elastic_net_grid \ --n_lambdas=102 \ --n_alphas=12 \ --n_vertex_copies=10 \ --base=10 \ > weights.csv All four parameters are optional and fall back to their defaults (``n_lambdas=102``, ``n_alphas=12``, ``n_vertex_copies=1``, ``base=10``). Via MLproject ~~~~~~~~~~~~~ The ``elastic_net_grid`` entry point in ``MLproject`` calls the same module and redirects the output to a CSV file named ``weight_{n_lambdas}_{n_alphas}_{n_vertex_copies}_{base}.csv``: .. code-block:: bash mlflow run https://github.com/opthub-org/pytorch-bsf \ -e elastic_net_grid \ -P n_lambdas=102 \ -P n_alphas=12 \ -P n_vertex_copies=10 \ -P base=10 After the run completes, the grid is saved in the current working directory as ``weight_102_12_10_10.csv``. You can then pass it as the ``params`` argument to a subsequent training run: .. code-block:: bash mlflow run https://github.com/opthub-org/pytorch-bsf \ -P params=weight_102_12_10_10.csv \ -P values=values.csv \ -P degree=6 .. seealso:: * :func:`torch_bsf.model_selection.elastic_net_grid.elastic_net_grid` – API reference with parameter descriptions and examples. * :func:`torch_bsf.model_selection.elastic_net_grid.reverse_logspace` – helper that generates log-spaced samples for the first weight component :math:`w_1`, from which the :math:`\lambda = (1 - w_1) / w_1` values are derived. * :doc:`auto_degree` – automatic degree selection via k-fold cross-validation. * :doc:`../applications/elastic_net` – end-to-end example of elastic-net model selection using PyTorch-BSF.