Contains the different estimators of the library.

Estimators

The link between scikit-learn's regularization parameter C and \(\lambda\) is \(C=\frac{1}{2n \lambda}\), where \(n\) is the number of training samples.
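
For example, to match a scikit-learn model trained with a given C on n samples, one can invert this relation to obtain \(\lambda = \frac{1}{2nC}\). A small worked example (the values of n and C are illustrative):

    # Mapping scikit-learn's C to the corresponding lambda: lambda = 1 / (2 * n * C)
    n = 10_000                     # number of training samples (illustrative)
    C = 1.0                        # scikit-learn regularization strength (illustrative)
    lambda_1 = 1.0 / (2 * n * C)   # -> 5e-05, to be passed as lambda_1
    print(lambda_1)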

The Regression Class

class cyanure.estimators.Regression(loss='square', penalty='l2', fit_intercept=True, random_state=0, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, dual=None, safe=True)[source]

Bases: ERM

The regression class, which derives from ERM.

The goal is to minimize the following objective:

\[\min_{w,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, w^\top x_i + b\right) + \psi(w),\]

where \(L\) is a regression loss, \(\psi\) is a regularization function (or constraint), \(w\) is a p-dimensional vector representing model parameters, and \(b\) is an optional unregularized intercept. The targets \(y_i\) are real values.

Parameters
loss (string): default=’square’

Loss function to be used. Only the square loss is implemented at this point. Given two k-dimensional vectors y, z:

  • ‘square’ => \(L(y,z) = \frac{1}{2}\|y-z\|_2^2\)

penalty (string): default=’l2’

Regularization function \(\psi\). Possible choices are:

For univariate problems:

  • ‘none’

    \(\psi(w) = 0\)

  • ‘l2’

    \(\psi(w) = \frac{\lambda_1}{2} \|w\|_2^2\)

  • ‘l1’

    \(\psi(w) = \lambda_1 \|w\|_1\)

  • ‘elasticnet’

    \(\psi(w) = \lambda_1 \|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)

  • ‘fused-lasso’

    \(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1\|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)

  • ‘l1-ball’

    encodes the constraint \(\|w\|_1 \leq \lambda\)

  • ‘l2-ball’

    encodes the constraint \(\|w\|_2 \leq \lambda\)

For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor.

\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]

In addition, multitask-group Lasso penalties are provided for multivariate problems (the parameter is then a matrix \(W\)):

  • ‘l1l2’, which is the multi-task group Lasso regularization
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
  • ‘l1linf’
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
  • ‘l1l2+l1’, which is the multi-task group Lasso regularization + l1
    \[\psi(W) = \sum_{j=1}^p \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
fit_intercept (boolean): default=True

Learns an unregularized intercept b (or several intercepts for multivariate problems)

lambda_1 (float): default=0

First regularization parameter

lambda_2 (float): default=0

Second regularization parameter, if needed

lambda_3 (float): default=0

Third regularization parameter, if needed

solver (string): default=’auto’

Optimization solver. Possible choices are

  • ‘ista’

  • ‘fista’

  • ‘catalyst-ista’

  • ‘qning-ista’ (proximal quasi-Newton method)

  • ‘svrg’

  • ‘catalyst-svrg’ (accelerated SVRG with Catalyst)

  • ‘qning-svrg’ (quasi-Newton SVRG)

  • ‘acc-svrg’ (SVRG with direct acceleration)

  • ‘miso’

  • ‘catalyst-miso’ (accelerated MISO with Catalyst)

  • ‘qning-miso’ (quasi-Newton MISO)

  • ‘auto’

See the LaTeX documentation for more details. If you are unsure, use ‘auto’.

tol (float): default=1e-3

Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is \(f\) and its optimal value is \(f^*\), the algorithm stops with the guarantee

\(f(x_t) - f^* \leq \text{tol} \cdot f(x_t)\)

max_iter (int): default=500

Maximum number of iterations of the algorithm, in terms of passes over the data

duality_gap_interval (int): default=10

Frequency of duality-gap computation

verbose (boolean): default=True

Display information or not

n_threads (int): default=-1

Maximum number of cores the method may use (-1 = all cores). Note that more cores is not always better.

random_state (int): default=0

Random seed

warm_start (boolean): default=False

Use a restart strategy

binary_problem (boolean): default=True

Whether the problem is univariate or multivariate

limited_memory_qning (int): default=20

Memory parameter for the qning method

fista_restart (int): default=50

Restart strategy for fista (useful for computing regularization path)
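
As an illustration of the parameters above, the following sketch instantiates the class with an elastic-net penalty (the hyperparameter values are illustrative, not recommendations):

    from cyanure.estimators import Regression

    # Elastic-net: psi(w) = lambda_1 ||w||_1 + (lambda_2 / 2) ||w||_2^2
    estimator = Regression(
        loss="square",
        penalty="elasticnet",
        lambda_1=0.01,       # weight of the l1 term (illustrative)
        lambda_2=0.001,      # weight of the l2 term (illustrative)
        fit_intercept=True,  # learn an unregularized intercept b
        solver="auto",       # let the library choose the solver
        tol=1e-3,
        verbose=False,
    )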

Methods:

fit(X, y[, le_parameter])

Fit the parameters.

predict(X)

Predict the labels given an input matrix X (same format as fit).

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

densify()

Convert coefficient matrix to dense array format.

get_params([deep])

Get parameters for the estimator.

get_weights()

Get the model parameters (either w or the tuple (w,b)).

set_params(**params)

Set the parameters of the estimator.

sparsify()

Convert coefficient matrix to sparse format.

fit(X, y, le_parameter=None)[source]

Fit the parameters.

Parameters
X (numpy array or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy array):
  • vector of size n with real values for regression

  • matrix of size n x k for multivariate regression

Returns
self (ERM):

Returns the instance of the class
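
A minimal end-to-end sketch on synthetic data (shapes and hyperparameter values are illustrative):

    import numpy as np
    from cyanure.estimators import Regression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))   # n=200 samples, p=10 features
    y = X @ rng.standard_normal(10)      # real-valued targets

    reg = Regression(penalty="l2", lambda_1=0.01, verbose=False)
    reg.fit(X, y)              # returns the fitted instance
    pred = reg.predict(X)      # vector of size n
    r2 = reg.score(X, y)       # coefficient of determination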

predict(X)[source]

Predict the labels given an input matrix X (same format as fit).

Parameters
X (numpy array or scipy sparse CSR matrix):

Input matrix for the prediction

Returns
pred (numpy.array):

Prediction for the X matrix

score(X, y, sample_weight=None)[source]

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.

Parameters
X (numpy array or scipy sparse CSR matrix):

Test samples.

y (numpy.array):

True labels for X.

sample_weight (numpy.array, optional):

Sample weights. Defaults to None.

Returns
score (float):

\(R^2\) of self.predict(X) wrt. y.

densify()

Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

Returns
self (ERM):

Fitted estimator converted to dense estimator

get_params(deep=True)

Get parameters for the estimator.

Parameters
deep (bool, optional):

If True, also return sub-objects that are estimators. Defaults to True.

Returns
params (dict):

Parameter names and values

get_weights()

Get the model parameters (either w or the tuple (w,b)).

Returns
w or (w,b) (numpy.array or tuple of numpy.array):

Model parameters

set_params(**params)

Set the parameters of the estimator.

Parameters
params (dict):

Estimator parameters to set

Returns
self (ERM):

Estimator instance

Raises
ValueError:

The parameter does not exist
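
A short sketch of the get_params/set_params round trip (parameter values are illustrative):

    from cyanure.estimators import Regression

    reg = Regression(penalty="l1", lambda_1=0.05)
    params = reg.get_params()       # dict of parameter names and values
    reg.set_params(lambda_1=0.1)    # update an existing parameter

    try:
        reg.set_params(unknown_parameter=1)
    except ValueError:
        pass                        # unknown parameter names raise ValueError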

sparsify()

Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.

Returns
self (ERM):

Fitted estimator converted to sparse estimator.

Notes

For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.
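
A sketch of the rule of thumb above, using the l1-penalized Lasso class defined later in this module (data and penalty strength are illustrative):

    import numpy as np
    from cyanure.estimators import Lasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 500))
    y = rng.standard_normal(100)

    model = Lasso(lambda_1=0.1, verbose=False)
    model.fit(X, y)
    if (model.coef_ == 0).mean() > 0.5:   # more than 50% zeros
        model.sparsify()                  # coef_ becomes a scipy.sparse matrix
    model.densify()                       # back to a dense numpy.ndarray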

The Classifier Class

class cyanure.estimators.Classifier(loss='square', penalty='l2', fit_intercept=True, tol=0.001, solver='auto', random_state=0, max_iter=500, fista_restart=50, verbose=True, warm_start=False, multi_class='auto', limited_memory_qning=20, lambda_1=0, lambda_2=0, lambda_3=0, duality_gap_interval=5, n_threads=-1, dual=None, safe=True)[source]

Bases: ClassifierAbstraction

The classification class.

The goal is to minimize the following objective:

\[\min_{W,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, W^\top x_i + b\right) + \psi(W)\]

where \(L\) is a classification loss, \(\psi\) is a regularization function (or constraint), \(W=[w_1,\ldots,w_k]\) is a (p x k) matrix that carries the k predictors, where k is the number of classes, and \(y_i\) is a label in \(\{1,\ldots,k\}\). b is a k-dimensional vector representing an unregularized intercept (which is optional).

Parameters
loss (string): default=’square’

Loss function to be used. Possible choices are:

  • ‘square’

    \(L(y,z) = \frac{1}{2} ( y-z)^2\)

  • ‘logistic’

    \(L(y,z) = \log(1 + e^{-y z} )\)

  • ‘sqhinge’ or ‘squared_hinge’

    \(L(y,z) = \frac{1}{2} \max( 0, 1- y z)^2\)

  • ‘safe-logistic’

    \(L(y,z) = e^{ yz - 1 } - y z ~\text{if}~ yz \leq 1~~\text{and}~~0\) otherwise

  • ‘multiclass-logistic’

    which is also called multinomial or softmax logistic: \(L(y, W^\top x + b) = \log\left(\sum_{j=1}^k e^{w_j^\top x + b_j - w_y^\top x - b_y} \right)\)

penalty (string): default=’l2’

Regularization function \(\psi\). Possible choices are:

For binary classification problems:

  • ‘none’

    \(\psi(w) = 0\)

  • ‘l2’

    \(\psi(w) = \frac{\lambda_1}{2} \|w\|_2^2\)

  • ‘l1’

    \(\psi(w) = \lambda_1 \|w\|_1\)

  • ‘elasticnet’

    \(\psi(w) = \lambda_1 \|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)

  • ‘fused-lasso’

    \(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1\|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)

  • ‘l1-ball’

    encodes the constraint \(\|w\|_1 \leq \lambda\)

  • ‘l2-ball’

    encodes the constraint \(\|w\|_2 \leq \lambda\)

For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor.

\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]

In addition, multitask-group Lasso penalties are provided for multivariate problems (the parameter is then a matrix \(W\)):

  • ‘l1l2’, which is the multi-task group Lasso regularization
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
  • ‘l1linf’
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
  • ‘l1l2+l1’, which is the multi-task group Lasso regularization + l1
    \[\psi(W) = \sum_{j=1}^p \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
fit_intercept (boolean): default=True

Learns an unregularized intercept b (or several intercepts for multivariate problems)

lambda_1 (float): default=0

First regularization parameter

lambda_2 (float): default=0

Second regularization parameter, if needed

lambda_3 (float): default=0

Third regularization parameter, if needed

solver (string): default=’auto’

Optimization solver. Possible choices are

  • ‘ista’

  • ‘fista’

  • ‘catalyst-ista’

  • ‘qning-ista’ (proximal quasi-Newton method)

  • ‘svrg’

  • ‘catalyst-svrg’ (accelerated SVRG with Catalyst)

  • ‘qning-svrg’ (quasi-Newton SVRG)

  • ‘acc-svrg’ (SVRG with direct acceleration)

  • ‘miso’

  • ‘catalyst-miso’ (accelerated MISO with Catalyst)

  • ‘qning-miso’ (quasi-Newton MISO)

  • ‘auto’

See the LaTeX documentation for more details. If you are unsure, use ‘auto’.

tol (float): default=1e-3

Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is \(f\) and its optimal value is \(f^*\), the algorithm stops with the guarantee

\(f(x_t) - f^* \leq \text{tol} \cdot f(x_t)\)

max_iter (int): default=500

Maximum number of iterations of the algorithm, in terms of passes over the data

duality_gap_interval (int): default=5

Frequency of duality-gap computation

verbose (boolean): default=True

Display information or not

n_threads (int): default=-1

Maximum number of cores the method may use (-1 = all cores). Note that more cores is not always better.

random_state (int): default=0

Random seed

warm_start (boolean): default=False

Use a restart strategy

binary_problem (boolean): default=True

Whether the problem is binary (univariate) or multiclass (multivariate)

limited_memory_qning (int): default=20

Memory parameter for the qning method

fista_restart (int): default=50

Restart strategy for fista (useful for computing regularization path)
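
As an illustration of the parameters above, a sketch configuring a multinomial logistic classifier (the values are illustrative):

    from cyanure.estimators import Classifier

    clf = Classifier(
        loss="multiclass-logistic",   # softmax loss for k > 2 classes
        penalty="l2",
        lambda_1=0.001,               # illustrative regularization strength
        fit_intercept=True,
        solver="auto",
        tol=1e-3,
        verbose=False,
    )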

Methods:

fit(X, y[, le_parameter])

Fit the parameters.

predict(X)

Predict the labels given an input matrix X (same format as fit).

score(X, y)

Return the accuracy score on the test data.

decision_function(X)

Predict confidence scores for samples.

predict_proba(X)

Estimate the probability for each class.

densify()

Convert coefficient matrix to dense array format.

get_params([deep])

Get parameters for the estimator.

get_weights()

Get the model parameters (either w or the tuple (w,b)).

set_params(**params)

Set the parameters of the estimator.

sparsify()

Convert coefficient matrix to sparse format.

fit(X, y, le_parameter=None)[source]

Fit the parameters.

Parameters
X (numpy array, or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy.array):

Input labels.

  • vector of size n with {-1, +1} labels for binary classification (labels in {0, 1} are automatically converted), or with labels in {0, 1, …, k-1} for multiclass classification.
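
A minimal binary-classification sketch on synthetic data (shapes and values are illustrative):

    import numpy as np
    from cyanure.estimators import Classifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = np.where(X[:, 0] > 0, 1, -1)   # labels in {-1, +1}

    clf = Classifier(loss="logistic", penalty="l2", lambda_1=0.01, verbose=False)
    clf.fit(X, y)
    labels = clf.predict(X)
    accuracy = clf.score(X, y)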

predict(X)[source]

Predict the labels given an input matrix X (same format as fit).

Parameters
X (numpy array or scipy sparse CSR matrix):

Input matrix for the prediction

Returns
pred (numpy.array):

Prediction for the X matrix

score(X, y)[source]

Return the accuracy score on the test data.

Parameters
X (numpy array or scipy sparse CSR matrix):

Test samples.

y (numpy.array):

True labels for X.

Returns
score (float):

Mean accuracy of self.predict(X) wrt. y.

decision_function(X)[source]

Predict confidence scores for samples.

Parameters
X (numpy array or scipy sparse CSR matrix):

The data for which we want scores

Returns
scores (numpy.array):

Confidence scores per (n_samples, n_classes) combination. In the binary case, the confidence score for self.classes_[1], where > 0 means this class would be predicted.

predict_proba(X)[source]

Estimate the probability for each class.

Parameters
X (numpy array or scipy sparse CSR matrix):

Data matrix for which we want probabilities

Returns
proba (numpy.array):

Return the probability of the samples for each class.
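
A sketch combining decision_function and predict_proba on a fitted binary classifier (synthetic data, illustrative values):

    import numpy as np
    from cyanure.estimators import Classifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = np.where(X[:, 0] > 0, 1, -1)

    clf = Classifier(loss="logistic", penalty="l2", lambda_1=0.01, verbose=False)
    clf.fit(X, y)
    scores = clf.decision_function(X)   # > 0 favors self.classes_[1]
    proba = clf.predict_proba(X)        # per-class probabilities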

densify()

Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

Returns
self (ERM):

Fitted estimator converted to dense estimator

get_params(deep=True)

Get parameters for the estimator.

Parameters
deep (bool, optional):

If True, also return sub-objects that are estimators. Defaults to True.

Returns
params (dict):

Parameter names and values

get_weights()

Get the model parameters (either w or the tuple (w,b)).

Returns
w or (w,b) (numpy.array or tuple of numpy.array):

Model parameters

set_params(**params)

Set the parameters of the estimator.

Parameters
params (dict):

Estimator parameters to set

Returns
self (ERM):

Estimator instance

Raises
ValueError:

The parameter does not exist

sparsify()

Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.

Returns
self (ERM):

Fitted estimator converted to sparse estimator.

Notes

For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.

Pre-configured classes

class cyanure.estimators.LinearSVC(loss='sqhinge', penalty='l2', fit_intercept=True, verbose=False, lambda_1=0.1, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, dual=None, safe=True)[source]

Bases: Classifier

A pre-configured class for the squared hinge loss.

Methods:

decision_function(X)

Predict confidence scores for samples.

densify()

Convert coefficient matrix to dense array format.

fit(X, y[, le_parameter])

Fit the parameters.

get_params([deep])

Get parameters for the estimator.

get_weights()

Get the model parameters (either w or the tuple (w,b)).

predict(X)

Predict the labels given an input matrix X (same format as fit).

predict_proba(X)

Estimate the probability for each class.

score(X, y)

Return the accuracy score on the test data.

set_params(**params)

Set the parameters of the estimator.

sparsify()

Convert coefficient matrix to sparse format.
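
A quick usage sketch (synthetic data; the regularization strength is illustrative):

    import numpy as np
    from cyanure.estimators import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((150, 8))
    y = np.where(X @ rng.standard_normal(8) > 0, 1, -1)

    svc = LinearSVC(lambda_1=0.01)   # squared hinge loss, l2 penalty by default
    svc.fit(X, y)
    print(svc.score(X, y))           # mean accuracy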

class cyanure.estimators.LogisticRegression(penalty='l2', loss='logistic', fit_intercept=True, verbose=False, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, multi_class='auto', dual=None, safe=True)[source]

Bases: Classifier

A pre-configured class for logistic regression loss.

Methods:

decision_function(X)

Predict confidence scores for samples.

densify()

Convert coefficient matrix to dense array format.

fit(X, y[, le_parameter])

Fit the parameters.

get_params([deep])

Get parameters for the estimator.

get_weights()

Get the model parameters (either w or the tuple (w,b)).

predict(X)

Predict the labels given an input matrix X (same format as fit).

predict_proba(X)

Estimate the probability for each class.

score(X, y)

Return the accuracy score on the test data.

set_params(**params)

Set the parameters of the estimator.

sparsify()

Convert coefficient matrix to sparse format.
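
A quick usage sketch (synthetic data; values are illustrative):

    import numpy as np
    from cyanure.estimators import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((150, 8))
    y = (X[:, 0] > 0).astype(int)    # {0, 1} labels are converted automatically

    logreg = LogisticRegression(lambda_1=0.01)
    logreg.fit(X, y)
    proba = logreg.predict_proba(X)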

class cyanure.estimators.Lasso(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, dual=None, safe=True)[source]

Bases: Regression

A pre-configured class for Lasso regression.

Uses an active-set strategy when the number of features exceeds 1000.

Methods:

fit(X, y)

Fit the parameters.

densify()

Convert coefficient matrix to dense array format.

get_params([deep])

Get parameters for the estimator.

get_weights()

Get the model parameters (either w or the tuple (w,b)).

predict(X)

Predict the labels given an input matrix X (same format as fit).

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

set_params(**params)

Set the parameters of the estimator.

sparsify()

Convert coefficient matrix to sparse format.

fit(X, y)[source]

Fit the parameters.

Parameters
X (numpy array or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy array):
  • vector of size n with real values for regression

  • matrix of size n x k for multivariate regression

Returns
self (ERM):

Returns the instance of the class
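
A sketch with more than 1000 features, where the active-set strategy mentioned above applies (data and penalty strength are illustrative):

    import numpy as np
    from cyanure.estimators import Lasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 2000))   # p > 1000 features
    y = rng.standard_normal(300)

    lasso = Lasso(lambda_1=0.1, verbose=False)
    lasso.fit(X, y)
    weights = lasso.get_weights()          # either w or the tuple (w, b)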

class cyanure.estimators.L1Logistic(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, multi_class='auto', dual=None, safe=True)[source]

Bases: Classifier

A pre-configured class for L1 logistic classification.

Uses an active-set strategy when the number of features exceeds 1000.

Methods:

decision_function(X)

Predict confidence scores for samples.

densify()

Convert coefficient matrix to dense array format.

fit(X, y)

Fit the parameters.

get_params([deep])

Get parameters for the estimator.

get_weights()

Get the model parameters (either w or the tuple (w,b)).

predict(X)

Predict the labels given an input matrix X (same format as fit).

predict_proba(X)

Estimate the probability for each class.

score(X, y)

Return the accuracy score on the test data.

set_params(**params)

Set the parameters of the estimator.

sparsify()

Convert coefficient matrix to sparse format.

fit(X, y)[source]

Fit the parameters.

Parameters
X (numpy array, or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy.array):

Input labels.

  • vector of size n with {-1, +1} labels for binary classification (labels in {0, 1} are automatically converted), or with labels in {0, 1, …, k-1} for multiclass classification.
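
A sketch in a high-dimensional setting, where the l1 penalty produces sparse coefficients (data and penalty strength are illustrative):

    import numpy as np
    from cyanure.estimators import L1Logistic

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 1500))   # p > 1000 features: active set applies
    y = np.where(X[:, 0] > 0, 1, -1)

    clf = L1Logistic(lambda_1=0.05, verbose=False)
    clf.fit(X, y)
    sparsity = (clf.coef_ == 0).mean()     # fraction of zero coefficients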