Contains the different estimators of the library.


The regularization parameter C of scikit-learn is related to \(\lambda\) by \(C=\frac{1}{2n \lambda}\), where \(n\) is the number of samples.
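As a small illustration of the relation above, the following sketch converts between the two conventions. The helper names `lambda_from_C` and `C_from_lambda` are hypothetical and belong to neither library; `n_samples` is assumed to be the number of training samples n.

```python
def lambda_from_C(C, n_samples):
    """Return lambda for a given scikit-learn C, using C = 1 / (2 * n * lambda)."""
    return 1.0 / (2.0 * n_samples * C)


def C_from_lambda(lam, n_samples):
    """Inverse conversion: return C for a given lambda."""
    return 1.0 / (2.0 * n_samples * lam)


# Hypothetical values chosen only for illustration.
lam = lambda_from_C(C=1.0, n_samples=100)
print(lam)                       # 0.005
print(C_from_lambda(lam, 100))   # recovers C, up to floating-point rounding
```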

The Regression Class

class cyanure.estimators.Regression(loss='square', penalty='l2', fit_intercept=True, random_state=0, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, dual=None, safe=True)[source]

Bases: ERM

The regression class which derives from ERM.

The goal is to minimize the following objective:

\[\min_{w,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, w^\top x_i + b\right) + \psi(w),\]

where \(L\) is a regression loss, \(\psi\) is a regularization function (or constraint), \(w\) is a p-dimensional vector representing model parameters, and \(b\) is an optional unregularized intercept. The targets \(y_i\) are real values.

loss (string): default=’square’

Loss function to be used. Only the square loss is implemented at this point. Given two k-dimensional vectors y, z:

  • ‘square’ => \(L(y,z) = \frac{1}{2}( y-z)^2\)
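To make the objective concrete, here is a minimal sketch that evaluates it for the square loss combined with the ‘l2’ penalty listed under the penalty parameter. The function name `objective` and the toy data are assumptions made for illustration; this is not the library's solver, only a direct evaluation of the formula in plain Python.

```python
def objective(X, y, w, b, lambda_1):
    """(1/n) * sum_i 0.5 * (y_i - (w^T x_i + b))^2 + (lambda_1 / 2) * ||w||_2^2."""
    n = len(y)
    data_fit = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b  # prediction w^T x_i + b
        data_fit += 0.5 * (y_i - z) ** 2                     # square loss
    penalty = 0.5 * lambda_1 * sum(w_j * w_j for w_j in w)   # 'l2' penalty
    return data_fit / n + penalty


# Toy data: two samples, two features.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
value = objective(X, y, w=[1.0, 1.0], b=0.0, lambda_1=0.1)
print(value)  # data term 0.25 plus penalty 0.1, i.e. about 0.35
```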

penalty (string): default=’l2’

Regularization function psi. Possible choices are

For univariate problems:

  • ‘none’

    \(\psi(w) = 0\)

  • ‘l2’

    \(\psi(w) = \frac{\lambda_1}{2} ||w||_2^2\)

  • ‘l1’

    \(\psi(w) = \lambda_1 ||w||_1\)

  • ‘elasticnet’

    \(\psi(w) = \lambda_1 ||w||_1 + \frac{\lambda_2}{2}||w||_2^2\)

  • ‘fused-lasso’

    \(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1||w||_1 + \frac{\lambda_2}{2}||w||_2^2\)

  • ‘l1-ball’

    encodes the constraint \(||w||_1 \leq \lambda\)

  • ‘l2-ball’

    encodes the constraint \(||w||_2 \leq \lambda\)

For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor.

\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]

In addition, multi-task group Lasso penalties are provided for multivariate problems (\(W\) is then a matrix):

  • ‘l1l2’, which is the multi-task group Lasso regularization
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
  • ‘l1linf’
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
  • ‘l1l2+l1’, which is the multi-task group Lasso regularization + l1
    \[\psi(W) = \sum_{j=1}^p \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
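The penalties listed above can be written out in a few lines of plain Python. This is only a sketch of the formulas, with hypothetical function names; vectors are represented as lists and the matrix W as a list of rows, which is an assumption made for readability rather than the library's data layout.

```python
import math


def l1(w):
    """||w||_1"""
    return sum(abs(v) for v in w)


def l2_squared(w):
    """||w||_2^2"""
    return sum(v * v for v in w)


def elasticnet(w, lambda_1, lambda_2):
    """lambda_1 * ||w||_1 + (lambda_2 / 2) * ||w||_2^2"""
    return lambda_1 * l1(w) + 0.5 * lambda_2 * l2_squared(w)


def fused_lasso(w, lambda_1, lambda_2, lambda_3):
    """lambda_3 * sum_i |w[i] - w[i-1]| plus the elastic-net terms."""
    variation = sum(abs(w[i] - w[i - 1]) for i in range(1, len(w)))
    return lambda_3 * variation + elasticnet(w, lambda_1, lambda_2)


def l1l2(W, lam):
    """Multi-task group Lasso: lam * sum over rows W^j of ||W^j||_2."""
    return lam * sum(math.sqrt(l2_squared(row)) for row in W)


w = [1.0, -2.0, 0.0]
print(elasticnet(w, lambda_1=0.5, lambda_2=2.0))            # 0.5*3 + 1.0*5 = 6.5
print(fused_lasso(w, lambda_1=0.5, lambda_2=2.0, lambda_3=1.0))  # 5 + 6.5 = 11.5
print(l1l2([[3.0, 4.0], [0.0, 0.0]], lam=2.0))              # 2 * (5 + 0) = 10.0
```

A row of W that is entirely zero contributes nothing to the ‘l1l2’ penalty, which is why this penalty tends to switch off a feature for all predictors at once.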
fit_intercept (boolean): default=True

Learns an unregularized intercept b (or several intercepts for multivariate problems)

lambda_1 (float): default=0

First regularization parameter

lambda_2 (float): default=0

Second regularization parameter, if needed

lambda_3 (float): default=0

Third regularization parameter, if needed

solver (string): default=’auto’

Optimization solver. Possible choices are

  • ‘ista’

  • ‘fista’

  • ‘catalyst-ista’

  • ‘qning-ista’ (proximal quasi-Newton method)

  • ‘svrg’

  • ‘catalyst-svrg’ (accelerated SVRG with Catalyst)

  • ‘qning-svrg’ (quasi-Newton SVRG)

  • ‘acc-svrg’ (SVRG with direct acceleration)

  • ‘miso’

  • ‘catalyst-miso’ (accelerated MISO with Catalyst)

  • ‘qning-miso’ (quasi-Newton MISO)

  • ‘auto’

See the LaTeX documentation for more details. If you are unsure, use ‘auto’.

tol (float): default=1e-3

Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is “f” and its optimal value is “f^*”, the algorithm stops with the guarantee

\(f(x_t) - f^* \leq \text{tol} \cdot f(x_t)\)
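The stopping rule can be pictured with the following sketch. It assumes the solver has a primal value f(x_t) and a dual value that lower-bounds f^*; the function name `relative_gap_reached` is hypothetical, and this is not the library's internal code.

```python
def relative_gap_reached(primal, dual, tol=1e-3):
    """Return True when the duality gap certifies f(x_t) - f* <= tol * f(x_t).

    `dual` is any lower bound on the optimal value f*, so
    primal - dual >= primal - f*. A small enough gap is therefore sufficient.
    """
    gap = primal - dual
    return gap <= tol * primal


print(relative_gap_reached(primal=1.0, dual=0.9995))  # True
print(relative_gap_reached(primal=1.0, dual=0.9))     # False
```

Because the test is relative to f(x_t), the same tol behaves consistently when the objective is rescaled, which is one reason the objective is assumed to be non-negative.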

max_iter (int): default=500

Maximum number of iterations of the algorithm, counted in passes over the data

duality_gap_interval (int): default=10

Frequency of duality-gap computation

verbose (boolean): default=True

Display information or not

n_threads (int): default=-1

Maximum number of cores the method may use (-1 = all cores). Note that using more cores is not always faster.

random_state (int): default=0

Random seed

warm_start (boolean): default=False

Use a restart strategy

binary_problem (boolean): default=True

univariate or multivariate problems

limited_memory_qning (int): default=20

Memory parameter for the qning method

fista_restart (int): default=50

Restart strategy for fista (useful for computing regularization path)


fit(X, y[, le_parameter])

Fit the parameters.


Predict the labels given an input matrix X (same format as fit).

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.


Convert coefficient matrix to dense array format.


Get parameters for the estimator.


Get the model parameters (either w or the tuple (w,b)).


Change the value of parameters.


Convert coefficient matrix to sparse format.

fit(X, y, le_parameter=None)[source]

Fit the parameters.

X (numpy array or scipy sparse CSR matrix):

input n X p numpy matrix; the samples are on the rows

y (numpy array):
  • vector of size n with real values for regression

  • matrix of size n X k for multivariate regression

self (ERM):

Returns the instance of the class


Predict the labels given an input matrix X (same format as fit).

X (numpy array or scipy sparse CSR matrix):

Input matrix for the prediction

pred (numpy.array):

Prediction for the X matrix

score(X, y, sample_weight=None)[source]

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative because the model can be arbitrarily worse than a constant predictor. A constant model that always predicts the expected value of y, disregarding the input features, gets an \(R^2\) score of 0.0.
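The definition above can be checked by hand with a short plain-Python sketch. The function name `r_squared` is hypothetical and sample weights are ignored here, so this is only an illustration of the unweighted formula.

```python
def r_squared(y_true, y_pred):
    """1 - u / v with u the residual and v the total sum of squares."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, y_true))                 # 1.0, perfect prediction
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))   # 0.0, constant mean predictor
print(r_squared(y_true, [4.0, 3.0, 2.0, 1.0]))   # -3.0, worse than the mean
```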

X (numpy array or scipy sparse CSR matrix):

Test samples.

y (numpy.array):

True labels for X.

sample_weight (numpy.array, optional):

Sample weights. Defaults to None.

score (float):

\(R^2\) of self.predict(X) with respect to y.


Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

self (ERM):

Fitted estimator converted to dense estimator


Get parameters for the estimator.

deep (bool, optional):

If True, also returns subobjects that are estimators. Defaults to True.

params (dict):

Parameters names and values


Get the model parameters (either w or the tuple (w,b)).

w or (w,b) (numpy.array or tuple of numpy.array):

Model parameters


Change the value of parameters.

params (dict):

Estimator parameters to set

self (ERM):

Estimator instance


The parameter does not exist


Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.

self (ERM):

Fitted estimator converted to sparse estimator.


For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.
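The 50% rule of thumb can be checked before converting. The sketch below works on a plain list of coefficients and uses a hypothetical helper name `zero_fraction`; with a NumPy array the same quantity would be (coef_ == 0).sum() divided by the number of entries.

```python
def zero_fraction(coef):
    """Fraction of entries of a flat coefficient list that are exactly zero."""
    return sum(1 for c in coef if c == 0) / len(coef)


# Hypothetical coefficients from an L1-regularized fit.
coef = [0.0, 0.0, 1.3, 0.0, -0.2, 0.0]
worth_sparsifying = zero_fraction(coef) > 0.5
print(worth_sparsifying)  # True: 4 of the 6 entries are zero
```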

The Classifier Class

class cyanure.estimators.Classifier(loss='square', penalty='l2', fit_intercept=True, tol=0.001, solver='auto', random_state=0, max_iter=500, fista_restart=50, verbose=True, warm_start=False, multi_class='auto', limited_memory_qning=20, lambda_1=0, lambda_2=0, lambda_3=0, duality_gap_interval=5, n_threads=-1, dual=None, safe=True)[source]

Bases: ClassifierAbstraction

The classification class.

The goal is to minimize the following objective:

\[\min_{W,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, W^\top x_i + b\right) + \psi(W)\]

where \(L\) is a classification loss, \(\psi\) is a regularization function (or constraint), \(W=[w_1,\ldots,w_k]\) is a (p x k) matrix that carries the k predictors, where k is the number of classes, and \(y_i\) is a label in \(\{1,\ldots,k\}\). b is a k-dimensional vector representing an unregularized intercept (which is optional).

loss (string): default=’square’

Loss function to be used. Possible choices are

  • ‘square’

    \(L(y,z) = \frac{1}{2} ( y-z)^2\)

  • ‘logistic’

    \(L(y,z) = \log(1 + e^{-y z} )\)

  • ‘sqhinge’ or ‘squared_hinge’

    \(L(y,z) = \frac{1}{2} \max( 0, 1- y z)^2\)

  • ‘safe-logistic’

    \(L(y,z) = e^{ yz - 1 } - y z ~\text{if}~ yz \leq 1~~\text{and}~~0\) otherwise

  • ‘multiclass-logistic’

    which is also called multinomial or softmax logistic: \(L(y, W^\top x + b) = \log\left(\sum_{j=1}^k e^{(w_j^\top x + b_j) - (w_y^\top x + b_y)} \right)\)
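The binary losses in the list above can be written directly from their formulas. The sketch below is for illustration only: the function names are hypothetical, y is assumed to be a label in {-1, +1}, and z is the real-valued score \(w^\top x + b\).

```python
import math


def square(y, z):
    return 0.5 * (y - z) ** 2


def logistic(y, z):
    return math.log(1.0 + math.exp(-y * z))


def sqhinge(y, z):
    return 0.5 * max(0.0, 1.0 - y * z) ** 2


def safe_logistic(y, z):
    yz = y * z
    # Piecewise definition: exp(yz - 1) - yz when yz <= 1, and 0 otherwise.
    return math.exp(yz - 1.0) - yz if yz <= 1.0 else 0.0


# Evaluate each loss at a zero score for a positive label.
print(square(1.0, 0.0))         # 0.5
print(logistic(1.0, 0.0))       # log(2), about 0.693
print(sqhinge(1.0, 0.0))        # 0.5
print(safe_logistic(1.0, 0.0))  # exp(-1), about 0.368
```

Both ‘sqhinge’ and ‘safe-logistic’ are exactly zero once the margin yz reaches 1, whereas ‘logistic’ only decays towards zero.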

penalty (string): default=’l2’

Regularization function psi. Possible choices are

For univariate problems:

  • ‘none’

    \(\psi(w) = 0\)

  • ‘l2’

    \(\psi(w) = \frac{\lambda_1}{2} ||w||_2^2\)

  • ‘l1’

    \(\psi(w) = \lambda_1 ||w||_1\)

  • ‘elasticnet’

    \(\psi(w) = \lambda_1 ||w||_1 + \frac{\lambda_2}{2}||w||_2^2\)

  • ‘fused-lasso’

    \(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1||w||_1 + \frac{\lambda_2}{2}||w||_2^2\)

  • ‘l1-ball’

    encodes the constraint \(||w||_1 \leq \lambda\)

  • ‘l2-ball’

    encodes the constraint \(||w||_2 \leq \lambda\)

For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor.

\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]

In addition, multi-task group Lasso penalties are provided for multivariate problems (\(W\) is then a matrix):

  • ‘l1l2’, which is the multi-task group Lasso regularization
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
  • ‘l1linf’
    \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
  • ‘l1l2+l1’, which is the multi-task group Lasso regularization + l1
    \[\psi(W) = \sum_{j=1}^p \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
fit_intercept (boolean): default=True

Learns an unregularized intercept b (or several intercepts for multivariate problems)

lambda_1 (float): default=0

First regularization parameter

lambda_2 (float): default=0

Second regularization parameter, if needed

lambda_3 (float): default=0

Third regularization parameter, if needed

solver (string): default=’auto’

Optimization solver. Possible choices are

  • ‘ista’

  • ‘fista’

  • ‘catalyst-ista’

  • ‘qning-ista’ (proximal quasi-Newton method)

  • ‘svrg’

  • ‘catalyst-svrg’ (accelerated SVRG with Catalyst)

  • ‘qning-svrg’ (quasi-Newton SVRG)

  • ‘acc-svrg’ (SVRG with direct acceleration)

  • ‘miso’

  • ‘catalyst-miso’ (accelerated MISO with Catalyst)

  • ‘qning-miso’ (quasi-Newton MISO)

  • ‘auto’

See the LaTeX documentation for more details. If you are unsure, use ‘auto’.

tol (float): default=1e-3

Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is “f” and its optimal value is “f^*”, the algorithm stops with the guarantee

\(f(x_t) - f^* \leq \text{tol} \cdot f(x_t)\)

max_iter (int): default=500

Maximum number of iterations of the algorithm, counted in passes over the data

duality_gap_interval (int): default=5

Frequency of duality-gap computation

verbose (boolean): default=True

Display information or not

n_threads (int): default=-1

Maximum number of cores the method may use (-1 = all cores). Note that using more cores is not always faster.

random_state (int): default=0

Random seed

warm_start (boolean): default=False

Use a restart strategy

binary_problem (boolean): default=True

univariate or multivariate problems

limited_memory_qning (int): default=20

Memory parameter for the qning method

fista_restart (int): default=50

Restart strategy for fista (useful for computing regularization path)


fit(X, y[, le_parameter])

Fit the parameters.


Predict the labels given an input matrix X (same format as fit).

score(X, y)

Give an accuracy score on test data.


Predict confidence scores for samples.


Estimate the probability for each class.


Convert coefficient matrix to dense array format.


Get parameters for the estimator.


Get the model parameters (either w or the tuple (w,b)).


Change the value of parameters.


Convert coefficient matrix to sparse format.

fit(X, y, le_parameter=None)[source]

Fit the parameters.

X (numpy array, or scipy sparse CSR matrix):

input n x p numpy matrix; the samples are on the rows

y (numpy.array):

Input labels.

  • vector of size n with labels in {-1, +1} for binary classification (labels given in {0,1} are converted automatically), or with labels in {0,1,…, n} for multiclass classification.
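The automatic conversion of {0,1} labels to {-1, +1} mentioned above can be pictured as follows. This is a minimal sketch under the assumption that the mapping is 0 to -1 and 1 to +1; the helper name `to_signed_labels` is hypothetical and the library's own conversion may be implemented differently.

```python
def to_signed_labels(y):
    """Map labels in {0, 1} to {-1, +1}; leave any other label set unchanged."""
    if set(y) <= {0, 1}:
        return [2 * v - 1 for v in y]
    return list(y)


print(to_signed_labels([0, 1, 1, 0]))    # [-1, 1, 1, -1]
print(to_signed_labels([-1, 1, -1]))     # [-1, 1, -1]
```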


Predict the labels given an input matrix X (same format as fit).

X (numpy array or scipy sparse CSR matrix):

Input matrix for the prediction

pred (numpy.array):

Prediction for the X matrix

score(X, y)[source]

Give an accuracy score on test data.

X (numpy array or scipy sparse CSR matrix):

Test samples.

y (numpy.array):

True labels for X.

sample_weight (numpy.array, optional):

Sample weights. Defaults to None.


Mean accuracy of self.predict(X) with respect to y.
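Mean accuracy is the fraction of samples whose predicted label equals the true label. A minimal unweighted sketch, with a hypothetical function name `accuracy`:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


print(accuracy([1, -1, 1, 1], [1, 1, 1, -1]))  # 0.5
```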


Predict confidence scores for samples.

X (numpy array or scipy sparse CSR matrix):

The data for which we want scores

scores (numpy.array):

Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1], where a score >0 means this class would be predicted.
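In the binary case, the sign convention described above turns scores into labels as in the following sketch. The helper `labels_from_scores` is hypothetical, and `classes` plays the role of self.classes_ with two entries.

```python
def labels_from_scores(scores, classes):
    """Binary case: a score > 0 selects classes[1], otherwise classes[0]."""
    return [classes[1] if s > 0 else classes[0] for s in scores]


print(labels_from_scores([-1.2, 0.3, 2.0], classes=[-1, 1]))  # [-1, 1, 1]
```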


Estimate the probability for each class.

X (numpy array or scipy sparse CSR matrix):

Data matrix for which we want probabilities

proba (numpy.array):

Return the probability of the samples for each class.


Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

self (ERM):

Fitted estimator converted to dense estimator


Get parameters for the estimator.

deep (bool, optional):

If True, also returns subobjects that are estimators. Defaults to True.

params (dict):

Parameters names and values


Get the model parameters (either w or the tuple (w,b)).

w or (w,b) (numpy.array or tuple of numpy.array):

Model parameters


Change the value of parameters.

params (dict):

Estimator parameters to set

self (ERM):

Estimator instance


The parameter does not exist


Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.

self (ERM):

Fitted estimator converted to sparse estimator.


For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.

Pre-configured classes

class cyanure.estimators.LinearSVC(loss='sqhinge', penalty='l2', fit_intercept=True, verbose=False, lambda_1=0.1, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, dual=None, safe=True)[source]

Bases: Classifier

A pre-configured class for the squared hinge loss.



Predict confidence scores for samples.


Convert coefficient matrix to dense array format.

fit(X, y[, le_parameter])

Fit the parameters.


Get parameters for the estimator.


Get the model parameters (either w or the tuple (w,b)).


Predict the labels given an input matrix X (same format as fit).


Estimate the probability for each class.

score(X, y)

Give an accuracy score on test data.


Change the value of parameters.


Convert coefficient matrix to sparse format.

class cyanure.estimators.LogisticRegression(penalty='l2', loss='logistic', fit_intercept=True, verbose=False, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, multi_class='auto', dual=None, safe=True)[source]

Bases: Classifier

A pre-configured class for the logistic regression loss.



Predict confidence scores for samples.


Convert coefficient matrix to dense array format.

fit(X, y[, le_parameter])

Fit the parameters.


Get parameters for the estimator.


Get the model parameters (either w or the tuple (w,b)).


Predict the labels given an input matrix X (same format as fit).


Estimate the probability for each class.

score(X, y)

Give an accuracy score on test data.


Change the value of parameters.


Convert coefficient matrix to sparse format.

class cyanure.estimators.Lasso(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, dual=None, safe=True)[source]

Bases: Regression

A pre-configured class for Lasso regression.

Uses an active-set strategy when the number of features is greater than 1000.


fit(X, y)

Fit the parameters.


Convert coefficient matrix to dense array format.


Get parameters for the estimator.


Get the model parameters (either w or the tuple (w,b)).


Predict the labels given an input matrix X (same format as fit).

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.


Change the value of parameters.


Convert coefficient matrix to sparse format.

fit(X, y)[source]

Fit the parameters.

X (numpy array or scipy sparse CSR matrix):

input n X p numpy matrix; the samples are on the rows

y (numpy array):
  • vector of size n with real values for regression

  • matrix of size n X k for multivariate regression

self (ERM):

Returns the instance of the class

class cyanure.estimators.L1Logistic(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, multi_class='auto', dual=None, safe=True)[source]

Bases: Classifier

A pre-configured class for L1 logistic classification.

Uses an active-set strategy when the number of features is greater than 1000.



Predict confidence scores for samples.


Convert coefficient matrix to dense array format.

fit(X, y)

Fit the parameters.


Get parameters for the estimator.


Get the model parameters (either w or the tuple (w,b)).


Predict the labels given an input matrix X (same format as fit).


Estimate the probability for each class.

score(X, y)

Give an accuracy score on test data.


Change the value of parameters.


Convert coefficient matrix to sparse format.

fit(X, y)[source]

Fit the parameters.

X (numpy array, or scipy sparse CSR matrix):

input n x p numpy matrix; the samples are on the rows

y (numpy.array):

Input labels.

  • vector of size n with labels in {-1, +1} for binary classification (labels given in {0,1} are converted automatically), or with labels in {0,1,…, n} for multiclass classification.