Contains the different estimators of the library.

Estimators

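For scikit-learn estimators whose objective is \(C \sum_{i=1}^n L(y_i, w^\top x_i + b) + \frac{1}{2}\|w\|_2^2\) (e.g., LogisticRegression), the link between the regularization parameter C of scikit-learn and \(\lambda\) is \(C=\frac{1}{n \lambda}\). Dividing that objective by \(nC\) yields \(\frac{1}{n} \sum_{i=1}^n L(y_i, w^\top x_i + b) + \frac{\lambda}{2}\|w\|_2^2\) with \(\lambda = \frac{1}{nC}\).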

The Regression Class

class cyanure.estimators.Regression(loss='square', penalty='l2', fit_intercept=True, random_state=0, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, dual=None, safe=True)

Bases: ERM

The regression class, which derives from ERM.

The goal is to minimize the following objective:

\[\min_{w,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, w^\top x_i + b\right) + \psi(w),\]

where \(L\) is a regression loss, \(\psi\) is a regularization function (or constraint), \(w\) is a p-dimensional vector representing the model parameters, and \(b\) is an optional unregularized intercept. The targets \(y_i\) are real values.
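As a minimal sketch of the objective above, the NumPy function below evaluates it for the ‘square’ loss and the ‘l2’ penalty on a dense X. The helper name regression_objective is hypothetical and not part of cyanure; it only illustrates what the estimator minimizes.

```python
import numpy as np

# Hypothetical helper (not part of cyanure): evaluates the Regression objective.
def regression_objective(X, y, w, b, lambda_1):
    """(1/n) * sum_i 0.5 * (y_i - (w^T x_i + b))^2 + (lambda_1 / 2) * ||w||_2^2.

    Square loss with the 'l2' penalty; the intercept b is not regularized.
    """
    residuals = y - (X @ w + b)              # y_i - (w^T x_i + b) for every sample i
    data_fit = 0.5 * np.mean(residuals ** 2)
    penalty = 0.5 * lambda_1 * np.dot(w, w)
    return data_fit + penalty

# Usage: w fits y exactly, so only the penalty remains (0.5 * 0.1 * 5).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0])
value = regression_objective(X, y, w, b=0.0, lambda_1=0.1)
```

Shifting the targets by a constant and absorbing it into b leaves the value unchanged, since the intercept is not penalized.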

Parameters

loss (string): default=’square’

Loss function to be used. Only the square loss is implemented at this point. Given two k-dimensional vectors y, z:

‘square’ => \(L(y,z) = \frac{1}{2}\|y-z\|_2^2\)

penalty (string): default=’l2’

Regularization function \(\psi\). Possible choices are

For univariate problems:

‘none’
\(\psi(w) = 0\)
‘l2’
\(\psi(w) = \frac{\lambda_1}{2} \|w\|_2^2\)
‘l1’
\(\psi(w) = \lambda_1 \|w\|_1\)
‘elasticnet’
\(\psi(w) = \lambda_1 \|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
‘fused-lasso’
\(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1\|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
‘l1-ball’
encodes the constraint \(\|w\|_1 \leq \lambda\)
‘l2-ball’
encodes the constraint \(\|w\|_2 \leq \lambda\)
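The penalty values listed above can be written directly in NumPy. This is an illustrative sketch, not cyanure code: penalty and in_ball are hypothetical helpers, and the constraint radius is passed explicitly as radius because the text does not say which lambda_* parameter sets it.

```python
import numpy as np

# Hypothetical helpers (not part of cyanure): they only evaluate the formulas above.
def penalty(w, kind, lambda_1=0.0, lambda_2=0.0, lambda_3=0.0):
    """Value of psi(w) for a univariate weight vector w."""
    w = np.asarray(w, dtype=float)
    l1 = np.sum(np.abs(w))        # ||w||_1
    sq_l2 = np.dot(w, w)          # ||w||_2^2
    if kind == "none":
        return 0.0
    if kind == "l2":
        return 0.5 * lambda_1 * sq_l2
    if kind == "l1":
        return lambda_1 * l1
    if kind == "elasticnet":
        return lambda_1 * l1 + 0.5 * lambda_2 * sq_l2
    if kind == "fused-lasso":
        total_variation = np.sum(np.abs(np.diff(w)))  # sum_{i=2}^p |w[i] - w[i-1]|
        return lambda_3 * total_variation + lambda_1 * l1 + 0.5 * lambda_2 * sq_l2
    raise ValueError(f"unknown penalty: {kind}")

def in_ball(w, kind, radius):
    """Feasibility check for the 'l1-ball' and 'l2-ball' constraints."""
    w = np.asarray(w, dtype=float)
    if kind == "l1-ball":
        norm = np.sum(np.abs(w))
    elif kind == "l2-ball":
        norm = np.linalg.norm(w)
    else:
        raise ValueError(f"unknown constraint: {kind}")
    return bool(norm <= radius)
```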

For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor.

\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]

In addition, multi-task group Lasso penalties are provided for multivariate problems (W is then a matrix):

‘l1l2’, which is the multi-task group Lasso regularization
\[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
‘l1linf’
\[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
‘l1l2+l1’, which is the multi-task group Lasso regularization plus an \(\ell_1\) penalty
\[\psi(W) = \sum_{j=1}^p \left( \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 \right) ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
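A sketch of the multi-task penalties on a (p x k) matrix W, assuming NumPy. group_penalty is a hypothetical helper, and lam stands for the \(\lambda\) of the formulas (which estimator parameter carries it is not stated here). Because each row W^j is penalized as a block, an all-zero row (a feature unused by every task) contributes nothing, which is what drives row sparsity.

```python
import numpy as np

# Hypothetical helper (not part of cyanure): evaluates the multi-task penalties.
def group_penalty(W, kind, lam, lambda_2=0.0):
    """Penalty on a (p x k) matrix W whose j-th row W^j groups feature j across tasks."""
    W = np.asarray(W, dtype=float)
    row_l2 = np.linalg.norm(W, axis=1)            # ||W^j||_2 for each row j
    if kind == "l1l2":
        return lam * np.sum(row_l2)
    if kind == "l1linf":
        row_linf = np.max(np.abs(W), axis=1)      # ||W^j||_inf for each row j
        return lam * np.sum(row_linf)
    if kind == "l1l2+l1":
        return lam * np.sum(row_l2) + lambda_2 * np.sum(np.abs(W))
    raise ValueError(f"unknown penalty: {kind}")

# Usage: the second row (an unused feature) adds nothing to any of the penalties.
W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, -1.0]])
value_l1l2 = group_penalty(W, "l1l2", lam=1.0)
```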

fit_intercept (boolean): default=True

If True, learns an unregularized intercept b (or several intercepts for multivariate problems).

lambda_1 (float): default=0

First regularization parameter

lambda_2 (float): default=0

Second regularization parameter, if needed

lambda_3 (float): default=0

Third regularization parameter, if needed

solver (string): default=’auto’

Optimization solver. Possible choices are

‘ista’
‘fista’
‘catalyst-ista’
‘qning-ista’ (proximal quasi-Newton method)
‘svrg’
‘catalyst-svrg’ (accelerated SVRG with Catalyst)
‘qning-svrg’ (quasi-Newton SVRG)
‘acc-svrg’ (SVRG with direct acceleration)
‘miso’
‘catalyst-miso’ (accelerated MISO with Catalyst)
‘qning-miso’ (quasi-Newton MISO)
‘auto’

See the LaTeX documentation for more details. If you are unsure, use ‘auto’.
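To give a flavour of the simplest option, ‘ista’, here is a generic textbook ISTA (proximal gradient) loop for the Regression objective with the ‘square’ loss and the ‘l1’ penalty, without intercept. It is a sketch for intuition only, not cyanure's implementation; ista_lasso and soft_threshold are hypothetical names, and the fixed step 1/L uses the Lipschitz constant \(\|X\|_2^2 / n\) of the smooth part.

```python
import numpy as np

# Hypothetical helpers (not part of cyanure): plain ISTA for the Lasso objective.
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lambda_1, n_iter=200):
    """Minimize (1/(2n)) * ||y - X w||_2^2 + lambda_1 * ||w||_1 by proximal gradient."""
    n, p = X.shape
    L = np.linalg.norm(X, ord=2) ** 2 / n             # Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n                  # gradient of the square-loss term
        w = soft_threshold(w - grad / L, lambda_1 / L)  # gradient step, then shrinkage
    return w

# Usage: with X = 2I (so X^T X / n = I) the solution is soft_threshold(X^T y / n, lambda_1).
X = 2.0 * np.eye(4)
y = np.array([4.0, -2.0, 0.2, 8.0])
w_hat = ista_lasso(X, y, lambda_1=0.5)
```

In this orthogonal example the loop reaches the solution after one step, and the coefficient whose correlation with y falls below the threshold is set exactly to zero.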

tol (float): default=1e-3

Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is \(f\) and its optimal value is \(f^*\), the algorithm stops with the guarantee

\(f(x_t) - f^* \leq \text{tol} \cdot f(x_t)\)
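Since \(f^*\) is unknown, a duality gap makes the criterion checkable: any dual objective value \(d\) is a lower bound on \(f^*\), so \(f(x_t) - d \leq \text{tol} \cdot f(x_t)\) implies the guarantee above. A minimal sketch of that test follows; should_stop is a hypothetical name, and how cyanure computes the dual value is not described here.

```python
# Hypothetical helper (not part of cyanure): relative duality-gap stopping test.
def should_stop(primal_value, dual_value, tol):
    """Return True when the duality gap certifies f(x_t) - f^* <= tol * f(x_t).

    Weak duality gives dual_value <= f^*, hence
    f(x_t) - f^* <= f(x_t) - dual_value, and the right-hand side is computable.
    """
    gap = primal_value - dual_value
    return gap <= tol * primal_value
```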

max_iter (int): default=500

Maximum number of iterations of the algorithm, counted in passes over the data.

duality_gap_interval (int): default=10

Frequency of duality-gap computation

verbose (boolean): default=True

Display information or not

n_threads (int): default=-1

Maximum number of cores the method may use (-1 = all cores). Note that more cores are not always better.

random_state (int): default=0

Random seed

warm_start (boolean): default=False

Use a restart strategy

binary_problem (boolean): default=True

univariate or multivariate problems

limited_memory_qning (int): default=20

Memory parameter for the qning method

fista_restart (int): default=50

Restart strategy for FISTA (useful for computing a regularization path).

Methods:

`fit`(X, y[, le_parameter])	Fit the parameters.
`predict`(X)	Predict the labels given an input matrix X (same format as fit).
`score`(X, y[, sample_weight])	Return the coefficient of determination of the prediction.
`densify`()	Convert coefficient matrix to dense array format.
`get_params`([deep])	Get parameters for the estimator.
`get_weights`()	Get the model parameters (either w or the tuple (w,b)).
`set_params`(**params)	Set the values of parameters.
`sparsify`()	Convert coefficient matrix to sparse format.

fit(X, y, le_parameter=None)

Fit the parameters.

Parameters

X (numpy array or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy array):

vector of size n with real values for regression
matrix of size n x k for multivariate regression

Returns

self (ERM):: Returns the instance of the class

predict(X)

Predict the labels given an input matrix X (same format as fit).

Parameters

X (numpy array or scipy sparse CSR matrix):: Input matrix for the prediction

Returns

pred (numpy.array):: Prediction for the X matrix

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
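The definition above translates directly into NumPy. This sketch is illustrative: coefficient_of_determination is a hypothetical helper, and the way sample_weight enters (weighted sums and a weighted mean) is an assumption modelled on scikit-learn's r2_score, not something this page specifies. It assumes y_true is not constant, so that v > 0.

```python
import numpy as np

# Hypothetical helper (not part of cyanure): R^2 = 1 - u / v, optionally weighted.
def coefficient_of_determination(y_true, y_pred, sample_weight=None):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        weights = np.ones_like(y_true)
    else:
        weights = np.asarray(sample_weight, dtype=float)
    u = np.sum(weights * (y_true - y_pred) ** 2)        # residual sum of squares
    y_mean = np.average(y_true, weights=weights)
    v = np.sum(weights * (y_true - y_mean) ** 2)        # total sum of squares
    return 1.0 - u / v
```

Predicting the mean of [1, 2, 3] everywhere gives 0.0, and predicting it reversed ([3, 2, 1]) gives a negative score.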

Parameters

X (numpy array or scipy sparse CSR matrix):: Test samples.
y (numpy.array):: True labels for X.
sample_weight (numpy.array, optional):: Sample weights. Defaults to None.

Returns

score (float):: \(R^2\) of self.predict(X) wrt. y.

densify()

Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

Returns

self (ERM):: Fitted estimator converted to dense estimator

get_params(deep=True)

Get parameters for the estimator.

Parameters

deep (bool, optional):: If True, also return the parameters of subobjects that are estimators. Defaults to True.

Returns

params (dict):: Parameters names and values

get_weights()

Get the model parameters (either w or the tuple (w,b)).

Returns

w or (w,b) (numpy.array or tuple of numpy.array):: Model parameters

set_params(**params)

Set the values of parameters.

Parameters

params (dict):: Estimator parameters to set

Returns

self (ERM):: Estimator instance

Raises

ValueError:: The parameter does not exist

sparsify()

Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.

Returns

self (ERM):: Fitted estimator converted to a sparse estimator.

Notes

For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.
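The rule of thumb above can be checked before calling sparsify(). A minimal NumPy sketch; sparsify_is_worthwhile is a hypothetical helper, and the 0.5 threshold is just the rule of thumb quoted in the note.

```python
import numpy as np

# Hypothetical helper (not part of cyanure): applies the "more than 50% zeros" rule.
def sparsify_is_worthwhile(coef, threshold=0.5):
    """True when the fraction of exactly-zero coefficients exceeds `threshold`."""
    coef = np.asarray(coef)
    zero_fraction = (coef == 0).sum() / coef.size
    return bool(zero_fraction > threshold)
```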

The Classifier Class

class cyanure.estimators.Classifier(loss='square', penalty='l2', fit_intercept=True, tol=0.001, solver='auto', random_state=0, max_iter=500, fista_restart=50, verbose=True, warm_start=False, multi_class='auto', limited_memory_qning=20, lambda_1=0, lambda_2=0, lambda_3=0, duality_gap_interval=5, n_threads=-1, dual=None, safe=True)

Bases: ClassifierAbstraction

The classification class.

The goal is to minimize the following objective:

\[\min_{W,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, W^\top x_i + b\right) + \psi(W)\]

where \(L\) is a classification loss, \(\psi\) is a regularization function (or constraint), \(W=[w_1,\ldots,w_k]\) is a (p x k) matrix that carries the k predictors, where k is the number of classes, and \(y_i\) is a label in \(\{1,\ldots,k\}\). b is a k-dimensional vector representing an unregularized intercept (which is optional).

Parameters

loss (string): default=’square’

Loss function to be used. Possible choices are

‘square’
\(L(y,z) = \frac{1}{2} ( y-z)^2\)

‘logistic’
\(L(y,z) = \log(1 + e^{-y z} )\)

‘sqhinge’ or ‘squared_hinge’
\(L(y,z) = \frac{1}{2} \max( 0, 1- y z)^2\)

‘safe-logistic’
\(L(y,z) = e^{yz - 1} - yz\) if \(yz \leq 1\), and \(L(y,z) = 0\) otherwise

‘multiclass-logistic’
which is also called multinomial or softmax logistic: \(L(y, W^\top x + b) = \log\left(\sum_{j=1}^k e^{w_j^\top x + b_j}\right) - \left(w_y^\top x + b_y\right)\)
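For reference, the losses above can be evaluated with NumPy as follows. This is a sketch, not cyanure's code: the binary losses assume y in {-1, +1} and a scalar score z, and the multiclass loss is written in the standard softmax cross-entropy form, with a 0-based class index label and one score per class (all function names are hypothetical).

```python
import numpy as np

# Hypothetical helpers (not part of cyanure): pointwise evaluation of the losses.
def logistic(y, z):
    # log(1 + exp(-y z)), evaluated stably as logaddexp(0, -y z)
    return np.logaddexp(0.0, -y * z)

def squared_hinge(y, z):
    # 0.5 * max(0, 1 - y z)^2
    return 0.5 * np.maximum(0.0, 1.0 - y * z) ** 2

def safe_logistic(y, z):
    # exp(yz - 1) - yz when yz <= 1, and 0 otherwise; the exponent is capped
    # so np.exp cannot overflow on the branch that np.where discards
    yz = y * z
    return np.where(yz <= 1.0, np.exp(np.minimum(yz, 1.0) - 1.0) - yz, 0.0)

def multiclass_logistic(label, scores):
    # log(sum_j exp(s_j)) - s_label with s = W^T x + b (one score per class);
    # subtracting max(s) first is the usual log-sum-exp stabilization
    scores = np.asarray(scores, dtype=float)
    shift = np.max(scores)
    return shift + np.log(np.sum(np.exp(scores - shift))) - scores[label]
```

For k = 2, the multiclass loss with scores [0, z] and label 1 reduces to the binary logistic loss log(1 + e^{-z}).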

penalty (string): default=’l2’

Regularization function \(\psi\). Possible choices are

For univariate problems:

‘none’
\(\psi(w) = 0\)
‘l2’
\(\psi(w) = \frac{\lambda_1}{2} \|w\|_2^2\)
‘l1’
\(\psi(w) = \lambda_1 \|w\|_1\)
‘elasticnet’
\(\psi(w) = \lambda_1 \|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
‘fused-lasso’
\(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1\|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
‘l1-ball’
encodes the constraint \(\|w\|_1 \leq \lambda\)
‘l2-ball’
encodes the constraint \(\|w\|_2 \leq \lambda\)

For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor.

\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]

In addition, multi-task group Lasso penalties are provided for multivariate problems (W is then a matrix):

‘l1l2’, which is the multi-task group Lasso regularization
\[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
‘l1linf’
\[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
‘l1l2+l1’, which is the multi-task group Lasso regularization plus an \(\ell_1\) penalty
\[\psi(W) = \sum_{j=1}^p \left( \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 \right) ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]

fit_intercept (boolean): default=True

If True, learns an unregularized intercept b (or several intercepts for multivariate problems).

lambda_1 (float): default=0

First regularization parameter

lambda_2 (float): default=0

Second regularization parameter, if needed

lambda_3 (float): default=0

Third regularization parameter, if needed

solver (string): default=’auto’

Optimization solver. Possible choices are

‘ista’
‘fista’
‘catalyst-ista’
‘qning-ista’ (proximal quasi-Newton method)
‘svrg’
‘catalyst-svrg’ (accelerated SVRG with Catalyst)
‘qning-svrg’ (quasi-Newton SVRG)
‘acc-svrg’ (SVRG with direct acceleration)
‘miso’
‘catalyst-miso’ (accelerated MISO with Catalyst)
‘qning-miso’ (quasi-Newton MISO)
‘auto’

See the LaTeX documentation for more details. If you are unsure, use ‘auto’.

tol (float): default=1e-3

Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is \(f\) and its optimal value is \(f^*\), the algorithm stops with the guarantee

\(f(x_t) - f^* \leq \text{tol} \cdot f(x_t)\)

max_iter (int): default=500

Maximum number of iterations of the algorithm, counted in passes over the data.

duality_gap_interval (int): default=5

Frequency of duality-gap computation

verbose (boolean): default=True

Display information or not

n_threads (int): default=-1

Maximum number of cores the method may use (-1 = all cores). Note that more cores are not always better.

random_state (int): default=0

Random seed

warm_start (boolean): default=False

Use a restart strategy

binary_problem (boolean): default=True

univariate or multivariate problems

limited_memory_qning (int): default=20

Memory parameter for the qning method

fista_restart (int): default=50

Restart strategy for FISTA (useful for computing a regularization path).

Methods:

`fit`(X, y[, le_parameter])	Fit the parameters.
`predict`(X)	Predict the labels given an input matrix X (same format as fit).
`score`(X, y)	Give an accuracy score on test data.
`decision_function`(X)	Predict confidence scores for samples.
`predict_proba`(X)	Estimate the probability for each class.
`densify`()	Convert coefficient matrix to dense array format.
`get_params`([deep])	Get parameters for the estimator.
`get_weights`()	Get the model parameters (either w or the tuple (w,b)).
`set_params`(**params)	Set the values of parameters.
`sparsify`()	Convert coefficient matrix to sparse format.

fit(X, y, le_parameter=None)

Fit the parameters.

Parameters

X (numpy array or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy.array):

Input labels.

Vector of size n with labels in {-1, +1} for binary classification (labels in {0, 1} are converted automatically), or in {0, 1, …, k-1} for multiclass classification with k classes.
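The automatic conversion of binary labels amounts to y -> 2y - 1. A minimal sketch of such a preprocessing step; to_plus_minus_one is a hypothetical helper, and the page does not say how cyanure itself detects the label set.

```python
import numpy as np

# Hypothetical helper (not part of cyanure): maps {0, 1} labels to {-1, +1}.
def to_plus_minus_one(y):
    """Convert binary labels in {0, 1} to {-1, +1}; {-1, +1} labels pass through."""
    y = np.asarray(y)
    labels = set(np.unique(y).tolist())
    if labels <= {0, 1}:
        return 2 * y - 1
    if labels <= {-1, 1}:
        return y
    raise ValueError("expected binary labels in {0, 1} or {-1, +1}")
```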

predict(X)

Predict the labels given an input matrix X (same format as fit).

Parameters

X (numpy array or scipy sparse CSR matrix):: Input matrix for the prediction

Returns

pred (numpy.array):: Prediction for the X matrix

score(X, y)

Give an accuracy score on test data.

Parameters

X (numpy array or scipy sparse CSR matrix):: Test samples.
y (numpy.array):: True labels for X.
sample_weight (numpy.array, optional):: Sample weights. Defaults to None.

Returns

score (float):: Mean accuracy of self.predict(X) wrt. y.

decision_function(X)

Predict confidence scores for samples.

Parameters

X (numpy array or scipy sparse CSR matrix):: The data for which we want scores

Returns

scores (numpy.array):: Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1], where a score > 0 means this class would be predicted.
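The binary rule above (a positive score selects self.classes_[1]) can be sketched as follows. labels_from_scores is a hypothetical helper, and the multiclass branch assumes the usual convention of picking the highest-scoring class, which this page does not state explicitly.

```python
import numpy as np

# Hypothetical helper (not part of cyanure): turns decision scores into labels.
def labels_from_scores(scores, classes):
    """Binary (1-D scores): score > 0 selects classes[1], otherwise classes[0].
    Multiclass (2-D scores): assumed argmax over the class axis."""
    scores = np.asarray(scores)
    classes = np.asarray(classes)
    if scores.ndim == 1:
        return np.where(scores > 0, classes[1], classes[0])
    return classes[np.argmax(scores, axis=1)]
```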

predict_proba(X)

Estimate the probability for each class.

Parameters

X (numpy array or scipy sparse CSR matrix):: Data matrix for which we want probabilities

Returns

proba (numpy.array):: Probability of each sample for each class.
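Under a multinomial (softmax) logistic model, the probability of class j for a sample is \(e^{s_j} / \sum_l e^{s_l}\), where \(s = W^\top x + b\) are its scores. The sketch below applies this row-wise to an (n_samples, n_classes) score matrix; softmax_proba is a hypothetical helper, and whether predict_proba uses exactly this mapping for every loss is an assumption not confirmed here.

```python
import numpy as np

# Hypothetical helper (not part of cyanure): row-wise softmax of decision scores.
def softmax_proba(scores):
    """Map (n_samples, n_classes) scores to probabilities; each row sums to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=1, keepdims=True)  # avoids overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)
```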

densify()

Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

Returns

self (ERM):: Fitted estimator converted to dense estimator

get_params(deep=True)

Get parameters for the estimator.

Parameters

deep (bool, optional):: If True, also return the parameters of subobjects that are estimators. Defaults to True.

Returns

params (dict):: Parameters names and values

get_weights()

Get the model parameters (either w or the tuple (w,b)).

Returns

w or (w,b) (numpy.array or tuple of numpy.array):: Model parameters

set_params(**params)

Set the values of parameters.

Parameters

params (dict):: Estimator parameters to set

Returns

self (ERM):: Estimator instance

Raises

ValueError:: The parameter does not exist

sparsify()

Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.

Returns

self (ERM):: Fitted estimator converted to a sparse estimator.

Notes

For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.

Pre-configured classes

class cyanure.estimators.LinearSVC(loss='sqhinge', penalty='l2', fit_intercept=True, verbose=False, lambda_1=0.1, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, dual=None, safe=True)

Bases: Classifier

A pre-configured class for the squared hinge loss.

Methods:

`decision_function`(X)	Predict confidence scores for samples.
`densify`()	Convert coefficient matrix to dense array format.
`fit`(X, y[, le_parameter])	Fit the parameters.
`get_params`([deep])	Get parameters for the estimator.
`get_weights`()	Get the model parameters (either w or the tuple (w,b)).
`predict`(X)	Predict the labels given an input matrix X (same format as fit).
`predict_proba`(X)	Estimate the probability for each class.
`score`(X, y)	Give an accuracy score on test data.
`set_params`(**params)	Set the values of parameters.
`sparsify`()	Convert coefficient matrix to sparse format.

class cyanure.estimators.LogisticRegression(penalty='l2', loss='logistic', fit_intercept=True, verbose=False, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, multi_class='auto', dual=None, safe=True)

Bases: Classifier

A pre-configured class for logistic regression loss.

Methods:

`decision_function`(X)	Predict confidence scores for samples.
`densify`()	Convert coefficient matrix to dense array format.
`fit`(X, y[, le_parameter])	Fit the parameters.
`get_params`([deep])	Get parameters for the estimator.
`get_weights`()	Get the model parameters (either w or the tuple (w,b)).
`predict`(X)	Predict the labels given an input matrix X (same format as fit).
`predict_proba`(X)	Estimate the probability for each class.
`score`(X, y)	Give an accuracy score on test data.
`set_params`(**params)	Set the values of parameters.
`sparsify`()	Convert coefficient matrix to sparse format.

class cyanure.estimators.Lasso(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, dual=None, safe=True)

Bases: Regression

A pre-configured class for Lasso regression.

Uses an active-set strategy when the number of features exceeds 1000.

Methods:

`fit`(X, y)	Fit the parameters.
`densify`()	Convert coefficient matrix to dense array format.
`get_params`([deep])	Get parameters for the estimator.
`get_weights`()	Get the model parameters (either w or the tuple (w,b)).
`predict`(X)	Predict the labels given an input matrix X (same format as fit).
`score`(X, y[, sample_weight])	Return the coefficient of determination of the prediction.
`set_params`(**params)	Set the values of parameters.
`sparsify`()	Convert coefficient matrix to sparse format.

fit(X, y)

Fit the parameters.

Parameters

X (numpy array or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy array):

vector of size n with real values for regression
matrix of size n x k for multivariate regression

Returns

self (ERM):: Returns the instance of the class

class cyanure.estimators.L1Logistic(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, multi_class='auto', dual=None, safe=True)

Bases: Classifier

A pre-configured class for L1 logistic classification.

Uses an active-set strategy when the number of features exceeds 1000.

Methods:

`decision_function`(X)	Predict confidence scores for samples.
`densify`()	Convert coefficient matrix to dense array format.
`fit`(X, y)	Fit the parameters.
`get_params`([deep])	Get parameters for the estimator.
`get_weights`()	Get the model parameters (either w or the tuple (w,b)).
`predict`(X)	Predict the labels given an input matrix X (same format as fit).
`predict_proba`(X)	Estimate the probability for each class.
`score`(X, y)	Give an accuracy score on test data.
`set_params`(**params)	Set the values of parameters.
`sparsify`()	Convert coefficient matrix to sparse format.

fit(X, y)

Fit the parameters.

Parameters

X (numpy array or scipy sparse CSR matrix):

Input n x p matrix; the samples are on the rows.

y (numpy.array):

Input labels.

Vector of size n with labels in {-1, +1} for binary classification (labels in {0, 1} are converted automatically), or in {0, 1, …, k-1} for multiclass classification with k classes.