Contains the different estimators of the library.
Estimators
The link between the regularization parameter C of scikit-learn and \(\lambda\) is \(C=\frac{1}{2n\lambda}\), where \(n\) is the number of training samples.
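This relation can be applied directly when porting scikit-learn settings; a minimal sketch (the helper name sklearn_C_to_lambda is illustrative, not part of the library):

    # lambda = 1 / (2 * n * C), obtained by solving C = 1 / (2 * n * lambda)
    def sklearn_C_to_lambda(C: float, n_samples: int) -> float:
        return 1.0 / (2.0 * n_samples * C)

    print(sklearn_C_to_lambda(1.0, 10000))  # 5e-05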
The Regression Class
- class cyanure.estimators.Regression(loss='square', penalty='l2', fit_intercept=True, random_state=0, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, dual=None, safe=True)[source]
Bases: ERM
The regression class, which derives from ERM.
The goal is to minimize the following objective:
\[\min_{w,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, w^\top x_i + b\right) + \psi(w),\]where \(L\) is a regression loss, \(\psi\) is a regularization function (or constraint), \(w\) is a p-dimensional vector representing model parameters, and \(b\) is an optional unregularized intercept. The targets are real values.
- Parameters
- loss (string): default=’square’
Loss function to be used. Only the square loss is implemented at this point. Given two k-dimensional vectors y, z:
‘square’ => \(L(y,z) = \frac{1}{2}( y-z)^2\)
- penalty (string): default=’l2’
Regularization function \(\psi\). Possible choices are:
For univariate problems:
- ‘none’
\(\psi(w) = 0\)
- ‘l2’
\(\psi(w) = \frac{\lambda_1}{2} \|w\|_2^2\)
- ‘l1’
\(\psi(w) = \lambda_1 \|w\|_1\)
- ‘elasticnet’
\(\psi(w) = \lambda_1 \|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
- ‘fused-lasso’
\(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1\|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
- ‘l1-ball’
encodes the constraint \(\|w\|_1 \leq \lambda\)
- ‘l2-ball’
encodes the constraint \(\|w\|_2 \leq \lambda\)
For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor:
\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]In addition, multitask-group Lasso penalties are provided for multivariate problems (\(W\) is then a matrix):
- ‘l1l2’, which is the multi-task group Lasso regularization
- \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
- ‘l1linf’
- \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
- ‘l1l2+l1’, which is the multi-task group Lasso regularization + l1
- \[\psi(W) = \sum_{j=1}^p \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
- fit_intercept (boolean): default=True
Learns an unregularized intercept b (or several intercepts for multivariate problems).
- lambda_1 (float): default=0
First regularization parameter
- lambda_2 (float): default=0
Second regularization parameter, if needed
- lambda_3 (float): default=0
Third regularization parameter, if needed
- solver (string): default=’auto’
Optimization solver. Possible choices are
‘ista’
‘fista’
‘catalyst-ista’
‘qning-ista’ (proximal quasi-Newton method)
‘svrg’
‘catalyst-svrg’ (accelerated SVRG with Catalyst)
‘qning-svrg’ (quasi-Newton SVRG)
‘acc-svrg’ (SVRG with direct acceleration)
‘miso’
‘catalyst-miso’ (accelerated MISO with Catalyst)
‘qning-miso’ (quasi-Newton MISO)
‘auto’
See the LaTeX documentation for more details. If you are unsure, use ‘auto’.
- tol (float): default=1e-3
Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is \(f\) and its optimal value is \(f^\star\), the algorithm stops with the guarantee
\(f(x_t) - f^\star \leq \text{tol} \cdot f(x_t)\)
- max_iter (int): default=500
Maximum number of iterations of the algorithm, in terms of passes over the data.
- duality_gap_interval (int): default=10
Frequency of duality-gap computation
- verbose (boolean): default=True
Display information or not
- n_threads (int): default=-1
Maximum number of cores the method may use (-1 = all cores). Note that more cores is not always better.
- random_state (int): default=0
Random seed
- warm_start (boolean): default=False
Use a restart strategy
- binary_problem (boolean): default=True
Whether the problem is univariate (True) or multivariate (False).
- limited_memory_qning (int): default=20
Memory parameter for the qning method
- fista_restart (int): default=50
Restart strategy for FISTA (useful for computing a regularization path).
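As a quick illustration of how these parameters fit together, a hedged usage sketch on synthetic data (the parameter values are illustrative, not recommendations):

    import numpy as np
    from cyanure.estimators import Regression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(100)

    # Square loss with elastic-net regularization: psi(w) combines the
    # l1 term (weighted by lambda_1) and the l2 term (weighted by lambda_2).
    estimator = Regression(loss='square', penalty='elasticnet',
                           lambda_1=0.01, lambda_2=0.001,
                           fit_intercept=True, verbose=False)
    estimator.fit(X, y)
    print(estimator.score(X, y))  # coefficient of determination R^2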
Methods:
- fit(X, y[, le_parameter]): Fit the parameters.
- predict(X): Predict the labels given an input matrix X (same format as fit).
- score(X, y[, sample_weight]): Return the coefficient of determination of the prediction.
- densify(): Convert coefficient matrix to dense array format.
- get_params([deep]): Get parameters for the estimator.
- get_weights(): Get the model parameters (either w or the tuple (w, b)).
- set_params(**params): Change the value of parameters.
- sparsify(): Convert coefficient matrix to sparse format.
- fit(X, y, le_parameter=None)[source]
Fit the parameters.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Input n x p matrix; the samples are on the rows.
- y (numpy array):
Vector of size n with real values for regression, or matrix of size n x k for multivariate regression.
- Returns
- self (ERM):
Returns the instance of the class
- predict(X)[source]
Predict the labels given an input matrix X (same format as fit).
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Input matrix for the prediction
- Returns
- pred (numpy.array):
Prediction for the X matrix
- score(X, y, sample_weight=None)[source]
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Test samples.
- y (numpy.array):
True labels for X.
- sample_weight (numpy.array, optional):
Sample weights. Defaults to None.
- Returns
- score (float):
\(R^2\) of self.predict(X) w.r.t. y.
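The definition above can be verified by hand; this short sketch recomputes \(R^2\) from the two sums of squares (the arrays are illustrative):

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])

    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    print(1.0 - u / v)                         # ~0.9486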
- densify()
Convert coefficient matrix to dense array format.
Converts the
coef_
member (back) to a numpy.ndarray. This is the default format ofcoef_
and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.- Returns
- self (ERM):
Fitted estimator converted to dense estimator
- get_params(deep=True)
Get parameters for the estimator.
- Parameters
- deep (bool, optional):
If True, also returns the parameters of sub-objects that are estimators. Defaults to True.
- Returns
- params (dict):
Parameters names and values
- get_weights()
Get the model parameters (either w or the tuple (w,b)).
- Returns
- w or (w,b) (numpy.array or tuple of numpy.array):
Model parameters
- set_params(**params)
Change the value of parameters.
- Parameters
- params (dict):
Estimator parameters to set
- Returns
- self (ERM):
Estimator instance
- Raises
- ValueError:
The parameter does not exist
- sparsify()
Convert coefficient matrix to sparse format.
Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.
- Returns
- self (ERM):
Fitted estimator converted to sparse estimator.
Notes
For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.
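The rule of thumb above can be wrapped in a small helper; a hedged sketch, assuming model is a fitted estimator from this module exposing coef_ (the helper name is illustrative):

    def sparsify_if_worthwhile(model):
        # Only sparsify when more than 50% of the coefficients are zero,
        # following the rule of thumb from the notes above.
        n_zeros = (model.coef_ == 0).sum()
        if n_zeros > 0.5 * model.coef_.size:
            model.sparsify()  # coef_ becomes a scipy.sparse matrix
        return model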
The Classifier Class
- class cyanure.estimators.Classifier(loss='square', penalty='l2', fit_intercept=True, tol=0.001, solver='auto', random_state=0, max_iter=500, fista_restart=50, verbose=True, warm_start=False, multi_class='auto', limited_memory_qning=20, lambda_1=0, lambda_2=0, lambda_3=0, duality_gap_interval=5, n_threads=-1, dual=None, safe=True)[source]
Bases: ClassifierAbstraction
The classification class.
The goal is to minimize the following objective:
\[\min_{W,b} \frac{1}{n} \sum_{i=1}^n L\left( y_i, W^\top x_i + b\right) + \psi(W)\]where \(L\) is a classification loss, \(\psi\) is a regularization function (or constraint), \(W=[w_1,\ldots,w_k]\) is a (p x k) matrix that carries the k predictors, where k is the number of classes, and \(y_i\) is a label in \(\{1,\ldots,k\}\). b is a k-dimensional vector representing an unregularized intercept (which is optional).
- Parameters
- loss (string): default=’square’
Loss function to be used. Possible choices are:
- ‘square’
\(L(y,z) = \frac{1}{2} ( y-z)^2\)
- ‘logistic’
\(L(y,z) = \log(1 + e^{-y z} )\)
- ‘sqhinge’ or ‘squared_hinge’
\(L(y,z) = \frac{1}{2} \max( 0, 1- y z)^2\)
- ‘safe-logistic’
\(L(y,z) = e^{ yz - 1 } - y z ~\text{if}~ yz \leq 1~~\text{and}~~0\) otherwise
- ‘multiclass-logistic’
also called multinomial or softmax logistic: \(L(y, W^\top x + b) = \log\left(\sum_{j=1}^k e^{w_j^\top x + b_j - w_y^\top x - b_y} \right)\)
- penalty (string): default=’l2’
Regularization function \(\psi\). Possible choices are:
For univariate problems:
- ‘none’
\(\psi(w) = 0\)
- ‘l2’
\(\psi(w) = \frac{\lambda_1}{2} \|w\|_2^2\)
- ‘l1’
\(\psi(w) = \lambda_1 \|w\|_1\)
- ‘elasticnet’
\(\psi(w) = \lambda_1 \|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
- ‘fused-lasso’
\(\psi(w) = \lambda_3 \sum_{i=2}^p |w[i]-w[i-1]| + \lambda_1\|w\|_1 + \frac{\lambda_2}{2}\|w\|_2^2\)
- ‘l1-ball’
encodes the constraint \(\|w\|_1 \leq \lambda\)
- ‘l2-ball’
encodes the constraint \(\|w\|_2 \leq \lambda\)
For multivariate problems, the previous penalties operate on each individual (e.g., class) predictor:
\[\psi(W) = \sum_{j=1}^k \psi(w_j).\]In addition, multitask-group Lasso penalties are provided for multivariate problems (\(W\) is then a matrix):
- ‘l1l2’, which is the multi-task group Lasso regularization
- \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_2~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
- ‘l1linf’
- \[\psi(W) = \lambda \sum_{j=1}^p \|W^j\|_\infty.\]
- ‘l1l2+l1’, which is the multi-task group Lasso regularization + l1
- \[\psi(W) = \sum_{j=1}^p \lambda \|W^j\|_2 + \lambda_2 \|W^j\|_1 ~~~~ \text{where}~W^j~\text{is the j-th row of}~W.\]
- fit_intercept (boolean): default=True
Learns an unregularized intercept b (or several intercepts for multivariate problems).
- lambda_1 (float): default=0
First regularization parameter
- lambda_2 (float): default=0
Second regularization parameter, if needed
- lambda_3 (float): default=0
Third regularization parameter, if needed
- solver (string): default=’auto’
Optimization solver. Possible choices are
‘ista’
‘fista’
‘catalyst-ista’
‘qning-ista’ (proximal quasi-Newton method)
‘svrg’
‘catalyst-svrg’ (accelerated SVRG with Catalyst)
‘qning-svrg’ (quasi-Newton SVRG)
‘acc-svrg’ (SVRG with direct acceleration)
‘miso’
‘catalyst-miso’ (accelerated MISO with Catalyst)
‘qning-miso’ (quasi-Newton MISO)
‘auto’
See the LaTeX documentation for more details. If you are unsure, use ‘auto’.
- tol (float): default=1e-3
Tolerance parameter. For almost all combinations of loss and penalty functions, this parameter is based on a duality gap. Assuming the (non-negative) objective function is \(f\) and its optimal value is \(f^\star\), the algorithm stops with the guarantee
\(f(x_t) - f^\star \leq \text{tol} \cdot f(x_t)\)
- max_iter (int): default=500
Maximum number of iterations of the algorithm, in terms of passes over the data.
- duality_gap_interval (int): default=5
Frequency of duality-gap computation
- verbose (boolean): default=True
Display information or not
- n_threads (int): default=-1
Maximum number of cores the method may use (-1 = all cores). Note that more cores is not always better.
- random_state (int): default=0
Random seed
- warm_start (boolean): default=False
Use a restart strategy
- binary_problem (boolean): default=True
Whether the problem is univariate (True) or multivariate (False).
- limited_memory_qning (int): default=20
Memory parameter for the qning method
- fista_restart (int): default=50
Restart strategy for FISTA (useful for computing a regularization path).
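Here is a hedged usage sketch for binary classification on synthetic data (parameter values are illustrative; the available methods are listed below):

    import numpy as np
    from cyanure.estimators import Classifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = np.where(X[:, 0] + 0.1 * rng.standard_normal(200) > 0, 1, -1)  # {-1, +1}

    classifier = Classifier(loss='logistic', penalty='l2',
                            lambda_1=0.01, verbose=False)
    classifier.fit(X, y)
    print(classifier.score(X, y))  # mean accuracy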
Methods:
- fit(X, y[, le_parameter]): Fit the parameters.
- predict(X): Predict the labels given an input matrix X (same format as fit).
- score(X, y): Give an accuracy score on test data.
- decision_function(X): Predict confidence scores for samples.
- predict_proba(X): Estimate the probability for each class.
- densify(): Convert coefficient matrix to dense array format.
- get_params([deep]): Get parameters for the estimator.
- get_weights(): Get the model parameters (either w or the tuple (w, b)).
- set_params(**params): Change the value of parameters.
- sparsify(): Convert coefficient matrix to sparse format.
- fit(X, y, le_parameter=None)[source]
Fit the parameters.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Input n x p matrix; the samples are on the rows.
- y (numpy.array):
Input labels: vector of size n with {-1, +1} labels for binary classification (labels in {0, 1} are automatically converted), or labels in {0, 1, …, k-1} for multiclass classification.
- predict(X)[source]
Predict the labels given an input matrix X (same format as fit).
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Input matrix for the prediction
- Returns
- pred (numpy.array):
Prediction for the X matrix
- score(X, y)[source]
Give an accuracy score on test data.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Test samples.
- y (numpy.array):
True labels for X.
- sample_weight (numpy.array, optional):
Sample weights. Defaults to None.
- Returns
- score (float):
Mean accuracy of self.predict(X) w.r.t. y.
- decision_function(X)[source]
Predict confidence scores for samples.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
The data for which we want scores
- Returns
- scores (numpy.array):
Confidence scores per (n_samples, n_classes) combination. In the binary case, the confidence score for self.classes_[1], where a score > 0 means this class would be predicted.
- predict_proba(X)[source]
Estimate the probability for each class.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Data matrix for which we want probabilities
- Returns
- proba (numpy.array):
Return the probability of the samples for each class.
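To illustrate the difference between raw scores and probabilities, a minimal sketch (synthetic data; lambda_1 is illustrative):

    import numpy as np
    from cyanure.estimators import Classifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = np.where(X[:, 0] > 0, 1, -1)

    clf = Classifier(loss='logistic', penalty='l2', lambda_1=0.1, verbose=False)
    clf.fit(X, y)
    scores = clf.decision_function(X)  # signed confidence scores
    proba = clf.predict_proba(X)       # probability estimates per class
    labels = clf.predict(X)            # hard label predictions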
- densify()
Convert coefficient matrix to dense array format.
Converts the
coef_
member (back) to a numpy.ndarray. This is the default format ofcoef_
and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.- Returns
- self (ERM):
Fitted estimator converted to dense estimator
- get_params(deep=True)
Get parameters for the estimator.
- Parameters
- deep (bool, optional):
If True, also returns the parameters of sub-objects that are estimators. Defaults to True.
- Returns
- params (dict):
Parameters names and values
- get_weights()
Get the model parameters (either w or the tuple (w,b)).
- Returns
- w or (w,b) (numpy.array or tuple of numpy.array):
Model parameters
- set_params(**params)
Change the value of parameters.
- Parameters
- params (dict):
Estimator parameters to set
- Returns
- self (ERM):
Estimator instance
- Raises
- ValueError:
The parameter does not exist
- sparsify()
Convert coefficient matrix to sparse format.
Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation. The intercept_ member is not converted.
- Returns
- self (ERM):
Fitted estimator converted to sparse estimator.
Notes
For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits. After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.
Pre-configured classes
- class cyanure.estimators.LinearSVC(loss='sqhinge', penalty='l2', fit_intercept=True, verbose=False, lambda_1=0.1, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, dual=None, safe=True)[source]
Bases: Classifier
A pre-configured class for the squared hinge loss.
Methods:
- decision_function(X): Predict confidence scores for samples.
- densify(): Convert coefficient matrix to dense array format.
- fit(X, y[, le_parameter]): Fit the parameters.
- get_params([deep]): Get parameters for the estimator.
- get_weights(): Get the model parameters (either w or the tuple (w, b)).
- predict(X): Predict the labels given an input matrix X (same format as fit).
- predict_proba(X): Estimate the probability for each class.
- score(X, y): Give an accuracy score on test data.
- set_params(**params): Change the value of parameters.
- sparsify(): Convert coefficient matrix to sparse format.
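A minimal usage sketch for this pre-configured class (synthetic data; the default lambda_1=0.1 is kept):

    import numpy as np
    from cyanure.estimators import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

    svc = LinearSVC(verbose=False)  # squared hinge loss, l2 penalty
    svc.fit(X, y)
    print(svc.score(X, y))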
- class cyanure.estimators.LogisticRegression(penalty='l2', loss='logistic', fit_intercept=True, verbose=False, lambda_1=0, lambda_2=0, lambda_3=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, warm_start=False, n_threads=-1, random_state=0, multi_class='auto', dual=None, safe=True)[source]
Bases: Classifier
A pre-configured class for the logistic regression loss.
Methods:
- decision_function(X): Predict confidence scores for samples.
- densify(): Convert coefficient matrix to dense array format.
- fit(X, y[, le_parameter]): Fit the parameters.
- get_params([deep]): Get parameters for the estimator.
- get_weights(): Get the model parameters (either w or the tuple (w, b)).
- predict(X): Predict the labels given an input matrix X (same format as fit).
- predict_proba(X): Estimate the probability for each class.
- score(X, y): Give an accuracy score on test data.
- set_params(**params): Change the value of parameters.
- sparsify(): Convert coefficient matrix to sparse format.
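A minimal multiclass sketch with this pre-configured class (synthetic data; lambda_1 is illustrative):

    import numpy as np
    from cyanure.estimators import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 8))
    y = np.argmax(X[:, :3], axis=1)  # labels in {0, 1, 2}

    logreg = LogisticRegression(lambda_1=0.01, verbose=False)
    logreg.fit(X, y)
    print(logreg.predict_proba(X[:5]))  # one probability per class and sample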
- class cyanure.estimators.Lasso(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, dual=None, safe=True)[source]
Bases: Regression
A pre-configured class for Lasso regression.
Uses an active-set strategy when the number of features exceeds 1000.
Methods:
- fit(X, y): Fit the parameters.
- predict(X): Predict the labels given an input matrix X (same format as fit).
- score(X, y[, sample_weight]): Return the coefficient of determination of the prediction.
- densify(): Convert coefficient matrix to dense array format.
- get_params([deep]): Get parameters for the estimator.
- get_weights(): Get the model parameters (either w or the tuple (w, b)).
- set_params(**params): Change the value of parameters.
- sparsify(): Convert coefficient matrix to sparse format.
- fit(X, y)[source]
Fit the parameters.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Input n x p matrix; the samples are on the rows.
- y (numpy array):
Vector of size n with real values for regression, or matrix of size n x k for multivariate regression.
- Returns
- self (ERM):
Returns the instance of the class
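A minimal sketch on a sparse synthetic problem; the lambda_1 value is illustrative and should be tuned in practice (e.g., along a regularization path):

    import numpy as np
    from cyanure.estimators import Lasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((150, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0  # only 5 active features
    y = X @ w_true + 0.01 * rng.standard_normal(150)

    lasso = Lasso(lambda_1=0.05, verbose=False)
    lasso.fit(X, y)
    print((lasso.coef_ == 0).sum())  # number of coefficients driven to zero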
- class cyanure.estimators.L1Logistic(lambda_1=0, solver='auto', tol=0.001, duality_gap_interval=10, max_iter=500, limited_memory_qning=20, fista_restart=50, verbose=True, warm_start=False, n_threads=-1, random_state=0, fit_intercept=True, multi_class='auto', dual=None, safe=True)[source]
Bases: Classifier
A pre-configured class for L1 logistic classification.
Uses an active-set strategy when the number of features exceeds 1000.
Methods:
- decision_function(X): Predict confidence scores for samples.
- densify(): Convert coefficient matrix to dense array format.
- fit(X, y): Fit the parameters.
- get_params([deep]): Get parameters for the estimator.
- get_weights(): Get the model parameters (either w or the tuple (w, b)).
- predict(X): Predict the labels given an input matrix X (same format as fit).
- predict_proba(X): Estimate the probability for each class.
- score(X, y): Give an accuracy score on test data.
- set_params(**params): Change the value of parameters.
- sparsify(): Convert coefficient matrix to sparse format.
- fit(X, y)[source]
Fit the parameters.
- Parameters
- X (numpy array or scipy sparse CSR matrix):
Input n x p matrix; the samples are on the rows.
- y (numpy.array):
Input labels: vector of size n with {-1, +1} labels for binary classification (labels in {0, 1} are automatically converted), or labels in {0, 1, …, k-1} for multiclass classification.
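A minimal sketch for this pre-configured class (synthetic data; lambda_1 is illustrative):

    import numpy as np
    from cyanure.estimators import L1Logistic

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 30))
    y = np.where(X[:, 0] > 0, 1, -1)

    clf = L1Logistic(lambda_1=0.05, verbose=False)
    clf.fit(X, y)
    print(clf.score(X, y))  # mean accuracy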