Contains the functions for processing data.

Preprocessing functions

Preprocess data

cyanure.data_processing.preprocess(X, centering=False, normalize=True, columns=False)[source]

Preprocess the training feature data.

Perform in-place centering or normalization, either of columns or rows of the input matrix X.

Parameters
X (numpy array or scipy sparse CSR matrix):

Input matrix

centering (boolean): default=False

Perform a centering operation

normalize (boolean): default=True

l2-normalization

columns (boolean): default=False

Operate on columns of X (True) or on rows (False).
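To make the two operations concrete, here is a minimal NumPy sketch of in-place centering and l2-normalization along rows or columns. It is not Cyanure's implementation: `preprocess_sketch` is a hypothetical name, it handles dense float arrays only, and leaving all-zero rows or columns untouched is an assumption.

```python
import numpy as np


def preprocess_sketch(X, centering=False, normalize=True, columns=False):
    """Illustrative stand-in, not Cyanure's code: center and/or l2-normalize X in place."""
    # axis=0 reduces over rows (one statistic per column);
    # axis=1 reduces over columns (one statistic per row).
    axis = 0 if columns else 1
    if centering:
        X -= X.mean(axis=axis, keepdims=True)
    if normalize:
        norms = np.linalg.norm(X, axis=axis, keepdims=True)
        norms[norms == 0] = 1.0  # assumption: all-zero rows/columns are left as-is
        X /= norms
    return X


X = np.array([[3.0, 4.0], [6.0, 8.0]])
preprocess_sketch(X)  # defaults: no centering, each row scaled to unit l2 norm
```

With the library itself, the equivalent call would follow the signature documented above, `preprocess(X, centering=False, normalize=True, columns=False)`.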

Input verification functions

These functions are not necessary for normal use of the library.

cyanure.data_processing.check_labels(labels, estimator)[source]

Verify the format of labels depending on the type of the estimator.

Can convert labels in some cases.

Parameters
labels (numpy array or scipy sparse CSR matrix):

Numpy array containing labels

estimator (ERM):

The estimator which will be fitted

Returns
labels (numpy array or scipy sparse CSR matrix):

Converted labels if required by the estimator.

label_encoder (sklearn.LabelEncoder):

Convert text labels if needed

Raises
ValueError:

Format of the labels does not respect the format supported by Cyanure classifiers.

ValueError:

Labels contain a non-finite value

ValueError:

Problem has only one class
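The checks listed above can be pictured with a small NumPy sketch. It is not Cyanure's code: `check_labels_sketch` is a hypothetical helper, it ignores the estimator argument, and it encodes text labels with `np.unique` instead of a `LabelEncoder`.

```python
import numpy as np


def check_labels_sketch(labels):
    """Hypothetical helper mirroring the documented checks; not Cyanure's code."""
    labels = np.asarray(labels)
    # Numeric labels must all be finite (no NaN or infinity).
    if np.issubdtype(labels.dtype, np.number) and not np.all(np.isfinite(labels)):
        raise ValueError("Labels contain a non-finite value")
    # np.unique returns the sorted classes and, for each label, its class index.
    classes, encoded = np.unique(labels, return_inverse=True)
    if classes.size < 2:
        raise ValueError("Problem has only one class")
    # Text labels are replaced by integer codes; numeric labels pass through.
    if labels.dtype.kind in "US":
        return encoded.reshape(labels.shape), classes
    return labels, None


encoded, classes = check_labels_sketch(["cat", "dog", "cat"])
```

Here `encoded` holds one integer code per label and `classes` maps each code back to its original text.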

cyanure.data_processing.check_input_type(X, labels, estimator)[source]

Verify the format of labels and features depending on the type of the estimator.

Can convert labels in some cases.

Parameters
X (numpy array or scipy sparse CSR matrix):

Numpy array containing features

labels (numpy array or scipy sparse CSR matrix):

Numpy array containing labels

estimator (ERM):

The estimator which will be fitted

Returns
X (numpy array or scipy sparse CSR matrix):

Converted features if required by the estimator.

labels (numpy array or scipy sparse CSR matrix):

Converted labels if required by the estimator.

label_encoder (sklearn.LabelEncoder):

Convert text labels if needed

Raises
ValueError:

Data are complex

ValueError:

Data contain a non-finite value

TypeError:

Sparse features are not in CSR format

TypeError:

Sparse labels are not in CSR format
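As an illustration of the two value checks, the following sketch rejects complex or non-finite data in a dense array. `check_dense_values_sketch` is a hypothetical name and this is not Cyanure's code; the CSR checks for sparse inputs are left out to keep the example NumPy-only.

```python
import numpy as np


def check_dense_values_sketch(X):
    """Hypothetical helper for dense numeric arrays only; not Cyanure's code."""
    X = np.asarray(X)
    # Complex-valued data is rejected outright.
    if np.iscomplexobj(X):
        raise ValueError("Data are complex")
    # NaN and +/- infinity both count as non-finite.
    if not np.all(np.isfinite(X)):
        raise ValueError("Data contain a non-finite value")
    return X


checked = check_dense_values_sketch([[1.0, 2.0], [3.0, 4.0]])
```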

cyanure.data_processing.check_positive_parameter(parameter, message)[source]

Check that a parameter is a number and is positive.

Parameters
parameter (Any):

Parameter to verify

message (string):

Message of the exception

Raises
ValueError:

Parameter is not a number

ValueError:

Parameter is not positive
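A check of this kind can be sketched with the standard library alone. The function below is a hypothetical stand-in, not Cyanure's code; in particular, rejecting zero and booleans is an assumption, since the text above does not say how the library treats them.

```python
import numbers


def check_positive_parameter_sketch(parameter, message):
    """Hypothetical stand-in: raise ValueError(message) unless parameter is a positive number."""
    # bool is a subclass of int, so it is rejected explicitly (assumption).
    if isinstance(parameter, bool) or not isinstance(parameter, numbers.Real):
        raise ValueError(message)
    # Assumption: zero is treated as invalid here.
    if parameter <= 0:
        raise ValueError(message)


check_positive_parameter_sketch(0.1, "lambda_1 must be a positive number")
```

The message is passed in by the caller so that the same check can be reused for several parameters; `lambda_1` in the usage line is only a placeholder name.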

cyanure.data_processing.check_parameters(estimator)[source]

Verify that the different parameters of an estimator respect the constraints.

Parameters
estimator (ERM):

Estimator to verify

cyanure.data_processing.check_input_fit(X, labels, estimator)[source]

Check the different input arrays required for training according to the estimator type.

Can convert data if necessary.

Parameters
X (numpy array or scipy sparse CSR matrix):

Numpy array containing features

labels (numpy array or scipy sparse CSR matrix):

Numpy array containing labels

estimator (ERM):

The estimator which will be fitted

Returns
X (numpy array or scipy sparse CSR matrix):

Converted features if required by the estimator.

labels (numpy array or scipy sparse CSR matrix):

Converted labels if required by the estimator.

label_encoder (sklearn.LabelEncoder):

Convert text labels if needed

Raises
ValueError:

There is only one feature.

ValueError:

There is no sample.

ValueError:

An observation has no label.

ValueError:

Feature array has no feature

ValueError:

Features and labels do not have the same number of observations.

ValueError:

There is only one sample.
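Several of these errors concern array shapes. A minimal sketch of such shape checks, assuming dense inputs and a hypothetical helper name (this is not Cyanure's code, and the 2-D check is an added assumption):

```python
import numpy as np


def check_fit_shapes_sketch(X, labels):
    """Hypothetical shape checks before training; not Cyanure's code."""
    X, labels = np.asarray(X), np.asarray(labels)
    if X.ndim != 2:  # assumption: features form a (samples, features) matrix
        raise ValueError("Features must be a 2-D array")
    n_samples, n_features = X.shape
    if n_samples == 0:
        raise ValueError("There is no sample.")
    if n_features == 0:
        raise ValueError("Feature array has no feature")
    if n_samples == 1:
        raise ValueError("There is only one sample.")
    # Every observation needs exactly one entry along the first label axis.
    if labels.ndim == 0 or labels.shape[0] != n_samples:
        raise ValueError("Features and labels do not have the same number of observations.")
    return X, labels


X_ok, y_ok = check_fit_shapes_sketch(np.zeros((3, 2)), [0, 1, 0])
```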

cyanure.data_processing.check_input_inference(X, estimator)[source]

Check the format of the array which will be used for inference. Input array can be converted.

Parameters
X (numpy array or scipy sparse CSR matrix):

Array which will be used for inference

estimator (ERM):

Estimator which will be used

Returns
X (numpy array or scipy sparse CSR matrix):

Array, potentially converted to numpy.float64

Raises
ValueError:

One of the values is not finite

ValueError:

Shape of features is not correct

ValueError:

Shape of features does not correspond to the estimator's shape
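The inference-time checks can be sketched as follows. This is not Cyanure's code: the helper name is hypothetical, and `n_features_expected` stands in for whatever a fitted estimator records about its training data.

```python
import numpy as np


def check_inference_input_sketch(X, n_features_expected):
    """Hypothetical check; n_features_expected replaces the fitted estimator."""
    X = np.asarray(X, dtype=np.float64)  # convert to numpy.float64 if needed
    if not np.all(np.isfinite(X)):
        raise ValueError("One of the values is not finite")
    if X.ndim != 2:
        raise ValueError("Shape of features is not correct")
    if X.shape[1] != n_features_expected:
        raise ValueError("Shape of features does not correspond to the estimator's shape")
    return X


X_new = check_inference_input_sketch([[1, 2, 3], [4, 5, 6]], n_features_expected=3)
```

The integer input above comes back as a `numpy.float64` array, matching the conversion mentioned under Returns.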