IdentifiableFastSRM

class fastsrm.identifiable_srm.IdentifiableFastSRM(method='prob', n_components=20, n_iter=1000, temp_dir=None, n_jobs=1, verbose='warn', aggregate='mean', tol=1e-12, use_pca='auto', random_state=0)

SRM decomposition using a very low amount of memory and computational power thanks to the use of an atlas as described in [Richard2019].

Given multi-subject data, factorize it as a shared response S among all subjects and an orthogonal transform (basis) W per subject:

\[X_i \approx W_i S, \forall i=1 \dots N\]
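
The factorization above can be illustrated on synthetic data. This is a minimal NumPy sketch, not the library's code: the helper `random_orthonormal`, the noise level, and all sizes are assumptions chosen for illustration. Here each W_i has shape (n_voxels, n_components) with orthonormal columns; the `basis_list` attribute documented below stores arrays of shape [n_components, n_voxels], which would correspond to the transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels, n_components, n_timeframes = 3, 50, 4, 30

# Shared response S: one timecourse per shared component.
S = rng.standard_normal((n_components, n_timeframes))


def random_orthonormal(n_rows, n_cols, rng):
    """Return an (n_rows, n_cols) matrix with orthonormal columns (hypothetical helper)."""
    q, _ = np.linalg.qr(rng.standard_normal((n_rows, n_cols)))
    return q


# One orthogonal basis W_i per subject.
W = [random_orthonormal(n_voxels, n_components, rng) for _ in range(n_subjects)]

# Subject data: X_i = W_i S plus a small amount of noise.
X = [W_i @ S + 0.01 * rng.standard_normal((n_voxels, n_timeframes)) for W_i in W]

# Orthogonality check: W_i^T W_i should be the identity.
gram_ok = all(np.allclose(W_i.T @ W_i, np.eye(n_components)) for W_i in W)

# Relative approximation error of X_i by W_i S for each subject.
rel_err = [np.linalg.norm(X_i - W_i @ S) / np.linalg.norm(X_i) for X_i, W_i in zip(X, W)]
```
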
Parameters:
  • method (str, default="prob") – If “prob”, uses Probabilistic SRM; if “det”, uses Deterministic SRM.

  • n_components (int) – Number of timecourses of the shared coordinates

  • n_iter (int) – Number of iterations to perform

  • temp_dir (str or None) – Path to the directory where temporary results are stored. If None, temporary results are stored in memory. This can result in memory errors when the number of subjects and/or sessions is large.

  • n_jobs (int, optional, default=1) – The number of CPUs to use to do the computation. -1 means all CPUs, -2 all CPUs but one, and so on.

  • verbose (bool or "warn") – If True, logs are enabled. If False, logs are disabled. If “warn” only warnings are printed.

  • aggregate (str or None, default="mean") – If “mean”, shared_response is the mean shared response from all subjects. If None, shared_response contains all subject-specific responses in shared space

  • tol (float) – Stops if the norm of the gradient falls below tolerance value

  • use_pca (bool or "auto") – If True, use PCA as a preprocessing step. If “auto”, it is set to True if n_iter > 1 and n_voxels > 10 * n_timeframes.

  • random_state (int, RandomState instance or None, default=0) – Controls the randomness of the initialization.
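
The “auto” rule for use_pca can be written out as a small function. This is a sketch of the rule as stated in the parameter description only; the function name `resolve_use_pca` is hypothetical, and how the library counts n_timeframes when there are several sessions is not specified here.

```python
def resolve_use_pca(use_pca, n_iter, n_voxels, n_timeframes):
    """Return the effective boolean value of use_pca (hypothetical helper).

    With "auto", PCA is enabled only when more than one iteration is run
    and there are more than ten times as many voxels as timeframes.
    """
    if use_pca == "auto":
        return n_iter > 1 and n_voxels > 10 * n_timeframes
    return bool(use_pca)


# Typical fMRI-like sizes: many voxels, few timeframes, so PCA is enabled.
many_voxels = resolve_use_pca("auto", n_iter=1000, n_voxels=50000, n_timeframes=300)
# A single iteration disables the automatic PCA step.
single_iter = resolve_use_pca("auto", n_iter=1, n_voxels=50000, n_timeframes=300)
```
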

Attributes:
  • basis_list (list of array, element i has shape=[n_components, n_voxels], or list of str) –

    • if basis_list is a list of arrays, element i is the basis of subject i

    • if basis_list is a list of str, element i is the path to the basis of subject i, which is loaded with np.load and yields an array of shape [n_components, n_voxels]

    Note that any call to the clean method erases this attribute.

  • noise_variance (np array of shape (n_views,)) – Noise variance

  • source_covariance (np array of shape (n_components,)) – (Diagonal) shared response covariance

Note

References: [Richard2019] H. Richard, L. Martin, A. Pinho, J. Pillow, B. Thirion, 2019: Fast shared response model for fMRI data (https://arxiv.org/pdf/1909.12537.pdf)

add_subjects(imgs, shared_response)

Add subjects to the current fit. Each new basis is appended to the end of the list of bases (which can be accessed using self.basis_list).

Parameters:
  • imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

    Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

    imgs can also be a list of list of arrays where element i, j of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

    imgs can also be a list of arrays where element i of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (number of sessions is implicitly 1)

  • shared_response (list of arrays, list of list of arrays or array) –

    • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes)

    • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays, element i is the projection of data of subject i in shared space.

    • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays, element j is the shared response during session j

    • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays, element i, j is the projection of data of subject i collected during session j in shared space.
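
To give an idea of what adding a subject to an existing fit involves, here is a sketch that computes an orthogonal basis for one new subject from a fixed shared response by solving an orthogonal Procrustes problem. This is the textbook update for an orthogonal basis and is an assumption for illustration; it is not taken from the add_subjects implementation. The function name `procrustes_basis` is hypothetical.

```python
import numpy as np


def procrustes_basis(X_new, S):
    """Basis of shape [n_components, n_voxels] with orthonormal rows.

    Minimizes ||X_new - basis.T @ S||_F for data X_new of shape
    [n_voxels, n_timeframes] and a fixed shared response S of shape
    [n_components, n_timeframes].
    """
    U, _, Vt = np.linalg.svd(X_new @ S.T, full_matrices=False)
    return (U @ Vt).T


rng = np.random.default_rng(0)
n_voxels, n_components, n_timeframes = 40, 4, 30

# Noise-free toy data generated from a known orthogonal basis.
W_true, _ = np.linalg.qr(rng.standard_normal((n_voxels, n_components)))
S = rng.standard_normal((n_components, n_timeframes))
X_new = W_true @ S

basis_est = procrustes_basis(X_new, S)
```

In this noise-free setting the estimated basis matches the one used to generate the data; with real data it is only the best orthogonal fit to the given shared response.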

clean()

Erases temporary files and the basis_list attribute to free memory. Call this method when the fitted model is no longer needed.

fit(imgs)

Computes basis across subjects from input imgs

Parameters:

imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

imgs can also be a list of list of arrays where element i, j of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

imgs can also be a list of arrays where element i of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (number of sessions is implicitly 1)

Returns:

self – Returns the instance itself. Contains attributes listed at the object level.

Return type:

object
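
The path-based input format described for imgs can be prepared as follows. This sketch only uses NumPy and the standard library; the helper `zscore_voxels`, the file naming scheme, and the sizes are assumptions for illustration. It normalizes each voxel's timecourse to mean 0 and variance 1, saves one .npy file per subject and session, and collects the paths in an array of str of shape [n_subjects, n_sessions].

```python
import os
import tempfile

import numpy as np


def zscore_voxels(data):
    """Give each voxel (row) mean 0 and variance 1 across timeframes."""
    data = data - data.mean(axis=1, keepdims=True)
    return data / data.std(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_subjects, n_sessions, n_voxels = 3, 2, 20
n_timeframes = [15, 25]  # may differ across sessions, not across subjects

data_dir = tempfile.mkdtemp()
paths = []
for i in range(n_subjects):
    row = []
    for j in range(n_sessions):
        data = zscore_voxels(rng.standard_normal((n_voxels, n_timeframes[j])))
        path = os.path.join(data_dir, f"subject_{i}_session_{j}.npy")
        np.save(path, data)
        row.append(path)
    paths.append(row)

# Array of str of shape [n_subjects, n_sessions].
imgs = np.array(paths)

# Each entry loads back as an array of shape [n_voxels, n_timeframes].
loaded = np.load(imgs[0, 1])
```

An array built this way has the layout that fit(imgs) is documented to accept.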

fit_transform(imgs, subjects_indexes=None)

Computes the basis across subjects and the shared response from input imgs, then returns the shared response.

Parameters:
  • imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

    Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

    imgs can also be a list of list of arrays where element i, j of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

    imgs can also be a list of arrays where element i of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (number of sessions is implicitly 1)

  • subjects_indexes (list or None) – If None, imgs[i] will be transformed using basis_list[i]. Otherwise, imgs[i] will be transformed using basis_list[subjects_indexes[i]].

Returns:

shared_response

  • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes)

  • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays, element i is the projection of data of subject i in shared space.

  • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays, element j is the shared response during session j

  • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays, element i, j is the projection of data of subject i collected during session j in shared space.

Return type:

list of arrays, list of list of arrays or array
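
The effect of the aggregate setting on the return type can be sketched for the single-session case (imgs given as a list of arrays). The function `project_to_shared_space` is hypothetical and uses a plain projection `basis @ data`, which is an assumption; the probabilistic method may combine subjects differently.

```python
import numpy as np


def project_to_shared_space(basis_list, imgs, aggregate="mean"):
    """Project single-session data into the shared space.

    basis_list: list of arrays of shape [n_components, n_voxels]
    imgs: list of arrays of shape [n_voxels, n_timeframes]
    """
    projections = [basis @ data for basis, data in zip(basis_list, imgs)]
    if aggregate == "mean":
        # One array of shape (n_components, n_timeframes).
        return np.mean(projections, axis=0)
    # aggregate=None: one array per subject.
    return projections


rng = np.random.default_rng(0)
n_subjects, n_voxels, n_components, n_timeframes = 3, 40, 5, 12

# Bases with orthonormal rows (QR yields orthonormal columns, hence the transpose).
basis_list = [
    np.linalg.qr(rng.standard_normal((n_voxels, n_components)))[0].T
    for _ in range(n_subjects)
]
imgs = [rng.standard_normal((n_voxels, n_timeframes)) for _ in range(n_subjects)]

mean_response = project_to_shared_space(basis_list, imgs, aggregate="mean")
per_subject = project_to_shared_space(basis_list, imgs, aggregate=None)
```
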

inverse_transform(shared_response, subjects_indexes=None, sessions_indexes=None)

Reconstructs subjects’ data from the shared response and the bases learned from training data.

Parameters:
  • shared_response (list of arrays, list of list of arrays or array) –

    • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes)

    • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays, element i is the projection of data of subject i in shared space.

    • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays, element j is the shared response during session j

    • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays, element i, j is the projection of data of subject i collected during session j in shared space.

  • subjects_indexes (list or None) – if None reconstructs data of all subjects used during train. Otherwise reconstructs data of subjects specified by subjects_indexes.

  • sessions_indexes (list or None) – If None, reconstructs data of all sessions. Otherwise, reconstructs data of the sessions specified by sessions_indexes.

Returns:

reconstructed_data

  • if reconstructed_data is a list of list: element i, j is the reconstructed data for subject subjects_indexes[i] and session sessions_indexes[j] as an np array of shape [n_voxels, n_timeframes]

  • if reconstructed_data is a list: element i is the reconstructed data for subject subjects_indexes[i] as an np array of shape [n_voxels, n_timeframes]

Return type:

list of list of arrays or list of arrays
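
For a single aggregated shared response, the reconstruction step can be sketched as a multiplication by the transposed basis of each requested subject. The function `reconstruct` is hypothetical and assumes bases with orthonormal rows of shape [n_components, n_voxels]; it is an illustration of the shapes involved, not the library's implementation.

```python
import numpy as np


def reconstruct(basis_list, shared_response, subjects_indexes=None):
    """Return one array of shape [n_voxels, n_timeframes] per requested subject."""
    if subjects_indexes is None:
        # Default: every subject seen during training.
        subjects_indexes = range(len(basis_list))
    return [basis_list[k].T @ shared_response for k in subjects_indexes]


rng = np.random.default_rng(0)
n_subjects, n_voxels, n_components, n_timeframes = 3, 40, 5, 12
basis_list = [
    np.linalg.qr(rng.standard_normal((n_voxels, n_components)))[0].T
    for _ in range(n_subjects)
]
shared_response = rng.standard_normal((n_components, n_timeframes))

all_subjects = reconstruct(basis_list, shared_response)
subset = reconstruct(basis_list, shared_response, subjects_indexes=[2, 0])
```
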

set_fit_request(*, imgs: bool | None | str = '$UNCHANGED$') → IdentifiableFastSRM

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

imgs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for imgs parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_inverse_transform_request(*, sessions_indexes: bool | None | str = '$UNCHANGED$', shared_response: bool | None | str = '$UNCHANGED$', subjects_indexes: bool | None | str = '$UNCHANGED$') → IdentifiableFastSRM

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:
  • sessions_indexes (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sessions_indexes parameter in inverse_transform.

  • shared_response (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for shared_response parameter in inverse_transform.

  • subjects_indexes (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for subjects_indexes parameter in inverse_transform.

Returns:

self – The updated object.

Return type:

object

set_transform_request(*, imgs: bool | None | str = '$UNCHANGED$', subjects_indexes: bool | None | str = '$UNCHANGED$') → IdentifiableFastSRM

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:
  • imgs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for imgs parameter in transform.

  • subjects_indexes (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for subjects_indexes parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(imgs, subjects_indexes=None)

From data in imgs and basis from training data, computes shared response.

Parameters:
  • imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

    Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

    imgs can also be a list of list of arrays where element i, j of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

    imgs can also be a list of arrays where element i of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (number of sessions is implicitly 1)

  • subjects_indexes (list or None) – If None, imgs[i] will be transformed using basis_list[i]. Otherwise, imgs[i] will be transformed using basis_list[subjects_indexes[i]].

Returns:

shared_response

  • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes)

  • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays, element i is the projection of data of subject i in shared space.

  • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays, element j is the shared response during session j

  • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays, element i, j is the projection of data of subject i collected during session j in shared space.

Return type:

list of arrays, list of list of arrays or array
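
The pairing between imgs and bases controlled by subjects_indexes in transform can be spelled out in a few lines. The helper `select_bases` is hypothetical, and the length check is an added assumption rather than documented behavior.

```python
def select_bases(basis_list, n_imgs, subjects_indexes=None):
    """Return the basis used for each entry of imgs (hypothetical helper).

    With subjects_indexes=None, imgs[i] is paired with basis_list[i].
    Otherwise imgs[i] is paired with basis_list[subjects_indexes[i]].
    """
    if subjects_indexes is None:
        subjects_indexes = list(range(n_imgs))
    if len(subjects_indexes) != n_imgs:
        # Assumed sanity check: one index per entry of imgs.
        raise ValueError("subjects_indexes must have one entry per element of imgs")
    return [basis_list[k] for k in subjects_indexes]


# Placeholder strings stand in for the fitted bases of three subjects.
fitted = ["basis_0", "basis_1", "basis_2"]

# Transforming new data from subjects 2 and 0 only, in that order.
chosen = select_bases(fitted, n_imgs=2, subjects_indexes=[2, 0])
default = select_bases(fitted, n_imgs=3)
```
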

FastSRM, DetSRM and ProbSRM

fastsrm.fastsrm.fastsrm(imgs, n_components, n_jobs=1, verbose=False, n_iter=100, tol=0.001, method='prob', temp_dir=None, random_state=None, callback=None)

Performs an SRM decomposition on input data whose number of features is much larger than the number of samples: reduces the data by PCA and applies an SRM algorithm on the reduced data.

Parameters:
  • imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

    Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

    imgs can also be a list of list of arrays where element i, j of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

    imgs can also be a list of arrays where element i of the array is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (number of sessions is implicitly 1)

  • n_components (int) – Number of timecourses of the shared coordinates

  • n_jobs (int, optional, default=1) – The number of CPUs to use to do the computation. -1 means all CPUs, -2 all CPUs but one, and so on.

  • verbose (bool or "warn") – If True, logs are enabled. If False, logs are disabled. If “warn” only warnings are printed.

  • n_iter (int) – Number of iterations to perform

  • tol (float) – Stops if the norm of the gradient falls below tolerance value

  • method (str, default="prob") – If “prob”, uses Probabilistic SRM; if “det”, uses Deterministic SRM.

  • temp_dir (str or None) – Path to the directory where temporary results are stored. If None, temporary results are stored in memory. This can result in memory errors when the number of subjects and/or sessions is large.

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the initialization.

  • callback (function or None) – At each iteration calls callback(S, gnorm, it, t0) where S is the current estimate of the shared response, gnorm is the current gradient norm, it is the current iteration and t0 the current time. The result is saved in a list record. If callback is None, nothing is done.

Returns:

  • W (list of n_subjects numpy array of shape (n_voxels, n_components) or path to np array of shape (n_voxels, n_components)) – Subject specific basis

  • S (np array of shape (n_components, n_timeframes)) – Shared response

  • sigmas (np array of shape (n_subjects,)) – Noise variance (only returned if method == “prob”)

  • Sigma (np array of shape (n_components,)) – (Diagonal) source covariance (only returned if method == “prob”)

  • records (list of shape (n_iter,)) – The recorded information from callback. Only returned if callback is not None
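
The callback mechanism can be illustrated with a toy iteration loop. Only the callback signature `callback(S, gnorm, it, t0)` and the fact that results are collected in a list come from the description above; `toy_solver`, its placeholder update, and the fields stored by `record_progress` are assumptions made for this sketch.

```python
import time

import numpy as np


def record_progress(S, gnorm, it, t0):
    """Example callback: keep a few scalars instead of the full shared response."""
    return {"it": it, "gnorm": float(gnorm), "t0": t0, "S_norm": float(np.linalg.norm(S))}


def toy_solver(S0, n_iter, callback=None):
    """Stand-in for an iterative solver (hypothetical, not an SRM algorithm)."""
    records = []
    S = S0
    for it in range(n_iter):
        S_new = 0.5 * S  # placeholder update, not an SRM step
        gnorm = np.linalg.norm(S_new - S)  # stand-in for the gradient norm
        S = S_new
        if callback is not None:
            # The value returned by the callback is appended to the records.
            records.append(callback(S, gnorm, it, time.time()))
    if callback is not None:
        return S, records
    return S


S0 = np.ones((4, 10))
S_final, records = toy_solver(S0, n_iter=5, callback=record_progress)
```

Returning small summaries from the callback, as here, keeps the records list light even when the shared response is large.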

fastsrm.srm.detsrm(X, n_components, n_iter=100, random_state=None, verbose=False, tol=1e-05, callback=None)

Performs Deterministic SRM on numpy arrays. To be used when the data fit in memory and the number of features is not much larger than the number of samples (otherwise fastsrm is preferable).

Parameters:
  • X (np array of shape (n_views, n_features, n_samples)) – Input data

  • n_components (int) – Number of timecourses of the shared coordinates

  • n_iter (int) – Number of iterations to perform

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the initialization.

  • verbose (bool) – If True, logs are enabled. If False, logs are disabled.

  • tol (float) – Stops if the norm of the gradient falls below tolerance value

  • callback (function or None) – At each iteration calls callback(S, gnorm, it, t0) where S is the current estimate of the shared response, gnorm is the current gradient norm, it is the current iteration and t0 the current time. The result is saved in a list record. If callback is None, nothing is done.

Returns:

  • W (np array of shape (n_views, n_features, n_components)) – Subject specific basis

  • S (np array of shape (n_components, n_samples)) – Shared response

  • records (list of shape (n_iter,)) – The recorded information from callback. Only returned if callback is not None
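
For intuition about what a deterministic SRM computes, here is a textbook alternating-minimization sketch on an array of shape (n_views, n_features, n_samples). It is not the detsrm implementation: the function name `det_srm_sketch`, the random initialization, and the fixed iteration count are assumptions. Each iteration averages the back-projected data to update S, then solves an orthogonal Procrustes problem per view to update W.

```python
import numpy as np


def det_srm_sketch(X, n_components, n_iter=20, seed=0):
    """Alternate between updating S and the orthogonal bases W (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_views, n_features, n_samples = X.shape
    # Random initial bases with orthonormal columns.
    W = np.stack([
        np.linalg.qr(rng.standard_normal((n_features, n_components)))[0]
        for _ in range(n_views)
    ])
    losses = []
    for _ in range(n_iter):
        # S update: mean over views of W_i^T X_i.
        S = np.mean([W[i].T @ X[i] for i in range(n_views)], axis=0)
        # W update: orthogonal Procrustes solution for each view.
        for i in range(n_views):
            U, _, Vt = np.linalg.svd(X[i] @ S.T, full_matrices=False)
            W[i] = U @ Vt
        # Squared reconstruction error summed over views.
        losses.append(sum(np.linalg.norm(X[i] - W[i] @ S) ** 2 for i in range(n_views)))
    return W, S, losses


rng = np.random.default_rng(1)
n_views, n_features, n_components, n_samples = 3, 40, 4, 25
S_true = rng.standard_normal((n_components, n_samples))
X = np.stack([
    np.linalg.qr(rng.standard_normal((n_features, n_components)))[0] @ S_true
    + 0.1 * rng.standard_normal((n_features, n_samples))
    for _ in range(n_views)
])

W, S, losses = det_srm_sketch(X, n_components)
```

Both updates minimize the same squared error with the other block held fixed, so the recorded loss does not increase from one iteration to the next.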

fastsrm.srm.probsrm(X, n_components, n_iter=100, random_state=None, corrective_factor=1, verbose=False, tol=1e-05, callback=None)

Performs Probabilistic SRM on numpy arrays. To be used when the data fit in memory and the number of features is not much larger than the number of samples (otherwise fastsrm is preferable).

Parameters:
  • X (np array of shape (n_views, n_features, n_samples)) – Input data

  • n_components (int) – Number of timecourses of the shared coordinates

  • n_iter (int) – Number of iterations to perform

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the initialization.

  • verbose (bool) – If True, logs are enabled. If False, logs are disabled.

  • tol (float) – Stops if the norm of the gradient falls below tolerance value

  • callback (function or None) – At each iteration calls callback(S, gnorm, it, t0) where S is the current estimate of the shared response, gnorm is the current gradient norm, it is the current iteration and t0 the current time. The result is saved in a list record. If callback is None, nothing is done.

Returns:

  • W (np array of shape (n_views, n_features, n_components)) – Subject specific basis

  • S (np array of shape (n_components, n_samples)) – Shared response

  • sigmas (np array of shape (n_views,)) – Noise variance

  • Sigma (np array of shape (n_components,)) – (Diagonal) Shared response covariance

  • records (list of shape (n_iter,)) – The recorded information from callback. Only returned if callback is not None
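
The roles of sigmas and Sigma can be illustrated by sampling toy data. This sketch assumes a Gaussian model in which the shared response has diagonal covariance Sigma and each view adds isotropic noise with variance sigmas[i]; that reading of the two outputs is an assumption based on their descriptions, and all values below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_views, n_features, n_components, n_samples = 3, 30, 4, 2000

Sigma = np.array([4.0, 2.0, 1.0, 0.5])  # assumed diagonal shared response covariance
sigmas = np.array([0.1, 0.2, 0.3])      # assumed noise variance, one per view

# Shared response: component k has variance Sigma[k].
S = np.sqrt(Sigma)[:, None] * rng.standard_normal((n_components, n_samples))

# One basis with orthonormal columns per view.
W = np.stack([
    np.linalg.qr(rng.standard_normal((n_features, n_components)))[0]
    for _ in range(n_views)
])

# Data of shape (n_views, n_features, n_samples): W_i S plus view-specific noise.
X = np.stack([
    W[i] @ S + np.sqrt(sigmas[i]) * rng.standard_normal((n_features, n_samples))
    for i in range(n_views)
])

# Empirical variance of each shared component, to compare with Sigma.
empirical_Sigma = S.var(axis=1)
```
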