breze Documentation
Release 0.1

brml.de

September 15, 2016

Contents

1 Basics
    1.1 Specifying losses, norms, transfer functions etc.
2 Models and Algorithms
    2.1 Principal Component Analysis
    2.2 Extreme Component Analysis
    2.3 Sparse Filtering
    2.4 ICA with Reconstruction Cost
    2.5 Canonical Correlation Analysis
    2.6 Slow Feature Analysis
    2.7 K-Means
    2.8 Regularized Information Maximization
    2.9 Stochastic Gradient Variational Bayes
    2.10 Linear Denoiser
    2.11 Recurrent Neural Networks
    2.12 Multilayer Perceptrons
    2.13 Hybrid Monte Carlo
    2.14 Trainers
3 Helpers, convenience functions and tools
    3.1 Feature extraction
    3.2 Data manipulation
    3.3 Various utilities
    3.4 Helpers for plotting data
4 Architectures, Components
    4.1 Norms
    4.2 Transfer functions
    4.3 Loss functions
    4.4 Stochastic Corruption of Theano Variables
    4.5 Miscellaneous functionality
    4.6 Layers
    4.7 Common functions
    4.8 Univariate Normal Distribution
    4.9 Multivariate Normal Distribution
    4.10 Utilities
    4.11 Common functions
5 Implementation Notes
    5.1 Variance propagation
6 Indices and tables
Bibliography
Python Module Index

CHAPTER 1

Basics

1.1 Specifying losses, norms, transfer functions etc.
To maintain flexibility and conciseness, models can be configured in two ways: either by passing a string or by passing a function that follows a specific API.

1.1.1 Using the builtin loss functions

Let us start with an example. To instantiate a linear model, we can use the following notation:

    from breze.learn.glm import Linear
    model = Linear(5, 1, loss='squared')

In this case, we specify the sum of squares loss as a string. The logic behind this aims to be straightforward: for losses, a lookup is done in the module breze.arch.components.distance. Thus, the function breze.arch.component.distance.squared is used as the loss.

This function follows a simple protocol. In the case of a supervised model, it is called with the target as its first argument and the output of the model as its second argument. Both are required to be Theano variables. In the case of an unsupervised model, the output of the model is the only argument passed to the loss.

A list of supervised losses can be found by checking the contents of the breze.arch.components.distance module:

    >>> from breze.arch.components import distance
    >>> dir(distance)
    ['T', '__builtins__', '__doc__', '__file__', '__name__', '__package__',
     'absolute', 'bernoulli_kl', 'bernoulli_neg_cross_entropy',
     'discrete_entropy', 'distance_matrix', 'lookup', 'nca',
     'neg_cross_entropy', 'nominal_neg_cross_entropy', 'norm', 'squared']

Some of these names are, of course, just module-level globals rather than losses.

1.1.2 Using custom loss functions

Using your own loss function comes down to implementing it according to the above protocol and working on Theano variables. We can thus define the sum of squares loss ourselves as follows:

    def squared(target, output):
        d = target - output
        return (d**2).sum()

We can also use more complicated loss functions. The Huber loss, for example, is a mix of the absolute error and the squared error, depending on the size of the error.
It depends on an additional threshold parameter \delta and, for an error d = target - output, is defined as follows:

    L_\delta(d) =
    \begin{cases}
        \frac{1}{2} d^2 & \text{if } |d| \le \delta, \\
        \delta \left( |d| - \frac{\delta}{2} \right) & \text{else.}
    \end{cases}

We can implement this as follows:

    import theano.tensor as T

    delta = 0.1

    def huber(target, output):
        d = target - output
        a = .5 * d**2
        b = delta * (abs(d) - delta / 2.)
        l = T.switch(abs(d) <= delta, a, b)
        return l.sum()

The drawback of this version is that delta has to be set as a global variable. A more elegant solution is to use a function template that closes over delta:

    import theano.tensor as T

    def make_huber(delta):
        def inner(target, output):
            d = target - output
            a = .5 * d**2
            b = delta * (abs(d) - delta / 2.)
            l = T.switch(abs(d) <= delta, a, b)
            return l.sum()
        return inner

    my_huber = make_huber(0.1)

This way we can create arbitrarily elaborate, parameterized loss functions.

1.1.3 Using norms and transfer functions

The story is similar when using norms and transfer functions. In the former case, the module of interest is breze.arch.component.norm. The protocol is that a single argument, a Theano variable, is given. The result is expected to be a Theano variable of the same shape. This is also the case for transfer functions, except that the module in question is breze.arch.component.transfer.

CHAPTER 2

Models and Algorithms

Learning representations, clustering:

2.1 Principal Component Analysis

This module provides functionality for principal component analysis.

class breze.learn.pca.Pca(n_components=None, whiten=False)
    Class to perform principal component analysis.

    Attributes
        n_components     (integer) Number of components to keep.
        whiten           (boolean) Flag indicating whether to whiten the covariance matrix.
        weights          (array_like) 2D array representing the map from observable to latent space.
        singular_values  (array_like) 1D array containing the singular values of the problem.
    Methods
        fit(X)                Fit the parameters of the model.
        inverse_transform(F)  Perform an inverse transformation of transformed data according to the model.
        reconstruct(X)        Reconstruct the data according to the model.
        transform(X)          Transform data according to the model.

    __init__(n_components=None, whiten=False)
        Create a Pca object.

    fit(X)
        Fit the parameters of the model.
        The data should be centered (that is, its mean subtracted rowwise) before using this method.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.

    inverse_transform(F)
        Perform an inverse transformation of transformed data according to the model.
        Parameters
            F : array_like
                An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.
        Returns
            X : array_like
                An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.

    reconstruct(X)
        Reconstruct the data according to the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.
        Returns
            Y : array_like
                An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.

    transform(X)
        Transform data according to the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.
        Returns
            Y : array_like
                An array of shape (n, c) where n is the number of samples and c is the number of components kept.

class breze.learn.pca.Zca(min_eig_val=0.1)
    Class to perform zero-phase component analysis (ZCA).

    Attributes
        min_eig_val      (float) Eigenvalues are increased by this value before reconstructing.
        weights          (array_like) 2D array representing the map from observable to latent space.
        singular_values  (array_like) 1D array containing the singular values of the problem.
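The fit methods of Pca and Zca both expect centered data. The following is a minimal sketch in plain Python of what centering an (n, d) data set means, assuming that "its mean subtracted rowwise" refers to subtracting the per-feature mean (computed over all n rows) from every row. The helper name center is hypothetical and not part of breze.

```python
def center(X):
    """Subtract the per-feature mean from every row of X.

    X is a list of n rows, each a list of d floats (hypothetical helper,
    not part of breze). Returns the centered data and the means.
    """
    n = len(X)
    d = len(X[0])
    # Mean of each of the d features, taken over all n data points.
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Shift every row by the feature means.
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    return centered, means


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Xc, means = center(X)
```

Keeping the means around is useful because the same shift would presumably have to be applied to any new data before calling transform.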
    Methods
        fit(X)                Fit the parameters of the model.
        inverse_transform(F)  Perform an inverse transformation of transformed data according to the model.
        reconstruct(X)        Reconstruct the data according to the model.
        transform(X)          Transform data according to the model.

    __init__(min_eig_val=0.1)
        Create a Zca object.

    fit(X)
        Fit the parameters of the model.
        The data should be centered (that is, its mean subtracted rowwise) before using this method.

    inverse_transform(F)
        Perform an inverse transformation of transformed data according to the model.
        Parameters
            F : array_like
                An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.
        Returns
            X : array_like
                An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.

    reconstruct(X)
        Reconstruct the data according to the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.
        Returns
            Y : array_like
                An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.

    transform(X)
        Transform data according to the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.
        Returns
            Y : array_like
                An array of shape (n, c) where n is the number of samples and c is the number of components kept.

2.2 Extreme Component Analysis

This module provides functionality for extreme component analysis.
An explanation and derivation of the algorithm can be found in [XCA].

class breze.learn.xca.Xca(n_components, whiten=False)
    Class implementing extreme component analysis.
    The idea is that not only the principal components or the minor components of a data set are important, but a combination of the two. This algorithm works by combining probabilistic versions of PCA and MCA.
    The central idea is that if n principal and m minor components are chosen, a gap of size D - m - n dimensions is formed in the list of singular values. The exact location of this gap is found by choosing the one which minimizes a likelihood combining PCA and MCA.

    Attributes
        n_components  (integer) Amount of components kept.

    Methods
        fit(X)                Fit the parameters of the model.
        inverse_transform(F)  Perform an inverse transformation of transformed data according to the model.
        reconstruct(X)        Reconstruct the data according to the model.
        transform(X)          Transform data according to the model.

    __init__(n_components, whiten=False)
        Create an Xca object.
        Parameters
            n_components : integer
                Amount of components to keep.

    fit(X)
        Fit the parameters of the model.
        The data should be centered (that is, its mean subtracted rowwise) before using this method.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.

    inverse_transform(F)
        Perform an inverse transformation of transformed data according to the model.
        Parameters
            F : array_like
                An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.
        Returns
            X : array_like
                An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.

    reconstruct(X)
        Reconstruct the data according to the model.
        Returns
            Y : array_like
                An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.

    transform(X)
        Transform data according to the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.
        Returns
            F : array_like
                An array of shape (n, c) where n is the number of samples and c is the number of components kept.

2.3 Sparse Filtering

Sparse Filtering.
As introduced in "Sparse Filtering", Jiquan Ngiam, Pangwei Koh, Zhenghao Chen, Sonia Bhaskar and Andrew Y. Ng. In NIPS*2011.

class breze.learn.sparsefiltering.SparseFiltering(n_inpt, n_output, feature_transfer='softabs', optimizer='lbfgs', max_iter=1000, verbose=False)

    Attributes
        inpt
        loss
        output
        f_score
        f_transform
        gradient_clip_threshold
        mode

    Methods
        fit(X[, W])                                   Fit the parameters of the model.
        function(variables, exprs[, mode, ...])       Return a compiled function for the given exprs given variables.
        iter_fit(X[, W, info_opt])                    Iteratively fit the parameters of the model to the given data.
        powerfit(fit_data, eval_data, stop, report)   Iteratively fit the model.
        score(X[, W])                                 Return the score of the model given the input and targets.
        transform(X)                                  Return the feature representation of the model given X.
        var_exp_for_gpu(variables, exprs[, outputs])  Given variables and theano expressions built from these variables, return variable

    __init__(n_inpt, n_output, feature_transfer='softabs', optimizer='lbfgs', max_iter=1000, verbose=False)
        Create a SparseFiltering object.
        Parameters
            n_inpt : int
                Input dimensionality of the data.
            n_output : int
                Dimensionality of the hidden feature space.
            feature_transfer : string or callable
                Transfer function to use. Either a string referring to a function found in breze.arch.component.transfer, or a function that, given an (n, d) array, returns an (n, d) array, both as Theano expressions.
            max_iter : int
                Maximum number of optimization iterations to perform.
            verbose : bool
                Flag indicating whether to print out information during fitting.

    fit(X, W=None)
        Fit the parameters of the model.
        Parameters
            X – Array representing the samples.

    iter_fit(X, W=None, info_opt=None)
        Iteratively fit the parameters of the model to the given data.
        Each iteration of the learning algorithm is an iteration of the returned iterator.
        The model is in a valid state after each iteration, so that the optimization can be interrupted at any time by the caller.
        This method does not respect the max_iter attribute.
        Parameters
            X – Array representing the samples.

    transform(X)
        Return the feature representation of the model given X.
        Parameters
            X : array_like
                Represents the inputs to be transformed.
        Returns
            Y : array_like
                Transformation of X under the model.

2.4 ICA with Reconstruction Cost

2.5 Canonical Correlation Analysis

breze.learn.cca.cca(X, Y)
    Canonical Correlation Analysis
    Parameters
        X : array_like
            Observation matrix in first space, every column is one data point.
        Y : array_like
            Observation matrix in second space, every column is one data point.
    Returns
        cA : array_like
            Basis in X space.
        B : array_like
            Basis in Y space.
        clambdas : array_like
            Correlation.

2.6 Slow Feature Analysis

Slow Feature Analysis.
This module provides functionality for slow feature analysis. A helpful article is hosted at scholarpedia.

class breze.learn.sfa.SlowFeatureAnalysis(n_components=None)
    Class for performing slow feature analysis.

    Attributes
        n_components  (integer) Number of components to keep.

    Methods
        fit(X)        Fit the parameters of the model.
        transform(X)  Transform data according to the model.

    __init__(n_components=None)
        Create a SlowFeatureAnalysis object.
        Parameters
            n_components : integer
                Amount of components to keep.

    fit(X)
        Fit the parameters of the model.
        The data should be centered (that is, its mean subtracted rowwise) and whitened (e.g. via pca.Pca) before using this method.
        Parameters
            X : list of array_like
                A list of sequences. Each entry is expected to be an array of shape (*, d) where * is the number of data points and may vary from item to item in the list. d is the input dimensionality and has to be consistent.
        Returns
            F : list of array_like
                List of sequences. Each item in the list is an array which corresponds to the sequence in X.
                It is of the same shape, except that d is replaced by n_components.

    transform(X)
        Transform data according to the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of time steps and d the input dimensionality.
        Returns
            F : array_like
                An array of shape (n, c) where n is the number of time steps and c is the number of components kept.

2.7 K-Means

class breze.learn.kmeans.GainShapeKMeans(n_component, zscores=False, whiten=False, c_zca=1e-08, max_iter=10, random_state=None)
    GainShapeKMeans class to perform K-means clustering for feature learning as described in [LFRKM].
    Parameters
        n_components : integer
            Number of features to learn.
        zscores : boolean, optional, default: False
            Flag indicating whether the data should be normalized to zero mean and unit variance before training and transformation.
        whiten : boolean, optional, default: False
            Flag indicating whether the data should be whitened before training and transformation.
        c_zca : float, optional, default: 1e-8
            Small number that is added to each singular value during ZCA.
        max_iter : integer, optional
            Maximum number of iterations to perform.
        random_state : None, integer or numpy.RandomState, optional, default: None
            Generator to initialize the dictionary. If None, the numpy singleton generator is used.

    References
        [LFRKM]

    Attributes
        activation  ({'identity', 'omp-1', 'soft-threshold'}, optional, default: None) Activation to use for transformation. 'identity' does not alter the output. 'omp-1' only retains the component with the largest absolute value. 'soft-threshold' only sets components below a certain threshold to zero, but separates positive and negative parts.
        threshold   (scalar) Threshold used for soft-thresholding activation. Ignored if another activation is used.

    Methods
        fit(X)                       Fit the parameters of the model.
        iter_fit(X)
        normalize_dict()             Normalize the columns of the dictionary to unit length.
        prepare(n_inpt)              Initialize the model's internal structures.
        transform(X[, activation])   Transform the data according to the dictionary.

    fit(X)
        Fit the parameters of the model.
        Parameters
            X : array_like
                Array of shape (n_samples, n_inpt) used for training.

    transform(X, activation=None)
        Transform the data according to the dictionary.
        Parameters
            X : array_like
                Input data of shape (n_samples, n_inpt).
            activation : {'identity', 'omp-1', 'soft-threshold'}, optional, default: None
                Activation to use. 'identity' does not alter the output. 'omp-1' only retains the component with the largest absolute value. 'soft-threshold' only sets components below a certain threshold to zero, but separates positive and negative parts. If None, .activation is used.

2.8 Regularized Information Maximization

Regularized Information Maximization.
As introduced in [R3].

2.8.1 References

class breze.learn.rim.Rim(n_inpt, n_cluster, c_rim, optimizer='rprop', max_iter=1000, verbose=False)
    Class for regularized information maximization.

    Attributes
        parameters  (ParameterSet object) Parameters of the model.
        n_inpt      (integer) Input dimensionality of the data.
        n_cluster   (integer) Amount of clusters to use.
        c_rim       (float) Value indicating the regularization strength.
        optimizer   (string or pair) Can be either a string or a pair. In any case, climin.util.optimizer is used to construct an optimizer. In the case of a string, the string is used as an identifier for the optimizer, which is then instantiated with default arguments. If a pair, it is expected to be (identifier, kwargs) for finer control of the optimizer.
        max_iter    (integer) Maximum number of optimization iterations to perform.
        verbose     (boolean) Flag indicating whether to print out information during fitting.

    Methods
        fit(X[, W])                                   Fit the parameters of the model.
        function(variables, exprs[, mode, ...])       Return a compiled function for the given exprs given variables.
        iter_fit(X[, W, info_opt])                    Iteratively fit the parameters of the model to the given data.
        powerfit(fit_data, eval_data, stop, report)   Iteratively fit the model.
        score(X[, W])                                 Return the score of the model given the input and targets.
        transform(X)                                  Return the feature representation of the model given X.
        var_exp_for_gpu(variables, exprs[, outputs])  Given variables and theano expressions built from these variables, return variable

    __init__(n_inpt, n_cluster, c_rim, optimizer='rprop', max_iter=1000, verbose=False)
        Create a Rim object.
        Parameters
            n_inpt : integer
                Input dimensionality of the data.
            n_cluster : integer
                Amount of clusters to use.
            c_rim : float
                Value indicating the regularization strength.
            optimizer : string or pair
                Can be either a string or a pair. In any case, climin.util.optimizer is used to construct an optimizer. In the case of a string, the string is used as an identifier for the optimizer, which is then instantiated with default arguments. If a pair, it is expected to be (identifier, kwargs) for finer control of the optimizer.
            max_iter : integer
                Maximum number of optimization iterations to perform.
            verbose : boolean
                Flag indicating whether to print out information during fitting.

    fit(X, W=None)
        Fit the parameters of the model.
        Parameters
            X – Array representing the samples.

    iter_fit(X, W=None, info_opt=None)
        Iteratively fit the parameters of the model to the given data.
        Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state after each iteration, so that the optimization can be interrupted at any time by the caller.
        This method does not respect the max_iter attribute.
        Parameters
            X – Array representing the samples.

    transform(X)
        Return the feature representation of the model given X.
        Parameters
            X : array_like
                Represents the inputs to be transformed.
        Returns
            Y : array_like
                Transformation of X under the model.

2.9 Stochastic Gradient Variational Bayes

2.9.1 Variational Autoencoder

Denoising:

2.10 Linear Denoiser

Module for the linear denoiser.

class breze.learn.lde.LinearDenoiser(p_dropout)
    Class that represents linear denoisers.
    LinearDenoisers (LDEs) were later also named Marginalized Denoising AutoEncoders.
    Introduced in [R1].

    References
        [R1]

    Methods
        fit(X)        Fit the parameters of the model.
        transform(X)  Transform data according to the model.

    __init__(p_dropout)
        Create a LinearDenoiser object.
        Parameters
            p_dropout : float
                Probability of an input being dropped out.

    fit(X)
        Fit the parameters of the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.

    transform(X)
        Transform data according to the model.
        Parameters
            X : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.
        Returns
            Y : array_like
                An array of shape (n, d) where n is the number of data points and d the input dimensionality.

Supervised Learning

2.11 Recurrent Neural Networks

Module for learning various types of recurrent networks.
class breze.learn.rnn.SupervisedRnn(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer='identity', loss='squared', pooling=None, gradient_clip=False, optimizer='rprop', batch_size=None, imp_weight=False, max_iter=1000, verbose=False)

    Attributes
        inpt
        loss
        output
        sample_dim
        target
        f_predict
        f_score
        gradient_clip_threshold
        mode

    Methods
        fit(X, Z[, imp_weight])                       Fit the parameters of the model to the given data with the given error function.
        function(variables, exprs[, mode, ...])       Return a compiled function for the given exprs given variables.
        initialize([par_std, par_std_affine, ...])
        iter_fit(X, Z[, imp_weight, info_opt])        Iteratively fit the parameters of the model to the given data with the given error function.
        powerfit(fit_data, eval_data, stop, report)   Iteratively fit the model.
        predict(X)                                    Return the prediction of the model given the input.
        score(X, Z[, imp_weight])                     Return the score of the model given the input and targets.
        var_exp_for_gpu(variables, exprs[, outputs])  Given variables and theano expressions built from these variables, return variable

    fit(X, Z, imp_weight=None)
        Fit the parameters of the model to the given data with the given error function.
        Parameters
            • X – Array representing the inputs.
            • Z – Array representing the outputs.

    iter_fit(X, Z, imp_weight=None, info_opt=None)
        Iteratively fit the parameters of the model to the given data with the given error function.
        Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state after each iteration, so that the optimization can be interrupted at any time by the caller.
        This method does not respect the max_iter attribute.
        Parameters
            • X – Array representing the inputs.
            • Z – Array representing the outputs.

    predict(X)
        Return the prediction of the model given the input.
        Parameters
            X : array_like
                Input to the model.
        Returns
            Y : array_like

2.12 Multilayer Perceptrons

Module for learning various types of multilayer perceptrons.

class breze.learn.mlp.Mlp(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, imp_weight=False, optimizer='adam', batch_size=None, max_iter=1000, verbose=False)
    Multilayer perceptron class.
    This implementation uses a stack of affine mappings, each followed by a nonlinearity.
    Parameters
        n_inpt : integer
            Dimensionality of a single input.
        n_hiddens : list of integers
            List of k integers, where k is the number of layers. Each gives the size of the corresponding layer.
        n_output : integer
            Dimensionality of a single output.
        hidden_transfers : list, each item either string or function
            Transfer functions for each of the layers. Each can be either a string, which is then used to look up a transfer function in breze.component.transfer, or a function that given a Theano tensor returns a tensor of the same shape.
        out_transfer : string or function
            Either a string to look up a function in breze.component.transfer or a function that given a Theano tensor returns a tensor of the same shape.
        optimizer : string, pair
            Argument is passed to climin.util.optimizer to construct an optimizer.
        batch_size : integer, None
            Number of examples per batch when calculating the loss and its derivatives. None means to use all samples every time.
        imp_weight : boolean
            Flag indicating whether importance weights are used.
        max_iter : int
            Maximum number of optimization iterations to perform. Only respected during .fit(), not .iter_fit().
        verbose : boolean
            Flag indicating whether to print out information during fitting.
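The Mlp description above (a stack of affine mappings, each followed by a transfer function) can be illustrated with a small forward pass in plain Python. This is only a sketch of the idea, not breze's Theano implementation; the names affine, mlp_forward and the hand-picked weights are hypothetical.

```python
import math


def affine(x, W, b):
    """Compute x W + b for one input vector x; W has shape (len(x), len(b))."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def tanh(v):
    """Elementwise tanh transfer function."""
    return [math.tanh(a) for a in v]


def identity(v):
    """Transfer function that leaves its input unchanged."""
    return list(v)


def mlp_forward(x, layers):
    """Apply each (W, b, transfer) triple in order to the input vector x."""
    h = x
    for W, b, transfer in layers:
        h = transfer(affine(h, W, b))
    return h


# Corresponds to n_inpt=2, n_hiddens=[3], n_output=1 with made-up weights.
hidden = ([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]], [0.0, 0.0, 0.0], tanh)
output = ([[1.0], [1.0], [1.0]], [0.5], identity)
y = mlp_forward([0.0, 0.0], [hidden, output])
```

Passing transfer functions as callables mirrors the convention that hidden_transfers and out_transfer accept either a string or a function mapping a tensor to a tensor of the same shape.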
    Attributes
        inpt
        loss
        output
        target
        f_predict
        f_score
        gradient_clip_threshold
        mode

    Methods
        fit(X, Z[, imp_weight])                       Fit the parameters of the model to the given data with the given error function.
        function(variables, exprs[, mode, ...])       Return a compiled function for the given exprs given variables.
        iter_fit(X, Z[, imp_weight, info_opt])        Iteratively fit the parameters of the model to the given data with the given error function.
        powerfit(fit_data, eval_data, stop, report)   Iteratively fit the model.
        predict(X)                                    Return the prediction of the model given the input.
        score(X, Z[, imp_weight])                     Return the score of the model given the input and targets.
        var_exp_for_gpu(variables, exprs[, outputs])  Given variables and theano expressions built from these variables, return variable

    fit(X, Z, imp_weight=None)
        Fit the parameters of the model to the given data with the given error function.
        Parameters
            • X – Array representing the inputs.
            • Z – Array representing the outputs.

    iter_fit(X, Z, imp_weight=None, info_opt=None)
        Iteratively fit the parameters of the model to the given data with the given error function.
        Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state after each iteration, so that the optimization can be broken any time by the caller.
        This method does not respect the max_iter attribute.
        Parameters
            • X – Array representing the inputs.
            • Z – Array representing the outputs.

    predict(X)
        Return the prediction of the model given the input.
        Parameters
            X : array_like
                Input to the model.
        Returns
            Y : array_like

class breze.learn.mlp.DropoutMlp(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, p_dropout_inpt=0.2, p_dropout_hiddens=0.5, max_length=None, optimizer='adam', batch_size=None, max_iter=1000, verbose=False)
    Class representing an MLP that is trained with dropout [D1].
    The gist of this method is that hidden units and input units are "zeroed out" with a certain probability.

    References
        [D1]

    Attributes
        Same attributes as an Mlp object.
        p_dropout_inpt    (float) Probability that an input unit is omitted during a pass.
        p_dropout_hidden  (float) Probability that a hidden unit is omitted during a pass.
        max_length        (float) Maximum squared length of a weight vector into a unit. After each update, the weight vectors will be projected to be shorter.

    Methods
        fit(X, Z[, imp_weight])                       Fit the parameters of the model to the given data with the given error function.
        function(variables, exprs[, mode, ...])       Return a compiled function for the given exprs given variables.
        iter_fit(X, Z[, imp_weight, info_opt])        Iteratively fit the parameters of the model to the given data with the given error function.
        powerfit(fit_data, eval_data, stop, report)   Iteratively fit the model.
        predict(X)                                    Return the prediction of the model given the input.
        score(X, Z[, imp_weight])                     Return the score of the model given the input and targets.
        var_exp_for_gpu(variables, exprs[, outputs])  Given variables and theano expressions built from these variables, return variable

    __init__(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, p_dropout_inpt=0.2, p_dropout_hiddens=0.5, max_length=None, optimizer='adam', batch_size=None, max_iter=1000, verbose=False)
        Create a DropoutMlp object.
        Parameters
            Same parameters as an Mlp object.
            p_dropout_inpt : float
                Probability that an input unit is omitted during a pass.
            p_dropout_hiddens : list of floats
                List of which each item gives the probability that a hidden unit of that layer is omitted during a pass.

    fit(X, Z, imp_weight=None)
        Fit the parameters of the model to the given data with the given error function.
        Parameters
            • X – Array representing the inputs.
            • Z – Array representing the outputs.
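The two mechanisms named in the DropoutMlp description, zeroing out units with a given probability and keeping the squared length of each weight vector below max_length, can be sketched in plain Python. This is an illustration under assumptions, not breze's implementation: the helper names are hypothetical, and "projected to be shorter" is assumed here to mean rescaling the vector onto the allowed length.

```python
import random


def dropout(values, p_dropout, rng):
    """Zero each entry independently with probability p_dropout."""
    return [0.0 if rng.random() < p_dropout else v for v in values]


def project_to_max_length(w, max_length):
    """Rescale w so that its squared length does not exceed max_length.

    If max_length is None, no projection is performed.
    """
    if max_length is None:
        return list(w)
    sq_len = sum(v * v for v in w)
    if sq_len <= max_length:
        return list(w)
    # Shrink uniformly so that the squared length equals max_length.
    scale = (max_length / sq_len) ** 0.5
    return [v * scale for v in w]


rng = random.Random(0)
# Drop input units with probability 0.2, as in the default p_dropout_inpt.
dropped = dropout([1.0, 2.0, 3.0, 4.0], 0.2, rng)
# A weight vector of squared length 25 projected to squared length 1.
w = project_to_max_length([3.0, 4.0], 1.0)
```

In this sketch the projection would be applied to each unit's incoming weight vector after every parameter update.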
Multilayer Perceptrons 19 breze Documentation, Release 0.1 iter_fit(X, Z, imp_weight=None, info_opt=None) Iteratively fit the parameters of the model to the given data with the given error function. Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state after each iteration, so that the optimization can be broken any time by the caller. This method does not respect the max_iter attribute. Parameters • X – Array representing the inputs. • Z – Array representing the outputs. predict(X) Return the prediction of the model given the input. Parameters X : array_like Input to the model. Returns Y : array_like class breze.learn.mlp.FastDropoutNetwork(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, imp_weight=False, optimizer=’adam’, batch_size=None, p_dropout_inpt=0.2, p_dropout_hiddens=0.5, max_iter=1000, verbose=False) Class representing an MLP that is trained with fast dropout [FD]. This method employs a smooth approximation of dropout training. References [FD] Attributes Same attributes as an Mlp object. p_dropout_inpt p_dropout_hiddens inpt_var (float) Probability that an input unit is ommitted during a pass. (list of floats) Each item constitues the probability that a hidden unit of the corresponding layer is ommitted during a pass. (float) Assumed variance of the inputs. “quasi zero” per default. Methods fit(X, Z[, imp_weight]) function(variables, exprs[, mode, ...]) iter_fit(X, Z[, imp_weight, info_opt]) powerfit(fit_data, eval_data, stop, report) predict(X) score(X, Z[, imp_weight]) var_exp_for_gpu(variables, exprs[, outputs]) 20 Fit the parameters of the model to the given data with the given error function. Return a compiled function for the given exprs given variables. Iteratively fit the parameters of the model to the given data with the given error fu Iteratively fit the model. Return the prediction of the model given the input. 
Return the score of the model given the input and targets. Given variables and theano expressions built from these variables, return variable Chapter 2. Models and Algorithms breze Documentation, Release 0.1 __init__(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, imp_weight=False, optimizer=’adam’, batch_size=None, p_dropout_inpt=0.2, p_dropout_hiddens=0.5, max_iter=1000, verbose=False) Create a FastDropoutMlp object. Parameters Same parameters as an ‘‘Mlp‘‘ object. p_dropout_inpt : float Probability that an input unit is ommitted during a pass. p_dropout_hidden : float Probability that an input unit is ommitted during a pass. max_length : float or None Maximum squared length of a weight vector into a unit. After each update, the weight vectors will projected to be shorter. If None, no projection is performed. fit(X, Z, imp_weight=None) Fit the parameters of the model to the given data with the given error function. Parameters • X – Array representing the inputs. • Z – Array representing the outputs. iter_fit(X, Z, imp_weight=None, info_opt=None) Iteratively fit the parameters of the model to the given data with the given error function. Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state after each iteration, so that the optimization can be broken any time by the caller. This method does not respect the max_iter attribute. Parameters • X – Array representing the inputs. • Z – Array representing the outputs. predict(X) Return the prediction of the model given the input. Parameters X : array_like Input to the model. Returns Y : array_like Sampling 2.13 Hybrid Monte Carlo breze.learn.sampling.hmc.sample(f_energy, f_energy_prime, position, n_steps, desired_accept=0.9, initial_step_size=0.01, step_size_grow=1.02, step_size_shrink=0.98, step_size_min=0.0001, step_size_max=0.25, avg_accept_slowness=0.9, sample_dim=0) Return a sample from the distribution given by f_energy. Parameters 2.13. 
        • f_energy – Log of a function proportional to the density.
        • f_energy_prime – Derivative of f_energy with respect to the current position.
        • position – A numpy array of any desired shape which represents multiple particles.
        • n_steps – Number of steps to perform for the next sample.
        • desired_accept – Desired acceptance rate of the underlying Metropolis-Hastings step.
        • initial_step_size – Initial size of a step along the energy landscape.
        • step_size_grow – If the acceptance rate is too high, increase the step size by this factor.
        • step_size_shrink – If the acceptance rate is too low, decrease the step size by this factor.
        • step_size_min – Don't decrease the step size below this value.
        • step_size_max – Don't increase the step size above this value.
        • avg_accept_slowness – When calculating the acceptance rate, use this value as a decay for an exponential average.
        • sample_dim – The axis which discriminates the different particles given in the position array from each other.

Trainers

2.14 Trainers

2.14.1 Trainer module

2.14.2 Score module

Module for various scoring strategies.

breze.learn.trainer.score.simple(f_score, *data)
    Simple scoring strategy which just applies f_score to the passed arguments.

class breze.learn.trainer.score.MinibatchScore(max_samples, sample_dims)
    MinibatchScore class.

    Scoring strategy for very large data sets, where the score of only a subset of rows can be calculated at the same time. This strategy assumes that scores are averages.

    Attributes

        max_samples    (int) Maximum number of samples to calculate the score for at the same time.
        sample_dims    (list of ints) Dimensions along which the samples are stored. The length of this list corresponds to the number of arguments the score takes. Each entry gives the axis along which the different samples of the corresponding argument are stored.

    Methods

        __call__(f_score, *data)    Return the score of the data.
    __init__(max_samples, sample_dims)
        Create MinibatchScore object.

        Parameters
            max_samples : int
                Maximum number of samples to calculate the score for at the same time.
            sample_dims : list of ints
                Dimensions along which the samples are stored. The length of this list corresponds to the number of arguments the score takes. Each entry gives the axis along which the different samples of the corresponding argument are stored.

2.14.3 Report module

CHAPTER 3
Helpers, convenience functions and tools

3.1 Feature extraction

3.1.1 Basic feature extraction

breze.learn.feature.rbf(X, n_centers)
    Return a design matrix with features given by radial basis functions.

    n_centers Gaussian kernels are placed along each data dimension, equidistant between the minimum and the maximum along that dimension. The result then contains one column for each of the kernels.

    Parameters
        • X – NxD sized array.
        • n_centers – Number of kernels to use for each dimension.
    Returns
        Nx(n_centers * D) sized array.

3.1.2 Feature extraction for EMG and similar time series data

Module that holds various preprocessing routines for EMG signals.

breze.learn.feature.emg.integrated(X)
    Return the sum of the absolute values of a signal.

    Parameters
        X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
    Returns
        An (n, d) array.

breze.learn.feature.emg.mean_absolute_value(X)
    Return the mean absolute value of the signal.

    Parameters
        X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
    Returns
        An (n, d) array.

breze.learn.feature.emg.modified_mean_absolute_value_1(X)
    Return a weighted version of the mean absolute value.

    Instead of equal weights, the first and last quarter of the signal are weighted only half.
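The weighting described for modified_mean_absolute_value_1 can be sketched in NumPy as follows. This is an illustration, not breze's implementation: the name mmav1_sketch is hypothetical, and both the quarter boundaries (integer division of the number of time steps) and the normalization by t are assumptions.

```python
import numpy as np


def mmav1_sketch(X):
    """Weighted mean absolute value over time for a (t, n, d) array.

    Hypothetical helper, not breze's code. Assumption: the first and last
    t // 4 time steps get weight 0.5, all other steps get weight 1, and the
    weighted sum is divided by the number of time steps t.
    """
    X = np.asarray(X, dtype=float)
    t = X.shape[0]
    quarter = t // 4
    weights = np.ones(t)
    weights[:quarter] = 0.5
    weights[t - quarter:] = 0.5
    # Broadcast the per-time-step weights over the signal and channel axes.
    return (weights[:, None, None] * np.abs(X)).sum(axis=0) / t


X = -np.ones((8, 2, 3))      # 8 time steps, 2 signals, 3 channels
features = mmav1_sketch(X)   # one value per signal and channel
```

With eight time steps, two steps at each end are down-weighted, so a constant signal of magnitude one yields a feature below one.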
breze.learn.feature.emg.modified_mean_absolute_value_2(X)
    Return a weighted version of the mean absolute value.

    The central half of the signal has weight one. The weights of the first and last quarter increase towards and decrease from that value, respectively.

    Parameters
        X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
    Returns
        An (n, d) array.

breze.learn.feature.emg.mean_absolute_value_slope(X)
    Return the first derivative of the mean absolute value.

    Parameters
        X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
    Returns
        An (n, d) array.

breze.learn.feature.emg.variance(X)
    Return the variance of the signals.

    Parameters
        X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
    Returns
        An (n, d) array.

breze.learn.feature.emg.root_mean_square(X)
    Return the root mean square of the signals.

    Parameters
        X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
    Returns
        An (n, d) array.

breze.learn.feature.emg.zero_crossing(X, threshold=1e-08)
    Return the number of times the signal crosses zero.

    Parameters
        • X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
        • threshold – Changes below this value are ignored. Useful to suppress noise.
    Returns
        An (n, d) array.

breze.learn.feature.emg.slope_sign_change(X, threshold=1e-08)
    Return the number of times the signal changes slope.

    Parameters
        • X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
        • threshold – Changes below this value are ignored. Useful to suppress noise.
    Returns
        An (n, d) array.
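The (t, n, d) to (n, d) reductions above can be sketched in plain NumPy. The functions below are hypothetical stand-ins (note the _sketch suffix), not breze's code. In particular, the zero-crossing rule used here, a sign change between consecutive samples whose absolute difference is at least threshold, is an assumption about how "changes below this value are ignored" might be realized.

```python
import numpy as np


def integrated_sketch(X):
    """Sum of absolute values over the time axis: (t, n, d) -> (n, d)."""
    return np.abs(X).sum(axis=0)


def root_mean_square_sketch(X):
    """Root mean square over the time axis: (t, n, d) -> (n, d)."""
    return np.sqrt((X ** 2).mean(axis=0))


def zero_crossing_sketch(X, threshold=1e-8):
    """Count sign changes between consecutive time steps.

    Assumption: a crossing only counts if the two samples differ by at
    least `threshold` in absolute value.
    """
    sign_change = X[:-1] * X[1:] < 0
    large_enough = np.abs(X[:-1] - X[1:]) >= threshold
    return (sign_change & large_enough).sum(axis=0)


# Toy signal with t=4 time steps, n=1 signal and d=1 channel.
X = np.array([1.0, -1.0, 2.0, -2.0]).reshape(4, 1, 1)
total = integrated_sketch(X)
rms = root_mean_square_sketch(X)
crossings = zero_crossing_sketch(X)
crossings_coarse = zero_crossing_sketch(X, threshold=2.5)
```

Raising the threshold drops the smallest jump in the toy signal, which is the noise-suppression effect the threshold parameter is described as providing.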
breze.learn.feature.emg.willison_amplitude(X, threshold=1e-08)
    Return the number of times the difference between two adjacent EMG segments exceeds a threshold.

    Parameters
        • X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.
        • threshold – Changes below this value are ignored. Useful to suppress noise.
    Returns
        An (n, d) array.

3.2 Data manipulation

Module for manipulating data.

breze.learn.data.shuffle(data)
    Shuffle the first dimension of an indexable object in place.

breze.learn.data.padzeros(lst, front=True, return_mask=False)
    Given a list of arrays, pad every array with zeros at the front until they all have the same length.

    The elements of lst may differ in their first dimension, but have to agree in all other dimensions.

breze.learn.data.collapse_seq_borders(arr)
    Given an array of ndim 3, return a view of ndim 2 where the first dimension is flattened out.

breze.learn.data.uncollapse_seq_borders(arr, shape)
    Return a view of ndim 3, given an array of ndim 2, where the first dimension is expanded to 2 dimensions of the given shape.

breze.learn.data.skip(X, n, d=1)
    Return an array X with the same number of rows, but in which only each n'th block of d consecutive columns is kept.

    Crude way of reducing the dimensionality of time series.

breze.learn.data.interleave(lst)
    Given a list of arrays, interleave the arrays in a way that the first dimension represents the first dimension of every array.

    This is useful for time series, where multiple time series should be processed in a single pass.

breze.learn.data.uninterleave(lst)
    Given an array of interleaved arrays, return an uninterleaved version of it.
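As a rough illustration of the two "seq borders" helpers, the following NumPy sketch flattens the first two axes of a 3D array and restores them again. The _sketch functions are hypothetical and only mirror the documented shapes. That the flattened array has shape (t * n, d), and that shape is the pair (t, n) to restore, are assumptions read off the descriptions above.

```python
import numpy as np


def collapse_seq_borders_sketch(arr):
    """Return a 2D view of a contiguous 3D array: (t, n, d) -> (t * n, d)."""
    t, n, d = arr.shape
    return arr.reshape(t * n, d)


def uncollapse_seq_borders_sketch(arr, shape):
    """Return a 3D view of a 2D array: (t * n, d) -> (t, n, d).

    `shape` is assumed to be the pair (t, n) of leading dimensions.
    """
    return arr.reshape(tuple(shape) + arr.shape[1:])


seqs = np.arange(24.0).reshape(2, 3, 4)   # t=2, n=3, d=4
flat = collapse_seq_borders_sketch(seqs)
restored = uncollapse_seq_borders_sketch(flat, (2, 3))
```

For a contiguous input, reshape returns a view, so writing into flat also changes seqs; this matches the "view" wording in the descriptions.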
breze.learn.data.interpolate(X, n_intermediates, kind='linear')
    Given an array of shape (j, k), return an array of shape (j * n_intermediates, k) where each i * n_intermediates element refers to the i'th element in X while all the others are linearly interpolated.

breze.learn.data.windowify(X, size, offset=1)
    Return a static array that represents a sliding window dataset of size size given by the list of arrays X.

breze.learn.data.iter_windows(X, size, offset=1)
    Return an iterator that goes over a sequential dataset with a sliding time window.

    X is expected to be a list of arrays, where each array represents a sequence along its first axis.

breze.learn.data.split(X, maxlength)
    Return a list of sequences where each sequence has a length of at most maxlength.

    Given a list of sequences X, the sequences are split accordingly.

breze.learn.data.collapse(X, n)
    Return a list of sequences, where n consecutive timesteps have been collapsed into a single timestep by concatenation for each sequence.

    Timesteps are cut off to ensure divisibility by n.

breze.learn.data.uncollapse(X, n)
    Return a list of sequences, where each timestep is divided into n consecutive timesteps.

breze.learn.data.consecutify(seqs)
    Given sequences of equal second dimension, put them into a consecutive memory block M and return it. Also return a list of views to that block that represent the given sequences.

3.3 Various utilities

This function was taken from the deeplearning tutorials. The copyright notice is in the source.

3.4 Helpers for plotting data

breze.learn.display.scatterplot_matrix(X, C=None, symb='o', alpha=1, fig=None)
    Return a figure containing a scatter plot matrix.

    This is a useful tool for inspecting multi dimensional data. Each dimension will be plotted against each dimension as a scatter plot, arranged into a matrix. The diagonal will contain histograms.

    Parameters
        X : array_like
            2D array containing the points to plot.
        C : array_like
            Class labels (optional). Each row of X with the same value in C will be given the same color in the plots.
        symb : string
            Symbol to use for plotting. Will be forwarded to pylab.plot.
        alpha : float
            Between 0 and 1. Transparency of the points, where 1 means fully opaque.
        fig : matplotlib.pyplot.Figure or None
            Figure to plot into. If None, a new figure is created.

breze.learn.display.time_series_filter_plot(filters, n_rows=None, n_cols=None, fig=None)
    Plot filters for time series data.

    Each filter is plotted into its own axis.

    Parameters
        filters : array_like
            The argument filters is expected to be an array of shape (n_filters, window_size, n_channels). n_filters is the number of filter banks, window_size is the length of a time window and n_channels is the number of different sensors.
        n_rows : int, optional, default: None
            Number of rows for the plot. If not given, inferred from n_cols to match dimensions. If n_cols is not given as well, both are taken to be roughly the square root of the number of filters.
        n_cols : int, optional, default: None
            Number of columns for the plot. If not given, inferred from n_rows to match dimensions. If n_rows is not given as well, both are taken to be roughly the square root of the number of filters.
        fig : Figure, optional
            Figure to plot the axes into. If not given, a new one is created.
    Returns
        figure : matplotlib figure
            Figure object to save or plot.

The following function was adapted from the scipy cookbook.

breze.learn.display.hinton(ax, W, max_weight=None)
    Draws a Hinton diagram for the matrix W to axis ax.

CHAPTER 4
Architectures, Components

4.1 Norms

Module containing various norms.

breze.arch.component.norm.l2(arr, axis=None)
    Return the L2 norm of a tensor.

    Parameters
        arr : Theano variable.
            The variable to calculate the norm of.
        axis : integer, optional [default: None]
            The sum will be performed along this axis. This makes it possible to calculate the norm of many tensors in parallel, given they are organized along some axis. If not given, the norm will be computed for the whole tensor.
    Returns
        res : Theano variable.
            If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension less, where the missing dimension corresponds to axis.

    Examples
        >>> import theano.tensor as T
        >>> from breze.arch.component.norm import l2
        >>> v = T.vector()
        >>> this_norm = l2(v)
        >>> m = T.matrix()
        >>> this_norm = l2(m, axis=1)
        >>> m = T.matrix()
        >>> this_norm = l2(m)

breze.arch.component.norm.l1(arr, axis=None)
    Return the L1 norm of a tensor.

    Parameters
        arr : Theano variable.
            The variable to calculate the norm of.
        axis : integer, optional [default: None]
            The sum will be performed along this axis. This makes it possible to calculate the norm of many tensors in parallel, given they are organized along some axis. If not given, the norm will be computed for the whole tensor.
    Returns
        res : Theano variable.
            If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension less, where the missing dimension corresponds to axis.

    Examples
        >>> import theano.tensor as T
        >>> from breze.arch.component.norm import l1
        >>> v = T.vector()
        >>> this_norm = l1(v)
        >>> m = T.matrix()
        >>> this_norm = l1(m, axis=1)
        >>> m = T.matrix()
        >>> this_norm = l1(m)

breze.arch.component.norm.soft_l1(inpt, eps=1e-08, axis=None)
    Return a "soft" L1 norm of a tensor.

    The term "soft" is used because we are using $\sqrt{x^2 + \epsilon}$ in favor of $|x|$, which is not smooth at $x = 0$.

    Parameters
        arr : Theano variable.
            The variable to calculate the norm of.
        eps : float, optional [default: 1e-8]
            Small offset to make the function more smooth.
        axis : integer, optional [default: None]
            The sum will be performed along this axis. This makes it possible to calculate the norm of many tensors in parallel, given they are organized along some axis.
            If not given, the norm will be computed for the whole tensor.
    Returns
        res : Theano variable.
            If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension less, where the missing dimension corresponds to axis.

    Examples
        >>> import theano.tensor as T
        >>> from breze.arch.component.norm import soft_l1
        >>> v = T.vector()
        >>> this_norm = soft_l1(v)
        >>> m = T.matrix()
        >>> this_norm = soft_l1(m, axis=1)
        >>> m = T.matrix()
        >>> this_norm = soft_l1(m)

breze.arch.component.norm.lp(inpt, p, axis=None)
    Return the Lp norm of a tensor.

    Parameters
        arr : Theano variable.
            The variable to calculate the norm of.
        p : Theano variable or float.
            Order of the norm.
        axis : integer, optional [default: None]
            The sum will be performed along this axis. This makes it possible to calculate the norm of many tensors in parallel, given they are organized along some axis. If not given, the norm will be computed for the whole tensor.
    Returns
        res : Theano variable.
            If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension less, where the missing dimension corresponds to axis.

    Examples
        >>> import theano.tensor as T
        >>> from breze.arch.component.norm import lp
        >>> v = T.vector()
        >>> this_norm = lp(v, .5)
        >>> m = T.matrix()
        >>> this_norm = lp(m, 3, axis=1)
        >>> m = T.matrix()
        >>> this_norm = lp(m, 4)

4.2 Transfer functions

Module that keeps various transfer functions as used in the context of neural networks.

breze.arch.component.transfer.tanh(inpt)
    Tanh activation function.

    Parameters
        inpt : Theano variable
            Input to be transformed.
    Returns
        output : Theano variable
            Transformed output. Same shape as inpt.

breze.arch.component.transfer.tanhplus(inpt)
    Tanh with added linear activation function.

    $f(x) = \tanh(x) + x$

    Parameters
        inpt : Theano variable
            Input to be transformed.
    Returns
        output : Theano variable
            Transformed output. Same shape as inpt.

breze.arch.component.transfer.sigmoid(inpt)
    Sigmoid activation function.

    $f(x) = \frac{1}{1 + \exp(-x)}$

    Parameters
        inpt : Theano variable
            Input to be transformed.
    Returns
        output : Theano variable
            Transformed output. Same shape as inpt.

breze.arch.component.transfer.rectifier(inpt)
    Rectifier activation function.

    $f(x) = \max(0, x)$

    Parameters
        inpt : Theano variable
            Input to be transformed.
    Returns
        output : Theano variable
            Transformed output. Same shape as inpt.

breze.arch.component.transfer.softplus(inpt)
    Soft plus activation function. Smooth approximation to rectifier.

    $f(x) = \log(1 + \exp(x))$

    Parameters
        inpt : Theano variable
            Input to be transformed.
    Returns
        output : Theano variable
            Transformed output. Same shape as inpt.

breze.arch.component.transfer.softsign(inpt)
    Softsign activation function.

    $f(x) = \frac{x}{1 + |x|}$

    Parameters
        inpt : Theano variable
            Input to be transformed.
    Returns
        output : Theano variable
            Transformed output. Same shape as inpt.

breze.arch.component.transfer.softmax(inpt)
    Softmax activation function.

    $f(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$

    Here, the index $j$ runs over the columns of inpt.

    Numerically stable version that subtracts the maximum of each row from all of its entries.

    Wrapper for theano.tensor.nnet.softmax.

    Parameters
        inpt : Theano variable
            Array of shape (n, d). Input to be transformed.
    Returns
        output : Theano variable
            Transformed output. Same shape as inpt.

4.3 Loss functions

Module containing several losses usable for supervised and unsupervised training.

A loss is of the form:

    def loss(target, prediction, ...):
        ...

The result depends on the exact nature of the loss. Some examples are:

    • coordinate wise loss, such as a sum of squares or a Bernoulli cross entropy with a one-of-k target,
    • sample wise, such as neighbourhood component analysis.

In case of the coordinate wise losses, the dimensionality of the result should be the same as that of the predictions and targets.
In all other cases, it is important that the sample axis (usually the first axis) stays the same. The coordinates of an individual data point lie along the coordinate axis, whose length may be reduced to 1.

Some examples of valid shape transformations:

    (n, d) -> (n, d)
    (n, d) -> (n, 1)

These are not valid:

    (n, d) -> (1, d)
    (n, d) -> (n,)

For some examples, consult the source code of this module.

breze.arch.component.loss.squared(target, prediction)
    Return the element wise squared loss between the target and the prediction.

    Parameters
        target : Theano variable
            An array of arbitrary shape representing the targets.
        prediction : Theano variable
            An array of arbitrary shape representing the predictions.
    Returns
        res : Theano variable
            An array of the same shape as target and prediction representing the pairwise distances.

breze.arch.component.loss.absolute(target, prediction)
    Return the element wise absolute difference between the target and the prediction.

    Parameters
        target : Theano variable
            An array of arbitrary shape representing the targets.
        prediction : Theano variable
            An array of arbitrary shape representing the predictions.
    Returns
        res : Theano variable
            An array of the same shape as target and prediction representing the pairwise distances.

breze.arch.component.loss.cat_ce(target, prediction, eps=1e-08)
    Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of a categorical distribution and target is some outcome.

    Used for multiclass classification purposes.

    The loss differs from ncat_ce in that target is not an array of integers but a one-hot (one-of-k) coding.

    Note that predictions are clipped between eps and 1 - eps to ensure numerical stability.

    Parameters
        target : Theano variable
            An array of shape (n, k) where n is the number of samples and k is the number of classes. Each row represents a one-hot (one-of-k) coding.
            It should be zero except for one element, which has to be exactly one.
        prediction : Theano variable
            An array of shape (n, k). Each row is interpreted as a categorical probability. Thus, each row has to sum up to one and be non-negative.
    Returns
        res : Theano variable.
            An array of the same size as target and prediction representing the pairwise divergences.

breze.arch.component.loss.ncat_ce(target, prediction)
    Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of the categorical distribution and target is some outcome.

    Used for classification purposes.

    The loss differs from cat_ce in that target is not a one-hot (one-of-k) coding but an array of integers.

    Parameters
        target : Theano variable
            An array of shape (n,) where n is the number of samples. Each entry of the array should be an integer between 0 and k-1, where k is the number of classes.
        prediction : Theano variable
            An array of shape (n, k) or (t, n, k). Each row (i.e. entry in the last dimension) is interpreted as a categorical probability. Thus, each row has to sum up to one and be non-negative.
    Returns
        res : Theano variable
            An array of shape (n, 1) containing the log probability that each example is classified correctly.

breze.arch.component.loss.bern_ces(target, prediction)
    Return the Bernoulli cross entropies between binary vectors target and a number of Bernoulli variables prediction.

    Used in regression on binary variables, not classification.

    Parameters
        target : Theano variable
            An array of shape (n, k) where n is the number of samples and k is the number of outputs. Each entry should be either 0 or 1.
        prediction : Theano variable.
            An array of shape (n, k). Each row is interpreted as a set of statistics of Bernoulli variables. Thus, each element has to lie in (0, 1).
    Returns
        res : Theano variable
            An array of the same size as target and prediction representing the pairwise divergences.
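The relation between cat_ce and ncat_ce can be made concrete with a small NumPy sketch. The two functions below are hypothetical re-implementations on plain arrays, not the Theano expressions breze builds. The sign convention is an assumption: both return the negative log probability, so that smaller is better, although the ncat_ce entry above speaks of a log probability.

```python
import numpy as np


def cat_ce_sketch(target, prediction, eps=1e-8):
    """Coordinate-wise categorical cross entropy for one-hot targets.

    (n, k), (n, k) -> (n, k). Predictions are clipped to [eps, 1 - eps].
    """
    prediction = np.clip(prediction, eps, 1 - eps)
    return -target * np.log(prediction)


def ncat_ce_sketch(target, prediction):
    """Sample-wise cross entropy for integer class labels.

    (n,), (n, k) -> (n, 1). Assumed to be a negative log probability.
    """
    rows = np.arange(prediction.shape[0])
    return -np.log(prediction[rows, target])[:, np.newaxis]


prediction = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
one_hot = np.eye(3)[labels]          # one-of-k coding of the same labels

coord_wise = cat_ce_sketch(one_hot, prediction)    # shape (n, k)
sample_wise = ncat_ce_sketch(labels, prediction)   # shape (n, 1)
```

Summing coord_wise over the class axis gives the same numbers as sample_wise. Both results keep the sample axis, in line with the valid shape transformations (n, d) -> (n, d) and (n, d) -> (n, 1) listed at the start of this section.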
breze.arch.component.loss.bern_bern_kl(X, Y)
    Return the Kullback-Leibler divergence between Bernoulli variables represented by their sufficient statistics.

    Parameters
        X : Theano variable
            An array of arbitrary shape where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).
        Y : Theano variable
            An array of the same shape as X where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).
    Returns
        res : Theano variable
            An array of the same size as X and Y representing the pairwise divergences.

breze.arch.component.loss.ncac(target, embedding)
    Return the NCA for classification loss.

    This corresponds to the probability that a point is correctly classified with a soft knn classifier using leave-one-out. Each neighbour is weighted according to an exponential of its negative Euclidean distance. Afterwards, a probability is calculated for each class depending on the weights of the neighbours.

    For details, we refer you to 'Neighbourhood Component Analysis' by J Goldberger, S Roweis, G Hinton, R Salakhutdinov (2004).

    Parameters
        target : Theano variable
            An array of shape (n,) where n is the number of samples. Each entry of the array should be an integer between 0 and k - 1, where k is the number of classes.
        embedding : Theano variable
            An array of shape (n, d) where each row represents a point in d-dimensional space.
    Returns
        res : Theano variable
            Array of shape (n, 1) holding a probability that a point is classified correctly.

breze.arch.component.loss.ncar(target, embedding)
    Return the NCA for regression loss.

    This is similar to NCA for classification, except that regression performance rather than soft KNN classification performance is maximized. (Actually, the negative performance is minimized.)

    For details, we refer you to 'Pose-sensitive embedding by nonlinear nca regression' by Taylor, G. and Fergus, R. and Williams, G. and Spiro, I.
    and Bregler, C. (2010)

    Parameters
        target : Theano variable
            An array of shape (n, d) where n is the number of samples and d the dimensionality of the target space.
        embedding : Theano variable
            An array of shape (n, d) where each row represents a point in d-dimensional space.
    Returns
        res : Theano variable
            Array of shape (n, 1).

breze.arch.component.loss.drlim(push_margin, pull_margin, c_contrastive, push_loss='squared', pull_loss='squared')
    Return a function that implements the loss from 'Dimensionality reduction by learning an invariant mapping' by Hadsell, R. and Chopra, S. and LeCun, Y. (2006).

    For an example of such a function, see drlim1 with a margin of 1.

    Parameters
        push_margin : Float
            The minimum margin that negative pairs should be separated by. Pairs separated by a higher distance than push_margin will not contribute to the loss.
        pull_margin : Float
            The maximum margin that positive pairs may be separated by. Pairs separated by lower distances do not contribute to the loss.
        c_contrastive : Float
            Coefficient to weigh the contrastive term relative to the positive term.
        push_loss : One of {'squared', 'absolute'}, optional, default: 'squared'
            Loss to encourage Euclidean distances between non-pairs.
        pull_loss : One of {'squared', 'absolute'}, optional, default: 'squared'
            Loss to punish Euclidean distances between pairs.
    Returns
        loss : callable
            Function that takes two arguments, a target and an embedding.

4.4 Stochastic Corruption of Theano Variables

This module contains functionality to corrupt Theano variables with noise.

breze.arch.component.corrupt.gaussian_perturb(arr, std, rng=None)
    Return a Theano variable which is perturbed by additive zero-centred Gaussian noise with standard deviation std.

    Parameters
        arr : Theano variable
            Array of some shape n.
        std : float or scalar Theano variable
            Standard deviation of the Gaussian noise.
        rng : Theano random number generator, optional [default: None]
            Generator to draw random numbers from. If None, rng will be instantiated on the spot.
    Returns
        res : Theano variable
            Of shape n.

    Examples
        >>> import theano.tensor as T
        >>> from breze.arch.component.corrupt import gaussian_perturb
        >>> m = T.matrix()
        >>> c = gaussian_perturb(m, 0.1)

breze.arch.component.corrupt.mask(arr, p, rng=None)
    Return a Theano variable in which each element of arr is set to zero with probability p.

    Parameters
        arr : Theano variable
            Array of some shape n.
        p : float or scalar Theano variable
            Probability that a unit is set to zero.
        rng : Theano random number generator, optional [default: None]
            Generator to draw random numbers from. If None, rng will be instantiated on the spot.
    Returns
        res : Theano variable
            Of shape n.

    Examples
        >>> import theano.tensor as T
        >>> from breze.arch.component.corrupt import mask
        >>> m = T.matrix()
        >>> c = mask(m, 0.1)

4.5 Miscellaneous functionality

Module holding miscellaneous functionality.

breze.arch.component.misc.pairwise_diff(X, Y=None)
    Given two arrays with samples in the rows, compute the pairwise differences.

    Parameters
        X : Theano variable
            Has shape (n, d). Contains one item per first dimension.
        Y : Theano variable, optional [default: None]
            Has shape (m, d). If not given, defaults to X.
    Returns
        res : Theano variable
            Has shape (n, d, m).

breze.arch.component.misc.distance_matrix(X, Y=None, norm=<function l2>)
    Return an expression containing the distances, under the given norm, between the samples of up to two arrays.

    Parameters
        X : Theano variable
            Has shape (n, d). Contains one item per first dimension.
        Y : Theano variable, optional [default: None]
            Has shape (m, d). If not given, defaults to X.
        norm : string or callable
            Either a string pointing at a function in breze.arch.component.norm or a function that has the same signature as these.
    Returns
        res : Theano variable
            Has shape (n, m).
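The shapes documented for pairwise_diff and distance_matrix can be reproduced with NumPy broadcasting. The sketch below is hypothetical and works on plain arrays instead of Theano variables. The l2_sketch helper assumes the L2 norm is the square root of the sum of squares along the given axis.

```python
import numpy as np


def pairwise_diff_sketch(X, Y=None):
    """Pairwise differences: (n, d), (m, d) -> (n, d, m)."""
    Y = X if Y is None else Y
    # (n, d, 1) - (1, d, m) broadcasts to (n, d, m).
    return X[:, :, np.newaxis] - Y.T[np.newaxis, :, :]


def l2_sketch(arr, axis=None):
    """Assumed L2 norm: square root of the sum of squares along `axis`."""
    return np.sqrt((arr ** 2).sum(axis=axis))


def distance_matrix_sketch(X, Y=None, norm=l2_sketch):
    """Distances between all rows of X and Y: (n, d), (m, d) -> (n, m)."""
    return norm(pairwise_diff_sketch(X, Y), axis=1)


X = np.array([[0.0, 0.0], [3.0, 4.0]])               # n=2 points
Y = np.array([[0.0, 0.0], [0.0, 4.0], [3.0, 0.0]])   # m=3 points
diff = pairwise_diff_sketch(X, Y)
D = distance_matrix_sketch(X, Y)
```

Reducing the difference tensor along axis 1 (the coordinate axis) is what turns the (n, d, m) array into the (n, m) distance matrix.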
breze.arch.component.misc.distance_matrix_by_diff(diff, norm=<function l2>)
    Return an expression containing the distances, under the norm norm, computed from an array of pairwise differences between samples.

    Parameters
        diff : Theano variable
            Has shape (n, d, m) and represents differences between two collections of the same set.
        norm : string or callable
            Either a string pointing at a function in breze.arch.component.norm or a function that has the same signature as these.
    Returns
        res : Theano variable
            Has shape (n, m).

breze.arch.component.misc.cat_entropy(arr)
    Return the entropy of categorical distributions described by the rows in arr.

    Parameters
        arr : Theano variable
            Array of shape (n, d) describing n different categorical variables. Rows need to sum up to 1 and be non-negative.
    Returns
        res : Theano variable
            Has shape (n,).

breze.arch.component.misc.project_into_l2_ball(arr, radius=1)
    Return arr projected into the L2 ball.

    Parameters
        arr : Theano variable
            Array of shape either (n, d) or (d,). If the former, all rows are projected individually.
        radius : float, optional [default: 1]
    Returns
        res : Theano variable
            Projected result of the same shape as arr.

4.6 Layers

Module that contains various layer like components.

breze.arch.component.layer.simple(inpt, weights, bias, out_transfer, p_dropout=0, prefix='')
    Return a dictionary containing computations from a simple layer.

    The layer has the following form

    $f((x \cdot d)^T W + b)$,

    where $f$ corresponds to transfer, $x$ to input, $\cdot$ indicates the element-wise product, $d$ is a vector of Bernoulli samples with parameter p_dropout, $W$ is the weight matrix weights and $b$ is the bias.

    Parameters
        inpt : Theano variable
            Array of shape (n, d).
        weights : Theano variable
            Array of shape (d, e).
        bias : Theano variable
            Array of shape (e,).
        transfer : function or string
            If a function, it should, given a Theano variable, return a Theano variable of the same shape.
            If a string, it is used to look up a transfer function in breze.arch.component.transfer.
        p_dropout : Theano scalar or float
            Needs to be in (0, 1). Indicates the probability that an input is set to zero.
        prefix : string, optional [default: '']
            Each entry in the returned dictionary will be prefixed with this.

    Returns
        d : dict
            Has the following entries: output_in, the activation before application of the transfer function; output, the activation after application of the transfer function.

4.7 Common functions

Module that contains functionality common to many other modules.

breze.arch.component.common.supervised_loss(target, prediction, loss, coord_axis=1, imp_weight=False, prefix='')
    Return a dictionary populated with several expressions for a supervised loss and the corresponding targets and predictions.

    Parameters
        target : Theano variable
            Array representing the target variables.
        prediction : Theano variable
            Array representing the predictions.
        loss : callable or string
            If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.loss.
        coord_axis : integer, optional [default: 1]
            Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or some spatial axis.
        prefix : string, optional [default: '']
            Each key in the resulting dictionary will be prefixed with prefix.
        imp_weight : Theano variable, float or boolean, optional [default: False]
            Importance weights for the loss. Will be multiplied with the coordinate-wise loss.

    Returns
        res : dict
            Dictionary containing the expressions. See example for keys.

    Examples
        >>> import theano.tensor as T
        >>> prediction, target = T.matrix('prediction'), T.matrix('target')
        >>> from breze.arch.component.loss import squared
        >>> loss_dict = supervised_loss(target, prediction, squared,
        ...     prefix='mymodel-')
        >>> sorted(loss_dict.items())
        [('mymodel-loss', ...), ('mymodel-loss_coord_wise', ...), ('mymodel-loss_sample_wise', ...), ('m

breze.arch.component.common.unsupervised_loss(output, loss, coord_axis=1, prefix='')
    Return a dictionary populated with several expressions for an unsupervised loss and the corresponding output.

    Parameters
        output : Theano variable
            Array representing the predictions.
        loss : callable or string
            If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.loss.
        coord_axis : integer, optional [default: 1]
            Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or some spatial axis.
        prefix : string, optional [default: '']
            Each key in the resulting dictionary will be prefixed with prefix.

    Returns
        res : dict
            Dictionary containing the expressions. See example for keys.

    Examples
        >>> import theano.tensor as T
        >>> output = T.matrix('output')
        >>> my_loss = lambda x: abs(x)
        >>> loss_dict = unsupervised_loss(output, my_loss, prefix='$')
        >>> sorted(loss_dict.items())
        [('$loss', ...), ('$loss_coord_wise', ...), ('$loss_sample_wise', ...), ('$output', ...)]

4.8 Univariate Normal Distribution

breze.arch.component.distributions.normal.pdf(sample, location=0, scale=1)
    Return a Theano expression representing the values of the probability density function of a Gaussian distribution.

    Parameters
        sample : Theano variable
            Array of shape (n,) where n is the number of samples.
        location : Theano variable
            Scalar representing the mean of the distribution.
        scale : Theano variable
            Scalar representing the standard deviation of the distribution.

    Returns
        l : Theano variable
            Array of shape (n,) where each entry represents the density of the corresponding sample.
    Examples
        >>> import theano
        >>> import theano.tensor as T
        >>> import numpy as np
        >>> from breze.learn.utils import theano_floatx
        >>> sample, mean, std = T.vector(), T.scalar(), T.scalar()
        >>> p = pdf(sample, mean, std)
        >>> f_p = theano.function([sample, mean, std], p)

        >>> X, = theano_floatx(np.array([-1, 0, 1]))
        >>> ps = f_p(X, 0.1, 1.2)
        >>> np.allclose(ps, [0.21840613, 0.33129956, 0.25094786])
        True

breze.arch.component.distributions.normal.cdf(sample, location=0, scale=1)
    Return a Theano expression representing the values of the cumulative distribution function of a Gaussian distribution.

    Parameters
        sample : Theano variable
            Array of shape (n,) where n is the number of samples.
        location : Theano variable
            Scalar representing the mean of the distribution.
        scale : Theano variable
            Scalar representing the standard deviation of the distribution.

    Returns
        l : Theano variable
            Array of shape (n,) where each entry represents the cumulative probability of the corresponding sample.

    Examples
        >>> import theano
        >>> import theano.tensor as T
        >>> import numpy as np
        >>> from breze.learn.utils import theano_floatx
        >>> sample, mean, std = T.vector(), T.scalar(), T.scalar()
        >>> c = cdf(sample, mean, std)
        >>> f_c = theano.function([sample, mean, std], c)

        >>> X, = theano_floatx(np.array([-1, 0, 1]))
        >>> cs = f_c(X, 0.1, 1.2)
        >>> np.allclose(cs, [0.17965868, 0.46679324, 0.77337265])
        True

4.9 Multivariate Normal Distribution

Module containing expression builders for the multivariate normal.

breze.arch.component.distributions.mvn.pdf(sample, mean, cov)
    Return a Theano expression representing the values of the probability density function of the multivariate normal.

    Parameters
        sample : Theano variable
            Array of shape (n, d) where n is the number of samples and d the dimensionality of the data.
        mean : Theano variable
            Array of shape (d,) representing the mean of the distribution.
        cov : Theano variable
            Array of shape (d, d) representing the covariance of the distribution.

    Returns
        l : Theano variable
            Array of shape (n,) where each entry represents the density of the corresponding sample.

    Examples
        >>> import theano
        >>> import theano.tensor as T
        >>> import numpy as np
        >>> from breze.learn.utils import theano_floatx
        >>> sample = T.matrix('sample')
        >>> mean = T.vector('mean')
        >>> cov = T.matrix('cov')
        >>> p = pdf(sample, mean, cov)
        >>> f_p = theano.function([sample, mean, cov], p)

        >>> mu = np.array([-1, 1])
        >>> sigma = np.array([[.9, .4], [.4, .3]])
        >>> X = np.array([[-1, 1], [1, -1]])
        >>> mu, sigma, X = theano_floatx(mu, sigma, X)
        >>> ps = f_p(X, mu, sigma)
        >>> np.allclose(ps, [4.798702e-01, 7.73744047e-17])
        True

breze.arch.component.distributions.mvn.logpdf(sample, mean, cov)
    Return a Theano expression representing the values of the log probability density function of the multivariate normal.

    Parameters
        sample : Theano variable
            Array of shape (n, d) where n is the number of samples and d the dimensionality of the data.
        mean : Theano variable
            Array of shape (d,) representing the mean of the distribution.
        cov : Theano variable
            Array of shape (d, d) representing the covariance of the distribution.

    Returns
        l : Theano variable
            Array of shape (n,) where each entry represents the log density of the corresponding sample.
    Examples
        >>> import theano
        >>> import theano.tensor as T
        >>> import numpy as np
        >>> from breze.learn.utils import theano_floatx
        >>> sample = T.matrix('sample')
        >>> mean = T.vector('mean')
        >>> cov = T.matrix('cov')
        >>> p = logpdf(sample, mean, cov)
        >>> f_p = theano.function([sample, mean, cov], p)

        >>> mu = np.array([-1, 1])
        >>> sigma = np.array([[.9, .4], [.4, .3]])
        >>> X = np.array([[-1, 1], [1, -1]])
        >>> mu, sigma, X = theano_floatx(mu, sigma, X)
        >>> ps = f_p(X, mu, sigma)
        >>> np.allclose(ps, np.log([4.798702e-01, 7.73744047e-17]))
        True

4.10 Utilities

class breze.arch.util.Model
    Model class.

    Intended as a base class for parameterized models, providing a convenience method for compilation and a common interface.

    We partition the Theano variables of parameterized models into three groups: (1) the adaptable parameters, (2) external variables such as inputs and targets, i.e. the data, and (3) expressions composed out of the two, such as the prediction of a model or the loss resulting from it.

    There are several “reserved” names for expressions.

    • inpt: observations of a supervised or unsupervised model,
    • target: desired outputs of a supervised model,
    • loss: quantity to be optimized for fitting the parameters; might not refer to the criterion of interest, but instead to a regularized objective,
    • true_loss: quantity of interest for the user, e.g. the loss without regularization or the empirical risk.

    Overriding these names is possible in general, but they are part of the interface, and doing so will lead to unexpected behaviour in functionality building upon them.

    Lookup of variables and expressions is typically done in one of the following ways:

    • as the variable/expression itself,
    • as a string which is the attribute/key to look for in the ParameterSet object/expression dictionary,
    • as a path along these, e.g. the tuple ('foo', 'bar', 0) will identify .parameters.foo.bar[0] or .parameters['foo']['bar'][0], depending on the context.
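The path-style lookup described above can be sketched in plain Python. This only illustrates the idea and is not breze's lookup code: resolve_path is a hypothetical helper, and the rule of trying attribute access first and falling back to key or index access is an assumption.

```python
from types import SimpleNamespace


def resolve_path(root, path):
    """Follow path (a tuple of attribute names, keys or indices) starting at root."""
    node = root
    for step in path:
        if isinstance(step, str) and hasattr(node, step):
            # Attribute-style access, e.g. node.foo
            node = getattr(node, step)
        else:
            # Key- or index-style access, e.g. node['foo'] or node[0]
            node = node[step]
    return node


# Two containers with the same content, one attribute based, one key based.
attr_style = SimpleNamespace(foo=SimpleNamespace(bar=[10, 20]))
key_style = {'foo': {'bar': [10, 20]}}

a = resolve_path(attr_style, ('foo', 'bar', 0))  # attr_style.foo.bar[0]
b = resolve_path(key_style, ('foo', 'bar', 0))   # key_style['foo']['bar'][0]
```

Note that a key which happens to coincide with an attribute name of the container (for instance 'items' on a dictionary) would be resolved as an attribute by this sketch; a real implementation would have to decide from the context which kind of access is meant.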
    Attributes
        pars     (ParameterSet object) Holding the adaptable parameters of the object.
        exprs    (dictionary) Containing the expressions. For convenience, the external variables are held in here as well.
        updates  (dict) Containing update variables, e.g. due to the use of theano.scan.

    Methods
        function
        var_exp_for_gpu

    function(variables, exprs, mode=None, explicit_pars=False, givens=None, on_unused_input='raise', numpy_result=False)
        Return a compiled function for the given exprs given variables.

        Parameters
            variables : list of strings
                Each string refers to an item in .exprs and is considered an input to the function.
            exprs : (List of) Theano expression or string
                Expressions for which to create the function. If a single expression is given, the function will return a single value; if a list is given, the result will be a tuple containing one element for each. An expression can either be a Theano expression or a string. In the latter case, the corresponding expression will be retrieved from .exprs.
            mode : string or None, optional, default: None
                Mode to use for compilation. Passed on to theano.function. See the Theano documentation for details. If None, self.mode will be used.
            explicit_pars : boolean, optional, default: False
                If True, the first argument to the function is expected to be an array representing the adaptable parameters of the model.
            givens : dictionary, optional, default: None
                Dictionary of substitutions for compilation. Not passed on to theano.function; instead, the expressions are cloned. See the code for further details.
            on_unused_input : string, optional, default: 'raise'
                Specify the behaviour in case of unused inputs. Passed on to theano.function. See the Theano documentation for details.
            numpy_result : boolean, optional, default: False
                If set to True, a numpy array is always returned, even if the computation is done on the GPU and a gnumpy array would be more natural.
    var_exp_for_gpu(variables, exprs, outputs=True)
        Given variables and Theano expressions built from these variables, return variables and expressions of the same form that are tailored towards GPU usage.

class breze.arch.util.ParameterSet
    ParameterSet class.

    This class provides functionality to group several Theano tensors of different sizes in a consecutive chunk of memory. The main aim of this is to allow a view on several tensors as a single long vector.

    In the following, a (parameter) array refers to a concrete instantiation of a parameter variable (with concrete values), while a (parameter) tensor/variable refers to the symbolic Theano variable.

    Initialization takes a variable number of keyword arguments, where each value has to be a single integer or a tuple of arbitrary length containing only integers. For each keyword argument, a tensor of the shape given by the value will be created. The key is the identifier of that variable.

    All symbolic variables can be accessed as attributes of the object, all concrete variables as keys. E.g. parameter_set.x references the symbolic variable, while parameter_set['x'] will give you the concrete array.

    Attributes
        n_pars  (integer) Total number of parameters.
        flat    (Theano vector) Flat one-dimensional tensor containing all the different tensors flattened out. Symbolic counterpart to data.
        data    (array_like) Concrete array containing all the different arrays flattened out. Concrete counterpart to flat.
        views   (dict) All parameter arrays can be accessed with their identifier as key in this dictionary.

    Methods
        alloc
        declare
        view

4.10.1 Nested Lists for Theano, etc.

breze.arch.util.flatten(nested)
    Flatten nested tuples and/or lists into a flat list.

breze.arch.util.unflatten(tmpl, flat)
    Nest the items in flat into the shape of tmpl.
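The round trip between a nested structure and its flat list can be sketched in plain Python. The functions flatten_sketch and unflatten_sketch below are hypothetical stand-ins written for illustration, not the breze implementations; in particular, the depth-first, left-to-right ordering and the preservation of list versus tuple types are assumptions.

```python
def flatten_sketch(nested):
    """Flatten nested tuples and/or lists into a flat list (depth first, left to right)."""
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_sketch(item))
        else:
            flat.append(item)
    return flat


def unflatten_sketch(tmpl, flat):
    """Nest the items of flat into the shape of tmpl."""
    items = iter(flat)

    def build(node):
        if isinstance(node, (list, tuple)):
            # Rebuild a container of the same type, filling it recursively.
            return type(node)(build(child) for child in node)
        # Leaf position: consume the next flat item.
        return next(items)

    return build(tmpl)


tmpl = [1, (2, [3, 4]), 5]
flat = flatten_sketch(tmpl)
doubled = unflatten_sketch(tmpl, [2 * v for v in flat])
```

A typical use of such a pair is to flatten nested inputs, process them as a plain list, and then restore the original nesting, which is the pattern theano_function_with_nested_exprs is described as supporting.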
breze.arch.util.theano_function_with_nested_exprs(variables, exprs, *args, **kwargs)
    Create and return a theano.function that takes values for variables as arguments, where variables may contain nested lists and/or tuples, and returns values for exprs, where again exprs may contain nested lists and/or tuples.

    All other arguments are passed to theano.function without modification.

breze.arch.util.theano_expr_bfs(expr)
    Generator function to walk a Theano expression graph in breadth-first order.

breze.arch.util.tell_deterministic(expr)
    Return True iff no random number generator is in the expression graph.

4.10.2 GPU related utilities

breze.arch.util.cpu_tensor_to_gpu(tensor)
    Given a tensor for the CPU, return a tensor of the same type and name for the GPU.

breze.arch.util.cpu_tensor_to_gpu_nested(inpts, cache=None)
    Given a list (of lists of...) CPU tensor variables, return a list of the same types of corresponding GPU tensor variables.

    Also return a dictionary containing all substitutions done. This can be provided to future calls so that conversions are not made multiple times.

breze.arch.util.cpu_expr_to_gpu(expr, unsafe=False)
    Given a CPU expr, return the same expression for the GPU.

    If unsafe is set to True, subsequent function calls evaluating the expression might return arrays pointing at the same memory region.

breze.arch.util.cpu_expr_to_gpu_nested(inpts, unsafe=False)
    Given a list (of lists of...) expressions, return expressions for the GPU.

    If unsafe is set to True, subsequent function calls evaluating the expression might return arrays pointing at the same memory region.

breze.arch.util.garray_to_cudandarray_nested(lst)

breze.arch.util.gnumpy_func_wrap(f)
    Wrap a function that accepts and returns CudaNdArrays to accept and return gnumpy arrays.
4.10.3 Other

breze.arch.util.get_named_variables(dct, name=True, overwrite=False, prefix='')
    Return a dictionary containing only those items from dct that are Theano variables/expressions.

    If name is set to True, the variables will be named accordingly; existing names are not overwritten unless overwrite is True as well.

breze.arch.util.lookup(what, where, default=None)
    Return where.what if what is a string, otherwise what. If not found, return default.

breze.arch.util.lookup_some_key(what, where, default=None)
    Given a list of keys what, return the first of those to which there is an item in where.

    If nothing is found, return default.

For variance propagation:

4.11 Common functions

breze.arch.component.varprop.common.supervised_loss(target, prediction, loss, coord_axis=1, imp_weight=False, prefix='')
    Return a dictionary populated with several expressions for a supervised loss and the corresponding targets and predictions.

    Version for variance propagation, where the prediction is not only a point but a mean with a variance.

    Parameters
        target : Theano variable
            Array representing the target variables. Has size d along the coordinate axis coord_axis.
        prediction : Theano variable
            Array representing the predictions. Has size 2 * d along the coordinate axis, where the first half corresponds to the mean and the second half to the variance of the prediction.
        loss : callable or string
            If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.varprop.loss.
        coord_axis : integer, optional [default: 1]
            Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or some spatial axis.
        imp_weight : Theano variable, float or boolean, optional [default: False]
            Importance weights for the loss. Will be multiplied with the coordinate-wise loss.
        prefix : string, optional [default: '']
            Each key in the resulting dictionary will be prefixed with prefix.

    Returns
        res : dict
            Dictionary containing the expressions. See example for keys.

    Examples
        >>> import theano.tensor as T
        >>> prediction, target = T.matrix('prediction'), T.matrix('target')
        >>> from breze.arch.component.varprop.loss import diag_gaussian_nll
        >>> loss_dict = supervised_loss(target, prediction, diag_gaussian_nll,
        ...     prefix='mymodel-')
        >>> sorted(loss_dict.items())
        [('mymodel-loss', ...), ('mymodel-loss_coord_wise', ...), ('mymodel-loss_sample_wise', ...), ('m

breze.arch.component.varprop.common.unsupervised_loss(output, loss, coord_axis=1, prefix='')
    Return a dictionary populated with several expressions for an unsupervised loss and the corresponding output.

    Version for variance propagation, where the prediction is not only a point but a mean with a variance.

    Parameters
        output : Theano variable
            Array representing the output of the model. Has size 2 * d along the coordinate axis, where the first half corresponds to the mean and the second half to the variance of the prediction.
        loss : callable or string
            If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.varprop.loss.
        coord_axis : integer, optional [default: 1]
            Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or some spatial axis.
        prefix : string, optional [default: '']
            Each key in the resulting dictionary will be prefixed with prefix.

    Returns
        res : dict
            Dictionary containing the expressions. See example for keys.

    Examples
        >>> import theano.tensor as T
        >>> output = T.matrix('output')
        >>> my_loss = lambda x: abs(x)
        >>> loss_dict = unsupervised_loss(output, my_loss, prefix='$')
        >>> sorted(loss_dict.items())
        [('$loss', ...), ('$loss_coord_wise', ...), ('$loss_sample_wise', ...), ('$output', ...)]
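The layout of a variance-propagation prediction, with the mean in the first half and the variance in the second half of the coordinate axis, can be illustrated for a single sample in plain Python. This is a sketch under stated assumptions: diag_gaussian_nll_sketch is a hypothetical helper that uses the textbook negative log-likelihood of a diagonal Gaussian; whether breze's diag_gaussian_nll has exactly this form (for instance, whether it keeps the constant term) is not stated here.

```python
import math


def diag_gaussian_nll_sketch(target, prediction):
    """Coordinate-wise negative log-likelihood of target under N(mean, var).

    prediction has length 2 * d: the first d entries are the means,
    the last d entries are the variances (assumed to be positive).
    """
    d = len(target)
    assert len(prediction) == 2 * d
    mean, var = prediction[:d], prediction[d:]
    return [0.5 * math.log(2 * math.pi * v) + 0.5 * (t - m) ** 2 / v
            for t, m, v in zip(target, mean, var)]


target = [0.0, 1.0]                  # d = 2
prediction = [0.0, 1.0, 1.0, 1.0]    # means (0, 1) followed by variances (1, 1)
coord_wise = diag_gaussian_nll_sketch(target, prediction)
sample_wise = sum(coord_wise)        # reduce over the coordinate axis
```

The two reductions mirror the loss_coord_wise and loss_sample_wise keys of the returned dictionaries: one value per coordinate first, then one value per sample.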
CHAPTER 5 Implementation Notes

5.1 Variance propagation

This package implements variance propagating networks.

If we want to talk about neural networks in a probabilistic way, the natural starting point is to treat every number in the network as a Dirac-distributed value. There have been numerous attempts to model the adaptable parameters of networks as random variables, leading to so-called “Bayesian Neural Networks”.

In some applications, it makes sense to treat the activations as random variables. This can be done very efficiently and with a very good approximation for the mean and the variance of the random variables. The algorithm for this was first described in [FD] and, in the context of RNNs, in [FD-RNN].

5.1.1 References

5.1.2 Recurrent Networks

Module implementing variance propagation and fast dropout for recurrent networks.

In this module, we will often deal with multiple sequences organized into a single Theano tensor. This tensor then has the shape (t, n, d), where

• t is the number of time steps,
• n is the number of samples and
• d is the dimensionality of each sample.

We call such a tensor a “sequence tensor”. Sometimes, it makes sense to flatten out the time dimension in order to apply better optimized linear algebra, such as a dot product. In that case, we will talk of a “flat sequence tensor”.

breze.arch.model.varprop.rnn.recurrent_layer(in_mean, in_var, weights, f, initial_hidden_mean, initial_hidden_var, p_dropout)
    Return a Theano variable representing a recurrent layer.

    Parameters
        in_mean : Theano variable
            Sequence tensor of shape (t, n, d). Represents the mean of the input to the layer.
        in_var : Theano variable
            Sequence tensor. Represents the variance of the input to the layer. Either (a) same shape as the mean or (b) scalar.
        weights : Theano variable
            Theano matrix of shape (d, d). Represents the recurrent weight matrix the hiddens are right multiplied with.
        f : function
            Function that takes a Theano variable and returns a Theano variable of the same shape. Meant as the transfer function of the layer.
        initial_hidden : Theano variable
            Theano vector of size d, representing the initial hidden state.
        p_dropout : Theano variable
            Scalar representing the probability that a unit is dropped out.

    Returns
        hidden_in_mean_rec : Theano variable
            Theano sequence tensor representing the mean of the hidden activations before the application of f.
        hidden_in_var_rec : Theano variable
            Theano sequence tensor representing the variance of the hidden activations before the application of f.
        hidden_mean_rec : Theano variable
            Theano sequence tensor representing the mean of the hidden activations after the application of f.
        hidden_var_rec : Theano variable
            Theano sequence tensor representing the variance of the hidden activations after the application of f.

5.1.3 Transfer functions

Module that contains transfer functions for variance propagation, working on Theano variables.

Each transfer function has the signature:

    m2, s2 = f(m1, s1)

where f is the transfer function, m1 and s1 are the pre-synaptic mean and variance, respectively; m2 and s2 are the post-synaptic mean and variance.

breze.arch.component.varprop.transfer.identity(mean, var)
    Return the mean and variance unchanged.

    Parameters
        mean : Theano variable
            Theano variable of the shape s.
        var : Theano variable
            Theano variable of the shape s.

    Returns
        mean_ : Theano variable
            Theano variable of the shape r.
        var_ : Theano variable
            Theano variable of the shape r.

breze.arch.component.varprop.transfer.sigmoid(mean, var)
    Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a logistic sigmoid.

    Parameters
        mean : Theano variable
            Theano variable of the shape s.
        var : Theano variable
            Theano variable of the shape s.

    Returns
        mean_ : Theano variable
            Theano variable of the shape r.
        var_ : Theano variable
            Theano variable of the shape r.

breze.arch.component.varprop.transfer.rectifier(mean, var)
    Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a rectified linear unit.

    Parameters
        mean : Theano variable
            Theano variable of the shape s.
        var : Theano variable
            Theano variable of the shape s.

    Returns
        mean_ : Theano variable
            Theano variable of the shape r.
        var_ : Theano variable
            Theano variable of the shape r.

breze.arch.component.varprop.transfer.tanh(mean, var)
    Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a hyperbolic tangent.

    Parameters
        mean : Theano variable
            Theano variable of the shape s.
        var : Theano variable
            Theano variable of the shape s.

    Returns
        mean_ : Theano variable
            Theano variable of the shape r.
        var_ : Theano variable
            Theano variable of the shape r.

5.1.4 Losses

Module containing several losses usable for supervised and unsupervised training.

This is different from breze.arch.component.loss in the sense that each prediction is also assumed to have a variance. The losses in this module assume two inputs: a target and a prediction. Additionally, if the target has a dimensionality of D, the prediction is assumed to have a dimensionality of 2D. The first D elements constitute the mean, while the remaining D elements constitute the variance.

Additionally, all losses from breze.arch.component.loss are also available; here, we just ignore the variance part of the input to the loss.
Bibliography

[XCA] Extreme component analysis, Welling et al. (2003).
[LFRKM] Learning Feature Representations with K-means, Adam Coates (2012).
[R3] Gomes, R., Krause, A., and Perona, P. “Discriminative clustering by regularized information maximization.” NIPS 2010.
[R1] Xu, Zhixiang Eddie, Kilian Q. Weinberger, and Fei Sha. “Rapid feature learning with stacked linear denoisers.” arXiv preprint arXiv:1105.0972 (2011).
[D1] Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv preprint arXiv:1207.0580 (2012).
[FD] Wang, Sida, and Christopher Manning. “Fast dropout training.” Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.
[FD-RNN] Bayer, Justin, et al. “On Fast Dropout and its Applicability to Recurrent Networks.” arXiv preprint arXiv:1311.0701 (2013).