breze Documentation
Release 0.1
brml.de
September 15, 2016
Contents
1 Basics
1.1 Specifying losses, norms, transfer functions etc.
2 Models and Algorithms
2.1 Principal Component Analysis
2.2 Extreme Component Analysis
2.3 Sparse Filtering
2.4 ICA with Reconstruction Cost
2.5 Canonical Correlation Analysis
2.6 Slow Feature Analysis
2.7 K-Means
2.8 Regularized Information Maximization
2.9 Stochastic Gradient Variational Bayes
2.10 Linear Denoiser
2.11 Recurrent Neural Networks
2.12 Multilayer Perceptrons
2.13 Hybrid Monte Carlo
2.14 Trainers
3 Helpers, convenience functions and tools
3.1 Feature extraction
3.2 Data manipulation
3.3 Various utilities
3.4 Helpers for plotting data
4 Architectures, Components
4.1 Norms
4.2 Transfer functions
4.3 Loss functions
4.4 Stochastic Corruption of Theano Variables
4.5 Miscellaneous functionality
4.6 Layers
4.7 Common functions
4.8 Univariate Normal Distribution
4.9 Multivariate Normal Distribution
4.10 Utilities
4.11 Common functions
5 Implementation Notes
5.1 Variance propagation
6 Indices and tables
Bibliography
Python Module Index
CHAPTER 1
Basics
1.1 Specifying losses, norms, transfer functions etc.
To stay flexible and concise, models can be configured in two ways: either by passing a string or by
passing a function that follows a specific API.
1.1.1 Using the builtin loss functions
Let us start with an example. To instantiate a linear model, we can make use of the following notation:
from breze.learn.glm import Linear
model = Linear(5, 1, loss='squared')
In this case, we specify the sum of squares loss as a string. The logic behind this aims to be straightforward: for losses, a lookup is done in the module breze.arch.component.distance. Thus, the function
breze.arch.component.distance.squared is used as the loss. This function follows a simple protocol. In
the case of a supervised model, it is called with the target as its first argument and the output of the model as its
second argument. Both are required to be Theano variables. In the case of an unsupervised model, the output
of the model is the only argument passed on to the loss.
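The string-or-function convention can be sketched in a few lines. This is only an illustration of the idea, not breze's lookup code: the helper resolve and the stand-in namespace builtin_losses are hypothetical names, and the loss works on plain numbers instead of Theano variables so that the snippet runs on its own.

```python
from types import SimpleNamespace


def squared(target, output):
    # Stand-in loss on plain numbers; breze expects Theano variables here.
    return (target - output) ** 2


# Hypothetical stand-in for a module that holds the built-in losses.
builtin_losses = SimpleNamespace(squared=squared)


def resolve(what, where):
    """Return `what` if it is callable, else look it up by name in `where`."""
    if callable(what):
        return what
    return getattr(where, what)


loss_from_string = resolve('squared', builtin_losses)
loss_from_callable = resolve(lambda t, o: abs(t - o), builtin_losses)
```

Both results are ordinary callables taking (target, output), which is why a model can accept either form for its loss argument.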
A list of supervised losses can be found by checking the contents of the breze.arch.component.distance
module:
>>> from breze.arch.component import distance
>>> dir(distance)
['T',
'__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'absolute',
'bernoulli_kl',
'bernoulli_neg_cross_entropy',
'discrete_entropy',
'distance_matrix',
'lookup',
'nca',
'neg_cross_entropy',
'nominal_neg_cross_entropy',
'norm',
'squared']
Some of these names, such as T and the double-underscore attributes, are of course just module-level globals rather than loss functions.
1.1.2 Using custom loss functions
Using your own loss function comes down to implementing it according to the above protocol and working on Theano
variables. We can thus define the sum of squares loss ourselves as follows:
def squared(target, output):
    d = target - output
    return (d**2).sum()
We can also use more complicated loss functions. The Huber loss, for example, is a mix of the absolute error and the
squared error, depending on the size of the error. It depends on an additional threshold parameter and is defined as
follows:
\[
L_\delta(a) =
\begin{cases}
\frac{a^2}{2} & \text{if } |a| \le \delta, \\
\delta \left( |a| - \frac{\delta}{2} \right) & \text{else.}
\end{cases}
\]
We can implement this as follows:
import theano.tensor as T

delta = 0.1

def huber(target, output):
    d = target - output
    a = .5 * d**2
    b = delta * (abs(d) - delta / 2.)
    l = T.switch(abs(d) <= delta, a, b)
    return l.sum()
Unfortunately, this requires a global variable for the threshold. A more elegant solution is to use a function template, that is, a function that returns the loss with the threshold bound to it:
import theano.tensor as T

def make_huber(delta):
    def inner(target, output):
        d = target - output
        a = .5 * d**2
        b = delta * (abs(d) - delta / 2.)
        l = T.switch(abs(d) <= delta, a, b)
        return l.sum()
    return inner

my_huber = make_huber(0.1)
In this way we can create arbitrarily elaborate loss functions.
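To check the two branches of the Huber loss numerically without compiling a Theano function, the same template can be mirrored in NumPy. This is a sketch for sanity checking only; make_huber_np is a hypothetical name, and np.where stands in for T.switch.

```python
import numpy as np


def make_huber_np(delta):
    """NumPy mirror of the Huber loss template, for quick numerical checks."""
    def inner(target, output):
        d = target - output
        quadratic = 0.5 * d ** 2
        linear = delta * (np.abs(d) - delta / 2.0)
        # Quadratic branch for small errors, linear branch for large ones.
        return np.where(np.abs(d) <= delta, quadratic, linear).sum()
    return inner


huber_np = make_huber_np(1.0)
target = np.array([0.5, 2.0])
output = np.zeros(2)
value = huber_np(target, output)  # 0.5 * 0.5**2 + 1.0 * (2.0 - 0.5)
```

At |d| = delta both branches give delta**2 / 2, so the loss is continuous at the threshold.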
1.1.3 Using norms and transfer functions
The story is similar when using norms and transfer functions. In the former case, the module of interest is
breze.arch.component.norm. The protocol is that a single argument, a Theano variable, is given. The result
is expected to be a Theano variable of the same shape. This is also the case for transfer functions, except that the
module in question is breze.arch.component.transfer.
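As an illustration of the single-argument, shape-preserving protocol, here is a custom transfer function written with NumPy so that it runs without Theano. The name leaky_rectifier and the slope parameter are assumptions made for this example; for actual use with breze the body would have to be expressed with Theano operations (for example T.switch, as in the Huber loss above).

```python
import numpy as np


def leaky_rectifier(x, slope=0.01):
    # One argument in, one result of the same shape out.
    return np.where(x > 0, x, slope * x)


x = np.array([[-2.0, 0.0, 3.0]])
y = leaky_rectifier(x)
```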
CHAPTER 2
Models and Algorithms
Learning representations, clustering:
2.1 Principal Component Analysis
This module provides functionality for principal component analysis.
class breze.learn.pca.Pca(n_components=None, whiten=False)
Class to perform principal component analysis.
Attributes
n_components (integer) Number of components to keep.
whiten (boolean) Flag indicating whether to whiten the covariance matrix.
weights (array_like) 2D array representing the map from observable to latent space.
singular_values (array_like) 1D array containing the singular values of the problem.
Methods
fit(X)                 Fit the parameters of the model.
inverse_transform(F)   Perform an inverse transformation of transformed data according to the model.
reconstruct(X)         Reconstruct the data according to the model.
transform(X)           Transform data according to the model.
__init__(n_components=None, whiten=False)
Create a Pca object.
fit(X)
Fit the parameters of the model.
The data should be centered (that is, its mean subtracted rowwise) before using this method.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
inverse_transform(F)
Perform an inverse transformation of transformed data according to the model.
Parameters F : array_like
An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.
Returns X : array_like
An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.
reconstruct(X)
Reconstruct the data according to the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
Returns Y : array_like
An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.
transform(X)
Transform data according to the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
Returns Y : array_like
An array of shape (n, c) where n is the number of samples and c is the number of
components kept.
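The four methods fit together as follows. This is a minimal NumPy sketch based on the singular value decomposition, not breze's implementation; the class name TinyPca is hypothetical and whitening is left out.

```python
import numpy as np


class TinyPca:
    """Illustrative PCA via SVD; expects centered data of shape (n, d)."""

    def __init__(self, n_components=None):
        self.n_components = n_components

    def fit(self, X):
        _, s, vt = np.linalg.svd(X, full_matrices=False)
        k = self.n_components or vt.shape[0]
        self.singular_values = s[:k]
        self.weights = vt[:k].T  # shape (d, k): observable -> latent
        return self

    def transform(self, X):
        return X @ self.weights  # shape (n, k)

    def inverse_transform(self, F):
        return F @ self.weights.T  # shape (n, d)

    def reconstruct(self, X):
        return self.inverse_transform(self.transform(X))


rng = np.random.RandomState(0)
X = rng.randn(50, 4)
X -= X.mean(axis=0)  # center the data before fitting
pca = TinyPca(n_components=2).fit(X)
F = pca.transform(X)
```

With all components kept, reconstruct returns the input up to floating point error; with fewer components it returns the projection onto the leading principal directions.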
class breze.learn.pca.Zca(min_eig_val=0.1)
Class to perform zero component analysis.
Attributes
min_eig_val (float) Eigenvalues are increased by this value before reconstructing.
weights (array_like) 2D array representing the map from observable to latent space.
singular_values (array_like) 1D array containing the singular values of the problem.
Methods
fit(X)                 Fit the parameters of the model.
inverse_transform(F)   Perform an inverse transformation of transformed data according to the model.
reconstruct(X)         Reconstruct the data according to the model.
transform(X)           Transform data according to the model.
__init__(min_eig_val=0.1)
Create a Zca object.
fit(X)
Fit the parameters of the model.
The data should be centered (that is, its mean subtracted rowwise) before using this method.
inverse_transform(F)
Perform an inverse transformation of transformed data according to the model.
Parameters F : array_like
An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.
Returns X : array_like
An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.
reconstruct(X)
Reconstruct the data according to the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
Returns Y : array_like
An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.
transform(X)
Transform data according to the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
Returns Y : array_like
An array of shape (n, c) where n is the number of samples and c is the number of
components kept.
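ZCA whitening can be sketched with an eigendecomposition of the covariance matrix. The function below is an illustration under one assumption: min_eig_val is added to the eigenvalues before the inverse square root is taken. How breze applies the parameter internally is not shown in this section, and zca_whiten is a hypothetical name.

```python
import numpy as np


def zca_whiten(X, min_eig_val=0.1):
    """Whiten centered data X of shape (n, d); returns (whitened data, map)."""
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Assumption: regularize by shifting the eigenvalues upwards.
    scale = 1.0 / np.sqrt(eigvals + min_eig_val)
    W = eigvecs @ np.diag(scale) @ eigvecs.T  # symmetric whitening map
    return X @ W, W


rng = np.random.RandomState(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 3.0]])
X = rng.randn(500, 3) @ mixing
X -= X.mean(axis=0)
Y, W = zca_whiten(X, min_eig_val=1e-8)
```

With a very small min_eig_val the covariance of Y is close to the identity; larger values damp directions with little variance instead of amplifying them.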
2.2 Extreme Component Analysis
This module provides functionality for extreme component analysis.
An explanation and derivation of the algorithm can be found in [XCA].
class breze.learn.xca.Xca(n_components, whiten=False)
Class implementing extreme component analysis.
The idea is that not only the principal components or the minor components of a data set are important, but a
combination of the two. This algorithm works by combining probabilistic versions of PCA and MCA.
The central idea is that if n principal and m minor components are chosen, a gap of size D - m - n dimensions is
formed in the list of singular values. The exact location of this gap is found by choosing the one which minimizes
a likelihood combining PCA and MCA.
Attributes
n_components (integer) Number of components kept.
Methods
fit(X)                 Fit the parameters of the model.
inverse_transform(F)   Perform an inverse transformation of transformed data according to the model.
reconstruct(X)         Reconstruct the data according to the model.
transform(X)           Transform data according to the model.
__init__(n_components, whiten=False)
Create an Xca object.
Parameters n_components : integer
Amount of components to keep.
fit(X)
Fit the parameters of the model.
The data should be centered (that is, its mean subtracted rowwise) before using this method.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
inverse_transform(F)
Perform an inverse transformation of transformed data according to the model.
Parameters F : array_like
An array of shape (n, d) where n is the number of data points and d the dimensionality
of the feature space.
Returns X : array_like
An array of shape (n, c) where n is the number of samples and c is the dimensionality
of the input space.
reconstruct(X)
Reconstruct the data according to the model.
Returns Y : array_like
An array of shape (n, d) where n is the number of samples and d is the dimensionality
of the input space.
transform(X)
Transform data according to the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
Returns F : array_like
An array of shape (n, c) where n is the number of samples and c is the number of
components kept.
2.3 Sparse Filtering
Sparse Filtering.
As introduced in:
Jiquan Ngiam, Pangwei Koh, Zhenghao Chen, Sonia Bhaskar and Andrew Y. Ng. Sparse Filtering. In
NIPS*2011.
class breze.learn.sparsefiltering.SparseFiltering(n_inpt, n_output, feature_transfer='softabs',
optimizer='lbfgs', max_iter=1000, verbose=False)
Attributes
inpt
loss
output
f_score
f_transform
gradient_clip_threshold
mode
Methods
fit(X[, W])                                    Fit the parameters of the model.
function(variables, exprs[, mode, ...])        Return a compiled function for the given exprs given variables.
iter_fit(X[, W, info_opt])                     Iteratively fit the parameters of the model to the given data.
powerfit(fit_data, eval_data, stop, report)    Iteratively fit the model.
score(X[, W])                                  Return the score of the model given the input and targets.
transform(X)                                   Return the feature representation of the model given X.
var_exp_for_gpu(variables, exprs[, outputs])   Given variables and theano expressions built from these variables, return variable
__init__(n_inpt, n_output, feature_transfer='softabs', optimizer='lbfgs', max_iter=1000, verbose=False)
Create a SparseFiltering object.
Parameters n_inpt : int
Input dimensionality of the data.
n_output: int
Dimensionality of the hidden feature dimension.
feature_transfer : string or callable
Transfer function to use. Either a string referring to a function found in
breze.arch.component.transfer, or a function that maps an (n, d) array to an (n, d) array, both given as Theano expressions.
max_iter : int
Maximum number of optimization iterations to perform.
verbose : bool
Flag indicating whether to print out information during fitting.
fit(X, W=None)
Fit the parameters of the model.
Parameters X – Array representing the samples.
iter_fit(X, W=None, info_opt=None)
Iteratively fit the parameters of the model to the given data.
Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state
after each iteration, so the caller can interrupt the optimization at any time.
This method does not respect the max_iter attribute.
Parameters X – Array representing the samples.
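The relationship between fit and iter_fit described here can be sketched with a generator. ToyModel below is a hypothetical stand-in, not a breze class, and the keys of the yielded dictionary are assumptions made for the example; it only shows that the caller controls when an iter_fit loop stops, while fit stops after max_iter iterations.

```python
import itertools


class ToyModel:
    """Hypothetical model illustrating the fit / iter_fit relationship."""

    def __init__(self, max_iter=1000):
        self.max_iter = max_iter
        self.parameter = 0.0

    def iter_fit(self, X):
        # Runs forever; the model is usable after every yielded step.
        target = sum(X) / len(X)
        for n_iter in itertools.count(1):
            self.parameter += 0.5 * (target - self.parameter)
            yield {'n_iter': n_iter, 'loss': (target - self.parameter) ** 2}

    def fit(self, X):
        # fit is iter_fit plus a stopping rule based on max_iter.
        for info in self.iter_fit(X):
            if info['n_iter'] >= self.max_iter:
                break


model = ToyModel(max_iter=10)
for info in model.iter_fit([1.0, 2.0, 3.0]):
    # Caller-defined stopping criterion; max_iter plays no role here.
    if info['loss'] < 1e-6:
        break
```

In this run the loop stops after 11 iterations, one more than max_iter, because iter_fit ignores that attribute.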
transform(X)
Return the feature representation of the model given X.
Parameters X : array_like
Represents the inputs to be transformed.
Returns Y : array_like
Transformation of X under the model.
2.4 ICA with Reconstruction Cost
2.5 Canonical Correlation Analysis
breze.learn.cca.cca(X, Y)
Canonical Correlation Analysis
Parameters X : array_like
Observation matrix in first space, every column is one data point.
Y : array_like
Observation matrix in second space, every column is one data point.
Returns cA : array_like
Basis in X space
B : array_like
Basis in Y space.
clambdas : array_like
Correlation.
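For orientation, one textbook way to compute such bases is to whiten both spaces and take a singular value decomposition of the whitened cross-covariance. The sketch below follows that route with NumPy; it is not taken from breze, the names inv_sqrt and cca_sketch as well as the ridge parameter are assumptions, and the column-per-data-point layout follows the docstring above.

```python
import numpy as np


def inv_sqrt(C, ridge=1e-10):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(C)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + ridge)) @ eigvecs.T


def cca_sketch(X, Y):
    # Every column is one data point; center each space first.
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    n = X.shape[1]
    Kx = inv_sqrt(X @ X.T / n)
    Ky = inv_sqrt(Y @ Y.T / n)
    U, s, Vt = np.linalg.svd(Kx @ (X @ Y.T / n) @ Ky)
    # Bases in X and Y space, and the canonical correlations.
    return Kx @ U, Ky @ Vt.T, s


rng = np.random.RandomState(0)
X = rng.randn(3, 200)
M = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 3.0]])
A, B, correlations = cca_sketch(X, M @ X)
```

Because the second view is an invertible linear map of the first, all canonical correlations in this example are numerically equal to one.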
2.6 Slow Feature Analysis
Slow Feature Analysis.
This module provides functionality for slow feature analysis. A helpful article is hosted at Scholarpedia.
class breze.learn.sfa.SlowFeatureAnalysis(n_components=None)
Class for performing Slow feature analysis.
Attributes
n_components (integer) Number of components to keep.
Methods
fit(X)         Fit the parameters of the model.
transform(X)   Transform data according to the model.
__init__(n_components=None)
Create a SlowFeatureAnalysis object.
Parameters n_components : integer
Amount of components to keep.
fit(X)
Fit the parameters of the model.
The data should be centered (that is, its mean subtracted rowwise) and whitened (e.g. via pca.Pca) before
using this method.
Parameters X : list of array_like
A list of sequences. Each entry is expected to be an array of shape (*, d) where * is
the number of data points and may vary from item to item in the list. d is the input
dimensionality and has to be consistent.
Returns F : list of array_like
List of sequences. Each item in the list is an array which corresponds to the sequence
in X. It is of the same shape, except that d is replaced by n_components.
transform(X)
Transform data according to the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of time steps and d the input dimensionality.
Returns F : array_like
An array of shape (n, c) where n is the number of time steps and c is the number of
components kept.
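The linear form of slow feature analysis can be sketched as follows: on whitened data, the slowest directions are the eigenvectors of the covariance of the temporal differences with the smallest eigenvalues. This is a generic NumPy illustration for a single sequence, not breze's code; whiten and sfa_weights are hypothetical helper names.

```python
import numpy as np


def whiten(X):
    """Center and PCA-whiten a sequence of shape (n, d)."""
    X = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(X.T @ X / X.shape[0])
    return X @ eigvecs / np.sqrt(eigvals)


def sfa_weights(X, n_components):
    """Directions of slowest variation for whitened X of shape (n, d)."""
    dX = np.diff(X, axis=0)  # temporal differences
    eigvals, eigvecs = np.linalg.eigh(dX.T @ dX / dX.shape[0])
    return eigvecs[:, :n_components]  # eigh sorts ascending: slowest first


t = np.linspace(0, 2 * np.pi, 500)
sources = np.column_stack([np.sin(t), np.sin(25 * t)])  # slow and fast signal
X = whiten(sources @ np.array([[1.0, 0.5], [0.3, 1.0]]))
F = X @ sfa_weights(X, n_components=1)
```

The single extracted feature follows the slow sine up to sign and scale, even though both inputs are mixtures of the slow and the fast signal.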
2.7 K-Means
class breze.learn.kmeans.GainShapeKMeans(n_component, zscores=False, whiten=False,
c_zca=1e-08, max_iter=10, random_state=None)
GainShapeKMeans class to perform K-means clustering for feature learning as described in [LFRKM].
Parameters n_components : integer
Number of features to learn.
zscores : boolean, optional, default: False
Flag indicating whether the data should be normalized to zero mean and unit variance
before training and transformation.
whiten : boolean, optional, default: False
Flag indicating whether the data should be whitened before training and transformation.
c_zca : float, optional, default: 1e-8
Small number that is added to each singular value during ZCA.
max_iter : integer, optional
Maximum number of iterations to perform.
random_state : None, integer or numpy.RandomState, optional, default: None
Generator to initialize the dictionary. If None, the numpy singleton generator is used.
References
[LFRKM]
Attributes
activation ({'identity', 'omp-1', 'soft-threshold'}, optional, default: None) Activation to use for transformation. 'identity' does not alter the output. 'omp-1' only retains the component with the largest absolute value. 'soft-threshold' only sets components below a certain threshold to zero, but separates positive and negative parts.
threshold (scalar) Threshold used for soft-thresholding activation. Ignored if another activation is used.
Methods
fit(X)                       Fit the parameters of the model.
iter_fit(X)
normalize_dict()             Normalize the columns of the dictionary to unit length.
prepare(n_inpt)              Initialize the model's internal structures.
transform(X[, activation])   Transform the data according to the dictionary.
fit(X)
Fit the parameters of the model.
Parameters X : array_like
Array of shape (n_samples, n_inpt) used for training.
transform(X, activation=None)
Transform the data according to the dictionary.
Parameters X : array_like
Input data of shape (n_samples, n_inpt).
activation : {'identity', 'omp-1', 'soft-threshold'}, optional, default: None
Activation to use. 'identity' does not alter the output. 'omp-1' only retains the component
with the largest absolute value. 'soft-threshold' only sets components below a certain
threshold to zero, but separates positive and negative parts. If None, .activation is
used.
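The three activations can be illustrated on a small matrix of dictionary responses. The sketch below reflects one reading of the descriptions above and is not breze's implementation: the function activate is hypothetical, and for 'soft-threshold' it assumes that the positive and negative parts are returned as two separate blocks of features.

```python
import numpy as np


def activate(Z, activation='identity', threshold=0.0):
    """Apply an activation to responses Z of shape (n_samples, n_features)."""
    if activation == 'identity':
        return Z
    if activation == 'omp-1':
        # Keep only the entry with the largest absolute value in each row.
        out = np.zeros_like(Z)
        rows = np.arange(Z.shape[0])
        winners = np.abs(Z).argmax(axis=1)
        out[rows, winners] = Z[rows, winners]
        return out
    if activation == 'soft-threshold':
        # Assumption: positive and negative parts become separate features.
        positive = np.maximum(Z - threshold, 0)
        negative = np.maximum(-Z - threshold, 0)
        return np.hstack([positive, negative])
    raise ValueError('unknown activation: %s' % activation)


Z = np.array([[1.0, -3.0, 2.0]])
omp = activate(Z, 'omp-1')
soft = activate(Z, 'soft-threshold', threshold=1.5)
```

Under this reading, 'soft-threshold' doubles the number of output features, while the other two keep the shape of Z.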
2.8 Regularized Information Maximization
Regularized Information Maximization.
As introduced in [R3].
2.8.1 References
class breze.learn.rim.Rim(n_inpt, n_cluster, c_rim, optimizer='rprop', max_iter=1000, verbose=False)
Class for regularized information maximization.
Attributes
parameters (ParameterSet object) Parameters of the model.
n_inpt (integer) Input dimensionality of the data.
n_cluster (integer) Number of clusters to use.
c_rim (float) Value indicating the regularization strength.
optimizer (string or pair) Can be either a string or a pair. In any case, climin.util.optimizer is used to construct an optimizer. In the case of a string, the string is used as an identifier for the optimizer, which is then instantiated with default arguments. If a pair, it is expected to be (identifier, kwargs) for finer control of the optimizer.
max_iter (integer) Maximum number of optimization iterations to perform.
verbose (boolean) Flag indicating whether to print out information during fitting.
Methods
fit(X[, W])                                    Fit the parameters of the model.
function(variables, exprs[, mode, ...])        Return a compiled function for the given exprs given variables.
iter_fit(X[, W, info_opt])                     Iteratively fit the parameters of the model to the given data.
powerfit(fit_data, eval_data, stop, report)    Iteratively fit the model.
score(X[, W])                                  Return the score of the model given the input and targets.
transform(X)                                   Return the feature representation of the model given X.
var_exp_for_gpu(variables, exprs[, outputs])   Given variables and theano expressions built from these variables, return variable
__init__(n_inpt, n_cluster, c_rim, optimizer='rprop', max_iter=1000, verbose=False)
Create a Rim object.
Parameters n_inpt : integer
Input dimensionality of the data.
n_cluster : integer
Amount of clusters to use.
c_rim : float
Value indicating the regularization strength.
optimizer : string or pair
Can be either a string or a pair. In any case, climin.util.optimizer is used to
construct an optimizer. In the case of a string, the string is used as an identifier for the
optimizer which is then instantiated with default arguments. If a pair, expected to be
(identifier, kwargs) for more fine control of the optimizer.
max_iter : integer
Maximum number of optimization iterations to perform.
verbose : boolean
Flag indicating whether to print out information during fitting.
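The two accepted forms of the optimizer argument can be brought into one shape before an optimizer is constructed. The helper below is a hypothetical illustration of that normalization and does not call climin; the keyword step_rate in the usage lines is only an example value for the kwargs dictionary.

```python
def normalize_optimizer_spec(optimizer):
    """Turn 'name' or ('name', kwargs) into a (name, kwargs) pair."""
    if isinstance(optimizer, str):
        # A bare identifier means: use the optimizer's default arguments.
        return optimizer, {}
    identifier, kwargs = optimizer
    return identifier, dict(kwargs)


plain = normalize_optimizer_spec('rprop')
tuned = normalize_optimizer_spec(('gd', {'step_rate': 0.1}))
```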
fit(X, W=None)
Fit the parameters of the model.
Parameters X – Array representing the samples.
iter_fit(X, W=None, info_opt=None)
Iteratively fit the parameters of the model to the given data.
Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state
after each iteration, so the caller can interrupt the optimization at any time.
This method does not respect the max_iter attribute.
Parameters X – Array representing the samples.
transform(X)
Return the feature representation of the model given X.
Parameters X : array_like
Represents the inputs to be transformed.
Returns Y : array_like
Transformation of X under the model.
2.9 Stochastic Gradient Variational Bayes
2.9.1 Variational Autoencoder
Denoising:
2.10 Linear Denoiser
Module for the linear denoiser.
class breze.learn.lde.LinearDenoiser(p_dropout)
Class that represents linear denoisers.
LinearDenoisers (LDEs) were later also named Marginalized Denoising AutoEncoders.
Introduced in [R1].
References
[R1]
Methods
fit(X)         Fit the parameters of the model.
transform(X)   Transform data according to the model.
__init__(p_dropout)
Create a LinearDenoiser object.
Parameters p_dropout : float
Probability of an input being dropped out.
fit(X)
Fit the parameters of the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
transform(X)
Transform data according to the model.
Parameters X : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
Returns Y : array_like
An array of shape (n, d) where n is the number of data points and d the input dimensionality.
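A linear denoiser of this kind admits a closed-form solution in which the dropout corruption is marginalized out. The sketch below follows the formulation commonly given for marginalized denoising autoencoders, without a bias term; it is an assumption that breze's fit proceeds along these lines, and fit_linear_denoiser and the ridge parameter are hypothetical.

```python
import numpy as np


def fit_linear_denoiser(X, p_dropout, ridge=1e-8):
    """Closed-form denoising map W for data X of shape (n, d)."""
    d = X.shape[1]
    q = np.full(d, 1.0 - p_dropout)       # survival probability per input
    S = X.T @ X                           # scatter matrix, shape (d, d)
    Q = S * np.outer(q, q)                # expected corrupted scatter, off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))   # diagonal carries one survival factor
    P = S * q                             # expected clean-times-corrupted scatter
    # W = P Q^{-1}; Q is symmetric, so solve against P.T and transpose.
    return np.linalg.solve(Q + ridge * np.eye(d), P.T).T


rng = np.random.RandomState(0)
X = rng.randn(200, 4)
W_clean = fit_linear_denoiser(X, p_dropout=0.0)
W_noisy = fit_linear_denoiser(X, p_dropout=0.5)
denoised = X @ W_noisy.T                  # same shape as X
```

Without corruption (p_dropout=0) the optimal map is the identity, which is a convenient sanity check for the formulas.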
Supervised Learning:
2.11 Recurrent Neural Networks
Module for learning various types of recurrent networks.
class breze.learn.rnn.SupervisedRnn(n_inpt, n_hiddens, n_output, hidden_transfers,
out_transfer='identity', loss='squared', pooling=None,
gradient_clip=False, optimizer='rprop', batch_size=None,
imp_weight=False, max_iter=1000, verbose=False)
Attributes
inpt
loss
output
sample_dim
target
f_predict
f_score
gradient_clip_threshold
mode
Methods
fit(X, Z[, imp_weight])                        Fit the parameters of the model to the given data with the given error function.
function(variables, exprs[, mode, ...])        Return a compiled function for the given exprs given variables.
initialize([par_std, par_std_affine, ...])
iter_fit(X, Z[, imp_weight, info_opt])         Iteratively fit the parameters of the model to the given data with the given error function.
powerfit(fit_data, eval_data, stop, report)    Iteratively fit the model.
predict(X)                                     Return the prediction of the model given the input.
score(X, Z[, imp_weight])                      Return the score of the model given the input and targets.
var_exp_for_gpu(variables, exprs[, outputs])   Given variables and theano expressions built from these variables, return variable
fit(X, Z, imp_weight=None)
Fit the parameters of the model to the given data with the given error function.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
iter_fit(X, Z, imp_weight=None, info_opt=None)
Iteratively fit the parameters of the model to the given data with the given error function.
Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state
after each iteration, so the caller can interrupt the optimization at any time.
This method does not respect the max_iter attribute.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
predict(X)
Return the prediction of the model given the input.
Parameters X : array_like
Input to the model.
Returns Y : array_like
2.12 Multilayer Perceptrons
Module for learning various types of multilayer perceptrons.
class breze.learn.mlp.Mlp(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss,
imp_weight=False, optimizer='adam', batch_size=None, max_iter=1000,
verbose=False)
Multilayer perceptron class.
This implementation uses a stack of affine mappings, each followed by a nonlinearity.
Parameters n_inpt : integer
Dimensionality of a single input.
n_hiddens : list of integers
List of k integers, where k is the number of layers. Each gives the size of the corresponding layer.
n_output : integer
Dimensionality of a single output.
hidden_transfers : list, each item either string or function
Transfer functions for each of the layers. Each can be either a string, which is then used
to look up a transfer function in breze.arch.component.transfer, or a function that
given a Theano tensor returns a tensor of the same shape.
out_transfer : string or function
Either a string to look up a function in breze.arch.component.transfer or a function
that given a Theano tensor returns a tensor of the same shape.
optimizer : string, pair
Argument is passed to climin.util.optimizer to construct an optimizer.
batch_size : integer, None
Number of examples per batch when calculating the loss and its derivatives. None means
to use all samples every time.
imp_weight : boolean
Flag indicating whether importance weights are used.
max_iter : int
Maximum number of optimization iterations to perform. Only respected during .fit(),
not .iter_fit().
verbose : boolean
Flag indicating whether to print out information during fitting.
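How n_inpt, n_hiddens, n_output and the transfer functions determine the computation can be sketched as a plain forward pass. This NumPy version is illustrative only; the function mlp_forward and the parameter initialization are assumptions, and breze builds the corresponding expression with Theano.

```python
import numpy as np


def mlp_forward(X, weights, biases, transfers):
    """Apply a stack of affine mappings, each followed by a transfer function."""
    activation = X
    for W, b, f in zip(weights, biases, transfers):
        activation = f(activation @ W + b)
    return activation


def identity(x):
    return x


rng = np.random.RandomState(0)
n_inpt, n_hiddens, n_output = 3, [5, 4], 2
sizes = [n_inpt] + n_hiddens + [n_output]
# One weight matrix and one bias vector per affine mapping.
weights = [rng.randn(m, n) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
# Two hidden transfers plus the output transfer.
Y = mlp_forward(rng.randn(7, n_inpt), weights, biases, [np.tanh, np.tanh, identity])
```

A network with k hidden layers therefore has k + 1 affine mappings: k entries in hidden_transfers plus one out_transfer.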
Attributes
inpt
loss
output
target
f_predict
f_score
gradient_clip_threshold
mode
Methods
fit(X, Z[, imp_weight])                        Fit the parameters of the model to the given data with the given error function.
function(variables, exprs[, mode, ...])        Return a compiled function for the given exprs given variables.
iter_fit(X, Z[, imp_weight, info_opt])         Iteratively fit the parameters of the model to the given data with the given error function.
powerfit(fit_data, eval_data, stop, report)    Iteratively fit the model.
predict(X)                                     Return the prediction of the model given the input.
score(X, Z[, imp_weight])                      Return the score of the model given the input and targets.
var_exp_for_gpu(variables, exprs[, outputs])   Given variables and theano expressions built from these variables, return variable
fit(X, Z, imp_weight=None)
Fit the parameters of the model to the given data with the given error function.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
iter_fit(X, Z, imp_weight=None, info_opt=None)
Iteratively fit the parameters of the model to the given data with the given error function.
Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state
after each iteration, so the caller can interrupt the optimization at any time.
This method does not respect the max_iter attribute.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
predict(X)
Return the prediction of the model given the input.
Parameters X : array_like
Input to the model.
Returns Y : array_like
class breze.learn.mlp.DropoutMlp(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer,
loss, p_dropout_inpt=0.2, p_dropout_hiddens=0.5,
max_length=None, optimizer='adam', batch_size=None,
max_iter=1000, verbose=False)
Class representing an MLP that is trained with dropout [D1].
The gist of this method is that hidden units and input units are "zeroed out" with a certain probability.
References
[D1]
Attributes
Same attributes as an Mlp object.
p_dropout_inpt (float) Probability that an input unit is omitted during a pass.
p_dropout_hidden (float) Probability that a hidden unit is omitted during a pass.
max_length (float) Maximum squared length of a weight vector into a unit. After each update, the weight vectors will be projected to be shorter.
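Both ingredients, the random omission of units and the length constraint on incoming weight vectors, can be sketched in NumPy. The functions dropout and project_to_max_length are hypothetical, and it is an assumption that each column of W holds the weights into one unit; breze's own implementation may organize this differently.

```python
import numpy as np


def dropout(X, p_dropout, rng):
    """Zero each entry of X independently with probability p_dropout."""
    mask = rng.uniform(size=X.shape) >= p_dropout
    return X * mask


def project_to_max_length(W, max_length):
    """Rescale columns of W whose squared length exceeds max_length."""
    sq_lengths = (W ** 2).sum(axis=0)
    # Columns that are already short enough get a scale of exactly 1.
    scale = np.sqrt(np.minimum(1.0, max_length / sq_lengths))
    return W * scale


rng = np.random.RandomState(0)
X = np.ones((4, 3))
W = np.array([[3.0, 0.1], [4.0, 0.2]])
W_projected = project_to_max_length(W, max_length=1.0)
```

In the example the first column has squared length 25 and is shrunk to unit length, while the second column is left untouched.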
Methods
fit(X, Z[, imp_weight])                        Fit the parameters of the model to the given data with the given error function.
function(variables, exprs[, mode, ...])        Return a compiled function for the given exprs given variables.
iter_fit(X, Z[, imp_weight, info_opt])         Iteratively fit the parameters of the model to the given data with the given error function.
powerfit(fit_data, eval_data, stop, report)    Iteratively fit the model.
predict(X)                                     Return the prediction of the model given the input.
score(X, Z[, imp_weight])                      Return the score of the model given the input and targets.
var_exp_for_gpu(variables, exprs[, outputs])   Given variables and theano expressions built from these variables, return variable
__init__(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, p_dropout_inpt=0.2,
p_dropout_hiddens=0.5, max_length=None, optimizer='adam', batch_size=None,
max_iter=1000, verbose=False)
Create a DropoutMlp object.
Parameters Same parameters as an Mlp object.
p_dropout_inpt : float
Probability that an input unit is omitted during a pass.
p_dropout_hiddens : list of floats
List of which each item gives the probability that a hidden unit of that layer is omitted
during a pass.
fit(X, Z, imp_weight=None)
Fit the parameters of the model to the given data with the given error function.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
iter_fit(X, Z, imp_weight=None, info_opt=None)
Iteratively fit the parameters of the model to the given data with the given error function.
Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state
after each iteration, so the caller can stop the optimization at any time.
This method does not respect the max_iter attribute.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
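The iterator protocol described for iter_fit can be illustrated with a toy generator. This is a minimal sketch in plain Python, not breze code: the function iter_fit_sketch, the quadratic objective and the keys of the yielded dictionary are all hypothetical.

```python
def iter_fit_sketch(start, learning_rate=0.1):
    """Yield an info dict after every gradient step on f(w) = (w - 3)^2."""
    w = start
    n_iter = 0
    while True:
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)^2
        w -= learning_rate * grad   # the parameter is usable after every step
        n_iter += 1
        yield {"n_iter": n_iter, "w": w, "loss": (w - 3.0) ** 2}


# The caller decides when to stop; the generator itself never terminates.
for info in iter_fit_sketch(start=0.0):
    if info["loss"] < 1e-6 or info["n_iter"] >= 1000:
        break
```

Because the generator does not enforce an iteration limit, any stopping criterion has to live in the calling loop, which mirrors the remark that max_iter is not respected.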
predict(X)
Return the prediction of the model given the input.
Parameters X : array_like
Input to the model.
Returns Y : array_like
class breze.learn.mlp.FastDropoutNetwork(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, imp_weight=False, optimizer='adam', batch_size=None, p_dropout_inpt=0.2, p_dropout_hiddens=0.5, max_iter=1000, verbose=False)
Class representing an MLP that is trained with fast dropout [FD].
This method employs a smooth approximation of dropout training.
References
[FD]
Attributes
Same attributes as an Mlp object.
p_dropout_inpt (float) Probability that an input unit is omitted during a pass.
p_dropout_hiddens (list of floats) Each item constitutes the probability that a hidden unit of the corresponding layer is omitted during a pass.
inpt_var (float) Assumed variance of the inputs. “Quasi zero” by default.
Methods
fit(X, Z[, imp_weight]) – Fit the parameters of the model to the given data with the given error function.
function(variables, exprs[, mode, ...]) – Return a compiled function for the given exprs given variables.
iter_fit(X, Z[, imp_weight, info_opt]) – Iteratively fit the parameters of the model to the given data with the given error function.
powerfit(fit_data, eval_data, stop, report) – Iteratively fit the model.
predict(X) – Return the prediction of the model given the input.
score(X, Z[, imp_weight]) – Return the score of the model given the input and targets.
var_exp_for_gpu(variables, exprs[, outputs]) – Given variables and theano expressions built from these variables, return variable
__init__(n_inpt, n_hiddens, n_output, hidden_transfers, out_transfer, loss, imp_weight=False, optimizer='adam', batch_size=None, p_dropout_inpt=0.2, p_dropout_hiddens=0.5, max_iter=1000, verbose=False)
Create a FastDropoutNetwork object.
Parameters Same parameters as an Mlp object.
p_dropout_inpt : float
Probability that an input unit is omitted during a pass.
p_dropout_hiddens : float
Probability that a hidden unit is omitted during a pass.
max_length : float or None
Maximum squared length of a weight vector into a unit. After each update, the weight
vectors will be projected to be shorter. If None, no projection is performed.
fit(X, Z, imp_weight=None)
Fit the parameters of the model to the given data with the given error function.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
iter_fit(X, Z, imp_weight=None, info_opt=None)
Iteratively fit the parameters of the model to the given data with the given error function.
Each iteration of the learning algorithm is an iteration of the returned iterator. The model is in a valid state
after each iteration, so the caller can stop the optimization at any time.
This method does not respect the max_iter attribute.
Parameters
• X – Array representing the inputs.
• Z – Array representing the outputs.
predict(X)
Return the prediction of the model given the input.
Parameters X : array_like
Input to the model.
Returns Y : array_like
Sampling
2.13 Hybrid Monte Carlo
breze.learn.sampling.hmc.sample(f_energy,
f_energy_prime,
position,
n_steps,
desired_accept=0.9,
initial_step_size=0.01,
step_size_grow=1.02,
step_size_shrink=0.98,
step_size_min=0.0001,
step_size_max=0.25,
avg_accept_slowness=0.9, sample_dim=0)
Return a sample from the distribution given by f_energy.
Parameters
• f_energy – Log of a function proportional to the density.
• f_energy_prime – Derivative of f_energy with respect to the current position.
• position – A NumPy array of any desired shape which represents multiple particles.
• n_steps – Number of steps to perform for the next sample.
• desired_accept – Desired acceptance rate of the underlying Metropolis-Hastings step.
• initial_step_size – Initial size of a step along the energy landscape.
• step_size_grow – If the acceptance rate is too high, increase the step size by this factor.
• step_size_shrink – If the acceptance rate is too low, decrease the step size by this
factor.
• step_size_min – Don’t decrease the step size below this value.
• step_size_max – Don’t increase the step size above this value.
• avg_accept_slowness – When calculating the acceptance rate, use this value as a
decay for an exponential average.
• sample_dim – The axis which discriminates the different particles given in the position
array from each other.
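The step size parameters above interact as a simple feedback rule. The following is a minimal sketch of one plausible reading of that rule, in plain Python; the helper adapt_step_size is hypothetical and is not taken from the breze source, only the parameter names and defaults come from the signature above.

```python
def adapt_step_size(step_size, avg_accept, accepted,
                    desired_accept=0.9, step_size_grow=1.02,
                    step_size_shrink=0.98, step_size_min=0.0001,
                    step_size_max=0.25, avg_accept_slowness=0.9):
    """Return the updated (step_size, avg_accept) after one accept/reject decision."""
    # Exponential moving average of the acceptance indicator.
    avg_accept = (avg_accept_slowness * avg_accept
                  + (1.0 - avg_accept_slowness) * float(accepted))
    if avg_accept > desired_accept:
        step_size *= step_size_grow      # accepting too often: take larger steps
    else:
        step_size *= step_size_shrink    # accepting too rarely: take smaller steps
    # Keep the step size inside [step_size_min, step_size_max].
    step_size = min(max(step_size, step_size_min), step_size_max)
    return step_size, avg_accept


step, avg = adapt_step_size(0.01, avg_accept=0.95, accepted=True)
```

A value of avg_accept_slowness close to 1 makes the running acceptance rate react slowly to individual decisions, so the step size changes direction less often.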
Trainers
2.14 Trainers
2.14.1 Trainer module
2.14.2 Score module
Module for various scoring strategies.
breze.learn.trainer.score.simple(f_score, *data)
Simple scoring strategy which just applies f_score to the passed arguments.
class breze.learn.trainer.score.MinibatchScore(max_samples, sample_dims)
MinibatchScore class.
Scoring strategy for very large data sets, where the score of only a subset of rows can be calculated at the same
time. This score assumes that scores are averages.
Attributes
max_samples (int) Maximum number of samples to calculate the score for at the same time.
sample_dims (list of ints) Dimensions along which the samples are stored. The length of this list corresponds to the number of arguments the score takes. Each entry gives the axis along which different samples are stored for the corresponding argument.
Methods
__call__(f_score, *data) – Return the score of the data.
__init__(max_samples, sample_dims)
Create MinibatchScore object.
Parameters max_samples : int
Maximum number of samples to calculate the score for at the same time.
sample_dims : list of ints
Dimensions along which the samples are stored. The length of this list corresponds to
the number of arguments the score takes. Each entry gives the axis along which different
samples are stored for the corresponding argument.
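The assumption that scores are averages is what makes chunked evaluation possible: per-chunk means can be recombined into the overall mean when weighted by chunk size. A minimal sketch for a single one-dimensional argument, in plain Python; minibatch_score and mean_square are hypothetical names, not breze functions.

```python
def minibatch_score(f_score, rows, max_samples):
    """Average f_score over rows, evaluating at most max_samples rows at once."""
    total = 0.0
    for start in range(0, len(rows), max_samples):
        chunk = rows[start:start + max_samples]
        # f_score returns the mean over the chunk, so weight it by the chunk size.
        total += f_score(chunk) * len(chunk)
    return total / len(rows)


def mean_square(chunk):
    return sum(x * x for x in chunk) / len(chunk)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
score = minibatch_score(mean_square, data, max_samples=2)
```

The weighting matters whenever the last chunk is shorter than max_samples, as it is here.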
2.14.3 Report module
CHAPTER 3
Helpers, convenience functions and tools
3.1 Feature extraction
3.1.1 Basic feature extraction
breze.learn.feature.rbf(X, n_centers)
Return a design matrix with features given by radial basis functions.
n_centers Gaussian kernels are placed along each data dimension, equidistant between the minimum and the maximum along that dimension. The result then contains one column for each of the kernels.
Parameters
• X – NxD sized array.
• n_centers – Number of kernels to use for each dimension.
Returns Nx(n_centers * D) sized array.
3.1.2 Feature extraction for EMG and similar time series data
Module that holds various preprocessing routines for emg signals.
breze.learn.feature.emg.integrated(X)
Return the sum of the absolute values of a signal.
Parameters X – An (t, n, d) array where t is the number of time steps, n is the number of different
signals and d is the number of channels.
Returns An (n, d) array.
breze.learn.feature.emg.mean_absolute_value(X)
Return the mean absolute value of the signal.
Parameters X – An (t, n, d) array where t is the number of time steps, n is the number of different
signals and d is the number of channels.
Returns An (n, d) array.
breze.learn.feature.emg.modified_mean_absolute_value_1(X)
Return a weighted version of the mean absolute value.
Instead of equal weight, the first and last quarter of the signal are weighted by only one half.
breze.learn.feature.emg.modified_mean_absolute_value_2(X)
Return a weighted version of the mean absolute value.
The central half of the signal has weight one. Over the first quarter the weight increases towards one, and over the
last quarter it decreases again.
Parameters X – An (t, n, d) array where t is the number of time steps, n is the number of different
signals and d is the number of channels.
Returns An (n, d) array.
breze.learn.feature.emg.mean_absolute_value_slope(X)
Return the first derivative of the mean absolute value.
Parameters X – An (t, n, d) array where t is the number of time steps, n is the number of different
signals and d is the number of channels.
Returns An (n, d) array.
breze.learn.feature.emg.variance(X)
Return the variance of the signals.
Parameters X – An (t, n, d) array where t is the number of time steps, n is the number of different
signals and d is the number of channels.
Returns An (n, d) array.
breze.learn.feature.emg.root_mean_square(X)
Return the root mean square of the signals.
Parameters X – An (t, n, d) array where t is the number of time steps, n is the number of different
signals and d is the number of channels.
Returns An (n, d) array.
breze.learn.feature.emg.zero_crossing(X, threshold=1e-08)
Return the number of times the signal crosses zero.
Parameters
• X – An (t, n, d) array where t is the number of time steps, n is the number of different signals
and d is the number of channels.
• threshold – Changes below this value are ignored. Useful to suppress noise.
Returns An (n, d) array.
breze.learn.feature.emg.slope_sign_change(X, threshold=1e-08)
Return the number of times the signal changes slope.
Parameters
• X – An (t, n, d) array where t is the number of time steps, n is the number of different signals
and d is the number of channels.
• threshold – Changes below this value are ignored. Useful to suppress noise.
Returns An (n, d) array.
breze.learn.feature.emg.willison_amplitude(X, threshold=1e-08)
Return the number of times the difference between two adjacent emg segments exceeds a threshold.
Parameters
• X – An (t, n, d) array where t is the number of time steps, n is the number of different signals
and d is the number of channels.
• threshold – Changes below this value are ignored. Useful to suppress noise.
Returns An (n, d) array.
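To make some of the features above concrete, here is a minimal sketch for a single signal stored as a plain Python list instead of a (t, n, d) array. The functions are re-implementations for illustration only; in particular, the way threshold is applied in zero_crossing is an assumption and may differ from the breze implementation.

```python
def mean_absolute_value(signal):
    return sum(abs(x) for x in signal) / len(signal)


def root_mean_square(signal):
    return (sum(x * x for x in signal) / len(signal)) ** 0.5


def zero_crossing(signal, threshold=1e-8):
    """Count sign changes between neighbours that differ by at least threshold."""
    count = 0
    for a, b in zip(signal, signal[1:]):
        if a * b < 0 and abs(a - b) >= threshold:
            count += 1
    return count


signal = [1.0, -1.0, 2.0, 3.0, -2.0]
mav = mean_absolute_value(signal)
rms = root_mean_square(signal)
crossings = zero_crossing(signal)
```

Each function reduces the time axis to one number, which corresponds to going from a (t, n, d) array to an (n, d) array in the documented functions.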
3.2 Data manipulation
Module for manipulating data.
breze.learn.data.shuffle(data)
Shuffle the first dimension of an indexable object in place.
breze.learn.data.padzeros(lst, front=True, return_mask=False)
Given a list of arrays, pad every array with zeros at the front until they all have the same length.
Each element of lst can have a different first dimension, but has to be equal on the other dimensions.
breze.learn.data.collapse_seq_borders(arr)
Given an array of ndim 3, return a view of ndim 2 where the first dimension is flattened out.
breze.learn.data.uncollapse_seq_borders(arr, shape)
Return a view of ndim 3, given an array of ndim 2, where the first dimension is expanded to 2 dimensions of the
given shape.
breze.learn.data.skip(X, n, d=1)
Return an array X with the same number of rows, but only each n'th block of d consecutive columns is kept.
Crude way of reducing the dimensionality of time series.
breze.learn.data.interleave(lst)
Given a list of arrays, interleave the arrays in a way that the first dimension represents the first dimension of
every array.
This is useful for time series, where multiple time series should be processed in a single sweep.
breze.learn.data.uninterleave(lst)
Given an array of interleaved arrays, return an uninterleaved version of it.
breze.learn.data.interpolate(X, n_intermediates, kind='linear')
Given an array of shape (j, k), return an array of size (j * n_intermediates, k) where each i * n_intermediates
element refers to the i'th element in X while all the others are linearly interpolated.
breze.learn.data.windowify(X, size, offset=1)
Return a static array that represents a sliding window dataset of size size given by the list of arrays X.
breze.learn.data.iter_windows(X, size, offset=1)
Return an iterator that goes over a sequential dataset with a sliding time window.
X is expected to be a list of arrays, where each array represents a sequence along its first axis.
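The sliding window idea can be sketched as follows, using plain Python lists as sequences. The function iter_windows_sketch is hypothetical; how breze treats sequences shorter than size, or the exact meaning of offset, is assumed here rather than taken from the source.

```python
def iter_windows_sketch(seqs, size, offset=1):
    """Yield every window of size consecutive time steps, sequence by sequence."""
    for seq in seqs:
        # Assumption: offset is the stride between the starts of two windows.
        for start in range(0, len(seq) - size + 1, offset):
            yield seq[start:start + size]


seqs = [[0, 1, 2, 3], [10, 11, 12]]
windows = list(iter_windows_sketch(seqs, size=3))
```

Windows never span two sequences, which is why the input is a list of arrays and not one concatenated array.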
breze.learn.data.split(X, maxlength)
Return a list of sequences where each sequence has a length of at most maxlength.
Given a list of sequences X, the sequences are split accordingly.
breze.learn.data.collapse(X, n)
Return a list of sequences, where n consecutive timesteps have been collapsed into a single timestep by concatenation for each sequence.
Timesteps are cut off to ensure divisibility by n.
breze.learn.data.uncollapse(X, n)
Return a list of sequences, where each timestep is divided into n consecutive timesteps.
breze.learn.data.consecutify(seqs)
Given sequences of equal second dimension, put them into a consecutive memory block M and return it. Also
return a list of views to that block that represent the given sequences.
3.3 Various utilities
This function was taken from the deeplearning tutorials. The copyright notice is in the source.
3.4 Helpers for plotting data
breze.learn.display.scatterplot_matrix(X, C=None, symb='o', alpha=1, fig=None)
Return a figure containing a scatter plot matrix.
This is a useful tool for inspecting multi dimensional data. Each dimension will be plotted against each dimension as a scatter plot, arranged into a matrix. The diagonal will contain histograms.
Parameters X : array_like
2D array containing the points to plot.
C : array_like
Class labels (optional). Each row of X with the same value in C will be given the same
color in the plots.
symb : string
Symbol to use for plotting. Will be forwarded to pylab.plot.
alpha : float
Between 0 and 1. Transparency of the points, where 1 means fully opaque.
fig : matplotlib.pyplot.Figure or None
Figure to plot into. If None, will be created itself.
breze.learn.display.time_series_filter_plot(filters, n_rows=None, n_cols=None, fig=None)
Plot filters for time series data.
Each filter is plotted into its own axis.
Parameters filters : array_like
The argument filters is expected to be an array of shape (n_filters,
window_size, n_channels). n_filters is the number of filter banks,
window_size is the length of a time window and n_channels is the number of
different sensors.
n_rows : int, optional, default: None
Number of rows for the plot. If not given, inferred from n_cols to match dimensions.
If n_cols is not given as well, both are taken to be roughly the square root of the
number of filters.
n_cols : int, optional, default: None
Number of columns for the plot. If not given, inferred from n_rows to match dimensions.
If n_rows is not given as well, both are taken to be roughly the square root of the
number of filters.
fig : Figure, optional
Figure to plot the axes into. If not given, a new one is created.
Returns figure : matplotlib figure
Figure object to save or plot.
The following function was adapted from the scipy cookbook.
breze.learn.display.hinton(ax, W, max_weight=None)
Draws a Hinton diagram for the matrix W to axis ax.
CHAPTER 4
Architectures, Components
4.1 Norms
Module containing various norms.
breze.arch.component.norm.l2(arr, axis=None)
Return the L2 norm of a tensor.
Parameters arr : Theano variable.
The variable to calculate the norm of.
axis : integer, optional [default: None]
The sum will be performed along this axis. This makes it possible to calculate the norm
of many tensors in parallel, given they are organized along some axis. If not given, the
norm will be computed for the whole tensor.
Returns res : Theano variable.
If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension
less, where the missing dimension corresponds to axis.
Examples
>>> v = T.vector()
>>> this_norm = l2(v)
>>> m = T.matrix()
>>> this_norm = l2(m, axis=1)
>>> m = T.matrix()
>>> this_norm = l2(m)
breze.arch.component.norm.l1(arr, axis=None)
Return the L1 norm of a tensor.
Parameters arr : Theano variable.
The variable to calculate the norm of.
axis : integer, optional [default: None]
The sum will be performed along this axis. This makes it possible to calculate the norm
of many tensors in parallel, given they are organized along some axis. If not given, the
norm will be computed for the whole tensor.
Returns res : Theano variable.
If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension
less, where the missing dimension corresponds to axis.
Examples
>>> v = T.vector()
>>> this_norm = l1(v)
>>> m = T.matrix()
>>> this_norm = l1(m, axis=1)
>>> m = T.matrix()
>>> this_norm = l1(m)
breze.arch.component.norm.soft_l1(inpt, eps=1e-08, axis=None)
Return a “soft” L1 norm of a tensor.
The term “soft” is used because we are using $\sqrt{x^2 + \epsilon}$ in favor of $|x|$, which is not smooth at $x = 0$.
Parameters inpt : Theano variable.
The variable to calculate the norm of.
eps : float, optional [default: 1e-8]
Small offset to make the function more smooth.
axis : integer, optional [default: None]
The sum will be performed along this axis. This makes it possible to calculate the norm
of many tensors in parallel, given they are organized along some axis. If not given, the
norm will be computed for the whole tensor.
Returns res : Theano variable.
If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension
less, where the missing dimension corresponds to axis.
Examples
>>> v = T.vector()
>>> this_norm = soft_l1(v)
>>> m = T.matrix()
>>> this_norm = soft_l1(m, axis=1)
>>> m = T.matrix()
>>> this_norm = soft_l1(m)
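For intuition, the norms can be evaluated numerically on a small vector. This is a minimal sketch using the textbook definitions on plain Python lists; it does not use Theano, has no axis argument, and the breze implementations may differ in detail (for example in additional stabilising offsets).

```python
import math


def l1(v):
    return sum(abs(x) for x in v)


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def soft_l1(v, eps=1e-8):
    # sqrt(x^2 + eps) is differentiable at x = 0, unlike |x|.
    return sum(math.sqrt(x * x + eps) for x in v)


v = [3.0, -4.0, 0.0]
norms = (l1(v), l2(v), soft_l1(v))
```

The soft version is always slightly larger than the L1 norm; the largest per-element gap, sqrt(eps), occurs at entries that are exactly zero.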
breze.arch.component.norm.lp(inpt, p, axis=None)
Return the Lp norm of a tensor.
Parameters inpt : Theano variable.
The variable to calculate the norm of.
p : Theano variable or float.
Order of the norm.
axis : integer, optional [default: None]
The sum will be performed along this axis. This makes it possible to calculate the norm
of many tensors in parallel, given they are organized along some axis. If not given, the
norm will be computed for the whole tensor.
Returns res : Theano variable.
If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension
less, where the missing dimension corresponds to axis.
Examples
>>> v = T.vector()
>>> this_norm = lp(v, .5)
>>> m = T.matrix()
>>> this_norm = lp(m, 3, axis=1)
>>> m = T.matrix()
>>> this_norm = lp(m, 4)
4.2 Transfer functions
Module that keeps various transfer functions as used in the context of neural networks.
breze.arch.component.transfer.tanh(inpt)
Tanh activation function.
Parameters inpt : Theano variable
Input to be transformed.
Returns output : Theano variable
Transformed output. Same shape as inpt.
breze.arch.component.transfer.tanhplus(inpt)
Tanh with added linear activation function.
$f(x) = \tanh(x) + x$
Parameters inpt : Theano variable
Input to be transformed.
Returns output : Theano variable
Transformed output. Same shape as inpt.
breze.arch.component.transfer.sigmoid(inpt)
Sigmoid activation function.
$f(x) = \frac{1}{1 + \exp(-x)}$
Parameters inpt : Theano variable
Input to be transformed.
Returns output : Theano variable
Transformed output. Same shape as inpt.
breze.arch.component.transfer.rectifier(inpt)
Rectifier activation function.
$f(x) = \max(0, x)$
Parameters inpt : Theano variable
Input to be transformed.
Returns output : Theano variable
Transformed output. Same shape as inpt.
breze.arch.component.transfer.softplus(inpt)
Soft plus activation function.
Smooth approximation to rectifier.
$f(x) = \log(1 + \exp(x))$
Parameters inpt : Theano variable
Input to be transformed.
Returns output : Theano variable
Transformed output. Same shape as inpt.
breze.arch.component.transfer.softsign(inpt)
Softsign activation function.
$f(x) = \frac{x}{1 + |x|}$
Parameters inpt : Theano variable
Input to be transformed.
Returns output : Theano variable
Transformed output. Same shape as inpt.
breze.arch.component.transfer.softmax(inpt)
Softmax activation function.
$f(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$
Here, the index runs over the columns of inpt.
Numerically stable version that subtracts the maximum of each row from all of its entries.
Wrapper for theano.nnet.softmax.
Parameters inpt : Theano variable
Array of shape (n, d). Input to be transformed.
Returns output : Theano variable
Transformed output. Same shape as inpt.
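The formulas of this section can be checked numerically with scalar re-implementations. This is a minimal sketch in plain Python that operates on floats and lists instead of Theano variables; it illustrates the definitions and the maximum-subtraction trick, and is not the breze code.

```python
import math


def tanhplus(x):
    return math.tanh(x) + x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    return math.log(1.0 + math.exp(x))


def softsign(x):
    return x / (1.0 + abs(x))


def softmax(row):
    # Subtracting the row maximum leaves the result unchanged but avoids overflow.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1000.0, 1001.0, 1002.0])
```

Without the subtraction, math.exp(1000.0) would raise an OverflowError, although the resulting probabilities are perfectly representable.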
4.3 Loss functions
Module containing several losses usable for supervised and unsupervised training.
A loss is of the form:
def loss(target, prediction, ...):
...
The result depends on the exact nature of the loss. Some examples are:
• coordinate wise loss, such as a sum of squares or a Bernoulli cross entropy with a one-of-k target,
• sample wise, such as neighbourhood component analysis.
In case of the coordinate wise losses, the dimensionality of the result should be the same as that of the predictions and
targets. In all other cases, it is important that the sample axis (usually the first axis) stays the same. The coordinates
of an individual data point lie along the coordinate axis, whose size might change to 1.
Some examples of valid shape transformations:
(n, d) -> (n, d)
(n, d) -> (n, 1)
These are not valid:
(n, d) -> (1, d)
(n, d) -> (n,)
For some examples, consult the source code of this module.
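The two valid shape transformations can be illustrated with nested Python lists standing in for arrays. This is a minimal sketch; squared mirrors the coordinate-wise squared loss documented below, while sum_over_coordinates is a hypothetical helper that shows how a sample-wise result keeps a coordinate axis of size 1.

```python
def squared(target, prediction):
    """Coordinate-wise loss: (n, d) -> (n, d)."""
    return [[(t - p) ** 2 for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


def sum_over_coordinates(loss):
    """Reduce the coordinate axis but keep it: (n, d) -> (n, 1)."""
    return [[sum(row)] for row in loss]


target = [[1.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
prediction = [[1.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
coord_wise = squared(target, prediction)
sample_wise = sum_over_coordinates(coord_wise)
```

In both results the first axis still has one entry per sample, which is the property the invalid transformations (n, d) -> (1, d) and (n, d) -> (n,) violate.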
breze.arch.component.loss.squared(target, prediction)
Return the element wise squared loss between the target and the prediction.
Parameters target : Theano variable
An array of arbitrary shape representing the targets.
prediction : Theano variable
An array of arbitrary shape representing the predictions.
Returns res : Theano variable
An array of the same shape as target and prediction representing the pairwise
distances.
breze.arch.component.loss.absolute(target, prediction)
Return the element wise absolute difference between the target and the prediction.
Parameters target : Theano variable
An array of arbitrary shape representing the targets.
prediction : Theano variable
An array of arbitrary shape representing the predictions.
Returns res : Theano variable
An array of the same shape as target and prediction representing the pairwise
distances.
breze.arch.component.loss.cat_ce(target, prediction, eps=1e-08)
Return the cross entropy between the target and the prediction, where prediction is a summary of
the statistics of a categorical distribution and target is some outcome.
Used for multiclass classification purposes.
The loss differs from ncat_ce in that target is not an array of integers but a hot k coding.
Note that predictions are clipped between eps and 1 - eps to ensure numerical stability.
Parameters target : Theano variable
An array of shape (n, k) where n is the number of samples and k is the number of
classes. Each row represents a hot k coding. It should be zero except for one element,
which has to be exactly one.
prediction : Theano variable
An array of shape (n, k). Each row is interpreted as a categorical probability. Thus,
each row has to sum up to one and be non-negative.
Returns res : Theano variable.
An array of the same size as target and prediction representing the pairwise
divergences.
breze.arch.component.loss.ncat_ce(target, prediction)
Return the cross entropy between the target and the prediction, where prediction is a summary of
the statistics of the categorical distribution and target is some outcome.
Used for classification purposes.
The loss differs from cat_ce in that target is not a hot k coding but an array of integers.
Parameters target : Theano variable
An array of shape (n,) where n is the number of samples. Each entry of the array
should be an integer between 0 and k-1, where k is the number of classes.
prediction : Theano variable
An array of shape (n, k) or (t, n, k). Each row (i.e. entry in the last dimension) is interpreted as a categorical probability. Thus, each row has to sum up to one
and be non-negative.
Returns res : Theano variable
An array of shape (n, 1) containing, for each example, the log probability that it
is classified correctly.
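The relation between the two categorical losses can be sketched on nested Python lists. This is an illustration under assumptions, not the breze code: both functions below return the negative log probability, which is the usual sign convention for a loss, and the clipping in cat_ce follows the note on eps above.

```python
import math


def cat_ce(target, prediction, eps=1e-8):
    """Element-wise cross entropy for hot k targets: (n, k) -> (n, k)."""
    return [[-t * math.log(min(max(p, eps), 1.0 - eps))
             for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


def ncat_ce(target, prediction):
    """Cross entropy for integer class labels: (n,) and (n, k) -> (n, 1)."""
    return [[-math.log(p_row[label])] for label, p_row in zip(target, prediction)]


prediction = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
one_hot = [[1, 0, 0], [0, 1, 0]]
labels = [0, 1]
dense = cat_ce(one_hot, prediction)
sparse = ncat_ce(labels, prediction)
```

Summing each row of dense gives the corresponding entry of sparse, since only the column of the true class contributes.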
breze.arch.component.loss.bern_ces(target, prediction)
Return the Bernoulli cross entropies between binary vectors target and a number of Bernoulli variables
prediction.
Used in regression on binary variables, not classification.
Parameters target : Theano variable
An array of shape (n, k) where n is the number of samples and k is the number of
outputs. Each entry should be either 0 or 1.
prediction : Theano variable.
An array of shape (n, k). Each row is interpreted as a set of statistics of Bernoulli
variables. Thus, each element has to lie in (0, 1).
Returns res : Theano variable
An array of the same size as target and prediction representing the pairwise
divergences.
breze.arch.component.loss.bern_bern_kl(X, Y)
Return the Kullback-Leibler divergence between Bernoulli variables represented by their sufficient statistics.
Parameters X : Theano variable
An array of arbitrary shape where each element represents the statistic of a Bernoulli
variable and thus should lie in (0, 1).
Y : Theano variable
An array of the same shape as X where each element represents the statistic of a
Bernoulli variable and thus should lie in (0, 1).
Returns res : Theano variable
An array of the same size as X and Y representing the pairwise
divergences.
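For a single pair of Bernoulli parameters, the divergence can be written out directly. This is a minimal scalar sketch of the standard formula; that the divergence is taken from X to Y, that is KL(X || Y), is an assumption about the argument order and is not confirmed by the text above.

```python
import math


def bern_bern_kl(x, y):
    """KL(Bernoulli(x) || Bernoulli(y)) for probabilities x, y in (0, 1)."""
    return x * math.log(x / y) + (1.0 - x) * math.log((1.0 - x) / (1.0 - y))


same = bern_bern_kl(0.3, 0.3)
forward = bern_bern_kl(0.5, 0.25)
backward = bern_bern_kl(0.25, 0.5)
```

The divergence is zero for identical parameters and is not symmetric, so swapping the arguments changes the value.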
breze.arch.component.loss.ncac(target, embedding)
Return the NCA for classification loss.
This corresponds to the probability that a point is correctly classified with a soft knn classifier using leave-one-out. Each neighbour is weighted according to an exponential of its negative Euclidean distance. Afterwards, a
probability is calculated for each class depending on the weights of the neighbours. For details, we refer you to
‘Neighbourhood Component Analysis’ by J Goldberger, S Roweis, G Hinton, R Salakhutdinov (2004).
Parameters target : Theano variable
An array of shape (n,) where n is the number of samples. Each entry of the array
should be an integer between 0 and k - 1, where k is the number of classes.
embedding : Theano variable
An array of shape (n, d) where each row represents a point in d-dimensional
space.
Returns res : Theano variable
Array of shape (n, 1) holding a probability that a point is classified correctly.
breze.arch.component.loss.ncar(target, embedding)
Return the NCA for regression loss.
This is similar to NCA for classification, except that not soft KNN classification but regression performance is
maximized. (Actually, the negative performance is minimized.)
For details, we refer you to
‘Pose-sensitive embedding by nonlinear nca regression’ by Taylor, G. and Fergus, R. and Williams, G. and
Spiro, I. and Bregler, C. (2010)
Parameters target : Theano variable
An array of shape (n, d) where n is the number of samples and d the dimensionality
of the target space.
embedding : Theano variable
An array of shape (n, d) where each row represents a point in d-dimensional space.
Returns res : Theano variable
Array of shape (n, 1).
breze.arch.component.loss.drlim(push_margin, pull_margin, c_contrastive, push_loss='squared', pull_loss='squared')
Return a function that implements the loss from ‘Dimensionality reduction by learning an invariant mapping’ by Hadsell, R. and Chopra, S. and LeCun, Y.
(2006).
For an example of such a function, see drlim1 with a margin of 1.
Parameters push_margin : Float
The minimum margin that negative pairs should be separated by. Pairs separated by
higher distance than push_margin will not contribute to the loss.
pull_margin: Float
The maximum margin that positive pairs may be separated by. Pairs separated by lower
distances do not contribute to the loss.
c_contrastive : Float
Coefficient to weigh the contrastive term relative to the positive term.
push_loss : One of {'squared', 'absolute'}, optional, default: 'squared'
Loss to encourage Euclidean distances between non-pairs.
pull_loss : One of {'squared', 'absolute'}, optional, default: 'squared'
Loss to punish Euclidean distances between pairs.
Returns loss : callable
Function that takes two arguments, a target and an embedding.
4.4 Stochastic Corruption of Theano Variables
This module contains functionality to corrupt Theano variables with noise.
breze.arch.component.corrupt.gaussian_perturb(arr, std, rng=None)
Return a Theano variable which is perturbed by additive zero-centred Gaussian noise with standard deviation
std.
Parameters arr : Theano variable
Array of some shape n.
std : float or scalar Theano variable
Standard deviation of the Gaussian noise.
rng : Theano random number generator, optional [default: None]
Generator to draw random numbers from. If None, rng will be instantiated on the spot.
Returns res : Theano variable
Of shape n.
Examples
>>> m = T.matrix()
>>> c = gaussian_perturb(m, 0.1)
breze.arch.component.corrupt.mask(arr, p, rng=None)
Return a Theano variable whose elements are each set to zero with probability p.
Parameters arr : Theano variable
Array of some shape n.
p : float or scalar Theano variable
Probability that a unit is set to zero.
rng : Theano random number generator, optional [default: None]
Generator to draw random numbers from. If None, rng will be instantiated on the spot.
Returns res : Theano variable
Of shape n.
Examples
>>> m = T.matrix()
>>> c = mask(m, 0.1)
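What masking does to concrete values can be sketched without Theano. The function below is a stand-in written for this illustration: it works on a flat Python list and uses the standard library's random.Random in place of a Theano random number generator.

```python
import random


def mask(values, p, rng=None):
    """Return a copy of values in which each element is zeroed with probability p."""
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else v for v in values]


rng = random.Random(0)  # fixed seed so that the run is reproducible
corrupted = mask([1.0] * 10000, p=0.1, rng=rng)
zero_fraction = corrupted.count(0.0) / len(corrupted)
```

With 10000 elements the observed fraction of zeros is close to p, and the surviving elements keep their original values.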
4.5 Miscellaneous functionality
Module holding miscellaneous functionality.
breze.arch.component.misc.pairwise_diff(X, Y=None)
Given two arrays with samples in the row, compute the pairwise differences.
Parameters X : Theano variable
Has shape (n, d). Contains one item per first dimension.
Y : Theano variable, optional [default: None]
Has shape (m, d). If not given, defaults to X.
Returns res : Theano variable
Has shape (n, d, m).
breze.arch.component.misc.distance_matrix(X, Y=None, norm=<function l2>)
Return an expression containing the distances given the norm of up to two arrays containing samples.
Parameters X : Theano variable
Has shape (n, d). Contains one item per first dimension.
Y : Theano variable, optional [default: None]
Has shape (m, d). If not given, defaults to X.
norm : string or callable
Either a string pointing at a function in breze.arch.component.norm or a function that has the same signature as these.
Returns res : Theano variable
Has shape (n, m).
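The shapes involved in pairwise_diff and distance_matrix can be traced with nested Python lists. This is a minimal sketch, not the Theano implementation: the index convention diff[i][k][j] = X[i][k] - Y[j][k] is an assumption chosen to match the documented (n, d, m) shape, and the norm is fixed to the Euclidean one.

```python
import math


def pairwise_diff(X, Y=None):
    """Return diff with diff[i][k][j] = X[i][k] - Y[j][k], i.e. shape (n, d, m)."""
    Y = X if Y is None else Y
    d = len(X[0])
    return [[[x[k] - y[k] for y in Y] for k in range(d)] for x in X]


def distance_matrix(X, Y=None):
    """Return the (n, m) matrix of Euclidean distances between rows of X and Y."""
    diff = pairwise_diff(X, Y)
    m = len(diff[0][0])
    return [[math.sqrt(sum(coords[j] ** 2 for coords in per_x)) for j in range(m)]
            for per_x in diff]


X = [[0.0, 0.0], [3.0, 4.0]]
D = distance_matrix(X)
```

Applying the norm along the middle axis of the difference array removes the d axis, which is how (n, d, m) becomes (n, m).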
breze.arch.component.misc.distance_matrix_by_diff(diff, norm=<function l2>)
Return an expression containing the distances under the norm norm, given an array of pairwise differences between samples.
Parameters diff : Theano variable
Has shape (n, d, m) and represents differences between two collections of the same
set.
norm : string or callable
Either a string pointing at a function in breze.arch.component.norm or a function that has the same signature as these.
Returns res : Theano variable
Has shape (n, m).
breze.arch.component.misc.cat_entropy(arr)
Return the entropy of categorical distributions described by the rows in arr.
Parameters arr : Theano variable
Array of shape (n, d) describing n different categorical variables. Rows need to sum
up to 1 and be non-negative.
Returns res : Theano variable
Has shape (n,).
breze.arch.component.misc.project_into_l2_ball(arr, radius=1)
Return arr projected into the L2 ball.
Parameters arr : Theano variable
Array of shape either (n, d) or (d,). If the former, all rows are projected individually.
radius : float, optional [default: 1]
Returns res : Theano variable
Projected result of the same shape as arr.
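A minimal NumPy sketch of such a projection is given below, assuming the usual definition: rows whose L2 norm exceeds radius are rescaled onto the surface of the ball, all others are left unchanged. The function name and the small constant guarding against division by zero are assumptions.

```python
import numpy as np

def project_into_l2_ball_np(arr, radius=1.0):
    """Project each row of arr (or arr itself if 1-D) into the L2 ball."""
    arr = np.asarray(arr, dtype=float)
    single = arr.ndim == 1
    rows = arr[np.newaxis, :] if single else arr
    norms = np.sqrt((rows ** 2).sum(axis=1, keepdims=True))
    # Scale is 1 inside the ball and radius / norm outside of it.
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    res = rows * scale
    return res[0] if single else res

projected = project_into_l2_ball_np([[3.0, 4.0], [0.3, 0.4]])
projected_vec = project_into_l2_ball_np([0.0, 2.0], radius=1.0)
```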
4.6 Layers
Module that contains various layer like components.
breze.arch.component.layer.simple(inpt, weights, bias, out_transfer, p_dropout=0, prefix='')
Return a dictionary containing computations from a simple layer.
The layer has the following form
$f((x \cdot d)^T W + b)$,
where $f$ corresponds to out_transfer, $x$ to inpt, $\cdot$ indicates the element-wise product, $d$ is a vector of Bernoulli
samples with parameter p_dropout, $W$ is the weight matrix weights and $b$ is the bias.
Parameters inpt : Theano variable
Array of shape (n, d).
weights : Theano variable
Array of shape (d, e).
bias : Theano variable
Array of shape (e,).
out_transfer : function or string
If a function, it should take a Theano variable and return a Theano variable
of the same shape.
If a string, it is used to look up a transfer function in
breze.arch.component.transfer.
p_dropout : Theano scalar or float
Needs to be in (0, 1). Indicates the probability that an input is set to zero.
prefix : string, optional [default: ‘’]
Each entry in the returned dictionary will be prefixed with this.
Returns d : dict
Has the following entries: output_in, the activation before application of the transfer function;
output, the activation after application of the transfer function.
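The structure of the returned dictionary can be mimicked in NumPy. The sketch below is not the breze implementation: the function name, the rng argument, and the choice to keep an input with probability 1 - p_dropout are assumptions made for illustration.

```python
import numpy as np

def simple_layer_np(inpt, weights, bias, out_transfer,
                    p_dropout=0.0, prefix='', rng=None):
    """NumPy analogue of a simple layer with input dropout."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Assumption: each input is set to zero with probability p_dropout.
    keep = rng.random(inpt.shape) >= p_dropout
    output_in = (inpt * keep) @ weights + bias
    return {prefix + 'output_in': output_in,
            prefix + 'output': out_transfer(output_in)}

inpt = np.ones((2, 3))               # shape (n, d)
weights = np.ones((3, 2))            # shape (d, e)
bias = np.array([0.0, -5.0])         # shape (e,)
rectify = lambda a: np.maximum(a, 0.0)
layer = simple_layer_np(inpt, weights, bias, rectify, prefix='hidden_')
```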
4.7 Common functions
Module that contains functionality common to many other modules.
breze.arch.component.common.supervised_loss(target, prediction, loss, coord_axis=1, imp_weight=False, prefix='')
Return a dictionary populated with several expressions for a supervised loss and corresponding targets and
predictions.
Parameters target : Theano variable
Array representing the target variables.
prediction : Theano variable
Array representing the predictions.
loss : callable or string
If a string, should index a member of breze.arch.component.loss. If a
callable, has to be of the form described in breze.arch.component.loss.
coord_axis : integer, optional [default: 1]
Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or
some spatial axis.
prefix : string, optional [default: ‘’]
Each key in the resulting dictionary will be prefixed with prefix.
imp_weight : Theano variable, float or boolean, optional [default: False]
Importance weights for the loss. Will be multiplied to the coordinate wise loss.
Returns res : dict
Dictionary containing the expressions. See example for keys.
Examples
>>> import theano.tensor as T
>>> prediction, target = T.matrix('prediction'), T.matrix('target')
>>> from breze.arch.component.loss import squared
>>> loss_dict = supervised_loss(target, prediction, squared,
...                             prefix='mymodel-')
>>> sorted(loss_dict.items())
[('mymodel-loss', ...), ('mymodel-loss_coord_wise', ...), ('mymodel-loss_sample_wise', ...), ('m
breze.arch.component.common.unsupervised_loss(output, loss, coord_axis=1, prefix='')
Return a dictionary populated with several expressions for an unsupervised loss and corresponding output.
Parameters output : Theano variable
Array representing the predictions.
loss : callable or string
If a string, should index a member of breze.arch.component.loss. If a
callable, has to be of the form described in breze.arch.component.loss.
coord_axis : integer, optional [default: 1]
Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or
some spatial axis.
prefix : string, optional [default: ‘’]
Each key in the resulting dictionary will be prefixed with prefix.
Returns res : dict
Dictionary containing the expressions. See example for keys.
Examples
>>> import theano.tensor as T
>>> output = T.matrix('output')
>>> my_loss = lambda x: abs(x)
>>> loss_dict = unsupervised_loss(output, my_loss, prefix='$')
>>> sorted(loss_dict.items())
[('$loss', ...), ('$loss_coord_wise', ...), ('$loss_sample_wise', ...), ('$output', ...)]
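To make the four keys concrete, here is a NumPy sketch of how such a dictionary could be assembled. The aggregation used (sum over coord_axis for the sample-wise loss, then the mean for the total loss) is an assumption and may differ from what breze does; the function name is hypothetical.

```python
import numpy as np

def unsupervised_loss_np(output, loss, coord_axis=1, prefix=''):
    """Build a dictionary with the same keys as in the example above."""
    loss_coord_wise = loss(output)
    # Assumed aggregation: sum over coordinates, then mean over samples.
    loss_sample_wise = loss_coord_wise.sum(axis=coord_axis)
    total = loss_sample_wise.mean()
    return {prefix + 'output': output,
            prefix + 'loss_coord_wise': loss_coord_wise,
            prefix + 'loss_sample_wise': loss_sample_wise,
            prefix + 'loss': total}

output = np.array([[1.0, -2.0], [3.0, -4.0]])
loss_dict = unsupervised_loss_np(output, np.abs, prefix='$')
keys = sorted(loss_dict.keys())
```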
4.8 Univariate Normal Distribution
breze.arch.component.distributions.normal.pdf(sample, location=0, scale=1)
Return a theano expression representing the values of the probability density function of a Gaussian distribution.
Parameters sample : Theano variable
Array of shape (n,) where n is the number of samples.
location : Theano variable
Scalar representing the mean of the distribution.
scale : Theano variable
Scalar representing the standard deviation of the distribution.
Returns l : Theano variable
Array of shape (n,) where each entry represents the density of the corresponding
sample.
Examples
>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> sample, mean, std = T.vector(), T.scalar(), T.scalar()
>>> p = pdf(sample, mean, std)
>>> f_p = theano.function([sample, mean, std], p)
>>> X, = theano_floatx(np.array([-1, 0, 1]))
>>> ps = f_p(X, 0.1, 1.2)
>>> np.allclose(ps, [0.21840613, 0.33129956, 0.25094786])
True
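The values in the example can be cross-checked without Theano using the closed-form Gaussian density. The helper below is a plain NumPy sketch with a hypothetical name, not part of breze.

```python
import numpy as np

def normal_pdf_np(sample, location=0.0, scale=1.0):
    """Density of N(location, scale**2) evaluated at each sample."""
    z = (np.asarray(sample, dtype=float) - location) / scale
    return np.exp(-0.5 * z ** 2) / (scale * np.sqrt(2.0 * np.pi))

ps_np = normal_pdf_np([-1, 0, 1], 0.1, 1.2)
```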
breze.arch.component.distributions.normal.cdf(sample, location=0, scale=1)
Return a Theano expression representing the values of the cumulative distribution function of a Gaussian distribution.
Parameters sample : Theano variable
Array of shape (n,) where n is the number of samples.
location : Theano variable
Scalar representing the mean of the distribution.
scale : Theano variable
Scalar representing the standard deviation of the distribution.
Returns l : Theano variable
Array of shape (n,) where each entry represents the cumulative probability at the corresponding sample.
Examples
>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> sample, mean, std = T.vector(), T.scalar(), T.scalar()
>>> c = cdf(sample, mean, std)
>>> f_c = theano.function([sample, mean, std], c)
>>> X, = theano_floatx(np.array([-1, 0, 1]))
>>> cs = f_c(X, 0.1, 1.2)
>>> np.allclose(cs, [0.17965868, 0.46679324, 0.77337265])
True
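The same cross-check works for the cumulative distribution function, which can be expressed through the error function. The helper name is hypothetical and the code is a sketch that relies only on the standard library and NumPy.

```python
import math
import numpy as np

def normal_cdf_np(sample, location=0.0, scale=1.0):
    """CDF of N(location, scale**2): 0.5 * (1 + erf(z / sqrt(2)))."""
    sample = np.atleast_1d(np.asarray(sample, dtype=float))
    z = (sample - location) / (scale * math.sqrt(2.0))
    return np.array([0.5 * (1.0 + math.erf(v)) for v in z])

cs_np = normal_cdf_np([-1, 0, 1], 0.1, 1.2)
```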
4.9 Multivariate Normal Distribution
Module containing expression builders for the multivariate normal.
breze.arch.component.distributions.mvn.pdf(sample, mean, cov)
Return a theano expression representing the values of the probability density function of the multivariate normal.
Parameters sample : Theano variable
Array of shape (n, d) where n is the number of samples and d the dimensionality of
the data.
mean : Theano variable
Array of shape (d,) representing the mean of the distribution.
cov : Theano variable
Array of shape (d, d) representing the covariance of the distribution.
Returns l : Theano variable
Array of shape (n,) where each entry represents the density of the corresponding
sample.
Examples
>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> sample = T.matrix('sample')
>>> mean = T.vector('mean')
>>> cov = T.matrix('cov')
>>> p = pdf(sample, mean, cov)
>>> f_p = theano.function([sample, mean, cov], p)
>>> mu = np.array([-1, 1])
>>> sigma = np.array([[.9, .4], [.4, .3]])
>>> X = np.array([[-1, 1], [1, -1]])
>>> mu, sigma, X = theano_floatx(mu, sigma, X)
>>> ps = f_p(X, mu, sigma)
>>> np.allclose(ps, [4.798702e-01, 7.73744047e-17])
True
breze.arch.component.distributions.mvn.logpdf(sample, mean, cov)
Return a theano expression representing the values of the log probability density function of the multivariate
normal.
Parameters sample : Theano variable
Array of shape (n, d) where n is the number of samples and d the dimensionality of
the data.
mean : Theano variable
Array of shape (d,) representing the mean of the distribution.
cov : Theano variable
Array of shape (d, d) representing the covariance of the distribution.
Returns l : Theano variable
Array of shape (n,) where each entry represents the log density of the corresponding
sample.
Examples
>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> sample = T.matrix('sample')
>>> mean = T.vector('mean')
>>> cov = T.matrix('cov')
>>> p = logpdf(sample, mean, cov)
>>> f_p = theano.function([sample, mean, cov], p)
>>> mu = np.array([-1, 1])
>>> sigma = np.array([[.9, .4], [.4, .3]])
>>> X = np.array([[-1, 1], [1, -1]])
>>> mu, sigma, X = theano_floatx(mu, sigma, X)
>>> ps = f_p(X, mu, sigma)
>>> np.allclose(ps, np.log([4.798702e-01, 7.73744047e-17]))
True
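For comparison, the log density of the multivariate normal can be computed directly in NumPy. This sketch uses a linear solve and slogdet rather than an explicit inverse; the function name is hypothetical and the approach is not necessarily the one breze takes.

```python
import numpy as np

def mvn_logpdf_np(sample, mean, cov):
    """Log density of N(mean, cov) for each row of sample."""
    sample = np.asarray(sample, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = sample - mean                          # shape (n, d)
    solved = np.linalg.solve(cov, diff.T).T       # rows are cov^{-1} diff_i
    maha = (diff * solved).sum(axis=1)            # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

mu = np.array([-1.0, 1.0])
sigma = np.array([[0.9, 0.4], [0.4, 0.3]])
X = np.array([[-1.0, 1.0], [1.0, -1.0]])
log_ps = mvn_logpdf_np(X, mu, sigma)
```

Exponentiating log_ps gives the densities listed in the pdf example.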
4.10 Utilities
class breze.arch.util.Model
Model class.
Intended as a base class for parameterized models providing a convenience method for compilation and a common interface.
We partition the Theano variables of parametrized models into three groups: (1) the adaptable parameters, (2)
external variables such as inputs and targets, i.e. the data, and (3) expressions composed of the two, such as the
prediction of a model or the loss resulting from those.
There are several “reserved” names for expressions.
• inpt: observations of a supervised or unsupervised model,
• target: desired outputs of a supervised model,
• loss: quantity to be optimized for fitting the parameters; might not refer to the criterion of interest, but
instead to a regularized objective,
• true_loss: quantity of interest for the user, e.g. the loss without regularization or the empirical risk.
Overriding these names is possible in general, but they are part of the interface, and doing so will lead to
unexpected behaviour with functionality building upon them.
Lookup of variables and expressions is typically done in one of the following ways:
• as the variable/expression itself,
• as a string which is the attribute/key to look for in the ParameterSet object/expression dictionary,
• as a path along these, e.g. the tuple ('foo', 'bar', 0) will identify .parameters.foo.bar[0] or
.parameters['foo']['bar'][0] depending on the context.
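The three lookup styles can be sketched with a small pure-Python resolver. The function resolve and the stand-in objects below are hypothetical and only illustrate the idea of following a path through attributes, keys and indices; they are not breze code.

```python
from types import SimpleNamespace

def resolve(what, where, default=None):
    """Hypothetical resolver: object itself, single name, or path tuple."""
    if isinstance(what, str):
        what = (what,)
    if not isinstance(what, tuple):
        return what                      # already the variable/expression
    current = where
    for step in what:
        try:
            if isinstance(step, str) and not isinstance(current, dict):
                current = getattr(current, step)   # attribute access
            else:
                current = current[step]            # key or index access
        except (AttributeError, KeyError, IndexError, TypeError):
            return default
    return current

# Stand-ins for a parameter container and an expression dictionary.
parameters = SimpleNamespace(foo=SimpleNamespace(bar=[10, 20]))
exprs = {'foo': {'bar': [30, 40]}}

by_attr = resolve(('foo', 'bar', 0), parameters)
by_key = resolve(('foo', 'bar', 0), exprs)
```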
Attributes
pars (ParameterSet object) Holding the adaptable parameters of the object.
exprs (dictionary) Containing the expressions. Out of convenience, the external variables are held in here
as well.
updates (dict) Containing update variables, e.g. due to the use of theano.scan.
Methods
function
var_exp_for_gpu
function(variables, exprs, mode=None, explicit_pars=False, givens=None, on_unused_input='raise', numpy_result=False)
Return a compiled function for the given exprs given variables.
Parameters variables : list of strings
Each string refers to an item in .exprs and is considered an input to the function.
exprs : (List of) Theano expression or string
Expressions for which to create the function. If a single expression is given, the function
will return a single value; if a list is given, the result will be a tuple containing one
element for each. An expression can either be a Theano expression or a string. In the
latter case, the corresponding expression will be retrieved from .exprs.
mode : string or None, optional, default: None
Mode to use for compilation. Passed on to theano.function. See Theano documentation for details. If None, self.mode will be used.
explicit_pars: boolean, optional, default: False
If True, the first argument to the function is expected to be an array representing the
adaptable parameters of the model.
givens : dictionary, optional, default: None
Dictionary of substitutions for compilation. Not passed on to theano.function,
instead the expressions are cloned. See code for further details.
on_unused_input: string
Specify behaviour in case of unused inputs. Passed on to theano.function. See
Theano documentation for details.
numpy_result : boolean, optional, default: False
If set to True, a numpy array is always returned, even if the computation is done on the
GPU and a gnumpy array would be more natural.
var_exp_for_gpu(variables, exprs, outputs=True)
Given variables and theano expressions built from these variables, return variables and expressions of the
same form that are tailored towards GPU usage.
class breze.arch.util.ParameterSet
ParameterSet class.
This class provides functionality to group several Theano tensors of different sizes in a consecutive chunk of
memory. The main aim of this is to allow a view on several tensors as a single long vector.
In the following, a (parameter) array refers to a concrete instantiation of a parameter variable (with concrete
values) while a (parameter) tensor/variable refers to the symbolic Theano variable.
Initialization takes a variable amount of keyword arguments, where each has to be a single integer or a tuple of
arbitrary length containing only integers. For each of the keyword argument keys a tensor of the shape given by
the value will be created. The key is the identifier of that variable.
All symbolic variables can be accessed as attributes of the object, all concrete variables as keys. E.g. parameter_set.x references the symbolic variable, while parameter_set['x'] will give you the concrete array.
Attributes
n_pars (integer) Total number of parameters.
flat (Theano vector) Flat one-dimensional tensor containing all the different tensors flattened out.
Symbolic counterpart to data.
data (array_like) Concrete array containing all the different arrays flattened out. Concrete counterpart to
flat.
views (dict) All parameter arrays can be accessed with their identifier as key in this dictionary.
Methods
alloc
declare
view
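The idea of several arrays living in one consecutive chunk of memory can be illustrated with NumPy views. The class FlatParameters below is a hypothetical stand-in covering only the concrete side (data, views, n_pars); it does not create symbolic Theano variables and is not the breze implementation.

```python
import numpy as np

class FlatParameters:
    """Hypothetical sketch: named array views into one flat buffer."""

    def __init__(self, **shapes):
        sizes = {key: int(np.prod(shape)) for key, shape in shapes.items()}
        self.n_pars = sum(sizes.values())
        self.data = np.zeros(self.n_pars)
        self.views = {}
        start = 0
        for key, shape in shapes.items():
            stop = start + sizes[key]
            # Reshaping a contiguous slice yields a view, not a copy.
            self.views[key] = self.data[start:stop].reshape(shape)
            start = stop

    def __getitem__(self, key):
        return self.views[key]

pars = FlatParameters(weights=(2, 3), bias=3)
pars['weights'][...] = 1.0   # writing through a view changes the flat data
pars.data[-3:] = 5.0         # writing to the flat data changes the view
```

Because every view shares memory with data, an optimizer can update the single flat vector while the model reads the individual arrays.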
4.10.1 Nested Lists for Theano, etc.
breze.arch.util.flatten(nested)
Flatten nested tuples and/or lists into a flat list.
breze.arch.util.unflatten(tmpl, flat)
Nest the items in flat into the shape of tmpl.
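The round trip between nested and flat structures can be sketched in pure Python. The functions below are illustrative stand-ins with hypothetical names; in particular, whether the library preserves the list/tuple distinction when unflattening is an assumption.

```python
def flatten_sketch(nested):
    """Depth-first flattening of nested lists and/or tuples."""
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_sketch(item))
        else:
            flat.append(item)
    return flat

def unflatten_sketch(tmpl, flat):
    """Arrange the items of flat in the nesting given by tmpl."""
    items = iter(flat)

    def build(node):
        if isinstance(node, (list, tuple)):
            # Keep the container type of the template (assumption).
            return type(node)(build(child) for child in node)
        return next(items)

    return build(tmpl)

nested = [1, (2, [3, 4]), 5]
flat = flatten_sketch(nested)
renested = unflatten_sketch(nested, ['a', 'b', 'c', 'd', 'e'])
```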
breze.arch.util.theano_function_with_nested_exprs(variables, exprs, *args, **kwargs)
Creates and returns a theano.function that takes values for variables as arguments, where
variables may contain nested lists and/or tuples, and returns values for
exprs, where again exprs may contain nested lists and/or tuples.
All other arguments are passed to theano.function without modification.
breze.arch.util.theano_expr_bfs(expr)
Generator function to walk a Theano expression graph in breadth-first order.
breze.arch.util.tell_deterministic(expr)
Return True iff no random number generator is in the expression graph.
4.10.2 GPU related utilities
breze.arch.util.cpu_tensor_to_gpu(tensor)
Given a tensor for the CPU return a tensor of the same type and name for the GPU.
breze.arch.util.cpu_tensor_to_gpu_nested(inpts, cache=None)
Given a list (of lists of...) CPU tensor variables, return a list of the same types of corresponding GPU tensor
variables.
Also return a dictionary containing all substitutions done. This can be provided to future calls to not make
conversions multiple times.
breze.arch.util.cpu_expr_to_gpu(expr, unsafe=False)
Given a CPU expr return the same expression for the GPU.
If unsafe is set to True, subsequent function calls evaluating the expression might return arrays pointing at the
same memory region.
breze.arch.util.cpu_expr_to_gpu_nested(inpts, unsafe=False)
Given a list (of lists of...) expressions, return expressions for the GPU.
If unsafe is set to True, subsequent function calls evaluating the expression might return arrays pointing at the
same memory region.
breze.arch.util.garray_to_cudandarray_nested(lst)
breze.arch.util.gnumpy_func_wrap(f)
Wrap a function that accepts and returns CudaNdArrays to accept and return gnumpy arrays.
4.10.3 Other
breze.arch.util.get_named_variables(dct, name=True, overwrite=False, prefix='')
Return a dictionary with all the items from dct with only Theano variables/expressions.
If name is set to True, the variables will be named accordingly, however not be overwritten unless overwrite
is True as well.
breze.arch.util.lookup(what, where, default=None)
Return where.what if what is a string, otherwise what. If not found return default.
breze.arch.util.lookup_some_key(what, where, default=None)
Given a list of keys what, return the first of those to which there is an item in where.
If nothing is found, return default.
For variance propagation:
4.11 Common functions
breze.arch.component.varprop.common.supervised_loss(target, prediction, loss, coord_axis=1, imp_weight=False, prefix='')
Return a dictionary populated with several expressions for a supervised loss and corresponding targets and
predictions.
Version for variance propagation, where the prediction is not only a point but a mean with a variance.
Parameters target : Theano variable
Array representing the target variables. Has size d along the coordinate axis coord_axis.
prediction : Theano variable
Array representing the predictions. Has size 2 * d along the coordinate axis, where
the first half corresponds to the mean and the second half to the variance of the prediction.
loss : callable or string
If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form
described in breze.arch.component.varprop.loss.
coord_axis : integer, optional [default: 1]
Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or
some spatial axis.
imp_weight : Theano variable, float or boolean, optional [default: False]
Importance weights for the loss. Will be multiplied to the coordinate wise loss.
prefix : string, optional [default: ‘’]
Each key in the resulting dictionary will be prefixed with prefix.
Returns res : dict
Dictionary containing the expressions. See example for keys.
Examples
>>> import theano.tensor as T
>>> prediction, target = T.matrix('prediction'), T.matrix('target')
>>> from breze.arch.component.varprop.loss import diag_gaussian_nll
>>> loss_dict = supervised_loss(target, prediction, diag_gaussian_nll,
...                             prefix='mymodel-')
>>> sorted(loss_dict.items())
[('mymodel-loss', ...), ('mymodel-loss_coord_wise', ...), ('mymodel-loss_sample_wise', ...), ('m
breze.arch.component.varprop.common.unsupervised_loss(output, loss, coord_axis=1, prefix='')
Return a dictionary populated with several expressions for an unsupervised loss and corresponding output.
Version for variance propagation, where the prediction is not only a point but a mean with a variance.
Parameters output : Theano variable
Array representing the output of the model. Has size 2 * d along the coordinate axis,
where the first half corresponds to the mean and the second half to the variance of the
prediction.
loss : callable or string
If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form
described in breze.arch.component.varprop.loss.
coord_axis : integer, optional [default: 1]
Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or
some spatial axis.
prefix : string, optional [default: ‘’]
Each key in the resulting dictionary will be prefixed with prefix.
Returns res : dict
Dictionary containing the expressions. See example for keys.
Examples
>>> import theano.tensor as T
>>> output = T.matrix('output')
>>> my_loss = lambda x: abs(x)
>>> loss_dict = unsupervised_loss(output, my_loss, prefix='$')
>>> sorted(loss_dict.items())
[('$loss', ...), ('$loss_coord_wise', ...), ('$loss_sample_wise', ...), ('$output', ...)]
CHAPTER 5
Implementation Notes
5.1 Variance propagation
This package implements variance propagating networks.
If we really want to talk about neural networks in a probabilistic way, the right way to do it is to treat every number in
the network as a Dirac distributed value.
There have been numerous attempts to model the adaptable parameters of networks as random variables, leading to so
called “Bayesian Neural Networks”.
In some applications, it makes sense to treat the activations as random variables. This can be done very efficiently and
with a very good approximation for the mean and the variance of random variables.
The algorithm for this was initially described in [FD] and, in the context of RNNs, in [FD-RNN].
5.1.1 References
5.1.2 Recurrent Networks
Module implementing variance propagation and fast dropout for recurrent networks.
In this module, we will often deal with multiple sequences organized into a single Theano tensor. This tensor then has
the shape (t, n, d), where
• t is the number of time steps,
• n is the number of samples and
• d is the dimensionality of each sample.
We call such a tensor a “sequence tensor”. Sometimes, it makes sense to flatten out the time dimension to apply better optimized
linear algebra, such as a dot product. In that case, we will talk of a “flat sequence tensor”.
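The flattening trick can be shown in NumPy. The sketch assumes that a flat sequence tensor has shape (t * n, d), which the text does not state explicitly; the array names are illustrative.

```python
import numpy as np

t, n, d, e = 4, 3, 2, 5
rng = np.random.default_rng(0)
seq = rng.normal(size=(t, n, d))      # sequence tensor
W = rng.normal(size=(d, e))           # weight matrix

# Merge time and sample axes, apply one matrix product, restore the axes.
flat = seq.reshape(t * n, d)          # flat sequence tensor (assumed layout)
out = flat.dot(W).reshape(t, n, e)

# Reference: the same product computed per time step and sample.
reference = np.einsum('tnd,de->tne', seq, W)
```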
breze.arch.model.varprop.rnn.recurrent_layer(in_mean, in_var, weights, f, initial_hidden_mean, initial_hidden_var, p_dropout)
Return a Theano variable representing a recurrent layer.
Parameters in_mean : Theano variable
Sequence tensor of shape (t, n, d). Represents the mean of the input to the layer.
in_var : Theano variable
Sequence tensor. Represents the variance of the input to the layer. Either (a) same shape
as the mean or (b) scalar.
weights : Theano variable
Theano matrix of shape (d, d). Represents the recurrent weight matrix the hiddens
are right multiplied with.
f : function
Function that takes a theano variable and returns a theano variable of the same shape.
Meant as transfer function of the layer.
initial_hidden : Theano variable
Theano vector of size d, representing the initial hidden state.
p_dropout : Theano variable
Scalar representing the probability that a unit is dropped out.
Returns hidden_in_mean_rec : Theano variable
Theano sequence tensor representing the mean of the hidden activations before the application of f.
hidden_in_var_rec : Theano variable
Theano sequence tensor representing the variance of the hidden activations before the
application of f.
hidden_mean_rec : Theano variable
Theano sequence tensor representing the mean of the hidden activations after the application of f.
hidden_var_rec : Theano variable
Theano sequence tensor representing the variance of the hidden activations after the application of f.
5.1.3 Transfer functions
Module that contains transfer functions for variance propagation, working on Theano variables.
Each transfer function has the signature:
m2, s2 = f(m1, s1)
where f is the transfer function, m1 and s1 are the pre-synaptic mean and variance, respectively; m2 and s2 are the
post-synaptic mean and variance.
breze.arch.component.varprop.transfer.identity(mean, var)
Return the mean and variance unchanged.
Parameters mean : Theano variable
Theano variable of the shape s.
var : Theano variable
Theano variable of the shape s.
Returns mean_ : Theano variable
Theano variable of the shape r.
var_ : Theano variable
Theano variable of the shape r.
breze.arch.component.varprop.transfer.sigmoid(mean, var)
Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance,
after passing it through a logistic sigmoid.
Parameters mean : Theano variable
Theano variable of the shape s.
var : Theano variable
Theano variable of the shape s.
Returns mean_ : Theano variable
Theano variable of the shape r.
var_ : Theano variable
Theano variable of the shape r.
breze.arch.component.varprop.transfer.rectifier(mean, var)
Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance,
after passing it through a rectified linear unit.
Parameters mean : Theano variable
Theano variable of the shape s.
var : Theano variable
Theano variable of the shape s.
Returns mean_ : Theano variable
Theano variable of the shape r.
var_ : Theano variable
Theano variable of the shape r.
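For the rectifier, the moments of max(0, x) with Gaussian x have a well-known closed form, shown here as a NumPy sketch. Whether breze uses exactly these expressions is an assumption; the function name is hypothetical and var is required to be positive.

```python
import math
import numpy as np

def rectifier_moments(mean, var):
    """Mean and variance of max(0, x) for x ~ N(mean, var), var > 0."""
    mean = np.asarray(mean, dtype=float)
    std = np.sqrt(np.asarray(var, dtype=float))
    alpha = mean / std
    pdf = np.exp(-0.5 * alpha ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(alpha / math.sqrt(2.0)))
    mean_ = mean * cdf + std * pdf                        # E[max(0, x)]
    second = (mean ** 2 + std ** 2) * cdf + mean * std * pdf  # E[max(0, x)^2]
    return mean_, second - mean_ ** 2

m2, s2 = rectifier_moments(np.array([0.0, 10.0]), np.array([1.0, 1.0]))
```

For a standard normal input the output mean is 1/sqrt(2*pi); for an input far in the positive range the rectifier acts as the identity and the moments are unchanged.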
breze.arch.component.varprop.transfer.tanh(mean, var)
Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance,
after passing it through a hyperbolic tangent.
Parameters mean : Theano variable
Theano variable of the shape s.
var : Theano variable
Theano variable of the shape s.
Returns mean_ : Theano variable
Theano variable of the shape r.
var_ : Theano variable
Theano variable of the shape r.
5.1.4 Losses
Module containing several losses usable for supervised and unsupervised training. This is different from
breze.arch.component.loss in the sense that each prediction is also assumed to have a variance.
The losses in this module assume two inputs: a target and a prediction. Additionally, if the target has a dimensionality
of D, the prediction is assumed to have a dimensionality of 2D. The first D elements constitute the mean, while the
latter D constitute the variance.
Additionally, all losses from breze.arch.component.loss are also available; here, we just ignore the variance
part of the input to the loss.
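The layout of such a prediction can be illustrated with a coordinate-wise Gaussian negative log-likelihood in NumPy. This is a sketch under stated assumptions: the function name is hypothetical, the coordinate axis is taken to be axis 1, and the exact form used by breze's diag_gaussian_nll may differ.

```python
import numpy as np

def diag_gaussian_nll_np(target, prediction):
    """Coordinate-wise NLL; prediction holds [mean, variance] along axis 1."""
    d = target.shape[1]
    mean, var = prediction[:, :d], prediction[:, d:]   # split the 2D coordinates
    return 0.5 * (np.log(2.0 * np.pi * var) + (target - mean) ** 2 / var)

target = np.array([[0.0, 1.0]])                  # D = 2
prediction = np.array([[0.0, 1.0, 1.0, 4.0]])    # means (0, 1), variances (1, 4)
nll = diag_gaussian_nll_np(target, prediction)
```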
Bibliography
[XCA] Extreme component analysis, Welling et al (2003)
[LFRKM] Learning Feature Representations with K-means, Adam Coates (2012)
[R3] Discriminative clustering by regularized information maximization, by Gomes, R. and Krause, A. and Perona,
P., NIPS 2010
[R1] Xu, Zhixiang Eddie, Kilian Q. Weinberger, and Fei Sha. “Rapid feature learning with stacked linear denoisers.”
arXiv preprint arXiv:1105.0972 (2011).
[D1] Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv
preprint arXiv:1207.0580 (2012).
[FD] Wang, Sida, and Christopher Manning. “Fast dropout training.” Proceedings of the 30th International Conference
on Machine Learning (ICML-13). 2013.
[FD-RNN] Bayer, Justin, et al. “On Fast Dropout and its Applicability to Recurrent Networks.” arXiv preprint
arXiv:1311.0701 (2013).
Python Module Index
b
breze.arch.component.common
breze.arch.component.corrupt
breze.arch.component.distributions.mvn
breze.arch.component.distributions.normal
breze.arch.component.layer
breze.arch.component.loss
breze.arch.component.misc
breze.arch.component.norm
breze.arch.component.varprop.common
breze.arch.component.varprop.loss
breze.arch.component.varprop.transfer
breze.arch.model.varprop
breze.arch.model.varprop.rnn
breze.learn.data
breze.learn.feature
breze.learn.feature.emg
breze.learn.kmeans
breze.learn.lde
breze.learn.mlp
breze.learn.pca
breze.learn.rim
breze.learn.rnn
breze.learn.sfa
breze.learn.sparsefiltering
breze.learn.trainer.score
breze.learn.xca
59
breze Documentation, Release 0.1
60
Python Module Index
Index
Symbols
breze.learn.kmeans (module), 12
breze.learn.lde (module), 15
__init__() (breze.learn.lde.LinearDenoiser method), 15
breze.learn.mlp (module), 17
__init__() (breze.learn.mlp.DropoutMlp method), 19
__init__()
(breze.learn.mlp.FastDropoutNetwork breze.learn.pca (module), 5
breze.learn.rim (module), 13
method), 21
breze.learn.rnn (module), 16
__init__() (breze.learn.pca.Pca method), 5
breze.learn.sfa (module), 11
__init__() (breze.learn.pca.Zca method), 6
breze.learn.sparsefiltering (module), 9
__init__() (breze.learn.rim.Rim method), 14
__init__() (breze.learn.sfa.SlowFeatureAnalysis method), breze.learn.trainer.score (module), 22
breze.learn.xca (module), 7
11
__init__()
(breze.learn.sparsefiltering.SparseFiltering
C
method), 9
__init__()
(breze.learn.trainer.score.MinibatchScore cat_ce() (in module breze.arch.component.loss), 35
cat_entropy() (in module breze.arch.component.misc), 40
method), 23
cca() (in module breze.learn.cca), 10
__init__() (breze.learn.xca.Xca method), 8
cdf() (in module breze.arch.component.distributions.normal),
43
A
collapse()
(in module breze.learn.data), 27
absolute() (in module breze.arch.component.loss), 35
collapse_seq_borders() (in module breze.learn.data), 27
consecutify() (in module breze.learn.data), 27
B
cpu_expr_to_gpu() (in module breze.arch.util), 48
bern_bern_kl() (in module breze.arch.component.loss),
cpu_expr_to_gpu_nested() (in module breze.arch.util), 48
36
cpu_tensor_to_gpu() (in module breze.arch.util), 48
bern_ces() (in module breze.arch.component.loss), 36
cpu_tensor_to_gpu_nested() (in module breze.arch.util),
breze.arch.component.common (module), 41
48
breze.arch.component.corrupt (module), 38
breze.arch.component.distributions.mvn (module), 44
D
breze.arch.component.distributions.normal (module), 42
distance_matrix()
(in
module
breze.arch.component.layer (module), 40
breze.arch.component.misc),
39
breze.arch.component.loss (module), 35
distance_matrix_by_diff()
(in
module
breze.arch.component.misc (module), 39
breze.arch.component.misc),
40
breze.arch.component.norm (module), 31
drlim() (in module breze.arch.component.loss), 37
breze.arch.component.transfer (module), 33
DropoutMlp (class in breze.learn.mlp), 18
breze.arch.component.varprop.common (module), 48
breze.arch.component.varprop.loss (module), 54
F
breze.arch.component.varprop.transfer (module), 52
FastDropoutNetwork (class in breze.learn.mlp), 20
breze.arch.model.varprop (module), 51
fit() (breze.learn.kmeans.GainShapeKMeans method), 12
breze.arch.model.varprop.rnn (module), 51
fit() (breze.learn.lde.LinearDenoiser method), 15
breze.learn.data (module), 27
fit() (breze.learn.mlp.DropoutMlp method), 19
breze.learn.feature (module), 25
fit() (breze.learn.mlp.FastDropoutNetwork method), 21
breze.learn.feature.emg (module), 25
61
breze Documentation, Release 0.1
fit() (breze.learn.mlp.Mlp method), 18
fit() (breze.learn.pca.Pca method), 5
fit() (breze.learn.pca.Zca method), 6
fit() (breze.learn.rim.Rim method), 14
fit() (breze.learn.rnn.SupervisedRnn method), 16
fit() (breze.learn.sfa.SlowFeatureAnalysis method), 11
fit() (breze.learn.sparsefiltering.SparseFiltering method),
10
fit() (breze.learn.xca.Xca method), 8
flatten() (in module breze.arch.util), 47
function() (breze.arch.util.Model method), 46
G
GainShapeKMeans (class in breze.learn.kmeans), 12
garray_to_cudandarray_nested() (in module breze.arch.util), 48
gaussian_perturb() (in module breze.arch.component.corrupt), 38
get_named_variables() (in module breze.arch.util), 48
gnumpy_func_wrap() (in module breze.arch.util), 48
H
hinton() (in module breze.learn.display), 29
I
identity() (in module breze.arch.component.varprop.transfer), 52
integrated() (in module breze.learn.feature.emg), 25
interleave() (in module breze.learn.data), 27
interpolate() (in module breze.learn.data), 27
inverse_transform() (breze.learn.pca.Pca method), 5
inverse_transform() (breze.learn.pca.Zca method), 7
inverse_transform() (breze.learn.xca.Xca method), 8
iter_fit() (breze.learn.mlp.DropoutMlp method), 19
iter_fit() (breze.learn.mlp.FastDropoutNetwork method), 21
iter_fit() (breze.learn.mlp.Mlp method), 18
iter_fit() (breze.learn.rim.Rim method), 14
iter_fit() (breze.learn.rnn.SupervisedRnn method), 16
iter_fit() (breze.learn.sparsefiltering.SparseFiltering method), 10
iter_windows() (in module breze.learn.data), 27
L
l1() (in module breze.arch.component.norm), 31
l2() (in module breze.arch.component.norm), 31
LinearDenoiser (class in breze.learn.lde), 15
logpdf() (in module breze.arch.component.distributions.mvn), 44
lookup() (in module breze.arch.util), 48
lookup_some_key() (in module breze.arch.util), 48
lp() (in module breze.arch.component.norm), 32
M
mask() (in module breze.arch.component.corrupt), 39
mean_absolute_value() (in module breze.learn.feature.emg), 25
mean_absolute_value_slope() (in module breze.learn.feature.emg), 26
MinibatchScore (class in breze.learn.trainer.score), 22
Mlp (class in breze.learn.mlp), 17
Model (class in breze.arch.util), 45
modified_mean_absolute_value_1() (in module breze.learn.feature.emg), 25
modified_mean_absolute_value_2() (in module breze.learn.feature.emg), 25
N
ncac() (in module breze.arch.component.loss), 37
ncar() (in module breze.arch.component.loss), 37
ncat_ce() (in module breze.arch.component.loss), 36
P
padzeros() (in module breze.learn.data), 27
pairwise_diff() (in module breze.arch.component.misc), 39
ParameterSet (class in breze.arch.util), 47
Pca (class in breze.learn.pca), 5
pdf() (in module breze.arch.component.distributions.mvn), 44
pdf() (in module breze.arch.component.distributions.normal), 42
predict() (breze.learn.mlp.DropoutMlp method), 20
predict() (breze.learn.mlp.FastDropoutNetwork method), 21
predict() (breze.learn.mlp.Mlp method), 18
predict() (breze.learn.rnn.SupervisedRnn method), 17
project_into_l2_ball() (in module breze.arch.component.misc), 40
R
rbf() (in module breze.learn.feature), 25
reconstruct() (breze.learn.pca.Pca method), 6
reconstruct() (breze.learn.pca.Zca method), 7
reconstruct() (breze.learn.xca.Xca method), 8
rectifier() (in module breze.arch.component.transfer), 34
rectifier() (in module breze.arch.component.varprop.transfer), 53
recurrent_layer() (in module breze.arch.model.varprop.rnn), 51
Rim (class in breze.learn.rim), 13
root_mean_square() (in module breze.learn.feature.emg), 26
S
sample() (in module breze.learn.sampling.hmc), 21
scatterplot_matrix() (in module breze.learn.display), 28
shuffle() (in module breze.learn.data), 27
sigmoid() (in module breze.arch.component.transfer), 33
sigmoid() (in module breze.arch.component.varprop.transfer), 53
simple() (in module breze.arch.component.layer), 40
simple() (in module breze.learn.trainer.score), 22
skip() (in module breze.learn.data), 27
slope_sign_change() (in module breze.learn.feature.emg), 26
SlowFeatureAnalysis (class in breze.learn.sfa), 11
soft_l1() (in module breze.arch.component.norm), 32
softmax() (in module breze.arch.component.transfer), 34
softplus() (in module breze.arch.component.transfer), 34
softsign() (in module breze.arch.component.transfer), 34
SparseFiltering (class in breze.learn.sparsefiltering), 9
split() (in module breze.learn.data), 27
squared() (in module breze.arch.component.loss), 35
supervised_loss() (in module breze.arch.component.common), 41
supervised_loss() (in module breze.arch.component.varprop.common), 48
SupervisedRnn (class in breze.learn.rnn), 16
T
tanh() (in module breze.arch.component.transfer), 33
tanh() (in module breze.arch.component.varprop.transfer), 53
tanhplus() (in module breze.arch.component.transfer), 33
tell_deterministic() (in module breze.arch.util), 47
theano_expr_bfs() (in module breze.arch.util), 47
theano_function_with_nested_exprs() (in module breze.arch.util), 47
time_series_filter_plot() (in module breze.learn.display), 28
transform() (breze.learn.kmeans.GainShapeKMeans method), 13
transform() (breze.learn.lde.LinearDenoiser method), 15
transform() (breze.learn.pca.Pca method), 6
transform() (breze.learn.pca.Zca method), 7
transform() (breze.learn.rim.Rim method), 14
transform() (breze.learn.sfa.SlowFeatureAnalysis method), 11
transform() (breze.learn.sparsefiltering.SparseFiltering method), 10
transform() (breze.learn.xca.Xca method), 8
U
uncollapse() (in module breze.learn.data), 27
uncollapse_seq_borders() (in module breze.learn.data), 27
unflatten() (in module breze.arch.util), 47
uninterleave() (in module breze.learn.data), 27
unsupervised_loss() (in module breze.arch.component.common), 42
unsupervised_loss() (in module breze.arch.component.varprop.common), 49
V
var_exp_for_gpu() (breze.arch.util.Model method), 46
variance() (in module breze.learn.feature.emg), 26
W
willison_amplitude() (in module breze.learn.feature.emg), 26
windowify() (in module breze.learn.data), 27
X
Xca (class in breze.learn.xca), 7
Z
Zca (class in breze.learn.pca), 6
zero_crossing() (in module breze.learn.feature.emg), 26