MIBTP Masterclass: The Dos and Don'ts of Scientific Image Manipulation
Till Bretschneider, Warwick Systems Biology Centre
T.Bretschneider@warwick.ac.uk
25/26 November 2015

Outline
Wednesday 25th November:
10:00 – 12:00
Part I: Introduction to digital image processing; background subtraction; understanding the difference between linear and nonlinear filtering
Part II: Introduction to ImageJ
13:00 – 16:30 Practical 1
Thursday 26th November:
10:00 – 12:00
Part I: Principles of Colocalisation Analysis
Part II: Common Pitfalls in Imaging
13:00 – 16:30 Practical 2

Imaging is a ubiquitous analytical tool
• Light is an extremely powerful tool for measurements on very different scales, from single molecules to galaxies.
• Images come from many different sources: light microscopy, AFM, EM, ultrasound, MRI, remote sensing, array techniques, gels.
• Imaging in the life sciences provides an ever increasing amount of image data (fluorescence imaging: GFP and variants, multichannel recordings of time-series, z-sectioning, ultrafast cameras, robotics & high-throughput systems).
• In order to extract and interpret data obtained from images we must understand at least the limitations of the different analysis methods.

Digital representation of images
• Physical dimension: X x Y pixels
• Number of Z slices if it is a 3D stack
• Number of T timepoints if we have a time-series
• Variable number of channels
• Number of different brightness values per channel: 2, 8 or 16 bit depth

Example grid of pixel grey values (note the isolated outliers, e.g. 100, 245, 255):
180 185 183 184 185 186 187 188 189
184 182 182 191 189 179 187 185 100
184 185 184 183 185 185 181 183 156
183 187 183 181 188 178 189 180 123
185 185 189 182 186 186 185 180 245
183 185 187 181 185 185 180 181 180
180 187 188 181 185 187 180 174 144
187 181 185 183 182 186 181 175 166
212 181 188 180 186 189 174 181 185
255 180 183 182 183 183 175 185 185
101 184 189 184 180 183 181 183 182

How are the numbers encoded in an image file?
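The idea that a digital image is just an array of numbers can be illustrated with a short Python/NumPy sketch (NumPy is an assumption here; the course itself uses ImageJ, and the example values are hypothetical):

```python
import numpy as np

# A small hypothetical 8-bit greyscale image: a 2-D array of integers 0..255.
img = np.array([[180, 185, 183],
                [184, 182, 182],
                [184, 185, 184]], dtype=np.uint8)

print(img.shape)                      # physical dimension: X x Y pixels
print(img.dtype)                      # bit depth: uint8 -> 2**8 = 256 grey levels
print(int(img.min()), int(img.max())) # brightness range actually used
```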
• Image file formats like the popular TIFF (tagged image file format) can store images with different bit depths and support different colour spaces.
• They contain standard meta-information: size, colour space, compression, scaling.
• TIFF is extremely flexible; for example, software vendors support proprietary tags (objective used, laser lines, …), which poses problems in terms of compatibility.
• Today ImageJ supports most proprietary formats by using plugins such as LOCI (Bio-Formats library).

Bit Depths & Image Brightness Histograms
Quality Control of Under- and Overexposure
• 8 bit images: 256 grey levels; for medium-range cameras, 1 bit already accounts for readout noise
• 16 bit images: 65536 grey levels; most cameras are only 10 or 12 bit (1024 or 4096 grey levels), however
• Histograms immediately reveal under- and over-exposure
• Quantitative imaging requires that we always work in the linear range and avoid saturation

Display Transfer Function
• The DTF maps the grey values of an image to intensity values displayed on the screen. For most image types (TIFF) it is stored within the image meta-information (header) and therefore restored when the image is re-opened.
• The original grey values are not altered unless you apply the DTF.
• Contrast and brightness are parameters of the display transfer function, determining its slope and offset.

LUTs (Lookup Tables)
• Scientific images usually cover a large dynamic range, which makes it difficult to display faint details and prominent structures simultaneously.
• Often colour is used to encode data with a large dynamic range, simply because we can discriminate between many more colours than grey values.
• LUTs map intensity values to a specific grey value or colour. Using LUTs does not change the image content.
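As a sketch of the LUT and histogram ideas (Python/NumPy, not part of the slides; the image data are synthetic), a lookup table remaps only the displayed values while the stored grey values stay untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 12-bit camera image (values 0..4095) stored in a 16-bit container.
img = rng.integers(0, 4096, size=(128, 128), dtype=np.uint16)

# Histogram for exposure quality control: a pile-up in the top bin means saturation.
hist, edges = np.histogram(img, bins=64, range=(0, 4096))

# A linear 8-bit display LUT; NumPy fancy indexing implements the lookup.
lut = np.linspace(0, 255, 4096).astype(np.uint8)
display = lut[img]
# 'img' itself is unchanged; only the display copy has been remapped.
```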
• However, be aware that colour perception is very subjective and many people are colour blind.

Don't trust your eyes, but get used to looking at the numerical values of image intensities
Analyze->Plot Profile
• Values can be saved in form of a table and imported into Excel, Matlab, …

Image quality control
• Exploit the full dynamic range by setting camera gain (contrast) and offset (brightness, sensitivity) correctly
• Avoid any auto-gain and auto-gamma settings
• Check that you work in the linear range
• Judge by looking at (log) histograms, not by eye, and look at sample values, for example do a simple line profile
• Reduce noise (increase exposure times, average successive images)
• Consider the trade-off between light intensity and photodamage

Image Filtering
Most typical applications:
– Background correction
– Noise reduction
– Feature detection

Background subtraction is a requirement for quantitative analysis
• Correct for inhomogeneous illumination
• Remove unspecific labelling (see practical)
[Figure: image / background image = corrected image]

Image math: Noise reduction
• Point-wise operations: add, subtract, multiply and divide, …
• Neighbourhood or filter operations

Linear Filters: Kernel operations
Example: neighbourhood averaging. Each pixel is replaced with the average of itself and its neighbours.
The kernel usually is a square array of odd dimension 2m+1 containing weights H_{i,j}, which usually are integers. Image boundaries need to be treated differently.

Replace the central pixel by:

I'_{x,y} = \frac{\sum_{i,j=-m}^{+m} H_{i,j} \cdot I_{x+i,y+j}}{\sum_{i,j=-m}^{+m} H_{i,j}}

3x3 box filter kernel:

H = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}

How does averaging remove noise?
Intuitively, by smoothing… Technically, we make the assumption that we have additive noise n(x) which is independent of the intensity I(x) and identically distributed, i.e. the amount of noise is the same everywhere.
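The neighbourhood-averaging operation above can be sketched in Python with NumPy/SciPy (an assumption; the practical uses ImageJ). An explicit normalised kernel convolution and SciPy's `uniform_filter` give the same result:

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

rng = np.random.default_rng(1)
img = rng.normal(100.0, 10.0, size=(64, 64))   # synthetic noisy test image

# 3x3 box filter (m = 1): kernel of ones, normalised by the sum of the weights.
kernel = np.ones((3, 3)) / 9.0
box = convolve(img, kernel, mode='reflect')    # 'reflect' handles the boundaries

# uniform_filter implements the same neighbourhood average.
same = uniform_filter(img, size=3, mode='reflect')
```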
We write the image as signal plus additive noise:

I(x,y) = I_{\mathrm{w/o\ noise}}(x,y) + n(x,y)

When averaging we replace I by

I'_{x,y} = \frac{1}{(2m+1)^2} \sum_{i,j=-m}^{+m} I_{x+i,y+j},

which we can also write as

I'(x,y) = I_{\mathrm{w/o\ noise,avg}}(x,y) + n_{\mathrm{avg}}(x,y),

with the variance of the averaged noise n_{\mathrm{avg}} being

\mathrm{Var}(n_{\mathrm{avg}}) = \mathrm{Var}(n) / N = \mathrm{Var}(n) / (2m+1)^2

In effect, we divide the variance of the noise by the number of pixels over which we average.

Noise reduction by simple averaging (mitotic spindle)
[Figure: original vs. 3x3 and 5x5 box-filtered versions]

Kernel Properties (Linear Filtering)
Size: determines computational cost; bigger kernels for denoising result in decreased contrast
Shape (quadratic, disc shaped, ring shaped, discontinuous): a box filter with equal weighting causes anisotropic filtering; disc-shaped kernels are preferred -> isotropic smoothing
Weighting: any linear filter with positive coefficients is a smoothing filter; in frequency space this corresponds to a low-pass filter. If there are negative coefficients (difference filters), local intensity changes are enhanced; used in sharpen filters and for edge detection, for example (high-pass filter)

Gaussian smoothing

G(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}}

3x3 pixels, σ = 0.391:

\begin{pmatrix} 1 & 4 & 1 \\ 4 & 12 & 4 \\ 1 & 4 & 1 \end{pmatrix}

9x9 pixels, σ = 1:

\begin{pmatrix}
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 1 & 2 & 3 & 3 & 3 & 2 & 1 & 0 \\
1 & 2 & 3 & 6 & 7 & 6 & 3 & 2 & 1 \\
1 & 3 & 6 & 9 & 11 & 9 & 6 & 3 & 1 \\
1 & 3 & 7 & 11 & 12 & 11 & 7 & 3 & 1 \\
1 & 3 & 6 & 9 & 11 & 9 & 6 & 3 & 1 \\
1 & 2 & 3 & 6 & 7 & 6 & 3 & 2 & 1 \\
0 & 1 & 2 & 3 & 3 & 3 & 2 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0
\end{pmatrix}

Gaussian
• Gives you more control over smoothing than simple box averaging
• Works as a low-pass filter; reduces noise, but blurs edges, displaces boundaries, reduces contrast, and can introduce artefacts
• Assumption: all pixels in a given neighbourhood belong to the same feature; problem: boundaries of features
• Linear

Example of a difference filter: the Laplacian filter
• Zeros regions with even intensities and smooth gradients
• Highlights discontinuities, sharpens boundaries, works as a high-pass filter

3x3 kernel:

\begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}

• Result can be negative: convert to 32 bit
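The variance argument can be checked numerically (a Python/NumPy/SciPy sketch, not part of the slides): averaging i.i.d. noise over a 3x3 neighbourhood should reduce its variance by a factor of (2m+1)^2 = 9.

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(2)
noise = rng.normal(0.0, 10.0, size=(512, 512))   # additive noise, Var(n) = 100

avg = uniform_filter(noise, size=3)               # 3x3 neighbourhood average
ratio = noise.var() / avg.var()                   # expected to be close to 9
```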
or add 128 in 8-bit images before filtering
• The kernel is an approximation to the second derivative of the brightness along x and y:

\nabla^2 B = \frac{\partial^2 B}{\partial x^2} + \frac{\partial^2 B}{\partial y^2}

• Invariant to rotation (insensitive with regard to the direction of the edge)
• Enhances noise: the Laplacian will highlight singular points more than edges and boundaries

Difference filters: if negative values occur in the kernel, we can interpret this as the difference of two sums: the weighted sum of all pixels with positive weights minus the weighted sum of all pixels with negative weights.

\begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix} = 9 \left( \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} - \frac{1}{9} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \right)

[Figure: Laplacian, original, averaged]
The Laplacian is equivalent to subtracting a smoothed version from the original.

To avoid the problem of noise sensitivity, images are often pre-smoothed before applying the Laplacian edge filter (Laplacian of Gaussian, LoG). Difference-of-Gaussian (DoG) filters work very similarly, but are more flexible and can be used as bandpass filters.

Example of a LoG (Laplacian of Gaussian)
[Figure: original vs. slight Gaussian blur]
Adding the Laplace-filtered image enhances edges (note, however, that noise is enhanced although a Gaussian has been applied before).

Important consequences of linear filters

Original = I * H1, with the identity kernel H1:

\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}

Laplacian = I * H2, with the Laplacian kernel H2:

\begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}

Laplacian + Original = I * (H1 + H2), with the sum H1 + H2:

\begin{pmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{pmatrix}

Linear filters can be easily combined. They preserve the total image intensity (for kernels normalised so that the weights sum to one).
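The kernel identities above can be verified directly (a Python/NumPy/SciPy sketch, an assumption rather than course material): the Laplacian kernel equals 9 x (identity - box filter), and it yields zero on regions of even intensity while responding at a step edge.

```python
import numpy as np
from scipy.ndimage import convolve

laplacian = np.array([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=float)
identity = np.zeros((3, 3)); identity[1, 1] = 1.0
box = np.ones((3, 3)) / 9.0

# The difference-filter decomposition from the slides:
decomposed = 9.0 * (identity - box)

# On a flat region the Laplacian is zero; at a step edge it responds strongly.
flat = np.full((16, 16), 50.0)
step = np.zeros((16, 16)); step[:, 8:] = 100.0
flat_response = convolve(flat, laplacian, mode='reflect')
step_response = convolve(step, laplacian, mode='reflect')
```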
Kernel operations and convolutions

Linear filters correspond to a mathematical operation called linear convolution, I' = H * I, indicated by the star operator:

I'_{x,y} = \frac{\sum_{i,j=-m}^{+m} H_{i,j} \cdot I_{x-i,y-j}}{\sum_{i,j=-m}^{+m} H_{i,j}}

Before, we introduced the following expression for kernel operations, which, to be mathematically precise, is called a correlation:

I'_{x,y} = \frac{\sum_{i,j=-m}^{+m} H_{i,j} \cdot I_{x+i,y+j}}{\sum_{i,j=-m}^{+m} H_{i,j}}

Convolution corresponds to a correlation whereby the filter has been rotated by 180 degrees. Correlation and convolution are identical if the filter is symmetric.

Thus the mathematical concept underlying all linear filters is the convolution operation (*), which satisfies:

Commutativity: I * H = H * I
Linearity (s is a real number): (s I) * H = I * (s H) = s (I * H); (I1 + I2) * H = (I1 * H) + (I2 * H)
Associativity: A * (B * C) = (A * B) * C
X-Y separability (under certain conditions, i.e. the rank of Hxy must be 1): I * Hxy = I * (Hx * Hy) = (I * Hx) * Hy

H_{xy} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix}, \quad H_x = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \end{pmatrix}, \quad H_y = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}

(Similarly, a 2D Gauss filter can be separated into two 1D filters, which runs much faster for larger kernels: for an NxN kernel, the time goes with 2N instead of N^2 operations per pixel.)

Summary on linear filters
• In linear filters each pixel is replaced with the weighted sum of values of neighbouring pixels, or a difference of such sums
• The mathematical operation is a convolution
• Linear filters preserve the total image intensity
• A disadvantage of linear filters is that they do not discriminate between specific image structures, e.g. edges, and background. In noise removal, for example, structures and background are always evenly blurred.

Non-linear Filters
[Figure: noisy original; linear Gaussian s.d. = 2; Kuwahara filter 7x7]

Non-linear filters:
• Non-linear filters are used to overcome the limitations of linear filters regarding the specificity with which they operate on certain structures (example: anisotropic diffusion to retain edges).
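Separability can be demonstrated in a few lines (a Python/SciPy sketch, assumed libraries): filtering with a rank-1 2-D kernel equals two successive 1-D passes.

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

rng = np.random.default_rng(3)
img = rng.normal(size=(64, 64))

# 5x5 box kernel (rank 1), applied as a single 2-D convolution...
h2d = np.ones((5, 5)) / 25.0
full = convolve(img, h2d, mode='reflect')

# ...and as two 1-D convolutions: 2N instead of N^2 multiplications per pixel.
h1d = np.ones(5) / 5.0
separated = convolve1d(convolve1d(img, h1d, axis=0, mode='reflect'),
                       h1d, axis=1, mode='reflect')
```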
[Figure: CryoEM, actin filaments in red (Ohad Medalia)]
• For non-linear filters, however, there exists no common mathematical theory like convolution.
• With regard to quantitative analyses, one has to bear in mind that the total image intensity is not kept constant.

Rank filters
Rank pixels according to their intensity from low to high: 2,6,7,3,5,1,4,8,9
• Rank filters sort neighbouring pixels according to their brightness (simplest: maximum or minimum filter)
• Median: replaces the central pixel value with the median value of the neighbouring pixels
• Good for treating shot noise (isolated outliers, pixel defects)
• Keeps contrast, does not shift boundaries, minimal degradation of boundaries

Median filtering removes shot noise
[Figure: ImageJ->Open Samples->Enhance me; Gaussian s.d. = 1.5 vs. median r = 2]

Particle removal using a Top-Hat (Rolling-Ball) Filter
If Imax(crown) – Imax(brim) > threshold, the central pixel in the crown is replaced by the mean or median of the brim of the hat.
Any bright, isolated particle that fits entirely inside the crown region will be efficiently removed.
[Figure: removed vs. kept particles; a top-hat filter used to extract particles. From Russ]

Statistical Approaches: Kuwahara filter
• Maximum likelihood reconstruction – idea: a pixel at a boundary must have either one or the other value, but cannot be the average
• Assigns the mean value of whichever neighbourhood has the smallest variance to the central pixel
• Sharpens region boundaries
From Russ

Summary on non-linear filters
• Non-linear filters often outperform linear filters because they can be tailored to solve a very specific problem, and they are much more versatile
• However, non-linear filters generally do not preserve the total image intensity
• Thus, while they can be used to detect certain structures in an image, quantitative measurements need to be performed on the original image (background-subtracted and linearly filtered if required)
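The median filter's advantage for shot noise can be sketched as follows (Python/SciPy, not part of the practical; the defect pattern is synthetic): isolated hot pixels are removed completely by a 3x3 median but only smeared out by a Gaussian.

```python
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

clean = np.full((64, 64), 100.0)

# Shot noise: isolated saturated pixels on a sparse grid (a synthetic defect map).
noisy = clean.copy()
noisy[::13, ::11] = 255.0

med = median_filter(noisy, size=3)       # each pixel -> median of its 3x3 window
gau = gaussian_filter(noisy, sigma=1.5)  # linear filter: only spreads the outliers

med_err = np.abs(med - clean).max()      # median removes the isolated outliers
gau_err = np.abs(gau - clean).max()      # Gaussian leaves blurred residues
```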