VALSE Webinar Lecture
Structured Modeling and Learning in Generalized Data Compression and Processing
Hongkai Xiong (熊红凯), http://ivm.sjtu.edu.cn
Department of Electronic Engineering, Shanghai Jiao Tong University
13 Jan. 2016

Sparse Representation
A signal admits a sparse representation $x = \Psi\theta$, where $x \in \mathbb{R}^{N\times 1}$, $\Psi \in \mathbb{R}^{N\times L}$, $\theta \in \mathbb{R}^{L\times 1}$, $\|\theta\|_0 \le K$, and $K \ll N \le L$.

Multimedia Communication
• Sources range from 1-D signals (audio) and 2-D signals (image) to 3-D signals (video) and higher dimensions, e.g. scalable video coding (SVC) and distributed video coding (DVC), where View 1, View 2, and View 3 are encoded independently (key and Wyner-Ziv frames) and decoded jointly.
• Video coding advances toward higher dimension and higher resolution, aiming at better R-D behavior and a greater compression ratio.
• Networks develop toward multiple data streams within heterogeneous network structures, aiming at higher throughput and transmission reliability: unicast (one-to-one), multicast (one-to-many), and many-to-many delivery, e.g. a wireless sensor network plane observing camera field-of-view planes.

Generalized Context Modeling in Signal Processing
• W. Dai, H. Xiong, J. Wang, S. Cheng, and Y. F. Zheng, "Generalized context modeling with multi-directional structuring and MDL-based model selection for heterogeneous data compression," IEEE Transactions on Signal Processing, 2015.
• W. Dai, H. Xiong, J. Wang, et al., "Discriminative structured set prediction modeling with max-margin Markov network for lossless image coding," IEEE Transactions on Image Processing, 2014.
• W. Dai, H. Xiong, X. Jiang, et al., "Structured set intra prediction with discriminative learning in max-margin Markov network for High Efficiency Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 11, pp. 1941-1956, 2013.

Heterogeneous Data Compression: Data
Heterogeneous data are generated by multiple interlacing sources that comply with different, incomplete distribution statistics.
• Image & video: spatial correlations are piecewise smooth with local oscillatory patterns such as multiscale edges and textures.
• Genome sequences: repeatable patterns of nucleotides in various regions.
• Executable files: multiple interlaced data streams, e.g. opcode, displacement, and immediate data.

Heterogeneous Data Compression: Framework
Data to be predicted → context model → estimated probability → coder → encoded bitstream (01011101…).
• Structured probabilistic model: captures regular patterns, is optimized for specific data, and performs context-based set prediction.
• Classic context model: variable order, sequential prediction, weighted estimation.

Background
In classical context modeling, a context is a suffix of the data symbols. Define $C(s)$ as the set of subsequences whose suffix is $s$. A context set $S$ must satisfy:
• Disjoint property: no string in the context set is a suffix of any other string in the set, i.e. for $s \ne s'$, $C(s) \cap C(s') = \emptyset$.
• Exhaustive property: every subsequence of data symbols can find its suffix in the context set, i.e. $\bigcup_{s \in S} C(s) = \mathcal{A}^D$.
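As an illustrative aside (not from the slides), here is a minimal Python sketch of the two properties above: it checks whether a candidate context set over a binary alphabet is suffix-disjoint and exhaustive up to a maximum order D. The alphabet, the example sets, and the brute-force enumeration are assumptions made only for illustration.

```python
from itertools import product

ALPHABET = ("0", "1")   # illustrative binary alphabet
D = 3                   # maximum context order

def check_context_set(S, D=D):
    """Check the disjoint and exhaustive properties of a context set S.

    Disjoint: no context in S is a suffix of another context in S.
    Exhaustive: every length-D sequence has some context in S as a suffix.
    """
    disjoint = not any(s != t and t.endswith(s) for s in S for t in S)
    sequences = ("".join(p) for p in product(ALPHABET, repeat=D))
    exhaustive = all(any(x.endswith(s) for s in S) for x in sequences)
    return disjoint, exhaustive

# A proper (tree-structured) context set over {0, 1}:
print(check_context_set({"0", "01", "11"}))   # (True, True)
print(check_context_set({"0", "1", "11"}))    # (False, True): "1" is a suffix of "11"
```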
Foundation: Structured Prediction Model
A structured prediction model can be represented as a graphical probabilistic model: each node stands for a random variable to be predicted, each edge for an interdependency among nodes, and the model estimates the joint or conditional distribution of its nodes.
• Learning methods: Markov random field (MRF); max-margin Markov network (MRF + SVM).
• Reasoning algorithms: belief propagation (BP); expectation propagation (EP).

Structured Probabilistic Model: Motivation
• Complex data structure: it cannot be represented analytically, so features are captured adaptively with learning-based algorithms.
• Incomplete distribution: parameters of the actual distribution cannot be estimated exactly, so a context-based predictive model is built with learning-based algorithms.
• Structural coherence: isolated prediction cannot guarantee the structural coherence of the prediction task, so a structured probabilistic model constrains the prediction task with structural coherence.

Motivation: Perceptual Intuition
Intuition 1: structural coherence for heterogeneous data. Example: images generated with the same local distribution.
• A natural image is not merely a 2-D array of pixels drawn from a probabilistic distribution; structural coherence must be maintained to keep the image meaningful.
• The structured prediction model is proposed to maintain such coherence.

Generalized Context Modeling: Motivation
Intuition 2: complex structure for heterogeneous data. The statistics of heterogeneous data are not sequential with a uniform distribution; they are flexible, with interlaced, complex distributions. Prediction therefore moves from sequential contexts with a uniform distribution to flexibly constructed contexts with interlaced, complex distributions.

Problem Definition
Pixel-wise prediction impairs the structure; parallel prediction keeps it.
• Challenge: linear prediction lacks high-order nonlinearity, and independent prediction ignores the inter-dependency among pixels that share similar PDFs.
• Contribution: structural consistency through a structured set.

Structured Probabilistic Model: Example
For the 4×4 block at coordinate (401, 113) in LENA, least squares gives MSE 82.06, while the structured probabilistic model gives MSE 68.75.

Motivation: Theoretical Support from Sequential Source Coding
Viswanathan & Berger (2000): given random variables $X_1$ and $X_2$ and arbitrary distortions $D_1$ and $D_2$, the rate for describing them sequentially is no greater than the rate to describe them separately,
$$R(D_1, D_2) \le \min_{D \le D_1} \Big[ R_{X_1}(D) + R_{X_2 \mid \hat{X}_1}(D_2) \Big],$$
where $X_1$ and $X_2$ are encoded by Encoder 1 and Encoder 2 and reconstructed by the decoders as $\hat{X}_1$ and $\hat{X}_2$.

Contribution
• GCM for heterogeneous data compression.
• Structured probabilistic model for genome compression: an MRF for the dependency among side information, optimized with BP.
• Structured probabilistic model for lossless image coding: M3N for joint spatial statistics and structural coherence.
• Structured probabilistic model for intra-frame video coding: M3N optimized with EP.
(Diagram: universal coding of heterogeneous data — executables, genome, image, video — connecting GCM in the input space with learning of data dependency, syntax, and semantics in the feature space.)

Background: Heterogeneous Data
• Genomic data: long repeats, with exceptions of insertion, deletion, and substitution.
• Image and video: spatially spanned along structures, e.g. edges and textures.
• Executables: interlaced data streams, e.g. opcodes and immediate data.

Definition: Heterogeneous Data
Heterogeneous data $x_1^N$ are generated by interlacing $M$ data streams with a time-varying random process $\mathcal{D}(t)$. The $j$-th data stream is emitted from a stationary Markov source with order $d_j$. Symbol $x_t$ is obtained from the $j_t$-th data stream $\{x^{(j_t)}_{n_{j_t}}\}$ by
1. $j_t = \mathcal{D}(t)$;
2. $x_t = x^{(j_t)}_{n_{j_t}}$;
3. $n_{j_t} = n_{j_t} + 1$,
with $D = \max_{1 \le j \le M} d_j$ and $N = \sum_{j=1}^{M} N_j$. The interlaced sequence $x_1^N$ is neither stationary nor wide-sense stationary.
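A minimal Python sketch (my own illustration, not from the slides) of the generation process just defined: M = 2 binary Markov streams with different statistics are interlaced by a switching process playing the role of $\mathcal{D}(t)$. The stay-probabilities and the round-robin switching rule are illustrative assumptions.

```python
import random

random.seed(0)

def markov_stream(stay_prob, start):
    """Binary order-1 Markov source: keep the current symbol w.p. stay_prob, else flip."""
    state = start
    while True:
        yield state
        if random.random() >= stay_prob:
            state = "B" if state == "A" else "A"

def interlace(streams, switch, n):
    """Emit n symbols; symbol t is drawn from stream switch(t), i.e. the switching process."""
    return [next(streams[switch(t)]) for t in range(n)]

# Two streams with different statistics, interlaced pseudo-periodically (M = 2).
streams = [markov_stream(0.9, "A"),   # long runs, mostly "A"
           markov_stream(0.6, "B")]   # shorter runs, mostly "B"
x = interlace(streams, switch=lambda t: (t // 4) % 2, n=24)
print("".join(x))  # each stream is stationary, but the interlaced sequence is not
```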
In Memory of Ben Taskar (1977-2013)
A rising star in machine learning, computational linguistics, and computer vision, and the founder of the max-margin Markov network.

Generalized Context Modeling — Scenario: Coding Based on Context Modeling
Predictive models for heterogeneous data compression: symbols to be predicted → context model → estimated probability → coding engine → encoded bits (01011101…). The structured prediction model captures the intrinsic structure of complex data, adapts to specific data, and performs optimal set prediction based on the observed contexts.

Generalized Context Modeling: Topology Illustration
GCM extends classical context modeling with combinatorial structuring and a multi-dimensional extension.
• Combinatorial structuring: the sequential context $c_D, \ldots, c_2, c_1$ consists of all preceding symbols $x_{t-D}, \ldots, x_{t-1}$ of the current symbol $x_t$; the extended context $c_M, \ldots, c_2, c_1$ instead selects an arbitrary combination of them, e.g. $x_{t-k}, \ldots, x_{t-l}$.
• Multi-dimensional extension: a 2-D context collects neighbors $x_{i-m,j-n}, \ldots, x_{i,j-1}$ around $x_{i,j}$, and an M-D context combines the component contexts $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(M)}$ from all M directions.

Generalized Context Modeling: Graphical Model for Prediction
• Graphical model for GCM with order D and M directional contexts $c^{(1)}, \ldots, c^{(M)}$, each with components $c^{(j)}_1, \ldots, c^{(j)}_D$.
• The symbols to be predicted ($x_1, x_2, \ldots, x_D$) are correlated with their neighboring symbols.
• The context component in each direction serves as an observation for the prediction.
• A conditional random field represents the dependencies among the symbols to be predicted and the context-based correlations.

Definition: Context Set
Given a context $s$, the set of subsequences containing $s$ is
$$C(s) = \bigl\{ x_a^b \mid I(s) \subseteq I(x_a^b) \bigr\},$$
where $I(s)$ is the index set of $s$. $S$ is a valid generalized context model if, in each of its directions, it satisfies
• the exhaustive property: for any subsequence $x^{(j)}_{m_j} \in \mathcal{A}^*$ in the $j$-th direction there exists $s \in S$ containing it, i.e. $\bigcup_{s \in S} C(s^{(j)}) = (\mathcal{A}^*)^{n_j}$;
• the disjoint property: for any subsequence $x^{(j)}_{m_j} \in \mathcal{A}^*$ in the $j$-th direction and arbitrary $s \ne s'$, $C(s^{(j)}) \cap C(s'^{(j)}) = \emptyset$.

Modeling & Prediction: Model Graph
A trellis-like graph rooted at the M-ary vector $(\emptyset, \cdots, \emptyset)$, where each node corresponds to an index set for a finite-order combination of predicted symbols. Given a node $\gamma = \{\gamma^{(j)}\}_{j=1}^{M}$:
• its succeeding node $\gamma'$ satisfies (i) $I(\gamma) \subset I(\gamma')$ and $l(\gamma') = l(\gamma) + 1$, and (ii) $i_{l(\gamma^{(j)})} < i_{l(\gamma'^{(j)})} \le D$ for every $\gamma^{(j)}$ with $l(\gamma^{(j)}) < D$;
• its preceding node $\gamma''$ satisfies (i) $I(\gamma'') \subset I(\gamma)$ and $l(\gamma'') = l(\gamma) - 1$, and (ii) $I(\gamma''^{(j)}) = \emptyset$ or $i_{l(\gamma''^{(j)})} < i_{l(\gamma^{(j)})}$ for every non-empty $\gamma^{(j)}$.
For given $M$ and $D$, there are $2^{DM} - 1$ possible context structures, located in $DM + 1$ vertical slices of the model graph.
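A small Python sketch (illustrative, not from the slides) of the index-set combinations behind the model graph: a node chooses, per direction, a subset of the positions {1, …, D}, which yields the 2^(DM) − 1 context structures counted above. The representation of a node as a tuple of frozensets is an assumption made for illustration.

```python
from itertools import product, combinations

def position_subsets(D):
    """All subsets of the positions {1, ..., D} in one direction (the power set)."""
    return [frozenset(c) for r in range(D + 1)
            for c in combinations(range(1, D + 1), r)]

def gcm_nodes(D, M):
    """All index-set combinations for a GCM with maximum order D and M directions.

    Each node picks, in every direction, a subset of positions {1, ..., D};
    the all-empty combination is the root and is excluded from the count.
    """
    per_direction = position_subsets(D)
    return [node for node in product(per_direction, repeat=M) if any(node)]

for D, M in [(3, 1), (2, 2), (3, 2)]:
    n = len(gcm_nodes(D, M))
    print(f"D={D}, M={M}: {n} context structures (2^(DM) - 1 = {2 ** (D * M) - 1})")
```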
Generalized Context Modeling: Model Tree Example
Model tree examples for D = 4, M = 1 and for D = 3, M = 2.

Representation & Prediction: Model Graph
(Figure: the model graph of depth D = 3 with M = 2 directions, listing the degenerating contexts with m < M directions alongside the contexts with all M directions; the solid (red) and dashed (blue) paths share some common nodes.)

Generalized Context Modeling: Problem Statement
• A model tree represents the generalized context models and their successive relationships.
• The minimum description length (MDL) principle selects the optimal class of context models.
• Normalized maximum likelihood (NML) determines the optimal weighted probability for prediction.

Generalized Context Modeling: Model Tree
For contexts with maximum order D and M directions, the model tree elaborates all possible combinations of finite-order predicted symbols and their successive relationships. Its root is the M-directional empty vector $(\emptyset, \ldots, \emptyset)$, and each node corresponds to the index set of one combination of predicted symbols. There are $2^{DM} - 1$ nodes in the $2^{M(D-1)}$ paths from the root to the leaf nodes, which constrain the context selection.

Model Selection: Separable Context Modeling
Prediction based on contexts with multi-directional structuring can be made separately in each direction. With $R = (L+1)^D - 1$,
$$\begin{pmatrix} \Pr\bigl(x_1^N \mid s = c_1\bigr) \\ \vdots \\ \Pr\bigl(x_1^N \mid s = c_U\bigr) \end{pmatrix} = H \cdot \begin{pmatrix} \Pr\bigl(x_1^N \mid s^{(1)} = c_1^{(1)}\bigr) \\ \vdots \\ \Pr\bigl(x_1^N \mid s^{(M)} = c_V^{(M)}\bigr) \end{pmatrix},$$
where $H \in \mathbb{R}^{U \times V}$ with $U = R^M$ and $V = RM$, and its elements are
$$h_{uv} = \begin{cases} \Pr\bigl(s^{(j)} = c^{(j)}_{u \bmod R}\bigr), & j = \lceil v/R \rceil, \\ 0, & \text{otherwise}. \end{cases}$$
Consequently, the size of the model class grows linearly with M.

Generalized Context Modeling: Model Selection
The NML function with the MDL principle is used for model selection:
$$-\ln f_{\mathrm{NML}}\bigl(y^n \mid p\bigr) = -\ln f_p\bigl(y^n \mid \hat{\theta}^n\bigr) + \frac{k}{2}\log\frac{n}{2\pi} + \ln \int_{\theta} \sqrt{|J(\theta)|}\, d\theta + o(1),$$
i.e. the code-assignment function plus the model-complexity terms (k denotes the number of model parameters). The contexts in each direction are compared with the NML function to find the optimal context for predicting the current symbol. For M interlaced autoregressive sources, the model complexity is constant with respect to the data size N.

Generalized Context Modeling: Weighted Probability for Prediction
The estimated probability for generalized context modeling is
$$P_w\bigl(x_1^N\bigr) = \sum_{S \in \mathcal{M}} w(S) \prod_{s \in S} \Pr\bigl(x_1^N \mid s\bigr),$$
or, sequentially for each symbol,
$$P_w(x_t) = \sum_{s} w(s)\, P_e(x_t \mid s).$$
For each context $s$, with total order $l(s) = \sum_{i=1}^{M} l\bigl(s^{(i)}\bigr)$, the weight is
$$w(s) = \frac{C_{MD}^{l(s)}\, \eta^{l(s)}}{\sum_{k=1}^{MD} C_{MD}^{k}\, \eta^{k}} = \frac{C_{MD}^{l(s)}\, \eta^{l(s)}}{(1+\eta)^{MD} - 1}.$$
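A minimal Python sketch (my own illustration, not the paper's implementation) of the weighted mixture above: per-context estimates P_e(x_t | s) are combined with order-dependent weights proportional to C(MD, l(s)) · η^l(s), normalized over the contexts in play. The Krichevsky-Trofimov-style per-context estimator and the toy context set are assumptions made for illustration.

```python
from math import comb

def kt_estimate(counts, symbol, alphabet_size=2):
    """Krichevsky-Trofimov estimate P_e(symbol | context) from per-context counts."""
    total = sum(counts.values())
    return (counts.get(symbol, 0) + 0.5) / (total + 0.5 * alphabet_size)

def context_weight(order, M, D, eta=1.0):
    """Order-dependent weight C(MD, l(s)) * eta^l(s), as on the weighting slide."""
    return comb(M * D, order) * eta ** order

def weighted_probability(symbol, counts, M, D, eta=1.0):
    """Mixture P_w(x_t) = sum_s w(s) P_e(x_t | s) over the currently active contexts."""
    weights = {s: context_weight(len(s), M, D, eta) for s in counts}
    z = sum(weights.values())   # normalize over the contexts in play
    return sum(w / z * kt_estimate(counts[s], symbol) for s, w in weights.items())

# Toy example: two active contexts of order 1 and 2 with simple symbol counts.
counts = {("a",): {"0": 3, "1": 1}, ("a", "b"): {"0": 1, "1": 4}}
p0 = weighted_probability("0", counts, M=1, D=2)
print(p0, 1 - p0)   # mixture probabilities for the next binary symbol
```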
Generalized Context Modeling: Model Redundancy
Given the generalized model class $\mathcal{M}$ with maximum order D and M directions, the model redundancy caused by the multi-directional extension is
$$\bar{\rho}_{\mathrm{ME}} = -\log \frac{P_w\bigl(x_1^N\bigr)}{\prod_{s \in S_a} P_e\bigl(x_1^N \mid s\bigr)} \;\le\; -L^{MD} \log \frac{\eta^{MD}}{(1+\eta)^{MD} - 1},$$
where L is the size of the alphabet and η is the compensation factor for the various contexts. The model redundancy caused by the multi-directional extension depends only on the maximum order D and the number of directions M; it is independent of the data size N.

Generalized Context Modeling: Experimental Results
• On the Calgary corpus, GCM outperforms CTW (context-tree weighting) by 7%-12% on executable files and seismic data.
• In executable file compression, GCM outperforms PPMd and PPMonstr by 10% and 4%, respectively.
• GCM is comparable to the best compressor, PAQ8, with less computational complexity.
• ML-based estimation does not fully exploit the statistics of heterogeneous data; as an alternative, learning of a structured prediction model is proposed.

Model Redundancy
Given the generalized model class $\mathcal{M}$ with maximum order D and M directions, the model redundancy caused by combinatorial structuring is
$$\bar{\rho}_{\mathrm{CS}} = -\log \frac{P_w\bigl(x_1^N\bigr)}{\prod_{s \in S_a} P_e\bigl(x_1^N \mid s\bigr)} \;\le\; -L^{D} \log \frac{\eta^{D}}{(1+\eta)^{D} - 1},$$
where L is the size of the alphabet and η is the compensation factor for the various contexts. The model redundancy caused by combinatorial structuring depends only on the maximum order D and is independent of the data size N.

Conceptual Diagram: Image
• Discriminative prediction distinguishes the actual pixel values from other possible estimations up to the maximum margin based on contexts, but cannot exploit the structure of the region for prediction.
• A Markov network maintains the structural coherence in the region being predicted, but cannot optimize the context-based prediction.
• The two are jointly optimized by the max-margin Markov network.

Diagram
Flow diagram of the structured set prediction model: at the encoder, structural coherence and imaging constraints on the set of pixels are combined with context-based prediction of each pixel before sampling and encoding; at the decoder, the same context-based prediction and imaging constraints drive the reconstruction.

Prediction
Given y, the block of pixels to encode, and x, the reconstructed pixels serving as contexts, the prediction of y is derived concurrently, as a set. The local spatial statistics are represented by a linear combination of a class of feature functions.
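A minimal sketch (my own illustration, not the paper's implementation) of "prediction as a linear combination of feature functions": each candidate block y is scored by w · f(x, y), where f stacks a context-based node feature and a smoothness feature standing in for structural coherence, and the highest-scoring candidate is selected. The particular features, weights, and the tiny candidate search are illustrative assumptions.

```python
import numpy as np

def features(x_context, y_block):
    """Stack simple feature functions f(x, y) for a candidate block prediction.

    f1: negative squared deviation of each pixel from the causal context mean
        (a context-based, per-node feature);
    f2: negative squared difference between horizontal neighbors inside the block
        (an edge feature standing in for structural coherence).
    """
    ctx_mean = np.mean(x_context)
    f1 = -np.sum((y_block - ctx_mean) ** 2)
    f2 = -np.sum(np.diff(y_block, axis=1) ** 2)
    return np.array([f1, f2])

def predict_block(x_context, candidates, w):
    """Structured set prediction: argmax_y w . f(x, y) over candidate blocks."""
    scores = [w @ features(x_context, y) for y in candidates]
    return candidates[int(np.argmax(scores))]

# Toy example: reconstructed context pixels and three candidate 2x2 blocks.
x_context = np.array([100.0, 102.0, 101.0, 99.0])
candidates = [np.full((2, 2), 100.0),                       # flat, close to the context
              np.array([[100.0, 140.0], [100.0, 140.0]]),   # strong vertical edge
              np.full((2, 2), 160.0)]                       # far from the context
w = np.array([1.0, 0.5])   # trained weights would come from max-margin learning
print(predict_block(x_context, candidates, w))
```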
Training
The model parameter w is trained over the collection of training data S = {x_i, y_i} by jointly optimizing the feature functions, the loss function, and the structural coherence. The feature functions {f_i} establish the conditional probabilistic model for prediction based on the various contexts derived from the assumed predictive direction; the loss function evaluates the prediction and adjusts the model parameter w.

Loss Function
The M-ary estimated output ŷ(i) for the block of pixels y is measured over the generated graphical model, with a log-Gaussian function for the node cliques and a Dirac function for the edge cliques, using the prediction error ε_i = ŷ(i) − y(i) and the variance σ² over the errors.

Solution
Solving the min-max formulation directly with standard quadratic programming (QP) incurs a high computational cost for problems with a large alphabet. Its dual is used instead:
$$\max_{\alpha} \;\sum_{i,y} \alpha_i(y)\, L(y_i, y) - \frac{1}{2} \Big\| \sum_{i,y} \alpha_i(y) \bigl( f_i(y_i) - f_i(y) \bigr) \Big\|^2 \quad \text{s.t.} \quad \sum_{y} \alpha_i(y) = C,\; \alpha_i(y) \ge 0, \;\forall i,$$
where α_i(y) is the marginal distribution for the i-th clique. Sequential minimal optimization (SMO) breaks the dual into a series of small QP problems over cliques and takes an ascent step that modifies the smallest number of variables; it iteratively chooses pairs of y according to the KKT conditions.

Solution
• A junction tree is built for the loopy Markov network by adding edges to link cliques; the junction tree is unique.
• Each clique is predicted along the junction tree.
• Belief propagation (BP) serves as the message-passing algorithm for inference and updates the potential of each clique.
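A minimal sketch (my own illustration, not the paper's solver) of max-product belief propagation on a small chain of nodes, the kind of message passing used along the junction tree above to recover the most probable state of each node. The three-node chain, the potentials, and the state space are toy assumptions.

```python
import numpy as np

# Chain x1 - x2 - x3, each node with 3 states; node and pairwise log-potentials.
node_pot = np.log(np.array([[0.7, 0.2, 0.1],
                            [0.3, 0.4, 0.3],
                            [0.1, 0.3, 0.6]]))
edge_pot = np.log(np.array([[0.6, 0.3, 0.1],
                            [0.3, 0.4, 0.3],
                            [0.1, 0.3, 0.6]]))   # favors neighbors keeping similar states

def max_product_chain(node_pot, edge_pot):
    """Max-product BP (max-sum in the log domain) on a chain; returns the MAP states."""
    n, k = node_pot.shape
    msg = np.zeros((n, k))             # msg[i]: forward message arriving at node i
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        scores = (node_pot[i - 1] + msg[i - 1])[:, None] + edge_pot
        msg[i] = scores.max(axis=0)
        back[i] = scores.argmax(axis=0)
    states = np.zeros(n, dtype=int)
    states[-1] = int(np.argmax(node_pot[-1] + msg[-1]))
    for i in range(n - 1, 0, -1):      # backtrack the most probable configuration
        states[i - 1] = back[i, states[i]]
    return states

print(max_product_chain(node_pot, edge_pot))   # MAP assignment for (x1, x2, x3)
```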
Upper Bound of Prediction Error
Theorem: given the trained weighting vector w and an arbitrary constant η > 0, the prediction error is asymptotically equivalent to the error obtained over the training data with probability at least 1 − exp(−η):
$$E_X\, L(w \cdot f, y) \;\le\; E_S\, L_{\gamma}(w \cdot f, y) + \sqrt{\frac{32}{N}\Bigl[\ln 4\,\mathcal{N}\bigl(\mathcal{L}, \gamma, S\bigr) + \eta\Bigr]},$$
where the left-hand side is the average prediction error, the first term on the right is the γ-relaxed average training error, and the additional term converges to zero as N grows.
Remark: the prediction error is upper-bounded by the well-tuned training error; the theorem ensures the predictive performance of the structured prediction model.

Upper Bound of Prediction Error
In terms of probability: given the trained weighting vector w and an arbitrary constant η > 0, with sufficient sampling there exists ε(L, γ, N, η) → 0 such that
$$P\Bigl[\sup \bigl(E_X L(w \cdot f, y) - E_S L_{\gamma}(w \cdot f, y)\bigr) \le \varepsilon\Bigr] > 1 - \eta.$$
The prediction error is upper-bounded by the well-tuned training error; the theorem ensures the predictive performance of the structured set prediction model.

Implementation
• Combined with a variance-based predictor for smooth regions, structured set prediction serves as an alternative coding mode.
• The coding costs of the two alternative modes are compared to select the optimal one.
• A log-Gaussian loss function yields optimal coding of the residual under the assumed Gaussian distribution.

Experimental Results (lossless image coding, bits per pixel)
Image     Proposed  MRP    BMF    TMW    CALIC  JPEG-LS  JPEG 2000  HD Photo
Airplane  3.536     3.591  3.602  3.601  3.743  3.817    4.013      4.247
Baboon    5.635     5.663  5.714  5.738  5.666  6.037    6.107      6.149
Balloon   2.548     2.579  2.649  2.649  2.825  2.904    3.031      3.320
Barb      3.764     3.815  3.959  4.084  4.413  4.691    4.600      4.836
Barb2     4.175     4.216  4.276  4.378  4.530  4.686    4.789      5.024
Camera    3.901     3.949  4.060  4.098  4.190  4.314    4.535      4.959
Couple    3.323     3.388  3.448  3.446  3.609  3.699    3.915      4.318
Goldhill  4.173     4.207  4.238  4.266  4.394  4.477    4.603      4.746
Lena      3.877     3.889  3.929  3.908  4.102  4.238    4.303      4.477
Peppers   4.163     4.199  4.241  4.251  4.246  4.513    4.629      4.850
• Performance exceeds JPEG-LS by 10% and JPEG 2000 lossless mode by 14% on average in bits per pixel.
• Performance exceeds the minimum rate predictor (MRP, the best reference predictor) by 1.35% on average in bits per pixel.

Conceptual Diagram: Video
Conceptual description of the structured prediction model: the trained model parameter is obtained by jointly optimizing the feature functions, the loss function, and the structural coherence.
• Optimal joint prediction by the max-margin Markov network.
• Max-margin estimation conditioned directly on the predicted pixels for context-based prediction.
• A Markov network maintains the structural coherence in the regions being predicted.

Loss Function
A Laplacian loss function for the M-ary estimated error:
$$L(\hat{y}, y) = \sum_i \ell_i\bigl(\hat{y}(i) - y(i)\bigr) + \sum_i \sum_{j \in \mathrm{ne}(i)} I\bigl(\hat{y}(i), y(i)\bigr)\, I\bigl(\hat{y}(j), y(j)\bigr),$$
with Laplacian errors derived for each node and state transitions of the neighboring nodes for each edge. For each node, the prediction error cost is
$$\ell_i(\epsilon_i) = \begin{cases} -\log_2\Bigl(1 - e^{-\frac{1}{\sqrt{2}\,\sigma}}\Bigr), & \epsilon_i = 0,\\[4pt] -\log_2\Bigl(\tfrac{1}{2}\bigl(e^{-\frac{|\epsilon_i| - 0.5}{\sigma/\sqrt{2}}} - e^{-\frac{|\epsilon_i| + 0.5}{\sigma/\sqrt{2}}}\bigr)\Bigr), & 0 < |\epsilon_i| < 255,\\[4pt] -\log_2\Bigl(\tfrac{1}{2}\, e^{-\frac{|\epsilon_i| - 0.5}{\sigma/\sqrt{2}}}\Bigr), & |\epsilon_i| = 255, \end{cases}$$
where the error is ε_i = ŷ(i) − y(i) and σ² is the variance over the errors. The Laplacian loss function matches the DCT transform; the structured prediction model optimizes it for minimal coding length under the HEVC framework.
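A minimal Python sketch (my own illustration) of the per-node Laplacian cost above: the loss is the code length, in bits, of an integer prediction residual under a discretized Laplacian with scale σ/√2. The clipping range follows the slide; the particular σ and test residuals are illustrative assumptions.

```python
import math

def laplacian_node_loss(eps, sigma):
    """Code length (bits) of an integer residual eps under a discretized Laplacian.

    Scale b = sigma / sqrt(2); the probability mass follows the three cases on
    the slide: eps = 0, 0 < |eps| < 255, and the clipped tail |eps| = 255.
    """
    b = sigma / math.sqrt(2)
    a = abs(eps)
    if a == 0:
        p = 1.0 - math.exp(-0.5 / b)
    elif a < 255:
        p = 0.5 * (math.exp(-(a - 0.5) / b) - math.exp(-(a + 0.5) / b))
    else:   # a == 255: remaining tail mass on one side
        p = 0.5 * math.exp(-(a - 0.5) / b)
    return -math.log2(p)

# Small residuals are cheap to code and large ones expensive,
# matching the DCT-friendly residual model used for mode decision.
for eps in (0, 1, 5, 20, 255):
    print(eps, round(laplacian_node_loss(eps, sigma=4.0), 2))
```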
Expectation Propagation for Message Passing
SMO is used to solve the standard quadratic program of the max-margin Markov network; a junction tree is then generated and a message-passing algorithm is run along it to obtain the most probable states of each pixel. Lossy intra video coding does not require propagating the actual states along the junction tree, and plain BP cannot select and propagate statistics such as means and variances for robust, convergent message passing. Expectation propagation (EP) uses exactly such statistics, approximating the actual distribution with an exponential family; the approximation metric can be varied according to the video data. Prediction based on EP is proven to converge to an upper bound.

Implementation
• The structured prediction model serves as an alternative mode, MODE_STRUCT, integrated into the current HEVC framework without additional syntax elements.
• Mode decision is made by rate-distortion optimization.
• A Laplacian-based loss function for the residual obtains the best coding performance under the DCT transform.

Experimental Results
• Performance exceeds the HEVC common test model by 2.26% in BD-rate, with BD-PSNR gains of up to 0.38 dB (sequences include Foreman 352×288, BlowingBubbles 416×240, BQMall 832×480, and Cactus 1920×1080).
• Performance exceeds HEVC with combined intra coding (CIP) by 1.31% in BD-rate.

Conceptual Diagram: Genome Sequence
The central dogma of molecular biology provides a framework for understanding the normal flow of sequence information between sequential information carriers: DNA (genotype) → RNA → protein → phenotype; proteins are constructed according to DNA.
Purine bases: adenine (A), guanine (G). Pyrimidine bases: thymine (T), cytosine (C).

Background: Compressive Structures in Genomic Data
• A DNA sequence consists of repeated patterns of nucleotides, namely 'A', 'T', 'G', and 'C'.
• Approximate repeats cover exact repeats together with insertion, deletion, and substitution: from TGTCTGCAGCAGCCGCT, insertion of 'G' gives TGTCTGCAGGCAGCCGCT, deletion of 'G' gives TGTCTGCACAGCCGCT, and substitution of 'G' gives TGTCTGCAACAGCCGCT.
• Reversible palindrome: substitute 'A' with 'T' and 'G' with 'C', and vice versa (e.g. ACAGACGTGTCGGCGA).

Background: Reference-based Methods
• RLZ: relative Lempel-Ziv compression with related reference sequences indicated by their self-index; it cannot handle alphabets other than {A, T, G, C, N}.
• GRS: a general Genome ReSequencing tool that considers the percentage of varied sequence per chromosome.
• GReEn: a copy model for matching exact repeats in the reference, with a statistical model for estimating the matching probabilities.

Motivation
• Main concerns: approximate or exact repeats of nucleotides; variable repeat sizes and offsets; insertion, deletion, and substitution of nucleotides within repeats.
• Motivations: the differences between target and reference sequences are not uniformly distributed but sparse, and hence cheap to code; side information such as sizes and offsets is correlated and can be predicted with a structured prediction model.

Framework
• Under the hierarchical prediction structure, the reference is selected according to the loss function
$$\bigl(\hat{F}_i^{(j)}, \hat{M}_i^{(j)}\bigr) = \arg\min_{\tilde{F}_i^{(j)},\, M_i^{(j)}} L_j\bigl(F_i^{(j)}, \tilde{F}_i^{(j)}, M_i^{(j)}\bigr).$$
• The difference sequence is a zero sequence in which non-zero symbols emerge at a low frequency; it is suitable for wavelet transform and subsequent bit-plane coding.
• A Markov random field is established over the correlated side information, which is predicted and updated with the BP algorithm.

Hierarchical Prediction Structure
Example 1: a (sub-)fragment of 16 nucleotides (target CAAATCTTACCCCGCC) is predicted from a (sub-)fragment of 16 nucleotides in the reference sequence (CAAATcttAcccCGCC) with OFFSET = 0 and SIZE = 16. The difference (sub-)fragment, obtained by subtracting the selected reference from the target, is 0x0000000000E0E0E000E0E0E0000000.
Example 2: a (sub-)fragment of 16 nucleotides (target TGTCTGCACAGCCGCT) is predicted from two sub-fragments of 8 nucleotides each in the reference sequence (TGTCTGCAMCAGCCGCT), with offsets 0 and 1, respectively, and SIZE = 8. The resulting difference (sub-)fragment is 0x000000000000000000000000000000.

Loss Function
A Hamming-like weighted loss function estimates the coding cost. The distance (loss) between the target and reference sub-fragments $F_i^{(j)}$ and $\tilde{F}_i^{(j)}$ of size m is
$$L\bigl(F_i^{(j)}, \tilde{F}_i^{(j)}\bigr) = \sum_{n=N_1}^{N_2} \ell_H(x_n, \tilde{x}_{\tilde{n}}),$$
with three kinds of weighted losses for the differences between target and reference nucleotides:
$$\ell_H(x_n, \tilde{x}_{\tilde{n}}) = \begin{cases} 0, & x_n - \tilde{x}_{\tilde{n}} = \text{0x00}, \\ 1, & |x_n - \tilde{x}_{\tilde{n}}| = \text{0x20}, \\ C, & \text{otherwise}. \end{cases}$$
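A minimal Python sketch (my own illustration, not the paper's encoder) of the weighted Hamming loss above, together with a brute-force offset search of the kind performed by the arg-min selection in the framework. The byte-wise comparison, the weight C = 4, and the tiny search window are illustrative assumptions.

```python
def hamming_like_loss(target, reference, c=4):
    """Weighted Hamming loss between equal-length fragments compared as ASCII bytes:
    0 for an exact match, 1 for a case change (|diff| == 0x20), C otherwise."""
    loss = 0
    for x, r in zip(target.encode(), reference.encode()):
        d = abs(x - r)
        loss += 0 if d == 0 else (1 if d == 0x20 else c)
    return loss

def best_offset(target, reference, max_offset=4):
    """Pick the reference offset with minimal weighted loss (the arg-min selection step)."""
    size = len(target)
    candidates = [(hamming_like_loss(target, reference[o:o + size]), o)
                  for o in range(max_offset + 1) if o + size <= len(reference)]
    return min(candidates)   # (loss, offset)

reference = "TGTCTGCAMCAGCCGCT"   # reference with one inserted symbol (Example 2 above)
target = "TGTCTGCACAGCCGCT"
for k in range(0, len(target), 8):     # predict each 8-nucleotide sub-fragment
    print(k, best_offset(target[k:k + 8], reference[k:]))
# -> the first sub-fragment matches at offset 0 and the second at offset 1,
#    skipping the inserted symbol, as in the hierarchical prediction example.
```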
Side Information Prediction
• The side information $\{M_i^{(j)}\}$ of fragment $F_i^{(j)}$ is predicted simultaneously with the fragment itself.
• A Markov chain is established as the structured prediction model: the current sub-fragment is predicted and updated based on its neighboring ones.
(Diagram: neighboring sub-fragments $F_1$ and $F_2$ with sizes $S_1$, $S_2$ and offsets $O_1$, $O_2$; when $O_1 = O_2$, they are merged into a single fragment $F$ of size $S_1 + S_2$ with prediction $\hat{F}$.)

Side Information Prediction
The Markov chain represents the interdependencies among the states of the side information of neighboring sub-fragments. The BP algorithm propagates the most probable states of the side information and calculates and updates their marginal distributions:
$$\hat{p}\bigl(M_i^{(j)}\bigr) = \max \Bigl( \mu_{f_{j+1} \to M_i^{(j)}} + \mu_{f_{j-1} \to M_i^{(j)}} \Bigr), \qquad \hat{M}_i^{(j)} = \arg\max \Bigl( \mu_{f_{j+1} \to M_i^{(j)}} + \mu_{f_{j-1} \to M_i^{(j)}} \Bigr).$$

Experimental Results
(Chart: compression ratio and running time per chromosome 1-22, M, X, Y for the proposed method, GReEn, and GRS.)
• KOREF20090224 and KOREF20090131 are genomes of the same human subject.
• The compression ratio is about 495, improving the compression performance of GReEn and GRS by 150% and 200%, respectively.
• The run-time ratio relative to GReEn is 219.7%, and is comparable to GRS.

Experimental Results
(Chart: compression ratio and running time per chromosome for the proposed method and GReEn.)
• YH and KOREF20090224 are genomes of two different human subjects.
• The compression ratio is about 232, roughly 150% of the compression performance of GReEn.
• GRS cannot compress most of the chromosomes.
• The run-time ratio relative to GReEn is 252.99%.

Acknowledgement
Faculty: Prof. Hongkai Xiong.
Postdoctoral fellows: Dr. Wenrui Dai, Dr. Chenglin Li (EPFL, UCSD).
PhD students: Botao Wang, Xiaopeng Zhang, Yong Li, Yuchen Zhang, Xing Gao, Shuo Chen, Kexin Tang, Yangmei Shen, Yuehan Xiong.
MS students: Kuanyu Ju, Can Xu, Saijie Ni, Han Cheng, Wenjing Fan.

Thanks!