Network Coding Theory: Consolidation and Extensions
Raymond Yeung
Joint work with Bob Li, Ning Cai and Zhen Zhang

Outline
Single-Source Network Coding
  Global and Local Descriptions of a Network Code
  Linear Multicast, Broadcast, and Dispersion
  Static Codes
Multi-Source Network Coding
  Fundamental Limits of Linear Codes
Based on an upcoming paper to appear in Foundations and Trends in Communications and Information Theory (Editor: Sergio Verdu).

Single-Source Network Coding
The network is acyclic.
The message x, an ω-dimensional row vector in F^ω, is generated at the source node.
A symbol in F can be sent on each channel.

Global Description
The symbol sent on channel e is a function of the message, called the global encoding mapping for channel e.
For any node v, the global encoding mappings have to satisfy the local constraints, i.e., the local encoding mapping for every node v is well defined.

A Globally Linear Network Code
A code is globally linear if all the global encoding mappings are linear (and all the local constraints are satisfied).
A globally linear code is the most general linear code that can possibly be defined.
The global encoding mapping for channel e is characterized by a column vector f_e such that the symbol sent on e is x·f_e.
It can be proved that if a code is globally linear, then it is also locally linear, i.e., all local encoding mappings are linear.

Global Description vs Local Description
Since the local encoding mapping at a node v is linear, it follows that for any e ∈ Out(v), f_e is a linear combination of the vectors f_e', e' ∈ In(v). (Global description: Li-Yeung-Cai)
The coefficients of these linear combinations form the local encoding kernel. (Local description: Koetter-Medard)

Global Description = Local Description
The global description and the local description are two sides of the same coin: they are equivalent.
Both can describe the most general form of a (block) linear network code!
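
The equivalence of the two descriptions can be illustrated with a small sketch. It assumes the standard butterfly network over F = GF(2) with ω = 2, which is not drawn in these slides; the channel names, the imaginary source inputs i1 and i2, and the helper functions are all hypothetical. The global encoding kernels f_e are computed from the local encoding kernels in upstream-to-downstream order, and a hop-by-hop simulation that uses only the local mappings is compared against x·f_e.

```python
from itertools import product

OMEGA = 2  # message dimension (omega); assumed for this toy example

# Local encoding kernel of an assumed butterfly network over GF(2).
# Each channel maps to {input channel: coefficient}. Channels are listed in
# upstream-to-downstream order. "i1" and "i2" are imaginary channels that
# deliver the two message symbols to the source node.
LOCAL_KERNEL = {
    "sa": {"i1": 1},
    "sb": {"i2": 1},
    "at": {"sa": 1},
    "ac": {"sa": 1},
    "bu": {"sb": 1},
    "bc": {"sb": 1},
    "cd": {"ac": 1, "bc": 1},  # the coding channel: sends the sum of its inputs
    "dt": {"cd": 1},
    "du": {"cd": 1},
}


def global_kernels(local_kernel, omega):
    """Derive f_e for every channel as a GF(2) combination of the input f_d."""
    f = {f"i{j + 1}": tuple(int(i == j) for i in range(omega)) for j in range(omega)}
    for e, coeffs in local_kernel.items():  # dicts keep insertion order
        vec = [0] * omega
        for d, k in coeffs.items():
            for i in range(omega):
                vec[i] = (vec[i] + k * f[d][i]) % 2
        f[e] = tuple(vec)
    return f


def simulate(x, local_kernel):
    """Send message x through the network using only the local mappings."""
    sym = {f"i{j + 1}": x[j] for j in range(len(x))}
    for e, coeffs in local_kernel.items():
        sym[e] = sum(k * sym[d] for d, k in coeffs.items()) % 2
    return sym


def dot(x, f):
    """Inner product x . f over GF(2)."""
    return sum(a * b for a, b in zip(x, f)) % 2


F_GLOBAL = global_kernels(LOCAL_KERNEL, OMEGA)
```

In this sketch, comparing `simulate(x, LOCAL_KERNEL)[e]` with `dot(x, F_GLOBAL[e])` for every message x and every channel e confirms that both descriptions specify the same code.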

Generic Network Code
Definition (LYC): A linear network code is said to be generic if:
For every set of channels {e_1, e_2, …, e_n}, where n ≤ ω and e_j ∈ Out(v_j), the vectors f_{e_1}, f_{e_2}, …, f_{e_n} are linearly independent provided that ⟨{f_d : d ∈ In(v_j)}⟩ ⊄ ⟨{f_{e_k} : k ≠ j}⟩ for 1 ≤ j ≤ n, where ⟨·⟩ denotes linear span.
The idea: Whenever a collection of vectors can possibly be linearly independent, they are.

Special Cases of a Generic Network Code
Generic network code ⇒ Linear dispersion ⇒ Linear broadcast ⇒ Linear multicast
Each notion is strictly weaker than the previous notion!

Linear Multicast
For each node v, if maxflow(v) ≥ ω, then the message x can be recovered.

Linear Broadcast
For every node v:
If maxflow(v) ≥ ω, the message x can be recovered.
If maxflow(v) < ω, maxflow(v) dimensions of the message x can be recovered.
Linear broadcast ⇒ Linear multicast

Linear Dispersion
For every collection of nodes P:
If maxflow(P) ≥ ω, the message x can be recovered.
If maxflow(P) < ω, maxflow(P) dimensions of the message x can be recovered.
Linear dispersion ⇒ Linear broadcast ⇒ Linear multicast (a generic network code implies all three)
For a linear dispersion, a newcomer who wants to receive the message x can do so by accessing a collection of nodes P such that maxflow(P) ≥ ω, where each individual node u in P may have maxflow(u) < ω.

Code Constructions
A generic network code exists for all sufficiently large F and can be constructed by the LYC algorithm.
A linear dispersion, a linear broadcast, and a linear multicast can potentially be constructed with decreasing complexity, since they satisfy sets of properties of decreasing strength.
In particular, a polynomial-time algorithm for constructing a linear multicast has been reported independently by Sanders et al. and Jaggi et al.

Static Codes
Static linear multicast, introduced by KM, finds applications in robust network multicast.
Static versions of linear broadcast and linear dispersion can be defined accordingly.
The LYC algorithm can be modified for constructing a static generic network code.
This means that the static versions of a linear dispersion, a linear broadcast, and a linear multicast can all be constructed.

Multi-Source Network Coding
A network is given.
Independent information sources of rates ω = (ω_1, ω_2, …, ω_S) are generated at possibly different nodes, and each source is to be multicast to a specific set of nodes.
The set of all achievable rates is called the achievable information rate region R.
If all the sources are multicast to the same set of nodes, the problem reduces to a single-source network coding problem; otherwise it does not.
A multi-source network coding problem cannot in general be decomposed into single-source network coding problems, even when all the information sources are generated at the same node (Yeung 95).
Special multi-source network coding problems have been shown to be decomposable (Roche, Hau, Yeung, Zhang 95-99).

An Example of Indecomposability (with Wireless Application)
Independent sources need to be coded jointly.
[Figure: example network with channels labeled b1, b2, and b1+b2]

Characterization of the Information Rate Region R
Inner and outer bounds on R for acyclic networks can be expressed in terms of the region of all entropy functions of random variables (Yeung 97, Yeung-Zhang 99, Song et al. 03).
A computable outer bound on R, called R_LP, has also been obtained.
Only existence proofs by random coding are available; no code construction is known.

The Region Γ*
Let Γ* be the set of all entropy functions of a collection of random variables labeled by the information sources and the channels.

Outer Bound R_out
If an information rate tuple is achievable, then there exists h ∈ closure(Γ*) which satisfies a set of constraints, denoted by C, which specifies
1. the independence of the information sources
2. the rate tuple
3. the local constraints of the code
4. the channel capacity constraints
5. the multicast requirements.
C is a collection of hyperplanes in the Euclidean space.
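
To make the entropy function and the constraints in C concrete, here is a minimal sketch built on the b1, b2, b1+b2 labels of the indecomposability example. The figure itself is not available, so the setup is an assumption: b1 and b2 are independent uniform bits, one channel carries u = b1 + b2 over GF(2), and a hypothetical receiver that already knows b1 wants b2. The function names are illustrative only.

```python
from itertools import combinations, product
from math import log2

NAMES = ("b1", "b2", "u")


def joint_outcomes():
    """Yield (outcome, probability) for two fair independent bits and their XOR."""
    for b1, b2 in product((0, 1), repeat=2):
        yield {"b1": b1, "b2": b2, "u": b1 ^ b2}, 0.25


def entropy(subset):
    """Joint entropy, in bits, of the variables named in `subset`."""
    pmf = {}
    for outcome, p in joint_outcomes():
        key = tuple(outcome[name] for name in subset)
        pmf[key] = pmf.get(key, 0.0) + p
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def cond_entropy(target, given):
    """H(target | given) = H(target, given) - H(given), for disjoint name tuples."""
    return entropy(tuple(target) + tuple(given)) - entropy(tuple(given))


# The entropy function h: one value per nonempty subset of the variables.
h = {
    frozenset(s): entropy(s)
    for r in range(1, len(NAMES) + 1)
    for s in combinations(NAMES, r)
}

# Independence of the sources: H(b1, b2) = H(b1) + H(b2).
sources_independent = abs(
    h[frozenset({"b1", "b2"})] - h[frozenset({"b1"})] - h[frozenset({"b2"})]
) < 1e-12

# Local constraint: the channel symbol is a function of the node's inputs.
local_constraint_ok = abs(cond_entropy(("u",), ("b1", "b2"))) < 1e-12

# Decoding requirement for the assumed receiver: b2 is determined by (b1, u).
receiver_decodes = abs(cond_entropy(("b2",), ("b1", "u"))) < 1e-12
```

Each check is a linear equality in the coordinates of h, which is the sense in which a constraint set like C cuts out hyperplanes in the space of entropy functions.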

Linear Codes for Multiple Sources
The global description for a linear network code can be generalized to multiple sources.
Each channel is characterized by a column vector of an appropriate dimension.
The existence of a linear code is nothing but the existence of a collection of vectors satisfying the set of constraints C.

The Region *
Let * be the set of all rank functions for a collection of column vectors of a common dimension (at least 1), labeled by the information sources and the channels, over some finite field F.

Linear Codes vs Nonlinear Codes
Linear codes: R_linear
An information rate tuple is linearly achievable iff there exists h ∈ closure(*) which satisfies the set of constraints C.
Note: R_linear includes all rate tuples that are inferior to some rate tuple achievable by mixing linear codes.
Nonlinear codes: outer bound R_out
If an information rate tuple is achievable, then there exists h ∈ closure(Γ*) which satisfies the set of constraints C.

Similarity between Rank and Entropy
The rank function satisfies
1. 0 ≤ rank(A).
2. rank(A) ≤ rank(B) if A ⊂ B.
3. rank(A) + rank(B) ≥ rank(AB) + rank(A ∩ B).
4. rank(A) ≤ |A|.
The entropy function in general satisfies
1. 0 ≤ H(A).
2. H(A) ≤ H(B) if A ⊂ B.
3. H(A) + H(B) ≥ H(AB) + H(A ∩ B).
Here AB denotes the union A ∪ B. Properties 1 to 3 are called the polymatroidal axioms.

The Bridge from Rank to Entropy
Theorem 1: Let F be a finite field, Y be an ω-dimensional random row vector distributed uniformly on F^ω, and A be an ω × l matrix. Let Z = Y·A. Then H(Z) = rank(A) log |F|.
Using this theorem, it can be shown that * ⊂ Γ*.

A Gap between * and Γ*
In addition to the polymatroidal axioms, the rank function also satisfies the Ingleton inequality:
r(A_13) + r(A_14) + r(A_23) + r(A_24) + r(A_34) ≥ r(A_3) + r(A_4) + r(A_12) + r(A_134) + r(A_234),
where A_13 denotes A_1 ∪ A_3, and so on.
The Ingleton inequality is satisfied by algebraic structures as general as Abelian groups.
The corresponding inequality is not satisfied by the entropy function in general (Zhang-Yeung 99), so there is a gap between * and Γ*.
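
Theorem 1 and the Ingleton inequality above can both be spot-checked by brute force on small instances. The following is a sketch only, assuming F = GF(2) and encoding each column vector as an integer bitmask; the function names and the random test instances are hypothetical and not part of the original material.

```python
import random
from math import log2


def rank_gf2(columns):
    """Rank over GF(2) of column vectors, each encoded as an integer bitmask."""
    basis = []
    for v in columns:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b in v if it is set
        if v:
            basis.append(v)
    return len(basis)


def entropy_of_image(columns, omega):
    """H(Z) in bits for Z = Y.A, with Y uniform on GF(2)^omega (Theorem 1 setup)."""
    counts = {}
    for y in range(2 ** omega):
        # Entry j of Z is the GF(2) inner product of Y with column j of A.
        z = tuple(bin(y & c).count("1") % 2 for c in columns)
        counts[z] = counts.get(z, 0) + 1
    total = 2 ** omega
    return -sum((n / total) * log2(n / total) for n in counts.values())


def ingleton_holds(a1, a2, a3, a4):
    """Check the Ingleton inequality for four lists of GF(2) column vectors."""
    def r(*sets):
        return rank_gf2([c for s in sets for c in s])

    lhs = r(a1, a3) + r(a1, a4) + r(a2, a3) + r(a2, a4) + r(a3, a4)
    rhs = r(a3) + r(a4) + r(a1, a2) + r(a1, a3, a4) + r(a2, a3, a4)
    return lhs >= rhs


def random_instance(rng, omega=4, max_cols=3):
    """Four random collections of vectors in GF(2)^omega (illustrative only)."""
    return [
        [rng.randrange(2 ** omega) for _ in range(rng.randint(1, max_cols))]
        for _ in range(4)
    ]
```

With |F| = 2 and logarithms to base 2, Theorem 1 says that `entropy_of_image(columns, omega)` should equal `rank_gf2(columns)`. For example, the columns `[0b011, 0b101, 0b110]` with `omega = 3` have rank 2, because the third column is the sum of the first two.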
This gap between * and Γ* suggests that nonlinear codes may actually perform better for some multi-source problems.

Vector Linear Codes
Vector linear codes (Riis, Lehman², Medard, Effros, Ho, Karger, Koetter)
A vector linear code can be regarded as a linear code over a network obtained by expanding all the capacities by an integer factor.
It has been shown that some multi-source problems do not have linear solutions but have vector linear solutions.
Question 1: Are these vector linear solutions better than all mixtures of linear solutions?
Question 2: Do these vector linear solutions violate the Ingleton inequality? (If so, the answer to Q1 is yes.)

Codes Beyond Fields
Dougherty, Freiling and Zeger have recently shown that there exists a multi-source problem that has no linear solution even in the more general algebraic context of modules, which includes all finite rings and Abelian groups.
Question 1: Is the nonlinear solution given by DFZ better than all mixtures of linear solutions?
Question 2: Does the nonlinear solution given by DFZ violate the Ingleton inequality? (If so, the answer to Q1 is yes.)

Ingleton Inequality Classification
Codes that abide by the Ingleton inequality:
  Linear codes, module codes
Codes that do not necessarily abide by the Ingleton inequality:
  Vector linear codes (these abide by the Ingleton inequality in an extended space)
Codes that do not abide by the Ingleton inequality:
  Non-Abelian group codes, which are asymptotically as good as all nonlinear codes (Chan, submitted to ISIT 2005)

Thank You