Geometry and Complexity Theory
J.M. Landsberg
Contents

Preface

Chapter 1. Introduction
§1.1. Matrix multiplication
§1.2. Separation of algebraic complexity classes
§1.3. Supplementary material

Chapter 2. The complexity of matrix multiplication I: lower bounds
§2.1. Matrix multiplication and multi-linear algebra
§2.2. Complexity of bilinear maps
§2.3. Definitions and examples from representation theory
§2.4. Definitions and examples from algebraic geometry
§2.5. Strassen's equations and generalizations
§2.6. Lower bounds for the rank of matrix multiplication
§2.7. Additional equations for σ_r(Seg(PA × PB × PC))

Chapter 3. The complexity of matrix multiplication II: asymptotic upper bounds
§3.1. Overview
§3.2. The upper bounds of Bini, Capovani, Lotti, and Romani
§3.3. Schönhage's upper bounds
§3.4. Strassen's laser method
§3.5. The Coppersmith-Winograd method
§3.6. The Cohn-Umans program

Chapter 4. The complexity of matrix multiplication III: algorithms and "practical" upper bounds
§4.1. Geometry of decompositions
§4.2. Strassen's algorithm for $M_{\langle 2\rangle}$
§4.3. Border rank algorithms for $M_{\langle 2,2,n\rangle}$
§4.4. Alternating least squares approach to algorithms
§4.5. Algorithms for 3 × 3 matrix multiplication with 23 multiplications
§4.6. Smirnov's algorithm showing the border rank of $M_{\langle 3\rangle}$ is at most 20

Chapter 5. Representation theory for complexity theory
§5.1. Double-Commutant Theorem and first consequences
§5.2. Representations of $S_d$ and GL(V)
§5.3. Methods for determining equations of secant varieties of Segre varieties using representation theory

Chapter 6. Geometric Complexity Theory
§6.1. Circuits and the flagship conjecture of GCT
§6.2. A program to find modules in $I[\mathrm{Det}_n]$ via representation theory
§6.3. Necessary conditions for modules of polynomials to be useful for GCT
§6.4. GCTV: Blackbox derandomization and the search for explicit linear spaces
§6.5. GCTVI: The needle exists
§6.6. Dual varieties and GCT
§6.7. Remarks on Kronecker coefficients and plethysm coefficients
§6.8. Symmetries of polynomials and coordinate rings of orbits

Chapter 7. Shallow circuits and the Chow variety
§7.1. Shallow circuits and geometry
§7.2. Hilbert functions and the method of shifted partial derivatives
§7.3. The Chow variety

Chapter 8. Advanced Topics
§8.1. Asymptotic surjectivity of the Hadamard-Howe map
§8.2. Non-normality of $\mathrm{Det}_n$

Hints and Answers to Selected Exercises
Bibliography
Index
Preface
The purpose of this book is to describe recent applications of algebraic
geometry and representation theory to complexity theory. I focus on two
central problems: the complexity of matrix multiplication and Geometric
Complexity Theory (GCT).
I have attempted to make this book accessible to both computer scientists and geometers, and to make the exposition as self-contained as possible. The
two main goals of this book are to convince computer scientists of the utility of techniques from algebraic geometry and representation theory, and
to show geometers beautiful, interesting, and important questions arising in
complexity theory.
Computer scientists have made extensive use of tools from mathematics
such as combinatorics, graph theory, asymptotic estimates, and especially
linear algebra. I hope to show that even elementary techniques from algebraic geometry and representation theory can substantially advance the
search for lower, and even upper bounds in complexity theory. For questions such as lower bounds for the complexity of matrix multiplication and
Valiant’s algebraic variants of P v. NP, I believe this additional mathematics will be necessary for further advances, as there is strong evidence that
these problems cannot be solved by linear algebra alone. I have attempted
to make these techniques accessible, introducing them as needed to deal with
concrete problems.
For geometers, I expect that complexity theory will be as good a source
for questions in algebraic geometry as modern physics has been. Recent
work has indicated that tools such as Fulton-McPherson intersection theory, Castelnuovo-Mumford regularity, and the Kempf-Weyman method for
computing minimal free resolutions all have something to add to complexity theory. In addition, complexity theory has a way of rejuvenating old questions that had been nearly forgotten but remain beautiful and intriguing: questions of Hadamard, Darboux, Lüroth, and the classical Italian school. At the same time, complexity theory has brought different areas of mathematics together in new ways: combinatorics, representation theory, and algebraic geometry all play a role in understanding the coordinate ring of the orbit closure of the determinant.
This book evolved from several classes I have given on the subject: a spring 2013 semester course at Texas A&M; summer courses at the Scuola Matematica Interuniversitaria, Cortona (July 2012), CIRM, Trento (June 2014), and an IMA summer school at U. Chicago (July 2014); and, most recently and importantly, a fall 2014 semester course at UC Berkeley as part of the semester-long program Algorithms and Complexity in Algebraic Geometry at the Simons Institute for the Theory of Computing.
Overview. Chapters 2, 3, and 4 deal with the complexity of matrix multiplication. Chapter 2, in addition to introducing background material in algebraic
geometry and representation theory, focuses on lower bounds. This is the
first exposition that includes recent advances in the subject. Chapter 3 deals
with asymptotic upper bounds. There are many excellent expositions of this
material. The exposition in this book emphasizes geometric aspects of the
advances and includes a detailed overview of the Cohn-Umans program, neither of which is available elsewhere. Chapter 4 provides a new geometric
perspective on, and an exposition of, the existing algorithms for fast matrix
multiplication.
Chapter 5 covers further topics in representation theory such as the algebraic Peter-Weyl theorem (which plays a central role in GCT) and includes
additional details on the use of representation theory in the study of matrix
multiplication.
Chapters 6, 7, and 8 deal with Geometric Complexity Theory (GCT), initiated in [MS01], as well as complexity results (such as reductions to shallow circuits) that may be approached from a GCT perspective. Chapter 6 is on the flagship permanent v. determinant conjecture of [MS01]. In addition to content from [MS08, Mula, Mulb], it covers direct geometric methods. Chapter 7 approaches, from a geometric perspective, the paths opened by [GKKS13b] towards separating VP from VNP via depth three or restricted depth five circuits. These translate to questions about familiar varieties in algebraic geometry (secant varieties of the Chow variety, and Veronese re-embeddings of secant varieties of Veronese varieties). More advanced tools from algebraic geometry are unavoidable when approaching GCT, and in Chapter 8, I present two (*possibly more*) results that require
a more advanced background: the asymptotic surjectivity of the Hadamard-Howe map, due to M. Brion, and the non-normality of the orbit closures of the determinant and permanent, due to S. Kumar.
Prerequisites. I have attempted to limit prerequisites to a solid background in linear algebra, although such a reader will have to accept several basic results in algebraic geometry without proof (e.g., Noether normalization). In Chapter 6 some further, but still elementary, algebraic geometry is needed; nothing beyond [Sha94] is used. The exception is Chapter 8, where more advanced results are utilized.
Attendees of the 2014 UC Berkeley course included first year graduate
students in mathematics and more advanced graduate students in computer
science.
Acknowledgments. To be written later
Layout. To be written later
Dependency of chapters. To be written later
Chapter 1
Introduction
A dramatic leap in signal processing occurred in the 1960s with the implementation of the fast Fourier transform, an algorithm that surprised the engineering community with its efficiency.¹ How could one predict when such fast, perhaps non-intuitive, algorithms exist? Can we prove when they do not? Complexity theory addresses these questions.

¹To this day, it is not known if there is an even more efficient algorithm than the FFT. See [Val77, KLPSMN09, GHIL].
This book is concerned with the use of geometry in attaining these goals.
I focus on two central questions: the complexity of matrix multiplication,
and algebraic variants of the famous P versus NP problem. In the first case,
a surprising algorithm exists and it is conjectured that even more amazing
algorithms exist. In the second case it is conjectured that no surprising
algorithm exists.
1.1. Matrix multiplication
Much of scientific computation is linear algebra, and the basic operation of linear algebra is matrix multiplication. All the operations of linear algebra (solving systems of linear equations, computing determinants, etc.) use matrix multiplication.
1.1.1. The standard algorithm. The standard algorithm for multiplying
matrices is row-column multiplication: Let A, B be 2 × 2 matrices
$$A = \begin{pmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{pmatrix}, \qquad B = \begin{pmatrix} b^1_1 & b^1_2 \\ b^2_1 & b^2_2 \end{pmatrix}.$$
The usual algorithm to calculate the matrix product C = AB is
$$c^1_1 = a^1_1 b^1_1 + a^1_2 b^2_1, \quad c^1_2 = a^1_1 b^1_2 + a^1_2 b^2_2, \quad c^2_1 = a^2_1 b^1_1 + a^2_2 b^2_1, \quad c^2_2 = a^2_1 b^1_2 + a^2_2 b^2_2.$$
It requires 8 multiplications and 4 additions to execute, and applied to n × n matrices, it uses $n^3$ multiplications and $n^3 - n^2$ additions.
This algorithm has been around for a long time.
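For concreteness, here is the standard algorithm as a short program. This is an illustrative sketch, not from the text; the name `standard_matmul` is invented.

```python
def standard_matmul(A, B):
    """Row-column multiplication of n x n matrices (lists of lists).
    Uses exactly n^3 scalar multiplications."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # c^i_j = sum_k a^i_k * b^k_j : n multiplications per entry
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```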
In 1968, V. Strassen set out to prove that the standard algorithm was optimal, in the sense that no algorithm using fewer multiplications exists, at least for two by two matrices over $\mathbb{Z}_2$. His spectacular failure opened
up a whole new area of research:
1.1.2. Strassen's algorithm for multiplying 2 × 2 matrices using seven scalar multiplications [Str69]. Set
$$\begin{aligned}
I &= (a^1_1 + a^2_2)(b^1_1 + b^2_2),\\
II &= (a^2_1 + a^2_2)\, b^1_1,\\
III &= a^1_1 (b^1_2 - b^2_2),\\
IV &= a^2_2 (-b^1_1 + b^2_1),\\
V &= (a^1_1 + a^1_2)\, b^2_2,\\
VI &= (-a^1_1 + a^2_1)(b^1_1 + b^1_2),\\
VII &= (a^1_2 - a^2_2)(b^2_1 + b^2_2).
\end{aligned} \tag{1.1.1}$$
Exercise 1.1.2.1: Show that if C = AB, then
$$c^1_1 = I + IV - V + VII,\quad c^2_1 = II + IV,\quad c^1_2 = III + V,\quad c^2_2 = I + III - II + VI.$$
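Before doing the exercise, one can at least check (1.1.1) numerically. The following sketch is an illustrative aside (not from the text), assuming numpy and 0-based indices, so that `A[0, 0]` stands for $a^1_1$:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.random((2, 2)), rng.random((2, 2))

# The seven products of (1.1.1).
I   = (A[0, 0] + A[1, 1]) * (B[0, 0] + B[1, 1])
II  = (A[1, 0] + A[1, 1]) * B[0, 0]
III = A[0, 0] * (B[0, 1] - B[1, 1])
IV  = A[1, 1] * (-B[0, 0] + B[1, 0])
V   = (A[0, 0] + A[0, 1]) * B[1, 1]
VI  = (-A[0, 0] + A[1, 0]) * (B[0, 0] + B[0, 1])
VII = (A[0, 1] - A[1, 1]) * (B[1, 0] + B[1, 1])

C = np.array([[I + IV - V + VII, III + V],
              [II + IV, I + III - II + VI]])
assert np.allclose(C, A @ B)  # the recombination of Exercise 1.1.2.1
```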
This raises questions:
(1) Can one find an algorithm that uses just six multiplications?
(2) Could Strassen’s algorithm have been predicted in advance?
(3) Since it uses more additions, is it actually better in practice?
(4) What about algorithms for n × n matrices?
I address the last question first:
1.1.3. Fast multiplication of n × n matrices. In Strassen’s algorithm,
the entries of the matrices need not be scalars: they could themselves be matrices. Let A, B be 4 × 4 matrices, and write
$$A = \begin{pmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{pmatrix}, \qquad B = \begin{pmatrix} b^1_1 & b^1_2 \\ b^2_1 & b^2_2 \end{pmatrix},$$
where $a^i_j$, $b^i_j$ are 2 × 2 matrices. One may apply Strassen's algorithm to get the blocks of C = AB in terms of the blocks of A, B, performing 7 multiplications of 2 × 2 matrices. Since one can apply Strassen's algorithm to each block, one can multiply 4 × 4 matrices using $7^2 = 49$ multiplications instead of the usual $4^3 = 64$.
If A, B are $2^k \times 2^k$ matrices, one may multiply them using $7^k$ multiplications instead of the usual $8^k$. If n is not a power of two, enlarge the matrices with blocks of zeros to obtain matrices whose size is a power of two. Asymptotically, by recursion and block multiplication one can multiply n × n matrices using approximately $n^{\log_2(7)} \simeq n^{2.81}$ arithmetic operations. To see this, let $n = 2^k$ and write $7^k = (2^k)^a$, so $k\log_2 7 = ak\log_2 2$, so $a = \log_2 7$.
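The recursion is short enough to write out. Here is a minimal sketch (an illustration of mine, assuming numpy and matrices whose size is a power of two; padding with zero blocks handles other sizes, as noted above):

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices whose size is a power of two,
    using 7 block multiplications per level of recursion."""
    n = A.shape[0]
    if n <= cutoff:                # fall back to the standard algorithm
        return A @ B
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    I   = strassen(A11 + A22, B11 + B22, cutoff)
    II  = strassen(A21 + A22, B11, cutoff)
    III = strassen(A11, B12 - B22, cutoff)
    IV  = strassen(A22, B21 - B11, cutoff)
    V   = strassen(A11 + A12, B22, cutoff)
    VI  = strassen(A21 - A11, B11 + B12, cutoff)
    VII = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty((n, n))
    C[:k, :k] = I + IV - V + VII
    C[:k, k:] = III + V
    C[k:, :k] = II + IV
    C[k:, k:] = I - II + III + VI
    return C
```

The `cutoff` parameter reflects the practical point made in the next subsection: below a certain size, the extra additions make the recursion not worthwhile.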
1.1.4. Regarding the number of additions. The number of additions in Strassen's algorithm also grows like $n^{2.81}$, so this algorithm is more efficient
in practice when the matrices are large. For any efficient algorithm for
matrix multiplication, the total complexity is governed by the number of
multiplications, see [BCS97, Prop. 15.1]. This is fortuitous because there
is a geometric object, tensor rank, that counts the number of multiplications
in an optimal algorithm (within a factor of two), and thus provides us with
a geometric measure of the complexity of matrix multiplication.
Just how large the matrices must be to obtain a substantial savings with Strassen's algorithm (one needs matrices of size thousands by thousands) and other practical matters are addressed in [BB].
1.1.5. An even better algorithm? Regarding question (1) above, one
cannot improve upon Strassen’s algorithm for 2 × 2 matrices. This was first
shown in [Win71]. I will give a proof, using geometry and representation
theory, of a stronger statement in §5.3.5. However for n > 2 very little is
known, as is discussed below and in Chapters 2-4. It is known that better
algorithms than Strassen’s exist for n × n matrices, even if they are not
known explicitly.
1.1.6. How to predict in advance? The answer to question (2) is yes!
In fact it could have been predicted 100 years ago.
Had someone asked Terracini in 1913, he would have been able to predict the existence of something like Strassen’s algorithm from geometric
considerations alone. Matrix multiplication is a bilinear map (see §1.1.8).
4
1. Introduction
Terracini would have been able to tell you (and we will see why in §2.4.14) that even a generic bilinear map $\mathbb{C}^4 \times \mathbb{C}^4 \to \mathbb{C}^4$ can be executed using seven multiplications, and thus, fixing any $\epsilon > 0$, one can perform any bilinear map $\mathbb{C}^4 \times \mathbb{C}^4 \to \mathbb{C}^4$ within an error of $\epsilon$ using seven multiplications.
1.1.7. An astonishing conjecture. The following quantity is the standard measure of the complexity of matrix multiplication:
Definition 1.1.7.1. The exponent ω of matrix multiplication is
$$\omega := \inf\{h \in \mathbb{R} \mid n \times n \text{ matrices may be multiplied using } O(n^h) \text{ arithmetic operations}\},$$
where inf denotes the infimum.
By Theorem 1.1.9.2 below, Strassen's algorithm shows $\omega \le \log_2(7) < 2.81$. Determining ω is a central open problem in complexity theory. The current “world record” is $\omega < 2.373$ [Wil, Gal, Sto]. This result and the substantial work leading up to it are the topic of Chapter 3.
This work has led to the following astounding conjecture:
Conjecture 1.1.7.2. ω = 2.
That is, it is conjectured that asymptotically, it is nearly just as easy to
multiply matrices as it is to add them!
Although I am unaware of anyone taking responsibility for the conjecture, all computer scientists I have discussed it with expect it to be true.
Since I have no opinion on whether the conjecture should be true or
false, I discuss both upper and lower bounds for ω, focusing on the role of
geometry. I start with lower bounds:
1.1.8. Matrix multiplication as a bilinear map. I will use the notation
$$M_{\langle n,m,l\rangle}: \mathbb{C}^{n\times m} \times \mathbb{C}^{m\times l} \to \mathbb{C}^{n\times l}$$
for matrix multiplication, writing $M_{\langle n\rangle} = M_{\langle n,n,n\rangle}$.
Matrix multiplication is a bilinear map, that is, for all $X_j, X \in \mathbb{C}^{n\times m}$, $Y_j, Y \in \mathbb{C}^{m\times l}$, and $a_j, b_j \in \mathbb{C}$,
$$M_{\langle n,m,l\rangle}(a_1 X_1 + a_2 X_2, Y) = a_1 M_{\langle n,m,l\rangle}(X_1, Y) + a_2 M_{\langle n,m,l\rangle}(X_2, Y), \text{ and}$$
$$M_{\langle n,m,l\rangle}(X, b_1 Y_1 + b_2 Y_2) = b_1 M_{\langle n,m,l\rangle}(X, Y_1) + b_2 M_{\langle n,m,l\rangle}(X, Y_2).$$
The set of all bilinear maps $\mathbb{C}^a \times \mathbb{C}^b \to \mathbb{C}^c$ is a vector space. (In our case $a = nm$, $b = ml$, and $c = ln$.) Write $a_1, \ldots, a_a$ for a basis of $\mathbb{C}^a$ and similarly for $\mathbb{C}^b$, $\mathbb{C}^c$. Then $T: \mathbb{C}^a \times \mathbb{C}^b \to \mathbb{C}^c$ is uniquely determined by its action on basis vectors, $T(a_i, b_j) = \sum_{k=1}^{c} t^{ijk} c_k$. That is, the vector space of bilinear maps $\mathbb{C}^a \times \mathbb{C}^b \to \mathbb{C}^c$, which I will denote by $\mathbb{C}^{a*}{\otimes}\mathbb{C}^{b*}{\otimes}\mathbb{C}^{c}$, has dimension $abc$, and when I discuss polynomials on this space, I will mean polynomials on the coefficients $t^{ijk}$.
1.1.9. Tensor rank. Our measure of complexity for a bilinear map will be
its tensor rank. Given an a × b matrix X, one can always change bases, i.e.,
multiply X on the left by an invertible a × a matrix and on the right by an
invertible b × b matrix, to obtain a matrix with some number of 1’s along
the diagonal and zeros elsewhere. The number of 1’s appearing is called the
rank of the matrix and is an invariant of the linear map X determines. In
other words, the only property of a linear map Ca → Cb that is invariant
under changes of bases is its rank, and for each rank we have a normal form.
This is not surprising because the dimension of the space of such linear maps
is $ab$ and we have $a^2 + b^2$ worth of parameters of changes of bases we can
make in a matrix representing the map.
For bilinear maps $\mathbb{C}^a \times \mathbb{C}^b \to \mathbb{C}^c$ we are not so lucky, as usually $abc > a^2 + b^2 + c^2$, i.e., there are fewer free parameters of changes of bases than the number of parameters needed to describe the map.
Nonetheless, there are properties of a bilinear map that will not change
under a change of basis. The main property we will use is tensor rank.
It is a generalization of the rank of a linear map. A tensor T has tensor
rank r if it can be written as the sum of r rank one tensors but no fewer,
and a tensor has tensor rank one if in some coordinate system the multidimensional matrix representing it has exactly one nonzero entry. The set
of bilinear maps in Ca∗ ⊗Cb∗ ⊗Cc of tensor rank at most r will be denoted
σ̂r0 . For a bilinear map T : Ca × Cb → Cc , write R(T ) ≤ r if T ∈ σ̂r0 .
Remark 1.1.9.1. The peculiar notation σ̂r0 will be explained in §2.4.7. To
have an idea where it comes from for now: σr = σr (Seg(Pa−1 ×Pb−1 ×Pc−1 ))
is standard notation in algebraic geometry for the r-th secant variety of the
Segre variety, which is the object we will study. The hat denotes its cousin
in affine space and the 0 indicates that we are now only considering an open
subset of this set.
The following theorem shows that tensor rank is a legitimate measure
of complexity:
Theorem 1.1.9.2 (Strassen [Str69], see [BCS97, §15.1]). $\mathbf{R}(M_{\langle n\rangle}) = O(n^{\omega})$.
Our goal is thus to determine, for a given r, whether or not matrix
multiplication lies in σ̂r0 .
1.1.10. How to use algebraic geometry to prove lower bounds for
the complexity of matrix multiplication? Algebraic geometry deals
with the study of zero sets of polynomials. It may be used to prove both
upper and lower complexity bounds. For lower bounds:
Plan to show $M_{\langle n,m,l\rangle} \notin \hat{\sigma}_r^0$ via algebraic geometry.
• Find a polynomial P on the space of bilinear maps $\mathbb{C}^{nm} \times \mathbb{C}^{ml} \to \mathbb{C}^{nl}$ such that P(T) = 0 for all $T \in \hat{\sigma}_r^0$.
• Show that $P(M_{\langle n,m,l\rangle}) \neq 0$.
Chapters 2 and 5 discuss techniques for finding such polynomials, using
algebraic geometry and representation theory, the study of symmetry in
linear algebra.
1.1.11. Representation theory and matrix multiplication. The study
of polynomials on any vector space is facilitated by sorting the polynomials
by degree. In our situation, our vector space of bilinear maps has additional
structure (the groups of changes of bases in each of Ca , Cb , and Cc act) and
we will exploit this to obtain a finer sorting of polynomials.
1.1.12. How to use algebraic geometry to prove upper bounds for the complexity of matrix multiplication? Based on the above discussion, one could try:
Plan to show $M_{\langle n,m,l\rangle} \in \hat{\sigma}_r^0$ with algebraic geometry.
• Find a set of polynomials $\{P_j\}$ on the space of bilinear maps $\mathbb{C}^{nm} \times \mathbb{C}^{ml} \to \mathbb{C}^{nl}$ such that $T \in \hat{\sigma}_r^0$ if and only if $P_j(T) = 0$ for all j.
• Show that $P_j(M_{\langle n,m,l\rangle}) = 0$ for all j.
This plan has a problem: Consider the set $S = \{(w, z) \in \mathbb{C}^2 \mid z = 0,\ w \neq 0\}$, whose real picture looks like the w-axis with the origin removed.
Any polynomial vanishing on S will also vanish at the origin.
Just as in this example, the zero set of the polynomials vanishing on σ̂r0
is larger than σ̂r0 when r > 1 (see §2.2.2) so one cannot determine membership in σ̂r0 via polynomials (although one can exclude membership with
polynomials).
Definition 1.1.12.1. Given a subset Z of a vector space U , define the ideal
of Z, denoted I(Z), to be the set of polynomials vanishing at all points of
Z. Define the Zariski closure of Z, denoted $\overline{Z}$, to be the set of $u \in U$ such that P(u) = 0 for all $P \in I(Z)$. The common zero set of a collection of polynomials (such as $\overline{Z}$) is called an algebraic variety.
In the example above, $\overline{S} = \{(w, z) \in \mathbb{C}^2 \mid z = 0\}$.
When $U = \mathbb{C}^{a*}{\otimes}\mathbb{C}^{b*}{\otimes}\mathbb{C}^{c}$, let $\hat{\sigma}_r := \overline{\hat{\sigma}_r^0}$ denote the Zariski closure of the set of bilinear maps of tensor rank at most r.
We will say T has border rank at most r if $T \in \hat{\sigma}_r$, and write $\underline{\mathbf{R}}(T) \le r$.
It is expected that $\underline{\mathbf{R}}(M_{\langle n\rangle}) < \mathbf{R}(M_{\langle n\rangle})$, although for n = 2 they both equal seven. For n = 3 we only know $14 \le \underline{\mathbf{R}}(M_{\langle 3\rangle}) \le 20$ and $19 \le \mathbf{R}(M_{\langle 3\rangle}) \le 23$. See Chapter 2 for the lower bounds and Chapter 4 for the upper bounds.
For the study of the exponent of matrix multiplication, we have good
luck: the problem goes away.
Theorem 1.1.12.2 (Bini [Bin80], see §3.2). $\underline{\mathbf{R}}(M_{\langle n\rangle}) = O(n^{\omega})$.
That is, although we may have $\underline{\mathbf{R}}(M_{\langle n\rangle}) < \mathbf{R}(M_{\langle n\rangle})$, they are not different enough to affect the exponent.
1.1.13. How to use geometry to find algorithms? At first glance,
geometry, which studies properties of objects that are independent of coordinates, seems ill-suited for developing explicit algorithms, which are given
in coordinates. In fact, one uses geometry not to write down a single algorithm, but rather families of algorithms all sharing common geometric
features. A highlight of this is a purely geometric derivation of Strassen’s
algorithm for 2 × 2 matrices in §4.2.3. Other algorithms and principles for
finding them are discussed in Chapter 4.
1.2. Separation of algebraic complexity classes
In a 1956 letter to von Neumann (see [Sip92, Appendix]) Gödel tried to
quantify the apparent difference between intuition and systematic problem
solving. Around the same time, researchers in the Soviet Union were trying
to determine if “brute force search” was avoidable in solving problems such
as the famous traveling salesman problem where there seems to be no fast
way to find a solution, but a proposed solution can be easily checked. (The
problem is to determine if there exists a way to visit, say twenty cities
traveling less than a thousand miles. If I claim to have an algorithm to
do so, you just need to look at my plan and check the distances.) These
discussions eventually gave rise to the complexity classes P, which models problems admitting a fast algorithm to produce a solution, and NP, which models problems admitting a fast algorithm to verify a proposed solution, and to the famous conjecture of Cook, Karp, and Levin that these two classes
are distinct. See [Sip92] for a history of the problem.
The transition of this problem to geometry goes via algebra:
1.2.1. From complexity to algebra. The P v. NP conjecture is generally believed to be out of reach at the moment, so there have been weaker
conjectures proposed that might be more tractable. One such comes from a
standard counting problem discussed in §1.3.1. This variant has the advantage that it admits a clean algebraic formulation that I now discuss.
L. Valiant [Val79a] conjectured that a sequence of polynomials that is
“easy” to write down should not necessarily admit a fast evaluation. He
defined algebraic complexity classes that are now called VP and VNP (see
§6.1.2 for their definitions), and conjectured:
Conjecture 1.2.1.1 (Valiant [Val79a]). VP ≠ VNP.
For the precise relationship between this conjecture and the P ≠ NP conjecture, see [BCS97, Chap. 21].
Valiant also showed that a particular polynomial sequence, the permanent $(\mathrm{perm}_n)$, is complete for the class VNP, in the sense that VP ≠ VNP if and only if $(\mathrm{perm}_n) \notin$ VP. As explained in §1.3.1, the permanent is natural for computer science. Although it is not immediately clear, the permanent is also natural to geometry, see §5.2.6. The formula for the permanent of an n × n matrix $x = (x^i_j)$ is:
$$\mathrm{perm}_n(x) := \sum_{\sigma \in S_n} x^1_{\sigma(1)} \cdots x^n_{\sigma(n)}. \tag{1.2.1}$$
Here $S_n$ denotes the group of permutations of $\{1, \ldots, n\}$.
How would one show there is no fast algorithm for the permanent? In
§6.1.2 we will define algebraic circuits, which are a class of algorithms for
computing a polynomial, and their size, which is a measure of the complexity
of the algorithm. Let circuit-size(permn ) denote the size of the smallest
algebraic circuit computing permn .
Conjecture 1.2.1.2 (Valiant [Val79a]). circuit-size(permn ) grows faster
than any polynomial in n.
Since this conjecture is expected to be extremely difficult, one could
simply try to find lower bounds for circuit-size(permn ).
1.2.2. From algebra to algebraic geometry. As with our earlier discussion, one could work as follows:
Let $S^n\mathbb{C}^{n^2}$ denote the vector space of all homogeneous polynomials of degree n in $n^2$ variables, so $\mathrm{perm}_n$ is a point of the vector space $S^n\mathbb{C}^{n^2}$. (Here $S^n$ is short for the n-th symmetric tensor power.) If we write an element of $S^n\mathbb{C}^{n^2}$ as $p = \sum_{1\le i_1\le\cdots\le i_n\le n^2} c_{i_1,\ldots,i_n} y_{i_1}\cdots y_{i_n}$, then we may view the coefficients $c_{i_1,\ldots,i_n}$ as coordinates on this vector space. We will look for polynomials on our space of polynomials, that is, polynomials on the $c_{i_1,\ldots,i_n}$.
Plan to show $(\mathrm{perm}_n) \notin$ VP, or at least bound its circuit size by r, with algebraic geometry.
• Find a polynomial P on $S^n\mathbb{C}^{n^2}$ such that P(p) = 0 for all $p \in S^n\mathbb{C}^{n^2}$ with circuit-size(p) ≤ r.
• Show that $P(\mathrm{perm}_n) \neq 0$.
By the discussion above on Zariski closure, this may be a more difficult problem: we are not just trying to exclude $\mathrm{perm}_n$ from having a small circuit, but we are also requiring that it not be “near” to having a small circuit. I return to this issue below.
1.2.3. Shallow circuits and algebraic geometry. As with matrix multiplication, one would like to use symmetry, more precisely, representation
theory, to attack Valiant’s conjectures. Unfortunately, representation theory
gives little insight regarding polynomials for small circuits.
Fortunately, there are reduction theorems that allow one to restrict the
classes of circuits one works with. Chapter 7 deals with two reductions
that are amenable to algebraic geometry and representation theory. On one
hand, they have very appealing algebraic varieties associated to them (e.g.,
secant varieties of Chow varieties). On the other, one pays a heavy price:
instead of needing to prove non-polynomial growth, one needs to prove non-nearly-exponential growth.
1.2.4. Another path to algebraic geometry. The permanent resembles one of the most studied (perhaps the most studied) polynomials, the determinant of an n × n matrix $x = (x^i_j)$:
$$\det{}_n(x) := \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, x^1_{\sigma(1)} \cdots x^n_{\sigma(n)}. \tag{1.2.2}$$
Here sgn(σ) denotes the sign of the permutation σ. The determinant, despite
its enormous formula of n! terms, can be computed very quickly, e.g., by
Gaussian elimination. (See §6.1.2 for an explicit division free algorithm.) In
particular (detn ) ∈ VP. It is not known if detn is complete for VP, that is,
whether or not a sequence of polynomials is in VP if and only if it can be
reduced to the determinant in the sense made precise below.
Although
$$\mathrm{perm}\begin{pmatrix} a & b\\ c & d \end{pmatrix} = \det\begin{pmatrix} a & -b\\ c & d \end{pmatrix},$$
Marcus and Minc [MM61], building on work of Pólya and Szegö (see [Gat87]), proved that one could not express $\mathrm{perm}_m(y)$ as a size m determinant of a matrix whose entries are affine linear functions of the $y^i_j$ when m > 2. This raised the question that perhaps the permanent of an m × m matrix could be expressed as a slightly larger determinant. More precisely, we say $p(y^1, \ldots, y^M)$ is an affine linear projection of $q(x^1, \ldots, x^N)$ if there exist affine linear functions $x^\alpha(y) = x^\alpha(y^1, \ldots, y^M)$ such that $p(y) = q(x(y))$. For example,
$$\mathrm{perm}_3(y) = \det{}_7\begin{pmatrix}
0 & y^1_1 & y^1_2 & y^1_3 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & y^3_3 & y^3_2 & 0\\
0 & 0 & 1 & 0 & 0 & y^3_1 & y^3_3\\
0 & 0 & 0 & 1 & y^3_1 & 0 & y^3_2\\
y^2_2 & 0 & 0 & 0 & 1 & 0 & 0\\
y^2_3 & 0 & 0 & 0 & 0 & 1 & 0\\
y^2_1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$
This formula is due to B. Grenet [Gre11], who also generalized it to express $\mathrm{perm}_m$ as a determinant of size $2^m - 1$.
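The displayed identity can be checked symbolically. The following sympy sketch is an illustrative aside (mine, not from the text); `Y[i, j]` stands for $y^{i+1}_{j+1}$, and the permanent is computed from the defining formula (1.2.1).

```python
import itertools
import sympy as sp

# Y[i, j] represents y^{i+1}_{j+1}.
Y = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f'y{i + 1}{j + 1}'))

M = sp.Matrix([
    [0,       Y[0, 0], Y[0, 1], Y[0, 2], 0,       0,       0      ],
    [0,       1,       0,       0,       Y[2, 2], Y[2, 1], 0      ],
    [0,       0,       1,       0,       0,       Y[2, 0], Y[2, 2]],
    [0,       0,       0,       1,       Y[2, 0], 0,       Y[2, 1]],
    [Y[1, 1], 0,       0,       0,       1,       0,       0      ],
    [Y[1, 2], 0,       0,       0,       0,       1,       0      ],
    [Y[1, 0], 0,       0,       0,       0,       0,       1      ],
])

perm3 = sum(Y[0, s[0]] * Y[1, s[1]] * Y[2, s[2]]
            for s in itertools.permutations(range(3)))
assert sp.expand(M.det() - perm3) == 0  # det_7 of M equals perm_3(y)
```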
Valiant conjectured that one cannot do much better than this. Let p be a polynomial. Let dc(p) denote the smallest n such that p is an affine linear projection of $\det_n$. ($\det_n$ would be VP-complete if $dc(p_m)$ grew no faster than a polynomial for all sequences $(p_m) \in$ VP.)
Conjecture 1.2.4.1 (Valiant [Val79a]). dc(permm ) grows faster than any
polynomial in m.
1.2.5. Geometric Complexity Theory. The “Zariski closed” version of Conjecture 1.2.4.1 is the flagship conjecture of Geometric Complexity Theory and the topic of Chapter 6. To state it in a useful form, rephrase Valiant's conjecture as follows:
Let $\mathrm{End}(\mathbb{C}^{n^2})$ denote the space of all linear maps $\mathbb{C}^{n^2} \to \mathbb{C}^{n^2}$, which acts on $S^n\mathbb{C}^{n^2}$ under the action $L \cdot p(x) := p(L^T(x))$, where x is viewed as a column vector of size $n^2$, L is an $n^2 \times n^2$ matrix, and T denotes transpose. (The transpose is used so that $L_1 \cdot (L_2 \cdot p) = (L_1 L_2) \cdot p$.) Define an auxiliary variable $\ell \in \mathbb{C}^1$, so $\ell^{n-m}\mathrm{perm}_m \in S^n\mathbb{C}^{m^2+1}$. Consider any linear inclusion $\mathbb{C}^{m^2+1} \to \mathbb{C}^{n^2}$, so we may consider $\ell^{n-m}\mathrm{perm}_m \in S^n\mathbb{C}^{n^2}$. Then
$$dc(\mathrm{perm}_m) \le n \iff \ell^{n-m}\mathrm{perm}_m \in \mathrm{End}(\mathbb{C}^{n^2}) \cdot \det{}_n. \tag{1.2.3}$$
For $p \in S^d\mathbb{C}^M$ with $d \le n$ and $M \le n^2$, let $\overline{dc}(p)$ denote the smallest n such that $\ell^{n-d} p \in \overline{\mathrm{End}(\mathbb{C}^{n^2}) \cdot \det{}_n}$.
Conjecture 1.2.5.1. [MS01] $\overline{dc}(\mathrm{perm}_m)$ grows faster than any polynomial in m.
Representation theory indicates a path towards solving Conjecture 1.2.5.1. To explain the path, introduce the following terminology: a polynomial $p \in S^n\mathbb{C}^N$ is characterized by its symmetries if, letting $G_p := \{g \in GL_N \mid g \cdot p = p\}$, for any $q \in S^n\mathbb{C}^N$ with $G_q \supseteq G_p$, one has $p = \lambda q$ for some $\lambda \in \mathbb{C}$.
There are two essential observations:
• $\overline{\mathrm{End}(\mathbb{C}^{n^2}) \cdot \det_n} = \overline{GL_{n^2} \cdot \det_n}$, that is, $\overline{\mathrm{End}(\mathbb{C}^{n^2}) \cdot \det_n}$ is an orbit closure.
• $\det_n$ and $\mathrm{perm}_n$ are characterized by their symmetries.
Representation theory (more precisely, the Peter-Weyl Theorem, see
§5.1), in principle gives a description of the polynomials vanishing on an
orbit closure modulo the effect of the boundary. (More precisely, it describes the ring of regular functions on the orbit.) In the problem at hand
this leads to questions about much-studied, but little understood, quantities in representation theory and combinatorics (Kronecker and plethysm
coefficients).
A potential problem is that there might be different orbits with the same
coordinate ring. The property of being characterized by symmetries avoids
this problem.
Unlike matrix multiplication, this problem is in its infancy and I do
not expect it to be fully resolved in the near future. But there are several
paths to making progress with this conjecture that I discuss in Chapter
6. To gain insight as to what techniques might work, it will be useful to
examine “toy” versions of the problem; these questions are of mathematical
significance in their own right, and lead to interesting connections between
combinatorics, representation theory and geometry. See §7.3 for one such,
called the Hadamard-Howe problem.
1.3. Supplementary material
This section consists of three topics: §1.3.1 discusses why the permanent
is natural for computer science, §1.3.2 is intended for readers who have no
experience with algebraic geometry, and §1.3.3 is intended for those not used
to working with big numbers.
1.3.1. Remarks on the permanent and complexity theory. A standard problem in graph theory, for which the only known algorithms are
exponential in the size of the graph, is to count the number of perfect
matchings of a bipartite graph, that is, a graph with two sets of vertices
and edges only joining vertices from one set to the other.
A perfect matching is a subset of the edges such that each vertex shares
an edge from the subset with exactly one other vertex.
[Figure 1.3.1: A bipartite graph. Vertex sets are {A, B, C} and {α, β, γ}.]
[Figure 1.3.2: Two perfect matchings of the graph from Figure 1.3.1.]
To a bipartite graph one associates an incidence matrix $(x^i_j)$, where $x^i_j = 1$ if an edge joins the vertex i above to the vertex j below, and $x^i_j = 0$ otherwise. For example, the graph above has incidence matrix
$$\begin{pmatrix} 1 & 1 & 0\\ 0 & 1 & 1\\ 0 & 1 & 1 \end{pmatrix}.$$
A perfect matching corresponds to a matrix constructed from the incidence matrix by setting some of the entries to zero so that the resulting
matrix has exactly one 1 in each row and column, i.e., is a matrix obtained
by applying a permutation to the columns of the identity matrix.
For example, $\mathrm{perm}_3\begin{pmatrix} 1 & 1 & 0\\ 0 & 1 & 1\\ 0 & 1 & 1 \end{pmatrix} = 2$.
Exercise 1.3.1.1: Show that if x is the incidence matrix of a bipartite
graph, then permn (x) indeed equals the number of perfect matchings.
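The following brute-force sketch (an illustrative aside of mine; exponential in n, as expected for the permanent) checks the correspondence on the example above:

```python
import math
from itertools import permutations

def permanent(x):
    """The n!-term formula (1.2.1): sum over permutations sigma of
    x[0][sigma(0)] * ... * x[n-1][sigma(n-1)]."""
    n = len(x)
    return sum(math.prod(x[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

# Incidence matrix of the graph of Figure 1.3.1.
x = [[1, 1, 0],
     [0, 1, 1],
     [0, 1, 1]]

# Each permutation contributes 1 exactly when it selects a perfect
# matching, so the permanent counts the matchings of Figure 1.3.2.
print(permanent(x))  # 2
```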
Thus a classical problem, to determine the complexity of counting the number of perfect matchings of a bipartite graph (which is complete for the complexity class #P), can be studied via algebra: determine the complexity of evaluating the permanent.
1.3.2. Exercises for those not familiar with algebraic geometry. In §1.1.9 we defined the rank of a matrix in a way that made it clear that it is a property of the linear map the matrix represents.
Theorem 1.3.2.1 (Fundamental theorem of linear algebra). Let V, W be
finite dimensional vector spaces, let f : V → W be a linear map and let Af
be a matrix representing f . Then
(1) $\mathrm{rank}(f) = \dim f(V) = \dim(\mathrm{span}\{\text{columns of } A_f\}) = \dim(\mathrm{span}\{\text{rows of } A_f\}) = \dim V - \dim\ker f$.
In particular rank(f ) ≤ min{dim V, dim W }.
(2) For generic f , rank(f ) = min{dim V, dim W }.
(3) If a sequence of linear maps ft of rank r has a limit f0 , then
rank(f0 ) ≤ r.
(4) rank(f ) ≤ r if and only if, in any choice of bases, the determinants
of all size r + 1 submatrices of the matrix representing f are all
zero.
Exercise 1.3.2.2: Prove the theorem. }
Note that (4) says the Zariski closure of the space of linear maps of rank
r is the space of linear maps of rank ≤ r and (3) says that the Euclidean
closure (i.e., closure under taking limits) of the linear maps of rank r is
contained in the set of linear maps of rank ≤ r (of course equality holds).
Now consider polynomials on spaces of polynomials. Let $p \in S^n\mathbb{C}^N$. How can one test if p is an n-th power of a linear form, $p = \ell^n$ for some $\ell \in \mathbb{C}^N$?
Exercise 1.3.2.3: Show that $p = \ell^n$ for some $\ell \in \mathbb{C}^N$ if and only if $\dim\{\frac{\partial p}{\partial x^1}, \ldots, \frac{\partial p}{\partial x^N}\} = 1$, where $x^1, \ldots, x^N$ are coordinates on $\mathbb{C}^N$.
How would one test if p is the sum of two n-th powers, $p = \ell_1^n + \ell_2^n$ for some $\ell_1, \ell_2 \in \mathbb{C}^N$?
Exercise 1.3.2.4: Show that $p = \ell_1^n + \ell_2^n$ for some $\ell_j \in \mathbb{C}^N$ implies $\dim\{\frac{\partial p}{\partial x^1}, \ldots, \frac{\partial p}{\partial x^N}\} \le 2$.
Exercise 1.3.2.5: Show that the above condition is not sufficient by considering $p = \ell_1^{n-1}\ell_2$.
Exercise 1.3.2.6: Show that $p = \ell_1^n + \ell_2^n$ for some $\ell_j \in \mathbb{C}^N$ implies $\dim\{\frac{\partial^2 p}{\partial x^i \partial x^j}\} \le 2$. Is this condition sufficient?
Exercise 1.3.2.7: Show that any polynomial vanishing on all polynomials of the form $p = \ell_1^n + \ell_2^n$ for some $\ell_j \in \mathbb{C}^N$ also vanishes on $x^{n-1}y$. }
Exercise 1.3.2.8: More generally, show that the Euclidean closure (i.e.,
closure under taking limits) of a set is always contained in its Zariski closure.
A question to think about: how would one detect if p ∈ S n CN was a
product of n linear forms? (See §7.3.3 for the answer.)
1.3.3. Big numbers.
$$n! \simeq \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n \tag{1.3.1}$$
$$\ln(n!) = n\ln(n) - n + O(\ln(n)) \tag{1.3.2}$$
$$\binom{2n}{n} \simeq \frac{4^n}{\sqrt{\pi n}} \tag{1.3.3}$$
$$\ln\binom{\alpha n}{\beta n} = \alpha H_e\Big(\frac{\beta}{\alpha}\Big)\, n - O(\ln n) \tag{1.3.4}$$
where $H_e(x) := -x\ln x - (1-x)\ln(1-x)$ is the Shannon entropy. All these identities follow from (1.3.1). Equation (1.3.1) is proved via complex analysis (analysis of a contour integral); it is not a simple matter of linear algebra.
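The approximations (1.3.1) and (1.3.3) are easy to test numerically; here is a small sketch of mine (an aside, not from the text), working with logarithms to avoid overflow:

```python
import math

for n in (10, 100, 1000):
    log_fact = math.lgamma(n + 1)                 # log(n!)
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * math.log(n / math.e)
    log_binom = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)  # log C(2n,n)
    log_approx = n * math.log(4) - 0.5 * math.log(math.pi * n)
    # Both differences tend to 0 as n grows (the errors are O(1/n)).
    print(n, log_fact - log_stirling, log_binom - log_approx)
```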
Exercise 1.3.3.1: Show $a^{\log(b)} = b^{\log(a)}$.
Exercise 1.3.3.2: Consider the following sequences of n:
$$\log_2(n),\ n,\ 100n,\ n^2,\ n^3,\ n^{\log_2(n)},\ 2^{\sqrt{n}},\ \frac{2^n}{n},\ n!,\ n^n.$$
In each case, determine for which n the sequence surpasses the number of atoms in the known universe. (It is estimated that there are between $10^{78}$ and $10^{82}$ atoms in the known universe.)
Exercise 1.3.3.3: Compare the sizes of $s^{\sqrt{d}}$ and $2^{\sqrt{d}\log(ds)}$.
Chapter 2
The complexity of matrix multiplication I: lower bounds
I begin in §2.1 with a brief tour of multi-linear algebra and present matrix
multiplication from an invariant point of view. In §2.2 the basic complexity
measures tensor rank and border rank are defined. In §2.3 (resp. §2.4) basic
definitions and examples from representation theory (resp. algebraic geometry) are presented. To prove lower bounds, one needs to find equations on spaces of tensors; an introduction to this subject, as well as the state of the art regarding lower bounds for the border rank of matrix multiplication, is given in §2.5. At first sight, algebraic geometry does not appear useful for proving lower bounds on tensor rank beyond border rank bounds, but thanks to the symmetries of the matrix multiplication tensor, it can be used to prove rank lower bounds for $M_{\langle n\rangle}$, which is explained in §2.6. The chapter
for border rank. As explained in Chapter 3, these equations may also be
useful for proving upper bounds for the complexity of matrix multiplication.
2.1. Matrix multiplication and multi-linear algebra
To better understand matrix multiplication as a bilinear map, I first review
basic facts from linear and multi-linear algebra. For more details on this
topic, see [Lan12, Chap. 2].
2.1.1. Linear maps. In what follows it will be essential to work without
bases, so instead of writing Cv I will work with a complex vector space V of
dimension v. Given V , one can form a second vector space, called the dual
space to V , whose elements are linear maps from V to C:
V ∗ := {α : V → C | α is linear}
If one is working in bases and represents elements of V by column vectors,
then elements of V ∗ are naturally represented by row vectors and the map
v 7→ α(v) is just row-column matrix multiplication. Given a basis v1 , . . . , vv
of V , it determines a basis α1 , . . . , αv of V ∗ by αi (vj ) = δij .
Exercise 2.1.1.1: Assuming V is finite dimensional, write down a canonical
isomorphism V → (V ∗ )∗ . }
Let V ∗ ⊗W denote the vector space of all linear maps V → W . Given
α ∈ V ∗ and w ∈ W define a linear map α⊗w : V → W by α⊗w(v) := α(v)w.
(In bases, if α is represented by a row vector and w by a column vector, α⊗w
will be represented by the matrix wα.) Such a linear map is said to have
rank one.
Define the rank of an element $f \in V^*{\otimes}W$ to be the smallest r such that f may be expressed as a sum of r rank one linear maps.
A linear map f : V → W determines a linear map f T : W ∗ → V ∗
defined by f T (β)(v) := β(f (v)) for all v ∈ V and β ∈ W ∗ . Note that this
is consistent with the notation V ∗ ⊗W ' W ⊗V ∗ , being interpreted as the
space of all linear maps (W ∗ )∗ → V ∗ , that is, the order we write the factors
does not matter. If we work in bases and insist that all vectors are column
vectors, the matrix of f T is just the transpose of the matrix of f .
Exercise 2.1.1.2: Show that we may also consider an element $f \in V^*{\otimes}W$ as a bilinear map $b_f: V \times W^* \to \mathbb{C}$ defined by $b_f(v, \beta) := \beta(f(v))$.
Exercise 2.1.1.3: If $v_1, \ldots, v_{\mathbf{v}}$ is a basis of V and $v^1, \ldots, v^{\mathbf{v}}$ the dual basis of $V^*$, defined by $v^j(v_i) = \delta^j_i$, show that the identity map on V is $\mathrm{Id}_V = \sum_j v^j{\otimes}v_j$.
Exercise 2.1.1.4: Show that there is a canonical isomorphism $(V^*{\otimes}W)^* \to V{\otimes}W^*$, where $v{\otimes}\beta(\alpha{\otimes}w) := \alpha(v)\beta(w)$. Now let V = W and let $\mathrm{Id}_V \in V^*{\otimes}V \simeq (V^*{\otimes}V)^*$ denote the identity map. What is $\mathrm{Id}_V(f)$ for $f \in V^*{\otimes}V$? }
2.1.2. Multi-linear maps and tensors. We say V ⊗W is the tensor product of V with W . More generally, for vector spaces A1 , . . . , An define their
tensor product A1 ⊗ · · · ⊗ An to be the space of n-linear maps A∗1 ×· · ·×A∗n →
$\mathbb{C}$, equivalently the space of (n − 1)-linear maps $A_1^* \times \cdots \times A_{n-1}^* \to A_n$, etc.
Let aj ∈ Aj and define an element a1 ⊗ · · · ⊗ an ∈ A1 ⊗ · · · ⊗ An by
a1 ⊗ · · · ⊗ an (α1 , . . . , αn ) := α1 (a1 ) · · · αn (an ).
Exercise 2.1.2.1: Show that if $\{a_j^{s_j}\}$, $1 \le s_j \le \mathbf{a}_j$, is a basis of $A_j$, then $\{a_1^{s_1}{\otimes}\cdots{\otimes}a_n^{s_n}\}$ is a basis of $A_1{\otimes}\cdots{\otimes}A_n$. In particular, $\dim(A_1{\otimes}\cdots{\otimes}A_n) = \mathbf{a}_1\cdots\mathbf{a}_n$. }
Remark 2.1.2.2. One may identify A1 ⊗ · · · ⊗ An with any re-ordering of
the factors. When I need to be explicit about this, I will call this identification the re-ordering isomorphism.
2.1.3. Matrix multiplication. We may view matrix multiplication
MhU,V,W i : (U ∗ ⊗V )∗ × (V ∗ ⊗W )∗ → W ∗ ⊗U
as a trilinear map which I will still denote by MhU,V,W i :
MhU,V,W i : (U ∗ ⊗V )∗ × (V ∗ ⊗W )∗ × (U ⊗W ∗ )∗ → C.
Exercise 2.1.3.1: Show that in bases this map is (X, Y, Z) 7→ trace(XY Z).
We have MhU,V,W i ∈ (U ∗ ⊗V )⊗(V ∗ ⊗W )⊗(W ∗ ⊗U ) ' (U ⊗U ∗ )⊗(V ⊗V ∗ )⊗(W ⊗W ∗ ).
Exercise 2.1.3.2: Show that as a tensor MhU,V,W i = IdU ⊗IdV ⊗IdW .
Exercise 2.1.3.3: Show that $\mathrm{Id}_U{\otimes}\mathrm{Id}_V{\otimes}\mathrm{Id}_W$, when viewed as a bilinear map $(U^*{\otimes}V) \times (V^*{\otimes}W) \to U^*{\otimes}W$, takes a linear map $f: U \to V$ and a second linear map $g: V \to W$ and sends them to their composition $g \circ f: U \to W$.
Exercise 2.1.3.4: Show that IdV ⊗IdW ∈ V ⊗V ∗ ⊗W ⊗W ∗ = (V ⊗W )⊗(V ⊗W )∗
equals IdV ⊗W .
Exercise 2.1.3.5: Show that $M_{\langle n,m,l\rangle}{\otimes}M_{\langle n',m',l'\rangle} = M_{\langle nn',mm',ll'\rangle}$. }
2.2. Complexity of bilinear maps
2.2.1. Tensor rank. An element T ∈ A1 ⊗ · · · ⊗ An is said to have rank
one if there exist aj ∈ Aj such that T = a1 ⊗ · · · ⊗ an .
We will use the following measure of complexity:
Definition 2.2.1.1. Let $T \in A_1{\otimes}\cdots{\otimes}A_n$. Define the rank (or tensor rank) of T to be the smallest r such that T may be written as the sum of r rank one tensors. We write $\mathbf{R}(T) = r$. Let $\hat{\sigma}_r^0 = \hat{\sigma}^0_{r, A_1{\otimes}\cdots{\otimes}A_n} \subset A_1{\otimes}\cdots{\otimes}A_n$ denote the set of tensors of rank at most r.
The rank of T ∈ A⊗B⊗C is comparable to all other standard measures
of complexity on the space of bilinear maps, see, e.g., [BCS97, §14.1].
For example, letting $x^i_\alpha$, $y^\alpha_u$, $z^u_i$ respectively be bases of $A = \mathbb{C}^{nm}$, $B = \mathbb{C}^{ml}$, $C = \mathbb{C}^{ln}$, the standard expression of matrix multiplication is
$$M_{\langle l,m,n\rangle} = \sum_{i=1}^{n}\sum_{\alpha=1}^{m}\sum_{u=1}^{l} x^i_\alpha{\otimes}y^\alpha_u{\otimes}z^u_i,$$
so we conclude $\mathbf{R}(M_{\langle n,m,l\rangle}) \le nml$.
Exercise 2.2.1.2: Write Strassen’s algorithm out as a tensor. }
Strassen's algorithm shows $\mathbf{R}(M_{\langle 2,2,2\rangle}) \le 7$.
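In coordinates the matrix multiplication tensor is a three-dimensional array of 0s and 1s, one 1 per term of the triple sum above. The sketch below is an illustrative aside (mine, assuming numpy; `matmul_tensor` is an invented name); it builds the tensor and checks that contracting against vectorized matrices recovers the matrix product, up to the transpose built into the $z^u_i$ convention.

```python
import numpy as np

def matmul_tensor(n, m, l):
    """M<n,m,l> as an (nm) x (ml) x (ln) array: one rank-one term
    x^i_a (x) y^a_u (x) z^u_i for each triple (i, a, u)."""
    T = np.zeros((n * m, m * l, l * n))
    for i in range(n):
        for a in range(m):
            for u in range(l):
                T[i * m + a, a * l + u, u * n + i] = 1
    return T

n = 2
T = matmul_tensor(n, n, n)
A, B = np.arange(4.).reshape(2, 2), np.arange(4., 8.).reshape(2, 2)
C = np.einsum('xyz,x,y->z', T, A.ravel(), B.ravel()).reshape(n, n)
assert np.allclose(C, (A @ B).T)  # the z-index stores C transposed
```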
Recall our plan from §1.1.10 and that it was actually a plan for bounding
border rank. I now elaborate on the difference between rank and border rank
by comparing Zariski and Euclidean closure.
2.2.2. The Fundamental theorem of linear algebra is false for tensors. Recall the fundamental theorem of linear algebra from §1.3.2.
Theorem 2.2.2.1. Let n ≥ 3.
(1) If $T \in A_1{\otimes}\cdots{\otimes}A_n$ is outside the zero set of a certain finite collection of polynomials (in particular, outside a certain set of measure zero), then $\mathbf{R}(T) \ge \frac{\mathbf{a}_1\cdots\mathbf{a}_n}{\mathbf{a}_1+\cdots+\mathbf{a}_n-n+1}$.
(2) Rank can jump up (or down) under limits.
The first assertion is proved in Exercise 2.4.14.3. To see that (2) holds, at least when r = 2, consider
$$T(t) := \frac{1}{t}\big[(a_1 + ta_2){\otimes}(b_1 + tb_2){\otimes}(c_1 + tc_2) - a_1{\otimes}b_1{\otimes}c_1\big]$$
and note that
$$\lim_{t\to 0} T(t) = a_1{\otimes}b_1{\otimes}c_2 + a_1{\otimes}b_2{\otimes}c_1 + a_2{\otimes}b_1{\otimes}c_1,$$
which does not have rank two (exercise).
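The jump in (2) is visible numerically: each T(t) below is a difference of two rank one tensors, yet it converges to the rank three limit. A small numpy illustration (an aside of mine, not from the text):

```python
import numpy as np

def rank_one(a, b, c):
    return np.einsum('i,j,k->ijk', a, b, c)

a1, a2 = np.array([1., 0.]), np.array([0., 1.])
b1, b2, c1, c2 = a1, a2, a1, a2   # reuse the same two basis vectors

limit = rank_one(a1, b1, c2) + rank_one(a1, b2, c1) + rank_one(a2, b1, c1)

for t in (1e-1, 1e-3, 1e-5):
    # A difference of two rank one tensors, so R(T(t)) <= 2 for t != 0.
    Tt = (rank_one(a1 + t * a2, b1 + t * b2, c1 + t * c2)
          - rank_one(a1, b1, c1)) / t
    print(t, np.abs(Tt - limit).max())   # error of size t
```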
To visualize why rank can jump up while taking limits, consider the following picture, where the curve represents the points of $\hat{\sigma}_1^0$. Points of $\hat{\sigma}_2^0$ (e.g., the dots limiting to the dot labelled T) are those on a secant line to $\hat{\sigma}_1^0$, and the points where the rank jumps up, such as the dot labelled T, are those that lie on a tangent line to $\hat{\sigma}_1^0$. (This phenomenon fails to occur for matrices because for matrices, every point on a tangent line is also on an honest secant line.)
We appear to have a problem:
• The set σ̂r0 is not closed under taking limits. (I will say a set that
is closed under taking limits is closed in the Euclidean topology.)
• It is also not closed in the Zariski topology, i.e., the zero set of
all polynomials vanishing on σ̂r0 includes tensors that are of rank
greater than r.
The tensors that are honestly “close” to tensors of rank r would be the
Euclidean closure, but to deal with polynomials, we needed to work with
the Zariski closure.
[Figure: a curve of rank one tensors, with dots on a secant line limiting to a point T on a tangent line; the point of tangency is labelled $a_1{\otimes}b_1{\otimes}c_1$.]
Often the Zariski closure is much larger than the Euclidean closure. For example, the Zariski closure of $\mathbb{Z} \subset \mathbb{C}$ is $\mathbb{C}$, while $\mathbb{Z}$ is already closed in the Euclidean topology.
However, in this case we have good luck: the Zariski and Euclidean
closures of σ̂r0 coincide. I outline the proof in §2.4.3.
2.2.3. Border rank. Recall that $\hat{\sigma}_r$ denotes the Zariski (and by the above discussion, Euclidean) closure of $\hat{\sigma}_r^0$, and the border rank of $T \in A_1{\otimes}\cdots{\otimes}A_n$, denoted $\underline{\mathbf{R}}(T)$, is the smallest r such that $T \in \hat{\sigma}_r$. By our above discussion, border rank is semi-continuous.
Border rank is easier to work with than rank for several reasons. For example, the maximal rank of a tensor in $\mathbb{C}^m{\otimes}\mathbb{C}^m{\otimes}\mathbb{C}^m$ is not known in general. In contrast, the maximal border rank is known to be $\lceil\frac{m^3-1}{3m-2}\rceil$ for all $m \neq 3$, and is 5 when m = 3 [Lic85]. The method of proof is a differential-geometric calculation that dates back to Terracini [Ter11], see §2.4.14.
Exercise 2.2.3.1: Prove that if $T \in A{\otimes}B{\otimes}C$ and $T' := T|_{A'\times B'\times C'}$ for some $A' \subseteq A^*$, $B' \subseteq B^*$, $C' \subseteq C^*$, then $\mathbf{R}(T) \ge \mathbf{R}(T')$ and $\underline{\mathbf{R}}(T) \ge \underline{\mathbf{R}}(T')$. }
Exercise 2.2.3.2: Let $T_j \in A_j{\otimes}B_j{\otimes}C_j$, $1 \le j \le s$. Consider $\oplus_j T_j \in (\oplus_j A_j){\otimes}(\oplus_k B_k){\otimes}(\oplus_l C_l)$ and, letting $A = \otimes_j A_j$, $B = \otimes_k B_k$, and $C = \otimes_l C_l$, $T_1{\otimes}\cdots{\otimes}T_s \in A{\otimes}B{\otimes}C$. Show that $\mathbf{R}(\oplus_j T_j) \le \sum_{i=1}^{s}\mathbf{R}(T_i)$ and $\mathbf{R}(\otimes_{i=1}^{s}T_i) \le \prod_{i=1}^{s}\mathbf{R}(T_i)$, and that the statements also hold for border rank.
2.2.4. Our first lower bound. Given T ∈ A⊗B⊗C, write T ∈ A⊗(B⊗C)
and think of T as a linear map TA : A∗ → B⊗C.
Proposition 2.2.4.1. R(T ) ≥ rank(TA ).
Exercise 2.2.4.2: Prove Proposition 2.2.4.1. }
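Proposition 2.2.4.1 is easy to try out in coordinates: flatten a 3-tensor to the matrix of $T_A$ and take an ordinary matrix rank. A sketch of mine (an aside, assuming numpy), using the $M_{\langle 2\rangle}$ tensor built as in the earlier snippet in §2.2.1:

```python
import numpy as np

def flattening_rank(T):
    """Rank of T_A : A* -> B (x) C, i.e., of T reshaped to a matrix."""
    return np.linalg.matrix_rank(T.reshape(T.shape[0], -1))

# Build M<2> as a 4 x 4 x 4 array of 0s and 1s.
n = 2
T = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for a in range(n):
        for u in range(n):
            T[i * n + a, a * n + u, u * n + i] = 1

print(flattening_rank(T))  # 4 = n^2, so R(M<2>) >= 4
```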
Exercise 2.2.4.3: Find a choice of bases such that
$$M_{\langle n\rangle}(A^*) = \begin{pmatrix} x & & \\ & \ddots & \\ & & x \end{pmatrix},$$
where $x = (x^i_j)$ is n × n, i.e., the image in the space of $n^2 \times n^2$ matrices is block diagonal with all blocks the same.
Exercise 2.2.4.4: Show that $\mathbf{R}(M_{\langle n\rangle}) \ge n^2$.
A fancier proof that $\mathbf{R}(M_{\langle n\rangle}) \ge n^2$, which will be useful for proving further lower bounds, is as follows: Write $A = U^*{\otimes}V$, $B = V^*{\otimes}W$, $C = W^*{\otimes}U$, so $(M_{\langle n\rangle})_A: A^* \to B{\otimes}C$ is a map $U{\otimes}V^* \to V^*{\otimes}W{\otimes}W^*{\otimes}U$. This map is, for $f \in A^*$, $f \mapsto f{\otimes}\mathrm{Id}_W$, and thus is clearly injective.
Exercise 2.2.4.5: Show $\mathbf{R}(M_{\langle m,n,1\rangle}) = mn$ and $\mathbf{R}(M_{\langle m,1,1\rangle}) = m$.
Exercise 2.2.4.6: Let b = c and assume $T_A$ is injective. Show that if $T(A^*)$ is diagonalizable under the action of $GL(B) \times GL(C)$ then $\mathbf{R}(T) \le \mathbf{b}$, and therefore if $T(A^*)$ is the limit of diagonalizable subspaces then $\underline{\mathbf{R}}(T) \le \mathbf{b}$.
2.3. Definitions and examples from representation theory
2.3.1. First definitions. Let GL(V ) denote the group of invertible linear
maps V → V . If we have chosen a basis, this is the set of invertible v × v
matrices. One says GL(V) acts on V and that V is a GL(V)-module. Write
v 7→ g · v for the action. Then GL(V ) also acts on V ∗ where we define g · α
by (g · α)(v) := α(g −1 v). We take the inverse to have g1 · (g2 · α) = (g1 g2 ) · α.
(Aside: Inverse is preferable to the transpose used in the introduction here
because if one uses inverse, the induced action on linear maps is the usual
one corresponding to conjugation of matrices.)
More generally, for any group G, given a group homomorphism µ : G →
GL(V ), we will say G acts on V and that µ is a representation of G. The
action is irreducible if there does not exist a proper subspace U ⊂ V such
that µ(g) · u ∈ U for all u ∈ U , g ∈ G (such a subspace U is called a
submodule), and otherwise the action is decomposable. The word reducible is used when there exist both a submodule U and a complementary subspace $U^c$, both of which are preserved by G. A module W is trivial if for
all g ∈ G, µ(g) = IdW .
2.3.2. First examples.
Example 2.3.2.1. GL(V ) acts irreducibly on V and GL(V ) × GL(W ) acts
irreducibly on V ∗ ⊗W by (g, h)·α⊗w = (g ·α)⊗(h·w) and extending linearly.
To see that the first is irreducible, note that any nonzero vector can be mapped to any other nonzero vector by GL(V). Exercise 2.3.3.5 below will show the second is
irreducible.
Example 2.3.2.2. Let
$$G = \left\{\begin{pmatrix} 1 & t\\ 0 & 1 \end{pmatrix} \;\middle|\; t \in \mathbb{C}\right\} \subset GL_2.$$
Then G acts on $\mathbb{C}^2$; the line $\mathbb{C}\{e_1\}$ is a submodule, but there is no complementary subspace preserved, so this action is decomposable, but not reducible.
Example 2.3.2.3. The permutation group Sd acts on Cd equipped with its
standard basis e1 , . . . , ed by σ(ej ) = eσ(j) and extending linearly. This action
is reducible: it decomposes into the direct sum of the trivial representation, given by the span of $e_1 + \cdots + e_d$, and a second irreducible representation which has basis $e_1 - e_2, \ldots, e_1 - e_d$. The first is usually denoted [d], and the second is denoted [d − 1, 1]. (These notations are explained in §5.2.2.)
There is a second one-dimensional representation of $S_d$, denoted $[1^d] := [1, \ldots, 1]$: namely, for $v \in \mathbb{C}^1$, define $\sigma \cdot v = \mathrm{sgn}(\sigma)v$. This is called the sign representation.
Proposition 2.3.2.4. The action of GL(V ) on V ∗ ⊗V given by g · (α⊗v) =
(g ·α)⊗(g ·v) is reducible. It decomposes as V ∗ ⊗V = C{IdV }⊕sl(V ), where
$\mathrm{Id}_V$ is the identity map and $\mathfrak{sl}(V)$ is the set of linear maps $f: V \to V$ such that $\mathrm{Id}_{V^*}(f) = 0$, where we consider $\mathrm{Id}_{V^*} \in V{\otimes}V^* \simeq (V^*{\otimes}V)^*$. In bases, for a
matrix X, the action is g · X = gXg −1 .
Exercise 2.3.2.5: Prove Proposition 2.3.2.4. Show that moreover C{Id} is
a trivial GL(V )-module. }
Remark 2.3.2.6. In any choice of basis, sl(V ) will be identified with the
traceless matrices, and by Exercise 2.1.1.4, the map f 7→ IdV ∗ (f ) is just
f 7→ trace(f ), i.e., f maps to the sum of its eigenvalues. For more details
see [Lan12, §2.3,§2.5].
Proposition 2.3.2.7. The action of GL(V ) on V ⊗V given by g · (v⊗w) =
(g · v)⊗(g · w) decomposes into two irreducible submodules. In bases this
decomposition corresponds to the splitting of v × v matrices into the direct
sum of symmetric and skew-symmetric matrices. If we represent the bilinear
map $v{\otimes}w$ by a matrix X, so that for $\alpha, \beta \in V^*$ represented by row vectors $(v{\otimes}w)(\alpha, \beta) = \alpha X \beta^T$, then the matrix $g \in GL(V)$ acts by $g \cdot X = gXg^T$.
Exercise 2.3.2.8: Prove Proposition 2.3.2.7.
Propositions 2.3.2.4 and 2.3.2.7 illustrate the importance of keeping
track of the difference between V and V ∗ . In bases, both V ⊗V and V ⊗V ∗
are just v × v matrices, but the action of GL(V ) on the two spaces is very
different.
2.3.3. Schur’s lemma and consequences.
Definition 2.3.3.1. Let ρj : G → GL(Wj ), j = 1, 2 be representations. A
G-module homomorphism, or G-module map, is a linear map f : W1 → W2
such that f (ρ1 (g) · v) = ρ2 (g) · f (v) for all v ∈ W1 and g ∈ G.
One says $W_1$ and $W_2$ are isomorphic G-modules if there exists a G-module homomorphism $W_1 \to W_2$ that is a linear isomorphism.
For a group G and G-modules V and W , let HomG (V, W ) ⊂ V ∗ ⊗W
denote the vector space of G-module homomorphisms V → W .
Exercise 2.3.3.2: Show that the image and kernel of a G-module homomorphism are G-modules.
The following easy lemma is central to representation theory:
Lemma 2.3.3.3 (Schur’s Lemma). Let G be a group, let V and W be
irreducible G-modules and let f : V → W be a G-module homomorphism.
Then either f = 0 or f is an isomorphism. If further V = W , then f = λ Id
for some constant λ.
Exercise 2.3.3.4: Prove Schur’s Lemma.
Exercise 2.3.3.5: Let V be an irreducible G-module and let W be an irreducible H-module. Show V ⊗W is an irreducible G × H-module. }
2.3.4. Spaces of tensors. Give V ⊗d the structure of a GL(V )-module by
g · (v1 ⊗ · · · ⊗ vd ) := (g · v1 )⊗ · · · ⊗ (g · vd ) and extending linearly.
Exercise 2.3.4.1: Give V ⊗d the structure of a Sd -module by defining
σ · (v1 ⊗ · · · ⊗ vd ) := vσ−1 (1) ⊗ · · · ⊗ vσ−1 (d) .
Show that the actions of Sd and GL(V ) on V ⊗d commute with each other.
Definition 2.3.4.2. A tensor T ∈ V ⊗d is said to be symmetric if T (α1 , . . . , αd ) =
T (ασ(1) , . . . , ασ(d) ) for all σ ∈ Sd , and skew-symmetric if T (α1 , . . . , αd ) =
sgn(σ)T (ασ(1) , . . . , ασ(d) ) for all σ ∈ Sd . Let S d V ⊂ V ⊗d (resp. Λd V ⊂
V ⊗d ) denote the space of symmetric (resp. skew-symmetric) tensors.
Note that Λd V and S d V are GL(V )-submodules of V ⊗d .
Introduce the notations:
$$x_1 x_2 \cdots x_k := \frac{1}{k!}\sum_{\sigma \in S_k} x_{\sigma(1)}{\otimes}x_{\sigma(2)}{\otimes}\cdots{\otimes}x_{\sigma(k)} \in S^k V
and
$$x_1 \wedge x_2 \wedge \cdots \wedge x_k := \frac{1}{k!}\sum_{\sigma \in S_k} \mathrm{sgn}(\sigma)\, x_{\sigma(1)}{\otimes}x_{\sigma(2)}{\otimes}\cdots{\otimes}x_{\sigma(k)} \in \Lambda^k V.$$
If $v_1, \ldots, v_{\mathbf{v}}$ is a basis of V, then $v_{i_1}{\otimes}\cdots{\otimes}v_{i_d}$ with $i_j \in [\mathbf{v}]$ is a basis of $V^{\otimes d}$, $v_{i_1}\cdots v_{i_d}$ with $1 \le i_1 \le \cdots \le i_d \le \mathbf{v}$ is a basis of $S^d V$, and $v_{i_1}\wedge\cdots\wedge v_{i_d}$ with $1 \le i_1 < \cdots < i_d \le \mathbf{v}$ is a basis of $\Lambda^d V$. With respect to this basis, if $x_j = (x^1_j, \ldots, x^{\mathbf{v}}_j)^T$, then the coefficient of $x_1\wedge\cdots\wedge x_k$ on the basis vector $v_{i_1}\wedge\cdots\wedge v_{i_k}$ is
$$\det\begin{pmatrix} x^{i_1}_1 & \cdots & x^{i_1}_k\\ \vdots & & \vdots\\ x^{i_k}_1 & \cdots & x^{i_k}_k \end{pmatrix}.$$
Definition 2.3.4.3. If $l > k$, define a contraction map $V^{\otimes k} \times V^{*\otimes l} \to V^{*\otimes(l-k)}$ by
$$(X, \phi) = (x_1{\otimes}\cdots{\otimes}x_k,\ \phi_1{\otimes}\cdots{\otimes}\phi_l) \mapsto X \lrcorner \phi := \phi_1(x_1)\cdots\phi_k(x_k)\,\phi_{k+1}{\otimes}\cdots{\otimes}\phi_l.$$
2.3.5. Exercises.
(1) Show that the S2 -module V ⊗2 is reducible. Show that in matrices,
this decomposition corresponds to writing an v × v matrix as the
direct sum of a symmetric matrix and a skew-symmetric matrix.
(2) Show that as a GL(V )×S2 -module, V ⊗V = (S 2 V ⊗[2])⊕(Λ2 V ⊗[1, 1]).
(3) Given $T \in S^d V$, show that T defines a homogeneous polynomial of degree d on $V^*$, which, for the purposes of this exercise, we denote by $P_T$, via $P_T(\alpha) = T(\alpha, \ldots, \alpha)$. Similarly, given a homogeneous polynomial P of degree d on $V^*$, define a tensor $\overline{P}$ by: $\overline{P}(\alpha_1, \ldots, \alpha_d)$ is $\frac{1}{d!}$ times the coefficient of $t_1\cdots t_d$ in the polynomial $P(t_1\alpha_1 + \cdots + t_d\alpha_d)$. The tensor $\overline{P}$ is called the polarization of P. I will generally use the same notation for symmetric tensors and homogeneous polynomials. (A computational illustration of polarization appears after this list of exercises.)
(4) Let $P \in S^2\mathbb{C}^{2*}$. If $P\binom{\alpha}{\beta} = a\alpha^2 + b\alpha\beta + c\beta^2$, what is $\overline{P}\left(\binom{\alpha_1}{\beta_1}, \binom{\alpha_2}{\beta_2}\right)$?
β2
(5) One can also define partial polarizations. Consider the surjective
map S q V ⊗S p V → S p+q V given by multiplication of polynomials.
Note that it is a GL(V )-module map. Consider the transpose map
S p+q V ∗ → S q V ∗ ⊗S p V ∗ . The image of a polynomial P ∈ S p+q V ∗
will be denoted Pq,p . Consider Pq,p : S q V → S p V ∗ and show that
in bases, its image is the span of the partial derivatives of P of
order q.
(6) Given a linear map $f: V \to W$, one obtains linear maps $f^{\otimes k}: V^{\otimes k} \to W^{\otimes k}$ defined by $f^{\otimes k}(v_1{\otimes}\cdots{\otimes}v_k) = f(v_1){\otimes}\cdots{\otimes}f(v_k)$. Show
that f ⊗k descends to give linear maps f ◦k : S k V → S k W and
f ∧k : Λk V → Λk W . Show that if dim V = dim W = v, the map
f ∧v : Λv V → Λv W is multiplication by a scalar. If V = W , show
that the scalar is the determinant of the matrix of f with respect
to any choice of basis. Show that more generally, in bases, the
matrix of f ∧k has entries ± the size k minors of the matrix of f ,
in particular the matrix of f ∧v−1 is (up to transpose) the cofactor
matrix of the matrix of f .
(7) Show that as a GL(V )×GL(W )-module, S 2 (V ⊗W ) = (S 2 V ⊗S 2 W )⊕
(Λ2 V ⊗Λ2 W ) and that Λ2 (V ⊗W ) = (S 2 V ⊗Λ2 W ) ⊕ (Λ2 V ⊗S 2 W ).
}
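As a concrete illustration of Exercise (3) above, the following sympy sketch (an aside of mine, not from the text) polarizes the cubic $P = x_1^2 x_2$ and checks that restricting $\overline{P}$ to the diagonal recovers P:

```python
import sympy as sp

x1, x2, t1, t2, t3 = sp.symbols('x1 x2 t1 t2 t3')
P = x1**2 * x2                        # a cubic, d = 3

# Three generic arguments alpha_i = (a[i,0], a[i,1]).
a = sp.Matrix(3, 2, lambda i, j: sp.Symbol(f'a{i + 1}{j + 1}'))
subs = {x1: t1 * a[0, 0] + t2 * a[1, 0] + t3 * a[2, 0],
        x2: t1 * a[0, 1] + t2 * a[1, 1] + t3 * a[2, 1]}

# Pbar(alpha_1, alpha_2, alpha_3) = (1/3!) * coefficient of t1 t2 t3
# in P(t1 alpha_1 + t2 alpha_2 + t3 alpha_3).
Pbar = sp.expand(P.subs(subs)).coeff(t1).coeff(t2).coeff(t3) / sp.factorial(3)

# Setting all three arguments to (x1, x2) recovers P.
diag = {a[i, j]: (x1, x2)[j] for i in range(3) for j in range(2)}
assert sp.simplify(Pbar.subs(diag) - P) == 0
```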
For T ∈ A⊗B⊗C, define
(2.3.1)
GT := {g = (g1 , g2 , g3 ) ∈ GL(A)×GL(B)×GL(C) | g·T = T } ⊂ GL(A)×GL(B)×GL(C),
called the stabilizer of T .
Thanks to Exercises 2.1.3.2 and 2.3.2.5, we conclude
Proposition 2.3.5.1. Matrix multiplication MhU,V,W i is invariant under
GL(U ) × GL(V ) × GL(W ). In other words,
GL(U )×GL(V )×GL(W ) ⊆ GMhU,V,W i ⊂ GL(U ∗ ⊗V )×GL(V ∗ ⊗W )×GL(W ∗ ⊗U ).
Exercise 2.3.5.2: Let U = V = W so we are dealing with multiplication
of square matrices. Show that Mhni is also invariant under the group Z3 of
cyclic permutations of the three matrices. Since trace X T = trace X show
that there is also a Z2 -invariance, but note that this Z2 is not contained in
the S3 permuting the factors.
We conclude there is a map $\rho: GL_n^{\times 3} \rtimes (\mathbb{Z}_3 \ltimes \mathbb{Z}_2) \to G_{M_{\langle n\rangle}}$, where, recalling $\mathbf{a} = \mathbf{b} = \mathbf{c} = n^2$, $G_{M_{\langle n\rangle}} \subset GL(A) \times GL(B) \times GL(C)$ denotes the subgroup preserving the tensor $M_{\langle n\rangle}$.
2.4. Definitions and examples from algebraic geometry
Standard references for this material are [Har95, Mum95, Sha94]. The
first is very good for examples, the second and third have clean proofs, with
the proofs in the second more concise and those in the third more elementary.
2.4.1. Varieties. For our purposes, an algebraic variety X̂ ⊂ V is defined
to be the common zero set of a collection of homogeneous polynomials on the
vector space V . Since we only deal with homogeneous polynomials, the zero
set will be invariant under re-scaling. For this, and other reasons, it will be
convenient to work in projective space PV := (V \0)/ ∼ where v ∼ w if and
only if v = λw for some λ ∈ C\0. Write π : V \0 → PV for the projection
2.4. Definitions and examples from algebraic geometry
25
map. For X ⊂ PV , write π −1 (X) ∪ {0} =: X̂ ⊂ V , and π(y) = [y]. If X̂ ⊂ V
is a variety, I will also refer to X ⊂ PV as a variety. For subsets Z ⊂ V , I’ll
write PZ ⊂ PV for its image under π.
d
Let Sym(V ) := ⊕∞
d=0 S V and note that Sym(V ) is an algebra under
multiplication of polynomials. The ideal of X is defined to be
I(X) := {P ∈ Sym(V ∗ ) | P (x) = 0 ∀[x] ∈ X}.
Note that I(X) is indeed an ideal in the algebra Sym(V ∗ ). Write C[X̂] :=
Sym(V ∗ )/I(X) and note that C[X̂] is a ring, called the ring of regular
functions on X̂.
A variety X is said to be irreducible if it is not possible to nontrivially
write X = Y ∪ Z with Y, Z varieties. If P ∈ S d V ∗ is an irreducible polynomial, then Z(P ) ⊂ PV is an irreducible variety, called a hypersurface of
degree d.
We will be mostly concerned with varieties in spaces of tensors (for the
study of matrix multiplication) and spaces of polynomials (for geometric
complexity theory).
2.4.2. First examples of varieties.
(1) Projective space PV ⊆ PV .
(2) The Segre variety of rank one tensors
σ1 = Seg(PA1 ×· · ·×PAn ) := P{T ∈ A1 ⊗ · · · ⊗ An | ∃aj ∈ Aj such that T = a1 ⊗ · · · ⊗ an } ⊂ P(A1 ⊗ · · ·
(3) The Veronese variety
vd (PV ) = P{P ∈ S d V | P = xd for some x ∈ V } ⊂ PS d V.
(4) The Grassmannian
G(k, V ) := P{T ∈ Λk V | ∃v1 , . . . , vk ∈ V such that T = v1 ∧· · ·∧vk } ⊂ PΛk V.
(5) The Chow variety
Chd (V ) := P{P ∈ S d V | ∃v1 , . . . , vd ∈ V such that P = v1 · · · vd } ⊂ PS d V.
The Grassmannian admits the geometric interpretation as the space
parametrizing the k-planes through the origin in V via the correspondence
[v1 ∧ · · · ∧ vk ] ↔ span{v1 , . . . , vk }.
By definition, projective space is a variety (the zero set of no equations).
Here is a proof that the Segre is a variety: Recall the two factor Segre
Seg(PA × PB) is a variety defined by 2 × 2 minors, for T ∈ A⊗B, of the
linear map TA : A∗ → B. Now given T ∈ A1 ⊗ · · · ⊗ An , consider the linear
maps Tj : A∗j → A1 ⊗ · · · ⊗ Aj−1 ⊗Aj+1 ⊗ · · · ⊗ An . If T = a1 ⊗ · · · ⊗ an ,
then Tj (αj ) = αj (aj )a1 ⊗ · · · ⊗ aj−1 ⊗aj+1 ⊗ · · · ⊗ an so the size two minors
are zero. It remains to show that if all these maps have rank one, then
26
2. Lower bounds for matrix multiplication
[T ] ∈ Seg(PA1 ⊗ · · · ⊗ PAn ). Since rankT1 = 1, by our result on linear
maps, T is of the form a1 ⊗X1 for some a1 ∈ A1 and X1 ∈ A2 ⊗ · · · ⊗ An ,
similarly (using the re-ordering isomorphism), for each j, T = aj ⊗Xj for
aj ∈ Aj and Xj ∈ A1 ⊗ · · · ⊗ Aj−1 ⊗Aj+1 ⊗ · · · ⊗ An . From this we conclude
T = ca1 ⊗ · · · ⊗ an for some constant c, i.e. [T ] ∈ Seg(PA1 ⊗ · · · ⊗ PAn ).
Exercise 2.4.2.1: Show that the Veronese variety and Grassmanian are
varieties. Explicitly:
(1) Show that the Veronese is the zero set of the 2 × 2 minors of the
linear map, for P ∈ S d V , P1,d−1 : V ∗ → S d−1 V , where the map is
contraction. (V ∗ may also be thought of as the space of first order
homogeneous linear differential operators on S d V .)
(2) The Grassmannian is the zero set of equations parametrized by
k
Λk−2j V ∗ ⊗Λk+2j V ∗ for 1 ≤ j ≤ min{b v−k
2 c, b 2 c} as follows: for
k−2j
∗
k+2j
∗
Y ∈Λ
V and Z ∈ Λ
V , recalling Definition 2.3.4.3, define
PY ⊗Z (T ) := (T Z)(Y T ), the evaluation of an element of Λ2j V ∗
on an element of Λ2j V . Note that these are quadratic equations
in the coefficients of T . Show that these equations indeed vanish
on the Grassmannian. We will prove later, using representation
theory, that they span the ideal of the Grassmannian in degree
two (in fact they generate the ideal), and moreover that there are
redundancies in each space, but the equations from different spaces
are independent of each other. }
The Chow variety is shown to be a variety in Exercise 2.4.3.1.
Exercise 2.4.2.2: Show that vd (PV ) = Seg(PV × · · · × PV ) ∩ PS d V .
Exercise 2.4.2.3: Let Xj ⊂ PAj be varieties. Show that
Seg(X1 ×· · ·×Xn ) := {[a1 ⊗ · · · ⊗ an ] | [aj ] ∈ Xj } ⊂ Seg(PA1 ⊗ · · · ⊗ PAn ) ⊂ P(A1 ⊗ · · · ⊗ An )
is a variety. When Aj = A and Xj = X, I will write X ×n = Seg(X×· · ·×X).
}
Example 2.4.2.4. Define the subspace variety
Ŝuba0 ,b0 ,c0 (A⊗B⊗C) := {T ∈ A⊗B⊗C |∃A0 ⊂ A, dim A0 = a0 , B 0 ⊂ B, dim B 0 = b0 ,
C 0 ⊂ C, dim C 0 = c0 , such that T ∈ A0 ⊗B 0 ⊗C 0 }
To see that Suba0 ,b0 ,c0 (A⊗B⊗C) is a variety, note that the existence of such
an A0 is exactly that the map TA : A∗ → B⊗C has rank at most a0 and
similarly for the other factors.
We may rephrase the statement that R(Mhni ) ≥ n2 as combining the observations [Mhni ] 6∈ Subn2 −1,n2 ,n2 (A⊗B⊗C) and σn2 −1 ⊂ Subn2 −1,n2 ,n2 (A⊗B⊗C)
2.4. Definitions and examples from algebraic geometry
27
(in fact σn2 −1 ⊂ Subn2 −1,n2 −1,n2 −1 (A⊗B⊗C)). This perspective illustrates
a general strategy:
Find varieties containing σr whose equations are easy to describe, and
then show Mhni (or whatever tensor we are trying to prove lower bounds for)
is not a point of the intersection of the varieties.
When trying to prove upper bounds, the goal is to find enough such
varieties that their intersection is exactly σr .
Exercise 2.4.2.5: Define
Ŝubr (S d V ) := {P ∈ S d V | ∃V 0 ⊂ V, dim V 0 = r, such that P ∈ S d V 0 }.
Show Subr (S d V ) is a variety.
In the cases of the Veronese, Segre and Grassmannian, the equations
are the minors of some matrix and are naturally expressed as such. Most
varieties do not admit such natural determinental equations.
2.4.3. Basic facts about varieties. If X ⊂ PW is a variety, L ⊂ W
is a subspace with PL ∩ X = ∅, and one considers the projection map
p : W → W/L, then Pp(X̂) ⊂ PW/L is also a variety. This is part of
the Noether normalization theorem (see, e.g., [Sha94, §5.4] or [Mum95,
§2C]). It is proved via elimination theory. This projection property does not
hold when one works over R, or in affine space: The surface in C3 given by
xz − y = 0, when projected to the x − y plane is the union of the open set
{(x, y) | x 6= 0} and the point (0, 0). Similarly, the surface in RP3 given by
x2 + z 2 − y 2 = 0 when projected from [1, 0, 0] is not a real algebraic variety.
Exercise 2.4.3.1: Show that if W = V ⊗d and L is the GL(V )-complement
to S d V in V ⊗d , taking p : V ⊗d → V ⊗d /L ' S d V , then p(Seg(PV × · · · ×
PV )) = Chd (V ). Conclude that the Chow variety is indeed a variety.
Equations for the Chow are known, see §7.3.3. However generators of
the ideal of the Chow variety are not known explicitly.
2.4.4. Dimension. Informally, the dimension of a variety is the number
of parameters needed to describe it locally. For example, the dimension of
PV is v − 1 because in coordinates on the open neighborhood where x1 6= 0,
points of PV have a unique expression as [1, x2 , . . . , xv ], where x2 , . . . , xv
are free parameters.
Two definitions of dimension are as follows:
First, the codimension of X ⊂ PV is the unique non-negative integer
c such that there exists some Pc−1 ⊂ PV with X ∩ Pc−1 = ∅, but for all
Pc ⊂ PV one has X ∩ Pc 6= ∅ (see, e.g. [Har95, §11]), and the dimension
28
2. Lower bounds for matrix multiplication
is dim(X) = dim PV − c. In this case proj : X → P(V /Cc ) can be defined, is surjective, and finite to one. This definition illustrates a remarkable
property of projective space: that two projective varieties of complementary
dimension must intersect. One defines the degree of X to be the number
of points of intersection of X with a general Pc , equivalently the number of
points in the inverse image of a general point of proj(X).
Second, the dimension of an irreducible variety X̂ ⊂ V is the dimension
of the tangent space at a smooth point of X̂: define T̂x X̂ = T̂[x] X ⊂ V to
be the span of the tangent vectors x0 (0) to curves x(t) on X̂ with x(0) = x.
A point x ∈ X̂ is defined to be a smooth point if dim T̂y X̂ is constant for all
y in some neighborhood of x. If x is a smooth point, dim X = dim X̂ − 1 =
dim T̂x X̂ − 1.
If a Zariski open subset of a variety is given parametrically, then one
can calculate the tangent space to the variety via the parameter space. For
example Ŝeg(PA × PB × BP C) may be thought of as the image of the map
A × B × C → A⊗B⊗C
(a, b, c) 7→ a⊗b⊗c,
so to compute T̂[a⊗b⊗c] Seg(PA × PB × PC), take curves a(t) ⊂ A with
d
|t=0 a(t)⊗b(t)⊗c(t) = a0 ⊗b⊗c +
a(0) = a and similarly for B, C, then dt
0
0
a⊗b ⊗c + a⊗b⊗c by the Leibnitz rule. Since a0 can be any vector in A and
similarly for b0 , c0 we conclude
T̂[a⊗b⊗c] Seg(PA × PB × PC) = A⊗b⊗c + a⊗B⊗c + a⊗b⊗C.
The right hand side spans a space of dimension a+b+c−2 so dim(Seg(PA×
PB × PC)) = a + b + c − 3.
2.4.5. Zariski and Euclidean closure. Recall from §zarcldef that the
Zariski closure of a set can be larger than the Euclidean closure. Nevertheless, the following theorem, proved using Noether normalization, shows that
in our situation, the competing definitions of closure agree:
Theorem 2.4.5.1. Let Z ⊂ PV be a subset. Then the Euclidean closure of
Z is contained in the Zariski closure of Z. If Z is a Zariski open subset of its
Zariski closure and the closure is irreducible, then the two closures coincide.
The same assertions hold for subsets Z ⊂ V .
For the proof, see, e.g. [Mum95, Thm. 2.33]. Here is a sketch: The first
assertion is easy, as if xj is a sequence of points converging to x0 such that
P (xj ) = 0, then P (x0 ) = 0 because polynomials are continuous functions.
For the second, let X n ⊂ PV be a variety of dimension n and let X0 ⊂ X
be a Zariski open subset whose Zariski closure is X. Let p ∈ X\X0 , we need
to construct a sequence of points in X0 limiting to p. Project X onto a Pn ,
2.4. Definitions and examples from algebraic geometry
29
call the projection π. The image of X\X0 will be a proper subvariety of
Pn . Let f be a nonzero equation in its ideal. Take a point a ∈ Pn such that
f (a) 6= 0. Now restrict f to the P1 spanned by a and π(p). It will have a
finite number of zeros on this line, so one can take a sequence of points aj
on this line, starting at a, limiting to π(p) that avoids the zeros of f . Finally
there exists a choice of points xj on X0 that project to the aj , which limit
to p.
**picture to be added here***
Definition 2.4.5.2. A general point of a variety Z is a point not lying on
some explicit Zariski closed subset of Z. This subset is often understood
from the context and so not mentioned.
2.4.6. A geometric description of the ideal of a variety. For a linear
subspace U ⊂ V , define its annhilator
U ⊥ := {α ∈ V ∗ | α(u) = 0 ∀u ∈ U } ⊂ V ∗ .
Exercise 2.4.6.1: Show that (U ⊥ )⊥ = U .
Proposition 2.4.6.2. Let X ⊂ PV be a variety. Then Id (X) = (span(v̂d (X)))⊥ .
Proof. We have the chain of equalities:
P ∈ Id (X) ⇔ P (x) = 0 ∀x ∈ X̂
⇔ P (x, . . . , x) = 0 ∀x ∈ X̂
⇔ hP , xd i = 0 ∀x ∈ X̂
⇔ P ∈ (x̂d )⊥ ∀x ∈ X̂
where the brackets in the third line represent the pairing of a vector in a
vector space and a vector in the dual space. The intersection of all the
hyperplanes (x̂d )⊥ is (span(v̂d (X)))⊥ .
2.4.7. Secant Varieties. In order to better study σr , which governs the
complexity of Mhni , it will be useful to place the study in the broader context
of secant varieties, an extensively studied class of varieties.
Given a variety X ⊂ PV , define the X-rank of p ∈ V , RX (p), to be the
smallest r such that there exist x1 , . . . , xr ∈ X̂ such that p is in the span of
of x1 , . . . , xr , and the X-border rank RX (p) is defined to be the smallest r
such that there exist curves x1 (t), . . . , xr (t) ∈ X̂ such that p is in the span of
the limiting plane limt→0 hx1 (t), . . . , xr (t)i, where hx1 (t), . . . , xr (t)i is viewed
as a curve the Grassmannian. I will use the same terms and notation for
points in projective space.
30
2. Lower bounds for matrix multiplication
Let σr (X) ⊂ PV denote the set of points of X-border rank at most r,
called the r-th secant variety of X. (Theorem 2.4.5.1 assures us that σr (X)
is indeed a variety.) When X = σ1 = Seg(PA1 × · · · × PAn ) is the set of
rank one tensors, σr (X) = σr .
2.4.8. Glynn’s algorithms for the permanent. **Glynn’s theorem to
go here***
2.4.9. Homogeneous varieties, orbit closures, and G-varieties. The
Segre, Veronese and Grassmannian are examples of homogeneous varieties:
Definition 2.4.9.1. A subvariety X ⊂ PV , is homogeneous if it is a closed
orbit of some point x ∈ PV under the action of some group G ⊂ GL(V ). If
Gx ⊂ G is the subgroup fixing x, we write X = G/Gx .
Most orbits are not varieties, so one takes their orbit closures to obtain
a variety. When d ≤ v the Chow variety is an orbit closure, namely, if
x1 , . . . , xv is a basis of V ,
Chd (V ) = GL(V ) · [x1 · · · xd ] ⊂ PS d V.
Exercise 2.4.9.2: Show that Chd+1 (Cd ) is also an orbit closure.
When r ≤ ai for 1 ≤ i ≤ n, σr (Seg(PA1 × · · · × PAn )) is an orbit closure.
α
Namely, letting aj j , 1 ≤ αj ≤ aj , be a basis of Aj ,
σr (Seg(PA1 ×· · ·×PAn )) = GL(A1 ) × · · · × GL(An ) · [a11 ⊗ · · · ⊗ a1n + · · · + ar1 ⊗ · · · ⊗ arn ] ⊂ P(A1 ⊗ · · · ⊗
In particular,
(2.4.1)
⊕r
σr (Seg(Pr−1 × Pr−1 × Pr−1 )) = GLr × GLr × GLr · [Mh1i
].
2.4.10. Lie algebras. In much of mathematics it is convenient to work
infinitesimally to linearize problems. The same is true when studying group
actions. This linearization of the group action is via its Lie algebra.
Let gl(V ) := (End(V ), [·, ·]) be the endomorphisms of V endowed with
the product [X, Y ] := XY − Y X. With this structure, gl(V ) is called the
Lie algebra of GL(V ). Define an action of gl(V ) on V ⊗d by
X.(v1 ⊗ · · · ⊗ vd ) := (Xv1 )⊗v2 ⊗ · · · ⊗ vd +v1 ⊗(Xv2 )⊗ · · · ⊗ vd +· · ·+v1 ⊗ · · · ⊗ vd−1 ⊗(Xvd ).
This is an infinitesimal version of the action of GL(V ) on V ⊗d : let g(t) ⊂
d
GL(V ) be a curve with g(0) = Id and g 0 (0) = X. Then dt
|t=0 g(t) ·
v1 ⊗ · · · ⊗ vd = X.(v1 ⊗ · · · ⊗ vd ). If G ⊂ GL(V ) is a subgroup, and we restrict to curves in G, the resulting subset of gl(V ) obtained is a sub-algebra,
called the Lie algebra of G, and is denoted g. There is a 1−1 correspondence
between GL(V )-modules in V ⊗d and gl(V )-modules.
2.4. Definitions and examples from algebraic geometry
31
In particular, the action of ⊕j gl(Aj ) on A1 ⊗ · · · ⊗ An is
(X1 ⊕ · · · ⊕ Xn ).a1 ⊗ · · · ⊗ an =(X1 a1 )⊗a2 ⊗ · · · ⊗ an + a1 ⊗(X2 a2 )⊗ · · · ⊗ an +
· · · + a1 ⊗ · · · ⊗ an−1 ⊗(Xn an ).
Exercise 2.4.10.1: Show that sl(U ) defined in Proposition 2.3.2.4 is the
Lie algebra of SL(U ) := {g ∈ GL(U ) | det(g) = 1}.
2.4.11. Tangent spaces to orbit closures. If X = G · [v] ⊂ PV is an
orbit closure, we can compute the tangent space T̂v X̂ ⊂ V using the Lie
algebra:
T̂v X̂ = g.v,
as these are the points
d
dt |t=0 g(t)
· v for curves g(t) ⊂ G with g(0) = Id.
Exercise 2.4.11.1: Compute T̂e1 ⊗f1 Ŝeg(PV ×PW ), T̂ed v̂d (PV ), and T̂e1 ∧···∧ek Ĝ(k, V )
1
by this method.
If X = G · [v] is an orbit closure, let gv := {Z ∈ g | Z.v = 0} ⊂ g and
g[v] := {Z ∈ g | Z.v ≡ 0 mod v} ⊂ g, respectively the Lie algebras of the
stabilizers of v and [v]. (Here x ≡ 0 mod v means x = λv for some λ ∈ C.)
Exercise 2.4.11.2: Show that gv and g[v] are indeed the Lie algebras of Gv
and G[v] .
Exercise 2.4.11.3: Show that dim X = dim g − dim g[v] .
2.4.12. Border rank and orbit closures. This section will be important
for our study of geometric complexity theory.
Proposition 2.4.12.1. If T 0 ∈ GL(A) × GL(B) × GL(C) · T ⊂ A⊗B⊗C,
then R(T 0 ) ≤ R(T ).
Exercise 2.4.12.2: Prove Proposition 2.4.12.1. }
Consider the orbit closure of the matrix multiplication tensor
GL(A) × GL(B) × GL(C) · [MhU,V,W i ] ⊂ P(A⊗B⊗C).
Exercise 2.4.12.3: Let V be a G-module and let v, w ∈ V . Show that
w ∈ G · v if and only if G · w ⊆ G · v.
By Exercise 2.4.12.3, we may rephrase our characterization of border
rank as, taking inclusions A, B, C ⊂ Cr ,
R(Mhni ) > r ⇔ [Mhni ] 6∈ σr (Seg(PA × PB × PC))
⇔ GLr × GLr × GLr · [Mhni ] 6⊂ σr (Seg(Pr−1 × Pr−1 × Pr−1 ))
⊕r
]
⇔ GLr × GLr × GLr · [Mhni ] 6⊂ GLr × GLr × GLr · [Mh1i
32
2. Lower bounds for matrix multiplication
The last equality indicates that this problem may be viewed as a “toy” case
of GCT.
Although it is not immediately clear why the following containment is
useful, we will see in §3.4 that it is an important step in implementing
Strassen’s “laser” method to prove upper bounds on the exponent of matrix
multiplication:
Theorem 2.4.12.4 (Strassen [Str87]). Set r = b 34 n2 c and choose a linear
2
embedding Cr ⊂ Cn . Then
σr (Seg(Pr−1 × Pr−1 × Pr−1 )) ⊂ GLn2 × GLn2 × GLn2 · [Mhni ],
i.e.,
⊕r
] ⊂ GLn2 × GLn2 × GLn2 · [Mhni ].
GLr × GLr × GLr · [Mh1i
Proof. Write xij for a basis of A etc. Then, setting h = d 3n
2 e + 1, we have
X
i+j+k=h
n
1 X 1 (i2 +j 2 )+2ij+( h −i−j)h+r
3
(t 2
xij )
t→0 t3r
xij ⊗yjk ⊗zki = lim
i,j,k=1
⊗(t
1
(k2 +j 2 )+2kj+( h
−k−j)h+r
2
3
1
yjk )⊗(t 2 (i
2 +k 2 )+2ik+( h −i−k)h+r
3
because the exponent of t in each term in the summation is
i2 + j 2 + k 2 + 2(ij + jk + ik) + h2 − 2(i + j + k)h + 3r = (i + j + k − h)2 + 3r.
and dividing by t3r we obtain an exponent of (i + j + k − h)2 which is zero
exactly when i + j + k = h and is positive otherwise. Since any two indices
in a zero-th order term uniquely determine the third, each variable appears
in at most one term, and there are b 34 n2 c such terms. Now just relabel. Exercise 2.4.12.5: Verify the assertions that the exponents of t are all
non-negative and that there are b 43 n2 c choices of triples.
Remark 2.4.12.6. Theorem 2.4.12.4 may be interpreted as saying that one
can degenerate Mhni to a tensor that computes b 43 n2 c independent scalar
multiplications. If we have any tensor realized as Mhni ⊗T , the same degen⊕r
eration procedure works to degenerate it to Mh1i
⊗T , which is the key to
Strassen’s “laser method”.
2.4.13. G-varieties. Secant varieties of the Segre variety are not orbit closures in general, but they are invariant under the action of G := GL(A1 ) ×
· · · × GL(An ) ⊂ GL(A1 ⊗ · · · ⊗ An ), in the sense that if x ∈ σr (Seg(PA1 ×
· · · × PAn )) and g ∈ G, then g · x ∈ σr (Seg(PA1 × · · · × PAn )).
Definition 2.4.13.1. A variety invariant under the action of a group is
called a G-variety.
zki )
2.4. Definitions and examples from algebraic geometry
33
An elementary, but important observation is:
If X ⊂ PV is a G-variety, then I(X) is a G-submodule of Sym(V ∗ ).
Exercise 2.4.13.2: Prove the observation. }
Thus one can use representation theory to find equations for a G-variety
X ⊂ PV by decomposing Sym(V ∗ ) as a G-module. Consider the example of X = Seg(PA∗ × PB ∗ ) ⊂ P(A⊗B)∗ , the variety of rank one linear
maps. By Exercise 2.3.5.7, we have S 2 (A⊗B) = (S 2 A⊗S 2 B)⊕(Λ2 A⊗Λ2 B).
Given [a⊗b] ∈ Seg(PA × PB), we have v2 ([a⊗b]) = [a2 ⊗b2 ]. Using Exercise
2.4.6.2, we recover that as a GL(A)×GL(B)-module, I2 (Seg(PA∗ ×PB ∗ )) =
Λ2 A⊗Λ2 B.
Exercise 2.4.13.3: Determine I2 (Seg(PA×PB×PC)) as a GL(A)×GL(B)×
GL(C)-module. }
In order to make use of representation theory in the study of matrix
multiplication, it will be useful to have information about the decomposition
of S d (A⊗B⊗C) as a GL(A)×GL(B)×GL(C)-module. I postpone discussion
of this decomposition until §5.2.
2.4.14. Dimensions of secant varieties. Let X ⊂ PV be a variety. A
naı̈ve parameter count gives that if dim X = n, one expects dim σr (X) =
min{rn + r − 1, v − 1}, because to locate a point on σr (X), one gets to pick r
points on x and then a point on the Pr−1 that they span. Call this number
the expected dimension of σr (X). An efficient way to compute the actual
dimension is to use local differential geometry: Let ([x1 ], . . . , [xr ]) ∈ X ×r
be a general point in the sense that the next statement fails for a Zariski
closed subset of X ×r . If p = x1 + · · · + xr , then the dimension of σr (X)
is its expected dimension minus the “number of ways” one can move the
xi within X such that their span still contains p. Terracini’s lemma is an
infinitesimal version of this remark.
Consider the map
add : V × · · · × V → V
(v1 , . . . , vr ) 7→ v1 + · · · + vr .
When restricted to X̂ ×r , the image is σ̂r0 (X). In particular any curve w(t)
in σ̂r0 (X) may be written in the form w(t) = v1 (t) + · · · + vr (t) for some
curves vj (t) ⊂ X̂. Differentiating at t = 0, we conclude
Lemma 2.4.14.1 (Terracini’s lemma).
(2.4.2)
T̂[v1 +···+vr ] σr (X) ⊆ T̂[v1 ] X + · · · + T̂[vr ] X.
Equality holds for a Zariski open subset of points ([v1 ], . . . , [vr ]) ∈ X ×r .
34
2. Lower bounds for matrix multiplication
If the ambient space is large enough, usually the tangent spaces will
only intersect at the origin and we recover the computation of the expected
dimension.
Corollary 2.4.14.2. If (x1 , . . . , xr ) ∈ (X ×r )general , then dim σr (X) = dim(T̂x1 X+
· · · + T̂xr X) − 1.
Exercise 2.4.14.3: Show that the rank of a general element in A1 ⊗ · · · ⊗ An
n
is at least P aa1 ···a
.
j −n+1
j
Example 2.4.14.4. Let X = Seg(PA × PB × PC). Write p = a1 ⊗b1 ⊗c1 +
a2 ⊗b2 ⊗c2 for a general point of σ̂2 (X), where a1 , a2 etc. are linearly independent. Then
T̂[p] σ2 (X) = T̂[a1 ⊗b1 ⊗c1 ] X + T̂[a2 ⊗b2 ⊗c2 ] X
=(a1 ⊗b1 ⊗C + a1 ⊗B⊗c1 + A⊗b1 ⊗c1 )
+ (a2 ⊗b2 ⊗C + a2 ⊗B⊗c2 + A⊗b2 ⊗c2 )
and it is easy to see that T̂[a1 ⊗b1 ⊗c1 ] X and T̂[a2 ⊗b2 ⊗c2 ] X intersect only at the
origin. Thus dim σ2 (Seg(PA × PB × PC)) = 2 dim Seg(PA × PB × PC) + 1.
With a little more work, one can show the following, which would have
anticipated Strassen’s algorithm:
Proposition 2.4.14.5. σ7 (Seg(P3 × P3 × P3 )) = P63 , so any tensor T ∈
C4 ⊗C4 ⊗C4 is either the sum of at most seven rank one tensors, or a limit
of sums of seven rank one tensors.
2.5. Strassen’s equations and generalizations
This section gives a history of, and presents the state of the art for, lower
bounds for R(Mhni ).
2.5.1. Beyond the classical equations. Recall the equations for σa−1 (Seg(PA×
PB × PC)) from §2.2.4, which were, for T ∈ A⊗B⊗C, the maximal minors
of the linear map TA : A∗ → B⊗C (I suppress reference to the obvious
exchanges of A, B, C in what follows). To extract more information, let’s
examine the image of this map. Assume b = c so the image is a space of linear maps Cb → Cb . (If b < c, just restrict to some Cb ⊂ C.) If R(T ) = b,
then TA (A∗ ) = T (A∗ ) will be spanned by b rank one linear maps.
Exercise 2.5.1.1: Show, assuming a = b = c and TA is injective, that
R(T ) = b if and only if T (A∗ ) is spanned by b rank one linear maps.
2.5. Strassen’s equations and generalizations
35
How can we test if the image is spanned by b rank one linear maps? If
T = a1 ⊗b1 ⊗c1 + · · · aa ⊗ba ⊗ca with each set of vectors a basis, then



x1









x
2


∗
T (A ) = 
|
x
∈
C
,

j
..




.






xa
and this is the case for a general rank a tensor in Ca ⊗Ca ⊗Ca . That is,
TA (A∗ ) ⊂ B⊗C, when T has border rank a lies in the Zariski closure of the
subspaces that, under the action of GL(B) × GL(C) are diagonalizable. So
we have the problem: determine polynomials on B⊗C that vanish of the set
of diagonalizable subspaces.
A set of equations whose zero set is exactly the Zariski closure of the
diagonalizable matrices is not known! What follows are some equations.
Recall that B⊗C = Hom(B ∗ , C). If instead we had Hom(C, C), a necessary
condition for endomorphisms to be simultaneously diagonalizable is that
they must commute, and the algebraic test for a subspace U ⊂ Hom(C, C)
to be abelian is simple: the commutators [Xi , Xj ] := Xi Xj − Xj Xi must
vanish on a basis X1 , . . . , Xu of U . (Note that commutators only make
sense for maps from a vector space to itself.) These degree two equations
exactly characterize abelian subspaces. We do not have maps from a vector
space to itself, but we can fix the situation if there exists α ∈ A∗ such that
TA (α) : B ∗ → C is invertible, as then we could test if the commutators
[TA (α1 )TA (α)−1 , TA (α2 )TA (α)−1 ] are zero. So we now have a test, but it is
not expressed in terms of polynomials on A⊗B⊗C, and we cannot apply it
to all tensors.
To fix these problems, we need a substitute for the inverse of a linear
map. Recall that the inverse of a b × b invertible matrix X is X −1 =
1
T
det(X) ∆X , where ∆X is the cofactor matrix of X, whose entries are the
minors of size b − 1 of X. In bases,
[TA (α1 )TA (α)−1 , TA (α2 )TA (α)−1 ] = [TA (α1 )
1
1
∆TTA (α) , TA (α2 )
∆T
]
det(TA (α))
det(TA (α)) TA (α)
and, as long as det(TA (α)) 6= 0 (i.e., as long as the expression makes sense),
we can multiply the expression by det(TA (α))2 to obtain the equations
[T (α1 )∆TT (α) , T (α2 )∆TT (α) ] = 0.
The equations will not be robust if T (α) is not of full rank.
Here is a coordinate free description of the cofactor matrix which will
facilitate generalizations, as well as determining redundancies in equations.
Recall from Exercise 2.3.5(6) that a linear map f : V → W induces linear
36
2. Lower bounds for matrix multiplication
maps f ∧k : Λk V → Λk W . Assume v = w. Fixing a volume form on V , i.e.,
a nonzero element of Ω ∈ Λv V ∗ , we may identify Λv−1 V ' V ∗ by the map
V ∗ → Λv−1 V
α 7→ α Ω.
Fixing volume elements and the corresponding identifications, consider
f ∧v−1 : V ∗ → W ∗ , and taking transpose, (f ∧v−1 )T : W → V .
Exercise 2.5.1.2: What is the relationship between f ∧v−1 and f −1 when
f is invertible? What happens when det(f ) = 0?
2.5.2. Strassen’s test as polynomials. Combining the discussions of the
previous two subsections, we obtain:
Proposition 2.5.2.1. Let T ∈ A⊗B⊗C and assume b = c. Then R(T ) ≤
b implies that for all α, α1 , α2 ∈ A∗ , the linear map [(T (α)∧b−1 )T T (α1 ), (T (α)∧b−1 )T T (α2 )]
is zero.
To see the degree, these equations are of degree one in the entries of
the commutator, the entries of T (αj ) are linear in the entries of T , and
the entries of (T (α)∧b−1 )T are of degree b − 1 in the entries of T , so the
polynomials are of degree 2b. We can lower the degree by observing that
[(T (α)∧b−1 )T T (α1 ),(T (α)∧b−1 )T T (α2 )]
= (T (α)∧b−1 )T T (α1 )(T (α)∧b−1 )T T (α2 ) − (T (α)∧b−1 )T T (α2 )(T (α)∧b−1 )T T (α1 )
so we can work instead with
T (α1 )(T (α)∧b−1 )T T (α2 ) − T (α2 )(T (α)∧b−1 )T T (α1 ),
dropping the degree to b + 1.
2.5.3. Strassen’s equations. If T ∈ A⊗B⊗C is “close to” having rank
a = b = c, one expects, using α with T (α) invertible, that T (A∗ )T (α)−1
will be “close to” being abelian. The following theorem makes this precise:
Theorem 2.5.3.1 (Strassen). [Str83] Let T ∈ A⊗B⊗C and assume b = c.
Then R(T ) ≤ r implies that for all α, α1 , α2 ∈ A∗ , the size 2(r − b) + 1
minors of the linear map
(2.5.1)
ST Rα,α1 ,α2 (T ) := T (α1 )(T (α)∧b−1 )T T (α2 ) − T (α2 )(T (α)∧b−1 )T T (α1 )
are all zero. In particular
1
R(T ) ≥ rank(ST Rα,α1 ,α2 (T )) + b.
2
I prove Theorem 2.5.3.1 for the case of the maximal minors in §2.5.4
below and in general in §2.7.2.
2.5. Strassen’s equations and generalizations
37
We now have potential tests for border rank for tensors in CN ⊗CN ⊗CN
up to r = 23 N .
Strassen uses Theorem 2.5.3.1 to show that R(Mhni ) ≥ 32 n2 :
Exercise 2.5.3.2: Prove R(Mhni ) ≥ 23 n2 . }
A proof that R(Mhni ) ≥
§2.5.5.
3 2
2n
using representation theory is given in
A natural question arises: we actually have three sets of such equations are the three sets of equations the same or different? We should have already
asked this question for the three types of usual flattenings: are the equations
coming from the minors of TA , TB , TC the same or different? Representation
theory will enable us to answer these questions.
Remark 2.5.3.3. One can generalize Strassen’s equations by taking higher
order commutators, see [LM08]. These generalizations do give new equations, but they do not give equations for border rank beyond the 23 b of
Strassen’s equations.
2.5.4. Reformulation and proof of Strassen’s equations. Let dim A =
3 and dim B = dim C = b. We augment the linear map TB : B ∗ → A⊗C
by tensoring it with IdA , to get a linear map
TB ⊗IdA : B ∗ ⊗A → A⊗A⊗C.
So far we have done nothing interesting, but the target of this map decomposes as (Λ2 A⊗C)⊕(S 2 A⊗C), and we may project onto these factors. Write
the projections as:
(2.5.2)
∧
◦
TBA
= TA∧ : A⊗B ∗ → Λ2 A⊗C and TBA
: A⊗B ∗ → S 2 A⊗C.
Exercise 2.5.4.1: Show that if T = a⊗b⊗c is a rank one tensor, then
◦ ) = 3.
rank(TA∧ ) = 2 and rank(TBA
Exercise 2.5.4.1 implies:
◦ ) ≤ 3r.
Proposition 2.5.4.2. If T has rank r, rank(TA∧ ) ≤ 2r and rank(TBA
Since the dimension of the sources are 3b, by this method one respectively gets potential equations for σr (Seg(P2 ×Pb−1 ×Pb−1 )) up to r = 32 b−1
and r = b − 1. Thus only the first gives interesting bounds. The first set is
Strassen’s equations, as I now show.
Remark 2.5.4.3. As presented, this derivation of the equations seems a bit
“rabbit out of the hat”. However it comes from a long tradition of finding
determinantal equations for algebraic varieties that is out of the scope of
this book. For the experts, given a variety X and a subvariety Y ⊂ X,
one way to find defining equations for Y is to find vector bundles E, F
38
2. Lower bounds for matrix multiplication
over X and a vector bundle map φ : E → F such that Y is realized as
the degeneracy locus of φ, that is, the set of points x ∈ X such that φx
drops rank. Strassen’s equations in the partially symmetric case had been
discovered by Barth in this context. Variants of Strassen’s equations date
back to Frahm-Toeplitz and Aronhold. See [Lan12, §3.8.5] for a discussion.
We will also see in §5.3.3 and §5.3.2 two different ways of deriving Strassen’s
equations via representation theory.
Let a1 , a2 , a3 be a basis of A, with dual basis α1 , α2 , α3 of A∗ so T ∈
A⊗B⊗C may be written as T = a1 ⊗T (α1 ) + a2 ⊗T (α2 ) + a3 ⊗T (α3 ). Then
TA∧1 will be expressed by a 3b × 3b matrix. Ordering the basis of A⊗B ∗ by
a3 ⊗β 1 , . . . , a3 ⊗β b , a2 ⊗β 1 , . . . , a2 ⊗β b , a1 ⊗β 1 , . . . , a1 ⊗β b , and that of Λ2 A⊗C
by (a1 ∧a2 )⊗c1 , . . . , (a1 ∧a2 )⊗cb , (a1 ∧a3 )⊗c1 , . . . , (a1 ∧a3 )⊗cb , (a2 ∧a3 )⊗c1 , . . . , (a2 ∧
a3 )⊗cb , we obtain the block matrix


0
T (α1 ) −T (α2 )
∧
0 
= T (α2 ) T (α3 )
(2.5.3)
TA∧ := TBA
1
T (α )
0
T (α3 )
Recall the following basic identity about determinants, assuming W is invertible:
X Y
(2.5.4)
det
= det(W ) det(X − Y W −1 Z).
Z W
Assume T (α3 ) is invertible and use the (b, 2b) × (b, 2b) blocking (so
X = 0 in (2.5.4)) to obtain
(2.5.5)
det M at(TA∧ ) = det(T (α1 )T (α3 )−1 T (α2 ) − T (α2 )T (α3 )−1 T (α1 ))
Equation (2.5.5) shows the new formulation is equivalent to the old, at least
in the case of maximal rank, and combined with Proposition 2.5.4.2, proves
Theorem 2.5.3.1.
2.5.5. Proof that R(Mhni ) ≥ 32 n2 using representation theory.
Lemma 2.5.5.1. Let TA∧ : B ∗ ⊗A → Λ2 A⊗C be injective, and let A0 ⊂
A∗ be a general subspace. Consider the tensor T 0 := T |A0 ⊗B ∗ ⊗C ∗ . Then
∗
0 ∗
2
0 ∗
(T 0 )∧
(A0 )∗ : B ⊗(A ) → Λ (A ) ⊗C is injective.
Proof. Without loss of generality, assume A0 is a hyperplane, choose a
complementary hyperplane à to (A0 )⊥ in A and identify it with (A0 )∗ . Consider (T 0 )∧
as the composition of the injection B ∗ ⊗Ã → Λ2 A⊗C with
Ã
the projection to Λ2 Ã⊗C. The image of the first map is contained in
(Λ2 Ã⊗C) ⊕ ((A0 )⊥ ⊗Ã⊗C). The kernel of the projection map is the intersection of the image with (A0 )⊥ ⊗Ã⊗C. If we modify à by a small multiple
2.5. Strassen’s equations and generalizations
39
of (A0 )⊥ , the image of the first map will no longer intersect (A0 )⊥ ⊗Ã⊗C
nontrivially.
2
Proof that R(Mhni ) ≥ 3n2 via representation theory. By Lemma 2.5.5.1
∗
2
it will be sufficient to show that (Mhni )∧
A : B ⊗A → Λ A⊗C is injective.
Recall that A = U ∗ ⊗V , B = V ∗ ⊗W , C = W ∗ ⊗U , so
∗
∗
2
∗
∗
(Mhni )∧
A : (V ⊗W )⊗(U ⊗V ) → Λ (U ⊗V )⊗(W ⊗U )
Recalling that MhU,V,W i = IdU ⊗IdV ⊗IdW we see the map factors as (MhU,V,C1 i )∧
A ⊗IdW ∗ ,
j
∗
where, letting {uj } be a basis of U and {u } the dual basis of U ,
∗
2
∗
(MhU,V,C1 i )∧
A : V ⊗(U ⊗V ) → Λ (U ⊗V )⊗U
X
v⊗(µ⊗v 0 ) 7→
(uj ⊗v) ∧ (µ⊗v 0 )⊗uj .
j
This is a GL(U ) × GL(V )-module map, so we can proceed by decomposing the source into irreducible representations, and just show that one
vector in each irreducible component is not in the kernel. By Exercise 2.3.5.7,
Λ2 (U ∗ ⊗V ) = (S 2 U ∗ ⊗Λ2 V ) ⊕ (Λ2 U ∗ ⊗S 2 V ), and V ⊗V = Λ2 V ⊕ S 2 V . The
source is thus (U ∗ ⊗S 2 V ) ⊕ (U ∗ ⊗Λ2 V ). Now just check that the vectors
v⊗(α⊗v) and v1 ⊗(α⊗v2 ) − v2 ⊗(α⊗v1 ) do not map to zero.
2.5.6. Koszul flattenings. The reformulation of Strassen’s equations suggests the following generalization: let dim A = 2p+1, let b = c and consider
(2.5.6)
TA∧p : B ∗ ⊗Λp A → Λp+1 A⊗C
obtained by first taking TB ⊗IdΛp A : B ∗ ⊗Λp A → Λp A⊗A⊗C, and then
projecting to Λp+1 A⊗C.
If ai , bj , ck are bases of A, B, C and T = tijk ai ⊗bj ⊗ck , where I use the
summation convention, then
(2.5.7)
TA∧p (β⊗f1 ∧ · · · ∧ fp ) = tijk β(bj )ai ∧ f1 ∧ · · · ∧ fp ⊗ck .
The map TA∧p is called a Koszul flattening. Note that if T = a⊗b⊗c has
p
rank one, then rank(TA∧p ) = 2p
p as the image is a ∧ Λ A⊗c. By linearity of
the map T 7→ TA∧p we conclude:
Proposition 2.5.6.1. [LO] Let T ∈ A⊗B⊗C. If rank(TA∧p ) ≥ r, then
r
R(T ) ≥ 2p .
p
Since the source (resp. target) has dimension 2p+1
b (resp. 2p+1
c),
p
p
assuming b ≤ c, we potentially obtain equations for σr (Seg(PA×PB ×PC))
40
2. Lower bounds for matrix multiplication
up to
2p+1
b
p
2p
p
r=
−1=
2p + 1
b − 1.
p+1
Just as with Strassen’s equations (case p = 1), if dim A > 2p + 1, one
obtains the best bound for these equations by restricting to subspaces of A∗
of dimension 2p + 1.
The skew-symmetrization map A⊗Λp A → Λp+1 A is a surjective GL(A)module map, so its kernel is a GL(A)-module generalizing S21 A that we saw
in §4.2.2. Its isomorphism class as a GL(A)-module is denoted S2,1p−1 A. In
§5.2 we will see that it is irreducible.
Thus the projection map, A⊗Λp A⊗C → Λp+1 A⊗C, since it is a GL(A)×
GL(C)-module map, has kernel isomorphic to S2,1p−1 A⊗C, so one way to determine the rank of TA∧p would be to compute the dimension of S2,1p−1 A⊗C ∩
(TB (B ∗ )⊗Λp A).
2.5.7. Exercises.
(1) Prove an analogue of Lemma 2.5.5.1 for Koszul flattenings.
(2) Consider the symmetric cousin of Strassen’s equations: for P ∈
S 3 V ⊂ V ⊗V ⊗V , first take P1,2 ⊗IdV : V ∗ ⊗V → S 2 V ⊗V , then
considering S 2 V ⊂ V ⊗V , skew symmetrize on the second and third
factors, to obtain a map P ∧ : V ∗ ⊗V → V ⊗Λ2 V . Now assume further that dim V = 3, so, after the choice of a volume element,
Λ2 V = V ∗ . Show that the resulting matrix in properly ordered
bases is skew-symmetric, and thus its determinant is zero. Moreover, one can show that all the Pfaffians of size 8 centered about
the diagonal are equivalent and give rise to a polynomial called
the Aronhold invariant. What bound does this equation give on
symmetric border rank? See [Lan12, §3.10.1] for more details.
(3) More generally, consider the symmetric cousins of Koszul flattenings: for P ∈ S d V , first take the map Pd−k,k : S d−k V ∗ → S k V ,
tensor it with IdΛp V , and project to get a map
∧p
Pd−k,k
: S d−k V ∗ ⊗Λp V → S k−1 V ⊗Λp+1 V.
∧p
If P = xd , what is the rank of Pd−k,k
? (For those familiar with the
rudiments of differential geometry, note that the map S k V ⊗Λp V →
S k−1 V ⊗Λp+1 V is just the exterior derivative.) We will see later
that S k V ⊗Λp V decomposes into the direct sum of two irreducible
∧p
modules, called Sk,1p V and Sk+1,1p−1 V , so Pd−k,k
can at best surject
onto Sk,1p V .
2.5. Strassen’s equations and generalizations
41
(4) For those handy with computers: determine the rank of (Mh3i )∧4
A .
What lower bound does this give on the border rank?
Answer for those not handy: 14, the same as Strassen.
(5) For those handy with computers: determine the rank of (Mh4i )∧7
A ?
What lower bound does this give on the border rank?
This beats Strassen by one. Lickteig’s bound [Lic84]: R(Mhni ) ≥
3n2
n
2 + 2 − 1 gives the same answer when n = 4.
(6) For those handy with computers: determine the rank of (Mh5i )∧12
A .
What lower bound does this give on the border rank?
This also ties Lickteig’s bound.
(7) For those handy with computers: determine the rank of (Mh6i )∧17
A .
What lower bound does this give on the border rank?
2
Now it looks like something interesting, as 58 > 56 = 3(62 ) +
6
2 −1 .
How would one possibly determine the rank for arbitrary n? If we knew
how to decompose the source and target into irreducible representations, we
2
∧d n e−1
, and
could then see what modules must be in the kernel of (Mhni )A 2
then for the remaining modules, we could check on one vector. However, by
restricting to a well-chosen subspace, one can obtain a good lower bound,
which I now present.
2.5.8. Koszul flattenings and matrix multiplication. Our map is
∗
p
∗
p+1
(U ∗ ⊗V )⊗(W ∗ ⊗U ).
(MhU,V,W i )∧p
A : V ⊗W ⊗Λ (U ⊗V ) → Λ
The presence of IdW = IdW ∗ means the map factors as (MhU,V,W i )∧p
A =
∧p
(Mhu,v,1i )A ⊗IdW ∗ , where
(2.5.8)
p
∗
p+1
(Mhu,v,1i )∧p
(U ∗ ⊗V )⊗U.
A : V ⊗Λ (U ⊗V ) → Λ
v⊗(ξ 1 ⊗e1 ) ∧ · · · ∧ (ξ p ⊗ep ) 7→
u
X
us ⊗(γ s ⊗v) ∧ (ξ 1 ⊗e1 ) ∧ · · · ∧ (ξ p ⊗ep ).
s=1
1
u
∗
where
Pn us1 , . . . , uu is a basis of U with dual basis γ , . . . , γ of U , so IdU =
s=1 γ ⊗us .
Theorem 2.5.8.1. [LO] Let n ≤ m. Then
R(Mhm,n,li ) ≥
In particular R(Mhni ) ≥ 2n2 − n.
nl(n + m − 1)
.
m
42
2. Lower bounds for matrix multiplication
Proof. Set dim U = n, dim V = m, dim W = l. The idea is to choose
a subspace A0 ⊂ A∗ of dimension 2p + 1 on which (Mh1,m,ni |A0 ⊗U ∗ ⊗V ∗ )∧p
Ã
becomes injective for p = n − 1, where à ⊂ A is a choice of inclusion (A0 )∗ ⊂
A. We need a choice that on one hand is generic enough to force injectivity,
but on the other is easy to compute with. We will use representation theory
to help us by giving A extra structure as a G-module for some G ⊂ GL(A).
In fact we’ll just use G = SL2 as follows: Take a vector space E of dimension
2, and fix isomorphisms U ' S n−1 E, V ' S m−1 E ∗ . Let A0 be the SL(E)submodule S m+n−2 E ∗ ⊂ S n−1 E ∗ ⊗S m−1 E ∗ = U ∗ ⊗V . The SL(E)-module
structure of A determines an inclusion (A0 )∗ ⊂ A.
Recall from Definition 2.3.4.3 the contraction map, for α ≥ β:
S α E × S β E ∗ → S α−β E
(f, g) 7→ g f.
In the case f = lα for some l ∈ E, then g lα = g(l)lα−β (here g(l)
denotes g, considered as a polynomial, evaluated at the point l), so that
g lα = 0 if and only if l is a root of g.
Consider the transposed map
((Mh1,m,ni |A0 ⊗U ∗ ⊗V ∗ )∧p
)T : S m−1 E ∗ ⊗Λn S m+n−2 E → S n−1 E⊗Λn−1 S m+n−2 E
Ã
g⊗(f1 ∧ · · · ∧ fn ) 7→
n
X
(−1)i−1 (g fi )⊗f1 ∧ · · · fˆi · · · ∧ fn .
i=1
The map ((Mh1,m,ni |A0 ⊗U ∗ ⊗V ∗ )∧p
)T is surjective: Let ln−1 ⊗(l1m+n−2 ∧
Ã
m+n−2
) ∈ S n−1 E⊗Λn−1 S m+n−2 E with l, li ∈ E. Such elements span
· · · ∧ ln−1
the target so it will be sufficient to show any such element is in the image.
Assume first that l is distinct from the li . Since n ≤ m, there is a polynomial
g ∈ S m−1 E ∗ which vanishes on l1 , . . . , ln−1 and is nonzero on l. Then, up to
m+n−2
a nonzero scalar, g⊗(l1m+n−2 ∧ · · · ∧ ln−1
∧ lm+n−2 ) maps to our element.
The condition that l is distinct from the li may be removed by taking
limits, as the image of a linear map is closed.
We conclude:
rank(Mh1,m,ni |A0 ⊗U ⊗V )∧p
Ã
R(Mhn,m,li ) ≥ l
= nl
n+m−2
n−1
n+m−1
n−1
n+m−2
n−1
=
nl (n + m − 1)
.
m
Remark 2.5.8.2. The Koszul flattenings actually give equations for border
rank in CN ⊗CN ⊗CN up to 2N − 3, see [Lan]. On the other hand, the map
2
∧d n2 e−1
(Mhni )A
: Λd
n2
e−1
2
A⊗B ∗ → Λd
n2
e
2
A⊗C is far from being injective,
2.6. Lower bounds for the rank of matrix multiplication
43
see §5.2.10. This shows that there are nontrivial equations for border rank
2n2 − 3 that are satisfied by Mhni .
2.5.9. Koszul flattenings in coordinates. To prove lower bounds on the
rank of matrix multiplication, and to facilitate a comparison with Griesser’s
equations discussed in §2.7.2, it will be useful to view TA∧p in coordinates.
Let dim A = 2p + 1. Write T = a0 ⊗X0 + · · · + a2p ⊗X2p where aj is a basis
of A with dual basis αj and Xj = T (αj ). An expression of TA∧p in bases is
2p
as follows: write aI := ai1 ∧ · · · ∧ aip for Λp A, require that the first p−1
basis vectors of Λp A have i1 = 0, that the second 2p
p do not, and call these
2p
multi-indices 0J and K. Order the bases of Λp+1 A such that the first p+1
multi-indices do not have 0, and the second 2p
p do, and furthermore that
the second set of indices is ordered the same way as K, only we write 0K
since a zero index is included. The resulting matrix is of the form
0 Q
(2.5.9)
Q̃ R
2p
2p
2p
where this matrix is blocked ( p+1
b, 2p
p b) × ( p+1 b, p b),


X0


..
R=
,
.
X0
and Q, Q̃ have entries in blocks consisting of X1 , . . . , X2p and zero. Thus
if X0 is of full rank and we change coordinates such that it is the identity
matrix, so is R and the determinant equals the determinant of QQ̃ by (2.5.4).
When p = 1, QQ̃ = [X1 , X2 ], and when p = 2


0
[X1 , X2 ] [X1 , X3 ] [X1 , X4 ]
[X2 , X1 ]
0
[X2 , X3 ] [X2 , X4 ]
.
(2.5.10)
QQ̃ = 
[X3 , X1 ] [X3 , X2 ]
0
[X3 , X4 ]
[X4 , X1 ] [X4 , X2 ] [X4 , X3 ]
0
2p
2p
In general QQ̃ is a block p−1
b × p−1
b matrix whose block entries are
either zero or commutators [Xi , Xj ].
2.6. Lower bounds for the rank of matrix multiplication
2.6.1. The results. Any tensor T with a positive dimensional symmetry group is a good candidate to have lower border rank than rank. This
intuition comes from symmetry generally implying non-uniqueness of an expression of T as a sum of R(T ) rank one tensors, as discussed in §4.2.
Lower bounds for the rank of matrix multiplication beyond the border
rank arise by exploiting both a generic property and a pathological property
44
2. Lower bounds for matrix multiplication
of Mhni . Of these two, the pathological property is much more important,
the generic property only modifies the error term.
The pathological property is that Mhl,m,ni satisfies equations coming
from the Koszul flattenings if and only if Mh1,m,ni satisfies analogous equations (which are of lower degree).
Using the pathological property, one obtains:
Theorem 2.6.1.1. [Lan14b] Let p < n − 1. Then
2p + 1
2p + 1
2
R(Mhn,n,mi ) ≥
nm + n − (2p + 1)
n.
p+1
p
This gives a bound of the form R(Mhni ) ≥ 3n2 − o(n2 ) by taking, e.g.,
p = log(log(n)).
The generic property is as follows: Recall that when deriving Strassen’s
equations, there was a complication because we could not assume there
existed an invertible linear map in T (A∗ ) ⊂ B⊗C. However, if T ∈ A⊗B⊗C
is a tensor such that there does exist an invertible linear map in T (A∗ ), we
can simplify the equations by choosing it. The error term may be improved
by examining the equations in coordinates, using the genericity property
and exploiting identities about determinants. The result is:
Theorem 2.6.1.2. [MR13] Let p ≤ n be a natural number. Then
p
2p
2p − 2
2
(2.6.1) rank(Mn,n,m ) ≥ (1+
)nm+n − 2
−
+2 n.
p+1
p+1
p−1
When n = m,
(2.6.2)
rank(Mhni ) ≥ (3 −
1
2p
2p − 2
)n2 − 2
−
+ 2 n.
p+1
p+1
p−1
For example, when p = 1 one recovers Bläser’s bound of 25 n2 −3n. When
2
p = 3, the bound (2.6.2) becomes 11
4 n − 26n, which improves Bläser’s for
n ≥ 132. A modification of the method also yields R(Mhni ) ≥ 83 n2 − 7n.
See [MR13, Lan14b] for proofs of the modifications of the error terms.
2.6.2. Proof of Theorem 2.6.1.1. The following standard Lemma, also
used in [Blä03], appears in this form in [Lan12, Lemma 11.5.0.2]:
Lemma 2.6.2.1. Let Ca be given a basis e1 , . . . , ea . Given a polynomial P
of degree d on Ca , there exists a subset {ei1 , . . . , eid } such that P |hei1 ,...,ei i
d
is not identically zero.
The lemma follows by simply choosing the basis vectors from a monomial
that appears in P . For example, Lemma 2.6.2.1 implies that a quadric
surface in P3 cannot contain six lines whose pairwise intersections span P3 .
2.6. Lower bounds for the rank of matrix multiplication
45
Recall the Grassmannian G(k, V ) ⊂ PΛk V from §2.4.2. The Plücker
coordinates (xµα ), k + 1 ≤ µ ≤ dim V = v, 1 ≤ α ≤ k are obtained by
choosing a basis e1 , . . . , ev of V , centering
[e1 ∧ · · · ∧ ek ],
P the coordinates atP
and writing a nearby k-plane as [(e1 + xµ1 eµ ) ∧ · · · ∧ (ek + xµk eµ )]. One
says a function on G(k, V ) is a polynomial of degree d if, as a function in
the Plücker coordinates, it is a degree d polynomial. If the polynomial is
also homogeneous in the xµα , it is the restriction of a homogeneous degree d
polynomial on Λk V to G(k, V ).
Lemma 2.6.2.2. Let A be given a basis. Given a homogeneous polynomial
of degree d on the Grassmannian G(k, A), there exists at most dk basis
vectors such that, denoting their (at most) dk-dimensional span by A0 , P
restricted to G(k, A0 ) is not identically zero.
Proof. Consider the map f : A×k → G(k, A) given by (a1 , . . . , ak ) 7→ [a1 ∧
· · · ∧ ak ]. Then f is surjective. Take the polynomial P and pull it back by f .
Here the pullback f ∗ (P ) is defined by f ∗ (P )(a1 , . . . , ak ) := P (f (a1 , . . . , ak )).
The pullback is of degree d in each copy of A. (I.e., fixing k − 1 of the aj ,
it becomes a degree d polynomial in the k-th.) Now simply apply Lemma
2.6.2.1 k times to see that the pulled back polynomial is not identically zero
restricted to A0 , and thus P restricted to G(k, A0 ) is not identically zero. Remark 2.6.2.3. The bound in Lemma 2.6.2.2 is sharp, as give A a basis a1 , . . . , aa and consider the polynomial
on Λk A with coordinates xI =
P
i
i
x 1 , . . . , x k corresponding to the vector I xI ai1 ∧ · · · ∧ aik :
P = x1,...,k xk+1,...,2k · · · x(d−1)k+1,...,dk .
Then P restricted to G(k, ha1 , . . . , adk i) is non-vanishing but there is no
smaller subspace spanned by basis vectors on which it is non-vanishing.
Proof of Theorem 2.6.1.1. Say R(Mhn,n,mi ) = r and write
(2.6.3)
Mhn,n,mi =
r
X
aj ⊗bj ⊗cj .
j=1
We will show that the Koszul-flattening equation is already non-zero restricted to a subset of this expression for a judicious choice of à ⊂ A of
dimension 2p + 1 with p < n − 1. Then the rank will be at least the border
rank bound plus the number of terms not in the subset. Here are the details:
Define
P : G(2p + 1, A) → C
∧p
: Λp Ã⊗B ∗ → Λp+1 Ã⊗C).
à 7→ det((Mhn,n,mi |A0 ×B ∗ ×C ∗ )Ã
Since p ≤ n and we proved P is not identically zero when p = n, P will not
be identically zero when p < n either.
46
2. Lower bounds for matrix multiplication
Now P is a polynomial of degree 2p+1
nm > nm, so at first sight, e.g.,
p
when m ∼ n, Lemma 2.6.2.2 will be of no help because dk > dim A = n2 ,
but since
∧p
∧p
(Mhn,n,mi |A0 ×B ∗ ×C ∗ )Ã
= (Mhn,n,1i |A0 ×U ∗ ×V ∗ )Ã
⊗IdW ∗ ,
we actually have P = P̃ m , where
P̃ : G(2p + 1, A) → C
∧p
à 7→ det((Mhn,n,1i |A0 ×U ∗ ×V ∗ )Ã
: Λp Ã⊗U ∗ → Λp+1 Ã⊗V ).
Hence we may work with P̃ which is of degree 2p+1
n which will be less
p
2
∗
than n if p is sufficiently small. Since (Mhn,n,mi )A : A → B⊗C is injective,
some subset of the aj forms a basis of A. Lemma 2.6.2.2. implies
that there
2p+1
exists a subset of those basis vectors of size dk =
n(2p + 1), such
p
that if we restrict to terms of the expression (2.6.3) that use only aj whose
expansion in the fixed basis has nonzero terms from that subset of dk basis
vectors, calling the sum of these terms M 0 , we have R(M 0 ) ≥ 2p+1
p+1 nm. Let
00
M be the sum of the remaining
terms in the expression. There are at
least a − dk = n2 − 2p+1
n(2p
+
1)
of the aj appearing in M 00 (the terms
p
corresponding to the complementary basis vectors). Since we assumed we
had an optimal expression for Mhn,n,mi , we have
R(Mhn,n,mi ) = R(M 0 ) + R(M 00 )
2p + 1
2p + 1
2
≥
nm + [n − (2p + 1)
n].
p+1
p
The further lower bounds are obtained by lowering the degree of the
polynomial by localizing the equations. An easy such localization is to set
X0 = Id which reduces the determinant of (2.5.9) to that of (2.5.10) when
p = 2 and yields a similar reduction of degree in general. Further localizations both reduce the degree and the size of the Grassmannian, both of
which improve the error term.
2.7. Additional equations for σr (Seg(PA × PB × PC))
There are numerous motivations for studying equations for σr = σr (Seg(PA×
PB × PC)) from algebraic geometry, signal processing, quantum information theory etc. Two motivations coming from matrix multiplication that we
have discussed are: First, to prove lower bounds, if we have P ∈ I(σr ) such
that P (Mhni ) 6= 0, we know R(Mhni ) > r. Second, to prove upper bounds,
it would suffice to find set-theoretic equations for σR , that is, polynomials
P1 , . . . , Pq such that there common zero set is exactly σR , and then show
2.7. Additional equations for σr (Seg(PA × PB × PC))
47
Pj (Mhni ) = 0 for all j = 1, . . . , q to conclude R(Mhni ) ≤ R. As I describe
below, for the second problem, something weaker will suffice (because Mhni
is 1-generic), but even the easier problem seems currently out of reach in
the range of interest. Despite this, even small results in this direction, that
I believe are within reach, will be useful for proving upper bounds on the
exponent as I will explain in our discussion of upper bounds.
We will see further techniques for finding equations after we have covered
Schur-Weyl duality in §5.2.1. In this section I present equations of Griesser
and Friedland, each of which illustrates possible further paths towards finding new equations. I first discuss some generalities regarding reducing the
problem of finding equations for σr to questions in linear algebra.
Recall the classical flattenings: TA : A∗ → B⊗C. If TA is injective, to
gain further information, we examined the nature of the image (or equivalently, the kernel of TAT : B ∗ ⊗C ∗ → A). All equations I am aware of are
of this nature, in analogy with the study of secondary characteristic classes.
Friedland’s equations discussed below examine the kernel of TA∧1 to obtain
further information.
2.7.1. Multi-linear algebra via linear algebra. The following result
shows that when studying the rank and border rank of a tensor T ∈ A⊗B⊗C,
there is no loss of information in restricting attention to T (A∗ ):
Theorem 2.7.1.1. Let T ∈ A⊗B⊗C, Then R(T ) equals the number of
rank one matrices needed to span (a space containing) T (A∗ ) ⊂ B⊗C (and
similarly for the permuted statements).
Let Zr ⊂ G(a, B⊗C) denote the set of a-planes in B⊗C that can be
spanned by r rank one elements (so R(T ) ≤ r if and only if T ∈ Zr ). Then
R(T ) ≤ r if and only if T (A∗ ) ∈ Zr .
P
Proof. Let T have rank r so there is an expression T = ri=1 ai ⊗bi ⊗ci . (I
remind the reader that the vectors ai need not be linearly independent, and
similarly for the bi and ci .) Then T (A∗ ) ⊆ hb1 ⊗c1 , . . . , br ⊗cr i shows that
the number of rank one matrices needed to span T (A∗ ) ⊂ B⊗C is at most
R(T ).
On the other hand, say T (A∗ ) is spanned by rank one elements b1 ⊗c1 , . . . , br ⊗cr .
1
a
∗
Let
basis a1 , . . . , aaPof A. Then T (ai ) =
Pr a , .i . . , a be a basis of A , with dual
xs bs ⊗cs for some constants xis . But then T = s,i ai ⊗(xis bs ⊗cs ) =
Prs=1 P
i
s=1 ( i xs ai )⊗bs ⊗cs proving R(T ) is at most the number of rank one
matrices needed to span T (A∗ ) ⊂ B⊗C.
Exercise 2.7.1.2: Prove the border rank assertion.
48
2. Lower bounds for matrix multiplication
Call T ∈ A⊗B⊗C concise if it does not lie in any non-trivial subspace
variety, i.e., if the maps TA , TB , TC are injective. For a concise tensor T , we
will say T is 1A -generic if T (A∗ ) contains an element of maximal rank, and
that T is 1-generic if it is 1A , 1B and 1C -generic.
Exercise 2.7.1.3: Given T ∈ Cm ⊗Cm ⊗Cm = A⊗B⊗C that is concise,
show that PT (A∗ ) ∩ Seg(PB × PC) = ∅ implies R(T ) > m. }
All the equations we have seen so far arise as Koszul flattenings. Koszul
flattenings provide robust equations only if T is 1A , 1B or 1C -generic, because
otherwise the presence of T (α)∧a−1 in the expressions make them likely to
vanish. When T is 1A -generic, the Koszul flattenings TA∧p : Λp A⊗B ∗ →
Λp+1 A⊗C provide measures of the failure of T (A∗ )T (α)−1 ⊂ End(B) to be
an abelian subspace.
Fortunately Mhni is 1-generic. In order to test for upper bounds for a 1generic tensor, we only need equations for σr such that the intersection of the
zero set of the equations with the set of 1-generic tensors is the intersection
of σr with the set of 1-generic tensors.
How far are our equations from having this property? Let’s consider the
first case where Strassen’s commutator is identically zero, i.e., T (A∗ )T (α)−1 ⊂
End(B) is abelian.
Proposition 2.7.1.4. [LM] There exist 1A -generic tensors in Ca ⊗Ca ⊗Ca
2
satisfying Strassen’s equations of border rank a8 .
Proof. Assume a = b = c. Consider T such that

a1

..

.


a1
T (A∗ ) ⊂ 
 ∗ · · · ∗ a1

 ..
..
..
..
.
.
.
.
∗ ··· ∗
a1





.




a
a
and set a1 = 0. We obtain a generic tensor in Ca−1 ⊗Cb 2 c ⊗Cd 2 e , which will
2
have border greater than a8 . Conclude by applying Exercise 2.2.3.1.
If T (A∗ )T (α)−1 is diagonalizable, then R(T ) = b, and if T (A∗ )T (α)−1 ⊂
G(b, End(B)) is in the Zariski closure of the set of E ∈ G(b, End(B)) that
are diagonalizable, then R(T ) = b.
A classical result in linear algebra says a subspace U ⊂ End(B) is simultaneously diagonalizable if and only if U is abelian and every x ∈ U (or
equivalently for each xj in a basis of U ), x is diagonalizable, answering the
question for rank.
2.7. Additional equations for σr (Seg(PA × PB × PC))
49
***This paragraph to be updated*** If b ≤ 4, then the Zariski closure of
the diagonalizable spaces in G(b, End(B)) equals the abelian spaces [IM05],
so we have a complete answer. For b ≥ 7 it just gives a component of the
variety of abelian spaces in G(b, End(B)), and for b = 6, 7 it is not known
whether or not there are additional components. Proposition 2.7.1.7 below
gives a sufficient condition.
For x ∈ End(B), define the centralizer of x, denoted C(x), by
C(x) := {y ∈ End(B) | [y, x] = 0}.
Exercise 2.7.1.5: Show that dim C(x) ≥ b. }
Exercise 2.7.1.6: Show that dim C(x) = b if and only if the minimal
polynomial of x equals the characteristic polynomial.
We will say x ∈ End(B) is regular if dim C(x) = b. We say x is regular
semi-simple if x is diagonalizable with distinct eigenvalues. Note that x is
regular semi-simple if and only if C(x) ⊂ End(B) is diagonalizable.
Proposition 2.7.1.7. Let U ⊂ End(B) be an abelian subspace of dimension
b, and such that there exists x ∈ U that is regular. Then U lies in the Zariski
closure of the diagonalizable b-planes in G(b, End(B)).
Proof. Since the Zariski closure of the regular semi-simple elements is all
of End(B), for any x ∈ End(B), there exists a curve xt of regular semisimple elements with limt→0 xt = x. Consider the induced curve in the
Grassmannian C(xt ) ⊂ G(b, End(B)). Then C0 := limt→0 C(xt ) exists and
is contained in C(x) ⊂ End(B) and since U is abelian, we also have U ⊆
C(x). But if x is regular, then dim C0 = dim(U ) = b, so limτ →0 C(xt ), C0
and U must all be equal and thus U is a limit of diagonalizable subspaces. Problem 2.7.1.8. Determine a criterion for U ∈ G(b, End(B)) to be in the
closure of the diagonalizable b-planes, when U does not contain a regular
element.
For a 1A -generic tensor T ∈ A⊗B⊗C, define T to be 2A -generic if there
exist α ∈ A∗ such that T (α) : C ∗ → B is of maximal rank and α0 ∈ A∗ such
that T (α0 )T (α)−1 : B → B is regular semi-simple.
Unfortunately Mhni is not 2A -generic. The equations coming from Koszul
flattenings, and even more so Griesser’s equations, are less robust for tensors
that fail to be 2A -generic. This partially explains why it satisfies some of
the Koszul flattening equations and Griesser’s equations below. Thus an
important problem is to identify modules of equations for σr that are robust
for non-2-generic tensors.
50
2. Lower bounds for matrix multiplication
2.7.2. Griesser’s equations. The following theorem dates back to 1985.
It posits potential equations for σr in the range b < r ≤ 2b − 1.
Theorem 2.7.2.1. [Gri86] Let b = c. Given a 1A -generic tensor T ∈
A⊗B⊗C with R(T ) ≤ r, let α ∈ A∗ be such that T (α) is invertible. For
α0 ∈ A∗ , let X(α0 ) = T (α0 )T (α0 )−1 ∈ End(B). Fix α1 ∈ A∗ . Consider the
space of endomorphisms U := {[X(α1 ), X(α0 )] : B → B | α0 ∈ A∗ } ⊂ sl(B).
Then there exists E ∈ G(2b − r, B) such that dim(U.E) ≤ r − b.
Remark 2.7.2.2. As stated, Theorem 2.7.2.1 potentially could provide
equations for σr up to r = 2b − 1, but it was not determined in [Gri86]
whether or not these equations are trivial beyond r = 23 b.
Remark 2.7.2.3. Because of the way the equations are described, they are
difficult to write down.
Remark 2.7.2.4. Compared with the minors of TA∧p , here one is just examining the first block column of the matrix appearing in the expression QQ̃,
but one is apparently extracting more refined information from it.
Proof. For the moment assume R(T) = r and T = Σ_{j=1}^r a_j⊗b_j⊗c_j. Let B̂ = C^r, equipped with basis e₁, ..., e_r. Define π : B̂ → B by π(e_j) = b_j. Let i : B → B̂ be such that π ∘ i = Id_B. Choose B′ ⊂ B̂ of dimension r − b such that B̂ = i(B) ⊕ B′, and denote the inclusion and projection respectively by i′ : B′ → B̂ and π′ : B̂ → B′.

Let α₀, α₁, ..., α_{a−1} be a basis of A∗. Let T̂ = Σ_{j=1}^r a_j⊗e_j⊗e_j∗ ∈ A⊗B̂⊗B̂∗ and let X̂_j := T̂(α_j)T̂(α₀)⁻¹. Now in End(B̂) all the commutators [X̂_i, X̂_j] are zero because R(T̂) = r. We have, for all 2 ≤ s ≤ a − 1, [X̂₁, X̂_s] = 0, which implies

(2.7.1)  0 = π[X̂₁, X̂_s]i = [X₁, X_s] + (πX̂₁i′)(π′X̂_s i) − (πX̂_s i′)(π′X̂₁i).

Now take E ⊆ ker(π′X̂₁i) ⊂ B of dimension 2b − r. Then for all s, [X₁, X_s]·E ⊂ Image(πX̂₁i′), which has dimension at most r − b because πX̂₁i′ : B′ → B and dim B′ = r − b. The general case follows by taking limits. □
Proof of Theorem 2.5.3.1. Here there is just one commutator [X₁, X₂], and its rank is at most the sum of the ranks of the other two terms in (2.7.1). But each of the other two terms is a composition of linear maps including i′, which can have rank at most r − b, so their sum can have rank at most 2(r − b). □
Proposition 2.7.2.5. Let dim A = a, dim B = dim C = b. Then Griesser's equations for σ̂_r have the following properties:

(1) They are trivial for r = 2b − 1 and all a.

(2) They are trivial for r = 2b − 2 and a ≤ b/2 + 2, in particular for a = b = 4.

(3) Setting b = n², matrix multiplication M⟨n⟩ fails to satisfy the equations for r ≤ 3n²/2 − 1 when n is even and r ≤ 3n²/2 + n/2 − 2 when n is odd, and satisfies the equations for all larger r.

I do not know whether or not the equations are trivial for r = 2b − 2, a = b and b > 4. If they are nontrivial for r = 2b − 2 and even b, they would give equations beyond the maximal minors of T_A^{∧p}.
Proof. Assuming T is sufficiently generic, we may choose X₁ to be diagonal with distinct entries on the diagonal, and this is a generic choice of X₁. Let sl(B)_R denote the matrices with zero on the diagonal (the span of the root spaces). Then

ad(X₁) : sl(B)_R → sl(B)_R, Y ↦ [X₁, Y],

is a linear isomorphism, and ad(X₁) kills the diagonal matrices. Write U_s = [X₁, X_s], 2 ≤ s ≤ a − 1, so the U_s will be matrices with zero on the diagonal. By picking T generically we can obtain any such matrices, and this is the most general choice of T possible, so if the equations vanish for a generic choice of U_s ⊂ sl(B)_R, they vanish identically.
Proof of (1): In the case r = 2b − 1 we have r − b = b − 1, and when a ≤ b + 1 the equations are trivial, as we only have a − 2 ≤ b − 1 linear maps. When a ≥ b + 2, a naïve dimension count makes it possible for the equations to be non-trivial; the equations are that there exists v ∈ B such that dim⟨U₂v, ..., U_{a−1}v⟩ ≤ b − 1. Taking v = (1, 0, ..., 0)^T (the superscript T denotes transpose), the U_j v will be contained in the hyperplane of vectors with first entry zero. Since we only made genericity assumptions, we conclude.
Proof of (2): In the case r = 2b − 2, the equations will be nontrivial if and only if there exist U₂, ..., U_{a−1} ∈ sl(B)_R such that for all linearly independent v, w,

dim⟨U₂v, ..., U_{a−1}v, U₂w, ..., U_{a−1}w⟩ ≥ b − 1.

The map U₂⁻¹U₃ will in general have b linearly independent eigenvectors. Take v, w to be two such; then ⟨U₂v⟩ = ⟨U₃v⟩ and ⟨U₂w⟩ = ⟨U₃w⟩, and the span will have dimension at most 2(a − 2) − 2 ≤ b − 2, and we conclude.
Proof of (3): Consider matrix multiplication M⟨n⟩ ∈ C^{n²}⊗C^{n²}⊗C^{n²} = A⊗B⊗C. Recall from Exercise 2.2.4.3 that with a judicious ordering of bases, M⟨n⟩(A∗) is the space of block diagonal matrices

(2.7.2)  diag(x, ..., x)  (n blocks),

where x = (x^i_j) is n × n. In particular, the image is closed under brackets. Choose X₀ to be the identity. It is not possible to have X₁ diagonal with distinct entries on the diagonal; the most generic choice for X₁ is block diagonal with each block having the same n distinct entries. For a subspace E of dimension 2b − r = dn + e (recall b = n²) with 0 ≤ e ≤ n − 1, the image of a generic choice of [X₁, X₂], ..., [X₁, X_{n²−1}] applied to E is of dimension at least (d + 1)n if e ≥ 2, at least (d + 1)n − 1 if e = 1, and dn if e = 0, and equality will hold if we choose E to be, e.g., the span of the first 2b − r basis vectors of B. (This is because the [X₁, X_s] will span the matrices of type (2.7.2) with zeros on the diagonal.) If n is even, taking 2b − r = n²/2 + 1, so r = 3n²/2 − 1, the image occupies a space of dimension n²/2 + n − 1 > n²/2 − 1 = r − b. If one takes 2b − r = n²/2, so r = 3n²/2, the image occupies a space of dimension n²/2 = r − b, showing Griesser's equations cannot do better for n even. If n is odd, taking 2b − r = n²/2 − n/2 + 2, so r = 3n²/2 + n/2 − 2, the image will have dimension n²/2 + n/2 > r − b = n²/2 + n/2 − 2, and taking 2b − r = n²/2 − n/2 + 1, the image can have dimension n²/2 − n/2 + (n − 1) = r − b, so the equations vanish for this and all larger r. Thus Griesser's equations for n odd give Lickteig's bound R(M⟨n⟩) ≥ 3n²/2 + n/2 − 1. □
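For example (my arithmetic, for orientation): for n = 3 the displayed bound gives R(M⟨3⟩) ≥ 3·3²/2 + 3/2 − 1 = 14.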
2.7.3. Friedland's degree 16 equations. Secant varieties of Segre varieties appear in the study of algebraic statistical models corresponding to bifurcating phylogenetic trees, see [AR08]; in particular σ₄(Seg(P³ × P³ × P³)) plays a central role. In 2008, equations for this variety were just beyond the state of the art, so E. Allman, who resides in Alaska, offered to hand-catch, smoke and send an Alaskan salmon to anyone who could find the generators of the ideal of this variety. In [LM08] the problem was reduced to finding generators of the ideal of σ₄(Seg(P² × P² × P³)). The first major breakthrough on this conjecture was by S. Friedland [Fri13]. The breakthrough had two essential steps: finding new equations, and proving that the known equations plus the new ones suffice to cut out the variety set-theoretically.
I explain the new equations in this section. For a proof of the second step,
see [Fri13] or [Lan12, §7.7.3].
First I describe a genericity condition for tensors T ∈ C³⊗C³⊗C⁴ satisfying Strassen's equations that assures R(T) ≤ 4. I then describe additional equations that distinguish which tensors among those that fail to satisfy the genericity conditions have border rank at most four.
Proposition 2.7.3.1. Let a = b, let T ∈ A⊗B⊗C and assume T_A^{∧1} : A⊗B∗ → Λ²A⊗C has a nontrivial kernel, and let ψ_{AB} ∈ ker(T_A^{∧1}).

If ψ_{AB} is of maximal rank, then T is equivalent to a tensor in S²A⊗C.

If furthermore c = a and there also exists ψ_{AC} ∈ ker(T_{AC}^{∧1}) of maximal rank, then T is equivalent to a tensor in S³A.

Proof. Consider (Id_A⊗ψ_{AB}⊗Id_C)(T) ∈ A⊗A⊗C. The condition ψ_{AB} ∈ ker(T_A^{∧1}) implies (Id_A⊗ψ_{AB}⊗Id_C)(T) ∈ S²A⊗C, and since ψ_{AB} is injective, it is GL(A) × GL(B) × GL(C)-equivalent to T. The second assertion is similar. □
Proposition 2.7.3.2. If T ∈ C³⊗C³⊗C⁴ satisfies Strassen's equations for border rank 4, and ker(T_{AB}^{∧1}) contains an element of maximal rank, then R(T) ≤ 4.

Exercise 2.7.3.3: Show that σ₄(Seg(v₂(P²) × P³)) = P(S²C³⊗C⁴).

Proof. Under the hypotheses, T is equivalent to an element of S²C³⊗C⁴, so we conclude by Exercise 2.7.3.3. □
Consider the tensor
TF = (a1 ⊗b1 +a2 ⊗b2 )⊗c1 +(a1 ⊗b1 +a2 ⊗b3 )⊗c2 +(a1 ⊗b1 +a3 ⊗b2 )⊗c3 +(a1 ⊗b1 +a3 ⊗b3 )⊗c4 .
Exercise 2.7.3.4: Show that T_F satisfies Strassen's degree nine equations and that ker((T_F)_A^{∧1}) does not contain an element of maximal rank.
We will prove that R(TF ) ≥ 5 by finding additional equations for σ4 (Seg(P2 ×
P2 × P3 )) that it does not satisfy. How to find such equations?
Consider a general rank four element of C3 ⊗C3 ⊗C4 ; T = a1 ⊗b1 ⊗c1 +
a2 ⊗b2 ⊗c2 + a3 ⊗b3 ⊗c3 + (a1 + a2 + a3 )⊗(b1 + b2 + b3 )⊗c4 . (This is indeed
general, see [Lan12].)
Exercise 2.7.3.5: Show that ker T_A^{∧1} is spanned by ψ_{AB} := a₁⊗β¹ + a₂⊗β² + a₃⊗β³, where β^j is the basis dual to b_j, and similarly that ker T_B^{∧1} is spanned by ψ_{BA} := b₁⊗α¹ + b₂⊗α² + b₃⊗α³.
Note that for any choices of bases of the kernels, ψ_{AB}ψ_{BA} = λId_A and ψ_{BA}ψ_{AB} = μId_B for some scalars λ, μ. To obtain equations from this observation, recall that A⊗A∗ = sl(A) ⊕ C{Id_A}. The closed property to extract from this information is:
Proposition 2.7.3.6. [Fri13] If T ∈ C³⊗C³⊗C⁴ is such that T_{AB}^{∧1}, T_{BA}^{∧1} have one-dimensional kernels with bases ψ_{AB}, ψ_{BA}, and T ∈ σ̂₄, then

(2.7.3)  proj_{sl(A)}(ψ_{AB}ψ_{BA}) = 0,  proj_{sl(B)}(ψ_{BA}ψ_{AB}) = 0.
Exercise 2.7.3.7: Show that R(TF ) ≥ 5.
These are equations of degree 16 for σ4 (Seg(P2 × P2 × P3 )). I do not
know what they are as a module.
Remark 2.7.3.8. These equations are now obsolete, in the sense that lower
degree equations found in [LM04] via representation theory were shown to
be sufficient to cut out σ4 (Seg(P2 × P2 × P3 )) in [BO11, FG10]. I include
them because they illustrate a technique that can be generalized. A similar
technique to Proposition 2.7.3.1 was used in [Qi] for tensors in A1 ⊗ · · · ⊗ An .
2.7.4. Summary. Our first equations for σr (Seg(PA × PB × PC)) came
from the flattenings, e.g. TA : A∗ → B⊗C. To go beyond that, Strassen
examined the nature of the image. Assuming b = c (or restricting to
a subspace), if T is 1_A-generic, we can use a general α ∈ A∗ to identify B⊗C ≅ End(B) and then measure the failure of T(A∗) to be abelian. That is, we exploited the structure of the image of the linear map to obtain further equations. If these next equations are satisfied, i.e., if the image is abelian, a second genericity condition (that T(A∗) contains a regular element) assures us R(T) = b. If T(A∗) fails to be abelian but there are no commutators of
full rank, in some special situations, one can prove, again under a further
genericity condition on ker TA∧1 (e.g. Proposition 2.7.3.2), that the border
rank equals the value predicted by Strassen’s equations. To obtain further
equations when this genericity condition fails, one exploits the structure of
ker TA∧1 as in Proposition 2.7.3.6.
We have the additional equations obtained from minors of TA∧p , and
matrix multiplication satisfies some of them. An important problem is to determine if these equations are enough to conclude an upper bound on the border rank of M⟨n⟩. The discussion above suggests we should study the nature of the kernel (or image) of T_A^{∧p}:
Problem 2.7.4.1. Find equations for tensors where T_A^{∧p} : Λ^p A⊗B∗ → Λ^{p+1}A⊗C is not of full rank. In particular, find equations for the case T_{C⁹}^{∧4} : Λ⁴C⁹⊗C⁹∗ → Λ⁵C⁹⊗C⁹.
Assume b = c and a = 2p + 1. Say rank(T_A^{∧p}) is "small" (so that there are many equations satisfied). Are there conditions on ker(T_A^{∧p}) that assure R(T) = rank(T_A^{∧p})/\binom{2p}{p}?
Chapter 3

The complexity of matrix multiplication II: asymptotic upper bounds
This chapter discusses progress towards the astounding conjecture that asymptotically, the complexity of multiplying two n × n matrices is nearly the
same as the complexity of adding them. I cover the main advances in upper
bounds for the exponent of matrix multiplication, emphasizing a geometric
perspective.
3.1. Overview
The exponent ω of matrix multiplication is naturally defined in terms of
tensor rank:
ω := inf{τ ∈ R | R(Mhni ) = O(nτ )}.
See [BCS97, §15.1] for why tensor rank yields the same exponent as other
complexity measures.
The above-mentioned conjecture is that ω = 2. I do not know of any general tools for studying tensor rank. However, we have seen tools for studying
border rank. Fortunately, Bini et. al. showed that one may also define the
exponent in terms of border rank, namely (see Proposition 3.2.1.10)
ω = inf{τ ∈ R | R(Mhni ) = O(nτ )},
which implies:
ω ≤ log R(M⟨n⟩)/log n.
A systematic path to proving the astonishing conjecture would be to find a function r(n) = O(n²) and set-theoretic equations for σ_{r(n)}(Seg(P^{n²−1} × P^{n²−1} × P^{n²−1})), and show that [M⟨n⟩] is in their zero set. As discussed in §2.7.1, an important simplification is that it would be sufficient to find set-theoretic equations for the open subset of 1-generic elements of σ_{r(n)}(Seg(P^{n²−1} × P^{n²−1} × P^{n²−1})).
At this writing, I know of no evidence that R(Mhni ) > 2n2 − 1 for any
n.
While we have equations for r(n) ≤ 2n2 − 4 these equations are not
satisfied by matrix multiplication. This is why Problem 2.7.4.1 is important
for the study of upper bounds.
To disprove the conjecture, it would suffice to find a function r(n) growing faster than Cn² for any constant C, and one equation (for each n) P_n in the ideal of σ_{r(n)}(Seg(P^{n²−1} × P^{n²−1} × P^{n²−1})) such that P_n(M⟨n⟩) ≠ 0. All proposed equations for σ_r(Seg(P^{m−1} × P^{m−1} × P^{m−1})) when 2r ≥ m that I am aware of have matrix multiplication in their zero set.
Fortunately, there are results that enable one to prove upper bounds
on ω while avoiding border rank beyond the range we understand. The
first is easy and part of Proposition 3.2.1.10, namely we can also consider
rectangular matrix multiplication:
ω ≤ 3 log R(M⟨m,n,l⟩)/log(mnl),  i.e.,  (lmn)^{ω/3} ≤ R(M⟨m,n,l⟩).
The second result, due to Schönhage (Theorem 3.3.3.1) and described
in §3.3, is more involved: it says it is sufficient to prove upper bounds on
sums of disjoint matrix multiplications:
Σ_{i=1}^s (m_i n_i l_i)^{ω/3} ≤ R(⊕_{i=1}^s M⟨m_i,n_i,l_i⟩).
Schönhage also proves that the assertion is non-vacuous, that is, there exist disjoint tensors where the border rank of their sum is less than the sum of their border ranks, see §3.3.2. Sometimes the border rank of the sum is so low that it falls in the range where we may determine it. (However, researchers applying this method have usually determined the border rank simply by writing out an explicit limit.)
As mentioned in §3.3, the inequalities regarding ω above are strict, e.g., there does not exist an n with R(M⟨n⟩) equal to n^ω. (This does not rule out R(M⟨n⟩) equal to 2n^ω for all n.) Thus one cannot exactly determine ω by the above methods. One way to extend the above methods is to find sequences of sums ⊕_{i=1}^{s(N)} M⟨l_i(N),m_i(N),n_i(N)⟩ with the border rank of the sums
giving upper bounds on ω that converge to the actual value. This is one
aspect of Strassen’s “laser method” described in §3.4. A second new ingredient of his method is that instead of dealing with the sum of a collection
of disjoint rectangular matrix multiplications, one looks for a larger tensor
T ∈ A⊗B⊗C, that has special structure rendering it easy to study, that can
be degenerated into a collection of disjoint matrix multiplications. More
precisely, to obtain a sequence as in the previous paragraph, one degenerates the tensor powers T ⊗N . Strassen’s degeneration is in the sense of the
GL(A) × GL(B) × GL(C)-orbit closure of T ⊗N .
After Strassen, all other subsequent upper bounds on ω use what I will
call combinatorial restrictions of T ⊗N , where entries of a coordinate presentation of T ⊗N are simply set equal to zero. The choice of entries to zero
out is subtle. I describe these developments in §??.
In addition to combinatorial restrictions, Cohn, Umans, et al. exploit a
geometric change of basis when a tensor is the multiplication tensor of an
algebra (or even more general structures). They use the discrete Fourier
transform for finite groups (and more general structures) to show that the
multiplication tensor in the Fourier basis (and thus in any basis) has “low”
rank, but nevertheless in the standard basis admits a combinatorial restriction to a “large” matrix multiplication tensor. I discuss this approach in
§3.6.
3.2. The upper bounds of Bini, Capovani, Lotti, and Romani
3.2.1. Rank, border rank, and the exponent of matrix multiplication.
Proposition 3.2.1.1. [Bin80] For all n, n^ω ≤ R(M⟨n⟩), i.e., ω ≤ log R(M⟨n⟩)/log(n).
Proof. By Exercise 2.2.3.1, R(M⟨n⟩) ≤ r implies R(M⟨n^k⟩) ≤ r^k. Since R(M⟨v⟩) is a non-decreasing function of v, and for each v there is a k with n^{k−1} ≤ v < n^k, we conclude that the τ in the definition of the exponent satisfies τ ≤ log r/log n. □

Proposition 3.2.1.2. For all l, m, n, (lmn)^{ω/3} ≤ R(M⟨m,n,l⟩), i.e., ω ≤ 3 log R(M⟨m,n,l⟩)/log(mnl).
Exercise 3.2.1.3: Prove Proposition 3.2.1.2. }
To show that ω may also be defined in terms of border rank, introduce
a sequence of ranks that interpolate between rank and border rank.
We say R_h(T) ≤ r, and T ∈ σ̂_r^{0,h}, if there exists an expression

(3.2.1)  T = lim_{ε→0} (1/ε^h)(a₁(ε)⊗b₁(ε)⊗c₁(ε) + ··· + a_r(ε)⊗b_r(ε)⊗c_r(ε)),

where a_j(ε), b_j(ε), c_j(ε) are analytic functions of ε.
Proposition 3.2.1.4. R(T ) ≤ r if and only if there exists an h such that
Rh (T ) ≤ r.
To prove Proposition 3.2.1.4, we need the following lemma:
Lemma 3.2.1.5. Let Z ⊂ PV be an irreducible variety and let Z 0 ⊂ Z be
a Zariski open subset. Let p ∈ Z\Z 0 . Then there exists an analytic curve
C(t) such that C(t) ∈ Z 0 for all t 6= 0 and limt→0 C(t) = p.
For the proof of the lemma, we will need the following basic fact: every
irreducible algebraic curve C ⊂ PV (i.e., every one dimensional algebraic
variety) is such that there exists a smooth algebraic curve C̃ and a surjective
algebraic map π : C̃ → C that is one-to-one over the smooth points of C.
See, e.g., [Sha94, §II.5], Thms. 3 and 6 for a proof. The curve C̃ is called
the normalization of C.
Proof. Let c be the codimension of Z and take a general L ⊂ PV of dimension c + 1 that contains p. Then L ∩ Z will be a possibly reducible algebraic curve containing p. Take a component C of the curve that contains p. If p is a smooth point of the curve we are done, as we can expand a Taylor series about p. Otherwise take the normalization π : C̃ → C and a point of π⁻¹(p), expand a Taylor series about that point, and compose with π to obtain the desired curve. □
Proof of Proposition 3.2.1.4. Recall that every curve in Seg(PA × PB × PC) is of the form [a(t)⊗b(t)⊗c(t)] for some curves a(t) ⊂ A etc., and if the curve is analytic, the functions a(t), b(t), c(t) can be taken to be analytic as well. Thus every analytic curve in σ_r^0(Seg(PA × PB × PC)) may be written as [Σ_{j=1}^r a_j(t)⊗b_j(t)⊗c_j(t)] for some analytic curves a_j(t) ⊂ A etc. We conclude that if T ∈ σ̂_r, then T ∈ σ̂_r^{0,h} for some h. □
Remark 3.2.1.6. In the matrix multiplication literature, e.g. [BCS97], σ̂_r is often defined to be the set of T that are in σ̂_r^{0,h} for some h. One then must show that this set is algebraically closed.
Proposition 3.2.1.7. If T ∈ σ̂_r^{0,h}, then R(T) ≤ r\binom{h+2}{2} < rh².

Proof. Write T as in (3.2.1). Then T is the coefficient of the ε^h term of the expression in parentheses. For each monomial, there is a contribution of \binom{h+2}{2} terms of degree h in ε. □
Remark 3.2.1.8. In fact R(T ) ≤ rh, see Proposition 3.6.1.6.
Exercise 3.2.1.9: Show that if T ∈ σ̂_r^{0,h}, then T^{⊗N} ∈ σ̂_{r^N}^{0,Nh}.
Proposition 3.2.1.10. For all l, m, n, ω ≤ 3 log R(M⟨m,n,l⟩)/log(mnl).

Proof. Write r = R(M⟨m,n,l⟩). As above, set N = mnl. We have M⟨N⟩ ∈ σ̂_{r³}^{0,h} for some h, and thus R(M⟨N^k⟩) ≤ r^{3k}(hk)², which implies

ω ≤ log(r^{3k}(hk)²)/log N^k = 3 log r/log N + 2 log(hk)/(k log N),

and the second term goes to zero as k → ∞. □
3.2.2. Bini et al.'s algorithm. Bini, Capovani, Lotti, and Romani had
been looking for a rank five expression for the tensor corresponding to Mh2i
when the first matrix has an entry equal to zero. Say the entry is x22 = 0
and denote the tensor by
TBCLR := x11 ⊗(y11 ⊗z11 +y21 ⊗z12 )+x12 ⊗(y12 ⊗z11 +y22 ⊗z12 )+x21 ⊗(y11 ⊗z21 +y21 ⊗z22 ).
Their method was to minimize the norm of TBCLR minus a rank five tensor
that varied, and their computer kept on producing rank five tensors with
larger and larger coefficients. At first they thought there was a problem in
their computer code, and only later realized it was a manifestation of the
phenomenon Bini named border rank [Bin80]. (This information from Bini,
personal communication.)
The expression for the tensor TBCLR that their computer search found
is
(3.2.2)

T_BCLR = lim_{t→0} (1/t)[(x12 + tx11)⊗(y21 + ty22)⊗z12
+ (x21 + tx11)⊗y11⊗(z11 + tz21)
− x12⊗y21⊗((z11 + z12) + tz22)
− x21⊗((y11 + y21) + ty12)⊗z11
+ (x12 + x21)⊗(y21 + ty12)⊗(z11 + tz22)]
and thus R(TBCLR ) ≤ 5. I discuss the geometry of this algorithm in §4.3.1.
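As a numerical sanity check of (3.2.2) (a sketch of mine; the index convention x_{ij} ↦ basis vector 2(i−1)+(j−1) of C⁴ is an assumption of the sketch), one can verify the right-hand side converges to T_BCLR at rate O(t):

import numpy as np

def e(i, n=4):
    v = np.zeros(n); v[i] = 1.0; return v

def r1(a, b, c):  # rank-one tensor a ⊗ b ⊗ c
    return np.einsum('i,j,k->ijk', a, b, c)

idx = lambda i, j: 2*(i-1) + (j-1)  # x_{ij} -> basis vector of C^4
x = {(i, j): e(idx(i, j)) for i in (1, 2) for j in (1, 2)}
y, z = x, x  # same convention in each factor

# T_BCLR: the M_<2> tensor with the entry x_{22} set to zero
T = (r1(x[1,1], y[1,1], z[1,1]) + r1(x[1,1], y[2,1], z[1,2])
   + r1(x[1,2], y[1,2], z[1,1]) + r1(x[1,2], y[2,2], z[1,2])
   + r1(x[2,1], y[1,1], z[2,1]) + r1(x[2,1], y[2,1], z[2,2]))

def rhs(t):  # the five rank-one terms of (3.2.2), divided by t
    return (r1(x[1,2] + t*x[1,1], y[2,1] + t*y[2,2], z[1,2])
          + r1(x[2,1] + t*x[1,1], y[1,1], z[1,1] + t*z[2,1])
          - r1(x[1,2], y[2,1], z[1,1] + z[1,2] + t*z[2,2])
          - r1(x[2,1], y[1,1] + y[2,1] + t*y[1,2], z[1,1])
          + r1(x[1,2] + x[2,1], y[2,1] + t*y[1,2], z[1,1] + t*z[2,2])) / t

for t in (1e-1, 1e-2, 1e-3):
    print(t, np.abs(rhs(t) - T).max())  # error is O(t)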
Exercise 3.2.2.1: Use (3.2.2) to show R(Mh2,2,3i ) ≤ 10. }
Using Proposition 3.2.1.10 we conclude:
Theorem 3.2.2.2. [BCRL79] (1979) ω < 2.78.
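(For orientation — my arithmetic: by Exercise 3.2.2.1 and Proposition 3.2.1.10, ω ≤ 3 log 10/log 12 = 2.7798... < 2.78.)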
3.3. Schönhage’s upper bounds
The next contribution to upper bounds for the exponent of matrix multiplication was Schönhage’s realization that a border rank version of Strassen’s
additivity conjecture described below is false, and that this failure could be
exploited to prove further upper bounds on the exponent. From a geometric
perspective, this result is useful because it allows one to prove upper bounds
with tensors that are easier to analyze because of their low border rank.
3.3.1. Strassen’s additivity conjecture. Given T1 ∈ A1 ⊗B1 ⊗C1 and
T2 ∈ A2 ⊗B2 ⊗C2 , if one considers T1 +T2 ∈ (A1 ⊕A2 )⊗(B1 ⊕B2 )⊗(C1 ⊕C2 ),
where each Aj ⊗Bj ⊗Cj is naturally included in (A1 ⊕ A2 )⊗(B1 ⊕ B2 )⊗(C1 ⊕
C2 ), we saw that R(T1 + T2 ) ≤ R(T1 ) + R(T2 ).
Conjecture 3.3.1.1. [Str73] With the above notation, R(T1 +T2 ) = R(T1 )+
R(T2 ).
Let's consider a potential border rank version of Conjecture 3.3.1.1. Say T₁ + T₂ is as above and consider (T₁ + T₂)(C₁∗ ⊕ C₂∗). The image looks like

[ T₁(C₁∗)   0        ]
[ 0         T₂(C₂∗)  ].
One might attempt to construct a curve leading to a counter-example via
geometry as follows: say that dim C2 = 1, e.g., that T2 = Mh1,1,N i . If one
takes a curve with zero-th order terms in A1 ⊗B1 ⊗C2 , and takes one derivative, one can have terms in A1 ⊗B1 ⊗C1 and after two derivatives, one can
have terms in both A1 ⊗B1 ⊗C1 and A2 ⊗B2 ⊗C2 . The zero-th order terms
must be arranged to all cancel. Schönhage accomplishes this in the simplest
possible way: he takes dimensions sufficiently unbalanced that there are
more terms than the dimension of A1 ⊗B1 ⊗C2 , so they are linearly dependent and easily arranged to cancel. What is more subtle is the cancellation
of the first order terms, whose geometry I leave to the reader to explore.
3.3.2. Schönhage’s example. Recall from Exercise 2.2.4.5 that R(Mh1,m,ni ) =
mn and R(Mh1,1,N i ) = N .
Theorem 3.3.2.1 (Schönhage [Sch81]). Set N = (n − 1)(m − 1). Then
R(Mh1,m,ni ⊕ Mh1,1,N i ) = mn + 1.
Proof.

T = lim_{t→0} (1/t²) [ Σ_{u=1}^{m−1} Σ_{v=1}^{n−1} (x_u + t x_{uv})⊗(y_v + t y_{uv})⊗(z + t² z_{uv})
+ Σ_{u=1}^{m−1} x_u⊗(y_n + t(−Σ_v y_{uv}))⊗(z + t² z_{un})
+ Σ_{v=1}^{n−1} (x_m + t(−Σ_u x_{uv}))⊗y_v⊗(z + t² z_{mv})
+ x_m⊗y_n⊗(z + t² z_{mn}) − (Σ_i x_i)⊗(Σ_s y_s)⊗z ]. □
3.3.3. Schönhage's asymptotic sum inequality. To develop intuition for how an upper bound on a sum of matrix multiplications could give an upper bound on a single matrix multiplication, say we knew R(M⟨n⟩^{⊕s}) ≤ r with s ≤ n³. Then to compute M⟨n²⟩ we could write M⟨n²⟩ = M⟨n⟩⊗M⟨n⟩. At worst this is evaluating n³ disjoint copies of M⟨n⟩. Now group these disjoint copies in groups of s and apply the bound to obtain a savings.
Here is the precise statement:
Theorem 3.3.3.1 (Schönhage's asymptotic sum inequality). [Sch81] For all l_i, m_i, n_i, with 1 ≤ i ≤ s:

Σ_{i=1}^s (m_i n_i l_i)^{ω/3} ≤ R(⊕_{i=1}^s M⟨m_i,n_i,l_i⟩).
Remark 3.3.3.2. A similar result (also proven in [Sch81]) holds for the
border rank of the multiplication of matrices with some entries equal to zero,
where the product mi ni li is replaced by the number of multiplications in
the naı̈ve algorithm for the matrices with zeros.
Following notes of M. Bläser [Blä13], we first analyze a special case that isolates the new ingredient:

Lemma 3.3.3.3. If R(M⟨n⟩^{⊕s}) ≤ r, then ω ≤ log⌈r/s⌉/log n. In particular, sn^ω ≤ R(M⟨n⟩^{⊕s}).

Proof. It is sufficient to show that for all N,

(3.3.1)  R(M⟨n^N⟩^{⊕s}) ≤ ⌈r/s⌉^N s,
because (3.3.1) would imply R(M⟨n^N⟩) ≤ ⌈r/s⌉^N s and thus

ω ≤ lim_{N→∞} log(⌈r/s⌉^N s)/log(n^N) = log⌈r/s⌉/log n.
We prove (3.3.1) by induction. The hypothesis is the case N = 1. Assume (3.3.1) holds up to N and compute

M⟨n^{N+1}⟩^{⊕s} = M⟨n⟩^{⊕s}⊗M⟨n^N⟩.

Now R(M⟨n⟩^{⊕s}) ≤ r implies M⟨n⟩^{⊕s} ∈ \overline{GL_r^{×3} · M⟨1⟩^{⊕r}}. Thus R(M⟨n^{N+1}⟩^{⊕s}) ≤ R(M⟨1⟩^{⊕r}⊗M⟨n^N⟩). Recall that M⟨1⟩^{⊕t}⊗M⟨n^N⟩ = M⟨n^N⟩^{⊕t}. Now

R(M⟨n^N⟩^{⊕r}) ≤ R(M⟨n^N⟩^{⊕⌈r/s⌉s})
≤ R(M⟨1⟩^{⊕⌈r/s⌉}⊗M⟨n^N⟩^{⊕s})
≤ R(M⟨1⟩^{⊕⌈r/s⌉})R(M⟨n^N⟩^{⊕s})
≤ ⌈r/s⌉(⌈r/s⌉^N s),

where the last line follows from the induction hypothesis. □
The general case essentially follows from the above lemma and repeating
arguments used previously: one first takes a high tensor power of the sum,
then switches to rank at the price of introducing an h that washes out in
the end. The new tensor is a sum of products of matrix multiplications that
one converts to a sum of matrix multiplications. One then takes the worst
term in the summation and estimates with respect to it (multiplying by the
number of terms in the summation), and applies the lemma to conclude.
Corollary 3.3.3.4. [Sch81] (1981) ω < 2.55.

Proof. Applying Theorem 3.3.3.1 to R(M⟨1,m,n⟩ ⊕ M⟨(m−1)(n−1),1,1⟩) = mn + 1 gives

(mn)^{ω/3} + ((m − 1)(n − 1))^{ω/3} ≤ mn + 1,

and taking m = n = 4 gives the result. □
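A quick numerical check of this optimization (my sketch): f(ω) = 16^{ω/3} + 9^{ω/3} is increasing, so the best bound from m = n = 4 is the root of f(ω) = 17:

import math

f = lambda w: 16**(w/3) + 9**(w/3) - 17  # increasing in w

lo, hi = 2.0, 3.0
for _ in range(60):  # bisection
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
print(hi)  # ≈ 2.5479, hence ω < 2.55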
In [CW82] they prove that for any tensor T that is a direct sum of disjoint matrix multiplications, if R(T) ≤ r, then there exist N and k such that R(T^{⊗k} ⊕ M⟨1,1,N⟩) ≤ r^k + 1. In particular, setting N = R(M⟨l,m,n⟩) − l(m + n − 1),

R(M⟨l,m,n⟩ ⊕ M⟨1,1,N⟩) ≤ R(M⟨l,m,n⟩) + l.

This implies:
Theorem 3.3.3.5. [CW82] For all l_i, m_i, n_i, with 1 ≤ i ≤ s:

Σ_{i=1}^s (m_i n_i l_i)^{ω/3} < R(⊕_{i=1}^s M⟨m_i,n_i,l_i⟩).
Thus to determine the exponent of matrix multiplication, one will need
to look at sequences of tensors.
3.4. Strassen’s laser method
3.4.1. Introduction. Recall our situation: we don't understand rank or even border rank in the range we would need to prove upper bounds on ω via M⟨n⟩. So we showed upper bounds on ω could be proved first with rectangular matrix multiplication, then with sums of disjoint matrix multiplications which had the property that the border rank of the sum was less than the sum of the border ranks, where the border ranks were sufficiently low that they were in the range of border ranks we can determine with equations (although Schönhage determined the border rank via an explicit algorithm).
We also saw that to determine the exponent, one would need to deal
with sequences of tensors. Strassen’s laser method is based on taking high
tensor powers of a fixed tensor, but then degenerating it in a different way for
each power. Because the degenerations are different, there is no theoretical
obstruction to determining ω exactly via Strassen’s method.
Starting with Strassen’s method, all attempts to determine ω aim at best
for a Pyrrhic victory in the sense that even if ω were determined by these
methods, they would not give any indication as to what would be optimally
fast matrix multiplication for any given size matrix.
Definition 3.4.1.1. Given T ∈ A⊗B⊗C, if T′ ∈ \overline{GL(A) × GL(B) × GL(C) · T} (the orbit closure), one says T degenerates to T′.

Recall from Proposition 2.4.12.1 that if T degenerates to T′, then R(T′) ≤ R(T).
In Schönhage’s work, raising tensors to a power was used to prove results about particular starting tensors. In Strassen’s work one utilizes the
tensor powers directly to obtain upper bounds on ω that could potentially
determine ω exactly. Note that if T degenerates to T′ then T^{⊗N} degenerates to T′^{⊗N}.
What tensors T are good candidates to have both low border rank and
admit a degeneration to a “good” sum of matrix multiplication tensors? To
formalize this question, we make a definition:
Definition 3.4.1.2. For T ∈ A⊗B⊗C, N ≥ 1 and ρ ∈ [2, 3], define V^d_{ρ,N}(T) to be the maximum of Σ_{i=1}^s (l_i m_i n_i)^{ρ/3} over all degenerations of T^{⊗N} to ⊕_{i=1}^s M⟨l_i,m_i,n_i⟩, over all choices of s, l_i, m_i, n_i, and define the degeneracy value of T to be V^d_ρ(T) := sup_N V^d_{ρ,N}(T)^{1/N}.
The supremum in the definition can be replaced by a limit, since the sequence log V^d_{ρ,N}(T) is super-additive, so Fekete's lemma implies (1/N) log V^d_{ρ,N}(T) tends to a limit. See [AFL14] for details.
The utility of value is that an analogue of the asymptotic sum inequality holds:

Theorem 3.4.1.3. Σ_{i=1}^s V^d_ω(T_i) ≤ R(⊕_{i=1}^s T_i).
The proof is similar to the proof of the asymptotic sum inequality. It is clear that V^d_ω(T₁⊗T₂) ≥ V^d_ω(T₁)V^d_ω(T₂). To show V^d_ω(T₁ ⊕ T₂) ≥ V^d_ω(T₁) + V^d_ω(T₂), one expands out V^d_{ω,N}(T₁ ⊕ T₂); the result is a sum of products with coefficients, but as with the asymptotic sum inequality, one can essentially just look at the largest term, and as N tends to infinity, the coefficient becomes irrelevant after taking N-th roots.
Since degeneracy value is a non-decreasing function of ρ, if we could show V^d_ρ(T) ≥ R(T) for some ρ, we would obtain the upper bound ω ≤ ρ. Thus tensors of low border rank with high degeneracy value give upper bounds on ω. The problem is that we have no systematic way of estimating degeneracy value. In subsequent work, researchers restrict to a special type of value that is possible to estimate.

I expect that value is related to tensor rank, so that to obtain upper bounds on ω, one should look for tensors whose rank is much larger than their border rank.
3.4.2. Tensors of low border rank. Not much is known about the possible ranks of tensors of low border rank. It is classical that elements of
σ2 (Seg(PA×PB×PC))\Seg(PA×PB×PC) are either of rank two, or lie on a
general tangent line to the Segre, in which case their rank is three, and then
there is the normal form: T = a1 ⊗b1 ⊗c1 +a1 ⊗b1 ⊗c2 +a1 ⊗b2 ⊗c1 +a2 ⊗b1 ⊗c1 .
Normal forms for the third secant variety were determined in [BL14]:
Theorem 3.4.2.1. [BL14] Normal forms for points of σ₃(Seg(PA × PB × PC))\σ₂(Seg(PA × PB × PC)) are (up to permuting the roles of A, B, C in the last case) as follows:
(1) a1 ⊗b1 ⊗c1 + a2 ⊗b2 ⊗c2 + a3 ⊗b3 ⊗c3
(2) a1 ⊗b1 ⊗c1 + a2 ⊗b2 ⊗c3 + a2 ⊗b3 ⊗c2 + a3 ⊗b2 ⊗c2
(3) a1 ⊗b1 ⊗c2 +a1 ⊗b2 ⊗c1 +a2 ⊗b1 ⊗c1 +a1 ⊗b3 ⊗c3 +a3 ⊗b1 ⊗c3 +a3 ⊗b3 ⊗c1
(4) a3 ⊗b1 ⊗c1 + a1 ⊗b2 ⊗c2 + a1 ⊗b1 ⊗c2 + a2 ⊗b3 ⊗c1 + a2 ⊗b1 ⊗c3 .
The points of type (1) form a Zariski open subset; those of type (2) are of the form x + y + y′ where x, y ∈ Ŝeg(PA × PB × PC) and y′ ∈ T̂_y Ŝeg(PA × PB × PC), and have rank four; those of type (3) are of the form x + x′ + x′′ where x(t) is a curve in the Segre, and have rank five; and those of type (4) are of the form x′ + y′ where [x], [y] are two points lying on a line in the Segre, and have rank five.
For higher secant varieties, all these types of points generalize, and many
kinds of new phenomena arise as well.
Recall how Schönhage's example began: one took q + 1 points on a P^q ⊂ Seg(PA × PB × PC). Then any point in the linear span of the tangent spaces is a point on σ_{q+1}(Seg(PA × PB × PC)).
Exercise 3.4.2.2: Show that for a linear space on the Segre, e.g., a γ-space of the form P(a⊗b⊗U), if c₁, ..., c_u is a basis of U ⊂ C, then any x ∈ P(a⊗b⊗U) satisfies T̂_x Seg(PA × PB × PC) ⊂ Σ_{j=1}^u T̂_{[a⊗b⊗c_j]} Seg(PA × PB × PC). (For those familiar with uni-ruled varieties, an analogous statement holds for arbitrary uni-ruled varieties.)
By Exercise 3.4.2.2 there are points of σ̂_{q+2}(Seg(PA × PB × PC)) of the form

(3.4.1)  x₁′′ + ··· + x_q′′,

where x₁(t), ..., x_q(t) are curves in the Segre with the x_j(0) all lying on at most a P^{q−2} and summing to zero, and the x_j′(0) all summing to zero as well. An extreme case of this is when all the x_j(0) coincide.
It may be useful to consider the corresponding curve in the Grassmannian. For example, consider the extreme case and the curve

[(a₀⊗b₀⊗c₀) ∧ (t(a₀ + ta₁)⊗(b₀ + tb₁)⊗(c₀ + tc₁)) ∧ ··· ∧ (t(a₀ + ta_q)⊗(b₀ + tb_q)⊗(c₀ + tc_q)) ∧ ((a₀ − t² Σ_{j=1}^q a_j)⊗(b₀ − t² Σ_{j=1}^q b_j)⊗(c₀ − t² Σ_{j=1}^q c_j))].

The resulting tensor is

(3.4.2)  T_CW := Σ_{j=1}^q (a₀⊗b_j⊗c_j + a_j⊗b₀⊗c_j + a_j⊗b_j⊗c₀) ∈ C^{q+1}⊗C^{q+1}⊗C^{q+1}.
Exercise 3.4.2.3: Show that the tensor

(3.4.3)  T̃_CW := Σ_{j=1}^q (a₀⊗b_j⊗c_j + a_j⊗b₀⊗c_j + a_j⊗b_j⊗c₀) + a₀⊗b₀⊗c_{q+1} + a₀⊗b_{q+1}⊗c₀ + a_{q+1}⊗b₀⊗c₀ ∈ C^{q+2}⊗C^{q+2}⊗C^{q+2}

also has border rank q + 2, by modifying the curves used to obtain T_CW. }
Exercise 3.4.2.4: Show R(TCW ) ≤ 2q. }
For future reference, we record our observation:
Proposition 3.4.2.5. R(TCW ) = R(T̃CW ) = q + 2.
**need to add proof of lower bound***
Here the matrices are

T_CW(C∗) =
[ 0    c₁   c₂   ⋯   c_q ]
[ c₁   c₀   0    ⋯   0   ]
[ c₂   0    c₀   ⋯   0   ]
[ ⋮                ⋱     ]
[ c_q  0    ⋯    0   c₀  ]

and

T̃_CW(C∗) =
[ c_{q+1}  c₁   c₂   ⋯   c_q  c₀ ]
[ c₁       c₀   0    ⋯   0    0  ]
[ c₂       0    c₀   ⋯   0    0  ]
[ ⋮                    ⋱         ]
[ c_q      0    ⋯    0   c₀   0  ]
[ c₀       0    ⋯    0   0    0  ]
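For a numerical illustration of the upper bound in Proposition 3.4.2.5 (a sketch of mine; it uses a standard limiting expression with q + 2 rank-one terms realizing the extreme case above — the specific expression is my assumption, as the text derives the bound via the Grassmannian curve):

import numpy as np

q = 4
n = q + 1

def e(i):
    v = np.zeros(n); v[i] = 1.0; return v

def r1(a, b, c):
    return np.einsum('i,j,k->ijk', a, b, c)

# T_CW = sum_j (a0 b_j c_j + a_j b0 c_j + a_j b_j c0),
# identifying all three factors with C^{q+1}
T = sum(r1(e(0), e(j), e(j)) + r1(e(j), e(0), e(j)) + r1(e(j), e(j), e(0))
        for j in range(1, n))
s = sum(e(j) for j in range(1, n))

def rhs(t):  # q + 2 rank-one terms
    out = sum(r1(e(0) + t*e(j), e(0) + t*e(j), e(0) + t*e(j))
              for j in range(1, n)) / t**2
    out -= r1(e(0) + t**2*s, e(0) + t**2*s, e(0) + t**2*s) / t**3
    out += (1/t**3 - q/t**2) * r1(e(0), e(0), e(0))
    return out

for t in (1e-1, 1e-2):
    print(t, np.abs(rhs(t) - T).max())  # error is O(t): border rank <= q + 2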
3.4.3. Strassen's tensor. Strassen uses the following tensor:

(3.4.4)  T_STR = Σ_{j=1}^q (a₀⊗b_j⊗c_j + a_j⊗b₀⊗c_j) ∈ C^{q+1}⊗C^{q+1}⊗C^q,

which has border rank q + 1, as these are tangent vectors to q points of the P^{q−1} = P(a₀⊗b₀⊗⟨c₁, ..., c_q⟩) that lies on the Segre, as discussed in §4.3.1. The vector a₀⊗b_j⊗c_j + a_j⊗b₀⊗c_j is in the tangent space at a₀⊗b₀⊗c_j, showing the border rank is at most q + 1; since the tensor is concise, we obtain equality.
The corresponding linear spaces are:

T_STR(C∗) =
[ 0    c₁   c₂   ⋯   c_q ]
[ c₁   0    0    ⋯   0   ]
[ c₂   0    0    ⋯   0   ]
[ ⋮                ⋱     ]
[ c_q  0    0    ⋯   0   ]

and

T_STR(A∗) =
[ a₁   a₂   ⋯   a_q ]
[ a₀   0    ⋯   0   ]
[ 0    a₀   ⋯   0   ]
[ ⋮          ⋱      ]
[ 0    ⋯    0   a₀  ]
Consider T̃ := T_STR⊗σ(T_STR)⊗σ²(T_STR), where σ denotes a cyclic permutation of the three tensor factors. Thus T̃ ∈ C^{q(q+1)²}⊗C^{q(q+1)²}⊗C^{q(q+1)²} and R(T̃) ≤ (q + 1)³.
Write a_{αβγ} := a_α⊗a_β⊗a_γ, and similarly for b's and c's. Then, omitting the ⊗'s:

T̃ = Σ_{i,j,k=1}^q (a_{ij0}b_{0jk}c_{i0k} + a_{ijk}b_{0jk}c_{i00} + a_{ij0}b_{00k}c_{ijk} + a_{ijk}b_{00k}c_{ij0}
+ a_{0j0}b_{ijk}c_{i0k} + a_{0jk}b_{ijk}c_{i00} + a_{0j0}b_{i0k}c_{ijk} + a_{0jk}b_{i0k}c_{ij0}).
There are four kinds of A-indices, ij0, ijk, 0j0 and 0jk. To emphasize
this, and to suggest what will come next, label these with superscripts 11,
21, 12 and 22. Label each of the B and C indices (which come in four types
as well) similarly. We obtain:
T̃ = Σ_{i,j,k=1}^q (a^{11}_{ij0}b^{11}_{0jk}c^{11}_{i0k} + a^{21}_{ijk}b^{11}_{0jk}c^{12}_{i00} + a^{11}_{ij0}b^{12}_{00k}c^{21}_{ijk} + a^{21}_{ijk}b^{12}_{00k}c^{22}_{ij0}
+ a^{12}_{0j0}b^{21}_{ijk}c^{11}_{i0k} + a^{22}_{0jk}b^{21}_{ijk}c^{12}_{i00} + a^{12}_{0j0}b^{22}_{i0k}c^{21}_{ijk} + a^{22}_{0jk}b^{22}_{i0k}c^{22}_{ij0}).
This expression has the structure of block 2×2 matrix multiplication, where
inside the blocks are rectangular matrix multiplications. E.g., the first block
is M⟨q,q,q⟩, the second is M⟨q²,q,1⟩, etc.
Were the matrix multiplications disjoint, we could apply the asymptotic sum inequality. To obtain disjoint matrix multiplications, we could set some coefficients to zero. For example, we could set all terms with a^{12}, b^{12}, c^{12}, a^{21}, b^{21}, c^{21} to zero, to be left with two independent matrix multiplications. But it makes far more sense to use Theorem 2.4.12.4 to keep three (= ⌊(3/4)2²⌋) independent matrix multiplications. Applying the asymptotic
sum inequality, we obtain 3q^ω ≤ (q + 1)³, and taking q = 7 gives ω < 2.642, which is not as good as Schönhage's bound.
We can do better by taking a high tensor power of T̃, as T̃^{⊗N} contains 2^{2N} matrix multiplications M⟨l,m,n⟩, all with lmn = q^{3N}, and again by Theorem 2.4.12.4 we may keep (3/4)2^{2N} of them. The asymptotic sum inequality applied to the degenerated tensor gives

(3/4)2^{2N} q^{Nω} ≤ (q + 1)^{3N}.

Taking N-th roots and letting N tend to infinity, the 3/4 goes away and we obtain

2²q^ω ≤ (q + 1)³.
Finally, the case q = 5 implies:
Theorem 3.4.3.1. [Str87] (1987) ω < 2.48 .
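As a quick check (mine) of the final inequality 2²q^ω ≤ (q + 1)³ and the choice q = 5:

import math

def w_bound(q):  # solve 4 * q**w = (q+1)**3 for w
    return math.log((q + 1)**3 / 4) / math.log(q)

for q in range(3, 9):
    print(q, round(w_bound(q), 4))
# q = 5 minimizes the bound, giving w ≈ 2.4785, i.e. ω < 2.48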
3.5. The Coppersmith-Winograd method
Degeneracy value had the defect that we have no way of computing it in general. I now define two weaker notions of value; the second has the advantage that it is computable, but the disadvantage that it depends on a choice of coordinates. (However, Strassen only applied his method to coordinate degenerations as well.)
3.5.1. The value of a tensor. Let End(A) × End(B) × End(C) act on A⊗B⊗C by the action inherited from the GL(A) × GL(B) × GL(C) action (not the Lie algebra action). Then for all X ∈ End(A) × End(B) × End(C) and T ∈ A⊗B⊗C, we have R(X · T) ≤ R(T), and likewise for border rank, by Exercise 2.2.3.1.
Definition 3.5.1.1. One says T restricts to T′ if there exists X ∈ End(A) × End(B) × End(C) such that T′ = X · T.

Definition 3.5.1.2. For T ∈ A⊗B⊗C, N ≥ 1 and ρ ∈ [2, 3], define V^r_{ρ,N}(T) to be the maximum of Σ_{i=1}^s (l_i m_i n_i)^{ρ/3} over all restrictions of T^{⊗N} to ⊕_{i=1}^s M⟨l_i,m_i,n_i⟩, and define the restriction value of T to be V^r_ρ(T) := sup_N V^r_{ρ,N}(T)^{1/N}.
All the results for degeneracy value hold for restriction value, in particular the analogue of the asymptotic sum inequality. Coppersmith-Winograd and all subsequent work use only the following type of restriction:

Definition 3.5.1.3. Let A, B, C be given bases b_A, b_B, b_C. We say T ∈ A⊗B⊗C combinatorially restricts to T′ with respect to b_A, b_B, b_C if T restricts to T′ by coordinate projections in these bases.
For T′ a matrix multiplication tensor, following [CU03], this may be phrased as follows: write a_α, b_β, c_γ for the given bases of A, B, C, and write T = Σ_{α=1}^a Σ_{β=1}^b Σ_{γ=1}^c t^{α,β,γ} a_α⊗b_β⊗c_γ. Then "T ∈ C^a⊗C^b⊗C^c combinatorially restricts to M⟨l,m,n⟩" means that there exist functions

α : [l] × [m] → [a]
β : [m] × [n] → [b]
γ : [n] × [l] → [c]

such that

(3.5.1)  t^{α(i,u′),β(u,s′),γ(s,i′)} = 1 if i = i′, u = u′, s = s′, and 0 otherwise.
Definition 3.5.1.4. For T ∈ A⊗B⊗C, N ≥ 1 and ρ ∈ [2, 3], define V_{ρ,N}(T) to be the maximum of Σ_{i=1}^s (l_i m_i n_i)^{ρ/3} over all combinatorial degenerations of T^{⊗N} to ⊕_{i=1}^s M⟨l_i,m_i,n_i⟩, and define the combinatorial value (or value for short, since it is the value used in the literature) of T to be V_ρ(T) := lim_{N→∞} V_{ρ,N}(T)^{1/N}. (The limit is shown to exist in [DS13].)
Note that the values satisfy V^d_ρ ≥ V^r_ρ ≥ V_ρ. As with all the values, we have

• V_ρ(T) is a non-decreasing function of ρ,
• V_ω(T) ≤ R(T).

Thus if V_ρ(T) ≥ R(T), then ω ≤ ρ.
Combinatorial value can be estimated in principle, as for each N there are only a finite number of combinatorial restrictions. In practice, the tensor is presented in such a way that there are "obvious" combinatorial degenerations to disjoint matrix multiplication tensors, and at first one optimizes just among these obvious combinatorial degenerations. However, it may be that there are matrix multiplication tensors of the form Σ_j a₀⊗b_j⊗c_j as well as tensors of the form a₀⊗b_k⊗c_k where k is not in the range of j. Then one can merge these tensors to a₀⊗(Σ_j b_j⊗c_j + b_k⊗c_k), because although formally speaking they were not disjoint, they do not interfere with each other. So the actual procedure is to optimize among combinatorial restrictions with merged tensors.
Coppersmith and Winograd work with the tensors T_CW and T̃_CW of (3.4.2) and (3.4.3). They get their best result of ω < 2.3755 by merging T̃_CW^{⊗2} and then optimizing. In subsequent work, Stothers [Sto], resp. V. Williams [Wil], resp. LeGall [Gal], used merging with T̃_CW^{⊗4}, resp. T̃_CW^{⊗8}, resp. T̃_CW^{⊗16} and T̃_CW^{⊗32}, leading to the current "world record":

Theorem 3.5.1.5. [Gal] (2014) ω < 2.3728639.
Ambainis, Filmus and LeGall [AFL14] showed that taking higher powers of T̃CW when q ≥ 5 cannot be used to prove ω < 2.30 by this method
alone. Thus one either needs to develop new methods, or find better base
tensors.
To get an idea of how the optimization procedure works, start with some base tensor T that contains a collection of matrix multiplication tensors M⟨l_i,m_i,n_i⟩, 1 ≤ i ≤ x, that are not disjoint. Then T^{⊗N} will contain matrix multiplication tensors of the form M⟨l_μ,m_μ,n_μ⟩, where l_μ = l_{μ₁}···l_{μ_N} and similarly for m_μ, n_μ, with μ_j ∈ [x].

Each matrix multiplication tensor will occur with a certain multiplicity and certain variables. The problem becomes to zero out variables in a way that maximizes the value of what remains. More precisely, for large N, one wants to maximize the sum Σ_j M_j (l_{μ^j} m_{μ^j} n_{μ^j})^{ρ/3}, where the surviving matrix multiplication tensors are M⟨l_{μ^j},m_{μ^j},n_{μ^j}⟩^{⊕M_j} and disjoint. One then takes the smallest ρ such that Σ_j M_j (l_{μ^j} m_{μ^j} n_{μ^j})^{ρ/3} ≥ R(T) and concludes ω ≤ ρ.
There are many excellent expositions of this method, which involves exploiting the Salem-Spencer theorem on arithmetic progressions to ensure one can get away with only zeroing out a relatively small number of terms; in the general case one assigns probability distributions and performs an optimization to determine what percentage of each type gets zeroed out. I suggest [CW82] for the basic idea and [AFL14] for the state of the art.
3.6. The Cohn-Umans program
A conceptually appealing approach to proving upper bounds on ω was initiated by H. Cohn and C. Umans.
One works with two geometrically defined bases. In one (the “matrix
coefficient basis”), one gets an upper bound on the rank of the tensor, and
in the other (the “standard basis”) there are many potential combinatorial
degenerations and one gets a lower bound on the value.
Instead of finding arbitrary tensors that combinatorially restrict to matrix multiplication, they look for more structured ones that do: the multiplication tensors in algebras (especially group algebras of finite groups) and
more generally tensors constructed from such (“coherent configurations”).
I state the needed representation theory now, and defer proofs of the
statements to §5.1. I then present their method.
3.6.1. Preliminaries.
Structure tensor of an algebra. Let A be an algebra with basis a₁, ..., a_a and dual basis α¹, ..., α^a. Write a_i a_j = Σ_k A^k_{ij} a_k for the multiplication in A. Define the structure tensor of A:

(3.6.1)  M_A := Σ_{i,j,k} A^k_{ij} α^i⊗α^j⊗a_k ∈ A∗⊗A∗⊗A.
The group algebra of a finite group. Let G be a finite group and let C[G]
denote the vector space of complex-valued functions on G, called the group
algebra of G. The following exercise justifies the name:
Exercise 3.6.1.1: Show that if the elements of G are g1 , . . . , gr , then C[G]
has a basis indexed δg1 , . . . , δgr , where δgi (gj ) = δij . Show that C[G] may be
given the structure of an algebra by defining δgi δgj := δgi gj and extending
linearly.
Thus if G is a finite group with elements g₁, ..., g_q, then M_{C[G]} = Σ_{g,h∈G} δ_g∗⊗δ_h∗⊗δ_{gh}.

Example 3.6.1.2.

M_{C[Z_m]} = Σ_{0≤i,j<m} δ_i∗⊗δ_j∗⊗δ_{i+j mod m}.

Notice that, labeling the elements of Z_m by x₀, ..., x_{m−1} for 0, ..., m − 1, one obtains a circulant matrix for M_{C[Z_m]}(A∗):

M_{C[Z_m]}(A∗) =
[ x₀       x₁   ⋯   x_{m−1} ]
[ x_{m−1}  x₀   x₁  ⋯       ]
[ ⋮                  ⋱      ]
[ x₁       x₂   ⋯   x₀      ]
What are R(MC[Zm ] ) and R(MC[Zm ] )? The space of circulant matrices
forms an abelian subspace, which indicates the rank and border rank might
be minimal or nearly minimal among concise tensors.
Exercise 3.6.1.3: Show that the centralizer of MC[Zm ] (x1 ) is MC[Zm ] (A∗ )
and thus R(MC[Zm ] ) = m.
We will determine the rank momentarily via the discrete Fourier transform.
Burnside’s theorem. In §5.1, we will see a proof of Burnside’s theorem and
the structure theorem of C[G] when G is finite:
Theorem 3.6.1.4. Let A be a finite dimensional simple algebra over C acting irreducibly on a finite-dimensional vector space V . Then A = End(V ).
More generally, a finite dimensional semi-simple algebra A over C is isomorphic to a direct sum of matrix algebras:
A ' M atd1 ×d1 (C) ⊕ · · · ⊕ M atdq ×dq (C)
for some d1 , . . . , dq .
Theorem 3.6.1.5. Let G be a finite group, then as a G × G-module and as
an algebra,
(3.6.2)
C[G] = ⊕i Vi∗ ⊗Vi
where the sum is over all the distinct irreducible representations of G. In
particular, if dim Vi = di , then
C[G] ' ⊕i M atdi ×di (C).
The (generalized) discrete Fourier transform. We have two natural expressions for M_{C[G]}: the original presentation in terms of the algebra multiplication, and the matrix coefficient basis coming from Theorem 3.6.1.5. The change of basis matrix from the standard expression to the matrix coefficient basis is called the (generalized) Discrete Fourier Transform (DFT). The classical DFT is the case G = Z_m, where the matrix is (e^{2πijk/m})_{0≤j,k≤m−1}.
Proposition 3.6.1.6. The rank and border rank of M_{C[Z_m]} both equal m.

Proof. In the matrix coefficient basis,

M_{C[Z_m]}(A∗) = diag(y₀, y₁, ..., y_{m−1}),

i.e., all off-diagonal entries are zero. This is because the irreducible representations of Z_m, with σ ∈ Z_m a generator, are all of the form σ · v = e^{2πik/m}v for 0 ≤ k ≤ m − 1.

In other words, as a tensor, M_{C[Z_m]} = M⟨1⟩^{⊕m}. □
Exercise 3.6.1.7: Show that if T ∈ σ̂r0,h , then R(T ) ≤ r(h + 1). }
Exercise 3.6.1.8: Obtain a fast algorithm for multiplying two polynomials
in one variable by the method you used to solve the previous exercise. }
Consider the example of S₃. In the standard basis,

M_{C[S₃]}(A∗) =
[ x₀  x₁  x₂  x₃  x₄  x₅ ]
[ x₁  x₀  x₄  x₅  x₂  x₃ ]
[ x₂  x₅  x₀  x₄  x₃  x₁ ]
[ x₃  x₄  x₅  x₀  x₁  x₂ ]
[ x₄  x₃  x₁  x₂  x₅  x₀ ]
[ x₅  x₂  x₃  x₁  x₀  x₄ ]
Here x0 = Id, x1 = (12), x2 = (13), x3 = (23), x4 = (123) and x5 = (132).
The irreducible representations of S3 are the trivial [3], the sign [1, 1, 1] and
the two-dimensional standard representation [2, 1]. As a left S₃-module (rather than as an S₃ × S₃-module as in the statement of the theorem), [2, 1]⊗[2, 1] ≅ [2, 1]⊗C² ≅ [2, 1]^{⊕2}, so M_{C[S₃]} = M⟨1⟩^{⊕2} ⊕ M⟨2⟩, and in the matrix coefficient basis:

M_{C[S₃]}(A∗) =
[ y₀                          ]
[     y₁                      ]
[         y₂  y₃              ]
[         y₄  y₅              ]
[                  y₂  y₃     ]
[                  y₄  y₅     ]

where the blank entries are zero.
We conclude R(MC[S3 ] ) ≤ 1 + 1 + 7 = 9.
3.6.2. Upper bounds via finite groups. Here is the main idea: use the standard basis to get a lower bound on the value of M_{C[G]}, and the matrix coefficient basis to get an upper bound on its cost (i.e., its border rank).

Say M_{C[G]}, expressed in its standard basis, degenerates (in particular, if in standard coordinates it combinatorially restricts) to a sum of matrix multiplications, say ⊕_{j=1}^s M⟨l_j,m_j,n_j⟩. The standard basis is particularly well suited to combinatorial restrictions because all the coefficients of the tensor in this basis are zero or one. Using the matrix coefficient basis, we see M_{C[G]} = ⊕_{u=1}^q M⟨d_u⟩. Thus R(⊕_{j=1}^s M⟨l_j,m_j,n_j⟩) ≤ R(⊕_{u=1}^q M⟨d_u⟩), and likewise for border rank.
A variant of the proof of the asymptotic sum inequality implies:

Proposition 3.6.2.1. [CU03, CU] If M_{C[G]} degenerates to ⊕_{j=1}^s M⟨l_j,m_j,n_j⟩, and the d_u are the degrees of the irreducible representations of G, then Σ_{j=1}^s (l_j m_j n_j)^{ω/3} ≤ Σ_u d_u^ω.
In this section I will denote the canonical basis of C[G] given by the group elements (which I have been denoting δ_{g_i}) simply by g_i.

Say C[G] in the standard basis admits a combinatorial restriction to M⟨l,m,n⟩. The images of the functions α, β, γ in the combinatorial restriction (3.5.1) may be identified with subsets of G of the form S₁⁻¹S₂, S₂⁻¹S₃, and S₃⁻¹S₁, with |S₁| = l, |S₂| = m, |S₃| = n, and such that for any s_j, s_j′ ∈ S_j, if s₁′s₁⁻¹s₂′s₂⁻¹s₃′s₃⁻¹ = Id, then s₁′ = s₁, s₂′ = s₂, s₃′ = s₃. This is called the triple product property in [CU03]. There is a corresponding simultaneous triple product property when there is a combinatorial restriction to a collection of disjoint matrix multiplication tensors.
Example 3.6.2.2. [CKSU05] Let G = (Z_N^{×3} × Z_N^{×3}) ⋊ Z₂, where Z₂ acts by switching the two factors. Write elements of G as [(ω^i, ω^j, ω^k)(ω^l, ω^m, ω^n)τ^u], where 0 ≤ i, j, k, l, m, n ≤ N − 1, ω is a primitive N-th root of unity, and u ∈ {0, 1}. Set l = m = n = 2N(N − 1). Label the elements of [n] = [2N(N − 1)] by triples (a, b, ε) where 1 ≤ a ≤ N − 1, 0 ≤ b ≤ N − 1 and ε ∈ {0, 1}, and define

α : [l] × [m] → [a], ((a, b, ε), (a′, b′, ε′)) ↦ [(ω^a, 1, 1)(1, ω^b, 1)τ^ε]⁻¹[(1, ω^{a′}, 1)(1, 1, ω^{b′})τ^{ε′}],

β : [m] × [n] → [b], ((a, b, ε), (a′, b′, ε′)) ↦ [(1, ω^a, 1)(1, 1, ω^b)τ^ε]⁻¹[(1, 1, ω^{a′})(ω^{b′}, 1, 1)τ^{ε′}],

γ : [n] × [l] → [c], ((a, b, ε), (a′, b′, ε′)) ↦ [(1, 1, ω^a)(ω^b, 1, 1)τ^ε]⁻¹[(ω^{a′}, 1, 1)(1, ω^{b′}, 1)τ^{ε′}].

In other words, S₁ = {[(ω^a, 1, 1)(1, ω^b, 1)τ^ε]}, S₂ = {[(1, ω^a, 1)(1, 1, ω^b)τ^ε]} and S₃ = {[(1, 1, ω^a)(ω^b, 1, 1)τ^ε]}, where 1 ≤ a ≤ N − 1, 0 ≤ b ≤ N − 1 and ε ∈ {0, 1}.

Then one verifies the triple product property (there are several cases). Here |G| = 2N⁶; G has 2N³ irreducible representations of dimension 1 and \binom{N³}{2} of dimension 2. Thus R(M⟨n⟩) ≤ 2N³ + 8\binom{N³}{2}, which is less than n³ for all admissible n ≥ 40. Asymptotically this is about (7/16)n³. If one applies Proposition 3.6.2.1 with N = 17 (which is optimal), one obtains ω < 2.9088. Note that this does not even exploit Strassen's algorithm; using it one actually has R(M⟨n⟩) ≤ 2N³ + 7\binom{N³}{2}, however this does not affect the asymptotics.
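A quick check (mine) of the N = 17 claim, assuming the counts of irreducibles just given:

import math

N = 17
n = 2 * N * (N - 1)               # = 544
ones = 2 * N**3                   # number of 1-dimensional irreducibles
twos = N**3 * (N**3 - 1) // 2     # number of 2-dimensional irreducibles

# Proposition 3.6.2.1: n**w <= ones * 1**w + twos * 2**w; find the largest w.
def excess(w):
    return n**w - (ones + twos * 2**w)

lo, hi = 2.0, 3.0
for _ in range(60):  # bisection; excess is increasing in w near the root
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if excess(mid) > 0 else (mid, hi)
print(lo)  # ≈ 2.908..., i.e. ω < 2.9088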
While this is worse than what one would obtain just using Strassen's algorithm (writing 40 = 32 + 8 and using Strassen in blocks), the algorithm is different. In [CKSU05] they obtain a bound of ω < 2.41 by techniques similar to those of Coppersmith-Winograd.
3.6.3. Further ideas towards upper bounds. The structure tensor of C[G] had the convenient property that all its coefficients are zero or one. In [CU] they propose looking at combinatorial restrictions of more general structure tensors, where the coefficients can be more general. They make the following definition, which is very particular to matrix multiplication:

Definition 3.6.3.1. We say T ∈ A⊗B⊗C, given in bases a_α, b_β, c_γ of A, B, C, combinatorially supports M⟨l,m,n⟩ if, writing T = Σ t^{α,β,γ} a_α⊗b_β⊗c_γ,
there exist functions

α : [l] × [m] → [a]
β : [m] × [n] → [b]
γ : [n] × [l] → [c]

such that t^{α(i,u′),β(u,s′),γ(s,i′)} ≠ 0 if and only if i = i′, u = u′ and s = s′. (Recall that T combinatorially restricts to M⟨l,m,n⟩ if moreover t^{α(i,u),β(u,s),γ(s,i)} = 1 for all i, s, u.)
In other words, T combinatorially supports M⟨m,n,l⟩ if there exists a coordinate expression of T such that, upon setting some of the coefficients in the multi-dimensional matrix representing T to zero, one obtains mnl nonzero entries such that, in that coordinate system, matrix multiplication is supported on exactly those mnl entries. They then proceed to define the s-rank of a tensor T′, which is the lowest rank of a tensor T that combinatorially supports it. This is a strange concept, because the s-rank of a generic tensor is one: a generic tensor is combinatorially supported by T = (Σ_j a_j)⊗(Σ_k b_k)⊗(Σ_l c_l), where the a_j are a basis of A, etc.
Despite this, they show that ω ≤ (3/2)ω_s − 1, where ω_s is the analogue of the exponent of matrix multiplication for s-rank. In particular, ω_s = 2 would imply ω = 2. The idea of the proof is that if T combinatorially supports M⟨n⟩, then T^{⊗3} combinatorially degenerates to M⟨n⟩^{⊕t} with t = O(n^{2−o(1)}). (Compare with the situation when T combinatorially restricts to M⟨n⟩: then T^{⊗3} combinatorially restricts to M⟨n⟩⊗M⟨n²⟩ and thus (combinatorially) degenerates to M⟨n⟩^{⊕⌊(3/4)n²⌋} by Theorem 2.4.12.4.)
Chapter 4

The complexity of Matrix multiplication III: algorithms and "practical" upper bounds
One might argue that the exponent of matrix multiplication is unimportant for the world we live in, since ω might not be relevant until the sizes of the matrices are on the order of the number of atoms in the known universe. For implementation, it is more important to develop explicit algorithms that provide a savings for matrices of sizes that need to be multiplied in practice.
In this chapter I discuss the known algorithms. I begin, in §4.1 with
general remarks on the geometry of decompositions. Then, in §4.2 I present
a purely geometric derivation of Strassen's algorithm for M⟨2⟩, and in fact of all rank seven expressions for M⟨2⟩. There are very few cases where we know the
border rank of Mhl,m,ni . One case where we have reasonable understanding
is for Mh2,2,ni . In §4.3 I describe the known border rank algorithms in these
cases and their geometry. There have been numerous algorithms found by
numerical methods. In §4.4 I describe one of the main methods for finding
such, the alternating least squares method. For Mh3i we do not know the rank
or border rank, so one cannot hope for too much geometry in any algorithm
found. In §4.5 I present several of the rank 23 expressions for Mh3i and
remark on their geometry. I conclude in §4.6 with Smirnov’s border rank 20
algorithm for Mh3i and a discussion of its geometry.
Warning: this chapter is in very rough form.
4.1. Geometry of decompositions
4.1.1. The abstract secant variety. I now construct a variety that will
facilitate the study of decompositions of a tensor. I make the construction
in the more general context of secant varieties.
Let X ⊂ PV be a variety. Consider the set
Sr (X)0 := {(x1 , . . . , xr , z) ∈ X ×r ×PV | z ∈ span{x1 , . . . , xr }} ⊂ Seg(X ×r ×PV ) ⊂ PV ⊗r+1
and let S_r(X) := \overline{S_r(X)⁰} denote its Zariski closure. (For those familiar with quotients, it would be more convenient to deal with X^{(×r)} := X^{×r}/S_r.) We
have a map π 0 : Sr (X)0 → PV , extending to a map π : Sr (X) → PV , given
by projection onto the last factor and the image is σr0 (X) (resp. σr (X)).
We will call Sr (X) the abstract r-th secant variety of X. As long as r < v
and X is not contained in a linear subspace of PV , dim Sr (X) = rn + r − 1
because dim X ×r = rn and a general set of r points on X will span a Pr−1 .
If σr (X) is of the expected dimension, so its dimension equals that of
Sr (X), then for general points z ∈ σr (X)0 , (π 0 )−1 (z) will consist of a finite
number of points and each point will correspond to a decomposition z =
x1 + · · · + xr for xj ∈ x̂j , z ∈ ẑ. In summary:
Proposition 4.1.1.1. If X n ⊂ PN and σr (X) is of (the expected) dimension
rn + r − 1 < N , then a Zariski open subset of points on σr (X) have a finite
number of decompositions into a sum of r elements of X.
If the fiber of π 0 over z ∈ σr0 (X) is k-dimensional, then there is a
k-parameter family of decompositions of z as a sum of r rank one tensors. This occurs, for example if z ∈ σr−1 (X), but it can also occur for
points in σr (X)\σr−1 (X). We will see this is indeed the case for Mh2,2,2i ∈
σ7 (Seg(P3 × P3 × P3 )).
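For example (my arithmetic, anticipating §4.1.2, and using the standard non-defectivity fact that σ₇(Seg(P³ × P³ × P³)) = P⁶³): here X = Seg(P³ × P³ × P³) ⊂ P⁶³ has n = 9, so dim S₇(X) = 7·9 + 7 − 1 = 69 while the target has dimension 63; since π is a surjective projective morphism, every fiber, in particular the one over M⟨2,2,2⟩, has dimension at least 69 − 63 = 6, predicting a positive-dimensional family of rank seven decompositions.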
If X is a G-variety, then σr (X) is also a G-variety, and if z ∈ σr0 (X) is
fixed by Gz ⊂ G, then Gz will act (possibly trivially) on (π 0 )−1 (z), and every
distinct (up to trivialities) point in its orbit will correspond to a distinct
decomposition of z. Let q ∈ (π′)⁻¹(z). If dim(G_z · q) = d_z, then there is at
least a dz parameter family of decompositions of z as a sum of r elements
of X.
4.1.2. Decompositions of M⟨n⟩. Let A = C^{n²}. The variety σ_r(Seg(PA × PA × PA)) ⊂ P(A⊗A⊗A) is a G = (GL(A) × GL(A) × GL(A)) ⋊ S₃-variety. Recall from §2.1.3 that M⟨n⟩ ∈ A⊗A⊗A is invariant under (SL_n × SL_n × SL_n × Z₃) ⋉ Z₂. Thus there is potentially (at least) a (3n² − 3)-parameter family of decompositions for M⟨n⟩ in its minimal decomposition.
For Mh2i the variety of decompositions is completely understood, thanks
to the following theorem of de Groote:
Theorem 4.1.2.1. [dG78] The decompositions of M⟨2⟩ as a sum of seven rank one tensors are irredundantly parametrized by (SL₂ × SL₂ × SL₂ × Z₃) ⋉ Z₂. In particular, the space of decompositions is the (SL₂ × SL₂ × SL₂ × Z₃) ⋉ Z₂-orbit in S₇(Seg(P³ × P³ × P³)) of Strassen's algorithm (expressed as a tensor).
The proof follows from a careful analysis of every possible decomposition,
taking into account that an element a⊗b⊗c is not just a triple of vectors,
but a triple of endomorphisms C2 → C2 , and the analysis is via the possible
triples of ranks that can appear.
Remark 4.1.2.2. This potentially huge family of optimal decompositions
illustrates again just how special matrix multiplication is. For T ∈ A⊗B⊗C
let GT ⊂ GL(A) × GL(B) × GL(C) denote the subgroup preserving T . For
generic T , GT is trivial.
Now dim S_{r−1}(Seg(PA × PA × PA)) = (r − 1)(3n² − 3) + r − 2 = 3n²(r − 1) − 2r + 1, while dim G_{M⟨n⟩} = 3n² − 3, so dim S_{r−1}(Seg(PA × PA × PA)) + dim G_{M⟨n⟩} = 3n²r − 2r − 2, whereas dim S_r(Seg(PA × PA × PA)) = 3n²r − 2r − 1. So if the stabilizer of a point of π⁻¹(M⟨n⟩) in G_{M⟨n⟩} is finite, the dimensions fall just one short of what one would expect for an intersection with S_{r−1}(Seg(PA × PA × PA)), and this is independent of r. Since S_r(Seg(PA × PA × PA)) is not projective space, we would not be guaranteed an intersection even if the dimensions did match up.
4.2. Strassen’s algorithm for Mh2i
In this section I give the promised geometric derivation of Strassen's algorithm. I begin with a discussion of discrete symmetries of tensors in general.
4.2.1. The generalized Comon conjecture. In 2008 there was an AIM
workshop, Geometry and Representation theory of tensors for computer science, statistics and other areas, that brought together a very diverse group of
researchers. Among them was Pierre Comon, an engineer working in signal
processing. In signal processing (at least as practiced by Comon), one wants to
decompose tensors presumed to be of rank r explicitly into a sum of r rank
one tensors. Sometimes the relevant tensors are symmetric. At the workshop
Comon presented the conjecture that if a tensor happens to be symmetric
and of rank r (as a tensor), then it will admit a decomposition as a sum
of r rank one symmetric tensors. The algebraic geometers in the audience
reacted to the conjecture with, one might say, skepticism. The conjecture is
still open, see [BGL13] for a discussion. Here is a generalization:
Conjecture 4.2.1.1 (Generalized Comon Conjecture). [CHI+ ] Let T ∈
A⊗d and say R(T ) = r and T is invariant under some Γ ⊆ Sd . Then T
admits a Γ-invariant decomposition as a sum of r rank one tensors.
In particular, I expect there to be an optimal decomposition of matrix
multiplication that is Z3 -invariant.
Remark 4.2.1.2. This conjecture is similar to Strassen's additivity conjecture, in that in both cases a tensor that lives in a linear subspace
of the ambient space of tensors is conjectured to have an optimal decomposition using only elements of that subspace. See §?? for a discussion.
4.2.2. Z3 invariant tensors in V ⊗3 . To understand Z3 -invariant decompositions, it will be helpful to decompose V ⊗3 under the action of S3 ×
GL(V ). First let’s review the decomposition of V ⊗2 under S2 × GL(V )
from Exercise 2.3.5(2): V ⊗2 = [2]⊗S 2 V ⊕ [1, 1]⊗Λ2 V where [2]⊗S 2 V and
[1, 1]⊗Λ2 V are irreducible S2 × GL(V )-modules. As an S2 -module, the
first is the direct sum of $\binom{v+1}{2}$ copies of the trivial representation and the
second is the direct sum of $\binom{v}{2}$ copies of the sign representation. As a
GL(V )-module, both modules are irreducible. (To see this, note that the
GL(V )-orbit of any non-zero vector does not lie in a proper subspace.)
There are three irreducible representations of S3 , the trivial [3], the sign
[1, 1, 1], and the complement of the trivial for the action of S3 on C3 , [2, 1]
(see Example 2.3.2.3), which is two dimensional.
It is straightforward to see that S 3 V, Λ3 V are both irreducible for GL(V ),
and are the isotypic components respectively of the trivial representation
and the sign representation. By counting dimensions, we see we have not
accounted for everything. Thus there must be a subspace corresponding
to the isotypic component of [2, 1], of dimension $v^3 - \binom{v+2}{3} - \binom{v}{3} = \frac{2}{3}(v^3 - v)$.
To get something complementary to S 3 V and Λ3 V , consider V ⊗3 =
V ⊗2 ⊗V = (S 2 V ⊕ Λ2 V )⊗V . We have the GL(V )-module symmetrization
map S 2 V ⊗V → S 3 V . Its kernel must be a GL(V )-module. Since the map
is surjective, the dimension of the kernel is $\binom{v+1}{2}v - \binom{v+2}{3} = \frac{1}{3}(v^3 - v)$,
exactly half of what we are missing. The kernel of the skew-symmetrization
map Λ2 V ⊗V → Λ3 V provides the rest. Both of these kernels are irreducible
GL(V )-modules and they are isomorphic, which will be proved in §5.2. Label
the irreducible GL(V )-module S21 V , so
(4.2.1)
V ⊗3 = (S 3 V ⊗[3]) ⊕ (S21 V ⊗[2, 1]) ⊕ (Λ3 V ⊗[1, 1, 1])
as a GL(V ) × S3 -module, with each summand irreducible. In particular, as a GL(V )-module,
V ⊗3 = S 3 V ⊕ (S21 V )⊕2 ⊕ Λ3 V.
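As a quick sanity check on this bookkeeping, the following sketch (plain Python, standard library only; an editorial illustration rather than anything from the references) verifies that the three summands account for all of V ⊗3 :

from math import comb

# V^{⊗3} = S^3 V ⊕ (S_{21} V)^{⊕2} ⊕ Λ^3 V:
# dim S^3 V = C(v+2,3), dim Λ^3 V = C(v,3), dim S_{21} V = (v^3 - v)/3.
for v in range(1, 20):
    dim_sym = comb(v + 2, 3)       # S^3 V
    dim_alt = comb(v, 3)           # Λ^3 V
    dim_s21 = (v**3 - v) // 3      # S_{21} V
    assert v**3 == dim_sym + 2 * dim_s21 + dim_alt
print("V^{⊗3} dimension count checks out")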
Exercise 4.2.2.1: Show that the image of the map V ⊗Λ2 V → S 2 V ⊗V
obtained by inclusion into V ⊗V ⊗V combined with symmetrization on the
first two factors also has dimension $\frac{1}{3}(v^3 - v)$ and is isomorphic to S21 V as
a GL(V )-module.
We now consider how Z3 ⊂ S3 acts on each factor. It is generated by a
cyclic permutation.
Exercise 4.2.2.2: Show that the cyclic permutation σ = (1, 2, 3) acts trivially on S 3 V ⊕ Λ3 V (i.e., these lie in the +1 eigenspace for (1, 2, 3)), and
that S21 V ⊗[2, 1] splits into a direct sum of eigenspaces for ω and ω 2 , where
$\omega = e^{2\pi i/3}$ .
To see the splitting in bases, the Z3 invariant tensors in V ⊗3 have
naı̈ve building blocks of three types: v⊗v⊗v, which lies in S 3 V , v⊗v⊗w +
v⊗w⊗v + w⊗v⊗v, which also lies in S 3 V , and
u⊗v⊗w + v⊗w⊗u + w⊗u⊗v
= (1/2)(u⊗v⊗w + v⊗w⊗u + w⊗u⊗v + v⊗u⊗w + w⊗v⊗u + u⊗w⊗v)
+ (1/2)(u⊗v⊗w + v⊗w⊗u + w⊗u⊗v − v⊗u⊗w − w⊗v⊗u − u⊗w⊗v) ∈ S 3 V ⊕ Λ3 V.
Exercise 4.2.2.3: Decompose V ⊗4 as a GL(V )-module, keeping in mind
that V ⊗4 = V ⊗3 ⊗V = V ⊗2 ⊗V ⊗2 . }
4.2.3. Decompositions of Mh2i . We are looking for a nine-parameter
family of decompositions of Mh2i , parametrized by SL2 × SL2 × SL2 , among
which there should be Z3 -invariant algorithms. Recall the invariant description of Mh2i : let U, V, W = C2 , and write Mh2i = IdU ⊗IdV ⊗IdW ∈
(U ∗ ⊗V )⊗(V ∗ ⊗W )⊗(W ∗ ⊗U ) = A⊗B⊗C. To obtain a Z3 -invariant algorithm, we will need to identify A, B, C, which we may do by identifying
U, V, W . Explicitly, choose isomorphisms a0 : U → V and b0 : V → W ,
which determines c0 = a0 −1 b0 −1 : W → U . These elements, in bases for
V, W induced from those of U , will correspond to identity matrices. The
choices have already killed off two of our SL2 ’s and we are only left with
one, the diagonal SL2 inside SL(U )×SL(V )×SL(W ), which for convenience
I will identify with SL(U ).
I have been a little sloppy above, since the group we are really interested
in is the action of GL(U ) × GL(V ) × GL(W ) on our space, and when the
general linear group acts on projective space, the image group is denoted
P GLm := GLm /C∗ .
It remains to use up exactly a P GL(U )'s worth of freedom. Now
P GL2 acts three-transitively on CP1 , that is, given any three distinct points
in CP1 ' S 2 , there exists exactly one element of P GL2 that will move
them to e.g., 0, 1, ∞, i.e., [1, 0], [1, 1], [0, 1]. So our algorithm should be
obtained from the choice of a0 , b0 and three distinct points [u1 ], [u2 ], [u3 ] ∈
PU . Moreover, the algorithm should be unique if it is also Z3 -invariant.
We need to be careful here about what we mean by Z3 - and Z2 -invariance.
Write Z3^std for the Z3 ⊂ S3 that is cyclic permutation and Z2^T for the Z2
corresponding to transpose. The algorithm should be Z3^std -invariant if we
represent a0 , b0 (and hence c0 ) by identity matrices, and then the additional
Z2 -action will be Z2^T .
Our choices also determine three points [u1 ⊥ ], [u2 ⊥ ], [u3 ⊥ ] ∈ PU ∗ , as the
annihilator of a line in C2 is a line in C2∗ . Applying a0 etc. gives us triples
of points in each of PV, PV ∗ , PW, PW ∗ as well.
By the above principles, we should be able to obtain matrix multiplication as a sum of seven rank one tensors in a Z3 -invariant manner only from
this data. Since the Z3 -orbit of a rank one element of A⊗B⊗C consists of
either one or three rank one elements, the possible decompositions are 7 = 1 + 3 + 3,
7 = 1 + 1 + 1 + 1 + 3, or 7 = 1 + 1 + 1 + 1 + 1 + 1 + 1. The last case is ruled
out because it would result in a symmetric tensor. The first case is the most
promising, as we have the singleton a0 ⊗b0 ⊗c0 , which is Z3^std -invariant when
these are identity matrices, and to obtain the two triplets, we may use the six
terms ui ⊥ ⊗vj with i ≠ j. Note that as linear maps, the terms ui ⊥ ⊗vj all have
rank one.
Indeed, in Strassen's algorithm all terms except the first are the tensor
product of three rank one matrices; moreover the first term, as matrices, is
Id⊗Id⊗Id, which corresponds to a0 ⊗b0 ⊗c0 .
Consider the term (u1 ⊥ ⊗v3 )⊗(v2 ⊥ ⊗w1 )⊗(w3 ⊥ ⊗u2 ). The sum of its images under the action of Z3 is
⟨(u1 ⊥ ⊗v3 )⊗(v2 ⊥ ⊗w1 )⊗(w3 ⊥ ⊗u2 )⟩Z3 :=
(u1 ⊥ ⊗v3 )⊗(v2 ⊥ ⊗w1 )⊗(w3 ⊥ ⊗u2 )
+ (a0^T (v2 ⊥ )⊗b0^{-1} (w1 ))⊗(b0^T (w3 ⊥ )⊗c0^{-1} (u2 ))⊗(c0^T (u1 ⊥ )⊗a0^{-1} (v3 ))
+ ((c0^{-1})^T (w3 ⊥ )⊗a0 (u2 ))⊗((a0^{-1})^T (u1 ⊥ )⊗b0 (v3 ))⊗((b0^{-1})^T (v2 ⊥ )⊗c0 (w1 ))
= (u1 ⊥ ⊗v3 )⊗(v2 ⊥ ⊗w1 )⊗(w3 ⊥ ⊗u2 )
+ (u2 ⊥ ⊗v1 )⊗(v3 ⊥ ⊗w2 )⊗(w1 ⊥ ⊗u3 )
+ (u3 ⊥ ⊗v2 )⊗(v1 ⊥ ⊗w3 )⊗(w2 ⊥ ⊗u1 ).
We have slightly violated our principles, as ui ⊥ , vj etc. are only defined
up to scale, so we must adjust our expression to be invariant under our
choices of scale. Set λ = u1 ⊥ (v2 )v3 ⊥ (w1 )w2 ⊥ (u3 ) and µ = u1 ⊥ (v3 )v2 ⊥ (w1 )w3 ⊥ (u2 ).
When we divide the first triple by λ and the second by µ, we obtain elements
that do not depend on our choices of scales.
Theorem 4.2.3.1. [CHI+ ] Notations as above. The following is a 9-parameter family, parametrized by SL(U ) × SL(V ) × SL(W ), of rank seven
expressions for Mh2i :

(4.2.2)
Mh2i = a0 ⊗b0 ⊗c0
+ (1/µ) ⟨(u1 ⊥ ⊗v3 )⊗(v2 ⊥ ⊗w1 )⊗(w3 ⊥ ⊗u2 )⟩Z3
+ (1/λ) ⟨(u1 ⊥ ⊗v2 )⊗(v3 ⊥ ⊗w1 )⊗(w2 ⊥ ⊗u3 )⟩Z3 .

These algorithms are all possible expressions of Mh2i as a sum of seven
rank one tensors (up to trivialities).
If we fix bases such that a0 = b0 = Id, then the algorithms are also
Z3^std -invariant and we obtain a three parameter family of Z3^std -invariant algorithms.
To prove (4.2.2) holds, of course one could check it directly, e.g., taking
u1 = (1, 0), u2 = (0, 1), u3 = (1, 1), u1 ⊥ = (0, 1), u2 ⊥ = (1, 0), u3 ⊥ = (1, −1),
and a0 = b0 = Id. However instead I will give a geometric proof.
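Before the geometric proof, here is the small piece of arithmetic with which such a direct check begins, assuming the identifications vi = a0 (ui ) = ui and wi = b0 (vi ) = ui induced by a0 = b0 = Id:

import numpy as np

u = {1: np.array([1, 0]), 2: np.array([0, 1]), 3: np.array([1, 1])}
uperp = {1: np.array([0, 1]), 2: np.array([1, 0]), 3: np.array([1, -1])}

# λ = u1⊥(v2) v3⊥(w1) w2⊥(u3) and µ = u1⊥(v3) v2⊥(w1) w3⊥(u2), with v_i = w_i = u_i
lam = (uperp[1] @ u[2]) * (uperp[3] @ u[1]) * (uperp[2] @ u[3])
mu = (uperp[1] @ u[3]) * (uperp[2] @ u[1]) * (uperp[3] @ u[2])
print(lam, mu)  # 1 and -1

So the coefficients 1/λ, 1/µ in (4.2.2) are ±1 for these choices, consistent with the ±1 coefficients appearing in Strassen's algorithm.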
Proof. Since C{MhU,V,W i } is the unique instance of the trivial representation of GL(U )×GL(V )×GL(W ) in A⊗B⊗C = (U ∗ ⊗V )⊗(V ∗ ⊗W )⊗(W ∗ ⊗U ),
the matrix multiplication tensor is characterized by its symmetry group,
in the sense that no other tensor in A⊗B⊗C is preserved by a group
G0 ⊆ GL(U ) × GL(V ) × GL(W ). Thus to prove (4.2.2) it is sufficient
to show the right hand side is preserved by SL(U ) × SL(V ) × SL(W ), or
equivalently annihilated by sl(U ) + sl(V ) + sl(W ).
Assume a0 = b0 = Id. By Z3 -invariance it is sufficient to show the right
hand side is annihilated by sl(U ), which has basis u1 ⊥ ⊗u1 , u2 ⊥ ⊗u2 , u3 ⊥ ⊗u3 .
By the symmetry of the algorithm it is sufficient to show it is annihilated
by u1 ⊥ ⊗u1 , which is easy to check.
The penultimate assertion is Theorem 4.1.2.1, combined with the observation that the Z2 -action takes an element of the family to another element of
the family. The Z3 -invariance in the final assertion was already shown. □

Exercise: Show that no expression in the family is fixed by Z2 and that Z2 indeed takes an
expression in the family to another expression in the family.
Exercise 4.2.3.2: Verify that u1 ⊥ ⊗u1 , u2 ⊥ ⊗u2 , u3 ⊥ ⊗u3 is a basis of sl(U ).
Exercise 4.2.3.3: Verify that the right hand side of (4.2.2) is annihilated
by u1 ⊥ ⊗u1 to complete the proof. }
4.3. Border rank algorithms for Mh2,2,ni
4.3.1. Geometry of the expression (3.2.2). Recall

TBCLR = lim_{t→0} (1/t) [ (x12 + tx11 )⊗(y21 + ty22 )⊗z12
+ (x21 + tx11 )⊗(y11 )⊗(z11 + tz21 )
− x12 ⊗y21 ⊗((z11 + z12 ) + tz22 )
− x21 ⊗((y11 + y21 ) + ty12 )⊗z11
+ (x12 + x21 )⊗(y21 + ty12 )⊗(z11 + tz22 ) ]
***This subsection to be expanded***

If we want to find expressions such as (3.2.2) for larger matrix multiplication tensors, we will need to develop geometric insight for the examples we already have. What follows are remarks in this direction.
First, from our lower bounds for the rank and border rank of matrix
multiplication, it should not be surprising that $\underline{R}(T_{BCLR}) < R(T_{BCLR})$, i.e., that the border rank is strictly less than the rank.
The group preserving TBCLR includes SL2 = SL(W ) and Z2 which acts
by (X, Y, Z) 7→ (X T , Z T , Y T ).
The key geometric feature of the expression (3.2.2) is based on the following general fact: If X ⊂ PV is a variety, and x1 , . . . , xr ∈ X̂ are linearly dependent elements of V , then any point on the sum of their tangent
spaces is in σ̂r (X). The idea of the proof is as follows: consider instead
the Grassmannian G(r, V ). For any curves xj (t) with xj (0) = xj such that
x1 (t) ∧ · · · ∧ xr (t) 6= 0 for t 6= 0, i.e., the points span an r-plane for all t 6= 0,
consider the limiting plane in G(r, V ):
lim_{t→0} [x1 (t) ∧ · · · ∧ xr (t)] = lim_{t→0} [(x1 + tx′1 + O(t2 )) ∧ · · · ∧ (xr + tx′r + O(t2 ))]
= lim_{t→0} [ (1/t) (x1 + tx′1 + O(t2 )) ∧ · · · ∧ (xr + tx′r + O(t2 ))].
The zero-th order term in the expansion is zero. The first order term in
general will be nonzero, and one can arrange for it to be x1 ∧ · · · ∧ xr−1 ∧ v
where v is an arbitrary linear combination of elements of the T̂xj X. For
details, see [Lan12, §11.2].
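In the simplest instance (r = 2, X = Seg(P1 × P1 × P1 )), the limiting procedure can be made completely explicit. The following numerical sketch (Python/NumPy, an illustration only) exhibits the tangent vector a⊗a⊗b + a⊗b⊗a + b⊗a⊗a as a limit of differences of rank one tensors, so it has border rank two:

import numpy as np

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def rank_one(x, y, z):
    return np.einsum('i,j,k->ijk', x, y, z)

# The rank-three tensor a⊗a⊗b + a⊗b⊗a + b⊗a⊗a is a limit of rank-two tensors:
target = rank_one(a, a, b) + rank_one(a, b, a) + rank_one(b, a, a)
for t in [1e-1, 1e-3, 1e-5]:
    approx = (rank_one(a + t*b, a + t*b, a + t*b) - rank_one(a, a, a)) / t
    print(t, np.abs(approx - target).max())  # the error decays like t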
The five points in (3.2.2) are:
(4.3.1)
x12 ⊗y21 ⊗z12 , x21 ⊗y11 ⊗z11 , −x12 ⊗y21 ⊗(z11 +z12 ), −x21 ⊗(y11 +y21 )⊗z11 , (x12 +x21 )⊗y21 ⊗z11 .
and indeed they sum to zero. Notice all but x12 + x21 of the six elements
appearing in the expression, considered as matrices, have rank one.
Now consider the five tangent vectors:
x12 ⊗y22 ⊗z12 + x11 ⊗y21 ⊗z12 ,
x21 ⊗y11 ⊗z21 + x11 ⊗y11 ⊗z11 ,
− x12 ⊗y21 ⊗z22 ,
− x21 ⊗y12 ⊗z11 ,
(x12 + x21 )⊗y12 ⊗z11 + (x12 + x21 )⊗y21 ⊗z22 .
The first two and the last contribute two terms that appear in the standard expression; the last term also contributes two “bad” terms that do not
appear, which are canceled by the single terms appearing in the third and
fourth terms. Note that none of these are generic tangent vectors, and that
it was the point containing the rank two matrix that generated two bad
terms along with the two good ones that had to be canceled.
In general, given r − 1 points on a variety X ⊂ PV of codimension at
least r − 1, it is rare that one can find an r-th point on X contained in their
span. Thus it is worthwhile to see “why” one can find such points in our
case, so I take a short detour.
Consider the case r = 3: An easy way for there to exist a third point on
X on the line spanned by two points of X is if the two points lie on a line
contained in X.
The Segre variety indeed has many linear spaces on it, but the points
used in (3.2.2) do not all lie on a single linear space contained in the Segre.
However the structure of lines on the Segre does arise in their construction.
So I begin with a discussion of lines on the smallest Segre varieties:
Consider first lines (i.e. linear P1 ’s) on Seg(P1 × P1 ) = Seg(PA × PB).
They come in two types, call them α-lines, those of the form [a⊗B] for some
[a] ∈ PA, and β-lines, which are of the form [A⊗b] for some [b] ∈ PB.
**picture of hyperbola with double ruling to appear here***
Exercise 4.3.1.1: Show that no two lines of the same type intersect, and
that each α-line intersects each β-line in one point.
Now consider Seg(P1 × P1 × P1 ) = Seg(PA × PB × PC), and call the
analogous lines of type α, β, γ.
Exercise 4.3.1.2: Show that given a β-line and a γ-line, either the two
lines do not intersect or they intersect at one point. Show that in either
case there is a unique α-line intersecting both lines.
Now for the description of the points (4.3.1): Since B, C play the
same role in TBCLR , one expects the point selection to be symmetric in B,
C. Now B, C have internal structure as V ∗ ⊗W and W ∗ ⊗U . We first pick
a β-line and a γ-line. To pick the β line we need a line in PB and points in
PA, PC: take a line on Seg(PV ∗ × PW ) ⊂ PB, a point on PA ∩ Seg(PU ∗ ×
PV ), and a point on Seg(PW ∗ × PU ) ⊂ PC. Pick the γ-line by taking a line
in Seg(PV ∗ × PW ) containing the point picked for the β-line, a point in PC
lying on the line in Seg(PW ∗ × PU ) that was chosen for the β-line, and a
point of PA ∩ Seg(PU ∗ × PV ) that is not collinear to the point chosen for the
β-line. The points chosen all lie in some P(C2 ⊗C2 ⊗C2 ) ⊂ P(A⊗B⊗C),
and in this space there will be a unique α-line intersecting the other
two lines.
**picture of the configuration to go here***
The points (4.3.1) are obtained by taking two points on the β-line, and
two points on the γ-line and then the fifth point on the α-line that passes
through the other two lines.
Both the β- and γ-lines contain exclusively rank one matrices, whereas
on the α-line the B- and C-points are rank one matrices, and all A-points
on the line, except the two points of intersection with the β- and γ-lines,
have rank two.
Problem 4.3.1.3. Can one show this construction, plus perhaps some additional geometric constraints, uniquely determines the algorithm up to the
G-action?
Problem 4.3.1.4. Formulate general principles for algorithms for TBCLR
and other restricted matrix multiplication operators based on the above
remarks.
4.3.2. Smirnov's algorithms for Mh2,2,ni . ***Under construction***

The algorithm that shows $\underline{R}$(Mh2,2,3i ) ≤ 10, due to Bini et al., is just
Bini's algorithm for 2 × 2 matrices with one entry zero applied twice. Here
is an algorithm for the next case:
By the principles described in §4.2.1, the optimal algorithm should have
Z2 symmetry coming from the equality trace(XY Z) = trace(Z T Y T X T )
(although strictly speaking, this is more general than the generalized Comon
conjecture) and come in families parametrized by SL2 × SL2 × SL4 . That
is, they should correspond to a choice of 3 points in each of U, V ' C2 and
five points in W = C4 .
The algorithm in [AS13] is $\frac{1}{t^2}$ times
(x22 + x42 − tx12 )(−y11 + y22 )(−z11 + z31 − tz41 + t2 z22 )
+ (−x22 + tx12 )(y22 − ty12 )(−z11 + z31 + t(−z41 + z12 ))
+ (−(x22 + x42 ) + t(x12 − x21 ) + t2 x11 )y11 (z11 − t2 z22 )
+ (x22 + x42 + t(−x41 − x12 ))(−y11 + ty12 )(−z31 + tz41 )
+ x42 y22 (z11 − z31 + tz41 − t2 z22 )
+ x21 (y21 + t(−y11 + y12 ))(−z11 + tz21 + t2 z22 )
+ (x42 + tx32 + t2 x31 )(y21 − ty12 )(−z31 + z32 )
+ (x41 + tx31 + t2 x31 )(y21 + ty11 − t2 y12 )(z31 + t2 z42 )
+ (x42 + tx32 )(−y21 + t(y12 + y22 ))z32
+ (x22 + t2 x11 )(y21 + ty22 − t2 y12 )z12
+ (x21 + x22 )(y21 + ty12 )(z11 + tz21 )
+ (x42 − x41 + t(x32 − x31 ))y21 z31
+ x22 y21 (−(z11 + z12 ) − tz21 )
This algorithm is not far from being Z2 -invariant. Comparing the x's and
the z's: there are 3 terms each in which t appears to orders 0, 1, 2; 4 terms
each with t to the order 0 only; then for orders 0, 2, x has 1 term while z has
two; and for orders 0, 1, x has 5 terms while z has 4. This gives a first hint of
how to obtain Z2 -invariance.
4.4. Alternating least squares approach to algorithms
This section to be written.
4.5. Algorithms for 3 × 3 matrix multiplication with 23
multiplications
Will discuss how there are many algorithms for 3 × 3, what they all have in
common, uniqueness issues. under construction
4.6. Smirnov’s algorithm showing the border rank of Mh3i is
at most 20
To be added.
Chapter 5
Representation theory for complexity theory
This chapter contains proofs of several results regarding matrix multiplication and establishes the basic representation theory needed for Geometric
Complexity Theory. It begins, in §5.1, with the algebraic Peter-Weyl theorem, which is central to GCT. This section also contains a proof of Burnside's
theorem owed from Chapter 3. Next, in §5.2 the representation theory of Sd
and GL(V ) is developed with an emphasis on practical implementation and
addressing issues that arise in the study of matrix multiplication. In particular, I explain the origin of the Koszul flattenings and the more general
Young flattenings. It concludes, in §5.3, with a discussion of various methods for finding modules of equations for secant varieties of Segre varieties
via representation theory.
For supplementary material to this chapter, I suggest Lectures 1-4 and
6 in [FH91] or [Lan12, Chap. 6].
5.1. Double-Commutant Theorem and first consequences
5.1.1. Algebras and their modules. It will sometimes be advantageous
to work with the Lie algebra of GL(V ) and the group algebra C[G] of a finite
group G because of the linear structure of the algebras. Any G-module is
naturally a C[G]-module and vice versa.
For an algebra A and a ∈ A, the ideal Aa is a (left) A-module.
Define a representation L : G → GL(C[G]) by L(g)δgi = δggi and extending the action linearly. Define a second representation R : G → GL(C[G])
by R(g)δgi = δgi g−1 . Thus C[G] is a G × G-module under the representation
(L, R), and for all c ∈ C[G], C[G]c is a G-module under the action L.
A representation ρ : G → GL(V ) induces an algebra homomorphism
C[G] → End(V ), and it is equivalent that V is a G-module or a left C[G]-module.
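To make the (L, R) action concrete, here is a small sketch for G = S3 , representing group elements as permutation tuples (names like compose and idx are ad hoc conveniences):

from itertools import permutations
import numpy as np

# L(g)δ_h = δ_{gh} and R(g)δ_h = δ_{h g^{-1}} as permutation matrices on C[S3].
G = list(permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}
compose = lambda g, h: tuple(g[h[i]] for i in range(3))
inverse = lambda g: tuple(sorted(range(3), key=lambda i: g[i]))

def L(g):
    M = np.zeros((6, 6))
    for h in G:
        M[idx[compose(g, h)], idx[h]] = 1
    return M

def R(g):
    M = np.zeros((6, 6))
    for h in G:
        M[idx[compose(h, inverse(g))], idx[h]] = 1
    return M

for g in G:
    for h in G:
        assert np.allclose(L(g) @ L(h), L(compose(g, h)))  # L is a representation
        assert np.allclose(L(g) @ R(h), R(h) @ L(g))       # L and R commute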
Terminology for algebras is slightly different than for groups. A module
M (for a group, ring, or algebra) is simple if it has no nonzero proper submodules.
The module M is semi-simple if it may be written as the direct sum of simple
modules. An algebra is completely reducible if all its modules are semisimple. For groups alone I will continue to use the terminology irreducible
for a simple module, completely reducible for a semi-simple module, and
reductive for a group such that all its modules can be decomposed into a
direct sum of irreducible modules.
Exercise 5.1.1.1: Show that if A is completely reducible, V is an A-module
with an A-submodule U ⊂ V , then there exists an A-invariant complement
to U in V and a projection map π : V → U that is an A-module map. }
5.1.2. The double-commutant theorem. Our sought-after decomposition of V ⊗d as a GL(V )-module will be obtained by exploiting the fact that
the actions of GL(V ) and Sd on V ⊗d commute. In this subsection we study
commuting actions in general, as this is also the basis of the generalized DFT
used in the Cohn-Umans method and the key to much of GCT. References
for this section are [Pro07, Chap. 6], [GW09, §4.1.5] and [Lan02, Chap.
XVII]. Let S ⊂ End(V ) be any subset. Define the centralizer or commutator
of S to be
S ′ := {X ∈ End(V ) | Xs = sX ∀s ∈ S}.
Proposition 5.1.2.1.
(1) S ′ is an algebra.
(2) S ⊂ (S ′ )′ .
Exercise 5.1.2.2: Prove Proposition 5.1.2.1.
Theorem 5.1.2.3. [Double-Commutant Theorem] Let A ⊂ End(V ) be a
completely reducible associative algebra. Then A′′ = A.
Proof. By Proposition 5.1.2.1, A ⊆ A′′ . To show the reverse inclusion, say
T ∈ A′′ . Fix a basis v1 , . . . , vv of V . Since the action of T is determined
by its action on a basis, we need to find a ∈ A such that avj = T vj for
j = 1, . . . , v. Let w := v1 ⊕ · · · ⊕ vv ∈ V ⊕v and consider the submodule
Aw ⊆ V ⊕v . By Exercise 5.1.1.1, there exists an A-invariant complement
to this submodule and an A-equivariant projection π : V ⊕v → Aw ⊂ V ⊕v ,
that is, a projection π that commutes with the action of A, i.e., π ∈ A′ . Since
T ∈ A′′ we have π(T w) = T (π(w)), but T (π(w)) = T (w) = T v1 ⊕ · · · ⊕ T vv .
But since π(T w) ∈ Aw, there must be some a ∈ A such that aw = T (w),
i.e., av1 ⊕ · · · ⊕ avv = T v1 ⊕ · · · ⊕ T vv , i.e., avj = T vj for j = 1, . . . , v. □

Adopt the notation EndS (V ) := S ′ .
Burnside’s theorem, stated in §3.6, has a similar proof:
Theorem 5.1.2.4. Let A be a finite dimensional simple algebra over C acting irreducibly on a finite-dimensional vector space V . Then A = End(V ).
More generally, a finite dimensional semi-simple algebra A over C is isomorphic to a direct sum of matrix algebras:
A ' Matd1 ×d1 (C) ⊕ · · · ⊕ Matdq ×dq (C)
for some d1 , . . . , dq .
Proof. For the first assertion, we need to show that given X ∈ End(V ),
there exists a ∈ A such that avj = Xvj for v1 , . . . , vv a basis of V . Now just
imitate the proof of Theorem 5.1.2.3. For the second assertion, note that A
is a direct sum of simple algebras.
Remark 5.1.2.5. A pessimist could look at this theorem as a disappointment: all kinds of interesting looking algebras over C, such as the group
algebra of a finite group, are actually just plain old matrix algebras in disguise. An optimist could view this theorem as stating there is a rich structure
hidden in matrix algebras. We will determine the matrix algebra structure
explicitly for the group algebra of a finite group.
5.1.3. Consequences for reductive groups. Let S be a group or algebra
and let V, W be S-modules, adopt the notation HomS (V, W ) for the space
of S-module maps V → W , i.e.,
HomS (V, W ) : = {f ∈ Hom(V, W ) | s(f (v)) = f (s(v)) ∀ s ∈ S, v ∈ V }
= (V ∗ ⊗W )S .
Theorem 5.1.3.1. Let G be a reductive group and let V be a G-module.
Then
(1) The commutator EndG (V ) is a semi-simple algebra.
(2) The isotypic components of G and EndG (V ) in V coincide.
(3) Let U be one such isotypic component, say for irreducible representations A of G and B of EndG (V ). Then, as a G×EndG (V )-module,
U = A⊗B,
as an EndG (V )-module
B = HomG (A, U ),
and as a G-module
A = HomEndG (V ) (B, U ).
In particular, mult(A, V ) = dim B and mult(B, V ) = dim A.
Example 5.1.3.2. Below we will see that EndGL(V ) (V ⊗d ) = C[Sd ]. Recall from equation (4.2.1) the S3 × GL(V )-module decomposition V ⊗3 =
([3]⊗S 3 V )⊕([2, 1]⊗S21 V )⊕([1, 1, 1]⊗Λ3 V ) which illustrates Theorem 5.1.3.1.
To prove the theorem, we will need the following lemma:
Lemma 5.1.3.3. For U ⊂ V a G-submodule and f ∈ HomG (U, V ), there
exists a ∈ EndG (V ) such that a|U = f .
Proof. Consider the diagram
End(V ) −→ Hom(U, V )
↓                ↓
EndG (V ) −→ HomG (U, V )
The vertical arrows are G-equivariant projections, and the horizontal arrows are restriction of domain of a linear map. The diagram is commutative. Since the vertical arrows and upper horizontal arrow are surjective, we
conclude the lower horizontal arrow is surjective as well.
Proof of Theorem. I first prove (3): The space HomG (A, V ) is an EndG (V )-module because for s ∈ HomG (A, V ) and a ∈ EndG (V ), the composition as : A → V is still a G-module map. We need to show (i) that
HomG (A, V ) is irreducible and (ii) that the isotypic component of A in
V is A⊗ HomG (A, V ).
To show (i), it is sufficient to show that for all s, t ∈ HomG (A, V ), there
exists a ∈ EndG (V ) such that at = s. By Lemma 5.1.3.3, there exists
a ∈ EndG (V ) such that a(tA) = sA, i.e., at : A → sA is an isomorphism.
Consider the isomorphism S : A → sA, given by x 7→ sx, so S −1 at : A → A is,
by Schur's lemma, a nonzero scalar c times the identity. Then ã := (1/c)a has the property that
ãt = s.
To see (ii), let U be the isotypic component of A, so U = A⊗B for some
vector space B. Let b ∈ B and define a map b̃ : A → V by a 7→ a⊗b, which
is a G-module map where the action of G on the target is just the action
on the first factor. Thus B ⊆ HomG (A, V ). Any G-module map A → V by
definition has image in U , so equality holds.
(3) implies (2).
To see (1), note that EndG (V ) is semi-simple because if the isotypic
components of V are Ui = Ai ⊗Bi , then EndG (V ) = ⊕i EndG (Ui ) = ⊕i EndG (Ai ⊗Bi ) =
⊕i End(Bi ). □
5.1.4. Matrix coefficients. For a large class of groups, one can obtain
all their irreducible representations from the ring of regular functions on
G, denoted C[G]. (Here G is not usually a projective variety, it is a quasiprojective variety, see, e.g., [Sha94, §4] and one can still define regular
functions on G. For a finite group, this just means any function.) Let G be
any group. Let ρ : G → GL(V ) be a finite dimensional representation of G.
Define a map iV : V ∗ ⊗V → C[G] by iV (α⊗v)(g) := α(ρ(g)v). The space of
functions iV (V ∗ ⊗V ) is called the space of matrix coefficients of V .
Exercises.
(1) Show iV is a G × G-module map.
(2) Show that if V is irreducible, iV is injective. }
(3) If we choose a basis v1 , . . . , vv of V with dual basis α1 , . . . , αv , then
iV (αi ⊗vj )(g) is the (i, j)-th entry of the matrix representing ρ(g)
in this basis (which explains the name “matrix coefficients”).
(4) Compute the matrix coefficient basis of the three irreducible representations of S3 in terms of the standard basis δσ , σ ∈ S3 .
(5) Let G = GL2 C, write g = $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ ∈ G, and compute the matrix
coefficient basis as functions of a, b, c, d when V = S 2 C2 , S 3 C2 and
Λ2 C2 .
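For Exercise (5), a computer algebra system does the bookkeeping painlessly. The following SymPy sketch computes the matrix coefficients for V = S 2 C2 , under the (assumed) convention that g sends the basis vector x to ax + cy and y to bx + dy:

import sympy as sp

a, b, c, d, x, y = sp.symbols('a b c d x y')
gx, gy = a*x + c*y, b*x + d*y        # images of the basis of C^2
basis = [x**2, x*y, y**2]            # a basis of S^2 C^2
M = sp.zeros(3, 3)
for j, v in enumerate([gx**2, gx*gy, gy**2]):
    p = sp.Poly(sp.expand(v), x, y)
    for i, w in enumerate(basis):
        M[i, j] = p.coeff_monomial(w)
print(M)  # the nine matrix coefficients, each a degree two polynomial in a, b, c, d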
Theorem 5.1.4.1. Let G be any group and V an irreducible G-module.
Then iV (V ∗ ⊗V ) equals the isotypic component of type V in C[G] under the
action R and the isotypic component of V ∗ in C[G] under the action L.
Proof. It suffices to prove one of the assertions, consider the action R. Let
j : V → C[G] be a G-module map under the action R. We need to show
j(V ) ⊂ iV (V ∗ ⊗V ). Define α ∈ V ∗ by α(v) := j(v)(Id). Then j(v) =
iV (α⊗v), as j(v)g = j(v)(Id · g) = j(gv)(Id) = α(gv) = iV (α⊗v)g.
The matrix coefficients provide many functions on a group. For reductive
algebraic groups, we will see they provide all the algebraic functions C[G].
To make sense of this we need to define C[G]. Since we know what the
functions on an affine variety are, we just need to realize G as an affine
variety. Consider GL(W ) as the affine subvariety of $C^{n^2+1}$ , with coordinates
(xij , t), given by the equation t det(x) = 1. Then C[GL(W )] may be defined
to be the restriction of polynomial functions on $C^{n^2+1}$ to the subvariety
isomorphic to GL(W ). If G ⊂ GL(W ) is defined by algebraic equations,
this also enables us to define C[G] because G ⊂ GL(W ) is a subvariety.
5.1.5. Characters.
Exercise 5.1.5.1: Let ρ : G → GL(V ) be a representation. Define a function χρ : G → C by χρ (g) = trace(ρ(g)). The function χρ is called the
character of ρ. Show that χρ is constant on conjugacy classes of G. (In
general, a function f : G → C such that f (hgh−1 ) = f (g) for all g, h ∈ G
is called a class function.) Show that for representations ρj : G → GL(Vj ),
χρ1 ⊕ρ2 = χρ1 + χρ2 .
Exercise 5.1.5.2: Given ρj : G → GL(Vj ) for j = 1, 2, define ρ1 ⊗ρ2 : G →
GL(V1 ⊗V2 ) by ρ1 ⊗ρ2 (g)(v1 ⊗v2 ) = ρ1 (g)v1 ⊗ρ2 (g)v2 . Show that χρ1 ⊗ρ2 =
χρ1 χρ2 .
5.1.6. Application to representations of finite groups. Theorem 5.1.4.1
implies:
Theorem 5.1.6.1. Let G be a finite group, then as a G × G-module under
the action (L, R) and as an algebra,
(5.1.1) $C[G] = \bigoplus_i V_i^* \otimes V_i$
where the sum is over all the distinct irreducible representations of G.
Exercise 5.1.6.2: Let G be a finite group and H a subgroup. Show that
C[G/H] = ⊕i Vi∗ ⊗(Vi )H as a G-module under the action L.
Theorem 5.1.6.1 has many generalizations. Its generalization to reductive algebraic groups was a major inspiration for the Geometric Complexity
Theory program.
Remark 5.1.6.3. The classical Heisenberg uncertainty principle from physics,
in the language of mathematics, is that it is not possible to localize both
a function and its Fourier transform. A discrete analog of this uncertainty
principle holds, in that the transforms of the delta functions have large support in terms of matrix coefficients and vice versa.
Theorem 5.1.6.1 is not yet useful, as we do not yet know what the Vi
are. Let µi : G → GL(Vi ) denote the representation. Recall from Exercise 5.1.5.1 the class function χµi (g) := trace(µi (g)). It is not difficult to
show that the functions χµi are linearly independent in C[G]. (One uses a
G-invariant inner-product and shows that they are orthogonal with respect
to the inner-product, see, e.g., [FH91, §2.2].) On the other hand, we have
a natural basis of the class functions, namely the δ-functions on each
conjugacy class. Let Cj be a conjugacy class of G and define $\delta_{C_j} := \sum_{g \in C_j} \delta_g$ . It
is straightforward to see, via the DFT, that the span of the δCj 's equals the
span of the χµi 's; that is, the number of distinct irreducible representations
of G equals the number of conjugacy classes (see, e.g., [FH91, §2.2] for the
standard proof and [GW09, §4.4] for a DFT proof). Unfortunately, in another manifestation of the uncertainty principle, the relation between these
two bases can be complicated. However, an apparent miracle occurs when
G = Sd : we will associate irreducible representations directly to conjugacy
classes.
A partition π = (p1 , . . . , pr ) of d is a non-increasing sequence of positive integers
p1 ≥ p2 ≥ · · · ≥ pr such that p1 + · · · + pr = d. Write |π| = d and ℓ(π) = r.
The conjugacy class of a permutation is determined by its decomposition
into a product of disjoint cycles. The conjugacy classes of Sd are in 1-1 correspondence with the set of partitions of d: to a partition π = (p1 , . . . , pr ) one
associates the conjugacy class of an element with disjoint cycles of lengths
p1 , . . . , pr . Let [π] denote the isomorphism class of the irreducible Sd -module
associated to π.
Thus in the case of Sd :
(5.1.2) $C[S_d] = \bigoplus_{|\pi|=d} [\pi]^*_L \otimes [\pi]_R$
where I have included the L, R as subscripts to remind the reader that
different copies of Sd are acting on the different factors.
We can say even more: as Sd -modules, [π] is isomorphic to [π]∗ . This is
usually proved by first noting that for any finite group G, and any irreducible
representation µ, χµ∗ = χµ where the overline denotes complex conjugate
and then observing that the characters of Sd are all real-valued functions.
Thus we may rewrite (5.1.2) as
(5.1.3) $C[S_d] = \bigoplus_{|\pi|=d} [\pi]_L \otimes [\pi]_R$ .
5.1.7. The algebraic Peter-Weyl theorem. All reductive algebraic groups
are affine varieties, so one can make sense of C[G]. Theorem 5.1.6.1 is still
valid for reductive algebraic groups with the same proof, except that one
has an infinite sum.
Theorem 5.1.7.1. Let G be a reductive algebraic group. Then there are
only countably many non-isomorphic irreducible G-modules. Let $\Lambda^+_G$ denote
a set indexing the irreducible G-modules, and let Vλ denote the irreducible
module associated to λ. Then, as a G × G-module,
$$C[G] = \bigoplus_{\lambda \in \Lambda^+_G} V_\lambda \otimes V_\lambda^*.$$
5.2. Representations of Sd and GL(V )
In this section we finally obtain our goal of the decomposition of V ⊗d as
a GL(V )-module. Representations of GL(V ) occurring in V ⊗d may also be
indexed by partitions, which is explained in §5.2.1 where Schur-Weyl duality
is stated and proved. I then implement the theory in §5.2.2.
5.2.1. Schur-Weyl duality. We have already seen that the actions of
GL(V ) and Sd on V ⊗d commute.
Proposition 5.2.1.1. EndGL(V ) (V ⊗d ) = C[Sd ].
Proof. We will show that EndC[Sd ] (V ⊗d ) is the algebra generated by GL(V ) and
conclude by the double commutant theorem. Since
End(V ⊗d ) = V ⊗d ⊗(V ⊗d )∗
' (V ⊗V ∗ )⊗d
under the re-ordering isomorphism, End(V ⊗d ) is spanned by elements of the
form X1 ⊗ · · · ⊗ Xd with Xj ∈ End(V ), i.e., elements of Ŝeg(P(End(V )) ×
· · · × P(End(V ))). The action of X1 ⊗ · · · ⊗ Xd on v1 ⊗ · · · ⊗ vd induced from
the GL(V )-action is v1 ⊗ · · · ⊗ vd 7→ (X1 v1 )⊗ · · · ⊗ (Xd vd ). Since g ∈ GL(V )
acts by g · (v1 ⊗ · · · ⊗ vd ) = gv1 ⊗ · · · ⊗ gvd , the image of GL(V ) lies in
S d (V ⊗V ∗ ), in fact it is a Zariski open subset of v̂d (P(V ⊗V ∗ )) which spans
S d (V ⊗V ∗ ). In other words, the algebra generated by GL(V ) is S d (V ⊗V ∗ ) ⊂
End(V ⊗d ). But by definition S d (V ⊗V ∗ ) = ((V ⊗V ∗ )⊗d )Sd = EndC[Sd ] (V ⊗d ), and we conclude. □
Theorem 5.2.1.2. [Schur-Weyl duality] The irreducible decomposition of
V ⊗d as a GL(V ) × C[Sd ]-module (equivalently, as a GL(V ) × Sd -module)
is
(5.2.1) $V^{\otimes d} = \bigoplus_{|\pi|=d} S_\pi V \otimes [\pi]$,
where Sπ V := HomSd ([π], V ⊗d ) is an irreducible GL(V )-module.
Note that as far as we know, Sπ V could be zero. (It will be zero whenever
ℓ(π) > dim V .)
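The starting point of the proof above is worth seeing numerically: the Sd -action permuting tensor slots commutes with the diagonal GL(V )-action. A minimal sketch for d = 3, v = 2:

from itertools import permutations
import numpy as np

v, d = 2, 3
rng = np.random.default_rng(1)
g = rng.standard_normal((v, v))     # a (generic, hence invertible) element of GL(V)
T = rng.standard_normal((v,) * d)   # a tensor in V^{⊗3}

def g_action(g, T):
    # g.(v1⊗v2⊗v3) = gv1⊗gv2⊗gv3, extended linearly
    return np.einsum('ai,bj,ck,ijk->abc', g, g, g, T)

for sigma in permutations(range(d)):
    assert np.allclose(g_action(g, np.transpose(T, sigma)),
                       np.transpose(g_action(g, T), sigma))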
5.2.2. Explicit realizations of representations of Sd and GL(V ). By
Theorem 5.1.6.1 we may explicitly realize each irreducible Sd -module via
some projection from C[Sd ]. The question is, which projections?
To visualize π, define a Young diagram associated to a partition π to
be a collection of left-aligned boxes with pj boxes in the j-th row, as in
Figure 5.2.1.
Figure 5.2.1. Young diagram for π = (4, 2, 1)
Define the conjugate partition π 0 to π to be the partition whose Young
diagram is the reflection of the Young diagram of π in the north-west to
south-east diagonal.
Figure 5.2.2. Young diagram for π 0 = (3, 2, 1, 1), the conjugate partition to π = (4, 2, 1).
Given π we would like to find elements cπ̄ ∈ C[Sd ] such that C[Sd ]cπ̄ is
isomorphic to [π]. I write π̄ instead of just π because the elements are far
from unique; there is a vector space of dimension dim[π] of such projection
operators by Theorem 5.1.6.1, and the overline signifies a specific realization.
In other words, the Sd -module map RMcπ̄ : C[Sd ] → C[Sd ], f 7→ f cπ̄ , should
kill all $S_d^R$-modules not isomorphic to [π]R , and the image should be [π]L ⊗v
for some v ∈ [π]R . If this works, as a bonus, the map cπ̄ : V ⊗d → V ⊗d
induced from the Sd -action will have image a copy of Sπ V for the same
reason.
Here are projection operators for the two representations we understand
well:
When π = (d), there is a unique up to scale c(d) and it is easy to see
it must be $c_{(d)} := \sum_{\sigma \in S_d} \delta_\sigma$ , as the image of RMc(d) is clearly the line
through c(d) on which Sd acts trivially. Note further that c(d) (V ⊗d ) = S d V
as desired.
When π = (1d ), again we have a unique up to scale projection, and it is
clear we should take $c_{(1^d)} := \sum_{\sigma \in S_d} \mathrm{sgn}(\sigma)\,\delta_\sigma$ , as the image of any δτ will be
sgn(τ ) c(1d ) , and c(1d ) (V ⊗d ) = Λd V .
The only other representation we have a reasonable understanding of is
the standard representation π = (d − 1, 1) which corresponds to the complement of the trivial representation in the permutation action on Cd . A basis
of this space could be given by e1 − ed , e2 − ed , . . . , ed−1 − ed . Note that
the roles of 1, . . . , d − 1 in this basis are the “same” in that if one permutes
them, one gets the same basis, and that the role of d with respect to any of
the other ej is “skew” in some sense. To capture this behavior, consider
$$c_{(d-1,1)} := (\delta_{\mathrm{Id}} - \delta_{(1,d)})\Big(\sum_{\sigma \in S_{d-1}[d-1]} \delta_\sigma\Big)$$
where Sd−1 [d − 1] ⊂ Sd is the subgroup permuting the elements {1, . . . , d −
1}. Note that c(d−1,1) δτ = c(d−1,1) for any τ ∈ Sd−1 [d − 1], so the image is
of dimension at most d = dim C[Sd /Sd−1 ].
Exercise 5.2.2.1: Show that the image is in fact d − 1 dimensional.
Now consider RMc(d−1,1) (V ⊗d ): after re-orderings, it is the image of the
composition of the maps
V ⊗d → V ⊗d−2 ⊗Λ2 V → S d−1 V ⊗V.
In particular, in the case d = 3, it is the image of
V ⊗Λ2 V → S 2 V ⊗V,
which we saw was isomorphic to S21 V in §4.2.
Here is the general recipe to construct an Sd -module isomorphic to [π]:
fill the Young diagram of a partition π of d with integers 1, . . . , d from top
to bottom and left to right. For example let π = (4, 2, 1) and write:
(5.2.2)
1 4 6 7
2 5
3
Define Sπ0 ' Sq1 ×· · ·×Sqp1 ⊂ Sd to be the subgroup that preserves the
subsets of elements in the columns and Sπ is the subgroup of Sd permuting
the elements in the rows.
Explicitly, writing π = (p1 , . . . , pq1 ) and π ′ = (q1 , . . . , qp1 ), Sq1 permutes
the elements of {1, . . . , q1 }, Sq2 permutes the elements of {q1 +1, . . . , q1 +q2 }
etc. Similarly, Sπ ' Sp1 × · · · × Sp` ⊂ Sd is the subgroup where Sp1
permutes the elements {1, q1 + 1, q1 + q2 + 1, . . . , q1 + · · · + qp1 −1 + 1}, Sp2
permutes the elements {2, q1 + 2, q1 + q2 + 2, . . . , q1 + · · · + qp1 −1 + 2}, etc.
Define two elements of C[Sd ]: $s_\pi := \sum_{\sigma \in S_\pi} \delta_\sigma$ and $a_\pi := \sum_{\sigma \in S_{\pi'}} \mathrm{sgn}(\sigma)\,\delta_\sigma$ .
Then [π] is defined to be the isomorphism class of the Sd -module C[Sd ]aπ sπ .
(It is also the isomorphism class of C[Sd ]sπ aπ , although these two realizations are generally distinct.)
Exercise 5.2.2.2: Show that [π ′ ] = [π]⊗[1d ] as Sd -modules. }
The space V ⊗d cπ will be a copy of the module Sπ V because cπ kills
all modules not isomorphic to π and maps [π] to a one dimensional vector space. The action on V ⊗d is first to map it to Λq1 V ⊗ · · · ⊗ Λqp1 V ,
and then the module Sπ V is realized as the image of the map from this
space to S p1 V ⊗ · · · ⊗ S pq1 V . So despite their original indirect definition,
we may realize the modules Sπ V explicitly simply by skew-symmetrizations
and symmetrizations.
Example 5.2.2.3. Consider c(2,2) , associated to
(5.2.3)
1 3
2 4
which realizes a copy of S(2,2) V ⊂ V ⊗4 . It first maps V ⊗4 to Λ2 V ⊗Λ2 V and
then maps that to S 2 V ⊗S 2 V . Explicitly, the maps are
a⊗b⊗c⊗d 7→ (a⊗b − b⊗a)⊗(c⊗d − d⊗c) = a⊗b⊗c⊗d − a⊗b⊗d⊗c − b⊗a⊗c⊗d + b⊗a⊗d⊗c
7→ (a⊗b⊗c⊗d + c⊗b⊗a⊗d + a⊗d⊗c⊗b + c⊗d⊗a⊗b)
− (a⊗b⊗d⊗c + d⊗b⊗a⊗c + a⊗c⊗d⊗b + d⊗c⊗a⊗b)
− (b⊗a⊗c⊗d + c⊗a⊗b⊗d + b⊗d⊗c⊗a + c⊗d⊗b⊗a)
+ (b⊗a⊗d⊗c + d⊗a⊗b⊗c + b⊗c⊗d⊗a + d⊗c⊗b⊗a)
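One can also watch a Young symmetrizer act numerically. The sketch below builds cπ = aπ sπ for π = (2, 1) with the tableau 1 3 / 2 (so sπ symmetrizes the slots {1, 3} and aπ skew-symmetrizes {1, 2}) as an operator on V ⊗3 , and checks that the rank of the operator is dim S21 V = (v 3 − v)/3:

from itertools import product
import numpy as np

v = 3
e, t13, t12 = (0, 1, 2), (2, 1, 0), (1, 0, 2)   # id, (1 3), (1 2), 0-indexed

def perm_matrix(sigma):
    # the operator on (C^v)^{⊗3} permuting tensor slots by sigma
    P = np.zeros((v**3, v**3))
    for idx in product(range(v), repeat=3):
        src = np.ravel_multi_index(idx, (v, v, v))
        dst = np.ravel_multi_index(tuple(idx[sigma[k]] for k in range(3)), (v, v, v))
        P[dst, src] = 1
    return P

c = (perm_matrix(e) - perm_matrix(t12)) @ (perm_matrix(e) + perm_matrix(t13))
print(np.linalg.matrix_rank(c), (v**3 - v) // 3)  # both equal 8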
5.2.3. Weights. Fix a basis e1 , . . . , ev of V , let T ⊂ GL(V ) denote the
subgroup of diagonal matrices, let B ⊂ GL(V ) be the subgroup of upper
triangular matrices, and let N ⊂ B be the upper triangular matrices with
1’s along the diagonal. (The notation is chosen because T is usually called
a torus, B a Borel subgroup and the Lie algebra n of N consists of nilpotent
matrices.) Call z ∈ V ⊗d a weight vector if T [z] = [z]. If
$$\mathrm{diag}(x_1, \ldots, x_v)\, z = (x_1)^{p_1} \cdots (x_v)^{p_v}\, z,$$
we say z has weight π = (p1 , . . . , pv ) ∈ Zv .
Call z a highest weight vector if B[z] = [z], i.e., if N z = z. If M is an
irreducible GL(V )-module and z ∈ M is a highest weight vector, call the
weight of z the highest weight of M .
If G = GL(A1 ) × · · · × GL(An ) and V = V1 ⊗ · · · ⊗ Vn with Vj a module
for GL(Aj ), we define the maximal torus in G to be the product of the
maximal tori in the GL(Aj ), and similarly for the Borel. A weight is then
defined to be an n-tuple of weights etc...
Exercise 5.2.3.1: Show that if z ∈ V ⊗d is a highest weight vector of weight
π, then π is a partition of d.
Because of the relation with weights, it will often be convenient to pad
a partition with a string of zeros to make it a string of v integers.
Exercise 5.2.3.2: Find highest weight vectors in V, S 2 V, Λ2 V, S 3 V, Λ3 V and
the two realizations of S21 V given by the kernels of the symmetrization and
skew-symmetrization maps V ⊗S 2 V → S 3 V and V ⊗Λ2 V → Λ3 V .
Exercise 5.2.3.3: More generally, find a highest weight vector for the kernel
of the symmetrization map V ⊗S d−1 V → S d V and of the kernel of the
“exterior derivative” (or “Koszul”) map
(5.2.4)
S k V ⊗Λt V → S k−1 V ⊗Λt+1 V
$$x_1 \cdots x_k \otimes y_1 \wedge \cdots \wedge y_t \mapsto \sum_{j=1}^{k} x_1 \cdots \hat{x}_j \cdots x_k \otimes x_j \wedge y_1 \wedge \cdots \wedge y_t.$$
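Highest weight vectors can be checked mechanically: by Exercise 5.2.5(8) below, z is a highest weight vector if and only if it is killed by the raising operators Eij , i < j, acting as derivations. A sketch for z = e1 ∧ e2 ∈ Λ2 C3 ⊂ V ⊗2 :

import numpy as np

v = 3
e = np.eye(v)
z = np.einsum('i,j->ij', e[0], e[1]) - np.einsum('i,j->ij', e[1], e[0])  # e1∧e2

def act(X, T):
    # the Lie algebra (derivation) action X.(u⊗w) = Xu⊗w + u⊗Xw
    return np.einsum('ia,aj->ij', X, T) + np.einsum('ja,ia->ij', X, T)

for (i, j) in [(0, 1), (0, 2), (1, 2)]:
    E = np.zeros((v, v)); E[i, j] = 1   # a raising operator in n
    assert np.allclose(act(E, z), 0)    # n.z = 0, so z is a highest weight vector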
A necessary condition for two irreducible GL(V )-modules to be isomorphic is that they have the same highest weight (because they must also
be isomorphic T -modules). The condition is also sufficient, see Exercise
5.2.5.(2).
5.2.4. Dimension formulas. Given a Young diagram, and a square x in
the diagram, define h(x), the hook length of x to be the number of boxes
to the right of x in the same row, plus the number of boxes below x in the
same column, plus one. Then for π a partition of d,
(5.2.5) $\dim[\pi] = \frac{d!}{\prod_{x\in\pi} h(x)}$ .
For a box x occurring in a Young diagram, define the content of x,
denoted c(x), to be 0 if x is on the diagonal, s if x is s steps above the
diagonal, and −s if x is s steps below the diagonal. Then
(5.2.6) $\dim S_\pi \mathbb{C}^n = \prod_{x\in\pi} \frac{n + c(x)}{h(x)}$ .
See any of, e.g., [FH91, Pro07, Mac95] for proofs of these formulas.
For example, $\dim[(2,1)] = \frac{3!}{3\cdot 1\cdot 1} = 2$ because the hook lengths are 3, 1, 1, and
$\dim S_{21}V = \frac{n(n+1)(n-1)}{3\cdot 1\cdot 1}$ because the contents are 0, 1, −1 and the hook lengths are 3, 1, 1.
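Both formulas are easy to implement, and together with (5.1.3) and (5.2.1) they give satisfying consistency checks: Σπ (dim[π])2 = d!, and Σπ dim[π] · dim Sπ Cn = nd . A sketch (function names are ad hoc):

from math import factorial

def partitions(d, max_part=None):
    max_part = d if max_part is None else max_part
    if d == 0:
        yield ()
        return
    for p in range(min(d, max_part), 0, -1):
        for rest in partitions(d - p, p):
            yield (p,) + rest

def hook(pi, i, j):
    # arm + leg + 1 for the box in row i, column j (0-indexed)
    return pi[i] - j + sum(1 for p in pi if p > j) - i - 1

def dim_Sd(pi):                 # hook length formula (5.2.5)
    d, prod = sum(pi), 1
    for i in range(len(pi)):
        for j in range(pi[i]):
            prod *= hook(pi, i, j)
    return factorial(d) // prod

def dim_GL(pi, n):              # hook content formula (5.2.6); content c(x) = j - i
    num = den = 1
    for i in range(len(pi)):
        for j in range(pi[i]):
            num *= n + j - i
            den *= hook(pi, i, j)
    return num // den

d, n = 4, 2
assert sum(dim_Sd(pi)**2 for pi in partitions(d)) == factorial(d)
assert sum(dim_Sd(pi) * dim_GL(pi, n) for pi in partitions(d)) == n**d
print('hook length and hook content formulas are consistent for d =', d)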
5.2.5. Exercises: Explicit submodules of V ⊗d isomorphic to Sπ V
and [π].
(1) Let π = (p1 , . . . , p` ) be a partition with at most v parts and let
π 0 = (q1 , . . . , qp1 ) denote the conjugate partition. Show that
(5.2.7)
zπ := (e1 ∧ · · · ∧ eq1 )⊗(e1 ∧ · · · ∧ eq2 )⊗ · · · ⊗(e1 ∧ · · · ∧ eqp1 )
is a highest weight vector of weight π.
(2) Show that zπ = [(e1 ⊗ · · · ⊗eq1 )⊗(e1 ⊗ · · · ⊗eq2 )⊗ · · · ⊗(e1 ⊗ · · · ⊗eqp1 )]sπ aπ
and thus C[Sd ]zπ is isomorphic to [π] and the highest weight of Sπ V
is (as the notation suggests) π. This also shows that the irreducible
GL(V )-modules in V ⊗d are determined up to equivalence by their
highest weight.
(3) Show that for any non-negative integers (p1 , . . . , pv ) with p1 + · · · +
pv = d, the set of vectors of weight (p1 , . . . , pv ) in V ⊗d is an Sd -submodule.
(4) Show that if z is a highest weight vector, then so is σ · z for all
σ ∈ Sd .
(5) Show that Sπ V ⊗[π] is the span of the GL(V ) × Sd -orbit of zπ .
(6) Verify directly that S(d) V, S(1d ) V , and S(d−1,1) V respectively have
multiplicities 1, 1, d − 1 in V ⊗d by computing the Sd -orbits of the
corresponding zπ ’s. }
(7) Let t ⊂ End(V ) denote the diagonal matrices and b the upper
triangular matrices. Write b = t ⊕ n where n is the space of strictly
upper-triangular matrices. Show that each of these spaces is a Lie
subalgebra of gl(V ), i.e., closed under the Lie bracket.
(8) Show that z ∈ V ⊗d is a highest weight vector if and only if n.z = 0.
(9) Show that Sπ+µ V ⊂ Sπ V ⊗Sµ V . Here, if π = (p1 , . . . , pv ) and µ =
(m1 , . . . , mv ), then π + µ = (p1 + m1 , . . . , pv + mv ). In particular
S a+b V ⊂ S a V ⊗S b V , and the map S a V × S b V → S a+b V is just
multiplication of polynomials. }
By the exercises above, to determine a copy of Sπ V in V ⊗d , it is sufficient
to obtain its highest weight vector. Two ways to obtain a basis of the space
of highest weight vectors of the isotypic component of Sπ V in V ⊗d are as
follows: First, Exercise 5.2.5(5) provides an algorithm to obtain a spanning
set from which we can extract a basis. Second, define a standard Young
tableau to be a Young diagram filled with the integers 1, . . . , d whose entries
are increasing both along the rows and down the columns. Then for each
standard Young tableau, form the corresponding projection operator cπ̄ ,
and the vectors $(e_1^{\otimes p_1} \otimes \cdots \otimes e_v^{\otimes p_v})\, c_{\overline{\pi}}$ will be a basis of the space of highest
weight vectors. See e.g. [FH91] for details. I will use the notation Sπ̄ V for
a specific copy of Sπ V realized in V ⊗d .
5.2.6. Polynomials on spaces of matrices and the Cauchy formula.
To study polynomials on spaces of matrices (respectively tensors), we need
to decompose S d (A⊗B) (resp. S d (A⊗B⊗C)) as GL(A) × GL(B) (resp.
GL(A) × GL(B) × GL(C)) module. To obtain the first decomposition, write
$$S^d(A\otimes B) = ((A\otimes B)^{\otimes d})^{S_d} = (A^{\otimes d}\otimes B^{\otimes d})^{S_d} = \bigoplus_{\pi,\mu} ([\pi]\otimes S_\pi A\otimes [\mu]\otimes S_\mu B)^{S_d} = \bigoplus_{\pi,\mu} ([\pi]\otimes[\mu])^{S_d} \otimes S_\pi A\otimes S_\mu B.$$
Since representations of Sd are self-dual, ([π]⊗[µ])Sd = HomSd ([π], [µ]). By
Schur’s lemma we conclude:
Proposition 5.2.6.1. As a GL(A) × GL(B)-module,
(5.2.8) $S^d(A\otimes B) = \bigoplus_{|\pi|=d,\ \ell(\pi)\le\min\{a,b\}} S_\pi A\otimes S_\pi B.$
Formula (5.2.8) is called the Cauchy formula.
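A dimension count makes a good test of (5.2.8): dim S d (A⊗B) = $\binom{ab+d-1}{d}$ must equal Σπ dim Sπ Ca · dim Sπ Cb . A sketch, reusing the hook content formula of §5.2.4:

from math import comb

def partitions(d, max_part=None):
    max_part = d if max_part is None else max_part
    if d == 0:
        yield ()
        return
    for p in range(min(d, max_part), 0, -1):
        for rest in partitions(d - p, p):
            yield (p,) + rest

def dim_GL(pi, n):   # hook content formula (5.2.6)
    num = den = 1
    for i in range(len(pi)):
        for j in range(pi[i]):
            num *= n + j - i
            den *= pi[i] - j + sum(1 for p in pi if p > j) - i - 1
    return num // den

a, b, d = 2, 3, 4
lhs = comb(a * b + d - 1, d)
rhs = sum(dim_GL(pi, a) * dim_GL(pi, b)
          for pi in partitions(d) if len(pi) <= min(a, b))
assert lhs == rhs
print('Cauchy formula dimension count:', lhs)  # 126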
Exercise 5.2.6.2: Show that
$$I_d(Seg(PA \times PB)) = \bigoplus_{|\pi|=d,\ \ell(\pi)\le\min\{a,b\},\ \pi\neq(d)} S_\pi A\otimes S_\pi B.$$
What is Id (σr (Seg(PA × PB)))?
Exercise 5.2.6.3: Show that as a GL(A) × GL(B)-module,
(5.2.9) $\Lambda^d(A\otimes B) = \bigoplus_{|\pi|=d,\ \ell(\pi)\le a,\ p_1\le b} S_\pi A\otimes S_{\pi'} B.$
}
Exercise 5.2.6.4: Show that
mult(Sπ1 A1 ⊗ · · · ⊗ Sπn An , S d (A1 ⊗ · · · ⊗ An )) = mult([d], [π1 ]⊗ · · · ⊗ [πn ]).
Exercise 5.2.6.5: Show that the polynomials in S p (A⊗B⊗C) arising from
the usual flattenings Λp (A⊗B)⊗Λp C, Λp (A⊗C)⊗Λp B, Λp (B⊗C)⊗Λp A are
such that when p = 2, any third is in the span of the other two, but any
two are not completely redundant, and that for p = 3 all three are needed
to span the space of equations.
Recall that SL(V ) ⊂ GL(V ) is the subgroup that acts trivially on Λv V ,
i.e., the maps V → V with determinant equal to one.
Example 5.2.6.6. Let A, B = Cn . By the Cauchy formula, S n (A⊗B)
contains a unique vector up to scale that is invariant under SL(A) × SL(B).
This vector is detn ∈ Λn A⊗Λn B. To see this explicitly, a highest weight
vector of Λn A⊗Λn B is

$$(e_1\wedge\cdots\wedge e_n)\otimes(f_1\wedge\cdots\wedge f_n) = \frac{1}{(n!)^2}\Big(\sum_{\sigma\in S_n} \mathrm{sgn}(\sigma)\, e_{\sigma(1)}\otimes\cdots\otimes e_{\sigma(n)}\Big)\otimes\Big(\sum_{\tau\in S_n} \mathrm{sgn}(\tau)\, f_{\tau(1)}\otimes\cdots\otimes f_{\tau(n)}\Big).$$

Re-order to express this as an element of (A⊗B)⊗n and write $x^i_j = e_i\otimes f_j$ to obtain

$$\frac{1}{(n!)^2}\sum_{\sigma,\tau\in S_n} \mathrm{sgn}(\sigma)\mathrm{sgn}(\tau)\, x^{\sigma(1)}_{\tau(1)}\otimes\cdots\otimes x^{\sigma(n)}_{\tau(n)}.$$

The symmetrization map can be accomplished efficiently notationally by
simply erasing the ⊗ symbols and multiplying by n!. Thus the image of this
vector in S n (A⊗B) under the projection given by symmetrization is

$$\frac{1}{n!}\sum_{\sigma\in S_n}\sum_{\tau\in S_n} \mathrm{sgn}(\sigma)\mathrm{sgn}(\tau)\, x^{\sigma(1)}_{\tau(1)}\cdots x^{\sigma(n)}_{\tau(n)}.$$

We think of this as first summing over τ , so for each fixed σ in the sum over
τ , we can re-order the elements by σ −1 to obtain

$$\frac{1}{n!}\sum_{\sigma\in S_n}\sum_{\tau\in S_n} \mathrm{sgn}(\sigma)\mathrm{sgn}(\tau)\, x^1_{\sigma^{-1}\tau(1)}\cdots x^n_{\sigma^{-1}\tau(n)} = \sum_{\sigma^{-1}\tau\in S_n} \mathrm{sgn}(\sigma^{-1}\tau)\, x^1_{\sigma^{-1}\tau(1)}\cdots x^n_{\sigma^{-1}\tau(n)},$$

which is the familiar formula for the determinant if one renames σ −1 τ as σ.
Fixing a torus T , we may talk about permutation matrices. Let WA ⊂
GL(A) denote the subgroup of permutation matrices. (The notation is chosen because WA is the Weyl group of GL(A).) The normalizer of T in GL(A)
is T ⋊ WA , where WA acts by conjugation.
Example 5.2.6.7. Let A, B = Cn . Then S n (A⊗B) contains a unique
vector up to scale invariant under (TA ⋊ WA ) × (TB ⋊ WB ). This vector
is permn . This vector lives in the unique weight (1, . . . , 1)A × (1, . . . , 1)B
line inside S n A⊗S n B. To recognize this as the permanent, write the weight
vector in S n A⊗S n B as (e1 · · · en )⊗(f1 · · · fn ), reorder as above, and write
$x^i_j = e_i\otimes f_j$ to get $\sum_{\sigma,\tau\in S_n} x^{\sigma(1)}_{\tau(1)}\otimes\cdots\otimes x^{\sigma(n)}_{\tau(n)}$ , and continue the calculation
as before to obtain the familiar formula for the permanent when one projects
to S n (A⊗B).
Remark 5.2.6.8. From Example 5.2.6.7, one can begin to appreciate the
beauty of the permanent. Since detn is the only polynomial invariant under
SL(A) × SL(B), to find other interesting polynomials on spaces of matrices,
one has to be content with subgroups of this group. But what could be
a more natural subgroup than the product of the normalizers of the tori?
In fact, say we begin by asking simply for a polynomial invariant under the
action of TA × TB . We need to look at S n (A⊗B)0 , where the 0 denotes the sl-weight zero subspace. This decomposes as ⊕π (Sπ A)0 ⊗(Sπ B)0 . We will see
(Corollary ??(i)) that these spaces are the $S_n^A \times S_n^B$-modules [π]⊗[π]. Only
one of these is trivial, namely [n]⊗[n] ⊂ S n A⊗S n B, and that corresponds
to the permanent! More generally, if we consider the diagonal Sn ⊂ $S_n^A \times S_n^B$,
then both [π]'s are modules for the same group, and since [π] ' [π]∗ ,
there is then an element corresponding to the identity map. These vectors
are Littlewood's immanants, of which the determinant and permanent are
special cases.
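Littlewood's immanants are easy to compute directly from the character table; here is a sketch for n = 3 (the character values of S3 are hard-coded, indexed by the number of fixed points, which determines the cycle type):

from itertools import permutations
import numpy as np

chi = {
    (3,):      {3: 1, 1: 1, 0: 1},    # trivial: gives the permanent
    (2, 1):    {3: 2, 1: 0, 0: -1},   # standard
    (1, 1, 1): {3: 1, 1: -1, 0: 1},   # sign: gives the determinant
}

def immanant(pi, X):
    total = 0
    for sigma in permutations(range(3)):
        f = sum(1 for i in range(3) if sigma[i] == i)  # fixed points pick out the cycle type
        total += chi[pi][f] * np.prod([X[i, sigma[i]] for i in range(3)])
    return total

X = np.random.default_rng(2).standard_normal((3, 3))
assert np.isclose(immanant((1, 1, 1), X), np.linalg.det(X))
print('permanent of X:', immanant((3,), X))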
5.2.7. The Pieri rule. Let’s decompose Sπ V ⊗V as a GL(V )-module.
Write π 0 = (q1 , . . . , qp1 ) and recall zπ from (5.2.7). Consider the vectors:
(e1 ∧ · · · ∧ eq1 ∧ eq1 +1 )⊗(e1 ∧ · · · ∧ eq2 )⊗ · · · ⊗(e1 ∧ · · · ∧ eqp1 )
..
.
(e1 ∧ · · · ∧ eq1 )⊗(e1 ∧ · · · ∧ eq2 )⊗ · · · ⊗(e1 ∧ · · · ∧ eqp1 ∧ eqp1 +1 )
(e1 ∧ · · · ∧ eq1 )⊗(e1 ∧ · · · ∧ eq2 )⊗ · · · ⊗(e1 ∧ · · · ∧ eqp1 )⊗e1 .
These are all highest weight vectors obtained from zπ tensored with a vector
in V and skew-symmetrizing appropriately, so the associated modules are
contained in Sπ V ⊗V . With a little more work, one can show these are
highest weight vectors of all the modules that occur in Sπ V ⊗V . If qj =
qj+1 one gets the same module if one inserts eqj +1 into either slot, but its
multiplicity in Sπ V ⊗V is one. More generally one obtains:
Theorem 5.2.7.1 (The Pieri formula). The decomposition of Sπ V ⊗S d V is
multiplicity free. The partitions corresponding to modules Sµ V that occur
are those obtained from the Young diagram of π by adding d boxes to the
diagram of π, with no two boxes added to the same column.
Definition 5.2.7.2. Let π, µ be partitions with ℓ(µ) < ℓ(π). One says µ
interlaces π if p1 ≥ m1 ≥ p2 ≥ m2 ≥ · · · ≥ m_{ℓ(π)−1} ≥ p_{ℓ(π)} .
Exercise 5.2.7.3: Show that Sπ V ⊗S d V consists of all the Sµ V such that
|µ| = |π| + d and π interlaces µ.
Although a pictorial proof is possible, the standard proof of the Pieri
formula uses a character calculation, computing χπ χ(d) as a sum of χµ ’s.
See, e.g., [Mac95, §I.9]. A different proof, using Schur-Weyl duality is in
[GW09, §9.2]. There is an algorithm to compute arbitrary tensor product
decompositions called the Littlewood-Richardson rule. See, e.g., [Mac95,
§I.9] for details.
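The Pieri rule itself is a one-line combinatorial condition (µ/π is a horizontal strip: πi ≤ µi and µi+1 ≤ πi ), so it is easy to code; a sketch:

from itertools import product

def pieri(pi, d):
    # partitions mu with |mu| = |pi| + d, obtained by adding d boxes
    # to pi with no two boxes in the same column
    pi = tuple(pi)
    rows = len(pi) + 1
    padded = pi + (0,)
    ranges = [range(padded[0], padded[0] + d + 1)] + \
             [range(padded[i], padded[i - 1] + 1) for i in range(1, rows)]
    out = []
    for mu in product(*ranges):
        if sum(mu) == sum(pi) + d and all(mu[i] >= mu[i + 1] for i in range(rows - 1)):
            out.append(tuple(p for p in mu if p > 0))
    return out

print(pieri((2, 1), 2))  # [(2, 2, 1), (3, 1, 1), (3, 2), (4, 1)]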
Similar considerations give:
Theorem 5.2.7.4. [The skew-Pieri formula] The decomposition of Sπ V ⊗Λk V
is multiplicity free. The partitions corresponding to modules Sµ V that occur are those obtained from the Young diagram of π by adding k boxes to
the diagram of π, with no two boxes added to the same row.
5.2.8. The GL(V )-modules not appearing in V ⊗• . The GL(V )-module
V ∗ does not appear in the tensor algebra of V . Nor do the one-dimensional
representations det−k : GL(V ) → GL(C1 ) given by, for v ∈ C1 , det−k (g)v :=
det(g)−k v.
Exercise 5.2.8.1: Show that if π = (p1 , . . . , pv ) with pv > 0, then det−1 ⊗Sπ V =
S(p1 −1,...,pv −1) V .
Every irreducible GL(V )-module is of the form Sπ V ⊗ det−k for some
k ≥ 0. Thus they may be indexed by non-increasing sequences of integers
(p1 , . . . , pv ) where p1 ≥ p2 ≥ · · · ≥ pv . Such a module is isomorphic to
S(p1 −pv ,...,pv−1 −pv ,0) V ⊗ detpv .
Exercise 5.2.8.2: Show that as a GL(V )-module, V ∗ = Λv−1 V ⊗ det−1 =
S1v−1 V ⊗ det−1 .
Using
Sπ V ⊗V ∗ = Sπ V ⊗Λv−1 V ⊗ det−1 ,
we may compute the decomposition of Sπ V ⊗V ∗ using the skew-symmetric
version of the Pieri rule.
Example 5.2.8.3. Let w = 3, then
S(32) W ⊗W ∗ = S(43) W ⊗ det−1 ⊕S(331) W ⊗ det−1 ⊕S(421) W ⊗ det−1
= S(43) W ⊗ det−1 ⊕S(22) W ⊕ S(31) W
so the first module does not occur in the tensor algebra but the rest do.
5.2.9. SL(V )-modules in V ⊗d . Fact: Every SL(V )-module is the restriction to SL(V ) of some GL(V )-module. However, distinct GL(V )-modules,
when restricted to SL(V ), can become isomorphic, such as the trivial representation and Λv V .
Proposition 5.2.9.1. Let π = (p1 , . . . , pv ) be a partition. The SL(V )-modules in the tensor algebra V ⊗• that are isomorphic to Sπ V are the Sµ V with
µ = (p1 + j, p2 + j, . . . , pv + j) for −pv ≤ j < ∞.
Exercise 5.2.9.2: Prove Proposition 5.2.9.1. }
For example, for SL2 -modules, Sp1 ,p2 C2 ' S p1 −p2 C2 . We conclude:
Corollary 5.2.9.3. A complete set of the finite dimensional irreducible representations of SL2 are the S d C2 with d ≥ 0.
The GL(V )-modules that are SL(V )-equivalent to Sπ V may be visualized as being obtained by erasing or adding columns of size v from the
Young diagram of π.
Figure 5.2.3. Young diagrams for SL3 -modules equivalent to S421 C3
5.2.10. The kernel of $(M_{\langle U,V,\mathbb{C}^1\rangle})^{\wedge p}_A$ . Consider the case u = v = 3, take
p = 4, so we need to study
$$(M_{\langle U,V,\mathbb{C}^1\rangle})^{\wedge 4}_A : \Lambda^4(U^*\otimes V)\otimes V \to \Lambda^5(U^*\otimes V)\otimes U.$$
Since v = u = 3, we have Λ4 (U ∗ ⊗V ) = S211 U ∗ ⊗S31 V ⊕ S22 U ∗ ⊗S22 V ⊕
S31 U ∗ ⊗S211 V . The Pieri rule implies
Λ4 (U ∗ ⊗V )⊗V = S31 U ∗ ⊗S311 V ⊕ S31 U ∗ ⊗S221 V ⊕ S22 U ∗ ⊗S32 V ⊕ S22 U ∗ ⊗S221 V
⊕ S211 U ∗ ⊗S41 V ⊕ S211 U ∗ ⊗S32 V ⊕ S211 U ∗ ⊗S311 V.
Pictorially: **Young diagram pictures of these Pieri products to appear here**
The kernel must contain all modules that do not appear in
$$\Lambda^5(U^*\otimes V)\otimes U = S_{31}U^*\otimes S_{311}V \oplus S_{211}U^*\otimes S_{311}V \oplus S_{22}U^*\otimes S_{32}V \oplus S_{211}U^*\otimes S_{32}V \oplus S_{31}U^*\otimes S_{221}V \oplus S_{22}U^*\otimes S_{221}V \oplus (S_{43}U^*\otimes \det_U^{-1})\otimes S_{221}V.$$
So the module S211 U ∗ ⊗S41 V coming from the expansion above is in
the kernel, and $(S_{43}U^*\otimes \det_U^{-1})\otimes S_{221}V$ is not in the image. The map is an
isomorphism on the remaining factors.
For example, consider S31 U ∗ ⊗S311 V ⊂ Λ4 (U ∗ ⊗V )⊗V . We may write a
highest weight vector of S31 U ∗ ⊗S311 V ⊂ U ∗⊗4 ⊗V ⊗5 as
e1 ∧ e2 ⊗e1 ⊗e1 ⊗f1 ∧ f2 ∧ f3 ⊗f1 ⊗f1 ∈ S31 U ∗ ⊗S311 V.
This embeds into Λ4 (U ∗ ⊗V )⊗V as follows:
(e1 ⊗f1 ) ∧ (e2 ⊗f1 ) ∧ (e1 ⊗f2 ) ∧ (e1 ⊗f3 )⊗f1 .
Observe that this expression is still skew in the first e1 , e2 as well as the first
f1 , f2 , f3 . When we map this to Λ5 (U ∗ ⊗V )⊗U we get
$$\sum_{j=1}^{3} (e_1\otimes f_1)\wedge(e_2\otimes f_1)\wedge(e_1\otimes f_2)\wedge(e_1\otimes f_3)\wedge(u_j\otimes f_1)\otimes u_j = (e_1\otimes f_1)\wedge(e_2\otimes f_1)\wedge(e_1\otimes f_2)\wedge(e_1\otimes f_3)\wedge(e_3\otimes f_1)\otimes e_3 \neq 0.$$
Thus the rank of $(M_{\langle l,3,3\rangle})^{\wedge 4}_A$ is $(3\binom{9}{4} - 72)\,l = 306\,l$ and $\underline{R}(M_{\langle 3,3,l\rangle}) \ge \lceil \frac{306\,l}{\binom{8}{4}} \rceil = \lceil \frac{306\,l}{70} \rceil$ , which is 14 when l = 3.
More generally:
Proposition 5.2.10.1. [LO] $\ker(M_{\langle u,v,l\rangle})^{\wedge p}_A \supseteq \bigoplus_\pi S_{\pi'}U^*\otimes S_{\pi+(1)}V\otimes \mathbb{C}^l$ , where
the summation is over partitions π = (u, ν1 , . . . , νv−1 ) where ν = (ν1 , . . . , νv−1 )
is a partition of p − u, ν1 ≤ u, and π + (1) = (u + 1, ν1 , . . . , νv−1 ).
Exercise 5.2.10.2: Show that the kernel contains the asserted modules.
Thus one expects that if one restricts to a sufficiently generic subspace
Ã ⊂ A∗ of dimension 2u + 1, the reduced map Λu−1 Ã⊗V → Λu Ã⊗U ∗ will
be injective. This is what was done in the proof of Theorem 2.5.8.1.
5.3. Methods for determining equations of secant varieties of
Segre varieties using representation theory
***This section will be expanded***
5.3.1. Inheritance. Inheritance is a general technique for studying equations of G-varieties that come in series. It is discussed extensively in [Lan12,
§7.4,16.4], so I will just give an overview here.
If V ⊂ W then Sπ V induces a module Sπ W by, e.g., choosing a basis
of W whose first v vectors are a basis of V . Then the two modules have
the same highest weight vector and one obtains the GL(W )-module by the
action of GL(W ) on the highest weight vector.
Because the realizations of Sπ V in V^{⊗d} do not depend on the dimension of V, one can reduce the study of σr (Seg(PA × PB × PC)) to that of σr (Seg(P^{r−1} × P^{r−1} × P^{r−1})). Note that this latter variety is an orbit closure, namely the orbit closure of the so-called unit tensor M⟨1⟩^{⊕r}.
Proposition 5.3.1.1. [LM04, Prop. 4.4] For all vector spaces Bj with
dim Bj = bj ≥ dim Aj = aj ≥ r, a module Sµ1 B1 ⊗ · · · ⊗Sµn Bn such that
`(µj ) ≤ aj for all j, is in Id (σr (Seg(PB1∗ × · · · × PBn∗ ))) if and only if
Sµ1 A1 ⊗ · · · ⊗Sµn An is in Id (σr (Seg(PA∗1 × · · · × PA∗n ))).
See [Lan12, §7.4] for the proof.
Thus if dim Aj = r and ℓ(µj) ≤ r,

Sµ1 A1⊗···⊗Sµn An ∈ I(σr (Seg(PA∗1 × ··· × PA∗n)))

if and only if

Sµ1 C^{ℓ(µ1)}⊗···⊗Sµn C^{ℓ(µn)} ∈ I(σr (Seg(P^{ℓ(µ1)−1} × ··· × P^{ℓ(µn)−1}))).
In summary:
Corollary 5.3.1.2. [LM04, AR03] Let dim Aj ≥ r, 1 ≤ j ≤ n. The ideal
of σr (Seg(PA1 × · · · × PAn )) is generated by the modules inherited from the
ideal of σr (Seg(Pr−1 × · · · × Pr−1 )) and the modules generating the ideal of
Subr,...,r . The analogous scheme and set-theoretic results hold as well.
5.3.2. Brute force Strassen. ***To be written.***
5.3.3. Young flattenings. There is an inclusion of modules A⊗B⊗C ⊂ (Sπ1 A⊗Sµ1 B⊗Sν1 C)∗⊗(Sπ2 A⊗Sµ2 B⊗Sν2 C) if and only if the Young diagram of π2 is obtained from that of π1 by adding a box, and similarly for the other two factors. In each such situation we obtain, given T ∈ A⊗B⊗C, a linear map

(5.3.1) T_{(π1,µ1,ν1),(π2,µ2,ν2)} : Sπ1 A⊗Sµ1 B⊗Sν1 C → Sπ2 A⊗Sµ2 B⊗Sν2 C.
Say we are trying to find equations for σr (Seg(PA × PB × PC)). Let [a⊗b⊗c] ∈ Seg(PA × PB × PC). Then

(5.3.2) R(T) ≥ rank T_{(π1,µ1,ν1),(π2,µ2,ν2)} / rank (a⊗b⊗c)_{(π1,µ1,ν1),(π2,µ2,ν2)}.
The Koszul flattenings of §2.5.6 are the case π1 = (1^p), π2 = (1^{p+1}), ν1 = ∅, ν2 = (1), µ1 = (−1), µ2 = ∅. (B∗ does not occur in the tensor algebra of B; a simple way to account for it is to represent it by a Young diagram with −1 boxes.)
Young flattenings were originally used for symmetric border rank, i.e.,
to find equations for secant varieties of Veronese varieties in [LO13]. There
the problem is to study inclusions S d V ⊂ Sπ V ⊗Sµ V . I’ll return to this in
our study of geometric complexity theory.
5.3.4. Algorithms for determining the ideals of secant varieties of Segre varieties and proving lower bounds for R(M⟨n⟩). Write

(5.3.3) S^d(A⊗B⊗C) = ⊕_{π,µ,ν} (Sπ A⊗Sµ B⊗Sν C)^{⊕k_{π,µ,ν}}.
The multiplicities k_{π,µ,ν} are called Kronecker coefficients. By Exercise 5.2.6.4, they equal dim([π]⊗[µ]⊗[ν])^{Sd}, so they can be calculated via characters of the symmetric group:

(5.3.4) k_{π,µ,ν} = (1/d!) Σ_{σ∈Sd} χπ(σ)χµ(σ)χν(σ),

where recall χπ(σ) = trace(ρπ(σ)), where ρπ : Sd → GL([π]) is the representation [π] (see, e.g., [Lan12, §6.6.2]).
Exercise 5.3.4.1: Show that k_{π,µ,ν} = dim Hom_{Sd}([π], [µ]⊗[ν]).
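For small d, formula (5.3.4) can be evaluated directly on a computer: the character value χπ(σ) depends only on the cycle type of σ, and can be computed by the Murnaghan-Nakayama rule. The following is a minimal Python sketch under these standard facts (the beta-set form of rim-hook removal is one convenient implementation; all function names are illustrative):

```python
from collections import Counter
from functools import lru_cache
from math import factorial

def partitions(n, max_part=None):
    """All partitions of n as weakly decreasing tuples."""
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

@lru_cache(maxsize=None)
def chi(shape, mu):
    """Character chi^shape of S_d on the class of cycle type mu, via the
    Murnaghan-Nakayama rule in its beta-set (rim-hook removal) form."""
    if sum(shape) == 0:
        return 1
    k, rest, m = mu[0], tuple(mu[1:]), len(shape)
    beta = {shape[i] + (m - 1 - i) for i in range(m)}  # first-column hooks
    total = 0
    for b in beta:
        if b >= k and (b - k) not in beta:
            height = sum(1 for c in beta if b - k < c < b)
            new = sorted(beta - {b} | {b - k})
            new_shape = sorted((x - i for i, x in enumerate(new)), reverse=True)
            total += (-1) ** height * chi(tuple(p for p in new_shape if p), rest)
    return total

def class_sizes(d):
    """(cycle type, conjugacy class size) for all classes of S_d."""
    for mu in partitions(d):
        size = factorial(d)
        for part, mult in Counter(mu).items():
            size //= part ** mult * factorial(mult)
        yield mu, size

def kronecker(pi, mu, nu):
    """k_{pi,mu,nu} via formula (5.3.4)."""
    d = sum(pi)
    total = sum(size * chi(pi, ct) * chi(mu, ct) * chi(nu, ct)
                for ct, size in class_sizes(d))
    return total // factorial(d)

print(kronecker((2, 1), (2, 1), (2, 1)))  # 1
print(kronecker((3,), (2, 1), (2, 1)))    # 1: [mu] (x) [mu] contains [d] once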
Using the decomposition, one has systematic ways to determine equations for σr :
Naïve algorithm using representation theory. For each d, decompose S^d(A⊗B⊗C). For each isotypic component Sπ A⊗Sµ B⊗Sν C⊗([π]⊗[µ]⊗[ν])^{Sd} ⊂ S^d(A⊗B⊗C), write down a basis of highest weight vectors v1, . . . , v_{kπ,µ,ν}. Then pick kπ,µ,ν “random” points on σr (Seg(PA∗ × PB∗ × PC∗)) and test c1v1 + ··· + c_{kπ,µ,ν}v_{kπ,µ,ν} on these points, where the cj are unknowns. If for some nonzero choice of the cj one gets zero at every point, then with high probability the corresponding module is in Id(σr (Seg(PA∗ × PB∗ × PC∗))).
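In coordinates this last step is linear algebra: each sample point imposes one linear condition on the unknowns cj, and the module is (probably) in the ideal exactly when the resulting matrix has a kernel. Here is a hedged numerical sketch, in which `candidates` is a hypothetical list of callables evaluating the highest weight vectors v1, . . . , vk at a tensor (producing those evaluations is the substantive representation-theoretic step):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_secant_point(r, dims):
    """A 'random' point of sigma_r: a sum of r random rank-one tensors
    (the addition map applied to random a_j, b_j, c_j)."""
    a, b, c = dims
    return sum(np.einsum('i,j,k->ijk', rng.standard_normal(a),
                         rng.standard_normal(b), rng.standard_normal(c))
               for _ in range(r))

def vanishing_combinations(candidates, r, dims, tol=1e-8):
    """Vectors c with sum_j c_j v_j vanishing (numerically) on random
    points of sigma_r; a nonempty answer suggests the module is in the
    ideal with high probability."""
    k = len(candidates)
    M = np.array([[v(random_secant_point(r, dims)) for v in candidates]
                  for _ in range(2 * k)])          # one row per sample point
    _, s, vh = np.linalg.svd(M)
    return vh[np.sum(s > tol):]   # rows spanning the numerical null space
```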
If a = b = c = n², one can then test this module on M⟨n⟩. Say P ∈ Sπ A⊗Sµ B⊗Sν C is in the ideal and P(M⟨n⟩) ≠ 0; then M⟨n⟩ ∉ σr (Seg(PA∗ × PB∗ × PC∗)), i.e., R(M⟨n⟩) > r.
5.3.5. Enhanced search using numerical methods. While we don't have equations for σr (Seg(PA∗ × PB∗ × PC∗)), we do have a simple parametric description of a large open subset of it via the addition map A^{⊕r} × B^{⊕r} × C^{⊕r} → A⊗B⊗C, (a1, . . . , ar, b1, . . . , br, c1, . . . , cr) ↦ Σ_{j=1}^{r} aj⊗bj⊗cj. One picks a “random” point of σr (Seg(PA∗ × PB∗ × PC∗)), then takes a general linear space of complementary dimension intersecting it. The intersection will be a finite collection of points. The parametric description enables one **?** to find other points of σr (Seg(PA∗ × PB∗ × PC∗)). Say we find q of them. One can then systematically search for equations vanishing on these points. Once one finds equations, say of degree δ, vanishing on them, one looks for additional points on σr (Seg(PA∗ × PB∗ × PC∗)) to see if the polynomials also vanish there. If not, add the new point and check whether any polynomials vanish on the enlarged set; if none do, go to degree δ + 1, and if some do, add another point. One continues in this fashion, and there are methods to tell one when to stop adding points and when one has found the ideal of the point set. If everything has been done properly, one now knows the dimension of the space of generators of the ideal of σr (Seg(PA∗ × PB∗ × PC∗)) in each degree. Now we use that the ideal of σr (Seg(PA∗ × PB∗ × PC∗)) is a G = GL(A) × GL(B) × GL(C)-module, and we decompose S^dV as a G-module. If one is lucky, there are only a few possible irreducible modules whose dimensions add up to the correct dimension. One is then reduced to just checking these candidates. These methods were used to prove the following theorem:
Theorem 5.3.5.1. [HIL13] With extremely high probability, the ideal of
σ6 (Seg(P3 ×P3 ×P3 )) is generated in degree 19 by the module S5554 C4 ⊗S5554 C4 ⊗S5554 C4 .
This module does not vanish on Mh2i .
In the same paper, the trivial degree twenty module S5555 C4 ⊗S5555 C4 ⊗S5555 C4
is shown to be in the ideal of σ6 (Seg(P3 × P3 × P3 )) by symbolic methods,
giving a proof that R(Mh2i ) = 7. This was originally shown in [Lan06] by
a study of the components of σ6 (Seg(P3 × P3 × P3 ))\σ60 (Seg(P3 × P3 × P3 )).
The same methods have shown I45(σ15(Seg(P3 × P7 × P0))) = 0 and that I186,999(σ18(Seg(P6 × P6 × P6))) = 0 (this variety is a hypersurface), both of which are relevant for determining the border rank of M⟨3⟩; see [HIL13].
5.3.6. Geometric Complexity Theory approach to equations. By inheritance, as discussed in §5.3.1, in determining the ideal of σr (Seg(PA × PB × PC)) we may restrict attention to σr (Seg(P^{r−1} × P^{r−1} × P^{r−1})), which has the advantage that it is an orbit closure, namely the orbit closure of M⟨1⟩^{⊕r}. In §6.2 we will see how to use the Peter-Weyl theorem to obtain, in principle, a description of the module structure of the coordinate ring of the orbit. From that, as explained in §6.2, if one understands the module structure of the space of polynomials, one can sometimes find modules that must be in the ideal of σr (Seg(PA × PB × PC)).
Chapter 6

Geometric Complexity Theory
This chapter is devoted to the flagship conjecture of Geometric Complexity Theory, Conjecture 1.2.5.1, that perm_m cannot be realized as a polynomially sized determinant, even approximately. It begins in §6.1 with a discussion of circuits and a brief history and context. Next, §6.2 discusses the approach to the flagship conjecture proposed in [MS01, MS08]. This approach requires one
to search for GL(W )-modules with certain properties. In §6.3 I describe
two ways to narrow the search. The preprint [Mula] is discussed in §6.4:
this relates blackbox derandomization with the search for an explicit large
linear space that fails to intersect the orbit closure of the determinant. The
preprint [Mulb] is the subject of §6.5, where a case is made for the existence of representation-theoretic obstructions as defined in §6.2.2. The best
lower bound for the flagship conjecture was obtained via classical algebraic
geometry: the study of hypersurfaces with degenerate dual varieties. An
exposition of this result is presented in §6.6. The discussion in §6.2 leads
to questions regarding Kronecker and plethysm coefficients. A summary of
our knowledge of these coefficients relevant for the flagship conjecture is presented in §6.7. To better understand the flagship conjecture it will be useful
to make similar studies of other polynomials. Several such are discussed in
§6.8.
Unlike matrix multiplication, this is a subject in its infancy. There are
almost no mathematical results, so the exposition will be focused not so
much on the accomplishments, but on mathematics that I expect to be
useful in the future for the subject.
6.1. Circuits and the flagship conjecture of GCT
To be precise about the conjectural differences between the complexity of
the permanent and the determinant, one needs a model of computation. I
will use the model of arithmetic circuits.
6.1.1. Circuits.
Definition 6.1.1.1. An arithmetic circuit C is a finite, acyclic, directed
graph with vertices of in-degree 0 or 2 and exactly one vertex of out-degree
0. The vertices of in-degree 0 are labeled by elements of C ∪ {x1 , . . . , xn },
and called inputs. Those of in-degree 2 are labeled with + or ∗ and are
called gates. If the out-degree of v is 0, then v is called an output gate. The
size of C is the number of edges.
[Figure 6.1.1. Circuit for (x + y)³: inputs x and y, one addition gate, and two multiplication gates.]
Exercise 6.1.1.2: Show that if one instead uses the number of gates to define the size, the asymptotic size estimates are the same. (Size is sometimes
defined as the number of gates.)
To each vertex v of a circuit C, associate the polynomial that is computed
at v, which will be denoted Cv . In particular the polynomial associated with
the output gate is called the polynomial computed by C.
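As a concrete illustration of Definition 6.1.1.1, here is a minimal Python sketch of a circuit as a labeled DAG, instantiated on the circuit of Figure 6.1.1; the representation and function names are illustrative, not canonical:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Gate:
    op: str            # '+' or '*'
    left: "Node"
    right: "Node"

Node = Union[str, complex, Gate]   # input: variable name or constant in C

def evaluate(node, values):
    """The polynomial C_v computed at a vertex, evaluated at a point."""
    if isinstance(node, Gate):
        l, r = evaluate(node.left, values), evaluate(node.right, values)
        return l + r if node.op == '+' else l * r
    return values[node] if isinstance(node, str) else node

def size(node, edges=None):
    """Size = number of edges; the two in-edges of a gate are counted
    separately even when they come from the same vertex."""
    edges = set() if edges is None else edges
    if isinstance(node, Gate):
        edges.add((id(node), 0, id(node.left)))
        edges.add((id(node), 1, id(node.right)))
        size(node.left, edges)
        size(node.right, edges)
    return len(edges)

# Figure 6.1.1: s = x + y is computed once and reused (a circuit, not a formula)
s = Gate('+', 'x', 'y')
cube = Gate('*', Gate('*', s, s), s)
print(evaluate(cube, {'x': 1, 'y': 2}))   # 27 = (1 + 2)^3
print(size(cube))                         # 6 edges (3 gates)
```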
At first glance, circuits do not look geometrical, as they depend on a
choice of coordinates. While computer scientists always view polynomials as
being given in some coordinate expression, in geometric complexity theory
one erases the choice of coordinates by allowing the group of all changes
of bases to act. The reason circuits will still be reasonable is that one is
not concerned with the precise size of a circuit, but its size up to, e.g., a
polynomial factor. At a small price of an additional layer of computation,
one can think of the inputs to our circuits as arbitrary affine linear or linear functions on a vector space. We will soon homogenize, in which case the inputs will become linear functions on a vector space.
6.1.2. Arithmetic circuits and complexity classes. Let C[x1 , . . . , xN ]
denote the space of all polynomials in x1 , . . . , xN and let C[x1 , . . . , xN ]≤d
denote the space of polynomials of degree at most d.
Definition 6.1.2.1. Let (fn ) be a sequence of polynomials, fn ∈ C[x1 , . . . , xN (n) ]≤d(n) .
We say (fn ) ∈ VP if deg(fn ) and N (n) are bounded by a polynomial in n
and there exists a sequence of circuits Cn of size polynomial in n computing
fn .
Often the phrase “there exists a sequence of circuits Cn of size polynomial
in n computing fn ” is abbreviated “there exists a polynomial sized circuit
computing (fn )”.
The class VNP, which consists of sequences of polynomials whose coefficients are “easily” described, has a more complicated definition:

Definition 6.1.2.2. A sequence (fn) is defined to be in VNP if there exists a polynomial p and a sequence (gn) ∈ VP such that

fn(x) = Σ_{ε∈{0,1}^{p(|x|)}} gn(x, ε).
One may think of the class VP as a bundle over VNP where elements of
VP are sequences of maps, say gn : CN (n) → C, and elements of VNP are
projections of these sequences obtained by eliminating some of the variables
by averaging or “integration over the fiber”. In algebraic geometry, it is well
known that projections of varieties can be far more complicated than the
original varieties. See [Bas14] for more on this perspective.
Conjecture 6.1.2.3. [Valiant [Val79b]] VP 6= VNP.
Definition 6.1.2.4. One says that a sequence (gm (y1 , . . . , yM (m) )) can be
reduced to (fn (x1 , . . . , xN (n) )) if there exists a polynomial n(m) and affine
linear functions X1 (y1 , . . . , yM ), . . . , XN (y1 , . . . , yM ) such that gm (y1 , . . . , yM (m) ) =
fn (X1 (y), . . . , XN (n) (y)). A sequence (pn ) is hard for a complexity class C
if every (fm ) ∈ C can be reduced to (pn ), and it is complete for C if furthermore (pn ) ∈ C.
Theorem 6.1.2.5. [Valiant [Val79b]] (permm ) is complete for VNP.
See the original [Val79b] or [BCS97, §21.4] for the proof.
Thus Conjecture 6.1.2.3 is equivalent to:
Conjecture 6.1.2.6. [Valiant][Val79b] There does not exist a polynomial
size circuit computing the permanent.
If one accepts that counting the number of perfect matchings of a bipartite graph is a reasonable “difficult” problem, then by the discussion in
§1.2.1, one can ignore all definitions regarding circuits and still grasp the
essence of Valiant’s conjectures.
Now for the determinant:
Proposition 6.1.2.7. (detn ) ∈ VP.
One can compute the determinant quickly via Gaussian elimination: one
uses the group to put a matrix in a form where the determinant is almost
effortless to compute (the determinant of an upper triangular matrix is just
the product of its diagonal entries). However this algorithm as presented is
not a circuit (there are divisions and one needs to check if pivots are zero).
Here is a construction of a small circuit for the determinant:

Proof of Proposition 6.1.2.7. The determinant of a linear map is the product of its eigenvalues, en(λ) = λ1 ··· λn. This is an elementary symmetric function, fitting into the family

e_k(λ) := Σ_{J⊂[n], |J|=k} λ_{j1} ··· λ_{jk}.

This family has a generating function

(6.1.1) E_n(t) := Σ_{k≥0} e_k(λ)t^k = Π_{i=1}^{n} (1 + λ_i t).
On the other hand, recall that trace(f) is the sum of the eigenvalues of f, and more generally, letting f^k denote the composition of f with itself k times,

trace(f^k) = p_k(λ) := λ_1^k + ··· + λ_n^k.

The quantities trace(f^k) can be computed with small circuits. The power sum symmetric functions have the generating function

P_n(t) = Σ_{k≥1} p_k t^{k−1} = d/dt log Π_{i=1}^{n} (1 − λ_i t)^{−1}.
In what follows, I suppress n from the notation.

Exercise 6.1.2.8: Show that

(6.1.2) P(−t) = E′(t)/E(t).
Exercise 6.1.2.8, together with a little more work (see, e.g., [Mac95, p. 28]), shows that

e_n(λ) = \frac{1}{n!} \det \begin{pmatrix} p_1 & 1 & 0 & \cdots & 0 \\ p_2 & p_1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{n-1} & p_{n-2} & \cdots & p_1 & n-1 \\ p_n & p_{n-1} & \cdots & p_2 & p_1 \end{pmatrix}.
While this is still a determinant, it is almost lower triangular, and its naïve computation, e.g., with Laplace expansion, can be done with an O(n³)-size circuit, so the full algorithm for computing detn can be executed with an O(n⁴)-size circuit.
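The algorithm in the proof is easily made concrete. The following Python sketch recovers detn from the traces p_k = trace(M^k) via Newton's identities; it uses divisions by k (the determinantal formula from [Mac95] quoted above avoids these divisions inside the circuit), so it is a check of the values rather than a literal transcription of the circuit:

```python
from fractions import Fraction

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det_via_power_sums(M):
    """det(M) = e_n computed from p_k = trace(M^k), k = 1..n, using
    Newton's identities: k e_k = sum_{i=1}^{k} (-1)^(i-1) e_{k-i} p_i."""
    n = len(M)
    M = [[Fraction(x) for x in row] for row in M]
    p, P = [], M
    for _ in range(n):
        p.append(sum(P[i][i] for i in range(n)))   # p_{k+1} = trace(M^{k+1})
        P = mat_mult(P, M)
    e = [Fraction(1)]                               # e_0 = 1
    for k in range(1, n + 1):
        e.append(sum((-1) ** (i - 1) * e[k - i] * p[i - 1]
                     for i in range(1, k + 1)) / k)
    return e[n]

# sanity check: rows in arithmetic progression, so the determinant is 0
print(det_via_power_sums([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
```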
Remark 6.1.2.9. A more restrictive class of circuits are formulas, which are circuits whose underlying graph is a tree. The circuit in the proof above is not a formula because results from computations are used more than once. It is known that the determinant admits a quasi-polynomial size formula, that is, a formula of size n^{O((log n)^c)} for some constant c, and it is complete for the complexity class VQP = VPs, consisting of sequences of polynomials admitting a quasi-polynomial size formula (or equivalently, by ***, a polynomial sized “weakly skew” circuit). See, e.g., [BCS97, §21.5]. It is not known whether or not the determinant is complete for VP.
6.1.3. Permanent v. determinant. Despite the above remark, since the
determinant is so beautiful, researchers often work with it. Valiant made
the following possibly weaker conjecture:
Conjecture 6.1.3.1 (Valiant). [Val79b] Set n = m^c, for any c > 0. Then for all sufficiently large m,

ℓ^{n−m} perm_m ∉ End(C^{n²}) · det_n.

Since Conjecture 6.1.3.1 is expected to be difficult, define the following measure of progress towards proving it: let dc(perm_m) denote the smallest n such that ℓ^{n−m} perm_m ∈ End(C^{n²}) · det_n. Lower bounds on dc(perm_m) give a benchmark for progress. So far we know dc(perm_m) ≥ m²/2 from [MR04]. This bound was obtained with local differential geometry.
Conjecture 6.1.3.1 takes us to a question of algebra. It involves our favorite polynomial, the determinant, which, recall from Example 5.2.6.6, is characterized by its symmetries because it is the unique instance of the trivial representation of SL(E) × SL(F) in S^n(E⊗F). Here E, F = C^n.
As we have seen in Example 5.2.6.7, the permanent is not so bad either: it is also characterized by its symmetries, as it is the unique instance of the trivial representation of (T_{SL(E)} ⋊ W_E) × (T_{SL(F)} ⋊ W_F) in S^n(E⊗F).
In order to use more tools from algebraic geometry and representation
theory to separate complexity classes, the following conjecture appeared in
[MS01]:
Conjecture 6.1.3.2. [MS01] Let ℓ be a linear coordinate on C¹ and consider any linear inclusion C¹ ⊕ C^{m²} → W = C^{n²}, so in particular ℓ^{n−m} perm_m ∈ S^nW. Let n(m) be a polynomial. Then for all sufficiently large m,

ℓ^{n−m} perm_m ∉ GL(W) · [det_{n(m)}].
Note that \overline{GL(W)·[det_n]} ⊇ End(W)·[det_n], so this is a strengthening of Conjecture 6.1.3.1. It will be useful to rephrase the conjecture slightly, to highlight that it is a question about determining whether one orbit closure is contained in another. Let

Det_n := \overline{GL(W)·[det_n]},

and let

Perm^m_n := \overline{GL(W)·[ℓ^{n−m} perm_m]}.
Conjecture 6.1.3.3. [MS01] Let n(m) be a polynomial. Then for all sufficiently large m,

Perm^m_{n(m)} ⊄ Det_{n(m)}.

The equivalence follows as ℓ^{n−m} perm_m ∉ Det_n implies GL(W)·[ℓ^{n−m} perm_m] ⊄ Det_n, and since Det_n is closed and both sides are irreducible, there is no harm in taking the closure on the left hand side.
Now the goal is clear: both varieties are invariant under GL(W), so their ideals will be GL(W)-modules. We look for a GL(W)-module M such that M ⊂ I[Det_n] and M ⊄ I[Perm^m_n].
In the next section I explain a plan to use the algebraic Peter-Weyl
theorem to find modules in I[Detn ].
6.2. A program to find modules in I[Detn ] via representation
theory
In this section I present the program in [MS08] to find modules in the ideal
of Detn .
Remark 6.2.0.4. In this section I deal with the GL(W)-orbit closure. In [MS08], the idea was to use the SL(W)-orbit closure of det_n in affine space. This has the advantage that the orbit is already closed, so one can apply the algebraic Peter-Weyl theorem directly, as discussed below. It has the disadvantage that one cannot distinguish isomorphic SL(W)-modules appearing in different degrees in the coordinate ring.
6.2.1. Preliminaries. Define C[\widehat{Det_n}] := Sym(S^nW∗)/I(Det_n), the homogeneous coordinate ring of (the cone over) Det_n. This is the space of polynomial functions on \widehat{Det_n} inherited from polynomials on the ambient space.

Since I(Det_n) ⊂ Sym(S^nW∗) is a GL(W)-submodule, and since GL(W) is reductive, we obtain the following splitting as a GL(W)-module:

Sym(S^nW∗) = I(Det_n) ⊕ C[\widehat{Det_n}].

In particular, if a module Sπ W appears in Sym(S^nW∗) and it does not appear in C[\widehat{Det_n}], it must appear in I(Det_n).
Define C[GL(W)·det_n] = C[GL(W)/G_{det_n}] to be C[GL(W)]^{G_{det_n}}. ***Need to add explanation as to why this ring is the ring of regular functions of the orbit; the inclusion is clear.*** There is an injective map

C[\widehat{Det_n}] → C[GL(W)·det_n]

given by restriction of functions. The map is an injection because any function identically zero on a Zariski open subset of an irreducible variety is identically zero on the variety. The algebraic Peter-Weyl theorem gives a description of the G-module structure of C[G/H] when G is a reductive algebraic group and H is a subgroup.
Plan : Find a module Sπ W ∗ not appearing in C[GL(W )/Gdetn ] that does
appear in Sym(S n W ∗ ).
By the above discussion such a module must appear in I(Detn ).
Definition 6.2.1.1. An irreducible GL(W )-module Sπ W ∗ appearing in
Sym(S n W ∗ ) and not appearing in C[GL(W )/Gdetn ] is called an orbit occurrence obstruction.
The precise condition a module must satisfy to be an orbit occurrence
obstruction is explained in Proposition 6.2.5.2. Whether such a module will
be useful or not for separating the determinant from the padded permanent
will be taken up in §6.3.
We will see that orbit occurrence obstructions may be difficult to find,
so several relaxations of the definition are given in the next subsection.
One might object that the coordinate rings of different orbits could coincide, or at least be very close. Indeed this is the case for generic polynomials,
but in GCT one generally restricts to polynomials whose symmetry groups
characterize the orbit as follows:
Definition 6.2.1.2. Let V be a G-module. A point P ∈ V is characterized
by its stabilizer GP if any Q ∈ V with GQ ⊇ GP is of the form Q = cP for
some constant c.
We have seen in §5.2.6 that both the determinant and permanent polynomials are characterized by their stabilizers.
One can think of polynomial sequences that are complete for their complexity classes and are characterized by their stabilizers as “best” representatives of their class. We will see that if P ∈ S d V is characterized by its
stabilizer, the coordinate ring of its G-orbit is unique as a module among
orbits of points in V .
Recall that Chn(C^{n²}) is the variety of polynomials that are products of linear forms, and observe that Chn(C^{n²}) is contained in Det_n and Perm^n_n. The following conjecture, if true, would reduce the search space for occurrence obstructions:

Conjecture 6.2.1.3 (Kumar [Kum]). For all partitions π with ℓ(π) ≤ n, the module S_{nπ}C^{n²} occurs in C[Chn(C^{n²})]. In particular, S_{nπ}C^{n²} occurs in C[Det_n] and C[Perm^n_n].
In §7.3.5 I discuss evidence for this conjecture and how it has interesting
connections to a longstanding conjecture in combinatorics as well as the
evaluation of certain integrals over the special unitary group.
6.2.2. Three relaxations of orbit occurrence obstructions. First, if
the multiplicity of Sπ W ∗ in C[GL(W )/Gdetn ] is smaller than its multiplicity
in Sym(S n W ∗ ), then some copy of it must be in I(Detn ).
Definition 6.2.2.1. An irreducible GL(W )-module Sπ W ∗ satisfying mult(Sπ W ∗ , Sym(S n W ∗ )) >
mult(Sπ W ∗ , C[GL(W )/Gdetn ]) is called an orbit representation-theoretic obstruction.
While orbit representation-theoretic obstructions may be easier to find than orbit occurrence obstructions, they have the disadvantage that to utilize them, one must either determine an explicit realization of the module in I(Det_n) to test it on the padded permanent, or have the multiplicity in C[Perm^m_n] be sufficiently large to guarantee that at least one copy in I(Det_n) is in C[Perm^m_n].
A different relaxation is that even if a module appears in C[GL(W )/Gdetn ],
it may not appear in C[Detn ].
Definition 6.2.2.2. An irreducible GL(W )-module Sπ W ∗ appearing in
Sym(S n W ∗ ) and not appearing in C[Detn ] is called an occurrence obstruction.
Occurrence obstructions have the disadvantage that the extension problem of determining which modules in C[GL(W)/G_{det_n}] extend to be defined on all of C[Det_n] is notoriously difficult, even in simple cases of orbit closures where the boundary is well understood. See §8.1.1 for an example.

Finally one can combine the two relaxations to define a representation-theoretic obstruction, a module Sπ W∗ satisfying mult(Sπ W∗, Sym(S^nW∗)) > mult(Sπ W∗, C[Det_n]), which should be the easiest to find, but the hardest to utilize.
6.2.3. The coordinate ring of an orbit. Recall the algebraic Peter-Weyl
Theorem from §5.1.7.
Corollary 6.2.3.1. Let H ⊂ G be a closed subgroup. Then, as a G-module,

(6.2.1) C[G/H] = C[G]^H = ⊕_{λ∈Λ+_G} V_λ⊗(V_λ^∗)^H = ⊕_{λ∈Λ+_G} V_λ^{⊕ dim(V_λ^∗)^H}.

Here G acts on the V_λ, and (V_λ^∗)^H is just a vector space whose dimension records the multiplicity of V_λ in C[G/H].
Corollary 6.2.3.1 motivates the study of polynomials characterized by
their stabilizers: if P ∈ V is characterized by its stabilizer, then G · P is the
unique orbit in V with coordinate ring isomorphic to C[G·P ] as a G-module.
Moreover, for any Q ∈ V that is not a multiple of P , C[G · Q] 6⊂ C[G · P ].
We next determine Gdetn ⊂ GL(W ) in order to get as much information
as possible about the modules in C[GL(W )/Gdetn ].
Remark 6.2.3.2. All GL(W )-modules S(p1 ,...,pw ) W may be graded using
p1 + · · · + pw as the grading. One does not have such a grading for SL(W )modules, which makes their use in GCT more difficult, as mentioned in
Remark 6.2.0.4.
6.2.4. The stabilizer of the determinant. I follow [Die49] in this section. Write C^{n²} = W = E⊗F = Hom(E∗, F) with E, F = C^n, and ρ : GL_{n²} → GL(S^nC^{n²}) for the representation.

Theorem 6.2.4.1 (Frobenius [Fro97]). Let φ ∈ GL_{n²} be such that ρ(φ)(det_n) = det_n. Then, identifying C^{n²} with the space of n × n matrices,

φ(z) = gzh or φ(z) = gz^T h

for some g, h ∈ GL_n with det_n(g) det_n(h) = 1. Here z^T denotes the transpose of z.
Let µn,r denote the kernel of the product map (Zn )×r → Zn and think
of Zn ⊂ GLn as the n-th roots of unity times the identity matrix. Write
µn = µn,2 .
Corollary 6.2.4.2. G_{det_n} = ((SL(E) × SL(F))/µ_n) ⋊ Z_2.

To prove the Corollary, just note that the C∗ corresponding to det(g) above and µ_n are the kernel of the map C∗ × SL(E) × SL(F) → GL(E⊗F).
Exercise 6.2.4.3: Prove the n = 2 case of the theorem. }
Lemma 6.2.4.4. Let U ⊂ W be a linear subspace such that U ⊂ {det_n = 0}. Then dim U ≤ n² − n. The subvariety of the Grassmannian G(n² − n, W) consisting of maximal linear spaces on {det_n = 0} has two irreducible components, call them Σα and Σβ, where

(6.2.2) Σα = {X ∈ G(n² − n, W) | ker(X) = L̂ for some L ∈ PE}, and
(6.2.3) Σβ = {X ∈ G(n² − n, W) | Image(X) = Ĥ for some H ∈ PF∗}.

Here ker(X) means the intersection of the kernels of all f ∈ X and Image(X) is the span of all the images.

Moreover, for any two distinct Xj ∈ Σα, j = 1, 2, and Yj ∈ Σβ, we have

(6.2.4) dim(X1 ∩ X2) = dim(Y1 ∩ Y2) = n² − 2n, and
(6.2.5) dim(Xi ∩ Yj) = n² − 2n + 1.
Exercise 6.2.4.5: Prove Lemma 6.2.4.4.
Proof of Theorem 6.2.4.1. Let Σ = Σα ∪ Σβ. Then the map on G(n² − n, W) induced by φ must preserve Σ. By the conditions (6.2.4), (6.2.5) of Lemma 6.2.4.4, in order to preserve dimensions of intersections, either every X ∈ Σα must map to a point of Σα, in which case every Y ∈ Σβ must map to a point of Σβ, or every X ∈ Σα must map to a point of Σβ and every Y ∈ Σβ must map to a point of Σα. If we are in the second case, replace φ by φ ◦ T, where T(z) = z^T, so we may now assume φ preserves both Σα and Σβ.
Now Σα ' PE, so φ induces an algebraic map φE : PE → PE. If
L1 , L2 ∈ PE lie on a P1 , in order for φ to preserve dim(XL1 ∩ XL2 ), the
images of the Lj under φE must also lie on a P1 , and thus φE must take
lines to lines (and similarly hyperplanes to hyperplanes). But then, (see,
e.g., [Har95, §18, p. 229]) φE ∈ P GL(E), and similarly, φF ∈ P GL(F ),
where φF : PF ∗ → PF ∗ is the corresponding map. Here P GL(E) denotes
GL(E)/C∗ , the image of GL(E) in its action on projective space. Write
φ̂E ∈ GL(E) for any choice of lift and similarly for F .
Consider the map φ̃ ∈ ρ(GL(W)) given by φ̃(X) = φ̂_E^{−1} φ(X) φ̂_F^{−1}. The map φ̃ sends each Xj ∈ Σα to itself as well as each Yj ∈ Σβ; in particular
it does the same for all intersections. Hence it preserves Seg(PE × PF ) ⊂
P(E⊗F ) point-wise, so it is up to scale the identity map.
Remark 6.2.4.6. For those familiar with Picard groups, M. Brion points
out that there is a shorter proof of Theorem 6.2.4.1 as follows: In general,
if a polynomial P is reduced and irreducible, then GZ(P )∨ = GZ(P ) = G[P ] .
(This follows as the dual variety Z(P )∨ , see §6.6, satisfies (Z(P )∨ )∨ =
Z(P ).) The dual of Z(detn ) is the Segre Seg(Pn−1 × Pn−1 ). Now the automorphism group of Pn−1 × Pn−1 = PE × PF acts on the Picard group which
is Z × Z and preserves the two generators OPE×PF (1, 0) and OPE×PF (0, 1)
coming from the generators on PE, PF . Thus, possibly composing with Z2
swapping the generators (corresponding to transpose in the ambient space),
we may assume each generator is preserved. But then we must have an
element of Aut(PE) × Aut(PF ) = P GL(E) × P GL(F ). Passing back to the
ambient space, we obtain the result.
6.2.5. The coordinate ring of GL(W)·det_n. We first compute the SL(E) × SL(F)-invariants in Sπ W where |π| = d. Recall from §5.2.1 that by definition Sπ W = Hom_{Sd}([π], W^{⊗d}). Thus

Sπ W = Hom_{Sd}([π], E^{⊗d}⊗F^{⊗d})
     = Hom_{Sd}([π], (⊕_{|µ|=d} [µ]⊗Sµ E)⊗(⊕_{|ν|=d} [ν]⊗Sν F))
     = ⊕_{|µ|=|ν|=d} Hom_{Sd}([π], [µ]⊗[ν])⊗Sµ E⊗Sν F.

The vector space Hom_{Sd}([π], [µ]⊗[ν]) simply records the multiplicity of Sµ E⊗Sν F in Sπ(E⊗F). Recall from §5.3.4 that the numbers k_{π,µ,ν} = dim Hom_{Sd}([π], [µ]⊗[ν]) are called Kronecker coefficients.
Recall from §5.2.9 that Sµ E is a trivial SL(E)-module if and only if µ = (δ^n) for some δ ∈ Z. Thus, so far, we are reduced to studying the Kronecker coefficients k_{π,δ^n,δ^n}. Now take into account the Z2-action given by exchanging E and F. Write [µ]⊗[µ] = S²[µ] ⊕ Λ²[µ]. The first module is invariant under Z2 = S2, and the second transforms by the sign of the transposition. So define the symmetric Kronecker coefficients sk^π_{µ,µ} := dim(Hom_{Sd}([π], S²[µ])).
We conclude:

Proposition 6.2.5.1. [BLMW11] Let W = C^{n²}. The coordinate ring of the GL(W)-orbit of det_n is

C[GL(W)·det_n] = ⊕_{d∈Z} ⊕_{π : |π|=dn} (Sπ W∗)^{⊕ sk^π_{d^n,d^n}}.
To phrase this in the language of obstructions:

Proposition 6.2.5.2. Partitions π of dn such that sk^π_{d^n,d^n} < mult(Sπ W, S^d(S^nW)) are orbit representation-theoretic obstructions, and if moreover sk^π_{d^n,d^n} = 0, Sπ W is an orbit occurrence obstruction.
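Assuming the chi and class_sizes helpers from the Kronecker-coefficient sketch in §5.3.4, the symmetric Kronecker coefficients can be computed the same way, using the classical character formula for a symmetric square, χ_{S²V}(g) = (χ_V(g)² + χ_V(g²))/2; square_cycle_type below is an illustrative helper:

```python
# uses chi, class_sizes, and factorial from the sketch in Section 5.3.4

def square_cycle_type(mu):
    """Cycle type of sigma^2 from the cycle type of sigma: an odd cycle
    stays a single cycle; an even 2c-cycle splits into two c-cycles."""
    out = []
    for c in mu:
        out += [c] if c % 2 else [c // 2, c // 2]
    return tuple(sorted(out, reverse=True))

def symmetric_kronecker(pi, mu):
    """sk^pi_{mu,mu} = multiplicity of [pi] in S^2[mu], via
    chi_{S^2 V}(g) = (chi_V(g)^2 + chi_V(g^2)) / 2."""
    d = sum(pi)
    total = sum(size * chi(pi, ct) * (chi(mu, ct) ** 2
                                      + chi(mu, square_cycle_type(ct)))
                for ct, size in class_sizes(d))
    return total // (2 * factorial(d))

print(symmetric_kronecker((3,), (2, 1)))   # 1: the trivial module occurs once
```

In principle the same routine recomputes the small-degree data for Det3 described below, though the character computations grow quickly with the degree.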
I discuss the computation of Kronecker and symmetric Kronecker coefficients in §6.7.
C. Ikenmeyer [Ike12] has examined the situation for Det3. He found on the order of 3,000 orbit representation-theoretic obstructions, of which on the order of 100 are orbit occurrence obstructions, in degrees up to d = 15. There are two such partitions with seven parts, (13², 2⁵) and (15, 5⁶); the rest consist of partitions with at least 8 parts (and many with 9). Also of interest is that for approximately 2/3 of the partitions, sk^π_{d³,d³} < k_{π,d³,d³}. The lowest degree of an occurrence obstruction is d = 10, where π = (9², 2⁶) has sk^π_{10³,10³} = k_{π,10³,10³} = 0 but mult(Sπ W, S^{10}(S³W)) = 1. In degree 11, π = (11², 2⁵, 1) is an occurrence obstruction, where mult(Sπ W, S^{11}(S³W)) = k_{π,11³,11³} = 1 > 0 = sk^π_{11³,11³}.
In §6.3, I describe restrictions on which modules one should search for,
taking into account just that one is comparing the determinant with a polynomial that is padded and only involves few variables.
6.3. Necessary conditions for modules of polynomials to be
useful for GCT
The polynomial ℓ^{n−m} perm_m ∈ S^nC^{n²} has two properties that can be studied individually: it is padded, that is, divisible by a large power of a linear form, and its zero set is a cone with an (n² − m² − 1)-dimensional vertex, that is, it only uses m² + 1 of the n² variables in an expression in good coordinates. I begin with the study of cones, as this is a classical topic.
6.3.1. Cones.
Let Subk (S d V ) ⊂ PS d V be the variety of polynomials whose zero sets
in projective space are cones with a v − k dimensional vertex, called the
subspace variety. If [P ] ∈ Subk (S d V ), then there exist linear coordinates on
V such that the expression of P only involves at most k of the coordinates.
Equations for Subk (S d V ) are easy to describe: its ideal is generated by the
size k + 1 minors of
P1,d−1 : V ∗ → S d−1 V.
Proposition 6.3.1.1. Iδ (Subk (S d V )) consists of the isotypic components
of the modules Sπ V ∗ appearing in S δ (S d V ∗ ) such that `(π) > k.
Exercise 6.3.1.2: Prove Proposition 6.3.1.1.
With just a little more effort, one can prove our first equations generate
the ideal:
Theorem 6.3.1.3. [Wey03, Cor. 7.2.3] The ideal of Subk (S d V ) is generated by the image of Λk+1 V ∗ ⊗Λk+1 S d−1 V ∗ ⊂ S k+1 (V ∗ ⊗S d−1 V ∗ ) in S k+1 (S d V ∗ ).
Aside 6.3.1.4. Here is further information about the variety Subk(S^dV): First, it is a good example of a variety admitting a Kempf-Weyman desingularization, a type of desingularization that G-varieties often admit. Rather than discuss the general theory here (see [Wey03] for a full exposition or [Lan12, Chap. 17] for an elementary introduction), I just explain this example, which gives a proof of Theorem 6.3.1.3, although more elementary proofs are possible. Consider the Grassmannian G(k, V); it has a tautological vector bundle π : S → G(k, V), where the fiber over a k-plane E is just the k-plane itself, so the whole bundle is a sub-bundle of the trivial bundle with fiber V. Consider the bundle S^dS, a sub-bundle of the trivial bundle with fiber S^dV, and let p denote the projection to S^dV. The image of S^dS under p is Ŝub_k(S^dV). Moreover, the map is a desingularization, that is, S^dS is smooth and the map to Ŝub_k(S^dV) is generically one to one. In particular, this implies dim Ŝub_k(S^dV) = dim(S^dS) = \binom{k+d−1}{d} + k(v − k). From these calculations, one obtains the entire minimal free resolution.
6.3.2. The variety of padded polynomials. Define the variety of padded polynomials

Padn−m(S^nW) := P{P ∈ S^nW | P = ℓ^{n−m}h for some ℓ ∈ W, h ∈ S^mW} ⊂ PS^nW.
Theorem 6.3.2.1. [KL14] Let π = (p1 , . . . , pw ) be a partition of dn.
(1) If p1 < d(n − m), then the isotypic component of Sπ W in S d (S n W )
is contained in Id (Padn−m (S n W ∗ )).
(2) If p1 ≥ min{d(n − 1), dn − m}, then Id (Padn−m (S n W ∗ )) does not
contain a copy of Sπ W .
Proof. An analog of inheritance for secant varieties of Segre varieties (see §5.3.1) applies in the current situation: if k < w and we have an inclusion C^k ⊂ W, then the modules Sπ C^k each induce a unique module Sπ W by considering the highest weight vector in (C^k)^{⊗|π|} as a highest weight vector in W^{⊗|π|}.

Since all partitions in S^d(S^nW) have length at most d, we may assume by Proposition 6.3.2.4 that w = d. Fix a (weight) basis e1, . . . , ed of W with dual basis x1, . . . , xd of W∗. Note any element ℓ^{n−m}h ∈ Padn−m(S^nW) is in the GL(W)-orbit of (e1)^{n−m}h̃ for some h̃, so it will be sufficient to show that the ideal in degree d contains the modules vanishing on the orbits of elements of the form (e1)^{n−m}h. The highest weight vector of any copy of S_{(p1,...,pd)}W in S^d(S^nW) will be a linear combination of vectors of the form

m_I := ((x_1)^{i^1_1} \cdots (x_d)^{i^1_d}) \cdots ((x_1)^{i^d_1} \cdots (x_d)^{i^d_d}),

where i^1_j + ··· + i^d_j = p_j and i^k_1 + ··· + i^k_d = n for all k. Each m_I vanishes on any (e1)^{n−m}h unless p1 ≥ d(n − m), proving (1). For a coordinate-free proof, see [KL14].
To prove (2), define a map on basis elements by

(6.3.1) m_{x_1} : S^d(S^mW) → S^d(S^nW),
((x_1)^{i^1_1}(x_2)^{i^1_2} \cdots (x_d)^{i^1_d}) \cdots ((x_1)^{i^d_1} \cdots (x_d)^{i^d_d}) ↦ ((x_1)^{i^1_1+(n−m)}(x_2)^{i^1_2} \cdots (x_d)^{i^1_d}) \cdots ((x_1)^{i^d_1+(n−m)} \cdots (x_d)^{i^d_d}),

and extend linearly.
and extend linearly. A vector of weight µ = (q1 , q2 , . . . , qd ) is mapped under
mx1 to a vector of weight π = (p1 , . . . , pd ) := µ + (d(n − m)) = (q1 +
d(n − m), q2 , . . . , qd ) in S d (S n W ∗ ). Moreover, mx1 takes highest weight
vectors to highest weight vectors, as raising operators annihilate x1 . The
image of the space of highest weight vectors for the isotypic component of
Sµ W ∗ in S d (S m W ∗ ) under mx1 will be in C[Padn−m (S n W ∗ )] because, for
1
1
example, such a polynomial will not vanish on (e1 )n−m [(e1 )i1 · · · (ed )id +· · ·+
d
d
(e1 )i1 · · · (ed )id ].
In §?? we will show that

mult(Sπ W, S^d(S^mW)) = mult(S_{π+(d(n−m))}W, S^d(S^nW))

as soon as d ≥ q2 + ··· + qd. This condition implies q1 ≥ dm − d, so p1 = q1 + dn − dm ≥ dn − d = d(n − 1), as required. For the sufficiency of p1 ≥ dn − m, note that if p1 ≥ (d − 1)n + (n − m) = dn − m, then in an element of weight π, each of the exponents i^1_1, . . . , i^d_1 of x1 must be at least n − m. So there again exists an element of Padn−m(S^nW) on which a vector of weight π does not vanish.
Remark 6.3.2.2. For the subspace variety Subk(S^nW), a module appears in its ideal if and only if its entire isotypic component in Sym(S^nW∗) appears. This property fails for Padn−m(S^nW). For example, consider I3(Padn−2(S^nC²)). For n ≥ 6, the module S_{(3(n−2),6)}W appears with multiplicity two in S³(S^nW∗) by, e.g., [Man98, Thm 4.2.2]. Using [Wey93, Cor. 4a], one computes that the module S_{(3(n−2),6)}W∗ has multiplicity one in C[Padn−2(S^nC²)], and hence multiplicity one in I3(Padn−2(S^nC²)) as well. Thus, as far as a GCT guide for “where to look” for good (abstract) modules is concerned, Corollary 6.3.3.3 appears to be sharp.
Regarding set-theoretic equations, we have the following result:

Proposition 6.3.2.3. [KL14] If 4m < n, then the variety of padded polynomials Padn−m(S^nW∗) is cut out set-theoretically by equations of degree two. For any n > m,

(6.3.2) I2(Padn−m(S^nW∗)) = ⊕_{k=1}^{⌊n/2⌋−m} S_{2(n−m−k),2(m+k)}W.

Proof. Write x = x1, y = x2, and note that the highest weight vector of S_{2n−2s,2s}W ⊂ S²(S^nW) will be a linear combination of the vectors

(x^n)(x^{n−2s}y^{2s}), (x^{n−1}y)(x^{n−2s+1}y^{2s−1}), . . . , (x^{n−s}y^s)(x^{n−s}y^s),

with a nonzero coefficient on the last vector. These vectors all vanish on (e1)^{n−m}h for every h if and only if s ≥ m + 1.
What we really need to study is the variety Padn−m(Subk(S^nW)) of padded cones.

Proposition 6.3.2.4. [KL14] (Inheritance) Notations as above. Id(Padn−m(Subk(S^nW∗))) consists of all modules Sπ W∗ such that Sπ C^k is in the ideal of Padn−m(S^nC^k), together with all modules whose associated partition has length at least k + 1.
Exercise 6.3.2.5: Prove Proposition 6.3.2.4.
6.3.3. GCT useful polynomials. The results of the previous two sections indicate that for the purposes of separating the determinant from any sequence of padded polynomials, we should restrict our search for modules in the ideal of Det_n as follows:

Definition 6.3.3.1. Let W = C^{n²}. A module Sπ W ⊂ Id(Det_n) is (n, m)-GCT useful if Sπ W ⊄ Id(Padn−m(Sub_{m²+1}(S^nW))).

Remark 6.3.3.2. For different complexity problems one makes analogous definitions; see [KL14].

In summary:

Corollary 6.3.3.3. [KL14] A necessary condition for a sequence of modules Sπ_n W_n ⊂ Id(Det_n) to be (n, m)-GCT useful is

(1) ℓ(π_{n(m)}) ≤ m² + 1,
(2) if π_{n(m)} = (p1, . . . , pt), then p1 ≥ d(n − m).
Moreover, if p1 ≥ min{d(n − 1), dn − m}, then the necessary conditions
are also sufficient. In particular, for p1 sufficiently large, GCT usefulness
depends only on the partition π.
Proof. By Proposition 6.3.2.4, the ideal of Padn−m (Subk (S n W )) may be
deduced from that of Padn−m (S n W ), which explains condition (1), so we
only consider the case k = w. Condition (2) and the “moreover” assertion
are a consequence of Theorem 6.3.2.1.
***mention state of the art here, also conjectures about ease of computation of Kronecker coefficients and transition from n = poly(m) to n = 2m .***
6.4. GCTV: Blackbox derandomization and the search for
explicit linear spaces
This section to be written. One path to separating Perm^m_n from Det_n would be to find a large linear space that fails to intersect Det_n. One would then have a reasonably tractable problem working with the projected varieties, because the codimension becomes a polynomial in n. This is the subject of [Mula], which will be discussed here. This is a problem of finding hay in a haystack, since a “random” linear space would work with probability one.
6.5. GCTVI: The needle exists
This section to be written. The search for representation-theoretic obstructions is a problem of finding a needle in a haystack. This section will discuss
[Mulb], where arguments are made that at least the needle exists.
6.6. Dual varieties and GCT
In this section I describe modules in the ideal of Detn that were found by
solving a problem in classical algebraic geometry. I begin with a detour into
the relevant classical geometry.
In mathematics, one often makes transforms to reorganize information,
such as the Fourier transform. There are geometric transforms to “reorganize” the information in an algebraic variety. Taking the dual variety of a
variety is one such, as I now describe.
6.6.1. Singular hypersurfaces and the dual of the Veronese. Perhaps the two most natural subvarieties in a space of polynomials are the
Veronese variety X = vd (PV ) ⊂ P(S d V ), consisting of polynomials that
are d-th powers, and the variety of polynomials P ∈ S d V whose zero set
Z(P ) ⊂ PV ∗ is not smooth. These two varieties are closely related:
Let P ∈ S^dV∗, and let Z_P ⊂ PV be the corresponding hypersurface. By definition, [x] ∈ Z_P if and only if P(x) = 0, i.e., P(x, . . . , x) = P(x^d) = 0, considering P as a multilinear form. Similarly, [x] ∈ (Z_P)_{sing} if and only if P(x) = 0 and dP_x = 0, i.e., for all y ∈ V, 0 = dP_x(y) = P(x, . . . , x, y) = P(x^{d−1}y). Recall from Exercise 2.4.11.1 that T̂_{[x]}v_d(PV) = {x^{d−1}y | y ∈ V}.
Introduce the notation vd (PV )∨ ⊂ PS d V ∗ for the variety of polynomials
whose zero set is not smooth. We have
vd (PV )∨ := {[P ] ∈ PS d V ∗ | ∃[x] ∈ vd (PV ), T̂x vd (PV ) ⊂ P ⊥ }
Here P ⊥ ⊂ S d V is the hyperplane annihilating the vector P , when P is
considered as a linear form on S d V .
The variety vd (PV )∨ is sometimes called the discriminant hypersurface.
Exercise 6.6.1.1: Show that
vd (PV ) = {[q] ∈ PS d V | ∃z ∈ vd (PV )∨ smooth such that T̂z vd (PV )∨ ⊂ q ⊥ }.
6.6.2. Dual varieties in general. For a smooth irreducible variety X ⊂
PV , define X ∨ ⊂ PV ∗ , the dual variety of X, by
X ∨ := {H ∈ PV ∗ | ∃x ∈ X, T̂x X ⊆ Ĥ ⊥ }.
Exercise 6.6.2.1: Show that X ∨ = {H ∈ PV ∗ | X ∩ H is not smooth}.
Here H refers both to a point in PV ∗ and the hyperplane P(Ĥ ⊥ ) ⊂ PV .
The dual variety can be defined even when X is singular, as long as it is
irreducible, by
X ∨ : = {H ∈ PV ∗ | ∃x ∈ Xsmooth , PT̂x X ⊆ H}
= {H ∈ PV ∗ | ∃x ∈ Xsmooth such that X ∩ H is not smooth at x}.
For our purposes, the most important property of dual varieties is that
for a smooth hypersurface other than a hyperplane, its dual variety is also a
hypersurface. This will be a consequence of the B. Segre dimension formula
6.6.4.1 below. If the dual of X ⊂ PV is not a hypersurface, one says that X ∨
is degenerate. It is a classical problem to study the varieties with degenerate
dual varieties.
Remark 6.6.2.2. If X is irreducible, then X ∨ is irreducible, see, e.g.,
[Har95, p. 197].
Recall that we are looking for pathologies of the hypersurface {det_n = 0} to exploit. It turns out its dual variety is very degenerate:

Proposition 6.6.2.3. {det_n = 0}∨ = Seg(P^{n−1} × P^{n−1}) ⊂ P^{n²−1}; in particular, dim {det_n = 0}∨ = 2n − 2 ≪ n² − 2 = dim {det_n = 0}.
Proof. Let z ∈ {det_n = 0}_{smooth}; then without loss of generality z = diag(1, . . . , 1, 0), with all off-diagonal entries zero, so T̂_z{det_n = 0} consists of all matrices whose (n, n)-entry is zero, and thus (T̂_z{det_n = 0})^⊥ is the line of matrices in the dual space all of whose entries are zero except the (n, n)-slot. In particular, (T̂_z{det_n = 0})^⊥ is spanned by a rank one matrix, and thus (T̂_y{det_n = 0})^⊥ is spanned by a rank one matrix for all y ∈ {det_n = 0}_{smooth}.
6.6.3. What do varieties with degenerate dual varieties “look like”?
Consider a curve C ⊂ P3 , and a point p ∈ P3 . Take J(C, p), the cone over
C with vertex p. Let [x] ∈ J(C, p), write x = y + p.
Exercise 6.6.3.1: Show that if p 6= y, T̂x J(C, p) = span{T̂y C, p̂}.
Thus the tangent space to the cone is constant along the rulings, and the surface only has a curve's worth of tangent (hyper)planes, so its dual variety is degenerate.
Consider again a curve C ⊂ P3 , and this time let τ (C) ⊂ P3 denote
the Zariski closure of the union of all points on PT̂x C as x ranges over the
smooth points of C.
Exercise 6.6.3.2: Show that if y1 , y2 ∈ τ (C) are both on a tangent line to
x ∈ C, then T̂y1 τ (C) = T̂y2 τ (C), and thus τ (C)∨ is degenerate.
A theorem of C. Segre from 1910 states that the above two examples are the only surfaces with degenerate dual varieties:

Theorem 6.6.3.3. [Seg10, p. 105] Let X² ⊂ P³ be a surface with degenerate dual variety. Then X is one of the following:
(1) A linearly embedded P2 ,
(2) A cone over a curve C, denoted J(p, C),
(3) A tangential variety to a curve C, that is the Zariski closure of the
union of the points on tangent lines to C, denoted τ (C).
(1) is a special case of both (2) and (3) and is the only intersection of the
two.
The proof is differential-geometric, see [IL03, §3.4]. Griffiths and Harris [GH79] vaguely conjectured a higher dimensional generalization of C.
Segre’s theorem, namely that a variety with a degenerate dual is “built out
of ” cones and tangent developables. For example, {detn = 0} may be
thought of as the union of tangent lines to tangent lines to ... to the Segre
variety Seg(Pn−1 × Pn−1 ).
Segre's theorem already indicates that if we take the Zariski closure in PS^dV∗ of the set of irreducible hypersurfaces of degree d with degenerate dual varieties, we will obtain a reducible variety.
Define the (naïve) conormal space N∗_x X := (T̂_x X)^⊥ ⊂ V∗. (For the experts, this is a twist of the usual conormal space.) We obtain the geometric interpretation

PN∗_x X = {H ∈ PV∗ | PT̂_x X ⊆ H}.
For more on dual varieties see [?, §8.2].
6.6.4. B. Segre's dimension formula.

Proposition 6.6.4.1 (B. Segre, see, e.g., [GKZ94]). Let P ∈ S^dW∗ be irreducible and let x ∈ Ẑ(P) be a general point. Then

dim Z(P)∨ = rank(P_{d−2,2}(x^{d−2})) − 2.

Here P_{d−2,2}(x^{d−2}) ∈ S²W∗. In coordinates, P_{d−2,2} may be written as a symmetric matrix whose entries are polynomials of degree d − 2 in the coordinates of x; it is called the Hessian.
Proof. Let x ∈ Ẑ(P) ⊂ W be a smooth point, so P(x) = P(x, . . . , x) = 0 and dP_x = P(x, . . . , x, ·) ≠ 0, and take h = dP_x ∈ W∗, so [h] ∈ Z(P)∨. Now consider a curve h_t ⊂ Ẑ(P)∨ with h_0 = h. There must be a corresponding (possibly stationary) curve x_t ∈ Ẑ(P) such that h_t = P(x_t, . . . , x_t, ·), and thus h′_0 = P(x^{d−2}, x′_0, ·). Thus the dimension of T̂_h Ẑ(P)∨ is the rank of P_{d−2,2}(x^{d−2}) minus one (we subtract one because x′_0 = x yields the zero vector). Now just recall that dim X = dim T̂_x X − 1.
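As a hedged computational illustration of the formula (and of Proposition 6.6.2.3), one can check with a computer algebra system that the Hessian of det3 at a general point of {det3 = 0} has rank dim Z(det3)∨ + 2 = (2·3 − 2) + 2 = 6; a minimal sympy sketch:

```python
import sympy as sp

n = 3
X = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'x{i}{j}'))
P = X.det()
vars_ = list(X)
H = sp.hessian(P, vars_)           # the matrix P_{d-2,2}, up to constants

# a point of {det_3 = 0}: a random rank <= 2 integer matrix
A = sp.randMatrix(n, 2, seed=1) * sp.randMatrix(2, n, seed=2)
print(H.subs(dict(zip(vars_, list(A)))).rank())
# expect 6 for a general such point, matching dim Z(det_3)^v = 2n - 2 = 4
```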
Exercise 6.6.4.2: Show that every P ∈ Subk(S^dW) has dim Z(P)∨ ≤ k − 2.

Exercise 6.6.4.3: Show that σ3(Chn(C^{n²})) ⊄ Det_n.

Exercise 6.6.4.4: Show that σ_{2n+1}(v_n(P^{n²−1})) ⊄ Det_n.

Exercise 6.6.4.5: Show that {x1 ··· xn + y1 ··· yn = 0} ⊂ P^{2n−1} is self-dual, in the sense that it is isomorphic to its own dual variety.
***fix V v. W and duals***
6.6.5. First steps towards equations. Segre's formula implies, for P ∈ S^dW∗, that dim Z(P)∨ ≤ k if and only if, for all w ∈ W,

(6.6.1) P(w) = 0 ⟹ det_{k+3}(P_{d−2,2}(w^{d−2})|_F) = 0 for all F ∈ G(k + 3, W).

Equivalently (assuming P is irreducible), for any F ∈ G(k + 3, W), the polynomial P must divide det_{k+3}(P_{d−2,2}|_F) ∈ S^{(k+3)(d−2)}W∗, where det_{k+3} is evaluated on the S²W∗ factor in S²W∗⊗S^{d−2}W∗.

Thus to find polynomials on S^dW∗ characterizing hypersurfaces with degenerate duals, we need polynomials that detect whether a polynomial P divides a polynomial Q. Now P divides Q if and only if Q ∈ P·S^{e−d}V, i.e.,

x^{I_1}P ∧ ··· ∧ x^{I_D}P ∧ Q = 0,

where the x^{I_j} form a basis of S^{e−d}V (and D = \binom{v+e−d−1}{e−d}), which potentially gives a \binom{\binom{v+e−1}{e}}{D+1}-dimensional space of equations. Let D_{k,d,N} ⊂ PS^dC^N denote the zero set of these equations.
Define Dual_{k,d,N} ⊂ P(S^dW∗) as the Zariski closure of the set of irreducible hypersurfaces of degree d in PW ≅ P^{N−1} whose dual variety has dimension at most k. Our discussion above implies Dual_{k,d,N} ⊆ D_{k,d,N}. Note that [det_n] ∈ Dual_{2n−2,n,n²} ⊆ D_{2n−2,n,n²}.
6.6.6. The lower bound on dc(perm_m).

Exercise 6.6.6.1: Compute perm_{m−2,2}(x^{m−2}) when x is the m × m matrix whose (1,1)-entry is 1 − m and all of whose other entries equal 1, and show it is of maximal rank.
Proposition 6.6.6.2. Let U = C^M, let R ∈ S^mU∗ be irreducible, let ℓ be a nonzero coordinate on C, and let U∗ ⊕ C ⊂ W∗ be a linear inclusion. If [R] ∈ D_{κ,m,M} and [R] ∉ D_{κ−1,m,M}, then [ℓ^{d−m}R] ∈ D_{κ,d,N} and [ℓ^{d−m}R] ∉ D_{κ−1,d,N}.
Proof. Choose a basis u1, . . . , uM, v, w_{M+2}, . . . , w_N of W so that (U∗)^⊥ = ⟨w_{M+2}, . . . , w_N⟩ and (L∗)^⊥ = ⟨u1, . . . , uM, w_{M+2}, . . . , w_N⟩. Let c = (d − m)(d − m − 1). In these coordinates, the matrix of (ℓ^{d−m}R)_{d−2,2} in (M, 1, N − M − 1) × (M, 1, N − M − 1)-block form is

(ℓ^{d−m}R)_{d−2,2} = \begin{pmatrix} ℓ^{d−m}R_{m−2,2} & ℓ^{d−m−1}R_{m−1,1} & 0 \\ ℓ^{d−m−1}R_{m−1,1} & c ℓ^{d−m−2}R & 0 \\ 0 & 0 & 0 \end{pmatrix}.

First note that det_{M+1}((ℓ^{d−m}R)_{d−2,2}|_F) for any F ∈ G(M + 1, W) is either zero or a multiple of ℓ^{d−m}R. If dim Z(R)∨ = M − 2 (the expected dimension), then for a general F ∈ G(M + 1, W), det_M((ℓ^{d−m}R)_{d−2,2}|_F) will not be a multiple of ℓ^{d−m}R, and more generally if dim Z(R)∨ = κ, then for a general F ∈ G(κ + 2, W), det_{κ+2}((ℓ^{d−m}R)_{d−2,2}|_F) will not be a multiple of ℓ^{d−m}R, but for any F ∈ G(κ + 3, W), det_{κ+3}((ℓ^{d−m}R)_{d−2,2}|_F) will be a multiple of ℓ^{d−m}R. This shows that [R] ∉ D_{κ−1,m,M} implies [ℓ^{d−m}R] ∉ D_{κ−1,d,N}.
Exercise 6.6.6.3: Show that [R] ∈ D_{κ,m,M} implies [ℓ^{d−m}R] ∈ D_{κ,d,N}. }
Combining Exercise 6.6.6.1 and Proposition 6.6.6.2, we conclude:

Theorem 6.6.6.4. [LMR13] Perm^m_n ⊄ D_{2n−2,n,n²} when n < m²/2. In particular, dc(perm_m) ≥ m²/2.
On the other hand, by Exercise 6.6.4.2 cones have degenerate duals, so ℓ^{n−m}perm_m ∈ D_{2n−2,n,n²} whenever n ≥ m²/2.
6.6.7. A better module of equations. The equations above are of degree \binom{v+e−d−1}{e−d} + e, where e = (k + 3)(d − 2). I now derive equations of much lower degree. Since P ∈ S^dW divides Q ∈ S^eW if and only if for each L ∈ G(2, W), P|_L divides Q|_L, it will be sufficient to solve this problem for polynomials on C². This will have the advantage of producing polynomials of much lower degree.
Let V = C², let d ≤ e, let P ∈ S^dV and Q ∈ S^eV. If P divides Q then S^{e−d}V · P will contain Q. That is, the vectors x^{e−d}P, x^{e−d−1}yP, . . . , y^{e−d}P, Q in S^eV will fail to be linearly independent, i.e.,

x^{e−d}P ∧ x^{e−d−1}yP ∧ ··· ∧ y^{e−d}P ∧ Q = 0.

Since dim S^eV = e + 1, these potentially give a \binom{e+1}{e−d+2}-dimensional vector space of equations, of degree e − d + 1 in the coefficients of P and linear in the coefficients of Q.
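The wedge condition is a rank condition on coefficient vectors and is easy to implement. Here is a minimal Python sketch for binary forms, with coefficient-list conventions of my own choosing (a degree-d form is the list of coefficients of x^{d−i}y^i):

```python
from fractions import Fraction

def shift_rows(p, e):
    """Rows of the multiplication map S^{e-d}V -> S^eV, q |-> qP,
    i.e. the coefficient vectors of x^{e-d-j} y^j P for j = 0..e-d."""
    d = len(p) - 1
    return [[Fraction(0)] * j + [Fraction(c) for c in p]
            + [Fraction(0)] * (e - d - j) for j in range(e - d + 1)]

def rank(rows):
    """Gaussian elimination over the rationals."""
    rows, rk, col = [r[:] for r in rows], 0, 0
    while rk < len(rows) and col < len(rows[0]):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col]), None)
        if piv is None:
            col += 1
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk, col = rk + 1, col + 1
    return rk

def divides(p, q):
    """P | Q iff Q lies in the span of the x^{e-d-j} y^j P, i.e. the
    wedge of all these vectors with Q vanishes."""
    rows = shift_rows(p, len(q) - 1)
    return rank(rows + [[Fraction(c) for c in q]]) == rank(rows)

# P = x*y divides Q = x^2*y - x*y^2 = (x - y)*x*y; x^2 + y^2 does not
print(divides([0, 1, 0], [0, 1, -1, 0]))   # True
print(divides([1, 0, 1], [0, 1, -1, 0]))   # False
```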
Let's compute the highest weight of these equations as a GL(V)-module. Using the summation convention, write P = P_{i,j}x^iy^j, Q = Q_{s,t}x^sy^t. Then the highest weight of a vector appearing in x^{e−d}P ∧ x^{e−d−1}yP ∧ ··· ∧ y^{e−d}P ∧ Q will be the coefficient of x^e ∧ x^{e−1}y ∧ ··· ∧ x^{d−1}y^{e−d+1}. The weight will be the sum of the subscript indices appearing in a monomial. All monomials appearing with this coefficient will have the same weight, so consider

x^e P_{d,0} ∧ x^{e−1}y P_{d,0} ∧ ··· ∧ x^d y^{e−d} P_{d,0} ∧ Q_{d−1,e−d+1},

which has highest weight (ed − d² + 2d − 1, e − d + 1). The irreducible GL2-module with this highest weight has dimension (d − 1)(2 + e − d) + 1, which is much less than the number of equations, which is \binom{e+1}{e−d+2}, so either there are other irreducible modules in the decomposition, or the equations are far from being linearly independent.
By taking our polynomials to be P = P|_L and Q = det_{k+3}(P_{d−2,2}|_F)|_L for F ∈ G(k + 3, V) and L ∈ G(2, F) (or, for those familiar with flag varieties, better to say (L, F) ∈ Flag_{2,k+3}(V)), we now have equations parametrized by the pairs (L, F). Note that deg(Q) = e = (k + 3)(d − 2).
Remark 6.6.7.1. More generally, with V = C2 , given P ∈ S d V , Q ∈ S e V ,
one can ask if P, Q have at least r roots in common (counting multiplicity).
Then P, Q having r points in common says the spaces S e−r V ·P and S d−r V ·Q
intersect. That is,
xe−r P ∧ xe−r−1 yP ∧ · · · ∧ y e−r P ∧ xd−r Q ∧ xd−r−1 yQ ∧ · · · ∧ y d−r Q = 0.
In the case r = 1, we get a single polynomial, called the resultant, which
is of central importance. In particular, the proof that the projection of a
projective variety X ⊂ PW from a point y ∈ PW with y 6∈ X, to P(W/ŷ)
is still a projective variety relies on the resultant to produce equations for
the projection. This in turn enables the definition of dimension and degree
often used in projective algebraic geometry. See, e.g., [Har95, Lect. 3].
6.6.8. Module structure of the better set of equations. Write P = Σ_J P̃_J x^J with the sum over |J| = d. The weight of a monomial P̃_{J_0}x^{J_0} is J_0 = (j_1, . . . , j_n). Adopt the notation [i] = (0, . . . , 0, 1, 0, . . . , 0), where the 1 is in the i-th slot, and similarly for [i, j], where there are two 1's. The entries of P_{d−2,2} are, for i ≠ j, (P_{d−2,2})_{i,j} = P_{I+[i,j]}x^I, and for i = j, P_{I+2[i]}x^I, where |I| = d − 2 and P_J is P̃_J with the coefficient adjusted, e.g., P_{(d,0,...,0)} = d(d−1)P̃_{(d,0,...,0)}, etc. (This won't matter because we are only concerned with the weights of the coefficients, not their values.)
To determine the highest weight vector, take L = span{e1, e2}, F = span{e1, . . . , e_{k+3}}. The highest weight term of

(x_1^{e−d}P|_L) ∧ (x_1^{e−d−1}x_2 P|_L) ∧ ··· ∧ (x_2^{e−d}P|_L) ∧ (det_{k+3}(P_{d−2,2}|_F))|_L

is the coefficient of x_1^e ∧ x_1^{e−1}x_2 ∧ ··· ∧ x_1^{d−1}x_2^{e−d+1}. It will not matter how we distribute these for the weight, so take the coefficient of x_1^e in (det_{k+3}(P_{d−2,2}|_F))|_L. It has leading term P_{(d,0,...,0)}P_{(d−2,2,0,...,0)}P_{(d−2,0,2,0,...,0)} ··· P_{(d−2,0,...,0,2,0,...,0)}, which is of weight (d + (k+2)(d−2), 2^{k+2}). For each (x_1^{e−d−s}x_2^s P|_L) take the coefficient of x_1^{e−s−1}x_2^{s+1}, which has the coefficient P_{(d−1,1,0,...,0)} each time, to get a total weight contribution of ((e−d+1)(d−1), e−d+1, 0, . . . , 0) from these terms. Adding the weights together, and recalling that e = (k+3)(d−2), the highest weight is

(d²k + 2d² − 2dk − 4d + 1, dk + 2d − 2k − 3, 2^{k+1}),

which may be written as

((k+2)(d² − 2d) + 1, (k+2)(d−2) + 1, 2^{k+1}).
In summary:
Theorem 6.6.8.1. [LMR13] The ideal of the variety D_{k,d,N} ⊂ P(S^dC^{N∗}) contains a copy of the GL_N-module S_{π(k,d)}C^N, where

π(k, d) = ((k+2)(d² − 2d) + 1, d(k+2) − 2k − 3, 2^{k+1}).

Since |π| = d(k+2)(d−1), these equations have degree (k+2)(d−1).
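The weight arithmetic above is mechanical and easy to verify; a small illustrative Python check of π(k, d) and its size (the function name is mine):

```python
def pi_weight(k, d):
    """The highest weight pi(k, d) of Theorem 6.6.8.1 as an explicit tuple."""
    return ((k + 2) * (d * d - 2 * d) + 1, (k + 2) * (d - 2) + 1) + (2,) * (k + 1)

for k, d in [(4, 3), (6, 4), (8, 5)]:
    assert sum(pi_weight(k, d)) == d * (k + 2) * (d - 1)  # degree (k+2)(d-1)
print(pi_weight(4, 3))   # (19, 7, 2, 2, 2, 2, 2), the weight of Remark 6.6.8.4
```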
Observe that the module π(2n−2, n) indeed satisfies the requirements to be (n, m)-GCT useful, as p1 = 2n³ − 4n² + 1 ≥ d(n − m) in the relevant range (here d = 2n(n−1) is the degree) and ℓ(π(2n−2, n)) = 2n + 1.
Remark 6.6.8.2. I do not know whether the module of equations obtained as the span over the pairs (L, F) is irreducible. The dimension count indicates that either there is substantial redundancy among the equations or the module is not irreducible.
Proposition 6.6.8.3. When restricted to the open subset of irreducible hypersurfaces in S^dC^{N∗}, Dual_{k,d,N} = D_{k,d,N} as sets.

Proof. Let P ∈ D_{k,d,N} be irreducible. For each (L, F), one obtains set-theoretic equations for the condition that P|_L divides Q|_L, where Q = det_{k+3}(P_{d−2,2}|_F). But P divides Q if and only if P|_L divides Q|_L for every plane L, so these conditions imply that the dual variety of the irreducible hypersurface Z(P) has dimension at most k.
Remark 6.6.8.4. The module with highest weight π(2n−2, n) in I_{2n(n−1)}(Det_n) ⊂ S^{2n(n−1)}(S^nC^{n²}) is unlikely to be even a representation-theoretic obstruction. For example, when n = 3, the module with highest weight (19, 7, 2⁵) occurs with multiplicity six in S^{12}(S³C⁹), but only one copy of it appears to be in the ideal.
6.6.9. Detn is an irreducible component of D2n−2,n,n2 . This section is
for those familiar with Zariski tangent spaces to schemes. It is not used elsewhere. Given two schemes, X, Y with X irreducible and X ⊆ Y , when one
has an equality of Zariski tangent spaces, Tx X = Tx Y for some x ∈ Xsmooth ,
this implies that X is an irreducible component of Y (and in particular, if
Y is irreducible, that X = Y ). The following theorem is the main result of
[LMR13]:
Theorem 6.6.9.1. [LMR13] The scheme D2n−2,n,n2 is smooth at [detn ],
and Detn is an irreducible component of D2n−2,n,n2 .
**picture***
The idea of the proof is as follows: We need to show T_{[det_n]}D_{2n−2,n,n²} = T_{[det_n]}Det_n. We already know T_{[det_n]}Det_n ⊆ T_{[det_n]}D_{2n−2,n,n²}. Both of these vector spaces are G_{det_n}-submodules of S^n(E⊗F) = ⊕_{|π|=n}Sπ E⊗Sπ F. It is easy to see that, as a GL(E)×GL(F)-module, T_{[det_n]}Det_n = S_{2,1^{n−2}}E⊗S_{2,1^{n−2}}F, so the problem becomes to show that none of the other modules are in T_{[det_n]}D_{2n−2,n,n²}. To do this, it suffices to check a single point in each module. A first guess would be to check highest weight vectors, but these are not so easy to write down in any uniform manner. Fortunately in this case there is another choice, namely the immanants (see Remark 5.2.6.8), and the proof in [LMR13] proceeds by checking that none of these other than IM_{2,1^{n−2}} are contained in T_{[det_n]}D_{2n−2,n,n²}.
Exercise 6.6.9.2: Show that T̂detn Detn = S1n E⊗S1n F ⊕S2,1n−1 E⊗S2,1n−1 F .
6.6. Dual varieties and GCT
137
Theorem 6.6.9.1 implies that the GL(W )-module of highest weight π(2n−
2, n) given by Theorem 6.6.8.1 gives local equations at [detn ] of Detn , of degree 2n(n − 1). Since Subk (S n CN ) ⊂ Dk,n,N , the zero set of the equations
2
is strictly larger than Detn . Recall that dim Subk (S n Cn ) = k+n+1
+ (k +
n
2
2)(N − k − 2) − 1. For k = 2n − 2, N = n , this is larger than the dimension
of the orbit of [detn ], and therefore D2n−2,n,n2 is not irreducible.
6.6.10. On the boundary of the orbit of the determinant. **begin
with general discussion of finding components on the boundary*** Recall
2
that the transposition τ ∈ Gdetn allows us to write Cn = E⊗E = S 2 E ⊕
Λ2 E, where the decomposition is into the ±1 eigenspaces for τ . For M ∈
E⊗E, write M = MS + MΛ reflecting this decomposition.
2
Define a polynomial PΛ ∈ S n (Cn )∗ by
PΛ (M ) = detn (MΛ , . . . , MΛ , MS ).
Let P fi (MΛ ) denote the Pfaffian of the skew-symmetric matrix, obtained
from MΛ by suppressing its i-th row and column. Write MS = (sij ).
Exercise 6.6.10.1: Show that
X
PΛ (M ) =
sij P fi (MΛ )P fj (MΛ ).
i,j
In particular, PΛ = 0 if n is even but is not identically zero when n is odd.
Proposition 6.6.10.2. [LMR13] PΛ,n ∈ Detn . Moreover, GL(W ) · PΛ is
an irreducible codimension one component of the boundary of Detn , not
contained in End(W ) · [detn ]. In particular dc(PΛ,m ) = m < dc(PΛ,m ).
Proof. For the first assertion, note that
1
PΛ (M ) = lim det(MΛ + tMS ).
t→0 t
To prove the second assertion, we compute the stabilizer of PΛ inside GL(Mn (C)).
The action of GL(E) on E⊗E by M 7→ gM g t preserves PΛ up to scale, and
the Lie algebra of the stabilizer of [PΛ ] is a GL(E) submodule of End(E⊗E).
Decompose End(E⊗E) as a GL(E)-module:
End(E⊗E) = End(Λ2 E) ⊕ End(S 2 E) ⊕ Hom(Λ2 E, S 2 E) ⊕ Hom(S 2 E, Λ2 E)
= Λ2 E⊗Λ2 E ∗ ⊕ S 2 E⊗S 2 E ∗ ⊕ Λ2 E ∗ ⊗S 2 E ⊕ S 2 E ∗ ⊗Λ2 E
(6.6.2)
= (gl(E) ⊕ S22 ,1n−2 E) ⊕ (gl(E) ⊕ S4,2n−1 E) ⊕ (sl(E) ⊕ S3,1n−2 E) ⊕ (sl(E) ⊕ S32 ,2n−2 E)
By testing highest weight vectors, one concludes the Lie algebra of GPΛ is
isomorphic to gl(E) ⊕ gl(E), which has dimension 2n2 = dim Gdetn + 1,
implying GL(W ) · PΛ has codimension one in GL(W ) · [detn ]. Since it is
138
6. Geometric Complexity Theory
not contained in the orbit of the determinant, it must be an irreducible
component of its boundary. Since the zero set is not a cone, PΛ cannot be
in End(W ) · detn which consists of GL(W ) · detn plus cones, as any element
of End(W ) either has a kernel or is invertible.
Exercise 6.6.10.3: Verify by testing on highest weight vectors that the
only ones in (6.6.2) annihilating PΛ are those in gl(E) ⊕ gl(E). Note that
as a gl(E)-module, gl(E) = sl(E) ⊕ C so one must test the highest weight
vector of sl(E) and C.
The hypersurface defined by PΛ has interesting properties.
Proposition 6.6.10.4. [LMR13]
2 −1
Z(PΛ )∨ = P{v 2 ⊕ v ∧ w ∈ S 2 Cn ⊕ Λ2 Cn , v, w ∈ Cn } ⊂ Pn
.
As expected, Z(PΛ )∨ resembles Seg(Pn−1 × Pn−1 ).
Remark 6.6.10.5. For those familiar with the notation, Z(PΛ ) can be
defined as the image of the projective bundle π : P(E) → Pn−1 , where
E = O(−1) ⊕ Q is the sum of the tautological and quotient bundles on
Pn−1 , by a sub-linear system of OE (1) ⊗ π ∗ O(1). This sub-linear system
contracts the divisor P(Q) ⊂ P(E) to the Grassmannian G(2, n) ⊂ PΛ2 Cn .
Remark 6.6.10.6. A second way to realize the polynomial P = ∗ ∗ ∗ from
Example ?? is via PΛ : take




0
x3 x2
x1 0 0
0
x1  , MS =  0 x4 0  .
MΛ = −x3
−x2 −x1 0
0 0 x2
6.7. Remarks on Kronecker coefficients and plethysm
coefficients
To pursue the path outlined in [MS01, MS08], the first task would be to
find a useful module in I(Detn ). That is, by the results described in §6.3,
to find a sequence (depending on n, which I suppress from the notation of
π = π(n) and d = d(n)) of partitions π = (p1 , . . . , p` ) of dn such that
• skdπn ,dn < mult(Sπ W, S d (S n W ))
• `(π) ≤ m
• p1 > d(n − m).
To advance the state of
√ the art, by the results discussed in §6.6 it will
necessary to take m ≤ 2n . Write pπ (d[n]) := mult(Sπ W, S d (S n W )) for this
plethysm coefficient.
6.8. Symmetries of polynomials and coordinate rings of orbits
139
What follows is information about Kronecker coefficients and plethysm
coefficients that may be relevant for this problem.
6.7.1. Remarks on Kronecker coefficients. **This section under construction***
6.7.2. Remarks on plethysm coefficients. First observe that
S d (S n V ) = (V ⊗dn )Sn oSd
(6.7.1)
where Sn o Sd = S×d
n o Sd ⊂ Sdn is the wreath product, which consists
of permutations acting on V ⊗dn considered as d blocks of n copies of V ,
permuting within each of the n blocks and permuting the blocks. We have,
M
S d (S n V ) = (
[π]⊗Sπ V )Sn oSd
|π|=dn
=
M
[π]Sn oSd ⊗Sπ V
|π|=dn
and we conclude, assuming dim V ≥ dn, that
pπ (d[n]) := mult(Sπ V, S d (S n V )) = dim[π]Sn oSd .
Now let dim V = dn. Fix a basis of V so WV ' Sdn ⊂ GL(V ) is
realized as the group of permutation matrices. (I use WV instead of Sdn to
avoid confusion with the Sdn that acts on V ⊗dn by permuting the factors:
WV acts on V ⊗dn by acting on each factor.) Let (Sπ V )0 ⊂ Sπ V denote
the (1dn )-weight space (the sl(V )-weight zero subspace). Then WV acts on
(Sπ V )0 .
Proposition 6.7.2.1. Notation as above, let |π| = dn, then (Sπ V )0 = [π]
as a WV -module.
More generally:
Theorem 6.7.2.2. [Gay76] For |π| = d = vs and |µ| = v,
multSv ([µ], (Sπ V )0 ) = multGL(V ) (Sπ V, Sµ (S s V )).
In particular:
pπ (d[n]) = multSv ([d], (Sπ V )0 ).
***proofs to be added***
6.8. Symmetries of polynomials and coordinate rings of
orbits
***needs an introduction***
The calculations of the coordinate rings of the orbits follow [BLMW11].
Examples 6.8.2, 6.8.3, 6.8.5, and 6.8.6 follow [CKW10].
140
6. Geometric Complexity Theory
Throughout this section G = GL(V ), dim V = n, and I use index ranges
1 ≤ i, j, k ≤ n.
I begin with the cases of: a generic polynomial, and the most degenerate
polynomial, namely xd .
Other than §6.8.3, this section contains auxiliary results and may be
skipped on a first reading.
6.8.1. Generic polynomials. Let P ∈ S d V be generic. If d, n > 3, then
GP = {λId : λd = 1} ' Zd , hence GL(V ) · P ' GL(V )/Zd , where Zd acts
as multiplication by the d-th roots of unity, see [Pop75]. (If P ∈ S d V is
any element, then Zd ⊂ GP .)
We need to determine the Zd -invariants in GL(V )-modules. Since Sπ V is
a submodule of V ⊗|π| , ω ∈ Zd acts on Sπ V ⊗(det V )−s by the scalar ω |π|−ns .
By Theorem 5.1.7.1, we conclude the following equality of GL(V )-modules:
M
C[GL(V ) · P ] =
(Sπ V ∗ )⊕ dim Sπ V ⊗(det V ∗ )−s .
(π,s) | d||π|−ns
L
When we pass to C[GL(V ) · P ] = δ S δ (S d V ∗ )/Iδ (GL(V ) · P ) we loose
all terms with s > 0. Since degree is respected, we may write:
M
(6.8.1)
C[GL(V ) · P ]δ ⊆
(Sπ V ∗ )⊕ dim Sπ V .
π | |π|=δd
Note that C[GL(V ) · P ]δ ⊂ S δ (S d V ), but there are far fewer modules and
multiplicities in S δ (S d V ) than on the right hand side of (6.8.1).
6.8.2. A point of vd (PV ). Let P = xd1 ∈ S d V . Let g = (gji ) ∈ GL(V ).
Then g · (xd1 ) = (g1j xj )d so if g · (xd1 ) = xd1 , then g1j = 0 for j > 1 and g11 must
be a d-th root of unity. There are no other restrictions, thus






∗ ∗ ···
ω ∗ ··· ∗











0 ∗ · · ·

 0 ∗ · · · ∗

 d

GP = g ∈ GL(V ) | g = 
 , ω = 1 , G[P ] = g ∈ GL(V ) | g = 
..
..






.
.









0 ∗ ···
0 ∗ ··· ∗
The GL(V ) orbit of [xd1 ] is closed and equal to the Veronese variety
vd (PV ).
Exercise 6.8.2.1: Use Corollary 6.2.3.1 to determine C[vd (PV )] (even if
you already know it by a different method).

∗ 



∗

 .




∗
6.8. Symmetries of polynomials and coordinate rings of orbits
141
6.8.3. The Chow polynomial. Let P = chown := x1 · · · xn ∈ S n V ,
which I will call the “Chow polynomial”. It is clear Γn := TnSL o Sn ⊂
Gchown , we need to determine if the stabilizer is larger. We can work by
brute force: g · chown = (g j1 xj1 ) · · · (g jn xjn ). In order that this be equal to
x1 · · · xn , by unique factorization of polynomials, there must be a permutation σ ∈ Sn such that for each k, we have gkj xj = λk xσ(k) for some λk ∈ C∗ .
Composing with the inverse of this permutation we have gkj = δkj λj , and
finally we see that we must further have λ1 · · · λn = 1, which means it is an
element of TnSL , so the original g is an element of Γn . Thus Gchown = Γn .
The orbit closure of chown is the Chow variety Chn (V ) ⊂ PS n V .
When v = n (which we may restrict to by inheritance), the space of T SL
invariants in a GL(V )-module Sπ V is the (dn )-weight space, which I will
denote by (Sπ V )0 because it is the sl(V )-weight zero space. The algebraic
Peter-Weyl theorem 5.1.7.1, implies
M
Sn
C[GL(V ) · (x1 · · · xn )] =
(Sπ V ∗ )⊕ dim(Sπ V )0 ,
`(π)≤n
Now specialize to the case of modules appearing in Sym(S n V ). Corollary
n
= mult(Sπ V, S n (S s V )). If we consider all the π’s
??(ii) says dim(Sπ V )S
0
together, we conclude
M
C[GL(V ) · (x1 · · · xn )]poly =
S n (S s V ∗ ).
s
In particular,
this in §8.1.1.
L
sS
n (S s V ∗ )
inherits a ring structure. We’ll return to
6.8.4. Techniques. For other polynomials, the determination of GP needs
further techniques. Here is a brief overview of some methods.
One technique is to form auxiliary objects from P which have a symmetry group H that one can compute, and by construction H contains GP .
Usually it is easy to find a group H 0 that clearly is contained in GP , so if
H = H 0 , we are done.
One can determine the connected component of the stabilizer by a Lie
algebra calculation: If we are concerned with p ∈ S d V , the connected component of the identity of the stabilizer of p in GL(V ) is the connected Lie
group associated to the Lie subalgebra of gl(V ) that annihilates p. (The
analogous statement holds for tensors.) To see this, let h ⊂ gl(V ) denote the annihilator of p and let H = exp(h) ⊂ GL(V ) the corresponding Lie group. Then it is clear that H is contained in the stabilizer as
h · p = exp(X) · p = (Id + X + 21 XX + ...)p the first term preserves p
142
6. Geometric Complexity Theory
and the remaining terms annihilate it. Similarly, if H is the group preserving p, taking the derivative of any curve in H through Id at t = 0 give
d
dt |t=0 h(t) · p = 0.
To recover the full stabilizer from knowledge of the connected component
of the identity, we have the following observation, the first part was exploited
in [BGL14]:
Proposition 6.8.4.1. Let V be an irreducible GL(W )-module. Let G0 be
the identity component of the stabilizer Gv of some v ∈ V in GL(W ). Then
Gv is contained in the normalizer of G0 in GL(W ). If G0 is semi-simple and
[v] is determined by G0 , then equality holds up to scalar multiples of the
identity in GL(W ).
Proof. First note that for any group H, the full group H normalizes H 0 .
(If h ∈ H 0 , take a curve ht with h0 = Id and h1 = h, then take any g ∈ H,
the curve ght g −1 connects gh1 g −1 to the identity.) So Gv is contained in
the normalizer of G0 in GL(W ).
For the second assertion, let h ∈ N (G0 ) be in the normalizer. We have
h−1 ghv = g 0 v = v for some g 0 ∈ G0 , and thus g(hv) = (hv). But since [v] is
the unique line preserved by G0 we conclude hv = λv for some λ ∈ C∗ . The following lemma requires more representation theory than we have
used so far, but since it only appears here, readers not familiar with root
systems can skip it without harm:
Lemma 6.8.4.2. [BGL14, Prop. 2.2] Let G0 be semi-simple and act irreducibly on V . Then its normalizer is generated by itself, the scalar matrices, and a finite group constructed as follows: Assume we have chosen
a Borel for G0 , and thus have distinguished a set of simple roots ∆ and
a group homomorphism Aut(∆) → GL(V ). Assume V = Vλ is the irreducible representation with highest weight λ of G0 and consider the subgroup
Aut(∆, λ) ⊂ Aut(∆) that fixes λ. Then N (G0 ) = ((C∗ ×G0 )/Z)oAut(∆, λ).
For the proof, see [BGL14].
Further techniques come from geometry. Consider the hypersurface
Z(P ) := {[v] ∈ PV ∗ | P (v) = 0} ⊂ PV ∗ . If all the irreducible components of P are reduced, then GZ(P ) = G[P ] , as a reduced polynomial may be
recovered up to scale from its zero set, and in general GZ(P ) ⊇ G[P ] . Moreover, we can consider its singular set Z(P )sing , which may be described as
the zero set of the image of P1,d−1 (which is essentially the exterior derivative dP ). If P = aI xI , where ai1 ,...,id is symmetric in its lower indices,
then Z(P )sing = {[v] ∈ PV ∗ | ai1 ,i2 ,...,id xi2 (v) · · · xid (v) = 0 ∀i1 }. While
we could consider the singular locus of the singular locus etc.., it turns out
6.8. Symmetries of polynomials and coordinate rings of orbits
143
to be easier to work with what I will call the Jacobian loci. For an arbitrary variety X ⊂ PV , define XJac,1 := {x ∈ PV | dPx = 0∀P ∈ I(X)}.
If X is a hypersurface, then XJac,1 = Xsing but in general they can be
different. Define XJac,k := (XJac,k−1 )Jac,1 . Algebraically, if X = Z(P )
for some P ∈ S d V , then the ideal of Z(P )Jac,k is generated by the image of
Pk,d−k : S k V ∗ → S d−k V . The symmetry groups of these varieties all contain
GP .
6.8.5. The Fermat. Let fermatdn := xd1 +· · ·+xdn . The GL(V )-orbit closure
of [fermatdn ] is the n-th secant variety of the Veronese variety σn (vd (PV )) ⊂
PS n V . It is clear Sn ⊂ Gfermat , as well as the diagonal matrices whose
entries are d-th roots of unity. We need to see if there is anything else.
The first idea, to look at the singular locus, does not work, as the zero set
is smooth, so we consider fermat2,d−2 = x21 ⊗xd−2 + · · · + x2n ⊗xd−2 . Write
the further polarization P1,1,d−2 as a symmetric matrix whose entries are
homogeneous polynomials of degree d − 2 (the Hessian matrix). We get

 d−2
x1


..
.

.
xnd−2
Were the determinant of this matrix GL(V )-invariant, we could proceed as
we did with chown , using unique factorization. Although it is not, it is
close enough as follows: Recall that for a linear map f : W → V , where
dim W = dim V = n, we have f ∧n ∈ Λn W ∗ ⊗Λn V and an element (h, g) ∈
GL(W ) × GL(V ) acts on f ∧n by (h, g) · f ∧n = (det(h))−1 (det(g))f ∧n . In
∧n (x) = det(g)2 P ∧n (g · x), and the polynomial
our case W = V ∗ so P2,d−2
2,d−2
obtained by the determinant of the Hessian matrix is invariant up to scale.
Arguing as above, (g1j xj )d−2 · · · (gnj xj )d−2 = xd−2
· · · xd−2
and we conn
1
clude again by unique factorization that g is in Γn . Composing with a
permutation matrix to make g ∈ T , we see that, by acting on the Fermat
itself, that the entries on the diagonal are d-th roots of unity.
Exercise 6.8.5.1: Show that the Fermat is characterized by its symmetries.
6.8.6. The sum-product polynomial. The following polynomial, called
the sum-product polynomial, will be important when studying depth-3 circuits:
m
X
n
SPm
:=
Πnj=1 xij ∈ S n (Cnm ).
i=1
Its GL(mn)-orbit closure is the m-th secant variety of the Chow variety
σm (Chn (Cnm )).
144
6. Geometric Complexity Theory
n is characterized by
Exercise 6.8.6.1: Determine GSPmn and show that SPm
its symmetries.
2
6.8.7. Iterated matrix multiplication. Let IM Mnk ∈ S n (Ck n ) denote
the iterated matrix multiplication operator for k×k matrices, (X1 , . . . , Xn ) 7→
trace(X1 · · · Xn ). Letting Vj = Ck , invariantly
IM Mnk = IdV1 ⊗ · · · ⊗ IdVn ∈(V1 ⊗V2∗ )⊗(V2 ⊗V3∗ )⊗ · · · ⊗ (Vn−1 ⊗Vn∗ )⊗(Vn ⊗V1∗ )
⊂ S n ((V1 ⊗V2∗ ) ⊕ (V2 ⊗V3∗ ) ⊕ · · · ⊕ (Vn−1 ⊗Vn∗ ) ⊕ (Vn ⊗V1∗ )),
2
and the connected component of the identity of GIM Mnk ⊂ GL(Ck n ) is clear.
The case of IM Mn3 is important as this sequence is complete for the
complexity class VPe , of sequences of polynomials admitting small formulas,
see [BOC92]. Moreover IM Mnn is complete for the same complexity class
as the determinant, namely VQP, see [Blä01].
Problem 6.8.7.1. Find equations in the ideal of GL9n · IM Mn3 . Determine
3
lower bounds for the inclusions Permm
n ⊂ GL9n · IM Mn and study common
geometric properties (and differences) of Detn and GL9n · IM Mn3 .
Problem 6.8.7.2. Determine GIM Mn3 and GIM Mnn . Are IM Mn3 , IM Mnn
determined by their stabilizers?
2
6.8.8. The permanent. Write Cn = E⊗F . Then it is easy to see (ΓE
n ×
ΓFn ) o Z2 ⊆ Gpermn , where the nontrivial element of Z2 acts by sending a
SL
matrix to its transpose and recall ΓE
n = TE o Sn . We would like to show
this is the entire symmetry group. However, it is not when n = 2.
Exercise 6.8.8.1: What is Gperm2 ? }
F
Theorem 6.8.8.2. [MM62] For n ≥ 3, Gpermn = (ΓE
n × Γn )/µn o Z2 .
Consider Z(permn )sing ⊂ P(E⊗F )∗ . It consists of the matrices all of
whose size n − 1 submatrices have zero permanent. (To see this, note the
permanent has Laplace type expansions.) This seems even more complicated
than the hypersurface Z(permn ) itself. Continuing, Z(permn )Jac,k consists
of the matrices all of whose sub-matrices of size n − k have zero permanent.
In particular Z(permn )Jac,n−2 is defined by quadratic equations. Its zero
set has many components, but each component is easy to describe:
Lemma 6.8.8.3. Let A be an n × n matrix all of whose size 2 submatrices
have zero permanent. Then one of the following hold:
(1) all the entries of A are zero except those in a single size 2 submatrix,
and that submatrix has zero permanent.
(2) all the entries of A are zero except those in the j-th row for some
j. Call the associated component C j .
6.8. Symmetries of polynomials and coordinate rings of orbits
145
(3) all the entries of A are zero except those in the j-th column for
some j. Call the associated component Cj .
The proof is straight-forward. Take a matrix with entries that don’t fit
that pattern, e.g., one that begins
a b e
∗ d ∗
and note that it is not possible to fill in the two unknown entries and have
all size two sub-permanents, even in this corner, zero. There are just a few
F
such cases since we are free to act by SE
n × Sn .
Proof of theorem 6.8.8.2. I follow [Ye11]. Any linear transformation
preserving the permanent must send a component of Z(permn )Jac,n−2 of
type (1) to another of type (1). It must send a component C j either to some
C k or some Ci . But if i 6= j, C j ∩ C i = 0 and for all i, j, dim(C i ∩ Cj ) = 1.
Since intersections must be mapped to intersections, either all components
C i are sent to components Ck or all are permuted among themselves. By
composing with an element of Z2 , we may assume all the C i ’s are sent to
C i ’s and the Cj ’s are sent to Cj ’s. Similarly, by composing with an element
of Sn × Sn we may assume each Ci and C j is sent to itself. But then their
intersections are sent to themselves. So we have, for all i, j,
(6.8.2)
(xij ) 7→ (λij xij )
for some λij and there is no summation in the expression. Consider the
image of a size 2 submatrix, e.g.,
(6.8.3)
x11 x12
λ11 x11 λ12 x12
7
→
.
2
2
x1 x2
λ21 x21 λ22 x22
In order that the map (6.8.2) be in Gpermn , when (xij ) ∈ Z(permn )Jac,n−2 ,
the permanent of the matrix on the right hand side of (6.8.3) must be zero.
The permanent of the right hand side of (6.8.3) when (xij ) ∈ Z(permn )Jac,n−2
is λ11 λ22 x11 x22 + λ21 λ12 x12 x21 = x11 x22 (λ11 λ22 − λ21 λ12 ) which implies λ11 λ22 − λ12 λ21 = 0,
thus all the 2 × 2 minors of the matrix (λij ) are zero, so it has rank one and
is the product of a column vector and a row vector, but then it is an element
of TE × TF .
Because the stabilizer of the permanent is so small, the coordinate ring
of its orbit will be much larger than that of the determinant. An analysis of
the coordinate ring of the orbits of the permanent and padded permanent
is given in [BLMW11].
146
6. Geometric Complexity Theory
6.8.9. The Pascal determinant. Let k be even, and let Aj = Cn . Define
the k-factor Pascal determinant P Dk,n to be the unique up to scale element
of ΛnP
A1 ⊗ · · · ⊗ Λn Ak ⊂ S n (A1 ⊗ · · · ⊗ Ak ). Choose the scale such that if
X = xi1 ,...,ik a1,i1 ⊗ · · · ⊗ ak,ik with aα,j a basis of Aα , then
X
sgn(σ2 · · · σk )x1,σ2 (1),...,σk (1) · · · xn,σ2 (n),...,σk (n)
P Dk,n (X) =
σ2 ,...,σk ∈Sn
By this expression we see, fixing k, that (P Dk,n ) ∈ VNP.
Proposition 6.8.9.1 (Gurvits). The sequence (P D4,n ) is VNP complete.
Proof. Set xijkl = 0 unless i = j and k = l. Then xi,σ2 (i),σ3 (i),σ4 (i) = 0
unless σ2 (i) = i and σ3 (i) = σ4 (i) so the only nonzero monomials are those
where σ2 = Id and σ3 = σ4 , since the sign of σ3 is squared, the result is the
permanent.
Thus we could just as well work with the sequence P D4,n as the permanent. Initially this is appealing, as its stabilizer resembles that of the
determinant, and, after all, P D2,n = detn .
It is clear the identity component of the stabilizer includes SL×k
n /µn,k
where µn is as in §6.2.4, and a straight-forward Lie algebra calculation confirms this is the entire identity component. (Alternatively, one can use
Dynkin’s classification [Dyn52] of maximal subalgebras.) It is also clear
that Sk preserves P Dn,k by permuting the factors.
Theorem 6.8.9.2 (Garibaldi, personal communication). For all k even
GP Dk,n = SL×···×k
/µn,k o Sk
n
Note that this includes the case of the determinant, and gives a new
proof.
It follows from
Lemma 6.8.9.3. [Garibaldi, personal communication] Let V = A1 ⊗ · · · ⊗ Ak .
×k
The normalizer of SL×k
n /µn in GL(V ) is GLn /Z o Sk , where Z denotes
the kernel of the product map (C∗ )×k → C∗ .
Proof of Lemma 6.8.9.3. We use Lemma 6.8.4.2. In our case, the Dynkin
diagram for (∆, λ) is
and Aut(∆, λ) is clearly Sk .
The theorem follows.
6.8. Symmetries of polynomials and coordinate rings of orbits
...
...
...
.
.
.
...
Figure 6.8.1. Marked Dynkin diagram for V
147
Chapter 7
Shallow circuits and
the Chow variety
This chapter discusses approaches to GCT that do not involve the determinant.
In §7.1, I explain results from [GKKS13b] and their consequences for
geometry. The result is that to show VP 6= VNP, one can show the padded
permanent is not in the orbit closure of either of two relatively nice varieties.
The price is that instead of a polynomial separation, one needs a much more
severe separation.
In [GKKS13a], they utilize the method of shifted partial derivatives,
which is a special case of Young-flattenings. I discuss this method and its
relationship to Hilbert functions of Jacobian ideals in §7.2.
One of the two varieties mentioned above is r-th secant variety of the
Chow variety of products of linear forms. The Chow variety itself has been
studied for over 100 years, and in §7.3 I give a history and bring the reader
up to the state of the art.
I believe the Chow variety is important for complexity theory for several
reasons: the study of depth three circuits, as a “toy” orbit closure problem,
and since it is contained in the orbit closures central to GCT, information
about its coordinate ring will be useful for the study of Detn , the orbit
closure of the determinant. It also has a fascinating history, and deep connections to problems in combinatorics, representation theory and algebraic
geometry.
149
150
7. Shallow circuits and the Chow variety
7.1. Shallow circuits and geometry
7.1.1. Introduction. The depth of a circuit C is the length of (i.e., the
number of edges in) the longest path in C from an input to its output. If a
circuit has small depth, it is called a shallow circuit, and the polynomial it
computes can be computed quickly in parallel. When one studies circuits of
bounded depth, one must allow gates to have an arbitrary number of edges
coming in to them (“unbounded fanin”). For such circuits, multiplication
by constants is considered “free.”
**picture here***
The main theoretical interest in shallow circuits for separating complexity classes is that there are depth reduction theorems described in §7.1.2 that
enable one substitute the problem of e.g., showing that there does not exist
a small circuit computing the permanent to the problem of showing that
there does not exists a “slightly less small” shallow circuit computing the
permanent. Two classes of shallow circuits have very nice algebraic varieties
associated to them: the depth three or ΣΠΣ circuits, which consist of depth
three formulas where the first layer of gates consist of additions, the second
of multiplications, and the last gate is an addition gate, and the ΣΛΣΛΣ circuits, which are special depth five circuits, where the first layer of gates are
additions, the second layer consists of “powering gates”, where a powering
gate takes f to f δ for some natural number δ, the third layer addition gates,
the fourth layer again powering gates, and the fifth layer is an addition gate.
I describe the associated varieties to these classes of circuits in §7.1.3 and
§7.1.4.
One can restrict one’s class of circuits further by requiring that they are
homogeneous in the sense that each gate computes a homogeneous polynomial. It turns out that for ΣΛΣΛΣ circuits, this is not restrictive for the
questions of interest, but for ΣΠΣ circuits, there is a tremendous loss of
computing power described in §7.1.4. Fortunately this loss of computing
power can be overcome by something we are already familiar with: computing padded polynomials.
7.1.2. Depth reduction theorems. A major result in the study of shallow circuits was [VSBR83], where it was shown that if a polynomial of
degree d can be computed by a circuit of size s, then it can be computed by
a circuit of depth O(logdlogs) and size polynomial in s.
Here are the relevant results relevant for our discussion. They combine
results of [Bre74, GKKS13b, Tav, Koi, AV08]:
Theorem 7.1.2.1. Let d be a polynomial in n and let Pn ∈ S d Cn be a
sequence of polynomials that can be computed by a circuit of size s = s(n).
7.1. Shallow circuits and geometry
151
Then:
√
(1) P is computable√by a ΣΠΣ circuit of size roughly s
cisely of size 2O( dlog(n)log(ds)) .
d,
more pre-
(2) P is computable, by a homogeneous
√ ΣΛΣΛΣ circuit of size roughly
√
O( dlog(ds)log(n))
d
s , more precisely of size
, and both powering
√ 2
gates of size of roughly d.
Here are ideas towards the proof: In [GKKS13b] they prove upper
bounds for the size of an inhomogeneous depth three circuit computing
a polynomial, in terms of the size of an arbitrary circuit computing the
polynomial. They first apply the work of [Koi, AV08], which allows one to
reduce an arbitrary circuit of size s computing a polynomial of degree d in
n variables to a formula of size 2O(logslogd)
√ and depth d. Next they reduce
0
O(
dlogdlogslogn) . This second passage
to a depth four circuit of size s = 2
is via iterated matrix multiplication. From the depth four circuit, they use
Equation (7.1.1) to convert all multiplication gates to ΣΛΣ circuits to have
a depth five circuit of size O(s0 ) and of the form ΣΛΣΛΣ. Finally, they
convert the power sums to elementary symmetric functions which keeps the
size at O(s0 ) and drops the depth to three.
7.1.3. Geometric reformulation of homogeneous ΣΛΣΛΣ circuits.
Recall that computer scientists always work in bases and the inputs to the
circuits are constants and variables. For homogeneous circuits, the inputs
are simply the variables. The first layer of such a circuit is just to obtain
arbitrary linear forms from these variables, so it plays no role in the geometry. The second layer sends a linear form ` to `δ , i.e., we are forming
points of vδ (PV ). The next layer consists of addition gates, which means
we obtain sums of d-th powers, i.e., points of σr (vδ (PV )). Then at the next
layer, we take Veronese re-embeddings of these secant varieties to obtain
points of vδ0 (σr (vδ (PV ))), and in the final addition gate we obtain a point
of σr0 (vδ0 (σr (vδ (PV )))). Thus we may rephrase Theorem 7.1.2.1(2) as:
Proposition 7.1.3.1. [Lan14a] Let d = nO(1) and let P ∈ S d Cn be a
polynomial sequence that can be computed by √
a circuit of size √
s. Then
n−1
[P ] ∈ σr1 (v d (σr2 (vδ (P
)))) with roughly δ ∼ d and r1 r2 ∼ s d , more
δ
√
precisely r1 r2 δ = 2O( dlog(ds)log(n)) .
√
Corollary 7.1.3.2. [GKKS13b]
If for all but finitely many m, δ ' m,
√
2
and all r1 , r2 such that r1 r2 = 2 mlog(m)ω(1) , one has [permm ] 6∈ σr1 (vm/δ (σr2 (vδ (Pm −1 )))),
then there is no circuit of polynomial size computing the permanent, i.e.,
VP 6= VNP.
2 −1
Problem 7.1.3.3. Find equations for σr1 (vδ (σr2 (vδ (Pm
)))).
152
7. Shallow circuits and the Chow variety
7.1.4. Geometric reformulation of ΣΠΣ circuits. The homogeneous
ΣΠΣ circuits are easily seen to be modeled by σr (Chn (V )), but Theorem
7.1.2.1(1) requires inhomogeneous ΣΠΣ circuits. One might think that for
a small price, one could restrict to homogeneous ΣΠΣ√ circuits.
E.g., does
√
detn admit a homogeneous ΣΠΣ circuit of size ∼ (n4 ) n ∼ 2 nlogn ?, i.e., is
2
[detn ] ∈ σ2√nlogn Chn (Cn )? The answer is no:
2
Proposition 7.1.4.1. detn 6∈ σ 2n (Chn (Cn )).
n
Proof. Consider the flattenings:
n
n
(detn )d n2 e,b n2 c : S d 2 e W ∗ → S b 2 c W
and
n
n
(x1 · · · xn )d n2 e,b n2 c : S d 2 e W ∗ → S b 2 c W.
In the first case the image is spanned by the minors of size b n2 c, so it is of
2
dimension b nn c , whereas in the second case the image is spanned by the
2
monomials xi1 · · · xib n c for I ⊂ [n], so it has dimension b nn c . Thus
2
2
detn 6∈ σ(
n2
) (Chn (C )).
n
bn
2c
Recalling that
2m
m
∼
m
√4
,
πm
we conclude.
Remark 7.1.4.2. The flattening above gives a lower bound on the symmetric tensor border rank of points of the Chow variety. The symmetric
rank is known: On one hand,
X
1
(7.1.1) x1 · · · xn = n−1
(x1 + 1 x2 + · · · + n−1 xn )n 1 · · · n−1 ,
2
n!
n−1
∈{−1,1}
so RS (x1 · · · xn ) ≤ 2n−1 . (This expression dates at least back to [Fis94].)
On the other hand, in [RS11] they showed
(7.1.2)
RS (x1 · · · xn ) = 2n−1 .
Exercise 7.1.4.3: Show that [permn ] 6∈ σO( 4n ) vn (PW ).
n
In summary:
Proposition 7.1.4.4. [NW97] The polynomial sequences detn and permn
do not admit depth three circuits of size 2n .
Remark 7.1.4.5. In [NW97] they consider all partial derivatives of all
orders simultaneously, but the bulk of the dimension is concentrated in the
middle order flattening.
7.1. Shallow circuits and geometry
153
Thus homogeneous depth three circuits at first sight do not seem that
powerful because a polynomial sized homogeneous depth 3 circuit cannot
compute the determinant.
To make matters worse, consider the polynomial corresponding to iterated matrix multiplication of three by three matrices IM Mk3 ∈ S k (C9k ). It
is in VPe , i.e., it admits a polynomial size formula.
Exercise 7.1.4.6: Show that IM Mk3 6∈ σpoly(k) (Chk (W )).
By Exercise 7.1.4.6, homogeneous depth three circuits (naı̈vely applied)
cannot even capture sequences of polynomials admitting small formulas.
Fortunately this problem is easy to fix. If at the first level, we add a homogenizing variable `, so that the affine linear outputs become linear in our
original variables plus `, the product gates will each produce a homogeneous
polynomial. While the different product gates may produce polynomials of
different degrees, if we were trying to produce a homogeneous polynomial,
when we add them up what remains must be a sum of homogeneous polynomials, such that when we set ` = 1, we obtain the desired homogeneous
polynomial. Say the largest power of ` appearing in this sum is qL . For
each other term there is some other power of ` appearing, say qi for the i-th
term. Then to the original circuit, add qL − qi inputs to the i-th product
gate, where each input is `. This will not change the size of the circuit by
more than qL r, so it remains of the same order. Our new homogeneous
depth three circuit will output `qL P .
We conclude:
Proposition 7.1.4.7. [Lan14a] Let d = nO(1) and let P ∈ S d Cn be a
polynomial that can be computed by a circuit of size s.
Then √
[`N −d P ] ∈ σr (ChN (Cn+1 )) with roughly rN ∼ s
rN = 2O( dlog(n)log(ds)) .
√
d,
Corollary
7.1.4.8. [GKKS13b] [`n−m detm ] ∈ σr (Chn (Cm
√
2O( mlogm) .
more precisely,
2 +1
)) where rn =
Proof. The determinant admits a circuit of size m4 , so it admits a ΣΠΣ
circuit of size
√
2O(
mlog(m)log(m∗m4 ))
so its padded version lies in σr (Chn
∼ 2O(
2
(Cm +1 ))
√
mlogm)
,
√
where rn = 2O(
mlogm) .
Corollary 7.1.4.9.
[GKKS13b] If for all but finitely many m and all r, n
√
2
mlog(m)ω(1)
with rn = 2
, one has [`n−m permm ] 6∈ σr (Chn (Cm +1 )), then
there is no circuit of polynomial size computing the permanent, i.e., VP 6=
VNP.
154
7. Shallow circuits and the Chow variety
Proof. One just needs to observe that the number of edges in the first layer
(which are invisible from the geometric perspective) is dominated by the
number of edges in the other layers.
Problem 7.1.4.10. [CKW10, Open problem 11.1] Find an explicit sequence of polynomials Pm ∈ S m Cw−1 such that for m sufficiently large
`n−m Pm 6∈ σr (Chn (W )), whenever r, w, n are polynomials in m.
Remark 7.1.4.11. The expected dimension of σr (Chd (W )) is rdw + r −
m
1. If we take d0 = d2m and work instead with padded polynomials `2 P ,
the expected dimension of σr (Chd0 (W )) is 2m rdw + r − 1. In contrast,
the expected dimension of σr (vd−a (σρ (va (PW )))) does not change when one
increases the degree, which gives some insight as to why padding is so useful
for homogeneous depth three circuits but not for ΣΛΣΛΣ circuits.
7.1.5. Elementary symmetric functions and depth 3 circuits. **This
section under construction***
7.2. Hilbert functions and the method of shifted partial
derivatives
**This section to be written***
7.3. The Chow variety
Motivation for studying the ideal of the Chow variety was given above.
This ideal has been studied for over 100 years, dating back at least to Brill,
Gordan and Hadamard. The history is rife with rediscoveries and errors
that only make the subject more intriguing.
7.3.1. The Hermite-Hadamard-Howe map and the ideal of the
Chow variety. The following linear map was first defined when dim W = 2
by Hermite (1854), and in general independently by Hadamard (1897) and
Howe (1988).
Definition 7.3.1.1. The Hermite-Hadamard-Howe map hd,n : S d (S n W ) →
S n (S d W ) is defined as follows: First include S d (S n W ) ⊂ W ⊗nd . Next,
reorder the copies of W from d blocks of n to n blocks of d and symmetrize
the blocks of d to obtain an element of (S d W )⊗n . Finally, thinking of S d W
as a single vector space, symmetrize the n blocks.
7.3. The Chow variety
155
For example, putting subscripts on W to indicate position:
S 2 (S 3 W ) ⊂ W ⊗6 = W1 ⊗W2 ⊗W3 ⊗W4 ⊗W5 ⊗W6
→ (W1 ⊗W4 )⊗(W2 ⊗W5 )⊗(W3 ⊗W6 )
→ S 2 W ⊗S 2 W ⊗S 2 W
→ S 3 (S 2 W )
Note that hd,n is a GL(W )-module map.
Example 7.3.1.2. Here is h2,2 ((xy)2 ):
1
(xy)2 7→ [(x⊗y + y⊗x)⊗(x⊗y + y⊗x)]
4
1
= [x⊗y⊗x⊗y + x⊗y⊗y⊗x + y⊗x⊗x⊗y + y⊗x⊗y⊗x]
4
1
7→ [x⊗x⊗y⊗y + x⊗y⊗y⊗x + y⊗x⊗x⊗y + y⊗y⊗x⊗x]
4
1
7→ [2(x2 )⊗(y 2 ) + 2(xy)⊗(xy)]
4
1
7→ [(x2 )(y 2 ) + (xy)(xy)].
2
Exercise 7.3.1.3: Show that hd,n (xn1 · · · xnd ) = (x1 · · · xd )n .
Exercise 7.3.1.4: Show that hd,n : S d (S n V ) → S n (S d V ) is “self-dual” in
the sense that hTd,n = hn,d : S n (S d V ∗ ) → S d (S n V ∗ ). Conclude that hd,n
surjective if and only if hn,d is injective.
Theorem 7.3.1.5 (Hadamard [Had97]). ker hd,n = Id (Chn (W ∗ )).
Proof. Let P ∈ S d (S n W ). Write P =
Let `1 , . . . , `n ∈ W ∗ .
P
j
xn1j · · · xndj for some xα,j ∈ W .
P (`1 · · · `n ) = hP , (`1 · · · `n )d i
X
=
hxn1j · · · xndj , (`1 · · · `n )d i
j
=
X
=
X
=
X
hxn1j , (`1 · · · `n )i · · · hxndj , (`1 · · · `n )i
j
Πns=1 Πdi=1 xij (`s )
j
hx1j · · · xdj , (`1 )d i · · · hx1j · · · xdj , (`n )d i
j
= hhd,n (P ), (`1 )d · · · (`n )d i
156
7. Shallow circuits and the Chow variety
If hd,n (P ) is nonzero, there will be some monomial it will pair with to be
nonzero. On the other hand, if hd,n (P ) = 0, then P annihilates all points of
Chn (W ∗ ).
Exercise 7.3.1.6: Show that if hd,n : S d (S n Cm ) → S n (S d Cm ) is not surjective, then hd,n : S d (S n Ck ) → S n (S d Ck ) is not surjective for all k > m,
and that the partitions describing the kernel are the same in both cases if
d ≤ m.
Exercise 7.3.1.7: Show that if hd,n : S d (S n Cm ) → S n (S d Cm ) is surjective,
then hd,n : S d (S n Ck ) → S n (S d Ck ) is surjective for all k < m.
Example 7.3.1.8 (The case dim W = 2). When dim W = 2, every polynomial decomposes as a product of linear factors, so the ideal of Chn (C2 ) is
zero. We recover the following theorem of Hermite:
Theorem 7.3.1.9 (Hermite reciprocity). The map hd,n : S d (S n C2 ) →
S n (S d C2 ) is an isomorphism for all d, n. In particular S d (S n C2 ) and S n (S d C2 )
are isomorphic GL2 -modules.
Often in modern textbooks only the “In particular” is stated.
Theorem 7.3.1.10 (Hadamard [Had99]). The map h3,3 : S 3 (S 3 W ) →
S 3 (S 3 W ) is an isomorphism.
Proof. Without loss of generality, assume w = 3 and x1 , x2 , x3 ∈ W ∗ are a
basis. Say we had P ∈ I3 (Ch3 (W ∗ )). Consider P (µ(x31 +x32 +x33 )−λx1 x2 x3 )
as a cubic polynomial on P1 with coordinates [µ, λ]. Note that it vanishes at
the four points [0, 1], [1, 3], [1, 3ω], [1, 3ω 2 ] where ω is a primitive third root of
unity. A cubic polynomial on P1 vanishing at four points is identically zero.
In particular, P (1, 0) = 0, i.e., P vanishes on x31 + x32 + x33 . Hence it must
vanish identically on σ3 (v3 (P2 )). But I3 (σ3 (v3 (P2 ))) = 0, see, e.g., Corollary
?? (in fact σ3 (v3 (P2 )) ⊂ PS 3 C3 is a hypersurface of degree four).
Remark 7.3.1.11. The above proof is due to A. Abdesselam (personal
communication). It is a variant of Hadamard’s original proof, where instead
of x31 + x32 + x33 one uses an arbitrary cubic f , and generalizing x1 x2 x3 one
uses the Hessian H(f ). Then the curves f = 0 and H(f ) = 0 intersect in
9 points (the nine flexes of f = 0) and there are four groups of three lines
going through these points, i.e. four places where the polynomial becomes
a product of linear forms.
Theorem 7.3.1.12. [BL89] (also see [McK08, Thm. 8.1]) If hd,n is surjective, then hd0 ,n is surjective for all d0 > d. In other words, if hn,d is injective,
then hn,d0 is injective for all d0 > d.
Proof. **proof to be inserted***
7.3. The Chow variety
157
Remark 7.3.1.13. The statements and proofs in [BL89, McK08] were
regarding the map hd,n:0 defined in §7.3.2 below.
Originally Hadamard mistakenly thought the maps hd,n were always of
maximal rank, but in [Had99] he proved the map is an isomorphism when
d = n = 3 and posed determining if injectivity holds in general when d ≤ n
as a open problem. (Injectivity for d ≤ n is equivalent to surjectivity when
d ≥ n by Exercise 7.3.1.4.) Howe [How87] also investigated the map hd,n
and wrote “it is reasonable to expect” that hd,n is always of maximal rank.
Theorem 7.3.1.14. [MN05] The map h5,5 is not surjective.
Remark 7.3.1.15. In [MN05] they showed the map h5,5:0 defined in §7.3.2
below was not injective. A. Abdessalem realized their computation showed
the map h5,5 is not injective and pointed this out to them. Evidently there
was some miscommunication because in [MN05] they mistakenly say the
result comes from [Bri02] rather than their own paper.
The GL(V )-module structure of the kernel of h5,5 was determined by C.
Ikenmeyer and S. Mrktchyan as part of a 2012 AMS MRC program:
Proposition 7.3.1.16 (Ikenmeyer and Mkrtchyan, unpublished). The kernel of h5,5 : S 5 (S 5 C5 ) → S 5 (S 5 C5 ) consists of irreducible modules corresponding to the following partitions:
{(14, 7, 2, 2), (13, 7, 2, 2, 1), (12, 7, 3, 2, 1), (12, 6, 3, 2, 2),
(12, 5, 4, 3, 1), (11, 5, 4, 4, 1), (10, 8, 4, 2, 1), (9, 7, 6, 3)}.
All these occur with multiplicity one in the kernel, but not all occur with
multiplicity one in S 5 (S 5 C5 ). In particular, the kernel is not an isotypic
component.
The Young diagrams of the kernel of h5,5 are:
,
,
,
,
,
158
7. Shallow circuits and the Chow variety
,
.
The phrase “with high probability” means the result was obtained numerically, not symbolically.
While the Hermite-Hadamard-Howe map is not always of maximal rank,
it is “eventually” of maximal rank:
Theorem 7.3.1.17. [Bri93] The Hermite-Hadamard-Howe map
hd,n : S d (S n W ∗ ) → S n (S d W ∗ )
is surjective for d sufficiently large.
I present the proof of Theorem 7.3.1.17 in §8.1.1.
Moreover, in [Bri97] Brion gives an explicit exponential bound on the
size of d required with respect to n that assures surjectivity, see Equation
(8.1.3).
Problem 7.3.1.18 (The Hadamard-Howe Problem). Determine the function d(n) such that hd,n is surjective for all d ≥ d(n).
A more ambitious problem would be:
Problem 7.3.1.19. Determine the kernel of hd,n .
The following special case of the problem would prove Conjecture 6.2.1.3.
Conjecture 7.3.1.20 (Kumar [Kum]). Let n be even, then for all d ≤ n,
Snd Cn 6⊂ ker hd,n , i.e., Snd Cn ⊂ C[Chn (Cn )].
From Exercise ??, the trivial SLn -module Snn Cn occurs in S n (S n Cn )
with multiplicity one when n is even and zero when n is odd.
It is not hard to see that the d = n case implies the others. Adopt the
notation that if π = (p1 , . . . , pk ), then mπ = (mp1 , . . . , mpk ). By taking
Cartan products in the coordinate ring, Conjecture 7.3.1.20 would imply
Conjecture 6.2.1.3.
7.3.2. Sdn -formulation of the Hadamard-Howe problem. The dimension of V , as long as it is at least d, is irrelevant for the GL(V )-module
structure of the kernel of hd,n . In this section assume dim V = dn. Fix
a basis of V . Then the Weyl group WV of GL(V ) is the subgroup of
GL(V ) consisting of the permutation matrices (in particular, it is isomorphic to Sdn ). It acts on V ⊗dn by acting on each factor. (I write WV
7.3. The Chow variety
159
to distinguish this from the Sdn -action permuting the factors.) An element x ∈ V ⊗dn has sl(V )-weight zero if, in the standard basis {ei }1≤i≤dn
of V P
induced from the identification V ' Cnd , x is a sum of monomials
x = I=(i1 ,...,ind ) xI ei1 ⊗ · · · ⊗ eind , where I runs over the orderings of [nd].
If one restricts hd,n to the sl(V )-weight zero subspace, one obtains a WV module map
hd,n:0 : S d (S n V )0 → S n (S d V )0 .
(7.3.1)
These WV -modules are as follows: Let Sn o Sd ⊂ Sdn denote the wreath
product, which, by definition, is the normalizer of S×d
n in Sdn . It is the
semi-direct product of S×d
with
S
,
where
S
acts
by
permuting the facd
d
n
,
see
e.g.,
[Mac95,
p
158].
Since
dim
V
=
dn, S d (S n V )0 =
tors of S×d
n
dn
IndS
Sn oSd triv, where triv denotes the trivial Sn o Sd -module as explained in
§6.7.2.
In other words, as a WV = Sdn -module map, (7.3.1) is
Sdn
dn
hd,n:0 : IndS
Sn oSd triv → IndSd oSn triv .
(7.3.2)
Moreover, since every irreducible module appearing in S d (S n V ) has a nonzero SL(V )-weight zero subspace, hd,n is the unique SL(V )-module extension of hd,n:0 .
The map hd,n:0 was defined purely in terms of combinatorics in [BL89]
as a path to try to prove the following conjecture of Foulkes:
Conjecture 7.3.2.1. [Fou50] Let d > n, let π be a partition of dn and let
[π] denote the corresponding Sdn -module. Then,
Sdn
dn
mult([π], IndS
Sn oSd triv) ≥ mult([π], IndSd oSn triv).
Equivalently,
mult(Sπ V, S d (S n V )) ≥ mult(Sπ V, S n (S d V )).
Conjecture 7.3.2.1, and in fact equality was shown to hold asymptotically
by L. Manivel in [Man98], in the sense that for any partition µ, the multiplicity of the partition (dn − |µ|, µ) is the same in S d (S n V ) and S n (S d V )
as soon as d and n are at least |µ|. (The conjecture also holds when d is
sufficiently large or small with respect to n by Brion’s theorem.) Conjecture
7.3.2.1 is still open in general.
***include survey of results in the combinatorics literature here***
7.3.3. Brill’s equations. Set theoretic equations of Chd (V ) have been
known since 1894. Here is a modern presentation elaborating the presentation in [Lan12, §8.6], which was suggested by E. Briand.
Our goal is a polynomial test to see if f ∈ S d V is a product of linear
factors. We can first try to see just if P is divisible by a power of a linear
160
7. Shallow circuits and the Chow variety
form. The discussion in §6.3.2 will not be helpful as the conditions there
are vacuous when n − m = 1. We could proceed as in §6.6.5 and check if
`xI1 ∧ · · · ∧ `xID ∧ f = 0 where the xIj are a basis of S d−1 V , but in this case
there is a simpler test to see if a given linear form ` divides f :
Consider the map πd,d : S d V ⊗S d V → Sd,d V obtained by projection.
(By the Pieri rule 5.2.7.1, Sd,d V ⊂ S d V ⊗S d V with multiplicity one.)
Lemma 7.3.3.1. Let ` ∈ V , f ∈ S d V . Then f = `h for some h ∈ S d−1 V if
and only if πd,d (f ⊗`d ) = 0.
Proof. Since πd,d, is linear, it suffices to prove the lemma when f = `1 · · · `d .
In that case πd,d (f ⊗`d ), up to a constant, is (`1 ∧ `) · · · (`d ∧ `). Since the
expression of a polynomial as a monomial is essentially unique, the result
follows.
P
We would like a map that sends `1 · · · `d to j `dj ⊗stuf fj , as then we
could apply πd,d ⊗Idstuf f to f tensored with the result of our desired map
to obtain our equations.
While it is not obvious how to obtain such a map for powers, there
is an easy way to get elementary symmetric
functions, namely the maps
P
f 7→ fj,d−j because (`1 · · · `d )j,d−j = |K|=j `K ⊗`K c where `K = `k1 · · · `kj
and K c denotes the complementary index set in [d]. We can try to convert
this to power sums by the conversion formula obtained from the relation
between generating functions (6.1.2):


e1
1
0
··· 0
2e2 e1
1
··· 0


(7.3.3)
pd = Pd (e1 , . . . , ed ) := det  .
.
.
..  .
.
.
.
 .
.
.
.
ded ed−1 ed−2 · · · e1
The desired term comes from the diagonal ed1 and the rest of the terms kill off
the unwanted terms of ed1 . This idea almost works- the only problem is that
our naı̈ve correction terms have the wrong degree on the right hand side.
For example, when d = 3, naı̈vely using p3 = e31 − 3e1 e2 + 3e3 would give,
for the first term, degree 6 = 2 + 2 + 2 on the right hand side of the tensor
product, the second degree 3 = 2 + 1 and the third degree zero. In general,
the right hand side of the ed1 term would have degree (d − 1)d , whereas the
ded term would have degree zero. In addition to fixing the degree mismatch,
we need to formalize how we will treat the right hand sides.
To these ends, recall that for any two algebras A, B, one can give A⊗B
the structure of an algebra by defining (α⊗β) · (α0 ⊗β 0 ) := αα0 ⊗ββ 0 and
extending linearly. Give Sym(V )⊗Sym(V ) this algebra structure. Define
7.3. The Chow variety
161
maps
Ej : S δ V → S j V ⊗S j(δ−1) V
(7.3.4)
f 7→ fj,δ−j · (1⊗f j−1 ).
The (1⊗f j−1 ) fixes our degree problem. If j > δ define Ej (f ) = 0.
Our desired map is
(7.3.5)
Qd : S d V → S d V ⊗S d(d−1) V
f 7→ Pd (E1 (f ), . . . , Ed (f )).
Theorem 7.3.3.2 (Brill [Bri93], Gordan [Gor94], Gelfand-Kapranov-Zelevinsky [GKZ94], Briand [Bri10]). Consider the map
(7.3.6)
B : S d V → Sd,d V ⊗S d
2 −d
V
f 7→ (πd,d ⊗IdS d2 −d V )[f ⊗Qd (f )].
(7.3.7)
Then [f ] ∈ Chd (V ) if and only if B(f ) = 0.
For the proof, we’ll argue by induction, and thus need to generalize the
definition of Qd a little. Define
(7.3.8)
Qd,δ : S δ V → S d V ⊗S d(δ−1) V
f 7→ Pd (E1 (f ), . . . , Ed (f ))
0
Lemma 7.3.3.3. If f1 ∈ S δ V and f2 ∈ S d −δ V , then
Qd,d0 (f1 f2 ) = (1⊗f1d ) · Qd,d0 −δ (f2 ) + (1⊗f2d ) · Qd,δ (f1 ).
Assume Lemma 7.3.3.3 for the moment:
Proof of Theorem 7.3.3.2. Say f = `1 · · · `d . First note that for ` ∈ V ,
Ej (`) = `j ⊗`j−1 and Qd,1 (`) = `d ⊗1. Next, compute E1 (`1 `2 ) = `1 ⊗`2 +
`2 ⊗`1 and E2 (`1 `2 ) = `1 `2 ⊗`1 `2 , so Q2,2 (`1 `2 ) = `21 ⊗`22 + `22 ⊗`21 . By induction and Lemma 7.3.3.3,
X
Qd,δ (`1 · · · `δ ) =
`dj ⊗(`d1 · · · `dj−1 `dj+1 · · · `dδ ).
j
We conclude Qd (f ) = j `dj ⊗(`d1 · · · `dj−1 `dj+1 · · · `dd ) and πd,d (`1 · · · `d , `dj ) = 0
for each j by Lemma 7.3.3.1.
P
For the other direction, first assume f is reduced, i.e., P
has no repeated
factors. Let z ∈ Zeros(f )smooth , then Qd (f ) = (E1 (f ))d + µj ⊗ψj where
2
ψj ∈ S d −d V , µj ∈ S d V and f divides ψj for each j because E1 (f )d occurs
as a monomial in the determinant (7.3.3) and all the other terms contain an
Ej (f ) with j > 1, and so are divisible by f .
Thus B(f )(·, z) = πd,d (f ⊗(dfz )d ) because E1 (f )d = (f1,d−1 )d and f1,d−1 (·, z) =
dfz , and all the ψj (z) are zero. By Lemma 7.3.3.1, dfz divides f for all
162
7. Shallow circuits and the Chow variety
z ∈ Zeros(f ). But this implies the tangent space to f is constant in a neighborhood of z, i.e., that the component containing z is a linear space, and
since every component of Zeros(f ) contains a smooth point, Zeros(f ) is a
union of hyperplanes, which is what we set out to prove.
Finally, say f = g k h where g is irreducible of degree q and h is of degree
d − qk and is relatively prime to g. Apply Lemma 7.3.3.3:
Qd (g(g k−1 h)) = (1⊗g d ) · Qd,d−q (g k−1 h) + (1⊗(g k−1 h)d ) · Qd,q (g).
A second application gives
Qd (g k h) = (1⊗g d )·[(1⊗g d )·Qd,d−2q (g k−2 h)+(1⊗(g k−2 h)d )·Qd,q (g)+(1⊗(g k−2 h)d )·Qd,q (g)].
After k − 1 applications one obtains:
Qd (g k h) = (1⊗g d(k−1) ) · [k(1⊗hd ) · Qd,q (g) + (1⊗g d ) · Qd,d−qk (h)]
and (1⊗g d(k−1) ) will also factor out of B(f ). Since B(f ) is identically zero
but g d(k−1) is not, we conclude
0 = πd,d ⊗IdS d2 −d V f ⊗[k(1⊗hd ) · Qd,q (g) + (1⊗g d ) · Qd,d−qk (h)]
Let w ∈ Zeros(g) be a general point, so in particular h(w) 6= 0. Evaluating
at (z, w) with z arbitrary gives zero on the second term and the first implies
πd,d ⊗IdS d2 −d V (f ⊗Qd,q (g)) = 0 which implies dgw divides g, so g is a linear
form.
Proof of Lemma 7.3.3.3. Define, for u ∈ Sym(V )⊗Sym(V ),
∆u : Sym(V ) → Sym(V )⊗Sym(V )
X
f 7→
uj · fj,deg(f )−j .
j
Exercise 7.3.3.4: Show that ∆u (f g) = (∆u f ) · (∆u g), and that the generating series for the Ej (f ) may be written as
1
Ef (t) =
· ∆t(1⊗f ) f.
1⊗f
Note that (1⊗f )·s = 1⊗f s and (1⊗f g) = (1⊗f ) · (1⊗g). Thus
1
1
· ∆[t(1⊗g)](1⊗f ) (f )] · [
· ∆[t(1⊗f )](1⊗g) (g)],
Ef g (t) = [
1⊗f
1⊗g
and taking the logarithmic derivative (recalling Equation (6.1.2)) we conclude.
Remark 7.3.3.5. There was a gap in the argument in [Gor94], repeated in
[GKZ94], when proving the “only if” part of the argument. They assumed
that the zero set of f contains a smooth point, i.e., that the differential of
f is not identically zero. This gap was fixed in [Bri10]. In [GKZ94] they
use G0 (d, dim V ) to denote Chd (V ).
7.3. The Chow variety
163
7.3.4. Brill’s equations as modules. Brill’s equations are of degree d+1
2
on S d V ∗ . (The total degree of Sd,d V ⊗S d −d V is d(d + 1) which is the total
degree of S d+1 (S d V ).) Consider the GL(V )-module map
Sdd V ⊗S d
2 −d
V → S d+1 (S d V )
given by Brill’s equations. The components of the target are not known in
general and the set of modules present grows extremely fast. One can use
the Pieri formula 5.2.7.1 to get the components of the first. Using the Pieri
formula, we conclude:
Proposition 7.3.4.1. As a GL(V )-module, Brill’s equations are multiplicity free.
Exercise 7.3.4.2: Write out the decomposition and show that only partitions with three parts appear as modules in Brill’s equations. }
Remark 7.3.4.3. If d < v = dim V , then Chd (V ) ⊂ Subd (S d V ) so I(Chd (V )) ⊃
Λd+1 V ∗ ⊗Λd+1 (S d−1 V ∗ ). J. Weyman (in unpublished notes from 1994) observed that these equations are not in the ideal generated by Brill’s equations. More precisely, the ideal generated by Brill’s equations does not
include modules Sπ V ∗ with `(π) > 3, so it does not cut out Chd (V ) scheme
theoretically when d < v. By Theorem 7.3.1.14 the same holds for Ch5 (C5 )
and almost certainly holds for all Chn (Cn ) with n ≥ 5.
2 −n
Problem 7.3.4.4. What is the kernel of Brill : Sn,n W ⊗S n
W → S n+1 (S n W )?
7.3.5. Conjecture 7.3.1.20 and a conjecture in combinatorics. Let
P ∈ Snd (Cd ) ⊂ S d (S n Cd ) be non-zero. Conjecture 7.3.1.20 may be stated
as P ((x1 · · · xn )d ) 6= 0. Our first task is to obtain an expression for P .
Let V = Cd . For any even n, the one-dimensional module S(nd ) V occurs
with multiplicity one in S d (S n V ) (cf. [How87, Proposition 4.3]). Fix a
volume form on V so that detd ∈ S d V is well defined.
Proposition 7.3.5.1. [KL] Let n be even. The unique (up to scale) polynomial P ∈ S(nd ) V ⊂ S d (S n V ) evaluates on
x = (v11 · · · vn1 )(v12 · · · vn2 ) · · · (v1d · · · vnd ) ∈ S d (S n V ∗ ), for any vji ∈ V ∗ ,
to give
(7.3.9)
hP, xi =
X
detd (vσ11 (1) , . . . , vσdd (1) ) · · · detd (vσ11 (n) , . . . , vσdd (n) ).
σ1 ,...,σd ∈Sn
Proof. Let P̄ ∈ (V )⊗nd be defined by the identity (7.3.9) (with P replaced
by P̄ ). It suffices to check that
(i) P̄ ∈ S d (S n V ),
164
7. Shallow circuits and the Chow variety
(ii) P̄ is SL(V ) invariant, and
(iii) P̄ is not identically zero.
Observe that (iii) follows from the identity (7.3.9) by taking vji = xi
where x1 , . . . , xd is a basis of V ∗ , and (ii) follows because SL(V ) acts trivially
on detd .
To see (i), we show (ia) P̄ ∈ S d ((V )⊗n ) and (ib) P̄ ∈ (S n V )⊗d to conclude. To see (ia), it is sufficient to show that exchanging two adjacent
factors in parentheses in the expression of x will not change (7.3.9). Exchange vj1 with vj2 in the expression for j = 1, . . . , n. Then, each individual
determinant will change sign, but there are an even number of determinants,
so the right hand side of (7.3.9) is unchanged. To see (ib), it is sufficient
to show the expression is unchanged if we swap v11 with v21 in (7.3.9). If we
multiply by n!, we may assume σ1 = Id, i.e.,
hP̄ , xi =
X
n!
detd (v11 , vσ22 (1) , . . . , vσdd (1) ) detd (v21 , vσ22 (2) , . . . , vσdd (2) ) · · · detd (vn1 , vσ22 (n) , . . . , vσdd (n) ).
σ2 ,...,σd ∈Sn
With the two elements v11 and v21 swapped, we get
(7.3.10)
X
n!
detd (v21 , vσ22 (1) , . . . , vσdd (1) ) detd (v11 , vσ22 (2) , . . . , vσdd (2) ) · · · detd (vn1 , vσ22 (n) , . . . , vσdd (n) ).
σ2 ,...,σd ∈Sn
Now right compose each σs in (7.3.10) by the transposition (1, 2). The
expressions become the same.
Now specialize to the case d = n (this is the critical case) and evaluate
on (x1 · · · xn )n , where x1 , . . . , xn is a uni-modular basis of V ∗ .
(7.3.11)
X
hP, (x1 · · · xn )n i =
detd (xσ1 (1) , . . . , xσn (1) ) · · · detd (xσ1 (n) , . . . , xσn (n) ).
σ1 ,...,σn ∈Sn
For a fixed (σ1 , . . . , σn ) the contribution will either be 0, 1 or −1. The
contribution is zero unless for each j, the indices σ1 (j), . . . , σn (j) are distinct.
Arrange these numbers in an array:


σ1 (1) · · · σn (1)


..


.
σ1 (n) · · ·
σn (n)
The contribution is zero unless the array is a Latin square, i.e., an n × n matrix such that each row and column consists of the integers {1, . . . , n}. If it is
a Latin square, the rows correspond to permutations, and the contribution
of the term is the product of the signs of these permutations. Call this the
row sign of the Latin square. There is a famous conjecture in combinatorics
7.3. The Chow variety
165
regarding the products of both the signs of the row permutations and the
column permutations, called the sign of the Latin square:
Conjecture 7.3.5.2 (Alon-Tarsi [AT92]). Let n be even. The number of
sign −1 Latin squares of size n is not equal to the number of sign +1 Latin
squares of size n.
Conjecture 7.3.5.2 is known to be true when n = p±1, where p is an odd
prime; in particular, it is known to be true up to n = 24 [Gly10, Dri97].
In [HR94], Huang and Rota showed:
Theorem 7.3.5.3. [HR94, Identities 8,9] The difference between the number of column even Latin squares of size n and the number of column odd
Latin squares of size n equals the difference between the number of even
Latin squares of size n and the number of odd Latin squares of size n, up to
sign. In particular, the Alon-Tarsi conjecture holds for n if and only if the
column-sign Latin square conjecture holds for n.
Thus
Theorem 7.3.5.4. [KL] The Alon-Tarsi conjecture holds for n if and only
if Snn (Cn ) ∈ C[Chn (Cn )].
In [KL] several additional statements equivalent to the conjecture were
given. In particular, for those familiar with integration over compact Lie
groups, the conjecture holds for n if and only if
Z
Π1≤i,j≤n gji dµ 6= 0
(gji )∈SU (n)
where dµ is Haar measure.
Chapter 8
Advanced Topics
In this chapter I present two (possibly more) results that require a more
advanced background in algebraic geometry. In §8.1 I present M. Brion’s
proof of the asymptotic surjectivity of the Hermite-Hadamard-Howe map.
In §8.2 I present S. Kumar’s proof of the non-normality of the determinant
orbit closure.
8.1. Asymptotic surjectivity of the Hadamard-Howe map
*This section is still in rough form*****
8.1.1. Coordinate ring of the normalization of the Chow variety.
*** introduction about normalization, and normal varieties to be added***
In this section I follow [Bri93]. There is another variety whose coordinate ring is as computable as the coordinate ring of the orbit, the normalization of the Chow variety. We work in affine space.
An affine variety Z is normal if $\mathbb{C}[Z]$ is integrally closed, that is, if every element of $\mathbb{C}(Z)$, the field of fractions of $\mathbb{C}[Z]$, that is integral over $\mathbb{C}[Z]$ (i.e., that satisfies a monic polynomial with coefficients in $\mathbb{C}[Z]$) lies in $\mathbb{C}[Z]$. To every affine variety Z one may associate a unique normal affine variety $Nor(Z)$, called the normalization of Z, together with a finite map $\pi : Nor(Z) \to Z$ (i.e., $\mathbb{C}[Nor(Z)]$ is integral over $\mathbb{C}[Z]$); in particular $\pi$ is generically one to one, and it is one to one over the smooth points of Z. For details see [Sha94, Chap. II.5].
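For example (a standard illustration), for the cuspidal cubic $Z = \{y^2 = x^3\} \subset \mathbb{C}^2$ one has $\mathbb{C}[Z] = \mathbb{C}[x,y]/(y^2 - x^3) \simeq \mathbb{C}[t^2, t^3]$, and the rational function $t = y/x \in \mathbb{C}(Z)$ satisfies the monic equation $T^2 - x = 0$ but does not lie in $\mathbb{C}[Z]$, so Z is not normal; here $Nor(Z) = \mathbb{A}^1$ with $\pi(t) = (t^2, t^3)$.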
In particular, there is an inclusion $\mathbb{C}[Z] \to \mathbb{C}[Nor(Z)]$ given by pullback of functions: given $f \in \mathbb{C}[Z]$, define $\tilde f \in \mathbb{C}[Nor(Z)]$ by $\tilde f(z) = f(\pi(z))$.
If the non-normal points of Z form a finite set, then the cokernel of this inclusion is a finite dimensional vector space. If Z is a G-variety, then $Nor(Z)$ will be too.
Recall that $Ch_n(W)$ is the projection of the Segre variety, but since we want to deal with affine varieties, we will deal with the cone over it. Consider the product map
$$\phi_n : W^{\times n} \to S^nW,\qquad (u_1,\dots,u_n) \mapsto u_1\cdots u_n.$$
Note that (i) the image of $\phi_n$ is $\hat{Ch}_n(W)$, and (ii) $\phi_n$ is $\Gamma_n := T^{SL}_n \rtimes S_n$-equivariant, where $T^{SL}_n = \{(\lambda_1,\dots,\lambda_n)\in(\mathbb{C}^*)^{\times n} \mid \lambda_1\cdots\lambda_n = 1\}$ acts by scaling the factors and $S_n$ permutes them.
For any affine algebraic group $\Gamma$ and any $\Gamma$-variety Z, define the GIT quotient $Z/\!/\Gamma$ to be the affine algebraic variety whose coordinate ring is $\mathbb{C}[Z]^\Gamma$. (When $\Gamma$ is finite, this is just the usual set-theoretic quotient. In the general case, $\Gamma$-orbits are identified in the quotient when there is no $\Gamma$-invariant regular function that distinguishes them.)
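A standard example to keep in mind (my illustration): $\mathbb{C}^*$ acting on $\mathbb{C}^2$ by $\lambda\cdot(x,y) = (\lambda x, \lambda^{-1}y)$ has invariant ring $\mathbb{C}[xy]$, so $\mathbb{C}^2/\!/\mathbb{C}^* = \mathbb{A}^1$; the three orbits inside $\{xy = 0\}$ (the two punctured axes and the origin) cannot be separated by invariants and are identified to a single point of the quotient.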
If Z is normal, then so is $Z/\!/\Gamma$ (see, e.g., [Dol03, Prop 3.1]). In our case $W^{\times n}$ is an affine $\Gamma_n$-variety and $\phi_n$ factors through the GIT quotient because it is $\Gamma_n$-equivariant,
so we obtain a map
$$\psi_n : W^{\times n}/\!/\Gamma_n \to S^nW$$
whose image is $\hat{Ch}_n(W)$. By unique factorization, $\psi_n$ is generically one to one. Elements of $W^{\times n}$ of the form $(0, u_2,\dots,u_n)$ cannot be distinguished from $(0,\dots,0)$ by $\Gamma_n$-invariant functions, so they are identified with $(0,\dots,0)$ in the quotient, which is consistent with the fact that $\phi_n(0,u_2,\dots,u_n) = 0$. Observe that $\phi_n$ and $\psi_n$ are $GL(W) = SL(W)\times\mathbb{C}^*$-equivariant.
Consider the induced map on coordinate rings:
$$\psi_n^* : \mathbb{C}[S^nW] \to \mathbb{C}[W^{\times n}/\!/\Gamma_n] = \mathbb{C}[W^{\times n}]^{\Gamma_n}.$$
For affine varieties, $\mathbb{C}[Y\times Z] = \mathbb{C}[Y]\otimes\mathbb{C}[Z]$ (see, e.g., [Sha94, §2.2]), so
$$\mathbb{C}[W^{\times n}] = \mathbb{C}[W]^{\otimes n} = \mathrm{Sym}(W^*)\otimes\cdots\otimes\mathrm{Sym}(W^*) = \bigoplus_{i_1,\dots,i_n\in\mathbb{Z}_{\ge 0}} S^{i_1}W^*\otimes\cdots\otimes S^{i_n}W^*.$$
Taking torus invariants gives
$$\mathbb{C}[W^{\times n}]^{T^{SL}_n} = \bigoplus_i S^iW^*\otimes\cdots\otimes S^iW^*,$$
and finally
$$\big(\mathbb{C}[W^{\times n}]^{T^{SL}_n}\big)^{S_n} = \bigoplus_i S^n(S^iW^*).$$
In summary,
$$\psi_n^* : \mathrm{Sym}(S^nW^*) \to \bigoplus_i S^n(S^iW^*),$$
and this map respects GL-degree, so it gives rise to maps $\tilde h_{d,n} : S^d(S^nW^*) \to S^n(S^dW^*)$.
Proposition 8.1.1.1. $\tilde h_{d,n} = h_{d,n}$.
Proof. Since elements of the form $x_1^n\cdots x_d^n$ span $S^d(S^nW)$, it is sufficient to prove the maps agree on such elements. By Exercise 7.3.1.3, $h_{d,n}(x_1^n\cdots x_d^n) = (x_1\cdots x_d)^n$. On the other hand, in the algebra $\mathbb{C}[W]^{\otimes n}$ the multiplication is $(f_1\otimes\cdots\otimes f_n)\circledast(g_1\otimes\cdots\otimes g_n) = f_1g_1\otimes\cdots\otimes f_ng_n$, and this descends to the algebra $(\mathbb{C}[W]^{\otimes n})^{\Gamma_n}$, which is the target of the algebra map $\psi_n^*$; i.e.,
$$\tilde h_{d,n}(x_1^n\cdots x_d^n) = \psi_n^*(x_1^n\cdots x_d^n) = \psi_n^*(x_1^n)\circledast\cdots\circledast\psi_n^*(x_d^n) = x_1^n\circledast\cdots\circledast x_d^n = (x_1\cdots x_d)^n.$$
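The map $\tilde h_{d,n}$ can be checked numerically in tiny cases. The following Python sketch (my own illustration with my own averaging conventions, assuming the tensor description of $\tilde h_{d,n}$: embed $S^d(S^nW^*) \subset (W^*)^{\otimes dn}$, regroup the d blocks of n tensor factors into n blocks of d, and re-symmetrize) verifies $\tilde h_{2,2}(x_1^2x_2^2) = (x_1x_2)^2$ for $\dim W = 2$.

import itertools as it
import math
import numpy as np

w, d, n = 2, 2, 2

def tensor(vecs):
    # ordered tensor product of a list of vectors
    T = np.array(1.0)
    for v in vecs:
        T = np.multiply.outer(T, v)
    return T

def sym_within(T, groups):
    # average T over all permutations of the axes inside each group
    for g in groups:
        acc = np.zeros_like(T)
        for p in it.permutations(g):
            axes = list(range(T.ndim))
            for old, new in zip(g, p):
                axes[old] = new
            acc += T.transpose(axes)
        T = acc / math.factorial(len(g))
    return T

def sym_blocks(T, nblocks, size):
    # average T over permutations of whole consecutive blocks of axes
    acc = np.zeros_like(T)
    for p in it.permutations(range(nblocks)):
        axes = [a for b in p for a in range(b * size, (b + 1) * size)]
        acc += T.transpose(axes)
    return acc / math.factorial(nblocks)

def h_tilde(T):
    # T: element of S^d(S^n W*) as an array of shape (w,)*(d*n), d blocks of n
    axes = [i * n + j for j in range(n) for i in range(d)]  # transpose the d x n grid
    T = T.transpose(axes)                                   # now n blocks of d
    T = sym_within(T, [list(range(b * d, (b + 1) * d)) for b in range(n)])
    return sym_blocks(T, n, d)                              # lands in S^n(S^d W*)

x1, x2 = np.eye(w)

P = sym_blocks(tensor([x1] * n + [x2] * n), d, n)   # x1^n x2^n in S^d(S^n W*)
q = sym_within(tensor([x1, x2]), [list(range(d))])  # x1 x2 in S^d W* (here d = 2)
target = q
for _ in range(n - 1):
    target = np.multiply.outer(target, q)
target = sym_blocks(target, n, d)                   # (x1 x2)^n in S^n(S^d W*)

print(np.allclose(h_tilde(P), target))              # True: h(x1^n x2^n) = (x1 x2)^n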
Proposition 8.1.1.2. $\psi_n : W^{\times n}/\!/\Gamma_n \to \hat{Ch}_n(W)$ is the normalization of $\hat{Ch}_n(W)$.
8.1.2. Brion's asymptotic surjectivity result. A regular map between affine varieties $f : X \to Y$ (see, e.g., [Sha94, p. 27] for the definition of regular) such that $f(X)$ is dense in Y is defined to be finite if $\mathbb{C}[X]$ is integral over $\mathbb{C}[Y]$ (see, e.g., [Sha94, p. 61]). To prove the proposition, we will need a lemma:
Lemma 8.1.2.1. Let X, Y be affine varieties equipped with polynomial $\mathbb{C}^*$-actions with unique fixed points $0_X \in X$, $0_Y \in Y$, and let $f : X \to Y$ be a $\mathbb{C}^*$-equivariant morphism such that, as sets, $f^{-1}(0_Y) = \{0_X\}$. Then f is finite.
Proof of Proposition 8.1.1.2. Since $W^{\times n}/\!/\Gamma_n$ is normal and $\psi_n$ is regular and generically one to one, it just remains to show $\psi_n$ is finite. Write $[0] = [0,\dots,0]$. To show finiteness, by Lemma 8.1.2.1 it is sufficient to show $\psi_n^{-1}(0) = \{[0]\}$ as a set, as $[0]$ is the unique $\mathbb{C}^*$-fixed point in $W^{\times n}/\!/\Gamma_n$ and every $\mathbb{C}^*$-orbit closure contains $[0]$. Now $u_1\cdots u_n = 0$ if and only if some $u_j = 0$, say $u_1 = 0$. The $T^{SL}_n$-orbit closure of $(0,u_2,\dots,u_n)$ contains the origin, so $[0,u_2,\dots,u_n] = [0]$.
Proof of Lemma 8.1.2.1. $\mathbb{C}[X]$ and $\mathbb{C}[Y]$ are $\mathbb{Z}_{\ge 0}$-graded, and the hypothesis $f^{-1}(0_Y) = \{0_X\}$ says that $\mathbb{C}[X]/f^*(\mathbb{C}[Y]_{>0})\mathbb{C}[X]$ is a finite dimensional vector space. We want to show that $\mathbb{C}[X]$ is integral over $\mathbb{C}[Y]$. This is a graded version of Nakayama's Lemma (the algebraic implicit function theorem).
In more detail (see, e.g., [Kum13, Lemmas 3.1, 3.2], or [Eis95, p. 136, Ex. 4.6a]):
Lemma 8.1.2.2. Let R, S be $\mathbb{Z}_{\ge 0}$-graded, finitely generated domains over $\mathbb{C}$ such that $R_0 = S_0 = \mathbb{C}$, and let $f^* : R \to S$ be an injective graded algebra homomorphism. If $f^{-1}(R_{>0}) = \{S_{>0}\}$ as sets, where $f : \mathrm{Spec}(S) \to \mathrm{Spec}(R)$ is the induced map on the associated schemes, then S is a finitely generated R-module. In particular, it is integral over R.
Proof. The hypothesis on the sets says that $S_{>0}$ is the only maximal ideal of S containing the ideal $\mathfrak{m}$ generated by $f^*(R_{>0})$, so the radical of $\mathfrak{m}$ must equal $S_{>0}$; in particular there is some $d_0$ such that $S_{>0}^d \subseteq \mathfrak{m}$ for all $d > d_0$. So $S/\mathfrak{m}$ is a finite dimensional vector space, and by the next lemma, S is a finitely generated R-module.
Lemma 8.1.2.3. Let S be as above, and let M be a $\mathbb{Z}_{\ge 0}$-graded S-module. Assume $M/(S_{>0}\cdot M)$ is a finite dimensional vector space over $S/S_{>0} \simeq \mathbb{C}$. Then M is a finitely generated S-module.
Proof. Choose a set of homogeneous generators $\{\bar x_1,\dots,\bar x_n\} \subset M/(S_{>0}\cdot M)$ and let $x_j \in M$ be a homogeneous lift of $\bar x_j$. Let $N \subseteq M$ be the graded S-submodule $Sx_1 + \cdots + Sx_n$. Then $M = S_{>0}M + N$: given $a \in M$, its class $\bar a \in M/(S_{>0}M)$ lifts to some $b \in N$, so $a - b \in S_{>0}M$ and $a = (a-b) + b$. Now quotient by N to obtain
(8.1.1)
$$S_{>0}\cdot(M/N) = M/N.$$
If $M/N \neq 0$, let $d_0$ be the smallest degree such that $(M/N)_{d_0} \neq 0$. But $S_{>0}\cdot(M/N) \subseteq (M/N)_{\ge d_0+1}$, so there is no way to obtain $(M/N)_{d_0}$ on the right hand side. Contradiction.
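For instance (a toy case of my own choosing), with $R = \mathbb{C}[y]$, $S = \mathbb{C}[x]$, and $f^*(y) = x^2$, the ideal $\mathfrak{m} = (x^2)$ has radical $S_{>0} = (x)$, the quotient $S/\mathfrak{m}$ is spanned by the classes of $1$ and $x$, and indeed $S = R\cdot 1 + R\cdot x$ is a finitely generated R-module, with $x$ integral over R via $T^2 - f^*(y) = 0$.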
Theorem 8.1.2.4. [Bri93] For all $n \ge 1$, $\psi_n$ restricts to a map
(8.1.2)
$$\psi_n^o : (W^{\times n}/\!/\Gamma_n)\setminus[0] \to S^nW\setminus 0$$
such that $\psi_n^{o*} : \mathbb{C}[S^nW\setminus 0] \to \mathbb{C}[(W^{\times n}/\!/\Gamma_n)\setminus[0]]$ is surjective.
Corollary 8.1.2.5. [Bri93] The Hermite-Hadamard-Howe map
$$h_{d,n} : S^d(S^nW^*) \to S^n(S^dW^*)$$
is surjective for d sufficiently large.
Proof of Corollary. Theorem 8.1.2.4 implies $(\psi_n^*)_d$ is surjective for d sufficiently large, because the cokernel of $\psi_n^*$ is supported at a point and thus must vanish in large degree.
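The asymptotic nature of the statement is already visible in dimension counts: surjectivity requires $\dim S^d(S^nW) \ge \dim S^n(S^dW)$, which fails for small d. A quick Python check (an illustration, not part of Brion's argument; dim_sym is a hypothetical helper):

from math import comb

def dim_sym(k, m):
    # dim S^k(C^m) = C(m + k - 1, k)
    return comb(m + k - 1, k)

w = 3
for n in [2, 3]:
    for d in range(1, 7):
        src = dim_sym(d, dim_sym(n, w))  # dim S^d(S^n C^w)
        tgt = dim_sym(n, dim_sym(d, w))  # dim S^n(S^d C^w)
        print(f"n={n} d={d}: {src} -> {tgt}", "ok" if src >= tgt else "too small")
# e.g. for n = 3, w = 3, d = 2: 55 -> 56, so h_{2,3} cannot be surjective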
The proof of Theorem 8.1.2.4 will give a second proof that the kernel of
ψn∗ equals the ideal of Chn (W ).
Proof of Theorem. Since $\psi_n$ is $\mathbb{C}^*$-equivariant, we can consider the quotient to projective space
$$\bar\psi_n : \big((W^{\times n}/\!/\Gamma_n)\setminus[0]\big)/\mathbb{C}^* \to (S^nW\setminus 0)/\mathbb{C}^* = \mathbb{P}S^nW$$
and show that $\bar\psi_n^*$ is surjective. Note that $((W^{\times n}/\!/\Gamma_n)\setminus[0])/\mathbb{C}^*$ is GL(W)-isomorphic to $(\mathbb{P}W)^{\times n}/S_n$, as
$$(W^{\times n}/\!/\Gamma_n)\setminus[0] = (W\setminus 0)^{\times n}/\Gamma_n$$
and $\Gamma_n\times\mathbb{C}^* = (\mathbb{C}^*)^{\times n}\rtimes S_n$. So
$$\bar\psi_n : (\mathbb{P}W)^{\times n}/S_n \to \mathbb{P}S^nW.$$
It will be sufficient to show $\bar\psi_n^*$ is surjective on affine open subsets that cover the spaces. Let $w_1,\dots,w_w$ be a basis of W and consider the affine open subset of $\mathbb{P}W$ given by elements where the coordinate on $w_1$ is nonzero, and the corresponding induced affine open subsets of $(\mathbb{P}W)^{\times n}$ and $\mathbb{P}S^nW$; call these $(\mathbb{P}W)^{\times n}_1$ and $(\mathbb{P}S^nW)_1$. We will show that the algebra of $S_n$-invariant functions on $(\mathbb{P}W)^{\times n}_1$ is in the image of $\mathbb{C}[(\mathbb{P}S^nW)_1]$. In coordinates, the restriction to these open subsets of the quotient of $(\mathbb{P}W)^{\times n}$ by $S_n$ composed with $\bar\psi_n$ is
$$\Big(w_1 + \sum_{s=2}^{w} x^1_s w_s,\ \dots,\ w_1 + \sum_{s=2}^{w} x^n_s w_s\Big) \mapsto \prod_{i=1}^{n}\Big(w_1 + \sum_{s=2}^{w} x^i_s w_s\Big).$$
Finally, by, e.g., [Wey97, §II.3], the coordinates on the right hand side generate the algebra of $S_n$-invariant functions in the n sets of variables $(x^i_s)_{i=1,\dots,n}$.
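A small sympy illustration for n = 2, w = 3 (the variable names below are hypothetical): the coefficients of the product, as a polynomial in $w_1, w_2, w_3$, are visibly $S_2$-invariant in the two sets of variables, and by the cited result they generate all such invariants.

import sympy as sp

w1, w2, w3 = sp.symbols('w1 w2 w3')
x12, x13, x22, x23 = sp.symbols('x12 x13 x22 x23')  # x^i_s for i = 1,2, s = 2,3

prod = sp.expand((w1 + x12*w2 + x13*w3) * (w1 + x22*w2 + x23*w3))
coeffs = sp.Poly(prod, w1, w2, w3).coeffs()

swap = {x12: x22, x22: x12, x13: x23, x23: x13}  # the S_2 action on the points
for c in coeffs:
    assert sp.expand(c - c.subs(swap, simultaneous=True)) == 0
print(coeffs)  # [1, x12+x22, x13+x23, x12*x22, x12*x23+x13*x22, x13*x23]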
With more work, in [Bri97, Thm 3.3] Brion obtains an explicit (but enormous) function
(8.1.3)
$$d_0(n, w) = (n-1)(w-1)\Big((n-1)\Big\lfloor \tfrac{1}{w}\tbinom{n+w-1}{w-1}\Big\rfloor - n\Big)$$
for which $h_{d,n}$ is surjective for all $d > d_0$, where $\dim W = w$.
Problem 8.1.2.6. Improve Brion's bound to, say, a polynomial bound in n when n = w.
Problem 8.1.2.7. Note that $\mathbb{C}[Nor(Ch_n(W))] = \mathbb{C}[GL(W)\cdot(x_1\cdots x_n)]_{\ge 0}$ and that the boundary of the orbit closure is irreducible. Under what conditions is a GL(W)-orbit closure with reductive stabilizer and irreducible boundary such that the coordinate ring of the normalization of the orbit closure equals the nonnegative part of the coordinate ring of the orbit?
8.2. Non-normality of Detn
I follow [Kum13] in this section. Throughout this section I make the following assumptions and adopt the following notation:
Set up:
• V is a GL(W)-module.
• Let $\mathcal{P}^0 := GL(W)\cdot P$ and $\mathcal{P} := \overline{GL(W)\cdot P}$ denote the orbit and orbit closure of the point P of assumption (1), and let $\partial\mathcal{P} = \mathcal{P}\setminus\mathcal{P}^0$ denote its boundary, which we assume to contain more than just 0 (otherwise $[\mathcal{P}]$ is homogeneous).
(8.2.1) Assumptions:
(1) $P \in V$ is such that the SL(W)-orbit of P is closed.
(2) The stabilizer $G_P \subset GL(W)$ is reductive, which is equivalent (by a theorem of Matsushima [Mat60]) to requiring that $\mathcal{P}^0$ be an affine variety.
This situation holds when $V = S^nW$, $\dim W = n^2$, and $P = \det_n$ or $\mathrm{perm}_n$, as well as when $\dim W = rn$ and $P = S^r_n := \sum_{j=1}^r x^j_1\cdots x^j_n$, the sum-product polynomial, in which case $\mathcal{P} = \hat\sigma_r(Ch_n(W))$.
Lemma 8.2.0.8. [Kum13] Assumptions as in (8.2.1). Let $M \subset \mathbb{C}[\mathcal{P}]$ be a nonzero GL(W)-module, and let $Z(M) = \{y \in \mathcal{P} \mid f(y) = 0\ \forall f \in M\}$ denote its zero set. Then $\{0\} \subseteq Z(M) \subseteq \partial\mathcal{P}$. If moreover $M \subset I(\partial\mathcal{P})$, then as sets, $Z(M) = \partial\mathcal{P}$.
Proof. Since Z(M ) is a GL(W )-stable subset, if it contains a point of P 0 it
must contain all of P 0 and thus M vanishes identically on P, which cannot
happen as M is nonzero. Thus Z(M ) ⊆ ∂P. For the second assertion, since
M ⊂ I(∂P), we also have Z(M ) ⊇ ∂P.
Proposition 8.2.0.9. [Kum13] Assumptions as in (8.2.1). The space $\mathbb{C}[\mathcal{P}]^{SL(W)}_{>0}$ of SL(W)-invariants of positive degree in the coordinate ring of $\mathcal{P}$ is nonzero and contained in $I(\partial\mathcal{P})$. Moreover,
(1) any element of $\mathbb{C}[\mathcal{P}]^{SL(W)}_{>0}$ cuts out $\partial\mathcal{P}$ set-theoretically, and
(2) the components of $\partial\mathcal{P}$ all have codimension one in $\mathcal{P}$.
Proof. To study C[P]SL(W ) , consider the GIT quotient P//SL(W ) whose
coordinate ring, by definition, is C[P]SL(W ) . It parametrizes the closed
SL(W )-orbits in P, so it is non-empty. Thus C[P]SL(W ) is nontrivial.
Claim: every SL(W )-orbit in ∂P contains {0} in its closure, i.e., ∂P
maps to zero in the GIT quotient. This will imply any SL(W )-invariant
of positive degree is in I(∂P) because any non-constant function on the
GIT quotient vanishes on the inverse image of [0]. Thus (1) follows from
Lemma 8.2.0.8. The zero set of a single polynomial, if it is not empty, has
codimension one, which implies the components of ∂P are all of codimension
one, proving (2).
It remains to show $\partial\mathcal{P}$ maps to zero in $\mathcal{P}/\!/SL(W)$, where $\rho : GL(W) \to GL(V)$ is the representation. This GIT quotient inherits a $\mathbb{C}^*$-action via $\rho(\lambda\,\mathrm{Id})$, for $\lambda \in \mathbb{C}^*$. Its normalization is just the affine line $\mathbb{A}^1 = \mathbb{C}$. To see this, consider the $\mathbb{C}^*$-equivariant map $\sigma : \mathbb{C} \to \mathcal{P}$ given by $z \mapsto \rho(z\,\mathrm{Id})\cdot P$, which descends to a map $\bar\sigma : \mathbb{C} \to \mathcal{P}/\!/SL(W)$. Since the SL(W)-orbit of P is closed, for any $\lambda \in \mathbb{C}^*$, $\rho(\lambda\,\mathrm{Id})P$ does not map to zero in the GIT quotient, so $\bar\sigma^{-1}([0]) = \{0\}$ as a set. Lemma 8.1.2.1 applies, so $\bar\sigma$ is finite and gives the normalization. Finally, were there a closed nonzero orbit in $\partial\mathcal{P}$, it would have to equal $SL(W)\cdot\sigma(\lambda)$ for some $\lambda \in \mathbb{C}^*$ since $\bar\sigma$ is surjective. But $SL(W)\cdot\sigma(\lambda) \subset \mathcal{P}^0$.
Remark 8.2.0.10. That each irreducible component of ∂P is of codimension one in P is due to Matsushima [Mat60]. It is a consequence of his
result mentioned above.
The key to proving non-normality of $\hat{\mathrm{Det}}_n$ and $\hat{\mathrm{Perm}}^n_n$ is to find an SL(W)-invariant in the coordinate ring of the normalization (which has a GL(W)-grading) that does not occur in the corresponding graded component of the coordinate ring of $S^nW$, so it cannot occur in the coordinate ring of any GL(W)-subvariety.
Lemma 8.2.0.11. Assumptions as in (8.2.1). Let $P \in S^nW$ be such that $SL(W)\cdot P$ is closed and $G_P$ is reductive. Let d be the smallest positive GL(W)-degree such that $\mathbb{C}[\mathcal{P}^0]^{SL(W)}_d \neq 0$. If n is even and $d < nw$ (resp. n is odd and $d < 2nw$), then $\mathcal{P}$ is not normal.
Proof. Since $\mathcal{P}^0 \subset \mathcal{P}$ is a Zariski open subset, we have the equality of GL(W)-modules $\mathbb{C}(\mathcal{P}) = \mathbb{C}(\mathcal{P}^0)$. By restriction of functions $\mathbb{C}[\mathcal{P}] \subset \mathbb{C}[\mathcal{P}^0]$, and thus $\mathbb{C}[\mathcal{P}]^{SL(W)} \subset \mathbb{C}[\mathcal{P}^0]^{SL(W)}$. Now $\mathcal{P}^0/\!/SL(W) = \mathcal{P}^0/SL(W) \simeq \mathbb{C}^*$, so $\mathbb{C}[\mathcal{P}^0]^{SL(W)} \simeq \bigoplus_{k\in\mathbb{Z}}\mathbb{C}\{z^k\}$. Under this identification, z has GL(W)-degree d. By Proposition 8.2.0.9, $\mathbb{C}[\mathcal{P}]^{SL(W)} \neq 0$; let $h \in \mathbb{C}[\mathcal{P}]^{SL(W)}$ be a nonzero element of smallest positive degree. Then $h = z^k$ for some k. Were $\mathcal{P}$ normal, we would have $k = 1$. But now we also have a surjection $\mathbb{C}[S^nW] \to \mathbb{C}[\mathcal{P}]$, and by Exercise ?? the smallest possible GL(W)-degree of an SL(W)-invariant in $\mathbb{C}[S^nW]$ when n is even (resp. odd) is $wn$ (resp. $2wn$), which would occur in $S^w(S^nW)$ (resp. $S^{2w}(S^nW)$). We obtain a contradiction.
Theorem 8.2.0.12 (Kumar [Kum13]). For all $n \ge 3$, $\mathrm{Det}_n$ and $\mathrm{Perm}^n_n$ are not normal. For all $n \ge 2m$ (the range of interest), $\mathrm{Perm}^m_n$ is not normal.
I give the proof for $\mathrm{Det}_n$; the case of $\mathrm{Perm}^n_n$ is an easy exercise. Despite the variety being much more singular, the proof for $\mathrm{Perm}^m_n$ is more difficult; see [Kum13].
Proof. We will show that when n is congruent to 0 or 1 mod 4, $\mathbb{C}[\mathrm{Det}^0_n]^{SL(W)}_{n^2} \neq 0$, and when n is congruent to 2 or 3 mod 4, $\mathbb{C}[\mathrm{Det}^0_n]^{SL(W)}_{2n^2} \neq 0$, where the subscript denotes the GL-degree. Since $n^2, 2n^2 < (n^2)n$ for $n \ge 3$, Lemma 8.2.0.11 applies.
The SL(W)-trivial modules are $(\Lambda^{n^2}W)^{\otimes s} = S_{s^{n^2}}W$. Write $W = E\otimes F$. We want to determine the lowest degree trivial SL(W)-module that has a $G_{\det_n} = (SL(E)\times SL(F)/\mu_n)\rtimes\mathbb{Z}_2$-invariant. We have the decomposition $(\Lambda^{n^2}W)^{\otimes s} = (\bigoplus_{|\pi|=n^2} S_\pi E\otimes S_{\pi'}F)^{\otimes s}$, where $\pi'$ is the conjugate partition to $\pi$. Thus $(\Lambda^{n^2}W)^{\otimes s}$ contains the trivial $SL(E)\times SL(F)$-module $(\Lambda^nE)^{\otimes ns}\otimes(\Lambda^nF)^{\otimes ns}$ with multiplicity one. (In the language of §6.2.5, $k_{s^{n^2},(sn)^n,(sn)^n} = 1$.) Now we consider the effect of the $\mathbb{Z}_2 \subset G_{\det_n}$ with generator $\tau \in GL(W)$. It sends $e_i\otimes f_j$ to $e_j\otimes f_i$, so acting on W it has $+1$-eigenspace spanned by the $e_i\otimes f_j + e_j\otimes f_i$ for $i \le j$ and $-1$-eigenspace spanned by the $e_i\otimes f_j - e_j\otimes f_i$ for $1 \le i < j \le n$. Thus it acts on the one-dimensional vector space $(\Lambda^{n^2}W)^{\otimes s}$ by $((-1)^{\binom{n}{2}})^s$, i.e., by $-1$ if $n \equiv 2, 3 \bmod 4$ and s is odd, and by $+1$ otherwise. We conclude that there is an invariant as asserted above. (In the language of §6.2.5, $sk^{s^{n^2}}_{(sn)^n,(sn)^n} = 1$ for all s when $\binom{n}{2}$ is even, and, when $\binom{n}{2}$ is odd, $sk^{s^{n^2}}_{(sn)^n,(sn)^n} = 1$ for even s and is zero for odd s.)
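The sign of $\tau$ computed above is easy to confirm numerically. In the Python sketch below (my illustration), $\tau$ is realized as the $n^2\times n^2$ permutation matrix sending $e_i\otimes f_j \mapsto e_j\otimes f_i$, and its determinant, i.e., its action on $\Lambda^{n^2}W$, is compared with $(-1)^{\binom{n}{2}}$.

import numpy as np
from math import comb

for n in range(2, 7):
    tau = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            tau[j * n + i, i * n + j] = 1  # e_i ⊗ f_j  ->  e_j ⊗ f_i
    assert round(np.linalg.det(tau)) == (-1) ** comb(n, 2)
print("tau acts on Lambda^{n^2} W by (-1)^{C(n,2)} for n = 2,...,6")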
Exercise 8.2.0.13: Write out the proof of the non-normality of $\mathrm{Perm}^n_n$.
Exercise 8.2.0.14: Show the same method gives another proof that $Ch_n(W)$ is not normal, but that it fails (with good reason) to show $\sigma_n(v_d(\mathbb{P}^{n-1}))$ is not normal.
Exercise 8.2.0.15: Show a variant of the above holds for any reductive group with a nontrivial center (one gets a $\mathbb{Z}^k$-grading of modules if the center is k-dimensional); in particular it holds for $G = GL(A)\times GL(B)\times GL(C)$. Use this to show that $\sigma_r(Seg(\mathbb{P}A\times\mathbb{P}B\times\mathbb{P}C))$ is not normal when $\dim A = \dim B = \dim C = r > 2$.
Hints and Answers to
Selected Exercises
1.3.2.2 For the second assertion, a generic matrix will have nonzero determinant. In general, the complement of the zero set of any polynomial over the complex numbers has full measure. For the last assertion, first say rank(f) = r′ ≤ r and let $v_1,\dots,v_v$ be a basis of V such that the kernel is spanned by the last v − r′ vectors. Then the matrix representing f will be nonzero only in the upper r′ × r′ block, and thus all minors of size greater than r′ will be zero. Next say rank(f) = s > r. Choosing bases in the same manner, we see the upper s × s submatrix will have a nonzero determinant. Taking a Laplace expansion, we see at least one size r + 1 minor of it is nonzero. In any other choice of basis, minors expressed in the new basis are linear combinations of minors expressed in the old, so we conclude. If you need help with the third assertion, use Proposition 2.4.5.1.
1.3.2.7 Consider $\lim_{\epsilon\to 0}\frac{1}{\epsilon}\big((x+\epsilon y)^n - x^n\big)$.
2.1.1.1 v ∈ V goes to the map β 7→ β(v).
2.1.1.4 trace(f ).
2.1.2.1 A multi-linear map is determined by its action on bases of A∗1 , . . . , A∗n .
2.1.3.5 Use Exercise 2.3.5.2.1.3.4.
2.3.2.5 Show that a basis of sl(V) may be obtained from elements of GL(V) acting on, e.g., a matrix with a single nonzero entry off the diagonal, and the matrix whose entries are all zero except the (1,1) entry, which is 1, and the (2,2) entry, which is −1. In fact it is sufficient to use permutation matrices.
2.3.2.8 We need αXβ T = (g · α)T (g · X)(g · β). If α is a row vector then
g · α is the row vector αg −1 .
2.3.3.5 First note that V⊗W is an H-module, which decomposes as dim V copies of W, i.e., the irreducible H-submodules are of the form v⊗W for some v ∈ V. Thus if U ⊂ V⊗W is a G × H-submodule, it must contain a subspace of the form v⊗W. But then it must contain span(G · v)⊗W because it is a G-module as well. But the span of G · v is V because V is irreducible.
2.3.5.7 In each case show that both factors are included and conclude by
counting dimensions.
2.2.1.2 See [Lan12, §2.4.4]
2.2.3.1 If $T = \sum_{i=1}^r a_i\otimes b_i\otimes c_i$, then, letting $\pi_A : A \to A/(A')^\perp$ be the projection, and similarly for B, C, we have $T_{A'\otimes B'\otimes C'} = \sum_{i=1}^r \pi_A(a_i)\otimes\pi_B(b_i)\otimes\pi_C(c_i)$.
2.2.4.2 First assume $\underline{\mathbf{R}}(T) = \mathbf{R}(T)$ and write $T = a_1\otimes b_1\otimes c_1 + \cdots + a_r\otimes b_r\otimes c_r$.
2.4.6.2 $P(x) = 0$ if and only if $P(x,\dots,x) = P(x^d) = 0$.
2.4.2.1 It is sufficient to take Y = α1 ∧ · · · ∧ αk−2j and Z = α1 ∧ · · · ∧ αk+2j .
2.4.2.3 The ideal is generated by $\sum_j I(X_j)\otimes \mathrm{Sym}(A_1\otimes\cdots\otimes A_{j-1}\otimes A_{j+1}\otimes\cdots\otimes A_n)^* + I(Seg(\mathbb{P}A_1\times\cdots\times\mathbb{P}A_n))$.
2.4.12.2 Since the border rank of points in GL(A) × GL(B) × GL(C) · T
equals the border rank of T , the border rank of points in the closure cannot
increase.
2.4.13.2 Since X is a G-variety, P (x) = 0 for all x ∈ X implies P (gx) = 0
for all g ∈ G.
2.4.13.3 Use that
S 2 (A⊗B⊗C) = S 2 A⊗S 2 (B⊗C) ⊕ Λ2 A⊗Λ2 (B⊗C)
= S 2 A⊗S 2 B⊗S 2 C ⊕ S 2 A⊗Λ2 B⊗Λ2 C ⊕ Λ2 A⊗S 2 B⊗Λ2 C ⊕ Λ2 A⊗Λ2 B⊗S 2 C
2.5.3.2 Use Exercise 2.2.4.3.
2.7.1.3 If $\underline{\mathbf{R}}(T) = m$, then T is a limit of points T′ with $\mathbb{P}T'(A^*)\cap Seg(\mathbb{P}B\times\mathbb{P}C) \neq \emptyset$.
2.7.1.5 First note that if x is generic, it is diagonalizable with distinct eigenvalues so if x is generic, then dim C(x) = b. Then observe that dim(C(x))
is semi-continuous as the set {y | dim C(y) ≤ p} is an algebraic variety.
3.2.1.3 Recall from Exercise 2.1.3.5 that $\otimes_j M_{\langle l_j,m_j,n_j\rangle} = M_{\langle\Pi_j l_j,\Pi_j m_j,\Pi_j n_j\rangle}$. Set N = nml and consider $M_{\langle N\rangle} = M_{\langle m,n,l\rangle}\otimes M_{\langle n,l,m\rangle}\otimes M_{\langle l,m,n\rangle}$.
3.2.2.1 Consider
$$\begin{pmatrix} ♥ & ♥\\ ♥ & ♠\\ ♠ & ♠ \end{pmatrix}$$
3.4.2.3 Instead of the curve $a_0 + ta_1$ use $a_0 + ta_1 + t^2a_{q+1}$, and similarly for b, c.
3.4.2.4 Consider the coefficient of t2 in an expansion.
3.6.1.7 When writing $T = \lim_{t\to 0} T(t)$ we may take $t \in \mathbb{Z}_{h+1}$.
3.6.1.8 See [BCS97, §2.1].
4.2.2.3 Note that while for V ⊗3 , the kernels of S 2 V ⊗V → S 3 V and Λ2 V ⊗V →
Λ3 V were isomorphic GL(V )-modules, the kernels of S 3 V ⊗V → S 4 V and
Λ3 V ⊗V → Λ4 V are not. One can avoid dealing with spaces like S21 V ⊗V
by using maps like, e.g. the kernel of S 2 V ⊗S 2 V → S 4 V and keeping track
of dimensions of spaces uncovered. The answer is given by Theorem 5.2.1.2.
4.2.3 If $a_0 = b_0 = c_0 = \mathrm{Id}$ then $(u_1^\perp\otimes v_3)\otimes(v_2^\perp\otimes w_1)\otimes(w_3^\perp\otimes u_2)$ is mapped to $(u_2^T\otimes(w_3^\perp)^T)\otimes(w_1^T\otimes(v_2^\perp)^T)\otimes(v_3^T\otimes(u_1^\perp)^T)$. The general case is just notationally more cumbersome.
4.2.3.3 See [CHI+ ]. The calculation is a little more involved than indicated
in the section.
5.1.1.1 Prove an algebra version of Schur’s lemma.
5.1.4.2 If V is an irreducible G-module, then $V^*\otimes V$ is an irreducible G × G-module.
5.2.5.6 The highest weight vectors of $S_{d-1,1}V$ are all permutations of $x_1^{d-1}\otimes x_2 - x_1^{d-2}x_2\otimes x_1$, so the only choice one has is the position of $x_2$. But now if one sums over all possible positions one gets zero, and this is the only linear relation.
5.2.2.2 $c_{\pi'} = \sum_{\sigma\in S_{def_\pi}}\delta_\sigma\,\sum_{\sigma\in S_{def_{\pi'}}}\operatorname{sgn}(\sigma)\delta_\sigma$. Now show $c_{(1^d)}c_\pi = c_{\pi'}$.
5.2.5.9 $z_\pi\otimes z_\mu \subset V^{\otimes(|\pi|+|\mu|)}$ and is a highest weight vector.
5.2.6.4 Recall that $S^d(A_1\otimes\cdots\otimes A_n) = (A_1^{\otimes d}\otimes\cdots\otimes A_n^{\otimes d})^{S_d}$.
5.2.6.3 Use Exercise 5.2.2.2.
5.2.9.2 g · e1 ∧ · · · ∧ ev = det(g)e1 ∧ · · · ∧ ev
?? This is clear for the default Young tableau, now let Sd act.
?? Consider $M_{\langle 12,12,12\rangle}^{\otimes N}$.
6.2.4.3 In this case the determinant is a smooth quadric.
??.?? Use the tangent space as described in §2.4.4.
??.?? $\mathbb{P}N^*_xX \subset X^\vee$.
?? The hypersurface {x1 · · · xn + y1 · · · yn = 0} is self-dual.
6.6.6.3 Note that $\frac{\partial R}{\partial x_i} = \sum_j \frac{\partial^2 R}{\partial x_i\partial x_j}\,x_j$ (up to a nonzero constant), and now consider the last nonzero column.
6.8.8.1 {perm2 = 0} is a smooth quadric.
7.3.4.2 $Ch_d(\mathbb{C}^2) = \mathbb{P}S^d\mathbb{C}^2$.
Bibliography
[AFL14]
Andris Ambainis, Yuval Filmus, and François Le Gall, Fast matrix multiplication: Limitations of the laser method, 2014.
[AR03]
Elizabeth S. Allman and John A. Rhodes, Phylogenetic invariants for the
general Markov model of sequence mutation, Math. Biosci. 186 (2003),
no. 2, 113–144. MR 2 024 609
[AR08]
, Phylogenetic ideals and varieties for the general Markov model,
Adv. in Appl. Math. 40 (2008), no. 2, 127–148. MR 2388607 (2008m:60145)
[AS13]
V. B. Alekseev and A. V. Smirnov, On the exact and approximate bilinear complexities of multiplication of 4 × 2 and 2 × 2 matrices, Proceedings of the
Steklov Institute of Mathematics 282 (2013), no. 1, 123–139.
[AT92]
N. Alon and M. Tarsi, Colorings and orientations of graphs, Combinatorica
12 (1992), no. 2, 125–134. MR 1179249 (93h:05067)
[AV08]
M. Agrawal and V. Vinay, Arithmetic circuits: A chasm at depth four, Proc. 49th IEEE Symposium on Foundations of Computer Science (2008), 67–75.
[Bas14]
Saugata Basu, A complexity theory of constructible functions and sheaves,
arXiv:1309.5905 (2014).
[BB]
Austin R. Benson and Grey Ballard, A framework for practical parallel fast
matrix multiplication, arXiv:1409.2908.
[BCRL79]
Dario Bini, Milvio Capovani, Francesco Romani, and Grazia Lotti,
O(n2.7799 ) complexity for n × n approximate matrix multiplication, Inform.
Process. Lett. 8 (1979), no. 5, 234–235. MR MR534068 (80h:68024)
[BCS97]
Peter Bürgisser, Michael Clausen, and M. Amin Shokrollahi, Algebraic
complexity theory, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 315, Springer-Verlag,
Berlin, 1997, With the collaboration of Thomas Lickteig. MR 99c:68002
[BGL13]
Jaroslaw Buczyński, Adam Ginensky, and J. M. Landsberg, Determinantal
equations for secant varieties and the Eisenbud-Koh-Stillman conjecture, J.
Lond. Math. Soc. (2) 88 (2013), no. 1, 1–24. MR 3092255
[BGL14]
H. Bermudez, S. Garibaldi, and V. Larsen, Linear preservers and representations with a 1-dimensional ring of invariants, Trans. Amer. Math. Soc.
366 (2014), no. 9, 4755–4780. MR 3217699
[Bin80]
D. Bini, Relations between exact and approximate bilinear algorithms. Applications, Calcolo 17 (1980), no. 1, 87–97. MR 605920 (83f:68043b)
[BL89]
S. C. Black and R. J. List, A note on plethysm, European J. Combin. 10
(1989), no. 1, 111–112. MR 977186 (89m:20011)
[BL14]
Jaroslaw Buczyński and J. M. Landsberg, On the third secant variety, J.
Algebraic Combin. 40 (2014), no. 2, 475–502. MR 3239293
[Blä01]
Markus Bläser, Complete problems for Valiant’s class of qp-computable
families of polynomials, Computing and combinatorics (Guilin, 2001), Lecture Notes in Comput. Sci., vol. 2108, Springer, Berlin, 2001, pp. 1–10.
MR 1935355 (2003j:68051)
[Blä03]
, On the complexity of the multiplication of matrices of small
formats, J. Complexity 19 (2003), no. 1, 43–60.
MR MR1951322
(2003k:68040)
[Blä13]
Markus Bläser, Fast matrix multiplication, Graduate Surveys, no. 5, Theory
of Computing Library, 2013.
[BLMW11]
Peter Bürgisser, J. M. Landsberg, Laurent Manivel, and Jerzy Weyman, An
overview of mathematical issues arising in the geometric complexity theory
approach to VP 6= VNP, SIAM J. Comput. 40 (2011), no. 4, 1179–1209.
MR 2861717
[BO11]
Daniel J. Bates and Luke Oeding, Toward a salmon conjecture, Exp. Math.
20 (2011), no. 3, 358–370. MR 2836258 (2012i:14056)
[BOC92]
Michael Ben-Or and Richard Cleve, Computing algebraic formulas using a
constant number of registers, SIAM J. Comput. 21 (1992), no. 21, 54–58.
[Bre74]
Richard P. Brent, The parallel evaluation of general arithmetic expressions,
J. Assoc. Comput. Mach. 21 (1974), 201–206. MR 0660280 (58 #31996)
[Bri1893] A. Brill, Über symmetrische Functionen von Variabelnpaaren, Nachrichten von der Königlichen Gesellschaft der Wissenschaften und der Georg-Augusts-Universität zu Göttingen 20 (1893), 757–762.
[Bri93]
Michel Brion, Stable properties of plethysm: on two conjectures of Foulkes,
Manuscripta Math. 80 (1993), no. 4, 347–371. MR MR1243152 (95c:20056)
[Bri97]
, Sur certains modules gradués associés aux produits symétriques,
Algèbre non commutative, groupes quantiques et invariants (Reims, 1995),
Sémin. Congr., vol. 2, Soc. Math. France, Paris, 1997, pp. 157–183.
MR 1601139 (99e:20054)
[Bri02]
Emmanuel Briand, Polynômes multisymétriques, Ph.D. thesis, Université
de Rennes 1 et Universidad de Cantabria, 2002.
[Bri10]
, Covariants vanishing on totally decomposable forms, Liaison,
Schottky problem and invariant theory, Progr. Math., vol. 280, Birkhäuser
Verlag, Basel, 2010, pp. 237–256. MR 2664658
[CHI+ ]
Luca Chiantini, Jon Hauenstein, Christian Ikenmeyer, J.M. Landsberg, and
Giorgio Ottaviani, Towards algorithms for matrix multiplication, in preparation.
[CKSU05]
H. Cohn, R. Kleinberg, B. Szegedy, and C. Umans, Group-theoretic algorithms for matrix multiplication, Proceedings of the 46th annual Symposium on Foundations of Computer Science (2005), 379–388.
[CKW10]
Xi Chen, Neeraj Kayal, and Avi Wigderson, Partial derivatives in arithmetic complexity and beyond, Found. Trends Theor. Comput. Sci. 6 (2010),
no. 1-2, front matter, 1–138 (2011). MR 2901512
[CU]
H. Cohn and C. Umans, Fast matrix multiplication using coherent configurations, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium
on Discrete Algorithms.
[CU03]
, A group theoretic approach to fast matrix multiplication, Proceedings of the 44th annual Symposium on Foundations of Computer Science
(2003), no. 2, 438–449.
[CW82]
D. Coppersmith and S. Winograd, On the asymptotic complexity of matrix
multiplication, SIAM J. Comput. 11 (1982), no. 3, 472–492. MR 664715
(83j:68047b)
[dG78]
Hans F. de Groote, On varieties of optimal algorithms for the computation
of bilinear mappings. II. Optimal algorithms for 2×2-matrix multiplication,
Theoret. Comput. Sci. 7 (1978), no. 2, 127–148. MR 509013 (81i:68061)
[Die49]
Jean Dieudonné, Sur une généralisation du groupe orthogonal à quatre variables, Arch. Math. 1 (1949), 282–287. MR 0029360 (10,586l)
[Dol03]
Igor Dolgachev, Lectures on invariant theory, London Mathematical Society Lecture Note Series, vol. 296, Cambridge University Press, Cambridge,
2003. MR MR2004511 (2004g:14051)
[Dri97]
Arthur A. Drisko, On the number of even and odd Latin squares of order
p + 1, Adv. Math. 128 (1997), no. 1, 20–35. MR 1451417 (98e:05018)
[DS13]
A. M. Davie and A. J. Stothers, Improved bound for complexity of matrix
multiplication, Proc. Roy. Soc. Edinburgh Sect. A 143 (2013), no. 2, 351–
369. MR 3039815
[Dyn52]
E. B. Dynkin, Maximal subgroups of the classical groups, Trudy Moskov.
Mat. Obšč. 1 (1952), 39–166. MR 0049903 (14,244d)
[Eis95]
David Eisenbud, Commutative algebra, Graduate Texts in Mathematics,
vol. 150, Springer-Verlag, New York, 1995, With a view toward algebraic
geometry. MR MR1322960 (97a:13001)
[FG10]
Shmuel Friedland and Elizabeth Gross, A proof of the set-theoretic version
of the salmon conjecture, preprint, arXiv:1104.1776 (2010).
[FH91]
William Fulton and Joe Harris, Representation theory, Graduate Texts in
Mathematics, vol. 129, Springer-Verlag, New York, 1991, A first course,
Readings in Mathematics. MR 1153249 (93a:20069)
[Fis94]
Ismor Fischer, Sums of Like Powers of Multivariate Linear Forms, Math.
Mag. 67 (1994), no. 1, 59–61. MR 1573008
[Fou50]
H. O. Foulkes, Concomitants of the quintic and sextic up to degree four
in the coefficients of the ground form, J. London Math. Soc. 25 (1950),
205–209. MR MR0037276 (12,236e)
[Fri13]
Shmuel Friedland, On tensors of border rank l in Cm×n×l , Linear Algebra
Appl. 438 (2013), no. 2, 713–737. MR 2996364
[Fro97]
G. Frobenius, Über die Darstellung der endlichen Gruppen durch lineare
Substitutionen, Sitzungsber Deutsch. Akad. Wiss. Berlin (1897), 994–1015.
[Gal]
François Le Gall, Powers of tensors and fast matrix multiplication,
arXiv:1401.7714.
[Gat87]
Joachim von zur Gathen, Feasible arithmetic computations: Valiant’s hypothesis, J. Symbolic Comput. 4 (1987), no. 2, 137–172. MR MR922386
(89f:68021)
[Gay76]
David A. Gay, Characters of the Weyl group of SU (n) on zero weight spaces
and centralizers of permutation representations, Rocky Mountain J. Math.
6 (1976), no. 3, 449–455. MR MR0414794 (54 #2886)
[GH79]
Phillip Griffiths and Joseph Harris, Algebraic geometry and local differential geometry, Ann. Sci. École Norm. Sup. (4) 12 (1979), no. 3, 355–452.
MR 81k:53004
[GHIL]
Fulvio Gesmundo, Jonathan Hauenstein, Christian Ikenmeyer, and J. M.
Landsberg, Geometry and matrix rigidity, arXiv:1310.1362.
[GKKS13a]
Ankit Gupta, Pritish Kamath, Neeraj Kayal, and Ramprasad Saptharishi,
Approaching the chasm at depth four, Proceedings of the Conference on
Computational Complexity (CCC) (2013).
[GKKS13b]
, Arithmetic circuits: A chasm at depth three, Electronic Colloquium
on Computational Complexity (ECCC) 20 (2013), 26.
[GKZ94]
I. M. Gel'fand, M. M. Kapranov, and A. V. Zelevinsky, Discriminants,
resultants, and multidimensional determinants, Mathematics: Theory &
Applications, Birkhäuser Boston Inc., Boston, MA, 1994. MR 95e:14045
[Gly10]
David G. Glynn, The conjectures of Alon-Tarsi and Rota in dimension
prime minus one, SIAM J. Discrete Math. 24 (2010), no. 2, 394–399.
MR 2646093 (2011i:05034)
[Gor94]
P. Gordan, Das Zerfallen der Curven in gerade Linien, Math. Ann. 45 (1894), 410–427.
[Gre11]
Bruno Grenet, An Upper Bound for the Permanent versus Determinant
Problem, Manuscript (submitted), 2011.
[Gri86]
B. Griesser, A lower bound for the border rank of a bilinear map, Calcolo
23 (1986), no. 2, 105–114 (1987). MR 88g:15021
[GW09]
Roe Goodman and Nolan R. Wallach, Symmetry, representations, and invariants, Graduate Texts in Mathematics, vol. 255, Springer, Dordrecht,
2009. MR 2522486
[Had97]
J. Hadamard, Mémoire sur l’élimination, Acta Math. 20 (1897), no. 1,
201–238. MR 1554881
[Had99]
, Sur les conditions de décomposition des formes, Bull. Soc. Math.
France 27 (1899), 34–47. MR 1504330
[Har95]
Joe Harris, Algebraic geometry, Graduate Texts in Mathematics, vol. 133,
Springer-Verlag, New York, 1995, A first course, Corrected reprint of the
1992 original. MR 1416564 (97e:14001)
[HIL13]
Jonathan D. Hauenstein, Christian Ikenmeyer, and J. M. Landsberg, Equations for lower bounds on border rank, Exp. Math. 22 (2013), no. 4, 372–383.
MR 3171099
[How87]
Roger Howe, (GLn , GLm )-duality and symmetric plethysm, Proc. Indian
Acad. Sci. Math. Sci. 97 (1987), no. 1-3, 85–109 (1988). MR MR983608
(90b:22020)
[HR94]
Rosa Huang and Gian-Carlo Rota, On the relations of various conjectures
on Latin squares and straightening coefficients, Discrete Math. 128 (1994),
no. 1-3, 225–236. MR 1271866 (95i:05036)
[Ike12]
Christian Ikenmeyer, Geometric complexity theory, tensor rank, and
Littlewood-Richardson coefficients, Ph.D. thesis, Institute of Mathematics, University of Paderborn, Available at http://math-www.uni-paderborn.de/agpb/work/ikenmeyer thesis.pdf (2012).
[IL03]
Thomas A. Ivey and J. M. Landsberg, Cartan for beginners: differential geometry via moving frames and exterior differential systems, Graduate Studies in Mathematics, vol. 61, American Mathematical Society, Providence,
RI, 2003. MR 2 003 610
[IM05]
Atanas Iliev and Laurent Manivel, Varieties of reductions for $\mathfrak{gl}_n$, Projective varieties with unexpected properties, Walter de Gruyter GmbH &
Co. KG, Berlin, 2005, pp. 287–316. MR MR2202260 (2006j:14056)
[KL]
Shrawan Kumar and J. M. Landsberg, Connections between conjectures of Alon-Tarsi, Hadamard-Howe, and integrals over the special unitary group, preprint arXiv:1410.8585.
[KL14]
Harlan Kadish and J. M. Landsberg, Padded polynomials, their cousins, and
geometric complexity theory, Comm. Algebra 42 (2014), no. 5, 2171–2180.
MR 3169697
[KLPSMN09] Abhinav Kumar, Satyanarayana V. Lokam, Vijay M. Patankar, and Jayalal
Sarma M. N., Using elimination theory to construct rigid matrices, Foundations of software technology and theoretical computer science—FSTTCS
2009, LIPIcs. Leibniz Int. Proc. Inform., vol. 4, Schloss Dagstuhl. LeibnizZent. Inform., Wadern, 2009, pp. 299–310. MR 2870721
[Koi]
Pascal Koiran, Arithmetic circuits: the chasm at depth four gets wider,
preprint arXiv:1006.4700.
[Kum]
Shrawan Kumar, A study of the representations supported by the orbit closure of the determinant, arXiv:1109.5996.
[Kum13]
, Geometry of orbits of permanents and determinants, Comment.
Math. Helv. 88 (2013), no. 3, 759–788. MR 3093509
[Lan]
J. M. Landsberg, Explicit tensors of border rank at least 2n − 1,
arXiv:1209.1664.
[Lan02]
Serge Lang, Algebra, third ed., Graduate Texts in Mathematics, vol. 211,
Springer-Verlag, New York, 2002. MR MR1878556 (2003e:00003)
[Lan06]
J. M. Landsberg, The border rank of the multiplication of 2 × 2 matrices
is seven, J. Amer. Math. Soc. 19 (2006), no. 2, 447–459. MR 2188132
(2006j:68034)
[Lan12]
, Tensors: geometry and applications, Graduate Studies in Mathematics, vol. 128, American Mathematical Society, Providence, RI, 2012.
MR 2865915
[Lan14a]
, Geometric complexity theory: an introduction for geometers, ANNALI DELL’UNIVERSITA’ DI FERRARA (2014), 1–53 (English).
[Lan14b]
, New lower bounds for the rank of matrix multiplication, SIAM J.
Comput. 43 (2014), no. 1, 144–149. MR 3162411
[Lic84]
Thomas Lickteig, A note on border rank, Inform. Process. Lett. 18 (1984),
no. 3, 173–178. MR 86c:68040
[Lic85]
, Typical tensorial rank, Linear Algebra Appl. 69 (1985), 95–120.
MR 87f:15017
[LM]
J. M. Landsberg and Mateusz Michalek, Abelian tensors, preprint.
[LM04]
J. M. Landsberg and Laurent Manivel, On the ideals of secant varieties of Segre varieties, Found. Comput. Math. 4 (2004), no. 4, 397–422.
MR MR2097214 (2005m:14101)
[LM08]
, Generalizations of Strassen’s equations for secant varieties of Segre
varieties, Comm. Algebra 36 (2008), no. 2, 405–422. MR MR2387532
[LMR13]
Joseph M. Landsberg, Laurent Manivel, and Nicolas Ressayre, Hypersurfaces with degenerate duals and the geometric complexity theory program,
Comment. Math. Helv. 88 (2013), no. 2, 469–484. MR 3048194
[LO]
J.M. Landsberg and Giorgio Ottaviani, New lower bounds for the border
rank of matrix multiplication, preprint, arXiv:1112.6007.
[LO13]
J. M. Landsberg and Giorgio Ottaviani, Equations for secant varieties of
Veronese and other varieties, Ann. Mat. Pura Appl. (4) 192 (2013), no. 4,
569–606. MR 3081636
[Mac95]
I. G. Macdonald, Symmetric functions and Hall polynomials, second ed.,
Oxford Mathematical Monographs, The Clarendon Press Oxford University
Press, New York, 1995, With contributions by A. Zelevinsky, Oxford Science
Publications. MR 1354144 (96h:05207)
[Man98]
Laurent Manivel, Gaussian maps and plethysm, Algebraic geometry (Catania, 1993/Barcelona, 1994), Lecture Notes in Pure and Appl. Math., vol.
200, Dekker, New York, 1998, pp. 91–117. MR MR1651092 (99h:20070)
[Mat60]
Yozô Matsushima, Espaces homogènes de Stein des groupes de Lie complexes, Nagoya Math. J 16 (1960), 205–218. MR MR0109854 (22 #739)
[McK08]
Tom McKay, On plethysm conjectures of Stanley and Foulkes, J. Algebra
319 (2008), no. 5, 2050–2071. MR 2394689 (2008m:20023)
[MM61]
Marvin Marcus and Henryk Minc, On the relation between the determinant
and the permanent, Illinois J. Math. 5 (1961), 376–381. MR 0147488 (26
#5004)
[MM62]
Marvin Marcus and F. C. May, The permanent function, Canad. J. Math.
14 (1962), 177–189. MR MR0137729 (25 #1178)
[MN05]
Jurgen Müller and Max Neunhöffer, Some computations regarding Foulkes’
conjecture, Experiment. Math. 14 (2005), no. 3, 277–283. MR MR2172706
(2006e:05186)
[MR04]
Thierry Mignon and Nicolas Ressayre, A quadratic bound for the determinant and permanent problem, Int. Math. Res. Not. (2004), no. 79, 4241–
4253. MR MR2126826 (2006b:15015)
[MR13]
Alex Massarenti and Emanuele Raviolo, The rank of n × n matrix multiplication is at least $3n^2 - 2\sqrt{2}\,n^{3/2} - 3n$, Linear Algebra Appl. 438 (2013),
no. 11, 4500–4509. MR 3034546
[MS01]
Ketan D. Mulmuley and Milind Sohoni, Geometric complexity theory. I.
An approach to the P vs. NP and related problems, SIAM J. Comput. 31
(2001), no. 2, 496–526 (electronic). MR MR1861288 (2003a:68047)
[MS08]
, Geometric complexity theory. II. Towards explicit obstructions for
embeddings among class varieties, SIAM J. Comput. 38 (2008), no. 3, 1175–
1206. MR MR2421083
[Mula]
Ketan D. Mulmuley, Geometric complexity theory V: Equivalence between
blackbox derandomization of polynomial identity testing and derandomization of noether’s normalization lemma, arXiv:1209.5993.
[Mulb]
, Geometric complexity theory VI: The flip via positivity, available
at http://gct.cs.uchicago.edu/gct6.pdf.
[Mum95]
David Mumford, Algebraic geometry. I, Classics in Mathematics, SpringerVerlag, Berlin, 1995, Complex projective varieties, Reprint of the 1976 edition. MR 1344216 (96d:14001)
[NW97]
Noam Nisan and Avi Wigderson, Lower bounds on arithmetic circuits
via partial derivatives, Comput. Complexity 6 (1996/97), no. 3, 217–234.
MR 1486927 (99f:68107)
[Pop75]
A. M. Popov, Irreducible simple linear Lie groups with finite standard subgroups in general position, Funkcional. Anal. i Priložen. 9 (1975), no. 4,
81–82. MR MR0396847 (53 #707)
[Pro07]
Claudio Procesi, Lie groups, Universitext, Springer, New York, 2007,
An approach through invariants and representations. MR MR2265844
(2007j:22016)
[Qi]
Yang Qi, Equations for the third secant variety of the segre product of n
projective spaces, arXiv:1311.2566.
[RS11]
Kristian Ranestad and Frank-Olaf Schreyer, On the rank of a symmetric
form, J. Algebra 346 (2011), 340–342. MR 2842085 (2012j:13037)
[Sch81]
A. Schönhage, Partial and total matrix multiplication, SIAM J. Comput.
10 (1981), no. 3, 434–455. MR MR623057 (82h:68070)
[Seg10]
C. Segre, Preliminari di una teoria delle varieta luoghi di spazi, Rend. Circ.
Mat. Palermo XXX (1910), 87–121.
[Sha94]
Igor R. Shafarevich, Basic algebraic geometry. 1, second ed., SpringerVerlag, Berlin, 1994, Varieties in projective space, Translated from the 1988
Russian edition and with notes by Miles Reid. MR MR1328833 (95m:14001)
[Sip92]
Michael Sipser, The history and status of the P versus NP question, STOC
’92 Proceedings of the twenty-fourth annual ACM symposium on Theory
of computing (1992), 603–618.
[Sto]
A. Stothers, On the complexity of matrix multiplication, PhD thesis, University of Edinburgh, 2010.
[Str69]
Volker Strassen, Gaussian elimination is not optimal, Numer. Math. 13
(1969), 354–356. MR 40 #2223
[Str73]
, Vermeidung von Divisionen, J. Reine Angew. Math. 264 (1973),
184–202. MR MR0521168 (58 #25128)
[Str83]
V. Strassen, Rank and optimal computation of generic tensors, Linear Algebra Appl. 52/53 (1983), 645–685. MR 85b:15039
[Str87]
, Relative bilinear complexity and matrix multiplication, J. Reine
Angew. Math. 375/376 (1987), 406–443. MR MR882307 (88h:11026)
[Tav]
Sebastien Tavenas, Improved bounds for reduction to depth 4 and depth 3,
preprint arXiv:1304.5777.
[Ter11]
A. Terracini, Sulla vk per cui la varieta degli sh (h+1)-seganti ha dimensione
minore dell’ordinario, Rend. Circ. Mat. Palermo 31 (1911), 392–396.
[Val77]
Leslie G. Valiant, Graph-theoretic arguments in low-level complexity, Mathematical foundations of computer science (Proc. Sixth Sympos., Tatranská
Lomnica, 1977), Springer, Berlin, 1977, pp. 162–176. Lecture Notes in Comput. Sci., Vol. 53. MR 0660702 (58 #32067)
[Val79a]
L. G. Valiant, The complexity of computing the permanent, Theoret. Comput. Sci. 8 (1979), no. 2, 189–201. MR MR526203 (80f:68054)
[Val79b]
Leslie G. Valiant, Completeness classes in algebra, Proc. 11th ACM STOC,
1979, pp. 249–261.
[VSBR83]
L. G. Valiant, S. Skyum, S. Berkowitz, and C. Rackoff, Fast parallel computation of polynomials using few processors, SIAM J. Comput. 12 (1983),
no. 4, 641–644. MR 721003 (86a:68044)
[Wey93]
Jerzy Weyman, Gordan ideals in the theory of binary forms, J. Algebra 161
(1993), no. 2, 370–391. MR 1247362 (94i:13003b)
[Wey97]
Hermann Weyl, The classical groups, Princeton Landmarks in Mathematics, Princeton University Press, Princeton, NJ, 1997, Their invariants and
representations, Fifteenth printing, Princeton Paperbacks. MR 1488158
(98k:01049)
[Wey03]
Jerzy Weyman, Cohomology of vector bundles and syzygies, Cambridge
Tracts in Mathematics, vol. 149, Cambridge University Press, Cambridge,
2003. MR MR1988690 (2004d:13020)
[Wil]
Virginia Williams, Breaking the Coppersmith-Winograd barrier, preprint.
[Win71]
S. Winograd, On multiplication of 2 × 2 matrices, Linear Algebra and Appl.
4 (1971), 381–388. MR 45 #6173
[Ye11]
Ke Ye, The stabilizer of immanants, Linear Algebra Appl. 435 (2011), no. 5,
1085–1098. MR 2807220 (2012e:15017)
Index
2A -generic, 49
G-module, 20
G-module map, 22
Subk (S d V )
equations of, 123
X ∨ , dual variety of X, 127
Pad, 123
π 0 , 96
chown , 139
VNP, 113
VP, 113
s-rank, 75
vd (PV )∨ , dual of Veronese, 127
affine linear projection, 10
algebraic variety, 6, 24
arithmetic circuit, 112
bilinear map, 4
border rank, 19, 59
Brill’s equations, 157
Burnside’s theorem, 91
Cauchy formulas, 101
centralizer, 49, 90
character of representation, 94
Chow variety
Brill’s equations for, 157
equations of, 152
class function, 94
codimension of variety, 27
combinatorial restriction, 57
combinatorial value, 69
commutator, 90
complete problem, 113
completely reducible module, 90
complexity class
complete problem for, 113
hard problem for, 113
concise, 47
cone, 122
conjugate partition, 96
content, 100
contraction map, 23
decomposable representation, 20
degeneracy value, 64
degree of variety, 28
determinant, 9
DFT, 72
Discrete Fourier Transform, 72
discriminant hypersurface, 127
dual variety, 126, 127
dual vector space, 16
elementary symmetric function, 114
exponent of matrix multiplication, 4
formula, 115
quasi-polynomial, 115
general point, 29
generating function
for elementary symmetric polynomials,
114
generic
1A , 47
group algebra, 71
hard problem, 113
highest weight, 99
highest weight vector, 99
hook length, 100
ideal, 6
immanant, 104
inheritance, 107
interlace, 104
irreducible action, 20
isomorphic G-modules, 22
Jacobian loci, 141
Kempf-Weyman desingularization, 123
Koszul flattening, 39
Kronecker coefficients, 109, 121
symmetric, 121
Lie algebra, 30
linear map
rank, 16
Littlewood Richardson Rule, 104
matrix coefficient, 72
matrix coefficients, 93
module, 20
completely reducible, 90
semi-simple, 90
simple, 90
module homomorphism, 22
module map, 22
normalization
of a curve, 58
obstruction
occurrence, 118
orbit occurrence, 117
orbit representation-theoretic, 118
representation-theoretic, 119
occurrence obstruction, 118
orbit occurrence obstruction, 117
orbit representation-theoretic obstruction,
118
partition, 95
partial polarization, 23
Pascal determinant, 144
Pieri formula, 104
Plücker coordinates, 44
polarization, 23
pullback, 45
quasi-polynomial size formula, 115
quasi-projective variety, 93
rank of linear map, 16
re-ordering isomorphism, 17
reducible representation, 20
reduction of sequence, 113
regular endomorphism, 49
regular semi-simple, 49
representation
decomposable, 20
irreducible, 20
reducible, 20
representation-theoretic obstruction, 119
restriction value, 68
resultant, 133
Schur’s lemma, 22
semi-simple module, 90
Shannon entropy, 14
sign representation, 21
simple module, 90
size of
arithmetic circuit, 112
size of circuit, 112
skew-symmetric tensor, 22
smooth point, 28
stabilizer
characterizes point, 117
of tensor, 24
standard Young tableau, 101
structure tensor of an algebra, 71
submodule, 20
subspace variety, 26, 122
sum-product polynomial, 141
symmetric Kronecker coefficients, 121
symmetric subspace variety
equations of, 123
symmetric tensor, 22
tensor
combinatorial restriction, 68
degenerates to, 63
restricts to, 68
skew-symmetric, 22
symmetric, 22
tensor product, 16
tensor rank, 5
Terracini’s Lemma, 33
triple product property, 73
trivial representation, 20
value
combinatorial, 69
degeneracy, 64
restriction, 68
variety, 24
algebraic, 6
codimension of, 27
degree of, 28
of padded polynomials, 123
quasi-projective, 93
weight vector, 99
weight, highest, 99
Weyl group, 103
wreath product, 137
Young diagram, 96
Zariski closure, 6