Method:

The back-projection method time-reverses seismograms recorded at dense seismic arrays back to a set of grid points around the hypocenter of the event, based upon a 1-D velocity model [e.g., IASP91, Kennett and Engdahl, 1991]. Corrections for 3-D lateral variations that are not in the 1-D velocity model are determined using a cross-correlation analysis of the first-arriving P waves [Ishii et al., 2007]. This analysis determines time corrections, $\Delta t_k$, at the $k$th seismic station that force the initial P waves to align at the hypocenter after time shifting by the theoretical 1-D travel times. After time shifting the $k$th seismogram, $u_k(t)$, by the travel time to the $i$th grid point, $t_{ik}$, the seismograms are added together to form a stack, $s_i(t)$. These stacks can be expressed as

$$ s_i(t) = \sum_{k=1}^{K} \alpha_k \, u_k(t - t_{ik} + \Delta t_k). $$

Here $K$ is the total number of seismograms, and $\alpha_k$ is a weighting factor that normalizes the amplitudes of each seismogram and corrects for polarity changes within the array.

The time corrections $\Delta t_k$ are usually determined using the first-arriving P waves from the earthquake on which the back-projection analysis is applied. However, when the first-arriving P waves of the event of interest are difficult to align, one can use time corrections from an event in the same region that has more impulsive first-arriving P waves. This is the approach we take for the Mw 9.0 Tohoku-oki earthquake, which has very emergent initial P waveforms. In contrast, a Mw 7.1 aftershock on April 7, 2011 occurred at a depth of 65 km and has very impulsive waveforms that are easier to align with the cross-correlation analysis. The time corrections from the April 7th event are applied to the sequence of events investigated in this paper. This means that all imaged energy is located with respect to the hypocenter of the April 7th event, and any error in this location will be reflected in the back-projection results. In addition, by using this aftershock, we have set the depth of the plane of grid points to 65 km.
The fact that this depth is deeper than most of the events in this study means that the imaged energy will be slightly shifted towards the seismic array. This occurs because back-projection results smear along the ray paths [Kiser et al., 2011]. Both of these sources of energy mislocation contribute to the imaged energy crossing the trench location (Figure 2c). However, neither contribution should be frequency dependent; therefore, the relative rupture properties in the main text are robust.

To better image rupture propagation, an additional processing step is taken that eliminates amplitude information from the back-projection results. This step calculates a coherency function, $x_i(t)$, at the $i$th grid point as

$$ x_i(t) = \frac{1}{K} \sum_{k=1}^{K} p_k \, \frac{\sum_{\tau=t}^{t+T} u_k(\tau - t_{ik} + \Delta t_k)\, s_i(\tau)}{\sqrt{\sum_{\tau=t}^{t+T} u_k^2(\tau - t_{ik} + \Delta t_k) \, \sum_{\tau=t}^{t+T} s_i^2(\tau)}}, $$

with $t_{ik}$ the travel time to the $i$th grid point and $\Delta t_k$ the station time correction, as in the stack. This function is the average cross-correlation value between the individual, time-shifted seismograms and the stack at the $i$th grid point. $T$ is the time window of the cross correlation. This time window should include multiple cycles of the waveforms, and therefore increases when using lower-frequency data. $p_k$ is the polarity correction at station $k$ mentioned above.

The frequency-dependent updip shift seen in Figure 2(c) is based upon plotting the center of the energy kernels at 5-second time increments. At lower frequencies, the resolution of the energy kernels degrades, and approximating the location of energy release with a single point is accompanied by more uncertainty. In Figure S1, the back-projection results are plotted as a function of longitude and time. This figure shows how resolution degrades at lower frequencies, but also demonstrates that the updip shift in imaged energy can still be observed without the point-source approximation. An additional test, which investigates whether the updip shift in lower-frequency energy is an artifact of the back-projection analysis, demonstrates that these results are robust (Figure S2).
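The stacking and coherency steps described above can be illustrated with a minimal Python sketch. This is not the processing code used in this study. It assumes evenly sampled traces, whole-sample shifts, and a hypothetical convention in which each trace is read at sample n + shifts[k], where shifts[k] stands in for the combined travel time and station correction. The function names, the synthetic pulse, and the station delays are all invented for illustration, and the weights are reduced to polarity corrections because the synthetic amplitudes are already equal.

```python
import math

def shifted(trace, shift):
    """Read trace at sample n + shift for every n, zero-padding outside the record."""
    n = len(trace)
    return [trace[j + shift] if 0 <= j + shift < n else 0.0 for j in range(n)]

def stack(traces, shifts, weights):
    """Weighted sum of the shifted traces: the stack for one grid point."""
    out = [0.0] * len(traces[0])
    for trace, shift, w in zip(traces, shifts, weights):
        for j, value in enumerate(shifted(trace, shift)):
            out[j] += w * value
    return out

def coherency(traces, shifts, polarities, s, start, window):
    """Average normalized cross correlation between each shifted trace and the
    stack s over samples start .. start + window (inclusive)."""
    seg_s = s[start:start + window + 1]
    energy_s = sum(b * b for b in seg_s)
    total = 0.0
    for trace, shift, p in zip(traces, shifts, polarities):
        seg_u = shifted(trace, shift)[start:start + window + 1]
        num = sum(a * b for a, b in zip(seg_u, seg_s))
        den = math.sqrt(sum(a * a for a in seg_u) * energy_s)
        if den > 0.0:  # skip windows with no energy instead of dividing by zero
            total += p * num / den
    return total / len(traces)

def make_trace(length, onset, polarity):
    """Synthetic record (hypothetical): a 3-sample pulse [1, 2, 1] starting at onset."""
    trace = [0.0] * length
    for j, a in enumerate([1.0, 2.0, 1.0]):
        trace[onset + j] = polarity * a
    return trace

# Three hypothetical stations; the second records the pulse with flipped polarity.
delays = [0, 2, 4]
polarities = [1.0, -1.0, 1.0]
traces = [make_trace(16, 5 + d, p) for d, p in zip(delays, polarities)]

# Shifts that match the delays (the "correct" grid point) align the pulses.
s_good = stack(traces, delays, polarities)
x_good = coherency(traces, delays, polarities, s_good, start=4, window=8)

# Zero shifts (a "wrong" grid point) leave the pulses misaligned.
s_bad = stack(traces, [0, 0, 0], polarities)
x_bad = coherency(traces, [0, 0, 0], polarities, s_bad, start=4, window=8)

print(x_good, x_bad)
```

In this synthetic case the coherency equals 1 at the grid point whose shifts align the pulses and is lower at the misaligned one, regardless of pulse amplitude. This is the sense in which the coherency function removes amplitude information.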
Auxiliary Figure Captions:

Figure S1: Longitude/Time Plots. From top to bottom: back-projection results of the mainshock with respect to longitude and time using bandpass-filtered data between 0.8 and 2 Hz, 0.25 and 0.5 Hz, 0.1 and 0.2 Hz, and 0.05 and 0.1 Hz. These images demonstrate how resolution degrades at lower frequencies. The imaged energy is normalized at each time step, and the white star is the hypocentral longitude and time. The vertical white lines are the longitudes of the Oshika Peninsula (left) and the trench location at the epicentral latitude (right).

Figure S2: March 9th Mw 6.4 Foreshock. Back-projection results for the Mw 6.4 foreshock (centers of the energy kernels) using data filtered to 0.8-2 Hz (red dots) and 0.1-0.2 Hz (green dots). This result shows that for this particular earthquake the high-frequency energy is imaged updip of the lower-frequency energy. This shift is opposite to that of the Mw 9.0 mainshock, and demonstrates that the frequency-dependent behavior reported in the main text is a real feature of the rupture and not an artifact caused by a processing step in the back-projection analysis. The white star is the epicenter of the Mw 6.4 foreshock.

Figure S3: Rupture Areas of Foreshocks and Aftershocks. Rupture areas (white contours) of foreshocks and aftershocks between March 9th and April 7th. All earthquakes have been identified in the JMA catalogue and have magnitudes greater than or equal to 6. The red contour is the energy kernel of a synthetic point source for reference. The yellow line is the location of the Japan Trench. The white star is the epicenter of the Mw 9.0 mainshock.

Figure S4: Subduction Interface Failure. (a) Similar to Figure S3 except that we have selected foreshocks and aftershocks with rupture areas that do not overlap significantly with previous foreshocks and aftershocks. All symbols are the same as in Figure S3.
(b) Average seismicity rate of the regions within the contours of (a) for the 48 hours surrounding each large event. Time is with respect to the hypocentral time of each event. This graph shows that seismicity within the contours dramatically increases following these large earthquakes, which lends support to the idea that the interface becomes active through a cascading series of large aftershocks, whose rupture areas are spatially and seismically distinct from the rupture area of the mainshock.

Auxiliary References:

Ishii, M., P.M. Shearer, H. Houston, and J.E. Vidale (2007), Teleseismic P wave imaging of the 26 December 2004 Sumatra-Andaman and 28 March 2005 Sumatra earthquake ruptures using the Hi-net array, J. Geophys. Res., 112, B11307.

Kennett, B.L.N., and E.R. Engdahl (1991), Traveltimes for global earthquake location and phase identification, Geophys. J. Int., 105, 429-465.

Kiser, E., M. Ishii, C.H. Langmuir, P.M. Shearer, and H. Hirose (2011), Insights into the mechanism of intermediate-depth earthquakes from source properties as imaged by backprojection of multiple seismic phases, J. Geophys. Res., 116, B06310.