Supplementary Information

Database searching extant protein sequences for fossil proteins

We searched the moa peptides against databases of extant proteins, including those of closely related avian taxa, which allowed us to determine large stretches of the collagen sequences. This approach has been used previously to determine partial protein sequences not present in the database, such as sequences from >100 proteins from mammoth [1-6]. Our search space allowed us to detect a broad set of peptides from many taxa, and subsequent alignment of the sequenced peptides allowed us to combine homologous moa sequences into a single, unique sequence. Alternatively, de novo sequencing can be used to determine sequences from extinct taxa; however, this approach requires complete fragmentation of the peptides and/or high resolution to provide accurate novel protein sequences [7]. These limitations are compounded by the need for homology comparison to determine both which protein a peptide derives from and whether the peptide results from contamination (e.g., human keratin, bacterial peptides). This approach remains limited for highly variable proteins (i.e., those with highly species-specific sequences) [7], so the majority of protein and peptide sequences determined from fossil taxa come from highly conserved proteins and/or highly conserved portions of proteins.

Phylogenetic analysis of Moa and other Archosaurian Collagen I

We aligned our moa sequences to collagen I alpha 1 and alpha 2 sequences of other taxa (Table S9) in Seaview. Mature collagen I alpha 1 and alpha 2 sequences were generated by cutting at the following motifs, where | marks the cut position:
alpha 1 N-terminus: FAP|QM
alpha 1 C-terminus: RYY|RAD
alpha 2 N-terminus: FAA|QYD
alpha 2 C-terminus: GPPG|PNGGG
After alignment, the sequences were analyzed using “Traditional Search” in TNT [8] with the following parameters: TBR (tree bisection and reconnection) branch swapping, 1000 random seeds, 10,000 replicates, 100 trees stored per replicate.
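The motif-based trimming of the collagen I chains described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually used: it assumes the mature chain is the region lying between the N-terminal and C-terminal cut positions, and the names `MOTIFS`, `cut_index`, and `trim_mature`, as well as the toy input sequences, are hypothetical.

```python
# Cut-site motifs as given in the text; "|" marks the cut position.
MOTIFS = {
    "alpha1": ("FAP|QM", "RYY|RAD"),
    "alpha2": ("FAA|QYD", "GPPG|PNGGG"),
}


def cut_index(seq, motif, last=False):
    """Return the index in seq where the cut marked by '|' falls, or None if the motif is absent."""
    left, right = motif.split("|")
    joined = left + right
    # Use the first occurrence for the N-terminal motif and the last one for the C-terminal motif.
    pos = seq.rfind(joined) if last else seq.find(joined)
    return None if pos < 0 else pos + len(left)


def trim_mature(seq, chain):
    """Keep the residues between the N- and C-terminal cuts (assumed interpretation of the motifs)."""
    n_motif, c_motif = MOTIFS[chain]
    start = cut_index(seq, n_motif)
    end = cut_index(seq, c_motif, last=True)
    if start is None or end is None or end <= start:
        raise ValueError(f"cut motifs not found in the expected order for {chain}")
    return seq[start:end]


# Toy sequences (invented, not real collagen) to show the behaviour.
print(trim_mature("MMMFAPQMGGGPPGRYYRADKKK", "alpha1"))  # QMGGGPPGRYY
print(trim_mature("AAFAAQYDGPKGPPGPNGGGKK", "alpha2"))   # QYDGPKGPPG
```

Raising an error when a motif is missing, instead of returning the untrimmed sequence, makes it harder for a partially trimmed chain to slip unnoticed into the alignment.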
Tree support was calculated using 10,000 jackknife rearrangements with a 36% removal probability, and Bremer support was calculated for suboptimal trees within 10 steps of the most parsimonious tree.

Table S8: Protein coverage for Collagen II and Collagen V

                      Mascot    Sequest   PEAKS   Combined
Collagen II alpha 1   25.85%    22.64%    15%     37.55%
Collagen V alpha 1    4.24%     N.D.      N.D.
Collagen V alpha 2    4.07%     6.41%     N.D.
Collagen V alpha 3    N.D.      4.64%     N.D.

Collagen II was compared to mouse collagen II because the chicken sequence is incomplete. Collagen V alpha 1 and alpha 2 were compared to mouse, and collagen V alpha 3 was compared to human. N.D. = not detected.

Table S9: Collagen I sequences for phylogeny from UniProt (e.g., G1NB83) or NCBI nr (e.g., gi|557280805|ref|XP_006023387.1|).

Species
Gallus gallus
Haliaeetus leucocephalus
Acanthisitta chloris
Pseudopodoces humilis
Falco cherrug
Falco peregrinus
Corvus brachyrhynchos
Serinus canaria
Colius striatus
Aquila chrysaetos canadensis
Mesitornis unicolor
Caprimulgus carolinensis
Fulmarus glacialis
Nestor notabilis
Tinamus guttatus
Picoides pubescens
Nipponia nippon
Calypte anna
Cuculus canorus
Tauraco erythrolophus
Pelecanus crispus
Opisthocomus hoazin
Merops nubicus
Egretta garzetta
Corvus cornix cornix
Anas platyrhynchos
Meleagris gallopavo
Alligator mississippiensis
Alligator sinensis

Collagen (I) alpha 1
P02457
gi|729725683|ref|XP_010583768.1|
gi|678004726|ref|XP_009078835.1|
gi|543378683|ref|XP_005532542.1|
gi|541971469|ref|XP_005439463.1|
gi|529432211|ref|XP_005235901.1|
gi|669287747|ref|XP_008636727.1|
gi|683934078|ref|XP_009096333.1|
gi|706139306|ref|XP_010202685.1|
gi|706121071|ref|XP_010196347.1|
gi|768393217|ref|XP_011592259.1|
gi|704579485|ref|XP_010188005.1|
gi|704549976|ref|XP_010179386.1|
gi|704320308|ref|XP_010168851.1|
gi|697032266|ref|XP_009578273.1|
gi|701314007|ref|XP_010017878.1|
gi|719764165|ref|XP_010216927.1|
gi|699701676|ref|XP_009907392.1|
gi|694843991|ref|XP_009462891.1|
gi|663256038|ref|XP_008490435.1|
gi|696962633|ref|XP_009569448.1|
gi|678130795|gb|KFU99981.1|
gi|694654154|ref|XP_009483316.1|
gi|700386166|ref|XP_009933168.1|
gi|675615678|ref|XP_008936670.1|
gi|726995379|ref|XP_010412372.1|
gi|514767400|ref|XP_005024678.1|
gi|733929245|ref|XP_010725420.1|
gi|564240577|ref|XP_006277120.1|
gi|557280805|ref|XP_006023387.1|

Collagen (I) alpha 2
P02467
gi|729719813|ref|XP_010568320.1|
gi|677971594|ref|XP_009071196.1|
gi|543346326|ref|XP_005518863.1|
gi|541950754|ref|XP_005432180.1|
gi|529417920|ref|XP_005228841.1|
gi|669272225|ref|XP_008628130.1|
gi|683900718|ref|XP_009084209.1|
gi|706114711|ref|XP_010194131.1|
gi|768337876|ref|XP_011583521.1|
gi|704555715|ref|XP_010181131.1|
gi|704291102|ref|XP_010176198.1|
gi|697023504|ref|XP_009573423.1|
gi|701308887|ref|XP_010015343.1|
gi|719730974|ref|XP_010210602.1|
gi|699630750|ref|XP_009905768.1|
gi|694847576|ref|XP_009464876.1|
gi|663262068|ref|XP_008492371.1|
gi|696975215|ref|XP_009557161.1|
gi|701313377|ref|XP_009983928.1|
gi|694628038|ref|XP_009490310.1|
gi|700388462|ref|XP_009934436.1|
gi|675608533|ref|XP_008948217.1|
gi|697818148|ref|XP_009640166.1|
gi|727017062|ref|XP_010397702.1|
gi|514708862|ref|XP_005010992.1|
G1NB83
gi|564228122|ref|XP_006258514.1|
gi|557286414|ref|XP_006025957.1|

Figure S1: Collagen II alpha 1 sequence coverage determined by Sequest, Mascot, and PEAKS.

Figure S2: Collagen I alpha 2 peptide showing dimethylation of asparagine.

Figure S3: Collagen I alpha 1 peptide showing hydroxylation of proline and acetylation of lysine.

Figure S4: Collagen I alpha 2 peptide showing acetylation of alanine. Additionally, this peptide was detected with (bottom) and without (top) deamidation of asparagine.

Figure S5: Collagen II alpha 1 peptide showing fucose on serine.

Figure S6: Collagen I alpha 1 peptide showing hydroxylation of proline and carboxymethyllysine (CML) on the C-terminal lysine residue. The position of this CML residue could represent a backbone cleavage because other CML peptides showed missed cleavages at the modified lysine residue.
Figure S7: TNT-based parsimony phylogeny derived from Collagen I sequences. Values above branches are jackknife values and values below branches are Bremer support.

Orbitrap XL Parameters

API SOURCE
Source Voltage (kV): 2.02
Source Current (uA): 0.28
Capillary Voltage (V): 47.01
Capillary Temp (C): 150.01
Tube Lens Voltage (V): 99.97

VACUUM
Ion Gauge (E-5 Torr): 1.80
Convectron Gauge (Torr): 0.93

FT VACUUM
FT Penning Gauge (E-10 Torr) 0.42
FT Pirani Gauge 1 (Torr): 0.85
FT Pirani Gauge 2 (Torr): 0.00

ION OPTICS
Multipole 00 Offset (V): -5.50
Lens 0 (V): -5.90
Multipole 0 Offset (V): -5.76
Lens 1 (V): -10.01
Gate Lens (V): -31.99
Multipole 1 Offset (V): -15.37
Multipole RF Amplitude (Vp-p 401.02
Front Lens (V): -6.21
Front Section (V): -9.00
Center Section (V): -12.03
Back Section (V): -6.99
Back Lens (V): 0.00
Trap Eject Offset (V): 6.00
FT Transfer Multipole Offset 4.36
FT Transfer Multipole Amplit 500.00
FT Gate Lens Offset (V): 6.40
FT Trap Lens Offset (V): 7.88
FT Storage Multipole Offset 8.55
FT Storage Multipole Amplitu 500.00
FT Reflect Lens Offset (V): 18.31
FT Main RF Amplitude (Vp-p): 2305.30
FT Main RF Current (A): 0.31
FT Main RF Frequency (kHz): 3090.31
FT HV Ion Energy (V): 1076.41
FT HV Lens 1 (V): 197.11
FT HV Lens 2 (V): -0.09
FT HV Lens 3 (V): -176.61
FT HV Lens 4 (V): 0.09
FT HV Push Voltage (V): 176.61
FT HV Pull Voltage (V): -215.33

ION DETECTION SYSTEM
Dynode Voltage (kV): -14.88
Multiplier 1 (V): -1000.00
Multiplier 2 (V): -973.47

FT Analyzer
FT CE Measure Voltage (V): -3453.72
FT CE Inject Voltage (V): -2467.06
FT Deflector Measure Voltage 315.71
FT Deflector Inject Voltage 21.64
FT Analyzer Temp. (°C): 25.99
FT Analyzer TEC Voltage: 1.16
FT Analyzer TEC Current: 0.20
FT Analyzer TEC Temp. (°C): 27.14
FT CE Electronics Temp. (°C) 33.13
FT CE Electronics TEC Temp. 29.92

Reagent Ion Source
Status: Standby
Filament: Off
Emission Current (uA): 2.69
CI Gas Pressure (psi): 19.96
Source Temp (°C): 160.12
Vial 1 Temp (°C): 67.41
Restrictor Temp (°C): 160.18
Transfer Line Temp (°C): 159.85

Reagent Ion Optics
Reagent Ion Lens 1 (V): 42.03
Reagent Ion Lens 2 (V): 21.95
Reagent Ion Lens 3 (V): 17.99

Reagent Vacuum
Ion Gauge (E-5 Torr): 20.11
Convectron Pressure OK: Yes
Convectron Pressure (Torr): 0.04

MS Detector Settings:
Experiment Type: Nth Order Double Play
Tune Method: JK_14Apr2011_tune

Scan Event Details:
1: FTMS + p norm o(375.0-2000.0)
  CV = 0.0V
2: ITMS + p norm Dep MS/MS Most intense ion from (1)
  Activation Type: CID
  Min. Signal Required: 1000.0
  Isolation Width: 2.00
  Normalized Coll. Energy: 35.0
  Default Charge State: 2
  Activation Q: 0.250
  Activation Time: 30.000
  CV = 0.0V
Scan Event 2 repeated for top 5 peaks.

Data Dependent Settings:
Use separate polarity settings disabled
Parent Mass List: (none)
Reject Mass List: (none)
Neutral Loss Mass List: (none)
Product Mass List: (none)
Neutral loss in top: 3
Product in top: 3
Most intense if no parent masses found not enabled
Add/subtract mass not enabled
FT master scan preview mode enabled
Charge state screening enabled
Charge state dependent ETD time not enabled
Monoisotopic precursor selection enabled
Non-peptide monoisotopic recognition not enabled
Charge state rejection enabled
Unassigned charge states : rejected
Charge state 1 : rejected
Charge state 2 : not rejected
Charge state 3 : not rejected
Charge states 4+ : not rejected
Chromatography mode is disabled

Global Data Dependent Settings:
Use global parent and reject mass lists not enabled
Exclude parent mass from data dependent selection not enabled
Exclusion mass width relative to mass
Exclusion mass width relative to low (ppm): 5.000
Exclusion mass width relative to high (ppm): 5.000
Parent mass width relative to mass
Parent mass width relative to low (ppm): 5.000
Parent mass width relative to high (ppm): 5.000
Reject mass width relative to mass
Reject mass width relative to low (ppm): 5.000
Reject mass width relative to high (ppm): 5.000
Zoom/UltraZoom scan mass width by mass
Zoom/UltraZoom scan mass width low: 5.00
Zoom/UltraZoom scan mass width high: 5.00
FT SIM scan mass width low: 5.00
FT SIM scan mass width high: 5.00
Neutral Loss candidates processed by decreasing intensity
Neutral Loss mass width by mass
Neutral Loss mass width low: 0.50000
Neutral Loss mass width high: 0.50000
Product candidates processed by decreasing intensity
Product mass width by mass
Product mass width low: 0.50000
Product mass width high: 0.50000
MS mass range: 0.00-1000000.00
MSn mass range by mass
MSn mass range: 0.00-1000000.00
Use m/z values as masses not enabled
Analog UV data dep. not enabled
Dynamic exclusion enabled
Repeat Count: 2
Repeat Duration: 30.00
Exclusion List Size: 500
Exclusion Duration: 10.00
Exclusion mass width relative to mass
Exclusion mass width relative to low (ppm): 5.000
Exclusion mass width relative to high (ppm): 5.000
Expiration: disabled
Isotopic data dependence not enabled
Custom Data Dependent Settings: Not enabled

Tune File Values
Source Type: NSI
Capillary Temp (C): 150.00
APCI Vaporizer Temp (C): 0.00
Sheath Gas Flow (): 0.00
Aux Gas Flow (): 0.00
Sweep Gas Flow (): 0.00
Injection Waveforms: Off
Ion Trap Zoom AGC Target: 3000.00
Ion Trap Full AGC Target: 30000.00
Ion Trap SIM AGC Target: 10000.00
Ion Trap MSn AGC Target: 10000.00
FTMS Injection Waveforms: Off
FTMS Full AGC Target: 500000.00
FTMS SIM AGC Target: 50000.00
FTMS MSn AGC Target: 200000.00
Reagent Ion Source Polarity: Negative
Reagent Ion Source Temp (C): 160.00
Reagent Ion Source Emission Current (uA): 50.00
Reagent Ion Source Electron Energy (V): -70.00
Reagent Ion Source CI Pressure (psi): 20.00
Reagent Vial 1 Ion Time: 100.00
Reagent Vial 1 AGC Target: 400000.00
Reagent Vial 2 Ion Time: 50.00
Reagent Vial 2 AGC Target: 100000.00
Supplemental Activation Energy: 15.00

POSITIVE POLARITY
Source Voltage (kV): 2.00
Source Current (uA): 100.00
Capillary Voltage (V): 47.00
Tube Lens (V): 100.00
Skimmer Offset (V): 0.00
Multipole RF Amplifier (Vp-p): 400.00
Multipole 00 Offset (V): -5.50
Lens 0 Voltage (V): -6.00
Multipole 0 Offset (V): -5.75
Lens 1 Voltage (V): -10.00
Gate Lens Offset (V): -32.00
Multipole 1 Offset (V): -15.50
Front Lens (V): -6.25
Ion Trap Zoom Micro Scans: 1
Ion Trap Zoom Max Ion Time (ms): 25.00
Ion Trap Full Micro Scans: 1
Ion Trap Full Max Ion Time (ms): 100.00
Ion Trap SIM Micro Scans: 1
Ion Trap SIM Max Ion Time (ms): 25.00
Ion Trap MSn Micro Scans: 1
Ion Trap MSn Max Ion Time (ms): 25.00
FTMS Full Micro Scans: 1
FTMS Full Max Ion Time (ms): 500.00
FTMS SIM Micro Scans: 1
FTMS SIM Max Ion Time (ms): 50.00
FTMS MSn Micro Scans: 1
FTMS MSn Max Ion Time (ms): 100.00
Reagent Ion Lens 1 (V): -20.00
Reagent Ion Gate Lens (V): -120.00
Reagent Ion Lens 2 (V): -15.00
Reagent Ion Lens 3 (V): -15.00
Reagent Ion Back Lens Offset (V): -6.50
Reagent Ion Back Multipole Offset (V): -7.00

Calibration File Values
Multiple RF Frequency: 2522.800000
Main RF Frequency: 1184.500000
QMSlope0: 32.183829
QMSlope1: 32.186907
QMSlope2: 31.931027
QMSlope3: 0.000000
QMSlope4: 0.000000
QMInt0: -33.766301
QMInt1: 0.000000
QMInt2: -31.779899
QMInt3: 0.000000
QMInt4: 0.000000
End Section Slope: 0.000000
End Section Int: 12.000000
PQD CE Factor: 12.615609
IsoW Slope: 0.000419
IsoW Int: 0.146691
Reagent MP Slope: 5.965347
Reagent MP Int: -2.863759
Tickle Amp. Slope0: 0.000054
Tickle Amp. Int0: 0.001125
Tickle Amp. Slope1: 0.002000
Tickle Amp. Int1: 0.400000
Tickle Amp. Slope2: 0.002000
Tickle Amp. Int2: 0.400000
Tickle Amp. Slope3: 0.002000
Tickle Amp. Int3: 0.400000
Multiplier 1 Normal Gain (pos): -990.000000
Multiplier 1 High Gain (pos): -1110.000000
Multiplier 2 Normal Gain (pos): -965.000000
Multiplier 2 High Gain (pos): -1080.000000
Multiplier 1 Normal Gain (neg): -775.000000
Multiplier 1 High Gain (neg): -870.000000
Multiplier 2 Normal Gain (neg): -770.000000
Multiplier 2 High Gain (neg): -860.000000
Normal Res. Eject Slope: 0.011245
Normal Res. Eject Intercept: 7.582153
Zoom Res. Eject Slope: 0.005004
Zoom Res. Eject Intercept: 2.034130
Turbo Res. Eject Slope: 0.069200
Turbo Res. Eject Intercept: 35.000000
AGC Res. Eject Slope: 0.069200
AGC Res. Eject Intercept: 17.300000
UltraZoom Res. Eject Slope: 0.001200
UltraZoom Res. Eject Intercept: 0.642694
Normal Mass Slope: 28.233332
Normal Mass Intercept: -35.169076
Zoom Mass Slope: 26.658070
Zoom Mass Intercept: -41.694562
Turbo Mass Slope: 27.889913
Turbo Mass Intercept: 133.764542
AGC Mass Slope: 27.889913
AGC Mass Intercept: 133.764542
UltraZoom Mass Slope: 26.704217
UltraZoom Mass Intercept: -39.008584
Vernier Fine Mass Slope: 421.986199
Vernier Fine Mass Intercept: 0.000000
Vernier Coarse Mass Slope: 0.000000
Vernier Coarse Mass Intercept: 0.000000
Cap. Device Min (V): -139.809167
Cap. Device Max (V): 139.706461
Tube Lens Device Min (V): 259.660557
Tube Lens Device Max (V): -257.960396
Skimmer Device Min (V): -139.385391
Skimmer Device Max (V): 139.269295
Multipole 00 Device Min (V): -140.659337
Multipole 00 Device Max (V): 140.202135
Lens 0 Device Min (V): -140.349642
Lens 0 Device Max (V): 140.041140
Gate Lens Device Min (V): -136.758808
Gate Lens Device Max (V): 136.192680
Split Gate Device Min (V): 0.133214
Split Gate Device Max (V): 0.149519
Multipole 0 Device Min (V): -139.939776
Multipole 0 Device Max (V): 139.401846
Lens 1 Device Min (V): -140.051652
Lens 1 Device Max (V): 139.488103
Multipole 1 Device Min (V): -140.038065
Multipole 1 Device Max (V): 139.874486
Front Lens Device Min (V): -139.968421
Front Lens Device Max (V): 139.513134
Front Section Device Min (V): -142.778426
Front Section Device Max (V): 142.375895
Center Section Device Min (V): -141.345987
Center Section Device Max (V): 140.935342
Back Section Device Min (V): -141.806801
Back Section Device Max (V): 141.426282
Back Lens Device Min (V): -142.903688
Back Lens Device Max (V): 142.584530
Reagent Lens 1 Device Min (V): -142.051170
Reagent Lens 1 Device Max (V): 141.776483
Reagent Gate Lens Min (V): -132.319901
Reagent Gate Lens Max (V): 132.171194
Reagent Lens 2 Device Min (V): -141.875420
Reagent Lens 2 Device Max (V): 141.580896
Reagent Lens 3 Device Min (V): -143.387012
Reagent Lens 3 Device Max (V): 143.118333
Reagent Electron Lens Device Min (V): -0.255970
Reagent Electron Lens Device Max (V): 150.339771

References

1. Cappellini E., Jensen L.J., Szklarczyk D., Ginolhac A., da Fonseca R.A.R., Stafford T.W., Holen S.R., Collins M.J., Orlando L., Willerslev E., et al. 2012 Proteomic Analysis of a Pleistocene Mammoth Femur Reveals More than One Hundred Ancient Bone Proteins. Journal of Proteome Research 11(2), 917-926. (doi:10.1021/pr200721u).
2. Wadsworth C., Buckley M. 2014 Proteome degradation in fossils: investigating the longevity of protein survival in ancient bone. Rapid Communications in Mass Spectrometry 28(6), 605-615. (doi:10.1002/rcm.6821).
3. Orlando L., Ginolhac A., Zhang G., Froese D., Albrechtsen A., Stiller M., Schubert M., Cappellini E., Petersen B., Moltke I., et al. 2013 Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499(7456), 74-78. (doi:10.1038/nature12323).
4. Asara J.M., Schweitzer M.H., Freimark L.M., Phillips M., Cantley L.C. 2007 Protein Sequences from Mastodon and Tyrannosaurus Rex Revealed by Mass Spectrometry. Science 316(5822), 280-285. (doi:10.1126/science.1137614).
5. Schweitzer M.H., Zheng W., Organ C.L., Avci R., Suo Z., Freimark L.M., Lebleu V.S., Duncan M.B., Vander Heiden M.G., Neveu J.M., et al. 2009 Biomolecular Characterization and Protein Sequences of the Campanian Hadrosaur B. canadensis. Science 324(5927), 626-631. (doi:10.1126/science.1165069).
6. Schweitzer M.H., Zheng W., Cleland T.P., Bern M. 2013 Molecular analyses of dinosaur osteocytes support the presence of endogenous molecules. Bone 52, 414-423. (doi:10.1016/j.bone.2012.10.010).
7. Medzihradszky K.F., Chalkley R.J. 2013 Lessons in de novo peptide sequencing by tandem mass spectrometry. Mass Spectrometry Reviews, n/a-n/a. (doi:10.1002/mas.21406).
8. Goloboff P.A., Farris J.S., Nixon K.C. 2008 TNT, a free program for phylogenetic analysis. Cladistics 24, 774-786.