Analysis of human upstream open reading frames and impact on gene expression

Yuhua Ye^a, Yidan Liang^a, Qiuxia Yu^a, Lingling Hu^a, Haoli Li^a, Zhenhai Zhang^b,c,d* & Xiangmin Xu^a*

a Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China.
b State Key Laboratory of Organ Failure Research, c National Clinical Research Center for Kidney Disease, and d Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, 510515, China.

Corresponding authors:
Xiangmin Xu, Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, Guangdong, P.R. China. Tel: +86-020-61648293. Fax: +86-020-87278766. E-mail: gzxuxm@pub.guangzhou.gd.cn
Zhenhai Zhang, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, Guangdong, P.R. China. E-mail: zhenhaismu@163.com

Table S1. Correspondence between the substitute numbers and the tissues in Fig. 2a.

Substitute Number	Tissue
1	Adult Adrenal
2	Adult Colon
3	Adult Esophagus
4	Adult Frontal Cortex
5	Adult Gallbladder
6	Adult Heart
7	Adult Kidney
8	Adult Liver
9	Adult Lung
10	Adult Ovary
11	Adult Pancreas
12	Adult Prostate
13	Adult Rectum
14	Adult Retina
15	Adult Spinal Cord
16	Adult Testis
17	Adult Urinary Bladder
18	B Cells
19	CD4 Cells
20	CD8 Cells
21	Fetal Brain
22	Fetal Gut
23	Fetal Heart
24	Fetal Liver
25	Fetal Ovary
26	Fetal Testis
27	Monocytes
28	NK Cells
29	Placenta
30	Platelets

[Figure S1: stacked bar chart. Vertical axis: 0% to 100%. Horizontal axis: ClinVar, COSMIC, TCGA_37, TCGA_36. Legend: Minus_identical, Minus_complement, Plus_identical.]

Figure S1. Consistency between the reference nucleotide of each reported variation and the nucleotide at its corresponding position in the human genome hg19. Blue bars indicate that the reference nucleotide and the nucleotide at the corresponding hg19 position are identical, given that the strand is a plus strand.
Red bars indicate that the reference nucleotide and the nucleotide at the corresponding hg19 position are base-complementary, given that the strand is a minus strand. Because the variant records in COSMIC are already divided into plus and minus strands according to RefGene, the reference nucleotide and the corresponding hg19 position are identical for 97.21% of its entries.

Table S2. GO terms in which uORF genes are enriched.

GO term ID	GO term	p value
0005882	intermediate filament	3.75128E-11
0045095	keratin filament	5.9813E-11
0044822	poly(A) RNA binding	8.09032E-09
0005198	structural molecule activity	9.73258E-09
0050789	regulation of biological process	1.58522E-08
0003700	sequence-specific DNA binding transcription factor activity	2.80049E-08
0006355	regulation of transcription, DNA-templated	3.21538E-08
0001071	nucleic acid binding transcription factor activity	3.5957E-08
0065007	biological regulation	3.97601E-08
0005576	extracellular region	5.22067E-08
0003723	RNA binding	1.26541E-07
0035556	intracellular signal transduction	1.53163E-07
0030529	ribonucleoprotein complex	1.88428E-07
0046872	metal ion binding	2.1263E-06
0070062	extracellular vesicular exosome	2.45475E-06
0065010	extracellular membrane-bounded organelle	2.45475E-06
0043169	cation binding	2.5239E-06
0005634	nucleus	1.13403E-05
0050794	regulation of cellular process	1.41408E-05
0007165	signal transduction	2.09173E-05
0003677	DNA binding	2.50662E-05
0006351	transcription, DNA-templated	8.60893E-05
0043167	ion binding	0.000123429
0032991	macromolecular complex	0.000165091
0042995	cell projection	0.000183242
0009966	regulation of signal transduction	0.00021263
0045893	positive regulation of transcription, DNA-templated	0.000264491
0010646	regulation of cell communication	0.000276592
0044267	cellular protein metabolic process	0.000299065
0003735	structural constituent of ribosome	0.000342283
0006412	translation	0.000376857
0010468	regulation of gene expression	0.000503052
0044421	extracellular region part	0.000591215
0000981	sequence-specific DNA binding RNA polymerase II transcription factor activity	0.000722597
0045047	protein targeting to ER	0.000888552
0006613	cotranslational protein targeting to membrane	0.000888552
0031982	vesicle	0.001120198
0031988	membrane-bounded vesicle	0.001424449
0006414	translational elongation	0.001690669
2001141	regulation of RNA biosynthetic process	0.001745987
0044391	ribosomal subunit	0.001797848
0006614	SRP-dependent cotranslational protein targeting to membrane	0.001988005
0060255	regulation of macromolecule metabolic process	0.002039866
0006357	regulation of transcription from RNA polymerase II promoter	0.002351032
0019222	regulation of metabolic process	0.002506615
0005840	ribosome	0.002679485
0006413	translational initiation	0.003232669
0000139	Golgi membrane	0.003820427
0036211	protein modification process	0.00432175
0006464	cellular protein modification process	0.00432175
0005615	extracellular space	0.004529194
0005886	plasma membrane	0.008799083
2000112	regulation of cellular macromolecule biosynthetic process	0.016180632
0016020	membrane	0.01884283
0060089	molecular transducer activity	0.02178162
0004871	signal transducer activity	0.02178162
0032774	RNA biosynthetic process	0.02195449
0019083	viral transcription	0.03128947
0044767	single-organism developmental process	0.04771212
0016021	integral component of membrane	0.04909508
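
Note on Figure S1. The strand-aware consistency check summarized in Figure S1 can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names `classify_reference` and `summarize` are hypothetical, the reading of the three legend labels (identical on a plus-strand gene, identical on a minus-strand gene, complementary on a minus-strand gene) is an assumption drawn from the caption, and the `Inconsistent` label is added here only to catch records that fit none of them. The hg19 base is assumed to be given on the plus strand.

```python
from collections import Counter

# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_reference(ref_base, hg19_base, strand):
    """Label one variant record by how its reference base relates to hg19.

    ref_base  : reference nucleotide reported in the variant record
    hg19_base : plus-strand nucleotide at the same position in hg19
    strand    : "+" or "-", the strand of the annotated gene
    The returned labels mirror the Figure S1 legend (assumed meaning);
    "Inconsistent" is a hypothetical catch-all not shown in the figure.
    """
    ref = ref_base.upper()
    hg19 = hg19_base.upper()
    if ref == hg19:
        return "Plus_identical" if strand == "+" else "Minus_identical"
    if strand == "-" and COMPLEMENT.get(ref) == hg19:
        return "Minus_complement"
    return "Inconsistent"


def summarize(records):
    """Return the fraction of records in each category.

    records is an iterable of (ref_base, hg19_base, strand) tuples.
    An empty input yields an empty dict.
    """
    counts = Counter(classify_reference(*rec) for rec in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Calling `summarize` on the records of one database would give the per-category fractions that a stacked bar for that database displays, under the assumptions stated above.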