Efficient interpretable embedding learning for gene expression data with deep neural networks for cancer

Project background
Recently, deep learning (DL) has demonstrated great potential in medical applications such as cardiovascular disease diagnosis, skin disease classification, protein structure prediction, and single-cell-level understanding of gene expression data for diagnosis and prognosis. However, the power of DL has been limited by the fact that gene expression data typically has an extremely large feature space, often tens of thousands of dimensions, while the number of samples is comparatively small owing to constraints on data collection. It is therefore very challenging to learn an efficient representation of gene expression data that is interpretable in terms of biology and clinical implications, especially for cancer subtype identification and precise treatment design.

Project description
A few studies have explored the possibility of applying deep autoencoders to gene expression data, but the interpretability of the learned models still requires further improvement. In this project, we aim to learn an efficient representation of gene expression data for precision medicine in cancer patients by
1. exploring the existing literature and methods for understanding gene expression data
2. incorporating expert knowledge into feature selection and engineering
3. exploring different network structures for the deep autoencoder
4. developing methods to interpret and evaluate the latent dimensions of the learned embedding in the context of cancer mechanisms
Ultimately, we aim to submit a manuscript to a fitting venue (a peer-reviewed, high-impact journal or conference) and to publish the code repository for reproducibility.

Project requirements
A strong background in computer science, engineering, or mathematics, with experience in machine learning or deep learning, is desirable.
A successful candidate should be a confident programmer who is curious about state-of-the-art deep learning advances and their biological applications. Proficiency with Python and one of the machine learning frameworks (TensorFlow or PyTorch) is required. We offer a group of outstanding researchers from biology, bioinformatics, and computer science who work together as a close-knit team. Through this project, the candidate will explore deep learning methods, learn to understand genetic data in the context of cancer research, and develop scientific thinking, writing, and problem-solving skills, and will be well prepared for future career development.

Diversity
Women and people from underrepresented groups are strongly encouraged to apply. We are committed to seeking and providing any support you require to complete the project.