Here is my CV; Contact: hanbin973@snu.ac.kr
News
About
I hold a double major in medicine and pure mathematics. My work lies at the intersection of causal inference and population/statistical genetics. Recently, I have been working on identification in genetic association studies under various population genetic processes. At the same time, I develop sparsity-aware algorithms for single-cell data that scale easily to millions of cells.
Education
Seoul National University, Republic of Korea
- Doctor of Medicine, Department of Medicine (2016.2 - 2023.8)
- Bachelor of Mathematics, Department of Mathematical Sciences (2017.2 - 2023.8)
South Korean medical schools are undergraduate-based, unlike their US counterparts. I passed the Korea Medical License Exam. The license will be in effect from August 2023.
Professional service
Peer review
International Journal of Epidemiology
Publications
-
Journals
-
Hanbin Lee,
and Buhm Han
FastRNA: an efficient exact solution for PCA of single-cell RNA sequencing data based on a batch-accounting count model.
American Journal of Human Genetics,
2022
Description: FastRNA produces a batch-corrected low-dimensional representation of single-cell RNA sequencing data. We combine (1) the theory of sufficient statistics and (2) eigendecomposition to derive a completely sparse algorithm for feature selection and principal component analysis (PCA). The memory consumption and runtime are two to four orders of magnitude smaller than those of competing methods, enabling atlas-scale scRNA-seq data analysis on an ordinary personal computer.
-
Hanbin Lee,
and Buhm Han
A theory-based practical solution to correct for sex-differential participation bias.
Genome Biology,
2022
Description: The paper shows that sex-differential participation bias depends largely on the overall participation rate of a genetic cohort under standard parametric assumptions. The major findings are that (1) selection bias is highly sensitive to the parametric choice of the participation model and (2) bias due to discrete variables that contribute to study participation can be mitigated through stratification when the participation rate is low. As a result, the paper raises concerns about the use of commercial genetic cohorts in which the target population is poorly specified, which in turn makes the overall participation rate unidentifiable.
-
Chanwoo Kim*, Hanbin Lee*,
Juhee Jeong, Keehoon Jung and Buhm Han
MarcoPolo: a clustering-free approach to the exploration of differentially expressed genes along with group information in single-cell RNA-seq data.
Nucleic Acids Research,
2022
Description: MarcoPolo prioritizes potentially important genes in single-cell RNA sequencing data through a unique ranking system. The method first fits a bimodal Poisson distribution and then uses the fitted attributes to generate gene ranks using four distinct metrics. The process does not require prior clustering of cells, so it is robust to clustering results, which are sensitive to program parameters.
*: Equal contribution
-
Unpublished Work
-
Hanbin Lee* and Moo Hyuk Lee*,
Theoretical Interpretation of Genetic Association Studies in Admixed Populations.
In preparation,
2023
Description: Admixed populations are formed through complicated evolutionary processes that result in heterogeneous linkage disequilibrium patterns. This has hampered the interpretation of genetic association studies and has led to the development of various methods tailored to this purpose, without clear guidance on their use. In this work, we propose a unifying theory built upon population genetics and causal inference that brings a straightforward interpretation of GWAS findings in admixed populations. Furthermore, the theory establishes the connection between distinct methods in admixed GWAS. Finally, we propose two novel tests motivated by our theory that combine association and admixture signals.
-
Hanbin Lee* and Moo Hyuk Lee*,
Disentangling linkage and population structure in association mapping.
In preparation,
2023
Description: A genome-wide association study (GWAS) tests single nucleotide polymorphism (SNP) markers across the genome to localize the underlying causal variant of a trait. Because causal variants are seldom observed directly, a surrogate model based on genotyped markers is widely used. Although many methods for estimating the parameters of the surrogate model have been proposed, the connection between the surrogate model and the true causal model has yet to be investigated. In this work, we establish that connection. Our theory shows the importance of the underlying demographic model in understanding the signal in GWAS.
-
Hanbin Lee,
and Buhm Han
Why do cell-level test methods for detecting differentially expressed genes fail in single-cell RNA-seq data?
In preparation,
2023
Description: Earlier studies have found that cell-level tests for detecting differentially expressed genes (DEGs) exhibit inflated type 1 error in multi-subject scRNA-seq data. Pseudoreplication (cells from the same subject are non-independent observations) is thought to be the major cause, which motivates the use of pseudobulk. In this work, we argue that distributional misspecification, rather than pseudoreplication, might be a major cause of the inflated type 1 error in scRNA-seq DEG analysis. We propose a simple modification to existing cell-level test methods that makes them robust against distributional misspecification. Using real data, we show that robust generalized linear models (GLMs) that are free of distributional constraints perform well, providing both controlled type 1 error and high statistical power. Since the solution is widely available in existing software, it can be adopted immediately in scRNA-seq analysis ecosystems.