Here is my CV; Contact: hanbin973@snu.ac.kr
News
About
I hold a double major in medicine and pure mathematics. My work lies at the intersection of causal inference and population/statistical genetics. Recently, I have been working on identification in genetic association studies under various population genetic processes. At the same time, I develop sparsity-aware algorithms for single-cell data that scale easily to millions of cells.
Education
Seoul National University, Republic of Korea
- Doctor of Medicine, Department of Medicine (2016.2 - 2023.8)
- Bachelor of Mathematics, Department of Mathematical Sciences (2017.2 - 2023.8)
Unlike their US counterparts, South Korean medical schools admit students at the undergraduate level. I passed the Korea Medical License Exam. The license will take effect in August 2023.
Professional service
Peer review
International Journal of Epidemiology, Nature Communications
Publications
-
Journals
-
Hanbin Lee,
and Buhm Han
FastRNA: an efficient exact solution for PCA of single-cell RNA sequencing data based on a batch-accounting count model.
American Journal of Human Genetics,
2022
Description: FastRNA produces a batch-corrected low-dimensional representation of single-cell RNA sequencing data. We combine (1) the theory of sufficient statistics and (2) eigendecomposition to derive a completely sparse algorithm for feature selection and principal component analysis (PCA). Its memory consumption and runtime are two to four orders of magnitude smaller than those of competing methods, enabling atlas-scale scRNA-seq data analysis on an ordinary personal computer.
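The sparse-algebra idea can be sketched for a much simpler model than FastRNA's. Assume a single batch and a Poisson model with mean mu_ij = n_i p_j, where n_i is the total count of cell i and p_j is the overall proportion of gene j. The Pearson residual matrix R then satisfies R^T R = P^{-1/2} X^T N^{-1} X P^{-1/2} - T s s^T, with N = diag(n), P = diag(p), s_j = sqrt(p_j) and T the grand total. The gene-by-gene Gram matrix therefore needs only the nonzeros of X. This is a hypothetical illustration: the function names are made up, and it does not reproduce FastRNA's batch-accounting model or its feature-selection step.

```python
import numpy as np
import scipy.sparse as sp


def residual_gram(X):
    """Return R^T R for Poisson Pearson residuals without densifying X.

    Hypothetical simplified model: one batch, mu_ij = n_i * p_j.
    X is a cells-by-genes count matrix with no all-zero rows or columns.
    """
    X = sp.csr_matrix(X, dtype=float)
    n = np.asarray(X.sum(axis=1)).ravel()       # per-cell totals n_i
    T = n.sum()                                  # grand total
    p = np.asarray(X.sum(axis=0)).ravel() / T   # gene proportions p_j
    s = np.sqrt(p)
    # Sparse product X^T diag(1/n) X: cost scales with the nonzeros of X.
    G = (X.T @ sp.diags(1.0 / n) @ X).toarray()
    # Scale by P^{-1/2} on both sides and subtract the rank-one mean term.
    return G / np.outer(s, s) - T * np.outer(s, s), n, s


def sparse_pca(X, k):
    """Top-k PCA of the residual matrix R via eigendecomposition of R^T R."""
    G, n, s = residual_gram(X)
    evals, evecs = np.linalg.eigh(G)   # eigenvalues in ascending order
    V = evecs[:, ::-1][:, :k]          # top-k gene loadings
    X = sp.csr_matrix(X, dtype=float)
    # Cell scores R V = N^{-1/2} X P^{-1/2} V - sqrt(n) (s^T V), again sparse.
    scores = np.asarray(sp.diags(1.0 / np.sqrt(n)) @ (X @ (V / s[:, None])))
    scores -= np.outer(np.sqrt(n), s @ V)
    return scores, V, evals[::-1][:k]
```

Genes and cells with zero total counts must be filtered out first, because the sketch divides by n_i and p_j. The dense residual matrix R is never formed; only a genes-by-genes matrix is.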
-
Hanbin Lee,
and Buhm Han
A theory-based practical solution to correct for sex-differential participation bias.
Genome Biology,
2022
Description: The paper shows that, under standard parametric assumptions, sex-differential participation bias depends largely on the overall participation rate of a genetic cohort. The major findings are that (1) selection bias is highly sensitive to the parametric choice of the participation model and (2) bias due to discrete variables contributing to study participation can be mitigated through stratification when the participation rate is low. As a result, the paper raises concerns about the use of commercial genetic cohorts, in which the target population is poorly specified; this in turn makes the overall participation rate unidentifiable.
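A toy calculation, not the paper's model, shows how the parametric form and the participation rate interact. Assume sex M and a binary factor G are independent in the population and that the participation probability depends on both through f(a + b*M + c*G). Conditioning on participation can then induce an M-G association, measured below by the odds ratio among participants (1 means no induced bias). Under a log-linear participation model the odds ratio is exactly 1. Under a logistic model it generally differs from 1 but approaches 1 as the intercept a, and hence the participation rate, decreases. The function name and parameters are hypothetical.

```python
import math


def participant_odds_ratio(a, b, c, model="logistic"):
    """Odds ratio between sex M and a binary factor G among participants.

    Hypothetical toy model: M and G are independent in the population, and the
    participation probability is f(a + b*M + c*G). The intercept a controls
    the overall participation rate.
    """
    def participate(m, g):
        eta = a + b * m + c * g
        if model == "logistic":
            return 1.0 / (1.0 + math.exp(-eta))
        if model == "loglinear":
            prob = math.exp(eta)
            if prob > 1.0:
                raise ValueError("log-linear probability exceeds 1")
            return prob
        raise ValueError(f"unknown model: {model}")

    # Population frequencies of M and G cancel in the odds ratio, so only the
    # participation probabilities of the four (M, G) cells matter.
    return (participate(1, 1) * participate(0, 0)) / (
        participate(1, 0) * participate(0, 1)
    )
```

For example, with b = c = 1 the logistic odds ratio is about 0.82 at a = 0 and about 0.99 at a = -6, while the log-linear model gives 1 at any admissible intercept.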
-
Chanwoo Kim*, Hanbin Lee*,
Juhee Jeong, Keehoon Jung and Buhm Han
MarcoPolo: a clustering-free approach to the exploration of differentially expressed genes along with group information in single-cell RNA-seq data.
Nucleic Acids Research,
2022
Description: MarcoPolo prioritizes potentially important genes in single-cell RNA sequencing data through a unique ranking system. The method first fits a bimodal Poisson distribution and uses the fitted attributes to rank genes by four distinct metrics. The process does not require prior clustering of cells, so it avoids dependence on clustering results, which are sensitive to program parameters.
*: Equal contribution
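The bimodal fit can be illustrated with a generic expectation-maximization (EM) routine for a two-component Poisson mixture on one gene's counts. This is a sketch under stated assumptions: it ignores cell size factors, the function name is hypothetical, and it does not reproduce MarcoPolo's actual model or its four ranking metrics.

```python
import numpy as np


def fit_poisson_mixture(x, n_iter=500, tol=1e-8):
    """Fit a two-component Poisson mixture to counts x by EM.

    Returns rates (ascending), mixing weights, and per-cell responsibilities.
    """
    x = np.asarray(x, dtype=float)
    lam = np.array([0.5, 1.5]) * x.mean() + 1e-3  # start below/above the mean
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log w_k + x log(lam_k) - lam_k; log(x!) cancels across components.
        logp = np.log(w) + x[:, None] * np.log(lam) - lam
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and mixing weights.
        nk = r.sum(axis=0)
        new_lam = np.maximum((r * x[:, None]).sum(axis=0) / nk, 1e-8)
        w = nk / x.size
        done = np.max(np.abs(new_lam - lam)) < tol
        lam = new_lam
        if done:
            break
    order = np.argsort(lam)
    return lam[order], w[order], r[:, order]
```

In this sketch, cells with high responsibility for the higher-rate component form a putative high-expression group, and no clustering of cells is needed to find it.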
-
Unpublished Work
-
Hanbin Lee* and Moo Hyuk Lee*,
Theoretical Interpretation of Genetic Association Studies in Admixed Populations.
In preparation,
2023
Description: Admixed populations are formed through complicated evolutionary processes that produce heterogeneous linkage disequilibrium patterns. This has hampered the interpretation of genetic association studies in these populations, and various methods tailored to this setting have been developed without clear guidance. In this work, we propose a unifying theory, built on population genetics and causal inference, that gives a straightforward interpretation of GWAS findings in admixed populations. Furthermore, the theory establishes connections between distinct methods for admixed GWAS. Finally, we propose two novel tests, motivated by our theory, that combine association and admixture signals.
-
Hanbin Lee* and Moo Hyuk Lee*,
Disentangling linkage and population structure in association mapping.
In preparation,
2023
Description: A genome-wide association study (GWAS) tests single nucleotide polymorphism (SNP) markers across the genome to localize the underlying causal variants of a trait. Because causal variants are seldom observed directly, a surrogate model based on genotyped markers is widely used. Although many methods for estimating the parameters of the surrogate model have been proposed, the connection between the surrogate model and the true causal model has yet to be investigated. In this work, we establish this connection. It shows that GWAS accounts for population structure by modelling the variant of interest, not the trait. This observation explains how environmental confounding can be partially corrected using genetic covariates, and why the previously claimed connection between PC correction and linear mixed models is incorrect.
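One standard fact in this direction, shown here as an illustration rather than the paper's derivation, is the Frisch-Waugh-Lovell theorem. In a linear regression of the trait on a variant plus principal components (PCs), the variant's coefficient equals the one obtained by residualizing only the variant on the PCs and leaving the trait untouched. The simulated data and function names below are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def variant_effects(g, y, pcs):
    """Compare two ways of adjusting a variant's effect for PCs."""
    C = np.column_stack([np.ones(len(g)), pcs])      # intercept + PCs
    # (a) Usual covariate adjustment: y ~ 1 + g + PCs.
    beta_joint = ols(np.column_stack([C, g]), y)[-1]
    # (b) Model only the variant: residualize g on the PCs, keep y raw.
    g_res = g - C @ ols(C, g)
    beta_variant = (g_res @ y) / (g_res @ g_res)
    return beta_joint, beta_variant


rng = np.random.default_rng(1)
pcs = rng.normal(size=(500, 2))                      # hypothetical PCs
freq = 0.3 + 0.1 * np.tanh(pcs[:, 0])                # allele frequency varies with structure
g = rng.binomial(2, freq)                            # genotype dosages 0/1/2
y = 0.2 * g + pcs @ np.array([1.0, -0.5]) + rng.normal(size=500)
print(variant_effects(g, y, pcs))  # the two numbers agree up to rounding
```

Because g_res is orthogonal to the intercept and PCs, any structure-aligned component of y drops out of g_res @ y automatically, so adjusting the trait itself is unnecessary for the point estimate.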
-
Hanbin Lee,
and Buhm Han
On the cause of inflated type 1 error in single cell DEG.
In preparation,
2023
Description: Earlier studies have found that cell-level tests for detecting differentially expressed genes (DEGs) exhibit inflated type 1 error in multi-subject scRNA-seq data. Pseudoreplication (cells from the same subject are non-independent observations) is thought to be the major cause, which motivates the use of pseudobulk. In this work, we argue that distributional misspecification can be another, if not the primary, cause of the inflated type 1 error in scRNA-seq DEG analysis. We show that the robust sandwich estimator for a distributionally misspecified maximum likelihood estimator (MLE) produces a calibrated type 1 error rate. Using real data, we show that the robust method performs surprisingly well, providing both controlled type 1 error and high statistical power, and is superior to many other methods that explicitly account only for pseudoreplication. Since our suggested solution is already available in existing software, it can be immediately adopted in scRNA-seq analysis ecosystems.
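A sandwich variance for a possibly misspecified Poisson model can be sketched in a few lines. This assumes a Poisson log-linear model for one gene's counts with a design matrix X (for example, an intercept and a condition indicator) and uses the plain (HC0-style) sandwich A^{-1} B A^{-1}, where A is the Fisher information and B is the sum of outer products of per-cell score contributions. The function name is hypothetical, and neither the paper's exact model nor the software it refers to is reproduced here.

```python
import numpy as np


def poisson_glm_sandwich(X, y, n_iter=100, tol=1e-10):
    """Poisson MLE by Newton-Raphson with model-based and sandwich SEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]  # rough start
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (mu[:, None] * X)           # A: Fisher information
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    info_inv = np.linalg.inv(X.T @ (mu[:, None] * X))
    meat = X.T @ (((y - mu) ** 2)[:, None] * X)  # B: empirical score variance
    cov_robust = info_inv @ meat @ info_inv
    return beta, np.sqrt(np.diag(info_inv)), np.sqrt(np.diag(cov_robust))
```

With overdispersed counts (for example, gamma-Poisson data), the sandwich standard error of the condition coefficient comes out larger than the model-based one. The point estimate is unchanged; only the variance, and hence the test statistic, is corrected.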