Updated on May 7th, 2024
About
Any statistical or computational method operates under the constraints imposed by the data-generating process. To put it another way, no method can extract information from data beyond what is given by nature. Based on this insight, I study how evolutionary processes shape statistical inference.
I’m currently working on the following topics:
- Graphical algorithms and tree sequences for scalable statistical genetics
- Phylogenetic constraints of molecular language models
Contact: hblee@umich.edu
Education
University of Michigan, Ann Arbor
- Doctor of Philosophy, Department of Statistics (2024.9 -)
Seoul National University
- Doctor of Medicine, Department of Medicine (2016.3 - 2023.8)
- Bachelor of Mathematics, Department of Mathematical Sciences (2017.3 - 2023.8)
Professional service
Peer review
International Journal of Epidemiology, Nature Communications
Publications
-
*: Equal contribution
-
Journals
-
Hanbin Lee,
and Buhm Han
Pseudobulk with proper offsets has the same statistical properties as generalized linear mixed models in single-cell case-control studies.
Bioinformatics,
2024
Description: By adjusting the offset of a pseudobulk method, one can produce results that are identical to those of generalized linear mixed models in single-cell case-control studies. Hence, the massive computational burden and convergence problems of mixed models can be avoided by using pseudobulk instead, without affecting the result.
-
Hanbin Lee,
and Buhm Han
FastRNA: an efficient exact solution for PCA of single-cell RNA sequencing data based on a batch-accounting count model.
American Journal of Human Genetics,
2022
Description: FastRNA produces a batch-corrected low-dimensional representation of single-cell RNA sequencing data. We combine (1) the theory of sufficient statistics and (2) eigendecomposition to derive a completely sparse algorithm for feature selection and principal component analysis (PCA). The memory consumption and runtime are two to four orders of magnitude smaller than those of competing methods, enabling atlas-scale scRNA-seq data analysis on an ordinary personal computer.
-
Hanbin Lee,
and Buhm Han
A theory-based practical solution to correct for sex-differential participation bias.
Genome Biology,
2022
Description: The paper shows that sex-differential participation bias depends largely on the overall participation rate of a genetic cohort under standard parametric assumptions. The major findings are that (1) selection bias is highly sensitive to the parametric choice of the participation model and (2) bias due to discrete variables contributing to study participation can be mitigated through stratification when the participation rate is low. As a result, it raises concerns about the use of commercial genetic cohorts in which the target population is poorly specified, which in turn makes the overall participation rate unidentifiable.
-
Chanwoo Kim*, Hanbin Lee*,
Juhee Jeong, Keehoon Jung and Buhm Han
MarcoPolo: a clustering-free approach to the exploration of differentially expressed genes along with group information in single-cell RNA-seq data.
Nucleic Acids Research,
2022
Description: MarcoPolo prioritizes potentially important genes in single-cell RNA sequencing data through a unique ranking system. The method first fits a bimodal Poisson distribution and uses the fitted attributes to rank genes by four distinct metrics. The process does not require prior clustering of cells, so it is unaffected by clustering results, which are sensitive to program parameters.