Le Bao

Department of Statistics
Penn State University
514C Wartick Lab
University Park, PA 16802
Telephone: (814) 863-7395
Fax: (814) 865-9443
e-mail: lebao@psu.edu

I am an Associate Professor of Statistics and the associate director of the Center for Advanced Data Assimilation and Predictability Techniques (ADAPT) at The Pennsylvania State University, University Park, right in the middle of Pennsylvania. I received a Ph.D. in Statistics from the University of Washington-Seattle (2011, Advisor: Adrian E. Raftery), an MS in Statistics from Dalhousie University (2005, Advisors: Hong Gu and Joseph Bielawski), and a BS in Financial Mathematics from Peking University. My hometown is Beijing, China.



Research Interests

Refereed Publications [+PhD or Postdoc trainee] [*corresponding author]

  1. Bao, L.*, Niu, X., Mahy M. and Ghys P.D. (2023) Estimating HIV Epidemics for Sub-national Areas. To appear in Annals of Applied Statistics. arXiv:1508.06618.
  2. Laga, I.+, Bao, L., and Niu, X. (2023) A Correlated Network Scale-up Model: Finding the Connection Between Subpopulations. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2023.2165929
  3. Laga, I.+, Niu, X., and Bao, L.* (2023) Mapping the number of female sex workers in countries across sub-Saharan Africa. Proceedings of the National Academy of Sciences. 120 (2) e2200633120. https://doi.org/10.1073/pnas.2200633120
  4. Sheng B.+, Li, C.+, Bao, L.* & Li, R. (2022) Probabilistic HIV Recency Classification -- A Logistic Regression without Labeled Individual Level Training Data. To appear in Annals of Applied Statistics. arXiv:2104.05150.
  5. Bao, L.*, Zhang, Y.+ and Niu, X. (2022) What Can We Learn from the Travelers Data in Detecting Disease Outbreaks--A Case Study of the COVID-19 Epidemic. Annals of Epidemiology. 75: 67-72. https://doi.org/10.1016/j.annepidem.2022.09.005
  6. Parsons, J.+, Niu, X. & Bao, L.* (2022) A Bayesian hierarchical modeling approach to combining multiple data sources: A case study in size estimation. Annals of Applied Statistics. 16(3): 1550-1562.
  7. Li, X.+, Zhang, A.+, Al-Zaidy, R., Baral, S., Bao, L.* and Giles, C. L. (2022) Automating document classification with distant supervision to increase the efficiency of systematic reviews: A case study on identifying studies with HIV impacts on female sex workers. PLOS ONE. doi.org/10.1371/journal.pone.0270034
  8. Bao L., Li, C.+, Li, R., & Yang S.+ (2022) Causal Structural Learning on MPHIA Individual Dataset. Journal of the American Statistical Association. 117.540, 1642-1655; https://doi.org/10.1080/01621459.2022.2077209  
  9. VanEvery H.+, Yang W., Su J., Olsen N., Bao L., Lu B., Wu S., Cui L., Gao X. (2022) Low density lipoprotein cholesterol and risk of rheumatoid arthritis: a prospective study. Nutrients, 14(6), 1240; https://doi.org/10.3390/nu14061240.
  10. Parsons, J.+ and Bao, L.* (2021) A Unified Approach for Outliers and Influential Data Detection – The Value of Information in Retrospect. Stat. doi.org/10.1002/sta4.442
  11. Laga I.+, Bao L., & Niu X. (2021) Thirty Years of The Network Scale-up Method. Journal of the American Statistical Association. 116:535, 1548-1559; https://doi.org/10.1080/01621459.2021.1935267
  12. Laga I.+, Niu X., & Bao L.* (2021) Modeling the Marked Presence-only Data: A Case Study of Estimating the Female Sex Worker Size in Malawi. Journal of the American Statistical Association. 117.537, 27-37; https://doi.org/10.1080/01621459.2021.1944873
  13. VanEvery H.+, Yang W., Olsen N., Bao L., Lu B., Wu S., Cui L., Gao X. (2021) Alcohol consumption and risk of rheumatoid arthritis: a prospective study. Nutrients, 13(7), 2231; https://doi.org/10.3390/nu13072231.
  14. Wu Z., Huang Z., Lichtenstein A., Liu Y., Chen S., Jin Y., Na M., Bao L., Wu S. and Gao X. (2021) The Risk of Ischemic Stroke and Hemorrhagic Stroke in Chinese adults with low density lipoprotein cholesterol concentrations<70 mg/dL. BMC Medicine, 16;19(1):142. doi: 10.1186/s12916-021-02014-4.
  15. Niu, X. M., Rao, A., Chen, D.+, Sheng, B.+, Weir, S., Umar, E., ... & Bao, L.* (2020). Using factor analyses to estimate the number of female sex workers across Malawi from multiple regional sources. Annals of Epidemiology, 55, 34-40. https://doi.org/10.1016/j.annepidem.2020.12.001
  16. Parsons, J.+, Niu, X., & Bao, L.* (2020). Evaluating the relative contribution of data sources in a Bayesian analysis with the application of estimating the size of hard to reach populations. Statistical Communications in Infectious Diseases, 12(s1): 20190020; https://doi.org/10.1515/scid-2019-0020.
  17. Sheng B.+, Eaton J., Mahy M. and Bao L.* (2020). Comparison of HIV Prevalence Among Antenatal Clinic Attendees Estimated from Routine Testing and Unlinked Anonymous Testing, Statistics in Biosciences, 12: 279–294; https://doi.org/10.1007/s12561-020-09265-4
  18. Eaton J., Brown T., Puckett R., Glaubius R., Mutai K., Bao L., Salomon J., Stover J., Mahy M., Hallett T. (2019). The Estimation and Projection Package Age-Sex Model and the r-hybrid model: new tools for estimating HIV incidence trends in sub-Saharan Africa, AIDS. 33: S235–S244.
  19. Datta A., Lin W., Rao A., Diouf D., Edwards J., Bao L., Louis T. and Baral S. (2018). Bayesian estimation of MSM population size in Cote d'Ivoire, Statistics and Public Policy. 6(1): 1–13. doi: 10.1080/2330443X.2018.1546634
  20. Huang S.+, Li J., Wu Y., Ranjbar S., Xing A., Zhao H., Wang Y., Shearer G. C., Bao L., Lichtenstein A. H., Wu S. and Gao X. (2018). Tea consumption and longitudinal change in high-density lipoprotein cholesterol concentration in Chinese adults, Journal of the American Heart Association. 7, 13, e008814.
  21. Cheng F.W.+, Gao X., Bao L., Mitchell D.C., Wood C., Sliwinski M.J., Smiciklas-Wright H., Still C.D., Rolston D.D.K., and Jensen G.L. (2017). Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis. Obesity (Silver Spring). 25(7):1263-1269.
  22. Wu Z., Su X., Sheng H., Chen Y., Gao X., Bao L., Jin W. (2017) Conditional Inference Tree for Multiple Gene-Environment Interactions on Myocardial Infarction Among Chinese Men. Archives of Medical Research. doi.org/10.1016/j.arcmed.2017.12.001
  23. Eaton J. and Bao L. (2017). Accounting for non-sampling error in estimates of HIV epidemic trends from antenatal clinic sentinel surveillance. AIDS 31: S61-S68.
  24. Niu X., Zhang A.+, Brown T., Puckett R., Mahy M., Bao L.* (2017). Incorporation of hierarchical structure into EPP fitting with examples of estimating sub-national HIV/AIDS dynamics. AIDS 31: S51-S59.
  25. Sheng B.+, Marsh K., Slavkovic A.B., Gregson S., Eaton J., Bao L.* (2017). Statistical Models for Incorporating Data from Routine HIV Testing of Pregnant Women at Antenatal Clinics into HIV/AIDS Epidemic Estimates. AIDS 31: S87-S94.
  26. Hunter D.R., Bao L., and Poss M. (2017). Assignment of Endogenous Retrovirus Integration Sites Using a Mixture Model. Annals of Applied Statistics 11(2): 751-770.
  27. Thomas J. and Bao L. (2016). Modeling the dynamics of an HIV epidemic. Dynamic Demographic Analysis. 91-144.
  28. Malhotra, R., Elleder, D., Bao, L., Hunter, D. R., Poss, M., Acharya, R. (2016). A pipeline for identifying integration sites of mobile elements in the genome using next-generation sequencing. Proceedings of the 8th International Conference on Bioinformatics and Computational Biology (BICOB). 63-69.
  29. Li R., Dudek S.M., Kim D., Hall M.A., Bradford Y., Peissig P.L., Brilliant M.H., Linneman J.G., McCarty C.A., Bao L., and Ritchie M.D. (2016) Identification of genetic interaction networks via an evolutionary algorithm evolved Bayesian Network. BioData Mining, 9(18) DOI: 10.1186/s13040-016-0094-4.
  30. Bao L.*, Raftery A.E., Reddy A. (2015) Estimating the sizes of populations at risk of HIV infection from multiple data sources using a Bayesian hierarchical model. Statistics and Its Interface. 8(2): 125–136.
  31. Bao L., Elleder D., Malhotra R., DeGiorgio M., Maravegias T., Horvath L., Carrel L., Gillin C., Hron T., Fabryova H., Hunter D. and Poss M. (2014) Computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non-model organism. Computation. 2: 221-245.
  32. Bao L.*, Ye J., Hallett T.B. (2014) Incorporating incidence information within the UNAIDS estimation and projection Package framework: a study based on simulated incidence assay data. AIDS 28: S515-S522.
  33. Brown T., Bao L., Eaton J.W., Hogan D.R., Mahy M., Marsh K., Mathers B.M., Puckett R. (2014) Improvements in prevalence trend fitting and incidence estimation in EPP 2013. AIDS 28: S415-S425.
  34. Kamath P., Elleder D., Bao L., Cross P., Poss M. (2013) The population history of endogenous retroviral elements in mule deer (Odocoileus hemionus). Journal of Heredity, 105: 173-187.
  35. Bao L. (2012) A new infectious disease model for estimating and projecting HIV/AIDS epidemics. Sexually Transmitted Infections, 88: i58-i65.
  36. Bao L.*, Salomon J.A., Brown T., Raftery A.E., and Hogan D.R. (2012) Modelling national HIV/AIDS epidemics: revised approach in the UNAIDS estimation and projection package 2011. Sexually Transmitted Infections, 88: i3-i10.
  37. Clark S.J., Thomas J., and Bao L. (2012) Estimates of age-specific reductions in HIV Prevalence in Uganda: Bayesian melding estimation and probabilistic population forecast with an HIV-enabled cohort component projection model. Demographic Research 27: 743-774.
  38. Meila M.P. and Bao L. (2010) An exponential model for infinite rankings. Journal of Machine Learning Research, 11: 3481-3518.
    pdf Technical report 529 Technical report 524
  39. Raftery A.E. and Bao L. (2010) Estimating and projecting trends in HIV/AIDS generalized epidemics using incremental mixture importance sampling. Biometrics, 66: 1162-1173.
    pdf Technical report 560
  40. Bao L. and Raftery A.E. (2010) A stochastic infection rate model for estimating and projecting national HIV prevalence rates. Sexually Transmitted Infections. 86: ii93-ii99.
  41. Brown T., Bao L., Raftery A.E., Salomon J.A., Baggaley R.F., Stover J., and Gerland P. (2010) EPP 2009: bringing the UNAIDS estimation and projection package into the ART era. Sexually Transmitted Infections. 86: ii3-ii10.
  42. Bao L., Gneiting T., Grimit E., Guttorp P. and Raftery A.E. (2010) Bias correction and Bayesian model averaging for ensemble forecasts of surface wind direction. Monthly Weather Review. 138:1811-1821.
    pdf Technical report 557
  43. Bao L., Zhu, Z. and Ye, J. (2009) Modeling oncology gene pathways network with multiple genotypes and phenotypes via a copula method. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology. 237-246.
  44. Meila M.P. and Bao L. (2008) Estimation and clustering with infinite rankings. Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence. 393-402.
  45. Bao L., Gu H., Dunn, K.A. and Bielawski J. (2008) Likelihood Based Clustering (LiBaC) for codon models, a method for grouping sites according to similarities in the underlying process of evolution. Molecular Biology and Evolution. 25:1995-2007.
  46. Bao L., Gu H., Dunn K.A. and Bielawski J. (2007) Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data. BMC Evolutionary Biology. 7 Suppl 1:S5.
  47. Mitnitski A, Bao L., and Rockwood K. (2007) A cross-national study of transitions in deficit counts in two birth cohorts: implications for modeling ageing. Experimental Gerontology. 42:241-246.
  48. Mitnitski A, Bao L., and Rockwood K. (2006) Going from bad to worse: a stochastic model of transitions in deficit accumulation, in relation to mortality. Mechanisms of Ageing and Development. 127: 490-493.

Contributed Software
Working Groups

Current students in my research group:

Former students in my research group:


updated on Feb 28 2022