  • Biomarker Shedding at the Institute for Disease Modeling Symposium

    Yuke Wang and I presented our work on biomarker shedding at the Institute for Disease Modeling Symposium hosted by the Bill and Melinda Gates Foundation in Seattle, WA.

  • BST228: Applied Bayesian Analysis

    BST228 Applied Bayesian Analysis is a practical introduction to the Bayesian analysis of biomedical data, taught by Prof Stephenson and Dr Hoffmann in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. It is an intermediate graduate-level course in the philosophy, analytic strategies, implementation, and interpretation of Bayesian data analysis. Specific topics include the Bayesian paradigm; Bayesian analysis of basic models; Markov chain Monte Carlo for posterior inference; the Stan R software package for Bayesian data analysis; linear regression; hierarchical regression models; generalized linear models; meta-analysis; and models for missing data.

  • Scientific Coding and Data Science with Python and Git

    Ariel Vardi, Karen Soenen, and I delivered a two-day workshop, based on the popular Software Carpentries project, for fifty graduate students and research staff at the Woods Hole Oceanographic Institution. Slides from my presentation on using the version control system Git and GitHub for reproducible data science are available here.

  • Approximate Inference for Longitudinal Mechanistic HIV Contact Networks

    Network models are increasingly used to study infectious disease spread. Exponential random graph models have a history in this area, with scalable inference methods now available. An alternative approach uses mechanistic network models, which directly capture individual behaviors and are therefore well suited to studying sexually transmitted diseases. Combining mechanistic models with Approximate Bayesian Computation (ABC) allows flexible modeling using domain-specific interaction rules among agents, avoiding the oversimplifications of other network models. These models are ideal for longitudinal settings because they explicitly incorporate network evolution over time. We implemented a discrete-time version of a previously published continuous-time model of evolving contact networks for men who have sex with men (MSM) and proposed an ABC-based approximate inference scheme for it. As expected, we found that a two-wave longitudinal study design improves the accuracy of inference compared to a cross-sectional design. However, the gains in precision from collecting data twice, up to 18%, depend on the spacing of the two waves and are sensitive to the choice of summary statistics. In addition to methodological developments, our results inform the design of future longitudinal network studies in sexually transmitted diseases, specifically in terms of what data to collect from participants and when to do so.
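
    To make the idea of ABC for a discrete-time mechanistic network model concrete, here is a minimal sketch of rejection ABC. It is not the model or inference scheme from the paper: the tie formation and dissolution rules, the uniform priors, the summary statistics (mean degree per wave and ties persisting between waves), and all function names are hypothetical choices made for illustration.

```python
import math
import random


def simulate_summaries(n_nodes, p_form, p_dissolve, wave_steps, rng):
    """Toy mechanistic model (hypothetical rules): at each discrete step every
    absent tie forms with probability p_form and every existing tie dissolves
    with probability p_dissolve. Returns summary statistics at the waves."""
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    ties = set()
    snapshots = []
    for step in range(1, max(wave_steps) + 1):
        updated = set()
        for pair in pairs:
            if pair in ties:
                if rng.random() >= p_dissolve:  # tie survives this step
                    updated.add(pair)
            elif rng.random() < p_form:  # new tie forms
                updated.add(pair)
        ties = updated
        if step in wave_steps:
            snapshots.append(ties)
    # Mean degree at each wave.
    mean_degree = [2.0 * len(s) / n_nodes for s in snapshots]
    # Ties present in two consecutive waves, on the same per-node scale.
    persistence = [
        2.0 * len(a & b) / n_nodes for a, b in zip(snapshots, snapshots[1:])
    ]
    return mean_degree + persistence


def abc_rejection(observed, n_nodes, wave_steps, n_draws, epsilon, rng):
    """Rejection ABC: keep prior draws whose simulated summaries fall within
    epsilon (Euclidean distance) of the observed summaries."""
    accepted = []
    for _ in range(n_draws):
        p_form = rng.uniform(0.0, 0.2)      # assumed prior
        p_dissolve = rng.uniform(0.0, 0.5)  # assumed prior
        simulated = simulate_summaries(n_nodes, p_form, p_dissolve, wave_steps, rng)
        if math.dist(simulated, observed) <= epsilon:
            accepted.append((p_form, p_dissolve))
    return accepted


rng = random.Random(0)
two_waves = (6, 10)  # a cross-sectional design would use a single wave, e.g. (10,)
observed = simulate_summaries(20, 0.05, 0.2, two_waves, rng)
posterior_draws = abc_rejection(observed, 20, two_waves, 300, 1.0, rng)
```

    Passing a single wave, such as `(10,)`, gives the cross-sectional analogue, so the two designs can be compared by inspecting the spread of `posterior_draws` under each.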

  • Network Layout Algorithm With Covariate Smoothing

    Network science explores intricate connections among objects and is employed in diverse domains such as social interactions, fraud detection, and disease spread. Visualization of networks facilitates conceptualizing research questions and forming scientific hypotheses. Networks, as high-dimensional mathematical objects, require dimensionality reduction for (planar) visualization. Visualizing empirical networks presents additional challenges. They often contain false positive (spurious) and false negative (missing) edges. Traditional visualization methods don’t account for errors in observation, potentially biasing interpretations. Moreover, contemporary network data includes rich nodal attributes, yet traditional methods neglect these attributes when computing node locations. Our visualization approach leverages the richness of nodal attributes to compensate for the limitations of network data. We employ a statistical model that estimates the probability of an edge between two nodes based on their covariates. We enhance the Fruchterman-Reingold algorithm to incorporate estimated dyad connection probabilities, allowing practitioners to balance reliance on observed versus estimated edges. We explore optimal smoothing levels, offering a natural way to include relevant nodal information in layouts. Results demonstrate the effectiveness of our method in achieving robust network visualization, providing insights for improved analysis.
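
    One way to picture the balance between observed and estimated edges is a force-directed layout whose attractive forces use blended dyad weights. The sketch below is an assumption-laden illustration, not the published algorithm: the convex blend `(1 - alpha) * observed + alpha * probability`, the logistic stand-in for the fitted statistical model, and the names `blended_weights`, `fr_layout`, and `alpha` are all hypothetical.

```python
import math
import random


def blended_weights(edges, probs, n_nodes, alpha):
    """Blend observed edges (0/1) with estimated connection probabilities.
    alpha = 0 uses only observed edges; alpha = 1 uses only the estimates."""
    observed = {frozenset(e) for e in edges}
    weights = {}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            a = 1.0 if frozenset((i, j)) in observed else 0.0
            weights[(i, j)] = (1.0 - alpha) * a + alpha * probs[(i, j)]
    return weights


def fr_layout(weights, n_nodes, iterations=200, seed=0):
    """Fruchterman-Reingold-style layout. `weights` must contain every dyad:
    all pairs repel with force k^2 / d, and each pair attracts with force
    w * d^2 / k, where w is the blended weight."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n_nodes)]
    k = math.sqrt(1.0 / n_nodes)  # ideal pairwise distance on a unit square
    temperature = 0.1
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n_nodes)]
        for (i, j), w in weights.items():
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = k * k / dist - w * dist * dist / k  # repulsion minus attraction
            fx, fy = force * dx / dist, force * dy / dist
            disp[i][0] += fx
            disp[i][1] += fy
            disp[j][0] -= fx
            disp[j][1] -= fy
        for v in range(n_nodes):
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            step = min(length, temperature)  # cap movement by the temperature
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.98  # cool down
    return pos


# Hypothetical data: one nodal covariate and a logistic stand-in for a fitted model.
covariate = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
n = len(covariate)
probs = {
    (i, j): 1.0 / (1.0 + math.exp(4.0 * abs(covariate[i] - covariate[j]) - 2.0))
    for i in range(n)
    for j in range(i + 1, n)
}
layout = fr_layout(blended_weights(edges, probs, n, alpha=0.5), n)
```

    Varying `alpha` between 0 and 1 in this sketch moves the layout from one driven purely by observed edges to one driven purely by covariate-based estimates.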