Hi there! I’m the Head of Machine Learning and Statistics at Sisu Data. We’re working on challenging (and pressing) problems – across high-dimensional statistical estimation, testing, causal inference, time series, machine learning, and optimization – in the analytics space. I’m hiring, so if you find any of these things interesting, feel free to get in touch!

Before starting at Sisu, I was a post-doc at Stanford University, where I worked with John Duchi, Stephen Boyd, and Guenther Walther on problems at the interface of statistics and optimization. I completed my Ph.D. in Machine Learning at Carnegie Mellon University, where I worked with Ryan Tibshirani and Zico Kolter. Before that, I worked at Microsoft and Microsoft Research (on Bing) for several years, doing applied machine learning.


research interests

I'm broadly interested in statistics, machine learning, and optimization. I'm also interested in many application areas, including finance, operations research, public policy, social good, sustainability, epidemiology, healthcare, autonomous vehicles, and analytics more broadly.

A good amount of my work has focused on building reliable and trustworthy machine learning systems, by taking a close look at the (many) statistical and computational issues that arise after a machine learning model has been deployed (i.e., released) into real-world systems and scientific applications. To tackle these issues, my fantastic collaborators and I have developed:

  • methodology and associated theory that tracks deployed model performance, raises an alarm when performance degrades, and generates interpretable diagnostics
  • methodology and theory that repairs and protects any deployed model from future degradations in performance without retraining it
  • reliable algorithms for (re)training models without carefully tuning step sizes
  • statistical theory advancing our understanding of popular optimization heuristics

On a more technical level, I've worked on projects related to the following areas:
  • distribution shift, robust optimization, subpopulation-level performance
  • conformal inference, distribution-free uncertainty quantification, weak supervision
  • large-scale multiple testing
  • risk estimation and model selection
  • tuning parameter-free stochastic optimization
  • sparse regression
  • implicit regularization


publications


    1. The Lifecycle of a Statistical Model: Model Failure Detection, Identification, and Refitting. Alnur Ali, Maxime Cauchois, and John Duchi. [paper]
    2. A Comment and Erratum on “Excess Optimism: How Biased is the Apparent Error of an Estimator Tuned by SURE?” Maxime Cauchois, Alnur Ali, and John Duchi. [paper]
    3. Predictive Inference with Weak Supervision. Maxime Cauchois, Suyash Gupta, Alnur Ali, and John Duchi. [paper]
    4. Accelerated Gradient Flow: Risk, Stability, and Implicit Regularization. Yue Sheng and Alnur Ali. [paper]
    5. Computationally Efficient Posterior Inference With Langevin Monte Carlo and Early Stopping. Dushyant Sahoo, Alnur Ali, and Edgar Dobriban. [paper]
    6. Minimizing Oracle-Structured Composite Functions. Xinyue Shen, Alnur Ali, and Stephen Boyd. Optimization and Engineering (accepted with revisions), 2021. [paper] [code]
    7. Minimum-Distortion Embedding. Akshay Agrawal, Alnur Ali, and Stephen Boyd. Foundations and Trends in Machine Learning, 2021. [paper] [code]
    8. Robust Validation: Confident Predictions Even When Distributions Shift. Maxime Cauchois, Suyash Gupta, Alnur Ali, and John Duchi. Journal of the American Statistical Association (accepted with revisions), 2021. [paper]
    9. Confidence Bands for a Log-Concave Density. Guenther Walther, Alnur Ali, Xinyue Shen, and Stephen Boyd. Journal of Computational and Graphical Statistics (accepted with revisions), 2020. [paper]
    10. The Implicit Regularization of Stochastic Gradient Flow for Least Squares. Alnur Ali, Edgar Dobriban, and Ryan Tibshirani. International Conference on Machine Learning (ICML), 2020. [paper] [blog]
    11. A Continuous-Time View of Early Stopping for Least Squares. Alnur Ali, Zico Kolter, and Ryan Tibshirani. International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. [paper] [blog]
    12. The Generalized Lasso Problem and Uniqueness. Alnur Ali and Ryan Tibshirani. Electronic Journal of Statistics, 2019. [paper]
    13. Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation. Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydin Buluc, Dmitriy Morozov, Leonid Oliker, Katherine Yelick, and Sang-Yun Oh. International Conference on Artificial Intelligence and Statistics (AISTATS), 2018. [paper] [code]
    14. A Semismooth Newton Method for Fast, Generic Convex Programming. Alnur Ali, Eric Wong, and Zico Kolter. International Conference on Machine Learning (ICML), 2017. [paper] [code]
    15. Generalized Pseudolikelihood Methods for Inverse Covariance Estimation. Alnur Ali, Kshitij Khare, Sang-Yun Oh, and Bala Rajaratnam. International Conference on Artificial Intelligence and Statistics (AISTATS), 2017. [paper] [code]
    16. The Multiple Quantile Graphical Model. Alnur Ali, Zico Kolter, and Ryan Tibshirani. Advances in Neural Information Processing Systems 29 (NeurIPS), 2016. [paper]
    17. Disciplined Convex Stochastic Programming: A New Framework for Stochastic Optimization. Alnur Ali, Zico Kolter, Steven Diamond, and Stephen Boyd. Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015. [paper]
    18. Active Learning With Model Selection. Alnur Ali, Rich Caruana, and Ashish Kapoor. AAAI Conference on Artificial Intelligence (AAAI), 2014. [paper]
    19. Experiments With Kemeny Ranking: What Works When? Alnur Ali and Marina Meila. Mathematical Social Sciences, 2012. [paper] [code]
    20. Learning Lexicon Models from Search Logs for Query Expansion. Jianfeng Gao, Xiaodong He, Shasha Xie, and Alnur Ali. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2012. [paper]
    21. Preferences in College Applications: A Nonparametric Bayesian Analysis of Top-10 Rankings. Alnur Ali, Brendan Murphy, Marina Meila, and Harr Chen. Neural Information Processing Systems (NeurIPS) Workshop on Computational Social Science, 2010. [paper] [supplement]