NeurIPS 2021 Lead Author Spotlight
Hassan Mortagy, PhD Operations Research student
Optimization algorithms such as projected Newton’s method, FISTA, mirror descent, and their variants enjoy near-optimal regret bounds and convergence rates, but suffer from the computational bottleneck of computing “projections” in (potentially) each iteration (e.g., the O(T^{1/2}) regret of online mirror descent). On the other hand, conditional-gradient variants solve a linear optimization problem in each iteration, but result in suboptimal rates (e.g., the O(T^{3/4}) regret of online Frank-Wolfe). Motivated by this trade-off between runtime and convergence rates, we consider iterative projections of close-by points over widely prevalent submodular base polytopes B(f). We develop a toolkit to speed up the computation of projections using both discrete and continuous perspectives. We subsequently adapt the away-step Frank-Wolfe algorithm to use this information and enable early termination. For the special case of cardinality-based submodular polytopes, we improve the runtime of computing certain Bregman projections by a factor of Ω(n/log(n)). Preliminary computational experiments support our theoretical results, showing orders-of-magnitude reductions in runtime.
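A key primitive behind the conditional-gradient methods mentioned above is linear optimization over the submodular base polytope B(f), which is solved exactly by Edmonds’ classical greedy algorithm. The sketch below illustrates this for the cardinality-based special case the paper highlights, where f(S) = g(|S|) for a concave g; the function name and the representation of g as an array of prefix values are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def greedy_linmax_cardinality(c, g):
    """Maximize <c, x> over the base polytope B(f) of a cardinality-based
    submodular function f(S) = g(|S|), via Edmonds' greedy algorithm.

    c : array of length n, the linear objective.
    g : array of length n + 1 with g[0] = 0, the concave function defining f.

    Greedy rule: visit coordinates in decreasing order of c and assign each
    the marginal gain g[i] - g[i-1] of being the i-th element added.
    """
    n = len(c)
    order = np.argsort(-np.asarray(c))   # indices of c in decreasing order
    x = np.empty(n)
    for i, j in enumerate(order, start=1):
        x[j] = g[i] - g[i - 1]           # marginal gain of the i-th pick
    return x

# Example: g = [0, 1, 1, ..., 1] makes B(f) the probability simplex,
# so the maximizer puts all mass on the largest coordinate of c.
c = np.array([0.2, 0.9, 0.5])
print(greedy_linmax_cardinality(c, [0, 1, 1, 1]))  # → [0. 1. 0.]
```

This single sort-and-sweep step costs O(n log n), which is why conditional-gradient methods are cheap per iteration; the projections studied in the paper are the more expensive primitive this toolkit accelerates.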
Q&A with Hassan Mortagy
What motivated your work on this paper?
In practice, we constantly encounter a trade-off between the performance of an algorithm and its runtime. It is often the case that algorithms with optimal theoretical performance suffer from computational bottlenecks that make their running times very high and restrictive in practice. However, in some settings optimal performance is desirable and necessary, since it can translate into better revenue, for example. The question is: should we give up on those algorithms despite their optimal performance? In this paper, we argue that the answer is no. In particular, we attempt to eliminate computational bottlenecks within some optimal machine learning and optimization algorithms (such as mirror descent), with the aim of improving their runtimes by orders of magnitude to make these algorithms competitive in practice, while maintaining their optimal performance.
If readers remember one takeaway from the paper, what should it be and why?
In this work, we bridge discrete and continuous optimization insights to eliminate the projection bottlenecks appearing within optimal optimization algorithms such as mirror descent.
Were there any “aha” moments or lessons that you’ll use to inform your future work?
It has to be this: always keep the big picture and practical impact of your work in mind. Also, submitting to NeurIPS is a goal you should have in mind while doing the research, not after the research is done; this allows you to adapt the work to the “NeurIPS way” as you go along.
What are you most excited for at NeurIPS and what do you hope to take away from the experience?
I am really excited to see the latest cutting-edge research, and hope to network and make as many connections as I can.