Thoughts on doing a PhD in optimization
I took a lot of pleasure in working on optimization during my PhD. I believe that optimization as a field will remain relevant, while the specific ideas that are popular will keep changing. The field has been constantly evolving over many decades, and it seems that the pace of change has accelerated recently. Nowadays, I rarely hear about primal-dual methods, mirror descent, and variance reduction, despite these ideas being very popular just a few years ago (and I worked on them too), not to mention topics that were popular before them.
Noticing that certain topics are falling out of favor, one might jump to the conclusion that the whole field is dying. This is where I think a quote from Michael Jordan’s 2018 talk is appropriate:
Optimization <…> was a topic that, when I was a graduate student, was viewed as dead, 70s and 80s. Dead in the sense ‘all hard problems have been solved and people are moving on to other things.’ There can be nothing that’s less true now.
I’ll add to this that it’s very unlikely optimization will be considered dead again any time soon. This resilience comes from the field’s persistent core challenges: we need fast solvers, many problems are non-smooth and non-convex, and problem sizes only keep growing. The approaches to these challenges, however, will keep changing, so I recommend keeping an eye on the literature and looking for cool ideas that appeal to you.
My research directions have changed over time as well. When I was starting my PhD, I was skeptical about studying standard methods such as SGD: I wrongly thought the theory was already mature, and I doubted my ability to contribute meaningfully. I also remember telling another PhD student, Filip Hanzely, in 2017 that I didn’t believe there would be interesting problems left in optimization by 2022. Boy, was I wrong. Surprisingly, the more work I’ve done, the less I feel that I understand how stochastic gradient descent works. For example, the crazy 2023 results by Grimmer et al. showed me that I don’t understand all the nuances of gradient descent even without noise. Adaptive methods are even more of a mystery, with the best method for deep learning, Adam, having no satisfying theory. I believe I’m not alone in hoping that one day there will be an optimizer better than Adam with some theory to back it up, but there is no road map to get there.
In conclusion, I’m quite optimistic about the field in general, and if I were choosing a PhD topic today, I’d still seriously consider optimization. I enjoyed working on its theory, there is a lot of good math out there, and you can learn a lot about computation and linear algebra by running experiments and checking your intuition. If you do choose it, I highly recommend learning JAX and checking numerically which methods are useful and easy to use. There is often a gap between the best method in theory and the ones practitioners actually use, and it’s important to be aware of it. I do not mean to say that the theoretical methods are not useful – I find theory to be the best guide when exploring ideas – but I also find it important to then re-evaluate my intuition with numerical experiments, which quite often changes my perspective on which ideas are worth pushing further.
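To make the “check numerically” advice concrete, here is a minimal JAX sketch on a made-up least-squares toy problem (the matrix sizes and step sizes are purely illustrative, not from any paper) comparing the textbook 1/L step size of gradient descent with a more aggressive choice:

```python
import jax
import jax.numpy as jnp

# Toy least-squares problem (sizes are arbitrary, chosen for illustration).
key_A, key_b = jax.random.split(jax.random.PRNGKey(0))
A = jax.random.normal(key_A, (50, 10))
b = jax.random.normal(key_b, (50,))

def loss(x):
    residual = A @ x - b
    return 0.5 * jnp.mean(residual ** 2)

grad = jax.jit(jax.grad(loss))

# Smoothness constant L = largest eigenvalue of the Hessian A^T A / n;
# classical theory prescribes a step size of 1/L for gradient descent.
L = jnp.linalg.eigvalsh(A.T @ A / A.shape[0]).max()

def run_gd(step_size, num_steps=500):
    x = jnp.zeros(A.shape[1])
    for _ in range(num_steps):
        x = x - step_size * grad(x)
    return loss(x)

print("loss with step 1/L:  ", run_gd(1.0 / L))
print("loss with step 1.9/L:", run_gd(1.9 / L))  # still below the 2/L stability limit
```

Running small experiments like this is often the quickest way to see how far the constants suggested by theory are from the step sizes that work best in practice, and it takes only a few minutes in JAX.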
There are a few other factors to consider before deciding what to work on during your PhD. It is generally easier to find research positions inside academia for theoretical topics. If, however, you want to eventually work in industry, I recommend staying close to applications. And remember, choosing a good advisor can be as crucial as selecting your research topic. Always reach out to the former students of a potential advisor to ask about their experience. If you hear that they were happy working with their advisor, that is often a much better signal than the person’s citation or publication count.