Elnur Gasanov

I am a second-year PhD student in Optimization and Machine Learning at the Visual Computing Center (VCC) at King Abdullah University of Science and Technology (KAUST), advised by Professor Peter Richtárik. My research interests include Stochastic Optimization, Machine Learning, and Randomized Linear Algebra.

I earned my bachelor's degree in Applied Mathematics and Physics from the Moscow Institute of Physics and Technology.

Email  /  CV  /  LinkedIn  /  GitHub

[profile photo]
News

Talks and poster presentations

  • In July 2021, at the "Optimization without Borders" conference, I presented our poster "FedMix: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning" (poster, photo).
  • In November 2019, at the KAUST-Tsinghua-Industry workshop, I presented a poster based on our NeurIPS 2018 paper (poster here).
  • In June 2019, at DS3, I presented our poster "A New Randomized Method for Solving Large Linear Systems".
  • In June 2019, at LJK, I gave a talk on our new asynchronous delay-tolerant distributed algorithm.
  • In February 2018, at the Optimization and Big Data workshop, our team presented a poster on randomized linear algebra (poster here).

Publications

FedMix: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning
Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control. We identify several key desiderata in frameworks for federated learning and introduce a new framework, FedMix, that takes into account the unique challenges brought by federated learning. FedMix has a standard finite-sum form, which enables practitioners to tap into the immense wealth of existing (potentially non-local) methods for distributed optimization. Through a smart initialization that does not require any communication, FedMix does not require the use of local steps but is still provably capable of performing dissimilarity regularization on par with local methods. We give several algorithms for solving the FedMix formulation efficiently under communication constraints. Finally, we corroborate our theoretical results with extensive experimentation.
Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik
FL-ICML'21 (workshop)
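As a rough illustration of the "standard finite-sum form" and the communication-free initialization mentioned in the abstract above, here is a sketch in my own notation (the personalization weights \alpha_i and the local optima x_i^* are notation I introduce for illustration, not taken from this page):

```latex
% Sketch only: each client i first computes its purely local optimum, with no communication,
%   x_i^* = \operatorname*{arg\,min}_x f_i(x),
% and then all clients jointly solve one ordinary finite-sum problem in x:
\min_{x \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^{n} f_i\!\left( \alpha_i x + (1 - \alpha_i)\, x_i^{*} \right)
% where \alpha_i \in [0, 1] controls personalization: \alpha_i = 0 keeps the purely local
% model x_i^*, while \alpha_i = 1 recovers the usual global objective.
```

Because such a formulation is an ordinary finite-sum problem, any existing distributed (not necessarily local) method can be applied to it, which is the point the abstract makes about reusing non-local methods.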

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization over Time-Varying Networks
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network whose links are allowed to change in time. We solve two fundamental problems for this task. First, we establish the first lower bounds on the number of decentralized communication rounds and the number of local computations required to find an ε-accurate solution. Second, we design two optimal algorithms that attain these lower bounds: (i) a variant of the recently proposed algorithm ADOM (Kovalev et al., 2021) enhanced via a multi-consensus subroutine, which is optimal in the case when access to the dual gradients is assumed, and (ii) a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed. We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
Dmitry Kovalev, Elnur Gasanov, Peter Richtárik, Alexander Gasnikov
NeurIPS 2021
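This is not ADOM or ADOM+; it is only a minimal, hypothetical sketch (with made-up local quadratics and a made-up mixing schedule) of the setting the abstract describes, showing the two resources the lower bounds count: local gradient computations and decentralized communication rounds over a graph that changes with time.

```python
import numpy as np

# Illustrative only (not ADOM+): decentralized gradient steps interleaved with
# gossip averaging over a communication topology that changes every round.
rng = np.random.default_rng(0)
n, d = 10, 5
A = [np.diag(rng.uniform(1.0, 10.0, d)) for _ in range(n)]   # local strongly convex quadratics
b = [rng.standard_normal(d) for _ in range(n)]
x = np.zeros((n, d))                                         # one iterate per node
step = 0.05

def mixing_matrix(t):
    """Symmetric doubly stochastic matrix of a circulant graph whose shift depends on t."""
    W = np.eye(n) / 2
    shift = 1 + (t % 3)                                      # time-varying topology
    for i in range(n):
        W[i, (i + shift) % n] += 0.25
        W[i, (i - shift) % n] += 0.25
    return W

for t in range(500):
    grads = np.stack([A[i] @ x[i] - b[i] for i in range(n)]) # local computations
    x = mixing_matrix(t) @ (x - step * grads)                # one communication round

# With a constant step this simple scheme only reaches a neighborhood of the minimizer
# of the global sum; optimal methods like those in the paper do much better.
x_star = np.linalg.solve(sum(A), sum(b))
print("max distance to the global minimizer:", np.linalg.norm(x - x_star, axis=1).max())
```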

From Local SGD to Local Fixed-Point Methods for Federated Learning
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computations done locally on a mobile device. We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations. In both cases, the goal is to limit communication of the locally-computed variables, which is often the bottleneck in distributed frameworks. We perform convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach.
Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik
ICML 2020
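A toy sketch of the generic pattern the abstract describes, in my own notation (the affine contractions T_i(x) = M_i x + c_i, their sizes, and the schedule are assumptions for illustration, not the paper's method): each device applies its own operator locally for a fixed number of steps, and only the results are averaged at communication time. The local steps introduce a bias, which is why the abstract speaks of "an approximation thereof".

```python
import numpy as np

# Toy illustration: local fixed-point iterations with periodic averaging.
# Device i owns an affine contraction T_i(x) = M_i x + c_i; we look for
# (an approximation of) a fixed point of the average operator (1/n) sum_i T_i.
rng = np.random.default_rng(1)
n, d, local_steps = 5, 4, 10
M = [0.5 * np.eye(d) + 0.05 * rng.standard_normal((d, d)) for _ in range(n)]
c = [rng.standard_normal(d) for _ in range(n)]

x = np.zeros(d)                              # shared point after each communication
for _ in range(50):                          # communication rounds
    local = []
    for i in range(n):                       # runs in parallel on each device
        z = x.copy()
        for _ in range(local_steps):         # local steps, no communication
            z = M[i] @ z + c[i]
        local.append(z)
    x = np.mean(local, axis=0)               # single communication: average the results

# Exact fixed point of the average operator, for comparison: with local_steps > 1
# the iteration above converges to an approximation of it (local steps add a bias).
x_star = np.linalg.solve(np.eye(d) - np.mean(M, axis=0), np.mean(c, axis=0))
print("distance to exact fixed point:", np.linalg.norm(x - x_star))
```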

Stochastic Spectral and Conjugate Descent Methods
The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We also develop and analyze inexact variants of these methods where the spectral and conjugate directions are allowed to be approximate only. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.
Dmitry Kovalev, Eduard Gorbunov, Elnur Gasanov, Peter Richtárik
NeurIPS 2018
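A rough numerical illustration of the augmentation idea only, not the paper's method, sampling distribution, or rates: randomized descent with exact line search on a synthetic ill-conditioned quadratic, where the pool of directions is either the coordinate basis alone or the coordinate basis plus a handful of eigenvectors of A standing in for the spectral directions.

```python
import numpy as np

# Rough illustration: randomized descent along a pool of directions for
# f(x) = 0.5 * x^T A x - b^T x, with exact line search along the sampled direction.
rng = np.random.default_rng(2)
d = 50
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigenvalues = np.concatenate([np.full(5, 0.01), np.linspace(1.0, 10.0, d - 5)])
A = Q @ np.diag(eigenvalues) @ Q.T                 # SPD matrix with 5 tiny eigenvalues
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)

def randomized_descent(directions, iters=3000):
    x = np.zeros(d)
    for _ in range(iters):
        v = directions[rng.integers(len(directions))]
        t = (b - A @ x) @ v / (v @ A @ v)          # exact line search along v
        x = x + t * v
    return np.linalg.norm(x - x_star)

coordinates = list(np.eye(d))                      # plain RCD directions
_, eigvecs = np.linalg.eigh(A)
spectral = [eigvecs[:, k] for k in range(5)]       # eigenvectors of the 5 smallest eigenvalues

print("coordinates only:      ", randomized_descent(coordinates))
print("coordinates + spectral:", randomized_descent(coordinates + spectral))
```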

Creation of approximating scalogram description in a problem of movement prediction [in Russian]
The paper addresses the problem of thumb movement prediction using electrocorticographic (ECoG) activity. The task is to predict thumb positions from the voltage time series of cortical activity. Scalograms are used as input features for this regression problem. The scalograms are generated by the spatio-spectro-temporal integration of voltage time series across multiple cortical areas. To reduce the dimension of the feature space, local approximation is used: every scalogram is approximated by a parametric model. The predictions are obtained with partial least squares regression applied to the local approximation parameters. Local approximation of scalograms does not significantly lower the quality of prediction while it efficiently reduces the dimension of the feature space.
Elnur Gasanov, Anastasia Motrenko
Journal of Machine Learning and Data Analysis (in Russian)
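A hypothetical end-to-end sketch of the kind of pipeline the abstract describes; the random stand-in data, the quadratic-surface parametric model, and all shapes and names here are my own assumptions, not the paper's. Each trial's scalogram is compressed into the coefficients of a low-order polynomial surface fitted by least squares, and partial least squares regression maps those coefficients to the thumb position.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Hypothetical stand-in data: one scalogram (freq x time) per trial, one target per trial.
rng = np.random.default_rng(3)
n_trials, n_freq, n_time = 200, 20, 30
scalograms = rng.random((n_trials, n_freq, n_time))
thumb_position = rng.standard_normal(n_trials)           # target to predict

# Parametric local approximation: fit z(f, t) ~ quadratic surface in (f, t) and keep
# only its 6 coefficients per scalogram instead of n_freq * n_time raw values.
f, t = np.meshgrid(np.linspace(0, 1, n_freq), np.linspace(0, 1, n_time), indexing="ij")
design = np.column_stack([np.ones(f.size), f.ravel(), t.ravel(),
                          f.ravel() ** 2, t.ravel() ** 2, (f * t).ravel()])

def approximate(scalogram):
    coeffs, *_ = np.linalg.lstsq(design, scalogram.ravel(), rcond=None)
    return coeffs                                         # 6-dimensional feature vector

features = np.stack([approximate(s) for s in scalograms])

# Partial least squares regression from the low-dimensional features to the target.
pls = PLSRegression(n_components=3).fit(features[:150], thumb_position[:150])
pred = pls.predict(features[150:]).ravel()

# With random stand-in data the error is only a smoke test of the pipeline shape.
print("test RMSE:", np.sqrt(np.mean((pred - thumb_position[150:]) ** 2)))
```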

This guy makes a cool webpage.