Sanjiv Kumar

PhD (2005; Robotics, SCS, CMU)

Distinguished Research Scientist

Google Research, NY

76 Ninth Ave

New York, NY 10011, USA

email: sanjivk AT google.com


Research Interests

 

Large Scale Machine Learning, Artificial Intelligence, Health AI, Computer Vision, Robotics

 

Teaching

 

EECS6898: Large-Scale Machine Learning, Fall 2010, Columbia University, New York, NY.


Tutorials

 

Approximate Nearest Neighbor Search (Trees and Hashes): Part-I, Part-II.

 

Fast Matrix Decomposition: Part-I, Part-II.

Recent Publications [All Publications]
 

·        A. K. Menon, A. S. Rawat, S. J. Reddi, S. Kim, and S. Kumar

A Statistical Perspective on Distillation

International Conference on Machine Learning (ICML), 2021.

[pdf]


·        A. S. Rawat, A. K. Menon, W. Jitkrittum, S. Jayasumana, F. X. Yu, S. J. Reddi, and S. Kumar

Disentangling Labeling and Sampling Bias for Learning in Large-output Spaces

International Conference on Machine Learning (ICML), 2021.

[pdf]


·        S. J. Reddi, R. K. Pasumarthi, A. K. Menon, A. S. Rawat, F. Yu, S. Kim, A. Veit, and S. Kumar

RankDistil: Knowledge Distillation for Ranking

International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.

[pdf]


·        A. K. Menon, A. S. Rawat, and S. Kumar

Overparameterisation and Worst-case Generalisation: Friend or Foe?

International Conference on Learning Representations (ICLR), 2021.

[pdf]


·        S. J. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečný, S. Kumar, and H. B. McMahan

Adaptive Federated Optimization

International Conference on Learning Representations (ICLR), 2021.

[pdf]


·        A. K. Menon, S. Jayasumana, A. S. Rawat, H. Jain, A. Veit, and S. Kumar

Long-tail Learning via Logit Adjustment

International Conference on Learning Representations (ICLR), 2021.

[pdf]


·        C.-Y. Hsieh, C.-K. Yeh, X. Liu, P. Ravikumar, S. Kim, S. Kumar, and C.-J. Hsieh

Evaluations and Methods for Explanation Through Robustness Analysis

International Conference on Learning Representations (ICLR), 2021.

[pdf]


·        J. Zhang, A. K. Menon, A. Veit, S. Bhojanapalli, S. Kumar, and S. Sra

Coping With Label Shift via Distributionally Robust Optimisation

International Conference on Learning Representations (ICLR), 2021.

[pdf]


·        C. Yun, Y.-W. Chang, S. Bhojanapalli, A. S. Rawat, S. Reddi, and S. Kumar

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

Neural Information Processing Systems (NeurIPS), 2020.

[pdf]


·        J. Zhang, S. P. Karimireddy, A. Veit, S. Kim, S. Reddi, and S. Kumar

Why are Adaptive Methods Good for Attention Models?

Neural Information Processing Systems (NeurIPS), 2020.

[pdf]


·        M. Weber, M. Zaheer, A. S. Rawat, A. K. Menon, and S. Kumar

Robust Large-Margin Learning in Hyperbolic Space

Neural Information Processing Systems (NeurIPS), 2020.

[pdf]


·        Y. Liu, A. T. Suresh, F. Yu, S. Kumar, and M. Riley

Learning Discrete Distributions: User vs Item-Level Privacy

Neural Information Processing Systems (NeurIPS), 2020.

[pdf]


·        H. Chen, S. Si, Y. Li, C. Chelba, S. Kumar, D. Boning, and C.-J. Hsieh

Multi-Stage Influence Function

Neural Information Processing Systems (NeurIPS), 2020.

[pdf]


·        S. Bhojanapalli, C. Yun, A. S. Rawat, S. Reddi, and S. Kumar

Low-Rank Bottleneck in Multi-head Attention Models

International Conference on Machine Learning (ICML), 2020.

[pdf]


·        M. Lukasik, S. Bhojanapalli, A. K. Menon, and S. Kumar

Does Label Smoothing Mitigate Label Noise?

International Conference on Machine Learning (ICML), 2020.

[pdf]


·        R. Guo, P. Sun, E. Lindgren, Q. Geng, D. Simcha, F. Chern, and S. Kumar

Accelerating Large-Scale Inference with Anisotropic Vector Quantization

International Conference on Machine Learning (ICML), 2020.

[pdf]


·        F. X. Yu, A. S. Rawat, A. K. Menon, and S. Kumar

Federated Learning with Only Positive Labels

International Conference on Machine Learning (ICML), 2020.

[pdf]


·        Y. You, J. Li, S. Reddi, J. Hseu, S. Kumar, S. Bhojanapalli, X. Song, J. Demmel, K. Keutzer, and C.-J. Hsieh

Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes

International Conference on Learning Representations (ICLR), 2020.

[pdf]


·        C. Yun, S. Bhojanapalli, A. S. Rawat, S. J. Reddi, and S. Kumar

Are Transformers Universal Approximators of Sequence-to-Sequence Functions?

International Conference on Learning Representations (ICLR), 2020.

[pdf]


·        A. K. Menon, A. S. Rawat, S. J. Reddi, and S. Kumar

Can Gradient Clipping Mitigate Label Noise?

International Conference on Learning Representations (ICLR), 2020.

[pdf]


·        W.-C. Chang, F. Yu, Y.-W. Chang, and S. Kumar

Pre-training Tasks for Embedding-based Large-scale Retrieval

International Conference on Learning Representations (ICLR), 2020.

[pdf]


·        Y. Ruan, Y. Xiong, S. Reddi, S. Kumar, and C.-J. Hsieh

Learning to Learn by Zeroth-Order Oracle

International Conference on Learning Representations (ICLR), 2020.

[pdf]


·        C.-J. Hsieh, Q. Cao, S. Kumar, S. Si, T. Xiao, and X. Liu

How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[pdf]


·        A. K. Menon, A. S. Rawat, S. J. Reddi, and S. Kumar

Multilabel Reductions: What is My Loss Optimising?

Neural Information Processing Systems (NeurIPS), 2019.

[pdf]


·        C. Guo, A. Mousavi, X. Wu, D. Holtmann-Rice, S. Kale, S. Reddi, and S. Kumar

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Neural Information Processing Systems (NeurIPS), 2019.

[pdf]


·        A. S. Rawat, J. Chen, F. Yu, A. T. Suresh, and S. Kumar

Sampled Softmax with Random Fourier Features

Neural Information Processing Systems (NeurIPS), 2019.

[pdf]


·        M. Staib, S. Reddi, S. Kale, S. Kumar, and S. Sra

Escaping Saddle Points with Adaptive Gradient Methods

International Conference on Machine Learning (ICML), 2019.

[pdf]


·        S. Wu, A. G. Dimakis, S. Sanghavi, F. X. Yu, D. Holtmann-Rice, D. Storcheus, A. Rostamizadeh, and S. Kumar

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

International Conference on Machine Learning (ICML), 2019.

[pdf]


·        P.-H. (Patrick) Chen, S. Si, S. Kumar, Y. Li, and C.-J. Hsieh

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

International Conference on Learning Representations (ICLR), 2019.

[pdf]


·        S. Reddi, S. Kale, F. X. Yu, D. Holtmann-Rice, J. Chen, and S. Kumar

Stochastic Negative Mining for Learning with Large Output Spaces

International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

[pdf]


·        Q. Geng, W. Ding, R. Guo, and S. Kumar

Optimal Noise-Adding Mechanism in Additive Differential Privacy

International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

[pdf]


·        S. Reddi, M. Zaheer, D. Sachan, S. Kale, and S. Kumar

Adaptive Methods for Nonconvex Optimization

Neural Information Processing Systems (NIPS), 2018.

[pdf]


·        N. Agarwal, A. T. Suresh, F. X. Yu, S. Kumar, and H. B. McMahan

cpSGD: Communication-efficient and differentially-private distributed SGD

Neural Information Processing Systems (NIPS), 2018.

[pdf]


·        I. E. H. Yen, S. Kale, F. X. Yu, D. Holtmann-Rice, S. Kumar, and P. Ravikumar

Loss Decomposition for Fast Learning in Large Output Spaces

International Conference on Machine Learning (ICML), 2018.

[pdf]


·        S. Reddi, S. Kale, and S. Kumar [Best Paper Award]

On the Convergence of Adam and Beyond

International Conference on Learning Representations (ICLR), 2018.

[pdf]


·        S. Si, S. Kumar, and Y. Li

Nonlinear Online Learning with Adaptive Nystrom Approximation

arXiv:1802.07887, 2018.

[pdf]


·        X. Wu, R. Guo, A. T. Suresh, S. Kumar, D. Holtmann-Rice, D. Simcha, and F. X. Yu

Multiscale Quantization for Fast Similarity Search

Neural Information Processing Systems (NIPS), 2017.

[pdf]


·        B. Dai, R. Guo, S. Kumar, N. He, and L. Song

Stochastic Generative Hashing

International Conference on Machine Learning (ICML), 2017.

[pdf]


·        A. T. Suresh, F. X. Yu, S. Kumar, and H. B. McMahan

Distributed Mean Estimation with Limited Communication

International Conference on Machine Learning (ICML), 2017.

[pdf]


·        X. Zhang, F. X. Yu, S. Kumar, and S. F. Chang

Learning Spread-out Local Feature Descriptors

International Conference on Computer Vision (ICCV), 2017.

[pdf]


·        K. Zhong, R. Guo, S. Kumar, B. Yan, D. Simcha, and I. S. Dhillon

Fast Classification with Binary Prototypes

International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

[pdf]


·        F. X. Yu, A. T. Suresh, K. Choromanski, D. Holtmann-Rice, and S. Kumar

Orthogonal Random Features

Neural Information Processing Systems (NIPS), 2016.

[pdf]


·        A. Choromanska, K. Choromanski, M. Bojarski, T. Jebara, S. Kumar, and Y. LeCun

Binary Embeddings with Structured Hash Projections

International Conference on Machine Learning (ICML), 2016.

[pdf]


·        R. Guo, S. Kumar, K. Choromanski, and D. Simcha

Quantization based Fast Inner Product Search

International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.

[pdf]

Previous arXiv version: arXiv:1509.01469, 2015. [pdf]

 

·        J. Wang, W. Liu, S. Kumar, and S. F. Chang

Learning to Hash for Indexing Big Data - A Survey

Proceedings of the IEEE, 104(1), January 2016.

[pdf]


·        J. Pennington, F. X. Yu, and S. Kumar

Spherical Random Features for Polynomial Kernels

Neural Information Processing Systems (NIPS), 2015.

[pdf]


·        V. Sindhwani, T. Sainath, and S. Kumar

Structured Transforms for Small-Footprint Deep Learning

Neural Information Processing Systems (NIPS), 2015.

[pdf]


·        X. Zhang, F. X. Yu, R. Guo, S. Kumar, S. Wang, and S. F. Chang

Fast Orthogonal Projection Based on Kronecker Product

International Conference on Computer Vision (ICCV), 2015.

[pdf]


·        Y. Cheng, F. X. Yu, R. S. Feris, S. Kumar, A. Choudhary, and S. F. Chang

An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections

International Conference on Computer Vision (ICCV), 2015.

[pdf]

Previous arXiv version: arXiv:1502.03436v1, 2015. [pdf]

 

·        K. Choromanski, S. Kumar, and X. Liu

Fast Online Clustering with Randomized Skeleton Sets

arXiv:1506.03425v1, 2015.

[pdf]

 

·        F. X. Yu, S. Kumar, H. Rowley, and S. F. Chang

Compact Nonlinear Maps and Circulant Extensions

arXiv:1503.03893v1, 2015.

[pdf]

 

·        F. X. Yu, Y. Gong, and S. Kumar

Fast Binary Embedding for High-Dimensional Data

Book chapter in Multimedia Data Mining and Analytics, 2015.

[pdf]

 

·        W. Liu, C. Mu, S. Kumar, and S. F. Chang

Discrete Graph Hashing

Neural Information Processing Systems (NIPS), 2014.

[pdf]

Supplementary material can be found here.

 

·        F. X. Yu, S. Kumar, Y. Gong, and S. F. Chang

Circulant Binary Embedding

International Conference on Machine Learning (ICML), 2014.

[pdf]

MATLAB code can be found here.

 

·        F. X. Yu, D. Liu, S. Kumar, T. Jebara, and S. F. Chang

pSVM for Learning with Label Proportions

International Conference on Machine Learning (ICML), 2013.

[pdf]

The supplementary file with additional proofs and experiments is here.

 

·        Y. Gong, S. Kumar, H. Rowley, and S. Lazebnik

Learning Binary Codes for High-Dimensional Data Using Bilinear Projections

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

[pdf]

 

·        A. Talwalkar, S. Kumar, M. Mohri, and H. Rowley

Large-scale SVD and Manifold Learning

Journal of Machine Learning Research (JMLR), 2013.

[pdf]

 

·        Y. Gong, S. Kumar, V. Verma, and S. Lazebnik

Angular Quantization-based Binary Codes for Fast Similarity Search

Advances in Neural Information Processing Systems (NIPS), 2012.

[pdf]

 

·        J. He, S. Kumar, and S. F. Chang

On the Difficulty of Nearest Neighbor Search

International Conference on Machine Learning (ICML), 2012.

[pdf]

The supplementary file containing all the proofs is here.

NOTE: This is a slightly edited version of what is in the ICML proceedings.

 

·        W. Liu, J. Wang, Y. Mu, S. Kumar, and S. F. Chang

Compact Hyperplane Hashing with Bilinear Functions

International Conference on Machine Learning (ICML), 2012.

[pdf]

The supplementary file containing extended proofs and results is here.

 

·        J. Wang, S. Kumar, and S. F. Chang

Semi-Supervised Hashing for Large Scale Search

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2012.

[pdf]

 

·        S. Kumar, M. Mohri, and A. Talwalkar

Sampling Methods for the Nystrom Method

Journal of Machine Learning Research (JMLR), 2012.

[pdf]

 

·        W. Liu, J. Wang, S. Kumar, and S. F. Chang

Hashing with Graphs

International Conference on Machine Learning (ICML), 2011.

[pdf]

 

·        A. Talwalkar, S. Kumar, M. Mohri, and H. Rowley

Large-Scale Manifold Learning

Book chapter in Manifold Learning Theory and Applications. Editors: Y. Ma and Y. Fu. CRC Press, 2011.

[pdf]

 

·        S. Kumar, M. Mohri, and A. Talwalkar

Ensemble Nystrom

Book chapter in Ensemble Machine Learning: Theory and Applications, Springer, 2011.

[pdf]

Modified to correct an error in the computational complexity analysis. April 2011.

 

·        A. Makadia, V. Pavlovic, and S. Kumar

Baselines for Image Annotation

International Journal of Computer Vision (IJCV), 2010.

[pdf]

 

·        J. Wang, S. Kumar, and S. F. Chang

Sequential Projection Learning for Hashing with Compact Codes

International Conference on Machine Learning (ICML), 2010.

[pdf]

 

·        Z. Wang, M. Zhao, Y. Song, S. Kumar, and B. Li

YouTubeCat: Learning to Categorize Wild Web Videos

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

[pdf]

 

·        J. Wang, S. Kumar, and S. F. Chang

Semi-Supervised Hashing for Scalable Image Retrieval

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

[pdf]

Typo in Eq. (21) corrected, June 2010.

 

·        S. Kumar

Discriminative Graphical Models for Context-Based Classification

Book chapter in Computer Vision: Detection, Recognition and Reconstruction, Springer, 2010.

Eds. R. Cipolla, S. Battiato, and G. M. Farinella.

[pdf]

 

·        S. Kumar, M. Mohri, and A. Talwalkar

Ensemble Nystrom Method

Neural Information Processing Systems (NIPS), 2009.

[pdf]

Modified to correct an error in the computational complexity analysis. April 2011.

 

·        S. Kumar, M. Mohri, and A. Talwalkar

On Sampling-based Approximate Spectral Decomposition

International Conference on Machine Learning (ICML), 2009.

[pdf]

 

·        S. Kumar, M. Mohri, and A. Talwalkar

Sampling Techniques for the Nystrom Method

Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

[pdf]

 

·        A. Makadia, V. Pavlovic, and S. Kumar

A New Baseline for Image Annotation

European Conference on Computer Vision (ECCV), 2008.

[pdf]

 

·        A. Talwalkar, S. Kumar, and H. A. Rowley

Large-Scale Manifold Learning

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

[pdf]

 

·        M. Kim, S. Kumar, V. Pavlovic, and H. A. Rowley

Face Tracking and Recognition with Visual Constraints in Real-World Videos

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

[pdf]

 

·        S. Kumar and H. A. Rowley

Classification of Weakly-Labeled Data with Partial Equivalence Relations

IEEE International Conference on Computer Vision (ICCV), 2007.

[pdf]

Some additional results and parts of the video and retrieval datasets used in this work can be seen here.

 

·        S. Kumar and M. Hebert

Discriminative Random Fields

International Journal of Computer Vision (IJCV), 68(2), 179-201, 2006.

[pdf]

 

·        S. Kumar, J. August, and M. Hebert

Exploiting Inference for Approximate Parameter Learning in Discriminative Fields: An Empirical Study

Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2005.

[pdf]

This paper is an extended and revised version of earlier work presented at the Snowbird Learning Workshop, 2004.

 

·        S. Kumar

Models for Learning Spatial Interactions in Natural Images for Context-Based Classification

PhD Thesis, The Robotics Institute, School of Computer Science, Carnegie Mellon University, September 2005.

[pdf] [ps]

Revised October 2005.

 

·        S. Kumar and M. Hebert

A Hierarchical Field Framework for Unified Context-Based Classification

IEEE International Conference on Computer Vision (ICCV), 2005.

[pdf] [ps]

Revised October 2005.

 

·        C. Rother, S. Kumar, V. Kolmogorov, and A. Blake

Digital Tapestry

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2005.

[pdf]

 

·        S. Kumar and M. Hebert

Approximate Parameter Learning in Discriminative Fields

Snowbird Learning Workshop, Utah, 2004.

[pdf] [ps]

The synthetic dataset used for learning and inference experiments can be obtained from here.

 

·        S. Kumar and M. Hebert

Multiclass Discriminative Fields for Parts-Based Object Detection

Snowbird Learning Workshop, Utah, 2004.

[pdf]

 

·        S. Kumar and M. Hebert

Discriminative Fields for Modeling Spatial Dependencies in Natural Images

Advances in Neural Information Processing Systems 16 (NIPS), 2004.

[pdf] [ps]

The binary denoising synthetic dataset used for training and testing can be obtained from here.

 

·        B. Nabbe, S. Kumar, and M. Hebert

Path Planning with Hallucinated Worlds

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2004.

[pdf]


·        S. Kumar and M. Hebert

Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification

IEEE International Conference on Computer Vision (ICCV), 2003.

[pdf] [ps]

 

·        S. Kumar and M. Hebert

Man-Made Structure Detection in Natural Images using a Causal Multiscale Random Field

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.

[pdf]

Some more example results and comparisons.

The structure detection database used for training and testing can be obtained from here.

 

·        S. Kumar, A. C. Loui, and M. Hebert

An Observation-Constrained Generative Approach for Probabilistic Classification of Image Regions

Image and Vision Computing, 21, pp. 87-97, 2003.

[pdf] 

A shorter version of this paper appeared in the following workshop:

 

·        S. Kumar, A. C. Loui, and M. Hebert

Probabilistic Classification of Image Regions using an Observation-Constrained Generative Approach

ECCV Workshop on Generative Models based Vision (GMBV), 2002.

[pdf]