Conjugate Directions for Stochastic Gradient Descent

N. N. Schraudolph and T. Graepel. Conjugate Directions for Stochastic Gradient Descent. In Proc. Intl. Conf. Artificial Neural Networks (ICANN), Madrid, Spain, Lecture Notes in Computer Science 2415, pp. 1351–1356, Springer Verlag, Berlin, 2002.
Latest version · Related paper

Download

pdf (202.7 kB) · djvu (63.8 kB) · ps.gz (70.8 kB)

Abstract

The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
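The sketch below (JAX, not taken from the paper) illustrates the core idea stated in the abstract: on each mini-batch, cheap Hessian-vector products build a small Krylov subspace {g, Hg, H²g, ...}, and the update minimizes a damped quadratic model of the mini-batch loss within that subspace. All names (loss_fn, params, batch, dim, damping) are illustrative placeholders, and params is assumed to be a flat parameter vector rather than whatever structure the paper's experiments use.

import jax
import jax.numpy as jnp

def hvp(loss_fn, params, batch, v):
    # Fast Hessian-vector product H v on one mini-batch, computed by
    # forward-over-reverse differentiation (no explicit Hessian is formed).
    grad_fn = lambda p: jax.grad(loss_fn)(p, batch)
    return jax.jvp(grad_fn, (params,), (v,))[1]

def krylov_step(loss_fn, params, batch, dim=3, damping=1e-4):
    # One update: build an orthonormal basis of the Krylov subspace
    # span{g, Hg, ..., H^(dim-1) g} for this mini-batch, then minimize the
    # damped quadratic model of the mini-batch loss within that subspace.
    g = jax.grad(loss_fn)(params, batch)
    basis, hvs = [], []
    v = g
    for _ in range(dim):
        # Gram-Schmidt: orthogonalize the new direction against the basis.
        for b in basis:
            v = v - jnp.vdot(b, v) * b
        norm = jnp.linalg.norm(v)
        if norm < 1e-10:          # subspace has stopped growing
            break
        v = v / norm
        hv = hvp(loss_fn, params, batch, v)
        basis.append(v)
        hvs.append(hv)
        v = hv                    # candidate for the next Krylov direction
    B = jnp.stack(basis)          # (k, n): orthonormal basis vectors as rows
    HB = jnp.stack(hvs)           # (k, n): rows are H b_i
    H_sub = B @ HB.T              # (k, k): curvature projected into the subspace
    g_sub = B @ g                 # (k,):   gradient projected into the subspace
    coef = jnp.linalg.solve(H_sub + damping * jnp.eye(len(basis)), g_sub)
    return params - B.T @ coef    # map the subspace step back to parameter space

A training loop would simply call params = krylov_step(loss_fn, params, batch) on each mini-batch. With dim = 1 and small damping the update reduces to a gradient step scaled by the curvature along g; the paper's interest lies in what larger subspaces buy over ordinary stochastic gradient descent.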

BibTeX Entry

@inproceedings{SchGra02,
     author = {Nicol N. Schraudolph and Thore Graepel},
      title = {\href{http://nic.schraudolph.org/pubs/SchGra02.pdf}{
               Conjugate Directions for Stochastic Gradient Descent}},
      pages = {1351--1356},
     editor = {Jos\'e R. Dorronsoro},
  booktitle = {Proc.\ Intl.\ Conf.\ Artificial Neural Networks (ICANN)},
    address = {Madrid, Spain},
     volume =  2415,
     series = {\href{http://www.springer.de/comp/lncs/}{
               Lecture Notes in Computer Science}},
  publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
       year =  2002,
   b2h_type = {Top Conferences},
  b2h_topic = {Gradient Descent},
   abstract = {
    The method of conjugate gradients provides a very effective way
    to optimize large, deterministic systems by gradient descent.
    In its standard form, however, it is not amenable to stochastic
    approximation of the gradient.  Here we explore ideas from
    conjugate gradient in the stochastic (online) setting, using fast
    Hessian-gradient products to set up low-dimensional Krylov subspaces
    within individual mini-batches.  In our benchmark experiments the
    resulting online learning algorithms converge orders of magnitude
    faster than ordinary stochastic gradient descent.
}}

Generated by bib2html.pl (written by Patrick Riley) on Thu Sep 25, 2014 12:00:33