Joining link : https://meet.google.com/kyz-exxu-hvw
ABSTRACT :
One of the paramount mathematical mysteries of our time is explaining the
phenomenon of deep learning. Neural nets can be made to paint while
imitating classical art styles or to play chess better than any machine or
human ever has, and they seem to be the closest we have come to achieving
"artificial intelligence". But trying to reason about these successes
quickly lands us in a plethora of extremely challenging mathematical
questions, typically about discrete stochastic processes. Some of these
questions remain unsolved even for the smallest neural nets!
In this talk we will give a brief introduction to neural nets and describe
two of the most recent themes of our work in this direction.
First, we will explain how, under certain structural and mild
distributional conditions, our iterative algorithms like "Neuro-Tron",
which do not use a gradient oracle, can often be proven to train nets with
time/sample complexity matching what is expected of gradient-based methods,
but in regimes where usual algorithms like (S)GD remain unproven. Our
theorems include the particularly challenging regime of non-realizable
data. Next
we will briefly look at our first-of-its-kind results on sufficient
conditions for fast convergence of standard deep-learning algorithms like
RMSProp, which use the history of gradients to decide the next step. In
the second half of the talk, we will focus on the recent rise of
PAC-Bayesian techniques for explaining the low risk of certain
over-parameterized nets on standardized tests. We will present our recent
results in this domain, which empirically supersede some of the existing
theoretical benchmarks in this field; we achieve this via our new
proofs of the key property of noise resilience of nets.
This is joint work with Amitabh Basu (JHU), Ramchandran Muthukumar (JHU),
Jiayao Zhang (UPenn), Dan Roy (UToronto, Vector Institute), Pushpendre
Rastogi (JHU -> Amazon), Soham De (DeepMind, Google), Enayat Ullah (JHU),
Jun Yang (UToronto, Vector Institute) and Anup Rao (Adobe).
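
To make the phrase "use the history of gradients to decide the next step"
concrete, here is a minimal sketch of the textbook RMSProp update on a toy
quadratic. It is not the variant analysed in the talk; the function name
`rmsprop`, the toy objective, and all hyperparameter values are assumptions
chosen only for illustration.

```python
import math

def rmsprop(grad, x0, lr=0.01, beta=0.9, eps=1e-8, steps=2000):
    """Textbook RMSProp: scale each coordinate's step by a running
    average of that coordinate's past squared gradients."""
    x = list(x0)
    v = [0.0] * len(x)  # running average of squared gradients (the "history")
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            v[i] = beta * v[i] + (1.0 - beta) * g[i] ** 2
            x[i] -= lr * g[i] / (math.sqrt(v[i]) + eps)
    return x

# Toy objective f(x, y) = x^2 + 10 y^2, an assumption for illustration only.
def grad_f(p):
    return [2.0 * p[0], 20.0 * p[1]]

x_final = rmsprop(grad_f, [1.0, 1.0])
print(x_final)  # both coordinates end up close to the minimiser (0, 0)
```

With a constant step size the iterates do not settle exactly at the
minimiser but keep moving within a small neighbourhood of it, which is one
reason convergence guarantees for such methods need extra conditions.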
About the speaker :
Anirbit Mukherjee finished his Ph.D. in applied mathematics at the Johns
Hopkins University advised by Prof. Amitabh Basu. He is now a post-doc at
Wharton (UPenn) with Prof. Weijie Su. He specializes in deep-learning
theory.
Time:
7:00pm
Description:
Title: Standard monomials, matroids, and lattice paths.
Abstract: Every finite collection of points is the set of solutions to
some system of polynomial equations. This is a (computationally)
reasonable representation, in particular when writing down defining
equations is easier than listing the actual points. Motivated by Gröbner
basis theory for finite point configurations, I will discuss standard
complexes of 0/1-point
configurations. For a matroid basis configuration, the corresponding
standard complex is a subcomplex of the independence complex, which is
invariant under matroid duality. For the lexicographic term order, the
standard complexes satisfy a deletion-contraction-type recurrence. For
lattice path matroids these complexes can be explicitly described in terms
of lattice path combinatorics. The talk is based on work with Alexander
Engström and Christian Stump.
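
As a small illustration of the opening claim of this abstract, the sketch
below builds one polynomial whose real zero set is exactly a given finite
point set, and checks this on a small integer grid. It is only one simple
way to get defining equations over the reals, not the Gröbner basis or
standard monomial construction discussed in the talk; the helper name
`vanishing_polynomial` and the choice of example (the bases of the uniform
matroid U_{2,3} as 0/1 vectors) are assumptions.

```python
from itertools import product

def vanishing_polynomial(points):
    """Return f with f(x) = product over p in points of |x - p|^2.
    Over the reals, f(x) = 0 exactly when x is one of the points."""
    def f(x):
        value = 1
        for p in points:
            value *= sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        return value
    return f

# Bases of the uniform matroid U_{2,3}, written as 0/1 indicator vectors.
S = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
f = vanishing_polynomial(S)

# Evaluate f on a small integer grid and collect the points where it vanishes.
grid = product(range(-1, 3), repeat=3)
zeros = [x for x in grid if f(x) == 0]
print(sorted(zeros) == sorted(S))
```

Here a single equation of degree 6 describes three points; in general the
defining equations can be much more compact than the point list, which is
the situation the abstract refers to.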