Bayesian optimization has been demonstrated as an effective methodology for the global optimization of functions with expensive evaluations. Its strategy relies on querying a distribution over functions defined by a relatively cheap surrogate model. The ability to accurately model this distribution over functions is critical to the effectiveness of Bayesian optimization, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires a large number of evaluations, and as such, massively parallelizing the optimization.
In this work, we explore the use of neural networks as an alternative to Gaussian processes to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we use to rapidly search over large spaces of models. We achieve state-of-the-art results on benchmark object recognition tasks using convolutional neural networks, and image caption generation using multimodal neural language models.