[Defense] Learning to Learn via Meta-Dataset Distillation
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Mikhail Mekhedkin-Meskhi
will defend his dissertation
Learning to Learn via Meta-Dataset Distillation
Abstract
Meta-learning, also known as learning to learn, is a sub-field of machine learning that focuses on developing algorithms and techniques capable of learning from multiple learning tasks or domains in order to improve generalization and adaptation to new tasks or domains. In this thesis we present three contributions: 1) a formalized description of knowledge transfer in meta-learning, 2) a deep-learning-based method for learning an abstract representation of meta-features for algorithm performance prediction, and 3) a novel data distillation method using Sparse Gaussian Processes paired with a meta-data selection algorithm for task-specific optimization.

Our first contribution is a formal definition of the concept of transfer learning, which aims to improve learning performance on new tasks by leveraging knowledge gained from previous related tasks. We discuss various approaches, including representational transfer, where explicit knowledge is transferred after source models are trained, and functional transfer, where models share internal structure during simultaneous training. We explore how knowledge can be transferred in the form of instances, features, or model parameters, with a focus on neural networks, which allow the transfer of partial network structures. We also present theoretical frameworks that bound the generalization error of meta-learners in terms of task relatedness and the number of tasks or examples. Furthermore, we examine transfer learning in kernel methods and Bayesian models, as well as the bias-variance tradeoff in the context of meta-learning.

Our second contribution builds on the ability of deep neural networks to learn a latent representation of meta-features collected from various tasks. Meta-features allow us to characterize datasets in order to perform meta-learning; here we focus on predicting learning algorithm performance from a set of meta-features. In a supervised learning setting, we train a neural network to learn abstract meta-features conditioned on learning algorithm performance metrics. We demonstrate that meta-models built on abstract meta-features outperform those built on traditional meta-features when predicting learning algorithm performance on new datasets.

Our third contribution revolves around data distillation and meta-learning an optimal support set (meta-dataset). We propose a novel meta-learning data distillation method based on Sparse Gaussian Processes and Neural Tangent Kernels to meta-learn a distilled subset across tasks. We demonstrate that such a mechanism can be repurposed to compress the most salient information across multiple heterogeneous datasets into a single set of meta-data points. We also propose a probabilistic data selection algorithm trained to select the most relevant meta-data points for a given context, improving the model's adaptability to specific task data. Our evaluation of the proposed method on several benchmarks shows competitive performance compared to existing state-of-the-art techniques in meta-learning.
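
The following is a minimal, illustrative sketch (not the dissertation's implementation) of the supervised setup described in the second contribution: a small PyTorch network maps hand-crafted dataset meta-features to an abstract latent representation while a regression head predicts a learning algorithm's performance. All module names, dimensions, and the placeholder data are assumptions made purely for the example.

```python
# Illustrative sketch only: encode dataset meta-features into an "abstract"
# latent representation, trained jointly with a head that predicts one
# learning algorithm's performance on that dataset.
import torch
import torch.nn as nn

class MetaFeatureEncoder(nn.Module):
    def __init__(self, n_meta_features: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_meta_features, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        # Head predicts a scalar performance metric (e.g., accuracy).
        self.head = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, meta_features):
        z = self.encoder(meta_features)       # abstract meta-features
        return self.head(z).squeeze(-1), z    # predicted performance, latent code

# Hypothetical training loop over (meta-features, observed performance) pairs.
model = MetaFeatureEncoder(n_meta_features=30)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.randn(200, 30)   # placeholder meta-features for 200 datasets
y = torch.rand(200)        # placeholder performance scores in [0, 1]

for _ in range(100):
    pred, _ = model(X)
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```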
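
Similarly, the sketch below illustrates the idea behind the third contribution in simplified form: a small distilled support set is optimized so that a kernel regressor fit on it reproduces behavior on the full task data. The dissertation pairs Sparse Gaussian Processes with Neural Tangent Kernels; here a plain RBF kernel and synthetic data stand in, so every function and tensor below is a placeholder rather than the proposed method.

```python
# Illustrative sketch only: kernel-based dataset distillation with a learnable
# distilled support set ("meta-dataset"), using an RBF kernel as a stand-in.
import torch

def rbf_kernel(a, b, lengthscale=1.0):
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * lengthscale ** 2))

def kernel_ridge_predict(x_support, y_support, x_query, reg=1e-3):
    # Fit kernel ridge regression on the distilled set, predict on real data.
    K = rbf_kernel(x_support, x_support)
    alpha = torch.linalg.solve(K + reg * torch.eye(len(x_support)), y_support)
    return rbf_kernel(x_query, x_support) @ alpha

# Placeholder "real" task data and a small learnable distilled set.
x_real, y_real = torch.randn(500, 8), torch.randn(500)
x_distilled = torch.randn(20, 8, requires_grad=True)
y_distilled = torch.zeros(20, requires_grad=True)

opt = torch.optim.Adam([x_distilled, y_distilled], lr=1e-2)
for _ in range(200):
    pred = kernel_ridge_predict(x_distilled, y_distilled, x_real)
    loss = ((pred - y_real) ** 2).mean()  # distilled set should mimic real-data behavior
    opt.zero_grad()
    loss.backward()
    opt.step()
```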
Friday, April 19, 2024
11:00 AM - 12:30 PM CT
Agrawal Research Building Room 205F
& Virtual via
Dr. Ricardo Vilalta, dissertation advisor
Faculty, students, and the general public are invited.
