Can we perform Neural Architecture Search (NAS) on a smaller subset of the target dataset and still fare better in terms of performance, with a significant reduction in search cost? In this work, we propose a method, called DistilNAS, which uses a curriculum-learning-based approach to distill the target dataset into a much smaller, more efficient dataset on which to perform NAS. We hypothesize that only the data samples containing features highly relevant to a given class should be used in the search phase of NAS. We perform NAS on a distilled version of the dataset, and the searched model achieves better performance at a much lower search cost than various baselines. For instance, on the ImageNet dataset, DistilNAS uses only 10% of the training data and produces a model in ≈1 GPU-day (including the time needed for clustering) that achieves a near-SOTA accuracy of 75.75% (PC-DARTS achieved SOTA with an accuracy of 75.8% but needed 3.8 GPU-days for architecture search). We also demonstrate and discuss the efficacy of DistilNAS on several other publicly available datasets.
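
The abstract does not spell out how class-relevant samples are chosen beyond mentioning clustering, so the following is only a rough sketch of the general idea of keeping a fixed fraction of each class. It assumes precomputed feature vectors and uses distance to the class centroid as a stand-in relevance score; the function name `select_class_representative_subset`, the centroid criterion, and the synthetic data are all illustrative assumptions, not the DistilNAS procedure, and the curriculum-learning component is not modeled.

```python
import numpy as np


def select_class_representative_subset(features, labels, keep_fraction=0.10):
    """Return sorted indices of the samples closest to their class centroid.

    Hypothetical helper: centroid distance is an assumed proxy for
    "features highly relevant to a given class", not the paper's criterion.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    kept = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)          # samples of this class
        centroid = features[idx].mean(axis=0)        # class mean in feature space
        dist = np.linalg.norm(features[idx] - centroid, axis=1)
        n_keep = max(1, int(round(keep_fraction * idx.size)))
        # Keep the n_keep samples nearest to the centroid.
        kept.append(idx[np.argsort(dist, kind="stable")[:n_keep]])
    return np.sort(np.concatenate(kept))


# Synthetic stand-in data: 4 classes, 50 samples each, 8-dimensional features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
labels = np.repeat(np.arange(4), 50)

subset = select_class_representative_subset(features, labels, keep_fraction=0.10)
print(subset.size)  # 10% of each 50-sample class -> 20 indices in total
```

In a real pipeline the features would presumably come from a pretrained network, and the resulting index set would define the reduced dataset passed to the architecture search; both of those steps are outside what the abstract describes.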