{"id":15,"url":"https://pm.philipcastiglione.com/papers/15.json","title":"$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources","read":false,"authors":"Apoorv Khandelwal, Tian Yun, Nihal V. Nayak, Jack Merullo, Stephen H. Bach, Chen Sun, Ellie Pavlick","year":2024,"auto_summary":"The paper \"$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources\" by Apoorv Khandelwal et al. explores the feasibility of pre-training large models within the constraints of academic compute resources. The authors challenge the assumption that academics cannot pre-train models due to limited computational power. They conduct a survey of academic researchers to understand their available compute resources and empirically measure the time required to replicate models using these resources.\n\nThe study introduces a benchmark to assess the time needed to pre-train models on various GPUs and identifies optimal settings to maximize training speed. The authors spend 2,000 GPU hours on experiments and find that models like Pythia-1B, originally trained on 64 GPUs for 3 days, can be replicated using 4 GPUs in 18 days with the same hyperparameters, demonstrating a significant reduction in compute requirements.\n\nThe paper outlines the trade-offs between cost and pre-training time, suggesting that academic researchers can conduct experiments that require training larger models on more data. The authors fully release their codebase to facilitate further research and experimentation.\n\nKey findings include:\n1. Academic researchers typically have access to 1-8 GPUs, often for days or weeks at a time, with limited budgets for cloud computing.\n2. The study measures and reports the time necessary to replicate several models on academic GPU configurations, optimizing performance through efficient training methods.\n3. The authors achieve a 3x reduction in compute compared to original reports, enabling otherwise infeasible training experiments.\n4. A cost-benefit analysis helps determine the best hardware for fast pre-training given a financial budget, suggesting that certain configurations (e.g., 4 H100 GPUs) are more cost-effective than others.\n\nThe paper concludes that academic researchers can indeed pre-train large models with the right optimizations and resources, and it encourages a more transparent understanding of the costs and feasibility of pre-training in academia. The authors hope their benchmark will aid researchers in making informed decisions about resource allocation and experiment design.","notes":{"id":15,"name":"notes","body":null,"record_type":"Paper","record_id":15,"created_at":"2024-12-10T04:51:50.627Z","updated_at":"2024-12-10T04:51:50.627Z"},"created_at":"2024-12-10T04:51:30.872Z","updated_at":"2024-12-10T04:51:57.110Z"}