Energy and Policy Considerations for Deep Learning in NLP

Table 1: Estimated CO2 emissions from training common NLP models, compared to familiar consumption.
Table 2: Estimated cost of training a model in terms of CO2 emissions (lbs) and cloud compute cost (USD).
Table 3: Estimated cost in terms of cloud compute and electricity for training: (1) a single model (2) a single tune and (3) all models trained during R&D.
  1. Authors should report training time and sensitivity to hyperparameters.
  • The training time, computational resources, and hyperparameter sensitivity should be reported for new models. This enables fair comparison across models and lets consumers accurately assess whether the required computational resources are compatible with their setting.
  • Developers should seek and implement more efficient alternatives to brute-force grid search for hyperparameter tuning. For example, Bayesian hyperparameter search should be integrated into deep learning tools (e.g., PyTorch and TensorFlow).
  • The experiments for the LISA project [6] were developed outside academia. State-of-the-art accuracies are possible thanks to industry access to large-scale compute.
  • Limiting research to rich industry labs will hurt the research community in multiple ways. First, it stifles creativity: researchers with good ideas cannot execute them if large-scale resources are not available. Second, it promotes the already problematic “rich get richer” cycle of research funding, where successful and well-funded groups receive more funding due to their existing accomplishments. Third, the prohibitive start-up cost of building in-house resources forces resource-poor groups to rely on cloud compute services (e.g., Google Cloud and Microsoft Azure).
  • Yet these cloud resources cost about twice (2x) as much as the actual hardware per project. Unlike money spent on cloud compute, purchased hardware continues to pay off because it is shared across many projects. Unfortunately, non-profit educational institutions lack the initial funding required to build compute centers (computer clusters). Accordingly, the paper [1] suggests that it is more cost-effective for academic researchers to pool resources to build shared compute centers.
  • The paper is short (5 pages) and well-written. It covers interconnected topics related to energy and policy for deep learning.
  • As a graduate student, I can contribute very little to the paper's topic. Yet, I think graduate students should be aware of these energy and policy discussions.
  • While reading the *equitable access to computation resources* recommendation, I kept remembering the quotation “life is not fair”. It is true, but we should not accept life as it is. We should strive, within our capabilities, to make life fairer.
  • The paper proposes that academic researchers pool resources to build shared compute centers. Of course, shared resources are better than either no resources or cloud compute. However, shared resources are not ideal because they tend to be overused inefficiently. Concretely, a researcher may use shared resources recklessly to maximize their own return on investment (ROI). With shared resources, there is no incentive to promote the best interest of both the researcher and their fellow researchers. This argument is beautifully presented in this movie scene [7].
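
To make the cloud-versus-hardware argument above concrete, here is a minimal sketch of the arithmetic. It assumes only the 2x ratio quoted above. The function name `cumulative_costs`, the `upkeep_per_project` parameter, and the 100,000 USD figure are hypothetical choices for illustration, not numbers from the paper. The model also assumes that projects reuse the same purchased hardware one after another.

```python
def cumulative_costs(hardware_cost, num_projects,
                     cloud_multiplier=2.0, upkeep_per_project=0.0):
    """Compare total spend on cloud compute vs. purchased hardware.

    hardware_cost      -- one-time price of hardware sized for one project
                          (hypothetical input, in USD)
    num_projects       -- number of projects that reuse that hardware
    cloud_multiplier   -- cloud cost per project relative to hardware cost
                          (2.0 reflects the "2x" ratio cited in the review)
    upkeep_per_project -- assumed electricity/maintenance per project
                          (hypothetical; defaults to 0 for simplicity)

    Returns a (cloud_total, owned_total) tuple.
    """
    if num_projects < 0:
        raise ValueError("num_projects must be non-negative")
    # Cloud spend recurs in full for every project.
    cloud_total = cloud_multiplier * hardware_cost * num_projects
    # Purchased hardware is paid for once, then only upkeep recurs.
    owned_total = hardware_cost + upkeep_per_project * num_projects
    return cloud_total, owned_total


if __name__ == "__main__":
    hypothetical_hardware_cost = 100_000  # illustrative figure only
    for n in (1, 2, 5):
        cloud, owned = cumulative_costs(hypothetical_hardware_cost, n)
        print(f"{n} project(s): cloud={cloud:,.0f} USD, owned={owned:,.0f} USD")
```

Under these assumptions, the gap widens with every additional project that shares the hardware, which is the core of the pooling argument. The sketch says nothing about how fairly the shared hardware is used.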



Ahmed Taha

I write reviews on computer vision papers. Writing tips are welcomed.