
Do Generative AI Tools Ensure Green Code? An Investigative Study

This study investigates the sustainability of code generated by popular generative AI tools, specifically ChatGPT, BARD, and Copilot, focusing on their adherence to sustainable coding practices. The findings reveal that these tools often produce non-green code, which could contribute to increased energy consumption and carbon emissions if adopted without modifications. The research highlights the need for further evaluation and improvement of AI-generated code to promote environmentally friendly coding approaches.

Do Generative AI Tools Ensure Green Code? An Investigative Study

Samarth Sikand†, Rohit Mehra†, Vibhu Saujanya Sharma†, Vikrant Kaulgud†, Sanjay Podder‡, Adam P. Burden*
† Accenture Labs, India ‡ Accenture, India * Accenture, USA
{[Link],[Link],[Link],[Link],[Link],[Link]}@[Link]

ABSTRACT
Software sustainability is emerging as a primary concern, aiming to optimize resource utilization, minimize environmental impact, and promote a greener, more resilient digital ecosystem. The sustainability or 'greenness' of software is typically determined by the adoption of sustainable coding practices. With a maturing ecosystem around generative AI, many software developers now rely on these tools to generate code using natural language prompts. Despite their potential advantages, there is a significant lack of studies on the sustainability aspects of AI-generated code. Specifically, how environmentally friendly is the AI-generated code based upon its adoption of sustainable coding practices? In this paper, we present the results of an early investigation into the sustainability aspects of AI-generated code across three popular generative AI tools: ChatGPT, BARD, and Copilot. The results highlight the default non-green behavior of tools for generating code, across multiple rules and scenarios. It underscores the need for further in-depth investigations and effective remediation strategies.

CCS CONCEPTS
• Social and professional topics → Sustainability; • Computing methodologies → Natural language generation.

ACM Reference Format:
Samarth Sikand, Rohit Mehra, Vibhu Saujanya Sharma, Vikrant Kaulgud, Sanjay Podder, Adam P. Burden. 2024. Do Generative AI Tools Ensure Green Code? An Investigative Study. In 2024 International Workshop on Responsible AI Engineering (RAIE '24), April 16, 2024, Lisbon, Portugal. ACM, New York, NY, USA, 4 pages. [Link]
DOI: [Link]
arXiv:2506.08790v1 [[Link]] 10 Jun 2025

1 INTRODUCTION
Despite playing a pivotal role in advancing sustainability across various domains, software systems exert an often underestimated carbon footprint, thereby emerging as a significant and rapidly evolving contributor to global carbon emissions. Multiple studies have estimated that the internet and communications technology industry, which encompasses software and the corresponding hardware, currently accounts for 2-7% of global greenhouse gas emissions and is predicted to increase to a massive 14% by 2040 [4, 23]. One of the primary reasons behind these high carbon emissions is the non-optimization of software code from a sustainability perspective, specifically from green, energy, and emissions perspectives [16]. One example is the use of energy-hungry design patterns in code, as opposed to energy-efficient design patterns [19]. Among other reasons, this non-optimization can largely be attributed to the lack of awareness on the part of the developer, leading to lower adoption of sustainable coding practices [10, 13]. Lower adoption of sustainable coding practices will lead to unoptimized code, eventually leading to higher energy consumption and carbon emissions. Moreover, in traditional human-based development, the primary responsibility lies with the developer to incorporate these sustainable best practices into the code and optimize it accordingly.

With rapid advancements in generative AI, particularly in AI-assisted coding, numerous software developers are already utilizing or considering the integration of these tools into their daily coding activities [8]. These tools include specialized code generation/completion tools like GitHub Copilot [7], Tabnine [20], and others, as well as tools based on generic large language models (LLMs) with code generation capabilities, such as OpenAI ChatGPT [15], Google BARD [9], and Meta Code Llama [14], among others. Research indicates that developers utilize these tools to expedite task completion, reduce code searches, and enhance code quality, among other benefits [8, 12, 24]. This results in an overall increase in developer productivity and software quality. As a result of the potentially significant benefits offered by these tools, their adoption will rapidly surge in the near future. Estimates suggest that by 2027, approximately 70% of all professional software developers will be utilizing AI-assisted coding tools for their day-to-day coding-related activities, leading to a majority of the future software code being AI-generated [5]. Additionally, in this modern human-AI teaming-based software development, the burden of ensuring the adoption of sustainable coding practices gets distributed between the human developer and the AI, with the latter playing a more proactive role in integrating and promoting environmentally friendly coding approaches.

While code generation tools have been rapidly adopted and offer numerous advantages, there is a notable lack of evaluation studies focusing on the sustainability aspects of the generated code. Specifically, there is a gap in understanding how sustainable or environmentally friendly the code produced by these tools is, based on the implementation of sustainable coding practices. Although AI-generated code has been recently studied for other critical software engineering aspects such as security, performance, correctness, quality, and maintainability, sustainability has been largely overlooked [3, 11, 25]. Conducting evaluation studies in this regard can provide insights into the sustainability efficacy related to the usage of these tools, thereby influencing their adoption or non-adoption from a sustainability perspective. Furthermore, these evaluations may extend to the development of approaches and tools to address the issue in various scenarios where it may arise. In this paper, we present an early exploration aimed at studying the sustainability aspects of AI-generated code, grounded in the adoption of sustainable coding practices. The study encompasses three widely used AI code generation tools: ChatGPT, BARD, and Copilot. These tools were assessed for their adherence to six sustainable coding practices, selected from previous sustainability research and standard knowledge bases. Preliminary results highlight the non-green default behavior of the evaluated tools across multiple scenarios. If adopted as-is during software development, the generated non-green code could contribute to excess energy consumption and carbon emissions, thereby negatively impacting the sustainability of our environment.

Figure 1: Illustration of "default" behavior of Gen AI in generating energy-inefficient outputs.

2 EVALUATING THE GREENNESS OF AI GENERATED CODE
To evaluate the GenAI tools' adherence to sustainable practices, we formulated a methodology for understanding each tool's adherence to those practices. To that end, we create a nano-dataset of prompts for which sustainable patterns/solutions are known a priori. The presence or absence of green/sustainable patterns in tool outputs would inform our understanding of the tool's default behavior. In further subsections, we will delve deeper into the methodology and discuss the outcomes of our study.

2.1 Study Methodology
Fig 2 illustrates our overall study approach. Our approach has three high-level steps: (i) Choosing an appropriate set of green rules. (ii) Shortlisting Gen AI tools to be evaluated. (iii) NL Prompt creation. To narrow the scope of the study, the prompts used for evaluation were created based on the context of the rules rather than framing the prompts independently. The main reason for this design choice was to influence the LLM behavior to generate specific outputs pertaining to the rule's context by solving a directed task.

Figure 2: Overall approach to investigative study.

2.1.1 Choosing Green Rules. While many research studies have been conducted to find energy-hungry code patterns in code, there is a lack of a standardized dataset or knowledge base for such patterns. One of the better-known rule sets is CAST's Green IT rules [21], which enumerates many energy-inefficient code patterns. The Green IT rules comprise a set of energy-efficient coding practices that can be segregated by technologies (JEE, Python, SAP, etc.), criticality, and other categorizations. Currently, there exist approximately 900 rules in the Green IT set of rules, from which we considered only rules relevant for the Java, JavaScript, and Python languages.
For the purpose of this study, we chose 5 rules from the CAST Green IT rules, and 1 rule was constructed from a study evaluating the energy footprint of various Java I/O APIs [18]. Table 1 lists the rules chosen in our study. The underlying reason for choosing these rules is twofold: (i) the code snippet/source code is easy to validate manually, and (ii) the underlying fundamental task (e.g., I/O operations, loops) is relatively easy to replicate.

2.1.2 Shortlisted Generative AI tools. Since ChatGPT's inception, there has been a meteoric rise in foundational models, like LLMs, as every organization is constantly striving to push state-of-the-art performance with its models. For our study, we chose 3 popular Generative AI tools with thousands (or millions) of users, which are as follows:
• GitHub Copilot: Copilot, released in 2021, has become one of the most popular AI-powered code generation tools today. Recently, GitHub claimed to have more than a million developers using their tool [6].
• OpenAI ChatGPT: ChatGPT has become one of the fastest-growing internet tools ever, amassing more than 100 million users in a span of 2 months [17]. Its popularity and near-human-level performance make it a prime candidate for evaluation.
• Google BARD: Google released BARD in early 2023 [1], claiming that it overcomes some of the shortcomings of ChatGPT, such as the inability to pull real-time information from the World Wide Web.

The tools chosen reflect the diversity in terms of approaches taken to train the models and how they are packaged to be used. We expect that results from these tools will inform our understanding of these models' knowledge and reasoning capabilities regarding energy-efficient coding practices.

2.1.3 Prompt creation. For testing the efficacy of the Gen AI tools' adherence to Green best practices (without external/additional inputs), we manually created two Natural Language (NL) prompts for each of the six rules shortlisted. The NL prompts, for each rule, are crafted in a manner to ensure that the underlying task of the respective rule will be implemented. The outputs of the Gen AI tools will represent their default behavior on the prompt tasks. Table 1 describes all the prompts crafted for the chosen rules.
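The prompt set described above can be pictured as a small data structure. The following is a minimal sketch, not the authors' tooling: the class and method names are hypothetical, the separator between prefix and prompt is an assumption, and treating Rule 2 as a Java rule is inferred from its description. Only the prefix sentence (which the study reports prepending to prompts for Java/JEE rules) and the prompt wording are taken from the paper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the prompt nano-dataset; names are illustrative only.
final class PromptSet {
    // Statement the study reports prepending to prompts for Java/JEE rules.
    static final String JAVA_PREFIX = "All responses should be for Java language";

    // One shortlisted rule together with its manually written NL prompts.
    static final class RulePrompts {
        final String ruleId;
        final boolean javaRule;
        final List<String> prompts;

        RulePrompts(String ruleId, boolean javaRule, List<String> prompts) {
            this.ruleId = ruleId;
            this.javaRule = javaRule;
            this.prompts = prompts;
        }
    }

    // Returns the text that would be given to a tool for each prompt of a rule.
    // Assumption: prefix and prompt are joined with ". "; the paper does not say how.
    static List<String> render(RulePrompts rule) {
        List<String> out = new ArrayList<>();
        for (String prompt : rule.prompts) {
            out.add(rule.javaRule ? JAVA_PREFIX + ". " + prompt : prompt);
        }
        return out;
    }

    public static void main(String[] args) {
        RulePrompts rule2 = new RulePrompts("Rule 2", true, List.of(
            "Create a function which creates a consolidated string by combining all the elements of the given array separated by commas.",
            "Create a function to parse a JSON file and output all the data in a consolidated string"));
        for (String text : render(rule2)) {
            System.out.println(text);
        }
    }
}
```

Keeping the prefix separate from the prompt text makes it easy to see which part of each input was fixed by the study design and which part carries the rule's task.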

Table 1: Prompts crafted for each of the rules and the evaluation results of each tool on the respective prompts. Red indicates absence of green patterns in any solution, Green indicates presence of green patterns in all presented solutions, and Orange indicates partial presence of green patterns (given as a percentage, computed as the number of green suggestions divided by the total suggestions shown by Copilot).

Columns: Rule # / Rule Title / Description / Prompt / Copilot / ChatGPT / BARD

Rule 1
Rule Title: Comprehending Energy Behaviors of Java I/O APIs
Description: I/O APIs like Scanner and FileInputStream are not energy efficient
Prompt 1: Create a class which reads a 20MB HTML file and prints the number of words in the file
Prompt 2: Create a function which takes text file as input and outputs the number of words of file

Rule 2
Rule Title: Avoid String concatenation in loops
Description: Create StringBuilder or StringBuffer before entering loop, and append to it within loop
Prompt 1: Create a function which creates a consolidated string by combining all the elements of the given array separated by commas.
Prompt 2: Create a function to parse a JSON file and output all the data in a consolidated string

Rule 3
Rule Title: Use varchar2 instead of char and varchar
Description: Change char and VARCHAR columns to VARCHAR2
Prompt 1: Write a SQL query for creating a table for Pets. The columns should store about information a Pet's unique ID, name, age, color, breed, weight, owner's ID, and owner's name.
Prompt 2: Write a SQL query to create tables for the following Java class: «code»

Rule 4
Rule Title: Avoid using HashTable
Description: HashMap should be preferred over HashTable
Prompt 1: Write a java function to print duplicate entries in an array.
Prompt 2: Write a java function to create a lookup table of locally stored username and passwords.

Rule 5
Rule Title: Avoid using forEach()
Description: Function-based iteration takes up to eight times as long as loop-based iteration.
Prompt 1: Write a javascript function to print all values of an array. Copilot: 50% (5/10)
Prompt 2: Write a javascript function to print all entries of a string array that are palindrome. Copilot: 60% (6/10)

Rule 6
Rule Title: Avoid leaving open file resources (Python)
Description: Open file using with statement or explicitly close opened files.
Prompt 1: Write a python function to print the contents of a local file.
Prompt 2: Write a python function to prepend a line "Accenture-Proprietary" at the start of each file. Filenames are listed in an array. Copilot: 62.5% (5/8)
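To make two of the Java rules in Table 1 concrete, the sketch below shows one way a "green" answer to the first prompts of Rule 2 and Rule 4 might look. It is an illustration written for this text, not output from any of the evaluated tools; the class and method names are hypothetical, and returning the duplicates as a list (rather than printing them) is a choice made here for clarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical answers that follow Rule 2 and Rule 4.
final class GreenPatterns {
    // Rule 2: create the StringBuilder before the loop and append inside it,
    // instead of using += on a String in every iteration.
    static String joinWithCommas(String[] items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(items[i]);
        }
        return sb.toString();
    }

    // Rule 4: use HashMap rather than Hashtable for the lookup structure.
    // Each duplicate value is reported once, when its second occurrence is seen.
    static List<Integer> duplicates(int[] values) {
        Map<Integer, Integer> counts = new HashMap<>();
        List<Integer> dups = new ArrayList<>();
        for (int v : values) {
            int seen = counts.merge(v, 1, Integer::sum);
            if (seen == 2) {
                dups.add(v);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        System.out.println(joinWithCommas(new String[] {"a", "b", "c"})); // a,b,c
        System.out.println(duplicates(new int[] {1, 2, 2, 3, 1, 1}));     // [2, 1]
    }
}
```

A non-green answer to the same prompts would typically differ in only a line or two (a `String` accumulator with `+=`, or a `Hashtable` declaration), which is why these rules are easy to validate by manual inspection.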

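The percentage and color coding defined in the Table 1 caption amount to a simple ratio. As a small worked sketch (the class and method names are hypothetical, and the text form of the label is an assumption), the three Copilot figures reported in the table can be reproduced as follows:

```java
// Illustrative helper for the scoring described in the Table 1 caption.
final class GreenScore {
    // Percentage of green suggestions among all suggestions shown for one prompt.
    static double percentGreen(int green, int total) {
        if (total <= 0 || green < 0 || green > total) {
            throw new IllegalArgumentException("need 0 <= green <= total and total > 0");
        }
        return 100.0 * green / total;
    }

    // Red: no green suggestion; Green: all suggestions green; Orange: partial.
    static String color(int green, int total) {
        double percent = percentGreen(green, total); // also validates the inputs
        if (green == 0) {
            return "Red";
        }
        if (green == total) {
            return "Green";
        }
        return "Orange (" + percent + "%)";
    }

    public static void main(String[] args) {
        System.out.println(color(5, 10)); // Orange (50.0%)
        System.out.println(color(6, 10)); // Orange (60.0%)
        System.out.println(color(5, 8));  // Orange (62.5%)
    }
}
```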
2.2 Experiment Results and Discussion
To conduct our experiments, we leveraged the ChatGPT and BARD conversational UI tools on their respective organizations' sites. Additionally, for rules pertinent to Java/JEE we prepend each NL prompt with the following statement: "All responses should be for Java language", to ensure the usage of Java syntax in the tools' outputs. For Copilot, the official VSCode extension was used to generate up to 10 solutions (default setting) for an NL prompt, by explicitly triggering the Copilot suggestions panel. In Table 1, we present our findings for each combination of rule and GenAI tool. It can be clearly inferred that at least one GenAI tool is not following the chosen Green IT rules. For Rule 1, none of the tools provided the most optimum solution. I/O functionality is one of the most fundamental of tasks and it can have a significant energy impact, as it can have a multiplicative effect of being used in millions of developer codebases. Gen AI's inclination towards Java I/O functions, like Scanner and FileInputStream, can be attributed to the popularity and ease of use of such I/O functions in the majority of codebases. We also observe that most suggestions by GitHub Copilot do not exhibit Green coding patterns, for the rules under study. For Rule 5 and Rule 6, at least 40% of the suggestions from Copilot are non-green. Meanwhile, ChatGPT fails to adhere to sustainable practices for 33% of the selected rules. BARD follows closely behind ChatGPT in terms of non-adherence to rules in its solutions.

It can be observed from the results that even the basic or fundamental sustainable practices are not followed consistently across different tools. We have observed Copilot's recommended solutions to be less sustainable than those of ChatGPT or BARD, as at least one of its solutions is non-green for 5 out of 6 rules. These observations can be attributed to multiple factors, like differences in training data and training methods for the underlying models of Copilot, ChatGPT, and BARD. Although the results show that GenAI tools may not provide sustainable responses for every scenario, we believe the tools' responses can be improved by certain actions.

Depending on the access to the underlying LLM of a GenAI tool, one can mitigate these sustainability shortcomings. If an organization has partial/full access to the underlying model, it can fine-tune the LLM to include knowledge around sustainable coding practices. But if it leverages a Gen AI tool through blackbox APIs, it can use prompt engineering methods to explicitly append the relevant sustainable coding practice to a prompt. The practical solution for anyone would depend on the business use case and other constraints (e.g., budget/compute constraints).

3 LIMITATIONS
While we initiated this study to investigate the default behavior of Generative AI tools to generate green software artifacts, it was not feasible to cover all the energy-efficiency coding patterns and best practices. In one of the initial experiments, we discovered that many of the rule descriptions were difficult to replicate with an NL prompt. For example, an empirical study put forth that Inheritance is more energy-efficient than the Delegation pattern [2]. We attempted to craft an NL prompt without explicitly mentioning the code pattern, but the LLMs' non-deterministic responses didn't include solutions with the Inheritance or Delegation pattern. Such issues made the evaluation of many rules extremely challenging. Additionally, the small set of Green rules selected was by design, firstly due to limited run time, and secondly, to understand if this area of research was worth exploration. Hence, the current results are not representative of one tool's superior capability to generate green code over another. Moreover, evaluating hundreds of LLMs¹ (e.g., LLAMA [22]), from various organizations and with different licenses, would make the study extremely challenging. Due to the huge scale of effort required for exhaustive evaluation, we narrowed our scope to three tools only. Furthermore, our approach has been bottom-up, i.e., we shortlisted the rules and then crafted prompts, whereas an alternate approach would be to consider prompts (NL and code both) first, independent of the sustainable coding practices. While we believe that this exploratory study is a good starting point for evaluating sustainability of AI-generated artifacts, a more comprehensive and standardized approach is needed to evaluate sustainable behavior of foundation models with different modalities.

4 FUTURE WORK AND CONCLUSION
In our study, we have highlighted some of the concerning "default" behavior of diverse Gen AI tools in generating green code. It can be observed that it is important to explore the default behavior of the Gen AI tools from the perspective of green code and not trust their outputs blindly. Some novel solutions can help shift some of the sustainable coding best practices considerations from the developer and reduce their rework in later stages of development along with cognitive workload. For future work, we plan to do a more exhaustive behavior profiling of a larger set of GenAI or LLM tools from the perspective of sustainability. We also plan to approach the problem the other way around, by setting tasks independent of the sustainable coding practices and then doing the profiling. Our aim is to encourage researchers in industry and academia to evaluate the sustainability aspects of generated artifacts along with functional correctness, security, etc.

REFERENCES
[1] Google Blog. 2023. An important next step on our AI journey. Retrieved November 12, 2023 from [Link]updates/
[2] Déaglán Connolly Bree and Mel Ó Cinnéide. 2020. Inheritance versus delegation: which is more energy efficient?. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. 323–329.
[3] Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, and Yiling Lou. 2023. ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation. ArXiv abs/2308.01861 (2023).
[4] Charlotte Freitag, Mike Berners-Lee, Kelly Widdicks, Bran Knowles, Gordon S. Blair, and Adrian Friday. 2021. The real climate and transformative impact of ICT: A critique of estimates, trends, and regulations. Patterns 2, 9 (2021), 100340.
[5] [Link]. Accessed - 02/12/2023. Set Up Now for AI to Augment Software Development. [Link]to-augment-software-development.
[6] Github. 2023. The economic impact of the AI-powered developer lifecycle and lessons from GitHub Copilot. Retrieved November 12, 2023 from [Link]developer-lifecycle-and-lessons-from-github-copilot/
[7] [Link]. Accessed - 02/12/2023. Github Copilot. [Link]copilot.
[8] [Link]. Accessed - 02/12/2023. Survey reveals AI's impact on the developer experience.
[9] [Link]. Accessed - 02/12/2023. BARD. [Link]
[10] Leila Karita, Brunna C. Mourão, and Ivan Machado. 2019. Software Industry Awareness on Green and Sustainable Software Engineering: A State-of-the-Practice Survey. In Proceedings of the XXXIII Brazilian Symposium on Software Engineering (Salvador, Brazil) (SBES '19).
[11] Raphaël Khoury, Anderson R. Avila, Jacob Brunelle, and Baba Mamadou Camara. 2023. How Secure is Code Generated by ChatGPT? arXiv:2304.09655 [[Link]]
[12] J. T. Liang, C. Yang, and B. A. Myers. 2024. A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges. In 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE). 605–617.
[13] Rohit Mehra, Vibhu Saujanya Sharma, Vikrant Kaulgud, Sanjay Podder, and Adam P. Burden. 2022. Towards a Green Quotient for Software Projects. In 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 295–296.
[14] [Link]. Accessed - 02/12/2023. Code Llama. [Link]
[15] [Link]. Accessed - 02/12/2023. ChatGPT. [Link]
[16] Gustavo Pinto and Fernando Castor. 2017. Energy Efficiency: A New Concern for Application Software Developers. Commun. ACM 60, 12 (nov 2017), 68–75.
[17] Reuters. 2023. ChatGPT sets record for fastest-growing user base. Retrieved November 12, 2023 from [Link]fastest-growing-user-base-analyst-note-2023-02-01/
[18] Gilson Rocha, Fernando Castor, and Gustavo Pinto. 2019. Comprehending energy behaviors of java i/o apis. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, 1–12.
[19] Cagri Sahin, Furkan Cayci, Irene Lizeth Manotas Gutiérrez, James Clause, Fouad Kiamilev, Lori Pollock, and Kristina Winbladh. 2012. Initial explorations on design pattern energy usage. In 2012 First International Workshop on Green and Sustainable Software (GREENS). 55–61.
[20] [Link]. Accessed - 02/12/2023. Tabnine. [Link]
[21] CAST Technologies. Accessed - 02/12/2023. CAST Rules Documentation. https://[Link]/rules?sec=idx_green&ref=||.
[22] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
[23] United-Nations-Environment-Programme-Copenhagen-Climate-Centre. Accessed - 02/12/2023. Greenhouse gas emissions in the ICT sector: trends and methodologies.
[24] Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs Experience - Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (CHI EA '22). New York, NY, USA.
[25] Burak Yetistiren, Isik Özsoy, Miray Ayerdem, and Eray Tüzün. 2023. Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT. ArXiv abs/2304.10778 (2023).

¹ Stanford HELM: [Link]
