
Code Confabulator Harnessing LLMs To Compile Code For Visualization

The document presents a framework called Code Confabulator, which utilizes large language models (LLMs) like GPT-3.5 to enhance code visualization for programming languages such as Python, Java, C, and C++. This framework aims to address challenges faced by existing code visualization tools, including time complexity and limited adaptability, by providing a more human-like understanding of code through visual representations. The paper details the methodology, implementation, and results of the framework, showcasing its potential to improve learning and debugging processes for novice programmers.


IEEE - 61001

Code Confabulator: Harnessing LLMs to Compile Code for Visualization
2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT) | 979-8-3503-7024-9/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICCCNT61001.2024.10724543

Nannapaneni Rayvanth
Pasupuleti Pranavi
Suryaa E
Venkatahemant Kumar Reddy Challa
Meena Belwal

Department of Computer Science and Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India

Abstract— Code visualization tools play a core role in visualizing code and helping novices understand, analyze, and optimize their programs. Tools such as Algorithm Visualizer and PythonTutor were developed to make algorithm visualization possible across programming languages like Python, Java, C, and C++. Despite their advantages, these tools suffer from unnecessary data collection, increased time complexity, and the limitation of visualizing only a few data structures. In our research, we introduce a more human-like view of understanding and address the shortcomings mentioned above by proposing a framework that uses LLMs to extract the data to be visualized from code in Python, Java, C, or C++. In our framework, we have used LLMs instead of the traditional compiler for feasibility and adaptability. Our objective is to evaluate the appropriateness of LLMs for this task, starting from freeform prompts. The LLM is asked to generate outputs in a given format; these outputs are then converted into visualizations with a rule-based approach. The aim is to enhance visualization quality and reduce time complexity while allowing the user to view the code at a human level. We have also publicly released the framework so that others can provide further contributions and improvements.

Keywords—Large Language Models, Code visualization

I. INTRODUCTION

Acquiring coding skills has become a priority with increasing technological advancement and industry demand. Many difficulties arise in understanding and mastering the principles of coding, especially for beginners. Code visualization techniques [1-5] can aid the learning process for beginners. Using visual representations of the data structures within the code, one can analyse and optimize programming constructs effectively and thus quickly understand the concepts of coding.

A compiler [13-16] is software that translates high-level language instructions into machine-level language. It plays a vital role in software development, ensuring the correctness and efficiency of programs. In our work, we have replaced the task of a compiler with an LLM, which reads the program, checks for errors, compiles the code and then extracts the information needed to visualize the code.

LLMs are state-of-the-art deep learning architectures that can perform tasks on large datasets. Using LLMs makes it possible to transform code into a defined format, thus enhancing its applicability across different programming languages. Using LLMs to develop this tool will therefore contribute to enhancing visualization and scalability. We have carried out experiments with various LLMs to determine those suitable for our task. Gemini proved unsuitable, and so we carried our work over to GPT-3.5.

The challenges faced by code visualization tools include representing complex data structures effectively, which limits visualization accuracy. The computational cost of generating visualizations may increase time complexity, thereby affecting real-time performance. The limited adaptability of many tools, especially those tailor-made for specific languages, hinders their widespread applicability across diverse coding environments. Hence, in this paper, we have developed a framework to address these challenges. The compiler iterates over 5 stages. First, the code is loaded into the compiler. An extractive summary of the code is drawn out using GPT-3.5 through API calls. Non-essential text/code is handled in the pre-processing step. After compilation or information extraction, a rule-based approach is followed to obtain the frames for the animation of the visualized data. Finally, the compiled result is displayed to the user in the form of an animation in the app.

The rest of the paper is organized as follows. Section 2 discusses the related works within this domain in detail. Section 3 proposes the method through the explanation of the system diagram, the methodology used for the compiler development, and the results. Section 4 describes the results of the implementation in detail. Finally, Section 5 concludes and discusses future research.

II. RELATED WORKS

Bothra et al. [1] proposed a code visualization tool known as Code Viz after recognizing the difficulties faced by new learners of data structures. The tool accurately visualizes data structures that are available in Python and C. It provides an animation of the implemented data structure after going through all the lines of code. The authors made use of the Data Structure Visualization library. The tool also gave users a better way to debug the code after visualizing it. It was easy to access because it was an online tool that users did not have to install.

Egan and McDonald [2] evaluated a tool known as SeeC that was developed for beginners in C programming. It also provided debugging support. It was developed to help users understand the run-time behavior of C programs. To evaluate the tool, the authors recorded the users’ interaction with the tool and also when they were using it to

15th ICCCNT IEEE Conference, June 24-28, 2024, IIT - Mandi, Kamand, India
Authorized licensed use limited to: Amrita School of Engineering. Downloaded on June 06,2025 at 11:59:08 UTC from IEEE Xplore. Restrictions apply.
debug the programs. A survey was conducted among the students who used the tool for debugging, and the authors found that users with no experience were able to use SeeC comfortably. However, it was also suggested that the user interface of the tool could be improved.

Ishizue et al. [3] designed a visualization tool known as PVC.js for beginners in C programming. It helps them debug their code effectively. The tool provides a visualization of the code to help users understand the program flow better, and it supports dynamic memory allocation. It is a user-friendly, web-based application that users can access easily. To check the efficiency of the tool, the authors gave a set of questions to users and found that it was 1.7 times faster than SeeC and provided more accurate answers and better debugging.

Online Python Tutor, a tool to visualize Python programs, was presented by Guo [4]. Its main advantage is that it is a completely online tool that requires no plugins. Given a Python program as input, the tool produces an execution trace as output. The trace is the code visualization of the data structures. The paper also reports the use of this tool by teachers in universities. However, the tool cannot visualize complex data structures like Graphs and Trees, and its execution is time consuming. Its animations also compare poorly with those of other tools.

Sundararaman and Back [5] proposed HDPV, a runtime visualization tool for C, C++ and Java programs. The tool was proposed with the limitations of other program visualization tools in mind, which supported only a single language and offered a restricted view of the program. HDPV is useful for even the most basic programs, along with visualizing errors in programming. Program monitoring was implemented using viztop for C++ and vizasm for Java, which makes use of bytecode information. The evaluation was done by considering certain tasks that most users prefer when learning, and the tool visualized the data structures efficiently.

Keswani et al. [6] aimed to use LLMs for summarizing lengthy texts. The model used for this purpose was Llama 2, which generates a summary relevant to the input text. In addition, the authors worked on increasing the efficiency of question answering systems by using an LLM with retrieval-augmented generation (RAG). This approach uses cosine similarity to search databases and gives the retrieved context to the LLM so that it can generate relevant answers to the question asked. The answer was then compared to the response a candidate had given. The question answering data obtained was pre-processed and fed into the RAG model. Based on the scores obtained for text summarization and question answering, it was concluded that LLMs were suitable for such tasks.

Jin et al. [7] presented a survey on text summarization using various LLMs. The overview was structured around the process or task. The models selected by the authors for the summarization process included GPT, BERT and ELMo. The data used to train these models was obtained from the Daily Mail, news broadcasters and various conference documents. The process schema consisted of pre-processing the collected text data, model training and evaluating the performance of summarization. The survey covered the various datasets used for the summarization task, the models used and the different metrics used to evaluate the generated summaries. They concluded by reviewing the scores given by the models for the text summarization task.

In our research we found that the work done in this field has some limitations: most of the available code visualization tools are not optimized, as additional data is collected; the code is visualized literally, but its abstract meaning is not captured; existing visualization tools are limited to certain data structures, leaving scope for improvement in visualization; and the existing tools visualize the variables rather than the data structure used in the code.

The key contributions of this work are listed below:
1. To assess the suitability of an LLM in place of a compiler.
2. To give visualizers a human-level understanding of code.
3. To reduce irrelevant data collection.

The approach proposed in this paper removes the need to collect the data of each state of the compiler to achieve visualization, and hence avoids collecting irrelevant data. By producing visual maps that resemble the human understanding of data structures, it helps the user better perceive a code snippet. The primary goal of this paper is to take a step forward in the direction of replacing compilers with LLMs.

III. METHODOLOGY

Our design goal is to make a tool that coding instructors and students can use instead of, or in addition to, traditional whiteboard and PowerPoint diagrams. The work begins with the creation of a user interface as a platform to handle the code input from users and to visualize the code. The user interface is coded in HTML, JavaScript and CSS for efficient usage of computer resources.

The architecture designed to accomplish our aim is shown in Fig 1. The user code in Python, Java, C or C++ is entered into the compiler. An LLM is called using APIs and tuned to provide the output in a specified JSON format. The LLM uses text extraction techniques to extract essential information and summarize the code into the desired format. The output from the LLM is shown in Fig 2.
Fig 2. Desired output JSON format from LLM
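
The figure itself is not reproduced in this text, so the exact schema is not known here. As a minimal sketch, assume the LLM replies with a JSON object that names a `data_structure` and lists `operations`, each carrying a `type`. These field names and the helper `parse_llm_output` are hypothetical and only illustrate how a reply could be checked before the rule-based stage uses it:

```python
import json

# Hypothetical schema: the field names below are assumptions for illustration,
# not the format shown in Fig 2.
REQUIRED_TOP_LEVEL = {"data_structure", "operations"}


def parse_llm_output(raw: str) -> dict:
    """Parse an LLM reply and check it against the assumed schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_TOP_LEVEL - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["operations"], list):
        raise ValueError("'operations' must be a list")
    for i, op in enumerate(data["operations"]):
        if not isinstance(op, dict) or "type" not in op:
            raise ValueError(f"operation {i} needs a 'type' field")
    return data


sample_reply = """
{
  "data_structure": "LinkedList",
  "operations": [
    {"type": "addFirst", "value": 10},
    {"type": "addLast", "value": 20},
    {"type": "removeFirst"}
  ]
}
"""
parsed = parse_llm_output(sample_reply)
```

Rejecting a malformed reply at this point keeps the frame builder simple, since it can then assume every operation has a `type`.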

Fig 1. System Architecture


Non-essential data, such as comments, are handled during pre-processing of the code. The output from the LLM is then used to build the data structures along with the actions and information present in them. Each output from the LLM is used to construct an animation frame that visualizes an action to be performed on the data structure and the information present in the data structure at that point in time. Several Large Language Models (LLMs) have been explored and assessed, including Gemini, Mistral, Llama, and GPT, for their capacity to extract relevant data from user code and refine it.

Through an extensive testing process, it was found that GPT-3.5 needed fewer prompts and tuning procedures to consistently produce the desired outcome. Based on this finding, we concluded that GPT-3.5 is the best option for the language model of the Code Confabulator.

All the animation frames are concatenated into an animation, which is then sent as output to the user interface. The animation is shown frame by frame to help the user comprehend the sequence of actions/operations and the data structure flow. In this way the user can both learn how the data structure works and use the visualization to fix and debug any issues or errors in the code.

The implementation can be split into 3 phases to get the desired animation output from the user code, as shown in Fig 3.

A. Front End

The Code Confabulator is accessed by users through a webpage hosted on the internet. Users select their preferred programming language and input their code into the specified area.

B. Compilation using GPT

The Code Confabulator starts when the user has entered and compiled their code. It works through a series of prompts that understand and refine the user's code to get the desired result. The first prompt checks the code for any errors or bugs.

If any issues are found, the visualizer shows the user an error message with a simple explanation. If the code is error-free and compiles with no problem, it sends the second prompt to GPT-3.5 to identify the data structure the user has used in the code. GPT-3.5 has already been tuned on outputs in the specified format. Drawing on these earlier interactions, it then sends the third prompt to GPT-3.5 to convert the code to the required format and generate the JSON file.

C. Visualization in Front End

The output in JSON format from GPT-3.5 is then processed further. A rule-based method creates frames for each operation on the data structure. These individual frames are combined to build an animation, which is displayed to the user. This animation helps users understand the processes and actions happening within the data structure.

Fig 3. Code Work Flow
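
The three-prompt sequence described in Section III-B can be sketched as below. The prompt wording, the function `compile_with_llm`, and the stand-in `fake_llm` are all hypothetical; the paper does not list its prompts, and a real deployment would replace `fake_llm` with an API call to the chosen model.

```python
import json
from typing import Callable

# Hypothetical prompt texts; the paper does not list the actual prompts.
ERROR_PROMPT = "Check this code for errors. Reply 'OK' or explain the error:\n"
STRUCTURE_PROMPT = "Name the main data structure used in this code:\n"
CONVERT_PROMPT = "Convert the operations on the {ds} in this code to JSON:\n"


def compile_with_llm(code: str, ask_llm: Callable[[str], str]) -> dict:
    """Run the three prompts in order; stop early if an error is reported."""
    verdict = ask_llm(ERROR_PROMPT + code).strip()
    if verdict != "OK":
        return {"status": "error", "message": verdict}

    data_structure = ask_llm(STRUCTURE_PROMPT + code).strip()
    raw_json = ask_llm(CONVERT_PROMPT.format(ds=data_structure) + code)
    return {
        "status": "ok",
        "data_structure": data_structure,
        "result": json.loads(raw_json),
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    if prompt.startswith("Check"):
        return "OK"
    if prompt.startswith("Name"):
        return "LinkedList"
    return '{"operations": [{"type": "addFirst", "value": 1}]}'


outcome = compile_with_llm("list.addFirst(1);", fake_llm)
```

Passing the model as a plain callable keeps the control flow testable without network access, which is one possible design and not necessarily the one used by the authors.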

Fig 4. Code Confabulator UI
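
The pre-processing step mentioned in Section III, which handles non-essential data such as comments, is not specified in detail. One simple possibility, shown here only as an assumption, is to strip comments and blank lines with regular expressions before the code is sent to the LLM. The function `strip_comments` is hypothetical, and the patterns are deliberately naive: they ignore comment markers that appear inside string literals.

```python
import re

# Naive patterns: they do not account for comment markers inside string
# literals, which a production pre-processor would need to handle.
C_STYLE_BLOCK = re.compile(r"/\*.*?\*/", re.DOTALL)
C_STYLE_LINE = re.compile(r"//[^\n]*")
PYTHON_LINE = re.compile(r"#[^\n]*")


def strip_comments(code: str, language: str) -> str:
    """Remove comments and blank lines before the code is sent to the LLM."""
    if language.lower() == "python":
        code = PYTHON_LINE.sub("", code)
    else:  # assumed C-style comment syntax for Java, C and C++
        code = C_STYLE_BLOCK.sub("", code)
        code = C_STYLE_LINE.sub("", code)
    lines = [line.rstrip() for line in code.splitlines()]
    return "\n".join(line for line in lines if line.strip())


java_snippet = """
// build the list
LinkedList<Integer> list = new LinkedList<>();
list.addFirst(1); /* head */
"""
cleaned = strip_comments(java_snippet, "java")
```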


IV. RESULTS

The experiment of using LLMs to compile and visualize code in multiple programming languages can be considered a success. The Code Confabulator UI is split into two broad areas: the user input area, where users enter code in their chosen programming language, and the visualization area, where the data structure and its corresponding operations are visualized. The UI also incorporates vertical scrolling to enhance accessibility and navigation while handling the space constraint of the user input area, as shown in Fig 4.

Code Confabulator was tested on the Java, Python and C programming languages to demonstrate the flexibility of using an LLM to compile the code. Programs in different languages using the same data structure, here a Linked List, were used to test GPT-3.5’s understanding and output format. The cases and output format are shown in Fig 5.

The Linked List package was used in the Java code for ease of use. “addFirst”, “addLast”, “addIndex”, “removeFirst”, “removeLast”, “removeIndex” and a custom Java “search” function were used to show the different operations that can be performed on Java Linked Lists.

The same packages are not present in Python or C. Hence, custom classes were coded in Python (using “deque”) and C. Despite the differences in packages and code, we observed that the LLM, GPT-3.5, provided the output JSON using the types and operations specified in the prompt.

To test the LLM's ability to identify errors and bugs in user code, multiple cases were tested, and the LLM was observed to provide an accurate output that explains the error in detail. One such example of code with an error and the output given by the LLM is shown in Fig. 6.

Fig 5. Input and LLM JSON output for Java, Python and C code
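
The Python test input is described only as a custom class built on “deque”. A hypothetical reconstruction of such a class, using the same operation names as the Java test, could look like this. The class and its method bodies are assumptions for illustration and are not taken from the released code.

```python
from collections import deque


class LinkedList:
    """Thin wrapper over collections.deque using the Java-style method names."""

    def __init__(self):
        self._items = deque()

    def addFirst(self, value):
        self._items.appendleft(value)

    def addLast(self, value):
        self._items.append(value)

    def addIndex(self, index, value):
        self._items.insert(index, value)

    def removeFirst(self):
        return self._items.popleft()

    def removeLast(self):
        return self._items.pop()

    def removeIndex(self, index):
        value = self._items[index]
        del self._items[index]
        return value

    def search(self, value):
        """Return the position of value, or -1 if it is absent."""
        try:
            return self._items.index(value)
        except ValueError:
            return -1

    def to_list(self):
        return list(self._items)


numbers = LinkedList()
numbers.addFirst(2)      # [2]
numbers.addLast(4)       # [2, 4]
numbers.addIndex(1, 3)   # [2, 3, 4]
numbers.addFirst(1)      # [1, 2, 3, 4]
numbers.removeLast()     # [1, 2, 3]
numbers.removeIndex(1)   # [1, 3]
position = numbers.search(3)
```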

Fig 6. Java code with error and error message returned to the user


a. Insertion of values into Linked List using “addFirst”, “addLast” and “addIndex”

b. Removal of values from Linked List using “removeFirst”, “removeLast” and “removeIndex”

c. Searching of element in Linked List with pointer

d. Insertion of elements into Binary Tree

e. Searching of element 3
Fig 7. Animation frames of operations performed on Linked List and Binary Trees
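
A rule-based frame builder for the Linked List operations in Fig 7 a-c might work roughly as follows. This is a sketch under assumptions: the operation records, their field names, and the function `linked_list_frames` are hypothetical, and a Python list stands in for the linked list so that each frame is simply a snapshot of the node values after one operation.

```python
def linked_list_frames(operations):
    """Apply each operation to a list and snapshot the state after it."""
    state, frames = [], []
    for op in operations:
        kind = op["type"]
        found = None  # only set by the search operation
        if kind == "addFirst":
            state.insert(0, op["value"])
        elif kind == "addLast":
            state.append(op["value"])
        elif kind == "addIndex":
            state.insert(op["index"], op["value"])
        elif kind == "removeFirst":
            state.pop(0)
        elif kind == "removeLast":
            state.pop()
        elif kind == "removeIndex":
            state.pop(op["index"])
        elif kind == "indexOf":
            found = state.index(op["value"]) if op["value"] in state else -1
        else:
            raise ValueError(f"unsupported operation: {kind}")
        frames.append({"operation": kind, "nodes": list(state), "found": found})
    return frames


ops = [
    {"type": "addFirst", "value": 5},
    {"type": "addLast", "value": 9},
    {"type": "addIndex", "index": 1, "value": 7},
    {"type": "indexOf", "value": 9},
    {"type": "removeFirst"},
]
frames = linked_list_frames(ops)
```

Each frame copies the node list, so later operations cannot alter frames that were already produced.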

Using the output JSON from the LLM, a rule-based approach is used to generate frames for each operation on the data structure. The Code Confabulator then takes these frames and provides a step-by-step animation to the user for whatever process is occurring in the code. For the purpose of experimentation, two data structures have been used, namely Linked Lists and Binary Trees. For Linked Lists, the main operations “addFirst”, “addLast”, “addIndex”, “removeFirst”, “removeLast”, “removeIndex” and “indexOf” have been used to test the versatility of the LLM.

Similarly, for Binary Trees, insertion and traversal to search for an element were implemented to test the LLM. The LLM returned the required JSON and Code Confabulator used it to create frames for animation.

The frames for each individual operation are shown in Fig. 7. Fig 7 a. visualizes the insertion of values into the Linked List, and Fig 7 b. visualizes their deletion or removal. Fig 7 c. visualizes the frames that show how the pointer moves to find elements in a Linked List. Fig 7 d. shows the frames for the insertion of elements into a binary tree. Fig 7 e. contains the frames that represent the search operation in a binary tree.

It should be noted that data structures similar to Linked Lists, such as Arrays and ArrayLists, and data structures similar to Binary Trees, such as Heaps, were not individually implemented. This is due to the flexibility of the visualizer and the LLM. Minor changes to the code for Linked List and Binary Tree visualization can result in the visualization of structures like Arrays and Heaps. This shows the flexibility of Code Confabulator.
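
The Binary Tree case can be sketched in the same spirit. The paper does not state the insertion rule used for its trees, so the code below assumes binary-search-tree ordering; `Node`, `insert`, and `search_frames` are hypothetical names. The search returns one frame per visited node, which corresponds to the kind of step-by-step highlighting described for Fig 7 e.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value using binary-search-tree ordering and return the root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root


def search_frames(root, target):
    """Return one frame per visited node while searching for target."""
    frames = []
    node = root
    while node is not None:
        frames.append({"visiting": node.value, "found": node.value == target})
        if node.value == target:
            break
        node = node.left if target < node.value else node.right
    return frames


root = None
for value in [5, 2, 8, 3, 9]:
    root = insert(root, value)
path = search_frames(root, 3)
```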


V. CONCLUSION AND FUTURE SCOPE

The idea of replacing a compiler with AI is not far-fetched, and the aim of this paper is to spread this idea. We have achieved significant improvement in terms of optimization, as the LLM gives only the data necessary for visualization. The animation of the code visualizer has been improved by leveraging front-end web technologies, i.e., HTML, CSS and JavaScript. Finally, using an LLM takes Code Confabulator a step closer to understanding the abstract meaning of the written code, thus aiding in crafting better visualization techniques.

Code Confabulator has demonstrated strong performance with the data structures the LLM has been trained with. Since the approach is promising, it is expected to perform well for other data structures too. To accomplish that, a substantial amount of data is needed to train the LLM. Hence, the project has been open-sourced and is hosted on GitHub as “Code Confabulator” [21].

Since the approach relies on obtaining data from an LLM, the extent to which the LLM depends on variable names versus its actual understanding of the code remains uncertain; further experimentation is required to clarify this. Despite Code Confabulator’s good performance on the trained data structures, its effectiveness with other data structures remains unproven. Future efforts will aim to extend visualization support to additional data structures.

REFERENCES

[1] Kumar N S, Revanth Babu P N, Sai Eashwar K S, Srinath M P, Sreyans Bothra, "Code-Viz: Data Structure Specific Visualization and Animation Tool For User-Provided Code", 2021 International Conference on Smart Generation Computing, Communication and Networking.
[2] Matthew Heinsen Egan, Chris McDonald, "An evaluation of SeeC: a tool designed to assist novice C programmers with program understanding and debugging", Computer Science Education, Volume 31, 340-373.
[3] Ryosuke Ishizue, Kazunori Sakamoto, Hironori Washizaki, Yoshiaki Fukazawa, "PVC.js: visualizing C programs on web browsers for novices", Heliyon, Volume 6, Issue 4, April 2020.
[4] Philip J. Guo, "Online Python Tutor: Embeddable Web-Based Program Visualization for CS Education", Proceeding of the 44th ACM technical symposium on Computer Science Education.
[5] J. Sundararaman, G. Back, "Hdpv: interactive, faithful, in-vivo runtime state visualization for c/c++ and java", Proceedings of the 4th ACM Symposium on Software Visualization, ACM, 2008.
[6] Keswani, Gunjan, Wani Bisen, Hirkani Padwad, Yash Wankhedkar, Sudhanshu Pandey, and Ayushi Soni. "Abstractive Long Text Summarization Using Large Language Models." International Journal of Intelligent Systems and Applications in Engineering 12, no. 12s (2024): 160-168.
[7] Jin, Hanlei, Yang Zhang, Dan Meng, Jun Wang, and Jinghua Tan. "A comprehensive survey on process-oriented automatic text summarization with exploration of llm-based methods." arXiv preprint arXiv:2403.02901 (2024).
[8] Mahesh, M., and P. Sivraj. "DrawCode: Visual tool for programming microcontrollers." In 2017 3rd International Conference on Advances in Computing, Communication & Automation (ICACCA)(Fall), pp. 1-6. IEEE, 2017.
[9] Jayaraman, Swaminathan, Bharat Jayaraman, and Demian Lessa. "Compact visualization of Java program execution." Software: Practice and Experience 47, no. 2 (2017): 163-191.
[10] Pecheti, Shiva Teja, H. M. Basavadeepthi, Nithin Kodurupaka, and Meena Belwal. "Recursive Descent Parser for Abstract Syntax Tree Visualization of Mathematical Expressions." In 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 1-6. IEEE, 2023.
[11] Vishwas, Gade, Nayini Sai Nithin, Peddineni Varshith, and Meena Belwal. "Unveiling the World of Code Obfuscation: A Comprehensive Survey." In 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 1-8. IEEE, 2023.
[12] Helminen, Juha & Malmi, Lauri, "Jype - A Program Visualization and Programming Exercise Tool for Python", Proceedings of the ACM Conference on Computer and Communications Security.
[13] Murali, Ritwik, Rajkumar Sukumar, Mary Sanjana Gali, and Veeramanohar Avudaiappan. "Empowering Novice Programmers with Visual Problem Solving tools." In Proceedings of the 16th Annual ACM India Compute Conference, pp, 2023.
[14] Pichler, Christoph, Paley Li, Roland Schatz, and Hanspeter Mössenböck. "Hybrid Execution: Combining Ahead-of-Time and Just-in-Time Compilation." In Proceedings of the 15th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages, pp. 39-49. 2023.
[15] Baghdadi, Riyadh, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman Amarasinghe. "Tiramisu: A polyhedral compiler for expressing fast and portable code." In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 193-205. IEEE, 2019.
[16] Brauckmann, Alexander, Andrés Goens, Sebastian Ertel, and Jeronimo Castrillon. "Compiler-based graph representations for deep learning models of code." In Proceedings of the 29th International Conference on Compiler Construction, pp. 201-211. 2020.
[17] Ben-Nun, Tal, Alice Shoshana Jakobovits, and Torsten Hoefler. "Neural code comprehension: A learnable representation of code semantics." Advances in neural information processing systems 31 (2018).
[18] Mesbah, Ali, Andrew Rice, Emily Johnston, Nick Glorioso, and Edward Aftandilian. "Deepdelta: learning to repair compilation errors." In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 925-936. 2019.
[19] J. Sundararaman, G. Back, "Hdpv: interactive, faithful, in-vivo runtime state visualization for c/c++ and java", Proceedings of the 4th ACM Symposium on Software Visualization, ACM, 2008.
[20] Helminen, Juha & Malmi, Lauri, "Jype - A Program Visualization and Programming Exercise Tool for Python", Proceedings of the ACM Conference on Computer and Communications Security.
[21] https://github.com/orgs/Code-Confabulator/repositories
