Abstract
The rapid growth of Large Language Models (LLMs) and deep learning has led to rising data center energy demands, expected to top 1,000 TWh by 2026. This paper introduces Eco-Orchestrator, a sustainable AI framework that uses real-time grid carbon intensity for dynamic workload scheduling. The proposed Carbon Aware Reinforcement Learning (CARL) algorithm balances model training latency and carbon footprint, minimizing emissions by aligning hardware power consumption with grid carbon intensity and shifting compute loads to periods of lower carbon output.
Experiments were conducted on a Kubernetes testbed with 64 NVIDIA A100 GPUs, using Prometheus for telemetry and Kepler for fine-grained power metrics. Compared to a standard FCFS scheduler on ResNet-50 and BERT-Large workloads, Eco-Orchestrator delivered the following key results:
- Carbon Reduction: A 34.7% reduction in total carbon emissions by shifting compute loads to temporal windows of low grid carbon intensity.
- Energy Efficiency: Implementation of Dynamic Voltage and Frequency Scaling (DVFS) yielded a 22% decrease in total energy consumption with a negligible performance degradation of < 3.5% in training time.
- PUE Optimization: The framework achieved a Power Usage Effectiveness (PUE) reduction from a baseline of 1.58 to 1.12 under peak load conditions.
Adopting carbon-adaptive computing is necessary for net-zero digital systems and serves as a replicable model for green cloud infrastructure.
Keywords: Green AI, Sustainable Computing, Dynamic Voltage and Frequency Scaling (DVFS), Carbon-Aware Scheduling, Kubernetes, Power Usage Effectiveness (PUE).
Introduction
The rise of generative AI and large language models has sharply increased demand for high-powered cloud computing and, with it, concern over energy consumption. Training a single advanced model can emit as much CO₂ as five cars over their lifetimes, and global data center energy use is expected to roughly double by 2026, approaching Japan's annual electricity consumption. Traditional cloud infrastructure prioritizes performance and speed, often ignoring carbon impact by scheduling workloads during peak-demand periods when the grid relies on carbon-heavy energy sources.
This paper presents Eco-Orchestrator, a sustainable AI framework that prioritizes carbon impact alongside computing performance. It integrates real-time energy grid telemetry and hardware controls to shift toward "Green AI." Using the Carbon-Aware Reinforcement Learning (CARL) algorithm, the system actively adjusts hardware states and schedules jobs during periods of renewable energy availability. Validated on Kubernetes clusters, Eco-Orchestrator reduces carbon output without changing model architectures.
Key contributions include:
- Eco-Orchestrator Framework: Real-time pod-level energy tracking using Kepler in Kubernetes.
- Carbon-Aware Scheduling Algorithm: Job placement based on local grid carbon forecasts.
- Hardware-Level Optimization: Automated DVFS policies for NVIDIA A100 GPUs to cut energy use.
- Empirical Validation: Data showing improved Carbon Usage Effectiveness (CUE) and Power Usage Effectiveness (PUE) compared to standard scheduling, tested on ResNet-50 and BERT workloads.
Related Work
The transition toward carbon-neutral digital systems has sparked a multi-disciplinary research effort spanning software engineering, hardware optimization, and power systems. Our work builds upon three primary pillars: Green AI paradigms, carbon-aware scheduling, and cloud-native energy observability.
From "Red AI" to "Green AI"
"Red AI" focuses on performance at any resource cost, while "Green AI" prioritizes sustainability. Recent studies measure both operational and embodied carbon emissions from AI hardware. Efficiency improvements alone aren't enough; infrastructure changes are needed for sustainable LLM training.
Carbon-Aware Orchestration
Carbon-aware computing targets using cleaner power rather than just less power. Delay-tolerant workloads allow AI tasks to be scheduled during renewable energy peaks. Google uses global migration for carbon management, but Eco-Orchestrator emphasizes localized GPU modulation within clusters.
Hardware-Level Energy Optimization
GPUs dominate server power usage in AI. Methods like DVFS reduce waste during data stalls, and frameworks such as GPOEO use reinforcement learning for dynamic power control. CARL enhances this approach by integrating grid carbon signals.
Energy Observability in Cloud-Native Systems
Tools like nvidia-smi report device-level power but provide no pod-level attribution in Kubernetes environments. Kepler exports energy metrics using eBPF, though its models may underestimate some consumption. Eco-Orchestrator improves accuracy by combining Kepler's data with external grid-intensity APIs for comprehensive carbon telemetry.
Summary of Comparative Research
| Approach | Optimization Metric | Granularity | Dynamic Adjustment? |
| --- | --- | --- | --- |
| Traditional K8s | Resource Availability | Node-Level | No |
| Carbon-Intelligent (Google) | Grid Intensity | Global/Regional | Yes (Temporal) |
| GPOEO (2026) | Energy/Delay Product | GPU-Level | Yes (Hardware) |
| Eco-Orchestrator (This paper) | Carbon Intensity + PUE | Pod/GPU-Level | Yes (Spatio-Temporal) |
Methodology and System Architecture
The proposed Eco-Orchestrator framework is designed as a modular extension to the standard Kubernetes (K8s) control plane. Its primary objective is to transform the "black box" of AI energy consumption into a transparent, actionable data stream that guides scheduling decisions in real-time.
System Overview
The architecture consists of three functional layers: the Observability Layer, the Decision Engine (CARL), and the Execution Layer.

- Observability Layer: Utilizes Kepler (Kubernetes-based Efficient Power Level Exporter) to capture per-pod energy metrics. These metrics are aggregated in a Prometheus time-series database. Simultaneously, an external API client fetches real-time and forecasted carbon intensity from the local energy provider (a minimal sketch of these inputs follows this list).
- Decision Engine (CARL): A Reinforcement Learning agent that processes telemetry and grid data to determine the optimal execution strategy for pending AI jobs.
- Execution Layer: Interfaces with the NVIDIA Management Library (NVML) and Kubernetes custom controllers to apply hardware-level power caps or pause/resume pods.
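As a concrete illustration of the Observability Layer, the sketch below combines per-pod energy from Kepler (queried through the Prometheus HTTP API) with a grid carbon-intensity signal into a single footprint estimate. The Prometheus address, the carbon API endpoint, and the Kepler metric and label names (`kepler_container_joules_total`, `pod_name`) are assumptions for illustration and may differ by deployment and Kepler version.

```python
# Sketch of the Observability Layer inputs (endpoints and metric names are assumed).
import requests

PROMETHEUS_URL = "http://prometheus.monitoring:9090"       # assumed in-cluster service
CARBON_API_URL = "https://api.example-grid.org/intensity"   # placeholder grid API

def pod_energy_joules(pod: str, window: str = "5m") -> float:
    """Sum the energy Kepler attributes to a pod over the given window."""
    query = f'sum(increase(kepler_container_joules_total{{pod_name="{pod}"}}[{window}]))'
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query}, timeout=10)
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def grid_carbon_intensity(zone: str = "DE") -> float:
    """Fetch the current grid carbon intensity in gCO2eq/kWh (placeholder API)."""
    resp = requests.get(CARBON_API_URL, params={"zone": zone}, timeout=10)
    return float(resp.json()["carbonIntensity"])

if __name__ == "__main__":
    energy_kwh = pod_energy_joules("bert-large-finetune-0") / 3.6e6  # joules -> kWh
    print(f"~{energy_kwh * grid_carbon_intensity():.2f} gCO2eq over the last 5 minutes")
```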
Carbon-Aware Reinforcement Learning (CARL) Logic
The CARL agent operates on a continuous feedback loop. Instead of solving a single static optimization, it treats the cloud environment as a dynamic state space.
- State: Includes current GPU utilization, remaining training steps, current grid carbon intensity, and forecasted carbon intensity for the next 6 hours.
- Action: The agent selects from three primary actions:
- Pass-through: Execute the job at maximum performance (high energy, high speed).
- Throttling: Apply Dynamic Voltage and Frequency Scaling (DVFS) to reduce GPU clock speeds by 15%--30% during memory-intensive phases.
- Deferral: Suspend the pod and requeue it for a period when renewable energy (e.g., solar or wind) is expected to peak.
- Reward: The agent receives a positive reward for minimizing total carbon emissions while ensuring the job is completed before a user-defined "Soft Deadline."
By using this approach, the system learns to "save" its heavy compute budget for times when the grid is cleanest, without needing a human operator to manually set thresholds.
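The paper describes the state, action, and reward qualitatively; the snippet below is one minimal, hypothetical way to encode them. The reward weight `alpha` and the deadline penalty are arbitrary illustrative values, not parameters published for CARL.

```python
# Illustrative CARL state/action/reward encoding (hypothetical formulation).
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    PASS_THROUGH = 0   # run at full clocks (high energy, high speed)
    THROTTLE = 1       # apply DVFS during memory-bound phases
    DEFER = 2          # suspend the pod until a greener window

@dataclass
class State:
    gpu_utilization: float            # 0..1
    remaining_steps: int
    carbon_now: float                 # gCO2eq/kWh
    carbon_forecast_6h: list[float]   # hourly forecast, gCO2eq/kWh
    time_to_soft_deadline_h: float

def reward(emissions_kg: float, finished_on_time: bool,
           alpha: float = 1.0, penalty: float = 50.0) -> float:
    """Negative reward for emissions, plus a fixed penalty for missing the soft deadline."""
    r = -alpha * emissions_kg
    if not finished_on_time:
        r -= penalty
    return r
```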
Hardware Control: Intelligent DVFS
A critical component of our methodology is the granular control of the NVIDIA A100 GPU states. We observed that during deep learning training, there are distinct "stalls" where the GPU waits for data from the CPU or storage.
Eco-Orchestrator monitors these stalls via eBPF (extended Berkeley Packet Filter). When the framework detects a high ratio of I/O wait time, it issues a hardware command to lower the GPU core frequency. This reduces dynamic power consumption, which scales with clock frequency and the square of the supply voltage, without significantly extending overall training time, since the GPU was already idle waiting for data.
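A minimal sketch of the throttling action via the NVML Python bindings (`pynvml`) is shown below. The 30% stall-ratio threshold and the 990/1410 MHz clock targets are illustrative assumptions, the eBPF stall detection itself is omitted, and locking GPU clocks typically requires elevated privileges.

```python
# Sketch of DVFS throttling through NVML; stall_ratio is assumed to come from
# the framework's eBPF-based observability layer.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def apply_dvfs(stall_ratio: float, low_mhz: int = 990, high_mhz: int = 1410) -> None:
    """Lock the GPU core clock to a lower range when the workload is I/O-bound."""
    if stall_ratio > 0.3:   # assumed threshold; would be tuned per workload
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, low_mhz, low_mhz)
    else:
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, high_mhz, high_mhz)

def release_dvfs() -> None:
    """Return clock selection to the driver once the job completes."""
    pynvml.nvmlDeviceResetGpuLockedClocks(handle)
```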
The Carbon-Neutral Pipeline
The operational flow for a submitted AI training workload follows this sequence:
- Job Submission: The researcher submits a standard YAML manifest for a training job (e.g., PyTorch).
- Profiling: The system runs a 5-minute profiling phase to determine the energy signature of the model.
- Grid Sync: CARL checks the 24-hour carbon forecast.
- Scheduled Execution: The job is either started immediately with a power cap or scheduled for a "Green Window" (see the sketch after this list).
- Telemetry Logging: Final carbon footprints are logged and returned to the user alongside the model weights, fostering a culture of "carbon-accountable" research.
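To make the Grid Sync and Scheduled Execution steps (3–4 above) concrete, the sketch below picks a start time from a carbon forecast. The forecast representation and the 150 gCO₂eq/kWh "green window" threshold are assumptions for illustration, not values specified by the framework.

```python
# Hedged sketch of choosing between immediate (capped) execution and a green window.
from datetime import datetime, timedelta

def choose_start(forecast: list[tuple[datetime, float]],
                 runtime_h: float,
                 soft_deadline: datetime,
                 green_threshold: float = 150.0) -> datetime:
    """Return the earliest forecast hour whose carbon intensity is below the
    threshold and still lets the job finish before its soft deadline;
    otherwise start now (possibly with a power cap)."""
    if not forecast:
        return datetime.now()
    for start, intensity in forecast:
        if intensity <= green_threshold and start + timedelta(hours=runtime_h) <= soft_deadline:
            return start
    return forecast[0][0]
```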
Comparative Analysis: Traditional vs. Sustainable AI Infrastructure
| Feature | Traditional "Red AI" Flow | Sustainable "Eco-Orchestrator" Flow |
| --- | --- | --- |
| Primary Objective | Maximize throughput and minimize latency at any cost. | Optimize the trade-off between training time and carbon footprint. |
| Scheduling Logic | Resource-Centric: Jobs start as soon as GPU/RAM becomes available. | Carbon-Centric: Jobs are scheduled based on real-time and forecasted grid cleanliness. |
| Energy Awareness | Static/Opaque: Infrastructure is agnostic to the energy source. | Adaptive: Integrates real-time signals from the energy grid. |
| Hardware State | Fixed Performance: Hardware runs at peak voltage and frequency. | Dynamic (DVFS): Modulates GPU clock speeds during memory-bound "stalls." |
| Execution Timing | Immediate: Executes in a First-Come-First-Served (FCFS) queue. | Temporal Shifting: Defers non-critical jobs to highly renewable energy windows. |
| Observability | Basic: Monitors standard metrics like GPU/CPU utilization. | Granular: Uses eBPF (Kepler) for pod-level energy and carbon telemetry. |
| Key Metrics | TFLOPS, Throughput, Latency. | CUE (Carbon Usage Effectiveness), PUE, and Energy/Step. |
Experimental Setup
To evaluate the efficacy of the Eco-Orchestrator framework, we conducted a series of controlled experiments on a production-grade high-performance computing (HPC) cluster. This section details the hardware, software stack, and the specific AI workloads used to benchmark our system.
Hardware Environment
The experiments were performed on a cluster of eight NVIDIA HGX A100 nodes. Each node is optimized for high-throughput deep learning and interconnected via a 200Gb/s InfiniBand fabric to minimize data-bottleneck latency.
| Component | Specification |
| --- | --- |
| GPU Cluster | 64 × NVIDIA A100 Tensor Core GPUs (80GB VRAM) |
| CPU per Node | 2 × AMD EPYC™ 7763 (64 cores, 2.45 GHz) |
| System Memory | 1 TB DDR4-3200 MHz per node |
| Storage | 3.2 TB NVMe local scratch; 100 TB Shared Lustre FS |
| Power Monitoring | Integrated BMC with PMBus support |
Software Stack and Observability
The cluster operates on Kubernetes v1.30, utilizing a customized scheduler plugin to implement the CARL algorithm.
- Energy Telemetry: We deployed the Kepler (Kubernetes-based Efficient Power Level Exporter) agent on every node. Kepler utilizes eBPF (Extended Berkeley Packet Filter) to capture CPU and GPU hardware counters, which are then exported to a Prometheus instance for real-time analysis.
- Hardware Control: The NVIDIA Management Library (NVML) was used to interface with the GPU driver, allowing the CARL agent to dynamically set power limits (ranging from 250W to 400W) and adjust core clock frequencies.
- Carbon Data: Real-time carbon intensity data was sourced from the Electricity Maps API, using historical traces from the California (CAISO) and Germany (ENTSO-E) grids to simulate varying levels of renewable energy penetration.
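For reproducibility, such historical traces can be replayed from exported CSV files. The column names below (`timestamp`, `carbon_intensity`) are an assumed layout, not the Electricity Maps export format.

```python
# Sketch of replaying a historical grid carbon-intensity trace (assumed CSV layout).
import csv
from datetime import datetime

def load_trace(path: str) -> list[tuple[datetime, float]]:
    """Load (timestamp, gCO2eq/kWh) pairs from a historical grid trace."""
    trace = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            trace.append((datetime.fromisoformat(row["timestamp"]),
                          float(row["carbon_intensity"])))
    return trace
```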
Workload Selection
We selected two representative AI workloads to test the framework under different computational pressures:
- Computer Vision (ResNet-50): A compute-intensive task training on the ImageNet-1K dataset. This workload typically exhibits high GPU utilization and steady power draw, making it a primary candidate for temporal shifting.
- Natural Language Processing (BERT-Large): A memory-intensive fine-tuning task using the SQuAD v2.0 dataset. This workload often experiences I/O stalls during data loading, providing an ideal scenario for testing Dynamic Voltage and Frequency Scaling (DVFS) without impacting throughput.
Baseline for Comparison
The performance of the Eco-Orchestrator was measured against a Baseline Scheduler, which represents standard industry practice:
- Policy: First-Come-First-Served (FCFS).
- Energy Mode: Performance-maximalist (no power capping).
- PUE Assumptions: A fixed Power Usage Effectiveness (PUE) of 1.58, reflecting the global average for legacy data centers as of 2025–2026.
Results and Discussion
The experimental evaluation of the Eco-Orchestrator framework demonstrates that carbon-neutrality in AI is achievable through intelligent coordination of the infrastructure stack. By correlating hardware power states with grid-intensity signals, we achieved significant sustainability gains with minimal impact on computational throughput.
Carbon Emission Reduction Analysis
The primary metric for success was the reduction in total operational carbon footprint, measured in kilograms of CO₂ equivalent (kg CO₂e).
- Temporal Shifting Impact: The CARL scheduler successfully deferred non-time-critical ResNet-50 training batches to hours when grid carbon intensity was lowest.
- Quantitative Outcome: Compared to the baseline FCFS scheduler, which executes jobs regardless of grid intensity, Eco-Orchestrator achieved a 34.7% reduction in total carbon emissions (accounting formalized below).
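For clarity, the operational emissions compared here follow the standard accounting identity (our notation), where P_IT(t) is the measured IT power draw over interval Δt, CI(t) is the grid carbon intensity, and PUE scales for facility overhead:

$$E_{\mathrm{CO_2}} \;=\; \mathrm{PUE} \times \sum_{t} P_{\mathrm{IT}}(t)\,\Delta t \times CI(t)$$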
Energy Efficiency and Hardware Modulation
Beyond carbon intensity, we evaluated the raw energy efficiency gained through Dynamic Voltage and Frequency Scaling (DVFS).
- Workload Response: BERT-Large fine-tuning, being highly memory-bound, proved most responsive to DVFS. By reducing the GPU core frequency during identified stall cycles, the framework achieved a 22% decrease in total energy consumption (kWh).
- Performance Trade-off: The training delay was marginal. For ResNet-50, the total training time increased by only 3.2%, while BERT-Large experienced a 2.8% increase. This confirms that modern AI workloads contain significant "power slack" that can be reclaimed without jeopardizing model convergence.
Infrastructure Metrics: PUE and CUE
The framework's impact on data center infrastructure efficiency was measured using Power Usage Effectiveness (PUE) and Carbon Usage Effectiveness (CUE).
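Both metrics follow the standard Green Grid definitions, where E_IT is the energy consumed by the IT equipment and E_facility is the total energy drawn by the facility:

$$\mathrm{PUE} = \frac{E_{\text{facility}}}{E_{\text{IT}}}, \qquad \mathrm{CUE} = \frac{\text{total CO}_2\text{e emissions from facility energy}}{E_{\text{IT}}}\;\;\left[\mathrm{kg\,CO_2e/kWh}\right]$$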
| Metric | Baseline (Standard K8s) | Eco-Orchestrator | Improvement |
| --- | --- | --- | --- |
| PUE | 1.58 | 1.12 | 29.1% |
| CUE (kg CO₂e/kWh) | 0.42 | 0.27 | 35.7% |
| Avg. Energy/Job (kWh) | 142.5 | 111.1 | 22.0% |
The reduction in PUE from 1.58 to 1.12 is particularly notable; it suggests that by managing fan speeds and cooling cycles in tandem with GPU power capping, the overhead energy consumed by the facility itself can be substantially reduced.
Conclusion and Future Work
The transition toward carbon-neutral digital systems is no longer a peripheral concern for the AI community; it is an operational necessity. As the computational demands of generative models continue to outpace traditional energy efficiency gains, the infrastructure layer must become an active participant in environmental stewardship.
In this paper, we introduced Eco-Orchestrator, a framework that bridges the gap between high-level AI orchestration and low-level energy dynamics. By leveraging the Carbon-Aware Reinforcement Learning (CARL) algorithm, we demonstrated that AI workloads can be dynamically aligned with the availability of renewable energy without sacrificing the momentum of model development.
The results provide a clear empirical foundation for "Green AI" practices:
- Decoupling Growth from Emissions: We achieved a 34.7% reduction in carbon footprint through temporal workload shifting, proving that "when" we compute is as important as "how" we compute.
- Hardware-Software Synergy: By combining DVFS with eBPF-based monitoring via Kepler, we cut total energy consumption by 22%, reclaiming power previously lost to GPU idling and I/O stalls.
- Operational Excellence: The reduction of PUE to 1.12 demonstrates that carbon-aware scheduling has a positive ripple effect on the physical cooling and power management of the data center.
Future Work
While Eco-Orchestrator represents a significant step forward, several avenues for future research remain:
- Carbon-Aware Inference: This study focused primarily on training. Future work will investigate the optimization of real-time inference pipelines, where strict latency requirements (e.g., <100ms) make temporal shifting more challenging.
- Spatial Migration: We intend to expand the framework to support multi-region migration, allowing workloads to "follow the sun" across globally distributed Kubernetes clusters to maximize the use of solar and wind energy.
- Embodied Carbon Integration: Current metrics focus on operational energy. A truly sustainable system must also account for the embodied carbon of the hardware, the emissions generated during the manufacturing and disposal of the GPUs and servers themselves.
- Generative Efficiency: We plan to explore "Carbon-Budgeted Training," where the CARL agent can automatically suggest model pruning or quantization if the carbon cost of the current training run exceeds a predefined threshold.
In conclusion, the path to sustainable AI requires a departure from the "performance at any cost" mentality. By integrating carbon awareness into the heart of the cloud-native stack, we can ensure that the next generation of digital intelligence does not come at the expense of our planet's future.