Open-Source RISC-V GPU Vortex 3.0 Adds Full 3D Pipeline, Vulkan, ASIC Flows

Georgia Tech ships rasterizer, texture units, and vortexpipe Vulkan driver targeting ASAP7 and SAED14

Georgia Tech released Vortex 3.0 on June 9, 2026, completing what may be the most capable openly synthesizable GPU in existence: a RISC-V-based design that now pairs a full fixed-function 3D graphics pipeline with Vulkan driver support, machine-learning tensor core extensions, and productized ASIC synthesis flows targeting three process design kits (ASAP7, SAED14, and NanGate 15nm). The release closes a gap that has existed since the project's founding: prior versions had an ISA designed to support graphics, but lacked the rasterizer hardware and graphics driver software to use it. Any academic research group or chip startup can now download the GPU design, run it on a field-programmable gate array (FPGA), and send the same RTL source through a synthesis flow targeting a 7nm, 14nm, or 15nm predictive process node, all without paying a licensing fee to a proprietary GPU IP vendor.

Prior Open-Source GPUs Left Gaps Vortex 3.0 Now Fills

The open-source hardware ecosystem has had GPU projects for over a decade, but each carried significant limitations. MIAOW, developed at the University of Wisconsin–Madison, cloned AMD's Southern Islands instruction set architecture, giving it no independent legal standing and limiting it to compute-only workloads with no full memory system. Nyuzi, developed at Binghamton University, used a custom ISA rather than RISC-V, predating the standardization and toolchain ecosystem that makes RISC-V practical. "Before RISC-V, building full-stack GPU hardware would have been an insurmountable task that only large companies would have had the resources to do," said Blaise Tine, the Georgia Tech PhD student who leads the Vortex project.

Vortex's original 2021 MICRO-54 paper described a GPU that extended the RISC-V instruction set with minimal additions to support Single Instruction, Multiple Threads (SIMT) execution, keeping the changes small enough that the existing RISC-V compiler and toolchain ecosystem could be carried forward without rewriting. Version 3.0 now delivers the graphics hardware that the ISA design always anticipated.

Rasterizer, Texture Units, Output Mergers: How the 3D Pipeline Works

The fixed-function graphics stack in Vortex 3.0 follows the canonical three-stage pipeline that has defined GPU rendering since the early 2000s: a rasterizer converts triangle geometry into fragments by performing tile binning and coverage tests; texture units sample and filter texture maps against those fragments; and output mergers blend, depth-test, and write final pixel values to the frame buffer. In Vortex 3.0, all three stages are implemented as synthesizable RTL blocks under hw/rtl/{raster,tex,om}/, described in the SystemVerilog hardware description language rather than as software simulations.
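To make the rasterizer and output-merger roles concrete, here is a minimal software sketch of a coverage test followed by a depth test. It is not the Vortex RTL or its algorithm: the function names, the edge-function method, and the single flat depth per triangle are all assumptions chosen for brevity.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: positive when P lies to the left of edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, depth, width, height, depth_buf, color_buf, color):
    """Coverage-test every pixel centre, then depth-test and write survivors.

    Hypothetical helper: uses one flat depth value per triangle instead of
    interpolating depth across the surface.
    Returns the number of pixels written.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return 0  # degenerate triangle covers nothing
    written = 0
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Covered when all edge functions share the sign of the area,
            # so both winding orders are accepted.
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            # Output-merger step: keep the fragment only if it is nearer.
            if inside and depth < depth_buf[y][x]:
                depth_buf[y][x] = depth
                color_buf[y][x] = color
                written += 1
    return written
```

A hardware rasterizer would evaluate the same edge functions incrementally and only over the tiles a triangle touches, rather than scanning the whole frame as this sketch does.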

The host-facing public API in graphics.h exposes triangle setup and tile binning as the vortex::graphics::Binning() function, which produces the on-wire rast_prim_t data stream that the hardware rasterizer consumes. That interface was deliberately designed so that external drivers — including Vulkan, HIP, and OpenGL drivers — can target Vortex as a real graphics device by feeding the rasterizer its expected input format without knowing the implementation details of the RTL.
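Tile binning itself is simple to illustrate. The sketch below is a generic bounding-box binner, not the vortex::graphics::Binning() implementation and not the rast_prim_t format; the function name, the 16-pixel tile size, and the dictionary output are assumptions made for the example.

```python
def bin_triangles(triangles, width, height, tile=16):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    Hypothetical helper: bins by screen-clamped bounding box, which is
    conservative (a tile can be listed without being covered).
    """
    bins = {}
    for idx, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the bounding box to the visible screen area.
        min_x, max_x = max(min(xs), 0), min(max(xs), width - 1)
        min_y, max_y = max(min(ys), 0), min(max(ys), height - 1)
        if min_x > max_x or min_y > max_y:
            continue  # triangle lies entirely off-screen
        for ty in range(int(min_y) // tile, int(max_y) // tile + 1):
            for tx in range(int(min_x) // tile, int(max_x) // tile + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins
```

Binning up front lets a rasterizer work tile by tile, so per-tile depth and color data can stay in fast local storage while every primitive touching that tile is processed.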

The new Mesa Gallium driver, named vortexpipe, implements the Vulkan API using the lavapipe Installable Client Driver (ICD) framework. Mesa is the dominant open-source graphics driver library for Linux and translates Vulkan API calls into device-specific hardware instructions. By extending Mesa with a Vortex-specific back-end, the project plugs into the same driver framework used by AMD and Intel open-source GPU drivers, so existing Vulkan applications can in principle target Vortex hardware with no change beyond driver installation, provided they stay within the features the driver exposes. The release ships a test suite covering compute, 3D drawing, depth rendering, textured rendering, and ray tracing to validate the full stack.

Tensor Cores, HIP Support, and ML Workloads

Beyond graphics, Vortex 3.0 significantly advances its machine-learning compute capabilities. The tensor core unit adds structured sparsity in the 2:4 format, the same pruning pattern NVIDIA uses in its Ampere and later architectures. Because at most two of every four consecutive weights are non-zero, a weight matrix can be stored at half size plus small position indices, and the hardware can skip the known-zero multiplications, which gives effectively double the throughput. Warpgroup-level matrix multiplication (WGMMA), a mechanism for distributing one matrix multiplication across multiple warps that cooperate through shared memory, is now supported. A new in-house floating-point unit handles both 32-bit and 64-bit IEEE-754 precision natively, removing the project's previous dependency on the external FPNEW library entirely.
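The 2:4 pattern can be sketched in a few lines. This is a software illustration of the general format, not Vortex's tensor core datapath or its storage layout; compress_2_4 and sparse_dot are hypothetical names.

```python
def compress_2_4(row):
    """Compress a row with at most two non-zeros in every group of four.

    Returns (values, indices): two kept values per group and their
    positions (0-3) inside that group.
    """
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for base in range(0, len(row), 4):
        group = row[base:base + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group violates the 2:4 pattern")
        # Pad with zero-valued slots so every group stores exactly two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        for i in kept:
            values.append(group[i])
            indices.append(i)
    return values, indices


def sparse_dot(values, indices, dense):
    """Dot product that visits only the two kept positions per group."""
    total = 0
    for k, (v, i) in enumerate(zip(values, indices)):
        base = (k // 2) * 4  # two stored entries per group of four
        total += v * dense[base + i]
    return total
```

For an eight-element row the sparse path performs four multiplications instead of eight, which is where the nominal 2x throughput figure comes from.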

HIP support via chipStar arrives for both rv32 and rv64 targets. HIP is AMD's heterogeneous compute API, and chipStar is the open-source runtime that allows HIP kernels to run on non-AMD hardware by translating them to SPIR-V intermediate representation and then to the target device's instruction set. With chipStar, Vortex 3.0 can now accept GPU code written for AMD's ROCm ecosystem without source modification — expanding the research workload library available to groups using Vortex as their architecture study platform.

ASIC Synthesis Flows Across Three Open Process Nodes

Vortex has always targeted FPGAs, but version 3.0 makes a sustained push toward chip fabrication research. The release productizes synthesis flows for both Synopsys Design Compiler and Yosys, the dominant commercial and open-source logic synthesis tools respectively, targeting three process nodes: ASAP7 (a 7nm predictive process design kit from Arizona State University used extensively in academic research), SAED14 (a 14nm standard cell library from Synopsys available for academic licensing), and NanGate 15nm OCL (an openly available standard cell library). All three share a common hw/syn/common.mk infrastructure, standardize the optimization level interface across backends, and integrate OpenSTA for timing and power analysis.

The release also adds a SAIF (Switching Activity Interchange Format) workflow that captures actual gate-level switching activity from simulation runs and feeds it into power estimation, replacing the vectorless static analysis that prior versions used. Workload-driven power estimation reduces the error range in reported power figures from the order-of-magnitude uncertainty of static analysis to numbers tied to specific programs — a prerequisite for serious comparison with proprietary GPU power data.
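The difference between the two approaches is easiest to see in the standard dynamic-power estimate, where each net contributes 0.5 * C * Vdd^2 per toggle. The sketch below is a simplified model under that assumption, not the OpenSTA or Vortex flow: it ignores leakage and internal cell power, and the function names, net names, and default activity factor are hypothetical.

```python
def dynamic_power(toggle_counts, cap_farads, vdd, duration_s):
    """Activity-driven estimate in watts.

    toggle_counts: per-net toggle counts over the simulated window
                   (the kind of data a SAIF file records).
    cap_farads:    per-net switched capacitance in farads.
    """
    total = 0.0
    for net, toggles in toggle_counts.items():
        toggle_rate = toggles / duration_s  # toggles per second
        total += 0.5 * cap_farads[net] * vdd ** 2 * toggle_rate
    return total


def vectorless_power(cap_farads, vdd, clock_hz, assumed_activity=0.1):
    """Same formula, but every net gets one assumed toggle rate."""
    return sum(
        0.5 * c * vdd ** 2 * assumed_activity * clock_hz
        for c in cap_farads.values()
    )
```

With measured toggles, a net that never switches during the workload contributes nothing, while the vectorless estimate charges it the same assumed activity as every other net.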

Memory System, Execution Model, and Simulator Updates

Version 3.0 introduces a Data-transfer Acceleration (DXA) engine for asynchronous direct memory access between global memory and local shared memory, enabling tile staging pipelines that overlap compute with data movement. A new hardware Kernel Management Unit (KMU) takes over cooperative thread array (CTA) dispatch, and a Command Processor (CP) architecture with a host-resident command ring handles kernel launch, memory operations, and cache flush commands.

The SimX cycle-accurate simulator has been rearchitected in version 3.0 around a transaction-level memory model with fixed-size handshake channels. The earlier design's unbounded queues are replaced by bounded channels with non-blocking send and receive operations that directly mirror the ready/valid backpressure behavior of the RTL. As a result, buffering bugs that previously surfaced only during hardware bring-up now appear during software simulation, narrowing the gap between simulation and silicon behavior. gem5 integration is also new in 3.0, adding a second validated simulation environment alongside SimX.
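A bounded channel with non-blocking operations is straightforward to model. The following is an illustrative sketch, not SimX code: the Channel class, its method names, and the toy producer/consumer loop are assumptions.

```python
from collections import deque


class Channel:
    """Fixed-capacity FIFO whose try_send/try_recv never block."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._fifo = deque()

    def try_send(self, item):
        """Return False when full, like a deasserted 'ready' signal."""
        if len(self._fifo) >= self.capacity:
            return False
        self._fifo.append(item)
        return True

    def try_recv(self):
        """Return (True, item) if data is present ('valid'), else (False, None)."""
        if not self._fifo:
            return False, None
        return True, self._fifo.popleft()


def simulate(requests, capacity, drain_every):
    """Producer offers one request per cycle; consumer drains one item
    every `drain_every` cycles. Returns (received, producer_stalls)."""
    channel = Channel(capacity)
    pending = deque(requests)
    received, stalls, cycle = [], 0, 0
    while len(received) < len(requests):
        if cycle % drain_every == 0:
            ok, item = channel.try_recv()
            if ok:
                received.append(item)
        if pending:
            if channel.try_send(pending[0]):
                pending.popleft()
            else:
                stalls += 1  # backpressure: the producer must retry next cycle
        cycle += 1
    return received, stalls
```

With an unbounded queue the producer never stalls, so a model built that way cannot expose a design that depends on more buffering than the hardware provides. Calling simulate(range(6), capacity=2, drain_every=2) makes the producer stall, while a capacity of 6 does not.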

Build System Rework Enables Parallel Development Worktrees

A significant housekeeping effort moves the hardware configuration system from a flat Verilog preprocessor header file to TOML-driven configuration files (VX_config.toml, VX_types.toml), with a code-generation script that produces per-target headers at build time. This change eliminates longstanding namespace collisions between Vortex's configuration macros and those used by LLVM and Verilator, both of which appear in the same build environment and previously could silently override each other. The global toolchain_env.sh script has been retired in favor of per-build resolved tool paths, which for the first time allows parallel worktrees of different Vortex versions to coexist in the same shell session — something critical for researchers maintaining multiple branches simultaneously.

The toolchain itself has been refreshed to LLVM 20 and POCL 7.0, the current stable releases of the compiler infrastructure and OpenCL runtime respectively.

What Vortex 3.0 Means for GPU Architecture Research

The practical consequence of these changes is that a research group studying GPU microarchitecture — warp scheduling, memory system behavior, graphics-compute interaction, tensor core design — can now do so with a tool that spans the full stack from ISA specification to synthesizable RTL to Vulkan-compatible software driver to documented ASIC flows, all under open-source licenses. That combination did not exist in a single coherent project before this release.

The project's original motivation, stated in the MICRO-54 paper, was that "there is very little open-source GPU infrastructure in the public domain," and that the complexity of the ISA and software stack was the primary reason. RISC-V solved the ISA problem. Vortex 3.0 addresses the remaining gaps: the graphics pipeline, the graphics driver, the ML compute extensions, and the ASIC synthesis path that would allow a group using academic fabrication programs — such as Google's open shuttle runs on SkyWater's 130nm process or Efabless's open MPW programs — to eventually tape out an open-source GPU.

One constraint remains: virtual memory support via the RISC-V Sv32 paging scheme currently works only in the SimX simulator and has not yet been wired into the RTL for real hardware deployment. That limitation means Vortex 3.0 is still a research and FPGA platform rather than a production-grade device, though it may be the most complete open-source GPU implementation available today.


Frequently Asked Questions

What is Vortex and who makes it?

Vortex is an open-source GPU project developed at Georgia Tech's College of Computing. It uses the RISC-V instruction set architecture extended with GPU-specific instructions to support both general-purpose compute and 3D graphics workloads. The project is sponsored by Oak Ridge National Laboratory and SiliconArts and is freely available on GitHub.

What does Vortex 3.0 add that earlier versions lacked?

Vortex 3.0 adds a complete fixed-function 3D graphics pipeline — comprising a rasterizer, texture units, and output mergers — plus a Vulkan driver called vortexpipe built on the Mesa Gallium framework. Earlier versions had an ISA designed for graphics but lacked the actual hardware pipeline and graphics driver software to run graphics workloads end-to-end.

Can Vortex 3.0 be used for GPU architecture research on AI and machine learning workloads?

Yes. Version 3.0 adds tensor core structured sparsity in the 2:4 format, warpgroup-level matrix multiplication, support for FP8, BF16, and TF32 number formats, and HIP compute support via the chipStar runtime. These features allow researchers to study ML-relevant GPU microarchitecture — including sparsity acceleration and tensor core design — on fully open and modifiable hardware.

Does Vortex 3.0 support chip fabrication beyond FPGAs?

Version 3.0 includes productized synthesis flows for Synopsys Design Compiler and the open-source Yosys synthesizer, targeting ASAP7 (7nm), SAED14 (14nm), and NanGate 15nm process nodes. This makes it possible to study Vortex's area, timing, and power characteristics on realistic technology nodes, and to prepare netlists suitable for academic chip fabrication programs.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.
