Voice AI has finally broken free from heavy hardware and cloud lock-in. With NeuTTS Air, built by Neuphonic, we are entering a new era of text-to-speech (TTS) technology where studio-grade realism, instant cloning, and real-time speech generation can all happen locally on your device, without any internet connection required.

This breakthrough is not just an engineering milestone; it is a paradigm shift in how we build and deploy voice intelligence systems. For years, creating lifelike AI voices required access to massive GPUs, proprietary APIs, and costly cloud infrastructure. NeuTTS Air changes that completely.

When combined with Spheron Network’s decentralized GPU infrastructure, you can now set up, run, and scale ultra-realistic TTS models affordably, powered by community and data center-grade compute from around the world.

The Evolution of Text-to-Speech: From Cloud Dependence to Local Autonomy

Before diving into NeuTTS Air, it’s important to understand the evolution of text-to-speech technology and why this release is such a breakthrough.

The Early Days: Synthetic and Static: Traditional TTS systems were rule-based, stitching together phonemes to simulate human speech. Voices sounded robotic, flat, and emotionless. They lacked rhythm, emotion, and realism.

The Neural Wave: Cloud-Powered Realism: The 2010s saw a revolution with DeepMind’s WaveNet and Tacotron from Google. These neural TTS systems generated remarkably realistic speech using deep learning. However, they came with a major limitation: they were cloud-bound. Running these large models required specialized infrastructure, typically accessible only through APIs offered by major tech players like Google, Amazon, or Microsoft. Developers were effectively locked into closed ecosystems and pricing models.

The Next Frontier: Edge-Ready Voice AI: In the AI renaissance of 2024–2025, a new focus emerged on local, privacy-first AI. Users and enterprises demanded control over their data. Devices became more powerful. The natural next step was bringing high-quality voice synthesis to the edge without sacrificing realism or latency. That is where NeuTTS Air stands out.

Introducing NeuTTS Air: Redefining On-Device Speech Generation

NeuTTS Air is the world’s first on-device, super-realistic text-to-speech system capable of running locally, without internet access or external APIs.

Key Highlights

Studio-Grade Realism: Speech indistinguishable from human recordings, complete with tone, pitch, and emotion.

Instant Voice Cloning: Clone any voice with just 3 seconds of audio.

Real-Time Generation: Produces speech instantly, even on laptops or Raspberry Pis.

Privacy-First: Keeps data and audio securely on your device.

Efficient Performance: Optimized for speed and low power usage.

NeuTTS Air runs on a 0.5B parameter LLM backbone and NeuCodec, a custom neural audio codec designed by Neuphonic to balance speed, quality, and efficiency.
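To get a feel for why a 0.5B-parameter backbone can fit on small devices, here is a rough back-of-the-envelope sketch. The bytes-per-parameter figures are generic assumptions about common storage formats, not numbers published by Neuphonic. The estimate covers the backbone weights only and excludes NeuCodec, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a 0.5B-parameter backbone.
PARAMS = 0.5e9  # parameter count quoted in the article

# Bytes per parameter for common storage formats (generic assumption,
# not a NeuTTS Air specification).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return params * bytes_per_param / 2**30


estimates = {
    fmt: round(weight_memory_gib(PARAMS, size), 2)
    for fmt, size in BYTES_PER_PARAM.items()
}

for fmt, gib in estimates.items():
    print(f"{fmt}: ~{gib} GiB for weights")
```

Even at full 32-bit precision the weights come to under 2 GiB by this estimate, which is consistent with the claim that the model can run on laptops and single-board computers.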

Hardware Requirements

To ensure smooth and real-time inference, the recommended system setup is:

Deploying NeuTTS Air on Spheron Network

Spheron Network provides affordable, privacy-preserving GPU compute, sourced from both data center-grade and community GPUs. This decentralized infrastructure is well suited to self-hosting NeuTTS Air without relying on cloud TTS APIs.

Step-by-Step Setup Guide

Step 1: Access Spheron Console and Add Credits

Head over to console.spheron.network and log in to your account. If you don’t have an account yet, create one by signing up with your Email/Google/Discord/GitHub.

Once logged in, navigate to the Deposit section. You’ll see two payment options:

SPON Token: This is the native token of Spheron Network. When you deposit with SPON, you unlock the full power of the ecosystem. SPON credits can be used on both:

Community GPUs: Lower-cost GPU resources powered by community Fizz Nodes (personal machines and home setups)

Secure GPUs: Data center-grade GPU providers offering enterprise reliability

USD Credits: With USD deposits, you can deploy only on Secure GPUs. Community GPUs are not available with USD deposits.

For running NeuTTS, we recommend starting with Secure GPUs to ensure consistent performance. Add sufficient credits to your account based on your expected usage.
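The deposit rules above can be summarized as a small lookup. This is only an illustrative sketch: `ALLOWED_TIERS` and `can_deploy` are hypothetical names, not part of any Spheron SDK or API.

```python
# Which GPU tiers each deposit type can be spent on, as described above.
# The names here are hypothetical and used for illustration only.
ALLOWED_TIERS = {
    "SPON": {"community", "secure"},  # native token: both tiers
    "USD": {"secure"},                # USD credits: Secure GPUs only
}


def can_deploy(payment: str, tier: str) -> bool:
    """Return True if credits of this payment type can rent this GPU tier."""
    return tier.lower() in ALLOWED_TIERS.get(payment.upper(), set())


print(can_deploy("SPON", "community"))  # SPON works on Community GPUs
print(can_deploy("USD", "community"))   # USD does not
```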

Step 2: Navigate to GPU Marketplace

After adding credits, click on Marketplace. Here you’ll see two main categories:

Secure GPUs: These run on data center-grade providers with enterprise SLAs, high uptime guarantees, and consistent performance. Ideal for production workloads and applications that require reliability.

Community GPUs: These run on community Fizz Nodes, essentially personal machines contributed by community members. They’re significantly cheaper than Secure GPUs but may have variable availability and performance.

For this tutorial, we’ll use Secure GPUs to ensure smooth installation and optimal performance.

Step 3: Search and Select Your GPU

You can search for GPUs by:

Region: Find GPUs geographically close to your users

Address: Search by specific provider addresses

Name: Filter by GPU model (RTX 4090, A100, etc.)

For this demo, we’ll select a Secure RTX 4090 (an A6000 also works), which offers excellent performance for running NeuTTS. The 4090 balances cost and capability well for both testing and moderate production workloads.

Click Rent Now on your selected GPU to proceed to configuration.

Step 4: Select Custom Image Template

After clicking Rent Now, you’ll see the Rent Confirmation dialog, which shows all the configuration options for your GPU deployment. Unlike pre-built application templates, NeuTTS needs a custom environment with development tools, so configure each section as described below and click “Confirm” to deploy.

GPU Type: The screen displays your selected GPU (RTX 4090 in the image) with specifications: Storage, CPU Cores, RAM.

GPU Count: Use the + and – buttons to adjust the number of GPUs. For this tutorial, keep it at 1 GPU for cost efficiency.

Select Template: Click the dropdown that shows “Ubuntu 24” and browse the template options. NeuTTS needs an Ubuntu-based template with SSH enabled. The template shows an SSH-enabled badge, which is essential for accessing your instance via terminal. Select Ubuntu 24 or Ubuntu 22; both work.

Duration: Set how long you want to rent the GPU. The dropdown shows options like: 1hr (good for quick testing), 8hr, 24hr, or longer for production use. For this tutorial, select 1 hour initially. You can always extend the duration later if needed.

Select SSH Key: Click the dropdown to choose your SSH key for secure authentication. If you haven’t added an SSH key yet, you’ll see a message to create one.

Expose Ports: This section allows you to expose specific ports from your deployment. For basic command-line access, you can leave this empty. If you plan to run web services or Jupyter notebooks, you can add ports.

Provider Details: The screen shows information about the decentralized provider that will host your GPU instance.

Scroll down to the Choose Payment section. Select your preferred payment option:

USD: Pay with traditional currency (credit card or other USD payment methods)

SPON: Pay with Spheron’s native token for potential discounts and access to both Community and Secure GPUs

The dropdown shows “USD” in the example, but you can switch to SPON if you have tokens deposited.

Step 5: Check the “Deployment in Progress” Window

Next, you’ll see a live status window showing each step as it happens: Validating configuration, Checking balance, Creating order, Waiting for bids, Accepting a bid, Sending manifest, and finally, Lease Created Successfully.

Deployment typically completes in under 60 seconds. Once you see “Lease Created Successfully,” your Ubuntu server with GPU access is live and ready to use!

Step 6: Access Your Deployment

Once deployment completes, navigate to the Overview tab in your Spheron console. You’ll see your deployment listed with:

Status: Running

Provider details: GPU location and specifications

Connection information: SSH access details

Port mappings: Any exposed services

Step 7: Connect via SSH

Click the SSH tab to see the instructions for connecting your terminal to the deployment. The command will look something like this:

ssh -i <path-to-private-key> -p <port> root@<host>

Open your terminal and paste this command. Upon your first connection, you’ll see a security prompt requesting that you verify the server’s fingerprint. Type “yes” to continue. You’re now connected to your GPU-powered virtual machine on the Spheron decentralized network.
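If the connection is refused with an "unprotected private key file" warning, the usual cause is that the key file is readable by other users, which OpenSSH rejects. The following sketch checks for that on your local machine. The function name `key_permissions_ok` is made up for this example, and the key path you pass in is whatever key you selected in the rent dialog.

```python
import os
import stat


def key_permissions_ok(key_path: str) -> bool:
    """Return True if the key file grants no access to group or others."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    # Any of the lower six permission bits set means group/other access.
    return mode & 0o077 == 0
```

If this returns False for your key, tightening the permissions to owner-only read and write (for example with `chmod 600` on the key file) typically resolves the warning.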

Step 8: Installing Miniconda and Setting Up the Environment

We will use Miniconda to create a clean Python environment for NeuTTS Air.

1. Download Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

2. Make the Installer Executable and Run It

chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh -b -p /root/miniconda3

3. Initialize Conda and Reload the Shell

/root/miniconda3/bin/conda init bash
source ~/.bashrc

Step 9: Creating the TTS Environment

1. Create and Activate the Environment

conda create -n tts python=3.11 -y && conda activate tts

If you see a “Terms of Service not accepted” error, run the following commands one by one:

conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

Then run again:

conda create -n tts python=3.11 -y && conda activate tts

2. Initialize Conda

conda init bash

Step 10: Installing Dependencies and Cloning NeuTTS Air

1. Install Git

apt update && apt install -y git

2. Clone the NeuTTS Air Repository

git clone https://github.com/neuphonic/neutts-air.git && cd neutts-air

3. Install Dependencies

pip install -r requirements.txt
apt install -y espeak-ng

4. Install Gradio for Browser Access

pip install gradio
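Before moving on, it can help to confirm that the pieces installed in this step are actually visible from the tts environment. The sketch below is one possible check, not part of the NeuTTS Air repository. It assumes that requirements.txt pulls in PyTorch, and `preflight` is a hypothetical helper name.

```python
import importlib.util
import shutil


def preflight() -> dict:
    """Report which prerequisites are visible from this environment."""
    checks = {
        # espeak-ng was installed with apt, so look for it on PATH.
        "espeak-ng binary": shutil.which("espeak-ng") is not None,
        # Assumption: requirements.txt installs PyTorch.
        "torch": importlib.util.find_spec("torch") is not None,
        "gradio": importlib.util.find_spec("gradio") is not None,
    }
    if checks["torch"]:
        import torch

        # True only if PyTorch can see a CUDA-capable GPU.
        checks["cuda available"] = bool(torch.cuda.is_available())
    return checks


results = preflight()
for name, ok in results.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Any line reported as MISSING points back to the corresponding install command above.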

Step 11: Running the NeuTTS Air Application

1. Connecting a Code Editor

While you can write Python scripts directly in the terminal using editors like nano or vim, connecting a modern code editor dramatically improves productivity. We recommend VS Code, Cursor, or any IDE supporting SSH remote development. For this tutorial, we are using Cursor. Open it and use its “Connect via SSH” option to connect to your deployment.

2. Create a file named app.py

import os
import sys

# Run this script from the directory that contains the cloned neutts-air
# folder: the paths below are resolved relative to the working directory.
sys.path.append("neutts-air")
from neuttsair.neutts import NeuTTSAir
import numpy as np
import gradio as gr

SAMPLES_PATH = os.path.join(os.getcwd(), "neutts-air", "samples")
DEFAULT_REF_TEXT = "So I'm live on radio. And I say, well, my dear friend James here clearly, and the whole room just froze. Turns out I'd completely misspoken and mentioned our other friend."
DEFAULT_REF_PATH = os.path.join(SAMPLES_PATH, "dave.wav")
DEFAULT_GEN_TEXT = "My name is Dave, and um, I'm from London."

# Load the backbone and the codec onto the GPU once, at startup.
tts = NeuTTSAir(
    backbone_repo="neuphonic/neutts-air",
    backbone_device="cuda",
    codec_repo="neuphonic/neucodec",
    codec_device="cuda"
)


def infer(
    ref_text: str,
    ref_audio_path: str,
    gen_text: str,
) -> tuple[int, np.ndarray]:
    """
    Generates speech using NeuTTS-Air given a reference audio and text, and new text to synthesize.

    Args:
        ref_text (str): The text corresponding to the reference audio.
        ref_audio_path (str): The file path to the reference audio.
        gen_text (str): The new text to synthesize.
    Returns:
        tuple[int, np.ndarray]: A tuple containing the sample rate (24000) and the generated audio waveform as a numpy array.
    """

    gr.Info("Starting inference request!")
    gr.Info("Encoding reference...")
    ref_codes = tts.encode_reference(ref_audio_path)

    gr.Info(f"Generating audio for input text: {gen_text}")
    wav = tts.infer(gen_text, ref_codes, ref_text)

    return (24_000, wav)


demo = gr.Interface(
    fn=infer,
    inputs=[
        gr.Textbox(label="Reference Text", value=DEFAULT_REF_TEXT),
        gr.Audio(type="filepath", label="Reference Audio", value=DEFAULT_REF_PATH),
        gr.Textbox(label="Text to Generate", value=DEFAULT_GEN_TEXT),
    ],
    outputs=gr.Audio(type="numpy", label="Generated Speech"),
    title="NeuTTS-Air☁️",
    description="Upload a reference audio sample, provide the reference text, and enter new text to synthesize."
)

if __name__ == "__main__":
    # share=True prints a temporary public URL for the demo.
    demo.launch(allowed_paths=[SAMPLES_PATH], mcp_server=True, inbrowser=True, share=True)

The code creates an interactive, browser-based voice cloning demo where you upload a short sample of someone’s voice, input a new sentence, and instantly hear that person’s cloned voice speak the new text, all powered by NeuTTS Air running locally on a GPU.
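Since cloning relies on a short reference clip (the highlights above mention 3 seconds of audio), it may be worth checking a clip's length before uploading it. Here is a minimal sketch using only the Python standard library. The function names are made up for this example, and the `wave` module only reads uncompressed PCM WAV files, so other formats would need a different reader.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def long_enough(path: str, minimum_seconds: float = 3.0) -> bool:
    """Return True if the clip meets the minimum reference length."""
    return wav_duration_seconds(path) >= minimum_seconds
```

For example, `long_enough("neutts-air/samples/dave.wav")` would check the bundled sample that app.py uses as its default reference.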

Save app.py, then run the following command in the terminal:

python3 app.py

Then open the given link in your browser. You can now upload a reference audio, type new text, and listen to real-time voice synthesis with your cloned voice.
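The link printed by app.py comes from `share=True`, which asks Gradio for a temporary public URL. If you instead added a port under Expose Ports in the rent dialog, a possible alternative is to bind Gradio to that port directly. The helper below is a hypothetical sketch: `launch_kwargs` is not part of Gradio, and 7860 is simply Gradio's default port, assumed here to be the one you exposed.

```python
from typing import Optional


def launch_kwargs(exposed_port: Optional[int] = None) -> dict:
    """Build keyword arguments for demo.launch()."""
    if exposed_port is None:
        # No port exposed: fall back to Gradio's temporary share link.
        return {"share": True}
    # Port exposed on the deployment: listen on all interfaces on that port.
    return {"server_name": "0.0.0.0", "server_port": exposed_port, "share": False}


print(launch_kwargs())
print(launch_kwargs(7860))
```

In app.py this would be used as `demo.launch(allowed_paths=[SAMPLES_PATH], **launch_kwargs(7860))`. The externally reachable port may differ from the internal one, so check the Port mappings entry in the Overview tab.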

The results show a highly realistic tone, pacing, and emotional delivery.
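To keep a result for later comparison, the waveform returned by `tts.infer` can be written to disk without going through the browser. The sketch below uses only the standard library. It assumes the waveform is a one-dimensional sequence of floats in the range -1.0 to 1.0 at 24 kHz. The sample rate matches the value returned in app.py, but the value range is an assumption, and `save_wav` is a hypothetical helper name.

```python
import array
import sys
import wave


def save_wav(path: str, samples, sample_rate: int = 24_000) -> None:
    """Write mono float samples as a 16-bit PCM WAV file."""
    # Assumption: samples are floats in [-1.0, 1.0]; clip anything outside.
    pcm = array.array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples in little-endian order
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

With the objects from app.py, usage would look like `save_wav("output.wav", tts.infer(gen_text, ref_codes, ref_text))`.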

Why Spheron is the Perfect Platform

| Feature | Spheron | Traditional Cloud |
|---|---|---|
| Cost | Up to 90% cheaper | High and fixed |
| Privacy | Local or on-device | Data passes through APIs |
| Flexibility | Secure + Community GPUs | Fixed provider |
| Ownership | Token-based pay-as-you-go | Vendor lock-in |
| Ecosystem | $SPON-powered | None |

Spheron ensures compute sovereignty, letting developers own and control their AI infrastructure completely.

The Future of Decentralized Voice Intelligence

NeuTTS Air and Spheron together mark the rise of privacy-first, decentralized AI. This approach enables:

Local-first apps like assistants and toys.

Reduced reliance on cloud monopolies.

A foundation for DePIN (Decentralized Physical Infrastructure Networks) for compute supply.

NeuTTS Air is more than a TTS model. It represents freedom in voice AI. By combining realistic speech synthesis, instant cloning, and local-first architecture, it sets a new benchmark for voice generation.

With Spheron Network, you can deploy and experiment with NeuTTS Air quickly, securely, and affordably, while keeping full control over your data.

Whether you are building a voice assistant, AI storyteller, or enterprise-grade audio solution, NeuTTS Air on Spheron brings human-like voices to life, locally, privately, and beautifully.

Get Started Now at console.spheron.network

Deploy NeuTTS Air today. Own your compute. Shape the future of Voice AI.


