Web3


Cardano’s Hoskinson Warns Crypto Becoming Post-Quantum Will Require Trade-Offs – Decrypt




In brief

Charles Hoskinson said quantum-resistant cryptography is already standardized, but remains too slow for widespread use.
He pointed to DARPA’s quantum benchmarking program as a key reference for when cryptographic risk becomes practical.
Hoskinson said Cardano is exploring staged mitigations while waiting for hardware acceleration to mature.

As blockchain developers debate protocol updates to counter possible future quantum attacks, Cardano founder Charles Hoskinson said the central issue is timing and not what changes to make, warning that moving too soon could carry a high cost for blockchain networks.

According to Hoskinson, the cryptographic tools needed to protect blockchains from future quantum attacks already exist; he pointed to post-quantum standards released by the U.S. National Institute of Standards and Technology in 2024. The problem, Hoskinson explained, is what it would cost if the new protocols are implemented before miners and validators are ready.

“Post-quantum crypto oftentimes it’s about 10 times slower, 10 times larger proof sizes, and 10 times more inefficient,” Hoskinson told Decrypt. “So if you adopt it, what you’re basically doing is taking the throughput of your blockchain and reducing it by cutting off a zero.”

While researchers broadly agree that sufficiently powerful quantum computers could one day break today’s cryptography, there is far less agreement on when that threat becomes real. Estimates place the arrival of practical quantum computing anywhere from a few years to more than a decade away.

Instead of focusing on hype and corporate timelines when judging how quickly the threat might arrive, Hoskinson said a better option is to watch DARPA’s Quantum Benchmarking Initiative, which is testing whether different quantum computing approaches can deliver useful results.

“It’s the best independent, objective benchmark that can be referenced for whether quantum computers are going to be real or not, and when they’re going to hit and who’s going to make them,” he said.


DARPA has set 2033 as a target year for determining whether utility-scale quantum computing is feasible.

Like most major networks, including Bitcoin, Ethereum, and Solana, Cardano relies on elliptic-curve cryptography, which could theoretically be broken by Shor’s algorithm if sufficiently powerful quantum computers emerge. Hoskinson said the industry already knows how to address that vulnerability, and that the debate comes down to a choice between two competing cryptographic approaches.

“There’s two big bets you can make,” Hoskinson said. “Hashes, which is what Ethereum is making, and lattices, which is what we’re making.”

Hash-based cryptography uses cryptographic hash functions to create digital signatures that are widely seen as safe from future quantum attacks. These systems are simple, well-studied, and conservative by design, but they are mainly used for signing data and are not suited for general-purpose encryption.
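To make the idea of a signature built only from hash functions concrete, here is a minimal sketch of a Lamport one-time signature in Python. It is a textbook illustration chosen for brevity, not the scheme Ethereum, Cardano, or NIST actually uses, and each key pair here may safely sign only one message. The function names are made up for this example.

```python
import hashlib
import secrets

HASH_BITS = 256  # one pair of secrets per bit of the SHA-256 message digest


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Return (private_key, public_key); the pair must sign one message only."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(HASH_BITS)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def _bits(message: bytes):
    """Bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(HASH_BITS)]


def sign(private, message: bytes):
    """Reveal one secret per digest bit: the 'zero' or the 'one' secret."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Hash each revealed secret and compare with the published value."""
    if len(signature) != HASH_BITS:
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


def signature_size(signature) -> int:
    return sum(len(part) for part in signature)
```

A signature from this sketch is 256 × 32 = 8,192 bytes, against 64 bytes for an Ed25519 signature, which gives a feel for the size trade-off Hoskinson describes even though production hash-based schemes are considerably more refined.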

Lattice-based cryptography relies on hard mathematical problems that are expected to remain difficult even for quantum computers. Lattice cryptography supports not just digital signatures but also encryption, and more advanced cryptographic tools, which proponents say make it better suited for a post-quantum world.

“You can do all your crypto operations on your graphics card, like you would an AI operation,” he said. “So you get to reuse hundreds of billions of dollars of AI computers, and you don’t have to build ASICs to accelerate these things.”

Hoskinson, however, did not call for an immediate protocol-wide change in favor of one method or another. Instead, he described a staged mitigation approach. One option he noted involved creating post-quantum-signed checkpoints of Cardano’s ledger history using systems such as Mithril and the privacy-focused Midnight sidechain.

“There are always trade-offs with these systems,” he said. “You can’t go from instant finality to probabilistic finality. Once you’ve made that decision, you’ve made that decision, and you live with the consequences.”


How Solana neutralized a 6 Tbps attack using a specific traffic-shaping protocol that makes spam impossible to scale




When a network brags about throughput, it’s really bragging about how much chaos it can swallow before it chokes. That’s why the most interesting part of Solana’s latest “stress test” is that there’s no story at all.

A delivery network called Pipe published data that put a recent barrage against Solana at roughly 6 terabits per second, and Solana’s co-founders backed the broad thrust of it in public posts. If the number is right, it’s the kind of traffic volume usually reserved for the internet’s biggest targets, the sort of thing Cloudflare writes long blog posts about because it isn’t supposed to be normal.

And yet Solana kept producing blocks. There was no coordinated restart or validator-wide group chat turning into a late-night disaster movie.

CryptoSlate’s own reporting on the incident said block production remained steady and confirmations kept moving, with no meaningful jump in user fees. There was even a counterpoint tucked into the chatter: SolanaFloor noted that an Anza contributor argued the 6 Tbps number was a short peak burst rather than a constant week-long wall of traffic, which matters because “peak” can be both true and slightly theatrical.

That kind of nuance is fine. In real-world denial-of-service, the peak is often the point, because a short punch can still knock over a system tuned for a steady state.

Cloudflare’s threat reporting points out how many large attacks end quickly, sometimes too quickly for humans to react, which is why modern defense is supposed to be automatic. Solana’s latest incident now shows a network that learned how to make spam boring.

What kind of attack was this, and what do attackers actually want?

A DDoS is the internet’s crudest but most effective weapon: overwhelm a target’s normal traffic by flooding it with junk traffic from many machines at once. Cloudflare’s definition is blunt; it’s a malicious attempt to disrupt normal traffic by overwhelming the target or nearby infrastructure with a flood of internet traffic, typically sourced from compromised systems.

That’s the web2 version, and it’s the version Pipe is gesturing at with a terabits-per-second chart. Crypto networks add a second, more crypto-native flavor on top: spam that isn’t “junk packets at a website” so much as “endless transactions at a chain,” often because there’s money on the other side of congestion.

Solana’s own outage history is like a handbook for that incentive problem. In September 2021, the chain went offline for more than 17 hours, and Solana’s early postmortem framed the flood of bot-driven transactions as, in effect, a denial-of-service event tied to a Raydium-hosted IDO.

In April 2022, Solana’s official outage report described an even more intense wall of inbound transactions, 6 million per second, with individual nodes seeing more than 100 Gbps. The report said there was no evidence of a classic denial-of-service campaign, and that the fingerprints looked like bots trying to win an NFT mint where the first caller gets the prize.

The network stopped producing blocks that day and had to coordinate a restart.

So what do attackers want, besides attention and the joy of ruining everyone’s Sunday? Sometimes it’s straightforward extortion: pay us, or we keep the firehose on.

Sometimes it’s reputational damage, because a chain that can’t stay live can’t credibly host the kind of apps people want to build. Sometimes it’s market gamesmanship, where broken UX creates odd pricing, delayed liquidations, and forced reroutes that reward people positioned for disorder.

In the on-chain spam version, the goal can be direct: win the mint, win the trade, win the liquidation, win the block space.

What’s different now is that Solana has built more ways to refuse the invitation.

The design changes that kept Solana running

Solana became better at staying online by changing where the pain shows up. In 2022, failures had a familiar shape: too many inbound requests, too much node-level resource strain, too little ability to slow bad actors, and knock-on effects that turned congestion into liveness problems.

The upgrades that matter most sit at the edge of the network, where traffic hits validators and leaders. One is the transition to QUIC for network communication, which Solana later listed as part of its stability work, alongside local fee markets and stake-weighted quality of service.

QUIC isn’t magic, but it’s built for controlled, multiplexed connections rather than the older connection patterns that make abuse cheap.

More importantly, Solana’s validator-side documentation describes how QUIC is used inside the Transaction Processing Unit path: limits on concurrent QUIC connections per client identity, limits on concurrent streams per connection, and limits that scale with the sender’s stake. It also describes packets-per-second rate limiting applied based on stake, and notes the server can drop streams with a throttling code, with clients expected to back off.

That turns “spam” into “spam that gets shoved into the slow lane.” It’s no longer enough to have bandwidth and a botnet, because now you need privileged access to leader capacity, or you’re competing for a narrower slice of it.

Solana’s developer guide for stake-weighted QoS spells this out: with the feature enabled, a validator holding 1% of stake has the right to transmit up to 1% of the packets to the leader. That stops low-stake senders from flooding out everyone else and raises Sybil resistance.

In other words, stake becomes a kind of bandwidth claim, not just voting weight.
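The "stake as a bandwidth claim" idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Solana's validator code: the class name, the fixed time window, and the integer stake units are all hypothetical, and the real implementation works through QUIC connection and stream limits. How unstaked traffic is treated is left out of the sketch.

```python
class StakeWeightedLimiter:
    """Toy model: each peer may send a share of packets equal to its stake share."""

    def __init__(self, stakes: dict[str, int], capacity_per_window: int):
        total = sum(stakes.values())
        if total <= 0:
            raise ValueError("total stake must be positive")
        # A peer with 1% of the stake gets 1% of the leader's packet capacity.
        self.budget = {peer: capacity_per_window * stake // total
                       for peer, stake in stakes.items()}
        self.used = {peer: 0 for peer in stakes}

    def admit(self, peer: str) -> bool:
        """Accept one packet from peer if it still has budget in this window."""
        if self.used.get(peer, 0) >= self.budget.get(peer, 0):
            return False  # over budget (or unknown peer): drop, sender backs off
        self.used[peer] += 1
        return True

    def next_window(self) -> None:
        """Reset counters at the start of a new accounting window."""
        for peer in self.used:
            self.used[peer] = 0
```

With stakes of 99 and 1 and a capacity of 1,000 packets per window, the small peer is cut off after 10 packets while the large peer keeps its 990, so a flood from the low-stake sender cannot crowd out everyone else.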

Then there’s the fee side, which is where Solana tries to avoid “one noisy app ruins the whole city.” Local fee markets and priority fees give users a way to compete for execution without turning every busy moment into a chain-wide auction.

Solana’s fee documentation explains how priority fees work through compute units, with users able to set a compute unit limit and an optional compute unit price, which acts like a tip to encourage prioritization. It also notes a practical gotcha: the priority fee is based on the requested compute unit limit, not the compute actually used, so sloppy settings can mean paying for unused headroom.

That prices computationally heavy behavior and gives the network a knob to make abuse more expensive where it hurts.
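The "paying for unused headroom" gotcha is easy to see with a little arithmetic. The sketch below assumes the compute unit price is quoted in micro-lamports per compute unit and that the fee is rounded up to a whole lamport; the function name and the example numbers are illustrative, not taken from any specific transaction.

```python
MICRO_LAMPORTS_PER_LAMPORT = 1_000_000


def priority_fee_lamports(cu_limit: int, cu_price_micro_lamports: int) -> int:
    """Priority fee charged on the REQUESTED compute unit limit, rounded up."""
    micro = cu_limit * cu_price_micro_lamports
    return (micro + MICRO_LAMPORTS_PER_LAMPORT - 1) // MICRO_LAMPORTS_PER_LAMPORT


# Example values (assumptions): the transaction really needs about 200,000
# compute units, but the sender requests a 1,400,000 limit "to be safe".
price = 10_000  # micro-lamports per compute unit
sloppy_fee = priority_fee_lamports(1_400_000, price)
tight_fee = priority_fee_lamports(200_000, price)
overpaid = sloppy_fee - tight_fee
```

In this example the sloppy limit costs 14,000 lamports against 2,000 for the tight one, so 12,000 lamports go to headroom that was never used.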

Put those pieces together, and you get a different failure mode. Instead of a flood of inbound noise pushing nodes into memory death spirals, the network has more ways to throttle, prioritize, and contain.

Solana itself, looking back at the 2022 era, framed QUIC, local fee markets, and stake-weighted QoS as concrete steps taken to keep reliability from being sacrificed for speed.

That’s why a terabit-scale weekend can pass without real repercussions: the chain has more automatic “no’s” at the front door and more ways to keep the line moving for users who aren’t trying to break it.

None of this means Solana is immune to ugly days. Even people cheering the 6 Tbps anecdote argue about what the number means and how long it lasted, which is a polite way of saying internet measurements are messy and bragging rights don’t come with an audit report.

And the trade-offs don’t vanish. A system that ties better traffic treatment to stake is, by design, friendlier to well-capitalized operators than hobbyist validators. A system that stays fast under load can still become a venue for bots that are willing to pay.

Still, the fact that the network was quiet matters. Solana’s earlier outages weren’t “people noticed a little latency.” Block production ceased completely, followed by public restarts and long coordination windows, including the April 2022 halt that took hours to resolve.

In contrast, this week’s story is that the chain remained live while traffic allegedly hit a scale more at home in Cloudflare’s threat reports than in crypto lore.

Solana is behaving like a network that expects to be attacked and has decided the attacker should be the one who gets tired first.




Dedicated vs Shared GPU Memory: VRAM Bandwidth, Paging, and LLM Perfor



Dedicated GPU memory is the only sane choice for serious AI training and production inference. Shared memory belongs in prototypes, laptops, and light graphics workloads, not in systems that carry real SLAs. As models grow larger and latency expectations tighten, memory architecture stops being a detail and becomes a first-order design decision.

This is exactly why Spheron AI is built around dedicated VRAM GPUs and bare-metal deployments, not shared or overcommitted memory abstractions. When you deploy on Spheron AI, the memory you see is the memory your model actually gets. No silent borrowing from system RAM. No surprise headroom loss under load. No paging cliffs at three in the morning.

To make the case concrete, this article breaks down what actually happens inside GPUs when memory is shared, why outages keep repeating across cloud environments, and why dedicated VRAM is the only architecture that scales cleanly for modern AI workloads.

Why This Outage Keeps Happening

At three in the morning, a production AI system goes down. Inference starts throwing out-of-memory errors. Latency spikes. Traffic backs up. The on-call team scrambles, convinced the model has a bug. After hours of digging, the real issue becomes clear. The GPU they deployed was advertised with 16 GB of memory, but half of it was quietly shared with system processes. The model never had the headroom it needed.

This is not a rare edge case; it is a pattern. Teams deploy on “16 GB GPUs” that, in practice, behave like 8–10 GB devices once shared memory and background processes are accounted for, especially in cloud or virtualized environments. The difference between dedicated and shared GPU memory determines whether you ship features or spend your nights chasing tail latency.​

Dedicated vs Shared GPU Memory

Dedicated GPU memory is VRAM soldered directly onto the GPU board (GDDR or HBM) and connected via a wide, ultra–high-bandwidth bus. When your model accesses weights, activations, or intermediate tensors, the GPU reads them directly from this VRAM at hundreds to thousands of GB/s without competing with CPU, network, or disk traffic.​

Shared GPU memory is borrowed system RAM that the GPU accesses over the system bus when onboard VRAM runs out. Typical dual-channel DDR4/DDR5 setups for CPU memory offer on the order of 40–100 GB/s of bandwidth, a tiny fraction of what high-end GPU VRAM can sustain. That gap is the heart of the problem.​

Key idea: Dedicated VRAM is a private, high-bandwidth highway; shared memory is a congested city street shared with everything else on the machine.​

Bandwidth vs Capacity: The Real Bottleneck

Over the last decade, compute throughput on AI accelerators has exploded, while memory bandwidth has grown much more slowly. Analysis of 1,700+ GPUs from 2007–2025 shows bandwidth rising steadily but nowhere near the exponential gains in FLOPs that AI chips deliver. The result: for many modern AI workloads, performance is bandwidth-bound, not compute-bound.​

For deep learning, every forward and backward pass is a story of moving tensors, not just multiplying them. If memory cannot feed the compute units fast enough, adding more FLOPs does nothing. Shared memory makes this worse, because data must cross the system bus before it ever reaches the GPU.​

You can visualize this with a chart comparing memory bandwidth across memory types used in AI systems (values approximate, but directionally accurate):​

Memory bandwidth and VRAM capacity differences across GPU memory types and models used in AI workloads

System DDR4/DDR5 RAM: ~50 GB/s effective per CPU socket in many servers

GDDR6X on RTX 4090: ~1,008 GB/s

HBM2e on A100 80 GB: ~2,039 GB/s

HBM3 on H100: ~3,000 GB/s

HBM3e on H200: ~4,800 GB/s

That is a spread of nearly two orders of magnitude between system RAM and the latest HBM3e. Using shared memory means voluntarily dropping from terabytes per second to tens of gigabytes per second.

Stats: What Dedicated Memory Looks Like

Modern AI GPUs are designed around dedicated VRAM with extreme bandwidth. Here are representative numbers you can embed as a spec table or chart:​

GPU | Memory type | VRAM (GB) | Bandwidth (approx)
RTX 4090 | GDDR6X | 24 | ~1,008 GB/s
A100 80 GB | HBM2e | 80 | ~2,000 GB/s
H100 80 GB | HBM3 | 80 | ~3,000 GB/s
H200 | HBM3e | 141 | ~4,800 GB/s

These devices are built so that, once your model fits in VRAM, the GPU can stream data at TB/s scale without touching system memory. A second useful bar chart compares VRAM capacity directly: 24 GB (RTX 4090) vs 80 GB (A100/H100) vs 141 GB (H200).​


By contrast, CPUs with DDR4/DDR5 usually top out around 40–100 GB/s of memory bandwidth per socket, even in high-end servers. Once your GPU spills into shared memory, you are throttling a multi-teraflop accelerator through a 50 GB/s straw.​
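A back-of-the-envelope calculation shows what the "50 GB/s straw" means for LLM decoding. The sketch assumes single-stream generation in which every new token requires reading all model weights once, and it ignores the KV cache, compute time, and any on-chip caching, so the result is a rough upper bound rather than a benchmark. The 7B model size is an illustrative choice; the bandwidth figures are the approximate ones quoted above.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          params_billions: float,
                          bytes_per_param: float = 2.0) -> float:
    """Upper bound on decode speed when each token reads all weights once."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / weight_bytes


# A 7B-parameter model in FP16 (2 bytes per parameter, about 14 GB of weights).
for name, bandwidth in [("system RAM", 50), ("RTX 4090", 1008), ("H200", 4800)]:
    rate = max_tokens_per_second(bandwidth, params_billions=7)
    print(f"{name:>10}: at most {rate:6.1f} tokens/s")
```

Under these assumptions the same model is capped at roughly 3.6 tokens per second when served from system RAM and roughly 72 tokens per second from the RTX 4090's VRAM, a gap that no amount of extra compute can close.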

Where Shared Memory Breaks AI Workloads

Large model training

Transformer training must hold parameters, activations, gradients, and optimizer state simultaneously. A 70B-parameter model in FP16/FP8 can demand hundreds of gigabytes of effective memory budget once you include optimizer states and activation checkpoints. On GPUs like A100/H100 with 80 GB HBM, teams already rely on tensor and pipeline parallelism; spilling further into shared memory is catastrophic.​
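The "hundreds of gigabytes" figure can be checked with simple arithmetic. The sketch below assumes a common mixed-precision Adam setup: 2 bytes per parameter for FP16 weights, 2 for FP16 gradients, and 12 for FP32 optimizer state (master weights, momentum, variance). Those byte counts are assumptions of this example rather than numbers from the article, and activations are excluded entirely.

```python
import math

BYTES_WEIGHTS = 2      # FP16 weights
BYTES_GRADIENTS = 2    # FP16 gradients
BYTES_OPTIMIZER = 12   # FP32 master weights + Adam momentum + Adam variance


def training_state_gb(params_billions: float) -> float:
    """Memory for weights, gradients, and optimizer state (no activations)."""
    per_param = BYTES_WEIGHTS + BYTES_GRADIENTS + BYTES_OPTIMIZER
    return params_billions * 1e9 * per_param / 1e9


def min_gpus(params_billions: float, vram_gb: float) -> int:
    """Fewest GPUs whose combined VRAM holds the training state alone."""
    return math.ceil(training_state_gb(params_billions) / vram_gb)


state_gb = training_state_gb(70)   # 70B parameters
gpus_80gb = min_gpus(70, 80)       # A100/H100-class 80 GB cards
```

With these assumptions a 70B model needs about 1,120 GB for training state, which already takes at least fourteen 80 GB GPUs before a single activation is stored.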

On systems that allow GPU page faults into system RAM, you effectively turn high-end GPUs into I/O-bound devices. Batch sizes must shrink, gradient accumulation steps increase, and training time can stretch by 2–5x or more versus a configuration that keeps everything in HBM.​

Batch processing and throughput

High throughput training and offline inference depend on saturating the GPU with large or at least efficient batches. When VRAM is tight and shared memory kicks in, you start paying for:​

Smaller batches and more steps

More frequent host-device transfers

Idle SMs waiting on memory

Benchmarks comparing A100 vs RTX 4090 for fine-tuning show that, when the model fits comfortably in the A100’s 80 GB HBM2e, it can maintain high utilization, whereas the 24 GB 4090 is more prone to batch-size compromises or offloading overhead on large models. That gap widens further if the 4090 has to lean on shared memory.​

Real-time inference and tail latency

Production inference lives or dies on P95–P99 latency, not the median. Shared memory introduces jitter because:​

GPU page faults into host RAM are slower and less predictable than HBM reads

Host RAM competes with CPU workloads, networking stacks, and file I/O

NUMA and PCIe topologies create non-uniform latency paths

LLM inference limit studies show that memory bandwidth and data movement dominate latency once models grow beyond a few billion parameters. Every extra hop, from HBM to GDDR to DDR, adds variance. Tail latency spikes are often just memory architecture leaking into user experience.

How Cloud GPUs Hide the Memory Trap

Cloud platforms abstract hardware to look simple: N vCPUs, M GB RAM, K GB GPU memory. But the implementation details vary:

Some “GPU memory” numbers include a slice of system RAM, not just dedicated VRAM.

Overcommitted hosts rely on paging and ballooning, which amplifies shared memory behavior under load.

Multi-tenant GPUs can reserve part of VRAM for host or hypervisor services.

For teams choosing providers, two questions matter more than the headline VRAM number:​

How much of this memory is true on-board VRAM vs shared/borrowed system memory?

What is the effective bandwidth and contention pattern under load?

Platforms that explicitly offer bare-metal or dedicated VRAM GPUs (e.g., A100/H100/H200, or RTX 4090 with full 24 GB dedicated) avoid the hidden shared-memory cliff and deliver behavior that matches spec sheets.​

Economic Impact: Memory as a Cost Lever

Dedicated memory looks expensive on a price sheet, but cheap in a P&L. HBM-based accelerators (A100/H100/H200) cost more per hour than consumer GPUs or shared-memory setups, yet they often win on:​

Time-to-train: fewer days per run means fewer total GPU-hours.​

Engineering time: less time spent on memory gymnastics and firefighting.​

Capacity planning: predictable batch sizes and scaling behaviors.​

By contrast, shared memory systems lure teams with lower hourly rates or bigger “total memory” numbers that quietly include system RAM. The hidden bill shows up as:

Training runs that take 2–4x longer than planned.

Over-provisioning instances to offset jitter.

Extra infra and SRE headcount to chase incidents.

When GPUs like the H100 and H200 deliver 2–4x the bandwidth of older architectures while keeping models entirely in HBM, even a 30–50% higher hourly rate can translate into lower cost per trained model or per million tokens served.​

Practical Workarounds, and Their Limits

Teams use several tactics to work around memory limits. They help, but they cannot turn shared memory into HBM.

Gradient accumulation: Simulates large batches using multiple smaller ones. It reduces VRAM pressure but increases wall-clock time proportionally to the number of accumulation steps.​

Model parallelism: Splits models across GPUs and shines when GPUs have fast, consistent interconnects (NVLink, NVSwitch) and high-bandwidth HBM. It performs poorly if each device is already starved by shared memory or slow PCIe/host RAM.

Mixed precision (FP16/FP8): Cuts memory footprint and often boosts throughput, but still relies on fast VRAM to see full benefits.​

Quantization: Great for inference memory savings, but training remains bandwidth-sensitive, and heavy offloading still hurts.​

These techniques are multipliers on good hardware, not band-aids that turn shared memory architectures into dedicated ones.
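Of the tactics above, gradient accumulation is the simplest to show in code. The sketch below uses a one-parameter linear model and plain Python so it runs anywhere; the model, data, and function names are invented for illustration. It shows that summing size-weighted micro-batch gradients reproduces the full-batch gradient, at the cost of one extra pass per micro-batch.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float],
                     micro_batch_size: int) -> tuple[float, int]:
    """Build the full-batch gradient from micro-batches that fit in memory."""
    total = len(xs)
    accumulated = 0.0
    steps = 0
    for start in range(0, total, micro_batch_size):
        micro_x = xs[start:start + micro_batch_size]
        micro_y = ys[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch.
        accumulated += grad_mse(w, micro_x, micro_y) * len(micro_x) / total
        steps += 1
    return accumulated, steps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)
accumulated, steps = accumulated_grad(0.5, xs, ys, micro_batch_size=2)
```

Here `accumulated` matches `full`, but it took two forward and backward passes instead of one, which is the wall-clock cost described above.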

Monitoring: Catching Memory Trouble Early

Teams that avoid 3 a.m. outages treat memory as a first-class SLI. Useful signals include:​

High memory bandwidth utilization with low compute utilization → memory-bound workload.​

Frequent host-to-device and device-to-host transfers → offloading or shared memory behavior.​

GPU page fault counters and PCIe utilization spikes → workloads spilling out of VRAM.​

Tools like nvidia-smi, Nsight Systems, and profiling frameworks expose these metrics and can be wired into alerts long before user-facing errors appear. The goal is to identify “VRAM almost full, bandwidth saturated, compute idle” patterns, the classic signatures of shared memory pain, before they translate into downtime.
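As a starting point, the signals above can be polled from nvidia-smi and turned into simple warnings. The sketch below is one possible shape, not a recommended production monitor: the thresholds are arbitrary placeholders to tune per workload, the helper names are made up, and it assumes every queried field returns a number (some GPUs report `[N/A]` for certain fields, which this parser does not handle).

```python
import subprocess

QUERY = "memory.used,memory.total,utilization.gpu,utilization.memory"


def read_gpu_stats() -> str:
    """Run nvidia-smi and return one CSV line per GPU (MiB, MiB, %, %)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_stats(csv_text: str) -> list[dict]:
    rows = []
    for line in csv_text.strip().splitlines():
        used, total, gpu_util, mem_util = (float(v) for v in line.split(","))
        rows.append({"vram_fraction": used / total,
                     "gpu_util": gpu_util,
                     "mem_util": mem_util})
    return rows


def memory_warnings(row: dict, vram_full: float = 0.90,
                    mem_busy: float = 80.0, compute_idle: float = 40.0) -> list[str]:
    """Flag the 'VRAM almost full, bandwidth saturated, compute idle' pattern."""
    warnings = []
    if row["vram_fraction"] >= vram_full:
        warnings.append("VRAM almost full")
    if row["mem_util"] >= mem_busy and row["gpu_util"] <= compute_idle:
        warnings.append("memory-bound: bandwidth busy while compute is idle")
    return warnings
```

A typical loop would call `parse_stats(read_gpu_stats())` every few seconds and forward any non-empty `memory_warnings` result to the alerting system already in use.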

Choosing the Right Memory Model by Stage

Different phases of an AI project tolerate different tradeoffs.

Early prototyping: Small models, frequent code changes. Shared memory or smaller dedicated GPUs can be acceptable to optimize for iteration speed over perfect latency.​

Research and scaling: As models cross tens of billions of parameters and experiments get expensive, dedicated VRAM becomes non-negotiable. A100/H100-era GPUs with 80 GB+ HBM give researchers room to explore without rewriting everything around memory limits.​

Production: Inference SLAs and user expectations demand dedicated memory with high bandwidth and consistent behavior. H100 and H200-class hardware exist precisely to keep large models in HBM and deliver predictable latency.​

Budget-conscious teams often choose RTX 4090-class cards first. These offer 24 GB of dedicated GDDR6X and ~1 TB/s of bandwidth, which is enough for mid-size models and aggressive quantization. As workloads grow, they graduate to HBM-based GPUs to avoid hitting the bandwidth wall.​

The Real Bottom Line

Shared GPU memory has a place. It does not belong at the core of serious AI systems.

As models become larger and more bandwidth-hungry, memory architecture defines whether systems scale smoothly or fail under pressure. Platforms that hide shared memory behind friendly numbers create fragility. Platforms that expose dedicated VRAM deliver reliability.

Spheron AI is built around this principle. Dedicated GPU memory, bare-metal performance, and transparent hardware access are not optional features. They are the foundation for AI systems that work when it matters.




Copyseeker Launches n8n Community Node for Automated Reverse Image Search | Web3Wire



If you’ve ever tried to track down where your images end up online, you know it’s a nightmare. Manual searching takes forever, and you’re bound to miss stuff. That’s the problem Copyseeker set out to solve—and today, it just got a whole lot easier with the release of an official n8n community node.

The node is live on npm: https://www.npmjs.com/package/n8n-nodes-copyseeker

What’s the big deal?

n8n is one of the most popular workflow automation tools out there, and now Copyseeker users can plug reverse image search directly into their automations. No coding required. Just drag, drop, connect, and let it run.

Think about what that means in practice. A photographer can set up a workflow that checks their portfolio images every morning, flags any new matches, and sends an email if something pops up. An e-commerce brand can monitor their product photos for knockoffs on competing sites. A marketing team can track where their campaign visuals spread across the web.

All of this runs in the background without anyone lifting a finger.

Why this matters

“I built Copyseeker because I was frustrated with how hard it was to find where images show up online,” said Mantas, who runs the service. “The API was the first step. But most people don’t want to write code just to check if someone’s using their photos. The n8n node changes that—now anyone can set up automated monitoring in minutes.”

The timing makes sense. Visual content theft is everywhere, from scraped blog images to stolen product photos on shady marketplaces. Creators and businesses need better tools to stay on top of it, and manual searches just don’t cut it anymore.

What you can actually do with it

Here’s where it gets practical:

Set up a daily scan of your image library and get Slack alerts when matches appear. Log everything to a Google Sheet or Airtable for documentation. Build a workflow that checks new product images right after upload. Combine it with other n8n nodes to create something more complex—like auto-generating takedown request drafts when high-confidence matches are found.

The node takes image URLs, runs them through Copyseeker’s search, and returns match data including where the image was found, similarity scores, and page info. From there, you route it wherever you need.

Getting started

The node works with any self-hosted n8n instance or n8n Cloud (with community nodes enabled). You’ll need a Copyseeker API key from RapidAPI, and then you’re good to go.

Installation is straightforward—just search for “copyseeker” in the n8n community nodes or grab it directly from npm.

About Web3Wire Web3Wire – Information, news, press releases, events and research articles about Web3, Metaverse, Blockchain, Artificial Intelligence, Cryptocurrencies, Decentralized Finance, NFTs and Gaming. Visit Web3Wire for Web3 News and Events, Block3Wire for the latest Blockchain news and Meta3Wire to stay updated with Metaverse News.




Marshall Islands Test Crypto for Universal Basic Income as Cash and Banks Fall Short – Decrypt




In brief

The Marshall Islands used Stellar for universal basic income disbursements last month.
It sent citizens USDM1 as a savings and payments vehicle.
The island nation is currently reliant on physical cash.

Access to financial services is shifting in the Republic of the Marshall Islands (RMI), as the island nation begins using digital assets to support its citizens.

Late last month, some Marshallese accepted paper checks under ENRA, the RMI’s universal basic income program, while others saw a token called USDM1 appear in Lomalo, a Stellar-based “digital citizen wallet” developed by enterprise blockchain platform Crossmint.

As a fully collateralized sovereign bond, the token generates yield and is designed to serve as a medium of exchange for the Marshall Islands’ 40,000 residents, according to Paul Wong, director of special projects at the Stellar Development Foundation (SDF).


“Unlike a stablecoin, where the issuer is actually earning yield, in this case, the asset holder is earning yield,” he told Decrypt, describing USDM1 as effectively a money market fund.

The distinction between a stablecoin and sovereign bond may be somewhat trivial to Lomalo’s users, but USDM1 shows how governments can offer digital assets that serve dual purposes, while avoiding issues that may arise, for example, if a stablecoin were to lose its peg.

“All they care about is whether there’s money in their account,” Crossmint co-founder Rodri Fernandez Touza told Decrypt, noting that Lomalo was built for simplicity.

Touza characterized features that crypto users have grown accustomed to, such as seed phrases and “weird popups,” as unworkable for the general public. As a result, those features aren’t present in Lomalo, where Crossmint generates and manages user credentials.

USDM1 disbursements are made quarterly to eligible citizens in the RMI. That provides “an opportunity to digitize the economy,” Wong said, for a country that is already dollarized and serviced by the U.S. Postal Service.

Shipping containers

In the Marshall Islands, physical cash is king, but not necessarily by choice.

A white paper tied to USDM1’s debut describes how the Marshall Islands became increasingly reliant on physical cash after several banks withdrew from the country following the 2008 global financial crisis. 

As subsequent reforms altered risk-return profiles, many banks concluded that correspondent banking relationships with the Marshall Islands weren’t worth it.

Today, the Marshall Islands has only one correspondent bank that provides services such as domestic wire transfers, with a few domestic branches across the nation’s islands. It is not uncommon for citizens to travel long distances just to cash a check, the white paper states.

“If they were to lose that correspondent bank, it would be disconnected from the global financial system,” Wong said. “This instrument provides an alternative.”

Although the Marshall Islands are vast, covering an area comparable to Mexico, the white paper notes that SpaceX’s Starlink has made internet access widely available. Still, the country relies on physical cash, often arriving via shipping containers.

“Even if you want to make it work with cash, there are many times where constraints in the economy prevent people from having access to money,” Touza said, explaining that some citizens travel large distances by water, only to discover an empty ATM.

The RMI’s adoption of USDM1 continues the SDF’s efforts to broaden access to financial services in hard-to-reach areas, including those affected by geopolitical conflict. The development of USDM1 was funded with a multi-million-dollar grant by the SDF.

Wong said the SDF is currently working with the German government to support payroll services for healthcare workers in the Middle East. The SDF is also working with the United Nations Development Programme on several cash-disbursement projects, he added.

Collaborating with a United Nations agency dedicated to refugees, the SDF helped establish an aid distribution system in Ukraine supporting Circle’s USDC stablecoin. The SDF partnered with the Ukrainian government in 2021, resulting in the creation of a payments system.

Wong said that work has influenced the SDF’s approach to USDM1, including the notion that individuals are treated as the sole beneficiary of their digital funds. In practice, that could affect longstanding social dynamics for marginalized groups, he said.

“That risk of physical threat is much lower,” Wong said. “When you distribute universal basic income to a woman, it’s not going to some joint account where, historically, a man has used it for purposes other than the family.”


The Hidden Cost of Colour Shift in ND Filters | Web3Wire



Neutral-density (ND) filters are best known for their use in reducing the amount of light that reaches a camera sensor, which allows longer exposure times or wider apertures to be used even in bright lighting conditions. In theory, such filters should just lower the intensity of all wavelengths equally so that the color remains the same. However, in reality a number of ND filters have been found to cause slight shifts in their color balance.

Why Colour Shift Matters

Color shift undermines the basic purpose of an ND filter: keeping colors neutral. A filter that passes some wavelengths more readily than others leaves a tint on the image, typically blue, magenta, or green, depending on which wavelengths are favored. This is a serious issue in professional photography, landscape work, and long-exposure scenes where accurate color reproduction is critical. Color casts make color correction harder and add post-processing time, and in the worst cases they remain visible even after adjustment.

How Colour Shift Arises in ND Filters

One of the leading causes of color shift is uneven attenuation across wavelengths. A well-made ND filter should reduce light intensity uniformly for all wavelengths in the visible range. If a filter fails to do this, it lets certain wavelengths (often infrared or ultraviolet) pass more freely, causing unwanted tints in the image. Leaked infrared causes magenta or red casts, while ultraviolet that is not adequately blocked lends the image a cooler, bluish tone.

Conclusion

Color shift is the hidden cost of ND filters: it undermines their basic function and compromises image fidelity. It results from spectral imbalance, substandard coatings, or low-quality materials, and these issues become more apparent with heavy filtering or when using a high-resolution device. While white balance correction or post-processing can alleviate some of the symptoms, they cannot always restore the original color, especially if the cast is uneven or severe.

For more information, visit https://www.mecoopticalgroup.com/product/camera-mrc-nd-filter/

SEO MAVENS LLC, 1001 S MAIN ST STE 500, KALISPELL, MT 59901

SEO Mavens is a U.S.-based digital marketing agency specializing in SEO, link building, and content strategy, helping global brands improve online visibility through ethical, data-driven search optimization.

This release was published on openPR.

About Web3Wire Web3Wire – Information, news, press releases, events and research articles about Web3, Metaverse, Blockchain, Artificial Intelligence, Cryptocurrencies, Decentralized Finance, NFTs and Gaming. Visit Web3Wire for Web3 News and Events, Block3Wire for the latest Blockchain news and Meta3Wire to stay updated with Metaverse News.




Ethereum Foundation refocuses to security over speed – sets strict 128-bit rule for 2026




The zkEVM ecosystem spent a year sprinting on latency. Proving time for an Ethereum block collapsed from 16 minutes to 16 seconds, costs dropped 45-fold, and participating zkVMs now prove 99% of mainnet blocks in under 10 seconds on target hardware.

The Ethereum Foundation (EF) declared victory on Dec. 18: real-time proving works. The performance bottlenecks are cleared. Now the real work starts, because speed without soundness is a liability, not an asset, and the math under many STARK-based zkEVMs has been quietly breaking for months.

In July, the EF set a formal target for “real-time proving” that bundled latency, hardware, energy, openness and security: prove at least 99% of mainnet blocks within 10 seconds, on hardware that costs roughly $100,000 and runs within 10 kilowatts, with fully open-source code, at 128-bit security, and with proof sizes at or below 300 kilobytes.

The Dec. 18 post claims the ecosystem met the performance target, as measured on the EthProofs benchmarking site.

Real-time here is defined relative to the 12-second slot time and about 1.5 seconds for block propagation. The standard is essentially “proofs are ready fast enough that validators can verify them without breaking liveness.”

The EF now pivots from throughput to soundness, and the pivot is blunt. Many STARK-based zkEVMs have relied on unproven mathematical conjectures to achieve advertised security levels.

Over the past months, some of those conjectures, especially the “proximity gap” assumptions used in hash-based SNARK and STARK low-degree tests, have been mathematically broken, knocking down the effective bit-security of parameter sets that depended on them.

The EF says the only acceptable endgame for L1 use is “provable security,” not “security assuming conjecture X holds.”

They set 128-bit security as the target, aligning it with mainstream crypto standards bodies and academic literature on long-lived systems, as well as with real-world record computations that show 128 bits is realistically out of reach for attackers.

The emphasis on soundness over speed reflects a qualitative difference.

If someone can forge a zkEVM proof, they can mint arbitrary tokens or rewrite L1 state and make the system lie, not just drain one contract.

That justifies what the EF calls a “non-negotiable” security margin for any L1 zkEVM.

Three-milestone roadmap

The post lays out a clean roadmap with three hard stops. First, by the end of February 2026, every zkEVM team in the race plugs its proof system and circuits into “soundcalc,” an EF-maintained tool that computes security estimates based on current cryptanalytic bounds and the scheme’s parameters.

The story here is “common ruler.” Instead of each team quoting their own bit security with bespoke assumptions, soundcalc becomes the canonical calculator and can be updated as new attacks emerge.

Second, “Glamsterdam” by the end of May 2026 demands at least 100-bit provable security via soundcalc, final proofs at or below 600 kilobytes, and a compact public explanation of each team’s recursion architecture with a sketch of why it should be sound.

That quietly walks back the original 128-bit requirement for early deployment and treats 100 bits as an interim target.

Third, “H-star” by the end of 2026 is the full bar: 128-bit provable security by soundcalc, proofs at or below 300 kilobytes, plus a formal security argument for the recursion topology. That is where this becomes less about engineering and more about formal methods and cryptographic proofs.
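The two quantitative milestones can be written down as a small checker. This is only a sketch: the `Milestone` and `ProverReport` names and the sample figures are hypothetical, and the actual assessment is done by soundcalc, not by code like this. Only the thresholds (100 bits and 600 kilobytes for Glamsterdam, 128 bits and 300 kilobytes for H-star) come from the roadmap described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milestone:
    name: str
    min_security_bits: int   # provable security required
    max_proof_kb: int        # final proof size cap in kilobytes


@dataclass(frozen=True)
class ProverReport:
    # Hypothetical self-reported figures for one zkEVM team.
    security_bits: int
    proof_kb: float


# Thresholds as stated in the roadmap above.
GLAMSTERDAM = Milestone("Glamsterdam", min_security_bits=100, max_proof_kb=600)
H_STAR = Milestone("H-star", min_security_bits=128, max_proof_kb=300)


def meets(report: ProverReport, milestone: Milestone) -> bool:
    """Return True if the report satisfies both numeric requirements."""
    return (report.security_bits >= milestone.min_security_bits
            and report.proof_kb <= milestone.max_proof_kb)


# Made-up example: checked against the interim and the final target.
example = ProverReport(security_bits=100, proof_kb=450)
print(meets(example, GLAMSTERDAM), meets(example, H_STAR))
```

The formal security argument for the recursion topology that H-star also requires is not something a numeric check like this can capture.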

Technical levers

The EF points to several concrete tools intended to make the 128-bit, sub-300-kilobyte target feasible. They highlight WHIR, a new Reed-Solomon proximity test that doubles as a multilinear polynomial commitment scheme.

WHIR offers transparent, post-quantum security and produces smaller proofs with faster verification than older FRI-style schemes at the same security level.

Benchmarks at 128-bit security show proofs roughly 1.95 times smaller and verification several times faster than baseline constructions.

They reference “JaggedPCS,” a set of techniques for avoiding excessive padding when encoding traces as polynomials, which let provers avoid wasted work while still producing succinct commitments.

They mention “grinding,” which is brute-force searching over protocol randomness to find cheaper or smaller proofs while staying within soundness bounds, and “well-structured recursion topology,” meaning layered schemes in which many smaller proofs are aggregated into a single final proof with carefully argued soundness.
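To make grinding concrete, here is a minimal sketch of one common form, assumed here and not confirmed by the EF post: a proof-of-work step in which the prover must find a nonce whose hash, together with the proof transcript, starts with a set number of zero bits. Each required bit roughly doubles the cost of a forgery attempt, which is what lets a protocol trade a little prover work for fewer queries and a smaller proof. The transcript bytes, the use of SHA-256, and the 12-bit difficulty are all illustrative choices.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits in a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def grind(transcript: bytes, bits: int) -> int:
    """Search for a nonce whose hash with the transcript has `bits` leading zeros."""
    nonce = 0
    while True:
        digest = hashlib.sha256(transcript + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce
        nonce += 1


def check(transcript: bytes, nonce: int, bits: int) -> bool:
    """Verifier side: a single hash confirms the prover did the work."""
    digest = hashlib.sha256(transcript + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= bits


transcript = b"hypothetical-proof-transcript"
nonce = grind(transcript, bits=12)   # about 2**12 hashes on average
print(check(transcript, nonce, bits=12))
```

The asymmetry is the point of the design: the search costs the prover thousands of hashes, while the verifier pays for exactly one.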

Exotic polynomial math and recursion tricks are being used to shrink proofs back down after cranking security up to 128 bits.

Independent work like Whirlaway uses WHIR to build multilinear STARKs with improved efficiency, and more experimental polynomial-commitment constructions are being built from data-availability schemes.

The math is moving fast, but it’s also moving away from assumptions that looked safe six months ago.

What changes and the open questions

If proofs are consistently ready within 10 seconds and stay under 300 kilobytes, Ethereum can increase the gas limit without forcing validators to re-execute every transaction.

Validators would instead verify a small proof, letting block capacity grow while keeping home-staking realistic. This is why the EF’s earlier real-time post tied latency and power explicitly to “home proving” budgets like 10 kilowatts and sub-$100,000 rigs.

The combination of large security margins and small proofs is what makes an “L1 zkEVM” a credible settlement layer. If those proofs are both fast and provably 128-bit secure, L2s and zk-rollups can reuse the same machinery via precompiles, and the distinction between “rollup” and “L1 execution” becomes more of a configuration choice than a rigid boundary.

Real-time proving is currently an off-chain benchmark, not an on-chain reality. The latency and cost numbers come from EthProofs’ curated hardware setups and workloads.

There is still a gap between that and thousands of independent validators actually running these provers at home. The security story is in flux. The whole reason soundcalc exists is that STARK and hash-based SNARK security parameters keep moving as conjectures are disproven.

Recent results have redrawn the line between “definitely safe,” “conjecturally safe,” and “definitely unsafe” parameter regimes, meaning today’s “100-bit” settings may be revised again as new attacks emerge.

It’s not clear whether all major zkEVM teams will actually hit 100-bit provable security by May 2026 and 128-bit by December 2026 while staying under the proof-size caps, or whether some will quietly accept lower margins, rely on heavier assumptions, or push verification off-chain for longer.

The hardest part may not be math or GPUs, but formalizing and auditing the full recursion architectures.

The EF admits that different zkEVMs often compose many circuits with substantial “glue code” between them, and that documenting and proving soundness for those bespoke stacks is essential.

That opens a long tail of work for projects like Verified-zkEVM and formal verification frameworks, which are still early and uneven across ecosystems.

A year ago, the question was whether zkEVMs could prove fast enough. That question is answered. The new question is whether they can prove soundly enough, at a security level that doesn’t depend on conjectures that may break tomorrow, with proofs small enough to propagate across Ethereum’s P2P network, and with recursion architectures formally verified enough to anchor hundreds of billions of dollars.

The performance sprint is over. The security race just started.


‘Bitcoin Senator’ Cynthia Lummis Will Not Run for Reelection – Decrypt




In brief

Sen. Cynthia Lummis (R-WY) announced she won’t seek reelection when her Senate term ends next year.
Lummis was a central force behind major crypto efforts, including passage of the GENIUS Act and ongoing market structure bill talks.
She has also been a particularly avid supporter of Bitcoin.

Sen. Cynthia Lummis (R-WY), one of the crypto industry’s most reliable and powerful allies on Capitol Hill, announced Friday that she will not seek reelection when her term expires next year.

“Deciding not to run for reelection does represent a change of heart for me, but in the difficult, exhausting session weeks this fall I’ve come to accept that I do not have six more years in me,” Lummis said in a statement. “I am a devout legislator, but I feel like a sprinter in a marathon. The energy required doesn’t match up.”

Earlier this year, Lummis—who has been called the “Bitcoin Senator” for her crypto support and advocacy—was instrumental to the passage of the GENIUS Act, the first-ever major piece of crypto legislation signed into law. The bill, which established a federal framework for issuing and trading stablecoins, faced many dramatic starts and stops before ultimately getting over the finish line in late July.

Lummis has also been at the center of ongoing negotiations over the crypto industry’s coveted market structure bill, which has faced even more substantial hurdles to passage. The history of that bill, which would formally legalize most crypto activity in the United States, stretches back to 2022, when Lummis and Sen. Kirsten Gillibrand (D-NY) first drafted a version that was ultimately never passed. 

The sprawling market structure bill currently faces numerous obstacles—among them growing dissension between factions within the crypto industry over the legislation’s content and necessity. Senate Republicans first aimed to see the bill passed by the end of summer, then by September, then by the end of this year—a target that has also now slipped by.


The legislation has not yet been marked up by the Senate Banking Committee, and Congress is expected to grind to a halt by spring in anticipation of the 2026 midterms. Whether the bill will manage to become law will likely become one of the final benchmarks of Lummis’ 18-year tenure in Congress.

In her time advocating for crypto-related issues, Lummis has also placed a particular emphasis on the importance of Bitcoin. Earlier this year, the senator introduced the Bitcoin Act, which would obligate the U.S. government to purchase some $80 billion worth of Bitcoin over a five-year period in the interest of bolstering a federal strategic Bitcoin reserve.

Lummis’ retirement announcement Friday immediately prompted messages of support from crypto industry leaders. 

“Senator Lummis has been a leading champion for digital assets in Washington,” Ji Kim, CEO of the Crypto Council for Innovation, said in a statement shared with Decrypt. “The digital asset ecosystem is stronger because of her service, and we are grateful for her leadership.”

Lummis would have been up for reelection next year. She will retire from Congress in January 2027.


Dynamic Aerospace Systems Announces Official Corporate Name Change from BrooQLy, Inc. | Web3Wire



ANN ARBOR, MICHIGAN / ACCESS Newswire / December 19, 2025 / Dynamic Aerospace Systems (OTCQB:BRQL), a leading innovator in U.S.-manufactured unmanned aerial vehicles (UAVs) and aerospace technologies, today announced that its corporate name change, from BrooQLy, Inc. to Dynamic Aerospace Systems, has become effective.

The name change, approved by shareholders on December 11, 2025, comes in anticipation of the company’s planned uplisting to the New York Stock Exchange (NYSE) in 2026 and better aligns with the company’s core focus on advanced drone systems proudly designed and manufactured in the United States at its facility in Ann Arbor, Michigan.

The Company filed the applicable documentation with the State of Nevada, where it is incorporated, with the change taking effect today. Dynamic Aerospace Systems will now submit the necessary documentation to the Financial Industry Regulatory Authority (FINRA) to reflect the corporate name change.

The Company’s ticker symbol will remain “BRQL” until the anticipated NYSE listing. Once uplisted, it will transition to “DAS”, which the Company reserved in March of this year.

“Operating as Dynamic Aerospace Systems more accurately represents our mission, while aligning us with defense, commercial, and logistics applications. This clarity is imperative in communicating our story to the public as we move towards our NYSE debut,” said Kent Wilson, CEO of Dynamic Aerospace Systems.

About Dynamic Aerospace Systems (DAS): Dynamic Aerospace Systems is a Nevada-incorporated business dedicated to developing innovative aerospace technologies, with a focus on advanced drones (UAVs) for military defense and commercial applications. Committed to engineering excellence and strategic partnerships, DAS delivers reliable, high-performance solutions to meet the evolving needs of the aerospace industry. The Company’s common stock is traded on the OTCQB Market under the ticker symbol “BRQL.”

For more information about DAS, visit: https://www.dynamicaerosystems.com/investor-relations/why-dynamic

Contact Information:
Dynamic Aerospace Systems (DAS)
3753 Plaza Dr, Ann Arbor, MI 48108

Investor Relations: [email protected]
Media Inquiries: [email protected]

Follow DAS news and updates:
X: https://x.com/DynamicAeroSys
LinkedIn: https://www.linkedin.com/company/dynamic-aerospace-systems/
BlueSky: https://bsky.app/profile/dynamicaerosys.bsky.social
Facebook: https://www.facebook.com/profile.php?id=61572730386312
StockTwits: https://stocktwits.com/symbol/BRQL

Forward-Looking Statement:

This press release contains forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995, including, without limitation, statements regarding the anticipated benefits of the company’s corporate name change, the expected timing and outcome of the company’s planned uplisting to the New York Stock Exchange, the future transition of the Company’s ticker symbol to DAS, and the Company’s strategic growth plans as Dynamic Aerospace Systems. Forward-looking statements are often identified by words such as “may,” “will,” “should,” “expect,” “anticipate,” “intend,” “plan,” “believe,” “estimate,” “potential,” “project,” or similar terminology. These statements are based on current expectations, estimates, forecasts, and assumptions that involve risks and uncertainties which could cause actual results or events to differ materially from those expressed or implied.

Factors that could cause such differences include, but are not limited to: the Company’s ability to satisfy the listing requirements of the New York Stock Exchange, the timing and outcome of regulatory reviews including FINRA processing of the name change, changes in market conditions or investor sentiment, operational or financial challenges, shifts in U.S. defense or commercial drone demand, competitive developments, and broader economic or geopolitical factors. Additional risks are described in the Company’s filings with the Securities and Exchange Commission.

Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of the date of this release. Except as required by law, the Company undertakes no obligation to update or revise any forward-looking statements, whether as a result of new information, future developments, or otherwise.

SOURCE: BrooQLy, Inc.


GPU Monitoring for ML: SM Efficiency, Memory Bandwidth, and Bottleneck



Your training job crashes. Again. The error mentions memory, but system monitors show plenty of free RAM. CPU usage looks normal. Disk is fine. You restart the job, lower the batch size, and try again. A few hours later, it fails in the same way.

After enough digging, the real issue becomes clear. The GPU ran out of memory, but nobody was actively watching GPU utilization or VRAM usage. The system failed silently until it hit a hard limit.

This situation is painfully common in AI teams. According to recent industry surveys, more than 75% of organizations run GPUs below 70% utilization even at peak load. That means teams waste capacity while still dealing with crashes, slow training, and unpredictable performance.

Knowing how to check GPU usage correctly turns GPUs from opaque, failure-prone assets into predictable infrastructure you can trust.

The Silent Cost of Hidden Failures

The financial impact of poor GPU monitoring extends far beyond software debugging. The data center GPU market alone is projected to grow from $119.97 billion in 2025 to $228.04 billion by 2030, representing a 13.7% compound annual growth rate. GPU installations themselves are scaling at 1.8x annually, with each server consuming 5.9x more power than traditional CPU-based systems. This explosive growth makes visibility not just a debugging convenience but a business imperative.

At Meta’s scale, the operational impact of monitoring failures is staggering. During a 54-day training run using their Grand Teton platform, the team experienced 419 job interruptions, roughly one failure every 3 hours. When projected to a 128,000-GPU cluster (the scale needed for next-generation models), this translates to a job interruption every 23 minutes. Without proper monitoring and fault detection, these interruptions cascade through training pipelines, turning days of computation into wasted infrastructure costs.
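The projection above is simple proportional arithmetic, sketched below. The paragraph does not state the size of the cluster used for the 54-day run, so the 16,384-GPU figure in the code is an assumption, as is the premise that interruptions scale linearly with GPU count.

```python
def interval_minutes(run_days: float, interruptions: int) -> float:
    """Average minutes between interruptions over a run."""
    return run_days * 24 * 60 / interruptions


def scaled_interval(base_minutes: float, base_gpus: int, target_gpus: int) -> float:
    """Scale the interval assuming failures grow linearly with GPU count."""
    return base_minutes * base_gpus / target_gpus


BASE_GPUS = 16_384  # assumption: cluster size of the 54-day run, not stated above

base = interval_minutes(run_days=54, interruptions=419)
projected = scaled_interval(base, BASE_GPUS, target_gpus=128_000)
print(f"{base / 60:.1f} h between failures, {projected:.1f} min at 128,000 GPUs")
```

Under those assumptions the result lands between 23 and 24 minutes, in line with the figure quoted above.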

Why GPU Monitoring Is Not Optional Anymore

GPUs sit at the center of modern AI systems. They are also one of the most expensive parts of the stack. Whether you buy hardware or rent it in the cloud, every idle minute costs money. Current on-demand pricing ranges from $1.21 per hour for H100s on Spheron AI to $6.98 per hour on Azure, a spread of roughly 5.8x depending on the provider.

Without monitoring, teams operate on assumptions. They assume GPUs are busy. They assume memory is fine. They assume slow training is a model issue. Most of the time, those assumptions are wrong.

Research shows that 54.5% of teams cite cost as their biggest GPU issue, not hardware scarcity. More troubling, 90% of organizations report cost or resource-sharing as top blockers to GPU utilization. When teams dig deeper, poor monitoring reveals itself as a major culprit: 16% of organizations explicitly cite monitoring and visibility gaps as a primary GPU challenge.

Figure: Top GPU Resource Issues Blocking Organizations (2025)

Proper GPU monitoring gives teams visibility into what actually happens during training and inference. It helps catch memory pressure before jobs crash. It exposes data pipeline bottlenecks that starve GPUs. It reveals whether expensive accelerators deliver real value or sit idle.

As models grow larger and pipelines become more complex, GPU monitoring shifts from a debugging tool to a core operational requirement.

What “GPU Usage” Really Means

Many teams think GPU usage is a single number. It is not.

GPU usage includes several different dimensions, each telling a different story about system health.

Compute utilization shows how often GPU cores execute kernels. Memory usage shows how much VRAM the workload consumes. Memory bandwidth reveals how fast data moves to compute units. Streaming multiprocessor efficiency shows how well kernels map to GPU architecture. Power draw and temperature indicate whether the GPU runs efficiently or throttles.

Looking at one metric in isolation often misleads teams. A GPU can show 100% utilization while delivering poor performance because kernels do not fully occupy hardware units. Another GPU can show 50% utilization while running efficiently due to bursty workloads.

The memory bandwidth dimension alone reveals critical architectural differences. Bandwidth has grown steeply across NVIDIA generations: the RTX A4000 delivers 448 GB/s of memory bandwidth, the A100 reaches 1,555 GB/s, and the H100 exceeds 3 TB/s in its SXM form. These increases enable training of progressively larger models without memory bandwidth becoming the limiting factor.

Figure: GPU Memory Bandwidth Evolution Across NVIDIA Generations

Real understanding comes from reading these signals together.

The Fastest Way to Check GPU Usage

Most developers already have the tools they need.

The nvidia-smi command ships with NVIDIA drivers and gives immediate insight into GPU state. It reports utilization, memory usage, temperature, power draw, and running processes.

Running nvidia-smi once gives a snapshot. Running nvidia-smi -l 1 updates every second and shows how metrics evolve during training or inference. This alone often reveals issues such as memory steadily climbing toward failure or GPUs sitting idle between batches.

For a cleaner view, many teams use gpustat. It provides a compact summary of GPU load, VRAM usage, and active processes in a format that is easier to scan during development.

These tools work well for local debugging and small systems.
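For scripted checks, the same numbers can be pulled from nvidia-smi in CSV form and parsed. The sketch below assumes the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi. The helper names and the sample output are made up for illustration, so the parser can be tried on a machine without a GPU.

```python
import subprocess

FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total",
          "temperature.gpu", "power.draw"]


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        row = {}
        for field, value in zip(FIELDS, values):
            try:
                row[field] = float(value)
            except ValueError:
                row[field] = None   # e.g. "[N/A]" when a reading is unavailable
        rows.append(row)
    return rows


def read_gpus() -> list[dict]:
    """Run nvidia-smi once and return parsed metrics (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Made-up sample output so the parser can be tried without a GPU.
sample = "0, 87, 30512, 81559, 64, 312.45\n1, 3, 1024, 81559, 41, [N/A]\n"
for gpu in parse_gpu_csv(sample):
    print(gpu)
```

On a machine with a driver installed, calling `read_gpus()` in a loop with a short sleep gives roughly the same view as `nvidia-smi -l 1`, in a form that can be logged or compared against thresholds.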

Monitoring GPU Usage Inside Training Code

Framework-level monitoring adds another layer of visibility.

PyTorch allows developers to query allocated and reserved GPU memory directly from training scripts. This helps track memory growth across epochs and identify leaks caused by tensors lingering on the GPU:

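    import torch

    # Assumes model, criterion, optimizer, inputs, targets, and num_epochs
    # are already defined by the surrounding training script.

    # Start recording allocation history (underscore-prefixed, private API).
    torch.cuda.memory._record_memory_history(max_entries=100000)

    for epoch in range(num_epochs):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        # Log allocated vs. reserved bytes to spot growth across epochs.
        print(epoch, torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

    # Save the snapshot for offline inspection, then stop recording.
    torch.cuda.memory._dump_snapshot("profile.pkl")
    torch.cuda.memory._record_memory_history(enabled=None)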

TensorFlow exposes similar APIs for inspecting GPU memory usage. Logging these metrics during training helps correlate memory spikes with specific operations or data batches.

When teams log GPU metrics alongside loss curves and throughput, patterns emerge quickly. Performance issues stop being mysterious and start becoming measurable.

Beyond Single-Node Monitoring: Profiling at Scale

As systems move into production or scale across multiple GPUs, basic tools stop being enough.

NVIDIA Nsight Systems provides deep profiling of GPU and CPU activity over time. It shows exactly when GPUs compute, wait, or stall. However, it is designed primarily for lab environments, supporting a maximum profiling duration of just 5 minutes with 20-200x runtime overhead. This makes it impractical for continuous production monitoring.

For production-grade visibility at cluster scale, teams turn to dedicated monitoring stacks. Prometheus collects GPU metrics over time, while Grafana visualizes them in real-time dashboards. With NVIDIA’s GPU exporter, teams track utilization, memory, temperature, and power across entire clusters with approximately 5% overhead.

Alerts notify teams when GPUs idle for too long, memory approaches limits, or temperatures spike. Historical data reveals trends that point to deeper issues long before users notice problems.
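The alert conditions just described boil down to a few threshold comparisons. In a Prometheus setup they would normally live in alerting rules; the sketch below expresses the same logic in Python for illustration. The `GpuSample` type and all three thresholds are assumptions, with only the 80°C figure matching a number mentioned later in this article.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    utilization_pct: float
    memory_used_mb: float
    memory_total_mb: float
    temperature_c: float
    idle_seconds: float   # time since utilization was last above zero


# Illustrative thresholds; tune them for the hardware and workload at hand.
MAX_IDLE_SECONDS = 600
MAX_MEMORY_FRACTION = 0.90
MAX_TEMPERATURE_C = 80


def alerts_for(sample: GpuSample) -> list[str]:
    """Return the names of every alert condition the sample triggers."""
    alerts = []
    if sample.idle_seconds > MAX_IDLE_SECONDS:
        alerts.append("idle_too_long")
    if sample.memory_used_mb / sample.memory_total_mb > MAX_MEMORY_FRACTION:
        alerts.append("memory_near_limit")
    if sample.temperature_c > MAX_TEMPERATURE_C:
        alerts.append("temperature_high")
    return alerts


print(alerts_for(GpuSample(95, 78_000, 81_559, 84, 0)))
```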

For the most demanding environments, zymtrace represents a newer generation of tools. It provides always-on cluster-wide profiling with minimal overhead (approximately 1 logical core per node), capturing transient performance issues that point-in-time snapshots cannot detect. Unlike Nsight Systems, it correlates GPU performance with CPU stack traces and system-wide metrics, making it ideal for distributed training.

Figure: GPU Monitoring Tools: Trade-offs Between Complexity, Overhead, and Production Readiness

GPU Metrics That Actually Matter

GPU utilization often gets the most attention, but it rarely tells the full story.

GPU utilization measures how often kernels run. High utilization does not guarantee efficient computation. Low utilization does not always mean waste. Context matters.

Memory usage often predicts failures earlier than compute metrics. Gradual memory growth across iterations usually signals leaks. Sudden spikes often indicate oversized batches or unexpected data shapes. Research shows that memory exhaustion is the most frequent cause of GPU crashes in distributed training environments. Uncleared tensors, insufficient memory pinning, and third-party library bugs compound this problem.
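One simple way to tell gradual growth from ordinary noise is to fit a straight line through per-iteration memory readings and look at the slope. The sketch below does that with a plain least-squares fit. The function names, the sample readings, and the 1 MB-per-step cutoff are all illustrative assumptions.

```python
def growth_per_step(samples_mb: list[float]) -> float:
    """Least-squares slope of memory usage (MB) against iteration index."""
    n = len(samples_mb)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def looks_like_leak(samples_mb: list[float], min_slope_mb: float = 1.0) -> bool:
    """Flag steady growth; the 1 MB-per-step default is an arbitrary assumption."""
    return growth_per_step(samples_mb) >= min_slope_mb


steady = [1000, 1002, 999, 1001, 1000, 1001]     # noise around a plateau
leaking = [1000, 1012, 1019, 1031, 1040, 1052]   # climbs about 10 MB per step
print(looks_like_leak(steady), looks_like_leak(leaking))
```

A slope over many iterations is more robust than comparing two readings, since a single spike from an oversized batch does not shift it much.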

Streaming multiprocessor efficiency shows how well kernels use GPU hardware. Low SM efficiency with high utilization often means kernels are poorly parallelized or memory bound.

Memory bandwidth utilization reveals whether GPUs are truly saturated. A GPU can show high compute utilization while memory bandwidth remains far below peak, indicating that the GPU is waiting for data.

Power draw acts as a sanity check. GPUs doing real work typically draw power near their design limits. Low power usage often indicates that something else in the system blocks performance.

Temperature matters because sustained heat leads to throttling. Throttled GPUs look busy but run slower than expected, with reduced clock speeds leading to sudden performance drops.

How Different AI Workloads Use GPUs

Training workloads usually show steady GPU usage during forward and backward passes. Short dips between batches are normal. Long idle gaps usually point to slow data loading or CPU bottlenecks.

Well-optimized training pipelines maintain 85-95% GPU utilization during active training phases. When utilization falls below 80%, particularly with high CPU usage, data loading bottlenecks are likely the culprit. This happens when the data loader cannot keep pace with the GPU’s computational speed.

Inference workloads behave differently. Batch inference shows bursts of activity followed by idle time. Real-time inference creates short spikes when requests arrive. Some idle time is expected, but extreme variability often traces back to memory pressure or scheduling issues.

Multi-GPU training should show similar utilization across all devices. Large differences between GPUs usually indicate load imbalance, communication overhead, or inefficient parallelism.

These patterns help teams distinguish normal behavior from problems.
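Two of the patterns above lend themselves to quick automated checks: low GPU utilization paired with a busy CPU, and uneven utilization across devices. The sketch below is a rough heuristic, not a diagnosis. The 80% GPU threshold is the one quoted above, while the CPU threshold and the 15-point spread tolerance are assumptions chosen for illustration.

```python
def data_loader_bound(gpu_util_pct: float, cpu_util_pct: float,
                      cpu_high_pct: float = 90.0) -> bool:
    """Low GPU utilization plus a busy CPU suggests the input pipeline is the limit.

    The 80% GPU threshold is the one quoted above; the CPU threshold is assumed.
    """
    return gpu_util_pct < 80.0 and cpu_util_pct >= cpu_high_pct


def utilization_spread(per_gpu_util_pct: list[float]) -> float:
    """Gap between the busiest and least busy GPU in a multi-GPU job."""
    return max(per_gpu_util_pct) - min(per_gpu_util_pct)


def imbalanced(per_gpu_util_pct: list[float], max_spread_pct: float = 15.0) -> bool:
    """Flag uneven load; the 15-point tolerance is an arbitrary assumption."""
    return utilization_spread(per_gpu_util_pct) > max_spread_pct


print(data_loader_bound(gpu_util_pct=62, cpu_util_pct=97))
print(imbalanced([91, 89, 93, 58]))
```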

Turning Monitoring Data into Action

Monitoring only helps if teams act on what they see.

Low utilization often comes from data pipelines that cannot keep up. Increasing dataloader workers, using faster storage, prefetching data, or caching frequently accessed samples often fixes the issue. Research from IBM and other companies confirms that slow data access can stem from object storage throughput limits, the “many small files” problem, or GPUs positioned far from data storage.

Small batch sizes leave GPUs underutilized. Mixed precision training often allows larger batches without increasing memory usage.

Memory pressure requires careful trade-offs. Gradient accumulation simulates large batches without extra memory. Gradient checkpointing trades extra compute for lower memory usage. Mixed precision reduces memory footprint across the board.
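Gradient accumulation works because the gradient of a mean loss over a large batch equals the average of the gradients over equally sized micro-batches. The toy example below checks that identity in plain Python for a one-parameter linear model with a squared-error loss. The data and the model are made up, and a real training loop would do the same thing with framework tensors.

```python
import math


def mse_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w*x - y)**2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
w = 0.5

# One large batch: needs all 8 samples in memory at once.
full_grad = mse_grad(w, xs, ys)

# Accumulation: 4 micro-batches of 2, each gradient scaled by 1/4 and summed.
micro_size = 2
num_micro = len(xs) // micro_size
accumulated = 0.0
for i in range(num_micro):
    lo, hi = i * micro_size, (i + 1) * micro_size
    accumulated += mse_grad(w, xs[lo:hi], ys[lo:hi]) / num_micro

print(math.isclose(full_grad, accumulated))
```

Only one micro-batch is processed at a time, which is where the memory saving comes from. The cost is more forward and backward passes per optimizer step.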

Low SM efficiency often points to kernel-level issues. Using optimized libraries, kernel fusion, and modern attention implementations can dramatically improve efficiency.

Thermal throttling requires addressing cooling infrastructure. GPUs that sustain high temperatures reduce clock speeds automatically, cutting performance by as much as 25-30%. Enterprise-scale deployments require proper thermal management and alerts on sustained temperatures above 80°C.
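
A brief spike above 80°C differs from a sustained run, so a check should look at consecutive samples. This is a minimal sketch assuming evenly spaced temperature readings in degrees Celsius. The function names and the `min_samples` window are illustrative assumptions.

```python
def longest_hot_streak(temps_c, threshold_c=80.0):
    """Length of the longest run of consecutive samples above the threshold."""
    longest = current = 0
    for t in temps_c:
        current = current + 1 if t > threshold_c else 0
        longest = max(longest, current)
    return longest


def is_sustained_hot(temps_c, threshold_c=80.0, min_samples=5):
    """True when the temperature stayed above threshold for min_samples in a row."""
    return longest_hot_streak(temps_c, threshold_c) >= min_samples


readings = [78, 82, 83, 79, 81, 84, 85, 86]
print(longest_hot_streak(readings))
```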

Manual checks do not scale when models serve real users. Teams need alerts when metrics drift outside safe ranges. They need dashboards that show trends over time. They need correlation between GPU metrics and application behavior.

Historical analysis matters as much as real-time monitoring. Gradual drops in utilization often signal data distribution changes or model growth. Memory creep often indicates leaks that will eventually crash systems.
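
Memory creep can be estimated from history with a simple trend line. The sketch below fits a least-squares slope to evenly spaced memory samples and extrapolates how many more samples remain before capacity is reached. The function names are hypothetical, and a linear trend is itself an assumption that real leaks may not follow.

```python
def slope_per_step(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def steps_until_full(values, capacity):
    """Extrapolate the trend; returns None if memory is flat or shrinking."""
    slope = slope_per_step(values)
    if slope <= 0:
        return None
    return max(0.0, (capacity - values[-1]) / slope)


used_gib = [10, 12, 14, 16]  # made-up samples, one per checkpoint
print(steps_until_full(used_gib, capacity=80))
```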

When GPU metrics integrate with broader observability platforms, teams gain the context needed to prioritize fixes.

Cost Control Through GPU Visibility

GPU monitoring is also a financial tool.

Idle GPUs waste money. Underutilized GPUs slow delivery. Over-provisioned GPUs inflate cloud bills. Without monitoring, teams cannot quantify these losses.

By correlating utilization with cost, teams identify which workloads justify premium hardware and which do not. They can right-size instances, schedule jobs more efficiently, and shut down idle resources.

Consider the financial impact across cloud providers. At $3.00 per hour for AWS H100 GPUs versus $1.21 per hour on Spheron AI, the difference for a 100-GPU training run over 200 hours is staggering: $60,000 versus $24,200, a savings of $35,800 from simply choosing a more cost-efficient provider. Add proper monitoring that reduces idle time by even 10%, and the savings multiply across large-scale operations.
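
The arithmetic behind those figures is GPUs × hours × hourly rate. The sketch below reproduces it using the rates quoted in this article, which will differ from current prices. Reading "10% less idle time" as recovering 10% of billed GPU-hours is an assumption made here for illustration.

```python
def run_cost(gpus, hours, rate_per_gpu_hour):
    """Total cost in dollars, rounded to cents."""
    return round(gpus * hours * rate_per_gpu_hour, 2)


aws = run_cost(100, 200, 3.00)      # 60000.0
spheron = run_cost(100, 200, 1.21)  # 24200.0
savings = round(aws - spheron, 2)   # 35800.0

# Assumed interpretation: 10% of billed GPU-hours were idle and are recovered.
recovered = round(spheron * 0.10, 2)

print(aws, spheron, savings, recovered)
```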

Over time, these optimizations save more money than most model-level tweaks. Teams that implement GPU monitoring often recover the monitoring cost within weeks through reduced idle time and better resource allocation.

The Organizational Reality of GPU Under-utilization

The gap between purchased capacity and actual utilization represents one of the largest hidden costs in AI infrastructure. Current utilization data reveals a troubling pattern:

15% of organizations use 50% or less of available GPU resources

40% operate in the 50-70% utilization range

Only 7% achieve over 85% utilization during peak periods

This means that more than half of organizations (55%) run at 70% utilization or below, leaving significant compute capacity on the table. The reasons are multifaceted: poor scheduling, inefficient resource allocation, and most critically, lack of visibility into what is actually happening on the GPUs.

GPU Utilization Distribution Across Organizations (2024)

Building Your GPU Monitoring Strategy

The path to operational excellence in GPU infrastructure follows a progression:

Stage 1: Development – Start with nvidia-smi and gpustat for immediate feedback during model development. Both add negligible overhead: nvidia-smi ships with the NVIDIA driver, and gpustat is a lightweight, pip-installable command-line tool.

Stage 2: Framework Integration – Embed PyTorch or TensorFlow profiling into your training scripts. This adds minimal overhead and provides memory tracking that native GPU monitoring cannot offer.

Stage 3: Cluster Monitoring – Deploy Prometheus + Grafana for persistent visibility across multiple nodes. Accept approximately 5% overhead in exchange for historical trends and alerting.

Stage 4: Production Profiling – For critical workloads, implement zymtrace or similar production-grade profilers that capture cluster-wide metrics with negligible overhead and correlation across the full system stack.

Each stage builds on the previous one. Early-stage projects do not need zymtrace; production systems running million-dollar-per-week clusters cannot afford to skip any stage.
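
As a Stage 1 starting point, nvidia-smi can emit machine-readable CSV through its `--query-gpu` and `--format=csv,noheader,nounits` options. The sketch below parses that output into dictionaries. The helper names and the sample text are made up for illustration, and `query_gpus` only works on a host with an NVIDIA driver installed.

```python
import subprocess

FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]


def _to_float(text):
    """Convert a field to float; unsupported fields such as "[N/A]" become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(FIELDS):
            continue  # skip malformed lines
        idx, util, used, total, temp = parts
        rows.append({
            "index": int(idx),
            "util_pct": _to_float(util),
            "mem_used_mib": _to_float(used),
            "mem_total_mib": _to_float(total),
            "temp_c": _to_float(temp),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and parse its output; requires an NVIDIA driver."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Made-up sample in the same shape as real output, so the parser runs anywhere.
sample = "0, 93, 70512, 81559, 71\n1, 12, 1024, 81559, 45\n"
for gpu in parse_gpu_csv(sample):
    print(gpu)
```

Keeping the parser separate from the subprocess call makes it easy to test without a GPU and to reuse when the same CSV is collected from remote nodes.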

GPU Monitoring Tools: Trade-offs Between Complexity, Overhead, and Production Readiness

Common GPU Failure Patterns and Their Root Causes

Understanding how GPUs fail under load helps teams prevent common scenarios:

Memory Exhaustion (OOM): The most frequent failure mode. Memory usage climbs steadily across iterations and, without adequate monitoring, goes unnoticed until the GPU hits its VRAM limit. Prevention requires continuous memory tracking and alerts well before capacity is exhausted.

Memory Leaks: Tensors that are never released accumulate on the GPU. Custom CUDA kernels or third-party library bugs often cause these leaks, which stay invisible until a job crashes after 100+ iterations. Regular memory profiling snapshots catch them early.

Data Pipeline Bottlenecks: The GPU does not receive data fast enough to keep its compute units busy. This shows up as low GPU utilization even though the job is running. Proper I/O monitoring and prefetching strategies resolve this.

Synchronization Failures: In distributed training, timeouts or errors during gradient synchronization across multiple GPUs crash the entire job. Monitoring NCCL communication overhead helps identify these bottlenecks.

Thermal Throttling: Sustained high temperatures cause the GPU to reduce clock speeds automatically. The GPU appears to run but delivers less throughput than expected. Proper thermal management and monitoring prevent this.

Running GPUs with Visibility on Spheron AI

Access to GPUs should not mean losing control or visibility. Spheron AI provides on-demand access to NVIDIA GPUs with clear performance characteristics and predictable behavior.

Teams can monitor utilization, memory, and performance without hidden abstractions or misleading metrics. Whether training models, running inference, or scaling experiments, teams know exactly how their GPUs behave.

That visibility turns GPUs from a cost center into a reliable foundation for AI systems. Knowing how to check GPU usage properly separates stable AI systems from fragile ones.

Conclusion: From Guesswork to Engineering

Basic tools like nvidia-smi catch problems early. Advanced profiling reveals deeper inefficiencies. Centralized monitoring keeps production systems healthy. The teams that succeed are not the ones with the most GPUs. They are the ones who understand how their GPUs work. Monitoring replaces guesswork with engineering, and that difference shows up in reliability, speed, and cost.

The path forward is clear: start simple with basic monitoring, graduate to framework-level profiling, and scale to cluster-wide observability as your needs grow. Each step removes mystery from your infrastructure, making crashes predictable, utilization measurable, and costs optimizable.

The GPU revolution depends on visibility. Make it a priority, and your infrastructure will thank you.


