Wednesday, February 19, 2025

Sui Mainnet Experiences Temporary Network Halt Due to Bug


KEY TAKEAWAYS

  • The Sui Mainnet experienced a network halt due to a bug in the congestion control code, affecting all validators and halting transactions.
  • The issue was traced to the TotalGasBudgetWithCap mode, which caused a crash when processing certain transactions.
  • A fix was developed quickly, and the network was restored within 15 minutes of its release, highlighting effective incident response systems.
  • Sui plans to enhance testing and build workflows to prevent future incidents and reduce response times.

On November 21, 2024, the Sui Mainnet experienced a complete network halt between approximately 1:15 and 3:45 am PT. The incident was caused by a bug in the congestion control code that left all validators stuck in a crash loop, so no transactions could be processed.

The bug was located in the TotalGasBudgetWithCap mode of the congestion control system. This mode was briefly enabled in protocol version 63, reverted, and then re-enabled together with the accumulating scheduler in protocol version 68. The bug was triggered when the network received a transaction with a mutable shared object input and zero MoveCall commands, causing all validators to crash.
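The article does not reproduce the faulty code, but the shape of the trigger can be illustrated. The Python sketch below is a hypothetical, simplified model, not Sui's implementation: it assumes a cost estimate of the form min(gas budget, cap), where the cap grows with the number of MoveCall commands and their inputs. The names Tx, CAP_FACTOR, naive_cost and guarded_cost, and the cap factor's value, are illustrative assumptions. Under that assumption, a transaction with a mutable shared object input and zero MoveCall commands receives a cost of zero, the kind of degenerate value that downstream code may not expect.

```python
from dataclasses import dataclass

# Hypothetical cap multiplier; the real value is not given in the article.
CAP_FACTOR = 1_000


@dataclass
class Tx:
    """Simplified, hypothetical view of a transaction (not Sui's real type)."""
    gas_budget: int
    move_calls: int          # number of MoveCall commands
    move_call_inputs: int    # input arguments passed to those MoveCalls
    mutable_shared: bool     # whether it takes a mutable shared object input


def naive_cost(tx: Tx) -> int:
    """Estimate cost as min(gas budget, cap), with the cap proportional to
    MoveCall commands plus their inputs. Zero MoveCalls gives a cap of 0."""
    cap = (tx.move_calls + tx.move_call_inputs) * CAP_FACTOR
    return min(tx.gas_budget, cap)


def guarded_cost(tx: Tx) -> int:
    """Same estimate, but the cap counts one extra unit and the result is
    floored at 1, so a shared-object transaction never gets a zero cost."""
    cap = (tx.move_calls + tx.move_call_inputs + 1) * CAP_FACTOR
    return max(1, min(tx.gas_budget, cap))


# The triggering shape: mutable shared object input, zero MoveCall commands.
edge = Tx(gas_budget=50_000_000, move_calls=0, move_call_inputs=0, mutable_shared=True)
print(naive_cost(edge))    # 0
print(guarded_cost(edge))  # 1000
```

The guard shown here is only one possible mitigation under this toy model; it is not necessarily the change that was made in PR #20365.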

Technical Details and Resolution

The Sui network’s object-based architecture allows user transactions to be processed in a massively parallel way. However, transactions that write to the same shared object must execute sequentially, which limits the throughput available for that object. The congestion control system, designed to prevent overload on heavily used shared objects, was recently upgraded to improve shared object utilization by better estimating transaction complexity.
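To make per-object congestion control more concrete, the following hypothetical sketch assumes a fixed cost budget per shared object for each batch of ordered transactions: a transaction is scheduled only if every shared object it writes stays within budget, and is otherwise deferred to a later batch. The function name schedule_batch, the tuple layout and the budget value are assumptions for illustration; Sui's actual scheduler, including the accumulating scheduler mentioned above, is more involved.

```python
from collections import defaultdict


def schedule_batch(txs, budget_per_object):
    """Split one batch of ordered transactions into (scheduled, deferred).

    txs: list of (tx_id, shared_objects_written, estimated_cost) tuples.
    A transaction is scheduled only if adding its cost keeps every shared
    object it writes within budget_per_object; otherwise it is deferred.
    """
    used = defaultdict(int)  # cost already assigned to each shared object
    scheduled, deferred = [], []
    for tx_id, objects, cost in txs:
        if all(used[obj] + cost <= budget_per_object for obj in objects):
            for obj in objects:
                used[obj] += cost
            scheduled.append(tx_id)
        else:
            deferred.append(tx_id)
    return scheduled, deferred


batch = [
    ("a", {"pool"}, 40),
    ("b", {"pool"}, 40),
    ("c", {"pool"}, 40),   # would push "pool" to 120 > 100, so it is deferred
    ("d", {"other"}, 40),  # different object, unaffected by "pool" congestion
]
print(schedule_batch(batch, budget_per_object=100))
# (['a', 'b', 'd'], ['c'])
```

Because scheduling depends entirely on the cost estimates, an estimate that is too low (zero, for example) lets a transaction through without consuming any of the object's budget, which is why the quality of the complexity estimate matters.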

Once the bug in the TotalGasBudgetWithCap mode was identified, the fix was straightforward and was implemented in PR #20365. The corrected code was deployed to Mainnet in version 1.37.4 and to Testnet in version 1.38.1. Thanks to the swift response from the Sui validator community, the network was restored within 15 minutes of the fix’s release.

Future Prevention Measures

In response to the incident, Sui plans to enhance its testing systems so that they generate a wider variety of adversarial transactions, to prevent similar bugs. Because a significant portion of the outage was spent waiting for the release to build, Sui will also improve its build workflows so that debug and release binaries are available more quickly, reducing incident response times.
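One common way to generate adversarial inputs is boundary-value enumeration: try every combination of extreme values for each transaction field and check an invariant. The sketch below is a hypothetical illustration, not Sui's test harness; the field names, the boundary values, the toy_estimate function and the invariant (a transaction writing a mutable shared object must get a positive cost) are all assumptions.

```python
import itertools

# Hypothetical boundary values per transaction field. Zero is included on
# purpose: "no MoveCall commands" is exactly the kind of edge case to cover.
BOUNDARIES = {
    "move_calls": [0, 1, 100],
    "move_call_inputs": [0, 1, 100],
    "gas_budget": [1, 50_000_000],
    "mutable_shared": [False, True],
}


def boundary_cases():
    """Yield every combination of boundary values as a dict."""
    keys = list(BOUNDARIES)
    for values in itertools.product(*(BOUNDARIES[k] for k in keys)):
        yield dict(zip(keys, values))


def find_violations(estimate_cost):
    """Return the cases where a transaction writing a mutable shared object
    gets a non-positive cost from estimate_cost."""
    return [
        case
        for case in boundary_cases()
        if case["mutable_shared"] and estimate_cost(case) <= 0
    ]


# A toy estimator with the min(gas budget, cap) shape, for demonstration only.
def toy_estimate(case):
    cap = (case["move_calls"] + case["move_call_inputs"]) * 1_000
    return min(case["gas_budget"], cap)


for bad in find_violations(toy_estimate):
    print(bad)
```

Run against the toy estimator, this reports exactly two cases, both with zero MoveCall commands and zero inputs: the same shape of transaction that triggered the halt. Exhaustive enumeration stays cheap here because each field has only a handful of boundary values.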

The incident highlighted the effectiveness of Sui’s incident detection and response systems: automated alerts reached on-call engineers at the same time as community reports did. Swift collaboration among Sui validators then ensured a rapid resolution.


Disclaimer: The views expressed in this article are those of the authors and do not necessarily reflect the official policy of CoinsHolder. Content, including that generated with the help of AI, is for informational purposes only and is not intended as legal, financial, or professional advice. Readers should do their research before taking any actions related to the company and carry full responsibility for their decisions.
Shree Narayan Jha
Shree Narayan Jha is a tech professional with extensive experience in blockchain technology. As a writer for CoinsHolder.com, Shree simplifies complex blockchain concepts, providing readers with clear and insightful content on the latest trends and developments in the industry.
