Contents
Part 1. Transaction Review
Part 2. Transaction Correctness Issues
Part 3. How to Efficiently DEBUG
Readers familiar with database distributed transactions should understand that debugging bugs related to the correctness of distributed transactions is a very challenging task. This article mainly introduces how MatrixOne debugs transaction correctness issues during its development process.
1. Transaction Review
Let's first review how transactions are implemented in MatrixOne.
1.1 Where the Data Resides
MatrixOne is a cloud-native database, with most of the data stored in object storage
(any S3
compatible object storage), which is immutable. The immutability of data brings many benefits, such as eliminating the need to consider these data's consistency.
Besides the data stored in object storage
, a very small amount of data is stored in LogService
, a high-performance WAL
service implemented using Raft. This part of the data in MatrixOne is called LogTail
. This data is mutable and can be considered the latest commit data of the MatrixOne cluster.
Object storage
+ LogTail
constitutes the full data of MatrixOne.
1.2 Transaction Isolation Levels
MatrixOne's transactions support RC
and SI
isolation levels, with RC
being the default.
1.3 Transaction Modes
MatrixOne's transactions support pessimistic
and optimistic
modes, with pessimistic
being the default.
1.4 Transaction Concurrency Control
MatrixOne uses MVCC for transaction concurrency control and HLC
for transaction timing.
1.5 Transaction Read Operations
For transaction read operations, the first thing to determine is which data is visible to the transaction. At any given moment, the data visible to the transaction includes:
- All data in object storage with
CommitTimestamp < txn.SnapshotTimestamp
- All data in LogTail with
CommitTimestamp < txn.SnapshotTimestamp
- All
Uncommitted
data in the transaction'sWorkspace
Once we have clarified which data is visible to the transaction, we need to ensure that the data meeting the conditions is complete when read.
Workspace
data, at any moment, is complete for the transaction. What needs to be ensured is the data in object storage
and LogTail
. Since LogTail
is the latest data write of the entire MatrixOne cluster, ensuring the completeness of LogTail's data means that the corresponding data in object storage
is also complete for the transaction.
The question now is how to ensure the completeness of LogTail
data for the transaction.
MatrixOne's transactions are created at CN nodes. After a transaction is created, a SnapshotTimestamp
for the transaction is determined (this timestamp is constant throughout the transaction lifecycle for SI
, and constant within the lifecycle of each Statement
for RC
).
LogTail
data is generated at TN
nodes and written into LogService
. CN uses a subscription method to obtain the latest LogTail
data and Apply
these LogTail
data to CN's memory.
CN maintains a maximum Apply
CommitTimestamp
in memory. The completeness of LogTail's data for the transaction can be ensured based on the watermark of this timestamp and the transaction's SnapshotTimestamp
.
1.6 Transaction Write Operations
MatrixOne's transaction Uncommitted
data is written into Workspace
, which is in CN's memory.
The more data a transaction writes, the larger the memory occupied by this Workspace
, until OOM occurs. To solve this problem, MatrixOne has a threshold for the memory size of Workspace
(default is 1MB). When it is found that the memory of Workspace
exceeds this threshold, the data in Workspace
will be written into object storage
, and the data in Workspace
will be replaced with temporary file names on object storage
.
Before a transaction is committed, it does not interact with TN
. At the time of Commit
, the data in Workspace
is sent to TN
nodes for Commit
processing.
2. Transaction Correctness Issues
2.1 What Are Correctness Issues
In the sections above, we reviewed transactions. Now, we need to explain what transaction correctness issues are. The main transaction correctness BUGs encountered in the development process of MatrixOne include:
Lost Update
underRC
mode- Failure of
pessimistic transaction
lock service Workspace
data issues
These issues all lead to transaction correctness problems. These problems result in transactions reading incorrect data or committing incorrect data.
2.2 How to Test
MatrixOne conducts many tests to help us discover these transaction correctness issues, including:
- Unit tests
- Integration tests
- CI tests before PR Merge
- Performance benchmark tests after PR Merge
- 7*24 hours stability tests
- Daily various tests
- Chaos tests
These tests help us discover transaction correctness issues.
2.3 Conventional Problem Analysis Methods
For development, our conventional problem analysis methods are usually these:
- Breakpoint Debugging
- Log Analysis
- Metrics
- Tracing
Among these, logs are the most useful for analyzing distributed cluster transaction issues, with the rest being almost useless. Breakpoint debugging can only analyze reproducible issues, metrics can only observe a general situation of the system, and cannot locate data issues. Tracing generally refers to call chain monitoring, which can be used to analyze performance issues but cannot help with data error issues.
In summary, logs are almost the only useful tool for analyzing correctness issues.
2.3.1 What Logs Are Needed
Transaction correctness issues, in essence, are about reading or writing incorrect data for a row of records. The problem itself is that the content of the data is wrong. If the data content is wrong, the problem may occur wherever this row of data is read or written.
If we need to analyze transaction correctness issues, then we need to analyze where the problem occurs in the transaction that reads or writes. And there's another condition, which is to be able to link the logs of these places based on the problematic transaction and row, to find the problem.
But the problem is complex. Due to the pessimistic transaction mode, multiple concurrent transactions will affect each other, so it is also necessary to link all relevant information of conflicting transactions together for analysis.
2.3.2 Issues with Logs
We analyzed what information we need to analyze issues, and these information need to be recorded in logs. These logs cannot operate at the Info
log level, only at the DEBUG
level. This brings some problems:
-
In some tests, it is impossible to turn on
DEBUG
level logsIt is impossible to turn on DEBUG logs in performance tests. If an error occurs in a performance test, it is almost impossible to analyze.
-
DEBUG
logs make it difficult to reproduce issuesTransaction correctness issues are sometimes very difficult to reproduce, possibly requiring specific concurrent timing. If it occurs in tests where DEBUG cannot be turned on, you need to turn on the DEBUG log level, run the same load, and too many DEBUG logs change the system's operating timing, making the problem even more difficult to reproduce.
-
Logs are difficult to analyze
When there are complete DEBUG logs, the scale of this log may be very large, and it is a log generated by each node and process in a distributed environment, making the analysis exceptionally difficult.
3. How to Efficiently DEBUG
In the development process of MatrixOne, fixing transaction correctness bugs has always been a painful experience. MatrixOne is still in a rapid development stage, and there are many optimizations in the system that have not been dealt with. These changes may bring new transaction correctness bugs. Therefore, we need an efficient tool and method to DEBUG transaction correctness bugs.
3.1 Design Goals
The disadvantages of using logs to analyze problems, we already have deep experience. Now the design goals we need to achieve are 3 points:
- In any test scenario, as long as a bug occurs, there is enough information to analyze the problem without needing to reproduce it.
- Cannot have too much impact on performance tests, a performance impact within 10% is acceptable.
- To provide a very rich way and means of analysis information.
3.2 Design Challenges
A huge amount of data that needs to be analyzed will be generated in tests. How these data are stored and how to provide rich analysis query capabilities. Because when analyzing problems, you need to analyze and query information based on various conditions.
3.2.1 How to Provide Analysis Query Capability
First, we don't consider how the data is stored, let's look at how to provide data query analysis capability. Solving this problem may solve the problem of how data is stored.
So far, there is no more convenient and semantically richer analysis query method than providing these data queries in SQL. If we can provide a SQL way to provide DEBUG data analysis query capability, the benefits are obvious, and the efficiency of DEBUG is also greatly improved.
Therefore, we decided to provide DEBUG data query analysis in SQL.
3.2.2 How Data Is Stored
The storage method is obvious, because providing SQL query capability means that the data needs to be stored in the database. So we need a database that can provide strong AP capabilities to store these DEBUG data.
The conclusion is obvious, MatrixOne itself is a database that supports high-performance AP queries.
3.2.3 How Data Is Written
We have a design goal that when collecting DEBUG information is turned on, the performance cannot be affected by more than 10%. We need some special designs for data writing to the database:
- Asynchronously write data to the database in
Load
mode. - DEBUG data writing can skip some non-critical data to improve performance.
- Control the frequency of writing to the database to avoid excessive performance impact.
3.3 Trace Framework Design
From MatrixOne version 1.2, MatrixOne provides a mo_debug
built-in database, and based on the experience of analyzing logs, we abstracted the data needed to analyze transaction issues and provided some tables to store data.
And we provided some special statements to dynamically turn on and off the Trace functionality.
Due to space limitations, this article will not describe the specific design meaning of these tables, it will only provide a brief introduction. The main purpose is to share the thought process.
3.3.1 Data Tables
create table trace_event_txn (
ts bigint not null,
txn_id varchar(50) not null,
cn varchar(100) not null,
event_type varchar(50) not null,
txn_status varchar(10),
snapshot_ts varchar(50),
commit_ts varchar(50),
info varchar(1000)
)
create table trace_event_data (
ts bigint not null,
cn varchar(100) not null,
event_type varchar(50) not null,
entry_type varchar(50) not null,
table_id bigint UNSIGNED not null,
txn_id varchar(50),
row_data varchar(500) not null,
committed_ts varchar(50),
snapshot_ts varchar(50)
)
create table trace_event_txn_action (
ts bigint not null,
txn_id varchar(50) not null,
cn varchar(50) not null,
table_id bigint UNSIGNED,
action varchar(100) not null,
action_sequence bigint UNSIGNED not null,
value bigint,
unit varchar(10),
err varchar(100)
)
create table trace_event_error (
ts bigint not null,
txn_id varchar(50) not null,
error_info varchar(1000) not null
)
create table trace_statement (
ts bigint not null,
txn_id varchar(50) not null,
sql varchar(1000) not null,
cost_us bigint not null
)
These data tables mainly record all data writes, reads, transaction metadata changes, executed SQL, concurrency conflicts, and other key information during the execution process.
3.3.2 Filter Tables
create table trace_table_filters (
id bigint UNSIGNED primary key auto_increment,
table_id bigint UNSIGNED not null,
table_name varchar(50) not null,
columns varchar(200)
);
create table trace_txn_filters (
id bigint UNSIGNED primary key auto_increment,
method varchar(50) not null,
value varchar(500) not null
);
create table trace_statement_filters (
id bigint UNSIGNED primary key auto_increment,
method varchar(50) not null,
value varchar(500) not null
);
These Filter tables are used for filtering to reduce the amount of data that needs to be recorded as much as possible.
3.4 Effect
After turning on Trace, the impact on performance is around 5%. Relying on the high-performance AP query service capability provided by MatrixOne, developers can use SQL to query DEBUG issues, query all needed data changes, transaction metadata changes, etc., during the execution period, which greatly improves efficiency and speeds up the FIX of transaction correctness bugs.
About MatrixOne
MatrixOne is a multi-model database based on cloud-native technology that can be deployed on both public and private clouds. This product uses an original architecture featuring storage-compute separation, read-write separation, and hot-cold data separation. It supports multiple workloads such as transactions, analytics, streaming, time-series, and vectors within a single storage and computing system, and can isolate or share storage and computing resources in real-time and on-demand. The cloud-native database MatrixOne helps users significantly simplify increasingly complex IT architectures, providing minimalist, highly flexible, cost-effective, and high-performance data services.
Since its release, MatrixOne Enterprise Edition and MatrixOne Cloud Service have been applied in various industries including internet, finance, energy, manufacturing, education, and healthcare. Thanks to its unique architecture design, users can reduce hardware and maintenance costs by up to 70%, increase development efficiency by 3-5 times, respond more flexibly to market changes, and seize innovation opportunities more efficiently. With the same hardware investment, MatrixOne can achieve several times higher performance.
Keywords: hyper-converged database, multi-model database, cloud-native database, domestic database.