Event Review

Without "Operating System", AI Doesn't Work Well Even If It's Free

Posted by MatrixOrigin

It has been a month of AI moving at full speed: the technology is advancing at a frenetic pace, and prices are dropping just as quickly.

OpenAI released GPT-4o, Google unveiled Project Astra, and Microsoft rolled out more than 50 AI-related updates; ByteDance's Doubao ("beanbag") model family and Tencent's Hunyuan model also made collective appearances.

Almost overnight, large model pricing entered the "fraction of a cent" era. Alibaba Cloud, Baidu, and ByteDance have all slashed prices, even offering services for free, stopping just short of paying users to adopt them.

In this environment, on May 22, GeekPark's Tech Talk Tonight invited Wang Long, founder and CEO of MatrixOrigin, to discuss the foundational elements of large model development: data and computing power.

MatrixOrigin, which has upgraded from a database business to a large model operating system, has recently secured tens of millions of dollars in Pre-A round funding. Similar to Silicon Valley's star data company Databricks, MatrixOrigin has consistently adhered to a long-term vision since the big data era. From the outset, it considered future integration with AI technology and maintained a vision of creating a "digital world operating system." This foresight has enabled it to become a data company better suited for the new AI era.

Despite the intense competition in technology and pricing, very few companies are putting AI capabilities to effective use. At this stage, getting large models deployed is the obvious pain point: they keep getting cheaper, yet remain scarcely used.

Wang Long believes it's crucial to adjust our understanding of, and expectations for, large model capabilities. "Just like autonomous driving, L2-L3 levels can already bring a lot of value," he says. Applying large model capabilities inside enterprises still requires a great deal of tedious adaptation work. MatrixOrigin continues to take on hard but worthwhile projects, such as launching the Matrix OS operating system and co-releasing "Neolink.AI" with Century Internet (VNET). The aim is to build the "large model workbench" of the future, connecting data and computing power as seamlessly as a mouse and keyboard, and helping more companies enter the large model era with a single click.

As Wang Long clearly stated, "Our operating system connects data storage and computing power resources, serving both large model developers and traditional enterprise application developers. I hope to provide them with the necessary computing power and data resources in the simplest and most reasonable way."

AI has become more affordable, but the pain point isn't the price.

Zhang Peng: Recently, there has been a lot of attention on the price war in the large model field. Behind this price war, on the one hand, is the cost advantage brought by technological progress and economies of scale; on the other hand, it shows the industry's desire to get more industries actually using large models.

Today, we're introducing a startup called MatrixOrigin, which focuses on two very important factors in the development of large models: data and computing power. They have just secured tens of millions of dollars in Pre-A round funding. What problems are they trying to solve that make capital so bullish and willing to invest on such a scale? We have invited Wang Long, founder and CEO of MatrixOrigin. Could you please introduce yourself first?

Wang Long: I'm an industry veteran and have worked in three countries and dozens of companies. In China, I've mainly worked at Tencent as the head of IoT, big data, and artificial intelligence. In 2021, I decided to start a new venture again, focusing on doing something around data.

Our core products are the MatrixOne database and MatrixOne Cloud, a fully managed cloud-native data platform. They bring together structured and unstructured data, data from intelligent IoT and AIGC, and traditional enterprise data, and process it all within the same storage and computing framework. Recently, following the financing, we've also added computing power as a focus.

Zhang Peng: Recently, OpenAI released GPT-4o, and Google also launched Project Astra, both of which mentioned multimodal technology. What does this change signify?

Wang Long: Let me share my experience. Previously, our company equipped all developers with GitHub's Copilot tool. After personally experiencing the latest GPT-4o, I decided to purchase GPT service packages for all employees. Because I've been doing a lot of Office-related work lately, such as making charts and writing documents, I found GPT-4o to be excellent in handling multimodal tasks.

I'd like to share how my view on large models has evolved. Around the end of 2022, when ChatGPT was first released, it was truly impressive. I realized at that point that it was a revolution in natural language interaction between humans and computers. Initially, though, I didn't think too much of it; I assumed chatting with a robot wouldn't bring much real benefit.

During the last wave of deep learning, people made money in fields like face recognition, security monitoring, OCR, etc. These areas didn't truly commercialize and profit until about six months to a year later, and the process of making money was quite difficult.

But I found that this AI wave is very different from the previous deep learning wave. Its impact and pace of development are much greater. By January or February of 2023, some people had already made millions or even tens of millions using these technologies, for example by using AI to generate copy and run advertising and marketing campaigns. The development speed and impact of this wave are at least ten times those of the previous one.

Now the situation has shifted again. Multimodal technology makes us realize that, although pure text leaves a lot of room for imagination, its value still falls short of what multimedia combined with images, video, and audio can offer. The potential here may be at least a hundred times that of the 2016 AI wave. I wouldn't dare claim it will match the steam engine of the industrial revolution, but the market potential is already enormous.

Zhang Peng: Now that we see the future potential value of multimodal capabilities, how does the relationship between multimodal technology and the core capabilities of large models work? How can enterprises better understand and utilize their own data assets in the era of large models, and what changes will occur in the future?

Wang Long: When I started my business, I was considering whether to continue doing things related to data or AI. AI tends to be more application-oriented and easier to make money through people and projects, but data is more foundational and needs product support.

Because regardless of how AI develops in the future, data is undoubtedly the most fundamental "first brick." Data plays a crucial role in serving AI or other systems. So from the very beginning of founding the company, our goal was to become the "operating system of the digital world," to be a pioneer rather than just a database.

Zhang Peng: The database is "the first brick", right?

Wang Long: Yes, regardless of how AI develops in the future, data is undoubtedly the most basic "first brick." Whether it is for AI services or other system services, data plays a key role. So we chose to focus on the data field.

Next, we thought about which data to focus on. Building around traditional enterprise data didn't look very promising, and the internet data field already had many established players. So we turned our attention to emerging intelligent IoT data, including data generated by cameras, microphones, industrial sensors, and so on, hoping to build a data operating system that could integrate these heterogeneous data sources.

To integrate heterogeneous data from different eras and systems, our system architecture had to be highly innovative. The advantages of this architecture have become even more apparent since the rise of AI.

Now our database system has capabilities in inference models, agent systems, vector processing, etc. We are further planning how to organically integrate training data to further enhance the functionality of the system.

From the beginning, we set the vision of being the "operating system of the digital world," so we made some forward-looking layouts in advance. Now it seems that this is not only the key factor for our successful financing, but also makes our database system stand out. It is no longer just an ordinary database but an innovative data management platform oriented toward the future.

Zhang Peng: You believed that databases were a meaningful problem to tackle in the future, which is why you set new architectures and goals. How did you judge the future at that time?

Wang Long: I roughly categorized data into several types.

The first category is enterprise data, mainly involving internal process management, such as ERP, inventory management, OA, etc., generating structured data. This type of data focuses on characteristics like storage space utilization and data integrity and is overall quite "well-behaved."

The second category is internet data, mainly from personal consumption, UGC content, and similar areas. This type of data is not as "serious" as enterprise data and carries a lot of uncertainty. For example, the Hadoop ecosystem (a distributed computing stack that grew out of Google's papers) was designed for exactly this kind of large, varied, and inconsistent data. The approach is to faithfully record the raw data and let retrieval and analysis systems handle it afterwards.

With the rise of mobile internet, data volume and complexity further increased. These evolutions reflect changes in the nature of data - from the early structured and deterministic enterprise data to the massive, diverse, and uncertain internet data. This presents entirely new challenges for data management and processing.

Additionally, intelligent IoT has changed the picture. Previously, most internet data was generated by people, but in the IoT era much of the data comes from machines and devices such as cameras and phones.

Machine-generated data is different from human-generated data. If this data is incorrect, it's challenging to determine responsibility. Compared to before, this type of data not only requires attention to data integrity and reliability but also addresses responsibility attribution when data errors occur.

With the arrival of the AI era, the nature of data has changed again. Previously, AI was mainly trained on public internet data. However, when it comes to practical applications, it needs to be combined with internal enterprise data and IoT data, such as design, exploration, industrial control, etc.

Zhang Peng: How did you make the next decision?

Wang Long: Regardless of whether it's about AI or other areas, in the future, humans will tend to centralize various types of data for processing. Because only when data is aggregated can its value be maximized.

For example, production data may exist within the enterprise, while marketing data may be on the internet. If these can be integrated, more optimized applications can be achieved. Even internet companies have some data generated by financial, management, and other systems internally. So, unifying these heterogeneous data sources is the best approach.

To meet the special requirements of data from different eras, we considered these factors when designing the system architecture. First, we chose cloud-native storage technology, which can unify various heterogeneous data in the same storage system. Whether it's enterprise internal data, internet data, or IoT data, cloud-native storage can flexibly accommodate them. At the same time, we designed the storage system finely to support efficient data processing.

Secondly, we realized that different types of data have different processing requirements. Some need error information recorded, some need data lineage tracked, and some demand high throughput. To meet these diverse needs, we adopted fine-grained, cloud-native scheduling of computing resources so we could adapt flexibly to the characteristics of different data.
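
(To make this concrete, here is a minimal illustrative sketch, not MatrixOrigin's actual API: heterogeneous records from enterprise systems, IoT devices, and AIGC land in one storage layer, each carrying hints a scheduler could use. All names here are hypothetical.)

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Record:
    source: str    # "erp", "iot_camera", "aigc", ...
    payload: Any   # structured row, sensor reading, or blob reference
    hints: dict = field(default_factory=dict)  # e.g. integrity, lineage, throughput needs

class UnifiedStore:
    """All sources write to the same object-storage-style log."""
    def __init__(self) -> None:
        self._log: list[Record] = []

    def put(self, record: Record) -> None:
        self._log.append(record)

    def scan(self, source: Optional[str] = None) -> list[Record]:
        return [r for r in self._log if source is None or r.source == source]

store = UnifiedStore()
store.put(Record("erp", {"order_id": 42, "amount": 99.0}, {"integrity": "strict"}))
store.put(Record("iot_camera", b"<jpeg bytes>", {"throughput": "high"}))
store.put(Record("aigc", {"prompt": "a red bridge", "image_uri": "s3://bucket/img.png"},
                 {"lineage": True}))

# A fine-grained scheduler can read the hints to pick compute per workload.
for r in store.scan():
    print(r.source, r.hints)
```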

Zhang Peng: This isn't just a problem of the data lake (open architecture), it's also about storage. Has this become a new goal?

Wang Long: Yes, and then we adopted a completely new architecture to build our computing framework, which of course came at a cost: the team had to refactor a great deal of code. We've gone through numerous trials and received support from clients who value our ideas and provided real data to improve our products. This continuous process helps us identify areas that need improvement and make timely adjustments.

By October last year, when we officially launched the product, we already had thirty to forty clients. Thanks to their support, we've built a unique framework for managing, storing, and computing resources.

The advantage of this framework is that it is very easy to integrate whatever type of data we take in, AI data for example, and whatever type of computing resource we access, such as the GPU resources mentioned above. This is because our system is essentially plug-and-play, similar to an operating system that makes it easy to connect mice or monitors from different vendors.

Many startups may have acquired a large number of customers in one or two years, while it took us three years to acquire thirty to forty customers. However, I believe this effort is worthwhile because it helps us build a strong product foundation.

Zhang Peng: In the outbreak of large model technology and the domestic enthusiasm for large models, how do you think entrepreneurs should identify and seize future opportunities and make strategic adjustments? As a mature entrepreneur, how did you go through the decision-making process at that time?

Wang Long: Once, our investor Chen Sheng, founder and executive chairman of VNET (Century Internet), and I talked about how some clients faced the problem of IT investment growing far faster than their business. They asked whether there was any way to bring this runaway growth under control.

After analyzing it, I found that the client's software costs were not growing unreasonably. Most domestic companies either use open-source products or products from domestic software vendors; running 500 servers doesn't mean paying software licenses for 500 servers.

So where did their costs mainly grow? Originally, I thought their hardware had grown from 500 to 1,000 servers, and MatrixOrigin's software could help save around 200 or 300 of them. But the client wanted a 50% cost reduction, and since the portion we could optimize accounted for only about 20% of the total, even if we charged nothing, costs could fall by at most 20%.

Chen Sheng said that the client's needs were no longer about using ordinary server cabinets; what they cared about most now was intelligent computing power. In the past, as an enterprise IT manager, when you built an IT system, the investment might be distributed as follows: hardware accounted for 50%, and other costs such as electricity fees, software fees, service fees, etc., each accounted for around 10%.

But now the situation has changed. Enterprises need to invest a large amount of capital in intelligent computing power first, such as Nvidia's GPUs, which could account for 70% to 80%, electricity costs 20%, and other software fees, labor costs, service fees, etc., adding up to only 10%. Chen Sheng said that optimizing that 10% didn't make much sense; you should find ways to optimize the 70% that takes the lion's share.
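
(A back-of-the-envelope sketch of this cost-structure argument; the shares are the rough figures quoted above, not real quotes, and the efficiency gain is an assumption.)

```python
# Approximate cost shares for an AI-era IT budget, as described in the conversation.
ai_era_costs = {"gpu": 0.70, "electricity": 0.20, "other_software_labor_services": 0.10}

# Even making the "other" slice completely free caps savings at its share of the total:
print(f"free software/services save at most "
      f"{ai_era_costs['other_software_labor_services']:.0%} of total cost")

# A modest efficiency gain on the dominant GPU slice is worth far more:
gpu_efficiency_gain = 0.30  # assumed
print(f"30% better GPU efficiency saves about "
      f"{ai_era_costs['gpu'] * gpu_efficiency_gain:.0%} of total cost")
```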

Zhang Peng: So there's no point carving fine detail into a grain of rice.

Wang Long: This statement was quite enlightening to me. I thought if we develop a data product, such as an operating system, we hope customers can quickly apply and implement it. Therefore, we need to focus on where the customers' money is mainly spent and how to help them use this part of the funds more effectively.

This leads to a question: what are users doing? Users are developing large models or applying them, and their entire logic has undergone revolutionary changes. So, we can't just focus on traditional centralized data management; we need to think about what else we can do.

We found that the key was how to better integrate computing power with data. Our company has now moved from the single-core, data-centric model of three or four years ago to a dual-core model centered on both data and intelligent computing power.

So, going back to your previous question, as an entrepreneur, how do you adjust your strategy? It's about recognizing the impact and challenges brought by the large model wave and finding ways to address them.

Zhang Peng: Entrepreneurs sometimes make a mistake, that is, they focus too much on what they are good at and believe in, neglecting the actual needs of the market. They invest a lot of energy in perfecting their products, but if what everyone lacks isn't this grain of rice, they may miss the opportunity to carve a big Buddha on a mountain. You instantly shifted your focus to intelligent computing power, which is a crucial point.

Wang Long: Yes, and this change is not a complete turnaround; it's just a change in direction, becoming a combination of data and intelligent computing power.

"The difficulty with One Click AI is that it's missing the Workbench.

Zhang Peng: Let's talk about the relationship between data and computing power.

Wang Long: In the past, in the field of databases, we often talked about the concepts of separating storage and computation and decoupling storage and computation. Both storage and computing resources are very valuable, and if they can be decoupled, computing power and data can be flexibly scheduled and matched according to actual needs.

In the past, people often discussed whether to move data closer to the computing resources or move computing resources closer to the data. There are various technical architectures built around this question, each with its own trade-offs, and many people have studied them. But ultimately, data and computing power are the two most critical factors in IT systems.

The matching between data and computing power is itself a core capability. In the past, mostly general-purpose computing power was used, essentially metered by CPU. Intelligent computing power is different: it is metered by GPU, and GPUs cost far more than CPUs. We also have to account for the network and electricity costs that GPUs bring, as well as complex factors such as the cooling system for the whole cluster.

One of the things that must be considered is the power consumption of the computing power cluster and the carrying capacity of the power grid. You can see this change from the experience of Century Internet. They used to focus on IDC cabinets, and their core competitiveness lies in the nationwide distribution of cabinet resources and strong network connectivity.

But with the rise of intelligent computing power, Century Internet found it needed to change. In the past, a cabinet consumed at most a few tens of kilowatts, but NVIDIA's latest rack-scale system, the GB200 NVL72 with 72 Blackwell GPUs, starts at around 120 kilowatts. Add cooling and other overhead, and power consumption can easily reach 200 kilowatts.

IDCs may have thought their power supply was sufficient. But for even a moderately large cluster, such as the roughly 36,000-card cluster NVIDIA recently demonstrated, where each card draws over 1 kilowatt, finding a data center that can handle that much power is genuinely challenging.
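
(A rough power calculation for scale: the card count and per-card draw are the figures mentioned above; the overhead multiplier is an assumption.)

```python
gpus = 36_000          # cluster size mentioned above
kw_per_gpu = 1.2       # "over 1 kilowatt" per card
overhead = 1.5         # assumed multiplier for cooling, networking, and losses

it_load_mw = gpus * kw_per_gpu / 1000
facility_mw = it_load_mw * overhead
print(f"IT load ~{it_load_mw:.0f} MW, facility load ~{facility_mw:.0f} MW")
# Roughly 43 MW of IT load and about 65 MW at the facility level, far beyond
# what a conventional IDC hall was provisioned for.
```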

Zhang Peng: It's like we need to get them a nuclear power plant. The power consumption is indeed too high, and no city can handle it.

Wang Long: The United States is currently worrying about where to build data centers. Although electricity and computing power seem like traditional terms, the formulas and meanings behind them are actually not the same as before. In any case, how to better coordinate data and GPU is becoming a key point for the future, whether it's for AI-related applications or other purposes.

Zhang Peng: Actually, data and computing have always been inseparable. Although we now need to consider high energy consumption, I think the energy problem is something humanity will always find a way to solve. The key issue is whether we can coordinate data and GPU effectively.

Wang Long: Recently, I was in the United States and had the opportunity to meet with Jensen Huang, the founder of NVIDIA, and his team. I had an in-depth discussion with their chief solutions architect about the issue of high energy consumption.

Firstly, he believes that energy consumption will decrease because currently, we mainly focus on model training, and inference tasks are relatively few, with much lower energy consumption compared to training. Of course, if everyone starts using inference, energy consumption may rise, but overall, we can see that many startups in the industry are researching how to reduce the energy consumption of both inference and training.

Secondly, he put forward a very insightful point. He suggested that we should not just consider large models as a kind of software but rather as an energy storage device.

Large models convert electricity into energy that can be used in the future, storing it within the large model. He believes that the essence of large models is to integrate human intelligence—including data, algorithms, a wealth of information on the Internet—and the energy being generated into a digital model.

Zhang Peng: Speaking of costs, the recent price wars in the field of large models are quite intense. Does it mean that those large models trained at huge costs suddenly seem less valuable? Does this price competition really indicate an exponential decrease in the cost of using AI as a productivity tool in the future? Will it become the starting point for more widespread applications?

Wang Long: Indeed, the competition in the large model field is very interesting right now. We used to say that traffic is everything, but the large model era is different. Most models expose very similar interfaces, and for developers, switching between large models may require only a small code change. Traffic only proves its value when there is stickiness, and large models are not fundamentally traffic-driven.
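
(A small sketch of why switching is cheap: many vendors expose OpenAI-compatible chat endpoints, so the provider often comes down to configuration. The environment variables and default model name below are placeholders, not recommendations.)

```python
import os
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
    api_key=os.environ["LLM_API_KEY"],
)

resp = client.chat.completions.create(
    model=os.environ.get("LLM_MODEL", "gpt-4o"),
    messages=[{"role": "user", "content": "In one sentence, why is switching models cheap?"}],
)
print(resp.choices[0].message.content)
```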

Our CTO and scientists in Silicon Valley have had in-depth discussions on this issue. When they first deployed Copilot for assisting in code writing, our CTO felt a bit uneasy; he joked that we might not need a CTO in the future.

But in practice, it turns out there's still a gap between large models and human experts, although the way we write programs might change in the future. Large models may provide assistance in certain aspects, but core innovation and decision-making still require human expertise.

So, cost reduction might not necessarily be the key to large model applications.

Traditionally, becoming a programmer meant learning various programming languages such as Java, C++, Python, or PHP. Once you mastered these languages, you could write code and communicate with computers. Writing code is essentially a way of manipulating computers.

But now you communicate with large models and let them do the work. This makes programming easier, because anyone can interact with a large model in natural language, continuously debugging and optimizing. When applying large models in an enterprise, the most basic point is that as long as you can speak, you can put a large model to work for you, even with no technical background. And when you feel the model's performance isn't good enough and want to make it smarter, various reinforcement-learning-based agents may appear to help improve it.

Zhang Peng: You don't even need to master the machine's language. Essentially this is still programming, but compared to programming in the pre-large-model era, it's a more advanced kind of programming that starts from prompt engineering.

Wang Long: That's right, it's programming based on data. I feed data to large models to get the answers I want. If I'm still not satisfied with the large model, I can fine-tune it directly by changing its parameters. If I'm completely dissatisfied with the large model, I'll just create a brand new one, even do my own pre-training.

Zhang Peng: In fact, we are all participating in shaping the development of large models. If we still need to pay expensive fees, it seems unreasonable.

The practices of foreign companies in this regard are worth learning from. They treat users as developers, hold regular developer conferences, and provide incentives. We should change our mindset and see large models as a platform that can be collectively created and shared.

Wang Long: Yes, when you use AI like OpenAI or Google's Gemini, they ask if you allow them to use your data for training. In fact, it's like helping them improve their algorithms or patching their models.

This is just my speculation, but when I agree to let them use my data for training, their response time seems to improve significantly. This makes me wonder if there's a background mechanism that can use the data provided by users to optimize model performance in real-time.

Zhang Peng: So, given this trend, how do you envision your business? You recently introduced a new operating system, Matrix OS, on top of your hyper-converged database, but the term "operating system" is quite broad. How do you understand it?

Wang Long: Although the term "operating system" sounds like a grand concept, its basic principles are quite simple. For example, what does the operating system of our laptop do? Its main function is to coordinate various hardware devices such as mouse, keyboard, and display.

As a user, you only need to care about how you want to operate the computer, access information, and display content, without worrying about whether the mouse is from Acer or Dell, or where the display and chip come from. The operating system handles all of that complexity for you in the background. And if you were a new mouse manufacturer, you wouldn't need to report to the operating system vendor; you would just follow the established interface standards to be recognized and used.

Essentially, the operating system is a connector. It simplifies complex systems and acts as a connector, providing convenience for developers.
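
(A minimal sketch of that "connector" idea: vendors aren't approved one by one, they implement a standard interface and register. The class and function names are hypothetical, not Matrix OS's real plug-in API.)

```python
from abc import ABC, abstractmethod

class StorageProvider(ABC):
    """The interface the 'operating system' expects from any storage vendor."""
    @abstractmethod
    def read(self, uri: str) -> bytes: ...

class ComputeProvider(ABC):
    """The interface expected from any compute vendor."""
    @abstractmethod
    def submit(self, task: str, resources: dict) -> str: ...

REGISTRY: dict[str, object] = {}

def register(name: str, provider: object) -> None:
    REGISTRY[name] = provider  # any conforming vendor can plug in, no approval step

class LocalDisk(StorageProvider):
    def read(self, uri: str) -> bytes:
        with open(uri, "rb") as f:
            return f.read()

register("local-disk", LocalDisk())
print(sorted(REGISTRY))  # the platform only sees interfaces, not vendors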

What does our operating system connect? It connects data storage and computing resources. Who are the service objects? Model developers and ordinary application developers.

Whether they are AI developers or traditional enterprise application developers, I want to provide them with the computing power and data resources they need in the simplest and most reasonable way. Whether the work is developing AI applications, fine-tuning, prompt engineering, or anything else, the tools and services we provide can meet their needs.

We are committed to making the operating system an easy-to-use, efficient platform, allowing developers to focus on innovation and problem-solving without being troubled by the complexity of underlying hardware and software.

Zhang Peng: If we draw an analogy, Matrix OS is a bit like a programmer's workstation for the era of large models, providing developers with a comprehensive toolset. On this platform, you can start from basic skills like prompt engineering, then move on to RAG (retrieval-augmented generation) techniques, and further into full model training.

No matter your programming level, Matrix OS can provide you with the most suitable tools and resources to efficiently complete your work. It allows you to choose different levels of development tasks based on your needs and abilities, whether it's basic programming, advanced algorithm development, or custom model training. Matrix OS supports you in every step of the way, catering to your specific requirements and capabilities. That's the core of Matrix OS.
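
(A toy sketch of that ladder: a prompt first, then retrieval layered on top. The retriever below is a deliberately tiny bag-of-words version; real systems use learned embeddings and a vector store, and the rung after this would be fine-tuning on your own data.)

```python
from collections import Counter
import math

DOCS = [
    "MatrixOne stores structured and unstructured data in one engine.",
    "GPU clusters are metered and scheduled differently from CPU clusters.",
    "RAG retrieves relevant documents and adds them to the prompt.",
]

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = vectorize(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}"

print(build_prompt("How are GPU clusters scheduled?"))
```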

Wang Long: Large models are like a programming factory for me, available for everyone to use.

Zhang Peng: It can be a workstation or even a large model production factory for an enterprise. This year, we found that the actual application and penetration of large models seemed slightly lower than expected. Any technological innovation is aimed at solving practical problems. Could you help us identify where the bottleneck lies?

Wang Long: I'm generally very cautious about technology, but now I'm more optimistic than before. This optimism may be related to my expectations.

Artificial intelligence is essentially a probabilistic system, especially in the field of large models, where the core is the Transformer architecture built upon a probabilistic system. It's a process of gradually increasing from a 50% probability to a 99% probability.

We can compare the development of artificial intelligence to autonomous driving. The goal of autonomous driving is to replace human drivers with machines. If machines can never make mistakes, that would be a remarkable achievement.

Zhang Peng: Isn't that Level 5?

Wang Long: Yes, even at Level 5, there are still mistakes, but it may be much better than humans. Autonomous driving is divided into 5 levels, from L1 to L5. Currently, most Chinese manufacturers are at L2, and Tesla is actually at around L2.5 to L3.

Tesla's Full Self-Driving (FSD) system is an end-to-end solution that essentially relies on cameras to handle all decision-making tasks, demonstrating its multimodal processing capabilities to some extent. Its value is already very obvious, even L2-level advanced driver assistance systems in China have provided tremendous help to users.

So, when you realize that it may take a long time to reach Level 5 fully autonomous driving, but Level 3 is already providing substantial assistance, you hold a relatively optimistic attitude towards the future of large models.

Zhang Peng: From you, I learned a lesson that optimism depends mainly on our expectations. When talking about large models, we can at least determine the value of L2 level. Each subsequent level upgrade may unlock exponentially more scenarios and create greater total value.

Wang Long: When we discuss the application of large model technology in enterprises, if we expect it to handle real-time decision-making tasks for mission-critical systems from the beginning and require it to be absolutely error-free, we may be disappointed. This expectation is similar to the Level 5 requirement in the autonomous driving field, which demands high precision and a zero-error rate. If measured by this standard, large models may still struggle to meet it.

However, if we apply large models to areas that can tolerate some errors, such as document writing, knowledge retrieval, or design work, they can provide significant assistance.

Therefore, when enterprises apply large models, they should choose the appropriate intelligence level based on the nature of the task. It's not right to apply an L3-level intelligence to a scenario that requires L5-level precision. Understanding the specific requirements of each scenario and matching them with appropriate intelligent solutions is the key to applying large models.

Zhang Peng: Programmers often say "show me the code," while in the business startup field, the emphasis is on "show me the money." Only when your product can bring profits can it prove its value and practicality.

Although we may not be able to deeply understand all the professional details, could you outline the problems we originally faced? Then explain how we solved these problems through what methods?

Wang Long: Firstly, the data problem is relatively easy to solve. Currently, most data training and inference tasks use a shared storage architecture, which is standard practice. The core problem lies in how to match suitable computing power according to the requirements.

Take our product as an example: the logic for matching GPU computing power differs from that for CPU computing power. For CPUs, we focus more on containerization, service-oriented delivery, and virtualization. For GPUs, we have to consider more task-related factors, such as whether the data is used for training or inference, and how data is used at different stages of development, such as building a RAG pipeline or doing data annotation, each of which involves a different management process.
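
(A hypothetical sketch of that matching logic; the task categories mirror the conversation, training versus inference versus RAG building and annotation, and this is not any actual MatrixOrigin scheduler.)

```python
def match_resources(task: dict) -> dict:
    kind = task["kind"]
    if kind == "training":
        # tightly coupled GPU cluster, sized up front as a whole
        return {"device": "gpu", "nodes": task.get("nodes", 100), "topology": "cluster"}
    if kind == "inference":
        # latency- and concurrency-driven; replicas scale with request volume
        return {"device": "gpu", "replicas": max(1, task["qps"] // 50)}
    if kind in ("rag_build", "annotation"):
        # mostly data pipelines; containers on CPUs are usually enough
        return {"device": "cpu", "containers": 8}
    raise ValueError(f"unknown task kind: {kind}")

print(match_resources({"kind": "inference", "qps": 400}))
print(match_resources({"kind": "rag_build"}))
```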

The DevOps and OPS (operations) toolchains we often mention need to match these processes. It's like owning a store where everything is available, but whether the products are placed scientifically and reasonably needs to be continuously optimized based on actual operations.

This is an iterative and optimizing process that requires adjustment based on user feedback and actual usage.

We have a preset optimization strategy for deploying large models. For example, for models of different scales, we decide where to perform inference and choose which hardware acceleration cards to execute these tasks. At the same time, we prepare resources according to the expected number of concurrent requests and estimate the time required for resource preparation.

During inference, we also need to consider multiple factors, such as whether to deploy services in regions with lower costs or closer to users to reduce latency. Once the tools and resources are ready, we need to continuously optimize them in various scenarios.
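
(An illustrative capacity estimate for that planning step; every number here is an assumption made up for the example.)

```python
import math

expected_concurrent_requests = 200
tokens_per_request = 800
latency_target_s = 2.0
tokens_per_second_per_gpu = 2_000  # assumed throughput for a mid-sized model

required_tps = expected_concurrent_requests * tokens_per_request / latency_target_s
gpus_needed = math.ceil(required_tps / tokens_per_second_per_gpu)
print(f"need about {gpus_needed} GPUs to hold the latency target")
```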

This is similar to the self-optimizing database computing framework we developed before. Its core idea is that the system improves itself with usage, becoming smoother and more efficient over time, and better allocates resources. Ideally, whether it's training, inference, or any other stage in the process, we can always find the most suitable position to match computing power and data to the task, optimizing based on the data volume.

The most ideal state we pursue is to dynamically and intelligently manage and allocate resources to achieve optimal performance and cost-effectiveness. Of course, this process needs to be implemented step by step, continuously adjusted and improved with experience accumulation and scenario expansion.

Zhang Peng: So, the capability of infrastructure is crucial for enterprises, and adopting best practices can reduce unnecessary trial and error costs, is that the idea?

Wang Long: That logic is correct. Let me share another thought. We're discussing the cost and architectural changes that come with developing and deploying modern applications. When we talk about connecting data and computing power now, the connection is different from before. In the past, when connecting computing power, we didn't have to think much about where the hardware physically sat or how much energy it consumed; now those factors have become a very significant part of the cost and can no longer be ignored.

Similar considerations exist for data. We not only need to consider data sharing, security, and processing issues but also how data is used by different applications. This requires us to have deeper considerations at every link, making the chain longer.

This is also a necessary change in our Matrix OS business model. We cannot completely place ourselves in the middle layer, acting only as a software platform without caring about the details of the underlying infrastructure. Now, new programming paradigms have emerged, and you need to understand more.

This is what I mean by "integrating software and hardware" and "integrating data and applications," both of which may be important changes in the future.

From Data Intelligence to "Computing Power Intelligence"

Zhang Peng: In the process of helping everyone solve problems, what else do you envision that your products need to cover?

Wang Long: Actually, you can look at the American company Databricks; their logic is somewhat similar to ours, both centered on data. They firmly believe that machine learning should take place within the database. That idea is quite challenging, because in traditional architectures databases are not used to execute deep learning tasks. Databricks has been committed to promoting this concept and has been highly regarded in the United States over the past year.

They firmly believe AI should be closely integrated with data. With the rise of AIGC, their architecture makes it relatively easy to fold in various AI toolsets, whether through acquisition or integration, such as their support for Python, which makes embracing AI technology easier. This development path is worth our attention and study.

Our architecture is somewhat similar to theirs. We also considered integration with AI technology from the very start of the design and reserved many interfaces. We may not know what future training frameworks will look like or what new technologies will emerge, but we are already prepared to some extent. For example, people are still debating whether the Transformer architecture has room for improvement, or whether some other method will disrupt existing models; these remain unknowns.

Zhang Peng: So, it's like walking a path that's too uncertain.

Wang Long: I'm just focused on doing a good job with the data platform and connecting it with the computing power platform. We have already clarified how different types of data should be handled: enterprise data, IoT data, and AI-generated data, etc., which forms a solid foundation. The training part can be done through open cooperation, with other parties accessing our interfaces, and us accessing other training platforms, to form a mutually beneficial cooperation relationship. This is the meaning of an ecosystem. Our platform is open source, and our core philosophy is to ensure effective management of data and computing power.

Zhang Peng: Speaking of Databricks, the company's valuation has now passed $40 billion. It has indeed surpassed Snowflake in this wave and become very popular. Last year I attended both companies' conferences, which were held at the same time. Snowflake has a long history and a big name, and even invited Jensen Huang to appear, but the atmosphere wasn't as lively as at Databricks'. I noticed this trend last year.

Wang Long: The stories of Databricks and Snowflake are very interesting. Since its founding in 2012, Snowflake has been committed to enterprise data management and optimization, focusing on how to break through in the field of enterprise data. However, with the rise of AI technology, Snowflake found that things were not so simple. AI training often relies on public internet data, or on cleaning and processing unstructured and semi-structured data. Snowflake primarily had enterprise application data, which did not adapt easily to the AI era.

Databricks, by contrast, focused on AI and open data management from the beginning, expanding its capabilities on top of the open-source data processing engine Apache Spark. Databricks' early challenge was that databases were more profitable, because structured data is easy to standardize, which leads to higher gross margins. The AI path, especially deep learning, did not bring the expected returns at the time, because data processing then required a great deal of cleaning, mining, and governance work, and data without those capabilities was not very valuable.

But as AI technology developed, people realized that the core of AI was actually Internet data. This shift allowed Databricks' open data management strategy and early investment in AI to start showing advantages, while Snowflake needed to adjust its strategy to adapt to this new market trend.

Zhang Peng: As the saying goes, thirty years east of the river, thirty years west of the river, and here it didn't even take thirty years.

Wang Long: When our company initially referred to itself as a database, it may have led people to think of Snowflake, but we adopted open data formats comprehensively from the beginning. We carefully considered a question: what should future data look like to better adapt to the times? Therefore, our design in storage architecture allows for easy access to multimodal data. We hope that our persistence and forward-looking vision will bring us good luck.

Zhang Peng: It sounds like a victory of architectural belief. Comparing ourselves with Databricks, it seems that we are still in the early stages. Having Databricks as a leading example does indeed make the future prospects very exciting. This significance goes much deeper than our initial identity as just a database company. We previously discussed that Century Internet and your company have established a very close cooperative relationship. In the so-called "data plus computing power" field, especially at the computing power level, it seems that your strategic partnership is transitioning from the traditional IDC to the currently popular AIDC (Artificial Intelligence Data Center). What changes does it represent? How do you understand this resonance between you and Century Internet?

Wang Long: First of all, in my communication with the management of Century Internet, I am very clear about the consensus we have reached on certain analytical methods. In the AI-native world, computing power is key, and IDCs must upgrade to AIDCs to embrace the future of AI and have a chance to succeed in the market. If we move away from the value center, we will encounter problems. So we have this opportunity for cooperation.

In addition, over the past few months, we have been closely cooperating on research and development. We found that we must extend the value chain to consider how to combine with higher-level products. For example, Century Internet's accumulation in network technology and its investment in AIDC in the past two years have allowed us to understand the difficulties and challenges of building GPU clusters, including electricity, cooling systems, network service quality, and a series of issues closely related to large models.

In the AIDC field, we found that the situation is different from before. Previous IDCs were traditional enterprises, relatively stable and easy to predict. But in the AIDC field, these have become uncertainties. As a software platform, we must find solutions and cannot always assume that the underlying infrastructure is good or ready at any time.

Zhang Peng: Your cooperation in launching the "Neolink.AI" platform with Century Internet seems somewhat similar to Hugging Face, doesn't it?

Wang Long: Our Neolink.AI platform, you could say, is a combination of CoreWeave and Hugging Face. CoreWeave, as a computing power platform, provides powerful computing resources. On traditional CPU computing platforms, it's very straightforward to use, just like public cloud services. You just need to create a CPU instance, whether it's a virtual machine or a container, and the cloud platform will quickly respond and create the corresponding resources.

However, the operation of GPU computing power cloud platforms is different. While you can request GPU resources and most of the time the platform will provide them, you often find yourself needing to make reservations. For example, when you request a certain GPU computing power, the platform may tell you that you need to wait in a queue for a few hours. This is not only because of the scarcity of computing power resources, but also because of the special nature of GPUs and AI tasks.

In the past, most tasks could be completed on general-purpose GPUs or CPUs, and you could simply increase the number of resources as needed and then scale out. But the logic of GPUs and AI is not like that; AI tasks often require fine-tuning, which necessitates building a cluster.

For example, if you need to perform a large-scale fine-tuning, you may need 100 machines to build a cluster. This is a required number of resources, not just saying "I only need 20, I can do it a little slower." Using insufficient resources may lead to difficulties in task execution or even system crashes, unable to complete the task.

GPUs and AI tasks have unique logic and requirements, which place higher demands on computing power platforms, requiring them to provide more flexible and powerful support. Therefore, when developing software, we must deeply understand two aspects: the characteristics of the application itself and the data processing methods, and the specific situation of hardware resources, such as the locations of GPUs and CPUs, how to build clusters, and whether there are available cluster resources. If there are no existing clusters, we may need to set up a reservation button on the user interface for users to queue and wait for resource allocation.
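
(A sketch of the reservation behavior described here: a fine-tuning job needs its full cluster or nothing, so requests that can't be met are queued rather than shrunk. Purely illustrative, not the Neolink.AI implementation.)

```python
from collections import deque

free_gpus = 64
queue: deque[tuple[str, int]] = deque()

def request(job: str, gpus_required: int) -> str:
    global free_gpus
    if gpus_required <= free_gpus:
        free_gpus -= gpus_required
        return f"{job}: allocated {gpus_required} GPUs"
    queue.append((job, gpus_required))  # no partial allocation for cluster jobs
    return f"{job}: queued (needs {gpus_required}, only {free_gpus} free)"

print(request("small-inference", 8))
print(request("finetune-100-node", 100))  # waits until a full 100-GPU block frees up
```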

Zhang Peng: This is not simply saying to take over the demand and then distribute it. In fact, the logic in between is changing. To some extent, this reflects the cooperative relationship between Century Internet and you, a relationship that aligns very well with the trend of considering data and computing power comprehensively in the future, to truly solve practical problems. Both of you naturally resonate.

Wang Long: As I mentioned earlier, the cost structure is an important issue. I've heard rumors, though I don't know the internal figures: OpenAI's GPT-4, for example, is said to have used 30,000 GPUs during training, and GPT-5 is reportedly planned to use 100,000. Imagine it: 30,000 GPUs cost tens of millions of RMB a day, which is a huge number. And supposedly the utilization rate of the whole cluster is between 70% and 90%, which is quite high.

In China, most clusters haven't reached that scale; few are at 30,000 GPUs. Some have reached 10,000 GPUs, that does exist. But I saw in one vendor's paper that their utilization rate was 57%. If a cluster costs tens of millions of RMB a day to run, and utilization drops from 80% to 60%, the difference amounts to millions of RMB every day.
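
(The utilization arithmetic spelled out; the figures are the rough ones quoted here, treated as orders of magnitude.)

```python
daily_cost_rmb = 30_000_000   # "tens of millions of RMB" per day for a ~30,000-GPU cluster
high_util, low_util = 0.80, 0.60

wasted_per_day = daily_cost_rmb * (high_util - low_util)
print(f"a 20-point utilization drop wastes about {wasted_per_day:,.0f} RMB per day")
```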

If software can add value in this scenario, or if your solution or approach can improve efficiency in this scenario, then the difference it makes will be huge.

Zhang Peng: Under the drive of large models and other new technologies, industrial structure and value chains are likely to undergo systemic changes. You have found your own position, but there are still many entrepreneurs seeking their opportunities. What emerging opportunities have you observed that are worth our attention and exploration?

Wang Long: The application of multimodal large models definitely offers entrepreneurial opportunities. Multimodal technology is not just NLP-based applications; it will change human-machine interaction and even the interaction logic between machines. Some applications already use multimodal models, such as virtual pets and personal assistants.

Furthermore, in terms of infrastructure, there is great potential both in improving training capabilities and in increasing resource utilization. For example, due to US restrictions on China, our single-card performance may not be as good as others, so we need to make up for it by improving resource utilization and the ability to scale out. In this regard, China's strength—using quantity to compensate for the lack of quality—will play an important role.

Zhang Peng: The ultimate integration of diverse and heterogeneous computing power is what we need, and if it's not enough, we'll bring it together.

Wang Long: Yes, and actually, this technology is very difficult. But generally, where there is great difficulty, there is corresponding value. Because it is closely related to the application, the workload of each manufacturer's large model varies. We are also studying this, and I think there may be some opportunities in the future, especially considering that China has its unique scenario requirements. For example, if the US needs to integrate 30,000 cards, we may need to integrate 100,000 cards. In this case, the entire underlying logic will change.

Zhang Peng: Exactly, how to really train and use large models well. We still need good infrastructure to help us get things done.