• Recursal.ai, chaired by Eugene Cheah, has developed a new AI model called receptance weighted key value (RWKV) to make AI more accessible in Southeast Asia.
• RWKV combines recurrent neural network (RNN) technology with elements of the generative pre-trained transformer (GPT) to produce an AI model that runs easily on low-end devices.
• The model can run a 7-billion-parameter AI on just 12 GB of RAM, comparable to a gaming GPU. This cuts spending on AI chips to a few hundred dollars instead of thousands.
• According to Cheah, RWKV reduces computational cost by a factor of 100 compared with conventional GPT models, so startups no longer need expensive cloud computing.
• AI development costs are rising because prices of GPUs and related hardware have surged. For example, a server with 8 Nvidia H100 SXM cards costs US$296,644.
• Large AI models such as ChatGPT4, with 1.8 trillion parameters, need three months of training on 8,000 H100 GPUs, at a total cost of US$32.7 million.
• Many startups are forced to use cloud computing to develop AI, but cloud servers in Southeast Asia remain limited and expensive.
• Cheah argues that most businesses do not need models with trillions of parameters; a 7-billion-parameter model is enough for many enterprise applications.
• RWKV has already been used successfully in cases such as a safe phone for children, helping doctors write discharge reports, and supporting legal work.
• Despite its cost advantage, an RWKV model fine-tuned for a specific domain is less versatile than general-purpose models such as ChatGPT.
📌 According to Cheah, RWKV cuts computational cost by a factor of 100 and can run 7-billion-parameter models on ordinary GPUs. This opens up AI to startups and small and medium-sized businesses in Southeast Asia, instead of restricting it to big tech companies.
https://www.techinasia.com/cant-afford-300k-ai-chips-new-model-sea-cuts-high-costs
#TechinAsia
Can’t afford $300k for AI chips? New model for SEA cuts high costs
There is a saying that goes, “During a gold rush, sell shovels.”
No one has learned that better than the semiconductor industry. The price of graphics processing units (GPUs) – the shovels of the AI age – has skyrocketed since the release of OpenAI’s ChatGPT in November 2022.
While the AI industry’s Magnificent Seven – which includes Google, Microsoft, and Meta – can afford to spend billions, many startups can no longer afford to dig.
Eugene Cheah says he doesn’t want to see that happen.
“I want to make sure that AI is accessible for everyone else in the rest of Southeast Asia,” he tells Tech in Asia. Cheah is the chairman of Recursal.ai, an open-source AI platform that has developed a new model called receptance weighted key value (RWKV).
He was previously the co-founder and chief technology officer of Uilicious, a low-code test automation tool.
RWKV combines recurrent neural network (RNN) technology with elements of generative pre-trained transformer (GPT) – a type of large language model (LLM) like ChatGPT – to provide an AI model that runs easily on low-end devices.
While most people associate AI with consumer tools such as ChatGPT, enterprise AI is software built into a business and designed to support its functions. It can be used for data collection, analysis, and task automation, among other things.
In short, RWKV is an AI model that can power enterprise-level software without the costs associated with high-end computer chips. It can even run an AI model with 7 billion parameters on 12 GB of RAM, the kind of memory found on GPUs used for playing video games, Cheah says.
That reduces the cost that firms spend on AI chips to hundreds of dollars instead of thousands.
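The 12 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates the memory needed just to hold a model's weights at a few common numeric precisions. The article does not say which precision RWKV uses for its 12 GB claim, so the precision table is an assumption. Real inference also needs memory for activations and runtime overhead, which this ignores.

```python
# Weight-only memory for a model at a given numeric precision.
# The precisions are common choices, not figures the article reports for RWKV.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

GPU_MEMORY_GB = 12  # the gaming-GPU memory budget quoted in the article


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Decimal gigabytes needed to store the weights alone (no activations or overhead)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(7e9, precision)
        verdict = "fits" if gb <= GPU_MEMORY_GB else "does not fit"
        print(f"7B weights at {precision}: {gb:.1f} GB ({verdict} in {GPU_MEMORY_GB} GB)")
```

Under these assumptions, 16-bit weights alone (14 GB) would not fit in 12 GB, while 8-bit weights (7 GB) would leave room for everything else. The precision a deployment uses therefore matters as much as the parameter count.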
These more efficient AI models mean that startups no longer need to run to the cloud, Cheah says.
For businesses using GPT technology to develop AI, costs have soared due to high computation expenses and the escalating prices of AI chips and related hardware. As a result, AI development is becoming affordable only to deep-pocketed players.
So can the RWKV model compete with the LLMs from the AI industry’s Magnificent Seven?
Lower costs = AI accessibility
“The RWKV project’s primary goal is to make AI accessible to everyone, regardless of language or nation,” Cheah says.
It does this by reducing computational cost by a factor of 100, he adds.
RWKV is owned by Recursal and the not-for-profit Linux Foundation. While Recursal, which is backed by Hack VC and Soma Capital, has not disclosed how much it has raised to date, it plans to raise no more than US$250 million.
RNN technology, which the RWKV model uses, has largely been abandoned by the AI industry in favor of GPT technology. RNNs were designed to train on one word (or token) at a time. And unlike transformer-based LLMs, they do not link each token to every other token in the input.
While this cuts down on the amount of processing needed, Cheah says it also leads to a design bottleneck where “even if you throw in high-end GPUs, you are still training one token at a time or one word at a time.” As such, it’s not possible to scale up training speed.
“That’s why previously we never had an RNN that has over a billion parameters,” he says. In AI, parameters are the internal variables that models use to make predictions or decisions.
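The bottleneck Cheah describes comes from the recurrence itself. Each hidden state is computed from the previous one, so the loop over tokens cannot be parallelized across time. The state carried forward, however, stays the same size no matter how long the input gets. The toy scalar RNN below illustrates this. The weights and the tanh update are generic textbook choices for illustration, not RWKV's actual equations. A count of the pairs a causal attention layer scores is included for comparison.

```python
import math
from typing import List


def toy_rnn(tokens: List[float], w_in: float = 0.5, w_rec: float = 0.9) -> List[float]:
    """Run a toy scalar RNN and return the hidden state after each token."""
    h = 0.0  # the entire memory of the sequence so far is this one number
    states = []
    for x in tokens:
        # h_t needs h_{t-1}, so step t cannot start before step t-1 finishes
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def attention_pair_count(n: int) -> int:
    """Token pairs scored by causal self-attention: each token against itself and every earlier token."""
    return n * (n + 1) // 2
```

For 1,000 tokens, the RNN makes 1,000 sequential updates to a single state. Causal attention instead scores 500,500 token pairs, but a GPU can compute those in parallel. That trade-off (less work, but no parallelism) is what made RNNs hard to scale.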
By combining RNN technology with elements of GPT, RWKV has built several models that can provide industry-specific use cases.
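One way to see how RWKV can be both an RNN and "GPT-like" is the weighted key-value (WKV) operator described in the RWKV papers. The same output can be computed in two ways: as a weighted sum over all previous tokens, like attention, or as a running recurrence with a fixed-size state, like an RNN. The sketch below is a simplified single-channel version that loosely follows the RWKV-4 formulation. The decay `w`, bonus `u`, and inputs are illustrative values. Real implementations add numerical-stability tricks, vector channels, and further layers, all omitted here.

```python
import math
from typing import List


def wkv_parallel(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    """Attention-style form: every output looks back over all earlier tokens."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]  # the current token gets a "bonus" u
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])  # older tokens decay by w per step
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    """RNN-style form: two running sums replace the look-back over history."""
    a, b = 0.0, 0.0  # decayed sums of exp(k_i) * v_i and of exp(k_i)
    out = []
    for kt, vt in zip(k, v):
        out.append((a + math.exp(u + kt) * vt) / (b + math.exp(u + kt)))
        a = math.exp(-w) * a + math.exp(kt) * vt
        b = math.exp(-w) * b + math.exp(kt)
    return out
```

For example, `wkv_parallel(k, v, 0.5, 0.3)` and `wkv_recurrent(k, v, 0.5, 0.3)` with `k = [0.1, -0.3, 0.7, 0.2]` and `v = [1.0, 2.0, -1.0, 0.5]` agree up to floating-point rounding. The recurrent form keeps only `a` and `b`, so each new token costs the same regardless of how many came before. The RWKV authors describe using a parallelizable formulation for training and the recurrent one for inference.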
“We have already successfully scaled our RWKV architecture to 7 billion parameters (7B) and 14 billion parameters (14B). Now the question is, can we scale it even further? And that’s really a question more of funding and experimentation,” Cheah explains.
In the open-source space, the high-end standard is currently at 70 billion and 405 billion parameters.
But that’s nothing compared with what the Magnificent Seven are building: models with 1 trillion parameters. This scale is possible thanks to technology introduced in a 2017 paper called “Attention Is All You Need.”
Transformer-based AI models use attention mechanisms, which can determine the relative importance of words or even just parts of words and how they relate to each other. It is incredibly data intensive, and training a 1-billion parameter model requires 80 GB of GPU memory, which is prohibitively expensive in the current AI boom.
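The attention mechanism from "Attention Is All You Need" can be written as softmax(QKᵀ/√d)V. Each token's query is compared against every token's key, and the resulting weights say how much each token matters to the others. The pure-Python sketch below implements that single-head formula for small lists of vectors. It omits the learned projections, multiple heads, masking, and batching that real models use. The n × n grid of scores it computes is one reason memory use grows quickly with input length.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    outputs = []
    for qi in q:
        # One score per key: this is one row of the n x n score matrix
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        outputs.append([sum(wt * vj[c] for wt, vj in zip(weights, v)) for c in range(len(v[0]))])
    return outputs
```

With two identical keys, the weights split evenly, so `attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [4.0, 2.0]])` returns the plain average of the values, `[[3.0, 1.0]]`.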
Nvidia’s H100 SXM, which uses the top AI chip on the market – the H100 – has 80 GB of memory and sells for US$32,700. (Tech in Asia found one currently marked down to US$29,600. You’re welcome.)
A preloaded server with 8 SXM cards sells for US$296,644.
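The server price lines up with the per-card price. Eight cards at the quoted list price make up most of it, and the remainder pays for the rest of the machine. The arithmetic below uses only the article's figures. The article does not itemize what the remainder covers (for example CPUs, memory, networking, and chassis), so that breakdown is an assumption.

```python
H100_SXM_PRICE = 32_700  # list price per card quoted in the article (US$)
SERVER_PRICE = 296_644   # preloaded 8-card server quoted in the article (US$)
CARDS_PER_SERVER = 8

cards_only = H100_SXM_PRICE * CARDS_PER_SERVER
premium = SERVER_PRICE - cards_only  # everything in the server other than the 8 cards
print(f"8 cards alone: US${cards_only:,}")
print(f"Rest of the server: US${premium:,} ({premium / cards_only:.1%} on top of the cards)")
```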
A single such server often isn’t enough, either. Case in point: OpenAI’s ChatGPT4, which has 1.8 trillion parameters. It requires three months of training using 8,000 H100 GPUs, Nvidia CEO Jensen Huang said at a recent conference. That would cost US$32.7 million in total at current prices.
These of course are just the hardware prices. Electricity is extra. For instance, using 2,000 Nvidia cards could cost Texas-based firms over US$2 million a year, according to a report from US-based Liftr Insights.
To give an example of just how many of these chips are being used by the Magnificent Seven, Meta founder and CEO Mark Zuckerberg said the social media giant will have 350,000 H100 cards by the end of 2024.
Meta’s latest AI model, Llama 3, is designed to work across all its social media platforms and comes in 8B- and 70B-parameter versions.
Running to the cloud
This level of pricing has forced many startups to use cloud computing to train and develop their AI applications, but that often presents its own issues. Startups tell Tech in Asia that cloud servers in Southeast Asia are limited and expensive, although prices have come down thanks to new data centers in Malaysia and Singapore.
Google has pledged US$2.2 billion to build data centers in Malaysia for the express purpose of using them for AI development.
Revenue from public cloud computing in Southeast Asia is expected to reach US$16.4 billion in 2024, according to Statista. It is expected to grow annually by 20%, meaning the market could reach US$40.8 billion by 2029.
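The 2029 figure follows from compounding: five years of 20% annual growth multiplies the 2024 value by about 2.49. The sketch below reproduces the projection. The article gives only the annual rate, so treating it as constant every year is a simplifying assumption.

```python
def project(value: float, annual_growth: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1 + annual_growth) ** years


# Statista figures quoted in the article: US$16.4 billion in 2024, about 20% a year
forecast_2029 = project(16.4, 0.20, 2029 - 2024)
print(f"US${forecast_2029:.1f} billion in 2029")
```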
Financing new data centers isn’t cheap. With high-end chips still in demand, owners will be charging a premium to recoup their costs.
More power than you need
Cheah believes, however, that most companies don’t need 1T parameters.
“We have a client that uses a 7B model to provide a safe phone for kids,” he says. Every message the phone receives is scanned for sexual content and expletives. “These are very low-hanging fruits for AI models, and 7B is more than sufficient.”
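The safe-phone use case amounts to a screening step placed in front of message delivery: each incoming message is labeled by a model and held back if it carries a blocked label. The sketch below shows that flow. The `Classifier` interface, the label names, and `keyword_stub` are all hypothetical, not an RWKV or Recursal API. The stub only lets the example run without a model, and it does not represent how a 7B model classifies text.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Hypothetical interface: any function that returns content labels for a message,
# for example one backed by a fine-tuned 7B model. This is not an RWKV or Recursal API.
Classifier = Callable[[str], Set[str]]

BLOCKED_LABELS = {"sexual", "expletive"}  # the two categories the article mentions


@dataclass
class Verdict:
    deliver: bool
    labels: Set[str]


def screen_message(text: str, classify: Classifier) -> Verdict:
    """Scan one incoming message and decide whether it reaches the child's screen."""
    labels = classify(text)
    return Verdict(deliver=not (labels & BLOCKED_LABELS), labels=labels)


def keyword_stub(text: str) -> Set[str]:
    """Stand-in classifier for testing only; a real deployment would call a model here."""
    return {"expletive"} if "darn" in text.lower() else set()
```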
Most enterprise use cases, including medical, banking, finance, and legal, do not require high-end computation, because each focuses on a single area.
Cheah cites the process of discharging patients at a hospital as an example. The manual process requires a doctor to write a review of the case, which can take 30 minutes. In comparison, a 7B model can write a review in three minutes.
“Multiply that by the amount of patients in the network and by the amount of doctors in the network, and then it starts adding up to tens of millions of dollars,” he estimates. “These are what sometimes I call ‘boring but big impact’ use cases.”
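Cheah's estimate depends on two numbers the article does not give: how many discharges a hospital network handles and what an hour of a doctor's time costs. The sketch below makes that multiplication explicit. The 30- and 3-minute figures come from the article. The discharge volume and hourly cost in the usage line are hypothetical placeholders, not data. The sketch also counts the full 27-minute difference as time saved and ignores any time spent checking the model's draft, which is a simplification.

```python
MANUAL_MINUTES = 30  # doctor writing the discharge review by hand (article)
MODEL_MINUTES = 3    # 7B model drafting the review (article)


def annual_savings(discharges_per_year: int, cost_per_doctor_hour: float) -> float:
    """Dollar value of doctor time saved per year; both inputs are supplied by the caller."""
    minutes_saved = (MANUAL_MINUTES - MODEL_MINUTES) * discharges_per_year
    return minutes_saved / 60 * cost_per_doctor_hour


# Hypothetical inputs for illustration only, not figures from the article
print(f"US${annual_savings(500_000, 150.0):,.0f} per year")
```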
For instance, a fine-tuned RWKV model used in a law office actually performed better than the ChatGPT model – at least when it came to legal benchmarks, according to Cheah. But that fine-tuning has a price.
“The downside is if you ask the law model how to cook – or anything else – it will just utter garbage,” he says.