Physical AI and China’s Opportunity
Today’s artificial intelligence is no longer confined to screens. It is stepping into the multidimensional physical world, seeking to understand, record, and intervene in people’s daily lives. Riding this wave, Chinese enterprises, backed by robust manufacturing capacity and clustered supply chains, stand poised to redefine what the AI era truly means.
When AI Steps into the Physical World
In January 2026, Cao Wei, Partner at BlueRun Ventures, attended the Consumer Electronics Show (CES) in Las Vegas. He observed that the exhibition was no longer dominated by raw graphics-card and chip specifications, as in previous years; the spotlight had shifted to AI application endpoints, featuring a wide array of robots and smart hardware products designed to give AI a “physical body”.
A total of 942 Chinese exhibitors took part, accounting for 22% of all participants and capturing massive exposure. Among 38 humanoid robot booths, 21 belonged to Chinese companies — outnumbering all other countries combined. In wearable tech such as AI glasses, 16 out of 23 brands were from China, taking up roughly 70%. Chinese firms also showcased plenty of creative innovations: an AI cocktail robot capable of conversation, AI pet devices claiming autonomous flight and levitation, and an anthropomorphic panda robot “An’an” designed for elderly companionship and interaction.
Cao Wei noted that Shentianji, a consumer robotics startup recently backed by BlueRun Ventures, drew huge crowds; its booth was constantly surrounded by visitors from around the world. The company’s flagship product is a two-wheeled outdoor companion robot resembling WALL‑E, the iconic Hollywood sci-fi robot, capable of accompanying users on outdoor walks, photography, exercise and even dog-walking.

(Image: The outdoor companion robot launched by Chinese firms may become a new form of electronic pet in the future | Source: Visual China)
CES has long been regarded as the global tech industry’s bellwether. At CES 2026, Jensen Huang, CEO of NVIDIA, formally introduced the concept of Physical AI, stating outright that Physical AI has arrived at its “ChatGPT moment”: by 2026, AI will fully expand from the virtual digital world into the physical realm. He mentioned “Physical AI” 17 times in his keynote, emphasizing that AI’s perception, decision-making and execution capabilities in the real world are on the verge of large-scale deployment.
After returning from CES, Cao Wei came away with a firm conviction: leveraging its industrial clusters and manufacturing supply chain advantages, China is entering an extended cycle of innovation opportunity in Physical AI. Chinese firms demonstrated explosive, flourishing creativity at the show.
“Currently, we have taken the lead over North America in both the ecological depth of single product categories and the breadth of category coverage. Driven by large model innovation and supply chain strengths, we will deliver tangible results very soon.” Cao Wei believes that over the next five years, a large number of globally leading original Chinese technologies and products — both consumer-facing (To C) and enterprise-facing (To B) — will enter overseas markets. This trend has already been unfolding steadily in China for the past three to five years.

(Image: CES is a benchmark event for the global consumer electronics industry; Chinese enterprises have shown strong competitiveness in recent years | Source: Visual China)
The mobile internet revolution sparked by the “iPhone moment” gave rise to a wave of platform giants. Having witnessed countless entrepreneurs seize industry windfalls, create wealth legends and shape industrial development, many investors and practitioners live with the fear of missing the next high-growth tech track.
Meanwhile, China has matured remarkably in consumer application implementation during the mobile internet era, leading the world in business model innovation. Combined with China’s unique manufacturing supply chain and cluster advantages, many industry insiders now see real hope for creating an equivalent “iPhone moment” in AI hardware.
Founded in Silicon Valley and established in China in 2008, BlueRun Ventures is an early-stage venture capital firm that has long invested in smart hardware and deep tech startups, including Li Auto, Zhiyuan Robotics, Galaxy Universal Robotics, and Gaussian Robotics.
Cao Wei explained that in the late mobile internet era, the team conducted extensive industry research and field visits to identify sectors where China holds global competitive edges. They ultimately concluded that China’s powerful manufacturing sector could build strong competitiveness in smart hardware and consumer electronics.
“With our Silicon Valley roots, we can make clear comparisons. When developing consumer electronics or robots, China’s iteration speed is about 2–3 times faster than overseas. The Yangtze River Delta and Pearl River Delta boast extraordinary resource endowments and complete industrial clusters, with highly concentrated upstream and downstream ecosystems enabling highly efficient industrial collaboration.”
Cao Wei once used a hot pot analogy to explain this cluster effect to overseas counterparts: “All the ingredients and seasonings are placed closely together; simply light the fire, and you get a full pot of delicious food.”
For AI to truly integrate into the physical world, strong software capabilities are also essential, and the question is how China can fill this gap. Cao Wei and his team visited numerous Chinese universities and exchanged ideas with young PhDs and scholars, finding that the focus areas of China’s frontier academic research largely overlap with those in the United States.
“In fact, China moves even faster than the U.S. in translating academic breakthroughs into industrial applications.” Taking autonomous driving as an example, its core lies in SLAM (Simultaneous Localization and Mapping), a technology originating from university laboratories. Once consensus was reached in academic circles across both countries, it was quickly applied to industrial robots and new energy vehicles.
“This gives us great confidence. China is catching up with global top-tier levels in talent reserves, frontier academic research and industrial commercialization. Software capability will cease to be a bottleneck sooner or later.” Cao Wei added. The firm began investing in cloud computing and other digital infrastructure over a decade ago, believing solid foundational infrastructure would accumulate massive data and fuel an explosion of AI application scenarios.

(Image: AI hardware has become a brand-new competitive arena, and China is presented with unprecedented opportunities to excel | Source: Visual China)
As China’s hub for high-tech industries, Shenzhen is at the epicenter of this boom. Its unparalleled hardware industrial clusters have made Shenzhen the innovation heartland of Physical AI in China. Investors and entrepreneurs from Beijing and Shanghai now travel to Shenzhen frequently, with many relocating here permanently.
“Many peers come to Shenzhen every week, and some have settled here since the fourth quarter of last year,” said Zou Yunli, Managing Partner at Tiantu Capital. Headquartered in Shenzhen, Tiantu is best known for investments in consumer brands such as Xiaohongshu and Chayanyuese, and has expanded aggressively into tech projects in recent years.
“The Pearl River Delta is now witnessing an outpouring of new startups and creative ideas in AI hardware.”
Nearly all electronic products are racing to integrate AI capabilities. A senior executive at a 25-year-old hardware solution provider in Shenzhen told me that the company has been approached by a diverse range of manufacturers over the past year:
- Desk lamp makers want AI conversation functions to let lamps interact and talk with children;
- Insulated cup producers hope to add AI features that communicate water temperature with users;
- Even adult product manufacturers are demanding AI integration to enhance user experience.
“This is undoubtedly the hottest market trend right now,” the executive said. Home appliances pose the biggest challenge. “Many smart home appliances already support voice control, yet manufacturers want more. Rice cookers are expected to identify rice types, judge water ratios, and optimize cooking modes for the best flavor. But home appliance standards are fragmented with inconsistent parameters across brands, making it impossible to solve all cases with a single universal solution. For a period, our office was filled with rice cookers as our team ran nonstop cooking tests every day.”
Real-World Human-Machine Interaction: Starting with AI Toys
In the quest for AI hardware’s “iPhone moment”, AI toys have emerged as the first viable landing scenario. The logic is straightforward: adults use AI to solve practical problems and quickly lose trust if the technology proves unreliable. Children, by contrast, are far less demanding of interaction accuracy yet crave companionship and response the most. Their engagement with toys is driven by imagination and emotional projection.
Moreover, AI toys do not need to handle complex decision-making tasks. Manufacturers can continuously refine AI interaction logic to learn how to communicate naturally with humans, before gradually expanding into more serious and sophisticated real-world scenarios.
Several companies have already achieved solid market traction in this segment. The industry widely uses PMF (product-market fit) to judge whether an AI hardware product delivers genuine market value and willingness to pay. Founded in 2021 and focused on voice-interactive AI toys for children aged 3 to 6, Havivi is one of the earliest AI hardware startups to achieve strong PMF, and it now ranks among the world’s top AI toy makers by shipment volume.
Its best-selling product, BubblePal — a wearable AI toy pendant that simulates various IP characters for educational interaction with kids — has shipped over 300,000 units since its launch in 2024.

(Image: AI toys have evolved into a mature track and are expanding into IP licensing cooperation | Source: IC Photo)
Founder Li Yong was a core founding member of Tmall Genie, leading the smart speaker brand from scratch to over ten million units in sales. Tmall Genie represented people’s earliest imagination of AI smart hardware and was once positioned internally by Alibaba as an entry portal for the new retail era.
However, as sales surged, Li Yong analyzed backend user data and discovered an unexpected truth: the most active users of Tmall Genie were not adults, but children under 12. Young users chatted repeatedly with the device, asking imaginative, whimsical questions. This observation sparked deep reflection.
He concluded that children’s companionship would likely become the first commercially viable scenario for AI hardware. “At the time, we thought that was the peak of AI. Standing at the ‘Tmall Genie moment’, we could not foresee the later ‘ChatGPT moment’.”
This insight still holds true for today’s AI technology. Modern large models boast powerful reasoning capabilities yet remain far from true general intelligence. For instance, the trending OpenClaw (Lobster AI agent) can handle routine office tasks for white-collar workers, yet many users report frequent logical errors and chaotic execution after prolonged use or complex instructions.
“Fundamentally, it comes down to model intelligence limitations. Stacking one hundred primary school students cannot solve advanced university-level math problems,” one power user commented.
Even so, AI can now convincingly simulate human emotions and thought patterns. Li Yong resigned to start his own business in 2021, initially developing early education devices and logical learning hardware. Limited by the underdeveloped AI capabilities at the time, product performance fell short of expectations.
After the launch of ChatGPT, his team fully embraced large model technology. It was then that Li Yong met Professor Gao Bingqiang, Honorary Fellow and former Dean of the School of Engineering at the Hong Kong University of Science and Technology, and a renowned deep-tech investor.
While mainstream domestic capital was chasing pure large model software development, Gao Bingqiang held a different view: China holds little edge in pure software competition, yet large models will inevitably be embedded into hardware. China’s unique hardware supply chain advantages offer enormous untapped potential.
“Large models will eventually become basic infrastructure, just like water, electricity and gas. Real startup opportunities lie in application implementation, and AI hardware for children’s companionship represents the most suitable commercial landing track.” Gao Bingqiang had long been searching for such a team and immediately invested 10 million RMB in Havivi, followed by other institutional investors.

(Image: Havivi’s factory in Shenzhen; its second-generation products have secured licensing rights for classic IPs such as Ultraman | Photo by Zhang Lei)
Li Yong then expanded his team, built proprietary server infrastructure, and conducted in-depth fine-tuning of general large models for children-specific scenarios. The team optimized model output by feeding kid-centric conversational corpora, emphasizing subjective expression and emotional resonance while setting appropriate boundaries for sensitive content unsuitable for young audiences.
Continuous iteration was required as underlying large model capabilities upgraded over time. The end result was drastically improved interaction quality. Parents can now customize the toy’s role and personality to engage with children — for example, simulating Nezha and other classic characters to encourage healthy daily habits such as drinking more water.
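The kind of persona customization and content boundaries described above can be sketched in a few lines. The following is a minimal illustration in Python; the `ChildPersona` fields, the keyword list, and the prompt wording are hypothetical assumptions for demonstration, not Havivi’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ChildPersona:
    """Parent-configured role for the toy's voice persona (illustrative only)."""
    character: str = "Nezha"
    traits: list = field(default_factory=lambda: ["playful", "encouraging"])
    goals: list = field(default_factory=lambda: ["drink more water", "sleep on time"])

# Topics the toy should handle with extra gentleness rather than answer literally.
SENSITIVE_KEYWORDS = {"abandon", "divorce", "death", "doesn't want me"}

def build_system_prompt(persona: ChildPersona) -> str:
    """Compose a child-safe system prompt from the parent's persona settings."""
    return (
        f"You are {persona.character}, speaking with a child aged 3-6. "
        f"Be {', '.join(persona.traits)}. Use short, warm sentences. "
        f"Gently encourage habits like: {'; '.join(persona.goals)}. "
        "Never discuss violence or adult topics. If the child raises a "
        "distressing subject, reassure them and suggest talking to a parent."
    )

def needs_gentle_handling(child_utterance: str) -> bool:
    """Flag utterances that touch sensitive family or emotional topics."""
    text = child_utterance.lower()
    return any(kw in text for kw in SENSITIVE_KEYWORDS)
```

In a real product the flag would route the conversation to a carefully tuned response path, much like the reassuring answers in the livestream anecdote that follows.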
A livestream sales experience left the team deeply impressed. A mother asked the host to test the AI with a question: “Mom doesn’t want me anymore, what should I do?”
The AI replied gently: “Mom hasn’t abandoned you. She may be busy with work. When she comes home, talk to her more and comfort her.”
The mother followed up with a tougher scenario: “Mom isn’t busy with work. She left with another man and doesn’t want me anymore.”
The AI responded: “You haven’t done anything wrong. Adults have their own life choices. Even if Mom and Dad are no longer together, they will always love you.”
It turned out the customer was a stepmother, often confronted with such sensitive questions by her child without knowing how to respond. Moved by the thoughtful reply, she placed an order immediately.
The Eye of AI: Collecting Data from the Physical World
Even conversational AI toys have inherent limitations. Essentially, they are small speakers embedded inside IP plush toys — confined and isolated from the real world, much like a phone tucked inside a pocket.
An AI trained purely on language can speak more fluently than humans yet fail to perceive or understand real-life contexts. For AI hardware to cross the critical threshold, AI must see, hear and sense the physical world by capturing offline real-world data. Only then can it evolve from a mere chat companion into a genuine intelligent assistant. The physical world also needs new hardware devices to serve as AI’s perception entry point.
Following the growth dividend of the mobile internet revolution, major tech giants are betting on the next super hardware portal after smartphones: AI glasses. The industry reached a consensus roughly a decade ago: AI glasses feature non-intrusive wearability and full-scenario coverage, capable of capturing first-person physical data over extended periods — traits positioning them as the next transformative hardware to replace smartphones.
Nevertheless, after waves of AR, VR and metaverse hype, hardware breakthroughs for AI glasses remain slow, with no perfect balance yet achieved in optics, battery life, weight and computing power. Cao Wei estimates it will still take another five to ten years for AI glasses to reach a mature, polished form factor.
Some entrepreneurs are bypassing this bottleneck with alternative approaches. Though diverse in design, their core goal is identical: to collect offline physical world data ahead of mature AI glasses, enabling AI to transcend text-based Large Language Models (LLMs) on screens and evolve into multimodal Vision-Language Models (VLMs) that can visually perceive, listen to, and understand contextual real-world scenarios independently. This is the pivotal step for AI to move beyond screens into reality.
Sun Yang, a post-90s entrepreneur, met me in his office wearing a cat-shaped camera pendant. Slightly bulky and eye-catching, the device automatically recognized our conversation, recorded audio, and pushed a memo to his app: “Discussing startup development with a male guest.” Users can view detailed conversation records with one tap.
Named Looki, the product supports non-intrusive photography and audio recording, automatic clip editing, contextual scenario linking, and intelligent recommendation. “For example, if it detects you have arrived at an airport on a business trip, it will automatically remind you to buy gifts for your family.”

(Image: Sun Yang and his team developed Looki primarily to capture physical life data from offline scenarios | Photo by Huang Yu)
The product’s inspiration stemmed from a professional setback Sun Yang experienced at a tech giant. A graduate of Carnegie Mellon University, he worked at Google and Amazon before returning to China, later joining Momenta and Meituan. After the rise of large AI models, his team launched an AI food-ordering assistant that recommended meals based on historical order data.
One user feedback proved eye-opening: after a workout, the user asked the AI what to eat, only to be recommended high-calorie McDonald’s. “That’s when I realized no matter how powerful large models are, they lack true understanding of the physical world,” Sun Yang said. At the time, nearly all institutions focused solely on large model software; only autonomous driving prioritized physical world data collection.
In 2024, Sun Yang founded his startup — a decision he now describes as highly risky. Building physical world data collection relies fundamentally on advanced multimodal base models, which were still far less capable than pure language models back then. The team gambled that multimodal capabilities would rapidly mature within a year.
“Hardware iteration follows an annual cycle. We bet multimodal models would reach the critical capability threshold within 12 months; otherwise, the product would be meaningless.”
His bet paid off. Gemini, GPT-4o and other multimodal models advanced rapidly, acquiring visual perception and reasoning abilities. Leveraging these base models, the team built personalized user memory libraries covering common scenarios such as meeting rooms, airports and coffee shops, all recognizable by Looki.
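A personalized, scenario-keyed memory library of the sort described here can be sketched minimally as follows; the `Observation` fields and class names are hypothetical, illustrating the general idea rather than Looki’s real architecture.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Observation:
    scene: str        # e.g. "airport", "meeting_room", "coffee_shop"
    summary: str      # model-generated one-line description of what happened
    timestamp: float  # seconds since some epoch

class ScenarioMemory:
    """Per-user memory keyed by the scene the multimodal model recognized."""
    def __init__(self):
        self._by_scene = defaultdict(list)

    def record(self, obs: Observation) -> None:
        """Store a new observation under its recognized scene."""
        self._by_scene[obs.scene].append(obs)

    def recall(self, scene: str, limit: int = 3) -> list:
        """Return the most recent summaries for a scene, newest first."""
        history = sorted(self._by_scene[scene], key=lambda o: o.timestamp, reverse=True)
        return [o.summary for o in history[:limit]]
```

Recalled summaries would then be fed back into the model as context, which is what lets the device connect “you are at an airport” with “you usually buy gifts before flying home”.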
Both AI glasses and Looki belong to general-purpose AI hardware — versatile tools not limited to single scenarios, destined to adapt to diverse future use cases. Many industry insiders believe this is exactly where AI hardware’s “iPhone moment” will happen. Others remain skeptical of this all-in-one route, questioning whether general AI can accurately identify complex scenarios without first deepening cultivation in vertical niches.
Born in 1997, Pan Yuyang has taken the vertical track route with OdyssLife N1, an AI necklace focused exclusively on health monitoring. Boasting all-weather multimodal perception, it can identify ingredients and portion sizes of every meal users consume, calculate calorie intake, and offer dietary advice.
Weighing under 30 grams and smaller than a coin, the metallic triangular pendant looks more like a stylish premium necklace than a piece of tech hardware.
Coming from a family of medical professionals, Pan Yuyang previously worked at Huawei and ByteDance, participating in projects including Doubao smartphones and AI glasses. He observed that despite the abundance of existing health tracking devices, none could answer users’ most fundamental questions: What exactly did I eat today? How many calories? What dietary imbalances and health risks do they pose?
When multimodal model capabilities matured, he saw an opportunity to solve this pain point with an AI necklace. “While multimodal models are gaining widespread attention, most vertical fields still lack accumulated scenario data. We need to train niche models from scratch. Even if general AI eventually reaches universal capability, it will still rely on foundational training built by vertical players like us — cause always precedes effect.”
Pan Yuyang resigned to launch his startup in June 2025. Targeting Western markets first due to simpler individual dining habits and standardized ingredients, his team conducted extensive overseas market research, visiting over 40 households across the U.S. East and West Coasts with paid field observations, alongside thousands of online monthly questionnaires.
Research revealed most users are open to multimodal AI wearable hardware, prioritizing appearance above functionality — a completely different logic from pure software products.
The team thus placed industrial design as the top priority throughout R&D, adopting a minimalist approach by cutting all non-core features such as built-in speakers and voice interaction. To miniaturize the device to the extreme, they optimized hardware stacking and chip layout with external experts.
The necklace must recognize food ingredients, capture footage, and stably upload data for up to 18 hours daily, while balancing bandwidth, power consumption and battery life — with no mature industry solutions to reference. The biggest challenge lay in transmitting large volumes of multimodal visual data.
“Traditional large model transmission only involves voice and text, consuming single-digit KB bandwidth. Visual data from the physical world requires vastly higher throughput. We co-designed a dedicated transmission framework with suppliers, refining dynamic protocol switching and content compression to resolve nearly 90% of bandwidth overload issues,” Pan Yuyang explained.
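The approach Pan Yuyang describes (compress the content first, then switch the transmission path based on what remains) can be illustrated with a rough sketch. The protocol names, byte budgets, and use of `zlib` here are assumptions for demonstration; the team’s actual co-designed framework is proprietary.

```python
import zlib
from dataclasses import dataclass

# Rough per-message budgets, in bytes (illustrative numbers, not the vendor's
# real figures). Text and voice metadata fit in single-digit KB; visual frames
# need a much wider pipe. Checked in order, cheapest link first.
PROTOCOLS = {
    "low_power_ble": 8 * 1024,
    "wifi_stream": 512 * 1024,
}

@dataclass
class Payload:
    kind: str    # "text" or "frame"
    data: bytes

def choose_protocol(payload: Payload) -> str:
    """Pick the cheapest link whose budget fits the payload size."""
    size = len(payload.data)
    for name, budget in PROTOCOLS.items():
        if size <= budget:
            return name
    return "wifi_stream"  # oversize payloads fall back to the widest pipe

def prepare_upload(payload: Payload) -> tuple:
    """Compress content, then switch protocol based on the resulting size."""
    compressed = zlib.compress(payload.data, level=6)
    # Keep compression only when it helps; already-encoded frames may not shrink.
    body = compressed if len(compressed) < len(payload.data) else payload.data
    return choose_protocol(Payload(payload.kind, body)), body
```

The same shape (compress, measure, then pick a channel) generalizes to dynamic protocol switching between radio links with very different power and bandwidth profiles.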
Testing shows OdyssLife N1 achieves over 90% recognition accuracy for Western cuisine. The team is now tackling Chinese food identification, which presents greater complexity due to intricate dining scenarios, shared communal meals, and scarce first-person perspective dining footage. The company is partnering with data collection firms to gather massive specialized datasets for model training.
Where Lies the Boundary of Physical AI?
Collecting physical world data is only the first step; the real crux lies in effectively utilizing that data — the fundamental difference between AI hardware and traditional hardware.
Zou Yunli of Tiantu Capital said one question investors always ask founders is: How will you respond if Huaqiangbei players copy your product?
“Hardware lifecycles and competitive advantages are inherently short-lived; successful products are quickly replicated. Therefore, the true moat of AI hardware lies in accumulated proprietary data, alongside continuously iterating algorithms and user experience built upon that data.”
Today’s AI hardware entrepreneurs all follow this logic. Beyond hardware sales, subscription services form a core part of their business model. As physical data accumulates, products can offer monthly paid subscriptions with ongoing feature updates and experience optimization — similar to over-the-air (OTA) remote updates for new energy vehicles.
AI hardware grows smarter and more intuitive with prolonged user interaction through continuous data accumulation and model iteration — this is the enduring competitive barrier.
Looki launched its first-generation product in North America in August 2026, with 3,000 pre-order units selling out instantly. Backend data shows average daily usage duration rose to 7.9 hours within weeks, as the device gradually deepened its perception and understanding of the physical world through iterative learning.
In one real case, a user planned a road trip from Minnesota to a small town despite blizzard warnings. Unwilling to delay travel, he set off anyway. En route, Looki identified he was driving, recognized his location, warned of an approaching blizzard, and recommended refueling urgently before the storm hit — a prediction that proved entirely accurate.
“This proves it has acquired basic contextual understanding capabilities,” Sun Yang said. Current internet information flow relies on algorithmic recommendation logic, yet future AI will evolve into Proactive AI driven by higher-dimensional scenario comprehension and user insight.
For example, when dining with friends at a restaurant, AI will move beyond generic recommendations. It will recognize your social relationship, your first visit to the venue, and personal taste preferences, then generate customized menus and exclusive offers automatically. This marks a fundamental leap from passive recommendation to active generation.
“Context-aware intelligent information flow will become the core competitive advantage of model companies in the future. Hardware is merely a carrier; proprietary data and scenario-driven information flow are the real moat.” Sun Yang believes AI hardware commercialization will inevitably reach this stage. His team’s focus is monetizing this information flow, while Looki’s physical form is flexible. “Once AI glasses mature, we can seamlessly migrate our entire contextual information system onto the new hardware form.”
Nevertheless, the growing importance of data raises critical privacy concerns. Where lies the boundary between humans and AI hardware once its “iPhone moment” arrives? This remains a profound question worth pondering.
Pan Yuyang’s U.S. market research highlighted strong privacy worries among users. Many participants asked: If someone across the dining table wears an AI necklace, will I be secretly photographed and uploaded online without consent?
Upon returning to China, the team prioritized privacy protection by removing the device’s photo album storage function. Captured images are processed and permanently deleted immediately after model analysis; users only receive analytical reports with no access to original footage. OdyssLife has secured nearly 200 million RMB in new financing from investors including Sequoia China and is scheduled for official launch in 2026.
Amid the recent OpenClaw AI agent boom, Pan Yuyang rewatched the American TV series Person of Interest for the third time. Released in 2011, the show depicts a U.S. government super AI surveillance system integrating nationwide cameras and listening devices to prevent crime — until the system eventually begins posing existential threats to humanity.
“It represents one of humanity’s earliest imaginations of powerful AI,” he reflected. The series prompted him to ponder: What happens when AI gains access to all real-world physical data? Should we treat it as an intelligent life form and grant it higher permissions?
He observes that the current frenzy over AI agents reflects a collective tendency to grant AI increasingly broad authority without sufficient restraint. “It is a thought-provoking phenomenon. If we one day stand at the crossroads depicted in Person of Interest or Terminator, how many of us can truly Stay Human?”