Marble enters the race to bring AI to tax work, armed with $9 million and a free research tool - 7 hours ago  VentureBeat

Marble, a startup building artificial intelligence agents for tax professionals, has raised $9 million in seed funding as the accounting industry grapples with a deepening labor shortage and mounting regulatory complexity. The round, led by Susa Ventures with participation from MXV Capital and Konrad Capital, positions Marble to compete in a market where AI adoption has lagged significantly behind other knowledge industries like law and software development. "When we looked at the economy and asked ourselves where AI is going to transform the way businesses operate, we focused on knowledge ...
Category: IT

OpenAI report reveals a 6x productivity gap between AI power users and everyone else - 1 day ago  VentureBeat

The tools are available to everyone. The subscription is company-wide. The training sessions have been held. And yet, in offices from Wall Street to Silicon Valley, a stark divide is opening between workers who have woven artificial intelligence into the fabric of their daily work and colleagues who have barely touched it. The gap is not small. According to a new report from OpenAI analyzing usage patterns across its more than one million business customers, workers at the 95th percentile of AI adoption are sending six times as many messages to ChatGPT as the median employee at the same compani...
Category: IT

How Hud's runtime sensor cut triage time from 3 hours to 10 minutes - 1 day ago  VentureBeat

Engineering teams are generating more code with AI agents than ever before. But they're hitting a wall when that code reaches production. The problem isn't necessarily the AI-generated code itself. It's that traditional monitoring tools generally struggle to provide the granular, function-level data AI agents need to understand how code actually behaves in complex production environments. Without that context, agents can't detect issues or generate fixes that account for production reality. It's a challenge that startup Hud is looking to help solve with the launch of its...
Category: IT

Databricks' OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs - 2 days ago  VentureBeat

There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others. AI agents excel at solving abstract math problems and passing PhD-level exams that most benchmarks are based on, but Databricks has a question for the enterprise: Can they actually handle the document-heavy work most enterprises need them to do? The answer, according to new research from the data and AI platform company, is sobering. Even the best-performing AI agents achieve less than 45% accuracy on tasks that mirror real enterprise ...
Category: IT

Tracking every decision, dollar and delay: The new process intelligence engine driving public-sector progress - 2 days ago  VentureBeat

Presented by Celonis. The State of Oklahoma discovered its blind spots the hard way. In April 2023, a legislative report revealed its agencies had spent $3 billion without proper oversight. Janet Morrow, Director of Oklahoma's Risk, Assessment and Compliance Division, set out to track thousands of monthly transactions across dozens of disconnected systems. The Sooner State became the first U.S. state to apply process intelligence (PI) technology for procurement oversight. The transformation, Morrow says, was immediate. Real-time monitoring replaced multi-year audit cycles. The platform from ...
Category: IT

Booking.com’s agent strategy: Disciplined, modular and already delivering 2× accuracy - 3 days ago  VentureBeat

When many enterprises weren’t even thinking about agentic behaviors or infrastructures, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system. This early experimentation has allowed the company to take a step back and avoid getting swept up in the frantic AI agent hype. Instead, it is taking a disciplined, layered, modular approach to model development: small, travel-specific models for cheap, fast inference; larger large language models (LLMs) for reasoning and understanding; and domain-tuned evaluations built in-house when precision is critica...
Category: IT

Why AI coding agents aren’t production-ready: Brittle context windows, broken refactors, missing operational awareness - 4 days ago  VentureBeat

Remember this Quora comment (which also became a meme)? (Source: Quora) In the pre-large language model (LLM) Stack Overflow era, the challenge was discerning which code snippets to adopt and adapt effectively. Now, while generating code has become trivially easy, the more profound challenge lies in reliably identifying and integrating high-quality, enterprise-grade code into production environments. This article will examine the practical pitfalls and limitations observed when engineers use modern coding agents for real enterprise work, addressing the more complex issues around integration, scal...
Category: IT

GAM takes aim at “context rot”: A dual-agent memory architecture that outperforms long-context LLMs - 7 days ago  VentureBeat

For all their superhuman power, today’s AI models suffer from a surprisingly human flaw: They forget. Give an AI assistant a sprawling conversation, a multi-step reasoning task or a project spanning days, and it will eventually lose the thread. Engineers refer to this phenomenon as “context rot,” and it has quietly become one of the most significant obstacles to building AI agents that can function reliably in the real world. A research team from China and Hong Kong believes it has created a solution to context rot. Their new paper introduces general agentic memory (GAM), a system built to pres...
Category: IT

The 'truth serum' for AI: OpenAI’s new method for training models to confess their mistakes - 7 days ago  VentureBeat

OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, "confessions," addresses a growing concern in enterprise AI: Models can be dishonest, overstating their confidence or covering up the shortcuts they take to arrive at an answer. For real-world applications, this technique evolves the creation of more transparent and steerable AI systems. What are confessions? Many forms of AI deception result from the complex...
Category: IT

Tariff turbulence exposes costly blind spots in supply chains and AI - 8 days ago  VentureBeat

Presented by Celonis. When tariff rates change overnight, companies have 48 hours to model alternatives and act before competitors secure the best options. At Celosphere 2025 in Munich, enterprises demonstrated how they’re turning that chaos into competitive advantage, with quantifiable results that separate winners from losers. Vinmar International: The global plastics and chemicals distributor created a real-time digital twin of its $3B supply chain, cutting default expedites by more than 20% and improving delivery agility across global operations. Florida Crystals: One of America's largest...
Category: IT

The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI - 22 hours ago  VentureBeat

There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks, from coding to instruction following to agentic web browsing and tool use. But many of these benchmarks have one major shortcoming: they measure the AI's ability to complete specific problems and requests, not how factual the model is in its outputs (how well it generates objectively correct information tied to real-world data), especially when dealing with information contained in imagery or graphics. For industries where ...
Category: IT

Quilter's AI just designed an 843‑part Linux computer that booted on the first try. Hardware will never be the same. - 1 day ago  VentureBeat

A San Francisco-based startup has demonstrated what it calls a breakthrough in hardware development: an artificial intelligence system that designed a fully functional Linux computer in one week, a process that would typically consume nearly three months of skilled engineering labor. Quilter, which has raised more than $40 million from investors including Benchmark, Index Ventures, and Coatue, used its physics-driven AI to automate the design of a two-board computer system that booted successfully on its first attempt, requiring no costly revisions. The project, internally dubbed "Project...
Category: IT

The AI that scored 95%, until consultants learned it was AI - 1 day ago  VentureBeat

Presented by SAP. When SAP ran a quiet internal experiment to gauge consultant attitudes toward AI, the results were striking. Five teams were asked to validate answers to more than 1,000 business requirements completed by SAP’s AI co-pilot, Joule for Consultants, a workload that would normally take several weeks. Four teams were told the analysis had been completed by junior interns fresh out of school. They reviewed the material, found it impressive, and rated the work about 95% accurate. The fifth team was told the very same answers had come from AI. They rejected almost everything. Only when as...
Category: IT

Brand-context AI: The missing requirement for marketing AI - 2 days ago  VentureBeat

Presented by BlueOcean. AI has become a central part of how marketing teams work, but the results often fall short. Models can generate content at scale and summarize information in seconds, yet the outputs are not always aligned with the brand, the audience, or the company’s strategic goals. The problem is not capability. The problem is the absence of context. The bottleneck is no longer computational power. It is contextual intelligence. Generative AI is powerful, but it doesn’t understand the nuances of the business it supports. It doesn’t have the context for why customers choose one brand ove...
Category: IT

Anthropic's Claude Code can now read your Slack messages and write code for you - 3 days ago  VentureBeat

Anthropic on Monday launched a beta integration that connects its fast-growing Claude Code programming agent directly to Slack, allowing software engineers to delegate coding tasks without leaving the workplace messaging platform where much of their daily communication already happens. The release, which Anthropic describes as a "research preview," is the AI safety company's latest move to embed its technology deeper into enterprise workflows, and comes as Claude Code has emerged as a surprise revenue engine, generating over $1 billion in annualized revenue just six months after...
Category: IT

Design in the age of AI: How small businesses are building big brands faster - 3 days ago  VentureBeat

Presented by Design.com. For most of history, design was the last step in starting a business, something entrepreneurs invested in once the idea was proven. Today, it’s one of the first. The rise of generative AI has shifted how small businesses imagine, launch, and grow, turning what used to be a months-long creative process into something interactive, iterative, and accessible from day one. Search data tells the story. Since 2022, global interest in “AI business name generator” has surged more than 700%. Searches for “AI logo generator” are up 1,200%, and “AI website generator” 1,600%. Small ...
Category: IT

AI denial is becoming an enterprise risk: Why dismissing “slop” obscures real capability gains - 6 days ago  VentureBeat

Three years ago this week, ChatGPT was born. It amazed the world and ignited unprecedented investment and excitement in AI. Today, ChatGPT is still a toddler, but public sentiment around the AI boom has turned sharply negative. The shift began when OpenAI released GPT-5 this summer to mixed reviews, mostly from casual users who, unsurprisingly, judged the system by its surface flaws rather than its underlying capabilities. Since then, pundits and influencers have declared that AI progress is slowing, that scaling has “hit the wall,” and that the entire field is just another tech bubble inflate...
Category: IT

Anthropic vs. OpenAI red teaming methods reveal different security priorities for enterprise AI - 7 days ago  VentureBeat

Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to parse through the results, which vary widely and can be misleading. Anthropic's 153-page system card for Claude Opus 4.5 versus OpenAI's 60-page GPT-5 system card reveals a fundamental split in how these labs approach security validation. Anthropic discloses in their system card how they rely on multi-attempt attack success rates from 200-attempt reinforcement learning (RL) campaigns. Open...
Category: IT

Gemini 3 Pro scores 69% trust in blinded testing up from 16% for Gemini 2.5: The case for evaluating AI on real-world trust, not academic benchmarks - 7 days ago  VentureBeat

Just a few short weeks ago, Google debuted its Gemini 3 model, claiming it scored a leadership position in multiple AI benchmarks. But the challenge with vendor-provided benchmarks is that they are just that: vendor-provided. A new vendor-neutral evaluation from Prolific, however, puts Gemini 3 at the top of the leaderboard. This isn't on a set of academic benchmarks; rather, it's on a set of real-world attributes that actual users and organizations care about. Prolific was founded by researchers at the University of Oxford. The company delivers high-quality, reliable human data to ...
Category: IT

Workspace Studio aims to solve the real agent problem: Getting employees to use them - 8 days ago  VentureBeat

One problem enterprises face is getting employees to actually use the AI agents their dev teams have built. Google, which has already shipped many AI tools through its Workspace apps, has made Google Workspace Studio generally available to give more employees access to design, manage and share AI agents, further democratizing agentic workflows. This puts Google directly in competition with Microsoft’s Copilot and undercuts some integrations that brought OpenAI’s ChatGPT into enterprise applications. Workspace Studio is powered by Gemini 3, and while it primarily targets business teams rather t...
Category: IT