Unstructured, a pioneering AI firm, raises $40 million to enhance data preprocessing tools for large language models, aiming to bridge the gap in handling unstructured enterprise data.
In a move that underscores the growing role of large language models (LLMs) in data analytics and AI, San Francisco-based startup Unstructured has secured $40 million in a Series B funding round. The round was led by Menlo Ventures, with participation from Databricks Ventures, IBM Ventures, Sacramento Kings Chairman Vivek Ranadivé, DataStax CEO Chet Kapoor, Allison Pickens of the New Normal Fund, and NVentures, the venture capital arm of NVIDIA. Existing investors including Madrona, Bain Capital Ventures (BCV), and Mango Capital also joined the round. The investment brings Unstructured’s total capital raised to $65 million. The company plans to use the new capital to expand its team and accelerate development of its data preprocessing tools for LLMs.
Unstructured’s technology arrives at a time when more than half of organizations worldwide have increased their investments in generative AI over the past year. Despite the transformative potential of generative AI, a significant challenge has emerged: most enterprise data is unstructured. Emails, documents, images, videos, and similar content account for over 80% of enterprise data, and organizations have historically found this data difficult to use at scale in machine learning applications. Unstructured tackles the problem directly, positioning itself as a provider that ingests and preprocesses unstructured data of all kinds into LLM-ready formats.
Founded in 2022, Unstructured quickly positioned itself at the forefront of enterprise LLM productization, enabling organizations to automate the transformation of complex, unstructured data into the formats needed for retrieval-augmented generation (RAG) and LLM fine-tuning. Its technology delivers LLM-ready data and has achieved performance improvements of over 20% across various LLM applications without requiring any customization. The company’s open source library has been downloaded more than 6 million times and is used in over 12,000 code bases and by 45,000 organizations, including a significant portion of the Fortune 500.
This January, Unstructured introduced its commercial SaaS API, which has already attracted over 1,000 paying customers. In February, the company announced an enterprise platform for continuously extracting raw unstructured data from databases, transforming it into LLM-ready formats, and loading it into a vector database for RAG. The platform aims to sharply reduce the share of their time that developers and data scientists spend preparing data, previously estimated at over 75%, and so streamline the process of moving LLM pilots into production.
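The extract, transform, and load flow described above can be illustrated with a minimal sketch. It is not Unstructured's API or implementation: the function names, the word-count "embedding," and the `InMemoryVectorStore` class are all hypothetical stand-ins, chosen so the example runs with only the Python standard library. A real pipeline would use a document parser, an embedding model, and an actual vector database.

```python
import math
import re
from collections import Counter


def extract(source):
    """Extract step: yield (doc_id, raw_text) pairs from a hypothetical source dict."""
    for doc_id, raw in source.items():
        yield doc_id, raw


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def transform(doc_id, raw, max_words=8):
    """Transform step: normalize whitespace, split into word chunks, embed each chunk."""
    words = raw.split()
    for start in range(0, len(words), max_words):
        chunk = " ".join(words[start:start + max_words])
        yield {"doc_id": doc_id, "text": chunk, "vector": embed(chunk)}


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self.records = []

    def load(self, record):
        """Load step: persist one chunk record."""
        self.records.append(record)

    def search(self, query, k=1):
        """Return the k records most similar to the query text."""
        query_vector = embed(query)
        ranked = sorted(
            self.records,
            key=lambda r: cosine(query_vector, r["vector"]),
            reverse=True,
        )
        return ranked[:k]


def run_pipeline(source, store):
    """Run extract -> transform -> load over every document in the source."""
    for doc_id, raw in extract(source):
        for record in transform(doc_id, raw):
            store.load(record)
    return store
```

In a RAG setting, the chunks returned by `search` would be passed to an LLM as context for answering the query. The chunk size of eight words is arbitrary and far smaller than what a production system would use.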
The significance of Unstructured’s advancements cannot be overstated. With enterprises generating vast amounts of data daily, generative AI becomes crucial for deriving intelligent insights. Unstructured not only improves the efficiency and scalability of these processes but also aligns with the broader effort to make AI technologies more accessible and impactful across industries. This has drawn enthusiastic support from venture partners and industry leaders, who are optimistic about the potential of RAG and LLMs for unstructured data management.
For companies eager to explore and harness the full capabilities of their data, Unstructured presents various solutions, ranging from open source tools to commercial platforms currently in beta. This promising technology paves the way for enterprises to leverage generative AI and LLMs more effectively, positioning Unstructured as a key player in the next wave of AI-driven business innovation.