Fluid Inference is an Applied AI Research Lab building the future of ambient intelligence. We believe intelligence should be embedded everywhere: in your applications, woven into your hardware, responding to the moments that matter.
Smaller models, built for specific tasks, outperform large models at what they're designed to do. We ship open-source models that embed directly into applications and hardware. We work with companies to train task-specific models optimized for their use cases. Intelligence that runs where it matters, with SDKs that make deployment simple.
We'll build tools that let anyone create and personalize models, not just ML engineers. Vibe coders and developers will build custom intelligence. We'll work with more customers to train task-specific models that outperform the giants for their needs.
Models will live in your environment and evolve with it. They'll learn through embedded fine-tuning and memory, adapting to each user over time. Intelligence that senses context, anticipates needs, and becomes genuinely personal. Present everywhere.
See what developers are saying
still not as fast as Core ML (~120x on M4 Pro) @Alex_tra_memory github.com/FluidInference…
This is @nvidia’s Parakeet realtime EOU model, a streaming speech recognition model that also performs end-of-utterance detection. Credit to @fluidinference for the FluidAudio library and the CoreML version of the model.
Custom vocabulary support is now available in FluidAudio from @fluidinference
There’s a project that Spokenly and Slipbox utilise; they’re in the showcase section. I’ve used both of these on mobile and am really happy with the concept of a free, unlimited, local voice-to-text model that takes advantage of the Apple Neural Engine github.com/FluidInference…
Nvidia parakeet v3 is an insane model as well. Found a Swift library to bundle it into your apps. It runs entirely on device! github.com/FluidInference…
Got my Clawdbot transcribing WhatsApp voice notes in ~200ms using FluidAudio + Parakeet TDT on CoreML. Fully local on a Mac Mini, no cloud APIs. Transcription latency is basically instant.
There are two transcription engines you can use: the first one is from Apple, preinstalled on macOS Tahoe. The second one is Parakeet from FluidAudio, accessible on all devices, but you need to trigger the model download.
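For readers wondering what the FluidAudio route looks like in practice, here is a minimal Swift sketch. The names used (AsrModels.downloadAndLoad, AsrManager, transcribe) are assumptions drawn from these quotes, not a verified API; check the FluidAudio README for the current interface.

```swift
import FluidAudio  // the Swift SDK referenced in these quotes

// Minimal sketch: fetch the Parakeet CoreML model on first run, then
// transcribe 16 kHz mono Float samples fully on-device. The type and
// method names below are assumptions for illustration, not a verified API.
func transcribeLocally(_ samples: [Float]) async throws -> String {
    let models = try await AsrModels.downloadAndLoad()  // the "trigger the model download" step
    let asr = AsrManager(config: .default)
    try await asr.initialize(models: models)
    let result = try await asr.transcribe(samples)      // runs on the Apple Neural Engine
    return result.text
}
```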
3: Hex turns voice into text with a hotkey—press-and-hold to transcribe, or double-tap to lock and paste anywhere on macOS (Apple Silicon). Open-source, on-device options via WhisperKit/FluidAudio. github.com/kitlangton/Hex
Yep, noticed the same—FluidAudio + Parakeet on ANE is a sweet spot for low-latency + privacy. I built Hapi for that exact reason (staying fully local on M-series). Curious if you’ve benchmarked WER vs Whisper on your setup?
Parakeet is great. Transcribes at roughly 200x realtime on my M4 MacBook Pro. My current stack for local-only podcast transcription is Parakeet + FluidAudio for diarisation: github.com/HartreeWorks/s…
it seems like both Hex and Spokenly use Parakeet (a model from Nvidia) via FluidAudio (“a Swift SDK for fully local, low-latency audio AI on Apple devices”). What’s interesting is that FluidAudio is an open-source project that makes use of the M-series chips’ Neural Engine (ANE)
The CAM++ CoreML conversion I did myself, but the segmentation-3.0 model on CoreML is made possible thanks to the amazing FluidAudio project by FluidInference: github.com/FluidInference…
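And for the diarisation setups mentioned above, a similar sketch under the same caveat: DiarizerModels, DiarizerManager, and performCompleteDiarization are illustrative names, not a confirmed FluidAudio API.

```swift
import FluidAudio

// Illustrative sketch of local speaker diarisation ("who spoke when").
// The names below are assumptions; consult the FluidAudio project for the real API.
func printSpeakerSegments(_ samples: [Float]) async throws {
    let models = try await DiarizerModels.downloadIfNeeded()
    let diarizer = DiarizerManager()
    diarizer.initialize(models: models)
    let result = try diarizer.performCompleteDiarization(samples, sampleRate: 16_000)
    for segment in result.segments {
        print("Speaker \(segment.speakerId): \(segment.startTimeSeconds)s - \(segment.endTimeSeconds)s")
    }
}
```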