
Building a Voice-Based Form Filling Assistant for Bahmni in Low-Resource Settings

Electronic Medical Records bring structure, consistency, and continuity to patient care. In low-resource hospitals, tools like Bahmni and OpenMRS have become essential because they are open, flexible, and can run reliably on constrained infrastructure. But one challenge persists everywhere: entering data into large clinical forms takes time.

Clinicians in Malawi, Zimbabwe, Ethiopia, and other low-connectivity environments often work with high patient volumes, shared computers, intermittent power, and limited time for documentation. Even experienced users find it slow to navigate multi-page forms while focusing on the patient.

Over the past year, voice technology has finally matured enough to make an alternative possible: speak naturally → AI fills the form.

This post explains how we built a voice agent that acts as a new UI layer for Bahmni without changing Bahmni itself.

1. The Problem: Documentation Burden in Low-Resource Hospitals

Bahmni and OpenMRS offer robust clinical forms that capture detailed information: diagnoses, scores, vitals, surgical plans, medications, counselling notes, and follow-ups. This richness is valuable but comes with practical difficulties:

- Typing and navigation slow down consultations.
- Computers are often shared among many providers.
- Connectivity may be unreliable, so offline support is essential.
- Many clinicians prefer speaking to typing, especially during busy patient flows.
- After-hours documentation becomes the norm in many hospitals.

The result is a familiar pattern: incomplete forms, delayed entry, and reduced data quality. The clinical workflow demands something more natural.

2. Voice Agents in 2025: A Leap Beyond Transcription

Voice technology in 2025 is different from the early dictation systems we all remember. Modern voice agents are powered by real-time speech recognition, small and medium LLMs, and streaming pipelines that understand context rather than simply transcribing text.

A clinician can say:

"The child has left-sided clubfoot, Pirani score four point five. Started Ponseti casting today. Follow up next Thursday."

The system not only transcribes the speech but extracts structured concepts:

- Diagnosis: left clubfoot
- Score: Pirani = 4.5
- Procedure: Ponseti casting
- Follow-up date: next Thursday

This shift means voice can finally act as a true interface, not an afterthought.

3. Pipecat: The Backbone of a Real-Time Voice Pipeline

To build the voice agent, we used Pipecat, a modern framework designed for constructing real-time AI voice pipelines. Pipecat was a good fit because:

- It composes STT → LLM → business logic → TTS pipelines elegantly.
- It supports low-latency streaming interactions.
- It works with both open-source and commercial components.
- It can run on-premise without relying on external cloud services.
- It handles accents and noisy environments better than older stacks.
- It aligns well with the open, modular philosophy of Bahmni.

Most importantly for low-resource deployments, Pipecat can be fully self-hosted. Hospitals running Bahmni on local networks can run the entire pipeline (STT, LLM, form extraction, and TTS) on their own machines, ensuring data privacy and high availability even during outages.

4. STT and TTS in 2025: A Solved Problem with Great Options

The biggest enabler of a reliable voice agent today is the maturity of speech-to-text and text-to-speech technologies.

Open-source STT options:

- Whisper / Whisper-turbo models
- Vosk
- NVIDIA NeMo ASR

Commercial STT options (if bandwidth permits):

- Deepgram
- AssemblyAI
- Azure Speech
- Google Speech

These engines perform well even with African and South Asian accents, which historically were difficult for automated speech systems.

TTS options:

- Coqui TTS (open source)
- NeMo TTS
- Azure or Google TTS (for cloud setups)

The critical point is that either path, open source or commercial, works reliably. Hospitals choosing fully offline deployments can stay 100% self-contained using Whisper + Coqui. Those with stable bandwidth can use cloud engines to reduce local compute load.

STT and TTS are no longer the bottleneck. That lets us focus on the meaningful piece: structuring medical data.

5. Making Bahmni Work with a Voice Agent, Without Changing Bahmni

One of the design goals was simple: do not modify Bahmni, do not modify OpenMRS concepts, and do not redesign existing forms. The voice agent should work as a new input layer, nothing more.

Here's the architecture in plain terms:

- Clinician speaks: Pipecat captures the audio stream.
- Speech-to-text (STT): converts the speech into text, on-prem or in the cloud.
- LLM extraction: a model parses the text and extracts structured fields tied to OpenMRS concepts, for example {"diagnosis": "Left clubfoot", "pirani_score": 4.5, "plan": "Ponseti casting"}.
- Mapping to Bahmni concepts: a simple mapping layer turns extracted values into Obs (observations) and form properties.
- Bahmni REST API: the voice agent sends the structured JSON to Bahmni's existing API endpoints, and Bahmni treats it exactly as if a user had filled the form manually.
- Optional confirmation step: the agent speaks back: "I recorded left clubfoot with a Pirani score of 4.5. Should I submit the form?"

The two sketches below show how these steps can fit together in code.
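First, a minimal sketch of how the capture → transcription → extraction chain might be composed with Pipecat, assuming a fully offline deployment with local audio and Whisper. The module paths and service names follow Pipecat's layout at the time of writing and should be checked against the current docs; FormFillProcessor and extract_and_submit (defined in the next sketch) are our own illustrative names, not Pipecat APIs.

```python
# Minimal sketch: microphone -> Whisper STT -> form-filling processor.
# Pipecat import paths here match the docs at the time of writing; verify
# them against your installed version, as the framework evolves quickly.
import asyncio

from pipecat.frames.frames import TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport


class FormFillProcessor(FrameProcessor):
    """Hypothetical processor: hands each finished transcript to our
    extraction-and-submission step (see the next sketch)."""

    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            # extract_and_submit() is our own code, not a Pipecat API;
            # run it off the event loop since it makes blocking HTTP calls.
            await asyncio.to_thread(extract_and_submit, frame.text)
        await self.push_frame(frame, direction)


async def main():
    transport = LocalAudioTransport(
        TransportParams(audio_in_enabled=True, audio_out_enabled=True)
    )
    stt = WhisperSTTService()  # runs fully offline; pick a model size for your hardware

    pipeline = Pipeline([
        transport.input(),    # audio from the consultation room
        stt,                  # speech -> text
        FormFillProcessor(),  # text -> structured Bahmni observations
        transport.output(),   # would speak confirmations if a TTS service were added
    ])

    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main())
```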
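The extraction and submission step can stay equally small. The sketch below assumes an OpenAI-compatible chat endpoint for a locally served LLM and Bahmni's bahmnicore/bahmniencounter REST resource; the host, credentials, concept UUIDs, and exact payload fields are placeholders that must be matched to your installation before any of this will run.

```python
# Sketch of extract_and_submit() from the previous example. Assumptions:
# - an OpenAI-compatible LLM endpoint is reachable (local or remote);
# - the concept UUIDs and the encounter payload shape match your Bahmni
#   configuration (verify both; they vary across versions and dictionaries).
import json

import requests

BAHMNI_HOST = "https://bahmni.example.org"             # placeholder host
LLM_URL = "http://localhost:8000/v1/chat/completions"  # e.g. a local vLLM/llama.cpp server
PATIENT_UUID = "patient-uuid-from-active-session"      # supplied by Bahmni context in practice

# Placeholder mapping from extracted field names to OpenMRS concept UUIDs.
CONCEPT_UUIDS = {
    "diagnosis": "uuid-of-diagnosis-concept",
    "pirani_score": "uuid-of-pirani-score-concept",
    "plan": "uuid-of-treatment-plan-concept",
}

PROMPT = (
    "Extract diagnosis, pirani_score and plan from the clinician's words. "
    "Reply with JSON only, using exactly those keys; omit keys you cannot fill."
)


def extract_fields(transcript: str) -> dict:
    """The LLM turns free speech into the structured fields we know how to map."""
    resp = requests.post(LLM_URL, json={
        "model": "local-model",
        "messages": [
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": transcript},
        ],
    }, timeout=60)
    resp.raise_for_status()
    return json.loads(resp.json()["choices"][0]["message"]["content"])


def build_encounter(fields: dict) -> dict:
    """Map extracted values onto a Bahmni encounter payload, one obs per field."""
    return {
        "patientUuid": PATIENT_UUID,
        "encounterType": "Consultation",  # match your configuration
        "visitType": "OPD",
        "observations": [
            {"concept": {"uuid": CONCEPT_UUIDS[k]}, "value": v}
            for k, v in fields.items() if k in CONCEPT_UUIDS
        ],
    }


def extract_and_submit(transcript: str) -> None:
    """Bahmni handles this POST exactly as if a user had filled the form."""
    payload = build_encounter(extract_fields(transcript))
    session = requests.Session()
    session.auth = ("voice-agent-user", "change-me")  # placeholder credentials
    session.post(
        f"{BAHMNI_HOST}/openmrs/ws/rest/v1/bahmnicore/bahmniencounter",
        json=payload,
    ).raise_for_status()
```

In a real deployment, the optional confirmation step sits between building the payload and the POST: the agent reads the mapped values back over TTS and submits only once the clinician agrees.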
There are no changes to:

- Bahmni UI
- Bahmni ERP
- OpenMRS core
- Concept dictionaries
- Existing workflows

This preserves maintainability across upgrades and keeps the system community-friendly.

6. Demo Video

This clip illustrates the end-to-end workflow: a clinician speaks naturally, the agent extracts the information, and the encounter form fills itself instantly, all through normal Bahmni APIs.

7. Open Source and Community Friendly

The entire approach stays true to Bahmni's and OpenMRS's ethos:

- Uses open frameworks (Pipecat, Whisper, NeMo, Coqui)
- Works with on-prem Bahmni installations
- Modular architecture that others can extend
- Easy to plug in different STT or LLM providers
- Localizable for different languages
- Clean separation from the Bahmni codebase
- Encourages experimentation in the global digital health community

Hospital implementers can fork, adapt, and evolve the agent for their own clinical forms or languages without needing to alter the EMR.

8. Why This Matters for Low-Resource Settings

Voice-based documentation can meaningfully improve workflows in environments where:

- Computer access is limited
- Clinicians handle long queues
- Typing slows down patient care
- Power and connectivity fluctuate
- Staff availability is stretched thin
- Local languages and accents vary

By reducing the time spent navigating forms, clinicians can focus more on the child sitting in front of them. By improving completeness and consistency, hospitals gain better data for clinical decisions and reporting. By keeping everything offline-capable and open source, the solution remains practical for real-world deployments.

This is not about replacing existing Bahmni features. It's about removing friction and letting clinicians document naturally.

If the community finds this valuable, we hope