
Building a Voice-Based Form Filling Assistant for Bahmni in Low-Resource Settings

Electronic Medical Records bring structure, consistency, and continuity to patient care. In low-resource hospitals, tools like Bahmni and OpenMRS have become essential because they are open, flexible, and can run reliably on constrained infrastructure. But one challenge persists everywhere: entering data into large clinical forms takes time.

Clinicians in Malawi, Zimbabwe, Ethiopia, and other low-connectivity environments often work with high patient volumes, shared computers, intermittent power, and limited time for documentation. Even experienced users find it slow to navigate multi-page forms while focusing on the patient.

Over the past year, voice technology has finally matured enough to make an alternative possible:
Speak naturally → AI fills the form.
This post explains how we built a voice agent that acts as a new UI layer for Bahmni, without changing Bahmni itself.


1. The Problem: Documentation Burden in Low-Resource Hospitals

Bahmni and OpenMRS offer robust clinical forms that capture detailed information—diagnoses, scores, vitals, surgical plans, medications, counselling notes, and follow-ups. This richness is valuable but comes with practical difficulties:

    • Typing and navigation slow down consultations.

    • Computers are often shared among many providers.

    • Connectivity may be unreliable, so offline support is essential.

    • Many clinicians prefer speaking to typing—especially during busy patient flows.

    • After-hours documentation becomes the norm in many hospitals.

The result is a familiar pattern: incomplete forms, delayed entry, and reduced data quality. The clinical workflow demands something more natural.

2. Voice Agents in 2025 — A Leap Beyond Transcription

Voice technology in 2025 is different from the early dictation systems we all remember. Modern voice agents are powered by real-time speech recognition, small/medium LLMs, and streaming pipelines that understand context rather than simply transcribing text.

A clinician can say:

“The child has left-sided clubfoot, Pirani score four point five. Started Ponseti casting today. Follow up next Thursday.”

The system not only transcribes the speech but extracts structured concepts:

    • Diagnosis: Left clubfoot

    • Score: Pirani = 4.5

    • Procedure: Ponseti casting

    • Follow-up date: Next Thursday

This shift means voice can finally act as a true interface, not an afterthought.
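
Under the hood, that extraction is a constrained LLM call. Here is a minimal sketch, assuming a self-hosted model served through Ollama's REST API; the prompt wording, field names, and model tag are illustrative, not what any particular deployment must use:

    import json

    import requests

    PROMPT = """Extract these fields from the clinical note as JSON:
    diagnosis, pirani_score (number), procedure, follow_up.
    Use null for any field not mentioned. Return JSON only.

    Note: {transcript}"""

    def extract_fields(transcript: str) -> dict:
        # Assumes a local Ollama server; any self-hosted, OpenAI-compatible
        # endpoint works the same way. The model tag is a placeholder.
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llama3.1:8b",
                "prompt": PROMPT.format(transcript=transcript),
                "format": "json",   # ask Ollama to emit valid JSON only
                "stream": False,
            },
            timeout=60,
        )
        resp.raise_for_status()
        return json.loads(resp.json()["response"])

    # extract_fields("The child has left-sided clubfoot, Pirani score four "
    #                "point five. Started Ponseti casting today. Follow up "
    #                "next Thursday.")
    # -> {"diagnosis": "Left clubfoot", "pirani_score": 4.5,
    #     "procedure": "Ponseti casting", "follow_up": "next Thursday"}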

3. Pipecat: The Backbone of a Real-Time Voice Pipeline

To build the Voice Agent, we used Pipecat, a modern framework designed for constructing real-time AI voice pipelines. Pipecat was a good fit because:

    • It composes STT → LLM → business logic → TTS pipelines elegantly

    • It supports low-latency streaming interactions

    • It works with both open-source and commercial components

    • It can run on-premise without relying on external cloud services

    • Paired with modern STT engines, it handles accents and noisy environments better than older stacks

    • It aligns well with the open, modular philosophy of Bahmni

Most importantly for low-resource deployments:
Pipecat can be fully self-hosted.

Hospitals running Bahmni on local networks can run the entire pipeline—STT, LLM, form extraction, and TTS—on their own machines, ensuring data privacy and high availability even during outages.
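
To give a feel for how the pieces compose, here is a rough structural sketch of such a pipeline. It assumes Pipecat's Pipeline, PipelineTask, and PipelineRunner abstractions; exact module paths and service classes shift between Pipecat releases, and the transport, extractor, and TTS stages are left as placeholders to fill in per deployment:

    import asyncio

    # Indicative imports: class locations move between Pipecat releases,
    # so check them against the version you install.
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineTask
    from pipecat.services.whisper import WhisperSTTService

    async def main():
        transport = ...  # a local-audio or WebRTC transport, per deployment
        stt = WhisperSTTService()  # on-prem speech-to-text
        extractor = ...  # custom processor: LLM extraction + Bahmni submission
        tts = ...        # e.g. a Coqui-backed service for spoken confirmations

        # Frames stream left to right: audio in, text, structured data, audio out
        pipeline = Pipeline([
            transport.input(),   # microphone / call audio
            stt,
            extractor,
            tts,
            transport.output(),  # spoken confirmation back to the clinician
        ])

        await PipelineRunner().run(PipelineTask(pipeline))

    if __name__ == "__main__":
        asyncio.run(main())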

4. STT & TTS in 2025 — A Solved Problem with Great Options

The biggest enabler of a reliable voice agent today is the maturity of speech-to-text and text-to-speech technologies.

Open-source STT options:

    • Whisper / Whisper-turbo models

    • Vosk

    • NVIDIA NeMo ASR

Commercial STT options (if bandwidth permits):

    • Deepgram

    • AssemblyAI

    • Azure Speech

    • Google Speech

These engines perform well even with African and South Asian accents, which historically were difficult for automated speech systems.

TTS options:

    • Coqui TTS (open source)

    • NeMo TTS

    • Azure or Google TTS (for cloud setups)

The critical point is that either path—open-source or commercial—works reliably. Hospitals choosing fully offline deployments can stay 100% self-contained using Whisper + Coqui. Those with stable bandwidth can use cloud engines to reduce local compute load.
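
For a fully offline deployment, the open-source path can be as small as the sketch below, using the openai-whisper and Coqui TTS Python packages; the model names and file paths are illustrative choices, not recommendations:

    import whisper              # pip install openai-whisper
    from TTS.api import TTS     # pip install TTS (Coqui)

    # Both models are downloaded once, then run fully offline.
    stt_model = whisper.load_model("small")
    tts_model = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

    # Speech to text
    result = stt_model.transcribe("consultation.wav")
    print(result["text"])

    # Text to speech, e.g. the spoken confirmation step
    tts_model.tts_to_file(
        text="I recorded left clubfoot with a Pirani score of 4.5. "
             "Should I submit the form?",
        file_path="confirmation.wav",
    )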

STT/TTS are no longer the bottleneck. That allows us to focus on the meaningful piece: structuring medical data.

5. Making Bahmni Work with a Voice Agent — Without Changing Bahmni

One of the design goals was simple:

Do not modify Bahmni.
Do not modify OpenMRS concepts.
Do not redesign existing forms.

The voice agent should work as a new input layer, nothing more.

Here’s the architecture in plain terms:

    • Clinician Speaks

      Pipecat captures the audio stream.

    • Speech-to-Text (STT)

      Converts speech into text—on-prem or cloud.

    • LLM Extraction

      A model parses the text and extracts structured fields tied to OpenMRS concepts.

      Example:

      {"diagnosis": "Left clubfoot", "pirani_score": 4.5, "plan": "Ponseti casting"}

    • Mapping to Bahmni Concepts

      A simple mapping layer turns extracted values into Obs (observations) and form properties.

    • Bahmni REST API

      The Voice Agent sends the structured JSON to Bahmni’s existing API endpoints.

      Bahmni treats it exactly as if a user filled the form manually.

    • Optional Confirmation Step

      The agent speaks back:

      “I recorded left clubfoot with a Pirani score of 4.5. Should I submit the form?”

There are no changes to:

    • Bahmni UI

    • Bahmni ERP

    • OpenMRS core

    • Concept dictionaries

    • Existing workflows

This preserves maintainability across upgrades and keeps the system community-friendly.
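
To make the mapping and submission steps concrete, here is a minimal sketch that posts extracted values as individual observations through the standard OpenMRS REST API. The server URL, credentials, and concept UUIDs are placeholders configured per installation, and a production agent would typically bundle observations into a Bahmni encounter rather than posting them one by one:

    from datetime import datetime, timezone

    import requests

    BASE_URL = "https://bahmni.example.org/openmrs/ws/rest/v1"  # placeholder
    AUTH = ("voice-agent", "change-me")                         # placeholder

    # Placeholder mapping from extracted field names to OpenMRS concept UUIDs;
    # in practice this is generated from the installation's concept dictionary.
    CONCEPT_MAP = {
        "diagnosis": "aaaaaaaa-0000-0000-0000-000000000001",
        "pirani_score": "aaaaaaaa-0000-0000-0000-000000000002",
        "plan": "aaaaaaaa-0000-0000-0000-000000000003",
    }

    def post_obs(patient_uuid: str, extracted: dict) -> None:
        """Create one observation per extracted field via the standard
        OpenMRS REST endpoint, just as a manual form submission would."""
        now = datetime.now(timezone.utc).isoformat()
        for field, value in extracted.items():
            if value is None or field not in CONCEPT_MAP:
                continue
            resp = requests.post(
                f"{BASE_URL}/obs",
                auth=AUTH,
                json={
                    "person": patient_uuid,
                    "concept": CONCEPT_MAP[field],
                    "obsDatetime": now,
                    "value": value,
                },
                timeout=30,
            )
            resp.raise_for_status()

    # post_obs(patient_uuid, {"diagnosis": "Left clubfoot",
    #                         "pirani_score": 4.5, "plan": "Ponseti casting"})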

6. Demo Video

This clip illustrates the end-to-end workflow:
A clinician speaks naturally, the agent extracts the information, and the encounter form fills itself instantly—all using normal Bahmni APIs.

[Demo video embed]

7. Open-Source and Community Friendly

The entire approach stays true to Bahmni’s and OpenMRS’s ethos:

    • Uses open frameworks (Pipecat, Whisper, NeMo, Coqui)

    • Works with on-prem Bahmni installations

    • Modular architecture that others can extend

    • Easy to plug in different STT or LLM providers

    • Localizable for different languages

    • Clean separation from Bahmni codebase

    • Encourages experimentation in the global digital health community

Hospital implementers can fork, adapt, and evolve the agent for their own clinical forms or languages without needing to alter the EMR.

8. Why This Matters for Low-Resource Settings

Voice-based documentation can meaningfully improve workflows in environments where:

    • Computer access is limited

    • Clinicians handle long queues

    • Typing slows down patient care

    • Power and connectivity fluctuate

    • Staff availability is stretched thin

    • Local languages and accents vary

By reducing the time spent navigating forms, clinicians can focus more on the child sitting in front of them.
By improving completeness and consistency, hospitals gain better data for clinical decisions and reporting.
By keeping everything offline-capable and open-source, the solution remains practical for real-world deployments.

This is not about replacing existing Bahmni features.
It’s about removing friction and letting clinicians document naturally.

If the community finds this valuable, we hope it sparks new contributions, new workflows, and new ideas around voice interfaces integrated into open-source health systems.


About Us

WeXL builds practical, high-performance AI solutions designed for real-world constraints. Our work spans voice-driven clinical interfaces, low-resource machine learning pipelines, and modular integrations that run reliably in the demanding environments where open-source platforms like Bahmni and OpenMRS are deployed. As engineers and contributors in the public education and health ecosystem, we focus on adding value without adding complexity—augmenting existing systems rather than replacing them. Our approach is straightforward: build what works, keep it interoperable, and make it accessible to teams operating in remote hospitals, limited-bandwidth clinics, and community health settings. We believe technology should meet people where they are, and our tools reflect that philosophy.
