How KriraAI Built AI Clinical Documentation for Healthcare

Divyang Mandani·Jun 16, 2026·14 min read·Insights

A physician at a leading healthcare enterprise was finishing patient notes at 11pm most nights. The clinical staff called it pajama time. Across the network, doctors spent roughly two hours in the electronic health record for every hour of care. That burden was not a minor inconvenience. It was a measurable driver of burnout, coding errors, and lost revenue. This is the engagement where KriraAI rebuilt that reality with AI clinical documentation. The client had data, clinicians, and an Epic EHR. What they lacked was a system that turned spoken encounters into accurate notes. We designed and shipped a production platform that listens, drafts, and grounds every note in clinical evidence. This blog covers the full story. It walks through the problem we solved and the system we built. It covers the architecture, the delivery journey, and the results the client measured after go-live.

The Problem KriraAI Was Called In To Solve

The operational reality at this health system was familiar to anyone in healthcare. Clinicians documented every encounter by hand inside the EHR. They typed during visits, which broke eye contact with patients. They finished the rest after hours, often at home. The documentation workflow was slow, inconsistent, and exhausting.

The data to support better documentation already existed but went unused. Every visit produced rich clinical conversation that was never captured structurally. Older dictation tools transcribed words but understood nothing clinical. Coding teams worked from incomplete notes and guessed at missing details.

The breakdown showed up in a few specific places across the network:

Clinicians spent close to two hours each day on after-hours documentation, which the staff called pajama time.
Dictation produced raw text with no structure, so coders could not trust it without manual review.
First-pass claim quality suffered because notes lacked the detail payers required for clean submission.
Two clinicians seeing similar cases produced very different documentation, which hurt continuity and analytics.

Those gaps carried a direct financial cost. The network's first-pass clean claim rate sat at 82 percent. Roughly one in nine claims was denied on first submission. Each denial triggered manual rework, appeals, and delayed payment. The denial rate alone represented millions in delayed and lost revenue each year.

The human cost was just as serious. Physicians reclaimed almost no time between patients. Documentation scored as the leading source of clinician burnout in internal surveys. Several specialists had already reduced their clinical hours to cope.

The competitive pressure made the status quo impossible to sustain. Larger systems were adopting ambient documentation and winning clinician loyalty. Recruitment suffered when physicians compared documentation workloads across employers. The client needed to reduce clinician documentation burden quickly and at scale. That requirement is what brought our team in.

What KriraAI Built

KriraAI built an ambient AI medical scribe platform for the health system. The system listens to each clinical encounter and produces a note that is ready for signature. It captures the conversation, understands it clinically, and drafts documentation in the EHR. Every note is grounded in evidence and checked for accuracy before review.

From Spoken Encounter to Signed Note

The flow begins when a clinician starts a visit from the EHR. A SMART on FHIR app launches inside the existing Epic workflow. Ambient audio streams to our pipeline as the visit happens. The platform never asks clinicians to change how they talk to patients.

Speech becomes text through a fine-tuned automatic speech recognition model. We adapted Whisper large-v3 on de-identified clinical audio. Speaker diarization separates the clinician, the patient, and any family present. The result is a clean, attributed transcript of the encounter.
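
One way to picture how a transcript becomes "attributed" is to merge speech recognition segments with diarization turns by time overlap. The sketch below is a minimal illustration, assuming both stages emit timestamped spans. The class names, the attribute helper, and the sample utterances are hypothetical and are not taken from the production pipeline.

```python
from dataclasses import dataclass


@dataclass
class AsrSegment:
    start: float  # seconds from the start of the encounter
    end: float
    text: str


@dataclass
class SpeakerTurn:
    start: float
    end: float
    speaker: str  # e.g. "clinician", "patient", "family"


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute(segments, turns):
    """Label each ASR segment with the speaker whose turn overlaps it most."""
    transcript = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg.start, seg.end, t.start, t.end),
            default=None,
        )
        if best is None or overlap(seg.start, seg.end, best.start, best.end) == 0.0:
            speaker = "unknown"  # no diarization turn covers this segment
        else:
            speaker = best.speaker
        transcript.append((speaker, seg.text))
    return transcript


# Hypothetical sample data for illustration only.
segments = [
    AsrSegment(0.0, 2.5, "What brings you in today?"),
    AsrSegment(2.8, 6.0, "I've had chest tightness since Monday."),
]
turns = [
    SpeakerTurn(0.0, 2.6, "clinician"),
    SpeakerTurn(2.6, 6.2, "patient"),
]

attributed = attribute(segments, turns)
for speaker, text in attributed:
    print(f"{speaker}: {text}")
```

Assigning by maximum overlap keeps the merge tolerant of small timestamp disagreements between the two models, and segments with no covering turn are marked unknown rather than guessed.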

A transformer-based clinical encoder then extracts medical meaning. It identifies problems, medications, lab references, and procedures. A domain fine-tuned language model drafts the structured note from that understanding. The model writes in the SOAP format clinicians already expect.

Accuracy is where most documentation tools fail, so we engineered against it. Every drafted note runs through a retrieval-augmented generation layer. That layer grounds clinical and coding statements in approved guidelines and payer policy. A verification model checks each claim in the note against the source transcript.

The finished draft returns to the EHR for clinician sign-off. The clinician reviews, edits if needed, and signs in seconds. The same structured output feeds the coding and documentation integrity workflow. Coders receive accurate, complete notes instead of fragments.

This platform replaced manual typing and blind dictation across the network. It augmented clinicians rather than removing their judgment. KriraAI designed every layer for production scale from the first day. The system now handles roughly 45,000 encounters per month.

Solution Architecture Behind Our AI Clinical Documentation Platform

The architecture behind our AI clinical documentation platform was built for scale, safety, and speed. We deployed everything inside the client's private cloud. No patient data ever left their controlled environment. Every layer was designed to handle real clinical load, not a demo.

[Figure 1: Reference architecture of the ambient documentation platform, from audio capture through EHR write-back.]

Data Ingestion and Pipeline Layer

Data entered the platform through several patterns at once. Ambient audio streamed over a secure channel into Apache Kafka. Change data capture from the EHR database used Debezium into Kafka Connect. Clinical context arrived through FHIR R4 APIs and HL7v2 feeds via Mirth Connect.

Apache Flink handled stream processing for time-sensitive work. It assembled partial transcripts and ran diarization windows. A de-identification stage stripped PHI at ingestion using clinical named entity recognition. Apache Airflow orchestrated batch DAGs for guideline refresh and retraining.

Documents were embedded at ingestion time for later retrieval. We generated clinical embeddings as guidelines and policies arrived. This kept the retrieval index current without nightly full rebuilds. The pipeline normalized everything into FHIR resources for consistency.

The AI and Machine Learning Core

The core combined several specialized models rather than one. Each model handled a distinct part of the documentation task. This separation kept every component accurate and independently improvable. It also let us retrain one model without disturbing the rest.

The machine learning core included these components:

A fine-tuned Whisper large-v3 model transcribed clinical speech at a 6.3 percent word error rate.
A pyannote diarization model separated clinician, patient, and family speakers reliably.
A transformer-based clinical encoder extracted problems, medications, labs, and procedures from the transcript.
A fine-tuned open-weight language model drafted SOAP notes after supervised fine-tuning and direct preference optimization.
A retrieval-augmented generation layer grounded clinical and coding statements in approved guidelines and payer policy.
An entailment-based verification model checked every note claim against the source transcript for faithfulness.

Retrieval-augmented generation in healthcare demands precise grounding. We indexed 2.4 million guideline and payer policy chunks in Qdrant. The collection used an HNSW index for fast approximate nearest neighbor search. Hybrid retrieval combined dense embeddings with BM25, then a cross-encoder reranked the candidates.
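
The hybrid step can be sketched as two ranked candidate lists that are fused and then handed to a reranker. The article does not say how the dense and BM25 results were merged, so the example below assumes reciprocal rank fusion, a common choice. The chunk ids are hypothetical, and the reranker is a placeholder where a cross-encoder would rescore each query and chunk pair.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids; k=60 is the commonly used RRF constant."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def hybrid_retrieve(dense_ranking, bm25_ranking, rerank, top_n=3):
    """Fuse both candidate lists, then let a reranker order the shortlist."""
    shortlist = reciprocal_rank_fusion([dense_ranking, bm25_ranking])[: top_n * 2]
    return rerank(shortlist)[:top_n]


def identity_rerank(candidates):
    """Placeholder reranker that keeps the fused order unchanged."""
    return list(candidates)


# Hypothetical chunk ids as each retriever might return them, best first.
dense = ["guideline-12", "policy-7", "guideline-3", "policy-9"]
bm25 = ["policy-7", "guideline-3", "policy-2", "guideline-12"]

fused = reciprocal_rank_fusion([dense, bm25])
top = hybrid_retrieve(dense, bm25, identity_rerank)
print(fused)
print(top)
```

Rank-based fusion avoids having to put cosine similarities and BM25 scores on a common scale, which is one reason it is often used ahead of a more expensive reranking pass.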

Serving was engineered for low latency at production scale. We served the language model with vLLM and continuous batching. INT4 quantization reduced memory while holding accuracy steady. Ray Serve orchestrated the models across an A100 GPU cluster.

Integration With the Client's EHR

The integration layer connected our AI outputs to the clinician workflow. We built the FHIR EHR integration as a SMART on FHIR application. It launched inside Epic without a separate login. CDS Hooks surfaced documentation suggestions at the point of care.

Service communication used the right protocol for each path. Internal services talked over gRPC for low latency. External contracts used versioned REST and GraphQL through a Kong gateway. Finalized notes were written back to Epic as drafts through webhook triggers.
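
To make "written back as drafts" concrete, here is a minimal sketch of a FHIR R4 DocumentReference that carries a note in preliminary status. It assumes the note travels as a DocumentReference with an inline text attachment, which the article does not confirm. The function name, patient and encounter ids, and note text are hypothetical, and what Epic accepts on write depends on its own API configuration.

```python
import base64
import json


def draft_document_reference(patient_id, encounter_id, note_text):
    """Build a FHIR R4 DocumentReference marked as a preliminary (unsigned) note."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "docStatus": "preliminary",  # stays a draft until the clinician signs
        "type": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "11506-3",
                "display": "Progress note",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "context": {"encounter": [{"reference": f"Encounter/{encounter_id}"}]},
        "content": [{
            "attachment": {
                "contentType": "text/plain",
                # FHIR attachments carry inline data as base64.
                "data": base64.b64encode(note_text.encode("utf-8")).decode("ascii"),
            }
        }],
    }


# Hypothetical identifiers and note text for illustration only.
note = "S: Chest tightness since Monday.\nO: ...\nA: ...\nP: ..."
resource = draft_document_reference("example-patient", "example-encounter", note)
print(json.dumps(resource, indent=2))
```

Keeping docStatus at preliminary is what leaves the final decision with the clinician: the note exists in the chart as a draft and only becomes final on sign-off.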

Event-driven design kept the system responsive under load. Kafka topics carried events between stages without tight coupling. Each stage scaled independently as encounter volume rose. The FHIR EHR integration respected Epic rate limits through async batching.

Monitoring and Observability

We treated monitoring as a first-class part of the platform. Data drift detection used population stability index and KL divergence. We tracked input shifts in specialty mix, audio quality, and vocabulary. Alerts fired when feature distributions moved past defined thresholds.
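
Both drift measures reduce to a few lines once a feature is binned. The sketch below computes population stability index and KL divergence over bin counts. The specialty counts and the 0.2 alert threshold are assumptions for illustration (0.2 is a common rule of thumb, not a value the article states).

```python
import math


def to_proportions(counts, eps=1e-6):
    """Turn bin counts into proportions, flooring at eps so logs stay finite."""
    total = sum(counts)
    raw = [max(c / total, eps) for c in counts]
    norm = sum(raw)
    return [p / norm for p in raw]


def psi(expected_counts, actual_counts):
    """Population stability index: sum((a - e) * ln(a / e)) over bins."""
    e = to_proportions(expected_counts)
    a = to_proportions(actual_counts)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def kl_divergence(p_counts, q_counts):
    """KL(P || Q) in nats: sum(p * ln(p / q)) over bins."""
    p = to_proportions(p_counts)
    q = to_proportions(q_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical encounter counts per specialty: baseline month vs current month.
baseline = [500, 300, 200]
current = [250, 300, 450]

PSI_ALERT = 0.2  # assumed threshold, not taken from the article

score = psi(baseline, current)
print(f"PSI={score:.3f}  KL(current||baseline)={kl_divergence(current, baseline):.3f}")
if score > PSI_ALERT:
    print("drift alert: specialty mix has shifted")
```

PSI is the symmetric sum of the two KL divergences between the distributions, so tracking both gives one direction-free score for alerting and one directional score for diagnosis.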

Model quality was measured continuously, not just at launch. Word error rate ran against held-out gold transcripts each week. Note faithfulness was scored against clinician edits. Coding accuracy was checked against audited ground truth.
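
Word error rate is the word-level edit distance between a gold transcript and the model output, divided by the number of words in the gold transcript. The following is a minimal sketch of that calculation. The two sentences are invented examples, and real evaluation would normally add text normalization for numbers, punctuation, and abbreviations.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word in a ten-word reference.
gold = "patient reports chest tightness since monday denies shortness of breath"
asr = "patient reports chest tightness since monday denies shortness of breadth"

wer = word_error_rate(gold, asr)
print(f"WER: {wer:.1%}")
```

Running the same function over a fixed gold set each week gives a comparable trend line, which matters more for monitoring than any single absolute score.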

Latency and reliability were tracked at the percentile level. We monitored p50, p95, and p99 for end-to-end note generation. The stack used Prometheus, Grafana, and OpenTelemetry tracing. Evidently handled drift reports and MLflow served as the model registry, and automated triggers retrained models when quality crossed thresholds.
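
Percentile tracking is simple to state precisely. The sketch below uses the nearest-rank definition over a list of latency samples. The sample values are hypothetical, and a production setup would more likely derive these figures from histogram buckets in the metrics system than from raw lists.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical end-to-end note generation latencies in seconds.
latencies = [18, 22, 25, 27, 29, 30, 31, 33, 35, 36,
             37, 38, 38, 39, 40, 40, 41, 44, 52, 71]

summary = {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
print(summary)
```

The gap between p50 and p99 in a sample like this is the reason to watch tail percentiles: a mean would hide the slow encounters that clinicians actually notice.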

Security and Compliance

Security was designed for HIPAA from the first architecture review. The entire platform ran inside a private VPC with no public endpoints. All models were self-hosted, so no PHI reached any third-party API. Encryption used TLS 1.3 in transit and AES-256 at rest with KMS-managed keys.

Access control was strict and attribute aware. Role-based access control was combined with attribute-level data masking. Audit logging was written to an immutable append-only store. The deployment met SOC 2 Type II and HITRUST requirements.
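
Role-based access with attribute-level masking can be illustrated as a per-role allowlist of fields applied when a record is read. The roles, field names, and mask value below are hypothetical. The article does not describe the real policy model, so this is only a sketch of the idea.

```python
# Hypothetical roles and field policy; the article does not list the real ones.
ROLE_VISIBLE_FIELDS = {
    "clinician": {"patient_name", "mrn", "note_text", "icd10_codes"},
    "coder": {"mrn", "note_text", "icd10_codes"},
    "ops": {"icd10_codes"},
}

MASK = "***"


def mask_record(record, role):
    """Return a copy of the record with fields the role may not see masked."""
    if role not in ROLE_VISIBLE_FIELDS:
        raise PermissionError(f"unknown role: {role}")
    visible = ROLE_VISIBLE_FIELDS[role]
    return {key: (value if key in visible else MASK) for key, value in record.items()}


# Invented record for illustration only.
record = {
    "patient_name": "Jane Example",
    "mrn": "000123",
    "note_text": "S: Chest tightness since Monday.",
    "icd10_codes": ["R07.89"],
}

for role in ("clinician", "coder", "ops"):
    print(role, mask_record(record, role))
```

Denying by default, so that any field missing from a role's allowlist is masked, means a newly added attribute stays hidden until someone grants access to it.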

Delivery and Clinician Interface

The delivery layer met clinicians where they already worked. The SMART on FHIR app embedded directly in Epic. A mobile companion captured ambient audio in the exam room. Clinicians reviewed, edited, and signed each draft note in seconds.

Separate interfaces served other teams. Coding and documentation integrity staff used a dedicated review dashboard. Operations teams watched a live observability dashboard. Every interface read from the same grounded, structured output.

The Technology Stack and Why We Chose It

Every technology in the stack was chosen for a clear reason. The client ran on AWS, so we built on HIPAA-eligible AWS services. This avoided new vendor risk and kept data inside one trust boundary. GPU workloads ran on EC2 with EKS for orchestration.

The stack was organized by layer:

The cloud layer used HIPAA-eligible AWS services with EKS, chosen to keep data inside one trust boundary.
The streaming layer used Apache Kafka and Apache Flink, chosen because clinical events arrive continuously.
The model and serving layer used fine-tuned open-weight models on vLLM, chosen to keep PHI inside the VPC.
The retrieval layer used Qdrant with HNSW indexing, chosen for fast recall across millions of chunks.
The integration layer used FHIR, CDS Hooks, gRPC, and a Kong gateway, chosen to fit the existing Epic environment.
The monitoring layer used Prometheus, Grafana, MLflow, and Evidently, chosen for full visibility into model and system health.
The security layer used a private VPC, KMS encryption, and HashiCorp Vault, chosen to meet HIPAA and HITRUST.

We chose self-hosted open-weight models over hosted APIs deliberately. A hosted clinical API would have sent PHI outside the client's control. Self-hosting kept all inference inside the private VPC. It also let us fine-tune freely on de-identified clinical data.

Each choice matched the client's scale, constraints, and existing environment. We avoided tools that would have added operational burden. The result was a stack the client's own team could run. KriraAI built it to be owned, not just delivered.

How We Delivered It: The Implementation Journey

We delivered this platform in clear phases over roughly nine months. Each phase had defined exit criteria before the next began. KriraAI runs every engagement with this delivery discipline. It keeps complex AI work predictable for the client.

The delivery moved through seven phases:

Discovery and requirements ran for four weeks, including clinician shadowing and baseline metric capture.
Architecture design ran for three weeks, covering security review and the FHIR integration blueprint.
Data foundation and model development ran for twelve weeks, building the de-identification pipeline and fine-tuned models.
Integration and testing ran in parallel with model development, validating the SMART on FHIR app in an Epic sandbox.
Clinical validation ran for six weeks, with physician reviewers scoring note faithfulness in shadow mode.
Phased deployment ran for eight weeks, starting with internal medicine before expanding to other specialties.
Handover included runbooks, training, and full MLOps transfer to the client's own team.

The Challenges We Worked Through

No honest delivery story is free of hard problems. We hit several and solved each one in the open. The hardest issues were technical, and each forced a real design change.

Ambient audio quality was the first real challenge. Busy clinics produced overlapping speech and background noise. We added voice activity detection and acoustic preprocessing. We also enrolled clinician voice profiles to sharpen diarization.

Note faithfulness was the challenge that mattered most clinically. Early drafts occasionally stated details the transcript did not support. We added the entailment-based verification layer to catch this. Unfaithful statements dropped sharply and consistency reached 98.4 percent.
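
The control flow around such a verifier can be sketched independently of the model that does the scoring. In the example below, every note claim is scored against the transcript, unsupported claims are flagged, and a consistency ratio is reported. The token-overlap scorer is only a placeholder standing in for an entailment model. It is not entailment and would miss negations. The function names, threshold, and sample text are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a sentence, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_support(claim, transcript_sentences):
    """Placeholder scorer: best share of claim tokens found in one transcript sentence.

    This is NOT entailment; a real deployment would call an NLI model here.
    """
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return max(
        (len(claim_tokens & tokens(s)) / len(claim_tokens) for s in transcript_sentences),
        default=0.0,
    )


def verify_note(claims, transcript_sentences, scorer=overlap_support, threshold=0.8):
    """Split note claims into supported and flagged, and report the supported share."""
    supported, flagged = [], []
    for claim in claims:
        score = scorer(claim, transcript_sentences)
        (supported if score >= threshold else flagged).append(claim)
    consistency = len(supported) / len(claims) if claims else 1.0
    return supported, flagged, consistency


# Invented transcript and draft-note claims for illustration only.
transcript = [
    "I have had chest tightness since Monday.",
    "I take lisinopril 10 mg every morning.",
]
claims = [
    "Chest tightness since Monday.",
    "Takes lisinopril 10 mg every morning.",
    "Reports shortness of breath on exertion.",
]

supported, flagged, consistency = verify_note(claims, transcript)
print(f"consistency: {consistency:.1%}")
print("flagged for review:", flagged)
```

Because the scorer is passed in as a parameter, the same loop can run with a real entailment model in production and a cheap stand-in during tests.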

Specialty generalization needed direct attention. A model tuned on primary care underperformed in cardiology. We trained specialty-specific LoRA adapters on top of the base model. Performance recovered across every specialty we rolled out.
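
A quick calculation shows why per-specialty adapters are cheap next to a full fine-tune. LoRA freezes a weight matrix and trains two small matrices whose product has the same shape. The layer dimensions and rank below are hypothetical, since the article does not state the model's size or the adapter configuration.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the low-rank pair B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical projection layer and adapter rank, for illustration only.
D_OUT, D_IN, RANK = 4096, 4096, 16

full = D_OUT * D_IN
adapter = lora_trainable_params(D_OUT, D_IN, RANK)
fraction = adapter / full

print(f"full layer: {full:,} parameters")
print(f"LoRA adapter: {adapter:,} parameters ({fraction:.2%} of the layer)")
```

At that ratio, a cardiology adapter and a primary care adapter can share one frozen base model and be swapped at serving time without duplicating the full weights.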

EHR integration surfaced practical friction. Epic FHIR rate limits slowed write-back at peak hours. We moved write-back to async batching with retries. The FHIR EHR integration then held steady under full clinical load.
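
The write-back pattern described here, batching with retries, can be sketched with asyncio. This is an assumption-laden illustration: the send_batch callable, batch size, retry count, and backoff schedule are hypothetical, and the failing sender merely imitates a rate-limited API rather than any real Epic behavior.

```python
import asyncio


async def write_back(notes, send_batch, batch_size=10, max_retries=3, base_delay=0.5):
    """Send notes in fixed-size batches, retrying each failed batch with backoff."""
    failed = []
    for start in range(0, len(notes), batch_size):
        batch = notes[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                await send_batch(batch)
                break
            except ConnectionError:
                if attempt == max_retries:
                    failed.extend(batch)  # keep for later replay
                else:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    await asyncio.sleep(base_delay * 2 ** attempt)
    return failed


# Hypothetical sender that rejects the first call, as a rate-limited API might.
calls = {"count": 0}
delivered = []


async def flaky_send(batch):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("429 Too Many Requests")
    delivered.extend(batch)


notes = [f"note-{i}" for i in range(5)]
failed = asyncio.run(write_back(notes, flaky_send, batch_size=2, base_delay=0.01))
print(f"delivered={len(delivered)} failed={len(failed)} calls={calls['count']}")
```

Returning the batches that exhausted their retries, instead of raising, lets the caller park them on a queue for replay so a peak-hour slowdown never loses a note.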

The Results the Client Achieved

The results were measured over the first six months after go-live. They were confirmed against the client's own baselines. Every number below reflects production performance, not a pilot. The platform changed both clinician experience and financial outcomes.

The measured outcomes included the following:

Documentation time per encounter fell from 16 minutes to under 4 minutes, a 75 percent reduction.
After-hours documentation, the pajama time clinicians dreaded, dropped by 62 percent.
First-pass clean claim rate rose from 82 percent to 94 percent.
The claim denial rate fell by 38 percent on first submission.
Physicians reclaimed close to 1.5 hours of their working day.
Note factual consistency reached 98.4 percent through the verification layer.
The platform processed roughly 45,000 encounters per month at a p95 latency of 41 seconds.
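
The headline percentages can be checked with simple arithmetic. The snippet below reproduces the documentation time reduction and separates the clean claim improvement into percentage points and relative change. The daily encounter figure assumes a 30-day month, which is an illustration and not a number from the client.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100


# Documentation time per encounter, in minutes (figures from the list above).
doc_time_cut = percent_reduction(16, 4)

# First-pass clean claim rate: percentage points versus relative change.
clean_claim_points = 94 - 82
clean_claim_relative = (94 - 82) / 82 * 100

# Assumes a 30-day month; the article reports only the monthly volume.
encounters_per_day = 45_000 / 30

print(f"documentation time reduction: {doc_time_cut:.0f}%")
print(f"clean claim rate: +{clean_claim_points} points ({clean_claim_relative:.1f}% relative)")
print(f"approx. encounters per day: {encounters_per_day:.0f}")
```

Because the measured time was under 4 minutes, 75 percent is a lower bound on the reduction rather than an exact figure.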

The financial impact followed directly from these gains. Fewer denials meant faster, fuller reimbursement. Better documentation captured the true complexity of each case. The client reached payback on the engagement within nine months.

The human impact was just as clear. Clinician burnout scores tied to documentation improved measurably. Several physicians restored clinical hours they had cut. The effort to reduce clinician documentation burden delivered exactly what the client needed.

What This Architecture Makes Possible Next

We built this platform to grow, not to sit still. The architecture scales horizontally as encounter volume rises. Kafka and Flink absorb higher throughput by adding partitions and workers. The GPU serving tier scales out with Ray Serve as demand grows.

New use cases sit naturally on the same foundation. The grounded retrieval layer already understands clinical guidelines and payer policy. Adding documentation for new specialties needs adapters, not a rebuild. The client can extend into prior authorization drafting on the same stack.

A platform like this future-proofs documentation because every layer is modular. The data pipeline, model core, and retrieval index evolve independently. New models can replace old ones without touching integration. That separation is what lets the system improve for years.

The client's roadmap now spans the next two to three years. Planned additions include coding automation and quality measure capture. Real-time clinical decision support sits on the same retrieval foundation. KriraAI designed the architecture so each step reuses what already exists.

Other healthcare organizations can apply the same principles directly. Self-host models to keep PHI controlled and compliant. Ground every generated output in approved clinical evidence. Treat monitoring and faithfulness as core engineering, not afterthoughts.

Conclusion

This engagement produced three insights worth carrying forward. The technical insight is that faithfulness must be engineered, not assumed. Grounding generation in evidence and verifying every claim is what made the notes safe. AI clinical documentation only earns clinical trust when it is provably accurate.

The operational insight is that adoption follows workflow fit. Clinicians embraced the platform because it lived inside Epic and changed nothing about how they talk to patients. The strategic insight is that a modular architecture compounds in value. Each new use case reused the same foundation instead of starting over.

KriraAI brings this same engineering rigor and delivery discipline to every client. We design production systems, not proofs of concept, and we hand them over fully owned. We treat security, faithfulness, and monitoring as core parts of the build. That is how we delivered measurable results in a domain where mistakes are not acceptable.

If you are facing a costly, data-rich problem that AI should be solving, bring it to KriraAI. We will design the architecture, build the system, and deliver outcomes you can measure. The conversation starts with your problem, not a product pitch.

FAQs

How does ambient AI clinical documentation work?

Ambient AI clinical documentation works by listening to the natural clinician and patient conversation. It turns that conversation into a structured note. In the platform KriraAI built, ambient audio streams to a fine-tuned speech recognition model that transcribes clinical speech accurately. A diarization model separates each speaker, and a clinical encoder extracts the medical meaning. A fine-tuned language model then drafts the note in SOAP format. A retrieval and verification layer grounds every clinical statement in approved guidelines. It also checks each statement against the transcript. Only then does a clinician review and sign the final note.

Is AI clinical documentation HIPAA compliant?

AI clinical documentation can be fully HIPAA compliant when it is engineered correctly. The platform KriraAI delivered was built to that standard. The entire system ran inside the client's private VPC with no public endpoints, so patient data never left their environment. All models were self-hosted, which meant no protected health information was sent to any third-party API. Encryption protected data in transit and at rest. Access used role-based controls with attribute-level masking. Audit logs were immutable. The deployment met SOC 2 Type II and HITRUST requirements as well.

How much can AI reduce clinician documentation time?

AI can reduce clinician documentation time dramatically, and in this engagement the reduction was 75 percent per encounter. Documentation time fell from roughly 16 minutes per visit to under 4 minutes. That gain followed the go-live of the ambient AI medical scribe. After-hours documentation, often called pajama time, dropped by 62 percent across the network. Physicians reclaimed close to 1.5 hours of their working day. The exact savings depend on specialty and baseline workflow. A well-grounded ambient platform still removes most of the manual typing burden. Clinicians stay in full control of the final note.

How does AI clinical documentation integrate with the EHR?

AI clinical documentation integrates with the EHR through standards-based interfaces rather than fragile custom connections. The FHIR EHR integration KriraAI built used a SMART on FHIR application. That app launched directly inside Epic with no separate login. CDS Hooks surfaced suggestions at the point of care. Finalized notes were written back to the EHR as drafts through webhook triggers. Internal services communicated over gRPC for low latency, while external contracts used versioned REST and GraphQL. This standards-based approach kept the integration stable. It respected Epic rate limits. It also stayed straightforward for the client to maintain.

Does AI clinical documentation reduce claim denials?

Yes, AI clinical documentation reduces claim denials by improving the accuracy and completeness of the clinical record. In this engagement, the first-pass clean claim rate rose from 82 percent to 94 percent. The denial rate fell by 38 percent on first submission. The platform achieves this by grounding coding-relevant statements in payer policy through retrieval-augmented generation. Complete, accurate notes give coding teams the detail they need on the first pass. Fewer denials mean faster reimbursement, less manual rework, and a record that reflects the true complexity of each patient case.

Divyang Mandani

Founder & CEO

Divyang Mandani is the CEO of KriraAI, driving innovative AI and IT solutions with a focus on transformative technology, ethical AI, and impactful digital strategies for businesses worldwide.
