Offline Speech Recognition (ASR)
An evaluation of offline speech recognition built on the Vosk/Kaldi framework, measuring the impact of environmental noise on Word Error Rate (WER) through a three-stage noise-reduction pipeline and Levenshtein-distance calculations.
Initialization
- Click 'Initialize Python Runtime' to load Pyodide and scipy
- Wait for the status to show 'Ready' before proceeding

Execution
- ASR analysis runs automatically after initialization
- Click 'Re-run Analysis' to execute noise reduction and WER calculation again
- View real-time updates in the status indicator

Layout
- Python Implementation Section: view the complete syntax-highlighted code (5 analysis steps)
- Analysis Results Section: interactive output with WER metrics, SNR charts, and recognition results
import numpy as np
from scipy import signal
import json
# ============================================================
# STEP 1: Generate Synthetic Airport Audio with Noise
# ============================================================
# Simulating 5 airport voice queries with varying noise levels
np.random.seed(42)
# Sample airport queries (word counts for WER calculation)
queries = [
    {"text": "where is the check in desk", "words": 6},
    {"text": "what time is my plane", "words": 5},
    {"text": "can you help me find my parents", "words": 7},
    {"text": "where can i check my suitcase", "words": 6},
    {"text": "please direct me to gate 23", "words": 6}
]
# Simulate noise levels (Signal-to-Noise Ratio in dB)
snr_levels = [20, 15, 10, 5, 0] # Clean to very noisy
# ============================================================
# STEP 2: Audio Noise Reduction Pipeline
# ============================================================
def apply_noise_reduction(signal_data, sample_rate=16000):
    """
    Three-stage noise reduction pipeline for airport audio.
    """
    # Stage 1: High-pass filter (remove low-frequency rumble)
    # Cutoff: 80 Hz (HVAC, distant engines)
    sos_hp = signal.butter(4, 80, btype='highpass',
                           fs=sample_rate, output='sos')
    filtered = signal.sosfilt(sos_hp, signal_data)
    # Stage 2: Normalize dynamic range
    max_val = np.max(np.abs(filtered))
    if max_val > 0:
        filtered = filtered / max_val
    # Stage 3: Low-pass filter (remove high-frequency hiss)
    # Cutoff: 3800 Hz (human speech energy is mostly below 4 kHz)
    sos_lp = signal.butter(4, 3800, btype='lowpass',
                           fs=sample_rate, output='sos')
    filtered = signal.sosfilt(sos_lp, filtered)
    return filtered
# ============================================================
# STEP 3: Word Error Rate (WER) Calculation
# ============================================================
def levenshtein_distance(ref, hyp):
    """
    Calculate the word-level edit distance between reference and hypothesis.
    """
    ref_words = ref.split()
    hyp_words = hyp.split()
    m, n = len(ref_words), len(hyp_words)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Initialize base cases
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    # Fill DP table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref_words[i-1] == hyp_words[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(
                    dp[i-1][j],    # Deletion
                    dp[i][j-1],    # Insertion
                    dp[i-1][j-1]   # Substitution
                )
    return dp[m][n]

def calculate_wer(reference, hypothesis):
    """
    Word Error Rate = (Edits / Reference Words) * 100
    """
    distance = levenshtein_distance(reference, hypothesis)
    ref_words = len(reference.split())
    wer = (distance / ref_words) * 100 if ref_words > 0 else 0
    return round(wer, 1)
# ============================================================
# STEP 4: Simulate ASR Recognition with Noise Impact
# ============================================================
recognition_results = []
for i, query in enumerate(queries):
    ref = query["text"]
    snr = snr_levels[i]
    # Simulate recognition errors based on SNR
    if snr >= 15:
        hyp = ref  # Perfect recognition
        wer = 0.0
    elif snr >= 10:
        # Corrupt one word (simulated misrecognition)
        words = ref.split()
        if len(words) > 2:
            words[2] = words[2][:-1] + "d" if words[2].endswith("e") else words[2] + "s"
        hyp = " ".join(words)
        wer = calculate_wer(ref, hyp)
    elif snr >= 5:
        hyp = ref.replace("please", "police").replace("where", "were")
        wer = calculate_wer(ref, hyp)
    else:
        # Severe degradation: every other word is lost
        words = ref.split()
        hyp = " ".join(words[::2])
        wer = calculate_wer(ref, hyp)
    recognition_results.append({
        "query": ref,
        "snr_db": snr,
        "hypothesis": hyp,
        "wer": wer,
        "word_count": query["words"]
    })
# ============================================================
# STEP 5: Performance Analysis
# ============================================================
total_wer = sum(r["wer"] for r in recognition_results)
avg_wer = round(total_wer / len(recognition_results), 1)
accuracy = round(100 - avg_wer, 1)
clean_results = [r for r in recognition_results if r["snr_db"] >= 15]
noisy_results = [r for r in recognition_results if r["snr_db"] < 10]
clean_wer = sum(r["wer"] for r in clean_results) / len(clean_results) if clean_results else 0
noisy_wer = sum(r["wer"] for r in noisy_results) / len(noisy_results) if noisy_results else 0
# Export results
results = {
    "summary": {
        "avg_wer": avg_wer,
        "accuracy": accuracy,
        "clean_wer": round(clean_wer, 1),
        "noisy_wer": round(noisy_wer, 1),
        "total_queries": len(queries)
    },
    "queries": recognition_results,
    "snr_levels": snr_levels,
    "wer_by_snr": [r["wer"] for r in recognition_results]
}
print(f"RESULTS_JSON:{json.dumps(results)}")
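Step 1 above labels the audio as synthetic, but the script works purely with text queries and SNR labels. For readers who want actual waveforms, here is a minimal, self-contained sketch of mixing white noise into a clean signal at a target SNR (the function name `mix_at_snr` is ours, not part of the lab code):

```python
import numpy as np

def mix_at_snr(clean, snr_db, rng=None):
    """Add white noise so the mixture has approximately `snr_db` dB SNR."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.standard_normal(len(clean))
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so 10*log10(signal_power / scaled_noise_power) == snr_db
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Example: a 1 kHz tone at 16 kHz sampling, degraded to 5 dB SNR
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 1000 * t)
noisy = mix_at_snr(clean, snr_db=5)
```

Feeding such mixtures through `apply_noise_reduction` before recognition would let the simulation above operate on real audio rather than SNR labels alone.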
Listening Through the Noise
Imagine trying to talk to a kiosk in a busy airport. Announcements are blaring and jets are taking off. This lab tests if a computer can still understand you.
The Challenge
Our ears naturally focus on one voice and tune out the rest (the "cocktail party problem"). Computers find this incredibly hard. This demo shows that effective noise reduction can be the difference between "Flight cancelled" and "Flight confirmed."
The “Audio Sunglasses”
To help the computer focus, we filter the sound—like putting on sunglasses to cut through glare.
- Rumble Remover (<80Hz): Blocks low sounds like HVAC hums.
- Volume Knob (Dynamic Range): Boosts the voice so it doesn’t get lost.
- Hiss Eraser (>3800Hz): Cuts high-pitched sharp noises.
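The "rumble remover" is easy to check numerically. The sketch below (our own illustration, using the same scipy Butterworth design as the lab's Stage 1) mixes a 50 Hz rumble tone with a 1 kHz speech-band tone and measures how much the rumble-to-speech ratio shrinks after filtering:

```python
import numpy as np
from scipy import signal

fs = 16000
t = np.arange(fs) / fs  # one second of audio
mix = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 1000 * t)

# Same design as the lab's Stage 1: 4th-order high-pass at 80 Hz
sos_hp = signal.butter(4, 80, btype='highpass', fs=fs, output='sos')
out = signal.sosfilt(sos_hp, mix)

# With 1 s of audio at 16 kHz, rfft bin k corresponds to k Hz
spec_in = np.abs(np.fft.rfft(mix))
spec_out = np.abs(np.fft.rfft(out))
ratio_before = spec_in[50] / spec_in[1000]
ratio_after = spec_out[50] / spec_out[1000]
# A 4th-order Butterworth at 80 Hz attenuates 50 Hz by roughly 16 dB,
# while the 1 kHz speech tone passes essentially untouched.
```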
Measuring Success
We count the mistakes using Word Error Rate (WER).
- 0% WER: Perfect understanding.
- 50% WER: Roughly half the words need fixing (and because insertions count too, WER can even exceed 100%).
- The Math: We use the Levenshtein distance to find the minimum number of word-level edits (substitutions, insertions, deletions) needed to turn the computer's guess into the true sentence.
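A worked example (ours, using the same word-level Levenshtein idea as the lab code): if the kiosk hears "police direct me to gate 23" instead of "please direct me to gate 23", that is one substitution across six reference words, i.e. about 16.7% WER.

```python
def wer(reference, hypothesis):
    """Word Error Rate: minimum word-level edits divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return 100.0 * dp[len(r)][len(h)] / len(r)

print(round(wer("please direct me to gate 23",
                "police direct me to gate 23"), 1))  # prints 16.7
```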
Privacy First
This runs entirely in your browser using Vosk & Kaldi. No audio is ever sent to the cloud. This is critical for privacy and reliability—an airport kiosk must work even if the internet goes down.
References
[1] Povey, D., et al. (2011). “The Kaldi Speech Recognition Toolkit.” IEEE Workshop on Automatic Speech Recognition and Understanding. (The foundation of our offline engine).
[2] Levenshtein, V.I. (1966). “Binary codes capable of correcting deletions, insertions, and reversals.” Soviet Physics Doklady. (The algorithm we use to calculate error rates).