LSB Audio Steganography | Yosef Shulman

OPERATIONAL MANUAL

Initialization
• Click 'Initialize Python Runtime' to load Pyodide and NumPy
• Wait for status to show 'Ready' before proceeding

Execution
• Steganography analysis runs automatically after initialization
• Click 'Re-run Steganography' to execute LSB embedding and extraction again
• View real-time updates in the status indicator

Layout
• Python Implementation Section: View complete syntax-highlighted code (5 algorithm steps)
• Analysis Results Section: Interactive output with embedding results, SNR metrics, and LSB distribution charts

Python Implementation

This Python code runs in your browser using Pyodide (Python compiled to WebAssembly).

import numpy as np
import json

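def text_to_binary(message):
    """Convert text message to binary string (8 bits per UTF-8 byte)."""
    # Encode to UTF-8 first: ord() exceeds 8 bits for code points above 255,
    # which would break the fixed 8-bit framing that binary_to_text expects.
    return ''.join(format(byte, '08b') for byte in message.encode('utf-8'))

def binary_to_text(binary_string):
    """Convert binary string (8 bits per UTF-8 byte) back to text."""
    data = bytearray()
    for i in range(0, len(binary_string), 8):
        byte = binary_string[i:i+8]
        if len(byte) == 8:  # ignore a trailing partial byte
            data.append(int(byte, 2))
    # Undecodable bytes become U+FFFD instead of raising an exception
    return data.decode('utf-8', errors='replace')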

def embed_message_lsb(audio_samples, message):
    """
    Embed message into audio using LSB steganography.
    Returns modified audio samples and metadata.
    """
    # Convert message to binary
    message_binary = text_to_binary(message)
    message_length = len(message_binary)
    
    # Check if audio has enough samples
    if message_length > len(audio_samples):
        raise ValueError(f"Message too long. Need {message_length} samples, have {len(audio_samples)}")
    
    # Create copy of audio
    stego_audio = audio_samples.copy()
    
    # Embed message bits into LSB of audio samples
    for i in range(message_length):
        # Get the audio sample as integer
        sample = int(stego_audio[i])
        # Get message bit
        message_bit = int(message_binary[i])
        # Clear LSB and set to message bit
        sample = (sample & ~1) | message_bit
        stego_audio[i] = sample
    
    return stego_audio, message_length

def extract_message_lsb(stego_audio, message_length):
    """
    Extract hidden message from audio using LSB steganography.
    """
    # Extract LSBs from required number of samples
    binary_message = ''
    for i in range(message_length):
        sample = int(stego_audio[i])
        lsb = sample & 1
        binary_message += str(lsb)
    
    # Convert binary to text
    message = binary_to_text(binary_message)
    return message

def calculate_embedding_stats(original_audio, stego_audio, message_length):
    """Calculate statistics about the embedding process."""
    # Calculate SNR (Signal-to-Noise Ratio)
    diff = original_audio[:message_length] - stego_audio[:message_length]
    signal_power = np.mean(original_audio[:message_length].astype(float) ** 2)
    noise_power = np.mean(diff.astype(float) ** 2)
    
    # Handle edge cases for SNR calculation
    if noise_power > 0 and signal_power > 0:
        snr_db = 10 * np.log10(signal_power / noise_power)
        # Cap at reasonable max value
        if np.isinf(snr_db) or snr_db > 200:
            snr_db = 200.0
        elif np.isnan(snr_db):
            snr_db = 0.0
    else:
        snr_db = 200.0  # Perfect embedding (no noise)
    
    # Calculate capacity and usage
    total_capacity_bits = len(original_audio)
    used_bits = message_length
    usage_percent = (used_bits / total_capacity_bits) * 100
    
    # Count modified samples
    modified_samples = np.sum(original_audio[:message_length] != stego_audio[:message_length])
    modification_rate = (modified_samples / message_length) * 100 if message_length > 0 else 0.0
    
    return {
        'snr_db': float(snr_db),
        'capacity_bits': int(total_capacity_bits),
        'used_bits': int(used_bits),
        'usage_percent': float(usage_percent),
        'modified_samples': int(modified_samples),
        'modification_rate': float(modification_rate)
    }

def analyze_bit_distribution(stego_audio, message_length):
    """Analyze the distribution of LSBs in the stego audio."""
    lsbs = []
    for i in range(message_length):
        sample = int(stego_audio[i])
        lsb = sample & 1
        lsbs.append(lsb)
    
    ones = sum(lsbs)
    zeros = len(lsbs) - ones
    
    return {
        'ones': int(ones),
        'zeros': int(zeros),
        'ratio': float(ones / len(lsbs)) if len(lsbs) > 0 else 0
    }

# Demo: Create synthetic audio signal
sample_rate = 16000
duration = 2.0  # seconds
num_samples = int(sample_rate * duration)

# Generate simple sine wave as carrier audio
frequency = 440  # A4 note
t = np.linspace(0, duration, num_samples, endpoint=False)
original_audio = np.sin(2 * np.pi * frequency * t)

# Scale to 16-bit integer range
original_audio = (original_audio * 32767).astype(np.int16)

# Secret message to embed
secret_message = "It always seems impossible until it's done"

# EMBEDDING PHASE
stego_audio, message_length = embed_message_lsb(original_audio, secret_message)

# EXTRACTION PHASE
extracted_message = extract_message_lsb(stego_audio, message_length)

# ANALYSIS
stats = calculate_embedding_stats(original_audio, stego_audio, message_length)
bit_dist = analyze_bit_distribution(stego_audio, message_length)

# Calculate sample differences for visualization
sample_indices = list(range(min(100, message_length)))
original_samples = [int(original_audio[i]) for i in sample_indices]
stego_samples = [int(stego_audio[i]) for i in sample_indices]
differences = [int(stego_audio[i] - original_audio[i]) for i in sample_indices]

# Prepare results
results = {
    'original_message': secret_message,
    'extracted_message': extracted_message,
    'message_length_chars': len(secret_message),
    'message_length_bits': message_length,
    'match': extracted_message == secret_message,
    'stats': stats,
    'bit_distribution': bit_dist,
    'sample_data': {
        'indices': sample_indices,
        'original': original_samples,
        'stego': stego_samples,
        'differences': differences
    }
}

print("RESULTS_JSON:" + json.dumps(results))
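
The listing above passes message_length directly from the embedder to the extractor, which a real receiver would not know. One common way around this is to store the length in the carrier itself. The following is a minimal sketch of that idea, not part of the page's implementation: the names embed_with_header, extract_with_header and HEADER_BITS are hypothetical, the 32-bit header size is an assumption, and plain Python lists stand in for the int16 array.

```python
HEADER_BITS = 32  # assumed header size: payload length in bits, MSB first

def embed_with_header(samples, message):
    """Embed a length header followed by the UTF-8 message bits (hypothetical helper)."""
    payload = ''.join(format(b, '08b') for b in message.encode('utf-8'))
    bits = format(len(payload), f'0{HEADER_BITS}b') + payload
    if len(bits) > len(samples):
        raise ValueError(f"Need {len(bits)} samples, have {len(samples)}")
    stego = [int(s) for s in samples]
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the next header or payload bit
        stego[i] = (stego[i] & ~1) | int(bit)
    return stego

def extract_with_header(stego):
    """Read the header to learn the payload length, then decode the payload."""
    if len(stego) < HEADER_BITS:
        raise ValueError("Carrier is too short to contain a header")
    header = ''.join(str(int(s) & 1) for s in stego[:HEADER_BITS])
    n_bits = int(header, 2)
    if n_bits > len(stego) - HEADER_BITS:
        raise ValueError("Header claims more bits than the carrier holds")
    payload = ''.join(str(int(s) & 1) for s in stego[HEADER_BITS:HEADER_BITS + n_bits])
    usable = n_bits - n_bits % 8  # ignore a trailing partial byte
    data = bytes(int(payload[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode('utf-8', errors='replace')

# Stand-in integer samples; any sequence of integers works here
carrier = [(i * 37) % 2001 - 1000 for i in range(1000)]
stego = embed_with_header(carrier, "hidden")
print(extract_with_header(stego))
```

The header costs 32 samples of capacity, and a carrier that never held a message will usually decode to a nonsensical length, which is why the extractor checks the header against the carrier size.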


Analysis Results



The Art of Invisible Ink

Encryption hides the content of a message. Steganography hides the existence of the message itself. This lab implements Least Significant Bit (LSB) encoding—a digital equivalent of writing in invisible ink.

How It Works

Digital audio is stored as a series of 16-bit numbers (samples).

Original Sample: 10101100 10101100 (Amplitude: 44204)
Modified Sample: 10101100 10101101 (Amplitude: 44205)

Changing the last bit changes the sample's amplitude by one quantization step, 1/65536 of the full 16-bit range. This microscopic change is mathematically retrievable but inaudible in practice (SNR > 60 dB).
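
As a quick check of the arithmetic, the sketch below evaluates the bit pattern from the example as an unsigned integer and then estimates the SNR for a full-scale sine. The SNR part rests on an assumption that is not stated in the text: about half of the carrier samples end up changed by one step, which is what to expect when the message bits are unrelated to the original LSBs.

```python
import math

original = 0b1010110010101100    # the 16-bit pattern from the example
modified = (original & ~1) | 1   # clear the LSB, then set it to message bit 1

print(original, modified)        # 44204 44205 when read as unsigned integers
print((modified - original) / 2**16)  # one step as a fraction of the 16-bit range

# Rough SNR estimate for a sine with peak amplitude 32767.
signal_power = 32767 ** 2 / 2    # mean square of a full-scale sine
noise_power = 0.5                # assumed: half the samples differ by 1, so mean(diff^2) = 0.5
snr_db = 10 * math.log10(signal_power / noise_power)
print(round(snr_db, 1))          # about 90 dB under these assumptions
```

A quieter carrier has less signal power for the same LSB noise, so its SNR is lower than this full-scale estimate.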

The Trade-off: Fragility vs. Capacity

LSB is high-capacity (up to ~5 kB/s at one bit per sample, assuming 44.1 kHz mono audio) and computationally trivial. However, it is extremely fragile.
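
The capacity figure follows directly from the sample rate, since each sample carries one hidden bit. Here is a small sketch of that arithmetic. The function name lsb_capacity_bytes_per_second is hypothetical, and the 44.1 kHz rate is an assumption about where the roughly 5 kB/s figure comes from.

```python
def lsb_capacity_bytes_per_second(sample_rate_hz, channels=1, bits_per_sample=1):
    """Hidden payload rate when bits_per_sample LSBs are used in every sample."""
    return sample_rate_hz * channels * bits_per_sample / 8

print(lsb_capacity_bytes_per_second(44100))  # 5512.5 (CD-rate mono, assumed)
print(lsb_capacity_bytes_per_second(16000))  # 2000.0 (the demo's 16 kHz sample rate)
```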

Weakness: Any lossy compression (MP3, AAC) or volume change destroys the secret bits.
Detection: While inaudible to the ear, it leaves a statistical fingerprint. In the “Analysis Results” panel, watch the LSB distribution chart. Natural audio has an uneven bit distribution; encrypted messages look like perfect static (a 50/50 distribution of 0s and 1s).
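
The fragility described above can be illustrated with a short experiment. This is a sketch under stated assumptions rather than the page's code: it uses plain Python lists instead of NumPy, a random bit stream as the payload, and a 10% volume reduction followed by rounding back to integers as the "processing" step.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Stand-in carrier: 440 Hz sine at 16 kHz, scaled to the 16-bit range
carrier = [round(32767 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(4000)]
bits = [random.randint(0, 1) for _ in carrier]

# Embed one random bit into the LSB of every sample
stego = [(s & ~1) | b for s, b in zip(carrier, bits)]

def bit_error_rate(samples):
    """Fraction of embedded bits that no longer match the samples' LSBs."""
    return sum((s & 1) != b for s, b in zip(samples, bits)) / len(bits)

# Simulate a 10% volume reduction, then re-quantize to integers
quieter = [round(0.9 * s) for s in stego]

print(bit_error_rate(stego))    # 0.0: untouched stego audio decodes cleanly
print(bit_error_rate(quieter))  # a rate near 0.5 means the bits are effectively random
```

An error rate near one half is the worst case for a binary channel, because the recovered bits then carry no information about the message.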

Modern Context

While LSB is the classic educational example, more recent approaches use deep learning to hide data in the frequency domain (spectrograms) [1], aiming for changes that survive lossy compression such as MP3 and are harder to detect statistically.

References

[1] J. Ros, M. Geleta, J. Pons, and X. Giro-i-Nieto, “Towards Robust Image-in-Audio Deep Steganography,” arXiv:2303.05007 [cs.CR], Mar. 2023. Available: https://arxiv.org/abs/2303.05007
