Voice-Controlled Audio Visualizer

Multimodal Interaction

This experiment combines audio visualization with voice control to create a hands-free interactive experience. It addresses the challenge of changing visualization parameters (color, shape, speed) while the user is occupied with the music or audio input.

Feature Extraction (Meyda.js)

Rather than animating from volume alone, this visualizer extracts higher-level audio features in real time:

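Spectral Centroid: The magnitude-weighted mean frequency of the spectrum, which tracks the perceived “brightness” of the sound. Mapped to the color temperature of the visuals.
Zero-Crossing Rate (ZCR): How often the waveform changes sign, a rough measure of noisiness. Mapped to the rotation speed and jitter of the geometry.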
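
To make the two features and their mappings concrete, here is a minimal sketch in plain JavaScript. It computes both features by hand so it runs without a browser; in the actual visualizer Meyda supplies them (under the feature names spectralCentroid and zcr), and its units and scaling may differ from this hand-rolled version. The function names, the 8000 Hz ceiling, the hue direction (brighter sound gives a warmer color), and the speed and jitter constants are all assumptions made for illustration, not values taken from the project.

```javascript
// Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
function zeroCrossingRate(samples) {
  if (samples.length < 2) return 0;
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    if ((samples[i - 1] >= 0) !== (samples[i] >= 0)) crossings++;
  }
  return crossings / (samples.length - 1);
}

// Spectral centroid: magnitude-weighted mean frequency, in Hz.
// magnitudes[k] is the magnitude of FFT bin k, with bins spanning 0..sampleRate/2.
function spectralCentroidHz(magnitudes, sampleRate) {
  const binWidth = sampleRate / (2 * magnitudes.length);
  let weighted = 0;
  let total = 0;
  for (let k = 0; k < magnitudes.length; k++) {
    weighted += k * binWidth * magnitudes[k];
    total += magnitudes[k];
  }
  return total === 0 ? 0 : weighted / total; // silence has no centroid; report 0
}

function clamp01(x) {
  return Math.min(1, Math.max(0, x));
}

// Hypothetical mapping from features to visual parameters.
// hue is in degrees, rotationSpeed in radians per frame, jitter in pixels.
function mapFeaturesToVisuals(centroidHz, zcr, maxCentroidHz = 8000) {
  const brightness = clamp01(centroidHz / maxCentroidHz);
  const noisiness = clamp01(zcr);
  return {
    hue: 240 - 240 * brightness, // assumed: dull sound is blue (240), bright sound is red (0)
    rotationSpeed: 0.01 + 0.2 * noisiness,
    jitter: 10 * noisiness,
  };
}
```

Clamping both inputs keeps the visuals in range when a loud or noisy frame pushes a feature past the assumed ceiling.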

Voice Commands (p5.speech)

The system listens for specific keywords to alter the rendering mode:

“Square” / “Circle” / “Triangle”: Changes the primitive geometry.
“Red” / “Blue”: Shifts the base color palette.
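
The keyword handling can be sketched as a small pure function that scans a recognized transcript and updates the rendering state. The function name, the state shape, and the rule that the last matching keyword wins are assumptions for illustration. The browser wiring in the trailing comment follows the p5.speech documentation as best understood here and should be checked against the library before use.

```javascript
const SHAPES = ["square", "circle", "triangle"];
const COLORS = ["red", "blue"];

// Returns a new state with any shape or color keywords from the transcript applied.
// If a transcript contains several keywords of one kind, the last one wins.
function applyVoiceCommand(state, transcript) {
  const words = transcript.toLowerCase().match(/[a-z]+/g) || [];
  const next = { ...state };
  for (const word of words) {
    if (SHAPES.includes(word)) next.shape = word;
    if (COLORS.includes(word)) next.color = word;
  }
  return next;
}

// Browser wiring (not run here; treat the exact p5.speech calls as an assumption):
//   let state = { shape: "circle", color: "red" };
//   const rec = new p5.SpeechRec("en-US", () => {
//     state = applyVoiceCommand(state, rec.resultString);
//   });
//   rec.continuous = true;
//   rec.start();
```

Matching whole words rather than substrings avoids false triggers such as "red" inside "shredded".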

This demonstrates how two browser libraries, Meyda for digital signal processing (DSP) and p5.speech for automatic speech recognition (ASR), can be integrated in a single operational loop.
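
One way to picture that single loop is a shared state object: the audio callback and the speech callback each write to it at their own pace, and the draw step reads it once per frame. The sketch below assumes this structure; the names, the rotation constant, and the guard against non-finite feature values are illustrative choices, not details confirmed by the project.

```javascript
// Shared state: written by the two input callbacks, read once per frame.
function createState() {
  return { shape: "circle", baseColor: "red", centroid: 0, zcr: 0, angle: 0 };
}

// Intended to be called from the audio-analysis callback, once per audio buffer.
function onAudioFeatures(state, features) {
  // Keep the previous value when a feature is missing or not finite.
  if (Number.isFinite(features.spectralCentroid)) state.centroid = features.spectralCentroid;
  if (Number.isFinite(features.zcr)) state.zcr = features.zcr;
}

// Intended to be called from the speech-recognition callback, whenever a word arrives.
function onKeyword(state, keyword) {
  const word = keyword.trim().toLowerCase();
  if (["square", "circle", "triangle"].includes(word)) state.shape = word;
  else if (["red", "blue"].includes(word)) state.baseColor = word;
}

// Intended to be called once per animation frame (for example from p5's draw()).
// Advances the rotation and returns a plain description of what to render.
function stepFrame(state) {
  state.angle += 0.01 * (1 + state.zcr); // hypothetical scaling of ZCR to speed
  return {
    shape: state.shape,
    baseColor: state.baseColor,
    centroid: state.centroid,
    angle: state.angle,
  };
}
```

Because the inputs only write to the state and the frame step only reads it, neither the audio rate nor the sporadic arrival of speech results can stall rendering.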

References

[1] Meyda. (n.d.). “Meyda: Audio Feature Extraction for JavaScript.” Retrieved from https://meyda.js.org/ (Library used for Spectral Centroid and Zero-Crossing Rate extraction).

[2] IDM NYU. (n.d.). “p5.speech.” Retrieved from https://idmnyu.github.io/p5.js-speech/ (Library used for Voice Command recognition).

[3] p5.js. (n.d.). “Reference.” Retrieved from https://p5js.org/reference/ (Core visualization library).
