Distributed Multi-Speaker Voice Activity Detection for Wireless Acoustic Sensor Networks


A distributed multi-speaker voice activity detection (DM-VAD) method for wireless acoustic sensor networks (WASNs) is proposed. DM-VAD is required in many signal processing applications, e.g. distributed speech enhancement based on multi-channel Wiener filtering, but is non-existent up to date. The proposed method neither requires a fusion center nor prior knowledge about the node positions, microphone array orientations or the number of observed sources. It consists of two steps: (i) distributed source-specific energy signal unmixing (ii) energy signal based voice activity detection. Existing computationally efficient methods to extract source-specific energy signals from the mixed observations, e.g., multiplicative non-negative independent component analysis (MNICA) quickly loose performance with an increasing number of sources, and require a fusion center. To overcome these limitations, we introduce a distributed energy signal unmixing method based on a source-specific node clustering method to locate the nodes around each source. To determine the number of sources that are observed in the WASN, a source enumeration method that uses a Lasso penalized Poisson generalized linear model is developed. Each identified cluster estimates the energy signal of a single (dominant) source by applying a two-component MNICA. The VAD problem is transformed into a clustering task, by extracting features from the energy signals and applying K-means type clustering algorithms. All steps of the proposed method are evaluated using numerical experiments. A VAD accuracy of $> 85 \%$ is achieved for a challenging scenario where 20 nodes observe 7 sources in a simulated reverberant rectangular room.
Submitted 16 Mar 2017 to Methodology [stat.ME]
Published 20 Mar 2017