Sound event detection (SED) is well suited to application domains such as cattle sheds, dense forests, and other dark environments where visual objects are typically concealed or invisible. This study presents an autonomous sound-based monitoring system developed for welfare management on large cattle farms. Two artificial audio datasets are prepared, a cow sound event dataset and the UrbanSound8K dataset, and used with several sound event detectors for real-world deployment. Following a data-driven approach, a conventional convolutional neural network (CNN) with targeted improvements is applied first, and a two-stage visual object detection method is then adapted to audio by treating acoustic signals as RGB images. The object detection method achieves higher quantitative evaluation scores and more precise qualitative results than previous related studies. We conclude that visual object detection methods are more effective than currently available CNN architectures for rare sound event detection, and that an artificial data preparation strategy offers a practical way to address the data scarcity and annotation difficulties inherent in rare sound event detection.
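The core idea of treating acoustic signals as RGB images can be sketched as rendering a log-magnitude spectrogram into a three-channel 8-bit image that a visual object detector can consume. The sketch below is a minimal, hypothetical illustration using NumPy; the paper's actual feature extraction (e.g. mel scaling, channel assignment, image resolution) is not specified here, and the function name and parameters are assumptions for demonstration only.

```python
import numpy as np

def spectrogram_rgb(signal, n_fft=512, hop=256):
    """Render a log-magnitude spectrogram as an 8-bit 3-channel image.

    Hypothetical sketch of the 'audio as RGB image' idea; the study's
    exact preprocessing pipeline is not described in the abstract.
    """
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))   # magnitude spectrum
    mag = np.array(frames).T                        # (freq_bins, time_frames)
    log_mag = 20.0 * np.log10(mag + 1e-10)          # convert to dB scale
    span = log_mag.max() - log_mag.min()
    norm = (log_mag - log_mag.min()) / (span + 1e-12)
    gray = (norm * 255).astype(np.uint8)            # scale to 8-bit range
    return np.stack([gray, gray, gray], axis=-1)    # replicate into 3 channels

# Usage: a one-second synthetic 440 Hz tone becomes a detector-ready image.
t = np.linspace(0, 1, 16000, endpoint=False)
img = spectrogram_rgb(np.sin(2 * np.pi * 440 * t))
```

Once audio is in this image form, any off-the-shelf two-stage detector can, in principle, localize sound events as bounding boxes over time and frequency.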