Recent advances in edge computing have boosted the deployment of deep-learning-based video analysis systems, breaking the limitation imposed by the constrained communication and computing resources of local devices. However, processing multi-scene high-resolution video streams for crowd surveillance remains a significant challenge, since the dynamic video content and communication environment are difficult to model when making offloading decisions. To bridge the gap between applications and modeling, this paper presents a Real-Time Cloud-edge-device Collaboration framework that enables fast and accurate Crowd counting (RT3C) on real datasets. RT3C comprises key frame detection, adaptive patch partition, patch encoding and decoding, and computation offloading decision modules, which together divide key frames into a minimum number of patches and determine where each patch is processed. A Real-Time Multi-Agent Actor-Critic (RTMAAC) algorithm based on multi-agent reinforcement learning is proposed to decide whether each patch is computed by a lightweight model on the edge or a large model in the cloud. Unlike traditional approaches that ignore content, RTMAAC makes dynamic online decisions based on the context of both the network and the video. Extensive experiments demonstrate that RT3C effectively discriminates valid frames and optimizes offloading decisions in complex environments, outperforming baseline algorithms on two crowd counting datasets. In summary, RT3C provides a promising framework for multi-scene video streams and can be extended to other applications that perform video computation with deep models.
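To make the per-patch offloading decision concrete, the sketch below shows a minimal actor-critic head in PyTorch that maps a hypothetical context vector (patch features plus network state) to a binary edge/cloud action. This is not the authors' RTMAAC implementation; the context dimension, network sizes, and single-agent simplification are illustrative assumptions only.

```python
# Minimal sketch (not the RT3C/RTMAAC implementation): a per-patch actor-critic
# head that chooses between a lightweight edge model and a large cloud model,
# conditioned on a hypothetical context vector (patch features + network state).
import torch
import torch.nn as nn

class OffloadActorCritic(nn.Module):
    def __init__(self, ctx_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(ctx_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, 2)   # logits: 0 = edge, 1 = cloud
        self.critic = nn.Linear(hidden, 1)  # state value for the critic loss

    def forward(self, ctx: torch.Tensor):
        h = self.backbone(ctx)
        return self.actor(h), self.critic(h)

# Usage: each patch (agent) scores its own context and samples an action.
if __name__ == "__main__":
    model = OffloadActorCritic()
    ctx = torch.randn(8, 16)                 # 8 patches, 16-dim context each
    logits, value = model(ctx)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                   # 0 -> edge model, 1 -> cloud model
    print(action.tolist(), value.squeeze(-1).tolist())
```

In a multi-agent setting along the lines described in the abstract, one such policy head would act per patch, with the critic trained on a reward that trades off counting accuracy against communication and computation latency.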