Original link: https://blog.csdn.net/qq_34351067/article/details/124258600
The source code identifies abnormal behaviors for multiple people (up to 5): fighting, falling, walking, and standing. It can be run on a limited-time free cloud GPU or on a local computer, and it supports self-built datasets and training your own action detection.
In computer vision, human pose estimation refers to accurately detecting and effectively combining the joints and rigid parts of the human body from video and image information. The goal is to obtain the position of each key point of the human body; once the correct positions are found, the key points are connected to form the human skeleton, and follow-up research can use this skeleton information to analyze human actions and behavior. The problem can be subdivided into four tasks:
Single-Person Skeleton Estimation
Multi-Person Pose Estimation
Video Pose Tracking
3D Skeleton Estimation
Human pose estimation has broad application prospects in human-computer interaction, intelligent monitoring, virtual reality, and motion analysis. This article mainly introduces its applications in human-computer interaction and intelligent monitoring.
Human-computer interaction: Human-computer interaction refers to the interaction and communication between humans and machines, with the aim of enabling robots to understand and imitate human language and behavior so that humans can interact with them more effectively and naturally. Traditional input and output methods are far from sufficient for natural interaction. Interaction between people depends largely on speech and vision, so human-machine interaction is bound to develop in the direction of speech and visual interaction, and pose estimation is a key computer vision technique on the visual side.
Intelligent monitoring: At present, the most widely used application of human pose estimation is in intelligent monitoring. With the improvement of people's security awareness and the increasing maturity of monitoring technology, the application field of intelligent monitoring systems keeps expanding. The main difference between intelligent monitoring and ordinary monitoring is that intelligent monitoring embeds human pose estimation into the video server: the algorithm estimates and judges the human pose in the monitored scene, extracts the key information, and sends an alarm to the user in time when abnormal behavior occurs. Intelligent monitoring can be applied to campuses, prisons, homes, hospitals, and other scenarios. For example, when introduced into a campus, an intelligent monitoring system can use human pose estimation to monitor the state of students and effectively prevent campus violence.
Pose estimation can also be applied to sports events, queue scoring, intelligent driving, retail, and more. For example, in sports, an artificial intelligence coaching system can help athletes adjust professional movements and provide a personalized training experience.
Overview of custom datasets:
The dataset is generated from one video per action, each mapped to a class label: fall1.mp4 == 0, fight1.mp4 == 1, stand1.mp4 == 2, walk1.mp4 == 3.
A key point model was used to extract the skeletal key points from each video (video format: 720p MP4) to build the dataset. The labels are (see the dictionary sketch after this list):
fall, 0
fight, 1
stand, 2
walk, 3
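As a minimal sketch, this action-to-label mapping can be kept in a small Python dictionary (the name `ACTION_LABELS` is a placeholder, not from the source code):

```python
# Hypothetical label map mirroring the class assignment above.
ACTION_LABELS = {
    "fall": 0,   # fall1.mp4
    "fight": 1,  # fight1.mp4
    "stand": 2,  # stand1.mp4
    "walk": 3,   # walk1.mp4
}
```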
The input for a single frame (with J = 17 joints) is stored as:
[j0_x, j0_y, j1_x, j1_y, j2_x, j2_y, j3_x, j3_y, j4_x, j4_y, j5_x, j5_y, j6_x, j6_y, j7_x, j7_y, j8_x, j8_y, j9_x, j9_y, j10_x, j10_y, j11_x, j11_y, j12_x, j12_y, j13_x, j13_y, j14_x, j14_y, j15_x, j15_y, j16_x, j16_y]
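A minimal sketch of how one frame could be flattened into this 34-value layout, assuming the detector returns a (17, 2) array of (x, y) joint coordinates (the function name and input format are assumptions, not the repo's actual API):

```python
import numpy as np

def frame_to_vector(keypoints):
    """Flatten 17 (x, y) joint positions into the 34-value row above."""
    kp = np.asarray(keypoints, dtype=np.float32)
    assert kp.shape == (17, 2), "expects 17 COCO joints with (x, y) each"
    # Row-major flattening yields [j0_x, j0_y, j1_x, j1_y, ..., j16_x, j16_y]
    return kp.reshape(-1)
```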
Very little preprocessing was done on the dataset for the following experiments. The steps were:
The key point model runs on each frame, for each subject, action, and view, and outputs the x and y positions of 17 joints together with a per-frame confidence. The results are converted to txt format, preserving only the x and y positions of each frame, the action performed during the frame, and the frame order. This creates a database of 2D joint positions associated with action category numbers and their corresponding sequences, with no further preprocessing.
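A hedged sketch of this txt conversion step follows; the column order (frame index, action label, then 34 coordinates) is an assumption about the file layout, not the repo's documented format:

```python
import numpy as np

def write_sequence_txt(path, frames, action_label):
    """Write one txt row per frame: frame order, action label, 34 joint coords.

    `frames` is an ordered list of (17, 2) keypoint arrays; per-joint
    confidence scores are dropped, matching the description above.
    """
    with open(path, "w") as f:
        for frame_idx, kp in enumerate(frames):
            coords = np.asarray(kp, dtype=np.float32).reshape(-1)
            row = [str(frame_idx), str(action_label)]
            row += [f"{v:.2f}" for v in coords]
            f.write(" ".join(row) + "\n")
```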
Solution Description:
Use key point detection and multi-target pose tracking to obtain the pedestrians and tracking ID serial numbers in the video input, and recognize each person's actions separately.
Each detected person is cropped and used to obtain the corresponding 17 skeletal feature points; the order and type of the feature points are consistent with COCO.
For each tracking ID, the corresponding pedestrian's skeletal feature points are accumulated to form that person's time-series key point sequence. When a predetermined number of frames has accumulated, the action detection model classifies the key point sequence and outputs an action label for each person, as sketched below.
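The per-track accumulation can be sketched roughly as follows; `SEQ_LEN` and the `classifier` callable are placeholders for the repo's actual frame window and action detection model:

```python
from collections import defaultdict, deque

SEQ_LEN = 30  # assumed window size; the repo's predetermined frame count may differ

# One fixed-length buffer of per-frame 34-value keypoint vectors per tracking ID.
track_buffers = defaultdict(lambda: deque(maxlen=SEQ_LEN))

def update_and_classify(track_id, keypoint_vector, classifier):
    """Accumulate one frame for a tracked person; classify once the buffer is full.

    `classifier` is assumed to map a (SEQ_LEN, 34) sequence to an action
    label in {0: fall, 1: fight, 2: stand, 3: walk}.
    """
    buf = track_buffers[track_id]
    buf.append(keypoint_vector)
    if len(buf) == SEQ_LEN:
        return classifier(list(buf))
    return None  # not enough frames accumulated yet
```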
It can be run directly on a limited-time free cloud GPU.
It supports custom datasets, training your own action categories, and local camera inference.
Custom dataset method: find videos of a single person performing a single action type. Avoid footage with frequent camera cuts; choose a shot that stays basically fixed, like surveillance footage. Different videos can be split and spliced into one long video with the Windows 10 video editor, one long video per action type. The video format is mp4, 720p.