DeepLabCut: markerless pose estimation of user-defined body parts with deep learning
Mathis et al., Nature Neuroscience 2018. rdcu.be/4Rep
Quantifying behavior is crucial for many applications in neuroscience. Videography provides easy methods for the observation and recording of animal behavior in diverse settings, yet extracting particular aspects of a behavior for further analysis can be highly time consuming. In motor control studies, humans or other animals are often marked with reflective markers to assist with computer-based tracking, but markers are intrusive, and the number and location of the markers must be determined a priori. Here we present an efficient method for markerless pose estimation based on transfer learning with deep neural networks that achieves excellent results with minimal training data. We demonstrate the versatility of this framework by tracking various body parts in multiple species across a broad collection of behaviors. Remarkably, even when only a small number of frames are labeled (~200), the algorithm achieves excellent tracking performance on test frames that is comparable to human accuracy.
For more information:
Alexander Mathis - firstname.lastname@example.org
Mackenzie Mathis - email@example.com
Case Study 1: 95 images were used to train DeepLabCut to predict 22 labels on the chestnut horse (video 1). Automatic labeling was then performed on the full video of the chestnut horse and on a previously unseen brown horse (video 2).
For video 3, DeepLabCut was first trained on the chestnut horse (video 1). Only 11 labeled frames of Justify on a race track were then added, the network was briefly re-trained, and automatic labels were applied to the full video. Note the different background, the change of viewpoint, and the different relative size of the horse in video 3 compared with video 1.
Walking horses: data and human-annotation by Byron Rogers of Performance Genetics
video 1: Chestnut horse
video 2: Brown horse
video 3: Justify track practice
Case Study 3: Left: Mouse locomotion. Data, labeling, DeepLabCut training & video generation by Rick Warren in Dr. Nate Sawtell's lab at Columbia University. Shown here are the 3D movements of a head-fixed mouse running on a treadmill, as captured by a single camera plus a mirror. One network was trained to detect the body parts in both views simultaneously. Rick used 825 frames for training (fewer labeled frames would give similar performance). More information and open-source build designs are available for Rick’s KineMouse Wheel.
Right: Electric fish freely swimming with a tether. Data, labeling, DeepLabCut training & video generation by Avner Wallach, PhD, a post-doc at the Sawtell lab. He used 250 frames for training.
Case Study 4: James Bonaiuto, PhD (a postdoctoral fellow in the group of Dr. Pier F. Ferrari at the Institut des Sciences Cognitives, CNRS) trained three networks, one per camera view, with ~120 training frames per view. The 3D trajectories were extracted by using the camera calibration functionality in OpenCV to compute a projection matrix for each camera, and then using these matrices to reconstruct the 3D coordinates from the labeled 2D points in each view.
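The reconstruction step described above (projection matrices plus 2D points in each view giving 3D coordinates) can be sketched as a linear triangulation. This is a minimal illustration and not necessarily the procedure used for this case study: it uses NumPy only, the camera intrinsics, camera poses, and the 3D test point are made up, and the helper names (rot_y, projection_matrix, project, triangulate_point) are hypothetical. In practice the projection matrices would come from a camera calibration, and OpenCV offers cv2.triangulatePoints for the two-view case.

```python
import numpy as np


def rot_y(angle):
    """Rotation matrix for a rotation of `angle` radians about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def projection_matrix(K, R, t):
    """Build the 3x4 projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Project a 3D point X to pixel coordinates using projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    Each view contributes two rows to a homogeneous system A X = 0;
    the solution is the right singular vector with the smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]  # convert from homogeneous coordinates


# Assumed (made-up) intrinsics and three camera poses, one per view.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 5.0])
proj_mats = [projection_matrix(K, rot_y(a), t) for a in (-0.3, 0.0, 0.3)]

# A made-up body-part position, projected into each view to stand in for
# the 2D points that a per-view network would predict.
X_true = np.array([0.2, -0.1, 0.5])
points_2d = [project(P, X_true) for P in proj_mats]

X_est = triangulate_point(proj_mats, points_2d)
print(X_est)
```

With noise-free 2D points the estimate matches the original point up to numerical precision. With real network predictions the 2D points are noisy, so the linear solution is only a least-squares estimate and is often refined further.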
Case Study 5: Open field with objects and a patch cable. Korleki Akiti, a PhD student in the laboratory of Prof. Nao Uchida at Harvard University, labeled data for mice in an open field setting, and then we tested DeepLabCut's ability to track 4 parts of the mouse (snout, ears, tail base) under different lighting conditions. A test image under normal lighting is on the left, and two challenging examples are shown in the middle and right panels:
Please do not take images or videos from this website without crediting the authors of the videos!