DeepLabCut - a markerless tracking toolbox




Markerless tracking of user-defined features with deep learning. Alexander Mathis, Pranav Mamidanna, Taiga Abe, Kevin M. Cury, Venkatesh N. Murthy, Mackenzie W. Mathis* and Matthias Bethge* (*co-senior authors)

Videos related to our pre-print:
Automated tracking of mouse hand digits (Related to Figure 7)
Automated tracking of Drosophila (Related to Figure 6)
Automated tracking during an odor-guided navigation task (Related to Figures 2 & 3)

For more information:
Alexander Mathis -
Mackenzie Mathis -

Case Study 1: 95 images were used to train DeepLabCut to predict 22 labels on the chestnut horse (video 1). Automatic labeling was then performed on the full video of the chestnut horse and on a video of a previously unseen brown horse (video 2).

For video 3, the DeepLabCut network first trained on the chestnut horse (video 1) was re-trained briefly after adding only 11 labeled frames of Justify on a race track, and then used to label the full video automatically. Note the differences in background and viewpoint, as well as the different relative size of the horse in video 3 vs. video 1.

Walking horses: data and human annotation by Byron Rogers of Performance Genetics

video 1: Chestnut horse
video 2: Brown horse
video 3: Justify track practice

Case Study 2: Rat skilled reaching assay from Dr. Daniel Leventhal's group at the University of Michigan. The data were collected during an automated pellet-reaching task and labeled by Dr. Daniel Leventhal. We used 180 labeled frames for training.

Case Study 3: Left: Mouse locomotion. Data, labeling, DeepLabCut training & video generation by Rick Warren in Dr. Nate Sawtell's lab at Columbia University. Shown here are the 3D movements of a head-fixed mouse running on a treadmill, as captured by one camera (plus a mirror). One network was trained to detect the body parts in both views simultaneously. Warren used 825 frames for training (fewer labeled frames would give similar performance).

Right: Electric fish freely swimming with a tether. Data, labeling, DeepLabCut training & video generation by Avner Wallach, a post-doc at the Sawtell lab. He used 250 frames for training.


Case Study 4: James Bonaiuto, PhD (a postdoctoral fellow in the group of Dr. Pier F. Ferrari at the Institut des Sciences Cognitives, CNRS) trained three networks, one per camera view, with ~120 training frames per view. The 3D trajectories were extracted by using the camera calibration functionality in OpenCV to compute a projection matrix for each camera, and then using these matrices to reconstruct the 3D coordinates from the labeled 2D points in each view.
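
To illustrate the reconstruction step described in Case Study 4, here is a minimal, dependency-free sketch of linear triangulation: given one 3x4 projection matrix per camera and the 2D location of the same body part in each view, it solves for the 3D point in the least-squares sense. This is not the code used in the case study. The function names (project, solve3, triangulate) and the toy camera matrices are assumptions made for illustration only, and real projection matrices would come from a calibration procedure.

```python
# Linear triangulation of one 3D point from its 2D detections in several
# calibrated views. All names here are hypothetical, for illustration.

def project(P, X):
    """Project the 3D point X = (x, y, z) with a 3x4 projection matrix P."""
    Xh = list(X) + [1.0]  # homogeneous coordinates
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return (u / w, v / w)

def solve3(M, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    A = [list(row) + [r] for row, r in zip(M, rhs)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x

def triangulate(Ps, points_2d):
    """Least-squares 3D point from projection matrices and 2D points.

    Each view contributes two linear equations in (x, y, z), obtained from
    u = (P[0] . Xh) / (P[2] . Xh) and v = (P[1] . Xh) / (P[2] . Xh).
    """
    rows, rhs = [], []
    for P, (u, v) in zip(Ps, points_2d):
        for coord, k in ((u, 0), (v, 1)):
            rows.append([coord * P[2][j] - P[k][j] for j in range(3)])
            rhs.append(P[k][3] - coord * P[2][3])
    # Normal equations: (A^T A) x = A^T b
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)

# Toy example: two idealized cameras offset along the x axis.
P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
true_point = (0.5, 0.2, 4.0)
detections = [project(P1, true_point), project(P2, true_point)]
estimate = triangulate([P1, P2], detections)
```

In this toy setup the estimate matches the true point up to floating-point error. With three views, as in the case study, the same function applies unchanged by passing three matrices and three 2D points. OpenCV also provides a two-view routine, cv2.triangulatePoints, for the same task; the source does not state which routine was used here.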