CITI-DailyActivities 3D dataset


The CITI-DailyActivities 3D dataset comprises action videos in three modalities: RGB videos, depth maps, and 3D skeleton structures. It contains fifteen daily activities: walk, sit down, sit still, use a TV remote, stand up, stand still, pick up books, carry books, put down books, carry a backpack, drop a backpack, make a phone call, drink water, wave hand, and clap, as shown in the figure below.

Figure. One example from each of the fifteen daily activities included in this dataset.

The dataset has 481 sequences. Among them, 181 sequences contain outlier frames that appear at arbitrary positions and last for various durations. Ten actors (eight male and two female) were recruited to build this dataset; one of them is left-handed. Each actor performed each activity between two and five times. A Microsoft Kinect was used for the collection, so the RGB video, the depth maps, and the inferred skeletons of each activity sequence are all available. The skeleton structures were extracted using the Kinect for Windows SDK v1.8.

*We provide the action labels and skeletal features in several data formats: ".mat", ".txt", and ".npy".


Several challenging examples from the skeleton streams in this dataset are shown in the following videos, where the portions of the skeletons extracted with low confidence are drawn in yellow.



  • RGB-D images (480x640) [coming soon]
  • Depth images (320x240) [coming soon]
  • Skeletal joint locations [.txt]
  • Normalized skeletal data (all skeletal streams have equal length) [.mat] [.npy]
  • Labels [.mat] [.txt] [.npy]

NOTE: The dataset contains 481 action examples, where action examples #1 - #300 are the actions without outlier frames, and action examples #301 - #481 are the actions with outlier frames.
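The split described in the note can be expressed as a small helper. The following is a minimal Python sketch, assuming example numbers are 1-based and follow the ranges stated above (#1 to #300 without outlier frames, #301 to #481 with them). The function names are hypothetical and are not part of the dataset release.

```python
# Ranges taken from the note above; example numbers are 1-based.
CLEAN_RANGE = range(1, 301)      # examples #1..#300, no outlier frames
OUTLIER_RANGE = range(301, 482)  # examples #301..#481, with outlier frames


def has_outlier_frames(example_no: int) -> bool:
    """Return True if the 1-based example number is in the outlier subset."""
    if example_no in CLEAN_RANGE:
        return False
    if example_no in OUTLIER_RANGE:
        return True
    raise ValueError(f"example number out of range: {example_no}")


def split_by_outliers(example_numbers):
    """Partition example numbers into (clean, with_outliers) lists."""
    clean, with_outliers = [], []
    for n in example_numbers:
        (with_outliers if has_outlier_frames(n) else clean).append(n)
    return clean, with_outliers
```

If the files are loaded into 0-based arrays, subtract 1 from the example number before indexing.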

Skeleton Format

The ordering of the joints is as follows:
No.01 ->   SHOULDER_LEFT   
No.02 ->   SHOULDER_RIGHT        
No.03 ->   SHOULDER_CENTER
No.04 ->   SPINE        
No.05 ->   HIP_LEFT
No.06 ->   HIP_RIGHT    
No.07 ->   HIP_CENTER    
No.08 ->   ELBOW_LEFT    
No.09 ->   ELBOW_RIGHT
No.10 ->   WRIST_LEFT      
No.11 ->   WRIST_RIGHT    
No.12 ->   HAND_LEFT    
No.13 ->   HAND_RIGHT
No.14 ->   KNEE_LEFT    
No.15 ->   KNEE_RIGHT    
No.16 ->   ANKLE_LEFT   
No.17 ->   ANKLE_RIGHT   
No.18 ->   FOOT_LEFT   
No.19 ->   FOOT_RIGHT    
No.20 ->   HEAD
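To index a skeleton frame by joint name, the ordering above can be written as a lookup table. This is a sketch under stated assumptions: No.03 is taken to be SHOULDER_CENTER, the remaining joint of the 20-joint Kinect v1 skeleton, and each frame is assumed to be a sequence of 20 (x, y, z) entries in this order. The actual layout of the distributed files may differ, and the helper name is hypothetical.

```python
# Joint names in dataset order (No.01 .. No.20).
JOINT_ORDER = [
    "SHOULDER_LEFT",    # No.01
    "SHOULDER_RIGHT",   # No.02
    "SHOULDER_CENTER",  # No.03 (ASSUMPTION: remaining Kinect v1 joint)
    "SPINE",            # No.04
    "HIP_LEFT",         # No.05
    "HIP_RIGHT",        # No.06
    "HIP_CENTER",       # No.07
    "ELBOW_LEFT",       # No.08
    "ELBOW_RIGHT",      # No.09
    "WRIST_LEFT",       # No.10
    "WRIST_RIGHT",      # No.11
    "HAND_LEFT",        # No.12
    "HAND_RIGHT",       # No.13
    "KNEE_LEFT",        # No.14
    "KNEE_RIGHT",       # No.15
    "ANKLE_LEFT",       # No.16
    "ANKLE_RIGHT",      # No.17
    "FOOT_LEFT",        # No.18
    "FOOT_RIGHT",       # No.19
    "HEAD",             # No.20
]

# 0-based index for each joint name (dataset number minus one).
JOINT_INDEX = {name: i for i, name in enumerate(JOINT_ORDER)}


def joint_position(frame, name):
    """Return the (x, y, z) entry of one joint.

    `frame` is assumed to hold 20 entries in the order of JOINT_ORDER.
    """
    if len(frame) != len(JOINT_ORDER):
        raise ValueError(f"expected {len(JOINT_ORDER)} joints, got {len(frame)}")
    return frame[JOINT_INDEX[name]]
```

For example, `joint_position(frame, "HEAD")` reads the last entry of a frame, since HEAD is No.20 in the list.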

Action Labels:

Label 01: walk 
Label 02: sit down
Label 03: sit still
Label 04: use a TV control
Label 05: stand up
Label 06: stand still
Label 07: pick up a book
Label 08: carry
Label 09: put down a book
Label 10: put on a backpack
Label 11: take off a backpack
Label 12: talking on the phone
Label 13: drinking water
Label 14: waving hand 
Label 15: clap
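When working with the label files, it can help to map the numeric labels back to activity names. The sketch below copies the names from the list above (lower-cased) and assumes labels are stored as integers 1 to 15; how the .mat, .txt, and .npy files actually encode them should be checked against the files themselves. The helper names are hypothetical.

```python
from collections import Counter

# Label IDs and names as listed above (IDs are 1-based).
ACTION_LABELS = {
    1: "walk",
    2: "sit down",
    3: "sit still",
    4: "use a TV control",
    5: "stand up",
    6: "stand still",
    7: "pick up a book",
    8: "carry",
    9: "put down a book",
    10: "put on a backpack",
    11: "take off a backpack",
    12: "talking on the phone",
    13: "drinking water",
    14: "waving hand",
    15: "clap",
}


def label_name(label_id: int) -> str:
    """Return the activity name for a 1-based label ID."""
    try:
        return ACTION_LABELS[int(label_id)]
    except KeyError:
        raise ValueError(f"unknown label id: {label_id}") from None


def label_histogram(label_ids):
    """Count how many examples carry each activity name."""
    return Counter(label_name(i) for i in label_ids)
```

A histogram over all loaded labels is a quick sanity check that every one of the fifteen activities is present.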



If you make use of our CITI-DailyActivities 3D dataset in any form, please cite the following reference.

  title={Recognizing Partially Observed Human Actions by Observation Filtering and Completion},
  author={Lin, Shih-Yao and Lin, Yen-Yu and Chen, Chu-Song and Hung, Yi-Ping},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (ACM TOMM)},