Human activity recognition in video: extending statistical features across time, space and semantic context