Inference for instrument reliability and rater agreement within a multi-rater and longitudinal data setting