A review and empirical comparison of causal inference methods for clustered observational data with application to the evaluation of the effectiveness of medical devices