Efficient Scaling of Language Models