Text-Image Alignment in Diffusion Models: The Role of Attention Sink