Image-to-Text Generation with GPT-2

This project aligns CLIP's visual representations with GPT-2 so that GPT-2 can generate text conditioned on an image.
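One common way to do this kind of alignment is a "prefix" mapping, as in ClipCap. This project may work differently. The sketch below assumes that approach. A CLIP image embedding is projected into a short sequence of vectors in GPT-2's embedding space. That sequence is then prepended to the caption's token embeddings. The dimensions are assumptions: 512 matches CLIP ViT-B/32 image embeddings and 768 matches GPT-2 small. `PrefixMapper`, `build_inputs_embeds`, and `PREFIX_LEN` are hypothetical names. The weights are random here. A real model would learn them, usually by training on captioning loss.

```python
import numpy as np

# Assumed sizes: CLIP ViT-B/32 image embeddings (512-d), GPT-2 small token embeddings (768-d).
CLIP_DIM = 512
GPT2_DIM = 768
PREFIX_LEN = 10  # hypothetical number of visual "prefix" tokens

rng = np.random.default_rng(0)


class PrefixMapper:
    """Hypothetical linear mapper: one CLIP embedding -> PREFIX_LEN GPT-2-sized vectors."""

    def __init__(self, clip_dim, gpt2_dim, prefix_len, rng):
        self.prefix_len = prefix_len
        self.gpt2_dim = gpt2_dim
        # In practice these parameters are learned; random values are for illustration only.
        self.W = rng.normal(0.0, 0.02, size=(clip_dim, prefix_len * gpt2_dim))
        self.b = np.zeros(prefix_len * gpt2_dim)

    def __call__(self, clip_emb):
        # clip_emb: (batch, clip_dim) -> (batch, prefix_len, gpt2_dim)
        out = clip_emb @ self.W + self.b
        return out.reshape(clip_emb.shape[0], self.prefix_len, self.gpt2_dim)


def build_inputs_embeds(prefix, caption_token_embs):
    """Prepend the visual prefix to the caption token embeddings along the sequence axis."""
    return np.concatenate([prefix, caption_token_embs], axis=1)


# Usage with stand-in data (no real CLIP or GPT-2 weights involved).
mapper = PrefixMapper(CLIP_DIM, GPT2_DIM, PREFIX_LEN, rng)
clip_emb = rng.normal(size=(2, CLIP_DIM))     # stand-in for CLIP image features of 2 images
prefix = mapper(clip_emb)                     # (2, 10, 768)
caption = rng.normal(size=(2, 7, GPT2_DIM))   # stand-in for embeddings of a 7-token caption
inputs = build_inputs_embeds(prefix, caption)
print(inputs.shape)  # (2, 17, 768)
```

The combined array has the `(batch, sequence, hidden)` layout that Hugging Face's GPT-2 models accept through their `inputs_embeds` argument. During training, the loss would typically be computed only on the caption positions, not on the prefix.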

An interactive Hugging Face Space demo is available here.