Leopard: A Vision Language Model for Text-Rich Multi-Image Tasks (arxiv.org)
5 points by PaulHoule 2 days ago