Leopard: A Vision Language Model for Text-Rich Multi-Image Tasks