For the image of the cat, we compared the scores from the ONNX-based model with those from clip.cpp.
It appears that the ONNX model doesn't produce accurate output. We understand that the scores may not match exactly, but the relative order of the labels should ideally be the same.
clip.cpp output
clip-android-demo (ONNX) output #2
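A rough way to check "same relative order" without expecting identical scores is to compare the label rankings from both runs, or compute a rank correlation such as Kendall's tau. The sketch below is our own illustration, not part of clip.cpp or clip-android-demo. The function names and the label/score values are hypothetical placeholders, not the actual numbers from the screenshots.

```python
from itertools import combinations
from typing import Dict, List


def rank_order(scores: Dict[str, float]) -> List[str]:
    # Labels sorted from highest to lowest score (ties keep insertion order).
    return sorted(scores, key=scores.get, reverse=True)


def same_relative_order(ref: Dict[str, float], other: Dict[str, float]) -> bool:
    # Compare rankings only over labels that both models scored.
    common = [label for label in ref if label in other]
    return rank_order({l: ref[l] for l in common}) == rank_order({l: other[l] for l in common})


def kendall_tau(ref: Dict[str, float], other: Dict[str, float]) -> float:
    # Kendall's tau-a over label pairs: +1 means identical order, -1 fully reversed.
    common = [label for label in ref if label in other]
    concordant = discordant = 0
    for x, y in combinations(common, 2):
        s = (ref[x] - ref[y]) * (other[x] - other[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 1.0


# Hypothetical scores, for illustration only.
clip_cpp = {"a cat": 0.90, "a dog": 0.07, "a car": 0.03}
onnx = {"a cat": 0.40, "a dog": 0.45, "a car": 0.15}
print(same_relative_order(clip_cpp, onnx))  # False: "a dog" outranks "a cat"
print(kendall_tau(clip_cpp, onnx))          # 0.333... (2 concordant, 1 discordant pair)
```

With a correctly converted model, small numeric differences are expected, but the tau should stay at or very close to 1.0.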
