All of the reasons already mentioned, but also, your brain is hardwired to pay more attention to a face, and even more to a face expressing emotion.
The first two panels establish a pattern that your brain subconsciously keys in on. At a glance, even without registering the specific images and text, your brain instantly knows that panel 3 doesn’t fit the pattern.