I've been scouring this site because I thought there was a question very close to this one in the past, but alas, the closest I can find is about tagging in general ("Tagging in Semantic Web"). Whatever you end up using, I would think you'd want each person in a group photo to be represented by their FOAF profile, if available. Barring a better property, you could do this simply with rdfs:seeAlso.
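To make the rdfs:seeAlso idea concrete, here is a minimal sketch that writes one N-Triples statement per person, pointing from the photo to that person's FOAF profile document. The photo and profile URIs and the helper name `see_also_triple` are made up for illustration; only the rdfs:seeAlso property URI is real.

```python
# Real property URI from the RDF Schema vocabulary.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def see_also_triple(photo_uri: str, foaf_profile_uri: str) -> str:
    """Return one N-Triples line linking a photo to a FOAF profile document."""
    return f"<{photo_uri}> <{RDFS_SEE_ALSO}> <{foaf_profile_uri}> ."


# Hypothetical URIs, for illustration only.
photo = "http://example.org/photos/group.jpg"
profiles = [
    "http://example.org/people/alice/foaf.rdf",
    "http://example.org/people/bob/foaf.rdf",
]

lines = [see_also_triple(photo, profile) for profile in profiles]
print("\n".join(lines))
```

This only says "more information is over there"; it does not say where in the photo each person appears, so it would need to be combined with some way of addressing a region of the image.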
Eric, there is a draft for "Media Fragments URI 1.0" at W3C. It is still a draft, but it should address what you describe (the document's main focus is video, but the "spatial dimension" should also apply to an image). I hope this helps.
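As a rough sketch of how the spatial dimension could be used for a face in a photo: the draft addresses a rectangle with a fragment of the form `#xywh=x,y,w,h`, optionally prefixed with `pixel:` or `percent:`. The image URI and the helper names `region_uri` and `parse_region` below are assumptions for illustration, and the parser only handles integer values.

```python
from urllib.parse import urldefrag


def region_uri(image_uri, x, y, w, h, unit="pixel"):
    """Build a URI that addresses a rectangular region of an image."""
    if unit not in ("pixel", "percent"):
        raise ValueError("unit must be 'pixel' or 'percent'")
    base, _ = urldefrag(image_uri)  # drop any existing fragment
    return f"{base}#xywh={unit}:{x},{y},{w},{h}"


def parse_region(uri):
    """Return (unit, x, y, w, h) from an xywh fragment, or None if absent."""
    fragment = urldefrag(uri).fragment
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name == "xywh":
            unit, sep, numbers = value.rpartition(":")
            unit = unit if sep else "pixel"  # no prefix means pixels
            x, y, w, h = (int(n) for n in numbers.split(","))
            return unit, x, y, w, h
    return None


# Hypothetical image URI, for illustration only.
face = region_uri("http://example.org/photos/group.jpg", 160, 120, 320, 240)
print(face)
print(parse_region(face))
```

The resulting URI can then be used as the subject of whatever statement identifies the person, so the annotation is about that rectangle rather than the whole photo.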
AKtive Media is an ontology-based cross-media annotation (images and text) system. Our goal is to automate the process of annotation by suggesting knowledge to the user interactively while the user is annotating, and hence to minimize user effort.
I don't know which ontology it uses. Also, it's a bit outdated (last update: 2010-01-18).