Each image's text is a sequence of clues to pixel-precise locations in the image.
For example, in the text for the first image (fox in foxhole), I think:
"spending your own time helping other fools get by" refers to the rabbit's nose, because a rabbit is a fool, it is in front of the white tree trunk, and a pair of clock hands connects the rabbit's nose to the white tree trunk.
"an old jewel on a chain of stone" refers to the top of the springhouse, because the white arcs of the springhouse roof look like a chain, the thing on top of the springhouse looks like a jewel, and it sits on a mountain chain.
Every 2 consecutive clues define 2 locations in the image. Draw a line through the 2 locations and extend it both ways to the border. Each line has 2 ends on the border. Each end lands on either a letter, a starpoint or blank space. The letters, accumulated in order, spell out the master riddle (no anagramming). A starpoint indicates a word-break. A blank space contributes nothing.
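For anyone who wants to try this with pixel coordinates instead of a ruler, here is a minimal sketch of the geometry, under my own assumptions: the image is treated as a plain rectangle from (0, 0) to (width, height), and the function names `border_hits` and `spell` are made up for illustration. Looking up which letter or starpoint sits at a given border position still has to be done by eye; the sample readings at the bottom are placeholders, not anything read from the images.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def border_hits(p: Point, q: Point, width: float, height: float) -> List[Point]:
    """Extend the line through p and q until it meets the image border.

    Returns the two border points, ordered from the end beyond p
    to the end beyond q.
    """
    (x1, y1), (x2, y2) = p, q
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        raise ValueError("the two locations must be distinct")

    # Points on the line are p + t * (dx, dy). Narrow the range of t
    # that keeps the point inside the rectangle, one axis at a time.
    t_min, t_max = float("-inf"), float("inf")
    for start, delta, limit in ((x1, dx, width), (y1, dy, height)):
        if delta == 0:
            # Line is parallel to this pair of edges.
            if not 0 <= start <= limit:
                raise ValueError("line does not cross the image")
            continue
        t_a = (0 - start) / delta
        t_b = (limit - start) / delta
        t_min = max(t_min, min(t_a, t_b))
        t_max = min(t_max, max(t_a, t_b))
    if t_min > t_max:
        raise ValueError("line does not cross the image")

    return [(x1 + t_min * dx, y1 + t_min * dy),
            (x1 + t_max * dx, y1 + t_max * dy)]

def spell(readings: List[Optional[str]]) -> str:
    """Accumulate border readings in order.

    A letter is kept, "*" (a starpoint) becomes a word-break,
    and None (blank space) contributes nothing.
    """
    out = []
    for reading in readings:
        if reading is None:
            continue
        out.append(" " if reading == "*" else reading)
    return "".join(out)

# Two locations at (2, 2) and (4, 4) in a 10 x 10 image.
ends = border_hits((2, 2), (4, 4), 10, 10)

# Placeholder readings, only to show how the accumulation works.
text = spell(["A", None, "B", "*", "C"])
```

The order of the two ends follows the order of the two clues, which matters if the letters are meant to be read in sequence. Whether the end beyond the first clue should be read before the end beyond the second is my assumption, not something the method above states.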
The method is very difficult, ambiguous and time-consuming. The hardest part is determining where the clue boundaries fall in the texts. I count about 750 clues over the 16 images. There are a great many visual red herrings.
I do not see clock hands either. I see 2 curved blades of grass. And the springhouse roof does not look like a chain to me. I never would have paired this text with the items in the picture. If this is the method, I guess I'm out, but I don't want to discourage any ideas. If this works and you can discover the location of the key, kudos! We have less than a year left & I want to see this thing found!