I had fun trying to do this "in camera," with no editing after generation. I ended up using segmentation, but after playing around with a lot of different options for which colors to use (trying stone, book, window, painting), I gave up on semantic colors and just made the template black and white.
It's a segmentation ControlNet, with an image of the cover template as the input, where each part of the cover is a different flat color. The segmentation ControlNet makes Stable Diffusion treat different colors as different things, so it keeps each part distinct. If you use colors that aren't in the actual color reference chart (the ADE20K palette the seg model was trained on), it just sort of improvises, which is why black and white worked fine. On one version I used the color that corresponds to "painting, picture" for the area inside the "frame" where the actual art goes. Since I was doing it all in one generation with one prompt, I just had to accept what it gave me for the colors and textures of the text and borders.
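If anyone wants to try the same idea outside a webui, here's roughly what it looks like with Hugging Face diffusers. This is a sketch, not my exact setup: the seg ControlNet and SD 1.5 model IDs are the standard public ones, but the template filename and prompt are placeholders.

```python
# Minimal sketch: one-shot cover generation guided by a segmentation ControlNet.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Segmentation ControlNet for SD 1.5, trained on the ADE20K color palette.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The cover template: flat regions of color, one color per part of the cover.
# Colors from the ADE20K chart (e.g. the "painting, picture" color) are treated
# as that class; off-chart colors like plain black/white get improvised, but the
# regions still stay distinct.
template = load_image("cover_template.png")  # placeholder filename

image = pipe(
    "fantasy novel cover, ornate border, title text, painting in a frame",
    image=template,
    num_inference_steps=30,
).images[0]
image.save("cover.png")
```

The whole cover comes out of that single call, which is the trade-off described above: one prompt covers the art, the text areas, and the borders, so you take whatever colors and textures the model picks for each region.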