Stable Diffusion Prompt Guide: Basic to Advanced & Examples

Updated on April 30, 2025

Stable Diffusion continues to push the boundaries of AI image generation, with recent advancements like Stable Diffusion 3.5 Large setting new benchmarks for both image quality and, crucially, prompt adherence. This latest version is proving market-leading in understanding and executing complex instructions, rivaling much larger models.

But are you truly leveraging the full power of these advanced models? The key lies in mastering the art of the prompt. Prompts are more than just commands; they are the precise creative directions that guide Stable Diffusion to transform your artistic vision into unique, high-quality images that perfectly match your desired aesthetic, style, and subject matter.

In this comprehensive guide, I will walk you through the process of crafting effective Stable Diffusion prompts, from foundational concepts to advanced techniques. You'll learn how to move beyond simple keyword lists to build prompts that unleash the full potential of Stable Diffusion's AI, allowing you to bring your most imaginative ideas to life with unparalleled precision and control.

How to Write Stable Diffusion Prompts: From Basic to Advanced

While a single word can indeed generate an image, relying solely on the AI to fill in details offers less direct control. To truly dictate the final output and take advantage of prompt adherence in models like Stable Diffusion 3.5, it's essential to understand how to use keywords and structure your prompts effectively. This guide will show you how to detail exactly what you want to see by referencing and strategically incorporating elements from various keyword categories.

Subject
Medium
Style
Lighting
Color
Mood
Viewpoint
Others

FYI: The default Stable Diffusion output is 512x512px, which becomes blurry when enlarged. While you can upscale resolution within Stable Diffusion, it's often time-consuming and may introduce artifacts. That's where Aiarty Image Enhancer comes in – an AI-driven solution that can:


Enhance and enlarge Stable Diffusion images by 1x, 2x, 4x, or 8x
Upscale image resolution to 4K, 8K, 16K, or even 32K
Deliver super clarity without artifacts


Before we dive into writing Stable Diffusion prompts, keep in mind that the final image style and quality really depend on both your prompt and the Stable Diffusion Checkpoint (or model) you use. Different models are trained on distinct datasets, leading to different style biases; for example, Epic Realism yields realistic images while RealCartoon-XL focuses on cartoon styles.

Crucially for this guide, most of our examples and demonstrations will use the RealCartoon-XL Checkpoint. This means the results you see are within the context of this specific cartoon-style model. For a deeper understanding of any model's characteristics, always check its model card.

#1. Subject

The subject of the image can be anything - a person, animal, character, location, object, or something else. A common mistake when describing an image is not providing enough detail about the subjects. Be sure to thoroughly describe what you expect to see in the image, rather than just stating the general category of the subject.

For instance, when creating an image of a beautiful woman, a newbie may write: 1 girl

This can produce nice-looking images, but it gives Stable Diffusion no clue about what exactly we're aiming for.

To get the desired outcome, provide specific details about the woman: her physical appearance (e.g. long hair, porcelain skin, piercing blue eyes), the clothing she's wearing (e.g. flowing white sundress, strappy sandals), her pose or body language (e.g. standing in a relaxed pose, one hand on her hip), and the background setting (e.g. in a lush, flower-filled garden). So let's add more keywords to this prompt:

1girl, upper body, stand, arm on side, pink hair, bob cut, blue eyes, lips, nose, smiling, headband, ear piercing, kimono, milky way, glittering stars

#2. Medium

Medium is the material used to make artwork. Using the medium keyword in the prompt can dramatically change the image style. Some commonly-used keywords under this category include:

photos, painting, illustration, oil painting, watercolor painting, vector graphics, 3D rendering, sculpture, doodle, tapestry, Haida print, etc.

Let's try adding two different mediums to our prompt, 3D rendering and painting, and see how they affect the resulting work.

Left image: 1girl, upper body, stand, arm on side, pink hair, bob cut, blue eyes, lips, nose, smiling, headband, ear piercing, kimono, milky way, glittering stars, 3D rendering

Right image: 1girl, upper body, stand, arm on side, pink hair, bob cut, blue eyes, lips, nose, smiling, headband, ear piercing, kimono, milky way, glittering stars, painting

#3. Style

The style of an image refers to its artistic expression, such as pop art, hyper-realistic, fantasy, dark art, fauvism, impressionism, surrealism, concept art, heavy metal 1981, and more. Adding them to the prompt can also make a big difference. Let's try with fantasy.

1girl, upper body, stand, arm on side, pink hair, bob cut, blue eyes, lips, nose, smiling, headband, ear piercing, kimono, milky way, glittering stars, painting, fantasy

#4. Lighting

The right lighting is crucial for successful images. Lighting keywords such as soft, ambient, overcast, dimly lit, studio lights, dramatic lighting, and dreamlike ethereal lighting significantly influence the overall look and feel of an image. Let's add dramatic spotlighting and studio lighting to the prompt separately.

Left image: 1girl, upper body, stand, arm on side, pink hair, bob cut, blue eyes, lips, nose, smiling, headband, ear piercing, kimono, milky way, moon, glittering stars, painting, fantasy, dramatic spotlighting

Right image: 1girl, upper body, stand, arm on side, pink hair, bob cut, blue eyes, lips, nose, smiling, headband, ear piercing, kimono, milky way, moon, glittering stars, painting, fantasy, studio lighting

You've probably noticed some unwanted elements in the outputs, like distorted hands and text. Negative prompts can fix this; the Adding Negative Prompt section below shows how to use them.

#5. Color

We can control the overall color of the image by adding color keywords. The colors you specify may appear as an overall tone or in individual objects. The most popular color keywords include vibrant, bold, contrasting, muted, bright, monochromatic, colorful, black and white, and pastel.

In addition, you can add the names of websites and studios that are known for their use of specific color palettes. For example, you can add color inspired by Studio Ghibli to the prompt.

1girl, upper body, stand, arm on side, pink hair, bob cut, blue eyes, lips, nose, smiling, headband, ear piercing, kimono, milky way, moon, glittering stars, painting, fantasy, studio lighting, color inspired by Studio Ghibli

#6. Mood or Emotion

Some commonly used mood keywords include fluid, pensive, sedate, calm, sad, angry, raucous, and energetic.

By adding one or more of these keywords to our prompt, we can steer the generated image toward a specific emotional response or overall ambiance. Let's add energetic and check the difference.

In this case, the generated images show the feeling of excitement and dynamism.

#7. Viewpoint/Composition

Don't forget to detail the preferred camera angle, perspective, framing, and composition of the image, with common prompts like front view, side view, back view, looking back, eye contact, from above, portrait, headshot, close-up, and bird's-eye view to guide the visualization of the scene.

Let's add from above to the prompt.

#8. Other Categories

When generating images with Stable Diffusion, the type of content you aim to create may require keywords from particular categories. For instance, when striving for realistic photos, you might want to include keywords such as:

Camera devices: iPhone, DJI, DSLR, mirrorless, etc.
Shooting angles: overhead, low-angle, bird's eye view, etc.
Shooting distance: close-up, tight shot, wide shot, etc.
Camera settings: ISO, shutter speed, aperture, etc.

Here's an example:

medium shot, portrait, young adult woman, beautiful, blue eyes, realistic skin with imperfections, goddess, warm smile, god rays, wildflowers, forest, bokeh, (shot by iPhone 15:1.3), (portrait mode:1.3), f/number, (soft and natural light:1.2), high resolution

By incorporating these kinds of camera-specific keywords into our prompts, we give the Stable Diffusion model more context about the photographic look and feel we need.

Adding Negative Prompt

When you use the Stable Diffusion AI to generate images, it's really helpful to include negative prompts. Negative prompts tell the AI what you don't want to see in the final image. Without negative prompts, the AI might create images with things in them that you don't want. For example, maybe you want to make an image of a futuristic city, but the AI includes some weird artifacts or glitches that you don't like. By using negative prompts, you can instruct the AI to avoid those unwanted elements. This gives you more control over the final image and helps you get results that match the vision you had in mind.

Some generally used negative prompts include: low quality, low res, blurry, artifacts, grainy, pixelated, distorted, cropped, out of focus, bad composition, ugly, duplicated, boring.

You can also add negative prompts to keep specific unwanted elements from showing up, for instance text, logo, watermark, banner, extra digits, and signature. Taking the image we are working with as an example, the hand and eyes look deformed and there is unwanted text, so let's add negative keywords for those:

low quality, blurry, artifacts, grainy, cropped, ugly, duplicated, hands, imperfect eyes, deformed pupils, deformed iris, text, boring

Now, it looks much better.
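As a rough sketch of how a positive and a negative prompt travel together, the snippet below bundles both into one request body. The field names prompt, negative_prompt, and steps are modeled on the AUTOMATIC1111 web UI API and should be treated as assumptions; build_request itself is a hypothetical helper, and other tools may name these settings differently.

```python
def build_request(prompt: str, negative_keywords: list[str], steps: int = 30) -> dict:
    """Bundle a positive prompt and a negative prompt into one request body."""
    # Drop empty entries and duplicates while keeping the original keyword order.
    cleaned = [k.strip() for k in negative_keywords if k.strip()]
    unique = list(dict.fromkeys(cleaned))
    return {
        "prompt": prompt,                        # what we want to see
        "negative_prompt": ", ".join(unique),    # what we want to avoid
        "steps": steps,                          # assumed field name for sampling steps
    }


request = build_request(
    "futuristic city, night, neon lights",
    ["low quality", "blurry", "text", "watermark", "blurry"],
)
print(request["negative_prompt"])
```

Keeping the negative keywords in a list makes it easy to reuse a general-purpose set (low quality, blurry, and so on) and append image-specific ones such as text or watermark.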

Stable Diffusion Prompting Techniques

#1. Stable Diffusion Prompt Weighting

Prompt weighting is a technique for giving more or less importance to different parts of a prompt when generating images with Stable Diffusion. The basic idea is that you assign numerical weights to the elements of the prompt, emphasizing the aspects you want the model to focus on and downplaying the others. Here is how it works:

The default weight of every word in a prompt is 1.

Increase Emphasis on a Word

To increase emphasis on a keyword, put that word in () or add + at the end of it. The more () or + you use, the more attention the model will give to the keyword, for instance:

(keyword) or keyword+ is equivalent to 1.1
((keyword)) or keyword++ is equivalent to 1.1x1.1
(((keyword))) or keyword+++ is equivalent to 1.1x1.1x1.1


Alternatively, add a number between 1.1 and 2 after the keyword to make it more important. For instance, (word:1.5) increases the attention to the keyword by a factor of 1.5.


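Reduce Emphasis on a Word

To reduce emphasis on a keyword, put that word in [] or add - at the end of it. The more brackets or - signs you use, the less attention the model will give to the keyword, for instance:

[keyword] or keyword- is equivalent to about 0.9
[[keyword]] or keyword-- is equivalent to about 0.9x0.9
[[[keyword]]] or keyword--- is equivalent to about 0.9x0.9x0.9

The exact value depends on the tool. The () and [] syntax comes from the AUTOMATIC1111 web UI, where each pair of square brackets divides the weight by 1.1 (about 0.91). The + and - syntax comes from the Compel library used by InvokeAI, where each - multiplies the weight by 0.9.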

Alternatively, add a number between 0.1 and 0.9 after the keyword to make it less important. For instance, (keyword:0.8) decreases the attention to the keyword by a factor of 0.8.
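The arithmetic behind these rules can be sketched in a few lines of Python. This is an illustration rather than any tool's actual parser: keyword_weight is a hypothetical helper that handles a single keyword, and it uses the 1.1 and 0.9 multipliers listed above as defaults (both are parameters, since tools differ on the exact values).

```python
import re


def keyword_weight(token: str, up: float = 1.1, down: float = 0.9) -> tuple[str, float]:
    """Return (keyword, weight) for ((keyword)), [[keyword]], or (keyword:1.5)."""
    token = token.strip()
    # Explicit form: (keyword:1.5) sets the weight directly.
    explicit = re.fullmatch(r"\(\s*([^():]+?)\s*:\s*(\d+(?:\.\d+)?)\s*\)", token)
    if explicit:
        return explicit.group(1), float(explicit.group(2))
    # Nested form: every () pair multiplies by `up`, every [] pair by `down`.
    weight = 1.0
    while len(token) >= 2 and token[0] + token[-1] in ("()", "[]"):
        weight *= up if token[0] == "(" else down
        token = token[1:-1].strip()
    return token, round(weight, 4)


for example in ["keyword", "((keyword))", "[[[keyword]]]", "(keyword:0.8)"]:
    print(example, "->", keyword_weight(example))
```

Because nested brackets multiply, three levels of () already reach about 1.33, so an explicit (keyword:1.3) is usually easier to read than stacked parentheses.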

#2. Keywords Blending

Prompt scheduling is a technique used in Stable Diffusion, which allows you to blend two different keywords or prompts during the image generation process. This is done using a specific syntax: [keyword1: keyword2: factor].

The way it works is:

keyword1: the first keyword that will be used to start the image generation.
keyword2: the second keyword that will replace the first one during the generation process.
factor: a number between 0 and 1 that determines when the transition from keyword1 to keyword2 should happen.

For example, if I input closeup photo of [Taylor Swift: Beyonce: 0.4] and set the sampling steps to 40, then the first 16 steps (0.4x40) are spent creating a closeup photo of Taylor Swift, and the remaining 24 steps create a closeup photo of Beyonce.
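The step arithmetic in this example can be checked with a small sketch. The function scheduled_keyword is a hypothetical helper that assumes 1-based step numbers and a switch point of factor times the total number of steps; it mirrors the description above rather than any tool's internal code.

```python
def scheduled_keyword(first: str, second: str, factor: float,
                      step: int, total_steps: int) -> str:
    """Return the keyword active at a 1-based step for [first: second: factor]."""
    switch_after = round(factor * total_steps)  # 0.4 * 40 -> 16
    return first if step <= switch_after else second


schedule = [
    scheduled_keyword("Taylor Swift", "Beyonce", 0.4, step, 40)
    for step in range(1, 41)
]
print(schedule.count("Taylor Swift"), schedule.count("Beyonce"))
```

A smaller factor hands over to the second keyword earlier, so the second keyword has more steps to shape the final image.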

You can also use multiple celebrity names with a hybrid approach, for example:

face of [Taylor Swift|Beyonce|Billie Eilish]

This will create a face that is a hybrid of the three celebrities, blending their features.

#3. Alternating Words

The alternating words technique is a convenient syntax for swapping between two different prompts or keywords every other step during the image generation process. This is done using the following syntax:

(keyword1|keyword2) or [keyword1|keyword2]

For example, using (frog|man) or [frog|man] means that in step 1 the prompt will be frog, in step 2 it will be man, in step 3 it will be frog again, and so on. This produces a morphing effect in which the subject comes out half frog and half man.
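The alternation is simply a cycle over the listed options. A minimal sketch, assuming 1-based step numbers (alternating_keyword is a hypothetical helper, not part of any tool):

```python
def alternating_keyword(options: list[str], step: int) -> str:
    """Return the option active at a 1-based step for [a|b|...]."""
    return options[(step - 1) % len(options)]


for step in range(1, 5):
    print(step, alternating_keyword(["frog", "man"], step))
```

The same cycling applies to three or more options, as in the [Taylor Swift|Beyonce|Billie Eilish] example earlier.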

#4. Starting a New Chunk

The BREAK keyword (must be uppercase) fills the rest of the current chunk with padding characters. Any text you add after BREAK starts a new chunk, which helps ensure that the keywords that follow are not influenced by the preceding text. Let's use an example to better understand how this works.

Left image: 1girl, sitting on a bench in a park, high resolution, sunlight, smiling, head resting on hands, long hair, round glasses, pink eyes, white dress, purple hair, green headband.

Right image: 1girl, sitting on a bench in a park, high resolution, sunlight, smiling, head resting on hands, long hair, round glasses, pink eyes, white dress, purple hair, BREAK, green headband.
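To see which keywords end up in which chunk, the sketch below splits a prompt at each uppercase BREAK. It only illustrates the grouping; split_on_break is a hypothetical helper, and the padding that the real tool adds to each chunk is left out.

```python
import re


def split_on_break(prompt: str) -> list[str]:
    """Split a prompt into the text segments separated by an uppercase BREAK."""
    # \b keeps words like BREAKFAST intact; lowercase "break" is not a separator.
    segments = re.split(r"\bBREAK\b", prompt)
    cleaned = [segment.strip(" ,.") for segment in segments]
    return [segment for segment in cleaned if segment]


chunks = split_on_break("white dress, purple hair, BREAK, green headband.")
print(chunks)
```

Here green headband lands in its own chunk, away from purple hair, which is the separation the right-hand prompt above relies on.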

How Long a Stable Diffusion Prompt Can Be

The length of text you can input into Stable Diffusion depends on the version and model you're using. For example, in the original v1 model, the prompt is limited to 75 tokens.

The AUTOMATIC1111 Stable Diffusion web UI has no hard limit on the number of tokens. If a prompt exceeds 75 tokens, it is split into multiple "chunks" of 75 tokens each. This process can continue until your computer runs out of memory.
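The chunking can be pictured with a short sketch. It assumes the prompt has already been turned into a list of tokens, and that a short final chunk is padded up to 75 entries; the "<pad>" marker and the chunk_tokens helper are hypothetical names used only for illustration.

```python
CHUNK_SIZE = 75


def chunk_tokens(tokens: list[str], size: int = CHUNK_SIZE,
                 pad: str = "<pad>") -> list[list[str]]:
    """Split tokens into fixed-size chunks, padding the last one if it is short."""
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return [chunk + [pad] * (size - len(chunk)) for chunk in chunks]


tokens = [f"token{i}" for i in range(100)]  # stand-in for a 100-token prompt
chunks = chunk_tokens(tokens)
print(len(chunks), [len(chunk) for chunk in chunks])
```

A 100-token prompt therefore occupies two chunks: the first is full, and the second holds the remaining 25 tokens plus padding.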

Note:

A token is not exactly the same as a word. It is a numerical representation of a word, or a piece of a word, that the model recognizes. If the model doesn't recognize a word, it breaks it down into smaller parts (sub-words) that it does know. For example, the word dog is one token, and running is another token. But the made-up compound dogrunning would be split into two tokens because the model doesn't recognize it as a single word.
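The sub-word idea can be illustrated with a toy splitter. This is not the tokenizer Stable Diffusion actually uses, which relies on its own learned vocabulary; the two-word vocabulary and the greedy longest-match rule below are assumptions chosen only to reproduce the dogrunning example.

```python
def greedy_subword_split(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest known pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


toy_vocab = {"dog", "running"}  # hypothetical vocabulary
print(greedy_subword_split("dog", toy_vocab))
print(greedy_subword_split("dogrunning", toy_vocab))
```

This is why unusual or invented words can use up more of the 75-token budget than their word count suggests.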

Stable Diffusion Prompts: Tips and Tricks

#1. Attribute Association

Sometimes, when you specify certain attributes in a prompt, other related attributes may be generated as well. This is because some attributes are strongly correlated with each other in the training data. For example, if you specify that you want a photo of a woman with brown eyes, the model may generate an image of an Asian woman, as brown eyes are more common in Asian populations.

1 girl, focus on the face, brown eyes, very detailed, realistic soft skin

However, if you change the eye color from brown to green, one of the rarest eye colors, the model will quite possibly generate an image of a European woman.

#2. Celebrity Names Association

When using the names of celebrities or artists in your Stable Diffusion prompts, be aware that these names can carry implicit associations with specific poses, outfits, or styles.

This means that when you include a celebrity name in your prompt, the AI model may interpret that name as implying certain visual characteristics commonly associated with that person, even if that wasn't your original intent.

#3. Artist Names Association

When you include the name of a specific artist in your prompt, such as Edvard Munch, Clive Barker, Tarsila do Amaral, or Luis Royo, the Stable Diffusion model will try to generate an image that has characteristics associated with that artist's unique style and visual aesthetic.

Here is what I get by adding the style of Edvard Munch.

The generated image is likely to have features that are reminiscent of Munch's distinctive expressionist painting style.

#4. Follow a Structured Syntax

Organizing the prompt in a structured way makes it easier for the Stable Diffusion model to understand the relationships between the different elements and to incorporate them into the generated image.

Let's say our prompt has the phrases "green headband" at the beginning and "round glasses" at the very end. Stable Diffusion may be more likely to prioritize and include the "green headband" detail, while potentially skipping or deprioritizing the "round glasses" part.

However, if you group all the descriptors for the subject (like hair, facial features, clothing, etc.) together in one section, Stable Diffusion is more likely to interpret them as a cohesive set of attributes that should be included in the final output. We therefore suggest breaking the prompt down into these main sections:

Subject & setting
Medium, style, lighting, color, and emotion
Composition & additional information
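One way to keep prompts in this order is to store keywords by category and assemble them with a small helper. The sketch below is only an illustration: the category names in SECTION_ORDER and the build_prompt function are hypothetical, chosen to follow the three sections listed above.

```python
# Categories in the suggested order: subject and setting first, then
# medium/style/lighting/color/mood, then composition and extras.
SECTION_ORDER = [
    "subject", "setting",
    "medium", "style", "lighting", "color", "mood",
    "composition", "extra",
]


def build_prompt(keywords: dict[str, list[str]]) -> str:
    """Join keywords into one prompt, grouped by category in SECTION_ORDER."""
    parts: list[str] = []
    for section in SECTION_ORDER:
        parts.extend(keywords.get(section, []))
    return ", ".join(parts)


prompt = build_prompt({
    "style": ["fantasy"],
    "subject": ["1girl", "pink hair", "bob cut", "kimono"],
    "medium": ["painting"],
    "composition": ["from above"],
    "lighting": ["studio lighting"],
    "setting": ["milky way", "glittering stars"],
})
print(prompt)
```

Whatever order the categories are filled in, the subject descriptors always come out grouped together at the front of the prompt.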

Maximize the Quality of Your Stable Diffusion Images