Introduction

As AI-assisted painting continues to gain popularity, many users have asked for guidance on writing general prompts for anime-style AI-generated art. We have therefore compiled the relevant insights, and in this article we analyze the principles behind crafting effective prompts so that you can achieve the anime AI art you want.

Understanding the 'Element Triad' in AI Painting

AI painting can create many desired and beautiful pieces of art. However, many people do not understand why the AI often fails to produce the effect they want. The first requirement is control over the artwork. A well-known way of describing what causes the loss of control over the image is the ‘Element Triad.’

What are elements? An element can be understood as a tag (also called a prompt term), or as an object characterized by certain features.

The three parts of the triad are Element Overflow, Element Conflict, and Element Pollution.

Element Overflow occurs when an element appears in an unwanted place, creating erroneous elements or multiple instances of the same element incorrectly (often implying structural mistakes).

Element Conflict arises when certain elements can appear, but others conflict and either cannot appear or become ineffective (often referring to tags of the same nature).

Element Pollution refers to situations where certain elements overlay but do not completely cover other elements, or the colors of some elements bleed onto others.

Containers and Depth Perception in Artwork

In the context of art, a “container” can also be referred to as a space. It delineates a specific area within the picture plane for drawing, essentially organizing how space is distributed without yet defining any structure.


Currently, we can demarcate a drawing area for an object (commonly referred to as an item or thing) by using various positional or spatial tags.


For example, the tag “indoors” suggests a typical interior drawing, while “lake” implies an outdoor water scene.


The arrangement of figures and scenery affects the perception of the artwork. If a person is placed in front of a scene, the focus is on the portrait, while positioning the scene in front of the character turns the person into an element of the container or space, integrating the figure into an environmental scene.


For instance, using the tags “indoors,” “against window,” “sitting on chair,” and “1girl” could generate an image of a girl sitting on a chair by a window, inside a room.

Sitting girl anime image generated by ZMO

Attention to Tags and the Importance of Visual Hierarchy

Next, pay attention to the standardization of tags. Tags do not need to follow full natural-language syntax: adverbials are typically placed first, and articles and quantifiers can be omitted. For example, dropping the definite article “the” still satisfies the requirements of AI-driven painting while reducing the word count.


To effectively create environmental art, an understanding of visual depth is necessary—although AI might not inherently understand what this entails. Depicting a scene with a sense of visual hierarchy is a fundamental principle of drawing.


The scene can be divided into background, characters, and foreground, which are layered sequentially to prevent element overflow. For instance, to have a character holding a bottle, the “bottle” tag should be placed after the “person” tag. If the bottle appears first and then the person, the character may end up inside the bottle. This illustrates a principle that holds in portraiture, landscape painting, or any other style: secondary objects should appear after the primary subjects.
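The ordering rule can be checked mechanically before a prompt is submitted. The sketch below is only an illustration: the function name `misplaced_objects` and the two tag sets are hypothetical, and it assumes the prompt has already been split into a list of tags.

```python
def misplaced_objects(tags, subject_tags, object_tags):
    """Return object tags that appear before the first subject tag."""
    subject_positions = [i for i, t in enumerate(tags) if t in subject_tags]
    if not subject_positions:
        return []  # no primary subject in the prompt, nothing to compare against
    first_subject = min(subject_positions)
    return [t for t in tags[:first_subject] if t in object_tags]


SUBJECTS = {"1girl", "person"}  # hypothetical primary-subject tags
OBJECTS = {"bottle"}            # hypothetical secondary-object tags

bad = ["masterpiece", "bottle", "1girl", "holding bottle"]
good = ["masterpiece", "1girl", "holding bottle", "bottle"]

print(misplaced_objects(bad, SUBJECTS, OBJECTS))   # ['bottle']
print(misplaced_objects(good, SUBJECTS, OBJECTS))  # []
```

A non-empty result suggests moving the listed tags behind the subject before generating.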


Principle: The AI works by scanning from beginning to end and applying color in what is called a “step.” Elements that appear earlier have more control over the canvas. Those that appear later serve primarily to add details to the “container.” If there are too few elements, the AI automatically adds more, which could lead to extraneous parts like an extra leg or a face appearing on an object.


Similarly, if there are too many elements and the canvas size is small, or too few elements and the canvas size is large, element overflow may occur.

Understanding the Broad and Standard Syllogisms in AI Art

Many people have tried to address element overflow by structuring prompts according to what are known as the Broad and Standard Syllogistic Approaches.

Broad Syllogistic Approach

Linguistically, the description of an object might follow this format: “Subject, Definition, Details.”


In programming terms, this translates to “Object instantiation, Class definition, Attributes.”

Character Illustration:

  • A painting: What kind of painting is it, followed by specific details of the painting such as art style, quality, and camera angle.
  • A character: What kind of person is it, followed by specific details about the person (facial features, clothing, and actions).
  • A background: What kind of background is it, followed by specific details about the background and the objects within.

Landscape Illustration:

  • A painting: What kind of painting is it, followed by specific details of the painting such as art style, quality, and camera angle.
  • A background: What kind of background is it, followed by specific details about the background.
  • A character: What kind of person is it, followed by specific details about the person (facial features, clothing, and actions).
  • Objects in the background/foreground: What kind of object is it, followed by specific details of the object.

Items should be placed after characters to prevent the inadvertent generation of faces on objects. The above represents the broad syllogistic approach, which aligns well with the key points of manual painting.
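To make the “Subject, Definition, Details” pattern concrete, here is a minimal Python sketch of the two orderings listed above. The `Element` class, the `ORDERS` table, and the example tags are all made up for illustration; the only assumption about the generator is that it accepts a comma-separated tag string.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    subject: str     # which slot it fills, e.g. "painting" or "character"
    definition: str  # what kind of thing it is
    details: list = field(default_factory=list)

    def tags(self):
        # Definition first, then its details
        return [self.definition, *self.details]


# Slot order for each illustration type, as listed in the text
ORDERS = {
    "character": ["painting", "character", "background"],
    "landscape": ["painting", "background", "character", "objects"],
}


def broad_prompt(elements, kind):
    by_subject = {e.subject: e for e in elements}
    tags = []
    for name in ORDERS[kind]:
        if name in by_subject:  # missing slots are simply skipped
            tags.extend(by_subject[name].tags())
    return ", ".join(tags)


elements = [
    Element("character", "1girl", ["blue eyes", "white shirt", "standing"]),
    Element("background", "lake", ["mountains", "sunset"]),
    Element("painting", "anime illustration", ["best quality", "wide shot"]),
]

print(broad_prompt(elements, "character"))
print(broad_prompt(elements, "landscape"))
```

The same three elements produce two different prompts, differing only in whether the character or the background comes first.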

Standard Syllogistic Approach

  • Prefix (basic prefix + art style term + overall effect feature) – this pertains to the art quality, style, lighting, and camera angle.
  • Subject (the main subject matter of the canvas) – this could be the character, scenery, or architecture.
  • Scene (background, ambient items, foreground, special effects) – elements that complement and fill the image.

Adjusting the Syllogistic Approach

These theories are flexible, orderly, and follow a logical sequence. For example, to make a prompt look “stunning” through its special effects, one could place the effects at the beginning, ahead of the character, to highlight them, resulting in a more visually captivating image. Weight adjustments can also be made to control the intensity of the painting, which will be discussed later.
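The standard approach and this adjustment can be sketched as a single helper. Everything here is hypothetical (the function name, the `effects_first` switch, and the example tags); it only shows how moving the special effects ahead of the subject changes the tag order.

```python
def standard_prompt(prefix, subject, scene, effects=(), effects_first=False):
    """Prefix -> Subject -> Scene, with special effects optionally
    moved ahead of the subject to emphasize them."""
    if effects_first:
        sections = [prefix, effects, subject, scene]
    else:
        sections = [prefix, subject, scene, effects]
    return ", ".join(tag for section in sections for tag in section)


prefix = ["masterpiece", "best quality", "cinematic lighting"]
subject = ["1girl", "white hair"]
scene = ["night sky", "city street"]
effects = ["glowing particles"]

default_order = standard_prompt(prefix, subject, scene, effects)
effects_up_front = standard_prompt(prefix, subject, scene, effects,
                                   effects_first=True)
print(default_order)
print(effects_up_front)
```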


Once again, pay attention to filling the canvas, ensuring the size of the canvas is proportional to the number of elements.

Tag Ordering

Following the broad syllogistic approach, objects can be depicted in a regulated sequence from most to least significant: primary subject -> secondary subject -> tertiary subject.


Because the tags that appear first dominate the composition, subsequent tags act primarily to fill in details. This clarifies the relationship between character and scenery tags and explains why tag ordering is needed. The benefits include preventing tag swallowing (tag conflict), controlling layout, reducing element overflow (rich background elements may otherwise produce multiple faces), easier management, and enhanced stability.
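One way to apply the primary, secondary, tertiary sequence is to rank each tag and sort by rank. The sketch below is an assumption-laden illustration: the rank constants and the ranks given to the example tags are invented, and deciding which tag counts as primary remains a manual judgment.

```python
PRIMARY, SECONDARY, TERTIARY = 0, 1, 2  # hypothetical significance ranks


def order_tags(ranked_tags):
    """Sort (tag, rank) pairs from most to least significant.

    sorted() is stable, so tags sharing a rank keep their original order.
    """
    return [tag for tag, _ in sorted(ranked_tags, key=lambda pair: pair[1])]


ranked = [
    ("bottle", SECONDARY),
    ("1girl", PRIMARY),
    ("flowers", TERTIARY),
    ("white dress", PRIMARY),
    ("table", SECONDARY),
]

ordered = order_tags(ranked)
print(", ".join(ordered))  # 1girl, white dress, bottle, table, flowers
```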


Principle: AI scans from the beginning to the end and ‘paints’ in steps; elements that appear earlier control the composition, while later elements add detail. If there are not enough elements, AI automatically adds more, which might result in extraneous parts like an extra leg.


If the sequence of elements is incorrect (for instance, if objects precede characters), AI may later modify the prior elements to comply with tags, though achieving the “perfect” step is not always manageable.


Placing characters in front of the scene emphasizes them, whereas placing the scene in front integrates the characters, resulting in a type of environmental portrait.


You are not bound to follow these guidelines rigidly; they are presented to ensure controllable AI artwork. For instance, following these rules reduced instances of the “extra limbs” phenomenon in my work. However, if you enjoy randomness, feel free to add tags freely. This unpredictability can be part of the fun.

What Are Weight Adjustment and Hidden Properties of Tags

With the theory covered, it’s time to start using tags. Many newcomers are unaware of the function of brackets. Curly braces {} are the weighting symbols in Naifu, parentheses () are used for weighting in Web-UI, and square brackets [] diminish the weight in both. In Web-UI, each additional layer of parentheses multiplies the weight by 1.1, and each layer of [] divides it by 1.1 (a factor of about 0.91). Curly braces serve no purpose in Web-UI, while on the official site they are said to increase the weight by a factor of 1.05.


(prompt)


(prompt:weight multiplier)


The outer layer must always be parentheses and not any other brackets. For example, (white hair:1.1) increases the weight. In addition to adjusting the weight of entire terms, partial weight can also be applied to adjectives, as in (white:1.1) hair.
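The two forms above are easy to generate programmatically. This is a small sketch for the Web-UI parenthesis syntax only; the helper name `weighted` is hypothetical.

```python
def weighted(tag, weight=None):
    """Format a tag as (tag) or (tag:weight), Web-UI style."""
    if weight is None:
        return f"({tag})"
    # ":g" drops trailing zeros, so 1.10 is written as 1.1
    return f"({tag}:{weight:g})"


whole_term = weighted("white hair", 1.1)    # weight on the whole term
partial = weighted("white", 1.1) + " hair"  # weight on the adjective only
plain = weighted("white hair")              # one layer, no explicit number

print(whole_term)  # (white hair:1.1)
print(partial)     # (white:1.1) hair
print(plain)       # (white hair)
```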


Utilizing the weight multiplier can effectively control the weight. However, a high number of steps can deepen the weight, which suggests that a high number of steps (said to be around 120) can “retouch” the image. But an excessively high weight combined with a high number of steps may lead to an imbalance in the image.


Direct addition of parentheses has the following effect:


(prompt) === (prompt:1.1)


((prompt)) === (prompt:1.21) …


The same principle applies to diminishing the weight.
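The equivalences above follow from multiplying by 1.1 once per layer. As a rough sketch, the function below counts the layers around a token and returns the implied multiplier. It assumes the Web-UI factor of 1.1, and it does not handle the explicit `(prompt:1.2)` form, curly braces, or mixed bracket types; the name `bracket_weight` is made up for this example.

```python
def bracket_weight(token, factor=1.1):
    """Return (inner_text, multiplier) for a token wrapped in () or [] layers."""
    layers = 0
    text = token.strip()
    # Peel one matching pair of brackets per iteration
    while len(text) >= 2 and text[0] + text[-1] in ("()", "[]"):
        layers += 1 if text[0] == "(" else -1
        text = text[1:-1].strip()
    return text, factor ** layers


for token in ["prompt", "(prompt)", "((prompt))", "[prompt]"]:
    inner, multiplier = bracket_weight(token)
    print(f"{token:12} -> {inner} x {multiplier:.2f}")
```

Rounded to two decimals, the multipliers are 1.00, 1.10, 1.21, and 0.91, matching the values quoted in the text.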

weight function sample image

masterpiece, an extremely cute and beautiful girl, highly detailed beautiful face and red eyes,(jean shoes:1.2) cute, (evil smile:1.2), multicolored hair, very short hair, animal ears, wolf ears,child, colorful jacket, neon colors, punk rock, shorts, piercing, full body, paw pose, (cat paw gloves:1.3)
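To see at a glance which elements a prompt like this emphasizes, the explicit weights can be pulled out with a regular expression. The sketch below is illustrative only: it recognizes just the `(text:number)` form, ignores nested or bare parentheses, and runs on a shortened copy of the sample prompt.

```python
import re

# Matches "(some text:1.2)"; the text may not contain parentheses or colons
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def explicit_weights(prompt):
    """Map each explicitly weighted term to its multiplier."""
    return {m.group(1).strip(): float(m.group(2))
            for m in WEIGHTED.finditer(prompt)}


sample = ("masterpiece, an extremely cute and beautiful girl, "
          "(jean shoes:1.2) cute, (evil smile:1.2), wolf ears, "
          "(cat paw gloves:1.3)")

weights = explicit_weights(sample)
for term, weight in weights.items():
    print(f"{term}: {weight}")
```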

In mathematical terms, one layer of weighting is a function f(x), and adding another layer gives f(f(x)). More generally, the order in which tags are applied affects the generated image, so combining tags behaves like function composition, which is not commutative: f(g(x)) is not necessarily equal to g(f(x)). The outcome therefore depends on both the order and the combination of tags.

We can adjust tag weights to make certain elements appear more prominently, somewhat addressing the issue of element conflict. This is a standard technique employed by many users.

As for hidden properties of tags, what does this mean? Hidden properties of tags refer to the inherent characteristics that certain tags bring with them, i.e., attributes that we do not need to assign because they are already implied.

For instance:

  • The tag “maid” might inherently clothe the character in a maid outfit, attributing the elements of that role to the figure.
  • The tag “elf” might automatically convey attributes like pointed ears and an affinity for green forests.

Entities with inherent attributes carry these hidden properties, which can be overridden via element conflict with other properties. Hidden properties can be advantageous or disadvantageous. Typically, we might use a tag with fewer hidden properties like “1girl” for a more DIY approach. By appending art style tags, artist names, or specific anime character tags, one can customize the desired character traits. If one desires to emulate the style of a particular game or comic series, incorporating the name of the game company, comic artist, or fan-creator into the prompt can be effective. For example, to adopt Fortnite’s style, one might append “by Epic Games” to the prompt; for a Mario-style, “art by Nintendo.”


If the proportions of the generated portrait differ from expectations, specifying terms like “medium shot” or “close-up” can adjust the perceived camera focus and composition. Should there be unwanted NSFW content in a generated image, adjusting the negative prompt can restrict the output, for example, adding a negative prompt: “lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, nsfw, sexual, nude, sex, 18+, naked, porn, dick, vagina, naked person, explicit content, uncensored, fuck, nipples, breasts, areola,” to mitigate or restrict the generation of such content.
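Long negative prompts are easier to maintain when they are assembled from smaller groups. The following sketch merges groups of negative tags and drops duplicates; the function name and the grouping into `QUALITY` and `SAFETY` are assumptions for illustration, and the tags are a subset of those quoted above.

```python
def build_negative_prompt(*groups):
    """Merge tag groups into one comma-separated string without duplicates."""
    seen = set()
    merged = []
    for group in groups:
        for tag in group:
            key = tag.strip().lower()
            if key and key not in seen:  # keep the first occurrence only
                seen.add(key)
                merged.append(tag.strip())
    return ", ".join(merged)


QUALITY = ["lowres", "bad anatomy", "bad hands", "worst quality",
           "low quality", "blurry"]
SAFETY = ["nsfw", "nude", "explicit content", "blurry"]  # "blurry" repeats

negative = build_negative_prompt(QUALITY, SAFETY)
print(negative)
```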

What is Isolated Rendering (Short, Medium, Long Prompts)

Isolated rendering refers to binding attributes to their objects by writing elements as short, medium, or long units within the prompt. There are typically three forms of generation: the most common direct method, the less common short sentence method, and the long, descriptive method akin to natural language.

Assume the goal is to generate an image of a beautiful girl with blue eyes, blonde hair, a white shirt, a pink skirt, and white knee-high socks, standing. Direct generation often leads to problems like color pollution.

Direct Generation (Pitch Style Generation):

masterpiece, best quality, full body ,standing , 1 girl , (blue eyes),(gold hair),(white clothes),(pink skirt),(white kneehighs), (Magic journey anime filter)

masterpiece best quality full body standing 1 girl blue eyes gold hair white clothes pink skirt white kneehighs

Short Sentence (AND Emphasis):

masterpiece, best quality, full body ,standing , 1 girl , (blue eyes) AND (gold hair),(white clothes) AND (pink skirt) AND (white kneehighs),

masterpiece best quality full body standing 1 girl blue eyes AND gold hair white clothes AND pink skirt AND white kneehighs

Long Full Sentence (Natural Tone):

masterpiece, best quality, full body ,sitting , (1 girl with  blue eyes and gold hair wearing white clothes and pink skirt with white kneehighs),

masterpiece best quality full body sitting 1 girl with blue eyes and gold hair wearing white clothes and pink skirt with white kneehighs

In the short sentence method, an uppercase “AND” is placed between attributes to bind properties to their objects. This specialized syntax emphasizes the association of attributes with their respective objects, bringing a degree of stability to the result. Note that not all sampling algorithms support this syntax; DDIM is one example where the “AND” construction may lead to errors.


Long prompt generation uses natural-language constructs such as prepositions, present participles, and conjunctions to attach properties to objects seamlessly, which facilitates isolated rendering. Constructions such as “object with attribute,” or a participle such as “wearing” followed by a noun, bind attributes to their objects and enhance the stability of tags.


Direct generation is often less successful in accurately binding colors to elements. With the short sentence method, the likelihood of visual imbalances is reduced, while the long prompt approach effectively mitigates errors in element-to-color assignments.
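The three styles can be produced from one shared attribute list, which makes it easier to compare them on the same subject. This is a sketch under assumptions: the constant and function names are hypothetical, the short-sentence variant joins every attribute with AND (the example prompt above mixes commas and AND), and whether a given front end or sampler honors AND is outside its scope.

```python
BASE = ["masterpiece", "best quality", "full body", "standing"]
SUBJECT = "1 girl"
FEATURES = ["blue eyes", "gold hair"]
OUTFIT = ["white clothes", "pink skirt"]
LEGWEAR = "white kneehighs"


def direct_style():
    """Every attribute as its own parenthesized tag."""
    attrs = FEATURES + OUTFIT + [LEGWEAR]
    return ", ".join(BASE + [SUBJECT] + [f"({a})" for a in attrs])


def and_style():
    """Attributes chained with an uppercase AND."""
    attrs = FEATURES + OUTFIT + [LEGWEAR]
    return (", ".join(BASE + [SUBJECT]) + ", "
            + " AND ".join(f"({a})" for a in attrs))


def sentence_style():
    """One natural-language phrase that binds attributes to the subject."""
    sentence = (f"{SUBJECT} with {' and '.join(FEATURES)} "
                f"wearing {' and '.join(OUTFIT)} with {LEGWEAR}")
    return ", ".join(BASE + [f"({sentence})"])


print(direct_style())
print(and_style())
print(sentence_style())
```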


Overall, element pollution remains a significant challenge in AI-generated art with multiple characters. Through tag sequencing, step-by-step weighting, and the application of natural language structuring, one can effectively reduce issues such as element overflow, conflict, and pollution. These are essential techniques for artists and AI enthusiasts to navigate the complexities of AI art generation effectively.

Conclusion

In summary, there are three main points. First, use extended descriptions in single sentences. Second, arrange tags in order, adding stepwise cycling as needed. Third, tags follow a linear priority, but this priority does not hold across the entire prompt: the latter half of the prompt is usually standardized in strength, so placing tags that require flexible weight control towards the end, where they can be adjusted collectively, is an effective strategy.

If you find the discussion on prompt structure somewhat complex, ZMO has compiled some straightforward and user-friendly prompt structures to aid in the generation of artwork. These structures succinctly encapsulate the main points from the content above. Additionally, a list of prompt keyword suggestions is provided to facilitate better outcomes in your AI-driven art generation.

Core Structure of the Prompt:
1. Modifier + Subject + Detailed Description (facial features, clothing style – e.g., hanfu, action or state, environmental setting) + Style Description
– Style Description:
  – Specific movie styles or company brands – Pixar, Disney, Zootopia, XX Artstation
  – Image specific category/style: 3D animation/cartoon, etc.
  – Keywords that enhance image effects: super realistic, super clear details, extremely detailed, 8K, etc.
  – Lighting features: cinematic lighting, volumetric lighting, etc.
2. Modifier + Subject + Detailed Description + “in the style of” + Style Description
3. Random Style Description + Style Description “of” + Subject + Detailed Description
4. Utilizing professional terminology, representative works, and famous artists for reference will significantly improve results.
5. Sometimes, simple prompts can also be quite effective.
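Structures 1 to 3 can be written as small template functions. The sketch below is only one possible reading: the function names, the separators between parts, and the example values are assumptions, not a format prescribed by any tool.

```python
def structure_1(modifier, subject, details, style):
    """Modifier + Subject + Detailed Description + Style Description."""
    return f"{modifier} {subject}, {details}, {style}"


def structure_2(modifier, subject, details, style):
    """Same parts, with the style introduced by 'in the style of'."""
    return f"{modifier} {subject}, {details}, in the style of {style}"


def structure_3(random_style, style, subject, details):
    """Style descriptions first, joined to the subject with 'of'."""
    return f"{random_style} {style} of {subject}, {details}"


modifier = "a portrait of"
subject = "a young woman"
details = "wearing hanfu, standing in a garden"

print(structure_1(modifier, subject, details, "anime style, cinematic lighting"))
print(structure_2(modifier, subject, details, "Pixar"))
print(structure_3("vivid", "3D animation", subject, "wearing hanfu"))
```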

Keyword Descriptions:
1. For modifying the subject:
– Private customization, Designer customization, product shots, An ultra-high definition professional studio quality photograph of, Full view of a landscape, a portrait of, etc.
2. Adjectives:
– Luxury, elegant, intricate, realistic, beautiful, discrete, professional, etc.
3. Lighting Effects:
– Octane render, cinematic dynamic background, cinematic lighting, volumetric lighting, golden light, light rays, dramatic lighting, volumetric shadows, studio lighting, etc.

General Keyword Prompts:
– For High-Definition Keywords:
  – “highly detailed xxx”, “high-quality xxx”, “8k”, “extremely detailed xxx”, “hd”, etc.
– Corresponding Keywords for Japanese Anime Style:
  – “anime style”, “manga style”, “Japanese comics style”, “anime art”, etc.
– Keywords for American Comic Style:
  – “American comics”, “American comics style”, etc.
– Keywords for 3D Pixel Style:
  – “voxel art”, “voxel render”, etc.
– Keywords for Environmental Light Effects:
  – “cinematic lighting”, “warm natural lighting”, “stage lighting”, “radiant lighting”, “volumetric lighting”, etc.
– Keywords for the Background:
  – “intricate background”, “fantasy scene”, “colorful scene”, etc.
– Keywords for Attractiveness:
  – “handsome”, “beautiful face”, “cute”, “Exquisite features”, “good-looking”, “perfect face”, “elegant”, etc.
– Keywords for Clothing:
  – “business suit”, “silk dress”, “intricately detailed outfit”, etc.
– Keywords for Portraiture:
  – “character portrait”, “upper body portrait”, “frontage head portrait”, “avatar”, “fantasy cloth”, etc.
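A keyword list like this can be kept in a small lookup table and appended to a base prompt on demand. The sketch below is hypothetical throughout: the category names, the `add_keywords` helper, and the choice to take the first few keywords of each category are illustrative, and the table holds only a subset of the keywords above.

```python
KEYWORDS = {
    "high_definition": ["8k", "hd"],
    "anime_style": ["anime style", "manga style", "anime art"],
    "lighting": ["cinematic lighting", "warm natural lighting",
                 "volumetric lighting"],
    "background": ["intricate background", "fantasy scene", "colorful scene"],
}


def add_keywords(base_prompt, categories, per_category=1):
    """Append the first `per_category` keywords of each chosen category."""
    extra = []
    for name in categories:
        extra.extend(KEYWORDS[name][:per_category])
    return ", ".join([base_prompt, *extra])


result = add_keywords("1girl, white hair", ["anime_style", "lighting"],
                      per_category=2)
print(result)
```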

These structures and keyword prompts are designed to streamline the AI art generation process, making it more accessible and yielding better-aligned results with the artist’s intent.