Merge branch 'master' into feature/scale_to

dc1432e0 · AUTOMATIC1111 · GitHub · 1d64976d · ca5efc31 · dc1432e0
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -2,7 +2,7 @@
 name: Feature request
 about: Suggest an idea for this project
 title: ''
-labels: ''
+labels: 'suggestion'
 assignees: ''

 ---

--- a/.github/PULL_REQUEST_TEMPLATE/pull_request_template.md
+++ b/.github/PULL_REQUEST_TEMPLATE/pull_request_template.md
+# Please read the [contributing wiki page](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Contributing) before submitting a pull request!
+
+If you have a large change, pay special attention to this paragraph:
+
+> Before making changes, if you think that your feature will result in more than 100 lines changing, find me and talk to me about the feature you are proposing. It pains me to reject the hard work someone else did, but I won't add everything to the repo, and it's better if the rejection happens before you have to waste time working on the feature.
+
+Otherwise, after making sure you're following the rules described in wiki page, remove this section and continue on.
+
+**Describe what this pull request is trying to achieve.**
+
+A clear and concise description of what you're trying to accomplish with this, so your intent doesn't have to be extracted from your code.
+
+**Additional notes and description of your changes**
+
+More technical discussion about your changes go here, plus anything that a maintainer might have to specifically take a look at, or be wary of.
+
+**Environment this was tested in**
+
+List the environment you have developed / tested this on. As per the contributing page, changes should be able to work on Windows out of the box.
+ - OS: [e.g. Windows, Linux]
+ - Browser [e.g. chrome, safari]
+ - Graphics card [e.g. NVIDIA RTX 2080 8GB, AMD RX 6600 8GB]
+
+**Screenshots or videos of your changes**
+
+If applicable, screenshots or a video showing off your changes. If it edits an existing UI, it should ideally contain a comparison of what used to be there, before your changes were made.
+
+This is **required** for anything that touches the user interface.
\ No newline at end of file
--- a/CODEOWNERS
+++ b/CODEOWNERS
+*       @AUTOMATIC1111
--- a/README.md
+++ b/README.md
@@ -28,10 +28,12 @@ Check the [custom scripts](https://github.com/AUTOMATIC1111/stable-diffusion-web
    - CodeFormer, face restoration tool as an alternative to GFPGAN
    - RealESRGAN, neural network upscaler
    - ESRGAN, neural network upscaler with a lot of third party models
-    - SwinIR, neural network upscaler
+    - SwinIR and Swin2SR([see here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2092)), neural network upscalers
    - LDSR, Latent diffusion super resolution upscaling
 - Resizing aspect ratio options
 - Sampling method selection
+    - Adjust sampler eta values (noise multiplier)
+    - More advanced noise setting options
 - Interrupt processing at any time
 - 4GB video card support (also reports of 2GB working)
 - Correct seeds for batches
@@ -67,6 +69,7 @@ Check the [custom scripts](https://github.com/AUTOMATIC1111/stable-diffusion-web
     - also supports weights for prompts: `a cat :1.2 AND a dog AND a penguin :2.2`
 - No token limit for prompts (original stable diffusion lets you use up to 75 tokens)
 - DeepDanbooru integration, creates danbooru style tags for anime prompts (add --deepdanbooru to commandline args)
+- [xformers](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Xformers), major speed increase for select cards: (add --xformers to commandline args)

 ## Installation and Running
 Make sure the required [dependencies](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Dependencies) are met and follow the instructions available for both [NVidia](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs) (recommended) and [AMD](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs) GPUs.
@@ -116,13 +119,17 @@ The documentation was moved from this README over to the project's [wiki](https:
 - CodeFormer - https://github.com/sczhou/CodeFormer
 - ESRGAN - https://github.com/xinntao/ESRGAN
 - SwinIR - https://github.com/JingyunLiang/SwinIR
+- Swin2SR - https://github.com/mv-lab/swin2sr
 - LDSR - https://github.com/Hafiidz/latent-diffusion
 - Ideas for optimizations - https://github.com/basujindal/stable-diffusion
 - Doggettx - Cross Attention layer optimization - https://github.com/Doggettx/stable-diffusion, original idea for prompt editing.
+- InvokeAI, lstein - Cross Attention layer optimization - https://github.com/invoke-ai/InvokeAI (originally http://github.com/lstein/stable-diffusion)
 - Rinon Gal - Textual Inversion - https://github.com/rinongal/textual_inversion (we're not using his code, but we are using his ideas).
 - Idea for SD upscale - https://github.com/jquesnelle/txt2imghd
 - Noise generation for outpainting mk2 - https://github.com/parlance-zz/g-diffuser-bot
 - CLIP interrogator idea and borrowing some code - https://github.com/pharmapsychotic/clip-interrogator
+- Idea for Composable Diffusion - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
+- xformers - https://github.com/facebookresearch/xformers
+- DeepDanbooru - interrogator for anime diffusers https://github.com/KichangKim/DeepDanbooru
 - Initial Gradio script - posted on 4chan by an Anonymous user. Thank you Anonymous user.
- DeepDanbooru - interrogator for anime diffusors https://github.com/KichangKim/DeepDanbooru
 - (You)
--- a/environment-wsl2.yaml
+++ b/environment-wsl2.yaml
@@ -3,9 +3,9 @@ channels:
  - pytorch
  - defaults
 dependencies:
-  - python=3.8.5
-  - pip=20.3
+  - python=3.10
+  - pip=22.2.2
  - cudatoolkit=11.3
-  - pytorch=1.11.0
-  - torchvision=0.12.0
-  - numpy=1.19.2
+  - pytorch=1.12.1
+  - torchvision=0.13.1
+  - numpy=1.23.1
\ No newline at end of file
--- a/javascript/contextMenus.js
+++ b/javascript/contextMenus.js
@@ -16,7 +16,7 @@ contextMenuInit = function(){
      oldMenu.remove()
    }

-    let tabButton = gradioApp().querySelector('button')
+    let tabButton = uiCurrentTab
    let baseStyle = window.getComputedStyle(tabButton)

    const contextMenu = document.createElement('nav')
@@ -123,48 +123,53 @@ contextMenuInit = function(){
  return [appendContextMenuOption, removeContextMenuOption, addContextMenuEventListener]
 }

-initResponse = contextMenuInit()
-appendContextMenuOption     = initResponse[0]
-removeContextMenuOption     = initResponse[1]
-addContextMenuEventListener = initResponse[2]
+initResponse = contextMenuInit();
+appendContextMenuOption     = initResponse[0];
+removeContextMenuOption     = initResponse[1];
+addContextMenuEventListener = initResponse[2];

-
-//Start example Context Menu Items
-generateOnRepeatId = appendContextMenuOption('#txt2img_generate','Generate forever',function(){
-  let genbutton = gradioApp().querySelector('#txt2img_generate');
-  let interruptbutton = gradioApp().querySelector('#txt2img_interrupt');
-  if(!interruptbutton.offsetParent){
-    genbutton.click();
-  }
-  clearInterval(window.generateOnRepeatInterval)
-  window.generateOnRepeatInterval = setInterval(function(){
+(function(){
+  //Start example Context Menu Items
+  let generateOnRepeat = function(genbuttonid,interruptbuttonid){
+    let genbutton = gradioApp().querySelector(genbuttonid);
+    let interruptbutton = gradioApp().querySelector(interruptbuttonid);
    if(!interruptbutton.offsetParent){
      genbutton.click();
    }
-  },
-  500)}
-)
-
-cancelGenerateForever = function(){ 
-  clearInterval(window.generateOnRepeatInterval) 
-  let interruptbutton = gradioApp().querySelector('#txt2img_interrupt');
-  if(interruptbutton.offsetParent){
-      interruptbutton.click();
+    clearInterval(window.generateOnRepeatInterval)
+    window.generateOnRepeatInterval = setInterval(function(){
+      if(!interruptbutton.offsetParent){
+        genbutton.click();
+      }
+    },
+    500)
  }
-}

-appendContextMenuOption('#txt2img_interrupt','Cancel generate forever',cancelGenerateForever)
-appendContextMenuOption('#txt2img_generate', 'Cancel generate forever',cancelGenerateForever)
+  appendContextMenuOption('#txt2img_generate','Generate forever',function(){
+    generateOnRepeat('#txt2img_generate','#txt2img_interrupt');
+  })
+  appendContextMenuOption('#img2img_generate','Generate forever',function(){
+    generateOnRepeat('#img2img_generate','#img2img_interrupt');
+  })

-
-appendContextMenuOption('#roll','Roll three',
-  function(){ 
-    let rollbutton = gradioApp().querySelector('#roll');
-    setTimeout(function(){rollbutton.click()},100)
-    setTimeout(function(){rollbutton.click()},200)
-    setTimeout(function(){rollbutton.click()},300)
+  let cancelGenerateForever = function(){ 
+    clearInterval(window.generateOnRepeatInterval) 
  }
-)
+
+  appendContextMenuOption('#txt2img_interrupt','Cancel generate forever',cancelGenerateForever)
+  appendContextMenuOption('#txt2img_generate', 'Cancel generate forever',cancelGenerateForever)
+  appendContextMenuOption('#img2img_interrupt','Cancel generate forever',cancelGenerateForever)
+  appendContextMenuOption('#img2img_generate', 'Cancel generate forever',cancelGenerateForever)
+
+  appendContextMenuOption('#roll','Roll three',
+    function(){ 
+      let rollbutton = get_uiCurrentTabContent().querySelector('#roll');
+      setTimeout(function(){rollbutton.click()},100)
+      setTimeout(function(){rollbutton.click()},200)
+      setTimeout(function(){rollbutton.click()},300)
+    }
+  )
+})();
 //End example Context Menu Items

 onUiUpdate(function(){

--- a/javascript/edit-attention.js
+++ b/javascript/edit-attention.js
@@ -25,6 +25,7 @@ addEventListener('keydown', (event) => {
 	} else {
 		end = target.value.slice(selectionEnd + 1).indexOf(")") + 1;
 		weight = parseFloat(target.value.slice(selectionEnd + 1, selectionEnd + 1 + end));
+		if (isNaN(weight)) return;
 		if (event.key == minus) weight -= 0.1;
 		if (event.key == plus) weight += 0.1;

@@ -38,4 +39,7 @@ addEventListener('keydown', (event) => {
 		target.selectionStart = selectionStart;
 		target.selectionEnd = selectionEnd;
 	}
+	// Since we've modified a Gradio Textbox component manually, we need to simulate an `input` DOM event to ensure its
+	// internal Svelte data binding remains in sync.
+	target.dispatchEvent(new Event("input", { bubbles: true }));
 });
--- a/javascript/hints.js
+++ b/javascript/hints.js
@@ -79,6 +79,8 @@ titles = {
    "Highres. fix": "Use a two step process to partially create an image at smaller resolution, upscale, and then improve details in it without changing composition",
    "Scale latent": "Uscale the image in latent space. Alternative is to produce the full image from latent representation, upscale that, and then move it back to latent space.",

+    "Eta noise seed delta": "If this values is non-zero, it will be added to seed and used to initialize RNG for noises when using samplers with Eta. You can use this to produce even more variation of images, or you can use this to match images of other software if you know what you are doing.",
+    "Do not add watermark to images": "If this option is enabled, watermark will not be added to created images. Warning: if you do not add watermark, you may be behaving in an unethical manner.",
 }



--- a/launch.py
+++ b/launch.py
@@ -104,6 +104,7 @@ def prepare_enviroment():
    args, skip_torch_cuda_test = extract_arg(args, '--skip-torch-cuda-test')
    xformers = '--xformers' in args
    deepdanbooru = '--deepdanbooru' in args
+    ngrok = '--ngrok' in args

    try:
        commit = run(f"{git} rev-parse HEAD").strip()
@@ -127,13 +128,16 @@ def prepare_enviroment():

    if not is_installed("xformers") and xformers and platform.python_version().startswith("3.10"):
        if platform.system() == "Windows":
-            run_pip("install https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/a/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl", "xformers")
+            run_pip("install https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/c/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl", "xformers")
        elif platform.system() == "Linux":
            run_pip("install xformers", "xformers")

    if not is_installed("deepdanbooru") and deepdanbooru:
        run_pip("install git+https://github.com/KichangKim/DeepDanbooru.git@edf73df4cdaeea2cf00e9ac08bd8a9026b7a7b26#egg=deepdanbooru[tensorflow] tensorflow==2.10.0 tensorflow-io==0.27.0", "deepdanbooru")

+    if not is_installed("pyngrok") and ngrok:
+        run_pip("install pyngrok", "ngrok")
+
    os.makedirs(dir_repos, exist_ok=True)

    git_clone("https://github.com/CompVis/stable-diffusion.git", repo_dir('stable-diffusion'), "Stable Diffusion", stable_diffusion_commit_hash)

--- a/modules/deepbooru.py
+++ b/modules/deepbooru.py
 import os.path
 from concurrent.futures import ProcessPoolExecutor
-from multiprocessing import get_context
+import multiprocessing
+import time

+def get_deepbooru_tags(pil_image):
+    """
+    This method is for running only one image at a time for simple use.  Used to the img2img interrogate.
+    """
+    from modules import shared  # prevents circular reference
+    create_deepbooru_process(shared.opts.interrogate_deepbooru_score_threshold, shared.opts.deepbooru_sort_alpha)
+    shared.deepbooru_process_return["value"] = -1
+    shared.deepbooru_process_queue.put(pil_image)
+    while shared.deepbooru_process_return["value"] == -1:
+        time.sleep(0.2)
+    tags = shared.deepbooru_process_return["value"]
+    release_process()
+    return tags

-def _load_tf_and_return_tags(pil_image, threshold):
+
+def deepbooru_process(queue, deepbooru_process_return, threshold, alpha_sort):
+    model, tags = get_deepbooru_tags_model()
+    while True: # while process is running, keep monitoring queue for new image
+        pil_image = queue.get()
+        if pil_image == "QUIT":
+            break
+        else:
+            deepbooru_process_return["value"] = get_deepbooru_tags_from_model(model, tags, pil_image, threshold, alpha_sort)
+
+
+def create_deepbooru_process(threshold, alpha_sort):
+    """
+    Creates deepbooru process.  A queue is created to send images into the process.  This enables multiple images
+    to be processed in a row without reloading the model or creating a new process.  To return the data, a shared
+    dictionary is created to hold the tags created.  To wait for tags to be returned, a value of -1 is assigned
+    to the dictionary and the method adding the image to the queue should wait for this value to be updated with
+    the tags.
+    """
+    from modules import shared  # prevents circular reference
+    shared.deepbooru_process_manager = multiprocessing.Manager()
+    shared.deepbooru_process_queue = shared.deepbooru_process_manager.Queue()
+    shared.deepbooru_process_return = shared.deepbooru_process_manager.dict()
+    shared.deepbooru_process_return["value"] = -1
+    shared.deepbooru_process = multiprocessing.Process(target=deepbooru_process, args=(shared.deepbooru_process_queue, shared.deepbooru_process_return, threshold, alpha_sort))
+    shared.deepbooru_process.start()
+
+
+def release_process():
+    """
+    Stops the deepbooru process to return used memory
+    """
+    from modules import shared  # prevents circular reference
+    shared.deepbooru_process_queue.put("QUIT")
+    shared.deepbooru_process.join()
+    shared.deepbooru_process_queue = None
+    shared.deepbooru_process = None
+    shared.deepbooru_process_return = None
+    shared.deepbooru_process_manager = None
+
+def get_deepbooru_tags_model():
    import deepdanbooru as dd
    import tensorflow as tf
    import numpy as np
-
    this_folder = os.path.dirname(__file__)
    model_path = os.path.abspath(os.path.join(this_folder, '..', 'models', 'deepbooru'))
    if not os.path.exists(os.path.join(model_path, 'project.json')):
        # there is no point importing these every time
        import zipfile
        from basicsr.utils.download_util import load_file_from_url
-        load_file_from_url(r"https://github.com/KichangKim/DeepDanbooru/releases/download/v3-20211112-sgd-e28/deepdanbooru-v3-20211112-sgd-e28.zip",
-                           model_path)
+        load_file_from_url(
+            r"https://github.com/KichangKim/DeepDanbooru/releases/download/v3-20211112-sgd-e28/deepdanbooru-v3-20211112-sgd-e28.zip",
+            model_path)
        with zipfile.ZipFile(os.path.join(model_path, "deepdanbooru-v3-20211112-sgd-e28.zip"), "r") as zip_ref:
            zip_ref.extractall(model_path)
        os.remove(os.path.join(model_path, "deepdanbooru-v3-20211112-sgd-e28.zip"))
@@ -24,7 +78,13 @@ def _load_tf_and_return_tags(pil_image, threshold):
    model = dd.project.load_model_from_project(
        model_path, compile_model=True
    )
+    return model, tags
+

+def get_deepbooru_tags_from_model(model, tags, pil_image, threshold, alpha_sort):
+    import deepdanbooru as dd
+    import tensorflow as tf
+    import numpy as np
    width = model.input_shape[2]
    height = model.input_shape[1]
    image = np.array(pil_image)
@@ -46,28 +106,27 @@ def _load_tf_and_return_tags(pil_image, threshold):

    for i, tag in enumerate(tags):
        result_dict[tag] = y[i]
-    result_tags_out = []
+
+    unsorted_tags_in_theshold = []
    result_tags_print = []
    for tag in tags:
        if result_dict[tag] >= threshold:
            if tag.startswith("rating:"):
                continue
-            result_tags_out.append(tag)
+            unsorted_tags_in_theshold.append((result_dict[tag], tag))
            result_tags_print.append(f'{result_dict[tag]} {tag}')

-    print('\n'.join(sorted(result_tags_print, reverse=True)))
-
-    return ', '.join(result_tags_out).replace('_', ' ').replace(':', ' ')
-
+    # sort tags
+    result_tags_out = []
+    sort_ndx = 0
+    if alpha_sort:
+        sort_ndx = 1

-def subprocess_init_no_cuda():
-    import os
-    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
+    # sort by reverse by likelihood and normal for alpha
+    unsorted_tags_in_theshold.sort(key=lambda y: y[sort_ndx], reverse=(not alpha_sort))
+    for weight, tag in unsorted_tags_in_theshold:
+        result_tags_out.append(tag)

+    print('\n'.join(sorted(result_tags_print, reverse=True)))

-def get_deepbooru_tags(pil_image, threshold=0.5):
-    context = get_context('spawn')
-    with ProcessPoolExecutor(initializer=subprocess_init_no_cuda, mp_context=context) as executor:
-        f = executor.submit(_load_tf_and_return_tags, pil_image, threshold, )
-        ret = f.result()  # will rethrow any exceptions
-    return ret
\ No newline at end of file
+    return ', '.join(result_tags_out).replace('_', ' ').replace(':', ' ')
--- a/modules/devices.py
+++ b/modules/devices.py
@@ -36,6 +36,7 @@ errors.run(enable_tf32, "Enabling TF32")

 device = device_gfpgan = device_bsrgan = device_esrgan = device_scunet = device_codeformer = get_optimal_device()
 dtype = torch.float16
+dtype_vae = torch.float16

 def randn(seed, shape):
    # Pytorch currently doesn't handle setting randomness correctly when the metal backend is used.
@@ -59,9 +60,12 @@ def randn_without_seed(shape):
    return torch.randn(shape, device=device)


-def autocast():
+def autocast(disable=False):
    from modules import shared

+    if disable:
+        return contextlib.nullcontext()
+
    if dtype == torch.float32 or shared.cmd_opts.precision == "full":
        return contextlib.nullcontext()


--- a/modules/hypernetwork.py
+++ b/modules/hypernetwork.py
-import glob
-import os
-import sys
-import traceback
-
-import torch
-
-from ldm.util import default
-from modules import devices, shared
-import torch
-from torch import einsum
-from einops import rearrange, repeat
-
-
-class HypernetworkModule(torch.nn.Module):
-    def __init__(self, dim, state_dict):
-        super().__init__()
-
-        self.linear1 = torch.nn.Linear(dim, dim * 2)
-        self.linear2 = torch.nn.Linear(dim * 2, dim)
-
-        self.load_state_dict(state_dict, strict=True)
-        self.to(devices.device)
-
-    def forward(self, x):
-        return x + (self.linear2(self.linear1(x)))
-
-
-class Hypernetwork:
-    filename = None
-    name = None
-
-    def __init__(self, filename):
-        self.filename = filename
-        self.name = os.path.splitext(os.path.basename(filename))[0]
-        self.layers = {}
-
-        state_dict = torch.load(filename, map_location='cpu')
-        for size, sd in state_dict.items():
-            self.layers[size] = (HypernetworkModule(size, sd[0]), HypernetworkModule(size, sd[1]))
-
-
-def list_hypernetworks(path):
-    res = {}
-    for filename in glob.iglob(os.path.join(path, '**/*.pt'), recursive=True):
-        name = os.path.splitext(os.path.basename(filename))[0]
-        res[name] = filename
-    return res
-
-
-def load_hypernetwork(filename):
-    path = shared.hypernetworks.get(filename, None)
-    if path is not None:
-        print(f"Loading hypernetwork {filename}")
-        try:
-            shared.loaded_hypernetwork = Hypernetwork(path)
-        except Exception:
-            print(f"Error loading hypernetwork {path}", file=sys.stderr)
-            print(traceback.format_exc(), file=sys.stderr)
-    else:
-        if shared.loaded_hypernetwork is not None:
-            print(f"Unloading hypernetwork")
-
-        shared.loaded_hypernetwork = None
-
-
-def attention_CrossAttention_forward(self, x, context=None, mask=None):
-    h = self.heads
-
-    q = self.to_q(x)
-    context = default(context, x)
-
-    hypernetwork = shared.loaded_hypernetwork
-    hypernetwork_layers = (hypernetwork.layers if hypernetwork is not None else {}).get(context.shape[2], None)
-
-    if hypernetwork_layers is not None:
-        k = self.to_k(hypernetwork_layers[0](context))
-        v = self.to_v(hypernetwork_layers[1](context))
-    else:
-        k = self.to_k(context)
-        v = self.to_v(context)
-
-    q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))
-
-    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
-
-    if mask is not None:
-        mask = rearrange(mask, 'b ... -> b (...)')
-        max_neg_value = -torch.finfo(sim.dtype).max
-        mask = repeat(mask, 'b j -> (b h) () j', h=h)
-        sim.masked_fill_(~mask, max_neg_value)
-
-    # attention, what we cannot get enough of
-    attn = sim.softmax(dim=-1)
-
-    out = einsum('b i j, b j d -> b i d', attn, v)
-    out = rearrange(out, '(b h) n d -> b n (h d)', h=h)
-    return self.to_out(out)
--- a/modules/hypernetworks/hypernetwork.py
+++ b/modules/hypernetworks/hypernetwork.py
+import datetime
+import glob
+import html
+import os
+import sys
+import traceback
+import tqdm
+
+import torch
+
+from ldm.util import default
+from modules import devices, shared, processing, sd_models
+import torch
+from torch import einsum
+from einops import rearrange, repeat
+import modules.textual_inversion.dataset
+from modules.textual_inversion.learn_schedule import LearnSchedule
+
+
+class HypernetworkModule(torch.nn.Module):
+    def __init__(self, dim, state_dict=None):
+        super().__init__()
+
+        self.linear1 = torch.nn.Linear(dim, dim * 2)
+        self.linear2 = torch.nn.Linear(dim * 2, dim)
+
+        if state_dict is not None:
+            self.load_state_dict(state_dict, strict=True)
+        else:
+
+            self.linear1.weight.data.normal_(mean=0.0, std=0.01)
+            self.linear1.bias.data.zero_()
+            self.linear2.weight.data.normal_(mean=0.0, std=0.01)
+            self.linear2.bias.data.zero_()
+
+        self.to(devices.device)
+
+    def forward(self, x):
+        return x + (self.linear2(self.linear1(x)))
+
+
+class Hypernetwork:
+    filename = None
+    name = None
+
+    def __init__(self, name=None, enable_sizes=None):
+        self.filename = None
+        self.name = name
+        self.layers = {}
+        self.step = 0
+        self.sd_checkpoint = None
+        self.sd_checkpoint_name = None
+
+        for size in enable_sizes or []:
+            self.layers[size] = (HypernetworkModule(size), HypernetworkModule(size))
+
+    def weights(self):
+        res = []
+
+        for k, layers in self.layers.items():
+            for layer in layers:
+                layer.train()
+                res += [layer.linear1.weight, layer.linear1.bias, layer.linear2.weight, layer.linear2.bias]
+
+        return res
+
+    def save(self, filename):
+        state_dict = {}
+
+        for k, v in self.layers.items():
+            state_dict[k] = (v[0].state_dict(), v[1].state_dict())
+
+        state_dict['step'] = self.step
+        state_dict['name'] = self.name
+        state_dict['sd_checkpoint'] = self.sd_checkpoint
+        state_dict['sd_checkpoint_name'] = self.sd_checkpoint_name
+
+        torch.save(state_dict, filename)
+
+    def load(self, filename):
+        self.filename = filename
+        if self.name is None:
+            self.name = os.path.splitext(os.path.basename(filename))[0]
+
+        state_dict = torch.load(filename, map_location='cpu')
+
+        for size, sd in state_dict.items():
+            if type(size) == int:
+                self.layers[size] = (HypernetworkModule(size, sd[0]), HypernetworkModule(size, sd[1]))
+
+        self.name = state_dict.get('name', self.name)
+        self.step = state_dict.get('step', 0)
+        self.sd_checkpoint = state_dict.get('sd_checkpoint', None)
+        self.sd_checkpoint_name = state_dict.get('sd_checkpoint_name', None)
+
+
+def list_hypernetworks(path):
+    res = {}
+    for filename in glob.iglob(os.path.join(path, '**/*.pt'), recursive=True):
+        name = os.path.splitext(os.path.basename(filename))[0]
+        res[name] = filename
+    return res
+
+
+def load_hypernetwork(filename):
+    path = shared.hypernetworks.get(filename, None)
+    if path is not None:
+        print(f"Loading hypernetwork {filename}")
+        try:
+            shared.loaded_hypernetwork = Hypernetwork()
+            shared.loaded_hypernetwork.load(path)
+
+        except Exception:
+            print(f"Error loading hypernetwork {path}", file=sys.stderr)
+            print(traceback.format_exc(), file=sys.stderr)
+    else:
+        if shared.loaded_hypernetwork is not None:
+            print(f"Unloading hypernetwork")
+
+        shared.loaded_hypernetwork = None
+
+
+def apply_hypernetwork(hypernetwork, context, layer=None):
+    hypernetwork_layers = (hypernetwork.layers if hypernetwork is not None else {}).get(context.shape[2], None)
+
+    if hypernetwork_layers is None:
+        return context, context
+
+    if layer is not None:
+        layer.hyper_k = hypernetwork_layers[0]
+        layer.hyper_v = hypernetwork_layers[1]
+
+    context_k = hypernetwork_layers[0](context)
+    context_v = hypernetwork_layers[1](context)
+    return context_k, context_v
+
+
+def attention_CrossAttention_forward(self, x, context=None, mask=None):
+    h = self.heads
+
+    q = self.to_q(x)
+    context = default(context, x)
+
+    context_k, context_v = apply_hypernetwork(shared.loaded_hypernetwork, context, self)
+    k = self.to_k(context_k)
+    v = self.to_v(context_v)
+
+    q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))
+
+    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
+
+    if mask is not None:
+        mask = rearrange(mask, 'b ... -> b (...)')
+        max_neg_value = -torch.finfo(sim.dtype).max
+        mask = repeat(mask, 'b j -> (b h) () j', h=h)
+        sim.masked_fill_(~mask, max_neg_value)
+
+    # attention, what we cannot get enough of
+    attn = sim.softmax(dim=-1)
+
+    out = einsum('b i j, b j d -> b i d', attn, v)
+    out = rearrange(out, '(b h) n d -> b n (h d)', h=h)
+    return self.to_out(out)
+
+
+def train_hypernetwork(hypernetwork_name, learn_rate, data_root, log_directory, steps, create_image_every, save_hypernetwork_every, template_file, preview_image_prompt):
+    assert hypernetwork_name, 'embedding not selected'
+
+    path = shared.hypernetworks.get(hypernetwork_name, None)
+    shared.loaded_hypernetwork = Hypernetwork()
+    shared.loaded_hypernetwork.load(path)
+
+    shared.state.textinfo = "Initializing hypernetwork training..."
+    shared.state.job_count = steps
+
+    filename = os.path.join(shared.cmd_opts.hypernetwork_dir, f'{hypernetwork_name}.pt')
+
+    log_directory = os.path.join(log_directory, datetime.datetime.now().strftime("%Y-%m-%d"), hypernetwork_name)
+    unload = shared.opts.unload_models_when_training
+
+    if save_hypernetwork_every > 0:
+        hypernetwork_dir = os.path.join(log_directory, "hypernetworks")
+        os.makedirs(hypernetwork_dir, exist_ok=True)
+    else:
+        hypernetwork_dir = None
+
+    if create_image_every > 0:
+        images_dir = os.path.join(log_directory, "images")
+        os.makedirs(images_dir, exist_ok=True)
+    else:
+        images_dir = None
+
+    shared.state.textinfo = f"Preparing dataset from {html.escape(data_root)}..."
+    with torch.autocast("cuda"):
+        ds = modules.textual_inversion.dataset.PersonalizedBase(data_root=data_root, width=512, height=512, repeats=1, placeholder_token=hypernetwork_name, model=shared.sd_model, device=devices.device, template_file=template_file, include_cond=True)
+
+    if unload:
+        shared.sd_model.cond_stage_model.to(devices.cpu)
+        shared.sd_model.first_stage_model.to(devices.cpu)
+
+    hypernetwork = shared.loaded_hypernetwork
+    weights = hypernetwork.weights()
+    for weight in weights:
+        weight.requires_grad = True
+
+    losses = torch.zeros((32,))
+
+    last_saved_file = "<none>"
+    last_saved_image = "<none>"
+
+    ititial_step = hypernetwork.step or 0
+    if ititial_step > steps:
+        return hypernetwork, filename
+
+    schedules = iter(LearnSchedule(learn_rate, steps, ititial_step))
+    (learn_rate, end_step) = next(schedules)
+    print(f'Training at rate of {learn_rate} until step {end_step}')
+
+    optimizer = torch.optim.AdamW(weights, lr=learn_rate)
+
+    pbar = tqdm.tqdm(enumerate(ds), total=steps - ititial_step)
+    for i, (x, text, cond) in pbar:
+        hypernetwork.step = i + ititial_step
+
+        if hypernetwork.step > end_step:
+            try:
+                (learn_rate, end_step) = next(schedules)
+            except Exception:
+                break
+            tqdm.tqdm.write(f'Training at rate of {learn_rate} until step {end_step}')
+            for pg in optimizer.param_groups:
+                pg['lr'] = learn_rate
+
+        if shared.state.interrupted:
+            break
+
+        with torch.autocast("cuda"):
+            cond = cond.to(devices.device)
+            x = x.to(devices.device)
+            loss = shared.sd_model(x.unsqueeze(0), cond)[0]
+            del x
+            del cond
+
+            losses[hypernetwork.step % losses.shape[0]] = loss.item()
+
+            optimizer.zero_grad()
+            loss.backward()
+            optimizer.step()
+
+        pbar.set_description(f"loss: {losses.mean():.7f}")
+
+        if hypernetwork.step > 0 and hypernetwork_dir is not None and hypernetwork.step % save_hypernetwork_every == 0:
+            last_saved_file = os.path.join(hypernetwork_dir, f'{hypernetwork_name}-{hypernetwork.step}.pt')
+            hypernetwork.save(last_saved_file)
+
+        if hypernetwork.step > 0 and images_dir is not None and hypernetwork.step % create_image_every == 0:
+            last_saved_image = os.path.join(images_dir, f'{hypernetwork_name}-{hypernetwork.step}.png')
+
+            preview_text = text if preview_image_prompt == "" else preview_image_prompt
+
+            optimizer.zero_grad()
+            shared.sd_model.cond_stage_model.to(devices.device)
+            shared.sd_model.first_stage_model.to(devices.device)
+
+            p = processing.StableDiffusionProcessingTxt2Img(
+                sd_model=shared.sd_model,
+                prompt=preview_text,
+                steps=20,
+                do_not_save_grid=True,
+                do_not_save_samples=True,
+            )
+
+            processed = processing.process_images(p)
+            image = processed.images[0]
+
+            if unload:
+                shared.sd_model.cond_stage_model.to(devices.cpu)
+                shared.sd_model.first_stage_model.to(devices.cpu)
+
+            shared.state.current_image = image
+            image.save(last_saved_image)
+
+            last_saved_image += f", prompt: {preview_text}"
+
+        shared.state.job_no = hypernetwork.step
+
+        shared.state.textinfo = f"""
+<p>
+Loss: {losses.mean():.7f}<br/>
+Step: {hypernetwork.step}<br/>
+Last prompt: {html.escape(text)}<br/>
+Last saved embedding: {html.escape(last_saved_file)}<br/>
+Last saved image: {html.escape(last_saved_image)}<br/>
+</p>
+"""
+
+    checkpoint = sd_models.select_checkpoint()
+
+    hypernetwork.sd_checkpoint = checkpoint.hash
+    hypernetwork.sd_checkpoint_name = checkpoint.model_name
+    hypernetwork.save(filename)
+
+    return hypernetwork, filename
+
+
--- a/modules/hypernetworks/ui.py
+++ b/modules/hypernetworks/ui.py
+import html
+import os
+
+import gradio as gr
+
+import modules.textual_inversion.textual_inversion
+import modules.textual_inversion.preprocess
+from modules import sd_hijack, shared, devices
+from modules.hypernetworks import hypernetwork
+
+
+def create_hypernetwork(name, enable_sizes):
+    fn = os.path.join(shared.cmd_opts.hypernetwork_dir, f"{name}.pt")
+    assert not os.path.exists(fn), f"file {fn} already exists"
+
+    hypernet = modules.hypernetworks.hypernetwork.Hypernetwork(name=name, enable_sizes=[int(x) for x in enable_sizes])
+    hypernet.save(fn)
+
+    shared.reload_hypernetworks()
+
+    return gr.Dropdown.update(choices=sorted([x for x in shared.hypernetworks.keys()])), f"Created: {fn}", ""
+
+
+def train_hypernetwork(*args):
+
+    initial_hypernetwork = shared.loaded_hypernetwork
+
+    assert not shared.cmd_opts.lowvram, 'Training models with lowvram is not possible'
+
+    try:
+        sd_hijack.undo_optimizations()
+
+        hypernetwork, filename = modules.hypernetworks.hypernetwork.train_hypernetwork(*args)
+
+        res = f"""
+Training {'interrupted' if shared.state.interrupted else 'finished'} at {hypernetwork.step} steps.
+Hypernetwork saved to {html.escape(filename)}
+"""
+        return res, ""
+    except Exception:
+        raise
+    finally:
+        shared.loaded_hypernetwork = initial_hypernetwork
+        shared.sd_model.cond_stage_model.to(devices.device)
+        shared.sd_model.first_stage_model.to(devices.device)
+        sd_hijack.apply_optimizations()
+
--- a/modules/ngrok.py
+++ b/modules/ngrok.py
+from pyngrok import ngrok, conf, exception
+
+
+def connect(token, port):
+    if token == None:
+        token = 'None'
+    conf.get_default().auth_token = token
+    try:
+        public_url = ngrok.connect(port).public_url
+    except exception.PyngrokNgrokError:
+        print(f'Invalid ngrok authtoken, ngrok connection aborted.\n'
+              f'Your token: {token}, get the right one on https://dashboard.ngrok.com/get-started/your-authtoken')
+    else:
+        print(f'ngrok connected to localhost:{port}! URL: {public_url}\n'
+               'You can use this link after the launch is complete.')
--- a/modules/processing.py
+++ b/modules/processing.py
@@ -207,7 +207,7 @@ def create_random_tensors(shape, seeds, subseeds=None, subseed_strength=0.0, see
    # enables the generation of additional tensors with noise that the sampler will use during its processing.
    # Using those pre-generated tensors instead of simple torch.randn allows a batch with seeds [100, 101] to
    # produce the same images as with two batches [100], [101].
-    if p is not None and p.sampler is not None and len(seeds) > 1 and opts.enable_batch_seeds:
+    if p is not None and p.sampler is not None and (len(seeds) > 1 and opts.enable_batch_seeds or opts.eta_noise_seed_delta > 0):
        sampler_noises = [[] for _ in range(p.sampler.number_of_needed_noises(p))]
    else:
        sampler_noises = None
@@ -247,6 +247,9 @@ def create_random_tensors(shape, seeds, subseeds=None, subseed_strength=0.0, see
        if sampler_noises is not None:
            cnt = p.sampler.number_of_needed_noises(p)

+            if opts.eta_noise_seed_delta > 0:
+                torch.manual_seed(seed + opts.eta_noise_seed_delta)
+
            for j in range(cnt):
                sampler_noises[j].append(devices.randn_without_seed(tuple(noise_shape)))

@@ -259,6 +262,13 @@ def create_random_tensors(shape, seeds, subseeds=None, subseed_strength=0.0, see
    return x


+def decode_first_stage(model, x):
+    with devices.autocast(disable=x.dtype == devices.dtype_vae):
+        x = model.decode_first_stage(x)
+
+    return x
+
+
 def get_fixed_seed(seed):
    if seed is None or seed == '' or seed == -1:
        return int(random.randrange(4294967294))
@@ -294,6 +304,7 @@ def create_infotext(p, all_prompts, all_seeds, all_subseeds, comments, iteration
        "Denoising strength": getattr(p, 'denoising_strength', None),
        "Eta": (None if p.sampler is None or p.sampler.eta == p.sampler.default_eta else p.sampler.eta),
        "Clip skip": None if clip_skip <= 1 else clip_skip,
+        "ENSD": None if opts.eta_noise_seed_delta == 0 else opts.eta_noise_seed_delta,
    }

    generation_params.update(p.extra_generation_params)
@@ -398,9 +409,8 @@ def process_images(p: StableDiffusionProcessing) -> Processed:
                # use the image collected previously in sampler loop
                samples_ddim = shared.state.current_latent

-            samples_ddim = samples_ddim.to(devices.dtype)
-
-            x_samples_ddim = p.sd_model.decode_first_stage(samples_ddim)
+            samples_ddim = samples_ddim.to(devices.dtype_vae)
+            x_samples_ddim = decode_first_stage(p.sd_model, samples_ddim)
            x_samples_ddim = torch.clamp((x_samples_ddim + 1.0) / 2.0, min=0.0, max=1.0)

            del samples_ddim
@@ -533,7 +543,7 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
        if self.scale_latent:
            samples = torch.nn.functional.interpolate(samples, size=(self.height // opt_f, self.width // opt_f), mode="bilinear")
        else:
-            decoded_samples = self.sd_model.decode_first_stage(samples)
+            decoded_samples = decode_first_stage(self.sd_model, samples)

            if opts.upscaler_for_img2img is None or opts.upscaler_for_img2img == "None":
                decoded_samples = torch.nn.functional.interpolate(decoded_samples, size=(self.height, self.width), mode="bilinear")

--- a/modules/safe.py
+++ b/modules/safe.py
@@ -10,6 +10,11 @@ import torch
 import numpy
 import _codecs
 import zipfile
+import re
+
+
+# PyTorch 1.13 and later have _TypedStorage renamed to TypedStorage
+TypedStorage = torch.storage.TypedStorage if hasattr(torch.storage, 'TypedStorage') else torch.storage._TypedStorage


 def encode(*args):
@@ -20,7 +25,7 @@ def encode(*args):
 class RestrictedUnpickler(pickle.Unpickler):
    def persistent_load(self, saved_id):
        assert saved_id[0] == 'storage'
-        return torch.storage._TypedStorage()
+        return TypedStorage()

    def find_class(self, module, name):
        if module == 'collections' and name == 'OrderedDict':
@@ -50,11 +55,27 @@ class RestrictedUnpickler(pickle.Unpickler):
        raise pickle.UnpicklingError(f"global '{module}/{name}' is forbidden")


+allowed_zip_names = ["archive/data.pkl", "archive/version"]
+allowed_zip_names_re = re.compile(r"^archive/data/\d+$")
+
+
+def check_zip_filenames(filename, names):
+    for name in names:
+        if name in allowed_zip_names:
+            continue
+        if allowed_zip_names_re.match(name):
+            continue
+
+        raise Exception(f"bad file inside {filename}: {name}")
+
+
 def check_pt(filename):
    try:

        # new pytorch format is a zip file
        with zipfile.ZipFile(filename) as z:
+            check_zip_filenames(filename, z.namelist())
+
            with z.open('archive/data.pkl') as file:
                unpickler = RestrictedUnpickler(file)
                unpickler.load()

--- a/modules/sd_hijack.py
+++ b/modules/sd_hijack.py
@@ -8,8 +8,9 @@ from torch import einsum
 from torch.nn.functional import silu

 import modules.textual_inversion.textual_inversion
-from modules import prompt_parser, devices, sd_hijack_optimizations, shared, hypernetwork
+from modules import prompt_parser, devices, sd_hijack_optimizations, shared
 from modules.shared import opts, device, cmd_opts
+from modules.sd_hijack_optimizations import invokeAI_mps_available

 import ldm.modules.attention
 import ldm.modules.diffusionmodules.model
@@ -23,30 +24,37 @@ def apply_optimizations():

    ldm.modules.diffusionmodules.model.nonlinearity = silu

-    if cmd_opts.force_enable_xformers or (cmd_opts.xformers and shared.xformers_available and torch.version.cuda and torch.cuda.get_device_capability(shared.device) == (8, 6)):
+    if cmd_opts.force_enable_xformers or (cmd_opts.xformers and shared.xformers_available and torch.version.cuda and (6, 0) <= torch.cuda.get_device_capability(shared.device) <= (8, 6)):
        print("Applying xformers cross attention optimization.")
        ldm.modules.attention.CrossAttention.forward = sd_hijack_optimizations.xformers_attention_forward
        ldm.modules.diffusionmodules.model.AttnBlock.forward = sd_hijack_optimizations.xformers_attnblock_forward
    elif cmd_opts.opt_split_attention_v1:
        print("Applying v1 cross attention optimization.")
        ldm.modules.attention.CrossAttention.forward = sd_hijack_optimizations.split_cross_attention_forward_v1
+    elif not cmd_opts.disable_opt_split_attention and (cmd_opts.opt_split_attention_invokeai or not torch.cuda.is_available()):
+        if not invokeAI_mps_available and shared.device.type == 'mps':
+            print("The InvokeAI cross attention optimization for MPS requires the psutil package which is not installed.")
+            print("Applying v1 cross attention optimization.")
+            ldm.modules.attention.CrossAttention.forward = sd_hijack_optimizations.split_cross_attention_forward_v1
+        else:
+            print("Applying cross attention optimization (InvokeAI).")
+            ldm.modules.attention.CrossAttention.forward = sd_hijack_optimizations.split_cross_attention_forward_invokeAI
    elif not cmd_opts.disable_opt_split_attention and (cmd_opts.opt_split_attention or torch.cuda.is_available()):
-        print("Applying cross attention optimization.")
+        print("Applying cross attention optimization (Doggettx).")
        ldm.modules.attention.CrossAttention.forward = sd_hijack_optimizations.split_cross_attention_forward
        ldm.modules.diffusionmodules.model.AttnBlock.forward = sd_hijack_optimizations.cross_attention_attnblock_forward


 def undo_optimizations():
+    from modules.hypernetworks import hypernetwork
+
    ldm.modules.attention.CrossAttention.forward = hypernetwork.attention_CrossAttention_forward
    ldm.modules.diffusionmodules.model.nonlinearity = diffusionmodules_model_nonlinearity
    ldm.modules.diffusionmodules.model.AttnBlock.forward = diffusionmodules_model_AttnBlock_forward


 def get_target_prompt_token_count(token_count):
-    if token_count < 75:
-        return 75
-
-    return math.ceil(token_count / 10) * 10
+    return math.ceil(max(token_count, 1) / 75) * 75


 class StableDiffusionModelHijack:
@@ -110,6 +118,8 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module):
        self.tokenizer = wrapped.tokenizer
        self.token_mults = {}

+        self.comma_token = [v for k, v in self.tokenizer.get_vocab().items() if k == ',</w>'][0]
+
        tokens_with_parens = [(k, v) for k, v in self.tokenizer.get_vocab().items() if '(' in k or ')' in k or '[' in k or ']' in k]
        for text, ident in tokens_with_parens:
            mult = 1.0
@@ -127,7 +137,6 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module):
                self.token_mults[ident] = mult

    def tokenize_line(self, line, used_custom_terms, hijack_comments):
-        id_start = self.wrapped.tokenizer.bos_token_id
        id_end = self.wrapped.tokenizer.eos_token_id

        if opts.enable_emphasis:
@@ -140,6 +149,7 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module):
        fixes = []
        remade_tokens = []
        multipliers = []
+        last_comma = -1

        for tokens, (text, weight) in zip(tokenized, parsed):
            i = 0
@@ -148,13 +158,33 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module):

                embedding, embedding_length_in_tokens = self.hijack.embedding_db.find_embedding_at_position(tokens, i)

+                if token == self.comma_token:
+                    last_comma = len(remade_tokens)
+                elif opts.comma_padding_backtrack != 0 and max(len(remade_tokens), 1) % 75 == 0 and last_comma != -1 and len(remade_tokens) - last_comma <= opts.comma_padding_backtrack:
+                    last_comma += 1
+                    reloc_tokens = remade_tokens[last_comma:]
+                    reloc_mults = multipliers[last_comma:]
+
+                    remade_tokens = remade_tokens[:last_comma]
+                    length = len(remade_tokens)
+                    
+                    rem = int(math.ceil(length / 75)) * 75 - length
+                    remade_tokens += [id_end] * rem + reloc_tokens
+                    multipliers = multipliers[:last_comma] + [1.0] * rem + reloc_mults
+                
                if embedding is None:
                    remade_tokens.append(token)
                    multipliers.append(weight)
                    i += 1
                else:
                    emb_len = int(embedding.vec.shape[0])
-                    fixes.append((len(remade_tokens), embedding))
+                    iteration = len(remade_tokens) // 75
+                    if (len(remade_tokens) + emb_len) // 75 != iteration:
+                        rem = (75 * (iteration + 1) - len(remade_tokens))
+                        remade_tokens += [id_end] * rem
+                        multipliers += [1.0] * rem
+                        iteration += 1
+                    fixes.append((iteration, (len(remade_tokens) % 75, embedding)))
                    remade_tokens += [0] * emb_len
                    multipliers += [weight] * emb_len
                    used_custom_terms.append((embedding.name, embedding.checksum()))
@@ -162,10 +192,10 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module):

        token_count = len(remade_tokens)
        prompt_target_length = get_target_prompt_token_count(token_count)
-        tokens_to_add = prompt_target_length - len(remade_tokens) + 1
+        tokens_to_add = prompt_target_length - len(remade_tokens)

-        remade_tokens = [id_start] + remade_tokens + [id_end] * tokens_to_add
-        multipliers = [1.0] + multipliers + [1.0] * tokens_to_add
+        remade_tokens = remade_tokens + [id_end] * tokens_to_add
+        multipliers = multipliers + [1.0] * tokens_to_add

        return remade_tokens, fixes, multipliers, token_count

@@ -260,29 +290,55 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module):
            hijack_fixes.append(fixes)
            batch_multipliers.append(multipliers)
        return batch_multipliers, remade_batch_tokens, used_custom_terms, hijack_comments, hijack_fixes, token_count
-
+    
    def forward(self, text):
-
-        if opts.use_old_emphasis_implementation:
+        use_old = opts.use_old_emphasis_implementation
+        if use_old:
            batch_multipliers, remade_batch_tokens, used_custom_terms, hijack_comments, hijack_fixes, token_count = self.process_text_old(text)
        else:
            batch_multipliers, remade_batch_tokens, used_custom_terms, hijack_comments, hijack_fixes, token_count = self.process_text(text)

-        self.hijack.fixes = hijack_fixes
        self.hijack.comments += hijack_comments

        if len(used_custom_terms) > 0:
            self.hijack.comments.append("Used embeddings: " + ", ".join([f'{word} [{checksum}]' for word, checksum in used_custom_terms]))
+        
+        if use_old:
+            self.hijack.fixes = hijack_fixes
+            return self.process_tokens(remade_batch_tokens, batch_multipliers)
+        
+        z = None
+        i = 0
+        while max(map(len, remade_batch_tokens)) != 0:
+            rem_tokens = [x[75:] for x in remade_batch_tokens]
+            rem_multipliers = [x[75:] for x in batch_multipliers]
+
+            self.hijack.fixes = []
+            for unfiltered in hijack_fixes:
+                fixes = []
+                for fix in unfiltered:
+                    if fix[0] == i:
+                        fixes.append(fix[1])
+                self.hijack.fixes.append(fixes)
+            
+            z1 = self.process_tokens([x[:75] for x in remade_batch_tokens], [x[:75] for x in batch_multipliers])
+            z = z1 if z is None else torch.cat((z, z1), axis=-2)
+            
+            remade_batch_tokens = rem_tokens
+            batch_multipliers = rem_multipliers
+            i += 1
+        
+        return z
+        
+    
+    def process_tokens(self, remade_batch_tokens, batch_multipliers):
+        if not opts.use_old_emphasis_implementation:
+            remade_batch_tokens = [[self.wrapped.tokenizer.bos_token_id] + x[:75] + [self.wrapped.tokenizer.eos_token_id] for x in remade_batch_tokens]
+            batch_multipliers = [[1.0] + x[:75] + [1.0] for x in batch_multipliers]
+        
+        tokens = torch.asarray(remade_batch_tokens).to(device)
+        outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)

-        target_token_count = get_target_prompt_token_count(token_count) + 2
-
-        position_ids_array = [min(x, 75) for x in range(target_token_count-1)] + [76]
-        position_ids = torch.asarray(position_ids_array, device=devices.device).expand((1, -1))
-
-        remade_batch_tokens_of_same_length = [x + [self.wrapped.tokenizer.eos_token_id] * (target_token_count - len(x)) for x in remade_batch_tokens]
-        tokens = torch.asarray(remade_batch_tokens_of_same_length).to(device)
-
-        outputs = self.wrapped.transformer(input_ids=tokens, position_ids=position_ids, output_hidden_states=-opts.CLIP_stop_at_last_layers)
        if opts.CLIP_stop_at_last_layers > 1:
            z = outputs.hidden_states[-opts.CLIP_stop_at_last_layers]
            z = self.wrapped.transformer.text_model.final_layer_norm(z)
@@ -290,7 +346,7 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module):
            z = outputs.last_hidden_state

        # restoring original mean is likely not correct, but it seems to work well to prevent artifacts that happen otherwise
-        batch_multipliers_of_same_length = [x + [1.0] * (target_token_count - len(x)) for x in batch_multipliers]
+        batch_multipliers_of_same_length = [x + [1.0] * (75 - len(x)) for x in batch_multipliers]
        batch_multipliers = torch.asarray(batch_multipliers_of_same_length).to(device)
        original_mean = z.mean()
        z *= batch_multipliers.reshape(batch_multipliers.shape + (1,)).expand(z.shape)

--- a/modules/sd_hijack_optimizations.py
+++ b/modules/sd_hijack_optimizations.py
 import math
 import sys
 import traceback
+import importlib

 import torch
 from torch import einsum
@@ -9,12 +10,12 @@ from ldm.util import default
 from einops import rearrange

 from modules import shared
+from modules.hypernetworks import hypernetwork
+

 if shared.cmd_opts.xformers or shared.cmd_opts.force_enable_xformers:
    try:
        import xformers.ops
-        import functorch
-        xformers._is_functorch_available = True
        shared.xformers_available = True
    except Exception:
        print("Cannot import xformers", file=sys.stderr)
@@ -28,16 +29,10 @@ def split_cross_attention_forward_v1(self, x, context=None, mask=None):
    q_in = self.to_q(x)
    context = default(context, x)

-    hypernetwork = shared.loaded_hypernetwork
-    hypernetwork_layers = (hypernetwork.layers if hypernetwork is not None else {}).get(context.shape[2], None)
-
-    if hypernetwork_layers is not None:
-        k_in = self.to_k(hypernetwork_layers[0](context))
-        v_in = self.to_v(hypernetwork_layers[1](context))
-    else:
-        k_in = self.to_k(context)
-        v_in = self.to_v(context)
-    del context, x
+    context_k, context_v = hypernetwork.apply_hypernetwork(shared.loaded_hypernetwork, context)
+    k_in = self.to_k(context_k)
+    v_in = self.to_v(context_v)
+    del context, context_k, context_v, x

    q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q_in, k_in, v_in))
    del q_in, k_in, v_in
@@ -61,22 +56,16 @@ def split_cross_attention_forward_v1(self, x, context=None, mask=None):
    return self.to_out(r2)


-# taken from https://github.com/Doggettx/stable-diffusion
+# taken from https://github.com/Doggettx/stable-diffusion and modified
 def split_cross_attention_forward(self, x, context=None, mask=None):
    h = self.heads

    q_in = self.to_q(x)
    context = default(context, x)

-    hypernetwork = shared.loaded_hypernetwork
-    hypernetwork_layers = (hypernetwork.layers if hypernetwork is not None else {}).get(context.shape[2], None)
-
-    if hypernetwork_layers is not None:
-        k_in = self.to_k(hypernetwork_layers[0](context))
-        v_in = self.to_v(hypernetwork_layers[1](context))
-    else:
-        k_in = self.to_k(context)
-        v_in = self.to_v(context)
+    context_k, context_v = hypernetwork.apply_hypernetwork(shared.loaded_hypernetwork, context)
+    k_in = self.to_k(context_k)
+    v_in = self.to_v(context_v)

    k_in *= self.scale

@@ -128,18 +117,111 @@ def split_cross_attention_forward(self, x, context=None, mask=None):

    return self.to_out(r2)

+
+def check_for_psutil():
+    try:
+        spec = importlib.util.find_spec('psutil')
+        return spec is not None
+    except ModuleNotFoundError:
+        return False
+
+invokeAI_mps_available = check_for_psutil()
+
+# -- Taken from https://github.com/invoke-ai/InvokeAI --
+if invokeAI_mps_available:
+    import psutil
+    mem_total_gb = psutil.virtual_memory().total // (1 << 30)
+
+def einsum_op_compvis(q, k, v):
+    s = einsum('b i d, b j d -> b i j', q, k)
+    s = s.softmax(dim=-1, dtype=s.dtype)
+    return einsum('b i j, b j d -> b i d', s, v)
+
+def einsum_op_slice_0(q, k, v, slice_size):
+    r = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
+    for i in range(0, q.shape[0], slice_size):
+        end = i + slice_size
+        r[i:end] = einsum_op_compvis(q[i:end], k[i:end], v[i:end])
+    return r
+
+def einsum_op_slice_1(q, k, v, slice_size):
+    r = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
+    for i in range(0, q.shape[1], slice_size):
+        end = i + slice_size
+        r[:, i:end] = einsum_op_compvis(q[:, i:end], k, v)
+    return r
+
+def einsum_op_mps_v1(q, k, v):
+    if q.shape[1] <= 4096: # (512x512) max q.shape[1]: 4096
+        return einsum_op_compvis(q, k, v)
+    else:
+        slice_size = math.floor(2**30 / (q.shape[0] * q.shape[1]))
+        return einsum_op_slice_1(q, k, v, slice_size)
+
+def einsum_op_mps_v2(q, k, v):
+    if mem_total_gb > 8 and q.shape[1] <= 4096:
+        return einsum_op_compvis(q, k, v)
+    else:
+        return einsum_op_slice_0(q, k, v, 1)
+
+def einsum_op_tensor_mem(q, k, v, max_tensor_mb):
+    size_mb = q.shape[0] * q.shape[1] * k.shape[1] * q.element_size() // (1 << 20)
+    if size_mb <= max_tensor_mb:
+        return einsum_op_compvis(q, k, v)
+    div = 1 << int((size_mb - 1) / max_tensor_mb).bit_length()
+    if div <= q.shape[0]:
+        return einsum_op_slice_0(q, k, v, q.shape[0] // div)
+    return einsum_op_slice_1(q, k, v, max(q.shape[1] // div, 1))
+
+def einsum_op_cuda(q, k, v):
+    stats = torch.cuda.memory_stats(q.device)
+    mem_active = stats['active_bytes.all.current']
+    mem_reserved = stats['reserved_bytes.all.current']
+    mem_free_cuda, _ = torch.cuda.mem_get_info(q.device)
+    mem_free_torch = mem_reserved - mem_active
+    mem_free_total = mem_free_cuda + mem_free_torch
+    # Divide factor of safety as there's copying and fragmentation
+    return self.einsum_op_tensor_mem(q, k, v, mem_free_total / 3.3 / (1 << 20))
+
+def einsum_op(q, k, v):
+    if q.device.type == 'cuda':
+        return einsum_op_cuda(q, k, v)
+
+    if q.device.type == 'mps':
+        if mem_total_gb >= 32:
+            return einsum_op_mps_v1(q, k, v)
+        return einsum_op_mps_v2(q, k, v)
+
+    # Smaller slices are faster due to L2/L3/SLC caches.
+    # Tested on i7 with 8MB L3 cache.
+    return einsum_op_tensor_mem(q, k, v, 32)
+
+def split_cross_attention_forward_invokeAI(self, x, context=None, mask=None):
+    h = self.heads
+
+    q = self.to_q(x)
+    context = default(context, x)
+
+    context_k, context_v = hypernetwork.apply_hypernetwork(shared.loaded_hypernetwork, context)
+    k = self.to_k(context_k) * self.scale
+    v = self.to_v(context_v)
+    del context, context_k, context_v, x
+
+    q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))
+    r = einsum_op(q, k, v)
+    return self.to_out(rearrange(r, '(b h) n d -> b n (h d)', h=h))
+
+# -- End of code from https://github.com/invoke-ai/InvokeAI --
+
 def xformers_attention_forward(self, x, context=None, mask=None):
    h = self.heads
    q_in = self.to_q(x)
    context = default(context, x)
-    hypernetwork = shared.loaded_hypernetwork
-    hypernetwork_layers = (hypernetwork.layers if hypernetwork is not None else {}).get(context.shape[2], None)
-    if hypernetwork_layers is not None:
-        k_in = self.to_k(hypernetwork_layers[0](context))
-        v_in = self.to_v(hypernetwork_layers[1](context))
-    else:
-        k_in = self.to_k(context)
-        v_in = self.to_v(context)
+
+    context_k, context_v = hypernetwork.apply_hypernetwork(shared.loaded_hypernetwork, context)
+    k_in = self.to_k(context_k)
+    v_in = self.to_v(context_v)
+
    q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b n h d', h=h), (q_in, k_in, v_in))
    del q_in, k_in, v_in
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)

--- a/modules/sd_models.py
+++ b/modules/sd_models.py
@@ -149,8 +149,13 @@ def load_model_weights(model, checkpoint_info):
        model.half()

    devices.dtype = torch.float32 if shared.cmd_opts.no_half else torch.float16
+    devices.dtype_vae = torch.float32 if shared.cmd_opts.no_half or shared.cmd_opts.no_half_vae else torch.float16

    vae_file = os.path.splitext(checkpoint_file)[0] + ".vae.pt"
+
+    if not os.path.exists(vae_file) and shared.cmd_opts.vae_path is not None:
+        vae_file = shared.cmd_opts.vae_path
+
    if os.path.exists(vae_file):
        print(f"Loading VAE weights from: {vae_file}")
        vae_ckpt = torch.load(vae_file, map_location="cpu")
@@ -158,6 +163,8 @@ def load_model_weights(model, checkpoint_info):

        model.first_stage_model.load_state_dict(vae_dict)

+    model.first_stage_model.to(devices.dtype_vae)
+
    model.sd_model_hash = sd_model_hash
    model.sd_model_checkpoint = checkpoint_file
    model.sd_checkpoint_info = checkpoint_info