ComfyUI speed-up notes, collected from GitHub issues and community threads.

One comparison found that ComfyUI takes up more VRAM than A1111 (6400 MB in ComfyUI versus 4200 MB in A1111). Thanks to city96 for active development of the node. Several speed-related changelog entries also come up in these threads:
- Speed up inference on nvidia 10 series on Linux.
- Speed up TAESD preview.
- Speed up hunyuan dit inference a bit.
(Commits comfyanonymous/ComfyUI@58c9838 and comfyanonymous/ComfyUI@ae197f6 are referenced alongside these.)

Quantized Flux models: there is a "bitsandbytes_NF4" custom node, and "flux1-dev-bnb-nf4" is a new Flux model that is claimed to be nearly 4 times faster than the Flux Dev version and 3 times faster than the Flux Schnell version, although on a 4090 one user really got about 2x. Comparing formats: with the single-file FP8 version, generating a 1024x1024 image takes about 14 GB of VRAM with a peak of 31 GB of RAM; with the NF4 version it takes about 12.7 GB of VRAM with a peak of about 16 GB of RAM, both at about the same speed, so the reduction in VRAM use is smaller than expected.

Model loading and installation: @comfyanonymous was asked how to speed up loading models that are stored on a mounted Google Drive ("Boosting ComfyUI Model Load Speed in Google Colab Pro with Google Drive"). On hosted machines, the environment uses only one CPU core for the pip install process, which can take a long time (up to 2 hours) depending on the vast.ai instance — is there a chance to speed up installation? One user with two accelerators used to rely on the one with a fast internet connection for updating ComfyUI and its plugins, and reports slow network speeds in ComfyUI, especially during operations that involve downloading data.

Compilers and drivers: stable-fast now has a ComfyUI extension, https://github.com/gameltb/ComfyUI_stable_fast (some monkey patching is used in the current implementation). NVIDIA just released the 545.84 drivers, which show a 2x speed improvement on image inference with TensorRT (https://developer.nvidia.com/blog/unlock-faster-image-generation-in-stable-diffusion-web-ui-with-nvidia-tensorrt/); the README only mentions AUTOMATIC1111, so it will likely need work before it runs in ComfyUI. There is also an open issue on using PyTorch 2.5 and CUDA to modify the related code to speed up image generation (comfyanonymous/ComfyUI issue #5535). T-GATE can bring a 10%-50% speed-up for different diffusion models; it only slightly reduces the quality of the generated images and maintains the original composition.

xformers: one workaround was to disable xformers entirely, since there is no obvious way to disable it only in the VAE; launching with --disable-xformers --use-split-cross-attention did the trick and allowed rendering at a higher resolution than usual without crashing, though the downside is that disabling xformers costs some speed. A separate change "has a very slight hit on inference speed and zero hit on memory use; initial tests indicate it's absolutely worth using."

LoRA slowdowns: with one LoRA there was no noticeable drop in speed (Q8), but generation speed drops significantly with each added LoRA — several times slower with two or more, and about 3x slower with four. The problem was solved after the last update, at least on Q8.

Samplers and launch flags: asked "What sampler/scheduler are you seeing the speed increase with?", the answer was Euler with the simple scheduler. If generation seems abnormally slow, try --force-fp32 or --force-fp16, and if there is no improvement, --use-split-cross-attention. Also try an fp16 model config in the CheckpointLoader node; that should speed things up a bit on newer cards, and it should be at least as fast as the A1111 UI if you do that. One slowdown report: iterations went from 2-3 s/it to 40-70 s/it or more on an i9-11900K, 32 GB RAM and an RTX 4070 12 GB ("I know I'm kind of pushing it on VRAM, so I'm not sure whether this sampler is just a bit stricter about VRAM requirements"); another user runs a 7900 XTX, 32 GB RAM, Windows 10 with the Radeon 24.1 driver. You can also try setting the environment variable PYTORCH_TUNABLEOP_ENABLED=1, which might speed things up at the cost of a very slow initial run.
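Putting the environment variable and launch flags together, a minimal launcher sketch might look like the following. It assumes a local ComfyUI checkout containing main.py, and the particular flag combination is only an example to adapt to your hardware; this is not an official script.

```python
# Hypothetical launcher sketch: run from the ComfyUI checkout directory.
import os
import subprocess

# TunableOp makes the first run slow while it searches for fast kernels,
# but later runs may be faster.
env = dict(os.environ, PYTORCH_TUNABLEOP_ENABLED="1")

subprocess.run(
    ["python", "main.py", "--force-fp16", "--use-split-cross-attention"],
    env=env,
    check=True,
)
```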
Startup and hosting: unless you're planning on running a public server, there isn't much information on this. On one GPU cloud service it takes about 40 seconds to launch ComfyUI (measured after the container spins up), while a local machine launches in about 10 seconds even with far more custom nodes installed.

VRAM and offloading: VRAM capacity isn't really the issue; the issue is getting the data to the cores fast enough, and big VRAM is just the best solution we currently have (the US Department of Energy recently released a paper on supercluster parallelization in which they re-timed the data flow). The speed loss from CPU offloading comes from transferring data back and forth, as well as read/write operations. With the arrival of Flux, even 24 GB cards are maxed out and models have to be swapped in and out during image creation, which is slow; hence the feature request to allow model memory to be split across GPUs — if you have two GPUs this would be a massive speed-up.

User benchmarks: A1111 gives one user 10.30 it/s at 512x512 with Euler a, 100 steps and CFG 15, while ComfyUI with the same settings gives only 9.70 it/s. Another saw 512x512 go from 22-24 it/s up to 49 it/s. A third reports that A1111 generates an image with the same settings (in spoilers) in 41 seconds, and ComfyUI in 54 seconds. Speed is fairly comparable between models for me, usually only going up a percentage for FP16 Dev. On a dual-boot Ubuntu/Windows 11 system, Ubuntu gives around 10 it/s with a 6900 XT on default settings (py -3.10 main.py), while Windows with DirectML gives 1 it/s or less.

The --fast regression: the expected behavior is that --fast should be faster; the actual behavior, noticed after a recent update, is that Flux inference has slowed down and there is no difference in speed between using --fast and not using it (RTX 4090). The A100 didn't support the fp8 types, and presumably at some point TransformerEngine will get ported to Windows or integrated. There have been a number of big changes to the ComfyUI core recently which should improve performance across the board, but there might still be some bugs that slow things down.

ComfyUI Flux Accelerator can generate images up to 37.25% faster than the default settings; one example (tested on an RTX 4090) is 512x512 at 4 steps going from 0.51 s to 0.32 s (37.25% faster). Running ComfyUI with --disable-cuda-malloc may optimize the speed further. See also ccssu/ComfyUI-Workflows-Speedup on GitHub. After installing the beta version of desktop ComfyUI and starting performance testing, the first thing one user noticed is that the UI recognizes the 120 Hz display. The FLUX model took a long time to load, but that turned out to be fixable; the problem is that everyone has different configurations, and that user's ComfyUI setup was a mess — anything to speed up those workflows helps.

Inpainting crop parameters (from an inpainting node's documentation):
- context_expand_pixels: how much to grow the context area (i.e. the area for the sampling) around the original mask, in pixels.
- context_expand_factor: how much to grow the context area around the original mask, as a factor; e.g. 1.1 grows it by 10% of the size of the mask.
- fill_mask_holes: whether to fully fill any holes in the mask.
- invert_mask: whether to fully invert the mask.
Growing the context area provides more context for the sampling.
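As a rough illustration of how the two growth parameters could interact, here is a small sketch that computes a context box around a mask's bounding box. It is an interpretation of the descriptions above, not the node's actual implementation, and the exact way the factor and pixel growth combine is an assumption.

```python
import numpy as np

def expand_context(mask: np.ndarray, expand_pixels: int = 0, expand_factor: float = 1.0):
    """Grow the sampling context box around a binary mask: scale the mask's
    bounding box by `expand_factor`, then pad it by `expand_pixels`, clamping
    the result to the image bounds. Illustrative sketch only."""
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    grow_y = int((y1 - y0 + 1) * (expand_factor - 1.0) / 2) + expand_pixels
    grow_x = int((x1 - x0 + 1) * (expand_factor - 1.0) / 2) + expand_pixels
    h, w = mask.shape
    return (max(0, y0 - grow_y), min(h - 1, y1 + grow_y),
            max(0, x0 - grow_x), min(w - 1, x1 + grow_x))
```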
ComfyUI describes itself as "the most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface." A few notes from its README: only parts of the graph that have an output with all the correct inputs will be executed; it starts up very fast; it works fully offline and will never download anything; a config file sets the search paths for models; upscale models (ESRGAN, ESRGAN variants, SwinIR, Swin2SR, etc.) are supported; and workflow examples can be found on the Examples page. Keybinds: Ctrl + Enter queues up the current graph for generation, and Ctrl + Shift + Enter queues up the current graph as first for generation.

Custom nodes: FreeU and PatchModelAddDownscale are now supported experimentally — just use the Comfy node. An NSFW filter node adapts the original model and inference code from NudeNet for use with Comfy; a small 10 MB default model, 320n.onnx, is provided, and if you wish to use other models from that repository, download the ONNX model, place it in the models/nsfw directory, and set the appropriate detect_size. From initial testing, its filtering effect is better than classifier models. HunyuanVideo: example outputs HunyuanVideo_00304.mp4 and HunyuanVideo_00306.mp4; update 2 adds experimental IP2V (Image Prompting to Video via VLM) by @Dango233.

Windows reports: a fully up-to-date ComfyUI Windows portable install (13900K, 32 GB RAM, Windows 11, 4090, newest drivers) works fine with 1.5 and 2.0 flows, but SDXL loads the checkpoints at about 19 GB of VRAM, pushes to 24 GB when a prompt runs, and then just sits at 24 GB after the prompt finishes (or is cancelled) until the ComfyUI command prompt is closed. Another user sees slow speed or complete inactivity regardless of which upscale model is used (4xUltraSharp, 4xFFHQDAT, etc.): there is no progress at all, ComfyUI starts hogging one CPU core at 100%, and the computer becomes unusably slow. To pass launch arguments on Windows, find your ComfyUI main directory (usually something like C:\ComfyUI_windows_portable) and just put your arguments in the run_nvidia_gpu.bat file: open the .bat file with Notepad and make your changes.

Automatic CFG: the latest "automatic CFG" update brings a further speed increase. In short, turning off the guidance makes the steps go twice as fast, and it can be done without any loss in quality when the sigmas are low enough (~1).
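To see why turning off guidance roughly halves the per-step cost, here is a schematic classifier-free-guidance step; the `model` callable and its signature are placeholders rather than ComfyUI's real sampler code.

```python
def cfg_denoise(model, x, t, cond, uncond, cfg_scale):
    """Schematic classifier-free guidance step (illustrative only)."""
    cond_out = model(x, t, cond)          # conditional forward pass
    if cfg_scale == 1.0:                  # guidance off: skip the second pass
        return cond_out
    uncond_out = model(x, t, uncond)      # unconditional forward pass
    return uncond_out + cfg_scale * (cond_out - uncond_out)
```

With guidance on, the denoiser runs twice per step; with guidance effectively off, only the conditional pass is needed, which is where the roughly 2x per-step saving comes from.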
One user is getting 1.7 seconds per image in auto1111 at 512x512 with 20 steps of Euler, while Comfy takes about 3 seconds for the same image with the same settings — roughly half the speed, which is a pretty big slowdown compared to auto1111. Another finds that a heavy workflow works the first time but fails on the second, which suggests there is something to improve, though they are definitely playing at the limit of their system (resolution around 1024x768 plus other things in the workflow).

Image saving speed: one user confirms that with everything set to false, saving is still extremely slow; their assumption was that the filename prefix loop or the repeated regex is to blame. A while ago they tried saving in batches asynchronously and then changing the date metadata post-save so everything stayed in the correct order, but couldn't get the filename/date handling right and gave up.
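A minimal sketch of that asynchronous-save idea might look like the following, assuming PIL images and using the file's timestamps (rather than filenames) to keep generation order; this is illustrative only and not the behavior of ComfyUI's SaveImage node.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)

def save_async(image, path):
    """Queue a slow PNG encode/write on a worker thread, then set the file's
    access/modification times to the moment the save was queued so files
    still sort in generation order. Illustrative sketch only."""
    queued_at = time.time()

    def _worker():
        image.save(path)                      # e.g. a PIL.Image.Image
        os.utime(path, (queued_at, queued_at))

    return _pool.submit(_worker)
```

As noted above, the hard part in practice is keeping the filename prefixes and date metadata consistent with what the rest of the workflow expects.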