Inventree 0.9+ cannot be started on low-end servers due to an infinite loop of timed-out workers being restarted

This issue has been tracked since 2023-01-08.

Please verify that this bug has NOT been raised before.

  • I checked and didn't find similar issue

Describe the bug

It is impossible to start Inventree 0.9+ on low-end servers, at least after updating from 0.8.3. The problem is that by default Inventree spawns 4 workers. After Inventree starts up, these workers are given a task to back up the database. This task fails because the workers time out (the server is loaded to 100% on a 2-core/4-thread Atom N2800) and get restarted. Also, Inventree tries to create 3 consecutive backups at 2-minute intervals, which is guaranteed to fail on an Atom server with an HDD (currently my DB is about 373 kB and media 343 MB).
This forces Inventree startup into an infinite loop of backup/fail, and the web interface never starts.
When both the Gunicorn workers and the background workers are limited to 1, it is possible to start Inventree.
Also, with 0.8.x it was perfectly fine. And it's not due to low memory: the server has 4 GB, which never fills up when the problem arises, and Docker has no artificial limits set. It may also be linked to HDD speed.

Steps to Reproduce

  1. Install Inventree 0.9+ using docker-compose on a slow machine and with a large media store.
  2. Start up Inventree in non-detached mode.
  3. Observe the repeated restarts of Inventree workers in a loop while trying to make a backup.
  4. Shut down Inventree and add the following to your .env file:
INVENTREE_GUNICORN_WORKERS=1
INVENTREE_BACKGROUND_WORKERS=1

and possibly also the following to config.yaml:

background:
  workers: 1
  timeout: 90
  5. Start Inventree. Eventually, it starts and from then on works normally.

[Image: example of the behaviour before the worker settings were modified]

Expected behavior

Backups shouldn't be run at the very moment the server starts. They should be postponed until Inventree has been up for a while (at least a few minutes).
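The proposed deferral boils down to an uptime check before a scheduled backup is allowed to run. This is a minimal sketch, not Inventree's code: `SERVER_STARTED_AT`, `MIN_UPTIME_SECONDS` and `backup_allowed()` are hypothetical names, and the 5-minute threshold is an assumed value.

```python
import time

# Hypothetical: recorded once when the server process starts.
SERVER_STARTED_AT = time.monotonic()

# Assumed threshold: no backups during the first 5 minutes of uptime.
MIN_UPTIME_SECONDS = 5 * 60


def backup_allowed(now=None, started_at=None, min_uptime=MIN_UPTIME_SECONDS):
    """Return True only once the server has been up for at least min_uptime seconds."""
    if now is None:
        now = time.monotonic()
    if started_at is None:
        started_at = SERVER_STARTED_AT
    return (now - started_at) >= min_uptime
```

A scheduled backup task that finds `backup_allowed()` returning False would simply skip this run and wait for the next schedule, instead of competing with worker startup.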

Maybe, if workers repeatedly fail in a loop, Inventree could try reducing the number of workers. If they fail because the hardware is overwhelmed by them, killing and respawning all of them doesn't solve anything; it just halts Inventree.
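One way to express this suggestion is a supervisor that counts recent worker restarts and halves the pool when they exceed a threshold. This is a hypothetical policy sketched for illustration; the class name, the threshold of 3 restarts and the 120-second window are assumptions, and it is not presented as a feature of Gunicorn or of Inventree's background worker.

```python
import time
from collections import deque


class AdaptiveWorkerPool:
    """Hypothetical policy: shrink the worker count when restarts happen too often."""

    def __init__(self, workers, max_restarts=3, window_seconds=120.0, min_workers=1):
        self.workers = workers
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.min_workers = min_workers
        self._restarts = deque()  # timestamps of recent worker restarts

    def record_restart(self, now=None):
        """Record one worker restart and return the (possibly reduced) worker count."""
        now = time.monotonic() if now is None else now
        self._restarts.append(now)
        # Forget restarts that fall outside the sliding window.
        while self._restarts and now - self._restarts[0] > self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) > self.max_restarts and self.workers > self.min_workers:
            # Too many restarts in the window: halve the pool instead of respawning it all.
            self.workers = max(self.min_workers, self.workers // 2)
            self._restarts.clear()
        return self.workers
```

With the default 4 workers, four restarts within two minutes would drop the pool to 2, and another burst would drop it to 1, which matches the configuration reported above as the one that eventually starts.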

Maybe Inventree shouldn't spawn all workers at once when it starts. A delay could be added before each of them is spawned.
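Staggered startup can be sketched as a loop that pauses between worker launches. `spawn_staggered()` is a hypothetical helper; `spawn` stands in for whatever actually starts a worker (for example `multiprocessing.Process(...).start()`), and the delay is an assumed tuning value rather than an existing Inventree setting.

```python
import time
from typing import Callable, List


def spawn_staggered(count: int, delay_seconds: float, spawn: Callable[[int], object],
                    sleep: Callable[[float], None] = time.sleep) -> List[object]:
    """Start `count` workers one by one, pausing `delay_seconds` between starts."""
    started = []
    for index in range(count):
        if index > 0:
            sleep(delay_seconds)  # give the previous worker time to finish booting
        started.append(spawn(index))
    return started
```

With 4 workers and a 10-second delay, the last worker starts 30 seconds after the first, so the startup load is spread out instead of hitting the CPU and disk all at once. The `sleep` parameter only exists so the timing can be replaced in tests.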

Inventree shouldn't try to make several backups in a row (it did so 3 times, even after it had successfully started).
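Repeated back-to-back backups could be prevented with a guard that allows only one backup at a time and skips any request that arrives too soon after the previous one finished. This is a minimal sketch under stated assumptions: `BackupGuard` is a hypothetical name and the 24-hour minimum interval is an assumed value.

```python
import threading
import time


class BackupGuard:
    """Hypothetical guard: one backup at a time, and not more often than min_interval."""

    def __init__(self, min_interval_seconds=24 * 60 * 60):
        self.min_interval_seconds = min_interval_seconds
        self._lock = threading.Lock()
        self._last_finished = None  # clock value when the last backup completed

    def run(self, backup, now=time.monotonic):
        """Run backup() unless one is in progress or one finished too recently.

        Returns True if the backup ran, False if it was skipped.
        """
        if not self._lock.acquire(blocking=False):
            return False  # another thread is already backing up
        try:
            current = now()
            if (self._last_finished is not None
                    and current - self._last_finished < self.min_interval_seconds):
                return False  # a recent backup exists; skip instead of repeating it
            backup()
            self._last_finished = now()
            return True
        finally:
            self._lock.release()
```

Note that `threading.Lock` only coordinates threads inside a single process. With several separate worker processes, as in this report, the same idea would need a shared lock and a shared "last backup" timestamp (for example stored in the database), which this sketch does not attempt.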

Discussion thread:
#4179

Relevant issue (now closed):
#4086

Deployment Method

  • Docker
  • Bare metal

Version Information

0.9.1 and 0.9.2 from Docker, 0.9.0 not tested.

Relevant log output

No response

SchrodingersGat wrote this answer on 2023-01-11

@MR-DOS some changes are being implemented in #4190

Namely:

  • backup-on-update is now a manual process requiring interaction from the user
  • daily backup is disabled by default

Hopefully this combination will prevent this issue from recurring.

MR-DOS wrote this answer on 2023-01-11

Hi, I believe the fix addresses a completely different issue (or even a non-issue). The problem is not with "invoke update" at all. The problem occurs when starting the server as usual: a normal server instance tries to create backups on startup, because prior versions did not have this feature and therefore no backups exist yet. This forces it to make backups, which fail at that point.
Also, the system load during a backup scales linearly with the number of background workers.

SchrodingersGat wrote this answer on 2023-01-30

@matmair this is a critical one before we release 0.10.0. Any ideas how best to tackle it? It appears that the background worker is being overwhelmed by the daily backup task, especially on initial server startup.

MR-DOS wrote this answer on 2023-01-30

Also, I'm having deterministic periodic freezes with a period of half an hour. They seem to happen at xx:00 and xx:30, ±5 minutes. Inventree freezes for maybe 1-2 minutes before becoming responsive again. This happens every half hour without exception. No cron jobs are running on the system and none of the other services have been updated, so it has to be some change in Inventree.
I will probably have to connect to the machine and start the Docker environment in non-detached mode to figure out what is causing it, but I suspect it's the backup system again. So for now, I'm leaving it here, as it might be relevant to this bug as well. If it turns out to be something else, I will file a new report for that bug.

SchrodingersGat wrote this answer on 2023-01-30

@MR-DOS this is strange: the "backup" task should only run daily, not at half-hour intervals. I have multiple Docker installs running and have never observed this. Please do keep us updated if you find anything else!

matmair wrote this answer on 2023-01-30

@SchrodingersGat I think this is specific to the instance. I have my personal test instance running on a Raspberry Pi with 1 GB of RAM without problems, kept up to date since 0.6.0, with backups.

MR-DOS wrote this answer on 2023-01-30

Docker logs don't show anything particularly useful. A few times, workers were killed due to a timeout, but mostly it is just the normal log of HTTP accesses, with no hint of what's going on in the background.

At the exact moment of the last downtime, something triggered pip, and for some reason it complains about Git:

86.49.231.133 - - [29/Jan/2023:13:38:47 +0100] "GET /api/part/parameter/13627/ HTTP/1.1" 200 48 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
86.49.231.133 - - [29/Jan/2023:13:38:48 +0100] "PATCH /api/part/parameter/13627/ HTTP/1.1" 200 49 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
86.49.231.133 - - [29/Jan/2023:13:38:49 +0100] "GET /api/part/parameter/?search=&part=546 HTTP/1.1" 200 2818 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
13:39:20 [Q] INFO Enqueued 26640
86.49.231.133 - - [29/Jan/2023:13:39:21 +0100] "GET /api/notifications/?read=false HTTP/1.1" 200 2 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
86.49.231.133 - - [29/Jan/2023:13:39:22 +0100] "GET /api/notifications/?read=false HTTP/1.1" 200 2 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
86.49.231.133 - - [29/Jan/2023:13:39:22 +0100] "GET /api/notifications/?read=false HTTP/1.1" 200 2 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
86.49.231.133 - - [29/Jan/2023:13:39:31 +0100] "GET /part/546/ HTTP/1.1" 200 81946 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
172.18.0.4 - - [29/Jan/2023:13:39:31 +0100] "GET /auth/ HTTP/1.1" 200 0 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
86.49.231.133 - - [29/Jan/2023:13:39:31 +0100] "GET /js/dynamic/nav.js HTTP/1.1" 200 8674 "https://inventree.msboss.cz/part/546/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"

EDIT: Why is it running pip during normal operation?
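To pin down when the instance stops responding, one option is to scan the access log for unusually long gaps between consecutive requests. This is a minimal sketch assuming the timestamp format shown in the log above (`[29/Jan/2023:13:38:47 +0100]`); `find_gaps()` and the 30-second threshold used in the example are hypothetical choices.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Matches the bracketed access-log timestamp, e.g. [29/Jan/2023:13:38:47 +0100]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def find_gaps(lines: Iterable[str], threshold: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) request times whose spacing exceeds threshold.

    Lines without an access-log timestamp (pip warnings, task-queue output) are ignored.
    """
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue
        current = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Run over the excerpt above with a 30-second threshold, this would flag the stretch from 13:38:49 to 13:39:21, which is exactly where the pip and Git messages appear. Over a longer log, checking whether the flagged gaps cluster around xx:00 and xx:30 could confirm or rule out the half-hour pattern.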

simonkuehling wrote this answer on 2023-01-31

I might have another instance of the same worker-timeout loop during startup to add to the issue. My Docker production installation of InvenTree:latest is running on a Raspberry Pi 3B+.
The loop occurred after I had added the inventree-brother-plugin to plugins.txt: the initial plugin installation took so long during startup that the worker processes timed out and were constantly restarted.

My "solution" was to increase the gunicorn timeout variable in the .env file to

# Options for gunicorn server
INVENTREE_GUNICORN_TIMEOUT=60

but maybe actions like pip installs should somehow generally not count against the worker timeout? (I'm not sure about the technical internals on this one at the moment...)

matmair wrote this answer on 2023-01-31

@simonkuehling we cannot really influence how startup time is calculated, as that is handled in an upstream package. But changing it to something higher, like 60-90 seconds, sounds like a good idea.

MR-DOS wrote this answer on 2023-02-01

Maybe I have found another reason why my Inventree runs sluggishly. It's making backups all the time! Every 2 minutes, it spits out a backup of both the database and all Inventree files. No wonder it's running into timeouts here and there.
Also, backup retention seems to be fixed at 10 files for SQL backups (rather than covering the last week). Backups of media files, on the other hand, seem to be limited to the last 4 days.
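The two retention behaviours observed here (a fixed number of SQL backup files versus a time window for media backups) can be contrasted in a small sketch. Backups are modelled as (filename, timestamp) pairs; the function names and filenames are hypothetical, and this is not claimed to be how Inventree or its backup tooling implements cleanup.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Backup = Tuple[str, datetime]  # (filename, creation time)


def keep_newest(backups: List[Backup], count: int) -> List[Backup]:
    """Count-based retention: keep only the `count` most recent backups."""
    return sorted(backups, key=lambda b: b[1], reverse=True)[:count]


def keep_recent(backups: List[Backup], now: datetime, max_age: timedelta) -> List[Backup]:
    """Age-based retention: keep backups created within `max_age` of `now`."""
    return [b for b in backups if now - b[1] <= max_age]
```

The arithmetic shows why a 2-minute backup interval is a problem with either policy: keeping 10 files covers only about 18 minutes of history, while a 4-day window would hold roughly 2,880 backups (4 × 24 × 30), each one a full copy of the media directory.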

