Troubleshooting

When running into issues it can be helpful to increase the verbosity of Orchest by changing the log level of all of Orchest’s containers. You can do this using:

orchest patch --log-level=DEBUG

Some other kubectl commands that can be useful when debugging Orchest:

# Inspect the logs of a particular service
kubectl logs -n orchest -f deployment/orchest-api

# Open a shell in a particular service
kubectl exec -n orchest -it deployment/orchest-api -- bash

Exit code 137 when building Orchest images (scripts/build_container.sh)

For Docker Desktop users, make sure to increase the memory allocated to the Docker Engine. This can be done by going to Docker Desktop > Settings > Advanced and increasing the Memory slider (GitHub issue for reference).
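
Exit code 137 means the build process was killed with SIGKILL (128 + 9), typically by the OOM killer. As a quick sanity check (not part of the build script), you can print the memory available to the Docker Engine:

# Print the total memory available to the Docker Engine, in bytes
docker info --format '{{.MemTotal}}'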

Inspecting the orchest-api API schema

To develop against the API it can be useful to have a look at the Swagger documentation. This can be done by port forwarding the orchest-api and visiting the /api endpoint.

# You will be able to visit `localhost:8000/api`
kubectl port-forward -n orchest deployment/orchest-api 8000:80

Inspecting the orchest-database

kubectl port-forward -n orchest deployment/orchest-database 5432:5432

# You could accomplish the same by ``exec``ing into the database pod
# (see the example below), but port forwarding is much more handy
# since your command history will be preserved through restarts, etc.
psql -h 127.0.0.1 -p 5432 -U postgres -d orchest_api
psql -h 127.0.0.1 -p 5432 -U postgres -d orchest_webserver
psql -h 127.0.0.1 -p 5432 -U postgres -d auth_server
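
For completeness, the exec alternative could look as follows (a sketch reusing the deployment and database names from above):

# Run psql directly inside the database pod; note that your command
# history won't be preserved this way.
kubectl exec -n orchest -it deployment/orchest-database -- psql -U postgres -d orchest_api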

Breaking schema changes

What it looks like

The client can’t be accessed (the webserver is not up), or the client can be accessed but a lot of functionality doesn’t seem to work, e.g. creating an environment.

How to solve

kubectl port-forward -n orchest deployment/orchest-database 5432:5432
psql -h 127.0.0.1 -p 5432 -U postgres
# Once in psql, drop the database of interest:
DROP DATABASE orchest_api; -- or orchest_webserver, auth_server
# Exit psql and restart Orchest
bash orchest restart

Note

An alternative approach is to reinstall Orchest: bash orchest uninstall followed by bash orchest install.

Context

Some branches might contain a schema migration that applies changes to the database in a way that is not compatible with dev or any other branch. After running such a migration, switching to another branch leaves the database with a schema that is not compatible with what’s in the code.

Verify

Check the webserver and the API logs. The issue is easy to spot because the service won’t produce any logs other than the ones related to incompatible schema changes and database issues.
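
For example, using the same kubectl logs command as shown at the top of this page:

# Look for schema migration and database errors in the logs
kubectl logs -n orchest deployment/orchest-webserver
kubectl logs -n orchest deployment/orchest-api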

Error: Multiple head revisions

What it looks like

You see an error along the lines of Error: Multiple head revisions are present for given argument 'head' inside one of the services interacting with the DB, e.g. the orchest-api.

How to solve

Using the orchest-api as an example:

bash scripts/migration_manager.sh orchest-api merge heads

It may be that the above doesn’t work, because the orchest-api never reaches a running state. In that case we need to:

# Change the deployment so that it does a sleep instead of invoking
# the cmd of the container.
kubectl -n orchest edit deploy orchest-api
# command: ["sleep"]
# args: ["1000"]

# Now run the migration script inside the orchest-api container,
# where pod_name is the name of the orchest-api pod (e.g. found
# through `kubectl get pods -n orchest`)
kubectl exec -n orchest -it "${pod_name}" -- bash
python migration_manager.py db merge heads
# Exit the shell again so that the next commands run on the host.

# Next we need to copy the new revision file out of the container
# and into the migration revisions directory of the orchest-api
kubectl cp \
    "orchest/${pod_name}:/orchest/services/orchest-api/app/migrations/versions" \
    "services/orchest-api/app/migrations/versions"

# Rebuild the orchest-api container on the node
scripts/build_container.sh -i orchest-api -t "v2022.04.0" -o "v2022.04.0"

# Edit the orchest-api deployment again to make sure to not
# run the sleep command anymore.
kubectl -n orchest edit deploy orchest-api

Context

Alembic creates revision files to do migrations. When two different branches each add a schema migration, the revision heads diverge, similar to git having two branches that point to different commits. Once these branches get merged, the Alembic revision heads need to be merged as well.
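
If you want to inspect the situation with the Alembic CLI directly (illustrative only; it has to run from the directory containing the service's migrations setup):

# List the current head revisions; seeing more than one confirms the issue
alembic heads
# Merge all heads into a single new merge revision
alembic merge heads -m "merge heads"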

Dev mode not working

  • Make sure you started the cluster with the Orchest repository mounted (see here).

  • If you have changed some dependencies (e.g. requirements.txt files) you have to rebuild the image and kill the pod to get it redeployed, as sketched below.
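
For example, for the orchest-api (a sketch: the build_container.sh -i flag and the minikube docker-env trick are the ones used elsewhere on this page, while the label selector is an assumption mirroring the orchest-controller one further below):

# Build the image against minikube's Docker daemon
eval $(minikube -p minikube docker-env)
bash scripts/build_container.sh -i orchest-api -t "v2022.04.0" -o "v2022.04.0"
# Kill the pod so that the deployment recreates it with the new image
kubectl delete pod -n orchest -l app.kubernetes.io/name=orchest-api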

Test updating Orchest

Through the CLI

# Start from a clean slate
orchest uninstall
scripts/build_container.sh -M -t "v2022.04.4" -o "v2022.04.4"
orchest install --dev
# Build the version to update to, then update
scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
orchest update --dev --version=v2022.04.5
# And once more
scripts/build_container.sh -M -t "v2022.04.6" -o "v2022.04.6"
orchest update --dev --version=v2022.04.6

Through the UI

For this to work you need to be running in dev mode and have the orchest-dev-repo mounted (as per setting up minikube for development).

# Start from a clean slate so that we know what version we are on
# before invoking the update.
orchest uninstall

# Build whatever version you like! In case you want to test out
# the product after the update, build the X-1 latest release
# tag.
scripts/build_container.sh -m -t "v2022.04.4" -o "v2022.04.4"

# Install and make sure Orchest is running in dev mode.
orchest install --dev
orchest patch --dev
pnpm run dev

# Build the version to update to
scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"

# Invoke the update through the UI by going to:
# http://localorchest.io/update
...

# In case you want to test it again
scripts/build_container.sh -M -t "v2022.04.6" -o "v2022.04.6"
# Invoke the update through the UI by going to:
# http://localorchest.io/update
...

# And repeat if you like.

Can’t log in to an authentication-enabled instance

Open k9s and open a shell (s shortcut) on the orchest-database pod.

# Log into the DB
psql -U postgres -d orchest_api

UPDATE settings
SET value = '{"value": false}'
WHERE name='AUTH_ENABLED';

Next you need to orchest restart on your host for the changes to take effect, or kill the appropriate pods so that they restart.
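
For example (the label selector is an assumption, mirroring the orchest-controller one used further below):

# Kill the auth-server pod so that it restarts and picks up the new setting
kubectl delete pod -n orchest -l app.kubernetes.io/name=auth-server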

Missing environment variables in pods

What it looks like

The pods of the orchest-api, orchest-webserver, auth-server or celery-worker have issues related to missing environment variables, e.g. they can’t start because a given environment variable is not defined or is wrongly defined.

Context

Some parts of Orchest, like the orchest-controller, are never stopped, regardless of issuing an orchest restart or orchest stop. This makes it so that, despite rebuilding all images (i.e. including the controller) on a given branch and restarting Orchest, the new controller image is not actually deployed. This leads to an inconsistency between the controller running in the cluster and the images built from your branch.

How to solve

  • make sure you have built the controller image: eval $(minikube -p minikube docker-env) followed by bash scripts/build_container.sh -i orchest-controller -o $TAG -t $TAG.

  • stop Orchest: orchest stop.

  • cause a redeployment of the controller image by killing the controller pod, kubectl delete pod -n orchest -l app.kubernetes.io/name=orchest-controller, by scaling the controller deployment down and back up, or through any other preferred solution.

  • start Orchest: orchest start. The full sequence is sketched below.
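
Putting the steps together (a sketch; $TAG stands for whatever image tag you are building, as in the first step):

eval $(minikube -p minikube docker-env)
bash scripts/build_container.sh -i orchest-controller -o "$TAG" -t "$TAG"
orchest stop
# Kill the controller pod so that it gets redeployed with the new image
kubectl delete pod -n orchest -l app.kubernetes.io/name=orchest-controller
orchest start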