Troubleshooting#

When running against issues it can be helpful to increase the verbosity of Orchest and changing the log level of all Orchest’s containers. You can do this using:

orchest patch --log-level=DEBUG

Some other kubectl commands that can be useful when debugging Orchest:

# Inspect the logs of a particular service
kubectl logs -n orchest -f deployment/orchest-api

# Attach a shell in a particular service
kubectl exec -n orchest -it deployment/orchest-api bash

Minikube is unresponsive on Docker Desktop#

Sometimes Minikube becomes unresponsive on Docker Desktop. Some tell-tale signs of this are:

  • You’re getting exit code 137 when building Orchest images (scripts/build_containers.sh)

  • k9s fails to connect to minikube’s context and docker logs minikube hangs, even though minikube is started

  • Minikube commands hang or are really slow

This happens because too little memory is being allocated to Docker Engine. To fix it: open Docker Desktop and go to Settings > Advanced and then increase the “Memory” slider (See this GitHub issue for reference).

Inspecting the orchest-api API schema#

To develop against the API it can be useful to have a look at the swagger documentation. This can be done by portforwarding the orchest-api and visiting the /api endpoint.

# You will be able to visit `localhost:8000/api`
kubectl port-forward -n orchest deployment/orchest-api 8000:80

Inspecting the orchest-database#

kubectl port-forward -n orchest deployment/orchest-database 5432:5432

# You could accomplish the same by ``exec``ing into the database pod,
# this can be much more handy since commands history will be
# preserved through restarts, etc.
psql -h 127.0.0.1 -p 5432 -U postgres -d orchest_api
psql -h 127.0.0.1 -p 5432 -U postgres -d orchest_webserver
psql -h 127.0.0.1 -p 5432 -U postgres -d auth_server

Breaking schema changes#

What it looks like

The client can’t be accessed (the webserver is not up) or the client can be accessed but a lot of functionality seems to not be working, e.g. creating an environment.

How to solve
kubectl port-forward -n orchest deployment/orchest-database 5432:5432
psql -h 127.0.0.1 -p 5432 -U postgres
# Once in psql, drop the db of interest.
drop database orchest_api; # or orchest-webserver, auth-server
# Exit psql and restart Orchest
bash orchest restart

Note

An alternative approach is to reinstall Orchest. bash orchest uninstall followed by bash orchest install`.

Context

Some branches might contain a schema migration that applies changes to the database in a way that is not compatible with dev or any other branch. By moving back to those branches, the database has a schema that is not compatible with what’s in the code.

Verify

Check the webserver and the api logs. It will be easy to spot because the service won’t produce other logs but the ones related to incompatible schema changes and database issues.

Error: Multiple head revisions#

What it looks like

You see an error along the lines of Error: Multiple head revisions are present for given argument 'head' inside one of the services interacting with the DB, e.g. the orchest-api.

How to solve

Using the orchest-api as an example here.

bash scripts/migration_manager.sh orchest-api merge heads

It may be that the above doesn’t work, because the orchest-api never reaches a running state. In that case we need to:

# Change the deployment so that it does a sleep instead of invoke
# the cmd of the container.
kubectl -n orchest edit deploy orchest-api
# command: ["sleep"]
# args: ["1000"]

# Now run the migration script inside the orchest-api container
python migration_manager.py db merge heads

# Next we need to copy the file out of the container and into
# the migration revisions directly inside the orchest-api
kubectl cp \
    "orchest/${pod_name}:/orchest/services/orchest-api/app/migrations/versions" \
    "services/orchest-api/app/migrations/versions"

# Rebuild the orchest-api container on the node
scripts/build_container.sh -i orchest-api -t "v2022.04.0" -o "v2022.04.0"

# Edit the orchest-api deployment again to make sure to not
# run the sleep command anymore.
kubectl -n orchest edit deploy orchest-api
Context

Alembic creates revision files to do migrations. When two different branches have done schema migrations then the head will diverge, similar to git now having two different branches which point to different commits. Once these branches get merged, the alembic revision heads need to be merged as well.

Dev mode not working#

  • Make sure you started the cluster with the Orchest repository mounted, see here.

  • If you have changed some dependencies (i.e. requirements.txt files) you have to rebuild the image and kill the pod to get it redeployed.

Can’t log-in to authentication enabled instance#

Open k9s and open a shell (s shortcut) on the orchest-database pod.

# Log into the DB
psql -U postgres -d orchest_api

UPDATE settings
SET value = '{"value": false}'
WHERE name='AUTH_ENABLED';

Next you need to orchest restart on your host for the changes to take affect. Or kill the appropriate pods so that they restart.

Missing environment variables in pods#

What it looks like

The pods of the orchest-api, orchest-webserver, auth-server or celery-worker are having issues related to missing environment variables. E.g. they can’t start because a given environment variable is not defined or is wrongly defined.

Context

Some parts of Orchest, like the orchest-controller, are never stopped, regardless of issuing a orchest restart or orchest stop. This makes it so that, despite rebuilding all images (i.e. including the controller) on a given branch and restarting Orchest, the new controller image is not actually deployed. This leads to an inconsistency between what’s running in the cluster.

How to solve
  • make sure you have built the controller image, eval $(minikube -p minikube docker-env) then bash scripts/build_container.sh -i orchest-controller -o $TAG -t $TAG.

  • stop Orchest, orchest stop.

  • cause a redeployment of the controller image by killing the controller pod, kubectl delete pod -n orchest -l app.kubernetes.io/name=orchest-controller, or scale down and back up the controller deployment, or any other preferred solution.

  • start Orchest, orchest start.