Development workflow#
Prerequisites#
Required software#
You need the following installed to contribute to Orchest:
Python: version 3.x
Go: Used by the orchest-controller and needed to run our scripts/build_container.sh to build Orchest’s images.
Docker: To build Orchest’s images.
minikube: To deploy Orchest on a local cluster.
kubectl: To manage k8s clusters.
helm: Needed to run our scripts/build_container.sh to create the manifests to deploy the orchest-controller.
pre-commit: To run pre-commit hooks, e.g. linters.
Google Chrome: Required to run integration tests locally.
Optional, but highly recommended:
k9s: Terminal UI to manage k8s clusters.
jq: Useful when working with JSON in your terminal.
gron: Make JSON greppable.
🎉 In case you dare to run a script to install all of the above
Requires Linux.
cd ~/Downloads
# go
curl -L https://go.dev/dl/go1.18.3.linux-amd64.tar.gz -o go.tar.gz
sudo tar -C /usr/local -xzf go.tar.gz
# Docker
# https://docs.docker.com/engine/install/ubuntu/#install-using-the-convenience-script
# https://docs.docker.com/engine/install/linux-postinstall/
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
# minikube
# https://minikube.sigs.k8s.io/docs/start/
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
# kubectl
# https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh -v v3.9.0
# pre-commit
pip install pre-commit
# Node and npm
curl -sL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
# pnpm
sudo npm install -g pnpm
## --- Extras
# k9s
curl -L https://github.com/derailed/k9s/releases/download/v0.25.21/k9s_Linux_x86_64.tar.gz -o k9s.tar.gz
tar -C ~/.local/bin -xzf k9s.tar.gz
Dependencies#
After installing the required software, you need to configure the tools and install additional dependencies.
Note
Make sure you are inside the root of the orchest repository.
# Set-up pre-commit:
pre-commit install
# Install frontend dependencies for local development:
npm run setup --install && pnpm i
# Install the Orchest CLI to manage the Orchest Cluster in k8s:
python3 -m pip install -e orchest-cli
# Install dependencies to build the docs:
python3 -m pip install -r docs/requirements.txt
Orchest’s integration tests require a MySQL client to be installed:
# On Linux (Debian/Ubuntu):
sudo apt install -y default-libmysqlclient-dev
# On macOS (Homebrew):
brew install mysql
Cluster for development#
Currently, the development tools assume that you have Orchest installed on a local minikube cluster. To get the best development experience, it is recommended to mount the Orchest repository in minikube which allows for incremental development.
Note
Make sure you are inside the root of the orchest repository.
# Delete any existing cluster
minikube delete
# Start minikube with the repository mounted in the required place
# for hot reloading to work.
minikube start \
--cpus max \
--memory max \
    --addons ingress,metrics-server \
--mount-string="$(pwd):/orchest-dev-repo" --mount
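To double-check that the repository is indeed mounted inside the minikube node (a quick sanity check, not part of the official setup), you can list the mount point:
# The path matches the --mount-string used above; you should see the repo contents.
minikube ssh -- ls /orchest-dev-repo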
Installing Orchest for development#
Now that all dependencies are installed and your cluster is set up, you can install Orchest!
But before doing so, it is important to realize that the Docker daemon of your host is different from the Docker daemon of minikube. This means that you need to build Orchest’s images on the minikube node in order for minikube to be able to use them, otherwise it will pull the images from DockerHub. Note that DockerHub only contains images of Orchest releases and not active code changes from GitHub branches. Therefore it is important to configure your environment to use minikube’s Docker daemon before building images.
Note
The command below needs to be run in every terminal window you open!
# Use minikube's Docker daemon:
eval $(minikube -p minikube docker-env)
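If you would rather not retype this in every terminal, one optional approach (an assumption on our side, not part of the official setup) is to add a guarded version to your shell rc file:
# Only run the eval when minikube is actually up, so new shells don't error.
echo 'minikube status >/dev/null 2>&1 && eval "$(minikube -p minikube docker-env)"' >> ~/.bashrc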
Next, you can build Orchest’s images. Again, there is an important realization to make here: the images you build are given a specific tag. This tag is used by the Orchest Controller (orchest-controller) to manage the Orchest Cluster. Thus, if you build images with tag X but deploy the orchest-controller with tag Y, the orchest-controller will start pulling the images with tag Y from DockerHub (instead of using the locally built images with tag X). This will become important when rebuilding images after making code changes.
# Verify whether you are using minikube's Docker daemon
echo "$MINIKUBE_ACTIVE_DOCKERD"
Let’s build the minimal set of required images for Orchest to run:
# Set the *tag* to the latest Orchest version available
export TAG="$(orchest version --latest)"
# Build the minimal set of images for Orchest to run
scripts/build_container.sh -M -t $TAG -o $TAG
💡 Additional notes on the “scripts/build_container.sh” script
In this section we will quickly go over the most important options that can be passed to the scripts/build_container.sh script. Note that, because Orchest is a fully containerized application, the new images (the ones with your code changes) need to be built and used by the cluster in order to reflect your changes.
-n: Build the specified images but without using the existing Docker cache. Might be useful in case you are experiencing trouble building an image.
-i: Build a specific image, e.g. ... -i orchest-api
-m: Build a minified set of images, that is, all images except base Environment images. Environment images don’t need to be explicitly built in case you didn’t make any changes to them (see making Environment base image changes) and can instead be pulled in from DockerHub. Do note that the $TAG you are building Orchest with needs to be a valid image tag that exists for the image on Orchest’s DockerHub.
-M: Build the absolute minimal set of images required by Orchest to run. On top of the -m option this also excludes all images for Sessions.
-v: Run the script in verbose mode (useful for debugging), which will also disable parallel building of the images.
Any number of these options can be passed to the script.
Of course, all details about the script can be found by checking out its source code.
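For example, a plausible invocation combining several of these flags (using the $TAG exported earlier) could look like:
# Rebuild only the orchest-api image, without the Docker cache, in verbose mode.
scripts/build_container.sh -n -v -i orchest-api -t $TAG -o $TAG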
And finally, install Orchest:
# The --dev flag is used so that it doesn't pull in the release assets
# from GitHub, but instead uses the manifests from the local filesystem
# to deploy the Orchest Controller. NOTE: these manifests are automatically
# generated when running the above `build_container.sh` script ;)
orchest install --dev
Take a look in k9s and see how Orchest is getting installed.
Once the installation is complete you can reach Orchest using one of the following approaches, depending on your operating system:
Simply access the Orchest UI by browsing to the IP returned by:
minikube ip
Or run the tunnel daemon and browse to localhost:
sudo minikube tunnel
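For example (assuming a Linux desktop where xdg-open is available), you can open the UI straight from the terminal:
# Open the Orchest UI in your default browser; adjust for your OS.
xdg-open "http://$(minikube ip)"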
Does everything look good? Awesome! You’re all set up and ready to start coding now! 🎉
Have a look at our best practices and our GitHub to find interesting issues to work on.
Redeploying Orchest after code changes#
Warning
Running minikube delete is not recommended because it will lose the Docker cache on the minikube node, making rebuilding images very slow. Luckily, it is unlikely you will (ever) need to run minikube delete.
In this section we will go over the three ways to “redeploy” Orchest to reflect your code changes. Note that each approach is best used in specific circumstances.
Using development mode to automatically reflect code changes. (link) Best used when working on a PR and you would like to see your code changes immediately, especially useful when developing the front-end (e.g. orchest-webserver).
Rebuilding and redeploying only the image of the service that you made changes to. (link) Best used when you know the code changes affect only one service and you don’t want to fully re-install Orchest. For example, you want to test a PR that only changed the front-end and want to run in production mode instead of development mode.
Completely uninstalling and installing Orchest again. (link) When making larger changes that touch different parts of Orchest, it is a good idea to fully re-install Orchest. Do note that this should be rather fast because the Docker cache is used when rebuilding images.
Development mode (incremental development)#
For the next steps, we assume you already installed Orchest.
To get “hot reloading”, you need to make sure your minikube cluster was created using the above mount command and that Orchest serves files from your local filesystem (which contains your code changes) instead of the files baked into the Docker images. To achieve the latter, simply run:
Note: Don’t forget to disable cache (DevTools -> Disable cache) or force reload (Command/Ctrl + Shift + R) to see frontend changes propagate.
# In case any new dependencies were changed or added they need
# to be installed
pnpm i
# Run the client dev server for hot reloading of client (i.e. FE)
# files
pnpm run dev
# Get the Orchest Cluster to serve files from your local filesystem.
orchest patch --dev
Note: Your cluster will stay in --dev mode until you unpatch it (using orchest patch --no-dev).
The services that support incremental development are:
orchest-webserver
orchest-api
auth-server
For changes to all other services, you need to redeploy the respective image as described in the next section.
Note
Even if you do incremental development, it is good practice to rebuild the containers and run in production mode before opening a PR (see before committing).
Teardown#
If at any point you want to disable incremental development, proceed as follows:
# Kill the client dev server
kill $(pidof pnpm)
# Revert the patch
orchest patch --no-dev
To stop the cluster, it’s enough to call minikube stop, which will stop all the pods.
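A minimal sketch of the stop/resume cycle (re-passing the mount flags on resume is an assumption; they are needed if you want hot reloading to keep working):
# Stop the cluster; its state is preserved.
minikube stop
# Resume later, remounting the repository for development.
minikube start --mount-string="$(pwd):/orchest-dev-repo" --mount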
Switching branches#
If you have a running development installation with hot reloading, every time you make a change to the code it will be automatically reloaded. However, when switching git branches that are very different, or if changes to certain core components were made, this procedure might produce inconsistent results. A safer way to proceed is to uninstall Orchest before making the switch, see re-installing Orchest.
Rebuilding images#
To easily test code changes of an arbitrary service, you will need to:
rebuild the respective Docker image and
make it available to the k8s deployment.
The procedure changes slightly depending on the deployment type, i.e. single-node or multi-node. Luckily, in the majority of cases you will be using a local single-node cluster (like the one you created in the previous steps).
For the sake of simplicity (without loss of generality), let’s assume you made changes to the orchest-api.
Generally, single node deployments make it far easier to test changes. First of all, configure your environment to use minikube’s Docker daemon if you haven’t already:
# If not active, set it
eval $(minikube -p minikube docker-env)
Now you’re ready to rebuild the images to which you made changes, using the build_container.sh script.
Note: It is very important to use a tag equal to the currently running Orchest version, otherwise your code changes will not be reflected.
export TAG="$(orchest version)"
# Rebuild the images that need it
scripts/build_container.sh -i orchest-api -t $TAG -o $TAG
Alternatively, you can run scripts/build_container.sh -M -t $TAG -o $TAG to rebuild the absolute minimal required set of images instead of cherry-picking. This is not a bad idea given that the Docker cache will be used and thus rebuilds of unchanged images are quick.
Lastly, you need to make sure that your new orchest-api image is used by minikube. This can be done by deleting the respective orchest-api pod (which will automatically get replaced with a new pod serving your updated image, thanks to Kubernetes deployments):
# Kill the pods of the orchest-api, so that the new image gets used
# when new pod gets automatically deployed.
kubectl delete pods -n orchest -l "app.kubernetes.io/name=orchest-api"
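If you want to confirm from the terminal that the replacement pod comes up (and thus starts serving your new image), you can watch the pods:
# Watch the orchest-api pods being recreated; Ctrl+C to stop watching.
kubectl get pods -n orchest -l "app.kubernetes.io/name=orchest-api" --watch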
Check out k9s if you want to use a visual interface instead (highly recommended!).
The single-node procedure above is not possible in multi-node deployments though. Since multi-node is slightly more involved, we provide the following scripts:
# Redeploy a service after building the image using the repo code.
# This is the script that you will likely use the most. This script
# assumes Orchest is installed and running, since it interacts with
# an Orchest service.
bash scripts/redeploy_orchest_service_on_minikube.sh orchest-api
# Remove an image from minikube. Can be useful to force a pull from
# a registry.
bash scripts/remove_image_from_minikube.sh orchest/orchest-api
# Build an image with a given tag, on all nodes.
bash scripts/build_image_in_minikube.sh orchest-api v2022.03.7
# Run arbitrary commands on all nodes.
bash scripts/run_in_minikube.sh echo "hello"
Warning
The redeploy and build_image scripts require the Orchest repository to be mounted in minikube. However, note that multi-node mounting might not be supported by all minikube drivers. We have tested with docker, the default driver.
Re-installing Orchest#
When making larger changes or when wanting to check out a different branch for example, it is a good idea to re-install Orchest. Rest assured, this should be fairly quick!
# Uninstall Orchest before proceeding
orchest uninstall
# Switch git branches if applicable
git switch feature-branch
# Rebuild containers, if needed
eval $(minikube -p minikube docker-env)
export TAG="$(orchest version --latest)"
scripts/build_container.sh -M -t $TAG -o $TAG
# Install Orchest again
orchest install --dev
Making changes#
Before committing#
Make sure your development environment is set up correctly (see prerequisites) so that pre-commit can automatically take care of running the appropriate formatters and linters when running git commit.
In our CI we also run a bunch of checks, such as unit tests and integration tests to make sure the codebase remains stable. To read more about testing, check out the testing section.
Opening a PR#
Note
When opening a PR please change the base in which you want to merge from master to dev. The GitHub docs describe how this can be done.
We use gitflow as our branching model with master and dev being the described master and develop branches respectively. Therefore, we require PRs to be merged into dev instead of master.
When opening the PR a checklist will automatically appear to guide you to successfully completing your PR 🏁
Changing Python dependencies#
Python dependencies for the microservices are specified using pip’s requirements.txt files. Those files are automatically generated by pip-tools from requirements.in files by calling pip-compile, which locks all the transitive dependencies. After a locked requirements.txt file is in place, subsequent calls to pip-compile will not upgrade any of the dependencies unless the constraints in requirements.in are modified.
To manually upgrade a dependency to a newer version, there are several options:
pip-compile -P <dep> # Upgrades <dep> to latest version
pip-compile -U # Try to upgrade everything
As a general rule, avoid writing exact pins in requirements.in unless there are known incompatibilities. In addition, avoid manually editing requirements.txt files, since they will be automatically generated.
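A minimal sketch of the intended workflow, using a hypothetical dependency name (httpx) and the orchest-api service purely for illustration:
# Inside the service whose dependencies you are changing (path is illustrative).
cd services/orchest-api
# Add an unpinned (or loosely pinned) constraint to requirements.in ...
echo "httpx" >> requirements.in
# ... and regenerate the locked requirements.txt.
pip-compile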
Warning
A bug in pip-tools affects local dependencies. Older versions are not affected, but they are not compatible with modern pip. At the time of writing, the best way forward is to install this fork (see this PR for details):
pip install -U "pip-tools @ git+https://github.com/richafrank/pip-tools.git@combine-without-copy"
Database schema migrations#
Whenever one of the services’ database models (in their respective models.py) has been changed, a database migration has to be performed so that all existing users are unaffected by the schema change on update (since they can then be automatically migrated to the latest version).
# Depending on the service that requires schema changes.
scripts/migration_manager.sh orchest-api migrate
scripts/migration_manager.sh orchest-webserver migrate
# For more options run:
scripts/migration_manager.sh --help
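Afterwards, it is worth checking with git which new migration revision files (if any) were generated for the service, so you can review and commit them:
# New revision files should show up as untracked files under the respective service.
git status --short services/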
Building the docs#
Our docs are built using Read the Docs with Sphinx and written in reStructuredText.
To build the docs, run:
cd docs
make html
Tip
👉 If you didn’t follow the prerequisites, then make sure you’ve installed the requirements needed to build the docs:
python3 -m pip install -r docs/requirements.txt
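To preview the result locally, one option (assuming Sphinx’s default _build/html output directory) is to serve the generated HTML with Python’s built-in web server:
# Serve the built docs on http://localhost:8000
python3 -m http.server --directory docs/_build/html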
Example VS Code monorepo set-up#
Note
👉 This section is for VS Code and pyright users.
If you use VS Code (or the pyright language server to be
more precise) then this section is for you. The different services contain their own
pyrightconfig.json
file that configures smart features such as auto complete, go to definition,
find all references, and more. For this to work, you need to install the dependencies of the
services in the correct virtual environment by running:
scripts/run_tests.sh
Next you can create a workspace file that sets up VS Code to use the right Python interpreters (do note that this won’t include all the files defined in the Orchest repo), e.g.:
{
"folders": [
{
"path": "services/orchest-api"
},
{
"path": "services/orchest-webserver"
},
{
"path": "services/base-images/runnable-shared"
},
{
"path": "services/session-sidecar"
},
{
"name": "orchest-sdk",
"path": "orchest-sdk/python"
},
{
"name": "internal lib Python",
"path": "lib/python/orchest-internals/"
}
],
"settings": {}
}
Automated tests#
Unit tests#
Unit tests are being ported to k8s, stay tuned :)!
Integration tests#
Integration tests are being ported to k8s, stay tuned :)!
Manual testing#
Test Environment or custom Jupyter base image changes#
When building environment or custom Jupyter images the image builder mounts the socket of the container runtime running on the node. This means that, to test changes to a base image, all that is needed is to build or load the new base image in the container runtime. Example:
# Make changes to services/base-images/base-kernel-py/Dockerfile, then:
eval $(minikube -p minikube docker-env)
bash scripts/build_container.sh -o v2022.08.11 -t v2022.08.11 -i base-kernel-py
# That's it, you can now build an environment image in Orchest using the
# new python base image.
Currently, this has only been tested with docker as the container runtime.
Test running Orchest on containerd#
To test running Orchest on containerd, we recommend installing MicroK8s.
Alternatively, you can also set up Orchest on GKE (see installation)
or install MicroK8s in a VM (e.g. using VirtualBox).
Next, enable the following addons:
microk8s enable hostpath-storage \
&& microk8s enable dns \
&& microk8s enable ingress
Now that MicroK8s is correctly configured we need to rebuild Orchest’s images and save them to a .tar file so that containerd can unpack the file and use the images.
export TAG=v2022.06.4
scripts/build_container.sh -M -t $TAG -o $TAG
docker save \
$(docker images | awk '{if ($1 ~ /^orchest\//) new_var=sprintf("%s:%s", $1, $2); print new_var}' | grep $TAG | sort | uniq) \
-o orchest-images.tar
👉 I didn’t install MicroK8s on my host
In case you didn’t install MicroK8s on your host directly, you need to ship the images to the MicroK8s node:
scp ./orchest-images.tar {your_user}@${microk8s node ip}:~/
And set up the kubeconfig on your host so that you can use the orchest-cli, like:
KUBECONFIG=/path/to/kubeconfig orchest install --dev
Next, inside the MicroK8s node (which can be your host), you can import the images using:
microk8s ctr --namespace k8s.io --address /var/snap/microk8s/common/run/containerd.sock image import orchest-images.tar
# OR, requires ctr to be installed: https://github.com/containerd/containerd/releases
sudo ctr -n k8s.io -a /var/snap/microk8s/common/run/containerd.sock i import orchest-images.tar
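To double-check that the images ended up in containerd’s k8s.io namespace (a quick sanity check, not part of the official flow), you could list them:
# List imported images and filter for Orchest's.
microk8s ctr --namespace k8s.io --address /var/snap/microk8s/common/run/containerd.sock images ls | grep orchest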
Now you can install Orchest:
orchest install --dev --socket-path=/var/snap/microk8s/common/run/containerd.sock
Run Orchest Controller locally#
For easier debugging it is possible to run the orchest-controller locally with a debugger. We will explain how to do so using VSCode. Make sure your cluster is set up and you’ve installed Go, then follow the steps below:
Run the orchest-controller with a debugger in VSCode, example launch.json:
{
"configurations": [
{
"name": "Launch ctrl",
"type": "go",
"request": "launch",
"mode": "debug",
"program": "${workspaceFolder}/cmd/controller/main.go",
"args": [
"--inCluster=false",
"--defaultVersion=<INSERT VERSION, e.g. v2022.05.0>",
"--assetsDir=${workspaceFolder}/deploy",
"--endpoint=:5000"
],
"env": {
"KUBECONFIG": "~/.kube/config"
}
}
]
}
Next install Orchest and afterwards issue other commands to test the controller with:
# Assuming you are in the root of the orchest git repository
orchest install --dev
# Delete orchest-controller deployment so that the one started with
# the debugger does everything
kubectl delete -n orchest deploy orchest-controller
The Orchest Controller should now be running inside a debugger session.
Without using VSCode#
Build the orchest-controller binary via the Makefile in services/orchest-controller and run the orchest-controller by passing the following command line arguments:
# Assuming you have built the controller via the "make controller" command
./bin/controller --inCluster=false --defaultVersion=v2022.05.3 \
--endpoint=:5000 --assetsDir=./deploy
Test updating Orchest#
Through the CLI#
orchest uninstall
scripts/build_container.sh -M -t "v2022.04.4" -o "v2022.04.4"
orchest install --dev
scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
orchest update --dev --version=v2022.04.5
scripts/build_container.sh -M -t "v2022.04.6" -o "v2022.04.6"
orchest update --dev --version=v2022.04.6
Through the UI#
For this to work you need to be running in dev mode and have the orchest-dev-repo mounted (as per setting up minikube for development).
# Start from a clean slate so that we know what version we are on
# before invoking the update.
orchest uninstall
# Build whatever version you like! In case you want to test out
# the product after the update, build the X-1 latest release
# tag.
scripts/build_container.sh -m -t "v2022.04.4" -o "v2022.04.4"
# Installing and making sure running in dev.
orchest install --dev
orchest patch --dev
pnpm run dev
# Build the version to update to
scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
# Invoke the update through the UI by going to:
# http://localorchest.io/update
...
# In case you want to test it again
scripts/build_container.sh -M -t "v2022.04.6" -o "v2022.04.6"
# Invoke the update through the UI by going to:
# http://localorchest.io/update
...
# And repeat if you like.