In Orchest a project is essentially just a
git repository. It contains:
.git directory, which makes it a git repository.
The code, e.g. the Notebook files that are attached to the pipeline steps.
.orchest directory, which should also be versioned because it defines the environments used by the project. With these versioned as well, the project can be run on any machine.
.
├── .git/
├── .orchest
│   ├── environments/
│   └── pipelines/
├── california_housing.orchest
├── collect-results.ipynb
└── get-data.py
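The layout shown above can be checked with a few lines of Python. This is only an illustrative sketch: `looks_like_orchest_project` and `pipeline_files` are hypothetical helpers, not part of Orchest. It assumes that a `.git` and a `.orchest` directory are enough to recognise a project, and that files ending in `.orchest` (such as `california_housing.orchest` in the listing) are the ones of interest.

```python
from pathlib import Path
from typing import List


def looks_like_orchest_project(path: str) -> bool:
    """Return True if `path` contains both a .git and a .orchest directory.

    Assumption: these two directories are treated as the only markers.
    """
    root = Path(path)
    return (root / ".git").is_dir() and (root / ".orchest").is_dir()


def pipeline_files(path: str) -> List[str]:
    """List the names of regular files ending in .orchest at the project root.

    The .orchest directory itself also matches the glob pattern, so it is
    filtered out with is_file().
    """
    root = Path(path)
    return sorted(p.name for p in root.glob("*.orchest") if p.is_file())
```

Calling `looks_like_orchest_project(".")` from the root of a project with the layout above would be expected to return `True` under these assumptions.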
Projects also encapsulate jobs. However, these are not stored inside the project on the filesystem.
Given that a project is a
git repository, you might wonder where to write data, since
git best practices state that you should commit only source files, not large files.
For storing data locally, all code should write data to the
/data directory. Additionally, secrets should be set using environment variables, as they would otherwise be versioned!
The /data directory is accessible by all pipelines across all projects, even by jobs.
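A minimal sketch of this pattern, assuming a Python pipeline step. The helper names `data_dir`, `save_results` and `read_secret`, the `DATA_DIR_OVERRIDE` variable and the `MY_API_TOKEN` variable are hypothetical and not part of Orchest. The override exists only so the sketch can be tried on a machine where /data is not available.

```python
import json
import os
from pathlib import Path


def data_dir() -> Path:
    """Return the directory for large or generated files.

    Defaults to /data. DATA_DIR_OVERRIDE is a hypothetical variable used
    only to try this sketch outside Orchest.
    """
    return Path(os.environ.get("DATA_DIR_OVERRIDE", "/data"))


def save_results(results: dict, filename: str = "results.json") -> Path:
    """Write results outside the git repository so they are not versioned."""
    target = data_dir() / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(results))
    return target


def read_secret(name: str) -> str:
    """Read a secret from an environment variable instead of a tracked file."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With the default settings, `save_results({"rmse": 0.42})` would write to `/data/results.json`, and `read_secret("MY_API_TOKEN")` would return the value of that environment variable or raise an error if it is unset, so no secret needs to be stored in a versioned file.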
Inside your code (which runs inside environments) you can access your files
using relative paths. In case you are looking to use absolute paths, all files of a project are
mounted to the
👉 Get started by following the quickstart tutorial.
There are numerous ways to get started on a new project in Orchest. You can:
Add a new project from scratch.
Import an existing project using its git repository URL (the same URL you would use to
git clone a repo). Learn how to import a project.
Explore Orchest-curated or community-contributed examples and import them.
git inside Orchest
👉 Would you rather watch a short video tutorial? Check it here: versioning using git in Orchest.
git inside Orchest works using the jupyterlab-git extension which we ship pre-installed. The only
thing that you need to do is configure JupyterLab (go to
settings > configure JupyterLab) and set your
user.name and user.email, for example:
git config --global user.name "John Doe"
git config --global user.email "email@example.com"
If you’d like to add a private SSH key to your terminal session in JupyterLab you can do so through the following commands:
echo "chmod 400 /data/id_rsa" >> ~/.bashrc
echo "ssh-add /data/id_rsa 2>/dev/null" >> ~/.bashrc
echo "if [ -z \$SSH_AGENT_PID ]; then exec ssh-agent bash; fi" >> ~/.bashrc
mkdir -p ~/.ssh
printf "%s\n" "Host github.com" " IdentityFile /data/id_rsa" >> ~/.ssh/config
ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts
Make sure the
id_rsa private key file is uploaded through the file manager (go to File
manager) in the root
🚨 Putting your private key in the
/data folder exposes the private key file to everyone
using your Orchest instance.
Now you can version using
git through a JupyterLab terminal or use the extension through the
To import private
git repositories, upload them directly through the File manager into the
projects/ directory. Orchest will then pick up the project automatically.