How to…¶
Pass data between pipeline steps¶
Please refer to the dedicated section on data passing.
Install new packages¶
Tip
👉 Would you rather watch a short video tutorial? Check it out here: installing additional packages.
To install new packages, use environments: build a new environment that contains your package and select it inside the pipeline editor.
Packages are installed using well-known commands such as pip install and sudo apt-get install.
Note
💡 When updating an existing environment, the new environment will automatically be used inside the visual editor (and for your interactive pipeline runs). However, the JupyterLab kernel needs to be restarted if it was already running.
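After rebuilding an environment and restarting the kernel, it can be useful to confirm that the package is actually visible from the running kernel. The snippet below is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of Orchest.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version with the name of the package you added to the environment returns its version string if the kernel can see it, and None otherwise.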
What not to do¶
Do not install new packages by running bash commands inside the Notebooks. This will require the packages to be installed every time you do a pipeline run, since the state of the kernel environment is ephemeral.
Use git inside Orchest¶
Please refer to the dedicated section on using git inside Orchest.
Import a project¶
Check out our video: importing a project.
Minimize Orchest’s disk size¶
To keep Orchest’s disk footprint to a minimum, you can use the following best practices:
Are you persisting data to disk? Then write it to the /data directory instead of the project directory. Jobs create a snapshot (for reproducibility reasons) of your project directory and would copy data in your project directory for every pipeline run, consuming large amounts of storage. The smaller the size of your project directory, the smaller the size of your jobs.
Do you have many pipeline runs as part of jobs? You can configure your job to only retain a number of pipeline runs and automatically delete the older ones. Steps: (1) edit an existing job or create a new one, (2) go to pipeline runs, and (3) select auto clean-up.
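The first practice can be sketched in a pipeline step as follows. This is only an illustration: the function name persist_results, the subdirectory name my-project, and the file name results.json are assumptions, and only the /data location comes from the text above.

```python
import json
from pathlib import Path


def persist_results(results: dict, data_dir: str = "/data", name: str = "results.json") -> Path:
    """Write `results` as JSON under `data_dir` instead of the project directory.

    The "my-project" subdirectory is a hypothetical name used to keep
    files from different projects apart.
    """
    target_dir = Path(data_dir) / "my-project"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_text(json.dumps(results))
    return target
```

Inside a step you would call, for example, persist_results({"accuracy": 0.9}). Because the file lands outside the project directory, it is not included in the snapshot that jobs make of the project.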
Use a GPU in Orchest¶
GPU support is not yet available. Coming soon!
Use the Orchest CLI¶
Below you will find the most important orchest-cli commands that you need to know (you can also get all this information by running orchest -h):
# Start Orchest.
orchest start
# Stop Orchest (shuts down Orchest completely).
orchest stop
# Install Orchest (check out the dedicated `Installation` guide in
# the `Getting started` section).
orchest install
# Update Orchest to a newer version (NOTE: this can also be done
# through the settings in the UI).
orchest update
# Get extensive version information. Useful to see whether the
# installation was successful.
orchest version
Use Orchest shortcuts like a pro¶
Command palette¶
| Key(s) | Action |
|---|---|
| Control/Command + K | Open command palette |
| ↑/↓ | Navigate command palette commands |
| PageUp/PageDown | Navigate command palette commands |
| Escape | Dismiss command palette |
Pipeline editor¶
| Key(s) | Action |
|---|---|
| Space + click + drag | Pan canvas* |
| Ctrl + click | Select multiple steps |
| Ctrl + A | Select all steps* |
| Ctrl + Enter | Run selected steps* |
| H | Center view and reset zoom |
| Escape | Deselect steps |
| Delete/Backspace | Delete selected step(s) |
| Double click a step | Open file in JupyterLab |
* Requires the mouse to hover over the canvas
Skip notebook cells¶
Notebooks facilitate an experimental workflow, which means that some cells should not be run when the notebook is executed from top to bottom. Since pipeline runs require your notebooks to be executable, Orchest provides a pre-installed JupyterLab extension to skip those cells.
To skip a cell during pipeline runs:
Open JupyterLab.
Go to the Property Inspector (the icon with the two gears on the far right).
Select the cell you want to skip and give it a tag of: skip.
The cells with the skip tag are still runnable through JupyterLab, but when executing these notebooks as part of pipelines in Orchest they will not be run.
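For context, Jupyter stores cell tags in each cell's metadata under the tags key of the .ipynb file. The sketch below shows how a runner could filter on that tag; it is not Orchest's actual implementation, and the function cells_to_run and the example notebook are hypothetical.

```python
SKIP_TAG = "skip"  # the tag name given in the steps above


def cells_to_run(notebook: dict, skip_tag: str = SKIP_TAG) -> list:
    """Return the notebook cells that do not carry `skip_tag`."""
    return [
        cell
        for cell in notebook.get("cells", [])
        if skip_tag not in cell.get("metadata", {}).get("tags", [])
    ]


# Hypothetical, heavily trimmed notebook used only for illustration.
example_notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "df = load_data()"},
        {"cell_type": "code", "metadata": {"tags": ["skip"]}, "source": "df.plot()"},
        {"cell_type": "code", "metadata": {"tags": ["report"]}, "source": "save(df)"},
    ]
}

# Print the source of every cell that would still be executed.
for cell in cells_to_run(example_notebook):
    print(cell["source"])
```

Cells without tags, or with tags other than skip, are kept; only cells whose tag list contains skip are filtered out.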
Migrate to Kubernetes¶
Once we have moved to a Kubernetes-backed version of Orchest (and deprecated the Docker-based version), we will update this section of the documentation with steps on how to migrate your current deployment to a Kubernetes-based one.
Just know that we are super excited to make the Kubernetes version available as part of the open core and that we are invested in providing a smooth migration experience 🔥