When a model is trained, all the model files are packed into a Docker image, which is then used for the training itself and for hyperparameter tuning later on. To build that image, the SDK must be provided with Docker credentials so that it can publish the resulting images to the registry specified in the `Model.image` attribute.
As the image building happens on the cluster, the model files are first uploaded to a blob storage such as S3, GCS, or MinIO and then used by the builder. By default, the SDK uses a cluster-local MinIO installation which doesn’t require any additional configuration. If users wish to use a specific S3 location instead, then appropriate AWS credentials need to be provided.
This guide focuses on two main aspects of credential distribution and configuration:
- Secure automatic mounting of credential files and environment variables to notebook containers
- Using the SDK API for creating and modifying access credentials and parameters.
Secure mounting of credentials
The standard way to share sensitive information securely in Kubernetes is by using `Secret` objects. More information about `Secret` objects is available in the official Kubernetes documentation.
A `Secret` can be created from a file and can be accessed only by the service account which created it, and by users with administrative privileges.
After a `Secret` is created, it can be mounted to a Notebook container as a file or used to populate environment variables. To simplify the process, you can create all the necessary secrets once and then reuse them across notebooks. To make `Secrets` automatically available for mounting, create a `PodDefault` for them.
For more information about `PodDefault`, follow the Kubeflow Notebook Setup Guide or check the `PodDefault` manifest.
Docker credentials
To make Docker credentials available as a `Secret`, create a `config.json` file with the following standard layout:
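A minimal sketch of that layout, using `https://index.docker.io/v1/` as an illustrative registry URL (substitute your own registry):

```json
{
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "<base64-encoded username:password>"
    }
  }
}
```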
The `auth` field is a base64-encoded string of the form `<username>:<password>`, where `<username>` and `<password>` are the actual username and password used to log in to the Docker registry. To generate the value for the `auth` field, use the following command: `echo -n "<username>:<password>" | base64`.
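For example, encoding the hypothetical pair `user:s3cr3t`:

```shell
# base64-encode the username:password pair for the "auth" field
echo -n "user:s3cr3t" | base64
# -> dXNlcjpzM2NyM3Q=
```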
To create a `Secret` from the credentials file `config.json`, run the following command:
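A sketch of the command; the secret name `docker-config` is an illustrative choice, not a required name:

```shell
kubectl create secret generic docker-config \
    --from-file=./config.json \
    --namespace <kaptain_namespace>
```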
Be sure to replace `<kaptain_namespace>` with the namespace you use for creating notebooks. In this example, we used a namespace named `user`.
Verify that the `Secret` is created; the output should look like this:
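Assuming the secret was created under the illustrative name `docker-config` (the `AGE` column will vary):

```shell
kubectl get secret docker-config --namespace <kaptain_namespace>

# NAME            TYPE     DATA   AGE
# docker-config   Opaque   1      10s
```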
To make this `Secret` available for selection in the Notebook creation dialogue, create a file named `pod_default.yaml` with the following contents:
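A sketch of such a `PodDefault`; the resource name, label, and `desc` values are illustrative:

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: docker-config
  namespace: <kaptain_namespace>
spec:
  desc: Docker credentials
  selector:
    matchLabels:
      docker-config: "true"
  volumeMounts:
  - name: secret-volume
    mountPath: /home/kubeflow/.docker
  volumes:
  - name: secret-volume
    secret:
      secretName: docker-config
```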
Create the `PodDefault` resource from the file using the following command:
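For example:

```shell
kubectl apply -f pod_default.yaml
```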
After that, the Docker credentials secret will be available for selection in the Notebook Spawner UI and, if selected, will be mounted to `/home/kubeflow/.docker/`.
AWS credentials
File-based and environment-variable-based configuration
There are two ways to make AWS credentials available in a notebook:
- As a configuration file mounted to the `Pod` from a `Secret`
- As environment variables injected into the `Pod` from a `Secret`
- The configuration file method is recommended when working with the default account settings, i.e. when only credentials such as the AWS Access Key ID, AWS Secret Access Key, and AWS Session Token are needed to access the associated S3 storage.
- The environment variables method is recommended when additional configuration is required, such as the AWS Region, S3 Endpoint URL, S3 bucket access style (URL- or path-style), and protocol signature version. These additional properties are often required when working with non-standard, S3-compatible storage solutions such as MinIO.
File-based AWS credentials
Making an AWS credentials file available as a `Secret` follows the same steps as with Docker credentials. First, create an AWS `credentials` file with the standard layout:
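A sketch of the standard AWS shared-credentials layout; the `aws_session_token` line is only needed when using temporary credentials:

```ini
[default]
aws_access_key_id = <AWS Access Key ID>
aws_secret_access_key = <AWS Secret Access Key>
aws_session_token = <AWS Session Token>
```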
To create a `Secret` from the `credentials` file, run the following command:
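A sketch of the command; the secret name `aws-credentials` is an illustrative choice:

```shell
kubectl create secret generic aws-credentials \
    --from-file=./credentials \
    --namespace <kaptain_namespace>
```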
Be sure to replace `<kaptain_namespace>` with the namespace you use for creating notebooks. In this example, we used a namespace named `user`.
Verify that the `Secret` is created; the output should look like this:
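Assuming the illustrative secret name `aws-credentials`:

```shell
kubectl get secret aws-credentials --namespace <kaptain_namespace>

# NAME              TYPE     DATA   AGE
# aws-credentials   Opaque   1      10s
```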
To make this `Secret` available for selection in the Notebook creation dialogue, create a `PodDefault` referencing it. Create a file named `pod_default.yaml` with the following contents:
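A sketch of such a `PodDefault`; the resource name, label, and `desc` values are illustrative. Mounting the secret at `/home/kubeflow/.aws` places the `credentials` key at `/home/kubeflow/.aws/credentials`:

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: aws-credentials
  namespace: <kaptain_namespace>
spec:
  desc: AWS credentials
  selector:
    matchLabels:
      aws-credentials: "true"
  volumeMounts:
  - name: secret-volume
    mountPath: /home/kubeflow/.aws
  volumes:
  - name: secret-volume
    secret:
      secretName: aws-credentials
```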
Create a `PodDefault` resource from the file using the following command:
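For example:

```shell
kubectl apply -f pod_default.yaml
```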
After creating this resource, the AWS credentials secret will be available for selection in the Notebook Spawner UI and, if selected, will be mounted to `/home/kubeflow/.aws/credentials`.
Environment variable based AWS configuration
Making AWS configuration and credentials available as environment variables requires creating a `Secret` from a manifest.
The following environment variables are supported and recognized by the SDK:
- `AWS_ACCESS_KEY_ID` - The access key to authenticate with S3.
- `AWS_SECRET_ACCESS_KEY` - The secret key to authenticate with S3.
- `AWS_SESSION_TOKEN` - The session token to authenticate with S3.
- `AWS_REGION` - The name of the AWS region.
- `S3_ENDPOINT` - The complete URL of the S3 endpoint. This parameter is required when working with non-standard, S3-compatible storage solutions such as MinIO. It should be set to the resolvable address of the running server.
- `S3_SIGNATURE_VERSION` - The signature version used when signing requests.
- `S3_FORCE_PATH_STYLE` - When enabled, clients will use path style instead of URL style for accessing buckets. Supported values: `true` | `false`.
Creating a `Secret` with environment variables requires a YAML specification file (e.g. `secret.yaml`) with the following contents:
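A sketch of such a manifest; the secret name `aws-s3-config` and the exact set of keys are illustrative, so include only the variables you need:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-s3-config
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base64-encoded value>
  AWS_SECRET_ACCESS_KEY: <base64-encoded value>
  AWS_REGION: <base64-encoded value>
  S3_ENDPOINT: <base64-encoded value>
  S3_SIGNATURE_VERSION: <base64-encoded value>
  S3_FORCE_PATH_STYLE: <base64-encoded value>
```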
`<base64-encoded value>` should contain the actual property value encoded in base64. To encode a specific value in base64, use the following command: `echo -n "<AWS configuration property value>" | base64`.
To create a `Secret` from the YAML specification file (e.g. `secret.yaml`), run the following command:
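For example:

```shell
kubectl apply -f secret.yaml --namespace <kaptain_namespace>
```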
Be sure to replace `<kaptain_namespace>` with the namespace you use for creating notebooks. In this example, we used a namespace named `user`.
Verify that the `Secret` is created:
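For example, assuming the illustrative secret name `aws-s3-config`:

```shell
kubectl get secret aws-s3-config --namespace <kaptain_namespace>

# NAME            TYPE     DATA   AGE
# aws-s3-config   Opaque   6      10s
```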
To make this `Secret` available for selection in the Notebook creation dialogue, create a `PodDefault` referencing it. Create a file named `pod_default.yaml` with the following contents:
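A sketch of such a `PodDefault` using `envFrom` to inject every key of the secret as an environment variable; the resource name and label are illustrative, and `envFrom` support depends on the admission-webhook version deployed in your cluster:

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: aws-s3-config
  namespace: <kaptain_namespace>
spec:
  desc: AWS S3 configuration
  selector:
    matchLabels:
      aws-s3-config: "true"
  envFrom:
  - secretRef:
      name: aws-s3-config
```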
Create a `PodDefault` resource from the file using the following command:
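For example:

```shell
kubectl apply -f pod_default.yaml
```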
After that, the AWS configuration secret will be available for selection in the Notebook Spawner UI and, if selected, will make all the environment variables available in the Notebook.
SDK API for configuring access to Docker and cloud storage
The `Model` class serves as the main API for training, tuning, and deploying models to serving. It uses a Docker registry for publishing images with model files and also requires S3-compatible storage for storing trained models and transient data. For configuring access to the Docker registry and storage, `Model` exposes a `config` argument which allows users to fine-tune their configurations. This section covers the available configuration providers and their defaults.
`Config.default()` in the example below is initialized and expects a Docker configuration file to be present at `/home/kubeflow/.docker/config.json` when run from the Notebook:
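A sketch of what that might look like; the import paths and `Model` arguments are assumptions based on this guide, not a definitive API:

```python
# Illustrative sketch; import paths and argument names are assumptions.
from kaptain.config import Config
from kaptain.model.models import Model

# Config.default() reads /home/kubeflow/.docker/config.json and uses the
# in-cluster MinIO instance for storage when no S3 provider is specified.
model = Model(
    image="registry.example.com/user/my-model",  # target registry for built images
    config=Config.default(),
)
```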
Users can customize the `Config` to provide custom Docker and storage configuration, for example:
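A sketch under the same assumptions; the `docker` and `storage` argument names and the mounted paths are illustrative:

```python
# Illustrative sketch; the provider classes are described below.
config = Config(
    docker=DockerConfigurationProvider.from_file("/home/kubeflow/secrets/config.json"),
    storage=S3ConfigurationProvider.from_env(),
)
```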
The SDK comes with convenience implementations of Docker and S3-compatible storage configuration providers.
Docker Configuration
`DockerConfigurationProvider` supports reading Docker credentials from a file only.
`DockerConfigurationProvider.default()` loads a configuration from `/home/kubeflow/.docker/config.json` and fails with an error if the file is not present.
`DockerConfigurationProvider.from_file(<path/to/config.json>)` loads a configuration from the specified path, and can be used when the config `Secret` is mounted to a non-default path or created by the user.
Example:
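A sketch of both methods; the non-default path is an illustrative placeholder:

```python
# Default: fails if /home/kubeflow/.docker/config.json is missing.
docker_config = DockerConfigurationProvider.default()

# Custom path, e.g. when the Secret is mounted to a non-default location.
docker_config = DockerConfigurationProvider.from_file(
    "/home/kubeflow/secrets/docker/config.json"
)
```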
AWS Configuration
`S3ConfigurationProvider` supports reading AWS credentials from a file or from environment variables.
`S3ConfigurationProvider.default()` loads configuration from `/home/kubeflow/.aws/credentials` and fails with an error if the file is not present.
`S3ConfigurationProvider.from_file(<path/to/aws/credentials>)` loads configuration from the specified path and can be used when the config `Secret` is mounted to a non-default path or created by the user.
`S3ConfigurationProvider.from_env()` loads configuration from the environment variables and can be used when the configuration `Secret` is mounted as environment variables or environment variables are set by the user.
Example:
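A sketch of the three methods; the non-default path is an illustrative placeholder:

```python
# Default: fails if /home/kubeflow/.aws/credentials is missing.
s3_config = S3ConfigurationProvider.default()

# Custom path, e.g. a Secret mounted to a non-default location.
s3_config = S3ConfigurationProvider.from_file("/home/kubeflow/secrets/aws/credentials")

# From environment variables such as AWS_ACCESS_KEY_ID and S3_ENDPOINT.
s3_config = S3ConfigurationProvider.from_env()
```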
MinIO Configuration
`DefaultMinioConfigurationProvider` is a special configuration provider pre-configured for an in-cluster MinIO instance. `DefaultMinioConfigurationProvider.default()` returns a configured instance ready to be used. It is used by default when no S3 provider is specified. `DefaultMinioConfigurationProvider` extends `S3ConfigurationProvider` and supports all the same methods.
Example:
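A sketch of using the pre-configured provider; no extra configuration is required for the cluster-local MinIO instance:

```python
# Ready-to-use provider targeting the in-cluster MinIO installation.
storage = DefaultMinioConfigurationProvider.default()
```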