CockroachDB constructs a secure API call to the cloud storage specified in a URL passed to one of the following statements: BACKUP, RESTORE, IMPORT, EXPORT, and CREATE CHANGEFEED.
We strongly recommend using cloud/remote storage.
URL format
URLs for the files you want to import must use the format shown below. For examples, see Example file URLs.
[scheme]://[host]/[path]?[parameters]
Location | Scheme | Host | Parameters |
---|---|---|---|
Amazon | s3 | Bucket name | AUTH: implicit or specified (default: specified). When using specified, pass the user's AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. ASSUME_ROLE (optional): pass the ARN of the role to assume; use in combination with AUTH=implicit or specified. AWS_SESSION_TOKEN (optional): for more information, see Amazon's guide on temporary credentials. S3_STORAGE_CLASS (optional): specify the Amazon S3 storage class for created objects (default: STANDARD). |
Azure | azure | Storage container | AZURE_ACCOUNT_KEY, AZURE_ACCOUNT_NAME. For more information, see Authentication - Azure Storage. |
Google Cloud | gs | Bucket name | AUTH: implicit or specified (default: specified); CREDENTIALS. For more information, see Authentication - Google Cloud Storage. |
HTTP | http | Remote host | N/A. For more information, see Authentication - HTTP. |
NFS/Local 1 | nodelocal | nodeID or self 2 (see Example file URLs) | N/A |
S3-compatible services | s3 | Bucket name | Warning: While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph). AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION 3 (optional), AWS_ENDPOINT. For more information, see Authentication - S3-compatible services. |
The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's encodeURIComponent function or Go's url.QueryEscape function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
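For example, the following statement passes a hypothetical Azure account key of q1w2e3+r4/t5= with its +, /, and = characters URI-encoded as %2B, %2F, and %3D (the account name and key shown here are placeholders, not real credentials):

BACKUP DATABASE <database> INTO 'azure://acme-co/employees?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=q1w2e3%2Br4%2Ft5%3D';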
You can disable the use of implicit credentials when accessing external cloud storage services for various bulk operations by using the --external-io-disable-implicit-credentials flag.
1 The file system backup location on the NFS drive is relative to the path specified by the --external-io-dir flag set while starting the node. If the flag is set to disabled, then imports from local directories and NFS drives are disabled.

2 Using a nodeID is required and the data files will be in the extern directory of the specified node. In most cases (including single-node clusters), using nodelocal://1/<path> is sufficient. Use self if you do not want to specify a nodeID, and the individual data files will be in the extern directories of arbitrary nodes; however, to work correctly, each node must have the --external-io-dir flag point to the same NFS mount or other network-backed, shared storage.

3 The AWS_REGION parameter is optional since it is not a required parameter for most S3-compatible services. Specify the parameter only if your S3-compatible service requires it.
Example file URLs
Example URLs for BACKUP, RESTORE, changefeeds, or EXPORT given a bucket or container name of acme-co and an employees subdirectory:
Location | Example |
---|---|
Amazon S3 | s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456 |
Azure | azure://acme-co/employees?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=url-encoded-123 |
Google Cloud | gs://acme-co/employees?AUTH=specified&CREDENTIALS=encoded-123 |
NFS/Local | nodelocal://1/path/employees, nodelocal://self/nfsmount/backups/employees 2 |
Cloud storage sinks (for changefeeds) only work with JSON and emit newline-delimited JSON files.
Example URLs for IMPORT given a bucket or container name of acme-co and a filename of employees:
Location | Example |
---|---|
Amazon S3 | s3://acme-co/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456 |
Azure | azure://acme-co/employees.sql?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=url-encoded-123 |
Google Cloud | gs://acme-co/employees.sql?AUTH=specified&CREDENTIALS=encoded-123 |
HTTP | http://localhost:8080/employees.sql |
NFS/Local | nodelocal://1/path/employees, nodelocal://self/nfsmount/backups/employees 2 |
HTTP storage can only be used for IMPORT and CREATE CHANGEFEED.
Encryption
Transport Layer Security (TLS) is used for encryption in transit when transmitting data to or from Amazon S3, Google Cloud Storage, and Azure.
For encryption at rest, if your cloud provider offers transparent data encryption, you can use that to ensure that your backups are not stored on disk in cleartext.
CockroachDB also provides client-side encryption of backup data. For more information, see Take and Restore Encrypted Backups.
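For example, a minimal sketch of a client-side encrypted backup using a passphrase; the bucket, credentials, and passphrase are placeholders:

BACKUP DATABASE <database> INTO 's3://{bucket name}/{path}?AWS_ACCESS_KEY_ID={access key ID}&AWS_SECRET_ACCESS_KEY={secret access key}' WITH encryption_passphrase = '{passphrase}';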
Authentication
When running bulk operations to and from a storage bucket, authentication setup can vary depending on the cloud provider. This section details the necessary steps to authenticate to each cloud provider.
implicit authentication cannot be used to run bulk operations from CockroachDB Cloud clusters; instead, use AUTH=specified.
You can authenticate to Amazon S3 with either specified or implicit authentication. To have users assume IAM roles to complete bulk operations on an S3 bucket, you can also configure assume role authentication in addition to specified or implicit.
Specified authentication
If the AUTH parameter is not provided, AWS connections default to specified and the access keys must be provided in the URI parameters.
As an example:
BACKUP DATABASE <database> INTO 's3://{bucket name}/{path in bucket}/?AWS_ACCESS_KEY_ID={access key ID}&AWS_SECRET_ACCESS_KEY={secret access key}';
Implicit authentication
If the AUTH parameter is implicit, the access keys can be omitted and the credentials will be loaded from the environment (i.e., the machines running the backup).
New in v22.2: You can grant a user the EXTERNALIOIMPLICITACCESS system privilege.
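As a minimal sketch, the privilege can be granted with a statement like the following, assuming a SQL user named maxroach:

GRANT SYSTEM EXTERNALIOIMPLICITACCESS TO maxroach;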
BACKUP DATABASE <database> INTO 's3://{bucket name}/{path}?AUTH=implicit';
You can associate an EC2 instance with an IAM role to provide implicit access to S3 storage within the IAM role's policy. In the following command, the instance example EC2 instance is associated with the example profile instance profile, giving the EC2 instance implicit access to the S3 buckets allowed by that profile's IAM role.
aws ec2 associate-iam-instance-profile --iam-instance-profile Name={example profile} --region={us-east-2} --instance-id {instance example}
Assume role authentication
CockroachDB supports assume role authentication on clusters running v22.2. Authenticating to cloud storage with ASSUME_ROLE on clusters running versions v22.1 and earlier, or mixed versions, is not supported and will result in failed bulk operations.
New in v22.2: To limit access to your Amazon S3 buckets, you can create IAM roles for users to assume. IAM roles do not have an association with a particular user. The role contains permissions that define the operations a user (or Principal) can complete. An IAM user can then assume a role to undertake a CockroachDB backup, restore, import, etc. As a result, the IAM user only has access to the assigned role, rather than having unlimited access to an S3 bucket.
Role assumption applies the principle of least privilege rather than directly providing privilege to a user. Creating IAM roles to manage access to AWS resources is Amazon's recommended approach compared to giving access straight to IAM users.
For example, to configure a user to assume an IAM role that allows a bulk operation to an Amazon S3 bucket, take the following steps:
1. Create a role that contains a policy to interact with the S3 buckets depending on the operation your user needs to complete. See the Storage permissions section for details on the minimum permissions each CockroachDB bulk operation requires. You can create an IAM role in Amazon's Management Console, under the IAM and then Policies menu. Alternatively, you can use the AWS CLI.
2. If you do not already have the user that needs to assume the role, create the user. Under IAM in the Amazon console, navigate to Users and Add users. You can then add the necessary permissions by clicking on the Permissions tab. Ensure that the IAM user has sts:AssumeRole permissions attached. The following policy will give the user assume role permissions:

{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::{account ID}:role/{role name}"
}
}

The Resource here is the Amazon Resource Name (ARN) of the role you created in step 1. You can copy this from the role's Summary page. The sts:AssumeRole permission allows the user to obtain a temporary set of security credentials that gives them access to an S3 bucket to which they would not have access with their user-based permissions.

3. Return to your IAM role's Summary page, and click on the Trust Relationships tab. Add a trust policy into the role, which will define the users that can assume the role.
The following trust policy provides the user the privilege to assume the role:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789123:user/{user}"
},
"Action": "sts:AssumeRole"
}
]
}
When creating a trust policy, consider the following:

- In the trust policy you need to include the ARN of the user that you want to assume the role under Principal. You can also include the Condition attribute to further control access to the Amazon S3 bucket. For example, this could limit the operation to a specified date range, to users with multi-factor authentication enabled, or to specific IP addresses.
- If you set the Principal ARN to root, this will allow any IAM user in the account with the AssumeRole permission to access the Amazon S3 bucket as per the defined IAM role permissions.
- When the IAM user takes on the role to perform a bulk operation, they are temporarily granted the permissions contained in the role; that is, not the permissions specified in their user profile.
4. Run the bulk operation. If using specified authentication, pass in the S3 bucket's URL with the IAM user's AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. If using implicit authentication, specify AUTH=implicit instead. For assuming the role, pass the assumed role's ARN, which you can copy from the IAM role's summary page:

BACKUP DATABASE movr INTO 's3://{bucket name}?AWS_ACCESS_KEY_ID={user key}&AWS_SECRET_ACCESS_KEY={user secret key}&ASSUME_ROLE=arn:aws:iam::{account ID}:role/{role name}' AS OF SYSTEM TIME '-10s';
CockroachDB also supports authentication for assuming roles when taking encrypted backups. To use with an encrypted backup, pass the ASSUME_ROLE parameter to the KMS URI as well as to the bucket's URI:

BACKUP INTO 's3://{bucket name}?AWS_ACCESS_KEY_ID={user key}&AWS_SECRET_ACCESS_KEY={user secret key}&ASSUME_ROLE={ARN}' WITH kms = 'aws:///{key}?AWS_ACCESS_KEY_ID={user key}&AWS_SECRET_ACCESS_KEY={user secret key}&REGION={region}&ASSUME_ROLE={ARN}';
For more information on AWS KMS URI formats, see Take and Restore Encrypted Backups.
Role chaining
Beyond a user assuming a role, it is also possible to "chain" roles, creating a path for users to assume roles for particular operations. Role chaining allows a user to assume a role through one or more intermediate roles instead of directly assuming the final role. In this way, the role chain passes the request for access to the final role in the chain. Role chaining can be useful when a third-party organization needs access to your Amazon S3 bucket to complete a bulk operation, or when your organization grants roles based on limited-privilege levels.
Assuming the role follows the same approach outlined in the previous section. The additional step required to chain roles is to ensure that the ARN of role A, which is assuming role B, is present in role B's trust policy with the sts:AssumeRole action. Role B's trust policy must contain:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{account-A-ID}:role/{role A name}"
},
"Action": "sts:AssumeRole"
}
]
}
In a chain of three roles, role C's trust policy needs to include role B in the same way. For example, to chain three roles so that a user could assume role C, it is necessary to verify the following:
User → | Role A → | Role B → | Role C |
---|---|---|---|
Has permission to assume role A. See step 2. | Has a trust policy that permits the user to assume role A. See step 3. | Has a trust policy that permits role A to assume role B. | Has a trust policy that permits role B to assume role C. |
| | Needs permission to assume role B. | Needs permission to assume role C. | |
When passing a chained role into BACKUP, it will follow this pattern:
BACKUP DATABASE movr INTO "s3://{bucket name}?AWS_ACCESS_KEY_ID={user's key}&AWS_SECRET_ACCESS_KEY={user's secret key}&ASSUME_ROLE={role A ARN},{role B ARN},{role C ARN}" AS OF SYSTEM TIME '-10s';
Each chained role in the list is separated by a comma (,). You can copy the ARN of the role from its summary page.
For Google Cloud Storage, the AUTH parameter passed to the file URL must be set to either specified or implicit. The default behavior is specified in v21.2+. The following sections describe how to set up each authentication method.
Specified authentication
To access the storage bucket with specified credentials, it's necessary to create a service account and add the service account address to the permissions on the specific storage bucket.
The JSON credentials file for authentication can be downloaded from the Service Accounts page in the Google Cloud Console and then base64-encoded:
cat gcs_key.json | base64
Pass the encoded JSON object to the CREDENTIALS parameter:
BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=specified&CREDENTIALS={encoded key}';
Implicit authentication
For CockroachDB instances that are running within a Google Cloud environment, environment data from the service account can be used to implicitly access resources within the storage bucket.
New in v22.2: You can grant a user the EXTERNALIOIMPLICITACCESS system privilege.
For CockroachDB clusters running in other environments, implicit authentication access can still be set up manually with the following steps:
Create a service account and add the service account address to the permissions on the specific storage bucket.
Download the JSON credentials file from the Service Accounts page in the Google Cloud Console to the machines that CockroachDB is running on. (Since this file will be passed as an environment variable, it does not need to be base64-encoded.) Ensure that the file is located in a path that CockroachDB can access.
Create an environment variable instructing CockroachDB where the credentials file is located. The environment variable must be exported on each CockroachDB node:
export GOOGLE_APPLICATION_CREDENTIALS="/{cockroach}/gcs_key.json"
Alternatively, to pass the credentials using systemd, use systemctl edit cockroach.service to add the environment variable Environment="GOOGLE_APPLICATION_CREDENTIALS=gcs-key.json" under [Service] in the cockroach.service unit file. Then, run systemctl daemon-reload to reload the systemd process. Restart the cockroach process on each of the cluster's nodes with systemctl restart cockroach, which will reload the configuration files.

To pass the credentials using code, see Google's Authentication documentation.
Run a backup (or other bulk operation) to the storage bucket with the AUTH parameter set to implicit:

BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=implicit';
If the use of implicit credentials is disabled with the --external-io-disable-implicit-credentials flag, an error will be returned when using AUTH=implicit to access external cloud storage services for bulk operations.
To access Azure storage containers, it is sometimes necessary to URL-encode the account key since it is base64-encoded and may contain +, /, and = characters. For example:
BACKUP DATABASE <database> INTO 'azure://{container name}/{path}?AZURE_ACCOUNT_NAME={account name}&AZURE_ACCOUNT_KEY={url-encoded key}';
If your environment requires an HTTP or HTTPS proxy server for outgoing connections, you can set the standard HTTP_PROXY and HTTPS_PROXY environment variables when starting CockroachDB. You can create your own HTTP server with NGINX. A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from HTTPS URLs.
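As a minimal sketch, the custom CA can be set with a SQL statement like the following; the PEM contents are a placeholder:

SET CLUSTER SETTING cloudstorage.http.custom_ca = '{PEM-encoded certificate}';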
If you cannot run a full proxy, you can disable external HTTP(S) access (as well as custom HTTP(S) endpoints) when importing by using the --external-io-disable-http flag.
While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph).
A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from an S3-compatible service.
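For example, a minimal sketch of a backup to an S3-compatible service such as MinIO; the bucket, credentials, region, and endpoint are placeholders for your service's values:

BACKUP DATABASE <database> INTO 's3://{bucket name}/{path}?AWS_ACCESS_KEY_ID={access key ID}&AWS_SECRET_ACCESS_KEY={secret access key}&AWS_REGION={region}&AWS_ENDPOINT={S3-compatible endpoint URL}';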
Storage permissions
This section describes the minimum permissions required to run CockroachDB bulk operations. While this section lists the required permissions for Amazon S3 and Google Cloud Storage, refer to each provider's documentation for detail on the setup process and the different options for access management.
Depending on the actions a bulk operation performs, it will require different access permissions to a cloud storage bucket.
This table outlines the actions that each operation performs against the storage bucket:
Operation | Permission | Description |
---|---|---|
Backup | Write | Backups write the backup data to the bucket/container. During a backup job, a BACKUP CHECKPOINT file will be written that tracks the progress of the backup. |
Backup | Get | Backups need get access after a pause to read the checkpoint files on resume. |
Backup | List | Backups need list access to the files already in the bucket. For example, BACKUP uses list to find previously taken backups when executing an incremental backup and to find the latest checkpoint file. |
Backup | Delete (optional) | To clean up BACKUP CHECKPOINT files that the backup job has written, you need to also include a delete permission in your bucket policy (e.g., s3:DeleteObject). However, delete is not necessary for backups to complete successfully in v22.1 and later. |
Restore | Get | Restores need access to retrieve files from the backup. Restore also requires access to the LATEST file in order to read the latest available backup. |
Restore | List | Restores need list access to the files already in the bucket to find other backups in the backup collection. This contains metadata files that describe the backup, the LATEST file, and other versioned subdirectories and files. |
Import | Get | Imports read the requested file(s) from the storage bucket. |
Export | Write | Exports need write access to the storage bucket to create individual export file(s) from the exported data. |
Enterprise changefeeds | Write | Changefeeds will write files to the storage bucket that contain row changes and resolved timestamps. |
These actions are the minimum access permissions to be set in an Amazon S3 bucket policy:
Operation | S3 permission |
---|---|
Backup | s3:PutObject , s3:GetObject , s3:ListBucket |
Restore | s3:GetObject , s3:ListBucket |
Import | s3:GetObject |
Export | s3:PutObject |
Enterprise Changefeeds | s3:PutObject |
See Policies and Permissions in Amazon S3 for detail on setting policies and permissions in Amazon S3.
An example S3 bucket policy for a backup:
{
"Version": "2012-10-17",
"Id": "Example_Policy",
"Statement": [
{
"Sid": "ExampleStatement01",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::{ACCOUNT_ID}:user/{USER}"
},
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::{BUCKET_NAME}",
"arn:aws:s3:::{BUCKET_NAME}/*"
]
}
]
}
In Google Cloud Storage, you can grant users roles that define their access level to the storage bucket. For the purposes of running CockroachDB operations to your bucket, the following table lists the permissions that represent the minimum level required for each operation. GCS provides different levels of granularity for defining the roles in which these permissions reside. You can assign roles that already have these permissions configured, or make your own custom roles that include these permissions.
For more detail about Predefined, Basic, and Custom roles, see IAM roles for Cloud Storage.
Operation | GCS Permission |
---|---|
Backup | storage.objects.create , storage.objects.get , storage.objects.list |
Restore | storage.objects.get , storage.objects.list |
Import | storage.objects.get |
Export | storage.objects.create |
Changefeeds | storage.objects.create |
For guidance on adding a user to a bucket's policy, see Add a principal to a bucket-level policy.
Additional cloud storage feature support
Object locking
Delete and overwrite permissions are not required. To complete a backup successfully, BACKUP requires read and write permissions to cloud storage buckets. As a result, you can write backups to cloud storage buckets with object locking enabled. This allows you to store backup data using a write-once-read-many (WORM) model, which refers to storage that prevents any kind of deletion or modification to the objects once written.
We recommend enabling object locking in cloud storage buckets to protect the validity of a backup for restores.
For specific cloud-storage provider documentation, see the following:
- AWS S3 Object Lock
- Retention policies and Bucket Lock in Google Cloud Storage
- Immutable storage in Azure Storage
Amazon S3 storage classes
When storing objects in Amazon S3 buckets during backups, exports, and changefeeds, you can specify the S3_STORAGE_CLASS={class} parameter in the URI to configure a storage class type. For example, the following S3 connection URI specifies the INTELLIGENT_TIERING storage class:
's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}&S3_STORAGE_CLASS=INTELLIGENT_TIERING'
Use the parameter to set one of the storage classes listed in Amazon's documentation. For more general usage information, see Amazon's Using Amazon S3 storage classes documentation.
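For example, a minimal sketch of a backup that writes objects with the INTELLIGENT_TIERING storage class; the bucket and credentials are placeholders:

BACKUP DATABASE <database> INTO 's3://{bucket name}?AWS_ACCESS_KEY_ID={access key ID}&AWS_SECRET_ACCESS_KEY={secret access key}&S3_STORAGE_CLASS=INTELLIGENT_TIERING';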
You can view an object's storage class in the Amazon S3 Console from the object's Properties tab. Alternatively, use the AWS CLI to list objects in a bucket, which will also display the storage class:
aws s3api list-objects-v2 --bucket {bucket-name}
{
"Key": "2022/05/02-180752.65/metadata.sst",
"LastModified": "2022-05-02T18:07:54+00:00",
"ETag": "\"c0f499f21d7886e4289d55ccface7527\"",
"Size": 7865,
"StorageClass": "STANDARD"
},
...
{
"Key": "2022-05-06/202205061217256387084640000000000-1b4e610c63535061-1-2-00000000-users-7.ndjson",
"LastModified": "2022-05-06T12:17:26+00:00",
"ETag": "\"c60a013619439bf83c505cb6958b55e2\"",
"Size": 94596,
"StorageClass": "INTELLIGENT_TIERING"
},
For a specific operation, see the following examples:
- Backup with an S3 storage class
- Create a changefeed with an S3 storage class
- Export tabular data with an S3 storage class