
Debug AWS Service with CLI Tools

This guide shows how to debug AWS service issues with the AWS CLI. The AWS CLI is already installed in the PuppyGraph image.

1. Start PuppyGraph with the Proper Authentication Method

1.1. Start PuppyGraph with Access and Secret Key

Users can pass the access key and secret key to PuppyGraph through environment variables.

Environment Variable     Description                     Example Value
AWS_ACCESS_KEY_ID        access key                      AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY    secret key                      wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
AWS_REGION               AWS region for the S3 bucket    us-east-1

Users can then start PuppyGraph with the following command:

docker run -p 8081:8081 -p 8182:8182 -e AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE -e AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY -e AWS_REGION=us-east-1 --name puppy --rm -itd puppygraph/puppygraph:stable
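
Writing the secret key directly on the command line leaves it in the shell history. As a minimal sketch, assuming the three variables are already exported in the host shell, `docker run -e NAME` (with no `=value`) forwards the host's value into the container. The helper names `check_aws_env` and `start_puppygraph` are hypothetical and not part of PuppyGraph.

```shell
# Hypothetical pre-flight helper: report each of the three variables
# that is unset or empty, and return non-zero if any is missing.
check_aws_env() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var"
      missing=1
    fi
  done
  return "$missing"
}

# Start PuppyGraph with the same ports and image as above.
# `-e NAME` without `=value` copies the host's value into the container.
start_puppygraph() {
  check_aws_env || return 1
  docker run -p 8081:8081 -p 8182:8182 \
    -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_REGION \
    --name puppy --rm -itd puppygraph/puppygraph:stable
}
```

After exporting the three variables on the host, run `start_puppygraph`.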

1.2. Start PuppyGraph Using Assumed Role

Users can start PuppyGraph as before and set the related variables inside the container.

docker run -p 8081:8081 -p 8182:8182  --name puppy --rm -itd puppygraph/puppygraph:stable
Then set the variables as described here.

1.3. Start PuppyGraph Using Instance Profile

Users do not need to set any variables when the instance profile is properly configured for the EC2 machine.

docker run -p 8081:8081 -p 8182:8182  --name puppy --rm -itd puppygraph/puppygraph:stable

2. Enter PuppyGraph Container

Users need to enter the container to use the AWS CLI.

docker exec -it puppy bash

3. Debug Services with the AWS CLI

After setting up the proper authentication method, users can run aws commands in the shell.
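
Before debugging a specific service, it can help to confirm which identity the CLI is actually using. The sketch below relies on the standard `aws sts get-caller-identity` command. The helper names are hypothetical, and treating a session name that starts with `i-` as an instance profile is only a heuristic, based on EC2 naming instance-profile sessions after the instance ID.

```shell
# Print the ARN of the identity the AWS CLI is currently using.
current_identity_arn() {
  aws sts get-caller-identity --query Arn --output text
}

# Hypothetical helper: classify an ARN by the kind of credentials behind it.
classify_identity() {
  case "$1" in
    arn:aws:iam::*:user/*)             echo "iam-user" ;;
    arn:aws:sts::*:assumed-role/*/i-*) echo "instance-profile" ;;
    arn:aws:sts::*:assumed-role/*)     echo "assumed-role" ;;
    *)                                 echo "unknown" ;;
  esac
}
```

Inside the container, `classify_identity "$(current_identity_arn)"` should print the category that matches the authentication method chosen in step 1.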

3.1. Debug AWS S3 Service

PuppyGraph needs to list folders and get objects from S3. To learn more about the S3 CLI, see the AWS CLI documentation.

3.1.1 List S3 Folder

Suppose we want to list the folder s3://example_folder/. We can run:

aws s3 ls s3://example_folder/
The command should return the files and folders inside it. If it returns:
An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
the user does not have permission to list the folder. The user needs to set the policy as described in the guideline.
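
The operation name in the error message points at the IAM action that is probably missing. The helper below is a hypothetical sketch, not a PuppyGraph or AWS tool: it maps `ListObjectsV2` to `s3:ListBucket`, and `HeadObject` or `GetObject` (the calls made when downloading an object) to `s3:GetObject`.

```shell
# Hypothetical helper: given one AWS CLI error line, print the IAM action
# that is most likely missing from the policy.
s3_permission_hint() {
  case "$1" in
    *AccessDenied*|*"(403)"*) ;;   # only permission errors get a hint
    *) echo "not a permission error"; return 0 ;;
  esac
  case "$1" in
    *"ListObjectsV2 operation"*) echo "s3:ListBucket on the bucket" ;;
    *"HeadObject operation"*|*"GetObject operation"*) echo "s3:GetObject on the object" ;;
    *) echo "no hint" ;;
  esac
}

# Example use inside the container:
#   s3_permission_hint "$(aws s3 ls s3://example_folder/ 2>&1)"
```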

3.1.2 Get Objects

Suppose we want to get the S3 object s3://example_folder/schema.json. We can run:

aws s3 cp s3://example_folder/schema.json /tmp/
The command should print the message:
download: s3://example_folder/schema.json to /tmp/schema.json

3.2. Debug AWS Glue Service

For the Glue service, PuppyGraph needs to list databases, list tables, and get table metadata. To learn more about the Glue CLI, see the AWS CLI documentation.

3.2.1 List Databases

To list databases, users must specify the region.

aws glue get-databases --region us-east-1
The command should return a JSON message like:
{
    "DatabaseList": [
        {
            "Name": "test_db",
            "Parameters": {
                "owner": "user"
            },
            "CreateTime": "2025-05-20T01:25:36+00:00",
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CatalogId": "12334778858"
        }
    ]
}
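
When only the database names matter, the JSON can be trimmed with the AWS CLI's global `--query` (JMESPath) and `--output text` options. A minimal sketch, with a hypothetical function name:

```shell
# Print only the Glue database names in the given region.
glue_database_names() {
  aws glue get-databases --region "$1" \
    --query 'DatabaseList[].Name' --output text
}

# Example use inside the container:
#   glue_database_names us-east-1
```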

3.2.2 List Tables

To list tables, users must specify the database name. For example, to list the tables in test_db, run:

aws glue get-tables --database-name test_db
The command should return a JSON message like:
{
    "TableList": [
        {
            "Name": "person",
            "DatabaseName": "test_db",
            "CreateTime": 1686035123.0,
            "UpdateTime": 1690400519.0,
            "Retention": 0,
            "StorageDescriptor": {
                "Columns": [
                    {
                        "Name": "id",
                        "Type": "string",
                        "Parameters": {
                            "iceberg.field.current": "true",
                            "iceberg.field.id": "1",
                            "iceberg.field.optional": "true"
                        }
                    }
                ],
                "Location": "s3://test/test_db.db/person",
                "AdditionalLocations": [],
                "Compressed": false,
                "NumberOfBuckets": 0,
                "SortColumns": [],
                "StoredAsSubDirectories": false
            },
            "TableType": "EXTERNAL_TABLE",
            "IsRegisteredWithLakeFormation": false,
            "CatalogId": "012345678901"
        }
    ]
}
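
For a database with many tables, the full JSON is hard to scan. As a sketch under the same assumptions (global `--query` and `--output text` options, hypothetical function name), the following prints one line per table with its name and storage location:

```shell
# Print "<table name> <S3 location>" for every table in a Glue database.
glue_table_locations() {
  aws glue get-tables --database-name "$1" \
    --query 'TableList[].[Name,StorageDescriptor.Location]' --output text
}

# Example use inside the container:
#   glue_table_locations test_db
```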

3.2.3 Get Table Metadata

To get the details of a table, users must specify the database and table name. For example, to get the table test_db.person, run:

aws glue get-table --database-name test_db --name person
The command should return a JSON message like:
{
    "Table": {
        "Name": "person",
        "DatabaseName": "test_db",
        "CreateTime": 1686035123.0,
        "UpdateTime": 1690400519.0,
        "Retention": 0,
        "StorageDescriptor": {
            "Columns": [
                {
                    "Name": "id",
                    "Type": "string",
                    "Parameters": {
                        "iceberg.field.current": "true",
                        "iceberg.field.id": "1",
                        "iceberg.field.optional": "true"
                    }
                }
            ],
            "Location": "s3://test/test_db.db/person",
            "AdditionalLocations": [],
            "Compressed": false,
            "NumberOfBuckets": 0,
            "SortColumns": [],
            "StoredAsSubDirectories": false
        },
        "TableType": "EXTERNAL_TABLE",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "012345678901"
    }
}
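
A table can be readable in Glue while its data in S3 is not. The sketch below combines the two checks: it reads the `Location` field from the `get-table` output and then lists that S3 prefix. The function names are hypothetical.

```shell
# Print the S3 location recorded in Glue for a table.
glue_table_location() {
  aws glue get-table --database-name "$1" --name "$2" \
    --query 'Table.StorageDescriptor.Location' --output text
}

# List the table's S3 prefix to confirm the same credentials can read it.
check_table_storage() {
  location=$(glue_table_location "$1" "$2") || return 1
  # Drop any trailing slash, then add exactly one so the prefix contents are listed.
  aws s3 ls "${location%/}/"
}

# Example use inside the container:
#   check_table_storage test_db person
```

If the Glue call succeeds but the listing fails with AccessDenied, the problem is in the S3 policy rather than the Glue policy.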

3.3. Debug AWS S3Tables Service

For the S3Tables service, PuppyGraph needs to list databases, list tables, and get table metadata. To learn more about the S3Tables CLI, see the AWS CLI documentation.

3.3.1 List Databases

To list databases (table buckets in S3Tables), users can run:

aws s3tables list-table-buckets

The command should return a JSON message like:

{
    "tableBuckets": [
        {
            "arn": "arn:aws:s3tables:us-east-1:012345678901:bucket/test",
            "name": "test",
            "ownerAccountId": "012345678901",
            "createdAt": "2025-03-20T06:45:33.381786+00:00",
            "tableBucketId": "2d53-76a4-4eab-9acc-acc0",
            "type": "customer"
        }
    ]
}
The arn field in the output is the value needed in the following steps.
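
To avoid copying the ARN by hand, a JMESPath filter can select it by bucket name. A minimal sketch, assuming the bucket name is known; the function name is hypothetical:

```shell
# Print the ARN of the table bucket with the given name.
table_bucket_arn() {
  aws s3tables list-table-buckets \
    --query "tableBuckets[?name=='$1'].arn" --output text
}

# Example use inside the container:
#   table_bucket_arn test
```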

3.3.2 List Tables

To list tables, users need to specify the table bucket ARN obtained from the list-table-buckets result.

aws s3tables list-tables --table-bucket-arn arn:aws:s3tables:us-east-1:012345678901:bucket/test

The command should return a JSON message like:

{
    "tables": [
        {
            "namespace": [
                "test"
            ],
            "name": "person",
            "type": "customer",
            "tableARN": "arn:aws:s3tables:us-east-1:012345678901:bucket/test/table/4f3ee873-cdc5-41cc-bda1-d7b916efa83f",
            "createdAt": "2025-03-20T06:54:24.080422+00:00",
            "modifiedAt": "2025-03-20T06:59:28.057763+00:00"
        }
    ]
}

3.3.3 Get Table Metadata

To get table metadata, users need to specify the table bucket ARN, the namespace, and the table name.

aws s3tables get-table --table-bucket-arn arn:aws:s3tables:us-east-1:012345678901:bucket/test --namespace test --name person

The command should return a JSON message like:

{
    "name": "person",
    "type": "customer",
    "tableARN": "arn:aws:s3tables:us-east-1:012345678901:bucket/test/table/4f3ee873-cdc5-41cc-bda1-d7b916efa83f",
    "namespace": [
        "test"
    ],
    "namespaceId": "2324dc2ed-618a-4323-9c12-21ee3411e8uu",
    "metadataLocation": "s3://4f3ee873-cdc5-41cc-mhnuezdnba67t55cb1xbb4otyiqmquse1b--table-s3/metadata/00001-e1533bf2-581e-4689-9dfa-4b141f1bbf8e.metadata.json",
    "warehouseLocation": "s3://4f3ee873-cdc5-41cc-mhnuezdnba67t55cb1xbb4otyiqmquse1b--table-s3",
    "createdAt": "2025-03-20T06:54:24.080422+00:00",
    "createdBy": "012345678901",
    "modifiedAt": "2025-03-20T06:59:28.057763+00:00",
    "ownerAccountId": "012345678901",
    "format": "ICEBERG",
    "tableBucketId": "3dea5312-73l1-1qaa-0aec-alcd9aa49a00"
}
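
The metadataLocation field is usually the most useful part of this output when debugging, since it names the Iceberg metadata file for the table. As a sketch with a hypothetical function name, it can be extracted on its own:

```shell
# Print only the Iceberg metadata file location for a table.
# Arguments: table bucket ARN, namespace, table name.
s3tables_metadata_location() {
  aws s3tables get-table --table-bucket-arn "$1" --namespace "$2" --name "$3" \
    --query metadataLocation --output text
}

# Example use inside the container:
#   s3tables_metadata_location arn:aws:s3tables:us-east-1:012345678901:bucket/test test person
```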