Exporting and Downloading Datasets

The DiVA API enables you to define a Marqeta platform dataset and export it as a compressed CSV file. You can choose either Zip or Gzip compression. After the export, you use the API to download the compressed file.

Exporting a dataset as a file

You can export any dataset as a CSV file by sending a GET request to the appropriate endpoint. To construct your endpoint URL, start with the URL you would use to retrieve that same dataset in JSON format, for example:

/views/authorizations/month?program=my_program

Then insert the export_type path parameter (/csv) before the query string, for example:

/views/authorizations/month/csv?program=my_program

By default, the resulting dataset is compressed as a gz file. You can compress it as a zip file instead by including the compress query parameter, for example:

/views/authorizations/month/csv?compress=zip&program=my_program

Because the export operation is processed asynchronously, you should receive an immediate 202 Accepted response. The JSON-formatted response body contains a token that you will use to download your dataset file, for example:

JSON

{
    "token": "db63c24d8307c24b7e17d33735114dc8f807838a.csv.gz"
}
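
The URL construction described in this section can be sketched as a small helper. This is a minimal illustration rather than part of the DiVA API: the function name build_export_url and its parameters are hypothetical, and the sketch assumes the JSON view URL is passed as a path or full URL with an optional query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_export_url(json_view_url, compress=None):
    """Turn a JSON view URL into its CSV export URL (hypothetical helper).

    json_view_url - URL or path used to retrieve the dataset as JSON
    compress      - None for the default gz compression, or 'zip'
    """
    parts = urlsplit(json_view_url)

    # Insert the export_type path parameter (/csv) before the query string
    path = parts.path.rstrip('/') + '/csv'

    # Keep the existing query parameters and add compress if requested
    query = parse_qsl(parts.query)
    if compress is not None:
        query.insert(0, ('compress', compress))

    return urlunsplit((parts.scheme, parts.netloc, path,
                       urlencode(query), parts.fragment))


print(build_export_url('/views/authorizations/month?program=my_program'))
# /views/authorizations/month/csv?program=my_program
print(build_export_url('/views/authorizations/month?program=my_program',
                       compress='zip'))
# /views/authorizations/month/csv?compress=zip&program=my_program
```

Parsing the URL instead of concatenating strings keeps any existing query parameters intact, whichever order they appear in.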

Downloading the exported file

Note
By default, the DiVA API returns up to 1,048,575 rows in a file export, and generating the file can take several minutes. You can increase the limit to as many as 5,000,000 rows by including the max_count=5000000 query parameter.

To retrieve your file, send a GET request to the /download?token={my_download_token} endpoint, where {my_download_token} is the value of the token field that was returned in response to your export request, for example: /download?token=db63c24d8307c24b7e17d33735114dc8f807838a.csv.gz

Note
The token value includes two filename extensions (for example, .csv.gz). You must include these extensions in your request URL.
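
As a rough sketch of the request URL described above, the helper below appends the token to the /download endpoint and rejects a token that has lost its filename extensions. The function name build_download_url is hypothetical, and the check for at least two extensions is an assumption based on the .csv.gz example, not a rule stated by the API.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def build_download_url(base_url, token):
    """Build the /download URL for an export token (hypothetical helper)."""
    # The token must keep both filename extensions, for example .csv.gz
    if len(PurePosixPath(token).suffixes) < 2:
        raise ValueError('token is missing its filename extensions: ' + token)

    # Percent-encode the token so it is safe to place in a query string
    return base_url.rstrip('/') + '/download?token=' + quote(token, safe='')


print(build_download_url('https://diva-api.marqeta.com/data/v2',
                         'db63c24d8307c24b7e17d33735114dc8f807838a.csv.gz'))
```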

The API returns one of these responses:

If the job is not finished: The 202 "Accepted" HTTP response code and a plain-text body containing the word Pending.
If the job is finished: The 200 "OK" HTTP response code and the file as an application/octet-stream.
If the job has expired: The 410 "Gone" HTTP response code. Completed jobs expire after 60 minutes.
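
The three responses listed above suggest a polling loop that stops as soon as the job is finished or has expired. The following is a minimal sketch under stated assumptions: the function name wait_for_export, its parameters, the returned strings, and the 5-second polling interval are all hypothetical. The status check is passed in as a callable so that the loop itself makes no network calls.

```python
import time

# HTTP response codes described for the /download endpoint
RC_READY = 200
RC_PENDING = 202
RC_EXPIRED = 410


def wait_for_export(check_status, timeout_seconds=300, poll_seconds=5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the export is ready, expired, failed, or timed out.

    check_status - callable that returns the current HTTP status code
    Returns one of 'ready', 'expired', 'error', or 'timeout'.
    """
    deadline = clock() + timeout_seconds
    while True:
        code = check_status()
        if code == RC_READY:
            return 'ready'
        if code == RC_EXPIRED:
            return 'expired'  # completed jobs expire after 60 minutes
        if code != RC_PENDING:
            return 'error'    # any other code is treated as a failure
        if clock() >= deadline:
            return 'timeout'
        sleep(poll_seconds)


# Example with canned status codes instead of real HTTP requests
codes = iter([202, 202, 200])
print(wait_for_export(lambda: next(codes), sleep=lambda seconds: None))
```

In real use, check_status would wrap an HTTP request to the download URL and return its status code. Treating 410 as a terminal state avoids polling until the timeout for a file that can no longer be downloaded.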

When saving your file, use the same filename extensions you used in your URL request, for example: my_downloaded_file.csv.gz

The following example of Python code illustrates how you can download an exported report file in CSV format:

Python

import time

import pandas as pd
import requests
from requests.auth import HTTPBasicAuth

# Constants for HTTP response codes
RC_SUCCESS = 200
RC_ACCEPTED = 202
RC_UNAUTHORIZED = 401

# Build the HTTP Basic Authentication credentials
username = 'APPLICATION_TOKEN'  # replace APPLICATION_TOKEN with your application token
password = 'ACCESS_TOKEN'  # replace ACCESS_TOKEN with your access token
basic_auth = HTTPBasicAuth(username, password)


# Download an exported file with the specified token
# Parameters:
#   file_token - token of the file to download
#   auth - authentication credentials
#   base_url - base API path for the download URL
#   retry_seconds - maximum time to retry, in seconds
# Returns the dataset as a pandas DataFrame, or None if the file was not ready in time
def getCSV(file_token, auth, base_url, retry_seconds=300):

    # Set timeout to current time plus maximum time to retry
    timeout = time.time() + retry_seconds

    # Build URL to download exported file
    download_file_url = base_url + '/download?token=' + file_token

    # Check whether the file is ready for download
    code = requests.head(download_file_url, auth=auth).status_code
    while code != RC_SUCCESS and time.time() < timeout:
        time.sleep(1)
        # Retry the status check
        code = requests.head(download_file_url, auth=auth).status_code

    if code != RC_SUCCESS:
        return None  # status check timed out

    # Status check succeeded: the file is ready to download
    download_response = requests.get(download_file_url, auth=auth)

    # Save the response content into a temporary file
    with open('temp.csv.gz', 'wb') as file:
        file.write(download_response.content)

    # Read the CSV content from the gzipped file, skipping malformed lines
    # (on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0)
    return pd.read_csv('temp.csv.gz', compression='gzip', on_bad_lines='skip')


# Build URL to export dataset for resource of interest (e.g. cards) in desired file format (e.g. CSV)
api_base_path = 'https://diva-api.marqeta.com/data/v2'
resource_format_path = '/views/cards/detail/csv'
program_selector = '?program=MY_PROGRAM'  # replace MY_PROGRAM with the name of your program
export_dataset_url = api_base_path + resource_format_path + program_selector

# Invoke request to export the dataset
export_response = requests.get(export_dataset_url, auth=basic_auth)

if export_response.status_code == RC_ACCEPTED:  # export request succeeded

    # Obtain the CSV file token from the response
    export_file_token = export_response.json().get('token')

    # Call the getCSV function to download the CSV file
    data = getCSV(file_token=export_file_token, auth=basic_auth, base_url=api_base_path)

    if data is None:
        print('Failure: No timely response')
    else:
        print('Success: Dataset length = ' + str(len(data)))

elif export_response.status_code == RC_UNAUTHORIZED:
    print('Failure: Unauthorized access')  # authentication failed

else:
    print('Failure: Unknown error')  # export request failed
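
The example above assumes the default gz compression. As a hedged sketch of how the same idea might extend to zip exports, the helpers below name the local file after the token's extensions and read the CSV rows using only the Python standard library. The names local_filename and read_export_rows are hypothetical. The sketch also assumes that a zip export's filename ends in .zip and that the archive contains a single CSV member, neither of which is confirmed here.

```python
import csv
import gzip
import io
import zipfile
from pathlib import PurePosixPath


def local_filename(token, stem='my_downloaded_file'):
    """Name a local file with the same extensions as the export token."""
    return stem + ''.join(PurePosixPath(token).suffixes)


def read_export_rows(path):
    """Read all CSV rows from a .gz or .zip export file (illustrative)."""
    if path.endswith('.gz'):
        with gzip.open(path, 'rt', encoding='utf-8', newline='') as handle:
            return list(csv.reader(handle))
    if path.endswith('.zip'):
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds a single CSV member
            member = archive.namelist()[0]
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
                return list(csv.reader(text))
    raise ValueError('unsupported file extension: ' + path)


print(local_filename('db63c24d8307c24b7e17d33735114dc8f807838a.csv.gz'))
# my_downloaded_file.csv.gz
```

Choosing the reader from the file extension keeps the saved filename and the decompression step consistent with the token that the export request returned.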
