Sending data in batches
How to send data in groups or batches using the Data Ingestion API.
The Cable Data Ingestion API allows you to send data to Cable for processing by using batched requests.
The per-endpoint documentation, which describes each endpoint’s specification, also indicates which endpoints support batching.
What is batching?
Each HTTP connection that your client makes to our servers results in a certain amount of overhead. Batching is the act of grouping data together so that you can send it all at once in a single request, instead of making multiple requests each with a single data entry. This results in less overhead.
When to use batching
Batching can help in several situations. Here are a few examples:
Sending historical data
Your company might already have been live for months or years. Cable can take in historical data to perform a retrospective assessment of your financial crime control effectiveness. Using batching, a one-off script can be written to send Cable the required historical data with as little overhead as possible.
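As a minimal sketch of such a one-off script, the helper below splits a list of historical records into fixed-size batches and serialises each batch as a JSON array. The function name, the batch size of 500, and the record fields are illustrative assumptions rather than part of the Cable API; check the per-endpoint documentation for the actual field names and any size limits.

```python
import json
from typing import Iterator


def iter_batches(records: list[dict], batch_size: int = 500) -> Iterator[list[dict]]:
    """Yield consecutive slices of `records`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Hypothetical historical records; the field names are placeholders.
historical = [{"id": str(i), "timestamp": "2021-01-01T00:00:00Z"} for i in range(1200)]

# One JSON array body per batch: 1200 records become 3 request bodies
# instead of 1200 single-entry requests.
bodies = [json.dumps(batch) for batch in iter_batches(historical)]
```

Each string in `bodies` could then be sent as the body of one HTTP request with whatever HTTP client you already use.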
If your internal pipelines are complex, or you make changes during the course of a day, you might want to create a scheduled job that sends Cable multiple data entries at once, at a scheduled time.
Scheduled batched submissions are also useful when you have access to a data lake (or warehouse), like Redshift or BigQuery, and tools like Looker, Tableau, SAP, etc. The Cable Data Ingestion API offers a very flexible environment for direct integrations from these types of sources.
For various reasons, you may receive data as CSV files, for example from other departments within your company, or data that originally had a different destination or purpose.
The Cable API supports CSV file submissions on the same endpoints that support single and batched JSON data submissions (see this guide for more details).
Based on the programming language you use internally, you can also find packages or modules online to help you read CSV files so that you can create the necessary HTTP calls to the API to send over the information in JSON format.
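For example, in Python the standard library is enough to turn CSV rows into a JSON array. This is only a sketch: the column names below are placeholders, and the mapping from CSV columns to the properties an endpoint expects is an assumption you would need to adapt.

```python
import csv
import io
import json


def csv_to_json_batch(csv_text: str) -> str:
    """Parse CSV text with a header row and return a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps([dict(row) for row in reader])


# Hypothetical CSV content; the column names are placeholders.
sample = "id,timestamp\nabc,2021-01-01T00:00:00Z\ndef,2021-01-02T00:00:00Z\n"
payload = csv_to_json_batch(sample)
```

Note that `csv.DictReader` returns every value as a string, so numeric or boolean columns would need to be converted before the data is serialised.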
Improving “real-time” integrations
If you have already integrated with the Cable Data Ingestion API, there may be occasions when you want to send two or more data entries at the same time. Examples include receiving multiple data points from an internal queueing system that outputs processed data in chunks, or receiving data in chunks from one of your other third-party integrations.
Managing a single connection means you don't have to perform the initial handshake for each data entry. With batching, all entries are processed together and returned in one response. A single batched request may take a bit longer to handle than any one individual request, but overall throughput increases because the round-trip latency between request and response is not multiplied by the number of entries. The result is a performance gain in request handling speed.
Another thing to keep in mind is throttling. API throttling is the process of limiting the number of API requests a client can make in a certain period. It helps Cable ensure fair use of our APIs and combat possible malicious use. Batching helps you avoid throttling because it reduces the number of requests you send.
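The round-trip argument can be made concrete with a rough model. The sketch below assumes each request pays a fixed round-trip cost plus a per-entry processing cost; the model and the numbers are illustrative assumptions, not measurements of the Cable API.

```python
def sequential_time(n: int, round_trip: float, processing: float) -> float:
    """Total seconds for n single-entry requests sent one after another."""
    return n * (round_trip + processing)


def batched_time(n: int, round_trip: float, processing: float) -> float:
    """Total seconds for one request carrying n entries."""
    return round_trip + n * processing


# Assumed figures: 100 entries, 50 ms round trip, 2 ms processing per entry.
seq = sequential_time(100, 0.05, 0.002)
bat = batched_time(100, 0.05, 0.002)
```

Under these assumptions the sequential approach takes about 5.2 seconds and the batched one about 0.25 seconds, because the round-trip cost is paid once instead of 100 times.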
As mentioned above, endpoints are built to support batching by default. This means that whether the client sends one data entry or several, the format is the same: an array containing one or more entries. This leaves less room for confusion about how to do one or the other, and makes it effortless for a client to switch at any point.
As with the format, the responses from the endpoints are the same for single and batched submissions.
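Because the request format is always an array, client code can use one code path for both cases. The helper below is a hypothetical sketch (the function name and entry fields are not part of the Cable API) that wraps a single entry in a list so that single and batched submissions serialise identically.

```python
import json
from typing import Union


def to_request_body(entries: Union[dict, list[dict]]) -> str:
    """Serialise one entry or many entries as a JSON array."""
    if isinstance(entries, dict):
        # A single entry is sent as an array with one element.
        entries = [entries]
    return json.dumps(entries)


# Placeholder entries; real property names come from the endpoint documentation.
single_body = to_request_body({"id": "abc"})
batch_body = to_request_body([{"id": "abc"}, {"id": "def"}])
```

Switching from single to batched submissions then only changes how many entries are passed in, not the shape of the request.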
For success-200 responses, the response will always be:
For error responses, the returned JSON will be more complex. For batching specifically, what you need to look at are the “body” errors. These contain the index of each failing data entry inside the submitted array, as well as the properties that failed.
An example where the 1st entry is missing the timestamp property and the 3rd is missing data will look like this:
The example below applies to all endpoints that support batching.
- Single data entry request
- Multiple data entries request