Sharing data with CSV files
Learn how to collect and format your data to send to Cable using CSV files
Introduction
Cable is the leader in financial crime Automated Assurance. We automatically monitor your financial crime controls to identify regulatory breaches, control failures, and any risks that could indicate your controls are not effective.
For Cable to be able to conduct ongoing monitoring of your financial crime controls, we need to receive regular data updates from you. You can either do this programmatically via our API, or by uploading CSV files directly through our web app. It may be easier for you to get started by providing us with manual CSV uploads.
This guide goes through two important topics:
- How to format your CSV file, provide good data to Cable via CSV, and avoid common pitfalls
- Commonly asked questions when uploading data
Before you begin this guide, you should be familiar with CSV files, including their common pitfalls with quoted strings and the separators used between values. Most programs (e.g. Excel) can quickly export data to the CSV format.
Formatting your CSV file and data
It is important to understand the format of the CSV file to avoid upload and configuration issues.
When generating CSV files, confirm the following:
- There is a single row at the top of your spreadsheet that acts as the header (or label) for each column
- The delimiter is a comma, not a semicolon (if your export tool offers that option)
- The encoding is Unicode (UTF-8), when that is available in the export options
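The checklist above can be sketched with Python's standard `csv` module. This is a minimal illustration only: the rows and the column names (`id`, `name`, `country`) are hypothetical and are not Cable's schema.

```python
import csv
import io

# Hypothetical rows; the column names are illustrative, not Cable's schema.
rows = [
    {"id": "1", "name": "Acme Ltd", "country": "GBR"},
    {"id": "2", "name": "Zoë & Co", "country": "PRT"},
]

def write_csv(rows, fieldnames, stream):
    """Write one header row, then one comma-delimited line per record."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, delimiter=",")
    writer.writeheader()
    writer.writerows(rows)

# Written to memory here so the result can be inspected.
buffer = io.StringIO(newline="")
write_csv(rows, ["id", "name", "country"], buffer)
csv_text = buffer.getvalue()

# Encode explicitly as UTF-8 so non-ASCII characters survive the export.
csv_bytes = csv_text.encode("utf-8")
```

To write to disk instead, pass a file opened with `open(path, "w", encoding="utf-8", newline="")` as the `stream` argument.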
There are a few things to watch out for:
- Opening a CSV in a different program can change its encoding or add unwanted formatting, and saving the file afterwards can corrupt the data
- Special characters need to be correctly escaped
  - Examples:
    - Text with , → Nationality GBR,PT
    - Text with " → Risk rate "1.79"
  - Should be represented as:
    - "Nationality GBR,PT"
    - "Risk rate ""1.79"""
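If you generate your CSV files with a script, a CSV library can usually apply these escaping rules for you. The sketch below uses Python's standard `csv` module on the two example values; it is one possible approach, not a required tool.

```python
import csv
import io

# The two example values: one contains a comma, the other double quotes.
values = ["Nationality GBR,PT", 'Risk rate "1.79"']

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
for value in values:
    writer.writerow([value])  # one single-column row per value

# Each value is wrapped in quotes, and embedded quotes are doubled.
escaped_lines = buffer.getvalue().splitlines()

# Reading the text back recovers the original values unchanged.
reader = csv.reader(io.StringIO(buffer.getvalue(), newline=""))
round_trip = [row[0] for row in reader]
```

Here `escaped_lines` holds `"Nationality GBR,PT"` and `"Risk rate ""1.79"""`, the same representations shown in the list above.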
Formatting your data correctly
Review your data and make sure that the formatting is consistent before sharing it with Cable. Inconsistencies are much more painful to correct after the data has been shared.
Our best practices for formatting your data are as follows:
- Name each CSV file after the data it contains, and structure its content like the corresponding API endpoint, so that it is easier for Cable to process
- Avoid nested fields in your columns. Try not to put JSON inside a single cell: flattening it makes the data easier to work with, even if that means more columns
- Define how you format null or empty values, and format these values consistently so that they are easy to parse
- Review values that should fall into categories, and make sure the values correspond to what you expect when you export data from the source system
  - For example: if you have a list of 3-letter country codes, make sure there are no variations such as GBR, gbr, Gbr, etc.
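As a rough sketch of this kind of clean-up, the Python below normalises null markers and 3-letter country codes before export. The list of null markers, and the choice of an empty string as the single null representation, are assumptions for illustration; use whatever convention you have agreed for your own data.

```python
# Assumed set of spellings that should all count as "no value".
NULL_MARKERS = {"", "null", "none", "n/a", "na", "-"}

def normalise_null(value):
    """Map every null-like spelling to one representation (empty string)."""
    if value is None:
        return ""
    text = value.strip()
    return "" if text.lower() in NULL_MARKERS else text

def normalise_country(value):
    """Upper-case a 3-letter country code and reject anything else."""
    code = normalise_null(value).upper()
    if code and (len(code) != 3 or not code.isalpha()):
        raise ValueError(f"unexpected country code: {value!r}")
    return code

raw = ["GBR", "gbr", " Gbr ", "NULL", None]
cleaned = [normalise_country(v) for v in raw]
```

Raising an error on unexpected values, rather than silently passing them through, makes category problems visible before the file is shared.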
What does a good CSV look like?
This section uses practical examples to explain why you need to prepare your data and how to do it. These examples focus on companies and account_statuses data, but can be generalised to any other data point that Cable collects. This section covers reference IDs and timestamps, which are both essential for correct processing of your CSV data.
You can look at the Account Status API Endpoint documentation to better understand the data that is being used.
Reference IDs in the CSV
When uploading multiple files, where each file has a different type of data, it is important to be able to link records across these files. For example, we want to know how many account_status changes a company has been through.
Each CSV file should contain a column with the id you use internally to identify the entity (user, company, etc.). It is important that this id is consistent across multiple files, so we can link records.
For example, if you are uploading the companies and the account_statuses mentioned above, a row in each file might look like:
Companies
Account Status
Notice how the account status row has a related_company_id that matches the id of the row in the companies part of the table.
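One way to check this link before uploading is to confirm that every related_company_id in the account statuses file exists as an id in the companies file. The sketch below does this in Python; the sample rows, the `name` column, and the id values are hypothetical.

```python
import csv
import io

# Hypothetical file contents; ids and names are made up for illustration.
companies_csv = (
    "id,name\r\n"
    "c-001,Example Trading Ltd\r\n"
    "c-002,Sample Holdings Ltd\r\n"
)
account_statuses_csv = (
    "id,related_company_id,account_status\r\n"
    "9,c-001,in review\r\n"
    "11,c-001,active\r\n"
    "12,c-999,active\r\n"
)

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))

def find_orphans(companies, statuses):
    """Return status rows whose related_company_id matches no company id."""
    known_ids = {row["id"] for row in companies}
    return [row for row in statuses if row["related_company_id"] not in known_ids]

orphans = find_orphans(read_rows(companies_csv), read_rows(account_statuses_csv))
```

In this sample, the status row with id 12 is reported because c-999 does not appear in the companies data.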
Timestamps in the CSV
Below is an example with randomised data for a business customer. Notice that each datapoint is given its own column and named to align with parameters of the Account Status API endpoint.
Each row in the Account Status table below has a single timestamp. The important things to remember about timestamps are:
- The timestamp must correspond to when the information in the row occurred or became available, e.g. when the account status changes happened
- If there are several timestamps relating to the data (e.g. if a company changes account status multiple times), then these must be in separate rows
  - See the two events below with id = 9 and id = 11: in this case the account_status of the object changed from in review to active.
- It is acceptable for some of the columns to be null if no information was provided at that time
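To illustrate why each status change needs its own row and timestamp, the sketch below rebuilds a company's status history from separate rows. The `timestamp` column name, the ISO 8601 format, and the dates are assumptions for this example; follow the format agreed for your own data.

```python
from datetime import datetime

# Hypothetical events: one row per status change, each with its own timestamp.
events = [
    {"id": "11", "related_company_id": "c-001",
     "account_status": "active", "timestamp": "2023-03-02T09:30:00+00:00"},
    {"id": "9", "related_company_id": "c-001",
     "account_status": "in review", "timestamp": "2023-03-01T14:00:00+00:00"},
]

def status_history(events, company_id):
    """Return a company's statuses ordered by when each change happened."""
    rows = [e for e in events if e["related_company_id"] == company_id]
    rows.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
    return [e["account_status"] for e in rows]

history = status_history(events, "c-001")
```

Because each change is a separate row, the order of the rows in the file does not matter: the timestamps alone are enough to recover the sequence from in review to active.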
Differences between the first upload and the following ones
When you first upload data to Cable via CSV, we conduct a process to understand your data and map it to our internal system. We need to work with you during this process to ensure that we are interpreting your data correctly.
Once this mapping is done, we run your data through our analytics engine and derive alerts from it. Subsequently, if you don’t make any changes to your data format, we can process your files faster in the following uploads.
The first data upload should follow the standards agreed during the data onboarding process.
For subsequent uploads:
- Stick to the agreed cadence
- Keep the same format as the initial upload
- Send new events that have happened since the last upload
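If you build each upload with a script, selecting the new events can be as simple as filtering on the timestamp of the previous upload. The sketch below assumes a `timestamp` column in ISO 8601 format and uses made-up dates. It includes events at the cutoff itself, since a small overlap between uploads is not a problem.

```python
from datetime import datetime

def events_since(events, last_upload_at):
    """Keep events that occurred at or after the previous upload time."""
    cutoff = datetime.fromisoformat(last_upload_at)
    return [
        e for e in events
        if datetime.fromisoformat(e["timestamp"]) >= cutoff
    ]

# Hypothetical events with timezone-aware ISO 8601 timestamps.
events = [
    {"id": "9", "timestamp": "2023-03-01T14:00:00+00:00"},
    {"id": "11", "timestamp": "2023-03-02T09:30:00+00:00"},
    {"id": "12", "timestamp": "2023-03-05T08:15:00+00:00"},
]

new_events = events_since(events, "2023-03-02T09:30:00+00:00")
```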
Commonly asked questions
How can I delete a file I mistakenly uploaded?
If you import the same file twice, or import the wrong file, please let us know so that we can fix the situation.
However, if you are uploading files that have duplicate rows or uploading data in two files that have overlapping periods, that is not a problem. We can deal with duplicated data that comes from any overlap at the beginning/end of each period.
Are there any possibilities of overwriting data?
No, each file is stored individually so we keep track of all the data sent to us. If there are any issues, reach out to us, and we can help you with your queries.