Import Data
Follow these instructions to share the data you'd like us to annotate.
Data imports work slightly differently depending on what type of data you're importing.
- First, consider the data type. We support many data types natively and can help you transform assets that aren't natively supported.
- Second, consider your preferred import method, specifically whether you'd like to host the data on your own cloud or import it from an unrestricted URL.
Here's an overview of data import and hosting methods available by data type.
| Import method | Image | DICOM | Video | Text |
|---|---|---|---|---|
| Amazon S3 import (Web + API) | ✅ | ✅ | ✅ | ❌ |
| Self-hosted Amazon S3 / Azure / Google Cloud (Web + API) | ✅ | ✅ | ✅ | ✅ |
| URL import | ✅ | ✅ | ✅ | ❌ |
| .zip upload (Web only)* | ✅ | ✅ | ✅ | ❌ |
| .csv upload (Web only) | ❌ | ❌ | ❌ | ✅ |
| API direct upload | ❌ | ❌ | ❌ | ✅ |
*If you upload multiple .zip files, ensure that no file is duplicated across them. Uploading duplicate files will cause data upload issues.
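The duplicate check across .zip uploads can be done locally before uploading. The following is a minimal sketch, not part of the product: the function name `find_duplicate_files` is hypothetical, and it assumes that two files count as duplicates when they share the same bare file name, which may be stricter or looser than what the uploader actually checks.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_files(zip_paths):
    """Return {file name: [archives]} for names found in more than one .zip."""
    seen = defaultdict(set)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # Assumption: duplicates are matched on the bare file name,
                # ignoring any folder structure inside the archive.
                name = PurePosixPath(info.filename).name
                seen[name].add(str(zip_path))
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

An empty result means no file name appears in more than one archive.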
We currently support the following DICOM syntaxes:
- Implicit VR Little Endian
- Explicit VR Little Endian
- Explicit VR Big Endian
- JPEG Baseline (Process 1)
- JPEG Lossless, Non-Hierarchical (Process 14)
- JPEG Lossless, Non-Hierarchical, First-Order Prediction (Process 14 [Selection Value 1])
- JPEG 2000 Image Compression (Lossless Only)
- JPEG 2000 Image Compression
- RLE Lossless
If you use a different syntax, please inform your project manager so we can prepare our systems accordingly!
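To check a file's transfer syntax before importing, you can compare its Transfer Syntax UID against the list above. This is a rough sketch: the UIDs come from the DICOM standard's UID registry rather than from this page, the pairing of each listed name to a UID is our reading of the list, and `is_supported_transfer_syntax` is a hypothetical helper name.

```python
# Transfer Syntax UIDs from the DICOM standard (PS3.6), matched by name
# to the syntaxes listed above. The mapping is an assumption, not an
# official list published on this page.
SUPPORTED_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.2": "Explicit VR Big Endian",
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
    "1.2.840.10008.1.2.4.57": "JPEG Lossless, Non-Hierarchical (Process 14)",
    "1.2.840.10008.1.2.4.70": (
        "JPEG Lossless, Non-Hierarchical, First-Order Prediction "
        "(Process 14 [Selection Value 1])"
    ),
    "1.2.840.10008.1.2.4.90": "JPEG 2000 Image Compression (Lossless Only)",
    "1.2.840.10008.1.2.4.91": "JPEG 2000 Image Compression",
    "1.2.840.10008.1.2.5": "RLE Lossless",
}


def is_supported_transfer_syntax(uid: str) -> bool:
    """Return True if the Transfer Syntax UID is in the list above."""
    return uid.strip() in SUPPORTED_TRANSFER_SYNTAXES
```

If you read files with the third-party pydicom library, the UID is typically available as `dataset.file_meta.TransferSyntaxUID` and can be passed in as `str(...)`.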
Each asset’s origin must be unique within a project. Attempting to import an asset that shares an origin with an existing one will cause errors.
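A quick pre-import check for origin collisions might look like the sketch below. It assumes origins are plain strings (for example, file names or URLs) and that you keep your own list of origins already in the project; `duplicate_origins` is a hypothetical helper, not a product API.

```python
from collections import Counter


def duplicate_origins(new_origins, existing_origins=()):
    """Return origins that repeat in the batch or already exist in the project."""
    counts = Counter(new_origins)
    existing = set(existing_origins)
    return sorted(
        origin
        for origin, count in counts.items()
        if count > 1 or origin in existing
    )
```

An empty list means the batch is safe to import with respect to origin uniqueness.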
Images, video, or audio data
Add all the assets you'd like to label to a folder. Then, choose an import method:
API (recommended)
Follow instructions for initiating data import via API. This is the best method for automating your data pipeline and workflow.
Centaur Portal
Visit the project page where you'd like to add data. The Data imports tab lists the two supported import methods.
- Amazon S3: upload your folder of data to an S3 bucket and follow our S3 import instructions.
- Zip uploader: compress your data, and import it with the zip file uploader. There is a size limit of 25GB per zip upload.
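For the zip uploader, the folder can be compressed and checked against the size limit in one step. This is a minimal sketch under stated assumptions: `zip_folder` is a hypothetical helper, and the limit is interpreted as 25 × 10^9 bytes, which may differ from how the uploader measures 25 GB.

```python
import zipfile
from pathlib import Path

# Assumption: "25GB" is read as decimal gigabytes.
MAX_ZIP_BYTES = 25 * 1000**3


def zip_folder(folder, zip_path, max_bytes=MAX_ZIP_BYTES):
    """Compress every file under `folder` into `zip_path` and return its size."""
    folder = Path(folder)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories and the archive itself if it lives in the folder.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            archive.write(path, arcname=path.relative_to(folder).as_posix())
    size = zip_path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{zip_path} is {size} bytes, over the {max_bytes}-byte limit")
    return size
```

If the archive exceeds the limit, split the folder into several archives, making sure no file appears in more than one of them.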
If you've already added data to the project, select Add data to reopen the import workflow and use one of the methods above.
Add a files manifest (required for image series)
A files manifest is required if you selected image series as your data type.
Examples of when you need a files manifest:
- Frames from a video (e.g. multiple frames from the same ultrasound)
- Audio and images shown together in a video (e.g. spectrogram with audio recording)
Once you've imported your files, click + Upload Manifest next to your data import. Learn more about creating and uploading a files manifest.
Text data
Choose an import method:
API (recommended)
Follow instructions for initiating data import via API. This is the best method for automating your data pipeline and workflow. Your text data must be configured as JSON.
Centaur Portal
Visit the project page where you'd like to add data. The Data imports tab lets you choose between adding plain text or HTML data.
Plain text import is the preferred method for most text data imports. However, when you have text data that requires styling, we recommend importing HTML. Learn more about formatting HTML.
- For NER and classification tasks, static highlighting on plain text data can be added after you create your task, regardless of whether the data is self-hosted. Send your project manager a CSV containing the case ID and the range interval for each static highlight. All ranges should be 0-indexed, include the start index, and exclude the end index. Ranges must be provided in ascending order. Here is an example: [[6,15],[23,32]].
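The range convention (0-indexed, start included, end excluded) matches Python slicing, so highlight ranges can be sanity-checked against the text before you send them. The helper below is an illustrative sketch; `highlighted_spans` is a hypothetical name, and "ascending order" is interpreted here as ascending start index.

```python
def highlighted_spans(text, ranges):
    """Validate half-open, 0-indexed ranges and return the text each one covers."""
    previous_start = -1
    for start, end in ranges:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"range [{start},{end}] is outside the text or empty")
        # Assumption: "ascending order" means start indices never decrease.
        if start < previous_start:
            raise ValueError("ranges must be provided in ascending order")
        previous_start = start
    # text[start:end] includes `start` and excludes `end`.
    return [text[start:end] for start, end in ranges]


spans = highlighted_spans("Found pneumonia and no effusions.", [[6, 15], [23, 32]])
# spans == ["pneumonia", "effusions"]
```

Printing the returned spans is a quick way to confirm that each range covers exactly the words you intend to highlight.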
Format a CSV with two columns: one for the origin and one for the content (text) you'd like labeled. Click Add data to upload it and get started.
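A two-column CSV like this can be generated with Python's standard `csv` module, which handles quoting of commas and line breaks inside the text. This is a sketch only: the header names `origin` and `content`, and whether a header row is expected at all, are assumptions, and `write_text_import_csv` is a hypothetical helper.

```python
import csv


def write_text_import_csv(rows, csv_path):
    """Write (origin, content) pairs to a two-column CSV file."""
    seen = set()
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Assumption: the importer expects a header row with these names.
        writer.writerow(["origin", "content"])
        for origin, content in rows:
            # Origins must be unique within a project.
            if origin in seen:
                raise ValueError(f"duplicate origin: {origin}")
            seen.add(origin)
            writer.writerow([origin, content])
```

For example, `write_text_import_csv([("note-001", "Patient reports mild, intermittent pain.")], "import.csv")` writes one header row and one data row, with the comma-containing text quoted automatically.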
If you've already added data to the project, you'll be restricted to the type of text you already added—plain text or HTML.
Other data types
Are you sharing a non-native data type? Our team will help you transform it into something that can be labeled.
Follow the steps above for importing image, video or audio data. On project creation, be sure to select Image as your data type. All non-native assets will be converted into an image content type by our team.
Once you've initiated your import, our team will be notified. We will convert your import to a usable format, and notify you when it's ready.
