Importing Geospatial Data

This section explains which columns CARTO creates when a dataset is imported and the naming conventions that you should use. It also describes how CARTO guesses content during the import process, lists the supported geospatial formats for uploading data, and explains how to upload multilayer datasets or perform batch file uploads.

Dataset Basics

When a file is imported, it is transformed into a dataset that can be processed by CARTO. The system automatically creates the following columns:

  • cartodb_id
    • This column is used as the primary key of the table
    • Its values must be integers, non-null, and unique
  • the_geom
    • This column stores the main geometric features of a dataset in the EPSG 4326 projection
  • the_geom_webmercator
    • This column stores the geometries transformed into the EPSG 3857 projection, and is used for rendering purposes
  • _feature_count
    • This column is automatically created when overview representations of data are created (for datasets containing more than 500,000 points).

When a dataset is exported from CARTO, it includes the cartodb_id and the_geom columns, which are reused if the dataset is later imported back into the system. This ensures that re-importing an exported dataset preserves its original content and row order.

If these columns are generated by the user, the CARTO table requirements must be followed in order to produce a successful import. Otherwise, importing datasets which do not meet the requirements (such as a dataset with duplicated integers in its cartodb_id column) will result in an import failure.
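
As a rough illustration of the cartodb_id requirements listed above (integer, non-null, unique), the following sketch checks a candidate column before upload. The function name check_cartodb_ids is hypothetical and is not part of any CARTO library; it only mirrors the three stated rules.

```python
def check_cartodb_ids(values):
    """Return a list of problems found in a candidate cartodb_id column."""
    problems = []
    seen = set()
    for row, value in enumerate(values, start=1):
        if value is None:
            problems.append(f"row {row}: null value")
        elif isinstance(value, bool) or not isinstance(value, int):
            # bool is a subclass of int in Python, so it is rejected explicitly
            problems.append(f"row {row}: {value!r} is not an integer")
        elif value in seen:
            problems.append(f"row {row}: duplicate value {value}")
        else:
            seen.add(value)
    return problems


for problem in check_cartodb_ids([1, 2, 2, None, "7"]):
    print(problem)
```

An empty result means the column satisfies the three rules; any reported problem is a likely cause of an import failure.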

Naming

Apply the following naming conventions for datasets in CARTO, and avoid using the reserved words as part of your file names.

  • Table names must begin with a letter (a-z). Otherwise, “table_” is prepended to the name
  • Column names must begin with a letter (a-z) or an underscore (_)
  • Column and table names can have a maximum of 63 characters. Names are trimmed if they exceed this length

Reserved Words

There are certain words reserved in the system that cannot be used to name columns or datasets, mainly the PostgreSQL reserved words. Any names that conflict with a reserved word are prefixed with an underscore (_) automatically.
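
The adjustments described in this section can be approximated locally to preview how a name might change on import. This is a minimal sketch under stated assumptions: the function names are hypothetical, RESERVED_WORDS is a tiny illustrative subset rather than the full list, and the order in which CARTO applies the rules is assumed.

```python
import re

# Illustrative subset only; the real list is mainly the PostgreSQL reserved words.
RESERVED_WORDS = {"select", "table", "user", "order", "group"}
MAX_NAME_LENGTH = 63


def adjust_table_name(name):
    """Prepend "table_" when the name does not start with a letter, then trim."""
    if not re.match(r"[a-z]", name):
        name = "table_" + name
    return name[:MAX_NAME_LENGTH]


def adjust_column_name(name):
    """Prefix reserved words with an underscore, then trim."""
    if name.lower() in RESERVED_WORDS:
        name = "_" + name
    return name[:MAX_NAME_LENGTH]


print(adjust_table_name("2020_sales"))
print(adjust_column_name("order"))
print(len(adjust_table_name("a" * 80)))
```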

Import Guessing

CARTO includes guessing functionality during the import process. This is useful when files or data are missing some upload information. The following guessing options are available:

  • Fields guessing

    For files whose format does not include type information (usually CSV files), field guessing options can be enabled. There are two guessing options for this type of file:

    • Type guessing: determines the type of each imported column from the text contents of the CSV file. If enabled, it generates numeric and boolean columns when appropriate. Otherwise, it uses regular string columns

    • Quoted fields guessing: when enabled, double-quoted fields are also considered for type guessing. When disabled, double-quoted fields are excluded from type guessing

  • Content Guessing

    Files that contain country, city, or IP address information can be automatically geocoded by the system if the content guessing option is enabled. This automatic geocoding only occurs if the column does not contain a large proportion of repeated or null values. Content guessing does not require the target columns to be named in a special way (such as “country” or “city”); CARTO inspects the available columns and identifies which of them can be guessed geospatially.

Tip: For information about how to granularly configure the guessing options for your import process, view the upload file parameters on the standard tables section.
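
To make the two field guessing options more concrete, here is a simplified sketch of how a column type could be inferred from raw CSV cell text. It is not CARTO's implementation: the function name, the returned labels, and the assumption that excluded quoted fields remain strings are all illustrative.

```python
def guess_column_type(values, guess_quoted=False):
    """Guess 'number', 'boolean' or 'string' from a list of raw CSV cell texts."""

    def is_quoted(text):
        return len(text) >= 2 and text.startswith('"') and text.endswith('"')

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    cleaned = []
    for raw in values:
        text = raw.strip()
        if is_quoted(text):
            if not guess_quoted:
                # Assumption: quoted fields excluded from guessing stay as strings
                return "string"
            text = text[1:-1]
        if text:  # empty cells do not influence the guess
            cleaned.append(text)

    if not cleaned:
        return "string"
    if all(is_number(text) for text in cleaned):
        return "number"
    if all(text.lower() in {"true", "false"} for text in cleaned):
        return "boolean"
    return "string"


print(guess_column_type(["1", "2.5", ""]))
print(guess_column_type(['"1"', '"2"']))
print(guess_column_type(['"1"', '"2"'], guess_quoted=True))
```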

Supported Geospatial Data Formats

CARTO supports several geospatial data formats to upload vector data. The important details of each format, as well as some guidelines to upload your files to CARTO, are defined in this section.

Shapefile

The Shapefile format is a multi-file format: it consists of a set of files with the same name, stored in the same directory, and differentiated by their extension.

A Shapefile must contain at least a .shp file, a .shx file, a .prj file, and a .dbf file. These files contain the geometry data, the indexes, the projection information, and the attributes, respectively. Other auxiliary files are not mandatory and contain extra information for the Shapefile. Shapefiles must be imported as a single compressed file, in the .zip or .gz format.
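
One way to prepare such an archive is sketched below with Python's standard zipfile module. The helper name zip_shapefile is hypothetical; it checks that the four mandatory components exist and then bundles every file sharing the same base name.

```python
import zipfile
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".prj", ".dbf")


def zip_shapefile(shp_path, zip_path):
    """Bundle the components of one Shapefile into a .zip archive."""
    shp_path = Path(shp_path)
    target = Path(zip_path).resolve()

    required = [shp_path.with_suffix(ext) for ext in REQUIRED_EXTENSIONS]
    missing = [part.name for part in required if not part.exists()]
    if missing:
        raise FileNotFoundError("missing Shapefile components: " + ", ".join(missing))

    # Collect mandatory and auxiliary files that share the base name,
    # skipping a previously written archive at the target location.
    prefix = shp_path.stem + "."
    parts = sorted(
        part for part in shp_path.parent.iterdir()
        if part.is_file() and part.name.startswith(prefix) and part.resolve() != target
    )

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store files at the archive root, without directory components
            archive.write(part, arcname=part.name)
    return [part.name for part in parts]
```

For example, zip_shapefile("data/roads.shp", "roads.zip") would archive roads.shp, roads.shx, roads.prj, roads.dbf, and any auxiliary roads.* files found in the data directory (the paths here are placeholders).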

Note: The Shapefile format has certain limitations that can affect the way that your datasets are exported/imported into CARTO:

  • The column name cannot exceed 10 characters. Exporting a dataset with longer names in this format will trim the names
  • Date columns only support the date, not the time. Exporting and importing a date column as a Shapefile will remove all time information and maintain just the date. If you need to work with date and time data, it is recommended to export/import the information as a string and convert it to a date
  • Although the projection of the file should be correctly determined and adjusted from the .prj file, it is recommended to upload Shapefiles in the EPSG 4326 projection
  • For improved compatibility, ensure you save your Shapefile with UTF-8 encoding prior to importing

Keyhole Markup Language (KML)

KML is an XML-based format that adds geographical meaning by defining features such as points, polygons, or lines in the EPSG 4326 projection.

KML uses common XML types such as string, boolean, double, or int, so your column types will be respected when your dataset is imported or exported from CARTO.

Each feature is defined as a Placemark element, which usually contains a name, a description, and the geometry itself. If more data columns are required, these fields need to be defined and included inside an ExtendedData element of the KML document.

In terms of geometric elements, the Point, Polygon, Line, MultiGeometry and Geometry elements are supported. Different geometry types in the same layer are not supported.
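
The Placemark structure described above, with additional columns stored in ExtendedData, can be sketched with Python's standard xml.etree.ElementTree module. The function name build_point_kml and the sample field are hypothetical; the element names follow the KML 2.2 schema.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_point_kml(name, lon, lat, extra):
    """Return a KML document with one point Placemark and extra data fields."""
    ET.register_namespace("", KML_NS)

    def tag(local_name):
        return f"{{{KML_NS}}}{local_name}"

    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    placemark = ET.SubElement(document, tag("Placemark"))
    ET.SubElement(placemark, tag("name")).text = name

    # Additional columns go inside ExtendedData as Data/value pairs
    extended = ET.SubElement(placemark, tag("ExtendedData"))
    for key, value in extra.items():
        data = ET.SubElement(extended, tag("Data"), name=key)
        ET.SubElement(data, tag("value")).text = str(value)

    # KML coordinates are written as longitude,latitude
    point = ET.SubElement(placemark, tag("Point"))
    ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(build_point_kml("Null Island", 0, 0, {"score": 100}))
```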

KMZ

A Keyhole Markup Language Zipped (KMZ) file is a compressed file that includes a KML file and zero or more supporting files (images, icons, overlays, or other elements referenced in the KML file). See the Keyhole Markup Language (KML) section for more information.

GeoJSON

The GeoJSON format is an extension of the JavaScript Object Notation (JSON) that encodes geographical features and their metadata. This format supports data types such as string, double, or boolean. Dates exported as GeoJSON are stored as strings and are recognized as strings on import.

With respect to geometries, Points, (Multi)Polygons and (Multi)Lines are supported. GeometryCollection geometric objects are not supported and will raise an import error. The supported geometries can be imported inside FeatureCollection and Feature objects.

Importing different geometry types in a FeatureCollection element is not supported.
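
A file can be checked against these two restrictions before upload. The sketch below is illustrative only: the function name is hypothetical, and treating a geometry and its Multi variant (for example Polygon and MultiPolygon) as the same type is an assumption about how the restriction is applied.

```python
import json


def check_geojson(text):
    """Return an error message for unsupported content, or None if none is found."""
    doc = json.loads(text)
    if doc.get("type") == "FeatureCollection":
        features = doc.get("features", [])
    else:
        features = [doc]

    families = set()
    for feature in features:
        geometry = feature.get("geometry") or {}
        kind = geometry.get("type")
        if kind == "GeometryCollection":
            return "GeometryCollection objects are not supported"
        if kind:
            # Assumption: Polygon and MultiPolygon count as one geometry type
            families.add(kind[5:] if kind.startswith("Multi") else kind)

    if len(families) > 1:
        return "mixed geometry types: " + ", ".join(sorted(families))
    return None


sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [0, 0]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}},
    ],
})
print(check_geojson(sample))
```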

CSV

Comma-Separated Values (CSV) and Tab-Separated Values (TSV) files can be imported to CARTO. For a successful import, follow these formatting guidelines:

  • The first line of the CSV file must contain the name of the columns
  • The remaining lines of the CSV file must follow the schema defined by the header row, in terms of number of columns
  • To ensure correct parsing, it is recommended that string values are double-quoted
  • If the data itself contains quotes, the values must be double-quoted and the internal quotes must be escaped
  • CSV lines must be terminated with CR/LF, or LF line terminators. CR line terminators are not supported

Example: Quoted strings in a CSV

    name, description, score
    "John Doe", "Awesome, the best player ever", 100

Example: Escaped quotes in a CSV

    name, geojson
    "Null Island", "{""type"": ""Point"", ""coordinates"": [0,0]}"

CSV Format Guessing

Because the CSV format does not specify the type of the columns in the data, CARTO applies guessing functionality that converts your data into columns of a supported type. This enables you to generate numeric columns, or geocode your dataset, directly on import.

There are two guessing options for CSV files: type guessing and quoted fields guessing. View the Import Guessing section for details.
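
The CSV formatting guidelines given earlier in this section (double-quoted strings, escaped internal quotes, CR/LF line endings) can be met with Python's standard csv module, as in this minimal sketch. The helper name write_csv is hypothetical.

```python
import csv
import io


def write_csv(header, rows):
    """Serialize rows as CSV text with quoted strings and CR/LF line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_NONNUMERIC,  # double-quote every non-numeric value
        lineterminator="\r\n",
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = write_csv(
    ["name", "geojson"],
    [["Null Island", '{"type": "Point", "coordinates": [0,0]}']],
)
print(text)
```

The csv module escapes internal quotes by doubling them by default, which matches the escaped-quotes example shown earlier.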

Spreadsheets (Excel or OpenDocument)

Excel files, or other spreadsheets (such as OpenDocument spreadsheets or Google Drive spreadsheets) are supported by CARTO.

The uploaded spreadsheet must follow these formatting rules:

  • The first row must contain the names for each column
  • Merged cells are not supported
  • Graphs, charts, or other kinds of elements are not supported

For multi-sheet spreadsheets, only the first sheet will be imported.

GPX

GPX (GPS Exchange Format) files are XML documents that contain waypoints, tracks, and/or routes. When importing a GPX file, CARTO generates different datasets for track points, tracks, and waypoints. The resulting names of these datasets are a combination of the GPX name and their type: _track_points, _tracks, and _waypoints, respectively.

OSM

CARTO supports importing OpenStreetMap dumps (.osm files). These files are XML documents that have an osm parent element, which can contain blocks of nodes, ways, or relations representing points, lines, or polygons. CARTO automatically separates OSM dumps into different tables, depending on the geometry. Therefore, importing a single OSM file can lead to more than one resulting dataset.

MapInfo

The MapInfo file format is a geospatial vector data format developed by MapInfo, which supports grids and is based on multiple files. MapInfo files (.DAT, .ID, .MAP, .TAB) must be imported as a single compressed file, in the .zip or .gz format.

CARTO

CARTO files are CARTO-generated map visualization files. A .carto file includes the dataset and visualization definition, which contains any SQL queries, CartoCSS, basemaps, attributions, metadata, and styling that was applied to a map. This is useful for downloading complete CARTO visualizations that you can share or import.

Multilayer Uploads

Several of the formats supported by CARTO can store different layers, or geometric types, by definition. Importing a file that contains more than one layer results in different imported datasets.

If the option create_vis is enabled in the import process, the different layers imported will be added to the created map. The number of layers that can be included in a map depends on the maximum value of layers per map in the configuration of the user.

The maximum number of datasets created from a multilayer file is 10. If the imported file contains more than 10 layers, the layers beyond that limit are omitted.

Shapefile

The different layers included in a Shapefile are imported as independent datasets.

KML Files

KML files generate a separate dataset for each Folder that they contain.

GPX Files

GPX files that contain more than one type of element (waypoints, tracks, and/or routes) are imported as a separate dataset for each type.

OSM Files

OSM files generate a separate layer for each type of geometry that their nodes, ways, or relations represent (points, polygons, or lines).

Multiple File Uploads

You can perform a batch file upload if the files are sent to the server in a compressed format. As with multilayer uploads, if the import process is configured to generate a map after import, the different datasets are added as layers to the new map. The number of layers that can be included in a map depends on the maximum number of layers allotted to the user's account.

The maximum number of files that can be imported from a single compressed file is 10. If the compressed file contains more than 10 files, only the first 10 files are imported and the rest of the files are omitted.
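
To stay within the 10-file limit when preparing a large batch, the files can be split across several archives before upload. This is a minimal sketch using Python's standard zipfile module; the function names and the upload_N.zip naming scheme are hypothetical.

```python
import zipfile
from pathlib import Path

MAX_FILES_PER_ARCHIVE = 10  # limit stated in this section


def split_into_batches(paths, size=MAX_FILES_PER_ARCHIVE):
    """Split a list of paths into consecutive groups of at most `size` items."""
    paths = list(paths)
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def write_archives(paths, output_dir, prefix="upload"):
    """Write one .zip archive per batch and return the archive paths."""
    output_dir = Path(output_dir)
    archives = []
    for number, batch in enumerate(split_into_batches(paths), start=1):
        archive_path = output_dir / f"{prefix}_{number}.zip"
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in batch:
                # Store each file at the archive root
                archive.write(path, arcname=Path(path).name)
        archives.append(archive_path)
    return archives
```

With 23 input files, this produces three archives containing 10, 10, and 3 files, so that no file is omitted on import.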