File-based driver properties

File drivers let you work with files in formats such as CSV, Excel (XLSX), JSON, XML, and Parquet as if they were databases. Each driver exposes properties that control how data is read and processed.

General properties

| Property | Description | Default value |
|---|---|---|
| defaultSchema | Default schema name. | default |
| firstRow | First row to read. Row numbering starts at 1. | 1 |
| rowCount | Maximum number of rows to read. -1 means no limit. | -1 |
| scanSubfolders | Scan subfolders for data files. | true |
| subfolderNameSeparator | Separator used for subfolder names within schema names. | __ |
| licensePath | Path to the driver's license file (for use outside of Team Edition). | |
| internalDbBatchSize | Internal database batch size. | 1000 |
| internalDbFilePath | Internal database file path. | |
| internalDbTransactionSize | Number of batches in a single transaction. | 10 |
| useInternalDb | Use internal database for complex queries. | true |
| partitioningStrategy | Defines how files are grouped into tables. Available values: none, folder, and pattern. | none |
| partitioningRegex | Regular expression used with the pattern strategy to extract the logical table name from file names. | ^(.+)[_-].*$ |
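
The interaction between firstRow and rowCount can be sketched in a few lines of Python. This is only an illustration of the documented semantics (1-based numbering, -1 for no limit), not the driver's implementation. The function name select_rows is hypothetical, and treating every negative rowCount as "no limit" is an assumption of the sketch.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def select_rows(rows: Iterable[T], first_row: int = 1, row_count: int = -1) -> Iterator[T]:
    """Return an iterator over the window described by firstRow and rowCount.

    first_row is 1-based, as in the property table.
    row_count of -1 means no limit (this sketch treats any negative value that way).
    """
    if first_row < 1:
        raise ValueError("first_row is 1-based and must be >= 1")
    start = first_row - 1  # convert to a 0-based offset
    stop = None if row_count < 0 else start + row_count
    return islice(rows, start, stop)


rows = ["r1", "r2", "r3", "r4", "r5"]
print(list(select_rows(rows)))                            # defaults: every row
print(list(select_rows(rows, first_row=2, row_count=3)))  # ['r2', 'r3', 'r4']
```

Whether a header row counts toward firstRow isn't stated above, so the sketch doesn't model headers.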

File partitioning

Use partitioningStrategy to group multiple files of the same format into one logical table.

A logical table is a table shown by the driver in Team Edition. It can represent one file or a group of files, depending on the partitioning strategy. The source files aren’t physically merged. The driver reads matching files together and displays them as one table.

Available values:

  • none: Each file is a separate table.
  • folder: Files in the same folder are grouped into one table.
  • pattern: Files are grouped into logical tables by the table name extracted from their file names.

Use partitioningRegex with the pattern strategy to extract the logical table name from each file name. The first capture group defines the table name. The driver creates one logical table per extracted name and reads all files that share that name into it.

Example

With the default regex ^(.+)[_-].*$:

| Files | Logical table |
|---|---|
| data_01.csv, data_02.csv | data |
| events-2023.json, events-2024.json | events |

Note

Files with different formats aren’t combined into the same table. For example, data_01.csv and data_02.parquet aren’t grouped together, even if the regex returns the same logical table name.
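
The grouping rules can be approximated with the following Python sketch. It applies the default regex to each file name, takes the first capture group as the logical table name, and keeps files of different formats apart. The function group_files is hypothetical. Two details are assumptions, not documented behavior: the regex is matched against the file name including its extension, and files that don't match fall back to their base name.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

DEFAULT_PARTITIONING_REGEX = r"^(.+)[_-].*$"


def group_files(file_names, regex=DEFAULT_PARTITIONING_REGEX):
    """Group file names into logical tables keyed by (format, table name)."""
    pattern = re.compile(regex)
    tables = defaultdict(list)
    for name in file_names:
        path = PurePosixPath(name)
        file_format = path.suffix.lstrip(".").lower()
        match = pattern.match(path.name)
        # Assumption: a non-matching file becomes its own table, named after its base name.
        table = match.group(1) if match else path.stem
        # The format is part of the key, so data_01.csv and data_02.parquet stay separate.
        tables[(file_format, table)].append(name)
    return dict(tables)


groups = group_files([
    "data_01.csv", "data_02.csv",
    "events-2023.json", "events-2024.json",
    "data_02.parquet",
])
for (file_format, table), files in groups.items():
    print(file_format, table, files)
```

Because (.+) is greedy, the name is cut at the last underscore or hyphen: in this sketch a file called sales_eu_01.csv yields the table name sales_eu, not sales.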

CSV driver properties

| Property | Description | Default value |
|---|---|---|
| separator | The delimiter used to separate entries. | , |
| escapeChar | The character used to escape a separator or quote. | \ |
| quoteChar | The character used to quote elements. | " |
| strictQuotes | If true, characters outside the quotes are ignored. | false |
| ignoreLeadingWhitespace | If true, the parser ignores whitespace before a quote in a field. | true |
| ignoreQuotations | If true, quotation characters are treated like any other character. | false |
| nullFieldIndicator | Specifies which field content is returned as null. | NEITHER |
| trimWhitespaces | If true, the parser trims whitespace from the beginning and end of each field. | true |
| header | If true, the first row is treated as a header. | true |
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{csv,tcv,txt} |
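
To get a feel for how separator, quoteChar, escapeChar, trimWhitespaces, and header affect parsing, the sketch below reproduces similar settings with Python's standard csv module. It is an analogy only: the driver's parser is not Python's, and edge cases may differ. The function read_csv and the generated col1, col2, ... names used when header is false are hypothetical.

```python
import csv
import io


def read_csv(text, separator=",", quote_char='"', escape_char="\\",
             header=True, trim_whitespaces=True):
    """Parse CSV text and return (column_names, data_rows)."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=separator,
        quotechar=quote_char,
        escapechar=escape_char,
        skipinitialspace=True,  # rough analogue of ignoreLeadingWhitespace
    )
    rows = [
        [field.strip() if trim_whitespaces else field for field in row]
        for row in reader
    ]
    if not rows:
        return [], []
    if header:
        return rows[0], rows[1:]
    # Hypothetical naming scheme for files without a header row.
    return [f"col{i + 1}" for i in range(len(rows[0]))], rows


columns, data = read_csv('id;name\n1; "Smith; John"\n2;Ann\n', separator=";")
print(columns)  # ['id', 'name']
print(data)     # [['1', 'Smith; John'], ['2', 'Ann']]
```

The quoted field keeps its embedded separator, which is the main reason to set quoteChar correctly for files whose values contain the delimiter.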

XLSX driver properties

| Property | Description | Default value |
|---|---|---|
| header | Indicates if the file has a header. | true |
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{xlsx} |
| schemaNameMode | Defines how the schema name is formed. RELATIVE_DIR_PATH: uses the concatenated relative path of the directory as the schema name, and the table name is $file_name + _ + $sheet_name. RELATIVE_FILE_PATH: uses the relative file path to generate the schema name, and each sheet in the file becomes a table. | RELATIVE_DIR_PATH |
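
The difference between the two schemaNameMode values is easier to see with concrete names. The Python sketch below derives (schema, table) pairs for one workbook under each mode. It rests on several assumptions that the description above does not confirm: path segments are joined with the subfolderNameSeparator default (__), $file_name excludes the extension, files in the root folder land in the default schema, and in RELATIVE_FILE_PATH mode the table name is the sheet name. The function xlsx_table_names is hypothetical.

```python
from pathlib import PurePosixPath


def xlsx_table_names(relative_path, sheet_names, mode="RELATIVE_DIR_PATH",
                     separator="__", default_schema="default"):
    """Return a list of (schema, table) pairs for one XLSX file."""
    path = PurePosixPath(relative_path)
    if mode == "RELATIVE_DIR_PATH":
        # Schema comes from the directory; table is $file_name + _ + $sheet_name.
        parts = path.parent.parts
        schema = separator.join(parts) if parts else default_schema  # assumption
        return [(schema, f"{path.stem}_{sheet}") for sheet in sheet_names]
    if mode == "RELATIVE_FILE_PATH":
        # Schema comes from the file path; each sheet becomes a table.
        schema = separator.join(path.with_suffix("").parts)  # assumption
        return [(schema, sheet) for sheet in sheet_names]
    raise ValueError(f"unknown schemaNameMode: {mode}")


print(xlsx_table_names("sales/2024/report.xlsx", ["Q1", "Q2"]))
print(xlsx_table_names("sales/2024/report.xlsx", ["Q1", "Q2"], mode="RELATIVE_FILE_PATH"))
```

Under these assumptions, RELATIVE_DIR_PATH puts all workbooks in a folder into one schema, while RELATIVE_FILE_PATH gives each workbook its own schema.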

JSON driver properties

| Property | Description | Default value |
|---|---|---|
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{json} |
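
As a rough illustration of what extracting metadata from a sample means, the Python sketch below infers column names and value types from the first sampleRows records of a JSON array. The function infer_columns is hypothetical, and the driver's actual inference rules aren't described here. In this sketch, a field that first appears after the sampled rows is not detected; whether the driver behaves the same way is an assumption to verify.

```python
import json
from itertools import islice


def infer_columns(records, sample_rows=5):
    """Map each column name to the sorted type names seen in the sampled records."""
    seen = {}
    for record in islice(records, sample_rows):
        for key, value in record.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


records = json.loads(
    '[{"id": 1, "name": "a"},'
    ' {"id": 2, "name": null},'
    ' {"id": 3, "name": "c", "extra": true}]'
)
print(infer_columns(records, sample_rows=2))  # "extra" is outside the sample
print(infer_columns(records))                 # default sample covers all three records
```

If a file's early rows are not representative, a larger sampleRows value is the natural setting to try.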

XML driver properties

| Property | Description | Default value |
|---|---|---|
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{xml} |

Parquet driver properties

| Property | Description | Default value |
|---|---|---|
| tmpFolder | Temporary folder for storing downloaded Parquet files from Cloud Storages. | |
| wildcard | Wildcard for file names. | **.{parquet} |
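
The wildcard defaults, such as **.{parquet} or **.{csv,tcv,txt}, resemble glob patterns in which ** matches across folder boundaries and {a,b} lists alternatives. Assuming that interpretation (it is not confirmed above), the Python sketch below translates a small subset of that syntax into a regular expression to show which paths such a pattern would select. The helpers glob_to_regex and matches are hypothetical and handle only **, *, ?, and {…} groups.

```python
import re


def glob_to_regex(glob):
    """Translate a limited glob (**, *, ?, {a,b}) into a compiled regex."""
    out = []
    i = 0
    in_group = False
    while i < len(glob):
        if glob.startswith("**", i):
            out.append(".*")        # ** may cross folder boundaries
            i += 2
            continue
        c = glob[i]
        if c == "*":
            out.append("[^/]*")     # * stays within one path segment
        elif c == "?":
            out.append("[^/]")
        elif c == "{":
            out.append("(?:")
            in_group = True
        elif c == "}":
            out.append(")")
            in_group = False
        elif c == "," and in_group:
            out.append("|")         # alternatives inside {...}
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out) + r"\Z")


def matches(glob, path):
    return glob_to_regex(glob).match(path) is not None


print(matches("**.{parquet}", "exports/2024/events.parquet"))  # True
print(matches("**.{csv,tcv,txt}", "notes.txt"))                # True
print(matches("**.{csv,tcv,txt}", "events.parquet"))           # False
```

The sketch is case-sensitive; how the driver treats upper-case extensions is not covered here.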