File-based driver properties
File drivers allow you to interact with various file formats, like CSV, Excel, JSON, XML, and Parquet, as if they were databases. Each driver comes with customizable properties to control how data is read and processed.
General properties
| Property | Description | Default value |
|---|---|---|
| defaultSchema | Default schema name. | default |
| firstRow | First row to read. Row numbering starts at 1. | 1 |
| rowCount | Maximum number of rows to read. -1 means no limit. | -1 |
| scanSubfolders | Whether to scan subfolders for data files. | true |
| subfolderNameSeparator | Separator placed between subfolder names when they form part of a schema name. | __ |
| licensePath | Path to the driver's license file (for use outside of Team Edition). | |
| internalDbBatchSize | Batch size for the internal database. | 1000 |
| internalDbFilePath | Path to the internal database file. | |
| internalDbTransactionSize | Number of batches in a single transaction. | 10 |
| useInternalDb | Whether to use the internal database for complex queries. | true |
| partitioningStrategy | Defines how files are grouped into tables. Available values: none, folder, and pattern. | none |
| partitioningRegex | Regular expression used with the pattern strategy to extract the logical table name from file names. | ^(.+)[_-].*$ |
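
To make the row and batch settings more concrete, here is a minimal Python sketch of how firstRow, rowCount, internalDbBatchSize, and internalDbTransactionSize could interact. It is not the driver's code: select_rows and transactions are hypothetical helpers, and it assumes that internalDbBatchSize counts rows, so with the defaults one transaction would hold 10 batches of 1,000 rows, or 10,000 rows.

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def select_rows(rows: Iterable[T], first_row: int = 1, row_count: int = -1) -> List[T]:
    """Keep rows starting at first_row (1-based), at most row_count of them (-1 = no limit)."""
    if first_row < 1:
        raise ValueError("first_row starts at 1")
    selected: List[T] = []
    for number, row in enumerate(rows, start=1):
        if number < first_row:
            continue  # skip rows before firstRow
        if row_count != -1 and len(selected) >= row_count:
            break  # rowCount rows already collected
        selected.append(row)
    return selected


def transactions(rows: Sequence[T], batch_size: int = 1000,
                 transaction_size: int = 10) -> Iterator[List[List[T]]]:
    """Split rows into batches, then yield groups of transaction_size batches.

    Assumes batch_size counts rows (hypothetical reading of internalDbBatchSize).
    """
    batches = [list(rows[i:i + batch_size]) for i in range(0, len(rows), batch_size)]
    for start in range(0, len(batches), transaction_size):
        yield batches[start:start + transaction_size]


print(select_rows(range(1, 11), first_row=3, row_count=4))  # [3, 4, 5, 6]
print([len(t) for t in transactions(range(25_000))])        # [10, 10, 5]
```

In this reading, 25,000 rows would be written in three transactions, the last one holding only the remaining five batches.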
File partitioning
Use partitioningStrategy to group multiple files of the same format into one logical table.
A logical table is a table shown by the driver in Team Edition. It can represent one file or a group of files, depending on the partitioning strategy. The source files aren’t physically merged. The driver reads matching files together and displays them as one table.
Available values:
- none: Each file is a separate table.
- folder: Files in the same folder are grouped into one table.
- pattern: Files are grouped into separate logical tables by the table name extracted from their file names.
Use partitioningRegex with the pattern strategy to extract the logical table name from each file name. The first capture group defines the table name. The driver creates a separate logical table for each extracted name and reads the matching files into that table, so files with different extracted names are shown as separate tables.
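The three strategies can be illustrated with a short Python sketch that groups relative file paths into logical tables. It is not the driver's implementation, and several details are assumptions: table_key and group_files are hypothetical helpers, the folder strategy is assumed to key tables by folder path, and under the pattern strategy a file that doesn't match the regex is assumed to stay in a table of its own. The file extension is part of the grouping key, so files of different formats never end up in the same table.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

DEFAULT_REGEX = r"^(.+)[_-].*$"


def table_key(path: PurePosixPath, strategy: str, regex: str = DEFAULT_REGEX) -> str:
    """Return the name that decides which logical table a file belongs to."""
    if strategy == "none":
        return str(path)  # every file is its own table
    if strategy == "folder":
        return str(path.parent)  # assumption: one table per folder path
    if strategy == "pattern":
        match = re.match(regex, path.name)
        # First capture group = table name; non-matching files stay on their own (assumption).
        return match.group(1) if match else str(path)
    raise ValueError(f"unknown partitioningStrategy: {strategy}")


def group_files(paths: List[str], strategy: str,
                regex: str = DEFAULT_REGEX) -> Dict[Tuple[str, str], List[str]]:
    """Group relative file paths into logical tables keyed by (table key, format)."""
    tables: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        # The extension is part of the key, so different formats never share a table.
        tables[(table_key(path, strategy, regex), path.suffix.lstrip("."))].append(raw)
    return dict(tables)


files = ["sales/data_01.csv", "sales/data_02.csv", "sales/data_03.parquet",
         "logs/events-2023.json", "logs/events-2024.json"]
for key, members in group_files(files, "pattern").items():
    print(key, members)
```

With the pattern strategy, the two CSV files form one table named data, the Parquet file forms a separate data table, and the two JSON files form an events table.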
Example
With the default regex ^(.+)[_-].*$:
| Files | Logical table |
|---|---|
| data_01.csv, data_02.csv | data |
| events-2023.json, events-2024.json | events |
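
Because (.+) is greedy, the default regex captures everything before the last _ or - in a file name, not the first. The Python sketch below uses the standard re module to reproduce the example and to show this edge case: events-2024-01.json yields events-2024. A lazy group, ^(.+?)[_-].*$, would capture events instead; it is shown only for comparison. The sketch assumes the regex is applied to the file name including its extension, extract_table_name is a hypothetical helper, and the driver's regex engine may differ from Python's in minor details.

```python
import re
from typing import Optional

DEFAULT = re.compile(r"^(.+)[_-].*$")  # default partitioningRegex (greedy group)
LAZY = re.compile(r"^(.+?)[_-].*$")    # comparison only: lazy group


def extract_table_name(file_name: str, pattern: re.Pattern = DEFAULT) -> Optional[str]:
    """Return the first capture group, or None when the file name doesn't match."""
    match = pattern.match(file_name)
    return match.group(1) if match else None


for name in ["data_01.csv", "events-2024.json", "events-2024-01.json", "report.csv"]:
    print(f"{name}: default={extract_table_name(name)}, lazy={extract_table_name(name, LAZY)}")
# data_01.csv: default=data, lazy=data
# events-2024.json: default=events, lazy=events
# events-2024-01.json: default=events-2024, lazy=events
# report.csv: default=None, lazy=None
```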
Note
Files with different formats aren’t combined into the same table. For example, data_01.csv and data_02.parquet aren’t grouped together, even if the regex returns the same logical table name.
CSV driver properties
| Property | Description | Default value |
|---|---|---|
| separator | Character used to separate fields. | , |
| escapeChar | Character used to escape a separator or quote character. | \ |
| quoteChar | Character used to enclose quoted fields. | " |
| strictQuotes | If true, characters outside the quotes are ignored. | false |
| ignoreLeadingWhitespace | If true, the parser ignores whitespace before a quote in a field. | true |
| ignoreQuotations | If true, quotation marks are treated like any other character. | false |
| nullFieldIndicator | Determines which field content is returned as null. | NEITHER |
| trimWhitespaces | If true, the parser trims whitespace from the beginning and end of each field. | true |
| header | If true, the first row is treated as a header. | true |
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{csv,tcv,txt} |
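
Several of these options have rough counterparts in Python's standard csv module, which makes it easy to experiment with how they affect parsing. The sketch below only approximates the driver's parser: skipinitialspace stands in for ignoreLeadingWhitespace, csv.QUOTE_NONE for ignoreQuotations, and str.strip() for trimWhitespaces, while strictQuotes and nullFieldIndicator have no direct equivalent and are left out. read_csv, its parameter names, and the generated column names for header-less files are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_csv(text: str, separator: str = ",", quote_char: str = '"',
             escape_char: str = "\\", ignore_leading_whitespace: bool = True,
             ignore_quotations: bool = False, trim_whitespaces: bool = True,
             header: bool = True) -> List[Dict[str, str]]:
    """Parse CSV text with the stdlib csv module, using the CSV driver's defaults."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=separator,
        quotechar=quote_char,
        escapechar=escape_char,
        # Drops spaces right after a separator, so a following quote still opens a quoted field.
        skipinitialspace=ignore_leading_whitespace,
        # QUOTE_NONE makes the reader treat quote characters as ordinary text.
        quoting=csv.QUOTE_NONE if ignore_quotations else csv.QUOTE_MINIMAL,
    )
    rows = [[f.strip() if trim_whitespaces else f for f in row] for row in reader]
    if not rows:
        return []
    if header:
        columns, data = rows[0], rows[1:]
    else:
        columns, data = [f"column{i + 1}" for i in range(len(rows[0]))], rows  # hypothetical names
    return [dict(zip(columns, row)) for row in data]


sample = 'id,name\n1, "Smith, John"\n2,  Alice \n3,O\\"Brien\n'
for record in read_csv(sample):
    print(record)
# {'id': '1', 'name': 'Smith, John'}
# {'id': '2', 'name': 'Alice'}
# {'id': '3', 'name': 'O"Brien'}
```

Turning off the leading-whitespace option changes the first row noticeably: the space before the opening quote makes the quote an ordinary character, so the comma inside "Smith, John" splits the field.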
XLSX driver properties
| Property | Description | Default value |
|---|---|---|
| header | Indicates whether the file has a header. | true |
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{xlsx} |
| schemaNameMode | Defines how the schema name is formed. | RELATIVE_DIR_PATH |

Available schemaNameMode values:
- RELATIVE_DIR_PATH: The schema name is the concatenated relative path of the directory. Tables are named by joining the file name and the sheet name with an underscore ($file_name + _ + $sheet_name).
- RELATIVE_FILE_PATH: The schema name is generated from the relative file path. Each sheet in the file becomes a table.
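
To compare the two modes, the following Python sketch maps one workbook to schema and table names. The naming details are assumptions rather than documented behavior: directory names are joined with the default subfolderNameSeparator (__), the file extension is dropped, and a workbook in the root folder falls back to the default schema name. xlsx_tables is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Dict, List

SEPARATOR = "__"            # assumption: default subfolderNameSeparator joins directory names
DEFAULT_SCHEMA = "default"  # assumption: root-level files use defaultSchema


def xlsx_tables(relative_file: str, sheets: List[str],
                mode: str = "RELATIVE_DIR_PATH") -> Dict[str, List[str]]:
    """Map one workbook to {schema name: [table names]} (hypothetical naming rules)."""
    path = PurePosixPath(relative_file)
    if mode == "RELATIVE_DIR_PATH":
        # One schema per directory; tables are named file name + "_" + sheet name.
        schema = SEPARATOR.join(path.parent.parts) or DEFAULT_SCHEMA
        return {schema: [f"{path.stem}_{sheet}" for sheet in sheets]}
    if mode == "RELATIVE_FILE_PATH":
        # One schema per workbook; each sheet becomes a table.
        schema = SEPARATOR.join(path.with_suffix("").parts)
        return {schema: list(sheets)}
    raise ValueError(f"unknown schemaNameMode: {mode}")


print(xlsx_tables("finance/2024/budget.xlsx", ["Q1", "Q2"]))
# {'finance__2024': ['budget_Q1', 'budget_Q2']}
print(xlsx_tables("finance/2024/budget.xlsx", ["Q1", "Q2"], "RELATIVE_FILE_PATH"))
# {'finance__2024__budget': ['Q1', 'Q2']}
```

The practical difference is where the file name ends up: in RELATIVE_DIR_PATH mode it is part of every table name, so several workbooks share one schema; in RELATIVE_FILE_PATH mode it is part of the schema name, so each workbook gets its own schema.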
JSON driver properties
| Property | Description | Default value |
|---|---|---|
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{json} |
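
Because metadata comes from the first sampleRows rows only, a field that first appears later in a file may not be picked up as a column. The Python sketch below illustrates this under several assumptions: the file holds a JSON array of objects, object keys become columns, and Python type names stand in for column types. infer_columns and its rule for conflicting types (fall back to str) are hypothetical.

```python
import json
from typing import Any, Dict, List


def infer_columns(json_text: str, sample_rows: int = 5) -> Dict[str, str]:
    """Infer column names and types from the first sample_rows objects of a JSON array."""
    records: List[Dict[str, Any]] = json.loads(json_text)
    columns: Dict[str, str] = {}
    for record in records[:sample_rows]:  # rows after the sample are never inspected
        for key, value in record.items():
            seen = columns.get(key)
            current = type(value).__name__  # e.g. 'int', 'str', 'NoneType'
            if seen is None or seen == "NoneType":
                columns[key] = current  # record the type, replacing a null placeholder
            elif current not in (seen, "NoneType"):
                columns[key] = "str"  # conflicting types fall back to text (assumption)
    return columns


data = json.dumps([{"id": 1, "name": "a"}, {"id": 2, "name": None},
                   {"id": 3, "name": "c", "tag": "x"}])
print(infer_columns(data, sample_rows=2))  # {'id': 'int', 'name': 'str'}
print(infer_columns(data))                 # {'id': 'int', 'name': 'str', 'tag': 'str'}
```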
XML driver properties
| Property | Description | Default value |
|---|---|---|
| sampleRows | Number of rows to extract metadata from. | 5 |
| wildcard | Wildcard for file names. | **.{xml} |
Parquet driver properties
| Property | Description | Default value |
|---|---|---|
| tmpFolder | Temporary folder for Parquet files downloaded from cloud storage. | |
| wildcard | Wildcard for file names. | **.{parquet} |
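
The default wildcards resemble glob syntax, with ** for any path and {a,b} for alternatives. Python's fnmatch module does not support brace alternatives, so this sketch expands them first and then matches a relative path against each resulting pattern. It is a simplification: fnmatch's * also matches /, so * and ** behave the same here, and the sketch matches case-sensitively, which may differ from the driver. expand_braces and matches_wildcard are hypothetical helpers.

```python
import fnmatch
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b,...} groups into separate patterns (nested braces not supported)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: List[str] = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches_wildcard(relative_path: str, wildcard: str) -> bool:
    """Case-sensitive match; fnmatch's * also crosses '/', so * and ** behave alike."""
    return any(fnmatch.fnmatchcase(relative_path, p) for p in expand_braces(wildcard))


print(expand_braces("**.{csv,tcv,txt}"))  # ['**.csv', '**.tcv', '**.txt']
print(matches_wildcard("reports/2024/sales.csv", "**.{csv,tcv,txt}"))  # True
print(matches_wildcard("reports/sales.parquet", "**.{csv,tcv,txt}"))   # False
```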