The best practice is usually to clean data inside the data source (or when storing the data), unless there’s a good reason to keep or remap blank '' or NULL values solely inside Luzmo.
Typical reasons to clean data at the source:
The absence of data is not meaningful, for example when a customer record has no country, but you know it should never be empty. Cleaning the data at the source will mean that all downstream systems (not just Luzmo) benefit.
You want consistency across systems. If different teams and (BI) tools consume the same dataset, applying a clean standard at the source avoids confusion.
The field participates in joins or filters. NULL values can cause the following behaviour, so cleaning upstream avoids fragmentation:
Good reasons to clean data in Luzmo:
NULL values → "Unknown"
Blank '' values → "Not provided"
These distinctions may genuinely matter in analysis, but might not always make sense to changes these values in your data source.
The data source is not easily editable, e.g. you receive the dataset from an external partner, a database you cannot modify, a SaaS system with limited data cleaning options, etc.
You want dashboard-specific transformations (via multiple derived columns). Maybe in one dashboard, blanks should be grouped under "Other", while in another dashboard, they should remain separate.
As an example of a derived column that allows you to replace blank and N/A values with your prefered strings (in this case blank values become the "Blank" string and N/A values become "Not Available"), you can use the following column definition:
IF(
your_column = '',
'Blank',
COALESCE(
your_column,
'Not Available'
)
)
For a deep dive into more data modelling best practices, check out our Designing effective data models for analytics course.