Metadata-Driven File Ingestion and Quarantine for Metropolis Central Data Inbox (Microsoft Fabric)
Company Background
The City of Metropolis launched a city-wide Lakehouse initiative to centralize
data from departments such as Police, Transportation, and 311 Services. A
shared, cloud-based Central Data Inbox was created as the single entry point for
incoming files. Without automation or standards, the inbox became congested,
which hurt both analytics efficiency and data governance.
Current Situation
All departments delivered files to the same inbox. Analysts reviewed files by
hand and triggered pipelines manually. This led to duplicate processing, skipped
files, silently ignored unsupported formats, and no audit trail. As data volumes
grew, analytics and reporting slowed.
Key Challenges
There was no automated file detection or classification, and the process relied
heavily on manual steps. Because processed files were never cleaned out of the
inbox, they could be ingested more than once. Unsupported formats were not
tracked, auditability was poor, and downstream data availability was
inconsistent.
Objective
Design a single automated, metadata-driven ingestion framework that dynamically
scans incoming files, routes them based on format, ingests only valid data into
Lakehouse tables, quarantines unsupported files, cleans the inbox after
processing, and maintains full traceability.
Solution Architecture
A Microsoft Fabric pipeline dynamically scans the Central Data Inbox, evaluates
file metadata at runtime, routes files using conditional logic, ingests
validated data into Lakehouse Delta tables, quarantines unsupported formats,
and automatically cleans the inbox to prevent reprocessing.
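Format-based routing can be pictured as a lookup from file extension to pipeline branch, plus a default branch for everything else. In the pipeline this decision would presumably sit in a Switch or If Condition activity. The following is a minimal Python model of the decision, not Fabric pipeline code. The extension list and the branch names (`ingest_delimited_text`, `ingest_json`, `ingest_parquet`, `quarantine`) are hypothetical, because the post does not say which formats Metropolis accepts.

```python
from pathlib import PurePosixPath

# Hypothetical format-to-branch table, standing in for the cases of a Switch
# activity. The post does not list the formats Metropolis actually supports.
FORMAT_BRANCHES = {
    ".csv": "ingest_delimited_text",
    ".json": "ingest_json",
    ".parquet": "ingest_parquet",
}
QUARANTINE_BRANCH = "quarantine"


def route_by_format(file_name: str) -> str:
    """Return the branch a file should take, based only on its extension."""
    extension = PurePosixPath(file_name).suffix.lower()
    # Any extension without a matching case (including no extension at all)
    # falls through to the default branch, so it is quarantined, not skipped.
    return FORMAT_BRANCHES.get(extension, QUARANTINE_BRANCH)


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the routing rule.
    for name in ["police_incidents_2024.csv", "parking_citations.PARQUET",
                 "311_requests.json", "scan_0042.pdf", "README"]:
        print(f"{name:30} -> {route_by_format(name)}")
```

The important choice is the default case. An unrecognized file gets a tracked destination (quarantine) instead of being left in the inbox or ignored, which addresses the untracked-formats problem directly.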
Metadata-Driven Ingestion
A Get Metadata activity retrieves a complete inventory of the files in the inbox
at runtime. Because the pipeline works from this inventory rather than from
hard-coded file names, it does not depend on file naming conventions, every file
that arrives is evaluated, and new files are picked up without pipeline changes.
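When Get Metadata is configured with the `childItems` field, it returns the folder's contents as an array of objects. Each object carries a `name` and a `type` ("File" or "Folder"), assuming the output shape documented for the Data Factory Get Metadata activity applies here. The Python sketch below starts from a hypothetical sample of that output and keeps only the file entries. In the pipeline, a Filter activity would typically do this job (for example, with the condition `@equals(item().type, 'File')`) before a ForEach loop iterates over the files.

```python
import json

# Hypothetical sample of a Get Metadata "childItems" output for the inbox folder.
SAMPLE_OUTPUT = """
{
  "childItems": [
    {"name": "police_incidents_2024.csv", "type": "File"},
    {"name": "old_exports", "type": "Folder"},
    {"name": "scan_0042.pdf", "type": "File"}
  ]
}
"""


def list_inbox_files(get_metadata_output: dict) -> list[str]:
    """Keep only file entries, as a Filter activity on item().type would."""
    return [
        item["name"]
        for item in get_metadata_output.get("childItems", [])
        if item.get("type") == "File"
    ]


if __name__ == "__main__":
    print(list_inbox_files(json.loads(SAMPLE_OUTPUT)))
```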
Deliverables
Delta tables were created for Police, Parking, and 311 datasets. Valid files
were archived in department-specific folders, unsupported files were
quarantined for review, and the Central Data Inbox was cleared after
processing.
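The archive, quarantine, and cleanup step, together with its audit trail, can be sketched using local folders as stand-ins for Lakehouse Files. This is a hypothetical Python model, not the Fabric implementation. The supported-format set, the folder names, and the JSON Lines audit log are all assumptions, and the Delta-table load is omitted. The sketch also uses a single archive folder instead of department-specific ones, because the post does not say how a file's department is determined.

```python
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical set of loadable formats; the post does not list them.
SUPPORTED_FORMATS = {".csv", ".json", ".parquet"}


def clear_inbox(inbox: Path, archive: Path, quarantine: Path,
                audit_log: Path) -> list[dict]:
    """Move every file out of the inbox and append one audit record per move.

    In a real pipeline a supported file would be archived only after its
    Delta-table load succeeded; that load step is not modeled here.
    """
    archive.mkdir(parents=True, exist_ok=True)
    quarantine.mkdir(parents=True, exist_ok=True)
    records = []
    for path in sorted(p for p in inbox.iterdir() if p.is_file()):
        supported = path.suffix.lower() in SUPPORTED_FORMATS
        destination = (archive if supported else quarantine) / path.name
        shutil.move(str(path), str(destination))
        records.append({
            "file": path.name,
            "action": "archived" if supported else "quarantined",
            "destination": str(destination),
            "moved_at_utc": datetime.now(timezone.utc).isoformat(),
        })
    # Append-only JSON Lines log: one line per file movement.
    with audit_log.open("a", encoding="utf-8") as log:
        for record in records:
            log.write(json.dumps(record) + "\n")
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        inbox = root / "inbox"
        inbox.mkdir()
        (inbox / "police_incidents.csv").write_text("id,type\n1,theft\n")
        (inbox / "scan_0042.pdf").write_bytes(b"%PDF")
        for r in clear_inbox(inbox, root / "archive", root / "quarantine",
                             root / "audit.jsonl"):
            print(r["file"], "->", r["action"])
        print("files left in inbox:", list(inbox.iterdir()))
```

Every file leaves the inbox on each run, so a later run will not see an already processed file, and the audit log records where each one went.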
Business Impact
Full automation eliminated manual file review and pipeline triggering, and
clearing the inbox after each run prevented duplicate ingestion. Data
reliability and governance improved, analytics and reporting became faster, and
every file's movement became fully traceable.