For illustration, let's take the case of a very simple flat file in the below format.
The package consists of two data flow tasks. The first data flow task is used to store the aggregated information from the source data. The data flow consists of the below tasks.
The Flat File Source extracts data from the source file. The Aggregate transform groups by the relevant fields (ID and Name in this case) to get unique combinations and applies an aggregation (MIN) over the remaining fields (Date in the above example). The result is then stored in a cache using the Cache Transform.
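To make the aggregation step concrete, here is a minimal Python sketch of the same logic the Aggregate transform performs before the Cache Transform stores the result. The file name source.txt, the column names ID, Name and Date, and a sortable date format are assumptions based on the example above.

```python
import csv

def build_min_date_cache(path="source.txt"):
    """Group by (ID, Name) and keep the MIN(Date) per group,
    mimicking the Aggregate transform feeding the Cache Transform."""
    cache = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["ID"], row["Name"])
            # Assumes Date is in a sortable format such as yyyy-mm-dd
            if key not in cache or row["Date"] < cache[key]:
                cache[key] = row["Date"]
    return cache
```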
Now that we have the aggregated data ready, we can use it in a Lookup to get the unique values. The second data flow is used for this purpose, with the below workflow.
The flat file data is looked up against the aggregated data stored in the cache. A match is returned only for the rows where the Date value is the minimum within its ID and Name group, so what effectively comes out of the match output are the unique rows. Link the match output to an OLE DB Destination pointed at the destination table to load the unique values.
The Lookup transform uses the Cache Connection Manager to connect to the cache populated in the earlier data flow task and matches the source values against the aggregated reference values.
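The matching logic can be sketched in the same hypothetical Python terms, assuming the cache built above: a source row matches only when its Date equals the cached MIN(Date) for its (ID, Name) group, so the match output holds exactly one row per group.

```python
import csv

def lookup_unique_rows(path, cache):
    """Pass through only the rows that match the cached MIN(Date),
    mimicking the Lookup transform's match output."""
    unique_rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["ID"], row["Name"])
            if cache.get(key) == row["Date"]:  # Lookup match output
                unique_rows.append(row)
    return unique_rows

# Example usage with the assumed file name:
# cache = build_min_date_cache("source.txt")
# rows = lookup_unique_rows("source.txt", cache)  # rows bound for the OLE DB destination
```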
Now if you check the destination, you can see only the unique values populated in the destination table, as shown below.
This gives a method for eliminating duplicates from a flat file without using any staging table, utilizing only the cache as the temporary store. Hopefully it will help you out in similar scenarios.
The package and source file used can be downloaded from the below links
https://drive.google.com/file/d/0B4ZDNhljf8tQN19wUkVSamRzM0E/edit?usp=sharing
https://docs.google.com/document/d/1q0MI2zfUlg0M6INsFhlIlVQ6xqciohmR4vE_MIlrfwE/edit?usp=sharing