Q1] What do we need to do to avoid duplicate data being entered in Data Hub?

Q2] Can we write a formula in Data Hub for concatenation, to increase performance? Does Data Hub support formulas or calculations?

How is the data being duplicated in the Data Hub?
To answer the question: yes, it is possible to avoid duplicate data. However, it is usually better and more efficient if duplicate data never enters the Data Hub in the first place.

1) Understand the data and metadata. Ask the extraction team to provide a unique key in the file, concatenating several fields in the extraction itself to form that key if need be. If duplicates still appear, then the key is not actually unique; highlight this to the extraction team. I would not let duplicate data enter the Data Hub. Keep refining the unique key until it no longer produces duplicates.

2) We can write minimal formulas in Data Hub. Try to avoid concatenations there; if possible, push them to the extraction layer itself.
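
As a rough illustration of both points, here is a minimal sketch of what the extraction side could do before the file reaches Data Hub: build the unique key by concatenation, then check that it really is unique. It is plain Python and not a Data Hub feature. The field names (customer_id, order_date, line_no), the "|" separator, and the helper names are all hypothetical; replace them with whatever identifies a record in your extract.

```python
import csv
import io
from collections import Counter

# Hypothetical field names; use the columns that identify a record in your extract.
KEY_FIELDS = ["customer_id", "order_date", "line_no"]
# A separator keeps ("1", "23") distinct from ("12", "3") after concatenation.
SEPARATOR = "|"


def add_unique_key(rows, key_fields=KEY_FIELDS):
    """Return copies of the rows with a concatenated 'unique_key' column."""
    keyed = []
    for row in rows:
        key = SEPARATOR.join(row[field].strip() for field in key_fields)
        keyed.append({**row, "unique_key": key})
    return keyed


def find_duplicate_keys(keyed_rows):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(row["unique_key"] for row in keyed_rows)
    return {key: n for key, n in counts.items() if n > 1}


# Small in-memory sample standing in for the extract file.
sample = io.StringIO(
    "customer_id,order_date,line_no,amount\n"
    "C1,2020-01-05,1,10.00\n"
    "C1,2020-01-05,2,12.50\n"
    "C1,2020-01-05,1,10.00\n"
)
rows = add_unique_key(list(csv.DictReader(sample)))
duplicates = find_duplicate_keys(rows)
print(duplicates)  # {'C1|2020-01-05|1': 2}
```

If the check reports any duplicates, the key is not unique yet: either add another field to the key fields or send the file back to the extraction team before loading it.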