Our customer wanted to carry out large-scale research and analytics for several business purposes. These required information about customer behavior, best-selling models, the uptake of the latest software, product flows (inbound and outbound), the distribution of OS versions, and download and upload speeds.
The available data came from various sources in different formats, including:
- Detailed transactions from markets in multiple regions
- Data collected by software running on different laptops
- A large volume of agreements
- Scattered pieces of collected data
FPT Software built a data lake organized as a folder structure with three main zones:
- Transient: holds ephemeral data, such as temporary copies or other short-lived data, before it passes a quality check and moves to the Raw zone.
- Raw: stores data permanently and in its original form, serving as “the single source of truth”. Raw data is usually treated as immutable (i.e., unchangeable), which allows you to go back to a point in time if necessary.
- Process: an enriched folder where the raw data is further enhanced; for example, the output of complex analytics goes here. The intent of this folder is to store data that is readily consumable for analytics.
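The zone flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not FPT Software's implementation: the folder names `transient`, `raw` and `process`, the CSV layout with `region`, `model` and `quantity` columns, the quality check (required columns present plus at least one data row), and all function names are hypothetical. Files move from Transient to Raw only after passing the check; Raw files are never overwritten and are set read-only to approximate immutability; and one example aggregation (best-selling model per region) writes its enriched output to the Process zone.

```python
from __future__ import annotations

import csv
import shutil
import stat
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical zone folder names, one per zone described above.
ZONES = ("transient", "raw", "process")


def create_lake(root: Path) -> dict[str, Path]:
    """Create one folder per zone under root and return their paths."""
    paths = {zone: root / zone for zone in ZONES}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


def passes_quality_check(path: Path, required: set[str]) -> bool:
    """Hypothetical check: the CSV has the required columns and at least one row."""
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        if not required.issubset(reader.fieldnames or []):
            return False
        return next(reader, None) is not None


def promote_to_raw(lake: dict[str, Path], name: str, required: set[str]) -> bool:
    """Move a checked file from Transient to Raw; rejected files stay in Transient."""
    src = lake["transient"] / name
    if not passes_quality_check(src, required):
        return False
    dst = lake["raw"] / name
    if dst.exists():
        # Raw is treated as immutable: never overwrite an existing file.
        raise FileExistsError(f"raw zone is immutable: {dst}")
    shutil.move(str(src), str(dst))
    # Mark the file read-only as a simple approximation of immutability.
    dst.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return True


def build_best_sellers(lake: dict[str, Path], name: str) -> Path:
    """Aggregate raw transactions into the best-selling model per region (Process zone)."""
    totals: Counter[tuple[str, str]] = Counter()
    with (lake["raw"] / name).open(newline="") as f:
        for row in csv.DictReader(f):
            totals[(row["region"], row["model"])] += int(row["quantity"])
    best: dict[str, tuple[str, int]] = {}
    for (region, model), qty in totals.items():
        if region not in best or qty > best[region][1]:
            best[region] = (model, qty)
    out = lake["process"] / f"best_sellers_{name}"
    with out.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "model", "quantity"])
        for region, (model, qty) in sorted(best.items()):
            writer.writerow([region, model, qty])
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = create_lake(Path(tmp))
        (lake["transient"] / "sales.csv").write_text(
            "region,model,quantity\nEU,A,3\nEU,B,5\nAPAC,A,4\n"
        )
        promote_to_raw(lake, "sales.csv", {"region", "model", "quantity"})
        print(build_best_sellers(lake, "sales.csv").read_text())
```

Keeping Raw write-once means every file in Process can be regenerated from its untouched source, which is what lets the Raw zone act as the single source of truth.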
The project succeeded brilliantly, yielding valuable information:
- The best-selling model worldwide
- Internet speed in each region
- The largest markets in the world
- The best installed software for user experience (UX)