We need to be careful at every stage, from loading data from S3 into a temporary table to moving the transformed data from the temporary table into the target table. A simple example of where things can go wrong is decimal values getting lost while data is loaded into an intermediate table. Why? Most likely because a conversion was mishandled. When everything else looks right and there is still a mismatch between the data file and the table, the data type conversions are the first place to check.

Column data types deserve the same care. Take columns that store keys, such as time keys, where the expected values are small. Using a data type like INT8 or even INT4 takes up more storage than the data actually needs, so SMALLINT is the most suitable data type for these columns. Choosing data types and lengths arbitrarily is considered bad table design. For example, in the SALESFACT table the STOREID and DAYID columns will have a limited number of distinct values, because each references a dimension that has only several hundred to a few thousand rows.
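Both checks can be run against a sample of the data file before any table is created. The following is a minimal Python sketch, not part of any particular load process. The helper names `loses_fraction` and `smallest_int_type` are hypothetical, and the ranges are the standard 2-, 4-, and 8-byte signed integer ranges used by SMALLINT, INT4, and INT8.

```python
from decimal import Decimal

# Signed integer ranges for 2-, 4-, and 8-byte columns, smallest first.
INT_TYPES = [
    ("SMALLINT", -2**15, 2**15 - 1),
    ("INT4", -2**31, 2**31 - 1),
    ("INT8", -2**63, 2**63 - 1),
]


def loses_fraction(raw: str) -> bool:
    """Return True if loading this value into an integer column would drop decimals."""
    value = Decimal(raw)
    return value != value.to_integral_value()


def smallest_int_type(values) -> str:
    """Return the narrowest integer type that holds every value in a non-empty sample."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INT_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values do not fit in an 8-byte integer")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(loses_fraction("19.99"))          # a price would lose its cents
    print(smallest_int_type([1, 4500]))     # keys from a small dimension
```

For a key column such as STOREID, passing the sampled key values to `smallest_int_type` shows whether SMALLINT is wide enough. Any source value for which `loses_fraction` returns true should not land in an integer column of the intermediate table.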