Hive: Partitioning by part of integer column


Keywords:hive 


Question: 

I want to create an external Hive table, partitioned by record type and date (year, month, day). One complication is that the date format I have in my data files is a single value integer yyyymmddhhmmss instead of the required date format yyyy-mm-dd hh:mm:ss. Can I specify 3 new partition column based on just single data value? Something like the example below (which doesn't work)

create external table cdrs (
record_id int, 
record_detail tinyint,
datetime_start int
)
partitioned by (record_type int, createyear=datetime_start(0,3) int, createmonth=datetime_start(4,5) int, createday=datetime_start(6,7) int)
row format delimited 
fields terminated by '|' 
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");

1 Answer: 

If you want to be able to use MSCK REPAIR TABLE to add the partition for you based on the directories structure you should use the following convention:

  • The nesting of the directories should match the order of the partition columns.
  • A directory name should be {partition column name}={value}

If you intends to add the partitions manually then the structure has no meaning.
Any set values can be coupled with any directory. e.g. -

alter table cdrs  
add if not exist partition (record_type='TYP123',createdate=date '2017-03-22') 
location 'hdfs://nameservice1/tmp/sbx_unleashed.db/2017MAR22_OF_TYPE_123';

Assuming directory structure -

.../sbx_unleashed.db/record_type=.../createyear=.../createmonth=.../createday=.../

e.g.

.../sbx_unleashed.db/record_type=TYP123/createyear=2017/createmonth=03/createday=22/

create external table cdrs 
(
   record_id      int
  ,record_detail  tinyint
  ,datetime_start int
)
partitioned by (record_type int,createyear int, createmonth tinyint, createday tinyint)
row format delimited 
fields terminated by '|' 
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;

Assuming directory structure -

.../sbx_unleashed.db/record_type=.../createdate=.../

e.g.

.../sbx_unleashed.db/record_type=TYP123/createdate=2017-03-22/

create external table cdrs 
(
   record_id      int
  ,record_detail  tinyint
  ,datetime_start int
)
partitioned by (record_type int,createdate date)
row format delimited 
fields terminated by '|' 
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;