hadoop - Hive update all values in a column


Keywords: hadoop 


Question: 

I have an external partitioned Hive table. One of its columns is a string named OLDDATE that stores the date in a different format (DD-MM-YY). I want to update the column to store dates in YYYY-MM-DD format. All years are 20XX.

So I thought of this

select CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) from table
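
For example, with a hypothetical value OLDDATE = '25-07-16', SPLIT breaks it into ['25','07','16'] and the CONCAT stitches the pieces back together as '2016-07-25':

select CONCAT('20',SPLIT('25-07-16','-')[2],'-',SPLIT('25-07-16','-')[1],'-',SPLIT('25-07-16','-')[0])
-- returns '2016-07-25'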

This gives me the dates in the format I want. Now how do I overwrite the old date with this new date?


3 Answers: 

You can effect an update by overwriting the table with its own contents, just with the date field changed according to your transformation, like this pseudo-code:

INSERT OVERWRITE TABLE table
SELECT
    col1
  , col2
  ...
  , CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) AS olddate
  ...
  , coln
FROM table;
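
As a purely illustrative sketch, assuming a hypothetical non-partitioned table mytable with columns id, olddate and amount (the SELECT has to list every column in the table's declared order, since Hive maps the output to the columns by position):

INSERT OVERWRITE TABLE mytable
SELECT
    id
  , CONCAT('20',SPLIT(olddate,'-')[2],'-',SPLIT(olddate,'-')[1],'-',SPLIT(olddate,'-')[0]) AS olddate
  , amount
FROM mytable;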
 

Since it's a partitioned table, the folder names are created with the date values, so you are not able to update the values in place.

One workaround would be to create a new table, run your query above, and insert the data into the new table.

After that you can drop your existing table and use the new table in its place.
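
A rough sketch of that swap, with hypothetical names old_table (the existing table) and new_table (its replacement):

    -- copies the schema, including the partition column
    CREATE TABLE new_table LIKE old_table;

    -- populate new_table with the reformatted dates using an INSERT ... SELECT
    -- like the ones shown in the other answers, then swap the two tables
    DROP TABLE old_table;                        -- on an external table this removes only the metadata
    ALTER TABLE new_table RENAME TO old_table;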

 

@user2441441 To overwrite a partitioned table:

    INSERT OVERWRITE TABLE table PARTITION (p_col)
    SELECT
        col1
      , col2
      ...
      , CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) AS olddate
      ...
      , coln
      , p_col
    FROM table;
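
Note that when p_col is supplied as a dynamic partition like this, Hive usually requires dynamic partitioning to be enabled for the session first, for example:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;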