hadoop - Sqoop inserting data into wrong Hive column from RDBMS table


Keywords: hadoop


Question: 

I have a table called 'employee' in SQL Server:

ID  NAME    ADDRESS   DESIGNATION
1   Jack     XXX       Clerk
2   John     YYY       Engineer

I created an external table (emp) in Hive and imported data from employee into it with Sqoop, using the --query argument. If I set --query to 'select * from employee', the data is inserted into the Hive table correctly. But if I set --query to 'select ID,NAME,DESIGNATION from employee', the data from the DESIGNATION column of the employee table (RDBMS) gets inserted into the address column of the emp table instead of the designation column. When I run this Hive query:

select designation from emp;

I get the values: NULL NULL, instead of: Clerk Engineer. But if I run this Hive query:

select address from emp;

I get the values: Clerk Engineer, instead of: NULL NULL. Any ideas on fixing this incorrect data would be a great help. I am currently using Hive 0.11, so I can't use the Hive INSERT queries that are available from Hive 0.14 onwards.
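
For reference, the failing import was presumably something along these lines (the connection details and target directory here are placeholders, not the original command; note that Sqoop's free-form --query import requires the $CONDITIONS token and either -m 1 or a --split-by column):

sqoop import \
--connect 'jdbc:sqlserver://host:1433;databaseName=db' \
--username 'xxxx' \
--password 'xxxx' \
--query 'select ID, NAME, DESIGNATION from employee where $CONDITIONS' \
--target-dir /user/hive/warehouse/emp \
-m 1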


2 Answers: 

OK, here is a sample:

sqoop import --connect 'jdbc:mysql://host:port/db?useUnicode=true&characterEncoding=utf-8' \
--username 'xxxx' \
--password 'xxxx' \
--table employee \
--columns 'ID,NAME,DESIGNATION' \
--where 'aaa=bbb' \
-m 1 \
--target-dir hdfs://nameservice1/dir \
--fields-terminated-by '\t' \
--hive-import \
--hive-overwrite \
--hive-drop-import-delims \
--null-non-string '\\N' \
--null-string '\\N' \
--hive-table 'hive_db.hive_tb' \
--hive-partition-key 'pt' \
--hive-partition-value '2016-01-20'

Some of these parameters are optional.
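
Once the import finishes, you can sanity-check the column alignment from Hive; the table and partition names below simply follow the sample command above:

select id, name, designation from hive_db.hive_tb where pt = '2016-01-20' limit 10;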

Full syntax details are in the Sqoop user guide.

The Sqoop statement will import the data to an HDFS directory as follows (assuming the field separator is ,):

1,Jack,Clerk
2,John,Engineer

So the address column will hold the DESIGNATION data and the designation column will be NULL, because Hive maps delimited fields to table columns strictly by position: the third field of each record lands in the third Hive column, and there is no fourth field at all.
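
To make that concrete, here is a sketch of what the external table presumably looks like (the column names and location are assumptions based on the question; only the column order matters):

-- Presumed DDL of the external table; fields map to columns by position
CREATE EXTERNAL TABLE emp (
  id          INT,
  name        STRING,
  address     STRING,
  designation STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/emp';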

You can try --query "select ID,NAME,'',DESIGNATION from employee": the empty string pads the ADDRESS position, so DESIGNATION lines up with the fourth Hive column again. This should work.
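
Put together, the full command might look like the sketch below (connection details and target directory are placeholders; note that the free-form --query import requires the $CONDITIONS token, which must be escaped as \$CONDITIONS inside double quotes). With this query, address comes back as an empty string and designation gets the correct values:

sqoop import \
--connect 'jdbc:sqlserver://host:1433;databaseName=db' \
--username 'xxxx' \
--password 'xxxx' \
--query "select ID, NAME, '' as ADDRESS, DESIGNATION from employee where \$CONDITIONS" \
--target-dir /user/hive/warehouse/emp \
--fields-terminated-by ',' \
-m 1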