Contents of the data file:
steven:100;steven:90;steven:99^567^22
ray:90;ray:98^456^30
Tom:81^222^33
The desired format of the data as finally stored in the database is:
steven 100 567 22
steven 90 567 22
steven 99 567 22
ray 90 456 30
ray 98 456 30
Tom 81 222 33
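Specifically, if you want to return a different number of columns, or a different number of rows for a given input row, then you need to perform what Hive calls a transform. That is the case here: each input row has three columns and must produce one four-column output row per name:score pair.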
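To make the target mapping concrete, here is a minimal sketch in plain Python, run outside Hive, that expands one raw line of the data file into the rows expected above. The function name expand_line is hypothetical and used only for illustration. It assumes every line has exactly three '^'-separated fields and that every pair in the first field has the form name:score.

```python
def expand_line(raw_line):
    """Expand one '^'-delimited line into one output row per name:score pair."""
    # Assumption: the line has exactly three '^'-separated fields.
    pairs, code, age = raw_line.strip().split("^")
    rows = []
    for pair in pairs.split(";"):
        # Assumption: each pair looks like "name:score".
        name, score = pair.split(":")
        rows.append((name, score, code, age))
    return rows


sample = "ray:90;ray:98^456^30"
for row in expand_line(sample):
    print(" ".join(row))
# prints:
# ray 90 456 30
# ray 98 456 30
```

One input line with two pairs yields two output rows, each with four columns. This is the row and column expansion that the transform performs inside Hive.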
1. Create a table to store the raw data
create table u_data(col1 string, code int, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^' STORED AS TEXTFILE;
2. Load the data
load data local inpath '/home/stevenxia/data1' overwrite into table u_data;
3. Write the transform script
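#!/usr/bin/python
import sys

# Hive passes each input row on stdin as one line of tab-separated columns.
for line in sys.stdin:
    values = line.rstrip('\n').split('\t')
    # values[0] holds the "name:score" pairs, separated by ';'
    for kv in values[0].split(';'):
        k, v = kv.split(':')
        # Emit one tab-separated output row per pair: name, score, code, age
        print('\t'.join([k, v, values[1], values[2]]))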
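Before deploying, the script's logic can be checked locally without Hive. The sketch below is one way to do that and is not part of the original script. It moves the loop into a hypothetical function transform(stdin, stdout) and feeds it tab-separated lines like those Hive passes to a transform script. As an added assumption about the desired behaviour, it skips malformed records instead of raising an error.

```python
import io


def transform(stdin, stdout):
    """Read tab-separated rows (col1, code, age); write one row per name:score pair."""
    for line in stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # assumption: silently skip rows with the wrong column count
        col1, code, age = fields
        for pair in col1.split(";"):
            name, sep, score = pair.partition(":")
            if not sep:
                continue  # assumption: skip pairs that have no ':' separator
            stdout.write("\t".join([name, score, code, age]) + "\n")


# Simulated input: two valid rows and one malformed row.
fake_stdin = io.StringIO("ray:90;ray:98\t456\t30\nbroken line\nTom:81\t222\t33\n")
fake_stdout = io.StringIO()
transform(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

In a deployed script the same function could be called as transform(sys.stdin, sys.stdout) after importing sys. Whether to skip bad records or let the job fail is a design choice that depends on how clean the data is.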
4. Deploy the script to the node, at /home/stevenxia/u.py
5. Hive can now use the script
select transform(u.col1, u.code, u.age) using '/home/stevenxia/u.py' as (col1, col2, col3, col4) from (select * from u_data) as u;
Run result