Questions tagged [hive]

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible distributed file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.

-2
votes
0 answers
12 views

How to execute custom C++ binary on HDFS file

I have custom C++ binaries that read a raw data file and write a derived data file. The files are on the order of 100 GB each. Moreover, I would like to process multiple 100 GB files in parallel and generate a ...
1
vote
1 answer
11 views

When the underlying data changes, do we need to drop and create the partition in Hive?

Let's say I have a hive table partitioned by date with its data stored in S3 as Parquet files. Let's also assume that for a particular partition (date), there were originally 20 records. If I then ...
0
votes
3 answers
29 views

Hive date cast issue

Hi, in my Hive table I have a column with date values like this: cl1 31102019000000 30112019000000 31122019000000 I have tried to convert the column values to date format like this: Select ...
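The sample values in this question appear to follow a ddMMyyyyHHmmss layout; that reading is an assumption taken from the three examples shown. A minimal sketch in plain Python (not Hive) of parsing that layout into a date, where the helper name `parse_cl1` is hypothetical:

```python
from datetime import date, datetime


def parse_cl1(raw: str) -> date:
    """Parse a 14-digit ddMMyyyyHHmmss string into a date.

    The layout is an assumption inferred from the sample values.
    """
    return datetime.strptime(raw, "%d%m%Y%H%M%S").date()


# The three sample values quoted in the question.
for value in ["31102019000000", "30112019000000", "31122019000000"]:
    print(value, "->", parse_cl1(value).isoformat())
```

In Hive the analogous pattern string would be 'ddMMyyyyHHmmss', passed to a function such as unix_timestamp(string, pattern); that variant is untested here.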
0
votes
0 answers
19 views

sh: HIVE: command not found

I'm running a Python script that launches some shell scripts and HQL scripts (through the shell). It was working fine, but now I'm getting this "output": No rows affected (326.768 seconds) Beeline ...
1
vote
2 answers
23 views

Is there a difference between using between vs '> & <' when querying a Hive table partitioned on date string?

I have a use case of selecting data from a large Hive table partitioned on date (format: yyyyMMdd). The Hive query is required to fetch a few fields from 6 months of data (total 180 date partitions. ...
1
vote
2 answers
25 views

Changing dd/mm/yyyy/ hh/mm/ss format to yyyymm in Hive

I'm using Hive at the moment. I have a column (column A) of strings which is in the following format 11/9/2009 0:00:00. I'd like to extract the yyyymm. i.e. I'd like the above string to be 200909. I'...
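The conversion asked for here can be illustrated outside Hive. A minimal sketch in plain Python, assuming day-first order as the question title suggests; the helper name `to_yyyymm` is hypothetical:

```python
from datetime import datetime


def to_yyyymm(raw: str) -> str:
    """Convert a 'd/M/yyyy H:mm:ss' string to 'yyyyMM'.

    Day-first order is an assumption taken from the question title.
    """
    parsed = datetime.strptime(raw, "%d/%m/%Y %H:%M:%S")
    return parsed.strftime("%Y%m")


print(to_yyyymm("11/9/2009 0:00:00"))  # prints 200909
```

If the column were month-first instead, the two leading format fields would need to be swapped.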
0
votes
1 answer
21 views

How Can I Run Sequential Temp Tables & Final SELECT Query

I'm used to BigQuery where I can run temp tables with the 'WITH' clause and then join those temp tables with a final query. However, I am now using a Hive db via DataGrip where I cannot run sequential ...
1
vote
1 answer
16 views

Hive EXPLAIN plan: where to see a full table scan?

How can I tell from Hive EXPLAIN whether there is a full table scan?
-4
votes
1 answer
23 views

Need quick access to data from Hadoop (s/ms) [on hold]

I have a table in Hive, and querying it takes a long time because of the huge amount of data in that table. Which tools will help me get data in under a second or in milliseconds? Note: I am taking data from ...
0
votes
1 answer
6 views

Connecting to Hive using Java with Cloudera Drivers with Kerberos Authentication

I am looking for Java code to connect to a Hive DB using Cloudera Hive drivers, which needs Kerberos authentication. Online I can only find code that uses the Apache drivers; is there a difference? ...
-1
votes
0 answers
11 views

Volume share calculation against partner level

I need to calculate the volume share against partners. I need overall volume to calculate share, but with groups, it is not possible. select m.partner, m.partner_name, p.volume, p.volume*100/sum(p....
1
vote
1 answer
29 views

How to “filter” records in Hive table?

Imagine a table with id, status and modified_date. One id can have more than one record in the table. I need to get only the row for each id that has the current status, together with the modified_date when ...
1
vote
2 answers
50 views

Newly Inserted Hive records do not show in Spark Session of Spark Shell

I ran a simple program of Spark-sql to get data from Hive to Spark session using spark-SQL. scala> spark.sql("select count(1) from firsthivestreamtable").show(100,false) +--------+ |count(1)| +---...
0
votes
0 answers
16 views

What might be the reason if hive performance is low all of a sudden?

All complex queries like joins ran well until last week. But all of a sudden, I have been encountering the bad gateway 504 error in Hue. I can see the application stuck at 95% in YARN. And to worsen ...
1
vote
1 answer
28 views

regexp_replace function in Hive to format SSN

Can anyone please help? I want to format the SSN with dashes in the given string using regexp_replace in Hive SQL. I am trying the query below but get the result 1-2-3: select regexp_replace("...
0
votes
0 answers
12 views

Set output S3 chunk size in Hive Activity running on AWS EMR 4.7.0

I'm trying to figure out how to configure the block size (or chunk size) of individual objects written to S3 through a Hive activity running on AWS EMR 4.7.0. For my use case, we have an AWS Data-Pipeline ...
0
votes
0 answers
14 views

How to save Hive storage data in S3 in unencrypted format

I am trying to load unencrypted data from another source into Hive and save it to another location in S3 using EMR. However, the problem is that while saving the data in S3, it's storing as part ...
0
votes
1 answer
17 views

How to Create DDL in HIVE and save it as a file in your directory

Currently, I use the following code to show the DDL of tables in HIVE: Show create table cus_data I'm trying to write the results of that statement to a file in a given location on my command line. ...
1
vote
1 answer
13 views

Only keep distinct rows when doing collect_set over a moving windowing function in Hive

Let's say I have a Hive table that has 3 columns: merchant_id, week_id, acc_id. My goal is to collect the unique customers in the previous 4 weeks for each week and I am using a moving window to do this. ...
0
votes
1 answer
15 views

Hive external table delimited by commas, but comma present in data

I have some data coming in from an external source of the format: user_id, user_name, project_name, position "111", "Tom Petty", "Heartbreakers", "Vocals" "222", "Ringo Starr", "Beatles, The", "...
-3
votes
0 answers
30 views

How to load a fixed length .csv file into hive warehouse

I want to load a fixed length .csv file into hive table using pyspark. Format of fixed length .csv file would be: Start position | End position | column name | Data type | Col value 1 | ...
0
votes
0 answers
26 views

Optimizing Query based on Inserting into multiple columns into target table from one column of a source table

I am trying to insert values from one column of a table to multiple columns of another table based on conditions. I have prepared the query but it's getting stuck at 80% of the MapReduce phase I am ...
0
votes
0 answers
13 views

How to join 2 tables in HDFS, one of which is small enough to be stored in the RAM of a node in the cluster?

I have the following tables in HDFS/Hive: One table, e.g. D, which is small enough to be stored in the RAM of a single compute node in the cluster Another table, e.g. E, which is much bigger than ...
0
votes
0 answers
15 views

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

I am getting a weird issue while importing data into a partitioned table. Below is the query. INSERT OVERWRITE table db.table1 partition(date) select A.*,A.date from db.table2 A; Sometimes the query ...
0
votes
1 answer
39 views

Hive returns circular dependencies error when resetting running total to 0 after reaching the limit

I'm trying to compute a running total and reset it to 0 based on 2 conditions or if the limit is reached, but I'm running into a circular dependencies error. Here is an example. As in the image ...
0
votes
0 answers
11 views

What is the difference between “Location” and “path” in hive tables

When describing a table in hive, I get two paths in the description of the table. hive> MSCK REPAIR TABLE default.am_test; OK Time taken: 0.061 seconds ... ... Location: hdfs://...
0
votes
0 answers
7 views

Hive database size check for specific period

I am using Ambari Sandbox. I want to check the Hive database size for a specific time interval. I know the command below, which gives the entire size: dfs -du -s -h /path/to/table output I have ...
1
vote
1 answer
47 views

Building derived column using Spark transformations

I got a table record as stated below. Id Indicator Date 1 R 2018-01-20 1 R 2018-10-21 1 P 2019-01-22 2 R 2018-02-28 2 P 2018-05-22 2 ...
0
votes
0 answers
11 views

Distribute by a Range of IDs into multiple Reducers and Sort by the IDs for efficient Access through Hive

All, I am trying to find out the most efficient way of storing data into my Hive table which enables the query engine to make the best use of bloom filters and storage index. This table has billions ...
1
vote
2 answers
18 views

Get particular values from a string in hive

I have a Hive table with two columns, both strings: name details "john" , {"addr":"NY","phone":"1234"} "john" , {"addr":"CA", "phone":"7145"} "mary" , {"addr":"BOS","phone":"1234"} Is ...
1
vote
1 answer
43 views

Concat function in postgresql

I have the below select statement in Hive. It executes perfectly fine. In Hive select COALESCE(product_name,CONCAT(CONCAT(CONCAT(TRIM(product_id),' - '),trim(plan_code)),' - UNKNOWN')) as ...
0
votes
0 answers
12 views

How to fix “java.lang.NoSuchMethodError” for geoip2 java running under Hive

I am having trouble executing a UDF with geoip2 (MaxMind) as a dependency under Hive 2.3.4 (Java 8). The same code works fine under older versions of Hive that use Java 7 and also under Presto that ...
0
votes
0 answers
20 views

Can't connect to Hive server with Spark JDBC in kerberized cluster

I am trying to read data from one Hive (Hive n°1) and write the result into another Hive (Hive n°2) (they are on 2 different clusters). I can't use a single Spark session to connect to both Hives, so I will ...
3
votes
1 answer
33 views

Perform validation and checks on Hive table (may not be a duplicate)

We know that Hive does not validate data based on the fields and it's the user's responsibility to check it manually. I am aware of a few basic checks which we can perform to validate the data. Count the ...
0
votes
0 answers
11 views

Is there any way to map Hive column type to HBase column value type while using a Hive external table?

Below is a Hive external table on HBase: CREATE EXTERNAL TABLE `mobile_claim_raised_hbase`( `id` string COMMENT '', `phone_number` string COMMENT '', `claim_ts` bigint COMMENT '') ROW ...
0
votes
1 answer
8 views

FAILED: HiveAuthzPluginException Error getting permissions for hdfs

I am trying to insert data into hive table from a file in hdfs directory by the query: $ jdbc:hive2://localhost:10000> LOAD DATA INPATH '/user/xyz/stdfiles/testtbl.txt' OVERWRITE INTO TABLE testdb....
0
votes
1 answer
22 views

How to read stream of structured data and write to Hive table

There is a need to read the stream of structured data from Kafka stream and write it to the already existing Hive table. Upon analysis, it appears that one of the options is to do readStream of Kafka ...
4
votes
0 answers
37 views

Hive partitioned table reads all the partitions despite having a Spark filter

I'm using Spark with Scala to read a specific Hive partition. The partition columns are year, month, day, a and b. scala> spark.sql("select * from db.table where year=2019 and month=2 and day=28 and a='y' ...
0
votes
0 answers
19 views

How Spark decides the number of partitions/tasks to create when it reads from Hive

Let's say: We have a table stored in Hive partitioned on the date. For example: we have a table called Person and a partition inside it called datestr=2019-01-01 and it is stored in Parquet format(...
0
votes
0 answers
14 views

Iterative select from hive database using pyspark kernel in jupyter notebook

How can I use a for loop to split a Hive database using pyspark? I have a database that contains 80 million rows sorted by ID. Each ID can have several rows. pyspark %%sql -o df1 -n -1 SELECT VAR1, ...
0
votes
0 answers
21 views

Apache Spark (2.4.2) with Hive MetaStore 3

I am trying to connect Spark 2.4 to Hive Metastore 3 to catalog ORC files on S3. Spark Configuration: sparkConf .set("spark.sql.catalogImplementation", "hive") ....
0
votes
0 answers
12 views

Can you create a range partition table in hive?

I create partitions in hive using static partitions or dynamic partitions. Hive: create table employees ( name string, salary float, subordinated array<string>, deductions map<string,...
1
vote
0 answers
15 views

How to evaluate multiple rows based on a column

Table ActualTO (60000+ records daily increment) has columns rid, caid and consent. rid is the primary key. There will be multiple unique caids corresponding to each rid. The consent column contains either ...
0
votes
2 answers
36 views

Spark - Hive table returning null value on shell

I am trying to pull Hive table data in the Spark shell using spark.sql(" ") but it's giving null values. The Hive table contains data. I have even written code using a HiveContext object, but still get the same issue ...
0
votes
0 answers
11 views

Hive cannot run queries or cannot run in CLI mode

We have a problem in our HDP environment (HDP 2.6.5). We can access Hive from Ambari, but when we run a query such as show tables it returns Message: E090 HDFS020 Could not write file /user/admin/hive/...
1
vote
2 answers
32 views

Explode on multiple columns in Hive

I'm trying to explode records in multiple columns in Hive. For example, if my dataset looks like this - COL_01 COL_02 COL_03 1 A, B X, Y, Z 2 D, E, F V, W I want this as ...
1
vote
1 answer
49 views

Lateral view / explode in Spark with multiple columns, getting duplicates

I have the following dataframe with some columns that contain arrays. (We are using Spark 1.6) +--------------------+--------------+------------------+--------------+--------------------+------------...
1
vote
1 answer
16 views

How can I pull 12 million rows into CSV from a Hive table using Java faster?

I need to pull ~12 million rows into CSV using a JDBC connection to Hive. Can I do it faster using some batch processing? Can I append to the CSV file? I have made the connection to Hive using JDBC and I ...
0
votes
0 answers
11 views

How do I keep single quotes with ^a-zA-Z0-9\?

I am using something like this to get the text I need. How can I extend this to include single quotes if the msg is 'I'hve' or O'Keefe? regexp_replace(regexp_replace(msg string,'\n',''),'[^a-zA-Z0-9\...
0
votes
0 answers
9 views

Hive JDBC - connection leakage when getConnection fails

I am using the Cloudera Hive JDBC driver https://www.cloudera.com/downloads/connectors/hive/jdbc/2-5-4.html Sometimes, when the call to getConnection() fails (not always, it depends on server stability), it ...
