Questions tagged [hiveql]

Variant of SQL used in the Apache Hive data warehouse infrastructure. Use this tag for questions related to the Hive Query Language including HiveQL syntax and HiveQL-specific functions.

0
votes
0 answers
8 views

CREATE EXTERNAL TABLE AS SELECT issue

Error : Error: Error while compiling statement: FAILED: NoMatchingMethodException No matching method for class org.apache.hadoop.hive.ql.udf.UDFToString with (array<string>). Possible choices:...
0
votes
0 answers
23 views

Decimal Field In Hive Only Returning Results When Encapsulating Value In Single Quotes

I ingested a table from a SQLServer database into a Hive database. When I attempt to do a lookup on a value in Hive, I have to surround the value with single quotes in order to find it. In SQLServer, ...
0
votes
0 answers
25 views

Can Hive create a temporary table so I can get a small table for further queries?

I have a table, the schema is as below: struct TBL{ 1:optional string category; 2:optional string endpoint; 3:optional string priority; 4:optional i64 timestap; 5:optional string ...
1
vote
1 answer
20 views

Hive Query with multiple Columns in Select and group by one column

I have the below sample image of the dataset and the expected result. What is the best way to achieve this kind of result in a dataset with a billion records? Should we use the intermediate ...
-1
votes
0 answers
11 views

How do I avoid concurrency between two Hive queries?

I have two Spark jobs: the first job does an insert overwrite into a managed Hive table and the second job reads from the same table. They are running in parallel. The problem is that the second job doesn't take ...
0
votes
0 answers
7 views

Special character with upper() case function in Impala

I have a requirement in my current project. We have a column in an HDFS file "ASCII_FIXED_STR". When I retrieve it through Impala using the query below, I am not getting the proper output sql without ...
0
votes
0 answers
16 views

Hive - Unable to load the data properly as I'm getting some of the values as NULL [on hold]

After creating a Hive table, the values I'm getting for some columns are NULL. There shouldn't be any NULL value in the output. Can someone please help me out? Sample Data: |1||, , <...
1
vote
1 answer
15 views

How to use Hive variable substitution correctly

When I use variable substitution in Hive, I get some errors and need your help. My code: set hievar:b='on t1.id=t2.id where t2.id is null'; select * from t_old as t1 full outer join t_new as ...
0
votes
0 answers
22 views

Compare Hive table columns to list of values in another table field?

It may sound confusing but I need to compare a list of table names and columns stored in a hive table with the actual tables columns and produce a comparison result. Querying hive metastore is not ...
1
vote
1 answer
38 views

How to explode quantiles in hive

I am trying to get quantiles of a field and I want to explode them so that each value is a separate row rather than all of them forming a single array. First, I calculate 20 quantiles as below: ...
0
votes
0 answers
20 views

Can I remove the inner join and the two outer selects (as it's just mapping) from the query below?

We have this query that runs on our hivedb daily. We have an inner join, UNION ALL with left outer join, UNION ALL with right outer join in this query. Can I remove the inner join and outer two select ...
0
votes
1 answer
41 views

Find second highest salary in each department using rank/dense_rank in hive

These were the two questions I was asked during an interview; the only condition is to use rank/dense_rank. Find second highest salary in each department using rank/dense_rank in hive. ...
0
votes
1 answer
24 views

TextInputFormat vs HiveIgnoreKeyTextOutputFormat

I'm just starting out with Hive, and I have a question about Input/Output Format. I'm using the OpenCSVSerde serde, but I don't understand why for text files the Input format is org.apache.hadoop....
0
votes
0 answers
10 views

Hive looping query results and execute another query for each of the previous results

Is there any way in Hive to solve the use case below? Within one HQL file (a sequence of Hive queries in one file), do the following: select data from a table, iterate over the result, and for each ...
1
vote
1 answer
27 views

Exploding a list in Hive SQL to identify blanks

I have a column called part_nos_list as array<string> in a hive table. Apparently that column has blanks and I want to update them with a '-'. The code sort of does that but 42 rows in a group by ...
0
votes
1 answer
15 views

hive create an array from string

In my data I have comma-separated strings. It would be much easier if these were arrays, so I could easily match them with another array, for example. However, I am not able to create an array from ...
1
vote
1 answer
21 views

Query taking time despite adding session settings

Following is the ETL generated query Query - SELECT infaHiveSysTimestamp('SS') as a0, 7991 as a1, single_use_subq30725.a1 as a2, SUBSTR(SUBSTR(single_use_subq30725.a2, 0, 5), 0, 5) as a3, CAST(1 AS ...
-1
votes
0 answers
12 views

Hive external table and internal table performance

Can anybody explain in detail the performance of a partitioned managed table (data loaded with LOAD DATA INPATH into the managed table) vs. an external table with partitions ( ...
0
votes
1 answer
40 views

Why is my Hive QL Query that I run in SSMS via Openquery through the Hortonworks ODBC Driver producing an error?

I set up a connection to a Hive server using the Hortonworks ODBC Driver for Apache Hive. Version info is below: OS: Windows Server 2012 R2 Standard Hive: 1.2.1000.2.6.5.4-1 Hadoop: 2.7.3.2.6.5.4-1 ...
0
votes
2 answers
20 views

Hive - how to get the derived column names and use them in the same query?

I am trying to run the below query : select [every_column],count(*) from <table> group by [every_column] having count(*) >1 But column names should be derived in the same query. I believe ...
1
vote
1 answer
17 views

Hive - Update records in a table with today's date IF they are not found in another table?

I currently have a Main result table (test1) that stores all of my records of issues and a second table (test2) that is run every week or so and I'm trying to find those records where not exists in ...
1
vote
1 answer
30 views

Count number of objects in JSON array stored as Hive string column

I have a Hive table with a JSON string stored as a string in a column. Something like this. Id | Column1 (String) 1 | [{k1:v1,k2:v2},{k3:v3,k4:v4}] ...
1
vote
0 answers
34 views

NTILE() in hive is getting stuck at 99%

I am running the below query in hive :- CREATE TABLE IF NOT EXISTS database1.table2 AS SELECT A.company,A.amount, NTILE(100) OVER(PARTITION BY A.company ORDER BY A.amount DESC) as pct FROM (select ...
0
votes
1 answer
24 views

How to choose columns for Partitioning and bucketing in hive Table?

What will be the ideal columns for partitioning and bucketing for the below schema? Is it necessary to implement both, or is one enough? user_id INTEGER UNSIGNED, product_id VARCHAR(20), gender ...
1
vote
2 answers
25 views

How to go through all partitions in hive?

I want to update a column's value in all partitions. I previously found that insert overwrite can be used to update data. My current statement is insert OVERWRITE table s_job PARTITION(pt ='20190101') select ...
0
votes
0 answers
29 views

Hive Create table if not exist else insert overwrite data in table

Sorry if this is a dumb question but I am trying to figure out how to write Create table if not exists else insert overwrite the SQL query results. I also want this to be a permanent table (not ...
-1
votes
1 answer
33 views

PySpark throwing ParseException for syntactically correct Hive query

I got a DDL query that works fine within beeline, but when I try to run the same query within a sparkSession it throws a parse Exception. from pyspark import SparkContext, SparkConf from pyspark.sql ...
1
vote
2 answers
22 views

Select Top Level Domain from email address in Hive

I'm trying to find the length of the top level domains within an emailaddress column. I've tried a few iterations of regexp_replace, but no success. Failed attempts are all around the following ...
0
votes
0 answers
14 views

Hive partition table entry

I am creating a process A that inserts records into a Hive partitioned table on a daily basis. As soon as the table insert is complete I want to trigger another process B. Process B is present in another ...
1
vote
1 answer
24 views

Hive Track changes in a column

Hi I have been trying to monitor divergence from an original value in a column in hive. For Example: column 1 tracking_column 6 0 6 0 6 0 5 -1 6 0 6 ...
0
votes
1 answer
31 views

Window function error At least 1 group must only depend on input columns

I have a common table expression with a window function and keep getting an error message: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into ...
0
votes
2 answers
104 views

Cast a concatenation of a number and string to a date

Tried: select concat(cast(201201 as string), '01'); Returns 20120101 However: select to_date(concat(cast(201201 as string), '01')); Returns null. How can I tell hive to treat the result of ...
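
The null described above is consistent with the compact string 20120101 not being in the yyyy-MM-dd form that Hive's date conversion expects; that reading is an assumption, since the rest of the question is cut off. The following is a minimal Python sketch of the reformatting step only, not Hive code, and the helper name `yyyymm_to_iso_date` is hypothetical:

```python
from datetime import datetime


def yyyymm_to_iso_date(yyyymm: int, day: str = "01") -> str:
    """Build a yyyy-MM-dd string from a yyyyMM number plus a two-digit day."""
    # Concatenate exactly as the question does: 201201 + '01' gives '20120101'.
    compact = str(yyyymm) + day
    # Parse with an explicit compact pattern, then emit the ISO form.
    return datetime.strptime(compact, "%Y%m%d").date().isoformat()


if __name__ == "__main__":
    print(yyyymm_to_iso_date(201201))
```

In Hive the same idea is commonly expressed by parsing the string with an explicit `yyyyMMdd` pattern, for example with `unix_timestamp(string, pattern)` and `from_unixtime`, before treating it as a date; the exact behaviour should be checked against the Hive version in use.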
0
votes
0 answers
21 views

Change Hive table properties during select

I have this table that gets updated every time a "select *" statement is run. CREATE EXTERNAL TABLE TABLE_EXAMPLE ( id string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001' LOCATION "/...
0
votes
0 answers
57 views

Reading Hive view created with CTE (With clause) from spark

I have a view on Hive created with a CTE (WITH clause) that unions two tables, then shows only the most recent record per id. In my env, I have a tool for browsing hive databases (DBeaver, ...
0
votes
0 answers
17 views

Not able to insert data with more than 15 digits of precision into a Hive table column of decimal data type

By default, the maximum precision of decimal data types is 38. I tried to create a table with a decimal column (31,2) and then tried to insert the data. Data gets loaded perfectly fine until the 15th digit but as I ...
-1
votes
0 answers
8 views

What is the use of add_bias() in hivemall?

I started using Hivemall at work. I don't know the exact use of add_bias().
-1
votes
0 answers
11 views

HIVE Tables - Partitioned Files

Here is the folder/ partitioned structure - /FLIGHT/2019/03/01/XYZ.tsv, /FLIGHT/2019/03/02/XYZ.tsv, /FLIGHT/2019/03/03/XYZ.tsv etc. While declaring HIVE table can we use something like /FLIGHT/{}/{}/{...
0
votes
0 answers
33 views

How do I execute this MySQL query as a Hive query?

Hi, I have a MySQL query below. Can anyone help me with how to execute this as a Hive query with the same output? MY QUERY: SELECT YEAR(transdate) AS _year,DATE(transdate) AS sodate,custid, ...
1
vote
1 answer
34 views

How to define a partitioned external table for a nested directory structure

For a set of datafiles stored in hdfs in a year/*.csv structure as follows: $ hdfs dfs -ls air/ Found 21 items air/year=2000 drwxr-xr-x - hadoop hadoop 0 2019-03-08 01:45 air/year=...
0
votes
1 answer
21 views

Syntax error: cannot recognize input near “ ” " in function specification

I am running following hive query in qubole select locate(';', substring(tags, locate('Swimlane:', tags), length(tags))) from myTable Error I am getting: Syntax Error: org.apache....
0
votes
2 answers
29 views

How to create an RDD directly from Hive table?

I am learning spark and creating rdd using the SparkContext object and using some local files, s3 and hdfs as follows: val lines = sc.textFile("file://../kv/mydata.log") val lines = sc.textFile("s3n:...
0
votes
0 answers
11 views

Find the average of hh:mm:ss in Hive

I have a Hive table with columns script_name, start_time, end_time, duration. The three time columns are in hh:mm:ss format. I want to take the average value of these columns.
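
The arithmetic behind such an average can be sketched as: convert each hh:mm:ss value to seconds, average the seconds, and format the result back. The Python below is only an illustration of that logic, not Hive code; the function names are hypothetical and the average is truncated to whole seconds by assumption:

```python
from typing import Iterable


def hms_to_seconds(value: str) -> int:
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Format a non-negative number of seconds as 'hh:mm:ss'."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def average_hms(values: Iterable[str]) -> str:
    """Average several 'hh:mm:ss' values, truncating to whole seconds."""
    totals = [hms_to_seconds(v) for v in values]
    return seconds_to_hms(sum(totals) // len(totals))


if __name__ == "__main__":
    print(average_hms(["00:10:00", "00:20:00"]))
```

Working in seconds avoids averaging the hour, minute, and second fields separately, which would give wrong results whenever a field carries over.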
0
votes
0 answers
6 views

Why is hiveserver2 showing correct results but hiveserver1 is not?

I am trying a simple select count(*) query, but I am getting 0 rows in hiveserver1 and correct data in hiveserver2. Any ideas on the reason for this?
1
vote
1 answer
26 views

How to Find the average of hh:mm:ss in hive

Consider that I have a Hive table with columns script_name, start_time, end_time, duration. Start time, end time and duration are in the format of hh:mm:ss. My requirement is to find the average time of ...
0
votes
1 answer
17 views

Hive: how to get the last 3 months' total spend when using a join

How do I get the last 3 months' total spend when joining source1 and source2 to produce the target table? source1: +--------+----------+ | cst_id | date | +--------+----------+ | a | 20180125 | | b ...
1
vote
0 answers
26 views

Why can't a Hive map key (i16) be upcast when searching by key?

There is a Hive table that is defined by Thrift. struct A { 1: optional map<i16, string> myMap; } I have tried a few queries to search through this table. // this one gets an error: select ...
1
vote
1 answer
48 views

UNION ALL doesn't generate any data in Hive

I am trying to do a UNION ALL of three different tables with the same DDL structure, but the final output has zero rows. I have no clue what's happening in the underlying execution. Could ...
1
vote
0 answers
58 views

AWS EMR S3 Hive

I follow the instructions from the book titled Big Data Visualization, see https://www.amazon.com/Big-Data-Visualization-James-Miller/dp/1785281941 Basically, the steps are: a) Load in a huge text ...
1
vote
2 answers
34 views

How to convert a customised time to seconds in Hive

I am looking for a solution to my issue: I want to convert my data to seconds. The data in my Hive table looks like this: my input : time 2m3s 10s 12.2 ...
0
votes
1 answer
18 views

HiveQL Query to Find a Delta Between Rows if a Condition matches

I have some data in data lake: Person | Date | Time | Number of Friends | Bob | 02/01 | unix_ts1 | 5 | Kate | 02/01 | unix_ts2 | 2 | Jill |...
