Questions tagged [hive]

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.

0
votes
0answers
8 views

Column type issue when transferring bulk data from MySQL to Hive through Sqoop

I'm trying to send data from MySQL through Sqoop to a Hive database. After it runs, the execution stops with an issue in the column type: 19/03/25 14:26:37 INFO mapreduce.ImportJobBase: Publishing Hive/...
0
votes
0answers
26 views

How to connect Spark with Hive using PySpark?

I am trying to read Hive tables remotely using PySpark. It fails with an error saying it is unable to connect to the Hive Metastore client. I have read multiple answers on SO and other sources; they were ...
0
votes
0answers
15 views

Create Elasticsearch mapping table in AWS Glue catalog

Is it possible to create an external table in the AWS Glue catalog that points to an existing Elasticsearch index (e.g. an AWS Elasticsearch server), as we can in the Hive metastore? And then insert data or ...
-2
votes
1answer
17 views

How do I see the partitions in a Teradata view? Just provide me the syntax for displaying the partitions of a particular table [on hold]

I tried searching the internet for the syntax for displaying partitions but didn't find what I wanted. I just want to know the basic syntax for showing the partitions of a table in Teradata.
0
votes
1answer
15 views

Presto Hive metastore connections

In one of my applications I have been using Presto and hive-metastore to query data from S3. In order to configure the hive-metastore on production (I am going to deploy Presto and Hive on Docker ...
-1
votes
0answers
12 views

How to test Hive CRUD queries from a shell script

I am creating a shell script that should execute basic Hive queries and assert the output against the expected result. Where should I start with shell scripting? Thanks in advance.
1
vote
2answers
33 views

Space in column name throws an exception when Parquet is used for compression

I am getting the error below while inserting data into a Parquet-format table whose column name contains a space. Used the Hive client of Cloudera version CREATE TABLE testColumNames( First Name string) ...
0
votes
1answer
25 views

Data is in HDFS but is not fetched in the Hive table

I have loaded records from a Hive table through a Spark program. The data loaded successfully into HDFS, but the records are not fetched in the Hive table. Please find below the compression technique we are using. ...
0
votes
0answers
12 views

Reflect API in Hive fails when used in Beeline

I have a requirement to generate a unique ID for each and every record in Hive. I have used the reflect API available in Hive. The query works fine when run from the Hive command prompt but fails while ...
2
votes
2answers
23 views

Changing column name in a Hive external table that contains data

I have a column insert_process_id that I am trying to rename to process_id. This external table is in Parquet file format. Please advise how to rename this column.
0
votes
0answers
20 views

How to use Python UDF for Hive as inline code without script file

I'm using Python package impyla to connect to Hive programmatically; I'm not using the hive CLI. And I'm trying to use a UDF written in Python. All tutorials I've seen do this like this ADD FILE ...
1
vote
2answers
56 views

How to identify repeated occurrences of a string column in Hive?

I have a view like this in Hive: id sequencenumber appname 242539622 1 A 242539622 2 A 242539622 3 A 242539622 ...
0
votes
1answer
14 views

Change table column name parquet format Hadoop

I have a table with columns a, b, c. The data is stored on HDFS as Parquet. Is it possible to change a specific column name even if the Parquet files were already written with the schema a, b, c?
0
votes
2answers
38 views

Create 5-minute intervals between two timestamps

I have a bunch of data points, each with two columns: start_dt and end_dt. I am wondering how I can split the time gap between start_dt and end_dt into 5-minute intervals. For instance, id++++...
0
votes
0answers
6 views

Implementing linear regression models in Apache Hive

I've been working on a linear regression model in scikit-learn which now works as expected. Would anyone know if it's possible to implement the same using Apache Hive? It would be great if so - I've ...
0
votes
0answers
9 views

Error running emrfs delete - Metadata 'EmrFSMetadata' does not exist

As the title says. We have stage/prod EMR clusters and we may need to run the emrfs delete s3_path command on both clusters via Jenkins jobs. However, I can run emrfs delete successfully on the stage EMR one,...
1
vote
1answer
28 views

Why is a boolean field not working in Hive?

I have a column in my Hive table whose datatype is boolean. When I tried to import data from CSV, it was stored as NULL. This is my sample table: CREATE tABLE if not exists Engineanalysis( EngineModel ...
0
votes
0answers
9 views

Create Hive table based on Parquet file schema

So I have a directory of about 600 Parquet files, and using parquet-tools I've extracted the files' schema: message spark_schema { optional int64 af; optional binary dst_addr (STRING); optional ...
2
votes
1answer
33 views

Converting '03/11/2019 11:45:00' to Hive timestamp

I have dates in the format '3/20/2019 17:23:00' or '2/1/2019 11:44:00' and they come in as null when I specify the type as timestamp in Hive. I am trying to convert them with the following code but I'...
1
vote
0answers
31 views

How to cache the leftmost table in memory for a left outer join in Hive

I have a large table (1 TB of data) that needs to be joined with a smaller table (100k records) SELECT st.id FROM small_table st LEFT JOIN large_table lt ON st.id = lt.id In the above ...
0
votes
0answers
14 views

Parquet string to timestamp conversion in Hive

I have Parquet files generated by some code. I created a DDL for that data, added a table in Hive and pointed it to those Parquet files in HDFS. When I try to query the table, all fields look perfect. But, ...
0
votes
1answer
18 views

Is it possible to change partition size manually in Hive?

Whenever we create a partition in Hive, how much space is allocated? Is it one block per partition, i.e. 128 MB? And is it possible to change the size of a partition?
-4
votes
0answers
36 views

How to schedule jobs in Apache Hadoop for Hive & Spark scripts [on hold]

I'm new to the Big Data environment and am trying to understand it. I have a few ETL scripts that I created in Hive & Spark, and now I want to schedule them to run on a daily basis. Can somebody tell me how to ...
0
votes
0answers
7 views

How can I pass HQL parameter in Oozie workflow using Hue?

http://gethue.com/drag-drop-saved-hive-queries-into-your-workflows/#comment-78368 Does it not work in the current version (e.g. Hue 4.3)? The parameter name is 'zip' when submitting the workflow. It's not '...
0
votes
0answers
21 views

Hive table does not respond to select query. ACID enabled

My Hive query does not respond at all on a table. Any select query on the table doesn't respond. I created this table using a select query. Tried to change the type to ORC. Tried to make ...
0
votes
1answer
32 views

Timestamp hour decreases on insert overwrite

I have been working with Sqoop, Hive and Impala. My Sqoop job gets a field from SQL Server in datetime format and writes it to TABLE1, which is stored as textfile. The field in TABLE1 has the timestamp format....
1
vote
0answers
36 views

Unable to access Hive JDBC remotely after SSL configured on Hive

I have enabled SSL on Hive, and after that I tried connecting to Hive through JDBC from DBeaver using the connection URLs below. URL1: "jdbc:hive2://hostname:10000/default;ssl=true;sslTrustStore=&...
0
votes
3answers
67 views

Picking the same column of multiple rows into one row of multiple columns [duplicate]

I have the two DataFrames below: MasterDF and NumberDF (created using a Hive load). Desired output: Logic to populate: for Field1, pick sch_id where CAT='PAY' and SUB_CAT='client'. For Field2, pick sch_id ...
0
votes
0answers
18 views

How to import MySQL table with one entirely NULL column using Sqoop into HDFS?

I have a table in MySQL like the table below. Please note that this question is different from other questions in that the column name here does not exist, it is NULL. +------+--------+------+ | id ...
1
vote
1answer
26 views

Get data from the last three months using Talend (Big Data Hive)

I have a query to get all data from Big Data Hive as the source using Talend. This is the query I usually use: SELECT bd_bt_xyz.xllnis05_timestamp, bd_bt_xyz.xllnis05_key, . . (too many field) ...
0
votes
0answers
11 views

Getting 'org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family table does not exist in region hbase:meta'

I'm trying to integrate Hive and HBase, but when I create an (external) table in Hive with the HBase handler: create external table entity_hbase(id bigint, value string, ts bigint, entity_type tinyint) ...
2
votes
1answer
65 views

Apache Spark: why does Spark create multiple stages to scan a Hive table even when it is cached, and why does repartition solve that?

My questions are: Why does Spark create multiple stages to scan a Hive table, even though I have already cached the DataFrame? Why is the number of stages reduced when I repartition the DataFrame before caching? ...
1
vote
0answers
36 views
+50

Optimize Hive table loading time in Spark

I am working on a Spark Streaming project in which an incoming stream is joined with multiple Hive tables. So I am loading the tables in Spark and caching them. Below is a sample query: val df = sql("...
0
votes
0answers
31 views

Decimal Field In Hive Only Returning Results When Encapsulating Value In Single Quotes

I ingested a table from a SQLServer database into a Hive database. When I attempt to do a lookup on a value in Hive, I have to surround the value with single quotes in order to find it. In SQLServer, ...
0
votes
0answers
24 views

Map Parquet files in S3 to their schemas to solve Hive ClassCastException

I have a Hive table partitioned by timestamp on top of Parquet files with Snappy compression. Basically the paths look like: s3:/bucketname/project/flowtime=0/ s3:/bucketname/project/flowtime=1/ ...
0
votes
0answers
26 views

Hive - create external tables where the string itself contains commas

I am new to Hive and am creating external tables on a CSV file. One of the issues I am coming across is values that contain multiple commas within the string itself. For example, the csv file contains ...
-1
votes
0answers
19 views

How to create Hive tables from a complex JSON file using Spark

Could you please help me with how to create Hive tables from a complex JSON file using Spark (a generic solution)? Thanks.
1
vote
1answer
31 views

How to get Sum(Column) over (partition by other columns)?

I am trying to convert Teradata code written like below Select A.col1, sum(A.metric1) over (partition by A.col1, B.col1 order by A.col2 asc) as Cust_col, B.col1 from A JOIN B on (A.join_key=B....
0
votes
0answers
26 views

Can Hive create a temporary table so I can get a small table for further queries?

I have a table whose schema is as below: struct TBL{ 1:optional string category; 2:optional string endpoint; 3:optional string priority; 4:optional i64 timestap; 5:optional string ...
0
votes
0answers
14 views

Hive table creation with a control character and a multi character delimiter

How do I create a Hive table with a multi-character delimiter, where one of the delimiters is the control character ^A and the other is §?
0
votes
1answer
28 views

Hive issues in HDInsight

When I run my Hive queries in an Azure HDInsight cluster I get the following error message: ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang....
0
votes
1answer
32 views

SQL query to retrieve values recursively for all rows

I have 2 tables, where each object has a parent. Each parent is also an object, which can have another parent and so on. There will be no circular chaining. I can access the data either through ...
0
votes
1answer
16 views

Hive: how do I convert scientific notation to a decimal value?

According to this accepted answer, I should convert 1.2766E-10 to decimal like below: select cast(1.2766E-10 as decimal(11, 10)) but it gives me 1E-10. How can I disable scientific notation in Hive ...
1
vote
1answer
21 views

Hive Query with multiple Columns in Select and group by one column

I have the below sample image of the dataset and the expected result. What can be the best way to achieve this kind of result in a dataset with a billion records. Should we use the intermediate ...
0
votes
0answers
36 views

Retrieving null from array of structs from an XML using Hive XML SerDe

So I have this XML file: <wd:File xmlns:wd="urn:com.workday/bsvc"> <wd:Report_Entry> <wd:Job_Application wd:Descriptor="Agent Smith - R-00620 Pharmacist Manager - Sam's -R2_RHO_H&...
-1
votes
0answers
15 views

From Spark2-shell, Hive table returning null for map fields, the same returning data using spark-shell

I am reading data from Hive table using spark.sqlContext.sql from Spark2-shell, it is returning all the data except the map field, returning as null. Other string values are populated. When I use ...
0
votes
0answers
11 views

order of elements changes in map<string,string> on adding new column in hive table

In stage table we have data as follows in "call_log" field which is of String data type: a=1|b=2|c=3|d=4 We are doing a str_to_map(call_log,"\\|","=") while loading into Final table where "call_log"...
1
vote
3answers
31 views

How to remove leading zeros from a string column in Hive/Impala

If amount is "$0012304" then I want result like "$12304" and if amount is "$0000000" then "$0" When I use regexp_replace(amount,'^?0', ''), it is replacing all the zeros in the column and giving a ...
1
vote
1answer
18 views

Calculating non-matching rows in a partitioned table in Hive

I have a use case where I have to find the non-matching rows (excluding matching records) between two different partitions of a partitioned Hive table. Let's suppose there is a partitioned table called ...
-1
votes
0answers
13 views

How do I avoid concurrency between two Hive queries?

I have two Spark jobs: the first job does an insert overwrite into a managed Hive table and the second job reads from the same table. They are running in parallel. The problem is the second job doesn't take ...
