In Hive, partitioning and bucketing a table are both done on a column. How exactly are they different?

1 Answer

Best answer
Partitioning

Partitioning divides a large amount of data into multiple slices based on the values of one or more columns of a table. This lets us separate the data according to how we need to query it. For example, suppose we have the data for all employees of a company with a huge number of employees, but we need to survey only the employees who belong to a particular category. Without partitioning, we would have to scan through all the entries to find them. If we partition the table by category, a query for one category only has to read that category's partition.

Bucketing

Bucketing splits data into a fixed number of more manageable, roughly equal parts. When we go for partitioning, we might end up with many small partitions, because one partition is created per distinct column value. When we go for bucketing, we restrict the data to a number of buckets that is defined when the table is created.

Difference and Conclusion

When a field in our data has high cardinality (the number of possible values the field can have), partitioning on it should be avoided. If we partition on a field with a large number of values, we might end up with too many directories in our file system.

What bucketing does differently is keep the number of files fixed. Since you specify the number of buckets, Hive takes the field value, calculates a hash of it, and uses that hash to assign the record to one of the buckets.

We can partition on multiple fields (category, country of employee, etc.). Bucketing is usually done on a single field, although Hive also accepts more than one column in the bucketing clause.

So, bucketing is useful when the field has high cardinality and the data is spread approximately evenly among all buckets. Partitioning works best when the cardinality of the partitioning field is not too high, so that the table can be queried quickly on that field.
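In HiveQL the two techniques correspond to the `PARTITIONED BY` and `CLUSTERED BY ... INTO n BUCKETS` clauses of `CREATE TABLE`. The placement logic described above can be sketched in Python. This is only an illustration under assumptions: the `employees` table, its columns, the `bucket_N` file label, and the CRC32 hash are all hypothetical stand-ins, and Hive's real hash function and file names differ.

```python
import zlib
from collections import defaultdict

# Fixed up front, in the spirit of CLUSTERED BY (...) INTO 4 BUCKETS.
NUM_BUCKETS = 4


def partition_dir(table, category):
    """Partitioning: one directory per distinct value of the partition column."""
    return f"{table}/category={category}"


def bucket_index(employee_id, num_buckets=NUM_BUCKETS):
    """Bucketing: hash the column value, then take it modulo the bucket count.

    CRC32 is a stand-in chosen for determinism; it is NOT Hive's hash function.
    """
    return zlib.crc32(str(employee_id).encode("utf-8")) % num_buckets


def layout(rows, table="employees"):
    """Map each (employee_id, category) row to a (directory, bucket) pair."""
    files = defaultdict(list)
    for employee_id, category in rows:
        key = (partition_dir(table, category), bucket_index(employee_id))
        files[key].append(employee_id)
    return dict(files)


# Hypothetical sample rows: (employee_id, category).
rows = [
    (1001, "engineering"),
    (1002, "sales"),
    (1003, "engineering"),
    (1004, "support"),
]

placement = layout(rows)
for (directory, bucket), ids in sorted(placement.items()):
    print(f"{directory}/bucket_{bucket}: {ids}")
```

The number of directories grows with the number of distinct `category` values, which is why a high-cardinality partition column is a problem. The number of buckets per directory never exceeds `NUM_BUCKETS`, no matter how many distinct `employee_id` values exist.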
