Convert a MySQL table into a ColumnFamily in Cassandra : Slow batch mutations with Hector

Question

asked Feb 18, 2022 in Education by JackTerrance

I have a very large MySQL table (billions of rows, with dozens of columns) that I would like to convert into a ColumnFamily in Cassandra. I'm using Hector.

I first create my schema like this:

    String clusterName = "Test Cluster";
    String host = "cassandra.lanhost.com:9160";
    String newKeyspaceName = "KeyspaceName";
    String newColumnFamilyName = "CFName";

    ThriftCluster cassandraCluster;
    CassandraHostConfigurator cassandraHostConfigurator;

    cassandraHostConfigurator = new CassandraHostConfigurator(host);
    cassandraCluster = new ThriftCluster(clusterName, cassandraHostConfigurator);

    BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
    columnFamilyDefinition.setKeyspaceName(newKeyspaceName);
    columnFamilyDefinition.setName(newColumnFamilyName);
    columnFamilyDefinition.setDefaultValidationClass("UTF8Type");
    columnFamilyDefinition.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
    columnFamilyDefinition.setComparatorType(ComparatorType.UTF8TYPE);

    BasicColumnDefinition columnDefinition = new BasicColumnDefinition();
    columnDefinition.setName(StringSerializer.get().toByteBuffer("id"));
    columnDefinition.setIndexType(ColumnIndexType.KEYS);
    columnDefinition.setValidationClass(ComparatorType.INTEGERTYPE.getClassName());
    columnDefinition.setIndexName("id_index");
    columnFamilyDefinition.addColumnDefinition(columnDefinition);

    columnDefinition = new BasicColumnDefinition();
    columnDefinition.setName(StringSerializer.get().toByteBuffer("status"));
    columnDefinition.setIndexType(ColumnIndexType.KEYS);
    columnDefinition.setValidationClass(ComparatorType.ASCIITYPE.getClassName());
    columnDefinition.setIndexName("status_index");
    columnFamilyDefinition.addColumnDefinition(columnDefinition);

    .......

    ColumnFamilyDefinition cfDef = new ThriftCfDef(columnFamilyDefinition);
    KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
            newKeyspaceName, "org.apache.cassandra.locator.SimpleStrategy", 1, Arrays.asList(cfDef));
    cassandraCluster.addKeyspace(keyspaceDefinition);

Once that is done, I load my data. It is stored in a List, since I'm fetching the MySQL data with a namedParametersJdbcTemplate:

    String clusterName = "Test Cluster";
    String host = "cassandra.lanhost.com:9160";
    String KeyspaceName = "KeyspaceName";
    String ColumnFamilyName = "CFName";
    final StringSerializer serializer = StringSerializer.get();

    public void insert(List<SqlParameterSource> dataToInsert) throws ExceptionParserInterrupted {
        Keyspace workingKeyspace = null;
        Cluster cassandraCluster = HFactory.getOrCreateCluster(clusterName, host);
        workingKeyspace = HFactory.createKeyspace(KeyspaceName, cassandraCluster);
        Mutator mutator = HFactory.createMutator(workingKeyspace, serializer);
        ColumnFamilyTemplate template = new ThriftColumnFamilyTemplate(
                workingKeyspace, ColumnFamilyName, serializer, serializer);

        long t1 = System.currentTimeMillis();

        for (SqlParameterSource data : dataToInsert) {
            String keyId = "id" + (Integer) data.getValue("id");
            mutator.addInsertion(keyId, ColumnFamilyName,
                    HFactory.createColumn("id", (Integer) data.getValue("id"),
                            StringSerializer.get(), IntegerSerializer.get()));
            mutator.addInsertion(keyId, ColumnFamilyName,
                    HFactory.createStringColumn("status", data.getValue("status").toString()));
            ...............
        }

        // All insertions are sent in a single batch once the loop has finished.
        mutator.execute();

        // Elapsed time in milliseconds.
        System.out.println(System.currentTimeMillis() - t1);
    }

I'm inserting 100,000 rows in approximately 1 hour, which is really slow. I have heard about multi-threading my inserts, but in this particular case I don't know what to do. Should I use BatchMutate?

1 Answer

JackTerrance · Answer 1 · 2022-02-18T04:40:51+0000

Yes, you should run your insertion code from multiple threads. The following stress-testing code shows how to do this efficiently with Hector: https://github.com/zznate/cassandra-stress

The number of secondary indexes on the column family may also be contributing to your slow inserts, since each secondary index creates an additional column family "under the hood". A well-designed data model should rarely need a large number of secondary indexes. The following article provides a good overview of data modeling in Cassandra: http://www.datastax.com/docs/1.0/ddl/index
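
As a rough illustration of the multi-threading advice, here is a minimal sketch that uses only the Java standard library. It splits the rows into fixed-size batches and hands each batch to a thread pool. The names ParallelBatchLoader, partition, load and batchWriter are hypothetical and are not part of Hector. The batchWriter callback stands in for your own Hector code. This assumes each call creates its own Mutator, adds the insertions for that batch, and calls execute() once per batch, rather than once for all 100,000 rows as in the question. The batch size and thread count are assumptions you would need to tune against your cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical helper (not a Hector class): writes rows in batches from a thread pool. */
final class ParallelBatchLoader {

    private ParallelBatchLoader() {}

    /** Splits rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so each task owns its batch independently.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    /**
     * Submits one task per batch and waits for all of them.
     * batchWriter is an assumption: it is where the per-batch Hector
     * mutator code from the question would go.
     * Returns the total number of rows handed to batchWriter.
     */
    static <T> int load(List<T> rows, int batchSize, int threads,
                        Consumer<List<T>> batchWriter)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                pending.add(pool.submit(() -> {
                    batchWriter.accept(batch);
                    return batch.size();
                }));
            }
            int written = 0;
            for (Future<Integer> result : pending) {
                // get() rethrows any failure from the batch as ExecutionException.
                written += result.get();
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the code from the question, a call might look like ParallelBatchLoader.load(dataToInsert, 500, 8, batch -> insert(batch)), where 500 and 8 are illustrative starting values only and insert is the method shown in the question.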