I have a very large MySQL table (billions of rows, with dozens of columns) that I would like to convert into a ColumnFamily in Cassandra. I'm using Hector.

I first create my schema as follows:

```java
String clusterName = "Test Cluster";
String host = "cassandra.lanhost.com:9160";
String newKeyspaceName = "KeyspaceName";
String newColumnFamilyName = "CFName";

CassandraHostConfigurator cassandraHostConfigurator = new CassandraHostConfigurator(host);
ThriftCluster cassandraCluster = new ThriftCluster(clusterName, cassandraHostConfigurator);

BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
columnFamilyDefinition.setKeyspaceName(newKeyspaceName);
columnFamilyDefinition.setName(newColumnFamilyName);
columnFamilyDefinition.setDefaultValidationClass("UTF8Type");
columnFamilyDefinition.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
columnFamilyDefinition.setComparatorType(ComparatorType.UTF8TYPE);

// Secondary (KEYS) index on the "id" column
BasicColumnDefinition columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("id"));
columnDefinition.setIndexType(ColumnIndexType.KEYS);
columnDefinition.setValidationClass(ComparatorType.INTEGERTYPE.getClassName());
columnDefinition.setIndexName("id_index");
columnFamilyDefinition.addColumnDefinition(columnDefinition);

// Secondary (KEYS) index on the "status" column
columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("status"));
columnDefinition.setIndexType(ColumnIndexType.KEYS);
columnDefinition.setValidationClass(ComparatorType.ASCIITYPE.getClassName());
columnDefinition.setIndexName("status_index");
columnFamilyDefinition.addColumnDefinition(columnDefinition);

// .......

ColumnFamilyDefinition cfDef = new ThriftCfDef(columnFamilyDefinition);
KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
        newKeyspaceName, "org.apache.cassandra.locator.SimpleStrategy", 1, Arrays.asList(cfDef));
cassandraCluster.addKeyspace(keyspaceDefinition);
```

Once that's done, I load my data. It is stored in a List, because I fetch the MySQL data with a NamedParameterJdbcTemplate:

```java
String clusterName = "Test Cluster";
String host = "cassandra.lanhost.com:9160";
String KeyspaceName = "KeyspaceName";
String ColumnFamilyName = "CFName";
final StringSerializer serializer = StringSerializer.get();

public void insert(List<SqlParameterSource> dataToInsert) throws ExceptionParserInterrupted {
    Cluster cassandraCluster = HFactory.getOrCreateCluster(clusterName, host);
    Keyspace workingKeyspace = HFactory.createKeyspace(KeyspaceName, cassandraCluster);
    Mutator<String> mutator = HFactory.createMutator(workingKeyspace, serializer);
    ColumnFamilyTemplate<String, String> template =
            new ThriftColumnFamilyTemplate<String, String>(workingKeyspace, ColumnFamilyName, serializer, serializer);

    long t1 = System.currentTimeMillis();
    for (SqlParameterSource data : dataToInsert) {
        String keyId = "id" + (Integer) data.getValue("id");
        mutator.addInsertion(keyId, ColumnFamilyName,
                HFactory.createColumn("id", (Integer) data.getValue("id"), StringSerializer.get(), IntegerSerializer.get()));
        mutator.addInsertion(keyId, ColumnFamilyName,
                HFactory.createStringColumn("status", data.getValue("status").toString()));
        // ...............
    }
    // All insertions accumulated above are sent in this single call
    mutator.execute();
    // Elapsed time in milliseconds (end time minus start time)
    System.out.println(System.currentTimeMillis() - t1);
}
```

I'm inserting 100,000 rows in approximately 1 hour, which is really slow. I've heard about multi-threading my inserts, but in this particular case I don't know what to do. Should I use BatchMutate?
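To put the reported rate in perspective, here is a back-of-the-envelope projection. It assumes the rate of 100,000 rows per hour stays constant and takes one billion rows as a lower bound for "billions of rows"; the class name `ThroughputEstimate` is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: projects load time from a constant insert rate.
final class ThroughputEstimate {

    // Converts a rows-per-hour rate into rows per second.
    static double rowsPerSecond(double rowsPerHour) {
        return rowsPerHour / 3600.0;
    }

    // Hours needed to insert totalRows at a constant rowsPerHour.
    static double hoursFor(long totalRows, double rowsPerHour) {
        return totalRows / rowsPerHour;
    }

    // Same projection expressed in days.
    static double daysFor(long totalRows, double rowsPerHour) {
        return hoursFor(totalRows, rowsPerHour) / 24.0;
    }

    public static void main(String[] args) {
        double rate = 100_000.0;          // rows per hour, as reported in the question
        long oneBillion = 1_000_000_000L; // assumed lower bound for "billions of rows"
        System.out.printf(Locale.ROOT, "%.1f rows/s%n", rowsPerSecond(rate));
        // prints "27.8 rows/s"
        System.out.printf(Locale.ROOT, "%.0f hours (about %.0f days) per billion rows%n",
                hoursFor(oneBillion, rate), daysFor(oneBillion, rate));
        // prints "10000 hours (about 417 days) per billion rows"
    }
}
```

At this rate, even a single billion rows would take well over a year, so throughput has to improve by orders of magnitude rather than by a small constant factor.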

1 Answer

Yes, you should run your insertion code from multiple threads. The following stress-testing code shows how to do this efficiently with Hector: https://github.com/zznate/cassandra-stress

Another source of your insert performance problem may be the number of secondary indexes you define on the column family. Each secondary index creates an additional column family "under the hood", so every indexed column adds work to each insert. A correctly designed data model should not need a large number of secondary indexes. The following article gives a good overview of data modeling in Cassandra: http://www.datastax.com/docs/1.0/ddl/index
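A minimal sketch of the multi-threaded approach, using only `java.util.concurrent`: the rows are split into fixed-size batches, and each batch is written by a worker from a fixed thread pool. `ParallelBatchLoader` and `BatchWriter` are hypothetical names, and this is not taken from the linked stress tool. `BatchWriter` stands in for the Hector code that adds one batch's insertions to a `Mutator` and calls `execute()`. The batch size and thread count are assumptions to tune against your cluster, not recommended values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

final class ParallelBatchLoader {

    // Hypothetical stand-in for one Cassandra round trip (e.g. one Mutator.execute()).
    interface BatchWriter<T> {
        void writeBatch(List<T> batch) throws Exception;
    }

    // Splits rows into consecutive views of at most batchSize elements (no copying).
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            batches.add(rows.subList(start, Math.min(start + batchSize, rows.size())));
        }
        return batches;
    }

    // Writes every batch on a fixed-size pool and returns the number of rows written.
    static <T> long load(List<T> rows, int batchSize, int threads, BatchWriter<T> writer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong written = new AtomicLong();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                // Batches are subList views, so rows must not be modified while tasks run.
                futures.add(pool.submit(() -> {
                    writer.writeBatch(batch);
                    written.addAndGet(batch.size());
                    return null;
                }));
            }
            for (Future<?> future : futures) {
                try {
                    future.get(); // waits for the batch and rethrows its failure, if any
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a batch failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for batches", e);
                }
            }
        } finally {
            // Releases the workers; after a failure it also cancels queued batches
            // and interrupts running ones.
            pool.shutdownNow();
        }
        return written.get();
    }
}
```

With Hector, a `BatchWriter` would typically create its own `Mutator<String>` via `HFactory.createMutator(workingKeyspace, serializer)` inside `writeBatch`, call `addInsertion` for each row in the batch, and finish with `execute()`. Creating one mutator per batch avoids sharing one mutator's pending insertions across threads, and it keeps each `execute()` call bounded instead of sending every row's insertions in a single call, as the question's code does.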
