International Big Data Conference 2015

The last twelve months can only be described as a whirlwind of new technology advances in the Big Data market. Perhaps the most notable has been the explosion of interest in the Apache Spark in-memory clustered execution environment. The momentum behind Apache Spark seems unstoppable, with more and more vendors getting behind it and Spark now available both on-premises and in the Cloud. This new in-memory execution environment has a number of components, including Spark Streaming, Spark SQL, MLlib, GraphX, SparkR and the Tachyon in-memory file system. In addition, Hadoop MapReduce, Hive and Pig are all moving on to Spark. As momentum gathers pace, Data Scientists can make use of a number of new tools that run on top of Spark to prepare and analyse data and/or develop their own analytical applications in Scala, Java, Python and R.

A lot of other things have also been going on in Big Data. For example, Information Management and Governance in a Big Data environment is turning into a huge issue, with so many new data sources and both traditional and self-service data integration tools now available. The hype says create a centralised ‘Data Lake’ on Hadoop. Is this a good strategy? What about data governance? What about Big Data Security? How do you stop a Data Lake becoming a ‘Data Swamp’? Given that Big Data is here to stay, how should you organise your information architecture going forward? What about high-velocity data such as sensor data from the Internet of Things?

And what about Analytics? How do you make sense of all the algorithms? Which do you use, when and where? What kinds of algorithms are useful for what kinds of purposes?

This Conference aims to provide an update on Big Data and Analytics, show the latest advances in technology, and address important areas such as Apache Spark, Advanced Analytics, SQL on Hadoop and the Internet of Things (IoT). It also deals with key Information Management issues such as Big Data Security, the explosion of data sources, the impact of self-service data integration, and how to organise and govern data in a Data Lake. The intention is to improve your understanding so that you can get started and succeed with Big Data and integrate new technologies into your existing environment.