Apache Spark: A Unified Analytics Engine for Big Data

Apache Spark is a fast, unified analytics engine for large-scale data processing, spanning Big Data and machine learning. More precisely, Apache Spark is an engine for processing large-scale data in memory, equipped with an elegant and expressive development API that helps data workers efficiently run jobs requiring fast, repeated access to the data being processed, such as streaming, machine learning, and SQL.

Apache Spark consists of Spark Core and a set of software libraries. Spark Core is a distributed execution engine, and its Java, Scala, and Python APIs serve as a platform for developing distributed ETL (Extract, Transform, Load) applications. Additional libraries built on top of the core support workloads involving streaming, SQL, and machine learning.
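
To make the ETL idea concrete, here is a minimal sketch in plain Python. It is not Spark's API. It splits a small in-memory dataset into partitions, transforms each partition in a worker thread, and merges the partial results, which loosely mirrors how a distributed engine applies the same function to many partitions. The function names (`extract`, `partition`, `transform_partition`, `load`, `run_etl`) and the sample records are hypothetical and exist only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def extract():
    """Extract: return raw records (hypothetical sample data)."""
    return [
        "spark,streaming",
        "spark,sql",
        "hadoop,batch",
        "spark,mllib",
    ]


def partition(records, n):
    """Split records into n roughly equal partitions."""
    return [records[i::n] for i in range(n)]


def transform_partition(records):
    """Transform: count the first field of each record in one partition."""
    return Counter(line.split(",")[0] for line in records)


def load(partials):
    """Load: merge the per-partition counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def run_etl(num_partitions=2):
    """Run extract, a parallel per-partition transform, then load."""
    parts = partition(extract(), num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(transform_partition, parts))
    return load(partials)


if __name__ == "__main__":
    print(run_etl())
```

The result does not depend on the number of partitions, because the per-partition counts are merged with an operation that is associative and commutative. That property is what lets an engine spread the transform step across many machines.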



Apache Spark components (hortonworks.com)

Spark was designed with data science in mind and provides abstractions that make data science easier. Data scientists often use machine learning, a set of techniques and algorithms that learn from the data they are given. Many of these algorithms are iterative (they repeat a calculation over the same data), so Spark's ability to cache the data being processed in memory contributes greatly to speeding up such iterative processing. This capability makes Spark a well-suited engine for implementing machine learning algorithms. To that end, Spark also includes MLlib, a library that implements machine learning algorithms for common data science techniques such as classification, regression, collaborative filtering, clustering, and dimensionality reduction.
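
The benefit of caching for iterative algorithms can be sketched in plain Python. This is a conceptual illustration, not Spark's API. The toy algorithm below is gradient descent that estimates the mean of a dataset, and every iteration reads the full dataset. The names `load_dataset`, `fit_mean`, and `run`, the sample values, and the `stats` counter are assumptions made up for this example. `load_dataset` stands in for an expensive read from disk or over the network.

```python
def load_dataset(stats):
    """Simulate an expensive read; `stats` counts how often it runs."""
    stats["loads"] += 1
    return [1.0, 2.0, 3.0, 4.0]


def fit_mean(get_data, steps=50, lr=0.1):
    """Gradient descent on squared error; the estimate moves toward the mean."""
    estimate = 0.0
    for _ in range(steps):
        data = get_data()  # every iteration touches the full dataset
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate


def run(cached):
    """Return (estimate, number of dataset loads) with or without caching."""
    stats = {"loads": 0}
    if cached:
        data = load_dataset(stats)  # read once and keep it in memory
        result = fit_mean(lambda: data)
    else:
        result = fit_mean(lambda: load_dataset(stats))  # re-read every step
    return result, stats["loads"]


if __name__ == "__main__":
    print("cached:  ", run(cached=True))
    print("uncached:", run(cached=False))
```

Both variants compute the same estimate, but the uncached one reloads the data on every iteration. When the load is a slow read rather than a small list, that repeated cost dominates the run time, which is the overhead in-memory caching removes.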

As software for large-scale data processing, Apache Spark offers several advantages, including:
  1. Speed. Apache Spark can run up to 100 times faster than Hadoop MapReduce for some workloads. Thanks to a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine, Apache Spark achieves high performance for both batch and streaming data processing.
  2. Ease of use. Applications that use Apache Spark can be written in Java, Scala, Python, R, and SQL. Spark offers more than 80 high-level operators that make it easier to build parallel applications. Apache Spark can also be used interactively from the Scala, Python, R, and SQL shells.
  3. Broad scope. Apache Spark combines SQL, streaming, and complex analytics. It provides a stack of libraries that includes SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. Developers can combine all of these libraries seamlessly in a single application.
  4. Runs everywhere. Apache Spark can run on Hadoop YARN, Apache Mesos, and Kubernetes, in standalone mode or on a cluster, or on cloud platforms such as EC2. Spark can access many kinds of data sources, such as HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of others.

Since its release, Apache Spark has been rapidly adopted by companies across a wide range of industries. Internet giants such as Netflix, Yahoo!, and eBay have run Spark at very large scale, collectively processing petabytes of data on clusters of 8,000 nodes (machines). Spark has quickly grown into the largest open source community in Big Data, with more than 1,000 contributors and 250+ organizations.

Interested in trying to run an application built on Apache Spark? Follow these tutorials:


Both are presented in a simple, straightforward way.

Data sources accessible to Apache Spark (databricks.com)

References:
1. Hortonworks, "What Apache Spark Does?," https://hortonworks.com/apache/spark/. [Accessed 29 July 2018].
2. Apache, "Apache Spark," https://spark.apache.org/. [Accessed 29 July 2018].
3. Databricks, "What is Apache Spark?," https://databricks.com/spark/about. [Accessed 29 July 2018].
