Found 16 repositories (showing 16)
lhh2002
Big data interview questions: the road from 0 to 1 to becoming an architect. Flink, Spark, Hive, HBase, Hadoop, Kettle, Kafka...
abhishek-ch
The intention is to apply different Machine Learning algorithms in R and Python to data of various dimensions and ranges. The implementation will be based on a BigData framework, and the main focus will be Spark and Hive, including Hadoop.
liguodongiot
A curated list of awesome frameworks, libraries and software for IOT/BigData/AI/Cloud/Python/Java...
yoyostudy
Implementation of Paper "AutoRed: Automated Attack Scenario Generation Framework for Red Teaming of LLMs" [IEEE BigData 2024 Industry & Governance Track]
USC-InfoLab
Official implementation of WaveGNN (IEEE BigData 2025), integrating a decay-aware Transformer and dynamic Graph Neural Network framework for modeling irregular multivariate time series without imputation. Achieves consistent, robust, and interpretable performance on clinical datasets.
probablyabdullah
A forecasting study based on BigData methods, using pesticide sales as the example. Predictions are based on linear regression and two neural network frameworks, to obtain statistics and compare results.
akash36
Consists of a few BigData case studies done using the MapReduce framework.
D-grimut
Repo of various examples of frameworks and tools used for bigdata, with annotations about caveats and how to use them.
palominogabriel
Code to analyse political campaign spending in 2010 using the bigdata framework Storm.
wooky94
A framework for using multi-threading in operations that can be decomposed into several tasks working just-in-time. Typically used to process a flow of objects in a bigdata context.
This project deals with the implementation of BigData using the Hadoop framework. It provides support for BigData Hadoop version 1. The product provides three ways to set up a Hadoop cluster: Automatic Configuration, Manual Configuration, and On Demand Configuration. TECHNOLOGY USED: OS: RHEL 7.2; SSH, NFS, DNS and YUM configuration; Frameworks: MapReduce, Pig, Hive, ZooKeeper; FRONTEND: Python CGI integrated with HTML; BACKEND: HTML, CSS, JavaScript, Ajax.
BigData handling with the Hadoop framework and Google Dataproc. This program counts the occurrences of trigrams in any dataset of txt files using the Hadoop MapReduce framework. It also sorts the trigrams by their size in descending order by implementing a Comparator class. A complementary report is produced with it, highlighting the additional features used to optimise the processing of Big Data, such as a FileInputCobiner.
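The trigram count described above can be sketched in plain Python as a map step followed by a reduce step. This is only an illustration, not the repository's Hadoop code: the function names are hypothetical, and it assumes that "size" means the number of occurrences of each trigram.

```python
from collections import Counter
from itertools import chain


def map_trigrams(line):
    """Map step: emit (trigram, 1) for every run of three consecutive words."""
    words = line.lower().split()
    return [(" ".join(words[i:i + 3]), 1) for i in range(len(words) - 2)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each trigram."""
    totals = Counter()
    for trigram, count in pairs:
        totals[trigram] += count
    return totals


def count_trigrams(lines):
    """Run map and reduce, then sort by count in descending order."""
    pairs = chain.from_iterable(map_trigrams(line) for line in lines)
    totals = reduce_counts(pairs)
    # Assumption: "size" is the occurrence count; ties are broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


lines = ["the quick brown fox", "the quick brown dog"]
result = count_trigrams(lines)
print(result)
```

In a real Hadoop job the sort key would live in a Comparator class, as the description says; here the `key` function passed to `sorted` plays that role.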
JJCuh
Demonstrate knowledge of BigData systems: the Hadoop Distributed File System, the NoSQL database HBase, the resource manager YARN, processing frameworks such as MapReduce, Hive, and Kafka, and the Spark ecosystem. Develop both batch-processing and stream-processing data-intensive applications using Apache Hadoop and Apache Spark technologies.
Public sector banks deal with large amounts of data on a daily basis. To manage such large datasets, big data analytics techniques such as MapReduce, which runs on the Hadoop framework, are used. The basic entities in the public sector are customers and their transactions. The question of interest here is to analyse the behavioural patterns in customers' spending habits and in their transactions. These behaviours are encoded into data through the media that capture them. Various data sources are then accessed, prepared, consolidated and analysed. Ultimately, this gives rise to insights into customers' spending patterns over a period of time.
The number of topics emerging across the internet each day is growing rapidly, producing fluctuating data in all areas of the world, so following a particular topic and finding related information is essential to keep up with the trend. Most of these topics trend on Twitter under their related hashtags each day, and the data behind the trending topics is vast, requiring a capable and efficient framework to stream, analyse and cluster the topics by hashtag. Big data frameworks such as Apache Spark have the computing capacity to manage such data quickly and efficiently. The challenge of analysing trending Twitter topics for real-time topic clustering, so that only the tweets and information related to a topic are retrieved, is the motivation for the clustering techniques applied here using Apache Spark. The clustering algorithm is built on Spark LDA (Latent Dirichlet Allocation). It takes the live Twitter stream as input, and the data is represented using a vector space model in which the non-negative dimension weights indicate the significance of the corresponding terms; one essential property of this kind of feature space is its high dimensionality. LDA is a probabilistic model that posits a set of global topics and a set of document topics. Given an approximate assumed number of topics in the document, it assigns every word in the document to a temporary topic, then iterates over each word in the document and updates its topic assignment based on the established criteria.
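The vector space representation mentioned in the description above can be sketched with raw term counts as the non-negative dimension weights. This is a minimal illustration in plain Python with hypothetical function names and made-up sample tweets; the project itself uses Spark, and its actual weighting scheme is not stated in the description.

```python
from collections import Counter


def build_vocabulary(docs):
    """Collect every distinct term; each term becomes one dimension."""
    return sorted({word for doc in docs for word in doc.lower().split()})


def to_vector(doc, vocabulary):
    """Represent a document as non-negative term counts, one per dimension."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical sample tweets, used only to show the shape of the output.
tweets = ["spark streams tweets", "spark clusters topics topics"]
vocab = build_vocabulary(tweets)
vectors = [to_vector(tweet, vocab) for tweet in tweets]
print(vocab)
print(vectors)
```

The vocabulary grows with every new distinct term, which is why this kind of space becomes high-dimensional on a live tweet stream.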
kaustubhyerkade
DE Stuff
Programming & DSA in Python
Distributed computation & storage: Hadoop; imp: Spark, Spark SQL
File formats: JSON, Avro, Parquet
Types of data: structured & semi-structured
Processing mechanisms: batch & real time
Pub/sub: Kafka or AWS Kinesis
Data warehouse design: Amazon (e-commerce), Netflix
SQL: complex
Transactional & NoSQL databases: key-value, document, column-based, graph-based (MongoDB, Cassandra, HBase)
Data visualization tools
ETL tools
Cloud services: experience

Important Resources
Data Engineer Roadmap Document: https://docs.google.com/document/d/1g...
Python: https://www.youtube.com/watch?v=_uQrJ... , https://www.programiz.com/python-prog...
Scala: https://www.youtube.com/watch?v=LQVDJ... , http://allaboutscala.com/
Java: https://www.youtube.com/watch?v=eIrMb... , https://beginnersbook.com/java-tutori...
Linux, Unix, Shell Scripting: https://practice.geeksforgeeks.org/ba... (a free course with invitation code ELEARNINGBLINUX)
Data Structures & Algorithms: https://www.youtube.com/watch?v=5_5oE... , https://www.geeksforgeeks.org/ , https://leetcode.com/
DBMS: https://www.youtube.com/watch?v=kBdlM... , https://www.studytonight.com/dbms/
SQL Scripting: https://www.youtube.com/watch?v=HXV3z... , https://www.youtube.com/watch?v=7S_tz... , https://www.w3schools.com/sql/
Basic Terminologies in BigData: https://data-flair.training/blogs/wha... , https://www.edureka.co/blog/what-is-b...
Data Exploration Libraries: Pandas - https://www.youtube.com/watch?v=UB3DE... , NumPy - https://www.youtube.com/watch?v=DI8wg...
Data Warehousing Concepts: https://www.youtube.com/watch?v=J326L... , https://www.tutorialspoint.com/dwh/dw...
BigData Frameworks (Hadoop, Hive, Spark, Sqoop, Nifi, Flume): https://www.youtube.com/results?searc... , https://www.youtube.com/user/edurekaIN , https://data-flair.training/ , https://www.edureka.co/
Workflow Schedulers, Dependency Management: https://www.youtube.com/watch?v=niJ06... , https://www.youtube.com/watch?v=6RebQ...
NoSQL Databases: HBase - https://www.youtube.com/watch?v=NOX6-... , Cassandra - https://www.youtube.com/watch?v=iDhIj... , Elastic Search - https://www.youtube.com/watch?v=1Envk... , MongoDB - https://www.youtube.com/watch?v=pWbMr...
Apache Kafka: https://www.youtube.com/watch?v=daRyk...
Dashboarding Tools: Tableau - https://www.youtube.com/watch?v=aHaOI... , PowerBI - https://www.youtube.com/watch?v=3u7MQ... , Grafana - https://www.youtube.com/watch?v=CjABE... , Kibana - https://www.youtube.com/watch?v=gQ1c1...
BigData Services in Cloud (AWS): https://www.youtube.com/watch?v=k1RI5... , https://www.youtube.com/watch?v=8PyLr... , https://www.simplilearn.com/aws-big-d...

Important topics in SQL: Joins, Group By, Nested joins, Case-When conditions, Window functions
Tech stack for realtime data pipelines: Apache Kafka, Apache Flink, Apache Storm, AWS Kinesis, Spark Streaming
BigData frameworks and Hadoop ecosystem: Hadoop architecture, Map-Reduce, HDFS, Yarn; Apache Spark; Hive; Flume; Sqoop; Zookeeper; Ambari, Hue; Oozie, Airflow, Azkaban
Data Visualization Tools: Tableau, Power BI, Qlik Sense, Grafana, Kibana
Transactional Databases: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, SQL Server
No-SQL Databases: DynamoDB, Cassandra, MongoDB, ElasticSearch, HBase, Couchbase, Redis
Book: Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, 2nd Edition