Found 23 repositories (showing 23)
arpit3043
Summarization systems often have additional evidence they can use to identify the most important topics of a document (or documents). For example, when summarizing blogs, the discussions or comments that follow the blog post are good sources of information for determining which parts of the blog are critical and interesting. In scientific paper summarization, there is a considerable amount of information, such as cited papers and conference information, that can be leveraged to identify important sentences in the original paper.

How text summarization works: in general, there are two types of summarization, abstractive and extractive.

1. Abstractive Summarization: Abstractive methods select words based on semantic understanding, even if those words did not appear in the source documents. They aim to present the important material in a new way. They interpret and examine the text using advanced natural language techniques in order to generate a new, shorter text that conveys the most critical information from the original text. This can be compared to the way a human reads a text article or blog post and then summarizes it in their own words. Input document → understand context → semantics → create own summary.

2. Extractive Summarization: Extractive methods attempt to summarize articles by selecting a subset of the original sentences that retain the most important points. This approach weights the important parts of sentences and uses them to form the summary. Different algorithms and techniques are used to assign weights to the sentences and then rank them based on importance and similarity to each other. Input document → sentence similarity → weight sentences → select sentences with higher rank.

Fewer studies are available on abstractive summarization, as it requires a deeper understanding of the text than the extractive approach. Purely extractive summaries oftentimes give better results than automatic abstractive summaries. This is because abstractive summarization methods must cope with problems such as semantic representation, inference, and natural language generation, which are relatively harder than data-driven approaches such as sentence extraction.

There are many techniques available to generate extractive summaries. To keep it simple, I will be using an unsupervised learning approach to find the similarity between sentences and rank them. One benefit of this is that you don't need to train and build a model before you start using it for your project.

It's good to understand cosine similarity to make the best use of the code you are going to see. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. Since we will be representing our sentences as a bunch of vectors, we can use it to find the similarity among sentences. The angle will be 0 (cosine of 1) when two sentence vectors point in the same direction, i.e. when the sentences are similar.

All good so far? Hope so :)

Next, below is our code flow to generate the summarized text: Input article → split into sentences → remove stop words → build a similarity matrix → generate rank based on matrix → pick top N sentences for summary.
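The code flow above can be sketched in plain Python as follows. This is a minimal, self-contained illustration, not the author's code: the tiny stop-word list, the regex sentence splitter, bag-of-words count vectors, and ranking each sentence by its total similarity to the others (a simple stand-in for a graph ranking such as PageRank) are all simplifying assumptions, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects usually use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "it", "this", "that"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def to_vector(sentence):
    """Lower-cased bag-of-words counts with stop words removed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|); 0.0 if either vector is all zeros."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """n x n matrix of pairwise cosine similarities, with 0 on the diagonal."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def summarize(text, top_n=2):
    sentences = split_sentences(text)
    matrix = similarity_matrix([to_vector(s) for s in sentences])
    # Score = total similarity to all other sentences (simplified ranking step).
    scores = [sum(row) for row in matrix]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    # Return the chosen sentences in their original order so the summary reads naturally.
    return [sentences[i] for i in sorted(top)]


text = "Cats chase mice. Cats and dogs are pets. Dogs chase cats. Stock prices fell today."
print(summarize(text, top_n=1))  # ['Dogs chase cats.']
```

The off-topic sentence shares no words with the others, so its row of the matrix is all zeros and it is never picked; swapping the summed-row score for PageRank over the same matrix is a common refinement.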
Ashishkumar-hub
Extractive text summarization using BERT
AhmedaliElgabry
No description available
Lspringer24
# Tableau Homework - Citi Bike Analytics

### Before You Begin

* This assignment will be saved to your Tableau Public account rather than GitHub.
* If you haven't already, be sure to create a Tableau Public account [here](https://public.tableau.com/s/).
* The free tier of Tableau only lets you save to their public server. This means that each time you save your file, it will be uploaded to your Tableau Public profile.
* You are able to load and continue working on the same workbook.
* When you are finished with your assignment, you will turn in the URL to your Tableau Public workbook along with any additional files used for your analysis.

## Background

Congratulations on your new job! As the new lead analyst for the [New York Citi Bike](https://en.wikipedia.org/wiki/Citi_Bike) Program, you are now responsible for overseeing the largest bike sharing program in the United States. In your new role, you will be expected to generate regular reports for city officials looking to publicize and improve the city program.

Since 2013, the Citi Bike Program has implemented a robust infrastructure for collecting data on the program's utilization. Through the team's efforts, each month bike data is collected, organized, and made public on the [Citi Bike Data](https://www.citibikenyc.com/system-data) webpage.

However, while the data has been regularly updated, the team has yet to implement a dashboard or sophisticated reporting process. City officials have a number of questions on the program, so your first task on the job is to build a set of data reports to provide the answers.

## Task

**Your task in this assignment is to aggregate the data found in the Citi Bike Trip History Logs and find two unexpected phenomena.**

**Design 2-5 visualizations for each discovered phenomenon (4-10 total). You may work with a timespan of your choosing. Optionally, you may merge multiple datasets from different periods.**

**The following are some questions you may wish to tackle. Do not limit yourself to these questions; they are suggestions for a starting point. Be creative!**

* How many trips have been recorded in total during the chosen period?
* By what percentage has total ridership grown?
* How has the proportion of short-term customers and annual subscribers changed?
* What are the peak hours in which bikes are used during summer months?
* What are the peak hours in which bikes are used during winter months?
* Today, what are the top 10 stations in the city for starting a journey? (Based on data, why do you hypothesize these are the top locations?)
* Today, what are the top 10 stations in the city for ending a journey? (Based on data, why?)
* Today, what are the bottom 10 stations in the city for starting a journey? (Based on data, why?)
* Today, what are the bottom 10 stations in the city for ending a journey? (Based on data, why?)
* Today, what is the gender breakdown of active participants (male vs. female)?
* How effective has gender outreach been in increasing female ridership over the timespan?
* How does the average trip duration change by age?
* What is the average distance in miles that a bike is ridden?
* Which bikes (by ID) are most likely due for repair or inspection in the timespan?
* How variable is the utilization by bike ID?

**Next, as a chronic over-achiever:**

* Use your visualizations (they do not have to be all of them) to design a dashboard for each phenomenon.
* The dashboards should be accompanied by an analysis explaining why the phenomena may be occurring.

**City officials would also like to see one of the following visualizations:**

* **Basic:** A static map that plots all bike stations with a visual indication of the most popular locations to start and end a journey, with zip code data overlaid on top.
* **Advanced:** A dynamic map that shows how each station's popularity changes over time (by month and year). Again, with zip code data overlaid on the map.
* The map you choose should also be accompanied by a write-up unveiling any trends that were noticed during your analysis.

**Finally, create your final presentation:**

* Create a Tableau story that brings together the visualizations, requested maps, and dashboards.
* This is what will be presented to the officials, so be sure to make it professional, logical, and visually appealing.

## Considerations

Remember, the people reading your analysis will **NOT** be data analysts. Your audience will be city officials, public administrators, and heads of New York City departments. Your data and analysis need to be presented in a way that is focused, concise, easy to understand, and visually compelling. Your visualizations should be colorful enough to be included in press releases, and your analysis should be thoughtful enough to drive programmatic changes.

## Submission

Your final submission should include:

* A link to your Tableau Public workbook that includes:
  * 4-10 Total "Phenomenon" Visualizations
  * 2 Dashboards
  * 1 City Official Map
  * 1 Story
* A text or markdown file with your analysis of the phenomena you uncovered from the data.

## Assessment

Your final product will be assessed on the following metrics:

* Analytic Rigor
* Readability
* Visual Attraction

## Hints

* You may need to get creative in how you combine each of the CSV files. Don't just assume Tableau is the right tool for the job. At this point, you have a wealth of technical skills and research abilities. Dig for an approach that works and just go with it.
* Don't just assume the CSV format hasn't changed since 2013. Subtle changes to the formats in any of your columns can block your analysis. Ensure your data is consistent and clean throughout your analysis. (Hint: Start and End Time change at some point in the history logs.)
* Consider building your visualizations with small extracts of the data (i.e., single files) before attempting to import the whole thing. What you will find is that importing all 20+ million records of data will create performance issues quickly. Welcome to "Big Data."
* While utilizing all of the data may seem like a nice power play, consider the time-course in making your analysis. Is data from 2013 the most relevant for making bike replacement decisions today? Probably not. Don't let overwhelming data fool you. Ground your analysis in common sense.
* Remember, data alone doesn't "answer" anything. You will need to accompany your data visualizations with clear and directed answers and analysis.
* As is often the case, your clients are asking for a LOT of answers. Be considerate about their need-to-know and the importance of not "cramming in everything". Of course, answer each question, but do so in a way that is organized and presentable.
* Since this is a project for the city, spend the appropriate time thinking through decisions on color schemes, fonts, and visual storytelling. The Citi Bike program has a clear visual footprint. As a suggestion, look for ways to have your data visualizations match their aesthetic tones.
* Pay attention to labels. What exactly is "time duration"? What's the value of "age of birth"? You will almost certainly need calculated fields to get what you need.
* Keep a close eye out for obvious outliers or false data. Not everyone who signs up for the program is answering honestly.
* In answering the question of "why" a phenomenon is occurring, consider adding other pieces of information on socioeconomic or other geographic data. Tableau has a map "layer" feature that you may find handy.
* Don't be afraid to manipulate your data and play with settings in Tableau. Tableau is meant to be explored. We haven't covered all that you need, so you will need to keep an eye out for new tricks.
* Treat this as a serious endeavor! This is an opportunity to show future employers that you have what it takes to be a top-notch analyst.
* Good luck!

### Copyright

Data Boot Camp (C) 2019. All Rights Reserved.
Shulin27
An extractive text summarizer model that implements Google's PageRank algorithm, using cosine similarity between word vectors.
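As a rough illustration of PageRank over a cosine-similarity graph, the sketch below ranks sentences from precomputed sentence vectors (for example, averaged word vectors; the toy vectors here are made up). It is not this repository's code: the function names, the damping factor of 0.85, uniform redistribution of score from sentences with no edges, and clipping negative similarities to 0 are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(vectors):
    """Symmetric edge weights; negative similarities are clipped to 0 so weights stay valid."""
    n = len(vectors)
    return [[max(0.0, cosine(vectors[i], vectors[j])) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def pagerank(weights, damping=0.85, iterations=200, tol=1e-12):
    """Weighted PageRank by power iteration over an n x n weight matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        # Nodes with no edges ("dangling") spread their score uniformly over all nodes.
        dangling = sum(scores[j] for j in range(n) if out_sums[j] == 0)
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight j -> i.
            incoming = sum(scores[j] * weights[j][i] / out_sums[j]
                           for j in range(n) if out_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Toy sentence vectors (made up): sentence 1 overlaps with both of the others.
vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
scores = pagerank(similarity_graph(vectors))
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 1
```

The sentence most similar to the rest of the text collects the most score; an extractive summary would then keep the top-ranked sentences.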
Daniels-JohnDerek
Summer 2017 - Research assistant at the Lehigh University Computer Science and Engineering Department. Worked with Daniel P. Lopresti, Professor and Chair, Department of Computer Science and Engineering. Topic of research - Data and Text Analytics. Developed a Python program for a women's-rights law firm in Washington, D.C. to extract text from files in multiple formats, perform Tesseract OCR on images to extract text, perform text analytics on the extracted text, and format the results using Excel.
nikmors
A holiday landing page: HTML/CSS animation/JS. A text animation created in Adobe XD. The SVG code has been extracted to create the HTML tags, and the animation is then applied with CSS. The SVG animation uses stroke-dasharray and stroke-dashoffset to create a cool effect. At the end of the clip, there are two pictures related to our summer holiday theme. The two images slide in at the end of the text animation, thanks to CSS animation. We only use JS here as a tool to find the total stroke length of every letter we want to animate, which is needed for its stroke-dasharray. Check it out. Tools/Skillset: Adobe XD, HTML, CSS Animation, JS
Lspringer24
Congratulations on your new job! As the new lead analyst for the [New York Citi Bike](https://en.wikipedia.org/wiki/Citi_Bike) Program, you are now responsible for overseeing the largest bike sharing program in the United States. In your new role, you will be expected to generate regular reports for city officials looking to publicize and improve the city program.

Since 2013, the Citi Bike Program has implemented a robust infrastructure for collecting data on the program's utilization. Through the team's efforts, each month bike data is collected, organized, and made public on the [Citi Bike Data](https://www.citibikenyc.com/system-data) webpage.

However, while the data has been regularly updated, the team has yet to implement a dashboard or sophisticated reporting process. City officials have a number of questions on the program, so your first task on the job is to build a set of data reports to provide the answers.

## Task

**Your task in this assignment is to aggregate the data found in the Citi Bike Trip History Logs and find two unexpected phenomena.**

**Design 2-5 visualizations for each discovered phenomenon (4-10 total). You may work with a timespan of your choosing. Optionally, you may merge multiple datasets from different periods.**

**The following are some questions you may wish to tackle. Do not limit yourself to these questions; they are suggestions for a starting point. Be creative!**

* How many trips have been recorded in total during the chosen period?
* By what percentage has total ridership grown?
* How has the proportion of short-term customers and annual subscribers changed?
* What are the peak hours in which bikes are used during summer months?
* What are the peak hours in which bikes are used during winter months?
* Today, what are the top 10 stations in the city for starting a journey? (Based on data, why do you hypothesize these are the top locations?)
* Today, what are the top 10 stations in the city for ending a journey? (Based on data, why?)
* Today, what are the bottom 10 stations in the city for starting a journey? (Based on data, why?)
* Today, what are the bottom 10 stations in the city for ending a journey? (Based on data, why?)
* Today, what is the gender breakdown of active participants (male vs. female)?
* How effective has gender outreach been in increasing female ridership over the timespan?
* How does the average trip duration change by age?
* What is the average distance in miles that a bike is ridden?
* Which bikes (by ID) are most likely due for repair or inspection in the timespan?
* How variable is the utilization by bike ID?

**Next, as a chronic over-achiever:**

* Use your visualizations (they do not have to be all of them) to design a dashboard for each phenomenon.
* The dashboards should be accompanied by an analysis explaining why the phenomena may be occurring.

**City officials would also like to see one of the following visualizations:**

* **Basic:** A static map that plots all bike stations with a visual indication of the most popular locations to start and end a journey, with zip code data overlaid on top.
* **Advanced:** A dynamic map that shows how each station's popularity changes over time (by month and year). Again, with zip code data overlaid on the map.
* The map you choose should also be accompanied by a write-up unveiling any trends that were noticed during your analysis.

**Finally, create your final presentation:**

* Create a Tableau story that brings together the visualizations, requested maps, and dashboards.
* This is what will be presented to the officials, so be sure to make it professional, logical, and visually appealing.

## Considerations

Remember, the people reading your analysis will **NOT** be data analysts. Your audience will be city officials, public administrators, and heads of New York City departments. Your data and analysis need to be presented in a way that is focused, concise, easy to understand, and visually compelling. Your visualizations should be colorful enough to be included in press releases, and your analysis should be thoughtful enough to drive programmatic changes.

## Submission

Your final submission should include:

* A link to your Tableau Public workbook that includes:
  * 4-10 Total "Phenomenon" Visualizations
  * 2 Dashboards
  * 1 City Official Map
  * 1 Story
* A text or markdown file with your analysis of the phenomena you uncovered from the data.

## Assessment

Your final product will be assessed on the following metrics:

* Analytic Rigor
* Readability
* Visual Attraction

## Hints

* You may need to get creative in how you combine each of the CSV files. Don't just assume Tableau is the right tool for the job. At this point, you have a wealth of technical skills and research abilities. Dig for an approach that works and just go with it.
* Don't just assume the CSV format hasn't changed since 2013. Subtle changes to the formats in any of your columns can block your analysis. Ensure your data is consistent and clean throughout your analysis. (Hint: Start and End Time change at some point in the history logs.)
* Consider building your visualizations with small extracts of the data (i.e., single files) before attempting to import the whole thing. What you will find is that importing all 20+ million records of data will create performance issues quickly. Welcome to "Big Data."
* While utilizing all of the data may seem like a nice power play, consider the time-course in making your analysis. Is data from 2013 the most relevant for making bike replacement decisions today? Probably not. Don't let overwhelming data fool you. Ground your analysis in common sense.
* Remember, data alone doesn't "answer" anything. You will need to accompany your data visualizations with clear and directed answers and analysis.
* As is often the case, your clients are asking for a LOT of answers. Be considerate about their need-to-know and the importance of not "cramming in everything". Of course, answer each question, but do so in a way that is organized and presentable.
* Since this is a project for the city, spend the appropriate time thinking through decisions on color schemes, fonts, and visual storytelling. The Citi Bike program has a clear visual footprint. As a suggestion, look for ways to have your data visualizations match their aesthetic tones.
* Pay attention to labels. What exactly is "time duration"? What's the value of "age of birth"? You will almost certainly need calculated fields to get what you need.
* Keep a close eye out for obvious outliers or false data. Not everyone who signs up for the program is answering honestly.
* In answering the question of "why" a phenomenon is occurring, consider adding other pieces of information on socioeconomic or other geographic data. Tableau has a map "layer" feature that you may find handy.
* Don't be afraid to manipulate your data and play with settings in Tableau. Tableau is meant to be explored. We haven't covered all that you need, so you will need to keep an eye out for new tricks.
* Treat this as a serious endeavor! This is an opportunity to show future employers that you have what it takes to be a top-notch analyst.
* Good luck!
KamranNiroomand
Abstractive and Extractive Text summarization leveraging various Transformer-based models.
codeRED-03
Hello everyone, in this repo I have built an extractive text summarizer. I have used NLP, sklearn, and the TextRank algorithm.
Khushalsawant
Extractive text summarization
ANJALI2980
No description available
nabeelasim
No description available
No description available
deepankar27
Extractive Text Summarization
velankar
Extractive approach for text summarization.
kriti524
Extractive and Abstractive Text Summarizer
frank93011
Implementation of the text summarization task using several methods: extractive, seq2seq, and seq2seq with attention.
ritwik-jain-tech
A desktop-based text summarization app built on the extractive methodology of text summarization, implemented using natural language processing and a vector space algorithm.
earlgreyhot1701D
Beginner Python project from CCC AI Summer Camp that extracts and structures PDF text for LLM-based tutoring support.
Saiiii0906
An AI/ML + cloud project that lets the user upload a document (PDF) or image, extracts the text using the AWS Textract service, stores the raw text in AWS S3/DynamoDB, and uses AWS SageMaker for LLM summarization and AWS Bedrock for Q&A.
JanhaviV1220
This is an OCR project to extract text from images, audio files, and PDFs using Python libraries such as pytesseract and OpenAI's Whisper. This project also gives a brief idea of cross-referencing using MySQL and summarizing the extracted text.
In the medical industry, postmarket surveillance is very important. In this regard, extracting and summarizing relevant publicly available data is essential. This program is text-mining code to extract and summarize adverse event reports from the FDA MAUDE website. The code was written in R.