Video Summarization (VS) has been one of the most active research and development fields since the late 2000s. Its end goal is to generate a correct and adequate summary of a given video. Several subfields have evolved since then, such as Video Synopsis, Video Storytelling, and Text-based Video Summaries (TVS). Advances in computer vision based on Convolutional Neural Networks (CNNs) have further accelerated this field across all ML categories, including Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning (RL). Current State-of-the-Art (SOTA) methods suggest that Natural Language Processing (NLP) and Transformer-based solutions could make VS viable in practice. However, the feasibility and real-world applicability of TVS have yet to be investigated. To fill this gap, we introduce 3ML-TVS (Three different Machine Learning to Text-based Video Summarization), a feasible solution built from existing ML methods in Action Classification, Object Classification, and NLP. By fine-tuning each model individually, the system generates results with promising accuracy. The proposed system also demonstrates the capacity to be applied to real-world applications. This solution further shows that existing ML models can tackle much harder problems through a simple, systematic approach rather than a gigantic ML network.