This project provides a sentiment analysis pipeline for MyTelkomsel application reviews scraped from the Google Play Store. It includes an Indonesian text preprocessing engine and compares three modeling approaches for classifying user sentiment: TF-IDF with a linear SVM, Word2Vec with a random forest, and a BiLSTM deep learning network.
Tech Stack
Python
TensorFlow
Keras
Scikit-learn
Gensim
Pandas
NumPy
google-play-scraper
Tools Used
Jupyter Notebook
Git LFS
VS Code
Google Play Store API
Key Features
Data Acquisition
▸Automated Scraping: Direct review extraction from Google Play Store using `google-play-scraper`.
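A minimal sketch of what the scraping step could look like with `google-play-scraper`. The helper names `fetch_reviews` and `to_rows`, the choice of fields, and the package ID in the usage comment are assumptions for illustration, not the project's actual code.

```python
from typing import Any, Dict, List


def fetch_reviews(app_id: str, count: int = 200) -> List[Dict[str, Any]]:
    """Fetch the newest Indonesian-language reviews for one app (needs network)."""
    # Imported lazily so to_rows() below works without the package installed.
    from google_play_scraper import Sort, reviews

    result, _continuation_token = reviews(
        app_id,
        lang="id",       # Indonesian review text
        country="id",    # Indonesian storefront
        sort=Sort.NEWEST,
        count=count,
    )
    return result


def to_rows(raw_reviews: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the review text and star rating, dropping empty reviews."""
    rows = []
    for review in raw_reviews:
        text = (review.get("content") or "").strip()
        if not text:
            continue
        rows.append({"text": text, "score": review.get("score")})
    return rows


# Usage (assumed package ID; verify it against the Play Store listing):
# rows = to_rows(fetch_reviews("com.telkomsel.telkomselcm", count=100))
```

Keeping the network call separate from the row-shaping helper makes the latter easy to test on saved data.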
To classify a single review from the command line:
python src/train/inference.py --text "Aplikasi sangat bagus dan membantu!"
Challenges & Solutions
Challenge: Imbalanced Neutral Sentiment
Solution: Applied custom weights and refined text normalization to improve F1-scores for the neutral class, which is inherently more ambiguous in Indonesian app reviews.
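The source does not say how the custom weights were chosen. One common starting point, assuming they are per-class weights, is inverse-frequency weighting, sketched below with a hypothetical label distribution.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical label distribution, for illustration only.
labels = ["positive"] * 6 + ["negative"] * 3 + ["neutral"] * 1
weights = balanced_class_weights(labels)
print(weights)  # the rare "neutral" class receives the largest weight
```

This is the same formula scikit-learn uses for `class_weight="balanced"`. A dictionary like this can be passed as `class_weight` to `LinearSVC` or `RandomForestClassifier`, or to Keras `Model.fit` once the keys are mapped to integer class indices, and then tuned by hand.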
Challenge: Large Model Storage
Solution: Integrated Git LFS (Large File Storage) to store the model binaries while keeping the source repository lightweight.
Challenge: Casual Language Variance
Solution: Developed a preprocessing script that handles Indonesian-specific linguistic nuances, slang, and common typos found in Play Store reviews.
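A rough sketch of this kind of normalization follows. The slang map, the regular expressions, and the decision to drop digits and punctuation are illustrative assumptions; the project's actual script may differ.

```python
import re

# Hypothetical slang map; a real one would be far larger.
SLANG = {
    "gk": "tidak", "ga": "tidak", "gak": "tidak", "nggak": "tidak",
    "tdk": "tidak", "bgt": "banget", "yg": "yang", "udh": "sudah",
    "apk": "aplikasi", "bgs": "bagus",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
REPEAT_RE = re.compile(r"([a-z])\1{2,}")  # three or more of the same letter


def normalize(text: str) -> str:
    """Lowercase, strip URLs and non-letters, squeeze elongations, map slang."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)  # "mantaaap" becomes "mantap"
    tokens = [SLANG.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize("Apk nya bgs bgt!!! gk pernah error"))
```

Squeezing only runs of three or more letters leaves legitimate double letters untouched. It does mean an elongated word is reduced to a single letter at that position, which is a trade-off of this simple rule.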