Data Science
Completed

MyTelkomsel Sentiment Analysis

Comparing ML/DL architectures for Indonesian app review sentiment classification.

Aug - Oct 2025
Team: Personal Project
Role: Data Scientist

About this Project

This project builds a sentiment analysis pipeline for MyTelkomsel application reviews scraped from the Google Play Store. It includes an Indonesian text preprocessing stage and compares three approaches to classifying reviews as negative, neutral, or positive: TF-IDF with a Linear SVM (86.15% accuracy), Word2Vec with a Random Forest, and a BiLSTM deep learning network (86.82% accuracy).

Tech Stack

Python
TensorFlow
Keras
Scikit-learn
Gensim
Pandas
NumPy
google-play-scraper

Tools Used

Jupyter Notebook
Git LFS
VS Code
Google Play Store API

Key Features

Data Acquisition

  • Automated Scraping: Direct review extraction from Google Play Store using `google-play-scraper`.
  • Star Rating Mapping: Automated labeling logic (1-2 stars → Negative, 3 → Neutral, 4-5 → Positive).
  • Indonesian Focus: Collection restricted to the `id:id` locale (Indonesian language, Indonesia storefront).
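
The star-rating rule above can be sketched as a small function. The name `rating_to_label` and the lowercase label strings are illustrative assumptions; the project's own labeling code is not shown here.

```python
def rating_to_label(stars: int) -> str:
    """Map a Play Store star rating (1-5) to a sentiment label.

    Rule from the feature list: 1-2 -> negative, 3 -> neutral, 4-5 -> positive.
    """
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Label a few example ratings.
labels = [rating_to_label(s) for s in (1, 3, 5)]
```

Rejecting out-of-range ratings early keeps malformed scraped rows from silently ending up in one of the three classes.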

Text Preprocessing

  • Tokenization & Cleaning: Removal of URLs, mentions, and hashtags, plus whitespace normalization.
  • Indonesian NLP: Specific normalization rules to handle local slang and formal/informal Indonesian text.
  • Vectorization Pipelines: Comparative implementation of TF-IDF, Word2Vec, and Keras Tokenizer.
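
A minimal sketch of the cleaning steps listed above, using only the standard library. The `SLANG` dictionary and the function name `clean_review` are hypothetical; the project's actual normalization rules are not shown here and are likely far more extensive.

```python
import re

# Hypothetical slang map for illustration only.
SLANG = {"gk": "tidak", "ga": "tidak", "bgt": "banget", "yg": "yang", "apk": "aplikasi"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")


def clean_review(text: str) -> str:
    """Lowercase, strip URLs/mentions/hashtags, expand slang, normalize whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_HASHTAG_RE.sub(" ", text)
    # split() with no arguments collapses any run of whitespace.
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


cleaned = clean_review("Apk nya bagus bgt   cek https://x.co/abc @telkomsel #mantap")
```

Punctuation handling is deliberately left out of this sketch, so a token such as `bgt!` would not match the slang map without an extra stripping step.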

Model Architectures

  • Scheme A (Linear SVM): High-speed production model using TF-IDF feature extraction (86.15% accuracy).
  • Scheme B (Random Forest): Ensemble learning approach using 200D custom Word2Vec embeddings.
  • Scheme C (BiLSTM): Bidirectional LSTM network for capturing sequential context (86.82% accuracy).
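
As an illustration of Scheme A, a TF-IDF plus Linear SVM classifier can be assembled as a scikit-learn pipeline. This is a sketch under assumptions: the n-gram range, default `LinearSVC` settings, and the six toy reviews are made up for demonstration and do not reflect the project's hyperparameters or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, already-cleaned reviews (illustrative only, not project data).
train_texts = [
    "aplikasi bagus dan membantu",
    "sangat bagus mantap",
    "aplikasi lambat dan sering error",
    "tidak bisa login error terus",
    "biasa saja",
    "lumayan tapi biasa",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Vectorizer and classifier are fitted together, so inference reuses the same vocabulary.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams (assumed setting)
    LinearSVC(),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["aplikasi bagus"])
```

Bundling both steps in one pipeline object also means a single `.joblib` file can hold everything needed for prediction.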

Analytical Insights

  • Performance Benchmarking: Detailed F1-score and accuracy comparison across all three model types.
  • CLI Inference Tool: Cross-platform command-line tool for real-time sentiment prediction.
  • Model Persistence: Efficient storage and versioning of large `.keras` and `.joblib` files via Git LFS.
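
To make the benchmarking metrics concrete, the sketch below computes accuracy and per-class F1 from scratch, using F1 = 2TP / (2TP + FP + FN). The function names and the four example labels are illustrative assumptions; they are not the project's evaluation code or results.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} where F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for illustration.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]

f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)
acc = accuracy(y_true, y_pred)
```

In this toy case accuracy is 0.75 while the neutral class scores an F1 of 0, which is why per-class F1 matters alongside accuracy when one class is hard to predict.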

Highlights

BiLSTM Deep Learning Model
Automated Play Store Scraping
Indonesian NLP Pipeline

Installation

Base Setup

git clone https://github.com/Arfazrll/MyTelkomsel-Sentiment-Insights.git
cd MyTelkomsel-Sentiment-Insights
git lfs install
git lfs pull
pip install -r requirements.txt

Data Collection

python src/scraping/scrape_playstore.py
python src/prepare_dataset.py

Inference

python src/train/inference.py --text "Aplikasi sangat bagus dan membantu!"

Challenges & Solutions

Challenge

Imbalanced Neutral Sentiment

Solution

Applied custom class weights and refined text normalization to improve F1-scores for the neutral class, which is both underrepresented and inherently more ambiguous in Indonesian app reviews.
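
One common way to derive such weights is the "balanced" heuristic, weight = n_samples / (n_classes * class_count), which is the same formula scikit-learn applies for `class_weight="balanced"`. The sketch below assumes that heuristic; the exact weighting scheme used in this project is not stated, and the label counts are invented for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Invented distribution: neutral is the minority class.
example_labels = ["positive"] * 6 + ["negative"] * 3 + ["neutral"] * 1
weights = balanced_class_weights(example_labels)
```

The rarest class receives the largest weight, so misclassifying a neutral review costs more during training than misclassifying a positive one.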

Challenge

Large Model Storage

Solution

Integrated Git LFS (Large File Storage) to store the model binaries while keeping the source repository lightweight.

Challenge

Casual Language Variance

Solution

Developed a robust preprocessing script to handle Indonesian-specific linguistic nuances, slang, and common typos found in Play Store reviews.
