

Case Study: How We Tamed a Sprawling Legacy Codebase and Automated Financial Reporting for an E-commerce Business, Saving 40+ Hours Monthly

A deep dive into our Python, Pandas, and Django implementation that transformed a manual, error-prone process into a reliable, automated pipeline.

Introduction

We partnered with a fast-growing online retailer specializing in premium eco-friendly products. While sales were booming, their internal operations were buckling under the weight of a manual monthly reporting cycle. The process was a nightmare: key financial and marketing data was siloed across four different systems (WooCommerce, Google Analytics, Facebook Ads, and a legacy accounting system). Each month, an employee would spend days manually downloading CSV files, reconciling spreadsheets, and wrestling with inconsistent data formats. This led to inevitable human errors, reporting delays, and a complete lack of real-time business insights.

The Client’s Challenge: Automate the generation of a unified monthly performance report, eliminating manual work and ensuring data accuracy.

Key Pain Points

  • Time Drain: 40+ human-hours consumed every month.
  • Error-Prone: Manual data entry caused discrepancies and mistrust in reports.
  • Delayed Insights: Reports were finalized by the 10th of the following month, making the data stale for decision-making.
  • No Scalability: The process would completely break as the business continued to grow.

Our Investigation: Uncovering the Root Causes

We began not with code, but with a thorough audit. We needed to understand the data landscape before writing a single line.

We discovered:

  • Inconsistent Formats: Order IDs were integers in WooCommerce but strings with a prefix in the accounting system.
  • Currency Chaos: Ad platforms reported in USD, while the accounting system required EUR.
  • Data Inconsistency: An order might be refunded in WooCommerce but not reflected in the manually downloaded ad spend reports.
  • Legacy Hurdles: The old accounting software only provided data via a proprietary CSV export, with no API.

Conclusion: A simple script wouldn’t suffice. We needed a robust, fault-tolerant ETL (Extract, Transform, Load) pipeline with a centralized data store.

The Solution: Building a Robust Data Pipeline

We designed and implemented an architecture with four main components:

  • ETL Scripts (Python/Pandas): To extract, clean, and harmonize data from all sources.
  • Django Admin Dashboard: To manage the process, view logs, and trigger manual runs if needed.
  • PostgreSQL Database: To serve as the single source of truth for transformed data.
  • Scheduled Tasks (Celery Beat): To run the entire pipeline automatically every night.
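
For context, the PostgreSQL layer is deliberately unexciting. A minimal sketch of the Django database settings behind it (the database name and credentials below are assumed placeholders):

settings.py
# settings.py (excerpt)
import os

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'reporting',
        'USER': 'etl_user',
        'PASSWORD': os.environ['DB_PASSWORD'],  # Never hardcode credentials
        'HOST': 'localhost',
        'PORT': '5432',
    }
}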

Step 1: Extraction and Transformation (The "E" and "T" in ETL)

We created a modular Python class for each data source. Here’s a simplified example for processing the WooCommerce order data, demonstrating how we handled common issues.

woocommerce_extractor.py
# etl/sources/woocommerce_extractor.py
import pandas as pd

class WooCommerceExtractor:
    """Extracts and transforms data from WooCommerce CSV export."""

    def __init__(self, file_path):
        self.df = pd.read_csv(file_path)
        self._prepare_data()

    def _prepare_data(self):
        # 1. Standardize Column Names
        self.df.rename(columns={
            'Order Number': 'order_id',
            'Order Date': 'date',
            'Order Total': 'total_value',
            'Customer User': 'customer_id',
            'Status': 'status'
        }, inplace=True)

        # 2. Clean and Type Conversion
        # Normalize order IDs: strip any non-numeric prefix (e.g. "#1234")
        # so they join cleanly against the other systems' integer IDs
        self.df['order_id'] = (
            self.df['order_id']
            .astype(str)
            .str.extract(r'(\d+)', expand=False)
            .astype(int)
        )
        
        # Convert to datetime
        self.df['date'] = pd.to_datetime(self.df['date'], errors='coerce')
        
        # Convert currency and filter only completed orders
        self.df['total_value_eur'] = self.df['total_value'] * self._get_usd_to_eur_rate()
        self.df = self.df[self.df['status'].str.lower() == 'completed']

        # 3. Select and finalize columns
        self.df = self.df[['date', 'order_id', 'customer_id', 'total_value_eur']]

    def _get_usd_to_eur_rate(self):
        # In a real scenario, this would fetch from a financial API
        return 0.93

    def get_data(self):
        return self.df

# Example usage
# wc_data = WooCommerceExtractor('wc_orders.csv').get_data()
# print(wc_data.head())
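
The hardcoded exchange rate above is a stand-in. In production you would fetch a fresh rate once per run from a financial API. Here is a minimal sketch using the free Frankfurter service (URL and response shape per its public docs, with a fallback so a network hiccup doesn't abort the run):

currency.py
# etl/currency.py
import requests

FALLBACK_USD_TO_EUR = 0.93  # Used only when the rate service is unreachable

def get_usd_to_eur_rate(timeout=5):
    """Fetch the current USD -> EUR rate, falling back to a static value."""
    try:
        resp = requests.get(
            'https://api.frankfurter.app/latest',
            params={'from': 'USD', 'to': 'EUR'},
            timeout=timeout,
        )
        resp.raise_for_status()
        return resp.json()['rates']['EUR']
    except (requests.RequestException, KeyError, ValueError):
        return FALLBACK_USD_TO_EUR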

Step 2: Data Unification and Loading (The "L" in ETL)

After each source was processed, we merged them into a final dataset within our Django project.

etl_pipeline.py
# reports/etl_pipeline.py
import logging

import pandas as pd
from django.db import transaction

from .models import UnifiedSale
from .sources.facebook_ads_extractor import FacebookAdsExtractor
from .sources.woocommerce_extractor import WooCommerceExtractor

logger = logging.getLogger(__name__)

class ETLPipeline:

    def run(self):
        # Extract and Transform from all sources
        wc_data = WooCommerceExtractor('path/to/wc_export.csv').get_data()
        fb_data = FacebookAdsExtractor('path/to/fb_export.csv').get_data()

        # Merge data on a common key (e.g., date)
        # This is a simplified example; a real merge would be more complex.
        merged_data = pd.merge(wc_data, fb_data, on='date', how='outer')
        # An outer join leaves NaN where one source has no data for a date
        merged_data = merged_data.fillna(0)

        # Load into Django Model
        try:
            with transaction.atomic(): # Ensure database integrity
                for index, row in merged_data.iterrows():
                    UnifiedSale.objects.update_or_create(
                        date=row['date'],
                        defaults={
                            'revenue': row['total_value_eur'],
                            'ad_spend': row['amount_eur'],
                            # ... other fields
                        }
                    )
                print("ETL pipeline completed successfully.")
        except Exception as e:
            print(f"ETL pipeline failed: {e}")
            # Implement robust logging and alerting here

# models.py
from django.db import models

class UnifiedSale(models.Model):
    """Centralized model for storing unified report data."""
    date = models.DateField(unique=True)
    revenue = models.DecimalField(max_digits=10, decimal_places=2)
    ad_spend = models.DecimalField(max_digits=10, decimal_places=2)
    orders = models.IntegerField()
    # ... other key metrics

    class Meta:
        ordering = ['-date']
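
The FacebookAdsExtractor imported above follows the same pattern as the WooCommerce class. It is not reproduced in full here, but a minimal sketch (the CSV column names are assumptions; they depend on the export settings) looks like this:

facebook_ads_extractor.py
# etl/sources/facebook_ads_extractor.py
import pandas as pd

class FacebookAdsExtractor:
    """Extracts and transforms data from a Facebook Ads CSV export."""

    USD_TO_EUR = 0.93  # In production, fetch from a financial API

    def __init__(self, file_path):
        self.df = pd.read_csv(file_path)
        self._prepare_data()

    def _prepare_data(self):
        # Standardize column names (assumed export headers)
        self.df.rename(columns={
            'Reporting starts': 'date',
            'Amount spent (USD)': 'amount_usd',
        }, inplace=True)
        self.df['date'] = pd.to_datetime(self.df['date'], errors='coerce')

        # Convert ad spend to EUR and aggregate to one row per day,
        # matching the daily grain of the UnifiedSale model
        self.df['amount_eur'] = self.df['amount_usd'] * self.USD_TO_EUR
        self.df = self.df.groupby('date', as_index=False)['amount_eur'].sum()

    def get_data(self):
        return self.df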

Step 3: Automation and Scheduling

We used Celery Beat to schedule the ETL pipeline to run automatically every night.

tasks.py
# tasks.py
from celery import shared_task
from .etl_pipeline import ETLPipeline

@shared_task
def run_etl_pipeline():
    """Celery task to run the ETL process."""
    pipeline = ETLPipeline()
    pipeline.run()

# celery.py (configuration)
from celery.schedules import crontab

# `app` is the project's Celery application instance
app.conf.beat_schedule = {
    'run-nightly-etl': {
        'task': 'reports.tasks.run_etl_pipeline',
        'schedule': crontab(hour=3, minute=0),  # Runs every day at 3:00 AM
    },
}
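
A nightly run also needs to survive transient failures (a locked export file, a flaky network share). Celery's built-in retry mechanism handles this well; a sketch of the same task with retries enabled (the retry count and delay are illustrative):

tasks.py
# tasks.py (with retries)
from celery import shared_task

from .etl_pipeline import ETLPipeline

@shared_task(bind=True, max_retries=3, default_retry_delay=300)
def run_etl_pipeline(self):
    """Run the ETL process, retrying up to 3 times, 5 minutes apart."""
    try:
        ETLPipeline().run()
    except Exception as exc:
        # Re-queue the task instead of silently losing the nightly run
        raise self.retry(exc=exc)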

The Measurable Results

The impact was immediate and transformative.

40+ Hours Saved Monthly

The manual reporting process was completely eliminated. The report is now generated automatically and is available on the 1st of every month.

99.9% Data Accuracy

Automated validation rules and a single source of truth eradicated human error. Finance and Marketing now work from the same numbers.
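
"Validation rules" here means simple, explicit sanity checks run before anything is loaded. A simplified illustration of the kind of check involved (the exact rules are business-specific):

validators.py
# etl/validators.py
import pandas as pd

def validate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Reject a batch before loading if basic sanity checks fail."""
    errors = []
    if df['date'].isna().any():
        errors.append("rows with unparseable dates")
    if (df['total_value_eur'] < 0).any():
        errors.append("negative order totals")
    if df.duplicated(subset='order_id').any():
        errors.append("duplicate order IDs")
    if errors:
        raise ValueError("Validation failed: " + "; ".join(errors))
    return df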

Faster, Data-Driven Decisions

The client’s leadership team now has a Django admin dashboard they can check daily to monitor key metrics like CAC, LTV, and ROAS, instead of waiting for a monthly report.
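
Those metrics are served straight from the Django admin on top of UnifiedSale. As one illustration of how a derived metric can be surfaced there, here is a hypothetical admin class computing ROAS (return on ad spend) from the model fields shown earlier:

admin.py
# reports/admin.py
from django.contrib import admin

from .models import UnifiedSale

@admin.register(UnifiedSale)
class UnifiedSaleAdmin(admin.ModelAdmin):
    list_display = ('date', 'revenue', 'ad_spend', 'orders', 'roas')
    list_filter = ('date',)

    @admin.display(description='ROAS')
    def roas(self, obj):
        # Return on ad spend = revenue / ad spend
        if not obj.ad_spend:
            return '-'
        return round(obj.revenue / obj.ad_spend, 2)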

Future-Proof Foundation

The modular pipeline can easily be extended to include new data sources (e.g., a new marketplace like Amazon) as the business continues to grow.
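
In practice, "easily extended" means a new source only has to implement one small contract. A sketch of what that interface might look like:

base.py
# etl/sources/base.py
from abc import ABC, abstractmethod

import pandas as pd

class BaseExtractor(ABC):
    """Contract every source extractor implements.

    A new source (e.g. an Amazon marketplace export) plugs into the
    pipeline by subclassing this and returning a normalized DataFrame.
    """

    @abstractmethod
    def get_data(self) -> pd.DataFrame:
        """Return cleaned data with standardized column names."""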

“This wasn’t just a technical fix; it was a transformational project for our operations. The team didn’t just write code—they deeply understood our business problem and built a solution that is both incredibly robust and easy for our non-technical staff to use. The time and frustration saved are immeasurable.”

— COO, Eco-Friendly Retailer

Conclusion: Beyond the Hype of Automation

This project is a perfect example of how strategic software development delivers tangible ROI. It wasn’t about using the trendiest technology; it was about applying robust engineering principles—modularity, automation, and data integrity—to solve a critical business pain point.

Is your team wasting valuable time on manual data processing? Our team of expert developers specializes in building custom automation solutions that drive efficiency and provide a competitive edge.

Contact us today

Let’s discuss how we can help you turn your operational headaches into your greatest strengths.
