Today’s objective was to containerize the SEC filing analysis application to ensure a consistent and reproducible runtime environment. This process involved not only creating the necessary Docker artifacts but also refactoring the application’s configuration to adhere to best practices for handling secrets and environment-specific variables.
Initial Containerization and Refactoring
The first phase of the work focused on establishing the foundational components for containerization and improving the project’s production readiness.
The key commits included:
- Addition of `Dockerfile` and `docker-compose.yml`: Standard files were created to define the application's image and orchestrate the service stack, respectively. The `Dockerfile` is based on a Python image, installs dependencies from `requirements.txt`, and defines the container's entry point (a rough sketch follows this list).
- Dependency Management for Production: The `psycopg2-binary` package was added to `requirements.txt`. This is a critical step in moving from a development-time database like SQLite to a production-grade PostgreSQL database, which the Docker environment is configured to use.
- Configuration Handling: Environment variable management within the application's Python code (`main.py`, `database.py`) was refactored to rely on a centralized configuration system, anticipating the need to inject variables from the Docker environment.
- Source Control Hygiene: The `.gitignore` file was updated to exclude a broader range of files and artifacts, such as IDE configurations, environment files (`.env`), and cache directories, keeping the repository clean.
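For illustration, here is a minimal sketch of such a `Dockerfile`. The base image tag, port, and uvicorn entry point are assumptions, since the post does not reproduce the file itself:

```dockerfile
# Sketch only: the base image tag, port, and entry point are assumptions.
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so Docker's layer cache is reused
# when only application code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Assumes a FastAPI/uvicorn app object in main.py; adjust to the real entry point.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```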
The Core Challenge: Propagating Secrets to the Container
Upon the initial `docker-compose up`, the application failed with an error indicating a missing API key for the OpenAI/OpenRouter service.
Investigation:
The root cause was identified quickly: the API credentials were hardcoded in a `launch.json` file, which is used exclusively by the VS Code debugger. These variables are not present in the runtime environment when the application is executed directly or via Docker.
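To illustrate the trap, here is a hypothetical `launch.json` fragment of the kind described (names and values invented here). VS Code injects the `env` map only into processes it launches under the debugger, so neither a plain `python main.py` nor a container ever sees these values:

```jsonc
// .vscode/launch.json -- hypothetical fragment; read only by the VS Code debugger.
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: main.py",
      "type": "python",
      "request": "launch",
      "program": "main.py",
      "env": {
        "OPENAI_API_KEY": "sk-...",                         // visible under the debugger only,
        "OPENAI_BASE_URL": "https://openrouter.ai/api/v1"   // never inside Docker
      }
    }
  ]
}
```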
Solution:
The standard and most secure practice is to decouple secrets and configuration from the application code and development tools. The following steps were executed to resolve the issue:
- Centralize Secrets in `.env`: All environment-specific variables, including the existing `DATABASE_URL` and the new LLM credentials (`LLM_MODEL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`), were consolidated into a `.env` file at the project root. This file is explicitly excluded from version control by `.gitignore`.
- Update `docker-compose.yml`: While Docker Compose automatically loads variables from a `.env` file in the same directory, it is best practice to explicitly pass them to the service for clarity and control. The `web` service definition in `docker-compose.yml` was updated to pass the LLM-related environment variables to the container.
- Update Application Configuration (`config.py`): The application's Pydantic-based settings module was extended. The `Settings` class was modified to load `LLM_MODEL`, `OPENAI_API_KEY`, and `OPENAI_BASE_URL` from the environment, making them available to the application logic in a structured and validated manner. (Sketches of all three pieces appear after the summary below.)
This three-step chain (`.env` -> `docker-compose.yml` -> `config.py`) establishes a robust and conventional pattern for managing configuration that works seamlessly across local development and containerized environments.
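To make the chain concrete, here are sketches of all three pieces. The variable names and the `web` service name come from the post; the file contents, placeholder values, and the `pydantic-settings` (Pydantic v2) style are assumptions:

```bash
# .env (excluded by .gitignore) -- placeholder values only
DATABASE_URL=postgresql://user:password@db:5432/filings
LLM_MODEL=openai/gpt-4o-mini
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://openrouter.ai/api/v1
```

```yaml
# docker-compose.yml (excerpt) -- everything beyond the `web` service name is a sketch.
services:
  web:
    build: .
    environment:
      # Compose substitutes these from the .env file it loads automatically;
      # listing them explicitly documents what the container depends on.
      - DATABASE_URL=${DATABASE_URL}
      - LLM_MODEL=${LLM_MODEL}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENAI_BASE_URL=${OPENAI_BASE_URL}
```

```python
# config.py -- field names are from the post; the pydantic-settings
# (Pydantic v2) style is an assumption.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # env_file covers local runs; in Docker, the same variables are
    # injected by docker-compose and take precedence over the file.
    model_config = SettingsConfigDict(env_file=".env")

    DATABASE_URL: str
    LLM_MODEL: str
    OPENAI_API_KEY: str
    OPENAI_BASE_URL: str


settings = Settings()
```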
A Note on Local Environment Troubleshooting
A secondary challenge encountered during this process was related to the local Docker setup on Ubuntu. Initial commands failed with `permission denied ... docker.sock`, which evolved into `protocol not available` after switching contexts.
This is a common issue for users of Docker Desktop on Linux. The CLI was initially trying to connect to the traditional Docker daemon socket (`/var/run/docker.sock`), where no daemon was listening. The correct target was the Docker Desktop VM, managed via the `desktop-linux` context.
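For reference, the context switch itself is two standard Docker CLI commands:

```bash
# List available contexts; Docker Desktop on Linux registers "desktop-linux".
docker context ls

# Point the CLI at the Docker Desktop VM instead of /var/run/docker.sock.
docker context use desktop-linux
```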
The persistent `protocol not available` error indicated that the Docker Desktop backend service itself had entered a faulty state. The resolution was to run the Clean / Purge data action in the Docker Desktop troubleshooter. This resets the Docker engine and deletes all images and containers, but it cleared the underlying corruption.
Conclusion
The primary goal of containerizing the application was achieved. The key takeaway is the critical importance of a well-defined configuration management strategy. As is so often the case in software development, though, getting there involved its share of unexpected detours and debugging.