Our company has built a Python tool for monitoring data pipelines and services.
To test and demonstrate the tool’s capabilities, we maintain a set of demo “playground” pipelines covering various simple tasks:
– data ingest
– ETL
– model training
– model deployment
The pipelines demonstrate the ability to work with a wide variety of common data science and analytics resources:
– AWS / GCP services (deployment with Pulumi)
– AWS S3 / GCP Cloud Storage
– APIs
– database operations
We want to keep extending and refining these demos as we continue to build features for our product:
– dbt
– Airbyte
– Databricks
– Snowflake
– etc
We’d like to find someone willing to take on a few tasks per week to help maintain and expand these demos and docs.
The budget is limited to start, with the opportunity to renegotiate after two weeks. You are encouraged to use Cursor and Claude 3.5 Sonnet. We are very flexible on scheduling.
Example tasks we have for the upcoming few weeks:
– Establish an API Gateway (v1) for public access to an AWS Lambda, secured with rate limiting and a secret key
– Create a copy of an existing AWS Lambda pipeline with an event-driven architecture instead of a single Python script
– Create a copy of an existing AWS Lambda model training script as a Databricks notebook
– Create a copy of an existing AWS Lambda pipeline using Airbyte
– Create a copy of an existing AWS Lambda pipeline using dbt
– Create a copy of an existing AWS Lambda pipeline using AWS Glue
– Create a sample dataset and load it into Snowflake
– Create a sample dataset and load it into Postgres (Neon)
– Update an existing pipeline task so that it will fail (timeout) every Saturday
– Update an existing pipeline task so that it will fail (bad API key) every Wednesday
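To give a concrete sense of the scheduled-failure tasks in the last two bullets, a minimal sketch might look like the following. The function name and error choices are illustrative assumptions, not part of our existing pipelines:

```python
# Hypothetical sketch of a demo task that fails on purpose on certain
# weekdays, so the monitoring tool has predictable failures to surface.
# The task name and exception types are assumptions for illustration.
import datetime


def run_demo_task(today=None):
    """Run a demo pipeline step; deliberately fail on Saturdays and Wednesdays."""
    today = today or datetime.date.today()
    weekday = today.strftime("%A")
    if weekday == "Saturday":
        # Simulate the weekly timeout failure.
        raise TimeoutError("simulated Saturday timeout")
    if weekday == "Wednesday":
        # Simulate the weekly auth failure from a bad API key.
        raise PermissionError("simulated bad API key")
    return "ok"
```

In practice the timeout case would likely sleep past the orchestrator's task limit rather than raise directly, and the bad-key case would call the real API with an invalid credential; the sketch above just shows the day-of-week gating.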
If this is of interest, please introduce yourself and share:
– a desired weekly budget,
– desired number of hours, and
– which of the above tasks you’d expect to be able to accomplish in a first week given ~4 hours (half a day).