Software Engineer in Data
At Kamu we are developing a novel Web3 technology that, much like the invention of the SQL database 40 years ago, will write a new chapter in humanity’s transition towards a data economy.
This is your opportunity to join an ambitious early-stage startup that has already secured funding, and to work from a place of relative financial stability on a technology that will shape the future of data science.
About us
Kamu is building a unique decentralized network for the exchange and collaborative processing of structured data (whitepaper). Think of it as GitHub on top of a decentralized database, where people and organizations can share near real-time data streams, and data scientists can collaboratively compose them with SQL into high-quality data products ready for use by data-centric apps and for AI/ML. The network guarantees that all data is 100% auditable and verifiable and brings superior automation, accountability, and transparency to the data flows that underpin our society.
Kamu is backed by multiple investors and companies including Protocol Labs (the creators of IPFS and Filecoin) and Dell Technologies.
We are:
- A distributed, multinational company with a presence in Canada, Ukraine, and Portugal
- A highly technical group with decades of experience in big data, distributed software, and PhDs in AI/ML and computer science
- A team that takes pride in delivering quality products and efficient workflows
- Strong believers in Web3, decentralization, and personal data ownership
- Open source enthusiasts who develop technology in the open and constantly share progress with the community through publications and conferences
About you
You have sharded databases, tuned replication, and dived into the intricacies of transaction isolation levels. You know your way around OLAP cubes and have transitioned companies from MapReduce to data lakes and fabrics. You have developed countless data APIs, optimized join performance in your favorite dataframe library, and written a few analytics engines of your own.
You have a burning passion for data … yet also a lingering sense that something is missing. You can’t help but wonder:
- Why do data flows in every company resemble a Rube Goldberg machine?
- Why there are hundreds of different analytical databases, while the vast majority of organizations in the world still cannot afford anything beyond Excel?
- Why, two decades into the Big Data age, does everyone still struggle to keep even small data up-to-date and of good quality?
- Why, despite the mantra of “breaking down the silos”, has enterprise data produced nothing but silos?
- Why does reuse and collaboration in data not exist, and why do all attempts to create cross-company data repositories turn into “data graveyards”?
You, just like us, feel that the world of data is ready for a major innovation that will shake up the status quo. So instead of continuing the “rat race” towards bigger and more performant data that benefits only big tech companies, you want to apply your skills to something that can make a real difference in the world and democratize data globally.
If this is true, you should talk to us!
As a Data Engineer at Kamu you will be working on the core technologies that serve our network and the platform:
- A stream-oriented data format for structured dynamic data that can work with conventional (S3, HDFS) and decentralized (IPFS, Arweave) storage
- A metadata format that serves as a passport of data and describes every event that influenced it
- A protocol for 100% verifiable, reproducible, and auditable multi-party data processing
- A fleet of plug-in data processing engines
- And an infrastructure that turns this technology into a novel decentralized and near real-time data lake!
Core technology stack:
- Apache Arrow
- Streaming (temporal) SQL
- Apache Spark, Flink, DataFusion
- IPLD, IPFS, Filecoin
- Ethereum blockchain
Your work will include:
- Evolving the core data formats and protocols
- Improving the existing data engines and integrating new ones
- Building an efficient distributed processing infrastructure for running data pipelines and API queries
- Designing data access APIs for ingress and egress of data
- Building a federated data sharing and compute network
- Integrating Kamu with 3rd-party data providers and consumers
- Integrating Kamu with blockchain decoding/indexing technologies
- Research and implementation of features like:
  - Privacy-preserving compute
  - Fine-grained provenance
  - AI/ML integration with Kamu data pipelines
- Communicating your progress to users and the community
- Contributing to the product documentation and automated testing
Requirements:
- BSc in CS or equivalent experience
- 6+ years of industry experience
- Required skills:
  - High proficiency in Rust, Java, or Scala
  - Strong knowledge of SQL and database internals
  - Modern data lake architecture and horizontal scaling
  - Data science toolkits (Pandas, R)
  - Data integration systems and patterns
  - Software quality (test pyramid, CI/CD)
- Bonus skills:
  - Structured data formats (Parquet, Arrow)
  - Stateful stream processing fundamentals
  - CDC, event sourcing
  - Docker, AWS, Kubernetes
  - Data visualization (PowerBI, Tableau, Jupyter)
  - Development methodologies (Agile, Scrum)
  - Open source collaboration
  - Blockchain indexing and analytics (Dune, TrueBlocks)
  - Decentralized storage (IPFS)
- Good written English skills, ability to write clear documentation
What we offer
- 🤙 Remote work with flexible hours
- 💵 Competitive salary, equity
- 💻 $1,500 home office equipment stipend
- 🏖️ 21 days of paid vacation per year
- ✈️ Conference travel and education budget
Application process
- Technical screening [40m]
- Chat with one of the founders [40m]
- Online interview [90m]
Send your CV to firstname.lastname@example.org
All applications are reviewed by a human
🇺🇦✊ We stand with Ukraine and employ refugees and people in both free and occupied territories. Ukrainian applicants can expect:
- Accelerated recruitment process
- Interview in their native language
- Home office equipment support
- Relocation support