CalTek is an engineering and IT staffing agency recruiting a Data Architect for a client. Our client is a global leader in intelligent systems software, powering billions of connected devices across the telecommunications, automotive, aerospace, defense, industrial, and medical sectors. From real-time operating systems (RTOS) to secure virtualization platforms and embedded analytics, its mission-critical software is trusted by Fortune 500 companies and government agencies worldwide.
As innovation demands grow in areas such as 5G infrastructure, autonomous vehicles, satellite systems, industrial automation, and medical robotics, this company plays a pivotal role in enabling real-time, secure, and reliable data-driven decision-making at the edge and in the cloud.
To support its growing ecosystem of intelligent devices and embedded platforms, the company is seeking an experienced Data Architect to design and implement enterprise-grade data systems that support high availability, traceability, and performance at scale.
Typical Duties and Responsibilities
As a Data Architect, you will lead the strategic design, modeling, and governance of data systems that fuel high-stakes software platforms and embedded systems. You’ll work closely with engineering, DevOps, cybersecurity, and product teams to define data structures, manage pipelines, and ensure consistent access to clean, compliant, and actionable data across on-premises and cloud-native infrastructures.
Key Responsibilities Include:
- Design and implement scalable, fault-tolerant data architectures that support real-time and batch data processing across embedded and enterprise systems
- Define and maintain logical, physical, and conceptual data models, data catalogs, and schemas for structured and semi-structured data
- Collaborate with software engineering teams to embed data instrumentation into intelligent systems, including embedded Linux, RTOS, and edge analytics platforms
- Build secure and efficient data pipelines using modern tools (e.g., Apache Kafka, Apache NiFi, Airflow, dbt) to support telemetry, system health monitoring, and performance analytics (a minimal orchestration sketch follows this list)
- Integrate multiple data sources (e.g., sensor data, event logs, machine telemetry, cloud APIs) into centralized data lakes and operational data stores
- Drive data governance policies, access controls, metadata management, and versioning in compliance with industry standards such as HIPAA, NIST 800-53, ISO/IEC 27001, and DoD STIGs
- Develop and maintain data warehouse and OLAP structures using platforms such as Snowflake, Amazon Redshift, BigQuery, or Azure Synapse
- Collaborate with ML and AI teams to ensure data readiness and model compatibility for advanced analytics and predictive maintenance
- Provide technical guidance on data lineage, replication, disaster recovery, and archival strategies
- Support CI/CD of data services using Terraform, Kubernetes, and GitOps pipelines
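As a concrete, deliberately minimal illustration of the pipeline-orchestration duty above, the sketch below defines a two-task Airflow 2.x DAG for a batch telemetry load. The DAG id, task names, and schedule are hypothetical placeholders, not the client’s actual pipeline, and the task bodies are stubs.

```python
# Minimal Airflow 2.x DAG sketch for a batch telemetry pipeline.
# All names here (dag_id, task ids, schedule) are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_telemetry():
    """Stub: pull raw device telemetry from an upstream source
    (e.g., a Kafka topic or object store) and stage it for loading."""


def load_to_warehouse():
    """Stub: load the staged batch into a warehouse table."""


with DAG(
    dag_id="device_telemetry_batch",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",               # `schedule` replaces schedule_interval in Airflow 2.4+
    catchup=False,                    # do not backfill missed intervals
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_telemetry)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load  # extract must complete before load runs
```

In practice each task would hand off through a durable staging layer rather than in-memory state, since Airflow tasks may run on separate workers.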
Education
- Bachelor’s degree in Computer Science, Software Engineering, Data Science, or a related technical field is required
- Master’s degree or relevant certifications (e.g., AWS Certified Data Analytics, Google Cloud Professional Data Engineer, CDMP) preferred
Required Skills and Experience
- 5+ years of experience in data architecture, data engineering, or enterprise database design, ideally supporting embedded systems, IoT, or industrial software platforms
- Strong understanding of data modeling techniques (3NF, dimensional modeling, data vault, etc.) (a toy star-schema example follows this list)
- Proficiency in SQL (PostgreSQL, MySQL, MS SQL Server) and NoSQL databases (e.g., MongoDB, Cassandra, Redis)
- Experience designing and scaling streaming and event-driven architectures using Kafka, MQTT, RabbitMQ, or similar (a producer sketch also follows this list)
- Strong hands-on experience with cloud platforms (AWS, Azure, or GCP) and infrastructure-as-code tools like Terraform or CloudFormation
- Familiarity with metadata management, schema evolution, and master data management (MDM) tools
- Knowledge of DevSecOps principles, data encryption, access control, and data privacy regulations (e.g., GDPR, CCPA, HIPAA)
- Proven experience with ETL/ELT pipeline orchestration and modern tools like Airflow, dbt, Fivetran, or Matillion
- Excellent documentation and communication skills, with the ability to translate complex data structures into formats that both technical and non-technical stakeholders can understand
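To make the dimensional-modeling skill above concrete, here is a toy star schema built with Python’s standard-library sqlite3 module. The table and column names are invented for the example and imply nothing about the client’s schemas.

```python
# Toy star schema in stdlib sqlite3: one dimension table, one fact table.
# All table and column names are hypothetical examples.
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension table: one row per device, descriptive attributes only.
conn.execute("""
    CREATE TABLE dim_device (
        device_key  INTEGER PRIMARY KEY,
        device_type TEXT NOT NULL,
        site        TEXT NOT NULL
    )
""")

# Fact table: one row per telemetry reading, keyed to the dimension.
conn.execute("""
    CREATE TABLE fact_telemetry (
        device_key  INTEGER NOT NULL REFERENCES dim_device(device_key),
        recorded_at TEXT NOT NULL,
        temperature REAL NOT NULL
    )
""")

conn.execute("INSERT INTO dim_device VALUES (1, 'gateway', 'plant-a')")
conn.executemany(
    "INSERT INTO fact_telemetry VALUES (?, ?, ?)",
    [(1, "2024-01-01T00:00:00Z", 41.7), (1, "2024-01-01T01:00:00Z", 43.2)],
)

# Typical analytical query: aggregate facts, slice by dimension attributes.
for row in conn.execute("""
    SELECT d.site, d.device_type, AVG(f.temperature)
    FROM fact_telemetry f
    JOIN dim_device d USING (device_key)
    GROUP BY d.site, d.device_type
"""):
    print(row)
```

The fact table stays narrow and append-heavy while descriptive attributes live in the dimension, which is what keeps aggregate queries like the one above simple.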
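The streaming and event-driven item above similarly lends itself to a short sketch. Below is a minimal telemetry producer using the confluent-kafka Python client; the broker address, topic name, and payload fields are placeholders, not the client’s actual configuration.

```python
# Minimal telemetry producer sketch using confluent-kafka.
# Broker address, topic, and payload fields are placeholders.
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})


def on_delivery(err, msg):
    # Called once per message with the broker's ack (or an error).
    if err is not None:
        print(f"delivery failed: {err}")


event = {"device_id": "gw-001", "ts": time.time(), "temp_c": 41.7}
producer.produce(
    "device.telemetry",                       # hypothetical topic name
    value=json.dumps(event).encode("utf-8"),  # serialize to bytes
    on_delivery=on_delivery,
)
producer.flush()  # block until outstanding messages are acknowledged
```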
Preferred Qualifications
- Background supporting embedded platforms, real-time systems, or edge computing architectures
- Experience working with time-series databases (e.g., InfluxDB, TimescaleDB, Prometheus) for sensor and telemetry data (a short write-path sketch follows this list)
- Exposure to federated learning, data virtualization, or hybrid data mesh strategies
- Familiarity with secure communications protocols (e.g., TLS, SSH, MQTT-SN) in industrial or medical data environments
- Prior experience with military or aerospace standards (e.g., DO-178C, MIL-STD-1553, or NASA data protocols)
- Experience contributing to data strategy roadmaps, including cost optimization and technology selection for long-term scalability
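For the time-series qualification above, here is a short sketch of writing a single sensor reading to InfluxDB 2.x via the influxdb-client package. The URL, token, org, bucket, and measurement name are placeholder values for illustration only.

```python
# Minimal write-path sketch for InfluxDB 2.x using influxdb-client.
# URL, token, org, bucket, and measurement name are placeholders.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="dev-token", org="example-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("device_telemetry")       # measurement
    .tag("device_id", "gw-001")     # indexed tag for fast filtering
    .field("temperature_c", 41.7)   # numeric field value
)
write_api.write(bucket="telemetry", record=point)
client.close()
```

Tags are indexed and fields are not, so putting high-cardinality identifiers in tags versus fields is one of the main schema decisions in stores like this.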