CNTXT AI

AI Engineer

United Arab Emirates · Full-time · Mid-Senior

AI Engineer — Speech & Voice Intelligence

Company: CNTXT
Type: Full-time
Location: Remote-friendly / Hybrid


About CNTXT

CNTXT is building voice AI infrastructure for the Arabic-speaking world. We work on the hard problems — natural speech synthesis, real-time transcription, and conversational voice systems — with a focus on Arabic language quality that actually serves the region's speakers.


The Role

We're looking for an AI engineer or researcher who is passionate about voice and speech technology. You'll work directly on the models and systems that power our speech products — evaluating architectures, running fine-tuning experiments, and shipping improvements to production. This is a hands-on role that sits at the intersection of research and engineering.


What Our Team Works On

Speech Synthesis (TTS)

We build and fine-tune Arabic TTS systems based on state-of-the-art generative architectures: autoregressive models that generate speech token by token, and non-autoregressive models that produce full utterances in parallel. This includes working with neural vocoders (HiFi-GAN, MelGAN, WaveGlow), audio codecs and tokenizers (EnCodec, DAC, RVQ-based systems), acoustic encoders (HuBERT, wav2vec), and diffusion-based audio decoders. A significant focus is voice cloning and zero-shot speaker adaptation for Arabic voices.


Speech Recognition (ASR)

We work with encoder-decoder and CTC-based ASR models (Whisper, Conformer, wav2vec 2.0) to build accurate, low-latency Arabic transcription. This includes streaming inference, domain adaptation, and language model integration to improve robustness across Arabic dialects.


Speech-to-Speech

We are building end-to-end voice interaction pipelines that chain ASR, language understanding, and TTS under hard latency constraints. This involves voice activity detection (VAD), speaker diarization, speech enhancement, and optimizing the full stack for real-time performance.


Arabic Language Challenges

Arabic presents unique challenges across the whole stack. Diacritization (tashkil) is critical for TTS pronunciation accuracy, because everyday Arabic text usually omits the short-vowel marks that determine how a word is pronounced. Variation across Modern Standard Arabic (MSA) and regional dialects (Gulf, Levantine, Egyptian, Maghrebi) affects both synthesis and recognition quality, and training data for many dialects remains scarce. A big part of our work is closing these gaps.


What You'll Work On

  • Benchmark and evaluate TTS and ASR models on Arabic test sets — measuring WER, speaker similarity (SIM), naturalness, and dialect coverage across MSA and regional varieties
  • Fine-tune pretrained TTS models on curated Arabic data — including ablations on diacritized vs. undiacritized input, dialect-specific training splits, and voice prompt conditioning
  • Experiment with audio tokenizer and codec configurations, comparing discrete RVQ representations with continuous latent approaches and measuring how each affects Arabic phoneme accuracy
  • Build and maintain Arabic speech data pipelines — audio sourcing, normalization, diacritization, quality filtering, and manifest generation for model training
  • Optimize models for production serving — streaming chunk generation, KV cache tuning, quantization, and batched inference for low-latency Arabic TTS and ASR
  • Evaluate and adapt speech-to-speech pipelines — integrating ASR, LLM, and TTS components with attention to end-to-end latency and Arabic conversational quality


What We're Looking For

  • Strong foundations in machine learning and deep learning
  • Hands-on experience training or fine-tuning neural models — domain matters less than depth
  • Comfortable with Python, PyTorch, and the Hugging Face ecosystem
  • Able to read research papers and translate ideas into experiments independently
  • Clear communicator who can work across research and engineering


Nice to Have

  • Native or fluent Arabic speaker — a real advantage when evaluating synthesis naturalness and dialect quality
  • Prior work with speech or audio models (ASR, TTS, speaker verification, codec, VAD, enhancement, or similar)
  • Familiarity with Arabic linguistic structure, diacritization tools, and NLP preprocessing for Arabic
  • Experience with inference optimization, such as quantization, speculative decoding, CUDA kernels, or inference and serving frameworks (vLLM, TensorRT)
  • Publications or open-source contributions in speech or audio


What We Offer

  • Work at the frontier of Arabic voice AI — a genuinely underserved, high-impact area
  • Direct influence on product and research direction
  • Small, focused team — your work ships and matters
  • Competitive compensation and remote flexibility

Key Skills: ASR, AI, machine learning, PyTorch, Python
Posted: May 16, 2026
Type: Full-time
Level: Mid-Senior
Location: Abu Dhabi
Company: CNTXT AI
Industries: Software Development
Categories: Engineering
