Language Data Scientist

Innodata Inc.

Calgary · On-site Full-time 1w ago

About the role

Job Title: Language Data Scientist

Location: Remote within Canada (excluding Quebec)

Employment Type: Full-Time (40 hours per week), Fixed-Term

Salary Range: Up to $120k CAD

Company: Innodata is a leading data engineering company serving over 2,000 customers worldwide.

About the Role

Innodata is building a team of Language Data Scientists and GenAI experts to help our customers advance GenAI applications. You will work hands-on with multimodal and multilingual datasets and collaborate with cross-functional partners. You will use your experience with human and synthetic data workflows to drive innovation and continuous improvement. The ideal candidate has the right mix of skills in computational linguistics, human evaluation tasks, data science, and data engineering.

Key Responsibilities

  • Design and improve workflows to create data for AI/ML training and evaluation, including human annotation and synthetic workflows.
  • Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers.
  • Critically assess annotation tooling and workflows.
  • Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.
  • Work closely with client stakeholders to understand goals, gather requirements, propose solutions, and execute them.

Qualifications

  • Familiarity with social media platforms and the cultural context of trending North American content.
  • Nuanced interpretation of social media content in its setting.
  • Knowledge of how the components of GenAI products or services work together.
  • Collaboration with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals.
  • MA in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related field; PhD preferred.
  • Extensive experience working with human language data and designing human evaluation tasks.
  • Deep understanding of language and its relationship with culture; ability to identify ambiguity and subjectivity in language.
  • Ability to work on multilingual and multimodal projects.
  • Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability), and sampling methods.
  • Experience with NLP techniques and tools such as spaCy, NLTK, and Hugging Face.
  • Proficiency in Python for data transformation, visualization, and analytics.
  • Understanding of data pipelines and efficient data collection, storage, and processing.
  • Excellent problem-solving skills, with the ability to work independently and collaborate as part of a team.

Preferred Qualifications

  • Research experience, including staying up to date with generative AI, machine learning, and deep learning techniques.
  • Knowledge of optimizing generative AI models for improved performance, scalability, and efficiency.
  • Experience developing and maintaining ML/AI pipelines.
  • Knowledge of fine-tuning pre-trained models.
  • Strong documentation and communication skills.
  • Experience mentoring junior team members.
  • Understanding of GPT models, VAEs, and GANs.

Equal Opportunity Employer

Innodata Inc. is committed to equal employment opportunity and nondiscrimination. We provide equal opportunity to qualified people without regard to race, color, religion, sex, national origin, age, veteran status, disability, or any other legally protected status.

Skills

Computational linguistics, Human evaluation tasks, Data science, Data engineering, Social media platforms, Cultural context, GenAI products, AI project requirements, Statistics, Metrics, Sampling methods, NLP techniques, SpaCy, NLTK, Hugging Face, Python, Data pipelines, Data collection, Data storage, Data processing
