Cloud Reliability Specialist (Mandarin Speaking) - Intern

Huawei Ireland Research Centre
Dublin
Internship
1 week ago

About Huawei

Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. With integrated solutions across four key domains – telecom networks, IT, smart devices, and cloud services – we are committed to bringing digital to every person, home and organization for a fully connected, intelligent world.

At Huawei, innovation focuses on customer needs. We invest heavily in basic research, concentrating on technological breakthroughs that drive the world forward. We have more than 180,000 employees, and we operate in more than 170 countries and regions.

About the IRC

The mission of the Huawei Ireland Research Centre (IRC) is to position Huawei as a recognized technology leader and a global provider of information and communications technology (ICT) solutions. To achieve this, we are building an industry-recognized, multi-discipline Research Centre of experts with a focus on medium- to long-term issues. The IRC will work closely with an open, innovative ecosystem and with Huawei customers to address real-world issues. The IRC will also engage with key European universities to build a basic research capability to support Huawei technical projects.

Job Overview

We are looking for a Mandarin-speaking specialist to support our technical teams on cloud reliability. He/she will contribute to one of the technical areas the lab is currently working on, e.g. Fault Localization Agent design and development. He/she will also research relevant topics and translate complex technical concepts into clear, concise documentation. In this role, the specialist will conduct research, analyze publicly available information, and collaborate with technical experts to produce source code, reports, presentations, and other technical documentation.

Key Responsibilities

Qualifications

Project Examples

1. Explainable multivariate anomaly detection

Many existing multivariate anomaly detection solutions lack explainability, which makes fault localization challenging, especially since site reliability engineers (SREs) may distrust black-box outputs. Inspired by GNNs/GCNs, we designed and developed graph-based algorithms to detect anomalies across cloud infrastructure, including physical/virtual hardware and microservice layers. Our solution delivers reliable performance while maintaining resource efficiency, even in production environments.

2. AI Agent for fault management

Fault management remains a significant challenge for Site Reliability Engineers (SREs). AI agents present a promising solution to streamline this process and reduce manual effort. Currently, we are investigating and developing a series of specialized agents for specific use cases, such as detection and localization agents, built on our agent platform. This work will help to build knowledge and experience in state-of-the-art AI technologies for LLMs and causal analysis, such as intent understanding and CoT (chain-of-thought).

Privacy Statement

Please read and understand our West European Recruitment Privacy Notice before submitting your personal data to Huawei so that you fully understand how we process and manage your personal data received.

http://career.huawei.com/reccampportal/portal/hrd/weu_rec_all.html
