Description
Are you ready to make an impact at DTCC?
Do you want to work on innovative projects, collaborate with a dynamic and supportive team, and receive investment in your professional development? At DTCC, we are at the forefront of innovation in the financial markets. We are committed to helping our employees grow and succeed. We believe that you have the skills and drive to make a real impact. We foster a thriving internal community and are committed to creating a workplace that looks like the world that we serve.
The Information Technology group delivers secure, reliable technology solutions that enable DTCC to be the trusted infrastructure of the global capital markets. The team delivers high-quality information through activities that include building essential infrastructure capabilities to meet client needs and implementing data standards and governance.
Pay and Benefits:
- Competitive compensation, including base pay and annual incentive
- Comprehensive health and life insurance and well-being benefits, based on location
- Pension / Retirement benefits
- Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.
- DTCC offers a flexible/hybrid model of 3 days onsite and 2 days remote (onsite Tuesdays, Wednesdays and a third day unique to each team or employee).
The impact you will have in this role:
We are seeking an experienced L3 Messaging Platform Engineer to support and evolve our enterprise IBM MQ messaging ecosystem. This role is designed for a deeply technical engineer who goes beyond reactive support and actively drives the team toward a proactive, automation‑first, and reliability‑focused operating model.
The ideal candidate thrives in complex production environments, has strong L3 troubleshooting expertise, and brings curiosity and enthusiasm for modern messaging technologies, including containers, cloud platforms, and next‑generation messaging architectures. This role provides hands‑on opportunities to influence platform strategy, tooling, and operational maturity while partnering closely with L2 teams, application owners, and infrastructure partners.
Your Primary Responsibilities:
- Serve as the L3 technical SME for IBM MQ / Messaging Technologies, providing deep troubleshooting, design guidance, and resolution of the most complex messaging incidents across production and non‑production environments.
- Partner closely with L2 support teams to uplift operational maturity by shifting knowledge left through improved runbooks, tooling, automation, and clear escalation patterns.
- Lead problem management and root‑cause analysis efforts, ensuring recurring incidents are fully understood, permanently remediated, and prevented from recurring.
- Design and implement proactive monitoring, alerting, and health indicators that detect leading signals of failure and reduce customer‑impacting incidents.
- Identify repetitive operational failure patterns and engineer self‑healing automation that detects, mitigates, or recovers from known failure scenarios.
- Build and maintain automation for MQ lifecycle operations, including provisioning, configuration validation, certificates, health checks, and recovery workflows.
- Drive reduction of operational toil by continuously replacing manual intervention with policy‑driven, automated, and resilient solutions.
- Actively contribute to incident postmortems, blameless retrospectives, and reliability reviews with a focus on systemic improvements and long‑term fixes.
- Support and influence platform modernization initiatives, including adoption of containerized messaging, cloud and hybrid architectures, and improved CI/CD integration where applicable.
- Collaborate with engineering, infrastructure, security, and application teams to ensure secure, resilient, and standards‑compliant messaging solutions.
- Mentor engineers across L2/L3 teams, promoting best practices in reliability engineering, automation, and proactive operations.
- Operate with a Site Reliability Engineering (SRE) mindset, focusing on improving platform reliability, availability, scalability, and resilience rather than reactive incident handling alone.
Talents Needed for Success:
- Minimum of 10 years of related experience
- Strong expertise with IBM MQ (including MQ IPT and Native HA) and IBM App Connect Enterprise (ACE)
- Proven L3 troubleshooting experience, including performance analysis and failure recovery
- Solid experience in Linux/Unix environments
- Strong understanding of high-availability, fault-tolerant system design
- Experience with automation tools such as Ansible, Chef, Terraform
- Experience with observability and monitoring (e.g., Splunk, Grafana, Prometheus, APM6)
- Exposure to containerized messaging platforms and modern deployment models (e.g., OpenShift/Kubernetes, cloud or hybrid environments)
- Ability to identify recurring operational issues and design long‑term, sustainable fixes instead of short‑term workarounds
- Proven ability to remain calm, structured, and decisive during major incidents, providing technical leadership and clear communication
- Strong Site Reliability Engineering (SRE) mindset, with a proven track record of improving system reliability, stability, and availability through engineering solutions rather than manual intervention
- Strong ownership mindset with the ability to drive outcomes without excessive oversight
Actual salary is determined based on the role, location, individual experience, skills, and other considerations. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request an accommodation.