Job Overview:
We are seeking a highly skilled IT professional to lead our enterprise monitoring efforts. This role owns the daily health, performance, and availability of our monitoring tools.
Responsibilities:
* Perform routine maintenance, upgrades, and configuration tuning to ensure optimal system performance
* Triage and promptly resolve monitoring-related incidents and service tickets
* Collaborate with application, infrastructure, and DevOps teams to integrate monitoring solutions and improve visibility
* Develop and maintain dashboards, alerts, and reports to support operational needs
* Participate in on-call rotations and support incident response efforts
* Document operational procedures, runbooks, and knowledge base articles
* Identify and implement automation opportunities to reduce manual effort and improve reliability
Requirements:
* 5+ years of experience in systems engineering or enterprise monitoring roles
* Hands-on experience with Splunk, Dynatrace, and New Relic in production environments
* Strong understanding of IT operations, incident management, and ticketing systems
* Proficiency in scripting languages for automation and tool integration
* Familiarity with cloud platforms and containerized environments
* Excellent troubleshooting skills and a bias for action in high-pressure situations
* Strong written and verbal communication skills