System Monitoring Expert
Job Summary:
* Own the daily health, performance, and availability of enterprise monitoring tools.
* Perform routine maintenance, upgrades, and configuration tuning to ensure optimal system efficiency.
* Triage and resolve monitoring-related incidents and service tickets promptly.
* Collaborate with application, infrastructure, and DevOps teams to integrate monitoring solutions and improve visibility.
* Develop and maintain dashboards, alerts, and reports to support operational and business needs.
* Participate in on-call rotations and support incident response efforts.
* Document operational procedures, runbooks, and knowledge base articles.
* Identify and implement automation opportunities to reduce manual effort and improve reliability.
Key Qualifications:
* At least 5 years of experience in systems engineering or enterprise monitoring roles.
* Hands-on experience with Splunk, Dynatrace, and New Relic in production environments.
* Strong understanding of IT operations, incident management, and ticketing systems.
* Proficiency in scripting languages (e.g., Python, PowerShell, Bash) for automation and tool integration.
* Familiarity with cloud platforms (AWS, Azure, or GCP) and containerized environments (Kubernetes, Docker).
* Excellent troubleshooting skills and a bias for action in high-pressure situations.
* Strong written and verbal communication skills in English; Portuguese is an asset.
About the Role:
We are looking for an experienced System Monitoring Expert to join our team.
This is a challenging opportunity to work with cutting-edge technology and help develop our monitoring capabilities.