About the Role:
We are seeking a highly skilled Software Developer with deep expertise in C/C++ programming and a strong understanding of GPU architecture to join our high-performance computing team. In this role, you will be responsible for analyzing, optimizing, and porting existing CPU-based code to GPU-based architectures, maximizing parallel performance and efficiency.
Key Responsibilities:
* Design, develop, and optimize performance-critical software components in C and C++.
* Analyze existing CPU code and port functionality to GPUs using CUDA, OpenCL, or an equivalent framework.
* Work closely with algorithm engineers and systems architects to identify opportunities for parallelization.
* Develop and maintain GPU-accelerated modules, with a focus on performance tuning and memory management.
* Use profiling and debugging tools to benchmark, troubleshoot, and optimize GPU workloads.
* Collaborate with cross-functional teams on integrating GPU-accelerated code into larger software systems.
* Contribute to documentation, code reviews, and performance analyses.
Requirements:
Technical Skills:
* Strong proficiency in C and C++ (including modern C++ standards).
* Deep understanding of GPU architecture and parallel computing concepts.
* Proven experience porting CPU code to GPUs using CUDA, OpenCL, or similar technologies.
* Familiarity with GPU memory models, kernels, warp/thread management, and optimization techniques.
* Experience with performance profiling tools (e.g., NVIDIA Nsight, NVIDIA Visual Profiler, Intel VTune).
* Solid understanding of multithreading, memory hierarchy, and numerical computation.
Experience:
* 3–7+ years of professional experience in systems software or performance-critical application development.
* At least 2 years of experience with GPU programming or porting workloads to GPU platforms.
* Experience in domains such as scientific computing, simulations, video processing, rendering, or machine learning is a plus.
Preferred Qualifications:
* Experience with cross-platform development (Linux, Windows).
* Familiarity with Python or scripting languages for workflow automation or testing.
* Exposure to low-level performance tuning (SIMD, cache optimization).
* Knowledge of other compute APIs or accelerators (e.g., Vulkan, Metal, DirectCompute, or FPGAs) is a bonus.
Please send your resume in English.
Salary: 13,100–21,900 BRL