About the Role:
We are seeking a highly skilled Software Developer with deep expertise in C/C++ programming and a strong understanding of GPU architecture to join our high-performance computing team. In this role, you will analyze, optimize, and port existing CPU-based code to GPU architectures, maximizing parallel performance and efficiency.

Key Responsibilities:
Design, develop, and optimize performance-critical software components in C and C++.
Analyze existing CPU code and port its functionality to GPUs (CUDA, OpenCL, or equivalent).
Work closely with algorithm engineers and systems architects to identify opportunities for parallelization.
Develop and maintain GPU-accelerated modules, with a focus on performance tuning and memory management.
Use profiling and debugging tools to benchmark, troubleshoot, and optimize GPU workloads.
Collaborate with cross-functional teams to integrate GPU-accelerated code into larger software systems.
Contribute to documentation, code reviews, and performance analyses.

Requirements:

Technical Skills:
Strong proficiency in C and C++ (including modern C++ standards).
Deep understanding of GPU architecture and parallel computing concepts.
Proven experience porting CPU code to GPUs using CUDA, OpenCL, or similar technologies.
Familiarity with GPU memory models, kernels, warp/thread management, and optimization techniques.
Experience with performance profiling tools (e.g., NVIDIA Nsight, NVIDIA Visual Profiler, Intel VTune).
Solid understanding of multithreading, memory hierarchies, and numerical computation.

Experience:
3–7+ years of professional experience in systems software or performance-critical application development.
At least 2 years of experience with GPU programming or porting workloads to GPU platforms.
Experience in domains such as scientific computing, simulation, video processing, rendering, or machine learning is a plus.

Preferred Qualifications:
Experience with cross-platform development (Linux, Windows).
Familiarity with Python or other scripting languages for workflow automation or testing.
Exposure to low-level performance tuning (SIMD, cache optimization).
Knowledge of other GPU compute APIs or accelerators (e.g., Vulkan, Metal, DirectCompute, or FPGAs) is a bonus.

Please send your resume in English.
Salary: 13,100–21,900 BRL