About the Role:
We are seeking a highly skilled Software Developer with deep expertise in C/C++ programming and a strong understanding of GPU architecture to join our high-performance computing team. In this role, you will be responsible for analyzing, optimizing, and porting existing CPU-based code to GPU-based architectures, maximizing parallel performance and efficiency.
Key Responsibilities:
Design, develop, and optimize performance-critical software components in C and C++.
Analyze existing CPU code and port functionality to GPUs (CUDA, OpenCL, or equivalent).
Work closely with algorithm engineers and systems architects to identify opportunities for parallelization.
Develop and maintain GPU-accelerated modules, with a focus on performance tuning and memory management.
Use profiling and debugging tools to benchmark, troubleshoot, and optimize GPU workloads.
Collaborate with cross-functional teams on integrating GPU-accelerated code into larger software systems.
Contribute to documentation, code reviews, and performance analyses.
Requirements:
Technical Skills:
Strong proficiency in C and C++ (including modern C++ standards).
Deep understanding of GPU architecture and parallel computing concepts.
Proven experience in porting CPU code to GPU using CUDA, OpenCL, or similar technologies.
Familiarity with GPU memory models, kernels, warp/thread management, and optimization techniques.
Experience with performance profiling tools (e.g., NVIDIA Nsight, NVIDIA Visual Profiler, Intel VTune).
Solid understanding of multithreading, memory hierarchy, and numerical computation.
Experience:
3–7+ years of professional experience in systems software or performance-critical application development.
At least 2 years of experience with GPU programming or porting workloads to GPU platforms.
Experience in domains such as scientific computing, simulations, video processing, rendering, or machine learning is a plus.
Preferred Qualifications:
Experience with cross-platform development (Linux, Windows).
Familiarity with Python or scripting languages for workflow automation or testing.
Exposure to low-level performance tuning (SIMD, cache optimization).
Knowledge of other compute APIs or accelerators (e.g., Vulkan, Metal, DirectCompute, or FPGAs) is a bonus.
*Please send your resume in English.
Salary: 13,100 - 21,900 BRL