SSM conv kernel — RVV vectorised

ggml_compute_forward_ssm_conv_f32_rvv · RISC-V Vector · LMUL=m4 · vl=4 · SpacemiT X60 VLEN=256

Legend: conv_x active column · weight active column · vsum lanes · output stored
conv_x (src0): vlse32, stride = ncs×4 bytes; gathers the same column from all vl rows
weights (src1): vlse32, stride = nc×4 bytes
output (dst): vse32; one instruction writes all vl lanes
vsum register: LMUL=m4 · vl=4 lanes · f32
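
The strided loads described above can be modelled in portable C. The following is a minimal sketch, not the kernel's actual code: `strided_load_f32` and `gather_column` are hypothetical helper names, and the model assumes a row-major f32 matrix in which the byte stride between rows is the row length times 4 (ncs×4 for conv_x, nc×4 for the weights).

```c
#include <assert.h>
#include <stddef.h>

/* Software model of one vlse32 strided load (hypothetical helper):
 * lane i receives the f32 found i * byte_stride bytes past base. */
static void strided_load_f32(float *lanes, const float *base,
                             ptrdiff_t byte_stride, size_t vl) {
    const char *p = (const char *) base;
    for (size_t i = 0; i < vl; ++i) {
        lanes[i] = *(const float *) (p + (ptrdiff_t) i * byte_stride);
    }
}

/* Gathers column `col` from `vl` consecutive rows of a row-major
 * matrix whose rows are `row_len` floats long. With row_len = ncs
 * this models the conv_x load; with row_len = nc, the weight load. */
static void gather_column(float *lanes, const float *mat,
                          size_t row_len, size_t col, size_t vl) {
    assert(col < row_len);
    strided_load_f32(lanes, mat + col,
                     (ptrdiff_t) (row_len * sizeof(float)), vl);
}
```

With a 4×6 matrix and `col = 2`, the four lanes receive the elements at flat indices 2, 8, 14 and 20, one per row, which is the "same column from all vl rows" pattern named in the conv_x panel.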
RVV instruction
Scalar equivalent
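
The contents of these two panels are not preserved here, so the following is only a sketch of how the two forms might compare, assuming each output row is a dot product of length nc between a conv_x row (row length ncs) and a weight row (row length nc). The function names, the pointer names `s`, `c`, `x`, and the `MAX_VL` bound are illustrative assumptions. The vector form is written as a lane-by-lane model in plain C rather than with RVV intrinsics, so it runs on any host.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VL 32  /* assumed upper bound on lanes for this model */

/* Scalar form: one dot product of length nc per output row. */
static void ssm_conv_rows_scalar(float *x, const float *s, const float *c,
                                 size_t nr, size_t nc, size_t ncs) {
    for (size_t i1 = 0; i1 < nr; ++i1) {
        float sumf = 0.0f;
        for (size_t i0 = 0; i0 < nc; ++i0) {
            sumf += s[i0 + i1 * ncs] * c[i0 + i1 * nc];
        }
        x[i1] = sumf;
    }
}

/* Lane-wise model of the vector form: up to vlmax rows per pass. */
static void ssm_conv_rows_lanes(float *x, const float *s, const float *c,
                                size_t nr, size_t nc, size_t ncs,
                                size_t vlmax) {
    assert(vlmax > 0 && vlmax <= MAX_VL);
    for (size_t i1 = 0; i1 < nr; ) {
        /* Models vsetvl: the last pass may cover fewer rows. */
        size_t vl = (nr - i1 < vlmax) ? nr - i1 : vlmax;
        float vsum[MAX_VL] = {0};  /* models the vsum register */
        for (size_t i0 = 0; i0 < nc; ++i0) {
            for (size_t l = 0; l < vl; ++l) {
                float sx = s[i0 + (i1 + l) * ncs];  /* vlse32, stride ncs*4 */
                float cw = c[i0 + (i1 + l) * nc];   /* vlse32, stride nc*4 */
                vsum[l] += sx * cw;  /* per-lane multiply-accumulate */
            }
        }
        for (size_t l = 0; l < vl; ++l) {
            x[i1 + l] = vsum[l];  /* vse32: unit-stride store of vl lanes */
        }
        i1 += vl;
    }
}
```

In this model the inner loop over `l` stands in for a single vector instruction acting on all active lanes, so each column `i0` costs two strided loads and one multiply-accumulate regardless of how many rows are processed together.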