nncase

huochenghai 338ba1070d Feature/cpu (#1019)	2023-11-07 10:13:25 +08:00

* add layernorm
* pass reduce
* add comment
* add layer norm test
* fix layernorm
* fix layernorm
* add demo2
* fix build / add view
* update layernorm
* support layernorm of llama
* fix build
* add demo2
* pass ym
* pass demo2
* fix onnx external data importer
* fuse MHA of llama
* add more cpu kernels
* update MHA fusion
* reorder MHA weights
* add demo3
* add demo3 compute statge 1
* fix build
* fix __tdma_all_sync_apply
* add to v35
* dump const
* update demo3 golden
* compiled
* support multiple output compare
* fix MHA kernel
* resplit v2
* fix MHA kernel
* to v26
* push
* fix double free
* fix mha kernel
* fix V35
* fix all
* support rmsnrom
* fix v22
* fix v22 v10
* pass v28
* pass v43
* remove dump
* add other part
* pass all llama65b decoder layers
* pass gather
* open 32 threads for demo4
* update binary/unary with external op
* fix codes using stdlib
* update kernel inputs
* Fix gather
* refactor cpu cmodel
* update demo head
* pass graph to tir
* fix head main
* pass norm case
* update demo names
* add xpu source gen
* Fix head kernel segment fault
* fix cost evaluator
* Fix head kernel cos similarity
* decoder layer pass input layernorm
* Add uanry demo
* pass v30 of decoder layer
* fix softmax
* Enable ImmOutput
* fix malloc
* remove debug macro
* Add ImmOut
* fix tdma store
* refactor cpu runtime
* refactor method table
* fix cpu test
* refactor auto distributed
* update cpu test
* fix rdata
* update cpu test with rdata
* fix typeinfer
* add XPU Op layernorm
* update layernorm cost
* fix cost evaluator
* fix layernorm
* add partial resplit
* add rvv matmul
* add codegen of cpu gather
* add concat/slice codegen
* merge
* add codegen of cpu softmax
* update slice cpu case
* fix slice
* fix cpu concat
* fix cpu concat
* Apply code-format changes
* fix build
* Apply code-format changes
* add codegen of transpose
* add reshape
* pass reshape2
* update stackvm
* merge
* fix build
* update compile
* fix to slicing
* fix negative axis
* fix matmul evaluator
* add NormAxis
* fix ToSlice
* fix matmul
* add GatherReduceScatter
* fix ToSlice
* refactor auto dist
* fix boxing partial to slice codegen
* softmax support split on axis
* add conv2d cpu kernel
* disable outter split on inner splited axis
* fix binary distributedtype infer
* fixGetPartialCandidateNDSBPs
* pass cpu conv2d
* support dilated conv2d
* add mha pattern
* add combine reshape transpose
* fix mha fusion/ add rules
* fix rdata map dispose
* add xpu reduce arg
* Apply code-format changes
* add VAE fusion
* fuse VAE
* support xpu instance norm
* Apply code-format changes
* add reduce arg
* Apply code-format changes
* fix to tir keep vars order
* Apply code-format changes
* add XPU resize
* Apply code-format changes
* fix resize cpu kernel op
* fix Resize
* Update layernom op for test
* Apply code-format changes
* fix conv2d kernel
* fix boxing with reshape
* fix build
* fix pytest compare
* add gelu kernels
* add xpu cast
* fix swish type infer
* support xpu expand
* Update layernorm rvv codes
* fix binary broadcast with distributed broadcast
* support multi outputs
* fix single output
* fix new linked section
* fuse Unet
* add cos dump
* fix build
* speed up onnx external data load
* add typeinfer case for binary/matmul
* move matmul rvv to kernels
* fix conv2d kernel
* fix Unet Fusion
* optimize dynamic onnx
* change fusion counter
* fix conv2d if split is partial
* split conv to conv+bias+clamp, and add xpu clamp
* update fusion merger
* fix slice with negative axis
* llama-4-decoder pass (x86/rv64)
* text encoder/vae decoder pass (x86/rv64)
* fix conan config
* Fix cpu/test compile
* fix cmake config
* fix synax err
* fix synax err
* normallize axes of slice
* disable module cpu on windows
* donot split softmax on axis
* add softmax kernel test
* add rvv instance norm
* clean modules dir
* add rvv clamp
* Clean modules dir
* fix match result
* Apply code-format changes
* Clean Tests
* fix csproj
* fix buffer schedule
* Add unet pytest
* add target's commands
* fix buffer and memspan hashcode and equals
* fix unitest
* fix unittest
* fix test_cli output dump
* fix command line
* fix type infer
* fix format
* fix all test
* Apply code-format changes
* fix merge
* Apply code-format changes
* fix merge
* fix runtime build
* fix kernel test build
* Apply code-format changes
* fix use mean
* Apply code-format changes
* optimize dot dump
* fix merge
* fix output when test cli

---------

Co-authored-by: xhuohai <xhuohai@users.noreply.github.com>
Co-authored-by: 郑启航 <597323109@qq.com>
Co-authored-by: zhen8838 <zhen8838@users.noreply.github.com>
Co-authored-by: lerenhua <2532375005@qq.com>
Co-authored-by: liuzhiming <liuzhiming@canaan-creative.com>
Co-authored-by: liuzm6217-jianan <liuzm6217-jianan@users.noreply.github.com>
cpu	Add auto code format (#745)	2022-12-23 20:22:23 +08:00
k210	Add auto code format (#745)	2022-12-23 20:22:23 +08:00
vulkan	Add auto code format (#745)	2022-12-23 20:22:23 +08:00