nncase/targets
huochenghai 338ba1070d
Feature/cpu (#1019)
* add layernorm

* pass reduce

* add comment

* add layer norm test

* fix layernorm

* fix layernorm

* add demo2

* fix build / add view

* update layernorm

* support layernorm of llama

* fix build

* add demo2

* pass ym

* pass demo2

* fix onnx external data importer

* fuse MHA of llama

* add more cpu kernels

* update MHA fusion

* reorder MHA weights

* add demo3

* add demo3 compute stage 1

* fix build

* fix __tdma_all_sync_apply

* add to v35

* dump const

* update demo3 golden

* compiled

* support multiple output compare

* fix MHA kernel

* resplit v2

* fix MHA kernel

* to v26

* push

* fix double free

* fix mha kernel

* fix V35

* fix all

* support rmsnorm

* fix v22

* fix v22 v10

* pass v28

* pass v43

* remove dump

* add other part

* pass all llama65b decoder layers

* pass gather

* open 32 threads for demo4

* update binary/unary with external op

* fix codes using stdlib

* update kernel inputs

* Fix gather

* refactor cpu cmodel

* update demo head

* pass graph to tir

* fix head main

* pass norm case

* update demo names

* add xpu source gen

* Fix head kernel segmentation fault

* fix cost evaluator

* Fix head kernel cos similarity

* decoder layer pass input layernorm

* Add unary demo

* pass v30 of decoder layer

* fix softmax

* Enable ImmOutput

* fix malloc

* remove debug macro

* Add ImmOut

* fix tdma store

* refactor cpu runtime

* refactor method table

* fix cpu test

* refactor auto distributed

* update cpu test

* fix rdata

* update cpu test with rdata

* fix typeinfer

* add XPU Op layernorm

* update layernorm cost

* fix cost evaluator

* fix layernorm

* add partial resplit

* add rvv matmul

* add codegen of cpu gather

* add concat/slice codegen

* merge

* add codegen of cpu softmax

* update slice cpu case

* fix slice

* fix cpu concat

* fix cpu concat

* Apply code-format changes

* fix build

* Apply code-format changes

* add codegen of transpose

* add reshape

* pass reshape2

* update stackvm

* merge

* fix build

* update compile

* fix to slicing

* fix negative axis

* fix matmul evaluator

* add NormAxis

* fix ToSlice

* fix matmul

* add GatherReduceScatter

* fix ToSlice

* refactor auto dist

* fix boxing partial to slice codegen

* softmax support split on axis

* add conv2d cpu kernel

* disable outer split on inner split axis

* fix binary distributedtype infer

* fix GetPartialCandidateNDSBPs

* pass cpu conv2d

* support dilated conv2d

* add mha pattern

* add combine reshape transpose

* fix mha fusion/ add rules

* fix rdata map dispose

* add xpu reduce arg

* Apply code-format changes

* add VAE fusion

* fuse VAE

* support xpu instance norm

* Apply code-format changes

* add reduce arg

* Apply code-format changes

* fix to tir keep vars order

* Apply code-format changes

* add XPU resize

* Apply code-format changes

* fix resize cpu kernel op

* fix Resize

* Update layernorm op for test

* Apply code-format changes

* fix conv2d kernel

* fix boxing with reshape

* fix build

* fix pytest compare

* add gelu kernels

* add xpu cast

* fix swish type infer

* support xpu expand

* Update layernorm rvv codes

* fix binary broadcast with distributed broadcast

* support multi outputs

* fix single output

* fix new linked section

* fuse Unet

* add cos dump

* fix build

* speed up onnx external data load

* add typeinfer case for binary/matmul

* move matmul rvv to kernels

* fix conv2d kernel

* fix Unet Fusion

* optimize dynamic onnx

* change fusion counter

* fix conv2d if split is partial

* split conv to conv+bias+clamp, and add xpu clamp

* update fusion merger

* fix slice with negative axis

* llama-4-decoder pass (x86/rv64)

* text encoder/vae decoder pass (x86/rv64)

* fix conan config

* Fix cpu/test compile

* fix cmake config

* fix syntax error

* fix syntax error

* normalize axes of slice

* disable module cpu on windows

* do not split softmax on axis

* add softmax kernel test

* add rvv instance norm

* clean modules dir

* add rvv clamp

* Clean modules dir

* fix match result

* Apply code-format changes

* Clean Tests

* fix csproj

* fix buffer schedule

* Add unet pytest

* add target's commands

* fix buffer and memspan hashcode and equals

* fix unittest

* fix unittest

* fix test_cli output dump

* fix command line

* fix type infer

* fix format

* fix all test

* Apply code-format changes

* fix merge

* Apply code-format changes

* fix merge

* fix runtime build

* fix kernel test build

* Apply code-format changes

* fix use mean

* Apply code-format changes

* optimize dot dump

* fix merge

* fix output when test cli

---------

Co-authored-by: xhuohai <xhuohai@users.noreply.github.com>
Co-authored-by: 郑启航 <597323109@qq.com>
Co-authored-by: zhen8838 <zhen8838@users.noreply.github.com>
Co-authored-by: lerenhua <2532375005@qq.com>
Co-authored-by: liuzhiming <liuzhiming@canaan-creative.com>
Co-authored-by: liuzm6217-jianan <liuzm6217-jianan@users.noreply.github.com>
2023-11-07 10:13:25 +08:00
cpu Add auto code format (#745) 2022-12-23 20:22:23 +08:00
k210 Add auto code format (#745) 2022-12-23 20:22:23 +08:00
vulkan Add auto code format (#745) 2022-12-23 20:22:23 +08:00