반도체공학회 논문지 = Transactions on Semiconductor Engineering, v.1 no.1, 2023, pp. 23-30
류은지 (Pohang University of Science and Technology), 이영주 (Pohang University of Science and Technology)
With the recent development of generative AI technology by major IT companies, the size of transformer models is increasing exponentially toward the trillion-parameter scale. To keep such AI services sustainable, it is essential to reduce the size of these models. In this paper, we find a hardware-friendly stru...
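The abstract (truncated above) motivates hardware-friendly structured pruning as the compression approach. As a rough illustration only, the Python sketch below shows generic vector-wise magnitude pruning of a weight matrix: contiguous vectors are scored by their L2 norm and the lowest-scoring fraction is zeroed so that hardware can skip whole vectors. The function name, vector length, and sparsity target are assumptions for illustration and do not reproduce the scheme proposed in the paper.

```python
import numpy as np

def structured_vector_prune(weight: np.ndarray, vec_len: int = 8, sparsity: float = 0.5):
    """Generic vector-wise magnitude pruning (illustrative sketch, not the paper's scheme)."""
    rows, cols = weight.shape
    assert cols % vec_len == 0, "columns must be divisible by the vector length"

    # Group each row into contiguous vectors and score every vector by its L2 norm.
    vectors = weight.reshape(rows, cols // vec_len, vec_len)
    scores = np.linalg.norm(vectors, axis=-1)

    # Zero out the lowest-scoring fraction of vectors; keep the rest untouched.
    threshold = np.quantile(scores, sparsity)
    keep = scores >= threshold                      # boolean mask, one entry per vector
    pruned = (vectors * keep[..., None]).reshape(rows, cols)
    return pruned, keep

# Example: prune a random 64x64 projection matrix at 50% vector sparsity.
w = np.random.randn(64, 64).astype(np.float32)
w_pruned, keep_mask = structured_vector_prune(w, vec_len=8, sparsity=0.5)
print("fraction of vectors kept:", keep_mask.mean())   # roughly 0.5
```

Pruning at vector granularity, rather than at the level of individual weights, keeps the surviving nonzeros in contiguous groups, which is what makes the resulting sparsity pattern friendly to fixed-width hardware datapaths.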