Saif M. Khan and Alexander Mann, "AI Chips: What They Are and Why They Matter" (Center for Security and Emerging Technology, April 2020), cset.georgetown.edu/research/ai-chips-what-they-are-and-why-they-matter/. https://doi.org/10.51593/20190014
Zhang, Qiyang, et al. "A comprehensive benchmark of deep learning libraries on mobile devices." Proceedings of the ACM Web Conference 2022. 2022.
https://dl.acm.org/doi/abs/10.1145/3485447.3512148
Courville, Vanessa, and Vahid Partovi Nia. "Deep learning inference frameworks for ARM CPU." Journal of Computational Vision and Imaging Systems 5.1 (2019): 3-3.
https://openjournals.uwaterloo.ca/index.php/vsl/article/download/1645/2014
Ignatov, Andrey, et al. "Learned smartphone ISP on mobile NPUs with deep learning, Mobile AI 2021 challenge: Report." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
Gómez-Luna, Juan, et al. "Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture." arXiv preprint arXiv:2105.03814 (2021).
https://arxiv.org/abs/2105.03814
Jiang, Jiantong, et al. "Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs." Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2020.
https://dl.acm.org/doi/abs/10.1145/3373087.3375313
Ignatov, Andrey, et al. "AI Benchmark: All about deep learning on smartphones in 2019." 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019.
https://ieeexplore.ieee.org/abstract/document/9022101
Buch, Michael, et al. "AI tax in mobile SoCs: End-to-end performance analysis of machine learning in smartphones." 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 2021.
Weights & Biases
wandb.ai/site
TensorBoard
www.tensorflow.org/tensorboard
TensorBoard tutorial
https://www.tensorflow.org/tensorboard/get_started
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size." arXiv preprint arXiv:1602.07360 (2016).
https://arxiv.org/abs/1602.07360
Zhang, Xiangyu, et al. "ShuffleNet: An extremely efficient convolutional neural network for mobile devices." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
https://arxiv.org/abs/1704.04861
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
Gou, Jianping, et al. "Knowledge distillation: A survey." International Journal of Computer Vision 129.6 (2021): 1789-1819.
https://link.springer.com/article/10.1007/s11263-021-01453-z
Dive into Deep Learning. 14.2 - Computer Vision - Fine-Tuning (https://d2l.ai/chapter_computer-vision/fine-tuning.html)
MLCommons: https://mlcommons.org/en/
GPU architecture model: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Pascal (2016): https://developer.nvidia.com/pascal
Volta (2018): https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Turing (2019): https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf
Ampere (2020): https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html
Hopper (2022): https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepaper-hopper
ISSCC 2020 Tutorial: How to Evaluate Deep Neural Network Processors: https://eems.mit.edu/wp-content/uploads/2020/09/ieee_mssc_summer2020.pdf
Jacob, Benoit, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. 2017 (https://arxiv.org/abs/1712.05877)
Xiangxiang Chu, Liang Li, Bo Zhang. Make RepVGG Greater Again: A Quantization-aware Approach. 2023 (https://arxiv.org/pdf/2212.01593)
Angshuman Parashar, Yannan Nellie Wu, Po-An Tsai, Vivienne Sze, Joel S. Emer. Timeloop/Accelergy Tutorial: Tools for Evaluating Deep Neural Network Accelerator Designs. Tutorial (NVIDIA/MIT) (2020) (https://accelergy.mit.edu/tutorial.html)
Quantization PyTorch
https://pytorch.org/docs/stable/quantization.html
https://pytorch.org/tutorials/recipes/quantization.html
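The idea behind the quantization tooling linked above can be shown without any framework. The sketch below is a minimal pure-Python illustration of the affine (asymmetric) 8-bit scheme from Jacob et al., where a real value r is approximated as S * (q - Z) for a float scale S and integer zero-point Z; it is not the PyTorch API, just the underlying arithmetic.

```python
# Affine 8-bit quantization sketch: r ≈ S * (q - Z).
# Illustrative only; the PyTorch links above cover the real torch.quantization API.

def choose_qparams(xs, qmin=0, qmax=255):
    """Pick scale S and zero-point Z so [min(xs), max(xs)] maps onto [qmin, qmax]."""
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # range must include 0.0 exactly
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(xs, scale, zero_point, qmin=0, qmax=255):
    """Map floats to clamped integers in [qmin, qmax]."""
    return [min(qmax, max(qmin, round(x / scale + zero_point))) for x in xs]

def dequantize(qs, scale, zero_point):
    """Recover approximate floats from the integers."""
    return [scale * (q - zero_point) for q in qs]

weights = [-1.0, -0.5, 0.0, 0.25, 1.5]
s, z = choose_qparams(weights)
q = quantize(weights, s, z)
recovered = dequantize(q, s, z)
```

After the round trip, every weight is recovered to within one quantization step (the scale S), which is the error budget post-training quantization trades for int8 storage and arithmetic.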
Pruning PyTorch
https://pytorch.org/tutorials/intermediate/pruning_tutorial.html
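The pruning tutorial above is based on unstructured magnitude pruning, which can be summarized in a few lines of plain Python: zero out the fraction of weights with the smallest absolute value. This is a conceptual sketch, not the `torch.nn.utils.prune` API, which additionally stores a reusable mask alongside the original weights.

```python
# Unstructured magnitude pruning sketch: remove the `amount` fraction of
# weights with the smallest |w|. Illustrative only; see the tutorial above
# for the actual torch.nn.utils.prune workflow.

def magnitude_prune(weights, amount):
    """Return a pruned copy of `weights` and the binary mask (1 = kept)."""
    k = int(len(weights) * amount)  # number of weights to zero out
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in ranked[:k]:            # the k smallest-magnitude positions
        mask[i] = 0
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask

w = [0.7, -0.05, 0.4, 0.01, -0.9, 0.2]
pruned, mask = magnitude_prune(w, 0.5)  # zeros the 3 smallest-magnitude weights
```

The resulting zeros only pay off in latency when the runtime exploits sparsity, which is where libraries such as cuSPARSE (below) come in.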
NVIDIA cuSPARSE
https://docs.nvidia.com/cuda/cusparse/
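To make the cuSPARSE entry concrete, the sketch below shows the CSR (compressed sparse row) storage format and a sparse matrix-vector product, the kind of kernel cuSPARSE accelerates on GPU. It is a pure-Python illustration of the data layout, not cuSPARSE code.

```python
# CSR format + SpMV sketch. Illustrative only; cuSPARSE provides tuned
# GPU kernels over the same (values, col_idx, row_ptr) layout.

def to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero entries, row by row
                col_idx.append(j)  # their column indices
        row_ptr.append(len(values))  # where each row's nonzeros end
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x touching only the stored nonzeros."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[1.0, 0.0, 2.0],
     [0.0, 0.0, 3.0],
     [4.0, 5.0, 0.0]]
vals, cols, ptr = to_csr(A)
y = csr_matvec(vals, cols, ptr, [1.0, 2.0, 3.0])  # → [7.0, 9.0, 14.0]
```

Storage and compute both scale with the number of nonzeros rather than the full matrix size, which is why pruned models only gain speed on sparse-aware runtimes.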
Quantization/Pruning TensorFlow
https://www.tensorflow.org/model_optimization/guide/pruning/comprehensive_guide
https://www.tensorflow.org/lite/performance/post_training_quantization
Dive into Deep Learning. Computational Performance
https://d2l.ai/chapter_computational-performance/index.html
TorchScript
https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html
https://pytorch.org/tutorials/recipes/torchscript_inference.html
https://pytorch.org/tutorials/advanced/cpp_export.html
Docker: https://www.docker.com/
TF: https://www.tensorflow.org/
TFLITE: https://www.tensorflow.org/lite
ONNX: https://onnx.ai/
Course repository: https://github.com/TIC-13/luxai_ai_performance
Sze, Vivienne, et al. "Efficient processing of deep neural networks: A tutorial and survey." Proceedings of the IEEE 105.12 (2017): 2295-2329. https://ieeexplore.ieee.org/abstract/document/8114708
Zhang, Aston, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. Cambridge University Press. d2l.ai, 2023.