深海游弋的鱼 – 智障儿童欢乐多

Ubuntu 22.04.3: apt-get update fails with Segmentation fault (core dumped)

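Recently a machine shut down unexpectedly and rebooted. After the damaged disk files were repaired, it could boot into the system again, but running the update command failed with Segmentation fault (core dumped). The output below comes from a Chinese-locale system (命中 is apt's "Hit", and 错误：已到超时限制 means "Error: timeout reached"):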

$ sudo apt-get update
命中:1 http://security.ubuntu.com/ubuntu noble-security InRelease            
命中:2 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble InRelease            
命中:3 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble-updates InRelease
命中:4 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble-backports InRelease
错误：已到超时限制
Segmentation fault (core dumped)
正在读取软件包列表... 完成
E: Problem executing scripts APT::Update::Post-Invoke-Success 'if /usr/bin/test -w /var/lib/command-not-found/ -a -e /usr/lib/cnf-update-db; then /usr/lib/cnf-update-db > /dev/null; fi'
E: Sub-process returned an error code

The log points at the file /usr/lib/cnf-update-db. Inspecting it shows that it is a Python 3 script with the following content:

$ cat /usr/lib/cnf-update-db

#!/usr/bin/python3

import apt_pkg
import glob
import logging
import os
import sys

from CommandNotFound.db.creator import DbCreator
from CommandNotFound import CommandNotFound


if __name__ == "__main__":
    if "--debug" in sys.argv[1:]:
        logging.basicConfig(level=logging.DEBUG)
    elif "--verbose" in sys.argv[1:]:
        logging.basicConfig(level=logging.INFO)

    apt_pkg.init_config()
    db = CommandNotFound.dbpath
    if not os.access(os.path.dirname(db), os.W_OK):
        print("datbase directory %s not writable" % db)
        sys.exit(0)

    if apt_pkg.config.find_b("Acquire::IndexTargets::deb::CNF::DefaultEnabled", True):
        command_files = glob.glob("/var/lib/apt/lists/*Commands-*")
    else:
        command_files = glob.glob("/var/lib/apt/lists/*Contents*")
    if len(command_files) > 0:
        umask = os.umask(0o22)
        col = DbCreator(command_files)
        col.create(db)
        os.umask(umask)
    else:
        print("Could not find any command metadata")
        print("Please run 'apt update' before using this command.")

Running the script line by line shows that the interpreter crashes as soon as it reaches from CommandNotFound.db.creator import DbCreator.

Check the kernel log:

$ sudo dmesg

...........................
[14247.387532] python3[36129]: segfault at 0 ip 000071a2374c10f0 sp 00007ffc59f80c28 error 6 in libsqlite3.so.0.8.6[71a23744b000+10b000] likely on CPU 1 (core 1, socket 0)

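The kernel log shows that the crash happened inside libsqlite3, so either a database it opened is damaged or the installed package itself is broken. A damaged database file is the more likely cause, but which file is it?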
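To read such a kernel line more systematically, the fields can be pulled apart with a small script. The following is only a sketch: the regular expression is my own and is written against the x86-64 line format shown above, and the function name parse_segfault is hypothetical. It subtracts the mapping base in the square brackets from the instruction pointer to get the offset of the faulting instruction inside the mapped region of the library. As far as I know, error 6 on x86 means a user-mode write to a page that is not present, which together with address 0 suggests a NULL pointer write.

```python
import re

# Pattern written against the "segfault at ..." line format shown above
# (an assumption; other architectures and kernel versions may differ).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return process, library and in-mapping offset, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        "error_code": int(m.group("error")),
        # Offset of the faulting instruction inside the mapped region.
        "offset": ip - base,
    }


line = (
    "[14247.387532] python3[36129]: segfault at 0 ip 000071a2374c10f0 "
    "sp 00007ffc59f80c28 error 6 in libsqlite3.so.0.8.6[71a23744b000+10b000] "
    "likely on CPU 1 (core 1, socket 0)"
)
info = parse_segfault(line)
print(info["process"], info["library"], hex(info["offset"]))
# prints: python3 libsqlite3.so.0.8.6 0x760f0
```

The offset by itself does not name the broken file, but it confirms that the crash is inside the SQLite library rather than in the Python script.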

Let's try purging command-not-found so that it can be reinstalled:

$ sudo apt-get remove --purge command-not-found
正在读取软件包列表... 完成
正在分析软件包的依赖关系树... 完成
正在读取状态信息... 完成                 
下列软件包是自动安装的并且现在不需要了：
  python3-commandnotfound python3-gdbm
使用'sudo apt autoremove'来卸载它(它们)。
下列软件包将被【卸载】：
  command-not-found*
升级了 0 个软件包，新安装了 0 个软件包，要卸载 1 个软件包，有 0 个软件包未被升级。
解压缩后将会空出 29.7 kB 的空间。
您希望继续执行吗？ [Y/n] 
(正在读取数据库 ... 系统当前共安装有 228019 个文件和目录。)
正在卸载 command-not-found (23.04.0) ...
(正在读取数据库 ... 系统当前共安装有 228013 个文件和目录。)
正在清除 command-not-found (23.04.0) 的配置文件 ...
dpkg: 警告: 卸载 command-not-found 时，目录 /var/lib/command-not-found 非空，因而不会删除该目录
错误：已到超时限制

The problem remained. Could the database under /var/lib/command-not-found be the cause? Let's look at the database files:

$ ls /var/lib/command-not-found
commands.db  commands.db.metadata

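As we can see, this directory contains exactly the kind of database files that are read through libsqlite3. Reinstall the package, delete the contents of the directory, and reboot: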

$ sudo apt-get reinstall command-not-found

$ sudo rm -rf /var/lib/command-not-found/*

$ sudo reboot

Somewhat unexpectedly, this fixed the problem.
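In hindsight, the suspicion could have been checked before deleting anything. The sketch below assumes that commands.db is a SQLite file and uses SQLite's PRAGMA integrity_check through Python's standard sqlite3 module; the helper name check_sqlite_file is hypothetical, and the demo works on throwaway files rather than the real database. Because the original failure was a segfault inside libsqlite3 itself, such a check might crash in the same way on a badly damaged file, so it is safer to run it as a separate process.

```python
import os
import sqlite3
import tempfile


def check_sqlite_file(path):
    """Return "ok" for a healthy database, otherwise a short problem description."""
    try:
        # Open read-only so the check never modifies or creates the file.
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return f"error: {exc}"
    return "; ".join(str(row[0]) for row in rows)


# Demo on throwaway files: one valid database and one file of garbage bytes.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.db")
    conn = sqlite3.connect(good)
    conn.execute("CREATE TABLE commands (name TEXT)")
    conn.commit()
    conn.close()

    bad = os.path.join(tmp, "bad.db")
    with open(bad, "wb") as fh:
        fh.write(b"this is not a sqlite database" * 100)

    good_result = check_sqlite_file(good)
    bad_result = check_sqlite_file(bad)

print(good_result)
print(bad_result)
```

On the affected machine the call would be check_sqlite_file("/var/lib/command-not-found/commands.db"), run before the rm step.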

Ubuntu mirror cn.archive.ubuntu.com automatically redirects to the Tsinghua mirror

Updating the system on Ubuntu:

$ sudo apt-get update

After running it, the output was as follows:

命中:1 http://security.ubuntu.com/ubuntu noble-security InRelease            
命中:2 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble InRelease            
命中:3 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble-updates InRelease
命中:4 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble-backports InRelease

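I clearly remembered configuring cn.archive.ubuntu.com as the source, yet during the update the system went to the Tsinghua University Ubuntu mirror on its own.

The /etc/apt/sources.list on the system also lists cn.archive.ubuntu.com.

Puzzled, I opened https://cn.archive.ubuntu.com in a browser and found that the site now redirects to the Tsinghua mirror site.

If a firewall filters outbound traffic, pay special attention to this: the update now also needs access to the mirror that the redirect points to.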
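For the firewall case, it helps to know which hosts show up during an update that were never configured. Here is a minimal sketch, assuming the classic one-line sources.list format; the function names and the sample sources.list content are made up for illustration, while the apt output lines are the ones shown above.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")


def hosts_in(text):
    """Collect the host names of all http(s) URLs found in the text."""
    return {urlparse(url).hostname for url in URL_RE.findall(text)}


def unexpected_hosts(sources_list_text, apt_output_text):
    """Hosts seen in the apt output that sources.list never mentions."""
    configured = set()
    for line in sources_list_text.splitlines():
        line = line.strip()
        if line.startswith("deb"):  # skips comments and blank lines
            configured |= hosts_in(line)
    return sorted(hosts_in(apt_output_text) - configured)


# Hypothetical sources.list content, for illustration only.
SOURCES_LIST = """\
# main archive
deb http://cn.archive.ubuntu.com/ubuntu noble main restricted
deb http://security.ubuntu.com/ubuntu noble-security main restricted
"""

# Output lines copied from the update run above.
APT_OUTPUT = """\
命中:1 http://security.ubuntu.com/ubuntu noble-security InRelease
命中:2 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble InRelease
命中:3 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble-updates InRelease
命中:4 http://mirrors.tuna.tsinghua.edu.cn/ubuntu noble-backports InRelease
"""

extra = unexpected_hosts(SOURCES_LIST, APT_OUTPUT)
print(extra)  # ['mirrors.tuna.tsinghua.edu.cn']
```

Any host in the result is a candidate for an extra firewall allow rule, or a hint that sources.list should point at the mirror directly.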

Reference links

Ubuntu mirror cn.archive.ubuntu.com unavailable: cause analysis and fix

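Building headless background-service apps with Flutter: Isolate concurrency, advanced platform channels, and background task scheduling

1. When Flutter meets headless services

"That go-to tool for cross-platform UI can actually be used to write background services?" That was my reaction in 2023 when I saw Flutter's new background-execution capability on GitHub Trending. Traditional Flutter development is tightly bound to visual elements such as Material Design and the widget tree, but today we look at a completely different dimension: using Flutter to build background-service apps that need no user interface at all.

This model suits long-running task processing such as data synchronization, scheduled inspections, and message-queue consumption. Picture a phone app quietly backing up photos to the cloud in the background, or a smart-home gateway continuously processing sensor data; both are typical headless-service scenarios.

2. Basic technical architecture

2.1 Isolates in depth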

// Background data-processing service example (stack: Flutter 3.13/Dart 3.0)
import 'dart:async'; // Completer
import 'dart:isolate';

void backgroundService(SendPort mainSendPort) async {
  // Hand the main isolate a port it can use to send commands to this isolate.
  final receivePort = ReceivePort();
  mainSendPort.send(receivePort.sendPort);

  await for (final message in receivePort) {
    if (message == 'process_data') {
      // Simulate a time-consuming data-processing step
      final result = await _batchProcessData();
      mainSendPort.send(result);
    }
  }
}

Future<String> _batchProcessData() async {
  // Plug in the real data-processing logic here
  await Future.delayed(Duration(seconds: 3));
  return 'Processed 500 records at ${DateTime.now()}';
}

// Startup code for the main Isolate
void main() async {
  final mainReceivePort = ReceivePort();
  await Isolate.spawn(backgroundService, mainReceivePort.sendPort);

  final completer = Completer<SendPort>();
  mainReceivePort.listen((message) {
    if (message is SendPort) {
      // First message: the command port of the service (handshake)
      completer.complete(message);
    } else {
      print('Background service returned: $message');
    }
  });

  final serviceSendPort = await completer.future;
  serviceSendPort.send('process_data');
}

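This example shows how to create a separate Isolate for background data processing. Isolates do not share memory, so all communication goes through ReceivePort/SendPort message passing, which lets the main Isolate control the background task flexibly. Note that the listener tells the handshake SendPort apart from ordinary results with a plain is type check; the Dart 3 pattern-matching syntax is not actually needed for this.

2.2 Managing the background service lifecycle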

// Background service manager example
import 'dart:isolate';

class BackgroundServiceManager {
  static final _instance = BackgroundServiceManager._internal();
  Isolate? _serviceIsolate;
  SendPort? _servicePort;

  factory BackgroundServiceManager() => _instance;

  BackgroundServiceManager._internal();

  Future<void> startService() async {
    if (_serviceIsolate != null) return;

    final receivePort = ReceivePort();
    _serviceIsolate = await Isolate.spawn(
      _serviceEntry,
      receivePort.sendPort,
      debugName: 'BackgroundService',
    );

    // The first message is the command port of the service. Note that
    // `first` closes receivePort afterwards, so later replies (such as the
    // health status) need a port that stays open.
    _servicePort = await receivePort.first as SendPort;
  }

  static void _serviceEntry(SendPort sendPort) {
    // BackgroundServiceCore is not shown in this article; it is assumed to
    // be defined elsewhere and to provide getHealthStatus().
    final service = BackgroundServiceCore();
    final receivePort = ReceivePort();
    sendPort.send(receivePort.sendPort);

    receivePort.listen((message) {
      if (message == 'health_check') {
        sendPort.send(service.getHealthStatus());
      }
    });
  }
}

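This manager class implements the core features: singleton management of the service and a health check. The private named constructor behind the factory guarantees that only one manager exists, so the service is started at most once. Pay particular attention to error handling for the Isolate; in production code, add an error callback (for example through the onError parameter of Isolate.spawn).

3. Key techniques in depth

3.1 Advanced use of platform channels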

// Android background service binding example (requires a Kotlin counterpart)
import 'package:flutter/services.dart'; // MethodChannel, PlatformException

const _channel = MethodChannel('com.example/background');

Future<void> startAndroidForegroundService() async {
  try {
    await _channel.invokeMethod('startForegroundService', {
      'title': 'Data sync service',
      'content': 'Syncing user data...',
      'icon': 'ic_stat_sync',
    });
  } on PlatformException catch (e) {
    print('Failed to start the service: ${e.message}');
  }
}

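When a Foreground Service is needed on Android, the native API can be called through a platform channel. The example shows how to start a foreground service and pass the notification parameters. Background restrictions differ between Android versions, so consider combining this with WorkManager for compatibility.

3.2 Background task scheduling strategy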

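// Smart task scheduler implementation
import 'dart:async'; // Timer
import 'dart:isolate';

import 'package:collection/collection.dart'; // PriorityQueue

// TaskItem is not shown in this article; it is assumed to expose
// scheduleTime and execute(), and to be Comparable by scheduleTime.
class TaskScheduler {
  final _tasks = PriorityQueue<TaskItem>();
  Timer? _schedulerTimer;

  void scheduleTask(TaskItem task) {
    _tasks.add(task);
    _scheduleNext();
  }

  void _scheduleNext() {
    if (_tasks.isEmpty) return;

    final nextTask = _tasks.first;
    final now = DateTime.now();
    // A negative delay (task already overdue) fires the timer immediately.
    final delay = nextTask.scheduleTime.difference(now);

    _schedulerTimer?.cancel();
    _schedulerTimer = Timer(delay, () {
      _executeTask(nextTask);
      _tasks.remove(nextTask);
      _scheduleNext();
    });
  }

  void _executeTask(TaskItem task) {
    // The closure, and therefore task, must be sendable to another isolate.
    Isolate.run(() => task.execute());
  }
}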

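This task scheduler manages a priority queue and delays execution until each task is due, and Isolate.run keeps the concurrent execution simple. In real-world use, the execution strategy should also adapt to device state such as network and battery level.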
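The queue-and-delay logic is independent of Dart, so it can be tried out in isolation. The following Python sketch is not the Dart implementation above; it only mirrors the ordering and delay calculation using the standard heapq module, with plain numbers standing in for timestamps. The method names next_delay and pop_due are made up for this illustration.

```python
import heapq
import itertools


class TaskScheduler:
    """Keeps tasks ordered by due time and hands back the ones that are due."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, due_time, name):
        heapq.heappush(self._heap, (due_time, next(self._counter), name))

    def next_delay(self, now):
        """Seconds until the earliest task is due, clamped at zero; None if idle."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - now)

    def pop_due(self, now):
        """Remove and return the names of all tasks whose due time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


scheduler = TaskScheduler()
scheduler.schedule(30.0, "sync_photos")
scheduler.schedule(10.0, "health_check")
scheduler.schedule(10.0, "flush_cache")

first_delay = scheduler.next_delay(now=4.0)
due_tasks = scheduler.pop_due(now=12.0)
overdue_delay = scheduler.next_delay(now=40.0)

print(first_delay)    # 6.0
print(due_tasks)      # ['health_check', 'flush_cache']
print(overdue_delay)  # 0.0
```

Clamping the delay at zero corresponds to the overdue case in the Dart version, where the timer fires as soon as possible.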

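4. A typical application scenario

A case from an e-commerce app: its price-monitoring service has to fetch price data from 30 competitor websites every hour. The traditional approach ran on the server, but it performed poorly against dynamic anti-scraping measures. After switching to a headless Flutter service:

Execution is spread across devices, which lowers the risk of IP blocking
The client processes the data directly, which reduces server load
Tasks are cached automatically while offline and submitted in batches once the network is back
Private user data never leaves the device

Measured results show the data-collection success rate rising from 68% to 92% and server costs falling by 40%. The case illustrates the advantages of client-side computing.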

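5. Pros and cons of the approach

Advantages:

Development efficiency: reuses the existing Flutter codebase
Cross-platform consistency: one codebase covers Android and iOS
Resource usage: client-side computing takes load off the server
Privacy and security: sensitive data does not need to leave the user's device

Challenges:

Background execution time limits (iOS strictly limits it to 30 seconds)
Unpredictable device resources (battery level, network fluctuations)
Debugging is more complex than in traditional server-side development
Platform policy risk (abusing background services can get an app removed from the store)

A lesson from a finance app: it overused background location services on Android and was temporarily removed from Google Play. This is a reminder to design the trigger frequency and resource usage of background services with care.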

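6. Development checklist

Battery optimization: use JobScheduler on Android or BGTaskScheduler on iOS
Memory ceiling: an Android background process should preferably stay under 40 MB of memory
Platform policy red lines: read Section 4 of Apple's App Store Review Guidelines carefully
Failure circuit breaker: a task that fails 3 times in a row should go dormant
Local storage rules: give each Isolate its own dedicated storage area to avoid concurrent-access conflicts
Cross-version compatibility: prepare a fallback for the exact-alarm permission on Android 12+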
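The circuit-breaker item in the checklist above can be sketched in a few lines. This is a language-neutral illustration written in Python rather than Dart; the class name TaskBreaker and the 600-second cooldown are assumptions of mine, and only the threshold of three consecutive failures comes from the checklist.

```python
class TaskBreaker:
    """Puts a task to sleep after too many consecutive failures."""

    def __init__(self, max_failures=3, cooldown=600.0):
        self.max_failures = max_failures
        self.cooldown = cooldown  # seconds to stay dormant (assumed value)
        self._failures = 0
        self._sleep_until = 0.0

    def allow(self, now):
        """True if the task may run at time `now`."""
        return now >= self._sleep_until

    def record(self, success, now):
        """Update the failure streak after a run that finished at time `now`."""
        if success:
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.max_failures:
            self._sleep_until = now + self.cooldown
            self._failures = 0  # start a fresh streak after the cooldown


breaker = TaskBreaker()
for t in (0.0, 1.0, 2.0):          # three failed runs in a row
    breaker.record(success=False, now=t)

blocked = breaker.allow(now=3.0)     # still inside the cooldown window
resumed = breaker.allow(now=602.0)   # cooldown (2.0 + 600.0) has elapsed
print(blocked, resumed)  # False True
```

A successful run resets the streak, so only consecutive failures trip the breaker.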

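A smart-home app's experience: it designed a three-level fallback strategy for its background service (run immediately -> wait for charging -> wait for WiFi), which raised the device-command delivery rate from 79% to 98%.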

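7. Future directions

The Flutter Background Isolate framework that Google is working on is worth watching. It is expected to provide:

Unified task queue management
A cross-platform background wake-up mechanism
Interfaces for smart resource scheduling
An enhanced debugging toolchain

Early tests show that the new framework can cut background service startup time by 60% and memory usage by 35%. Keep an eye on updates in the Flutter Dev Channel to pick up new features as they land.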

Reference links

Building headless background-service apps with Flutter: Isolate concurrency, advanced platform channels, and background task scheduling

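openKylin v2.0 SP2: packages cannot be installed in the default mode

I have recently been trying out openKylin, a Chinese-developed distribution, and installing a package failed with the error "当前模式禁止执行（unpack）操作" (roughly: "the current mode forbids the unpack operation").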

Integrating a Webview in Flutter development for HarmonyOS

There are two main approaches.

Using a third-party library

Use the flutter_inappwebview plugin, configured in pubspec.yaml:

dependencies:

  # HarmonyOS
  path_provider_ohos:
    git:
      url: https://gitcode.com/openharmony-tpc/flutter_packages.git
      path: packages/path_provider/path_provider_ohos
  url_launcher_ohos:
    git:
      url: https://gitcode.com/openharmony-tpc/flutter_packages.git
      path: packages/url_launcher/url_launcher_ohos

  ## WebView with HarmonyOS support; Windows is supported from version 6.x onward
  flutter_inappwebview:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview

dependency_overrides:
  # WebView with HarmonyOS support; Windows is supported from version 6.x onward
  flutter_inappwebview_platform_interface:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview_platform_interface
  flutter_inappwebview_android:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview_android
  flutter_inappwebview_ios:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview_ios
  flutter_inappwebview_macos:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview_macos
  flutter_inappwebview_web:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview_web
  flutter_inappwebview_windows:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview_windows
  flutter_inappwebview_ohos:
    git:
      url: https://gitcode.com/openharmony-sig/flutter_inappwebview.git
      ref: br_v6.1.5_ohos
      path: flutter_inappwebview_ohos

Alternatively, use the fluttertpc_flutter_webview_plugin plugin, configured in pubspec.yaml:

dependencies:
  flutter_webview_plugin:
    git:
      url: https://gitcode.com/openharmony-sig/fluttertpc_flutter_webview_plugin

  flutter_webview_plugin_ohos:
    git:
      url: https://gitcode.com/openharmony-sig/fluttertpc_flutter_webview_plugin
      path: ohos

Writing native ArkTS code to implement a PlatformView

Create the entryability

Configure the ability in src/main/module.json5:

    "abilities": [
      {
        "name": "EntryAbility",
        "srcEntry": "./ets/entryability/EntryAbility.ets",
        "description": "$string:EntryAbility_desc",
        "icon": "$media:icon",
        "label": "$string:EntryAbility_label",
        "startWindowIcon": "$media:icon",
        "startWindowBackground": "$color:start_window_background",
        "exported": true,
        "skills": [
          {
            "entities": [
              "entity.system.home"
            ],
            "actions": [
              "action.system.home"
            ]
          }
        ],
        "orientation": "landscape"
      }
    ],

cat src/main/ets/entryability/CustomFactory.ets

import { BinaryMessenger } from '@ohos/flutter_ohos/src/main/ets/plugin/common/BinaryMessenger';
import MessageCodec from '@ohos/flutter_ohos/src/main/ets/plugin/common/MessageCodec';
import PlatformViewFactory from '@ohos/flutter_ohos/src/main/ets/plugin/platform/PlatformViewFactory';
import { CustomView } from './CustomView';
import common from '@ohos.app.ability.common';
import PlatformView from '@ohos/flutter_ohos/src/main/ets/plugin/platform/PlatformView';

export class CustomFactory extends PlatformViewFactory {
  message: BinaryMessenger;

  constructor(message: BinaryMessenger, createArgsCodes: MessageCodec<Object>) {
    super(createArgsCodes);
    this.message = message;
  }

  public create(context: common.Context, viewId: number, args: Object): PlatformView {
    return new CustomView(context, viewId, args, this.message);
  }
}

cat src/main/ets/entryability/CustomPlugin.ets

import  { FlutterPlugin,
  FlutterPluginBinding } from '@ohos/flutter_ohos/src/main/ets/embedding/engine/plugins/FlutterPlugin';
import StandardMessageCodec from '@ohos/flutter_ohos/src/main/ets/plugin/common/StandardMessageCodec';
import { CustomFactory } from './CustomFactory';

export class CustomPlugin implements FlutterPlugin {
  getUniqueClassName(): string {
    return 'CustomPlugin';
  }

  onAttachedToEngine(binding: FlutterPluginBinding): void {
    binding.getPlatformViewRegistry()?.
    registerViewFactory('com.rex.custom.ohos/customView', new CustomFactory(binding.getBinaryMessenger(), StandardMessageCodec.INSTANCE));
  }

  onDetachedFromEngine(binding: FlutterPluginBinding): void {}
}

cat src/main/ets/entryability/CustomView.ets

import MethodChannel, {
  MethodCallHandler,
  MethodResult
} from '@ohos/flutter_ohos/src/main/ets/plugin/common/MethodChannel';
import PlatformView, { Params } from '@ohos/flutter_ohos/src/main/ets/plugin/platform/PlatformView';
import common from '@ohos.app.ability.common';
import { BinaryMessenger } from '@ohos/flutter_ohos/src/main/ets/plugin/common/BinaryMessenger';
import StandardMethodCodec from '@ohos/flutter_ohos/src/main/ets/plugin/common/StandardMethodCodec';
import MethodCall from '@ohos/flutter_ohos/src/main/ets/plugin/common/MethodCall';
import { webview } from '@kit.ArkWeb';

@Component
struct ButtonComponent {
  @Prop params: Params
  customView: CustomView = this.params.platformView as CustomView
  @StorageLink('numValue') storageLink: string = "first"
  @State bkColor: Color = Color.Red

  private webviewController: WebviewController = new webview.WebviewController()

  build() {
    Web({src: 'https://bing.com', controller: this.webviewController})
      .domStorageAccess(true)
      .fileAccess(true)
      .mixedMode(MixedMode.All)
      .databaseAccess(true)
      .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36")
  }
}

@Builder
function ButtonBuilder(params: Params) {
  ButtonComponent({ params: params })
    .backgroundColor(Color.Transparent)
}

AppStorage.setOrCreate('numValue', 'test')

@Observed
export class CustomView extends PlatformView implements MethodCallHandler {
  numValue: string = "test";

  methodChannel: MethodChannel;
  index: number = 1;

  constructor(context: common.Context, viewId: number, args: ESObject, message: BinaryMessenger) {
    super();
    // Register the message channel
    this.methodChannel = new MethodChannel(message, `com.rex.custom.ohos/customView${viewId}`, StandardMethodCodec.INSTANCE);
    this.methodChannel.setMethodCallHandler(this);
  }

  onMethodCall(call: MethodCall, result: MethodResult): void {
    // Receive messages sent from the Dart side
    let method: string = call.method;
    let link1: SubscribedAbstractProperty<string> = AppStorage.link('numValue');
    switch (method) {
      case 'getMessageFromFlutterView':
        let value: ESObject = call.args;
        this.numValue = value;
        link1.set(value)
        console.log("nodeController receive message from dart: " + this.numValue);
        result.success(true);
        break;
    }
  }

  public sendMessage = () => {
    console.log("nodeController sendMessage")
    // Send a message to the Dart side
    this.methodChannel.invokeMethod('getMessageFromOhosView', 'native - ' + this.index++);
  }

  getView(): WrappedBuilder<[Params]> {
    return new WrappedBuilder(ButtonBuilder);
  }

  dispose(): void {
  }
}

cat src/main/ets/entryability/EntryAbility.ets

import { FlutterAbility, FlutterEngine } from '@ohos/flutter_ohos';
import { GeneratedPluginRegistrant } from '../plugins/GeneratedPluginRegistrant';
import { CustomPlugin } from './CustomPlugin';

export default class EntryAbility extends FlutterAbility {
  configureFlutterEngine(flutterEngine: FlutterEngine) {
    super.configureFlutterEngine(flutterEngine)
    GeneratedPluginRegistrant.registerWith(flutterEngine)
    this.addPlugin(new CustomPlugin())
  }
}

Create the pages

cat src/main/ets/pages/Index.ets

import common from '@ohos.app.ability.common';
import { FlutterPage } from '@ohos/flutter_ohos'

let storage = LocalStorage.getShared()
const EVENT_BACK_PRESS = 'EVENT_BACK_PRESS'

@Entry(storage)
@Component
struct Index {
  private context = getContext(this) as common.UIAbilityContext
  @LocalStorageLink('viewId') viewId: string = "";

  build() {
    Column() {
      FlutterPage({ viewId: this.viewId })
    }
  }

  onBackPress(): boolean {
    this.context.eventHub.emit(EVENT_BACK_PRESS)
    return true
  }
}

Configure the route in src/main/resources/base/profile/main_page.json:

{
  "src": [
    "pages/Index"
  ]
}

Use the PlatformView on the Dart side

Scaffold(
  appBar: AppBar(title: Text('code')),
  body: OhosView(
    viewType: 'com.rex.custom.ohos/customView',
    // onPlatformViewCreated: _onPlatformViewCreated,
    creationParams: const <String, dynamic>{'initParams': 'hello world'},
    creationParamsCodec: const StandardMessageCodec(),
  ),
)

Reference links

HarmonyOS Flutter in practice: 03 - Integrating a Webview in Flutter development for HarmonyOS

How to disable fingerprint authentication when the laptop lid is closed (Ubuntu 24.04)?

To disable fingerprint authentication when the laptop lid is closed, and re-enable it when the lid is reopened, we will use acpid to bind the button/lid.* event to a custom script that stops and masks the fprintd service on lid close, and unmasks and starts the fprintd service on lid open.

We also check that the HDMI cable is connected by testing the contents of /sys/class/drm/card1-HDMI-A-1/status.

Follow the steps below (ThinkPad T440, Ubuntu 22.04):

Create file /etc/acpi/laptop-lid.sh with the following contents:

#!/bin/bash

lock=$HOME/fprint-disabled

if grep -Fq closed /proc/acpi/button/lid/LID/state &&
   grep -Fxq connected /sys/class/drm/card1-HDMI-A-1/status
then
  touch "$lock"
  systemctl stop fprintd
  systemctl mask fprintd
elif [ -f "$lock" ]
then
  systemctl unmask fprintd
  systemctl start fprintd
  rm "$lock"
fi

Make the file executable with

chmod +x /etc/acpi/laptop-lid.sh

Create file /etc/acpi/events/laptop-lid with the following contents:

event=button/lid.*
action=/etc/acpi/laptop-lid.sh

Restart the acpid service with:

sudo service acpid restart

Now the fingerprint will be used only when the lid is open.

To restore the correct state of the fprintd service if you disconnect or reconnect the HDMI cable while the laptop is off, you can also call the above script from a systemd unit file. The steps are the following:

Create a file named /etc/systemd/system/laptop-lid.service with the following contents:

[Unit]
Description=Laptop Lid
After=suspend.target

[Service]
ExecStart=/etc/acpi/laptop-lid.sh

[Install]
WantedBy=multi-user.target
WantedBy=suspend.target

Reload the systemd config files with

sudo systemctl daemon-reload

Start the service with

sudo systemctl start laptop-lid.service

Enable the service so that it starts automatically on boot

sudo systemctl enable laptop-lid.service

Now the status should be correct even after connecting/disconnecting when the computer is off.

References used for creating the code in the answer:
Reference links

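Learning Large Models from Scratch (4): the Transformer's "Internal Gears": How FFN, Residual Connections, and Normalization Make AI Smarter

If the Transformer is a precision machine, the attention mechanism is its core engine, and the feed-forward network (FFN), residual connections, and normalization are the internal gears that keep the engine running efficiently. These modules look simple, yet they address two core problems of deep learning, insufficient feature extraction and unstable training, and they are key to a large language model's ability to understand language and generate text.

Feed-forward network (FFN): refining the attention output

Attention captures relations between words (for example, that "it" refers to "the dog"), but the resulting feature vectors need further processing before the model can use them well. The FFN applies a non-linear transformation to the attention output and refines its features, much like a chef turning fresh ingredients (the attention output) into a finished dish (usable features).

Core structure of the FFN: two linear transformations plus an activation function

The FFN in a Transformer is very compact and usually consists of two steps.

The first step is a linear transformation (Linear1) that expands the input vector to a higher dimension (for example, from 512 to 2048). This enlarges the feature space, like viewing an object through a higher-resolution lens that captures more detail (for "dog": not only "animal" but also finer features such as "mammal" and "pet"). An activation function (such as ReLU) then introduces non-linearity: a linear transformation alone can only learn simple relations (such as "dog → animal"), whereas a non-linear one can learn more complex associations (such as "dog → pet → needs feeding").

The second step is another linear transformation (Linear2) that projects the high-dimensional vector back to the original dimension (for example, from 2048 to 512). This is feature aggregation: the expanded details are recombined into a more compact representation.

Take "The cat chases the dog, it runs fast" as an example. Attention has already computed the link between "it" and "the dog" and outputs a vector carrying that relation. The FFN expands the features with the first linear transformation (details such as the dog's "ability to run" or "being chased"), emphasizes the key features through the activation function (such as "ability to run"), and finally compresses them into a more useful vector.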
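The expand, activate, project-back structure can be sketched in a few lines of plain Python. This is only a toy illustration: the sizes (2 → 4 → 2) and the hand-picked weights `W1`, `B1`, `W2`, `B2` are assumptions chosen so the result is easy to check by hand, not values from any real model.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(weight, bias, x):
    # weight has shape (out_dim, in_dim); returns weight @ x + bias
    return [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(weight, bias)]

def ffn(x, w1, b1, w2, b2):
    hidden = relu(linear(w1, b1, x))  # expand: d_model -> d_ff, then non-linearity
    return linear(w2, b2, hidden)     # project back: d_ff -> d_model

# toy sizes: d_model = 2, d_ff = 4 (the text's example is 512 -> 2048)
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B1 = [0.0, 0.0, 0.0, 0.0]
W2 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
B2 = [0.0, 0.0]

y = ffn([3.0, -2.0], W1, B1, W2, B2)
print(y)  # [3.0, 2.0]
```

With these weights the FFN computes the element-wise absolute value, a function that no single linear layer can represent. That is the practical meaning of "the activation introduces non-linearity".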

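Why is the FFN the ideal partner for attention?

Attention is good at capturing relations but weak at transforming features: its output is essentially a relation-weighted sum, which is a fairly coarse representation. The FFN's strength is refining features. It adds non-linearity, so the model can learn complex semantics (such as metaphor and logical reasoning). It focuses on key features, strengthening important ones (such as the link between "run" and "dog") and weakening noise through dimension expansion and compression. And it supplements local features: attention looks at global relations, while the FFN can capture local patterns (such as the pairing of "run" and "fast" in "runs fast"). Put vividly, attention is the scout (finding relevant information) and the FFN is the analyst (distilling what is useful).

Activation functions: giving the FFN its non-linear power

The activation function is the soul of the FFN. Without it the FFN degenerates into a single linear map (two stacked linear transformations are equivalent to one) and cannot learn complex features. ReLU (Rectified Linear Unit) is the choice of the original Transformer paper: ReLU(x) = max(0, x), so negative inputs give 0 and positive inputs pass through unchanged. It is cheap to compute and mitigates the vanishing-gradient problem of earlier Sigmoid networks, but it suffers from the "dying ReLU" problem: a neuron whose input stays negative outputs 0, receives zero gradient, and can stop learning permanently.

GELU (Gaussian Error Linear Unit) is the improved choice used by BERT and the GPT family. It is commonly approximated as 0.5x(1 + tanh(√(2/π)(x + 0.044715x³))). It is smoother than ReLU (it does not cut off abruptly at 0) and keeps more intermediate information (different strengths of "run" produce slightly different outputs), which suits models that need fine-grained features (such as BERT for text understanding and GPT for generation).

SwiGLU (Swish-Gated Linear Unit) is the mainstream choice in recent large models such as LLaMA. It is defined as SwiGLU(x) = Swish(xW) ⊗ (xV): one linear projection of the input passes through Swish, where Swish(z) = z · sigmoid(βz) and β = 1 gives SiLU, and acts as a gate that is multiplied element-wise with a second linear projection. This gating mechanism filters features dynamically (activating useful ones, suppressing irrelevant ones) and is more flexible than GELU. The source article reports that in models above 10 billion parameters it noticeably improves generation coherence and reasoning.

The choice of activation roughly follows the pattern "the larger the model, the more flexible the activation": ReLU is efficient enough for small models, while large models benefit from the finer control of SwiGLU.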
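To make the three formulas concrete, here is a small scalar sketch using only the standard library. The function names and the two-argument form of `swiglu` (taking the already-projected gate and value) are illustrative assumptions; in a real FFN both arguments come from separate linear projections of the same input.

```python
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    # tanh approximation quoted in the text
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def silu(x):
    # Swish with beta = 1: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu(gate, value):
    # gate and value stand for two linear projections of the same input
    return silu(gate) * value

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, relu(x), round(gelu(x), 4), round(silu(x), 4))
```

For negative inputs ReLU returns exactly 0, while GELU and SiLU return small negative values, which is the "smoother" behavior described above.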

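Residual connections: the bridge that lets a model be deep without collapsing

In deep learning, depth (the number of layers) is key to performance. But beyond a certain depth, traditional networks suffer from vanishing gradients (parameters barely update during training) and degradation (accuracy drops as layers are added). The residual connection largely solved this problem and lets Transformers stack dozens or even hundreds of layers.

Core idea: a skip connection carries the original information

The structure of a residual connection is extremely simple: add the module's input to its output. In the attention module, for example, the output is the attention result plus the original input. An analogy helps: in a traditional network, information moves like a relay race, where every layer must pass it on perfectly or later layers lose it. With residual connections, information moves on a two-lane road: one lane goes through the module (such as attention), the other carries the original information straight through. Even if the module loses something, the original information still reaches the deeper layers through the direct lane.

Why do residual connections address vanishing gradients?

During training, parameter updates depend on the gradient (the derivative of the loss with respect to the parameters).

In a traditional network the gradient has to pass through every layer, and the more layers there are, the more it decays (like sound fading in a long pipe). A residual connection gives the gradient a shortcut. For a block y = x + F(x), the chain rule gives ∂L/∂x = ∂L/∂y · (1 + ∂F/∂x): the identity term 1 passes the upstream gradient through unchanged, in addition to whatever flows through the module F. The gradient therefore does not vanish just because ∂F/∂x is small, and deep parameters can still be updated effectively. For example, in a 100-layer Transformer without residual connections the gradient may decay to nearly 0 by the time it reaches the first layers, so their parameters barely update; with residual connections the gradient travels along the identity path through all 100 layers, so every layer's parameters update normally.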
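The effect of the identity term can be checked numerically with a deliberately simplified model: each layer is a scalar map with local slope ∂F/∂x = 0.01, so the end-to-end gradient is just a product of per-layer factors. The slope value, the depth of 100, and the helper name `gradient_through_stack` are assumptions for illustration only.

```python
def gradient_through_stack(local_slopes, residual):
    """Product of per-layer derivatives for a stack of scalar layers.

    plain layer:    y = F(x)      -> factor dF/dx
    residual layer: y = x + F(x)  -> factor 1 + dF/dx
    """
    grad = 1.0
    for slope in local_slopes:
        grad *= (1.0 + slope) if residual else slope
    return grad

slopes = [0.01] * 100  # 100 layers, each with a small local slope
plain = gradient_through_stack(slopes, residual=False)
skip = gradient_through_stack(slopes, residual=True)
print(plain)  # about 1e-200: effectively vanished
print(skip)   # about 2.7: still usable
```

Real layers are not scalar and their slopes vary, so this only shows the mechanism: without the skip path the factors multiply toward zero, with it each factor stays near 1.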

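Normalization: the calibration tool that keeps training steady

In deep learning, the numeric range of input vectors can fluctuate wildly (some word-vector values lie in 0-1, others in 100-200). This numeric instability makes training oscillate (the loss jumps up and down) or even fail to converge. Normalization standardizes vectors to a fixed range (for example mean 0, variance 1), like calibrating the data so that the model always processes inputs in the expected range.

The normalization most used in Transformers is layer normalization (Layer Norm, LN), alongside alternatives such as batch normalization (Batch Norm, BN) and RMSNorm. Understanding how they differ explains why LN became the mainstream choice in NLP.

LN and BN: two approaches to normalization

LN and BN share the same goal (standardizing values) but normalize over different ranges. Layer normalization (LN) normalizes across all features within a single sample (for example, one 512-dimensional vector): for each sample it computes the mean and variance of that sample's own features. Batch normalization (BN) normalizes one feature dimension across all samples in a batch (for example, the same feature dimension of 32 sentences): for each feature dimension it computes the mean and variance over the batch.

Why LN for text and BN for images? Text has poor batch consistency: within one batch, sentence length and meaning vary widely (some are news, some are poetry), so BN's batch mean carries little meaning, whereas LN normalizes each sample on its own and is unaffected by the batch. Images have strong feature consistency: images in one batch (such as pictures of cats) have similar value distributions at the same feature position (such as edge features), and BN can exploit that consistency.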
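The difference in "range" is easiest to see on a tiny batch. The sketch below is a plain-Python illustration without learnable scale and shift parameters; the batch values and function names are assumptions chosen for readability.

```python
import math

def normalize(values, eps=1e-5):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def layer_norm(batch):
    # statistics over the features of each sample (each row)
    return [normalize(row) for row in batch]

def batch_norm(batch):
    # statistics over the samples of each feature (each column)
    cols = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*cols)]

batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
ln = layer_norm(batch)
bn = batch_norm(batch)

# replace the second sample: LN output for the first sample is unchanged,
# BN output for the first sample changes because the batch statistics changed
other = [[1.0, 2.0, 3.0], [0.0, 0.0, 100.0]]
ln_other = layer_norm(other)
bn_other = batch_norm(other)
```

This is the batch-independence property the text relies on: under LN a sentence's representation does not depend on what else happens to be in the batch.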

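In the original Transformer, LN directly follows the residual connection, forming a "residual then normalize" combination (the output is LN(attention output + input)). This combination standardizes the values while the residual keeps the original information.

Pre-Norm and Post-Norm: choosing when to normalize

Within a Transformer layer, normalization can be placed before the module (attention or FFN), called Pre-Norm, or after it, called Post-Norm. The two designs differ considerably in training stability. Post-Norm is the choice of the original Transformer: compute the module and the residual first, then normalize. Its weakness is that the module output can fluctuate strongly (attention dot products can be large, for instance), and normalizing only after the residual addition can still leave training unstable, especially in deep models.

Pre-Norm is the choice of modern large models (such as GPT-2 and later GPT models, and LLaMA): normalize the input first, then compute the module and add the residual. The advantage is that the module always sees a stable input (mean 0, variance 1), so numeric blow-ups are less likely, training is more stable, and deeper stacks (100 layers or more) become feasible. As a rule of thumb from the source article, Post-Norm behaves well up to about 12 layers but its training loss tends to oscillate beyond about 24 layers, whereas Pre-Norm still trains smoothly at 100 layers. This is the main reason large models generally adopt Pre-Norm.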
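The two layouts differ only in where the normalization call sits. The sketch below uses a parameter-free layer norm and a hypothetical `silent_sublayer` (a stand-in for attention or the FFN that outputs zeros) to show one structural consequence: in Pre-Norm the residual path carries x through untouched, while in Post-Norm even the skip path is rescaled.

```python
import math

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def post_norm_block(x, sublayer):
    # original Transformer: normalize after the residual addition
    summed = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(summed)

def pre_norm_block(x, sublayer):
    # Pre-Norm: normalize only the sublayer input, keep the skip path raw
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def silent_sublayer(x):
    # hypothetical sublayer that contributes nothing
    return [0.0 for _ in x]

x = [100.0, 200.0, 300.0]
pre_out = pre_norm_block(x, silent_sublayer)    # identical to x
post_out = post_norm_block(x, silent_sublayer)  # normalized copy of x
print(pre_out, post_out)
```

An untouched identity path is what keeps the gradient shortcut clean across many stacked layers, which is consistent with the stability argument above.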

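Lightweight normalization variants: RMSNorm and ScaleNorm

LN is stable, but computing both mean and variance has a cost, so researchers have proposed cheaper variants. RMSNorm (Root Mean Square Layer Normalization) is used by models such as LLaMA. It drops the mean computation and normalizes only by the root mean square, which makes it cheaper than LN (the source article cites about 20% less computation, since no mean subtraction is needed) while performing close to LN in language models. The rationale given is that the mean of text features is usually close to 0, so dropping the mean changes the result very little.

ScaleNorm is a further simplification that normalizes by the vector's L2 norm with a single learned scale. It is even simpler to compute and suits resource-constrained settings, but it is more sensitive to the input distribution; the source article notes that it works well in small models.

The common idea behind these variants is to reduce computation while preserving stability. For large models, an efficiency gain in every layer adds up to a significant advantage.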
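The claim "dropping the mean matters little when the mean is near 0" can be verified directly. This is a minimal sketch without the learned gain; `centered` and `shifted` are made-up inputs.

```python
import math

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    # no mean subtraction: divide by the root mean square only
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]

centered = [-2.0, 0.0, 2.0]  # mean is exactly 0
shifted = [8.0, 10.0, 12.0]  # same spread, mean is 10

print(layer_norm(centered), rms_norm(centered))  # identical
print(layer_norm(shifted), rms_norm(shifted))    # clearly different
```

For zero-mean input the variance equals the mean square, so the two functions coincide; for shifted input RMSNorm keeps the offset in its output, which is the trade-off it makes for skipping one reduction.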

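How the modules work together: the Transformer's pipeline design

The FFN, residual connections, and normalization do not exist in isolation. Inside a Transformer layer they form a pipeline that processes features together.

Taking an encoder layer as an example, the full flow is as follows. The layer receives the feature vector output by the previous layer. It applies pre-normalization to obtain a standardized input (normalize first, so the input is stable). The multi-head attention module then computes the attention output (attention captures relations). A residual connection adds the attention output to the original input (keeping the original information and avoiding feature loss). Pre-normalization is applied again to give the FFN a stable input. The FFN refines the features with the SwiGLU activation and linear transformations. A final residual connection produces an output that combines the attention and FFN features.

The elegance of this flow is that normalization keeps the input of every step stable and avoids numeric swings, residual connections give information a fallback path so it still propagates in deep layers, and the FFN keeps refining features on that stable footing. It works like a factory line: normalization is quality calibration, the residual connection is the bypass channel, and the FFN is precision machining. Together they let the Transformer learn language patterns stably and efficiently.

Module choices in different models: balancing efficiency and performance

A model's choice of FFN activation, residual layout, and normalization reflects a balance between task requirements, model size, and compute budget. Recent large decoder models in the LLaMA style use SwiGLU as the FFN activation, RMSNorm for normalization, and a Pre-Norm layout: large models need fine-grained features and stability, SwiGLU improves expressiveness, RMSNorm is efficient, and Pre-Norm supports deep stacks. (The source article also lists GPT-4 here, but GPT-4's architecture has not been published.)

Open models such as LLaMA 2 make the same choices (SwiGLU, RMSNorm, Pre-Norm): they must balance quality and efficiency, and RMSNorm reduces computation, which helps deployment. Understanding-focused models such as BERT use the GELU activation with LN in the original Post-Norm layout: understanding tasks benefit from smooth features, GELU is finer-grained than ReLU, and LN is stable enough at that depth.

Lightweight models such as MobileBERT use ReLU as the activation and replace layer normalization with a cheaper element-wise linear transform (called NoNorm in the MobileBERT paper): on mobile devices efficiency is paramount, and these choices keep computation to a minimum.

Conclusion: the deep-learning philosophy that details decide performance

The FFN, residual connections, and normalization look like auxiliary components, yet they determine how deep a Transformer can go and how fast it can run. Their evolution illustrates a core lesson of deep learning: a large model's ability comes not only from scale (parameters and data) but also from detailed design, that is, how to make every layer more stable and every computation more effective.

From ReLU to SwiGLU, from Post-Norm to Pre-Norm, from LN to RMSNorm, these small improvements add up, taking models from "trainable at 12 layers" to "trainable at 100 layers" and from stiff generated text to fluent writing. As models keep growing, optimizing these internal gears will remain essential. After all, what holds up hundreds of billions of parameters is never the grand architecture alone but every precise detail.

When we marvel at an AI's language ability, it is worth remembering that what makes it smart is not only the focusing of the attention mechanism but also these modules quietly refining, carrying, and calibrating in the background.

Reference links

从零学习大模型（4）——Transformer 的 “内部齿轮”：FFN、残差连接与归一化如何让 AI 更聪明？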

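Reading the llama2.c source code

1. Overview

Andrej Karpathy, a well-known former OpenAI engineer, open-sourced the llama2.c project, a C implementation of Llama 2 inference in roughly 970 lines of C. The code is concise and efficient and well worth reading closely; it is a great help in understanding the details of large-model inference.

2. Reading the source

2.1 Basic building blocks

The RMS normalization formula is: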

$$ o_i = w_i \times x_i \times \frac {1}{\sqrt{\frac{1}{n}\sum_{j=0}^{n-1} x_j^2 + \epsilon}} $$

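Here $\epsilon$ is a small constant that keeps the denominator away from zero (1e-5 in the code). The RMS factor normalizes x, and w is the gain that rescales the normalized input vector.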

// ----------------------------------------------------------------------------
// neural net blocks; the dynamics of the Transformer
void rmsnorm(float* o, float* x, float* weight, int size) {
    // calculate sum of squares
    float ss = 0.0f;
    for (int j = 0; j < size; j++) {
        ss += x[j] * x[j];
    }
    ss /= size;
    ss += 1e-5f;
    ss = 1.0f / sqrtf(ss);
    // normalize and scale
    for (int j = 0; j < size; j++) {
        o[j] = weight[j] * (ss * x[j]);
    }
}

The softmax formula is:

$$ o_i = \frac {e^{x_i-x_{max}}}{\sum_{j=0}^{n-1} e^{x_j-x_{max}}} $$

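The code is below. As its comment says, subtracting the maximum is for numerical stability: it prevents overflow in expf. The factor $e^{-x_{max}}$ appears in both the numerator and the denominator and cancels, so the result is unchanged.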

void softmax(float* x, int size) {
    // find max value (for numerical stability)
    float max_val = x[0];
    for (int i = 1; i < size; i++) {
        if (x[i] > max_val) {
            max_val = x[i];
        }
    }
    // exp and sum
    float sum = 0.0f;
    for (int i = 0; i < size; i++) {
        x[i] = expf(x[i] - max_val);
        sum += x[i];
    }
    // normalize
    for (int i = 0; i < size; i++) {
        x[i] /= sum;
    }
}
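The effect of the max subtraction is easy to reproduce outside C. The following Python sketch compares a naive softmax with the shifted version; the function names and test vectors are illustrative assumptions. Python's `math.exp` raises `OverflowError` for very large arguments, whereas C's `expf` would return infinity and the division would then yield NaN.

```python
import math

def softmax_naive(x):
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(x):
    m = max(x)  # shift so the largest exponent is exp(0) = 1
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]  # same differences, shifted by 999

same_on_small = all(
    abs(a - b) < 1e-12 for a, b in zip(softmax_naive(small), softmax_stable(small))
)

try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable_large = softmax_stable(large)
```

Because softmax depends only on the differences between inputs, `stable_large` equals the result for `small`, while the naive version cannot even evaluate `large`.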

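The matrix-vector product W (d,n) @ x (n,) -> xout (d,) uses the naive algorithm: the outer loop runs over rows and the inner loop over columns. The code is as follows: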

void matmul(float* xout, float* x, float* w, int n, int d) {
    // W (d,n) @ x (n,) -> xout (d,)
    // by far the most amount of time is spent inside this little function
    int i;
    #pragma omp parallel for private(i)
    for (i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}

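2.2 Forward computation

The computation of one attention block in the model is shown in the figure below:

The project computes Q, K, and V one token at a time. The parameter dim is the Transformer's vector dimension, and l is the layer index.

The first step is rmsnorm, that is, normalization. The input is x (d,), the RMS weight vector is w->rms_att_weight + l*dim, and the result is written to s->xb (d,).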

// attention rmsnorm
rmsnorm(s->xb, x, w->rms_att_weight + l*dim, dim);

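The second step is the Q, K, V matrix multiplications. Note the difference between kv_dim and dim, which exists so that the same code supports both multi-head attention and grouped-query attention, as shown in the figure below:

kv_dim is the total dimension of the keys and values, and dim is the total vector dimension of the Transformer. In multi-head attention, kv_dim = dim. In grouped-query attention, kv_dim = dim * n_kv_heads / n_heads. In the figure's example, n_kv_heads = 4 and n_heads = 8, so kv_dim = dim / 2.

For the dimensions of each matrix and how they relate in MHA, GQA, and similar schemes, see the figure below:

The detailed code that computes the Q, K, and V vectors is below: Wq (d,d) @ xb (d,) -> q (d,), Wk (dkv,d) @ xb (d,) -> k (dkv,), Wv (dkv,d) @ xb (d,) -> v (dkv,).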

// key and value point to the kv cache
int loff = l * p->seq_len * kv_dim; // kv cache layer offset for convenience
s->k = s->key_cache + loff + pos * kv_dim;
s->v = s->value_cache + loff + pos * kv_dim;

// qkv matmuls for this position
matmul(s->q, s->xb, w->wq + l*dim*dim, dim, dim);
matmul(s->k, s->xb, w->wk + l*dim*kv_dim, dim, kv_dim);
matmul(s->v, s->xb, w->wv + l*dim*kv_dim, dim, kv_dim);
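The dimension bookkeeping can be worked through with concrete numbers. The Python sketch below is an illustration, not part of llama2.c: the helper name `gqa_layout` is hypothetical, and `kv_mul = n_heads / n_kv_heads` is an assumption consistent with how `h / kv_mul` is used in the attention loop later in this walkthrough.

```python
def gqa_layout(dim, n_heads, n_kv_heads):
    head_size = dim // n_heads             # width of one attention head
    kv_dim = dim * n_kv_heads // n_heads   # total width of K (and of V)
    kv_mul = n_heads // n_kv_heads         # query heads per shared KV head
    kv_head_of = [h // kv_mul for h in range(n_heads)]
    return head_size, kv_dim, kv_mul, kv_head_of

# the example from the text: 8 query heads sharing 4 KV heads
print(gqa_layout(512, 8, 4))
# (64, 256, 2, [0, 0, 1, 1, 2, 2, 3, 3])

# plain multi-head attention: every query head has its own KV head
print(gqa_layout(512, 8, 8))
# (64, 512, 1, [0, 1, 2, 3, 4, 5, 6, 7])
```

With 4 KV heads, the K and V projections and the KV cache are half the size of the multi-head case, which is the memory saving GQA is designed for.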

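Next, RoPE positional encoding is applied to the Q and K vectors using the formulas below, where m is the position pos of the current token, hs is head_size, and i runs over the even element indices (0, 2, 4, ...). Primes denote the rotated values, so each pair is computed from the unrotated Q(i) and Q(i+1). Note that the llama model applies this encoding to the Q and K vectors of every layer.

$$ \begin{aligned} \theta_i &= \frac{1}{10000^{(i \bmod hs)/hs}} = 10000^{-(i \bmod hs)/hs} \\ Q'(i) &= Q(i)\cos(m\theta_i) - Q(i+1)\sin(m\theta_i) \\ Q'(i+1) &= Q(i)\sin(m\theta_i) + Q(i+1)\cos(m\theta_i) \\ K'(i) &= K(i)\cos(m\theta_i) - K(i+1)\sin(m\theta_i) \\ K'(i+1) &= K(i)\sin(m\theta_i) + K(i+1)\cos(m\theta_i) \end{aligned} $$

The detailed code is below. In GQA the K vector is shorter than the Q vector, so for i < kv_dim both Q and K are rotated, and for i >= kv_dim only Q is rotated.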

// RoPE relative positional encoding: complex-valued rotate q and k in each head
for (int i = 0; i < dim; i+=2) {
    int head_dim = i % head_size;
    float freq = 1.0f / powf(10000.0f, head_dim / (float)head_size);
    float val = pos * freq;
    float fcr = cosf(val);
    float fci = sinf(val);
    int rotn = i < kv_dim ? 2 : 1; // how many vectors? 2 = q & k, 1 = q only
    for (int v = 0; v < rotn; v++) {
        float* vec = v == 0 ? s->q : s->k; // the vector to rotate (query or key)
        float v0 = vec[i];
        float v1 = vec[i+1];
        vec[i]   = v0 * fcr - v1 * fci;
        vec[i+1] = v0 * fci + v1 * fcr;
    }
}
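Why a rotation encodes relative position can be checked numerically: rotating q by position m and k by position n leaves their dot product dependent only on m - n. The Python sketch below mirrors the loop above for a single vector; the function name `rope` and the sample vectors are assumptions for illustration.

```python
import math

def rope(vec, pos, head_size):
    out = list(vec)
    for i in range(0, len(vec), 2):
        freq = 1.0 / (10000.0 ** ((i % head_size) / head_size))
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        v0, v1 = vec[i], vec[i + 1]
        out[i] = v0 * c - v1 * s      # rotate the pair (v0, v1) by pos * freq
        out[i + 1] = v0 * s + v1 * c
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

near = dot(rope(q, 5, 4), rope(k, 3, 4))     # positions 5 and 3, offset 2
far = dot(rope(q, 12, 4), rope(k, 10, 4))    # positions 12 and 10, offset 2
print(near, far)  # equal up to floating-point rounding
```

The rotation also preserves vector length, so RoPE changes only the angle between q and k, never their magnitudes.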

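Next, attention is computed for each head. The formula is:

$$ \mathrm{Attention}(Q_i, K, V) = \mathrm{softmax}\left(\frac{Q_i K^T}{\sqrt{d}}\right)V , \quad Q_i \in \mathbb{R}^{1 \times d}, K \in \mathbb{R}^{n \times d}, V \in \mathbb{R}^{n \times d} $$

Here d is the per-head dimension (head_size in the code) and n is the number of positions seen so far. In practice the code loops over the heads. Within each head it first computes the dot product of Qi with every cached K row and divides by sqrt(d), which gives the att (1,n) vector of raw scores; softmax then turns these scores into attention weights.

In GQA, groups of query heads share the same K and V heads, so when computing attention the shared K and V have to be "expanded" back to match every query head. The code does this with the index h / kv_mul, which makes each run of kv_mul consecutive query heads read the same K and V head.

The code then multiplies the attention weights (1,n) by V (n,d) to obtain xb (1,d). This is not written as an ordinary matrix product: the weight of each position scales the corresponding row of V, and the scaled rows are accumulated into xb. This access pattern is more cache-friendly and is a common way to implement the product.

The figure below visualizes the attention computation for each head and each token:

The figure shows clearly that each token computes its relevance to the other tokens and then takes the weighted sum to produce the final attention output.

The code is as follows: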

for (h = 0; h < p->n_heads; h++) {
    // get the query vector for this head
    float* q = s->q + h * head_size;
    // attention scores for this head
    float* att = s->att + h * p->seq_len;
    // iterate over all timesteps, including the current one
    for (int t = 0; t <= pos; t++) {
        // get the key vector for this head and at this timestep
        float* k = s->key_cache + loff + t * kv_dim + (h / kv_mul) * head_size;
        // calculate the attention score as the dot product of q and k
        float score = 0.0f;
        for (int i = 0; i < head_size; i++) {
            score += q[i] * k[i];
        }
        score /= sqrtf(head_size);
        // save the score to the attention buffer
        att[t] = score;
    }

    // softmax the scores to get attention weights, from 0..pos inclusively
    softmax(att, pos + 1);

    // weighted sum of the values, store back into xb
    float* xb = s->xb + h * head_size;
    memset(xb, 0, head_size * sizeof(float));
    for (int t = 0; t <= pos; t++) {
        // get the value vector for this head and at this timestep
        float* v = s->value_cache + loff + t * kv_dim + (h / kv_mul) * head_size;
        // get the attention weight for this timestep
        float a = att[t];
        // accumulate the weighted value into xb
        for (int i = 0; i < head_size; i++) {
            xb[i] += a * v[i];
        }
    }
}

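The code also shows why K and V are cached. For the token at a given position, only the current row of Q takes part in the computation, whereas every row of K and V up to that position is needed. The result is the (1,d) attention output for that position. Caching K and V therefore avoids recomputing the earlier rows at every step.

The remaining computation is simple, and the comments are clear. The steps are:

Compute Wo (d,d) @ xb (d,) to get xb2 (d,)
Add the residual connection onto x (d,): x += xb2
Apply another RMSNorm to x to get xb (d,)
Compute hb and hb2: W1 (hd,d) @ xb (d,) -> hb (hd,), W3 (hd,d) @ xb (d,) -> hb2 (hd,)
Pass hb through the SiLU non-linearity: $$ silu(hb) = \frac{hb}{1 + e^{-hb}} $$
Multiply element-wise, hb * hb2, to get hb (hd,)
Compute W2 (d,hd) @ hb (hd,) -> xb (d,)
Finally add the residual connection again: x += xb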

// final matmul to get the output of the attention
matmul(s->xb2, s->xb, w->wo + l*dim*dim, dim, dim);

// residual connection back into x
for (int i = 0; i < dim; i++) {
    x[i] += s->xb2[i];
}

// ffn rmsnorm
rmsnorm(s->xb, x, w->rms_ffn_weight + l*dim, dim);

// Now for FFN in PyTorch we have: self.w2(F.silu(self.w1(x)) * self.w3(x))
// first calculate self.w1(x) and self.w3(x)
matmul(s->hb, s->xb, w->w1 + l*dim*hidden_dim, dim, hidden_dim);
matmul(s->hb2, s->xb, w->w3 + l*dim*hidden_dim, dim, hidden_dim);

// SwiGLU non-linearity
for (int i = 0; i < hidden_dim; i++) {
    float val = s->hb[i];
    // silu(x)=x*σ(x), where σ(x) is the logistic sigmoid
    val *= (1.0f / (1.0f + expf(-val)));
    // elementwise multiply with w3(x)
    val *= s->hb2[i];
    s->hb[i] = val;
}

// final matmul to get the output of the ffn
matmul(s->xb, s->hb, w->w2 + l*dim*hidden_dim, hidden_dim, dim);

// residual connection
for (int i = 0; i < dim; i++) {
    x[i] += s->xb[i];
}

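The same computation is repeated for every layer: each layer takes x as input and writes its output back into x. After all layers are done, two final steps remain:

RMSNorm(x), which normalizes the x vector.
Compute Wc (dvoc,d) @ x (d,) -> logits (dvoc,), where dvoc is the vocabulary size.

The resulting logits are the unnormalized scores over the token vocabulary for this position; softmax turns them into probabilities at sampling time.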

// final rmsnorm
rmsnorm(x, x, w->rms_final_weight, dim);

// classifier into logits
matmul(s->logits, x, w->wcls, p->dim, p->vocab_size);
return s->logits;

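2.3 Sampling methods

Given the logits, a sampling step decides which token to output. Common methods are greedy (argmax), random sampling, and top-p (nucleus) sampling.

2.3.1 Greedy Sampling

Greedy sampling simply picks the token with the highest probability. The code is short and direct: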

int sample_argmax(float* probabilities, int n) {
    // return the index that has the highest probability
    int max_i = 0;
    float max_p = probabilities[0];
    for (int i = 1; i < n; i++) {
        if (probabilities[i] > max_p) {
            max_i = i;
            max_p = probabilities[i];
        }
    }
    return max_i;
}

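2.3.2 Random Sampling

Random sampling draws a token at random according to the probability distribution: it accumulates the probabilities into a CDF and returns the first index at which the cumulative value exceeds coin, a uniform random number in [0, 1). The code is again simple: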

int sample_mult(float* probabilities, int n, float coin) {
    // sample index from probabilities (they must sum to 1!)
    // coin is a random number in [0, 1), usually from random_f32()
    float cdf = 0.0f;
    for (int i = 0; i < n; i++) {
        cdf += probabilities[i];
        if (coin < cdf) {
            return i;
        }
    }
    return n - 1; // in case of rounding errors
}

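2.3.3 Top-p (Nucleus) Sampling

Top-p (nucleus) sampling draws only from the smallest set of highest-probability tokens whose cumulative probability exceeds the threshold topp, so very unlikely tokens are never sampled. The code is as follows: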

int sample_topp(float* probabilities, int n, float topp, ProbIndex* probindex, float coin) {
    // top-p sampling (or "nucleus sampling") samples from the smallest set of
    // tokens that exceed probability topp. This way we never sample tokens that
    // have very low probabilities and are less likely to go "off the rails".
    // coin is a random number in [0, 1), usually from random_f32()

    int n0 = 0;
    // quicksort indices in descending order of probabilities
    // values smaller than (1 - topp) / (n - 1) cannot be part of the result
    // so for efficiency we crop these out as candidates before sorting
    const float cutoff = (1.0f - topp) / (n - 1);
    for (int i = 0; i < n; i++) {
        if (probabilities[i] >= cutoff) {
            probindex[n0].index = i;
            probindex[n0].prob = probabilities[i];
            n0++;
        }
    }
    qsort(probindex, n0, sizeof(ProbIndex), compare);

    // truncate the list where cumulative probability exceeds topp
    float cumulative_prob = 0.0f;
    int last_idx = n0 - 1; // in case of rounding errors consider all elements
    for (int i = 0; i < n0; i++) {
        cumulative_prob += probindex[i].prob;
        if (cumulative_prob > topp) {
            last_idx = i;
            break; // we've exceeded topp by including last_idx
        }
    }

    // sample from the truncated list
    float r = coin * cumulative_prob;
    float cdf = 0.0f;
    for (int i = 0; i <= last_idx; i++) {
        cdf += probindex[i].prob;
        if (r < cdf) {
            return probindex[i].index;
        }
    }
    return probindex[last_idx].index; // in case of rounding errors
}
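A worked example makes the truncation step concrete. The Python sketch below follows the same sort, truncate, and resample logic but omits the cutoff pre-filter that the C code uses purely as a speed optimization; the function names and the sample distribution are assumptions for illustration.

```python
def top_p_candidates(probs, topp):
    # indices sorted by descending probability
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative > topp:
            break  # including this token pushed us past topp
    return kept, cumulative

def sample_top_p(probs, topp, coin):
    kept, cumulative = top_p_candidates(probs, topp)
    r = coin * cumulative  # rescale coin to the truncated probability mass
    cdf = 0.0
    for i in kept:
        cdf += probs[i]
        if r < cdf:
            return i
    return kept[-1]  # guard against rounding errors

probs = [0.5, 0.3, 0.15, 0.05]
kept, mass = top_p_candidates(probs, 0.7)
print(kept, mass)                      # [0, 1] and roughly 0.8
print(sample_top_p(probs, 0.7, 0.0))   # 0
print(sample_top_p(probs, 0.7, 0.9))   # 1
```

With topp = 0.7 only tokens 0 and 1 survive, so tokens 2 and 3 can never be drawn no matter what value coin takes.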

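2.3.4 Choosing the sampling strategy

Before sampling, the logits go through two transformations:

Divide by temperature to reshape the distribution: the higher the temperature, the flatter the distribution.
Compute softmax(logits) to obtain the probability distribution.

The code is as follows: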

// apply the temperature to the logits
for (int q=0; q<sampler->vocab_size; q++) { logits[q] /= sampler->temperature; }
// apply softmax to the logits to get the probabilities for next token
softmax(logits, sampler->vocab_size);
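The effect of the temperature on the distribution can be seen with three logits. This Python sketch applies the same two steps (divide, then softmax); the function name and logit values are illustrative assumptions, and it requires temperature > 0 because it divides by it.

```python
import math

def next_token_probs(logits, temperature):
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # max subtraction for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = next_token_probs(logits, 0.5)  # low temperature: peaked
plain = next_token_probs(logits, 1.0)
flat = next_token_probs(logits, 2.0)   # high temperature: flatter

for name, p in (("T=0.5", sharp), ("T=1.0", plain), ("T=2.0", flat)):
    print(name, [round(v, 3) for v in p])
```

The ranking of the tokens never changes; only how strongly the top token dominates does, which is why temperature controls the diversity of sampled text.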

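The sampling function is then chosen according to the sampling strategy.

2.4 encode and decode

2.4.1 encode

The encode function converts the input text into a sequence of token ids. The token ids are ints, and the text describes the sequence length as max_len. The algorithm is straightforward: it first looks up each UTF-8 character in the tokenizer vocabulary, and if a character is not found it falls back to encoding its raw bytes (byte fallback). Note that a UTF-8 character is 1 to 4 bytes long, so the code has to follow the UTF-8 encoding rules to find character boundaries.

The code is as follows: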

// process the raw (UTF-8) byte sequence of the input string
for (char *c = text; *c != '\0'; c++) {

    // reset buffer if the current byte is ASCII or a leading byte
    // 0xC0 is 11000000, so (*c & 0xC0) keeps the first 2 bits and zeros the rest
    // 0x80 is 10000000
    // in UTF-8, all continuation bytes start with "10" in first two bits
    // so in English this is: "if this byte is not a continuation byte"
    if ((*c & 0xC0) != 0x80) {
        // this byte must be either a leading byte (11...) or an ASCII char (0x...)
        // => reset our location, as we're starting a new UTF-8 codepoint
        str_len = 0;
    }

    // append the current byte to the buffer
    str_buffer[str_len++] = *c; // ++ is post-increment, incremented after this line
    str_buffer[str_len] = '\0';

    // while the next character is a continuation byte, continue appending
    // but if there are too many of them, just stop to avoid overruning str_buffer size.
    if ((*(c+1) & 0xC0) == 0x80 && str_len < 4) {
        continue;
    }

    // ok c+1 is not a continuation byte, so we've read in a full codepoint
    int id = str_lookup(str_buffer, t->sorted_vocab, t->vocab_size);

    if (id != -1) {
        // we found this codepoint in vocab, add it as a token
        tokens[(*n_tokens)++] = id;
    } else {
        // byte_fallback encoding: just encode each byte as a token
        // +3 is here because the first 3 vocab elements are <unk>, <s>, </s>
        // so the individual bytes only start at index 3
        for (int i=0; i < str_len; i++) {
            tokens[(*n_tokens)++] = (unsigned char)str_buffer[i] + 3;
        }
    }
    str_len = 0; // protect against a sequence of stray UTF8 continuation bytes
}
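The boundary test `(byte & 0xC0) != 0x80` is the heart of this loop. The Python sketch below applies the same test to split a byte string into code points and then shows the byte-fallback ids; the helper names are hypothetical, and the sketch leaves out the 4-byte buffer guard of the C code because it only handles well-formed input.

```python
def split_codepoints(data):
    """Split UTF-8 bytes into per-character byte groups."""
    pieces, current = [], bytearray()
    for b in data:
        # a byte that is NOT a continuation byte (10xxxxxx) starts a new character
        if (b & 0xC0) != 0x80 and current:
            pieces.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        pieces.append(bytes(current))
    return pieces

def byte_fallback_ids(piece):
    # ids 0..2 are <unk>, <s>, </s>, so raw bytes start at id 3
    return [b + 3 for b in piece]

text = "a\u4e2d\U0001F600"  # ASCII letter, CJK character, emoji
pieces = split_codepoints(text.encode("utf-8"))
print([len(p) for p in pieces])      # [1, 3, 4]
print(byte_fallback_ids(pieces[1]))  # [231, 187, 176]
```

The three characters take 1, 3, and 4 bytes respectively, which is why the buffer in the C code must hold up to 4 bytes per character.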

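Next, adjacent tokens are merged: the strings of two neighboring tokens are concatenated and looked up in the tokenizer vocabulary, and in each iteration the existing pair with the highest score is merged into a single token. This repeats until no adjacent pair can be merged. The code is again straightforward: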

// merge the best consecutive pair each iteration, according the scores in vocab_scores
while (1) {
    float best_score = -1e10;
    int best_id = -1;
    int best_idx = -1;

    for (int i=0; i < (*n_tokens-1); i++) {
        // check if we can merge the pair (tokens[i], tokens[i+1])
        sprintf(str_buffer, "%s%s", t->vocab[tokens[i]], t->vocab[tokens[i+1]]);
        int id = str_lookup(str_buffer, t->sorted_vocab, t->vocab_size);
        if (id != -1 && t->vocab_scores[id] > best_score) {
            // this merge pair exists in vocab! record its score and position
            best_score = t->vocab_scores[id];
            best_id = id;
            best_idx = i;
        }
    }

    if (best_idx == -1) {
        break; // we couldn't find any more pairs to merge, so we're done
    }

    // merge the consecutive pair (best_idx, best_idx+1) into new token best_id
    tokens[best_idx] = best_id;
    // delete token at position best_idx+1, shift the entire sequence back 1
    for (int i = best_idx+1; i < (*n_tokens-1); i++) {
        tokens[i] = tokens[i+1];
    }
    (*n_tokens)--; // token length decreased
}
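The greedy merge loop is easier to follow on a tiny vocabulary. The Python sketch below works on token strings instead of ids and uses a made-up `scores` table, so both the vocabulary and the scores are assumptions for illustration; the control flow (scan all adjacent pairs, merge the best-scoring one, repeat) matches the loop above.

```python
def merge_tokens(tokens, scores):
    tokens = list(tokens)
    while True:
        best_score, best_idx = float("-inf"), -1
        for i in range(len(tokens) - 1):
            pair = tokens[i] + tokens[i + 1]
            # strict > keeps the earliest pair on ties, as in the C code
            if pair in scores and scores[pair] > best_score:
                best_score, best_idx = scores[pair], i
        if best_idx == -1:
            break  # no adjacent pair exists in the vocabulary
        merged = tokens[best_idx] + tokens[best_idx + 1]
        tokens[best_idx:best_idx + 2] = [merged]
    return tokens

# hypothetical vocabulary scores: higher means merge earlier
scores = {"he": -1.0, "ll": -2.0, "hell": -3.0, "hello": -4.0, "lo": -5.0}

print(merge_tokens(list("hello"), scores))  # ['hello']
print(merge_tokens(list("low"), scores))    # ['lo', 'w']
```

For "hello" the merges happen in the order he, ll, hell, hello; each pass rescans the whole sequence, so the loop is quadratic in the number of tokens, the same as the C version.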

2.4.2 decode

The decode function converts a token id back into its text piece. The code is straightforward; the few tricky spots are explained in its comments:

char* decode(Tokenizer* t, int prev_token, int token) {
    char *piece = t->vocab[token];
    // following BOS (1) token, sentencepiece decoder strips any leading whitespace (see PR #89)
    if (prev_token == 1 && piece[0] == ' ') { piece++; }
    // careful, some tokens designate raw bytes, and look like e.g. '<0x01>'
    // parse this and convert and return the actual byte
    unsigned char byte_val;
    if (sscanf(piece, "<0x%02hhX>", &byte_val) == 1) {
        piece = (char*)t->byte_pieces + byte_val * 2;
    }
    return piece;
}

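2.5 Text generation

Text generation is the most basic inference logic, and chat is built on top of it. The overall flow is simple:

Run forward on one token id at a time.
Check whether the current position is still inside the prompt. If it is, read the next token directly from the prompt.
Otherwise, apply the sampling strategy to pick the next token from the logits vector.
Decode the next token and print it.

The code: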

while (pos < steps) {

    // forward the transformer to get logits for the next token
    float* logits = forward(transformer, token, pos);

    // advance the state machine
    if (pos < num_prompt_tokens - 1) {
        // if we are still processing the input prompt, force the next prompt token
        next = prompt_tokens[pos + 1];
    } else {
        // otherwise sample the next token from the logits
        next = sample(sampler, logits);
    }
    pos++;

    // data-dependent terminating condition: the BOS (=1) token delimits sequences
    if (next == 1) { break; }

    // print the token as string, decode it with the Tokenizer object
    char* piece = decode(tokenizer, token, next);
    safe_printf(piece); // same as printf("%s", piece), but skips "unsafe" bytes
    fflush(stdout);
    token = next;

    // init the timer here because the first iteration can be slower
    if (start == 0) { start = time_in_ms(); }
}
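
The same state machine can be sketched in Python with stub components. This is a minimal sketch under assumptions: `toy_forward` and `greedy` are hypothetical stand-ins for the real transformer and sampler, and the function collects tokens in a list instead of printing decoded pieces.

```python
def generate(prompt_tokens, steps, forward, sample, bos_id=1):
    """Return every token emitted after the first prompt token."""
    out = []
    token = prompt_tokens[0]
    pos = 0
    while pos < steps:
        # A real model would also update its KV cache inside forward.
        logits = forward(token, pos)
        if pos < len(prompt_tokens) - 1:
            next_token = prompt_tokens[pos + 1]  # still in the prompt: force it
        else:
            next_token = sample(logits)          # past the prompt: sample
        pos += 1
        if next_token == bos_id:
            break  # BOS delimits sequences, so stop here
        out.append(next_token)
        token = next_token
    return out


# Hypothetical stand-ins: logits that favor (token + 1) mod 8, and greedy sampling.
def toy_forward(token, pos):
    return [1.0 if i == (token + 1) % 8 else 0.0 for i in range(8)]


def greedy(logits):
    return max(range(len(logits)), key=logits.__getitem__)


result = generate([1, 5, 6], steps=6, forward=toy_forward, sample=greedy)
print(result)
```

With this toy model the prompt tokens 5 and 6 are forced, 7 and 0 are sampled, and the loop stops when the sampler produces the BOS id 1.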

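2.6 Miscellaneous

The rest of the code consists of simple data-structure definitions, helper functions, and main, so it is not covered here.

3. Summary

Overall this is a toy project with fairly simple logic, yet it offers many details worth studying. In particular, it supports both MHA and GQA, which makes it helpful for understanding how these attention variants work.

Note, however, that the code does not implement a prefill stage; it fills the KV cache by feeding the prompt in one token at a time. That is inefficient, but the logic is clear and easy to follow.

There is plenty of room for further optimization, such as a parallel prefill pass or avoiding repeated decode calls, but these are beyond the scope of the project and are left for the reader to explore.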

Not running swift-stdlib-tool: ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES is enabled, but the product type 'com.apple.product-type.tool' is not a wrapper type.

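Using Xcode, I added a standalone command-line executable to an existing app project (.app), to run as the separate process used by SMJobBless. Building it produced the following warning: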

Not running swift-stdlib-tool: ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES is enabled, but the product type 'com.apple.product-type.tool' is not a wrapper type.

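The original project sets the ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES build setting, so the newly created target inherits it by default.

Apple's Build settings reference states that ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES applies only to wrapper products such as .app bundles and has no effect on standalone binaries.

The reason is that a .app bundle is a directory that can hold extra files under its Frameworks folder, whereas a standalone executable is a single file with nowhere to put separate Swift runtime libraries (libswift*.dylib).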

The fix is simple: in the build settings of the command-line target, set ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES to No.

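Transformer Architecture Changes: A Guide to RMSNorm

Introduction

Eight years have passed between the introduction of the Transformer architecture in 2017 and 2025, and the architecture has changed considerably along the way. For example, more and more large models now use RMSNorm¹ instead of LayerNorm.

RMSNorm (Root Mean Square Layer Normalization) is a normalization method for deep learning. Its core idea is to rescale the input vector by its root mean square in order to improve training stability and efficiency.

This post is a brief introduction to RMSNorm. Before getting to it, let's first review LayerNorm.

LayerNorm recap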

$\mathbf y=\frac{\mathbf x-E[\mathbf x]}{\sqrt{Var(\mathbf x)+\epsilon}}*\gamma+\beta$

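Above is the LayerNorm formula. Setting aside the scale $\gamma$ and shift $\beta$, what LayerNorm does is easy to understand: it turns each sample's feature vector $\mathbf x$ into a vector with mean 0 and standard deviation 1.

Why is LayerNorm useful? The explanation that used to be popular is:

re-centering: the mean $E[\mathbf x]$ is always subtracted from the input $\mathbf x$. The benefit is that if the whole input is shifted (shift noise), nothing changes: the centered input always stays around 0.
re-scaling: after subtracting the mean, the result is always divided by $\sqrt{Var(\mathbf x)+\epsilon}$. The benefit is that scaling the input $\mathbf x$ by a constant factor has no effect.

A few lines of PyTorch verify this: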

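import torch

def re_centering(x):
    # keepdim=True keeps the reduced axis, so this also broadcasts for batched input
    return x - x.mean(dim=-1, keepdim=True)

def re_scaling(x):
    return x / (x.std(dim=-1, keepdim=True) + 1e-5)

x = torch.arange(4).float()
# shifting the input by 10000 leaves the centered output unchanged
print(x, re_centering(x + 10000))
# tensor([0., 1., 2., 3.]) tensor([-1.5000, -0.5000,  0.5000,  1.5000])
# scaling the input by 10000 prints the same values as re_scaling(x)
print(x, re_scaling(x * 10000))
# tensor([0., 1., 2., 3.]) tensor([0.0000, 0.7746, 1.5492, 2.3238])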

RMSNorm

The RMSNorm authors argue that LayerNorm's value lies in its re-scaling property and has little to do with re-centering¹, so RMSNorm is designed around re-scaling only. Its formula is:

$\mathbf y=\frac{\mathbf x}{\sqrt{\frac{1}{n}\sum_ix_i^2+\epsilon}}*\gamma$

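Compared with LayerNorm, the main differences are:

The numerator no longer subtracts $E[\mathbf x]$.
The denominator changes from $Var(\mathbf x)$ to $\frac{1}{n}\sum_ix_i^2$.
Only the parameter $\gamma$ has to be maintained; there is no $\beta$.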
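
These differences can be checked numerically. The sketch below is written in plain Python so that it runs without PyTorch. It leaves out $\gamma$ and $\beta$ (equivalent to $\gamma=1$, $\beta=0$) and assumes the biased variance for LayerNorm; the helper names are my own.

```python
import math


def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    mean_sq = sum(v * v for v in x) / len(x)  # no mean subtraction anywhere
    return [v / math.sqrt(mean_sq + eps) for v in x]


def close(a, b, tol=1e-3):
    return all(abs(p - q) <= tol for p, q in zip(a, b))


x = [0.0, 1.0, 2.0, 3.0]
scaled = [v * 100 for v in x]
shifted = [v + 100 for v in x]

print(close(rms_norm(x), rms_norm(scaled)))       # True: scaling the input changes nothing
print(close(rms_norm(x), rms_norm(shifted)))      # False: RMSNorm does not remove a shift
print(close(layer_norm(x), layer_norm(shifted)))  # True: LayerNorm absorbs the shift
```

So RMSNorm keeps the re-scaling property and gives up re-centering, which is exactly the trade the formula makes.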

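Benefits of RMSNorm

From the differences above, some benefits of RMSNorm are immediately apparent:

Fewer parameters to maintain: only $\gamma$.
Less computation, because the mean $E[\mathbf x]$ of the input $\mathbf x$ is never computed (note that computing $Var(\mathbf x)$ also requires the mean).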
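
The link between the two denominators can be made explicit. Taking $Var(\mathbf x)$ to be the biased (population) variance, which is an assumption about the convention rather than something stated above, we have

$\frac{1}{n}\sum_ix_i^2=Var(\mathbf x)+E[\mathbf x]^2$

So when the input already has zero mean, the two denominators coincide and RMSNorm reduces to LayerNorm without $\beta$. Otherwise RMSNorm skips the computation of $E[\mathbf x]$ and normalizes by the slightly larger quantity on the left.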

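Most importantly, of course, RMSNorm works well in practice and performs about on par with LayerNorm. See the original paper¹ for the experimental details and results.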

PyTorch API

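The nn.RMSNorm implementation provided by PyTorch takes the following parameters:

normalized_shape: the trailing dimensions of the input tensor over which the RMS is computed
eps: a small value added for numerical stability
elementwise_affine: whether to enable the learnable parameter $\gamma$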

>>> rms_norm = nn.RMSNorm([2, 3])
>>> input = torch.randn(2, 2, 3)
>>> rms_norm(input)

RMSNorm from Scratch

Writing RMSNorm by hand is not hard. The code below can serve as a reference:

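import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(
        self,
        normalized_shape: int | list | tuple,
        eps: float = 1e-5,
        elementwise_affine: bool = True,
    ):
        super().__init__()
        if isinstance(normalized_shape, int):
            normalized_shape = (normalized_shape,)
        self.normalized_shape = tuple(normalized_shape)
        self.eps = eps
        self.elementwise_affine = elementwise_affine
        if self.elementwise_affine:
            # learnable scale gamma, initialized to 1
            self.gamma = nn.Parameter(torch.ones(self.normalized_shape))
        else:
            self.register_parameter("gamma", None)

    def forward(self, x: torch.Tensor):
        # reduce over all trailing dims named by normalized_shape, not just the last one
        dims = tuple(range(-len(self.normalized_shape), 0))
        x = x * torch.rsqrt(self.eps + x.pow(2).mean(dim=dims, keepdim=True))

        return x if self.gamma is None else x * self.gamma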

¹ Zhang, Biao, and Rico Sennrich. "Root mean square layer normalization." Advances in Neural Information Processing Systems 32 (2019).
