Compare commits

...

28 Commits

Author SHA1 Message Date
zzy
d88fa3b8d3 feat: rename core types to scc prefix for consistency
Updated type names from `core_*` to `scc_*` across lex_parser and stream modules to maintain naming consistency within the SCC codebase. This includes changes to function signatures and internal usage of types like `core_probe_stream_t`, `core_pos_t`, and `cstring_t` to their `scc_*` counterparts.
2025-12-11 13:00:29 +08:00
zzy
35c13ee30a feat: add internationalization support with Chinese translations
- Introduce Translator class for simple i18n with English and Chinese locales
- Add concurrent.futures, os, and locale imports for parallel execution and language detection
- Implement automatic language detection based on system locale
- Provide translation keys for build messages, test results, and command outputs
- Support dynamic language switching via set_language method
2025-12-11 12:27:11 +08:00
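The bullets above describe the mechanism; as a rough illustration, a minimal translator of this shape could look like the following (class layout, table contents, and key names are assumptions, not the actual implementation):

```python
import locale
import os


class Translator:
    """Minimal i18n helper: look up a key in the active locale's table."""

    TABLES = {
        "en": {"build.start": "Building {target}...", "test.pass": "PASS"},
        "zh": {"build.start": "正在构建 {target}...", "test.pass": "通过"},
    }

    def __init__(self):
        # Sketch of system-locale detection: prefer LANG, fall back to locale.getlocale().
        lang = os.environ.get("LANG") or (locale.getlocale()[0] or "")
        self.set_language("zh" if lang.lower().startswith("zh") else "en")

    def set_language(self, lang):
        # Unknown locales fall back to English.
        self.table = self.TABLES.get(lang, self.TABLES["en"])

    def tr(self, key, **kwargs):
        # Unknown keys fall back to the key itself.
        return self.table.get(key, key).format(**kwargs)


t = Translator()
t.set_language("en")
print(t.tr("build.start", target="libcore"))
```

Falling back to the key itself keeps missing translations visible without crashing the build output.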
zzy
098e41d3e5 feat(cbuild): refactor and enhance C build system with new features
- Refactor logging system with simplified ColorFormatter and improved output formatting
- Add test command with regex pattern matching and timeout support
- Implement file size formatting utility for build output
- Remove unused imports and streamline code structure
- Update .gitignore to exclude external/ directory
- Improve error handling and subprocess management in test execution
- Optimize build dependency resolution with topological sorting
- Enhance configuration parsing and target management

The changes focus on code quality improvements, adding testing capabilities, and optimizing the build process for better performance and maintainability.
2025-12-10 22:21:37 +08:00
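The topological sorting of build dependencies mentioned above can be sketched with Kahn's algorithm (the `deps` mapping and package names are illustrative, not taken from cbuild):

```python
from collections import deque


def topo_sort(deps):
    """Order packages so that every package comes after its dependencies.

    `deps` maps a package name to the list of packages it depends on;
    every dependency must itself appear as a key. Raises ValueError on a cycle.
    """
    # in-degree = number of unresolved dependencies per package
    indeg = {pkg: len(ds) for pkg, ds in deps.items()}
    dependents = {pkg: [] for pkg in deps}
    for pkg, ds in deps.items():
        for d in ds:
            dependents[d].append(pkg)

    ready = deque(sorted(p for p, n in indeg.items() if n == 0))
    order = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for dep in dependents[pkg]:
            indeg[dep] -= 1
            if indeg[dep] == 0:
                ready.append(dep)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


print(topo_sort({"smcc_lex": ["libcore"], "libcore": [], "smcc_lex_parser": ["libcore"]}))
```

The cycle check falls out for free: any package left with a nonzero in-degree at the end sits on a cycle.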
zzy
186602a301 feat(core): rename string and stream functions for clarity
- Rename `cstring_push` to `cstring_append_ch` and `cstring_push_cstr` to `cstring_append_cstr` for consistent naming with new `cstring_append` function
- Update all callers in lexer and tests to use new function names
- Rename stream `destroy` method to `drop` for consistency with resource management conventions
- Fix potential overflow in string capacity calculation by adjusting growth logic
2025-12-09 18:04:53 +08:00
zzy
36bff64a91 feat: refactor the stream API and adapt lex_parse and the lexer 2025-12-08 23:04:11 +08:00
zzy
1ab07a5815 feat(cbuild): refactor the dependency resolver and add dependency tree printing
- Change the return type of `DependencyResolver.resolve` to `None`; it no longer returns a copy of the dependency map
- Add a `print_tree` method that prints a formatted dependency tree
- Change `get_sorted_dependencies` and `get_all_contexts` to return lists of `PackageConfig`
- Call the new `print_tree` method from `CPackageBuilder.tree` to simplify the logic
- Add a `DummyCompiler` class as a placeholder compiler implementation
- Restructure the argparse command line, moving common arguments into the subcommands
- Improve compiler initialization with checks for argument presence
- Remove the --all option from the clean subcommand
2025-11-27 15:25:45 +08:00
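A recursive dependency-tree printer of the kind the `print_tree` bullet describes might look roughly like this (the `dep_map` shape and the rendering are assumptions):

```python
def tree_lines(dep_map, root):
    """Render `root` and its dependencies as tree-drawing lines.

    dep_map maps a package name to the list of its direct dependencies.
    Returns the lines so callers can print or test them.
    """
    lines = [root]
    children = dep_map.get(root, [])
    for i, child in enumerate(children):
        last = i == len(children) - 1
        branch = "└── " if last else "├── "
        extension = "    " if last else "│   "
        sub = tree_lines(dep_map, child)          # render the subtree first
        lines.append(branch + sub[0])             # attach its root to this level
        lines.extend(extension + line for line in sub[1:])
    return lines


dep_map = {"smcc_lex": ["smcc_lex_parser"], "smcc_lex_parser": ["libcore"], "libcore": []}
print("\n".join(tree_lines(dep_map, "smcc_lex")))
```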
zzy
ed829bdc21 feat(cbuild): refactor the dependency resolver and enhance the command line
- Rename `resolved_deps` to `deps` and add `dep_map` to store dependency relationships
- Add a `get_dependencies_of` method that returns the direct dependencies of a given package
- Print the dependency tree recursively, improving the output of the `tree` command
- Introduce layered command-line parsing with subcommands and more options (e.g. --record, --args)
- Improve log output and build-mode messages to more accurately reflect the current build state
- Add a command-recording mechanism to the compiler classes for easier debugging and replay
2025-11-27 12:15:45 +08:00
zzy
e6a76e7a86 feat(lex_parser): extract character classification helpers and strengthen parser assertions
Rename the `is_next_line` inline function to `lex_parse_is_endline` and add a new `lex_parse_is_whitespace` function, unifying character classification in the lexical parser. Also strengthen the input-parameter assertions in several parsing functions to improve robustness.

In addition, fix a logic error in `lex_parse_skip_whitespace` and clean up some comments and control-flow structure.

feat(pprocessor): initialize the preprocessor module with basic functionality

Add a preprocessor module, `pprocessor`, with macro definitions, conditional-compilation state management, and a basic directive-parsing framework. It implements identifier parsing, whitespace skipping, and keyword lookup, with initial support for object-like macro replacement via the `#define` directive.

This commit also introduces a set of test cases covering various macro-expansion scenarios and edge cases to ensure the preprocessor's core behavior works as expected.
2025-11-24 22:44:08 +08:00
zzy
871d031ceb feat(lex_parser): initialize the lexical parser module
Add a lexical parser library, `smcc_lex_parser`, with basic lexical-rule parsing:
- Parses characters, strings, numbers, and identifiers
- Helper functions for skipping comments, whitespace, line endings, etc.
- Unit tests covering valid and invalid inputs of each kind

The module depends on `libcore` and is referenced by the `smcc_lex` module to support higher-level lexical analysis.
2025-11-23 22:53:46 +08:00
zzy
67af0c6bf2 feat(cbuild): introduce a build cache and compilation modes
Add a build cache based on file modification times and content hashes, which avoids unnecessary recompilation. Also add several compilation modes (debug, release, test, etc.) with corresponding default compiler options, making the build process more flexible and efficient.

- Add a `BuildCache` class to manage the caching logic
- Support selecting build modes via the `CompilerBuildMode` enum
- Integrate cache checking and updating into `CPackageBuilder`
- Clean up log output and some code structure for readability
2025-11-22 17:03:48 +08:00
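A cache keyed on modification time plus content hash, as described above, can be sketched as follows (the in-memory layout and method names are assumptions, not the real `BuildCache` API):

```python
import hashlib
import os


class BuildCache:
    """Skip recompilation when a source file is provably unchanged.

    The mtime comparison is the cheap fast path; the content hash catches
    files whose timestamp changed without a real content change.
    """

    def __init__(self):
        self.entries = {}  # path -> (mtime, sha256 hex digest)

    @staticmethod
    def _hash(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def is_dirty(self, path):
        mtime = os.path.getmtime(path)
        cached = self.entries.get(path)
        if cached and cached[0] == mtime:
            return False                              # fast path: timestamp unchanged
        if cached and cached[1] == self._hash(path):
            self.entries[path] = (mtime, cached[1])   # content same: refresh the mtime
            return False
        return True

    def update(self, path):
        """Record the current state after a successful compile."""
        self.entries[path] = (os.path.getmtime(path), self._hash(path))
```

A real cache would also persist `entries` to disk between runs and key on the compiler flags, since changing flags invalidates old objects.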
zzy
fa5611dabd fix(log): fix a default logger naming conflict and clean up the macros
Rename `logger_root` to `__default_logger_root` to avoid potential symbol conflicts,
and simplify the log macro definitions to improve readability and maintainability. Also add
clang-format on/off comments so that formatting does not disturb the layout of the log macros.

refactor(memory): clean up the switch-case structure in memcmp

Add a `/* fall through */` comment to each case in `smcc_memcmp` to make the
intentional fallthrough explicit, clarifying intent and improving compatibility with static analysis tools.
2025-11-22 16:59:28 +08:00
zzy
63f6f13883 feat(cbuild): refactor the build system and move it to tools/cbuild
Move `cbuild.py` to `tools/cbuild/` with substantial enhancements: a dependency resolver, colored log output,
better package-config defaults, and improved build-target detection with topologically sorted dependency management.
Also add `.gitignore` and `pyproject.toml` for a standard Python package layout, and update the README.

New commands: tree (show the dependency tree), clean (with file statistics), test (run tests), and more,
plus improved handling of executable file extensions on Windows.

Remove the old `wc.py` line-counting script.
2025-11-22 15:08:49 +08:00
zzy
d6941e1d2f feat(cbuild): add build path deduplication and relativization
Add a `_path_collection` method to unify path resolution, deduplication, and relativization,
improving object-file path generation with a safer path-mapping scheme that prevents file collisions.
Also add a clean command for the build directory, rounding out build lifecycle management.

Main changes:
- Use the `hashlib` module for hash-based path naming
- Refactor the `get_build_components` and `get_object_path` methods
- Add the `clean` command and its implementation
- Extend the command-line arguments with a "clean" option
2025-11-21 18:03:10 +08:00
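Hash-based object-path naming of the kind described here can be sketched as follows (the exact naming scheme is an assumption):

```python
import hashlib
from pathlib import PurePosixPath


def object_path(src, build_dir="build"):
    """Map a source path to a collision-free object path.

    Two different sources both named main.c must not overwrite each other's
    object file, so a short hash of the full normalized path is mixed into
    the object-file name.
    """
    norm = PurePosixPath(src).as_posix()
    digest = hashlib.sha1(norm.encode()).hexdigest()[:8]
    stem = PurePosixPath(norm).stem
    return f"{build_dir}/{stem}.{digest}.o"


print(object_path("lib/core/main.c"))
print(object_path("tools/main.c"))
```

Keeping the original stem in the name makes build output readable, while the hash suffix guarantees uniqueness per source path.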
zzy
a3322f0d4c feat(runtime): add string and memory utility functions
- Add `smcc_strhash32`, `smcc_strlen`, and `smcc_strcmp` to `core_mem.h`,
  providing hashing, length, and comparison for C strings
- Improve the comments on the `cstring_t` struct and related functions in `core_str.h`
- Rename the include guard in `core_str.h` for consistent module naming
- Rename the include guard in `core_vec.h` to match the actual file name

Also include the logging headers in the lexer test-runner code and adjust its log-level setup.
2025-11-21 17:52:42 +08:00
zzy
164bab0f13 fix(lexer): fix keyword comparison and string handling in the lexer
Correct the comment on the keyword table to state that it must be sorted lexicographically for binary search to work.
Fix the use of `cstring` during identifier parsing and reorder the token-type assignment
to avoid potential undefined behavior. Also add test files that verify recognition of operators,
keywords, and various literals, and update the test runner's log-level option.
2025-11-20 22:49:22 +08:00
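The sorted-keyword-table requirement exists because binary search assumes ordered data; a small sketch of the lookup (the keyword list is illustrative, not the lexer's actual table):

```python
from bisect import bisect_left

# Must be sorted lexicographically, or the binary search below misses entries.
KEYWORDS = sorted(["break", "case", "char", "else", "for", "if", "int", "return", "while"])


def is_keyword(ident):
    """Binary-search the sorted keyword table for an exact match."""
    i = bisect_left(KEYWORDS, ident)
    return i < len(KEYWORDS) and KEYWORDS[i] == ident


print(is_keyword("while"), is_keyword("whilex"))
```

Sorting the literal in place (rather than trusting the author to order it) is the safe version of the invariant the commit comment documents.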
zzy
f29fd92fdf feat(core): add a string-length function and clean up data structure definitions
- Add `smcc_strlen` to `core_mem.h` for computing string lengths
- Adjust the `VEC` macro parameters, removing the redundant name parameter for consistent struct declarations
- Reorder the fields in the `cstring_from_cstr` return value for a consistent code style
- Guard the log-related macro definitions in `libcore.h` against redefinition conflicts
2025-11-20 22:26:49 +08:00
zzy
d1fafa830d format code using clang-format 2025-11-20 17:55:08 +08:00
zzy
9762cf8a2b feat(log): support combining multiple log levels
Change the parameter type of `log_set_level` from `log_level_t` to `int`
so that a bitwise OR of multiple log levels can be passed.

Also adjust how the tests set log levels, fix some logic references, and fix a bug where an unknown `#` macro skipped too many lines.
2025-11-20 14:30:14 +08:00
zzy
47b56d52f6 feat(core): refactor the lexer stream interface and move it into the core library
Abstract lexer_stream into core_stream, unifying the input-stream model of the runtime's core components.
Remove the old `lexer_stream.h` definitions and migrate their functionality into `core_stream.h`.
Update the memory-stream implementation for the new core_stream interface and fix some resource-release issues.
Also adjust how the log module is included, improving decoupling between modules.

This change affects how the lexer operates on input streams; all stream-related types and functions now use the core-prefixed versions.
The test cases have been updated accordingly and pass.
2025-11-20 14:17:03 +08:00
zzy
5c24f35c87 feat: new runtime environment 2025-11-20 11:22:37 +08:00
zzy
e22811f2f5 feat(build): introduce a new Python build system and remove the old Makefile
Add a Python-based build script, `cbuild.py`, supporting package management, dependency resolution, and modular compilation.
Also add a `.gitignore` entry for the `build` directory and update the build commands in the `justfile`.
Remove the old `lib/Makefile` and the related make rules in the top-level directory; the new build system is used throughout.
2025-11-20 10:44:59 +08:00
ZZY
8d97fe896c chore: update .gitignore
- Add the docs folder to the ignore list to skip Doxygen-generated files
- Keep the existing ignore rules unchanged
2025-04-05 23:11:39 +08:00
ZZY
c800b48ca2 refactor(riscv32): refactor the RISC-V instruction definitions
- Add an RV_INSTRUCTIONS macro in riscv32_def.h listing all instructions
- Use the macro in riscv32_mcode.c to define the instruction array, reducing duplication
- This refactor simplifies adding and modifying instructions and improves maintainability
2025-04-02 15:56:51 +08:00
ZZY
1cf26c43f3 bugfix: add the lib changes 2025-04-01 23:28:15 +08:00
ZZY
b57f21556a stable: restructure the file layout
Abstract out the machine-code layer
2025-04-01 23:27:25 +08:00
ZZY
74f43a1ab7 stable 2025-04-01 00:13:21 +08:00
ZZY
2b4857001c feat(frontend): refactor the lexer
- Add a .gitignore file to ignore compiler-generated binaries
- Refactor lexer.c, improving keyword and string handling
- Update the frontend, parser, and AST files to match the new lexer
- Clean up the token definitions and functions, introducing new token types
2025-03-23 12:13:16 +08:00
ZZY
05c637e594 refactor: refactor the frontend and add logging
- Rename and refactor several files, including the lexer, parser, and AST code
- Add logging, replacing the old error and warn functions with LOG_* macros
- Improve error handling and memory allocation
- Adjust the code structure for better modularity and readability
2025-03-19 12:22:55 +08:00
189 changed files with 9843 additions and 6277 deletions

34
.gitignore vendored Normal file

@@ -0,0 +1,34 @@
.*
!.gitignore
# doxygen generated files
docs
# smcc compiler generated files
*.bin
*.s
*.asm
# linux binary files
*.o
*.a
*.so
*.out
# windows binary files
*.obj
*.lib
*.dll
*.exe
# developed notes
note.md
# python
.venv
# cbuilder
build
# external
external/

2970
Doxyfile Normal file

File diff suppressed because it is too large

5
Makefile Normal file

@@ -0,0 +1,5 @@
build-docs:
doxygen Doxyfile
docs: build-docs
python -m http.server -d docs/html

15
README.md Normal file

@@ -0,0 +1,15 @@
# Simple Models C Compiler
> Smaller Compiler (SMCC)
This is a simple C compiler that generates executable code from a small C99 subset. The language supports basic arithmetic, logical, and conditional operations; if/else, while, for, and switch/case statements; and function and system calls.
## Features
- Isolated standard library
- Lightweight
- Modular
- Self-hosting build


@@ -1,18 +0,0 @@
all: ccompiler
run: ccompiler
./ccompiler test.c flat.bin
ccompiler: frontend ir
gcc -g rv32ima_codegen.c -L../../frontend -lfrontend -L../../middleend -lir -o ccompiler
frontend:
make -C ../../frontend
ir:
make -C ../../middleend
clean:
rm -f ccompiler flat.bin
make -C ../../frontend clean
make -C ../../middleend clean


@@ -1,341 +0,0 @@
#ifndef __RV32I_GEN_H__
#define __RV32I_GEN_H__
/**
31 25 24 20 19 15 14 12 11 7 6 0
imm[31:12] rd 0110111 U lui
imm[31:12] rd 0010111 U auipc
imm[20|10:1|11|19:12] rd 1101111 J jal
imm[11:0] rs1 000 rd 1100111 I jalr
imm[12|10:5] rs2 rs1 000 imm[4:1|11] 1100011 B beq
imm[12|10:5] rs2 rs1 001 imm[4:1|11] 1100011 B bne
imm[12|10:5] rs2 rs1 100 imm[4:1|11] 1100011 B blt
imm[12|10:5] rs2 rs1 101 imm[4:1|11] 1100011 B bge
imm[12|10:5] rs2 rs1 110 imm[4:1|11] 1100011 B bltu
imm[12|10:5] rs2 rs1 111 imm[4:1|11] 1100011 B bgeu
imm[11:0] rs1 000 rd 0000011 I lb
imm[11:0] rs1 001 rd 0000011 I lh
imm[11:0] rs1 010 rd 0000011 I lw
imm[11:0] rs1 100 rd 0000011 I lbu
imm[11:0] rs1 101 rd 0000011 I lhu
imm[11:5] rs2 rs1 000 imm[4:0] 0100011 S sb
imm[11:5] rs2 rs1 001 imm[4:0] 0100011 S sh
imm[11:5] rs2 rs1 010 imm[4:0] 0100011 S sw
imm[11:0] rs1 000 rd 0010011 I addi
imm[11:0] rs1 010 rd 0010011 I slti
imm[11:0] rs1 011 rd 0010011 I sltiu
imm[11:0] rs1 100 rd 0010011 I xori
imm[11:0] rs1 110 rd 0010011 I ori
imm[11:0] rs1 111 rd 0010011 I andi
0000000 shamt rs1 001 rd 0010011 I slli
0000000 shamt rs1 101 rd 0010011 I srli
0100000 shamt rs1 101 rd 0010011 I srai
0000000 rs2 rs1 000 rd 0110011 R add
0100000 rs2 rs1 000 rd 0110011 R sub
0000000 rs2 rs1 001 rd 0110011 R sll
0000000 rs2 rs1 010 rd 0110011 R slt
0000000 rs2 rs1 011 rd 0110011 R sltu
0000000 rs2 rs1 100 rd 0110011 R xor
0000000 rs2 rs1 101 rd 0110011 R srl
0100000 rs2 rs1 101 rd 0110011 R sra
0000000 rs2 rs1 110 rd 0110011 R or
0000000 rs2 rs1 111 rd 0110011 R and
0000 pred succ 00000 000 00000 0001111 I fence
0000 0000 0000 00000 001 00000 0001111 I fence.i
000000000000 00000 000 00000 1110011 I ecall
000000000001 00000 000 00000 1110011 I ebreak
csr rs1 001 rd 1110011 I csrrw
csr rs1 010 rd 1110011 I csrrs
csr rs1 011 rd 1110011 I csrrc
csr zimm 101 rd 1110011 I csrrwi
csr zimm 110 rd 1110011 I csrrsi
csr zimm 111 rd 1110011 I csrrci
*/
#include <stdint.h>
// Register enumeration
typedef enum {
REG_X0, REG_X1, REG_X2, REG_X3, REG_X4, REG_X5, REG_X6, REG_X7,
REG_X8, REG_X9, REG_X10, REG_X11, REG_X12, REG_X13, REG_X14, REG_X15,
REG_X16, REG_X17, REG_X18, REG_X19, REG_X20, REG_X21, REG_X22, REG_X23,
REG_X24, REG_X25, REG_X26, REG_X27, REG_X28, REG_X29, REG_X30, REG_X31,
REG_ZERO = REG_X0, REG_RA = REG_X1, REG_SP = REG_X2, REG_GP = REG_X3,
REG_TP = REG_X4, REG_T0 = REG_X5, REG_T1 = REG_X6, REG_T2 = REG_X7,
REG_S0 = REG_X8, REG_S1 = REG_X9, REG_A0 = REG_X10, REG_A1 = REG_X11,
REG_A2 = REG_X12, REG_A3 = REG_X13, REG_A4 = REG_X14, REG_A5 = REG_X15,
REG_A6 = REG_X16, REG_A7 = REG_X17, REG_S2 = REG_X18, REG_S3 = REG_X19,
REG_S4 = REG_X20, REG_S5 = REG_X21, REG_S6 = REG_X22, REG_S7 = REG_X23,
REG_S8 = REG_X24, REG_S9 = REG_X25, REG_S10 = REG_X26, REG_S11 = REG_X27,
REG_T3 = REG_X28, REG_T4 = REG_X29, REG_T5 = REG_X30, REG_T6 = REG_X31,
} RV32Reg;
/******************** Immediate handling macros ********************/
#define IMM_12BITS(imm) ((imm) & 0xFFF)
#define IMM_20BITS(imm) ((imm) & 0xFFFFF)
#define SHAMT_VAL(imm) ((imm) & 0x1F)
#define CSR_VAL(csr) ((csr) & 0xFFF)
// B-type immediate encoding [12|10:5|4:1|11]
#define ENCODE_B_IMM(imm) ( \
(((imm) >> 12) & 0x1) << 31 | /* imm[12:12] -> instr[31:31] */ \
(((imm) >> 5) & 0x3F) << 25 | /* imm[10:5] -> instr[30:25] */ \
(((imm) >> 1) & 0xF) << 8 | /* imm[4:1] -> instr[11:8] */ \
(((imm) >> 11) & 0x1) << 7) /* imm[11:11] -> instr[7:7] */
// J-type immediate encoding [20|10:1|11|19:12]
#define ENCODE_J_IMM(imm) ( \
(((imm) >> 20) & 0x1) << 31 | /* imm[20:20] -> instr[31:31] */ \
(((imm) >> 1) & 0x3FF)<< 21 | /* imm[10:1] -> instr[30:21] */ \
(((imm) >> 11) & 0x1) << 20 | /* imm[11:11] -> instr[20:20] */ \
(((imm) >> 12) & 0xFF) << 12) /* imm[19:12] -> instr[19:12] */
/******************** Instruction generation macros ********************/
// R-type instruction macro
#define RV32_RTYPE(op, f3, f7, rd, rs1, rs2) (uint32_t)( \
((op) | ((rd) << 7) | ((f3) << 12) | ((rs1) << 15) | \
((rs2) << 20) | ((f7) << 25)) )
// I-type instruction macro
#define RV32_ITYPE(op, f3, rd, rs1, imm) (uint32_t)( \
(op | ((rd) << 7) | ((f3) << 12) | ((rs1) << 15) | \
(IMM_12BITS(imm) << 20)) )
// S-type instruction macro
#define RV32_STYPE(op, f3, rs1, rs2, imm) (uint32_t)( \
(op | ((IMM_12BITS(imm) & 0xFE0) << 20) | ((rs1) << 15) | \
((rs2) << 20) | ((f3) << 12) | ((IMM_12BITS(imm) & 0x1F) << 7)) )
// B-type instruction macro
#define RV32_BTYPE(op, f3, rs1, rs2, imm) (uint32_t)( \
(op | (ENCODE_B_IMM(imm)) | ((rs1) << 15) | \
((rs2) << 20) | ((f3) << 12)) )
// U-type instruction macro
#define RV32_UTYPE(op, rd, imm) (uint32_t)( \
(op | ((rd) << 7) | (IMM_20BITS((imm) >> 12) << 12)) )
// J-type instruction macro
#define RV32_JTYPE(op, rd, imm) (uint32_t)( \
(op | ((rd) << 7) | ENCODE_J_IMM(imm)) )
/******************** U-type ********************/
#define LUI(rd, imm) RV32_UTYPE(0x37, rd, imm)
#define AUIPC(rd, imm) RV32_UTYPE(0x17, rd, imm)
/******************** J-type ********************/
#define JAL(rd, imm) RV32_JTYPE(0x6F, rd, imm)
/******************** I-type ********************/
#define JALR(rd, rs1, imm) RV32_ITYPE(0x67, 0x0, rd, rs1, imm)
// Load instructions
#define LB(rd, rs1, imm) RV32_ITYPE(0x03, 0x0, rd, rs1, imm)
#define LH(rd, rs1, imm) RV32_ITYPE(0x03, 0x1, rd, rs1, imm)
#define LW(rd, rs1, imm) RV32_ITYPE(0x03, 0x2, rd, rs1, imm)
#define LBU(rd, rs1, imm) RV32_ITYPE(0x03, 0x4, rd, rs1, imm)
#define LHU(rd, rs1, imm) RV32_ITYPE(0x03, 0x5, rd, rs1, imm)
// Immediate arithmetic
#define ADDI(rd, rs1, imm) RV32_ITYPE(0x13, 0x0, rd, rs1, imm)
#define SLTI(rd, rs1, imm) RV32_ITYPE(0x13, 0x2, rd, rs1, imm)
#define SLTIU(rd, rs1, imm) RV32_ITYPE(0x13, 0x3, rd, rs1, imm)
#define XORI(rd, rs1, imm) RV32_ITYPE(0x13, 0x4, rd, rs1, imm)
#define ORI(rd, rs1, imm) RV32_ITYPE(0x13, 0x6, rd, rs1, imm)
#define ANDI(rd, rs1, imm) RV32_ITYPE(0x13, 0x7, rd, rs1, imm)
// Shift instructions
#define SLLI(rd, rs1, shamt) RV32_ITYPE(0x13, 0x1, rd, rs1, SHAMT_VAL(shamt))
#define SRLI(rd, rs1, shamt) RV32_ITYPE(0x13, 0x5, rd, rs1, SHAMT_VAL(shamt))
#define SRAI(rd, rs1, shamt) RV32_ITYPE(0x13, 0x5, rd, rs1, (0x400 | SHAMT_VAL(shamt)))
/******************** B-type ********************/
#define BEQ(rs1, rs2, imm) RV32_BTYPE(0x63, 0x0, rs1, rs2, imm)
#define BNE(rs1, rs2, imm) RV32_BTYPE(0x63, 0x1, rs1, rs2, imm)
#define BLT(rs1, rs2, imm) RV32_BTYPE(0x63, 0x4, rs1, rs2, imm)
#define BGE(rs1, rs2, imm) RV32_BTYPE(0x63, 0x5, rs1, rs2, imm)
#define BLTU(rs1, rs2, imm) RV32_BTYPE(0x63, 0x6, rs1, rs2, imm)
#define BGEU(rs1, rs2, imm) RV32_BTYPE(0x63, 0x7, rs1, rs2, imm)
/******************** S-type ********************/
#define SB(rs2, rs1, imm) RV32_STYPE(0x23, 0x0, rs1, rs2, imm)
#define SH(rs2, rs1, imm) RV32_STYPE(0x23, 0x1, rs1, rs2, imm)
#define SW(rs2, rs1, imm) RV32_STYPE(0x23, 0x2, rs1, rs2, imm)
/******************** R-type ********************/
#define ADD(rd, rs1, rs2) RV32_RTYPE(0x33, 0x0, 0x00, rd, rs1, rs2)
#define SUB(rd, rs1, rs2) RV32_RTYPE(0x33, 0x0, 0x20, rd, rs1, rs2)
#define SLL(rd, rs1, rs2) RV32_RTYPE(0x33, 0x1, 0x00, rd, rs1, rs2)
#define SLT(rd, rs1, rs2) RV32_RTYPE(0x33, 0x2, 0x00, rd, rs1, rs2)
#define SLTU(rd, rs1, rs2) RV32_RTYPE(0x33, 0x3, 0x00, rd, rs1, rs2)
#define XOR(rd, rs1, rs2) RV32_RTYPE(0x33, 0x4, 0x00, rd, rs1, rs2)
#define SRL(rd, rs1, rs2) RV32_RTYPE(0x33, 0x5, 0x00, rd, rs1, rs2)
#define SRA(rd, rs1, rs2) RV32_RTYPE(0x33, 0x5, 0x20, rd, rs1, rs2)
#define OR(rd, rs1, rs2) RV32_RTYPE(0x33, 0x6, 0x00, rd, rs1, rs2)
#define AND(rd, rs1, rs2) RV32_RTYPE(0x33, 0x7, 0x00, rd, rs1, rs2)
/******************** I-type (system) ********************/
#define FENCE(pred, succ) (uint32_t)( 0x0F | ((succ) << 20) | ((pred) << 24) )
#define FENCE_I() (uint32_t)( 0x100F )
#define ECALL() (uint32_t)( 0x73 )
#define EBREAK() (uint32_t)( 0x100073 )
// CSR instructions
#define CSRRW(rd, csr, rs) RV32_ITYPE(0x73, 0x1, rd, rs, CSR_VAL(csr))
#define CSRRS(rd, csr, rs) RV32_ITYPE(0x73, 0x2, rd, rs, CSR_VAL(csr))
#define CSRRC(rd, csr, rs) RV32_ITYPE(0x73, 0x3, rd, rs, CSR_VAL(csr))
#define CSRRWI(rd, csr, zimm) RV32_ITYPE(0x73, 0x5, rd, ((zimm) & 0x1F), CSR_VAL(csr))
#define CSRRSI(rd, csr, zimm) RV32_ITYPE(0x73, 0x6, rd, ((zimm) & 0x1F), CSR_VAL(csr))
#define CSRRCI(rd, csr, zimm) RV32_ITYPE(0x73, 0x7, rd, ((zimm) & 0x1F), CSR_VAL(csr))
/* M-Extention */
#define MUL(rd, rs1, rs2) RV32_RTYPE(0x33, 0x0, 0x01, rd, rs1, rs2)
#define DIV(rd, rs1, rs2) RV32_RTYPE(0x33, 0x0, 0x05, rd, rs1, rs2)
#define REM(rd, rs1, rs2) RV32_RTYPE(0x33, 0x0, 0x07, rd, rs1, rs2)
/******************** Pseudo-instructions ********************/
// nop (No operation)
#define NOP() ADDI(REG_X0, REG_X0, 0) // no operation
// neg rd, rs (Two's complement of rs)
#define NEG(rd, rs) SUB(rd, REG_ZERO, rs) // two's complement
// negw rd, rs (Two's complement word of rs)
#define NEGW(rd, rs) SUBW(rd, REG_ZERO, rs) // two's complement of a word
// snez rd, rs (Set if ≠ zero)
#define SNEZ(rd, rs) SLTU(rd, REG_X0, rs) // set if not zero
// sltz rd, rs (Set if < zero)
#define SLTZ(rd, rs) SLT(rd, rs, REG_X0) // set if less than zero
// sgtz rd, rs (Set if > zero)
#define SGTZ(rd, rs) SLT(rd, REG_X0, rs) // set if greater than zero
// beqz rs, offset (Branch if = zero)
#define BEQZ(rs, offset) BEQ(rs, REG_X0, offset) // branch if zero
// bnez rs, offset (Branch if ≠ zero)
#define BNEZ(rs, offset) BNE(rs, REG_X0, offset) // branch if not zero
// blez rs, offset (Branch if ≤ zero)
#define BLEZ(rs, offset) BGE(REG_X0, rs, offset) // branch if less than or equal to zero
// bgez rs, offset (Branch if ≥ zero)
#define BGEZ(rs, offset) BGE(rs, REG_X0, offset) // branch if greater than or equal to zero
// bltz rs, offset (Branch if < zero)
#define BLTZ(rs, offset) BLT(rs, REG_X0, offset) // branch if less than zero
// bgtz rs, offset (Branch if > zero)
#define BGTZ(rs, offset) BLT(REG_X0, rs, offset) // branch if greater than zero
// j offset (Jump)
#define J(offset) JAL(REG_X0, offset) // jump
// jr rs (Jump register)
#define JR(rs) JALR(REG_X0, rs, 0) // jump via register
// ret (Return from subroutine)
#define RET() JALR(REG_X0, REG_RA, 0) // return from subroutine
// tail offset (Tail call far-away subroutine)
#define TAIL_2(offset) AUIPC(REG_X6, offset), JALR(REG_X0, REG_X6, offset) // tail-call a far-away subroutine; 2 instructions
#define TAIL(offset) TAIL_2(offset) // Warning: this expands to 2 instructions
// csrr rd, csr (Read CSR)
#define CSRR(csr, rd) CSRRS(rd, csr, REG_X0) // read a CSR
// csrw csr, rs (Write CSR)
#define CSRW(csr, rs) CSRRW(REG_X0, csr, rs) // write a CSR
// csrs csr, rs (Set bits in CSR)
#define CSRS(csr, rs) CSRRS(REG_X0, csr, rs) // set bits in a CSR
// csrc csr, rs (Clear bits in CSR)
#define CSRC(csr, rs) CSRRC(REG_X0, csr, rs) // clear bits in a CSR
// csrci csr, imm (Immediate clear bits in CSR)
#define CSRCI(csr, imm) CSRRCI(REG_X0, csr, imm) // clear CSR bits with an immediate
// csrwi csr, imm (Write CSR immediate)
#define CSRRWI2(csr, imm) CSRRWI(REG_X0, csr, imm) // write an immediate to a CSR
// csrsi csr, imm (Immediate set bits in CSR)
#define CSRRSI2(csr, imm) CSRRSI(REG_X0, csr, imm) // set CSR bits with an immediate
// csrci csr, imm (Immediate clear bits in CSR)
#define CSRRCI2(csr, imm) CSRRCI(REG_X0, csr, imm) // clear CSR bits with an immediate
// // frcsr rd (Read FP control/status register)
// #define FRCSR(rd) CSRRS(rd, FCSR, REG_X0) // read the FP control/status register
// // fscsr rs (Write FP control/status register)
// #define FSCSR(rs) CSRRW(REG_X0, FCSR, rs) // write the FP control/status register
// // frrm rd (Read FP rounding mode)
// #define FRRM(rd) CSRRS(rd, FRM, REG_X0) // read the FP rounding mode
// // fsrm rs (Write FP rounding mode)
// #define FSRM(rs) CSRRW(REG_X0, FRM, rs) // write the FP rounding mode
// // frflags rd (Read FP exception flags)
// #define FRFLAGS(rd) CSRRS(rd, FFLAGS, REG_X0) // read the FP exception flags
// // fsflags rs (Write FP exception flags)
// #define FSFLAGS(rs) CSRRW(REG_X0, FFLAGS, rs) // write the FP exception flags
// Myriad sequences
#define LI(rd, num) \
LUI(rd, num), \
ADDI(rd, rd, num)
#define MV(rd, rs) ADDI(rd, rs, 0)
#define NOT(rd, rs) XORI(rd, rs, -1)
#define SEQZ(rd, rs) SLTIU(rd, rs, 1)
#define SGT(rd, rs1, rs2) SLT(rd, rs2, rs1)
// TODO: CALL is incorrect when the offset is outside the JALR immediate range
#define CALL(offset) \
AUIPC(REG_X1, REG_X0), \
JALR(REG_X1, REG_X1, offset)
#define CALL_ABS(addr) \
AUIPC(REG_X0, addr), \
JALR(REG_X1, REG_X0, addr)
#ifdef RISCV_VM_BUILDIN_ECALL
#define ECALL_PNT_INT(num) \
ADDI(REG_A0, REG_X0, num), \
ADDI(REG_A7, REG_X0, 0x1), \
ECALL()
#define ECALL_PNT_STR(str) \
ADDI(REG_A0, REG_X0, str), \
ADDI(REG_A7, REG_X0, 0x4), \
ECALL()
#define ECALL_EXIT2() \
ADDI(REG_A7, REG_X0, 93), \
ECALL()
#define ECALL_EXIT_ARG(errno) \
ADDI(REG_A0, REG_X0, errno), \
ECALL_EXIT2()
#define ECALL_EXIT() \
ADDI(REG_A7, REG_X0, 93), \
ECALL()
#define ECALL_SCAN_INT(int) \
ADDI(REG_A7, REG_X0, (1025 + 4)), \
ECALL()
#define ECALL_SCAN_STR(str) \
ADDI(REG_A0, REG_X0, str), \
ADDI(REG_A7, REG_X0, (1025 + 5)), \
ECALL()
#endif
#endif


@@ -1,458 +0,0 @@
#define RISCV_VM_BUILDIN_ECALL
#include "rv32gen.h"
#include <stdio.h>
#include <assert.h>
// Instruction encoding union (handles little-endian layout automatically)
typedef union rv32code {
uint32_t code;
uint8_t bytes[4];
} rv32code_t;
#include "../../frontend/frontend.h"
#include "../../middleend/ir.h"
typedef struct {
int code_pos;
int to_idx;
int cur_idx;
int base_offset;
enum {
JMP_BRANCH,
JMP_JUMP,
JMP_CALL,
} type;
} jmp_t;
static struct {
vector_header(codes, rv32code_t);
int stack_offset;
int stack_base;
int tmp_reg;
ir_bblock_t* cur_block;
ir_func_t* cur_func;
ir_prog_t* prog;
vector_header(jmp, jmp_t*);
vector_header(call, jmp_t*);
int cur_func_offset;
int cur_block_offset;
} ctx;
int write_inst(union rv32code ins, FILE* fp) {
return fwrite(&ins, sizeof(union rv32code), 1, fp);
}
#define GENCODE(code) vector_push(ctx.codes, (rv32code_t)(code)); len += 4
#define GENCODES(...) do { \
rv32code_t codes[] = { \
__VA_ARGS__ \
}; \
for (int i = 0; i < sizeof(codes) / sizeof(codes[0]); i ++) { \
GENCODE(codes[i]); \
} \
} while (0)
static int stack_offset(ir_node_t* ptr) {
int offset = ctx.stack_base;
for (int i = 0; i < ctx.cur_func->bblocks.size; i ++) {
ir_bblock_t* block = vector_at(ctx.cur_func->bblocks, i);
for (int i = 0; i < block->instrs.size; i++) {
if (vector_at(block->instrs, i) == ptr) {
offset += i * 4;
assert(offset >= 0 && offset < ctx.stack_offset);
return offset;
}
}
offset += block->instrs.size * 4;
}
assert(0);
}
static int block_idx(ir_bblock_t* toblock) {
for (int i = 0; i < ctx.cur_func->bblocks.size; i ++) {
ir_bblock_t* block = vector_at(ctx.cur_func->bblocks, i);
if (toblock == block) {
return i;
}
}
assert(0);
}
static int func_idx(ir_func_t* tofunc) {
for (int i = 0; i < ctx.prog->funcs.size; i ++) {
ir_func_t* func = vector_at(ctx.prog->funcs, i);
if (tofunc == func) {
return i;
}
}
assert(0);
}
static int system_func(const char* name) {
static struct {
const char* name;
int ecall_num;
} defined_func[] = {
{"ecall_pnt_int", 1},
{"ecall_pnt_char", 11},
{"ecall_scan_int", 1025 + 4},
};
for (int i = 0; i < sizeof(defined_func)/sizeof(defined_func[0]); i++) {
if (strcmp(name, defined_func[i].name) == 0) {
return defined_func[i].ecall_num;
}
}
return -1;
}
static int get_node_val(ir_node_t* ptr, int reg) {
int len = 0;
switch (ptr->tag) {
case IR_NODE_CONST_INT: {
GENCODES(LI(reg, ptr->data.const_int.val));
break;
}
// case IR_NODE_CALL: {
// // GENCODE(SW(REG_A0, REG_SP, ctx.stack_offset));
// // GENCODE()
// // break;
// }
default: {
int offset = stack_offset(ptr);
GENCODE(LW(reg, REG_SP, offset));
break;
}
}
return len;
}
static int gen_instr(ir_bblock_t* block, ir_node_t* instr) {
int len = 0;
int offset;
switch (instr->tag) {
case IR_NODE_ALLOC: {
break;
}
case IR_NODE_LOAD: {
// S1 = *(S0 + imm)
offset = stack_offset(instr->data.load.target);
GENCODE(LW(REG_T0, REG_SP, offset));
break;
}
case IR_NODE_STORE: {
// *(S0 + imm) = S1
len += get_node_val(instr->data.store.value, REG_T0);
offset = stack_offset(instr->data.store.target);
GENCODE(SW(REG_T0, REG_SP, offset));
break;
}
case IR_NODE_RET: {
// A0 = S0
if (instr->data.ret.ret_val != NULL) {
len += get_node_val(instr->data.ret.ret_val, REG_A0);
}
GENCODE(LW(REG_RA, REG_SP, 0));
GENCODE(ADDI(REG_SP, REG_SP, ctx.stack_offset));
GENCODE(RET());
break;
}
case IR_NODE_OP: {
len += get_node_val(instr->data.op.lhs, REG_T1);
len += get_node_val(instr->data.op.rhs, REG_T2);
switch (instr->data.op.op) {
case IR_OP_ADD:
GENCODE(ADD(REG_T0, REG_T1, REG_T2));
break;
case IR_OP_SUB:
GENCODE(SUB(REG_T0, REG_T1, REG_T2));
break;
case IR_OP_MUL:
GENCODE(MUL(REG_T0, REG_T1, REG_T2));
break;
case IR_OP_DIV:
GENCODE(DIV(REG_T0, REG_T1, REG_T2));
break;
case IR_OP_MOD:
GENCODE(REM(REG_T0, REG_T1, REG_T2));
break;
case IR_OP_EQ:
GENCODE(XOR(REG_T0, REG_T1, REG_T2));
GENCODE(SEQZ(REG_T0, REG_T0));
break;
case IR_OP_GE:
GENCODE(SLT(REG_T0, REG_T1, REG_T2));
GENCODE(SEQZ(REG_T0, REG_T0));
break;
case IR_OP_GT:
GENCODE(SGT(REG_T0, REG_T1, REG_T2));
break;
case IR_OP_LE:
GENCODE(SGT(REG_T0, REG_T1, REG_T2));
GENCODE(SEQZ(REG_T0, REG_T0));
break;
case IR_OP_LT:
GENCODE(SLT(REG_T0, REG_T1, REG_T2));
break;
case IR_OP_NEQ:
GENCODE(XOR(REG_T0, REG_T1, REG_T2));
break;
default:
error("ERROR gen_instr op in riscv");
break;
}
offset = stack_offset(instr);
GENCODE(SW(REG_T0, REG_SP, offset));
break;
}
case IR_NODE_BRANCH: {
len += get_node_val(instr->data.branch.cond, REG_T0);
int tidx = block_idx(instr->data.branch.true_bblock);
int fidx = block_idx(instr->data.branch.false_bblock);
int cidx = block_idx(ctx.cur_block);
jmp_t* jmp;
jmp = xmalloc(sizeof(jmp_t));
*jmp = (jmp_t) {
.base_offset = 8,
.code_pos = ctx.codes.size,
.type = JMP_BRANCH,
.to_idx = tidx,
.cur_idx=cidx,
};
vector_push(ctx.jmp, jmp);
GENCODE(BNEZ(REG_T0, 0));
jmp = xmalloc(sizeof(jmp_t));
*jmp = (jmp_t) {
.base_offset = 4,
.code_pos = ctx.codes.size,
.type = JMP_JUMP,
.to_idx = fidx,
.cur_idx=cidx,
};
vector_push(ctx.jmp, jmp);
GENCODE(J(0));
break;
}
case IR_NODE_JUMP: {
int idx = block_idx(instr->data.jump.target_bblock);
jmp_t* jmp = xmalloc(sizeof(jmp_t));
*jmp = (jmp_t) {
.base_offset = 4,
.code_pos = ctx.codes.size,
.type = JMP_JUMP,
.to_idx = idx,
.cur_idx=block_idx(ctx.cur_block),
};
vector_push(ctx.jmp, jmp);
GENCODE(J(0));
break;
}
case IR_NODE_CALL: {
if (instr->data.call.args.size > 8) {
error("cannot pass more than 8 parameters");
}
int param_regs[8] = {
REG_A0, REG_A1, REG_A2, REG_A3,
REG_A4, REG_A5, REG_A6, REG_A7
};
for (int i = 0; i < instr->data.call.args.size; i++) {
ir_node_t* param = vector_at(instr->data.call.args, i);
len += get_node_val(param, param_regs[i]);
}
int system_func_idx = system_func(instr->data.call.callee->name);
if (system_func_idx != -1) {
// ecall
GENCODES(
ADDI(REG_A7, REG_X0, system_func_idx),
ECALL()
);
goto CALL_END;
}
jmp_t* jmp = xmalloc(sizeof(jmp_t));
*jmp = (jmp_t) {
.base_offset = ctx.cur_func_offset + ctx.cur_block_offset + len,
.code_pos = ctx.codes.size,
.type = JMP_CALL,
.to_idx = func_idx(instr->data.call.callee),
.cur_idx = func_idx(ctx.cur_func),
};
vector_push(ctx.call, jmp);
GENCODES(CALL(0));
CALL_END:
offset = stack_offset(instr);
GENCODE(SW(REG_A0, REG_SP, offset));
break;
}
default:
error("ERROR gen_instr in riscv");
}
return len;
}
static int gen_block(ir_bblock_t* block) {
int len = 0;
ctx.cur_block = block;
for (int i = 0; i < block->instrs.size; i ++) {
ctx.cur_block_offset = len;
len += gen_instr(block, vector_at(block->instrs, i));
}
return len;
}
static int gen_func(ir_func_t* func) {
int len = 0;
ctx.cur_func = func;
ctx.stack_base = 16;
ctx.stack_offset = ctx.stack_base;
for (int i = 0; i < func->bblocks.size; i++) {
ctx.stack_offset += 4 * (*vector_at(func->bblocks, i)).instrs.size;
}
GENCODE(ADDI(REG_SP, REG_SP, -ctx.stack_offset));
GENCODE(SW(REG_RA, REG_SP, 0));
int param_regs[8] = {
REG_A0, REG_A1, REG_A2, REG_A3,
REG_A4, REG_A5, REG_A6, REG_A7
};
if (func->params.size > 8) {
error("cannot pass more than 8 parameters");
}
for (int i = 0; i < func->params.size; i++) {
int offset = stack_offset(vector_at(func->params, i));
GENCODE(SW(param_regs[i], REG_SP, offset));
}
int jmp_cache[func->bblocks.size + 1];
if (ctx.jmp.data != NULL) vector_free(ctx.jmp);
vector_init(ctx.jmp);
jmp_cache[0] = 0;
for(int i = 0; i < func->bblocks.size; i ++) {
ctx.cur_func_offset = len;
jmp_cache[i + 1] = jmp_cache[i];
int ret = gen_block(vector_at(func->bblocks, i));
jmp_cache[i + 1] += ret;
len += ret;
}
for (int i = 0; i < ctx.jmp.size; i++) {
jmp_t* jmp = vector_at(ctx.jmp, i);
int32_t code = 0;
int offset = jmp_cache[jmp->to_idx] - (jmp_cache[jmp->cur_idx + 1] - jmp->base_offset);
if (jmp->type == JMP_JUMP) {
code = J(offset);
} else {
code = BNEZ(REG_T0, offset);
}
ctx.codes.data[jmp->code_pos] = (rv32code_t) {
.code = code,
};
}
return len;
}
static int gen_code(ir_prog_t* prog) {
ctx.prog = prog;
for (int i = 0; i < prog->extern_funcs.size; i++) {
if (system_func(prog->extern_funcs.data[i]->name) == -1) {
error("func %s not defined and not a system func", prog->extern_funcs.data[i]->name);
}
}
int len = 0;
int jmp_cache[prog->funcs.size + 1];
jmp_cache[0] = 0;
for(int i = 0; i < prog->funcs.size; i ++) {
jmp_cache[i + 1] = jmp_cache[i];
int ret = gen_func(vector_at(prog->funcs, i));
jmp_cache[i + 1] += ret;
len += ret;
}
for (int i = 0; i < ctx.call.size; i++) {
jmp_t* jmp = vector_at(ctx.call, i);
int32_t code = 0;
// FIXME ERROR
int offset = jmp_cache[jmp->to_idx] - (jmp_cache[jmp->cur_idx] + jmp->base_offset);
assert(offset > -0xfff && offset < 0xfff);
int32_t codes[2] = {
CALL(offset)
};
for (int i = 0; i < 2; i++) {
ctx.codes.data[jmp->code_pos + i] = (rv32code_t) {
.code = codes[i],
};
}
}
// Find the entry offset of main
for (int i = 0; i < prog->funcs.size; i++) {
if (strcmp(vector_at(prog->funcs, i)->name, "main") == 0) {
return jmp_cache[i];
}
}
error("main not found");
}
int main(int argc, char** argv) {
// gcc rv32ima_codegen.c -o rv32gen.exe
const char* infilename = "test.c";
const char* outfilename = "flat.bin";
if (argc >= 2) {
infilename = argv[1];
}
if (argc >= 3) {
outfilename = argv[2];
}
FILE* in = fopen(infilename, "r");
FILE* out = fopen(outfilename, "wb");
if (in == NULL || out == NULL) {
printf("Failed to open file\n");
return 1;
}
struct ASTNode* root = frontend(infilename, in, (sread_fn)fread_s);
gen_ir_from_ast(root);
int main_pos = gen_code(&prog);
#define CRT_CODE_SIZE 16
rv32code_t gcodes[] = {
LI(REG_SP, 0x1000),
LI(REG_RA, 0x0),
CALL(0),
// Exit
ECALL_EXIT2(),
};
main_pos += (CRT_CODE_SIZE - 4) * 4;
assert(main_pos > -0xfff && main_pos < 0xfff);
rv32code_t call_main[2] = {
CALL(main_pos)
};
gcodes[4] = call_main[0];
gcodes[5] = call_main[1];
for (int i = 0; i < CRT_CODE_SIZE; i++) {
write_inst((union rv32code) {
.code = NOP(),
}, out);
}
fflush(out);
assert(CRT_CODE_SIZE >= sizeof(gcodes) / sizeof(gcodes[0]));
fseek(out, 0, SEEK_SET);
fwrite(gcodes, sizeof(gcodes), 1, out);
fflush(out);
fseek(out, CRT_CODE_SIZE * 4, SEEK_SET);
fwrite(ctx.codes.data, sizeof(ctx.codes.data[0]), ctx.codes.size, out);
fflush(out);
fclose(in);
fclose(out);
// printf("compiler end, out: %s\n", outfilename);
return 0;
}


@@ -1,28 +0,0 @@
VM := ../../rv32-vm
CC := ../../ccompiler
STD_CC := gcc
TESTS := $(wildcard *.c)
# Define all test targets
TEST_TARGETS := $(patsubst %.c, %_test, $(TESTS))
all: $(TEST_TARGETS)
%_test: %.c
@$(STD_CC) -g -o $@ $<
@$(CC) $< flat.bin
@./$@ ; ret_gcc=$$?
@$(VM) flat.bin ; ret_vm=$$?
@echo "Testing $@"
@if [ $$ret_gcc -eq $$ret_vm ]; then \
echo "$@ passed"; \
else \
echo "$@ failed: GCC returned $$ret_gcc, VM returned $$ret_vm"; \
exit 1; \
fi
clean:
rm -f $(TEST_TARGETS) flat.bin
.PHONY: all clean


@@ -1,47 +0,0 @@
# Compiler settings
CC = gcc
AR = ar
CFLAGS = -g -Wall
# Source directories
LEXER_DIR = ./lexer
PARSER_DIR = ./parser
AST_DIR = ./parser/ast
SYMTAB_DIR = ./parser/symtab
# Source files
SRCS = \
frontend.c \
$(LEXER_DIR)/lexer.c \
$(LEXER_DIR)/token.c \
$(PARSER_DIR)/parser.c \
$(AST_DIR)/ast.c \
$(AST_DIR)/block.c \
$(AST_DIR)/decl.c \
$(AST_DIR)/expr.c \
$(AST_DIR)/func.c \
$(AST_DIR)/program.c \
$(AST_DIR)/stmt.c \
$(AST_DIR)/term.c \
$(SYMTAB_DIR)/hashmap.c \
$(SYMTAB_DIR)/scope.c \
$(SYMTAB_DIR)/symtab.c \
# Derived object files
OBJS = $(SRCS:.c=.o)
# Final target
TARGET = libfrontend.a
all: $(TARGET)
$(TARGET): $(OBJS)
$(AR) rcs $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $<
clean:
rm -f $(OBJS) $(TARGET)
.PHONY: all clean

View File

@@ -1,18 +0,0 @@
#include "lexer/lexer.h"
#include "parser/symtab/symtab.h"
#include "frontend.h"
struct ASTNode* frontend(const char* file, void* stream, sread_fn sread) {
lexer_t lexer;
init_lexer(&lexer, file, stream, sread);
symtab_t symtab;
init_symtab(&symtab);
parser_t parser;
init_parser(&parser, &lexer, &symtab);
parse_prog(&parser);
// TODO: free the resources
return parser.root;
}

View File

@@ -1,27 +0,0 @@
#ifndef __FRONTEND_H__
#define __FRONTEND_H__
#ifndef error
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#define STD_LIBRARY
#define error(...) do { fprintf(stderr, __VA_ARGS__); assert(0); } while (0)
#endif
#ifndef warn
#include <stdio.h>
#define STD_LIBRARY
#define warn(...) do { fprintf(stdout, __VA_ARGS__); } while (0)
#endif
#define xmalloc(size) malloc(size)
#ifndef FRONTEND_IMPLEMENTATION
#include "parser/parser.h"
#include "parser/ast/ast.h"
typedef int (*sread_fn)(void *dst_buf, int dst_size, int elem_size, int count, void *stream);
struct ASTNode* frontend(const char* file, void* stream, sread_fn sread);
#endif
#endif
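The `sread_fn` callback above has an `fread_s`-style shape, and callers elsewhere in the tree cast MSVC's `fread_s` straight to it. A minimal portable sketch of the same idea, wrapping standard `fread` behind a hypothetical `fread_adapter` (not part of this code base), could be:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Same shape as frontend.h's sread_fn callback.
typedef int (*sread_fn)(void* dst_buf, int dst_size, int elem_size, int count, void* stream);

// Hypothetical portable adapter: clamp the request to dst_size, then
// delegate to standard fread instead of the MSVC-only fread_s.
static int fread_adapter(void* dst_buf, int dst_size, int elem_size, int count, void* stream) {
    long want = (long)elem_size * count;
    if (want > dst_size) want = dst_size;  // never overrun dst_buf
    return (int)fread(dst_buf, 1, (size_t)want, (FILE*)stream);
}
```

On POSIX systems this wrapper can be passed wherever the code currently casts `fread_s`, since the lexer only relies on "returns the number of bytes actually read".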

View File

@@ -1,510 +0,0 @@
/**
 * Lexical analysis modeled after LCC (LCCompiler).
 *
 * Below is LCC's README as of 2025-02:
This hierarchy is the distribution for lcc version 4.2.
lcc version 3.x is described in the book "A Retargetable C Compiler:
Design and Implementation" (Addison-Wesley, 1995, ISBN 0-8053-1670-1).
There are significant differences between 3.x and 4.x, most notably in
the intermediate code. For details, see
https://drh.github.io/lcc/documents/interface4.pdf.
VERSION 4.2 IS INCOMPATIBLE WITH EARLIER VERSIONS OF LCC. DO NOT
UNLOAD THIS DISTRIBUTION ON TOP OF A 3.X DISTRIBUTION.
LCC is a C89 ("ANSI C") compiler designed to be highly retargetable.
LOG describes the changes since the last release.
CPYRIGHT describes the conditions under you can use, copy, modify, and
distribute lcc or works derived from lcc.
doc/install.html is an HTML file that gives a complete description of
the distribution and installation instructions.
Chris Fraser / cwf@aya.yale.edu
David Hanson / drh@drhanson.net
*/
#define FRONTEND_IMPLEMENTATION
#include "../frontend.h"
#include "token.h"
#include "lexer.h"
static const struct {
const char* name;
enum CSTD_KEYWORD std_type;
tok_type_t tok;
} keywords[] = {
#define X(name, std_type, tok, ...) { #name, std_type, tok },
KEYWORD_TABLE
#undef X
};
// Find a keyword by binary search over the sorted table; returns its index or -1.
static inline int keyword_cmp(const char* name, int len) {
int low = 0;
int high = sizeof(keywords) / sizeof(keywords[0]) - 1;
while (low <= high) {
int mid = (low + high) / 2;
const char *key = keywords[mid].name;
int cmp = 0;
// custom comparison of a length-delimited name
for (int i = 0; i < len; i++) {
if (name[i] != key[i]) {
cmp = (unsigned char)name[i] - (unsigned char)key[i];
break;
}
if (name[i] == '\0') break; // stop early at the terminator
}
if (cmp == 0) {
// exact-match check: lengths must also be equal
if (key[len] == '\0') return mid;
cmp = -1; // the table keyword is longer than the input
}
if (cmp < 0) {
high = mid - 1;
} else {
low = mid + 1;
}
}
return -1; // Not a keyword.
}
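keyword_cmp compares a length-delimited slice (identifiers in the lexer buffer are not NUL-terminated) against the sorted keyword table, treating a longer table entry as "greater". A self-contained sketch of that lookup, with a hypothetical three-entry table standing in for the generated KEYWORD_TABLE, is:

```c
#include <assert.h>
#include <string.h>

// Hypothetical sorted keyword table; the real one is generated from KEYWORD_TABLE.
static const char* const kw[] = { "break", "if", "while" };

// Binary search over a length-delimited name slice, mirroring keyword_cmp.
static int kw_lookup(const char* name, int len) {
    int low = 0, high = (int)(sizeof kw / sizeof kw[0]) - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        int cmp = strncmp(name, kw[mid], (size_t)len);
        if (cmp == 0)
            cmp = (kw[mid][len] == '\0') ? 0 : -1; // table entry longer than the slice
        if (cmp == 0) return mid;
        if (cmp < 0) high = mid - 1; else low = mid + 1;
    }
    return -1; // not a keyword
}
```

Note the second comparison: a prefix match ("whil" vs "while") only counts when the table entry ends exactly at `len`, which is what the `key[len] == '\0'` check in the original achieves.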
void init_lexer(lexer_t* lexer, const char* file_name, void* stream, lexer_sread_fn sread)
{
lexer->cur_ptr = lexer->end_ptr = (unsigned char*)&(lexer->buffer);
lexer->index = 1;
lexer->line = 1;
lexer->stream = stream;
lexer->sread = sread;
for (int i = 0; i < sizeof(lexer->buffer) / sizeof(lexer->buffer[0]); i++) {
lexer->buffer[i] = 0;
}
}
static void flush_buffer(lexer_t* lexer) {
int num = lexer->end_ptr - lexer->cur_ptr;
for (int i = 0; i < num; i++) {
lexer->buffer[i] = lexer->cur_ptr[i];
}
lexer->cur_ptr = (unsigned char*)lexer->buffer;
int read_size = LEXER_BUFFER_SIZE - num;
// TODO: size_t-to-int conversion may lose precision
int got_size = lexer->sread(lexer->buffer + num, read_size, 1, read_size, lexer->stream);
if (got_size < 0) {
error("lexer read error");
} else if (got_size < read_size) {
lexer->end_ptr += got_size;
lexer->end_ptr[0] = '\0'; // EOF
lexer->end_ptr++;
} else if (got_size == read_size) {
lexer->end_ptr += got_size;
} else {
error("lexer read error imposible got_size > read_size maybe overflow?");
}
}
static void goto_newline(lexer_t* lexer) {
do {
if (lexer->cur_ptr == lexer->end_ptr) {
flush_buffer(lexer);
lexer->cur_ptr--;
}
lexer->cur_ptr++;
} while (*lexer->cur_ptr != '\n' && *lexer->cur_ptr != '\0');
}
static void goto_block_comment(lexer_t* lexer) {
while (1) {
if (lexer->end_ptr - lexer->cur_ptr < 2) {
flush_buffer(lexer);
}
if (*lexer->cur_ptr == '\0') {
break;
} else if (lexer->cur_ptr[0] == '*' && lexer->cur_ptr[1] == '/') {
lexer->cur_ptr += 2;
break;
} else {
lexer->cur_ptr++;
}
}
}
// TODO: escape-sequence coverage is incomplete
static char got_slash(unsigned char* peek) {
switch (*peek) {
case '\\': return '\\';
case '\'': return '\'';
case '\"': return '\"';
case '\?': return '\?';
case '0': return '\0';
case 'b': return '\b';
case 'f': return '\f';
case 'n': return '\n';
case 'r': return '\r';
case 't': return '\t';
case 'v': return '\v';
default: error("Unknown escape character");
}
}
static void parse_char_literal(lexer_t* lexer, tok_t* token) {
char val = 0;
unsigned char* peek = lexer->cur_ptr + 1;
if (*peek == '\\') {
peek++;
val = got_slash(peek);
peek++;
} else {
val = *peek++;
}
if (*peek++ != '\'') error("Unclosed character literal");
token->val.ch = val;
lexer->cur_ptr = peek;
token->val.have = 1;
token->type = TOKEN_CHAR_LITERAL;
}
static void parse_string_literal(lexer_t* lexer, tok_t* token) {
unsigned char* peek = lexer->cur_ptr + 1;
// TODO string literal size check
char* dest = token->val.str = xmalloc(LEXER_MAX_TOKEN_SIZE + 1);
int len = 0;
while (*peek != '"') {
if (peek >= lexer->end_ptr) flush_buffer(lexer);
if (*peek == '\\') { // handle escape sequence
peek++;
*peek = got_slash(peek);
}
if (len >= LEXER_MAX_TOKEN_SIZE) error("String too long");
dest[len++] = *peek++;
}
dest[len] = '\0';
lexer->cur_ptr = peek + 1;
token->val.have = 1;
token->type = TOKEN_STRING_LITERAL;
}
// FIXME: AI-generated, may contain errors
static void parse_number(lexer_t* lexer, tok_t* token) {
unsigned char* peek = lexer->cur_ptr;
int base = 10;
int is_float = 0;
long long int_val = 0;
double float_val = 0.0;
double fraction = 1.0;
// Determine the base from the prefix
if (*peek == '0') {
peek++;
switch (*peek) {
case 'x':
case 'X':
base = 16;
peek++;
break;
default:
base = 8;
break;
}
}
// Parse the integer part
while (1) {
int digit = -1;
if (*peek >= '0' && *peek <= '9') {
digit = *peek - '0';
} else if (base == 16) {
if (*peek >= 'a' && *peek <= 'f') digit = *peek - 'a' + 10;
else if (*peek >= 'A' && *peek <= 'F') digit = *peek - 'A' + 10;
}
if (digit < 0 || digit >= base) break;
if (!is_float) {
int_val = int_val * base + digit;
} else {
float_val = float_val * base + digit;
fraction *= base;
}
peek++;
}
// Parse the fractional part
if (*peek == '.' && base == 10) {
is_float = 1;
float_val = int_val;
peek++;
while (*peek >= '0' && *peek <= '9') {
float_val = float_val * 10.0 + (*peek - '0');
fraction *= 10.0;
peek++;
}
float_val /= fraction;
}
// Parse scientific notation (exponent is parsed but not yet applied)
if ((*peek == 'e' || *peek == 'E') && base == 10) {
is_float = 1;
peek++;
// int exp_sign = 1;
int exponent = 0;
if (*peek == '+') peek++;
else if (*peek == '-') {
// exp_sign = -1;
peek++;
}
while (*peek >= '0' && *peek <= '9') {
exponent = exponent * 10 + (*peek - '0');
peek++;
}
// float_val *= pow(10.0, exp_sign * exponent);
}
// Store the result
lexer->cur_ptr = peek;
token->val.have = 1;
if (is_float) {
token->val.d = float_val;
token->type = TOKEN_FLOAT_LITERAL;
} else {
token->val.ll = int_val;
token->type = TOKEN_INT_LITERAL;
}
}
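The prefix handling is the subtle part of parse_number: after a leading `0`, an `x`/`X` must be consumed and selects base 16, otherwise the literal is octal. A standalone sketch of the intended integer path (a simplification; the real function also handles floats and exponents) is:

```c
// Minimal integer-literal scanner mirroring parse_number's intended logic:
// detect 0x/0X (hex) or a leading 0 (octal), then accumulate digits in that base.
static long scan_int(const char* p) {
    int base = 10;
    if (p[0] == '0') {
        if (p[1] == 'x' || p[1] == 'X') { base = 16; p += 2; } // consume "0x"
        else { base = 8; p += 1; }                             // consume "0"
    }
    long val = 0;
    for (;; p++) {
        int digit = -1;
        if (*p >= '0' && *p <= '9') digit = *p - '0';
        else if (*p >= 'a' && *p <= 'f') digit = *p - 'a' + 10;
        else if (*p >= 'A' && *p <= 'F') digit = *p - 'A' + 10;
        if (digit < 0 || digit >= base) break; // also rejects 8/9 in octal
        val = val * base + digit;
    }
    return val;
}
```

The `digit >= base` check doubles as validation, rejecting `8`/`9` inside octal literals and hex letters inside decimal ones, the same trick the original digit loop uses.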
#define GOT_ONE_TOKEN_BUF_SIZE 64
// ref: cppreference c/language/operator_arithmetic
void get_token(lexer_t* lexer, tok_t* token) {
// Ensure the buffer always has enough readable bytes
if (lexer->end_ptr - lexer->cur_ptr < GOT_ONE_TOKEN_BUF_SIZE) {
flush_buffer(lexer);
}
register unsigned char* peek = lexer->cur_ptr;
// Fast path: skip blanks
while (*peek == ' ' || *peek == '\t') {
if (peek == lexer->end_ptr) {
break;
}
peek++;
}
if (peek != lexer->cur_ptr) {
// report the skipped whitespace as TOKEN_FLUSH
lexer->cur_ptr = peek;
token->type = TOKEN_FLUSH;
}
tok_type_t tok = TOKEN_INIT;
tok_val_t constant;
constant.have = 0;
// single-character dispatch
switch (*peek++) {
case '=':
switch (*peek++) {
case '=': tok = TOKEN_EQ; break;
default: peek--, tok = TOKEN_ASSIGN; break;
} break;
case '+':
switch (*peek++) {
case '+': tok = TOKEN_ADD_ADD; break;
case '=': tok = TOKEN_ASSIGN_ADD; break;
default: peek--, tok = TOKEN_ADD; break;
} break;
case '-':
switch (*peek++) {
case '-': tok = TOKEN_SUB_SUB; break;
case '=': tok = TOKEN_ASSIGN_SUB; break;
case '>': tok = TOKEN_DEREF; break;
default: peek--, tok = TOKEN_SUB; break;
} break;
case '*':
switch (*peek++) {
case '=': tok = TOKEN_ASSIGN_MUL; break;
default: peek--, tok = TOKEN_MUL; break;
} break;
case '/':
switch (*peek++) {
case '=': tok = TOKEN_ASSIGN_DIV; break;
case '/': {
// line comment: skip to the next line
goto_newline(lexer);
tok = TOKEN_LINE_COMMENT;
goto END;
}
case '*': {
lexer->cur_ptr = peek;
goto_block_comment(lexer);
tok = TOKEN_BLOCK_COMMENT;
goto END;
}
default: peek--, tok = TOKEN_DIV; break;
} break;
case '%':
switch (*peek++) {
case '=': tok = TOKEN_ASSIGN_MOD; break;
default: peek--, tok = TOKEN_MOD; break;
} break;
case '&':
switch (*peek++) {
case '&': tok = TOKEN_AND_AND; break;
case '=': tok = TOKEN_ASSIGN_AND; break;
default: peek--, tok = TOKEN_AND; break;
} break;
case '|':
switch (*peek++) {
case '|': tok = TOKEN_OR_OR; break;
case '=': tok = TOKEN_ASSIGN_OR; break;
default: peek--, tok = TOKEN_OR; break;
} break;
case '^':
switch (*peek++) {
case '=': tok = TOKEN_ASSIGN_XOR; break;
default: peek--, tok = TOKEN_XOR; break;
} break;
case '<':
switch (*peek++) {
case '=': tok = TOKEN_LE; break;
case '<': tok = (*peek == '=') ? (peek++, TOKEN_ASSIGN_L_SH) : TOKEN_L_SH; break;
default: peek--, tok = TOKEN_LT; break;
} break;
case '>':
switch (*peek++) {
case '=': tok = TOKEN_GE; break;
case '>': tok = (*peek == '=') ? (peek++, TOKEN_ASSIGN_R_SH) : TOKEN_R_SH; break;
default: peek--, tok = TOKEN_GT; break;
} break;
case '~':
tok = TOKEN_BIT_NOT; break;
case '!':
switch (*peek++) {
case '=': tok = TOKEN_NEQ; break;
default: peek--, tok = TOKEN_NOT; break;
} break;
case '[':
tok = TOKEN_L_BRACKET; break;
case ']':
tok = TOKEN_R_BRACKET; break;
case '(':
tok = TOKEN_L_PAREN; break;
case ')':
tok = TOKEN_R_PAREN; break;
case '{':
tok = TOKEN_L_BRACE; break;
case '}':
tok = TOKEN_R_BRACE; break;
case ';':
tok = TOKEN_SEMICOLON; break;
case ',':
tok = TOKEN_COMMA; break;
case ':':
tok = TOKEN_COLON; break;
case '.':
if (peek[0] == '.' && peek[1] == '.') {
peek += 2;
tok = TOKEN_ELLIPSIS;
} else {
tok = TOKEN_DOT;
}
break;
case '?':
tok = TOKEN_COND; break;
case '\v': case '\r': case '\f': // FIXME: parsed as plain whitespace
tok = TOKEN_FLUSH; break;
case '\n':
// newline: bump the line counter and emit a flush token
lexer->line++;
tok = TOKEN_FLUSH; break;
case '#':
warn("TODO: #define\n");
goto_newline(lexer);
tok = TOKEN_FLUSH;
goto END;
case '\0':
// EOF
tok = TOKEN_EOF;
goto END;
case '\'':
return parse_char_literal(lexer, token);
case '"':
return parse_string_literal(lexer, token);
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
return parse_number(lexer, token);
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
case 's': case 't': case 'u': case 'v': case 'w': case 'x': case 'y': case 'z':
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':case 'Y': case 'Z':
case '_':
// TOKEN_IDENT
if (lexer->cur_ptr[0] == 'L' && (*peek == '\'' || *peek == '"')) {
error("unsupported wide-character literal (`L` prefix)");
}
while (1) {
if (peek == lexer->end_ptr) {
error("unsupport outof 64 length identifier");
}
if ((*peek >= 'a' && *peek <= 'z') || (*peek >= 'A' && *peek <= 'Z') ||
(*peek == '_') || (*peek >= '0' && *peek <= '9')) {
peek++;
continue;
}
break;
}
int res = keyword_cmp((const char*)lexer->cur_ptr, peek - (lexer->cur_ptr));
if (res == -1) {
int len = peek - lexer->cur_ptr;
unsigned char* str = xmalloc(len + 1);
for (int i = 0; i < len; i++) {
str[i] = lexer->cur_ptr[i];
}
str[len] = '\0';
constant.have = 1;
constant.str = (char*)str;
tok = TOKEN_IDENT; break;
} else {
tok = keywords[res].tok; break;
}
default:
error("unsupport char in sourse code `%c`", *(lexer->cur_ptr));
break;
}
lexer->cur_ptr = peek;
END:
token->val = constant;
token->type = tok;
}
// skip tokens the parser ignores (flush, comments)
void get_valid_token(lexer_t* lexer, tok_t* token) {
tok_type_t type;
do {
get_token(lexer, token);
type = token->type;
} while (type == TOKEN_FLUSH || type == TOKEN_LINE_COMMENT || type == TOKEN_BLOCK_COMMENT);
}

View File

@@ -1,37 +0,0 @@
#ifndef __LEXER_H__
#define __LEXER_H__
#include "token.h"
#ifndef LEXER_MAX_TOKEN_SIZE
#define LEXER_MAX_TOKEN_SIZE 63
#endif
#ifndef LEXER_BUFFER_SIZE
#define LEXER_BUFFER_SIZE 4095
#endif
typedef int (*lexer_sread_fn)(void *dst_buf, int dst_size,
int elem_size, int count, void *stream);
typedef struct lexer {
int line;
int index;
// const char current_file_name[LEXER_BUFFER_SIZE+1];
unsigned char* cur_ptr; // next character to scan (not yet consumed)
unsigned char* end_ptr; // one past the last valid character in the buffer
char buffer[LEXER_BUFFER_SIZE+1];
lexer_sread_fn sread;
void* stream;
} lexer_t;
void init_lexer(lexer_t* lexer, const char* file_name, void* stream,
lexer_sread_fn sread);
// raw token getter; may yield empty tokens such as TOKEN_FLUSH
void get_token(lexer_t* lexer, tok_t* token);
// like get_token, but skips tokens the parser ignores (TOKEN_FLUSH, comments)
void get_valid_token(lexer_t* lexer, tok_t* token);
#endif

View File

@@ -1,17 +0,0 @@
CC = gcc
CFLAGS = -g -Wall
SRC = ../lexer.c ../token.c
all: test_all
test_all: test
./test
run:
$(CC) $(CFLAGS) $(SRC) run.c -o run
test:
$(CC) $(CFLAGS) $(SRC) -o test test.c
clean:
rm -f test run

View File

@@ -1,46 +0,0 @@
#include "../lexer.h"
#include <stdio.h>
// gcc -g ../lexer.c ../token.c test_lexer.c -o test_lexer
/*
tok_val_t {
int have;
union {
char ch;
int i;
float f;
double d;
long long ll;
char* str;
};
};
*/
int g_num;
int g_num_arr[3];
int main(int argc, char* argv[]) {
int num = 0;
const char* file_name = "test_lexer.c";
if (argc == 2) {
file_name = argv[1];
}
FILE* fp = fopen(file_name, "r");
if (fp == NULL) {
perror("open file failed");
return 1;
}
printf("open file success\n");
lexer_t lexer;
init_lexer(&lexer, "test_lexter.c", fp, (lexer_sread_fn)fread_s);
tok_t tok;
while (1) {
get_valid_token(&lexer, &tok);
if (tok.type == TOKEN_EOF) {
break;
}
printf("line: %d, column: %d, type: %3d, typename: %s\n",
lexer.line, lexer.index, tok.type, get_tok_name(tok.type));
}
}

View File

@@ -1,172 +0,0 @@
// test_lexer.c
#include "../../../../libcore/acutest.h"
#include "../lexer.h"
#include <string.h>
int test_read(void *dst_buf, int dst_size, int elem_size, int count, void *stream) {
if (stream == NULL) {
return 0;
}
// copy at most the request, and never past the source string's NUL terminator
int want = dst_size > elem_size * count ? elem_size * count : dst_size;
int len = (int)strlen((const char*)stream) + 1;
int size = want < len ? want : len;
memcpy(dst_buf, stream, size);
return size;
}
// Test helper
static inline void test_lexer_string(const char* input, tok_type_t expected_type) {
lexer_t lexer;
tok_t token;
init_lexer(&lexer, "test.c", (void*)input, test_read);
get_valid_token(&lexer, &token);
TEST_CHECK(token.type == expected_type);
TEST_MSG("Expected: %s", get_tok_name(expected_type));
TEST_MSG("Got: %s", get_tok_name(token.type));
}
// Basic operator tests
void test_operators() {
TEST_CASE("Arithmetic operators"); {
test_lexer_string("+", TOKEN_ADD);
test_lexer_string("++", TOKEN_ADD_ADD);
test_lexer_string("+=", TOKEN_ASSIGN_ADD);
test_lexer_string("-", TOKEN_SUB);
test_lexer_string("--", TOKEN_SUB_SUB);
test_lexer_string("-=", TOKEN_ASSIGN_SUB);
test_lexer_string("*", TOKEN_MUL);
test_lexer_string("*=", TOKEN_ASSIGN_MUL);
test_lexer_string("/", TOKEN_DIV);
test_lexer_string("/=", TOKEN_ASSIGN_DIV);
test_lexer_string("%", TOKEN_MOD);
test_lexer_string("%=", TOKEN_ASSIGN_MOD);
}
TEST_CASE("Bitwise operators"); {
test_lexer_string("&", TOKEN_AND);
test_lexer_string("&&", TOKEN_AND_AND);
test_lexer_string("&=", TOKEN_ASSIGN_AND);
test_lexer_string("|", TOKEN_OR);
test_lexer_string("||", TOKEN_OR_OR);
test_lexer_string("|=", TOKEN_ASSIGN_OR);
test_lexer_string("^", TOKEN_XOR);
test_lexer_string("^=", TOKEN_ASSIGN_XOR);
test_lexer_string("~", TOKEN_BIT_NOT);
test_lexer_string("<<", TOKEN_L_SH);
test_lexer_string("<<=", TOKEN_ASSIGN_L_SH);
test_lexer_string(">>", TOKEN_R_SH);
test_lexer_string(">>=", TOKEN_ASSIGN_R_SH);
}
TEST_CASE("Comparison operators"); {
test_lexer_string("==", TOKEN_EQ);
test_lexer_string("!=", TOKEN_NEQ);
test_lexer_string("<", TOKEN_LT);
test_lexer_string("<=", TOKEN_LE);
test_lexer_string(">", TOKEN_GT);
test_lexer_string(">=", TOKEN_GE);
}
TEST_CASE("Special symbols"); {
test_lexer_string("(", TOKEN_L_PAREN);
test_lexer_string(")", TOKEN_R_PAREN);
test_lexer_string("[", TOKEN_L_BRACKET);
test_lexer_string("]", TOKEN_R_BRACKET);
test_lexer_string("{", TOKEN_L_BRACE);
test_lexer_string("}", TOKEN_R_BRACE);
test_lexer_string(";", TOKEN_SEMICOLON);
test_lexer_string(",", TOKEN_COMMA);
test_lexer_string(":", TOKEN_COLON);
test_lexer_string(".", TOKEN_DOT);
test_lexer_string("...", TOKEN_ELLIPSIS);
test_lexer_string("->", TOKEN_DEREF);
test_lexer_string("?", TOKEN_COND);
}
}
// Keyword tests
void test_keywords() {
TEST_CASE("C89 keywords");
test_lexer_string("while", TOKEN_WHILE);
test_lexer_string("sizeof", TOKEN_SIZEOF);
// TEST_CASE("C99 keywords");
// test_lexer_string("restrict", TOKEN_RESTRICT);
// test_lexer_string("_Bool", TOKEN_INT); // 需确认你的类型定义
}
// Literal tests
void test_literals() {
TEST_CASE("Integer literals"); {
// decimal
test_lexer_string("0", TOKEN_INT_LITERAL);
test_lexer_string("123", TOKEN_INT_LITERAL);
// test_lexer_string("2147483647", TOKEN_INT_LITERAL);
// // hexadecimal
// test_lexer_string("0x0", TOKEN_INT_LITERAL);
// test_lexer_string("0x1A3F", TOKEN_INT_LITERAL);
// test_lexer_string("0XABCDEF", TOKEN_INT_LITERAL);
// // octal
// test_lexer_string("0123", TOKEN_INT_LITERAL);
// test_lexer_string("0777", TOKEN_INT_LITERAL);
// // boundary values
// test_lexer_string("2147483647", TOKEN_INT_LITERAL); // INT_MAX
// test_lexer_string("4294967295", TOKEN_INT_LITERAL); // UINT_MAX
}
TEST_CASE("Character literals"); {
test_lexer_string("'a'", TOKEN_CHAR_LITERAL);
test_lexer_string("'\\n'", TOKEN_CHAR_LITERAL);
test_lexer_string("'\\t'", TOKEN_CHAR_LITERAL);
test_lexer_string("'\\\\'", TOKEN_CHAR_LITERAL);
test_lexer_string("'\\0'", TOKEN_CHAR_LITERAL);
}
TEST_CASE("String literals"); {
test_lexer_string("\"hello\"", TOKEN_STRING_LITERAL);
test_lexer_string("\"multi-line\\nstring\"", TOKEN_STRING_LITERAL);
test_lexer_string("\"escape\\\"quote\"", TOKEN_STRING_LITERAL);
}
// TEST_CASE("Floating literals");
// test_lexer_string("3.14e-5", TOKEN_FLOAT_LITERAL);
}
// Edge-case tests
void test_edge_cases() {
// TEST_CASE("Long identifiers");
// char long_id[LEXER_MAX_TOKEN_SIZE+2] = {0};
// memset(long_id, 'a', LEXER_MAX_TOKEN_SIZE+1);
// test_lexer_string(long_id, TOKEN_IDENT);
// TEST_CASE("Buffer boundary");
// char boundary[LEXER_BUFFER_SIZE*2] = {0};
// memset(boundary, '+', LEXER_BUFFER_SIZE*2-1);
// test_lexer_string(boundary, TOKEN_ADD);
}
// Error-handling tests
void test_error_handling() {
TEST_CASE("Invalid characters");
lexer_t lexer;
tok_t token;
init_lexer(&lexer, "test.c", NULL, test_read);
get_valid_token(&lexer, &token);
TEST_CHECK(token.type == TOKEN_EOF); // should trigger error handling
}
// Test list
TEST_LIST = {
{"operators", test_operators},
{"keywords", test_keywords},
{"literals", test_literals},
{"edge_cases", test_edge_cases},
{"error_handling", test_error_handling},
{NULL, NULL}
};

View File

@@ -1,84 +0,0 @@
#define FRONTEND_IMPLEMENTATION
#include "../frontend.h"
#include "token.h"
#define ROUND_IDX(idx) ((idx) % tokbuf->cap)
tok_t* pop_tok(tok_buf_t* tokbuf) {
if (tokbuf->size == 0) {
error("no token to pop");
return NULL;
}
int idx = tokbuf->cur;
tokbuf->cur = ROUND_IDX(idx + 1);
tokbuf->size -= 1;
return tokbuf->buf + idx;
}
void flush_peek_tok(tok_buf_t* tokbuf) {
tokbuf->peek = tokbuf->cur;
}
void init_tokbuf(tok_buf_t *tokbuf, void *stream, get_tokbuf_func gettok) {
tokbuf->cur = 0;
tokbuf->end = 0;
tokbuf->peek = 0;
tokbuf->size = 0;
tokbuf->stream = stream;
tokbuf->gettok = gettok;
tokbuf->buf = NULL;
tokbuf->cap = 0;
}
tok_t *peek_tok(tok_buf_t *tokbuf) {
int idx = tokbuf->peek;
tokbuf->peek = ROUND_IDX(idx + 1);
if (tokbuf->size >= tokbuf->cap) {
error("peek too deep, outof array size");
}
if (idx == tokbuf->end) {
if (tokbuf->size == tokbuf->cap) {
error("peek_tok buffer overflow");
}
if (tokbuf->gettok == NULL) {
error("peek_tok can not got tok");
}
tokbuf->gettok(tokbuf->stream, &(tokbuf->buf[idx]));
tokbuf->size++;
tokbuf->end = tokbuf->peek;
}
return &(tokbuf->buf[idx]);
}
tok_type_t peek_tok_type(tok_buf_t* tokbuf) {
return peek_tok(tokbuf)->type;
}
int expect_pop_tok(tok_buf_t* tokbuf, tok_type_t type) {
flush_peek_tok(tokbuf);
tok_t* tok = peek_tok(tokbuf);
if (tok->type != type) {
error("expected tok: %s, got %s", get_tok_name(type), get_tok_name(tok->type));
} else {
pop_tok(tokbuf);
}
return 0;
}
// Generate the string map (pick #str or #name as needed)
static const char* token_strings[] = {
// ordinary tokens use #str
#define X(str, tok) [tok] = #str,
TOKEN_TABLE
#undef X
// keywords use #name
#define X(name, std, tok) [tok] = #name,
KEYWORD_TABLE
#undef X
};
const char* get_tok_name(tok_type_t type) {
return token_strings[type];
}
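pop_tok/peek_tok above implement a fixed-capacity ring buffer with two cursors: `cur` consumes, `peek` looks ahead, and peeking past `end` pulls a fresh token from the source (`gettok`). A stripped-down, self-contained sketch of the same indexing scheme (int payloads instead of tok_t, and a counter standing in for the token source) is:

```c
#include <assert.h>

#define CAP 4
#define ROUND(i) ((i) % CAP)  // same idea as ROUND_IDX

// Minimal ring buffer mirroring tok_buf_t: `cur` consumes, `peek` looks ahead,
// and peeking at an unbuffered slot fetches a new element from the source.
typedef struct {
    int buf[CAP];
    int cur, end, peek, size;
    int next;  // stand-in for the token source (gettok)
} ring_t;

static void ring_init(ring_t* r) {
    r->cur = r->end = r->peek = r->size = 0;
    r->next = 100;
}

static int ring_peek(ring_t* r) {
    int idx = r->peek;
    r->peek = ROUND(idx + 1);
    if (idx == r->end) {        // nothing buffered at idx: fetch one element
        assert(r->size < CAP);  // would overflow otherwise
        r->buf[idx] = r->next++;
        r->size++;
        r->end = r->peek;
    }
    return r->buf[idx];
}

static int ring_pop(ring_t* r) {
    assert(r->size > 0);
    int idx = r->cur;
    r->cur = ROUND(idx + 1);
    r->size--;
    return r->buf[idx];
}
```

Resetting `peek = cur` (the equivalent of flush_peek_tok) rewinds lookahead without discarding buffered elements, so repeated peeks after a flush re-read the same items.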

View File

@@ -1,157 +0,0 @@
#ifndef __TOKEN_H__
#define __TOKEN_H__
enum CSTD_KEYWORD {
CSTD_C89,
CSTD_C99,
CEXT_ASM,
};
// Using Binary Search To Fast Find Keyword
#define KEYWORD_TABLE \
X(asm , CEXT_ASM, TOKEN_ASM) \
X(break , CSTD_C89, TOKEN_BREAK) \
X(case , CSTD_C89, TOKEN_CASE) \
X(char , CSTD_C89, TOKEN_CHAR) \
X(const , CSTD_C89, TOKEN_CONST) \
X(continue , CSTD_C89, TOKEN_CONTINUE) \
X(default , CSTD_C89, TOKEN_DEFAULT) \
X(do , CSTD_C89, TOKEN_DO) \
X(double , CSTD_C89, TOKEN_DOUBLE) \
X(else , CSTD_C89, TOKEN_ELSE) \
X(enum , CSTD_C89, TOKEN_ENUM) \
X(extern , CSTD_C89, TOKEN_EXTERN) \
X(float , CSTD_C89, TOKEN_FLOAT) \
X(for , CSTD_C89, TOKEN_FOR) \
X(goto , CSTD_C89, TOKEN_GOTO) \
X(if , CSTD_C89, TOKEN_IF) \
X(inline , CSTD_C99, TOKEN_INLINE) \
X(int , CSTD_C89, TOKEN_INT) \
X(long , CSTD_C89, TOKEN_LONG) \
X(register , CSTD_C89, TOKEN_REGISTER) \
X(restrict , CSTD_C99, TOKEN_RESTRICT) \
X(return , CSTD_C89, TOKEN_RETURN) \
X(short , CSTD_C89, TOKEN_SHORT) \
X(signed , CSTD_C89, TOKEN_SIGNED) \
X(sizeof , CSTD_C89, TOKEN_SIZEOF) \
X(static , CSTD_C89, TOKEN_STATIC) \
X(struct , CSTD_C89, TOKEN_STRUCT) \
X(switch , CSTD_C89, TOKEN_SWITCH) \
X(typedef , CSTD_C89, TOKEN_TYPEDEF) \
X(union , CSTD_C89, TOKEN_UNION) \
X(unsigned , CSTD_C89, TOKEN_UNSIGNED) \
X(void , CSTD_C89, TOKEN_VOID) \
X(volatile , CSTD_C89, TOKEN_VOLATILE) \
X(while , CSTD_C89, TOKEN_WHILE) \
// KEYWORD_TABLE
#define TOKEN_TABLE \
X(EOF , TOKEN_EOF) \
X(init , TOKEN_INIT) \
X(flush , TOKEN_FLUSH) \
X("==" , TOKEN_EQ) \
X("=" , TOKEN_ASSIGN) \
X("++" , TOKEN_ADD_ADD) \
X("+=" , TOKEN_ASSIGN_ADD) \
X("+" , TOKEN_ADD) \
X("--" , TOKEN_SUB_SUB) \
X("-=" , TOKEN_ASSIGN_SUB) \
X("->" , TOKEN_DEREF) \
X("-" , TOKEN_SUB) \
X("*=" , TOKEN_ASSIGN_MUL) \
X("*" , TOKEN_MUL) \
X("/=" , TOKEN_ASSIGN_DIV) \
X("/" , TOKEN_DIV) \
X("//" , TOKEN_LINE_COMMENT) \
X("/* */" , TOKEN_BLOCK_COMMENT) \
X("%=" , TOKEN_ASSIGN_MOD) \
X("%" , TOKEN_MOD) \
X("&&" , TOKEN_AND_AND) \
X("&=" , TOKEN_ASSIGN_AND) \
X("&" , TOKEN_AND) \
X("||" , TOKEN_OR_OR) \
X("|=" , TOKEN_ASSIGN_OR) \
X("|" , TOKEN_OR) \
X("^=" , TOKEN_ASSIGN_XOR) \
X("^" , TOKEN_XOR) \
X("<<=" , TOKEN_ASSIGN_L_SH) \
X("<<" , TOKEN_L_SH) \
X("<=" , TOKEN_LE) \
X("<" , TOKEN_LT) \
X(">>=" , TOKEN_ASSIGN_R_SH) \
X(">>" , TOKEN_R_SH) \
X(">=" , TOKEN_GE) \
X(">" , TOKEN_GT) \
X("!" , TOKEN_NOT) \
X("!=" , TOKEN_NEQ) \
X("~" , TOKEN_BIT_NOT) \
X("[" , TOKEN_L_BRACKET) \
X("]" , TOKEN_R_BRACKET) \
X("(" , TOKEN_L_PAREN) \
X(")" , TOKEN_R_PAREN) \
X("{" , TOKEN_L_BRACE) \
X("}" , TOKEN_R_BRACE) \
X(";" , TOKEN_SEMICOLON) \
X("," , TOKEN_COMMA) \
X(":" , TOKEN_COLON) \
X("." , TOKEN_DOT) \
X("..." , TOKEN_ELLIPSIS) \
X("?" , TOKEN_COND) \
X(identifier , TOKEN_IDENT) \
X(int_literal , TOKEN_INT_LITERAL) \
X(float_literal , TOKEN_FLOAT_LITERAL) \
X(char_literal , TOKEN_CHAR_LITERAL) \
X(string_literal , TOKEN_STRING_LITERAL) \
// END
// Define the tok_type_t enum
typedef enum tok_type {
// ordinary tokens
#define X(str, tok) tok,
TOKEN_TABLE
#undef X
// keywords (keep the original format)
#define X(name, std, tok) tok,
KEYWORD_TABLE
#undef X
} tok_type_t;
typedef struct tok_val {
int have;
union {
char ch;
int i;
float f;
double d;
long long ll;
char* str;
};
} tok_val_t;
typedef struct tok {
tok_type_t type;
tok_val_t val;
} tok_t;
typedef struct tok_buf {
int cur;
int end;
int peek;
int size;
int cap;
tok_t* buf;
void* stream;
void (*gettok)(void* stream, tok_t* token);
} tok_buf_t;
typedef void(*get_tokbuf_func)(void* stream, tok_t* token);
void init_tokbuf(tok_buf_t* tokbuf, void* stream, get_tokbuf_func gettok);
tok_t* peek_tok(tok_buf_t* tokbuf);
tok_t* pop_tok(tok_buf_t* tokbuf);
void flush_peek_tok(tok_buf_t* tokbuf);
tok_type_t peek_tok_type(tok_buf_t* tokbuf);
int expect_pop_tok(tok_buf_t* tokbuf, tok_type_t type);
const char* get_tok_name(tok_type_t type);
#endif

View File

@@ -1,18 +0,0 @@
- ast.c defines the abstract syntax tree
- block.c implements blocks, mainly scope handling; needs the symbol table
- decl.c implements declarations, mainly variable declarations (function declarations live in func.c); needs the symbol table
- func.c implements functions, both declarations and definitions; needs the symbol table
- expr.c implements expressions; needs the symbol table
- stmt.c implements statements; needs expression types to check validity
- term.c implements terminal symbols; needs expression types to check validity
- program.c is the entry point of lexical/semantic analysis; it builds the AST from the parser state
stmt follows cppreference.
expr follows AI suggestions and CParser.

View File

@@ -1,173 +0,0 @@
#include "ast.h"
#include "../parser.h"
struct ASTNode* new_ast_node(void) {
struct ASTNode* node = xmalloc(sizeof(struct ASTNode));
init_ast_node(node);
return node;
}
void init_ast_node(struct ASTNode* node) {
node->type = NT_INIT;
for (int i = 0; i < sizeof(node->children) / sizeof(node->children[0]); i++) {
node->children[i] = NULL;
}
}
// struct ASTNode* find_ast_node(struct ASTNode* node, ast_type_t type) {
// }
#include <stdio.h>
static void pnt_depth(int depth) {
for (int i = 0; i < depth; i++) {
printf(" ");
}
}
// void pnt_ast(struct ASTNode* node, int depth) {
// if (!node) return;
// pnt_depth(depth);
// switch (node->type) {
// case NT_ROOT:
// for (int i = 0; i < node->root.child_size; i++) {
// pnt_ast(node->root.children[i], depth);
// }
// return;
// case NT_ADD : printf("+ \n"); break; // (expr) + (expr)
// case NT_SUB : printf("- \n"); break; // (expr) - (expr)
// case NT_MUL : printf("* \n"); break; // (expr) * (expr)
// case NT_DIV : printf("/ \n"); break; // (expr) / (expr)
// case NT_MOD : printf("%%\n"); break; // (expr) % (expr)
// case NT_AND : printf("& \n"); break; // (expr) & (expr)
// case NT_OR : printf("| \n"); break; // (expr) | (expr)
// case NT_XOR : printf("^ \n"); break; // (expr) ^ (expr)
// case NT_L_SH : printf("<<\n"); break; // (expr) << (expr)
// case NT_R_SH : printf(">>\n"); break; // (expr) >> (expr)
// case NT_EQ : printf("==\n"); break; // (expr) == (expr)
// case NT_NEQ : printf("!=\n"); break; // (expr) != (expr)
// case NT_LE : printf("<=\n"); break; // (expr) <= (expr)
// case NT_GE : printf(">=\n"); break; // (expr) >= (expr)
// case NT_LT : printf("< \n"); break; // (expr) < (expr)
// case NT_GT : printf("> \n"); break; // (expr) > (expr)
// case NT_AND_AND : printf("&&\n"); break; // (expr) && (expr)
// case NT_OR_OR : printf("||\n"); break; // (expr) || (expr)
// case NT_NOT : printf("! \n"); break; // ! (expr)
// case NT_BIT_NOT : printf("~ \n"); break; // ~ (expr)
// case NT_COMMA : printf(", \n"); break; // expr, expr comma operator
// case NT_ASSIGN : printf("= \n"); break; // (expr) = (expr)
// // case NT_COND : // (expr) ? (expr) : (expr)
// case NT_STMT_EMPTY : // ;
// printf(";\n");
// break;
// case NT_STMT_IF : // if (cond) { ... } [else {...}]
// printf("if");
// pnt_ast(node->if_stmt.cond, depth+1);
// pnt_ast(node->if_stmt.if_stmt, depth+1);
// if (node->if_stmt.else_stmt) {
// pnt_depth(depth);
// printf("else");
// pnt_ast(node->if_stmt.else_stmt, depth+1);
// }
// break;
// case NT_STMT_WHILE : // while (cond) { ... }
// printf("while\n");
// pnt_ast(node->while_stmt.cond, depth+1);
// pnt_ast(node->while_stmt.body, depth+1);
// break;
// case NT_STMT_DOWHILE : // do {...} while (cond)
// printf("do-while\n");
// pnt_ast(node->do_while_stmt.body, depth+1);
// pnt_ast(node->do_while_stmt.cond, depth+1);
// break;
// case NT_STMT_FOR : // for (init; cond; iter) {...}
// printf("for\n");
// if (node->for_stmt.init)
// pnt_ast(node->for_stmt.init, depth+1);
// if (node->for_stmt.cond)
// pnt_ast(node->for_stmt.cond, depth+1);
// if (node->for_stmt.iter)
// pnt_ast(node->for_stmt.iter, depth+1);
// pnt_ast(node->for_stmt.body, depth+1);
// break;
// case NT_STMT_SWITCH : // switch (expr) { case ... }
// case NT_STMT_BREAK : // break;
// case NT_STMT_CONTINUE : // continue;
// case NT_STMT_GOTO : // goto label;
// case NT_STMT_CASE : // case const_expr:
// case NT_STMT_DEFAULT : // default:
// case NT_STMT_LABEL : // label:
// break;
// case NT_STMT_BLOCK : // { ... }
// printf("{\n");
// for (int i = 0; i < node->block.child_size; i++) {
// pnt_ast(node->block.children[i], depth+1);
// }
// pnt_depth(depth);
// printf("}\n");
// break;
// case NT_STMT_RETURN : // return expr;
// printf("return");
// if (node->return_stmt.expr_stmt) {
// printf(" ");
// pnt_ast(node->return_stmt.expr_stmt, depth+1);
// } else {
// printf("\n");
// }
// break;
// case NT_STMT_EXPR : // expr;
// printf("stmt\n");
// pnt_ast(node->expr_stmt.expr_stmt, depth);
// pnt_depth(depth);
// printf(";\n");
// break;
// case NT_DECL_VAR : // type name; or type name = expr;
// printf("decl_val\n");
// break;
// case NT_DECL_FUNC: // type func_name(param_list);
// printf("decl func %s\n", node->func.name->syms.tok.val.str);
// break;
// case NT_FUNC : // type func_name(param_list) {...}
// printf("def func %s\n", node->func.name->syms.tok.val.str);
// // pnt_ast(node->child.func.params, depth);
// pnt_ast(node->func.body, depth);
// // pnt_ast(node->child.func.ret, depth);
// break;
// case NT_PARAM : // function parameter
// printf("param\n");
// case NT_ARG_LIST : // argument list (pairs with NT_TERM_CALL)
// printf("arg_list\n");
// case NT_TERM_CALL : // func (expr)
// printf("call\n");
// break;
// case NT_TERM_IDENT:
// printf("%s\n", node->syms.tok.val.str);
// break;
// case NT_TERM_VAL : // Terminal Symbols like constant, identifier, keyword
// tok_t * tok = &node->syms.tok;
// switch (tok->type) {
// case TOKEN_CHAR_LITERAL:
// printf("%c\n", tok->val.ch);
// break;
// case TOKEN_INT_LITERAL:
// printf("%d\n", tok->val.i);
// break;
// case TOKEN_STRING_LITERAL:
// printf("%s\n", tok->val.str);
// break;
// default:
// printf("unknown term val\n");
// break;
// }
// default:
// break;
// }
// // Generic recursive handling of child nodes
// if (node->type <= NT_ASSIGN) { // expression nodes: handle children uniformly
// if (node->expr.left) pnt_ast(node->expr.left, depth+1);
// if (node->expr.right) pnt_ast(node->expr.right, depth + 1);
// }
// }

View File

@@ -1,189 +0,0 @@
#ifndef __AST_H__
#define __AST_H__
#include "../../frontend.h"
#include "../../lexer/lexer.h"
#include "../../../../libcore/vector.h"
#include "../type.h"
typedef enum {
NT_INIT,
NT_ROOT, // global scope in root node
NT_ADD, // (expr) + (expr)
NT_SUB, // (expr) - (expr)
NT_MUL, // (expr) * (expr)
NT_DIV, // (expr) / (expr)
NT_MOD, // (expr) % (expr)
NT_AND, // (expr) & (expr)
NT_OR, // (expr) | (expr)
NT_XOR, // (expr) ^ (expr)
NT_L_SH, // (expr) << (expr)
NT_R_SH, // (expr) >> (expr)
NT_EQ, // (expr) == (expr)
NT_NEQ, // (expr) != (expr)
NT_LE, // (expr) <= (expr)
NT_GE, // (expr) >= (expr)
NT_LT, // (expr) < (expr)
NT_GT, // (expr) > (expr)
NT_AND_AND, // (expr) && (expr)
NT_OR_OR, // (expr) || (expr)
NT_NOT, // ! (expr)
NT_BIT_NOT, // ~ (expr)
NT_COND, // (expr) ? (expr) : (expr)
NT_COMMA, // expr, expr (comma operator)
NT_ASSIGN, // (expr) = (expr)
NT_ADDRESS, // &expr (address-of)
NT_DEREF, // *expr (dereference)
NT_INDEX, // arr[index] (array subscript)
NT_MEMBER, // struct.member
NT_PTR_MEMBER,// ptr->member
NT_CAST, // (type)expr (explicit cast)
NT_SIZEOF, // sizeof(type|expr)
// NT_ALIGNOF, // _Alignof(type) (C11)
NT_STMT_EMPTY, // ;
NT_STMT_IF, // if (cond) { ... } [else {...}]
NT_STMT_WHILE, // while (cond) { ... }
NT_STMT_DOWHILE, // do {...} while (cond)
NT_STMT_FOR, // for (init; cond; iter) {...}
NT_STMT_SWITCH, // switch (expr) { case ... }
NT_STMT_BREAK, // break;
NT_STMT_CONTINUE, // continue;
NT_STMT_GOTO, // goto label;
NT_STMT_CASE, // case const_expr:
NT_STMT_DEFAULT, // default:
NT_STMT_LABEL, // label:
NT_STMT_BLOCK, // { ... }
NT_STMT_RETURN, // return expr;
NT_STMT_EXPR, // expr;
NT_BLOCK,
// NT_TYPE_BASE, // base type node
// NT_TYPE_PTR, // pointer type
// NT_TYPE_ARRAY, // array type
// NT_TYPE_FUNC, // function type
// NT_TYPE_QUAL, // qualifier node
NT_DECL_VAR, // type name; or type name = expr;
NT_DECL_FUNC, // type func_name(param_list);
NT_FUNC, // type func_name(param_list) {...}
NT_PARAM, // function parameter
NT_ARG_LIST, // argument list, used together with NT_CALL
NT_TERM_CALL, // func (expr)
NT_TERM_VAL,
NT_TERM_IDENT,
NT_TERM_TYPE,
} ast_type_t;
typedef struct ASTNode {
ast_type_t type;
union {
void *children[6];
struct {
vector_header(children, struct ASTNode*);
} root;
struct {
vector_header(children, struct ASTNode*);
} block;
struct {
struct ASTNode* decl_node;
tok_t tok;
} syms;
struct {
vector_header(params, struct ASTNode*);
} params;
struct {
struct ASTNode* name;
struct ASTNode* params;
struct ASTNode* func_decl;
} call;
struct {
struct ASTNode *type;
struct ASTNode *name;
struct ASTNode *expr_stmt; // optional
void* data;
} decl_val;
struct {
struct ASTNode *ret;
struct ASTNode *name;
struct ASTNode *params; // array of params
struct ASTNode *def;
} decl_func;
struct {
struct ASTNode *decl;
struct ASTNode *body; // optional
void* data;
} func;
struct {
struct ASTNode *left;
struct ASTNode *right;
struct ASTNode *optional; // optional
} expr;
struct {
struct ASTNode *cond;
struct ASTNode *if_stmt;
struct ASTNode *else_stmt; // optional
} if_stmt;
struct {
struct ASTNode *cond;
struct ASTNode *body;
} switch_stmt;
struct {
struct ASTNode *cond;
struct ASTNode *body;
} while_stmt;
struct {
struct ASTNode *body;
struct ASTNode *cond;
} do_while_stmt;
struct {
struct ASTNode *init;
struct ASTNode *cond; // optional
struct ASTNode *iter; // optional
struct ASTNode *body;
} for_stmt;
struct {
struct ASTNode *expr_stmt; // optional
} return_stmt;
struct {
struct ASTNode *label;
} goto_stmt;
struct {
struct ASTNode *label;
} label_stmt;
struct {
struct ASTNode *block;
} block_stmt;
struct {
struct ASTNode *expr_stmt;
} expr_stmt;
};
} ast_node_t;
struct ASTNode* new_ast_node(void);
void init_ast_node(struct ASTNode* node);
void pnt_ast(struct ASTNode* node, int depth);
typedef struct parser parser_t;
typedef struct ASTNode* (*parse_func_t) (parser_t*);
void parse_prog(parser_t* parser);
ast_node_t* parse_decl(parser_t* parser);
ast_node_t* parse_decl_val(parser_t* parser);
ast_node_t* parse_block(parser_t* parser);
ast_node_t* parse_stmt(parser_t* parser);
ast_node_t* parse_expr(parser_t* parser);
ast_node_t* parse_type(parser_t* parser);
ast_node_t* new_ast_ident_node(tok_t* tok);
ast_node_t* expect_pop_ident(tok_buf_t* tokbuf);
int peek_decl(tok_buf_t* tokbuf);
#endif

@@ -1,51 +0,0 @@
#include "ast.h"
#include "../parser.h"
#include "../symtab/symtab.h"
#ifndef BLOCK_MAX_NODE
#define BLOCK_MAX_NODE (1024)
#endif
ast_node_t* new_ast_node_block() {
ast_node_t* node = new_ast_node();
node->type = NT_BLOCK;
vector_init(node->block.children);
return node;
}
ast_node_t* parse_block(parser_t* parser) {
symtab_enter_scope(parser->symtab);
tok_buf_t *tokbuf = &parser->tokbuf;
flush_peek_tok(tokbuf);
tok_type_t ttype;
ast_node_t* node = new_ast_node_block();
expect_pop_tok(tokbuf, TOKEN_L_BRACE);
ast_node_t* child = NULL;
while (1) {
if (peek_decl(tokbuf)) {
child = parse_decl(parser);
vector_push(node->block.children, child);
continue;
}
flush_peek_tok(tokbuf);
ttype = peek_tok_type(tokbuf);
switch (ttype) {
case TOKEN_R_BRACE: {
pop_tok(tokbuf);
goto END;
}
default: {
child = parse_stmt(parser);
vector_push(node->block.children, child);
break;
}
}
}
END:
symtab_leave_scope(parser->symtab);
return node;
}

@@ -1,96 +0,0 @@
#include "../parser.h"
#include "ast.h"
#include "../symtab/symtab.h"
/**
* Returns 1 if the lookahead starts a declaration, 0 otherwise.
*/
int peek_decl(tok_buf_t* tokbuf) {
flush_peek_tok(tokbuf);
switch (peek_tok_type(tokbuf)) {
case TOKEN_STATIC:
case TOKEN_EXTERN:
case TOKEN_REGISTER:
case TOKEN_TYPEDEF:
error("not implemented");
break;
default:
flush_peek_tok(tokbuf);
}
switch (peek_tok_type(tokbuf)) {
case TOKEN_VOID:
case TOKEN_CHAR:
case TOKEN_SHORT:
case TOKEN_INT:
case TOKEN_LONG:
case TOKEN_FLOAT:
case TOKEN_DOUBLE:
// FIXME Ptr
return 1;
default:
flush_peek_tok(tokbuf);
}
return 0;
}
ast_node_t* parse_decl_val(parser_t* parser) {
tok_buf_t* tokbuf = &parser->tokbuf;
tok_type_t ttype;
flush_peek_tok(tokbuf);
ast_node_t* node;
ast_node_t* type_node = parse_type(parser);
flush_peek_tok(tokbuf);
ast_node_t* name_node = new_ast_ident_node(peek_tok(tokbuf));
node = new_ast_node();
node->decl_val.type = type_node;
node->decl_val.name = name_node;
node->type = NT_DECL_VAR;
symtab_add_symbol(parser->symtab, name_node->syms.tok.val.str, node, 0);
ttype = peek_tok_type(tokbuf);
if (ttype == TOKEN_ASSIGN) {
node->decl_val.expr_stmt = parse_stmt(parser);
if (node->decl_val.expr_stmt->type != NT_STMT_EXPR) {
error("parse_decl_val expects an expression statement");
}
} else if (ttype == TOKEN_SEMICOLON) {
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_SEMICOLON);
} else {
error("parse_decl_val: syntax error");
}
return node;
}
ast_node_t* parse_decl(parser_t* parser) {
tok_buf_t* tokbuf = &parser->tokbuf;
flush_peek_tok(tokbuf);
tok_type_t ttype;
ast_node_t* node;
if (peek_decl(tokbuf) == 0) {
error("syntax error: expected a type to start the declaration");
}
if (peek_tok_type(tokbuf) != TOKEN_IDENT) {
error("syntax error: expected identifier in declaration");
}
ttype = peek_tok_type(tokbuf);
switch (ttype) {
case TOKEN_L_PAREN: // (
return NULL;
break;
case TOKEN_ASSIGN:
case TOKEN_SEMICOLON:
node = parse_decl_val(parser);
break;
default:
error("syntax error: expected '=' or ';' in declaration");
return NULL;
}
return node;
}

@@ -1,425 +0,0 @@
#include "../parser.h"
#include "ast.h"
#include "../symtab/symtab.h"
// Copied from `CParse`
/**
* Operator precedence classes
*/
enum Precedence {
PREC_BOTTOM,
PREC_EXPRESSION, /* , left to right */
PREC_ASSIGNMENT, /* = += -= *= /= %= <<= >>= &= ^= |= right to left */
PREC_CONDITIONAL, /* ?: right to left */
PREC_LOGICAL_OR, /* || left to right */
PREC_LOGICAL_AND, /* && left to right */
PREC_OR, /* | left to right */
PREC_XOR, /* ^ left to right */
PREC_AND, /* & left to right */
PREC_EQUALITY, /* == != left to right */
PREC_RELATIONAL, /* < <= > >= left to right */
PREC_SHIFT, /* << >> left to right */
PREC_ADDITIVE, /* + - left to right */
PREC_MULTIPLICATIVE, /* * / % left to right */
PREC_CAST, /* (type) right to left */
PREC_UNARY, /* ! ~ ++ -- + - * & sizeof right to left */
PREC_POSTFIX, /* () [] -> . left to right */
PREC_PRIMARY,
PREC_TOP
};
enum ParseType {
INFIX_PARSER,
PREFIX_PARSER,
};
static ast_node_t *parse_subexpression(tok_buf_t* tokbuf, symtab_t *symtab, enum Precedence prec);
#define NEXT(prec) parse_subexpression(tokbuf, symtab, prec)
static ast_node_t* gen_node2(ast_node_t* left, ast_node_t* right,
ast_type_t type) {
ast_node_t* node = new_ast_node();
node->type = type;
node->expr.left = left;
node->expr.right = right;
return node;
// FIXME
// switch (type) {
// case NT_ADD : printf("+ \n"); break; // (expr) + (expr)
// case NT_SUB : printf("- \n"); break; // (expr) - (expr)
// case NT_MUL : printf("* \n"); break; // (expr) * (expr)
// case NT_DIV : printf("/ \n"); break; // (expr) / (expr)
// case NT_MOD : printf("%%\n"); break; // (expr) % (expr)
// case NT_AND : printf("& \n"); break; // (expr) & (expr)
// case NT_OR : printf("| \n"); break; // (expr) | (expr)
// case NT_XOR : printf("^ \n"); break; // (expr) ^ (expr)
// case NT_L_SH : printf("<<\n"); break; // (expr) << (expr)
// case NT_R_SH : printf(">>\n"); break; // (expr) >> (expr)
// case NT_EQ : printf("==\n"); break; // (expr) == (expr)
// case NT_NEQ : printf("!=\n"); break; // (expr) != (expr)
// case NT_LE : printf("<=\n"); break; // (expr) <= (expr)
// case NT_GE : printf(">=\n"); break; // (expr) >= (expr)
// case NT_LT : printf("< \n"); break; // (expr) < (expr)
// case NT_GT : printf("> \n"); break; // (expr) > (expr)
// case NT_AND_AND : printf("&&\n"); break; // (expr) && (expr)
// case NT_OR_OR : printf("||\n"); break; // (expr) || (expr)
// case NT_NOT : printf("! \n"); break; // ! (expr)
// case NT_BIT_NOT : printf("~ \n"); break; // ~ (expr)
// case NT_COMMA : printf(", \n"); break; // expr, expr (comma operator)
// case NT_ASSIGN : printf("= \n"); break; // (expr) = (expr)
// // case NT_COND : // (expr) ? (expr) : (expr)
// }
}
static ast_node_t* parse_comma(tok_buf_t* tokbuf, symtab_t *symtab, ast_node_t* left) {
ast_node_t* node = new_ast_node();
node->type = NT_COMMA;
node->expr.left = left;
node->expr.right = NEXT(PREC_EXPRESSION);
return node;
}
static ast_node_t* parse_assign(tok_buf_t* tokbuf, symtab_t *symtab, ast_node_t* left) {
flush_peek_tok(tokbuf);
tok_type_t ttype = peek_tok_type(tokbuf);
pop_tok(tokbuf);
ast_node_t* node = new_ast_node();
node->type = NT_ASSIGN;
// saved left
node->expr.left = left;
enum Precedence next = PREC_ASSIGNMENT - 1; // right-associative: keep consuming a nested '='
switch (ttype) {
case TOKEN_ASSIGN :
left = NEXT(next);
break;
case TOKEN_ASSIGN_ADD :
left = gen_node2(left, NEXT(next), NT_ADD);
break;
case TOKEN_ASSIGN_SUB :
left = gen_node2(left, NEXT(next), NT_SUB);
break;
case TOKEN_ASSIGN_MUL :
left = gen_node2(left, NEXT(next), NT_MUL);
break;
case TOKEN_ASSIGN_DIV :
left = gen_node2(left, NEXT(next), NT_DIV);
break;
case TOKEN_ASSIGN_MOD :
left = gen_node2(left, NEXT(next), NT_MOD);
break;
case TOKEN_ASSIGN_L_SH :
left = gen_node2(left, NEXT(next), NT_L_SH);
break;
case TOKEN_ASSIGN_R_SH :
left = gen_node2(left, NEXT(next), NT_R_SH);
break;
case TOKEN_ASSIGN_AND :
left = gen_node2(left, NEXT(next), NT_AND);
break;
case TOKEN_ASSIGN_OR :
left = gen_node2(left, NEXT(next), NT_OR);
break;
case TOKEN_ASSIGN_XOR :
left = gen_node2(left, NEXT(next), NT_XOR);
break;
default:
error("unsupported operator");
break;
}
node->expr.right = left;
return node;
}
static ast_node_t* parse_cmp(tok_buf_t* tokbuf, symtab_t *symtab, ast_node_t* left) {
flush_peek_tok(tokbuf);
tok_type_t ttype = peek_tok_type(tokbuf);
pop_tok(tokbuf);
ast_node_t* node = new_ast_node();
// saved left
node->expr.left = left;
switch (ttype) {
case TOKEN_EQ:
node->type = NT_EQ;
node->expr.right = NEXT(PREC_EQUALITY);
break;
case TOKEN_NEQ:
node->type = NT_NEQ;
node->expr.right = NEXT(PREC_EQUALITY);
break;
case TOKEN_LT:
node->type = NT_LT;
node->expr.right = NEXT(PREC_RELATIONAL);
break;
case TOKEN_GT:
node->type = NT_GT;
node->expr.right = NEXT(PREC_RELATIONAL);
break;
case TOKEN_LE:
node->type = NT_LE;
node->expr.right = NEXT(PREC_RELATIONAL);
break;
case TOKEN_GE:
node->type = NT_GE;
node->expr.right = NEXT(PREC_RELATIONAL);
break;
default:
error("invalid operator");
}
return node;
}
static ast_node_t* parse_cal(tok_buf_t* tokbuf, symtab_t *symtab, ast_node_t* left) {
flush_peek_tok(tokbuf);
tok_type_t ttype = peek_tok_type(tokbuf);
pop_tok(tokbuf);
ast_node_t* node = new_ast_node();
node->expr.left = left;
switch (ttype) {
case TOKEN_OR_OR:
node->type = NT_OR_OR;
node->expr.right = NEXT(PREC_LOGICAL_OR);
break;
case TOKEN_AND_AND:
node->type = NT_AND_AND;
node->expr.right = NEXT(PREC_LOGICAL_AND);
break;
case TOKEN_OR:
node->type = NT_OR;
node->expr.right = NEXT(PREC_OR);
break;
case TOKEN_XOR:
node->type = NT_XOR;
node->expr.right = NEXT(PREC_XOR);
break;
case TOKEN_AND:
node->type = NT_AND;
node->expr.right = NEXT(PREC_AND);
break;
case TOKEN_L_SH:
node->type = NT_L_SH;
node->expr.right = NEXT(PREC_SHIFT);
break;
case TOKEN_R_SH:
node->type = NT_R_SH;
node->expr.right = NEXT(PREC_SHIFT);
break;
case TOKEN_ADD:
node->type = NT_ADD;
node->expr.right = NEXT(PREC_ADDITIVE);
break;
case TOKEN_SUB:
node->type = NT_SUB;
node->expr.right = NEXT(PREC_ADDITIVE);
break;
case TOKEN_MUL:
node->type = NT_MUL;
node->expr.right = NEXT(PREC_MULTIPLICATIVE);
break;
case TOKEN_DIV:
node->type = NT_DIV;
node->expr.right = NEXT(PREC_MULTIPLICATIVE);
break;
case TOKEN_MOD:
node->type = NT_MOD;
node->expr.right = NEXT(PREC_MULTIPLICATIVE);
break;
default:
break;
}
return node;
}
static ast_node_t* parse_call(tok_buf_t* tokbuf, symtab_t *symtab, ast_node_t* ident) {
ast_node_t* node = new_ast_node();
node->type = NT_TERM_CALL;
node->call.name = ident;
node->call.params = new_ast_node();
vector_init(node->call.params->params.params);
pop_tok(tokbuf); // skip '('
tok_type_t ttype;
while (1) {
flush_peek_tok(tokbuf);
ttype = peek_tok_type(tokbuf);
if (ttype == TOKEN_R_PAREN) {
break;
}
ast_node_t* param = NEXT(PREC_EXPRESSION);
vector_push(node->call.params->params.params, param);
flush_peek_tok(tokbuf);
ttype = peek_tok_type(tokbuf);
if (ttype == TOKEN_COMMA) pop_tok(tokbuf);
}
pop_tok(tokbuf); // skip ')'
const char* name = ident->syms.tok.val.str;
ast_node_t* sym = symtab_lookup_symbol(symtab, name);
// TODO check that the arguments match the function declaration
if (sym == NULL || sym->type != NT_DECL_FUNC) {
error("function %s not declared", name);
}
node->call.func_decl = sym;
return node;
}
static ast_node_t* parse_paren(tok_buf_t* tokbuf, symtab_t *symtab, ast_node_t* left) {
flush_peek_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_L_PAREN);
left = NEXT(PREC_EXPRESSION);
flush_peek_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_R_PAREN);
return left;
}
typedef ast_node_t* (*parse_expr_fun_t)(tok_buf_t*, symtab_t* , ast_node_t*);
static struct expr_prec_table_t {
parse_expr_fun_t parser;
enum Precedence prec;
enum ParseType ptype;
} expr_table [256] = {
[TOKEN_COMMA] = {parse_comma, PREC_EXPRESSION, INFIX_PARSER},
[TOKEN_ASSIGN] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_ADD] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_SUB] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_MUL] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_DIV] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_MOD] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_L_SH] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_R_SH] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_AND] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_OR] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_ASSIGN_XOR] = {parse_assign, PREC_ASSIGNMENT, INFIX_PARSER},
[TOKEN_OR_OR] = {parse_cal, PREC_LOGICAL_OR , INFIX_PARSER},
[TOKEN_AND_AND] = {parse_cal, PREC_LOGICAL_AND, INFIX_PARSER},
[TOKEN_OR] = {parse_cal, PREC_OR , INFIX_PARSER},
[TOKEN_XOR] = {parse_cal, PREC_XOR , INFIX_PARSER},
[TOKEN_AND] = {parse_cal, PREC_AND , INFIX_PARSER},
[TOKEN_EQ] = {parse_cmp, PREC_EQUALITY, INFIX_PARSER},
[TOKEN_NEQ] = {parse_cmp, PREC_EQUALITY, INFIX_PARSER},
[TOKEN_LT] = {parse_cmp, PREC_RELATIONAL, INFIX_PARSER},
[TOKEN_LE] = {parse_cmp, PREC_RELATIONAL, INFIX_PARSER},
[TOKEN_GT] = {parse_cmp, PREC_RELATIONAL, INFIX_PARSER},
[TOKEN_GE] = {parse_cmp, PREC_RELATIONAL, INFIX_PARSER},
[TOKEN_L_SH] = {parse_cal, PREC_SHIFT , INFIX_PARSER},
[TOKEN_R_SH] = {parse_cal, PREC_SHIFT , INFIX_PARSER},
[TOKEN_ADD] = {parse_cal, PREC_ADDITIVE , INFIX_PARSER},
[TOKEN_SUB] = {parse_cal, PREC_ADDITIVE , INFIX_PARSER},
[TOKEN_MUL] = {parse_cal, PREC_MULTIPLICATIVE , INFIX_PARSER},
[TOKEN_DIV] = {parse_cal, PREC_MULTIPLICATIVE , INFIX_PARSER},
[TOKEN_MOD] = {parse_cal, PREC_MULTIPLICATIVE , INFIX_PARSER},
[TOKEN_NOT] = {NULL, PREC_UNARY, PREFIX_PARSER},
[TOKEN_BIT_NOT] = {NULL, PREC_UNARY, PREFIX_PARSER},
[TOKEN_ADD_ADD] = {NULL, PREC_UNARY, PREFIX_PARSER},
[TOKEN_SUB_SUB] = {NULL, PREC_UNARY, PREFIX_PARSER},
// + - * & sizeof
[TOKEN_L_PAREN] = {parse_paren, PREC_POSTFIX, INFIX_PARSER},
};
static ast_node_t *parse_primary_expression(tok_buf_t* tokbuf, symtab_t *symtab) {
flush_peek_tok(tokbuf);
tok_t* tok = peek_tok(tokbuf);
ast_node_t *node = new_ast_node();
node->type = NT_TERM_VAL;
node->syms.tok = *tok;
switch (tok->type) {
case TOKEN_INT_LITERAL:
// node->data.data_type = TYPE_INT;
break;
case TOKEN_FLOAT_LITERAL:
warn("float not supported");
break;
case TOKEN_CHAR_LITERAL:
// node->data.data_type = TYPE_CHAR;
break;
case TOKEN_STRING_LITERAL:
// node->data.data_type = TYPE_POINTER;
break;
case TOKEN_IDENT:
node = expect_pop_ident(tokbuf);
tok_type_t ttype = peek_tok_type(tokbuf);
if (ttype == TOKEN_L_PAREN) {
node = parse_call(tokbuf, symtab, node);
} else {
void *sym = symtab_lookup_symbol(symtab, tok->val.str);
if (sym == NULL) {
error("undefined symbol but use %s", tok->val.str);
}
node->type = NT_TERM_IDENT;
node->syms.decl_node = sym;
}
goto END;
default:
return NULL;
}
pop_tok(tokbuf);
END:
return node;
}
static ast_node_t *parse_subexpression(tok_buf_t* tokbuf, symtab_t *symtab, enum Precedence prec) {
tok_type_t ttype;
struct expr_prec_table_t* work;
ast_node_t* left = NULL;
while (1) {
flush_peek_tok(tokbuf);
ttype = peek_tok_type(tokbuf);
work = &expr_table[ttype];
// FIXME
if (ttype == TOKEN_SEMICOLON || ttype == TOKEN_R_PAREN) {
break;
}
if (work == NULL || work->parser == NULL || work->ptype == PREFIX_PARSER) {
if (work->parser != NULL) {
left = work->parser(tokbuf, symtab, NULL);
} else {
left = parse_primary_expression(tokbuf, symtab);
}
} else if (work->ptype == INFIX_PARSER) {
if (work->parser == NULL)
break;
if (work->prec <= prec)
break;
left = work->parser(tokbuf, symtab, left);
}
// assert(left != NULL);
}
return left;
}
ast_node_t* parse_expr(parser_t* parser) {
tok_buf_t* tokbuf = &(parser->tokbuf);
symtab_t *symtab = parser->symtab;
flush_peek_tok(tokbuf);
tok_type_t ttype = peek_tok_type(tokbuf);
switch (ttype) {
case TOKEN_NOT:
case TOKEN_AND:
case TOKEN_L_PAREN:
case TOKEN_MUL:
case TOKEN_ADD:
case TOKEN_SUB:
case TOKEN_BIT_NOT:
case TOKEN_AND_AND:
case TOKEN_CHAR_LITERAL:
case TOKEN_INT_LITERAL:
case TOKEN_STRING_LITERAL:
case TOKEN_ADD_ADD:
case TOKEN_SUB_SUB:
case TOKEN_SIZEOF:
case TOKEN_IDENT:
return NEXT(PREC_EXPRESSION);
default:
error("expected expression but got %s", get_tok_name(ttype));
break;
}
}

@@ -1,169 +0,0 @@
#include "../parser.h"
#include "../symtab/symtab.h"
#include "ast.h"
#ifndef FUNC_PARAM_CACHE_SIZE
#define FUNC_PARAM_CACHE_SIZE 32 // a reasonable default covering 99% of common cases
#endif
// TODO push parameters into the symbol table during semantic analysis
static void parse_params(parser_t* parser, tok_buf_t* cache, ast_node_t* node) {
flush_peek_tok(cache);
tok_type_t ttype;
ast_node_t *params = new_ast_node();
node->decl_func.params = params;
vector_init(params->params.params);
int depth = 1;
while (depth) {
ttype = peek_tok_type(cache);
switch (ttype) {
case TOKEN_COMMA:
break;
case TOKEN_ELLIPSIS:
ttype = peek_tok_type(cache);
if (ttype != TOKEN_R_PAREN) {
error("'...' must be the last parameter (expected ')')");
}
// TODO
error("not implemented");
break;
case TOKEN_IDENT:
// TODO static arrays
flush_peek_tok(cache);
ast_node_t* id_node = new_ast_ident_node(peek_tok(cache));
ast_node_t* param_node = new_ast_node();
param_node->type = NT_DECL_VAR;
param_node->decl_val.name = id_node;
// TODO typing system
param_node->decl_val.type = NULL;
param_node->decl_val.expr_stmt = NULL;
param_node->decl_val.data = NULL;
vector_push(params->params.params, param_node);
symtab_add_symbol(parser->symtab, id_node->syms.tok.val.str, param_node, 0);
break;
case TOKEN_L_PAREN: {
depth++;
break;
}
case TOKEN_R_PAREN: {
depth--;
break;
}
default:
break;
// TODO parse the type using the cached tokens
// parse_type(parser);
// TODO type parse
// ttype = peekcachetype(cache);
// ttype = peekcachetype(cache);
// if (ttype != TOKEN_IDENT) {
// node->node_type = NT_DECL_FUNC;
// flush_peek_tok(tokbuf);
// continue;
// }
// error("function expected ')' or ','\n");
}
pop_tok(cache);
}
}
ast_type_t check_is_func_decl(tok_buf_t* tokbuf, tok_buf_t* cache) {
expect_pop_tok(tokbuf, TOKEN_L_PAREN);
int depth = 1;
while (depth) {
tok_t* tok = peek_tok(tokbuf);
pop_tok(tokbuf);
if (cache->size >= cache->cap - 1) {
error("function parameter list too long");
}
cache->buf[cache->size++] = *tok;
switch (tok->type) {
case TOKEN_L_PAREN:
depth++;
break;
case TOKEN_R_PAREN:
depth--;
break;
default:
break;
}
}
cache->end = cache->size;
switch (peek_tok_type(tokbuf)) {
case TOKEN_SEMICOLON:
pop_tok(tokbuf);
return NT_DECL_FUNC;
case TOKEN_L_BRACE:
return NT_FUNC;
default:
error("function declaration must end with ';' or be followed by '{'");
}
}
static ast_node_t* new_ast_node_funcdecl(ast_node_t* ret, ast_node_t* name) {
ast_node_t* node = new_ast_node();
node->type = NT_DECL_FUNC;
node->decl_func.ret = ret;
node->decl_func.name = name;
node->decl_func.def = NULL;
return node;
}
void parse_func(parser_t* parser) {
tok_buf_t* tokbuf = &(parser->tokbuf);
flush_peek_tok(tokbuf);
ast_node_t* ret_node = parse_type(parser);
ast_node_t* name_node = expect_pop_ident(tokbuf);
const char* func_name = name_node->syms.tok.val.str;
ast_node_t* decl = new_ast_node_funcdecl(ret_node, name_node);
tok_buf_t cache;
init_tokbuf(&cache, NULL, NULL);
cache.cap = FUNC_PARAM_CACHE_SIZE;
tok_t buf[FUNC_PARAM_CACHE_SIZE];
cache.buf = buf;
ast_type_t type = check_is_func_decl(&(parser->tokbuf), &cache);
ast_node_t* prev = symtab_add_symbol(parser->symtab, func_name, decl, 1);
if (prev != NULL) {
if (prev->type != NT_DECL_FUNC) {
error("duplicate symbol: old kind is %d, new is a function", prev->type);
}
// TODO check redeclare func is match
if (type == NT_FUNC) {
// TODO Free decl;
free(decl);
decl = prev;
goto FUNC;
}
return;
}
vector_push(parser->root->root.children, decl);
if (type == NT_DECL_FUNC) {
return;
}
FUNC:
// decl_func.def is used temporarily to detect redefinition
if (decl->decl_func.def != NULL) {
error("redefinition of function %s", func_name);
}
ast_node_t* node = new_ast_node();
node->type = NT_FUNC;
node->func.decl = decl;
node->func.data = NULL;
decl->decl_func.def = node;
symtab_enter_scope(parser->symtab);
parse_params(parser, &cache, decl);
node->func.body = parse_block(parser);
symtab_leave_scope(parser->symtab);
vector_push(parser->root->root.children, node);
}

@@ -1,34 +0,0 @@
#include "../parser.h"
#include "ast.h"
#ifndef PROG_MAX_NODE_SIZE
#define PROG_MAX_NODE_SIZE (1024 * 4)
#endif
void parse_func(parser_t* parser);
void parse_prog(parser_t* parser) {
/**
* Program := (Declaration | Definition)*
*/
tok_buf_t *tokbuf = &(parser->tokbuf);
parser->root = new_ast_node();
ast_node_t* node;
parser->root->type = NT_ROOT;
vector_init(parser->root->root.children);
while (1) {
flush_peek_tok(tokbuf);
if (peek_tok_type(tokbuf) == TOKEN_EOF) {
break;
}
node = parse_decl(parser);
if (node == NULL) {
parse_func(parser);
} else {
vector_push(parser->root->root.children, node);
}
}
return;
}

@@ -1,246 +0,0 @@
#include "../parser.h"
#include "ast.h"
ast_node_t* parse_stmt(parser_t* parser) {
tok_buf_t* tokbuf = &parser->tokbuf;
flush_peek_tok(tokbuf);
tok_type_t ttype = peek_tok_type(tokbuf);
ast_node_t* node = new_ast_node();
switch (ttype) {
case TOKEN_IF: {
/**
* if (exp) stmt
* if (exp) stmt else stmt
*/
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_L_PAREN);
node->if_stmt.cond = parse_expr(parser);
flush_peek_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_R_PAREN);
node->if_stmt.if_stmt = parse_stmt(parser);
ttype = peek_tok_type(tokbuf);
if (ttype == TOKEN_ELSE) {
pop_tok(tokbuf);
node->if_stmt.else_stmt = parse_stmt(parser);
} else {
node->if_stmt.else_stmt = NULL;
}
node->type = NT_STMT_IF;
break;
}
case TOKEN_SWITCH: {
/**
* switch (exp) stmt
*/
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_L_PAREN);
node->switch_stmt.cond = parse_expr(parser);
expect_pop_tok(tokbuf, TOKEN_R_PAREN);
node->switch_stmt.body = parse_stmt(parser);
node->type = NT_STMT_SWITCH;
break;
}
case TOKEN_WHILE: {
/**
* while (exp) stmt
*/
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_L_PAREN);
node->while_stmt.cond = parse_expr(parser);
expect_pop_tok(tokbuf, TOKEN_R_PAREN);
node->while_stmt.body = parse_stmt(parser);
node->type = NT_STMT_WHILE;
break;
}
case TOKEN_DO: {
/**
* do stmt while (exp)
*/
pop_tok(tokbuf);
node->do_while_stmt.body = parse_stmt(parser);
ttype = peek_tok_type(tokbuf);
if (ttype != TOKEN_WHILE) {
error("expected while after do");
}
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_L_PAREN);
node->do_while_stmt.cond = parse_expr(parser);
expect_pop_tok(tokbuf, TOKEN_R_PAREN);
node->type = NT_STMT_DOWHILE;
break;
}
case TOKEN_FOR: {
/**
* for (init; [cond]; [iter]) stmt
*/
// node->children.stmt.for_stmt.init
pop_tok(tokbuf);
ttype = peek_tok_type(tokbuf);
if (ttype != TOKEN_L_PAREN) {
error("expected ( after for");
}
pop_tok(tokbuf);
// init expr or init decl_var
// TODO need add this feature
if (peek_decl(tokbuf)) {
node->for_stmt.init = parse_decl_val(parser);
} else {
node->for_stmt.init = parse_expr(parser);
expect_pop_tok(tokbuf, TOKEN_SEMICOLON);
}
// cond expr or null
ttype = peek_tok_type(tokbuf);
if (ttype != TOKEN_SEMICOLON) {
node->for_stmt.cond = parse_expr(parser);
expect_pop_tok(tokbuf, TOKEN_SEMICOLON);
} else {
node->for_stmt.cond = NULL;
pop_tok(tokbuf);
}
// iter expr or null
ttype = peek_tok_type(tokbuf);
if (ttype != TOKEN_R_PAREN) {
node->for_stmt.iter = parse_expr(parser);
expect_pop_tok(tokbuf, TOKEN_R_PAREN);
} else {
node->for_stmt.iter = NULL;
pop_tok(tokbuf);
}
node->for_stmt.body = parse_stmt(parser);
node->type = NT_STMT_FOR;
break;
}
case TOKEN_BREAK: {
/**
* break ;
*/
// TODO check: break terminates the innermost enclosing for/while/do-while loop or switch
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_SEMICOLON);
node->type = NT_STMT_BREAK;
break;
}
case TOKEN_CONTINUE: {
/**
* continue ;
*/
// TODO check: continue skips the rest of the innermost for/while/do-while loop body
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_SEMICOLON);
node->type = NT_STMT_CONTINUE;
break;
}
case TOKEN_RETURN: {
/**
* return [exp] ;
*/
// TODO terminates the current function, returning an optional value to the caller
pop_tok(tokbuf);
ttype = peek_tok_type(tokbuf);
if (ttype != TOKEN_SEMICOLON) {
node->return_stmt.expr_stmt = parse_expr(parser);
flush_peek_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_SEMICOLON);
} else {
node->return_stmt.expr_stmt = NULL;
pop_tok(tokbuf);
}
node->type = NT_STMT_RETURN;
break;
}
case TOKEN_GOTO: {
/**
* goto label ;
*/
// TODO check label: goto transfers control unconditionally to the labeled statement;
// use it only when no structured construct can reach the target.
pop_tok(tokbuf);
// find symbol table
ttype = peek_tok_type(tokbuf);
if (ttype != TOKEN_IDENT) {
error("expect identifier after goto");
}
// TODO resolve the label target later
node->goto_stmt.label = expect_pop_ident(tokbuf);
expect_pop_tok(tokbuf, TOKEN_SEMICOLON);
node->type = NT_STMT_GOTO;
break;
}
case TOKEN_SEMICOLON: {
/**
* ;
* empty statement, used e.g. by:
* while () ;
* if () ;
* for () ;
*/
pop_tok(tokbuf);
node->type = NT_STMT_EMPTY;
break;
}
case TOKEN_L_BRACE: {
/**
* stmt_block like: { (decl_var | stmt) ... }
*/
node->block_stmt.block = parse_block(parser);
node->type = NT_STMT_BLOCK;
break;
}
case TOKEN_IDENT: {
// TODO label goto
if (peek_tok_type(tokbuf) != TOKEN_COLON) {
goto EXP;
}
node->label_stmt.label = expect_pop_ident(tokbuf);
expect_pop_tok(tokbuf, TOKEN_COLON);
node->type = NT_STMT_LABEL;
break;
}
case TOKEN_CASE: {
// TODO label switch
pop_tok(tokbuf);
error("unimplemented switch label");
node->label_stmt.label = parse_expr(parser);
// TODO the expression must be a constant integer
expect_pop_tok(tokbuf, TOKEN_COLON);
node->type = NT_STMT_CASE;
break;
}
case TOKEN_DEFAULT: {
// TODO label switch default
pop_tok(tokbuf);
expect_pop_tok(tokbuf, TOKEN_COLON);
node->type = NT_STMT_DEFAULT;
break;
}
default: {
/**
* exp ;
*/
EXP:
node->expr_stmt.expr_stmt = parse_expr(parser);
flush_peek_tok(tokbuf);
ttype = peek_tok_type(tokbuf);
if (ttype != TOKEN_SEMICOLON) {
error("expression statement must end with ';'");
}
pop_tok(tokbuf);
node->type = NT_STMT_EXPR;
break;
}
}
return node;
}

@@ -1,51 +0,0 @@
#include "../parser.h"
#include "../type.h"
#include "ast.h"
ast_node_t* new_ast_ident_node(tok_t* tok) {
if (tok->type != TOKEN_IDENT) {
error("syntax error: expected identifier but got %d", tok->type);
}
ast_node_t* node = new_ast_node();
node->type = NT_TERM_IDENT;
node->syms.tok = *tok;
node->syms.decl_node = NULL;
return node;
}
ast_node_t* expect_pop_ident(tok_buf_t* tokbuf) {
flush_peek_tok(tokbuf);
tok_t* tok = peek_tok(tokbuf);
ast_node_t* node = new_ast_ident_node(tok);
pop_tok(tokbuf);
return node;
}
ast_node_t* parse_type(parser_t* parser) {
tok_buf_t* tokbuf = &parser->tokbuf;
flush_peek_tok(tokbuf);
tok_type_t ttype = peek_tok_type(tokbuf);
data_type_t dtype;
switch(ttype) {
case TOKEN_VOID: dtype = TYPE_VOID; break;
case TOKEN_CHAR: dtype = TYPE_CHAR; break;
case TOKEN_SHORT: dtype = TYPE_SHORT; break;
case TOKEN_INT: dtype = TYPE_INT; break;
case TOKEN_LONG: dtype = TYPE_LONG; break;
case TOKEN_FLOAT: dtype = TYPE_FLOAT; break;
case TOKEN_DOUBLE: dtype = TYPE_DOUBLE; break;
default:
error("invalid type specifier");
}
ast_node_t* node = new_ast_node();
node->type = NT_TERM_TYPE;
// TODO silence the unused-variable warning until the typing system is added
(void)dtype;
pop_tok(tokbuf);
if (peek_tok_type(tokbuf) == TOKEN_MUL) {
pop_tok(tokbuf);
}
return node;
}

@@ -1,136 +0,0 @@
// #include "../parser.h"
// #include "../type.h"
// enum TypeParseState {
// TPS_BASE_TYPE, // parse base types (int/char/...)
// TPS_QUALIFIER, // parse qualifiers (const/volatile)
// TPS_POINTER, // parse pointers (*)
// TPS_ARRAY, // parse array dimensions ([n])
// TPS_FUNC_PARAMS // parse the function parameter list
// };
// ast_node_t* parse_type(parser_t* p) {
// ast_node_t* type_root = new_ast_node();
// ast_node_t* current = type_root;
// current->type = NT_TYPE_BASE;
// enum TypeParseState state = TPS_QUALIFIER;
// int pointer_level = 0;
// while (1) {
// tok_type_t t = peektoktype(p);
// switch (state) {
// // parse base types (int, char, ...)
// case TPS_BASE_TYPE:
// if (is_base_type(t)) {
// // current->data.data_type = token_to_datatype(t);
// pop_tok(p);
// state = TPS_POINTER;
// } else {
// error("Expected type specifier");
// }
// break;
// // type qualifiers (const/volatile)
// case TPS_QUALIFIER:
// if (t == TOKEN_CONST || t == TOKEN_VOLATILE) {
// ast_node_t* qual_node = new_ast_node();
// qual_node->type = NT_TYPE_QUAL;
// qual_node->data.data_type = t; // reuse the data_type field to store the qualifier
// current->child.decl.type = qual_node;
// current = qual_node;
// pop_tok(p);
// } else {
// state = TPS_BASE_TYPE;
// }
// break;
// // pointer parsing (*)
// case TPS_POINTER:
// if (t == TOKEN_MUL) {
// ast_node_t* ptr_node = new_ast_node();
// ptr_node->type = NT_TYPE_PTR;
// current->child.decl.type = ptr_node;
// current = ptr_node;
// pointer_level++;
// pop_tok(p);
// } else {
// state = TPS_ARRAY;
// }
// break;
// // array dimensions ([n])
// case TPS_ARRAY:
// if (t == TOKEN_L_BRACKET) {
// pop_tok(p); // consume '['
// ast_node_t* arr_node = new_ast_node();
// arr_node->type = NT_TYPE_ARRAY;
// // parse the array size (syntax check only)
// if (peektoktype(p) != TOKEN_R_BRACKET) {
// parse_expr(p); // don't evaluate the actual value
// }
// expecttok(p, TOKEN_R_BRACKET);
// current->child.decl.type = arr_node;
// current = arr_node;
// } else {
// state = TPS_FUNC_PARAMS;
// }
// break;
// // function parameter list
// case TPS_FUNC_PARAMS:
// if (t == TOKEN_L_PAREN) {
// ast_node_t* func_node = new_ast_node();
// func_node->type = NT_TYPE_FUNC;
// current->child.decl.type = func_node;
// // parse the parameter list (structure only, no type validation)
// parse_param_list(p, func_node);
// current = func_node;
// } else {
// return type_root; // type parsing finished
// }
// break;
// }
// }
// }
// // check whether the token is a base type
// static int is_base_type(tok_type_t t) {
// return t >= TOKEN_VOID && t <= TOKEN_DOUBLE;
// }
// // // simplified token-to-data-type conversion
// // static enum DataType token_to_datatype(tok_type_t t) {
// // static enum DataType map[] = {
// // [TOKEN_VOID] = DT_VOID,
// // [TOKEN_CHAR] = DT_CHAR,
// // [TOKEN_INT] = DT_INT,
// // // ...other type mappings
// // };
// // return map[t];
// // }
// // parse a parameter list (lightweight)
// static void parse_param_list(parser_t* p, ast_node_t* func) {
// expecttok(p, TOKEN_L_PAREN);
// while (peektoktype(p) != TOKEN_R_PAREN) {
// ast_node_t* param = parse_type(p); // 递归解析类型
// // an optional parameter name is allowed (syntax check only)
// if (peektoktype(p) == TOKEN_IDENT) {
// pop_tok(p); // consume the parameter name
// }
// if (peektoktype(p) == TOKEN_COMMA) {
// pop_tok(p);
// }
// }
// expecttok(p, TOKEN_R_PAREN);
// }

View File

@@ -1,17 +0,0 @@
#include "parser.h"
#include "type.h"
void init_parser(parser_t* parser, lexer_t* lexer, symtab_t* symtab) {
parser->cur_node = NULL;
parser->root = NULL;
parser->lexer = lexer;
parser->symtab = symtab;
init_tokbuf(&parser->tokbuf, lexer, (get_tokbuf_func)get_valid_token);
parser->tokbuf.cap = sizeof(parser->TokenBuffer) / sizeof(parser->TokenBuffer[0]);
parser->tokbuf.buf = parser->TokenBuffer;
}
void run_parser(parser_t* parser) {
parse_prog(parser);
}

View File

@@ -1,25 +0,0 @@
#ifndef __PARSER_H__
#define __PARSER_H__
#include "../frontend.h"
#include "../lexer/lexer.h"
typedef struct lexer lexer_t;
typedef struct symtab symtab_t;
#define PARSER_MAX_TOKEN_QUEUE 16
typedef struct parser {
struct ASTNode* root;
struct ASTNode* cur_node;
lexer_t* lexer;
symtab_t* symtab;
tok_buf_t tokbuf;
tok_t TokenBuffer[PARSER_MAX_TOKEN_QUEUE];
int err_level;
} parser_t;
void init_parser(parser_t* parser, lexer_t* lexer, symtab_t* symtab);
void run_parser(parser_t* parser);
#endif

View File

@@ -1,53 +0,0 @@
// hashmap.c
#include "hashmap.h"
#include <stdlib.h>
#include <string.h>
// DJB2 hash algorithm
static unsigned long hash(const char* str) {
unsigned long hash = 5381;
int c;
while ((c = *str++))
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash % HMAP_SIZE;
}
void hmap_init(HashMap* map) {
memset(map->buckets, 0, sizeof(map->buckets));
}
void hmap_put(HashMap* map, const char* key, void* value) {
unsigned long idx = hash(key);
HashMapEntry* entry = malloc(sizeof(HashMapEntry));
entry->key = strdup(key);
entry->value = value;
entry->next = map->buckets[idx];
map->buckets[idx] = entry;
}
void* hmap_get(HashMap* map, const char* key) {
unsigned long idx = hash(key);
HashMapEntry* entry = map->buckets[idx];
while (entry) {
if (strcmp(entry->key, key) == 0)
return entry->value;
entry = entry->next;
}
return NULL;
}
int hmap_contains(HashMap* map, const char* key) {
return hmap_get(map, key) != NULL;
}
void hmap_destroy(HashMap* map) {
for (int i = 0; i < HMAP_SIZE; i++) {
HashMapEntry* entry = map->buckets[i];
while (entry) {
HashMapEntry* next = entry->next;
free(entry->key);
free(entry);
entry = next;
}
}
}

View File

@@ -1,31 +0,0 @@
#ifndef HASHMAP_H
#define HASHMAP_H
#define HMAP_SIZE 64
typedef struct HashMapEntry {
char* key;
void* value;
struct HashMapEntry* next;
} HashMapEntry;
typedef struct {
HashMapEntry* buckets[HMAP_SIZE];
} HashMap;
// Initialize the hash map
void hmap_init(HashMap* map);
// Insert a key-value pair
void hmap_put(HashMap* map, const char* key, void* value);
// Look up the value for a key
void* hmap_get(HashMap* map, const char* key);
// Check whether a key exists
int hmap_contains(HashMap* map, const char* key);
// Free the hash map's memory (values are not freed)
void hmap_destroy(HashMap* map);
#endif

View File

@@ -1,43 +0,0 @@
// scope.c
#include "scope.h"
#include <stdio.h>
#include <stdlib.h>
typedef struct Scope Scope;
Scope* scope_create(Scope* parent) {
Scope* scope = malloc(sizeof(Scope));
hmap_init(&scope->symbols);
scope->parent = parent;
scope->base_offset = 0;
scope->cur_offset = 0;
return scope;
}
void scope_destroy(Scope* scope) {
hmap_destroy(&scope->symbols);
free(scope);
}
void scope_insert(Scope* scope, const char* name, void* symbol) {
if (hmap_contains(&scope->symbols, name)) {
// report a duplicate-definition error
fprintf(stderr, "Error: Symbol '%s' already defined\n", name);
exit(EXIT_FAILURE);
}
hmap_put(&scope->symbols, name, symbol);
}
void* scope_lookup(Scope* scope, const char* name) {
void* symbol = NULL;
while (scope) {
symbol = hmap_get(&scope->symbols, name);
if (symbol) break;
scope = scope->parent;
}
return symbol;
}
void* scope_lookup_current(Scope* scope, const char* name) {
return hmap_get(&scope->symbols, name);
}

View File

@@ -1,28 +0,0 @@
#ifndef SCOPE_H
#define SCOPE_H
#include "hashmap.h"
struct Scope {
HashMap symbols; // symbol table for the current scope
struct Scope* parent; // enclosing scope
int base_offset;
int cur_offset;
};
// Create a new scope (parent may be NULL)
struct Scope* scope_create(struct Scope* parent);
// Destroy a scope
void scope_destroy(struct Scope* scope);
// Insert a symbol into the current scope
void scope_insert(struct Scope* scope, const char* name, void* symbol);
// Look up a symbol, walking outward through parent scopes
void* scope_lookup(struct Scope* scope, const char* name);
// Look up only in the current scope
void* scope_lookup_current(struct Scope* scope, const char* name);
#endif

View File

@@ -1,49 +0,0 @@
// symtab.c
#include "../../frontend.h"
#include "scope.h"
#include "symtab.h"
typedef struct Scope Scope;
void init_symtab(symtab_t* symtab) {
symtab->global_scope = scope_create(NULL);
symtab->cur_scope = symtab->global_scope;
}
void del_symtab(symtab_t* symtab) {
scope_destroy(symtab->global_scope);
}
void symtab_enter_scope(symtab_t* symtab) {
struct Scope* scope = scope_create(symtab->cur_scope);
scope->base_offset = symtab->cur_scope->base_offset + symtab->cur_scope->cur_offset;
symtab->cur_scope = scope;
}
void symtab_leave_scope(symtab_t* symtab) {
Scope* scope = symtab->cur_scope;
if (scope == NULL || scope->parent == NULL) {
error("cannot leave NULL scope or global scope");
}
symtab->cur_scope = symtab->cur_scope->parent;
scope_destroy(scope);
}
void* symtab_add_symbol(symtab_t* symtab, const char* name, void* ast_node, int can_duplicate) {
struct Scope* scope = symtab->cur_scope;
void* node = scope_lookup_current(scope, name);
if (node != NULL) {
if (!can_duplicate) {
error("duplicate symbol %s", name);
}
return node;
}
scope_insert(scope, name, ast_node);
return node;
}
void* symtab_lookup_symbol(symtab_t* symtab, const char* name) {
return scope_lookup(symtab->cur_scope, name);
}

View File

@@ -1,18 +0,0 @@
// symtab.h
#ifndef __SYMTAB_H__
#define __SYMTAB_H__
typedef struct symtab {
struct Scope* cur_scope;
struct Scope* global_scope;
} symtab_t;
void init_symtab(symtab_t* symtab);
void del_symtab(symtab_t* symtab);
void symtab_enter_scope(symtab_t* symtab);
void symtab_leave_scope(symtab_t* symtab);
void* symtab_add_symbol(symtab_t* symtab, const char* name, void* ast_node, int can_duplicate);
void* symtab_lookup_symbol(symtab_t* symtab, const char* name);
#endif

View File

@@ -1,4 +0,0 @@
extern int _print_str(const char* str);
int main(void) {
_print_str("Hello, world!\n");
}

View File

@@ -1,14 +0,0 @@
// int __print_str(char* str);
int f(void);
int main(void) {
int a;
// f();
// a = 1 + 2 * 3 + 4;
// __print_str("Hello, world!\n");
a = 3 - f() * (3 + 2) % 6;
// Test case:
// if (a) if (2) 3; else b;
// should parse as if (a) { if (b) c else d } (the dangling else binds to the inner if)
}

View File

@@ -1,34 +0,0 @@
#include "../parser.h"
#include "../ast/ast.h"
#include "../symtab/symtab.h"
#include <stdio.h>
// gcc -g ../parser.c ../../lexer/lexer.c ../ast/ast.c ../ast/block.c ../ast/decl.c ../ast/expr.c ../ast/func.c ../ast/program.c ../ast/stmt.c ../ast/term.c ../symtab/hashmap.c ../symtab/scope.c ../symtab/symtab.c test_parser.c -o test_parser
// gcc -g test_parser.c -L../.. -lfrontend -o test_parser
int main(int argc, char** argv) {
const char* file_name = "test_file.c";
if (argc == 2) {
file_name = argv[1];
}
FILE* fp = fopen(file_name, "r");
if (fp == NULL) {
perror("open file failed");
return 1;
}
printf("open file success\n");
lexer_t lexer;
init_lexer(&lexer, file_name, fp, (lexer_sread_fn)fread_s);
symtab_t symtab;
init_symtab(&symtab);
parser_t parser;
init_parser(&parser, &lexer, &symtab);
parse_prog(&parser);
printf("parse_end\n");
pnt_ast(parser.root, 0);
return 0;
}

View File

@@ -1,35 +0,0 @@
#ifndef __TYPE_H__
#define __TYPE_H__
#include "../lexer/token.h"
typedef enum {
TYPE_VOID,
TYPE_CHAR,
TYPE_SHORT,
TYPE_INT,
TYPE_LONG,
TYPE_LONG_LONG,
TYPE_FLOAT,
TYPE_DOUBLE,
TYPE_LONG_DOUBLE,
// prefix
TYPE_SIGNED,
TYPE_UNSIGNED,
// TYPE_BOOL,
// TYPE_COMPLEX,
// TYPE_IMAGINARY,
TYPE_ENUM,
TYPE_ARRAY,
TYPE_STRUCT,
TYPE_UNION,
TYPE_FUNCTION,
TYPE_POINTER,
TYPE_ATOMIC,
TYPE_TYPEDEF,
} data_type_t;
#endif

View File

@@ -1,30 +0,0 @@
# Compiler settings
CC = gcc
AR = ar
CFLAGS = -g -Wall
# Source files
SRCS = \
ir.c \
ir_ast.c \
ir_lib.c \
ir_type.c
# Object files derived from the sources
OBJS = $(SRCS:.c=.o)
# Final target
TARGET = libir.a
all: $(TARGET)
$(TARGET): $(OBJS)
$(AR) rcs $@ $^
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $<
clean:
rm -f $(OBJS) $(TARGET)
.PHONY: all clean

View File

@@ -1,159 +0,0 @@
// ir_core.h
#ifndef IR_CORE_H
#define IR_CORE_H
#include "../../libcore/vector.h"
#include <stddef.h>
#include <stdint.h>
// Error codes
typedef enum {
IR_EC_SUCCESS = 0, // success
IR_EC_MEMORY_ERROR, // memory allocation failed
IR_EC_TYPE_MISMATCH, // type mismatch
IR_EC_INVALID_OPERAND, // invalid operand
IR_EC_DUPLICATE_SYMBOL, // symbol redefinition
} ir_ecode_t;
typedef struct ir_type {
enum {
IR_TYPE_INT32,
IR_TYPE_PTR,
IR_TYPE_ARRAY,
IR_TYPE_FUNC,
IR_TYPE_VOID,
} tag;
union {
struct {
struct ir_type *base;
size_t len;
} arr;
struct {
struct ir_type *ret;
struct ir_type **params;
size_t param_cnt;
} func;
};
} ir_type_t;
typedef struct ir_node ir_node_t;
typedef struct ir_bblock {
const char *label;
vector_header(instrs, ir_node_t*);
// ir_arr_t used_by;
} ir_bblock_t; // basic block
typedef struct {
const char *name;
ir_type_t *type;
vector_header(params, ir_node_t*);
vector_header(bblocks, ir_bblock_t*);
} ir_func_t;
typedef struct {
vector_header(global, ir_node_t*);
vector_header(funcs, ir_func_t*);
vector_header(extern_funcs, ir_func_t*);
} ir_prog_t;
typedef enum ir_node_tag {
IR_NODE_NULL,
IR_NODE_CONST_INT,
IR_NODE_ALLOC,
IR_NODE_LOAD,
IR_NODE_STORE,
IR_NODE_GET_PTR,
IR_NODE_OP,
IR_NODE_BRANCH,
IR_NODE_JUMP,
IR_NODE_CALL,
IR_NODE_RET,
} ir_node_tag_t;
struct ir_node {
const ir_type_t* type;
const char* name;
vector_header(used_by, ir_node_t*);
ir_node_tag_t tag;
union {
struct {
int32_t val;
} const_int;
struct {
ir_node_t* target;
} load;
struct {
ir_node_t* target;
ir_node_t* value;
} store;
struct {
ir_node_t* src_addr;
ir_node_t* offset;
} get_ptr;
struct {
enum {
/// Not equal to.
IR_OP_NEQ,
/// Equal to.
IR_OP_EQ,
/// Greater than.
IR_OP_GT,
/// Less than.
IR_OP_LT,
/// Greater than or equal to.
IR_OP_GE,
/// Less than or equal to.
IR_OP_LE,
/// Addition.
IR_OP_ADD,
/// Subtraction.
IR_OP_SUB,
/// Multiplication.
IR_OP_MUL,
/// Division.
IR_OP_DIV,
/// Modulo.
IR_OP_MOD,
/// Bitwise AND.
IR_OP_AND,
/// Bitwise OR.
IR_OP_OR,
/// Bitwise XOR.
IR_OP_XOR,
/// Bitwise NOT.
IR_OP_NOT,
/// Shift left logical.
IR_OP_SHL,
/// Shift right logical.
IR_OP_SHR,
/// Shift right arithmetic.
IR_OP_SAR,
} op;
ir_node_t* lhs;
ir_node_t* rhs;
} op;
struct {
ir_node_t* cond;
ir_bblock_t* true_bblock;
ir_bblock_t* false_bblock;
} branch;
struct {
ir_bblock_t* target_bblock;
} jump;
struct {
ir_func_t* callee;
vector_header(args, ir_node_t*);
} call;
struct {
ir_node_t* ret_val;
} ret;
} data;
};
extern ir_prog_t prog;
struct ASTNode;
void gen_ir_from_ast(struct ASTNode* node);
#endif // IR_CORE_H

View File

@@ -1,439 +0,0 @@
#include "ir.h"
#include "ir_lib.h"
#include "ir_type.h"
#include "../frontend/frontend.h"
// Context struct recording state during IR generation
typedef struct {
ir_func_t* cur_func; // function currently being generated
ir_bblock_t* cur_block; // current basic block
} IRGenContext;
IRGenContext ctx;
ir_prog_t prog;
static void emit_instr(ir_bblock_t* block, ir_node_t* node) {
if (block == NULL) block = ctx.cur_block;
vector_push(block->instrs, node);
// return &(vector_at(block->instrs, block->instrs.size - 1));
}
static ir_node_t* emit_br(ir_node_t* cond, ir_bblock_t* trueb, ir_bblock_t* falseb) {
ir_node_t* br = new_ir_node(NULL, IR_NODE_BRANCH);
emit_instr(NULL, br);
br->data.branch.cond = cond;
br->data.branch.true_bblock = trueb;
br->data.branch.false_bblock = falseb;
return br;
}
static ir_node_t* gen_ir_expr(ast_node_t* node);
static ir_node_t* gen_ir_term(ast_node_t* node) {
switch (node->type) {
case NT_TERM_VAL: {
ir_node_t* ir = new_ir_node(NULL, IR_NODE_CONST_INT);
ir->data.const_int.val = node->syms.tok.val.i;
return ir;
}
case NT_TERM_IDENT: {
ir_node_t* decl = node->syms.decl_node->decl_val.data;
return decl;
}
case NT_TERM_CALL: {
ir_node_t* call = new_ir_node(NULL, IR_NODE_CALL);
call->data.call.callee = node->call.func_decl->decl_func.def->func.data;
for (int i = 0; i < node->call.params->params.params.size; i++) {
ast_node_t* param = vector_at(node->call.params->params.params, i);
ir_node_t *tmp = gen_ir_expr(param);
vector_push(call->data.call.args, tmp);
}
emit_instr(NULL, call);
return call;
}
default: {
assert(0);
}
}
}
static ir_node_t* gen_ir_expr(ast_node_t* node) {
// term node
switch (node->type) {
case NT_TERM_VAL:
case NT_TERM_IDENT:
case NT_TERM_CALL:
return gen_ir_term(node);
default:
break;
}
ir_node_t* lhs = gen_ir_expr(node->expr.left);
ir_node_t* rhs = node->expr.right ? gen_ir_expr(node->expr.right) : NULL;
if (node->type == NT_COMMA) {
return rhs;
}
ir_node_t* instr = NULL;
ir_node_t* ret;
#define BINOP(operand) do { \
instr = new_ir_node(NULL, IR_NODE_OP); \
instr->data.op.op = operand; \
instr->data.op.lhs = lhs; \
instr->data.op.rhs = rhs; \
ret = instr; \
} while (0)
switch (node->type) {
case NT_ADD: {
// (expr) + (expr)
BINOP(IR_OP_ADD); break;
}
case NT_SUB: {
// (expr) - (expr)
BINOP(IR_OP_SUB); break;
}
case NT_MUL: {
// (expr) * (expr)
BINOP(IR_OP_MUL); break;
}
case NT_DIV: {
// (expr) / (expr)
BINOP(IR_OP_DIV); break;
}
case NT_MOD: {
// (expr) % (expr)
BINOP(IR_OP_MOD); break;
}
case NT_AND: {
// (expr) & (expr)
BINOP(IR_OP_AND); break;
}
case NT_OR: {
// (expr) | (expr)
BINOP(IR_OP_OR); break;
}
case NT_XOR: {
// (expr) ^ (expr)
BINOP(IR_OP_XOR); break;
}
case NT_BIT_NOT: {
// ~ (expr)
// TODO
// BINOP(IR_OP_NOT);
break;
}
case NT_L_SH: {
// (expr) << (expr)
BINOP(IR_OP_SHL);
break;
}
case NT_R_SH: {
// (expr) >> (expr)
BINOP(IR_OP_SHR); // Shift right logical.
// TODO
// BINOP(IR_OP_SAR); // Shift right arithmetic.
break;
}
case NT_EQ: {
// (expr) == (expr)
BINOP(IR_OP_EQ); break;
}
case NT_NEQ: {
// (expr) != (expr)
BINOP(IR_OP_NEQ); break;
}
case NT_LE: {
// (expr) <= (expr)
BINOP(IR_OP_LE); break;
}
case NT_GE: {
// (expr) >= (expr)
BINOP(IR_OP_GE); break;
}
case NT_LT: {
// (expr) < (expr)
BINOP(IR_OP_LT); break;
}
case NT_GT: {
// (expr) > (expr)
BINOP(IR_OP_GT); break;
}
case NT_AND_AND: // (expr) && (expr)
error("unimplemented");
break;
case NT_OR_OR: // (expr) || (expr)
error("unimplemented");
break;
case NT_NOT: {
// ! (expr)
instr = new_ir_node(NULL, IR_NODE_OP);
instr->data.op.op = IR_OP_EQ;
instr->data.op.lhs = &node_zero;
instr->data.op.rhs = lhs;
ret = instr;
break;
}
case NT_ASSIGN: {
// (expr) = (expr)
instr = new_ir_node(NULL, IR_NODE_STORE);
instr->data.store.target = lhs;
instr->data.store.value = rhs;
ret = rhs;
break;
}
// case NT_COND: // (expr) ? (expr) : (expr)
default: {
// TODO self error msg
error("Unsupported IR generation for AST node type %d", node->type);
break;
}
}
// record def-use edges now that instr actually exists
vector_push(lhs->used_by, instr);
if (rhs) { vector_push(rhs->used_by, instr); }
emit_instr(NULL, instr);
return ret;
}
static void gen_ir_func(ast_node_t* node, ir_func_t* func) {
assert(node->type == NT_FUNC);
ir_bblock_t *entry = new_ir_bblock("entry");
vector_push(func->bblocks, entry);
vector_push(prog.funcs, func);
IRGenContext prev_ctx = ctx;
ctx.cur_func = func;
ctx.cur_block = entry;
ast_node_t* params = node->func.decl->decl_func.params;
for (int i = 0; i < params->params.params.size; i ++) {
ast_node_t* param = params->params.params.data[i];
ir_node_t* decl = new_ir_node(param->decl_val.name->syms.tok.val.str, IR_NODE_ALLOC);
emit_instr(entry, decl);
vector_push(func->params, decl);
// TODO Typing system
decl->type = &type_i32;
param->decl_val.data = decl;
}
gen_ir_from_ast(node->func.body);
ctx = prev_ctx;
}
void gen_ir_jmp(ast_node_t* node) {
ir_bblock_t *bblocks[3];
for (int i = 0; i < sizeof(bblocks)/sizeof(bblocks[0]); i++) {
bblocks[i] = new_ir_bblock(NULL);
vector_push(ctx.cur_func->bblocks, bblocks[i]);
}
#define NEW_IR_JMP(name, block) do { \
name = new_ir_node(NULL, IR_NODE_JUMP); \
name->data.jump.target_bblock = block; \
} while (0)
switch (node->type) {
case NT_STMT_IF: {
ir_bblock_t* trueb = bblocks[0];
ir_bblock_t* falseb = bblocks[1];
ir_bblock_t* endb = bblocks[2];
ir_node_t* jmp;
// cond
ir_node_t *cond = gen_ir_expr(node->if_stmt.cond);
emit_br(cond, trueb, falseb);
// true block (already appended to func->bblocks by the loop above)
ctx.cur_block = trueb;
gen_ir_from_ast(node->if_stmt.if_stmt);
// else block
if (node->if_stmt.else_stmt != NULL) {
ctx.cur_block = falseb;
gen_ir_from_ast(node->if_stmt.else_stmt);
ir_node_t* jmp;
ctx.cur_block = endb;
NEW_IR_JMP(jmp, ctx.cur_block);
emit_instr(falseb, jmp);
} else {
ctx.cur_block = falseb;
}
NEW_IR_JMP(jmp, ctx.cur_block);
emit_instr(trueb, jmp);
break;
}
case NT_STMT_WHILE: {
ir_bblock_t* entryb = bblocks[0];
ir_bblock_t* bodyb = bblocks[1];
ir_bblock_t* endb = bblocks[2];
ir_node_t* entry;
NEW_IR_JMP(entry, entryb);
emit_instr(NULL, entry);
// Entry:
ctx.cur_block = entryb;
ir_node_t *cond = gen_ir_expr(node->while_stmt.cond);
emit_br(cond, bodyb, endb);
// Body:
ir_node_t* jmp;
ctx.cur_block = bodyb;
gen_ir_from_ast(node->while_stmt.body);
NEW_IR_JMP(jmp, entryb);
emit_instr(NULL, jmp);
// End:
ctx.cur_block = endb;
break;
}
case NT_STMT_DOWHILE: {
ir_bblock_t* entryb = bblocks[0];
ir_bblock_t* bodyb = bblocks[1];
ir_bblock_t* endb = bblocks[2];
ir_node_t* entry;
NEW_IR_JMP(entry, bodyb);
emit_instr(NULL, entry);
// Body:
ctx.cur_block = bodyb;
gen_ir_from_ast(node->do_while_stmt.body);
ir_node_t* jmp;
NEW_IR_JMP(jmp, entryb);
emit_instr(NULL, jmp);
// Entry:
ctx.cur_block = entryb;
ir_node_t *cond = gen_ir_expr(node->do_while_stmt.cond);
emit_br(cond, bodyb, endb);
// End:
ctx.cur_block = endb;
break;
}
case NT_STMT_FOR: {
ir_bblock_t* entryb = bblocks[0];
ir_bblock_t* bodyb = bblocks[1];
ir_bblock_t* endb = bblocks[2];
if (node->for_stmt.init) {
gen_ir_from_ast(node->for_stmt.init);
}
ir_node_t* entry;
NEW_IR_JMP(entry, entryb);
emit_instr(NULL, entry);
// Entry:
ctx.cur_block = entryb;
if (node->for_stmt.cond) {
ir_node_t *cond = gen_ir_expr(node->for_stmt.cond);
emit_br(cond, bodyb, endb);
} else {
ir_node_t* jmp;
NEW_IR_JMP(jmp, bodyb);
emit_instr(NULL, jmp);
}
// Body:
ctx.cur_block = bodyb;
gen_ir_from_ast(node->for_stmt.body);
if (node->for_stmt.iter) {
gen_ir_expr(node->for_stmt.iter);
}
ir_node_t* jmp;
NEW_IR_JMP(jmp, entryb);
emit_instr(NULL, jmp);
// End:
ctx.cur_block = endb;
break;
}
default:
error("ir jmp can't hit here");
}
}
void gen_ir_from_ast(ast_node_t* node) {
switch (node->type) {
case NT_ROOT: {
for (int i = 0; i < node->root.children.size; i ++) {
gen_ir_from_ast(node->root.children.data[i]);
}
break;
}
case NT_DECL_FUNC: {
ir_func_t* func = new_ir_func(node->decl_func.name->syms.tok.val.str, &type_i32);
if (node->decl_func.def == NULL) {
ast_node_t* def = new_ast_node();
def->func.body = NULL;
def->func.decl = node;
node->decl_func.def = def;
vector_push(prog.extern_funcs, func);
}
node->decl_func.def->func.data = func;
break;
}
case NT_FUNC: {
gen_ir_func(node, node->func.data);
break;
}
case NT_STMT_RETURN: {
ir_node_t* ret = NULL;
if (node->return_stmt.expr_stmt != NULL) {
ret = gen_ir_expr(node->return_stmt.expr_stmt);
}
ir_node_t* ir = new_ir_node(NULL, IR_NODE_RET);
ir->data.ret.ret_val = ret;
emit_instr(NULL, ir);
ir_bblock_t* block = new_ir_bblock(NULL);
ctx.cur_block = block;
vector_push(ctx.cur_func->bblocks, block);
break;
}
case NT_STMT_BLOCK: {
gen_ir_from_ast(node->block_stmt.block);
break;
}
case NT_BLOCK: {
for (int i = 0; i < node->block.children.size; i ++) {
gen_ir_from_ast(node->block.children.data[i]);
}
break;
}
case NT_STMT_IF:
case NT_STMT_WHILE:
case NT_STMT_DOWHILE:
case NT_STMT_FOR:
gen_ir_jmp(node);
break;
case NT_DECL_VAR: {
ir_node_t* ir = new_ir_node(node->decl_val.name->syms.tok.val.str, IR_NODE_ALLOC);
emit_instr(NULL, ir);
// TODO Typing system
ir->type = &type_i32;
node->decl_val.data = ir;
if (node->decl_val.expr_stmt != NULL) {
gen_ir_from_ast(node->decl_val.expr_stmt);
}
break;
}
case NT_STMT_EXPR: {
gen_ir_expr(node->expr_stmt.expr_stmt);
break;
}
case NT_STMT_EMPTY: {
break;
}
default:
// TODO: error handling
error("unknown node type");
break;
}
}

View File

@@ -1,122 +0,0 @@
#include "ir.h"
// FIXME using stdlib.h
#include <stdlib.h>
static int total_alloc = 0;
typedef union ir_alloc_item {
ir_node_t node;
ir_bblock_t bblock;
ir_func_t func;
ir_prog_t prog;
} ir_alloc_item_t;
ir_alloc_item_t* alloc_item() {
ir_alloc_item_t* item = malloc(sizeof(ir_alloc_item_t));
if (item == NULL) {
exit(1); // out of memory
}
return item;
}
void free_item(ir_alloc_item_t* item) {
free(item);
}
ir_node_t* new_ir_node(const char* name, ir_node_tag_t tag) {
ir_node_t* node = (ir_node_t*)alloc_item();
node->name = name;
node->type = NULL;
node->tag = tag;
switch (tag) {
case IR_NODE_ALLOC: {
node->type = NULL;
break;
}
case IR_NODE_BRANCH: {
node->data.branch.cond = NULL;
node->data.branch.true_bblock = NULL;
node->data.branch.false_bblock = NULL;
break;
}
case IR_NODE_CALL: {
vector_init(node->data.call.args);
node->data.call.callee = NULL;
break;
}
case IR_NODE_CONST_INT: {
node->data.const_int.val = 0;
break;
}
case IR_NODE_JUMP: {
node->data.jump.target_bblock = NULL;
break;
}
case IR_NODE_LOAD: {
node->data.load.target = NULL;
break;
}
case IR_NODE_STORE: {
node->data.store.target = NULL;
node->data.store.value = NULL;
break;
}
case IR_NODE_OP: {
node->data.op.op = 0;
node->data.op.lhs = NULL;
node->data.op.rhs = NULL;
break;
}
case IR_NODE_RET: {
node->data.ret.ret_val = NULL;
break;
}
case IR_NODE_GET_PTR: {
node->data.get_ptr.src_addr = NULL;
node->data.get_ptr.offset = NULL;
break;
}
default: {
exit(1); // unknown node tag
}
}
vector_init(node->used_by);
return node;
}
void dump_ir_node(ir_node_t* node) {
}
void free_irnode() {
}
ir_bblock_t* new_ir_bblock(const char* name) {
ir_bblock_t* block = (ir_bblock_t*)alloc_item();
block->label = name;
vector_init(block->instrs);
return block;
}
void free_irbblock() {
}
ir_func_t* new_ir_func(const char* name, ir_type_t* type) {
ir_func_t* func = (ir_func_t*)alloc_item();
func->name = name;
func->type = type;
vector_init(func->params);
vector_init(func->bblocks);
return func;
}
void free_irfunc() {
}
ir_prog_t* new_ir_prog() {
ir_prog_t* prog = (ir_prog_t*)alloc_item();
vector_init(prog->global);
vector_init(prog->funcs);
vector_init(prog->extern_funcs);
return prog;
}
void free_irprog() {
}

View File

@@ -1,9 +0,0 @@
#ifndef __IR_LIB_H__
#define __IR_LIB_H__
#include "ir.h"
ir_node_t* new_ir_node(const char* name, ir_node_tag_t tag);
ir_bblock_t* new_ir_bblock(const char* name);
ir_func_t* new_ir_func(const char* name, ir_type_t* type);
#endif

View File

@@ -1,12 +0,0 @@
#include "ir.h"
ir_type_t type_i32 = {
.tag = IR_TYPE_INT32,
};
ir_node_t node_zero = {
.tag = IR_NODE_CONST_INT,
.data.const_int = {
.val = 0,
},
};

View File

@@ -1,8 +0,0 @@
#ifndef __IR_TYPE_H__
#define __IR_TYPE_H__
#include "ir.h"
extern ir_type_t type_i32;
extern ir_node_t node_zero;
#endif

View File

@@ -1,8 +0,0 @@
#ifndef __REG_ALLOC_H__
#define __REG_ALLOC_H__
typedef struct {
} reg_alloc_t;
#endif

View File

@@ -1,8 +0,0 @@
all: test_ir
test_ir: frontend
gcc -g ../ir.c test_ir.c -L../../frontend -lfrontend -o test_ir
frontend:
make -C ../../frontend

View File

@@ -1,7 +0,0 @@
int add(int a, int b) {
return a + b;
}
int main(void) {
return add(1, 2);
}

View File

@@ -1,18 +0,0 @@
#include "../ir.h"
#include "../../frontend/frontend.h"
int main(int argc, const char** argv) {
const char* file_name = "test_file.c";
if (argc == 2) {
file_name = argv[1];
}
FILE* fp = fopen(file_name, "r");
if (fp == NULL) {
perror("open file failed");
return 1;
}
printf("open file success\n");
struct ASTNode* root = frontend(file_name, fp, (sread_fn)fread_s);
gen_ir_from_ast(root);
return 0;
}

justfile Normal file
View File

@@ -0,0 +1,11 @@
list:
just --list
build-lexer:
python build.py build -p libs/lexer
build-docs:
doxygen Doxyfile
docs: build-docs
python -m http.server -d docs/html

View File

@@ -1,10 +0,0 @@
#ifndef __STDCORE_H__
#define __STDCORE_H__
#ifndef __NO_LINK_STDLIB
#include <stdlib.h>
#else
#error "__NO_LINK_STDLIB"
#endif
#endif

View File

@@ -1,54 +0,0 @@
// vector.h
#ifndef VECTOR_H
#define VECTOR_H
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#define vector_header(name, type) \
struct { \
size_t size; \
size_t cap; \
type *data; \
} name
#define vector_init(vec) \
do { \
(vec).size = 0; \
(vec).cap = 0; \
(vec).data = NULL; \
} while(0)
#define vector_push(vec, value) \
do { \
if ((vec).size >= (vec).cap) { \
size_t cap = (vec).cap ? (vec).cap * 2 : 8; \
void* data = realloc((vec).data, cap * sizeof(*(vec).data)); \
if (!data) { \
fprintf(stderr, "vector_push: realloc failed\n"); \
exit(1); \
} \
(vec).cap = cap; \
(vec).data = data; \
} \
(vec).data[(vec).size++] = value; \
} while(0)
#define vector_pop(vec) \
((vec).data[--(vec).size])
#define vector_at(vec, idx) \
(((vec).data)[idx])
#define vector_idx(vec, ptr) \
((ptr) - (vec).data)
#define vector_free(vec) \
do { \
free((vec).data); \
(vec).data = NULL; \
(vec).size = (vec).cap = 0; \
} while(0)
#endif

libs/README.md Normal file
View File

@@ -0,0 +1,9 @@
lexer: lexical analysis
parse: syntax analysis
ast: abstract syntax tree
sema: semantic analysis
ir: intermediate representation
opt: optimizer
codegen: code generation
target: target platform support

View File

@@ -0,0 +1,5 @@
[package]
name = "smcc_lex_parser"
version = "0.1.0"
dependencies = [{ name = "libcore", path = "../../runtime/libcore" }]

View File

@@ -0,0 +1,26 @@
#ifndef __SMCC_LEX_PARSER_H__
#define __SMCC_LEX_PARSER_H__
#include <libcore.h>
static inline cbool lex_parse_is_endline(int ch) {
return ch == '\n' || ch == '\r';
}
static inline cbool lex_parse_is_whitespace(int ch) {
return ch == ' ' || ch == '\t';
}
int lex_parse_char(scc_probe_stream_t *input, scc_pos_t *pos);
cbool lex_parse_string(scc_probe_stream_t *input, scc_pos_t *pos,
scc_cstring_t *output);
cbool lex_parse_number(scc_probe_stream_t *input, scc_pos_t *pos,
usize *output);
cbool lex_parse_identifier(scc_probe_stream_t *input, scc_pos_t *pos,
scc_cstring_t *output);
void lex_parse_skip_endline(scc_probe_stream_t *input, scc_pos_t *pos);
void lex_parse_skip_block_comment(scc_probe_stream_t *input, scc_pos_t *pos);
void lex_parse_skip_line(scc_probe_stream_t *input, scc_pos_t *pos);
void lex_parse_skip_whitespace(scc_probe_stream_t *input, scc_pos_t *pos);
#endif /* __SMCC_LEX_PARSER_H__ */

View File

@@ -0,0 +1,435 @@
#include <lex_parser.h>
void lex_parse_skip_endline(scc_probe_stream_t *input, scc_pos_t *pos) {
Assert(input != null && pos != null);
scc_probe_stream_reset(input);
int ch = scc_probe_stream_peek(input);
if (ch == '\r') {
scc_probe_stream_consume(input);
ch = scc_probe_stream_peek(input);
if (ch == '\n') {
scc_probe_stream_consume(input);
}
core_pos_next_line(pos);
} else if (ch == '\n') {
scc_probe_stream_consume(input);
core_pos_next_line(pos);
} else {
LOG_WARN("not a newline character");
}
}
/**
 * @brief Map a simple escape character to its byte value.
 *
 * @param ch the character following the backslash
 * @return int the escaped byte value, or -1 if ch is not a simple escape
 * https://cppreference.cn/w/c/language/escape
 * `\'` single quote, byte 0x27 in ASCII
 * `\"` double quote, byte 0x22 in ASCII
 * `\?` question mark, byte 0x3f in ASCII
 * `\\` backslash, byte 0x5c in ASCII
 * `\a` alert (bell), byte 0x07 in ASCII
 * `\b` backspace, byte 0x08 in ASCII
 * `\f` form feed (new page), byte 0x0c in ASCII
 * `\n` line feed (new line), byte 0x0a in ASCII
 * `\r` carriage return, byte 0x0d in ASCII
 * `\t` horizontal tab, byte 0x09 in ASCII
 * `\v` vertical tab, byte 0x0b in ASCII
 */
static inline int got_simple_escape(int ch) {
/* clang-format off */
switch (ch) {
case '\'': return '\'';
case '\"': return '\"';
case '\?': return '\?';
case '\\': return '\\';
case 'a': return '\a';
case 'b': return '\b';
case 'f': return '\f';
case 'n': return '\n';
case 'r': return '\r';
case 't': return '\t';
case 'v': return '\v';
default: return -1;
}
/* clang-format on */
}
void lex_parse_skip_line(scc_probe_stream_t *input, scc_pos_t *pos) {
scc_probe_stream_t *stream = input;
Assert(stream != null && pos != null);
scc_probe_stream_reset(stream);
while (1) {
int ch = scc_probe_stream_peek(stream);
if (ch == core_stream_eof) {
return;
}
// TODO endline
if (lex_parse_is_endline(ch)) {
lex_parse_skip_endline(stream, pos);
return;
} else {
scc_probe_stream_consume(stream);
core_pos_next(pos);
}
}
}
void lex_parse_skip_block_comment(scc_probe_stream_t *input, scc_pos_t *pos) {
scc_probe_stream_t *stream = input;
Assert(stream != null && pos != null);
int ch;
scc_probe_stream_reset(stream);
ch = scc_probe_stream_consume(stream);
core_pos_next(pos);
// FIXME Assertion
Assert(ch == '/');
ch = scc_probe_stream_consume(stream);
core_pos_next(pos);
Assert(ch == '*');
// already matched the opening `/*`
while (1) {
scc_probe_stream_reset(stream);
ch = scc_probe_stream_peek(stream);
if (ch == core_stream_eof) {
LOG_WARN("Unterminated block comment");
return;
}
if (lex_parse_is_endline(ch)) {
lex_parse_skip_endline(stream, pos);
continue;
}
scc_probe_stream_consume(stream);
core_pos_next(pos);
if (ch == '*') {
ch = scc_probe_stream_peek(stream);
if (ch == '/') {
scc_probe_stream_consume(stream);
core_pos_next(pos);
return;
}
}
}
}
void lex_parse_skip_whitespace(scc_probe_stream_t *input, scc_pos_t *pos) {
scc_probe_stream_t *stream = input;
Assert(stream != null && pos != null);
scc_probe_stream_reset(stream);
while (1) {
int ch = scc_probe_stream_peek(stream);
if (!lex_parse_is_whitespace(ch)) {
return;
}
scc_probe_stream_consume(stream);
core_pos_next(pos);
}
}
static inline cbool _lex_parse_uint(scc_probe_stream_t *input, scc_pos_t *pos,
int base, usize *output) {
Assert(input != null && pos != null);
if (input == null || pos == null) {
return false;
}
Assert(base == 2 || base == 8 || base == 10 || base == 16);
scc_probe_stream_reset(input);
int ch, tmp;
usize n = 0;
usize offset = pos->offset;
while (1) {
ch = scc_probe_stream_peek(input);
if (ch == core_stream_eof) {
break;
} else if (ch >= 'a' && ch <= 'z') {
tmp = ch - 'a' + 10;
} else if (ch >= 'A' && ch <= 'Z') {
tmp = ch - 'A' + 10;
} else if (ch >= '0' && ch <= '9') {
tmp = ch - '0';
} else {
break;
}
if (tmp >= base) {
LOG_ERROR("Invalid digit");
return false;
}
scc_probe_stream_consume(input);
core_pos_next(pos);
n = n * base + tmp;
// TODO number overflow
}
if (offset == pos->offset) {
// no digits were matched
return false;
}
*output = n;
return true;
}
/**
* @brief
*
* @param input
* @param pos
* @return int
* https://cppreference.cn/w/c/language/character_constant
*/
int lex_parse_char(scc_probe_stream_t *input, scc_pos_t *pos) {
scc_probe_stream_t *stream = input;
Assert(stream != null && pos != null);
scc_probe_stream_reset(stream);
int ch = scc_probe_stream_peek(stream);
int ret = core_stream_eof;
if (ch == core_stream_eof) {
LOG_WARN("Unexpected EOF at begin");
goto ERR;
} else if (ch != '\'') {
LOG_WARN("Unexpected character '%c' at begin", ch);
goto ERR;
}
scc_probe_stream_consume(stream);
core_pos_next(pos);
ch = scc_probe_stream_consume(stream);
core_pos_next(pos);
if (ch == core_stream_eof) {
LOG_WARN("Unexpected EOF at middle");
goto ERR;
} else if (ch == '\\') {
ch = scc_probe_stream_consume(stream);
core_pos_next(pos);
if (ch == '0') {
// numeric escape sequence
// \nnn: arbitrary octal value, code unit nnn
// FIXME: a result of 0 here is in principle an error,
// but happens to coincide with the correct value
usize oct = 0;
_lex_parse_uint(stream, pos, 8, &oct);
ret = (int)oct;
} else if (ch == 'x') {
// TODO https://cppreference.cn/w/c/language/escape
// \xn...: arbitrary hexadecimal value, code unit n... (any number of hex digits)
// universal character names
TODO();
} else if (ch == 'u' || ch == 'U') {
// \unnnn (since C99): Unicode value in the allowed range;
// may produce several code units; code point U+nnnn
// \Unnnnnnnn (since C99): Unicode value in the allowed range;
// may produce several code units; code point U+nnnnnnnn
TODO();
} else if ((ret = got_simple_escape(ch)) == -1) {
LOG_ERROR("Invalid escape character");
goto ERR;
}
} else {
ret = ch;
}
if ((ch = scc_probe_stream_consume(stream)) != '\'') {
LOG_ERROR("Unclosed character literal '%c' at end, expect `'`", ch);
core_pos_next(pos);
goto ERR;
}
return ret;
ERR:
return core_stream_eof;
}
/**
* @brief
*
* @param input
* @param pos
* @param output
* @return cbool
* https://cppreference.cn/w/c/language/string_literal
*/
cbool lex_parse_string(scc_probe_stream_t *input, scc_pos_t *pos,
scc_cstring_t *output) {
scc_probe_stream_t *stream = input;
Assert(stream != null && pos != null && output != null);
scc_probe_stream_reset(stream);
// initialize str before any goto ERR, so scc_cstring_free never
// sees an uninitialized string
scc_cstring_t str = scc_cstring_from_cstr("");
int ch = scc_probe_stream_peek(stream);
Assert(scc_cstring_is_empty(output));
if (ch == core_stream_eof) {
LOG_WARN("Unexpected EOF at begin");
goto ERR;
} else if (ch != '"') {
LOG_WARN("Unexpected character '%c' at begin", ch);
goto ERR;
}
scc_probe_stream_consume(stream);
core_pos_next(pos);
while (1) {
ch = scc_probe_stream_peek(stream);
if (ch == core_stream_eof) {
LOG_ERROR("Unexpected EOF at string literal");
goto ERR;
} else if (lex_parse_is_endline(ch)) {
LOG_ERROR("Unexpected newline at string literal");
goto ERR;
} else if (ch == '\\') {
// TODO bad practice and maybe bugs here
scc_probe_stream_consume(stream);
ch = scc_probe_stream_consume(stream);
int val = got_simple_escape(ch);
if (val == -1) {
LOG_ERROR("Invalid escape character '\\%c' [%d]", ch, ch);
goto ERR;
}
scc_cstring_append_ch(&str, val);
continue;
} else if (ch == '"') {
scc_probe_stream_consume(stream);
core_pos_next(pos);
break;
}
scc_probe_stream_consume(stream);
core_pos_next(pos);
scc_cstring_append_ch(&str, ch);
}
*output = str;
return true;
ERR:
scc_cstring_free(&str);
return false;
}
/**
 * @brief Parse an integer constant (decimal, octal, hex, or binary)
 *
 * @param input probe stream positioned at the first digit
 * @param pos source position, advanced as characters are consumed
 * @param output receives the parsed value
 * @return cbool true on success, false on error
 * https://cppreference.cn/w/c/language/integer_constant
 */
cbool lex_parse_number(scc_probe_stream_t *input, scc_pos_t *pos,
usize *output) {
scc_probe_stream_t *stream = input;
Assert(stream != null && pos != null && output != null);
scc_probe_stream_reset(stream);
int ch = scc_probe_stream_peek(stream);
int base = 10; // default: decimal
if (ch == core_stream_eof) {
LOG_WARN("Unexpected EOF at begin");
goto ERR;
}
if (ch == '0') {
// consume '0'
scc_probe_stream_consume(stream);
core_pos_next(pos);
// peek at the next character
ch = scc_probe_stream_peek(stream);
if (ch == 'x' || ch == 'X') {
// hexadecimal
base = 16;
scc_probe_stream_consume(stream);
core_pos_next(pos);
} else if (ch == 'b' || ch == 'B') {
// binary (C23 extension)
base = 2;
scc_probe_stream_consume(stream);
core_pos_next(pos);
} else if (ch >= '0' && ch <= '7') {
// octal
base = 8;
// do not consume; the digits are handled by _lex_parse_uint
} else if (ch == '8' || ch == '9') {
LOG_ERROR("Invalid digit '%c' in octal literal", ch);
return false;
} else {
// just a plain decimal 0
*output = 0;
return true;
}
} else if (ch >= '1' && ch <= '9') {
// decimal; do not consume, handled by _lex_parse_uint
base = 10;
} else {
// not a valid number
return false;
}
// parse the integer part
scc_probe_stream_reset(stream);
usize n;
if (_lex_parse_uint(stream, pos, base, &n) == false) {
// no digits matched; a bare '0' input was already handled above
// a decimal number must contain at least one digit
if (base == 10) {
// single-digit case, e.g. "1":
// consume the digit and return its value
if (ch >= '1' && ch <= '9') {
scc_probe_stream_consume(stream);
core_pos_next(pos);
*output = ch - '0';
return true;
}
}
return false;
}
*output = n;
return true;
ERR:
return false;
}
/**
 * @brief Parse an identifier ([_a-zA-Z][_a-zA-Z0-9]*)
 *
 * @param input probe stream positioned at the first character
 * @param pos source position, advanced as characters are consumed
 * @param output receives the identifier text; must be empty on entry
 * @return cbool true on success, false on error
 * https://cppreference.cn/w/c/language/identifier
 */
cbool lex_parse_identifier(scc_probe_stream_t *input, scc_pos_t *pos,
scc_cstring_t *output) {
Assert(input != null && pos != null && output != null);
Assert(scc_cstring_is_empty(output));
scc_probe_stream_t *stream = input;
scc_probe_stream_reset(stream);
int ch = scc_probe_stream_peek(stream);
if (ch == core_stream_eof) {
LOG_WARN("Unexpected EOF at begin");
} else if (ch == '_' || (ch >= 'a' && ch <= 'z') ||
(ch >= 'A' && ch <= 'Z')) {
while (1) {
scc_cstring_append_ch(output, ch);
scc_probe_stream_consume(stream);
core_pos_next(pos);
ch = scc_probe_stream_peek(stream);
if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
(ch == '_') || (ch >= '0' && ch <= '9')) {
continue;
}
break;
}
return true;
}
return false;
}


@@ -0,0 +1,60 @@
// test_char.c
#include <lex_parser.h>
#include <utest/acutest.h>
cbool check_char(const char *str, int expect, int *output) {
log_set_level(&__default_logger_root, 0);
scc_pos_t pos = scc_pos_init();
scc_mem_probe_stream_t mem_stream;
scc_probe_stream_t *stream =
scc_mem_probe_stream_init(&mem_stream, str, scc_strlen(str), false);
*output = lex_parse_char(stream, &pos);
return *output == expect;
}
#define CHECK_CHAR_VALID(str, expect) \
do { \
int _output; \
cbool ret = check_char(str, expect, &_output); \
TEST_CHECK(ret == true); \
} while (0)
#define CHECK_CHAR_INVALID(str) \
do { \
int _output; \
check_char(str, core_stream_eof, &_output); \
TEST_CHECK(_output == core_stream_eof); \
} while (0)
void test_simple_char(void) {
TEST_CASE("simple chars");
CHECK_CHAR_VALID("'a'", 'a');
CHECK_CHAR_VALID("'Z'", 'Z');
CHECK_CHAR_VALID("'0'", '0');
CHECK_CHAR_VALID("' '", ' ');
}
void test_escape_char(void) {
TEST_CASE("escape chars");
CHECK_CHAR_VALID("'\\n'", '\n');
CHECK_CHAR_VALID("'\\t'", '\t');
CHECK_CHAR_VALID("'\\r'", '\r');
CHECK_CHAR_VALID("'\\\\'", '\\');
CHECK_CHAR_VALID("'\\''", '\'');
CHECK_CHAR_VALID("'\\\"'", '\"');
}
void test_invalid_char(void) {
TEST_CASE("invalid chars");
CHECK_CHAR_INVALID("'");
CHECK_CHAR_INVALID("''");
CHECK_CHAR_INVALID("'ab'");
CHECK_CHAR_INVALID("'\\'");
}
TEST_LIST = {
{"test_simple_char", test_simple_char},
{"test_escape_char", test_escape_char},
{"test_invalid_char", test_invalid_char},
{NULL, NULL},
};


@@ -0,0 +1,56 @@
// test_identifier.c
#include <lex_parser.h>
#include <utest/acutest.h>
cbool check_identifier(const char *str, const char *expect,
scc_cstring_t *output) {
log_set_level(&__default_logger_root, 0);
scc_pos_t pos = scc_pos_init();
scc_mem_probe_stream_t mem_stream;
scc_probe_stream_t *stream =
scc_mem_probe_stream_init(&mem_stream, str, scc_strlen(str), false);
cbool ret = lex_parse_identifier(stream, &pos, output);
if (ret && expect) {
return strcmp(output->data, expect) == 0;
}
return ret;
}
#define CHECK_IDENTIFIER_VALID(str, expect) \
do { \
scc_cstring_t _output = scc_cstring_new(); \
cbool ret = check_identifier(str, expect, &_output); \
TEST_CHECK(ret == true); \
TEST_CHECK(strcmp(_output.data, expect) == 0); \
scc_cstring_free(&_output); \
} while (0)
#define CHECK_IDENTIFIER_INVALID(str) \
do { \
scc_cstring_t _output = scc_cstring_new(); \
cbool ret = check_identifier(str, NULL, &_output); \
TEST_CHECK(ret == false); \
scc_cstring_free(&_output); \
} while (0)
void test_valid_identifier(void) {
TEST_CASE("valid identifiers");
CHECK_IDENTIFIER_VALID("variable", "variable");
CHECK_IDENTIFIER_VALID("my_var", "my_var");
CHECK_IDENTIFIER_VALID("_private", "_private");
CHECK_IDENTIFIER_VALID("Var123", "Var123");
CHECK_IDENTIFIER_VALID("a", "a");
}
void test_invalid_identifier(void) {
TEST_CASE("invalid identifiers");
CHECK_IDENTIFIER_INVALID("");
CHECK_IDENTIFIER_INVALID("123var");
}
TEST_LIST = {
{"test_valid_identifier", test_valid_identifier},
{"test_invalid_identifier", test_invalid_identifier},
{NULL, NULL},
};


@@ -0,0 +1,135 @@
#include <lex_parser.h>
#include <utest/acutest.h>
cbool check(const char *str, usize expect, usize *output) {
// TODO maybe have other logger
(void)(expect);
log_set_level(&__default_logger_root, 0);
scc_pos_t pos = scc_pos_init();
scc_mem_probe_stream_t mem_stream;
scc_probe_stream_t *stream =
scc_mem_probe_stream_init(&mem_stream, str, scc_strlen(str), false);
return lex_parse_number(stream, &pos, output);
}
#define CHECK_VALID(str, expect) \
do { \
usize _output; \
cbool ret = check(str, expect, &_output); \
TEST_CHECK(ret == true); \
TEST_CHECK(_output == expect); \
TEST_MSG("Produced: %llu", _output); \
} while (0)
#define CHECK_INVALID(str) \
do { \
usize _output; \
cbool ret = check(str, 0, &_output); \
TEST_CHECK(ret == false); \
} while (0)
void test_simple_hex(void) {
TEST_CASE("lowercase hex");
CHECK_VALID("0xff", 255);
CHECK_VALID("0x0", 0);
CHECK_VALID("0xa", 10);
CHECK_VALID("0xf", 15);
CHECK_VALID("0x1a", 26);
TEST_CASE("uppercase hex");
CHECK_VALID("0xFF", 255);
CHECK_VALID("0xA0", 160);
CHECK_VALID("0xCAFEBABE", 3405691582);
TEST_CASE("mixed case hex");
CHECK_VALID("0xFf", 255);
CHECK_VALID("0xCaFeBaBe", 3405691582);
TEST_CASE("larger hex values");
CHECK_VALID("0xff00", 65280);
CHECK_VALID("0xFFFF", 65535);
TEST_CASE("invalid hex");
CHECK_INVALID("0xG"); // Invalid hex digit
CHECK_INVALID("0xyz"); // Invalid prefix
CHECK_INVALID("0x"); // Incomplete hex
}
void test_simple_oct(void) {
TEST_CASE("basic octal");
CHECK_VALID("00", 0);
CHECK_VALID("01", 1);
CHECK_VALID("07", 7);
TEST_CASE("multi-digit octal");
CHECK_VALID("010", 8);
CHECK_VALID("017", 15);
CHECK_VALID("077", 63);
TEST_CASE("larger octal values");
CHECK_VALID("0177", 127);
CHECK_VALID("0377", 255);
CHECK_VALID("0777", 511);
TEST_CASE("invalid octal");
CHECK_INVALID("08"); // Invalid octal digit
CHECK_INVALID("09"); // Invalid octal digit
}
void test_simple_dec(void) {
TEST_CASE("single digits");
CHECK_VALID("0", 0);
CHECK_VALID("1", 1);
CHECK_VALID("9", 9);
TEST_CASE("multi-digit decimal");
CHECK_VALID("10", 10);
CHECK_VALID("42", 42);
CHECK_VALID("123", 123);
TEST_CASE("larger decimal values");
CHECK_VALID("999", 999);
CHECK_VALID("1234", 1234);
CHECK_VALID("65535", 65535);
}
void test_simple_bin(void) {
TEST_CASE("basic binary");
CHECK_VALID("0b0", 0);
CHECK_VALID("0b1", 1);
TEST_CASE("multi-digit binary");
CHECK_VALID("0b10", 2);
CHECK_VALID("0b11", 3);
CHECK_VALID("0b100", 4);
CHECK_VALID("0b1010", 10);
TEST_CASE("larger binary values");
CHECK_VALID("0b1111", 15);
CHECK_VALID("0b11111111", 255);
CHECK_VALID("0b10101010", 170);
TEST_CASE("invalid binary");
CHECK_INVALID("0b2"); // Invalid binary digit
CHECK_INVALID("0b3"); // Invalid binary digit
CHECK_INVALID("0b"); // Incomplete binary
}
void test_edge_cases(void) {
TEST_CASE("empty string");
CHECK_INVALID(""); // Empty string
TEST_CASE("non-numeric strings");
CHECK_INVALID("abc"); // Non-numeric
CHECK_INVALID("xyz"); // Non-numeric
TEST_CASE("mixed invalid formats");
CHECK_INVALID("0x1G"); // Mixed valid/invalid hex
CHECK_INVALID("0b12"); // Mixed valid/invalid binary
}
TEST_LIST = {
{"test_simple_hex", test_simple_hex}, {"test_simple_oct", test_simple_oct},
{"test_simple_dec", test_simple_dec}, {"test_simple_bin", test_simple_bin},
{"test_edge_cases", test_edge_cases}, {NULL, NULL},
};


@@ -0,0 +1,51 @@
// test_skip_block_comment.c
#include <lex_parser.h>
#include <utest/acutest.h>
void check_skip_block_comment(const char *str, const char *expect_remaining) {
log_set_level(&__default_logger_root, 0);
scc_pos_t pos = scc_pos_init();
scc_mem_probe_stream_t mem_stream;
scc_probe_stream_t *stream =
scc_mem_probe_stream_init(&mem_stream, str, scc_strlen(str), false);
lex_parse_skip_block_comment(stream, &pos);
// Check remaining content
char buffer[256] = {0};
int i = 0;
int ch;
while ((ch = scc_probe_stream_consume(stream)) != core_stream_eof &&
i < 255) {
buffer[i++] = (char)ch;
}
if (expect_remaining) {
TEST_CHECK(strcmp(buffer, expect_remaining) == 0);
}
}
void test_simple_block_comment(void) {
TEST_CASE("simple block comments");
check_skip_block_comment("/* comment */", "");
check_skip_block_comment("/* comment */ int x;", " int x;");
}
void test_multiline_block_comment(void) {
TEST_CASE("multiline block comments");
check_skip_block_comment("/* line1\nline2 */", "");
check_skip_block_comment("/* line1\nline2 */ int x;", " int x;");
}
void test_nested_asterisk_block_comment(void) {
TEST_CASE("nested asterisk block comments");
check_skip_block_comment("/* *** */", "");
check_skip_block_comment("/* *** */ int x;", " int x;");
}
TEST_LIST = {
{"test_simple_block_comment", test_simple_block_comment},
{"test_multiline_block_comment", test_multiline_block_comment},
{"test_nested_asterisk_block_comment", test_nested_asterisk_block_comment},
{NULL, NULL},
};


@@ -0,0 +1,50 @@
// test_skip_line.c
#include <lex_parser.h>
#include <utest/acutest.h>
void check_skip_line(const char *str, const char *expect_remaining) {
log_set_level(&__default_logger_root, 0);
scc_pos_t pos = scc_pos_init();
scc_mem_probe_stream_t mem_stream;
scc_probe_stream_t *stream =
scc_mem_probe_stream_init(&mem_stream, str, scc_strlen(str), false);
lex_parse_skip_line(stream, &pos);
// Check remaining content
char buffer[256] = {0};
int i = 0;
int ch;
while ((ch = scc_probe_stream_consume(stream)) != core_stream_eof &&
i < 255) {
buffer[i++] = (char)ch;
}
if (expect_remaining) {
TEST_CHECK(strcmp(buffer, expect_remaining) == 0);
}
}
void test_simple_line_comment(void) {
TEST_CASE("simple line comments");
check_skip_line("// comment\n", "");
check_skip_line("// comment\nint x;", "int x;");
}
void test_crlf_line_comment(void) {
TEST_CASE("CRLF line comments");
check_skip_line("// comment\r\n", "");
check_skip_line("// comment\r\nint x;", "int x;");
}
void test_eof_line_comment(void) {
TEST_CASE("EOF line comments");
check_skip_line("// comment", "");
}
TEST_LIST = {
{"test_simple_line_comment", test_simple_line_comment},
{"test_crlf_line_comment", test_crlf_line_comment},
{"test_eof_line_comment", test_eof_line_comment},
{NULL, NULL},
};


@@ -0,0 +1,62 @@
// test_string.c
#include <lex_parser.h>
#include <utest/acutest.h>
cbool check_string(const char *str, const char *expect, scc_cstring_t *output) {
log_set_level(&__default_logger_root, 0);
scc_pos_t pos = scc_pos_init();
scc_mem_probe_stream_t mem_stream;
scc_probe_stream_t *stream =
scc_mem_probe_stream_init(&mem_stream, str, scc_strlen(str), false);
cbool ret = lex_parse_string(stream, &pos, output);
if (ret && expect) {
return strcmp(output->data, expect) == 0;
}
return ret;
}
#define CHECK_STRING_VALID(str, expect) \
do { \
scc_cstring_t _output = scc_cstring_new(); \
cbool ret = check_string(str, expect, &_output); \
TEST_CHECK(ret == true); \
TEST_CHECK(strcmp(_output.data, expect) == 0); \
scc_cstring_free(&_output); \
} while (0)
#define CHECK_STRING_INVALID(str) \
do { \
scc_cstring_t _output = scc_cstring_new(); \
cbool ret = check_string(str, NULL, &_output); \
TEST_CHECK(ret == false); \
scc_cstring_free(&_output); \
} while (0)
void test_simple_string(void) {
TEST_CASE("simple strings");
CHECK_STRING_VALID("\"\"", "");
CHECK_STRING_VALID("\"hello\"", "hello");
CHECK_STRING_VALID("\"hello world\"", "hello world");
}
void test_escape_string(void) {
TEST_CASE("escape strings");
CHECK_STRING_VALID("\"\\n\"", "\n");
CHECK_STRING_VALID("\"\\t\"", "\t");
CHECK_STRING_VALID("\"\\\"\"", "\"");
CHECK_STRING_VALID("\"Hello\\nWorld\"", "Hello\nWorld");
}
void test_invalid_string(void) {
TEST_CASE("invalid strings");
CHECK_STRING_INVALID("\"unterminated");
CHECK_STRING_INVALID("\"newline\n\"");
}
TEST_LIST = {
{"test_simple_string", test_simple_string},
{"test_escape_string", test_escape_string},
{"test_invalid_string", test_invalid_string},
{NULL, NULL},
};

libs/lexer/cbuild.toml Normal file

@@ -0,0 +1,8 @@
[package]
name = "smcc_lex"
version = "0.1.0"
dependencies = [
{ name = "libcore", path = "../../runtime/libcore" },
{ name = "smcc_lex_parser", path = "../lex_parser" },
]


@@ -0,0 +1,53 @@
/**
* @file lexer.h
* @brief Core data structures and interface for the C lexer
*/
#ifndef __SCC_LEXER_H__
#define __SCC_LEXER_H__
#include "lexer_token.h"
#include <libcore.h>
typedef struct lexer_token {
scc_tok_type_t type;
scc_cvalue_t value;
scc_pos_t loc;
} lexer_tok_t;
/**
 * @brief Core lexer state
 *
 * Encapsulates the state and buffer management needed for lexical analysis
 */
typedef struct cc_lexer {
scc_probe_stream_t *stream;
scc_pos_t pos;
} scc_lexer_t;
/**
 * @brief Initialize the lexer
 * @param[out] lexer lexer instance to initialize
 * @param[in] stream pointer to the input stream
 */
void scc_lexer_init(scc_lexer_t *lexer, scc_probe_stream_t *stream);
/**
 * @brief Get the next raw token
 * @param[in] lexer lexer instance
 * @param[out] token where the token is stored
 *
 * Returns tokens of all kinds, including whitespace and other tokens
 * that carry no meaning for the parser
 */
void scc_lexer_get_token(scc_lexer_t *lexer, lexer_tok_t *token);
/**
 * @brief Get the next valid token
 * @param[in] lexer lexer instance
 * @param[out] token where the token is stored
 *
 * Automatically skips whitespace, comments, and other irrelevant tokens,
 * returning only tokens meaningful to the parser
 */
void scc_lexer_get_valid_token(scc_lexer_t *lexer, lexer_tok_t *token);
#endif /* __SCC_LEXER_H__ */


@@ -0,0 +1,48 @@
#ifndef __SMCC_LEXER_LOG_H__
#define __SMCC_LEXER_LOG_H__
#include <libcore.h>
#ifndef LEX_LOG_LEVEL
#define LEX_LOG_LEVEL 4
#endif
#if LEX_LOG_LEVEL <= 1
#define LEX_NOTSET(fmt, ...) MLOG_NOTSET(&__smcc_lexer_log, fmt, ##__VA_ARGS__)
#else
#define LEX_NOTSET(fmt, ...)
#endif
#if LEX_LOG_LEVEL <= 2
#define LEX_DEBUG(fmt, ...) MLOG_DEBUG(&__smcc_lexer_log, fmt, ##__VA_ARGS__)
#else
#define LEX_DEBUG(fmt, ...)
#endif
#if LEX_LOG_LEVEL <= 3
#define LEX_INFO(fmt, ...) MLOG_INFO(&__smcc_lexer_log, fmt, ##__VA_ARGS__)
#else
#define LEX_INFO(fmt, ...)
#endif
#if LEX_LOG_LEVEL <= 4
#define LEX_WARN(fmt, ...) MLOG_WARN(&__smcc_lexer_log, fmt, ##__VA_ARGS__)
#else
#define LEX_WARN(fmt, ...)
#endif
#if LEX_LOG_LEVEL <= 5
#define LEX_ERROR(fmt, ...) MLOG_ERROR(&__smcc_lexer_log, fmt, ##__VA_ARGS__)
#else
#define LEX_ERROR(fmt, ...)
#endif
#if LEX_LOG_LEVEL <= 6
#define LEX_FATAL(fmt, ...) MLOG_FATAL(&__smcc_lexer_log, fmt, ##__VA_ARGS__)
#else
#define LEX_FATAL(fmt, ...)
#endif
extern logger_t __smcc_lexer_log;
#endif // __SMCC_LEXER_LOG_H__


@@ -0,0 +1,140 @@
#ifndef __SMCC_CC_TOKEN_H__
#define __SMCC_CC_TOKEN_H__
#include <libcore.h>
typedef enum scc_cstd {
SCC_CSTD_C89,
SCC_CSTD_C99,
SCC_CEXT_ASM,
} scc_cstd_t;
/* clang-format off */
// WARNING: binary search is used for fast keyword lookup,
// so the entries below MUST stay in lexicographic order
#define SCC_CKEYWORD_TABLE \
X(asm , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_ASM , SCC_CEXT_ASM) \
X(break , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_BREAK , SCC_CSTD_C89) \
X(case , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_CASE , SCC_CSTD_C89) \
X(char , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_CHAR , SCC_CSTD_C89) \
X(const , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_CONST , SCC_CSTD_C89) \
X(continue , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_CONTINUE , SCC_CSTD_C89) \
X(default , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_DEFAULT , SCC_CSTD_C89) \
X(do , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_DO , SCC_CSTD_C89) \
X(double , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_DOUBLE , SCC_CSTD_C89) \
X(else , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_ELSE , SCC_CSTD_C89) \
X(enum , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_ENUM , SCC_CSTD_C89) \
X(extern , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_EXTERN , SCC_CSTD_C89) \
X(float , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_FLOAT , SCC_CSTD_C89) \
X(for , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_FOR , SCC_CSTD_C89) \
X(goto , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_GOTO , SCC_CSTD_C89) \
X(if , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_IF , SCC_CSTD_C89) \
X(inline , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_INLINE , SCC_CSTD_C99) \
X(int , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_INT , SCC_CSTD_C89) \
X(long , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_LONG , SCC_CSTD_C89) \
X(register , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_REGISTER , SCC_CSTD_C89) \
X(restrict , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_RESTRICT , SCC_CSTD_C99) \
X(return , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_RETURN , SCC_CSTD_C89) \
X(short , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_SHORT , SCC_CSTD_C89) \
X(signed , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_SIGNED , SCC_CSTD_C89) \
X(sizeof , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_SIZEOF , SCC_CSTD_C89) \
X(static , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_STATIC , SCC_CSTD_C89) \
X(struct , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_STRUCT , SCC_CSTD_C89) \
X(switch , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_SWITCH , SCC_CSTD_C89) \
X(typedef , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_TYPEDEF , SCC_CSTD_C89) \
X(union , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_UNION , SCC_CSTD_C89) \
X(unsigned , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_UNSIGNED , SCC_CSTD_C89) \
X(void , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_VOID , SCC_CSTD_C89) \
X(volatile , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_VOLATILE , SCC_CSTD_C89) \
X(while , SCC_TOK_SUBTYPE_KEYWORD , SCC_TOK_WHILE , SCC_CSTD_C89) \
// KEYWORD_TABLE
#define SCC_CTOK_TABLE \
X(unknown , SCC_TOK_SUBTYPE_INVALID, SCC_TOK_UNKNOWN ) \
X(EOF , SCC_TOK_SUBTYPE_EOF, SCC_TOK_EOF ) \
X(blank , SCC_TOK_SUBTYPE_EMPTYSPACE, SCC_TOK_BLANK ) \
X("==" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_EQ ) \
X("=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN ) \
X("++" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ADD_ADD ) \
X("+=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_ADD ) \
X("+" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ADD ) \
X("--" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_SUB_SUB ) \
X("-=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_SUB ) \
X("->" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_DEREF ) \
X("-" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_SUB ) \
X("*=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_MUL ) \
X("*" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_MUL ) \
X("/=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_DIV ) \
X("/" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_DIV ) \
X("//" , SCC_TOK_SUBTYPE_COMMENT , SCC_TOK_LINE_COMMENT ) \
X("/* */" , SCC_TOK_SUBTYPE_COMMENT , SCC_TOK_BLOCK_COMMENT ) \
X("%=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_MOD ) \
X("%" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_MOD ) \
X("&&" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_AND_AND ) \
X("&=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_AND ) \
X("&" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_AND ) \
X("||" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_OR_OR ) \
X("|=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_OR ) \
X("|" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_OR ) \
X("^=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_XOR ) \
X("^" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_XOR ) \
X("<<=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_L_SH ) \
X("<<" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_L_SH ) \
X("<=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_LE ) \
X("<" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_LT ) \
X(">>=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ASSIGN_R_SH ) \
X(">>" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_R_SH ) \
X(">=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_GE ) \
X(">" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_GT ) \
X("!" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_NOT ) \
X("!=" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_NEQ ) \
X("~" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_BIT_NOT ) \
X("[" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_L_BRACKET ) \
X("]" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_R_BRACKET ) \
X("(" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_L_PAREN ) \
X(")" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_R_PAREN ) \
X("{" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_L_BRACE ) \
X("}" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_R_BRACE ) \
X(";" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_SEMICOLON ) \
X("," , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_COMMA ) \
X(":" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_COLON ) \
X("." , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_DOT ) \
X("..." , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_ELLIPSIS ) \
X("?" , SCC_TOK_SUBTYPE_OPERATOR, SCC_TOK_COND ) \
X(ident , SCC_TOK_SUBTYPE_IDENTIFIER, SCC_TOK_IDENT ) \
X(int_literal , SCC_TOK_SUBTYPE_LITERAL, SCC_TOK_INT_LITERAL ) \
X(float_literal , SCC_TOK_SUBTYPE_LITERAL, SCC_TOK_FLOAT_LITERAL ) \
X(char_literal , SCC_TOK_SUBTYPE_LITERAL, SCC_TOK_CHAR_LITERAL ) \
X(string_literal , SCC_TOK_SUBTYPE_LITERAL, SCC_TOK_STRING_LITERAL ) \
// END
/* clang-format on */
// define the TokenType enum
typedef enum scc_tok_type {
// ordinary tokens
#define X(str, subtype, tok) tok,
SCC_CTOK_TABLE
#undef X
// keywords (keeping the original format)
#define X(name, subtype, tok, std) tok,
SCC_CKEYWORD_TABLE
#undef X
} scc_tok_type_t;
typedef enum scc_tok_subtype {
SCC_TOK_SUBTYPE_INVALID, // error placeholder
SCC_TOK_SUBTYPE_KEYWORD, // keyword
SCC_TOK_SUBTYPE_OPERATOR, // operator
SCC_TOK_SUBTYPE_IDENTIFIER, // identifier
SCC_TOK_SUBTYPE_LITERAL, // literal
SCC_TOK_SUBTYPE_EMPTYSPACE, // whitespace
SCC_TOK_SUBTYPE_COMMENT, // comment
SCC_TOK_SUBTYPE_EOF // end-of-input marker
} scc_tok_subtype_t;
scc_tok_subtype_t scc_get_tok_subtype(scc_tok_type_t type);
const char *scc_get_tok_name(scc_tok_type_t type);
#endif

libs/lexer/src/lexer.c Normal file

@@ -0,0 +1,482 @@
/**
 * Modeled on the lexical-analysis part of LCC
 *
 * Below is LCC's README as of 2025-02:
This hierarchy is the distribution for lcc version 4.2.
lcc version 3.x is described in the book "A Retargetable C Compiler:
Design and Implementation" (Addison-Wesley, 1995, ISBN 0-8053-1670-1).
There are significant differences between 3.x and 4.x, most notably in
the intermediate code. For details, see
https://drh.github.io/lcc/documents/interface4.pdf.
VERSION 4.2 IS INCOMPATIBLE WITH EARLIER VERSIONS OF LCC. DO NOT
UNLOAD THIS DISTRIBUTION ON TOP OF A 3.X DISTRIBUTION.
LCC is a C89 ("ANSI C") compiler designed to be highly retargetable.
LOG describes the changes since the last release.
CPYRIGHT describes the conditions under which you can use, copy, modify,
and distribute lcc or works derived from lcc.
doc/install.html is an HTML file that gives a complete description of
the distribution and installation instructions.
Chris Fraser / cwf@aya.yale.edu
David Hanson / drh@drhanson.net
*/
#include <lex_parser.h>
#include <lexer.h>
#include <lexer_log.h>
static const struct {
const char *name;
scc_cstd_t std_type;
scc_tok_type_t tok;
} keywords[] = {
#define X(name, subtype, tok, std_type, ...) {#name, std_type, tok},
SCC_CKEYWORD_TABLE
#undef X
};
// by using binary search to find the keyword
static inline int keyword_cmp(const char *name, int len) {
int low = 0;
int high = sizeof(keywords) / sizeof(keywords[0]) - 1;
while (low <= high) {
int mid = (low + high) / 2;
const char *key = keywords[mid].name;
int cmp = 0;
// custom string-comparison logic
for (int i = 0; i < len; i++) {
if (name[i] != key[i]) {
cmp = (unsigned char)name[i] - (unsigned char)key[i];
break;
}
if (name[i] == '\0')
break; // stop early at the terminator
}
if (cmp == 0) {
// exact-match check (same length)
if (key[len] == '\0')
return mid;
cmp = -1; // the current keyword is longer than the input
}
if (cmp < 0) {
high = mid - 1;
} else {
low = mid + 1;
}
}
return -1; // Not a keyword.
}
void scc_lexer_init(scc_lexer_t *lexer, scc_probe_stream_t *stream) {
lexer->stream = stream;
lexer->pos = scc_pos_init();
// FIXME
lexer->pos.name = scc_cstring_from_cstr(scc_cstring_as_cstr(&stream->name));
}
#define set_err_token(token) ((token)->type = SCC_TOK_UNKNOWN)
static void parse_line(scc_lexer_t *lexer, lexer_tok_t *token) {
token->loc = lexer->pos;
scc_probe_stream_t *stream = lexer->stream;
scc_probe_stream_reset(stream);
int ch = scc_probe_stream_next(stream);
usize n;
scc_cstring_t str = scc_cstring_new();
if (ch == core_stream_eof) {
LEX_WARN("Unexpected EOF at begin");
goto ERR;
} else if (ch != '#') {
LEX_WARN("Unexpected character '%c' at begin", ch);
goto ERR;
}
const char line[] = "line";
for (int i = 0; i < (int)sizeof(line) - 1; i++) { // exclude the trailing '\0'
ch = scc_probe_stream_consume(stream);
core_pos_next(&lexer->pos);
if (ch != line[i]) {
LEX_WARN("Macros are not supported in the lexer (they belong to the "
"preprocessor); this directive will be ignored");
goto SKIP_LINE;
}
}
if (lex_parse_number(lexer->stream, &lexer->pos, &n) == false) {
LEX_ERROR("Invalid line number");
goto SKIP_LINE;
}
if (scc_probe_stream_consume(stream) != ' ') {
lex_parse_skip_line(lexer->stream, &lexer->pos);
token->loc.line = n;
}
if (scc_probe_stream_next(stream) != '"') {
LEX_ERROR("Invalid `#` line");
goto SKIP_LINE;
}
if (lex_parse_string(lexer->stream, &lexer->pos, &str) == false) {
LEX_ERROR("Invalid filename");
goto SKIP_LINE;
}
lex_parse_skip_line(lexer->stream, &lexer->pos);
token->loc.line = n;
// FIXME memory leak
token->loc.name = scc_cstring_from_cstr(scc_cstring_as_cstr(&str));
scc_cstring_free(&str);
return;
SKIP_LINE:
lex_parse_skip_line(lexer->stream, &lexer->pos);
ERR:
set_err_token(token);
scc_cstring_free(&str);
}
// /zh/c/language/operator_arithmetic.html
void scc_lexer_get_token(scc_lexer_t *lexer, lexer_tok_t *token) {
token->loc = lexer->pos;
token->type = SCC_TOK_UNKNOWN;
scc_probe_stream_t *stream = lexer->stream;
scc_probe_stream_reset(stream);
scc_tok_type_t type = SCC_TOK_UNKNOWN;
int ch = scc_probe_stream_next(stream);
// once step
switch (ch) {
case '=':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_EQ;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_ASSIGN;
break;
}
break;
case '+':
switch (scc_probe_stream_next(stream)) {
case '+':
type = SCC_TOK_ADD_ADD;
goto double_char;
case '=':
type = SCC_TOK_ASSIGN_ADD;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_ADD;
break;
}
break;
case '-':
switch (scc_probe_stream_next(stream)) {
case '-':
type = SCC_TOK_SUB_SUB;
goto double_char;
case '=':
type = SCC_TOK_ASSIGN_SUB;
goto double_char;
case '>':
type = SCC_TOK_DEREF;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_SUB;
break;
}
break;
case '*':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_ASSIGN_MUL;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_MUL;
break;
}
break;
case '/':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_ASSIGN_DIV;
goto double_char;
case '/':
lex_parse_skip_line(lexer->stream, &lexer->pos);
token->type = SCC_TOK_LINE_COMMENT;
goto END;
case '*':
lex_parse_skip_block_comment(lexer->stream, &lexer->pos);
token->type = SCC_TOK_BLOCK_COMMENT;
goto END;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_DIV;
break;
}
break;
case '%':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_ASSIGN_MOD;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_MOD;
break;
}
break;
case '&':
switch (scc_probe_stream_next(stream)) {
case '&':
type = SCC_TOK_AND_AND;
goto double_char;
case '=':
type = SCC_TOK_ASSIGN_AND;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_AND;
break;
}
break;
case '|':
switch (scc_probe_stream_next(stream)) {
case '|':
type = SCC_TOK_OR_OR;
goto double_char;
case '=':
type = SCC_TOK_ASSIGN_OR;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_OR;
break;
}
break;
case '^':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_ASSIGN_XOR;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_XOR;
break;
}
break;
case '<':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_LE;
goto double_char;
case '<': {
if (scc_probe_stream_next(stream) == '=') {
type = SCC_TOK_ASSIGN_L_SH;
goto triple_char;
} else {
type = SCC_TOK_L_SH;
goto double_char;
}
break;
}
default:
scc_probe_stream_reset(stream), type = SCC_TOK_LT;
break;
}
break;
case '>':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_GE;
goto double_char;
case '>': {
if (scc_probe_stream_next(stream) == '=') {
type = SCC_TOK_ASSIGN_R_SH;
goto triple_char;
} else {
type = SCC_TOK_R_SH;
goto double_char;
}
break;
}
default:
scc_probe_stream_reset(stream), type = SCC_TOK_GT;
break;
}
break;
case '~':
type = SCC_TOK_BIT_NOT;
break;
case '!':
switch (scc_probe_stream_next(stream)) {
case '=':
type = SCC_TOK_NEQ;
goto double_char;
default:
scc_probe_stream_reset(stream), type = SCC_TOK_NOT;
break;
}
break;
case '[':
type = SCC_TOK_L_BRACKET;
break;
case ']':
type = SCC_TOK_R_BRACKET;
break;
case '(':
type = SCC_TOK_L_PAREN;
break;
case ')':
type = SCC_TOK_R_PAREN;
break;
case '{':
type = SCC_TOK_L_BRACE;
break;
case '}':
type = SCC_TOK_R_BRACE;
break;
case ';':
type = SCC_TOK_SEMICOLON;
break;
case ',':
type = SCC_TOK_COMMA;
break;
case ':':
type = SCC_TOK_COLON;
break;
case '.':
if (scc_probe_stream_next(stream) == '.' &&
scc_probe_stream_next(stream) == '.') {
type = SCC_TOK_ELLIPSIS;
goto triple_char;
}
scc_probe_stream_reset(stream);
type = SCC_TOK_DOT;
break;
break;
case '?':
type = SCC_TOK_COND;
break;
case '\v':
case '\f':
case ' ':
case '\t':
type = SCC_TOK_BLANK;
break;
case '\r':
case '\n':
lex_parse_skip_endline(lexer->stream, &lexer->pos);
token->type = SCC_TOK_BLANK;
goto END;
case '#':
parse_line(lexer, token);
token->type = SCC_TOK_BLANK;
goto END;
case '\0':
case core_stream_eof:
// EOF
type = SCC_TOK_EOF;
break;
case '\'': {
token->loc = lexer->pos;
token->type = SCC_TOK_CHAR_LITERAL;
int ch = lex_parse_char(lexer->stream, &lexer->pos);
if (ch == core_stream_eof) {
LEX_ERROR("Unexpected character literal");
token->type = SCC_TOK_UNKNOWN;
} else {
token->value.ch = ch;
}
goto END;
}
case '"': {
token->loc = lexer->pos;
token->type = SCC_TOK_STRING_LITERAL;
scc_cstring_t output = scc_cstring_new();
if (lex_parse_string(lexer->stream, &lexer->pos, &output) == true) {
token->value.cstr.data = scc_cstring_as_cstr(&output);
token->value.cstr.len = scc_cstring_len(&output);
} else {
LEX_ERROR("Unexpected string literal");
token->type = SCC_TOK_UNKNOWN;
}
goto END;
}
/* clang-format off */
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
/* clang-format on */
token->loc = lexer->pos;
token->type = SCC_TOK_INT_LITERAL;
usize output;
if (lex_parse_number(lexer->stream, &lexer->pos, &output) == true) {
token->value.n = output;
} else {
LEX_ERROR("Unexpected number literal");
token->type = SCC_TOK_UNKNOWN;
}
goto END;
/* clang-format off */
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
case 'v': case 'w': case 'x': case 'y': case 'z':
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
case 'V': case 'W': case 'X': case 'Y': case 'Z': case '_':
/* clang-format on */
scc_cstring_t str = scc_cstring_new();
cbool ret = lex_parse_identifier(lexer->stream, &lexer->pos, &str);
Assert(ret == true);
int res = keyword_cmp(scc_cstring_as_cstr(&str), scc_cstring_len(&str));
if (res == -1) {
token->value.cstr.data = (char *)scc_cstring_as_cstr(&str);
token->value.cstr.len = scc_cstring_len(&str);
type = SCC_TOK_IDENT;
} else {
scc_cstring_free(&str);
type = keywords[res].tok;
}
token->type = type;
goto END;
default:
LEX_ERROR("unsupported char in source code `%c`", ch);
break;
}
goto once_char;
triple_char:
scc_probe_stream_consume(stream);
core_pos_next(&lexer->pos);
double_char:
scc_probe_stream_consume(stream);
core_pos_next(&lexer->pos);
once_char:
scc_probe_stream_consume(stream);
core_pos_next(&lexer->pos);
token->type = type;
END:
LEX_DEBUG("get token `%s` in %s:%d:%d", scc_get_tok_name(token->type),
token->loc.name, token->loc.line, token->loc.column);
}
// scc_lexer_get_token may return tokens that are invalid for the parser
void scc_lexer_get_valid_token(scc_lexer_t *lexer, lexer_tok_t *token) {
scc_tok_subtype_t type;
do {
scc_lexer_get_token(lexer, token);
type = scc_get_tok_subtype(token->type);
AssertFmt(type != SCC_TOK_SUBTYPE_INVALID,
"Invalid token: `%s` at %s:%d:%d",
scc_get_tok_name(token->type), token->loc.name,
token->loc.line, token->loc.col);
} while (type == SCC_TOK_SUBTYPE_EMPTYSPACE ||
type == SCC_TOK_SUBTYPE_COMMENT);
}


@@ -0,0 +1,7 @@
#include <lexer_log.h>
logger_t __smcc_lexer_log = {
.name = "lexer",
.level = LOG_LEVEL_ALL,
.handler = log_default_handler,
};

libs/lexer/src/token.c

@@ -0,0 +1,30 @@
#include <lexer_token.h>
// Generate the string mapping (choose #str or #name as needed)
static const char *token_strings[] = {
#define X(str, subtype, tok) [tok] = #str,
SCC_CTOK_TABLE
#undef X
#define X(str, subtype, tok, std) [tok] = #str,
SCC_CKEYWORD_TABLE
#undef X
};
static scc_tok_subtype_t token_subtypes[] = {
#define X(str, subtype, tok) [tok] = subtype,
SCC_CTOK_TABLE
#undef X
#define X(str, subtype, tok, std) [tok] = subtype,
SCC_CKEYWORD_TABLE
#undef X
};
scc_tok_subtype_t scc_get_tok_subtype(scc_tok_type_t type) {
return token_subtypes[type];
}
const char *scc_get_tok_name(scc_tok_type_t type) {
return token_strings[type];
}


@@ -0,0 +1,170 @@
// test_lexer.c
#include <lexer.h>
#include <string.h>
#include <utest/acutest.h>
// Test helper
static inline void test_lexer_string(const char *input,
scc_tok_type_t expected_type) {
scc_lexer_t lexer;
lexer_tok_t token;
scc_mem_probe_stream_t stream;
scc_lexer_init(&lexer, scc_mem_probe_stream_init(&stream, input,
strlen(input), false));
scc_lexer_get_token(&lexer, &token);
TEST_CHECK(token.type == expected_type);
TEST_MSG("Expected: %s", scc_get_tok_name(expected_type));
TEST_MSG("Got: %s", scc_get_tok_name(token.type));
}
// Basic operator tests
void test_operators() {
TEST_CASE("Arithmetic operators");
{
test_lexer_string("+", SCC_TOK_ADD);
test_lexer_string("++", SCC_TOK_ADD_ADD);
test_lexer_string("+=", SCC_TOK_ASSIGN_ADD);
test_lexer_string("-", SCC_TOK_SUB);
test_lexer_string("--", SCC_TOK_SUB_SUB);
test_lexer_string("-=", SCC_TOK_ASSIGN_SUB);
test_lexer_string("*", SCC_TOK_MUL);
test_lexer_string("*=", SCC_TOK_ASSIGN_MUL);
test_lexer_string("/", SCC_TOK_DIV);
test_lexer_string("/=", SCC_TOK_ASSIGN_DIV);
test_lexer_string("%", SCC_TOK_MOD);
test_lexer_string("%=", SCC_TOK_ASSIGN_MOD);
}
TEST_CASE("Bitwise operators");
{
test_lexer_string("&", SCC_TOK_AND);
test_lexer_string("&&", SCC_TOK_AND_AND);
test_lexer_string("&=", SCC_TOK_ASSIGN_AND);
test_lexer_string("|", SCC_TOK_OR);
test_lexer_string("||", SCC_TOK_OR_OR);
test_lexer_string("|=", SCC_TOK_ASSIGN_OR);
test_lexer_string("^", SCC_TOK_XOR);
test_lexer_string("^=", SCC_TOK_ASSIGN_XOR);
test_lexer_string("~", SCC_TOK_BIT_NOT);
test_lexer_string("<<", SCC_TOK_L_SH);
test_lexer_string("<<=", SCC_TOK_ASSIGN_L_SH);
test_lexer_string(">>", SCC_TOK_R_SH);
test_lexer_string(">>=", SCC_TOK_ASSIGN_R_SH);
}
TEST_CASE("Comparison operators");
{
test_lexer_string("==", SCC_TOK_EQ);
test_lexer_string("!=", SCC_TOK_NEQ);
test_lexer_string("<", SCC_TOK_LT);
test_lexer_string("<=", SCC_TOK_LE);
test_lexer_string(">", SCC_TOK_GT);
test_lexer_string(">=", SCC_TOK_GE);
}
TEST_CASE("Special symbols");
{
test_lexer_string("(", SCC_TOK_L_PAREN);
test_lexer_string(")", SCC_TOK_R_PAREN);
test_lexer_string("[", SCC_TOK_L_BRACKET);
test_lexer_string("]", SCC_TOK_R_BRACKET);
test_lexer_string("{", SCC_TOK_L_BRACE);
test_lexer_string("}", SCC_TOK_R_BRACE);
test_lexer_string(";", SCC_TOK_SEMICOLON);
test_lexer_string(",", SCC_TOK_COMMA);
test_lexer_string(":", SCC_TOK_COLON);
test_lexer_string(".", SCC_TOK_DOT);
test_lexer_string("...", SCC_TOK_ELLIPSIS);
test_lexer_string("->", SCC_TOK_DEREF);
test_lexer_string("?", SCC_TOK_COND);
}
}
// Keyword tests
void test_keywords() {
TEST_CASE("C89 keywords");
test_lexer_string("while", SCC_TOK_WHILE);
test_lexer_string("sizeof", SCC_TOK_SIZEOF);
TEST_CASE("C99 keywords");
test_lexer_string("restrict", SCC_TOK_RESTRICT);
// test_lexer_string("_Bool", SCC_TOK_INT); // confirm against your type definitions
}
// Literal tests
void test_literals() {
TEST_CASE("Integer literals");
{
// Decimal
test_lexer_string("0", SCC_TOK_INT_LITERAL);
test_lexer_string("123", SCC_TOK_INT_LITERAL);
test_lexer_string("2147483647", SCC_TOK_INT_LITERAL);
// Hexadecimal
test_lexer_string("0x0", SCC_TOK_INT_LITERAL);
test_lexer_string("0x1A3F", SCC_TOK_INT_LITERAL);
test_lexer_string("0XABCDEF", SCC_TOK_INT_LITERAL);
// Octal
test_lexer_string("0123", SCC_TOK_INT_LITERAL);
test_lexer_string("0777", SCC_TOK_INT_LITERAL);
// Boundary values
test_lexer_string("2147483647", SCC_TOK_INT_LITERAL); // INT_MAX
test_lexer_string("4294967295", SCC_TOK_INT_LITERAL); // UINT_MAX
}
TEST_CASE("Character literals");
{
test_lexer_string("'a'", SCC_TOK_CHAR_LITERAL);
test_lexer_string("'\\n'", SCC_TOK_CHAR_LITERAL);
test_lexer_string("'\\t'", SCC_TOK_CHAR_LITERAL);
test_lexer_string("'\\\\'", SCC_TOK_CHAR_LITERAL);
test_lexer_string("'\\0'", SCC_TOK_CHAR_LITERAL);
}
TEST_CASE("String literals");
{
test_lexer_string("\"hello\"", SCC_TOK_STRING_LITERAL);
test_lexer_string("\"multi-line\\nstring\"", SCC_TOK_STRING_LITERAL);
test_lexer_string("\"escape\\\"quote\"", SCC_TOK_STRING_LITERAL);
}
// TEST_CASE("Floating literals");
// test_lexer_string("3.14e-5", SCC_TOK_FLOAT_LITERAL);
}
// Edge-case tests
void test_edge_cases() {
// TEST_CASE("Long identifiers");
// char long_id[LEXER_MAX_ SCC_TOK_SIZE+2] = {0};
// memset(long_id, 'a', LEXER_MAX_ SCC_TOK_SIZE+1);
// test_lexer_string(long_id, SCC_TOK_IDENT);
// TEST_CASE("Buffer boundary");
// char boundary[LEXER_BUFFER_SIZE*2] = {0};
// memset(boundary, '+', LEXER_BUFFER_SIZE*2-1);
// test_lexer_string(boundary, SCC_TOK_ADD);
}
// Error-handling tests
// void test_error_handling() {
// TEST_CASE("Invalid characters");
// cc_lexer_t lexer;
// tok_t token;
// init_lexer(&lexer, "test.c", NULL, test_read);
// get_valid_token(&lexer, &token);
// TEST_CHECK(token.type == SCC_TOK_EOF); // should trigger error handling
// }
// Test list
TEST_LIST = {{"operators", test_operators},
{"keywords", test_keywords},
{"literals", test_literals},
{"edge_cases", test_edge_cases},
// {"error_handling", test_error_handling},
{NULL, NULL}};


@@ -0,0 +1,92 @@
#include <lexer.h>
#include <lexer_log.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/// gcc -g ../lexer.c ../token.c test_lexer.c -o test_lexer
/*
tok_tConstant {
int have;
union {
char ch;
int i;
float f;
double d;
long long ll;
char* str;
};
};
*/
int g_num;
int g_num_arr[3];
int main(int argc, char *argv[]) {
// int num = 0;
if (argc == 3 && strcmp(argv[2], "--debug") == 0) {
log_set_level(NULL, LOG_LEVEL_ALL);
} else {
// FIXME: this is a hack to silence the lexer logger
log_set_level(&__smcc_lexer_log, LOG_LEVEL_NOTSET);
log_set_level(NULL, LOG_LEVEL_INFO | LOG_LEVEL_WARN | LOG_LEVEL_ERROR);
}
const char *file_name = __FILE__;
if (argc == 2) {
file_name = argv[1];
}
FILE *fp = fopen(file_name, "rb");
if (fp == NULL) {
perror("open file failed");
return 1;
}
if (fseek(fp, 0, SEEK_END) != 0) {
perror("fseek failed");
return 1;
}
usize fsize = ftell(fp);
LOG_INFO("file size: %zu", fsize);
if (fseek(fp, 0, SEEK_SET)) {
perror("fseek failed");
return 1;
}
char *buffer = (char *)malloc(fsize);
usize read_ret = fread(buffer, 1, fsize, fp);
fclose(fp);
if (read_ret != fsize) {
LOG_FATAL("fread failed read_ret %zu != fsize %zu", read_ret, fsize);
free(buffer);
return 1;
}
scc_lexer_t lexer;
scc_mem_probe_stream_t mem_stream = {0};
scc_probe_stream_t *stream =
scc_mem_probe_stream_init(&mem_stream, buffer, fsize, false);
Assert(stream != null);
scc_cstring_clear(&stream->name);
scc_cstring_append_cstr(&stream->name, file_name, strlen(file_name));
scc_lexer_init(&lexer, stream);
lexer_tok_t tok;
while (1) {
scc_lexer_get_valid_token(&lexer, &tok);
if (tok.type == SCC_TOK_EOF) {
break;
}
LOG_DEBUG("token `%s` at %s:%u:%u", scc_get_tok_name(tok.type),
scc_cstring_as_cstr(&tok.loc.name), tok.loc.line,
tok.loc.col);
Assert(tok.loc.offset <= fsize);
// LOG_DEBUG("%s", tok.val.str);
// printf("line: %d, column: %d, type: %3d, typename: %s\n",
// lexer.line, lexer.index, tok.type, scc_get_tok_name(tok.type));
}
free(buffer);
LOG_INFO("Lexer is Ok...");
return 0;
}


@@ -0,0 +1,4 @@
tests/pp copies TinyCC's tests/pp test suite
see the [README](tests/pp/README) for details


@@ -0,0 +1,8 @@
[package]
name = "smcc_pprocesser"
dependencies = [
{ name = "libcore", path = "../../runtime/libcore" },
{ name = "libutils", path = "../../runtime/libutils" },
{ name = "smcc_lex_parser", path = "../lex_parser" },
]


@@ -0,0 +1,30 @@
#ifndef __SMCC_PP_TOKEN_H__
#define __SMCC_PP_TOKEN_H__
/* clang-format off */
/// https://cppreference.cn/w/c/preprocessor
#define PP_INST_TOKEN \
X(define , PP_STD, PP_TOK_DEFINE ) \
X(undef , PP_STD, PP_TOK_UNDEF ) \
X(include , PP_STD, PP_TOK_INCLUDE ) \
X(if , PP_STD, PP_TOK_IF ) \
X(ifdef , PP_STD, PP_TOK_IFDEF ) \
X(ifndef , PP_STD, PP_TOK_IFNDEF ) \
X(else , PP_STD, PP_TOK_ELSE ) \
X(elif , PP_STD, PP_TOK_ELIF ) \
X(elifdef , PP_STD, PP_TOK_ELIFDEF ) \
X(elifndef , PP_C23, PP_TOK_ELIFNDEF ) \
X(endif , PP_STD, PP_TOK_ENDIF ) \
X(line , PP_STD, PP_TOK_LINE ) \
X(embed , PP_C23, PP_TOK_EMBED ) \
X(error , PP_STD, PP_TOK_ERROR ) \
X(warning , PP_C23, PP_TOK_WARNING ) \
X(pragma , PP_STD, PP_TOK_PRAMA ) \
// END
/* clang-format on */
#define X(name, type, tok) tok,
typedef enum pp_token { PP_INST_TOKEN } pp_token_t;
#undef X
#endif /* __SMCC_PP_TOKEN_H__ */


@@ -0,0 +1,72 @@
// pprocessor.h - updated header
/**
* @file pprocessor.h
* @brief Core data structures and interface of the C preprocessor
*/
#ifndef __SMCC_PP_H__
#define __SMCC_PP_H__
#include <libcore.h>
#include <libutils.h>
// Macro kinds
typedef enum {
MACRO_OBJECT,   // object-like macro
MACRO_FUNCTION, // function-like macro
} macro_type_t;
typedef VEC(cstring_t) macro_list_t;
// Macro definition
typedef struct smcc_macro {
cstring_t name;        // macro name
macro_type_t type;     // macro kind
macro_list_t replaces; // replacement list
macro_list_t params;   // parameter list (function-like macros only)
} smcc_macro_t;
// Conditional-compilation state
typedef enum {
IFState_NONE,  // not inside conditional compilation
IFState_TRUE,  // condition is true
IFState_FALSE, // condition is false
IFState_ELSE   // the #else branch has already been taken
} if_state_t;
// Conditional-compilation stack entry
typedef struct if_stack_item {
if_state_t state;
int skip; // whether the current section is skipped
} if_stack_item_t;
// Preprocessor state
typedef struct smcc_preprocessor {
core_stream_t *stream;         // output stream
strpool_t strpool;             // string pool
hashmap_t macros;              // macro table
VEC(if_stack_item_t) if_stack; // conditional-compilation stack
} smcc_pp_t;
/**
* @brief Initialize the preprocessor
* @param[out] pp preprocessor instance to initialize
* @param[in] input pointer to the input stream
* @return pointer to the output stream
*/
core_stream_t *pp_init(smcc_pp_t *pp, core_stream_t *input);
/**
* @brief Run preprocessing
* @param[in] pp preprocessor instance
* @return processing result
*/
int pp_process(smcc_pp_t *pp);
/**
* @brief Destroy the preprocessor
* @param[in] pp preprocessor instance
*/
void pp_drop(smcc_pp_t *pp);
#endif /* __SMCC_PP_H__ */


@@ -0,0 +1,427 @@
/**
* @file pprocessor.c
* @brief C preprocessor implementation
*/
#include <lex_parser.h>
#include <pp_token.h>
#include <pprocessor.h>
#define PPROCESSER_BUFFER_SIZE (1024)
static u32 hash_func(cstring_t *string) {
return smcc_strhash32(cstring_as_cstr(string));
}
static int hash_cmp(const cstring_t *str1, const cstring_t *str2) {
if (str1->size != str2->size) {
return str1->size - str2->size;
}
return smcc_strcmp(cstring_as_cstr(str1), cstring_as_cstr(str2));
}
// Add a macro definition
static void add_macro(smcc_pp_t *pp, const cstring_t *name,
const macro_list_t *replaces, const macro_list_t *params,
macro_type_t type) {
smcc_macro_t *macro = smcc_malloc(sizeof(smcc_macro_t));
macro->name = *name;
macro->type = type;
if (replaces) {
macro->replaces = *replaces;
} else {
vec_init(macro->replaces);
}
if (params) {
macro->params = *params;
} else {
vec_init(macro->params);
}
hashmap_set(&pp->macros, &macro->name, macro);
}
// Look up a macro definition
static smcc_macro_t *find_macro(smcc_pp_t *pp, cstring_t *name) {
return hashmap_get(&pp->macros, name);
}
// Conditional-compilation handling skeleton
static void handle_if(smcc_pp_t *pp, const char *condition) {
if_stack_item_t item;
// TODO: actually evaluate the condition; default to false for now
int cond_value = 0; // cond_value = evaluate_condition(pp, condition);
item.state = cond_value ? IFState_TRUE : IFState_FALSE;
item.skip = !cond_value;
vec_push(pp->if_stack, item);
}
static void handle_else(smcc_pp_t *pp) {
if (pp->if_stack.size == 0) {
// error: no matching #if
return;
}
if_stack_item_t *top = &vec_at(pp->if_stack, pp->if_stack.size - 1);
if (top->state == IFState_ELSE) {
// error: duplicate #else
return;
}
top->skip = !top->skip;
top->state = IFState_ELSE;
}
static void handle_include(smcc_pp_t *pp, const char *filename,
int system_header) {
// look up the file path
// create a new input stream
// recursively process the included file
}
// Parse an identifier
static cstring_t parse_identifier(core_stream_t *stream) {
cstring_t identifier = cstring_new();
core_stream_reset_char(stream);
int ch = core_stream_peek_char(stream);
// identifiers start with a letter or underscore
if (!((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == '_')) {
LOG_WARN("Invalid identifier");
return identifier;
}
do {
cstring_push(&identifier, (char)ch);
core_stream_next_char(stream); // consume the character
ch = core_stream_peek_char(stream);
} while ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
(ch >= '0' && ch <= '9') || ch == '_');
return identifier;
}
// Skip whitespace characters: ' ' and '\t'
static void skip_whitespace(core_stream_t *stream) {
int ch;
core_stream_reset_char(stream);
while ((ch = core_stream_peek_char(stream)) != core_stream_eof) {
if (ch == ' ' || ch == '\t') {
core_stream_next_char(stream);
} else {
break;
}
}
}
#define X(name, type, tok) SMCC_STR(name),
static const char *token_strings[] = {PP_INST_TOKEN};
#undef X
static const struct {
const char *name;
pp_token_t tok;
} keywords[] = {
#define X(name, type, tok) {#name, tok},
PP_INST_TOKEN
#undef X
};
// find the keyword via binary search over the sorted table
static inline int keyword_cmp(const char *name, int len) {
int low = 0;
int high = sizeof(keywords) / sizeof(keywords[0]) - 1;
while (low <= high) {
int mid = (low + high) / 2;
const char *key = keywords[mid].name;
int cmp = 0;
// custom length-bounded string comparison
for (int i = 0; i < len; i++) {
if (name[i] != key[i]) {
cmp = (unsigned char)name[i] - (unsigned char)key[i];
break;
}
if (name[i] == '\0')
break; // stop early at the terminator
}
if (cmp == 0) {
// exact-match check (lengths must agree)
if (key[len] == '\0')
return mid;
cmp = -1; // the keyword is longer than the input
}
if (cmp < 0) {
high = mid - 1;
} else {
low = mid + 1;
}
}
return -1; // Not a keyword.
}
typedef struct pp_stream {
core_stream_t stream;
core_stream_t *input;
smcc_pp_t *self;
usize size;
usize pos;
char buffer[PPROCESSER_BUFFER_SIZE];
} pp_stream_t;
static cbool parse_list(pp_stream_t *_stream, macro_list_t *list,
cbool is_param) {
Assert(_stream != null);
core_stream_t *stream = _stream->input;
Assert(stream != null);
core_stream_reset_char(stream);
vec_init(*list);
int ch;
cstring_t str = cstring_new();
core_pos_t pos;
while ((ch = core_stream_peek_char(stream)) != core_stream_eof) {
if (is_param) {
// ( param ) ( param, ... ) ( ... )
if (lex_parse_is_whitespace(ch)) {
// TODO: #define ( A A , B ) must be an error
lex_parse_skip_whitespace(stream, &pos);
core_stream_reset_char(stream);
} else if (ch == ',') {
vec_push(*list, str);
str = cstring_new();
core_stream_next_char(stream);
continue;
} else if (ch == ')') {
break;
} else if (ch == core_stream_eof || lex_parse_is_endline(ch)) {
LOG_ERROR("Invalid parameter list");
return false;
}
} else {
// replacement list
if (lex_parse_is_whitespace(ch)) {
lex_parse_skip_whitespace(stream, &pos);
vec_push(*list, str);
str = cstring_new();
core_stream_reset_char(stream);
continue;
} else if (lex_parse_is_endline(ch)) {
break;
}
}
core_stream_next_char(stream);
cstring_push(&str, (char)ch);
}
vec_push(*list, str);
str = cstring_new();
return true;
}
// Parse a preprocessor directive
static void parse_directive(pp_stream_t *_stream) {
Assert(_stream != null);
core_stream_t *stream = _stream->input;
Assert(stream != null);
int ch;
core_pos_t pos;
core_stream_reset_char(stream);
// skip '#' and any following whitespace
if (core_stream_peek_char(stream) != '#') {
LOG_WARN("Invalid directive");
return;
}
core_stream_next_char(stream);
// TODO: allow the null directive ('#' followed by a newline), which has no effect
skip_whitespace(stream);
// parse the directive name
cstring_t directive = parse_identifier(stream);
if (cstring_is_empty(&directive)) {
LOG_ERROR("expected identifier");
goto ERR;
}
skip_whitespace(stream);
core_stream_reset_char(stream);
pp_token_t token =
keyword_cmp(cstring_as_cstr(&directive), cstring_len(&directive));
switch (token) {
case PP_TOK_DEFINE: {
cstring_t name = parse_identifier(stream);
if (cstring_is_empty(&name)) {
LOG_ERROR("expected identifier");
goto ERR;
}
skip_whitespace(stream);
core_stream_reset_char(stream);
int ch = core_stream_peek_char(stream);
if (ch == '(') {
// function-like macro: parse the parameter list up to ')'
macro_list_t params;
parse_list(_stream, &params, true);
ch = core_stream_next_char(stream);
if (ch != ')') {
goto ERR;
}
macro_list_t replacement;
parse_list(_stream, &replacement, false);
add_macro(_stream->self, &name, &replacement, &params, MACRO_FUNCTION);
break;
}
macro_list_t replacement;
parse_list(_stream, &replacement, false);
add_macro(_stream->self, &name, &replacement, NULL, MACRO_OBJECT);
break;
}
case PP_TOK_UNDEF:
case PP_TOK_INCLUDE:
case PP_TOK_IF:
case PP_TOK_IFDEF:
case PP_TOK_IFNDEF:
case PP_TOK_ELSE:
case PP_TOK_ELIF:
case PP_TOK_ELIFDEF:
case PP_TOK_ELIFNDEF:
case PP_TOK_ENDIF:
case PP_TOK_LINE:
case PP_TOK_EMBED:
case PP_TOK_ERROR:
case PP_TOK_WARNING:
case PP_TOK_PRAMA:
TODO();
break;
default:
LOG_WARN("Unknown preprocessor directive: %s",
cstring_as_cstr(&directive));
}
// TODO: normalize line endings (Windows \r\n, Unix \n, classic Mac \r) to \n
core_stream_reset_char(stream);
lex_parse_skip_line(stream, &pos);
cstring_free(&directive);
return;
ERR:
// TODO: skip to the end of the line
LOG_FATAL("Unhandled preprocessor directive");
}
static inline void stream_push_string(pp_stream_t *stream, cstring_t *str) {
usize len = cstring_len(str);
Assert(stream->size + len <= PPROCESSER_BUFFER_SIZE);
smcc_memcpy(stream->buffer + stream->size, cstring_as_cstr(str), len);
stream->size += len;
}
static inline void stream_push_char(pp_stream_t *stream, int ch) {
Assert(stream->size < PPROCESSER_BUFFER_SIZE);
stream->buffer[stream->size++] = ch;
}
static int next_char(core_stream_t *_stream) {
pp_stream_t *stream = (pp_stream_t *)_stream;
Assert(stream != null);
READ_BUF:
if (stream->size != 0) {
if (stream->pos < stream->size) {
return stream->buffer[stream->pos++];
} else {
stream->size = 0;
stream->pos = 0;
}
}
RETRY:
core_stream_reset_char(stream->input);
int ch = core_stream_peek_char(stream->input);
if (ch == '#') {
parse_directive(stream);
goto RETRY;
} else if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
ch == '_') {
cstring_t identifier = parse_identifier(stream->input);
smcc_macro_t *macro = find_macro(stream->self, &identifier);
if (macro == null) {
stream_push_string(stream, &identifier);
cstring_free(&identifier);
goto READ_BUF;
} else {
cstring_free(&identifier);
}
if (macro->type == MACRO_OBJECT) {
for (usize i = 0; i < macro->replaces.size; ++i) {
stream_push_string(stream, &vec_at(macro->replaces, i));
// usize is unsigned: compare with i + 1 rather than subtracting from size
if (i + 1 < macro->replaces.size)
stream_push_char(stream, ' ');
}
goto READ_BUF;
} else if (macro->type == MACRO_FUNCTION) {
TODO();
}
UNREACHABLE();
}
return core_stream_next_char(stream->input);
}
static core_stream_t *pp_stream_init(smcc_pp_t *self, core_stream_t *input) {
if (self == null || input == null) {
return null;
}
pp_stream_t *stream = smcc_malloc(sizeof(pp_stream_t));
if (stream == null) {
LOG_FATAL("Failed to allocate memory for output stream");
return null;
}
stream->self = self;
stream->input = input;
stream->size = 0;
stream->pos = 0;
stream->stream.name = cstring_from_cstr("pipe_stream");
stream->stream.free_stream = null;
stream->stream.next_char = next_char;
stream->stream.peek_char = null;
stream->stream.reset_char = null;
stream->stream.read_buf = null;
return (core_stream_t *)stream;
}
core_stream_t *pp_init(smcc_pp_t *pp, core_stream_t *input) {
if (pp == null || input == null) {
return null;
}
pp->stream = pp_stream_init(pp, input);
Assert(pp->stream != null);
hashmap_init(&pp->macros);
pp->macros.hash_func = (u32 (*)(const void *))hash_func;
pp->macros.key_cmp = (int (*)(const void *, const void *))hash_cmp;
return pp->stream;
}
// Destroy the preprocessor
void pp_drop(smcc_pp_t *pp) {
if (pp == NULL)
return;
// free all macro definitions
// note: needs hashmap iteration/cleanup helpers
hashmap_drop(&pp->macros);
// free the string pool
// strpool_destroy(&pp->strpool);
// free the conditional-compilation stack
// each entry's resources (if any) must be released
// vec_free(pp->if_stack);
// free the stream name
cstring_free(&pp->stream->name);
}


@@ -0,0 +1,6 @@
#define hash_hash # ## #
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define join(c, d) in_between(c hash_hash d)
char p[] = join(x, y);
// char p[] = "x ## y";


@@ -0,0 +1 @@
char p[] = "x ## y";


@@ -0,0 +1,28 @@
#define x 3
#define f(a) f(x * (a))
#undef x
#define x 2
#define g f
#define z z[0]
#define h g(~
#define m(a) a(w)
#define w 0,1
#define t(a) a
#define p() int
#define q(x) x
#define r(x,y) x ## y
#define str(x) # x
f(y+1) + f(f(z)) % t(t(g)(0) + t)(1);
g(x+(3,4)-w) | h 5) & m
(f)^m(m);
char c[2][6] = { str(hello), str() };
/*
* f(2 * (y+1)) + f(2 * (f(2 * (z[0])))) % f(2 * (0)) + t(1);
* f(2 * (2+(3,4)-0,1)) | f(2 * (~ 5)) & f(2 * (0,1))^m(0,1);
* char c[2][6] = { "hello", "" };
*/
#define L21 f(y+1) + f(f(z)) % t(t(g)(0) + t)(1);
#define L22 g(x+(3,4)-w) | h 5) & m\
(f)^m(m);
L21
L22


@@ -0,0 +1,5 @@
f(2 * (y+1)) + f(2 * (f(2 * (z[0])))) % f(2 * (0)) + t(1);
f(2 * (2 +(3,4)-0,1)) | f(2 * (~ 5)) & f(2 * (0,1))^m(0,1);
char c[2][6] = { "hello", "" };
f(2 * (y+1)) + f(2 * (f(2 * (z[0])))) % f(2 * (0)) + t(1);
f(2 * (2 +(3,4)-0,1)) | f(2 * (~ 5)) & f(2 * (0,1))^m(0,1);


@@ -0,0 +1,15 @@
#define str(s) # s
#define xstr(s) str(s)
#define debug(s, t) printf("x" # s "= %d, x" # t "= %s", \
x ## s, x ## t)
#define INCFILE(n) vers ## n
#define glue(a, b) a ## b
#define xglue(a, b) glue(a, b)
#define HIGHLOW "hello"
#define LOW LOW ", world"
debug(1, 2);
fputs(str(strncmp("abc\0d", "abc", '\4') // this goes away
== 0) str(: @\n), s);
#include xstr(INCFILE(2).h)
glue(HIGH, LOW);
xglue(HIGH, LOW)


@@ -0,0 +1,5 @@
printf("x" "1" "= %d, x" "2" "= %s", x1, x2);
fputs("strncmp(\"abc\\0d\", \"abc\", '\\4') == 0" ": @\n", s);
#include "vers2.h"
"hello";
"hello" ", world"


@@ -0,0 +1,4 @@
#define foobar 1
#define C(x,y) x##y
#define D(x) (C(x,bar))
D(foo)


@@ -0,0 +1 @@
(1)


@@ -0,0 +1,7 @@
#define t(x,y,z) x ## y ## z
#define xxx(s) int s[] = { t(1,2,3), t(,4,5), t(6,,7), t(8,9,), \
t(10,,), t(,11,), t(,,12), t(,,) };
int j[] = { t(1,2,3), t(,4,5), t(6,,7), t(8,9,),
t(10,,), t(,11,), t(,,12), t(,,) };
xxx(j)


@@ -0,0 +1,3 @@
int j[] = { 123, 45, 67, 89,
10, 11, 12, };
int j[] = { 123, 45, 67, 89, 10, 11, 12, };


@@ -0,0 +1,5 @@
#define X(a,b, \
c,d) \
foo
X(1,2,3,4)


@@ -0,0 +1 @@
foo


@@ -0,0 +1,4 @@
#define a() YES
#define b() a
b()
b()()


@@ -0,0 +1,2 @@
a
YES


@@ -0,0 +1,4 @@
// test macro expansion in arguments
#define s_pos s_s.s_pos
#define foo(x) (x)
foo(hej.s_pos)


@@ -0,0 +1 @@
(hej.s_s.s_pos)


@@ -0,0 +1,4 @@
#define C(a,b,c) a##b##c
#define N(x,y) C(x,_,y)
#define A_O aaaaoooo
N(A,O)
