Two more fixes for non-UTF-8 tests by aitap · Pull Request #7681 · Rdatatable/data.table

aitap · 2026-03-22T13:43:23Z

Tests 1966.* failed on my Windows 7 VM where I test data.table with old versions of R. ö cannot be represented in CP1251, and enc2native() converted it to a plain unaccented o. If the characters cannot be represented in the ANSI encoding, we might as well skip the tests. (What if it returns NA or ? on a different system?)
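
The skip condition described above can be sketched in base R as a round-trip check. This is only an illustration, not the code from this PR: `representable_natively()` is a hypothetical helper name, and the actual tests may guard the condition differently.

```r
# Hypothetical helper (not part of data.table): TRUE when every element of x
# survives conversion to the native encoding and back to UTF-8 unchanged.
representable_natively <- function(x) {
  x <- enc2utf8(x)
  round_trip <- enc2utf8(enc2native(x))
  # A substituted character, an escape sequence, or NA all differ from the
  # original, so each of them counts as "not representable".
  all(!is.na(round_trip) & round_trip == x)
}

file_stem <- "\u00f6"  # o with an umlaut, written as an escape to keep the source ASCII
if (representable_natively(file_stem)) {
  message("native encoding can represent the name; run the file name tests")
} else {
  message("native encoding cannot represent the name; skip the file name tests")
}
```

Comparing against the original string, rather than checking for one particular substitute, also covers the case where a different system returns `NA` or `?` instead of a plain `o`.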

Test 1164.1 shouldn't require the characters to be represented in the native encoding, because it only uses UTF-8 and Latin-1. Both match() and chmatch() offer a strong enough guarantee. Tested on the same Windows 7 VM, and also using LC_ALL=zh_CN.gb2312 luit R CMD check (GB2312 doesn't have ä or ß) and LC_ALL=C on GNU/Linux.
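
The guarantee relied on here can be illustrated with base R alone: `match()` treats strings in different declared encodings as equal when they agree after translation to UTF-8, so no conversion to the native encoding is needed. A small sketch follows. The variable names are illustrative and it exercises only base `match()`; it is not the body of test 1164.1.

```r
# The same character (a with diaeresis) in two declared encodings, built from
# raw bytes so that the result does not depend on the session's locale.
a_utf8 <- rawToChar(as.raw(c(0xc3, 0xa4)))
Encoding(a_utf8) <- "UTF-8"
a_latin1 <- rawToChar(as.raw(0xe4))
Encoding(a_latin1) <- "latin1"

# The byte representations differ ...
bytes_differ <- !identical(charToRaw(a_utf8), charToRaw(a_latin1))
# ... yet match() compares the strings after translating them to UTF-8.
pos <- match(a_utf8, a_latin1)
print(c(bytes_differ = bytes_differ, matched = !is.na(pos)))
```

Because neither string ever passes through `enc2native()`, the outcome should not depend on whether the native encoding (CP1251, GB2312, or the C locale) can represent the character.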

Not all ANSI encodings can represent accented Latin characters. For non-representable strings, enc2native() may substitute a different character (observed: "o" instead of "ö"), which makes a later comparison of the file name with the original, non-converted string fail.

codecov · 2026-03-22T14:37:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (7db13b9) to head (6c0c24e).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7681   +/-   ##
=======================================
  Coverage   99.04%   99.04%           
=======================================
  Files          87       87           
  Lines       17031    17031           
=======================================
  Hits        16868    16868           
  Misses        163      163

aitap added 2 commits March 22, 2026 16:23

Drop native encoding requirement

6c0c24e

Test 1164.1 should pass with UTF-8 encoded strings without converting them to the native encoding.

aitap requested a review from MichaelChirico as a code owner March 22, 2026 13:43

aitap wants to merge 2 commits into master from native_file_enc
